Generating UTF-8 with System.Xml.XmlWriter

Today I decided to experiment with XmlWriter. The first thing I wanted to do was set the encoding to UTF-8:

StringBuilder stringBuilder = new StringBuilder();
XmlWriter xmlWriter = XmlWriter.Create(stringBuilder);
xmlWriter.Settings.Encoding = Encoding.UTF8;

When I ran this code I received the following exception: XmlException was unhandled: The ‘XmlWriterSettings.Encoding’ property is read only and cannot be set. The documentation for the Settings property clearly says:

The XmlWriterSettings object returned by the Settings property cannot be modified. Any attempt to change individual settings results in an exception being thrown.

So I wrote the following:

StringBuilder stringBuilder = new StringBuilder();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = Encoding.UTF8;

XmlWriter xmlWriter = XmlWriter.Create(stringBuilder, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();

string xmlString = stringBuilder.ToString();

As you can see, <?xml version="1.0" encoding="utf-16"?><root xmlns="http://www.timvw.be/ns" /> is still not what I want. Apparently the Encoding property is ignored if the XmlWriter is not writing to a Stream: a StringBuilder holds .NET strings, which are UTF-16, so that is the encoding the declaration reports. So here is my next attempt:

MemoryStream memoryStream = new MemoryStream();
// initialize xmlWriterSettings as above...

XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
// call the same operations on the xmlWriter as above...

string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());

OK, I’m getting close: ?<?xml version="1.0" encoding="utf-8"?><root xmlns="http://www.timvw.be/ns" />. Luckily I knew that the ? at the beginning is the BOM (byte order mark), which in UTF-8 is the three bytes 239, 187, 191. To get rid of it I had to create my own instance of UTF8Encoding, passing false so that no BOM is emitted. Finally, I can present some working code:

MemoryStream memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = new UTF8Encoding(false);
xmlWriterSettings.ConformanceLevel = ConformanceLevel.Document;
xmlWriterSettings.Indent = true;

XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();

string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());
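
As an aside on the StringBuilder attempt earlier: the utf-16 declaration appears to come from the StringWriter that XmlWriter wraps around the StringBuilder, since a StringWriter reports UTF-16 as its Encoding. If you only need the string and not the bytes, one possible alternative is to hand XmlWriter a StringWriter subclass whose Encoding property returns UTF-8. This is only a minimal sketch; the names Utf8StringWriter and Utf8StringWriterExample.BuildXml are made up for this illustration and are not part of the framework.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper type (not part of .NET): a StringWriter that
// reports UTF-8 instead of the default UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class Utf8StringWriterExample
{
    // Hypothetical helper method: builds the same document as above and
    // returns it as a string.
    public static string BuildXml()
    {
        using (Utf8StringWriter stringWriter = new Utf8StringWriter())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter))
            {
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
                xmlWriter.WriteEndElement();
                xmlWriter.WriteEndDocument();
            } // disposing the XmlWriter flushes it into the StringWriter

            return stringWriter.ToString();
        }
    }
}
```

Calling Utf8StringWriterExample.BuildXml() should give a string whose declaration says encoding="utf-8", and because nothing is encoded to bytes there is no BOM to strip. The MemoryStream version remains the one to use when you need the actual UTF-8 bytes.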

This entry was posted on Monday, January 8th, 2007 at 23:59 and is filed under C#, XML.

13 Responses to “Generating UTF-8 with System.Xml.XmlWriter”

  1. Martin says:

    Good job…

  2. Maciek says:

    Good job, really…
    Maybe you should use ToArray instead of GetBuffer in the last line

  3. admin says:

    Maybe you should use ToArray instead of GetBuffer in the last line

    When I wrote that code I also discovered [Tip]: MemoryStream.GetBuffer() vs. MemoryStream.ToArray(). I decided that I should use Reflector to check whether this behaviour still exists, but I never did (and as long as I don’t run into weird problems I’ll probably be too lazy to investigate). Anyway, as soon as this bites you, you may indeed want to use ToArray instead.

    EDIT (18/12/2007): Well, reviewing this comment: earlier this year it did bite me. Somehow the buffer was larger than the actual number of bytes, so I ended up with a couple of 0 values at the end, which is undesirable. Therefore, I will consistently use ToArray from now on :)

  4. Coen B says:

    Thanks a million!

  5. oykica says:

    Thanks a lot. This really helped me out.

  6. Steve B says:

    Helped a lot. Thanks! I find it strange that UTF8.GetString() can’t cope with the BOM at the start of the array it’s decoding – especially when the BOM is optional when writing the XML with XmlWriter!

  7. Taras says:

    Thanks! Just what I need!

  8. Chris says:

    Fantastic resource! Thanks for the work on this.

  9. Pavel says:

    Great article, which has saved much time for me! Thanks!

  10. miraj says:

    excellent!! problem easily solved……thanks

  11. Dino says:

    Thank you! this article helped me to save XML as UTF-8

  12. Ray says:

    Thanks, helped me.

  13. Matze says:

    Thanks, great job!