Generating UTF-8 with System.Xml.XmlWriter

Today I decided to experiment with XmlWriter. The first thing I wanted to do was set the Encoding to UTF-8:

StringBuilder stringBuilder = new StringBuilder();
XmlWriter xmlWriter = XmlWriter.Create(stringBuilder);
xmlWriter.Settings.Encoding = Encoding.UTF8; // throws XmlException: Settings is read-only here

When I ran this code I received the following exception: XmlException was unhandled: The ‘XmlWriterSettings.Encoding’ property is read only and cannot be set. The documentation for the Settings property clearly says:

The XmlWriterSettings object returned by the Settings property cannot be modified. Any attempt to change individual settings results in an exception being thrown.
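The restriction applies to the object returned by xmlWriter.Settings. If you want an existing writer's settings as a starting point, a minimal sketch is to call XmlWriterSettings.Clone(), which, as far as I know, returns a copy that is no longer read-only. The variable names are illustrative, and the try/catch only reproduces the exception quoted above.

```csharp
using System;
using System.Text;
using System.Xml;

var stringBuilder = new StringBuilder();
XmlWriterSettings writerSettings;
using (XmlWriter xmlWriter = XmlWriter.Create(stringBuilder))
{
    // The Settings of an existing writer are read-only.
    writerSettings = xmlWriter.Settings;
}

bool setterThrew = false;
try
{
    writerSettings.Encoding = Encoding.UTF8;
}
catch (XmlException)
{
    // "The 'XmlWriterSettings.Encoding' property is read only and cannot be set."
    setterThrew = true;
}

// Assumption: Clone() returns a copy with the read-only flag cleared,
// so it can be modified and passed to XmlWriter.Create.
XmlWriterSettings copy = writerSettings.Clone();
copy.Encoding = Encoding.UTF8;

Console.WriteLine(setterThrew);           // True
Console.WriteLine(copy.Encoding.WebName); // utf-8
```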

So I created my own XmlWriterSettings and passed it to XmlWriter.Create:

StringBuilder stringBuilder = new StringBuilder();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = Encoding.UTF8;

XmlWriter xmlWriter = XmlWriter.Create(stringBuilder, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();

string xmlString = stringBuilder.ToString();

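As you can see, <?xml version="1.0" encoding="utf-16"?><root xmlns="http://www.timvw.be/ns" /> is still not what I want. Apparently the Encoding property is ignored when the XmlWriter does not write to a Stream: with a StringBuilder the output goes through a TextWriter, and the declaration reports that writer's encoding (UTF-16). So here is my next attempt: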

MemoryStream memoryStream = new MemoryStream();
// initialize xmlWriterSettings as above...

XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
// call the same operations on the xmlWriter as above...

string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());

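OK, I'm getting close: ?<?xml version="1.0" encoding="utf-8"?><root xmlns="http://www.timvw.be/ns" />. Luckily I knew that the ? at the beginning is the byte order mark (BOM): Encoding.UTF8 writes the three bytes EF BB BF (the first one has value 239) before the XML, and GetString decodes them as the character U+FEFF instead of dropping them. To get rid of the BOM I had to create my own instance of UTF8Encoding, passing false so that it does not emit one. Finally, I can present some working code: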

using System.IO;
using System.Text;
using System.Xml;

MemoryStream memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
// UTF8Encoding(false): UTF-8 without a byte order mark
xmlWriterSettings.Encoding = new UTF8Encoding(false);
xmlWriterSettings.ConformanceLevel = ConformanceLevel.Document;
xmlWriterSettings.Indent = true;

// Disposing the writer flushes and closes it; the MemoryStream stays usable.
using (XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings))
{
    xmlWriter.WriteStartDocument();
    xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
    xmlWriter.WriteEndElement();
    xmlWriter.WriteEndDocument();
}

// ToArray copies exactly the bytes written (see the comments on GetBuffer)
string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());
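The StringBuilder result above follows from how XmlWriter handles text output: when it writes to a TextWriter, the XML declaration reports the TextWriter's encoding and XmlWriterSettings.Encoding is ignored. A minimal sketch of that behaviour, assuming a recent .NET with top-level statements: a StringWriter gives a utf-16 declaration, while a StreamWriter created with new UTF8Encoding(false) gives a utf-8 declaration and no BOM. The WriteSample helper is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: writes the same document as the post.
static void WriteSample(XmlWriter writer)
{
    writer.WriteStartDocument();
    writer.WriteStartElement("root", "http://www.timvw.be/ns");
    writer.WriteEndElement();
    writer.WriteEndDocument();
}

// Settings.Encoding is ignored below, because both outputs are TextWriters.
var settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };

// A StringWriter reports UTF-16, so the declaration says utf-16.
var stringWriter = new StringWriter();
using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, settings))
{
    WriteSample(xmlWriter);
}
string fromStringWriter = stringWriter.ToString();

// A StreamWriter with BOM-less UTF-8 makes the declaration say utf-8.
var memoryStream = new MemoryStream();
using (var streamWriter = new StreamWriter(memoryStream, new UTF8Encoding(false)))
using (XmlWriter xmlWriter = XmlWriter.Create(streamWriter, settings))
{
    WriteSample(xmlWriter);
}
// ToArray works on a closed MemoryStream and copies exactly Length bytes.
string fromStreamWriter = Encoding.UTF8.GetString(memoryStream.ToArray());

Console.WriteLine(fromStringWriter);
Console.WriteLine(fromStreamWriter);
```

Which route to prefer is a matter of taste: the Stream version in the post keeps the encoding in XmlWriterSettings, while this one keeps it in the StreamWriter.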

18 thoughts on “Generating UTF-8 with System.Xml.XmlWriter”

  1. Good job, really…
    Maybe you should use ToArray instead of GetBuffer in the last line.

  2. "Maybe you should use ToArray instead of GetBuffer in the last line"

    When I wrote that code I also discovered [Tip]: MemoryStream.GetBuffer() vs. MemoryStream.ToArray(), and I decided to use Reflector to check whether this behaviour still exists, but I never did (and as long as I don't run into weird problems I'll probably be too lazy to investigate it). Anyway, as soon as this bites you, you may indeed want to use ToArray instead.

    EDIT (18/12/2007) Well, reviewing this comment: earlier this year it did bite me. The buffer was larger than the actual number of bytes written (GetBuffer returns the whole internal buffer, including unused capacity), so I ended up with a couple of 0 values at the end, which is undesirable. Therefore, I will consistently use ToArray from now on :)

  3. Helped a lot. Thanks! I find it strange that UTF8.GetString() can’t cope with the BOM at the start of the array it’s decoding – especially when the BOM is optional when writing the XML with XmlWriter!
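The ? in the post's output and the GetString behaviour described in this comment have the same cause: Encoding.UTF8 has a three-byte preamble (EF BB BF), and Encoding.GetString decodes it as the character U+FEFF rather than skipping it. A minimal sketch, assuming you already hold bytes that start with a BOM: a StreamReader, which detects and consumes a byte order mark by default, returns the text without it. The sample bytes are built by hand for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Encoding.UTF8 emits a BOM; new UTF8Encoding(false) does not.
byte[] preamble = Encoding.UTF8.GetPreamble();
Console.WriteLine(BitConverter.ToString(preamble));              // EF-BB-BF
Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length); // 0

// Illustrative input: a BOM followed by UTF-8 text (GetBytes never adds a BOM).
byte[] body = Encoding.UTF8.GetBytes("<root />");
byte[] bytes = new byte[preamble.Length + body.Length];
Buffer.BlockCopy(preamble, 0, bytes, 0, preamble.Length);
Buffer.BlockCopy(body, 0, bytes, preamble.Length, body.Length);

// GetString keeps the BOM as the character U+FEFF.
string viaGetString = Encoding.UTF8.GetString(bytes);

// StreamReader detects the BOM and does not return it as text.
string viaReader;
using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
{
    viaReader = reader.ReadToEnd();
}

Console.WriteLine((int)viaGetString[0]); // 65279 (U+FEFF)
Console.WriteLine(viaReader);            // <root />
```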
