Generating UTF-8 with System.Xml.XmlWriter
Today I decided to experiment with XmlWriter. The first thing I wanted to do was set the encoding to UTF-8:
StringBuilder stringBuilder = new StringBuilder();
XmlWriter xmlWriter = XmlWriter.Create(stringBuilder);
xmlWriter.Settings.Encoding = Encoding.UTF8;
When I ran this code I received the following exception: XmlException was unhandled: The 'XmlWriterSettings.Encoding' property is read only and cannot be set. The documentation for the Settings property clearly says:
The XmlWriterSettings object returned by the Settings property cannot be modified. Any attempt to change individual settings results in an exception being thrown.
So I wrote the following:
StringBuilder stringBuilder = new StringBuilder();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = Encoding.UTF8;
XmlWriter xmlWriter = XmlWriter.Create(stringBuilder, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();
string xmlString = stringBuilder.ToString();
As you can see, the result, <?xml version="1.0" encoding="utf-16"?><root xmlns="http://www.timvw.be/ns" />, is still not what I want. Apparently the Encoding property is ignored if the XmlWriter is not writing to a Stream; a StringBuilder holds .NET strings, which are UTF-16, so the declaration reports utf-16. So here is my next attempt:
MemoryStream memoryStream = new MemoryStream();
// initialize xmlWriterSettings as above...
XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
// call the same operations on the xmlWriter as above...
string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());
OK, I'm getting close: ?<?xml version="1.0" encoding="utf-8"?><root xmlns="http://www.timvw.be/ns" />. Luckily I knew that the ? at the beginning is the BOM, which in UTF-8 is three bytes (239, 187, 191). Encoding.UTF8 emits that preamble, so in order to get rid of those bytes I had to create my own instance of UTF8Encoding that does not emit it. Finally, I can present some working code:
MemoryStream memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = new UTF8Encoding(false);
xmlWriterSettings.ConformanceLevel = ConformanceLevel.Document;
xmlWriterSettings.Indent = true;
XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();
string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());
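To make the difference between Encoding.UTF8 and new UTF8Encoding(false) visible, here is a minimal sketch that writes the same document with both and compares the first bytes. The class name Utf8XmlDemo and the helper WriteRoot are made up for this illustration; only the XmlWriter, XmlWriterSettings, MemoryStream and UTF8Encoding calls come from the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class, not part of the original post.
public static class Utf8XmlDemo
{
    // Writes an empty <root/> document with the given encoding
    // and returns the raw bytes that ended up in the stream.
    public static byte[] WriteRoot(Encoding encoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = encoding;

        using (MemoryStream stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("root", "http://www.timvw.be/ns");
                writer.WriteEndElement();
                writer.WriteEndDocument();
            } // disposing the writer flushes it into the stream

            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] withBom = WriteRoot(Encoding.UTF8);
        byte[] withoutBom = WriteRoot(new UTF8Encoding(false));

        Console.WriteLine(BitConverter.ToString(withBom, 0, 3)); // EF-BB-BF
        Console.WriteLine((char)withoutBom[0]);                  // <
        Console.WriteLine(withBom.Length - withoutBom.Length);   // 3
    }
}
```

The using blocks replace the explicit Flush and Close calls; that is a style choice for the sketch, not a correction of the code above.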
Good job…
February 6th, 2007 at 15:21
Good job, really…
Maybe you should use ToArray instead of GetBuffer in the last line
February 13th, 2007 at 11:47
When I wrote that code I also discovered [Tip]: MemoryStream.GetBuffer() vs. MemoryStream.ToArray(). I decided that I should use Reflector to check whether this behaviour still exists, but I never did (and as long as I don't run into weird problems I'll probably be too lazy to investigate). Anyway, as soon as this bites you, you may indeed want to use ToArray instead.
EDIT (18/12/2007): Well, reviewing this comment: earlier this year it did bite me. Somehow the buffer was larger than the actual number of bytes, so I ended up with a couple of 0 values at the end, which is undesirable. Therefore, I will consistently use ToArray from now on.
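The trailing zeros can be reproduced with a small sketch. GetBuffer returns the stream's internal array, which may be larger than the data written, while ToArray copies only the bytes actually written. The class name BufferDemo and the helper CreateSample are hypothetical names for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original post.
public static class BufferDemo
{
    // Returns a MemoryStream containing the 8 UTF-8 bytes of "<root />".
    public static MemoryStream CreateSample()
    {
        MemoryStream stream = new MemoryStream();
        byte[] payload = Encoding.UTF8.GetBytes("<root />");
        stream.Write(payload, 0, payload.Length);
        return stream;
    }

    public static void Main()
    {
        using (MemoryStream stream = CreateSample())
        {
            byte[] exact = stream.ToArray();   // copy of the written bytes only
            byte[] raw = stream.GetBuffer();   // internal buffer, may have unused capacity

            Console.WriteLine(exact.Length);               // 8
            Console.WriteLine(raw.Length >= exact.Length); // True

            // If GetBuffer is used, decode only the bytes that were written.
            string safe = Encoding.UTF8.GetString(raw, 0, (int)stream.Length);
            Console.WriteLine(safe);                       // <root />
        }
    }
}
```

Decoding the whole of GetBuffer() instead would turn any unused capacity into '\0' characters at the end of the string.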
February 13th, 2007 at 11:48
Thanks a million!
February 14th, 2007 at 19:13
Thanks a lot. This really helped me out.
February 28th, 2007 at 03:13
Helped a lot. Thanks! I find it strange that UTF8.GetString() can’t cope with the BOM at the start of the array it’s decoding – especially when the BOM is optional when writing the XML with XmlWriter!
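A small sketch of the behaviour described in this comment: Encoding.UTF8.GetString decodes the BOM bytes into the character U+FEFF rather than dropping them, whereas a StreamReader skips a leading BOM. The class name DecodeDemo and its two helpers are made-up names for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original post.
public static class DecodeDemo
{
    // Builds the UTF-8 bytes of "<root />" preceded by the UTF-8 BOM.
    public static byte[] BytesWithBom()
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<root />");
        byte[] all = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, all, 0, bom.Length);
        Buffer.BlockCopy(body, 0, all, bom.Length, body.Length);
        return all;
    }

    // Decodes through a StreamReader, which skips a leading BOM.
    public static string DecodeWithReader(byte[] bytes)
    {
        using (MemoryStream stream = new MemoryStream(bytes))
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = BytesWithBom();

        string direct = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(direct[0] == '\uFEFF');    // True: the BOM survives as a character
        Console.WriteLine(DecodeWithReader(bytes));  // <root />
    }
}
```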
December 17th, 2007 at 17:09
Thanks! Just what I need!
December 24th, 2007 at 15:53
Fantastic resource! Thanks for the work on this.
June 7th, 2008 at 19:39
Great article, which has saved much time for me! Thanks!
August 21st, 2008 at 21:03
excellent!! problem easily solved……thanks
November 18th, 2008 at 20:46
Thank you! This article helped me to save XML as UTF-8.
December 12th, 2008 at 04:15
Thanks, helped me.
April 16th, 2009 at 12:14
Thanks, great job!
August 3rd, 2009 at 13:50