XDocument.ToString() throws "' ', hexadecimal value 0x13, is an invalid character." exception

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
<!ELEMENT root (Book*)>
<!ELEMENT Book (Name*, Description*)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ENTITY egrave "&#232;">
<!ENTITY eacute "&#233;">
<!ENTITY euro "&#8364;">
]>
<root>
<Book>
<Name>Hey Hey Latte</Name>
<Description>About the caf&eacute; culture.</Description>
</Book>
<Book>
<Name>How to Make Friends</Name>
<Description>The power of &euro;100.00.</Description>
</Book>
</root>

Different ways how to escape an XML string in C#

XML escaping is necessary when you have to store XML text as a value inside an XML document. If you don't escape the special characters, the inserted XML becomes part of the document's DOM instead of the value of a node.

Escaping XML basically means replacing 5 characters with their entity references.

These replacements are:

<  ->  &lt;
>  ->  &gt;
"  ->  &quot;
'  ->  &apos;
&  ->  &amp;

 

Here are 4 ways you can encode XML in C#:

1. string.Replace() 5 times

This is ugly but it works. Note that Replace("&", "&amp;") has to be the first replace; otherwise the & in the entities produced by the other replacements would itself be escaped.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = xml.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;").Replace("\"", "&quot;").Replace("'", "&apos;");

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;
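
The ordering rule is easy to check in isolation. Below is a minimal sketch, assuming a hypothetical helper class `XmlEscapeSketch` (not part of the original post), that applies the five replacements in the safe order and contrasts it with a deliberately wrong order:

```csharp
using System;

public static class XmlEscapeSketch
{
    // Hypothetical helper: applies the five replacements, '&' first.
    public static string EscapeXml(string text)
    {
        if (text == null) return null;
        return text
            .Replace("&", "&amp;")   // must run before the others
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;")
            .Replace("'", "&apos;");
    }

    // Deliberately wrong order, only to show the double-escaping problem.
    public static string EscapeXmlWrongOrder(string text)
    {
        return text
            .Replace("<", "&lt;")
            .Replace("&", "&amp;"); // also hits the '&' of "&lt;"
    }

    public static void Main()
    {
        Console.WriteLine(EscapeXml("a < b & c"));           // a &lt; b &amp; c
        Console.WriteLine(EscapeXmlWrongOrder("a < b & c")); // a &amp;lt; b &amp; c
    }
}
```

In the wrong-order variant the already produced `&lt;` is escaped a second time, which is exactly what running the `&` replacement first avoids.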

 

2. System.Web.HttpUtility.HtmlEncode()

Used for encoding HTML, but HTML escapes the same special characters, so we can use it for XML too. Mostly used in ASP.NET apps. Note that HtmlEncode does NOT encode apostrophes ( ' ).

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = HttpUtility.HtmlEncode(xml);

// RESULT: &lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;

 

3. System.Security.SecurityElement.Escape()

In Windows Forms or Console apps I use this method. If nothing else it saves me including the System.Web reference in my projects and it encodes all 5 chars.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = System.Security.SecurityElement.Escape(xml);

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;

 

4. System.Xml.XmlTextWriter

Using XmlTextWriter you don't have to worry about escaping anything, since it escapes the chars where needed. For example, in attributes it doesn't escape apostrophes, while in node values it escapes neither apostrophes nor quotes.

string xml = "<node>it's my \"node\" & i like it<node>";
using (XmlTextWriter xtw = new XmlTextWriter(@"c:\xmlTest.xml", Encoding.Unicode))
{
    xtw.WriteStartElement("xmlEncodeTest");
    xtw.WriteAttributeString("testAttribute", xml);
    xtw.WriteString(xml);
    xtw.WriteEndElement();
}

/* RESULT:
<xmlEncodeTest testAttribute="&lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;">&lt;node&gt;it's my "node" &amp; i like it&lt;node&gt;</xmlEncodeTest>
*/
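
To look at the attribute and node-value escaping without writing a file to disk, the same calls can target a StringWriter. This is a minimal sketch; the class and method names (`WriterEscapeSketch`, `WriteSample`) are illustrative and not from the original post.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WriterEscapeSketch
{
    // Writes the value once as an attribute and once as element text,
    // then returns the serialized XML as a string.
    public static string WriteSample(string value)
    {
        var output = new StringWriter();
        using (var writer = new XmlTextWriter(output))
        {
            writer.WriteStartElement("xmlEncodeTest");
            writer.WriteAttributeString("testAttribute", value);
            writer.WriteString(value);
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteSample("<node>it's my \"node\" & i like it<node>"));
    }
}
```

The returned string makes it easy to compare the two contexts side by side: quotes are escaped inside the attribute value but left alone in the element text.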

 

Each of the four ways is different, so use each one where you feel appropriate. You can't go wrong with SecurityElement though. :)

 

 

Legacy Comments


Homer S
2008-10-22
re: Different ways how to escape an XML string in C#
Good post. I did not know about System.Security.SecurityElement.Escape().

Foo JH
2008-12-05
re: Different ways how to escape an XML string in C#
Thanks for the breakdown. I like to have options.

Sibaram Pala
2008-12-22
re: Different ways how to escape an XML string in C#
It's Good

Rich Trawinski
2009-03-05
re: Different ways how to escape an XML string in C#
I can't stop XmlText from converting chars. I'm at the point of automating a search-and-replace after my XML doc is created. All help is greatly appreciated. In my code snippet below I need to keep the < > chars in my CDATA.
XmlElement fieldNodeBody = xmlDoc.CreateElement("field");
fieldNodeBody.SetAttribute("name", "body");
XmlText fieldNodeBodyText = xmlDoc.CreateTextNode("field");
fieldNodeBodyText.Value = "<![CDATA[" + adoDR[5].ToString() + "]]>";
fieldNodeBody.AppendChild(fieldNodeBodyText);
contentNode.AppendChild(fieldNodeBody);
Thanks
rich

Charles Young
2009-03-11
re: Different ways how to escape an XML string in C#
Another way is simply to use the InnerText and InnerXml properties of an XmlDOM node.

string xml = "<node>it's my \"node\" & i like it<node>";
XmlDocument xDoc = new XmlDocument();
XmlElement xElem = xDoc.CreateElement("Content");
xElem.InnerText = xml;
MessageBox.Show(xElem.InnerXml); // Get escaped content
MessageBox.Show(xElem.InnerText); // Get XML

iouri
2009-04-19
re: Different ways how to escape an XML string in C#
The most straightforward and safe way is to use XmlWriter.

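string xml = "<node>it's my \"node\" & i like it<node>";
StringBuilder encodedString = new StringBuilder(xml.Length);
// Fragment conformance is needed to write text without a root element.
var settings = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var writer = XmlWriter.Create(encodedString, settings))
{
    writer.WriteString(xml);
}
// encodedString.ToString() now holds the escaped text.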

Using XmlDocument is costly, and using HtmlEncode is incorrect: it will not encode ' and WILL encode é as &eacute;, etc.

Chirag
2009-06-22
re: Different ways how to escape an XML string in C#
Thanks keep it up :)

Javier Callico
2009-06-26
re: Different ways how to escape an XML string in C#
I personally like the simplicity of solution # 1 but wouldn't want to create a dependency from System.Security just to use the System.Security.SecurityElement.Escape(string) method.

I reflected this method and it does something similar to the solution #1.

private static readonly char[] s_escapeChars = new char[] { '<', '>', '"', '\'', '&' };
private static readonly string[] s_escapeStringPairs = new string[] { "<", "&lt;", ">", "&gt;", "\"", "&quot;", "'", "&apos;", "&", "&amp;" };

/// <summary>
/// Escapes the specified text.
/// </summary>
/// <param name="str">The text to escape.</param>
/// <returns>An escaped string.</returns>
public static string Escape(string str)
{
    if (str == null)
    {
        return null;
    }
    StringBuilder builder = null;
    int length = str.Length;
    int startIndex = 0;
    while (true)
    {
        int num2 = str.IndexOfAny(s_escapeChars, startIndex);
        if (num2 == -1)
        {
            if (builder == null)
            {
                return str;
            }
            builder.Append(str, startIndex, length - startIndex);
            return builder.ToString();
        }
        if (builder == null)
        {
            builder = new StringBuilder();
        }
        builder.Append(str, startIndex, num2 - startIndex);
        builder.Append(GetEscapeSequence(str[num2]));
        startIndex = num2 + 1;
    }
}

private static string GetEscapeSequence(char c)
{
    int length = s_escapeStringPairs.Length;
    for (int i = 0; i < length; i += 2)
    {
        string str = s_escapeStringPairs[i];
        string str2 = s_escapeStringPairs[i + 1];
        if (str[0] == c)
        {
            return str2;
        }
    }
    return c.ToString();
}

Anon
2009-07-22
re: Different ways how to escape an XML string in C#
Interestingly, LINQ to XML does not escape the single quote in an XAttribute value.
This input into XAttribute: <>'&" (technically \") became &lt;&gt;'&amp;&quot;

I don't know if this is deliberate or not.

Daniel
2009-09-11
re: Different ways how to escape an XML string in C#
One thing that none of these methods handles is the control characters below 0x20 (decimal 32) in the ASCII character set. For instance, char 0x1A is not accepted by most XML parsers. I've found the most accurate way of encoding this (albeit probably the slowest) is the following escape function:

using System.Xml;

public class Tools
{
    private static XmlNode _xmlNode;

    public static string EscapeXml(string text)
    {
        if (_xmlNode == null)
        {
            XmlDocument doc = new XmlDocument();
            _xmlNode = doc.CreateNode("text", "mynode", "");
        }
        _xmlNode.InnerText = text;
        return _xmlNode.OuterXml;
    }
}

Prakash
2009-11-25
re: Different ways how to escape an XML string in C#
Really good one. Actually "HtmlEncode" does NOT encode single quotes, but SecurityElement.Escape works fine.


Thanks,
prakash

Roger
2010-01-22
re: Different ways how to escape an XML string in C#
There is another way, safer and cheaper: use a CDATA section. Wrap the text with <![CDATA[ and ]]>

aaa
2010-01-31
re: Different ways how to escape an XML string in C#
1. CDATA? But then you must escape ]]> in your text.

2. SecurityElement.Escape() - It does not escape all characters, for example those generated by using a Linux console with PuTTY.

System.Xml.XmlException: '', hexadecimal value 0x1B, is an invalid character. Line 26, position 43.
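
On the first point, a common way to keep a literal "]]>" inside CDATA is to split it across two adjacent CDATA sections. The sketch below assumes a hypothetical helper named `WrapInCData`; it only deals with the "]]>" terminator and does nothing about characters such as 0x1B, which are not allowed by XML 1.0 even inside CDATA.

```csharp
using System;
using System.Xml.Linq;

public static class CDataSketch
{
    // Hypothetical helper: wraps text in CDATA, splitting every "]]>"
    // so that "]]" ends one section and ">" starts the next.
    public static string WrapInCData(string text)
    {
        return "<![CDATA[" + text.Replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void Main()
    {
        string wrapped = WrapInCData("a]]>b");
        Console.WriteLine(wrapped);

        // Round trip: the parser joins the adjacent sections back together.
        XElement parsed = XElement.Parse("<r>" + wrapped + "</r>");
        Console.WriteLine(parsed.Value);
    }
}
```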

niall
2010-03-16
re: Different ways how to escape an XML string in C#
I've tried all and none work for this particular character 0x1B.

Anyone have any ideas?

I don't understand why.

I would have thought the CDATA would have ensured the character would be ignored, but not so.

I've tried them all with .NET's XmlWriter.WriteElementString("Error", result);



 

 

 

 

            var xmlstring = document.ToString();

            
            xmlstring = SanitizeXmlString(xmlstring);

            return XmlHelper.XmlDecode(xmlstring);

 

  /*ArgumentException: '', hexadecimal value 0x1B, is an invalid character. */
        /// <summary>
        /// Remove illegal XML characters from a string.
        /// </summary>
        public string SanitizeXmlString(string xml)
        {
            if (string.IsNullOrEmpty(xml))
            {
                return xml;
            }

            var buffer = new StringBuilder(xml.Length);

            foreach (char c in xml)
            {
                if (IsLegalXmlChar(c))
                {
                    buffer.Append(c);
                }
            }

            return buffer.ToString();
        }

        /// <summary>
        /// Whether a given character is allowed by XML 1.0.
        /// </summary>
        public bool IsLegalXmlChar(int character)
        {
            return
            (
                 character == 0x9 /* == '\t' == 9   */        ||
                 character == 0xA /* == '\n' == 10  */        ||
                 character == 0xD /* == '\r' == 13  */        ||
                 
                (character >= 0x20 && character <= 0xD7FF) ||
                (character >= 0xE000 && character <= 0xFFFD) ||
                (character >= 0x10000 && character <= 0x10FFFF)
            );
        }
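
One caveat with iterating over `char` values as above: a character outside the Basic Multilingual Plane arrives as two surrogate chars (0xD800 to 0xDFFF), which the per-char check rejects, so the 0x10000 to 0x10FFFF branch is never reached. The sketch below is one possible surrogate-aware variant that also sanitizes the value before it goes into the element, so that serialization no longer sees the invalid character. The names `XmlSanitizerSketch` and `RemoveInvalidXmlChars` are assumptions made for illustration.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class XmlSanitizerSketch
{
    // Drops characters that XML 1.0 does not allow, keeping valid surrogate pairs.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var buffer = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsSurrogatePair(text, i))
            {
                // A high/low pair encodes a code point in 0x10000..0x10FFFF.
                buffer.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate
            }
            else if (c == 0x9 || c == 0xA || c == 0xD ||
                     (c >= 0x20 && c <= 0xD7FF) ||
                     (c >= 0xE000 && c <= 0xFFFD))
            {
                buffer.Append(c);
            }
            // Anything else (control chars, lone surrogates, 0xFFFE, 0xFFFF) is dropped.
        }
        return buffer.ToString();
    }

    public static void Main()
    {
        string raw = "status:\u001B[0m ok"; // contains the ESC control character
        var element = new XElement("Error", RemoveInvalidXmlChars(raw));
        Console.WriteLine(element.ToString());
    }
}
```

Sanitizing the input values, rather than the string returned by ToString(), matters here because the exception is raised while the document is being serialized.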