Daniel Fortunov's Adventures in Software Development

XML Serialization Bug in .NET 2.0 SP2

Written on 10-Jul-2009 by asqui

It has not been a good week for me with the System.Xml namespace: I've found two bugs in two days! Yesterday's discovery of poor argument validation in XmlDocument.Load(Stream) was a pretty minor point really, a matter of good framework design style. Today, by contrast, I uncovered a bona fide bug with XML serialization in the XmlSerializer class. This one is more serious: it is a matter of compliance with the XML specification!

Overview

When an XML serializer is generated for a schema that contains a free-form xs:any node, deserialization behaves incorrectly in some scenarios.

For example, suppose you need to hold an arbitrary fragment of XML configuration that is defined by a client-specific schema (one that is not known up front and cannot be included in your own schema). You might address this by including a "freestyle" configuration element like this:

<xs:element name="config" minOccurs="0">
  <xs:complexType mixed="true">
    <xs:sequence minOccurs="0">
      <xs:any processContents="skip" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

If you then use the XmlSerializer (or the sgen utility, which uses XmlSerializer under the covers) to deserialize this configuration, you may run into the problem detailed below.

Walkthrough

Consider the following pair of documents:

Document A

<Root>
  <FirstChild>
    <config></config>
  </FirstChild>
  <SecondChild/>
</Root>

Document B

<Root>
  <FirstChild>
    <config/>
  </FirstChild>
  <SecondChild/>
</Root>

These two documents should be equivalent: the only difference is <config></config> versus <config/>. Just to make sure I'm not going insane, I checked the XML Specification, which says that "The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag." So the two documents are merely different representations of something that should be semantically identical, and XML deserialization should produce the same output for both.
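As a quick sanity check outside .NET, the sketch below uses Python's standard xml.etree.ElementTree module (my choice of tool, not part of the original repro; ET.canonicalize needs Python 3.8 or later) to parse both documents. Canonical XML (C14N) always writes an empty element as a start-tag/end-tag pair, so if the two spellings really carry the same information, their canonical forms should be identical.

```python
import xml.etree.ElementTree as ET

DOC_A = """<Root>
  <FirstChild>
    <config></config>
  </FirstChild>
  <SecondChild/>
</Root>"""
# Document B differs only in using an empty-element tag for config.
DOC_B = DOC_A.replace("<config></config>", "<config/>")

# Canonical XML (C14N 2.0) writes every empty element as <x></x>,
# so the two spellings should canonicalize to identical text.
canon_a = ET.canonicalize(xml_data=DOC_A)
canon_b = ET.canonicalize(xml_data=DOC_B)
print(canon_a == canon_b)  # True

# The parsed trees agree too: config has no text and no children in both.
cfg_a = ET.fromstring(DOC_A).find("FirstChild/config")
cfg_b = ET.fromstring(DOC_B).find("FirstChild/config")
print((cfg_a.text, len(cfg_a)) == (cfg_b.text, len(cfg_b)))  # True
```

A conforming XML processor sees no difference between the two documents, so the XmlSerializer ought to produce the same object for both.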

However, this is not the case, at least not in .NET 2.0 SP2 (version 2.0.50727.3053).

Document A is deserialized successfully. The FirstChild and SecondChild elements are non-null in the resulting class instance.

Document B is also deserialized without error; however, the XmlSerializer instance raises its UnknownNode event for the SecondChild node. In the resulting class instance, FirstChild is populated, but SecondChild is null!

Root Cause

The reason for this appears to be a bug in the generated serializer class. The method responsible for deserializing FirstChild (Read2_RootFirstChild in the repro solution, linked below) overruns past the end of the FirstChild element and consumes the SecondChild element from the reader, reporting it as an unrecognised element. (See line 251 of 2l5qpkfr.0.cs in the repro solution.)

Subsequently, the method responsible for deserializing SecondChild (method Read1_Object) is unable to deserialize this element because it has already been consumed!
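To make the overrun concrete, here is a simplified model of one way it can happen. It is written in Python rather than C# and is not the generated serializer code; Node, Reader, read_root, read_first_child, read_config and the check_empty flag are all hypothetical stand-ins. It assumes (a guess, not confirmed from the repro) that the generated reader for the mixed config element does not check XmlReader.IsEmptyElement. Because <config/> produces no EndElement node, such a reader would consume </FirstChild> as if it were </config>, and the FirstChild reader would then swallow SecondChild.

```python
import re
from dataclasses import dataclass

# Hypothetical model of the .NET XmlReader node stream:
# "<x></x>" yields an Element node followed by an EndElement node, while
# "<x/>" yields a single Element node with is_empty=True and NO EndElement.
@dataclass
class Node:
    kind: str              # "Element" or "EndElement"
    name: str
    is_empty: bool = False

# Toy tokenizer: attribute-free tags only; text and whitespace are ignored.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>")

class Reader:
    def __init__(self, xml):
        self.nodes = [Node("EndElement", name) if close
                      else Node("Element", name, is_empty=bool(empty))
                      for close, name, empty in TAG.findall(xml)]
        self.pos = 0
        self.unknown = []  # stands in for XmlSerializer's UnknownNode event

    @property
    def node(self):
        return self.nodes[self.pos] if self.pos < len(self.nodes) else None

    def read(self):
        self.pos += 1

    def at_content(self):
        """True while positioned on an Element (not an EndElement or EOF)."""
        return self.node is not None and self.node.kind == "Element"

    def skip(self):
        """Move past the current element and its whole subtree."""
        depth = 0
        while True:
            n = self.node
            self.read()
            if n.kind == "Element" and not n.is_empty:
                depth += 1
            elif n.kind == "EndElement":
                depth -= 1
            if depth == 0:
                return

def read_config(r, check_empty):
    """Reads the mixed xs:any config element, skipping its children."""
    if check_empty and r.node.is_empty:
        r.read()           # <config/>: there is no </config> to consume
        return
    r.read()               # step past the <config> start tag
    while r.at_content():
        r.skip()           # processContents="skip": ignore child elements
    r.read()               # consume the next EndElement, assumed to be </config>

def read_first_child(r, check_empty):
    r.read()               # step past <FirstChild>
    while r.at_content():
        if r.node.name == "config":
            read_config(r, check_empty)
        else:
            r.unknown.append(r.node.name)  # reported as an unknown node
            r.skip()
    r.read()               # consume the next EndElement, assumed to be </FirstChild>

def read_root(xml, check_empty):
    r = Reader(xml)
    found = {"FirstChild": False, "SecondChild": False}
    r.read()               # step past <Root>
    while r.at_content():
        name = r.node.name
        if name == "FirstChild":
            read_first_child(r, check_empty)
            found[name] = True
        elif name == "SecondChild":
            r.skip()
            found[name] = True
        else:
            r.unknown.append(name)
            r.skip()
    return found, r.unknown

DOC_A = "<Root><FirstChild><config></config></FirstChild><SecondChild/></Root>"
DOC_B = DOC_A.replace("<config></config>", "<config/>")

print(read_root(DOC_A, check_empty=False))
# ({'FirstChild': True, 'SecondChild': True}, [])
print(read_root(DOC_B, check_empty=False))
# ({'FirstChild': True, 'SecondChild': False}, ['SecondChild'])
```

With check_empty=False the model reproduces the symptoms described above for Document B: an unknown-node report for SecondChild and a missing SecondChild. Passing check_empty=True, which models testing IsEmptyElement before reading the element's content, makes both documents deserialize identically.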

Standalone Reproduction of the Problem

Here is a Visual Studio solution with a standalone repro of the problem, including an annotated copy of the generated serialization classes, showing where I believe the bug to be:

XmlSerializationRepro

I was unable to find this bug on Microsoft Connect, so I am not sure whether Microsoft already knows about it.

Update (12 July 2009): I have reported this on Microsoft Connect.
