Deserializing flat XML into .NET objects with C#

I'm sure most developers can agree: it's a love/hate relationship with XML. We all know it, we've all cursed it, and yet it always comes running back.

As a .NET developer, I find XML a very familiar format, and the unfortunate cause of many merge conflicts. The new Microsoft has chosen to abandon as much XML as it can in favor of JSON, but many of the APIs out there are still XML-based. What happens when those APIs don't have a strict contract and return complex types as a flat structure? How can we simplify our deserialization and keep our code as clean as possible?

Take the following XML document, for example:

<?xml version="1.0" encoding="utf-8"?>
<Car>
  <Make></Make>
  <Model></Model>
  <Year></Year>
  <ServiceVisit_Date_1></ServiceVisit_Date_1>
  <ServiceVisit_Description_1></ServiceVisit_Description_1>
  <ServiceVisit_Technician_1></ServiceVisit_Technician_1>
  <ServiceVisit_Cost_1></ServiceVisit_Cost_1>
  <ServiceVisit_Date_2></ServiceVisit_Date_2>
  <ServiceVisit_Description_2></ServiceVisit_Description_2>
  <ServiceVisit_Technician_2></ServiceVisit_Technician_2>
  <ServiceVisit_Cost_2></ServiceVisit_Cost_2>
</Car>

Using the .NET XmlSerializer, we can decorate a C# class with attributes that map its properties to XML nodes, similar to:

using System;
using System.Xml.Serialization;

namespace TestApplication
{
    [Serializable]
    [XmlRoot("Car")]
    public class Car
    {
        [XmlElement(ElementName = "Make")]
        public string Make { get; set; }

        [XmlElement(ElementName = "Model")]
        public string Model { get; set; }

        [XmlElement(ElementName = "Year")]
        public string Year { get; set; }
    }
}

We can deserialize our XML feed into this object using XmlSerializer (System.Xml.Serialization and System.IO):

var car = (Car)new XmlSerializer(typeof(Car)).Deserialize(new StringReader(xmlFeedFromSource));

This leaves us with a new Car object whose Make, Model and Year properties are populated, but what about the service visits? Since our feed is flat, with an index suffix on each item, can XmlSerializer handle this? What do we add to our model, and which ElementName do we use?
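To see what currently happens to the service-visit nodes, here is a minimal sketch that deserializes a trimmed flat feed into the Car model and hooks XmlSerializer's UnknownElement event, which fires for each element the model has no property for. The UnknownElementDemo class and the sample values are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Same shape as the Car model above: only Make, Model and Year are mapped
[XmlRoot("Car")]
public class Car
{
    [XmlElement(ElementName = "Make")]
    public string Make { get; set; }

    [XmlElement(ElementName = "Model")]
    public string Model { get; set; }

    [XmlElement(ElementName = "Year")]
    public string Year { get; set; }
}

// Hypothetical demo class
public static class UnknownElementDemo
{
    // Deserializes the feed and records the name of every element the model cannot map
    public static Car Deserialize(string xml, List<string> skipped)
    {
        var serializer = new XmlSerializer(typeof(Car));
        serializer.UnknownElement += (sender, e) => skipped.Add(e.Element.Name);

        using (var reader = new StringReader(xml))
        {
            return (Car)serializer.Deserialize(reader);
        }
    }
}
```

The flat ServiceVisit_* elements do not cause an error; they are silently skipped, which is why the data is easy to lose without noticing.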

Unfortunately, there is no built-in way to handle this scenario (that I am aware of). When I recently faced it, I approached it by re-serializing the XML feed: restructuring the flat, index-suffixed nodes into a nested tree before deserialization. The restructured feed can then be passed to the deserializer, resulting in nicely populated, strongly-typed .NET objects.
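The restructuring relies on a naming convention: Parent_Property_N for complex items and, presumably, Parent_N for repeated scalar values. Here is a minimal sketch of parsing one node name under that assumption; FlatNodeName, TryParse and the Phone_2 example are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: splits "ServiceVisit_Date_1" into ("ServiceVisit", "Date", 1)
// and "Phone_2" into ("Phone", null, 2). Any other name is not treated as a flat node.
public static class FlatNodeName
{
    private static readonly Regex Pattern =
        new Regex(@"^(?<parent>[A-Za-z]+)(?:_(?<property>[A-Za-z]+))?_(?<index>\d+)$");

    public static bool TryParse(string localName, out string parent, out string property, out int index)
    {
        parent = null;
        property = null;
        index = 0;

        var match = Pattern.Match(localName);
        if (!match.Success)
        {
            return false;
        }

        parent = match.Groups["parent"].Value;
        // The property group is optional; it is absent for repeated scalar nodes such as "Phone_2"
        property = match.Groups["property"].Success ? match.Groups["property"].Value : null;
        // Parse the whole suffix so "_10" becomes 10 rather than 0
        index = int.Parse(match.Groups["index"].Value);
        return true;
    }
}
```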

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// The class name is illustrative; any static class works for an extension method
public static class FlatXmlExtensions
{
    // Matches "Parent_Property_N", e.g. "ServiceVisit_Date_1"
    private static readonly Regex NodeWithPropertyValuesExpression = new Regex(@"^([a-zA-Z]+)_([a-zA-Z]+)_(\d+)$");

    // Matches "Parent_N", e.g. a hypothetical repeated scalar such as "Phone_1"
    private static readonly Regex NodeWithoutPropertyValuesExpression = new Regex(@"^([a-zA-Z]+)_(\d+)$");

    private static string RestructureFlatXmlToTree(string xml)
    {
        var doc = XDocument.Parse(xml);

        if (doc.Root == null)
        {
            return xml;
        }

        var nodesToBeReserialized = IdentifyNodesToBeReserialized(doc);
        var nodeIdentifiers = GetParentNodesForTree(nodesToBeReserialized);

        foreach (var nodeIdentifier in nodeIdentifiers)
        {
            // Match the exact prefix so that "Visit_" cannot also pick up "ServiceVisit_"
            var newlyIdentifiedChildNodes = nodesToBeReserialized
                .Where(x => x.Name.LocalName.StartsWith(nodeIdentifier + "_", StringComparison.Ordinal))
                .ToList();

            var reserializedChildNodes = CreateReserializedChildNodes(newlyIdentifiedChildNodes, nodeIdentifier);

            // Replace the flat nodes with a pluralized wrapper, e.g. <ServiceVisits>
            newlyIdentifiedChildNodes.ForEach(node => node.Remove());
            doc.Root.Add(new XElement(nodeIdentifier + "s", reserializedChildNodes));
        }

        return doc.ToString();
    }

    private static List<string> GetParentNodesForTree(IEnumerable<XElement> nodesToBeReserialized)
    {
        // "ServiceVisit_Date_1" -> "ServiceVisit"
        return nodesToBeReserialized.Select(node => node.Name.LocalName.Split('_')[0]).Distinct().ToList();
    }

    private static List<XElement> IdentifyNodesToBeReserialized(XContainer doc)
    {
        return doc.Descendants()
            .Where(node => NodeWithPropertyValuesExpression.IsMatch(node.Name.LocalName)
                        || NodeWithoutPropertyValuesExpression.IsMatch(node.Name.LocalName))
            .ToList();
    }

    private static List<XElement> CreateReserializedChildNodes(List<XElement> newlyIdentifiedChildNodes, string nodeIdentifier)
    {
        var reserializedChildNodes = new List<XElement>();

        // Group by the full numeric suffix so that _1 and _11 stay separate items
        var itemGroups = newlyIdentifiedChildNodes
            .GroupBy(node => GetItemIndex(node.Name.LocalName))
            .OrderBy(group => group.Key);

        foreach (var itemGroup in itemGroups)
        {
            XElement newItemNode;
            if (NodeWithPropertyValuesExpression.IsMatch(itemGroup.First().Name.LocalName))
            {
                // <ServiceVisit_Date_1>x</ServiceVisit_Date_1> -> <Date>x</Date>
                var nestedPropertyNodes = itemGroup
                    .Select(node => new { Match = NodeWithPropertyValuesExpression.Match(node.Name.LocalName), node.Value })
                    .Where(x => x.Match.Success)
                    .Select(x => new XElement(x.Match.Groups[2].Value, x.Value))
                    .ToList();
                newItemNode = new XElement(nodeIdentifier, nestedPropertyNodes);
            }
            else
            {
                // <Phone_1>x</Phone_1> -> <Phone>x</Phone>
                newItemNode = new XElement(nodeIdentifier, itemGroup.First().Value);
            }

            reserializedChildNodes.Add(newItemNode);
        }

        return reserializedChildNodes;
    }

    private static int GetItemIndex(string localName)
    {
        // Everything after the last underscore is the item index
        return int.Parse(localName.Substring(localName.LastIndexOf('_') + 1));
    }
}
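One detail worth checking in any version of this restructuring: deriving the item index from only the last character of a node name (or matching it with Contains) breaks down once a feed has ten or more items. A small sketch contrasting that with parsing the whole numeric suffix; IndexGroupingDemo and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class contrasting two ways of reading the item index from a flat node name
public static class IndexGroupingDemo
{
    // Everything after the last underscore, parsed as a whole number: "ServiceVisit_Date_11" -> 11
    private static int ParseIndex(string name)
    {
        return int.Parse(name.Substring(name.LastIndexOf('_') + 1));
    }

    // Counting distinct last characters: "_1" and "_11" both end in '1'
    public static int CountItemsByLastCharacter(IEnumerable<string> names)
    {
        return names.Select(x => x[x.Length - 1]).Distinct().Count();
    }

    public static int CountItemsByParsedIndex(IEnumerable<string> names)
    {
        return names.Select(x => ParseIndex(x)).Distinct().Count();
    }

    // Substring matching: "1" is contained in both "_1" and "_11"
    public static int CountMatchesByContains(IEnumerable<string> names, int index)
    {
        return names.Count(x => x.Contains(index.ToString()));
    }

    public static int CountMatchesByParsedIndex(IEnumerable<string> names, int index)
    {
        return names.Count(x => ParseIndex(x) == index);
    }
}
```

With ServiceVisit_Date_1 and ServiceVisit_Date_11, the last-character approach sees one item instead of two, and Contains("1") assigns both nodes to item 1.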

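Inside the same static class, we can now add a generic Deserialize<T>() extension method on string. It does not override XmlSerializer.Deserialize(); it restructures the feed first and then delegates to it: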

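// Requires: using System.IO; using System.Xml; using System.Xml.Serialization;
public static T Deserialize<T>(this string xml)
{
    try
    {
        // Expand the flat, index-suffixed nodes into nested elements first
        var reserializedXml = RestructureFlatXmlToTree(xml);

        var xmlSerializer = new XmlSerializer(typeof(T));
        using (var stringReader = new StringReader(reserializedXml))
        {
            return xmlSerializer.Deserialize(stringReader) is T result ? result : default(T);
        }
    }
    catch (Exception ex) when (ex is XmlException || ex is InvalidOperationException)
    {
        // Log the error (ex) here; callers receive default(T) instead of an exception
        return default(T);
    }
}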
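The fallback to default(T) is worth understanding before relying on it: XmlSerializer reports both a mismatched root element and malformed XML as an InvalidOperationException (with the underlying cause in InnerException). A minimal, self-contained sketch of that behavior; SafeXml, DeserializeOrDefault and the cut-down Car class are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Cut-down model for this sketch only
[XmlRoot("Car")]
public class Car
{
    [XmlElement(ElementName = "Make")]
    public string Make { get; set; }
}

// Hypothetical helper mirroring the fallback: return default(T) instead of throwing
public static class SafeXml
{
    public static T DeserializeOrDefault<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        try
        {
            using (var reader = new StringReader(xml))
            {
                return (T)serializer.Deserialize(reader);
            }
        }
        catch (InvalidOperationException)
        {
            // Wrong root element or malformed XML; details are in InnerException
            return default(T);
        }
    }
}
```

Because the failure is swallowed, a null Car can mean either "bad feed" or "wrong model", so the logging inside the catch block matters.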

Running the feed through RestructureFlatXmlToTree() produces an intermediate document with a nested structure:

<?xml version="1.0" encoding="utf-8"?>
<Car>
  <Make></Make>
  <Model></Model>
  <Year></Year>
  <ServiceVisits>
    <ServiceVisit>
      <Date></Date>
      <Description></Description>
      <Technician></Technician>
      <Cost></Cost>
    </ServiceVisit>
    <ServiceVisit>
      <Date></Date>
      <Description></Description>
      <Technician></Technician>
      <Cost></Cost>
    </ServiceVisit>
  </ServiceVisits>
</Car>

We can now create a ServiceVisit class within our project and add a list property to our Car model to deserialize into:

    [Serializable]
    [XmlRoot("ServiceVisit")]
    public class ServiceVisit
    {
        [XmlElement(ElementName = "Date")]
        public string Date { get; set; }

        [XmlElement(ElementName = "Description")]
        public string Description { get; set; }

        [XmlElement(ElementName = "Technician")]
        public string Technician { get; set; }
        
        [XmlElement(ElementName = "Cost")]
        public string Cost { get; set; }
    }

...

    [Serializable]
    [XmlRoot("Car")]
    public class Car
    {
        [XmlElement(ElementName = "Make")]
        public string Make { get; set; }

        [XmlElement(ElementName = "Model")]
        public string Model { get; set; }

        [XmlElement(ElementName = "Year")]
        public string Year { get; set; }
        
        [XmlArray("ServiceVisits")]
        [XmlArrayItem("ServiceVisit")]
        public List<ServiceVisit> ServiceVisits { get; set; }
    }
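Putting the pieces together, here is a compact end-to-end sketch that restructures a flat feed and deserializes it into the models above. It is a simplified variant, not the full method from this post: it assumes only the Parent_Property_N shape, only looks at direct children of the root, and FlatXmlPipeline and DeserializeFlat are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.Serialization;

[XmlRoot("Car")]
public class Car
{
    [XmlElement(ElementName = "Make")]
    public string Make { get; set; }

    [XmlElement(ElementName = "Model")]
    public string Model { get; set; }

    [XmlElement(ElementName = "Year")]
    public string Year { get; set; }

    [XmlArray("ServiceVisits")]
    [XmlArrayItem("ServiceVisit")]
    public List<ServiceVisit> ServiceVisits { get; set; }
}

public class ServiceVisit
{
    [XmlElement(ElementName = "Date")]
    public string Date { get; set; }

    [XmlElement(ElementName = "Description")]
    public string Description { get; set; }

    [XmlElement(ElementName = "Technician")]
    public string Technician { get; set; }

    [XmlElement(ElementName = "Cost")]
    public string Cost { get; set; }
}

// Hypothetical simplified pipeline: flat Parent_Property_N children -> nested tree -> T
public static class FlatXmlPipeline
{
    private static readonly Regex FlatName = new Regex(@"^([A-Za-z]+)_([A-Za-z]+)_(\d+)$");

    public static T DeserializeFlat<T>(string xml)
    {
        var doc = XDocument.Parse(xml);

        var flatNodes = doc.Root.Elements()
            .Select(node => new { Node = node, Match = FlatName.Match(node.Name.LocalName) })
            .Where(x => x.Match.Success)
            .ToList();

        // One pluralized wrapper per parent name, holding one item per numeric index
        foreach (var parent in flatNodes.GroupBy(x => x.Match.Groups[1].Value))
        {
            var items = parent
                .GroupBy(x => int.Parse(x.Match.Groups[3].Value))
                .OrderBy(g => g.Key)
                .Select(g => new XElement(parent.Key,
                    g.Select(x => new XElement(x.Match.Groups[2].Value, x.Node.Value))));

            doc.Root.Add(new XElement(parent.Key + "s", items));
        }

        // Drop the original flat nodes so only the nested form remains
        flatNodes.ForEach(x => x.Node.Remove());

        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(doc.ToString()))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

Usage would look like `var car = FlatXmlPipeline.DeserializeFlat<Car>(xmlFeedFromSource);`, after which car.ServiceVisits holds one entry per index in the feed.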

I'm sure this could be made more generic, and a more elegant solution might be possible using dynamics, but given typical working constraints, this is the direction I took. I think it's a fairly straightforward solution; let me know what you think.