Christoph Hochstrasser

XML Abuse

XML Abuse is everywhere, and I’m sick of it. Every time I have to use Ant, I feel like I’m in XML Hell.
Warning: This can be ranty at times.

It’s everywhere. XML Abuse. From Domain Specific Languages to Data Serialization, XML is the most commonly abused data format I’ve ever encountered.

XML is perfectly fine for the following, because this is what it was designed for:

  1. First of all: XML was designed to be written by humans and read by humans. Nearly all generated XML I’ve seen sucks badly. I think this is because XML cannot efficiently represent common data structures found in programming languages.
  2. XML is good for representing trees. If you picture the outline of a document, you can see that it is definitely a tree.
  3. XML is good for creating Markup Languages. Markup Languages are intended to be written by humans and can easily be validated by a standard validator using a Document Type Definition (DTD). The DTD essentially describes which combinations of tags and attributes are valid (an XML “Schema”).
    Good examples of this are Atom, RSS and FBML.
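To make the tree point concrete, here is a minimal sketch using Python’s standard `xml.etree.ElementTree` module, which parses markup into a tree of elements that can be walked recursively. The outline itself, its `title` attribute and the `outline` helper are made up for illustration; they are not part of any real format.

```python
import xml.etree.ElementTree as ET

# A hypothetical document outline, marked up by hand as XML.
# Nested <section> elements map directly onto the nesting of the outline.
OUTLINE = """<article title="XML Abuse">
  <section title="What XML is good for">
    <section title="Trees"/>
    <section title="Markup Languages"/>
  </section>
  <section title="When not to use XML"/>
</article>"""


def outline(element, depth=0):
    """Walk the element tree depth-first and return one indented line per node."""
    lines = ["  " * depth + element.get("title")]
    for child in element:  # iterates over direct children only
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(OUTLINE)
print("\n".join(outline(root)))
```

Because the markup is hand-written and mirrors the document’s structure, the tree falls out of the parser with no extra mapping code.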

You probably shouldn’t use XML:

  1. For Serializing Data (Human Readable): It’s overly verbose, and it cannot represent basic data types such as Lists, Maps, Strings and Numbers as efficiently as JSON.
  2. For Domain Specific Languages: Fuck Ant. XML is for marking up Documents. It was not designed to represent logic, and it’s overly verbose at this task too. Pick Ruby, Groovy or whatever fits your taste, as long as it’s a language meant for programming. Your users will thank you.
  3. When you are not using DTDs: Using XML without a Document Type Definition is like programming without defining any Interfaces. Your XML cannot be validated automatically, and it is very hard for others to pick up, too.
  4. When you intend to mark up traditional documents and do not need to create your own Elements. HTML is perfectly fine for this, as long as you ensure it’s valid.
  5. When you are not taking the time to think about how to represent your data in well-formed XML, and instead let it be generated automatically.
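The serialization point can be illustrated with a small Python sketch that writes the same record as JSON and as XML, using only the standard `json` and `xml.etree.ElementTree` modules. The record and the element names (`data`, `item`, and so on) are hypothetical; the XML mapping is just one arbitrary choice among many, which is part of the problem.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record using the basic types mentioned above:
# a map holding a string, a number and a list of strings.
data = {"name": "build", "retries": 3, "tags": ["ci", "nightly"]}

# JSON has native syntax for maps, lists, strings and numbers.
as_json = json.dumps(data)

# XML has no built-in notion of these types, so a mapping must be invented.
# Here: one child element per key, and one <item> element per list entry.
root = ET.Element("data")
ET.SubElement(root, "name").text = data["name"]
ET.SubElement(root, "retries").text = str(data["retries"])  # number becomes text
tags = ET.SubElement(root, "tags")
for tag in data["tags"]:
    ET.SubElement(tags, "item").text = tag
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
print(len(as_json), len(as_xml))

# Reading the data back: JSON restores the number, XML only has a string.
print(type(json.loads(as_json)["retries"]))  # <class 'int'>
print(type(ET.fromstring(as_xml).find("retries").text))  # <class 'str'>
```

In this one example the XML is noticeably longer, and any reader of it has to know the ad-hoc convention (and convert `"3"` back to a number) to recover the original record. Using attributes instead of child elements would shorten the XML, but it would still lose the types.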


Without pointing a way out of XML Hell, this article would be nothing but a rant. So here are your ways to escape:

  • Data Serialization (Human Readable): use JSON
  • Documents: HTML (valid of course)
  • Domain Specific Languages: use Ruby, Groovy or any programming language
  • Configuration Files: use YAML, .ini, Properties, Ruby, Groovy
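As a small example of the configuration-file option, Python’s standard `configparser` module reads .ini files and converts values on request. The sections and keys below are hypothetical; YAML is not shown because reading it in Python needs a third-party library.

```python
import configparser

# A hypothetical build configuration in .ini format.
CONFIG = """
[build]
source_dir = src
output_dir = build
debug = true

[test]
timeout = 30
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# Values are stored as strings; the typed getters convert them on access.
source_dir = parser.get("build", "source_dir")
debug = parser.getboolean("build", "debug")
timeout = parser.getint("test", "timeout")
print(source_dir, debug, timeout)  # src True 30
```

Compared with an XML config, there are no closing tags to keep in sync, and the file stays easy to read and edit by hand.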