Which Qt XML parser are you?

Do you like getting the whole story all at once? Or are you more the kind of person who wants to hear things bit by bit, thinking about each piece as it comes and responding a little at a time?

That’s essentially the question to ask when choosing an XML parser to use in Qt. There’s QDomDocument, which parses your XML document into a fully navigable document object model (DOM) in memory that you can query, traverse, and change. Then there’s the QXmlStreamReader parser, which reads from a stream and lets you make incremental queries for elements and attributes as you go along. Finally, there’s QXmlSimpleReader, a simple SAX-style parser that can read the whole document in one go or incrementally, invoking callbacks on handler objects you register as it encounters elements and attributes.
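To make the callback style concrete, here’s a minimal SAX sketch. The items.xml layout, a flat list of <item name="..."> elements, is just an assumption for illustration:

```cpp
#include <QtDebug>
#include <QFile>
#include <QXmlSimpleReader>
#include <QXmlDefaultHandler>
#include <QXmlInputSource>

// The parser invokes these callbacks as it walks the document.
class ItemHandler : public QXmlDefaultHandler
{
public:
    bool startElement(const QString & /*namespaceURI*/,
                      const QString & /*localName*/,
                      const QString &qName,
                      const QXmlAttributes &attributes)
    {
        if (qName == QLatin1String("item"))
            qDebug() << "item:" << attributes.value(QLatin1String("name"));
        return true; // returning false aborts the parse
    }
};

bool parseWithSax(const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
        return false;

    ItemHandler handler;
    QXmlSimpleReader reader;
    reader.setContentHandler(&handler);

    QXmlInputSource source(&file);
    // One shot; pass true as a second argument to parse incrementally
    // and feed the rest of the document with parseContinue().
    return reader.parse(&source);
}
```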

Which should you use in your project?

There’s a tradeoff between the simplicity of your code and the amount of memory you’re willing to sacrifice at run-time. DOM-based parsing as provided by QDomDocument is awesome, but comes at a cost: the parser has to churn over the entire document as it creates the DOM, and essentially mirrors the document in RAM with its DOM. The streaming parser is great for larger documents, but requires that you build more structure into your parser, using that structure to mirror the hierarchy of your XML document.
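Here’s roughly what the DOM flavor looks like, again on the hypothetical <item name="..."> document from the sketch above. Note that setContent() parses everything up front; you pay for the whole tree before you touch a single node:

```cpp
#include <QtDebug>
#include <QFile>
#include <QDomDocument>
#include <QDomElement>

bool parseWithDom(const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
        return false;

    // setContent builds the entire DOM in memory in one pass.
    QDomDocument doc;
    if (!doc.setContent(&file))
        return false;

    // With the tree built, navigation is trivial: walk the sibling elements.
    QDomElement root = doc.documentElement();
    for (QDomElement item = root.firstChildElement(QLatin1String("item"));
         !item.isNull();
         item = item.nextSiblingElement(QLatin1String("item"))) {
        qDebug() << "item:" << item.attribute(QLatin1String("name"))
                 << item.text();
    }
    return true;
}
```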

For years, I’ve been an avid proponent of using stream-based parsers—readers of Dan’s and my book will remember us exhorting you to use stream-based parsers for mobile, because you can read the data as it comes over the stream and build your document representation in core as efficiently as possible. In a past life, I actually wrote a streaming XML parser for Qualcomm BREW, again adhering to the doctrine that you should hold as little in memory as possible, and process what comes in over the air (or off the wire) as quickly as possible, rather than buffering and processing. If you’re parsing a very large document, or if you’re working on very small devices (think legacy Symbian or Maemo), streaming is definitely the way to go.
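For comparison, here’s a minimal streaming sketch over the same hypothetical document. Only the current token has to live in memory, which is exactly what makes this approach friendly to small devices and to data arriving over the air:

```cpp
#include <QtDebug>
#include <QFile>
#include <QXmlStreamReader>

bool parseWithStream(const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
        return false;

    // The reader pulls one token at a time from the device.
    QXmlStreamReader xml(&file);
    while (!xml.atEnd()) {
        xml.readNext();
        if (xml.isStartElement() && xml.name() == QLatin1String("item")) {
            QString name = xml.attributes().value(QLatin1String("name")).toString();
            qDebug() << "item:" << name << xml.readElementText();
        }
    }
    return !xml.hasError();
}
```

The price of the small footprint is visible even in a toy like this: any notion of where you are in the document’s hierarchy is yours to track, not the parser’s.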

While I still think that’s the best approach for the large majority of applications, I recently had the opportunity to use a DOM parser on a commercial project, and wow, was it easy! In my case, I was parsing small documents, typically well under fifty KB apiece: walking the DOM quickly, reducing each document to a compact in-memory representation, and discarding the DOM. I could have done the same thing with a streaming parser, but here the time it took me to write the code was of the essence; we wanted to get something up and running as quickly as possible, and this was a fast way to do it.

Someday, I’d like to write a benchmark app that uses all three methods to parse a longish (say, 250 KB) document and report on both performance and heap usage. It would also be interesting to see download-and-parse times for the three approaches over a benchmark network like 3G cellular, because depending on the application and document size, network performance becomes a factor, too. Until I can point at hard numbers, though, I’d advise you to consider both the DOM and streaming approaches, and choose the one that best balances network throughput, memory usage, and the amount of time and effort it takes to get your project off the ground.