
Simple XML Parsing using JAXB

I recently needed to parse an XML file using Java for a utility I was writing. A couple of years ago I used dom4j to parse XML (and wrote an article about it). I figured there had to be a more modern approach, similar to how Hibernate 3.0 can map POJOs (plain old Java objects) to relational database tables using Java annotations. To my surprise, a brief search online turned up nothing. I was sure I had heard of this being done, so I kept searching, and eventually found an article with a comment by zaeffi that enlightened me. The answer was to use JAXB 2.0.

I had already looked at JAXB, but was turned off by the apparent requirement to generate the Java classes from an XML Schema definition of the XML format. I already had my Java POJOs and did not want to be bothered to write a schema definition. It turns out that JAXB does support my use case. The online documentation, unfortunately, seems strongly biased towards web services developers, who more than likely do start with a schema definition and possibly even have tooling that hides the use of JAXB entirely. I was also misled by older documentation referring to JAXB version 1, which does not support annotations and generates (from the schema) very ugly code that is difficult to maintain. This was fixed in JAXB version 2, which does support the use of annotations (and incidentally generates much cleaner code).

Using that article as inspiration, I wrote a helper class for parsing XML into objects and vice-versa. The code for this helper class is shown below:

// Copyright 2008 by Basil Vandegriend.  All rights reserved.

package com.basilv.examples.jaxb;

import java.io.Reader;
import java.io.Writer;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.*;

import org.w3c.dom.Node;

/**
 * Tools for working with the JAXB (XML Binding) library.
 */
public class XmlBindingTools
{
  /**
   * Parse the XML supplied by the reader into the
   * corresponding tree of Java objects.
   * 
   * @param reader Cannot be null. The source of the XML.
   * @param rootElementClass Cannot be null. The type of the
   *        root element.
   * @return the Java object that is the root of the tree,
   *         of type rootElementClass.
   * @throws JAXBException if an error occurs parsing the
   *         XML.
   */
  @SuppressWarnings("unchecked")
  public static <E extends Object> E parseXML(
    Reader reader, Class<E> rootElementClass)
    throws JAXBException {

    if (rootElementClass == null) 
      throw new IllegalArgumentException("rootElementClass is null");
    if (reader == null) 
      throw new IllegalArgumentException("reader is null");

    JAXBContext context = JAXBContext.newInstance(rootElementClass);
    Unmarshaller unmarshaller = context.createUnmarshaller();

    CollectingValidationEventHandler handler = 
      new CollectingValidationEventHandler();
    unmarshaller.setEventHandler(handler);

    E object = (E) unmarshaller.unmarshal(reader);
    if (!handler.getMessages().isEmpty()) {
      String errorMessage = "XML parse errors:";
      for (String message : handler.getMessages()) {
        errorMessage += "\n" + message;
      }
      throw new JAXBException(errorMessage);
    }

    return object;
  }

  /**
   * Generate XML using the supplied root element as the
   * root of the object tree and write the resulting XML to
   * the specified writer.
   * 
   * @param rootElement Cannot be null.
   * @param writer Cannot be null.
   * @throws JAXBException if an error occurs generating the
   *         XML.
   */
  public static void generateXML(Object rootElement,
    Writer writer) throws JAXBException {

    if (rootElement == null) 
      throw new IllegalArgumentException("rootElement is null");
    if (writer == null) 
      throw new IllegalArgumentException("writer is null");

    JAXBContext context = JAXBContext.newInstance(rootElement.getClass());
    Marshaller marshaller = context.createMarshaller();
    marshaller.setProperty(
      Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
    marshaller.marshal(rootElement, writer);
  }

  private static class CollectingValidationEventHandler 
    implements ValidationEventHandler
  {
    private List<String> messages = new ArrayList<String>();

    public List<String> getMessages() {
      return messages;
    }
    
    public boolean handleEvent(ValidationEvent event) {
      if (event == null) 
        throw new IllegalArgumentException("event is null");

      // calculate the severity prefix and return value
      String severity = null;
      boolean continueParsing = false;
      switch (event.getSeverity()) {
        case ValidationEvent.WARNING:
          severity = "Warning";
          continueParsing = true; // continue after warnings
          break;
        case ValidationEvent.ERROR:
          severity = "Error";
          continueParsing = true; // continue after errors so that all of them are collected
          break;
        case ValidationEvent.FATAL_ERROR:
          severity = "Fatal error";
          continueParsing = false; // terminate after fatal errors
          break;
        default:
          assert false : "Unknown severity.";
      }

      String location = getLocationDescription(event);
      String message = severity + " parsing " + location 
        + " due to " + event.getMessage(); 
      messages.add(message);
      
      return continueParsing;
    }

    private String getLocationDescription(ValidationEvent event) {
      ValidationEventLocator locator = event.getLocator();
      if (locator == null) {
        return "XML with location unavailable";
      }

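      // StringBuilder suffices here: the buffer never leaves this method.
      StringBuilder msg = new StringBuilder();
      URL url = locator.getURL();
      Object obj = locator.getObject();
      Node node = locator.getNode();
      int line = locator.getLineNumber();

      if (url != null || line != -1) {
        // The line number is -1 when the parser could not determine it.
        msg.append(line != -1 ? "line " + line : "unknown line");
        if (url != null) msg.append(" of ").append(url);
      } else if (obj != null) {
        msg.append("obj: ").append(obj);
      } else if (node != null) {
        msg.append("node: ").append(node);
      } else {
        msg.append("XML with location unavailable");
      }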

      return msg.toString();
    }    
  }  
}

One important refinement I made to the code was to add better error handling. Parser warnings and errors are collected along with location information (such as line numbers), and the caller is notified via an exception. This was necessary because the default JAXB error handling simply wrote messages to the console and reported nothing back to the caller. Because my needs were basic, I simply collected these errors and warnings into string messages. If your error-handling requirements are more sophisticated, you can collect the errors and warnings into proper objects that are returned to the caller, who can then traverse the information and decide how to report it.
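As a rough illustration of collecting errors into "proper objects", here is a minimal sketch. All of the names in it (ParseIssueExample, ParseIssue, IssueCollector, Severity) are hypothetical and are not part of the helper class above. It deliberately avoids the JAXB types so that it stands alone. In a real handler, one would presumably build each ParseIssue inside handleEvent() from event.getSeverity(), event.getLocator().getLineNumber() and event.getMessage().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: structured parse issues instead of formatted strings.
class ParseIssueExample {

  enum Severity { WARNING, ERROR, FATAL_ERROR }

  /** One parser problem, kept as data so the caller decides how to report it. */
  static final class ParseIssue {
    final Severity severity;
    final int line; // -1 when no line number is available
    final String message;

    ParseIssue(Severity severity, int line, String message) {
      this.severity = severity;
      this.line = line;
      this.message = message;
    }

    @Override
    public String toString() {
      return severity + " at line " + line + ": " + message;
    }
  }

  /** Accumulates issues in the order they are reported. */
  static final class IssueCollector {
    private final List<ParseIssue> issues = new ArrayList<>();

    void add(ParseIssue issue) {
      issues.add(issue);
    }

    /** Read-only view, so callers cannot alter the collected issues. */
    List<ParseIssue> getIssues() {
      return Collections.unmodifiableList(issues);
    }

    /** True if anything more severe than a warning was reported. */
    boolean hasErrors() {
      for (ParseIssue issue : issues) {
        if (issue.severity != Severity.WARNING) return true;
      }
      return false;
    }
  }

  public static void main(String[] args) {
    IssueCollector collector = new IssueCollector();
    collector.add(new ParseIssue(Severity.WARNING, 3, "unexpected attribute"));
    collector.add(new ParseIssue(Severity.ERROR, 7, "unexpected element"));

    for (ParseIssue issue : collector.getIssues()) {
      System.out.println(issue);
    }
    System.out.println("hasErrors = " + collector.hasErrors());
  }
}
```

With a structure like this, the caller could choose to log warnings but fail only when hasErrors() returns true, rather than treating every message as fatal.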

The unit test for this helper class is shown below. It includes a small class, XmlTest, that is mapped to XML with JAXB annotations; the tests use it to exercise the helper class. The last test exercises (in a trivial way) the additional error-handling functionality.

// Copyright 2008 by Basil Vandegriend.  All rights reserved.

package com.basilv.examples.jaxb;

import static org.junit.Assert.*;

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

import org.junit.Test;

public class XmlBindingToolsTest
{
  @XmlRootElement(name = "xmltest")
  static class XmlTest
  {
    private String id;

    @XmlAttribute
    public String getId() {
      return id;
    }

    public void setId(String id) {
      this.id = id;
    }

  }

  @Test
  public void parseXml() throws Exception {

    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<xmltest id=\"test\"/>";

    StringReader reader = new StringReader(xml);
    XmlTest xmlTest = XmlBindingTools.parseXML(reader,
      XmlTest.class);
    assertNotNull(xmlTest);
    assertEquals("test", xmlTest.getId());
  }

  @Test
  public void generateXml() throws Exception {
    XmlTest xmlTest = new XmlTest();
    xmlTest.setId("test");
    StringWriter writer = new StringWriter();
    XmlBindingTools.generateXML(xmlTest, writer);
    assertEquals(
      "<?xml version=\"1.0\" encoding=\"UTF-8\" "
        + "standalone=\"yes\"?>\n<xmltest id=\"test\"/>\n",
      writer.getBuffer().toString());
  }

  @Test
  public void parseInvalidXml() throws Exception {

    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<xmltest id=\"test\"><fake/></xmltest>";

    StringReader reader = new StringReader(xml);
    try {
      XmlBindingTools.parseXML(reader, XmlTest.class);
      fail("Expected Exception");
    } catch (JAXBException e) {
      // Expected case.
    }
  }

}

I did encounter some surprises when using JAXB. JAXB imposes a number of coding limitations on the mapped POJOs that are quite similar to Hibernate's, but not documented as well. So as a general guideline, follow Hibernate's restrictions and you will probably do okay. One limitation I encountered involved handling a parent-child relationship between two mapped entities. The parent class will need to have a collection of child entities. The getter method that returns the collection must return the actual collection used by the class and not a copy or wrapper: JAXB appears to add children to the parent by calling the add() method on the collection returned by the getter. (See this article for why you might want to return a copy of the underlying collection.)
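The effect of that limitation can be imitated without JAXB at all. The sketch below uses hypothetical classes (LiveCollectionExample, LiveParent, CopyingParent) and simply performs by hand the call that JAXB appears to make, namely add() on whatever the getter returns. It is an illustration of the pitfall, not of JAXB's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: why a getter that returns a copy loses the children.
class LiveCollectionExample {

  /** Getter exposes the actual list, so additions reach the parent. */
  static class LiveParent {
    private final List<String> children = new ArrayList<>();

    List<String> getChildren() {
      return children;
    }
  }

  /** Getter returns a defensive copy, so additions are silently discarded. */
  static class CopyingParent {
    private final List<String> children = new ArrayList<>();

    List<String> getChildren() {
      return new ArrayList<>(children);
    }

    int childCount() {
      return children.size();
    }
  }

  public static void main(String[] args) {
    LiveParent live = new LiveParent();
    live.getChildren().add("child"); // stands in for the unmarshaller's call
    System.out.println(live.getChildren().size()); // prints 1

    CopyingParent copying = new CopyingParent();
    copying.getChildren().add("child"); // lands in the throwaway copy
    System.out.println(copying.childCount()); // prints 0
  }
}
```

The second case fails quietly: no exception is thrown, and the parent simply ends up with no children. A getter that returns an unmodifiable wrapper would at least fail loudly, since add() on it throws UnsupportedOperationException.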

JAXB shares more than just limitations with Hibernate. Many of the mapping capabilities offered by Hibernate have analogues in JAXB. JAXB can map both fields and methods, either of which may be private. JAXB also supports mapping to custom types using the @XmlJavaTypeAdapter annotation.

Despite the limitations and surprises I ran into, I was generally pleased with JAXB. It met my original goal of providing simple parsing of XML into Java POJOs using annotations to specify the mapping. My biggest disappointment with JAXB was the poor documentation. In fact, for all but trivial parsing tasks, I would recommend writing an XML Schema definition and generating the POJOs instead of doing what I did; the documentation just does not support the approach I took.

The source code listed in this article is provided in the Java Examples project which can be downloaded from the Software page.


7 Comments on “Simple XML Parsing using JAXB”

  1. cryptocore says:

    Thank you for this post. It really simplified my work & code after 2 days of trying to get jaxb to work and tons of useless java and xsd

  2. Sunil says:

    Hi,
    The reverse is also possible using JAXB, ie, you can generate JAXB Java classes from XML using an opensource tool called trang and JAXB, I have written a blog on this: http://techmindviews.blogspot.com/2010/04/xml-xsd-java-jaxb-xjc.html
    Cheers,
    Sunil.

  3. Hugo Heden says:

    Good post!

    But the blog platform has removed some of the code — everything that looks like html-tags — both xml-stuff and java-generics stuff are gone. (Look at the test code for example, the parseXml() method looks weird)

    I guess it’s still possible to figure out what you meant though.

    Thanks!

  4. @Hugo, thanks for pointing that out. I’ve fixed it all so it should be good now (and you can also download the examples).

  5. Uwe A. says:

    Extremely good post! I’m currently migrating command line options/arguments to XML. Loading job configuration as XML by reflection-aware JAXB eases this job and the sophisticated implementation of CollectingValidationEventHandler presented here even provides the opportunity to generate cmd-line error messages (regarding misformatted XML-input) in the twinkling of an eye! Thanks a lot!

  6. Bibhaw says:

    Is it possible to append a child node to an existing XML file using JAXB? This can be done using DOM, but I have failed to do that. Do you have any idea on this?

  7. @Bibhaw, I believe you would have to read in and parse the entire file into a tree of Java objects using JAXB, then add the child object you want, and then write out the entire structure back to XML. So it is doable, but not necessarily the most efficient approach.
