XML Parsing in Java

Quick Reference: Java-based XML Parsing 101

This page is meant to provide a first quick look at Java-based XML parsing for the ACM ITiCSE 2005 Working Group "Development of XML-based Tools to Support User Interaction with Algorithm Visualizations".

Luckily, Java by now offers relatively good support for parsing XML files. There are several good references for this topic -- see below for a brief list. One good starting point is Working with XML: The Java/XML Tutorial, offered by Sun.

Sun also offers some "code samples" for XML parsing.

Within the Working Group, we will probably use the Document Object Model (DOM) for parsing XML. Java has a set of built-in classes for handling the full parsing process, resulting in an object tree that "only" needs to be traversed.

Here is the rather straightforward code for actually parsing an XML file (test it using our demonstration file). See also the DTD file that contains the definition of the elements used in the XML file. Both files are taken from Sun's Java API for XML Code Samples:

/*
 * @(#)DomEcho01.java 1.9 98/11/10
 *
 * Copyright (c) 1998 Sun Microsystems, Inc. All Rights Reserved.
 *
 * Sun grants you ("Licensee") a non-exclusive, royalty free, license to use,
 * modify and redistribute this software in source and binary code form,
 * provided that i) this copyright notice and license appear on all copies of
 * the software; and ii) Licensee does not utilize the software in a manner
 * which is disparaging to Sun.
 *
 * This software is provided "AS IS," without a warranty of any kind. ALL
 * EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY
 * IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR
 * NON-INFRINGEMENT, ARE HEREBY EXCLUDED. SUN AND ITS LICENSORS SHALL NOT BE
 * LIABLE FOR ANY DAMAGES SUFFERED BY LICENSEE AS A RESULT OF USING, MODIFYING
 * OR DISTRIBUTING THE SOFTWARE OR ITS DERIVATIVES. IN NO EVENT WILL SUN OR ITS
 * LICENSORS BE LIABLE FOR ANY LOST REVENUE, PROFIT OR DATA, OR FOR DIRECT,
 * INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER
 * CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF THE USE OF
 * OR INABILITY TO USE SOFTWARE, EVEN IF SUN HAS BEEN ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGES.
 *
 * This software is not designed or intended for use in on-line control of
 * aircraft, air traffic, aircraft navigation or aircraft communications; or in
 * the design, construction, operation or maintenance of any nuclear
 * facility. Licensee represents and warrants that it will not use or
 * redistribute the Software for such purposes.
 */
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.io.File;
import java.io.IOException;
import org.w3c.dom.Document;
import org.w3c.dom.DOMException;

public class DomEcho01{
  // Global value so it can be ref'd by the tree-adapter
  static Document document;

  public static void main(String argv[]) {
    if (argv.length != 1) {
      System.err.println("Usage: java DomEcho filename");
      System.exit(1);
    }
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    //factory.setValidating(true);
    //factory.setNamespaceAware(true);
    try {
      DocumentBuilder builder = factory.newDocumentBuilder();
      document = builder.parse( new File(argv[0]) );
    } catch (SAXException sxe) {
      // Error generated during parsing
      Exception x = sxe;
      if (sxe.getException() != null)
        x = sxe.getException();
      x.printStackTrace();
    } catch (ParserConfigurationException pce) {
      // Parser with specified options can't be built
      pce.printStackTrace();
    } catch (IOException ioe) {
      // I/O error
      ioe.printStackTrace();
    }
  } // main
}

If we skip the long header with copyright terms and the usual import statements, what really happens here is that we retrieve a new instance of javax.xml.parsers.DocumentBuilderFactory. From this factory we obtain a new DocumentBuilder, which is then used to parse the XML file (the first invocation argument is expected to be a file name). The result is stored in the static field document. If a SAX (Simple API for XML), parser configuration, or I/O exception occurs, its stack trace is printed; otherwise, nothing further is done with the parsed tree.
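The listing above leaves factory.setValidating(true) commented out. As a minimal sketch of what switching it on might look like, the following parses XML text held in memory and collects the validation errors reported against the document's DTD. The class name ValidatingParseSketch, the method validationErrors and the tiny "note" document are made up for this page; they are not part of Sun's samples.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, written for illustration only.
final class ValidatingParseSketch {

  /** Parses the given XML text with DTD validation and returns the reported errors. */
  static List<String> validationErrors(String xml)
      throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true); // check the document against its DTD
    DocumentBuilder builder = factory.newDocumentBuilder();

    final List<String> errors = new ArrayList<String>();
    builder.setErrorHandler(new ErrorHandler() {
      @Override
      public void warning(SAXParseException e) {
        // warnings are ignored in this sketch
      }

      @Override
      public void error(SAXParseException e) {
        // validity errors are recoverable: record them and keep parsing
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
      }

      @Override
      public void fatalError(SAXParseException e) throws SAXException {
        // the document is not well-formed: give up
        throw e;
      }
    });

    builder.parse(new InputSource(new StringReader(xml)));
    return errors;
  }

  public static void main(String[] args) throws Exception {
    // Assumed sample input: an internal DTD that allows only text inside <note>.
    String dtd = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>";
    System.out.println(validationErrors(dtd + "<note>hello</note>"));
    System.out.println(validationErrors(dtd + "<note><oops/></note>"));
  }
}
```

Without a custom ErrorHandler, the builder falls back to the parser's default handling of validity errors, so they are easy to miss. Collecting them in a list is just one possible design; throwing from error() would instead abort the parse at the first violation.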

Note that the simple DomEcho01 example really includes the full code for parsing the file into an object tree! A more verbose example among Sun's XML examples includes a GUI browser for examining the parsed elements.
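Since DomEcho01 stops once the tree is built, here is a rough sketch of what traversing the result could look like without a GUI. It parses a small XML string and walks the nodes recursively, producing one indented line per element. The class name DomTraversalSketch, its methods and the sample input are assumptions made for this page, not code from Sun's samples.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, written for illustration only.
final class DomTraversalSketch {

  /** Parses XML text held in memory, using the same factory/builder steps as DomEcho01. */
  static Document parse(String xml)
      throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    return builder.parse(new InputSource(new StringReader(xml)));
  }

  /** Returns one line per element, indented by two spaces per nesting level. */
  static List<String> outline(Node node) {
    List<String> lines = new ArrayList<String>();
    collect(node, 0, lines);
    return lines;
  }

  private static void collect(Node node, int depth, List<String> lines) {
    int childDepth = depth;
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      StringBuilder line = new StringBuilder();
      for (int i = 0; i < depth; i++) {
        line.append("  ");
      }
      lines.add(line.append(node.getNodeName()).toString());
      childDepth = depth + 1;
    }
    // Text, comment and other node types are skipped, but their children
    // (if any) are still visited. The Document node itself adds no indentation.
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      collect(children.item(i), childDepth, lines);
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumed sample input, not the demonstration file mentioned on this page.
    Document doc = parse("<slideshow><slide><title>Hello</title></slide><slide/></slideshow>");
    for (String line : outline(doc)) {
      System.out.println(line);
    }
  }
}
```

To run the same traversal on a file, the Document produced by builder.parse(new File(...)) in DomEcho01 could be passed to outline directly.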

One book I personally find rather helpful for the WG's purpose is Deitel's XML How to Program.

© Dr. Guido Roessling 2015