XML-DBMS, Version 1.0

Java Packages for Transferring Data between
XML Documents and Relational Databases

Ronald Bourret
Technical University of Darmstadt

Contents

1.0 Overview
2.0 An Object View of an XML Document
3.0 Map Objects, Map Factories, and the XML-DBMS Mapping Language
    3.1 Map Factories
        3.1.1 MapFactory_MapDocument
        3.1.2 MapFactory_DTD
    3.2 The XML-DBMS Mapping Language
        3.2.1 Sample Documents and Tables
        3.2.2 Mapping Classes (Element Types) to Tables
        3.2.3 Mapping Properties (Attributes and Element Types) to Columns
        3.2.4 Mapping Inter-Class Relationships (Element Hierarchy)
        3.2.5 Eliminating Unwanted Root Elements
        3.2.6 Mapping Mixed Content
        3.2.7 Using Namespaces
        3.2.8 Handling Null Values
        3.2.9 Date, Time, and Timestamp Formats
4.0 Transferring Data to the Database
5.0 Transferring Data to an XML Document
6.0 Supported Parsers and DOM Implementations
    6.1 Namespace Support
    6.2 Document Factories
7.0 Classes Not for General Use
8.0 Downloading and Installing XML-DBMS
9.0 Samples
    9.1 Transfer
    9.2 GenerateMap
    9.3 ConvertSchema
10.0 Tips and Tricks
11.0 Licensing and Support

1.0 Overview

XML-DBMS is a set of Java packages for transferring data between XML documents and relational databases. Programmers use these packages to build systems that transfer data; a sample application can be run from the command line to transfer data between a database and an XML file.

XML-DBMS preserves the hierarchical structure of an XML document, as well as the data (character data and attribute values) in that document. If requested, it also preserves the order in which the children at a given level in the hierarchy appear. (For many data-centric applications, such order is not important and the code runs faster without it.)

Because XML-DBMS seeks to transfer data, not documents, it does not preserve document type declarations, nor does it preserve physical structure such as entity use, CDATA sections, or document encodings. In particular, it does not attempt to implement a document management system on top of a relational database.

For a general discussion of XML and databases, see XML and Databases.

2.0 An Object View of an XML Document

XML-DBMS views an XML document as a tree of objects and then uses an object-relational mapping to map these objects to a relational database. The tree of objects is not, as one might initially guess, the Document Object Model (DOM). The reason for this is that the DOM models the document itself and not the data in the document.

Instead, the tree is constructed by viewing element types as classes, and attributes and PCDATA as properties of those classes. Subordinate element types are viewed as subordinate classes in the tree; that is, an interclass relationship exists between the parent and child classes.

The view of element types as classes is not absolute: element types can also be viewed as properties of their parent element type-as-class. This is most useful when an element type contains only PCDATA. However, it is useful in other cases as well. For example, consider an element type that contains a description written in XHTML. Although this description has subelements such as <B> and <P>, these subelements cannot be meaningfully interpreted on their own and it makes more sense to view the contents of the element type as a single value (property) rather than a class.

For example, in the following XML document, the SalesOrder and Customer element types might be viewed as classes and the OrderDate and Description element types as properties:

   <Orders>
      <SalesOrder SONumber="12345">
         <Customer CustNumber="543">
            <CustName>ABC Industries</CustName>
            <Street>123 Main St.</Street>
            <City>Chicago</City>
            <State>IL</State>
            <PostCode>60609</PostCode>
         </Customer>
         <OrderDate>981215</OrderDate>
         <Line LineNumber="1">
            <Part PartNumber="123">
               <Description>
                  <P><B>Turkey wrench:</B><BR />
                  Stainless steel, one-piece construction,
                  lifetime guarantee.</P>
               </Description>
               <Price>9.95</Price>
            </Part>
            <Quantity>10</Quantity>
         </Line>
         <Line LineNumber="2">
            <Part PartNumber="456">
               <Description>
                  <P><B>Stuffing separator:</B><BR />
                  Aluminum, one-year guarantee.</P>
               </Description>
               <Price>13.27</Price>
            </Part>
            <Quantity>5</Quantity>
         </Line>
      </SalesOrder>
   </Orders>

Exactly how element types, attributes, and PCDATA are viewed is left to the user, who specifies this information, as well as how to map the object view to the database, in a Map object.
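
As a rough illustration of the object view, the following sketch classifies each element type in a small sales document as a class or a property. It assumes one simple rule (element content means class, PCDATA-only content means property) and uses only the DOM parser bundled with the JDK. The class name ObjectViewSketch and its methods are hypothetical and are not part of XML-DBMS; in XML-DBMS itself the choice is made by the user in the map, so an element type such as Description could still be declared a property.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XML-DBMS.
class ObjectViewSketch {

    /** Returns true if the element has at least one child element. */
    static boolean hasElementContent(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    /** Records "class" or "property" for e and for every element below it. */
    static void classify(Element e, Map<String, String> view) {
        view.put(e.getTagName(), hasElementContent(e) ? "class" : "property");
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                classify((Element) n, view);
            }
        }
    }

    /** Parses the XML text and returns element type name -> "class" or "property". */
    static Map<String, String> classify(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> view = new LinkedHashMap<>();
        classify(doc.getDocumentElement(), view);
        return view;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SalesOrder SONumber=\"12345\">"
                + "<Customer CustNumber=\"543\"><CustName>ABC Industries</CustName></Customer>"
                + "<OrderDate>981215</OrderDate>"
                + "</SalesOrder>";
        classify(xml).forEach((name, kind) -> System.out.println(name + " -> " + kind));
    }
}
```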

3.0 Map Objects, Map Factories, and the XML-DBMS Mapping Language

A Map object declares the object view of the element types, attributes, and PCDATA in an XML document and how this view is mapped to the database. Map objects are opaque. That is, the programmer constructs a Map object with a map factory and passes it to the data transfer classes without calling any of its methods. For example, the following code calls a user-defined function to create a Map from a map document and passes it to the class that transfers data from an XML document to the database:

   // Use a user-defined function that calls a map
   // factory to create a map. (See section 3.1.)
   map = createMap("sales.map", conn1);

   // Set the Map on the DOMToDBMS object.
   domToDBMS.setMap(map);

The Map can also be set on the constructor.

Note that a Map object can be used multiple times. For example, suppose that a program expects to store four different types of XML documents in the database. It can create the Maps for these documents at start-up, then, as it receives documents to process, pass the appropriate Map to the DOMToDBMS object.

Disclaimer: I haven't learned how Java multi-threading works, so the following may not be an issue.

Assuming it is possible for multiple threads to share the same object, multiple threads should not share the same Map. The reason for this is that a Map contains a reference to a Connection object and the data transfer classes (DBMSToDOM and DOMToDBMS) commit transactions on this object. Since Connection.commit() commits all statements open on a given Connection, a commit executed in one data transfer object will commit statements being used by all other data transfer objects sharing the same Map/Connection. This is unlikely to be the desired behavior.

3.1 Map Factories

Currently, XML-DBMS has two map factories: one to create Map objects from map documents and one to create Map objects from DTDs and schema documents.

3.1.1 MapFactory_MapDocument

The MapFactory_MapDocument class creates Map objects from map documents. It is the most commonly used map factory. For example, the following code creates a Map object from the sales.map map document:

   // Instantiate a new map factory from a database connection
   // and a SAX parser.
   factory = new MapFactory_MapDocument(conn, parser);

   // Create a Map from sales.map.
   map = factory.createMap(new InputSource(new FileReader("sales.map")));

3.1.2 MapFactory_DTD

The MapFactory_DTD class creates Map objects from DTDs and XML schema documents. This factory is designed primarily for use as a tool to help build mapping documents. For example, the following code creates a Map object from document.dtd and then serializes that Map object to a file.

   // Instantiate a new map factory and create a map.
   factory = new MapFactory_DTD();
   src = new InputSource("file://c:/java/de/tudarmstadt/ito/xmldbms/samples/document.dtd");
   map = factory.createMapFromDTD(src, MapFactory_DTD.DTD_EXTERNAL, true, null);

   // Open a FileOutputStream and serialize the Map to that stream.
   // (Backslashes in a Java string literal must be escaped.)
   mapFile = new FileOutputStream("c:\\java\\de\\tudarmstadt\\ito\\xmldbms\\samples\\document.map");
   map.serialize(mapFile);
   mapFile.close();

Maps created by MapFactory_DTD cannot be used to transfer data until the Map.setConnection method has been called to specify a JDBC Connection.

MapFactory_DTD supports DTDs in two forms: either an external subset -- that is, a stand-alone DTD file -- or an XML document containing an internal subset, reference to an external subset, or both. Currently, the only schema language it supports is DDML (Data Definition Markup Language). If you need to use another schema language, such as DCD (Document Content Description for XML), SOX (Schema for Object-Oriented XML), the W3C's XML Schema language, or XML-Data Reduced, you will need to write a conversion module similar to de.tudarmstadt.ito.schemas.converters.DDMLToDTD.

3.2 The XML-DBMS Mapping Language

The XML-DBMS mapping language is a simple, XML-based language that describes both how to construct an object view for an XML document and how to map this view to a relational schema. We will introduce the main parts of the language in a series of examples. For complete information, see the XML-DBMS DTD.

3.2.1 Sample Documents and Tables

The examples use the sales language shown in section 2.0, the document language shown in figure 1, and the tables shown in figures 2 and 3.


   Figure 1: Sample document in Document language:

   <!DOCTYPE Product SYSTEM "document.dtd">

   <Product>

   <Name>XML-DBMS</Name>

   <Developer>Ronald Bourret, Technical University of Darmstadt</Developer>

   <Summary>Java packages for transferring data between
   XML documents and relational databases</Summary>

   <Description>

   <Para>XML-DBMS is a set of Java packages for transferring data between
   XML documents and relational databases. It views the XML document as a tree
   of objects in which element types are generally viewed as classes and
   attributes and PCDATA as properties of those classes. It then uses an object-
   relational mapping to map these objects to the database. An XML-based mapping
   language is used to define the view and map it to the database.</Para>
   
   <Para>You can:</Para>
   
   <List>
   <Item><Link URL="readme.html">Read more about XML-DBMS</Link></Item>
   <Item><Link URL="XMLAndDatabases.htm">Read more about databases and XML</Link></Item>
   <Item><Link URL="xmldbms.dtd">View the mapping language DTD</Link></Item>
   <Item><Link URL="xmldbms.zip">Download XML-DBMS</Link></Item>
   </List>
   
   <Para>XML-DBMS, along with its source code, is freely available for use
   in both commercial and non-commercial settings.</Para>
   
   </Description>

   </Product>

   Figure 2: Sales data tables:

         Sales         Lines       Customers     Parts
         ----------    --------    ----------    -----------
         Number        SONumber    Number        Number
         CustNumber    Number      Name          Description
         Date          Part        Street        Price
                       Quantity    City
                                   State
                                   Country
                                   PostalCode

   Figure 3: Document data tables:

         Product           Description         Para               ParaPCDATA
         --------------    ----------------    ---------------    ---------------
         ProductID         ProductID           DescriptionID      ParaID
         ProductOrder      DescriptionID       ParaID             ParaPCDATA
         Name              DescriptionOrder    ParaOrder          ParaPCDATAOrder
         NameOrder
         Developer
         DeveloperOrder
         Summary
         SummaryOrder

         List              Item                ItemPCDATA         Link
         --------------    ----------------    ---------------    ---------------
         DescriptionID     ListID              ItemID             ParaID
         ListID            ItemID              ItemPCDATA         ItemID
         ListOrder         ItemOrder           ItemPCDATAOrder    LinkOrder
                                                                  URL
                                                                  LinkPCDATA

3.2.2 Mapping Classes (Element Types) to Tables

Element types with element content are usually viewed as classes and mapped to a table. For example, the following declares the SalesOrder element type to be a class and maps it to the Sales table:

   <ClassMap>
      <ElementType Name="SalesOrder"/>
      <ToClassTable>
         <Table Name="Sales"/>
      </ToClassTable>
      ...property maps...
      ...related class maps...
      ...pass-through maps...
   </ClassMap>

The ClassMap element contains all of the information needed to map a single class (element type), including the table to which the class is mapped, the maps for each property in the class, a list of related classes, and a list of passed-through child classes.

The ElementType element identifies the element type (class) being mapped and the ToClassTable element gives the name of the table to which the class is mapped.

3.2.3 Mapping Properties (Attributes and Element Types) to Columns

Single-valued attributes and element types with PCDATA-only content are usually viewed as properties and mapped to columns. For example, the following declares the SONumber attribute and the OrderDate element type (when SalesOrder is its parent) to be properties and maps them to the Number and Date columns, respectively. These maps are nested inside the class map for SalesOrder.

   <PropertyMap>
      <Attribute Name="SONumber"/>
      <ToColumn>
         <Column Name="Number"/>
      </ToColumn>
   </PropertyMap>

   <PropertyMap>
      <ElementType Name="OrderDate"/>
      <ToColumn>
         <Column Name="Date"/>
      </ToColumn>
   </PropertyMap>

The Attribute and ElementType elements identify the properties being mapped and the ToColumn elements state that they are being mapped to columns. These columns are understood to be in the table to which the class (SalesOrder) is mapped.

3.2.4 Mapping Inter-Class Relationships (Element Hierarchy)

When a child element type is viewed as a class, its relationship with its parent element type must be stated in the map of the parent class. For example, the following declares that Line is related to the SalesOrder class. This map is nested inside the class map for SalesOrder; the actual mapping of the Line class is separate.

   <RelatedClass KeyInParentTable="Candidate">
      <ElementType Name="Line"/>
      <CandidateKey Generate="No">
         <Column Name="Number"/>
      </CandidateKey>
      <ForeignKey>
         <Column Name="SONumber"/>
      </ForeignKey>
      <OrderColumn Name="Number" Generate="No"/>
   </RelatedClass>

The ElementType element gives the name of the related class and the KeyInParentTable attribute states that the candidate key used to join the tables is in the parent (Sales) table. CandidateKey and ForeignKey give the columns in these keys, which must match in number and type. The Generate attribute of CandidateKey tells the system whether to generate the key. This allows us to preserve keys that have business meaning and generate object identifiers when no such keys exist. In this case, we do not generate the key because we have mapped the SONumber attribute of the SalesOrder element type to the candidate key column (Sales.Number).

The (optional) OrderColumn element gives the name of the column that contains information about the order in which Line elements appear in the SalesOrder element. Because this column must appear in the table on the "many" side of the relationship, Number refers to the Lines.Number column, not the Sales.Number column. The Generate attribute of the OrderColumn element tells the system whether to generate the order value. In this case, we do not generate the order value because we will separately map the LineNumber attribute of the Line element type to the order column (Lines.Number).

3.2.5 Eliminating Unwanted Root Elements

Root elements sometimes exist only because XML requires a single root element. For example, in our sales order language, we would like to store multiple sales orders in a single document. To do this, we need the Orders element to encapsulate multiple SalesOrder elements. However, there is no structure in the database corresponding to the Orders element and we would like to eliminate it. For example, the following states that the Orders element type is to be ignored.

   <IgnoreRoot>
      <ElementType Name="Orders"/>
      <PseudoRoot>
         <ElementType Name="SalesOrder"/>
         <CandidateKey Generate="No">
            <Column Name="Number"/>
         </CandidateKey>
      </PseudoRoot>
   </IgnoreRoot>

The first ElementType element gives the element type to be ignored. The PseudoRoot element introduces an element type (SalesOrder) to serve as a root in its place; there can be multiple pseudo-roots. The (optional) CandidateKey element gives the key to be used when retrieving data from the database; not shown is an optional OrderColumn element that gives the order in which the SalesOrder elements are to be retrieved.

Ignored root elements are reconstructed when retrieving data from the database.

3.2.6 Mapping Mixed Content

Mixed content consists of both PCDATA and elements, such as in our document language. The order in which the PCDATA and elements appear is usually important, so we usually need to keep order information for the PCDATA as well as each element. For example, the following maps the Name element type to the Name column in the Product table and stores system-generated order information in the NameOrder column; this map is nested inside the class map for the Product element type.

   <PropertyMap>
      <ElementType Name="Name"/>
      <ToColumn>
         <Column Name="Name"/>
      </ToColumn>
      <OrderColumn Name="NameOrder" Generate="Yes"/>
   </PropertyMap>

Because PCDATA can occur multiple times in mixed content, it is usually mapped to a separate table. For example, the following maps the PCDATA from the Para element type to the ParaPCDATA table; this map is nested inside the class map for the Para element type.

   <PropertyMap>
      <PCDATA/>
      <ToPropertyTable KeyInParentTable="Candidate">
         <Table Name="ParaPCDATA"/>
         <CandidateKey Generate="Yes">
            <Column Name="ParaID"/>
         </CandidateKey>
         <ForeignKey>
            <Column Name="ParaID"/>
         </ForeignKey>
         <Column Name="ParaPCDATA"/>
         <OrderColumn Name="ParaPCDATAOrder" Generate="Yes"/>
      </ToPropertyTable>
   </PropertyMap>

The ToPropertyTable element states that the table contains only property values, not a class. In addition to giving the candidate and foreign keys needed to retrieve PCDATA values from the table, we give the names of the columns (ParaPCDATA and ParaPCDATAOrder) in which the PCDATA and order information are stored. Notice that we ask the system to generate both the candidate key (ParaID) and the order information; this is because the document does not contain this information.
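
To see why PCDATA in mixed content needs its own table and order column, the following sketch walks the children of a single Para element and lists each one with a generated order value. It uses only the JDK's DOM parser; the class name MixedContentSketch and the "order|kind|value" row format are assumptions made for illustration and do not reflect how XML-DBMS stores rows internally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XML-DBMS.
class MixedContentSketch {

    /** Lists each text or element child of the root as "order|kind|value". */
    static List<String> rows(String xml) throws Exception {
        Element parent = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> rows = new ArrayList<>();
        int order = 1; // generated order value, shared by PCDATA and child elements
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                rows.add(order++ + "|PCDATA|" + n.getNodeValue());
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                rows.add(order++ + "|" + n.getNodeName() + "|" + n.getTextContent());
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String para = "<Para>See <Link URL=\"readme.html\">the readme</Link> for details.</Para>";
        for (String row : rows(para)) {
            System.out.println(row);
        }
    }
}
```

The single Para element yields two separate PCDATA values, one on each side of the Link element, which is why a single column in the Para table would not be enough.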

As you may have noticed, the document language requires more tables and more columns per property than the sales order language. This is because the document language is an example of a document-centric language, while the sales language is an example of a data-centric language.

Document-centric languages are used to create documents for human consumption, such as books, email, and advertisements. They are characterized by less predictable structures, coarser-grained data, and large amounts of mixed content, and the order in which sibling elements and PCDATA occur is usually significant. Because order is usually significant and element types-as-properties and PCDATA can generally occur multiple times in their parent (thus requiring storage in separate tables), document-centric languages require a more complex structure in the database.

Data-centric languages tend to describe discrete pieces of data and are typically used to transfer data between applications and data stores. They are characterized by fairly regular structure, fine-grained data (the smallest independent unit of data is usually at the attribute or PCDATA-only element level), and little or no mixed content. The order in which sibling elements and PCDATA occur is usually not significant. Because of their regular structure and the unimportance of order, data-centric languages require a less complex structure in the database.

Although XML-DBMS and relational databases can be used to store documents written in document-centric languages, they are better suited to storing the regular structure encountered in documents written in data-centric languages.

3.2.7 Using Namespaces

Namespaces are supported through Namespace elements, which declare the prefixes and URIs used in the Name attributes of ElementType and Attribute elements. (Note that these prefixes are separate from those declared with xmlns attributes.) For example, suppose the sales order language has a namespace URI of http://ito.tu-darmstadt.de/xmldbms/sales. The map document might contain the following Namespace element, which states that the sales prefix is used in the map document to identify element types and attributes from this namespace.

   <Namespace Prefix="sales" URI="http://ito.tu-darmstadt.de/xmldbms/sales"/>

Thus, when mapping the SalesOrder element type, the following reference is used:

   <ElementType Name="sales:SalesOrder"/>

As with namespaces in XML documents, unprefixed attribute names referenced in the Name attribute of the Attribute element type do not belong to any XML namespace. (For those of you who are confused by this statement, remember that such attribute names must be unique within their element type; this is a requirement imposed by the XML specification and has nothing to do with XML namespaces.) For example, in the following class map, the SONumber attribute is assumed to belong to the SalesOrder element type; it does not belong to any XML namespace.

   <ClassMap>
      <ElementType Name="sales:SalesOrder"/>
      <ToClassTable>
         <Table Name="Sales"/>
      </ToClassTable>
      <PropertyMap>
         <Attribute Name="SONumber"/>
         <ToColumn>
            <Column Name="Number"/>
         </ToColumn>
      </PropertyMap>
   </ClassMap>

Prefixes used in the map document do not need to match those used in instance documents. All that is important is that the namespace URIs are the same. Currently, Namespace elements do not support empty prefixes; that is, you cannot declare a namespace URI that will be associated with unprefixed element type and attribute names in the map document.
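
The following sketch shows one way prefixed names in a map document could be resolved against declared Namespace elements. It is an illustration only: the class MapNameResolver is hypothetical, and the "URI^localName" result simply borrows the qualified-name form defined in section 6.1. Because the result depends only on the URI, an instance document that binds a different prefix to the same URI would produce the same qualified name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of XML-DBMS.
class MapNameResolver {

    private final Map<String, String> urisByPrefix = new HashMap<>();

    /** Records a declaration equivalent to <Namespace Prefix="..." URI="..."/>. */
    void declare(String prefix, String uri) {
        urisByPrefix.put(prefix, uri);
    }

    /** Resolves "prefix:local" to "URI^local"; unprefixed names are returned unchanged. */
    String resolve(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name; // no prefix, so no namespace
        }
        String prefix = name.substring(0, colon);
        String uri = urisByPrefix.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Undeclared prefix: " + prefix);
        }
        return uri + "^" + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        MapNameResolver resolver = new MapNameResolver();
        resolver.declare("sales", "http://ito.tu-darmstadt.de/xmldbms/sales");
        System.out.println(resolver.resolve("sales:SalesOrder"));
        System.out.println(resolver.resolve("SONumber"));
    }
}
```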

Whether a document using namespaces can actually be processed depends on the DOM implementation being used. For more information, see section 6.1, "Namespace Support".

3.2.8 Handling Null Values

A null value is a value that simply isn't there. This is very different from a value of 0 (for numbers) or zero length (for a string). For example, suppose you have data collected from a weather station. If the thermometer isn't working, a null value is stored in the database rather than a 0, which would mean something different altogether.

XML also supports the concept of null data through optional element types and attributes. If the value of an optional element type or attribute is null, it simply isn't included in the document. As with databases, empty elements or attributes containing zero length strings are not null: their value is a zero-length string.

In spite of this definition of null values, it is quite likely that XML documents will use empty (zero-length) strings to represent null values. Because of this, the EmptyStringIsNull element can be used to state how empty strings are treated. If it is present, empty strings are treated in the same way as null values. If it is absent, empty strings are treated as strings. For example, the following states that empty strings should be treated as nulls.

   <EmptyStringIsNull/>

The EmptyStringIsNull element is nested inside the Options element. Note that it applies only to element types and attributes mapped as properties. An empty element-as-class with no attributes results in a row of all NULLs in the database.
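
The rule can be summarized in a few lines of Java. This is a minimal sketch of the behavior described above, not the XML-DBMS implementation; the class NullHandlingSketch and the method toColumnValue are hypothetical names, and a Java null stands in for a SQL NULL.

```java
// Hypothetical helper, not part of XML-DBMS.
class NullHandlingSketch {

    /**
     * Returns the value to store in a column, or null to store SQL NULL.
     * A null argument models an element or attribute that is absent from the document.
     */
    static String toColumnValue(String value, boolean emptyStringIsNull) {
        if (value == null) {
            return null; // absent property: always NULL
        }
        if (emptyStringIsNull && value.isEmpty()) {
            return null; // EmptyStringIsNull present: treat "" as NULL
        }
        return value;    // otherwise store the string, even if it is empty
    }

    public static void main(String[] args) {
        System.out.println(toColumnValue("", true));          // prints "null"
        System.out.println(toColumnValue("", false).length()); // prints "0"
        System.out.println(toColumnValue("21.5", true));       // prints "21.5"
    }
}
```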

3.2.9 Date, Time, and Timestamp Formats

Because XML documents are international, it is likely that you will encounter a variety of date, time, and timestamp formats. You can specify the formats to use with the DateTimeFormats element, which contains an optional Locale element and a Patterns element that specifies the actual formatting patterns to use. For example, the following specifies that dates use the "dd.MM.yy" format (e.g. 29.10.58), times use the "HH:mm" format (e.g. 18:37), and timestamps use the "MMM d, yyyy h:mm a" format (e.g. Feb 9, 1962 6:35 AM).

   <DateTimeFormats>
      <Patterns Date="dd.MM.yy" Time="HH:mm" Timestamp="MMM d, yyyy h:mm a"/>
   </DateTimeFormats>

Like EmptyStringIsNull, the DateTimeFormats element is nested inside the Options element. The formats used are defined in the java.text.DateFormat and java.text.SimpleDateFormat classes.
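
To check what a pattern produces before putting it in a map document, you can try it directly with java.text.SimpleDateFormat. The sketch below formats two fixed dates with the patterns "dd.MM.yy", "HH:mm", and "MMM d, yyyy h:mm a". The class name DateTimeFormatsSketch is hypothetical, and the use of Locale.US is an assumption made so that month and AM/PM names come out in English.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper, not part of XML-DBMS.
class DateTimeFormatsSketch {

    /** Formats a value with the given SimpleDateFormat pattern in the US locale. */
    static String format(String pattern, Date value) {
        return new SimpleDateFormat(pattern, Locale.US).format(value);
    }

    public static void main(String[] args) {
        // Both dates are built and formatted in the default time zone.
        Date first = new GregorianCalendar(1958, Calendar.OCTOBER, 29, 18, 37).getTime();
        Date second = new GregorianCalendar(1962, Calendar.FEBRUARY, 9, 6, 35).getTime();

        System.out.println(format("dd.MM.yy", first));            // 29.10.58
        System.out.println(format("HH:mm", first));               // 18:37
        System.out.println(format("MMM d, yyyy h:mm a", second)); // Feb 9, 1962 6:35 AM
    }
}
```

Note that "MMM" gives the abbreviated month name; a pattern with "MMMM" would give the full name.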

4.0 Transferring Data to the Database

The DOMToDBMS class transfers data from a DOM tree to the database according to a given Map. For example, the following code transfers data from the sales_in.xml document to the database according to the Map object created from sales.map:

   // Use a user-defined function to create a map.
   map = createMap("sales.map", conn1);

   // Use a user-defined function to create a DOM tree over sales_in.xml
   doc = openDocument("sales_in.xml");

   // Create a new DOMToDBMS object and store the data.
   domToDBMS = new DOMToDBMS(map);
   docInfo = domToDBMS.storeDocument(doc);

Information about how to retrieve the data at a later point in time is returned in a DocumentInfo object, which is just a list of table names, key column names, key values, and order column names.

If DOMToDBMS needs to generate key values, as in our document example, the caller must provide an object that implements the KeyGenerator interface. DOMToDBMS calls a method on this object to get unique key values; a default implementation of this object (KeyGeneratorImpl), which generates unique 4-byte integers, can be found in the de.tudarmstadt.ito.xmldbms.helpers package. For example:

   // Use a user-defined function to create a map.
   Map map = createMap("document.map", conn1);

   // Use a user-defined function to create a DOM tree over document_in.xml
   doc = openDocument("document_in.xml");

   // Instantiate KeyGeneratorImpl and initialize it with a Connection.
   keyGenerator = new KeyGeneratorImpl();
   keyGenerator.initialize(conn2);

   // Create a new DOMToDBMS object and set the KeyGenerator.
   domToDBMS = new DOMToDBMS(map);
   domToDBMS.setKeyGenerator(keyGenerator);

   // Store the data.
   docInfo = domToDBMS.storeDocument(doc);

The KeyGenerator can also be set on the constructor.

Note that the KeyGeneratorImpl object and the DOMToDBMS object use different connections to the same database. This is because each commits transactions at different times and using the same connection for both objects would lead to statements being committed prematurely.

5.0 Transferring Data to an XML Document

The DBMSToDOM class transfers data from the database to a DOM tree according to a given Map. For example, the following code transfers data for sales order number 123 from the Sales table to the sales_out.xml document according to the Map object created from sales.map:

   // Use a user-defined function to create a map.
   map = createMap("sales.map", conn);

   // Create a new DBMSToDOM object.
   dbmsToDOM = new DBMSToDOM(map, new DF_Oracle());

   // Create a key and retrieve the data.
   Object[] key = {new Integer(123)};
   doc = dbmsToDOM.retrieveDocument("Sales", key);

Note that the DBMSToDOM object is created with a DocumentFactory (DF_Oracle) that can create Documents for Oracle's implementation of the DOM. For more information, see section 6.2, "Document Factories".

The DBMSToDOM class has four different retrieveDocument methods. In addition to the method shown above, there are methods that accept arrays of tables and keys, a DocumentInfo object, and a ResultSet object as arguments. In the last case, the Map object must map an element type to the table named "Result Set".

If the data specified by the parameters of a retrieveDocument method contains more than one row, the Map object must specify an ignored root type. Otherwise, DBMSToDOM will attempt to add more than one root element to the document, resulting in a DOMException. (Note that this does not include rows of data retrieved from subordinate tables.)
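
The single-root restriction comes from the DOM itself and can be reproduced without XML-DBMS. The following sketch uses the DOM implementation bundled with the JDK (an assumption; the implementations listed in section 6.0 may report the error differently) to show that appending a second root element fails, while wrapping both elements in one root succeeds. The class name SingleRootSketch is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of XML-DBMS.
class SingleRootSketch {

    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    /** Tries to append two root elements; returns the DOMException code, or -1 if none. */
    static short appendTwoRoots() throws Exception {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("SalesOrder"));
        try {
            doc.appendChild(doc.createElement("SalesOrder")); // second root element
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /** Builds one wrapper root (like the ignored Orders root) holding two SalesOrder elements. */
    static Document wrapInRoot() throws Exception {
        Document doc = newDocument();
        Element orders = doc.createElement("Orders");
        doc.appendChild(orders);
        orders.appendChild(doc.createElement("SalesOrder"));
        orders.appendChild(doc.createElement("SalesOrder"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(appendTwoRoots() == DOMException.HIERARCHY_REQUEST_ERR);
        System.out.println(wrapInRoot().getDocumentElement().getChildNodes().getLength());
    }
}
```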

6.0 Supported Parsers and DOM Implementations

XML-DBMS is written in a parser- and DOM-neutral fashion and should be able to use any parser that supports SAX and any Java implementation of the DOM. Unfortunately, there are no standard ways to support namespaces in the DOM, nor are there standard ways to create empty DOM documents. Thus, these capabilities are encapsulated in the NameQualifier and DocumentFactory interfaces in the de.tudarmstadt.ito.domutils package.

6.1 Namespace Support

The DOM specification does not define how namespaces are supported. Thus, some DOM implementations have defined methods for retrieving various information about the namespace used by a given Node. We have encapsulated a subset of this information in the NameQualifier interface. This interface uses the following definitions:

Local name
The unprefixed name of a node.
Prefixed name
The prefixed name of a node. If there is no namespace URI, the prefixed name is the same as the local name.
Qualified name
The namespace URI, plus a caret (^), plus the local name. If there is no namespace URI, the qualified name is the same as the local name.

For example:

   <foo:element1 xmlns:foo="http://foo">
   Local name: "element1"
   Prefixed name: "foo:element1"
   Qualified name: "http://foo^element1"

   <element2>
   Local name: "element2"
   Prefixed name: "element2"
   Qualified name: "element2"
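
The three names can be computed as in the following sketch. It does not use the NameQualifier interface; instead it assumes a namespace-aware JAXP parser and the DOM Level 2 methods getLocalName() and getNamespaceURI(), as found in the JDK. The class name QualifiedNameSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XML-DBMS.
class QualifiedNameSketch {

    /** Parses the XML text with namespace support and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns "URI^localName", or just the local name if there is no namespace URI. */
    static String qualifiedName(Element e) {
        String uri = e.getNamespaceURI();
        return (uri == null) ? e.getLocalName() : uri + "^" + e.getLocalName();
    }

    public static void main(String[] args) throws Exception {
        Element e1 = parseRoot("<foo:element1 xmlns:foo=\"http://foo\"/>");
        System.out.println(e1.getLocalName());  // element1
        System.out.println(e1.getNodeName());   // foo:element1
        System.out.println(qualifiedName(e1));  // http://foo^element1

        Element e2 = parseRoot("<element2/>");
        System.out.println(qualifiedName(e2));  // element2
    }
}
```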

To use namespaces, the DOM implementation must support namespaces and the Map object must declare the namespace URI (if any) of each mapped element type and attribute (see section 3.2.7, "Using Namespaces"). If a DOM implementation does not support namespaces, then the element type and attribute names in the Map object must exactly match the names returned by the DOM implementation's Node.getNodeName() method. Usually, this will be the prefixed name.

When transferring data from an XML document to the database, the caller must pass an object that implements the NameQualifier interface to the DOMToDBMS object. For example, the following code passes a NameQualifier for Oracle's DOM implementation:

   domToDBMS.setNameQualifier(new NQ_Oracle());

The NameQualifier may also be set in the constructor. No NameQualifier is needed if neither the XML document nor the Map uses namespaces. The de.tudarmstadt.ito.domutils package includes implementations of NameQualifier for DataChannel (Microsoft), IBM, Oracle, and Sun. As of this writing, Docuverse and OpenXML do not support namespaces. However, you should check whether a newer version of either implementation does; implementing NameQualifier yourself is trivial.

When transferring data from the database to an XML document, the caller must choose how namespaces will be used. Currently, no DOM implementations support setting the namespace or prefix of an element or attribute. Thus, the caller can choose whether element and attribute names are prefixed according to the namespace prefixes in the Map or no prefixes are used at all.

Prefixing the element and attribute names in the returned DOM tree is useful if the DOM tree is to be serialized as XML. However, it will probably cause problems if the DOM tree is to be used directly. The reason for this is that the DOM implementation will not correctly recognize and return the unprefixed name, the namespace URI, or the qualified name. By default, prefixes are not used. The following code shows how to request that prefixes be used:

   dbmsToDOM.usePrefixes(true);
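
The problem with using a prefixed DOM tree directly can be seen in a small experiment. The sketch below does not use XML-DBMS at all; it assumes the DOM bundled with a current JDK and a hypothetical PrefixDemo class, and simply creates an element whose name already contains a prefix, which is the only option when the namespace cannot be set.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo class; not part of XML-DBMS. */
final class PrefixDemo {
    private PrefixDemo() {}

    /** Creates an element the DOM Level 1 way, with the prefix baked into its name. */
    static Element createPrefixed(String prefixedName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            return doc.createElement(prefixedName);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element element = createPrefixed("foo:element1");
        System.out.println("Node name:     " + element.getNodeName());
        System.out.println("Local name:    " + element.getLocalName());
        System.out.println("Namespace URI: " + element.getNamespaceURI());
    }
}
```

The node name comes back with its prefix, but getLocalName() and getNamespaceURI() both return null, because the DOM has no way of knowing that "foo" was meant as a namespace prefix.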

6.2 Document Factories

As with namespace support, there is no standard way to create an empty DOM Document. Thus, we have encapsulated this functionality in the DocumentFactory interface. When transferring data from the database to an XML document, an object implementing this interface must be passed to the DBMSToDOM object. For example, the following code uses the DocumentFactory for Oracle:

   dbmsToDOM.setDocumentFactory(new DF_Oracle());

The DocumentFactory may also be set in the constructor. The de.tudarmstadt.ito.domutils package contains implementations of DocumentFactory for the DataChannel (Microsoft), Docuverse, IBM, OpenXML, Oracle, and Sun DOM implementations. Be sure to check that these implementations match the version of the implementation you are using. If not, you may need to implement DocumentFactory yourself; doing so is trivial.
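
As a rough idea of the work involved, the following sketch creates an empty Document using the JAXP API that ships with current JDKs (an assumption; it is not one of the implementations listed above). The DocumentFactorySketch interface and its createDocument method are hypothetical stand-ins declared only for this example; the real DocumentFactory interface may use a different method name or declare a checked exception.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

/** Hypothetical stand-in; the real DocumentFactory interface may look different. */
interface DocumentFactorySketch {
    Document createDocument();
}

/** Creates empty Documents through JAXP; the class name is an assumption. */
final class DF_Jaxp implements DocumentFactorySketch {
    public Document createDocument() {
        try {
            // newDocument() returns a Document with no root element.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No JAXP DOM implementation available", e);
        }
    }
}
```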

7.0 Classes Not for General Use

The de.tudarmstadt.ito.xmldbms package contains a number of public classes that are not for general use. That is, programmers using XML-DBMS do not need to instantiate or call methods on these classes. These classes are used to map XML document structures to database structures and are public so that they can be used by map factories, which are in a different package.

The not-for-general-use mapping classes are:

   ClassMap
   Column
   ColumnMap
   LinkInfo
   MapOptions
   OrderInfo
   PropertyMap
   RelatedClassMap
   RootClassMap
   RootTableMap
   Table
   TableMap

A special case is the Map class. For programmers using XML-DBMS, this is generally treated as an opaque object. That is, the programmer gets a Map object from a map factory and passes it to DOMToDBMS or DBMSToDOM. In addition, the Map object has public methods that some (but not all) XML-DBMS programmers use, such as methods to serialize the map to an OutputStream and to get CREATE TABLE statements. Although many variables in the Map class are public, programmers should never need to access them.

It is possible for programmers to directly create objects in the mapping classes, but it is strongly recommended that a map factory be used instead. Note that DOMToDBMS and DBMSToDOM largely assume that the objects in these classes have been created correctly, so using incorrectly constructed objects has unpredictable results. However, should a programmer be brave (foolish?) enough to construct these objects by hand, a slightly simplified hierarchy of them is as follows:

   Map
      Table (array of)
      TableMap (array of)
      RootClassMap (hashtable of)
         ClassMap
            PropertyMap (hashtable of)
            RelatedClassMap (hashtable of)
               ClassMap...
               LinkInfo
               OrderInfo
         LinkInfo
         OrderInfo
      RootTableMap (hashtable of)
         TableMap
            Table
               Column (array of)
            ColumnMap (array of)
               Column
            TableMap... (array of)

8.0 Downloading and Installing XML-DBMS

You can download the current version of XML-DBMS from here.

To install, unzip the downloaded file and add xmldbms.jar to your CLASSPATH.
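
A quick way to confirm that the CLASSPATH is set correctly is to try loading one of the XML-DBMS classes. The sketch below assumes that DOMToDBMS lives in the de.tudarmstadt.ito.xmldbms package; the ClasspathCheck class itself is hypothetical and not part of the distribution.

```java
/** Hypothetical helper for checking whether a class can be found on the CLASSPATH. */
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed fully qualified name of the DOMToDBMS class.
        String name = "de.tudarmstadt.ito.xmldbms.DOMToDBMS";
        System.out.println(name + (isOnClasspath(name) ? " found" : " not found; check your CLASSPATH"));
    }
}
```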

XML-DBMS has been used with JDK versions 1.1.8 and 1.2 and a number of different databases and JDBC drivers.

9.0 Samples

XML-DBMS comes with three samples, Transfer, GenerateMap, and ConvertSchema, which can be found in the samples subdirectory.

9.1 Transfer

Transfer is a simple command-line application that transfers data between an XML file and the database according to a particular map document. It shows how to use the MapFactory_MapDocument, DOMToDBMS, DBMSToDOM, and Map classes.

When transferring data from an XML document to the database, use the command:

   java Transfer -todbms <map-file> <xml-file>

For example, to transfer data from the sample file document_in.xml to the database according to the map document document.map, use the command:

   java Transfer -todbms document.map document_in.xml

When transferring data from the database to an XML document, use the command:

   java Transfer -toxml <map-file> <xml-file> <table-name> <key-value>...

where <key-value> is one or more values in a single key. (There are multiple values only if the key is multi-part.) For example, to transfer data for sales order number 123 from the Sales table to the file sales_out.xml according to the map document sales.map, use the command:

   java Transfer -toxml sales.map sales_out.xml Sales 123

The Transfer application requires an ODBC data source named "xmldbms", an ODBC driver for that database, and that the tables referred to in the map exist in that database. Furthermore, if the map specifies that the system generate key values, the table XMLDBMSKey must exist; for more information, see de.tudarmstadt.ito.xmldbms.helpers.KeyGeneratorImpl.
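
For programmers who want to accept the same two command forms in their own tools, the argument handling might look something like the sketch below, with -todbms for the XML-to-database direction and -toxml for the database-to-XML direction, as in the sales example. This is not the Transfer sample's own code; the TransferArgs class and its fields are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical parser for the two Transfer command forms. */
final class TransferArgs {
    final boolean toDbms;
    final String mapFile;
    final String xmlFile;
    final String tableName;   // null when transferring to the database
    final String[] keyValues; // empty when transferring to the database

    private TransferArgs(boolean toDbms, String mapFile, String xmlFile,
                         String tableName, String[] keyValues) {
        this.toDbms = toDbms;
        this.mapFile = mapFile;
        this.xmlFile = xmlFile;
        this.tableName = tableName;
        this.keyValues = keyValues;
    }

    static TransferArgs parse(String... args) {
        if (args.length >= 1 && args[0].equals("-todbms")) {
            if (args.length != 3) {
                throw new IllegalArgumentException("usage: -todbms <map-file> <xml-file>");
            }
            return new TransferArgs(true, args[1], args[2], null, new String[0]);
        }
        if (args.length >= 1 && args[0].equals("-toxml")) {
            // At least one key value is required; a multi-part key supplies several.
            if (args.length < 5) {
                throw new IllegalArgumentException(
                        "usage: -toxml <map-file> <xml-file> <table-name> <key-value>...");
            }
            String[] keys = Arrays.copyOfRange(args, 4, args.length);
            return new TransferArgs(false, args[1], args[2], args[3], keys);
        }
        throw new IllegalArgumentException("expected -todbms or -toxml");
    }
}
```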

9.2 GenerateMap

GenerateMap is a simple command-line application that generates a map and a set of CREATE TABLE statements from a DTD, an XML document containing or referring to a DTD, or a DDML schema document. The map is saved in a document with the .map extension and the CREATE TABLE statements are saved in a document with the .sql extension. It shows how to use the MapFactory_DTD and Map classes.

To run GenerateMap, use the command:

   java GenerateMap <DTD or XML document>

For example, to generate a map from the document.dtd DTD, use the command:

   java GenerateMap document.dtd

The GenerateMap application requires an ODBC data source named "xmldbms" and an ODBC driver for that database. It does not require that the database contain any tables -- it simply needs to retrieve information from the database about how to construct the CREATE TABLE statements.

9.3 ConvertSchema

ConvertSchema is a simple command-line application that converts schema documents to DTDs and vice versa. Currently, only DDML-to-DTD and DTD-to-DDML are supported, but writing converters to other schema languages is relatively easy -- half a day to a day per direction. Although this sample has nothing to do with databases, it does show the capabilities of some of the schema conversion classes (SubsetToDTD, DDMLToDTD, DTDToDDML, and DTD), which might be useful in other applications.

To convert a schema document to a DTD or vice versa, use the command:

   java ConvertSchema <schema-file>

For example, to convert the document.ddm DDML document to a DTD, use the command:

   java ConvertSchema document.ddm

To convert the document.dtd DTD to a DDML document, use the command:

   java ConvertSchema document.dtd

10.0 Tips and Tricks

Here is a short list of ways that might help your code run faster:

11.0 Licensing and Support

XML-DBMS, along with its source code, is freely available for use in both commercial and non-commercial settings. It is not copyrighted and has absolutely no warranty. Use it as you will.

Although I am no longer at the Technical University of Darmstadt, you can still get limited support by emailing me at rpbourret@aol.com or rpbourret@hotmail.com. Because I will be travelling and not in regular contact with email, expect a one- to two-week delay. Bug reports and comments are welcome. There is also a list of known bugs and suggested enhancements.

Thanks to all those who have given me feedback and sent in bug reports. Special thanks to Richard Sullivan, Matthias Pfisterer, and Alf Hogemark for their helpful comments, suggestions for new features, code, and testing.