I originally tried to write DOMToDBMS as a SAX application but concluded that it was impossible. However, this conclusion was based on the assumption that the class did no caching of data and, in this case, it was impossible. However, if some data is cached as Row objects, it is quite possible that SAXToDBMS is possible. The problem that needs to be solved is as follows. Consider the sample Sales Order document. This has data for four tables: Sales, Customers, Lines, and Parts. Now consider the key relationships: Sales has a foreign key that points to Customers and a primary key that points to Lines, and Lines has a foreign key that points to Parts. The trick is that, to meet referential integrity constraints in the database, you need to add the rows with primary keys first. Thus, you need to add the Customers row before you can add the Sales row, you need to add the Sales row before you can add the Lines row, and you need to add the Parts row before you can add the Lines row. The tree-walking algorithm I use is as follows. Start from the root. When you encounter a node mapped to a table, such as the SalesOrder node, create a buffer for a row in that table, then processes the children of that node. If a child is mapped to a column, such as the OrderDate node, store the node's value in the buffer. If a child is mapped to a table that stores a foreign key in the current table, such as the Customer node, process the node immediately and store the foreign key in the current row buffer. If a child is mapped to a table that uses the primary key of the current table, such as the Line nodes, save the node is for later processing, which you do after inserting the parent node, so you can pass the primary key to the child node. In my original SAX implementation, I did not use row buffers. Thus, the implementation was clearly impossible -- I would, for example, have to immediately process (and insert) the sales data, which comes before the customer data. This leads to referential integrity problems. However, you've got me thinking now and it's quite possible you might be able to build the row buffers as you go, then insert them in the correct order. If so, this would be terrific. Anyway, the general strategy is something along the lines of: ********** In startElement(). 1) Get the map for the element. 2) If the element is mapped as a class: a) create a new Row for the element. b) read the attributes linearly, getting each name and value. c) get the property map for each attribute. if it is stored as TOCOLUMN, store the value in the current Row. if it is stored in as TOPROPERTYTABLE, create a Row for the property table and store the value in it. 3) If the element is mapped as a property then, if the element is mapped as TOCOLUMN, do nothing now. If it is mapped as TOPROPERTYTABLE, create a Row for the property table. ******** In characters(): 1) If you are processing an element mapped as a property then: a) If the element is mapped as TOCOLUMN, store the value in the parent element's Row. b) If the element is mapped as TOPROPERTYTABLE, store the value in the property table Row. 2) If you are processing an element mapped as a class, get the map for the PCDATA. If this is mapped as TOCOLUMN, store the value in the element's Row. If this is mapped as TOPROPERTYTABLE, create a Row for the property table and store the value in it. 3) IMPORTANT! A single piece of PCDATA can result in multiple calls to characters, so you need to deal with this somehow -- for example, appending data to column values rather than just creating a string and storing it in the Row. ********* In endElement(): 1) If the element is mapped as a property, I don't think there is anything to do. 2) If the element is mapped as a class, then (I'm thinking out loud here), by reaching the end of the element, you know that all child elements are done. You don't need to worry about child elements-as-properties, as processing for them is finished. All you care about is child elements-as-classes and their offspring. I think what you do is recursively process these as follows: a) First process all child elements-as-classes that contain primary keys. These need to be inserted and their foreign keys stored in the parent row. b) Next insert the parent row. c) Finally, insert the primary key from the parent row into the children skipped in (a) and insert these children. Note that when I talk about child elements-as-classes, I think I am also referring to property tables for the current parent. This is the only really tricky part of the code, as these elements could be nested quite deep and you need to keep track of which have been inserted and which haven't. However, the more I talk about this, the more I believe it is possible. Furthermore, although you end up caching a certain amount of data, I think it's a good bet that this approaches the same memory requirements as the DOM for trees in which there are no siblings, only parents and children. (For example, if your tree has a root with 10000 children, each of which has three children and no grandchildren, then assuming the root is ignored, you will, at most, only cache the original child and its three children at any one point in time. On the other hand, if your tree has only parents and children, then depending on the PK/FK relationships, you might have to cache the whole thing and insert the most deeply nested child first. In any case, caching the rows and keeping their maps around is likely to be far less expensive than the equivalent DOM.) The last thing to consider is that you need to keep order information about the children of each element-as-class. Because elements can be nested, there will be a stack of order information, as you need to maintain this on a per-level basis. I don't think this will be too hard -- just a push every time you start an element-as-class, a pop every you end an element-as-class, and an increment every time you start an element-as-property or call to characters (remembering, of course, the problem with multiple calls to characters for a single piece of PCDATA).