XML - DOM


 
   
Intro

 

The DOM, or Document Object Model, is in this context the way an XML document is read into memory as a tree (a hierarchy of nodes) and manipulated by a script or program. A program written in Visual BASIC, JavaScript, or another language builds or modifies the tree in memory, and the tree can then be saved or displayed as an XML document. Once an XML document has been read in as a tree, according to a particular DOM, it can be cut and spliced: parts of the tree can be located programmatically and changed with built-in routines. This is the kind of thing that would also have been useful with HTML, and may have been available there at one point. But now you can take an XML document, read it into memory according to this Document Object Model, isolate some nodes, change others, add new nodes to the tree, and so on. You can do this just to read or work with the information; you do not have to save the result or update the source XML document.
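
The read, locate, change, add cycle described above can be sketched in a few lines. This is only an illustration in Python, assuming its standard xml.dom.minidom module rather than any library discussed on this page; the sample document and its element names are invented for the sketch.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for this illustration.
SOURCE = "<catalog><title>Old title</title><item>first</item></catalog>"

doc = parseString(SOURCE)        # read the XML into an in-memory tree
root = doc.documentElement       # the single root element, <catalog>

# Locate a node in the tree and change its text.
title = root.getElementsByTagName("title")[0]
title.firstChild.data = "New title"

# Add a new node to the tree.
item = doc.createElement("item")
item.appendChild(doc.createTextNode("second"))
root.appendChild(item)

# Only the in-memory tree changed; SOURCE itself is untouched.
result = root.toxml()
print(result)
```

Nothing is written back to disk here, which matches the point that the tree can be used just to read or rework the information.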

[To be clear, the DOM can also be used to build a tree of a single web page, and so of simple HTML and whatever else is part of that page. When you load a web page, a tree is made as well. The language used to manipulate that tree/hierarchy would typically be JavaScript, using some of the same functions mentioned below. With regard to browsers, it's the same old story as in the days of the Netscape and IE browser 'wars': one browser implements some functions, another implements others, with variations even in the functions they share. It becomes very browser specific. Microsoft might have its own DOM. The W3C might propose its own DOM Level 1. And those putting together the Firefox or Opera browsers might pick up some of the W3C 'standard', or not. And of course the Apple Safari browser and Microsoft's Internet Explorer port for the Mac have always differed from all the rest.]

 
MSXML

 

This, of course, requires some sort of program or set of libraries to translate back and forth between the XML text and the DOM tree, and to make changes to the tree. They can be called 'parsers', 'core', whatever. One that is frequently used is from Microsoft, called msxml.dll, in versions 2.5, 3, 4, and probably now 5, with even a version for .NET. A reference to this msxml library is set in the program, and then a host of methods and properties are available for manipulating a tree, or nodeset, stored in memory, whether taken from the original XML document, built node by node, or both. Msxml installs with newer operating systems, like XP, but also with any installation of Internet Explorer 5 on up, I believe. Versions, and documentation, are available at msdn.com, for separate download or online viewing.

 
Add Node

 

This might be an example, using Visual BASIC:

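' strTitle and strCataTplate are String variables set elsewhere in the program
Dim xmldoc As New DOMDocument
Dim nodData As IXMLDOMNode, nodRoot As IXMLDOMNode, nodFile As IXMLDOMNode

' Create the root element and attach it to the document itself
Set nodRoot = xmldoc.createElement("root")
xmldoc.appendChild nodRoot
sAppendNode xmldoc, nodData, nodRoot, "CatalogName", strTitle

' Create an empty file element under root, then give it a child element
Set nodFile = xmldoc.createElement("file")
nodRoot.appendChild nodFile
sAppendNode xmldoc, nodData, nodFile, "sourcepath", strCataTplate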

 

And sAppendNode is a custom routine:

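' Creates element nodName with text nodText and attaches it under nodSup.
' nodNew is passed by reference, so the caller gets the new node back.
Public Sub sAppendNode(xmldoc As DOMDocument, _
           nodNew As IXMLDOMNode, nodSup As IXMLDOMNode, _
           nodName As String, nodText As String)

   Set nodNew = xmldoc.createElement(nodName)   ' unattached element
   nodNew.Text = nodText                        ' text content of the element
   nodSup.appendChild nodNew                    ' attach as child of nodSup

End Sub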

 

The subroutine, sAppendNode, uses the DOM method createElement to take the name in nodName and create an unattached element, held in the variable nodNew. So if nodName = "file", you now have an element called file, which exists by itself but is ready to be attached to the tree. Some text for this file tag/element may also be passed, as nodText, and the Text property of the element is used to store it. If the text is, say, "e:/windows/system/x.dll", then the XML for this would be <file>e:/windows/system/x.dll</file>, just like that. Finally, the appendChild method is used to attach nodNew as a child of the node nodSup. Now it's part of the tree, and in that sense of the XML document.

The procedure that called this sub starts off with a couple of type declarations: xmldoc As New DOMDocument, and nodData As IXMLDOMNode. The reference to msxml was set up beforehand, so the code does not need to load the library explicitly. In something like JavaScript, this wouldn't be the case. But in Microsoft Office and the Microsoft development programs, references to libraries like msxml.dll can be established ahead of time. So an XML document object, a DOMDocument, is simply declared As New, creating an instance ready to use. IXMLDOMNode is the unlikely name for the type of a node in an XML document: it's not "node" or "domnode", but this long IXMLDOMNode.

There can only be one root node in an XML document (or more properly, one root element), even if it's not called root. Here, though, it is. The first append is made directly to the new xmldoc itself, so you have an XML document consisting of a single root node. All further appends will be to the root, or to something under the root. The first call to the sub creates an element under root, here called CatalogName, with the text for that name taken from the variable strTitle. Similarly, an empty element, file, is appended to root. Under file goes another element, called sourcepath, with presumably some URL or local drive path in the string strCataTplate.
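
For comparison, here is a rough parallel to the same build steps in Python, assuming its standard xml.dom.minidom module, which uses the same method names (createElement, appendChild). It is not MSXML, and the helper name append_node is made up for this sketch. The text values are the ones shown in the sample output on this page.

```python
from xml.dom.minidom import Document


def append_node(doc, parent, name, text=""):
    """Create element `name`, give it optional text, attach it under `parent`."""
    node = doc.createElement(name)              # unattached element
    if text:
        node.appendChild(doc.createTextNode(text))
    parent.appendChild(node)                    # now part of the tree
    return node


doc = Document()
root = doc.createElement("root")
doc.appendChild(root)                           # the one root element
append_node(doc, root, "CatalogName", "Main")

file_node = doc.createElement("file")           # empty container element
root.appendChild(file_node)
append_node(doc, file_node, "sourcepath",
            r"E:\templates\xslt\catalogs\default.xsl")

xml_text = root.toxml()
print(xml_text)
```

One difference from the Visual BASIC routine: minidom has no Text property, so the text is attached as a separate text node under the new element.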

It's that simple, to construct an xml document using DOM. A few more additions to the structure, and you could see something like:

<root>
  <CatalogName>Main</CatalogName>
  <file>
    <sourcepath>E:\templates\xslt\catalogs\default.xsl</sourcepath>
    <targetpath>E:\catalogs\</targetpath>
    <filename>Catalog.htm</filename>
  </file>
</root>

 

You can write the XML document itself to disk with the simple command xmldoc.Save "filename.xml" (the file name is passed as a string). You don't even have to give it an .xml extension, but it would be best.
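
The save step can be sketched the same way. This is a Python illustration, assuming the standard xml.dom.minidom module rather than msxml; the file name catalog.xml and the temporary directory are hypothetical. It writes a small tree to disk and reads it back to check that the round trip works.

```python
import os
import tempfile
from xml.dom.minidom import Document, parse

# Build a tiny tree to save.
doc = Document()
root = doc.createElement("root")
doc.appendChild(root)
name = doc.createElement("CatalogName")
name.appendChild(doc.createTextNode("Main"))
root.appendChild(name)

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "catalog.xml")
with open(path, "w", encoding="utf-8") as handle:
    doc.writexml(handle, encoding="utf-8")   # serialize the whole tree

# Read the file back into a new tree and pull out the saved text.
reloaded = parse(path)
saved_name = reloaded.getElementsByTagName("CatalogName")[0].firstChild.data
print(saved_name)
```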

 
Add Attribute

 

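Adding an attribute is slightly different. As with the sub above, you need a reference to your DOMDocument, plus the name of the attribute and its value. You also need the element that will own the attribute (nodT, below). The element is still superior: the way the DOM handles it, an attribute is not a child node of its element but belongs to the element's Attributes collection.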

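' Creates attribute strAttrName with value strAttrVal and attaches it to element nodT.
Public Sub sAddAttr(strAttrName As String, strAttrVal As String, _
           ByVal nodT As IXMLDOMNode, xmldoc As DOMDocument)
   Dim nodAttr As IXMLDOMAttribute

   Set nodAttr = xmldoc.createAttribute(strAttrName)   ' unattached attribute
   nodAttr.Text = strAttrVal                           ' its value
   nodT.Attributes.setNamedItem nodAttr                ' attach to the element

End Sub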
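
The same create, fill, attach sequence can be sketched in Python, again assuming the standard xml.dom.minidom module rather than msxml. The attribute names and values (type, xsl, lang, en) are invented for this illustration. The sketch also shows setAttribute, the one-call shortcut that the DOM offers on elements.

```python
from xml.dom.minidom import Document

doc = Document()
file_node = doc.createElement("file")
doc.appendChild(file_node)

# Long form, parallel to sAddAttr: create, fill, attach.
attr = doc.createAttribute("type")        # unattached attribute
attr.value = "xsl"                        # its value
file_node.attributes.setNamedItem(attr)   # attach to the element

# Short form: setAttribute does all three steps in one call.
file_node.setAttribute("lang", "en")

xml_text = file_node.toxml()
print(xml_text)
```

The long form is useful when the attribute node itself has to be passed around; otherwise the short form is usually enough.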

 

Obviously, there's a great deal more to the DOM than just how to create and attach elements and attributes. But that's an important part of using the DOM, and may be all you need, at first.


 
 
More to read:

TOP XML Microsoft DOM - Online copy of Microsoft XML DOM reference
Inet.com DOM - Multi-page tutorial intro to DOM
W3 Schools - Multi-page tutorial on DOM
Microsoft Help - Microsoft's original compiled help file for msxml 4.0, with DOM reference (click: download the SDK)
TOP XML parsers - Downloads of various Microsoft msxml libraries/parsers, even some older versions no longer available from Microsoft
Microsoft msxml 4 SP2 - Service pack replacement for msxml 4.0 and SP1
Apache Xerces - Xerces XML parser for Apache servers
(Robin) Cover Pages - Cover's DOM links
W3C DOM - WWW Consortium updated specs and explanation of DOM