Cross-Reference Your XML Data

 

Charles Heinemann
Microsoft Corporation

December 7, 1998

Download the source code for this article (2.3 KB)

In this season of Burl Ives and wonderful lives, I thought it apropos to discuss the true meaning of XML. For, although many would like us to think that it is all about what XML offers, the banalities of countless TV specials remind us otherwise. Indeed, the true meaning lies not in what XML does for you, but in what you can do with XML, which brings me to this month's topic: using internal cross-references to join XML elements.

All of the XML documents that I have used as examples for previous articles, and no doubt most of the XML documents you have seen -- period -- deal with a fairly simple hierarchical structure. Such an XML document might look like so:

<class id="ENGL6004">
  <title>From Here to Eternity: Studies in the Future
    and other Temporal Genres</title>
  <teacher id="T31330">
    <name>Margaret Doornan</name>
    <position>Associate Professor</position>
  </teacher>
  <students>
    <student id="S50245">
      <name>Maggie Trudeau</name>
      <year>Junior</year>
      <status>part-time</status>
    </student>
    <student id="S87901">
      <name>John Atterly</name>
      <year>Senior</year>
      <status>full-time</status>
    </student>
    <student id="S19272">
      <name>Mitch Milton</name>
      <year>Junior</year>
      <status>full-time</status>
    </student>
    <student id="S48984">
      <name>Norbert James</name>
      <year>Senior</year>
      <status>full-time</status>
    </student>
  </students>
</class>

The structure of the above XML document is fine if we are simply dealing with the data for a single class. You will, however, most likely be concerned with not just one, but many classes, many teachers, and many students -- all teaching and taking various classes.

If we were to provide all the data for every class within a single XML document, we would have a great deal of repetition within that document. In each class element, we would have to repeat the pertinent data concerning the teacher and students, even if that data already appeared elsewhere. Also, in the above document, it's easy to find the students for a class but not the classes for a student. To find out which classes a particular student is taking, we have to walk through every class element looking for that student's information. It would be much easier if we could go directly from the student's information to the relevant "class" elements.
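To make the cost of that walk concrete, here is a minimal sketch in JavaScript. It assumes the first document has already been parsed into plain objects; the `nestedClasses` array and the `classesForStudent` function are hypothetical names for illustration, not part of any XML API. Answering one question about one student means visiting every class and every student inside it.

```javascript
// Hypothetical stand-in for the nested layout: each class embeds
// its students directly (only the id field is shown here).
const nestedClasses = [
  { id: "ENGL6004", students: [{ id: "S50245" }, { id: "S87901" },
                               { id: "S19272" }, { id: "S48984" }] },
  { id: "HIST6010", students: [{ id: "S60912" }, { id: "S87901" },
                               { id: "S84281" }, { id: "S44098" }] },
];

// Finding a student's classes means walking every class and every
// student inside it; there is no direct path from student to class.
function classesForStudent(classList, studentId) {
  const found = [];
  for (const cls of classList) {
    for (const student of cls.students) {
      if (student.id === studentId) {
        found.push(cls.id);
        break; // a student appears at most once per class
      }
    }
  }
  return found;
}

console.log(classesForStudent(nestedClasses, "S87901")); // [ 'ENGL6004', 'HIST6010' ]
```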

The best approach in this situation is to put all of the information for the teachers in a "teachers" element, all the information for the students in a "students" element, and all the information for the classes in a "classes" element. That way, we aren't bogged down in redundant data. Structuring our XML document this way, however, requires a way to define the relationships among these three elements, so that we can, for instance, reach the relevant teacher data while navigating a "class" element. We can do this by declaring attributes of type ID and IDREF.
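A minimal sketch of the same idea outside XML, assuming plain JavaScript `Map` objects as stand-ins for the "teachers" and "classes" elements (the names `teachers`, `classes`, and `teacherNameForClass` are hypothetical): each record is stored once, relationships are stored as ID strings, and following a relationship is a keyed lookup instead of a search.

```javascript
// Hypothetical normalized layout: each record lives in exactly one
// table (a Map keyed by id); relationships are stored as id strings.
const teachers = new Map([
  ["T31330", { name: "Margaret Doornan", classRefs: ["ENGL6004", "ENGL6020"] }],
  ["T72100", { name: "Hal Canter", classRefs: ["HIST6010"] }],
]);
const classes = new Map([
  ["ENGL6004", { teacherRef: "T31330",
                 studentRefs: ["S50245", "S87901", "S19272", "S48984"] }],
  ["HIST6010", { teacherRef: "T72100",
                 studentRefs: ["S60912", "S87901", "S84281", "S44098"] }],
]);

// Following a reference is a single keyed lookup rather than a search.
function teacherNameForClass(classId) {
  const cls = classes.get(classId);
  if (!cls) return null;
  const teacher = teachers.get(cls.teacherRef);
  return teacher ? teacher.name : null;
}

console.log(teacherNameForClass("HIST6010")); // Hal Canter
```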

The following XML document groups the data specific to classes, teachers, and students in three distinct elements. These elements are then linked through "id" attribute values and "ref" attribute values:

<schedule term="Fall" year="98" xmlns:data="x-schema:idSchema.xml"
                                xmlns:ref="x-schema:refSchema.xml">
  <classes>
    <data:class id="ENGL6004">
      <title>From Here to Eternity: Studies in the Future
        and other Temporal Genres</title>
      <ref:teacher ref="T31330"/>
      <students>
        <ref:student ref="S50245"/>
        <ref:student ref="S87901"/>
        <ref:student ref="S19272"/>
        <ref:student ref="S48984"/>
      </students>
    </data:class>
    <data:class id="HIST6010">
      <title>The You Decade: A History of Finger Pointing
        in Post-War America</title>
      <ref:teacher ref="T72100"/>
      <students>
        <ref:student ref="S60912"/>
        <ref:student ref="S87901"/>
        <ref:student ref="S84281"/>
        <ref:student ref="S44098"/>
      </students>
    </data:class>
    <data:class id="ENGL6020">
      <title>Reading between the Lines: The Literature
        of Waiting</title>
      <ref:teacher ref="T31330"/>
      <students>
        <ref:student ref="S84281"/>
        <ref:student ref="S19272"/>
        <ref:student ref="S48984"/>
        <ref:student ref="S44098"/>
      </students>
    </data:class>
  </classes>
  <teachers>
    <data:teacher id="T31330">
      <name>Margaret Doornan</name>
      <position>Associate Professor</position>
      <classes>
        <ref:class ref="ENGL6004"/>
        <ref:class ref="ENGL6020"/>
      </classes>
    </data:teacher>
    <data:teacher id="T72100">
      <name>Hal Canter</name>
      <position>Instructor</position>
      <classes>
        <ref:class ref="HIST6010"/>
      </classes>
    </data:teacher>
  </teachers>
  <students>
    <data:student id="S44098">
      <name>Kelly Griftman</name>
      <year>Senior</year>
      <status>full-time</status>
      <classes>
        <ref:class ref="HIST6010"/>
        <ref:class ref="ENGL6020"/>
      </classes>
    </data:student>
    <data:student id="S48984">
      <name>Norbert James</name>
      <year>Senior</year>
      <status>full-time</status>
      <classes>
        <ref:class ref="ENGL6004"/>
        <ref:class ref="ENGL6020"/>
      </classes>
    </data:student>
   . . .
</schedule>
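Note that this document records each enrollment twice: a class lists its students, and each student lists its classes. That makes navigation easy in both directions, but ID/IDREF typing only checks that a reference points at an existing ID; it does not check that the two sides agree. Here is a minimal sketch of such a consistency check in JavaScript, assuming the references have already been pulled out into plain objects (`classStudents`, `studentClasses`, and `findOneWayLinks` are hypothetical names). Only the two students shown in full above are included.

```javascript
// Hypothetical excerpt of the schedule: class -> student refs, and
// student -> class refs for the two students shown in full above.
const classStudents = {
  ENGL6004: ["S50245", "S87901", "S19272", "S48984"],
  HIST6010: ["S60912", "S87901", "S84281", "S44098"],
  ENGL6020: ["S84281", "S19272", "S48984", "S44098"],
};
const studentClasses = {
  S44098: ["HIST6010", "ENGL6020"],
  S48984: ["ENGL6004", "ENGL6020"],
};

// Report every link stored in only one direction. Students with no
// record in studentClasses are skipped, since their data is not shown.
function findOneWayLinks(classStudents, studentClasses) {
  const problems = [];
  for (const [classId, studentIds] of Object.entries(classStudents)) {
    for (const studentId of studentIds) {
      const back = studentClasses[studentId];
      if (back && !back.includes(classId)) {
        problems.push(`${classId} lists ${studentId}, but not vice versa`);
      }
    }
  }
  for (const [studentId, classIds] of Object.entries(studentClasses)) {
    for (const classId of classIds) {
      const forward = classStudents[classId] || [];
      if (!forward.includes(studentId)) {
        problems.push(`${studentId} lists ${classId}, but not vice versa`);
      }
    }
  }
  return problems;
}

console.log(findOneWayLinks(classStudents, studentClasses)); // []
```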

The relationships between the "id" and "ref" attribute values are apparent to the human reader. But we must also let the parser know that these relationships are special so that we can navigate to and from joined elements. This can be done in a schema (or, as in the above case, multiple schemas) by declaring particular attributes to be of type "id" and type "idref". To clarify, these types do not refer to the name of the attribute but to the data type associated with that attribute. In our case, the "id" attributes will be of type "id" and the "ref" attributes will be of type "idref". The schemas for the above XML look like so (remember, because schemas allow for an open content model, not every element and attribute needs to be defined in the schema for the XML document to be valid):

<!-- idSchema.xml -->
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="id" dt:type="id"/>
  <ElementType name="teacher" content="eltOnly">
    <attribute type="id"/>
  </ElementType>
  <ElementType name="class" content="eltOnly">
    <attribute type="id"/>
  </ElementType>
  <ElementType name="student" content="eltOnly">
    <attribute type="id"/>
  </ElementType>
</Schema>

<!-- refSchema.xml -->

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="ref" dt:type="idref"/>
  <ElementType name="teacher" content="eltOnly">
    <attribute type="ref"/>
  </ElementType>
  <ElementType name="class" content="eltOnly">
    <attribute type="ref"/>
  </ElementType>
  <ElementType name="student" content="eltOnly">
    <attribute type="ref"/>
  </ElementType>
</Schema>

We have now told the parser that elements with "ref" attributes are related to elements with corresponding "id" attributes. As I will show you in a moment, this allows us to navigate from one to the other. For instance, from this "ref:teacher" element

<ref:teacher ref="T31330"/>

to this "data:teacher" element:

<data:teacher id="T31330">
  <name>Margaret Doornan</name>
  <position>Associate Professor</position>
  <classes>
    <ref:class ref="ENGL6004"/>
    <ref:class ref="ENGL6020"/>
  </classes>
</data:teacher>
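Besides navigation, declaring the types makes checking possible. The XML 1.0 specification states two validity constraints for these types: an ID value may appear only once in a document, and every IDREF value must match some ID. The JavaScript below is a minimal sketch that approximates those two checks over a hypothetical flattened list of elements (`elements` and `checkIdRefs` are illustrative names, not a parser API); the last entry deliberately points at a class whose definition is not in this excerpt.

```javascript
// Hypothetical flattened view of the document: one entry per element
// that carries an id or ref attribute.
const elements = [
  { tag: "data:class", id: "ENGL6004" },
  { tag: "data:teacher", id: "T31330" },
  { tag: "ref:teacher", ref: "T31330" },
  { tag: "ref:class", ref: "ENGL6004" },
  { tag: "ref:class", ref: "ENGL6020" }, // no matching id in this excerpt
];

// Approximates the two XML 1.0 validity constraints on ID/IDREF:
// every ID value is unique, and every IDREF names an existing ID.
function checkIdRefs(elements) {
  const errors = [];
  const seen = new Set();
  for (const el of elements) {
    if (el.id === undefined) continue;
    if (seen.has(el.id)) errors.push(`duplicate id "${el.id}"`);
    seen.add(el.id);
  }
  for (const el of elements) {
    if (el.ref !== undefined && !seen.has(el.ref)) {
      errors.push(`ref "${el.ref}" matches no id`);
    }
  }
  return errors;
}

console.log(checkIdRefs(elements)); // [ 'ref "ENGL6020" matches no id' ]
```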

IDs and IDREFs can also be defined using a Document Type Definition (DTD). The following declarations are an example of how to do this:

<!ATTLIST teacher id ID #IMPLIED>
<!ATTLIST teacher ref IDREF #IMPLIED>

We can navigate from nodes with IDREFs to nodes with corresponding IDs using the nodeFromID() method of the IXMLDOMDocument object. For instance, to reach the "data:teacher" node that corresponds to the "ref:teacher" node whose IDREF attribute has the value "T31330", I simply pass that value to nodeFromID(), which returns the desired node:

var teacherNode = xmlDoc.nodeFromID("T31330");
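Conceptually, nodeFromID() behaves like a lookup in an index from ID values to nodes. The following JavaScript is a minimal sketch of that idea, not MSXML's implementation: `buildIdIndex` and the standalone `nodeFromID` function are hypothetical, the nodes are plain objects, and the sketch simply treats any attribute named "id" as an ID, whereas a real parser learns which attributes are IDs from the schema or DTD.

```javascript
// Builds a Map from ID value to node by walking the whole tree once.
// Assumption: any attribute named "id" is an ID. A real parser learns
// which attributes are IDs from the schema or DTD instead.
function buildIdIndex(root) {
  const index = new Map();
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    const id = node.attributes ? node.attributes.id : undefined;
    if (id !== undefined) index.set(id, node);
    for (const child of node.children || []) stack.push(child);
  }
  return index;
}

// Hypothetical stand-in for nodeFromID(): ID string in, node (or null) out.
function nodeFromID(index, id) {
  return index.has(id) ? index.get(id) : null;
}

// A tiny parsed-tree stand-in for part of the schedule document.
const schedule = {
  tag: "schedule", attributes: {}, children: [
    { tag: "classes", attributes: {}, children: [
      { tag: "data:class", attributes: { id: "ENGL6004" }, children: [
        { tag: "ref:teacher", attributes: { ref: "T31330" }, children: [] },
      ] },
    ] },
    { tag: "teachers", attributes: {}, children: [
      { tag: "data:teacher", attributes: { id: "T31330" }, children: [
        { tag: "name", attributes: {}, text: "Margaret Doornan" },
      ] },
    ] },
  ],
};

const index = buildIdIndex(schedule);
const teacherRef = schedule.children[0].children[0].children[0];
const teacherNode = nodeFromID(index, teacherRef.attributes.ref);
console.log(teacherNode.children[0].text); // Margaret Doornan
```

Building the index once and reusing it keeps each later lookup cheap, which is what makes hopping back and forth between IDREFs and IDs practical.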

While watching old Potter once again get the best of George Bailey, I whipped up a little Web page that lets the user navigate throughout the above XML document. This Web application is available for download from this page as an .exe file.

One of the more interesting aspects of the Web page is the way in which you can move in and out of the three upper-level elements -- the "classes", "teachers", and "students" elements -- depending on what information you want to display. In the sample application, the user can click on a student's name (garnered from the "data:student" elements) and get information concerning that student's classes (garnered from the "data:class" elements).

The HTML for the sample application is a series of DIVs into which I insert the data from our XML document:

<P><B>TITLE:  </B><DIV ID=classTitle></DIV>
<P><B>CLASS ID:  </B><DIV ID=classID></DIV>
<P><B>TEACHER:  </B><DIV ID=teacher></DIV>
<P><B>STUDENTS:  </B>
  <TABLE>
    <TR>
      <TD><DIV ID=studentTable></DIV></TD>
      <TD><DIV ID=detailsTable></DIV></TD>
    </TR>
  </TABLE>

I then use the following function to access the data concerning the class (title, class ID, name of the teacher, names of the students enrolled) and insert that data into the above HTML. Notice how I navigate to the appropriate "data:teacher" and "data:student" elements from the "data:class" element to get more detailed information.

function getData(presClass){
  // presClass is a "data:class" node: title (0), ref:teacher (1), students (2).
  // xmlid is the XML document holding the schedule.
  var studentNode, studentRef, studentName;
  var teachRef = presClass.childNodes.item(1).getAttribute('ref');
  // Jump from the IDREF to the "data:teacher" element with the matching ID.
  var teachNode = xmlid.nodeFromID(teachRef);

  classTitle.innerText = presClass.childNodes.item(0).text;
  classID.innerText = presClass.getAttribute('id');
  teacher.innerText = teachNode.childNodes.item(0).text;

  // Resolve each "ref:student" to its "data:student" element and
  // build a table of clickable student names.
  var studentRefs = presClass.childNodes.item(2).childNodes;
  var tableStr = "<TABLE>";
  for (var i = 0; i < studentRefs.length; i++){
    studentRef = studentRefs.item(i).getAttribute('ref');
    studentNode = xmlid.nodeFromID(studentRef);
    studentName = studentNode.childNodes.item(0).text;
    tableStr += "<TR><TD><SPAN ID=" + studentRef +
      " onclick=getStudentInfo()>" +
      studentName + "</SPAN></TD></TR>";
  }
  tableStr += "</TABLE>";
  studentTable.innerHTML = tableStr;
}
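One way to read getData() is as two jobs: gathering the class data by following IDREFs, then writing it into the page. Here is a minimal sketch of the gathering half in JavaScript, separated from the HTML so it can be checked on its own. It assumes the parsed records sit in a single `Map` keyed by ID (the names `byId` and `classView` are hypothetical); one map is enough because ID values must be unique across the whole document.

```javascript
// Hypothetical id-keyed table for the parsed schedule. XML IDs must be
// unique across the whole document, so one Map can hold every kind of record.
const byId = new Map([
  ["ENGL6004", {
    title: "From Here to Eternity: Studies in the Future and other Temporal Genres",
    teacherRef: "T31330",
    studentRefs: ["S50245", "S87901", "S19272", "S48984"],
  }],
  ["T31330", { name: "Margaret Doornan" }],
  ["S50245", { name: "Maggie Trudeau" }],
  ["S87901", { name: "John Atterly" }],
  ["S19272", { name: "Mitch Milton" }],
  ["S48984", { name: "Norbert James" }],
]);

// Gathers what getData() displays for a class, without touching any HTML.
function classView(byId, classId) {
  const cls = byId.get(classId);
  if (!cls) return null;
  const nameOf = (ref) => (byId.get(ref) || { name: "(unknown)" }).name;
  return {
    id: classId,
    title: cls.title,
    teacher: nameOf(cls.teacherRef),
    students: cls.studentRefs.map((ref) => ({ id: ref, name: nameOf(ref) })),
  };
}

const view = classView(byId, "ENGL6004");
console.log(view.teacher); // Margaret Doornan
console.log(view.students.map((s) => s.name).join(", "));
// Maggie Trudeau, John Atterly, Mitch Milton, Norbert James
```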

In the above function, I create an HTML string that will dynamically build a table containing student names. While doing so, I allow each of these student names to be clicked on, exposing the classes in which that student is enrolled. The getStudentInfo function is called when the user clicks on a particular student.

function getStudentInfo(){
  // The clicked SPAN's ID is the student's ID, so it goes straight to nodeFromID.
  var student = xmlid.nodeFromID(event.srcElement.id);
  var studentStr = "<TABLE BORDER><THEAD><TR><TH>CLASSES</TH></TR></THEAD>";
  // Children of "data:student": name (0), year (1), status (2), classes (3).
  var classRefs = student.childNodes.item(3).childNodes;
  var classRef, classNode, title;
  for (var j = 0; j < classRefs.length; j++){
    classRef = classRefs.item(j).getAttribute('ref');
    classNode = xmlid.nodeFromID(classRef);
    title = classNode.childNodes.item(0).text;
    studentStr += "<TR><TD><SPAN ID=" + classRef +
      " onclick=getNewClass()>" + title +
      "</SPAN></TD></TR>";
  }
  studentStr += "</TABLE>";
  detailsTable.innerHTML = studentStr;
}

Again, I create an HTML string that builds a table. This time, however, that table is filled with the titles of the classes in which the student is enrolled. As with the other table, this table's contents are hot: the user can click the title of a class to change the data displayed throughout the entire page.
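The student-to-class step can be sketched in the same spirit. Assuming hypothetical plain-object records in a `Map` keyed by ID (`byId` and `classesOfStudent` are illustrative names), this JavaScript collects the ID and title pairs that getStudentInfo() turns into clickable table rows.

```javascript
// Hypothetical id-keyed table holding one student and the classes
// that student's "ref:class" elements point to.
const byId = new Map([
  ["S48984", { name: "Norbert James", classRefs: ["ENGL6004", "ENGL6020"] }],
  ["ENGL6004", { title: "From Here to Eternity: Studies in the Future and other Temporal Genres" }],
  ["ENGL6020", { title: "Reading between the Lines: The Literature of Waiting" }],
]);

// Follows each class reference from a student record, as getStudentInfo()
// does, and returns the id/title pairs that would fill the clickable table.
function classesOfStudent(byId, studentId) {
  const student = byId.get(studentId);
  if (!student) return [];
  return student.classRefs.map((ref) => {
    const cls = byId.get(ref);
    return { id: ref, title: cls ? cls.title : "(unknown class)" };
  });
}

for (const { id, title } of classesOfStudent(byId, "S48984")) {
  console.log(`${id}: ${title}`);
}
```

Each returned ID is what the page would later hand back to a lookup when the user clicks that row, which is how the navigation loops from class to student to class.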

Creating graph-like structures, rather than strictly hierarchical ones, is a little tricky, but it can reduce the size of your XML files and make it easier to navigate to the relevant information. So, while you're sitting at home waiting for your Uncle Billy to return from the bank empty-handed, try reorganizing your data and accessing it by navigating from IDREFs to IDs. It can cut down on the code needed to access the data within, too.
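To see where the size savings come from, here is a rough count using the enrollments in the sample schedule. This JavaScript sketch (`enrollments` and `recordCounts` are hypothetical names) counts only full student records: the nested layout needs one per enrollment, while the cross-referenced layout needs one per distinct student. It ignores the short "ref" elements that the cross-referenced layout adds, so it illustrates the trend rather than measuring bytes; the savings grow with the number of classes each student takes.

```javascript
// Class enrollments from the sample schedule (student ids only).
const enrollments = {
  ENGL6004: ["S50245", "S87901", "S19272", "S48984"],
  HIST6010: ["S60912", "S87901", "S84281", "S44098"],
  ENGL6020: ["S84281", "S19272", "S48984", "S44098"],
};

// Nested layout: one full student record per enrollment.
// Cross-referenced layout: one full record per distinct student.
function recordCounts(enrollments) {
  const all = Object.values(enrollments).flat();
  return { nested: all.length, crossReferenced: new Set(all).size };
}

console.log(recordCounts(enrollments)); // { nested: 12, crossReferenced: 7 }
```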

Charlie Heinemann is a program manager for Microsoft's Weblications team. Coming from Texas, he knows how to think big.