Reverse Linking Explained

 

Benjamin Guralnik

September 19, 2002

Download Reverse-Linking-Demo.exe.

Summary: Guest author Benjamin Guralnik introduces the concept of reverse linking, which is a single instruction in a new document that populates older documents with a link to the new relevant document in one step, rather than having to update each older document individually. (7 printed pages)

Requirements   This download requires Microsoft Internet Explorer 5.0 or later with MSXML 3.0 installed.

Editor's Note   This month's installment of Extreme XML is written by guest columnist Benjamin Guralnik. Benjamin lives in Jerusalem, Israel and is an independent developer specializing in XML/XSLT. He has been a regular contributor to the online community surrounding Microsoft XML products. Benjamin has been pioneering XML-oriented information systems since 1999 when he began working on the award-winning SAS Interactive Handbook, an innovative help aide that reorganized SAS System's native help into a compact and useful interface. In his free time Benjamin enjoys reading, playing tennis, and studying classical piano.

How It All Started

Everyone knows what direct links are. Whether it's the blue underlined text that you click on while browsing the Web, or its somewhat older version in the shape of "see page 276 in Vol. II" notes that you see in books, these links guide us through tons of material in our attempts to focus on a topic and find relevant information about it. Although direct links are extremely helpful, the tricky thing about them is that they remain unchanged as time goes by. I happen to have a rare 1945 edition of George Orwell's Animal Farm, and naturally it doesn't have a reference to 1984, which he wrote four years later. Of course, any new edition of Animal Farm that you buy today includes a neat preface updated with everything Orwell has written, but I always secretly wished there was a way to make this fact known in my old dog-eared, falling-apart copy.

While it is obvious that the only way to update printed material is by reprinting it, the strange fact is that this is also the only choice with online content, the sole difference being that on the Web no extra paper is wasted. You'd still need to open, edit, save, and republish just about every single page so that it reflects new content. It is the frustration of adding a new link to dozens of older documents that led me to the idea of a reverse link: a single instruction in the newer document (target) that specifies in which older documents (source) the new content is to be referenced.

The Reverse Linking Concept

Another way of looking at a reverse link is to compare it to a collect call. Unlike a regular call, where the person calling is the person paying, a collect call splits those functions apart, making the receiver of the call pay. Similarly, while a regular link is both declared and shown inside the source document, a reverse link is declared inside the target document but shown in the source document. In other words, instead of the typical "point to Document B" instruction within Document A, a reverse link gives a "make Document A point to me" request in Document B. The whole concept is visualized in Figure 1 below.

Figure 1. Direct vs. reverse links

As a shorthand, a reverse link's declaration can contain multiple hrefs, listing all the source documents (the older documents that should display the link) at once.

Figure 2. Extended syntax of a reverse link

For example, a code sample demonstrating the usage of conditional templates in XSLT can make itself available from four relevant Language Reference documents with one line.

<link:from href1="../LangRef/xsl-if.xml"
           href2="../LangRef/xsl-choose.xml"
           href3="../LangRef/xsl-when.xml"
           href4="../LangRef/xsl-otherwise.xml">
           Conditional templates
</link:from>
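To make the declaration's structure concrete, here is a minimal Python sketch of how a tool might read the numbered href attributes out of a link:from element. It is not the demo's own code: the function name get_reverse_links is a hypothetical analog of the compiler's GetReverseLinks, the wrapping article element is invented for the example, and the namespace URI is assumed to be the same one that the supporting stylesheet declares.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: documents use the same namespace URI as the supporting stylesheet.
LINK_NS = "urn:reverse-linking-library"


def get_reverse_links(xml_text):
    """Return a list of (title, [hrefs]) pairs, one per link:from element."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("{%s}from" % LINK_NS):
        # Collect href1, href2, ... and keep them in numeric order.
        numbered = []
        for name, value in node.attrib.items():
            match = re.fullmatch(r"href(\d+)", name)
            if match:
                numbered.append((int(match.group(1)), value))
        hrefs = [value for _, value in sorted(numbered)]
        # Collapse the whitespace around the link's display text.
        title = " ".join((node.text or "").split())
        results.append((title, hrefs))
    return results


# Hypothetical document: only the link:from element comes from the article.
SAMPLE = """<article xmlns:link="urn:reverse-linking-library">
  <link:from href1="../LangRef/xsl-if.xml"
             href2="../LangRef/xsl-choose.xml"
             href3="../LangRef/xsl-when.xml"
             href4="../LangRef/xsl-otherwise.xml">
             Conditional templates
  </link:from>
</article>"""

for title, hrefs in get_reverse_links(SAMPLE):
    print(title, hrefs)
```

Each href is kept exactly as written, relative to the declaring document; resolving it against the project folder is a separate step.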

But how does a page get to know about all the reverse links that other documents have asked it to display? One way would be to scan all the other documents every time a page is opened, looking for relevant reverse link declarations, but this approach is terribly inefficient on large sets of documents. The next idea I came up with was to scan all the documents only once, compiling and arranging all the declarations inside an intermediate links bank, as shown in Figure 3.

Figure 3. A sample links bank

Let's walk through creating the bank. First, the compiler scans all XML files in and below the folder that it was launched from (baseFolder), and saves the results in a tree that mirrors the scanned directory structure. The tree allows every document to locate its entry immediately, and also lets developers move the project folder into other folders, drives, or networks, or even publish it on the Internet. Because the tree has a relative root (baseFolder) and consists only of relative links, all links stay intact as long as the inner structure of the project folders is preserved.
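The scanning step can be sketched in a few lines of Python. This is only an illustration of the idea, not the demo's implementation: get_files is a hypothetical stand-in for the compiler's GetFiles, and it returns a flat, sorted list of relative paths rather than a tree.

```python
import os


def get_files(base_folder):
    """Return sorted paths of all .xml files under base_folder, relative to it."""
    found = []
    for folder, _subfolders, files in os.walk(base_folder):
        for name in files:
            if name.lower().endswith(".xml"):
                full = os.path.join(folder, name)
                rel = os.path.relpath(full, base_folder)
                # Store forward slashes so entries read like URLs on any OS.
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

Because every entry is relative to the base folder, the list comes out the same wherever the project folder is moved, which is the property the relative root is meant to provide.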

Inside each file's entry in the bank, the compiler copies all the reverse links found in the corresponding document. Once this copy has been performed for all the documents, an algorithm transforms the reverse link "requests" into real direct links. These are stored as link:reqBy elements (short for Requested By) under the entry of the document that actually displays the link, rather than under the requesting document's entry.

Finally, when we have a specific document and want to know which other documents should be referenced from it, we only need to look under its entry in the bank as that's the one place where all the requests from other documents are brought together.
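The inversion at the heart of the bank can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the demo's code: build_req_links is a hypothetical analog of BuildReqLinks, the bank is a plain dictionary instead of an XML tree, each dictionary entry plays the role of a link:reqBy element, and Samples/conditional.xml is an invented file name.

```python
import posixpath


def build_req_links(declarations):
    """Invert reverse-link requests into per-document lists of direct links.

    declarations maps each scanned document (path relative to the base
    folder) to a list of (title, [hrefs]) pairs, where every href is
    relative to the requesting document.
    Returns (bank, broken): bank maps each document to the links it should
    display; broken lists requests whose href matches no scanned document.
    """
    bank = {path: [] for path in declarations}
    broken = []
    for requester, links in declarations.items():
        requester_dir = posixpath.dirname(requester)
        for title, hrefs in links:
            for href in hrefs:
                # Resolve the href against the requester's folder.
                displayer = posixpath.normpath(posixpath.join(requester_dir, href))
                if displayer not in bank:
                    broken.append((requester, href))
                    continue
                # Direct link from the displaying document back to the requester.
                # Both paths are anchored at a fake root "/" so that the result
                # does not depend on the current working directory.
                back = posixpath.relpath(
                    "/" + requester, "/" + posixpath.dirname(displayer)
                )
                bank[displayer].append({"href": back, "title": title})
    return bank, broken


declarations = {
    "LangRef/xsl-if.xml": [],
    "LangRef/xsl-choose.xml": [],
    "Samples/conditional.xml": [
        ("Conditional templates",
         ["../LangRef/xsl-if.xml", "../LangRef/xsl-choose.xml"]),
    ],
}
bank, broken = build_req_links(declarations)
print(bank["LangRef/xsl-if.xml"])
```

Looking up a document's links is then a single dictionary access, which mirrors the point that a page only needs to look under its own entry in the bank.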

Figure 4. A sample document showing a reverse link

Going back to the collect call example, the links bank functions as the go-between telephone operator. Just as the operator notifies the payer about any request to talk at his expense, the links bank notifies a page about any additional links that other documents have requested it to display. The complete diagram of the reverse linking schema is shown in Figure 5.

Figure 5. The Reverse Linking Schema

An important thing to remember about the links bank is that it has to be recompiled every time you make a change to your information system; otherwise the old bank falls out of sync with the updated or newly added content and the link requests it makes.

The Scope of Reverse Linking

Although I may want my article to be referenced on MSN.com, Yahoo! and the WWW Consortium portals, this would probably cost me a bit more than adding the following code declaration on my page:

<link:from href1="www.msn.com"
           href2="www.yahoo.com"
           href3="www.w3.org">Take a look at my new article</link:from>

Although reverse links do significantly expand the capabilities of direct links, they've got their limitations as well. Namely, they won't work on remote sources because you'd have to compile a links bank for the whole Internet with the base folder being http://, and that is sheer madness. I do want to mention, however, the possibility of two friendly sites setting up an interface for exchanging reverse link banks (imitating B2B information exchange channels), but focusing on this type of reverse linking is better left to another article dedicated solely to that topic.

When staying inside the boundaries of your information system, reverse linking can be ideal not only for keeping old documentation aware of relevant new content, but also for interlinking an ever-changing part of your information system with its established, fixed part. Take, for example, the immortal conflict between the Language Reference and the User's Guide sections of nearly any programming language Help file. For some reason, most User's Guide articles contain perfect redirections to the relevant Language Reference pages, while the latter are not even aware of half the User's Guide's articles. Why? Very simple: the Language Reference is the core of the language, a core that doesn't change even with major version upgrades. Maybe I'm old-fashioned, but even now in the age of .NET I still find my 1981 Microsoft manual on BASIC quite useful. User's Guides, on the contrary, are the most dynamic and unpredictable part of the manuals. Containing priceless discussion and up-to-date examples on the usage of different language elements, User's Guides usually get written at a time when the Language Reference is already polished, and who has the time to change and update it again?

The result of this is unfortunate and always the same. The conservative Language Reference section never gets updated with new content and modern techniques (just like my 1945 Animal Farm), and in some cases even borders on misinformation. Naturally, this issue can now be addressed with a reverse linking solution that allows User's Guide articles, Coder's Corner demos, and any other type of dynamic content to use reverse links to request appearances on all the relevant Language Reference pages.

The Compiler

Below is a short description of the compiler's main features in case you're interested in extending the current implementation or just want to borrow some fresh code.

Compiler functions

Init() Main function.
GetFiles(whatFolder, root, path) Maps the folders and files in and down the directory from which the compiler is launched. Calls GetReverseLinks for each file.
GetReverseLinks(xmlDoc) Retrieves a collection of reverse links from a given document and hands it over to BuildReqLinks.
BuildReqLinks(root) Creates a direct link for each reverse link declaration and stores it in the bank under the document that will actually be displaying the link.

Compiler stylesheets

Linkbank Removes the reverse link declarations from the links bank and formats it cleanly.
Tree Displays the links bank right after it has been compiled, and reports on broken reverse links.
GetPath(path) Takes a file's path and returns the path to the folder in which the file is stored.
GetRelURL(src, dest) Takes two absolute paths and calculates a relative link from the source to the destination document.
NormalizePath(path) Deletes redundancies in a path string such as "./" and "path/../".
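For readers who want to experiment with the three path helpers, their described behavior can be approximated in Python with the standard posixpath module. The names below are hypothetical analogs of GetPath, NormalizePath, and GetRelURL, the /docs paths are invented, and forward-slash paths are assumed; the demo's own helpers may treat edge cases differently.

```python
import posixpath


def get_path(path):
    """Return the folder part of a file path."""
    return posixpath.dirname(path)


def normalize_path(path):
    """Collapse redundancies such as "./" and "folder/../"."""
    return posixpath.normpath(path)


def get_rel_url(src, dest):
    """Return a relative link from the document at src to the document at dest.

    Both arguments are expected to be absolute, forward-slash paths.
    """
    return posixpath.relpath(dest, posixpath.dirname(src))


print(get_path("/docs/Samples/demo.xml"))
print(normalize_path("/docs/./LangRef/../Samples/demo.xml"))
print(get_rel_url("/docs/LangRef/xsl-if.xml", "/docs/Samples/demo.xml"))
```

The relative link is computed from the folder of the source document, not from the document itself, because that is the location a browser resolves relative hrefs against.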

Supporting stylesheet

Reverse-linking-library.xslt Accesses the links bank, obtains the requested links for the document being shown, and displays them. In order to enable reverse-linking in your documents, add the following three lines to your stylesheet:

xmlns:link="urn:reverse-linking-library"

(namespace declaration inside the xsl:stylesheet element.)

<xsl:import href="reverse-linking-library.xslt"/>

(Imports the library so we can use its templates and functions.)

<xsl:call-template name="link:seealso"/>

(Displays the requested links. Classically, this template should be called at the top of the page to fill the traditional See also section. However, it may also appear at the bottom or any other place on the page you think is relevant.)

Acknowledgements

Thanks to Chris Lovett, who through a series of elaborate discussions helped me realize that what I have been implementing in this demo is not XLink but only a small part of it. According to the Introduction to Xlink article written by Steven J. DeRose, one of the fathers of XLink, reverse linking does address at least three of the six issues that XLink does, namely:

  • Bi-directional links
  • Links that annotate read-only documents
  • Link databases

Because I find the XLink terminology of arc, locator, and traverse confusing, I have decided to keep it all backstage, together with the project's close relation to XLink.

Also, thanks to Dare Obasanjo, who reviewed the article and made valuable suggestions. And as usual, thanks to Dad.