Documents Do Matter: Serve Them Nicely and Effectively with Avalon Document Services

 

Dino Esposito
Wintellect

August 2004

Applies to:
   Longhorn Community Technical Preview, WinHEC 2004 Build (Build 4074)

Summary: Provides an overview of document services available in Avalon. In particular, it focuses on the PageViewer control and its programming interface and also explains adaptive-flow and fixed-format documents. The new managed API for compound files is also presented with practical code samples. (14 printed pages)

Download the source code described in this article.

Contents

Tools for Rich Document Viewing
The PageViewer Control
Creating Viewable Documents
Loading Contents Programmatically
Adaptive-Flow and Fixed-Format Documents
Text Hyphenation
Compound Files
Summary

No class framework or SDK that developers currently work with provides serious, top-notch support for document viewing. Since the advent of Windows, displaying relatively long text has never been a nightmare for developers. However, it has never turned out to be a rich and enjoyable experience for users either. Multiline text boxes and the RichTextBox control enable users of desktop applications to scroll text of any length quite easily. The text is entirely bound to the control, and a pair of scrollbars is provided to move through it. Web applications follow a similar pattern when packing text into a <textarea> element.

The bottom line is that existing components offer only a basic set of capabilities that never go beyond text scrolling and "simple forms" of document viewing. More advanced viewing features such as pagination, navigation, zooming, and panning require specialized, ad hoc components that attentive third-party vendors provide.

"Avalon," code name for the user interface subsystem of Longhorn, marks an important trend reversal. The Avalon Document Services infrastructure includes reusable, customizable, and extensible components that expose rich document capabilities far beyond "simple" features like scrolling and viewing. In Avalon, you find not just pagination, page-navigation, and zooming functionalities, but also hyphenation, digital signatures, and easy management of compound files.

Tools for Rich Document Viewing

The Avalon Document Services infrastructure is mainly built around a rich-UI control, the PageViewer control. In addition, Avalon provides layout components like the TextPanel and FixedPanel controls, and advanced formatting tools capable of automatically adapting the text to the container that displays it. The AdaptiveMetricsContext component belongs to this category of tools.

By wrapping some text in an adaptive context, you create an adaptive-flow-format document that can reflow to fit a new window or container. In doing so, the control automatically configures itself to provide the best reading experience to users. Users, however, have the final word and can easily and interactively override the default settings to express their personal preferences. For example, an adaptive document can display its text in columns, if the width of the window exceeds a certain threshold. Moreover, an adaptive document doesn't have a fixed number of pages. The number of pages is dynamically determined based on the size of the hosting container and attributes set.

A fixed panel, instead, displays a fixed number of pages and doesn't apply any dynamic transformation to the contents. I've used the word "contents" here intentionally. Avalon controls can display anything that can be expressed using the XAML language, not just plain text. Valid pageable content includes images, animations, movies, vector graphics, and more.

To start out this overview of the Avalon Document Services, let's tackle the most prominent and representative component of the whole infrastructure—the PageViewer control.

The PageViewer Control

The PageViewer control is the favorite Avalon control for document viewing. It can render documents in both fixed and adaptive-flow format and supports advanced features such as panning, zooming, pagination, and scrolling. Users can decide how many pages to view at a time and zoom in and out to enlarge or shrink the rendered view. The overall user interface of the PageViewer control is broadly equivalent to the Print Preview dialog box of many Windows Forms applications. Note, however, that the Print Preview common dialog requires you to provide the rendering code for a given document through an event handler. The PageViewer control uses its own rendering code and requires only that you specify the document to display. You can indicate formatting options and special rendering functions declaratively using the full set of XAML features.

The PageViewer control is highly interactive and has its own toolbar, from which users can select zoom percentages and the number of pages to view at a time. The scrollbar associated with the control (actually, a PageBar object) enables both scrolling and direct navigation to pages. Finally, the PageViewer control also provides a digital signature icon that allows users to verify the authenticity of signed content.

Figure 1 shows a PageViewer control in action.


Figure 1. The typical output of a PageViewer control. The page bar lists the zoom levels allowed and the page view selector.

As Figure 1 shows, the text can contain formatting styles such as bold and can include images and movies. The source of the PageViewer control is a XAML document.

<PageViewer ID="Viewer" Height="50%" Width="50%" 
            Source="doc.xaml" />

In markup, the Source attribute is written as a string that references a URI. The actual type of the Source property is System.Uri.

Let's consider a sample document.

First and foremost, the root element of the file assigned to the Source property must support pagination; otherwise, an exception is thrown. Supporting pagination means that the element must correspond to a class that implements the IDocumentFormatter interface. Examples of such classes are FixedPanel, TextPanel, and Table. The following file displays correctly through the control.

<TextPanel xmlns="http://schemas.microsoft.com/2003/xaml" ID="root">
   <Heading Background="LightCyan">The PageViewer Control </Heading>
   The <Bold>PageViewer</Bold> control is the …
</TextPanel>

Table 1 shows some of the properties you can set on the control to configure its appearance. The control features many more properties, but I was unable to get them all to work properly with the build of Longhorn I was using. (Check the MSDN documentation for a quick explanation and a better idea of the potential of this control.)

| Property | Description |
| --- | --- |
| Columns | Determines the number of page columns, that is, how many pages are displayed side by side. Set to 1 by default. |
| Mode | Lets you choose between panning and selection mode. When panning is enabled (it doesn't seem to work at this time), you should be able to drag the page in all directions to display pages. Selection mode lets you choose a page by index and by clicking on the page bar. |
| PagePosition | Sets the page bar (the control's scrollbar) to the specified 0-based page index. |
| Rows | Determines the number of page rows, that is, how many pages are stacked vertically. Set to 1 by default. |
| Zoom | Double value that indicates the zoom level relative to 1. For example, set it to 0.75 to display pages at 75 percent of their real size. Set it to 2 for a 200 percent zoom. |

Table 1. Some of the properties that affect the PageViewer appearance
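To make the arithmetic behind Rows, Columns, and Zoom concrete, here is a minimal sketch in plain C#. It does not touch the PageViewer control; PageViewMath and its methods are hypothetical names introduced only for illustration, and the page width of 800 units is an arbitrary assumption.

```csharp
using System;

// Illustrative only: PageViewMath is a hypothetical helper, not part of Avalon.
static class PageViewMath
{
    // Number of pages visible at once in a rows-by-columns grid of pages.
    public static int PagesPerView(int rows, int columns)
    {
        if (rows < 1)
            throw new ArgumentOutOfRangeException("rows", "Rows must be at least 1.");
        if (columns < 1)
            throw new ArgumentOutOfRangeException("columns", "Columns must be at least 1.");
        return rows * columns;
    }

    // Rendered size of one page dimension at a zoom factor where 1.0 means real size.
    public static double Scale(double realSize, double zoom)
    {
        if (zoom <= 0)
            throw new ArgumentOutOfRangeException("zoom", "Zoom must be positive.");
        return realSize * zoom;
    }

    static void Main()
    {
        Console.WriteLine(PagesPerView(2, 2));   // prints 4
        Console.WriteLine(Scale(800, 0.75));     // prints 600
    }
}
```

With 2 rows, 2 columns, and a zoom of 0.75, the sketch reports four pages per view, each drawn at three quarters of its real size.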

Figure 2 shows another view of the PageViewer control that demonstrates its ability to zoom in and out and to change views.


Figure 2. Each page displays at 75 percent of its real size and 2x2 pages are displayed at the same time

Creating Viewable Documents

Before going any further, though, let me briefly illustrate the properties of the TextPanel component—the XAML element that you mostly use to feed a PageViewer control.

The TextPanel element is used to format, size, and draw text. It supports multiple lines of text and multiple text formats.

<TextPanel>
:
</TextPanel>

Child elements of a TextPanel can be <Section>, <Heading>, <PageBreak>, <Image>, <Paragraph>, and <Bold>, just to name a few and give you a quick idea of the flexibility you have.

The <Section> element implements a generic container element with a rendering behavior similar to that of an HTML <div> tag. The <Paragraph> element represents a block of text much like the <p> tag in HTML. <Heading> is rendered as the title bar of the document and automatically enjoys a larger font. The <PageBreak> tag indicates the end of a page while <Bold> causes the embedded text to render in boldface.

<TextPanel xmlns="http://schemas.microsoft.com/2003/xaml" ID="root">
   <Heading Background="LightCyan">Page Title </Heading>
   <Section>
     <Paragraph>This is <Bold>some</Bold> text</Paragraph>
   </Section>
</TextPanel>

The figure below shows how this XAML renders.


Figure 3. A simple TextPanel object

There are a number of different properties and methods that extend the capabilities of the TextPanel component. They range from visual properties like FontFamily and FontSize to background and foreground colors, and from TextRange to hyphenation.

The TextRange property gets a value that represents a range of text within the control that can then be manipulated programmatically. Each element embedded in a TextPanel can be uniquely identified with an ID and edited at run time.

Loading Contents Programmatically

As mentioned, you assign the PageViewer control some contents to display by making the Source property point to a URI. This can be done either declaratively or programmatically. If you do it programmatically, then make sure you create a Uri object first, as shown below.

Viewer.Source = new Uri("file:///c:/test.xaml");

Note that the Uri class is not specific to Longhorn. It is also available in the .NET Framework 1.x, specifically in the System namespace.
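Because Uri is an ordinary .NET Framework class, you can experiment with it outside Longhorn. The following sketch uses only System.Uri members that exist in the .NET Framework. UriDemo and ForLocalFile are hypothetical names used for illustration, and the file path is just an example.

```csharp
using System;

// Illustrative only: UriDemo is a hypothetical helper, not part of Avalon.
static class UriDemo
{
    // Builds a Uri from a file URL string such as "file:///c:/test.xaml".
    public static Uri ForLocalFile(string fileUrl)
    {
        return new Uri(fileUrl);
    }

    static void Main()
    {
        Uri source = ForLocalFile("file:///c:/test.xaml");
        Console.WriteLine(source.Scheme);   // prints file
        Console.WriteLine(source.IsFile);   // prints True
    }
}
```

An object built this way is the kind of value the Source property expects when you set it from code.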

If working with files and URIs doesn't particularly attract you, try passing in-memory data. It is as easy as creating a new TextPanel object and setting its TextRange property.

TextPanel doc = new TextPanel();
doc.TextRange.Text = someText;
Viewer.Navigate(doc);

To make the PageViewer control display the new text, you call its Navigate method. The method has several overloads. The one shown below lets you pass in the tree rooted in a text panel object.

public bool Navigate (FrameworkElement content);

The FrameworkElement object represents the root of the XAML document displayed. It must be an object that provides paging capabilities like TextPanel.

Let's write a simple XAML application that selects a text file and displays its content through the PageViewer control. The main window will contain a text box and a couple of buttons. The first button will display the Open File common dialog and let you choose a text file from disk. The name of the selected file is automatically assigned to the text box. The second button, the Load button, reads the file name from there and loads the document into the PageViewer. The following XAML code illustrates the structure of the main window.

<Window xmlns="http://schemas.microsoft.com/2003/xaml"
    xmlns:def="Definition" def:Class="eDocs.Window1"
    def:CodeBehind="Window1.xaml.cs"
    Text="Document Viewing">
    <Table Margin="4">
        <Body>
            <Row>
                <Cell>
                    <TextBox ID="FileName" />
                    <Button Click="ChooseFile">...</Button>
                    <Button Click="LoadData">Load</Button>
                </Cell>
            </Row>
            <Row>
                <Cell>
                    <PageViewer ID="Viewer" Height="60%" Width="50%" />
                </Cell>
            </Row>
        </Body>
    </Table>
</Window>

The code snippet below corresponds to the ChooseFile event handler and describes the application's interaction with the Open File common dialog box.

private void ChooseFile(object sender, ClickEventArgs e)
{
   OpenFileDialog open = new OpenFileDialog();

   // Determine which file types to display in the dialog
   FileType[] fileTypes = new FileType[2];
   fileTypes[0] = new FileType("Text Files", "*.txt");
   fileTypes[1] = new FileType("All files", "*.*");
   open.FileTypes = fileTypes;

   // Display the dialog box
   DialogResult res = open.ShowDialog();
   if (res == DialogResult.OK)
      FileName.Text = open.Result.FileSysPath;
}

To call the system common dialogs, you need to import the System.Windows.Explorer.Dialogs namespace. The code to write is similar to corresponding Windows Forms code but with several minor syntax changes. First, a FileType object is now needed to add a file type to the filter box. In Windows Forms, you indicate that you want to select only, say, text files with a pipe-separated string. Needless to say, the Avalon approach is much more elegant and clean.

Second, the selected file is returned as an object, the Item object in the System.Windows.Explorer namespace. The Item object represents any item within Explorer, not just file system items like files and folders. Unlike the Windows Forms API, Avalon fully encapsulates the Windows shell. To get the file system path of the selected item, you access the FileSysPath member.
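For comparison, the Windows Forms filter mentioned above is a single string such as "Text Files|*.txt|All files|*.*" assigned to the Filter property of the Windows Forms OpenFileDialog. The sketch below shows what it takes to turn such a string into typed description/pattern pairs, which is roughly what the FileType array gives you up front. FilterEntry and FilterParser are hypothetical names used for illustration. They are not the Avalon FileType class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a typed filter entry; not the Avalon FileType class.
class FilterEntry
{
    public readonly string Description;
    public readonly string Pattern;

    public FilterEntry(string description, string pattern)
    {
        Description = description;
        Pattern = pattern;
    }
}

static class FilterParser
{
    // Splits a pipe-separated filter string into description/pattern pairs.
    public static List<FilterEntry> Parse(string filter)
    {
        string[] parts = filter.Split('|');
        if (parts.Length % 2 != 0)
            throw new FormatException("Filter must contain description|pattern pairs.");

        List<FilterEntry> entries = new List<FilterEntry>();
        for (int i = 0; i < parts.Length; i += 2)
            entries.Add(new FilterEntry(parts[i], parts[i + 1]));
        return entries;
    }

    static void Main()
    {
        foreach (FilterEntry entry in Parse("Text Files|*.txt|All files|*.*"))
            Console.WriteLine(entry.Description + " -> " + entry.Pattern);
    }
}
```

With the string form, a missing separator only shows up at run time. With typed entries, each description is paired with its pattern by construction.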

Finally, when users click the Load button, the following code runs.

private void LoadData(object sender, ClickEventArgs e)
{
   TextPanel doc = new TextPanel();

   // Read the whole file; the using block closes the reader even on error
   using (StreamReader reader = new StreamReader(FileName.Text))
   {
      doc.TextRange.Text = reader.ReadToEnd();
   }

   // Display the in-memory document
   Viewer.Navigate(doc);
}

A StreamReader object is used to access the text file and read all of its content to the TextRange object of a TextPanel. Next, the TextPanel is bound to the PageViewer, as in the figure below.


Figure 4. A sample application that displays the content of a text file using the PageViewer control

To bind a XAML file programmatically, you can resort to the following code:

Viewer.Source = new Uri(@"file:///c:\lhsamples\sample.xaml");

If you bind a non-XAML file through the Source property, you can get an exception because the control cannot find a data binder for what it considers an unknown file type. That's why I used an intermediate TextPanel in the previous example.

Adaptive-Flow and Fixed-Format Documents

The PageViewer control is a special Avalon component designed to act as the container and viewer of an external document. In addition to that, the Avalon framework also provides document layout services, that is, components that let you control the layout of documents outside the PageViewer. There are two types of documents—adaptive and fixed layout.

Adaptive documents render the embedded content in a way that provides the best reading experience. Key to creating adaptive-flow documents is the AdaptiveMetricsContext element. This element is the root element for adaptive-flow documents. The content of this element is rendered with the optimal number of columns per page, ideal sizes for all text elements, and calculated sizes and positions for all figures. The Avalon runtime does lots of work to improve the quality of on-screen reading. The results may change as the container window is resized.

<AdaptiveMetricsContext
    xmlns="http://schemas.microsoft.com/2003/xaml" ID="root"
    ColumnPreference="Medium">
    <TextPanel>
    :
    </TextPanel>
</AdaptiveMetricsContext>

After a child panel is added to an AdaptiveMetricsContext element, the content of the panel is processed by an internal module known as the Reading Metrics Engine. The module then calculates the optimal number of columns and the ideal font sizes and line heights. User preferences are taken into account, of course. You can program the metrics engine through the ColumnPreference property and font-related properties. In particular, through the ColumnPreference property you indicate how important columns are for you. By default, columns are not created. To understand the role played by ColumnPreference, take a look at the figure below.


Figure 5. The Longhorn version of Internet Explorer renders an adaptive-flow document

The output above was obtained by setting ColumnPreference to Medium. With a value of High, the rendering engine would create four columns for an equally sized host window. To make room for the fourth column, the font size would be reduced.
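The internals of the Reading Metrics Engine are not documented here, so the following is only a toy heuristic that reproduces the behavior just described: no columns by default, and more columns for High than for Medium in the same window. Everything in it is an assumption made for illustration: the Preference enumeration, the ColumnHeuristic class, and the minimum column widths of 320 and 240 units.

```csharp
using System;

// Hypothetical preference levels; not the actual Avalon enumeration.
enum Preference { NoColumns, Medium, High }

// Toy heuristic only; this is not the algorithm used by the Reading Metrics Engine.
static class ColumnHeuristic
{
    // A stronger preference tolerates narrower columns, so more of them fit.
    public static int ColumnCount(double containerWidth, Preference preference)
    {
        double minColumnWidth;
        switch (preference)
        {
            case Preference.Medium: minColumnWidth = 320; break;  // assumed value
            case Preference.High:   minColumnWidth = 240; break;  // assumed value
            default: return 1;                                    // no columns by default
        }

        int columns = (int)Math.Floor(containerWidth / minColumnWidth);
        return Math.Max(1, columns);
    }

    static void Main()
    {
        Console.WriteLine(ColumnCount(1000, Preference.NoColumns)); // prints 1
        Console.WriteLine(ColumnCount(1000, Preference.Medium));    // prints 3
        Console.WriteLine(ColumnCount(1000, Preference.High));      // prints 4
    }
}
```

Resizing the container changes the result, which mirrors how an adaptive-flow document reflows as its window changes. A real engine would also adjust font sizes and line heights, which this sketch ignores.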

For adaptive-flow documents, the vast majority of properties that affect the layout are set and known only at run time (and, more importantly, change dynamically based on run-time conditions). For fixed-format documents, in contrast, all layout properties are set in advance and never change.

Fixed-format documents consist of a collection of objects that together describe the appearance of one or more pages. The layout is set in stone and never changes, whatever application or control hosts the document. Here's an example of a fixed-format document.

<Document xmlns="http://schemas.microsoft.com/2003/xaml" ID="root">
    <FixedPanel>
        <FixedPage>
            <Text FontSize="80">This is page 1</Text>
        </FixedPage>
        <FixedPage>
            <Text FontSize="80">This is page 2</Text>
        </FixedPage>
    </FixedPanel>
</Document>


Figure 6. A fixed-format document in action

The FixedPanel element is the root element used in fixed-format documents to contain fixed pages for pagination. It displays paginated content one page at a time or as a scrollable group of pages. FixedPage, in turn, provides access to a single page within a fixed-format document.

Text Hyphenation

Usability studies have shown that people read hyphenated text more easily than non-hyphenated text. Hyphenated text is text in which words at the end of a line are split at one of a number of typical positions. Compared to ragged-right formatting, hyphenated text produces a more even right edge. Compared to a fully justified paragraph, hyphenated text saves white space and also helps the eye track the reading line.

Armed with the results of these studies, the Avalon team implemented automatic hyphenation in the TextPanel class. When the feature is turned on (set the Hyphenation attribute to True), the hyphenator component comes into play whenever a line of text is rendered. It looks at the final word in the line and determines whether it can be hyphenated and how. To do this, the hyphenator consults a dictionary. Avalon includes a standard set of language-based hyphenation dictionaries, but developers can extend dictionaries to take into account particular occurrences of words. Extensions to dictionaries can be provided in two ways: using an external file or using a special XAML element. The following code snippet gives an idea of how Avalon uses text hyphenation.

<FlowPanel xmlns="http://schemas.microsoft.com/2003/xaml"
  xmlns:def="Definition" Width="100%">
  <FlowPanel.Resources>
    <Hyphenator def:Name="myHyphenator">
      <HyphenationDictionary Culture="en-US">
        im=port=ant cap=able impl=ement=ed
      </HyphenationDictionary>
    </Hyphenator>
  </FlowPanel.Resources>
  <TextPanel Hyphenator="{myHyphenator}" Hyphenation="true">
    <Paragraph FontSize="20">
      Because of this result, the Avalon team implemented
      automatic hyphenation in the TextPanel class.
      :
    </Paragraph>
  </TextPanel>
</FlowPanel>

In the <HyphenationDictionary> element, you indicate your own rules for hyphenation. As soon as the hyphenator is bound to a text panel, those rules are automatically reflected. The figure below shows two ways of hyphenating the word "implemented".
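To see how rules written in the word=break=point notation could be applied, consider the following sketch. It is not Avalon's hyphenator. HyphenationSketch, ParseDictionary, and HeadThatFits are hypothetical names, and measuring the room left on a line in characters rather than in rendered width is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a toy reading of the "impl=ement=ed" notation, not Avalon's hyphenator.
static class HyphenationSketch
{
    // Maps each word (without '=' marks) to the character offsets where it may break.
    public static Dictionary<string, List<int>> ParseDictionary(string text)
    {
        Dictionary<string, List<int>> result = new Dictionary<string, List<int>>();
        string[] entries = text.Split(new char[] { ' ', '\t', '\r', '\n' },
                                      StringSplitOptions.RemoveEmptyEntries);
        foreach (string entry in entries)
        {
            List<int> breaks = new List<int>();
            string word = "";
            foreach (char c in entry)
            {
                if (c == '=')
                    breaks.Add(word.Length);   // break allowed before the next letter
                else
                    word += c;
            }
            result[word.ToLowerInvariant()] = breaks;
        }
        return result;
    }

    // Returns the longest leading fragment plus hyphen that fits in roomLeft
    // characters, or null if the word is unknown or no fragment fits.
    public static string HeadThatFits(Dictionary<string, List<int>> dictionary,
                                      string word, int roomLeft)
    {
        List<int> breaks;
        if (!dictionary.TryGetValue(word.ToLowerInvariant(), out breaks))
            return null;

        string best = null;
        foreach (int position in breaks)       // offsets are in ascending order
        {
            if (position + 1 <= roomLeft)      // +1 leaves room for the hyphen
                best = word.Substring(0, position) + "-";
        }
        return best;
    }

    static void Main()
    {
        Dictionary<string, List<int>> rules =
            ParseDictionary("im=port=ant cap=able impl=ement=ed");
        Console.WriteLine(HeadThatFits(rules, "implemented", 6));    // prints impl-
        Console.WriteLine(HeadThatFits(rules, "implemented", 10));   // prints implement-
    }
}
```

The two calls in Main correspond to the two break points that the rule impl=ement=ed defines for "implemented": the word splits at a different point depending on how much room the line has left.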


Figure 7. Hyphenation in action in a TextPanel element

Compound Files

In real-world applications, you certainly need to display large amounts of content, but typically you read that content from persistent storage media. Any valid Avalon content can be persisted and distributed using compound files. A compound file is a special type of file specifically designed to store various kinds of content. Originally created to support the storage needs of the COM programming model, compound files find a second life in Longhorn, where they are largely used to store documents, applications, and application data. Popular examples of compound files that you work with every day, perhaps without knowing it, are Word and Excel documents. Avalon provides an ad hoc API to create and edit compound files. The advantage of storing documents in a compound file is that all content is in a single container, described by metadata, and can be compressed and rights-managed if needed.

Avalon provides a new managed API to access and manipulate content of compound files. In Win32, a compound file is built around two sets of functionality—the IStream and IStorage interfaces. Avalon encapsulates them in a bunch of new classes: StreamInfo and Stream render the IStream interface, while StorageInfo governs creation and management of storage.

What are the key benefits of the structured storage available with compound files?

First and foremost, a compound file stores its data in logical files named streams. Multiple streams can be grouped in logical directories known as storages. A sort of allocation table at the beginning of the file tracks the offsets of the various data entities. The inherently hierarchical structure of compound files reduces the performance overhead associated with storage in a flat file.

In addition, Avalon enriches the compound file subsystem with additional features such as support for digital signatures and compression.

As a result, an Avalon compound file can contain text, audio, images, and even application code. Let's see how to create and edit a compound file using the Avalon classes. The code snippet below illustrates how to create a compound file. The namespaces to import are System.IO.CompoundFile and System.IO.

StorageRoot stg = StorageRoot.Open(fileName,
    FileMode.Create, FileAccess.ReadWrite);
StorageInfo stgInfo = (StorageInfo) stg;

The static Open method on the StorageRoot class returns a StorageRoot object, which the code casts to StorageInfo. The resulting StorageInfo object is the starting point for any other operation. Let's create a couple of child streams. The idea is to store two representations of a randomly generated number: the number itself and a bitmap that represents it graphically.

StreamInfo strNum = new StreamInfo(stgInfo, "Number");
Stream streamNum = strNum.Create(FileMode.Create);
StreamInfo strImg = new StreamInfo(stgInfo, "Image");
Stream streamImg = strImg.Create(FileMode.Create);

At this point, you fill the streams with a randomly generated integer and a dynamically created bitmap.

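// Generate the number
Random rnd = new Random();
int num = rnd.Next();

// Write the number to the "Number" stream
BinaryWriter writer = new BinaryWriter(streamNum);
writer.Write(num);
writer.Close();

// Render the number as a bitmap and save it to the "Image" stream
System.Drawing.Bitmap bmp = CreateTheImage(num);
bmp.Save(streamImg, System.Drawing.Imaging.ImageFormat.Jpeg);
streamImg.Close();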

The CreateTheImage method uses GDI+ classes to output the number on an in-memory graphics context.

System.Drawing.Bitmap CreateTheImage(int num)
{
  System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(200, 50);
  System.Drawing.Graphics g = System.Drawing.Graphics.FromImage(bmp);
  g.DrawString(num.ToString(), 
     new System.Drawing.Font("Impact", 20),
     System.Drawing.Brushes.Yellow, 
     new System.Drawing.Point(20, 10)); 
  g.Dispose();
  return bmp;
}

As of the WinHEC edition of Longhorn, there are no XAML-based facilities to create an in-memory graphic dynamically. While we wait for a newer build, using GDI+ is probably the most effective approach.

My sample application creates a randomly named compound file when you click a button. The same document (.mydoc sample extension) is then read back when you click a second button. The following code shows the click handler of the Display Content button. (See Figure 8.)

void LoadFile(object sender, ClickEventArgs e)
{
  // Get the file name
  string fileName = FileName.Text;

  // Open the file
  StorageRoot stg = StorageRoot.Open(fileName, 
    FileMode.Open, FileAccess.Read);
  StorageInfo stgInfo = (StorageInfo)stg;

  // Read the number
  StreamInfo strNum = new StreamInfo(stgInfo, "Number");         
  Stream streamNum = strNum.Open(FileMode.Open);
  BinaryReader reader = new BinaryReader(streamNum); 
  int num = reader.ReadInt32();
  reader.Close();

  // Read the image
  StreamInfo strImg = new StreamInfo(stgInfo, "Image");
  Stream streamImg = strImg.Open(FileMode.Open);

  // Refresh the UI
  TheImage.Source = new ImageData(streamImg);
  streamImg.Close();
}

The user interface of the application is completed with an Image element, whose ID is TheImage. The content of the Image element is set through the Source property. It is interesting to note that if you create the image element declaratively in the XAML source, the Source property must be a URI. If you set the same property programmatically, it only accepts an ImageData object.


Figure 8. A sample application that reads and writes compound files
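If you don't have a Longhorn build at hand, you can still try the write-then-read pattern used for the Number stream. The sketch below substitutes a MemoryStream for the compound file stream and otherwise uses the same BinaryWriter and BinaryReader calls. NumberRoundTrip and its methods are hypothetical names used for illustration, and the compound file classes themselves are not involved.

```csharp
using System;
using System.IO;

// Illustrative only: a MemoryStream stands in for the compound file stream.
static class NumberRoundTrip
{
    // Serializes the number the same way the sample does, with BinaryWriter.
    public static byte[] Write(int num)
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(num);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the number back with BinaryReader.ReadInt32.
    public static int Read(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            return reader.ReadInt32();
        }
    }

    static void Main()
    {
        int num = new Random().Next();
        byte[] stored = Write(num);
        Console.WriteLine(stored.Length);         // prints 4
        Console.WriteLine(Read(stored) == num);   // prints True
    }
}
```

The number occupies four bytes in the stream, which is why the reading code calls ReadInt32 rather than a text-based read.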

Summary

In various forms and formats, documents are at the foundation of virtually any software application. In spite of this, though, ad hoc tools to work with them easily and effectively are seldom available. Looking at the Windows platform, the advent of strong XML support in Office 2003 is going to change the situation and really make documents open to all interested applications but closed to unauthorized users.

When it comes to working with documents, there are two main scenarios to address: document viewing and document layout. Avalon provides the PageViewer control to view documents that can be expressed as a XAML stream. The PageViewer control has several non-trivial capabilities such as scrolling, panning, zooming, and pagination. In addition, it is also capable of verifying the validity of any digital certificate attached to the document.

Setting a correct document layout can also enhance the quality of onscreen reading. Avalon has two types of layout to offer: adaptive-flow and fixed format. In the former case, text flows in the container environment in much the same way that HTML text flows into the browser window. If you want a document structure that never changes, you opt for the fixed-format layout, in which the number, size, and content of pages never change. Longhorn's adaptive flow goes beyond the standard HTML capabilities in that it can split the text into columns and also make any needed font adjustment.

 

About the Author

Dino Esposito is a trainer and consultant based in Rome, Italy. Member of the Wintellect team, Dino specializes in ASP.NET and ADO.NET and spends most of his time teaching and consulting across Europe and the United States. In particular, Dino is the co-founder of VB2TheMax, a popular .NET knowledge bank, and writes the "Cutting Edge" column for MSDN Magazine.