Working with a shape's text in XML for Visio

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

Several groups of elements are involved in working with the text of a shape—elements that describe characteristics of the text, those that describe the beginning and end of text runs, and an element that contains the text itself.

The following table gives a brief description of these elements and their relationships.

  Element            Description

  <Shape>            Opening tag for the Shape element.
     <Text>          Contains the shape's text.
        <cp>         Marks character runs.
        <pp>         Marks paragraph runs.
        <tp>         Marks tab runs.
        <fld>        Marks field position.
     </Text>         Closing tag for the shape's text.
     <Char>          Contains character properties.
     <Para>          Contains paragraph properties.
     <Tabs>          Contains tab properties.
     <Field>         Contains the shape's text field properties.
  </Shape>           Closing tag for the Shape element.

For example, consider the following shape. It has text that consists of two character runs, one in bold and the other in italic.

[Figure: A shape whose text consists of two character runs, one in bold and the other in italic.]

The portion of the shape's sheet that describes character text runs is called the Character section. The Character rows for this shape as viewed in the ShapeSheet window are as follows:

[Figure: The Character section in the ShapeSheet window for this shape, showing one row for the bold run and one row for the italic run.]

Notice that the column on the left indicates the length of each run, 5 characters and 6 characters, respectively. The only cells that vary between the two rows are the Style cell values. The Style cell determines if the run is bold, italic, or another style. The XML code for these rows follows:

  <Char IX='0'>
    <Font>0</Font>
    <Size Unit='PT'>0.16666666666667</Size>
    <FontScale>1</FontScale>
    <Letterspace>0</Letterspace>
    <Color>0</Color>
    <Style>1</Style>
  </Char>
  <Char IX='1'>
    <Font>0</Font>
    <Size Unit='PT'>0.16666666666667</Size>
    <FontScale>1</FontScale>
    <Letterspace>0</Letterspace>
    <Style>2</Style>
    <Color>0</Color>
  </Char>

Note Although the size of the text is 12 points, the internal unit for points is inches. A point size of '12' is therefore represented in inches as '0.16666666666667' (12 divided by 72 points per inch).
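The conversion described in the preceding note can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names are hypothetical and are not part of any Visio API.

```python
# Minimal sketch: converting between points and the inch values stored in
# a Size element. The helper names are illustrative assumptions.
POINTS_PER_INCH = 72  # 1 inch = 72 points


def points_to_inches(points: float) -> float:
    """Return the inch value that corresponds to a point size."""
    return points / POINTS_PER_INCH


def inches_to_points(inches: float) -> float:
    """Return the point size that corresponds to a stored inch value."""
    return inches * POINTS_PER_INCH


# A 12-point font, formatted to 14 decimal places as in the sample above.
print(f"{points_to_inches(12):.14f}")  # 0.16666666666667
```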

The two Char elements represent the two Character rows. Each has an IX attribute describing the relative order of the rows. Each Char element contains child elements that correspond to the cells viewed in the ShapeSheet window.

The Shape element also contains an element called Text, which contains the characters of the text and special elements (cp, pp, tp, and fld) that mark the end of one run and the beginning of the next. The Text element is somewhat unusual because it contains both data (the text characters) and child elements (cp, pp, and so on).

The Text element that describes the text on the preceding shape follows:

  <Text><cp IX='0'/>Bold <cp IX='1'/>Italic</Text>

The <cp IX='0'/> tag indicates that the character properties from the first Char element (<Char IX='0'>) are to be applied to the text that follows. The <cp IX='1'/> tag indicates that the preceding character run (<Char IX='0'>) has ended and the run attributed to <Char IX='1'> has begun.
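Because the Text element mixes character data with marker elements, a program that reads it has to collect the text that follows each marker. The following Python sketch shows one way to do that with the standard library's xml.etree.ElementTree. It makes several assumptions: the fragment has no XML namespace (a real file may declare one, in which case the tag names would need the namespace prefix), text that appears before any cp marker is treated as belonging to row 0, and the content of fld elements is ignored. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def character_runs(text_xml: str):
    """Split a Text element into (Char row IX, text) pairs.

    Sketch only: assumes no XML namespace and ignores fld content.
    """
    text_el = ET.fromstring(text_xml)
    runs = []
    # Text before the first marker is assumed to belong to row 0.
    if text_el.text:
        runs.append((0, text_el.text))
    for child in text_el:
        tail = child.tail or ""
        if child.tag == "cp":
            # A cp marker ends the previous run and starts a new one.
            runs.append((int(child.get("IX")), tail))
        elif runs:
            # pp, tp, and fld markers do not start a character run, so the
            # text after them continues the current run.
            ix, text = runs[-1]
            runs[-1] = (ix, text + tail)
        else:
            runs.append((0, tail))
    return runs


sample = "<Text><cp IX='0'/>Bold <cp IX='1'/>Italic</Text>"
print(character_runs(sample))  # [(0, 'Bold '), (1, 'Italic')]
```

ElementTree exposes the text that follows a child element as that child's tail attribute, which is why each run's text is read from the marker that precedes it.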

Inheritance for text elements follows the standard inheritance rules. Text row elements (Char, Para, Tabs and Field) need to contain child elements only for elements whose values are different from their inherited value.

Note Visio writes out all values—inherited or not—when it saves an XML for Visio file.

For example, the statement <Char IX='0'><Color>4</Color></Char> is sufficient to specify that a Char element should have blue text; the remaining element values will automatically be inherited.

  • If the text row elements are in a master or a non-instance shape (a shape that is not an instance of a master), the values are inherited from the shape's text style, that is, the ID of the StyleSheet that is referenced in the shape's TextStyle attribute. If the shape does not contain a TextStyle attribute, the values are inherited from the default document style.
  • If the text row elements are in an instance shape (a shape that is an instance of a master), the values are inherited from the corresponding row in the master. If a corresponding row does not exist, values are inherited from the first row in the master. (An instance of a master that has a TextStyle attribute inherits text properties from the style instead of the master.)
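The inheritance rules above can be sketched as a small lookup followed by a merge. The following Python example is a simplified model, not Visio's implementation: rows are represented as plain dictionaries of cell names to values, and every function name, parameter name, and sample value is a hypothetical chosen for illustration.

```python
def pick_inherited_row(row_ix, text_style_row=None, master_rows=None,
                       document_default_row=None):
    """Choose the row that a local text row inherits from.

    text_style_row: row from the style named by TextStyle, or None if the
        shape has no TextStyle attribute.
    master_rows: dict of IX -> row for the master, or None if the shape
        is not an instance of a master.
    document_default_row: row from the default document style.
    """
    if text_style_row is not None:
        # A TextStyle attribute takes precedence, even for an instance.
        return text_style_row
    if master_rows:
        # Use the corresponding master row, or else the master's first row.
        return master_rows.get(row_ix, master_rows[min(master_rows)])
    return document_default_row


def effective_row(local_row, inherited_row):
    """Overlay locally specified cells on top of the inherited cells."""
    merged = dict(inherited_row)
    merged.update(local_row)
    return merged


# Hypothetical data: the master has a single Char row, and the instance
# overrides only Color in a row (IX 1) that the master does not have.
master_rows = {0: {"Font": "0", "Color": "0", "Style": "1"}}
local_row = {"Color": "4"}
inherited = pick_inherited_row(1, master_rows=master_rows)
print(effective_row(local_row, inherited))
# {'Font': '0', 'Color': '4', 'Style': '1'}
```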

When Visio loads text from an XML file that was created or edited outside of Visio (an untrusted file), it first normalizes the text data: it fills in or adds missing elements, reorders elements that are out of sequence, and removes unused elements.

Following are some of the ways in which Visio normalizes untrusted text data in a VDX file at load time.

When Visio emits an XML for Visio file, all cp, pp, tp, and fld elements in the Text element have IX values that are sequential. In other words, a <cp IX ='0'/> tag always precedes a <cp IX ='1'/> tag and so on.

If Visio encounters run markers that are out of order when it reads an XML for Visio file, it inserts duplicates of the text run rows as needed. For example, say Visio reads the following file:

  <Char IX='0'><Style> 0 </Style></Char>
  <Char IX='1'><Style> 1 </Style></Char>

  <Para IX='0'><HorzAlign> 0 </HorzAlign></Para>
  <Para IX='1'><HorzAlign> 1 </HorzAlign></Para>

  <Text><cp IX='0'/><pp IX='0'/>The quick <cp IX='1'/>brown fox &#xA;
  <pp IX='1'/>jumped <cp IX='0'/>over the lazy dog.</Text>

The code describes a text block that contains three character runs and two paragraph property runs. There is a single paragraph break (&#xA; is the character reference for a linefeed). The block would render like this:

[Figure: The text block "The quick brown fox jumped over the lazy dog." rendered with three character runs and two paragraph property runs.]

In the preceding example, the Char row 0 is referenced twice (<cp IX ='0'/>). Visio normalizes this by creating a third Char row (<Char IX='2'>), which is identical to Char row 0.

When the file is resaved, it contains the third Char row and an additional cp element that marks the third character run, as the following example demonstrates. In addition, the saved XML file contains all the child elements (Font, Size, FontScale, and so on) for all the Char and Para elements.

  <Char IX='0'>...<Style> 0 </Style>...</Char>
  <Char IX='1'>...<Style> 1 </Style>...</Char>
  <Char IX='2'>...<Style> 0 </Style>...</Char>

  <Para IX='0'>...<HorzAlign> 0 </HorzAlign>...</Para>
  <Para IX='1'>...<HorzAlign> 1 </HorzAlign>...</Para>

  <Text><cp IX ='0'/><pp IX='0'/>The quick <cp IX ='1'/>brown fox &#xA;
  <pp IX ='1'/>jumped <cp IX ='2'/>over the lazy dog.</Text>
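The row duplication shown above can be modeled with a short Python sketch. It is not Visio's actual algorithm, which this document does not describe in detail. It only reproduces the outcome of the example: each cp marker, in document order, gets its own row, so a row that is referenced twice is copied. The function name and the list-of-dictionaries representation are assumptions made for illustration.

```python
def normalize_char_runs(rows, marker_refs):
    """Give every cp marker its own sequentially numbered Char row.

    rows: list of Char rows (dicts), indexed by their original IX.
    marker_refs: the IX values of the cp markers, in document order.
    Returns (new_rows, new_refs), where new_refs is 0, 1, 2, ...
    """
    new_rows = []
    new_refs = []
    for ref in marker_refs:
        # Copy the referenced row so that repeated references become
        # separate, identical rows.
        new_rows.append(dict(rows[ref]))
        new_refs.append(len(new_rows) - 1)
    return new_rows, new_refs


# The example above: row 0 is referenced by the first and third markers.
rows = [{"Style": "0"}, {"Style": "1"}]
new_rows, new_refs = normalize_char_runs(rows, [0, 1, 0])
print(new_refs)  # [0, 1, 2]
print(new_rows)  # [{'Style': '0'}, {'Style': '1'}, {'Style': '0'}]
```

One consequence of this simplified approach is that rows no marker references are dropped from the result. Whether Visio treats unreferenced rows the same way in every case is not something this sketch establishes.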

Although it is not an error to omit the marker for the first run, it is recommended that you include it. Visio always emits the initial marker element when the XML for Visio data is round-tripped.

Paragraph run markers (<pp IX='1'/>) and tab run markers (<tp IX='1'/>) are valid only at the beginning of a paragraph. If such a marker is encountered in the middle of a paragraph, it is ignored. A new line, which can be represented by a linefeed or the character reference &#xA;, is required before each pp and tp element that does not begin the text.

Note You do not have to type the &#xA; character reference to create a linefeed. Visio recognizes an actual linefeed from a text editor.
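Before handing a generated file to Visio, it can be useful to check that pp and tp markers appear only at the start of a paragraph. The following Python sketch does this with xml.etree.ElementTree. It assumes a fragment without an XML namespace, and it treats the start of the text block and the position immediately after a linefeed as the only valid places for these markers. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def misplaced_paragraph_markers(text_xml: str):
    """Return (tag, IX) for each pp or tp marker that is mid-paragraph."""
    text_el = ET.fromstring(text_xml)
    seen = text_el.text or ""  # all character data read so far
    misplaced = []
    for child in text_el:
        # A marker is at a paragraph start if no text precedes it or if
        # the preceding text ends with a linefeed.
        if child.tag in ("pp", "tp") and seen and not seen.endswith("\n"):
            misplaced.append((child.tag, child.get("IX")))
        # Count any element content (for example, in fld) and the text
        # that follows the element.
        seen += (child.text or "") + (child.tail or "")
    return misplaced


good = "<Text><pp IX='0'/>One&#xA;<pp IX='1'/>Two</Text>"
bad = "<Text><pp IX='0'/>One <pp IX='1'/>Two</Text>"
print(misplaced_paragraph_markers(good))  # []
print(misplaced_paragraph_markers(bad))   # [('pp', '1')]
```

The XML parser resolves &#xA; to a linefeed character before the check runs, so the character reference and a literal linefeed are handled the same way.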

Visio normalizes untrusted text data when it loads the data into the application. As a result, Visio performs further processing on the normalized data, for example, local override or local delete. When you edit or generate your own XML for Visio files, you will get better performance if you mimic the XML that Visio generates.