August 2012

Volume 27 Number 08

Windows PowerShell - Build User-Friendly XML Interfaces with Windows PowerShell

By Joe Leibowitz | August 2012

The Windows PowerShell scripting language does everything you want a command-line tool to do—and so much more—that it could eventually replace technologies such as VBScript. For a good general description of what Windows PowerShell is about and the basics of using it, see bit.ly/LE4SU6 and bit.ly/eBucBI.

Windows PowerShell is thoroughly integrated with the Microsoft .NET Framework and thus is deeply connected to XML, the current international standard for data exchange using structured text files. For general information about XML, see bit.ly/JHfzw.

This article explores the capacity of Windows PowerShell to present and manipulate XML data with the goal of creating a relatively simple UI for reading and editing XML files. The idea is to make this easier and more convenient using algorithms that “understand” the sibling and parent-child relations of any given file, even without foreknowledge of the schema. I’ll also examine the use of Windows Forms in Windows PowerShell, XPath querying and other related technologies. The proposed app can digest an XML file and emit XPath queries on its own.

Let’s look at how you can analyze any XML file in Windows PowerShell and present it in a format that people without advanced technical skills can use. Figure 1 shows a preview of the type of GUI you can create.

Figure 1 Preliminary View of the GUI

The key to making this happen is to enable the Windows PowerShell application to parse and understand any XML file without human guidance or foreknowledge of its schema. After researching existing technologies for automated analysis of XML files, I decided to develop a parsing engine for this specific purpose, because what I was able to find didn’t fully address the need to understand XML documents without a person first inspecting them. Current applications almost universally seem to assume that a developer or user is well-acquainted with the elements, attributes and overall schema of any given XML document. But some, possibly many, real-world situations fall outside this paradigm. For example, in a scenario with many data consumers who aren’t XML experts but who need access to a variety of XML data sources, the familiarity assumption of the existing paradigm fails. Similarly, even with a trained expert or two on staff, an organization that confronts hundreds or thousands of differently structured XML files could easily overwhelm its human handlers.

Therefore, what’s needed is a parsing engine that will read any XML file and emit XPaths that ordinary users, with only a minimum of training, can use to search and edit any given XML file.

The XML Parsing Engine

To be well-formed XML, a document’s opening and closing tags must match. For example, if an element <ABC> exists, there must also exist at some later point in the same file a closing tag </ABC> (self-closing tags such as <ABC/> are the exception; they’re handled later in this article). Between an opening tag and its closing tag, almost anything can occur, including text and further nested elements. Using this fundamental principle of XML, I’ll show you how to automatically construct a comprehensive series of XPath queries that even relatively inexperienced XML data consumers can quickly put to use to find and manipulate data in XML files.

First, establish a set of arrays to hold all opening and closing brackets in the XML file:

[int[]]$leading_brackets = @()
[int[]]$closing_brackets = @()
[string[]]$leading_value = @()
[string[]]$closing_value = @()

To build a strongly typed array of unknown size in Windows PowerShell, three elements are necessary: the [type[]] leading part; a $name part; and @(), the literal for an empty array, to which items can later be appended. Variables in Windows PowerShell take $ as their leading character. These particular arrays hold the indexed locations of opening and closing angle brackets in the XML document as well as the string values of the element names associated with these brackets. For example, in the XML line <PS1>text value</PS1>, the integer index of the leading bracket would be 0 and the index of the closing bracket (the < of </PS1>) would be 15. The element name in this case is PS1; as the parsing output later in this article shows, the engine stores it together with its opening bracket, as <PS1.
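To make the declarations and the index arithmetic concrete, here’s a minimal sketch that runs the <PS1> example through the same kinds of arrays. It assumes the element sits at the start of the string and uses only String.IndexOf and String.Substring; the variable $line is illustrative.

```powershell
# Strongly typed, initially empty arrays: [type[]] cast, $name, and the empty-array literal @()
[int[]]$leading_brackets = @()
[string[]]$leading_value = @()
[int[]]$closing_brackets = @()

$line = '<PS1>text value</PS1>'

# += builds a new, longer array; the [int[]] / [string[]] constraint on each variable is kept
$leading_brackets += $line.IndexOf('<')                                     # 0
$leading_value    += $line.Substring(0, $line.IndexOf('>'))                 # '<PS1'
$closing_brackets += $line.IndexOf('</' + $leading_value[0].Substring(1))   # 15

"open at $($leading_brackets[0]), name $($leading_value[0]), close at $($closing_brackets[0])"
```

Note that += doesn’t grow the array in place; PowerShell arrays are fixed-size, so each += copies the existing items into a new array. That’s perfectly adequate for files of the size discussed here.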

To get our target XML into memory, we use the following code:

 

$xdoc = New-Object System.Xml.XmlDocument
$xdoc.Load("C:\temp\XMLSample.xml")

Figure 2 is a partial view of the actual XML file being used.

Figure 2 Partial View of the Sample XML File

<?xml version="1.0" encoding="utf-8"?>
<Sciences>
  <Chemistry>
    <Organic ID="C1" origination="Ancient Greece" age="2800 yrs">
      <branch ID="C1a">
        <size>300</size>
        <price>1000</price>
        <degree>easy&gt;</degree>
        <origin>Athens</origin>
        // Text for organic chem here
      </branch>
      <branch name="early" ID="C1b" source="Egypt" number="14">
        <size>100</size>
        <price>3000</price>
        <degree>hard&gt;</degree>
        <origin>Alexandria</origin>
        // Text for original Egyptian science
      </branch>
    </Organic>
  </Chemistry>
<Physics></Physics>
<Biology ID="B" origination="17th century" >
.
.
.
      <Trees4a name="trees4a" age="40000000">
        <type ID="Tda1">oakda</type>
        <type ID="Tda2">elmda</type>
        <type ID="Tda3">oakd3a</type>
      </Trees4a>
    </Plants>
  </Biology>
</Sciences>

After the load operation, this XML data is in memory. In order to manipulate and analyze the XML, I use the document object model that’s now instantiated in the $xdoc variable (but I’ll also need to use the XPathNavigator technology for a few special purposes, as noted later in this article):

# Create an XPath navigator (comments in PowerShell code take the "#" leading character)
$nav = $xdoc.CreateNavigator()
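The following minimal sketch shows both views of a document side by side. It assumes a trimmed-down, inline copy of the Figure 2 sample loaded with XmlDocument.LoadXml rather than a file, so it runs without C:\temp\XMLSample.xml; the XPath used is just an example.

```powershell
# Load a trimmed-down copy of the Figure 2 sample from a string (LoadXml) instead of a file (Load)
$xdoc = New-Object System.Xml.XmlDocument
$xdoc.LoadXml(@'
<Sciences>
  <Chemistry>
    <Organic ID="C1">
      <branch ID="C1a"><size>300</size></branch>
      <branch ID="C1b"><size>100</size></branch>
    </Organic>
  </Chemistry>
  <Physics></Physics>
</Sciences>
'@)

# The DOM view: the document's single root element
$xdoc.DocumentElement.Name                      # Sciences

# The XPath view of the same in-memory document
$nav = $xdoc.CreateNavigator()
$branches = $nav.Select('/Sciences/Chemistry/Organic/branch')
$branches.Count                                 # 2
```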

One of the most interesting features of Windows PowerShell is the built-in function, or cmdlet, called Get-Member, which lets you examine the methods and properties of any object in Windows PowerShell right in the IDE as you develop. Figure 3 includes a call to this cmdlet on the $nav object just created, and Figure 4 shows the results displayed in the Windows PowerShell Integrated Scripting Environment (ISE) when the Get-Help call is made.

Figure 3 Results of Get-Member Call

Get-Member -InputObject $nav
                      TypeName: System.Xml.DocumentXPathNavigator
Name                 MemberType Definition
----                 ---------- ----------
AppendChild          Method     System.Xml.XmlWriter AppendChild(), System.V...
AppendChildElement   Method     System.Void AppendChildElement(string prefix...
CheckValidity        Method     bool CheckValidity(System.Xml.Schema.XmlSche...
Clone                Method     System.Xml.XPath.XPathNavigator Clone()
ComparePosition      Method     System.Xml.XmlNodeOrder ComparePosition(Syst...
Compile              Method     System.Xml.XPath.XPathExpression Compile(str...
CreateAttribute      Method     System.Void CreateAttribute(string prefix, s...
CreateAttributes     Method     System.Xml.XmlWriter CreateAttributes()
CreateNavigator      Method     System.Xml.XPath.XPathNavigator CreateNaviga...
DeleteRange          Method     System.Void DeleteRange(System.Xml.XPath.XPa...
DeleteSelf           Method     System.Void DeleteSelf()
Equals               Method     bool Equals(System.Object obj)
Evaluate             Method     System.Object Evaluate(string xpath), System...
GetAttribute         Method     string GetAttribute(string localName, string...
GetHashCode          Method     int GetHashCode()
.
.
.
.
.
Value                Property   System.String Value {get;}
ValueAsBoolean       Property   System.Boolean ValueAsBoolean {get;}
ValueAsDateTime      Property   System.DateTime ValueAsDateTime {get;}
ValueAsDouble        Property   System.Double ValueAsDouble {get;}
ValueAsInt           Property   System.Int32 ValueAsInt {get;}
ValueAsLong          Property   System.Int64 ValueAsLong {get;}
ValueType            Property   System.Type ValueType {get;}
XmlLang              Property   System.String XmlLang {get;}
XmlType              Property   System.Xml.Schema.XmlSchemaType XmlType {get;}

Figure 4 Results of Get-Help in Windows PowerShell

While Get-Member will often put you on the right track during Windows PowerShell development, you’ll also find the associated Get-Help cmdlet handy during this process.

If you type get-help xml at the command line, as shown in Figure 4, you’ll get the output shown here:

Name                 Category  Synopsis
----                 --------  --------
Export-Clixml        Cmdlet    Creates an XML-based representation of an object or...
Import-Clixml        Cmdlet    Imports a CLIXML file and creates corresponding obj...
ConvertTo-XML        Cmdlet    Creates an XML-based representation of an object.     
Select-XML           Cmdlet    Finds text in an XML string or document.             
about_format.ps1xml  HelpFile  The Format.ps1xml files in Windows PowerShell defin...
about_types.ps1xml   HelpFile  Explains how the Types.ps1xml files let you extend ...

If you type get-help about_types.ps1xml, you’ll see the results shown in Figure 5.

Figure 5 Getting Help with Types.ps1xml Files

TOPIC
    about_Types.ps1xml
SHORT DESCRIPTION
    Explains how the Types.ps1xml files let you extend the Microsoft .NET Framework types of the objects that are used in Windows PowerShell.
LONG DESCRIPTION
    The Types.ps1xml file in the Windows PowerShell installation directory ($pshome) is an XML-based text file that lets you add properties and methods to the objects that are used in Windows PowerShell. Windows PowerShell has a built-in Types.ps1xml file that adds several elements to the .NET Framework types, but you can create additional Types.ps1xml files to further extend the types.
SEE ALSO
    about_Signing
    Copy-Item
    Get-Member
    Update-TypeData

The Windows PowerShell integrated system for researching syntax is comprehensive and relatively easy to use. This is a topic worthy of its own article.

To get the XML into analysis-ready condition, use the Select method of XPathNavigator:

$nav.Select("/") | % { $ouxml = $_.OuterXml }

In the first part of this statement, I invoke .Select with the simple XPath query “/”, which selects the root node and therefore the entire XML content. In the second part, after the Windows PowerShell pipeline symbol (|), I use %, an alias for the ForEach-Object cmdlet; I could have written ForEach-Object instead. Inside the script block, I set the working string variable $ouxml from the OuterXml property of each object coming through the pipeline. OuterXml is one of the properties of the XPathNavigator object (it falls in the elided portion of the Get-Member listing in Figure 3), and it returns the node’s complete markup. That markup includes every angle bracket in the XML file, which the parsing engine needs in order to work properly.
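Here’s a small, self-contained sketch of that statement against an inline document (an assumption, so no file is needed). The exact whitespace of OuterXml can vary with the navigator’s writer settings, so the sketch relies only on the markup being present, not on its layout.

```powershell
# Inline sample instead of C:\temp\XMLSample.xml
$xdoc = New-Object System.Xml.XmlDocument
$xdoc.LoadXml('<Sciences><Chemistry ID="C"/><Physics>text</Physics></Sciences>')
$nav = $xdoc.CreateNavigator()

# "/" selects the root node; % is the alias for ForEach-Object and $_ is each navigator in turn.
# ForEach-Object runs its script block in the caller's scope, so $ouxml is still set afterward.
$nav.Select("/") | % { $ouxml = $_.OuterXml }

# $ouxml now holds the whole document's markup as one string, ready for IndexOf-based parsing
$ouxml
```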

Note that for objects going through a pipeline, $_ refers to the current object, and dot notation gives access to that object’s properties and methods. For example, $_.Name returns the Name property of the current object, if the object has one. Everything passing through a Windows PowerShell pipeline is an object with properties and methods.
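A minimal sketch of $_ in action, assuming a tiny inline document: each element travels down the pipeline as $_, and the script block reads a property (Name) and calls a method (GetAttribute) on it.

```powershell
$xdoc = New-Object System.Xml.XmlDocument
$xdoc.LoadXml('<Organic><branch ID="C1a"/><branch ID="C1b"/></Organic>')

# Each XmlElement in ChildNodes arrives as $_; .Name is a property, .GetAttribute() a method
$ids = $xdoc.DocumentElement.ChildNodes | ForEach-Object { $_.Name + ':' + $_.GetAttribute('ID') }
$ids    # branch:C1a and branch:C1b
```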

A last preparation stage before parsing is to “regularize” the XML text by expanding any self-closing elements that look like <ShortNode/>. The parsing engine expects the same information in the longer format: <ShortNode></ShortNode>. The following code finds these elements with a regular expression and expands each one in place:

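$ms = $ouxml | Select-String -Pattern "<([a-zA-Z0-9]*)\b[^>]*/>" -AllMatches
# 'Regularize' each match to the longer format, e.g. <ShortNode/> becomes <ShortNode></ShortNode>
foreach($m in $ms.Matches){ $ouxml = $ouxml.Replace($m.Value, $m.Value.Substring(0, $m.Value.Length - 2).TrimEnd() + "></" + $m.Groups[1].Value + ">") }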
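As an alternative to the Select-String loop, and not the article’s own code, the -replace operator can expand every self-closing element in one pass. This sketch assumes, like the original pattern, that element names consist only of letters and digits:

```powershell
$ouxml = '<Sciences><Physics/><Chemistry ID="C" /></Sciences>'

# $1 captures the element name and $2 any attributes; the lazy [^>]*? plus \s* drops
# the space (if any) before "/>", and the replacement appends a matching closing tag
$regular = $ouxml -replace '<([a-zA-Z0-9]+)([^>]*?)\s*/>', '<$1$2></$1>'

$regular   # <Sciences><Physics></Physics><Chemistry ID="C"></Chemistry></Sciences>
```

Because [^>] can’t cross a >, the pattern never spans two tags, and closing tags such as </Sciences> are skipped because / isn’t in the name character class.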

You can now look at the main analytical code for this application: the parsing engine that will populate the arrays declared earlier. Figure 6 shows code that tests the file for opening brackets.

Figure 6 Testing a File for Opening Brackets

# If you run out of "<" you're done, so use the $found_bracket Boolean variable to test for the presence of "<"
$Script:ctr = 0
$found_bracket = $true
while($found_bracket -eq $true)
{
  # Special case of the first, or root, element of the XML document;
  # here the script-level variable $ctr equals zero.
  if($Script:ctr -eq 0)
  {
    # To handle the top-level root
    $leading_brackets += $ouxml.IndexOf("<",$leading_brackets[$Script:ctr])
    # $ind marks the end of the root element's name: the first space or ">" after its "<"
    $ind = $ouxml.IndexOfAny([char[]]" >", $leading_brackets[0])
    $leading_value += $ouxml.Substring(0,$ind)
    $closing_brackets += $ouxml.IndexOf("</" + $leading_value[0].Substring(1))
    $Script:ctr+=1
  }
  # The else block that handles every other element goes here
}

The code in Figure 6 handles the special case of the root element of the XML document. Another fundamental rule of XML is that every well-formed document contains exactly one root element; inside that element’s opening and closing tags, the XML data can be structured in any manner consistent with the matching rule mentioned earlier, that is, for every <ABC> there’s a </ABC>.

Notice that the += syntax is used to add an item or element to an array. Later, after being populated with elements, such an array can be accessed via indexing, as in $leading_brackets[3].

In the IndexOf arguments, note that the second parameter, the starting position for the search, refers to $Script:ctr. In Windows PowerShell, variables have distinct scopes that follow from where they’re created. Because the variable $ctr here is created outside any function, it’s a script-level variable, and assigning to it from inside a function creates a new local variable instead unless you use the $Script: prefix. Loops don’t create a new scope, so inside a loop at script level the $Script: prefix isn’t strictly required, but it’s a good habit to keep scope in mind at all times.

When coding, a good clue to a scope violation is a variable that should be changing in value but isn’t; usually, this is because it’s out of scope and needs to be prefixed accordingly.
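The following sketch illustrates the scope rule with two hypothetical functions, Add-LocalOnly and Add-ScriptLevel. It assumes the code runs as a script file (or at the top level of a session), where $ctr is script-level:

```powershell
$ctr = 0   # created outside any function, so it's a script-level variable

function Add-LocalOnly   { $ctr += 1 }          # creates a new local $ctr; the script's $ctr is untouched
function Add-ScriptLevel { $Script:ctr += 1 }   # explicitly targets the script-level $ctr

Add-LocalOnly
Add-LocalOnly
Add-ScriptLevel

# A loop body is not a new scope, so no prefix is needed here
foreach ($i in 1..3) { $ctr += 1 }

$ctr   # 4
```

Add-LocalOnly is exactly the “should be changing but isn’t” symptom: each call quietly updates a throwaway local copy.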

Once the root element is handled, all other elements are handled within one else block:

else
{
  # Check for more "<"
  $check = $ouxml.IndexOf("<",$leading_brackets[$Script:ctr-1]+1)
  if($check -eq -1)
  {
    break
  }

The first thing to do is to check whether the end of the file has been reached; the criterion for that event is the absence of further < symbols. The preceding code does this: if there are no more < symbols, break exits the while loop.

The next segment of code distinguishes between < cases and </ cases: 

#eliminate "</" cases of "<"
if($ouxml.IndexOf("</",$leading_brackets[$Script:ctr-1]+1) -ne `
  $ouxml.IndexOf("<",$leading_brackets[$Script:ctr-1]+1))

Because you’re trying to accumulate all the opening angle brackets, you want to know only about these at this stage of parsing engine operations. Notice the Windows PowerShell syntax for “not equal” in comparisons: -ne. Related operators include -eq, -lt and -gt. Also, as in Visual Basic (but unlike C#), the back-tick symbol (`) serves as a line-continuation character, as at the end of the first line of this condition.

If the test succeeds, populate the $leading_brackets array with a new element:

$leading_brackets += $ouxml.IndexOf("<",$leading_brackets[$Script:ctr-1]+1)

With the newest iteration of leading angle brackets established, the next task is to isolate the name of the associated element. For this task, note that after the initial opening < and element name, <ElementName, there’s either a space followed by one or more attributes, or the brackets close, as in the following two cases:

<ElementName attribute1="X" attribute2 = "Y">, or
<ElementName>

Separate these two cases with the following code, which looks to see which comes first, a space or the > symbol:

$indx_space = $ouxml.IndexOf(" ",$leading_brackets[$Script:ctr])
$indx_close = $ouxml.IndexOf(">",$leading_brackets[$Script:ctr])
if($indx_space -lt $indx_close)
{
  $indx_to_use = $indx_space
}
else
{
  $indx_to_use = $indx_close
}

Once you establish the proper ending point, employ $indx_to_use to help isolate the string associated with the leading angle bracket that’s now in focus:

$leading_value += $ouxml.Substring($leading_brackets[$Script:ctr],($indx_to_use - $leading_brackets[$Script:ctr]))

In effect, the leading value is the substring that starts with < and stops just before the first space or >, such as <ElementName.
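The space-or-> test can be wrapped in a small helper. Get-LeadingValue is a hypothetical name, not part of the article’s code, and the sketch adds one guard that the snippet above lacks: IndexOf returns -1 when no space follows, and -1 would otherwise win the -lt comparison.

```powershell
# Hypothetical helper: return the leading value (e.g. '<ElementName') of the tag
# that starts at index $start, ending at whichever comes first, a space or ">"
function Get-LeadingValue([string]$text, [int]$start)
{
  $indx_space = $text.IndexOf(" ", $start)
  $indx_close = $text.IndexOf(">", $start)
  # Only prefer the space if there actually is one
  if($indx_space -ne -1 -and $indx_space -lt $indx_close) { $indx_to_use = $indx_space }
  else { $indx_to_use = $indx_close }
  return $text.Substring($start, $indx_to_use - $start)
}

Get-LeadingValue '<ElementName attribute1="X" attribute2 = "Y">' 0   # <ElementName
Get-LeadingValue '<ElementName>' 0                                  # <ElementName
```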

The stage is set to pick up the matching closing bracket by finding the string </ElementName:

$closing_brackets += $ouxml.IndexOf("</" + $leading_value[$Script:ctr].Substring(1),`
  $leading_brackets[$Script:ctr]+1)
$Script:ctr+=1

Finally, if the next < turns out to begin a closing </ tag, advance the most recently stored position by one character, so that the next pass of the loop searches from further along:

else
{
$leading_brackets[$Script:ctr-1] +=1
}

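At the end of this process, the three arrays look like the following partial presentation of their data. Because the else branch just shown overwrites stored positions, a leaf element such as <size> ends up with the same value (179 here) in both $leading_brackets and $closing_brackets: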

$leading_brackets:
0 18 62 109 179 207 241 360 375 447 475 509 625 639 681 713 741 775 808 844 900 915 948 976 1012 1044 1077 1142 1154 1203 1292 1329 1344 1426 1475 1490 1616 1687 1701 1743 1810 1842 1890 1904 1941 1979 2031 2046 2085 2138 2153 2186 2235 2250 2315 2362 2378 2442 2476 2524 2539 2607 2643 2718
$leading_value:
<Sciences <Chemistry <Organic <branch <size <price <degree <origin <branch <size <price <degree <origin <Physics <Biology
$closing_brackets:
2718 1687 625 360 179 207 241 273 612 447 475 509 541 1142 900 713 741 775 808 844 882 1129 948 976 1012 1044 1077 1
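For a version you can run and check by hand, here’s a compact sketch of the first phase on a one-line document. It’s a simplification, not the article’s code: it skips </ tags directly instead of nudging stored positions, uses String.IndexOfAny to find the space or >, and, like the original, matches each element to the first </name that follows it, which assumes no element is nested inside another of the same name.

```powershell
$ouxml = '<Sciences><Chemistry><size>300</size></Chemistry><Physics></Physics></Sciences>'

[int[]]$leading_brackets = @()
[string[]]$leading_value = @()
[int[]]$closing_brackets = @()

$pos = $ouxml.IndexOf('<')
while($pos -ne -1)
{
  if($ouxml[$pos + 1] -ne '/')   # only opening tags; "</" closing tags are skipped
  {
    $end = $ouxml.IndexOfAny([char[]]' >', $pos)   # the name ends at a space or ">"
    $name = $ouxml.Substring($pos, $end - $pos)
    $leading_brackets += $pos
    $leading_value += $name
    # Naive match: the first "</name" after this tag
    $closing_brackets += $ouxml.IndexOf('</' + $name.Substring(1), $pos + 1)
  }
  $pos = $ouxml.IndexOf('<', $pos + 1)
}

$leading_brackets -join ' '   # 0 10 21 49
$leading_value -join ' '      # <Sciences <Chemistry <size <Physics
$closing_brackets -join ' '   # 68 37 30 58
```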

Establishing Nodal Relationships

Now it’s time for the second phase in parsing engine operations. In this more complex phase, the sequences of $leading_brackets and $closing_brackets establish the parent-child and sibling relations among all the nodes of the XML being parsed. First, a number of variables are established:

# These variables will be used to build an automatic list of XPath queries
$xpaths = @()
$xpaths_sorted = @()
$xpath = @()
[string]$xpath2 = $null

Next, the first two adjacent pairs of opening and closing brackets are captured:

$first_open = $leading_brackets[0]
$first_closed = $closing_brackets[0]
$second_open = $leading_brackets[1]
$second_closed = $closing_brackets[1]

And some loop counters are created:

$loop_ctr = 1
$loop_ctr3 = 0

The engine iterates no more times than the value of the $ctr variable, which was incremented during the first phase while building the $leading_brackets and other arrays. The following if statement is the litmus test for establishing the nodal structure of the XML:

if($second_closed -lt $first_closed)

If the $second_closed value is less than (-lt) the $first_closed value, a child relationship is established:

<ElementOneName>text for this element
  <ChildElementName>this one closes up before its parent does</ChildElementName>
</ElementOneName>
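Using bracket positions for a tiny document, the litmus test reduces to a single comparison. Test-ClosesFirst is a hypothetical helper; because the arrays are in document order, the second element always opens after the first, so closing first is enough to show nesting:

```powershell
# Closing positions for the tiny document
#   <Sciences><Chemistry><size>300</size></Chemistry><Physics></Physics></Sciences>
# whose opening positions are 0, 10, 21 and 49 (Sciences, Chemistry, size, Physics)
$closing_brackets = @(68, 37, 30, 58)

# Element $second is a child (at some depth) of element $first if it closes first
function Test-ClosesFirst([int]$first, [int]$second)
{
  return $closing_brackets[$second] -lt $closing_brackets[$first]
}

Test-ClosesFirst 0 1   # True:  </Chemistry> at 37 comes before </Sciences> at 68
Test-ClosesFirst 1 2   # True:  </size> at 30 comes before </Chemistry> at 37
Test-ClosesFirst 2 3   # False: </Physics> at 58 comes after </size> at 30
```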

With a child node detected, the variables are reset to the next two adjacent pairs of opening-closing angle brackets, the counters are incremented and the vital $xpath array is populated with a new element:

$first_open = $leading_brackets[$loop_ctr]
$first_closed = $closing_brackets[$loop_ctr]
$second_open = $leading_brackets[$loop_ctr + 1]
$second_closed = $closing_brackets[$loop_ctr + 1]
$loop_ctr2 +=1
#if($loop_ctr2 -gt $depth){$loop_ctr2 -= 1}
$depth_trial+=1
$xpath += '/' + $leading_value[$loop_ctr-1]
$loop_ctr+=1

You’ve now reached the critical processing stage for the parsing engine: What to do when the parent-child relation no longer holds.

A preliminary matter is to eliminate duplicates that arise in the course of parsing engine operations. To do this, the engine reviews $xpaths, the array of all XPath queries built so far (the key value constructed by the parsing engine), element by element, to ensure that it doesn’t already contain the new candidate. The candidate is the current $xpath array normalized into the string $xpath2, as shown in Figure 7.

Figure 7 Checking for Duplicate Xpaths

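$is_dupe = $false
$depth = $xpath.Length
# Normalize the candidate before the loop, so it's also set when $xpaths is still empty:
# assigning the $xpath array to [string]$xpath2 joins its pieces with spaces,
# so strip the spaces and the "<" characters carried over from the leading values
$xpath2 = $xpath
$xpath2 = $xpath2.Replace(" ","")
$xpath2 = $xpath2.Replace("<","")
foreach($xp in $xpaths)
{
  # Stored entries end with a '/////' marker; remove it before comparing
  $xp = $xp.Replace('/////','')
  if($xp -eq $xpath2)
  {
    $is_dupe = $true
    #write-host 'DUPE!!!'
    break
  }
}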

If the current $xpath value is not a duplicate, it’s appended to the $xpaths array and $xpath is recreated as an empty array for its next use:

if($is_dupe -eq $false){$xpaths += ($xpath2 + '/////');}
$xpath = @()
$xpath2 = $null
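A shorter way to express the same duplicate check is PowerShell’s -notcontains operator. The sketch below is an alternative, not the article’s code; it omits the ///// marker that the engine appends, and its candidate arrays are illustrative. It also shows why Figure 7 strips spaces: assigning an array to the [string]-typed $xpath2 joins the pieces with spaces.

```powershell
[string]$xpath2 = $null   # typed [string], so assigning an array joins it with spaces ($OFS)
$xpaths = @()
$candidates = @(@('/<Sciences', '/<Chemistry'), @('/<Sciences', '/<Chemistry'), @('/<Sciences', '/<Physics'))

foreach($xpath in $candidates)
{
  $xpath2 = $xpath                                   # '/<Sciences /<Chemistry'
  $xpath2 = $xpath2.Replace(" ","").Replace("<","")  # '/Sciences/Chemistry'
  if($xpaths -notcontains $xpath2) { $xpaths += $xpath2 }
}

$xpaths   # /Sciences/Chemistry and /Sciences/Physics
```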

The essential device the parsing engine uses to continue through the XML is to rebuild the arrays at each iteration. To accomplish this, the first step is to create new interim arrays as transitional devices:

$replacement_array_values = @()
$replacement_array_opens = @()
$replacement_array_closes = @()
$finished = $false
$item_ct = 0

The engine loops through the $leading_value array and filters out only the first entry whose name matches the element just processed:

foreach($item in $leading_value)
{
if($item -eq $leading_value[$loop_ctr - 1] -and $finished -eq $false)
{
$finished = $true
$item_ct+=1
continue  #because this one should be filtered out
}

Values that aren’t filtered out go into the interim arrays. All three arrays are filled in step, because the array of element names shares its indexing with the opening and closing angle bracket arrays:

$replacement_array_values += $item
$replacement_array_opens += $leading_brackets[$item_ct]
$replacement_array_closes += $closing_brackets[$item_ct]
$item_ct +=1

When the three interim arrays are complete, the three permanent arrays are assigned their new values:

$leading_value = $replacement_array_values
$leading_brackets = $replacement_array_opens
$closing_brackets = $replacement_array_closes
$loop_ctr+=1

The next pass is readied by re-initializing the first two adjacent pairs of angle brackets and resetting the loop counters:

$first_open = $leading_brackets[0]
$first_closed = $closing_brackets[0] 
$second_open = $leading_brackets[1] 
$second_closed = $closing_brackets[1] 
$loop_ctr = 1
$loop_ctr2 = 1
continue  # Sends the engine back to the top of the loop

Finally, to complete the set of XPath queries, you generate short paths that the previously described process might not have included. For instance, in the current example, without this extra last step, the XPath /Sciences/Chemistry would not be included. The underlying logic is to test that every shorter version of every XPath query also exists, without duplicates. The function that performs this step is AddMissingShortPaths, which you can see in the code download for this article (archive.msdn.microsoft.com/mag201208PowerShell).
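AddMissingShortPaths itself is in the code download and isn’t reproduced here. As a rough illustration of the idea only, the hypothetical Add-ShortPaths function below adds every leading prefix of each XPath, skipping duplicates:

```powershell
# Hypothetical stand-in for AddMissingShortPaths: for every XPath, also keep each
# shorter, leading version of it (/Sciences, /Sciences/Chemistry, ...), without duplicates
function Add-ShortPaths([string[]]$paths)
{
  $result = New-Object System.Collections.Generic.List[string]
  foreach($p in $paths)
  {
    $steps = $p.Trim('/').Split('/')
    for($i = 1; $i -le $steps.Length; $i++)
    {
      # Rebuild the path from its first $i steps
      $prefix = '/' + ($steps[0..($i - 1)] -join '/')
      if(-not $result.Contains($prefix)) { $result.Add($prefix) }
    }
  }
  return $result.ToArray()
}

$short = Add-ShortPaths @('/Sciences/Chemistry/Organic/branch', '/Sciences/Biology')
$short   # /Sciences, /Sciences/Chemistry, /Sciences/Chemistry/Organic, /Sciences/Chemistry/Organic/branch, /Sciences/Biology
```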

With all of the automatic XPath queries in hand, you’re ready to build a Windows Forms app for users. In the meantime, the XPath queries just produced are written to the file C:\PowerShell\XPATHS.txt with the Windows PowerShell >> (append) redirection operator.

Constructing the Windows Forms Application

Because Windows PowerShell hosts .NET libraries and classes, the following code makes the .NET Windows Forms and Drawing classes available to your application:

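# LoadWithPartialName is marked obsolete in .NET; Add-Type -AssemblyName System.Windows.Forms is an alternative
[void] [System.Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms")
[void] [System.Reflection.Assembly]::LoadWithPartialName("System.Drawing")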

With these basic building blocks in place, you can build a form and its controls, as follows:

$form= New-Object Windows.Forms.Form
$form.Height = 1000
$form.Width = 1500
$drawinfo = 'System.Drawing'
$button_get_data = New-Object Windows.Forms.button
$button_get_data.Enabled = $false
$text_box = New-Object Windows.Forms.Textbox
$button_get_data.Text = "get data"
$button_get_data.add_Click({ShowDataFromXMLXPathFilter})

It’s worth noting that add_Click is how Windows PowerShell attaches a handler to a .NET event, in this case the button’s Click event; the script block passed to it calls the ShowDataFromXMLXPathFilter function. The code in Figure 8 adds the list box, the button and the text box to the form and positions them.

Figure 8 Adding Buttons and Textboxes

$pointA = New-Object System.Drawing.Point 
$listbox = New-Object Windows.Forms.Listbox 
$form.Controls.Add($listbox)
$listbox.add_SelectedIndexChanged({PopulateTextBox})
$form.Controls.Add($button_get_data)
$form.Controls.Add($text_box)
$pointA.X = 800 
$pointA.Y = 100 
$button_get_data.Location = $pointA
$button_get_data.Width = 100
$button_get_data.Height = 50
$pointA.X = 400 
$pointA.Y = 50 
$text_box.Location = $pointA
$text_box.Width = 800

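To populate $listbox with your collection of XPath queries, strip the five-character ///// marker that was appended to each entry when it was stored in $xpaths, and add a blank row after each one: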

foreach($item in $xpaths)
{
# Drop the trailing '/////' marker; [void] discards the index that Items.Add returns
[void]$listbox.Items.Add($item.Substring(0,$item.Length - 5))
# Each row in the listbox should be separated by a blank row
[void]$listbox.Items.Add('     ')
}

The UI

Figure 9 displays the UI with the XPath queries generated by the tool shown on the left, one of which was selected by the user.

Figure 9 Selecting an XPath Query

In the final step, the user presses the “get data” button and produces the results shown in Figure 10.

Figure 10 The Results Window
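The body of ShowDataFromXMLXPathFilter isn’t shown in the article. As a hedged sketch of what the button does with the selected query, the following assumes the XPath arrives as a string (for example, from the list box’s SelectedItem) and runs it with XmlDocument.SelectNodes against a trimmed inline sample; $selected_xpath and $results are illustrative names.

```powershell
# Trimmed inline sample in place of the loaded file
$xdoc = New-Object System.Xml.XmlDocument
$xdoc.LoadXml('<Sciences><Chemistry><Organic ID="C1"><branch ID="C1a"><origin>Athens</origin></branch><branch ID="C1b"><origin>Alexandria</origin></branch></Organic></Chemistry></Sciences>')

# The query the user picked in the list box
$selected_xpath = '/Sciences/Chemistry/Organic/branch/origin'

# Run the XPath and collect each matching node's markup
$results = foreach($node in $xdoc.SelectNodes($selected_xpath)) { $node.OuterXml }

# One line per match, e.g. for a multiline results text box
$results -join "`r`n"
```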

There you have it—a simple UI for reading and editing XML files, created entirely with Windows PowerShell. In upcoming MSDN Magazine online articles, I’ll continue on this subject by showing you how to handle XML files that use namespaces, as well as demonstrate the use of the techniques shown here to allow editing of XML files via the interface.


Joe Leibowitz is a consultant who specializes in infrastructure projects. He can be reached at joe.leibowitz@bridgewaresolutions.com.

Thanks to the following technical expert for reviewing this article: Thomas Petchel