JeremyFord-8827 asked:

PowerShell unable to convert file contents to XML

Hello everyone. I have a PowerShell script that opens Showcase report files and reads the SQL property. It does this by reading the file contents, converting them to XML, and looking for the TransformedSQL node. This works well, but some files were saved by a prior version, and PowerShell cannot convert their contents to XML because they contain characters that are illegal in XML. I tried removing the illegal characters before converting to the XML data type, but then the conversion fails because it cannot insert child nodes. The files are not corrupt: I can open them with the Report Writer application.

I would like to know how to convert the contents of older Showcase reports to XML when they contain illegal characters. Alternatively, is there another way to extract the TransformedSQL node while keeping the file contents as an Object data type? Any help would be greatly appreciated. Here is a sample of the PowerShell script:

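```powershell
function Repair-XmlString
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true,Position=0)]
        [string]$inXML
    )

    # Match all characters that do NOT belong in an XML document.
    # .NET regex needs \uXXXX for 4-digit code points (\xD7FF would be read as \xD7 + "FF").
    # Code points above U+FFFF are stored as surrogate pairs, so the whole
    # surrogate range \uD800-\uDFFF is kept (unpaired surrogates are not caught).
    $rPattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'

    # Replace said characters with [String]::Empty and return
    return [System.Text.RegularExpressions.Regex]::Replace($inXML, $rPattern, "")
}

# User-Defined Variables
$SourceFolder = "H:\Test\goodBad"
$OutputFile = "H:\Test\CombinedQueries.txt"

# Get all Showcase report files (including ones in subfolders) and cycle through each one
Get-ChildItem -Path $SourceFolder -File -Include "*.rpt", "*.dbq" -Recurse |
ForEach-Object {

    # Read the whole file as ONE string; without -Raw, Get-Content returns an array of lines
    $content = Get-Content -LiteralPath $_.FullName -Raw
    $strippedContent = Repair-XmlString $content
    $xmlContent = [xml]$strippedContent

    # ... (rest of the script: find the TransformedSQL node and write it to $OutputFile)
}
```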
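Writing the legal XML ranges by hand in a .NET regex is error-prone: `\x` takes exactly two hex digits, so a pattern such as `\xD7FF` means `\xD7` followed by the letters "FF", and code points above U+FFFF exist in a .NET string only as surrogate pairs. An alternative sketch is to ask .NET itself which characters are legal, using `System.Xml.XmlConvert.IsXmlChar` and `IsXmlSurrogatePair` (available since .NET Framework 4.0). The function name `Remove-InvalidXmlChar` is hypothetical. Like any character filter, it only removes characters; it does not repair structural problems in the document.

```powershell
# Hypothetical helper (not part of the original script): keep only characters
# that System.Xml.XmlConvert reports as legal in XML 1.0.
function Remove-InvalidXmlChar {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string]$Text
    )

    $sb = New-Object System.Text.StringBuilder
    $i = 0
    while ($i -lt $Text.Length) {
        $c = $Text[$i]
        if ([System.Xml.XmlConvert]::IsXmlChar($c)) {
            # Legal single code unit: tab, LF, CR, U+0020-U+D7FF or U+E000-U+FFFD.
            [void]$sb.Append($c)
            $i++
        }
        elseif (($i + 1 -lt $Text.Length) -and
                [System.Xml.XmlConvert]::IsXmlSurrogatePair($Text[$i + 1], $c)) {
            # High surrogate followed by low surrogate: one code point above U+FFFF.
            # Note the argument order: IsXmlSurrogatePair(lowChar, highChar).
            [void]$sb.Append($c)
            [void]$sb.Append($Text[$i + 1])
            $i += 2
        }
        else {
            # Control character such as 0x01, U+FFFE/U+FFFF, or an unpaired surrogate: drop it.
            $i++
        }
    }
    return $sb.ToString()
}
```

In the loop above, `Remove-InvalidXmlChar -Text $content` could stand in for `Repair-XmlString $content`.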


Here is the error message when it tries to convert to XML:

Cannot convert value "System.Object[]" to type "System.Xml.XmlDocument". Error: "' ', hexadecimal value 0x01, is an invalid character. Line 1, position 2."
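The "System.Object[]" in this message shows that the cast was applied to the array of lines that `Get-Content` returns by default; `Get-Content -Raw` returns a single string instead. To see where each file's first offending character sits before attempting the cast, one possible diagnostic is sketched below. `Find-FirstInvalidXmlChar` is a hypothetical name, and its counting (LF starts a new line, positions are 1-based, a surrogate pair counts as one position) only approximates what `XmlException` reports.

```powershell
# Hypothetical diagnostic: return the line, position and code point of the
# first character that System.Xml.XmlConvert rejects, or nothing if all are legal.
function Find-FirstInvalidXmlChar {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string]$Text
    )

    $line = 1
    $col = 0
    for ($i = 0; $i -lt $Text.Length; $i++) {
        $c = $Text[$i]
        if ($c -eq "`n") { $line++; $col = 0; continue }   # new line, reset column
        $col++
        if ([System.Xml.XmlConvert]::IsXmlChar($c)) { continue }
        # A valid surrogate pair is legal; skip its low half.
        if ([char]::IsHighSurrogate($c) -and ($i + 1 -lt $Text.Length) -and
            [System.Xml.XmlConvert]::IsXmlSurrogatePair($Text[$i + 1], $c)) {
            $i++
            continue
        }
        return [pscustomobject]@{
            Line      = $line
            Position  = $col
            CodePoint = ('0x{0:X2}' -f [int]$c)
        }
    }
}
```

For example, `Find-FirstInvalidXmlChar -Text (Get-Content -LiteralPath $_.FullName -Raw)` inside the `ForEach-Object` block would report the offending character for each file.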

windows-server-powershell

1 Answer

IanXue-MSFT answered:

Hi,

The invalid character $strippedContent[0][1] should be removed.

Best Regards,
Ian Xue



Thank you, IanXue. I tried stripping out the illegal characters (such as $strippedContent[0][1]) with the Repair-XmlString function because PowerShell could not convert the untouched content to the XML data type. However, removing the illegal characters only makes the conversion fail for a different reason.

It appears the old version of Showcase reports builds the XML in a way that PowerShell cannot convert to the XML data type. If I open the report in its native application, Report Writer, and save it, Report Writer asks whether I want to convert it to the newer version. If I say yes, the XML looks normal and has no illegal characters.

The fact that Report Writer can open these files without issue, while PowerShell cannot convert the same file contents to XML, is causing a real problem. Could it have something to do with the text encoding? I know one version uses UTF-8 and the other UTF-16.
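One way to check the encoding question directly is to compare the raw bytes at the start of an old and a new report file. The sketch below (the name `Get-FileHeader` is hypothetical) reports a UTF-8 or UTF-16 byte-order mark if one is present and dumps the first bytes in hex. As a rule of thumb, UTF-16 text without a BOM shows a 0x00 byte next to each ASCII character, while a file that begins with other non-text bytes may not be plain XML at all; which case applies to older Showcase reports is not established in this thread.

```powershell
# Hypothetical helper: report the byte-order mark (if any) and the first bytes
# of a file in hex, to compare old and new report files.
function Get-FileHeader {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path,
        [int]$Count = 16
    )

    # ReadAllBytes resolves relative paths against the process directory,
    # so pass a full path (for example $_.FullName).
    $bytes = [System.IO.File]::ReadAllBytes($Path)

    # Check the common BOMs. A UTF-32 LE BOM (FF FE 00 00) also matches the
    # UTF-16 LE test; that case is not distinguished here.
    $bom = 'none'
    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        $bom = 'UTF-8'
    }
    elseif ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) {
        $bom = 'UTF-16 LE'
    }
    elseif ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) {
        $bom = 'UTF-16 BE'
    }

    # Hex dump of the first $Count bytes (fewer if the file is shorter).
    $n = [Math]::Min($Count, $bytes.Length)
    $hex = ''
    if ($n -gt 0) {
        $hex = ($bytes[0..($n - 1)] | ForEach-Object { $_.ToString('X2') }) -join ' '
    }

    [pscustomobject]@{ Bom = $bom; FirstBytes = $hex }
}
```

If an old file turns out to be UTF-16 without a BOM, reading it with `Get-Content -LiteralPath $_.FullName -Raw -Encoding Unicode` may avoid the stray control characters altogether.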


Sorry, I know little about Showcase Report Writer, but I don't think using another encoding would introduce additional non-printable characters. I tested the Repair-XmlString function, and the 0x01 character was removed.




If I remove all the illegal characters, converting the variable to the XML data type fails with an error saying it cannot insert a child node because the node is not of the same type. I was hoping there was a more lenient XML data type, or a way to simply discard the content that will not convert to XML. All I need is the TransformedSQL node. Alternatively, perhaps I can keep working with the contents as a string: search for TransformedSQL and output everything until the closing tag.
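The string-search idea can be sketched with a regular expression, with no XmlDocument involved. This assumes the SQL sits between literal `<TransformedSQL ...>` and `</TransformedSQL>` tags in text that PowerShell decodes correctly, and that it is stored as escaped text rather than in a CDATA section; `Get-TransformedSql` is a hypothetical name. A regex is not an XML parser, so unusual layouts (a self-closing or commented-out TransformedSQL element, for instance) could confuse it.

```powershell
# Hypothetical helper: pull the text of each <TransformedSQL> element out of a
# raw string with a regular expression, without building an XmlDocument.
function Get-TransformedSql {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string]$Text
    )

    # (?s) lets '.' match line breaks; the lazy '.*?' stops at the first closing tag.
    $pattern = '(?s)<TransformedSQL\b[^>]*>(.*?)</TransformedSQL>'

    foreach ($m in [regex]::Matches($Text, $pattern)) {
        $sql = $m.Groups[1].Value
        # Assumption: the SQL is stored as escaped text (not CDATA), so decode
        # the five predefined XML entities. '&amp;' is decoded last so that
        # '&amp;lt;' becomes the literal text '&lt;' rather than '<'.
        $sql = $sql.Replace('&lt;', '<').Replace('&gt;', '>')
        $sql = $sql.Replace('&quot;', '"').Replace('&apos;', "'")
        $sql = $sql.Replace('&amp;', '&')
        $sql
    }
}
```

In the original loop, the `[xml]` cast could then be replaced by something like `Get-TransformedSql -Text (Get-Content -LiteralPath $_.FullName -Raw) | Add-Content -Path $OutputFile`. Because the regex never builds a document tree, illegal characters elsewhere in the file do not stop the match.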
