Use Regular Expressions to get hyperlinks in blogs

At the Southwest Fox conference I presented a sample that calls a VB.NET COM server from VFP to do regular expression matching.

Here’s the sample I used. It gets the HTML of my blog page, extracts all the hyperlinks (by matching HREF attributes), and puts them into a VFP cursor:

First, create a VB COM server, as described in "A Visual Basic COM object is simple to create, call and debug from Excel".

The VFP code gets the blog page as HTML and passes that HTML to the VB server. The server does the regular expression matching and returns the matches as XML, which XMLTOCURSOR loads into a cursor:

LOCAL oVB as vbcom.ComClass1
oVB=CREATEOBJECT("VBCom.ComClass1")
LOCAL oHTTP as "winhttp.winhttprequest.5.1"
oHTTP=NEWOBJECT("winhttp.winhttprequest.5.1")
oHTTP.Open("GET","http://blogs.msdn.com/calvin_hsia",.f.)  && .F. = synchronous request
oHTTP.Send()
cHTML=oHTTP.ResponseText
cXML=oVB.RegEx(cHTML)  && the VB server returns the matches as XML
XMLTOCURSOR(cXML)  && load the XML rows into a cursor
BROWSE LAST NOWAIT

Add this method to the VB class (this version works in VS 2003). The Imports statements go at the top of the source file:

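Imports System.Text.RegularExpressions
Imports System.Xml

    ' Inside the ComClass1 class:
    Public Function RegEx(ByVal cHtml As String) As String
        ' Match href=, then either a double-quoted value or a run of non-whitespace;
        ' both alternatives capture the URL into group 1
        Dim cregex As Regex = New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
             RegexOptions.IgnoreCase Or RegexOptions.Compiled)
        Dim MatchCollection As MatchCollection = cregex.Matches(cHtml)
        ' Build element-centric XML: one Row per match, one RegEx field per Row
        Dim sb As New System.Text.StringBuilder
        Dim xw As XmlTextWriter = New XmlTextWriter(New System.IO.StringWriter(sb))
        xw.WriteStartElement("VFPData")
        For Each m As Match In MatchCollection
            xw.WriteStartElement("Row") ' for each Row
            xw.WriteStartElement("RegEx") ' field name
            xw.WriteString(m.Value) ' whole match; m.Groups(1).Value would give just the URL
            xw.WriteEndElement()
            xw.WriteEndElement()
        Next
        xw.WriteEndElement()
        xw.Flush() ' make sure everything has been written to sb
        Return sb.ToString
    End Function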
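You can check what the VB method returns without building the COM server by running the rough Python translation below. It is an illustration only, not the code from the demo.

Python's `re` module does not allow two groups with the same name, so the .NET pattern's pair of `(?<1>...)` groups becomes two numbered groups. The helper name `regex_to_vfp_xml` is hypothetical. Like the VB version, it writes the whole match (`m.group(0)`, the counterpart of `m.Value`) into each `RegEx` field.

```python
import re
import xml.etree.ElementTree as ET

# Python port of the .NET pattern: href=, optional whitespace, then either a
# double-quoted value (group 1) or a run of non-whitespace (group 2).
HREF_RE = re.compile(r'href\s*=\s*(?:"([^"]*)"|(\S+))', re.IGNORECASE)


def regex_to_vfp_xml(html: str) -> str:
    """Hypothetical helper mirroring the VB RegEx method: one <Row> per match."""
    root = ET.Element("VFPData")
    for m in HREF_RE.finditer(html):
        row = ET.SubElement(root, "Row")      # one Row element per record
        field = ET.SubElement(row, "RegEx")   # field name
        field.text = m.group(0)               # whole match, like m.Value in VB
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = '<a href="http://example.com/a">A</a> <a HREF=/b >B</a>'
    print(regex_to_vfp_xml(sample))
```

With that sample, the two `RegEx` fields hold `href="http://example.com/a"` and `HREF=/b`. The sample leaves a space after `/b` on purpose. The unquoted branch `\S+` runs to the next whitespace, so markup like `HREF=/b>B</a>` would be captured whole, in .NET as well as in Python.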

Of course, in the demo itself I used a newer version of VB. I ran a SQL-style query (with Order By) over the regular expression results. I also built the XML with XLINQ (the new XML features of LINQ, now called LINQ to XML) and VB XML literals:

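        ' Replaces the XmlTextWriter code in RegEx (needs VB 9 for LINQ and XML literals)
        ' Copy the matches into a generic list so LINQ can query them
        Dim aList As New List(Of Match)
        For Each m As Match In MatchCollection
            aList.Add(m)
        Next
        ' Sort by the matched text (Match.ToString returns the Value)
        Dim res = From p In aList Order By p.ToString() Select p
        ' XML literals build the same VFPData/Row/RegEx shape as before
        Dim xmlMain = <VFPData/>
        For Each item In res
            Dim xRow = <Row/>
            xRow.Add(<RegEx><%= item.Value %></RegEx>)
            xmlMain.Add(xRow)
        Next
        Return xmlMain.ToString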
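The LINQ version changes only the row order. `Order By p.ToString()` sorts on the full matched text (`href="..."`), not on the URL alone. Below is a hypothetical Python equivalent of that step, again a translation for illustration using the same two-group port of the pattern. The names `url` and `sorted_links_xml` and the `by_url` flag are made up here to show the difference between sorting on the whole match and sorting on the captured address.

```python
import re
import xml.etree.ElementTree as ET

# Same two-group port of the .NET href pattern.
HREF_RE = re.compile(r'href\s*=\s*(?:"([^"]*)"|(\S+))', re.IGNORECASE)


def url(m: re.Match) -> str:
    """The captured address: the quoted value (group 1) or the unquoted run (group 2)."""
    return m.group(1) if m.group(1) is not None else m.group(2)


def sorted_links_xml(html: str, by_url: bool = False) -> str:
    """Hypothetical helper: sorted matches as <VFPData><Row><RegEx> XML.

    by_url=False sorts on the whole match, like Order By p.ToString();
    by_url=True sorts on the captured address instead.
    """
    matches = list(HREF_RE.finditer(html))
    key = url if by_url else (lambda m: m.group(0))
    root = ET.Element("VFPData")
    for m in sorted(matches, key=key):
        row = ET.SubElement(root, "Row")
        ET.SubElement(row, "RegEx").text = m.group(0)  # whole match, as in the VB code
    return ET.tostring(root, encoding="unicode")
```

For `'<a href="/zz">x</a> <a href=/aa >y</a>'`, sorting on the whole match puts `href="/zz"` first, because `"` sorts before `/`. Sorting on the URL puts `href=/aa` first. Python's `sorted` compares strings by code point, while LINQ's `Order By` on strings uses .NET's culture-sensitive default comparer. Mixed-case entries such as `HREF=` and `href=` may therefore come out in a different order.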