Become the Master of All You Survey--Using XML as a Flexible Data Capture and Retrieval Medium 

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Become the Master of All You Survey--Using XML as a Flexible Data Capture and Retrieval Medium

Andrew Coates

Customer Relationship Management (CRM) systems often require that a survey or script be played for a particular customer while an agent is on the phone with them. Often, the exact content of the survey or script depends on the type of customer. In this article, Andrew Coates develops a system for capturing and retrieving such data. This technique uses XML as the storage medium for the answers to the questions and displays the enormous flexibility afforded by the use of a standard that is at once both free-form and structured.

I recently was asked to retrofit an existing Customer Relationship Management system with the ability to capture information that my client wanted to gather from a targeted group of his customers via a telephone survey. My first thought was to add a new table to the database with a one-to-one relationship to the main table and a field for each question in the survey, as shown in Figure 1. This was probably the simplest approach, but on further reflection I realized that it lacks flexibility should my client ever want to undertake additional surveys.

My next thought was to still have a table specific to the survey, but to have a one-to-many table allowing the entity to be linked to many survey tables. This would add some flexibility to the system in that new surveys could be added by adding a new table and adding rows to a survey master table. Information about which table referred to which survey could be stored in a survey master table, and information about which companies were eligible for which survey could be stored in a many-to-many table. This approach is illustrated in Figure 2.

The downside to the second approach is that every time a new survey is added, a new table has to be added to the database. In addition, if a survey needs fine-tuning by adding or changing questions, then the data structure needs to be updated. While these aren't insurmountable problems, I'd rather not have to muck around with the data structure every time my client wants to tweak the survey.

What I needed was a flexible way of storing answers to survey questions that could still be retrieved in a structured way. I decided to store the answers directly in a text (or memo) field in the many-to-many table. To make sure that the data was still retrievable in a sensible manner, I decided to use XML as the storage format within the text field. The final layout of my database is shown in Figure 3.

Data—the object of my desire

The architecture I decided on for this project was one of data objects. Each set of answers to a survey is represented by a data object that has a property for each answer. The object is then converted to and from an XML representation using Rick Strahl's ObjectToXML and XMLToObject from his (free) wwXML library. I've included the version of wwXML (with permission—thanks, Rick) that was current at the time of this writing in the accompanying Download file, but I strongly suggest that you visit Rick's site (www.west-wind.com) to check whether there's a more recent version. Rick's constantly adding great stuff both to existing libraries and as completely new sets of useful things.

As a companion to the data objects, there's a GUI class. This class instantiates the appropriate data object and then the data object's properties as control sources for its various controls. The GUI class itself does no data manipulation or handling (apart from displaying the data and allowing the user to enter or change answers). Instead, it calls the methods of the data object, which knows how to load and save itself from and to the XML data on the back end.

Data object abstract class

Both the data object and the GUI object are defined in the first instance as abstract classes—that is, classes that are only ever subclassed, never themselves instantiated. The data class standard properties and methods are shown in Table 1. ******

Table 1. Methods and properties of the data object abstract class.

Property/Method

Description

LoadData()

Loads the XML data from the back end for the SurveyID/CompanyID combination specified in the properties.

SaveData()

Persists an XML representation of the object's properties to the back end.

CreateCursor()

Creates a cursor with a field of the same name and of the appropriate type for each data property of the class.

cComments

Standard free text field available in all surveys.

cConnectString

Connect string for use when instantiating the object and connecting to a remote back end.

nCompanyID

ID of the company to which this set of answers applies.

nSurveyID

ID of the survey from which this set of questions is taken.

tLastUpdated

Last date/time this survey was saved for this CompanyID/SurveyID combination. Note that the default value is an empty time. You can't just use {} for this, as this is interpreted as an empty date and the parser chokes on time values later on. You need to convert the empty date into an empty time like this: =DTOT({}).

The code for the Init() method is shown in Listing 1.

****

Listing 1. The data object's Init() method.

  LPARAMETERS tnCurrentConnection, tcConnectString

IF VARTYPE(tcConnectString) = 'C' AND ;
    ! EMPTY(tcConnectString)
  THIS.cConnectstring = tcConnectString
ENDIF

IF VARTYPE(tnCurrentConnection) = 'N' AND ;
    tnCurrentConnection > 0
  THIS.nCurrentConnection = tnCurrentConnection
ELSE
  THIS.nCurrentConnection = ;
    SQLSTRINGCONNECT(THIS.cConnectstring)
  THIS.lAutoClose = .T.
ENDIF

* check that we made it onto the server
IF THIS.nCurrentConnection < 1
  LOCAL laError[1]
  AERROR(laError)
  MESSAGEBOX('Could not connect to the SQL ' + ;
    'Server. The ODBC Error message was:' + CR + ;
    laError[1,2], MB_ICONEXCLAMATION, ;
    'Unable to display survey')
  RETURN .F.
ENDIF


* create a wwXML object for use in loading and
* unloading
IF ! 'wwxml' $ LOWER(SET('classlib'))
  SET CLASSLIB TO LOCFILE('wwxml.vcx') ADDITIVE
ENDIF
THIS.owwXML = CREATE('wwXML')
IF VARTYPE(THIS.owwXML) # 'O'
  MESSAGEBOX('Could not create wwXML helper class', ;
    MB_ICONEXCLAMATION, 'Unable to display survey')

  IF THIS.lAutoDisconnect AND ;
  THIS.nCurrentConnection > 0
    SQLDisconnect(THIS.nCurrentConnection)
    THIS.nCurrentConnection = -1
  ENDIF

  RETURN .F.

ENDIF

When the object is instantiated, the Init code accepts two parameters. The first parameter is an integer representing a handle of an already established connection to a back-end data source via SQLCONNECT() or SQLSTRINGCONNECT(). This gives the object an opportunity to share a connection and thus reduce the resource requirements of the application. The second parameter (which is only used if the connection handle passed is non-numeric or <= 0) allows the overriding of the SQL connect string used to establish a connection to the back end if there's no handle passed.

If there's no connection handle passed, the data object's first job is to establish a connection to the back end. If it can't do this, there's no use continuing and it bails out. Next it attempts to create an instance of the wwXML helper class. This class is vital for loading and saving the data to the back end, so again, if it can't create an instance, it just bails out.

The grunt work of the object is done by the LoadData() and SaveData() methods, shown in Listing 2 and Listing 3, respectively.

****

Listing 2. Loading data from the back end.

  lParameters tcXML

WITH THIS

  local lcXML
  if vartype(tcXML) # 'C' or empty(tcXML)
    * check that the companyID is in the list for this
    * survey
    LOCAL lcSQL
    lcSQL = [select count(*) as matching from ] + ;
      [company_survey where SurveyID = ] + ;
      TRAN(.nSurveyID) + [ and companyid = ] + ;
      TRAN(.nCompanyID)

    sqlexec(.nCurrentConnection, lcSQL, 'countem')

    LOCAL lnMatches
    lnMatches = countem.matching
    USE IN countem

    IF lnMatches <> 1
      MESSAGEBOX('This company is not part of this' +;
        ' survey', MB_ICONEXCLAMATION, ;
        'Unable to display survey')

      RETURN .F.

    ENDIF

    LOCAL lcSQL
    lcSQL = [select Last_Update, Survey_Answers ] + ;
      [from company_survey where companyid = ] + ;
      TRAN(.nCompanyID) + [ and SurveyID = ] + ;
      TRAN(.nSurveyID)
    lnResult = SQLEXEC(.nCurrentConnection, lcSQL, ;
      'Answers')
    IF lnResult = -1
      LOCAL laError[1]
      AERROR(laError)

      MESSAGEBOX('Error loading data from SQL ' + ;
        'Server. The ODBC Error information is:' + ;
        CR + laError[1, 2], MB_ICONEXCLAMATION, ;
        'Unable to display survey')

      RETURN .F.

    ENDIF

    IF RECCOUNT('answers') = 1 AND ;
      ! EMPTY(answers.Survey_answers)
      LOCAL lcXML
      lcXML = ALLTRIM(answers.Survey_answers)
    ELSE
      lcXML = ''
    ENDIF

    USE IN answers
  else
    lcXML = tcXML
  endif
  .owwXML.XMLToObject(lcXML, this)
ENDWITH

****

Listing 3. Saving the answers back to the database.

  WITH THIS
  PRIVATE lcXML, lcSQL, lnResult, ltUpdateTime
  * create an XML string based on the state of the
  * data object
  lcXML = .owwXML.ObjectToXML(THIS)

  ltUpdateTime = DATETIME()
  
  * build the SQL string with the option to update
  * or insert
  
  lcSQL = [update company_survey ] + ;
    [set survey_answers = ?lcXML, ] + ;
    [Last_Update = ?ltUpdateTime ] + ;
    [where CompanyID = ] + TRAN(.nCompanyID) + [ ] + ;
    [and SurveyID = ] + TRAN(.nSurveyID)
  

  * run the SQL
  lnResult = SQLEXEC(.nCurrentConnection, lcSQL)

  * check the result
  IF lnResult # 1
    LOCAL laError[1]
    AERROR(laError)
    MESSAGEBOX('Could not save answers. The ODBC ' + ;
      'error message was:' + CR + laError[1, 2], ;
      MB_ICONEXCLAMATION, ;
      'Unable to save survey results')
    RETURN .F.
  ENDIF
ENDWITH

If the method is passed XML as a parameter, it just uses that to load the values into the properties of the object (we'll get to why this is useful a little later on). To load data from the back end, the method first checks that the company is listed for this survey. It then generates a SQL statement to retrieve any currently stored data for this company/survey combination. Finally, if there were already answers stored in the XML field, then the values are transferred to the data object's properties with the single line:

  .owwXML.XMLToObject(lcXML, this)

This is the powerhouse of the method. This single line of code transfers each property's stored value from the XML. There are a couple of cool things to note here:

  • If the property's been changed or added since the data was persisted to the table, the default value is used—there's no requirement to go back through the data retrofitting changes to the survey. (Of course, it's possible you may want to do some translations for business reasons, but the architecture we've used means that it's not required from a technical standpoint.)
  • Deleting a property from the data object means that the data for that property simply won't appear in the new version of the object, and next time the object is persisted to the table, the deleted data will simply disappear.

Saving the properties in a data object is just as simple as loading them. Again the powerhouse of the method is a single line:

  lcXML = .owwXML.ObjectToXML(THIS)

This takes the properties in the current object and converts them to an XML representation. A sample of the XML generated by this call is shown in Listing 4.

****

Listing 4. Survey answers stored in XML format.

  <?xml version="1.0"?>
<xdoc>
  <class>
    <ccomments></ccomments>
    <csource>OTH</csource>
    <csourceother>Bus stop poster</csourceother>
    <lfoxpro>True</lfoxpro>
    <lsqlserver>False</lsqlserver>
    <lvb>False</lvb>
    <ncompanyid>1</ncompanyid>
    <nsex>2</nsex>
    <nsurveyid>1</nsurveyid>
    <tlastupdated>2001-10-05T17:33:10</tlastupdated>
  </class>
</xdoc>

The rest of the method simply writes the XML string into the text field. Note that I've used a parameterized update command:

  lcSQL = [update company_survey ] + ;
  [set survey_answers = ?lcXML, ] + ;
  [Last_Update = ?ltUpdateTime ] + ;
  [where CompanyID = ] + TRAN(.nCompanyID) + [ ] + ;
  [and SurveyID = ] + TRAN(.nSurveyID)

The ?lcXML tells the ODBC connection to ask VFP for the value of the lcXML variable. Using this construct eliminates a large number of issues with encoding of illegal characters and so on. All of that is handled behind the scenes by the ODBC driver interface. Similarly, ?ltUpdateTime tells ODBC to ask VFP for the value of the ltUpdateTime variable. Dates are another great source of incompatibility between various back-end data sources. Letting the ODBC driver do the conversion and formatting in this way eliminates a potential headache if you change back ends.

Adding a new survey

The steps for adding a new survey are:

  1. Decide what questions are to be asked (or, perhaps more accurately, which answers will be recorded) and what type their answers will be.
  2. Subclass the surveybasedata class and add a property initialized to a value of the appropriate type for each answer. Override the CreateCursor() method to create a cursor with columns with the same names as the properties just added (don't forget to add the four properties from the data base class—nSurveyID, nCompanyID, cComments, and tLastUpdated).
  3. Update the nSurveyID property for the data object subclass so this survey has a unique number.
  4. Create a subclass of the surveybase class that will present the survey to the user.
  5. Add a control or controls to the subclassed form with controlsources in the form thisform.oDataObject.<data object property>
  6. If there are any dependencies between the questions (for example, questions that should only be asked if others were answered in a certain way), then put code in the UpdateControls() method to change control states. Note that this method is called after the data object's LoadData() method is called, so the control's states are set initially. You should also call this method whenever there's a potential need for a refresh. For example, if a check box state changes to true, another set of questions might become relevant.
  7. Update the cDataObjectName property with the name of the subclass you created in step 2. This ensures that the UI object instantiates the correct data object when it's created.
  8. Make an entry in the Survey_Master table with the SurveyID from step 3, a description for the survey, the name of the data class from step 2, and the name of the UI class from step 4.
  9. For each company that's eligible for a survey, add a row to the Company_Survey table with that company's CompanyID and the SurveyID from step 3.

While this might seem to be a lot of work, remember that you only need to do it once for each survey.

Putting it into practice

For the purposes of this exercise, I designed a simple survey with the questions shown in Table 2. ******

Table 2. The survey questions.

No.

Question

Comments

1

How did you hear about our company?

C(3) options are:
WOM: Word of mouth
YP: Yellow Pages
WWW: Web search
ADT: TV advertisement
ADR: Radio advertisement
OTH: Other

1a

If Other—where did you hear about us?

C(20)
Only available if OTH selected for question 1

2

Gender?

I
1: Male
2: Female

3

Do you use SQL Server?

L

4

Do you use FoxPro (Visual or otherwise)?

L

5

Do you use Visual Basic (not VBA or VBS)?

L

Next, I created a subclass of surveybasedata called surveycustomerdata and added properties for each of the questions. The subclass is included in the Download file, but the properties added were: cSource, cSourceOther, nSex, lSQLServer, lFoxPro, and lVB. I also overrode the CreateCursor() method to create a cursor with a field of matching name and data type for each property.

I assigned this new survey an ID of 1.

The next step was to create a subclass of the surveybase GUI class. I called this subclass surveycustomer.

I added controls to the subclassed form—a drop-down for the source, a text box for the other source description, an option group for the gender, and check boxes for each of the development tool questions. These controls were in addition to those provided by the base class—companyID, last update date/time, and comments, as well as the Save and Cancel buttons.

Because the other source text box should only be available if the user chooses Other from the source drop-down, I added enable/disable code to the UpdateControls() method that checks the value of the drop-down and takes appropriate action.

I set the value of the cDataObjectName property to "surveycustomerdata" so the correct data object is instantiated by the GUI class.

I updated the survey_master table by adding a row with the values SurveyID = 1, Survey_Name = "Generic Customer Survey," Data_Class = "surveycustomerdata," UI_Class = "surveycustomer."

Finally, I added two new rows to the Company_Survey table, one each for CompanyIDs 1 and 2, both with SurveyID 1.

To display the survey, I typed the following in the command window:

  SET CLASSLIB TO survey addi
os = CREATEOBJECT('surveycustomer', 1, .t.)
os.Show

Note that the second and third parameters of the CreateObject() call are passed to the Init() method of the GUI object. In this case, the 1 means that I want to load the survey data for CompanyID 1, and the .t. means that I want to open the form in edit mode (rather than just viewing the data). The result is shown in Figure 4.

Answering the questions and clicking the Save button fires the form class's SaveSurveyData() method, which in turn fires the data object's SaveData() method. The XML generated and saved to the memo field is shown in Listing 4.

Retrieving the data

"But wait," I hear you cry. And you're right. Storing data in this format doesn't make querying and retrieval a simple matter of running a SQL statement. The data is stored in a free-form memo field, and most of us have had experience with how much of a hassle it is to retrieve data from there. This is where the power of the XML format (and one of Visual FoxPro's most useful commands) comes to the fore. It's a simple matter to retrieve all of the data from the text field into a cursor, and once it's in the cursor the whole might of the native data engine is available to you.

The code in Listing 5 (included as ExtractData.PRG in the Download file) shows how simple it is to retrieve all of the data from the XML memo fields into a single usable cursor.

****

Listing 5. Retrieving VFP data from the memo field quagmire.

  IF ! 'survey' $ LOWER(SET("Classlib"))
  set classlib to locfile('survey.vcx') addi
ENDIF

local loDataObject
loDataObject = create('surveycustomerdata')

WITH loDataObject
  local lnHand
  lnHand = SQLSTRINGCONNECT(.GetConnectString())

  local lcSQL
  lcSQL = [select companyid, Last_Update, ] + ;
    [survey_answers ] + ;
    [from company_survey ] + ;
    [where surveyid = 1 and ] + ;
      [survey_answers not like '']
  sqlexec(lnHand, lcSQL, 'AnswerList')

  .CreateCursor()

  select AnswerList
  scan
    .LoadData(AnswerList.Survey_Answers)
    select ResultSet
    append blank
    gather name loDataObject memo
    select AnswerList
  endscan

  SQLDISCONNECT(lnHand)
  
ENDWITH

After checking that the appropriate class library is loaded, the extraction program instantiates a data object and connects to the back end using the data object's connect string. It then retrieves a cursor containing all of the rows with answers to survey questions. Next it calls the data object's CreateCursor() method, which generates a cursor with a column of the appropriate type and name for each data object property.

Scanning through the list of answer text fields, the code then passes the XML for each set of answers to the data object's LoadData() method. Remember when I said that the tcXML parameter of the LoadData() method would come in handy? Well, here it is. It means that there's no requirement to go back to the back end to retrieve the answers for each company. We can just get them all into a cursor with one back-end hit and then use the local data engine to scan through them and pass the XML to the data object.

Once the data object has been loaded up with the answers, it's time to add a row to the results table and populate it with the date. Two simple but very powerful lines of code do this:

  append blank
gather name loDataObject memo

GATHER NAME is a wonderful command. It makes the process so much simpler. The alternative would be to iterate through all of the properties of the object and do a REPLACE for each one with the corresponding field in the cursor. I haven't benchmarked this, but I imagine that having a native command to do this results in significant efficiencies.

After scanning through the entire list of answers, this code will leave you with a cursor called ResultSet, which has a row for every company and a column for every answer. From there, the reporting process is up to you.

Making changes to a survey

The last thing I want to mention is how much flexibility this approach gives you. Let's say that in our example you want to add a new question about the respondent's income range and another about the number of computers at their primary working location. Let's also suppose that you discovered the 30 characters you'd allocated to the other source field was too small—40 would be better—and that your boss is no longer interested in the answer to whether people are using VB. The following is all you'd have to do:

  1. Add a property called cIncomeRange and another called nComputerCount to the surveycustomerdata class.
  2. Remove the lVB property from the same class.
  3. Update the surveycustomerdata::CreateCursor() method to include cIncomeRange C(3) and nComputerCount I. Remove lVB and change the length of the cSourceOther field from 30 to 40.
  4. Update the surveycustomer class to include new controls for the income range (probably a drop-down like the source drop-down) and the number of computers (probably a spinner or text box). Ensure that their control sources are set to the matching properties of the data object. Remove the VB check box.

That's it! No data changes are required either to capture or to retrieve the data. The next time a survey is opened that was completed using the old format, the default values will be used for the new properties, the VB property will be ignored, and the existing 30 characters will be used for the other source field. As soon as it's saved, the data will be in the new format.

Extra credit

You can use the metadata stored in the survey_master table to build a list of the surveys to which a particular company is "subscribed." You could present a list to the user, including the description of the survey and the data that was last completed for this company in a GUI. The UI_Class field would then allow you to instantiate an appropriate form for viewing or editing survey responses.

It's quite simple to change the back-end database used for this class. In fact, I developed the system using a SQL Server back end. All you need to do is change the cConnectString property of the surveydatabase class to one that's appropriate for your back end of choice and then set up the tables with the structure shown in Figure 3 on that back end. That's it—the conversion's complete.

Conclusion

Using XML to store data in a memo field provides an extremely flexible architecture while allowing the structured retrieval of the data. While it's not essential that XML be used (it could have just as well been a proprietary text or even binary format in the memo fields), the fact that XML is a standard for this type of work means that tools like Rick Strahl's wwXML library make working with the format simple and quick. I encourage you to get comfortable with this powerful data exchange format.

To find out more about FoxTalk and Pinnacle Publishing, visit their website at http://www.pinpub.com/html/main.isx?sub=57

Note: This is not a Microsoft Corporation website. Microsoft is not responsible for its content.

This article is reproduced from the November 2001 issue of FoxTalk. Copyright 2001, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. FoxTalk is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-493-4867 x4209.

© Microsoft Corporation. All rights reserved.