May 2012

Volume 27 Number 05

The Working Programmer - Talk to Me, Part 3: Meet the Therapist

By Ted Neward | May 2012

In the first part of this series (msdn.microsoft.com/magazine/hh781028), I built a simple voice-input system over the phone using the Tropo cloud-hosted voice/SMS system. It wasn’t too complicated, but it showed how to use the Tropo scripting API, hosted on the Tropo servers, to receive phone calls, present a menu, gather response input and so on.

In the second column (msdn.microsoft.com/magazine/hh852597), I took a step sideways and talked about Feliza, a “chat-bot” in the spirit of the original “ELIZA” program, designed to take user text input and respond to it in a manner similar to what we might hear while reclining on a psychologist’s couch. Again, “she” wasn’t all that sophisticated, but Feliza got the point across, and more important, demonstrated how easily the system could be extended to get a lot closer to passing the Turing test.

It seems natural, then, to take these two pieces and weld them together: Let Tropo gather the voice or SMS input from the user, feed it to Feliza, let her calculate a deep, thoughtful response, send it back to Tropo and have Tropo in turn feed it back to the user. Unfortunately, a significant disconnect prevents that from being as easy as it sounds. Because we’re using the Tropo Scripting API, our Tropo app is hosted on its servers, and Tropo isn’t opening up its servers to host an ASP.NET app, much less our custom Feliza binaries (which, as of last column, are just a set of Microsoft .NET Framework DLLs).

Fortunately, Tropo realized that being able to do voice and SMS by themselves wasn’t going to really cut it among the business-savvy developer crowd, and it offers the same kind of voice/SMS access, but over HTTP/REST-like channels. In other words, Tropo will take the incoming voice or SMS input, pass it to a URL of your choice, then capture the response and … well, do whatever the response tells it to (see Figure 1).

Figure 1 Tropo-Hosted API Call Flow

True, this adds another layer of network communication to the whole system, with all the failover and performance concerns that another network round-trip entails. But it also means that we can capture the input and store it on any server of our choice, which could very well be a significant concern for certain applications—security, database access and so on.

So let’s take another step sideways and figure out how Tropo does this little HTTP dance.

Hello, Tropo … from My Domain

The place to begin is with a simple “Hello world”-style access. Tropo, like many Internet APIs, uses HTTP as the communication channel and JSON as the serialized format of the data being sent. So the easiest thing to do is build a simple, static JSON object for Tropo to request when a phone number is called, saying “Hello” to the caller. The JSON for doing that looks like this:

{
  "tropo": [
    {
      "say": {
        "value":"Hello, Tropo, from my host!"
      }
    }
  ]
}

On the surface, the structure is fairly simple. The JSON object is a single-field object, the field “tropo” storing an array of objects that each tell Tropo what to do; in this case, it’s a single “say” command, using the Tropo text-to-speech engine to say, “Hello, Tropo, from my host!” But Tropo needs to know how to find this JSON object, which means we need to create and configure a new Tropo application, and we need a server that Tropo can find (meaning it probably can’t be a developer laptop hiding behind a firewall). That second point is easily fixed via a quick trip to your favorite ASP.NET hosting provider (I used WinHost—its Basic plan is perfect for this). The first requires a trip back to the Tropo control panel.

This time, when creating a new application, choose “Tropo WebAPI” instead of “Tropo Scripting” (see Figure 2), and give it the URL by which to find that particular JSON file; in my case, I created feliza.org (in anticipation of the steps after this) and placed the file at the root of the site. Fully configured, it looks like Figure 3.

Figure 2 The Application Wizard

Figure 3 The Configured Application

Although Tropo was happy to hook up a Skype and Session Initiation Protocol (SIP) number for us, we still have to hook up a standard phone number manually. I did that while you weren’t looking, and that number is 425-247-3096, in case you want to take a moment and dial the server.

And that’s it! Sort of.

If you’ve been building your own Tropo service alongside me, you’re not getting any kind of response from the phone when dialing in. For cases like this, Tropo provides an Application Debugger that shows the logs from your Tropo app. (Look in the blue bar at the top of the page.) The log shows something like the following: “Received non-2XX status code on Tropo-Thread-8d60bf40bc3409843b52f30f929f641c [url=https://www.feliza.org/helloworld.json, code=405].”

Yep, Tropo got an HTTP error. Specifically, it got a “405” error, which (for those who haven’t memorized the HTTP spec yet) translates to “Method Not Allowed.”

To be honest, calling Tropo a REST service is something of a misnomer, because it doesn’t really follow one of the cardinal rules of REST: the HTTP verb describes the action on the resource. Tropo doesn’t really care about the verb; it just POSTs everything. And that’s why the host is (correctly) rejecting the HTTP POST request with a 405: a static file isn’t POSTable. Oy.

Fortunately, we know a technology that fixes that pretty easily. At this point, we create an ASP.NET MVC app (an Empty one is fine), and give it a route that takes “/helloworld.json” and maps it to a simple Controller, as shown in the following code (with much non-relevant code omitted):

using System.Web.Mvc;
using System.Web.Routing;

namespace TropoApp
{
  public class MvcApplication : System.Web.HttpApplication
  {
    public static void RegisterRoutes(RouteCollection routes)
    {
      // Map the literal URL "helloworld.json" to HelloWorldController.Index
      routes.MapRoute("HelloWorld", "helloworld.json",
        new { controller = "HelloWorld", action = "Index" });
    }
  }
}

… which in turn just returns the static JSON for our HelloWorld, as shown here (with much non-relevant code omitted):

using System.Web.Mvc;

namespace TropoApp.Controllers
{
  public class HelloWorldController : Controller
  {
    public const string helloworldJSON =
      "{ \"tropo\":[{\"say\":{\"value\":\"Hello, Tropo," +
      " from my host!\"}}]}";

    // Tropo POSTs its requests, so the action must accept POST as well as GET
    [AcceptVerbs("GET", "POST")]
    public string Index() {
      return helloworldJSON;
    }
  }
}

Push this up to the server, and we’re golden.

Say, Say, Say …

If the “say” in the JSON tickles your memory just a little bit, it’s because we ran into it during the earlier exploration of the Tropo Scripting API. Back then, it was a method we called, passing in a series of name/value pairs (in true JavaScript fashion) of parameters describing how to customize the spoken output. Here, because we don’t have the ability to call APIs on the server—remember, this JSON file is hosted on my server, not the Tropo cloud—we have to describe it in a structural form instead. So, if we want a different voice talking to the user, we need to specify that as a field in the “say” object:

{
  "tropo":[
    {
      "say":
      {
        "value":"Hello, Tropo, from my host!",
        "voice":"Grace"
      }
    }
  ]
}

Now, Grace (who’s described as “Australian English”) will greet us in the name of Tropo. The full details of “say” are described in the Tropo API docs on its Web site, as are all the JSON objects being passed back and forth.

Here’s where using ASP.NET really shines: Rather than try to build up these strings of JSON in the code, we can use the implicit object-to-JSON bindings in ASP.NET to make it easy to slam out these JSON objects (see Figure 4).

Figure 4 Using .NET Framework Object-to-JSON Bindings

public static object helloworld =
  new { tropo =
    new[] {
      new {
        say = new {
          value = "Hello, Tropo, from my host!",
          voice = "Grace"
        }
      }
    }
  };
[AcceptVerbs("POST")]
public JsonResult Index()
{
  return Json(helloworld);
}

The JSON sent must have its field names and string values quoted with double quotes, unlike JavaScript source, where either single or double quotes will do. Using the object-to-JSON bindings makes all of that entirely irrelevant to the application developer. Nice. (Note: Tropo also provides a client library for C# that abstracts away much of the JSON stuff, but I’m focusing on REST calls “by hand” because this also helps show how to do the same kind of thing with ASP.NET MVC in general; see bit.ly/bMMJDv for details.)
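
To see the quoting rule in action outside of a Web project, here is a minimal console sketch. It is an assumption on my part, not the code behind the column: it swaps in System.Text.Json (the serializer in current .NET) for the MVC Json() helper, and the TropoJsonDemo class and BuildHello method are names made up for illustration.

```csharp
using System;
using System.Text.Json;

public static class TropoJsonDemo
{
    // Builds the same "say" payload as Figure 4 from anonymous types.
    // ASSUMPTION: System.Text.Json stands in for the MVC Json() helper here.
    public static string BuildHello()
    {
        var payload = new
        {
            tropo = new[]
            {
                new { say = new { value = "Hello, Tropo, from my host!", voice = "Grace" } }
            }
        };

        // The serializer writes every property name and string value
        // in double quotes, which is what strict JSON requires.
        return JsonSerializer.Serialize(payload);
    }

    public static void Main()
    {
        Console.WriteLine(BuildHello());
    }
}
```

Nowhere does the C# code spell out a quote character for the JSON; the serializer takes care of it, which is the whole appeal of the object-to-JSON approach.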

Listen to the Sound …

The point of Feliza isn’t just to spew out random cookie-cutter bits of psychological tripe, though. She needs to hear the user’s spoken input, analyze that and then spew out random cookie-cutter bits of psychological tripe. In order to do that, we have to be able to process the incoming POSTed JSON object that Tropo will send us. Doing so is relatively easy: the incoming data is a JSON object, and ASP.NET MVC has some nice auto-JSON-to-object bindings for handling it. The outgoing half is the “ask” structure (described at bit.ly/yV5ect), which will say something, then pause and wait for input. So, for example, to post a question to the user and have the answer routed to a different JSON endpoint, we’d want an “ask” like that in Figure 5 (as seen in the Tropo docs).

Figure 5 An “ask” Example

{
  "tropo": [
    {
      "ask": {
        "say": [
          {
            "value": "Please say your account number"  
          }
        ],
        "required": true,
        "timeout": 30,
        "name": "acctNum",
        "choices": {
          "value": "[5 DIGITS]"
        } 
      } 
    },
    {
      "on":{
        "next":"/accountDescribe.json",
        "event":"continue"
      }
    },
    {
      "on":{
        "next":"/accountIncomplete.json",
        "event":"incomplete"
      }
    }
  ] 
}
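
The same document can be produced from C# objects rather than hand-written JSON. The sketch below is one way to do it, under a couple of assumptions: it uses System.Text.Json instead of the MVC Json() helper so that it runs as a plain console program, and the AskPayload class and Build method are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class AskPayload
{
    // Builds the Figure 5 document from anonymous types.
    // ASSUMPTION: System.Text.Json stands in for the MVC Json() helper.
    public static string Build()
    {
        var payload = new
        {
            // The array mixes an "ask" entry with "on" entries. Those are
            // different anonymous types, so new[] cannot infer one element
            // type; the array has to be declared as object[].
            tropo = new object[]
            {
                new
                {
                    ask = new
                    {
                        say = new[] { new { value = "Please say your account number" } },
                        required = true,
                        timeout = 30,
                        name = "acctNum",
                        choices = new { value = "[5 DIGITS]" }
                    }
                },
                // "event" is a C# keyword, so the member is written @event;
                // the property name that gets serialized is still "event".
                new { on = new { next = "/accountDescribe.json", @event = "continue" } },
                new { on = new { next = "/accountIncomplete.json", @event = "incomplete" } }
            }
        };

        return JsonSerializer.Serialize(payload);
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

The two details worth noticing are the object[] array and the @event member; both are C# wrinkles that the simpler “say” example in Figure 4 never runs into.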

As the parameters imply, this “ask” will wait up to 30 seconds for input, then bind the result (which must be five digits) to the name “acctNum” in the JSON that Tropo POSTs to the “accountDescribe.json” endpoint. If the input is incomplete, Tropo will POST to “accountIncomplete.json” instead, and so on.
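
On the receiving side, the “accountDescribe.json” handler has to dig the answer back out of the POSTed JSON. What follows is a rough sketch only. The sample payload is an assumed shape, trimmed to two fields; the only name taken from the text above is “acctNum,” so check the Tropo docs for the real layout before relying on any of it. In an MVC controller the model binder could populate a class instead; this version uses JsonDocument from System.Text.Json to stay self-contained, and AskResult and ReadAnswer are hypothetical names.

```csharp
#nullable enable
using System;
using System.Text.Json;

public static class AskResult
{
    // ASSUMED payload shape, trimmed to the fields this sketch reads.
    // Only the "acctNum" name comes from the article; verify the real
    // field names against the Tropo documentation.
    public const string SampleResult =
        "{\"result\":{\"actions\":{\"name\":\"acctNum\",\"value\":\"12345\"}}}";

    // Returns the captured value for the named "ask", or null when the
    // payload lacks the assumed shape or names a different "ask".
    // Throws JsonException if the text is not valid JSON.
    public static string? ReadAnswer(string json, string askName)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("result", out JsonElement result) ||
            result.ValueKind != JsonValueKind.Object ||
            !result.TryGetProperty("actions", out JsonElement actions) ||
            actions.ValueKind != JsonValueKind.Object)
        {
            return null;
        }

        bool nameMatches =
            actions.TryGetProperty("name", out JsonElement name) &&
            name.ValueKind == JsonValueKind.String &&
            name.GetString() == askName;

        if (nameMatches && actions.TryGetProperty("value", out JsonElement value))
        {
            return value.ToString();
        }
        return null;
    }

    public static void Main()
    {
        // Prints 12345 for the sample payload above.
        Console.WriteLine(ReadAnswer(SampleResult, "acctNum"));
    }
}
```

Returning null for anything unexpected keeps the handler from throwing on a payload that doesn’t match the guess; a real endpoint would probably answer with another “say” or “ask” in that case.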

There’s only one problem with the system as it’s written currently: If we change the input type (in the “choices” field) from “[5 DIGITS]” to “[ANY]” (which is what Feliza would want, after all—she wants users to be able to say anything they want), Tropo tells us in the documentation for “ask” that trying to capture “[ANY]” kinds of input over the voice channel is disallowed. That puts the kibosh on using voice to talk to Feliza. In almost any other application scenario, this wouldn’t be a problem. Usually voice input will need to be constrained to a small set of inputs, or else we’ll need a tremendous amount of accuracy in transforming the speech to text. Tropo can record the voice channel and store it as an MP3 file for offline analysis, but Tropo offers us another alternative for open-ended text input.

ASP.NET Talking to F#

We’ve wired Tropo up to our Web site, but Feliza still sits in her F# DLLs, unconnected. We can now start to wire up the Feliza F# binaries against the incoming input, but that’s going to require ASP.NET to talk to F#, an exercise that’s relatively simple but not always obvious. The ASP.NET site is also going to need to emit custom JSON responses back, so rather than leave the job half-finished, we’ll finish off Feliza next time—and look at some ways to potentially extend the system even further.

Happy coding!


Ted Neward is an architectural consultant with Neudesic LLC. He’s written more than 100 articles, is a C# MVP and INETA speaker and has authored and coauthored a dozen books, including the recently released “Professional F# 2.0” (Wrox, 2010). He consults and mentors regularly. Reach him at ted@tedneward.com if you’re interested in having him come work with your team, or read his blog at blogs.tedneward.com.

Thanks to the following technical expert for reviewing this article: Adam Kalsey