Speech Server 2007 vs UCMA v2.0 WF activities

10/15/2008

I am planning a series of blog posts that will show you how to create UCMA WF applications that can answer the telephone using speech recognition and speech synthesis. Before I do that, however, I have to explain the differences between UCMA and Speech Server so that there is no confusion.

Speech Server vs UCMA

OCS 2007 R2 brings an update to UCMA (Unified Communications Managed API), simply named UCMA 2.0. This new and improved API has a lot of new features, including support for presence, telephony, speech recognition, and speech synthesis. R2 contains NO update for Speech Server. That said, you can continue to run Speech Server as is, in tandem with your R2 environment.

On top of the Core API is a new WF API, which abstracts a lot of the common "activities" that a UCMA application might perform. At first glance it may appear that this WF API is a replacement for Speech Server, but there are a few major differences to consider when deciding whether an application should be a Speech Server application or a UCMA application. Here are what I consider the four major deciding factors when choosing between Speech Server and UCMA.

Platform vs. API

The main difference between the two is that Speech Server is not only an API but also an enterprise-grade IVR platform that hosts your applications. With Speech Server you only have to worry about developing the front end: a SALT, VoiceXML, or Speech WF application.

UCMA 2.0 WF is simply an API. You need to build the host application as well as the front end. You can host these applications in a Windows Forms application, a console application, a Windows Service, and so on. For an application that has to be available to answer calls at any time, a Windows Service makes the most sense.

Note: UCMA 2.0 does NOT have built-in activities for VoiceXML or SALT.

Infrastructure

The big difference here is that Speech Server typically sits off on its own branch from your PBX or media gateway, while UCMA 2.0 sits behind the mediation server. At first this seems trivial and maybe even a good idea, but there is a reason a typical IVR application sits outside the voice network that you use for day-to-day communications. For example, do you want your IVR to take up the internal bandwidth and the lines that you use for your own telephone communications?

If you are only expecting a small number of calls, then deploying a UCMA 2.0 application shouldn’t be a big issue. If you are building a UCMA 2.0 application that is going to continuously use 50+ ports, however, you need to do some planning and additional infrastructure work before deploying it behind your existing OCS environment.

Note: 50 simultaneous ports is a relative number. The key takeaway is that you should not simply drop a UCMA application behind your existing mediation server without considering the impact. You can add additional mediation servers and gateways to solve scaling problems.

Developer Tools

Speech Server contains developer tools that abstract things like SRGS grammars behind a nice visual grammar editor. UCMA does not yet have these tools, so be prepared to write some of your own SRGS by hand. It is a small feature, but it can save a lot of development effort. I’ve said it before and I’m saying it again: most of your development effort with any speech-enabled solution will be spent on grammars. In a UCMA application, SRGS is used not only to recognize speech but also to match text from instant messaging conversations.
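To give a feel for what writing SRGS by hand involves, here is a minimal sketch that emits a small SRGS 1.0 XML grammar with a single public rule listing the phrases a caller might say. Python is used purely for illustration. The function name `build_choice_grammar`, the rule id, and the phrases are all hypothetical, and nothing here is part of UCMA or Speech Server. Only the element names and the namespace come from the W3C SRGS specification.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SRGS 1.0 specification.
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_choice_grammar(rule_id, phrases, lang="en-US"):
    """Return SRGS XML with one public rule that matches any single phrase.

    Hypothetical helper for illustration only.
    """
    if not phrases:
        raise ValueError("a grammar needs at least one phrase")
    # Each alternative becomes an <item> inside <one-of>.
    items = "\n".join(f"      <item>{escape(p)}</item>" for p in phrases)
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        f'<grammar version="1.0" xml:lang={quoteattr(lang)} '
        f'root={quoteattr(rule_id)}\n'
        f'         xmlns="{SRGS_NS}">\n'
        f'  <rule id={quoteattr(rule_id)} scope="public">\n'
        '    <one-of>\n'
        f'{items}\n'
        '    </one-of>\n'
        '  </rule>\n'
        '</grammar>\n'
    )


if __name__ == "__main__":
    print(build_choice_grammar("department", ["sales", "support", "billing"]))
```

Even a list this simple has to be kept in sync with your prompts and call flow by hand, which is where a visual editor earns its keep.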

UCMA also does not have the SIP Debugging Phone that Speech Server has. This means that in order to debug a UCMA application you need to call it from Communicator or an actual telephone. And because a UCMA application requires an actual OCS environment with a mediation server, you won’t be able to debug your application very easily “offline”.

Don’t expect to write and debug UCMA 2.0 applications sitting on a plane like you did with Speech Server. Yes, you can create a virtualized OCS 2007 R2 environment, but it is going to require Hyper-V and at least 8 GB of RAM. I’ve finally upgraded so I can do just that, but I realize that not everyone is going to be able to.

Reporting & Tuning

Speech Server has the Tuning & Analysis tools, which come in handy for simple reports like “How many calls did my application get?” and “How many times did a grammar fail?”. This reporting is absolutely necessary for any enterprise-class IVR solution. If you want some of this reporting for UCMA 2.0, you are going to have to build it yourself.

Beyond reporting, Speech Server's tuning tools let you test grammar changes against recorded audio from actual callers before you deploy them.
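If you do end up building the reporting yourself, the two example questions above boil down to a handful of counters. The sketch below is a minimal illustration in Python, not UCMA code. The `CallReport` class and its method names are hypothetical, and it assumes your host application calls them from whatever call and recognition events it handles.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CallReport:
    """Hypothetical in-memory tally of calls and grammar results."""
    total_calls: int = 0
    attempts: Counter = field(default_factory=Counter)
    failures: Counter = field(default_factory=Counter)

    def call_started(self):
        # "How many calls did my application get?"
        self.total_calls += 1

    def recognition_result(self, grammar, matched):
        # "How many times did a grammar fail?"
        self.attempts[grammar] += 1
        if not matched:
            self.failures[grammar] += 1

    def failure_rate(self, grammar):
        # Fraction of attempts against this grammar that did not match.
        tried = self.attempts[grammar]
        return self.failures[grammar] / tried if tried else 0.0


if __name__ == "__main__":
    report = CallReport()
    report.call_started()
    report.recognition_result("department", matched=True)
    report.recognition_result("department", matched=False)
    print(report.total_calls, report.failure_rate("department"))
```

A real deployment would persist these numbers somewhere durable, such as a database, rather than keeping them in memory.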

I hope this post has helped you understand the differences between Speech Server and UCMA. UCMA 2.0 is clearly a step on the roadmap toward integrating speech tightly into the UC platform and OCS. I expect that things like the speech tools will become available with the next release, in the not-too-distant future.

Next post will talk about the features of UCMA…
