July 2016

Volume 31 Number 7

[Data Points]

The New Azure DocumentDB Node.js SDK

By Julie Lerman

Over the past year I’ve been developing a sample app that uses Aurelia on the front end, a server-side API written in Node.js and Azure DocumentDB for its data store. In addition to using Node.js for the server-side API, my app also leverages the Node.js SDK for Azure DocumentDB. Rather than describe the full app, I’ll point you to the earlier articles from November (msdn.com/magazine/mt620011) and December 2015 (msdn.com/magazine/mt595750) when I wrote about this application. You can even download the original application and compare it to the new source that reflects the changes described in this article. And because I’m frequently tweaking the app, you can always take a look at the GitHub repository at bit.ly/25crZAG.

Because of changes to Aurelia, to many of the Node.js packages I’m using, to DocumentDB features and even to the aforementioned SDK over the past months, it was time to do a number of updates, not just to the packages but also to my code, so it could take advantage of newer features throughout. I won’t address updates to Aurelia here; instead, I’ll keep my focus on changes in DocumentDB and on modifying my Node.js API code to benefit from those changes.

Step 1: Implementing Promises for Async Calls

The first thing I did was run “npm update” on my Node.js project. The update went well, but running the app afterward was less successful. I quickly encountered an error that told me my use of callbacks had become a problem. Somewhere in the depths of dependencies, an API now favors JavaScript promises over callbacks. The paradigm of promises (akin to async/await in .NET) has been around for a while but I had taken the familiar path of using callbacks when creating the sample. Now it was time to dig in my heels, hold my breath and replace all of the callbacks in the Node.js API with promises. Unfortunately, this wasn’t just a matter of replacing terms, but it required changing the actual structure of the code. In the layers of my API, I was using callbacks in the DocDBUtils file that talked directly to the Node.js SDK for DocumentDB. And I was using callbacks in the DocDBDao class that talked to the utilities (in DocDBUtils) and to the SDK. This meant that when communicating with the utilities I had a layered system of callbacks. Finally, the ninja.js class made calls into the DocDBDao class to trigger data retrieval or updates. These methods also used callbacks and depended on the callbacks of the lower files. So I needed to implement promises from the bottom (DocDBUtils) up.

There are a number of JavaScript APIs that help with the implementation of promises. One of these is called Q, and the DocumentDB team created a wrapper for its Node.js SDK that uses Q, which makes coding against DocumentDB with promises a much easier task. This wrapper, documentdb-q-promises, is on GitHub at bit.ly/1pWYHCE.

Therefore, my first step was to install the wrapper, which brings Q along as a dependency, using the node package manager (npm):

npm install documentdb-q-promises

Then, in all of the node classes that were using the base SDK (the aforementioned classes, as well as one called api.js) I had to modify the “require” statements that were originally letting my classes know to use the DocumentClient class from the initial SDK:

var DocumentDBClient = require('documentdb').DocumentClient;

to point to the DocumentClientWrapper class from the new API:

var DocumentDBClient = require('documentdb-q-promises').DocumentClientWrapper;

The DocDbUtils class also requires a direct reference to the Q library, so its DocumentDBClient is defined as:

var DocumentDBClient = require('documentdb-q-promises').DocumentClientWrapper,
    Q = require('q');

Next, I had to refactor the callback code to use the promises. I struggled with this for a while until I had the pattern down. Then, with some working functions in the DocDBUtils class, I was able to more easily fix up the functions in the classes that call into this class. But before I got to this point, it was definitely an arduous process: change code, debug, read the errors, scratch my head, Google some more and change the code again. There was a bit of griping on Twitter, as well, so my friends kept me from hurting my head too much. This wasn’t so much because it’s terribly difficult, but just because—regardless of my programming experience—I’m still something of a noob in JavaScript.

As an example, I’ll begin with the very first function to be hit when the API is run: the init method in docDbDao.js. This method makes sure that the rest of the API is aware of the DocumentDB account, connects to it using the authentication keys and knows the name of the database. As part of that work, init calls the getOrCreateDatabase function, shown in Figure 1.

Figure 1 The Original getOrCreateDatabase Method Using Callbacks

getOrCreateDatabase: function (client, databaseId, callback) {
  var querySpec = { /* query for database name defined here */ };
  client.queryDatabases(querySpec).toArray(function (err, results) {
    if (err) {
      callback(err);
    } else {
      if (results.length === 0) {
        var databaseSpec = {
          id: databaseId
        };
        client.createDatabase(databaseSpec, function (err, created) {
          callback(null, created);
        });
      } else {
        callback(null, results[0]);
      }
    }
  });
},

getOrCreateDatabase is called from the init function in the docDbDao class. The parameter named client is an instance of DocumentDBClient from the original SDK. The third parameter, named callback, refers to the calling function—in this case, init. The getOrCreateDatabase method defines a query in the querySpec variable and then calls client.queryDatabases with the query. If queryDatabases returns an error, getOrCreateDatabase passes that error back up to the calling function via the callback. Otherwise, it inspects the results. If results.length is 0, it creates a new database and then passes the information returned by createDatabase back to the calling function. If results.length isn’t 0, the first item in the results array is returned via the callback.

Now let’s have a look at this same function, shown in Figure 2, rewritten to use promises (remember, these are like async/await in the Microsoft .NET Framework), leveraging the promises implementation provided by Q.

Figure 2 getOrCreateDatabase Using Promises

getOrCreateDatabase: function (client, databaseId) {
  var querySpec = { /* query for database name defined here */ };
  return client.queryDatabases(querySpec).toArrayAsync().then(
    function (results) {
      if (results.feed.length === 0) {
        var databaseSpec = {
          id: databaseId
        };
        return client.createDatabaseAsync(databaseSpec)
          .then(function (databaseResponse) {
            var db = databaseResponse.resource;
            return client.createCollectionAsync(db._self, collectionDefinition);
          });
      }
      return results.feed[0];
    },
    function (err) { return err; }
  );
},

The first thing to notice is that there’s no callback in the parameter list for the function. After defining the query, the function makes the call to queryDatabases, but not like before. This time, I’m using the queryDatabases wrapper defined by the new SDK. Rather than calling toArray on queryDatabases, I used the toArrayAsync method, which is one of a number of asynchronous methods provided by the documentdb-q-promises SDK. toArrayAsync returns an instance of a promise type defined by the Q library; a promise has a “then” method (similar to the await you might be familiar with from the .NET Framework) that allows you to define a function to execute when the queryDatabases.toArrayAsync call completes. The first bit of logic indicates what to do if the call is successful. Just like before, I check to see if the length is 0, indicating that the database doesn’t yet exist. If this is the case, I create a new database, but this time using the createDatabaseAsync method, which, like the other async methods, returns a promise object. If the database is created successfully, I then process the database response. I’ve left out some of the additional logic around creating the database, but you can see it if you download the example code.

The next part of the method specifies what should happen if the query does find a database, which is simply to return the first item in the results. The results of toArrayAsync contain a feed property that wraps the results, which is why you see the syntax as results.feed[0].

Last, if the queryDatabases call fails, the function returns an error.

Now that you’ve walked through this, let’s look at the pattern again:

CallToAsyncFunction().then(
  function (results) {
    // Success logic
  },
  function (err) {
    // Failure logic
  }
);

You call one of the asynchronous methods and use its then method to define a nameless function to execute when the call completes. In the function’s logic, you first specify code to execute when the method has succeeded (optionally returning some result) and then code to execute if the method fails (also with the option of returning a result).
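To make that shape concrete, here’s a minimal, runnable sketch of the pattern. A native JavaScript Promise stands in for the Q promises the SDK returns, because both expose the same then(success, failure) shape; callToAsyncFunction is a made-up placeholder, not an actual SDK method.

```javascript
// A stand-in for an SDK call that completes asynchronously. Native
// Promises are used here for illustration; Q promises expose the same
// then(onSuccess, onFailure) shape.
function callToAsyncFunction(shouldFail) {
  return new Promise(function (resolve, reject) {
    if (shouldFail) {
      reject(new Error('something went wrong'));
    } else {
      resolve('results');
    }
  });
}

callToAsyncFunction(false).then(
  function (results) {
    // Success logic runs here.
    console.log('got: ' + results);
  },
  function (err) {
    // Failure logic runs here.
    console.log('failed: ' + err.message);
  }
);
```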

I’ve implemented this pattern throughout the revised API, replacing all of the callback constructs in the three models.

However, I can’t simply debug now and expect this to work because I have a waterfall of promises starting with the ninjas class, which doesn’t yet know about the new documentdb-q-promises SDK. You can try to replace those other callbacks in the original sample yourself, or see the fully updated solution in the download.
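To illustrate what that waterfall looks like once every layer cooperates, here’s a hedged sketch, with hypothetical names and a native Promise standing in for Q, of how each layer returns the promise from the layer below so results bubble up through a single chain rather than nested callbacks:

```javascript
// Illustrative sketch only (not the actual sample code): each layer
// returns the promise produced by the layer beneath it.
var docDbUtils = {
  getOrCreateDatabase: function () {
    // Pretend SDK call that resolves with a database object.
    return Promise.resolve({ id: 'Ninjas' });
  }
};

var docDbDao = {
  init: function () {
    // Return the lower layer's promise so callers can chain on it.
    return docDbUtils.getOrCreateDatabase();
  }
};

var ninjas = {
  getNinjas: function () {
    return docDbDao.init().then(function (db) {
      return 'connected to ' + db.id;
    });
  }
};

ninjas.getNinjas().then(function (msg) {
  console.log(msg); // connected to Ninjas
});
```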

Now my Node.js interaction with DocumentDB is using recommended technology, so let’s look at some other features in DocumentDB and I’ll discuss how I implemented them in the API.

Parameterized Queries

In the first iteration of my sample, I did something in the ninja.js class that I’d never do in my .NET apps—I hacked query strings together with string concatenation. At least I was using ES6-enabled string interpolation to do that concatenation, but still I’m a little ashamed and have no excuse. Except perhaps two. The first is that I was learning from the provided samples and not yet using my brain. (Does this even count?) The second is that security wasn’t paramount at the moment, because performing a SQL injection attack on an Azure DocumentDB isn’t much of an issue, due to the way queries work in the database. Even the documentation says that DocumentDB isn’t really susceptible to the most common types of injection attacks, though there’s always a chance of an evildoer finding a way to take advantage of injection. Still, it’s always good to be extra cautious about security, and parameterized queries have been a recommended practice for data access for a very long time.

In the earlier version of the sample, a filter function defined a type called querySpec with a property named query. The query property value was SQL, used to retrieve a set of ninjas from DocumentDB:

var querySpec = {
  query: 'SELECT ninja.id, ninja.Name, ninja.ServedInOniwaban, ninja.DateOfBirth FROM ninja'
};

The filter function reads a filter value contained in the URL. For example, when the user searches for every ninja whose name contains “San,” the URL is localhost:9000/api/ninjas?q=San. The original function constructed a query predicate by simply concatenating the filter value, found in request.query.q, to the predicate:

q = ' WHERE CONTAINS(ninja.Name,"' + request.query.q + '")';

I then appended the predicate to the base query, which was stored in querySpec.query.
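To see why that concatenation is worth replacing, consider this hypothetical illustration (the malicious input and the ninja.Secret property are invented for the demonstration): a crafted filter value escapes the string literal and becomes part of the query text itself, whereas a parameterized query keeps it as plain data.

```javascript
// A crafted filter value designed to break out of the string literal.
var userInput = 'San") OR CONTAINS(ninja.Secret, "x';

// Concatenation: the input is spliced into the query text.
var concatenated =
  'SELECT * FROM ninja WHERE CONTAINS(ninja.Name, "' + userInput + '")';

// Parameterization: the input stays a value bound to @namepart.
var querySpec = {
  query: 'SELECT * FROM ninja WHERE CONTAINS(ninja.Name, @namepart)',
  parameters: [{ name: '@namepart', value: userInput }]
};

console.log(concatenated.indexOf('ninja.Secret') > -1);    // true: query text was altered
console.log(querySpec.query.indexOf('ninja.Secret') > -1); // false: query text unchanged
```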

Even though injection attacks aren’t nearly as easy using a filter value, I’ve replaced that bit of logic with a parameter DocumentDB will comprehend. Rather than concatenate the filter value entered by the end user (San), I’ll use a parameter placeholder called @namepart in the predicate. Then I’ll add a new property to querySpec called parameters and, with JSON formatting, define it using name and value properties that DocumentDB will look for. I can then specify the parameter name and the query value passed in by the URL:

querySpec.query += ' WHERE CONTAINS(ninja.Name, @namepart)';
querySpec.parameters = [{
  name: '@namepart',
  value: request.query.q
}];

DocumentDB will then execute this as a parameterized query so any evil SQL will be unable to hurt my data.

So Long, SelfLinks

OK, so that’s a bit harsh, but it’s how many of us felt about the need to use selfLinks for every type of object, whether a database, a collection, a document or another object in DocumentDB. The selfLink value is how an object identifies itself in DocumentDB. You’d have to query DocumentDB with a known identifier—a database or collection name, or the identity value of a document—in order to get its selfLink so you could perform other operations. SelfLinks are still there, but you no longer need them to interact with an object. If you know the details needed to build up a link to an object, you can use that instead of the selfLink. I’ll demonstrate that shortly, in combination with the next feature I’ve taken advantage of in my revised sample: upserts.

Replacing Replace with Upsert

I was eager to remove the clunky update function in my API, which required that I first retrieve the item to be updated from the database in order to:

  1. Ensure that it existed
  2. Get access to its selfLink
  3. Get access to the full document in case the item passed into the update method has a limited schema

Then I had to update fields from the document retrieved from the database with values from the item passed to the update method from the client application. Finally, I had to tell DocumentDB to replace the existing document in the database with this modified document. You can visit the earlier article or sample to take a look at the updateItem function in docDbDao if you’re curious what this all looked like.
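For contrast with the upsert version shown later, here’s a minimal sketch of that old retrieve-then-replace flow. The client shape and the method names (getItemAsync in particular) are hypothetical stand-ins, not the sample’s real API; the actual code is in the earlier download.

```javascript
// Illustrative sketch of the old retrieve-then-replace flow.
function updateItemOld(client, item) {
  // 1. Retrieve the stored document to confirm it exists and to get
  //    its selfLink and full schema.
  return client.getItemAsync(item.id).then(function (doc) {
    // 2. Copy changed fields from the incoming item onto the stored doc.
    doc.Name = item.Name;
    doc.DateModified = Date.now();
    // 3. Replace the stored document via its selfLink.
    return client.replaceDocumentAsync(doc._self, doc);
  });
}
```

Every update paid for an extra round-trip to the database just to locate the document, which is exactly the overhead upsert removes.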

Luckily, in October Microsoft announced the addition of atomic upsert functionality to DocumentDB, which enables it to figure out if a given document needs to be inserted or updated. See the related blog post at bit.ly/1G5wtpY for a more detailed explanation.

Upsert allows me to do a simple update using the document I have at hand. The documentdb-q-promises SDK provides an asynchronous version of upsert, upsertDocumentAsync. Here’s my revised update function:

updateItem: function (item) {
  var self = this;
  var docLink = 'dbs/Ninjas/colls/Ninjas';
  item.DateModified = Date.now();
  return self.client.upsertDocumentAsync(docLink, item).then(
    function (replaced) {
      return replaced;
    },
    function (err) {
      return err;
    }
  );
},

Notice the docLink value that I’m building. This is the new feature I mentioned that helps me avoid needing the actual selfLink from the database for the document I want to update. I’m simply specifying that the database is named Ninjas and that the collection also happens to be named Ninjas. I pass the docLink value, along with the item that came from the client, to the upsertDocumentAsync command, and then pass back the response (replaced) that’s returned when the command has been executed. Notice also that, along with the async command, I’ve modified this logic to leverage the promise returned by the Async method. You can tell because I’m chaining then on the Async method.
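The ID-based link format follows a simple dbs/{databaseId}/colls/{collectionId} pattern, with /docs/{documentId} appended when you need to address a single document. A tiny helper like the following (hypothetical, not part of the sample) makes that explicit:

```javascript
// Hypothetical helpers that build ID-based links in the format
// DocumentDB accepts in place of a selfLink.
function collectionLink(databaseId, collectionId) {
  return 'dbs/' + databaseId + '/colls/' + collectionId;
}

function documentLink(databaseId, collectionId, documentId) {
  return collectionLink(databaseId, collectionId) + '/docs/' + documentId;
}

console.log(collectionLink('Ninjas', 'Ninjas')); // dbs/Ninjas/colls/Ninjas
```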

So Much More Has Been Added to DocumentDB

While my little sample takes advantage of only a few of its new capabilities, so much more has been added to DocumentDB since I wrote my earlier columns. More SQL commands are supported, including TOP for paging and ORDER BY for sorting. ORDER BY depends on indexes on collections, which makes sense because DocumentDB is designed for Big Data, and you need to tune the database to meet your particular needs. It’s also possible to modify the indexing policies on existing collections, rather than having to live with an unfortunate previous choice. If you’re using the .NET Client API for DocumentDB, the LINQ provider has become much richer, which you can read about at bit.ly/1Od3ia2.
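As a rough sketch of what those additions look like in practice, a query spec using TOP and ORDER BY might read like this (the page size is arbitrary, and ordering by Name assumes an index that supports it):

```javascript
// Hypothetical query spec showing TOP (paging) and ORDER BY (sorting)
// in DocumentDB SQL.
var pagedQuerySpec = {
  query: 'SELECT TOP 10 ninja.id, ninja.Name FROM ninja ORDER BY ninja.Name'
};

console.log(pagedQuerySpec.query.indexOf('TOP 10') > -1); // true
```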

DocumentDB is for persisting large amounts of data, and Microsoft is working to make access more efficient and cost-effective. To that end, the company introduced partition keys, a core feature of DocumentDB that allows you to partition collections to accommodate a “large volume of data at high rates or applications that require high throughput, low latency access to data.” You can read more about partition keys at bit.ly/1TpOxFj.

Microsoft also addressed pricing issues that kept users from creating additional collections because fees were related to the number of collections. New plans are based on the volume of data and throughput, so you can use more collections and not worry quite as much about excessive costs for collections that don’t have a lot of activity. But each collection still has a minimum throughput. The team continues to look for ways to improve resource usage so they can lower these minimums in the future. For more information about the new pricing plans, visit bit.ly/1jgnbn9.

DocumentDB tools have also changed, and they’ll continue to do so in the future. There are more ways to explore your DocumentDB in the Azure Management Portal, and the data migration tools have acquired more capabilities. One change I was especially happy to see was that the Cloud Explorer extension for Visual Studio now supports connecting to and exploring DocumentDB data. You can even edit and save changes to the raw data through the Cloud Explorer, though at this time, querying is not an option.

To keep up with the growing feature set, keep an eye on the DocumentDB-tagged blog posts on the Azure blog (bit.ly/1Y5T1SB) and follow the @documentdb Twitter account for updates.


Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at juliel.me/PS-Videos.

Thanks to the following Microsoft technical expert for reviewing this article: Andrew Liu