What makes a Data Source a Data Source?

Article
03/15/2017

It should be obvious, and it is — at least at the Tabular 1200 compatibility level: A data source definition in a Tabular model holds the connection information for Analysis Services to connect to a source of data such as a database, an OData feed, or a file. That’s straightforward. However, at the Tabular 1400 compatibility level, this is no longer so trivial. At the Tabular 1400 compatibility level, a data source definition can include a native query and even a contextual M expression on top of the connection information, which opens interesting capabilities that didn’t exist previously and redefines to some degree the nature of a data source definition.

Let’s take a closer look at a data source definition in a Tabular 1400 model, such as the following definition for a SQL Server-based data source:
The two important properties are the query parameter in the connectionDetails, which can hold a native source query, and the contextExpression parameter, which can take an M expression. The default "..." simply stands for an expression that takes the data source definition as is without wrapping it into a further M context. You can find a more elaborate example at the end of this article. For now, just note that you won’t see the contextExpression in your data source definitions yet. A forthcoming release of SSAS and SSDT Tabular will enable this feature.

The query parameter, on the other hand, already exists in the metadata. It’s just that SSDT Tabular does not let you enter a source query through the user interface (UI) when defining a data source. This is intentional to maintain the familiar separation of connection information on data sources and source queries on table partitions. Equally, there are currently no plans to expose a contextExpression designer in the UI.

The following screenshot shows the Power BI Desktop UI in the background for a SQL Server data source with a textbox to enter a SQL query in comparison to SSDT Tabular in the foreground, which doesn’t offer this textbox.

For most data modelling scenarios, a clear separation of connection information and source queries is advantageous. After all, multiple tables and partitions can refer to a single data source definition in SSDT. It doesn’t seem very useful to restrict a data source to a single result set by means of a source query, such as “SELECT * FROM dimCustomer”, defined through the data source’s query parameter. Instead, it would be more useful to specify the query when importing a table by using the Value.NativeQuery function, as the following screenshot illustrates.

This way, the data source remains available for importing further tables from the same source. On the other hand, if you do need a data source with a very narrow scope, you can set the query parameter manually by using the Tabular Model Scripting Language (TMSL).

If it’s clearly not recommended to use the query parameter in a data source definition, then why did we come up with yet another such parameter called contextExpression? Well, this brings us back to the starting point: What makes a Data Source a Data Source?

A data source can be defined along a varying degree of detail, as shown above. On one extreme, you could define a data source that is so narrow it returns a single character, such as by using the following source query: “SELECT TOP 1 Left(firstName, 1) FROM dimCustomer”. Not very useful, but still a source of data. On the other extreme, a data source could be so broad that the tables you import on top of it require redundant statements that could be avoided with a more precise data source definition. For example, by using Tabular Object Model (TOM) or TMSL, you could define a SQL Server data source that only specifies the server name but no database name. Any tables importing from this data source would now require an M expression that includes a line to navigate to the desired database first before importing a source table, such as “AdventureWorksDW = Source{[Name="AdventureWorksDW"]}[Data] ”. Perhaps even more extreme, some data sources can be defined so broadly that they don’t even include information about the data source type. For example, any file-based data source can be considered of type File, while in fact a better definition would be a Microsoft Access database, Microsoft Excel workbook, comma-separated values file, and so forth. This is where the contextExpression comes in. It adds context information to narrow down a very broad data source definition to make it more meaningful.

The following abbreviated data source definition for an Access database shows the contextExpression in action. The connectionDetails merely define a File data source, which is too broad. What we want to define is an Access data source, so the contextExpression takes the File data source and wraps it into an Access.Database() function. As mentioned earlier, the placeholder expression "..." stands for the data source definition without the additional context.

By using a context expression, SSDT Tabular can define data sources that build on other data sources. Through TOM or TMSL, you can also edit the context expression to build more sophisticated definitions, yet this is generally not recommended. Also, unfortunately, TOM and TMSL do not provide an API for editing an M expression. This may come at some point in the future, but for now it’s not a priority.

And this is it for a quick glance at the upcoming contextExpression feature. As always, please send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. You can influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers.

What makes a Data Source a Data Source?

Additional resources