SyMS Adapter Overview

The SyMS adapter is the storage adapter used to interact with data in a Synapse workspace. It provides the Common Data Model view of the Synapse workspace.

Note: The SyMS adapter is in preview, and some functionality might change in the future.

SyMS object hierarchy representation in Common Data Model

Common Data Model maps SyMS metadata into a folder structure as shown below:

[Figure: cdmsymsmapping]

File Name                          SyMS Mapping
databases.manifest.cdm.json        Stores the list of databases in SyMS as sub-manifests.
[databaseName].manifest.cdm.json   Stores entity (SyMS table) declarations, including data location information.
[entityName].cdm.json              Stores the entity (SyMS table) definition, such as attributes (SyMS columns) and traits.
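
The mapping above translates directly into corpus paths. The following is a minimal sketch of that translation, assuming the SyMS adapter is mounted under the namespace "syms" (as in the examples in this document). The SymsPaths class and its method names are hypothetical helpers for illustration and are not part of the CDM SDK.

```csharp
// Hypothetical helper, not part of the CDM SDK. It assumes the SyMS adapter
// is mounted under the "syms" namespace.
public static class SymsPaths
{
    private const string Namespace = "syms";

    // Root manifest that lists every database as a sub-manifest.
    public static string DatabasesManifest() =>
        $"{Namespace}:/databases.manifest.cdm.json";

    // Manifest of a single database; it holds the entity (table) declarations.
    public static string DatabaseManifest(string databaseName) =>
        $"{Namespace}:/{databaseName}/{databaseName}.manifest.cdm.json";

    // Document that holds the definition of a single entity (table).
    public static string EntityDocument(string databaseName, string entityName) =>
        $"{Namespace}:/{databaseName}/{entityName}.cdm.json";
}
```

For example, SymsPaths.DatabaseManifest("SymsTestDatabase") yields "syms:/SymsTestDatabase/SymsTestDatabase.manifest.cdm.json", a path that can be passed to corpus.FetchObjectAsync.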

Reading metadata from SyMS

  1. Create the SyMS adapter and mount it to the corpus. Please consult this page for more information about the adapter’s API.

    SymsAdapter adapter = new SymsAdapter("<symsworkspaceName>.dev.azuresynapse-dogfood.net","<tenantid>","<clientId>","<Secret>");
    
    corpus.Storage.Mount("syms", adapter);
    
  2. Optional: Create ADLS adapters for all ADLS storage accounts attached to the SyMS workspace, and mount them.

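    corpus.Storage.Mount("adls1",
                new ADLSAdapter(
                "<ACCOUNT-NAME>.dfs.core.windows.net", // Hostname
                "/<FILESYSTEM-NAME>",                  // Root
                "<TENANT-ID>",                         // Tenant ID
                "<CLIENT-ID>",                         // Client ID
                "<CLIENT-SECRET>"                      // Client secret
                ));
    
    
    corpus.Storage.Mount("adls2",
                new ADLSAdapter(
                "<ACCOUNT-NAME>.dfs.core.windows.net", // Hostname
                "/<FILESYSTEM-NAME>",                  // Root
                "<TENANT-ID>",                         // Tenant ID
                "<CLIENT-ID>",                         // Client ID
                "<CLIENT-SECRET>"                      // Client secret
                ));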
    

    Note: If the ADLS adapters are not mounted, CDM creates and mounts adapters that correspond to the discovered data locations. The mounted adapters can be inspected through the following property:

    corpus.Storage.NamespaceAdapters
    
  3. Read the list of databases from the SyMS workspace.

    Databases are stored as sub-manifests in databases.manifest.cdm.json, so read that file first.

    CdmManifestDefinition manifest = await corpus.FetchObjectAsync<CdmManifestDefinition>("syms:/databases.manifest.cdm.json");
    
  4. Read a SyMS database.

    Read the sub-manifest definition to load a database as a manifest.

    CdmManifestDefinition manifestdb = await corpus.FetchObjectAsync<CdmManifestDefinition>($"syms:/<databaseName>/<databaseName>.manifest.cdm.json", null, true);
    

    or

    CdmManifestDefinition manifestdb = await corpus.FetchObjectAsync<CdmManifestDefinition>(manifest.SubManifests[0].Definition, null, true);
    
  5. Read SyMS tables.

    Tables are stored as entities in CDM. Fetch a CdmEntityDefinition object to read a SyMS table.

    var entity = await corpus.FetchObjectAsync<CdmEntityDefinition>(manifestdb.Entities[0].EntityPath, manifestdb); 
    
  6. Read a SyMS table as a CDM document.

    var doc = await corpus.FetchObjectAsync<CdmDocumentDefinition>($"syms:/{manifestdb.ManifestName}/<tableName>.cdm.json");
    

Note: The database name is available as manifestdb.ManifestName.

  7. Data locations are represented as data partition patterns. Use the following API to resolve the patterns into individual data partition objects.

    await manifestdb.FileStatusCheckAsync();
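
The reading steps above can be combined into a single pass over the workspace. The following is a minimal sketch, assuming a corpus that already has the SyMS adapter mounted as "syms". The SymsReadSketch class and PrintTablesAsync method are hypothetical names; the sketch reuses the calls shown in the steps above, plus the EntityName, DataPartitions and Location properties of the CDM object model. It has not been run against a live workspace.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CommonDataModel.ObjectModel.Cdm;

public static class SymsReadSketch
{
    // Assumption: "corpus" already has the SyMS adapter mounted under "syms".
    public static async Task PrintTablesAsync(CdmCorpusDefinition corpus)
    {
        // Step 3: the root manifest lists each database as a sub-manifest.
        var databases = await corpus.FetchObjectAsync<CdmManifestDefinition>(
            "syms:/databases.manifest.cdm.json");
        if (databases == null)
        {
            Console.WriteLine("Could not read the list of databases.");
            return;
        }

        foreach (var declaration in databases.SubManifests)
        {
            // Step 4: load the database manifest from its declaration.
            var database = await corpus.FetchObjectAsync<CdmManifestDefinition>(
                declaration.Definition, null, true);
            if (database == null)
            {
                continue;
            }

            // Step 7: resolve partition patterns into data partition objects.
            await database.FileStatusCheckAsync();

            foreach (var entity in database.Entities)
            {
                Console.WriteLine($"{database.ManifestName}.{entity.EntityName}");
                if (entity.DataPartitions == null)
                {
                    continue;
                }
                foreach (var partition in entity.DataPartitions)
                {
                    Console.WriteLine($"  {partition.Location}");
                }
            }
        }
    }
}
```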
    

Creating a database and a table in SyMS

  1. Create the SyMS adapter and mount it to the corpus. Please consult this page for more information about the adapter’s API.

    SymsAdapter adapter = new SymsAdapter("<symsWorkSpaceName>.dev.azuresynapse-dogfood.net","<tenantId>","<clientId>","<secret>");
    
    corpus.Storage.Mount("syms", adapter);
    
  2. Create ADLS adapters for all ADLS storage accounts attached to the SyMS workspace, and mount them.

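    corpus.Storage.Mount("adls1",
                new ADLSAdapter(
                "<ACCOUNT-NAME>.dfs.core.windows.net", // Hostname
                "/<FILESYSTEM-NAME>",                  // Root
                "<TENANT-ID>",                         // Tenant ID
                "<CLIENT-ID>",                         // Client ID
                "<CLIENT-SECRET>"                      // Client secret
                ));
    
    
    corpus.Storage.Mount("adls2",
                new ADLSAdapter(
                "<ACCOUNT-NAME>.dfs.core.windows.net", // Hostname
                "/<FILESYSTEM-NAME>",                  // Root
                "<TENANT-ID>",                         // Tenant ID
                "<CLIENT-ID>",                         // Client ID
                "<CLIENT-SECRET>"                      // Client secret
                ));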
    
  3. Create a manifest object, or load one from a *.manifest.cdm.json document.

    var manifest = await corpus.FetchObjectAsync<CdmManifestDefinition>("local:/<manifestFileName>.manifest.cdm.json");
    

    Make sure the manifest meets the following requirements:

    1. is.storagesource is a mandatory trait that must be defined in the manifest. It should contain the location of the default lake linked to SyMS.
    2. The manifest name is the SyMS database name.
    3. A data partition is mandatory for every entity (SyMS table) in SyMS. If the partition location includes an ADLS adapter's namespace, the location is converted to an absolute path using the corresponding ADLS adapter. Otherwise, the location is resolved relative to the location value found in the "is.storagesource" trait. In the example below, that value is "adls1:/", which is the default used for the Address entity.

    Example:

    {
        "manifestName": "SymsTestDatabase",
        "explanation": "SymsTestDatabase syms database",
        "exhibitsTraits": [
            {
            "traitReference": "is.storagesource",
            "arguments": [
                {
                "name": "namespace",
                "value": "adls1:/"
                }
            ]
            }
        ],
        "entities": [
            {
            "type": "LocalEntity",
            "entityName": "Account",
            "entityPath": "Account.cdm.json/Account",
            "dataPartitions": [
                {
                "location": "adls2:/Account/partition-data.csv",
                "exhibitsTraits": [
                    {
                    "traitReference": "is.partition.format.CSV",
                    "arguments": [
                        {
                        "name": "columnHeaders",
                        "value": "true"
                        },
                        {
                        "name": "delimiter",
                        "value": ","
                        }
                    ]
                    }
                ],
                "lastFileStatusCheckTime": "2021-08-17T00:50:29.016Z",
                "lastFileModifiedTime": "2020-02-24T00:00:00.000Z"
                }
            ],
            "lastFileStatusCheckTime": "2021-08-17T00:50:29.017Z",
            "lastFileModifiedTime": "2021-08-17T00:11:06.991Z",
            "lastChildFileModifiedTime": "2021-08-17T00:11:06.991Z"
            },
            {
            "type": "LocalEntity",
            "entityName": "Address",
            "entityPath": "Address.cdm.json/Address",
            "dataPartitions": [
                {
                "location": "Address/partition-data.csv",
                "exhibitsTraits": [
                    {
                    "traitReference": "is.partition.format.CSV",
                    "arguments": [
                        {
                        "name": "columnHeaders",
                        "value": "true"
                        },
                        {
                        "name": "delimiter",
                        "value": ","
                        }
                    ]
                    }
                ],
                "lastFileStatusCheckTime": "2021-08-17T00:50:30.992Z",
                "lastFileModifiedTime": "2020-02-24T00:00:00.000Z"
                }
            ],
            "lastFileStatusCheckTime": "2021-08-17T00:50:30.992Z",
            "lastFileModifiedTime": "2021-06-25T00:54:16.717Z",
            "lastChildFileModifiedTime": "2021-06-25T00:54:16.717Z"
            }
        ],
        "lastFileStatusCheckTime": "2021-08-17T00:50:29.010Z",
        "lastFileModifiedTime": "2021-08-17T00:47:05.987Z",
        "lastChildFileModifiedTime": "2021-08-17T00:11:06.991Z",
        "relationships": [
            {
            "name": "Account_Account_relationship",
            "fromEntity": "Account.cdm.json/Account",
            "fromEntityAttribute": "parentAccountId",
            "toEntity": "Account.cdm.json/Account",
            "toEntityAttribute": "accountId"
            },
            {
            "name": "Address_Address_relationship",
            "fromEntity": "Address.cdm.json/Address",
            "fromEntityAttribute": "masterId",
            "toEntity": "Address.cdm.json/Address",
            "toEntityAttribute": "accountId"
            }
        ],
        "jsonSchemaSemanticVersion": "1.0.0",
        "imports": [
            {
            "corpusPath": "cdm:/foundations.cdm.json"
            }
        ]
    }
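
The namespace rule for partition locations can be illustrated with a few lines of code. The following is a minimal sketch of the defaulting behavior only, not the adapter's actual implementation: the PartitionLocationSketch class and WithDefaultNamespace method are hypothetical, and the sketch assumes that a location carrying a namespace always contains ":/".

```csharp
// Hypothetical helper, not part of the CDM SDK. It only illustrates how a
// partition location without a namespace falls back to the namespace given
// in the is.storagesource trait.
public static class PartitionLocationSketch
{
    public static string WithDefaultNamespace(string location, string storageSource)
    {
        // Assumption: a location with an explicit namespace looks like "<namespace>:/<path>".
        if (location.Contains(":/"))
        {
            return location;
        }

        // No namespace given: resolve relative to the is.storagesource value.
        return storageSource.TrimEnd('/') + "/" + location.TrimStart('/');
    }
}
```

With the manifest above, WithDefaultNamespace("Address/partition-data.csv", "adls1:/") returns "adls1:/Address/partition-data.csv", while "adls2:/Account/partition-data.csv" is returned unchanged because it already names its adapter.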
    
  4. Save the manifest.

    Saving the manifest creates the database and its respective tables in SyMS.

    var ret = await manifest.SaveAsAsync($"syms:/{manifest.ManifestName}/{manifest.ManifestName}.manifest.cdm.json");
    

Writing deltas into SyMS

  1. Create a new table in an existing SyMS database.

    1. Read the SyMS database.
      CdmManifestDefinition manifestdb = await corpus.FetchObjectAsync<CdmManifestDefinition>("syms:/<databaseName>/<databaseName>.manifest.cdm.json");
      
    2. Create the table definition, i.e. the [tableName].cdm.json document.
    3. Add [tableName] to the entity declarations in [databaseName].manifest.cdm.json.
    4. Save the manifest.
      var ret = await manifestdb.SaveAsAsync($"syms:/{manifestdb.ManifestName}/{manifestdb.ManifestName}.manifest.cdm.json");
    
  2. Delete a table from SyMS.

    1. Read the SyMS database.
    2. Remove the [tableName] entity declaration from [databaseName].manifest.cdm.json.
    3. Save the manifest.
  3. Modify an existing table.

    1. Read the SyMS database.
    2. Read [tableName].cdm.json.
      var doc = await corpus.FetchObjectAsync<CdmDocumentDefinition>($"syms:/{manifestdb.ManifestName}/<tableName>.cdm.json");
      
    3. Add or modify a column in [tableName].cdm.json.
    4. Save the document.
      var ret = await doc.SaveAsAsync($"syms:/{manifestdb.ManifestName}/<tableName>.cdm.json");
      
    5. Change the entity declaration object in the manifest.
    6. Update LastFileModifiedTime to the current time.
      manifestdb.Entities[0].LastFileModifiedTime = DateTimeOffset.UtcNow;
      
    7. Save the manifest.

Note: Data partition location paths that contain '.' are currently not supported by the SyMS adapter. For example, adls:/abc/is.folder is not supported.

SymsAdapter Class

Please refer to this section.