The .ingest into command (pull data from storage)

The .ingest into command ingests data into a table by "pulling" the data from one or more cloud storage artifacts. For example, the command can retrieve 1000 CSV-formatted blobs from Azure Blob Storage, parse them, and ingest them together into a single target table. Data is appended to the table without affecting existing records, and without modifying the table's schema.

Syntax

.ingest [async] into table TableName SourceDataLocator [with ( IngestionPropertyName = IngestionPropertyValue [, ...] )]

Arguments

  • async: If specified, the command returns immediately and continues the ingestion in the background. The results of the command include an OperationId value that can then be used with the .show operations command to retrieve the ingestion completion status and results, as shown in the sketch after this list.

  • TableName: The name of the table to ingest data to. The table name is always relative to the database in context, and its schema is the schema that will be assumed for the data if no schema mapping object is provided.

  • SourceDataLocator: A literal of type string, or a comma-delimited list of such literals surrounded by ( and ) characters, indicating the storage artifacts containing the data to pull. See storage connection strings.

Note

It is strongly recommended to use obfuscated string literals for the SourceDataLocator when it includes actual credentials. The service will scrub credentials from its internal traces, error messages, etc.

  • IngestionPropertyName, IngestionPropertyValue: Any number of ingestion properties that affect the ingestion process.
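
For example, here is a minimal sketch of an asynchronous ingestion followed by a status check. The table name T, the blob URI, the SAS, and the particular ingestion properties are placeholders chosen for illustration:

.ingest async into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...'
  )
  with (format='csv', ignoreFirstRecord=true)

// The command returns an OperationId; pass it to .show operations to poll
// for completion (the State column shows, for example, InProgress or Completed).
.show operations <OperationId>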

Results

The result of the command is a table with as many records as there are data shards ("extents") generated by the command. If no data shards have been generated, a single record is returned with an empty (zero-valued) extent ID.

Name         Type       Description
ExtentId     guid       The unique identifier for the data shard ("extent") that was generated by the command.
ItemLoaded   string     One or more storage artifacts that are related to this record.
Duration     timespan   How long it took to perform the ingestion.
HasErrors    bool       Whether this record represents an ingestion failure.
OperationId  guid       A unique ID representing the operation. Can be used with the .show operations command.

Remarks

This command does not modify the schema of the table being ingested into. If necessary, the data is "coerced" into this schema during ingestion, not the other way around (extra columns are ignored, and missing columns are treated as null values).
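
As a hypothetical illustration of this coercion (the table T2, its columns, and the blob URI below are invented for this sketch, not part of the examples that follow):

.create table T2 (Name: string, Age: int)

// A CSV record "Alice,30,extra" is ingested as Name="Alice", Age=30
// (the extra value is ignored); a record "Bob" is ingested as
// Name="Bob" with Age treated as null.
.ingest into table T2 (h'https://contoso.blob.core.windows.net/container/people.csv?...')
  with (format='csv')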

Examples

The following example instructs the engine to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table T. The ... represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Note also the use of obfuscated strings (the h in front of the string values) to ensure that the SAS is never recorded.

.ingest into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...',
    h'https://contoso.blob.core.windows.net/container/file2.csv?...'
)

The next example is for ingesting data from Azure Data Lake Storage Gen2 (ADLSv2). The credentials used here (...) are the storage account credentials (shared key), and we use string obfuscation only for the secret part of the connection string.

.ingest into table T (
  'abfss://myfilesystem@contoso.dfs.core.windows.net/path/to/file1.csv;...'
)

The next example ingests a single file from Azure Data Lake Storage (ADLS). It uses the user's credentials to access ADLS (so there's no need to treat the storage URI as containing a secret). It also shows how to specify ingestion properties.

.ingest into table T ('adl://contoso.azuredatalakestore.net/Path/To/File/file1.ext;impersonate')
  with (format='csv')