Avro format in Azure Data Factory and Synapse Analytics
APPLIES TO:
Azure Data Factory
Azure Synapse Analytics
Follow this article when you want to parse Avro files or write the data into Avro format.
For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the Avro dataset.
| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the dataset must be set to Avro. | Yes |
| location | Location settings of the file(s). Each file-based connector has its own location type and supported properties under location. For details, see the Dataset properties section of the relevant connector article. | Yes |
| avroCompressionCodec | The compression codec to use when writing to Avro files. When reading from Avro files, the service automatically determines the compression codec from the file metadata. Supported types are "none" (default), "deflate", and "snappy". Note that Copy activity currently doesn't support Snappy when reading or writing Avro files. | No |
Note
White space in column names is not supported for Avro files.
Below is an example of an Avro dataset on Azure Blob Storage:
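The following is a minimal sketch of such a dataset definition, built only from the properties listed in the table above. The dataset name, the linked service reference, the container, and the folder path are hypothetical placeholders, and the `AzureBlobStorageLocation` location type is assumed from the Azure Blob Storage connector.

```json
{
    "name": "AvroDataset",
    "properties": {
        "type": "Avro",
        "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "<container name>",
                "folderPath": "<folder/subfolder>"
            },
            "avroCompressionCodec": "deflate"
        }
    }
}
```

The sketch uses "deflate" rather than "snappy" because Copy activity does not currently support Snappy for Avro files.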
For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Avro source and sink.
Avro as source
The following properties are supported in the copy activity *source* section.
| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity source must be set to AvroSource. | Yes |
| storeSettings | A group of properties on how to read data from a data store. Each file-based connector has its own supported read settings under storeSettings. For details, see the Copy activity properties section of the relevant connector article. | No |
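As an illustration of the source properties above, a copy activity source section might look like the following sketch. It assumes an Azure Blob Storage store, so the `AzureBlobStorageReadSettings` type and its `recursive` setting come from that connector rather than from the Avro format itself; other file-based connectors use their own read settings.

```json
{
    "source": {
        "type": "AvroSource",
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "recursive": true
        }
    }
}
```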
Avro as sink
The following properties are supported in the copy activity *sink* section.
| Property | Description | Required |
| --- | --- | --- |
| type | The type property of the copy activity sink must be set to AvroSink. | Yes |
| formatSettings | A group of properties. Refer to the Avro write settings table below. | No |
| storeSettings | A group of properties on how to write data to a data store. Each file-based connector has its own supported write settings under storeSettings. For details, see the Copy activity properties section of the relevant connector article. | No |
Supported Avro write settings under formatSettings:
| Property | Description | Required |
| --- | --- | --- |
| type | The type of formatSettings must be set to AvroWriteSettings. | Yes |
| maxRowsPerFile | When writing data into a folder, you can choose to write to multiple files and specify the maximum number of rows per file. | No |
| fileNamePrefix | Applicable when maxRowsPerFile is configured. Specifies the file name prefix when writing data to multiple files, which results in this pattern: <fileNamePrefix>_00000.<fileExtension>. If not specified, the file name prefix is auto-generated. This property does not apply when the source is a file-based store or a partition-option-enabled data store. | No |
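To show how the sink properties and the write settings fit together, here is a minimal sketch of a copy activity sink section. It assumes an Azure Blob Storage store (hence the `AzureBlobStorageWriteSettings` type, which belongs to that connector), and the row limit and the prefix value are arbitrary illustrative choices.

```json
{
    "sink": {
        "type": "AvroSink",
        "storeSettings": {
            "type": "AzureBlobStorageWriteSettings"
        },
        "formatSettings": {
            "type": "AvroWriteSettings",
            "maxRowsPerFile": 1000000,
            "fileNamePrefix": "part"
        }
    }
}
```

With these assumed values, output files would be expected to follow the pattern part_00000.<fileExtension>, provided the source is neither a file-based store nor a partition-option-enabled data store.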
Avro complex data types (records, enums, arrays, maps, unions, and fixed) are not supported in Copy activity.
Data flows
When working with Avro files in data flows, you can read and write complex data types, but be sure to clear the physical schema from the dataset first. In data flows, you can set your logical projection and derive columns that are complex structures, then auto-map those fields to an Avro file.