Convert to TSV

Converts data input to a tab-delimited format

Category: Data Format Conversions

Note

Applies to: Machine Learning Studio (classic)

This content pertains only to Studio (classic). Similar drag and drop modules have been added to Azure Machine Learning designer (preview). Learn more in this article comparing the two versions.

Module overview

This article describes how to use the Convert to TSV module in Azure Machine Learning Studio (classic), to convert any dataset from the internal format that is used by all Azure Machine Learning Studio (classic) modules, to a flat file in tab-separated format.

Tab-separated value (TSV) files are compatible with many external tools, including:

  • R and Python

  • Excel and PowerPivot

  • All relational databases

For example, if your experiment has an intermediate dataset that you would like to save for re-use in another tool or would like to call from code, you convert it to the TSV format, and then right-click the converted dataset to get the Python code needed to access the dataset.

How to use Convert to TSV

Use the Convert to TSV module whenever you need to download a dataset in tab-delimited format.

  1. Add the Convert to TSV to your experiment. You can find this module in the Data Format Conversions category in Azure Machine Learning Studio (classic).

  2. Connect the module to another datset, or to a module that outputs a tabular dataset.

  3. Run the experiment, or right-click just the Convert to TSV module, and select Run selected.

Results

When conversion is complete, you can open the dataset, call it from R or Python code, use it in a Jupyter notebook, or save it to a local file.

If you want to download the dataset, double-click the module output, and indicate whether you want to open or save the datset.

  • If you select Open, the dataset is loaded using whatever tool your computer uses by default to open .TSV files. Typically this is Microsoft Excel.

  • If you select Download dataset, by default, the file is saved with the name of the module plus a GUID representing the workspace ID. However, you can select the Save As option during download and change the file name or location.

Examples

Although there are no examples that are specific to this format, you can see examples of how format conversion is used by exploring these sample experiments in the Azure AI Gallery:

Technical notes

This section contains implementation details, tips, and answers to frequently asked questions.

TSV format requirements

Tab-separated values (TSV) is a text format that is used to store data in a tabular structure. It is very similar to the CSV format, but the delimiter is a tab rather than a comma.

The TSV format is a useful alternative to the CSV format if your data contains commas. Commas are very common in text data and they are used in European number formats.

One problem with the tab-delimited format is that tab stops are frequently considered as white space in unstructured text. However, the IANA standard for TSV fosters clean and accurate parsing of TSV files by disallowing tabs within fields.

Note the following requirements for TSV files in Azure Machine Learning Studio (classic):

  • The Convert to TSV module supports the output of a single heading row, if the dataset contains column names.

  • The TSV provider supports UTF-8 character encoding only.

  • When reading from or writing to TSV files, performance can be slower than with other formats (such as CSV).

Expected inputs

Name Type Description
Dataset Data Table Input dataset

Output

Name Type Description
Results dataset GenericTsv Output dataset

See also

Data Format Conversions
A-Z Module List