Import Count Table

Imports a previously created table of counts

Category: Learning with Counts

Note

Applies to: Machine Learning Studio

This content pertains only to Studio. Similar drag and drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.

Module overview

This article describes how to use the Import Count Table module in Azure Machine Learning Studio.

The purpose of the Import Count Table module is to allow customers who created a table of count-based statistics using an earlier version of Azure Machine Learning to upgrade their experiment. This module merges the existing count tables with new data.

For general information about count tables and how they are used to create features, see Learning with Counts.
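As a rough illustration of what merging existing count tables with new data means, the following minimal Python sketch sums per-key counts from two tables. It is a conceptual example only: the function names `build_count_table` and `merge_count_tables`, the `(value, label)` key layout, and the sample rows are assumptions made for illustration and do not reflect the module's internal storage format.

```python
from collections import Counter


def build_count_table(rows):
    """Count how often each (feature value, label) pair occurs in rows."""
    table = Counter()
    for value, label in rows:
        table[(value, label)] += 1
    return table


def merge_count_tables(existing, new):
    """Return a new table whose counts are the sums of both inputs."""
    merged = Counter(existing)  # copy, so the existing table is not modified
    merged.update(new)          # Counter.update adds counts key by key
    return merged


# Hypothetical data: (country, label) pairs from an old and a new dataset.
old_rows = [("US", 1), ("US", 0), ("FR", 1)]
new_rows = [("US", 1), ("DE", 0)]

existing_table = build_count_table(old_rows)
new_table = build_count_table(new_rows)
merged_table = merge_count_tables(existing_table, new_table)
```

Keys that appear in only one table keep their original counts, and keys that appear in both have their counts added together.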

Important

This module is provided solely for backward compatibility with experiments that use the Build Count Table (deprecated) and Count Featurizer (deprecated) modules. We recommend that you upgrade your experiment to the newer modules to take advantage of new features.

For all new experiments, we recommend that you use the following modules:

How to configure Import Count Table

  1. In Azure Machine Learning Studio, open an experiment that contains a count table created using the Build Count Table (deprecated) module.

  2. Add the Import Count Table module to the experiment.

  3. Connect the two outputs of the Build Count Table (deprecated) module to the matching input ports of the Import Count Table module.

    If you have another dataset of counts that you want to merge with the imported count table, connect it to the rightmost input for the Import Count Table module.

  4. Use the Counting type option to specify where and how the count table is stored:

    • Dataset: The data used to build counts is saved as a dataset in Azure Machine Learning Studio.

    • Blob: The data used to build counts is stored as a block blob in Azure storage.

    • MapReduce: The data used to build counts is stored as a blob in Azure storage.

      This option is typically preferred for very large datasets. To access the counts, you must activate the HDInsight cluster. A MapReduce job is launched to perform the counting. Both of these activities can incur storage and compute costs.

      For more information, see HDInsight on Azure.

    After specifying the data storage mode, you may need to provide additional connection information for the data, even if you previously used an Import Data module in the experiment to access the data. This is because the Count Featurizer (deprecated) module accesses the data storage separately in order to read the data and build the required tables.

  5. Use the Count table type option to specify the format and storage mode of the table used to store counts.

    • Dictionary: Uses a dictionary count table.

      All column values in the selected columns are treated as strings, and are hashed using a bit array of up to 31 bits in size. Therefore, all column values are represented by a non-negative 32-bit integer.

    • CMSketch: Uses a count table saved in the count-min sketch format.

      With this format, multiple independent hash functions with a smaller range are used to improve memory efficiency and reduce the chance of hash collisions.

    In general, use the Dictionary option for smaller datasets (< 1 GB), and use the CMSketch option for larger datasets.

  6. Run the experiment.

  7. When complete, right-click the output of the Import Count Table module, select Save as Transform, and type a name for the transformation. When you do this, the merged count tables and any featurization parameters you might have applied are saved in a format that can be applied to a new dataset.
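The two count table types described in step 5 can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the module's implementation: the helper `hash31`, the classes `DictionaryCountTable` and `CountMinSketch`, the use of SHA-256 as the hash, and the `width` and `depth` defaults are all hypothetical choices made for the example.

```python
import hashlib


def hash31(value, seed=0):
    """Treat value as a string and hash it to a non-negative 31-bit integer."""
    data = f"{seed}:{value}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Keep 4 bytes and clear the top bit so the result fits in 31 bits.
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF


class DictionaryCountTable:
    """One counter per hashed value; distinct values rarely share a key."""

    def __init__(self):
        self.counts = {}

    def add(self, value):
        key = hash31(value)
        self.counts[key] = self.counts.get(key, 0) + 1

    def count(self, value):
        return self.counts.get(hash31(value), 0)


class CountMinSketch:
    """Approximate counts: depth hash functions, each over width buckets."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _bucket(self, value, seed):
        # Each seed acts as an independent hash function with a small range.
        return hash31(value, seed) % self.width

    def add(self, value):
        for seed in range(self.depth):
            self.rows[seed][self._bucket(value, seed)] += 1

    def count(self, value):
        # Collisions can only inflate a bucket, so the minimum across rows
        # is the tightest estimate and never undercounts.
        return min(
            self.rows[seed][self._bucket(value, seed)]
            for seed in range(self.depth)
        )


# Hypothetical column values, counted with both table types.
values = ["red", "blue", "red", "green", "red"]
dictionary_table = DictionaryCountTable()
sketch = CountMinSketch()
for v in values:
    dictionary_table.add(v)
    sketch.add(v)
```

The dictionary table grows with the number of distinct values, whereas the sketch uses a fixed `width * depth` block of counters regardless of how many distinct values it sees. That fixed size is why a sketch-style table tends to suit larger datasets, at the cost of estimates that may overcount.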

Examples

Explore examples of count-based featurization using these sample experiments in the Azure AI Gallery:

Note

These Gallery experiments were all created using the earlier, and now deprecated, version of the Learning with Counts modules. When you open the experiment in Studio, the experiment is automatically upgraded to use the newer modules.

Expected inputs

| Name | Type | Description |
|------|------|-------------|
| Count metadata | Data Table | The metadata of the counts |
| Count table | Data Table | The count table |
| Counted data set | Data Table | The data set used for counting |

Module parameters

| Name | Type | Range | Optional | Default | Description |
|------|------|-------|----------|---------|-------------|
| Counting type | CountingType | | Required | | The counting type |

Outputs

| Name | Type | Description |
|------|------|-------------|
| Counting transform | ITransform interface | The counting transform |

Exceptions

| Exception | Description |
|-----------|-------------|
| Error 0003 | Exception occurs if one or more inputs are null or empty. |
| Error 0018 | Exception occurs if the input dataset is not valid. |

For a list of errors specific to Studio modules, see Machine Learning Error codes.

For a list of API exceptions, see Machine Learning REST API Error Codes.

See also

Learning with Counts
Count Featurizer (deprecated)