Import Count Table
Imports a previously created table of counts
Category: Learning with Counts
Applies to: Machine Learning Studio (classic)
This article describes how to use the Import Count Table module in Machine Learning Studio (classic).
The purpose of the Import Count Table module is to allow customers who created a table of count-based statistics using an earlier version of Machine Learning to upgrade their experiment. This module merges the existing count tables with new data.
For general information about count tables and how they are used to create features, see Learning with Counts.
This module is provided solely for backward compatibility with experiments that use the deprecated Build Count Table and deprecated Count Featurizer modules. We recommend that you upgrade your experiment to use the newer modules, to take advantage of new features.
For all new experiments, we recommend that you use the following modules:
How to configure Import Count Table
In Machine Learning Studio (classic), open an experiment that contains a count table created using the deprecated Build Count Table module.
Add the Import Count Table module to the experiment.
Connect the two outputs of the Build Count Table (deprecated) module to the matching input ports of the Import Count Table module.
If you have another dataset of counts that you want to merge with the imported count table, connect it to the rightmost input for the Import Count Table module.
Use the Counting type option to specify where and how the count table is stored:
Dataset: The data used to build counts is saved as a dataset in Machine Learning Studio (classic).
Blob: The data used to build counts is stored as a block blob in Windows Azure storage.
MapReduce: The data used to build counts is stored as a blob in Windows Azure storage.
This option is typically preferred for very large datasets. To access the counts, you must activate the HDInsight cluster. A MapReduce job is launched to perform the counting. Both of these activities can incur storage and compute costs.
For more information, see HDInsight on Azure.
After specifying the data storage mode, you may need to provide additional connection information for the data, even if you previously used an Import Data module in the experiment to access data. That is because the Count Featurizer (deprecated) module accesses the data storage separately in order to read the data and build the required tables.
Use the Count table type option to specify the format and storage mode of the table used to store counts.
Dictionary: Uses a dictionary count table.
All column values in the selected columns are treated as strings, and are hashed using a bit array of up to 31 bits in size. Therefore, all column values are represented by a non-negative 32-bit integer.
CMSketch: Uses a count table stored as a count-min sketch.
With this format, multiple independent hash functions with a smaller range are used to improve memory efficiency and reduce the chance of hash collisions.
In general, you should use the Dictionary option for smaller datasets (less than 1 GB), and use the CMSketch option for larger datasets.
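The difference between the two formats can be illustrated with a minimal Python sketch. This is not the Studio (classic) implementation: the hash function (BLAKE2 from `hashlib`), the sketch dimensions, and the names `hash31`, `DictionaryCounts`, and `CountMinSketch` are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def hash31(value, salt=0):
    """Hash a value (treated as a string) to a non-negative integer below 2**31."""
    digest = hashlib.blake2b(
        str(value).encode("utf-8"),
        digest_size=8,
        salt=salt.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") & 0x7FFFFFFF  # keep the low 31 bits


class DictionaryCounts:
    """Dictionary-style table: one counter per (hashed value, label) pair."""

    def __init__(self):
        self.counts = defaultdict(int)

    def add(self, value, label):
        self.counts[(hash31(value), label)] += 1

    def get(self, value, label):
        return self.counts.get((hash31(value), label), 0)


class CountMinSketch:
    """Count-min sketch: `depth` salted hashes, each mapped into `width` slots."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, key):
        # One slot per row; each row uses a different salt, so a different hash.
        return [hash31(key, salt=row) % self.width for row in range(self.depth)]

    def add(self, value, label):
        for row, slot in enumerate(self._slots((value, label))):
            self.rows[row][slot] += 1

    def get(self, value, label):
        # Collisions can only inflate a slot, so the minimum is the best estimate.
        return min(
            self.rows[row][slot]
            for row, slot in enumerate(self._slots((value, label)))
        )


# Hypothetical (value, label) observations used only for this illustration.
data = [("JFK", 1), ("JFK", 0), ("JFK", 1), ("SEA", 0)]
exact, sketch = DictionaryCounts(), CountMinSketch()
for value, label in data:
    exact.add(value, label)
    sketch.add(value, label)

print(exact.get("JFK", 1), sketch.get("JFK", 1))
```

In this sketch the dictionary table grows with the number of distinct values, while the count-min sketch uses a fixed amount of memory and may overestimate a count but never underestimates it. That trade-off is one way to read the guidance to prefer CMSketch for larger datasets.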
Run the experiment.
When complete, right-click the output of the Import Count Table module, select Save as Transform, and type a name for the transformation. When you do this, the merged count tables and any featurization parameters you might have applied are saved in a format that can be applied to a new dataset.
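Conceptually, merging count tables amounts to summing counts key by key, and a saved transform then looks up those counts for each row of a new dataset. The following Python sketch illustrates that idea only. The table layout (`{value: {label: count}}`), the function names `merge_count_tables` and `featurize`, the smoothed log-odds feature, and the sample values are assumptions for illustration, not the format that Studio (classic) saves.

```python
import math
from collections import Counter


def merge_count_tables(old, new):
    """Sum per-label counts for every value that appears in either table."""
    merged = {}
    for value in old.keys() | new.keys():
        merged[value] = Counter(old.get(value, {})) + Counter(new.get(value, {}))
    return merged


def featurize(value, table, labels=(0, 1), prior=1.0):
    """Return per-label counts plus a smoothed log-odds feature for one value."""
    counts = table.get(value, Counter())
    n0, n1 = (counts.get(label, 0) for label in labels)
    # The prior keeps the ratio finite for values that were never counted.
    log_odds = math.log((n1 + prior) / (n0 + prior))
    return {"count_0": n0, "count_1": n1, "log_odds": log_odds}


# Hypothetical tables: an imported count table and an additional dataset of counts.
imported = {"JFK": {0: 10, 1: 5}, "SEA": {0: 7, 1: 1}}
additional = {"JFK": {0: 2, 1: 3}, "BOS": {0: 4, 1: 4}}

merged = merge_count_tables(imported, additional)

# Apply the merged counts to values from a new dataset, including an unseen one.
features = [featurize(v, merged) for v in ["JFK", "BOS", "LAX"]]
for row in features:
    print(row)
```

A value that appears in neither table (here `"LAX"`) gets zero counts and a log-odds of zero under this smoothing choice.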
Explore examples of count-based featurization using these sample experiments in the Azure AI Gallery:
Flight delay prediction: Shows how count-based featurization can be useful in a very large dataset.
Learning with Counts: Multiclass classification with NYC taxi data: Demonstrates the use of count-based features in a multiclass prediction task.
Learning with Counts: Binary classification with NYC taxi data: Uses count-based features in a binary classification task.
These Gallery experiments were all created using the earlier, and now deprecated, version of the Learning with Counts modules. When you open the experiment in Studio (classic), the experiment is automatically upgraded to use the newer modules.
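Expected inputs
| Name | Type | Description |
| --- | --- | --- |
| Count metadata | Data Table | The metadata of the counts |
| Count table | Data Table | The count table |
| Counted data set | Data Table | The data set used for counting |

Module parameters
| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| Counting type | CountingType | Required | The counting type |

Outputs
| Name | Type | Description |
| --- | --- | --- |
| Counting transform | ITransform interface | The counting transform |

Exceptions
| Exception | Description |
| --- | --- |
| Error 0003 | Exception occurs if one or more inputs are null or empty. |
| Error 0018 | Exception occurs if the input dataset is not valid. |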
For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.
For a list of API exceptions, see Machine Learning REST API Error Codes.