stefstefan-5861 asked

Not able to re-train model [Multiclassification(AveragedPerceptron)]

Hello! I am new to ML.NET and have decided to try using it to build a dispatcher. Basically, I want it to classify text into one of multiple categories. Due to the high volume of data, when users confirm that a prediction was wrong, I want to add the corrected example to its database (or re-train the model).

I have used AutoML to generate a base model. The algorithm AutoML chose as giving the best results for multiclass classification is AveragedPerceptron. I have checked this page to make sure that it is re-trainable.

I am able to create the first model, but I am struggling to re-train it.

First, I created the model (reproducing all the steps generated by AutoML):

// First Phase: Create the model

         var mlContext = new MLContext(seed: 1);

         // BuildTrainingPipeline

         // Load Data
         var data = mlContext.Data.LoadFromTextFile<ModelInput>(
                                         path: TRAIN_DATA_FILEPATH,
                                         hasHeader: false,
                                         separatorChar: '\t',
                                         allowQuoting: true,
                                         allowSparse: false);

         // Data process configuration with pipeline data transformations
         var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("col0", "col0")
                                   .Append(mlContext.Transforms.Text.FeaturizeText("col1_tf", "col1"))
                                   .Append(mlContext.Transforms.CopyColumns("Features", "col1_tf"))
                                   .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"));

         // Set the training algorithm 
         var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(mlContext.BinaryClassification.Trainers
                                  .AveragedPerceptron(labelColumnName: "col0", numberOfIterations: 10, featureColumnName: "Features"), labelColumnName: "col0")
                                   .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

         IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);

         // Train and save Model

         // Create model here
         ITransformer firstModel = trainingPipeline.Fit(data);

         // Save the model
         mlContext.Model.Save(firstModel, data.Schema, MODEL_FILEPATH);

Then, assuming I have new data to train the model with:
// Second Phase: Re-train the model

         // New Data
         ModelInput[] ticketData = new ModelInput[]
         {
               new ModelInput
               {
                   Col0 = "Category 3",
                   Col1 = "Text to classify 1"
               },
               new ModelInput
               {
                   Col0 = "Category 2",
                   Col1 = "Text to classify 2"
               },
               new ModelInput
               {
                   Col0 = "Category 3",
                   Col1 = "Text to classify 3"
               },
               new ModelInput
               {
                   Col0 = "Category 2",
                   Col1 = "Text to classify 4"
               },
               new ModelInput
               {
                   Col0 = "Category 1",
                   Col1 = "Text to classify 5"
               }
         };

         // Create MLContext
         MLContext mlContext = new MLContext();

         // Define the DataViewSchema of the trained model
         DataViewSchema modelSchema;

         // Load trained model
         var trainedModel = mlContext.Model.Load(MODEL_FILEPATH, out modelSchema);

         //Load New Data
         IDataView newData = mlContext.Data.LoadFromEnumerable<ModelInput>(ticketData);

        // This is where I get stuck: I don't know how to retrain the model with the new data. I have tried following the guidance in two related topics, as well as the changes suggested in a third, but with no result.


I think my issues come from the multiclass setup: the trainer is of type EstimatorChain and my model is of type TransformerChain, and my trainer.Fit does not accept two arguments.
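
A possible fallback, sketched under assumptions and not a confirmed fix: the two-argument Fit(data, modelParameters) overload exists on the binary AveragedPerceptron trainer, and the OneVersusAll wrapper used above does not appear to expose an equivalent. If incremental re-training is not available for that wrapper, one option is to re-fit the whole pipeline on the original rows plus the corrected ones. The ModelInput shape below is assumed from what AutoML typically generates, and RefitSketch / RefitOnCombinedData are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Assumed shape of the AutoML-generated input class (column names taken from the pipeline above).
public class ModelInput
{
    [ColumnName("col0"), LoadColumn(0)]
    public string Col0 { get; set; }

    [ColumnName("col1"), LoadColumn(1)]
    public string Col1 { get; set; }
}

public static class RefitSketch
{
    // Hypothetical helper: trains the full pipeline again on old + new rows.
    // This is a full re-fit from scratch, NOT incremental re-training.
    public static ITransformer RefitOnCombinedData(
        MLContext mlContext,
        IEstimator<ITransformer> trainingPipeline,
        IDataView originalData,
        IEnumerable<ModelInput> newRows)
    {
        // Materialize the original rows so they can be concatenated in memory.
        // reuseRowObject must be false because the rows are kept in a list.
        List<ModelInput> combined = mlContext.Data
            .CreateEnumerable<ModelInput>(originalData, reuseRowObject: false)
            .Concat(newRows)
            .ToList();

        // Wrap the combined rows in an IDataView and fit the same pipeline on it.
        IDataView combinedData = mlContext.Data.LoadFromEnumerable(combined);
        return trainingPipeline.Fit(combinedData);
    }
}
```

With the variables from the snippets above, the call would look like RefitSketch.RefitOnCombinedData(mlContext, trainingPipeline, data, ticketData), followed by mlContext.Model.Save as in the first phase. Materializing all rows in memory may be impractical for a very large dataset, so this trades the efficiency of true incremental re-training for simplicity.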


0 Answers