stefstefan-5861 asked

Not able to re-train model [Multiclassification(AveragedPerceptron)]

Hello! I am new to ML.NET and have decided to try using it to build a dispatcher. Basically, I want it to classify text into one of multiple categories. Due to the high volume of data, when users confirm that a prediction is wrong, I want the corrected example to be added to its database (or the model to be re-trained).

I have used AutoML to generate a base model. The algorithm that AutoML chose as giving the best results for multiclass classification is AveragedPerceptron. I have checked this page to make sure that it is re-trainable.

I am able to create the first model, but I am struggling to re-train it.

First, I created the model (reproducing all the steps generated by AutoML):

// First Phase: Create the model

         var mlContext = new MLContext(seed: 1);


         // BuildTrainingPipeline

         // Load Data
         var data = mlContext.Data.LoadFromTextFile<ModelInput>(
                                         path: TRAIN_DATA_FILEPATH,
                                         hasHeader: false,
                                         separatorChar: '\t',
                                         allowQuoting: true,
                                         allowSparse: false);


         // Data process configuration with pipeline data transformations
         var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("col0", "col0")
                                   .Append(mlContext.Transforms.Text.FeaturizeText("col1_tf", "col1"))
                                   .Append(mlContext.Transforms.CopyColumns("Features", "col1_tf"))
                                   .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
                                   .AppendCacheCheckpoint(mlContext);



         // Set the training algorithm 
         var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(mlContext.BinaryClassification.Trainers
                                  .AveragedPerceptron(labelColumnName: "col0", numberOfIterations: 10, featureColumnName: "Features"), labelColumnName: "col0")
                                   .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));


         IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);



         // Train and save Model


         // Create model here
         ITransformer firstModel = trainingPipeline.Fit(data);

         // Save the model
         mlContext.Model.Save(firstModel, data.Schema, MODEL_FILEPATH);



Then, assuming I have new data to train the model with:

// Second Phase: Re-train the model

         // New Data
         ModelInput[] ticketData = new ModelInput[]
         {

               new ModelInput
               {
                   Col0 = "Category 3",
                   Col1 = "Text to classify 1"
               },

               new ModelInput
               {
                   Col0 = "Category 2",
                   Col1 = "Text to classify 2"
               },

               new ModelInput
               {
                   Col0 = "Category 3",
                   Col1 = "Text to classify 3"
               },

               new ModelInput
               {
                   Col0 = "Category 2",
                   Col1 = "Text to classify 4"
               },

               new ModelInput
               {
                   Col0 = "Category 1",
                   Col1 = "Text to classify 5"
               },

         };



         // Create MLContext
         MLContext mlContext = new MLContext();

         // Define the DataViewSchema of the trained model
         DataViewSchema modelSchema;

         // Load trained model
         var trainedModel = mlContext.Model.Load(MODEL_FILEPATH, out modelSchema);

         // Load new data
         IDataView newData = mlContext.Data.LoadFromEnumerable<ModelInput>(ticketData);
     


        // This is where I get stuck: I don't know how to re-train the model with the new data. I have tried to follow the guidance in these topics: [here](https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/retrain-model-ml-net), [here](https://github.com/dotnet/machinelearning/blob/36fab9b6806260e64e50992450a219e869c7f74a/test/Microsoft.ML.Functional.Tests/Training.cs#L80-L118), and the changes suggested [here](https://github.com/dotnet/machinelearning/issues/5247), but with no result.

Issue

I think my issue is caused by the multiclass setup: the trainer is of type EstimatorChain and my model is of type TransformerChain.
As a result, my trainer.Fit doesn't take 2 arguments, so I cannot pass the previously trained parameters to it.
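
To make the mismatch concrete, here is a minimal sketch of what compiles and what does not, assuming the ML.NET 1.x API. The names `RetrainMismatchSketch`, `transformedNewData`, and `previousWeights` are hypothetical and used only for illustration. The sketch is about method signatures: run on its own, the binary trainer would need a Boolean label column rather than the key-typed `col0`.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class RetrainMismatchSketch
{
    // transformedNewData: new rows already passed through the data-processing transforms (assumed).
    // previousWeights: linear parameters of a previously trained binary model (assumed to be available).
    public static void Show(
        MLContext mlContext,
        IDataView transformedNewData,
        LinearModelParameters previousWeights)
    {
        // The binary AveragedPerceptron trainer on its own exposes a
        // two-argument Fit that continues training from existing weights.
        var binaryTrainer = mlContext.BinaryClassification.Trainers.AveragedPerceptron(
            labelColumnName: "col0",
            featureColumnName: "Features",
            numberOfIterations: 10);
        var warmStarted = binaryTrainer.Fit(transformedNewData, previousWeights);

        // Wrapped in OneVersusAll, only the one-argument Fit is available,
        // and it trains from scratch.
        var ovaTrainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
            binaryTrainer, labelColumnName: "col0");
        var trainedFromScratch = ovaTrainer.Fit(transformedNewData);

        // The call I would like to make does not compile:
        // ovaTrainer.Fit(transformedNewData, previousWeights);
    }
}
```

The same applies to the EstimatorChain built with Append: its Fit accepts only an IDataView. So the open question is how to get the per-class parameters out of the loaded TransformerChain and hand them back to the one-versus-all trainer.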

dotnet-ml-big-data

0 Answers