MicroMe-4183 asked AdityaAgarwal-5929 commented

Does ML Studio/Designer/AutoML support Natural Language Processing?

Hey everyone,
I don't think Microsoft states this explicitly anywhere, so I was wondering whether I can create models (via AutoML or the manual designer) from datasets that contain natural-language text (a couple of sentences, paragraphs, etc.). AutoML doesn't indicate anywhere that it can process paragraphs using NLP.
There are Text Analytics features in the designer, and I have heard about Azure AutoML's BERT support, so I suppose it should be possible, but I just wanted to make sure.
Right now I can upload such a dataset and create a classification model from it, but I don't know whether it treats each cell containing a paragraph as one long string and does nothing more, or whether it actually processes the individual words.
Could anyone let me know, please? And if it does support NLP, what can I do besides classification? Can it do sentiment analysis, entity extraction etc.?

Thanks a lot!

(I don't see a tag for AutoML or the designer, that's why I tagged the classic Studio.)

azure-machine-learning, azure-machine-learning-studio-classic

Hello,

Thanks for reaching out to us. Could you please share a sample input of your NLP dataset if that is not confidential? Are you aiming at "paragraph" training only? What is your expected result?

If you are aiming at sentiment analysis, the Text Analytics API should be a better option: https://docs.microsoft.com/en-us/azure/cognitive-services/Text-Analytics/how-tos/text-analytics-how-to-sentiment-analysis?tabs=version-3-1

Please let me know more details.


Regards,
Yutong


Hey @YutongTie-5848 ,
I’ve already answered below, could you please check it? I hope it’s clear enough, let me know if you need more information.

Thank you


Thanks for the details, checking internally.

Regards,
Yutong

MicroMe-4183 answered MicroMe-4183 edited

@YutongTie-5848 (For whatever reason the reply button doesn't work for me, that's why I posted it this way.)

Well, let's say it's e-mail classification. You have a .csv file with the text of an e-mail in one column and its class in the other. An example could be:
"Click here to win $1000.",spam
"Hey, how are you?",normal
"Hello, PFA pictures.",normal

This is just an example with the text of the e-mails being just one sentence but you can imagine e-mails can be longer (like a paragraph or even more). You obviously can't treat a paragraph of text (like here) the same way you would treat a classification with the text being just one word.
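
To make the distinction concrete, here is a minimal sketch in standard-library Python that reads the three sample rows above and contrasts the two views being asked about: the whole cell as one opaque category versus the cell split into word counts. It is only an illustration and not how AutoML featurizes text; the `bag_of_words` helper and its regex tokenizer are assumptions made up for this example.

```python
import csv
import io
import re
from collections import Counter

# The three sample rows from the post (text column, label column).
SAMPLE_CSV = '''"Click here to win $1000.",spam
"Hey, how are you?",normal
"Hello, PFA pictures.",normal
'''

def load_rows(csv_text):
    """Parse (text, label) pairs from CSV text."""
    return [(text, label) for text, label in csv.reader(io.StringIO(csv_text))]

def bag_of_words(text):
    """Hypothetical helper: lowercase word counts, a crude stand-in for text featurization."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))

rows = load_rows(SAMPLE_CSV)

# View 1: the whole cell is a single categorical value, so every e-mail is its own category.
categories = {text for text, _ in rows}

# View 2: each cell is broken into words, so different e-mails can share features.
word_features = [bag_of_words(text) for text, _ in rows]

print(len(categories), "distinct whole-string categories")
for (text, label), features in zip(rows, word_features):
    print(label, dict(features))
```

With whole-string categories, no two e-mails have anything in common unless they are identical, which is why sentence-length text needs some form of word-level processing.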

I saw the e-mail classification (and other similar ones) being done in Azure AI gallery (https://gallery.azure.ai/Experiment/Email-Classification-for-Automated-Support-Ticket-Generation-Step-1-of-2-Train-and-Evaluate-Models-3) but that's in the studio/designer, not AutoML.

My expected result is classification but I wanted to know if AutoML can do other common tasks where NLP is used (named-entity recognition, sentiment analysis etc.).

When I choose the featurization settings, I see that there is a "Text" option as a feature type but one word is probably also "Text". So I'm asking if AutoML processes long strings in a different way.

I know there is the API but I would like to create models specifically via AutoML (or designer but preferably AutoML) for now.


YutongTie-MSFT answered AdityaAgarwal-5929 commented

@MicroMe-4183

Hello,

Here is a list of the samples we have right now. https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning

I see the first three scenarios are very similar to yours. Could you please check if that fits your scenario?

80382-image.png


Regards,
Yutong



Hello @YutongTie-5848 ,
thank you for your response.
I don’t think the first two fit my scenario as the text data in the dataset consist of one word per cell (e.g. “married” or “single” in the “marital” column in the bank marketing dataset). However in my example the text data consist of sentences and I believe you cannot process sentences the same way you would process a single word during classification. I think the last example might be what I’m looking for but I’m not sure. Could you maybe ask your devs if AutoML can process these longer text data, please?
Btw. I was referring to AutoML via web interface (no coding). I suppose the AutoML via web interface has the same features though, correct?

Thank you!


Yes, I am checking on this. I will let you know soon.


Hello @MicroMe-4183

I got confirmation that AutoML supports sentence/paragraph text classification. However, we currently use only the first 128 tokens (roughly, words) of each sentence/paragraph, so very long text might not work too well. The document you are referring to is the correct one: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-features#bert-integration-in-automated-ml

Hope this helps.

Regards,
Yutong
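
A rough way to see what the 128-token limit mentioned above means for a long e-mail is sketched below. It assumes whitespace-separated words as tokens, which is only an approximation: BERT-style models split text into subword pieces, so the real cutoff usually arrives after fewer than 128 words. The `truncate_tokens` function is a hypothetical helper, not part of AutoML.

```python
def truncate_tokens(text, max_tokens=128):
    """Keep only the first `max_tokens` whitespace-separated tokens of `text`.

    Returns the truncated text and the number of tokens that were dropped.
    """
    tokens = text.split()
    kept = tokens[:max_tokens]
    return " ".join(kept), len(tokens) - len(kept)

# A synthetic 300-word "e-mail" standing in for a long paragraph.
long_email = " ".join(f"word{i}" for i in range(300))

kept_text, dropped = truncate_tokens(long_email)
print(len(kept_text.split()), "tokens kept,", dropped, "dropped")
```

If the part of the text that determines the class tends to appear late in the message, a check like this can help decide whether to shorten or summarize the text before training.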


Hi @MicroMe-4183 ,

classification-text-dnn is the correct notebook for this!

This feature is supported in the web interface too. To use it there, make sure to select "enable deep learning". You can also manually set the column type to 'text' if AutoML does not recognize the column type.

Also, a tip: GPU compute generally works better (not all models are available in CPU scenarios) :)

Thanks,
Adi
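
For readers using the Python SDK (v1) rather than the web interface, the sketch below shows the kind of settings involved. As far as I understand it, `enable_dnn` is the SDK counterpart of the "enable deep learning" checkbox; the column name, metric and timeout are illustrative assumptions, and the workspace, dataset and GPU compute target are assumed to exist and are not shown.

```python
# Keyword arguments one might pass to azureml.train.automl.AutoMLConfig (SDK v1).
automl_settings = {
    "task": "classification",
    "primary_metric": "accuracy",       # illustrative choice
    "label_column_name": "label",       # hypothetical column name
    "enable_dnn": True,                 # include deep learning (BERT-style) models
    "experiment_timeout_minutes": 30,   # illustrative value
}

# With the SDK installed, plus a registered tabular dataset and a GPU compute
# target (both assumed here), the settings would be used roughly like this:
#   from azureml.train.automl import AutoMLConfig
#   config = AutoMLConfig(training_data=train_dataset,
#                         compute_target=gpu_cluster,
#                         **automl_settings)

for key, value in automl_settings.items():
    print(f"{key} = {value!r}")
```

The classification-text-dnn notebook mentioned above is the authoritative reference for a complete, working configuration.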
