DivyaK-4075 asked HimanshuSinha-MSFT answered

Speed up keyphrase extraction on huge dataset

Hi,

I am using the Cognitive Services integration from the mmlspark package to extract key phrases. My dataset has ~500k records (5 lakh), and the job takes too long: it runs for more than 24 hours. Is there a faster or more efficient way to extract key phrases from a dataset this large?

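 from mmlspark.cognitive import KeyPhraseExtractor

 # service_key holds the Cognitive Services subscription key
 keyphrase = (KeyPhraseExtractor()
     .setTextCol("text")
     .setLocation("eastus")
     .setSubscriptionKey(service_key)
     .setOutputCol("keyphrase")
     )

 # df_cleaned is the ~500k-row DataFrame with a "text" column
 results = keyphrase.transform(df_cleaned)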

I am running the job in a Synapse notebook on a Spark cluster.

Thanks,
Divya



azure-synapse-analytics

@DivyaK-4075 I think the GitHub repo for mmlspark would be the appropriate place to get recommendations on this question from the right team of experts. Could you also post an issue on the GitHub repo?
The azure-cognitive-services tag is meant for issues directly related to Cognitive Services API calls. In this case, the implementation of KeyPhraseExtractor in mmlspark might have limitations that are not set directly by the Cognitive Services APIs.
I am also adding the Synapse tag so that others from the analytics community can chip in. Thanks!




1 Answer

HimanshuSinha-MSFT answered

Hello @DivyaK-4075,
Thanks for the question and for using the MS Q&A platform.
As we understand it, the ask here is how to speed up the execution; please let us know if that is not accurate.
I suggest starting by increasing the node size and seeing if that helps.

198621-image.png
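
To judge whether a larger node is likely to be enough, it may help to work out the current throughput first. Below is a rough sketch in plain Python that uses only the figures from the question (about 500k records, at least 24 hours). The function names are made up for illustration, and the linear-scaling assumption is optimistic: the Cognitive Services rate limit for your pricing tier will cap the gain at some point.

```python
# Back-of-envelope throughput estimate. All names here are illustrative,
# not part of mmlspark or Synapse.

RECORDS = 500_000      # dataset size stated in the question
ELAPSED_HOURS = 24     # observed runtime (a lower bound: "more than 24 hrs")


def rows_per_second(records, hours):
    """Average rows processed per second over the whole run."""
    return records / (hours * 3600)


def hours_needed(records, rate_per_sec, parallel_requests):
    """Estimated runtime if `parallel_requests` calls run at once.

    Assumes throughput scales linearly with parallelism, which only
    holds until the service starts throttling.
    """
    return records / (rate_per_sec * parallel_requests) / 3600


observed = rows_per_second(RECORDS, ELAPSED_HOURS)
print(f"observed throughput: {observed:.2f} rows/s")

for parallel in (1, 4, 16):
    estimate = hours_needed(RECORDS, observed, parallel)
    print(f"{parallel:>2} parallel requests: about {estimate:.1f} h")
```

If the observed rate is only a few rows per second, the job is probably spending most of its time waiting on HTTP calls rather than on CPU, so it is worth checking how many requests are in flight at once before and after resizing.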


Please do let me know if you have any queries.
Thanks
Himanshu

