Extract Key Sections from Legal Document using Azure AI services

Suresh Rajamani 20 Reputation points
2024-04-18T05:36:33.1933333+00:00

Hi Team,

I am doing a POC to extract key sections and some clauses/keywords from legal documents using Azure AI services. Based on some suggestions, I am trying the following services.

  1. Azure Language Services -> Custom NER
  2. Azure Document Intelligence -> Custom Extraction

1. Azure Language Services -> Custom NER

First I tried Custom NER, labelling the documents to identify key sections such as Agreement Header, Insurance, Regulatory Compliance, Termination, Non-Solicitation, etc.

For each of these sections I labelled the whole paragraph together with its section title. In Regulatory Compliance I labelled only keywords such as GDPR and PCI DSS, because I don't need the paragraph there. I labelled 10 documents, each with a different structure. After training, I get an F1 score of 0 and every labelled entity is a false negative. It does not even find Agreement Header, which is straightforward and does not vary much.
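For readers interpreting that result: an F1 of 0 with only false negatives means the model predicted no entities at all on the evaluation documents, so recall is 0 and F1 follows. The sketch below uses the standard precision/recall/F1 definitions; the counts are hypothetical, and it does not reproduce how the Azure evaluation aggregates scores per entity type.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 12 labelled entities in the test split, none predicted.
print(precision_recall_f1(tp=0, fp=0, fn=12))

# Hypothetical counts for a model that finds most of the labelled entities.
print(precision_recall_f1(tp=9, fp=3, fn=3))
```

With only 10 documents, the automatically held-out test split is very small, so a single entity type that the model never predicts is enough to produce this picture.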

Then I deployed the model and tested it with one of the documents used for training. It does not find a single entity in the document. I also tried submitting just one paragraph, the Agreement Header, and it still finds nothing. Is this the right approach, or is Custom NER not the right tool for extracting key sections/entities?

2. Azure Document Intelligence -> Custom Extraction

Then I labelled the sections using the Field option in Custom Extraction and trained the model. It is able to extract the sections, but with only about 60% accuracy.

Approach I planned:

I am trying to do this legal analysis model training in 2 steps.

  1. Label the key sections in order to extract them and save each one as PDF/text.
  2. Label the keywords inside the extracted key sections in order to extract the keywords.

So 2 models will be deployed after training. At test time, the first model extracts the sections and the second model extracts the keywords from them.
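To make the two-stage flow concrete, here is a minimal rule-based sketch of it in plain Python. It is not an Azure API and not a trained model: it swaps in regular expressions for both stages, and it only works when section titles appear at the start of a line. The title list, keyword list, and sample text are hypothetical, taken from the examples mentioned above.

```python
import re

# Hypothetical section titles and keywords, based on the examples in this question.
SECTION_TITLES = ["Insurance", "Regulatory Compliance", "Termination", "Non-Solicitation"]
KEYWORDS = {"Regulatory Compliance": ["GDPR", "PCI DSS"]}


def split_sections(text: str, titles: list[str]) -> dict[str, str]:
    """Stage 1: cut the text at lines that start with a known section title."""
    alternatives = "|".join(re.escape(t) for t in titles)
    heading = re.compile(
        r"^[ \t]*(?:\d+[.)][ \t]*)?(" + alternatives + r")\b.*$",
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(heading.finditer(text))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        # A section body runs from the end of its heading line to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        canonical = next(t for t in titles if t.lower() == m.group(1).lower())
        sections[canonical] = text[m.end():end].strip()
    return sections


def find_keywords(section_text: str, keywords: list[str]) -> list[str]:
    """Stage 2: return the keywords that occur in one section's text."""
    return [
        k for k in keywords
        if re.search(r"\b" + re.escape(k) + r"\b", section_text, re.IGNORECASE)
    ]


sample = """1. Insurance
The Supplier shall maintain liability cover.
2. Regulatory Compliance
The Supplier shall comply with GDPR and PCI DSS.
3. Termination
Either party may terminate with 30 days notice.
"""

sections = split_sections(sample, SECTION_TITLES)
for title, body in sections.items():
    print(title, "->", find_keywords(body, KEYWORDS.get(title, [])))
```

A baseline like this can also serve as a sanity check for the trained models: if the rules find a section that a model misses on the same document, the labelling or training data is the likelier culprit.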

So I tried to combine those 2 Azure AI services to extract sections and keywords.

Is this the right approach?

Which service can I use for section extraction, and which one for keyword extraction from the extracted sections?

Is there anything wrong with my labelling? I am not sure what is wrong (NER gives very bad output).

Or can you suggest another approach to achieve this?

Thanks,

Suresh Rajamani

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. dupammi 6,475 Reputation points Microsoft Vendor
    2024-04-18T07:38:53.39+00:00

    Hi @Suresh Rajamani

    Thank you for reaching out to the Microsoft Q&A forum.

    I understand that you are trying to extract key sections and clauses/keywords from legal documents using Azure AI services. Based on your description, you have tried using Azure Language Services -> Custom NER and Azure Document Intelligence -> Custom Extraction, but you are facing some issues.

    Regarding your first approach, Custom NER is designed for extracting entities from unstructured text, such as contracts or financial documents. However, you are aiming to extract whole sections, which are not really named entities, so performance will depend heavily on the specifics of your documents and the labeling process.

    In your second approach, Custom Extraction from Azure Document Intelligence seems promising for structured data extraction, like tables or forms. However, for unstructured data extraction, such as key sections and clauses/keywords, Custom Extraction might not be the ideal tool.

    Given your requirements, a different approach might be more suitable: combining Azure Cognitive Search (now named Azure AI Search) with its AI-powered features. Azure Cognitive Search has been recommended in the past for extracting information from documents. The service provides full-text search along with AI enrichment such as natural language processing. You could create a custom skillset within Azure Cognitive Search that incorporates skills such as OCR, language detection, key phrase extraction, entity recognition, and text splitting to extract key sections and clauses/keywords from your legal documents.

    Here are the steps you can follow:

    1. Create an Azure Cognitive Search service and index your legal documents.
    2. Define a custom skillset incorporating OCR, language detection, key phrase extraction, entity recognition, and text splitting.
    3. Define an indexer to use the custom skillset to extract and index key sections and clauses/keywords from the legal documents.
    4. Utilize Azure Cognitive Search to search for key sections and clauses/keywords within the indexed documents.
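As a rough illustration of step 2, the sketch below builds a skillset definition as a Python dictionary and prints it as JSON, in the shape the service's skillset REST API expects as far as I understand it. The skill type names are the built-in ones for the five skills listed above; the skillset name, page length, entity categories, and target names are assumptions for illustration, and the exact fields should be checked against the current documentation before use.

```python
import json


def build_skillset(name: str) -> dict:
    """Assemble a skillset body with the five skills named in the steps above."""
    page = "/document/pages/*"
    lang = {"name": "languageCode", "source": "/document/languageCode"}
    return {
        "name": name,
        "description": "Sketch: enrich legal documents before indexing",
        "skills": [
            {   # OCR for scanned pages; merging its text into the content is omitted here.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document/normalized_images/*",
                "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
                "outputs": [{"name": "text", "targetName": "text"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "languageCode", "targetName": "languageCode"}],
            },
            {   # Split long contracts into chunks ("pages") of assumed length.
                "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
                "context": "/document",
                "textSplitMode": "pages",
                "maximumPageLength": 4000,
                "inputs": [{"name": "text", "source": "/document/content"}, lang],
                "outputs": [{"name": "textItems", "targetName": "pages"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": page,
                "inputs": [{"name": "text", "source": page}, lang],
                "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
                "context": page,
                "categories": ["Organization", "Person", "Location"],
                "inputs": [{"name": "text", "source": page}, lang],
                "outputs": [{"name": "organizations", "targetName": "organizations"}],
            },
        ],
    }


# "legal-docs-skillset" is a hypothetical name.
skillset = build_skillset("legal-docs-skillset")
print(json.dumps(skillset, indent=2))
```

Note that the built-in key phrase and entity skills return generic phrases and entity types; they do not know labels such as "Termination" or "Non-Solicitation", so mapping enriched chunks to those sections would still need extra logic.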

    For more detailed and related guidance on using Azure Cognitive Search for document extraction, you can refer to the provided documentation.

    I hope this alternative approach addresses your needs more effectively.

    Thank you.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".