question

TheTimp-4451 asked deherman-MSFT commented

Search for Text in PDF File stored in Blob

Is there a way to search the text of PDF files stored in blobs without using Azure Cognitive Search?

I configured a test blob container, and from what I can see the Microsoft.Search/searchServices charge is going to be AU$12.19 per day, or roughly AU$360 per month.
(For comparison, about 100 MB of test PDFs costs AU$0.02 under Microsoft.Storage/storageAccounts.)
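
As a rough check of the monthly figure above, the projection can be sketched as follows. It assumes the daily charge stays flat for the whole month; the function name `monthly_cost` is only illustrative.

```python
def monthly_cost(daily_rate, days_in_month=30):
    """Project a flat daily charge over a month of the given length."""
    return daily_rate * days_in_month


daily_aud = 12.19  # daily search service charge reported in the post

# Show the projection for short, typical, and long months.
for days in (28, 30, 31):
    print(f"{days} days: AU${monthly_cost(daily_aud, days):.2f}")
```

For a 30-day month this lands slightly above the AU$360 estimate, so the quoted figure is in the right range.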

Perhaps I have something configured that I can turn off?

It would be cheaper to spin up a server VM with a SQL database and use that with the Adobe PDF iFilter on a varbinary(max) BLOB column.

9928-rgteststorageconfig.png
9996-rgteststoragecosts.png


azure-blob-storage azure-cognitive-search

TheTimp-4451 answered

I dropped the search service and recreated it on the Basic tier, down from the Standard (S1) tier, and that looks a bit cheaper.

10232-searchpricingtier.png




But given that the search limits top out at 256 MB (maximum file size), it looks like the service won't suit me, as my PDF reports are often larger than that.

And if the 'Storage per partition' has to hold the actual files, and not just the indexes, my 40 GB of files will require an S2 at almost AU$2000 per month!
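
To make the tier question concrete, here is a minimal sketch of picking the smallest tier for a given storage requirement. The per-partition limits in `ASSUMED_STORAGE_GB` are assumptions to be verified against the current service limits page, and the 25% index-to-file ratio is purely hypothetical, since the thread does not establish how large an index is relative to its source files.

```python
# Assumed storage per partition in GB. These values are NOT taken from the
# thread; check them against the current Azure Cognitive Search limits.
ASSUMED_STORAGE_GB = {"Basic": 2, "S1": 25, "S2": 100}


def smallest_tier(required_gb, limits=ASSUMED_STORAGE_GB):
    """Return the smallest tier whose assumed limit covers required_gb, else None."""
    for tier, limit_gb in sorted(limits.items(), key=lambda kv: kv[1]):
        if required_gb <= limit_gb:
            return tier
    return None


raw_gb = 40  # total size of the PDF files mentioned in the post

# Case 1: the partition would have to hold the raw files.
print(smallest_tier(raw_gb))

# Case 2: hypothetical ratio, index is 25% of the raw file size.
print(smallest_tier(raw_gb * 0.25))
```

Under these assumed limits, 40 GB of raw files would need S2, which matches the estimate above, while a smaller index could fit a lower tier. Which case applies is exactly the open question.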

Am I missing something??

https://azure.github.io/LearnAI-KnowledgeMiningBootcamp/labs/lab-02-azure-cognitive-search.html












deherman-MSFT answered

@TheTimp-4451 Firstly, apologies for the delay in responding here!

Currently, Azure Cognitive Search is the recommended method inside Azure to accomplish this. I see you have already found the different pricing tiers that are available.

To get the best possible answer on pricing, please contact Azure Billing support. It's free, and it is what I recommend in this circumstance.

You can also use the Pricing calculator for your own detailed analysis.

Hope this helps!
Kindly let us know if the above helps or you need further assistance on this issue.



Please do not forget to "Accept the answer" wherever the information provided helps you to help others in the community.




TheTimp-4451 answered deherman-MSFT commented

Thanks, I have contacted Azure Billing support and checked out the Pricing Calculator, but it's not very clear.

For example:

When indexing the content and metadata of a PDF file that is, say, 2 MB in size, do I need to ensure the 'Storage per partition' has:
1. At least 2 MB (the size of the original file), or
2. An amount less than 2 MB, as the 'Storage per partition' will only hold the indexes of the document?
I would have assumed that the 'Storage per partition' needed would be less than the original file size.

RE: at the moment there is no direct way to index files greater than 256 MB
I have 1400 PDF files to put in the blob container, ranging in size from 5 MB to 500 MB.
If I split the PDF files into individual pages and loaded each page into its own blob, the indexer could index every page, as each would be less than 16 MB, but that would be a lot of blobs…
At what point would I realistically hit the upper limit of blobs?
The documentation says: "approximately 24 billion documents per index on Basic". At what point would I see performance degradation?
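
A rough sketch of the numbers involved in the page-splitting idea is below. The file count and the document limit come from this post; the average page count and the sample file sizes are hypothetical placeholders, and the helper names are illustrative only. It says nothing about where performance starts to degrade.

```python
FILE_COUNT = 1400            # number of PDF files, from the post
DOC_LIMIT = 24_000_000_000   # "approximately 24 billion documents per index"


def page_blob_estimate(file_count, avg_pages_per_file):
    """Number of single-page blobs produced by splitting every file."""
    return file_count * avg_pages_per_file


def oversized(sizes_mb, limit_mb):
    """Return the file sizes (in MB) that exceed the per-blob size limit."""
    return [size for size in sizes_mb if size > limit_mb]


# Hypothetical average of 500 pages per report; replace with a measured value.
blobs = page_blob_estimate(FILE_COUNT, 500)
print(f"{blobs} page blobs, {blobs / DOC_LIMIT:.6%} of the quoted document limit")

# Hypothetical sample of file sizes within the 5 MB to 500 MB range.
print(oversized([5, 120, 300, 500], limit_mb=256))
```

Even with a generous page count, the blob total stays far below the quoted per-index document limit, so the practical constraint is more likely the per-file size limit than the document count.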



@TheTimp-4451 Apologies for the delayed follow-up here. Can you please provide the billing case number? That way I can review the discussion internally and help get your questions answered.
