2 weeks ago
Hi,
I am looking for a way to get this whole block of text as a paragraph of some sort.
My main issue is that this paragraph may sometimes be multi-page.
How do I assign a region to a multi-page field? Is there a way to indicate the START and the END of a paragraph, or some other way to do it, such as splitting the whole document into multiple sections?
This is the current setting for its field:
Any suggestions?
2 weeks ago
Hi @dizonf,
Decipher is not designed to extract large text fields spanning multiple pages. For this kind of general "extract all" function, you may be best off looking at OCR extraction. IDP is more for discrete fields, whereas OCR will extract from the whole document.
Thanks
Ben
2 weeks ago
Greetings, @dizonf ,
Generally speaking, this seems to be one of those use cases that they chalk up to Agentic AI being able to parse out for you. Depending on where you are in your AI journey, that may be something to investigate.
Barring the AI route, what follows is a large guess as to something that could work, but it is not something that I have done directly, and, thus, I am not making any kind of guarantee.
First, a prerequisite assumption is that the document you have in Decipher is a PDF. If not, the rest of this is probably less than helpful.
Blue Prism has a VBO in the Digital Exchange called 'PDF Management'. I do not believe it is part of the core selection, so you may not have it installed in your instance. In our organization we use it solely to merge separate PDFs into a single output file.
But... there is an action called 'Extract All Text'. Again, I have not used it, and I am guessing the output may not be pretty or formatted. However, it seems like it would give you a starting point to tease out all of the necessary fields. Your best bet is probably RegEx, but you would have to weigh the cost to your sanity.
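To give a rough idea of the RegEx approach, here is a minimal Python sketch. It assumes you already have the whole document as one plain-text string (however you obtained it) and that the block is bounded by two known phrases. The function name `extract_between`, the marker phrases, and the sample text are all made up for illustration; none of this is a Decipher or VBO API.

```python
import re


def extract_between(text, start_marker, end_marker):
    """Return the text between two marker phrases, or None if not found.

    Hypothetical helper: the markers are whatever fixed phrases happen to
    precede and follow the paragraph in your own documents.
    """
    pattern = re.compile(
        re.escape(start_marker) + r"(.*?)" + re.escape(end_marker),
        re.DOTALL,  # let "." match newlines so the block can span pages
    )
    match = pattern.search(text)
    if match is None:
        return None
    # Collapse line breaks, form feeds, and repeated spaces into single spaces.
    return " ".join(match.group(1).split())


# Made-up sample: "\f" stands in for a page break in the extracted text.
sample = (
    "Header\nTERMS AND CONDITIONS\nFirst line of the block\n"
    "\fsecond part of the block.\nSIGNATURES\nFooter"
)
paragraph = extract_between(sample, "TERMS AND CONDITIONS", "SIGNATURES")
```

The non-greedy `(.*?)` stops at the first occurrence of the end marker, so this only works if the end phrase does not also appear inside the paragraph itself. Repeated page headers or footers inside the block would still need to be stripped separately.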
If you do look into this, I would be curious about your results.
Good luck,
Red
2 weeks ago
Hmm, what about using a fixed region inside a table? Is that possible?
I would create a fixed region on the whole document, and each row in the table would basically be the contents of one page?
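If the one-row-per-page idea does work, the rows would still need to be stitched back into a single paragraph afterwards. Below is a minimal Python sketch of that step only. It assumes each row arrives as a dictionary with a page number and that page's text; the keys `page` and `text` and the function name `join_page_rows` are hypothetical, not something Decipher is known to output.

```python
def join_page_rows(rows):
    """Join per-page text rows into one paragraph, in page order.

    Assumes each row looks like {"page": 1, "text": "..."} (hypothetical shape).
    """
    ordered = sorted(rows, key=lambda row: row["page"])
    result = ""
    for row in ordered:
        # Normalise line breaks and repeated spaces within the page.
        text = " ".join(row["text"].split())
        if not text:
            continue  # skip pages where the region captured nothing
        if result.endswith("-"):
            # Treat a trailing hyphen as a word split across the page break.
            # This is a guess and would also merge genuinely hyphenated words.
            result = result[:-1] + text
        elif result:
            result += " " + text
        else:
            result = text
    return result


# Made-up rows, deliberately out of order.
rows = [
    {"page": 2, "text": "ment continues here."},
    {"page": 1, "text": "The agree-\n"},
]
paragraph = join_page_rows(rows)
```

Sorting by page number first means the order in which the rows are returned does not matter.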