VBO/Assets to extract data from scanned PDF invoice
10-09-24 10:37 AM
Hello team,
What are some of the best VBO/Assets you have used to extract data from scanned PDF invoices?
I'm not considering a full-scale IDP engine implementation here; instead, I'm looking for a quick solution using any DX asset or open-source library that works reliably on both digital and scanned documents.
10-09-24 05:02 PM
Hi,
- Process for PDF.co - 1.0.0
- Process for BOT AI ML DocuBOT
- Function for PDF Management - 1.1.0
- Daily PDF Actions
- Function for Utility - PDF
- Process for ABBYY FlexiCapture Connector
- How can I work with Adobe Acrobat PDF documents when using Blue Prism Enterprise?
- How can I extract data from a PDF document which is contained in a browser window?
Refer to the 'Interfacing with PDF Documents' training course in the Blue Prism University for additional information on interacting with PDF data.
11-09-24 09:44 AM
We are using the built-in OCR reader in our invoice process.
We open the invoice PDF in an MS Edge window. The window is spied in Region mode, and then we use a Read stage with "Read Text with OCR".
12-09-24 10:53 AM
In my experience, the solution very much depends on the PDF. Are we talking about 1. well-structured PDF forms with good accessibility functionality, 2. PDF documents that always have the same structure when copied to the clipboard or exported to a text or XML format, or 3. scanned documents?
For 1., you might be surprised how well the UIA interface within Blue Prism works with the document if it is made for accessibility. For 2., you might get away with an export followed by XML or text parsing. For 3., OCR technologies are the way to go, and if the layouts vary widely, LLMs might be a useful addition.
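To illustrate option 2, here is a minimal sketch of parsing fields out of exported invoice text with regular expressions. The field labels ("Invoice No", "Invoice Date", "Total") and the sample layout are assumptions made for illustration, not taken from any real invoice; the patterns would need adjusting to your own documents.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: the labels below are assumed, adjust them to
# whatever your exported invoice text actually contains.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?|#)\s*[:\-]?\s*([A-Z0-9\-/]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"(?:Invoice\s*)?Date\s*[:\-]?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})",
        re.IGNORECASE,
    ),
    # \b keeps "Subtotal" from being picked up as the total.
    "total": re.compile(
        r"\bTotal\s*(?:Due|Amount)?\s*[:\-]?\s*[^\d\s]?\s*([\d.,]+\d)",
        re.IGNORECASE,
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    sample = (
        "ACME Supplies Ltd\n"
        "Invoice No: INV-2024-0917\n"
        "Invoice Date: 09/10/2024\n"
        "Subtotal: 1,200.00\n"
        "Total: 1,440.00\n"
    )
    print(extract_fields(sample))
```

Returning None for missing fields lets the calling process route the invoice to an exception queue instead of failing on the first unexpected layout.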
12-09-24 11:11 AM
How do you handle the zoom level? Also, do your invoices all share the same format, or do the layouts vary?
12-09-24 11:12 AM
Most of these are third-party paid services or do not work with scanned PDFs.
12-09-24 11:14 AM
Hello @Tejaskumar_Darji, we have used Python libraries to extract content from scanned PDFs.
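The post above does not say which libraries were used. As one possible sketch, the following assumes pdf2image (which needs Poppler installed) to render pages as images and pytesseract (which needs the Tesseract binary) to run OCR on them. The function name and its injectable parameters are hypothetical and exist only to keep the example testable without those dependencies.

```python
from typing import Callable, Iterable, List, Optional


def ocr_scanned_pdf(
    pdf_path: str,
    dpi: int = 300,
    lang: str = "eng",
    render_pages: Optional[Callable[[str, int], Iterable[object]]] = None,
    recognize: Optional[Callable[[object, str], str]] = None,
) -> List[str]:
    """Return one OCR text string per page of a scanned PDF."""
    if render_pages is None:
        # Assumption: pdf2image is installed and Poppler is on the PATH.
        from pdf2image import convert_from_path

        def render_pages(path: str, page_dpi: int) -> Iterable[object]:
            return convert_from_path(path, dpi=page_dpi)

    if recognize is None:
        # Assumption: pytesseract is installed and can find Tesseract.
        import pytesseract

        def recognize(image: object, language: str) -> str:
            return pytesseract.image_to_string(image, lang=language)

    # Render each page to an image, OCR it, and trim surrounding whitespace.
    return [recognize(image, lang).strip() for image in render_pages(pdf_path, dpi)]
```

A Blue Prism process could call a script like this and read the returned text back for field parsing. Rendering at around 300 dpi is a common starting point for OCR, though the best value depends on scan quality.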
