03-07-23 04:09 PM
Hi,
I have a PO document (PDF) where the address spans a variable number of rows: it may be 3, 4 or 5 lines long. In the DFD we have set the address field to multi-line and trained with a sufficient number of files (~100) covering all formats. But every time a new batch is loaded, Decipher reads the address correctly in only a few documents (about 40%). For the others it extracts either too few or too many rows.
E.g. Decipher reads only 3 lines of a 5-line address in a single block, or 4-5 lines for a 3-line address.
Refer to the screenshot below:
Any suggestions to improve the accuracy?
Thanks,
Kiran.
17-08-23 10:07 AM
Hi Ben,
I tried the above-mentioned regex but still could not get the expected result. We are trying other possibilities as well.
Is there any update from the dev team regarding the issue? Please let me know if there is.
Thanks,
22-08-23 10:12 AM
Hi Kiran,
Apologies for the delay. The development team have indicated this scenario is not well catered for in Decipher 2.2, though it may be worth trying to set a higher threshold for the ML match. If you set the misc parameter "TemplateMinMatchPercent" to a higher value (default = 60), this may help separate more similar scenarios.
Aside from that, there are new features being added to Decipher 2.3 which better cater for this scenario.
Thanks
24-08-23 10:00 AM
Hi Ben,
Thanks for your response,
We are training the data as suggested, along with a few other misc parameters, as below:
BorderTable=On
BottomStop="SHIP TO PO"
TemplateMinMatchPercent=80
StrictMode=1
We are still in the training phase, but we have observed that Decipher can no longer identify the table details in the same PDF document, which it was capturing before.
Quick question: do the above misc parameters cause Decipher to ignore the other table details in the same document? Any suggestions?
Refer to the screenshot below for the PDF template.
Thanks,
25-08-23 08:33 AM
Hi Kiran,
BorderTable might not work correctly for this example as there aren't borders between the rows.
BottomStop only works with single words; it can't be used with phrases and is for use in tables only. "Ship to" appears above the table in this example, so it is worth being mindful of the impact this might have.
TemplateMinMatchPercent will require some trial and error, because we don't record the % match for a given page. So train 1 document, then upload 1 that you want it to treat as a different layout. If it recognises it and captures more information than the DFD would automatically, increase the match % and restart the batch at the capture stage.
StrictMode is a different option for table recognition; it's worth trying if you've otherwise been unsuccessful. It takes at least 3 documents for it to be trained.
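If it helps, the points above about BottomStop and the match percentage can be checked before a training run with a small script. This is only a rough sketch and not part of Decipher: the function names `parse_misc_parameters` and `check_misc_parameters` are made up for illustration, the `Name=Value` text format is taken from the parameters posted in this thread, and the 0-100 range for TemplateMinMatchPercent is an assumption based on it being a percentage.

```python
def parse_misc_parameters(text):
    """Parse 'Name=Value' lines into a dict, stripping surrounding double quotes.

    Hypothetical helper: it only mirrors how the parameters are written
    in this thread, not any Decipher file format.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip().strip('"')
    return params


def check_misc_parameters(params):
    """Return a list of warning strings for the parameters discussed above."""
    warnings = []

    # BottomStop is described above as working with single words only.
    bottom_stop = params.get("BottomStop")
    if bottom_stop is not None and len(bottom_stop.split()) != 1:
        warnings.append(f'BottomStop works with single words only, got "{bottom_stop}"')

    # Assumption: the match percentage is a whole number between 0 and 100.
    percent = params.get("TemplateMinMatchPercent")
    if percent is not None:
        try:
            value = int(percent)
        except ValueError:
            warnings.append(f'TemplateMinMatchPercent is not a number: "{percent}"')
        else:
            if not 0 <= value <= 100:
                warnings.append(f"TemplateMinMatchPercent out of range: {value}")

    return warnings


if __name__ == "__main__":
    posted = '''BorderTable=On
BottomStop="SHIP TO PO"
TemplateMinMatchPercent=80
StrictMode=1'''
    for warning in check_misc_parameters(parse_misc_parameters(posted)):
        print(warning)
```

Run against the four parameters posted earlier, this flags only the BottomStop phrase. It does not tell you whether a given percentage separates two layouts; that still needs the trial and error described above.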
Thanks
28-08-23 10:22 AM
Hi Ben,
Thanks for the response,
I have removed the BorderTable and BottomStop parameters and trained the data again. We still see the issue with capturing ship-to addresses that have different numbers of lines (read with 60%), but the table data is read as before now that those misc parameters are removed.
Thanks,
31-08-23 10:33 AM
In my experience too, I've seen this issue where multi-line ship-to and bill-to addresses are not extracted consistently. I was using a multiline flag, but it still did not extract the full address in all layouts. Also, when the bill-to and ship-to fields are empty, it starts picking up a value from some other adjacent field. We cannot keep adding exclude headers and exclude values for every kind of layout.