27-02-24 12:48 PM
Hi All,
I have a table in a PDF document from which I need to capture the column data. One of the columns, the product description, may span a single row or multiple rows (1-8 rows). After training on multiple documents (~50), the data is still not being captured correctly.
E.g., if the description column has 5 lines, only 1, 2, or 3 of them are captured, as if it assumes there is only that much data based on the other documents. If there is only one line, it is captured correctly.
Any suggestions for capturing the entire description correctly all the time (minimum 80% accuracy)? I already have UTD = true in the dfd.
Note: The table may contain multiple line items. Each line-item description varies from 1 to 8 lines.
Thanks,
27-02-24 03:45 PM
Hi Kiran,
Have you set the data type of your Product Description field to "MultiLine"?
27-02-24 03:50 PM
Hi Athiban Mahamathi,
Yes, I did.
Thanks,
27-02-24 04:39 PM
Hi Kiran,
I also faced a similar issue with bank statements, where the transaction description was sometimes quite long. I followed the steps below and it worked for me. Maybe you can give it a try.
Please let me know if you need any more help.
28-02-24 11:14 AM
Sure Athiban,
Will give it a try.
Thanks,
29-02-24 03:27 PM
Hi Kiran,
Were you able to extract the description? If you found a better way to resolve your issue, please share it here.
29-02-24 04:06 PM
Hi Athiban Mahamathi,
Currently, I have created several batches based on the number of description lines. For example, all the PDFs with 3-row descriptions go into a separate batch; I use a few of those documents as a training set and test the others to see how they are captured. Likewise for the other line counts.
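In case it helps anyone trying the same thing, here is a rough Python sketch of the batching step only. It assumes the number of description lines per document is already known (for example, counted by hand), and that each batch is split into a small training set with the remainder used for testing. The file names, the `batch_by_line_count` helper, and the `train_size` parameter are all made up for illustration; none of this is part of the extraction tool itself.

```python
from collections import defaultdict


def batch_by_line_count(doc_line_counts, train_size=2):
    """Group documents by description line count, then split each group.

    doc_line_counts: dict mapping a document name to the number of
    description lines it contains (assumed to be known in advance).
    Returns a dict: line count -> {"train": [...], "test": [...]}.
    """
    groups = defaultdict(list)
    # Sort by name so the split is reproducible between runs.
    for name, n_lines in sorted(doc_line_counts.items()):
        groups[n_lines].append(name)

    batches = {}
    for n_lines, names in sorted(groups.items()):
        batches[n_lines] = {
            "train": names[:train_size],  # first few documents for training
            "test": names[train_size:],   # the rest of the batch for testing
        }
    return batches


# Hypothetical file names and line counts, for illustration only.
docs = {
    "inv_a.pdf": 3,
    "inv_b.pdf": 1,
    "inv_c.pdf": 3,
    "inv_d.pdf": 3,
    "inv_e.pdf": 1,
}
batches = batch_by_line_count(docs, train_size=1)
for n_lines, split in batches.items():
    print(n_lines, split)
```

Sorting by name keeps the train/test split stable, so accuracy can be compared between attempts on the same documents.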
So far it captures correctly for some documents (50 to 60%), but not for all. I am still exploring other options, such as miscellaneous parameters like turning border table on, since most of the documents have a single line item with multiple rows.
Note: I feel the way the description is laid out in the PDF makes it hard to segregate documents into batches, because the rows do not follow a single format, such as equal spacing between rows. In a few documents, for example, there are 2 rows of extended text close together, then 2 lines of empty space, then another 3 rows of data.
I would appreciate any suggestions.
Thanks,