22-12-21 08:53 AM
How can I capture text that starts and ends with predefined words and spans multiple lines?
The position of this text on the page is dynamic. The text can appear multiple times in a document (including on different pages). The unique text sits between the Start and End words.
Example on one page:
Start of text I need to take ke[pkg;gn’;fgn unique text 1
continue unique text 1
Kmbkfdmb End.
Start of text I need to take phtrhrttn,n flm;flng;lfgn unique text 2
continue unique text 2
gfknjfogknmfg End.
Or it can be:
1st page:
Start of text I need to take ke[pkg;gn’;fgn unique text 1
continue unique text 1
Kmbkfdmb End.
2nd page:
Start of text I need to take phtrhrttn,n flm;flng;lfgn unique text 2
continue unique text 2
gfknjfogknmfg End.
I need to extract all of them.
Because of the dynamic position, I used only the StrictRegex flag and a Format expression.
Regexes like Start(.*?|\n)*?End.* or ^Start:(.*?|\n)*?End.*$ cause high CPU usage:
The issue is the same even when I extract only one field (a unique regex in the DFD and unique text in the PDF).
With the StrictRegex flag disabled, the issue is the same.
Regexes like Start[\s\S]*?End.* or ^Start[\s\S]*?End.*$ give low confidence (the text is shown in red).
This works only for some already-trained documents (not all). I processed about 80 documents (5 original documents) as a new batch and submitted it at the end, and the text is still red. The text is not captured in every document, even after submitting the batch 15 times. The text is combined into one correct region, but the field is not filled in automatically. I select it manually, submit the batch, and send this document again, and nothing changes.
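As a point of comparison outside the product, here is a minimal sketch in Python's `re` module, assuming the document text is available as one string with the pages concatenated. `re.DOTALL` lets `.` match line breaks, and the lazy `.*?` stops at the first `End` after each `Start`, so `finditer` returns every block separately. The `SAMPLE` text and the `extract_blocks` name are illustrative only, and whether the product's engine behaves identically is an assumption.

```python
import re

# Illustrative sample: two blocks, as in the one-page example above.
SAMPLE = (
    "Start of text I need to take ke[pkg;gn’;fgn unique text 1\n"
    "continue unique text 1\n"
    "Kmbkfdmb End.\n"
    "Start of text I need to take phtrhrttn,n flm;flng;lfgn unique text 2\n"
    "continue unique text 2\n"
    "gfknjfogknmfg End.\n"
)

# Whole-word Start, then as few characters as possible (newlines included,
# thanks to DOTALL) up to the next whole-word End.
BLOCK = re.compile(r"\bStart\b.*?\bEnd\b", re.DOTALL)


def extract_blocks(text: str) -> list[str]:
    """Return every Start...End block in the text, in document order."""
    return [m.group(0) for m in BLOCK.finditer(text)]


for block in extract_blocks(SAMPLE):
    print(repr(block))
```

Because the match is lazy, two blocks on the same page (or on consecutive pages) are never merged into one.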
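A likely reason for the CPU load is the nested quantifier in `(.*?|\n)*?`. The inner `.*?` can consume a run of characters in many ways (one character per iteration, two, and so on), and the outer `*?` repeats it. When a `Start` has no reachable `End`, a backtracking engine may try an exponential number of splits before giving up. A single character class such as `[\s\S]*?`, or `.*?` with a DOTALL/Singleline option, gives each character only one way to be consumed. The sketch below uses Python's `re` (assuming it is representative of the product's engine) to show that the two safe forms return the same blocks. It deliberately does not run the nested pattern, since that can hang on a failing input.

```python
import re

# Two backtracking-safe ways to say "any characters, including newlines, lazily":
# a character class that matches everything, or "." with the DOTALL flag.
CLASS_FORM = re.compile(r"Start[\s\S]*?End")
DOTALL_FORM = re.compile(r"Start.*?End", re.DOTALL)

# Avoid: r"Start(.*?|\n)*?End" nests a quantifier inside a quantifier, so a
# line of n characters can be split into groups in about 2**(n-1) ways, and a
# failed match may try all of them. It is intentionally not executed here.


def blocks(pattern: re.Pattern, text: str) -> list[str]:
    """Return all non-overlapping matches of a group-free pattern."""
    return pattern.findall(text)


text = "Start a\nb End.\nnoise\nStart c\nd End.\n"
print(blocks(CLASS_FORM, text))   # ['Start a\nb End', 'Start c\nd End']
print(blocks(DOTALL_FORM, text))  # ['Start a\nb End', 'Start c\nd End']
```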
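One detail worth checking in the anchored variant: in Python's `re` (and, by default, in most common regex engines), `^` matches only at the very start of the input and `$` only at the end, unless a multiline option is set. An anchored pattern can therefore find at most one block, and none at all if the document does not begin with `Start`. The sketch below is illustrative only; how the product applies anchors to a region is an assumption.

```python
import re

text = (
    "Header line\n"
    "Start one\nEnd.\n"
    "Start two\nEnd.\n"
)

PATTERN = r"^Start[\s\S]*?End.*$"

# Without MULTILINE, ^ only matches at the very start of the input,
# and the input starts with "Header", so nothing is found.
no_flag = re.findall(PATTERN, text)

# With MULTILINE, ^ and $ match at every line boundary, so each block that
# begins a line is found; .* then takes the rest of the End line.
multiline = re.findall(PATTERN, text, re.MULTILINE)

print(no_flag)    # []
print(multiline)  # ['Start one\nEnd.', 'Start two\nEnd.']
```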
Thanks
22-12-21 10:50 AM
22-12-21 11:24 AM
22-12-21 11:36 AM
22-12-21 11:39 AM
22-12-21 01:25 PM
23-12-21 01:00 PM