Reading Table Data (Alternate Rows) from Multi-Page PDF

salman_shaik
Level 5

Hi Team,

I am trying to read table data that is spread across multiple pages. My requirement is to read only the alternate rows from the table. Is there any way to achieve this using the new Decipher version?

When I read it, I get the entire table data. See the images below.

salman_shaik_0-1721761651617.png

I don't want the agency discount details row to be taken as the 2nd row; the 2nd row should be taken from the ID.

How do I tell Decipher to pick the 2nd row from a specific position?

salman_shaik_1-1721762059542.png

@Ben.Lyons1 

 

If I was of assistance, please vote for it to be the "Best Answer". Thanks & Regards, Salman Shaik

Ben.Lyons1
Staff

Hi Salman,

That's quite a tricky table as each row is effectively two rows with separate rows of headers. Have you tried using the misc. parameter "UTD = true"?
You would need to reset your training data to test, but it may give improved results after training two or three documents.

Thanks

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based

Hi @Ben.Lyons1 

I already tried the misc parameter UTD = true for all the table fields. The issue I am facing is that sometimes the first page of the invoice itself contains 2 rows (two Placement IDs are on the first page), as shown in the image below.

salman_shaik_0-1721823659516.png

In another type of invoice, the first page contains only 1 row, as shown below.

salman_shaik_1-1721823875950.png

In this case Decipher treats the internal row (header: Agency Discount) as the actual second row and extracts those values.

To summarise: when there are two rows on page 1, it extracts the correct data. If page 1 doesn't contain a 2nd row, it treats the internal row as the actual second row and extracts the wrong data.

Is there any option in Decipher to handle this?

Is there any option or way to extract the data based on the text mentioned, as shown below (Key Value Pair feature)?

salman_shaik_2-1721824273587.png

Please keep in mind that the row numbers are dynamic.

Thanks
Salman Shaik

 

 


Ben.Lyons1
Staff

Hi Salman,

There's nothing specific in Decipher for this scenario; I can't recall having ever seen a document like it.

The only alternative I can think of is using the sub-table misc parameters or just extracting all the rows and editing the data in your Blue Prism process.
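As a rough illustration of the second option (extract every row, then filter afterwards), here is a minimal Python sketch. It is not Decipher or Blue Prism functionality: the column name `Placement ID`, the assumption that a placement ID is a run of six or more digits, and the sample rows are all hypothetical, and the same logic would need to be ported to whatever your process uses (for example a code stage or collection loop).

```python
import re

# Assumption: a real placement ID is a run of 6 or more digits.
# Adjust this pattern to match the IDs on your own invoices.
PLACEMENT_ID = re.compile(r"^\d{6,}$")


def keep_placement_rows(rows, id_column="Placement ID"):
    """Return only the rows whose ID cell looks like a placement ID.

    `rows` is a list of dicts, one per extracted table row.
    `id_column` is a hypothetical column name used for illustration.
    """
    kept = []
    for row in rows:
        cell = (row.get(id_column) or "").strip()
        if PLACEMENT_ID.match(cell):
            kept.append(row)
    return kept


# Made-up sample of what a full-table extraction might contain:
# real rows interleaved with inner header rows and inner value rows.
extracted = [
    {"Placement ID": "12345678", "Amount": "1,000.00"},
    {"Placement ID": "Agency Discount", "Amount": "Net Amount"},
    {"Placement ID": "15%", "Amount": "850.00"},
    {"Placement ID": "23456789", "Amount": "2,000.00"},
    {"Placement ID": "", "Amount": "Net Amount"},
]

filtered = keep_placement_rows(extracted)
print([row["Placement ID"] for row in filtered])
```

Filtering on the shape of the ID rather than on row position means it does not matter whether page 1 holds one placement row or two.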

Thanks

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based

Hi @Ben.Lyons1 

I had kept extracting the entire table data as my last option. Now that nothing else is working, I have to use it.

I will let you know if there are any issues. Thanks for your suggestion.

 

Thanks
Salman Shaik


Hi @Ben.Lyons1 

I have a new issue for you. I have a Campaign field whose data may span 2 or 3 lines (dynamic). If we select and define the region on one invoice, Decipher is not able to detect all the lines properly on another invoice because the number of lines is different. I trained with 6-7 invoices, but when one invoice works another still does not.

salman_shaik_0-1721918964129.png

The invoice below contains only two lines, so it includes the Campaign Id as well in the Campaign field.

salman_shaik_1-1721919160250.png

Campaign Flag Type is Multiline, and Exclude keywords is used.

Campaign Id uses keywords, Exclude keywords, and StrictPositionAnchorText (misc parameter).

How should I handle this one?

If I was of assistance, please vote for it to be the "Best Answer". Thanks & Regards, Salman Shaik

Hi Salman,

This is very difficult to advise on without the full document, training performed and DFD. However, things to consider are:

  • Ensure you're resetting your training data (deleting) if it's not working and changes to your DFD haven't helped. Sometimes old training can unduly influence the extraction and it's quicker to start again. In 2.3 you can at least delete specific sections of the training data.
  • UTD was introduced to handle tables with varying layouts; originally this was for cases where specific columns would sometimes not exist. So it may help in this scenario.
  • The strict position parameters don't work with tables as they are intended for completely static fields in forms.
  • In tables I've had success with "RowAreaExcludeValues" for skipping rows with certain words.

Thanks

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based

Hi @Ben.Lyons1 

I was able to extract the entire tabular data and am handling it at the Blue Prism level. The Campaign and Campaign Id fields are completely separate fields, not related to any table.

See the image below for clarification.

salman_shaik_0-1722057383359.png

I am trying to extract these fields. Sometimes the campaign field contains two or three lines of text. In this case it's failing to identify it.

Can you suggest anything for this?

If I was of assistance, please vote for it to be the "Best Answer". Thanks & Regards, Salman Shaik

I've seen similar issues when users have been trying to read address fields where the number of lines is variable. The only methods I've seen help in the current version are using the lists functionality, where the potential values are limited and known (these can be stored in a SQL DB, see validation lists). Or you could see if a particular Format Expression (Regex) helps, that is, if the format has some consistency.
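To illustrate the regex idea only, here is a minimal Python sketch of a pattern that captures a value running over a variable number of lines and stops at the next label. The label text (`Campaign:` and `Campaign Id:`), the sample values, and the function name are assumptions made up for the example; whether a Decipher Format Expression can match across lines in this way is not confirmed here, so treat it as something to test, or as logic to apply to the extracted text afterwards.

```python
import re

# Assumed layout: a "Campaign:" label, a value over one or more lines,
# then a "Campaign Id:" label followed by a single token.
CAMPAIGN = re.compile(
    r"Campaign:\s*(?P<campaign>.+?)\s*Campaign Id:\s*(?P<campaign_id>\S+)",
    re.DOTALL,  # let "." cross line breaks so the value can span lines
)


def extract_campaign(text):
    """Return (campaign, campaign_id), or None if the labels are not found."""
    match = CAMPAIGN.search(text)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces inside the captured value.
    campaign = " ".join(match.group("campaign").split())
    return campaign, match.group("campaign_id")


# Made-up samples with two and three lines of campaign text.
two_lines = "Campaign: Summer Sale 2024\nDisplay Banners\nCampaign Id: C-1001\n"
three_lines = (
    "Campaign: Summer Sale 2024\nDisplay Banners\nUK Region\nCampaign Id: C-1002\n"
)

print(extract_campaign(two_lines))
print(extract_campaign(three_lines))
```

Because the capture is bounded by the following label rather than by a fixed number of lines, the Campaign Id is kept out of the Campaign value in both samples.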

We're currently working on an improvement to this, planned for our next release, which is due out Aug/Sep 2024.

Thanks

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based