
PDF Extract Text - character count present but no data present.

shenryifds
Level 3

Hi Everyone, 

I'm experiencing a problem while trying to read a PDF.

I need to read a few different PDFs, which are bank statements.

I'm using the PDF Management object and the Extract All Text action. This normally works with other PDFs.

This is a text-based PDF, meaning I can select the text and copy it to the clipboard if needed.

With this one statement, when I extract the text the data item shows a character count, but when I open the data item there is no data.

shenryifds_0-1749593857601.png

Wondering if anyone else has experienced this or has any suggestions as to what could be going on? 

1 BEST ANSWER


saha1sourav2
Level 5

Hi @shenryifds,

This issue is likely caused by one of the following reasons:

  1. Some PDFs include characters that are technically text but are invisible or unreadable when displayed (e.g., zero-width spaces, other invisible Unicode characters, or RTL markers). These still count toward the character total.
  2. Bank statements often use custom or subsetted fonts. The glyphs render correctly on screen, but if the font lacks a proper Unicode mapping, the extracted characters may not be meaningful text.
  3. Even without password protection, certain PDFs may have permission settings that restrict content copying or automated text extraction. Blue Prism might not display an error, but the extracted content could appear blank or unreadable.
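To illustrate the third point, here is a minimal Python sketch of how the permission flags in a restricted PDF can be read. It assumes you already have the integer stored under `/P` in the file's encryption dictionary (for example from a PDF inspection tool); the function name and the sample value `-1852` are hypothetical, and this is not how Blue Prism itself checks permissions. The bit positions follow the standard security handler in the PDF specification.

```python
# Permission bits from the PDF specification's standard security handler.
# Bit positions are 1-based, so bit 5 has the value 1 << 4 (16).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "extract_text": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Map a /P integer (hypothetical input) to named allow/deny flags."""
    # /P is a signed 32-bit value; Python's & works on negative ints
    # as two's complement, so no conversion is needed.
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # Sample value only: allows printing, restricts everything else.
    for name, allowed in decode_permissions(-1852).items():
        print(f"{name}: {'allowed' if allowed else 'restricted'}")
```

If `extract_text` comes back as restricted, that would be consistent with an extraction tool returning content it will not display, although tools differ in how strictly they honour these flags.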

To resolve this, you can try the following:

  1. Copy the output from the data item and paste it into Notepad++. Then go to View -> Show Symbol -> Show All Characters to check for hidden or non-printable characters.
  2. Use OCR-based methods to extract the text, which can help in cases where standard extraction fails.
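If you would rather script the first check than use Notepad++, the sketch below does something similar in Python. It assumes the extracted text has been saved or passed in as a string; `reveal_hidden` is a hypothetical helper name, not part of Blue Prism or any library. It lists control, format (zero-width, RTL/LTR marks), private-use, and unusual space characters with their positions.

```python
import unicodedata


def reveal_hidden(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for hidden characters."""
    hidden = []
    for i, ch in enumerate(text):
        # Ordinary line breaks and tabs are expected in extracted text.
        if ch in "\r\n\t":
            continue
        cat = unicodedata.category(ch)
        # Cc = control, Cf = format (zero-width space, RTL marks),
        # Co = private use (common with custom fonts), Cn = unassigned,
        # Zl/Zp = line/paragraph separators, Zs = spaces other than U+0020.
        suspicious = cat in {"Cc", "Cf", "Co", "Cn", "Zl", "Zp"} or (
            cat == "Zs" and ch != " "
        )
        if suspicious:
            name = unicodedata.name(ch, "<unnamed>")
            hidden.append((i, f"U+{ord(ch):04X}", name))
    return hidden


if __name__ == "__main__":
    sample = "AB\u200bC\u200f"  # made-up sample with two invisible characters
    for index, code_point, name in reveal_hidden(sample):
        print(index, code_point, name)
```

A long list of private-use (`Co`) characters would point towards the custom font explanation, while an empty list suggests the text itself is clean.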

If this resolves your query, please mark this as Best Answer.

Best regards,
Sourav S
Consultant
WonderBotz

shenryifds
Level 3

Thank you @saha1sourav2.

It seems this PDF has restricted permissions.

I'm not sure if this is due to how it was downloaded or if someone manually added restrictions. I need to confirm this with the team that provides the file.

shenryifds_0-1749662506208.png

When I copy and paste the data into Notepad++ I can see all of the text. I don't see anything that appears to be hidden.

As it stands, it could simply be the restrictions placed on the source file.

I will have to look further into this. 

Thanks again for your help.