Hi Zdenek,
This most likely happens when the document has vector data for most of the text, but not all of it. PDFs often store text in a directly extractable form, referred to here as "vector data", which lets applications read it without OCR. You can test this by checking whether you can highlight the text with your cursor in Adobe Reader (or a similar viewer).
If Decipher extracts this vector data and considers it to be the most important data, it skips the OCR stage to maximise processing speed, so any text that is not part of the vector data is not read. When you restarted the document at the OCR stage, Decipher 'dropped' the vector data and did a full OCR read instead.
If you have single-page documents, you can convert them to JPG (or another image format). The converted files no longer contain vector data, so Decipher reads them with OCR.
That said, version 2.2 includes an update that handles this in a way that delivers the best of both: Decipher will extract the vector data and will always read the non-vector areas with OCR, without re-reading the vector areas.
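To make the difference concrete, here is a rough Python sketch of the two strategies. It is only an illustration of the idea, not how Decipher is implemented: the Region class, the function names and the example regions are all made up for this post.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical area of a page (not a Decipher type)."""
    name: str
    has_vector_text: bool  # True if the PDF already carries extractable text here


def plan_full_ocr(regions):
    """Restarting at the OCR stage: vector data is dropped, everything is OCR'd."""
    return {r.name: "ocr" for r in regions}


def plan_hybrid(regions):
    """2.2-style idea: keep vector text where it exists, OCR only the rest."""
    return {r.name: ("vector" if r.has_vector_text else "ocr") for r in regions}


if __name__ == "__main__":
    # Assumed example: typed header and body, plus a scanned stamp with no vector text.
    page = [
        Region("header", True),
        Region("body", True),
        Region("stamp", False),
    ]
    print("full OCR:", plan_full_ocr(page))
    print("hybrid:  ", plan_hybrid(page))
```

In the hybrid plan only the stamp goes to OCR, while the header and body keep their vector text.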
Let me know if that doesn't make complete sense.
Thanks
------------------------------
Ben Lyons
Senior Product Specialist - Decipher
Blue Prism
UK based
------------------------------