Hi guys,
I have a problem getting the data from a document because:
- The vertical titles (dates) are dynamic, but the horizontal titles are always the same.
- The number of columns is dynamic: minimum 8, maximum 13.
- The table is centered in the PDF, so its starting position varies between documents.
- There are small variations in the distances between columns from one document to the next.
My first try:
- DFD with a table of 12 text-type columns: 1 column for the horizontal headers, including extra data such as the unit that some titles have, and the rest for the data.
- Decipher initially picked up the information well, but then it started skipping lines and detecting the date headers together with the column next to them.
Second try:
- DFD with a table of 13 text-type columns: 1 column for the horizontal headers, another column for the extra data, and the rest for the data.
- Decipher initially picked up the information well, but then it started skipping lines and detecting the date headers together with the column next to them.
Third try:
- DFD with 3 tables: one for the dynamic date headers (text columns), one for the temperatures (1 text column and the rest numeric), and one for the demand (1 text column and the rest numeric).
- The ExactsRows and ButtomStop flags were defined.
- Decipher works on a document once it has been trained, but if I use another document with some variation in width or number of columns, I have to complete everything again. With the flags above, the tables are not auto-completed, so training the documents takes a long time.
How do you recommend I train these documents?