RE: Function for DX InDev PDF to Excel Converter Error

Dhenn_MarkEspir · ‎08-06-22

Has anyone used this shared asset from the DX? I tried it but got the error below when executing the code stage for Save As.

Could not execute code stage because exception is thrown by code stage: Invalid index. (Exception from HRESULT: 0x8002000B (DISP_E_BADINDEX))

Link to Asset: https://digitalexchange.blueprism.com/dx/entry/122031/solution/pdf-to-excel-converter?_ga=2.52220138.271419323.1654492142-447972324.1642993955&_gl=1*1kxbg6g*_ga*NDQ3OTcyMzI0LjE2NDI5OTM5NTU.*_ga_MFBQ2K....

------------------------------
Dhenn Mark Espiritu
RPA Consultant
EY
Asia/Manila
------------------------------

michaeloneil · ‎10-06-22

Hi Dhenn

I've been retesting the object and I can only recreate the error when I provide incorrect parameters, but there are a couple of things you can try.

First, change the calculation back to Replace([filename], ".mht", "") and the code stage back to wb.SaveAs(filename,51). Once done, double-check that the inputs for the SaveAs code stage match my screenshot below. Save your changes and re-run your process.

If you get the same error, the next thing to try is moving the PDF to another folder location, for example your desktop, and then running the process again with the new location of the PDF as the input.

Should you still have the same issue, try recreating the error directly in Excel. You mentioned before that you get the .mht output file, so open this file in Excel, try to save it as an xlsx file to the original folder, and see if you get any errors or warnings.

Beyond this, I think the only other option is a workaround: since the object fails but still produces the .mht file, you can use the standard Excel object actions to open the .mht file and resave it as an xlsx. This will give you the output you need. The convert object only seems to fail on the final step of saving the file as an xlsx, so as long as you get the .mht file you only need to add a few steps to your process to produce the final output file. See my other screenshot for an example.
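As a rough illustration of when that workaround applies, here is a small Python sketch (Python is used only to show the logic; in Blue Prism this would be a decision stage followed by the standard Excel actions). The function name is hypothetical, and it assumes the xlsx is meant to sit next to the .mht file with the same base name.

```python
from pathlib import Path
from typing import Optional


def xlsx_target_if_resave_needed(mht_path: str) -> Optional[Path]:
    """Return the .xlsx path to resave to when only the .mht exists.

    Returns None when the .mht is missing (the conversion failed earlier)
    or when the .xlsx is already there (nothing left to do).
    Assumption: the .xlsx should share the .mht file's folder and base name.
    """
    mht = Path(mht_path)
    xlsx = mht.with_suffix(".xlsx")
    if mht.exists() and not xlsx.exists():
        return xlsx
    return None
```

If the function returns a path, the process would open the .mht file in Excel and save it under that name; otherwise the extra steps can be skipped.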

------------------------------
Michael ONeil
Technical Lead developer
NTTData
Europe/London
------------------------------


michaeloneil · ‎08-06-22

Hi Dhenn

Can you share the input values you are entering for the object?

Dhenn_MarkEspir · ‎09-06-22

This is my input:

This is the error when attempting to save the workbook.

Dhenn_MarkEspir · ‎09-06-22

Hi @Michael ONeil, I have posted my inputs below. I am using BP7.

Tejaskumar_Darji · ‎09-06-22

Hello,

Your inputs are correct.

I tried to recreate the error and I see the same index error message. (Note: I was not able to select any text in the PDF that I tested.)

Checking the reviews on the DX, a couple of people have reported the same error.

The current version of the VBO on the DX is marked as a beta version.

To test further, I took a PDF that was fully digital with selectable text, and I was able to convert it successfully. Check the snapshots. I am also attaching the PDF if you want to try it.

So this might be a limitation: the VBO can convert PDFs that are fully digital with selectable text, but not scanned PDFs, which consist of images.

------------------------------
If I was of assistance, please vote for it to be the "Best Answer".

Thanks & Regards,
Tejaskumar Darji
Sr. Consultant-Technical Lead
------------------------------

Tejaskumar_Darji · ‎09-06-22

Test with this PDF

michaeloneil · ‎09-06-22

Hi

@Dhenn Mark Espiritu and @Tejaskumar_Darji, I developed this asset for the DX and, as you stated, it's not intended to be used with scanned documents. It only works with standard PDF files, because it simply recreates the content in Excel through a series of different formats; for scanned documents the output would likely not be transferable to cells within Excel. If you need to get information from scanned documents, it would be better to use OCR or something similar to get the specific data from the screen rather than try to convert the file.

devneetmohanty07 · ‎09-06-22

Hi Michael,

I have observed the same, and I have gone into quite some detail inside that VBO too. If all inputs are provided correctly and the PDFs are digital, this VBO works amazingly well. It has saved me a lot of head-banging in many use cases, especially when dealing with tabular data sets.

Really appreciate your contribution and efforts for this VBO.

------------------------------
----------------------------------

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Sr. Consultant - Automation Developer,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------

---------------------------------------------------------------------------------------------------------------------------------------
Hope this helps you out and if so, please mark the current thread as the 'Answer', so others can refer to the same for reference in future.
Regards,
Devneet Mohanty,
SS&C Blueprism Community MVP 2024,
Automation Architect,
Wonderbotz India Pvt. Ltd.

Dhenn_MarkEspir · ‎10-06-22

Hello Tejaskumar,

That still doesn't solve the error, even when I use the PDF you attached. My PDFs are not images or scanned PDFs; they are structured and have selectable text and tables as well.

See error below:

The error occurs at the "Save workbook" code stage. The conversion from PDF to Excel is not successful and it just leaves me with a ".mht" file.

Is there an official, non-beta release of this yet?

michaeloneil · ‎10-06-22

Hi Dhenn

It could be the file type in the save script causing the error. If you can, try amending the code stage from wb.SaveAs(filename,51) to wb.SaveAs(filename), and in the calc stage called "Set new workbook name" change the calc from Replace([filename], ".mht", "") to Replace([filename], ".mht", ".xlsx").

The code stage currently sets the file type to xlsx explicitly, but if you remove this and just provide the extension as part of the name, it should hopefully resolve the error.
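To make the difference between the two setups concrete, here is a rough sketch in Python (used purely for illustration; the real object uses a Blue Prism calculation stage and a code stage). The function names and the sample path are made up for the example. The value 51 is Excel's xlOpenXMLWorkbook file format, i.e. xlsx.

```python
from typing import Optional, Tuple

XL_OPEN_XML_WORKBOOK = 51  # Excel XlFileFormat value for .xlsx workbooks


def original_settings(mht_name: str) -> Tuple[str, Optional[int]]:
    """Calc: Replace([filename], ".mht", "") with wb.SaveAs(filename, 51).

    The name is passed without an extension and the format is set explicitly.
    """
    return mht_name.replace(".mht", ""), XL_OPEN_XML_WORKBOOK


def amended_settings(mht_name: str) -> Tuple[str, Optional[int]]:
    """Calc: Replace([filename], ".mht", ".xlsx") with wb.SaveAs(filename).

    The extension is part of the name and no format argument is passed.
    """
    return mht_name.replace(".mht", ".xlsx"), None


# Hypothetical sample path, only to compare the two argument sets.
sample = r"C:\Temp\report.mht"
print(original_settings(sample))
print(amended_settings(sample))
```

Either way the intent is the same output file; only the way the xlsx type is communicated to SaveAs changes.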

@devneetmohanty07 I'm glad you're finding it useful; it's saved me a lot of dev time trying to extract data from PDFs.

SS&C Blue Prism Community
