04-05-26 10:10 PM
If you're working with LLMs, and especially if you're implementing any sort of RAG pipeline, there's a new VBO on the DX that might be of interest: the Utility - MarkItDown VBO. It leverages Microsoft's MarkItDown Python package to convert various file types, as well as web pages, to Markdown locally.
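For anyone curious what the underlying package does outside of Blue Prism, here is a minimal sketch of a local conversion. It is not the code inside the VBO. It assumes the markitdown package is installed (pip install markitdown), the helper name convert_to_markdown is made up for illustration, and it only handles local files, not web pages.

```python
from pathlib import Path


def convert_to_markdown(source: str) -> str:
    """Convert a local file to Markdown text using Microsoft's MarkItDown.

    Hypothetical helper for illustration; not the VBO's own implementation.
    """
    path = Path(source)
    if not path.is_file():
        # Fail early with a clear error before touching the converter.
        raise FileNotFoundError(f"No such file: {source}")

    # Imported here so the missing-file check works even without the package.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(str(path))
    # text_content holds the Markdown produced for the document.
    return result.text_content
```

Calling convert_to_markdown("report.docx") (a placeholder file name) would return the Markdown as a string, which you could then pass along to your LLM of choice.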
While LLMs can accept various file formats, such as PDF, DOCX, and HTML, they generally process text best when it has a clear hierarchy. Markdown provides that structure and often removes the need for complex, messy raw HTML or JSON formatting. In some cases, this can also reduce your overall token usage, because the files are converted locally and only the resulting Markdown is sent to the LLM.*
*NOTE: This isn't always a requirement of the LLM.
The Code stages in the VBO are implemented in Python, which means this VBO requires Blue Prism v7.4 or later.
Cheers,
Eric
a week ago
Hello,
I downloaded the MarkItDown utility from the Blue Prism Digital Exchange and I'm trying to test it.
However, as soon as I run the Convert File action, I get the following error:
Failed to set run stage - Internal : Failed to initialize Python
So far I have installed:
Python
MarkItDown
Python.NET
I am using Blue Prism 7.4.2 HF1.
I'm not sure where the issue could be. Has anyone experienced this error before, or does anyone know whether any additional configuration steps are required for Python integration with this utility?
Any help would be greatly appreciated.
Thank you.
a week ago - last edited a week ago
@matiasskwarko84 well, this is embarrassing. 😞 I forgot to mention in the user guide that you need to open the VBO and set the path to the Python DLL on your system. I'll update the user guide today or tomorrow.
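If you're not sure where that DLL lives, a small script like the one below may help you track it down. It's only a sketch: it assumes a standard Windows CPython install where pythonXY.dll (for example python312.dll) sits in the interpreter's install folder, and find_python_dll is a made-up helper name. Other distributions may keep the DLL somewhere else.

```python
import sys
from pathlib import Path
from typing import Optional


def find_python_dll(base_dir: Path, major: int, minor: int) -> Optional[Path]:
    """Return base_dir/python<major><minor>.dll if that file exists, else None.

    Assumption: the DLL sits directly in the install folder, as it does
    for a typical Windows CPython install.
    """
    candidate = base_dir / f"python{major}{minor}.dll"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # sys.base_prefix is the install folder of the running interpreter.
    dll = find_python_dll(
        Path(sys.base_prefix),
        sys.version_info.major,
        sys.version_info.minor,
    )
    if dll is not None:
        print(dll)
    else:
        print("No pythonXY.dll found in", sys.base_prefix)
```

Run it with the same Python installation you installed MarkItDown into, then copy the printed path into the VBO as described in the user guide.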
Cheers,
Eric
a week ago
Actually, I did add that step in the Configuration section of the User Guide. 🙂 It's on page 5.
Cheers,
Eric