Converting documents to Markdown

ewilson
Staff

If you're working with LLMs, and especially if you're implementing any sort of RAG pipeline, there's a new VBO on the Digital Exchange (DX) that might be of interest: the Utility - MarkItDown VBO. It uses Microsoft's MarkItDown Python package to convert a variety of file types, as well as web pages, to Markdown locally.
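For anyone curious what the underlying package call looks like outside of Blue Prism, here is a minimal sketch. It is not the VBO's actual Code stage; the helper name `convert_to_markdown` and the file name `report.docx` are made up for illustration. It assumes the MarkItDown package is installed (`pip install markitdown`) and relies only on its documented `MarkItDown().convert(...)` call and the `text_content` attribute of the result.

```python
from pathlib import Path


def convert_to_markdown(source, converter=None):
    """Return the Markdown text for a local file path or a URL.

    `converter` is any object with a `convert(source)` method whose result
    exposes `text_content`. When it is omitted, MarkItDown is imported lazily,
    so the package is only needed when a real conversion is requested.
    """
    if converter is None:
        # Assumption: the markitdown package is installed in this interpreter.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(str(source))
    return result.text_content


if __name__ == "__main__":
    # "report.docx" is a placeholder file name, not something shipped with the VBO.
    sample = Path("report.docx")
    if sample.exists():
        print(convert_to_markdown(sample))
```

The optional `converter` argument is only there so the helper can be exercised without the package installed; in normal use you would call it with just a path or URL.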

While LLMs can accept various file formats, such as PDF, DOCX, and HTML, they generally process text best when it is structured with a clear hierarchy. Markdown provides that hierarchy without the overhead of raw HTML or JSON markup, which is often complex and messy. In some cases, this can also reduce your overall token usage, because you convert the files locally and send only the Markdown to the LLM.*

*NOTE: This isn't always a requirement of the LLM.

The Code stages in the VBO are implemented in Python, which means this VBO requires Blue Prism v7.4 or later.

Cheers,
Eric

3 REPLIES

matiasskwarko84
Verified Partner

Hello,

I downloaded the MarkItDown utility from the Blue Prism Digital Exchange and I'm trying to test it.

However, as soon as I run the Convert File action, I get the following error:

Failed to set run stage - Internal : Failed to initialize Python

So far I have installed:

  • Python
  • MarkItDown
  • Python.NET

I am using Blue Prism 7.4.2 HF1.

I'm not sure where the issue could be. Has anyone experienced this error before, or does anyone know whether any additional configuration steps are required for Python integration with this utility?

Any help would be greatly appreciated.

Thank you.

@matiasskwarko84 well, this is embarrassing. 😞 I forgot to mention in the user guide that you need to open the VBO and set the path to the Python DLL on your system. I'll update the user guide today or tomorrow.
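If you're not sure where the Python DLL lives, a small script like the one below may help you locate it. Treat it as a sketch under assumptions: it presumes a standard Windows CPython install, where the DLL is typically named `python3X.dll` (for example `python311.dll`) and sits in the interpreter's install directory. The function names are made up for illustration, and other distributions may lay things out differently.

```python
import sys
from pathlib import Path


def candidate_python_dll_name(version_info=sys.version_info):
    """Build the usual Windows DLL file name, e.g. python311.dll for Python 3.11."""
    return f"python{version_info[0]}{version_info[1]}.dll"


def find_python_dll(base_dir=None):
    """Return the DLL path if it exists in `base_dir`, otherwise None.

    Assumption: the DLL sits directly in the interpreter's install directory,
    which is what `sys.base_prefix` points at on a standard install.
    """
    base = Path(base_dir) if base_dir is not None else Path(sys.base_prefix)
    candidate = base / candidate_python_dll_name()
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    dll = find_python_dll()
    if dll is None:
        print("No DLL found next to this interpreter; check your install directory.")
    else:
        print(f"Python DLL: {dll}")
```

Run it with the same Python installation you want Blue Prism to use, then paste the printed path into the VBO's configuration.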


Cheers,
Eric

ewilson
Staff

Actually, I did add that step in the Configuration section of the User Guide. 🙂 It's on page 5.

Cheers,
Eric