Monday
If you're working with LLMs, and especially if you're implementing any sort of RAG pipeline, there's a new VBO on the DX that might be of interest: the Utility - MarkItDown VBO. It leverages Microsoft's MarkItDown Python package to convert a variety of file types, as well as web pages, to Markdown locally.
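For anyone curious what the underlying package looks like outside of Blue Prism, here is a minimal sketch of calling MarkItDown directly from Python. It is not the VBO's actual Code stage; the helper name `convert_to_markdown` and its optional `converter` parameter are my own, added so the function can be tried with a stand-in object where the `markitdown` package isn't installed.

```python
def convert_to_markdown(source, converter=None):
    """Return Markdown text for a local file path or a URL.

    `converter` is a hypothetical hook: any object with a .convert(source)
    method whose result exposes .text_content. When omitted, a MarkItDown
    instance from Microsoft's markitdown package is used.
    """
    if converter is None:
        # Imported lazily so the helper still loads without markitdown.
        from markitdown import MarkItDown

        converter = MarkItDown()

    # MarkItDown.convert() returns a result object; the Markdown itself
    # is available on its text_content attribute.
    result = converter.convert(str(source))
    return result.text_content
```

With the package installed (`pip install markitdown`), something like `convert_to_markdown("report.docx")` should return the document as a Markdown string. The file name is just a placeholder.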
While LLMs can accept various file formats, such as PDF, DOCX, and HTML, they generally process text best when it is structured with a clear hierarchy. Markdown provides that hierarchy without the overhead of complex and often messy raw HTML or JSON. In some cases this can also have a positive impact on your overall token utilization, because you're converting the files locally and sending only the typically more compact Markdown to the LLM.*
*NOTE: Converting to Markdown first isn't always a requirement of the LLM.
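To make the token point a little more concrete, here is a rough illustration using a made-up snippet of HTML and a hand-written Markdown equivalent. Character count is only a crude proxy for tokens, since real token counts depend on the model's tokenizer, and the sample content and the `size_ratio` helper are invented purely for illustration.

```python
# Invented sample: the same content as raw HTML and as Markdown.
HTML = (
    '<div class="content"><h1 class="title">Quarterly Report</h1>'
    '<ul class="items"><li><span>Revenue up</span></li>'
    '<li><span>Costs flat</span></li></ul></div>'
)

MARKDOWN = "# Quarterly Report\n\n- Revenue up\n- Costs flat\n"


def size_ratio(original, converted):
    """Return converted length as a fraction of the original length.

    Character counts are a rough stand-in for token counts only.
    """
    return len(converted) / len(original)


if __name__ == "__main__":
    print(f"Markdown is {size_ratio(HTML, MARKDOWN):.0%} the size of the HTML")
```

How much you actually save will vary a lot with the source format and content, so treat this as a way to check your own documents rather than a general result.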
The Code stages in the VBO are implemented in Python, which means this VBO requires Blue Prism v7.4 or later.
Cheers,
Eric