Extract Images from PDF

MareddyMahesh
Level 3

Hi,

Is there a VBO available to extract images from PDF files?

Thanks,

Maheshwar

5 REPLIES

Hi Maheshwar,

 

You can check the "Connector for Blue Prism - Adobe PDF Services - API - 2.0.0" asset on the Digital Exchange.

https://digitalexchange.blueprism.com/dx/entry/3439/solution/blue-prism---adobe-pdf-services---export

-----------------------
If I answered your query, please mark it as the Best Answer.

Harish Mogulluri

MareddyMahesh
Level 3

 

Hi Harish,

Thank you for your reply.

Unfortunately, the connector requires the creation of a paid account in Adobe PDF Services, which isn't suitable for my current needs.

I'm searching for a simpler utility, similar to those available in Power Automate and UiPath.

My business case is to extract the images from a text PDF and store them in a specified folder.

LindaJanecks
Level 3

I searched every forum I could find while trying to solve this. Finding a native Blue Prism VBO for PDF image extraction can be tough, since it's not a common out-of-the-box feature. Your best bet is probably to look for a free Digital Exchange VBO that wraps a simple .NET library (such as an older iTextSharp version) or one that uses VBA/VBScript for basic file manipulation.

naveed_raza
Level 7

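You can install the Poppler pdfimages.exe utility and call it from Blue Prism: use the Start Process action of the Environment object, specify pdfimages.exe as the application, and pass the PDF path and output path as arguments, as in the command below. Note that pdfimages treats the last argument as an output prefix (folder path plus a file name root), not as a bare folder.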

pdfimages -all yourfile.pdf outputpath
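
If you want to try the command outside Blue Prism first, here is a small Python sketch that builds and runs the same call. It assumes pdfimages is on your PATH; the function names and the "image" prefix are just placeholders I made up for illustration, so adjust them to your setup.

```python
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, output_folder, prefix="image", exe="pdfimages"):
    """Return the argument list for a pdfimages call.

    pdfimages expects an output prefix (image-root), not a bare folder,
    so the prefix is joined onto the output folder here.
    """
    image_root = Path(output_folder) / prefix
    return [exe, "-all", str(pdf_path), str(image_root)]


def extract_images(pdf_path, output_folder):
    """Run pdfimages and return the files it wrote (assumes it is on PATH)."""
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdfimages exits with an error
    subprocess.run(build_pdfimages_command(pdf_path, output_folder), check=True)
    # Output files are named <prefix>-NNN.<ext>, so match on the prefix
    return sorted(Path(output_folder).glob("image-*"))
```

The list returned by build_pdfimages_command is also what you would split between the application and arguments inputs of Start Process.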

 

naveed_raza
Level 7

Another way of doing this is with Python code. You have to install Python and the pypdf library, save the code below in a file with a .py extension, and run that file using the Start Process action of the Environment object.

from pypdf import PdfReader
import os

pdf_path = r"C:\yourpath\testPDFImage\image-doc.pdf"
output_folder = r"C:\yourpath\outputimage"

# Create the output folder if it does not already exist
os.makedirs(output_folder, exist_ok=True)

reader = PdfReader(pdf_path)

img_count = 1
for page_num, page in enumerate(reader.pages):
    # Only pages with XObject resources can contain embedded images
    if "/XObject" in page["/Resources"]:
        xObject = page["/Resources"]["/XObject"].get_object()
        for obj in xObject:
            if xObject[obj]["/Subtype"] == "/Image":
                # Stream data as stored in the PDF; it is not necessarily PNG-encoded
                data = xObject[obj].get_data()
                file_name = f"image_{page_num+1}_{img_count}.png"
                with open(os.path.join(output_folder, file_name), "wb") as f:
                    f.write(data)
                # Count every saved image so file names never collide
                img_count += 1
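
One caveat with the loop above: it writes the image stream bytes under a .png name, so JPEG or raw-pixel images may not open correctly. A possible alternative, sketched below, is pypdf's page.images, which exposes each image's name (including a detected extension) and encoded data. As far as I know this needs the optional Pillow dependency installed alongside pypdf. The function names and the "pageN_imgM" naming scheme are my own placeholders, not part of pypdf.

```python
import os


def build_image_filename(page_number, index, original_name):
    """Build a name such as 'page1_img2.jpg', keeping the detected extension."""
    extension = os.path.splitext(original_name)[1] or ".bin"
    return f"page{page_number}_img{index}{extension}"


def extract_images(pdf_path, output_folder):
    """Save every image pypdf finds in pdf_path and return the written paths."""
    # Imported here so the helper above works even where pypdf is not installed.
    from pypdf import PdfReader

    os.makedirs(output_folder, exist_ok=True)
    written = []
    for page_number, page in enumerate(PdfReader(pdf_path).pages, start=1):
        for index, image in enumerate(page.images, start=1):
            target = os.path.join(
                output_folder, build_image_filename(page_number, index, image.name)
            )
            with open(target, "wb") as f:
                f.write(image.data)
            written.append(target)
    return written
```

Calling extract_images with the same pdf_path and output_folder values as in the script above should leave one file per embedded image in the output folder.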