Extract Images from PDF

MareddyMahesh · 18-04-24

Hi,

Is there a VBO available for extracting images from PDF files?

Thanks,

Maheshwar

harish.mogulluri · 18-04-24

Hi Maheshwar,

You can check the "Connector for Blue Prism - Adobe PDF Services - API - 2.0.0" asset on the Digital Exchange.

https://digitalexchange.blueprism.com/dx/entry/3439/solution/blue-prism---adobe-pdf-services---export


Harish Mogulluri

MareddyMahesh · 19-04-24

Hi Harish,

Thank you for your reply.

Unfortunately, the connector requires the creation of a paid account in Adobe PDF Services, which isn't suitable for my current needs.

I'm looking for a simpler utility, similar to those available in Power Automate and UiPath.

My business case is to extract the images from a text PDF and store them in a specified folder.

naveed_raza · 2 weeks ago

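You can install Poppler's pdfimages.exe utility and then, in Blue Prism, use the Environment object's Start Process action: set pdfimages.exe (with its full path if it is not on the PATH) as the application, and pass the options, the PDF file and the output location as arguments, as in this command:

pdfimages -all yourfile.pdf outputpath\img

pdfimages treats the last argument as a file-name prefix rather than a folder, so the images are written as outputpath\img-000.png, outputpath\img-001.jpg and so on. The -all option keeps images such as JPEGs in their native format and writes the others as PNG.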
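To test the command outside Blue Prism, or to see exactly what Start Process will run, here is a minimal Python sketch that builds and runs the same pdfimages call with subprocess. It assumes Poppler is installed and pdfimages is on the PATH (otherwise pass the full path to pdfimages.exe as exe). Because pdfimages uses its last argument as a file-name prefix, the sketch joins a prefix onto the output folder so the images land inside it. The function names and the default img prefix are hypothetical, chosen for illustration.

```python
import os
import subprocess


def build_pdfimages_args(pdf_path, output_folder, prefix="img", exe="pdfimages"):
    """Build the argument list for Poppler's pdfimages.

    The last argument is an "image root" (a file-name prefix), so it is joined
    onto the output folder; files come out as <prefix>-000.<ext>, <prefix>-001.<ext>, ...
    """
    return [exe, "-all", pdf_path, os.path.join(output_folder, prefix)]


def run_pdfimages(pdf_path, output_folder, prefix="img", exe="pdfimages"):
    """Create the output folder, then run pdfimages and wait for it to finish."""
    os.makedirs(output_folder, exist_ok=True)
    args = build_pdfimages_args(pdf_path, output_folder, prefix=prefix, exe=exe)
    # check=True raises subprocess.CalledProcessError on a non-zero exit code
    subprocess.run(args, check=True)
    return args
```

For example, run_pdfimages(r"C:\yourpath\yourfile.pdf", r"C:\yourpath\outputimage") would write C:\yourpath\outputimage\img-000.<ext> and so on, and raise an error if pdfimages fails, which makes problems easier to spot than a process that silently produces nothing.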

naveed_raza · 2 weeks ago

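Another way of doing this is with Python. Install Python and the pypdf library (pip install "pypdf[image]", which also installs Pillow, needed by pypdf for image extraction), save the code below in a file with a .py extension, and run that file with the Environment object's Start Process action, for example with python.exe as the application and the path of the .py file as the argument.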

from pypdf import PdfReader
import os

pdf_path = r"C:\yourpath\testPDFImage\image-doc.pdf"
output_folder = r"C:\yourpath\outputimage"

# Create the output folder if it does not exist yet
os.makedirs(output_folder, exist_ok=True)

reader = PdfReader(pdf_path)

img_count = 0
for page_num, page in enumerate(reader.pages, start=1):
    # page.images returns each image on the page; image.name already carries
    # a matching extension (.png, .jpg, ...) and image.data holds the file bytes
    for index, image in enumerate(page.images, start=1):
        extension = os.path.splitext(image.name)[1]
        # Number images per page so several images on one page do not overwrite each other
        file_name = f"image_{page_num}_{index}{extension}"
        with open(os.path.join(output_folder, file_name), "wb") as f:
            f.write(image.data)
        img_count += 1

print(f"Extracted {img_count} image(s) to {output_folder}")
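The script above hard-codes both paths. A small variation, sketched below under the assumption that you launch it from Start Process with the PDF path and output folder as arguments, reads them from the command line instead, so one script can serve every PDF the process handles. It uses pypdf's page.images property, which returns each image with a file-name extension matching its format. The names parse_args, extract_images and main are hypothetical, and the exit codes (0 on success, 1 when the PDF is missing) are an illustrative convention, not something Blue Prism requires.

```python
import argparse
import os
import sys


def parse_args(argv=None):
    """Read the PDF path and output folder from the command line."""
    parser = argparse.ArgumentParser(description="Extract images from a PDF with pypdf.")
    parser.add_argument("pdf_path", help="PDF file to read")
    parser.add_argument("output_folder", help="folder to write the images into")
    return parser.parse_args(argv)


def extract_images(pdf_path, output_folder):
    """Write every image in the PDF to output_folder and return the file paths."""
    # Imported here so argument handling can run (and be tested) without pypdf installed
    from pypdf import PdfReader

    os.makedirs(output_folder, exist_ok=True)
    written = []
    reader = PdfReader(pdf_path)
    for page_num, page in enumerate(reader.pages, start=1):
        for index, image in enumerate(page.images, start=1):
            extension = os.path.splitext(image.name)[1]
            target = os.path.join(output_folder, f"image_{page_num}_{index}{extension}")
            with open(target, "wb") as f:
                f.write(image.data)
            written.append(target)
    return written


def main(argv=None):
    """Return 0 on success, 1 if the PDF does not exist."""
    args = parse_args(argv)
    if not os.path.isfile(args.pdf_path):
        print(f"PDF not found: {args.pdf_path}", file=sys.stderr)
        return 1
    written = extract_images(args.pdf_path, args.output_folder)
    print(f"Extracted {len(written)} image(s) to {args.output_folder}")
    return 0
```

To run it as a script, end the file with if __name__ == "__main__": sys.exit(main()) and, in Start Process, use python.exe as the application with arguments such as "C:\yourpath\extract_images.py" "C:\yourpath\yourfile.pdf" "C:\yourpath\outputimage" (extract_images.py is a hypothetical file name; the quotes protect paths containing spaces).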

SS&C Blue Prism Community
