
Download File from Web Page

EricLiu
Level 5

Hi,

I have a question about downloading a file from a web page. I know there is a Download File action in Utility - File Management, but it requires a source URL. When I spy a link on a web page, I cannot seem to get that source URL with a Read stage.

As an example, search Google for "pdf file". The first result links to this source URL:

http://www.pdf995.com/samples/pdf.pdf


However, I cannot get this link with a Read stage from the Google search results page. Any ideas?

Get Document URL (your result may vary):
https://www.google.com/search?q=pdf+file&rlz=1C1CHBF_enUS838US838&oq=pdf+file&aqs=chrome..69i57j0l5.1551j0j7&sourceid=chrome&ie=UTF-8#spf=1567693634623

Get Document URL Domain:
www.google.com

Get Current Value:
PDF document - pdf 995



------------------------------
Eric Liu
RPA Developer
America/Toronto
------------------------------
4 REPLIES

james.man
Staff
This is almost certainly specific to Google, as the actual hyperlink has an h3 and a div nested inside it. Blue Prism's spy picks up the nested div rather than the A element. It may be worth manually tweaking the spied element's attributes so that it matches just the <a></a> hyperlink, and then seeing whether you can read the URL from that.
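To illustrate why the nested element has no URL of its own, here is a minimal Python sketch (not Blue Prism code). The markup in SAMPLE is a simplified assumption about how such a result link might be nested, not Google's real HTML, and the names EnclosingLinkFinder and find_heading_links are made up for this example. The heading only carries the text, while the href sits on the enclosing A element:

```python
from html.parser import HTMLParser


class EnclosingLinkFinder(HTMLParser):
    """Pair each <h3> heading with the href of the <a> that wraps it."""

    def __init__(self):
        super().__init__()
        self.open_hrefs = []     # hrefs of the <a> tags that are currently open
        self.links = []          # (heading text, href) pairs found so far
        self.in_heading = False  # True while inside an <h3> nested in an <a>
        self.heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.open_hrefs.append(dict(attrs).get("href"))
        elif tag == "h3" and self.open_hrefs:
            self.in_heading = True
            self.heading_text = []

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_heading:
            self.in_heading = False
            if self.open_hrefs:
                text = "".join(self.heading_text).strip()
                self.links.append((text, self.open_hrefs[-1]))
        elif tag == "a" and self.open_hrefs:
            self.open_hrefs.pop()


# Hypothetical, simplified markup: the visible text lives in a nested div,
# but only the outer <a> has the href attribute.
SAMPLE = """
<div class="result">
  <a href="http://www.pdf995.com/samples/pdf.pdf">
    <h3><div>PDF document - pdf 995</div></h3>
  </a>
</div>
"""


def find_heading_links(html):
    """Return (heading text, href) pairs for headings wrapped in a link."""
    finder = EnclosingLinkFinder()
    finder.feed(html)
    return finder.links
```

Calling find_heading_links(SAMPLE) pairs the heading text with the URL on the parent link, which is the same relationship the spied element would need to match.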

19105.png


------------------------------
James Man
Senior Product Consultant
Blue Prism
Asia/Hong_Kong
------------------------------

ErikChristoffer
Level 3

Hi Eric,

It might be a bit overkill, but you could use the HtmlAgilityPack (https://html-agility-pack.net/select-nodes). Load the whole page HTML into it and search for the link you need using XPath. This has to be done in a code stage in an object.
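As a rough sketch of the XPath idea, here is a Python analogue using the standard library's xml.etree.ElementTree. This is not HtmlAgilityPack and not what a Blue Prism code stage would contain. ElementTree only handles well-formed markup and a limited XPath subset, so SNIPPET is a hypothetical, tidy fragment, and hrefs_for_heading is a name invented for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for a results page.
SNIPPET = """
<div id="results">
  <div class="result">
    <a href="http://www.pdf995.com/samples/pdf.pdf"><h3>PDF document - pdf 995</h3></a>
  </div>
  <div class="result">
    <a href="http://example.com/other.html"><h3>Another result</h3></a>
  </div>
</div>
"""


def hrefs_for_heading(markup, heading):
    """Return the href of every <a> whose child <h3> has the given text."""
    root = ET.fromstring(markup)
    hrefs = []
    # ".//a[h3]" selects every <a> descendant that has a direct <h3> child.
    for anchor in root.findall(".//a[h3]"):
        text = "".join(anchor.find("h3").itertext()).strip()
        if text == heading:
            hrefs.append(anchor.get("href"))
    return hrefs
```

With the fragment above, hrefs_for_heading(SNIPPET, "PDF document - pdf 995") selects only the first link. Real pages are rarely well-formed XML, which is where a tolerant HTML parser such as HtmlAgilityPack comes in.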

Regards, Erik



------------------------------
Erik Christoffer
Developer
InsingerGilissen
Europe/Amsterdam
------------------------------

Hi James,

Thanks for replying. Actually, there are plenty of websites that have this type of issue. I am not an expert in web development, but I guess people wrap things in a lot of divs for CSS styling, or their JavaScript renders the HTML this way.

I have tried moving the mouse a little when spying an element, but that is not very effective. Do you know if Blue Prism is planning to add any new functionality to solve this issue?

------------------------------
Eric Liu
RPA Developer
America/Toronto
------------------------------

Thanks Erik! I had never heard of this method before. Overkill or not, I am happy to learn a new way of interacting with HTML pages. I will definitely try it out.

------------------------------
Eric Liu
RPA Developer
America/Toronto
------------------------------