How to convert HTML code to plain text?

AnnaRydzewska
Level 2
Hello,
How can I convert HTML code to plain text? Which DLL files are used for decoding HTML, and how can I reference the DLL in a code stage?

Thank you,
Anna Rydzewska

------------------------------
Anna Rydzewska
RPA Developer
Rockwell Automation
Europe/Warsaw
------------------------------
14 REPLIES

AmiBarrett
Level 12
The easiest way is to use a read stage in Blue Prism. If you know where the element you want to read is, it's fairly straightforward. If all else fails, you can send keystrokes to select and copy the whole page (Ctrl+A, Ctrl+C), or Ctrl+double-click the area and copy that to the clipboard.

The only common scenario I can think of where the HTML isn't already on a webpage is parsing HTML out of an e-mail (when, for some reason, you don't want to use the built-in option to read it as plain text). For this case, or similar scenarios, you could write the HTML to a file, open it in IE, and read it with the methods at the start of this post.

------------------------------
Ami Barrett
Lead RPA Software Developer
Solai & Cameron
Richardson, TX
------------------------------

AndreyKudinov
Level 10
In general this is not an easy question. It depends on whether you just need the text so you can parse it, or whether you want the result to look decent.
One possible solution: http://www.blackbeltcoder.com/Articles/strings/convert-html-to-text
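
To give a rough idea of what such a conversion involves, here is a minimal sketch in Python using only the standard library. It is not the code from the linked article, and the names `TextExtractor` and `html_to_text` are made up for illustration. It drops script and style content, decodes entities such as `&amp;` and `&nbsp;`, and puts block-level elements on their own lines. In a Blue Prism code stage, the entity-decoding part would typically be handled by `System.Net.WebUtility.HtmlDecode` (System.dll) or `System.Web.HttpUtility.HtmlDecode` (System.Web.dll). Those methods only decode entities and do not strip tags, so the tag handling below would still need a .NET equivalent.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    # Tags that should start and end on their own line in the output.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &nbsp;.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Non-breaking spaces become ordinary spaces.
    text = "".join(parser.parts).replace("\xa0", " ")
    # Collapse runs of whitespace and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

This is good enough when you only need the text for parsing. Making the output look decent (tables, lists, links) takes noticeably more work.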

------------------------------
Andrey Kudinov
Project Manager
MobileTelesystems PJSC
Europe/Moscow
------------------------------

Hi @AmiBarrett, when you say the "built-in option to read as plain-text", is that in the MS Outlook Email VBO? I am using the action "Get Received Items (Expert)". How do I set it so that the email body is returned in plain text?

------------------------------
Stephanie Strydom
------------------------------

I would have sworn from prior builds of the official VBO that there was an input option to toggle if you wanted it to be HTML or not, but I'm not seeing it.

There's an alternate build on GitHub I've been working on that does support this though. You'll see that all three 'Get Received Items' actions are condensed into one - you should still be able to use the same filter string there.

------------------------------
Ami Barrett
Solution Architect
Karsun Solutions
Plano TX
------------------------------

There's a flag on the Send Email action that indicates the supplied body text is HTML, but there's no flag in the Get Received Items actions for specifically selecting the format of the body that should be returned.

Within the code stage of the Get Items action there's a line that sets the value of the returned body item based on the value of the BodyFormat property of the specific item. So if the BodyFormat is set to HTML, you'll get back the HTML body. If it's set to plain text, you'll get back the plain text version of the body. You could always change that line of code to only return plain text, or you could add a flag like Ami mentioned.
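
As a language-neutral illustration of that branching, here is a small Python sketch. It is not the VBO's actual code: the `MailItem` class is a hypothetical stand-in for an Outlook mail item, and `force_plain_text` is a hypothetical name for the kind of flag discussed above. The property names `Body`, `HTMLBody`, and `BodyFormat` and the numeric format values come from the Outlook object model, where the plain-text version of a message is exposed as `Body`.

```python
from dataclasses import dataclass

# Values from Outlook's OlBodyFormat enumeration.
OL_FORMAT_PLAIN = 1
OL_FORMAT_HTML = 2


@dataclass
class MailItem:
    """Hypothetical stand-in for an Outlook mail item."""
    Body: str        # plain-text version of the message
    HTMLBody: str    # HTML version of the message
    BodyFormat: int  # one of the OlBodyFormat values


def get_body(item, force_plain_text=False):
    """Pick which body to return, as described in the post above.

    force_plain_text is a hypothetical flag: when set, the plain-text
    body is returned even for HTML-formatted messages.
    """
    if item.BodyFormat == OL_FORMAT_HTML and not force_plain_text:
        return item.HTMLBody
    return item.Body
```

With the flag left at its default, the behaviour matches what is described above. Passing `force_plain_text=True` corresponds to changing the line so that it always returns the plain-text body.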

Cheers,
------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

Hello Eric,

Couldn't we have that as an input parameter in the official Outlook VBO, so the user can choose between plain text and HTML?

I believe most folks would like the plain text, but the VBO returns the HTML body by default.

The tweak is fine and easy, but we should have the option out of the box in the VBO, considering how widely it is used.




------------------------------
If I was of assistance, please vote for it to be the "Best Answer".

Thanks & Regards,
Tejaskumar Darji
Sr. Consultant-Technical Lead
------------------------------

I ended up solving my problem by keeping the body in HTML and using the "Split Lines" action from the "Utility - Strings" VBO to convert the HTML to a collection, which worked nicely for what I needed. I then did some further string manipulation to get the pieces of data I needed from the email.
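
For anyone wanting to see the idea outside of Blue Prism, here is a rough Python sketch of that approach: split the HTML into lines, then search the lines for a label and take what follows it. The function names and the labels in the usage example are hypothetical, and the regex tag stripping is deliberately crude. It assumes that no tag spans more than one line and that each value sits on the same line as its label.

```python
import re
from html import unescape


def split_lines(text):
    """Split text into a list of lines (the "collection")."""
    return text.splitlines()


def value_after_label(lines, label):
    """Return the text following `label` on the first line that starts with it."""
    for line in lines:
        # Crude tag removal, then entity decoding; fine for simple one-line markup.
        plain = unescape(re.sub(r"<[^>]+>", "", line)).strip()
        if plain.startswith(label):
            return plain[len(label):].strip()
    return None


# Usage with made-up e-mail content:
body = "<p>Order Number: <b>12345</b></p>\r\n<p>Customer: Acme &amp; Co</p>"
lines = split_lines(body)
order_number = value_after_label(lines, "Order Number:")
customer = value_after_label(lines, "Customer:")
```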

------------------------------
Stephanie Strydom
------------------------------

I am trying to do this for a different CRM system (Genesys). The body of the email gets ingested into Blue Prism with HTML around it. How would I remove the HTML tags in Blue Prism to leave just the text of the mail?

Kind Regards,
Gary

------------------------------
Gary Mannion
------------------------------

Hi Eric,

Could you please clarify what change we need to make to that line of code? Would it be to change 'item.HTMLBody' to 'item.PlainText'?

Thanks,
Alex

------------------------------
Alex Traynor
------------------------------