Blue Prism Product


This community covers the core Blue Prism product.

 Getting underlined words between two strings in Word

Sean Scudellari posted 08-04-2022 00:54
Hello, I am trying to select the underlined words between two sections of a Word document and save them to a collection variable.

I have the following code, which runs, but it seems to select every underlined word in the document (and also some blank lines) instead of only the underlined words between the two specific sections.

Document looks like this:

I would want to select "Example1" and "Example3" in the document and save them to a variable, since those are between the two sections and underlined. The two section names will always be the same.

Here's the code I currently have:
Dim doc As Object = GetDocument(handle,documentname)
Dim w As Object = doc.Application
Dim s As Object = w.Selection

Dim Para as Microsoft.Office.Interop.Word.Paragraph

Dim blnStart as Boolean
blnStart = false

Dim table As New System.Data.DataTable()
table.Columns.Add("Underlined_Text", GetType(String))

For Each Para In doc.Paragraphs

	If Para.Range.Text.ToLower.Contains(strStartText) Then
		blnStart = true
	End If

	If Para.Range.Font.Underline = 1 and blnStart Then
		With s.Range
			With .Find
				.ClearFormatting
				.Replacement.ClearFormatting
				.Font.Underline = 1
				.Text = ""
				.Replacement.Text = ""
				.Format = True
				.Forward = True
				.Wrap = 0
				.Execute
			End With
			.Select
			table.Rows.Add(s.Range.Text)
		End With
	End If

	If Para.Range.Text.ToLower.Contains(strEndText) Then
		exit for
	End If

Next Para

Underlined_Text = table
doc = Nothing

The variables 'strStartText' and 'strEndText' would be equal to the two section names.
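Setting the Word calls aside, the start/end section logic can be checked on its own. The sketch below is a standalone VB.NET module (the names `SectionFilter` and `ParagraphsBetween` are hypothetical, not part of the Blue Prism Word VBO) that takes paragraph texts in document order and returns only those strictly between the start and end markers, lowercasing both sides like the `ToLower.Contains` check above. Skipping the marker paragraphs themselves is an assumption about the desired behaviour; the underline test would then be applied only to the paragraphs this returns.

```vbnet
Imports System
Imports System.Collections.Generic

Module SectionFilter

    ' Returns the paragraphs strictly between the first paragraph that
    ' contains startText and the next paragraph that contains endText.
    ' Both sides are lowercased, mirroring the ToLower.Contains check.
    Public Function ParagraphsBetween(paragraphs As IEnumerable(Of String),
                                      startText As String,
                                      endText As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim inSection As Boolean = False
        Dim startLower As String = startText.ToLower()
        Dim endLower As String = endText.ToLower()

        For Each p As String In paragraphs
            Dim textLower As String = p.ToLower()
            If Not inSection Then
                ' Still looking for the start marker; the marker itself is not kept.
                If textLower.Contains(startLower) Then inSection = True
            ElseIf textLower.Contains(endLower) Then
                ' End marker reached: stop without keeping it.
                Exit For
            Else
                result.Add(p)
            End If
        Next

        Return result
    End Function

    Sub Main()
        Dim doc As String() = {"Overview", "some text", "Section 1",
                               "Example1: some text", "Example2: some text",
                               "Example3: some text", "Section 2", "Some text"}
        ' Prints the three "Example" lines and nothing else.
        For Each p As String In ParagraphsBetween(doc, "section 1", "section 2")
            Console.WriteLine(p)
        Next
    End Sub

End Module
```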

Thank you!
Eric Wilson Best Answer
Hi @Sean Scudellari,

I tested the code above, but it would not return anything for me. What I did was change the following check:

If Para.Range.Font.Underline = 1 and blnStart Then
	With s.Range
		With .Find
			.ClearFormatting
			.Replacement.ClearFormatting
			.Font.Underline = 1
			.Text = ""
			.Replacement.Text = ""
			.Format = True
			.Forward = True
			.Wrap = 0
			.Execute
		End With
		.Select
		table.Rows.Add(s.Range.Text)
	End With
End If


to this:

If Para.Range.Words(1).Font.Underline = 1 and blnStart Then
	With s.Range
		With .Find
			.ClearFormatting
			.Replacement.ClearFormatting
			.Font.Underline = 1
			.Text = ""
			.Replacement.Text = ""
			.Format = True
			.Forward = True
			.Wrap = 0
			.Execute
		End With
		.Select
		table.Rows.Add(s.Range.Text)
	End With
End If


Notice in the If...Then that I'm specifically checking the underline format of the first word in the paragraph. 
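A likely reason the first-word check helps: a range-level font property only reports a single value when the whole range shares it. For a paragraph such as "Example1: some text", where only part of the text is underlined, `Para.Range.Font.Underline` generally comes back as `wdUndefined` (9999999) rather than `wdUnderlineSingle` (1), so `= 1` never matches. The fragment below spells out those Word constants; it assumes the same late-bound `Para` and `blnStart` as the loop above and is meant to slot into that loop, not to run on its own.

```vbnet
' Word object model constants (WdUnderline / WdConstants values).
Const wdUnderlineSingle As Integer = 1
Const wdUndefined As Integer = 9999999

Dim paraUnderline As Integer = CInt(Para.Range.Font.Underline)
If paraUnderline = wdUndefined Then
    ' Mixed formatting: the paragraph is only partly underlined,
    ' so decide based on a smaller range (here, the first word).
    paraUnderline = CInt(Para.Range.Words(1).Font.Underline)
End If

If paraUnderline = wdUnderlineSingle AndAlso blnStart Then
    ' ... collect the underlined text here ...
End If
```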

Cheers,
Eric
Sean Scudellari
@Eric Wilson Thank you, Eric. I was able to get this working using your '.Words(1)' suggestion. However, my code was still selecting all underlined words in the document and not just the underlined words between the specified sections.

So I modified my code to the following using your '.Words(1)' suggestion:
Dim doc as Object = GetDocument(handle,documentname)

Dim Para as Microsoft.Office.Interop.Word.Paragraph

Dim table As New System.Data.DataTable()
table.Columns.Add("Underlined_Text", GetType(String))

Dim blnStart as Boolean 
blnStart = false

For Each Para In doc.Paragraphs

	If Para.Range.Text.ToLower.Contains(strStartText) Then
		blnStart = true
	End If

	If Para.Range.Words(1).Font.Underline = 1 and blnStart Then
		table.Rows.Add(Para.Range.Text)
	End If

	If Para.Range.Text.ToLower.Contains(strEndText) Then
		exit for
	End If

Next Para

Underlined_Text = table

The only thing I'm curious about now is whether it's possible to select only the underlined portion of the paragraph, instead of the whole paragraph, when its first word is underlined. Do you know if this is possible? I tried playing around with the code but couldn't figure it out.
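One way to keep only the underlined part is to run the Find on a copy of the paragraph's own range instead of on the Selection. When `Find` is reached from a `Range`, a successful `Execute` redefines that range to the matched text, so its `.Text` is just the underlined run. With `.Wrap = 0` (`wdFindStop`) the search does not wrap around, but it can still continue past the end of the paragraph, so the sketch checks `InRange` before keeping a hit. This is an untested sketch of a full code stage body, assuming the same inputs (`handle`, `documentname`, `strStartText`, `strEndText`), output (`Underlined_Text`) and `GetDocument` helper as the original code; it only matches single underline (`wdUnderlineSingle` = 1), and an underlined run that spills over into the next paragraph would be dropped.

```vbnet
Dim doc As Object = GetDocument(handle, documentname)

Dim table As New System.Data.DataTable()
table.Columns.Add("Underlined_Text", GetType(String))

Dim blnStart As Boolean = False

For Each Para As Object In doc.Paragraphs
    ' Lowercase both sides so the inputs' case does not matter.
    If Para.Range.Text.ToLower.Contains(strStartText.ToLower()) Then
        blnStart = True
    End If

    If blnStart Then
        ' Work on a copy so Para.Range itself is not redefined by Find.
        Dim r As Object = Para.Range.Duplicate
        With r.Find
            .ClearFormatting()
            .Text = ""              ' match on formatting only
            .Font.Underline = 1     ' wdUnderlineSingle
            .Format = True
            .Forward = True
            .Wrap = 0               ' wdFindStop
        End With

        ' Each successful Execute moves r onto the next underlined run;
        ' stop as soon as a hit lies outside the current paragraph.
        Do While r.Find.Execute() AndAlso r.InRange(Para.Range)
            Dim found As String = r.Text.Trim()   ' drop paragraph marks and spaces
            If found.Length > 0 Then table.Rows.Add(found)
            r.Collapse(0)           ' wdCollapseEnd: continue after this hit
        Loop
    End If

    If Para.Range.Text.ToLower.Contains(strEndText.ToLower()) Then
        Exit For
    End If
Next

Underlined_Text = table
doc = Nothing
```

Skipping empty results after `Trim()` is also aimed at the blank lines mentioned in the original question, which may come from underlined paragraph marks.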

Thanks again!
Eric Wilson
Hi @Sean Scudellari,

Below is the code as I have it set up.

' Declare object for code use
Dim doc as Object = GetDocument(handle,document_name)

Dim w As Object = doc.Application
Dim s As Object = w.Selection

Dim Para as Object

Dim blnStart as Boolean
blnStart = false

Dim table As New System.Data.DataTable()
table.Columns.Add("Underlined_Text", GetType(String))

For Each Para In doc.Paragraphs
	If Para.Range.Text.ToLower.Contains(startText) Then
		blnStart = true
	End If

	If Para.Range.Words(1).Font.Underline = 1 and blnStart Then
		With s.Range
			With .Find
				.ClearFormatting
				.Replacement.ClearFormatting
				.Font.Underline = 1
				.Text = ""
				.Replacement.Text = ""
				.Format = True
				.Forward = True
				.Wrap = 0
				.Execute
			End With
			.Select
			table.Rows.Add(s.Range.Text)
		End With
	End If

	If Para.Range.Text.ToLower.Contains(endText) Then
		Exit For
	End If

Next Para

Underlined_Text = table
doc = Nothing

And here's a screenshot of the Word doc I'm using based on your example:


In my tests, the only words that are captured and returned in the output Underlined Words Collection are Example1 and Example3.


Cheers,
Eric
Sean Scudellari
@Eric Wilson Thank you for the detailed help Eric.

I copied and pasted your code directly and for some reason, an underlined word above the two sections is getting selected and saved to the variable and only one underlined word between section 1 and section 2 is getting selected. 

For example my document looks like this:

Overview

some text

some text
some text

Section 1

Example 1: some text
Example 2: some text
Example 3: some text

Section 2

Some text


For some reason, "overview" is getting selected. And then only "Example1" gets selected but not "Example3". It's pretty strange. 

However, if I delete the "overview" text from the document, then only "Example1" and "Example3" get selected, which is what I want since those are the two underlined items between section 1 and section 2. I verified my start and end text variables are set to Section 1 and Section 2 as well. 

Not too sure what's going on... Thanks again for your help.
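A likely explanation for the "Overview" result: in the code above, the Find runs on `s.Range`, a range built from the Selection, not from `Para`. Each `Execute` therefore appears to return the next underlined text after wherever the cursor currently is, anywhere in the document, and `.Select` then moves the cursor onto that hit. The loop still adds one row per qualifying paragraph (two here), but the rows are simply the first two underlined runs after the cursor, so an underlined "Overview" above Section 1 shifts everything by one and "Example3" is never reached. That would also fit deleting "Overview" fixing the output. If that is the cause, a small change is to move the Selection to the start of the current paragraph before each search. This is a hedged, untested sketch against the same `s`, `Para`, `blnStart` and `table` objects:

```vbnet
If Para.Range.Words(1).Font.Underline = 1 and blnStart Then
	' Collapse the Selection to the start of the current paragraph so the
	' Find below starts here, not after the previous hit.
	s.SetRange(Para.Range.Start, Para.Range.Start)
	With s.Range
		With .Find
			.ClearFormatting
			.Replacement.ClearFormatting
			.Font.Underline = 1   ' wdUnderlineSingle
			.Text = ""
			.Replacement.Text = ""
			.Format = True
			.Forward = True
			.Wrap = 0             ' wdFindStop
			.Execute
		End With
		.Select
		table.Rows.Add(s.Range.Text)
	End With
End If
```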
Eric Wilson
@Sean Scudellari,

One thing I noticed in your original code is that you’re performing a ToLower() call when checking for the start and end sections. Because of that, I passed in the values of start and end as lower case too (i.e. “section 1” and “section 2”).
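Because only the paragraph text is lowercased before the `Contains` check, an input such as "Section 1" can never match; only an already-lowercase input will. A sketch that removes that dependency is to compare case-insensitively with `String.IndexOf(String, StringComparison)`. The module and function names here are hypothetical.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TextMatch

    ' True when needle occurs anywhere in haystack, ignoring case, so
    ' "Section 1" and "section 1" are treated the same.
    Public Function ContainsIgnoreCase(haystack As String, needle As String) As Boolean
        Return haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        ' Word paragraph text ends with a carriage return (the paragraph mark).
        Dim paragraphText As String = "Section 1" & vbCr
        ' Lowercasing only the paragraph side misses a title-case input.
        Console.WriteLine(paragraphText.ToLower().Contains("Section 1"))  ' False
        Console.WriteLine(ContainsIgnoreCase(paragraphText, "Section 1")) ' True
        Console.WriteLine(ContainsIgnoreCase(paragraphText, "section 1")) ' True
    End Sub

End Module
```

In the code stage, the two `Contains` checks would become `ContainsIgnoreCase(Para.Range.Text, strStartText)` and `ContainsIgnoreCase(Para.Range.Text, strEndText)`.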

What version of Blue Prism and MS Word are you using?

Cheers,
Eric
Sean Scudellari
@Eric Wilson For my start and end section inputs, I have them entered in all lowercase as well. I tried changing them to title case (as they appear in the document) and removing the '.ToLower' from the code, but nothing changed: I'm still getting "Overview" and "Example 1" as my only output. Perhaps it's some odd formatting on the "Overview" line of the document, but I'm not sure.

For BP, using version 7.1.0
For Word, using Office Professional Plus 2010

I'm limited to using 2010 office while I am developing this solution in the dev environment, but once development is complete, I believe the solution will be moved to a different server that has a much newer version of office if that makes any difference. 

Thank you.
Eric Wilson
@Sean Scudellari,

I’m also using BP v7.1, but I’m using the latest Microsoft 365 version of Word. So this could be related to the difference in our versions of Word, or it could be the test document. In my document, both Section 1 and Section 2 are styled using the default Heading 1 style.

Cheers,
Eric
Eric Wilson
@Sean Scudellari,

I've attached the file I used for testing if you want to give it a try to see if you have a different result.

Cheers,
Eric
Sean Scudellari
@Eric Wilson Thank you for providing that, Eric. I got the correct results using your file. I think one issue may be that there is a section called "overview" just above my Section 1, and Section 1 also has "overview" in its name. However, even if I remove the top "overview" section, the code still fails to get the last underlined word just before Section 2. The sections in my document are also not formatted as headings, so it could be a combination of not using heading styles and the different versions of Office.

Luckily, the code where I use this works as expected:
If Para.Range.Words(1).Font.Underline = 1 and blnStart Then
	table.Rows.Add(Para.Range.Text)
End If

So I should be able to use that and then just split the string on the ':' so that I only keep the underlined portion. Will test out the other method once I am using the newer version of office to see if there's any difference.
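The split described above could look like the helper below. It assumes the underlined label is always the text before the first colon, as in "Example1: some text"; because paragraph text from Word ends with a paragraph mark (carriage return), the result is trimmed. `LabelSplit` and `UnderlinedLabel` are hypothetical names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LabelSplit

    ' Returns the text before the first ":" (assumed to be the underlined
    ' label, e.g. "Example1" in "Example1: some text"). With no colon,
    ' the whole paragraph text is returned. Trim() also removes the
    ' trailing paragraph mark (carriage return) that Word includes.
    Public Function UnderlinedLabel(paragraphText As String) As String
        Return paragraphText.Split(":"c)(0).Trim()
    End Function

    Sub Main()
        Console.WriteLine(UnderlinedLabel("Example1: some text" & vbCr)) ' Example1
        Console.WriteLine(UnderlinedLabel("Example3 : more: text"))     ' Example3
    End Sub

End Module
```

Inside the loop this would be `table.Rows.Add(UnderlinedLabel(Para.Range.Text))`. If a label could itself contain a colon, this split would cut it short, so the formatting-based Find is the safer option in that case.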

I appreciate the help and quick replies Eric! Definitely could not have made it to this point without your support.