Extracting first sentence from a paragraph

KaranJuneja · 11-09-17

Hi,

Is there a way to extract the first sentence from a paragraph? Can regex be used here? If yes, how?

For example, the paragraph below has two sentences, and I need the first sentence:

The Japanese loan will be available at 0.1% interest rate on Oct. 25 and India will be able to repay this in 50 years. Repayment will begin 15 years after the loan is received.

My desired output:

The Japanese loan will be available at 0.1% interest rate on Oct. 25 and India will be able to repay this in 50 years.

How can I do that?

Regards,
Karan

ivan.gordeyev · 11-09-17

The easiest option would be:

InStr([text], ". ") - This will output the [character number] at which the first sentence ends.
Left([text], [character number]) - This will extract that text (preferably into a [new sentence data item]).
Replace([text], [new sentence data item], "") - This will remove sentence one from [text] so you can move on to the next one, if required.
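
For readers who want to try the three steps outside Blue Prism, here is a minimal Python sketch of the same idea. It is only an illustration, not Blue Prism code: the function name split_first_sentence is made up for this example, and Python's str.find is 0-based and returns -1 when nothing is found, so the offsets differ from InStr.

```python
def split_first_sentence(text: str) -> tuple[str, str]:
    """Return (first_sentence, remainder), splitting at the first ". "."""
    pos = text.find(". ")           # position of the first period followed by a space
    if pos == -1:                   # no ". " found: treat the whole text as one sentence
        return text.strip(), ""
    first = text[: pos + 1]         # like Left(...): keep everything up to and including the period
    rest = text[pos + 1:].strip()   # like Replace(..., ""): what is left for the next pass
    return first, rest


paragraph = (
    "The Japanese loan will be available at 0.1% interest rate on Oct. 25 "
    "and India will be able to repay this in 50 years. "
    "Repayment will begin 15 years after the loan is received."
)

first, rest = split_first_sentence(paragraph)
print(first)
```

On the example paragraph this stops at "Oct.", because the abbreviation is also a period followed by a space.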

TomBlackburn1 · 19-09-17

Karan,

Please understand that there is no simple solution here that would give 100% accurate results. Defining what makes up a sentence as a set of rules that software can follow without any cognitive understanding is pretty hard. There are cognitive tools available that can provide some syntactic analysis of a document, but they might be overkill for what you are looking to achieve, especially if you want a pure Blue Prism solution.

As Ivan has demonstrated, you can get pretty close by looking for the first period, question mark, or exclamation mark followed by white-space.

Regex: /^(.*?)[.?!]\s/

In Blue Prism you could keep it simple with the expression:

Trim(Left([text], InStr([text], ". ")))

However, given the use case in your post, this would yield "The Japanese loan will be available at 0.1% interest rate on Oct."

You would also need to consider the other characters that suggest the end of a sentence: compare the integer outputs of InStr([text], "! "), InStr([text], ". ") and InStr([text], "? ") to find the expression that outputs the lowest value (ignoring any that return 0 because there was no match), before performing the full expression to manipulate the text.

You also need to consider what happens if [text] is only one sentence. You would need a decision stage to check whether a sentence break was found at all (e.g. InStr([text], ". ") > 0). You then also need to consider what happens if there are no periods in the data item.

Even checking whether the first character after ". " is lowercase, which would mean it is still part of the first sentence, won't be accurate because, again, in your use case that character is a numeric value, which doesn't indicate whether it is still part of the first sentence or the start of a separate sentence.

Tom
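
As a rough sketch of the points above, here are both approaches in Python rather than Blue Prism. The function names are invented for this example, and the regex is a slight variation on the one in the post: it keeps the terminator inside the captured group and also accepts end of text, so a single-sentence input still matches. Neither version is a definitive implementation; both share the abbreviation problem already described.

```python
import re

# Lazily match up to the first ., ? or ! that is followed by white-space or end of text.
FIRST_SENTENCE = re.compile(r"^(.*?[.?!])(?:\s|$)", re.DOTALL)


def first_sentence_regex(text: str) -> str:
    """Regex approach; falls back to the whole text when no terminator is found."""
    text = text.strip()
    match = FIRST_SENTENCE.search(text)
    return match.group(1) if match else text


def first_sentence_min_pos(text: str) -> str:
    """Compare the positions of '. ', '! ' and '? ' and cut at the lowest one."""
    text = text.strip()
    # str.find returns -1 when absent, so those results are dropped before taking the minimum.
    positions = [p for p in (text.find(t) for t in (". ", "! ", "? ")) if p != -1]
    if not positions:               # one sentence only, or no terminators at all
        return text
    return text[: min(positions) + 1]


paragraph = (
    "The Japanese loan will be available at 0.1% interest rate on Oct. 25 "
    "and India will be able to repay this in 50 years. "
    "Repayment will begin 15 years after the loan is received."
)

print(first_sentence_regex(paragraph))
print(first_sentence_min_pos("Is it ready? Yes. Ship it!"))
```

Both functions return the input unchanged when it holds a single sentence or has no terminator, which covers the two edge cases mentioned. On the example paragraph they still stop at "Oct.".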

SS&C Blue Prism Community
