Fuzzy Matching
24-07-17 12:01 PM
Hello!
Has anyone been able to try the concept of 'Fuzzy Matching' ( https://en.wikipedia.org/wiki/Fuzzy_matching_(computer-assisted_translation) ) by using Blue Prism? I was thinking of creating a VBO specifically for this but find it hard to know where to begin. I read this concept could be used as suggested by Blue Prism's "Increasing Data Quality" data sheet (see documentation). Here's the exact info given in the pdf:
Fuzzy match translation
Data can be corrected or translated using "fuzzy match" requirements, provided there is a clear rule for doing so. For example, location names can be corrected using techniques borrowed from spell-checking algorithms to identify the closest match from a "dictionary" of approved values. For example, using a "Levenstein Distance" calculation the following corrections might be made:
- "Sollihull hospital" becomes "Solihull Hospital" (corrected the double "L" and the capitalisation of "H")
- "Blue Prism" becomes "Blue Prism Limited"
So my question is: has anyone done something like this before? Is this done by writing our own Visual Basic code, using a VBO, using a separate program, ... ?
Thanks for any info related to this subject!
Sébastien
6 REPLIES
25-07-17 02:46 PM
Hi Sébastien - there is no official VBO available, but maybe someone out there has tried it. I would imagine a single code stage would be enough, and searching for '.NET Levenshtein Distance' turns up many examples. These two look like they will paste straight in with minimal adjustment.
https://social.technet.microsoft.com/wiki/contents/articles/28961.leven…
https://www.programmingalgorithms.com/algorithm/levenshtein-distance?la…
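For anyone who wants to see the logic before porting it to a code stage, here is a minimal sketch in Python (not Blue Prism code, and not taken from either link above). The names `levenshtein` and `closest_match` are made up for illustration, and comparing in lower case is an assumption based on the capitalisation example in the data sheet. A code stage would need the same loop rewritten in VB.NET or C#.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def closest_match(value: str, approved: list[str]) -> str:
    """Return the approved value with the smallest distance to value.
    Hypothetical helper: the comparison ignores case so that
    'hospital' vs 'Hospital' does not count as an edit."""
    return min(approved, key=lambda name: levenshtein(value.lower(), name.lower()))


approved_values = ["Solihull Hospital", "Blue Prism Limited"]
print(closest_match("Sollihull hospital", approved_values))
print(closest_match("Blue Prism", approved_values))
```

Note that `closest_match` always returns something, even for input that resembles nothing in the list, so in practice you would probably also want a maximum allowed distance.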
25-07-17 03:37 PM
I've been able to create a VBO for it, thanks 😉 I've used the Levenshtein distance as well as the Jaro-Winkler similarity; I believe both will prove useful for my OCR needs.
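The VBO itself was not posted in this thread, so the following is only a sketch of the textbook Jaro-Winkler similarity in Python, not the poster's implementation. The function names and the default `prefix_scale` of 0.1 (with a prefix capped at 4 characters, the commonly quoted values) are assumptions for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 (nothing in common) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        low = max(0, i - window)
        high = min(len2, i + window + 1)
        for j in range(low, high):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Unlike Levenshtein, which counts edits, this gives a score between 0 and 1 and favours strings that start the same way, which may or may not suit a given OCR error pattern.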
25-07-17 11:01 PM
Very good. Just bear in mind that OCR and fuzzy matching are both essentially guesses, and either can be wrong.
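One way to act on that is to accept a correction only when the match is strong and route everything else to manual review. A rough Python sketch follows, using the standard library's `difflib.get_close_matches`. Note that difflib scores similarity with its own ratio, not Levenshtein or Jaro-Winkler. The helper name `correct_or_flag` and the 0.85 cutoff are assumptions and would need tuning against real data.

```python
import difflib


def correct_or_flag(value: str, approved: list[str], cutoff: float = 0.85):
    """Return the best approved value if it scores at least `cutoff`
    (0.0 to 1.0), otherwise None so the item can be reviewed by a person.
    Hypothetical helper; comparison is case-insensitive by assumption."""
    by_lowercase = {name.lower(): name for name in approved}
    hits = difflib.get_close_matches(
        value.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[hits[0]] if hits else None


approved_values = ["Solihull Hospital", "Blue Prism Limited"]
for raw in ["Sollihull hospital", "Blue Prism", "Acme Corp"]:
    result = correct_or_flag(raw, approved_values)
    print(raw, "->", result if result is not None else "needs manual review")
```

With this cutoff the misspelt hospital name is corrected, while "Blue Prism" is not close enough to "Blue Prism Limited" to be changed automatically. Whether that is the right call depends on how costly a wrong correction is in your process.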
13-09-17 09:24 PM
Hello Sébastien,
I'm currently facing the same challenge as you. Could you please share the VBO you created?
Thanks a lot!
07-03-19 06:12 AM
Late to the party, but Levenshtein distance functions can be useful here for finding matches where the input string is close to the target string but not identical to it.
07-03-19 06:13 AM
I thought it was 2018... very late to the party.
