Library
Module
Module type
Parameter
Class
Class type
The library can be used to search for a keyword/pattern in a body of text while allowing for spelling errors. We use Levenshtein distances as the notion of spelling errors, and the functions in the library take some error limit k
and searches for substrings in the text with Levenshtein distance less than k
from the pattern.
The entry point of this library is the module: Wu_Manber
.
Two variants of the algorithm are provided.
The right-leaning variant is guaranteed to find a match if the original algorithm finds a match, and the error count reported by the variant is guaranteed to be no worse than the original. But the variant is a little harder to use since extra work is needed to check for matches at the end of the text.
We currently only provide the functions in Wu_Manber.StringSearch
as a high-level interface for using the library.
The search function in this module stop after the first match, and returns the error count together with the number of characters of the text read by the algorithm.
# #require "wu-manber";;
# open Wu_Manber;;
# StringSearch.(search ~k:2 ~pattern:"abcd" ~text:"abcd" |> report);;
- : string = "Pattern matched with 2 errors at character 2 of text"
# StringSearch.(search ~k:2 ~pattern:"abcd" ~text:"abd" |> report);;
- : string = "Pattern matched with 2 errors at character 2 of text"
# StringSearch.(search_right_leaning ~k:2 ~pattern:"abcd" ~text:"abcd" |> report);;
- : string = "Pattern matched with 0 errors at character 4 of text"
# StringSearch.(search_right_leaning ~k:2 ~pattern:"abcd" ~text:"abd" |> report);;
- : string = "Pattern matched with 1 errors at character 3 of text"
We currently only provide the functors in Wu_Manber.FirstMatch
as a mid-level interface.
Examples of how this interface is used can be found in the code for Wu_Manber.StringSearch
.
An example of how to use the low level interfaces of the library can be found in the module: Wu_Manber.FirstMatch
.