A fuzzy search is a technique that matches string approximately and returns a list of results even though spellings may not match exactly. A fuzzy matching algorithm can help ignore typo errors or when the exact word is unknown. This technique is commonly used in search engines like Bing and Google.
There are many algorithms available to implement a fuzzy search. In this blog, we will discuss one of them, Levenshtein distance.
What is Levenshtein distance?
Levenshtein distance is the favored approach for finding the edit distance, which is the number of character changes needed to convert the search word to the exact phrase. The following single-character changes are used to compute edit distance.
- Insertion: Syncfuson → Syncfusion
- Deletion: Syncfussion → Syncfusion
- Substitution: Syncfuseon → Syncfusion
In all these cases, the distance is 1. Let us see another example in which the distance is more than 1.
Let us take Syncfussen as the search word. To convert this to Syncfusion, we must make three changes.
- Deletion: Syncfussen -> Syncfusen
- Insertion: Syncfusen-> Syncfusien
- Substitution: Syncfusien-> Syncfusion
The distance matrix
The previously discussed examples are straightforward. In a real-world scenario, the search and the exact words may be pretty complex, and we must write generic logic to find the distance between them. There are many references online regarding this. In this blog, we will look at it briefly.
Let’s take the search and actual words as DESTP and DESKTOP, respectively. The initial table will look like the following one. We will have an additional index from their original length in both rows and columns. You will see why when you understand how this table works. The populated cells are just indices. Also, ignore the empty cells in the top left. We will not need them.
|
| D | E | S | K | T | O | P |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
D | 1 |
|
|
|
|
|
|
|
E | 2 |
|
|
|
|
|
|
|
S | 3 |
|
|
|
|
|
|
|
T | 4 |
|
|
|
|
|
|
|
P | 5 |
|
|
|
|
|
|
|
We will start filling the 2 * 2 matrix.
|
| D |
| 0 | 1 |
D | 1 |
|
The value of the empty cell will be calculated based on the following scenarios:
- Minimum of three other cells if the characters match.
- Minimum of three other cells + 1 if the characters do not match.
Here, the characters match. Hence the empty cell value for the D vs. D is Min(0, 1, 1), i.e., 0.
|
| D |
| 0 | 1 |
D | 1 | 0 |
The table will look like the following one.
|
| D | E | S | K | T | O | P |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
D | 1 | 0 |
|
|
|
|
|
|
E | 2 |
|
|
|
|
|
|
|
S | 3 |
|
|
|
|
|
|
|
T | 4 |
|
|
|
|
|
|
|
P | 5 |
|
|
|
|
|
|
|
We will move to the following 2 * 2 matrix, the 2 * 2 matrix for D vs. E.
1 | 2 |
0 |
|
Here, the characters do not match. Hence, the empty cell value for the D vs. E is Min(1, 2, 0) + 1, i.e., 1.
The table will look like the following one.
|
| D | E | S | K | T | O | P |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
D | 1 | 0 | 1 |
|
|
|
|
|
E | 2 |
|
|
|
|
|
|
|
S | 3 |
|
|
|
|
|
|
|
T | 4 |
|
|
|
|
|
|
|
P | 5 |
|
|
|
|
|
|
|
If you follow this for the whole table, the final table will look like the following one.
|
| D | E | S | K | T | O | P |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
D | 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
E | 2 | 1 | 0 | 1 | 2 | 3 | 4 | 5 |
S | 3 | 2 | 1 | 0 | 1 | 2 | 3 | 4 |
T | 4 | 3 | 2 | 1 | 1 | 1 | 2 | 3 |
P | 5 | 4 | 3 | 2 | 2 | 2 | 2 | 2 |
The cell’s value at the bottom right is the distance between DESTP and DESKTOP. The value is 2. To convert DESTP to DESKTOP, we must make two changes. This approach can be used for calculating the distance between any two words.
Let’s convert this into C# logic.
Implementing distance matrix in C#
You can find the comment for each line. This is the logic we just discussed.
public int LevenshteinDistance(string searchWord, string actualWord) { var searchWordLength = searchWord.Length; var actualWordLength = actualWord.Length; //Forming a table that has a length one higher in both rows and columns. var matrix = new int[searchWordLength + 1, actualWordLength + 1]; if (searchWordLength == 0) return actualWordLength; if (actualWordLength == 0) return searchWordLength; //Initializing with indices. for (var i = 0; i <= searchWordLength; matrix[i, 0] = i++) { } for (var j = 0; j <= actualWordLength; matrix[0, j] = j++) { } for (var i = 1; i <= searchWordLength; i++) { for (var j = 1; j <= actualWordLength; j++) { //If the characters do not match, we will add 1. var cost = (actualWord[j - 1] == searchWord[i - 1]) ? 0 : 1; //Minimum of the 3 cells + above cost. matrix[i, j] = Math.Min( Math.Min(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1), matrix[i - 1, j - 1] + cost); } } // Return the value at the right bottom. return matrix[searchWordLength, actualWordLength]; }
Integration with Syncfusion WinUI AutoComplete
Let’s see how to integrate this method with the Syncfusion WinUI AutoComplete control. We are not going to see in detail how to do custom filtering in AutoComplete. There are already user guides and a blog for this.
Following is the completed code snippet. We have included the previously mentioned LevenshteinDistance method in this class. Other methodologies and logic is self-explanatory. We have also added comments in the required lines for better understanding.
XAML:
<editors:SfAutoComplete Width="200" HorizontalAlignment="Center" VerticalAlignment="Center" PlaceholderText="Select the social media" ItemsSource="{Binding SocialMedias}"> <editors:SfAutoComplete.FilterBehavior> <local:CustomFiltering/> </editors:SfAutoComplete.FilterBehavior> </editors:SfAutoComplete>
C#:
namespace FuzzySearching { public class CustomFiltering : IAutoCompleteFilterBehavior { public Task<object> GetMatchingItemsAsync(SfAutoComplete source, AutoCompleteFilterInfo filterInfo) { //To include the results that match the search word. List<string> builtInResult = (source.ItemsSource as List<string>).Where(stringToCheck => stringToCheck.ToLower().Contains(filterInfo.Text.ToLower())).ToList(); //Results found through the new fuzzy search logic. To allow words with more differences, you can decrease the value. List<string> fuzzyResult = FuzzySearch(filterInfo.Text, source.ItemsSource as List<string>, 0.4); //Combining both the results and returning it. IEnumerable<string> combinedResult = builtInResult.Union(fuzzyResult); return Task.FromResult((object)combinedResult); } public List<string> FuzzySearch(string searchWord, List<string> actualWords, double requiredMatchingFactor) { List<string> foundWords = new List<string>(); foreach (string s in actualWords) { // Calculate the Levenshtein-distance. int levenshteinDistance = LevenshteinDistance(s, searchWord); // Length of the longer string. int length = Math.Max(searchWord.Length, s.Length); // Calculate the score and convert to the matching factor. double matchingFactor = 1.0 - (double)levenshteinDistance / length; //If the matching factor is higher than the required matching factor, we will include that word. //The lower the required matching factor, the more difference between the words will be allowed. if (matchingFactor > requiredMatchingFactor) foundWords.Add(s); } return foundWords; } public int LevenshteinDistance(string searchWord, string actualWord) { var searchWordLength = searchWord.Length; var actualWordLength = actualWord.Length; //Forming a table that has a length one higher in both rows and columns. var matrix = new int[searchWordLength + 1, actualWordLength + 1]; if (searchWordLength == 0) return actualWordLength; if (actualWordLength == 0) return searchWordLength; //Initializing with indices. for (var i = 0; i <= searchWordLength; matrix[i, 0] = i++) { } for (var j = 0; j <= actualWordLength; matrix[0, j] = j++) { } for (var i = 1; i <= searchWordLength; i++) { for (var j = 1; j <= actualWordLength; j++) { //If the characters do not match, we will add 1. var cost = (actualWord[j - 1] == searchWord[i - 1]) ? 0 : 1; //Minimum of the 3 cells + above cost. matrix[i, j] = Math.Min( Math.Min(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1), matrix[i - 1, j - 1] + cost); } } // Return the value at the bottom right. return matrix[searchWordLength, actualWordLength]; } } public sealed partial class MainWindow : Window { public MainWindow() { this.InitializeComponent(); } } public class SocialMediaViewModel { public List<string> SocialMedias { get; set; } public SocialMediaViewModel() { this.SocialMedias = new List<string>(); this.SocialMedias.Add("Facebook"); this.SocialMedias.Add("Twitter"); this.SocialMedias.Add("Google Plus"); this.SocialMedias.Add("Instagram"); this.SocialMedias.Add("LinkedIn"); this.SocialMedias.Add("Skype"); this.SocialMedias.Add("Telegram"); this.SocialMedias.Add("Televzr"); this.SocialMedias.Add("Tik Tok"); this.SocialMedias.Add("Tout"); this.SocialMedias.Add("Tumblr"); this.SocialMedias.Add("Twitter"); this.SocialMedias.Add("Vimeo"); this.SocialMedias.Add("WhatsApp"); this.SocialMedias.Add("YouTube"); } } }
With fuzzy search
Without fuzzy search
We would not have the search results if we did not use our fuzzy search algorithm.
GitHub reference
Also, you can find the complete demo for Implementing Fuzzy Search in WinUI AutoComplete on GitHub.
Conclusion
In this blog, we have seen how to calculate Levenshtein distance and use it for fuzzy searching. We have also seen how to integrate this in the Syncfusion WinUI AutoComplete control.
If you have any questions about this blog, please let us know in the comments. If you have any issues or require new features, contact us through our support forum, support portal, or feedback portal. We are always happy to assist you!