“Algorithms 201” is a category I’m trying out for notes on computer science topics that go a bit beyond the standard data structures and algorithms course.
Introduction
This note is an overview of Levenshtein’s edit distance, a standard technique for measuring the difference between two strings by figuring out how many changes are required to turn one into the other. It’s a classic example of dynamic programming and a starting point for more advanced string-processing algorithms, like the Needleman-Wunsch method for aligning genetic sequences.
The method is named for Soviet mathematician and information theorist Vladimir Levenshtein, who developed it in the 1960s. If you fire up the Wikipedia article for the algorithm, you’re immediately hit with its formal recursive definition, which is not very intuitive at first glance.
This note will explain what that function means, how it’s derived, and how you could actually calculate it efficiently. Along the way, we’ll get more practice with the principles of dynamic programming.
We did an overview of dynamic programming in a previous note and looked at two canonical problems: 0-1 knapsack and the rod-cutting problem. That note worked through derivations and example implementations of both problems. There’s a short review below, but check it out if you want a more detailed intro to the concepts involved.
This note will cover:
The core idea of many DP and recursive string algorithms: carve off the first character, do something with it, then recursively process the rest of the string
Comparing strings using edit distance
Derivation of the recursive cases for Levenshtein’s algorithm
The bottom-up dynamic programming implementation, sometimes called the Wagner-Fischer algorithm, though it was independently discovered multiple times
Dynamic programming
The basic idea of dynamic programming is to solve a large-scale problem (often, but not always, an optimization problem) by partitioning it into smaller subproblems and finding the optimal solutions to those. That’s similar to other divide-and-conquer techniques and, like those, dynamic programming methods are formulated recursively: the subproblems generate their own subproblems, and so forth until you reach a base case.
DP adds the idea of solving the problem by working bottom-up. Rather than doing a traditional divide-and-conquer, where we start at the top-level problem and work our way down through the tree of recursive calls, most DP methods start at the base cases and build up to the top-level solution. This is implemented with a table that contains the solution to every subproblem, so that we never have to solve any particular problem more than once.
Dynamic programming is hard because it’s not a single method — it’s more like a meta-strategy for designing algorithms. You still need to come up with the right recursive relationship, base cases, and bottom-up implementation for your application. In the last post, I outlined a good starting strategy for designing dynamic programming solutions: consider just one decision, reason about its cases, and then optimize what’s left.
Warm-up: recursive string reversal
Many recursive string algorithms have the basic form of “do something with the first item, then recursively process the rest of the string”. Consider the classic problem of reversing a string, which you can do by carving off the first character, reversing the tail of the string, then appending the first character to the end. Here’s a pseudocode implementation:
/**
 * Recursive string reversal
 */
reverse(string s) {
    // Base case: empty string
    if length(s) == 0 {
        return ""
    }

    // Remove the first character, reverse the rest
    first = s[1]
    rest = s[2:end]
    reversed = reverse(rest) + first

    return reversed
}
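If you want to run this, here’s a direct Python transcription of the pseudocode. Python strings are 0-indexed, so s[0] is the head and s[1:] is the tail.

```python
def reverse(s):
    # Base case: empty string
    if len(s) == 0:
        return ""

    # Remove the first character, reverse the rest,
    # then append the first character to the end
    first = s[0]
    rest = s[1:]
    return reverse(rest) + first

print(reverse("hello"))  # olleh
```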
The edit distance algorithm will take this idea — carve off an item and then recursively process the rest — and apply it to comparing strings.
String distance
Testing strings for equality is straightforward: two strings are equal if they have the same length and identical characters at every position. Consider, though, the problem of determining the distance between two strings. What does that even mean?
Think of some ways that you could define a distance measurement between two text strings.
One option might be to look at distance using the underlying character encodings. Recall that every text character is mapped to an underlying sequence of bits, which can be thought of as an integer character code. We could try something like this:
/**
 * Hypothetical distance using character codes
 *
 * (thought experiment, don't really do this)
 */
code_distance(string s, string t) {
    // Strings have different lengths
    if length(s) != length(t) {
        // ???
    }

    distance = 0
    for i = 1 to length(s) {
        // Get underlying character codes
        s_code = char_code(s[i])
        t_code = char_code(t[i])

        distance += abs(s_code - t_code)
    }

    return distance
}
Some thought will convince you that this is a bad idea. First, it’s not clear what to do if the strings have different lengths. Even if we solved that problem, the more fundamental issue is that this calculation doesn’t correspond to our natural sense of “closeness” for strings. For example, “apple” and “bqqmf” have a distance of 5 in this measurement because each position differs by one character code, but “apple” and “spple” have a distance of 18, even though those words are clearly more related.
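Here’s how the thought experiment plays out in runnable form. This is a quick Python sketch of the hypothetical code_distance function (again, not something you’d really use):

```python
def code_distance(s, t):
    # Sum of absolute differences between character codes,
    # position by position (assumes equal lengths)
    if len(s) != len(t):
        raise ValueError("undefined for strings of different lengths")
    distance = 0
    for s_char, t_char in zip(s, t):
        # ord() gives the underlying character code
        distance += abs(ord(s_char) - ord(t_char))
    return distance

print(code_distance("apple", "bqqmf"))  # 5: every position is off by one code
print(code_distance("apple", "spple"))  # 18: 's' and 'a' are far apart
```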
Levenshtein’s distance metric considers not the difference between characters, but the number of edits needed to turn one string into another. In this framework, “apple” and “spple” require only one edit, but “apple” and “bqqmf” need five. This corresponds to our natural notion of similarity.
Let’s suppose we have two strings called a and b. Levenshtein’s distance will calculate the minimum number of edits needed to turn a into b, where an “edit” can be one of three things:
Transforming a character in a to match a character in b
Deleting a character from a
Inserting a character into a
The method only needs to consider changes made to a because edits are symmetric — deleting a character from a is equivalent to inserting one into b and vice-versa — so we’d get the same result whether we considered editing a, editing b, or doing some mix of edits between the two.
For example, suppose we want to turn “stitch” into “kitchen”. This can be done in four edits:
stitch --> ktitch (transform s to k)
ktitch --> kitch (delete t)
kitch --> kitche (insert e)
kitche --> kitchen (insert n)
You can verify that going the other way, from “kitchen” to “stitch”, would also require four edits with the insertions and deletions reversed.
This isn’t the only way to do the transformation. Find another way that takes four edits.
Minimizing edits recursively
Now here’s the tricky part. We’re going to reason about the recursive relationship for the number of edits needed to turn a into b.
Remember the basic strategy for recursive string algorithms, which aligns with our standard dynamic programming strategy: carve off the head, do something with it, then recursively process the tail of the string. The method starts by comparing the first characters of the two strings, makes an edit if necessary, then continues to process the rest of the strings. There are three basic cases to consider:
If the first pair of characters match
If the first pair are different, which then requires reasoning about how to modify the string using transformation, insertion, or deletion
If one string is empty and the other is not
Match
This is the easiest case. Suppose we have
a = "kitten"
b = "kitchen"
The first characters are the same, so no edit is required at the first position. The edit distance between a and b is the distance between their tails:
// First characters match - take distance between tails
distance("kitten", "kitchen") = distance("itten", "itchen")
In more general pseudocode:
// First characters match - take distance between tails
if a[1] == b[1] {
    return distance(a[2:end], b[2:end])
}
Difference
What if the first characters are different? Then we need to perform an edit. There are three edit options: transformation, insertion, and deletion.
Transformation
Suppose that the input strings are
a = "lemon"
b = "melon"
Changing a[1] to match b[1] costs one edit. The overall edit distance is therefore 1 plus the distance between the remaining tails:
transform_dist = 1 + distance("emon", "elon")
In pseudocode form:
// First characters don't match - edit required
if a[1] != b[1] {
    // Case 1: transform first character
    transform_dist = 1 + distance(a[2:end], b[2:end])
}
Deletion
The second option is to delete the first character of a. For example, suppose the strings are
a = "slime"
b = "lemon"
Deleting a[1] is one edit and leaves b unchanged. We then need to calculate the number of edits needed to bring the remaining tail of a into agreement with b.

delete_dist = 1 + distance("lime", "lemon")
Observe that the first argument is the tail a[2:end] and the second argument is b. Here’s the pseudocode with a second case added.
// First characters don't match - edit required
if a[1] != b[1] {
    // Case 1: transform first character
    transform_dist = 1 + distance(a[2:end], b[2:end])

    // Case 2: delete
    delete_dist = 1 + distance(a[2:end], b)
}
Insertion
The final option is to insert the matching character at the front of a. For example,
a = "laser"
b = "blazer"
This is the trickiest case to understand. Suppose that you add b[1] = “b” to the front of string a. The result would be “blaser”. The first characters now match and the recursive case checks the tails of both strings:
insert_dist = 1 + distance("laser", "lazer")
Observe that the first argument is the original value of a and the second is the tail of b. This is the mirror of the deletion case.
insert_dist = 1 + distance(a, b[2:end])
We don’t know which of the three edit options will lead to the smallest overall number of edits, so the method has to evaluate all three cases and return the minimum.
// First characters don't match - edit required
if a[1] != b[1] {
    // Case 1: transform first character
    transform_dist = 1 + distance(a[2:end], b[2:end])

    // Case 2: delete
    delete_dist = 1 + distance(a[2:end], b)

    // Case 3: insert
    insert_dist = 1 + distance(a, b[2:end])

    return min(transform_dist,
               delete_dist,
               insert_dist)
}
Base case: empty string
The base case occurs when either a or b is empty.
If a is empty, edit it by inserting all of b’s remaining characters
If b is empty, delete all of a’s remaining characters
In either case, the distance is the length of the remaining non-empty string, which we can write as the maximum of the two lengths (0 if both strings are empty).
// Base case: at least one string is empty
if length(a) == 0 or length(b) == 0 {
    return max(length(a), length(b))
}
Complete method
Here’s the complete recursive pseudocode method.
/**
 * Levenshtein's edit distance (recursive version)
 */
distance(string a, string b) {
    // Base case: at least one string is empty
    if length(a) == 0 or length(b) == 0 {
        return max(length(a), length(b))
    }

    // First characters match
    else if a[1] == b[1] {
        return distance(a[2:end], b[2:end])
    }

    // First characters don't match
    else {
        // Case 1: transform first character
        transform_dist = 1 + distance(a[2:end], b[2:end])

        // Case 2: delete
        delete_dist = 1 + distance(a[2:end], b)

        // Case 3: insert
        insert_dist = 1 + distance(a, b[2:end])

        return min(transform_dist,
                   delete_dist,
                   insert_dist)
    }
}
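Here’s one possible Python transcription of the recursive method. Like the pseudocode, it runs in exponential time, so treat it as a reference implementation for small inputs:

```python
def distance(a, b):
    # Base case: at least one string is empty
    if len(a) == 0 or len(b) == 0:
        return max(len(a), len(b))

    # First characters match: no edit needed at this position
    if a[0] == b[0]:
        return distance(a[1:], b[1:])

    # First characters differ: try all three edits, keep the cheapest
    transform_dist = 1 + distance(a[1:], b[1:])
    delete_dist = 1 + distance(a[1:], b)
    insert_dist = 1 + distance(a, b[1:])
    return min(transform_dist, delete_dist, insert_dist)

print(distance("stitch", "kitchen"))  # 4, matching the example above
```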
Here’s the mathematical definition of the distance function given in the Wikipedia article.
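One common way to write it, where |a| denotes the length of a and tail(a) is a with its first character removed, is:

```latex
\operatorname{lev}(a, b) =
\begin{cases}
  |a| & \text{if } |b| = 0, \\
  |b| & \text{if } |a| = 0, \\
  \operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b))
      & \text{if } a[1] = b[1], \\
  1 + \min
  \begin{cases}
      \operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b)) \\
      \operatorname{lev}(\operatorname{tail}(a), b) \\
      \operatorname{lev}(a, \operatorname{tail}(b))
  \end{cases}
      & \text{otherwise.}
\end{cases}
```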
Compare it to the pseudocode and verify that they define the same cases.
Dynamic programming formulation
As we discussed in the earlier dynamic programming article, the pure recursive method is usually a bad idea, because it recalculates the same subproblems many times. The preferred dynamic programming approach uses a table that it fills in bottom-up, so that each subproblem needs to be calculated only once.
We developed the recursive version by reasoning about the heads of the strings and recursing over the tails. This is not ideal for a bottom-up solution, because it makes each solution depend on later characters of the strings; we’d like to calculate each subproblem using only earlier results.
The DP method does this by building up from empty strings to full strings. Each table entry will correspond to a pair of substrings, so that entry (i, j) keeps track of the distance between substrings a[1:i] and b[1:j].
For example, suppose we’re comparing
a = "ghost"
b = "toast"
The initial table looks like the following, with a 0 row and column added for the base cases.
              j
        |      t   o   a   s   t
        |  0   1   2   3   4   5
  -------------------------------
      0 |
    g 1 |
    h 2 |
  i o 3 |
    s 4 |
    t 5 |
Position (3, 4) would correspond to
distance("gho", "toas")
Once the table is filled, the lower-right entry at (5, 5) will contain the distance between the full strings.
Filling in the table
The cases for the table calculations are the same as those in the recursive method, modified to consider the ending pair of string characters and calculate distances in terms of their prefixes.
The first case occurs if the ending characters match. For example, “ghost” and “toast” have the same ending letter, so we can ignore it and consider the distance between the prefix strings:
// Ending characters match
distance("ghost", "toast") = distance("ghos", "toas")
Look back and verify that this is equivalent to the recursive case. In terms of table entries, it corresponds to checking the position associated with the prefixes a[1:i - 1] and b[1:j - 1], which is entry (i - 1, j - 1).
// Match - distance depends on the earlier substrings
if a[i] == b[j] {
    table[i][j] = table[i - 1][j - 1]
}
The base case occurs if either i = 0 or j = 0. Both cases correspond to reaching the front of a string, in which case the distance is the remaining length of the other string. For example, position (3, 0) in the table corresponds to
distance("gho", "")
which is 3. Filling in the base row and column:
              j
        |      t   o   a   s   t
        |  0   1   2   3   4   5
  -------------------------------
      0 |  0   1   2   3   4   5
    g 1 |  1
    h 2 |  2
  i o 3 |  3
    s 4 |  4
    t 5 |  5
The final cases correspond to making edits. There are three cases, which again correspond to transformation, insertion, and deletion, but modified to work backwards.
Let’s look at the example problem. Suppose that we’re evaluating position (3, 3), which corresponds to
distance("gho", "toa")
Checking a[3] and b[3] shows they are different, so the distance evaluation is:
// Three edit cases for position (3, 3)
distance("gho", "toa") = min {
    // Transform "o" to "a" then check prefixes
    1 + distance("gh", "to"),

    // Delete "o"
    1 + distance("gh", "toa"),

    // Insert "a"
    1 + distance("gho", "to")
}
The insert case is the trickiest to understand. If we insert “a” at the end of “gho” to get “ghoa”, then comparing with “toa” shows the last characters now match. We're left comparing “gho” with “to”.
Rewriting in terms of table entries:
if a[i] != b[j] {
    table[i][j] = 1 + min(table[i - 1][j - 1],
                          table[i - 1][j],
                          table[i][j - 1])
}
With these changes, each table position (i, j) only depends on earlier entries. The complete procedure is as follows.
/**
 * Bottom-up calculation of Levenshtein's edit distance
 */
initialize table

// Base case for column 0
table[i][0] = i for i = 0 to length(a)

// Base case for row 0
table[0][j] = j for j = 0 to length(b)

// Fill in the table from upper-left to lower-right
for i = 1 to length(a) {
    for j = 1 to length(b) {
        // Match
        if a[i] == b[j] {
            table[i][j] = table[i - 1][j - 1]
        }
        // Edit
        else {
            table[i][j] = 1 + min(table[i - 1][j - 1],
                                  table[i - 1][j],
                                  table[i][j - 1])
        }
    }
}

// Final answer is in the lower-right entry
return table[length(a)][length(b)]
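Here’s a possible Python translation of the bottom-up procedure. Note the index shift: the table uses 1-based positions while Python strings are 0-based, so the character comparison is between a[i - 1] and b[j - 1]:

```python
def distance(a, b):
    # table[i][j] holds the distance between prefixes a[:i] and b[:j]
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: distance to or from an empty prefix
    for i in range(m + 1):
        table[i][0] = i
    for j in range(n + 1):
        table[0][j] = j

    # Fill in the table from upper-left to lower-right
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Match: no edit at this position
                table[i][j] = table[i - 1][j - 1]
            else:
                # Edit: transform, delete, or insert
                table[i][j] = 1 + min(table[i - 1][j - 1],
                                      table[i - 1][j],
                                      table[i][j - 1])

    # Final answer is in the lower-right entry
    return table[m][n]

print(distance("ghost", "toast"))  # 3, matching the worked example below
```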
Example
Now let’s work through the “ghost”-“toast” example using the full bottom-up method. After filling in the base case row and column, we have the following:
              j
        |      t   o   a   s   t
        |  0   1   2   3   4   5
  -------------------------------
      0 |  0   1   2   3   4   5
    g 1 |  1
    h 2 |  2
  i o 3 |  3
    s 4 |  4
    t 5 |  5
Start looping over i and j. The first position to fill is (1, 1), which corresponds to calculating
distance("g", "t")
The letters are different, so we’re in the second case and we have to evaluate the three edit options:
table[1][1] = 1 + min(table[0][0],   // transform
                      table[0][1],   // delete
                      table[1][0])   // insert
Checking the table shows that the minimum option is to transform “g” into “t”. That makes sense — using insertion or deletion would then require a second edit to match the two strings.
Position (1, 2) is next, which corresponds to calculating:
distance(a[1:1], b[1:2]) --> distance("g", "to")
The method checks if a[1] = “g” matches b[2] = “o” and finds they are different, so we again need to check the three edit cases and choose the minimum.
table[1][2] = 1 + min(table[0][1],   // transform
                      table[0][2],   // delete
                      table[1][1])   // insert
Tip: look carefully at the indices! It’s easy to get confused by the different uses of i - 1 and j - 1 in each case.
Checking the table shows that the minimum result is 2 edits, which makes sense: changing “g” into “to” can be done by transforming “g” into “t”, then inserting “o”. This pattern continues to complete the first row:
              j
        |      t   o   a   s   t
        |  0   1   2   3   4   5
  -------------------------------
      0 |  0   1   2   3   4   5
    g 1 |  1   1   2   3   4   5
    h 2 |  2
  i o 3 |  3
    s 4 |  4
    t 5 |  5
Let’s look at one more example showing the match case. Consider position (3, 2):
distance(a[1:3], b[1:2]) --> distance("gho", "to")
Comparing a[3] = “o” with b[2] = “o” shows that they match, so the first case applies.
table[3][2] = table[2][1]
You can verify that position (2, 1) should have a score of 2. Again, that makes sense, because changing “gho” into “to” can be done in two edits by deleting “h” and then changing “g” into “t”.
Continue the process and complete the rest of the table. Pay attention to the letters! Remember to check if a[i] = b[j] to determine what case you’re in.
Here’s the final result:
              j
        |      t   o   a   s   t
        |  0   1   2   3   4   5
  -------------------------------
      0 |  0   1   2   3   4   5
    g 1 |  1   1   2   3   4   5
    h 2 |  2   2   2   3   4   5
  i o 3 |  3   3   2   3   4   5
    s 4 |  4   4   3   3   3   4
    t 5 |  5   4   4   4   4   3
Think about the final answer at position (5, 5). The last pair of letters (“t”) match, so the entry at (5, 5) depends on position (4, 4). The next-to-last pair (“s”) also match, so position (4, 4) depends on (3, 3), which corresponds to
distance("gho", "toa")
That can be done in three edits by changing the three characters.
What’s the complexity of the bottom-up method?
The core of the method is the pair of nested loops over the strings. The total number of iterations is the size of the table, so the overall complexity is O(length(a) · length(b)), which is quadratic when the two strings have similar lengths.
Summary
You’ll probably never need to implement edit distance yourself, but working through its implementation will build your algorithmic intuition, which makes you better at many other kinds of problems.
As you’re working through this material, keep the following ideas in mind:
Dynamic programming is all about breaking a larger problem into its subproblems; most of the challenge is in deriving the required recursive cases
A good starting point for doing that is to carve off one decision, reason about its cases, then optimize what’s left
The bottom-up strategy fills in a table with one entry per subproblem
Practice Questions
Small
Use the dynamic programming approach to fill in the table for “mental” and “metal”. The distance is one edit, but this will allow you to practice the algorithm when the strings have different lengths.
Bigger
Find the distance between “network” and “worth”.
Keyboard distance
A standard variation on the basic edit distance is to make costs non-uniform, so that “easy” edits are cheaper than “hard” ones. Think about typed text: errors caused by hitting an adjacent key are common, like “live” instead of “love”, so maybe they should be penalized less than random key substitutions.
Modify the bottom-up pseudocode method to change the distance calculations as follows:
Changing between cases (like “a” to “A”) costs only 0.5
Changing to an adjacent character costs 1
Changing to a non-adjacent character costs 2
Deleting to match an adjacent character costs only 1 (it’s common to hit other keys next to the one you meant to type)
Deleting to match non-adjacent characters costs 2
Inserts cost 3
Assume you have a function called adjacent that takes two characters and returns True if they’re considered adjacent on the keyboard and False otherwise.