Problem Description
This is a problem to find how many times a specific pattern appears in a given string.
For example, determining how many times the pattern "ana"
appears in the string "banana"
.
Input
- String
text
: The entire string to search (1 ≤ |text| ≤ 100,000) - String
pattern
: The string to find (1 ≤ |pattern| ≤ 50)
Output
Returns the total count of how many times pattern
appears in the string text
.
Example
Input
text = "banana"
pattern = "ana"
Output
2
Solution Process
To solve this problem, a string search algorithm must be used.
The simplest way is to traverse the string one character at a time and compare the pattern, but this is inefficient and has a time complexity of O(n*m) for large data.
Efficient Solution: KMP Algorithm
One of the ways to efficiently solve the string search problem is by using the KMP (Knuth-Morris-Pratt) algorithm.
This algorithm consists of the following two parts:
- Create an ‘LPS (Longest Prefix which is also Suffix)’ array based on the pattern.
- Traverse the text, comparing the pattern while utilizing the LPS array.
Generating the LPS Array
The LPS array indicates how much the prefix and suffix match within the pattern. This allows for the pattern to be reused during the search.
For example, if the pattern is "ABAB"
, the LPS array will be [0, 0, 1, 2]
.
Algorithm Implementation
Now, based on the above process, we will implement the string search algorithm in Java.
public class KMP {
public static int[] computeLPS(String pattern) {
int[] lps = new int[pattern.length()];
int length = 0;
int i = 1;
while (i < pattern.length()) {
if (pattern.charAt(i) == pattern.charAt(length)) {
length++;
lps[i] = length;
i++;
} else {
if (length != 0) {
length = lps[length - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
public static int KMPsearch(String text, String pattern) {
int[] lps = computeLPS(pattern);
int i = 0;
int j = 0;
int count = 0;
while (i < text.length()) {
if (pattern.charAt(j) == text.charAt(i)) {
i++;
j++;
if (j == pattern.length()) {
count++;
j = lps[j - 1];
}
} else {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
return count;
}
public static void main(String[] args) {
String text = "banana";
String pattern = "ana";
int result = KMPsearch(text, pattern);
System.out.println("Substring Count: " + result);
}
}
Code Explanation
The code above implements the KMP algorithm, which consists of two main functions.
The computeLPS
function generates the LPS array for the pattern, and the KMPsearch
function searches for the pattern in the text and counts the matches.
Complexity Analysis
The time complexity of the KMP algorithm is O(n + m). Here, n is the length of the text and m is the length of the pattern.
This is an efficient method because the pattern moves and is reused within the text.
Conclusion
Today, we explored the KMP algorithm to solve the string search problem.
By using this algorithm, we can efficiently tackle various string search problems.
As you encounter problems, try experimenting with different algorithms to build your skills!