Java Coding Test Course, String Search

Problem Description

This is a problem to find how many times a specific pattern appears in a given string.
For example, determining how many times the pattern "ana" appears in the string "banana".

Input

  • String text: The entire string to search (1 ≤ |text| ≤ 100,000)
  • String pattern: The string to find (1 ≤ |pattern| ≤ 50)

Output

Returns the total count of how many times pattern appears in the string text.

Example

Input

text = "banana"
pattern = "ana"

Output

2

Solution Process

To solve this problem, a string search algorithm must be used.
The simplest way is to traverse the string one character at a time and compare the pattern, but this is inefficient and has a time complexity of O(n*m) for large data.

Efficient Solution: KMP Algorithm

One of the ways to efficiently solve the string search problem is by using the KMP (Knuth-Morris-Pratt) algorithm.
This algorithm consists of the following two parts:

  1. Create an ‘LPS (Longest Prefix which is also Suffix)’ array based on the pattern.
  2. Traverse the text, comparing the pattern while utilizing the LPS array.

Generating the LPS Array

The LPS array indicates how much the prefix and suffix match within the pattern. This allows for the pattern to be reused during the search.
For example, if the pattern is "ABAB", the LPS array will be [0, 0, 1, 2].

Algorithm Implementation

Now, based on the above process, we will implement the string search algorithm in Java.

public class KMP {
    public static int[] computeLPS(String pattern) {
        int[] lps = new int[pattern.length()];
        int length = 0; 
        int i = 1;
        
        while (i < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(length)) {
                length++;
                lps[i] = length;
                i++;
            } else {
                if (length != 0) {
                    length = lps[length - 1];
                } else {
                    lps[i] = 0;
                    i++;
                }
            }
        }
        return lps;
    }

    public static int KMPsearch(String text, String pattern) {
        int[] lps = computeLPS(pattern);
        int i = 0; 
        int j = 0; 
        int count = 0;

        while (i < text.length()) {
            if (pattern.charAt(j) == text.charAt(i)) {
                i++;
                j++;

                if (j == pattern.length()) {
                    count++;
                    j = lps[j - 1];
                }
            } else {
                if (j != 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "banana";
        String pattern = "ana";
        int result = KMPsearch(text, pattern);
        System.out.println("Substring Count: " + result);
    }
}

Code Explanation

The code above implements the KMP algorithm, which consists of two main functions.
The computeLPS function generates the LPS array for the pattern, and the KMPsearch function searches for the pattern in the text and counts the matches.

Complexity Analysis

The time complexity of the KMP algorithm is O(n + m). Here, n is the length of the text and m is the length of the pattern.
This is an efficient method because the pattern moves and is reused within the text.

Conclusion

Today, we explored the KMP algorithm to solve the string search problem.
By using this algorithm, we can efficiently tackle various string search problems.
As you encounter problems, try experimenting with different algorithms to build your skills!