C++ Coding Test Course, String Search

Hello! In this article, we will explore the problem of finding strings using C++. Solving algorithmic problems helps improve programming skills and assists in achieving good results in coding tests. Particularly in recent years, string manipulation problems have frequently appeared in coding tests. Therefore, a thorough understanding and practice of this topic are necessary.

Problem Description

In this lecture, we will address the problem of finding a specific string (pattern) within a given text and counting how many times that string appears in the text. This problem serves as a fundamental example of string search algorithms and will solve the following problem.

Problem:

Given a string text and pattern, write a program to determine how many times pattern appears in text.

Input Format

  • First line: string text (1 ≤ |text| ≤ 100,000)
  • Second line: string pattern (1 ≤ |pattern| ≤ 100,000)

Output Format

  • Print the count of matching patterns.

Approach

There are several algorithms that can be used to solve the problem, but we will look at a simple brute force method here. This method checks each position in the text against the pattern to determine if they match. The performance is O(n * m), where n is the length of the text and m is the length of the pattern. This method is straightforward and intuitive, but one might also consider more efficient methods like the KMP algorithm or the Rabin-Karp algorithm.

Code Implementation

Let’s implement the code. Below is a string-finding program written in C++:


#include <iostream>
#include <string>

int countOccurrences(const std::string &text, const std::string &pattern) {
    int count = 0;
    int textLength = text.length();
    int patternLength = pattern.length();

    for (int i = 0; i <= textLength - patternLength; i++) {
        // Compare substring through partial string comparison
        if (text.substr(i, patternLength) == pattern) {
            count++;
        }
    }
    return count;
}

int main() {
    std::string text, pattern;

    std::cout << "Enter text: ";
    std::getline(std::cin, text);
    std::cout << "Enter pattern: ";
    std::getline(std::cin, pattern);

    int occurrences = countOccurrences(text, pattern);
    std::cout << "The pattern appears " << occurrences << " times." << std::endl;

    return 0;
}

Code Explanation

The above C++ code solves the string-finding problem. It performs functions based on the text and pattern provided by the user.

  • countOccurrences function: This function counts the occurrences of pattern in the given text. It extracts substrings from each position in the text and increases the count whenever there is a match with the given pattern.
  • main function: This part handles basic input and output. It takes a string input from the user and calls the countOccurrences function to print the result.

Performance Analysis

The brute-force approach described above has a worst-case time complexity of O(n * m). As the length of the text increases, performance may degrade. However, if simplicity and intuitiveness are prioritized, this method can be useful.

For an efficient solution, consider the KMP (Knuth-Morris-Pratt) algorithm. The KMP algorithm optimizes string searches by utilizing repeated parts of the pattern. This algorithm has a time complexity of O(n + m).

KMP Algorithm Introduction

The KMP algorithm is designed to solve pattern search problems and reduces repetitive checks by utilizing previously matched parts. This algorithm consists of two main steps. The first step builds an ‘LPS (Longest Prefix Suffix)’ array for the pattern, and the second step uses this LPS array to search for the pattern in the text.

Building the LPS Array

The LPS array stores the lengths of the longest prefixes and suffixes among each prefix of the pattern. For example, for the pattern “ABAB,” the LPS array becomes [0, 0, 1, 2]. This has the following implications:

  • No prefix and suffix for A at index 0, so it’s 0
  • No prefix and suffix for B at index 1, so it’s 0
  • AB is both prefix and suffix at index 2, so it’s 1
  • ABA is both prefix and suffix at index 3, so it’s 2

KMP Algorithm Implementation

Below is the C++ code that implements the KMP algorithm:


#include <iostream>
#include <string>
#include <vector>

std::vector computeLPS(const std::string &pattern) {
    int m = pattern.length();
    std::vector lps(m);
    int len = 0; 
    lps[0] = 0; 
    int i = 1;

    while (i < m) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
    return lps;
}

int KMP(const std::string &text, const std::string &pattern) {
    std::vector lps = computeLPS(pattern);
    int n = text.length();
    int m = pattern.length();
    int i = 0; 
    int j = 0; 
    int count = 0;

    while (i < n) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }
        if (j == m) {
            count++;
            j = lps[j - 1];
        } else if (i < n && pattern[j] != text[i]) {
            if (j != 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
    return count;
}

int main() {
    std::string text, pattern;

    std::cout << "Enter text: ";
    std::getline(std::cin, text);
    std::cout << "Enter pattern: ";
    std::getline(std::cin, pattern);

    int occurrences = KMP(text, pattern);
    std::cout << "The pattern appears " << occurrences << " times." << std::endl;

    return 0;
}

Conclusion

In this article, we learned how to solve the string-finding problem using C++. After introducing the brute-force method, we explored how the efficient KMP algorithm operates. Problems related to strings can further develop based on basic algorithm knowledge. Therefore, it is essential to continue practicing such problems to enhance programming skills.

Try solving various string problems and develop your own algorithms! You’ll gain confidence in coding tests. Thank you!