Python coding test course, finding the longest common subsequence

Problem Description

Given two strings, this is a problem of finding the Longest Common Subsequence (LCS) between them. A common subsequence refers to characters that appear in both strings while maintaining their order. The goal is to find the maximum length of such a subsequence.

Example

Input

String A: ABCBDAB
String B: BDCAB

Output

Longest Common Subsequence: BDAB

Approach to the Problem

To solve the LCS problem, we can use Dynamic Programming. Dynamic Programming is a technique that solves problems by storing already computed results and reusing them. To find the LCS, we store the length of the subsequence based on the comparison results of each character in the two strings.

Finding LCS using Dynamic Programming

Step 1: Initialize the DP Table

Let the lengths of the two strings be m and n, we create and initialize a 2D DP array of size (m + 1) x (n + 1). Each element of the DP array stores the length of the common subsequence found in the two strings.

Step 2: Fill the DP Table

We fill in the DP table while comparing each character of the two strings. If the characters match, the LCS length including that character is DP[i-1][j-1] + 1. If they do not match, we use DP[i][j] = max(DP[i-1][j], DP[i][j-1]) to record the length of the LCS found so far.

Step 3: Extract LCS Length and String

After filling the DP table, check DP[m][n] to get the length of the longest common subsequence, and extract the actual common subsequence using this length. The extraction process is carried out by tracing back through the table to find common characters.

Algorithm Code

    
def lcs(A, B):
    # Step 1: Initialize the DP array
    m = len(A)
    n = len(B)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Step 2: Fill the DP array
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Step 3: Get LCS length
    lcs_length = dp[m][n]

    # Extract LCS string
    lcs_str = []
    i, j = m, n
    while i > 0 and j > 0:
        if A[i - 1] == B[j - 1]:
            lcs_str.append(A[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1

    lcs_str.reverse()  # Reverse the extracted string to restore original order
    return lcs_length, ''.join(lcs_str)

# Test code
A = "ABCBDAB"
B = "BDCAB"
length, lcs_string = lcs(A, B)
print("LCS length:", length)
print("LCS string:", lcs_string)
    
    

Complexity Analysis

The time complexity of this algorithm is O(m * n). Since each character needs to be compared, the result is a product of the lengths of the two strings. The space complexity is also O(m * n), determined by the size of the created DP table. However, there are ways to reduce the size of the DP table. For example, since only the previous row’s data is needed, it can be implemented using two 1D arrays.

Optimized Code Example

    
def optimized_lcs(A, B):
    m = len(A)
    n = len(B)
    dp = [0] * (n + 1)

    for i in range(1, m + 1):
        current = 0  # Value at the current position
        for j in range(1, n + 1):
            temp = dp[j]
            if A[i - 1] == B[j - 1]:
                dp[j] = current + 1
            else:
                dp[j] = max(dp[j], dp[j - 1])
            current = temp  # Store the value from the previous row

    return dp[n]

# Optimized test
length = optimized_lcs(A, B)
print("LCS length:", length)
    
    

Conclusion

In this lecture, we dealt with the problem of finding the Longest Common Subsequence (LCS). This problem is a great example of how to easily utilize string processing and dynamic programming concepts. The LCS problem is widely used in various fields, especially in gene analysis and finding identical sequences, so understanding and implementing the algorithm is very useful for coding tests and practical applications.

References