Python Coding Test Course, DNA Password

Many people are solving algorithm problems to prepare for coding tests. In this course, we will explore the DNA password problem and explain step by step how to solve it.

Problem Description

The DNA password consists of a specific pattern of combinations. The problem is to find how many substrings from the given DNA string can become passwords. Here are the details of the problem.

Problem Definition

From the given DNA string, count the number of distinct substrings that contain only the characters “””ACGT””” and are of length K or more.

Input

  • The first line contains the DNA string. (1 ≤ string length ≤ 1000)
  • The second line contains the value of K. (1 ≤ K ≤ 100)

Output

Output the total number of distinct substrings.

Problem Solving Process

To solve this problem efficiently, we will follow these steps:

Step 1: Generate Substrings

Generate all substrings of length K or more from the given DNA string. To do this, we will iterate over the starting indexes of the string and extract substrings of length K or more from each starting index.

Step 2: Store Unique Substrings

Store the generated substrings in a set. Since a set does not allow duplicates, only distinct substrings will be stored. We can utilize the set data type in this process.

Step 3: Output Result

Finally, we count the number of substrings stored in the set and output the result. Now let’s convert these processes into code.

Python Code


def count_unique_dna_substrings(dna: str, k: int) -> int:
    unique_substrings = set()  # Set to store substrings
    n = len(dna)

    # Extract substrings for all starting positions in the string
    for start in range(n):
        for end in range(start + k, n + 1):  # Only consider lengths >= K
            substring = dna[start:end]
            unique_substrings.add(substring)

    return len(unique_substrings)  # Return the count of unique substrings

# Input
if __name__ == "__main__":
    dna_string = input("Please enter the DNA string: ")
    k_value = int(input("Please enter the length K: "))
    
    # Calculate and print the number of unique substrings
    result = count_unique_dna_substrings(dna_string, k_value)
    print(f"The number of distinct substrings is: {result}")
    

Code Explanation

The function count_unique_dna_substrings used in the above code takes two parameters: the dna string and the k value. This function first creates an empty set to prepare to store substrings. It then extracts substrings through a loop that will be explained in the next step.

Loop Explanation

The first loop iterates over the starting index (start) of the DNA string. The second loop extracts substrings of length K or more starting from the start index, thus determining the end index. Each substring is added to the set, only storing distinct cases.

Example for Understanding

For example, if the input DNA string is ACGTACGT and the value of K is 2, the possible substrings that can be generated are as follows:

  • AC, ACG, ACGT, ACGTA, ACGTAC, ACGTACGT
  • CG, CGT, CGTA, CGTAC, CGTACG
  • GT, GTA, GTAC, GTACG
  • TA, TAC, TACG
  • AC, ACG, ACGT

In total, there will be 14 distinct substrings.

Performance Considerations

The above code checks all substring lengths for every starting index, resulting in a worst-case time complexity of O(n^3). Performance issues may occur when the string length reaches 1000. To enhance performance in extreme cases, we can consider applying the Sliding Window technique.

Application of Sliding Window Technique

Using the Sliding Window technique allows us to easily calculate the number of substrings with length >= K in a single pass. It means storing information about the length of the string during one pass to facilitate exploration. This technique can reduce time complexity to O(n^2) or O(n).

Conclusion

In this course, we learned how to handle substrings of a string and resolve duplication issues using sets through the DNA password problem. Such problems frequently appear in coding tests, so it is recommended to be well-versed in basic string handling methods. In the next course, we will cover another useful algorithm problem.

Stay tuned for the next course! Thank you for your interest and support.