1. Problem Definition
Recently, a biologist found themselves in a situation where they needed to create a specific password using DNA sequences. This DNA sequence consists of four types of nucleotides: A, C, G, and T. The password is generated from a combination of A, C, G, and T selected from the given DNA sequence.
Problem Description
Write a program to find the substring of length K among all substrings of the given string S where the frequency of each nucleotide is minimized, and determine the number of valid passwords among them. If all frequencies are the same, select the smallest lexicographic string among those with the minimum frequency. The length of the substring must be less than or equal to the given string.
Input
- The first line contains the length N of the DNA string S (1 ≤ N ≤ 100,000).
- The second line contains the integer K (1 ≤ K ≤ N).
- The third line contains the string S.
Output
Print the number of valid passwords and the smallest lexicographic password among them.
2. Algorithm Analysis
To solve this problem, we will use the Sliding Window technique. This method allows us to move through substrings of a specific length K while counting the frequency of each character (A, C, G, T) to solve the problem.
Procedure
- Select the first substring of length K and count the frequency of characters.
- Then, adjust the frequency as the sliding window moves, adding the new character and removing the old one.
- Check the frequency of nucleotides in each K-length substring and find the smallest frequency.
- After inspecting all substrings, print the count of valid passwords and the smallest lexicographic password.
3. C++ Coding Example
Below is a C++ code example to solve this problem:
#include <iostream> #include <string> #include <unordered_map> #include <vector> #include <algorithm> using namespace std; int main() { int N, K; string S; cin >> N >> K; cin >> S; unordered_mapfreq; for (int i = 0; i < K; i++) { freq[S[i]]++; } // Initial result setup int min_freq = INT_MAX; vector passwords; // Sliding window to check all K-length substrings for (int i = 0; i <= N - K; i++) { if (i > 0) { freq[S[i - 1]]--; freq[S[i + K - 1]]++; } // Finding the smallest frequency int cur_min = INT_MAX; for (const auto &entry : freq) { cur_min = min(cur_min, entry.second); } // Adding valid password if (cur_min <= (K / 4)) { // Frequency of all nucleotides is less than or equal to K/4 passwords.push_back(S.substr(i, K)); } } // Finding the smallest lexicographic password sort(passwords.begin(), passwords.end()); string smallest_password = (passwords.empty() ? "" : passwords[0]); // Output result cout << passwords.size() << " " << smallest_password << endl; return 0; }
4. Code Explanation
The above code works as follows:
- First, the DNA string and its length, along with the length K of the substring, are input.
- An
unordered_map
is used to count the frequency of the first K characters. - The Sliding Window technique is used to move through each K-length substring, modifying the frequency.
- The minimum frequency is calculated for each substring, adding to the valid password list only when conditions are met.
- Finally, the valid password list is sorted to find the smallest lexicographic password.
5. Test Cases
Now, let’s look at some test cases to solve this problem:
Test Case 1
Input: 8 4 ACGTTGCA Output: 1 ACGT
Test Case 2
Input: 12 5 ACGTTACGATC Output: 3 ACCTA ACTAC TCGAT
Test Case 3
Input: 6 3 ACGTAC Output: 2 ACG ACT
6. Conclusion
In this tutorial, we learned an algorithm utilizing the sliding window technique through the DNA password problem. This technique can be very useful in solving string-related problems and can be applied to various modified problems. It serves as a solid foundation that can be beneficial in coding tests.