The most important property of a program is whether it accomplishes the intention of its user.
Count Distinct Subsequences
Welcome back, dear reader. So, how is it going? In this article, we will discuss the problem called COUNT DISTINCT SUBSEQUENCES. So, what does this question say? We will be given an input string and we have to count the distinct subsequences of that string and print the count. For instance, if the input string is "abb" then, the subsequences of this string are as follows:
So, as shown in the image above, we have formed the subsequences of the string "abb" by first generating the subsequences of 'a' then the subsequences till "ab" and then the complete string "abb". We found out that two subsequences were repeating, so we removed them (but we have to consider them once hence since they were repeating twice, we removed only one instance of them and kept the other) and got the answer as 6 distinct subsequences. But, there is a constraint given in the question that we have to count the distinct and non-empty subsequences of the string. Since here we have counted the empty subsequence also, let us subtract 1 from our answer and the answer will be 5.
So, dear reader, we hope that you have understood the problem completely. We recommend you try to solve this problem on your own first and then move to the solution to gain in-depth knowledge. If you have any doubts regarding the question, refer to the COUNT DISTINCT SUBSEQUENCES VIDEO to understand the question completely.
APPROACH :
Time Complexity: O(n) where n is the length of the string
Space Complexity: O(n) where n is the length of the string
Algorithm:
Create a dp array of size one greater than the length of the string and assign the blank string to the 0th character and all the characters starting from the 0th character of the string to the further indices of the array.
For every dp[i], if the character is encountered for the first time, put dp[i]=dp[i-1] x 2 where dp[0]=1. Also, put the character in the hashmap with its index of occurence in the dp array.
If the character had already been encountered, go to the last occurrence of the element minus 1 position in the array and subtract this value from dp[i-1]x 2. This will be the number of distinct subsequences of the string so far.
Explanation:
Let us take an input string "abcbac". We will try to find out the distinct subsequences of this string. We will first generate the subsequences by starting from the first character i.e. 'a' and then including the next characters one by one into the string and then generating the subsequences.
So, first of all, before any character in a string, we have a blank string. Any string can be expressed as a sum of its characters like the input string given can be expressed as:
Now if we add the blank string to this too, there will not be any effect to this string.
We have built this up so that we can now discuss our procedure more easily and smoothly. So, let us start with our discussion where we will generate the subsequences of this string starting from the first character and reaching the end of the string.
Subsequences of Blank String
Initially, before even the character 'a', we have assumed that we have a blank string. So, the blank string will have one subsequence and that is the blank string itself.
Subsequences of String "a"
We have now added the blank string to the first character of the input string and we get "a". So, the character 'a' interacts with the blank string in two ways. One, it will not join the blank string, and the second, it will join the blank string. So, 2 subsequences will be generated which is twice the subsequences generated when we had only a blank string.
Subsequences of String "ab"
Now, we will add the next character to the string "a" and the string will become "ab". So, we have to generate the subsequences of string "ab". This will happen when the character 'b' will interact with the string "a" in two ways i.e. one, it will not be combined with the subsequences of string "a" and the other that it will combine with the subsequences of the string "a". We will be generating 4 subsequences by this i.e. double the previous number of subsequences.
Subsequences of the String "abc"
Now, we have added another character to the string and the string becomes "abc". Again, this string will have 8 subsequences i.e. double the number of subsequences than the previous string. These will be formed when the character 'c' will interact with the subsequences of the string "ab" in two ways i.e. it will combine with the subsequences of the strings "ab" and the other that it will not combine with them.
Subsequences of the String "abcb"
This is for the first time that we have encountered a character that had already occurred previously. So, here we will see some repeating subsequences. Let us see what are the repeating subsequences and try to analyze something from them.
So, the character 'b' will interact with the subsequences of the string "abc" in two ways i.e. it will either not combine with the subsequences of the string "abc" or it will combine with them. This is shown in the image below:
So, as you can see from the image above, we have 16 subsequences which is double the number of previous subsequences but, the unique subsequences are only 14. This is because two subsequences "b" and "ab" are repeating. Why? Think about this!!!
Well, the reason for the repetition of these two subsequences is not something that you can't find out. See, let us try to figure out how these two subsequences are being generated.
The subsequence "b" is generated when the character 'b' combines with the blank string and the subsequence "ab" is generated when the character 'b' combines with the string "a". This happened initially also when the character 'b' arrived at the first time in the string.
So, we now have an idea of which subsequence is going to repeat and also the reason behind this. Now, let us try to find out the distinct subsequences of the string which include the next character also, and then we will come to some conclusion.
Subsequences of String "abcba"
Now, let us generate the subsequences of this string. This can be done in two ways. One is when the character 'a' does not combine with the subsequences of the string and the other is when it does combine with them. So, we can generate the subsequences as shown below:
So, in these subsequences, there is only one subsequence i.e. "a" which is repeating. Can you think why?
See, the subsequence "a" is generated when the character 'a' combines with the blank string. This had happened earlier also when we selected the first character of the input string to generate subsequences.
So, this time we will subtract 1 from the total subsequences (which are 14 x 2=28) to get the count of distinct subsequences.
Conclusion:
We have analyzed the subsequence generation in quite a depth now. So, let us think and arrive at some conclusion for the count of the distinct subsequences.
When we encounter a character that was not selected previously, the number of subsequences gets doubled of the previous number of subsequences.
When we encounter a character that had already occurred previously, we get the number of distinct subsequences by subtracting the number of subsequences that were present at the time when the character was encountered previously from the double of the previous distinct subsequence count.
So, the first point is self-explanatory and we do not have much to discuss. The second point is something that we need to discuss.
Have a look at (img-8). The subsequences "b" and "ab" repeat because the character 'b' interacts with the subsequences of string "a" i..e. blank string and "a" and produces these subsequences that have already been produced. So, when character 'b' interacted with the subsequences of the string "a" for the first time, there were 2 subsequences of string "a" present at that moment and we have subtracted 2 from the count of subsequences to get the number of distinct subsequences when we encounter the character 'b' for the second time.
Similarly, when the character 'a' interacted with the blank string there was only one string i.e. the blank string itself present at that moment. So, we have subtracted 1 from the total subsequences generated of the string "abcba" to get the distinct subsequences.
So, we have to subtract the number of subsequences that were present earlier when we encounter the character for the first time from the double of the previous number of distinct subsequences to get the number of distinct subsequences for the current string.
So, we hope that you have understood the conclusions that we drew from our observations. So, we request you to complete the table (in img-9) by generating all the distinct subsequences for the remaining character 'c' also and verify the conclusion that we have come to.
We recommend you refer to the solution video to understand the above procedure and conclusion that we have drawn. Now, let us see how we will implement this procedure in the code.
The Dynamic Programming Coding Approach
Storage and Meaning: We will make an array dp[] of size one greater than than the length of the string. Why? Because we have also represented the blank string and assumed that the substring "a" is formed by the interaction of character 'a' with the blank string. So, we will have an array as follows:
So, what will each element of this array represent? Have a look at the image shown below:
So, as you can see in the image below, the element at index 3 will contain the number of distinct subsequences of the string "abc". Similarly, the element at index 2 will contain the number of distinct subsequences of string "ab" and so on.
The direction of Solving: The direction of solving is always identified after identifying the smaller and the larger problem. So, the smaller problem is obviously, the number of distinct subsequences of the blank string itself. So, the answer will be 1 as only the blank string is a unique subsequence of itself. So, the direction of solving will be from index 0 toward the end of the array.
Travel and Solve: Now, comes the third and the last step of traveling the array and solving the problem. So, we will start from index 1. Since we encounter the character 'a' for the first time, we will put 2 x 1 (i.e. double of the previous number of distinct subsequences) into dp[1] i.e.
dp[ i ] = 2 x dp[ i-1 ]
Now, after this, till index 3, every character is encountered for the first time and we will apply the same formula i.e. dp[ i ] = dp[ i-1 ] x 2 and fill these values too.
Now we have reached index 4 where we encounter 'b' again. How do we know this? This is because all this while, we have been storing the characters and their index of occurrence in a hashmap. So, we checked whether the key 'b' exists in the hashmap or not and we find that it already exists. So now, the same formula can't be applied as the element will repeat.
We know that the repetition is caused when 'b' interacts with the same subsequences that were present when 'b' was encountered first. So, let us get the last index of occurrence of 'b' from the hashmap. This gives us 2. So, 'b' last occurred at index 2. But, index 2 contains the distinct subsequences of string "ab" and we want the distinct subsequences of string "a". Why? Think!!!
So, we will subtract 1 from this last index of occurrence of 'b' and we will reach index 1. The value at index 1 is 2. So, we will subtract 2 from the value that we get by dp[ i-1 ] x 2 (as this is the total number of subsequences generated) and we get our number of distinct subsequences.
Similarly, we can fill the rest of the array and the final answer will be dp[n-1].
So, dear reader, we hope that you have understood the above procedure completely. Let us now write the code based on the above procedure.
import java.util.*;
public class Main {
public static void main(String[] args) {
Scanner scn=new Scanner(System.in);
String str=scn.next();
long []dp=new long[str.length()+1];
dp[0]=1;
HashMap < Character,Integer> lastOcc=new HashMap< >();
for(int i=1;i< dp.length;i++)
{
dp[i]=2*dp[i-1];
char ch= str.charAt(i-1);
if(lastOcc.containsKey(ch))
{
int j=lastOcc.get(ch);
dp[i]= dp[i]-dp[j-1];
}
lastOcc.put(ch, i);
}
System.out.print(dp[dp.length-1]-1);
scn.close();
}
}
java;
true";
So, dear reader, we hope that you have understood this code completely too. If you have any doubts regarding this code, refer to its explanation in the solution video . Now that we have understood everything, let us move to the time and space complexity analysis of the above code.
Analysis
Time Complexity:
The time complexity of the above procedure is O(n) where n is the length of the input string as we are just traversing an array of size n+1.
Space Complexity:
The space complexity is also O(n) where n is the length of the input string as we have created an array of size n+1.
So, dear reader, we hope that you have understood the time and space complexity analysis too. If you still have any doubts regarding the procedure or the code, refer to the complete solution video to clear all your doubts. With this, we have completed this problem.
Suggestions:
We suggest that you should refer to the complete solution video once to understand the concept behind the formulas that we have derived. See, in dynamic programming, the coding part is usually a cakewalk if you have understood the procedure completely. So, concentrate more on the analysis part of the question and the code will be clear to you automatically. So, keep practicing and working hard. Till then, Happy Coding!!!