Welcome back, dear reader. So, in this article, we will discuss a very interesting and well-known problem called Longest Common Subsequence. So, what does this problem say? We are given two input strings and we have to display the length of the Longest Common Subsequence (LCS) between these two strings. For instance, if we get the first input string "abcd" and the second input string "abed", the LCS of these two strings is "abd". So, our answer will be 3. We hope that you have understood the question. Still, if you have any doubts regarding the question, refer to the LONGEST COMMON SUBSEQUENCE VIDEO to understand the question completely. We recommend you try to solve the problem on your own first and then refer to the solution to make your concepts strong.
Longest Common Subsequence
Perl - The only language that looks the same before and after RSA encryption.
Time Complexity: O(m x n) where m is the length of the first string and n is the length of the second string.
Space Complexity: O(m x n)
Let us discuss the overview of the brute force approach. Dear reader, you will have to write the code of this brute force approach yourself. So, what does this approach say? See, it is a very basic approach and you can understand this approach very easily. We have to find the longest common subsequence of the two strings.
So, let us get all the subsequences of the first string in one Arraylist and all the subsequences of the other string in another ArrayList. How will we do that? Refer to the GET SUBSEQUENCE VIDEO in the foundation course if you have any doubts about getting all the subsequences.
So, you know how we can do it, right? We will use the recursion and at every level in the recursion tree, we will have an option for every character whether it wants to be included or not in the subsequence. So, at levels, we have characters of the input string and every string has two options only, whether to come into the subsequence or not to come.
The recursion tree for the first input string "abcd" is shown below. We have used the levels and options method where at each level, we have one character of the input string and we have an option whether to include it or not to include it in the subsequence. Similarly, we can draw the recursion tree for the other input string as well. We recommend you try to draw the recursion tree for the other string yourself.
So, we will not just print these subsequences, rather, we will store them in an ArrayList.
After this procedure, we will have two Arraylists containing the subsequences of each of the input strings. Now, we will run a nested loop keeping the first Arraylist outside and the second inside or vice-versa, to match whether we have the same subsequence or not and we will run the loop completely as we do not just want the common subsequence, we want the Longest Common Subsequence.
So, let us try to analyze the time and space complexity of the above procedure.
The time complexity of the above method will be O(2m+n) where m is the length of the first input string and n is the length of the second input string. Why is the time complexity? See, we are running a nested loop. We know that if there are x iterations in the outer loop and y iterations in the inner loop, the time complexity becomes O(x * y). Now, the outer loop will have 2m iterations because, in the ArrayList, we have 2m elements. Similarly, the inner loop will have 2n iterations as the second Arraylist has 2n elements. So, the time complexity ios x*y = 2m * 2n = 2m+n. So, the time complexity becomes O(2m+n).
The space complexity of the above solution is O(2m + 2n). This is because we are making 2 Arraylists, one is of the size 2m and the other is of the size 2n. So, the total time complexity becomes O(2m + 2n).
So, we can see that this method is not at all suitable for us as it uses a lot of space and time. We have to solve this problem in less time and space complexity. So, we will use Dynamic Programming to solve this problem which will reduce the time and the space complexity to O(m x n).
So, dear reader, we just saw the brute force approach of the question above. Now, let us try to understand a few things. This can be a little mathematical and difficult to understand. So, we want you to stay with us till the end so that you understand the concepts completely.
Let us first understand one very important thing. We solved this problem using the recursion above. Now, try to remember how we used to solve the get subsequence problem in the foundation course.
If you remember, we can say that the set we can get all the subsequences of a string "abc" by prepending the subsequences of string "bc" with a blank string i.e. all the subsequences of the string "bc" only and by prepending the subsequences of string "bc" with 'a'.
So, let us say that we have an input string "str". Let us say that we represent the set of subsequences of string str by s(str). So, the set of subsequences of string str can be formed by:
What is this? See, s(str) means the set of subsequences of the string str. This can be formed by prepending the blank string (represented by "_") with a set of subsequences of 'r' where 'r' represents the remaining string after removing the first character of the string and by prepending c1 i.e. the first character of the string with the set of subsequences of the remaining string (after removing the first character).
For instance, for the input string "abc" we can say that the set of subsequences of string "abc" can be formed by prepending a blank string "_" (this is only the representation of a blank string) with the set of subsequences of the remaining string "bc" and by prepending the first character "a" with the set of subsequences of the remaining string "bc".
Now, let us say that L(s1,s2) represents the longest common subsequence of the strings s1 and s2. Now you have understood above that the string s1 can be broken into two parts, 'c1' i.e. the first character of the string s1 and 'r1' i.e. the remaining string s1. Similarly, the string s2 can be broken into 'c2' i.e. the first character of string 2 and 'r2' i.e. the remaining part of string s2.
So, L(s1,s2) can also be written as L(c1r1,c2r2).
The longest common subsequence of strings s1 and s2 depends upon the cartesian product of the set of subsequences of the first and the second string.
You know this already. How? Have a look at the statement carefully. This statement just says that if we want to find the longest common subsequence of strings s1 and s2, we have to take a nested loop and find out which subsequence of the first string matches with the subsequence of the second string and is the longest.
If you have any doubts related to the explanation here, refer to the solution video to clear all your doubts and understand the concept in full depth.
Now, from above, we know one more thing that s(str) i.e. s(cr) can be broken as "_" + s(r) and c + s(r). So, we can write that:
Now, what does this signify? This signifies that instead of having only one for loop, we can have 4 for loops i.e. we can have the for loops like this:
It is just like normal mathematics. See, if a+b=x and c+d=y and we have x*y, we can substitute the values of x as (a+b) and y as (c+d) and then multiply. This will give us (a+b)*(c+d) which on expanding gives a*c + a*d + b*c + b*d. The same procedure is what we have applied above.
Dear reader, we request you to watch the solution video to understand the above procedure clearly with the help of some examples as well.
Let us number the for loop terms that we have got from above:
So, let us tell you a very interesting thing. If c1=c2, the longest common subsequence will be found in loop 4, and if c1 is not equal to c2, it will be found in either of the three loops.
Why? Try to find this out and then we will discuss this in detail.
So, dear reader, did you do some thinking on it? We hope you did. Now, let us prove the above statement by the contradiction method. We recommend you refer to the solution video to understand the above-explained procedure in detail.
First, let us prove why we will not get the answer from loop number 4 if c1 is not equal to c2.
When c1 is not equal to c2, the subsequences in the first set c1s(r1) will start from c1 and the subsequences in the second set c2s(r2) will start from c2. Since c2 is not equal to c1, these subsequences can never be the same. So, forget about the longest length common subsequence,, they will not have any subsequence in common.
So, we hope that you understood this. Since these two sets will not have anything in common, we will not use the 4th loop or we can just say that the LCS will not be present in the 4th loop. Now let us try to find out why we will get the answer from loop 4 if c1=c2.
Let us say that the LCS can be obtained from loop 1 i.e. _s(r1)*_s(r2). So, let us say that there is a subsequence of length three (say "xyz") that exists in these sets which is the longest common subsequence if we use these sets. Don't think that the input strings are "abcd" and "abed" so how did we get the subsequence as "xyz". We have just picked a random notation for the subsequence. So, if "xyz" is the longest subsequence in these two sets then in loop4, we will have a subsequence c1xyz and c2xyz in the sets respectively. Why? This is because if the input string is "abcd" then the set of subsequences of "abcd" will contain the subsequences of "bcd".
So, the subsequences c1xyz and c2xyz are the same as c1=c2. So, this set has a common subsequence cxyz (c1=c2=c) and its length is greater as compared to the length of the string from the first loop. So, we have stated that if c1=c2 then loop1 should not be considered as we will get longer subsequences in loop4.
We recommend you refer to the solution video to understand this contradiction proof completely.
Let us say that the LCS is obtained from the second loop i.e. _s(r1) * c2s(r2). So, again let us assume that there is a subsequence xyz, which is the LCS. This means that "xyz" is present in set 1 and set 2 also. But in the second set, every subsequence starts with c2. Now, the xyz should be common in both of them. So, this means that the subsequence in the second set is c2yz where c2=x. Now, if the set 1 contains xyz as a subsequence then it is for sure that it also has a subsequence "yz". Why?
So, this subsequence will be present in the set 1 of the loop 4 also but we will prepend it with c1. So, we will have c1yz in the set 1 of the 4th loop. Also, the second set is the same in both the loops 2 and 4. So, it will have c2yz as a subsequence. Now, if we have both c1yz and c2yz, c1 and c2 are equal and c1=x (already discussed above). So, we get the LCS as xyz again. So, the answer that we can get from loop1, can get it from loop4. So, it's better to use loop 4 as iit gives us the answers of both the loop1 as well as loop2.
We recommend you watch the solution video to understand the above contradiction proof completely.
Dear reader, this contradiction proof is the same as contradiction-2. We request you to try to prove this yourself as this will help you think and understand in a better way what we are doing exactly. If you have any problems proving it yourself, refer to the solution video to understand this proof also.
Now, let us try to arrive at some conclusions:
We saw that the length of the subsequence that we get from the 1st loop is one shorter than the LCS (that we get from loop 4 when c1=c2). Loop 4 represents the entire strings whereas loop 1 represents the r1 and r2 i..e. the strings that remain after removing the first characters from them. So, we can say that:
When c1 is not equal to c2, the answer is:
Let us prove this includes all the three loops 1,2 and 3. See the first term l(s1,r2). This will depend upon s(s1) and s(r2). We can break s1 into _s(r1) and c1s(r1). Therefore, this is a combination of loops 1 and 3.
So, you can similarly understand that l(s2,r1) is a combination of loops 1 and 2. We request you think about its mathematical proof (as shown in img-9) yourself.
We request you refer to the solution video to understand the above conclusion completely and understand the summing up of both the conclusions.
So, we hope that you have understood the above procedure completely. Now, let us finally conclude the following :
So, dear reader, we have done a lot of hard work here. We hope that you remained with us till the end and have understood everything completely. Now, the last part and the easiest part of this solution is left. Let us discuss it too.
Let us take two input strings "abcd" and "abed". Now, let us keep the characters of one of the strings along the row and the other string along the column. This is shown in the figure below:
Now we need to understand what we are storing at each cell in this matrix. So, let us understand it with the help of an example.
The position shown above stores the Longest common subsequence of bcd with ed. So, you can now understand what we are storing at each cell from this example shown above.
Now, let us start by identifying the direction of solving. We will solve this matrix in the reverse direction starting from the last cell as it represents the smallest problem i.e. LCS of two blank strings i.e. also a blank string.
The last row and last column will be filled with zeroes as if any of the strings is blank, it has only one subsequence that is blank itself and the length of the blank string is 0. So, fill the last row and last column with 0.
We have already filled the last row and last column with zeros. So, now we are at row=d and column=d. Now, there are two conditions: c1=c2 or c1 not equal to c2.
We are at row=d and column=d. So, c1=c2. So, in this case (see img-10 for the formula) we will separate the first character from both the strings so we reach the last element in the matrix. We have to add 1 to the value present there as per the formula. So, 1 will be stored at row=d and column=d.
Now we are at row=d and column=c. This means that we are calculating the LCS between the string "d" and "cd". Here, c1 is not equal to c2. So, either the first string separates one character or the second string separates one character.
When "d" separates one character the first string is blank and the second string is "cd". This means that we are at row="_" and column==c. The element present there has a value of 0.
The option was to separate a character from the second string. When we separate a character from "cd", it becomes "d". So, now we are left with "d" and "d". This means we go to row=d and column=d. So, the value here is 1.
The max value out of the two values obtained is 1. So, we will store 1 here at row==d and column=c.
So, this is how we will fill the entire matrix. We recommend you try to fill the rest of the matrix on your own. Your answer will be:
So, we get our answer at dp. We recommend you go through the solution video to understand the complete process of filling the matrix.
So, dear reader, we hope that you have understood the complete procedure and have enjoyed the in-depth analysis of this question. Now that we have understood everything, let us write the code:
Dear reader, we hope that you understood the code written above. If you have any doubts regarding the code, refer to the solution video (32:10 onwards)to understand the code and clear all your doubts. Now that we have completed the code also, let us analyze the time and space complexity of this problem.
The time complexity of the above solution is O(m x n) where m is the length of the first string and n is the length of the second string as we are traversing in a matrix of size m x n.
The space complexity of the above solution is O(m x n) as we have made a matrix dp of size m x n.
So, dear reader, we hope that this time and space complexity analysis was pretty easy for you. Now, if you still have any doubts left, refer to the complete solution video to clear all your doubts. With this, we have completed this problem.
Here are some suggestions from our side that you do not want to miss:
We have only one suggestion for this problem. We highly recommend you watch the complete solution video from the beginning till the end twice. This is going to clear every single bit of doubt from your mind and you will really enjoy the knowledge that you will get from it. Till then, keep practicing and working hard. Happy Coding!!!