Even if you fall on your face, you're still moving forward.
~Victor Kiam
Regular Expression Matching
Welcome Back Reader!
In this article we will discuss about the next problem based on "Dynamic Programming" i.e.
"Regular Expression Matching".
This is a very interesting problem. One of my favourites too!
Before you move any further, it is advised that you give this
problem a fair try.
Let's jump to the problem.
In this problem you are given two strings S1 and S2. S1 represents a text and S2 represents a
pattern.
All you have to do is to print 'true' if the pattern is matched with the given text, otherwise
print 'false'.
Note: The pattern can include the characters '.' and '*'
'.' - matches any single character
'*' - matches zero or more of the preceding character
For example:
Sample Input:
mississippi
mis*i.*p*i
Sample Output:
true
Approach
How?
String S1 is the text, 'mississippi' and string S2 is the pattern, 'mis*i.*p*i'.
In the pattern, 's*' can make the s repeat for 2 times (missi.*p*i).
Then '.*' can make the '.' repeat for 3 times (missi...p*i) and then each '.' can match the
characters 'ssi' (mississip*i).
After that 'p*' can make the p repeat for 2 times (mississippi).
So the output is true.
For more clarity of the question; watch below part of the video.
Moving On:
In this problem we need to check if the given pattern can be matched with the given text
by somehow replacing the two special characters; '.' and '*'.
It is given in the question that '.' can match any character. If the text is 'a' and the
pattern is '.' Then the output will be true because '.' matches a.
Also if there is '*' then it means that it can match zero or more of the preceding
character. If the text is 'aa' and pattern is 'a*' then a* has choices to either not let
'a' come even once or just once or more than once. In this case we need 2 a(s). So 'a*'
has the capability to match with 'aa'. So the output will be true.
We can also come across a case where both the special characters can occur together like
'.*'. Suppose the text is 'abc' and the pattern is '.*'. As '.' Is the preceding
character so '*' has the power to either vanish it or make it repeat any number of
times. Here, there are 3 characters 'abc'. So we use 3 '.' Like '...' and each '.' has the
capability to match with each of the corresponding characters. So the output will be
true.
FUN FACT: If the pattern is '.*' then the output will always be true.
To solve this problem we will be using Dynamic Programming for no doubt.
First of all we will take a 2D dp array of boolean type. In this 2D array we will
represent the text by the x-axis (columns) and pattern by the y-axis (rows).
Also we will take an extra row and extra column which will represent if the text or
pattern is an empty string. So this way the size of the 2D dp array becomes
(pattern.length() + 1 * text.length() + 1).
Also we will take an extra row and extra column which will represent if the text or
pattern is an empty string. So this way the size of the 2D dp array becomes
(pattern.length() + 1 * text.length() + 1).
The meaning of each cell if it stores true will be that the substring of pattern till
the character to which the particular row corresponds; matches the substring of text
till the character to which the particular column corresponds to.
Let the text be 'mississippi' and pattern be 'mis*i.*p*i'.
Talking of the first row, here it means that the pattern is empty so it will be the same
as text only if the text is also empty and this case is represented by dp[0][0]. So we
will store 'true' in this cell. But all other cells of this row will store false as an
empty string wouldn't match to any other string of any non-zero length.
Now let's start with the first column. For filling the first column there can be some
special cases so we need to handle the first column a bit differently.
If there is any character or '.' then we can simply place false because the first column
indicates that the text is an empty string. So if there is any character or '.' Then the
strings won't match.
But we need to be careful if the character is '*'. For handling '*' In the first column
we need to move two rows up and check the result at dp[i-2][j]. Our result will be the
same as the result stored in this cell.
Watch below part of the video for better understanding.
Now to fill the rest of the matrix; we will move column wise in with increasing rows and
check if the last character (corresponding to this cell) of both text and pattern at
dp[i][j] is equal or not. If it is equal then we need to look diagonally upwards in the
2D array i.e. at dp[i-1][j-1]. And if the character is not the same then we will put
false at dp[i][j].
And if the character in the pattern is '.' Then also look diagonally upwards (i.e. at
dp[i-1][j-1]).
To handle '*', we check if the character before '*' in pattern equal to the character
corresponding to the jth column in text. If it is equal then we need to move upwards by
two rows and check the cell i.e. dp[i-2][j]. We also check the left column in the same
row i.e. dp[i][j-1] and if either one stores true then we will also store true otherwise
false.
But if the character before '*' in pattern is not equal to the character corresponding
to the jth column in text then we need to move upwards by two rows and only check the
cell i.e. dp[i-2][j]. If the cell stores true then we will also store true otherwise
false.
As we start to fill the 2nd row (i==1), we have 'm' corresponding to this row. And in
columns this character matches only to 'm' at (j==1). So we will check what is present
at the diagonally upward cell i.e. at dp[0][0]. We check and find that true is present
there so we place true at dp[1][1] also and false in the rest of the cells.
Then we move to the next row (i==2), we have 'i' corresponding to this row. And in
columns this character matches to 'i' at (j==2, 5, 8 and11). So we will check what is
present at diagonally upward cells i.e. at dp[1][1], dp[1][4], dp[1][7] and dp[1][10].
We check and find that true is only present at dp[1][1] there so we place true at
dp[2][2] only and false in the rest of the cells.
Similar procedure will be followed for the next row (i==3) i.e. for character 's'.
Please try to fill the matrix for this row on your own for better understanding and then
you can cross check your answer.
Watch below part of the video, for better understanding.
Now moving to the next row (i == 4) we have '*' corresponding to this row. So, we need
to check dp[i-2][j] for every character which is not equal to the character preceding
'*' and also check dp[i][j-1] if the characters are equal.
In this case, the preceding character is 's'. 's' is present at column (j) == 3, 4, 6
and 7 in the text. So for these j(s) we need to check 2 cells. For j==3, we check
dp[2][3] (cell i.e. 2 rows above) and dp [4][2] (left cell), for j==4, we check dp[2][4]
(cell i.e. 2 rows above) and dp [4][3] (left cell), for j==6, we check dp[2][6] (cell
i.e. 2 rows above) and dp [4][5] (left cell) and for j==7, we check dp[2][7] (cell i.e.
2 rows above) and dp [4][6] (left cell). If either of 2 cells is true then store true
and if both the cells store false then store false for dp[4][j].
And for the rest of the columns (j(s)), we only check dp[2][j] and place the same
Boolean at dp[4][j].
Watch below part of the video, for depth analysis and better understanding.
'Why' part has also been explained with more details for "*" element.
Next element is a character i.e. 'i' and a similar procedure will be followed for the
next row (i==5) i.e. for character 'i'.
Please try to fill the matrix for this row as well on your own by understanding the
rules of how to handle a character for better and then you can cross check your answer.
In the next row (i == 6) we have '.' corresponding to this row.
'.' is considered a match for all the characters in text.
So for all the columns we need to check the (dp[i-1][j-1]) which is a diagonally upward
cell and the answer for dp[i][j] will be the same as dp[i-1][j-1].
Watch below part of the video, for better understanding.
Now moving to the next row (i == 7) we have '*' corresponding to this row. So, we need
to check dp[i-2][j] for every character which is not equal to the character preceding
'*' and also check dp[i][j-1] if the characters are equal.
In this case, the preceding character is '.'. And '.' has the power to match every
character.
So in this row we will check the cell of the left column and the cell above 2 rows for
each column
And if either of 2 cells are true then the answer for the current cell will also be
true. But if both the cells have false stored then the answer will also be false.
Watch below part of the video, for better understanding.
Next element is a character i.e. 'p' and a similar procedure will be followed for the
next row (i==8) i.e. for character 'i'.
Please try to fill the matrix for this row as well on your own by following the rules of
how to handle a character for better understanding and then you can cross check your
answer.
Next element is '*' (at i==9) and similar procedure will be followed when we handled '*'
which was present at i==4 when the preceding element was character 'i'.
Here the preceding element is 'p'. Please try to fill the matrix for this row as well on
your own by following the same rules for better understanding and then you can cross
check your answer.
Finally the last element is a character i.e. 'i' and a similar procedure will be
followed for the next row (i==10) i.e. for character 'i'.
Please try to fill the matrix for this row as well on your own by following the rules of
how to handle a character for better understanding and then you can cross check your
answer.
Final result will be stored at dp[pattern.length()-1][text.length()-1], which basically
indicates that the string pattern matches string text or not
Watch below part of the video, for better understanding.
First of all define a 2D array dp, of dimension (pattern.length() +1 * text.length() + 1).
After that, set the dp[0][0] to true.
Then apply a for loop on the length of pattern or rows of dp.
Then apply a nested for loop for each row to access each column.
Then check if j == 0 i.e. first column.
If it is first column then check if the corresponding element (pattern.charAt(i-1))to this
row is '*'.
If yes, then the value for dp[i][[j] will be the same as that of dp[i-2][j].
If the column is not first (j != 0) then move to the else part.
For the rest of the columns check if the char.At(i-1) in pattern string a '.' or the
character at i-1 in pattern and character at j-1 in text is the same or not.
If yes, then the answers for such cells will be the same as that in a diagonally upward cell
which is dp[i-1][j-1].
Else we check the element at i-1 in pattern string a '*'.
If yes, then we set the same boolean in the current cell (dp[i][j]) which is in 2 rows above
(i.e. dp[i-2][j]).
Then we check if the character at (i-2) in the pattern string is the same as that of
character at (j-1) in the text string.
If yes, then we update the value at dp[i][j] with the same value at the left column i.e.
dp[i][j-1].
At last return the value at dp[pattern.length()-1][text.length()-1].
For more clarity of this code, watch below part of the video.
There may be a little variation in the code because in the video we have handled the cases
where we need to place false as well but the default in the boolean array is false, so we
just took the advantage of this point; however the approach is similar.
Complexities:
Time Complexity: O(n^2)
A nested for loop is used once to fill the 2d array, which gives a time complexity of O(n^2).
Space Complexity: O(n^2)
A 2d array is used for the dimensions pattern's length x text's length.
We hope that this article was helpful. If somehow you are finding it difficult to understand
this problem then we advise you to watch our video lecture of this problem.
Trust me it will just get easier to understand after you have watched the solution video.
You can contact us via our website. Doubts, suggestions and feedback are always welcomed.
It is also advised that you follow the sequence of modules and questions which is there on our website.
Keep practicing more and more problems daily. Meditate enough on each step of each problem.