August 2, 2012

Minimum number of insertions in a string to make it palindrome.




Question: You are given a string and have to find out minimum number of characters to insert in it so that it becomes palindrome. A palindrome is a string which looks same either it is read from left to right or from from right to left.





Solution: Take any string, for e.g. “SARGAM”. This is not palindrome but if you delete the characters 'S', 'G', 'M' what remains is “ARA” and this is palindrome. Now this palindrome was already in the given string and its length is three. So basically we've to find out the length of longest palindrome which is already hidden inside in given string. In this way, our
required answer = length of given string - length of longest palindrome hidden inside it
    Now our task is to find the longest palindrome. If you think a little hard, you'll immediately find that this task is equivalent to finding the Longest Common Subsequence(LCS) of a given string and its reverse, which is a classical DP(Dynamic Programming) problem. I won't go into details of LCS as there are already lots of tutorial on the web. I'll dive directly into calculations. I'll highly recommend you to understand LCS problem and it's solution first, in case if you don't know.

    Given string is “SARGAM”, reverse of it would be “MAGRAS”. We've to find the LCS between these two strings.
Suppose there are two arrays, array_orig[]=”SARGAM” and array_rev[]=”MAGRAS” which we have to compare. We'll compare all characters of array_rev[] with array_orig[] one by one. We'll take first character of array_rev, which is 'M' and compare it with 'S' then 'A' then 'R' and so on till the length of array_orig[]. While comparing if both the characters are same it means that we'll add 'one' to the length of longest palindrome till found. If characters doesn't match then we'll simply take the maximum of values till found. If these lines doesn't make any sense to you, you should better check your concepts on finding LCS.
    In the given table, cell entries gives the length of longest palindrome till that cell. So the final cell which is highlighted in red color is length of longest palindrome in the given string. '0' in bold in row and column entry means comparison with empty string, i.e. of length zero.
 
Cell entries are filled with the formula
   if characters match:
            array$[I][J] =$ array$[I-1][J-1] + 1$
   else   array$[I][J] = $max (array$[I-1][J]$, array$[I][J-1]$);

// array is the name of this table given here.


0 S A R G A M
0 0 0 0 0 0 0 0
M 0 0 0 0 0 0 1
A 0 0 1 1 1 1 1
G 0 0 1 1 2 2 2
R 0 0 1 2 2 2 2
A 0 0 1 2 2 3 3
S 0 1 1 2 2 3 3


This method demands the two dimensional array which would need $O(n^2)$ space for string of length $'n'$. It means for the given string of length $100000$ characters you would need $O(10^{10})$ space which is too much. A little awareness could save you from this trouble. Observe that in array states we need only array$[I][J]$, array$[I-1][J]$ and array$[I][J-1]$, that's it. So we need only the current state and it's last state. For this, we'll need array$[2][length+2]$ where $'2'$ is because of two states and 'length' is length of string given.
After calculating the row entry of array$[I][J]$ copy it to array$[I-1][J]$.

Finally, required answer $= 6$(length of "SARGAM")$ - 3 $(red entry in the table)$ = 3$
So, the given string is "sargam" and after 3 insertions it would become "sMargRamS". Here, upper-case characters are inserted to make it palindrome.

Original Problem Statement can be found here:
http://www.spoj.pl/problems/AIBOHP 

My solution:
# include <stdio.h>
# include <string.h>

inline int max(int a, int b)
{return a>=b?a:b;}

int main()
{
    int tcases;
    scanf("%d",&tcases);
    char str[6100];
    int dp[2][6100];
    while (tcases--){
        scanf("%s",str);
        memset(dp, 0, sizeof dp);
        int I, J, K, length;
        length=strlen(str);
        for(I=length-1;I>=0;I--){
            char ch=str[I];
            for (J=0;J<length;J++)
                if (ch==str[J])
                    dp[1][J+1]=dp[0][J]+1;
                else dp[1][J+1]=max(dp[1][J], dp[0][J+1]);

            for (K=0;K<=length;K++)
                dp[0][K]=dp[1][K];
        }
        printf("%d\n",length-dp[1][J]);
    }
    return 0;
}
References:
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

No comments:

Post a Comment