Facebook Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
Longest common sub-string of strings S1, S2,.... Sn can be found as follows:
1. Build a generalized suffix tree for S1, S2,... Sn
2. Mark each internal node v by 1 (resp. 2, ... n) if the subtree below v contains a leaf for a suffix of S1 (resp. S2,... Sn)
3. Traverse the tree to find nodes marked by all 1, 2,...n and choose any u of them with a maximal string-depth
4. L(u) is a maximal common sub-string
It's not so difficult if you are going with the O(n^2) time, O(n^2) space implementation of the suffix tree. The collapsed O(n) space suffix tree is a bit harder; to build it in O(n) time is even more.
I'd doubt they would require you to implement it with a linear time and space suffix tree. THAT would be quite difficult indeed.
1> If people already knows the len of each strings, sort the strings in the ascend order of the strings.
2> pick two strings from the list, build the suffix tree for each strings.
3> Based on the suffix tree of the shortest string, prune suffix tree based on the another suffix tree.
4> If the result tree is empty one, return NULL;
5> pick another string from the list if one exist, build suffix tree of that string, and go to 3.
6> If there is no more string in the list, return the longest path of the result tree as the result.
Another way to slice this is to do sort of a breadth-first search. First make a hash of all the common 2-grams across all the lists, pruning as you march down the lists. For each 2-gram, keep a cache of locations within each list. (If the list of 2-grams is empty, just have special case code to find a single character in all strings.)
Then when you go the 3-gram phase, start with all the 2-grams for the first list, and find their 3rd character. Let's say you start with ce and the first occurrence of ce in the first list is followed by x, giving you the 3-gram of cex. If "ex" is not in the map of 2-grams common to all lists, then you'll know that cex won't be in all lists either (for obvious reasons), so you can prune immediately. Otherwise, push it on to the list of working 3-grams, and then keep proceeding through the 2-grams for list one. In processing the second list, update the working list of 3-grams, then at the end, sweep all the 3-grams that only showed up in both of the first two lists. Keep continuing in this fashion until you can't extend any of the n-grams.
/**
* Returns the longest shortest substring (LCS) of two strings.
*
* @param aString First string.
* @param bString Second string.
* @return The longest common substring of aString and bString.
*/
public String longestSubstring(final String aString, final String bString) {
// Get length of the 2 string
final int m = aString.length();
final int n = bString.length();
// to ensure O(min(m,n)) space complexity
// the biggest string should be the first parameter
// so i swap them and try again
if (n > m){
return longestSubstring(bString,aString);
}
// Initialize the 2 array to the length of the longest string
int maxLength = Math.min(m,n);
int[] last = new int[maxLength];
int[] next = new int[maxLength];
StringBuffer result = new StringBuffer();
int len = 0;
final char []a = aString.toCharArray();
final char []b = bString.toCharArray();
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
if (a[i] == b[j]) {
next[j] = 1 + (j - 1 >= 0 ? last[j - 1] : 0);
if (next[j] > len) {
len = next[j];
result.setLength(0);
final int beginIndex = i - len + 1;
final int endIndex = len + beginIndex;
result.append(aString.subSequence(beginIndex, endIndex));
}
}
}
System.arraycopy(next, 0, last, 0, maxLength);
Arrays.fill(next, 0);
}
return result.toString();
}
Working PHP solution
function boggle($word)
{
$bb = array(array('s', 'm', 'e', 'f'),
array('r', 'a', 't', 'd'),
array('l', 'o', 'n', 'i'),
array('k', 'a', 'f', 'b'));
$warr = str_split($word);
$k = 0;
for($i = 0; $i < count($bb[0]); $i++)
for($j = 0; $j < count($bb); $j++)
{
if($warr[0] == $bb[$i][$j])
{
$visited[$i][$j] = true;
return check($bb, $i, $j, $warr,0, $visited );
}
}
}
function check($bb,$i, $j, $warr, $curIndex, $visited)
{
print "$i, $j\n";
$n = 4;
$dx = array(0, 1, 1, 1, 0, -1, -1, -1);
$dy = array(1, 1, 0, -1, -1, -1, 0, 1);
if($curIndex == 3)
{
return true;
}
for($d = 0; $d<8; $d++)
{
$newi = $i + $dx[$d];
$newj = $j + $dy[$d];
if($newi >= 0 && $newi < $n && $newj >=0 && $newj < $n && !$visited[$newi][$newj]
&& $warr[$curIndex + 1] == $bb[$newi][$newj])
{
++$curIndex;
$visited[$newi][$newj] = true;
$ret = check($bb, $newi, $newj, $warr, $curIndex, $visited);
if($ret)
break;
--$curIndex;
$visited[$newi][$newj] = false;
print $curIndex;
print $warr[$curIndex];
print_r ($visited);
}
}
}
print boggle("smef");
Say there are N strings.
(1) Construct a suffix tree with the first string. In each node of the suffix tree maintain a count, which will be initialized to zero.
(2) Now search all the suffix strings of each string, and while searching increment the count of the suffix-tree-node if found.
(3) Repeat step (2) for all the N-1 strings (except the first string).
(4) Do DFS on the suffix tree and return the longest-string, where node is considered valid only if its count is equal to N-1.
Here is the C++ code ...
// Suffix tree/trie based Longest common substring
#include <iostream>
#include <string>
using namespace std;
#define ALPHABETS 26
#define MAX_STRING_LEN 40
#define rep(i,n) for(int i=0;i<(int)n;i++)
#define rep2(i,m,n) for(int i=m;i<(int)n;i++)
typedef struct node node_t;
struct node {
struct node * child[ALPHABETS];
char c;
int count;
bool isEnd;
};
node_t * CreateNode(char c);
void AddString(node_t * root, string str);
bool SearchAndIncrement(node_t * root, string str);
bool _SearchAndIncrement(node_t * root, string str);
string LongestCommonSubstring(node_t * root,int count);
void _LongestCommonSubstring(node_t * root,int count,
int cur,char *resultString, string &maxString);
node_t * CreateNode(char c)
{
node_t * myNode = new node_t;
myNode->c = c;
myNode->count = 0;
myNode->isEnd = false;
rep(i,ALPHABETS)
myNode->child[i] = NULL;
return myNode;
}
void AddString(node_t * root, string str)
{
node_t * pCrawl = root;
rep(i,str.size()) {
if( pCrawl->child[str[i]-'a'] == NULL )
pCrawl->child[str[i]-'a'] = CreateNode(str[i]-'a');
pCrawl = pCrawl->child[str[i]-'a'];
}
pCrawl->isEnd = true;
}
bool SearchAndIncrement(node_t * root, string str) {
rep(i,str.size())
_SearchAndIncrement(root, str.substr(i));
return true;
}
bool _SearchAndIncrement(node_t * root, string str)
{
node_t * pCrawl = root;
rep(i,str.size()) {
if( pCrawl->child[str[i]-'a'] != NULL ) {
pCrawl = pCrawl->child[str[i]-'a'];
pCrawl->count++;
}
else
return false;
}
return true;
}
// LongestCommonSubstring from suffix tree
string LongestCommonSubstring(node_t * root,int count) {
char resultString[MAX_STRING_LEN];
string maxString;
_LongestCommonSubstring(root,count,0,resultString,maxString);
return maxString;
}
void _LongestCommonSubstring(node_t * root,int count,int cur
,char *resultString, string &maxString) {
rep(i,ALPHABETS) {
if( root->child[i] != NULL && root->child[i]->count == count ) {
resultString[cur] = i+'a';
_LongestCommonSubstring(root->child[i],count,cur+1,resultString, maxString);
}
}
if(cur > maxString.size()) {
resultString[cur] = 0;
maxString = resultString;
}
}
int main() {
node_t * root = CreateNode('r'); // Root
char * input1 = "antedon";
char * input2 = "belatedly";
char * input3 = "bigotedly";
char * input4 = "closefistedly";
char * input5 = "coldheartedly";
int numberOfStrings = 5;
// Create Suffix tree.
rep(i,strlen(input1))
AddString(root,input1++);
// Increase the count of each node in suffix tree.
SearchAndIncrement(root,input2);
SearchAndIncrement(root,input3);
SearchAndIncrement(root,input4);
SearchAndIncrement(root,input5);
// Get the maxLengthString with count == numberOfStrings-1
cout << "Longest common substring == "<< LongestCommonSubstring(root,numberOfStrings-1) << endl;
return 0;
}
Output:
Longest common substring == ted
this will not work for below input
char * input1 = "antedraon";
char * input2 = "rabelatedly";
char * input3 = "rabigotedlray";
char * input4 = "closerafistedly";
char * input5 = "coldtahearatedly";
expected output was ra..
char * input1 = "antedraon";
char * input2 = "rabelatedly";
char * input3 = "rabigotedlray";
char * input4 = "closerafistedly";
char * input5 = "coldtahearatedly";
char * input6 = "indhaarate";
expected output 'ra'
I think your algorithm will break if a word visits nodes in the suffix tree more than once. Like if your inputs are AB, ABAB and CXY the LCS should be ''. But I think your algorithm will return AB because the A->B node will have been visited three times: once when building the initial suffix tree and twice while searching the second string's suffixes. According to wikipedia, you build a suffix tree using all the words and then do a DFS and keep track of bit vector of words that are found below each node.
If you changed your algorithm so each node had a bit vector that got updated on traversals then it seems like it would work.
1. For every char in the first string starting from position 0, do 2 - 4 below.
2. Find the same char in every other string. Keep their positions in a vector.
If not found in any string, loop back to 1.
3. Keep moving the positions in every string until they do not match. Return the substring;
4. If the substr is longer than the max str, replace the max string with it
5. return max str.
public static String longestCommonSubstring(String str, String str2)
{
String result = "";
int i=0;
int duration=0;
while (i<str.length())
{
int j=0;
while(j<str2.length())
{
if (i + duration < str.length() &&
j + duration < str2.length() &&
str.charAt(i + duration) == str2.charAt(j + duration))
{
duration++;
}
else
{
if (duration > result.length())
{
result = str.substring(i, i+duration);
}
j++;
duration =0;
}
}
i++;
}
return result;
}
using System;
using System.IO;
using System.Collections.Generic;
class FaceBook
{
public static void Main(){
string[] input={"coldheartedly","closefistedly","bigotedly","belatedly","antedlon"};
foreach(var l in LongestCommonSubString(input))
Console.WriteLine("Longest Common SubString={0}",l);
}
public static HashSet<string> LongestCommonSubString(string[] inputStrings){
HashSet<string> maxSet=new HashSet<string>();
int inputCount=inputStrings.Length;
string smallestString=inputStrings[inputCount-1];
for(int length=smallestString.Length ; length>=0;length--)
{
if(maxSet.Count>0)
return maxSet;
HashSet<string> tempSubs= new HashSet<string>();
tempSubs=getAllSubs(smallestString,length);
bool isCommon=true;
foreach(var sub in tempSubs)
{
isCommon=true;
for(int i=inputCount-2; i>=0;i--)
{
if(inputStrings[i].IndexOf(sub)==-1) {
isCommon=false;
break;
}
}
if(isCommon)
maxSet.Add(sub);
}
}
return maxSet;
}
private static HashSet<string> getAllSubs(string input,int length){
HashSet<string> subs =new HashSet<string>();
for(int i=0;i<=input.Length-length;i++)
{
subs.Add(input.Substring(i,length));
}
return subs;
}
}
///
private static String lcs(String a, String b) {
int max=0;
int ma=0, mb=0;
int[][] dp = new int[a.length()][b.length()];
for(int ia=0;ia<a.length(); ia++)
for(int ib=0;ib<b.length(); ib++) {
dp[ia][ib] = a.charAt(ia) == b.charAt(ib)? 1:0;
if(dp[ia][ib]>0 && ia>0 && ib>0) {
dp[ia][ib]+=dp[ia-1][ib-1];
}
if(dp[ia][ib]>max) {
max =dp[ia][ib];
ma=ia;
mb=ib;
}
}
return a.substring(ma+1-max, ma+1);
}
\\\
JavaScript solution:
function longest_common_starting_substring(arr1) {
var arr = arr1.concat().sort(),
a1 = arr[0], a2 = arr[arr.length - 1], L = a1.length, i = 0;
while (i < L && a1.charAt(i) === a2.charAt(i)) i++;
return a1.substring(0, i);
}
console.log(longest_common_starting_substring(['goo', 'google', 'go']));
We can use suffix tree to solve this problem. But can you write the necessary code in an interview on the blackboard correctly? That's difficult!!
- Can you write the code for suffix tree in an interview? March 25, 2013