Twitter Google Interview Question
Software Engineer / Developers Software Engineer in TestsCountry: United States
Interview Type: In-Person
That is a clean solution. Some notes:
* .substring is taking a lot of runtime from you. you could maintain an index.
* Also you would need to remove curWord from store once at the end of iteration.
In the Dictionary_implemented_as_a_trie based solution, we should look for longest matches in the dictionary, rather than spitting out every possible match. Then in the given text, we'd encounter words such as Google, is, awesome
I think the time complexity is going to be O(n^n) in worst case. Any thoughts on it?
Can we improve this?
Nope, this is not the best way. It can be solved in O(n^2). See remlostime solution below using dynamic programming and memoization.
My solution O(n) time:
public static void main(String[] args) {
Set<String> dictionary = new HashSet<String>();
dictionary.add("google");
dictionary.add("is");
dictionary.add("awesome");
StringBuilder sb = new StringBuilder("isgoogleawesome");
printWords2(sb, dictionary);
}
public static void printWords2(StringBuilder sb, Set<String> dict) {
StringBuilder currentWord = new StringBuilder();
for (int i = sb.length() - 1; i >= 0; i--) {
currentWord.insert(0, sb.charAt(i));
if (dict.contains(currentWord.toString()) && i != 0) {
sb.insert(i, " ");
currentWord.setLength(0);
}
}
System.out.println(sb);
}
#include <iostream>
#include <set>
#include <string>
using namespace std;
void print(int index, int f[], string &s)
{
if (index == 0)
return;
print(f[index], f, s);
string a(s, f[index], index - 1 - f[index] + 1);
if (f[index] != 0)
cout << ' ' << a;
else
cout << a;
}
int main()
{
int f[1000]; // record the previous successful sentence index, if -1 not success
string s = "Googleisawesome";
set<string> word;
word.insert("Google");
word.insert("is");
word.insert("awesome");
f[0] = 0;
for(int i = 1; i <= s.size(); i++)
{
f[i] = -1;
for(int j = 0; j <= i - 1; j++)
if (f[j] != -1)
{
int startIndex = j;
int endIndex = i - 1;
string a(s, startIndex, endIndex - startIndex + 1);
if (word.count(a)) // find one solution
{
f[i] = j;
break;
}
}
}
if (f[s.size()] == -1)
cout << "no answer" << endl;
else
print(s.size(), f, s);
}
Could you possibly load the 'Set<string> dictionary' into a trie data structure first? (insert Trie on wikipedia link here)
Inserting the trie nodes would take O(n) time <- where n is the total number of characters.
Then iterate through the 'text' string, character by character, and traverse the trie to find the matching words, inserting spaces into your output when a leaf node is reached or when there are no child nodes that match the next character <- also O(n). So in total would run in O(2n) time which reduces to O(n) time.
Not sure how you can build tree in O(n) time from unsorted array, and also what tree are you talking about? Assume that's correct, iterating character by character will to O(n) time and for each iteration you perform another tree traversal which take another O(logn) if your tree is a balanced BST/AVL tree. Running time total would be O(nlogn) in best case.
Tran, I'm talking about constructing a Trie (do a google search for 'Trie'), Trie is also know as a prefix tree. Its basically a tree that stores a single character at each node - while the root node is blank. Unlike binary-search-trees , tries have O(m) lookup time where m is the total number of characters in the word being searched for. Tries also have an O(m) insert time.
A path through a the Trie from the root node to a single leaf node, appending the character values along the path, would produce a word from the dictionary.
public class TrieNode
{
private char key;
private Dictionary<char, TrieNode> children;
public TrieNode()
{ }
public TrieNode(char key)
{
this.key = key;
}
public TrieNode AddChild(char key)
{
TrieNode node = null;
if(children == null)
children = new Dictionary<char,TrieNode>();
if (children.ContainsKey(key))
return children[key];
else
{
node = new TrieNode(key);
children.Add(key, node);
}
return node;
}
public TrieNode GetChild(char key)
{
if (children != null && children.ContainsKey(key))
return children[key];
return null;
}
}
public class Trie
{
private TrieNode root = new TrieNode();
public void Add(string word)
{
TrieNode node = root;
for(int i = 0; i < word.Length; i++)
{
node = node.AddChild(word[i]);
}
}
public string AddSpaces(string sentence)
{
StringBuilder builder = new StringBuilder();
TrieNode node = root;
int i = 0;
while(i < sentence.Length)
{
char key = sentence[i];
node = node.GetChild(key);
if (node == null) // End of word, add space
{
builder.Append(" ");
node = root;
}
else
{
builder.Append(key);
i++;
}
}
return builder.ToString();
}
}
//-------------------------------------
static void Main(string[] args)
{
// dictionary of valid words
string[] dictionary = new string[] { "from", "waterloo", "hi", "am", "yes", "i", "a", "student" };
// add words to trie
Trie trie = new Trie();
for (int i = 0; i < dictionary.Length; i++)
{
trie.Add(dictionary[i]);
}
// get sentence with spaces
string withSpaces = trie.AddSpaces("iamastudentfromwaterloo");
Console.WriteLine(withSpaces); // outputs: i am a student from waterloo
Console.Read();
}
Not quite:
- Add "iamas" to dictionary, your program will fall in infinite loop
- Fix this, and still will fail, as -like I said below- greedy algorithms won't work here
We can use Dijkstra's approach to solve this problem. Using that approach, the time complexity should be very close to O(N).
I'm using a C++ map, which has a O(lg N), but replacing it with a hash_map, would give O(1). The size of the priority_queue (heap) is close to zero in the worst case, where nothing matches; so the O(lg N) for pq.top() should be close to constant time as well. Hence, the overall O(N), where N = size of string.
#include <iostream>
#include <map>
#include <string>
#include <queue>
using namespace std;
class Recon {
private:
struct Node {
int start;
int cost;
vector<string> words;
void Print() {
cout << "Printing words: ";
for (int i = 0; i < words.size(); ++i) {
cout << words[i] << " ";
}
cout << endl;
}
};
struct MyComp {
bool operator() (Node* l, Node* r) {
if (l->cost != r->cost) {
return l->cost > r->cost;
}
return l->start < r->start;
}
};
map<string, bool> prefixes;
string data;
public:
void Input() {
data = "jesslookedjustliketimherbrother";
vector<string> words;
words.push_back("looked");
words.push_back("look");
words.push_back("just");
words.push_back("li");
words.push_back("like");
words.push_back("her");
words.push_back("brothe");
words.push_back("brother");
for (int i = 0; i < words.size(); ++i) {
string w = words[i];
for (int j = 1; j <= w.length(); ++j) {
string sub = w.substr(0, j);
prefixes[sub] |= (sub.length() == w.length());
cout << "sub, val: " << sub << " " << prefixes[sub] << endl;
}
}
}
int minCost() {
Node* x = new Node;
x->start = 0;
x->cost = data.length();
priority_queue<Node*, vector<Node*>, MyComp> pq;
pq.push(x);
// int min_cost = x->cost + 1;
while (!pq.empty()) {
Node* n = pq.top();
pq.pop();
cout << "***n: " << n->start << " str: " << data.substr(0, n->start) << " cost: " << n->cost << " queue: " << pq.size() << endl;
int idx = n->start;
if (idx >= data.length()) {
n->Print();
return n->cost; // this should be min.
}
// Proceed pointer ahead, assuming we can't parse the cur char.
if (idx + 1 <= data.length()) {
Node* d = new Node(*n);
d->start = idx + 1;
d->cost = n->cost;
pq.push(d);
}
int j = idx + 1;
string sub = data.substr(idx, j - idx);
bool found_match = false;
while (prefixes.find(sub) != prefixes.end()) {
found_match = true;
if (prefixes[sub]) {
Node* d = new Node(*n);
d->start = j;
d->cost = n->cost - sub.length();
d->words.push_back(sub);
pq.push(d);
}
++j;
if (j > data.length()) break;
sub = data.substr(idx, j - idx);
} // end while
delete n;
}
return -1;
}
};
int main() {
Recon con;
con.Input();
cout << "Min cost: " << con.minCost() << endl;
return 0;
}
One clarification, your string also gives rise to
1.Go ogle is awe some.
2.Go ogle is awe so me.
Are we to find out all possible such interpretations.
dictionary : we can find the appropriate words by using tries ..initially it will take each character and check the relevant words which it can form till the end . and it will take the break at the end of each word . like that it traverse till the end and give the appropriate sentence to us .
I assume this algorithm will take O(nlgm) time, in addition to the time for building the trie.
I assume this algorithm will take O(nlgm) time, in addition to the time for building the trie.
I assume this algorithm will take O(nlgm) time, in addition to the time for building the trie.
Call the string S.
Assuming the dictionary D is stored in a trie and that all words in S are stored in D, then in Python:
out = ''
node = D.getRoot()
for s in S:
out += s
node = node.goToChild(s)
if node.isLeaf():
out += ' '
node = D.getRoot()
print out
Find all the dictionary words starting with 'G', e.g. 'Go', 'Google' etc. Create a graph to store the following letters. e.g. G-> {o, i}. Repeat this for each of these new elements added in the graph (i.e. o and i). Keep building the graph. When finished, do BFS to find all the paths till the last one.
Create a trie which contains all the words in the dictionary .
bool IfCombinedWithWOrds(Trie* root, char*str)
{ if(*str=='\0') return true;
current=root;
while(*str!='\0')
{
if(current->Sub[*str-'a']==null) break;
if(current->IsEnd)&&IfCombinedWithWOrds(root,str+1) return true;
current=current->Sub[*str-'a'];
str++;
}
return false;
}
}
public ArrayList<String> getWordSequences(Set<String> dict, String text) {
ArrayList<String> results = new ArrayList<String>();
getWordSequences(dict, results, text, new Stack<String>());
return results;
}
private void getWordSequences(Set<String> dict, ArrayList<String> results,
String text, Stack<String> currResult) {
if (text.length() == 0) {
String result = getSentence(currResult);
if (!results.contains(result))
results.add(result);
}
else {
String leftWord;
for (int i = 1; i < text.length(); i++) {
leftWord = text.substring(0, i);
if (dict.contains(leftWord)) {
currResult.add(leftWord);
getWordSequences(dict, text.substring(i));
currResult.pop();
}
}
}
}
private String getSentence(Stack<String> r) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < r.size(); i++)
result.append(r.get(i) + " ");
return result.toString().trim();
}
public static String formSentence(String sentenceWithoutSpace, int start,
int end, String result) {
if (sentenceWithoutSpace.isEmpty())
return result;
if (end > sentenceWithoutSpace.length())
return "-1";
String possibleWord = sentenceWithoutSpace.substring(start, end);
if (checkDictionary(possibleWord)) {
String output = formSentence(sentenceWithoutSpace.substring(end),
0, 1, result + " " + possibleWord);
if (!output.equals("-1"))
return output;
}
return formSentence(sentenceWithoutSpace, start, end + 1, result);
}
static String[] valid = { "go", "google", "i", "is", "awe", "some",
"awesome", "a", "so" };
public static boolean checkDictionary(String word) {
if (word == null)
return false;
for (int i = 0; i < valid.length; i++) {
if (valid[i].equals(word))
return true;
}
return false;
}
public static void main(String[] args) {
System.out.println(formSentence("googleisawesome", 0, 1, ""));
}
Solution to this can be found at
thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/
bool SplitIntoWords(string s, vector<string>& r)
{
if (s.empty())
{
return true;
}
TrieNode* tpos = &root;
stack<int> st;
for(int i=0; tpos && i<s.length(); ++i)
{
if (tpos->IsFinal())
{
st.push(i);
}
tpos = tpos->GetPath(s[i]);
}
if (tpos && tpos->IsFinal())
{
st.push(s.length());
}
while(!st.empty())
{
r.push_back(s.substr(0, st.top()));
bool found = SplitIntoWords(s.substr(st.top(), s.length()-st.top()), r);
st.pop();
if (found)
{
return true;
}
else
{
r.pop_back();
}
}
return false;
}
The above algorithm is wrong, it will return "i a" instead of the right answer since you algorithm does take into account "a" and "am".
Thats incorrect. It works fine for the example string provided "iamastudentfromwaterloo".
However this algorithm is greedy and would have problems if say "loo" was a dictionary word as well, and the desired sentence was supposed to be:
"i am a student from water loo"
this algorithm would still output:
"i am a student from waterloo"
// running time has to be at least as good as O(n)
what does n mean here? The length of input string or length of dictionary?
Interesting - although not possible. Looking at the following case:
string: abc
set: a ab b bc
This is clearly O(n^2), plus you need to use backtracking to ensure you exhausted all possibilities.
This version build upon BoredCoder idea, this version assume there is a good way to concatenate two string rather than making a copy of two string like python. But too lazy to write in java or C:
def getSentence(string,dic):
result = ""
start = 0
counter = 1
while (counter < len(string)):
temp = string[start:counter]
if temp in dic:
j = counter
while ( temp in dic):
j+=1
temp = string[start:j]
result += string[start:j-1] + " "
start = j - 1
counter = start + 1
else:
counter += 1
result += string[start:counter]
return result
Just recursive algo.
Starting from 0, substring till i. If it is valid, we check if we can insert spaces in the substring i to end. Since the subproblem reduces to the original problem. This can probably be done using dynamic programming.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> findSpaces(string input, vector<string> words)
{
vector<string> wspace;
if(input.length() == 0)
return wspace;
if(words.size() == 0)
{
cout << input << endl;
return wspace;
}
int flag = 0, i = 0;
while(flag == 0)
{
string subs = input.substr(i+1);
vector<string>::iterator it = std::find(words.begin(), words.end(), input.substr(0, i+1));
if( it != words.end())
{
if(i+1 != input.length())
wspace = findSpaces(subs, words);
if(i+1 == input.length() || wspace.size() > 0)
{
wspace.push_back(input.substr(0,i+1));
flag = 1;
}
}
i++;
if(i == input.length())
flag = 1;
}
return wspace;
}
int main(int argc, char **argv)
{
vector<string> words;
int n;
cin >> n;
string input;
getline(cin, input);
for(int i = 0; i<n; i++)
{
getline(cin, input);
words.push_back(input);
}
getline(cin, input);
vector<string> wordSpace = findSpaces(input, words);
cout << " WORDS " << endl;
for(int i = 0; i < wordSpace.size(); i++)
cout << wordSpace[i] << " ";
cout << endl;
}
This is a recursive depth first search with backtracking.
Personally I do not believe this approach qualifies as Dynamic Programming (recursion is not dynamic programming).
This will not deliver the result in O(n) time...
On a side note, you could implement some pruning such that searches do not go any deeper than the length of the longest word in the dictionary.
I.e. Theres no point searching for dictionary words that are 20 characters long when the longest word in the dictionary is only 8 charactes long..
def getSentence(txt,dico):
start = 0
cur = 0
if len(txt) == 0:
return None
while cur < len(txt):
cur +=1
if txt[start:cur] in dico:
res = getSentence(txt[cur:len(txt)],dico)
if res is not None:
return txt[start:cur]+' '+res
if txt[start:cur] in dico:
return txt[start:cur]
else:
return None
def getSentence(txt,dico):
start = 0
cur = 0
if len(txt) == 0:
return None
while cur < len(txt):
cur +=1
if txt[start:cur] in dico:
res = getSentence(txt[cur:len(txt)],dico)
if res is not None:
return txt[start:cur]+' '+res
if txt[start:cur] in dico:
return txt[start:cur]
else:
return None
def getSentence(txt,dico):
start = 0
cur = 0
if len(txt) == 0:
return None
while cur < len(txt):
cur +=1
if txt[start:cur] in dico:
res = getSentence(txt[cur:len(txt)],dico)
if res is not None:
return txt[start:cur]+' '+res
if txt[start:cur] in dico:
return txt[start:cur]
else:
return None
void GetSentence(string text,vector<string> dictionary)
{
deque<pair<string,int>> my_deque;
hash_set<string> my_hash;
int N=dictionary.size();
vector<int> temp(text.length());
fill(temp.begin(),temp.end(),-1);
for(int i=0;i<N;i++)
my_hash.insert(dictionary[i]);
my_deque.push_back(pair<string,int>::pair("",-1));
while(!my_deque.empty())
{
pair<string,int> top_string=my_deque.back();
int cur_pos=top_string.second;
if(cur_pos==text.length()-1)
break;
bool flag=false;
for(int i=cur_pos+1;i<text.length();i++)
{
string sub_string=text.substr(cur_pos+1,i-cur_pos);
hash_set<string>::iterator iter=my_hash.find(sub_string);
if(iter!=my_hash.end() && i>temp[cur_pos+1])
{
my_deque.push_back(pair<string,int>::pair(sub_string,i));
temp[cur_pos+1]=i;
flag=true;
break;
}
}
if(!flag)
my_deque.pop_back();
}
if(!my_deque.empty())
my_deque.pop_front();
while(!my_deque.empty())
{
pair<string,int> cur_pair=my_deque.front();
my_deque.pop_front();
cout<<cur_pair.first<<" ";
}
cout<<endl;
}
non recursive:
void GetSentence(string text,vector<string> dictionary)
{
deque<pair<string,int>> my_deque;
hash_set<string> my_hash;
int N=dictionary.size();
vector<int> temp(text.length());
fill(temp.begin(),temp.end(),-1);
for(int i=0;i<N;i++)
my_hash.insert(dictionary[i]);
my_deque.push_back(pair<string,int>::pair("",-1));
while(!my_deque.empty())
{
pair<string,int> top_string=my_deque.back();
int cur_pos=top_string.second;
if(cur_pos==text.length()-1)
break;
bool flag=false;
for(int i=cur_pos+1;i<text.length();i++)
{
string sub_string=text.substr(cur_pos+1,i-cur_pos);
hash_set<string>::iterator iter=my_hash.find(sub_string);
if(iter!=my_hash.end() && i>temp[cur_pos+1])
{
my_deque.push_back(pair<string,int>::pair(sub_string,i));
temp[cur_pos+1]=i;
flag=true;
break;
}
}
if(!flag)
my_deque.pop_back();
}
if(!my_deque.empty())
my_deque.pop_front();
while(!my_deque.empty())
{
pair<string,int> cur_pair=my_deque.front();
my_deque.pop_front();
cout<<cur_pair.first<<" ";
}
cout<<endl;
}
A Perl solution that uses a trie and finds all possible segmentations:
use Modern::Perl;
segment("Googleisawesome");
sub segment {
my ($string, $t) = @_;
unless(defined $t) {
$t = Trie->new;
$t->insert("Google");
$t->insert("is");
$t->insert("awesome");
$t->insert("Go");
$t->insert("ogle");
}
_segment($string, 0, $t, []);
}
sub _segment {
my ($s, $index, $trie, $result) = @_;
my @prefixes = $trie->prefixes($s, $index);
for my $prefix (@prefixes) {
my $l = length($prefix);
say "Result: @$result $prefix" if $index+$l >= length($s);
find($s, $index+$l, $trie, [@$result, $prefix]);
}
}
package Trie;
use Modern::Perl;
sub new {
my $class = shift;
bless {
val => undef,
cld => {},
}, ref $class || $class;
}
sub insert {
push @_, 0;
goto &_insert;
}
sub _insert {
my ($self, $string, $index) = @_;
if($index >= length($string)) {
$self->{val} = 1;
return;
}
my $c = substr($string, $index, 1);
$self->{cld}{$c} = $self->new unless $self->{cld}{$c};
return $self->{cld}{$c}->_insert($string, $index+1);
}
sub prefixes {
push @_, $_[2];
goto &_prefixes;
}
sub _prefixes {
my ($self, $string, $index, $startindex) = @_;
return if $index > length($string);
my $c = substr($string, $index, 1);
my @rest;
$self->{cld}{$c}
and @rest = $self->{cld}{$c}->_prefixes($string, $index+1, $startindex);
$self->{val}
and return @rest, substr($string, $startindex, $index-$startindex);
return @rest;
}
impl in node.js,
use the trie to store the dictionary and lookup with o(1), backtrack the substring and do greedy match.
'use strict'
var Trie = require('./trie'), trie = new Trie();
trie.insert('i');
trie.insert('am');
trie.insert('from');
trie.insert('waterloo');
trie.insert('hi');
trie.insert('student');
trie.insert('a');
trie.insert('water');
trie.insert('stu');
function getSentence(sentence){
var newSentence = [];
var subString = sentence[0];
for (var i=1;i<sentence.length;i++){
if(isWord(subString+sentence[i])){
subString = subString+ sentence[i];
}else{
newSentence.push(subString);
subString = sentence[i];
}
}
newSentence.push(subString);
return newSentence.join(' ');
}
function isWord(string){
var found = trie.find(string) || trie.countPrefix(string)>0;
return found;
}
console.log(getSentence('iamastudentfromwaterloo'));
I wrote of the solution on my own, but the most liked solution (in java) is exactly same. Here's my solution in C++:
#include <iostream>
#include <string>
#include <set>
#include <iomanip>
using namespace std;
bool GetNextWord(const string& s, const set<string>& dict, string& ans)
{
string nextAns;
for (unsigned i = 1; i <= s.size(); i++) {
ans.assign(s, 0, i);
//cout << "Trying : " << ans << " -> " << (dict.find(ans) != dict.end()) << endl;
if (dict.find(ans) != dict.end() &&
(i == s.size() ||
GetNextWord(string(s,ans.size()), dict, nextAns))) {
ans.append(" ");
ans.append(nextAns);
return true;
}
}
return false;
}
string GetSentence(string s, set<string>& dict)
{
string ret = s;
if (GetNextWord(s, dict, ret)) {
return ret;
}
return "No Solution exist";
}
int main(int argv, char* argc[])
{
// populate dictionary
string dicArr[] = {"from", "waterloo", "hi", "am", "yes", "i", "a", "student"};
unsigned cn = sizeof(dicArr) / sizeof(string);
set<string> dict(dicArr, dicArr + cn);
// problem
string str("iamastudentfromwaterloo");
cout << setw(10) << "Input : " << str << endl;
cout << setw(10) << "Output : " <<GetSentence(str, dict) << endl;
return 0;
}
#include <iostream>
#include <string>
#include <set>
#include <iomanip>
using namespace std;
bool GetNextWord(const string& s, const set<string>& dict, string& ans)
{
string nextAns;
for (unsigned i = 1; i <= s.size(); i++) {
ans.assign(s, 0, i);
//cout << "Trying : " << ans << " -> " << (dict.find(ans) != dict.end()) << endl;
if (dict.find(ans) != dict.end() &&
(i == s.size() ||
GetNextWord(string(s,ans.size()), dict, nextAns))) {
ans.append(" ");
ans.append(nextAns);
return true;
}
}
return false;
}
string GetSentence(string s, set<string>& dict)
{
string ret = s;
if (GetNextWord(s, dict, ret)) {
return ret;
}
return "No Solution exist";
}
int main(int argv, char* argc[])
{
// populate dictionary
string dicArr[] = {"from", "waterloo", "hi", "am", "yes", "i", "a", "student"};
unsigned cn = sizeof(dicArr) / sizeof(string);
set<string> dict(dicArr, dicArr + cn);
// problem
string str("iamastudentfromwaterloo");
cout << setw(10) << "Input : " << str << endl;
cout << setw(10) << "Output : " <<GetSentence(str, dict) << endl;
return 0;
}
How about this one?
public static String getSentence(String inputSent, String[] dictonary){
char[] incharArr = inputSent.toCharArray();
List<String> dictList = Arrays.asList(dictonary);
String outPutStr = "";
String temp = "";
for(int i=0; i<incharArr.length; i++){
temp += incharArr[i];
if(dictList.contains(temp)){
outPutStr += temp+" ";
temp = "";
}
}
return outPutStr;
}
// c++ Code
string getSentence(string text, set<string> dictionary)
{
string finalstring = "" ;
for ( int i = 1 ; i<= text.length(); i++)
{
string firsthalf = text.substr(0,i) ;
string secondhalf = text.substr(i) ;
if (dictionary.find(firsthalf) != dictionary.end())
{
finalstring += firsthalf ;
finalstring += " " ;
text = secondhalf ;
i = 1 ;
}
}
return finalstring ;
}
import java.util.HashSet;
import java.util.Set;
public class SeperateIntoSentence {
public static String getString(String text, Set<String> Dictionary){
if(text.length() == 0)
return "";
for(int i = text.length(); i >= 1; i--){
String s = text.substring(0, i);
if(Dictionary.contains(s)){
return s+" "+getString(text.substring(i), Dictionary);
}
}
return "";
}
public static void main(String[] args){
Set<String> Dictionary = new HashSet<String>();
Dictionary.add("I");
Dictionary.add("am");
Dictionary.add("a");
Dictionary.add("smart");
Dictionary.add("guy");
System.out.println(getString("Iamasmartguy", Dictionary));
}
}
Assume the O(n) means O(n * m) where n is the number of char in the text and m is number of strings in the dict.
I think we could do:
1. KMP for each one of the m dictionary strings that's O(n * m).
2. on Each KMP pass, marked down a set under each one of the n char in the original string text, the length of the current dict word.
for example, if we have dictionary word "water", we have text "waterloowaterfall", then under the first 'w' we have +5, and under the second 'w' we have +5 as well.
assume we have at most m different length, that could took up to O(n*log(m))
3. do one dimension DP.
dp[i] = true if any previous valid string ended up at word number [i-1];
initially dp[0] = true; all others are false;
sweep from 1 - n, if dp[i] is true, then dp[i] + any sets we talked about above, is set to true;
this pass is also O(n*m) as there's at most m elements in the each character.
There's an O(n * m) solution.
Here is C++ Solution...!!!
#include<iostream>
#include<set>
#include<string>
using namespace std;
bool insertSpace(string str, set<string>& dict, vector<string>& store){
set<string>::iterator it;
if (!str.length())
return true;
for (int i = 1; i <= str.length(); ++i){
string s = str.substr(0, i);
it = dict.find(s);
if (it != dict.end() && insertSpace(str.substr(i), dict, store)){
store.push_back(*it);
return true;
}
}
return false;
}
int main(){
set<string> dict;
string input("iamstudentfromwaterloo");
dict.insert("from");
dict.insert("waterloo");
dict.insert("hi");
dict.insert("am");
dict.insert("yes");
dict.insert("i");
dict.insert("a");
dict.insert("student");
vector<string> result;
insertSpace(input, dict, result);
for (int i = result.size()-1; i >=0;i--)
cout<<result[i]<<" ";
cout << endl;
return 0;
}
Implementation in Scala. Validated against provided test
def getSentence(str: String, dict: Set[String]): String = {
def loop(acc: String = "", remaning: String = str): String = {
remaning match {
case x if x.isEmpty =>
acc
case x =>
val head = x.head
val seqView = remaning.indices.view map(i => remaning.take(i + 1)) filter dict.contains
seqView match {
case a if a.isEmpty =>
val newAcc = if (acc.isEmpty) head.toString else acc + " " + head.toString
loop(newAcc, remaning.tail)
case a =>
val l = a.last
val acc1: String = if (acc.isEmpty) l else acc + " " + l
loop(acc1, remaning.drop(l.length))
}
}
}
loop()
}
def getSentence(str: String, dict: Set[String]): String = {
def loop(acc: String = "", remaning: String = str): String = {
remaning match {
case x if x.isEmpty =>
acc
case x =>
val head = x.head
val seqView = remaning.indices.view map(i => remaning.take(i + 1)) filter dict.contains
seqView match {
case a if a.isEmpty =>
val newAcc = if (acc.isEmpty) head.toString else acc + " " + head.toString
loop(newAcc, remaning.tail)
case a =>
val l = a.last
val acc1: String = if (acc.isEmpty) l else acc + " " + l
loop(acc1, remaning.drop(l.length))
}
}
}
loop()
}
def getSentence(txt, dico):
start = 0
cur = 0
d=[]
t = 0
if len(txt) == 0:
return None
while cur < len(txt):
cur += 1
item = txt[start:cur]
if item in dico:
t = cur+1
temp = txt[start:t]
if temp in dico:
d.append(txt[start:t])
t=t+1
elif ''.join(d[-1:]) != item:
d.append(item)
txt = txt[cur:]
cur = 0
else:
txt = txt[cur:]
cur = 0
print ' '.join(d)
txt = 'iamastudentfromwaterloo'
dico = {"from", "waterloo", "hi", "am", "yes", "i", "a", "student"}
getSentence(txt,dico)
def getSentence(txt, dico):
start = 0
cur = 0
d=[]
t = 0
if len(txt) == 0:
return None
while cur < len(txt):
cur += 1
item = txt[start:cur]
if item in dico:
t = cur+1
temp = txt[start:t]
if temp in dico:
d.append(txt[start:t])
t=t+1
elif ''.join(d[-1:]) != item:
d.append(item)
txt = txt[cur:]
cur = 0
else:
txt = txt[cur:]
cur = 0
print ' '.join(d)
txt = 'iamastudentfromwaterloo'
dico = {"from", "waterloo", "hi", "am", "yes", "i", "a", "student"}
getSentence(txt,dico)
1. int start = 0; String res = " ";
2. for int j=1; j<text.length(); j++
3. String test = text.substring(start, j);
4. if(test) exists in dictionary then res += test + " "; start = j;
5. After for-loop finishes; return res;
This sounds like a backtracking problem.
- smffap February 28, 2012