Amazon Interview Question
SDE-2s | Country: India
A Bloom filter seems like a feasible solution: read the file sequentially in chunks and compare each word against the filter. It needs two scans of the file.
1. During the first scan, add every word you encounter to a Bloom filter b1. Whenever a word tests positive in b1 (i.e., it was seen before), add it to a second Bloom filter b2.
2. During the second scan, drop every word that tests positive in b2.
The remaining words have frequency 1 and appear in their original order.
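The two-pass idea above can be sketched as follows. This is a minimal illustration, not the poster's exact code: the Bloom filter parameters (one megabit, 4 hashes) are arbitrary, and since a standard Bloom filter does not support deletion, repeated words are simply added to b2 rather than "removed" from b1 as the answer phrases it. False positives in b2 could wrongly drop a rare unique word; a counting filter or a verification pass would fix that.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""
    def __init__(self, m=1 << 20, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, word):
        # Derive k positions by salting the word with the hash index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, word):
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, word):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(word))

def unique_words_in_order(read_words):
    """read_words: zero-argument callable returning a fresh iterator
    over the file's words, so the 'file' can be scanned twice."""
    b1, b2 = BloomFilter(), BloomFilter()
    # Scan 1: b1 marks "seen at least once", b2 marks "probably repeated".
    for w in read_words():
        if w in b1:
            b2.add(w)
        else:
            b1.add(w)
    # Scan 2: emit words not flagged as repeated, in original order.
    for w in read_words():
        if w not in b2:
            yield w
```

For example, on the word stream `a b c a d b e` this yields `c`, `d`, `e` in order.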
Since RAM is much smaller than the file, we can't load the whole file into memory at once, hence the chunking logic:
1. Divide the file into 30-40 chunks such that at least two chunks fit in memory at a time.
2. Load the i-th chunk into memory. Its words will be compared against the remaining chunks, one at a time.
(a) Load the following chunks one by one and remove from each any word that also occurs in the i-th chunk. After each chunk is processed this way, write it back to the file at its designated position.
(b) Remove the duplicate words within the i-th chunk itself, then write it back to the file at its designated position as well.
3. Repeat step 2 for every chunk until we reach EOF (end of file).
Note for step 2: the start position of the next chunk to be compared is the position immediately after the end of the previously updated chunk (the one written back to the file in 2(b)).
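A sketch of the pairwise chunk comparison, with small in-memory word lists standing in for the on-disk chunks (the real version would read and write chunk files). Note this deduplicates the file, keeping each word's first occurrence, which is how step 2 reads; it differs from the Bloom-filter variant above, which removes every occurrence of a repeated word.

```python
def dedupe_chunks(chunks):
    """Deduplicate words across chunks, keeping first occurrences.

    `chunks` is a list of word lists standing in for the on-disk
    chunks; conceptually only two chunks are in memory at a time.
    """
    for i in range(len(chunks)):
        # Step 2(b): drop repeats inside chunk i, keeping first occurrences.
        seen, deduped = set(), []
        for w in chunks[i]:
            if w not in seen:
                seen.add(w)
                deduped.append(w)
        chunks[i] = deduped  # "write back" the updated chunk
        # Step 2(a): remove from every later chunk any word in chunk i.
        words_i = set(chunks[i])
        for j in range(i + 1, len(chunks)):
            chunks[j] = [w for w in chunks[j] if w not in words_i]
    return chunks
```

On disk this costs roughly one full rewrite of the remainder of the file per chunk, i.e. O(n^2) in the number of chunks, which is the price of staying within the memory budget without sorting.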
Since we are constrained to 100MB of RAM, we can do an external (disk-based) sort of the file in chunks. Once the chunks are sorted, we load each sorted chunk and compare adjacent words to determine whether a word is unique. Once we have the unique words, we revisit the original chunks to restore their order of first appearance.
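The external-sort approach can be sketched like this. It is a toy model: in-memory lists stand in for the sorted run files on disk, and the tiny `chunk_size` stands in for "whatever fits in 100MB"; a real implementation would write each sorted run to a temp file and stream the merge.

```python
import heapq
import itertools

def unique_via_external_sort(words, chunk_size=3):
    """Find frequency-1 words using sorted runs plus a k-way merge."""
    # Phase 1: sort fixed-size runs (each run fits in the memory budget).
    runs = [sorted(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]
    # Phase 2: k-way merge the runs; equal words become adjacent, so a
    # word is unique iff its group in the merged stream has length 1.
    unique = {w for w, grp in itertools.groupby(heapq.merge(*runs))
              if sum(1 for _ in grp) == 1}
    # Phase 3: rescan the original stream to restore original order.
    return [w for w in words if w in unique]
```

This costs two sequential passes plus the merge, and unlike the Bloom-filter variant it gives exact answers with no false-positive risk.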
- Beginner December 16, 2017