Part 1: You are given a comput

Google Interview Question for Software Engineers

2

of 2 votes

3
Answers
Part 1: You are given a computer #1 with array Foo, a computer #2 with array Bar and a spare computer #3. You need to apply a function F to corresponding/matching elements of the two arrays. How would you do that?

Part 2: Once you scale up, how would you balance the number of machines sorting with the machines applying the function?

Part 3: What if the master (which is distributing the work) dies and never recovers?
- tested.candidate March 21, 2015 in UK | Report Duplicate | Flag | PURGE
Google Software Engineer Distributed Computing

Email me when people comment.

An error occurred in subscribing you.

Country: UK
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

Part 1: The arrays need to be sorted first and then the function can be applied. This can be accomplished by splitting the sorting between the machines, then merging the results and then exchanging the parts of the arrays for application of the function (a form of a map-reduce algorithm). Heap sort would be particularly useful as it produces consecutive (and sorted) elements during sorting which could already be used for application of the function.

Part 2: Either run a small subset first to get an idea if it is linear distribution and then divide statically according to that or try to adapt during processing depending on the load of the machines.

Part 3: Use master election algorithms (gossip algorithms, etc.).

- tested.candidate March 21, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Can you please clarify about the sorting?
From what I understand, by splitting the sorting between machines, are you sorting the arrays partially? I find it confusing when you're using map reduce and heap sort.

Please explain a little more about that.

Thanks.

- ravioactive March 23, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

why do you assume they would need to be sorted, what if they're objects and we can't sort them in any meaningful way?

- Anonymous November 11, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

1. I dont think we need to sort both arrays. (==nlog(n))
For the first one, hash every value and point it into the array. (this is log(n) as you run on the entire array
then, run on the second array, ad for each element , look if it exists on the hash, if it does, pull the element and operate F on it.

[There is an assumption that the values are fixed and it is not a stream of data]
2. balancing is relatively easy,hash the first array [or split the array, hash every part on different machines and combine the hashmaps --> which is a basic MapReduce).
then you can splot the second array and operate each machine on different part.
3. Per the first answer above, find a different master (Like done in Hadoop)

- superten May 11, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- ravioactive March 23, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I believe that by doing quick sort it feasible: split in the middle and send each part to a different machine and so on

- superten May 11, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I refer to Foo as A, Bar as B.

1. Assuming every element in A has exactly 1 corresponding element in B, sort both A and B then apply F(A[i], B[i]) for i = 0 to length of A or B.

We can do this with just machine 1 and 2. However we really should also utilize machine 3. One simple algorithm is to partition (like in quicksort) the arrays A and B into 24 parts (this is a power of 2 evenly divided by 3). Then put 8 parts of each array onto each machine. Each machine can then sort each sub-partition and run function F on corresponding values from A and B like mentioned above.

2. I assume by scale up, this means you end up with more spare machines. Same idea as above, we want to place the same number of elements onto each machine via partitioning the arrays such that each machine gets the same number of elements to process, then letting each machine handle their load at their own pace.The nice thing about the recursive partitioning is that it doesn't have to be done by a single machine. Once you partition an array once, the 2 halves now don't care about each other and can be treated independently on different machines. This means, once machine 1 breaks A into A[0...i] and A[i+1, A.length-1], A can offload A[i+1, A.length-1] to be processed by another machine.

3. Need to somehow distribute the control. In the extreme case, every machine could know what every other machine is working on and report process to each other. Ideally, you would have a balance of masters and slaves where only the master needs to coordinate slaves and slaves need to communicate with the master, but still know about other slaves in the same cluster. If the master is non-responsive, a new master needs to be created from another slave in the cluster, that slave needs to figure out what work was done and what work remains, and continue from there.

- JW March 25, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I like that you split and count at the same machine without a need to merge the result. However it works only if "matching" always means that A[i], B[i] are the ones "matching".

- Anonymous April 09, 2015 | Flag

CareerCup

Google Interview Question for Software Engineers

Books

Videos

Resume Review

Mock Interviews