Interview Question
Country: United States
@dizhu2016: geeks4geeks, leetcode, topcoder or more academic, there are good CS courses on MIT OCW (open courseware) which I found very inspiring. Wikipedia has sometimes good explanations and not to forget CLRS (introduction to algorithms, cormen, leiserson, rivest, stein) and if you like math and consider yourself really smart, probably knuth's art of computer programming is worth it (I'd take a year or so in a very quite place to fully go through volume 1 I think...)
I just figured out a simple way to do this. Let me know what do you think.
The idea is to create a double type array with size equal to the number of clusters, name it as double d[]. d[i] is the number of machine assigned to cluster i. Though d[] should be an integer, we assume it is a double type first and a machine can be taken as a fraction.
d[i] = a[i] * total number of machine / total number of task
. This means we evenly divide all the task to each machine and work out the number of machines required for each clusters by a justifiable way despite machine needs be cutting into fraction. For the example of a[]=[200,450], a d[] should be create as [2.1538461538461537, 4.846153846153846]
Then we are going to make the adjustment. This means we are going to converge the double value of d[i] to the smaller closest integer value. This also equivalently means we are going to increase the number of tasks per machine in each cluster corresponding to the dropping value of d[i]. How much value dropped of d[i] by percentage is equal to how much the number of tasks per machine in each cluster raised by percentage.
Then the outcome should be simple to get. We should choose the cluster k which has d[k] with smallest drop by percentage after converged to nearest smaller integer. The answer is
a[i] / d[k]
.
For this example, we should choose cluster 0, since 2.1538461538461537 to 2 only dropped 6.9% and raise the number of task per machine by 6.9%, whereas for cluster 1, we will raise the number of task per machine by 21% if we change the 4.846153846153846 number of machine to 4.
Hopefully there is no loophole for this idea. To me this seems right and simple.
@dizhu2016: I think you run into numerical issues, there are three constraints
1) distribute exactly B machines, no more, no less
2) each cluster must have at least one machine
3) when you have one remaining machine, choose the cluster which have highest ceiling tasks per machine
this conditions are not easily satisfied in a closed formula, e.g.
a[] = {1, 1000}
B = 2
N = 2
d[0] = 1 * 2 // 1001 = 0
d[1] = 1000 * 2 // 1001 = 1
// denotes integer division, or (floor(a/b), "next smaller integer")
Issues with the above:
a) this is only one machine distributed
b) even with proper rounding you would get d[0] = 0 d[1] = 2, which violates the condition of at least one machine in a cluster
c) you can fix it and distribute the remaining machines after giving each cluster a machine, but the intuition whether this is right or wrong is not so straight forward, given constraint 3)
d) besides, there will be numerical issues, due to any kind of rounding it's not given that you will distribute exactly B machines, it could well be B+1 or B-1 due to rounding.
it's like if you distribute 10 candies to 20 people, and you can only give 0 or 1 to a person, if you do this with doubles and round, you either distribute 0 candies or 20 candies...
you can create a priority queue (max heap) where the key is the rounded up number of tasks per machine, that is (a[i]+b[i]-1)/b[i], assuming a[i] is the number of tasks for i 0<=i<N and i is an additional attribute of the queue item, b[i] = 1 for all i.
- Chris July 21, 2017Then you can distribute the B-N remaining machines by removing the top element from the queue and re-inserting it after b[i] was incremented. After this B-N operations the key of the top element is the answer. This will have time complexity of O(B-N).