Two Sigma Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.
CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.
Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.
Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.
A generic and general solution is hard (billions of integers, large integers (e.g. 64 bit) and the desire to be exact and fast in all situations, extremes), but you may have some constraints you can leverage e.g.:
- Chris April 26, 2017- integer size is only 10 or 16 bit: you could use counting sort technique
- do you know something about the distribution: can you tell a probability the median being within a certain range after seeing a few values?
- you might get away with less precision for the median? Maybe precise in a certain range and less if far away from expected?
- size of the stream, can it fit into memory?
- can you assume after you've seen a sample (e.g. 10'000 values) the median is reasonable stable?
The two classics are
- counting sort if median range can somehow be limited
let's assume you have 10 bit values, you count for each of the 1024 possible values
the number of occurences. you insert in O(1) and if you need to know the median,
it's in O(1), too --> easy
- two heaps if values fit in memory and integers are large
Min-heap (for values >= median) and Max-heap (for values < median)
if the next value from the stream is < Min-heap, insert it to the min heap otherwise to the max heap
if the two heap sizes differ by more than 1, take the top from the larger heap and put it to the smaller heap
to get the median: if both heaps are equal size, take the average of the bot tops otherwise take the top of the larger heap
--> insert O(lg(N)), get median: O(1)
maybe interesting could be, if you work with the two heaps but clean up memory from time to time by removing the tails on both heaps. You could for example say, when your heap exceeds 10'000 items, you purge 5'000 tail items, in the min heap this would mean for example to remove the 5'000 values (first quartile) or median of max-heap and do the symetrically the same with the min-heap (forth quartile) ...
depending on the nature of the distribution, chances are, your median never moves out the window and you get a O(1), O(1) solution since you limit N to a constant ...