Amazon Interview Question
Interns
Country: India
Interview Type: In-Person
Why can't you use something like this?
Hashtable<CompanyID, Hashtable<TimePeriod, StockValue>>
OR
HashMap<CompanyID, HashMap<TimePeriod, stockValue>>
The time period is arbitrary. Will you store an arbitrary number of time periods? Such a data structure has no bound on its size.
This is a good solution, but I would prefer a ConcurrentHashMap to a Hashtable.
HashMap<CompanyID, MIN-HEAP-BY-Stock-Value<Time>>
Searching: O(1)
HashMap gets the records of a company by company ID in O(1).
Getting the min-heap is O(1), so looking up the min stock value takes O(1).
Insertion: O(log n)
HashMap gets the record of a company by company ID in O(1).
Insertion into the min-heap takes O(log n).
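A minimal Java sketch of this heap-based idea (the class and method names are mine): the heap root gives the overall min price per company in O(1), though note it cannot answer a min over an arbitrary time period.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class StockStore {
    // companyId -> min-heap of {price, time}, ordered by price
    private final Map<String, PriorityQueue<double[]>> byCompany = new HashMap<>();

    // O(log n): hash lookup is O(1), heap insert is O(log n)
    public void record(String companyId, double price, long time) {
        byCompany.computeIfAbsent(companyId,
                k -> new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0])))
                .add(new double[]{price, time});
    }

    // O(1): peek at the heap root
    public double minPrice(String companyId) {
        PriorityQueue<double[]> heap = byCompany.get(companyId);
        return heap == null ? Double.NaN : heap.peek()[0];
    }
}
```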
Map company ID to array of stock values. The arrays are naturally sorted by time, as each write would simply append.
To determine the min stock value for an arbitrary time period, perform a linear scan over the array. This is O(n), where n is the number of stock values for a given company. The use of a heap would increase this to O(n lg n), as such a heap could not be constructed until the input time period is known.
Because the number of stock prices for a reasonable time period (i.e. a day) may be quite large, we may wish to reduce the breadth of our linear scan by segmenting the per-company-arrays (i.e. one per hour, or day, or whatever is appropriate for the situation).
We may further reduce the breadth of the linear scan by maintaining min values for certain time periods. If the input distribution is heavily skewed towards a particular type of time period (e.g. delimited by day), then these cached mins may dramatically reduce the linear-scan breadth -- potentially to O(1), depending on the input.
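A Java sketch of the segmented per-company array with cached per-bucket mins (bucket size and names are illustrative): a query over whole buckets uses the cached min, and only the partial edges are scanned element by element.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentedSeries {
    private final int bucketSize;                  // e.g. samples per hour
    private final List<double[]> buckets = new ArrayList<>();
    private final List<Double> bucketMins = new ArrayList<>();
    private int countInLast = 0;

    public SegmentedSeries(int bucketSize) { this.bucketSize = bucketSize; }

    // Appends in time order, keeping the current bucket's min up to date.
    public void append(double price) {
        if (buckets.isEmpty() || countInLast == bucketSize) {
            buckets.add(new double[bucketSize]);
            bucketMins.add(Double.POSITIVE_INFINITY);
            countInLast = 0;
        }
        int last = buckets.size() - 1;
        buckets.get(last)[countInLast++] = price;
        bucketMins.set(last, Math.min(bucketMins.get(last), price));
    }

    // Min over sample indices [from, to): cached mins for full buckets,
    // element-by-element scan only at the two partial edges.
    public double min(int from, int to) {
        double best = Double.POSITIVE_INFINITY;
        for (int i = from; i < to; ) {
            int b = i / bucketSize, off = i % bucketSize;
            if (off == 0 && i + bucketSize <= to) {    // whole bucket
                best = Math.min(best, bucketMins.get(b));
                i += bucketSize;
            } else {                                   // partial edge
                best = Math.min(best, buckets.get(b)[off]);
                i++;
            }
        }
        return best;
    }
}
```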
Need more info, but this seems like a dynamic problem (no preprocessing of an offline static array), which can be solved with segment trees and range-min queries.
Or buffer the intraday stock variations if the time slices are smaller than a day, and insert into the segment tree at appropriate intervals.
Or mix and match even more.
It depends on more info.
Every day or week (or whatever) we can move historic data offline and preprocess that part. Then we can run range-min queries across the offline and dynamic parts if need be and combine the results.
Answer changed to depends.
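For reference, a fixed-capacity range-min segment tree in Java (a sketch under the assumption that prices arrive in time order, so a "dynamic" insert is just a point update at the next free leaf; capacity and names are mine). Both insert and query are O(log n).

```java
import java.util.Arrays;

public class MinSegmentTree {
    private final double[] tree;  // 1-based heap layout; leaves at [n, 2n)
    private final int n;
    private int size = 0;

    public MinSegmentTree(int capacity) {
        n = capacity;
        tree = new double[2 * n];
        Arrays.fill(tree, Double.POSITIVE_INFINITY);
    }

    // O(log n): write the leaf, then refresh ancestors up to the root.
    public void append(double price) {
        int i = size++ + n;
        tree[i] = price;
        for (i /= 2; i >= 1; i /= 2)
            tree[i] = Math.min(tree[2 * i], tree[2 * i + 1]);
    }

    // O(log n): min over sample indices [from, to).
    public double min(int from, int to) {
        double best = Double.POSITIVE_INFINITY;
        for (int l = from + n, r = to + n; l < r; l /= 2, r /= 2) {
            if ((l & 1) == 1) best = Math.min(best, tree[l++]);
            if ((r & 1) == 1) best = Math.min(best, tree[--r]);
        }
        return best;
    }
}
```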
From the question I gather:
> maintaining stock values ...
Sounds like a timeseries. I would use two arrays for this: the first stores the number of seconds/milliseconds elapsed since the beginning of capture, and the second stores the value. Arrays are much faster and have less overhead; the only issue is managing them. Besides, you can binary-search on the time if required.
Follow up question:
Is modification to the data structure multi-threaded? If yes, use a Hashtable (or a ConcurrentHashMap) instead of a HashMap.
Now about application of the data structure:
HashMap<String, Object[]>
where the value is:
Object[] timeseries = new Object[] {
    new int[MAX_TIME_TO_STORE_IN_SECONDS],   // elapsed seconds
    new float[MAX_TIME_TO_STORE_IN_SECONDS]  // stock values
};
// assuming that the precision of the data structure is seconds
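A small sketch of the two-parallel-array idea with the binary search mentioned above (class and method names are mine; it assumes times are appended in increasing order):

```java
import java.util.Arrays;

public class ArraySeries {
    private final int[] times;      // seconds elapsed since start of capture
    private final float[] values;   // stock value at the matching index
    private int count = 0;

    public ArraySeries(int capacity) {
        times = new int[capacity];
        values = new float[capacity];
    }

    // Times must be appended in increasing order for binary search to work.
    public void append(int time, float value) {
        times[count] = time;
        values[count] = value;
        count++;
    }

    // Value recorded at the given time, or NaN if absent; O(log n).
    public float at(int time) {
        int i = Arrays.binarySearch(times, 0, count, time);
        return i >= 0 ? values[i] : Float.NaN;
    }
}
```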
A nice open ended question
Let's assume
1) Data is coming in continually.
2) The stock price varies frequently; I hear high-frequency traders count feet of fiber optic in some co-located data centers.
3) These same stock traders are going to want things to happen fast.
4) We might assume that the frequency of trades is also low relative to the granularity of the timeline.
So we end up with a sparse timeline.
This suggests some kind of binary tree; you could speed up query time by turning it into a range tree or something similar. You could put range information on each node to reduce search time, and additionally store the minimum value on these nodes.
However, binary trees don't do so well on machines with caches. Additionally, we will be continually adding data, and search trees start requiring rebalancing when you do that. If you assume 10 to 100 fluctuations in the price per second and you want to keep years of data around, you start getting beyond what you can reasonably keep in RAM. It is probably a good time to go back to the people who wanted this program and get some more details on how it will be used.
One good option might be to use trees with really big nodes, each of which contains an implicit tree structure. You might also consider moving all the actual data into the leaves, though I have not thought this through. The memory bottleneck also brings even the notion of cached ranges and minimum values into question when you get close to the leaves. Omitting this information will let you cram more data into a big leaf, and this may offset the time needed to compute the values by looking at all the data once you have it on the processor. So each big leaf would then be just a sorted list with times or time deltas and the price and/or price deltas.
Caching and restructuring of the tree based on expected data usage might also be worth considering. You could keep some extra pointers around based on this as well, so you don't need to grab multiple disparately stored nodes off the disk to make the queries that are of most interest. While the millisecond-to-millisecond fluctuations might matter over the last hour, such granularity in the stock's value 10 years ago may not be so important, so we could perhaps reduce the granularity of older data or just dump it after a certain point. This should not be too time-consuming a task, but in the fast-paced world of high-frequency trading it might be worth using a different core to do this and other housekeeping tasks.
Hashtable<CompanyID, HashMap<TimePeriod, StockValue>>
- Anonymous November 07, 2013