pulkit.mehra.001
Given a huge file with 100 million integers. The interviewer further divided the file into 100 files with 1 million integers in each, and each file is sorted. We need to find the k smallest integers.
- pulkit.mehra.001 in India
I used the concept of a min-heap. Take the first element from each file and construct a min-heap. Extract the root, as it is the smallest element, and insert the next element from the file that contained the root. Heapify the tree and repeat k times.
The interviewer asked if a more efficient method exists.
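The heap-based merge described above can be sketched as follows. This is a minimal sketch: the 100 files are modeled as sorted in-memory vectors (real code would stream from disk), and the function name is mine, not from the thread.

```cpp
#include <functional>
#include <queue>
#include <tuple>
#include <vector>
using namespace std;

// Find the k smallest values across several sorted "files"
// (modeled here as sorted vectors; real code would stream from disk).
vector<int> kSmallest(const vector<vector<int>>& files, int k) {
    // (value, file index, index within that file)
    typedef tuple<int, int, int> Entry;
    priority_queue<Entry, vector<Entry>, greater<Entry>> heap;
    for (int f = 0; f < (int)files.size(); f++)
        if (!files[f].empty())
            heap.push({files[f][0], f, 0});   // first element of each file
    vector<int> result;
    while ((int)result.size() < k && !heap.empty()) {
        auto [v, f, i] = heap.top();
        heap.pop();
        result.push_back(v);                  // root is the global minimum
        if (i + 1 < (int)files[f].size())     // insert successor from same file
            heap.push({files[f][i + 1], f, i + 1});
    }
    return result;
}
```

Each extraction costs O(log m) for m files, so k extractions cost O(k log m) overall.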
Amazon SDE1 Algorithm
Amazon has many visitors to its site, and it tracks which pages each customer visited, among other data.
- pulkit.mehra.001 in India
Design an efficient data structure for storing 3 days of information about all customers who visited the site on exactly two different days and searched more than 3 unique pages of the site across those 2 days.
So whoever visited the site on exactly two of these three days and visited more than 3 unique pages should be in the contact list.
What is an efficient approach to solving these kinds of problems?
Amazon SDE1 Data Structures
We can even have a HashMap<word, frequency> along with a min-heap.
- pulkit.mehra.001 February 12, 2016
Use MergeSort to sort the array, keeping track of the elements' original positions in a separate index[] array.
Traverse through the array and find the max range.
I think the complexity will be linear in k
This relies on the fact that the files are sorted and uses the approach of merging two sorted files.
Suppose we have 2 sorted files with 100 elements in each and we need the 10 smallest elements. Then at most 10 comparisons are needed.
Compare the 1st element of file 1 with the 1st element of file 2, take the smaller, and move to the next element in that file, and so on.
So for k elements the number of comparisons is O(k).
We do this across all 100 files, finding the smallest k elements by repeated merging, so the overall time complexity is O(100k).
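The two-file case above can be sketched like this; the function name and vector-based modeling of the files are mine:

```cpp
#include <vector>
using namespace std;

// First k elements of the merge of two sorted arrays:
// at most k comparisons, and only the first k entries of each input are touched.
vector<int> firstKOfMerge(const vector<int>& a, const vector<int>& b, int k) {
    vector<int> out;
    size_t i = 0, j = 0;
    while ((int)out.size() < k) {
        if (i < a.size() && (j >= b.size() || a[i] <= b[j]))
            out.push_back(a[i++]);   // smaller head comes from a
        else if (j < b.size())
            out.push_back(b[j++]);   // smaller head comes from b
        else
            break;                   // both inputs exhausted before reaching k
    }
    return out;
}
```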
Yeah, this is a good way, but then we would need to process all the elements in all the files. I think we can also keep a flag that is set to true once the root is smaller than the next element from every file, implying that the smallest k elements have been found.
- pulkit.mehra.001 April 08, 2014
But then you are not making use of the fact that the files are sorted.
- pulkit.mehra.001 April 08, 2014
There are a few good methods to find the starting point of the loop.
I will discuss two of them:
Method 1:
1. First check whether a loop is present in the list, using fast and slow pointers, as mentioned in a post above by @Engineer1111.
2. If a loop is present, the fast pointer ends up pointing to one of the nodes in the loop.
3. Take another pointer, say start, pointing to the head of the list. Move start one node ahead at a time and check whether it is reachable from the fast pointer (the one in the loop).
4. The first node reachable from fast is the starting node of the loop.
Method 2 (more efficient):
1. Detect whether a loop exists using fast and slow pointers.
2. If it exists, count the number of nodes in the loop; call it k.
3. Fix one pointer at the head of the linked list and another k nodes ahead of the head.
4. Move both one node at a time; the point where they meet is the start of the loop.
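Method 2 can be sketched as follows; the Node layout and function name are assumptions, not from the thread:

```cpp
#include <cstddef>

struct Node { int val; Node* next; };

// Floyd detection, then advance a second pointer k nodes (k = loop length)
// ahead of head; marching both one step at a time, they meet at the
// loop's first node.
Node* loopStart(Node* head) {
    Node *slow = head, *fast = head;
    while (fast && fast->next) {                // 1. detect a loop
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            int k = 1;                          // 2. count nodes in the loop
            for (Node* p = slow->next; p != slow; p = p->next) k++;
            Node *a = head, *b = head;
            while (k--) b = b->next;            // 3. b starts k nodes ahead
            while (a != b) { a = a->next; b = b->next; }
            return a;                           // 4. meeting point = loop start
        }
    }
    return nullptr;                             // no loop present
}
```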
Following is an iterative implementation that prints all combinations of consecutive elements.
It first prints all such combinations of 1 element, then all of 2 elements, and so on; for {1, 2, 3} the output is 1, 2, 3, 12, 23, 123.
#include <stdio.h>

int main()
{
    int arr[] = {1, 2, 3};
    int n = sizeof(arr) / sizeof(arr[0]);
    int gap, i, j, k;
    /* outer loop is over the length of the combination */
    for (gap = 1; gap <= n; gap++) {
        /* inner loop slides a window of size gap over the list;
           e.g. if gap is 1 (1st iteration), it prints 1, 2, 3 */
        for (i = 0; i < n - gap + 1; i++) {
            int num = 0; /* tmp variable to store the number formed */
            k = 0;
            j = i;
            while (k < gap) {
                num = num * 10 + arr[j++];
                k++;
            }
            printf("%d\n", num);
        }
    }
    return 0;
}
The method I came across uses the <map> container of the C++ STL. A map is an associative array: each key has a value associated with it.
Steps:
1. Copy the original list to a new list with the next pointers intact; leave the arbit pointers for now.
2. While copying, build a map:
map<node*, node*> map_hash
In our case the map's key is the original-list node and its value is the corresponding copy-list node.
3. The map would look something like this:
Original --> Copy
Node1 --> Node1(copy)
Node2 --> Node2(copy)
Node3 --> Node3(copy)
4. Start from the head of the copy list. For each copy-list node we have the corresponding original-list node in the map, which we can access using an iterator:
map<node*, node*>::iterator i;
i = map_hash.begin();
copy_node->arbit = map_hash.at((*i).first->arbit);
Note: (*i).first is the original-list node, (*i).first->arbit is the arbit node pointed to by (*i).first, and map_hash.at() gives the value stored at that key.
What we are doing: for each copy-list node, take its corresponding original-list node and find that node's arbit target; then use the map to find the copy-list node corresponding to that arbit target and assign it to the copy node's arbit pointer.
Hope this will help!!!
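The steps above can be put together roughly like this. A minimal sketch: the Node layout and function name are mine, and I wire both the next and arbit pointers in a second pass over the map, which is equivalent to the description.

```cpp
#include <cstddef>
#include <map>

struct Node { int val; Node* next; Node* arbit; };

// First pass clones every node and fills map_hash[original] = copy;
// second pass wires each copy's next and arbit pointers through the map.
Node* copyList(Node* head) {
    if (!head) return nullptr;
    std::map<Node*, Node*> map_hash;
    for (Node* p = head; p; p = p->next)
        map_hash[p] = new Node{p->val, nullptr, nullptr};
    for (Node* p = head; p; p = p->next) {
        map_hash[p]->next  = p->next  ? map_hash[p->next]  : nullptr;
        map_hash[p]->arbit = p->arbit ? map_hash[p->arbit] : nullptr;
    }
    return map_hash[head];
}
```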
This is the code I came up with to print the smallest number whose digit product equals the given number.
#include <iostream>
using namespace std;

int prod(int n, int res)
{
    if (res > n)
        return 0;
    if (res == n)
        return 1;
    /* try digits from 9 down to 2; successful digits are printed on the
       way back up the recursion, so they come out in ascending order and
       form the minimum number */
    for (int i = 9; i > 1; i--) {
        res *= i;           // multiply in a digit
        if (prod(n, res)) {
            cout << i;
            return 1;
        }
        res /= i;           // backtrack
    }
    return 0;
}

int main()
{
    prod(2520, 1);
}
The number printed on the screen is the result; for the case above the output is 5789 (5·7·8·9 = 2520).
- pulkit.mehra.001 March 11, 2014
We can use DP here to reduce the time complexity from O(2^n) to O(n·sum).
1. Convert the negative integers in the array to positive (i.e., take their absolute values).
2. Find the sum of the array.
3. If the sum is odd, no solution exists; return.
4. Otherwise, find whether a subset of the array elements adds up to sum/2; use DP here to find that subset (the subset-sum problem).
5. If yes, then the array elements can be added (with appropriate signs) to yield an effective sum of 0.
Here's my code:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    int n;
    cin >> n;
    vector<int> arr(n);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
        if (arr[i] < 0)
            arr[i] = -arr[i];
        sum += arr[i];
    }
    if (sum & 1) {
        cout << "No Possibility" << endl;
        return 0;
    }
    sum /= 2;
    /* dp[i][j] = true if some subset of the first j elements sums to i */
    vector<vector<bool>> dp(sum + 1, vector<bool>(n + 1));
    for (int j = 0; j <= n; j++)
        dp[0][j] = true;    /* result is true if sum = 0 */
    for (int i = 1; i <= sum; i++)
        dp[i][0] = false;   /* result is false if sum > 0 and no elements */
    for (int i = 1; i <= sum; i++) {
        for (int j = 1; j <= n; j++) {
            dp[i][j] = dp[i][j - 1];    /* exclude current element arr[j-1] */
            if (arr[j - 1] <= i)
                dp[i][j] = dp[i][j] || dp[i - arr[j - 1]][j - 1]; /* include it */
        }
    }
    if (dp[sum][n])
        cout << "Possibility exists" << endl;
    else
        cout << "No Possibility" << endl;
}
This approach takes the cust_ids and then checks the timestamps:
1. Take a cust_id.
2. Search the log file for that cust_id; for every record found, compare timestamps. If the difference is greater than a day, print the cust_id and mark it as visited so that we do not count it again (a hash table works here).
3. Repeat the process for the unprocessed cust_ids.
If cust_ids are integers, we can sort the file by cust_id and then compare the timestamps of each customer's visits to produce the result.
I know that this is not the optimal approach, but I think it will work.
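A minimal sketch of this scan, assuming each log record is a (cust_id, timestamp) pair in time order and a day is 86400 seconds; the record layout and all names are my own:

```cpp
#include <unordered_map>
#include <vector>
using namespace std;

struct Record { int cust_id; long timestamp; };   // assumed log record layout
const long DAY = 86400;                           // seconds in a day

// One pass over the (time-ordered) log: remember each cust_id's earliest
// timestamp, and report the id once its visits span more than a day.
vector<int> spanningCustomers(const vector<Record>& log) {
    unordered_map<int, long> first_seen;
    unordered_map<int, bool> reported;            // hash table of processed ids
    vector<int> out;
    for (const Record& r : log) {
        auto it = first_seen.find(r.cust_id);
        if (it == first_seen.end())
            first_seen[r.cust_id] = r.timestamp;
        else if (!reported[r.cust_id] && r.timestamp - it->second > DAY) {
            reported[r.cust_id] = true;           // mark as visited
            out.push_back(r.cust_id);
        }
    }
    return out;
}
```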
Hi Teja, your perspective is right. But since they are asking for a data structure, they are probably looking at how you tackle the problem, i.e., your approach while designing the data structure. I think that's the only reason such questions are asked.
- pulkit.mehra.001 January 22, 2014
In H2 we can include some kind of timestamp so that we can figure out whether the page was visited on the same day or not.
- pulkit.mehra.001 January 22, 2014
Since we are taking a pair of elements at a time, we effectively process n/2 pairs.
Three comparisons are made per pair,
so the total number of comparisons is 3n/2, i.e., O(n).
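A sketch of the pairwise technique being described, finding the minimum and maximum simultaneously with 3 comparisons per pair; the function name is mine, and a non-empty input is assumed:

```cpp
#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

// Simultaneous min and max with 3 comparisons per pair (~3n/2 total):
// order the two elements of each pair first, then compare the smaller
// against the running min and the larger against the running max.
// Precondition: a is non-empty.
pair<int, int> minMax(const vector<int>& a) {
    int lo = a[0], hi = a[0];
    size_t i = (a.size() % 2 == 0) ? 0 : 1;   // leave an even count of elements
    if (i == 0) {                             // even length: seed from first pair
        lo = min(a[0], a[1]);
        hi = max(a[0], a[1]);
        i = 2;
    }
    for (; i + 1 < a.size(); i += 2) {
        int s = a[i], b = a[i + 1];
        if (s > b) swap(s, b);                // comparison 1: order the pair
        if (s < lo) lo = s;                   // comparison 2: candidate minimum
        if (b > hi) hi = b;                   // comparison 3: candidate maximum
    }
    return {lo, hi};
}
```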
I am not quite getting how you are forming the hash tables.
For H1, the key is the username and the value stored is what?
For H2, the key is the pageString; then where are you storing the days in H1 and H2?
I think it means any two different days (not necessarily consecutive): if a user visits three different/unique pages, then his info should be stored.
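For reference, one possible shape of such a data structure under my reading of the original question (exactly two distinct days, more than 3 unique pages); all names here are my own, not from the thread:

```cpp
#include <set>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// Per customer: the set of days visited and the set of unique pages seen.
struct Visits {
    set<int> days;          // which of day 0, 1, 2 the customer came
    set<string> pages;      // unique pages across those visits
};

struct Tracker {
    unordered_map<string, Visits> byCustomer;

    void record(const string& cust, int day, const string& page) {
        byCustomer[cust].days.insert(day);
        byCustomer[cust].pages.insert(page);
    }

    // Contact list: exactly two distinct days and more than 3 unique pages.
    vector<string> contactList() const {
        vector<string> out;
        for (auto& kv : byCustomer)
            if (kv.second.days.size() == 2 && kv.second.pages.size() > 3)
                out.push_back(kv.first);
        return out;
    }
};
```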
- pulkit.mehra.001 December 27, 2013
I came across the following on Stack Exchange; it's about linked lists vs. dynamic arrays.
In short:
The linked-list approach has worst-case O(1) guarantees on each operation; the dynamic array has amortized O(1) guarantees.
The locality of the linked list is not as good as the locality of the dynamic array.
The total overhead of the dynamic array is likely to be smaller than the total overhead of the linked list, assuming both store pointers to their elements.
The total overhead of the dynamic array is likely to be greater than that of the linked list if the elements are stored directly.
Neither of these structures is clearly "better" than the other. It really depends on your use case. The best way to figure out which is faster would be to time both and see which performs better.
Hope this helps!
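A minimal timing harness in the spirit of that last suggestion; it only measures appends, and the numbers will vary entirely by machine and workload:

```cpp
#include <chrono>
#include <list>
#include <vector>
using namespace std;

// Time n push_back operations on any container type C, in milliseconds.
template <typename C>
double timeAppend(int n) {
    auto t0 = chrono::steady_clock::now();
    C c;
    for (int i = 0; i < n; i++)
        c.push_back(i);
    auto t1 = chrono::steady_clock::now();
    return chrono::duration<double, milli>(t1 - t0).count();
}
```

Compare, e.g., `timeAppend<vector<int>>(1000000)` against `timeAppend<list<int>>(1000000)`; on most hardware the dynamic array wins this particular workload because of locality, but other workloads (mid-list insertion, stable iterators) can favor the list.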
A balanced tree is one in which, for every node,
|height of left subtree - height of right subtree| is at most 1.
A naive approach would be:
1. Start from the root.
2. While an unvisited node exists, check whether the subtree rooted at that node is balanced.
3. If all such subtrees are balanced, return true; else return false.
int height(Node* root)
{
    if (root == NULL)
        return 0;
    return 1 + max(height(root->left), height(root->right));
}

int isBalanced(Node* root)
{
    if (root == NULL)
        return 1;
    int lHeight = height(root->left);
    int rHeight = height(root->right);
    if (abs(lHeight - rHeight) > 1)
        return 0;
    return isBalanced(root->left) && isBalanced(root->right);
}
This approach recomputes the height for every node, so the time complexity is O(n log n) for balanced trees and up to O(n^2) in the worst case.
To reduce the time complexity we can modify the height function: use a flag that is set whenever two subtree heights differ by more than 1. So while calculating the height of the tree, it also checks whether the tree is balanced.
int height(Node* root, int* balanced) // initially *balanced = 0
{
    if (root == NULL)
        return 0;
    int lHeight = height(root->left, balanced);
    int rHeight = height(root->right, balanced);
    if (abs(lHeight - rHeight) > 1)
        *balanced = 1;  // setting the flag: an imbalance was found
    return max(lHeight, rHeight) + 1;
}
If *balanced == 1 after the call, the tree is not height balanced.
This will reduce the time complexity to O(n)
We also need to keep track of whether the current element is already present on the heap.
- pulkit.mehra.001 February 12, 2016
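A sketch of the HashMap<word, frequency> plus min-heap idea from the comments above. Here the words are processed in one batch, so each distinct word enters the size-k heap exactly once and no extra membership tracking is needed; for a streaming version you would need the membership check mentioned above. The function name and batch framing are my assumptions.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// k most frequent words: count with a hash map, then keep a min-heap of
// size k so the least frequent candidate is always on top for eviction.
vector<string> topK(const vector<string>& words, size_t k) {
    unordered_map<string, int> freq;
    for (auto& w : words)
        freq[w]++;
    typedef pair<int, string> Entry;            // (count, word)
    priority_queue<Entry, vector<Entry>, greater<Entry>> heap;
    for (auto& kv : freq) {                     // each distinct word enters once
        heap.push({kv.second, kv.first});
        if (heap.size() > k)
            heap.pop();                         // evict current minimum
    }
    vector<string> out;
    while (!heap.empty()) {                     // least frequent comes out first
        out.push_back(heap.top().second);
        heap.pop();
    }
    return out;
}
```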