unknown Interview Question
Member Technical Staff, Country: India
Interview Type: In-Person
It seems the methods assume that RollNo, Name, and Address are each unique. I'll implement the solution to support collisions on RollNo, Name, and Address, as that seems to be the right approach here.
A simple approach is to keep one hash table per attribute, so looking up the values for a specific input is O(1). With a large amount of data that matters, since any method with a worse growth factor would be slow.
using System.Collections.Generic;

class StudentData
{
    public int RollNo;
    public string Name;
    public string Address;
}

class StudentSearcher
{
    // One index per attribute; each key maps to all students sharing that value,
    // so duplicate roll numbers, names, and addresses are supported.
    Dictionary<int, List<StudentData>> byRollNo =
        new Dictionary<int, List<StudentData>>();
    Dictionary<string, List<StudentData>> byName =
        new Dictionary<string, List<StudentData>>();
    Dictionary<string, List<StudentData>> byAddress =
        new Dictionary<string, List<StudentData>>();

    public StudentSearcher(IEnumerable<StudentData> data)
    {
        foreach (StudentData d in data)
        {
            if (!byRollNo.ContainsKey(d.RollNo))
            {
                byRollNo.Add(d.RollNo, new List<StudentData>() { d });
            }
            else
            {
                byRollNo[d.RollNo].Add(d);
            }
            if (!byName.ContainsKey(d.Name))
            {
                byName.Add(d.Name, new List<StudentData>() { d });
            }
            else
            {
                byName[d.Name].Add(d);
            }
            if (!byAddress.ContainsKey(d.Address)) // was keyed on d.RollNo by mistake
            {
                byAddress.Add(d.Address, new List<StudentData>() { d });
            }
            else
            {
                byAddress[d.Address].Add(d);
            }
        }
    }

    public List<StudentData> getStudentsByRollNo(int rollNo)
    {
        if (this.byRollNo.ContainsKey(rollNo))
        {
            return this.byRollNo[rollNo];
        }
        return null;
    }

    public List<StudentData> getStudentsByName(string name)
    {
        if (this.byName.ContainsKey(name))
        {
            return this.byName[name];
        }
        return null;
    }

    public List<StudentData> getStudentsByAddress(string address)
    {
        if (this.byAddress.ContainsKey(address))
        {
            return this.byAddress[address];
        }
        return null;
    }
}
A natural progression of this problem would be either:
#1 How would you solve this problem with only 4 GB of memory on a single machine?
#2 How would you solve this problem with multiple machines, each with 4 GB of memory?
#1 can be solved two ways. One is to keep three files (one per attribute), sort each by its values, and binary search, which is OK at O(log n) per lookup. The second, which I would personally use, is to compute each value's hash code and create a file per hash bucket; a lookup then only touches one small file and is effectively O(1).
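The bucket-file variant of #1 can be sketched as follows. This is my own illustration, not part of the original answer: the bucket count, file layout, and helper names are all assumptions. Keys are hashed into one of a fixed number of files, so a lookup reads a single small file instead of the whole data set:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

class BucketIndex
{
    // Assumption: bucket count chosen so that each bucket file fits in memory.
    const int NumBuckets = 1024;
    readonly string dir;

    public BucketIndex(string dir)
    {
        this.dir = dir;
        Directory.CreateDirectory(dir);
    }

    // Stable bucket for a key. string.GetHashCode() is not guaranteed to be
    // stable across runs, so a simple explicit hash is used instead.
    static int Bucket(string key)
    {
        int h = 17;
        foreach (char c in key) h = unchecked(h * 31 + c);
        return (h & 0x7fffffff) % NumBuckets;
    }

    string PathFor(string key) => Path.Combine(dir, "bucket_" + Bucket(key) + ".txt");

    // Append "key<TAB>value" to the one bucket file the key hashes to.
    public void Add(string key, string value) =>
        File.AppendAllText(PathFor(key), key + "\t" + value + "\n");

    // Scan only the one bucket file that can contain the key.
    public List<string> Lookup(string key) =>
        File.Exists(PathFor(key))
            ? File.ReadLines(PathFor(key))
                  .Select(line => line.Split('\t'))
                  .Where(parts => parts[0] == key)
                  .Select(parts => parts[1])
                  .ToList()
            : new List<string>();
}
```

With 1024 buckets each file holds roughly 1/1024 of the records, so the amount of data read per lookup stays bounded; three such directories (one per attribute) reproduce the three hash tables on disk.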
#2 can be solved by distributing the records across the machines: compute the hash code with an algorithm guaranteed to distribute values evenly, take hashCode % numberOfMachines, and that machine keeps the record in memory.
Loading: each machine uses the hash code while reading the file to keep only the records that correspond to it.
Search: compute the hash code to tell which machine has the record, then ask that machine for it.
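The routing in #2 can be sketched like this. The shards are simulated as in-process dictionaries purely for illustration; in a real deployment each shard would be a separate 4 GB machine and Lookup would be a network call. All names here are my own assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

class ShardedStore
{
    readonly List<Dictionary<string, string>> shards;

    public ShardedStore(int machineCount) =>
        shards = Enumerable.Range(0, machineCount)
                           .Select(_ => new Dictionary<string, string>())
                           .ToList();

    // Stable hash: string.GetHashCode() can differ between processes, which
    // would break routing between machines, so an explicit hash is used.
    static int StableHash(string key)
    {
        int h = 17;
        foreach (char c in key) h = unchecked(h * 31 + c);
        return h & 0x7fffffff;
    }

    // hashCode % numberOfMachines picks the owning machine, as described above.
    int Owner(string key) => StableHash(key) % shards.Count;

    // Loading: only the owning shard keeps the record in memory.
    public void Add(string key, string value) => shards[Owner(key)][key] = value;

    // Search: route the query to the owning shard; no other machine is touched.
    public string Lookup(string key) =>
        shards[Owner(key)].TryGetValue(key, out var v) ? v : null;
}
```

Because every machine applies the same hash, the loader and the searcher independently agree on which machine owns each key, with no central directory needed.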
If you have enough room to have a hash map for each attribute you could do that. That would be constant time.
- Dave Humeniuk March 20, 2015