Suppose you have a huge data o

unknown Interview Question for Member Technical Staffs

0

of 0 votes

3
Answers
Suppose you have a huge data of students. This data is in RAM (around 48 GB). Student has following attributes:
1) RollNo
2) Name
3) Address

Now I need to implement three method:

getStudentByRollNo(int rollno)
getStudentsByName(String name)
getStudentsByAddress(String address)

In what data structure I can keep these students so that these methods can return the results really fast.
- guptasunny158 March 20, 2015 in India | Report Duplicate | Flag | PURGE
unknown Member Technical Staff Data Structures

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

If you have enough room to have a hash map for each attribute you could do that. That would be constant time.

- Dave Humeniuk March 20, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Seems that the methods assume that the RollNo, name & address is unique.

I'll implement to support collision of RollNo, name & address as this seems to be the right approach here.

Seems simple apprach of to just use multiple hash tables to find the values for an specific input in O(1) growth function which is fast due to the large amount of data as any other method would seem slow because of the growth factor.

class StudentData
{
	int RollNo;
	string Name;
	string Address;
}
class StudentSearcher
{
	Dictinary<int, List<StudentData>> byRollNo = 
		new Dictinary<int, List<StudentData>>();

	Dictinary<string, List<StudentData>> byName = 
		new Dictinary<string, List<StudentData>>();

	Dictinary<string, List<StudentData>> byAddress = 
		new Dictinary<string, List<StudentData>>();

	public StudentSearcher(IEnumerable<StudentData> data)
	{
		foreach(StudentData d in data)
		{
			if(!byRollNo.ContainsKey(d.RollNo))
			{
				byRollNo.Add(d.RollNo, new List<StudentData>(){d});
			}
			else
			{
				byRollNo[d.RollNo].Add(d);
			}

			if(!byName.ContainsKey(d.Name))
			{
				byName.Add(d.Name, new List<StudentData>(){d});
			}
			else
			{
				byName[d.Name].Add(d);
			}

			if(!byAddress.ContainsKey(d.RollNo))
			{
				byAddress.Add(d.Address, new List<StudentData>(){d});
			}
			else
			{
				byAddress[d.Address].Add(d);
			}

		}
	}

	public List<StudentData> getStudentByRollNo(int rollno)
	{
		if(this.byRollNo.ContainsKey(rollno))
		{
			return this.byRollNo[rollno];
		}
	
		return null;
	}
	public List<StudentData> getStudentsByName(String name) 
	{
		if(this.byName.ContainsKey(name))
		{
			return this.byName[name];
		}
	
		return null;
	}

	public List<StudentData> getStudentsByAddress(String address)
	{
		if(this.byAddress.ContainsKey(address))
		{
			return this.byAddress[address];
		}
	
		return null;
	}
}

- Nelson Perez March 21, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Seems that a progression of this problem would be either:
#1 How would you do solve this problem but with only 4GB of memory in a single machine.
#2 How would you solve this problem with multiple machines with 4GB of memory.

#1 Can be solve two ways having three files sorting all values and do binary searh which is OK. The second way that I would personally use is getting the hashcode and create a file for every hashcode and the lookup becomes O(1).

#2 Can be solvable distributing the records across the machines getting the hashcode in a warranted algorithm that evenly distributes the hashcodes % number of machines and that machine will have the record in memory.

Loading: Use the hashcode to read the file to only get the ones that correspond to the machine.

Search: Get the HashCode to tell which machine corresponding machine that has it. Ask that machine for the record using the HashCode.

- Nelson Perez March 21, 2015 | Flag

CareerCup

unknown Interview Question for Member Technical Staffs

Books

Videos

Resume Review

Mock Interviews