Facebook Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
You can easily keep millions of userid's in memory. Assume 64bit unsigned number represents a user id. If 1 billion users are online at a given time, that is 1 Billion * 8 Bytes = 8GB of data. Assume next pointers cost another 8GB, that's still 16GB. But I think the answer they were looking for was the distributed solution.
This is an implementation of the Push/Pull Model. The first time you go online you pull the data of your online friends, there after you push your name into the Active Maps of all the Users. When thier status changes they push their status to all the people in the Map. Read Observer/Observable design pattern for this.
Assuming
1. There are hundreds/thousands of servers on the production fabric, each of which handle a bunch of users.
2. Each host maintains a list of active users it is handling, say, in a simple enough data structure
*Broadcast_*
1. Each server when brought online or when a new user logs in, broadcasts a packet (with user data) for each of the active users it is handling.
2. And waits for at least N acknowledgements for each packet - N is to be configured
3. An acknowledgement is sent by a server as soon as it has received and stored the data (inside the packet) in it's online_users_hash_table.
*Search_*
1. When a user logs in on one of the servers, a lookup is made in the local hash table, when no data is found for a subset or all of users, sends a broadcast message for each of the user it is looking for.
2. And waits for a response from any of the servers.
3. Any server can respond..
3.1 If the packet is not older than T seconds - T is to be configured.
3.2 If it has the status of the user requested in the packet in its local hash table.
4. All users online (indicated by the response packets) are hashed locally.
5. If there is no response for a packet sent for user U even after T2 seconds, it means he is not online.
They have a lot of users shared on a lof of servers.
In realtime system you do not have time to scan either a single DB or a number of DB in a distributed cluster for checking status of the each member in a friendlist.
1. For each user store it's online friends list with the user's data. Do not store one online friend in one DB row. Try to keep list as an JSON array in NoSql DB near by or inside th user data.
2. List<user_id> getOnlineFriends(user_id):
2.1 single access to the specified user data described above.
3. setOnline(user_id):
3.1 getFriendIds(user_id) to 'friends'.
3.2 for each 'fnext' in 'friends'
add task in queue to update online friends list of the 'fnext' with the user 'user_id'.
4. You have a distributed DB so it is not strognly consistent and eventual consistency is enough for a such data as online friends list.
maybe something like this?
public class CurrentlyOnline {
// global list of all the online users in the system
private List<Long> currentlyOnlineUsers;
// a singleton class
private static CurrentlyOnline instanceHolder;
private CurrentlyOnline() {
currentlyOnlineUsers = new ArrayList<Long>();
}
public static CurrentlyOnline getInstance() {
if (instanceHolder == null) {
instanceHolder = new CurrentlyOnline();
}
return instanceHolder;
}
public List<Long> getOnlineFriends(long user_id) {
List<Long> currentUserFriendsList = new ArrayList<Long>();
// get the user's friends list
currentUserFriendsList = getFriendIds(user_id);
Long currentFriendID;
List<Long> response = new ArrayList<Long>();
for (int i = 0; i < currentUserFriendsList.size(); i++) {
currentFriendID = currentUserFriendsList.get(i);
if (currentlyOnlineUsers.contains(currentFriendID)) {
response.add(currentFriendID);
}
}
return response;
}
public void setOnline(long user_id) {
currentlyOnlineUsers.add(user_id);
}
}
We can not keep all the user online status in a hash map.. it will simply go out of memory for millions of users.
- HauntedGhost February 15, 2013The idea is to implement a hashmap only, but distribute it across multiple servers just like Distributed Hash Tables (DHT). In DHT, we can setup multiple servers (nodes) and assign them the key space ( for eg. 1-100 user ids in node n1, 101-200 user ids in node n2 and so on) and then to set the online status of a user, you can simple send request any node, it will take care of routing the request to the actual node. Same with the getOnlineFriends status.