Amazon Interview Question
Software Engineer / DevelopersTeam: AFT
Country: United States
Interview Type: In-Person
It is an association mining problem. There are many good algorithms such as "Apriori", "Frequent Pattern Mining". That could be used here.
General Idea:
1. List the items the user bought {X,Y,A,B,C} for each user in a separate line.
2. Count the subset {X,A,B,C} in all user transactions and then count the subset {X}.
3. Then you can calculate measures such as "support" and "Confidence" to measure the probability.
Shouldn't it be like:
Each item contain customer list who has purchased this item.
lets say Item X is purchased by Customer A, Customer B, Customer C & Customer D
Now, each customer also contain its itemHistory.
Customer A: X, a, b, c
Customer B: X, b, c, d
Customer C: X, e, f
Customer D: X, d, g
So if customer A buy item X, then it should be suggested a, b, c, d, e, f, g
or should customer be suggested only common items, like: b ,c ,d only ?
For case 2, there will be additional check to filter out common items. Now problem converts to finding common items in the list of list of items (list<listOfItems>).
How about we have a graph where the nodes are items and the connection between nodes denotes the count of purchases made. For example if we have three item A, B, C
customer 1 : A B C
customer 2 : A B
customer 3 : A C
Then A <--> B count will be 2, A <--> C count will be 2, B <--> C count will be 1
keep an updated Adjacency list representation of how good are related. Each Good will have a list of goods that are purchased by the same customer, also keep a count of each good in the list, return the top 3. When a user buy one new product, increase the count of it in all his other goods, increase the count of all other goods for this good.
keep an updated Adjacency list representation of how good are related. Each Good will have a list of goods that are purchased by the same customer, also keep a count of each good in the list, return the top 3. When a user buy one new product, increase the count of it in all his other goods, increase the count of all other goods for this good.
keep an updated Adjacency list representation of how good are related. Each Good will have a list of goods that are purchased by the same customer, also keep a count of each good in the list, return the top 3. When a user buy one new product, increase the count of it in all his other goods, increase the count of all other goods for this good.
1. Construct a HashMap<String, HashSet<String>> (lets say productMap) and store all the Amazon products as keys.
2. As and when a new product is launched add an entry to the above map
3. When any customer buys list of products,
- for each product (lets say p1) in that list, convert the rest of products as a set (lets say s1)
- look for p1 in productMap and get the HashSet corresponding to it (lets say s2)
- merge s1 and s2 and write back to the productMap
4. Now anytime a customer shops anything, we can just pull the corresponding Hashset of products from the productMap and display it.
so this is somewhat like desinging a recommendation engine.
Here is my approach
lets there are three users
User A : Product bought A,B,C
User B: Product bought B,F
User C : product bought D,A,C
now when User D buys product C , we need to recommend him A,D,B.
approach
create a undirected graph with adjacency linked list approach and a hashmap which gives us node reference present in graph
now for user A : graph will have node A as one node and Node B and Node C will be neighbour node
For Node B , Node A and Node C are his neibhours,
For Node C , Node B and Node A are his neighbours,
Also keep on inserting these node addresses in hashmap as well.
keep on repeating these steps on each user,
now for User D we need to traverse Node C in BFS manner at one level and give him the output.
We can modify above approach by maintaing count of products bought for better approach
now how we can scale above approach.
since there could be millions of products the hashmap may not be ideal solution.
we can switch hashmap and graph to inmemcache DB such as redis or Aerospike. Which can be scaled accordingly.
This db has the key as product Id and value as adjacent nodes which are brought together with other node present in key.
isn't it a database issue?
- samchen2009 January 31, 2014consider 3 database tables: "users", "items", "categories" and 1 association table, "deals", which contains 2 columns: user_id and item_id.
Through these 4 tables, we can have the followings associations:
* an user has bought many items.
* an item belongs to one category.
so 'Users who bought item X also bought items' can be accomplished by the following steps:
1. get users from deals table where 'item_id' = 'X'
2. for each user, find all the items he bought that belongs to the same category as 'X'.
3. merge all user's item arrays.