Apple Interview Question
Software Engineer in TestsCountry: United States
I think it's easy to use MapReduce dealing with this problem. pls correct if I'm wrong
from mrjob.job import MRJob
class linesCount(MRJob):
def mapper(self, _, value):
yield 'lines:',1
def reducer(self, lines, occurrence):
yield lines, sum(occurrence)
if __name__=='__main__':
linesCount().run()
Aggregate count of lines of all log files :
- iwanna November 17, 2014wc -l *.log | tail -1
If we assume multiple threads are reading the files, put the filenames in a queue pipe and assign files to the thread as read, Once a thread has read a file, it will fetch the next one in the pipe.