Problem : We want our mapper to receive 3 records (3 lines) from the source file at a time, instead of the 1 line at a time that TextInputFormat provides by default.
Approach :
- We will extend the TextInputFormat class to create our own NLinesInputFormat.
- We will also create our own RecordReader class called NLinesRecordReader, where we will implement the logic of feeding 3 lines/records at a time.
- We will make a change in our driver program to use our new NLinesInputFormat class.
- To prove that we are really getting 3 lines at a time, instead of actually counting words (which we already know how to do), the mapper will emit the number of lines it receives in each input record as the key and 1 as the value. After the reducer sums these values, the output gives the frequency of each distinct line count seen by the mappers.
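The grouping and counting logic described in the approach can be sketched in plain Java before wiring it into Hadoop. This is only an illustration under stated assumptions: it uses no Hadoop classes, and the names NLinesSketch, groupLines and lineCountFrequency are hypothetical helpers invented for this example. groupLines plays the role that NLinesRecordReader would play, and lineCountFrequency stands in for the mapper and reducer together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the idea; no Hadoop classes are used here.
class NLinesSketch {

    // Hypothetical stand-in for NLinesRecordReader: joins every n consecutive
    // lines into one record. The last record may hold fewer than n lines.
    static List<String> groupLines(List<String> lines, int n) {
        List<String> records = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            records.add(String.join("\n", lines.subList(start, end)));
        }
        return records;
    }

    // Hypothetical stand-in for mapper + reducer: the "mapper" emits
    // (number of lines in the record, 1) and the "reducer" sums the 1s per key.
    static Map<Integer, Integer> lineCountFrequency(List<String> records) {
        Map<Integer, Integer> frequency = new TreeMap<>();
        for (String record : records) {
            // limit -1 keeps trailing empty lines so blank lines are counted too
            int linesInRecord = record.split("\n", -1).length;
            frequency.merge(linesInRecord, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        List<String> records = groupLines(lines, 3);
        // 7 lines with n = 3 give two records of 3 lines and one record of 1 line
        System.out.println(lineCountFrequency(records)); // prints {1=1, 3=2}
    }
}
```

With a 7-line input, the result shows that most records carry 3 lines and only the final, leftover record carries fewer. That is the kind of output we expect from the real job if NLinesInputFormat works as intended.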