Program 1: Calculation of Monthly Salary using Distributed Cache

Description: Distributed Cache
Hadoop has the concept of a distributed cache, which all task trackers (nodes) have access to. When we want to distribute some common data across all task trackers, we use the distributed cache. When a file is distributed, copies of it are maintained so that every task tracker can access it locally. When we need to look up reference data, the reference file is first placed in the distributed cache. The main point to note is that the files chosen for distribution should be very small; on a medium-sized cluster the maximum size of a distributed file should not exceed about 100 MB (this value can vary from cluster to cluster).

Problem Description:
Consider a scenario with 1 million employees spread across 25 locations and 71 grades. Even on a very crude analysis, the location and grade data are tiny compared to the employee data. So our approach is to process the employee data stored in HDFS against the two reference data sets held in the distributed cache.

For mock calculation purposes we add a location bonus and a grade-based monthly bonus for every employee, on top of the basic salary calculation defined in the previous example. So in addition to the employees.txt file, we have two more input files:
1. Location.txt, which has details such as location id, location name and annual bonus
2. Grade.txt, which has details such as grade id, grade and annual bonus

Mapper
- Read the employee details
- Read the location bonus and grade bonus from Location.txt and Grade.txt in the distributed cache
- Calculate the monthly salary
(A driver and mapper sketch is given after the sample data and output below.)

Reducer Class
No reducer class is required if you don't need one; at run time the default reducer class is substituted into the MapReduce execution. The point to note is that when you don't specify a reducer class, the default reducer's input and output key/value types are the same as the mapper's output key/value types. If you need a different key/value type as the reducer output, you need to define your own custom reducer.

Employee.txt
10007~James Wilson~L105~G06~110000~22~8
10100~Roger Williams~L103~G09~145000~20~8

Location.txt
L105,Bangalore,200000
L103,Hyderabad,160000

Grade.txt
G06,D3,450000
G09,F3,500000

Output:
10007~James Wilson~110000.0~22.0~164166.67
10100~Roger Williams~145000.0~20.0~190333.34
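
As a rough illustration of how the two reference files could be registered in the distributed cache, here is a minimal driver sketch using the Hadoop 2.x Job API. The class name SalarySketchDriver and the HDFS paths are placeholders, not part of the original program. Note that no reducer class is set, so the default (identity) reducer is used, with key/value types matching the mapper's output.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SalarySketchDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "monthly salary with distributed cache");
        job.setJarByClass(SalarySketchDriver.class);

        job.setMapperClass(SalarySketchMapper.class);
        // No reducer class is set, so the default (identity) reducer is used;
        // its key/value types therefore match the mapper's output types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Register the small reference files in the distributed cache.
        // The HDFS paths below are placeholders for wherever the files actually live.
        job.addCacheFile(new URI("/cache/Location.txt"));
        job.addCacheFile(new URI("/cache/Grade.txt"));

        FileInputFormat.addInputPath(job, new Path(args[0]));   // directory holding employees.txt
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Since each output record is already a single formatted line, the mapper emits it as the key with a NullWritable value.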
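
And here is a minimal sketch of the mapper side, assuming the new (mapreduce) API and that the cache files are available under their base names in the task's working directory, which Hadoop arranges via symlinks. The helper method loadBonusFile and the simplified basic-salary step are assumptions for illustration; the exact basic salary formula from the previous example is not reproduced here, so the numbers will not match the sample output exactly.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SalarySketchMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final Map<String, Double> locationBonus = new HashMap<>();
    private final Map<String, Double> gradeBonus = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Cache files are symlinked into the task's working directory under their base names.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles == null) {
            return;
        }
        for (URI uri : cacheFiles) {
            String name = new Path(uri.getPath()).getName();
            if ("Location.txt".equalsIgnoreCase(name)) {
                loadBonusFile(name, locationBonus);
            } else if ("Grade.txt".equalsIgnoreCase(name)) {
                loadBonusFile(name, gradeBonus);
            }
        }
    }

    // Reads lines like "L105,Bangalore,200000" and keeps id -> annual bonus.
    private void loadBonusFile(String localName, Map<String, Double> target) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                target.put(parts[0], Double.parseDouble(parts[2]));
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Employee record: id~name~locationId~gradeId~basic~days~hours
        String[] fields = value.toString().split("~");
        String empId = fields[0];
        String name = fields[1];
        double basic = Double.parseDouble(fields[4]);
        double days = Double.parseDouble(fields[5]);

        // Monthly shares of the annual location and grade bonuses.
        double locBonus = locationBonus.getOrDefault(fields[2], 0.0) / 12;
        double grdBonus = gradeBonus.getOrDefault(fields[3], 0.0) / 12;

        // Simplified: treat the basic field as the monthly basic pay; the original
        // program applies the basic salary formula from the previous example here.
        double monthlySalary = basic + locBonus + grdBonus;

        String out = String.format("%s~%s~%.1f~%.1f~%.2f", empId, name, basic, days, monthlySalary);
        context.write(new Text(out), NullWritable.get());
    }
}

The lookup tables are built once per task in setup(), so each map() call only does two in-memory lookups, which is the whole point of keeping the small reference files in the distributed cache.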