Saturday, 14 June 2014

Why use the Hadoop Framework?

Let's take an example where several files need to be processed on a single machine.


  • Suppose one of the files is much larger than the rest. Processed whole, it dominates the run and can starve the other files of processing time.
  • To avoid this, we divide the files into chunks of equal size, so that each unit of work gets roughly the same amount of processing time.
  • On a single machine, we can then gain some parallelism by assigning each chunk to a thread.
  • Now imagine a much larger data set. Such a data set can exceed the processing capacity of a single machine, and this is where the Hadoop framework comes into the picture.
  • Although parallelizing the work is feasible, it is difficult in practice: threads must be coordinated, load balanced, and failures handled.
  • The Hadoop framework lets us process huge data sets in a distributed fashion across a cluster of commodity hardware.
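The single-machine approach described above can be sketched in a few lines of Python. The file names, record contents, and chunk size here are illustrative assumptions, not part of any Hadoop API: the point is only that splitting one oversized file into equal chunks lets a thread pool interleave its work with the smaller files.

```python
# Sketch: split each file's records into equal-sized chunks and hand
# the chunks to a thread pool, so no single large file starves the rest.
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # illustrative chunk size (e.g. records per chunk)

def split_into_chunks(records, chunk_size=CHUNK_SIZE):
    """Divide one file's records into equal-sized chunks."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]

def process_chunk(chunk):
    """Stand-in for real per-chunk work (here: count records)."""
    return len(chunk)

# One "huge" file alongside smaller ones -- without chunking, a
# sequential pass over whole files would be dominated by huge.txt.
files = {
    "small_a.txt": ["r1", "r2"],
    "small_b.txt": ["r1", "r2", "r3"],
    "huge.txt": ["r%d" % i for i in range(20)],
}

chunks = [chunk for records in files.values()
          for chunk in split_into_chunks(records)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_chunk, chunks))

total_records = sum(results)
```

Because every chunk is the same size (at most `CHUNK_SIZE` records), the thread pool can schedule them fairly; the large file simply contributes more chunks rather than one long-running task.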

The Hadoop framework supports a programming model for achieving this parallelism: MapReduce.
With enough machines deployed, MapReduce makes it practical to analyse data at very large scale.
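The MapReduce model itself can be illustrated with the classic word-count example. This is a minimal single-process sketch of the idea Hadoop runs at cluster scale, not Hadoop's actual Java API: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
# Sketch of the MapReduce model: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return (key, sum(values))

documents = ["big data big cluster", "big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts == {"big": 3, "data": 2, "cluster": 1}
```

On a real cluster, Hadoop runs the map tasks on the machines that hold the input chunks and handles the shuffle over the network, but the shape of the computation is exactly this.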
