
Hadoop makes big data easier to manage

Hadoop: A platform that makes big data easier to manage.

Two fundamentals of Hadoop:

1. Large File Storage: Hadoop lets you store files larger than a single server can hold by splitting each file into blocks spread across many machines. You can store many such large files.

2. Fast Data Processing: Hadoop provides a framework for processing the data, called MapReduce. Moving large data sets, or even opening a big file on a laptop, takes a long time. So rather than moving the data to the software, MapReduce moves the processing software to the data.
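To make the MapReduce model more concrete, here is a minimal single-process sketch of the classic word-count job. It only illustrates the three logical phases (map, shuffle, reduce). The function names are hypothetical and this is not Hadoop's Java API. On a real cluster, each map task would run on the machine that already stores its piece of the input.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Hypothetical local driver: in Hadoop the map tasks run in parallel
    # next to the data blocks; here every phase runs in this one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["Hadoop stores big data", "MapReduce processes big data"]
    print(word_count(data))
```

Because each map call looks at only one line, the map work can be split across machines without coordination. Only the shuffle step needs to bring matching keys together.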

GFS: Google File System
GFS is a scalable distributed file system for large, data-intensive applications. Google also implemented a programming model called MapReduce, which processed more than 20 PB of data per day, and ran these MapReduce operations on GFS.

HDFS: Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. Working from the design Google published for GFS, Doug Cutting and Yahoo! built an open-source counterpart: HDFS. The software framework that includes HDFS and MapReduce is known as Hadoop.
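The sketch below shows how a file bigger than one server can be stored, as HDFS does. The file is cut into fixed-size blocks, and each block is copied to several machines. It assumes HDFS's usual defaults of a 128 MB block size and a replication factor of 3. The node names are hypothetical. The round-robin placement is a deliberate simplification, because real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default HDFS block size (128 MB)
REPLICATION = 3                 # assumed default HDFS replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    if file_size == 0:
        return []
    count = math.ceil(file_size / block_size)
    sizes = [block_size] * (count - 1)
    sizes.append(file_size - block_size * (count - 1))
    return sizes


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Put each block on `replication` distinct nodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return [
        [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    ]


if __name__ == "__main__":
    file_size = 1024 ** 3 + 10 * 1024 * 1024        # hypothetical 1 GB + 10 MB file
    blocks = split_into_blocks(file_size)
    nodes = ["node1", "node2", "node3", "node4"]     # hypothetical machine names
    for i, replicas in enumerate(place_replicas(len(blocks), nodes)):
        print(f"block {i}: {blocks[i] // (1024 * 1024)} MB on {replicas}")
```

Under these assumptions the example file becomes eight full 128 MB blocks plus one 10 MB block. Each block lives on three machines, so losing any single machine loses no data.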

What Is Apache Hadoop?
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
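One way to "detect and handle failures at the application layer" is to re-execute a failed task on a different machine instead of relying on that machine never failing. The following is a minimal simulation of that idea and not Hadoop's actual scheduler. The worker names, `run_on_worker`, and the `max_attempts` parameter are all hypothetical.

```python
class WorkerFailed(Exception):
    """Raised when a (simulated) worker dies while running a task."""


def run_on_worker(worker: str, task: int, failing: set[str]) -> int:
    # Hypothetical stand-in for running a task on a remote machine:
    # workers listed in `failing` always crash; the others square the input.
    if worker in failing:
        raise WorkerFailed(worker)
    return task * task


def run_with_reexecution(tasks, workers: list[str], failing: set[str],
                         max_attempts: int = 3) -> dict[int, int]:
    results: dict[int, int] = {}
    for task in tasks:
        for attempt in range(max_attempts):
            # Each retry goes to a different worker than the previous attempt.
            worker = workers[(task + attempt) % len(workers)]
            try:
                results[task] = run_on_worker(worker, task, failing)
                break
            except WorkerFailed:
                continue  # failure detected in software: retry elsewhere
        else:
            raise RuntimeError(f"task {task} failed {max_attempts} times")
    return results


if __name__ == "__main__":
    print(run_with_reexecution(range(4), ["w1", "w2", "w3"], failing={"w2"}))
```

Even though worker `w2` crashes every time, all four tasks still complete because the framework reroutes the affected task. The job fails only if a task exhausts its retries.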

Official site of the Apache Hadoop project: http://hadoop.apache.org

Note: Hadoop is open source and distributed by Apache. GFS is not open source.
Doug Cutting was an employee of Yahoo!, where he led the Hadoop project full-time.