Intro to Hadoop and MapReduce
Summary
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it and how you can use its power to make sense of your Big Data.
Expected Learning
- Understand how Hadoop fits into the world (recognize the problems it solves)
- Understand the concepts of HDFS and MapReduce (find out how it solves the problems)
- Write MapReduce programs (see how we solve the problems)
- Practice solving problems on your own
Syllabus
Lesson 1
What is "Big Data"? The dimensions of Big Data. Scaling problems. HDFS and the Hadoop ecosystem.
Lesson 2
The basics of HDFS, MapReduce, and the Hadoop cluster.
Lesson 3
Writing MapReduce programs to answer questions about data.
Lesson 4
MapReduce design patterns.
Final Project
Answering questions about big sales data and analyzing large website logs.
Required Knowledge
Lesson 1 does not have technical prerequisites and is a good overview of Hadoop and MapReduce for managers.
To get the most out of the class, however, you need basic Python programming skills at the level taught in introductory courses such as our Introduction to Computer Science course.
To learn more about Hadoop, you can also check out the book Hadoop: The Definitive Guide.
Cost: Free
Level: Intermediate
Duration: 4 weeks
Instructor: Sarah Sproehnle, Cloudera
Coursearena