Book - Chapter 10 MapReduce and Hadoop Flashcards

Flashcards in Book - Chapter 10 mapreduce and Hadoop Deck (23)
1

What does the MapReduce paradigm offer?

It offers a means to break a large task into smaller tasks, run the tasks in parallel, and consolidate the outputs of the individual tasks into the final output.

2

What are examples of organizations that use MapReduce?

IBM, LinkedIn, and Yahoo

3

MapReduce consists of which two basic parts?

A map step and a reduce step

4

What does the map part of MapReduce do?

Applies an operation to a piece of data and provides some intermediate output.

5

What does the reduce part of MapReduce do?

Consolidates the intermediate outputs from the map steps and provides the final output.
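
The map and reduce steps on cards 4 and 5 are often illustrated with a word count. The following is a minimal sketch in plain Python, not Hadoop code; the function names map_words, reduce_counts, and word_count are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit an intermediate (word, 1) pair for each word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: consolidate the intermediate pairs into one count per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step on every line, then reduce all intermediate output."""
    intermediate = []
    for line in lines:
        # Each call to map_words is independent of the others, which is
        # what would let a real cluster run the map tasks in parallel.
        intermediate.extend(map_words(line))
    return reduce_counts(intermediate)
```

Here the lines are processed one after another; in a real cluster the independent map calls would be spread across machines before the reduce step consolidates them.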

6

What did Grace Hopper say about scaling up computing power?

She argued that you should not build a bigger, more expensive machine; you should add more machines instead.

7

What is HDFS based on?

The Google File System (GFS)

8

What does HDFS depend on each disk drive's file system to do?

Manage the data being stored to the drive media.

9

How does the Hadoop file system store files?

In blocks, typically 64 MB or 128 MB in size

10

How many copies of each block are there?

Three copies, by default
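
The numbers on cards 9 and 10 can be combined into a quick calculation. This is a rough sketch, assuming the last block of a file only occupies the space it actually needs; the function name hdfs_block_usage and its parameters are hypothetical and not part of any Hadoop API.

```python
import math


def hdfs_block_usage(file_size_mb, block_size_mb=64, replicas=3):
    """Estimate block count and raw storage for one file (illustrative only)."""
    # The file is split into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replicas` times across the cluster.
    stored_blocks = blocks * replicas
    # Assumption: a partial block uses only the space it needs, so raw
    # storage is the file size multiplied by the number of copies.
    raw_storage_mb = file_size_mb * replicas
    return blocks, stored_blocks, raw_storage_mb
```

For a 300 MB file with 64 MB blocks this gives 5 blocks, 15 stored block copies, and 900 MB of raw storage.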

11

What does the NameNode do?

Determines and tracks where the various blocks of a data file are stored.

12

What does a DataNode do?

Manages the data stored on each machine

13

What is a Secondary NameNode?

A node that can perform some of the NameNode's tasks to reduce the load on the NameNode.

14

What three classes are typical in a MapReduce program written in Java?

The driver, the mapper, and the reducer

15

What is the Hadoop Streaming API?

An API that allows the user to write and run Hadoop jobs with no direct knowledge of Java.
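
As an illustration of card 15, Hadoop Streaming passes data to the mapper and reducer as lines of text, by default with the key and value separated by a tab, and the reducer receives its input sorted by key. The sketch below mimics that contract with plain Python generators; the names mapper and reducer are hypothetical, and a real streaming script would read from standard input and print to standard output instead of taking arguments.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    # groupby only merges adjacent equal keys, which is why sorted input
    # (as the framework provides between map and reduce) is required.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

Calling sorted() on the mapper output before passing it to reducer stands in for the sort that the framework performs between the two steps.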

16

What is Pig?

A tool that provides a high-level data flow programming language

17

What is Hive?

A tool that provides SQL-like access to data

18

What is Mahout?

Provides analytical tools

19

What is HBase?

A tool that provides real-time reads and writes

20

What is the data flow language in Pig?

Pig Latin

21

What are the three main characteristics of Pig?

Ease of programming, behind-the-scenes code optimisation, and extensibility of capabilities

22

Pig allows the execution of user-defined functions. What are these known as?

UDFs

23

If your data has a table structure, which tool might you use?

Hive