Technology and tools flashcards (EMCDSA deck, 50 cards)

1

What is Hadoop?

An open-source distributed computing framework

2

What is Hadoop written in?

Java

3

What are the 4 main components of Hadoop?

MapReduce
YARN
HDFS
Hadoop Common

4

What block size does the Hadoop Distributed File System (HDFS) use?

128 MB blocks by default (configurable; Hadoop 1.x defaulted to 64 MB)
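The block-size card is easier to remember with a bit of arithmetic. The sketch below assumes a 128 MB block size and the default HDFS replication factor of 3; both are configurable. The function names are hypothetical helpers, not part of any Hadoop API.

```python
from typing import List

# Assumptions: 128 MB block size (the usual Hadoop 2.x+ default for dfs.blocksize)
# and replication factor 3 (the HDFS default). Both can be changed per cluster.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_block_sizes(file_size_mb: int) -> List[int]:
    """Sizes (MB) of the blocks a file is split into; the last block may be partial."""
    full_blocks, remainder = divmod(file_size_mb, BLOCK_SIZE_MB)
    return [BLOCK_SIZE_MB] * full_blocks + ([remainder] if remainder else [])


def raw_storage_mb(file_size_mb: int) -> int:
    """Cluster storage used once every block is replicated.

    A partial last block only occupies its actual size, not a full block,
    so raw usage is simply file size times the replication factor.
    """
    return file_size_mb * REPLICATION
```

For example, a 1000 MB file becomes seven full 128 MB blocks plus one 104 MB block. With three replicas it uses about 3000 MB of raw cluster storage.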

5

In HDFS, is hardware failure considered normal?

Yes; HDFS is designed to be highly fault tolerant, so failures are expected and handled

6

What is the NameNode?

The master server of HDFS

7

What does the NameNode do?

Holds the file system namespace (metadata)
Performs file and directory operations (open, close, rename)
Maps blocks to DataNodes
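The three NameNode duties above all concern metadata. The NameNode never stores file contents itself. The toy model below illustrates this. Everything in it is hypothetical: the class, its methods, and the simple round-robin replica placement. Real HDFS uses a rack-aware placement policy and a very different internal design.

```python
from typing import Dict, List, Tuple


class ToyNameNode:
    """Hypothetical toy model of NameNode metadata (not the real HDFS implementation).

    It tracks only metadata: path -> block IDs, and block ID -> DataNodes holding replicas.
    """

    def __init__(self, datanodes: List[str], replication: int = 3):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.namespace: Dict[str, List[int]] = {}        # path -> block IDs
        self.block_locations: Dict[int, List[str]] = {}  # block ID -> DataNode names
        self._next_block_id = 0
        self._cursor = 0  # round-robin start position for replica placement

    def create_file(self, path: str, num_blocks: int) -> List[int]:
        """Register a file and decide which DataNodes hold each block's replicas."""
        if path in self.namespace:
            raise FileExistsError(path)
        copies = min(self.replication, len(self.datanodes))
        block_ids = []
        for _ in range(num_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Simplified round-robin placement (assumption; real HDFS is rack-aware).
            replicas = [self.datanodes[(self._cursor + i) % len(self.datanodes)]
                        for i in range(copies)]
            self._cursor = (self._cursor + 1) % len(self.datanodes)
            self.block_locations[block_id] = replicas
            block_ids.append(block_id)
        self.namespace[path] = block_ids
        return block_ids

    def locate(self, path: str) -> List[Tuple[int, List[str]]]:
        """Block IDs and replica locations; a client asks for this before reading."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]

    def delete(self, path: str) -> None:
        """Remove the file from the namespace and forget its block mappings."""
        for block_id in self.namespace.pop(path):
            del self.block_locations[block_id]
```

For example, with four DataNodes and a two-block file, `locate` returns block 0 on `dn1, dn2, dn3` and block 1 on `dn2, dn3, dn4`. The client would then read the data from those DataNodes directly.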

8

What is a DataNode?

A worker (slave) node that stores the actual data blocks; each file is split into one or more blocks spread across DataNodes

9

What do DataNodes do?

Serve read and write requests from clients
Report back to the NameNode (heartbeats and block reports)

10

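What are the weaknesses of HDFS?

Not good for low-latency or small random reads
Not good for many small files (each file adds NameNode metadata)
Files can be appended to but not modified in place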

11

What is MapReduce?

A Java-based programming model for processing large datasets in parallel across a cluster

12

When should you use MapReduce?

For embarrassingly parallel problems (work that splits into independent pieces)

13

What does the map step of MapReduce do?

Performs a map function on input key-value pairs to generate intermediate key-value pairs

14

What does the reduce step of MapReduce do?

Performs a reduce function on intermediate key-value groups to generate output key-value pairs
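Cards 13 and 14 describe the two phases in terms of key-value pairs. Below is a minimal single-process sketch of the classic word-count job: map, then group by key (the shuffle), then reduce. It only illustrates the data flow; it does not use the Hadoop Java API. The line index stands in for the byte-offset key that Hadoop's text input would supply, which is an assumption for simplicity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map: one input (key, value) pair -> intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Reduce: one intermediate key plus all its values -> output (word, total)."""
    yield word, sum(counts)


def run_mapreduce(lines: List[str]) -> List[Tuple[str, int]]:
    # Map phase; the line index is a stand-in for the byte-offset key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            groups[key].append(value)  # shuffle: collect values by key
    # Reduce phase, one call per key, in sorted key order.
    output: List[Tuple[str, int]] = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output
```

Running it on `["the cat sat", "the cat ran"]` gives `[("cat", 2), ("ran", 1), ("sat", 1), ("the", 2)]`. On a real cluster, each map call and each reduce key could run on a different machine because they are independent.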

15

Name some use cases for MapReduce.

Data mining
Spam detection
Ad optimisation
Index building in search engines
Article clustering for news
Statistical machine translation
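Index building is a natural fit for the map/reduce split. Map emits a (term, document) pair for each term in a document. Reduce collects each term's documents into a posting list. The sketch below is a simplified, single-process illustration. The function names and the whitespace tokenisation are assumptions, not how any particular search engine does it.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def index_map(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map: emit (term, doc_id) once per distinct term in the document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def index_reduce(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce: build a sorted, de-duplicated posting list for one term."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, d in index_map(doc_id, text):
            groups[term].append(d)  # shuffle: group doc IDs by term
    return dict(index_reduce(t, ids) for t, ids in groups.items())
```

With `{"d1": "Hadoop stores data", "d2": "Spark processes data"}`, the term `data` maps to `["d1", "d2"]` and `hadoop` maps to `["d1"]`.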

16

What does YARN stand for?

Yet Another Resource Negotiator

17

What does YARN do?

Manages cluster resources and schedules and monitors workloads

18

What are the main features of YARN?
A. Shared
B. Fast
C. Scalability
D. Flexibility
E. Efficiency

A
C
D
E

19

What is Pig?

Data flow language

20

What is Hive/HiveQL?

SQL-style query language

21

What is HBase?

Column-orientated database

22

What is Mahout?

Machine learning library

23

What is Spark?

An in-memory data processing engine

24

In the Hadoop ecosystem, which of these are data ingestion tools?
Flume
HBase
Sqoop
Storm

Flume
Sqoop
Storm

25

In the Hadoop ecosystem, which of these are analytics and machine learning tools?
Spark
Giraph
Mahout

Giraph
Mahout

26

Which of these are NoSQL databases used with Hadoop?
Tez
HBase
Cassandra
Spark

HBase
Cassandra

27

In the Hadoop ecosystem, which of these are processing engines?
Spark
Storm
Tez

Spark
Tez

28

What is ZooKeeper in Hadoop?

Cluster coordination and management (configuration, naming, distributed synchronization)

29

What does Hive do?

Converts SQL-style (HiveQL) queries into Java MapReduce jobs
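To see roughly what "converting SQL into MapReduce" means, here is a hand-written map/shuffle/reduce equivalent of a simple `GROUP BY` aggregate. This is a conceptual sketch only, not Hive's actual query plan. The `EMPLOYEES` table and its rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows of an "employees" table: (name, dept, salary)
EMPLOYEES: List[Tuple[str, str, int]] = [
    ("ana", "eng", 100),
    ("bo", "ops", 70),
    ("cy", "eng", 120),
    ("di", "ops", 80),
    ("ed", "hr", 60),
]


def group_by_sum(rows: List[Tuple[str, str, int]]) -> Dict[str, int]:
    """Hand-written MapReduce equivalent of:
    SELECT dept, SUM(salary) FROM employees GROUP BY dept
    """
    # Map: project each row to (GROUP BY key, value to aggregate).
    mapped = ((dept, salary) for _name, dept, salary in rows)
    # Shuffle: group the values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for dept, salary in mapped:
        groups[dept].append(salary)
    # Reduce: apply the aggregate (SUM) to each group.
    return {dept: sum(vals) for dept, vals in sorted(groups.items())}
```

Here `group_by_sum(EMPLOYEES)` yields `{"eng": 220, "hr": 60, "ops": 150}`. The `GROUP BY` column becomes the intermediate key, and the aggregate function becomes the reducer.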

30

What does HBase allow you to do?

Real-time random read/write access to large datasets
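HBase's data model is often described as a sparse, sorted map: row key, then column (family:qualifier), then timestamped versions of a value. The in-memory toy below mimics that shape to show why random gets and row-key range scans are cheap. It is a hypothetical illustration, not the HBase client API. The class, its method names, and the explicit timestamps are assumptions.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Dict[int, str]]  # column ("family:qualifier") -> {timestamp: value}


class ToyTable:
    """Hypothetical in-memory stand-in for HBase's data model (not the real client API)."""

    def __init__(self) -> None:
        self._keys: List[str] = []       # row keys kept in sorted order
        self._rows: Dict[str, Row] = {}  # row key -> columns -> versions

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        """Write one versioned cell; new row keys are inserted in sorted position."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Random read: latest version of a cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, stop: str) -> List[Tuple[str, Row]]:
        """Rows with start <= key < stop, returned in row-key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]
```

Writing `info:name` for `user#001` at timestamps 1 and 2 and then calling `get` returns the timestamp-2 value. A scan from `user#000` up to (but excluding) `user#002` returns only `user#001`.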