Big data analytics and data science role (EMCDSA), 29 flashcards
1

What are the 5 V's?

Volume
Velocity
Variety
Value
Veracity

2

What does Veracity mean?

The trustworthiness of the data: how far you can believe it is accurate and reliable
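One way to make veracity concrete is to score a batch of records against validation rules. The sketch below assumes hypothetical fields (`id`, `age`, `email`) and hypothetical rules. It is an illustration of the idea, not a standard measure, and reports the fraction of records that pass every rule.

```python
from typing import Callable

Record = dict[str, object]

# Hypothetical validation rules: each returns True when a record passes.
RULES: dict[str, Callable[[Record], bool]] = {
    "has_id": lambda r: isinstance(r.get("id"), int),
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "email_has_at": lambda r: isinstance(r.get("email"), str) and "@" in r["email"],
}


def veracity_score(records: list[Record]) -> float:
    """Return the fraction of records passing every rule (1.0 for an empty batch)."""
    if not records:
        return 1.0
    passed = sum(all(rule(r) for rule in RULES.values()) for r in records)
    return passed / len(records)


if __name__ == "__main__":
    batch = [
        {"id": 1, "age": 34, "email": "a@example.com"},
        {"id": 2, "age": 250, "email": "b@example.com"},   # implausible age
        {"id": None, "age": 41, "email": "c@example.com"},  # missing id
        {"id": 4, "age": 29, "email": "d@example.com"},
    ]
    print(veracity_score(batch))  # 0.5
```

A low score signals that conclusions drawn from the batch deserve less trust.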

3

What are the 2 types of data about data?

Metadata and paradata

4

What does metadata mean?

Data about the data: the minimum you should know about it (e.g. source, format, field definitions)

5

What does paradata mean?

A record of how the data has been processed
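The difference between metadata and paradata can be sketched as a dataset object that carries both. In this minimal sketch, assuming a hypothetical `Dataset` class, metadata describes what the data is, while paradata is an append-only log that each processing step extends. The source name `sensor-42` and the `drop_missing` step are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class Dataset:
    rows: list[Row]
    # Metadata: the minimum a user should know about the data.
    metadata: dict[str, str]
    # Paradata: an append-only record of how the data has been processed.
    paradata: list[str] = field(default_factory=list)

    def process(self, step: str, fn: Callable[[list[Row]], list[Row]]) -> "Dataset":
        """Run one processing step and return a new Dataset with the step logged."""
        out = fn(self.rows)
        entry = f"{step}: {len(self.rows)} rows in, {len(out)} rows out"
        return Dataset(out, dict(self.metadata), self.paradata + [entry])


if __name__ == "__main__":
    raw = Dataset(
        rows=[{"temp_c": 21.5}, {"temp_c": None}, {"temp_c": 19.0}],
        metadata={"source": "sensor-42", "units": "degrees Celsius"},
    )
    clean = raw.process("drop_missing", lambda rs: [r for r in rs if r["temp_c"] is not None])
    print(clean.metadata["source"])  # sensor-42
    print(clean.paradata)  # ['drop_missing: 3 rows in, 2 rows out']
```

Returning a new object rather than mutating keeps the original dataset and its (empty) history intact.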

6

What are the 4 data structures?

structured
semi-structured
quasi-structured
unstructured

7

What are the three types of data repository?

data islands
data warehouses
analytic sandbox

8

What is a data island?

Isolated data marts: records kept in spreadsheets and low-volume DBMSs

9

What is a data warehouse?

A centralised data repository that supports BI and reporting

10

What is an analytic sandbox?

A workspace where data assets from multiple sources are gathered, ready for analysis

11

What are the three big data project success factors?

timely decision making
processing throughput
flexibility

12

In what three ways does an analytic sandbox support the big data project success factors?

provides high-performance analysis
ingests data from many different sources
is owned by the data scientists rather than IT

13

What are the business drivers of big data/data science?

optimise business processes
predict new business opportunities
mitigate business risk
meet legal and regulatory requirements

14

What are the four parts of the big data ecosystem?

data devices
data collectors
data aggregators
data users/buyers

15

What are data devices in the big data ecosystem?

Devices that continuously gather data about the world (e.g. mobile phones)

16

What are data collectors in the big data ecosystem?

Entities that collect data from devices and users: people interact with many organisations and institutions and provide them with information in order to access their services

17

What are data aggregators in the big data ecosystem?

They take data from multiple sources, combine and enrich it, and provide the result to data consumers

18

What are data users/buyers in the big data ecosystem?

They consume data from their own sensor networks and data collectors, along with data acquired from data aggregators, to support data-driven decision making

19

Which looks at the past and which looks at the future: business intelligence (BI) or data science (DS)?

BI - the past
DS - the future

20

What are the 4 key roles within DS?

analytical talent
data savvy professionals
technology enablers
knowledge engineers

21

What does a knowledge engineer do?

Wrangles the data so it is ready for projects to consume

22

What are the 5 things a data scientist should be?

quantitative - do math
curious and creative
technical - code
skeptical - question
communicative and collaborative

23

What are the 3 original V words?

Volume
Variety
Velocity

24

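Describe semi-structured data.

Textual data with a discernible pattern that enables parsing, e.g. self-describing XML files defined by a schema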
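The "discernible pattern" is what lets a standard parser recover fields without a fixed table layout. This sketch uses Python's built-in `xml.etree.ElementTree` on a small hypothetical customer document; the tag names and values are assumptions for illustration. Note that one record omits a field, which a rigid table would not allow but semi-structured data tolerates.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical document: the tags describe the values they hold.
XML_DOC = """
<customers>
  <customer id="1"><name>Ana</name><city>Lisbon</city></customer>
  <customer id="2"><name>Ben</name></customer>
</customers>
"""


def parse_customers(xml_text: str) -> list[dict[str, Optional[str]]]:
    """Turn each <customer> element into a flat dict; missing fields become None."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for cust in root.iter("customer"):
        rows.append({
            "id": cust.get("id"),               # attribute value, as a string
            "name": cust.findtext("name"),
            "city": cust.findtext("city"),      # None when the element is absent
        })
    return rows


if __name__ == "__main__":
    for row in parse_customers(XML_DOC):
        print(row)
```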

25

Describe quasi-structured data.

Textual data with erratic formats that can be structured with effort, tools and time, e.g. web clickstream data
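Quasi-structured data needs extra effort before it can be analysed. The sketch below assumes hypothetical clickstream lines that log the same kind of event with three different date formats and an optional user field. A regular expression captures the common shape, the date is normalised by trying each assumed format in turn, and lines that cannot be recovered are dropped.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical log lines with inconsistent date formats and an optional user.
LINES = [
    "2024-03-01 10:15:02 GET /products/42 user=alice",
    "2024/03/01 10:15:09 GET /cart user=alice",
    "01-03-2024 10:16:00 GET /checkout",
    "corrupted line ###",
]

LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\d{2}:\d{2}:\d{2}) (?P<method>[A-Z]+) "
    r"(?P<path>\S+)(?: user=(?P<user>\w+))?$"
)
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y"]  # assumed formats in the log


def parse_date(text: str) -> Optional[datetime]:
    """Try each known date format; return None if none of them fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def parse_click(line: str) -> Optional[dict]:
    """Return a normalised event dict, or None if the line cannot be recovered."""
    m = LINE_RE.match(line)
    if not m:
        return None
    day = parse_date(m["date"])
    if day is None:
        return None
    t = datetime.strptime(m["time"], "%H:%M:%S").time()
    ts = datetime.combine(day.date(), t)
    return {"ts": ts.isoformat(), "path": m["path"], "user": m["user"]}


if __name__ == "__main__":
    events = [e for e in map(parse_click, LINES) if e is not None]
    for e in events:
        print(e)
```

Three of the four lines end up in one consistent format; the corrupted line is discarded rather than guessed at.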

26

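Big data uses ELT. What does it mean?

Extract, load, transform: raw data is loaded into the target store first and transformed there, rather than transformed before loading as in ETL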
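The ordering of ELT can be illustrated with Python's built-in `sqlite3` standing in for the target store. In this sketch, with hypothetical order records and table names, the raw rows (including a bad value) are loaded untouched, and the cleaning and aggregation happen afterwards as SQL inside the database. A real big data platform would use a different engine, but the load-then-transform order is the same.

```python
import sqlite3

# Extract: hypothetical raw records; amounts arrive as text and are not cleaned.
RAW_ORDERS = [
    ("o1", "alice", "19.99"),
    ("o2", "bob", "5.00"),
    ("o3", "alice", "n/a"),  # bad value, kept as-is in the raw layer
]


def run_elt(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    # Load: copy the raw data into the target store untouched.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)

    # Transform: filter out non-numeric amounts and aggregate, inside the database.
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT customer, ROUND(SUM(CAST(amount AS REAL)), 2) AS total
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'
        GROUP BY customer
        """
    )
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_elt(conn))
    conn.close()
```

Because the raw table is kept, a different transformation can be rerun later without extracting the data again.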

27

What is a data savvy professional?

Someone with an introductory understanding of DS

28

What is analytical talent?

People with advanced training in quantitative methods

29

What is a technology enabler?

Someone who looks after the hardware and software that analytics projects depend on