Data analytics lifecycle Flashcards Preview


Flashcards in Data analytics lifecycle Deck (32)
1

What are the main reasons to use frameworks?

efficient use of time
nothing gets forgotten
scale projects

2

Why use frameworks in data science?

acts as a guide
ensures the focus is on data science, not BI
data science needs a collaborative approach

3

what are the 2 key project roles that get a sponsor presentation?

Business user
project sponsor

4

what are the 2 key project roles that get the code and technical documents?

data engineer
data scientist

5

what are the 2 key project roles that get an analyst presentation?

BI analyst
Database administrator

6

what are the 7 key project roles?

business user
project sponsor
project manager
BI analyst
data engineer
database administrator
data scientist

7

what is the data analytics lifecycle? (6 phases)

discovery
data prep
model planning
model building
communicate results
operationalise

8

In discovery what are the seven main areas?

learn business domain
learn from the past
resources
frame the problem
interviewing
formulate initial hypothesis
identify data sources

9

In discovery (learn the business domain), what do you not need to do?
A) determine the amount of domain knowledge needed
B) determine the general analytic problem
C) decide what technique to use
D) conduct research if you have no idea about the domain

C) decide what technique to use

10

In discovery learn from the past what do you need to do?

have there been any previous attempts
why did they fail?

11

who is a business user?

someone who benefits from end results

12

who is the project sponsor?

person responsible for genesis of the project

13

who is a project manager?

ensures key milestones are met

14

who is the BI analyst?

business domain expert

15

who is the data engineer?

deep technical skills

16

who is the DBA?

provisions and configures database

17

who is the DS?

subject matter expert on analytical techniques; ensures the overall analytic objectives are met

18

what is CRISP-DM?

Cross-Industry Standard Process for Data Mining

19

what are the 6 phases of CRISP-DM?

business understanding
data understanding
data prep
modeling
evaluation
deployment

20

In discovery (resources), what do you need to assess?

available tech
data
people
time

21

In discovery frame the problem what are the objectives

What is the goal
What is the failure criterion
Identify the success criteria

22

In discovery (formulate initial hypotheses), what do you need to do? (2)

gather and assess hypotheses
data exploration to inform discussions

23

In discovery identify data sources what do you need to do? (4)

aggregate sources
review the raw data
determine the structures and tools
scope the kind of data needed

24

How big is an analytical sandbox?

about 10x the size of the original data
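
Worked example for this card: a minimal sketch of the sizing arithmetic, assuming the 10x figure is a multiple of the raw data size. The function name `sandbox_size_gb` and the 200 GB input are hypothetical, chosen only for illustration.

```python
def sandbox_size_gb(raw_data_gb: float, multiplier: float = 10.0) -> float:
    """Estimate sandbox storage as a multiple of the raw data size.

    The default multiplier of 10 follows the rule of thumb on the card;
    treat it as a planning estimate, not a hard requirement.
    """
    if raw_data_gb < 0:
        raise ValueError("raw_data_gb must be non-negative")
    return raw_data_gb * multiplier


# Hypothetical project with 200 GB of raw source data.
estimate = sandbox_size_gb(200)
print(f"Plan for roughly {estimate:.0f} GB of sandbox space")
```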

25

In data prep what are the phases? (5)

prepare sandbox
perform ELT
familiarise with the data
data conditioning
survey and visualise

26

in model planning what are the phases? (6)

determine methods
techniques and workflow
data exploration
variable selection
model selection
test & train
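
Worked example for the "test & train" step: a minimal sketch of splitting a dataset into training and test sets using only the standard library. The helper name `train_test_split`, the 30% test fraction, and the fixed seed are assumptions for illustration; the deck does not prescribe a particular split.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    A fixed seed keeps the split reproducible between runs.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows form the test set; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of ten record ids.
train, test = train_test_split(range(10))
print(len(train), "training rows,", len(test), "test rows")
```

Holding the test rows out of model fitting is what lets the later model building phase check how well a model generalises.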

27

how much time is spent in data prep?
a) 50%
b) 60%
c) 70%

c) 70%

28

what should you do in communicate results?

make recommendations
compare results
identify key findings

29

what should you do in operationalise?

run a pilot
assess benefits
implement model

30

why run a pilot?

make sure the model is robust