Book - Chapter 1 intro to big data analytics Flashcards Preview

Flashcards in Book - Chapter 1 intro to big data analytics Deck (36)
1

What are the Vs of big data

Volume. Velocity. Variety.

2

What is metadata

The minimum you should know about the data

3

What is paradata

How the data has been processed, and what artefacts that processing has left in the data

4

What is velocity

The speed at which new data is created and grows

5

What are the three attributes that stand out as defining big data characteristics

Huge volume of data
Complexity of data types and structures
Speed of new data creation and growth

6

What is huge volume of data

Rather than thousands of rows, big data can be billions of rows and millions of columns

7

What is complexity of data types and structures

It reflects the variety of new data sources, formats and structures, including digital traces being left on the web and in other digital repositories for subsequent analysis

8

What is speed of new data creation and growth

Big data can describe high-velocity data, with rapid data ingestion and near-real-time analysis

9

What is big data sometimes described as having

The big three Vs

10

What are the big three Vs

Volume, variety and velocity

11

Can big data be efficiently analysed using only traditional databases or methods

No, it requires new tools and technologies to store it, manage it and realise the business benefits

12

What two main forms can big data come in

Structured and nonstructured

13

What form does most big data take

Usually unstructured or semistructured in nature, which requires different techniques and tools to process and analyse

14

Where does 80 to 90% of future data growth come from

Non-structured data types

15

What sort of data could an RDBMS hold in addition to structured data

Quasi- or semistructured data, such as free-form call log information taken from an email ticket of the problem, or customer chat history

16

What are the four parts of big data characteristics: data structures

Bottom: unstructured
Third: quasi-structured
Second: semistructured
Top: structured

17

What is quasi structured

Erratic structure, for example web clickstream data

18

What is semistructured

Structure definition is embedded in the data

19

What is structured

External definition of structure

20

What does structured data consist of

A defined data type, format, and structure (transaction data, online analytical processing (OLAP) data cubes, traditional RDBMS, CSV files and even simple spreadsheets such as Excel)
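
A minimal Python sketch of what "defined data type, format, and structure" can mean for a CSV file. The column names and the SCHEMA mapping are invented for illustration and are not from the book; the point is only that the structure is declared outside the data and each field has a known type.

```python
import csv
import io

# Hypothetical structured data: every row has the same three columns.
CSV_TEXT = "order_id,amount,currency\n1,19.99,USD\n2,5.00,EUR\n"

# Assumed schema for this example: column name -> Python type.
SCHEMA = {"order_id": int, "amount": float, "currency": str}

def load_rows(csv_text, schema):
    """Read CSV text and cast each column to the type the schema declares."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: cast(row[col]) for col, cast in schema.items()} for row in reader]

rows = load_rows(CSV_TEXT, SCHEMA)
```

Because the schema is known in advance, every row can be loaded and typed without inspecting the values first.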

21

What does semistructured data consist of

Textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language (XML) data files that are self-describing and defined by an XML schema)

Scripts
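
As a rough illustration of structure being embedded in the data, the sketch below parses a small XML snippet with Python's standard library. The element and attribute names (tickets, ticket, id, subject) are made up for this example; a real file would follow whatever its own XML schema defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing XML: the tag names carry the structure.
XML_TEXT = """
<tickets>
  <ticket id="101"><subject>Login fails</subject></ticket>
  <ticket id="102"><subject>Slow report</subject></ticket>
</tickets>
"""

def parse_tickets(xml_text):
    """Return a list of (id, subject) pairs read from the XML text."""
    root = ET.fromstring(xml_text.strip())
    return [(t.get("id"), t.findtext("subject")) for t in root.iter("ticket")]

tickets = parse_tickets(XML_TEXT)
```

No external column definition is needed here: the parser finds each field by the tag that surrounds it.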

22

What does quasi-structured data consist of

Textual data with erratic data formats that can be formatted with effort, time, and tools (for instance, web clickstream data that may contain inconsistencies in data values and formats)
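
A small sketch of the "effort and tools" that quasi-structured data can need, assuming clickstream entries arrive as URLs with inconsistent casing and sometimes without a scheme. The sample URLs and the normalise_click helper are hypothetical; only the standard library urllib.parse functions are real.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical clickstream entries with inconsistent formats.
RAW_CLICKS = [
    "https://www.example.com/search?q=big+data",
    "HTTP://WWW.EXAMPLE.COM/Search?Q=EMC",
    "www.example.com/products",
]

def normalise_click(raw):
    """Turn one raw URL into a dict with lower-cased host, path and query keys."""
    if "://" not in raw:
        # Assumption for this sketch: entries without a scheme are plain HTTP.
        raw = "http://" + raw
    parts = urlsplit(raw)
    query = {key.lower(): values for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc.lower(), "path": parts.path.lower(), "query": query}

clicks = [normalise_click(raw) for raw in RAW_CLICKS]
```

After this step all three entries share one shape, which is what makes later analysis possible.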

23

What does unstructured data consist of

Data that has no inherent structure, such as text documents, PDFs, images and video

24

How can a clickstream be used

It can be parsed and mined by data scientists to discover usage patterns and uncover relationships among clicks and areas of interest on a website or group of sites
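
One simple way to look for usage patterns in a clickstream is to count how often visitors move from one page to the next. The sketch below does this for an invented, already-parsed list of (session, page) clicks in time order; the session IDs, page paths and the count_transitions helper are assumptions for illustration, not a method described in the book.

```python
from collections import Counter

# Hypothetical parsed clickstream: (session_id, page), in time order.
CLICKS = [
    ("s1", "/home"), ("s1", "/search"), ("s1", "/product"),
    ("s2", "/home"), ("s2", "/search"),
    ("s3", "/home"), ("s3", "/product"),
]

def count_transitions(clicks):
    """Count page-to-page moves within each session."""
    last_page = {}            # most recent page seen per session
    transitions = Counter()   # (from_page, to_page) -> number of occurrences
    for session, page in clicks:
        if session in last_page:
            transitions[(last_page[session], page)] += 1
        last_page[session] = page
    return transitions

transitions = count_transitions(CLICKS)
most_common_move = transitions.most_common(1)[0]
```

In this toy data the most frequent move is from /home to /search, which is the kind of relationship among clicks that the card refers to.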

25

How does big data describe data

It describes new kinds of data with which most organisations may not be used to working

26

Is database administration training required to create spreadsheets

No

27

What is an EDW

Enterprise data warehouse

28

What are enterprise data warehouses critical for

Reporting and BI tasks; they also solve many of the problems that proliferating spreadsheets introduce, such as which of multiple versions of a spreadsheet is correct

29

Despite the benefits of EDWs and BI, what do these systems tend to restrict

The flexibility needed to perform robust or exploratory data analysis

30

With the EDW model who is the data managed and controlled by

IT groups and database administrators (DBAs); data analysts must depend on IT for access and changes to the data schemas