Book - Chapter 1 intro to big data analytics Flashcards Preview

Flashcards in Book - Chapter 1 intro to big data analytics Deck (36)

What are the Vs of big data

Volume. Velocity. Variety.


What is metadata

The minimum you should know about the data


What is paradata

How the data has been processed, and what artefacts that processing has left in the data


What is velocity

Speed: how quickly new data is created and grows


What are the three attributes that stand out as defining big data characteristics

Huge volume of data
Complexity of data types and structures
Speed of new data creation and growth


What is huge volume of data

Rather than thousands of rows, big data can be billions of rows and millions of columns


What is complexity of data types and structures

It reflects the variety of new data sources, formats and structures, including digital traces being left on the web and in other digital repositories for subsequent analysis


What is speed of new data creation and growth

Big data can describe high-velocity data, with rapid data ingestion and near real-time analysis


What is big data sometimes described as having

The big three Vs


What are the big three Vs

Volume, variety and velocity


Can big data be efficiently analysed using only traditional databases or methods

No, it requires new tools and technologies to store, manage and realise the business benefits


What two main forms can big data come in

Structured and non-structured


How is most of the big data formed

Usually unstructured or semistructured in nature, which requires different techniques and tools to process and analyse


Where does 80 to 90% of future data growth come from

Non-structured data types


What sort of data, in addition to structured data, could an RDBMS hold

Quasi- or semistructured data, such as free-form call log information taken from an email ticket of the problem, or customer chat history


What are the four parts of big data characteristics: data structures

Bottom: unstructured
Third: quasi-structured
Second: semistructured
Top: structured


What is quasi structured

Erratic structure, e.g. web clickstream data


What is semistructured

Structure definition is embedded in the data


What is structured

External definition of structure


What does structured data consist of

A defined data type, format, and structure (transaction data, online analytical processing (OLAP) data cubes, traditional RDBMS, CSV files and even simple spreadsheets such as Excel)
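
As a rough illustration of "a defined data type, format, and structure", the sketch below loads CSV text and casts each column to a type declared outside the data. The column names, the `SCHEMA` mapping and the sample rows are all made up for this example; they are not from the book.

```python
import csv
import io

# Hypothetical schema: the structure is defined externally, not inside the data.
SCHEMA = {"order_id": int, "customer": str, "amount": float}

# Hypothetical CSV content standing in for a real file.
RAW = "order_id,customer,amount\n1,Ann,19.99\n2,Bob,5.00\n"

def load_rows(text, schema):
    """Parse CSV text and cast every column to the type the schema declares."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: cast(row[col]) for col, cast in schema.items()} for row in reader]

rows = load_rows(RAW, SCHEMA)
total = sum(r["amount"] for r in rows)  # typed values can be aggregated directly
```

Because every value arrives with a known type, aggregation needs no per-row guessing, which is the practical benefit of structured data.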


What does semistructured data consist of

Textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language (XML) data files that are self-describing and defined by an XML schema)
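
A minimal sketch of why self-describing data is easy to parse: the tags in the XML below name each field, so the standard-library parser can recover records without a separate format description. The `tickets` document and its field names are invented for illustration, and the sketch does not validate against an XML schema (the standard-library `xml.etree.ElementTree` module does not do schema validation).

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tags describe the structure of the data.
XML_DOC = """<tickets>
  <ticket id="101" channel="email">
    <customer>Ann</customer>
    <summary>Cannot log in</summary>
  </ticket>
  <ticket id="102" channel="chat">
    <customer>Bob</customer>
    <summary>Billing question</summary>
  </ticket>
</tickets>"""

def parse_tickets(xml_text):
    """Turn each <ticket> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(t.get("id")),
            "channel": t.get("channel"),
            "customer": t.findtext("customer"),
            "summary": t.findtext("summary"),
        }
        for t in root.iter("ticket")
    ]

tickets = parse_tickets(XML_DOC)
```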



What does quasi-structured data consist of

Textual data with erratic data formats that can be formatted with effort, time, and tools (for instance, web clickstream data that may contain inconsistencies in data values and formats)
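
The "effort, time, and tools" can be sketched as a small normalisation step. The log lines, the three timestamp formats and the day-first reading of `05/01/2024` are all assumptions made for this example; real clickstream logs will differ.

```python
import re
from datetime import datetime

# Hypothetical raw click records with inconsistent timestamp and URL formats.
RAW_LINES = [
    "2024-01-05 10:00:01 GET /Home",
    "05/01/2024 10:00:07 get /products?id=7",
    "2024-01-05T10:00:15 GET /products/",
    "garbage line with no usable fields",
]

# Assumed formats, tried in order; day-first is assumed for the slash format.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M:%S", "%Y-%m-%dT%H:%M:%S")
LINE_PATTERN = re.compile(r"^(?P<ts>\d[\d/-]+[ T][\d:]+)\s+(?P<method>\w+)\s+(?P<path>\S+)$")

def parse_timestamp(text):
    """Return a datetime for the first format that fits, else None."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None

def normalise(line):
    """Return (timestamp, METHOD, path) in one consistent form, or None if unusable."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    timestamp = parse_timestamp(match["ts"])
    if timestamp is None:
        return None
    # Drop the query string, lowercase the path, and strip any trailing slash.
    path = match["path"].split("?", 1)[0].lower().rstrip("/") or "/"
    return timestamp, match["method"].upper(), path

clean = [record for record in map(normalise, RAW_LINES) if record is not None]
```

Lines that cannot be parsed are dropped here for brevity; in practice they would more likely be logged and inspected.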


What does unstructured data consist of

Text documents, PDFs, images and video, i.e. data that has no inherent structure


How can a clickstream be used

It can be parsed and mined by data scientists to discover usage patterns and uncover relationships among clicks and areas of interest on a website or group of sites
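
One simple way to look for usage patterns, sketched under the assumption that clicks have already been parsed into (session, order, page) tuples: count how often each page is viewed and how often one page is followed by another within a session. The `EVENTS` data and the `page_transitions` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed click events: (session_id, order_in_session, page).
EVENTS = [
    ("s1", 1, "/home"), ("s1", 2, "/products"), ("s1", 3, "/cart"),
    ("s2", 1, "/home"), ("s2", 2, "/products"),
    ("s3", 1, "/home"), ("s3", 2, "/blog"),
]

def page_transitions(events):
    """Count page-to-page transitions within each session."""
    sessions = defaultdict(list)
    # Sorting the tuples orders clicks by session, then by position in the session.
    for session_id, _order, page in sorted(events):
        sessions[session_id].append(page)
    transitions = Counter()
    for pages in sessions.values():
        transitions.update(zip(pages, pages[1:]))  # consecutive page pairs
    return transitions

transitions = page_transitions(EVENTS)
page_views = Counter(page for _, _, page in EVENTS)  # areas of interest by view count
```

Here `transitions.most_common(1)` would surface the most frequent click path, a starting point for the kind of relationship the card describes.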


How does big data describe data

It describes new kinds of data with which most organisations may not be used to working


Is database administration training required to create spreadsheets

No


What is an EDW

Enterprise data warehouse


What are enterprise data warehouses critical for

Reporting and BI tasks; they also solve many of the problems that proliferating spreadsheets introduce, such as which of multiple versions of a spreadsheet is correct


Despite the benefits of EDWs and BI, what do these systems tend to restrict

The flexibility needed to perform robust or exploratory data analysis


With the EDW model who is the data managed and controlled by

IT groups and database administrators (DBAs); data analysts must depend on IT for access and changes to the data schemas