Chapter 7: Processing Data Flashcards

1
Q

desktop software

A

applications that assist one user in performing certain tasks

2
Q

enterprise software

A

consists of applications that assist multiple users within an organisation

3
Q

embedded software

A

software designed for a specific purpose that is often embedded in physical products

4
Q

firmware

A

software that is stored on non-volatile memory chips (e.g. ROM or flash memory)

5
Q

transaction processing system (TPS)

A

operational system that records data about the fundamental activities of the organisation (can be limited to a specific department)

6
Q

batch processing

A

data is collected in temporary storage and then processed together as a single unit (a batch) –> e.g. money transfers between banks (processing takes time, so delays occur)
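
A minimal sketch of the idea, assuming a toy bank with in-memory balances: `Transfer` and `BatchProcessor` are hypothetical names, not part of any real banking system. Submitted transfers only wait in temporary storage; the balances change when the whole batch is run.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    sender: str
    receiver: str
    amount: float


class BatchProcessor:
    """Hypothetical batch processor: transfers wait in temporary storage
    and are only applied to the balances when run_batch() is called."""

    def __init__(self, balances: dict[str, float]) -> None:
        self.balances = balances
        self.pending: list[Transfer] = []  # the temporary storage

    def submit(self, transfer: Transfer) -> None:
        # Nothing is processed yet; the transfer is only queued.
        self.pending.append(transfer)

    def run_batch(self) -> int:
        # Process every queued transfer together as a single unit.
        for t in self.pending:
            self.balances[t.sender] -= t.amount
            self.balances[t.receiver] += t.amount
        processed = len(self.pending)
        self.pending.clear()
        return processed


bank = BatchProcessor({"alice": 100.0, "bob": 50.0})
bank.submit(Transfer("alice", "bob", 30.0))
print(bank.balances)  # {'alice': 100.0, 'bob': 50.0}, the delay before the batch runs
bank.run_batch()
print(bank.balances)  # {'alice': 70.0, 'bob': 80.0}
```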

7
Q

online transaction processing (OLTP)

A

data is processed immediately, so that the current state of the system is always reflected
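
For contrast with batch processing, a minimal sketch of the OLTP idea, assuming the same toy accounts: `OLTPAccountStore` is a hypothetical class, not a real database API. Each transfer is validated and applied the moment it arrives, so the balances are always current.

```python
class OLTPAccountStore:
    """Hypothetical OLTP-style store: every transaction is applied as soon
    as it arrives, so the balances always reflect the current state."""

    def __init__(self, balances: dict[str, float]) -> None:
        self.balances = dict(balances)

    def transfer(self, sender: str, receiver: str, amount: float) -> None:
        # Validate everything first so a failed transfer changes nothing.
        if sender not in self.balances or receiver not in self.balances:
            raise KeyError("unknown account")
        if self.balances[sender] < amount:
            raise ValueError("insufficient funds")
        self.balances[sender] -= amount
        self.balances[receiver] += amount


store = OLTPAccountStore({"alice": 100.0, "bob": 50.0})
store.transfer("alice", "bob", 30.0)
print(store.balances)  # {'alice': 70.0, 'bob': 80.0}, visible immediately
```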

8
Q

enterprise systems

A

systems that combine the data collected and processed by the company's various departments into a single integrated whole

9
Q

enterprise resource planning

A

integrates the core functions of an organisation into a homogeneous system

10
Q

customer relationship management

A

integrates customer data so that it can be used by different departments

11
Q

Database management systems (DBMS)

A

software that collects and disseminates information that is created and used by multiple applications

12
Q

data warehouse

A

collects data from various core transaction systems throughout the organisation, stores it, and provides analysis and reporting tools

13
Q

e-discovery

A

the process of identifying and retrieving information from archives to support lawsuits

14
Q

data mart

A

subset of the data stored in a data warehouse –> contains a highly concentrated part of the organisation's data. Used to analyse specific processes and gain insight into the company

15
Q

data aggregators

A

companies that are purely focussed on collecting and selling data to other companies

16
Q

business intelligence tools

A

tools that help to merge, analyse and access data with the aim of supporting organisational decision making

17
Q

ad hoc reporting tools

A

enable users to create their own reports and easily modify them

18
Q

online analytical processing (OLAP)

A

data is extracted from traditional databases, pre-calculated and summarised, and stored in data cubes

19
Q

data cubes

A

special databases that structure data across multiple dimensions, such as place, products and time
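
A minimal sketch of a data cube and an OLAP-style roll-up, assuming hypothetical sales records with the dimensions place, product and time (month). Plain dictionaries stand in for the cube; real OLAP engines precompute and store such summaries, while this sketch computes them on demand.

```python
from collections import defaultdict

# Hypothetical sales records: (place, product, month, revenue)
sales = [
    ("Brussels", "laptop", "2024-01", 1200),
    ("Brussels", "phone", "2024-01", 800),
    ("Ghent", "laptop", "2024-01", 900),
    ("Brussels", "laptop", "2024-02", 1100),
    ("Ghent", "phone", "2024-02", 700),
]

DIMENSIONS = ("place", "product", "month")

# Build the cube: each cell is indexed by one value per dimension.
cube = defaultdict(int)
for place, product, month, revenue in sales:
    cube[(place, product, month)] += revenue


def roll_up(cube, keep):
    """Summarise the cube over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    result = defaultdict(int)
    for cell, value in cube.items():
        result[tuple(cell[i] for i in idx)] += value
    return dict(result)


print(roll_up(cube, ["place"]))  # {('Brussels',): 3100, ('Ghent',): 1600}
print(roll_up(cube, ["product", "month"]))
```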

20
Q

legacy systems

A

obsolete information systems that are not designed to share data, are not compatible with new technologies and are not aligned with the current needs of an organisation

21
Q

data mining

A

using specific algorithms to detect hidden patterns in large data sets and build models from them –> the data must be consistent and clean, and the events in the data must reflect current and future trends

22
Q

over-engineering

A

when so many variables are included in a model that the solution found probably only works on the subset of data that was used to find it
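
A minimal sketch of the effect (in machine learning this is usually called overfitting), assuming hypothetical data that roughly follows y = 2x plus a little noise. A 2-parameter line is compared with a polynomial that has one parameter per data point and passes exactly through every training point.

```python
def fit_line(xs, ys):
    """Least-squares straight line: a simple model with 2 parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_interpolating_polynomial(xs, ys):
    """Lagrange polynomial through every point: one parameter per data point."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: roughly y = 2x plus a little noise.
train_x, train_y = [0, 1, 2, 3, 4, 5], [0.1, 2.3, 3.8, 6.2, 7.9, 10.1]
test_x, test_y = [6, 7], [12.0, 14.1]

simple = fit_line(train_x, train_y)
overfitted = fit_interpolating_polynomial(train_x, train_y)
for name, model in (("line", simple), ("polynomial", overfitted)):
    print(f"{name}: train={mse(model, train_x, train_y):.3f} test={mse(model, test_x, test_y):.3f}")
```

The polynomial has zero error on the training data but a far larger error on the new points, which is exactly the "only works on the subset it was found with" problem.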

23
Q

association rule mining

A

tries to identify the most common affinities between items

24
Q

market basket analysis

A

looks at individual customer transactions and examines which products are bought together
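
A minimal sketch of market basket analysis, assuming a handful of hypothetical baskets: it counts how often each pair of products appears together in the same transaction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]

pair_counts = Counter()
for basket in transactions:
    # sorted() gives each pair a fixed order, so ("bread", "butter") and
    # ("butter", "bread") are counted as the same pair.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))
```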

25
Q

support s(X)

A

the fraction of transactions that contain a given set of items X –> the number of transactions containing X divided by the total number of transactions
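
A minimal sketch of s(X), assuming the same kind of hypothetical baskets as above: a transaction counts only if it contains every item in X.

```python
# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]


def support(itemset, transactions):
    """s(X): fraction of transactions that contain every item in X."""
    wanted = set(itemset)
    return sum(wanted <= basket for basket in transactions) / len(transactions)


print(support({"bread"}, transactions))            # 0.8
print(support({"bread", "butter"}, transactions))  # 0.4
```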

26
Q

confidence c(X–>Y)

A

the fraction of transactions containing Y from the group of transactions containing X –> the number of transactions containing both X and Y divided by the number of transactions containing X
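
A minimal sketch of c(X –> Y) = s(X ∪ Y) / s(X), assuming hypothetical baskets. Because both supports share the same denominator (the total number of transactions), the sketch divides the raw counts directly. It also shows that confidence is not symmetric.

```python
# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    wanted = set(itemset)
    return sum(wanted <= basket for basket in transactions)


def confidence(x, y, transactions):
    """c(X -> Y): share of the transactions containing X that also contain Y."""
    denominator = count(x, transactions)
    if denominator == 0:
        raise ValueError("X never occurs, so c(X -> Y) is undefined")
    return count(set(x) | set(y), transactions) / denominator


print(confidence({"bread"}, {"butter"}, transactions))  # 0.5
print(confidence({"butter"}, {"bread"}, transactions))  # 1.0
```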

27
Q

pruning

A

used to first identify item bundles with high support (bundles below a minimum support are discarded). Within the remaining bundles, it then looks for rules with high confidence
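
A minimal sketch of support-then-confidence pruning, assuming hypothetical baskets and thresholds (`MIN_SUPPORT`, `MIN_CONFIDENCE`). It is simplified to bundles of two items; algorithms such as Apriori extend the same pruning idea to larger bundles.

```python
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]
MIN_SUPPORT = 0.4     # hypothetical thresholds
MIN_CONFIDENCE = 0.6


def count(itemset):
    wanted = set(itemset)
    return sum(wanted <= basket for basket in transactions)


def support(itemset):
    return count(itemset) / len(transactions)


items = sorted(set().union(*transactions))
# Step 1a: drop single items with low support; a bundle can never occur
# more often than any of its items, so such bundles cannot be frequent.
frequent_items = [i for i in items if support({i}) >= MIN_SUPPORT]
# Step 1b: keep only the bundles (here: pairs) with high support.
frequent_pairs = [p for p in combinations(frequent_items, 2) if support(p) >= MIN_SUPPORT]
# Step 2: within the surviving bundles, keep only high-confidence rules X -> Y.
rules = []
for a, b in frequent_pairs:
    for x, y in ((a, b), (b, a)):
        conf = count({x, y}) / count({x})
        if conf >= MIN_CONFIDENCE:
            rules.append((x, y, conf))

print(rules)
```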

28
Q

clustering

A

tries to minimise the sum of the distances between the centre of each cluster and all observations belonging to that cluster

29
Q

K-means clustering

A

each data point is allocated to the nearest cluster centre, and then each cluster centre is moved to the mean of its points to minimise the total (squared) distance to them; these two steps repeat until the assignments no longer change
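
A minimal sketch of k-means, assuming 2-D points and a hypothetical `k_means` helper. Standard k-means minimises the sum of squared Euclidean distances, which is why the update step moves each centre to the mean of its points. The starting centres are random, so results can depend on the seed.

```python
import math
import random

Point = tuple[float, float]


def k_means(points: list[Point], k: int, max_iterations: int = 100, seed: int = 0):
    """Plain k-means: returns the final centres and the points in each cluster."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(max_iterations):
        # Assignment step: allocate every point to its nearest cluster centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its points, which
        # minimises the sum of squared distances within that cluster.
        new_centres = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centres == centres:  # no centre moved, so the assignments are stable
            break
        centres = new_centres
    return centres, clusters


# Hypothetical data: two well-separated groups of three points each.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centres, clusters = k_means(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```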

30
Q

Characteristics of big data

A
  1. Velocity: speed at which data is generated and must be processed
  2. Volume: size of the dataset that needs to be processed
  3. Variety: different formats and characteristics of data
  4. Veracity: reliability of data
31
Q

analytics

A

combines classical statistics with artificial intelligence to derive actionable insights from big data

32
Q

machine learning

A

large amounts of data are used so that computers can improve the accuracy of their actions and predictions without being explicitly programmed

33
Q

neural networks

A

are trained on large historical data sets to find patterns in them, so that a model can be built that exploits those findings –> the accuracy of the findings increases as training is repeated
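
A minimal sketch of training by repetition, assuming the smallest possible neural network: a single artificial neuron (perceptron) learning the logical OR of two inputs from a hypothetical toy data set. Real neural networks have many layers and use gradient-based training; this only shows accuracy rising over repeated passes.

```python
def predict(weights, bias, x):
    # Fire (output 1) if the weighted sum of the inputs exceeds 0.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def accuracy(weights, bias, samples):
    return sum(predict(weights, bias, x) == y for x, y in samples) / len(samples)


def train(samples, epochs=5, learning_rate=0.1):
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    history = []  # accuracy after each pass over the data
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Perceptron learning rule: nudge the weights towards the right answer.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
        history.append(accuracy(weights, bias, samples))
    return weights, bias, history


# Hypothetical "historical" data: the logical OR of two inputs.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias, history = train(samples)
print(history)  # [0.75, 0.75, 1.0, 1.0, 1.0]
```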

34
Q

expert systems

A

use rules or examples to perform tasks in a way that imitates human expertise
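
A minimal sketch of a rule-based expert system using forward chaining, assuming a hypothetical loan-approval rule base: rules fire whenever all their conditions are known facts, until nothing new can be concluded.

```python
# Hypothetical rule base: (conditions that must all hold, conclusion).
RULES = [
    ({"stable income", "low debt"}, "creditworthy"),
    ({"creditworthy", "collateral"}, "approve loan"),
    ({"missed payments"}, "high risk"),
]


def infer(facts, rules):
    """Forward chaining: fire rules until no new conclusions can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)  # only the newly derived conclusions


print(sorted(infer({"stable income", "low debt", "collateral"}, RULES)))
# ['approve loan', 'creditworthy']
```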

35
Q

genetic algorithms

A

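computers search for solutions to a problem by mimicking evolution: candidate solutions are selected, combined (crossover) and mutated over many generations, so that better solutions survive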
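
A minimal sketch of a genetic algorithm, assuming a hypothetical toy problem ("OneMax": find a 20-bit string of all 1s) and arbitrary settings for population size, generations and mutation rate. Keeping the fitter half each generation means the best fitness can never decrease.

```python
import random

LENGTH = 20  # hypothetical problem: find a bit string of all 1s ("OneMax")


def fitness(bits):
    return sum(bits)


def evolve(pop_size=30, generations=100, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(LENGTH)] for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if history[-1] == LENGTH:
            break  # optimal solution found
        # Selection: the fitter half survives unchanged and becomes the parents.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, LENGTH)
            child = mum[:cut] + dad[cut:]  # crossover
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve()
print(fitness(best), history[:5])
```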

36
Q

Hadoop

A

open-source framework used for distributed storage and analysis of large datasets across clusters of computers
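
Hadoop analyses data with the MapReduce model: a map step runs on each chunk of the data, the framework groups the intermediate results by key, and a reduce step combines each group. The sketch below simulates a word count in a single Python process with hypothetical input chunks; it does not use any of Hadoop's actual APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str):
    # Emit (word, 1) for every word in this chunk of input.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


# Hypothetical input splits; on a cluster, each would be mapped on a different node.
chunks = ["big data needs big storage", "Hadoop stores big data"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"])  # 3
```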

37
Q

4 advantages of Hadoop

A
  1. scalability
  2. flexibility
  3. cost efficiency
  4. fault tolerance
38
Q

black-box method

A

a method (e.g. a neural network) whose inner workings are opaque, so it is hard to quantify the impact of a specific input variable on the outcome