week 3 - Why Distribution? Flashcards Preview

Flashcards in week 3 - Why Distribution? Deck (22)
1
Q

Four items that make up a distributed database system

A

1) Availability
2) Scalability
3) Reliability/ Fault Tolerance
4) Transparency

2
Q

Why do corporations use distribution in databases?

1) It prevents data collisions for quicker updates.
2) It helps by allowing a better understanding of data within an organization.
3) Distributed databases can avoid large traffic because the replicated data can be accessed locally.

A

3) Distributed databases can avoid large traffic because the replicated data can be accessed locally.

3
Q

What does transparency refer to in database terminology?

1) Separation of higher level semantics of a database system from the lower level implementation issues.
2) The ability of users to see the details of how a database system works.
3) The ability of users to manage the details of lower level database implementation.
4) The ability of users to see where pieces of data are stored in a distributed database system.

A

1) Separation of higher level semantics of a database system from the lower level implementation issues.

4
Q

What does scalability refer to in database terminology?

1) The ability of a database system to expand and serve more users.
2) The ability of a database system to provide faster access speed.
3) The ability of a database system to provide better access with a minimum quality of service guarantee.
4) The ability of a database system to provide the same level of service with a lower cost.

A

1) The ability of a database system to expand and serve more users.

5
Q

What is horizontal fragmentation?

A

Split the rows of a table across two or more tables.

For example, half the rows go into one table and half go into another.
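
A minimal sketch of horizontal fragmentation, assuming a made-up employee table held as a list of dicts. The column names and the `horizontal_fragment` helper are hypothetical, chosen only for illustration:

```python
# Hypothetical employee table: each row is a dict.
employees = [
    {"id": 1, "name": "Ann", "city": "Phoenix"},
    {"id": 2, "name": "Bob", "city": "Tempe"},
    {"id": 3, "name": "Cho", "city": "Phoenix"},
]

def horizontal_fragment(rows, predicate):
    """Return the rows that satisfy the predicate (a selection on the table)."""
    return [row for row in rows if predicate(row)]

# Two fragments with the same columns but different rows.
phoenix = horizontal_fragment(employees, lambda r: r["city"] == "Phoenix")
others = horizontal_fragment(employees, lambda r: r["city"] != "Phoenix")
```

Every fragment keeps all columns; only the set of rows differs.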

6
Q

What is vertical fragmentation?

A

Split the columns of a table across two or more tables
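
A minimal sketch of vertical fragmentation, assuming a made-up table whose `id` column is the key. Repeating the key in every fragment is what lets the fragments be joined back together; the helper names `vertical_fragment` and `rejoin` are hypothetical:

```python
# Hypothetical employee table; "id" is assumed to be the key.
employees = [
    {"id": 1, "name": "Ann", "salary": 50000},
    {"id": 2, "name": "Bob", "salary": 60000},
]

def vertical_fragment(rows, key, columns):
    """Project each row onto the key plus the chosen columns."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]

def rejoin(left, right, key):
    """Join two vertical fragments back together on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

names = vertical_fragment(employees, "id", ["name"])
salaries = vertical_fragment(employees, "id", ["salary"])
rebuilt = rejoin(names, salaries, "id")
```

Here `rebuilt` holds the same rows as `employees`, because both fragments carry the key.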

7
Q

Which comes first, fragmentation or replication?

A

Fragmentation

8
Q

Three main properties of good Fragmentation

A

1) Completeness
2) Reconstruction
3) Disjointness

9
Q

Completeness of Fragmentation

A

No data item is lost: every item in the original table appears in at least one fragment

10
Q

Reconstruction of Fragmentation

A

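You must be able to rebuild the original table from its fragments, so the fragments need some element (such as a shared key) that you can use to re-create the relationship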

11
Q

Disjointness of Fragmentation

A

No duplicates: a data item does not appear in more than one fragment
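
A minimal sketch of how the three properties could be checked for horizontal fragments, assuming rows are stored as tuples. The function names and sample data are hypothetical:

```python
# Hypothetical table and two horizontal fragments; rows are (id, city) tuples.
table = [(1, "Phoenix"), (2, "Tempe"), (3, "Phoenix")]
frag_a = [(1, "Phoenix"), (3, "Phoenix")]
frag_b = [(2, "Tempe")]

def union_of(fragments):
    """Set of all rows that appear in any fragment."""
    return set().union(*map(set, fragments))

def is_complete(table, fragments):
    """Completeness: every row of the table is in at least one fragment."""
    return set(table) <= union_of(fragments)

def is_disjoint(fragments):
    """Disjointness: no row is stored in more than one fragment."""
    return sum(len(set(f)) for f in fragments) == len(union_of(fragments))

def reconstructs(table, fragments):
    """Reconstruction: the union of the fragments gives back the table."""
    return union_of(fragments) == set(table)
```

For vertical fragments the reconstruction step would be a join on the shared key rather than a union.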

12
Q

Why is fragmentation a useful concept in distributed database design?

1) It makes data easier to store through data chunking.
2) It reduces disk space utilization and allows for easy access to data.
3) It allows data to be quickly archived in the cloud.

A

2) It reduces disk space utilization and allows for easy access to data.

13
Q

Database selection symbol

A

Sigma (σ)

14
Q

simple predicates

A

A single condition on one attribute of the original table (R), of the form attribute-comparison-value (for example, a condition on a key)

15
Q

minterm predicates

A

Conjunctions of the simple predicates, each taken as-is or negated. Each minterm is the selection filter that defines one fragment.
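
A minimal sketch of how minterm predicates could be built from simple predicates and used to form fragments. The two simple predicates, the sample rows, and the `minterms` helper are hypothetical:

```python
from itertools import product

# Hypothetical simple predicates on a table with "city" and "salary" columns.
simple_predicates = {
    "city = 'Phoenix'": lambda r: r["city"] == "Phoenix",
    "salary > 55000": lambda r: r["salary"] > 55000,
}

def minterms(predicates):
    """Yield (label, test) for each conjunction of predicates or their negations."""
    names = list(predicates)
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(n if s else f"NOT({n})" for n, s in zip(names, signs))
        def test(row, signs=signs):
            # The row must match each simple predicate's required truth value.
            return all(predicates[n](row) == s for n, s in zip(names, signs))
        yield label, test

rows = [
    {"id": 1, "city": "Phoenix", "salary": 50000},
    {"id": 2, "city": "Tempe", "salary": 60000},
    {"id": 3, "city": "Phoenix", "salary": 70000},
]

# One candidate fragment (list of row ids) per minterm.
fragments = {
    label: [r["id"] for r in rows if test(r)]
    for label, test in minterms(simple_predicates)
}
```

With two simple predicates there are four minterms. Each row satisfies exactly one of them, so the resulting fragments do not overlap.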

16
Q

Derived Horizontal Fragmentation

A

Defined on a member relation of a link according to a selection operation specified on its owner. The link between the owner and the member relations is defined as an equi-join.

Create new join tables
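
A minimal sketch of derived horizontal fragmentation, assuming a made-up owner table (departments) already fragmented by site and a member table (employees) linked to it by `dept_id`. Each member fragment is computed as a semijoin with the matching owner fragment; all names here are hypothetical:

```python
# Owner relation (hypothetical): departments, fragmented by a selection on "site".
departments = [
    {"dept_id": 10, "site": "Phoenix"},
    {"dept_id": 20, "site": "Tempe"},
]
# Member relation (hypothetical): employees, linked to departments by dept_id.
employees = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 10},
]

def semijoin(member, owner_fragment, key):
    """Keep the member rows whose key matches some row in the owner fragment."""
    owner_keys = {row[key] for row in owner_fragment}
    return [row for row in member if row[key] in owner_keys]

# Fragment the owner with a selection, then derive the member fragments from it.
owner_fragments = {
    site: [d for d in departments if d["site"] == site]
    for site in ("Phoenix", "Tempe")
}
member_fragments = {
    site: semijoin(employees, frag, "dept_id")
    for site, frag in owner_fragments.items()
}
```

The employees end up grouped with their department's site even though the employee table has no `site` column of its own.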

17
Q

Two advantages of replication

A

1) Increased availability

2) Faster query evaluation

18
Q

Three disadvantages of replication

A

1) Updates are challenging, since every copy must be updated
2) Transaction processing becomes more complex
3) Concurrency control can be an issue

19
Q

What is a sharded deployment of a database?

1) A partial replication where each fragment resides at one site.
2) Each fragment resides in one site only.
3) A full replication where each fragment resides at each site.

A

2) Each fragment resides in one site only.

20
Q

Is replication good if you have a lot of queries?

A

Yes

21
Q

What is the main disadvantage of full replication?

1) It requires less disk space to perform.
2) The process to perform a single update is slower since it must be updated on different databases to keep the copies consistent.
3) There is less data movement over a network.

A

2) The process to perform a single update is slower since it must be updated on different databases to keep the copies consistent.
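
A minimal sketch of why updates cost more under full replication than in a sharded deployment, assuming a made-up placement map from sites to the fragments they store. The site names, fragment names, and `copies_to_update` helper are hypothetical:

```python
# Hypothetical placements: site name -> set of fragments stored there.
sharded = {"site1": {"F1"}, "site2": {"F2"}, "site3": {"F3"}}
full_replication = {
    "site1": {"F1", "F2", "F3"},
    "site2": {"F1", "F2", "F3"},
    "site3": {"F1", "F2", "F3"},
}

def copies_to_update(placement, fragment):
    """Count the sites holding a copy of the fragment; each must apply the update."""
    return sum(1 for stored in placement.values() if fragment in stored)

sharded_cost = copies_to_update(sharded, "F1")
replicated_cost = copies_to_update(full_replication, "F1")
```

In the sharded placement one site applies the update to F1. Under full replication all three sites must apply it to keep the copies consistent.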

22
Q

Three best questions for placement of fragments

A

1) Minimize query response time
2) Maximize throughput
3) Minimize some cost