Flashcards in Chapter 17 Understanding Further Optimization Aspects Deck (44)
What are iterators?
SQL Server executes a query by using a set of physical operators. Because these operators iterate through rowsets, they are also called iterators.
If a table is organized as a heap, what is the only access method available?
If a table is organized as a heap, then the only access method available to SQL Server is a table scan.
What is a table scan?
In a table scan, SQL Server literally scans every data page that was allocated by the heap. Any query against the heap causes a table scan operation.
Even if you select only a few columns from this table, and even if you use a very selective WHERE clause that limits the result set to a single row, SQL Server uses a Table Scan iterator to retrieve the data.
To determine which pages have been allocated for the heap, SQL Server must refer back to the IAM. A table scan is known as an "allocation order scan" since it uses the IAM to scan the data pages in the order in which they were allocated by SQL Server.
Note that SQL Server can use the allocation order scan when a table is clustered as well.
An allocation order scan is faster if a table is less physically fragmented. Allocation order scans are not affected by logical fragmentation.
When using a clustered table, what is the difference between an "allocation order scan" and an "index order scan"?
When SQL Server performs an unordered clustered index scan (i.e. the query does not request any specific order), it refers back to the IAM to scan the data pages of the clustered index using an allocation order scan (just like a table scan).
However, SQL Server has another option for clustered indexes, the ordered clustered index scan (aka the index scan). In an index scan, SQL Server can follow the doubly linked list at the leaf node level instead of referring back to the IAM. The scan is performed in clustered index order.
In each of these cases, the Clustered Index Scan iterator is used.
What iterator is used to scan a nonclustered index?
SQL Server uses the Index Scan iterator to scan a nonclustered index.
As with the Clustered Index Scan iterator, SQL Server can perform an allocation or an index order scan when scanning a nonclustered index.
Can SQL Server Query Optimizer cover a query with multiple nonclustered indexes?
Recall that a nonclustered index can cover a query (included columns, etc). Covering a query means that SQL Server can find all data needed for the query in a nonclustered index and does not need to do any lookups in the base table.
In some cases, the Query Optimizer can even decide to cover a query with multiple nonclustered indexes. SQL Server can join nonclustered indexes. All nonclustered indexes of a table always have some data in common that can be used for a join. If a table is organized as a heap, then this data is the RID; if the table is clustered, the this is the clustering key.
Can allocation order scans be unsafe?
Yes. With an allocation order scan, SQL Server can skip some rows and read other rows multiple times.
This can happen if you use the Read Uncommitted isolation level in a read-write environment.
While one query is performing an allocation order scan, another command could update the data and cause movement of one or more rows. The scanning query might have already read these rows and could read the rows again after movement. Or the scanning query might already have passed the page to which a row was moved from a page that was not scanned yet and the scanning query never reads this row.
Why might a row be moved?
A row can move because of multiple causes. For example, a command might update a variable length column and replace a short value with a long one. A page might be full, and thus the updated row has to move to another page.
A set of rows can move if there are inserts in a clustered table and a page split occurs. Approx. half of the rows are moved to the new page.
What is a "seek and partial scan"?
When you scan an index, SQL Server is not limited to a full scan. If you limit a rowset returned by a query and the scan is ordered, then SQL Server can seek for the first value of the rowset needed and then perform a partial scan of subsequent values in the logical order of an index.
SQL Server can use a seek and partial scan for both clustered indexes and covering nonclustered indexes.
After the first record is found, SQL Server does not use seek for subsequent orders; instead it performs a partial scan.
What is the most common access method SQL Server uses in OLTP environments?
A nonclustered index seek with an ordered partial scan and then a lookup into the base table for each row found in the nonclustered index. Such plans are common for selective queries. The base tables can be organized as a heap or as a balanced tree.
When the table is organized as a heap, SQL Server uses the RID Lookup operator to retrieve the rows from the base table.
If the table is clustered, then SQL Server uses the Key Lookup operator instead of the RID lookup operator.
What is a partial scan?
Where SQL Server seeks to find the first qualifying row and then scans to the end of the qualifying range.
What are the basic JOIN algorithms?
SQL Server supports 3 basic join algorithms: nested loops, merge joins, and hash joins. A hash join can be further optimized by using a bitmap filtering (i.e. a bitmap filtered hash join).
How does the nested loops algorithm work?
The nested loops algorithm is very simple, and in many cases, efficient algorithm. SQL Server uses one table for the outer loop, typically the table with fewer rows. For each row in this outer input, SQL Server seeks for matching rows in the second table which is the inner table.
SQL Server uses a join condition to find the matching rows. The join can be a non-equijoin meaning that the Equals operator does not need to be part of the join predicate.
If the inner table has no supporting index for a seek, the SQL Server scans the inner input for each row of the outer input. This is not an efficient scenario. A nested loops join is efficient when SQL Server can perform an index seek in the inner input.
How does the merge join algorithm work?
Merge join is a very efficient algorithm. However, it has its own limitations. It needs at least one equijoin predicate and sorted inputs from both tables. This means that the merge join should be supported by indexes on both tables involved in the join.
If one input is much smaller than another, then the nested loops join could be more efficient than a merge join. In other words, it works better on larger tables.
In a one-to-one or one-to-many scenario, the merge join scans both inputs only once. It starts by finding the first rows on both sides. If the end of the input is not reached, the merge join checks the join predicate to determine whether the rows match. If the rows match, they are added to the output. Then the algorithm checks the next rows from the other side and adds them to the output if they match the predicate.
If the rows from the inputs do not match, then then algorithm reads the next row from the side with the lower value. It reads from this side and compares the row to the row from the other side until the value is bigger than the value from the other side. Then it continues reading from the other side and so on.
In a many-to-many scenario, the merge join algorithm uses a work table to put the rows from one input side aside for reusage when duplicate matching rows from the other input exist.
How does the hash join algorithm work?
If none of the inputs is supported by an index and an equijoin predicate is used, then the hash join algorithm might be the most efficient one. It uses a searching structure named a hash table which SQL Server builds internally. It uses a hash function to split the rows from the smaller input into buckets. This is called the "build" phase.
SQL Server uses the smaller input for building the hash table because it wants to keep the hash table in memory. If it needs to get spilled to disk, then the algorithm might become much slower. The hash function creates buckets of approx equal size.
After the hash table is built, SQL Server applies the hash function on each of the rows from the other input. It checks to see into which bucket the row fits. Then it scans through all rows from the bucket. This phase is called the "probe" phase.
In a single-thread mode hash join is usually slower than merge and nested loops join that are supported by indexes. However, SQL Server can split rows from the probe input in advance, and perform partial joins in multiple threads. The hash join is actually very scalable. This kind of optimization of a hash join is called a bitmap filtered hash join. It is typically used in a data warehousing scenarios where you have large inputs fro a query and a few concurrent users, so SQL Server can execute a query in parallel.
Although a regular hash join can be executed in parallel as well, the bitmap filtered hash join is more efficient because SQL Server can use bitmaps for early elimination of rows not used in the join from the bigger table.
Which join algorithm is the only one to support non-equijoins?
Summarize the characteristics of the three join types?
Index on Joining Columns:
Inner table: Not indexed
Outer table: Optional
Optimal condition: Small outer table, large inner table.
Usual Size of Joining tables: Any
Join Clause: Equijoin
Index on Joining Columns:
Both tables: must
Optimal condition: Clustered or covering index on both.
Usual Size of Joining tables: Large
Join Clause: Equijoin
Nested loops -
Index on Joining Columns:
Inner table: must
Outer table preferable.
Usual Size of Joining tables: Small
Join Clause: All
When does SQL Server use the Sort operator?
SQL Server uses the Sort operator whenever it has to sort an input. There may be a number of reasons to sort the input. For example, SQL Server might decide to sort an input so it can use the merge join algorithm. A very typical example of Sort operator usage is for queries that request an ordered rowset when the order is not supported by an index.
The sort operation could be very expensive; for good performance, you should make sure that the Sort operator used for small inputs only.
What two different algorithms does SQL Server use for calculating aggregations?
If an input is ordered by the columns used in the GROUP BY clause, then SQL Server uses the "stream aggregation" algorithm, which is implemented in the Stream Aggregate operator. Stream Aggregation is very efficient. SQL Server might even decide to sort the input before performing the aggregation in order to make it possible to use the Stream Aggregate operator.
If the input for the aggregation is not ordered and the input is so big that sorting it would be inefficient, then SQL Server uses the "hash aggregation" algorithm. The operator used for this kind of aggregation is the Hash Match Aggregate operator.
The hash aggregation algorithm builds the hash table from the input like it builds it for a hash join. However, the buckets are used to store the groups.
Similarly to a hash join, hash aggregation is scalable as well. Like the stream aggregation algorithm, the hash aggregation algorithm can compute multiple groups simultaneously in multiple threads.
What aggregation algorithms does SQL Server use?
Stream aggregation and hash aggregation.
Which operator is used when SQL Server performs a non clustered index seek to find a row, but then also needs data from the underlying table, which is organized as a clustered index?
Key lookup. SQL Server uses the Key Lookup operator to get data from the underlying clustered table after a non clustered index seek.
What is the scan called when SQL Server scans a clustered index in logical order of the index?
Index order scan. When SQL Server scans the data in the logical order of an index, this scan is called the index order scan.
What is an allocation scan?
When SQL Server scans the data in the physical order, this scan is called the allocation order scan.
Why does SQL Server try to parameterize your queries?
The SQL Server Query Optimizer has a lot of work to do to determine a good execution plan. After the plan is built, SQL Server caches it in order to make it available for possible reuse. SQL Server tries to parameterize your queries and thus makes it more likely that a plan will be reused.
Does SQL Server parameterize ad hoc queries automatically?
Yes. SQL Server parameterizes ad hoc queries automatically.
Why might SQL Server choose not to reuse a cached plan?
(1) Changes in parameter data type for a parameter in the parameterized predicate (WHERE orderid=10248 vs oderid=10249.0)
(2) SQL Server cannot be sure about the "selectivity" of the query.
(3) Some SET options changed that could influence the query result, e.g. SET CONCAT_NULL_YIELDS_NULL OFF;
All of these reasons can produce a new plan when you would expect reuse of an existing plan. Or they could cause a query to not be parameterized.
How can you get information about cached plans and the number of times plans were reused?
You can get information about cached plans and the number of times the plans were reused by querying the sys.dm_exec_query_stats dynamic management function. You can get the exact text of the query from the sys.dm_exec_sql_text dynamic management function.
What does DBCC FREEPROCCACHE do?
DBCC FREEPROCCACHE clears the procedure cache.
What are two ways to enforce plan reuse?
Using sys.sp_executesql and dynamic SQL as well as stored procedures. The latter is a better practice.