Wolf-Tilo Balke
Benjamin Köhncke
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Relational Database Systems 2
3. Indexing and Access Paths
3.1 Introduction to access paths 3.2 Files and blocks
3.3 Indexing
– Single-level indexes – Multi-level indexes – Hash indexes
3.4 Physical Tuning
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2
3 Indexing and Access Paths
• Databases persistently store data
• Major problem: efficiency of data access
– Obviously depends on the hardware used
• But also depends to a large degree
– On the allocation of data on the disks – On intelligent buffer management
– On creating access paths and indexing data
• Physical tuning of the database is a main task of database administrators
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3
3.1 Introduction to Access Paths
• For persistent storage databases are mapped into a number of files
– Located in specially protected parts of the file system (tablespaces, etc.)
– Actually maintained by the operating system
• Each file is partitioned into fixed length blocks (pages)
– Smallest unit transferred from/to storage – Block size is specified on DB creation – Is a multiple of the OS block size
– A block usually contains several data records
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4 SKS 10.5
3.1 Introduction to Access Paths
• Harddisk Sectors are abstracted by the file system to blocks
• DBMS abstracts FS blocks to DBMS blocks
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 5
3.1 Sectors and Blocks
DISK
FS Block 1 DBMS Block 1
DBMS Block 2
DBMS Block 5 DBMS Block 3 DBMS Block 4
DBMS Block 6 FS Block 2
FS Block 3 FS Block 4 FS Block 5 FS Block 6
• For data access always complete blocks (pages) are transferred from disk into the main memory
– The part used for holding copies of blocks is referred to as DB buffer and managed by the buffer
manager
– Optimized (pre-)fetching of blocks greatly improves performance
• If a data record is requested by the DB
– If the block is buffered, return main memory address – Else allocate space in the buffer, fetch the block
from disk, and return main memory address
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6 SKS 10.5
3.1 Buffer Management
• Once new blocks are fetched, generally currently buffered blocks have to be evicted
– If blocks have been modified, they must be written back to disk (which is not always possible)
– Writing of blocks depends on the recovery strategy:
• Blocks that cannot yet be written are called pinned blocks
• Before checkpoints there might be a forced output of blocks
• Several buffer replacement strategies can be applied
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7 SKS 10.5
3.1 Buffer Management
• Least recently used (LRU): the „oldest‟ block is replaced
– Usually used in OS buffer management – Simple, yet effective strategy
– Does not use semantics of the data
• Toss immediate: After the DB has finished to process a block, the block is immediately replaced
– Example: block-nested loop join between tables R, S
• Take one block from R and compare against all blocks in S
• The block from R can be thrown out, but no block from S
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8 SKS 10.5
3.1 Replacement Strategies
• Expected Reuse: statistics about query frequencies assess how useful a buffered block is
– The block with least expected usefulness is evicted – Example: index blocks are more often
addressed than data blocks
• In any case, whatever the strategy:
Once a block has been modified and has not yet been written to disk, it cannot be evicted!
– Note: recovery subsystem has to agree before writing a block to disk
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9 SKS 10.5
3.1 Replacement Strategies
• Overhead & Payload (e.g., ORACLE data blocks)
– Header contains general block information like block address, type of segment (data, index,…) – Table directory contains
tables that have rows in the block
– Row directory contains information about the actual rows in the block (IDs, addresses,…)
– Row data is actual data
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10
3.2 Blocks (Pages)
• An Extent is a logical collection of blocks (usually adjacent)
– Fixed size, once more data space is needed for rows a new extent is allocated
– Remember: reading adjacent blocks improves access time
• Segments are collections of logically connected extents
– For instance tables, index segments, or rollback segments
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11
3.2 Extents and Segments
• A tablespace is the logical storage space needed for the data in a table
– A grouping of multiple files allocated on disk
• Example: Data1.ora, system.ora, test.dbf
– Good practice to have one tablespace for tables, and a different one for indexes
CREATE TABLESPACE user_data
DATAFILE ‘udata.ora’ SIZE 10M EXTENT MANAGEMENT LOCAL
SEGMENT SPACE MANAGEMENT AUTO
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12
3.2 Tablespaces
• Records of different relations should be stored in individual files
– Storing related records in the same block minimizes disk accesses
• Multitable clustering file organization may
store records of different tables in the same block
– Good e.g. for some often occurring joins
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 13
3.2 File Organization
• Files reserve space in the file system
– File size can be specified or change dynamically
– Default values strongly differ (e.g., DB2 allocates 100 MB)
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 14
3.2 File Organization
Schema file
Index file with index blocks Files related to table
“content_type_lecture”
Data file with data blocks
• Organization of records in the file
– Heap – a record can be placed anywhere in the file where there is space
– Sequential – store records in sequential order, based on the value of the search key of each record
– Hashing – a hash function is computed on some
attribute of each record. The result specifies in which block of the file the record should be placed
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 15
3.2 File Organization
• Data records have to be written in a file such that the entire record can be accessed with minimal disk accesses
– Fixed length records (easy to implement)
– Variable length records (storage space efficient)
TYPE deposit = RECORD
name : CHAR(22);
accNo : CHAR(10);
balance : REAL;
END
– If each char is 1 byte and a real 8 bytes, a record takes 40 bytes
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 16 SKS 10.6
3.2 File Organization
record Name AccNo Balance
0 Burg 864442 654,55
1 Myers 967531 56,45
2 Smith 145288 457,75
• Fixed length records
– 40 bytes for the first record, 40 bytes for the second,…
– Problem: if the block size is not a multiple of 40, records may cross boundaries
– Problem: deleting a record can be either done by marking it as deleted, or by replacing it with some other record of the file
• Keeping deleted items? Re
ading slow!
• Move up all other items? Efficiency!
• Fill space with next inserted item?
Changes sequence!
• Pointer lists? Wastes space in each record!
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 17 SKS 10.6
3.2 File Organization
record Name AccNo Balance
0 Burg 864442 654,55
1 Myers 967531 56,45
2 Smith 145288 457,75
• Variable length records
– Necessary for multi-table files or records that allow variable length attributes
– Problem: How to know when a record ends?
• Introduce ‘end-of-record’ symbols or store record length at beginning of the record. But what if a record is updated?
• Slotted page structure: header contains number of entries, end of free space and pointers to location/size of entries. Records are
moved to use up space: no fragmentation!
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 18 SKS 10.6
3.2 File Organization
• Typical database records like in our banking example easily fit into one block
– Name, account number, balance,…
– Usually multiple records per block
• Unspanned record organization fills blocks only with
complete records
• Spanned organization uses pointers to divide records
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 19 EN 4.4
3.2 File Organization
i i+1
record 1 record 2 record 3 record 4 record 5
i i+1
record 1 record 2 record 3 4 record 5
4
p
• Necessary for large objects: binary large objects (blobs) and character large objects (clobs)
• Large objects are not interpreted in databases
– Text documents, images, audio and video data
– Need to be stored in a contiguous sequence of bytes – If an object is bigger than a block, contiguous pages
of the buffer pool have to be allocated for storage – Sometimes preferable to
disallow direct access, but only allow access through file-system-like API to allow for fragmentation
20
3.2 Non-Standard Blocks
• Indexes help to locate records in a DB file
– Creation of indexes is part of the physical tuning task of database administrators
– Indexes often influence the actual location of storage for a record
• Example: sequential storage, storage via a hash function
– If the location is determined by the index, not all attributes can
be directly indexed (but secondary access paths may be used)
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 21
3.3 Indexing
• When items have to be found quickly, indexing mechanisms are used to
speed up access
– Alphabetical author catalog in libraries
– Lexicographic ordering – …
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 22
3.3 Basic Concepts
• Ordered by indexing field (search key)
– Attribute(s) used to look up records in a file
• An index file consists of index entries
– Records of the form
• Two basic kinds of queries
– Exact matches
• Locating a record(s) with a certain value
– Range queries
• Locating all records in a certain value range
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 23
3.3 Basic Concepts
search key record location(s)
• Two basic kinds of indexes
– Ordered indexes: search keys are stored in sorted order
• Single-level ordered indexes
• Multi-level ordered indexes
– Hash indexes: search keys are distributed uniformly across blocks using some hash function
• Hash function distributes records uniformly over blocks
• Hashed value of search key decides for storage block
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 24
3.3 Basic Concepts
• Index links indexing fields to logical file/block locations
– Dense indexes index all items, sparse indexes only selected ones
– Much smaller file than the data file
• Hence, a binary search on the index file requires fewer block accesses than a binary search on the data file
– Works exactly like a book‟s index
• Only few pages at beginning/end, alphabetically ordered, references to respective page(s)
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 25
3.3 Single-Level Ordered Indexes
• Primary indexes
– Order data by some usually unique attribute as indexing field (primary key), store database records in this order
– Index record contains pointer to the respective storage place (block address)
• To save entries usually there is only a single index entry
for each block (block anchor)
– But sparse indexes do not directly
show, whether some record is in the DB
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 26 EN 14.1
3.3 Single-Level Ordered Indexes
AccNo 1 1, Adams, $ 887,00 2, Bertram, $19,99 AccNo 4
AccNo 7 3, Behaim, $ 167,00 4, Cesar, $ 1866,00
5, Miller, $179,99 6, Naders, $ 682,56
7, Ruth, $ 8642,78 8, Smith, $675,99 9, Tarrens, $ 99,00
• Advantages
– Number of blocks needed for storing the index is small compared to the data
• Not all records are indexed (non-dense)
• Index entries are smaller than data records
– Can often be kept in buffer
• Disadvantages
– Insertions and Deletions need to move data in storage and to update index entries affected by the shifts
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 27
3.3 Single-Level Ordered Indexes
• Clustering indexes store data records in the order of a non-unique indexing field
– One entry in the index per distinct value of the indexing field
– Search keys are linked to the block address of the containing the search key
– Is also a sparse index
• But existence of records with a certain key can be assessed by index look-up
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 28
3.3 Single-Level Ordered Indexes
• Still problems with insertion/deletion
– Often one or more complete blocks are reserved for each single search key to allow for block anchors
– Blocks do not have to be adjacent, but can use block
pointers, if more space is needed
• Structurally very similar to a hash index
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 29
3.3 Single-Level Ordered Indexes
Adams 1, Adams, $ 887,00 2, Adams, $19,99 Miller
Tarrans
4, Miller, $ 1866,00 5, Miller, $179,99 6, Miller, $ 682,56 7, Miller, $ 8642,78
8, Tarrens, $ 856,78
• Secondary indexes point to locations of records regarding a non-ordering attribute
– Indexing does not affect the storage order
– There can be multiple secondary indexes for the same DB file
• Secondary indexes are usually dense
– Objects with same or adjacent values are usually not adjacent on disk
– If the indexing field has unique values (secondary key) definitely all records have to be indexed
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 30
3.3 Single-Level Ordered Indexes
• If the indexing field is not unique there are several possibilities to create a secondary index
– Create a dense index by including duplicate search keys (one for each record)
– Use variable-length index entries, where each search key is assigned a list of pointers
– Keep fixed-length index entries, but point to a block containing (multiple) pointers to the actual records
• Introduces a level of indirection, but allows for a sparse index
• Usually used in practice
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 31
3.3 Single-Level Ordered Indexes
• Dense secondary index or sparse with indirection
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 32
3.3 Single-Level Ordered Indexes
Adams
1, Cesar, $ 887,00 2, Tarrens, $19,99 Cesar
Miller
4, Orwell, $ 1866,00 5, Miller, $179,99 6, Tarrens, $ 682,56
7, Post, $ 8642,78 8, Adams, $ 856,78 Miller
3, Miller, $19,99
9, Snyder, $ 856,78 Orwell
Post
Tarrens Tarrens Snyder
Adams
1, Cesar, $ 887,00 2, Tarrens, $19,99 Cesar
Miller
4, Orwell, $ 1866,00 5, Miller, $179,99 6, Tarrens, $ 682,56
7, Post, $ 8642,78 8, Adams, $ 856,78
3, Miller, $19,99
9, Snyder, $ 856,78 Orwell
Post
Tarrens Snyder
p p
p p
p
p p
p p
• Characteristics of secondary indexes
– Speeds up retrieval, because if it does not exist, the entire file would have to be scanned linearly
• For non-existent primary indexes files can still be scanned in a binary search fashion
– Uses more search time and space, because it is dense – Secondary indexes provides a logical ordering
• Accessing records in that order might not be the most efficient way regarding block accesses
• Each record access may fetch a new block into the buffer
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 33
3.3 Single-Level Ordered Indexes
• Improving secondary indexes for range queries
– Same block may be (unnecesarrily) accessed multiple times
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 34
3.3 Single-Level Ordered Indexes
Adams
1, Cesar, $ 887,00 2, Tarrens, $19,99 Cesar
Miller
4, Orwell, $ 1866,00 5, Miller, $179,99 6, Tarrens, $ 682,56
7, Post, $ 8642,78 8, Adams, $ 856,78 Miller
3, Miller, $19,99
9, Snyder, $ 856,78 Orwell
Post
Tarrens Tarrens Snyder
• Stupid Example:
Cesar-Orwell
(buffer holds only 1 block)
Cesar: fetch first block
Miller: evict first block, fetch second block
Miller: evict second block, fetch first block
Orwell: evict first block,
fetch second block
• Improving secondary indexes for range queries
– Remember: working on blocks in main memory is cheap, fetching blocks from disk is expensive!
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 35
3.3 Single-Level Ordered Indexes
Adams 1, Cesar, $ 887,00
2, Tarrens, $19,99 Cesar
Miller
4, Orwell, $ 1866,00 5, Miller, $179,99 6, Tarrens, $ 682,56
7, Post, $ 8642,78 8, Adams, $ 856,78 Miller
3, Miller, $19,99
9, Snyder, $ 856,78 Orwell
Post
Tarrens Tarrens Snyder
• Improved Example: Cesar-Orwell
• Fetch ‘Cesar’ and scan whole block
• Output ‘1’ & ‘3’
• Mark block as completed and evict
• Fetch ‘Miller’ and scan whole block
• Output ‘4’ & ‘5’
• Mark block as completed and evict
• Skip second ‘Miller’
• Skip ‘Orwell’
Done
• Short Summary
– Primary and cluster indexes affect storage order, secondary indexes don‟t
– Primary and clustering indexes may be sparse, secondary indexes are usually dense
• Primary and clustering indexes can use block anchors
– Index sizes
• Primary index – number of blocks
• Clustering index – number of distinct search keys
• Secondary index – up to number of records in table
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 36
3.3 Single-Level Ordered Indexes
• Example: What if a primary index does not fit in main memory?
– Bad look-up efficiency!
• Simple multi-level solution
– Treat primary index kept on disk as a sequential file and construct a sparse index on it
• outer index – sparse index of primary index
• inner index – primary index file
– If even outer index is too large to fit in main memory, yet another level of index can be created, and so on
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 37
3.3 Multi-Level Ordered Indexes
• Multi-level indexes can further improve search speed
– In single level indexes a binary search can be applied to locate pointers to blocks
• Efficiency of log
2N
– Multi-level indexes allow for higher search efficiency
• Fan-out of the index should be higher than 2
• Efficiency of log
fan-outN
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 38 EN 14.2
3.3 Multi-Level Ordered Indexes
• Concept of Multi-Level Indexes can be applied to primary,
clustering and secondary indexes as long as values are distinct
• Example: 2-Level Primary Index
– Level 1 is Primary Index – Level 2 is Primary Index on
level 1
– Efficiency of log
fan-outN
• fan-out: Number of 1st level blocks per 2nd level block
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 39 EN 14.2
3.3 Multi-Level Ordered Indexes
1
1, Adams, $ 887,00 2, Bertram, $19,99 3, Behaim, $ 167,00
4, Cesar, $ 1866,00 5, Miller, $179,99 6, Naders, $ 682,56
7, Ruth, $ 8642,78 8, Smith, $675,99 9, Tarrens, $ 99,00 10, Arnold, $ 3442,76
11, Marks, $435,19 12, Black, $ 319,44 3
5
7 9
11 1
7 11
Level 1 Level 2
• There is a huge amount of different tree structures for multi-level indexing
– Classical structures B-tree, B*-tree, etc.
– Multidimensional structures R-trees, Quad-trees, etc.
– Lots of literature
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 40
3.3 Multi-Level Ordered Indexes
• Index Structure based on Hashing
– Idealized Access Speed: O(1)
• Basic Idea:
– Fixed-Size directly addressable Index Space [0..M] containing M+1 buckets
• Buckets contain links to data
• Single link in internal hashing (i.e. in memory)
• Multiple links for external hashing (i.e. on harddisk)
– Hash Function h: ℤ → [0..M]
• Maps any number to [0..M]
• Should be uniform (i.e.: same probability for any x ∈ [0..M])
– Especially, should also be surjective (i.e. use the full range of [0..M])
• Needs to be deterministic (i.e.: same input → same output)
• Should be simple and fast
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 41 EN 13.8.1
3.3 Hash Indexes
• Basic Idea
– Convert value which is to be indexed to numeric representation – Hash the value
– Store block anchor of value in the bucket with computed hash index
• Example: M=8
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 42 EN 13.8.1
3.3 Hash Indexes
1, Cesar, $ 887.00 5, Miller, $179.99 9, Snyder, $ 856.78
Miller Cesar Snyder
h(Miller) = 4 h(Cesar) = 7 H(Snyder) = 1
0 1 2 3 4 5 6 7 8
• Problem: Collision
– Hash functions are not injective – collision may happen
• Block pointers for full blocks (overflow list)
– Probability of collision increases with load factor
• Deteriorates to linear behavior in worst case
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 43 EN 13.8.1
3.3 Hash Indexes
1, Cesar, $ 887.00
5, Miller, $179.99 9, Snyder, $ 856.78 Miller
Cesar Snyder
h(Miller) = 4 h(Cesar) = 1 h(Snyder) = 1
0 1 2 3 4 5 6 7 8
Smith h(Smith) = 1 Rogers h(Rogers) = 1
3, Rogers, $ 1866.58
7, Smith, $ 0.29
• Problem: Static Address Range
– Addressable range is fixed to [0..M]
– What happens if address range is exhausted?
• E.g., load factor too high, too many collisions occur
– Possible solutions
• Rehashing :
– Create a new larger hashmap and rehash all values – For example, java hashmaps follow that policy
• Extendible Hashing:
– Uses dynamic directory to double and half number of buckets
• Linear Hashing:
– Buckets are split into two one after another and individually rehashed with additional hash-function
– Not in this lecture
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 44 EN 13.8.3
3.3 Hash Indexes
• Extendible Hashing
– Hash values are positive integers between [0..2 n ]
• Numbers can be represented in binary
• Choose large n
– Create a directory of depth d with 2 d entries for the first d bits of values
• d is global depth of directory
• Directory cells link to buckets containing links to data with hash value starting with given bit pattern
– Adjacent cells may link to the same bucket depending on local depth of bucket
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 45 EN 13.8.3
3.3 Hash Indexes
• Extendible Hashing
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 46 EN 13.8.3
3.3 Hash Indexes
000 001 010 011 100 101 110 111
Global Depth = 3
Local depth = 1
Local depth = 2
Local depth = 3
Local depth = 3
All hash values starting with 0
All hash values starting with 10
All hash values starting with 110
All hash values starting with 111
• Extendible Hashing
– Allows to dynamically splitt and coalesc buckets as database grows and shrinks
• If bucket is full and local depth is lower than global depth:
– More than one cell links to this bucket
– Bucket is split into two with increased local depth and values are distributed accordingly
– Links in cells are adjusted accordingly
• If local depth already equals global depth:
– Global depth is increased by one and thus the size of the directory is doubled
– Each cell is replaced by two cells, both of which contain the same pointer as the original cell
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 47 EN 13.8.3
3.3 Hash Indexes
• May perform in O(1)
– But: Management overhead can be quite large for growing data collections
• Overflow lists have small management overhead, but may decrease access performance to O(n)
• Rehashing is very expensive for each growth stage
• Extendible hashing has a smaller overhead and is especially suitable for external storage hashing
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 48 EN 13.8
3.3 Hash Indexes
• Why not simply index every attribute?
– Physical index on primary key, logical indexes on every other attribute
• Results in good read efficiency
• But… whenever a DB file is modified, every index on the file has to be updated
– Updating indices imposes overhead on database modification (insert, delete, update)
• Especially for multi-level indexes all levels may have to be updated
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 49
3.3 Indexing
• One of the tasks of the database administrator
• Black Magic…
– Get DB performance statistics
– Try something that seems sensible – Get new performance statistics
– If it did not improve, reset it to previous configuration
– Try something different until satisfied...
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 50
3.4 Physical Tuning
• Black Magic today needs advanced tools to operate correctly
• For database tuning, you need
– Snapshot Monitors
• Continuously collect cumulative statistics for the DB within a given time span
– Event Monitors
• Hook into certain events and analyze them
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 51
3.4 Physical Tuning
• Snapshot Monitors
– Monitor collects cumulative statistics
– Internal counters set at global or session levels – Collected statistics can be evaluated afterwards
for system tuning on general level
– Data collection is explicitly enabled or disabled
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 52
3.4 Diagnostic Database Tools
• Useful for collecting point-in-time information regarding overall DB/DBMS behavior
– Locking –lists all locks & types held by all applications – Bufferpools – cumulative stats for memory, physical &
logical I/O‟s, synch/asynch
– Sorts – provides detailed statistics regard sort-heap usage, overflows, # active etc.
– Tablespaces – detailed I/O and activity statistics for a given tablespace
– UOW – displays status of application Unit of Work at point-in-time
– Dynamic SQL – shows contents of package cache and related stats at point-in-time
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 53
3.4 Diagnostic Database Tools
• Snapshot Monitors
– Overhead continuously is in the 3% - 5% range
• But no I/O since counters are usually RAM resident, not written to disk
– Quick list of useful metrics to identify particular problems include:
• bufferpool hit ratios
• sort overflows/problems
• effectiveness of prefetch
• need for indexing
• transaction logging issues
• single query/application resource domination, etc.
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 54
3.4 Diagnostic Database Tools
• Snapshot Monitor: IBM DB2 Performance Expert
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 55
3.4 Diagnostic Database Tools
http://www.redbooks.ibm.com/abstracts/sg246470.html
• Event Monitors
– Collects information regarding events or chains of events
• No continuously collected data as with snapshots
– Must be explicitly created and activated via DBMS commands or API‟s
– Are the best way to effectively diagnose and resolve transaction problems (e.g., deadlock issues)
– Output may be directed to a database table and results then analyzed using SQL
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 56
3.4 Diagnostic Database Tools
• Event Monitor can be used to
– Identify and rank most frequently executed or most time consuming SQL
– Track the sequence of statements leading up to a lock timeout or deadlock
– Track connections to the database
– Breakdown overall SQL statement execution time into sub-operations such as sort time and use or system CPU time
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 57
3.4 Diagnostic Database Tools
• Event Monitor: DBI Brother Panther
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 58
3.4 Diagnostic Database Tools
http://www.dbisoftware.com
• Physical DB Tuning is a complicated and intransparent task
• Usually trial and error
– Measure some hopefully meaningful metrics of you DB usage statistics
– Adjust around on something
• Mostly indexes and tablespace properties
– Measure again
• If results better : Great! Continue tuning!
• If result worse : Bad! Undo everything you did and try something else.
• But what to measure and what to do?
– Use best practices or try yourself!
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 59
3.4 The Black Art of Physical Tuning
• What to measure?
• First, what kind of database you have?
– OLTP (Online Transaction Processing) : Database with many small, concurrent read / write queries
• Optimize for fast read write accesses
• Keep an eye on “real-time” response times
• Optimize for concurrency
– Warehouse : Using larger and more complex queries for primary retrieving data
• Optimize for read accesses
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 60
3.4 Physical Tuning
• There is a huge amount of knobs and screws to turn
– DB planning and tuning is for professionals – Special certificates by each database vendor
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 61
3.4 Tuning Metrics
http://www.microsoft.com/learning/mcp/mcdba/
http://www-03.ibm.com/certify/certs/dm_index.shtml IBM DB2 IBM Certified Database Associate
IBM Certified Database Administrator IBM Certified Application Developer DB2
IBM Certified Advanced Database Administrator
http://education.oracle.com
• Scott Hayes
– President & CEO of Database-Brothers Inc (DBI)
• If I only had five minutes to assess the status of a database, I would look at…
– Average Result Set Size – Index Read Efficiency
– Synchronous Read Percentage
– Average Rows Read per DB Transaction per Table – Average Read Time
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 62
3.4 Tuning Metrics
• ARSS - Average Result Set Size
– ARSS = RowsSelected / NumSelectQueries – Rule of Thumb:
• ARSS ≤ 10 indicates a OLTP database
– If you think you have OLTP and ARSS>>10, there is something wrong…
• ARSS > 10 indicates a warehouse database
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 63
3.4 Tuning Metrics
• IREF - Index Read Efficiency
– “How many rows have to read to retrieve one row?”
• i.e. How much overhead is within the where predicates?
– IREF = RowsRead / RowsReturned
• Well-designed indexes may evaluate a where-clause without reading additional rows!
– Rule of Thumb:
• IREF should be less than 10 for OLTP Databases
• May be higher for Warehouses (>100)
– What to do?
• Introduce more efficient indexes!
• User simpler queries!
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 64
3.4 Tuning Metrics
• SRP – Synchronous Read Percentage
– Synchronous Read means the DBMS reads just the index file and the required data page (good)
– Asynchronous Read means the DBMS employs
prefetching to scan for an indexes or data (BAD) – Rule of Thumb:
• >90% for OLTP (>50% for warehouses) : Good
• 80%-90% (25%-50%) : Ok
• 50%-80% (<25%) Bad
– What to do?
• Improve index structures!
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 65
3.4 Tuning Metrics
• TBRRTX – Average Rows Read per DB Transaction per Table
– TBRRTX
table= rowsRead / (attemptedCommits + attemptedRollBacks)
– Rule of Thumb:
• <10 for OLTP : Good
• 10-100 : Ok
• >100 : Bad
• >1000 : Horrific
– What to do?
• Improve index structures!
• Improve Queries
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 66
3.4 Tuning Metrics
• ART – Average Read Time
– The average time per read
– RT = ReadTime / (NumDataRead + NumIndexRead) – If you think you did everything within your application
and indices, optimize read time
• Adjust size of tablespace containers
– Containers should be equal size
• Distribute containers across disks
– Avoid OS paging device
• Move highly access tablespaces to faster devices
Datenbanksysteme 2 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 67