
Silke Eckstein Andreas Kupfer

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Relational Database Systems 2

3. Indexing and Access Paths

3.1 Introduction to Access Paths
3.2 Files and Blocks
3.3 Indexing
– Single-level indexes
– Multi-level indexes
– Hash indexes
3.4 Physical Tuning


3 Indexing and Access Paths

• Databases persistently store data
– Major problem: efficiency of data access
– Obviously depends on the hardware used
• But also depends to a large degree
– On the allocation of data on the disks
– On intelligent buffer management
– On creating access paths and indexing data
• Physical tuning of the database is a main task of database administrators


3.1 Introduction to Access Paths

• For persistent storage, databases are mapped into a number of files
– Located in specially protected parts of the file system (tablespaces, etc.)
– Actually maintained by the operating system
• Each file is partitioned into fixed-length blocks (pages)
– Smallest unit transferred from/to storage
– Block size is specified on DB creation
– Is a multiple of the OS block size
– A block usually contains several data records

3.1 Sectors and Blocks

• Hard disk sectors are abstracted by the file system to FS blocks
• The DBMS abstracts FS blocks to DBMS blocks

[Figure: the disk, the file system's FS blocks, and the DBMS blocks layered on top of them]

3.1 Buffer Management

• For data access, complete blocks (pages) are always transferred from disk into main memory
– The part of main memory used for holding copies of blocks is referred to as the DB buffer and is managed by the buffer manager
– Optimized (pre-)fetching of blocks greatly improves performance
• If a data record is requested by the DB
– If the containing block is buffered, return its main memory address
– Else allocate space in the buffer, fetch the block from disk, and return its main memory address
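
A minimal sketch of this fetch-or-reuse logic, assuming a hypothetical read_block_from_disk helper and leaving the replacement strategy aside (it is discussed below):

def read_block_from_disk(block_id: int) -> bytes:
    # Hypothetical stand-in for the actual disk access.
    return bytes(4096)

class BufferManager:
    def __init__(self, capacity: int):
        self.capacity = capacity          # number of block frames in the DB buffer
        self.frames = {}                  # block_id -> block content held in memory

    def get_block(self, block_id: int) -> bytes:
        # If the block is already buffered, return it directly ("main memory address").
        if block_id in self.frames:
            return self.frames[block_id]
        # Otherwise make room (replacement strategy omitted here) and fetch the block.
        if len(self.frames) >= self.capacity:
            self.frames.pop(next(iter(self.frames)))   # evict an arbitrary block
        self.frames[block_id] = read_block_from_disk(block_id)
        return self.frames[block_id]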


• Once new blocks are fetched, currently buffered blocks generally have to be evicted
– If a block has been modified, it must be written back to disk first (which is not always possible)
– Writing of blocks depends on the recovery strategy:
• Blocks that cannot yet be written are called pinned blocks
• Before checkpoints there might be a forced output of blocks
• Several buffer replacement strategies can be applied

3.1 Replacement Strategies

– Least recently used (LRU): the ‘oldest’ block is replaced
• Usually used in OS buffer management
• Simple, yet effective strategy
• Does not use the semantics of the data
– Toss immediate: after the DB has finished processing a block, the block is immediately replaced
• Example: block-nested-loop join between tables R and S
• Take one block from R and compare it against all blocks in S
• The block from R can be thrown out afterwards, but no block from S can
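
A toy LRU buffer pool, e.g. built on collections.OrderedDict; the pinned set is a hypothetical placeholder for blocks the recovery component forbids evicting:

from collections import OrderedDict

class LRUBufferPool:
    """Toy LRU replacement: the least recently used, unpinned block is evicted."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.frames = OrderedDict()       # block_id -> content, oldest entry first
        self.pinned = set()               # modified blocks that may not be evicted yet

    def access(self, block_id, load_fn):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)          # now the most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            for victim in list(self.frames):           # scan from oldest to newest
                if victim not in self.pinned:
                    del self.frames[victim]
                    break
        self.frames[block_id] = load_fn(block_id)      # fetch the block from disk
        return self.frames[block_id]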


– Expected reuse: statistics about query frequencies assess how useful a buffered block is
• The block with the least expected usefulness is evicted
• Example: index blocks are addressed more often than data blocks
• In any case, whatever the strategy:
– Once a block has been modified and has not yet been written to disk, it cannot be evicted!
– Note: the recovery subsystem has to agree before a block is written to disk

3.2 Blocks (Pages)

• Overhead & payload (e.g., ORACLE data blocks)
– Header contains general block information like the block address and the type of segment (data, index, …)
– Table directory lists the tables that have rows in the block
– Row directory contains information about the actual rows in the block (IDs, addresses, …)
– Row data is the actual data

3.2 Extents and Segments

• An extent is a logical collection of blocks (usually adjacent)
– Fixed size; once more data space is needed for rows, a new extent is allocated
– Remember: reading adjacent blocks improves access time
• Segments are collections of logically connected extents
– For instance tables, index segments, or rollback segments

3.2 Tablespaces

• A tablespace is the logical storage space needed for the data in a table
– A grouping of multiple files allocated on disk
• Example: Data1.ora, system.ora, test.dbf
– Good practice: one tablespace for tables and a different one for indexes

CREATE TABLESPACE user_data DATAFILE 'udata.ora' SIZE 10M
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO

3.2 File Organization

• Records of different relations should be stored in individual files
– But storing related records in the same block minimizes disk accesses
– A multitable clustering file organization may store records of different tables in the same block
– Good, e.g., for some frequently occurring joins


• Files reserve space in the file system
– File size can be specified or change dynamically
– Default values strongly differ (e.g., DB2 allocates 100 MB)


[Figure: files related to table “content_type_lecture” – a schema file, an index file with index blocks, and a data file with data blocks]

• Organization of records in the file
– Heap: a record can be placed anywhere in the file where there is space
– Sequential: records are stored in sequential order, based on the value of the search key of each record
– Hashing: a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed


• Data records have to be written to a file such that an entire record can be accessed with minimal disk accesses
– Fixed-length records (easy to implement)
– Variable-length records (storage space efficient)

TYPE deposit = RECORD
  name    : CHAR(22);
  accNo   : CHAR(10);
  balance : REAL;
END

– If each char takes 1 byte and a real 8 bytes, a record takes 40 bytes
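
In Python, this 40-byte layout can be mimicked with the struct module (the field widths follow the record type above; alignment padding is disabled):

import struct

# CHAR(22) + CHAR(10) + REAL(8) = 40 bytes per record.
RECORD_FORMAT = "<22s10sd"           # "<" = no alignment padding
assert struct.calcsize(RECORD_FORMAT) == 40

def pack_record(name: str, acc_no: str, balance: float) -> bytes:
    # Short strings are padded with zero bytes up to the fixed field width.
    return struct.pack(RECORD_FORMAT, name.encode(), acc_no.encode(), balance)

def unpack_record(raw: bytes):
    name, acc_no, balance = struct.unpack(RECORD_FORMAT, raw)
    return name.rstrip(b"\x00").decode(), acc_no.rstrip(b"\x00").decode(), balance

record = pack_record("Burg", "864442", 654.55)
assert len(record) == 40             # the i-th record of a block starts at offset i * 40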


record  Name   AccNo   Balance
0       Burg   864442  654,55
1       Myers  967531  56,45
2       Smith  145288  457,75

• Fixed-length records
– 40 bytes for the first record, 40 bytes for the second, …
– Problem: if the block size is not a multiple of 40, records may cross block boundaries
– Problem: deleting a record can either be done by marking it as deleted or by replacing it with some other record of the file
• Keep deleted items? Reading becomes slow!
• Move up all other items? Inefficient!
• Fill the space with the next inserted item? Changes the sequence!
• Pointer lists? Wastes space in each record!



• Variable-length records
– Necessary for multi-table files or records that allow variable-length attributes
– Problem: how to know where a record ends?
• Introduce ‘end-of-record’ symbols or store the record length at the beginning of the record. But what if a record is updated?
– Slotted page structure: the page header contains the number of entries, the end of free space, and pointers to the location/size of the entries
• Records are moved within the page to keep the used space contiguous: no fragmentation!
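
A sketch of the slotted page idea (the slot directory is kept as a Python list here instead of inside the page header bytes, and compaction on delete is omitted):

class SlottedPage:
    """Toy slotted page: the slot directory grows at the front, record data at the back."""

    def __init__(self, size: int = 4096):
        self.data = bytearray(size)
        self.slots = []                   # directory entries: (offset, length) per record
        self.free_end = size              # end of free space; records grow downwards

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:   # header/slot sizes ignored for brevity
            raise ValueError("page is full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1        # the slot number identifies the record

    def read(self, slot_no: int) -> bytes:
        offset, length = self.slots[slot_no]
        return bytes(self.data[offset:offset + length])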


• Typical database records like those in our banking example easily fit into one block
– Name, account number, balance, …
– Usually there are multiple records per block
– Unspanned record organization fills blocks only with complete records
– Spanned organization uses pointers to divide records across blocks

[Figure: unspanned organization – blocks i and i+1 contain only complete records; spanned organization – record 4 is split across blocks i and i+1 and connected by a pointer p]

3.2 Non-Standard Blocks

• Necessary for large objects: binary large objects (BLOBs) and character large objects (CLOBs)
• Large objects are not interpreted by the database
– Text documents, images, audio and video data
– Need to be stored as a contiguous sequence of bytes
– If an object is bigger than a block, contiguous pages of the buffer pool have to be allocated for its storage
– Sometimes it is preferable to disallow direct access and only allow access through a file-system-like API, to allow for fragmentation

3.3 Indexing

• Indexes help to locate records in a DB file
– Creation of indexes is part of the physical tuning task of database administrators
– Indexes often influence the actual storage location of a record
• Example: sequential storage, storage via a hash function
– If the location is determined by the index, not all attributes can be directly indexed (but secondary access paths may be used)

3.3 Basic Concepts

• When items have to be found quickly, indexing mechanisms are used to speed up access
– Example: the alphabetical author catalog in libraries
– Lexicographic ordering


• Ordered by an indexing field (search key)
– Attribute(s) used to look up records in a file
• An index file consists of index entries
– Records of the form <search key, record location(s)>
• Two basic kinds of queries
– Exact matches
• Locating the record(s) with a certain value
– Range queries
• Locating all records in a certain value range


• Two basic kinds of indexes
– Ordered indexes: search keys are stored in sorted order
• Single-level ordered indexes
• Multi-level ordered indexes
– Hash indexes: search keys are distributed across blocks using some hash function
• The hash function distributes records uniformly over blocks
• The hashed value of the search key decides on the storage block

3.3 Single-Level Ordered Indexes

• An index links indexing fields to logical file/block locations
– Dense indexes index all items, sparse indexes only selected ones
– The index file is much smaller than the data file
• Hence, a binary search on the index file requires fewer block accesses than a binary search on the data file
– Works exactly like a book’s index
• Only a few pages at the beginning/end, alphabetically ordered, with references to the respective page(s)


• Primary indexes
– Order the data by some unique attribute as indexing field (primary key) and store the database records in this order
– Each index record contains a pointer to the respective storage place (block address)
• To save entries, there is usually only a single index entry for each block (block anchor)
– But such sparse indexes do not directly show whether some record is in the DB


[Figure: a sparse primary index with block anchors AccNo 1, 4, and 7 pointing to blocks that hold records 1–3, 4–6, and 7–9, respectively]
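
A sparse primary index like the one in the figure can be searched with a binary search on the block anchors; the anchor values below follow the example, the rest is illustrative:

import bisect

# Sparse primary index: one (block anchor, block number) entry per data block.
index = [(1, 0), (4, 1), (7, 2)]          # anchors 1, 4, 7 as in the figure above
anchor_keys = [key for key, _ in index]

def find_block(search_key: int) -> int:
    """Return the block that would contain search_key, if the record exists at all."""
    pos = bisect.bisect_right(anchor_keys, search_key) - 1
    return index[max(pos, 0)][1]

print(find_block(5))   # -> 1: the block anchored by AccNo 4 must hold record 5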

• Advantages
– The number of blocks needed for storing the index is small compared to the data
• Not all records are indexed (non-dense)
• Index entries are smaller than data records
– The index can often be kept entirely in the buffer
• Disadvantages
– Insertions and deletions need to move data in storage and to update the index entries affected by the shifts


• Clustering indexes store data records in the order of a non-unique indexing field
– One entry in the index per distinct value of the indexing field
– Each search key is linked to the block address of the first block containing that search key
– Also a sparse index
• But the existence of records with a certain key can be assessed by an index look-up


• Still problems with insertion/deletion
– Often one or more complete blocks are reserved for each single search key to allow for block anchors
– Blocks do not have to be adjacent, but can use block pointers if more space is needed
• Structurally very similar to a hash index

[Figure: a clustering index with one entry per distinct name (Adams, Miller, Tarrens), each pointing to the block(s) holding the records with that name]

• Secondary indexes point to the locations of records with respect to a non-ordering attribute
– The indexing does not affect the storage order
– There can be multiple secondary indexes for the same DB file
• Secondary indexes are usually dense
– Objects with the same or adjacent values are usually not adjacent on disk
– If the indexing field has unique values (secondary key), definitely all records have to be indexed


• If the indexing field is not unique, there are several possibilities to create a secondary index
– Create a dense index by including duplicate search keys (one for each record)
– Use variable-length index entries, where each search key is assigned a list of pointers
– Keep fixed-length index entries, but point to a block containing (multiple) pointers to the actual records
• Introduces a level of indirection, but allows for a sparse index
• Usually used in practice


• Dense secondary index vs. sparse secondary index with indirection

[Figure: the same data file indexed by a non-ordering name attribute – once as a dense secondary index with one entry per record, and once with one entry per distinct name pointing to a block of record pointers]

• Characteristics of secondary indexes
– Speed up retrieval: without a secondary index, the entire file would have to be scanned linearly
• Without a primary index, a file can at least still be scanned in a binary-search fashion
– Use more search time and space, because they are dense
– A secondary index provides a logical ordering
• Accessing records in that order might not be the most efficient way regarding block accesses
• Each record access may fetch a new block into the buffer


• Improving secondary indexes for range queries
– The same block may be (unnecessarily) accessed multiple times

[Figure: the dense secondary index on names from the previous figure – the entries for Cesar, Miller, and Orwell point into two different data blocks]

• Stupid example: range query Cesar–Orwell (buffer holds only 1 block)
– Cesar: fetch first block
– Miller: evict first block, fetch second block
– Miller: evict second block, fetch first block
– Orwell: evict first block, fetch second block

• Improving secondary indexes for range queries
– Remember: working on blocks in main memory is cheap, fetching blocks from disk is expensive!


• Improved example: range query Cesar–Orwell
– Fetch ‘Cesar’ and scan the whole block
– Output ‘1’ & ‘3’
– Mark the block as completed and evict it
– Fetch ‘Miller’ and scan the whole block
– Output ‘4’ & ‘5’
– Mark the block as completed and evict it
– Skip the second ‘Miller’
– Skip ‘Orwell’
– Done!
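
The improved strategy can be phrased as: first collect all record ids from the (small) index, then group them by block so that every block is fetched only once. A sketch with made-up record ids and a dummy fetch_block stand-in:

from collections import defaultdict

# Hypothetical secondary index entries: search key -> (block number, slot number).
index = [("Adams", (1, 2)), ("Cesar", (0, 0)), ("Miller", (0, 2)), ("Miller", (1, 0)),
         ("Orwell", (1, 1)), ("Post", (2, 0)), ("Snyder", (2, 2)),
         ("Tarrens", (0, 1)), ("Tarrens", (1, 3))]

def fetch_block(block_id: int) -> dict:
    # Dummy stand-in for the buffer manager; maps slot numbers to record values.
    return {slot: f"record {block_id}.{slot}" for slot in range(4)}

def range_query(low: str, high: str):
    rids = [rid for key, rid in index if low <= key <= high]   # scan the index only
    by_block = defaultdict(list)
    for block, slot in rids:
        by_block[block].append(slot)                           # group rids per block
    for block in sorted(by_block):
        page = fetch_block(block)                              # one disk access per block
        for slot in by_block[block]:
            yield page[slot]

print(list(range_query("Cesar", "Orwell")))   # two block fetches instead of four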

• Short summary
– Primary and clustering indexes affect the storage order, secondary indexes don’t
– Primary and clustering indexes may be sparse, secondary indexes are usually dense
• Primary and clustering indexes can use block anchors
– Index sizes (number of entries)
• Primary index – number of blocks
• Clustering index – number of distinct search keys
• Secondary index – up to the number of records in the table

3.3 Multi-Level Ordered Indexes

• Example: what if a primary index does not fit into main memory?
– Bad look-up efficiency!
• Simple multi-level solution
– Treat the primary index kept on disk as a sequential file and construct a sparse index on it
• Outer index – a sparse index on the primary index
• Inner index – the primary index file
– If even the outer index is too large to fit into main memory, yet another index level can be created, and so on


• Multi-level indexes can further improve search speed
– In single-level indexes, a binary search can be applied to locate pointers to blocks
• Efficiency of log_2(N) block accesses
– Multi-level indexes allow for higher search efficiency
• The fan-out of the index should be higher than 2
• Efficiency of log_fan-out(N) block accesses


• The concept of multi-level indexes can be applied to primary, clustering, and secondary indexes, as long as the values are distinct
• Example: 2-level primary index
– Level 1 is a primary index
– Level 2 is a primary index on level 1
– Efficiency of log_fan-out(N)
• fan-out: number of 1st-level blocks per 2nd-level block
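
A quick back-of-the-envelope comparison with hypothetical numbers (one million index entries, a fan-out of 100):

import math

entries = 1_000_000     # hypothetical number of indexed records
fan_out = 100           # hypothetical index entries per index block

# Binary search on a single-level index: about log2(N) block accesses.
print(math.log2(entries))            # ~19.9 -> roughly 20 block accesses

# Multi-level index with fan-out f: about log_f(N) block accesses.
print(math.log(entries, fan_out))    # ~3.0  -> roughly 3 block accesses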

[Figure: a 2-level primary index – level 1 holds one block anchor per data block (1, 3, 5, 7, 9, 11), level 2 holds one anchor per level-1 block (1, 7, 11)]

• There is a huge number of different tree structures for multi-level indexing
– Classical structures: B-tree, B*-tree, etc.
– Multidimensional structures: R-trees, Quad-trees, etc.
– Lots of literature

3.3 Hash Indexes

• Index structure based on hashing
– Idealized access speed: O(1)
• Basic idea:
– Fixed-size, directly addressable index space [0..M] containing M+1 buckets
• Buckets contain links to data
• A single link in internal hashing (i.e., in memory)
• Multiple links for external hashing (i.e., on hard disk)
– Hash function h: ℤ → [0..M]
• Maps any number to [0..M]
• Should be uniform (i.e., same probability for any x ∈ [0..M]); especially, it should also be surjective (i.e., use the full range of [0..M])
• Needs to be deterministic (i.e., same input → same output)
• Should be simple and fast


• Basic idea
– Convert the value which is to be indexed to a numeric representation
– Hash the value
– Store the block anchor of the value in the bucket with the computed hash index
• Example: M = 8

[Figure: M = 8 – h(Miller) = 4, h(Cesar) = 7, h(Snyder) = 1; the buckets point to the blocks holding the corresponding records]

• Problem: collisions
– Hash functions are not injective – collisions may happen
• Block pointers for full blocks (overflow lists)
– The probability of a collision increases with the load factor
• Deteriorates to linear behavior in the worst case

[Figure: Miller hashes to bucket 4, while Cesar, Snyder, Smith, and Rogers all hash to bucket 1, whose overflow list chains additional blocks]
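
A sketch of the fixed-range hashing idea with overflow chaining; the hash function and the ‘record pointers’ are made up for illustration:

M = 8                                   # fixed, directly addressable range [0..M]
buckets = [[] for _ in range(M + 1)]    # each bucket chains (key, record pointer) pairs

def h(key: str) -> int:
    # Toy hash function: deterministic, maps any key to [0..M].
    return sum(key.encode()) % (M + 1)

def insert(key: str, record_ptr: str) -> None:
    buckets[h(key)].append((key, record_ptr))      # collisions extend the overflow chain

def lookup(key: str):
    # Worst case: the whole chain of one bucket has to be scanned (linear behavior).
    return [ptr for k, ptr in buckets[h(key)] if k == key]

for name, ptr in [("Miller", "block 2"), ("Cesar", "block 1"), ("Snyder", "block 3")]:
    insert(name, ptr)
print(lookup("Miller"))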

• Problem: static address range
– The addressable range is fixed to [0..M]
– What happens if the address range is exhausted?
• E.g., the load factor gets too high, too many collisions occur
– Possible solutions
• Rehashing: create a new, larger hash map and rehash all values (for example, Java hash maps follow that policy)
• Extendible hashing: uses a dynamic directory to double and halve the number of buckets
• Linear hashing: buckets are split into two one after another and individually rehashed with an additional hash function (not covered in this lecture)


• Extendible hashing
– Hash values are positive integers in [0..2^n]
• Numbers can be represented in binary
• Choose a large n
– Create a directory of depth d with 2^d entries for the first d bits of the values
• d is the global depth of the directory
• Directory cells link to buckets containing links to the data whose hash value starts with the given bit pattern
– Adjacent cells may link to the same bucket, depending on the local depth of the bucket


• Extendible hashing – example directory:

[Figure: a directory of global depth 3 with cells 000–111; cells 000–011 share one bucket of local depth 1 (all hash values starting with 0), cells 100–101 share a bucket of local depth 2 (values starting with 10), and cells 110 and 111 each have their own bucket of local depth 3 (values starting with 110 and 111)]

• Extendible hashing
– Allows buckets to be dynamically split and coalesced as the database grows and shrinks
• If a bucket is full and its local depth is lower than the global depth:
– More than one cell links to this bucket
– The bucket is split into two with increased local depth, and the values are distributed accordingly
– The links in the cells are adjusted accordingly
• If the local depth already equals the global depth:
– The global depth is increased by one, and thus the size of the directory is doubled
– Each cell is replaced by two cells, both of which contain the same pointer as the original cell
– Then the full bucket can be split as above
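
A compact sketch of these split/doubling rules. The bucket capacity, the hash width, and the CRC-based hash function are arbitrary choices for illustration; coalescing and the pathological case of many identical hash values are omitted:

import zlib

HASH_BITS = 8                              # width of the hash values [0..2^8)
BUCKET_CAPACITY = 2                        # entries per bucket

def h(key: str) -> int:
    return zlib.crc32(key.encode()) & ((1 << HASH_BITS) - 1)

class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.items = {}                    # key -> record pointer

class ExtendibleHashIndex:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]     # 2**global_depth cells

    def _cell(self, key: str) -> int:
        # The first global_depth bits of the hash value select the directory cell.
        return h(key) >> (HASH_BITS - self.global_depth)

    def lookup(self, key: str):
        return self.directory[self._cell(key)].items.get(key)

    def insert(self, key: str, ptr) -> None:
        bucket = self.directory[self._cell(key)]
        if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
            bucket.items[key] = ptr
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory: every cell is replaced by two cells with the same pointer.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        self._split(bucket)
        self.insert(key, ptr)              # retry; may trigger another split

    def _split(self, bucket: Bucket) -> None:
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        # The cells pointing to the old bucket form a contiguous range; the upper half
        # (bit pattern ...1 at the new depth) is re-pointed to the new bucket.
        cells = [i for i, b in enumerate(self.directory) if b is bucket]
        for i in cells[len(cells) // 2:]:
            self.directory[i] = new_bucket
        # Redistribute the stored entries according to their (now longer) bit prefix.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._cell(k)].items[k] = v

idx = ExtendibleHashIndex()
for i, name in enumerate(["Cesar", "Miller", "Snyder", "Tarrens", "Adams", "Post"]):
    idx.insert(name, f"block {i}")
print(idx.global_depth, idx.lookup("Miller"))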


• Hash indexes may perform in O(1)
– But: the management overhead can be quite large for growing data collections
– Overflow lists have a small management overhead, but may decrease access performance to O(n)
– Rehashing is very expensive at each growth stage
– Extendible hashing has a smaller overhead and is especially suitable for external storage hashing

3.3 Indexing

• Why not simply index every attribute?
– A physical index on the primary key, logical indexes on every other attribute
• Results in good read efficiency
• But… whenever a DB file is modified, every index on the file has to be updated
– Updating indexes imposes overhead on every database modification (insert, delete, update)
• Especially for multi-level indexes, all levels may have to be updated

3.4 Physical Tuning

• One of the tasks of the database administrator
– Black magic…
• Get DB performance statistics
• Try something that seems sensible
• Get new performance statistics
• If it did not improve, reset to the previous configuration
• Try something different until satisfied…


• Black magic today needs advanced tools to operate correctly
• For database tuning, you need
– Snapshot monitors
• Continuously collect cumulative statistics for the DB within a given time span
– Event monitors
• Hook into certain events and analyze them

3.4 Diagnostic Database Tools

• Snapshot monitors
– The monitor collects cumulative statistics
– Internal counters are set at global or session level
– The collected statistics can be evaluated afterwards for system tuning on a general level
– Data collection is explicitly enabled or disabled


• Useful for collecting point-in-time information regarding overall DB/DBMS behavior
– Locking – lists all locks & lock types held by all applications
– Bufferpools – cumulative stats for memory, physical & logical I/Os, synchronous/asynchronous
– Sorts – provides detailed statistics regarding sort-heap usage, overflows, number active, etc.
– Tablespaces – detailed I/O and activity statistics for a given tablespace
– UOW – displays the status of an application's Unit of Work at a point in time
– Dynamic SQL – shows the contents of the package cache and related stats at a point in time


• Snapshot monitors
– The continuous overhead is in the 3%–5% range
• But no I/O, since the counters are usually RAM resident and not written to disk
– A quick list of useful metrics to identify particular problems includes:
• bufferpool hit ratios
• sort overflows/problems
• effectiveness of prefetching
• need for indexing
• transaction logging issues
• single query/application resource domination, etc.


• Snapshot monitor example: IBM DB2 Performance Expert
– http://www.redbooks.ibm.com/abstracts/sg246470.html

• Event monitors
– Collect information regarding events or chains of events
• No continuously collected data as with snapshots
– Must be explicitly created and activated via DBMS commands or APIs
– Are the best way to effectively diagnose and resolve transaction problems (e.g., deadlock issues)
– Output may be directed to a database table and the results then analyzed using SQL


• An event monitor can be used to
– Identify and rank the most frequently executed or most time-consuming SQL
– Track the sequence of statements leading up to a lock timeout or deadlock
– Track connections to the database
– Break down overall SQL statement execution time into sub-operations such as sort time and user or system CPU time

• Event monitor example: DBI Brother Panther
– http://www.dbisoftware.com

3.4 The Black Art of Physical Tuning

• Physical DB tuning is a complicated and opaque task
• Usually trial and error
– Measure some hopefully meaningful metrics of your DB usage statistics
– Adjust something
• Mostly indexes and tablespace properties
– Measure again
• If the results are better: great! Continue tuning!
• If the results are worse: bad! Undo everything you did and try something else.
• But what to measure and what to do?
– Use best practices or try for yourself!

3.4 Physical Tuning

• What to measure?
• First: what kind of database do you have?
– OLTP (Online Transaction Processing): a database with many small, concurrent read/write queries
• Optimize for fast read/write accesses
• Keep an eye on “real-time” response times
• Optimize for concurrency
– Warehouse: larger and more complex queries, primarily for retrieving data
• Optimize for read accesses

3.4 Tuning Metrics

• There are a huge number of knobs and screws to turn
– DB planning and tuning is for professionals
– Special certificates are offered by each database vendor
• Microsoft: http://www.microsoft.com/learning/mcp/mcdba/
• IBM DB2: http://www-03.ibm.com/certify/certs/dm_index.shtml (IBM Certified Database Associate, IBM Certified Database Administrator, IBM Certified Application Developer DB2, IBM Certified Advanced Database Administrator)
• Oracle: http://education.oracle.com

• Scott Hayes
– President & CEO of Database-Brothers Inc. (DBI)
• “If I only had five minutes to assess the status of a database, I would look at…”
– Average Result Size
– Index Read Efficiency
– Synchronous Read Percentage
– Average Rows Read per DB Transaction per Table
– Average Read Time


• ARS – Average Result Size
– ARS = RowsSelected / NumSelectQueries
– Rule of thumb:
• ARS ≤ 10 indicates an OLTP database
– If you think you have OLTP and ARS >> 10, there is something wrong…
• ARS > 10 indicates a warehouse database


• IREF – Index Read Efficiency
– “How many rows have to be read to retrieve one row?”
• i.e., how much overhead is within the WHERE predicates?
– IREF = RowsRead / RowsReturned
• Well-designed indexes may evaluate a WHERE clause without reading additional rows!
– Rule of thumb:
• IREF should be less than 10 for OLTP databases
• May be higher for warehouses (> 100)
– What to do?
• Introduce more efficient indexes!
• Use simpler queries!


• SRP – Synchronous Read Percentage
– A synchronous read means the DBMS reads just the index file and the required data page (good)
– An asynchronous read means the DBMS employs prefetching to scan for indexes or data (bad)
– Rule of thumb:
• > 90% for OLTP (> 50% for warehouses): good
• 80%–90% (25%–50%): OK
• 50%–80% (< 25%): bad
– What to do?
• Improve the index structures!


• TBRTTX – Average Rows Read per DB Transaction per Table
– TBRTTX(table) = rowsRead / (attemptedCommits + attemptedRollbacks)
– Rule of thumb:
• < 10 for OLTP: good
• 10–100: OK
• > 100: bad
• > 1000: horrific
– What to do?
• Improve the index structures!
• Improve the queries!


• ART – Average Read Time
– The average time per read
– ART = ReadTime / (NumDataReads + NumIndexReads)
– If you think you have done everything within your application and indexes, optimize the read time itself
• Adjust the size of the tablespace containers – containers should be of equal size
• Distribute containers across disks – avoid the OS paging device
• Move highly accessed tablespaces to faster devices
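
These metrics are plain ratios over monitor counters. A toy calculation with invented counter values (the names below are not the official monitor element names of any DBMS):

# Hypothetical snapshot counters collected over some monitoring interval.
rows_selected      = 12_000
num_select_queries = 3_000
rows_read          = 90_000
rows_returned      = 12_000
read_time_ms       = 45_000
num_data_reads     = 20_000
num_index_reads    = 10_000

ars  = rows_selected / num_select_queries                 # Average Result Size
iref = rows_read / rows_returned                          # Index Read Efficiency
art  = read_time_ms / (num_data_reads + num_index_reads)  # Average Read Time (ms)

print(f"ARS={ars:.1f}  IREF={iref:.1f}  ART={art:.2f} ms")
# ARS=4.0 (looks like OLTP), IREF=7.5 (acceptable for OLTP), ART=1.50 ms per read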
