Oracle PL/SQL: INDEXES: Index Usage Notes - Excellent explanation of index usage, originally posted by Richard Foote at c.d.o.server on 20 Janua

Quick Search:

Oracle PL/SQL: INDEXES: Index Usage Notes

Jump to:

Category: >> Oracle PL/SQL >> INDEXES: Index Usage Notes

<< last

next >>

Snippet Name: INDEXES: Index Usage Notes

Description: Excellent explanation of index usage, originally posted by Richard Foote at c.d.o.server on 20 January, 2005.

Also see:
» INDEXES: View table indexes
» INDEXES: Analyze Index
» INDEXES: Block Dump
» INDEXES: Rebuild Reverse
» INDEXES:
» INDEXES: ENABLE (function-based index)
» INDEXES: DISABLE (function-based index)
» INDEXES: Alter Index Parallel
» INDEXES: Alter Index Deallocate Unused
» INDEXES: Alter Index Allocate Extent
» INDEXES: Virtual / NoSegment
» INDEXES: Reverse Key Indexes
» INDEXES: Bitmap Join Indexes
» INDEXES: Bitmap Indexes
» INDEXES: Unique indexes
» INDEXES: Parallel Index
» INDEXES: Compute Statistics
» INDEXES: SORT and NOSORT
» INDEXES: Function-Based Index
» INDEXES: DROP index
» INDEXES: Alter index
» INDEXES: Single Column Non-unique
» Compressed Indexes
» Create INDEX

Comment: (none)

Language: PL/SQL
Highlight Mode: HTML4STRICT
Last Modified: March 10th, 2009

"hastenthunder" wrote in message news:[email protected]...
 
>> Hello,
>>
>> I've read many documentations online stating to only create an index if
>> queries against this table frequently retrieve less than 15% of the rows.
>> However, if the query returns, say, 40% of the rows, wouldn't indexing the
>> column still help by cutting the work by roughly half?
>>
>>
>> hastenthunder
>>
 
A much *simplified* example on how I teach this stuff...
 
Let's say we have a table that has 10,000,000 rows which are 
stored in 1,000,000 data blocks meaning we have approximately 
10 rows per block on average.
 
Let's say we have an index on this table that has 100,000 leaf 
blocks meaning we have on average approximately 100 leaf entries 
per leaf block the index has 3 levels.
 
Let's also say we have an "effective" multi-block read capability 
of 10 blocks per I/O (meaning Oracle will read 10 "consecutive" 
blocks at a time on average during a full table scan multiblock 
read).
 
Finally, let's say we're interested in accessing *just* 10% of 
the data (or 1,000,000 of the total 10,000,000 rows). Will 
Oracle use the index or won't it ? Hopefully, I've picked an 
easy set of numbers to help illustrate the answer ...
 
Firstly, to calculate the "cost" of using the index access path.
 
We need to read the root block + a branch block in order to get 
to the first leaf block of interest. That's 2 logical I/Os 
(LIOs). We then need to read approximately 10% of the leaf 
blocks in order to get our 1,000,000 leaf entries required to 
directly access our 1,000,000 rows of interest, that's 10% of 
the 100,000 leaf blocks = 10,000 leaf blocks. Because we're 
reading an index via a range scan and because the leaf blocks 
are not (necessarily) physically co-related, Oracle must read 
each leaf block via a single I/O. So that's 10,000 LIOs. So, 
just to read the index alone, we require 2 + 10,000 = 10,002 
LIOs.
 
Note by default, Oracle assumes the above "cost" to be physical 
I/Os (PIOs). Now assuming this index is heavily accessed, a good 
number of these index blocks may already be cached in memory. 
The optimizer_index_caching parameter can be used to adjust the 
above cost by suggesting that x% are actually already cached and 
so are "cheaper" to access. To keep things simple, we'll assume 
the default value of 0% or that no index blocks are actually likely 
to be cached (generally not a wise assumption but let's keep the 
arithmetic simple).
 
To access the corresponding table blocks, again Oracle can only 
perform these reads via a single block read as each index entry 
points to a table block that contains it's specific table row . 
Now we're after 1,000,000 rows which means we require 1,000,000 
LIOs in order to access the required rows. Question is, how many 
*different* table blocks do we need to access ? Well, this is 
entirely dependent on the Clustering Factor (CF) of the index, 
or how closely aligned are the corresponding rows in the table 
in relation to the order of the index (which must be in the 
order of the index values). In the "best" possible case, all the 
required rows are all ordered and grouped together in the same 
"collection" of table blocks meaning we only have to access 10% 
of the 1,000,000 table blocks or 100,000 table blocks in a 
roughly *consecutively* manner.
 
However, as is more common, if the required rows are randomly 
and evenly distributed among the table blocks, then on average 
we need to read 1 row (10%) from *each and every table block*. 
Note in your case of wanting to access 40% of the data, we might 
depending on a poor CF need to visit on average *each and every* 
data block *4 times*. This is the key point (no pun intended).
 
The greater the number of differing blocks we access, then the 
less likely we will find the block in memory from it being 
previously read and the more likely that the block will need to 
be read from disk (PIO). Oracle considers this and uses the CF 
in it's costing calculations. Assuming a randomly distributed 
set of required rows, note we will need to visit *all* the table 
blocks on average because on average we are interested in 1 in 
10 of the rows that each block contains (yes, some blocks may 
not actually be visited and some may be visited a number of 
times but with such volume of blocks, it conceivably might be a 
significant duration between reads to the same block meaning it 
could easily have been aged and be physically re-read anyways).
 
The point though is that it's 1,000,000 LIOs regardless, of 
which a very significant number *could* be *actual distinct* 
(or differing) blocks. So that's 10,002 for the index + 1,000,000 
for the table = 1,010,002 LIOs to read *just* 10% of the data 
via an index.
 
Now to calculate the "cost" of a FTS. A FTS has a number of 
advantages over an index access path. Firstly, because we read 
each block "consecutively" (kinda) Oracle can investigate the 
appropriate selectiveness of each row within the block ensuring 
that each table block is read just *once* (special blocks such 
as extent maps withstanding). Secondly, again because each block 
is read consecutively, Oracle can perform a multi-block read and 
read multiple blocks within the one LIO. This is based on factors 
such as db_file_multiblock_read_count, system statistics, OS I/O 
characteristics, the caching characteristics of the table and the 
"fudge-factor" that theOracle CBO applies in it's calculations.
 
For simplicity (and to keep the numbers really simple), assuming 
an effective multi-block read of 10, we can read the entire table 
in approximately 1,000,000 table blocks / 10 = 100,000 LIOs. 
Note that although these are larger and potentially more "costly" 
I/Os than the single block I/Os used by the index, Oracle assumes 
by default that the actual cost of each type of I/O to be the 
same. The optimizer_index_cost_adj parameter can be used to more 
accurately estimate (if necessary) the relative cost of a single 
block I/O to that of a FTS multi-block I/O. Again for simplicity, 
we'll assume the default of 100 meaning that the cost of a single 
block I/O is 100% (or the same) as a FTS I/O.
 
So, we now have our two comparative costings. The index access 
has a rough cost of 1,010,002 and the FTS has a rough cost of 
just 100,000. The FTS wins hands down.... Note for 40% of the 
data, the relative costs would have been roughly 4,040,002 vs. 
100,000. Even more hands down ...
 
The break-even point can now be calculated based on the above 
criteria, some of which include:
 
    * the selectivity of the query
    * number of leaf blocks
    * average number of leaf entries per leaf block
    * height of index
    * caching characteristics of index
    * clustering factor of index
    * number of table blocks (below HWM)
    * average number of rows per block
    * effective (or calculated) multi-block read
    * caching characteristics of the table (which can influence the effective
      multi-block read)
    * relative cost of a single block I/O vs. a multi-block I/O
    * amount of row migration / row chaining (although the CBO is not so good
      with this)
    * parallelism (potentially a major factor)
 
So your assumption that reading 40% of rows would cut the work 
by roughly half is not correct. In the example above, it would 
actually cost about 40 times as much. In my long-winded manner, 
I hope this makes some kind of  sense and goes some way to 
explaining why.
 
One final piece of advice. Ignore any writings or suggestions 
that there is a magical break even point is x% (where x could 
be 2% or 10% or 50% or whatever). Hopefully the above will hint 
that there is *no* such percentage as it all depends on too many 
factors. I can easily give you an example where an index is most 
efficient when reading 0% of data and I can easily give you an 
example where an index is most efficient when reading *100%* of 
data (and *any* value in between). When one understands how the 
CBO functions, one understands why such so-called rules of thumb 
are a nonsense.
 
Cheers
 
Richard Foote