AndersonIS Logo
    Dedicated to serving your information needs


Wednesday 24 September 2008
Lies, damn lies and statistics

Do you understand your data? Well of course you do. So ask yourself, do you really need to spend all that time gathering statistics? If you have a large partitioned or large subpartitioned table, and your query spans partitions then the Oracle cost based optimiser (CBO) does not use your subpartition statistics that you have carefully spent all those CPU cycles and I/O operations gathering.

If you have not gathered statistics at the partition or table level in a subpartitioned table, then the CBO will aggregate those statistics to the partition level and table level for you, but only if you have not set or gathered statistics at those levels previously. This works well for num_rows, maximum and minimum values, but not so well for num_distinct. To aggregate num_distinct at the partition level from number of subpartitions, an arbitary formula that Oracle considers reasonable for the "average" data set is used. Is this realy appropriate for your data.

So maybe you gather statisitcs at the partition level and table level. In that case, for a VLDB, you will be using massive amounts of CPU and I/O resources to periodically recalculate your statistics.

The alternative, if you know your data, is to drop ALL the statistics and manually set them at the object level. That is, don't bother with subpartition and partition level statistics on your segments (data and indexes), instead set the table level and index level statistics with values that you know about the data. This has several major advantages.

* Loading is speeded up because you are not spending time gathering statistics.
* Less resources used, freeing up valuable CPU cycles and I/O operations for other work.
* Developement, Reference and Live environments all have the same statistics, so query plans on all these environments will be the same without needing to load lots of data

One thing to watch out for is the automatic population of some index statistics on index creation. So if you are using the partition exchange method of loading your data, then you are creating indexes on your loading segment prior to exchange. During the index creation, some of the statistics on the index are set and exchanged into the subpartition index. You might want to drop these statistics during the load process.

Wednesday 3 September 2008
Never under estimate the power of sorting

Sorting your data after load can save you disk space, reduce I/O and improve query performance. Sorted data can take a little longer to load, but if you load data into a warehose in (sub)partitioned segments and have local bitmap indexes, then the benefits can be substantial.

For a segment with a bitmap index, sort the data by the bitmap index column with the widest variation of values and sort on it when loading. When creating the local bitmap index on the sorted column, Oracle is able to take full advantage of the internal bitmap data structure resulting in the index consuming much less disk space than when not sorted.

Reducing the size of the index means reducing the number of extents needed for the index. When a query accesses the index, Oracle reads the extent map (including part of every extent in the index) and this means scattered reads from disk. Reducing the number of extents, reduces the number of random reads and means faster query performance.

Sorting the data can also enable the Oracle (block) compression ratio to improve on the table data.

So, sorting your data on load consumes a little more CPU, but results in fewer I/Os when querying your data giving faster query performance and less disk space needed to store both local bitmap index and your data.

I haven't tried the effect of sorting with btree indexes yet, and would be interested to hear from you if you have tried it or if you find this information useful. Contact me and



      AndersonIS copyright 2009