Tuning DISAM

1 ISIDXBLK - index block size
2 ISDUPLEN - duplicate sequence size
3 ISBUILDMODE - specify structure on build

DISAM's original purpose and primary focus is to read and precisely reproduce the CISAM™ standard file structure.

Over time the target has moved, and MicroFocus™ and other ISAMs have added to the challenge.

1 ISIDXBLK - index block size

Early CISAM began with a 512 byte index block size, and shortly afterward settled on 1024. The standard includes a field in the index header which identifies the block size, which is fortunate.

DISAM includes the option ISIDXBLK, in config.h, to configure the default index block size used when an ISAM file is created.

DISAM is also able to set a custom block size at build time - please see ISBUILDMODE for more details on this.

This can be essentially any rational block size, and does not have to be confined to a filesystem friendly multiple of 512.

However, under all known circumstances, an unusual value will degrade file I/O performance and will not mesh efficiently with normal disk operations.

Matching, or effectively integrating with, file system block sizes is only one side of the story.

ISAM b-tree indexes have a width and a height. The wider your keys, the fewer of them fit in each node, so the narrower the effective width of each node and the taller the tree, because it takes fewer keys to fill, and therefore split, each node.

In practice, taller b-trees tend to require more individual disk reads and writes, and therefore more seeking around in the index, and ultimately more time to operate. They also split nodes more frequently, and so tend to grow taller more quickly.

To balance this, when searching for random keys in an index, a narrow tree reduces the total search area, resulting in reduced reads and smaller buffering overheads. Writes likewise become more efficient.

This is only the beginning of the story - in the end any number of factors will combine to influence what may, or may not, be the best strategy in any given circumstance.

For instance, given a very large number of records in a file to which you require rapid access and/or updates, the wider you can afford to make the nodes, the faster this file is likely to perform.

Conversely, given an application with a lot of relatively empty files to serve to a large multi-user population, you can save a lot of memory and other resources by reducing the block size.

All of this is complicated by the fact that keys can be any length, and optionally compressed, in various ways. Each of these factors serves to alter the number of keys that will be stored in a node of given length.

This is further complicated by the fact that an ISAM file can have many indexes, of many different sizes and compressions. In these cases the best approach is probably to match the block size to the target filesystem block size, and otherwise let the power of the b-tree take the burden.

In specific use cases, however, adjusting the node size can provide quite significant performance improvements.
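The relationship between block size, key width and tree height can be estimated with a little arithmetic. The following is a rough sketch only: the function names are hypothetical, the per-node overhead and pointer size are assumed values chosen for illustration (they are not taken from the DISAM or CISAM file format), and it assumes uncompressed keys and completely full nodes.

```c
/* Rough estimate of how many key entries fit in one index node.
 * node_overhead and ptr_len are ASSUMED illustrative values, not
 * figures from the DISAM or CISAM file format. */
long keys_per_node(long block_size, long key_len, long dup_len)
{
    const long node_overhead = 4; /* assumed per-node bookkeeping bytes */
    const long ptr_len = 4;       /* assumed record/child pointer bytes */
    long entry = key_len + dup_len + ptr_len;
    long usable = block_size - node_overhead;

    return usable > 0 ? usable / entry : 0;
}

/* Minimum number of levels needed to index `records` keys when every
 * node holds `fanout` entries. Real trees are rarely this full, so
 * treat the result as a lower bound. Returns -1 for a fanout below 2. */
int tree_height(long records, long fanout)
{
    int height = 1;
    long long capacity = fanout;

    if (fanout < 2)
        return -1;
    while (capacity < records) {
        capacity *= fanout;
        height++;
    }
    return height;
}
```

Under these assumptions, a 20 byte key with a 4 byte duplicate sequence gives 36 keys per 1024 byte node and 146 keys per 4096 byte node, so a million records need at least four levels in the first case and three in the second.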

Variable length files may also benefit from a custom block size, in that a larger block size will accommodate larger records with less fragmentation and therefore faster performance, whereas a smaller block size could help to reduce waste in the free list, at some cost to performance.

2 ISDUPLEN - duplicate sequence size

When duplicates are inserted, each new key is given a new sequence number, one greater than the current greatest in the chain for that particular key value.

This value is stored in the key record in the index b-tree node.

CISAM originally used two bytes to store this value; later and current releases use four.

DISAM includes the option ISDUPLEN, in config.h, to configure the default number of bytes used to store the duplicate sequence when an ISAM file is created.

DISAM is also able to set a custom duplicate length at build time - please see ISBUILDMODE for more details on this.

Each length sets a limit to the maximum number of keys in a duplicate chain, before the sequence value rolls back around to zero, at which point the precise sequential ordering of duplicates will begin to fail.

  length        maximum
       1        256
       2        65,536
       3        16,777,216
       4        4,294,967,296

The table above shows the relevant limits (or one less, when duplicates are flagged to start at 1).

When the number of duplicates in a particular chain exceeds these values, sequential duplicate ordering will begin to fail, as the sequence number rolls over, and starts counting again from zero.

This will not generate an error, and DISAM should continue to store, iterate and search all keys reliably - they simply will not be listed in the correct sequential order.
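The limits in the table are simply 256 raised to the power of the sequence length. The sketch below models the rollover described above; it is an illustration of the arithmetic only, not DISAM's internal code, and both function names are hypothetical.

```c
#include <stdint.h>

/* Number of distinct sequence values a dup_len byte counter can hold
 * (256 to the power dup_len). Returns 0 for lengths outside 1..4. */
uint64_t dup_capacity(int dup_len)
{
    if (dup_len < 1 || dup_len > 4)
        return 0;
    return (uint64_t)1 << (8 * dup_len);
}

/* Sequence value that results for the n-th duplicate (counting from
 * zero) once it is truncated to dup_len bytes. Values at or beyond
 * the capacity wrap around and collide with earlier sequence numbers. */
uint64_t dup_sequence(uint64_t n, int dup_len)
{
    uint64_t cap = dup_capacity(dup_len);

    return cap ? n % cap : 0;
}
```

With a two byte sequence, for example, duplicate number 65,535 still receives sequence 65,535, while duplicate number 65,536 wraps to sequence 0 and can no longer be ordered after its predecessors.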

3 ISBUILDMODE - specify structure on build

ISBUILDMODE is a macro which returns a modified mode value that can be passed to the existing isBuild, isbuild and isvbuild calls, to add control over index block and duplicate sequence sizes, in addition to the existing flag modes.

SYNOPSIS
ISBUILDMODE( blocksize, duplength, modeflags );

int blocksize

int duplength

int modeflags

NOTES

when blocksize is zero, the default ISIDXBLK is used.

when duplength is zero, the default ISDUPLEN is used.

blocksize is actually the number of 512 byte sectors to apply, as follows:

      1 * 512 = 512
      2 * 512 = 1024
      3 * 512 = 1536 
      4 * 512 = 2048
      5 * 512 = 2560

      and so on...

modeflags is the usual combination of build mode flags.
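Because blocksize is a sector count rather than a byte count, it is easy to pass the wrong unit. A small conversion helper along these lines may help; the name blocksize_sectors is hypothetical and not part of DISAM, and falling back to zero (the default) for sizes that are not multiples of 512 is a choice made for this sketch.

```c
/* Convert a desired index block size in bytes to the sector count
 * that ISBUILDMODE expects. Returns 0, which selects the default
 * ISIDXBLK, when the size is not a positive multiple of 512. */
int blocksize_sectors(long bytes)
{
    if (bytes <= 0 || bytes % 512 != 0)
        return 0;
    return (int)(bytes / 512);
}
```

For instance, blocksize_sectors( 2048 ) yields 4, which is the value to pass as the first argument of ISBUILDMODE for a 2048 byte index block.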

Example:

  #ifdef ISBUILDMODE
    char *idxblk = getenv( "ISIDXBLK" );
    char *duplen = getenv( "ISDUPLEN" );
    int mode = ISBUILDMODE( idxblk ? atoi(idxblk) : 0,
                            duplen ? atoi(duplen) : 0,
                            ISINOUT + ISEXCLLOCK );
  #else
    int mode = ISINOUT + ISEXCLLOCK;
  #endif
    int isfd = isbuild( "example", 81, NULL, mode );

In the above example, the macro itself is used to disable the feature if compiled for use with another isam, and getenv(3) is used to allow configuration via the environment.
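Note that atoi(3) silently accepts malformed input such as "4x". If the environment is not fully trusted, a stricter parser may be preferable. The following is a minimal sketch using strtol(3); the helper names and the idea of an upper bound are assumptions made for illustration and are not part of DISAM.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a small non-negative decimal integer. Returns 0, the "use the
 * default" value for ISBUILDMODE arguments, when text is NULL, empty,
 * not a clean decimal number, or outside the range 0..max. */
int parse_small_int(const char *text, long max)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return 0;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0 || value > max)
        return 0;
    return (int)value;
}

/* Same as above, but reads the text from the named environment
 * variable; an unset variable also yields 0. */
int env_small_int(const char *name, long max)
{
    return parse_small_int(getenv(name), max);
}
```

The result can then be passed in place of the atoi expressions above, for example ISBUILDMODE( env_small_int( "ISIDXBLK", 64 ), env_small_int( "ISDUPLEN", 4 ), ISINOUT + ISEXCLLOCK ), where the cap of 64 sectors is an arbitrary limit chosen for this sketch.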

You can also pass the result directly, for instance:

    isbuild( "example", 512, NULL, ISBUILDMODE( 4, 0, ISINOUT+ISAUTOLOCK ) );

In this example the index block size is set to 2048 bytes, and the default value is used for the duplicate sequence length.