DISAM OVERVIEW

DISAM is an improved C-ISAM (Informix) compatible file handler. DISAM is a complete rewrite based on experience gained in the field with its predecessors, DISAM 3 and DISAM96 (DISAM96 is still a supported product while DISAM 3 is not). For pricing and upgrade information please refer to the home page for the appropriate links. DISAM is available as “Library only” for major platforms and as a Software Developer Kit with complete source in C for all platforms.

Networking

This optional module is available on request, and supplies a flexible network layer on top of the DISAM library, providing the ability to add url:// network syntax to your isam file names, as well as transparently redirect your entire application by supplying the server address and path in an environment variable.

The DISAMNET protocol is able to operate over a full duplex TCP/IP socket pair, offering significant performance advantages.  DISAMNET can also communicate over pipe(2) connections via proxy of your choice, by default ssh.

For a complete description of this option, please refer to the chapter on Networking in the DISAM Manual.

Uncompressed (flat) versus compressed indexes

To keep the code cleaner and to improve performance, the index handling routines are split into separate functions wherever the handling of the two index types is markedly different.

Flat indexes, in general, can be operated on mathematically, lend themselves to shell searches, and permit the current key value to be a simple pointer into the node image.

Compressed indexes require that each node be searched in a linear fashion, building the key image in an allocated buffer en route, and generally require a fair amount more processing to operate.

The index handling and manipulation modules – isnode, isgrow and isprune – all adhere to this standard. The decision as to which handling method to apply is made at the top level, eliminating the need for myriad if-then-else tests during processing.
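The difference between the two search styles can be sketched in a few lines of C. This is only an illustration: the fixed key width, the node layout (a one byte shared-prefix count followed by the remaining key bytes) and the names flat_find and packed_find are assumptions made for the example, not DISAM's actual node format or internal functions.

```c
#include <stddef.h>
#include <string.h>

#define KEYLEN 4   /* fixed key width assumed by this sketch */

/* Flat node: keys sit back to back, so entry i lives at node + i * KEYLEN.
   A binary search can return a pointer straight into the node image. */
const char *flat_find(const char *node, size_t count, const char *key)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(node + mid * KEYLEN, key, KEYLEN);
        if (cmp == 0)
            return node + mid * KEYLEN;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Compressed node (hypothetical layout): each entry is one byte holding the
   number of leading bytes shared with the previous key, followed by the
   remaining KEYLEN - shared bytes. Entries cannot be addressed directly, so
   the node is walked from the start, rebuilding each key in keybuf.
   Returns the entry index, or -1 if the key is absent or the node is bad. */
int packed_find(const char *node, size_t count, const char *key,
                char keybuf[KEYLEN])
{
    const char *p = node;
    size_t i;
    for (i = 0; i < count; i++) {
        size_t shared = (unsigned char)*p++;
        if (shared > KEYLEN)
            return -1;                       /* malformed node */
        memcpy(keybuf + shared, p, KEYLEN - shared);
        p += KEYLEN - shared;
        if (memcmp(keybuf, key, KEYLEN) == 0)
            return (int)i;
    }
    return -1;
}
```

The flat search touches about log2(count) entries and needs no scratch buffer, while the compressed search has to visit every preceding entry and copy bytes as it goes, which is the extra processing described above.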

Error handling

The new library deals with errors via setjmp/longjmp. Each external entry point is responsible for initialising a jump buffer and mode flag in the file descriptor. When an error is detected the thread is passed to an error handling routine which sets the error codes and drops the index locks before returning control to the entry call.

The error handling routine is also responsible for clearing the update buffers when triggered in write mode.

The nature of setjmp/longjmp means that it appears to the calling function as if the ISFAIL call, made initially to initialize the jump buffer, had returned TRUE.
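The pattern can be sketched with the standard setjmp.h facilities. Everything named demo_ here is invented for the illustration, including the structure fields; it is not the library's own descriptor layout or its ISFAIL macro, only a minimal model of the control flow described above.

```c
#include <setjmp.h>

/* Hypothetical stand-in for a file descriptor; field names are invented. */
struct demo_file {
    jmp_buf trap;      /* jump buffer armed by each entry point */
    int     errcode;   /* error code set by the handler */
    int     locked;    /* non-zero while the index lock is held */
};

/* Error handler: record the code, drop the lock, unwind to the entry point. */
static void demo_fail(struct demo_file *f, int code)
{
    f->errcode = code;
    f->locked = 0;
    longjmp(f->trap, 1);
}

/* Nested worker that may detect an error several calls deep. */
static void demo_search(struct demo_file *f, int key)
{
    if (key < 0)
        demo_fail(f, 102);   /* 102 is EBADARG in the error code table */
}

/* External entry point: returns 1 on success, 0 on failure. */
int demo_read(struct demo_file *f, int key)
{
    if (setjmp(f->trap))     /* non-zero only when reached via longjmp */
        return 0;
    f->errcode = 0;
    f->locked = 1;           /* take the (pretend) index lock */
    demo_search(f, key);
    f->locked = 0;           /* normal path releases the lock itself */
    return 1;
}
```

The worker functions never have to pass error returns back up the call chain; the entry point sees the failure as a second, non-zero return from setjmp.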

Machine independent data

The library includes the following series of calls to provide a means of generating data that will be transportable to different machines. Use of these is only required if you require such portability, since the library will accept native formats without complaint.

int ldint( char *pad );
int stint( int value, char *pad );
long ldlong( char *pad );
int stlong( long value, char *pad );
int stchar( char *pad, char *str, int len );
int ldchar( char *pad, int len, char *str );
double ldfloat( char *pad );
int stfloat( double value, char *pad );
double lddbl( char *pad );
int stdbl( double value, char *pad );
int stfltnull( double value, char *pad, short *null );
double ldfltnull( char *pad, short *null );
int stdblnull( double value, char *pad, short *null );
double lddblnull( char *pad, short *null );

In all of the above, st identifies calls that will place a value in the specified location, which can then be retrieved by the matching ld call. The null suffix calls will load the variable passed (by address) in the last parameter with a boolean ISTRUE if the associated value is zero, or ISFALSE if not.

All pads referred to should be the size of the associated data type, with the exception of ldint/stint, which refer to a two byte integer, regardless of the native word size.

Note that the float and double calls do not perform any actual data conversion, and are provided as a means of avoiding data alignment problems. The int and long calls will perform byte order reversal (sometimes referred to as swabbing) on platforms where the least significant byte is found at the beginning of the field.

Note also that ldint and stint operate on short (2 byte) integers, although the value and return are passed in the natural word size.
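As a rough model of what a two byte store/load pair has to do, the sketch below writes the most significant byte first, so the stored image is the same on any host. The names demo_stint and demo_ldint are hypothetical, and the code is an assumption about the general technique, not the library's own implementation of stint and ldint.

```c
/* Store the low 16 bits of value in pad, most significant byte first. */
void demo_stint(int value, char *pad)
{
    unsigned int u = (unsigned int)value;
    unsigned char *p = (unsigned char *)pad;
    p[0] = (unsigned char)((u >> 8) & 0xFFu);
    p[1] = (unsigned char)(u & 0xFFu);
}

/* Load a two byte value stored by demo_stint and sign extend it to int. */
int demo_ldint(const char *pad)
{
    const unsigned char *p = (const unsigned char *)pad;
    int v = (p[0] << 8) | p[1];
    return v >= 0x8000 ? v - 0x10000 : v;
}
```

Because the bytes are assembled with shifts rather than copied from memory, the same code works unchanged on little-endian and big-endian hosts, and the pad does not need to be aligned.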

Isam indexes

Isam indexes are defined/described by means of the keydesc structure.

The following is an example of the basic procedure, and describes an index that allows duplicates, has full compression, and consists of two fields – a machine independent integer taken from the first two bytes of the record, and a ten byte character string starting at offset 7 (the eighth through seventeenth bytes).

struct keydesc key;
key.k_flags = ISDUPS + COMPRESS; /* see index flags */
key.k_nparts = 2; /* number of fields involved */
key.k_part[0].kp_start = 0; /* offset in data record */
key.k_part[0].kp_leng = INTSIZE; /* length */
key.k_part[0].kp_type = INTTYPE; /* type */
key.k_part[1].kp_start = 7; /* offset in data record */
key.k_part[1].kp_leng = 10 * CHARSIZE; /* length */
key.k_part[1].kp_type = CHARTYPE; /* type */

This structure is then passed to isBuild, isAddIndex, or isStart.

index flag values – these apply to the index as a whole

ISNODUPS 0 no duplicates permitted
ISDUPS 1 duplicates permitted
DCOMPRESS 2 compress duplicates
LCOMPRESS 4 leading compression
TCOMPRESS 010 trailing compression (octal, 8 decimal)
COMPRESS 016 full compression (octal, 14 decimal: DCOMPRESS+LCOMPRESS+TCOMPRESS)

individual index component/part/field types

CHARTYPE 0 array of bytes/characters
INTTYPE 1 two byte (short) integer
LONGTYPE 2 four byte (long) integer
DOUBLETYPE 3 ieee double floating point
FLOATTYPE 4 ieee single floating point
MINTTYPE 5 machine (native) short
MLONGTYPE 6 machine (native) long
ISDESC 0x80 add to individual parts for descending order

It is generally advisable to provide some callable means of loading your key descriptions into a keydesc structure, or to provide some other means of easy access, since they will be needed for selecting indexes via isStart.
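One way to follow this advice is a single loader function that fills in a key description by index number. The sketch below uses stand-in types and constants (all prefixed demo_ or DEMO_) so that it compiles on its own; the field names and flag values mirror those shown above, but the index identifiers, record offsets and the maximum part count are hypothetical.

```c
#include <string.h>

/* Stand-ins so the sketch compiles on its own. A real program would use
   struct keydesc and the constants from the DISAM header instead. */
#define DEMO_NPARTS   8      /* assumed maximum number of parts */
#define DEMO_CHARTYPE 0
#define DEMO_INTTYPE  1
#define DEMO_ISNODUPS 0
#define DEMO_ISDUPS   1
#define DEMO_ISDESC   0x80

struct demo_keypart { short kp_start; short kp_leng; short kp_type; };
struct demo_keydesc {
    short k_flags;
    short k_nparts;
    struct demo_keypart k_part[DEMO_NPARTS];
};

/* Hypothetical index identifiers for one application file. */
enum demo_index { IDX_CUSTNO, IDX_NAME_DESC };

/* Fill *key for the requested index; returns 1 if known, 0 otherwise. */
int demo_load_key(int which, struct demo_keydesc *key)
{
    memset(key, 0, sizeof *key);
    switch (which) {
    case IDX_CUSTNO:                 /* unique two byte integer at offset 0 */
        key->k_flags = DEMO_ISNODUPS;
        key->k_nparts = 1;
        key->k_part[0].kp_start = 0;
        key->k_part[0].kp_leng = 2;
        key->k_part[0].kp_type = DEMO_INTTYPE;
        return 1;
    case IDX_NAME_DESC:              /* ten byte name, descending, dups ok */
        key->k_flags = DEMO_ISDUPS;
        key->k_nparts = 1;
        key->k_part[0].kp_start = 7;
        key->k_part[0].kp_leng = 10;
        key->k_part[0].kp_type = DEMO_CHARTYPE + DEMO_ISDESC;
        return 1;
    default:
        return 0;
    }
}
```

Keeping every key description in one function means the code that builds the file and the code that later selects an index cannot drift apart.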

You will also need to exercise caution when using CHARTYPE fields as the library does not recognise null terminated strings. In other words, if you’re referring to a 10 byte wide character field, and it contains a seven byte string, a null terminator and 2 garbage bytes following the null, the library will build a key 10 bytes in length that includes the null and trailing garbage. This key might prove difficult to locate later.

The officially approved solution is to use the included stchar() and ldchar() calls, which will pad the string passed with spaces on store and replace the null terminator on load.

Note also that trailing compression operates on strings of trailing spaces, so the fairly common practice of padding the remainder of the field with nulls could result in inefficient operation where trailing compression is expected.
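The space padding behaviour described above can be modelled as follows. The helper names demo_store_field and demo_load_field are hypothetical and the code is only a sketch of the technique; in a real application the library's own stchar() and ldchar() calls should be used.

```c
#include <string.h>

/* Copy str into a fixed width field, padding the remainder with spaces.
   Strings longer than the field are truncated; no terminator is stored. */
void demo_store_field(const char *str, char *field, int len)
{
    int n = (int)strlen(str);
    if (n > len)
        n = len;
    memcpy(field, str, (size_t)n);
    memset(field + n, ' ', (size_t)(len - n));
}

/* Copy a fixed width field into str, dropping trailing spaces and adding
   a null terminator. str must have room for len + 1 bytes. */
void demo_load_field(const char *field, int len, char *str)
{
    while (len > 0 && field[len - 1] == ' ')
        len--;
    memcpy(str, field, (size_t)len);
    str[len] = '\0';
}
```

With this approach every stored key is the full field width with a predictable tail of spaces, so equal strings always produce equal keys and trailing compression has something to work on.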

Open modes

There are two different groups of open modes used by the isOpen and isBuild calls. The first group describes the operations which the current process will be allowed to perform, while the latter controls what forms of external access will be permitted. You must choose one from each list.

ISINPUT open file for reading only
ISOUTPUT open file for writing only
ISINOUT open file for both reading and writing
ISMANULOCK record locks placed manually
ISAUTOLOCK record locks placed automatically on read
ISEXCLLOCK exclusive access to index and data

Search modes

All isRead operations act on the currently selected index.

ISFIRST select the first record in the index
ISLAST select the last record in the index
ISEQUAL search for and return the first exact match, or error
ISGTEQ search for first exact match, or next greater
ISGREAT search for the next greater key in the index
ISNEXT skip to the next key in the index
ISPREV skip to the previous key in the index
ISCURR reread the current location.

Note ISNEXT and ISPREV behave differently when used in isRead immediately after an isStart call, in that they will return the current rather than the next/previous record.
This is a feature, not a bug. The intention is to allow isStart to function as an initialisation call preceding a read next/previous loop, as in the following:

isStart( isfd, &secondary_index, 0, NULL, ISFIRST );
while( isRead( isfd, data_pointer, ISNEXT ) )
    display_data( data_pointer );

The above will read the contents of isfd from the top in secondary_index order.

ISNOLKRD, if added to the search modes, will result in an unlocked read on the index. This option can be used to improve access times significantly, but only if you are either certain that the file in question is not being updated by other processes, or you are not overly concerned about the accuracy of the results.

Lock modes

These are added to the search mode in isRead to lock or wait for the release of the selected record.

ISLOCK lock the selected record against external update

ISWAIT wait until the selected record has been released

ISLCKW wait for and lock the selected record

ISLOCK will return an ELOCKED error if the selected record has been locked, but will still return the selected record image. This allows the process to read an externally locked record, but warns that updates will not be permitted.

ISWAIT and ISLCKW do no deadlock processing, and will wait forever, so applications should be coded appropriately.

The isLock, isUnLock, isRelease, isRelRec and isRelCurr group of functions provide further control over record locking, and you can also use ISEXCLLOCK to lock other processes out of a file completely.

Error codes

In addition to the standard operating system error codes, the library will return the following values in isfd->iserrno where appropriate.

EDUPL 100 illegal duplicate
ENOTOPEN 101 file not open
EBADARG 102 illegal argument
EBADKEY 103 illegal key description
EBADFILE 105 isam file is corrupt
ENOTEXCL 106 can't get exclusive access
ELOCKED 107 record is locked
EKEXISTS 108 index already defined
EENDFILE 110 start or end file reached
ENOREC 111 record not found
ENOCURR 112 no current record
EFLOCKED 113 file is locked
EFNAME 114 file name is too long
EBADMEM 116 can't allocate memory
ENOPRIM 127 no primary key

When the library returns a system error code it will also load isfd->iserrio with the sum of two values, one from each of the following groups (which file, and which operation):

IO_IDX 0x01 index file
IO_DAT 0x02 data file
IO_OPEN 0x10 while opening
IO_CREA 0x20 while creating
IO_SEEK 0x30 while seeking
IO_READ 0x40 while reading
IO_WRIT 0x50 while writing
IO_LOCK 0x60 while locking

Dpack

dpack – isam file pack/rebuild utility

usage – dpack <isamfile> […]

dpack takes a list of one or more isam files and will rebuild each in turn. It works by building a temporary copy of the original file and copying all active data records to it, then deleting the original and renaming the working copy.

The temporary file is created in the same directory as the original and is named dpnnnnn, where nnnnn is the current process id.

You must ensure that you have write permission on the data directory and that there is enough room for dpack to build the working copy.

dpack will fail if it cannot obtain an exclusive lock on the file.

Dcheck

dcheck – describe, check and maintain isam files
usage – dcheck [-bfhi] <isamfile> […]

options –
b – rebuild all indexes regardless of errors
f – fix (rebuild) corrupt indexes
h – display isam header information only
i – just check indexes, ignore data file

The option string, preceded by a dash, can be placed anywhere on the command line. All options must be specified in one string, and all options apply to all files specified.
dcheck is not interactive, and will ask no questions, so it can be safely used in batch and script files without operator intervention.

dcheck has been designed to run co-operatively (unless -f or -b is specified) on files in active use, but note that other processes will be blocked for the duration of the check cycle.

Threadsafe

Support for multi-threading is provided in two levels. The base level implementation consists of a parallel collection of isam function calls that operate on file descriptor structures (as opposed to the usual integer file handles) and, other than requiring that you do not open the same file more than once in any one process, is fully multi-thread compatible without the requirement for critical section handling.

The single file restriction only applies if you are using the base library without ISADMIN active. Under these circumstances the library does no mutex locking at all. If you are using the standard wrapper calls, or have activated ISADMIN, there is no restriction on duplicate opens within the same process.

The second level option provides full c-isam standard compatibility and operates on the standard integer based file handles and associated file descriptor table. This option provides critical section (mutex) handling for protection of the descriptor table and logic to provide appropriate handling for duplicate opens on the same file within the same process. The mutex requirement impacts on the efficiency of the library, but allows the use of the standard interface without losing functionality. There are two exceptions.

The four global variables – isrecnum, isreclen, iserrno and iserrio – are present and maintained, but should not be considered reliable. Instead you should use the following function call equivalents:

long *is_recnum( int isfd );
int *is_reclen( int isfd );
int *is_errno( int isfd );
int *is_errio( int isfd );

Note that these functions return pointers rather than values. This allows the application to both enquire and set the value, as in the following examples:

if( *is_errno(isfd) == ENOREC ) /* enquire: record not found */
    *is_recnum(isfd) = saved_record_number; /* set */
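The reason pointer returning accessors suit threaded code is that each handle gets its own storage slot, so one thread's error code cannot be overwritten by a call made on another handle. The sketch below models that idea with an invented table; the demo_ names, the table size and the structure layout are assumptions for illustration and say nothing about how the library stores these values internally.

```c
#define DEMO_MAXFILES 16   /* assumed table size for the sketch */

/* Hypothetical per-handle state, one slot per open file handle. */
struct demo_state {
    long recnum;   /* current record number for this handle */
    int  err;      /* last error code for this handle */
};

static struct demo_state demo_table[DEMO_MAXFILES];

/* Return the address of the error code belonging to handle fd.
   The caller may read through the pointer or assign through it.
   fd must be in the range 0 .. DEMO_MAXFILES - 1. */
int *demo_errno(int fd)
{
    return &demo_table[fd].err;
}

/* Return the address of the record number belonging to handle fd. */
long *demo_recnum(int fd)
{
    return &demo_table[fd].recnum;
}
```

A single global integer, by contrast, would be shared by every handle in the process, which is why the global variables are described above as unreliable under threading.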

The other exception applies when using isbuild to create a variable length file. Because isbuild relies on the value of isreclen, you must be careful that there are no other threads running that might disturb the value. I hope to provide a solution to this problem soon, but I have not yet decided exactly what form it should take.

NOTE: when using second level support you must call isThreaded() from your root thread before starting any isam threads – this will initialise the master mutex lock.

NOTE: at no time, under any circumstances, should the application allow more than one thread to operate on an individual isam file handle at any given time.

Variable length records

The following is a list of the functional and programmatic differences between fixed length and variable length isam file handling.

isreclen is a global integer used to pass variable length info around in much the same way as isrecnum.

isaddindex, the parts of any key added must reside within the fixed length part of the record.

isbuild, you can add ISVARLEN to the mode argument to create a varlen file. If so then the record length argument must define the maximum possible length of the records. You must also set isreclen to the size of the fixed length header portion of the record. Key fields can only reside in the fixed length portion.

isindexinfo, if file is variable, the msb of di_nkeys will be set to indicate same. di_recsize contains the maximum record size, the fixed length size is returned in isreclen.

isopen, if the file is variable length you must use ISVARLEN in the mode, or else isopen will return an EBADARG error. The converse is also true – if the file is not variable length, then specifying ISVARLEN will return the same error. If successful then isreclen will be set to the maximum record length of the opened file.

isread, isreclen will be set to the actual size of the record just read.

isrewcurr, isrewrec, isrewrite, iswrcurr, iswrite, set isreclen to the actual size of the record before calling.

The check utility will analyse and test the variable length storage nodes and report statistics.

I have found that cisam version 4 appears to have a tendency to corrupt the hash table, and have provided a provisional patch to work around the error, in much the same way as I imagine cisam must have done themselves, since cisam does not seem to be bothered by the problem.

It is quite possible that the “corruption” is deliberate, but I have not yet been able to determine a reason for it, and am waiting for more information before making any further adjustments.

Transaction processing

We have attempted to provide a transparent implementation of the transaction processing routines provided by c-isam. The library should perform as specified in the c-isam manuals, but includes a few additional enhancements.

c-isam allows multiple calls to isbegin. Additional calls appear to function in the same manner as iscommit, but since its behaviour is unspecified we have opted to return an error. See the references below regarding overlapping transactions for an alternative.

After islogopen has been called, c-isam does not permit rollback type activity unless isbegin has been called. In other words all updates must occur within a transaction. We have eliminated this restriction, meaning that updates can be logged for recovery without requiring that each update occur within a transaction set. Useful where incremental backups are required but the begin/commit/rollback cycle is redundant.

We are not certain that we have identified the log file locking offsets properly. There is a possibility that the library will not co-operate with c-isam, but we have not found any problems to date.

Transaction logging/processing is configured via ISLOGGING in isconfig.h.

A value of one results in standard c-isam compatible behaviour, and a value of two will include the ability to overlap multiple transactions at the expense of uid recording in the log file.

Function calls reference

int islogopen( char *name )

Opens log file. Once the log file is opened all subsequent operations on all files will be logged until the log is closed or suspended. Each process is responsible for opening a log file. Usually all processes within an application will open the same log file. Access is governed by concurrency locks to prevent simultaneous writes, but this feature might not be compatible with c-isam. Files opened/built with ISNOLOG in the mode will not be logged.

int islogclose( void )

Closes the log file for the process. No subsequent operations will be logged.

int isbegin( void )

Defines the beginning of a set of operations on all files opened with ISTRANS in the mode argument. This set of operations is known as a transaction.

All records updated within the transaction will remain locked to ensure no other process can affect the results. In multi-user situations it is important to keep the time spent within a transaction to a minimum to avoid impacting on other processes’ access to the records involved.

All calls to isclose within the transaction will be deferred to avoid releasing the locks mentioned above. If the same file is reopened within the same transaction you will simply get the same file handle back.

All updates affect the data file immediately. If the process should terminate before completing the transaction, the changes will remain unless isrecover is used to rebuild the data from the last backup.

int iscommit( void )

Defines the successful completion of a transaction. All locked records will be released and all deferred closes honoured.

int isrollback( void )

Defines the unsuccessful completion of a transaction. All updated records will be restored to their original state, all record locks released and all deferred closes honoured.

int isrecover( void )

Used to rebuild a set of data files from an original condition by means of a common log file. islogopen() must be called first to open the log.

int issusplog( void )

Suspends logging for all files.

int isresumlog( void )

Resumes logging for all files.

int istxnid( int value )

Optional feature, see overlapping transactions below for more.

usage reference, data file recovery

If, at the time a full data backup was made, the transaction log was purged, then, provided all updating processes call islogopen() before making changes, it is possible to rebuild the data files from the backup and the log file.

It is not necessary that files be opened with a mode of ISTRANS for changes to be logged. Files that are opened with a mode of ISNOLOG will not be logged and will not be recoverable.

To rebuild the data files it is necessary to compile a recovery utility that will call the following functions in order:

islogopen( <logname> );
isrecover();
islogclose();

The recovery procedure will then be to restore the backup ( make sure not to overwrite the log file ) and run the recovery utility.

The utility should check the return from isrecover() to ensure no errors occurred during recovery.

It is also possible to implement incremental backups by this method. The log file can be backed up and purged on a regular basis. To recover it will be necessary to restore the original backup, then restore and run the recovery utility on each log file in chronological order.

Transaction processing fundamentals

The theory behind transaction processing is that a collection of changes to a group of data files can be defined in such a way as to be able to undo these changes at any time during the operation.

The collection of changes is referred to as a transaction.

A sample program might best serve to illustrate. Proper error checking has been omitted to simplify:

/* post a sales invoice */
#include <stdlib.h> /* exit */
#include <iswrap.h>

int invcFile; /* source invoice record file */
int custFile; /* customer master records */
int tranFile; /* accounting transaction file */

int main( void )
{
    islogopen( "logfile" );
    invcFile = isopen( "invoice", ISINOUT + ISTRANS );
    custFile = isopen( "customer", ISINOUT + ISTRANS );
    tranFile = isopen( "transaction", ISINOUT + ISTRANS );
    isstart( invcFile, ..., ISFIRST );
    /* loop through invoices */
    while( isread( invcFile, invcrec, ISNEXT ) == SUCCESS )
    {
        isbegin(); /* define transaction start */
        if( postCust() == SUCCESS     /* update customer master */
         && postTran() == SUCCESS     /* create transaction record */
         && delInvc() == SUCCESS )    /* delete invoice record */
            iscommit(); /* successful, commit changes */
        else
            isrollback(); /* failed, erase changes */
    }
    isclose( invcFile );
    isclose( custFile );
    isclose( tranFile );
    islogclose();
    exit( 0 );
}

Note that the transaction processing logic will lock all records that are changed within the transaction for the duration of the transaction. In some instances it pays to keep the transaction cycle as short as possible to allow other processes access to these records.

Note also that those files whose changes are to be erased when isrollback is called must be opened with mode ISTRANS. Changes will still be logged for those that are not, but the transaction processing logic ignores them and will not undo any changes.

Overlapping transactions option

When ISLOGGING in isconfig.h is set to a value of 2, the ability to overlap transactions is provided, at the expense of user id recording in the log file.

With this option active, whenever isbegin() is called the library will generate a unique integer identifier, push any current txn list onto the stack and start a new list. The unique identifier is then returned – you should keep track of it. All subsequent transactions will be placed in the new list. Your application can then switch transaction lists at any time by making a call to istxnid( id ) where id is the value returned by the isbegin() call for the transaction list you wish to work within.

The maximum number of concurrent overlapping transactions is set to 10 by default, and can be adjusted at compile time via the ISTXNID define in istrans.c.
