/*
 *                      c s e t . h
 */

/*)LIBRARY
*/

#ifdef  DOCUMENTATION

title   cset    Header file for character set functions
index           Header file for character set functions

synopsis

        #ifdef vms
        #include "c:cset.h"
        #else
        #include <cset.h>
        #endif

description

        The character set functions provide a set of routines for
        describing and manipulating sets of characters.  The
        character sets, called "csets", created in this way can be
        manipulated quickly, and require relatively little storage.
        They are meant to be used as arguments to pattern-matching
        functions like span() (which see).  For these purposes,
        functions are provided to create csets and to produce the
        complement (with respect to the set of all 8-bit characters)
        of a cset and the join (union), meet (intersection), and
        difference of two csets; see cset(), cscomp(), csjoin(),
        csmeet(), and csdiff().

        csets can also be used more generally as representations of
        sets - i.e., the name can be read as "C sets".  In this
        case, the universe is the set of numbers 0...(cssize-1),
        where cssize is a global parameter defined in cset.c; it is
        normally 256 for character work.  The functions provided for
        this kind of application include csmember(), which checks
        membership, and cswith() and csless(), which add and remove
        elements from sets.

        When csets are used in this way, it is important to
        understand that a cset is a data object with an internal
        structure, and that different csets may share internal
        data - i.e., csets are not normally "atomic" objects and
        care must be taken in manipulating them.  A look at the
        representation of csets should help clarify this point.

        The only object you normally manipulate directly in your
        code is a cset pointer, type (CSET *).  This pointer points
        to a cset header, which contains a mask and a pointer to a
        table of cssize bytes.  A character is in the cset if any
        bit of the header's mask is on in that character's table
        entry.
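
        As a rough illustration of the representation just described,
        the membership rule might be sketched as follows.  This is not
        the code in cset.c: the sk_ names and the fixed size SK_CSSIZE
        are assumptions made for the example, and only the header
        layout is taken from this file.

```c
#include <assert.h>

#define SK_CSSIZE 256   /* assumed universe size; the real value is cssize */

/* Same layout as CSET in this header; all sk_ names are hypothetical. */
typedef struct sk_cset {
    char  mask;     /* bits that mark membership in this set      */
    char  _fill_;   /* padding, as in the real header             */
    char *table;    /* SK_CSSIZE bytes, one per possible element  */
} SK_CSET;

/* Nonzero if element c is in the set: some mask bit is on in table[c]. */
int sk_member(const SK_CSET *cs, int c)
{
    assert(c >= 0 && c < SK_CSSIZE);
    return (cs->table[c] & cs->mask) != 0;
}

/* Turn on this header's mask bits in the table for every char of s. */
void sk_fill(SK_CSET *cs, const char *s)
{
    for (; *s != '\0'; s++)
        cs->table[(unsigned char)*s] |= cs->mask;
}
```

        With two headers whose masks are 0x01 and 0x02 pointing at the
        same table, calling sk_fill() on each builds two independent
        sets in a single 256-byte table.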

        Csets created by cset() always have a one-bit mask; however,
        csjoin() and friends avoid using up a bit position, where
        possible, by creating a header whose mask contains more than
        one bit.  Hence, the join of two csets can often be
        represented very cheaply.  Complements of csets are
        represented still more efficiently; a cset and its
        complement share even the header.  Only the pointer is
        changed - its bit pattern is complemented.

        A consequence of this representation is that a great deal of
        data is often shared between csets.  When manipulating csets
        as arbitrary sets, it is important to understand that
        applying csless() or cswith() to a cset may cause any
        related csets to be changed.  Thus, after the sequence of
        calls:

                uvowels = cset("AEIOU");
                lvowels = cset("aeiou");
                vowels = csjoin(uvowels,lvowels);
                lvowels = cswith(lvowels,'y');

        'y' is probably a member of vowels.  (Only "probably"
        because it is impossible to predict whether uvowels and
        lvowels happen to get the same table; csjoin() cannot use
        the "cheap" representation if they don't.)

        Two methods are available to avoid this problem.  First,
        cscopy() returns a guaranteed-"unique" copy of a cset.
        Second, the global csunique (in cset.c) can be set, forcing
        functions such as csjoin() to avoid space-saving shortcuts.

internal

        The exact form of the cset header structure was chosen to be
        identical to the character set pointer structure used for
        the PDP-11 CIS instructions.  Any ambitious programmers are
        encouraged to make use of those instructions to produce fast
        versions of span() etc.

        Note that the use of a complemented pointer to the header to
        represent a complemented cset relies on malloc() always
        returning memory pointers with a 0 in the bottom bit of
        their representation.  This is probably true in most
        implementations.
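
        The sharing hazard and the complemented pointer described
        above can be sketched as follows.  This is an illustration,
        not the code in cset.c: the sk_ names are hypothetical,
        uintptr_t stands in for the (int) cast used by cscomp, headers
        are assumed to sit at even addresses, and testing the bottom
        bit to recognize a complement is an assumption about how the
        library tells the two kinds of pointer apart.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as CSET in this header; all sk_ names are hypothetical. */
typedef struct sk_cset {
    char  mask;
    char  _fill_;
    char *table;    /* 256 bytes, possibly shared with other headers */
} SK_CSET;

/* Complement by flipping every bit of the pointer; twice gives it back. */
SK_CSET *sk_comp(SK_CSET *cs)
{
    return (SK_CSET *)~(uintptr_t)cs;
}

/* Assumes real headers sit at even addresses, so a set bottom bit
 * means "this is a complemented pointer". */
int sk_is_comp(const SK_CSET *cs)
{
    return (int)((uintptr_t)cs & 1u);
}

/* Membership test that undoes the complement before using the header. */
int sk_member(const SK_CSET *cs, int c)
{
    int negate = sk_is_comp(cs);
    const SK_CSET *h = negate ? (const SK_CSET *)~(uintptr_t)cs : cs;
    int in;

    assert(c >= 0 && c < 256);
    in = (h->table[c] & h->mask) != 0;
    return negate ? !in : in;
}

/* Add c by turning on the header's mask bits in the (shared) table. */
void sk_with(SK_CSET *cs, int c)
{
    assert(!sk_is_comp(cs) && c >= 0 && c < 256);
    cs->table[c] |= cs->mask;
}
```

        With uvowels (mask 0x01) and lvowels (mask 0x02) on one table
        and vowels as a third header with mask 0x03 on that table,
        sk_with(&lvowels, 'y') makes 'y' a member of vowels as well,
        which is the effect described above.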

bugs

author

        Jerry Leichter

#endif

/* )EDITLEVEL=10
 * Edit history
 * 0.0 12-Jul-82 JSL Invention
 */

#ifndef _CSET_                          /* Don't do this twice          */
#define _CSET_

typedef struct cset {
        char    mask;                   /* Mask for chars in set        */
        char    _fill_;                 /* For CIS compatibility        */
        char    *table;                 /* Character table              */
} CSET;

extern CSET     *cset();                /* Make a cset                  */
extern CSET     *cset_t();              /* Make a temporary cset        */
extern CSET     *cscopy();              /* Copy a cset                  */
extern CSET     *csdiff();              /* Difference of csets          */
extern CSET     *csjoin();              /* Union of csets               */
extern CSET     *csless();              /* Remove element from cset     */
extern CSET     *csmeet();              /* Intersection of csets        */
extern int      csmember();             /* Test for membership          */
extern CSET     *cswith();              /* Add element to cset          */
extern CSET     *_cscomp();             /* Real, callable complement    */

/* The character set matching functions */
extern char     *any();
extern char     *ospan();
extern char     *span();
extern char     *upto();

#define cscomp  (CSET *)~(int)          /* Macro complement             */

extern int      csmask;                 /* Mask to apply to chars       */
extern int      cssize;                 /* Size of a cset               */
extern int      csunique;               /* Make unique copies of csets  */

#endif