Subject: Object file symbols limited to 8 characters [+FIX] (#172 - #15 of 19)
Index:	cc,as,ld,ar,ranlib,nm,nlist,adb,... (2.11BSD)

Description:
	For some time now (seems like eons ;-)) the object file format used 
	by Unix for the PDP-11 has restricted symbols to 8 significant 
	characters (actually 7 due to the C compiler prefixing symbols with 
	a leading tilde (~) or underscore (_)).

	Aside from the "creative constraints" this imposes on the programmer
	there was the continuing problem of 'name collisions', especially
	when porting applications from machines whose object file format
	permitted longer symbol names.

	Numerous workarounds have been employed in the past.  The
	most common one relied on a combination of a name collision
	detection program ('shortc') and the flexname capability of the
	C preprocessor ('cpp').  This served to mask the problem while
	making debugging difficult due to mangled/synthetic symbol names.

Repeat-By:
	Attempt to compile the following program:

		int	this_is_a_long_name;
		int	this_is_a_long_name_too;

		main() { exit(0); }
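
	A rough sketch of why the two names above collide, written in
	modern C rather than the K&R dialect used in this kit.  The
	helper name and OLD_SYMLEN are made up for illustration; the
	limit of 8 characters including the prefix comes from the
	description above.

```c
#include <assert.h>
#include <string.h>

/* Old object format: 8 significant characters per symbol, one of which
 * is the '_' (or '~') that the C compiler prepends. */
#define OLD_SYMLEN 8

/* Hypothetical helper: returns 1 if two C identifiers become the same
 * symbol once prefixed and truncated to OLD_SYMLEN characters. */
int old_format_collides(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	/* Only the first OLD_SYMLEN - 1 characters of the C name survive. */
	return strncmp(a, b, OLD_SYMLEN - 1) == 0;
}
```

	Both declarations in the example keep only "this_is" after
	truncation, so the old format cannot tell them apart.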

Fix:
	This section is repeated in each of the 19 parts which make up
	the update kit.  You should read it perhaps once or twice, but
	then skip over it (how to do that is mentioned below).

	Taking a "hint" from the a.out(5) man page:

"The compiler will note name collisions when they occur within a single file...
There is really little that can be done about this.  Some thought is being
given to modifying the loader to flag detectable collisions, but the real
solution would be to change over to the 4BSD a.out format.  This would 
involve modifying the compiler, assembler and adb and then simply porting 
the 4.3BSD ld, nm, ranlib, strip and nlist.  Or perhaps simply porting the 
entire 4.3BSD suite might be best ...  Anyone interested in a project?"

	This I have done.  No more volunteers for the project need apply ;-)

	The new limit on symbol length is 32 characters!  There is still
	a limit (but it is _much_ more reasonable now) simply because of
	address space constraints - it needs to be possible to hold 
	at least one of the 'symbol' or 'string' tables in memory in many
	cases (nice to hold both, but - i know, get a 486;-)).

	It must be noted though that it is almost trivial now to raise
	the limit if that is desired - the programs which need to know
	the maximum length of a symbol string all have an easily changed
	#define statement now (usually MAXSYMLEN but there are a couple
	exceptions).  The 'string table' format itself doesn't care how 
	long the strings are.  The actual a.out format won't have to 
	change again to accommodate a higher limit on symbol name length!
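
	To illustrate why the format does not care about string length:
	a symbol entry carries only an offset into the string table, and
	the name is whatever NUL terminated string starts there.  What
	follows is a minimal sketch in modern C, not the kit's code.
	'sketch_sym' is a simplified stand-in for the real struct nlist
	in <a.out.h>, and the 4 byte size word assumes the PDP-11's
	32 bit long.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SIZEWORD 4	/* leading size word of the string table */

/* Hypothetical, simplified symbol entry (the real struct nlist has
 * more fields). */
struct sketch_sym {
	long     strx;		/* offset of the name in the string table */
	unsigned value;
};

/* strtab points at the start of the string table INCLUDING its size
 * word, so offsets can be used without adjustment.  Returns NULL if
 * the offset falls inside the size word or past the end. */
const char *sketch_sym_name(const char *strtab, long strsiz,
			    const struct sketch_sym *sp)
{
	assert(strtab != NULL && sp != NULL);
	if (sp->strx < SKETCH_SIZEWORD || sp->strx >= strsiz)
		return NULL;
	return strtab + sp->strx;
}
```

	Nothing in the lookup depends on a maximum name length; only
	the programs that copy names into fixed buffers need a limit.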

	The "string table" object file format has been ported and all
	the necessary changes made throughout the entire system.

	The changes were *massive* and widespread.  Programs affected
	of course included the assembler and compiler.  Other programs
	affected were anything which accessed a symbol table entry either
	via nlist(3) [ps, pstat, fstat, vmstat, etc] or by reading
	object files [ld, ranlib, nm, adb, strip, etc].

	The actual changes to the compiler and assembler were minor
	because those programs had already been modified earlier 
	(updates #142, 143, 152, 153).  The compiler only needed to have
	the maximum size of a symbol name raised.  The assembler
	already knew how to generate 'string table' object files - all
	that needed to be done in 'as' was to flip a bit telling it
	to generate the new object format instead of the old style.

	+++++++++++++++
	And now for a bit of a narrative about what was done.   The
	detailed instructions for applying this part (#15 of 19) of the
	update kit follow the 'story' below.  This started out as
	a semi-organized accounting of what was done but then devolved
	into a semi-rambling tale due to the sheer bulk of the changes.

	You can skip to the details for applying #172 by searching for
	the string "=======" below - this header is replicated in all
	parts of this kit.
	+++++++++++++++

	Alas, the remaining changes were not so simple.  Complete
	replacements for ranlib(1), ar(1), nlist(3) were ported from
	the Net-2 release.  Other programs such as symorder(1) and
	two new programs 'symcompact' and 'strcompact' (used to
	compress/compact symbol and string tables) were written from
	scratch.

	Perhaps the two hardest parts of the whole effort were
	rewriting the linker 'ld' and making *large* modifications
	to the debugger 'adb'.  This was a very difficult job.
	'ld' needed to scan new style ranlib archives, as well as
	use the "virtual memory" facility (the 'libvmf' routines 
	posted earlier) for symbol table management and so on.  'adb'
	was a MESS (having been written in a pseudo block structured
	macro language).  Since the new symbol table entry could be
	so much larger than the old it was no longer possible for adb
	to hold as much of the symbol table in memory - an alternate 
	method took a while to develop and implement, more on that
	in the patch which deals with adb (actually the changes to
	adb are so large there are two substantial parts of this update
	kit just for adb!).

	After the basic programs (ar, ld, ranlib, etc) were running
	the system had to be completely recompiled from sources, beginning
	with the object libraries.  After those were done the process
	of recompiling the rest of the system could proceed.

	Guess what happens when you recreate libc.a with a buggy linker?
	Yep - the system is rendered useless until backup copies of
	everything can be reloaded.  Don't let this happen to you - be
	sure (and i'll repeat the point later) to back up the system
	(or at least key executables and .a files) before installing
	this upgrade.

	In all there were about 330 files modified during the change of
	object file format.  Some of these were not directly related
	to the new object file format.  There were a number of (obsolete)
	references to "BSD2_10" lingering in the system.  Those 
	have been replaced with "pdp11" and the 'BSD2_10' define has
	been removed from the C preprocessor (cpp).  DO NOT use 'BSD2_10'
	to #ifdef pdp-11 sensitive code, use "pdp11" instead.

	During the recompile of the libraries a fairly large number of
	"shortened" names were lengthened - these included syscall routines
	such as "gethostname" which no longer had to be munged into
	"gethname".  Also a surprising number of typographical errors
	were uncovered (mainly in the Fortran libraries) where an extra
	character (beyond the 7th character) was left off or accidentally
	added.  These were all fixed and eventually, after a couple
	evenings, the libraries were built and installed.

	After the libraries were done it was the application programs'
	turn to be recompiled.  This took the better part of a couple 
	weeks to finally make it thru due to (as it turned out) the iterative 
	nature of the task.  A symbol would come up undefined and have
	to be tracked down exactly where the wrong definition/use was
	coming from.  Finally, however, the task was done and it was
	time to move on to the kernel.

	The kernel proved to be surprisingly easy - no real complications
	arose except when it came time to reboot:  a bug had been introduced
	into 'autoconfig' (which uses 'nlist' to scan the kernel symbol table).
	Ouch!  That was another couple late nights.  Since the compiler
	supports unsigned longs now a number of small changes which
	ifdef'd 'u_long' to 'long' were removed.

	REMEMBER - you need to recompile 'autoconfig' and install it
	before rebooting the new kernel ;-)

	The performance of 'ps' though (and anything else which used
	nlist(3) - 'fstat' and 'w' are good examples) was unacceptably slow.

	So, amidst other delays (real work, the earthquake - which almost
	tossed the disc drive to the floor, etc) the "symorder" program
	was written (with ideas borrowed from the Net-2 version).  The
	symorder(1) program rather insists on holding both the symbol
	and string tables in memory - this was a problem (or could be
	if the kernel symbol table grows much more) so two new and 
	original programs were written:  'symcompact' and 'strcompact'.

	The first program compacts the symbol table by removing 
	'register' local variables (they're of no use to anyone - the debugger
	doesn't/can't do anything with them) and redundant global text
	symbols (symbols in an overlaid program which are in the root
	segment do not need both the '~' and '_' symbols present).

	The second program 'strcompact' is one that any 'string table'
	based object file system can use.  It implements "shared strings"
	for symbols - if a program has many references to 'error' as a
	local symbol, why store the string 'error' more than once?  Simply
	store one instance and then update the symbol table entries to
	all point to the same string!
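
	The "shared strings" idea can be sketched as below.  This is
	only an illustration in modern C: it keeps the table in memory
	and uses a linear search, whereas the strcompact source in this
	kit sorts a temporary file with sort(1).  The struct, the
	function names and the 512 byte buffer are all made up for the
	example.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_SIZEWORD 4	/* leading size word; 4 bytes on the PDP-11 */

/* Hypothetical string table being built in memory. */
struct sketch_strtab {
	char data[512];
	long used;		/* bytes in use, including the size word */
};

void sketch_init(struct sketch_strtab *t)
{
	assert(t != NULL);
	memset(t->data, 0, sizeof t->data);
	t->used = SKETCH_SIZEWORD;
}

/* Return the offset of 'name', storing it only if no identical string
 * is already present.  Returns -1 if the table is full. */
long sketch_intern(struct sketch_strtab *t, const char *name)
{
	long off = SKETCH_SIZEWORD;
	long len = (long)strlen(name) + 1;
	long at;

	/* Walk the existing strings looking for an exact match. */
	while (off < t->used) {
		if (strcmp(t->data + off, name) == 0)
			return off;
		off += (long)strlen(t->data + off) + 1;
	}
	if (t->used + len > (long)sizeof t->data)
		return -1;
	at = t->used;
	memcpy(t->data + at, name, (size_t)len);
	t->used += len;
	return at;
}
```

	Every symbol table entry whose name is 'error' would then be
	given the same offset, and the string is stored once.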

	Using both 'strcompact' and 'symcompact' on the /unix image
	resulted in a file that was 15kb smaller.  Running 'symorder'
	then put the most frequently used symbols at the front of the
	symbol table, and the performance of 'w', 'pstat', and other
	programs which nlist(3) the kernel became acceptable.
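
	A small sketch of why the ordering matters.  It assumes (this is
	an assumption, it is not stated above) that the lookup stops
	scanning as soon as every requested name has been found.  The
	function below is hypothetical and simply counts how many symbol
	table entries such a scan would examine.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAXWANT 16	/* arbitrary cap for this example */

/* Scan names[0..nsyms) for each of wanted[0..nwanted), stopping once
 * all wanted names have been seen.  Returns the number of entries
 * examined. */
int sketch_scan_cost(const char *names[], int nsyms,
		     const char *wanted[], int nwanted)
{
	int found[SKETCH_MAXWANT] = { 0 };
	int i, j, left = nwanted;

	assert(nwanted <= SKETCH_MAXWANT);
	for (i = 0; i < nsyms && left > 0; i++) {
		for (j = 0; j < nwanted; j++) {
			if (!found[j] && strcmp(names[i], wanted[j]) == 0) {
				found[j] = 1;
				left--;
				break;
			}
		}
	}
	return i;
}
```

	With the wanted names at the front of the table the scan ends
	after a handful of entries instead of walking the whole kernel
	symbol table.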

	Some of the parts of this kit are large.  The large patch files have
	been split into pieces which 'patch' can handle; other parts
	(the replacement 'ar' sources) were left as a single 'shar' file
	rather than being split up.

	Each part of this kit consists of:

		a 'patchfile' - this is used with the "patch" program to
		update files.

		an optional 'script' - this is run ("sh script") to perform
		initialization, remove files, create directories and so on.

		an optional 'new.sources' - this is a "shar" file containing
		complete sources for a program.

	ALL pathnames are _absolute_ - this way you do not have to "cd"
	around the system, you should be able to apply all the patches
	while you are in /tmp (or /usr/tmp - wherever you have the most
	free space).

	Be sure that you have at least 40mb free on /usr before 
	rebuilding the system - if you do not then building in stages
	will be necessary.

	Part 19 contains the detailed instructions for rebuilding the
	system _after_ the previous 18 patches have been applied.

	The patches (#158 thru #175) should be applied in order following
	the directions in each part.

	DO NOT recompile anything once the patching has begun until requested
	to do so in part 19.  Many of the system include files are modified
	and the object file format is being changed - recompilation will not
	be possible until the transformation of the system and object libraries
	is complete.

	AT A MINIMUM you will want to back up the following files (unless
	you have a known good backup already made) in case you need to
	recompile something before part 19 is done:

		/bin/ar
		/bin/ld
		/bin/nm
		/bin/as
		/usr/bin/ranlib
		/lib/c0
		/lib/crt0.o
		/lib/mcrt0.o
		/lib/libc.a
		/usr/include/*.h
		/usr/include/sys/*.h

	In part 19 there is a *complete* list of all files affected
	(all 336 of them) - you may wish to back those up also.

	And now the common header ('boilerplate') is over (at last ;-)),
	let the installation guide begin.

	As always, the complete 2.11BSD updates are available via 
	anonymous FTP to 'ftp.iipo.gtegsc.com' in the directory /pub/2.11BSD

==========  #172 (Part #15 of 19)

	0) Be in a temp directory ("cd /tmp" or "cd /usr/tmp")

	1) Save the following shar archive to a file (/tmp/172 for example)

	2) Unpack the archive:  sh 172

	3) Unpack the new source replacements:  sh new.sources

	4) rm 172 new.sources

	Part 15 of 19 is done.  DO NOT rebuild or compile _anything_
	at this point!

===== cut here
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	new.sources
# This archive created: Fri Feb  4 23:25:47 1994
export PATH; PATH=/bin:/usr/bin:$PATH
if test -f 'new.sources'
then
	echo shar: "will not over-write existing file 'new.sources'"
else
sed 's/^X//' << \SHAR_EOF > 'new.sources'
X#! /bin/sh
X# This is a shell archive, meaning:
X# 1. Remove everything above the #! /bin/sh line.
X# 2. Save the resulting text in a file.
X# 3. Execute the file with /bin/sh (not csh) to create:
X#	/usr/src/ucb/symorder.c
X#	/usr/src/ucb/symcompact.c
X#	/usr/src/ucb/strcompact.c
X# This archive created: Fri Jan 28 21:21:13 1994
Xexport PATH; PATH=/bin:/usr/bin:$PATH
Xif test -f '/usr/src/ucb/symorder.c'
Xthen
X	echo shar: "will not over-write existing file '/usr/src/ucb/symorder.c'"
Xelse
Xsed 's/^X//' << \SHAR_EOF > '/usr/src/ucb/symorder.c'
XX/*
XX *	Program Name:   symorder.c
XX *	Date: January 21, 1994
XX *	Author: S.M. Schultz
XX *
XX *	-----------------   Modification History   ---------------
XX *      Version Date            Reason For Modification
XX *      1.0     21Jan94         1. Initial release into the public domain.
XX*/
XX
XX/*
XX * This program reorders the symbol table of an executable.  This is
XX * done by moving symbols found in the first file argument (one symbol
XX * per line) to the front of the symbol table.
XX *
XX * NOTE: This program needs to hold the string table in memory.
XX * For the kernel which has not been 'strcompact'd this is about 21kb.
XX * It is highly recommended that 'strcompact' be run first - that program 
XX * removes redundant strings, significantly reducing the amount of memory 
XX * needed.  Running 'symcompact' will reduce the run time needed by
XX * this program by eliminating redundant non-overlaid text symbols.
XX*/
XX
XX#include <stdio.h>
XX#include <a.out.h>
XX#include <ctype.h>
XX#include <signal.h>
XX#include <string.h>
XX#include <sysexits.h>
XX#include <sys/file.h>
XX
XX#define NUMSYMS	125
XX	char	*order[NUMSYMS];
XX	int	nsorted; 
XX	char	*Pgm;
XX	void	cleanup();
XXstatic	char	sym1tmp[20], sym2tmp[20], strtmp[20];
XXstatic	char	*strtab, *oldname;
XX
XXmain(argc, argv)
XX	int	argc;
XX	char	**argv;
XX	{
XX	FILE	*fp, *fp2, *sym1fp, *sym2fp, *strfp;
XX	int	cnt, nsyms, len, c;
XX	char	fbuf1[BUFSIZ], fbuf2[BUFSIZ];
XX	off_t	symoff, stroff, ltmp;
XX	long	strsiz;
XX	struct	nlist	sym;
XX	struct	xexec	xhdr;
XX
XX	Pgm = argv[0];
XX
XX	signal(SIGQUIT, cleanup);
XX	signal(SIGINT, cleanup);
XX	signal(SIGHUP, cleanup);
XX
XX	if	(argc != 3)
XX		{
XX		fprintf(stderr, "usage: %s symlist file\n", Pgm);
XX		exit(EX_USAGE);
XX		}
XX	fp = fopen(argv[2], "r+");
XX	if	(!fp)
XX		{
XX		fprintf(stderr, "%s: can't open '%s' for update\n", Pgm,
XX			argv[2]);
XX		exit(EX_NOINPUT);
XX		}
XX	setbuf(fp, fbuf1);
XX	cnt = fread(&xhdr, 1, sizeof (xhdr), fp);
XX	if	(cnt < sizeof (xhdr.e))
XX		{
XX		fprintf(stderr, "%s: Premature EOF reading header\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	if	(N_BADMAG(xhdr.e))
XX		{
XX		fprintf(stderr, "%s: Bad magic number\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	nsyms = xhdr.e.a_syms / sizeof (struct nlist);
XX	if	(!nsyms)
XX		{
XX		fprintf(stderr, "%s: '%s' stripped\n", Pgm, argv[2]);
XX		exit(EX_OK);
XX		}
XX	stroff = N_STROFF(xhdr);
XX	symoff = N_SYMOFF(xhdr);
XX/*
XX * Seek to the string table size longword and read it.  Then attempt to
XX * malloc memory to hold the string table.  First make a sanity check on
XX * the size.
XX*/
XX	fseek(fp, stroff, L_SET);
XX	fread(&strsiz, sizeof (long), 1, fp);
XX	if	(strsiz > 48 * 1024L)
XX		{
XX		fprintf(stderr, "%s: string table > 48kb\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	strtab = (char *)malloc((int)strsiz);
XX	if	(!strtab)
XX		{
XX		fprintf(stderr, "%s: no memory for strings\n", Pgm);
XX		exit(EX_OSERR);
XX		}
XX/*
XX * Now read the string table into memory.  Reduce the size read because
XX * we've already retrieved the string table size longword.  Adjust the
XX * address used so that we don't have to adjust each symbol table entry's
XX * string offset.
XX*/
XX	cnt = fread(strtab + sizeof (long), 1, (int)strsiz - sizeof (long), fp);
XX	if	(cnt != (int)strsiz - sizeof (long))
XX		{
XX		fprintf(stderr, "%s: Premature EOF reading strings\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX/*
XX * Now open the file containing the list of symbols to
XX * relocate to the front of the symbol table.
XX*/
XX	fp2 = fopen(argv[1], "r");
XX	if	(!fp2)
XX		{
XX		fprintf(stderr, "%s: Can not open '%s'\n", Pgm, argv[1]);
XX		exit(EX_NOINPUT);
XX		}
XX	getsyms(fp2);
XX
XX/*
XX * Create the temporary files which will hold the new symbol table and the
XX * new string table.  One temp file receives symbols _in_ the list,
XX * another file receives all other symbols, and the last file receives the
XX * new string table.
XX*/
XX	strcpy(sym1tmp, "/tmp/sym1XXXXXX");
XX	mktemp(sym1tmp);
XX	strcpy(sym2tmp, "/tmp/sym2XXXXXX");
XX	mktemp(sym2tmp);
XX	strcpy(strtmp, "/tmp/strXXXXXX");
XX	mktemp(strtmp);
XX	sym1fp = fopen(sym1tmp, "w+");
XX	sym2fp = fopen(sym2tmp, "w+");
XX	strfp = fopen(strtmp, "w+");
XX	if	(!sym1fp || !sym2fp || !strfp)
XX		{
XX		fprintf(stderr, "%s: Can't create %s, %s or %s\n", Pgm,
XX			sym1tmp, sym2tmp, strtmp);
XX		exit(EX_CANTCREAT);
XX		}
XX	setbuf(sym1fp, fbuf2);
XX/*
XX * Now position the executable to the start of the symbol table.  For each
XX * symbol scan the list for a match on the symbol name.  If the
XX * name matches write the symbol table entry to one tmp file, else write it
XX * to the second symbol tmp file.
XX *
XX * NOTE: Since the symbol table is being rearranged the usefulness of
XX * "local" symbols, especially 'register' symbols, is greatly diminished
XX * Not that they are terribly useful in any event - especially the register
XX * symbols, 'adb' claims to do something with them but doesn't.  In any
XX * event this suite of programs is targeted at the kernel and the register
XX * local symbols are of no use.  For this reason 'register' symbols are 
XX * removed - this has the side effect of even further reducing the symbol 
XX * and string tables that must be processed by 'nm', 'ps', 'adb' and so on.  
XX * This removal probably should have been done earlier - in 'strcompact' or 
XX * 'symcompact' and it may be in the future, but for now just do it here.
XX*/
XX	fseek(fp, symoff, L_SET);
XX	while	(nsyms--)
XX		{
XX		fread(&sym, sizeof (sym), 1, fp);
XX		if	(sym.n_type == N_REG)
XX			continue;
XX		if	(inlist(&sym))
XX			fwrite(&sym, sizeof (sym), 1, sym1fp);
XX		else
XX			fwrite(&sym, sizeof (sym), 1, sym2fp);
XX		}
XX
XX/*
XX * Position the executable file to where the symbol table starts.  Truncate
XX * the file to the current position to remove the old symbols and strings.  Then
XX * write the symbol table entries which are to appear at the front, followed
XX * by the remainder of the symbols.  As each symbol is processed adjust the
XX * string table offset and write the string to the strings tmp file.   
XX *
XX * It was either re-scan the tmp files with the symbols again to retrieve
XX * the string offsets or simply write the strings to yet another tmp file.
XX * The latter was chosen.
XX*/
XX	fseek(fp, symoff, L_SET);
XX	ftruncate(fileno(fp), ftell(fp));
XX	ltmp = sizeof (long);
XX	rewind(sym1fp);
XX	rewind(sym2fp);
XX	nsyms = 0;
XX	while	(fread(&sym, sizeof (sym), 1, sym1fp) == 1)
XX		{
XX		if	(ferror(sym1fp) || feof(sym1fp))
XX			break;
XX		oldname = strtab + (int)sym.n_un.n_strx;
XX		sym.n_un.n_strx = ltmp;
XX		len = strlen(oldname) + 1;
XX		ltmp += len;
XX		fwrite(&sym, sizeof (sym), 1, fp);
XX		fwrite(oldname, len, 1, strfp);
XX		nsyms++;
XX		}
XX	fclose(sym1fp);
XX	while	(fread(&sym, sizeof (sym), 1, sym2fp) == 1)
XX		{
XX		if	(ferror(sym2fp) || feof(sym2fp))
XX			break;
XX		oldname = strtab + (int)sym.n_un.n_strx;
XX		sym.n_un.n_strx = ltmp;
XX		len = strlen(oldname) + 1;
XX		ltmp += len;
XX		fwrite(&sym, sizeof (sym), 1, fp);
XX		fwrite(oldname, len, 1, strfp);
XX		nsyms++;
XX		}
XX	fclose(sym2fp);
XX/*
XX * Next write the string table size longword followed by the
XX * string table itself.
XX*/
XX	fwrite(&ltmp, sizeof (long), 1, fp);
XX	rewind(strfp);
XX	while	((c = getc(strfp)) != EOF)
XX		putc(c, fp);
XX	fclose(strfp);
XX/*
XX * And last (but not least) we need to update the a.out header with
XX * the correct size of the symbol table.
XX*/
XX	rewind(fp);
XX	xhdr.e.a_syms = nsyms * sizeof (struct nlist);
XX	fwrite(&xhdr.e, sizeof (xhdr.e), 1, fp);
XX	fclose(fp);
XX	free(strtab);
XX	cleanup();
XX	}
XX
XXinlist(sp)
XX	register struct nlist *sp;
XX	{
XX	register int i;
XX
XX	for	(i = 0; i < nsorted; i++)
XX		{
XX		if	(strcmp(strtab + (int)sp->n_un.n_strx, order[i]) == 0)
XX			return(1);
XX		}
XX	return(0);
XX	}
XX
XXgetsyms(fp)
XX	FILE	*fp;
XX	{
XX	char	asym[128], *start;
XX	register char *t, **p;
XX
XX	for	(p = order; fgets(asym, sizeof(asym), fp) != NULL;)
XX		{
XX		if	(nsorted >= NUMSYMS)
XX			{
XX			fprintf(stderr, "%s: only doing %d symbols\n",
XX				Pgm, NUMSYMS);
XX			break;
XX			}
XX		for	(t = asym; isspace(*t); ++t)
XX			;
XX		if	(!*(start = t))
XX			continue;
XX		while	(*++t)
XX			;
XX		if	(*--t == '\n')
XX			*t = '\0';
XX		*p++ = strdup(start);
XX		++nsorted;
XX		}
XX	fclose(fp);
XX	}
XX
XXvoid
XXcleanup()
XX	{
XX	if	(strtmp[0])
XX		unlink(strtmp);
XX	if	(sym1tmp[0])
XX		unlink(sym1tmp);
XX	if	(sym2tmp[0])
XX		unlink(sym2tmp);
XX	exit(EX_OK);
XX	}
XSHAR_EOF
Xchmod 644 '/usr/src/ucb/symorder.c'
Xfi
Xif test -f '/usr/src/ucb/symcompact.c'
Xthen
X	echo shar: "will not over-write existing file '/usr/src/ucb/symcompact.c'"
Xelse
Xsed 's/^X//' << \SHAR_EOF > '/usr/src/ucb/symcompact.c'
XX/*
XX *	Program Name:   symcompact.c
XX *	Date: January 21, 1994
XX *	Author: S.M. Schultz
XX *
XX *	-----------------   Modification History   ---------------
XX *      Version Date            Reason For Modification
XX *      1.0     21Jan94         1. Initial release into the public domain.
XX*/
XX
XX/*
XX * This program compacts the symbol table of an executable.  This is
XX * done by removing '~symbol' references when _both_ the '~symbol' and
XX * '_symbol' have an overlay number of 0.  The assembler always generates
XX * both forms.  The only time both forms are needed is in an overlaid 
XX * program and the routine has been relocated by the linker, in that event
XX * the '_' form is the overlay "thunk" and the '~' form is the actual 
XX * routine itself.  Only 'text' symbols have both forms.  Reducing the
XX * number of symbols greatly speeds up 'nlist' processing as well as 
XX * cutting down memory requirements for programs such as 'adb' and 'nm'.
XX *
XX * NOTE: This program attempts to hold both the string and symbol tables
XX * in memory.  For the kernel which has not been 'strcompact'd this
XX * amounts to about 49kb.  IF this program runs out of memory you should
XX * run 'strcompact' first - that program removes redundant strings, 
XX * significantly reducing the amount of memory needed.  Alas, this program
XX * will undo some of strcompact's work and you may/will need to run
XX * strcompact once more after removing excess symbols.
XX*/
XX
XX#include <stdio.h>
XX#include <a.out.h>
XX#include <ctype.h>
XX#include <signal.h>
XX#include <string.h>
XX#include <sysexits.h>
XX#include <sys/file.h>
XX
XX	char	*Pgm;
XXstatic	char	strtmp[20];
XX
XXmain(argc, argv)
XX	int	argc;
XX	char	**argv;
XX	{
XX	FILE	*fp, *strfp;
XX	int	cnt, nsyms, len, c, symsremoved = 0;
XX	void	cleanup();
XX	char	*strtab;
XX	char	fbuf1[BUFSIZ], fbuf2[BUFSIZ];
XX	off_t	symoff, stroff, ltmp;
XX	long	strsiz;
XX	register struct	nlist	*sp, *sp2;
XX	struct	nlist	*symtab, *symtabend;
XX	struct	xexec	xhdr;
XX
XX	Pgm = argv[0];
XX	signal(SIGQUIT, cleanup);
XX	signal(SIGINT, cleanup);
XX	signal(SIGHUP, cleanup);
XX
XX	if	(argc != 2)
XX		{
XX		fprintf(stderr, "%s: filename argument missing\n", Pgm);
XX		exit(EX_USAGE);
XX		}
XX	fp = fopen(argv[1], "r+");
XX	if	(!fp)
XX		{
XX		fprintf(stderr, "%s: can't open '%s' for update\n", Pgm,
XX			argv[1]);
XX		exit(EX_NOINPUT);
XX		}
XX	setbuf(fp, fbuf1);
XX	cnt = fread(&xhdr, 1, sizeof (xhdr), fp);
XX	if	(cnt < sizeof (xhdr.e))
XX		{
XX		fprintf(stderr, "%s: Premature EOF reading header\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	if	(N_BADMAG(xhdr.e))
XX		{
XX		fprintf(stderr, "%s: Bad magic number\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	nsyms = xhdr.e.a_syms / sizeof (struct nlist);
XX	if	(!nsyms)
XX		{
XX		fprintf(stderr, "%s: '%s' stripped\n", Pgm, argv[1]);
XX		exit(EX_OK);
XX		}
XX	stroff = N_STROFF(xhdr);
XX	symoff = N_SYMOFF(xhdr);
XX/*
XX * Seek to the string table size longword and read it.  Then attempt to
XX * malloc memory to hold the string table.  First make a sanity check on
XX * the size.
XX*/
XX	fseek(fp, stroff, L_SET);
XX	fread(&strsiz, sizeof (long), 1, fp);
XX	if	(strsiz > 48 * 1024L)
XX		{
XX		fprintf(stderr, "%s: string table > 48kb\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	strtab = (char *)malloc((int)strsiz);
XX	if	(!strtab)
XX		{
XX		fprintf(stderr, "%s: no memory for strings\n", Pgm);
XX		exit(EX_OSERR);
XX		}
XX/*
XX * Now read the string table into memory.  Reduce the size read because
XX * we've already retrieved the string table size longword.  Adjust the
XX * address used so that we don't have to adjust each symbol table entry's
XX * string offset.
XX*/
XX	cnt = fread(strtab + sizeof (long), 1, (int)strsiz - sizeof (long), fp);
XX	if	(cnt != (int)strsiz - sizeof (long))
XX		{
XX		fprintf(stderr, "%s: Premature EOF reading strings\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX/*
XX * Next seek to the symbol table position in the file, allocate memory 
XX * for the symbol table and read it in.  
XX*/
XX	fseek(fp, symoff, L_SET);
XX	symtab = (struct nlist *)malloc(nsyms * sizeof (struct nlist));
XX	if	(!symtab)
XX		{
XX		fprintf(stderr, "%s: no memory for symbols\n", Pgm);
XX		exit(EX_OSERR);
XX		}
XX	cnt = fread(symtab, sizeof (struct nlist), nsyms, fp);
XX	if	(cnt != nsyms)
XX		{
XX		fprintf(stderr, "%s: premature EOF in symbols\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	symtabend = &symtab[nsyms];
XX/*
XX * Now compute the in memory address of the strings for each symbol.  We
XX * do not need to adjust the offset for the string table size longword because
XX * the strings were read in using a biased address.
XX*/
XX	for	(sp = symtab; sp < symtabend; sp++)
XX		sp->n_un.n_name = strtab + (int)sp->n_un.n_strx;
XX
XX/*
XX * Now look for symbols with overlay numbers of 0 (root/base segment) and
XX * of type 'text'.  For each symbol found check if there exists both a '~'
XX * and '_' prefixed form of the symbol.  Preserve the '_' form and clear
XX * the '~' entry by zeroing the string address of the '~' symbol.
XX*/
XX	for	(sp = symtab; sp < symtabend; sp++)
XX		{
XX		if	(sp->n_ovly || !sp->n_un.n_name)
XX			continue;
XX		if	((sp->n_type & N_TYPE) != N_TEXT)
XX			continue;
XX		if	(sp->n_un.n_name[0] != '~')
XX			continue;
XX/*
XX * At this point we have the '~' form of a non overlaid text symbol.  Look
XX * thru the symbol table for the '_' form.  All of 1) symbol type, 2) Symbol
XX * value and 3) symbol name (starting after the first character) must match.
XX*/
XX		for	(sp2 = symtab; sp2 < symtabend; sp2++)
XX			{
XX			if	(sp2->n_ovly || !sp2->n_un.n_name)
XX				continue;
XX			if	((sp2->n_type & N_TYPE) != N_TEXT)
XX				continue;
XX			if	(sp2->n_un.n_name[0] != '_')
XX				continue;
XX			if	(sp2->n_value != sp->n_value)
XX				continue;
XX			if	(strcmp(sp->n_un.n_name+1, sp2->n_un.n_name+1))
XX				continue;
XX/*
XX * Found a match.  Null out the '~' symbol's string address.
XX*/
XX			symsremoved++;
XX			sp->n_un.n_strx = NULL;
XX			break;
XX			}
XX		}
XX/*
XX * Done with the nested scanning of the symbol table.  Now create a new
XX * string table (from the remaining symbols) in a temporary file.
XX*/
XX	strcpy(strtmp, "/tmp/strXXXXXX");
XX	mktemp(strtmp);
XX	strfp = fopen(strtmp, "w+");
XX	if	(!strfp)
XX		{
XX		fprintf(stderr, "%s: can't create '%s'\n", Pgm, strtmp);
XX		exit(EX_CANTCREAT);
XX		}
XX	setbuf(strfp, fbuf2);
XX
XX/*
XX * As each symbol is written to the tmp file the symbol's string offset
XX * is updated with the new file string table offset.
XX*/
XX	ltmp = sizeof (long);
XX	for	(sp = symtab; sp < symtabend; sp++)
XX		{
XX		if	(!sp->n_un.n_name)
XX			continue;
XX		len = strlen(sp->n_un.n_name) + 1;
XX		fwrite(sp->n_un.n_name, len, 1, strfp);
XX		sp->n_un.n_strx = ltmp;
XX		ltmp += len;
XX		}
XX/*
XX * We're done with the memory string table - give it back.  Then reposition
XX * the new string table file to the beginning.
XX*/
XX	free(strtab);
XX	rewind(strfp);
XX
XX/*
XX * Position the executable file to where the symbol table begins.  Truncate
XX * the file.  Write out the valid symbols, counting each one so that the 
XX * a.out header can be updated when we're done.
XX*/
XX	nsyms = 0;
XX	fseek(fp, symoff, L_SET);
XX	ftruncate(fileno(fp), ftell(fp));
XX	for	(sp = symtab; sp < symtabend; sp++)
XX		{
XX		if	(sp->n_un.n_strx == 0)
XX			continue;
XX		nsyms++;
XX		fwrite(sp, sizeof (struct nlist), 1, fp);
XX		}
XX/*
XX * Next write out the string table size longword.
XX*/
XX	fwrite(&ltmp, sizeof (long), 1, fp);
XX/*
XX * We're done with the in memory symbol table, release it.  Then append
XX * the string table to the executable file.
XX*/
XX	free(symtab);
XX	while	((c = getc(strfp)) != EOF)
XX		putc(c, fp);
XX	fclose(strfp);
XX	rewind(fp);
XX	xhdr.e.a_syms = nsyms * sizeof (struct nlist);
XX	fwrite(&xhdr.e, sizeof (xhdr.e), 1, fp);
XX	fclose(fp);
XX	printf("%s: %d symbols removed\n", Pgm, symsremoved);
XX	cleanup();
XX	}
XX
XXvoid
XXcleanup()
XX	{
XX	if	(strtmp[0])
XX		unlink(strtmp);
XX	exit(EX_OK);
XX	}
XSHAR_EOF
Xchmod 644 '/usr/src/ucb/symcompact.c'
Xfi
Xif test -f '/usr/src/ucb/strcompact.c'
Xthen
X	echo shar: "will not over-write existing file '/usr/src/ucb/strcompact.c'"
Xelse
Xsed 's/^X//' << \SHAR_EOF > '/usr/src/ucb/strcompact.c'
XX/*
XX *	Program Name:   strcompact.c
XX *	Date: January 21, 1994
XX *	Author: S.M. Schultz
XX *
XX *	-----------------   Modification History   ---------------
XX *      Version Date            Reason For Modification
XX *      1.0     21Jan94         1. Initial release into the public domain.
XX*/
XX
XX/*
XX * This program compacts the string table of an executable image by
XX * preserving only a single string definition of a symbol and updating
XX * the symbol table string offsets.  Multiple symbols having the same
XX * string are very common - local symbols in a function often have the
XX * same name ('int error' inside a function for example).  This program
XX * reduced the string table size of the kernel at least 25%!
XX*/
XX
XX#include <stdio.h>
XX#include <a.out.h>
XX#include <ctype.h>
XX#include <signal.h>
XX#include <string.h>
XX#include <sysexits.h>
XX#include <sys/file.h>
XX
XX	char	*Pgm;
XX	char	*Sort = "/usr/bin/sort";
XXstatic	char	strtmp[20], tempfn[20], symtmp[20];
XXstatic	int	shared;
XXextern	long	atol();
XXextern	time_t	time();
XX
XXmain(argc, argv)
XX	int	argc;
XX	char	**argv;
XX	{
XX	char	fbuf1[BUFSIZ], fbuf2[BUFSIZ];
XX	char	buf1[128], buf2[128];
XX	char	*string1, *string2, *tab1pos, *tab2pos, *cp;
XX	FILE	*aoutfp, *symfp, *strfp;
XXregister FILE	*fp;
XX	struct	xexec	xhdr;
XXregister struct	nlist	*sp;
XX	struct	nlist	*symtab, *symtabend;
XX	int	nsyms, c, cnt, len;
XX	void	cleanup();
XX	off_t	symoff, stroff, ltmp, offset1, offset2;
XX
XX	Pgm = argv[0];
XX	signal(SIGQUIT, cleanup);
XX	signal(SIGINT, cleanup);
XX	signal(SIGHUP, cleanup);
XX
XX	if	(argc != 2)
XX		{
XX		fprintf(stderr, "%s: missing filename argument\n", Pgm);
XX		exit(EX_USAGE);
XX		}
XX	aoutfp = fopen(argv[1], "r+");
XX	if	(!aoutfp)
XX		{
XX		fprintf(stderr, "%s: can not open '%s' for update\n", 
XX			Pgm, argv[1]);
XX		exit(EX_NOINPUT);
XX		}
XX	cnt = fread(&xhdr, 1, sizeof (xhdr), aoutfp);
XX	if	(cnt < sizeof (xhdr.e))
XX		{
XX		fprintf(stderr, "%s: premature EOF\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	if	(N_BADMAG(xhdr.e))
XX		{
XX		fprintf(stderr, "%s: Bad magic number\n", Pgm);
XX		exit(EX_DATAERR);
XX		}
XX	nsyms = xhdr.e.a_syms / sizeof (struct nlist);
XX	if	(!nsyms)
XX		{
XX		fprintf(stderr, "%s: '%s' stripped\n", Pgm, argv[1]);
XX		exit(EX_OK);
XX		}
XX
XX	strcpy(strtmp, "/tmp/strXXXXXX");
XX	mktemp(strtmp);
XX	strcpy(tempfn, "/tmp/SYMXXXXXX");
XX	mktemp(tempfn);
XX	strcpy(symtmp, "/tmp/symXXXXXX");
XX	mktemp(symtmp);
XX
XX	symoff = N_SYMOFF(xhdr);
XX	stroff = N_STROFF(xhdr);
XX
XX/*
XX * Now move to the start of the string table, bypassing the string table
XX * size longword.
XX*/
XX	fseek(aoutfp, stroff + sizeof (long), L_SET);
XX
XX	fp = fopen(tempfn, "w+");
XX	if	(!fp)
XX		{
XX		fprintf(stderr, "%s: can't create temp file\n", Pgm);
XX		exit(EX_CANTCREAT);
XX		}
XX/*
XX * Now read the string table and produce lines of the form:
XX *
XX *	string_offset<tab>symbol_string
XX *
XX * in the temp file. 
XX*/
XX	ltmp = sizeof (long);
XX	while	(1)
XX		{
XX		if	(feof(aoutfp) || ferror(aoutfp))
XX			break;
XX		sgets(aoutfp, fp, &ltmp);
XX		}
XX	fclose(fp);
XX/*
XX * Next we sort the temp file on the second field (symbol name).  Duplicates
XX * are _not_ suppressed this time since we will be scanning the symbol table
XX * looking for references to offsets belonging to the same symbol.
XX*/
XX	sprintf(fbuf1, "%s +1 -2 -o %s %s", Sort, tempfn, tempfn);
XX	if	(system(fbuf1))
XX		fatal("sort of string file failed");
XX	fp = fopen(tempfn, "r");
XX	if	(!fp)
XX		fatal("Can't reopen sorted file");
XX/*
XX * Now use the local buffer to leave more room to malloc for the
XX * symbol table.
XX*/
XX	setbuf(fp, fbuf1);
XX
XX/*
XX * We need to hold the entire symbol table in memory - for the kernel this
XX * is approximately 28kb.
XX*/
XX	symtab = (struct nlist *)calloc(nsyms, sizeof (struct nlist));
XX	if	(!symtab)
XX		fatal("no memory for symbol table");
XX	symtabend = &symtab[nsyms];
XX
XX	fseek(aoutfp, symoff, L_SET);
XX	cnt = fread(symtab, sizeof (struct nlist), nsyms, aoutfp);
XX	if	(cnt != nsyms)
XX		fatal("Premature EOF reading symbols");
XX
XX/*
XX * The sorted strings file looks like this:
XX *
XX *  1234 _foobar
XX *  168  _foobar
XX *  6238 _foobar
XX *  ...
XX *  6512 _blatz
XX *
XX * We want to make all string offsets to '_foobar' be 1234.  When a different
XX * symbol is encountered (_blatz) we know we're done with the previous symbol
XX * and the process starts over.
XX*/
XX
XX	string1 = buf1;
XX	string2 = buf2;
XX	if	(!fgets(string1, sizeof (buf1), fp))
XX		fatal("sorted string file is empty");
XX
XX	while	(fgets(string2, sizeof (buf2), fp))
XX		{
XX		tab1pos = index(string1, '\t');
XX		tab2pos = index(string2, '\t');
XX		if	(!tab1pos || !tab2pos)
XX			fatal("malformed input from sort file");
XX		tab1pos++;
XX		tab2pos++;
XX/*
XX * Compare the previous and current symbol.  If they are different then
XX * copy the second string to the first and continue the scanning.
XX*/
XX		if	(strcmp(tab1pos, tab2pos))
XX			{
XX			strcpy(string1, string2);
XX			continue;
XX			}
XX/*
XX * If they are the same then look through the symbol table for references to
XX * the current offset, replacing it with the offset from the first instance of
XX * the symbol.
XX*/
XX		offset2 = atol(string2);
XX		for	(sp = symtab; sp < symtabend; sp++)
XX			{
XX			if	(sp->n_un.n_strx == offset2)
XX				{
XX				shared++;
XX				sp->n_un.n_strx = atol(string1);
XX				}
XX			}
XX/*
XX * Since the strings matched we do not swap the buffers and continue looking
XX * for matches on the symbol pointed to by 'string1'.
XX*/
XX		continue;
XX		}
XX	fclose(fp);
XX	fprintf(stderr, "%s: %d shared strings found\n", Pgm, shared);
XX	if	(!shared)
XX		{
XX		fclose(aoutfp);
XX		fatal((char *)NULL);
XX		}
XX/*
XX * Now use "uniq -1" on the temp file to remove the duplicates, preserving
XX * the first mention of each symbol (which is the one used above).
XX*/
XX	sprintf(fbuf1, "/usr/bin/uniq -1 %s", tempfn);
XX	fp = popen(fbuf1, "r");
XX	if	(!fp)
XX		fatal("popen uniq failed");
XX/*
XX * Now create the temporary files which will hold the new string table and
XX * symbol table.  As the output from 'uniq' is processed the symbol table
XX * is scanned and matches on the 'offset' cause symbols to be output to
XX * the symbol table file.  As a symbol is placed in the file it is cleared
XX * in memory so it is not processed more than once.
XX*/
XX
XX	symfp = fopen(symtmp, "w+");
XX	if	(!symfp)
XX		fatal("Create of symtmp failed");
XX	setbuf(symfp, fbuf1);
XX
XX	strfp = fopen(strtmp, "w+");
XX	if	(!strfp)
XX		fatal("Create of strtmp failed");
XX	setbuf(strfp, fbuf2);
XX
XX/*
XX * Initialize the string table offset to the minimum - the long word size
XX * includes itself in the string table size.
XX*/
XX	ltmp = sizeof (long);
XX
XX	while	(fgets(buf1, sizeof(buf1), fp))
XX		{
XX		tab1pos = index(buf1, '\t');
XX		if	(!tab1pos)
XX			fatal("malformed input from uniq");
XX		*tab1pos++ = '\0';
XX		tab2pos = index(tab1pos, '\n');
XX		if	(tab2pos)
XX			*tab2pos = '\0';
XX/*
XX * Get the offset and enter into the symbol table scan to look for
XX * references to this offset.  It is a fatal error not to find a match.
XX * Write matched symbols out to the file and then clear their string offset
XX * so they are not found again.
XX*/
XX		offset1 = atol(buf1);
XX		cnt = 0;
XX		for	(sp = symtab; sp < symtabend; sp++)
XX			{
XX			if	(sp->n_un.n_strx == 0)
XX				continue;
XX			if	(sp->n_un.n_strx != offset1)
XX				continue;
XX			sp->n_un.n_strx = ltmp;		/* NEW offset */
XX			fwrite(sp, sizeof (struct nlist), 1, symfp);
XX			sp->n_un.n_strx = 0;
XX			cnt++;
XX			}
XX		if	(!cnt)
XX			fatal("No symbols found in offset scan");
XX/*
XX * Now write the string (including the terminating null) and update the
XX * string table offset (for the next symbol written).
XX*/
XX		len = strlen(tab1pos) + 1;
XX		fwrite(tab1pos, len, 1, strfp);
XX		ltmp += len;
XX		}
XX/*
XX * Close down the input pipe and reposition the temp file output for
XX * updating.  Free the symbol table - we're done with it now.  Position
XX * the a.out file to where the symbol table starts.
XX*/
XX	pclose(fp);
XX	rewind(symfp);
XX	rewind(strfp);
XX	free(symtab);
XX	fseek(aoutfp, symoff, L_SET);
XX
XX/*
XX * Now append the new symbol table.  Then write the string table length
XX * followed by the string table.  Finally truncate the file to the new
XX * length, reflecting the smaller string table.
XX*/
XX	while	((c = getc(symfp)) != EOF)
XX		putc(c, aoutfp);
XX	fwrite(&ltmp, sizeof (long), 1, aoutfp);
XX	while	((c = getc(strfp)) != EOF)
XX		putc(c, aoutfp);
XX	ftruncate(fileno(aoutfp), ftell(aoutfp));
XX	fclose(aoutfp);
XX	fclose(symfp);
XX	fclose(strfp);
XX	fatal((char *)NULL);
XX	}
XX
XXfatal(str)
XX	char	*str;
XX	{
XX
XX	if	(tempfn[0])
XX		unlink(tempfn);
XX	if	(strtmp[0])
XX		unlink(strtmp);
XX	if	(symtmp[0])
XX		unlink(symtmp);
XX	if	(!str)
XX		exit(EX_OK);
XX	fprintf(stderr, "%s: %s\n", Pgm, str);
XX	exit(EX_SOFTWARE);
XX	}
XX
XXvoid
XXcleanup()
XX	{
XX	fatal((char *)NULL);
XX	}
XX
XXsgets(aoutfp, fp, ltmp)
XX	register FILE *aoutfp, *fp;
XX	long	*ltmp;
XX	{
XX	char	buf[128];
XX	int	c;
XX	register char *cp;
XX
XX	cp = buf;
XX	while	((c = getc(aoutfp)) != EOF)
XX		{
XX		if	(cp < &buf[sizeof (buf) - 1])
XX			*cp++ = c;
XX		if	(c == '\0')
XX			break;
XX		}
XX	*cp++ = '\0';
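XX/*
XX * An empty string still occupies one byte (its null) in the string table,
XX * so account for it unless we stopped because of EOF.
XX*/
XX	if	(buf[0] == '\0')
XX		{
XX		if	(c != EOF)
XX			(*ltmp)++;
XX		return;
XX		}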
XX	fprintf(fp, "%ld\t%s\n", *ltmp, buf);
XX	*ltmp += (strlen(buf) + 1);
XX	}
XSHAR_EOF
Xchmod 644 '/usr/src/ucb/strcompact.c'
Xfi
Xexit 0
X#	End of shell archive
SHAR_EOF
fi
exit 0
#	End of shell archive
