		A Collection of Usenet path-counting tools
		Joseph T. Buck - Entropic Processing, Inc.

This is a set of tools for computing statistics on the Usenet paths
on your machine.

The program 'getpaths' gathers paths from your news spool, taking cross
postings into account (each path appears in the output once).
See the man page supplied.

There are three makefiles provided; I hope getpaths can be made to
run on anything.  Use makefile.42 for 4.2bsd or 4.3bsd; makefile.sysv
for System III or System V; use makefile.v7 for Version 7 or 4.1bsd.

If you have trouble, there are three questions you have to answer:

	1) Do you have the 4.2 directory access library?  If so, use
	   scantree.c, otherwise use oscantree.c.

	2) v7 and bsd systems have functions "index" and "rindex".
	   SysIII and SysV changed the names to strchr and strrchr,
	   for no good reason.  See the definition of CFLAGS in
	   makefile.sysv.

	3) Do you have getopt? (SysIII,V do, bsd, v7 don't).  I'm
	   including Henry Spencer's public domain version, since
	   it's short and valuable and Berkeley folks can't run the
	   program without it.

You may find the scantree functions useful in other programs.  You're
welcome to them, but leave my name in, please.

With no options, getpaths prints the paths in every article on the standard
output.  If you expire news after one week, this will result in about 300 Kbytes
of output.  To restrict yourself to a subtree, you may give an option, which
is a pathname that uses /usr/spool/news as a base.

	getpaths mod
	getpaths net/politics

Once you've used getpaths to gather some paths, you can use the shell scripts
to compute some statistics on them:

histog -
	Computes the average path length and prints a histogram.
hostcount -
	Counts hostnames that occur anywhere in a path and prints them
	in declining order of frequency.
twolink -
	Prints connected pairs of sites in declining order of frequency.
	I used twolink to prepare my most recent map.
threelink -
	Prints connected triplets of sites in declining order of frequency.

Finally, there is "closure", which will give "awk" a good workout on your
system.  It is an automated version of the procedure that I used to build
an extended backbone map.

closure takes three arguments.  The first argument is the name of a
file containing the "core" sites you wish to start with (you can
think of these as "backbone" sites, but I'm not claiming to define what
"backbone" really means).

The file must be in a somewhat peculiar format; each line must begin
with the word BACKBONE (in caps), followed by the site names,
separated by !  marks.  For example, if you want to start with hosts
alpha, beta, and gamma, the backbone file could be 

BACKBONE!alpha!beta!gamma

or

BACKBONE!alpha
BACKBONE!beta
BACKBONE!gamma

Yes, it's ugly.

The second argument is the name of the file containing a list of paths
(as output by getpaths, perhaps).  Finally, the third argument, an
integer, is a threshold.

closure searches the paths file for chains of paths that connect two
"core" sites; a chain is disregarded unless it occurs at least
"threshold" times, where "threshold" is the third argument.  Every
site in each such chain is added to the "core".  This is repeated
until no more sites can be added to the "core".  A report of each
iteration and what paths are added to the core is printed on the
standard output.

I've supplied two "initial cores" you can use; "6sites" includes
ihnp4, seismo, decvax, decwrl, tektronix, and sdcrdcf, and
"backbone-list" contains all sites on Gene Spafford's backbone map.

Try the commands

getpaths net > net.paths
closure 6sites net.paths 20

or

closure backbone-list net.paths 20

I'm interested in seeing the results for sites far from the SF Bay
Area.  You can write me at

		oliveb-\
			\
	ihnp4---pesnta--epimass!jbuck
		/
	hplabs-/
