functions over multiple processors.
Experimenters normally are anxious to minimise the experiment's
"dead-time". Some experimental groups with very high data rates may
choose to buffer data prior to transferring it to the processor
memory. Where such "hardware" solutions are inappropriate, the
parallel readout of event data, either by the use of multiple CAMAC
branch drivers, or multiple data acquisition processors, or both, can
significantly reduce experiment dead-time. The recombination of the
part-events and their logging to tape can be handled as a
non-time-critical operation, provided sufficient processor memory is
available.
.b;The desire to analyse a small fraction of the data in some depth,
while continuing to look briefly at a significant percentage of
events, also indicates a need to distribute and modularise functions.
A 32-bit machine having compatibility with "offline" analysis of the
data and a gigabyte of logical address space is an attractive, and in
some cases essential, ingredient for data analysis. A VAX is the
obvious choice for such a machine, given the existing extensive use
of PDP-11s.
.b;Why not just move to the VAX as a single-processor system for each
experiment? There are experiments with data rates and dead-time
requirements such that a single VAX could well do the job. There is,
however, much to be argued in favour of using PDP-11s for the highly
time-critical data acquisition part of an experiment. Typical total
event interrupt and readout overheads, measured for real-world PDP-11
data acquisition systems, are of the order of 300-500 microseconds.
Many people have handled highly time-critical applications within the
framework of a large multi-user operating system. However, interrupt
response on a VAX to match that of the PDP-11 systems, even if
possible, requires a high degree of care in both bypassing and
coordinating with the operating system.
This is especially true when a general purpose data acquisition
system is the goal, with widely different performance requirements
for different experiments. The PDP-11 MULTI packages handle the
acquisition of data efficiently and flexibly. Their simplicity means
that even special purpose hardware can relatively easily be
interfaced, tested and incorporated into the data acquisition system.
This has been done successfully by several experiments using
RT/MULTI, for such devices as FASTBUS (3) and non-
.b;Non-technical considerations, such as funding and manpower and the
existence of many PDP-11s, often with specialised or tailored
software, could also be cited as contributory arguments toward using
PDP-11s as data acquisition machines. A consideration not to be
overlooked is conservatism. Many physics experiments metamorphose
from a previous experiment.
.page
.lm 14.rm 68
A data acquisition system which is based on an old and trusted system
is often desirable from the human point of view.
.sk 1
.c 82;^&Multi-processor systems\&
.sk 1
The goal of our multi-processor systems is to allow the functional,
practical and performance requirements of a physics experiment to be
satisfied by configuring both a hardware and a software system of
processors and packages. It is a building-block approach, in which
some of the building blocks are themselves complete data acquisition
and monitoring packages.
.sk 1
^&Functional requirements\&
.b;Nearly all high energy physics experiments have the same broad
functional requirements of a data acquisition and monitoring system.
.ls
.le;Fast acquisition of data from the apparatus, with minimum
dead-time and adequate buffering.
.le;Logging of data to a secondary medium, normally a magnetic tape.
.le;Monitoring of the apparatus by either direct checks on the
hardware or analysis of the event data.
.le;Analysis of some fraction of the events.
.le;Display of graphic and other information.
.le;Control of the starting/stopping of data taking and tape writing.
.els
Additionally, data acquisition systems often require the ability to
play back events previously written to tape.
.sk 1
^&Practical requirements\&
.b;These vary considerably from experiment to experiment, but some of
the following may be required:
.ls
.le;A complete data acquisition and monitoring package which requires
little or no programming.
.le;A framework data acquisition and monitoring package to which
specialised software and/or hardware can easily be added.
.le;The ability to quickly and easily change or add to the system
without affecting other critical parts.
.le;A program development capability whilst the experiment is in
progress.
.le;The ability to write detailed analysis programs without the
limitations of a 32K-word logical address space. Analysis of the data
may require a large histogram space.
.le;Transportability of analysis software to other systems.
.els
.sk 1
^&Performance requirements\&
.b;Events may vary in size from less than 100 16-bit words to in
excess of 30000 16-bit words. Event rates may vary from just a few
per second to 1000 per second. A system may be required to read in
and buffer up to 100,000 16-bit words per second.
.rm 127 .lm 73
Normally the primary performance measurement of interest to the
experiment is the event dead-time. This is the time interval between
when the computer is interrupted to read in an event and when it is
again ready to read in the next event. The ability to monitor a large
fraction, or all, of the events may be required.
.c 200;^&The Links between the Processors\&
.sk 1
The communication links between processors are fundamental to the
success of a distributed processing system.
Tightly coupled processors with shared memory regions were considered
and rejected, largely because of the lack of suitable hardware, and
also because of the difficulty of extending such a system to loosely
coupled, physically separate systems. DECNET was also considered and
rejected because of its large address-space requirements and its high
time overheads.
.sk 1
^&Link Hardware\&
.b;We chose pairs of DR11-Ws to provide a high-speed parallel link.
The DR11-W is a DMA controller capable of sustaining DMA transfers at
a rate of 333 Kwords/second (non-burst mode) and 500 Kwords/second
(burst mode) (4). Pairs of DR11-Ws may be used to link any two
processors with a UNIBUS which are not more than 50 ft apart. It is
an inexpensive, Digital-supported device, also available from MDB.
The latter also provides for links between machines up to 1000 ft
apart.
.sk 1
^&Network Architecture\&
.b;Our network architecture is designed for typical
multiple-processor configurations such as that shown in Figure 1. We
adopted a layered approach with the following ground rules in mind:
.ls
.le;point-to-point connections only,
.le;bi-directional use of the link,
.le;direct transfer of data to a program, without intermediate
buffering, for maximum speed,
.le;interchangeability of processors and operating systems at each
node,
.le;capability of switching to alternate link hardware with minimal
program change.
.els
.sk 1
.c 200;Link Layer
.sk 1
The link layer executes a series of transactions with its counterpart
in the other machine. This protocol accomplishes the sending of a
single block of data (a packet) or a single byte of control
information (a signal) over the link. In addition it guarantees fair
bi-directional use of the link regardless of the varying speeds and
interrupt response times of the end processors.
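The transaction discipline just described can be modelled in a few
lines. The following is a hypothetical sketch (in Python, purely for
illustration; the real layer lives in device drivers handling DR11-W
interrupts): the frame representation and queue discipline are
assumptions, but it shows the two transaction kinds, a packet (block
of data) and a signal (single control byte), and an alternating-turn
rule that gives fair bi-directional use regardless of how fast each
end can queue work.

```python
# Toy model of the link-layer transactions: packets, signals, and
# fair alternation between the two ends of a point-to-point link.
from collections import deque

PACKET, SIGNAL = 0, 1  # assumed transaction type tags

class LinkEndpoint:
    """One end of the point-to-point link (e.g. a PDP-11 or a VAX)."""
    def __init__(self, name):
        self.name = name
        self.outbound = deque()   # transactions waiting to be sent
        self.received = []        # transactions delivered to this end

    def queue_packet(self, words):
        """Queue a single block of data."""
        self.outbound.append((PACKET, list(words)))

    def queue_signal(self, byte):
        """Queue a single byte of control information."""
        self.outbound.append((SIGNAL, byte & 0xFF))

def run_link(a, b):
    """Alternate turns between the two endpoints so that a fast
    sender cannot monopolise the link (the fairness guarantee)."""
    while a.outbound or b.outbound:
        for sender, receiver in ((a, b), (b, a)):
            if sender.outbound:
                receiver.received.append(sender.outbound.popleft())

left, right = LinkEndpoint("pdp11"), LinkEndpoint("vax")
left.queue_packet([1, 2, 3])   # a block of event data
left.queue_packet([4, 5])
right.queue_signal(0x01)       # a single control byte
run_link(left, right)
print(right.received)          # both packets arrive, in order
print(left.received)           # the signal is interleaved fairly
```

Alternation is only one way to guarantee fairness; the point is that
the rule is enforced by the link layer itself, independent of the
speeds of the end processors.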
.page
.lm 14.rm 68
.c 82;Logical Link Layer
.sk 1
The logical link layer is concerned with the routing of messages to
the appropriate program, or part of a program, within a processor. It
uses a system-wide identifier, termed a packet type code (PTC), to
determine the destination of a message. Each message has to be given
a code which describes its intended destination. Any program, or part
of a program, may declare itself the exclusive owner of messages of a
particular packet type code, and will then receive all messages with
that particular destination packet type code. This philosophy allows
messages to be sent over the link categorised by their function. They
are directed to the piece of software which is responsible for those
types of messages. Whether this is a dedicated special-purpose
program (on a VAX, say) or a part of RT/MULTI is transparent to the
sender of the message.
.sk 1
.c 82;Application Layer
.sk 1
Programs wishing to exchange messages must develop their own
higher-level protocols, which include destination identifiers for
replies to messages where necessary. It is at this layer that the use
of data acquisition and monitoring software packages enters.
.sk 1
^&Implementation of the Communication Software\&
.b;For all three operating systems (VMS, RT-11 and RSX-11M) we have
chosen to implement the link and logical link layers using device
drivers (5),(6). In addition we have provided a FORTRAN-callable
package of routines (CDPACK) to interface to the device drivers.
These provide a uniform, operating-system-independent interface to
the communications software (7). The driver and the FORTRAN package
both serve to insulate application programs using them from future
changes in the hardware.
.sk 2
.c 82;^&Multi-Processor System Architecture\&
.b;^&Hardware\&
.b;A multi-processor system consists of some combination of PDP-11
and VAX machines. Currently, one or more of the PDP-11s are data
acquisition machines.
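The packet-type-code routing in the logical link layer described
above amounts to a dispatch table keyed by PTC, with exclusive
ownership. A minimal sketch follows; the numeric PTC value and the
handler name are invented for the example (real codes are system-wide
identifiers, and the real routing is done inside the device drivers):

```python
# Illustrative model of PTC-based message routing with exclusive
# ownership, as in the logical link layer.
class LogicalLink:
    def __init__(self):
        self._owners = {}  # PTC -> handler for messages of that code

    def declare_owner(self, ptc, handler):
        """A program declares itself the *exclusive* owner of a PTC."""
        if ptc in self._owners:
            raise ValueError("PTC %d already owned" % ptc)
        self._owners[ptc] = handler

    def deliver(self, ptc, message):
        """Route an incoming message to whoever owns its PTC."""
        self._owners[ptc](message)

link = LogicalLink()
seen = []
# e.g. a tape-logging task owns PTC 7 (hypothetical value)
link.declare_owner(7, lambda msg: seen.append(("logger", msg)))
link.deliver(7, [0x1234, 0x5678])
# The sender neither knows nor cares whether the owner is a dedicated
# VAX program or a part of RT/MULTI.
```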
" Connected machines must be less than 50 ft apart and , linked by a pair of DR11-Ws. Connections to LSIs has not 6 been explicitly tried but should be possible with little @ or no change to software. Similarly all our tests to date J have been with PDP 11/34, 11/45 and upwards range of processors, T excluding 22 bit machines. RT-11 software should in principle ^ function on 11/03, 11/05, 11/10 type machines. h One machine is normally equipped with at least one tape r drive, either a standard Digital supported 800 or 1600 bpi tape | drive or a STC1900 1600/6250 bpi drive. Display of graphic information is done using Tektronix 4010 terminal (or compatible substitute), and/or Tektronix 613 on a PDP-11 processor. A Versatec printer/plotter or Printronix printer plotter device may be used on any processor. The latter may be shared between multiple processors, using a hardware switching mechanism (8). .b;Data acquisition PDP-11s are equipped with one or more Jorway 411 CAMAC branch drivers through which event readout is done. With an RT/MULTI software package an (EGG) BDO11 CAMAC interface may be used instead. .rm 127 .lm 73 Fast "event-data-present" interrupts are presented via a DR11-C general purpose device interrupt interface module.   .b;  Any adequate combination of manufacturer supported discs may be & used on the systems. Disc requirements range from one double 0 density floppy disc (onto which RT/MULTI can be fitted) to a : more normal pair of RL01 or RL02 discs on a data acquisition D PDP-11. N Bank switchable bulk memory may be used on any PDP-11 processor (9). X This may be the only memory on the UNIBUS or may co-exist with b normal PDP-11 memory. g Addition of one or more 128K word bulk l memory boards to a PDP-11 v processor allows the system to access a large amount of memory, as data buffers, in addition to the normal memory. 
.sk 1
^&Software\&
.b;All machines in a multi-processor configuration are expected to
run under one of the manufacturer-supplied operating systems: VMS
(for the VAX) and either RT-11 or RSX-11M (for the PDP-11s). The
basic building blocks available for a multi-processor system are:
.ls
.le;Communications software for all machines and operating systems.
.le;RT/MULTI data acquisition and/or analysis package.
.le;RSX bulk memory data acquisition package.
.le;RSX/MULTI analysis package.
.le;VAX/MULTI analysis package (10).
.els
Communications "hooks" have been added, and are still in the process
of being added, to the above basic data acquisition packages. The
FORTRAN-callable interface to the communications software is being
used to implement these inter-processor dialogues, following the
system design philosophy outlined below.
.sk 1
.c 200;^&Multi-processor system design\&
.sk 1
Each of the functional requirements of a system, listed above, is
considered as a logically distinct subsystem. Each subsystem which
needs to communicate with a subsystem in a connected processor must
do so in a well-defined and operating-system-independent manner. We
have defined a system-wide format for such communications between
subsystems for:
.sk 1
.ls
.le;Providing event data, on request, to an analysis subsystem. A
system-wide network identifier exists for a subsystem which can
provide such event data. Requests for events (either single events or
a buffer of several events) consist of sending a 10-word request
block to this event provider, specifying the type of events required,
the size of the buffer available to receive them, and the network
identifier of the requester.
.le;Passing a part-event, on request, to a subsystem dedicated to the
re-assembly of part-events and their logging to magnetic tape. This
is for systems where more than one PDP-11 is used to read out the
data for a single event.
.els
.lm 14.rm 68
.sk 1
Standard protocols for the following subsystems have yet to be
defined:
.ls
.le;Run and tape control software communication with the data
acquisition and tape logging subsystems, and with other run and tape
control subsystems.
.le;Error and message subsystem communication across processors.
.els
Often the subsystems are, or will be, actually implemented as
separate processes (VMS) or separate tasks (RSX-11M), which must be
able to communicate with other subsystems in the same machine. In
this case the communication software may be used for the
inter-process communication too. This is achieved by the use of an
internal communication link feature provided in both the RSX-11M and
VMS device drivers.
.b;Complete functional interchangeability of software between
processors and operating systems demands more than the data
acquisition and monitoring subsystems themselves. It also means
providing certain services, normally available within a machine, to a
program on a connected machine. In particular, remote file access and
the ability to remotely manipulate and read from CAMAC come into that
category. General-purpose communication software will in the future
form the basis for the provision of such services. In the meantime,
multi-processor configurations are also constrained by the
requirement that a monitoring program which directly accesses the
CAMAC must be implemented as a part of the data acquisition machine's
software. Also, data and histogram files produced on one processor
are not immediately and transparently available for use on a
connected machine.
.b;Special-purpose user-written software may make use of the general
communications software to transfer data of any type between
connected machines.
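As an illustration of the event-request format defined above, the
following sketch packs such a request into ten 16-bit words. Only the
three fields named in the text (event type, receiving buffer size,
requester's network identifier) come from the source; their ordering
within the block, and the use of zero filler for the remaining words,
are assumptions made for the example.

```python
# Sketch of building the 10-word event-request block sent to an
# event-provider subsystem. Field layout is assumed, not documented.
import struct

def event_request_block(event_type, buffer_words, requester_id):
    """Pack a request for events into ten 16-bit words."""
    words = [
        event_type,    # type of events required
        buffer_words,  # size of buffer available to receive them
        requester_id,  # network identifier of the requester
    ] + [0] * 7        # remaining words: reserved in this sketch
    assert len(words) == 10
    return struct.pack("<10H", *words)  # PDP-11s are little-endian

block = event_request_block(event_type=2, buffer_words=4096,
                            requester_id=5)
assert len(block) == 20  # ten 16-bit words = 20 bytes
```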
.sk 3
.c 82;^&The Building Blocks\&
.c 82;Overview, Status and Performance
.sk 1
^&General communications software\&
.sk 1
Device drivers and a standard interface package exist for all three
operating systems (VMS, RT-11 and RSX-11M). Any program in one
processor may receive or transfer data directly from/to the data
buffer of any program in a connected processor. Much test software
for the link has been written in common FORTRAN code. The link has
proven to be error-free and completely reliable. Some timing figures
for data transfers between processors (using the FORTRAN-callable
CDPACK) are given in Table 1. These depend on the particular
processor and memory speed.
.page
.rm 127 .lm 73
.c 200;^&Table 1\&
.literal

 ------------------------------------------------------------
                            RT-11/  RT-11/  RSX/   VAX/    VAX/
                            RT-11   RSX     RSX    PDP-11  PDP-11
                              *       *      *     +(1)    +(2)
 ------------------------------------------------------------
 Per-message time overhead   4-5    5-6     5-6    4.5-6   4.5-6
 (milliseconds)

 Time overhead per word of    3      3       3      5.5     3.6
 data transferred
 (microseconds)
 ------------------------------------------------------------
.end literal
* Loopback tests over several thousand transfers between 11/34 and
11/50 processors.
.sk 1
.b;+ Tentative figures (performance still under study), from several
thousand one-way transfers from a VAX to an 11/50 running RSX-11M.
.sk 1
.b;(1) DMA using the VAX direct data path.
.sk 1
.b;(2) DMA using the VAX buffered data path.
.sk 2
Using connected RSX-11M systems we have measured the CPU utilisation
when executing continuous loopback communications software test
programs. The percentage of the CPU available to other programs was
found to be between 50 and 65%. This compares favourably with the CPU
utilisation of writing continuously to a 6250 bpi magnetic tape.
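The Table 1 figures imply a simple cost model: total transfer time is
the fixed per-message overhead plus a per-word cost. A quick
calculation with the RT-11 to RT-11 figures shows why large buffers
amortise the overhead:

```python
# Transfer-time estimate from the Table 1 cost model:
# time = per-message overhead + words * per-word overhead.
def transfer_ms(n_words, overhead_ms, per_word_us):
    return overhead_ms + n_words * per_word_us / 1000.0

# RT-11 to RT-11: ~4-5 ms per-message overhead, 3 us per word
t = transfer_ms(4096, overhead_ms=5.0, per_word_us=3.0)
print("4096-word buffer: %.1f ms" % t)            # 17.3 ms
print("effective rate: %.0f Kwords/s" % (4096 / t))
# The effective rate sits well below the DR11-W's raw 333 Kwords/s
# (non-burst), and small messages fare far worse: a 10-word request
# costs almost the full 5 ms overhead for negligible payload.
```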
.sk 1
^&Data acquisition and analysis packages\&
.sk 1
RT/MULTI is a complete software package incorporating data
acquisition, analysis and run control, with an emphasis on
interactive analysis of the data. It now has two possible "network"
implementations. In one, it may be used both as a data acquisition
system and as a provider of events over the link to an analysis
subsystem. Analysis may also be carried on in parallel in the
RT/MULTI data acquisition machine. In the other, it may be used as an
analysis-only package, obtaining its data from the link. LINK
commands have been incorporated which allow control of
inter-processor communication. They also provide for control of the
supply of data buffers to the analysis subsystem of the data
acquisition machine, relative to the supply of buffers available to
over-the-link requesters. RT/MULTI sends and receives buffers of
events.
.b;The performance of the system is quite dependent on various
configuration parameters. It depends on the size of buffers, the
number of events per buffer, and above all on the extent and type of
data acquisition activity. Requests for events are treated as a part
of an analysis-type subsystem, and as such data acquisition
activities normally take precedence.
.page
.lm 14.rm 68
We have measured the performance in both the RT-RT and RT-RSX cases.
The result is an 11 ms per-buffer overhead in both cases, to which
the 3 microsecond per-word transfer time from Table 1 must be added.
Large buffers containing several events are obviously the most
efficient way to use the link, since the per-buffer overhead is
constant. The RT/MULTI link additions provide for this.
.b;Two experiments at Fermilab are currently setting up dual RT/MULTI
systems. Both are using existing and heavily tailored RT/MULTI
systems, to which they are adding an RT/MULTI analysis machine.
.b;Interprocessor communication software takes up more than 2K words
in an RT/MULTI system.
This means that existing systems may have to either re-configure the
software or move some of the analysis functions to the connected
machines. Any program in an RSX-11M system or on a VAX may also
obtain a buffer of events from an RT/MULTI data acquisition system,
using the CDPACK routines and obeying the defined "event request"
protocol. Work is still in progress to provide a user interface
routine for such event requests and to incorporate that into both the
RSX/MULTI and VAX/MULTI analysis programs.
.b;Three experiments are already planning to take data this Fall
using an RT/MULTI system connected to a VAX analysis machine.
.b;Work has been done on adding part-event logging protocols to the
RSX data acquisition system. Two RSX data acquisition systems may
take data, in parallel, for a single event and later each send the
data to a third RSX "data acquisition" system to be recombined and
logged to tape. This work is currently being tested, and at present
there is one potential user of such a configuration.
.lm 14.rm 68
.sk 3
.c 82;^&Conclusions and Future Plans\&
.sk 1
We will continue to add communication "hooks" to all subsystems.
RT/MULTI, and possibly some VAX subsystem, will have part-event
logging capabilities added. Work will proceed on integrating
system-wide run and tape control and a system-wide error and message
system.
.b;We will be closely watching the performance of these
multi-processor systems as they go into operation in experiments. Our
user community has, in the past, given us valuable feedback, both in
the form of operational experience and program contributions. With
similar input, we will adapt these multi-processor systems to user
needs and incorporate our users' additional developments.
.page
.rm 127 .lm 73
.sk 2
.c 200;^&References\&
.sk 1
.ls
.le;J.F.Bartlett et al., RT/RSX MULTI: Packages for Data Acquisition
and Analysis in High-Energy Physics, IEEE Transactions on Nuclear
Science, Vol. NS-26, No. 4, Aug 1979.
.le;J.R.Biel and R.J.Dosen, An RSX-11M High-Rate CAMAC Data
Acquisition System Using a Bank-Switchable Bulk Memory, IEEE
Transactions on Nuclear Science, Vol. NS-28, No. 5, Oct 1981.
.le;D.Harding, J.Kohlmeier, J.Filaseta and D.Ritchie, A High Speed
Data Acquisition System for Fermilab Experiments. Paper presented to
this conference.
.le;Digital Equipment Corporation, DR11-W Direct Memory Interface
Module Users Guide.
.le;M.Pyatetsky, P.Heinicke, D.Ritchie and V.White, Using the DR11-W
DMA Device for Interprocessor Communications in RT-11, 1982 Fall
DECUS U.S. Symposium, Anaheim, Ca.
.le;D.Burch and V.White, An RSX-11M Device Driver Implementing
Network Protocols on the DR11-W, 1982 Fall DECUS U.S. Symposium,
Anaheim, Ca.
.le;J.Biel et al., High Speed Interprocessor Data Links Using the
DR11-W, 1982 Fall DECUS U.S. Symposium, Anaheim, Ca.
.le;R.Knowles, Scanning Printer Multiplexer (SMUXBOX), Fermilab
Computer Department Hardware Note HN-48.
.le;Digital Pathways, Mt. View, California 94043, Bank-Switchable
Bulk Memory Users Manual.
.le;K.Eng, B.Burch and D.Ritchie, VAXMULTI System Generation,
Fermilab Computer Department Program Note PN-146.
.els
.page