Appendix B: Network Hardware (taken from the Parallel Processing HOWTO)

Computer networking is an exploding field... but you already knew that. An ever-increasing range of networking technologies and products is being developed, and most are available in forms that could be applied to make a parallel-processing cluster out of a group of machines (i.e., PCs each running Linux).

Unfortunately, no one network technology solves all problems best; in fact, the range of approach, cost, and performance is at first hard to believe. For example, using standard commercially-available hardware, the cost per machine networked ranges from less than $5 to over $4,000. The delivered bandwidth and latency each also vary over four orders of magnitude.

Before trying to learn about specific networks, it is important to recognize that these things change like the wind (see http://www.uk.linux.org/NetNews.html for Linux networking news), and it is very difficult to get accurate data about some networks.

Where I was particularly uncertain, I've placed a ?. I have spent a lot of time researching this topic, but I'm sure my summary is full of errors and has omitted many important things. If you have any corrections or additions, please send email to pplinux@ecn.purdue.edu.

Summaries like the LAN Technology Scorecard at http://web.syr.edu/~jmwobus/comfaqs/lan-technology.html give some characteristics of many different types of networks and LAN standards. However, the summary in this HOWTO centers on the network properties that are most relevant to construction of Linux clusters. The section discussing each network begins with a short list of characteristics. The following defines what these entries mean.

Linux support:

If the answer is no, the meaning is pretty clear. Other answers try to describe the basic program interface that is used to access the network. Most network hardware is interfaced via a kernel driver, typically supporting TCP/UDP communication. Some other networks use more direct (e.g., library) interfaces to reduce latency by bypassing the kernel.

Years ago, it used to be considered perfectly acceptable to access a floating point unit via an OS call, but that is now clearly ludicrous; in my opinion, it is just as awkward for each communication between processors executing a parallel program to require an OS call. The problem is that computers haven't yet integrated these communication mechanisms, so non-kernel approaches tend to have portability problems. You are going to hear a lot more about this in the near future, mostly in the form of the new Virtual Interface (VI) Architecture, http://www.viarch.org/, which is a standardized method for most network interface operations to bypass the usual OS call layers. The VI standard is backed by Compaq, Intel, and Microsoft, and is sure to have a strong impact on SAN (System Area Network) designs over the next few years.

Maximum bandwidth:

This is the number everybody cares about. I have generally used the theoretical best case numbers; your mileage will vary.

Minimum latency:

In my opinion, this is the number everybody should care about even more than bandwidth. Again, I have used the unrealistic best-case numbers, but at least these numbers do include all sources of latency, both hardware and software. In most cases, the network latency is just a few microseconds; the much larger numbers reflect layers of inefficient hardware and software interfaces.
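To see why latency can matter more than bandwidth, a rough model helps: the time to deliver one message is approximately the minimum latency plus the message size divided by the bandwidth. The following Python sketch applies that model to the best-case figures listed later in this appendix for Fast Ethernet, Myrinet, and TTL_PAPERS. The linear model and the function names are illustrative assumptions, not measurements; real networks will do worse.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mbps):
    """Estimate the one-way delivery time of a single message, in microseconds.

    Assumes a simple linear model: fixed latency plus size / bandwidth.
    """
    # 1 Mb/s moves one bit per microsecond, so bits / (Mb/s) gives microseconds.
    return latency_us + (message_bytes * 8) / bandwidth_mbps


# (minimum latency in microseconds, maximum bandwidth in Mb/s),
# copied from the best-case listings in this appendix.
NETWORKS = {
    "Fast Ethernet": (80.0, 100.0),
    "Myrinet": (9.0, 1280.0),
    "TTL_PAPERS": (3.0, 1.6),
}


def fastest_network(message_bytes):
    """Return the name of the network with the lowest estimated time."""
    return min(
        NETWORKS,
        key=lambda name: transfer_time_us(message_bytes, *NETWORKS[name]),
    )


if __name__ == "__main__":
    for size in (1, 64, 1_000_000):
        print(size, "bytes:", fastest_network(size))
```

Under this model, a 1-byte message is delivered soonest by TTL_PAPERS despite its tiny bandwidth, while a megabyte message is delivered soonest by Myrinet.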

Available as:

Simply put, this describes how you get this type of network hardware. Commodity stuff is widely available from many vendors, with price as the primary distinguishing factor. Multiple-vendor things are available from more than one competing vendor, but there are significant differences and potential interoperability problems. Single-vendor networks leave you at the mercy of that supplier (however benevolent it may be). Public domain designs mean that even if you cannot find somebody to sell you one, you or anybody else can buy parts and make one. Research prototypes are just that; they are generally neither ready for external users nor available to them.

Interface port/bus used:

How does one hook up this network? The highest-performance and most common option now is a PCI bus interface card. There are also EISA, VESA local bus (VL bus), and ISA bus cards. ISA was there first, and is still commonly used for low-performance cards. EISA is still around as the second bus in a lot of PCI machines, so there are a few cards. These days, you don't see much VL stuff (although http://www.vesa.org/ would beg to differ).

Of course, any interface that you can use without having to open your PC's case has more than a little appeal. IrDA and USB interfaces are appearing with increasing frequency. The Standard Parallel Port (SPP) used to be what your printer was plugged into, but it has seen a lot of use lately as an external extension of the ISA bus; this new functionality is enhanced by the IEEE 1284 standard, which specifies EPP and ECP improvements. There is also the old, reliable, slow RS232 serial port. I don't know of anybody connecting machines using VGA video connectors, keyboard, mouse, or game ports... so that's about it.

Network structure:

A bus is a wire, set of wires, or fiber. A hub is a little box that knows how to connect different wires/fibers plugged into it; switched hubs allow multiple connections to be actively transmitting data simultaneously.

Cost per machine connected:

Here's how to use these numbers. Suppose that, not counting the network connection, it costs $2,000 to purchase a PC for use as a node in your cluster. Adding a Fast Ethernet brings the per node cost to about $2,400; adding a Myrinet instead brings the cost to about $3,800. If you have about $20,000 to spend, that means you could have either 8 machines connected by Fast Ethernet or 5 machines connected by Myrinet. It also can be very reasonable to have multiple networks; e.g., $20,000 could buy 8 machines connected by both Fast Ethernet and TTL_PAPERS. Pick the network, or set of networks, that is most likely to yield a cluster that will run your application fastest.
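The arithmetic above is easy to repeat for other budgets and network combinations. This is a minimal Python sketch; the function name is made up for illustration, and the prices are the per-machine figures quoted in this appendix, which will certainly have changed.

```python
def nodes_affordable(budget, pc_cost, network_costs):
    """Return how many complete nodes fit in the budget.

    network_costs lists the per-machine cost of every network
    each node is attached to.
    """
    per_node = pc_cost + sum(network_costs)
    return budget // per_node


if __name__ == "__main__":
    budget, pc_cost = 20_000, 2_000
    print("Fast Ethernet only:", nodes_affordable(budget, pc_cost, [400]))
    print("Myrinet only:", nodes_affordable(budget, pc_cost, [1_800]))
    print("Fast Ethernet + TTL_PAPERS:", nodes_affordable(budget, pc_cost, [400, 100]))
```

The node count is only half of the decision; the remaining question is which configuration runs your application fastest.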

 

ARCNET

Linux support: kernel drivers

Maximum bandwidth: 2.5 Mb/s

Minimum latency: 1,000 microseconds?

Available as: multiple-vendor hardware

Interface port/bus used: ISA

Network structure: unswitched hub or bus (logical ring)

Cost per machine connected: $200

ARCNET is a local area network that is primarily intended for use in embedded real-time control systems. Like Ethernet, the network is physically organized either as taps on a bus or as one or more hubs; however, unlike Ethernet, it uses a token-based protocol that logically structures the network as a ring. Packet headers are small (3 or 4 bytes) and messages can carry as little as a single byte of data. Thus, ARCNET yields more consistent performance than Ethernet, with bounded delays, etc. Unfortunately, it is slower than Ethernet and less popular, making it more expensive. More information is available from the ARCNET Trade Association at http://www.arcnet.com/.

 

ATM

Linux support: kernel driver, AAL* library

Maximum bandwidth: 155 Mb/s (soon, 1,200 Mb/s)

Minimum latency: 120 microseconds

Available as: multiple-vendor hardware

Interface port/bus used: PCI

Network structure: switched hubs

Cost per machine connected: $3,000

Unless you've been in a coma for the past few years, you have probably heard a lot about how ATM (Asynchronous Transfer Mode) is the future... well, sort of. ATM is cheaper than HiPPI and faster than Fast Ethernet, and it can be used over the very long distances that the phone companies care about. The ATM network protocol is also designed to provide a lower-overhead software interface and to more efficiently manage small messages and real-time communications (e.g., digital audio and video). It is also one of the highest-bandwidth networks that Linux currently supports. The bad news is that ATM isn't cheap, and there are still some compatibility problems across vendors. An overview of Linux ATM development is available at http://lrcwww.epfl.ch/linux-atm/.

 

CAPERS

Linux support: AFAPI library

Maximum bandwidth: 1.2 Mb/s

Minimum latency: 3 microseconds

Available as: commodity hardware

Interface port/bus used: SPP

Network structure: cable between 2 machines

Cost per machine connected: $2

CAPERS (Cable Adapter for Parallel Execution and Rapid Synchronization) is a spin-off of the PAPERS project, http://garage.ecn.purdue.edu/~papers/, at the Purdue University School of Electrical and Computer Engineering. In essence, it defines a software protocol for using an ordinary "LapLink" SPP-to-SPP cable to implement the PAPERS library for two Linux PCs. The idea doesn't scale, but you can't beat the price. As with TTL_PAPERS, to improve system security, there is a minor kernel patch recommended, but not required: http://garage.ecn.purdue.edu/~papers/giveioperm.html.

 

Ethernet

Linux support: kernel drivers

Maximum bandwidth: 10 Mb/s

Minimum latency: 100 microseconds

Available as: commodity hardware

Interface port/bus used: PCI

Network structure: switched or unswitched hubs, or hubless bus

Cost per machine connected: $100 (hubless, $50)

For some years now, 10 Mbits/s Ethernet has been the standard network technology. Good Ethernet interface cards can be purchased for well under $50, and a fair number of PCs now have an Ethernet controller built into the motherboard. For lightly-used networks, Ethernet connections can be organized as a multi-tap bus without a hub; such configurations can serve up to 200 machines with minimal cost, but are not appropriate for parallel processing. Adding an unswitched hub does not really help performance. However, switched hubs that can provide full bandwidth to simultaneous connections cost only about $100 per port. Linux supports an amazing range of Ethernet interfaces, but it is important to keep in mind that variations in the interface hardware can yield significant performance differences. See the Hardware Compatibility HOWTO for comments on which are supported and how well they work; also see http://cesdis1.gsfc.nasa.gov/linux/drivers/.

An interesting way to improve performance is offered by the 16-machine Linux cluster work done in the Beowulf project, http://cesdis.gsfc.nasa.gov/linux/beowulf/beowulf.html, at NASA CESDIS. There, Donald Becker, who is the author of many Ethernet card drivers, has developed support for load sharing across multiple Ethernet networks that shadow each other (i.e., share the same network addresses). This load sharing is built into the standard Linux distribution, and is done invisibly below the socket operation level. Because hub cost is significant, having each machine connected to two or more hubless or unswitched hub Ethernet networks can be a very cost-effective way to improve performance. In fact, in situations where one machine is the network performance bottleneck, load sharing using shadow networks works much better than using a single switched hub network.

 

Ethernet (Fast Ethernet)

Linux support: kernel drivers

Maximum bandwidth: 100 Mb/s

Minimum latency: 80 microseconds

Available as: commodity hardware

Interface port/bus used: PCI

Network structure: switched or unswitched hubs

Cost per machine connected: $400?

Although there are really quite a few different technologies calling themselves "Fast Ethernet," this term most often refers to a hub-based 100 Mbits/s Ethernet that is somewhat compatible with older "10 BaseT" 10 Mbits/s devices and cables. As might be expected, anything called Ethernet is generally priced for a volume market, and these interfaces are generally a small fraction of the price of 155 Mbits/s ATM cards. The catch is that having a bunch of machines dividing the bandwidth of a single 100 Mbits/s "bus" (using an unswitched hub) yields performance that might not even be as good on average as using 10 Mbits/s Ethernet with a switched hub that can give each machine's connection a full 10 Mbits/s.
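The comparison between a shared 100 Mbits/s hub and a switched 10 Mbits/s hub can be made concrete with a little arithmetic. The Python sketch below is an idealization that assumes every machine is transmitting at once and that a shared medium divides evenly with no collision overhead; the function name is hypothetical.

```python
def per_machine_mbps(link_mbps, machines, switched):
    """Idealized bandwidth available to each machine when all transmit at once.

    A switched hub gives every connection the full link rate; an
    unswitched hub behaves like one bus shared by all machines.
    """
    if switched:
        return float(link_mbps)
    return link_mbps / machines


if __name__ == "__main__":
    for machines in (4, 10, 16):
        shared_fast = per_machine_mbps(100, machines, switched=False)
        switched_slow = per_machine_mbps(10, machines, switched=True)
        print(machines, "machines:", shared_fast, "vs", switched_slow, "Mbits/s")
```

With these assumptions the two setups break even at 10 busy machines; beyond that, switched 10 Mbits/s Ethernet offers each machine more than its share of an unswitched Fast Ethernet.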

Switched hubs that can provide 100 Mbits/s for each machine simultaneously are expensive, but prices are dropping every day, and these switches do yield much higher total network bandwidth than unswitched hubs. The thing that makes ATM switches so expensive is that they must switch for each (relatively short) ATM cell; some Fast Ethernet switches take advantage of the expected lower switching frequency by using techniques that may have low latency through the switch, but take multiple milliseconds to change the switch path... if your routing pattern changes frequently, avoid those switches. See http://cesdis1.gsfc.nasa.gov/linux/drivers/ for information about the various cards and drivers.

Also note that, as described for Ethernet, the Beowulf project, http://cesdis.gsfc.nasa.gov/linux/beowulf/beowulf.html, at NASA has been developing support that offers improved performance by load sharing across multiple Fast Ethernets.

 

Ethernet (Gigabit Ethernet)

Linux support: kernel drivers

Maximum bandwidth: 1,000 Mb/s

Minimum latency: 300 microseconds?

Available as: multiple-vendor hardware

Interface port/bus used: PCI

Network structure: switched hubs or FDRs

Cost per machine connected: $2,500?

I'm not sure that Gigabit Ethernet, http://www.gigabit-ethernet.org/, has a good technological reason to be called Ethernet... but the name does accurately reflect the fact that this is intended to be a cheap, mass-market, computer network technology with native support for IP. However, current pricing reflects the fact that Gb/s hardware is still a tricky thing to build.

Unlike other Ethernet technologies, Gigabit Ethernet provides for a level of flow control that should make it a more reliable network. FDRs, or Full-Duplex Repeaters, simply multiplex lines, using buffering and localized flow control to improve performance. Most switched hubs are being built as new interface modules for existing gigabit-capable switch fabrics. Switch/FDR products have been shipped or announced by at least http://www.acacianet.com/, http://www.baynetworks.com/, http://www.cabletron.com/, http://www.networks.digital.com/, http://www.extremenetworks.com/, http://www.foundrynet.com/, http://www.gigalabs.com/, http://www.packetengines.com/, http://www.plaintree.com/, http://www.prominet.com/, http://www.sun.com/, and http://www.xlnt.com/.

There is a Linux driver, http://cesdis.gsfc.nasa.gov/linux/drivers/yellowfin.html, for the Packet Engines "Yellowfin" G-NIC, http://www.packetengines.com/. Early tests under Linux achieved about 2.5x higher bandwidth than could be achieved with the best 100 Mb/s Fast Ethernet; with gigabit networks, careful tuning of PCI bus use is a critical factor. There is little doubt that driver improvements, and Linux drivers for other NICs, will follow.

 

FC (Fibre Channel)

Linux support: no

Maximum bandwidth: 1,062 Mb/s

Minimum latency: ?

Available as: multiple-vendor hardware

Interface port/bus used: PCI?

Network structure: ?

Cost per machine connected: ?

The goal of FC (Fibre Channel) is to provide high-performance block I/O (an FC frame carries a 2,048 byte data payload), particularly for sharing disks and other storage devices that can be directly connected to the FC rather than connected through a computer. Bandwidth-wise, FC is specified to be relatively fast, running anywhere between 133 and 1,062 Mbits/s. If FC becomes popular as a high-end SCSI replacement, it may quickly become a cheap technology; for now, it is not cheap and is not supported by Linux. A good collection of FC references is maintained by the Fibre Channel Association at http://www.amdahl.com/ext/CARP/FCA/FCA.html.

 

FireWire (IEEE 1394)

Linux support: no

Maximum bandwidth: 196.608 Mb/s (soon, 393.216 Mb/s)

Minimum latency: ?

Available as: multiple-vendor hardware

Interface port/bus used: PCI

Network structure: random without cycles (self-configuring)

Cost per machine connected: $600

FireWire, http://www.firewire.org/, the IEEE 1394-1995 standard, is destined to be the low-cost high-speed digital network for consumer electronics. The showcase application is connecting DV digital video camcorders to computers, but FireWire is intended to be used for applications ranging from being a SCSI replacement to interconnecting the components of your home theater. It allows up to 64K devices to be connected, using busses and bridges, in any topology that does not create a cycle, and it automatically detects the configuration when components are added or removed. Short (four-byte "quadlet") low-latency messages are supported as well as ATM-like isochronous transmission (used to keep multimedia messages synchronized). Adaptec has FireWire products that allow up to 63 devices to be connected to a single PCI interface card, and also has good general FireWire information at http://www.adaptec.com/serialio/.
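Because the only topological restriction is that the cabling must not form a cycle, a planned layout can be checked before any cables are plugged in. The following Python sketch uses a union-find structure to do that. It is purely a planning aid with hypothetical names; it says nothing about how IEEE 1394 hardware itself detects its configuration.

```python
def is_cycle_free(device_count, links):
    """Return True if the cabling plan contains no cycle.

    Devices are numbered 0 .. device_count - 1 and links is a list
    of (a, b) pairs, one per cable.
    """
    parent = list(range(device_count))

    def find(x):
        # Follow parent pointers to the set representative,
        # halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this cable closes a loop.
            return False
        parent[root_a] = root_b
    return True


if __name__ == "__main__":
    print(is_cycle_free(4, [(0, 1), (1, 2), (1, 3)]))  # a tree
    print(is_cycle_free(3, [(0, 1), (1, 2), (2, 0)]))  # a triangle
```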

Although FireWire will not be the highest bandwidth network available, the consumer-level market (which should drive prices very low) and low latency support might make this one of the best Linux PC cluster message-passing network technologies within the next year or so.

 

HiPPI and Serial HiPPI

Linux support: no

Maximum bandwidth: 1,600 Mb/s (serial is 1,200 Mb/s)

Minimum latency: ?

Available as: multiple-vendor hardware

Interface port/bus used: EISA, PCI

Network structure: switched hubs

Cost per machine connected: $3,500 (serial is $4,500)

HiPPI (High Performance Parallel Interface) was originally intended to provide very high bandwidth for transfer of huge data sets between a supercomputer and another machine (a supercomputer, frame buffer, disk array, etc.), and has become the dominant standard for supercomputers. Although it is an oxymoron, Serial HiPPI is also becoming popular, typically using a fiber optic cable instead of the 32-bit wide standard (parallel) HiPPI cables. Over the past few years, HiPPI crossbar switches have become common and prices have dropped sharply; unfortunately, serial HiPPI is still pricey, and that is what PCI bus interface cards generally support. Worse still, Linux doesn't yet support HiPPI. A good overview of HiPPI is maintained by CERN at http://www.cern.ch/HSI/hippi/; they also maintain a rather long list of HiPPI vendors at http://www.cern.ch/HSI/hippi/procintf/manufact.htm.

 

IrDA (Infrared Data Association)

Linux support: no?

Maximum bandwidth: 1.15 Mb/s and 4 Mb/s

Minimum latency: ?

Available as: multiple-vendor hardware

Interface port/bus used: IrDA

Network structure: thin air ;-)

Cost per machine connected: $0

IrDA (Infrared Data Association, http://www.irda.org/) is that little infrared device on the side of a lot of laptop PCs. It is inherently difficult to connect more than two machines using this interface, so it is unlikely to be used for clustering. Don Becker did some preliminary work with IrDA.

 

Myrinet

Linux support: library

Maximum bandwidth: 1,280 Mb/s

Minimum latency: 9 microseconds

Available as: single-vendor hardware

Interface port/bus used: PCI

Network structure: switched hubs

Cost per machine connected: $1,800

Myrinet, http://www.myri.com/, is a local area network (LAN) designed to also serve as a "system area network" (SAN), i.e., the network within a cabinet full of machines connected as a parallel system. The LAN and SAN versions use different physical media and have somewhat different characteristics; generally, the SAN version would be used within a cluster.

Myrinet is fairly conventional in structure, but has a reputation for being particularly well-implemented. The drivers for Linux are said to perform very well, although shockingly large performance variations have been reported with different PCI bus implementations for the host computers.

Currently, Myrinet is clearly the favorite network of cluster groups that are not too severely "budgetarily challenged." If your idea of a Linux PC is a high-end Pentium Pro or Pentium II with at least 256 MB RAM and a SCSI RAID, the cost of Myrinet is quite reasonable. However, using more ordinary PC configurations, you may find that your choice is between N machines linked by Myrinet or 2N linked by multiple Fast Ethernets and TTL_PAPERS. It really depends on what your budget is and what types of computations you care about most.

 

ParaStation

Linux support: HAL or socket library

Maximum bandwidth: 125 Mb/s

Minimum latency: 2 microseconds

Available as: single-vendor hardware

Interface port/bus used: PCI

Network structure: hubless mesh

Cost per machine connected: > $1,000

The ParaStation project, http://wwwipd.ira.uka.de/parastation, at the University of Karlsruhe Department of Informatics is building a PVM-compatible custom low-latency network. They first constructed a two-processor ParaPC prototype using a custom EISA card interface and PCs running BSD UNIX, and then built larger clusters using DEC Alphas. Since January 1997, ParaStation has been available for Linux. The PCI cards are being made in cooperation with a company called Hitex (see http://www.hitex.com:80/parastation/). ParaStation hardware implements both fast, reliable message transmission and simple barrier synchronization.

 

PLIP

Linux support: kernel driver

Maximum bandwidth: 1.2 Mb/s

Minimum latency: 1,000 microseconds?

Available as: commodity hardware

Interface port/bus used: SPP

Network structure: cable between 2 machines

Cost per machine connected: $2

For just the cost of a "LapLink" cable, PLIP (Parallel Line Internet Protocol) allows two Linux machines to communicate through standard parallel ports using standard socket-based software. In terms of bandwidth, latency, and scalability, this is not a very serious network technology; however, the near-zero cost and the software compatibility are useful. The driver is part of the standard Linux kernel distributions.

 

SCI

Linux support: no

Maximum bandwidth: 4,000 Mb/s

Minimum latency: 2.7 microseconds

Available as: multiple-vendor hardware

Interface port/bus used: PCI, proprietary

Network structure: ?

Cost per machine connected: > $1,000

The goal of SCI (Scalable Coherent Interconnect, ANSI/IEEE 1596-1992) is essentially to provide a high performance mechanism that can support coherent shared memory access across large numbers of machines, as well as various types of block message transfers. It is fairly safe to say that the designed bandwidth and latency of SCI are both "awesome" in comparison to most other network technologies. The catch is that SCI is not widely available as cheap production units, and there isn't any Linux support.

SCI primarily is used in various proprietary designs for logically-shared physically-distributed memory machines, such as the HP/Convex Exemplar SPP and the Sequent NUMA-Q 2000 (see http://www.sequent.com/). However, SCI is available as a PCI interface card and 4-way switches (up to 16 machines can be connected by cascading four 4-way switches) from Dolphin, http://www.dolphinics.com/, as their CluStar product line. A good set of links overviewing SCI is maintained by CERN at http://www.cern.ch/HSI/sci/sci.html.

 

SCSI

Linux support: kernel drivers

Maximum bandwidth: 5 Mb/s to over 20 Mb/s

Minimum latency: ?

Available as: multiple-vendor hardware

Interface port/bus used: PCI, EISA, ISA card

Network structure: inter-machine bus sharing SCSI devices

Cost per machine connected: ?

SCSI (Small Computer System Interface) is essentially an I/O bus that is used for disk drives, CD-ROMs, image scanners, etc. There are three separate standards (SCSI-1, SCSI-2, and SCSI-3); Fast and Ultra speeds; and data path widths of 8, 16, or 32 bits (with FireWire compatibility also mentioned in SCSI-3). It is all pretty confusing, but we all know a good SCSI is somewhat faster than EIDE and can handle more devices more efficiently.

What many people do not realize is that it is fairly simple for two computers to share a single SCSI bus. This type of configuration is very useful for sharing disk drives between machines and for implementing fail-over, in which one machine takes over database requests when the other machine fails. Currently, this is the only mechanism supported by Microsoft's PC cluster product, WolfPack. However, the inability to scale to larger systems renders shared SCSI uninteresting for parallel processing in general.

 

ServerNet

Linux support: no

Maximum bandwidth: 400 Mb/s

Minimum latency: 3 microseconds

Available as: single-vendor hardware

Interface port/bus used: PCI

Network structure: hexagonal tree/tetrahedral lattice of hubs

Cost per machine connected: ?

ServerNet is the high-performance network hardware from Tandem, http://www.tandem.com. Especially in the online transaction processing (OLTP) world, Tandem is well known as a leading producer of high-reliability systems, so it is not surprising that their network claims not just high performance, but also "high data integrity and reliability." Another interesting aspect of ServerNet is that it claims to be able to transfer data from any device directly to any device; not just between processors, but also disk drives, etc., in a one-sided style similar to that suggested by the MPI remote memory access mechanisms described in section 3.5. One last comment about ServerNet: although there is just a single vendor, that vendor is powerful enough to potentially establish ServerNet as a major standard... Tandem is owned by Compaq.

 

SHRIMP

Linux support: user-level memory mapped interface

Maximum bandwidth: 180 Mb/s

Minimum latency: 5 microseconds

Available as: research prototype

Interface port/bus used: EISA

Network structure: mesh backplane (as in Intel Paragon)

Cost per machine connected: ?

The SHRIMP project, http://www.CS.Princeton.EDU/shrimp/, at the Princeton University Computer Science Department is building a parallel computer using PCs running Linux as the processing elements. The first SHRIMP (Scalable, High-Performance, Really Inexpensive Multi-Processor) was a simple two-processor prototype using a dual-ported RAM on a custom EISA card interface. There is now a prototype that will scale to larger configurations using a custom interface card to connect to a "hub" that is essentially the same mesh routing network used in the Intel Paragon (see http://www.ssd.intel.com/paragon.html). Considerable effort has gone into developing low-overhead "virtual memory mapped communication" hardware and support software.

 

SLIP

Linux support: kernel drivers

Maximum bandwidth: 0.1 Mb/s

Minimum latency: 1,000 microseconds?

Available as: commodity hardware

Interface port/bus used: RS232C

Network structure: cable between 2 machines

Cost per machine connected: $2

Although SLIP (Serial Line Internet Protocol) is firmly planted at the low end of the performance spectrum, SLIP (or CSLIP or PPP) allows two machines to perform socket communication via ordinary RS232 serial ports. The RS232 ports can be connected using a null-modem RS232 serial cable, or they can even be connected via dial-up through a modem. In any case, latency is high and bandwidth is low, so SLIP should be used only when no other alternatives are available. It is worth noting, however, that most PCs have two RS232 ports, so it would be possible to network a group of machines simply by connecting the machines as a linear array or as a ring. There is even load sharing software called EQL.
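When machines are chained through their two serial ports, a message between non-adjacent machines must be forwarded by every machine in between, so the hop count matters. The Python sketch below compares the two layouts mentioned above. It assumes machines are numbered consecutively along the chain and that traffic on a ring takes the shorter direction; the function name is made up for illustration.

```python
def hops(src, dst, machines, ring):
    """Number of serial links a message crosses between two machines.

    Machines are numbered 0 .. machines - 1 in cabling order.
    """
    distance = abs(src - dst)
    if ring:
        # On a ring the message can travel either way around.
        return min(distance, machines - distance)
    return distance


if __name__ == "__main__":
    print("linear array:", hops(0, 7, 8, ring=False))
    print("ring:", hops(0, 7, 8, ring=True))
```

Closing the array into a ring costs only one extra cable and halves the worst-case number of hops.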

 

TTL_PAPERS

Linux support: AFAPI library

Maximum bandwidth: 1.6 Mb/s

Minimum latency: 3 microseconds

Available as: public-domain design, single-vendor hardware

Interface port/bus used: SPP

Network structure: tree of hubs

Cost per machine connected: $100

The PAPERS (Purdue's Adapter for Parallel Execution and Rapid Synchronization) project, http://garage.ecn.purdue.edu/~papers/, at the Purdue University School of Electrical and Computer Engineering is building scalable, low-latency, aggregate function communication hardware and software that allows a parallel supercomputer to be built using unmodified PCs/workstations as nodes.

There have been over a dozen different types of PAPERS hardware built that connect to PCs/workstations via the SPP (Standard Parallel Port), roughly following two development lines. The versions called "PAPERS" target higher performance, using whatever technologies are appropriate; current work uses FPGAs, and high bandwidth PCI bus interface designs are also under development. In contrast, the versions called "TTL_PAPERS" are designed to be easily reproduced outside Purdue, and are remarkably simple public domain designs that can be built using ordinary TTL logic. One such design is produced commercially, http://chelsea.ios.com:80/~hgdietz/sbm4.html.

Unlike the custom hardware designs from other universities, TTL_PAPERS clusters have been assembled at many universities from the USA to South Korea. Bandwidth is severely limited by the SPP connections, but PAPERS implements very low latency aggregate function communications; even the fastest message-oriented systems cannot provide comparable performance on those aggregate functions. Thus, PAPERS is particularly good for synchronizing the displays of a video wall (to be discussed further in the upcoming Video Wall HOWTO), scheduling accesses to a high-bandwidth network, evaluating global fitness in genetic searches, etc. Although PAPERS clusters have been built using IBM PowerPC AIX, DEC Alpha OSF/1, and HP PA-RISC HP-UX machines, Linux-based PCs are the platforms best supported.

User programs using TTL_PAPERS AFAPI directly access the SPP hardware port registers under Linux, without an OS call for each access. To do this, AFAPI first gets port permission using either iopl() or ioperm(). The problem with these calls is that both require the user program to be privileged, yielding a potential security hole. The solution is an optional kernel patch, http://garage.ecn.purdue.edu/~papers/giveioperm.html, that allows a privileged process to control port permission for any process.

 

USB (Universal Serial Bus)

Linux support: kernel driver

Maximum bandwidth: 12 Mb/s

Minimum latency: ?

Available as: commodity hardware

Interface port/bus used: USB

Network structure: bus

Cost per machine connected: $5?

USB (Universal Serial Bus, http://www.usb.org/) is a hot-pluggable, conventional-Ethernet-speed bus for up to 127 peripherals ranging from keyboards to video conferencing cameras. It isn't really clear how multiple computers get connected to each other using USB. In any case, USB ports are quickly becoming as standard on PC motherboards as RS232 and SPP, so don't be surprised if one or two USB ports are lurking on the back of the next PC you buy. Development of a Linux driver is discussed at http://peloncho.fis.ucm.es/~inaky/USB.html. In some ways, USB is almost the low-performance, zero-cost version of FireWire that you can purchase today.

 

WAPERS

Linux support: AFAPI library

Maximum bandwidth: 0.4 Mb/s

Minimum latency: 3 microseconds

Available as: public-domain design

Interface port/bus used: SPP

Network structure: wiring pattern between 2-64 machines

Cost per machine connected: $5

WAPERS (Wired-AND Adapter for Parallel Execution and Rapid Synchronization) is a spin-off of the PAPERS project, http://garage.ecn.purdue.edu/~papers/, at the Purdue University School of Electrical and Computer Engineering. If implemented properly, the SPP has four bits of open-collector output that can be wired together across machines to implement a 4-bit wide wired AND. This wired-AND is electrically touchy, and the maximum number of machines that can be connected in this way critically depends on the analog properties of the ports (maximum sink current and pull-up resistor value); typically, up to 7 or 8 machines can be networked by WAPERS. Although cost and latency are very low, so is bandwidth; WAPERS is much better as a second network for aggregate operations than as the only network in a cluster. As with TTL_PAPERS, to improve system security, there is a minor kernel patch recommended, but not required: http://garage.ecn.purdue.edu/~papers/giveioperm.html.
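The logical behavior of a wired AND is simple to model: a shared line reads high only if every machine drives it high. The Python sketch below simulates the 4-bit bus in software and builds a barrier test on top of it. It is only a model of the logic; the function names and the choice of bit 0 as the "arrived" flag are assumptions made for illustration, not the actual AFAPI protocol.

```python
from functools import reduce


def wired_and(outputs, width=4):
    """Value seen on the shared lines given each machine's output bits."""
    mask = (1 << width) - 1
    # Any machine pulling a line low forces that line low for everyone.
    return reduce(lambda bus, out: bus & out, outputs, mask) & mask


def barrier_reached(ready_flags):
    """True once every machine has raised its (hypothetical) bit 0."""
    outputs = [1 if ready else 0 for ready in ready_flags]
    return (wired_and(outputs) & 1) == 1


if __name__ == "__main__":
    print(bin(wired_and([0b1111, 0b1011, 0b1110])))
    print(barrier_reached([True, True, False]))
    print(barrier_reached([True, True, True]))
```

Every machine learns the result by reading the same lines, which is why this kind of aggregate operation can have very low latency even though the bandwidth is tiny.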