The importance of processor interconnection ability depends on the network demands of the solution.

Discuss:

To make use of a multiprocessor computer or a computer cluster, a problem has to be split into work for multiple CPUs. Inevitably the whole task will require some communication between the nodes working on the different parts of the answer. Sometimes this co-ordination is only a very small part of the process, with just a distribution of sections of the problem at the start and the collection of the answers at the end, but other types of problem require close synchronisation and large amounts of inter-processor data passing.

This is where an understanding of the bandwidth and latency between the processing nodes becomes important. Inter-processor synchronisation speed depends on network latency. Inter-processor data passing speed depends on network bandwidth. Total communication bandwidth depends on the interconnection topology and on the sophistication of the routing algorithm and hardware.

To use an analogy, car engine performance is determined by the torque and peak power output. High torque is required for towing but high power is preferred for racing. All engines have some of each and can perform each function to an extent, but you would not tow a 30 metre cruiser to the beach with a Lamborghini any more than you would expect to win a drag race in a Jeep Cherokee. We are looking here for some way to characterise multi-CPU machines and find the tasks to which they are best suited.

The amount of communication required by a cartoon film render farm, for instance, would be at the low end of the communication requirements scale, whereas a large finite element analysis or fluid dynamics problem would be at the top end, requiring low-latency interconnects and high bandwidth. What combines these factors is the ratio of communication time to processing time: delays in synchronisation and low bandwidth become less significant if there are long calculation times between bursts of communication.
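As a rough illustration of that ratio, the sketch below models the communication time of one step as a latency term plus a transfer term and divides it by the calculation time of the step. The function names, the simple latency-plus-transfer cost model and every number used are assumptions chosen for illustration, not measurements of any real machine.

```python
def comm_time(messages, latency_s, bytes_moved, bandwidth_bytes_per_s):
    """Estimated communication time for one step, in seconds.

    Assumed model: every message pays the network latency once and the
    payload then moves at the full link bandwidth.
    """
    return messages * latency_s + bytes_moved / bandwidth_bytes_per_s


def comm_to_compute_ratio(comm_s, compute_s):
    """Ratio of communication time to processing time for one step."""
    return comm_s / compute_s


# Hypothetical render-farm style step: one message, ten minutes of calculation.
render = comm_to_compute_ratio(
    comm_time(messages=1, latency_s=1e-3, bytes_moved=1e6,
              bandwidth_bytes_per_s=10e6),
    compute_s=600.0,
)

# Hypothetical fluid-dynamics style step: many messages, 50 ms of calculation.
cfd = comm_to_compute_ratio(
    comm_time(messages=100, latency_s=1e-3, bytes_moved=1e6,
              bandwidth_bytes_per_s=10e6),
    compute_s=0.05,
)

print(f"render farm ratio:    {render:.6f}")
print(f"fluid dynamics ratio: {cfd:.2f}")
```

With these made-up inputs the same network is almost invisible to the render farm but dominates the fluid dynamics step, which is the point the ratio is meant to capture.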

So we have network bandwidth and latency; how much is not enough, and how much is too much? In order to quantify this, let's fill in a couple of formulas.

Gannett numbers:

Network time factor = (sN*sT)*sPf

Network demand factor = (dQ/dB)*nPf

Where

sN = Amount of inter-processor synchronisation. This number is approximated as the number of synchronisations per solution step * number of solution steps * number of nodes in the problem.

sT = inter-PE (processing element) synchronisation time. This depends on network message latency; measure it for the network messaging technology of your choice. This may be a step function if you are dealing with a non-uniform network.

sPf = parallel synchronisation factor. This is the first fudge factor. It would be low if your problem does a small amount of synchronisation between single processing nodes, but higher if you need global synchronisations and there is no explicit method for achieving them. It must be adjusted up if the program uses one-to-many or many-to-one node communications, and would be lower if your system has hardware support for fast processor group synchronisation.

dQ = quantity of data to move between portions of the problem. How much data distribution between the processing elements is required for each stage of the problem? Estimate in bytes per problem step * number of steps.

dB = data bandwidth of the network between any two processors * number of independent links. How fast can data move between processing elements? A T3E/600, for instance, can manage about 320 MB/s and has up to 6 links per CPU (depending on torus shape). 100baseT Ethernet can manage about 80 Mbit/s on a good day. Approximate with bisection bandwidth for non-uniform network machines. To determine the bisection bandwidth, chop the network topology in half, count how many connections cross the divide, and multiply this by the capacity of each connection.

nPf = data transfer parallelism factor. This is the second fudge factor; it depends on the number of parallel paths in the interconnect network and the quantity of simultaneous transfers required. This number would be high if there are few interconnection paths or if the problem requires that all processors swap data at the same time, and it is then multiplied up by the number of processors in the problem. A machine with a large number of independent network paths, for instance, would score well (a low value) here.
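The two formulas and the definitions above can be sketched in a few lines of Python. The variable names mirror the symbols in the text; the helper bisection_bandwidth follows the chop-in-half description given for dB. All input values, including both fudge factors, are invented for illustration and would have to be measured or estimated for a real problem and machine.

```python
def bisection_bandwidth(links_across_cut, link_capacity):
    """Bandwidth across a cut that splits the network into two halves."""
    return links_across_cut * link_capacity


def network_time_factor(sN, sT, sPf):
    """Network time factor = (sN * sT) * sPf."""
    return (sN * sT) * sPf


def network_demand_factor(dQ, dB, nPf):
    """Network demand factor = (dQ / dB) * nPf."""
    return (dQ / dB) * nPf


# All values below are made-up inputs for illustration only.
syncs_per_step, steps, nodes = 2, 1000, 16
sN = syncs_per_step * steps * nodes   # amount of synchronisation
sT = 50e-6                            # assumed synchronisation time, seconds
sPf = 1.5                             # assumed first fudge factor

bytes_per_step = 4_000_000
dQ = bytes_per_step * steps           # bytes moved over the whole run
dB = bisection_bandwidth(links_across_cut=8, link_capacity=100e6)  # bytes/s
nPf = 2.0                             # assumed second fudge factor

ntf = network_time_factor(sN, sT, sPf)
ndf = network_demand_factor(dQ, dB, nPf)
print(f"NTF = {ntf:.3f}, NDF = {ndf:.3f}")
```

Keeping the fudge factors as explicit arguments makes it easy to see how sensitive the result is to them, which matters because they are the least well-defined inputs.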

Things start to get a bit more complicated when you take into account that some parallel problems allow a choice between doing more processing on fewer nodes and spreading the problem across a larger number of nodes, taking a hit in increased communication overhead. Along with this, some multiprocessor configurations have non-uniform networks, with the bandwidth and latency being dependent on the number of processors used by the solution. This "non-uniform network" can result in step functions for the data bandwidth and synchronisation factors. Other complications include the fact that many problems require multi-stage solutions that could well have different characteristics at each stage.
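One way to picture the step function is the sketch below, which looks up the synchronisation time from the number of nodes in use and feeds it into the network time factor. The tier layout (board, cabinet, external network), the latencies and the function names are all hypothetical; a real machine would need its own measured table.

```python
def sync_time(nodes, tiers):
    """Step function for sT: use the latency of the first tier that fits.

    `tiers` is a list of (max_nodes, latency_seconds) pairs sorted by
    max_nodes. The layout is hypothetical.
    """
    for max_nodes, latency_s in tiers:
        if nodes <= max_nodes:
            return latency_s
    raise ValueError("node count exceeds the largest tier")


# Hypothetical machine: 4 CPUs share a board, 16 share a cabinet,
# and anything larger crosses a slower external network.
TIERS = [(4, 2e-6), (16, 10e-6), (256, 100e-6)]


def ntf_for(nodes, syncs_per_step=2, steps=1000, sPf=1.0):
    """Network time factor for a node count, using the step function for sT."""
    sN = syncs_per_step * steps * nodes
    return (sN * sync_time(nodes, TIERS)) * sPf


# Sweep the node count to see where the factor jumps.
for n in (4, 8, 16, 32):
    print(n, ntf_for(n))
```

In this made-up configuration the factor grows smoothly inside a tier and jumps when the solution spills from one tier into the next, which is the kind of behaviour to look for when deciding how many nodes to spread a problem across.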

Notice how we ignore the amount of idle processor time wasted by waiting for messages to arrive. It is left as an exercise for the reader to improve the workload balance within an algorithm.

However, with these factors we have a start on some numbers that can be used to match problems to hardware installations.

Network time factor = NTF, Network demand factor = NDF

+------+------+---------------------------------------+
| NTF  | NDF  | Problem type                          |
+------+------+---------------------------------------+
| Low  | Low  | Cartoon rendering, chess problems     |
| Low  | High | Problems with big data sets and/or    |
|      |      | long calculation times per step.      |
| High | Low  | Problems with small data sets and/or  |
|      |      | short calculation times per step.     |
| High | High | Complex fluid dynamics, finite element|
+------+------+---------------------------------------+
| NTF  | NDF  | Hardware type                         |
+------+------+---------------------------------------+
| Low  | Low  | Loose cluster of workstations         |
| Low  | High | SP2, O2000, SMP & vector boxes, fight |
| High | Low  | it out here in the middle ground.     |
| High | High | T3E                                   |
+------+------+---------------------------------------+
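The problem-type half of the table can be turned into a lookup, as sketched below. The text does not say where "Low" ends and "High" begins, so the thresholds here are placeholders, and the function names are likewise assumptions made for illustration.

```python
# Problem types keyed by (NTF level, NDF level), taken from the table.
PROBLEM_TYPES = {
    ("Low", "Low"): "Cartoon rendering, chess problems",
    ("Low", "High"): "Big data sets and/or long calculation times per step",
    ("High", "Low"): "Small data sets and/or short calculation times per step",
    ("High", "High"): "Complex fluid dynamics, finite element",
}


def level(value, threshold):
    """Map a factor onto 'Low' or 'High' using a placeholder threshold."""
    return "High" if value >= threshold else "Low"


def classify(ntf, ndf, ntf_threshold=1.0, ndf_threshold=1.0):
    """Return the (NTF, NDF) levels and the matching problem type.

    The default thresholds are arbitrary and only serve the example.
    """
    key = (level(ntf, ntf_threshold), level(ndf, ndf_threshold))
    return key, PROBLEM_TYPES[key]


print(classify(0.1, 0.2))
print(classify(5.0, 10.0))
```

The same lookup shape would work for the hardware half of the table once thresholds for hardware have been agreed.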

What we need now, after some more rigorous analysis, is to determine the rules and numbers for NTF and NDF for various problems and hardware. This would then allow us to quantify a good system fit for a given problem and reduce the somewhat subjective nature of the clusters v. Big Iron debates.