A parallel system contains more than one processor having direct memory access to the shared memory that can form a common address space. Usually, a parallel system is of a Uniform Memory Access (UMA) architecture. In UMA architecture, the access latency (processing time) for accessing any particular location of a memory from a particular processor is the same. Moreover, the processors are also configured to be in a close proximity and are connected in an interconnection network. Conventionally, the interprocess processor communication between the processors is happening through either read or write operations across a shared memory, even though the usage of the message-passing capability is also possible (with emulation on the shared memory). Moreover, the hardware and software are tightly coupled, and usually, the processors in such network are installed to run on the same operating system. In general, the processors are homogeneous and are installed within the same container of the shared memory. A multistage switch/bus containing a regular and symmetric design is used for greater efficiency.
The following diagram represents a UMA parallel system with multiple processors connecting to multiple memory units through network connection.
A multicomputer parallel system is another type of parallel system containing multiple processors configured without having a direct accessibility to the shared memory. Moreover, a common address space may or may not be expected to be formed by the memory of the multiple processors. Hence, computers belonging to this category are not expected to contain a common clock in practice. The processors are configured in a close distance, and they are also tightly coupled in general with homogeneous software and hardware. Such computers are also connected within an interconnected network. The processors can establish a communication with either of the common address space or message passing options. This is represented in the diagram below.
A multicomputer system in a Non-Uniform Memory Access (NUMA) architecture is usually configured with a common address space. In such NUMA architecture, accessing different memory locations in a shared memory across different processors shows different latency times.
Array processor exchanges information by passing as messages. Array processors have a very small market owing to the fact that they can perform closely synchronized data processing, and the data is exchanged in a locked event for applications such as digital signal processing and image processing. Such applications can also involve large iterations on the data as well.
Compared to the UMA and array processors architecture, NUMA as well as message-passing multicomputer systems are less preferred if the shared data access and communication much accepted. The primary benefit of having parallel systems is to derive a better throughput through sharing the computational tasks between multiple processors. The tasks that can be partitioned into multiple subtasks easily and need little communication for bringing synchronization in execution are the most efficient tasks to execute on parallel systems. The subtasks can be executed as a large vector or an array through matrix computations, which are common in scientific applications. Though parallel computing was much appreciated through research and was beneficial on legacy architectures, they are observed no more efficient/economic in recent times due to following reasons:
- They need special configuration for compilers
- The market for such applications that can attain efficiency through parallel processing is very small
- The evolution of more powerful and efficient computers at lower costs made it less likely that organizations would choose parallel systems.