Evolution
Computer builders can increase the power of their machines, in both capability and speed, if they have access to smaller, faster-switching transistors or to better circuits and fabrication processes. This approach, however, quickly runs into diminishing returns. Alternatively, they can rethink the design of the machine itself: how its parts are laid out, how its functions are organized, and how they are implemented, guided by a rigorous assessment of the resulting performance, i.e., the machine architecture [Jean-Loup Baer, Computer Systems Architecture, 1980]. It soon became clear that the gains from architectural changes far exceeded those from the brute-force approach of improving the circuits. The CDC 6600 of the early 1960s, widely regarded as the first supercomputer, was the result of carefully considered architectural innovations. That machine was one of many early experiences that proved valuable in the long run.
Gaining Power
In the quest for ever faster and more capable machines, several long-term approaches to building computers emerged. One focused on centralizing the processing of instruction streams (Amdahl/IBM). Another advocated using redundant resources in parallel to speed up processing or to overlap important activities and improve responsiveness (Thornton/CDC 6600). Both approaches applied extra resources, in different ways, to maximize performance. A notably different proposal was to exploit the natural parallelism in data streams to carry a computation forward (Dennis/dataflow). It attempted to use data dependencies to drive general processing; data-driven execution should, in theory, dispense with complicated conventional control. In practice, some level of control proved unavoidable in a general-purpose processor.
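To make the data-driven idea concrete, the following toy sketch in C (purely illustrative, not a model of any historical design) evaluates (a + b) * (c - d) by letting each operation fire as soon as all of its operands have arrived, so the order of execution is dictated by the data dependencies rather than by a program counter.

    /* Toy data-driven evaluation of (a + b) * (c - d).
     * A node "fires" only when all of its input operands are present. */
    #include <stdio.h>
    #include <stdbool.h>

    typedef enum { ADD, SUB, MUL } Op;

    typedef struct {
        Op   op;
        int  operand[2];   /* operand values, once they arrive       */
        bool present[2];   /* which operands have arrived            */
        bool fired;        /* node has already executed              */
        int  dest;         /* index of consumer node (-1 = result)   */
        int  dest_port;    /* which input slot of the consumer       */
    } Node;

    /* Deliver a value ("token") to one input port of a node. */
    static void send(Node *nodes, int dest, int port, int value) {
        nodes[dest].operand[port] = value;
        nodes[dest].present[port] = true;
    }

    int main(void) {
        /* Graph: node 0 = a+b, node 1 = c-d, node 2 = node0*node1. */
        Node nodes[3] = {
            { ADD, {0,0}, {false,false}, false, 2, 0 },
            { SUB, {0,0}, {false,false}, false, 2, 1 },
            { MUL, {0,0}, {false,false}, false, -1, 0 },
        };
        int result = 0;

        /* Inject the initial data tokens: a=3, b=4, c=10, d=6. */
        send(nodes, 0, 0, 3);  send(nodes, 0, 1, 4);
        send(nodes, 1, 0, 10); send(nodes, 1, 1, 6);

        /* Repeatedly fire any node whose operands are all present. */
        bool progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < 3; i++) {
                Node *n = &nodes[i];
                if (n->fired || !n->present[0] || !n->present[1]) continue;
                int v = (n->op == ADD) ? n->operand[0] + n->operand[1]
                      : (n->op == SUB) ? n->operand[0] - n->operand[1]
                      :                  n->operand[0] * n->operand[1];
                n->fired = true;
                progress = true;
                if (n->dest >= 0) send(nodes, n->dest, n->dest_port, v);
                else              result = v;
                printf("fired node %d -> %d\n", i, v);
            }
        }
        printf("result = %d\n", result);   /* (3+4)*(10-6) = 28 */
        return 0;
    }

Nodes 0 and 1 can fire in either order, or simultaneously on a machine with enough functional units; only node 2 must wait for both of them.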
The function of the control unit of a computer is to specify how programs execute. Early hardware builders realized that centralized control simplified the machine greatly. They soon noticed, however, that explicit control of program flow forced an artificial sequencing on instructions: dependence patterns in the data and operations sometimes allow concurrent processing. That observation motivated designs that added hardware to exploit such opportunities. The extra functional units led to an early form of out-of-order execution, which required complex hardware, such as the scoreboard in the 6600, to discover independent operations that could drive the added resources. A dataflow processor, in contrast, would obtain that information cheaply from the program itself. However, centralized control and the characteristics of the workloads of that era significantly limited the returns from adding parallel resources. Amdahl demonstrated in a seminal work that the amount of exploitable parallelism in a workload is a fundamental limit on performance.
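That limit is usually stated as Amdahl's law: if a fraction p of a workload can run in parallel on N units while the remaining fraction 1 - p must run serially, the overall speedup is

    S(N) = \frac{1}{(1 - p) + \frac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}.

For example, with p = 0.9 the speedup can never exceed 10, no matter how many parallel units are added.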
Flynn proposed four basic ways to organize computer hardware, classified by how instruction streams and data streams are handled. One of them described an array of processing elements working in unison on a set (a vector, in the mathematical sense) of data items. His term for it, SIMD (single instruction, multiple data), caught on and stuck. That arrangement proved quite influential in the long run; it is today the most successful and widespread form of parallelization, with the modern GPU as an advanced example.
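As an illustration of the SIMD idea, the sketch below adds two arrays of floats; x86 SSE intrinsics are used here only as one convenient example of a SIMD instruction set (a GPU applies the same principle at a much larger scale). In the SIMD loop a single instruction, _mm_add_ps, performs four additions in unison, whereas the scalar loop performs one addition per instruction.

    /* Scalar versus SIMD elementwise addition of two float arrays. */
    #include <immintrin.h>
    #include <stdio.h>

    #define N 8   /* kept a multiple of 4 for simplicity */

    int main(void) {
        float a[N], b[N], c_scalar[N], c_simd[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 10.0f * i; }

        /* Scalar version: one addition per loop iteration. */
        for (int i = 0; i < N; i++)
            c_scalar[i] = a[i] + b[i];

        /* SIMD version: each _mm_add_ps adds four floats at once. */
        for (int i = 0; i < N; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&c_simd[i], _mm_add_ps(va, vb));
        }

        for (int i = 0; i < N; i++)
            printf("%5.1f %5.1f\n", c_scalar[i], c_simd[i]);
        return 0;
    }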
In Summary
Modern computers at every level build, in some way, on most of the experiences discussed earlier. The dataflow approach, though less attractive for a general-purpose computer, proved quite successful in specialized settings. Modern forms of SIMD are now prevalent in crucial applications, and Amdahl's insight is the basis for a fundamental rule of parallel processing. Flynn's insights were also lasting: the three major performance bottlenecks he identified (storage/memory, execution, and branch decisions) are still relevant today. The only major one to emerge since then has been parallelization.
Computer architecture is the art, science, and engineering of building computers. Computers still benefit significantly from improvements in physical devices, circuits, and fabrication processes, but they rely on architectural changes for substantial gains.