Spring 2024 - Big Picture Summary 5

Memory and I/O

It is instructive to learn about cache memory from a computer architect's (system builder's) viewpoint. Undergraduates often encounter the topic in the context of memory organization, where it is hard to see why anyone would go to such trouble in the first place. In practice, the cache was born out of a need to bridge a significant performance gap between two crucial system components: the processor and main memory. It was a clever expressway that improved traffic between memory and processor by reducing reliance on the slower memory and raising the data rate to the processor. Neither processors nor programmers were supposed to be aware of the cache.

The cache, however, introduced system performance issues of its own. A cache designed to keep up with a fast processor also generates extra housekeeping traffic to keep its operation transparent to programs. That traffic was particularly challenging for earlier single-bus systems, where it competed with legitimate system traffic. Clever ways to organize and access information in cache memory went a long way toward addressing these traffic issues. Eventually, multibus systems and, later, dedicated high-speed point-to-point connections eased most of those concerns.
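As one illustration of how cache organization shapes traffic, here is a minimal sketch of a direct-mapped cache, the simplest common organization. The class name, the 8-line size, and the 16-byte block size are assumptions chosen for the example, not parameters from any particular machine. Each address maps to exactly one line, so every miss stands for one block fetched from memory.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks tags (hypothetical sizes)."""

    def __init__(self, num_lines=8, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = address // self.block_size  # which memory block holds the byte
        index = block % self.num_lines      # the single line the block may use
        tag = block // self.num_lines       # identifies which block is resident
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # miss: fetch the block, evict the old one
        self.misses += 1
        return False


# Sequential bytes reuse each fetched block: 4 misses, 60 hits.
sequential = DirectMappedCache()
for addr in range(64):
    sequential.access(addr)

# Addresses 0 and 128 share line 0 and keep evicting each other: 8 misses.
conflicting = DirectMappedCache()
for _ in range(4):
    conflicting.access(0)
    conflicting.access(128)
```

The second pattern shows why organization matters: the same cache that serves sequential accesses well produces memory traffic on every access when two hot addresses collide on one line. Set-associative designs are one way to reduce such conflicts.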

Cache design is not easy because it involves many, often conflicting, design factors and performance considerations. There is no one-size-fits-all design. A successful cache involves careful tradeoffs that yield a net increase in performance while limiting housekeeping traffic. Done right, it can be a big part of addressing the execution concern identified by Flynn early on: it can significantly reduce execution cycles by eliminating most of those due to slow memory access. In technical terms, a well-designed cache can considerably improve real-world machine instruction characteristics and execution rates (CPI/IPC).
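The effect on CPI can be estimated with the usual textbook stall model: effective CPI equals the base CPI plus memory references per instruction times miss rate times miss penalty. The sketch below applies that model. All the numbers (base CPI of 1.0, 1.3 references per instruction, a 100-cycle penalty, a 2% miss rate) are illustrative assumptions, not measurements.

```python
def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Base CPI plus the average memory stall cycles per instruction."""
    stall_cycles = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles


# Hypothetical machine: every reference goes to slow memory (miss rate 1.0).
no_cache = effective_cpi(1.0, 1.3, 1.0, 100)

# Same machine with a cache that misses on 2% of references.
with_cache = effective_cpi(1.0, 1.3, 0.02, 100)

print(f"CPI without cache: {no_cache:.1f}")    # about 131.0
print(f"CPI with cache:    {with_cache:.1f}")  # about 3.6
print(f"IPC with cache:    {1 / with_cache:.2f}")
```

Under these assumed numbers, nearly all execution cycles without a cache are memory stalls, and the cache removes most of them. The model ignores the housekeeping traffic discussed above, so it is an upper-level estimate rather than a full account.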

Learning from those who developed solutions to these issues helps us better understand modern computers. Today the need for effective caching is more pressing than ever, with faster and increasingly bandwidth-hungry processors. Moreover, memory-intensive workloads benefit from caching because it reduces accesses to DRAM-based memory. DRAM is slow, and accessing it incurs a higher energy cost, an added concern in HPC and data center environments. Fortunately, advances in semiconductor technology have created more room on-chip and in-package for bigger, faster, more closely connected caches. The resulting caches are enormous by the standards of the early days, and they deliver vital boosts to the processor-memory subsystem.

The New I/O

So-called cloud storage is becoming an increasingly relevant I/O device for users. No longer just for offline backup, it is now part of everyday workflows, much like a traditional online I/O device. Network-based cloud storage relies on a local network component at both the provider and the end-user sides, and Ethernet became the predominant choice for that component. It is now a big part of what users perceive as the Internet, and a big part of what characterizes the user experience of cloud storage. In a sense, Ethernet powers much of cloud storage.

It is worth noting that the original Ethernet design is an example of a decentralized scheme, with all the issues that arise in such an environment. Do not let the fact that it is a networking technology distract from that point. Watch for parallels in multiprocessing.
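To make the decentralized aspect concrete, the sketch below models how classic shared-medium Ethernet stations resolve collisions without any central arbiter, using truncated binary exponential backoff. The backoff rule (a random wait of 0 to 2^min(n, 10) - 1 slot times after the n-th collision, giving up after 16 attempts) follows the classic CSMA/CD scheme. The slotted model, in which a frame occupies exactly one slot and carrier sensing is ignored, is a simplifying assumption, and the function names are hypothetical.

```python
import random

MAX_BACKOFF_EXPONENT = 10  # the backoff window stops growing after 10 collisions
MAX_ATTEMPTS = 16          # a station drops its frame after 16 failed attempts


def backoff_slots(collisions, rng):
    """Random wait, in slot times, after the given number of collisions."""
    exponent = min(collisions, MAX_BACKOFF_EXPONENT)
    return rng.randrange(2 ** exponent)  # uniform over 0 .. 2**exponent - 1


def resolve_contention(num_stations, rng):
    """Simplified slotted model: every station starts with one frame to send.

    Returns (slots_used, frames_delivered). Assumes a frame fits in one slot.
    """
    collisions = [0] * num_stations
    next_try = [0] * num_stations  # slot in which each station transmits next
    pending = set(range(num_stations))
    delivered = 0
    slot = 0
    while pending:
        senders = [s for s in pending if next_try[s] == slot]
        if len(senders) == 1:
            pending.remove(senders[0])  # a lone sender succeeds
            delivered += 1
        elif len(senders) > 1:
            for s in senders:           # collision: each station backs off alone
                collisions[s] += 1
                if collisions[s] >= MAX_ATTEMPTS:
                    pending.remove(s)   # give up and drop the frame
                else:
                    next_try[s] = slot + 1 + backoff_slots(collisions[s], rng)
        slot += 1
    return slot, delivered


rng = random.Random(2024)  # fixed seed so runs are repeatable
slots, delivered = resolve_contention(5, rng)
print(f"{delivered} frames delivered in {slots} slots")
```

Each station decides when to retry using only its own collision count and a local random number. The same pattern of uncoordinated agents contending for a shared resource reappears in multiprocessing, for example when processors contend for a shared bus or a lock.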