Are IBM z16 mainframes still relevant? Turns out - yes! (2024-03-07)

For a while now we’ve been pondering whether mainframes are still relevant. What keeps the International Business Machines business in business?
What is it that in 2024 makes mainframes attractive to companies?

This video by Dave’s Garage has all the explanations we’ve been looking for – Why Do Mainframes Still Exist? What’s Inside One? 40TB, 200+ Cores, AI, and more!

Turns out - mainframes are fast, vast, and resilient.

Here is a summary of the video (thanks in part to https://getrecall.ai/):


Introduction (00:00:00)

  • Mainframes remain relevant due to specific industry use cases and offer advantages for heavy workloads.
  • IBM’s Z16 mainframe is built on powerful custom CPUs with a large number of cores and a massive CPU cache.
  • Each Z16 CPU module contains two CPU dies with eight cores each, totaling 16 cores per module.
  • Each core has its own private 32 megabytes of L2 cache (significantly larger than typical server CPUs).
  • The Z16 architecture can pool unused L2 cache across cores, creating a virtual L3 cache of up to 256 megabytes per 8-core chip.
  • Z16 CPUs run at a constant 5.2 GHz clock speed - consistent performance without throttling.
  • Mainframes can virtualize spare L3 cache slots as a shared L4 cache pool.
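
To sanity-check the cache numbers above, here is a tiny sketch of the arithmetic. It only multiplies the figures quoted in the summary; the constant and function names are our own, and the real cache-sharing logic is certainly more involved than a plain product.

```python
# Figures quoted in the summary above; all names here are illustrative.
L2_PER_CORE_MB = 32     # private L2 cache per core
CORES_PER_CHIP = 8      # cores on one CPU die
CHIPS_PER_MODULE = 2    # CPU dies in one module


def virtual_l3_mb(cores_per_chip=CORES_PER_CHIP, l2_per_core_mb=L2_PER_CORE_MB):
    """Upper bound on the pooled (virtual) L3 of one chip, in megabytes."""
    return cores_per_chip * l2_per_core_mb


def cores_per_module(chips=CHIPS_PER_MODULE, cores_per_chip=CORES_PER_CHIP):
    """Total cores in one dual-die CPU module."""
    return chips * cores_per_chip


print(virtual_l3_mb())      # 256
print(cores_per_module())   # 16
```

The 256 megabytes is a ceiling: it assumes every core on the chip contributes its whole L2 to the pool.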

Inside the z16 (00:04:47)

  • The Z16 mainframe uses standard 19-inch racks for its equipment.
  • The air-cooled unit operates at 4.6 GHz.
  • A single compute drawer can have a maximum of four 16-core CPUs.
  • The CPUs require a lot of airflow and have heat pipes to dissipate heat.
  • Mainframes come in four configurations:
    • a 4-frame monster with 200+ cores
    • a single-frame mini monster with 64 cores
    • rack-mount versions with 18U to 42U space requirements.
  • The hardware is advanced and reliable - focus is on reliability over cost-saving measures.

Super Input Output (00:07:03)

  • Each CPU drawer can hold a maximum of 10 terabytes of memory, for a total system capacity of 40 terabytes of RAM.
  • Memory can be over-provisioned - you might install 40 terabytes but only activate 30 terabytes.
  • Each CPU has a set of three 32 GB/s PCIe Gen 4 x16 connections, each of which goes to a fan-out.
  • Each fan-out has two redundant paths to every IO device in the system at a full PCIe Gen 3 x16.
  • With 12 fan-outs per compute drawer, you can have a maximum of 12 IO drawers.
  • Each IO drawer contains 16 PCIe slots, plus two slots per drawer for PCIe switch cards.
  • In total, the system has 192 PCIe slots (12 IO drawers x 16 slots), each of which can communicate at a full 16 gigabytes per second.
  • Mainframe customers use PCIe cards for storage, high-speed data transfer, cryptography assist, compression, AI, and so on.
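
The I/O and memory totals in this section follow from a few multiplications, sketched below. The per-unit figures come from the summary; the count of four compute drawers is our inference from 40 TB total at 10 TB per drawer, and the aggregate bandwidth is a naive ceiling we computed ourselves, not a number from the video.

```python
# Per-unit figures from the summary; names are illustrative, not IBM terminology.
CPUS_PER_COMPUTE_DRAWER = 4
FANOUTS_PER_CPU = 3
SLOTS_PER_IO_DRAWER = 16
GB_PER_SEC_PER_SLOT = 16
TB_PER_COMPUTE_DRAWER = 10
COMPUTE_DRAWERS = 4  # assumption: inferred from 40 TB total / 10 TB per drawer


def io_budget():
    """Recompute the headline I/O and memory totals from the per-unit figures."""
    fanouts = CPUS_PER_COMPUTE_DRAWER * FANOUTS_PER_CPU
    io_drawers = fanouts  # summary: 12 fan-outs allow at most 12 IO drawers
    slots = io_drawers * SLOTS_PER_IO_DRAWER
    return {
        "fanouts_per_drawer": fanouts,
        "max_io_drawers": io_drawers,
        "pcie_slots": slots,
        # Naive ceiling: every slot saturated at once (our own estimate).
        "naive_peak_gb_per_sec": slots * GB_PER_SEC_PER_SLOT,
        "max_memory_tb": COMPUTE_DRAWERS * TB_PER_COMPUTE_DRAWER,
    }


for name, value in io_budget().items():
    print(f"{name}: {value}")
```

The slot count and memory total match the 192 slots and 40 terabytes quoted above.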

Accelerators (00:10:56)

  • The Z16 supports AI inferencing directly on the CPU die, enabling high-speed real-time inferencing at large scales.
  • AI inferencing on the CPU die provides higher performance than inferencing across the PCIe bus, which helps meet stringent service level agreements for workloads such as real-time fraud analysis in credit card processing.
  • The Z16 also has built-in accelerators for encryption and compression, optimizing performance for these tasks.

Why Mainframes? (00:16:49)

  • Despite the potential of distributed PC systems, mainframes excel in specific tasks.
  • Mainframes are commonly used in the financial sector for ATM transactions, account balance checks, and various e-commerce activities.
  • Mainframes provide exceptional performance and vertical integration, ensuring real-time processing for critical tasks like credit card approvals.
  • Global commerce heavily relies on mainframes for reliable and efficient operation.
  • Mainframes demonstrate meticulous attention to detail, as seen in their advanced fiber management at data centers.

Conclusions (00:19:47)

  • Mainframes offer unparalleled reliability, achieving up to seven nines of availability.
  • Mainframes can handle large workloads and process vast amounts of data efficiently.
  • Mainframes are scalable and can be easily expanded to meet growing demands.
  • Mainframes are cost-effective in the long run due to their reliability, security, and scalability.
  • The video also cites eight nines of reliability, which works out to roughly 0.3 seconds (about 315 milliseconds) of downtime per year.
  • Fiber speed allows mainframes to be clustered across large distances for workload shifting in case of failure.
  • Extensive built-in redundancies enable automatic failover without downtime.
  • Hot swapping of critical components like CPUs and memory is possible without system shutdown.
  • Mainframes use a unique channel-based I/O system for higher throughput and reliable I/O operations.
  • This design ensures smooth and efficient data flow, similar to dedicated highways with no traffic congestion.
  • Mainframes have supported virtualization since the 1960s and allow division into multiple virtual systems.
  • Each partition runs its own operating system, ensuring continuous operation even if one partition encounters an issue.
  • Partitions can contain other hypervisors, enabling a mix of virtualization, bare metal, and multiple operating systems running simultaneously.
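
The summary mentions both seven and eight nines, so it helps to see what each means in wall-clock terms. Below is a minimal sketch that converts a number of nines into expected downtime per year. The function name is ours, and we assume a 365-day year with downtime spread evenly, which is a simplification.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_seconds_per_year(nines: int) -> float:
    """Expected yearly downtime for an availability of `nines` nines.

    For example, 7 nines means 99.99999% availability, so the system is
    unavailable for a fraction 10**-7 of the year.
    """
    unavailability = 10 ** -nines
    return SECONDS_PER_YEAR * unavailability


for n in (5, 7, 8):
    print(f"{n} nines: {downtime_seconds_per_year(n):.3f} s/year")
```

Seven nines comes out at about 3.15 seconds of downtime per year and eight nines at about 0.32 seconds, so only the eight-nines figure lands in millisecond territory.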