As 2025 approaches, mainframe watchers are eagerly awaiting the release of IBM's new Telum II processor, the next evolution of its innovative Telum chip. Telum II is expected to deliver significant performance improvements to the mainframe, particularly for AI workloads, including generative AI and large language models (LLMs).
The original Telum chip, launched three years ago, introduced onboard AI inferencing to the z16 mainframe, allowing it to process AI tasks locally, which is crucial for applications that need real-time decision-making. Telum's on-chip AI accelerator, shared across the processor's eight cores (the brains of the processor), was applauded for its ability to perform AI tasks at the speed of a transaction. This makes it possible, for example, to perform real-time fraud detection on financial transactions, such as credit card payment swipes, while they are still being processed.
The next-generation Telum II is just as exciting, with an even more powerful processor, improved on-chip AI acceleration and an integrated data processing unit (DPU) to improve I/O throughput.
The main processor contains four interconnected core clusters, each with eight high-performance cores running at 5.5GHz. Complementing Telum II, IBM will launch the optional Spyre Accelerator, a separate AI accelerator designed to enable inferencing at a greater scale and to support more complex AI processes, including the use of LLMs.
Here are five things that we can look forward to when Telum II is available next year.
1. Enhanced AI acceleration
Telum II's on-chip AI accelerator delivers a 4x increase in compute capacity over its predecessor, capable of up to 24 trillion operations per second (TOPS). This enables high-throughput, low-latency in-transaction AI inferencing, supporting applications such as real-time analytics and decision-making directly on the processor.
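To make the "in-transaction" pattern concrete, here is a minimal Python sketch of scoring a single payment inline, before it commits. The model, weights, feature layout and threshold are all invented for illustration; on a real system the model would run through an AI framework that can target the on-chip accelerator, and nothing below is Telum-specific API.

```python
# A minimal sketch of in-transaction fraud scoring. A tiny logistic model
# stands in for the real fraud model (weights and features are invented),
# and latency is measured to show the inline, per-transaction pattern.
import time
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=32)            # hypothetical trained weights

def score_transaction(features: np.ndarray) -> float:
    """Return a fraud probability for one transaction, inline."""
    return float(1.0 / (1.0 + np.exp(-features @ weights)))

features = rng.normal(size=32)           # 32 illustrative features
start = time.perf_counter()
risk = score_transaction(features)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"fraud risk {risk:.3f} scored in {elapsed_ms:.3f} ms")
if risk > 0.9:                           # threshold for illustration only
    print("hold the transaction for review before it commits")
```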
Each AI accelerator within a processor drawer can accept work from any core. This improves load balancing across all eight AI accelerators, enhancing the system's ability to 'share the effort' in order to manage demanding AI workloads.
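The "any core can feed any accelerator" idea is, in essence, a shared work queue. The following is an illustrative software analogy, not IBM's firmware: requests from any producer go into one queue, and whichever of eight workers is free takes the next job, which is what evens out the load.

```python
# Software analogy for accelerator load balancing: one shared queue,
# eight consumers, work taken by whichever consumer is free.
import queue
import threading

jobs = queue.Queue()

def accelerator() -> None:
    while True:
        job = jobs.get()
        if job is None:              # sentinel: shut down this worker
            jobs.task_done()
            return
        # ... run the inference for `job` on this accelerator ...
        jobs.task_done()

workers = [threading.Thread(target=accelerator) for _ in range(8)]
for w in workers:
    w.start()
for job_id in range(100):            # any core can enqueue work
    jobs.put(job_id)
for _ in workers:                    # one sentinel per worker
    jobs.put(None)
jobs.join()
print("all inference jobs drained")
```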
2. New Data Processing Unit (DPU) for I/O Acceleration
The new on-board DPU supports I/O acceleration directly on the processor chip to streamline the processing of large volumes of data. One benefit is that enterprises can maintain the same I/O configuration in a smaller footprint, reducing data center floor space as they upgrade and modernize their infrastructure.
In fact, by introducing the new integrated DPU, IBM is doubling I/O capacity while reducing physical footprint and power usage "to make the mainframe platform future-ready to support emerging AI-driven applications and massive data transfers."
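As a rough software analogy for what a DPU does (hedged: the real hardware is far more sophisticated), the sketch below delegates I/O to a dedicated handler so the compute path never stalls waiting for it. Names and sizes are illustrative.

```python
# Analogy for DPU-style offload: a dedicated I/O handler runs in the
# background while "compute" keeps making progress.
from concurrent.futures import ThreadPoolExecutor
import time

io_unit = ThreadPoolExecutor(max_workers=1)   # stand-in for a dedicated DPU

def slow_io(label: str) -> str:
    time.sleep(0.1)                           # simulated device latency
    return f"{label}: done"

futures = [io_unit.submit(slow_io, f"write-{i}") for i in range(4)]
for step in range(3):                         # compute proceeds in parallel
    print(f"compute step {step} while I/O is in flight")
print([f.result() for f in futures])
io_unit.shutdown()
```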
3. A 40% cache boost
Telum II will include a 40% increase in on-chip cache capacity, with the virtual L3 and virtual L4 caches growing to 360MB and 2.88GB, respectively. A larger cache keeps more data and instructions closer to the cores, enabling faster execution of tasks and better performance.
Applications that involve AI or require large-scale data processing need rapid access to massive amounts of data. So, by increasing available cache, IBM is ensuring more of this data is readily available to the processor for smoother and faster execution of complex tasks.
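As a quick sanity check, the new figures line up with the roughly 40% claim when compared against the original Telum's widely reported cache sizes, which are an assumption here (256MB virtual L3, 2GB virtual L4):

```python
# Back-of-the-envelope check of the stated cache growth, assuming the
# original Telum's reported 256MB virtual L3 and 2GB virtual L4.
telum_l3_mb, telum2_l3_mb = 256, 360
telum_l4_gb, telum2_l4_gb = 2.0, 2.88

print(f"virtual L3 growth: {telum2_l3_mb / telum_l3_mb - 1:.0%}")  # ~41%
print(f"virtual L4 growth: {telum2_l4_gb / telum_l4_gb - 1:.0%}")  # ~44%
```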
4. Spyre Accelerator
The Spyre Accelerator is an optional add-on that provides additional compute capability for AI tasks alongside the Telum II processor. It is described as "the first system-on-a-chip that will allow future IBM Z systems to perform AI inferencing at an even greater scale".
By recruiting additional computational power through a separate AI accelerator like Spyre, mainframes can achieve greater performance, faster response times and better overall efficiency, and can take on even more complex AI tasks.
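One plausible way to picture the split is size-based routing: latency-critical models stay on the on-chip accelerator while large LLM work goes to an attached accelerator such as Spyre. The sketch below is a hypothetical illustration; the helper functions and parameter cutoff are invented, not IBM APIs.

```python
# Hypothetical routing sketch: small models stay on the local (on-chip)
# path, large LLMs are offloaded to an attached accelerator.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    parameters: int

LOCAL_PARAM_BUDGET = 50_000_000   # illustrative cutoff, not an IBM figure

def run_local(model: Model, prompt: str) -> str:
    return f"[on-chip accelerator] {model.name} scored: {prompt!r}"

def run_offload(model: Model, prompt: str) -> str:
    return f"[attached accelerator] {model.name} handled: {prompt!r}"

def route(model: Model, prompt: str) -> str:
    """Route by size: small models stay local, large ones are offloaded."""
    if model.parameters <= LOCAL_PARAM_BUDGET:
        return run_local(model, prompt)
    return run_offload(model, prompt)

print(route(Model("fraud-scorer", 2_000_000), "txn #4711"))
print(route(Model("claims-llm", 7_000_000_000), "summarize claim"))
```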
5. 'Ensemble AI' for greater accuracy
When combined, Telum II and Spyre create a scalable architecture that supports what IBM calls 'ensemble AI'. Ensemble AI combines the capabilities of multiple AI models, including LLMs, to provide more accurate and robust results than any individual model could deliver on its own.
In fraud detection within insurance claims applications, for example, ensemble AI could combine traditional neural networks, which provide an initial risk assessment, with LLMs that enhance performance and accuracy. IBM has said in its announcement materials that ensemble techniques like this can also be used for advanced detection of suspicious activity in finance, supporting compliance with regulatory requirements and reducing the risk of financial crime.
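A minimal sketch of that ensemble pattern might look like the following, where a fast classifier screens every claim and only borderline scores are escalated to an LLM. Both models are illustrative stubs with invented heuristics, not IBM software.

```python
# Ensemble sketch: a fast traditional model screens every claim; only
# borderline cases are escalated to a (hypothetical) LLM reviewer.
def neural_net_risk(claim: dict) -> float:
    """Stand-in for a small classifier giving an initial risk score."""
    return min(1.0, claim["amount"] / 10_000)        # toy heuristic

def llm_review(claim: dict) -> float:
    """Stand-in for an LLM that re-reads the claim narrative."""
    return 0.8 if "urgent" in claim["notes"].lower() else 0.2

def ensemble_score(claim: dict) -> float:
    fast = neural_net_risk(claim)
    if 0.3 <= fast <= 0.7:                           # borderline: ask the LLM
        return 0.5 * fast + 0.5 * llm_review(claim)
    return fast                                      # clear-cut: fast path only

claim = {"amount": 4_500, "notes": "URGENT wire transfer requested"}
print(f"ensemble risk: {ensemble_score(claim):.2f}")
```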
Looking to the future
IBM has said that Telum II will be the central processor powering its next-generation IBM Z and IBM LinuxONE platforms, unlocking a host of new possibilities for enterprise clients. Both Telum II and the Spyre Accelerator are expected to be available to IBM Z and LinuxONE clients in 2025, bringing significant advancements to the mainframe ecosystem.
This blog was first published on the IBM Community.