The next generation of IBM mainframes is expected to feature the new Telum II processor and the complementary Spyre Accelerator chip, both of which aim to improve AI performance and handle data-intensive workloads on IBM Z more efficiently.
The Telum II delivers increased frequency and memory capacity, plus a 40 per cent expansion in cache over the original Telum processor, which was introduced with the z16 mainframe in 2022. Telum II adds an updated integrated AI accelerator core and an on-chip Data Processing Unit (DPU) that accelerates complex I/O protocols for networking and storage, simplifying system operations and improving performance.
The Spyre Accelerator, available as an add-on processor, provides additional compute capability specifically for AI inferencing. Together, the enhancements in Telum II and Spyre provide the capability to accelerate both traditional AI models and large language models (LLMs) on the mainframe.
Why AI inferencing is needed on mainframes
But why is AI needed on the mainframe, and what are some of the use cases for it on the platform?
Mainframes continue to host mission-critical applications and high-volume transaction processing systems for some of the world's biggest enterprises, including major banks, insurance and financial services brands, airlines, manufacturers, and healthcare organizations. The performance, reliability, and security advantages of IBM Z mainframes ensure that they remain the primary platform for these core applications.
With vast amounts of mission-critical data in their systems, there are a number of key reasons why mainframe customers would prefer to keep AI inferencing on the platform rather than outsource it to a separate system. The Telum II and Spyre Accelerator are designed to help address these:
- Data Locality: Keeping AI processing on the mainframe means the data stays in place rather than being transferred to external systems such as cloud services, avoiding network bottlenecks and the latency of moving data back and forth (see the back-of-envelope sketch after this list). The Telum II increases on-chip cache capacity by 40 per cent, bringing the virtual L3 and L4 caches to 360MB and 2.88GB respectively, which directly supports better data locality.
- Real-time Processing: Mainframes handle massive data volumes at high transaction rates with exceptional reliability. Integrating AI directly on the mainframe allows real-time analysis and decision-making without the latency and security risks of moving data off the platform to external AI processing units. Telum II allows each AI accelerator to accept work from any core in the same drawer for improved load balancing, delivering 192 TOPS (trillions of operations per second) across a fully configured drawer.
- Data Privacy: Many mainframe customers, especially in finance and healthcare, handle sensitive data on the platform. Processing AI tasks on Z helps maintain compliance with data protection regulations because the data never leaves the highly secure mainframe environment. Telum II reinforces this with end-to-end encryption, confidential computing capabilities, and quantum-safe cryptography, the last of which protects sensitive data against both current cyber threats and future risks posed by advances in quantum computing.
- Resource Utilization and Cost Efficiency: By using existing mainframe infrastructure for AI tasks, businesses avoid the cost of standing up separate AI processing systems. The original Telum design already optimized power and energy consumption, and Telum II improves on it further, with up to a 15 per cent reduction in core power thanks to its centralized I/O and DPU design. This contributes to lowering the energy footprint of data centers, aligning with global efforts towards energy efficiency in enterprise computing.
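To make the data-locality argument concrete, here is a back-of-envelope sketch in Python. The latency figures and transaction rate are illustrative assumptions, not IBM measurements; the point is simply how quickly network round trips accumulate at mainframe transaction volumes.

```python
# Back-of-envelope comparison of in-place vs. off-platform inference.
# All figures below are illustrative assumptions, not measured IBM numbers.

LOCAL_INFERENCE_MS = 1.0      # assumed cost of scoring next to the data
NETWORK_ROUND_TRIP_MS = 10.0  # assumed extra cost of shipping data out and back
TXNS_PER_SECOND = 20_000      # mid-range of the volumes cited later in this article

local_ms = LOCAL_INFERENCE_MS
remote_ms = LOCAL_INFERENCE_MS + NETWORK_ROUND_TRIP_MS
print(f"Per-transaction latency: {local_ms:.1f} ms local vs {remote_ms:.1f} ms remote")

# Each second of traffic accumulates this much pure network time when
# inferencing happens off-platform; hiding it requires massive parallelism,
# and the sensitive data still leaves the secure environment.
added_seconds = TXNS_PER_SECOND * NETWORK_ROUND_TRIP_MS / 1000
print(f"Added round-trip time per second of traffic: {added_seconds:.0f} s")
```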
AI inferencing in high-volume transaction processing
Among the various use cases for AI on the mainframe, one of the most prominent is financial fraud detection during high-volume transaction processing. For example, the original Telum processor can run machine learning models to check credit transactions in real time, while the payment is being processed. This capability operates at scale, handling between 10,000 and 50,000 transactions per second with low latency.
AI-driven fraud detection solutions are designed to save clients millions of dollars a year. The new Telum II processor enhances these capabilities with its increased compute power and system-level improvements, making it even more effective at performing AI inferencing tasks directly on the mainframe.
In terms of speed and efficiency, it makes perfect sense to perform in-transaction AI inferencing, such as real-time fraud detection, directly on the mainframe rather than moving the data across a network, or via the cloud, to a separate GPU server. The Telum processor, with its integrated AI acceleration, is designed specifically for such tasks, enabling real-time fraud detection at scale with minimal latency.
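The shape of that in-transaction path is easy to sketch. The Python below is a minimal illustration, assuming a hypothetical model handle whose predict() call is dispatched to the local AI accelerator; in practice the integration would go through IBM's z/OS AI tooling, and every name and threshold here is an assumption.

```python
# Minimal sketch of in-transaction fraud scoring on the platform where the
# payment executes. The model handle and its predict() method are hypothetical
# stand-ins for an accelerator-backed scoring call, not a real IBM API.

from dataclasses import dataclass

@dataclass
class Transaction:
    account: str
    amount: float
    merchant_category: int
    country_code: int

def to_features(txn: Transaction) -> list[float]:
    """Flatten the transaction into the numeric features the model expects."""
    return [txn.amount, float(txn.merchant_category), float(txn.country_code)]

FRAUD_THRESHOLD = 0.8  # assumed decision threshold, tuned per deployment

def authorize(txn: Transaction, model) -> bool:
    """Score the payment while it is being processed; block high-risk ones."""
    score = model.predict(to_features(txn))  # runs beside the data, in-line
    return score < FRAUD_THRESHOLD
```

Because the call sits in the authorization path itself, its latency budget is a few milliseconds at most, which is exactly the budget that local, accelerator-backed inference is meant to meet.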
Ensemble AI use cases
Working together, the Telum II processor and Spyre Accelerators also open the possibility of new use cases involving advanced ensemble AI techniques. Ensemble AI combines the power of multiple AI models, including traditional machine learning models and large language models (LLMs), to enhance the performance and accuracy of AI predictions compared to relying on a single model.
For instance, in insurance claims fraud detection, ensemble AI techniques can combine the strengths of multiple models to improve accuracy and performance. Traditional machine learning models based on statistical methods can provide an initial risk assessment of a claim, while LLMs can then enhance the analysis by processing unstructured data, such as claim descriptions or supporting documents. Hybrid approaches like this leverage the complementary capabilities of different AI models to detect fraudulent claims more effectively.
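A sketch of that two-stage pattern might look like the following, where the model handles (tabular_model and llm), the prompt, and the blending weights are all illustrative assumptions rather than any particular product's API.

```python
# Sketch of ensemble scoring for an insurance claim: a traditional classifier
# scores the structured fields, an LLM scores the free-text narrative, and the
# two signals are blended. All handles and weights are illustrative.

def ensemble_claim_score(claim: dict, tabular_model, llm) -> float:
    # Stage 1: traditional ML over structured fields (amounts, history, ...),
    # assumed to return a fraud probability in [0, 1].
    structured_risk = tabular_model.predict_proba(claim["features"])

    # Stage 2: the LLM assesses the unstructured claim description.
    prompt = (
        "Rate from 0 to 1 how consistent this claim description is with "
        f"the claimed loss, answering with a number only:\n{claim['description']}"
    )
    text_risk = float(llm.generate(prompt))  # hypothetical LLM client

    # Stage 3: weighted blend; in practice the weights (or a meta-model)
    # would be tuned on labelled historical claims.
    return 0.6 * structured_risk + 0.4 * text_risk
```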
Ensemble AI can also enable use cases such as credit scoring and advanced detection of suspicious financial activity across data types including text and images, supporting compliance with regulatory requirements and reducing the risk of financial crime.
For example, by deploying LLMs directly on the mainframe, financial institutions could implement AI-powered chatbots that can access customer information, transaction history, and financial products from the mainframe in real time. This would enable the chatbot to provide personalized financial advice, process loan applications, or detect potential fraudulent activities, all while maintaining the stringent security and compliance requirements of the financial sector. Such capabilities leverage the mainframe's data locality and real-time processing advantages, ensuring sensitive financial data remains within the highly secure mainframe environment.
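As a sketch of how such a chatbot could keep everything on-platform, the handler below gathers customer context from the co-located systems of record and passes it to a locally served LLM. The db lookups and the llm client are hypothetical stand-ins, not real IBM APIs.

```python
# Sketch of one on-platform chatbot turn: live context is read from the
# mainframe's systems of record and a locally served LLM generates the reply,
# so neither the question nor the customer data leaves the platform.

def answer_customer(question: str, customer_id: str, db, llm) -> str:
    # Pull live context directly from the co-located transactional store.
    account = db.get_account_summary(customer_id)     # hypothetical lookup
    recent = db.get_recent_transactions(customer_id)  # hypothetical lookup

    prompt = (
        "You are a banking assistant. Using only the context below, "
        "answer the customer's question.\n"
        f"Account summary: {account}\n"
        f"Recent transactions: {recent}\n"
        f"Question: {question}"
    )
    return llm.generate(prompt)  # local inference; data stays on-platform
```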
Moreover, while AI inferencing has been a key focus, the Spyre Accelerator would also be able to support advanced generative AI workloads, such as fine-tuning LLMs on-premises, enabling organizations to customize models for specific business needs while maintaining data privacy and security. Again, this capability is particularly valuable for industries like finance and healthcare, where sensitive data cannot be moved to external systems.
Generative AI for code optimization
Christian Jacobi, IBM Fellow and CTO of IBM Systems Development, makes a case for other generative AI use cases on the mainframe, emphasizing its potential beyond general-purpose AI chatbots. He believes that mainframe shops will likely want to use generative AI for tasks such as code assistance and general system administration.
With mainframe customers running massive software applications extending to hundreds of millions of lines of code, he suggests there is a need to apply generative AI directly on the mainframe to help with tasks such as code transformation, code explanation, and optimization. These capabilities are particularly valuable in addressing the complexity of legacy systems and the retirement of seasoned developers, which leaves gaps in expertise and documentation. By leveraging generative AI, organizations can streamline code analysis, enhance maintainability, and keep their applications robust and adaptable to evolving business needs.
The codebase for these often mission-critical applications is massive and extremely sensitive, representing how to run a bank or an insurance company, for example. So, it's unsurprising that customers want to keep it on the mainframe when they run AI models to provide code assistance. They would prefer not to have the codebase flow off to another system.
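A minimal sketch of that kind of on-platform code assistance, assuming a hypothetical llm client served from the mainframe (for example, from a Spyre-backed cluster as described below), might look like this; the COBOL fragment and the client API are illustrative only.

```python
# Sketch of on-platform code explanation: a COBOL fragment is sent to an LLM
# hosted on the mainframe rather than to an external service, so the sensitive
# codebase never leaves the platform. The llm client is a hypothetical stand-in.

COBOL_FRAGMENT = """
    COMPUTE WS-INTEREST = WS-BALANCE * WS-RATE / 1200
    ADD WS-INTEREST TO WS-BALANCE
""".strip()

def explain_code(fragment: str, llm) -> str:
    prompt = (
        "Explain in plain English what this COBOL paragraph does, for a "
        "developer unfamiliar with the codebase:\n" + fragment
    )
    return llm.generate(prompt)  # local inference; the source stays on-platform
```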
As an example, Jacobi notes that up to eight Spyre Accelerator chips can support the Telum II processor, creating a generative AI cluster with the compute capacity and memory bandwidth to deliver a good user experience when running GenAI workloads like watsonx Code Assistant for Z.
Conclusion
IBM's Telum II processor and Spyre Accelerator bring advanced AI capabilities directly to the mainframe, addressing critical enterprise priorities such as data locality, real-time processing, and security. The new processor technology unlocks transformative use cases, from fraud detection and ensemble AI in financial services to running large language models for operational insights and generative AI for code optimization.
By combining cutting-edge AI capabilities with the mainframe's reliability, performance and security, these innovations position the platform as a cornerstone for modern, AI-driven enterprise operations.