Intel Contributes AI Acceleration to PyTorch 2.0

March 17, 2023 | Intel


PyTorch 2.0 includes contributions from Intel, built on Intel® Extension for PyTorch, the oneAPI Deep Neural Network Library (oneDNN) and additional support for Intel® CPUs, that enable developers to optimize inference and training performance for artificial intelligence (AI).

As part of the PyTorch 2.0 compilation stack, the TorchInductor CPU backend, optimized using Intel Extension for PyTorch and PyTorch ATen CPU, achieved up to 1.7 times faster FP32 inference performance than PyTorch eager mode when benchmarked with the TorchBench, HuggingFace and timm model suites.[1] Graph compilation therefore delivers notable performance improvements over eager execution.

Other optimizations include:

Optimized message passing between adjacent nodes in graph neural networks (GNNs), improving inference and training performance for PyTorch Geometric (PyG) on Intel CPUs.
New x86 quantization backend – a combination of the FBGEMM (Facebook General Matrix Multiplication) and oneDNN backends – replaces FBGEMM as the default quantization backend for x86 CPU platforms to enable better end-to-end int8 inference performance.
Extended use of oneDNN through the oneDNN Graph API, which maximizes efficient code generation on AI hardware by automatically identifying the graph partitions to accelerate through fusion. Only inference workloads can be optimized, with the BFloat16 (BF16) and Float32 data types supported; BF16 is optimized only on machines with AVX512_BF16 ISA support.
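The two user-facing features at the end of the list above can be sketched as follows. The first is post-training static int8 quantization through FX graph mode with the `x86` backend. The second is FP32 TorchScript inference with oneDNN Graph fusion switched on via `torch.jit.enable_onednn_fusion`. This is a minimal sketch that assumes PyTorch 2.0 or later on an x86 CPU. The model, the random calibration data and the shapes are hypothetical placeholders, not a recommended workflow.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx


def make_model() -> nn.Module:
    # Hypothetical toy float model used for both examples.
    return nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()


example_inputs = (torch.randn(8, 64),)

# --- 1. Post-training static int8 quantization with the x86 backend ---
# Select the x86 quantized engine explicitly (the default on x86 since 2.0).
torch.backends.quantized.engine = "x86"
qconfig_mapping = get_default_qconfig_mapping("x86")

# Insert observers, calibrate them on a few batches, then convert to int8.
prepared = prepare_fx(make_model(), qconfig_mapping, example_inputs)
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(8, 64))  # random data stands in for real inputs
quantized_model = convert_fx(prepared)

with torch.no_grad():
    int8_out = quantized_model(*example_inputs)

# --- 2. oneDNN Graph fusion for FP32 TorchScript inference ---
torch.jit.enable_onednn_fusion(True)
float_model = make_model()
with torch.no_grad():
    traced = torch.jit.trace(float_model, example_inputs)
    frozen = torch.jit.freeze(traced)
    # Warm-up runs let the TorchScript profiling executor record input
    # shapes and build the fused oneDNN Graph partitions.
    for _ in range(2):
        frozen(*example_inputs)
    fused_out = frozen(*example_inputs)
    ref_out = float_model(*example_inputs)

print(int8_out.shape, fused_out.shape)
```

In practice, calibration should use representative real inputs. Any speedup from oneDNN Graph fusion appears only after the warm-up runs, once the fused partitions have been built.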


Suggested Items

Intel Gaudi, Xeon and AI PC Accelerate Meta Llama 3 GenAI Workloads

04/22/2024 | Intel Corporation
Meta launched Meta Llama 3, its next-generation large language model (LLM). Effective on launch day, Intel has validated its AI product portfolio for the first Llama 3 8B and 70B models across Intel® Gaudi® accelerators, Intel® Xeon® processors, Intel® Core™ Ultra processors and Intel® Arc™ graphics.

Cadence Unveils Palladium Z3 and Protium X3 Systems

04/18/2024 | Cadence Design Systems
The Palladium Z3 and Protium X3 systems offer increased capacity, and scale from job sizes of 16 million gates up to 48 billion gates, so the largest SoCs can be tested as a whole rather than just partial models, ensuring proper functionality and performance.

IDTechEx Explores the Role of 3D Cu-Cu Hybrid Bonding in Powering Future HPC and AI Products

04/18/2024 | PRNewswire
Semiconductor packaging has evolved from traditional 1D PCB levels to cutting-edge 3D hybrid bonding at the wafer level, achieving interconnecting pitches as small as single micrometers and over 1000 GB/s bandwidth. Key parameters, including Power, Performance, Area, and Cost, are crucial considerations

Northrop Grumman Honors Calumet Electronics with Supplier Excellence Award

04/17/2024 | Calumet Electronics
Northrop Grumman Corporation has recognized Calumet Electronics during the company’s 2024 Supplier Excellence Awards for “exceptional performance and unwavering commitment to delivering with excellence.” Calumet is one of 70 suppliers recognized from across the globe. In its award category of “Supplier Strategic Excellence,” Calumet was honored alongside global corporations such as Amazon Web Services, Dell Technologies, and Eaton Corporation.

Micron’s Full Suite of Automotive-Grade Solutions Qualified for Qualcomm Automotive Platforms to Power AI in Vehicles

04/17/2024 | Micron
Micron Technology, Inc. announced that it has qualified a full suite of its automotive-grade memory and storage solutions for Qualcomm Technologies Inc.’s Snapdragon® Digital Chassis™, a comprehensive set of cloud-connected platforms designed to power data-rich, intelligent automotive services.
