ROCm 6.0.0 Release
Release notes for AMD ROCm 6.0
ROCm 6.0 is a major release with new performance optimizations, expanded framework and library
support, and improved developer experience. This includes initial enablement of the AMD Instinct
MI300 series. Future releases will further enable and optimize this new platform. Key features include:
- Improved performance in areas like lower precision math and attention layers.
- New hipSPARSELt library accelerates AI workloads via AMD's sparse matrix core technique.
- Upstream support is now available for popular AI frameworks like TensorFlow, JAX, and PyTorch.
- New support for libraries, such as DeepSpeed, ONNX-RT, and CuPy.
- Prepackaged HPC and AI containers on AMD Infinity Hub, with improved documentation and
tutorials on the AMD ROCm Docs site.
- Consolidated developer resources and training on the new AMD ROCm Developer Hub.
The following sections provide a release overview for ROCm 6.0. For additional details, you can refer to
the Changelog. We list known issues on GitHub.
OS and GPU support changes
ROCm 6.0 enables the use of MI300A and MI300X Accelerators with limited operating system
support. Future releases will add additional operating systems to match our general offering.
Operating Systems    MI300A       MI300X
Ubuntu 22.04.5       Supported    Supported
RHEL 8.9             Supported
SLES15 SP5           Supported
For older generations of supported Instinct products, we've added the following operating systems:
- RHEL 9.3
- RHEL 8.9
Note: For ROCm 6.2 and beyond, we've planned for end-of-support (EoS) for the following operating
systems:
- Ubuntu 20.04.5
- SLES 15 SP4
- RHEL/CentOS 7.9
New ROCm meta package
We've added a new ROCm meta package for easy installation of all ROCm core packages, tools, and
libraries. For example, the following command installs the full ROCm package: apt-get install rocm
(Ubuntu) or yum install rocm (RHEL).
Filesystem Hierarchy Standard
ROCm 6.0 fully adopts the Filesystem Hierarchy Standard (FHS) reorganization goals. We've removed
the backward compatibility support for old file locations.
Compiler location change
- The installation path of LLVM has changed from /opt/rocm-<rel>/llvm to
/opt/rocm-<rel>/lib/llvm. For backward compatibility, a symbolic link is provided at the old
location and will be removed in a future release.
- The installation path of the device library bitcode has changed from /opt/rocm-<rel>/amdgcn to
/opt/rocm-<rel>/lib/llvm/lib/clang/<ver>/lib/amdgcn. For backward compatibility, a symbolic link
is provided and will be removed in a future release.
Documentation
CMake support has been added for documentation in the ROCm repository.
AMD Instinct MI50 end-of-support notice
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) enter
maintenance mode in ROCm 6.0. As outlined in ROCm 5.6.0, ROCm 5.7 was the
final release for gfx906 GPUs in a fully supported state.
- Henceforth, no new features or performance optimizations will be supported for the gfx906 GPUs.
- Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
2024 (end of maintenance [EOM] will be aligned with the closest ROCm release).
  - Bug fixes will be made up to the next ROCm point release.
  - Bug fixes will not be backported to older ROCm releases for gfx906.
- Distribution and operating system updates will continue per the ROCm release cadence for gfx906
GPUs until EOM.
ROCm projects
The following sections contain project-specific release notes for ROCm 6.0. For additional details, you
can refer to the Changelog.
AMD SMI
Integrated the E-SMI (EPYC-SMI) library.
You can now query CPU-related information directly through AMD SMI. Metrics include power,
energy, performance, and other system details.
Added support for gfx942 metrics.
You can now query MI300 device metrics to get real-time information. Metrics include power,
temperature, energy, and performance.
HIP
New features to improve resource interoperability.
- For external resource interoperability, we've added new structs and enums.
- We've added new members to the HIP struct hipDeviceProp_t for surfaces, textures, and device
identifiers.
Changes impacting backward compatibility.
There are several changes impacting backward compatibility: we changed some struct members and
some enum values, and removed some deprecated flags. For additional information, please refer to
the Changelog.
hipCUB
- Additional CUB API support.
The hipCUB backend is updated to CUB and Thrust 2.1.
HIPIFY
Enhanced CUDA2HIP document generation.
API versions are now listed in the CUDA2HIP documentation. To see if the application binary
interface (ABI) has changed, refer to the C column in our API documentation.
Hipified rocSPARSE.
We've implemented support for the direct hipification of additional cuSPARSE APIs into rocSPARSE
APIs under the --roc option. This marks a major milestone in the roadmap towards complete
cuSPARSE-to-rocSPARSE hipification.
hipRAND
- Official release.
hipRAND is now a standalone project; it's no longer available as a submodule for rocRAND.
hipTensor
Added architecture support.
We've added contraction support for gfx942 architectures, and for f32 and f64 data types.
Upgraded testing infrastructure.
hipTensor now supports dynamic parameter configuration with an input YAML config.
MIGraphX
Added TorchMIGraphX.
We introduced a Dynamo backend for Torch, which allows PyTorch to use MIGraphX directly
without first requiring a model to be converted to the ONNX model format. With a single line of
code, PyTorch users can utilize the performance and quantization benefits provided by MIGraphX.
Boosted overall performance with rocMLIR.
We've integrated the rocMLIR library for ROCm-supported RDNA and CDNA GPUs. This
technology provides MLIR-based convolution and GEMM kernel generation.
Added INT8 support across the MIGraphX portfolio.
We now support the INT8 data type. MIGraphX can perform the quantization or ingest
prequantized models. INT8 support extends to the MIGraphX execution provider for ONNX Runtime.
ROCgdb
- Added support for additional GPU architectures.
  - Navi 3 series: gfx1100, gfx1101, and gfx1102.
  - MI300 series: gfx942.
rocm-smi-lib
Improved accessibility to GPU partition nodes.
You can now view, set, and reset the compute and memory partitions. You'll also get notifications of
a GPU busy state, which helps you avoid partition set or reset failure.
Upgraded GPU metrics version 1.4.
The upgraded GPU metrics binary has an improved metric version format with a content version
appended to it. You can read each metric within the binary without the full rsmi_gpu_metric_t data
structure.
Updated GPU index sorting.
We made GPU index sorting consistent with other ROCm software tools by sorting on
Bus:Device.Function (BDF) instead of the card number.
ROCm Compiler
Added kernel argument optimization on gfx942.
With the new feature, you can preload kernel arguments into Scalar General-Purpose Registers
(SGPRs) rather than pass them in memory. This feature is enabled with a compiler option, which also
controls the number of arguments to pass in SGPRs. For more information, see:
https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments
Improved register allocation at -O0.
We've improved the register allocator used at -O0 to avoid compiler crashes whose signature is
'ran out of registers during register allocation'.
Improved generation of debug information.
We've improved compile time when generating debug information for certain corner cases. We've
also improved the compiler to eliminate compiler crashes when generating debug information.
ROCmValidationSuite
- Added GPU and operating system support.
We added support for the MI300X GPU in the GPU Stress Test (GST).
Roc Profiler
Added option to specify desired Roc Profiler version.
You can now use rocProfV1 or rocProfV2 by specifying your desired version, as the legacy rocProf
(rocprofv1) provides the option to use the latest version (rocprofv2).
Automated the ISA dumping process by Advance Thread Tracer.
Advance Thread Tracer (ATT) no longer depends on user-supplied Instruction Set Architecture (ISA)
and compilation process (using hipcc --save-temps) to dump ISA from the running kernels.
Added ATT support for parallel kernels.
The automatic ISA dumping process also helps ATT successfully parse multiple kernels running in
parallel, and provide cycle-accurate occupancy information for multiple kernels at the same time.
ROCr
- Support for SDMA link aggregation.
If multiple XGMI links are available when making SDMA copies between GPUs, the copy is
distributed over multiple links to increase peak bandwidth.
rocThrust
- Added Thrust 2.1 API support.
The rocThrust backend is updated to Thrust and CUB 2.1.
rocWMMA
Added new architecture support.
We added support for gfx942 architectures.
Added data type support.
We added support for f8, bf8, and xf32 data types on supporting architectures, and for bf16 in the HIP RTC
environment.
Added support for the PyTorch kernel plugin.
We added awareness of __HIP_NO_HALF_CONVERSIONS__ to support PyTorch users.
TransferBench (beta)
Improved ordering control.
You can now set the thread block size (BLOCK_SIZE) and the thread block order (BLOCK_ORDER)
in which thread blocks from different transfers are run when using a single stream.
Added comprehensive reports.
We modified individual transfers to report the X Compute Cluster (XCC) ID when SHOW_ITERATIONS
is set to 1.
Improved accuracy in result validation.
You can now validate results for each iteration instead of just once for all iterations.
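As an illustration of the TransferBench settings named in this section, the following minimal sketch shows one way a script might prepare BLOCK_SIZE, BLOCK_ORDER, and SHOW_ITERATIONS before launching a run. It assumes these settings are read from the process environment. The helper name transferbench_env, the example values 256 and 0, and the commented-out command line are hypothetical placeholders; consult the TransferBench documentation for accepted values and the actual invocation.

```python
import os


def transferbench_env(block_size, block_order, show_iterations=True, base_env=None):
    """Return a copy of an environment with the TransferBench settings applied.

    Hypothetical helper: only the variable names BLOCK_SIZE, BLOCK_ORDER, and
    SHOW_ITERATIONS come from the release notes; everything else is illustrative.
    """
    # Copy so the caller's environment (or os.environ) is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["BLOCK_SIZE"] = str(block_size)    # thread block size
    env["BLOCK_ORDER"] = str(block_order)  # thread block ordering across transfers
    # Setting SHOW_ITERATIONS to 1 makes individual transfers report the XCC ID.
    env["SHOW_ITERATIONS"] = "1" if show_iterations else "0"
    return env


if __name__ == "__main__":
    # 256 and 0 are placeholder values, not recommendations.
    env = transferbench_env(block_size=256, block_order=0)
    # A launch might then look like this (binary path and config file are assumptions):
    #   subprocess.run(["./TransferBench", "example.cfg"], env=env, check=True)
    for name in ("BLOCK_SIZE", "BLOCK_ORDER", "SHOW_ITERATIONS"):
        print(f"{name}={env[name]}")
```

Building the settings into a copied environment, rather than exporting them globally, keeps each run's configuration explicit and avoids leaking values into unrelated processes.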