
AMD has released ROCm 6.3.1. The release notes cover operating system and hardware support changes, component versions, known and resolved issues, and upcoming changes. Notable new features include per-queue resiliency for Instinct MI300 accelerators, the new ROCm Runfile Installer, and updates to the ROCm documentation. The latest ROCm Megatron-LM training Docker is now available alongside the ROCm vLLM inference Docker. Additionally, the Instinct MI300X workload tuning guide has been revised to include updated optimization strategies. HIP graph-safe libraries operate safely within HIP execution graphs, and the device memory topic in the HIP memory management section has been revised. ROCm 6.3.1 introduces support for Debian 12 (kernel 6.1) and the AMD Instinct MI325X accelerator.




ROCm 6.3.1 Release

Release notes

The release notes provide a summary of notable changes since the previous ROCm release.

If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.

Release highlights

The following are notable new features and improvements in ROCm 6.3.1. For changes to individual components, see
Detailed component changes.

Per queue resiliency for Instinct MI300 accelerators

The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 accelerators. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same accelerator are unaffected.

ROCm Runfile Installer

ROCm 6.3.1 introduces the ROCm Runfile Installer, with initial support for Ubuntu 22.04. The ROCm Runfile Installer facilitates ROCm installation without using a native Linux package management system, with or without network or internet access. For more information, see the ROCm Runfile Installer documentation.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

  • Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X accelerators
    containing essential components to get started, including ROCm libraries, PyTorch, and Megatron-LM utilities. See
    Training a model using ROCm Megatron-LM to get started.

    The new ROCm Megatron-LM training Docker accompanies the ROCm vLLM inference Docker
    as a set of ready-to-use containerized solutions to get started with using ROCm
    for AI.

  • Updated the Instinct MI300X workload tuning guide with more current optimization
    strategies. The updated sections include guidance on vLLM optimization, PyTorch TunableOp, and hipBLASLt tuning.

  • HIP graph-safe libraries operate safely in HIP execution graphs. HIP graphs are an alternative way of executing tasks on a GPU that can provide performance benefits over launching kernels using the standard method via streams. A topic that shows whether a ROCm library is graph-safe has been added.

  • The Device memory topic in the HIP memory management section has been updated.

  • The HIP documentation has expanded with new resources for developers:

Operating system and hardware support changes

ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct accelerators. See the installation instructions at Debian native installation.
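As a rough illustration, a host could be checked against the Debian 12 and kernel 6.1 combination named above before attempting an installation. The sketch below is not part of ROCm; the function names are hypothetical, and it assumes the usual `ID` and `VERSION_ID` fields of `/etc/os-release` and a kernel release string that begins with `major.minor`.

```python
import platform


def is_debian12_kernel61(os_release: dict, kernel_release: str) -> bool:
    """Return True if the values describe Debian 12 running a 6.1 kernel."""
    is_debian12 = (
        os_release.get("ID") == "debian"
        and os_release.get("VERSION_ID") == "12"
    )
    # Compare only the major and minor parts of the kernel release string.
    major_minor = kernel_release.split(".")[:2]
    return is_debian12 and major_minor == ["6", "1"]


def host_is_debian12_kernel61() -> bool:
    """Check the current host (hypothetical helper; needs Python 3.10+)."""
    return is_debian12_kernel61(
        platform.freedesktop_os_release(), platform.release()
    )


# Illustrative values only, not taken from a real system.
print(is_debian12_kernel61({"ID": "debian", "VERSION_ID": "12"}, "6.1.0-amd64"))
```

A check like this only confirms the distribution and kernel series; the compatibility matrix remains the authoritative source for supported configurations.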

ROCm 6.3.1 enables support for the AMD Instinct MI325X accelerator. For more information, see AMD Instinct™ MI325X Accelerators.

See the Compatibility matrix for more information about operating system and hardware compatibility.

ROCm components

The following table lists the versions of ROCm components for ROCm 6.3.1, including any version
changes from 6.3.0 to 6.3.1. Click the component's updated version to go to a list of its changes.
Click the GitHub icon to go to the component's source code on GitHub.

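| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| Libraries | Machine learning and computer vision | MIGraphX | 2.11.0 |
| Libraries | Machine learning and computer vision | MIOpen | 3.3.0 |
| Libraries | Machine learning and computer vision | MIVisionX | 3.1.0 ⇒ 3.1.0 |
| Libraries | Machine learning and computer vision | rocAL | 2.1.0 |
| Libraries | Machine learning and computer vision | rocDecode | 0.8.0 |
| Libraries | Machine learning and computer vision | rocJPEG | 0.6.0 |
| Libraries | Machine learning and computer vision | rocPyDecode | 0.2.0 |
| Libraries | Machine learning and computer vision | RPP | 1.9.1 |
| Libraries | Communication | RCCL | 2.21.5 ⇒ 2.21.5 |
| Libraries | Math | hipBLAS | 2.3.0 |
| Libraries | Math | hipBLASLt | 0.10.0 |
| Libraries | Math | hipFFT | 1.0.17 |
| Libraries | Math | hipfort | 0.5.0 |
| Libraries | Math | hipRAND | 2.11.1 |
| Libraries | Math | hipSOLVER | 2.3.0 |
| Libraries | Math | hipSPARSE | 3.1.2 |
| Libraries | Math | hipSPARSELt | 0.2.2 |
| Libraries | Math | rocALUTION | 3.2.1 |
| Libraries | Math | rocBLAS | 4.3.0 |
| Libraries | Math | rocFFT | 1.0.31 |
| Libraries | Math | rocRAND | 3.2.0 |
| Libraries | Math | rocSOLVER | 3.27.0 |
| Libraries | Math | rocSPARSE | 3.3.0 |
| Libraries | Math | rocWMMA | 1.6.0 |
| Libraries | Math | Tensile | 4.42.0 |
| Libraries | Primitives | hipCUB | 3.3.0 |
| Libraries | Primitives | hipTensor | 1.4.0 |
| Libraries | Primitives | rocPRIM | 3.3.0 |
| Libraries | Primitives | rocThrust | 3.3.0 |
| Tools | System management | AMD SMI | 24.7.1 ⇒ 24.7.1 |
| Tools | System management | ROCm Data Center Tool | 0.3.0 |
| Tools | System management | rocminfo | 1.0.0 |
| Tools | System management | ROCm SMI | 7.4.0 |
| Tools | System management | ROCmValidationSuite | 1.1.0 |
| Tools | Performance | ROCm Bandwidth Test | 1.4.0 |
| Tools | Performance | ROCm Compute Profiler | 3.0.0 ⇒ 3.0.0 |
| Tools | Performance | ROCm Systems Profiler | 0.1.0 ⇒ 0.1.0 |
| Tools | Performance | ROCProfiler | 2.0.0 |
| Tools | Performance | ROCprofiler-SDK | 0.5.0 |
| Tools | Performance | ROCTracer | 4.1.0 |
| Tools | Development | HIPIFY | 18.0.0 ⇒ 18.0.0 |
| Tools | Development | ROCdbgapi | 0.77.0 |
| Tools | Development | ROCm CMake | 0.14.0 |
| Tools | Development | ROCm Debugger (ROCgdb) | 15.2 |
| Tools | Development | ROCr Debug Agent | 2.0.3 |
| Compilers | | HIPCC | 1.1.1 |
| Compilers | | llvm-project | 18.0.0 |
| Runtimes | | HIP | 6.3.0 ⇒ 6.3.1 |
| Runtimes | | ROCr Runtime | 1.14.0 |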
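Several entries in the table use the "old ⇒ new" notation even though the version number itself did not change. As an illustrative sketch (the helper name `changed_components` is hypothetical, and the rows are copied from the table above), the entries whose version string actually differs can be picked out like this:

```python
def changed_components(rows):
    """Yield (name, old, new) for rows whose version actually changed."""
    for name, version in rows:
        if "⇒" not in version:
            continue  # no update marker on this component
        old, new = (part.strip() for part in version.split("⇒"))
        if old != new:
            yield name, old, new


# Rows copied from the component table; rocBLAS stands in for unmarked rows.
rows = [
    ("MIVisionX", "3.1.0 ⇒ 3.1.0"),
    ("RCCL", "2.21.5 ⇒ 2.21.5"),
    ("AMD SMI", "24.7.1 ⇒ 24.7.1"),
    ("ROCm Compute Profiler", "3.0.0 ⇒ 3.0.0"),
    ("ROCm Systems Profiler", "0.1.0 ⇒ 0.1.0"),
    ("HIPIFY", "18.0.0 ⇒ 18.0.0"),
    ("HIP", "6.3.0 ⇒ 6.3.1"),
    ("rocBLAS", "4.3.0"),
]

print(list(changed_components(rows)))  # [('HIP', '6.3.0', '6.3.1')]
```

Components marked with an arrow but an unchanged number still have entries under the detailed component changes, so the arrow marks "has changes in this release" rather than a version bump.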

Detailed component changes

The following sections describe key changes to ROCm components.

AMD SMI (24.7.1)

Changed

  • amd-smi monitor displays VCLOCK and DCLOCK instead of ENC_CLOCK and DEC_CLOCK.

Resolved issues

  • Fixed amd-smi monitor's reporting of encode and decode information. VCLOCK and DCLOCK are
    now associated with both ENC_UTIL and DEC_UTIL.

See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANGELOG.md) for more details and examples.

HIP (6.3.1)

Added

  • An activeQueues set that tracks only the queues that have a command submitted to them, which allows fast iteration in waitActiveStreams.

Resolved issues

  • Fixed a deadlock in a specific customer application by preventing hipLaunchKernel latency from degrading as the number of idle streams grows.
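The idea behind the activeQueues set can be shown with a toy model. This is a conceptual sketch only, not the HIP runtime's implementation: the `QueueTracker` class and its methods are hypothetical, and "waiting" on a queue is reduced to clearing a counter. The point is that the wait step iterates over the queues that received work rather than over every queue that exists.

```python
class QueueTracker:
    """Toy model: keep a separate set of queues that have pending work."""

    def __init__(self, num_queues):
        self.pending = {q: 0 for q in range(num_queues)}  # every queue
        self.active = set()  # only queues with submitted commands

    def submit(self, queue):
        self.pending[queue] += 1
        self.active.add(queue)

    def wait_active(self):
        """Drain only the active queues; return how many were visited."""
        visited = 0
        for queue in list(self.active):
            self.pending[queue] = 0  # stand-in for waiting on the queue
            visited += 1
        self.active.clear()
        return visited


tracker = QueueTracker(num_queues=1000)
tracker.submit(3)
tracker.submit(42)
print(tracker.wait_active())  # 2
```

With this structure, the cost of the wait step depends on the number of queues that have work, so a large number of idle queues does not add to it.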

HIPIFY (18.0.0)

Added

  • Support for:
    • NVIDIA CUDA 12.6.2
    • cuDNN 9.5.1
    • LLVM 19.1.3
    • Full hipBLAS 64-bit APIs
    • Full rocBLAS 64-bit APIs

Resolved issues

  • Added missing support for device intrinsics and built-ins: __all_sync, __any_sync, __ballot_sync, __activemask, __match_any_sync, __match_all_sync, __shfl_sync, __shfl_up_sync, __shfl_down_sync, and __shfl_xor_sync.

MIVisionX (3.1.0)

Changed

  • AMD Clang is now the default CXX and C compiler.
  • The dependency on rocDecode has been removed and automatic rocDecode installation is now disabled in the setup script.

Resolved issues

  • Canny failure on Instinct MI300 has been fixed.
  • Ubuntu 24.04 CTest failures have been fixed.

Known issues

  • CentOS, Red Hat, and SLES require the manual installation of OpenCV and FFMPEG.
  • Hardware decode requires that ROCm is installed with --usecase=graphics.

Upcoming changes

  • Optimized audio augmentations support for VX_RPP.

RCCL (2.21.5)

Changed

  • Enhanced the user documentation.

Resolved issues

  • Corrected some user help strings in install.sh.

ROCm Compute Profiler (3.0.0)

Resolved issues

  • Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 post-rename from omniperf.
    See ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues.

ROCm Systems Profiler (0.1.0)

Added

  • Improvements to support OMPT target offload.

Resolved issues

  • Fixed an issue with generated Perfetto files. See issue #3767 for more information.

  • Fixed an issue with merging multiple .proto files.

  • Fixed an issue causing GPU resource data to be missing from traces of Instinct MI300A systems.

  • Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 post-rename from omnitrace.
    See ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues.

ROCm known issues

ROCm known issues are noted on GitHub. For known
issues related to individual components, review the Detailed component changes.

PCI Express Qualification Tool failure on Debian 12

The PCI Express Qualification Tool (PEQT) module in the ROCm Validation Suite (RVS) might fail because of a segmentation issue on Debian 12 (bookworm). When it fails, RVS cannot determine the characteristics of the PCIe interconnect between the host platform and the GPU, such as support for Gen 3 atomic completers, DMA transfer statistics, link speed, and link width. As an alternative, use the standard PCIe command lspci to view the characteristics of the PCIe bus interconnect with the GPU. This issue is under investigation and will be addressed in a future release. See GitHub issue #4175.
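A minimal sketch of the lspci workaround follows. It assumes that verbose lspci output contains `LnkSta:` lines with `Speed` and `Width` fields, and it filters on 1002, AMD's PCI vendor ID. The function names and the sample line are illustrative, and full link details may require running lspci as root.

```python
import re
import subprocess

# Matches lines such as "LnkSta: Speed 16GT/s (ok), Width x16 (ok)".
LINK_STATUS = re.compile(
    r"LnkSta:\s*Speed\s+([\d.]+GT/s)[^,]*,\s*Width\s+(x\d+)"
)


def parse_link_status(lspci_output: str):
    """Return (speed, width) pairs found in `lspci -vv` style text."""
    return LINK_STATUS.findall(lspci_output)


def amd_link_status():
    """Run lspci for AMD devices (vendor ID 1002) and parse the result."""
    result = subprocess.run(
        ["lspci", "-d", "1002:", "-vv"],
        capture_output=True, text=True, check=False,
    )
    return parse_link_status(result.stdout)


# Illustrative sample line, not captured from a real system.
sample = "LnkSta:\tSpeed 16GT/s (ok), Width x16 (ok)"
print(parse_link_status(sample))  # [('16GT/s', 'x16')]
```

This covers link speed and width only; it does not replace the atomic completer or DMA transfer checks that PEQT performs.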

ROCm resolved issues

The following are previously known issues resolved in this release. For resolved issues related to
individual components, review the Detailed component changes.

Instinct MI300 series: backward weights convolution performance issue

Fixed a performance issue affecting certain tensor shapes during backward weights convolution when using FP16 or FP32 data types on Instinct MI300 series accelerators. See GitHub issue #4080.

ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues

Packaging metadata for ROCm Compute Profiler (rocprofiler-compute) and ROCm Systems Profiler
(rocprofiler-systems) has been updated to handle the renaming from Omniperf and Omnitrace,
respectively. This fixes minor issues when upgrading from ROCm 6.2 to 6.3. For more information, see
GitHub issues #4082 and #4083.

Stale file due to OpenCL ICD loader deprecation

Resolved an issue where removing the rocm-icd-loader package during an upgrade from ROCm 6.2.x to
ROCm 6.3.0 left a stale file in the old rocm-6.2.x directory. Any stale files left by that upgrade
are removed when upgrading to ROCm 6.3.1. For more information, see GitHub issue #4084.

ROCm upcoming changes

The following changes to the ROCm software stack are anticipated for future releases.

AMDGPU wavefront size compiler macro deprecation

The __AMDGCN_WAVEFRONT_SIZE__ macro will be deprecated in an upcoming
release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.

HIPCC Perl scripts deprecation

The HIPCC Perl scripts (hipcc.pl and hipconfig.pl) will be removed in an upcoming release.
