Overview
Neurenix provides full support for AMD GPUs through ROCm (Radeon Open Compute), enabling high-performance deep learning on AMD hardware. The framework uses HIP (Heterogeneous-computing Interface for Portability) for GPU operations and includes:
- HIP for GPU compute operations
- rocBLAS for accelerated linear algebra
- MIOpen for optimized neural network primitives
- rocSOLVER for numerical algorithms
- Multi-GPU support via RCCL
Requirements
- AMD GPU (Instinct MI series, Radeon Pro, or compatible)
- ROCm 5.0 or later
- MIOpen 2.0 or later
- rocBLAS 2.0 or later
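One way to check the ROCm requirement above before installing is to read the version string that a ROCm install records on disk. The sketch below is not part of Neurenix. The function names are hypothetical, and the default path `/opt/rocm/.info/version` is an assumption about a standard ROCm install location, so adjust it if ROCm lives elsewhere.

```python
from pathlib import Path

MIN_ROCM = (5, 0)  # minimum ROCm version listed in the requirements


def parse_rocm_version(text):
    """Turn a string such as '5.4.3-121' into a tuple of ints, (5, 4, 3)."""
    core = text.strip().split("-", 1)[0]  # drop the build suffix, if any
    return tuple(int(part) for part in core.split("."))


def installed_rocm_version(info_file="/opt/rocm/.info/version"):
    """Return the installed ROCm version, or None if the file is missing.

    The default path is an assumption about a standard ROCm install.
    """
    path = Path(info_file)
    if not path.is_file():
        return None
    return parse_rocm_version(path.read_text())


if __name__ == "__main__":
    version = installed_rocm_version()
    if version is None:
        print("ROCm version file not found")
    elif version >= MIN_ROCM:
        print("ROCm", ".".join(map(str, version)), "meets the 5.0 minimum")
    else:
        print("ROCm", ".".join(map(str, version)), "is older than 5.0")
```

Comparing tuples of integers avoids the usual string-comparison pitfall where "10.0" sorts before "5.0".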
Supported GPUs
- AMD Instinct MI250X, MI250, MI210, MI100
- AMD Radeon Pro W6800, W6900
- AMD Radeon RX 6000 series (with ROCm 5.0+)
Installation
Install ROCm
Install Neurenix with ROCm
Device Management
Check ROCm Availability
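Independently of any framework API, a quick system-level check can tell whether ROCm is likely usable on a machine. The sketch below is a heuristic, not the Neurenix call for this purpose: it assumes a Linux host, looks for the `/dev/kfd` device node exposed by the AMD kernel driver, and checks that the `rocminfo` tool is on the `PATH`. The function name is hypothetical.

```python
import os
import shutil


def rocm_looks_available(kfd_path="/dev/kfd", tool="rocminfo"):
    """Heuristic: the kernel driver node exists and rocminfo is on PATH.

    Both defaults are assumptions about a typical Linux ROCm setup.
    """
    has_driver = os.path.exists(kfd_path)
    has_tool = shutil.which(tool) is not None
    return has_driver and has_tool


if __name__ == "__main__":
    if rocm_looks_available():
        print("ROCm appears to be installed")
    else:
        print("ROCm driver or tools not found")
```

A positive result only means the driver and tools are present. It does not confirm that a supported GPU is attached or that the current user has permission to use it.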
Get Device Properties
Set Current Device
Memory Management
Allocate Memory
Memory Transfer
Memory Statistics
Streams and Asynchronous Execution
Create Streams
Stream Synchronization
ROCm Libraries
rocBLAS
Accelerated BLAS operations:
MIOpen
Optimized neural network primitives:
Multi-GPU Training
Data Parallel
Distributed Training with RCCL
Mixed Precision Training
Performance Optimization
Enable MIOpen Benchmarking
Kernel Fusion
Memory Pool
Profiling and Debugging
ROCm Profiler
rocprof Command Line
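As a rough illustration, a `rocprof` invocation can be assembled from Python and handed to `subprocess.run`. The sketch below only builds and prints the command. The script name `train.py`, the output file name, and the helper function are hypothetical, while `--stats`, `--hip-trace`, and `-o` are standard `rocprof` options.

```python
import shlex


def rocprof_command(script, output="results.csv", hip_trace=False):
    """Build a rocprof command line that profiles a Python script."""
    cmd = ["rocprof", "--stats", "-o", output]  # --stats: per-kernel timing summary
    if hip_trace:
        cmd.append("--hip-trace")  # also record HIP API calls
    cmd += ["python", script]  # the application to profile comes last
    return cmd


# Print the command as it would be typed in a shell.
print(shlex.join(rocprof_command("train.py")))
```

Keeping the command as a list rather than a single string means it can be passed to `subprocess.run` without shell quoting problems.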
Memory Profiling
Environment Variables
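The variables Neurenix itself reads are not listed here, but two variables belonging to the HIP runtime are commonly set: `HIP_VISIBLE_DEVICES` restricts which GPUs a process can see, and `AMD_LOG_LEVEL` controls runtime logging. A minimal sketch of setting them from Python follows, with a hypothetical helper name. It assumes the values are applied before any GPU framework is imported, since the runtime typically reads them at initialization.

```python
import os


def rocm_env(visible_devices, log_level=0):
    """Build ROCm-related environment settings as a dict of strings."""
    return {
        # Comma-separated GPU indices the HIP runtime should expose.
        "HIP_VISIBLE_DEVICES": ",".join(str(i) for i in visible_devices),
        # HIP runtime logging verbosity; 0 disables logging.
        "AMD_LOG_LEVEL": str(log_level),
    }


# Apply before importing any GPU framework so the runtime sees the values.
os.environ.update(rocm_env([0, 1]))
```

The same dictionary can be passed as the `env` argument of `subprocess.run` to launch a worker process restricted to a subset of GPUs.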
Common Issues
Out of Memory
Performance Issues
Compatibility Issues
Migrating from CUDA
ROCm uses HIP, which is largely compatible with CUDA:
See Also
- ROCm Documentation
- CUDA Support - NVIDIA GPU alternative
- Multi-GPU Training
- Performance Optimization