Overview
Neurenix provides comprehensive support for ARM processors, enabling efficient AI inference and training on ARM-based devices. The framework includes:
- ARM NEON: SIMD instructions for accelerated vector operations
- ARM SVE: Scalable Vector Extension for flexible vectorization
- ARM Compute Library: Optimized neural network primitives
- ARM Ethos-U: Neural Processing Unit for edge AI
- CPU optimization: Multi-threading and cache-aware algorithms
Supported Platforms
- ARM Cortex-A series (A53, A55, A57, A72, A76, A78, X1, X2)
- ARM Neoverse (N1, N2, V1, V2)
- Apple Silicon (M1, M2, M3 series)
- Qualcomm Snapdragon
- MediaTek Dimensity
- NVIDIA Jetson (ARM CPU components)
Requirements
- ARM processor with NEON support (ARMv7-A or later)
- ARM Compute Library (optional, recommended)
- GCC 9.0+ or Clang 10.0+ with ARM extensions
Installation
Device Detection
Check ARM Availability
Get Device Properties
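The framework's own detection calls are not reproduced here. As a framework-independent sketch, the Python standard library can report the CPU architecture, and on Linux the kernel lists feature flags in /proc/cpuinfo. The helper names `is_arm_machine`, `parse_cpu_features`, and `detect_simd` are hypothetical and are not part of Neurenix.

```python
import platform

# Prefixes that platform.machine() reports on ARM systems
# (for example "armv7l", "arm64", "aarch64").
ARM_PREFIXES = ("arm", "aarch64")


def is_arm_machine(machine=None):
    """Return True if the machine string names an ARM architecture."""
    name = machine if machine is not None else platform.machine()
    return name.lower().startswith(ARM_PREFIXES)


def parse_cpu_features(cpuinfo_text):
    """Collect the flags found on 'Features' lines of Linux /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features":
            features.update(value.split())
    return features


def detect_simd(features):
    """Map kernel feature flags to the SIMD extensions described in this document."""
    return {
        # AArch64 kernels report NEON as "asimd"; 32-bit ARM kernels report "neon".
        "neon": bool({"neon", "asimd"} & features),
        "sve": "sve" in features,
    }
```

On a Linux device, passing the contents of /proc/cpuinfo to `parse_cpu_features` and the result to `detect_simd` gives a quick view of which extensions the kernel exposes. Other operating systems need a different source for the flags.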
ARM NEON
Overview
NEON provides 128-bit SIMD instructions for parallel processing of data.
Explicit NEON Operations
NEON Data Types
NEON supports various data types:
- float32x4_t: 4x 32-bit floats
- int32x4_t: 4x 32-bit integers
- int16x8_t: 8x 16-bit integers
- int8x16_t: 16x 8-bit integers
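The lane counts in these type names follow from dividing the 128-bit register width by the element width. The sketch below models that arithmetic in plain Python, along with the usual pattern of a full-register main loop followed by a scalar tail. It does not execute NEON instructions, and the function names are illustrative only.

```python
NEON_REGISTER_BITS = 128  # width of one NEON quad-word register


def lanes(element_bits):
    """Number of elements of the given width that fit in one 128-bit register."""
    return NEON_REGISTER_BITS // element_bits


def add_in_chunks(a, b, element_bits=32):
    """Element-wise add, processed one register's worth of lanes at a time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    width = lanes(element_bits)
    full = len(a) - len(a) % width  # elements covered by whole registers
    out = []
    for i in range(0, full, width):
        # Main loop: each iteration stands in for one vector add.
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
    for i in range(full, len(a)):
        # Scalar tail for the elements that do not fill a register.
        out.append(a[i] + b[i])
    return out
```

With 32-bit elements, a five-element input is handled as one four-lane chunk plus one scalar element.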
ARM SVE (Scalable Vector Extension)
Overview
SVE provides vector operations with runtime-determined vector lengths.
SVE Advantages
- Vector length agnostic code
- Future-proof for longer vectors
- Improved performance on Neoverse and future ARM CPUs
- Better handling of loop remainders
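Vector-length-agnostic code and the handling of loop remainders can both be illustrated with a small simulation. SVE loops typically build a per-lane predicate (the WHILELT instruction does this) so that the final, partial iteration runs through the same code path as the full ones. The sketch below models that idea in plain Python. It is not SVE code, and the names `whilelt` and `vla_scale` are chosen only for illustration.

```python
def whilelt(start, end, lane_count):
    """Predicate whose lane k is active while start + k < end."""
    return [start + k < end for k in range(lane_count)]


def vla_scale(values, factor, vector_bits):
    """Multiply each value by factor using a predicated, length-agnostic loop.

    vector_bits stands in for the hardware vector length, which SVE allows to
    be any multiple of 128 bits up to 2048. Elements are treated as 32-bit.
    """
    lane_count = vector_bits // 32
    out = [0.0] * len(values)
    i = 0
    while i < len(values):
        pred = whilelt(i, len(values), lane_count)
        for k, active in enumerate(pred):
            if active:  # inactive lanes are skipped, so no scalar tail is needed
                out[i + k] = values[i + k] * factor
        i += lane_count
    return out
```

The same function gives identical results for any supported `vector_bits`, which is the property that lets one binary run on implementations with different vector lengths.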
ARM Compute Library
Overview
ARM Compute Library provides highly optimized functions for computer vision and machine learning.
Convolution with ACL
ARM Ethos-U NPU
Overview
Ethos-U is ARM’s neural processing unit for edge AI, providing:
- Efficient inference for quantized models
- Low power consumption
- Integration with Cortex-M processors
Quantization for Ethos-U
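The quantization interface that Neurenix exposes is not shown here. As background, the following is a minimal sketch of generic affine int8 quantization, a common scheme for quantized NPU inference. Whether the framework uses exactly this scheme is an assumption, and the function names are hypothetical.

```python
def quant_params(min_val, max_val, qmin=-128, qmax=127):
    """Derive (scale, zero_point) so that [min_val, max_val] maps onto [qmin, qmax]."""
    # Widen the range to include zero so that 0.0 is exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    if max_val == min_val:
        return 1.0, 0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map real values to integers, clamping to the int8 range."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate real values."""
    return [(q - zero_point) * scale for q in quantized]
```

A round trip through `quantize` and `dequantize` recovers each in-range value to within about one quantization step, which gives a rough sense of the precision lost before deploying a model.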
Multi-Threading
Configure Thread Pool
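The framework's thread-pool settings are not listed here. As a general sketch of the sizing decision, the worker count can be derived from the CPUs the process is actually allowed to use, optionally capped by the caller. `available_cpus` and `make_pool` are hypothetical helpers built on the standard library, not Neurenix functions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def available_cpus():
    """Number of CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects affinity masks and container CPU sets.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def make_pool(max_threads=None):
    """Create a thread pool sized to the usable CPUs, optionally capped."""
    workers = available_cpus()
    if max_threads is not None:
        workers = max(1, min(workers, max_threads))
    return ThreadPoolExecutor(max_workers=workers)
```

On heterogeneous SoCs, capping the pool at the number of high-performance cores is sometimes preferable to using every core. That count is device-specific and has to be supplied by the caller.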
Thread Affinity
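How Neurenix sets affinity is not shown here. On Linux, the operating-system mechanism underneath is `sched_setaffinity`, which Python exposes in the `os` module. The sketch below pins the calling process to a chosen set of CPUs and returns `None` on platforms without that call. `pin_to_cpus` is a hypothetical helper, and which CPU indices correspond to the fast cores is device-specific.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPUs; return the effective set.

    Returns None on platforms that do not expose sched_setaffinity
    (for example macOS and Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    target = set(cpus) & allowed  # ignore CPUs the process may not use
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

Pinning compute threads to a fixed set of cores can reduce migration between clusters on heterogeneous designs. The benefit depends on the scheduler and the workload, so it is worth measuring before and after.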
Memory Management
Aligned Allocation
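The framework's allocator interface is not shown here. The underlying technique is to over-allocate and then start the usable region at the next address that is a multiple of the alignment. The sketch below does this with `ctypes`. The 64-byte default reflects a common cache-line size and is an assumption, and `aligned_buffer` is a hypothetical helper.

```python
import ctypes


def aligned_buffer(size, alignment=64):
    """Return a ctypes byte array of `size` bytes whose address is a multiple of `alignment`."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Over-allocate so that an aligned start address always exists inside the block.
    raw = (ctypes.c_char * (size + alignment))()
    # Number of bytes to skip to reach the next aligned address.
    offset = -ctypes.addressof(raw) % alignment
    # from_buffer shares memory with `raw` and keeps it alive.
    return (ctypes.c_char * size).from_buffer(raw, offset)
```

A 16-byte alignment matches the width of a 128-bit NEON register, while aligning to the cache-line size avoids buffers that straddle lines.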
Memory Copy
Performance Optimization
Best Practices
- Use appropriate data types
- Enable kernel fusion
- Optimize tensor layout
- Use ARM Compute Library
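Regarding tensor layout, the choice between channels-first (NCHW) and channels-last (NHWC) determines which dimension is contiguous in memory, and therefore which accesses map onto contiguous vector loads. The sketch below shows the flat-offset formulas for both layouts and a reference conversion. It is illustrative only and does not reflect how Neurenix stores tensors.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) when width is the innermost dimension."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) when channels are the innermost dimension."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(data, N, C, H, W):
    """Reorder a flat NCHW list into NHWC order."""
    out = [None] * len(data)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = offset_nchw(n, c, h, w, C, H, W)
                    dst = offset_nhwc(n, c, h, w, C, H, W)
                    out[dst] = data[src]
    return out
```

In NHWC, all channels of one pixel sit next to each other, so a kernel that reduces over channels reads contiguous memory. In NCHW, the same is true for a row of pixels within one channel.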
Profiling
Mobile and Edge Deployment
Model Optimization
Android Deployment
Environment Variables
Benchmarking
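The framework's benchmarking utilities are not reproduced here. A minimal, framework-independent timing harness is sketched below. It discards warm-up runs and reports the median, minimum, and maximum, because single measurements on mobile and edge devices can be skewed by frequency scaling. The `benchmark` function is a hypothetical helper.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10):
    """Time fn() after warm-up runs; return summary statistics in seconds."""
    for _ in range(warmup):
        fn()  # warm caches and let clocks ramp up before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }
```

Passing a zero-argument callable that runs one inference step gives comparable numbers across configurations, provided the thread count and affinity are held fixed between runs.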
See Also
- ARM Compute Library
- ARM NEON Programmer’s Guide
- NPU Support - Neural Processing Units
- Mobile Deployment