The optimize command improves model performance by applying optimization techniques such as quantization, pruning, knowledge distillation, and hyperparameter tuning.

Usage

neurenix optimize --model <model_file> [options]

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --model | string | required | Path to the model file |
| --output | string | auto | Output file for optimized model |
| --technique | string | auto | Optimization technique |
| --quantize | string | None | Quantization precision (int8, fp16, fp8) |
| --prune | float | None | Pruning level (0.0 to 1.0) |
| --data | string | None | Calibration data for optimization |
| --config | string | None | Optimization configuration file |
| --device | string | auto | Device to use for optimization |

Optimization Techniques

| Technique | Description |
| --- | --- |
| quantize | Reduce precision to int8, fp16, or fp8 |
| prune | Remove less important weights |
| distill | Transfer knowledge to smaller model |
| hyperparameter | Tune hyperparameters automatically |
| auto | Automatically select best technique |

Examples

Auto optimization

neurenix optimize --model models/classifier.nrx
Loading model from models/classifier.nrx...
Optimizing model using auto technique...
Saving optimized model to models/classifier_optimized.nrx...

Optimization Results:
model_size_reduction: 0.45
inference_speedup: 2.3x
accuracy_change: -0.002

Model successfully optimized and saved to models/classifier_optimized.nrx

Quantize to int8

neurenix optimize \
  --model models/model.nrx \
  --technique quantize \
  --quantize int8
Loading model from models/model.nrx...
Optimizing model using quantize technique...
Saving optimized model to models/model_int8.nrx...

Optimization Results:
model_size: 24.5 MB -> 6.2 MB
inference_time: 12.3ms -> 3.8ms
accuracy: 0.923 -> 0.918

Model successfully optimized and saved to models/model_int8.nrx

Quantize to fp16

neurenix optimize \
  --model models/large_model.nrx \
  --quantize fp16 \
  --output models/large_model_fp16.nrx
Loading model from models/large_model.nrx...
Optimizing model using auto technique...
Saving optimized model to models/large_model_fp16.nrx...

Optimization Results:
model_size_reduction: 0.5
inference_speedup: 1.8x
accuracy_change: -0.001

Model successfully optimized and saved to models/large_model_fp16.nrx

Prune model

neurenix optimize \
  --model models/model.nrx \
  --technique prune \
  --prune 0.3
Loading model from models/model.nrx...
Optimizing model using prune technique...
Saving optimized model to models/model_pruned_30.nrx...

Optimization Results:
weights_removed: 30%
model_size_reduction: 0.25
inference_speedup: 1.4x
accuracy_change: -0.015

Model successfully optimized and saved to models/model_pruned_30.nrx

Optimize with calibration data

neurenix optimize \
  --model models/model.nrx \
  --technique quantize \
  --quantize int8 \
  --data data/calibration.csv
Loading model from models/model.nrx...
Loading calibration data from data/calibration.csv...
Optimizing model using quantize technique...
Saving optimized model to models/model_int8.nrx...

Optimization Results:
model_size_reduction: 0.75
inference_speedup: 3.2x
accuracy_change: -0.005

Model successfully optimized and saved to models/model_int8.nrx

Use configuration file

neurenix optimize \
  --model models/model.nrx \
  --config configs/optimize.json
Loading model from models/model.nrx...
Optimizing model using quantize technique...
Saving optimized model to models/model_optimized.nrx...

Optimization Results:
model_size_reduction: 0.6
inference_speedup: 2.5x
accuracy_change: -0.008

Model successfully optimized and saved to models/model_optimized.nrx

Configuration File

Create a JSON configuration for complex optimization:
{
  "technique": "quantize",
  "quantize": "int8",
  "calibration": {
    "samples": 1000,
    "method": "percentile"
  },
  "validation": {
    "min_accuracy": 0.90
  }
}
Then use it:
neurenix optimize --model model.nrx --config optimize_config.json

Quantization Precision

int8 (8-bit Integer)

  • Best for: Edge deployment, mobile devices
  • Size reduction: ~75%
  • Speed improvement: 2-4x
  • Accuracy loss: 1-3%
neurenix optimize --model model.nrx --quantize int8

fp16 (16-bit Float)

  • Best for: GPU deployment
  • Size reduction: ~50%
  • Speed improvement: 1.5-2x
  • Accuracy loss: less than 1%
neurenix optimize --model model.nrx --quantize fp16

fp8 (8-bit Float)

  • Best for: GPUs with native FP8 support (e.g., H100)
  • Size reduction: ~75%
  • Speed improvement: 2-3x
  • Accuracy loss: less than 2%
neurenix optimize --model model.nrx --quantize fp8
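The approximate size reductions listed for each precision can be turned into a quick estimate before running the command. The sketch below is a hypothetical helper, not part of the neurenix CLI; it simply applies the ~75% (int8, fp8) and ~50% (fp16) figures from this page. Those figures are consistent with a model stored as 32-bit floats (4 bytes per weight down to 1 or 2 bytes), which is an assumption here rather than something the documentation states.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the neurenix CLI.
# Estimates the optimized size in MB from the approximate reductions
# listed above: int8 ~75%, fp16 ~50%, fp8 ~75%.
estimate_size_mb() {
  local size_mb="$1" precision="$2" keep
  case "$precision" in
    int8|fp8) keep=0.25 ;;  # ~75% reduction
    fp16)     keep=0.50 ;;  # ~50% reduction
    *) echo "unknown precision: $precision" >&2; return 1 ;;
  esac
  awk -v s="$size_mb" -v k="$keep" 'BEGIN { printf "%.1f\n", s * k }'
}
```

For instance, estimate_size_mb 245.3 int8 prints 61.3, close to the 61.2 MB shown in the sample output further down. Real results may differ, since the estimate treats the whole file as uniformly quantized.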

Pruning Levels

| Level | Pruning | Use Case |
| --- | --- | --- |
| 0.1 | 10% | Minimal optimization |
| 0.3 | 30% | Balanced optimization |
| 0.5 | 50% | Aggressive optimization |
| 0.7 | 70% | Maximum optimization |
# Light pruning
neurenix optimize --model model.nrx --prune 0.1

# Aggressive pruning
neurenix optimize --model model.nrx --prune 0.5
Higher pruning levels (>0.5) may significantly impact model accuracy. Always validate performance after pruning.
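When the pruning level comes from a script variable, it can help to check it before invoking the command. The following is a minimal sketch of such a pre-flight check; valid_prune_level and prune_percent are hypothetical helper names, not part of the neurenix CLI. They mirror the documented 0.0 to 1.0 range and the level-to-percentage mapping in the table above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers, not part of the neurenix CLI.

# Succeeds only if the argument is a plain decimal in [0.0, 1.0].
valid_prune_level() {
  awk -v l="$1" 'BEGIN {
    # Accept forms such as 0, 0.3, .5 or 1.0; reject signs and text.
    if (l !~ /^([0-9]+\.?[0-9]*|\.[0-9]+)$/) exit 1
    exit !(l + 0 >= 0.0 && l + 0 <= 1.0)
  }'
}

# Prints the pruning level as a whole-number percentage, e.g. 0.3 -> 30%.
prune_percent() {
  awk -v l="$1" 'BEGIN { printf "%d%%\n", l * 100 + 0.5 }'
}
```

A script could then run the optimize command only when valid_prune_level "$level" succeeds, and use prune_percent "$level" to label the output file.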

Optimization Results

The command outputs detailed optimization metrics:
Optimization Results:
model_size: 245.3 MB -> 61.2 MB (75% reduction)
inference_time: 45.6ms -> 12.3ms (3.7x speedup)
accuracy: 0.934 -> 0.928 (0.6% decrease)
throughput: 22 samples/sec -> 81 samples/sec
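Lines in the "before -> after" form shown above are easy to post-process. The sketch below is a hypothetical filter, parse_results, that is not part of the neurenix CLI. It reads such lines on standard input and prints the metric name, both values, and the after/before ratio as tab-separated columns. It assumes the results are formatted exactly as in the sample above.

```shell
#!/usr/bin/env bash
# Hypothetical filter, not part of the neurenix CLI.
# Input:  lines such as "inference_time: 45.6ms -> 12.3ms (3.7x speedup)"
# Output: name<TAB>before<TAB>after<TAB>after/before
parse_results() {
  awk '
    / -> / {
      name = $1; sub(/:$/, "", name)
      # Locate the "->" field; the value after it is the optimized one.
      for (i = 2; i <= NF; i++) if ($i == "->") break
      before = $2 + 0        # "45.6ms" + 0 evaluates to 45.6 in awk
      after  = $(i + 1) + 0
      if (before != 0)
        printf "%s\t%s\t%s\t%.2f\n", name, before, after, after / before
    }'
}
```

Assuming the command writes its results to standard output, a saved log could be summarized with parse_results < optimize.log. A ratio below 1 means the value shrank (size, latency), and a ratio above 1 means it grew (throughput).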

Error Handling

Model not found

neurenix optimize --model missing.nrx
Error: Model file 'missing.nrx' not found.

Invalid quantization precision

neurenix optimize --model model.nrx --quantize invalid
Error: Invalid quantization precision. Choose from: int8, fp16, fp8

Invalid pruning level

neurenix optimize --model model.nrx --prune 1.5
Error: Pruning level must be between 0.0 and 1.0

Optimization failed

neurenix optimize --model model.nrx --quantize int8
Loading model from model.nrx...
Error optimizing model: Quantization failed - model architecture not supported
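In scripts, these errors are easier to handle if each step is wrapped so that a failure stops the pipeline with a clear message. The sketch below assumes that neurenix exits with a non-zero status on the errors above, which this page does not state explicitly. run_step is a hypothetical helper that works with any command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the neurenix CLI.
# Runs a command; on failure, reports it on stderr and returns its exit status.
run_step() {
  local status
  "$@" && return 0
  status=$?
  echo "step failed (exit $status): $*" >&2
  return "$status"
}
```

Under that assumption, run_step neurenix optimize --model model.nrx --quantize int8 || exit 1 would abort a script as soon as optimization fails.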

Use Cases

1. Deploy to mobile devices

neurenix optimize \
  --model models/production.nrx \
  --quantize int8 \
  --data data/calibration.csv \
  --output models/mobile.nrx

2. Reduce inference costs

neurenix optimize \
  --model models/large_model.nrx \
  --quantize fp16 \
  --output models/efficient.nrx

3. Speed up real-time inference

neurenix optimize \
  --model models/detector.nrx \
  --technique quantize \
  --quantize int8 \
  --device cuda

4. Compress models for storage

neurenix optimize \
  --model models/checkpoint.nrx \
  --prune 0.4 \
  --output models/compressed.nrx

5. Automated optimization pipeline

# Try different optimization strategies
for technique in quantize prune auto; do
  neurenix optimize \
    --model models/base.nrx \
    --technique "$technique" \
    --output "models/optimized_${technique}.nrx"
done

Best Practices

1. Use calibration data

Provide representative data for better quantization:
neurenix optimize \
  --model model.nrx \
  --quantize int8 \
  --data data/representative_samples.csv

2. Start with conservative settings

Begin with less aggressive optimization:
# Start with fp16
neurenix optimize --model model.nrx --quantize fp16

# If accuracy is acceptable, try int8
neurenix optimize --model model.nrx --quantize int8

3. Validate after optimization

Always check model performance:
# Optimize
neurenix optimize --model model.nrx --output optimized.nrx

# Validate
neurenix eval --model optimized.nrx --data data/test.csv
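The validation step can be made into an explicit pass/fail check once the baseline and optimized accuracies are known. The output format of neurenix eval is not shown on this page, so the sketch below takes the two accuracy values as plain arguments. accuracy_within is a hypothetical helper, not part of the CLI.

```shell
#!/usr/bin/env bash
# Hypothetical gate, not part of the neurenix CLI.
# Usage: accuracy_within <baseline> <optimized> <max_drop>
# Succeeds if the accuracy dropped by no more than max_drop.
accuracy_within() {
  awk -v b="$1" -v o="$2" -v d="$3" 'BEGIN {
    # Small tolerance guards against floating-point rounding at the boundary.
    exit !(b - o <= d + 1e-9)
  }'
}
```

With the values from the int8 example earlier on this page, accuracy_within 0.923 0.918 0.01 succeeds, because the drop of 0.005 is within the allowed 0.01.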

4. Keep original model

Never overwrite your original model:
# Good: explicit output
neurenix optimize --model model.nrx --output model_optimized.nrx

# Bad: might overwrite (don't do this)
# neurenix optimize --model model.nrx --output model.nrx
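A wrapper script can enforce this rule automatically. The sketch below is a hypothetical guard, distinct_paths, that is not part of the neurenix CLI. It fails when the output path is textually identical to the model path, or when both paths already exist and resolve to the same file.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of the neurenix CLI.
# Usage: distinct_paths <model> <output>
# Fails if writing to <output> would overwrite <model>.
distinct_paths() {
  local model="$1" output="$2"
  [ "$model" != "$output" ] || return 1
  # -ef is true when both paths exist and refer to the same file,
  # which catches cases like ./model.nrx vs model.nrx.
  if [ -e "$output" ] && [ "$model" -ef "$output" ]; then
    return 1
  fi
  return 0
}
```

A script could call distinct_paths "$model" "$output" || exit 1 before running neurenix optimize.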

5. Document optimization settings

Save optimization configuration:
cat > optimize_config.json << EOF
{
  "technique": "quantize",
  "quantize": "int8",
  "notes": "Optimized for mobile deployment"
}
EOF

neurenix optimize --model model.nrx --config optimize_config.json

Optimization Workflow

#!/bin/bash

# 1. Train model
neurenix run train.py

# 2. Evaluate baseline
neurenix eval --model models/model.nrx --data data/test.csv > baseline.txt

# 3. Optimize
neurenix optimize \
  --model models/model.nrx \
  --quantize int8 \
  --data data/calibration.csv \
  --output models/model_int8.nrx

# 4. Evaluate optimized
neurenix eval --model models/model_int8.nrx --data data/test.csv > optimized.txt

# 5. Compare results
diff baseline.txt optimized.txt
