Overview
Neurenix implements state-of-the-art deep reinforcement learning algorithms for both discrete and continuous control tasks. Each algorithm is optimized for specific problem types and learning scenarios.
Algorithm Comparison
| Algorithm | Action Space | Policy Type | Key Feature |
|---|---|---|---|
| DQN | Discrete | Off-policy | Experience replay |
| A2C | Both | On-policy | Advantage estimation |
| PPO | Both | On-policy | Clipped policy updates |
| DDPG | Continuous | Off-policy | Deterministic policy |
| SAC | Continuous | Off-policy | Entropy regularization |
Deep Q-Network (DQN)
DQN learns a Q-function to estimate state-action values for discrete action spaces.
Basic Usage
neurenix/rl/algorithms.py:19
DQN Architecture
The algorithm creates two networks:
neurenix/rl/agent.py:258
Experience Replay
DQN uses experience replay for efficient learning:
neurenix/rl/agent.py:372
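To illustrate the idea of experience replay, here is a minimal sketch in plain Python. The `ReplayBuffer` class and its method names are hypothetical and are not taken from the Neurenix source; the sketch only assumes a fixed-capacity buffer with uniform sampling.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions (hypothetical helper, not the Neurenix class)."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
states, actions, rewards, next_states, dones = buffer.sample(batch_size=2)
```

With a capacity of 3, only the three most recent transitions remain available for sampling.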
Target Network Updates
neurenix/rl/value.py:223
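A target network is commonly refreshed by copying the online weights every fixed number of steps. The sketch below assumes that periodic hard-copy scheme; whether Neurenix copies or blends weights is not shown on this page, and `maybe_sync_target` and `sync_every` are hypothetical names. Plain lists stand in for network parameters.

```python
def maybe_sync_target(online_params, target_params, step, sync_every):
    """Copy online weights into the target every `sync_every` steps (assumed scheme)."""
    if step % sync_every == 0:
        target_params[:] = online_params  # in-place copy of the values
        return True
    return False


online = [0.5, -1.2, 3.0]
target = [0.0, 0.0, 0.0]
synced_at = []
for step in range(1, 7):
    online = [w + 0.1 for w in online]  # stand-in for a gradient step
    if maybe_sync_target(online, target, step, sync_every=3):
        synced_at.append(step)
```

Between syncs the target stays frozen, which keeps the bootstrapped targets stable while the online network changes.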
Loss Function
neurenix/rl/value.py:188
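The standard DQN loss is the squared temporal-difference error against a bootstrapped target. The following is a minimal sketch of that textbook form, assuming mean squared error (a Huber loss is also common); the function name and argument layout are illustrative, not the Neurenix signature.

```python
def dqn_loss(q_values, actions, rewards, next_q_target, dones, gamma=0.99):
    """Mean squared TD error over a batch.

    q_values:      online Q(s, .) per sample, as a list of lists
    next_q_target: target-network Q(s', .) per sample, as a list of lists
    """
    total = 0.0
    for q, a, r, q_next, d in zip(q_values, actions, rewards, next_q_target, dones):
        # Terminal transitions contribute only the immediate reward
        target = r + gamma * (0.0 if d else max(q_next))
        total += (q[a] - target) ** 2
    return total / len(actions)


loss = dqn_loss(
    q_values=[[1.0, 2.0], [0.5, 0.0]],
    actions=[1, 0],
    rewards=[1.0, 0.0],
    next_q_target=[[0.0, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.5,
)
```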
Variants
Double DQN
neurenix/rl/algorithms.py:40
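Double DQN changes only how the target is built: the online network selects the next action and the target network evaluates it. A minimal sketch of that difference, with hypothetical function names, might look like this:

```python
def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Online network picks the action, target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def vanilla_dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN: the target network both picks and scores the action."""
    return reward if done else reward + gamma * max(next_q_target)


next_q_online = [1.0, 3.0, 2.0]
next_q_target = [5.0, 0.5, 4.0]
y_double = double_dqn_target(1.0, next_q_online, next_q_target, done=False, gamma=0.9)
y_vanilla = vanilla_dqn_target(1.0, next_q_target, done=False, gamma=0.9)
```

In this toy example the vanilla target follows the largest target-network estimate, while the Double DQN target uses the action preferred by the online network and ends up lower.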
Dueling DQN
neurenix/rl/algorithms.py:41
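A dueling head splits the estimate into a state value and per-action advantages, then recombines them. The sketch below assumes the usual mean-subtracted aggregation; `dueling_q_values` is an illustrative name, not a Neurenix function.

```python
def dueling_q_values(state_value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (adv - mean_adv) for adv in advantages]


q = dueling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0])
```

Subtracting the mean advantage makes the decomposition identifiable: the average of the resulting Q-values equals the state value.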
Advantage Actor-Critic (A2C)
A2C learns both a policy (actor) and a value function (critic) using advantage estimation.
Basic Usage
neurenix/rl/algorithms.py:161
Network Architecture
neurenix/rl/algorithms.py:216
Advantage Calculation
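One common way to estimate advantages in A2C is to subtract the critic's value from a bootstrapped discounted return. The sketch below assumes that n-step form; the exact estimator used by Neurenix is not shown on this page, and the function name is hypothetical.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one rollout segment.

    bootstrap_value is the critic's estimate for the state after the last step
    (use 0.0 if the episode terminated there).
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running  # discounted return, built backwards
        returns.append(running)
    returns.reverse()
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


returns, advantages = n_step_advantages(
    rewards=[1.0, 1.0, 1.0],
    values=[2.0, 1.5, 1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
```

A negative advantage, as at the first step here, means the critic expected more than the rollout delivered, so the actor is pushed away from that action.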
Discrete vs Continuous Actions
Discrete Actions
neurenix/rl/algorithms.py:227
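For discrete actions, an actor typically outputs one logit per action and samples from the resulting categorical distribution. A minimal stdlib sketch under that assumption (helper names are illustrative):

```python
import math
import random


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete(logits, rng):
    """Sample an action index and return it with its log-probability."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])


rng = random.Random(0)
probs = softmax([2.0, 1.0, 0.1])
action, log_prob = sample_discrete([2.0, 1.0, 0.1], rng)
```

The log-probability is what the policy-gradient loss multiplies by the advantage.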
Continuous Actions
neurenix/rl/algorithms.py:237
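For continuous actions, a common parameterization is a Gaussian whose mean and log standard deviation come from the actor. The sketch below assumes a one-dimensional diagonal Gaussian; the function names are hypothetical.

```python
import math
import random


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a 1-D Gaussian evaluated at `action`."""
    std = math.exp(log_std)
    return -0.5 * ((action - mean) / std) ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)


def sample_continuous(mean, log_std, rng):
    """Sample an action and return it with its log-probability."""
    action = rng.gauss(mean, math.exp(log_std))
    return action, gaussian_log_prob(action, mean, log_std)


rng = random.Random(0)
action, log_prob = sample_continuous(mean=0.0, log_std=-1.0, rng=rng)
peak = gaussian_log_prob(0.0, 0.0, 0.0)
```

For multi-dimensional actions with independent components, the per-dimension log-probabilities are summed.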
Proximal Policy Optimization (PPO)
PPO constrains policy updates to improve training stability.
Basic Usage
neurenix/rl/algorithms.py:367
Clipped Surrogate Objective
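The clipped surrogate objective takes the minimum of the unclipped and clipped probability-ratio terms, so the policy gains nothing from moving the ratio outside the interval [1 - eps, 1 + eps]. A minimal sketch of the standard formula, with illustrative names and a default `clip_eps` of 0.2 chosen as an assumption:

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A); to be maximized."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


objective = clipped_surrogate(
    new_log_probs=[math.log(1.5), math.log(0.5)],
    old_log_probs=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
```

In this example both samples hit the clip: a ratio of 1.5 with a positive advantage is capped at 1.2, and a ratio of 0.5 with a negative advantage is evaluated at 0.8.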
Generalized Advantage Estimation (GAE)
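GAE builds each advantage as an exponentially weighted sum of one-step TD errors, computed backwards through the rollout. The following sketch implements the standard recursion; the function name and the default values for `gamma` and `lam` are assumptions.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages


advantages = compute_gae(
    rewards=[1.0, 1.0],
    values=[0.5, 0.5],
    dones=[False, True],
    last_value=10.0,  # ignored here because the final step is terminal
    gamma=0.5,
    lam=0.5,
)
```

Setting `lam=0` reduces this to the one-step TD error, and `lam=1` to the discounted return minus the value baseline.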
Early Stopping
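A widely used early-stopping rule for PPO halts the update epochs once the approximate KL divergence between the old and new policy exceeds a threshold. Whether Neurenix uses this criterion is not stated on this page; the sketch below assumes it, and `approx_kl`, `run_epochs`, and `target_kl` are hypothetical names.

```python
def approx_kl(old_log_probs, new_log_probs):
    """Sample-based estimate of KL(old || new): mean(old_log_prob - new_log_prob)."""
    diffs = [old - new for old, new in zip(old_log_probs, new_log_probs)]
    return sum(diffs) / len(diffs)


def run_epochs(old_log_probs, log_probs_per_epoch, target_kl=0.01):
    """Return how many epochs complete before the KL threshold is exceeded.

    log_probs_per_epoch simulates the policy's log-probs after each epoch's update.
    """
    completed = 0
    for new_log_probs in log_probs_per_epoch:
        if approx_kl(old_log_probs, new_log_probs) > target_kl:
            break  # policy has drifted too far from the data-collecting policy
        completed += 1
    return completed


old = [-1.0, -1.0]
epochs = [[-1.0, -1.0], [-1.005, -1.005], [-1.05, -1.05], [-1.0, -1.0]]
completed = run_epochs(old, epochs, target_kl=0.01)
```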
Deep Deterministic Policy Gradient (DDPG)
DDPG learns a deterministic policy for continuous control.
Basic Usage
neurenix/rl/algorithms.py:441
Actor-Critic Architecture
Exploration Noise
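Because a deterministic policy does not explore on its own, noise is added to its actions during data collection. The sketch below assumes uncorrelated Gaussian noise followed by clipping to the action bounds; the original DDPG paper used an Ornstein-Uhlenbeck process instead, and this page does not show which one Neurenix applies. `noisy_action` is a hypothetical helper.

```python
import random


def noisy_action(action, noise_std, low, high, rng):
    """Add zero-mean Gaussian noise per dimension, then clip to [low, high]."""
    noisy = [a + rng.gauss(0.0, noise_std) for a in action]
    return [max(low, min(high, a)) for a in noisy]


rng = random.Random(0)
explored = noisy_action([0.9, -0.9], noise_std=0.5, low=-1.0, high=1.0, rng=rng)
```

Noise is normally disabled at evaluation time so that the learned policy is measured directly.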
Soft Target Updates
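A soft (Polyak) update moves each target parameter a small fraction `tau` toward the corresponding online parameter after every training step. A minimal sketch, with plain lists standing in for network weights and an assumed default of `tau=0.005`:

```python
def soft_update(target_params, online_params, tau=0.005):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]


target = soft_update(target_params=[0.0, 1.0], online_params=[1.0, 1.0], tau=0.5)
```

Small values of `tau` make the target networks change slowly, which stabilizes the bootstrapped critic targets.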
Loss Functions
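DDPG trains the critic on a squared TD error built from the target networks, and trains the actor to maximize the critic's value of its own actions. The sketch below writes both losses in their textbook form, using toy scalar functions in place of networks; all names are illustrative.

```python
def ddpg_losses(batch, actor, critic, target_actor, target_critic, gamma=0.99):
    """Return (critic_loss, actor_loss) averaged over the batch."""
    critic_loss = 0.0
    actor_loss = 0.0
    for s, a, r, s_next, done in batch:
        # Critic target uses the target actor and target critic
        target_q = target_critic(s_next, target_actor(s_next))
        y = r + gamma * (0.0 if done else target_q)
        critic_loss += (critic(s, a) - y) ** 2
        # Actor loss is the negated Q-value of the actor's own action
        actor_loss += -critic(s, actor(s))
    n = len(batch)
    return critic_loss / n, actor_loss / n


def toy_actor(s):
    return 0.5 * s


def toy_critic(s, a):
    return s + a


batch = [(1.0, 0.0, 1.0, 2.0, False), (2.0, 1.0, 0.0, 0.0, True)]
critic_loss, actor_loss = ddpg_losses(
    batch, toy_actor, toy_critic, toy_actor, toy_critic, gamma=0.5
)
```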
Soft Actor-Critic (SAC)
SAC learns a stochastic policy with maximum entropy for robust learning.
Basic Usage
neurenix/rl/algorithms.py:511
Maximum Entropy Framework
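In the maximum entropy setting, the value of a state includes an entropy bonus weighted by a temperature `alpha`. The sketch below illustrates this with a small discrete distribution purely to keep the arithmetic visible (SAC itself targets continuous actions); the helper names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_value(q_values, probs, alpha):
    """V(s) = E_a[Q(s, a)] + alpha * H(pi(. | s))."""
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + alpha * entropy(probs)


q_values = [1.0, 1.0]
v_uniform = soft_value(q_values, probs=[0.5, 0.5], alpha=0.2)
v_greedy = soft_value(q_values, probs=[1.0, 0.0], alpha=0.2)
```

When two actions are equally good, the uniform policy scores higher than the deterministic one, which is what keeps the learned policy stochastic.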
SAC maximizes both expected reward and policy entropy.
Twin Q-Networks
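The twin-critic target in SAC takes the smaller of two target Q-estimates and subtracts the entropy term. A minimal sketch of that standard target for a single transition, with illustrative names and default values chosen as assumptions:

```python
def sac_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """y = r + gamma * (min(Q1', Q2') - alpha * log pi(a' | s')) for non-terminal steps."""
    if done:
        return reward
    min_q = min(next_q1, next_q2)  # pessimistic estimate counters overestimation
    return reward + gamma * (min_q - alpha * next_log_prob)


y = sac_target(
    reward=1.0, done=False, next_q1=4.0, next_q2=2.0,
    next_log_prob=-1.0, gamma=0.5, alpha=0.5,
)
```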
SAC uses two Q-networks and takes the smaller of the two estimates to reduce overestimation.
Automatic Temperature Tuning
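Automatic temperature tuning adjusts `alpha` so that the policy's entropy tracks a target value. The sketch below assumes the common formulation that optimizes `log_alpha` with the loss `-log_alpha * mean(log_pi + target_entropy)` and applies one plain gradient step; the function name and learning rate are hypothetical.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on -log_alpha * mean(log_pi + target_entropy)."""
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_gap  # derivative of the loss with respect to log_alpha
    return log_alpha - lr * grad


# Policy entropy (about -2) is below the target (-1): temperature should rise
raised = update_log_alpha(0.0, log_probs=[2.0, 2.0], target_entropy=-1.0, lr=0.1)
# Policy entropy (about 0) is above the target (-1): temperature should fall
lowered = update_log_alpha(0.0, log_probs=[0.0, 0.0], target_entropy=-1.0, lr=0.1)
alpha_raised = math.exp(raised)
```

Optimizing `log_alpha` rather than `alpha` keeps the temperature positive without an explicit constraint.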
Training Tips
Hyperparameter Tuning
Monitoring Training
neurenix/rl/agent.py:106
Evaluation
Next Steps
Training
Master advanced training techniques
Policies
Learn about RL policies