The serve command deploys a trained Neurenix model as an API server, making it accessible for real-time inference.

Usage

neurenix serve --model <model_file> [options]

Options

Option        Type     Required  Default  Description
--model       string   Yes       -        Path to the model file (.nrx format)
--host        string   No        0.0.0.0  Host to bind the server to
--port        integer  No        8000     Port to bind the server to
--device      string   No        auto     Device for inference (cpu, cuda, auto)
--batch-size  integer  No        1        Batch size for inference
--workers     integer  No        1        Number of worker processes
--api-type    string   No        rest     API type (rest, websocket, grpc)
--config      string   No        None     API configuration file
--auth        flag     No        false    Enable authentication
--cors        flag     No        false    Enable CORS
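
When the server is started from a script rather than typed by hand, the options above map onto an argument list. The following is a minimal Python sketch: `build_serve_command` is a hypothetical helper (not part of Neurenix) and it uses only the flags listed in the table.

```python
import shlex


def build_serve_command(model, host=None, port=None, device=None,
                        batch_size=None, workers=None, api_type=None,
                        config=None, auth=False, cors=False):
    """Return an argv list for `neurenix serve` (hypothetical helper).

    Options left at None are omitted so the CLI defaults apply.
    """
    cmd = ["neurenix", "serve", "--model", model]
    valued = [
        ("--host", host),
        ("--port", port),
        ("--device", device),
        ("--batch-size", batch_size),
        ("--workers", workers),
        ("--api-type", api_type),
        ("--config", config),
    ]
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]
    # --auth and --cors are flags: present or absent, with no value.
    if auth:
        cmd.append("--auth")
    if cors:
        cmd.append("--cors")
    return cmd


cmd = build_serve_command("models/model.nrx", port=5000, device="cuda", auth=True)
print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd)` or `subprocess.Popen(cmd)` without going through a shell.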

API Types

REST API

Default HTTP REST API with JSON payloads:
neurenix serve --model models/model.nrx --api-type rest
Endpoints:
  • POST /predict - Make predictions
  • GET /info - Get model information
  • GET /health - Health check endpoint

WebSocket API

Real-time bidirectional communication:
neurenix serve --model models/model.nrx --api-type websocket
Connection:
  • WebSocket ws://host:port/ws

gRPC API

High-performance RPC:
neurenix serve --model models/model.nrx --api-type grpc
Connection:
  • gRPC endpoint at host:port
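
The three API types differ in the endpoints they expose. As a compact summary, the Python sketch below returns the endpoint lines the server prints at startup for each type. `endpoints_for` is a hypothetical helper written for illustration. Note that `0.0.0.0` is a bind address, so a client would normally substitute `localhost` or the machine's address.

```python
def endpoints_for(api_type, host="0.0.0.0", port=8000):
    """List the endpoints reported at startup for a given API type."""
    if api_type == "rest":
        base = f"http://{host}:{port}"
        return [
            f"POST {base}/predict",
            f"GET {base}/info",
            f"GET {base}/health",
        ]
    if api_type == "websocket":
        return [f"WebSocket ws://{host}:{port}/ws"]
    if api_type == "grpc":
        return [f"gRPC {host}:{port}"]
    raise ValueError(f"unknown API type: {api_type!r}")


for line in endpoints_for("rest", "127.0.0.1", 5000):
    print(line)
```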

Examples

Basic server

neurenix serve --model models/model.nrx
Loading model from models/model.nrx...
Creating REST API server...
Starting server on 0.0.0.0:8000...

API Endpoints:
  POST http://0.0.0.0:8000/predict
  GET  http://0.0.0.0:8000/info
  GET  http://0.0.0.0:8000/health

Press Ctrl+C to stop the server.

Custom host and port

neurenix serve \
  --model models/model.nrx \
  --host 127.0.0.1 \
  --port 5000
Loading model from models/model.nrx...
Creating REST API server...
Starting server on 127.0.0.1:5000...

API Endpoints:
  POST http://127.0.0.1:5000/predict
  GET  http://127.0.0.1:5000/info
  GET  http://127.0.0.1:5000/health

Press Ctrl+C to stop the server.

GPU inference

neurenix serve \
  --model models/model.nrx \
  --device cuda \
  --batch-size 32

Multiple workers

neurenix serve \
  --model models/model.nrx \
  --workers 4

WebSocket server

neurenix serve \
  --model models/model.nrx \
  --api-type websocket \
  --port 8080
Loading model from models/model.nrx...
Creating WEBSOCKET API server...
Starting server on 0.0.0.0:8080...

API Endpoints:
  WebSocket ws://0.0.0.0:8080/ws

Press Ctrl+C to stop the server.

gRPC server

neurenix serve \
  --model models/model.nrx \
  --api-type grpc \
  --port 50051
Loading model from models/model.nrx...
Creating GRPC API server...
Starting server on 0.0.0.0:50051...

API Endpoints:
  gRPC 0.0.0.0:50051
  Use the generated client to connect to the server.

Press Ctrl+C to stop the server.

Enable authentication

neurenix serve \
  --model models/model.nrx \
  --auth

Enable CORS

neurenix serve \
  --model models/model.nrx \
  --cors

Custom configuration

neurenix serve \
  --model models/model.nrx \
  --config server_config.json
server_config.json:
{
  "timeout": 30,
  "max_request_size": 10485760,
  "rate_limit": {
    "requests_per_minute": 60
  },
  "logging": {
    "level": "INFO",
    "file": "logs/server.log"
  }
}
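
The configuration file can also be generated from a script, which makes the size arithmetic explicit. This Python sketch reproduces the example above. The keys come from that example, but the units are an assumption: the documentation does not state them, and reading `max_request_size` as bytes (10 MiB) and `timeout` as seconds is only a plausible interpretation. `write_config` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

config = {
    "timeout": 30,  # assumed to be seconds
    "max_request_size": 10 * 1024 * 1024,  # 10485760; assumed to be bytes
    "rate_limit": {"requests_per_minute": 60},
    "logging": {"level": "INFO", "file": "logs/server.log"},
}


def write_config(path, cfg):
    """Write cfg as indented JSON and return what is read back from disk."""
    path = Path(path)
    path.write_text(json.dumps(cfg, indent=2) + "\n", encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))


# Demo: write to a temporary directory so no project file is touched.
with tempfile.TemporaryDirectory() as tmp:
    loaded = write_config(Path(tmp) / "server_config.json", config)
```

In a real setup the call would be `write_config("server_config.json", config)`, followed by `neurenix serve --config server_config.json`.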

Production deployment

neurenix serve \
  --model models/production.nrx \
  --host 0.0.0.0 \
  --port 8000 \
  --device cuda \
  --workers 8 \
  --batch-size 64 \
  --auth \
  --cors \
  --config production_config.json

Making Requests

REST API

Predict endpoint

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[1.0, 2.0, 3.0, 4.0]]}'
Response:
{
  "predictions": [0.85, 0.12, 0.03],
  "inference_time": 0.023
}

Info endpoint

curl http://localhost:8000/info
Response:
{
  "model": "models/model.nrx",
  "version": "1.0.0",
  "input_shape": [4],
  "output_shape": [3],
  "device": "cuda:0"
}

Health endpoint

curl http://localhost:8000/health
Response:
{
  "status": "healthy",
  "uptime": 3600,
  "requests_served": 1523
}

WebSocket API

const ws = new WebSocket('ws://localhost:8080/ws');

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'predict',
    data: [[1.0, 2.0, 3.0, 4.0]]
  }));
};

ws.onmessage = (event) => {
  const response = JSON.parse(event.data);
  console.log('Prediction:', response.predictions);
};
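
The same exchange can be sketched in Python. The message shape (`type` and `data` fields in the request, `predictions` in the reply) is taken from the JavaScript example above. Everything else is an assumption: the helper names are hypothetical, and `predict_once` relies on the third-party `websockets` package, which is imported lazily so the encoding helpers work without it.

```python
import json


def build_predict_message(inputs):
    """Encode a predict request in the shape used by the JavaScript example."""
    return json.dumps({"type": "predict", "data": inputs})


def parse_prediction(raw):
    """Decode a server message and return its `predictions` field."""
    message = json.loads(raw)
    if "predictions" not in message:
        raise ValueError(f"unexpected message: {message!r}")
    return message["predictions"]


async def predict_once(uri, inputs):
    """Send one request over a WebSocket and wait for one reply.

    Assumes the third-party `websockets` package is installed.
    """
    import websockets

    async with websockets.connect(uri) as ws:
        await ws.send(build_predict_message(inputs))
        return parse_prediction(await ws.recv())
```

With a server running, a call would look like `asyncio.run(predict_once("ws://localhost:8080/ws", [[1.0, 2.0, 3.0, 4.0]]))`.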

Python Client

import requests

response = requests.post(
    'http://localhost:8000/predict',
    json={'inputs': [[1.0, 2.0, 3.0, 4.0]]}
)

print(response.json())
# {'predictions': [0.85, 0.12, 0.03], 'inference_time': 0.023}
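
The requests example above has no timeout and depends on a third-party package. Where that matters, a standard-library variant might look like the sketch below. The function names and the 5-second timeout are illustrative choices, not values prescribed by Neurenix.

```python
import json
import urllib.request


def build_predict_request(base_url, inputs):
    """Build a POST request for /predict with a JSON body."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, inputs, timeout=5.0):
    """POST to /predict and return the `predictions` list.

    Raises urllib.error.URLError on connection problems and
    urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_predict_request(base_url, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["predictions"]
```

With a server running, `predict("http://localhost:8000", [[1.0, 2.0, 3.0, 4.0]])` sends the same request as the curl example.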

Error Handling

Model not found

neurenix serve --model nonexistent.nrx
Error: Model file 'nonexistent.nrx' not found.

Port already in use

neurenix serve --model models/model.nrx --port 8000
Error serving model: Address already in use
Solution: Use a different port:
neurenix serve --model models/model.nrx --port 8001
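
A launcher script can look for a free port before starting the server instead of reacting to the error. The Python sketch below uses only the standard `socket` module. The helper names are hypothetical, and the check is inherently racy: another process can still take the port between the check and the server start.

```python
import socket


def is_port_free(host, port):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(host, start, attempts=20):
    """Return the first free port at or after `start`, or None if none found."""
    for port in range(start, min(start + attempts, 65536)):
        if is_port_free(host, port):
            return port
    return None


port = first_free_port("127.0.0.1", 8000)
print(f"neurenix serve --model models/model.nrx --port {port}")
```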

Performance Tuning

Batch Size

Increase batch size for higher throughput:
neurenix serve \
  --model models/model.nrx \
  --batch-size 64  # Process up to 64 requests at once

Workers

Increase the number of workers to handle more concurrent requests:
neurenix serve \
  --model models/model.nrx \
  --workers 8  # 8 worker processes

GPU Acceleration

neurenix serve \
  --model models/model.nrx \
  --device cuda  # Use GPU for inference
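
The best batch size and worker count depend on the model and hardware, so it helps to measure before and after changing them. The following Python sketch is one way to do that and is not a Neurenix tool: `measure_throughput` is a hypothetical helper that times any callable under a given level of concurrency. `fake_request` is a stand-in so the sketch runs without a server. In practice it would be replaced by a function that posts to `/predict`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(call, total_requests, concurrency):
    """Run call() total_requests times across `concurrency` threads.

    Returns (requests_per_second, mean_latency_seconds).
    """
    def timed(_):
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(total_requests)))
    wall = time.perf_counter() - wall_start
    return total_requests / wall, sum(latencies) / len(latencies)


def fake_request():
    """Stand-in for a real request; sleeps for roughly 10 ms."""
    time.sleep(0.01)


rps, mean_latency = measure_throughput(fake_request, total_requests=40, concurrency=8)
print(f"{rps:.1f} req/s, mean latency {mean_latency * 1000:.1f} ms")
```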

Best Practices

1. Use production-ready configuration

neurenix serve \
  --model models/model.nrx \
  --config production_config.json \
  --workers 8 \
  --auth \
  --cors

2. Monitor server health

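# In a monitoring script: poll the health endpoint once a minute
while true; do
  # -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors
  curl -fsS http://localhost:8000/health || echo "health check failed"
  sleep 60
done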
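
A monitoring script can also inspect the response body rather than just fetching it. The Python sketch below treats the server as healthy only when `/health` answers and reports `"status": "healthy"`, as in the example response shown earlier. The function names, the 5-second timeout, and the log format are illustrative assumptions.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(payload):
    """Interpret a decoded /health response body."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(url, timeout=5.0):
    """Return True only if the endpoint answers and reports healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False


def monitor(url, interval=60, checks=None):
    """Poll every `interval` seconds; checks=None means run until interrupted."""
    done = 0
    while checks is None or done < checks:
        state = "healthy" if check_health(url) else "UNHEALTHY"
        print(f"{time.strftime('%H:%M:%S')} {url} {state}")
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)
```

Calling `monitor("http://localhost:8000/health")` polls once a minute, like a shell loop built around curl.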

3. Use a reverse proxy in production

# nginx configuration
upstream neurenix {
    server localhost:8000;
}

server {
    listen 80;
    server_name api.example.com;
    
    location / {
        proxy_pass http://neurenix;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

4. Set resource limits

# Limit memory and CPU
docker run -m 4g --cpus=2 \
  neurenix-server \
  neurenix serve --model /models/model.nrx

5. Enable logging

Create a config file with logging:
{
  "logging": {
    "level": "INFO",
    "file": "logs/server.log",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  }
}

6. Implement graceful shutdown

The server handles SIGINT and SIGTERM for graceful shutdown:
# Start server
neurenix serve --model models/model.nrx &
PID=$!

# Stop gracefully
kill -SIGTERM $PID
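
When the server is supervised from Python, the same pattern can be written with `subprocess`. This is a sketch under assumptions: `stop_gracefully` is a hypothetical helper, the 10-second grace period is arbitrary (the documentation does not say how long shutdown takes), and a sleeping child process stands in for the server so the example runs anywhere.

```python
import subprocess
import sys


def stop_gracefully(proc, timeout=10.0):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


# Stand-in child process for demonstration; in practice this would be
# subprocess.Popen(["neurenix", "serve", "--model", "models/model.nrx"]).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_gracefully(child)
print("child exited with", exit_code)
```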

Deployment Scenarios

Local Development

neurenix serve \
  --model models/model.nrx \
  --host 127.0.0.1 \
  --port 8000

Docker Container

Dockerfile:
FROM python:3.10
RUN pip install neurenix
COPY models/model.nrx /app/model.nrx
CMD ["neurenix", "serve", "--model", "/app/model.nrx", "--host", "0.0.0.0"]
Build and run the image:
docker build -t neurenix-server .
docker run -p 8000:8000 neurenix-server

Kubernetes

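apiVersion: apps/v1
kind: Deployment
metadata:
  name: neurenix-server
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: neurenix-server
  template:
    metadata:
      labels:
        app: neurenix-server
    spec:
      containers:
      - name: neurenix
        image: neurenix-server:latest
        command: ["neurenix", "serve"]
        args:
          - "--model"
          - "/models/model.nrx"
          - "--workers"
          - "4"
        ports:
        - containerPort: 8000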

Cloud Deployment

# AWS EC2, Google Cloud, Azure, etc.
neurenix serve \
  --model models/model.nrx \
  --host 0.0.0.0 \
  --port 8000 \
  --workers 8 \
  --device cuda \
  --auth

Stopping the Server

Press Ctrl+C to stop the server gracefully:
^C
Stopping server...
Server stopped.
