The serve command deploys a trained Neurenix model as an API server for real-time inference.
Usage
Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| --model | string | Yes | - | Path to the model file (.nrx format) |
| --host | string | No | 0.0.0.0 | Host address to bind the server to |
| --port | integer | No | 8000 | Port to bind the server to |
| --device | string | No | auto | Device for inference (cpu, cuda, auto) |
| --batch-size | integer | No | 1 | Batch size for inference |
| --workers | integer | No | 1 | Number of worker processes |
| --api-type | string | No | rest | API type (rest, websocket, grpc) |
| --config | string | No | None | Path to an API configuration file |
| --auth | flag | No | false | Enable authentication |
| --cors | flag | No | false | Enable CORS (cross-origin resource sharing) |
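
If you launch the server from Python (for example from a deployment script), a small helper can check option values against the table above before the process starts. This is a sketch under stated assumptions: the CLI entry point is assumed to be invoked as neurenix serve, the name build_serve_args is hypothetical and not part of Neurenix, and rejecting paths without the .nrx extension may be stricter than the CLI itself.

```python
"""Build an argv list for the serve command from the documented options."""
from pathlib import Path

EXECUTABLE = ["neurenix", "serve"]  # assumed entry point; adjust for your install

# Allowed values copied from the Options table.
DEVICES = {"cpu", "cuda", "auto"}
API_TYPES = {"rest", "websocket", "grpc"}


def build_serve_args(model, host="0.0.0.0", port=8000, device="auto",
                     batch_size=1, workers=1, api_type="rest",
                     config=None, auth=False, cors=False):
    """Validate option values and return an argv list for subprocess."""
    if Path(model).suffix != ".nrx":
        raise ValueError(f"--model must point to a .nrx file, got {model!r}")
    if device not in DEVICES:
        raise ValueError(f"--device must be one of {sorted(DEVICES)}")
    if api_type not in API_TYPES:
        raise ValueError(f"--api-type must be one of {sorted(API_TYPES)}")
    if not 1 <= port <= 65535:
        raise ValueError("--port must be between 1 and 65535")
    if batch_size < 1 or workers < 1:
        raise ValueError("--batch-size and --workers must be positive")

    args = EXECUTABLE + ["--model", str(model)]
    # Emit only options that differ from the documented defaults,
    # so the CLI's own defaults stay authoritative.
    if host != "0.0.0.0":
        args += ["--host", host]
    if port != 8000:
        args += ["--port", str(port)]
    if device != "auto":
        args += ["--device", device]
    if batch_size != 1:
        args += ["--batch-size", str(batch_size)]
    if workers != 1:
        args += ["--workers", str(workers)]
    if api_type != "rest":
        args += ["--api-type", api_type]
    if config is not None:
        args += ["--config", str(config)]
    if auth:
        args.append("--auth")
    if cors:
        args.append("--cors")
    return args
```

For example, subprocess.run(build_serve_args("model.nrx", device="cuda", workers=4)) would start a GPU-backed server with four worker processes.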
API Types
REST API
Default HTTP REST API with JSON payloads:
- POST /predict - Make predictions
- GET /info - Get model information
- GET /health - Health check endpoint
WebSocket API
Real-time bidirectional communication:
- WebSocket endpoint at ws://host:port/ws
gRPC API
High-performance RPC:
- gRPC endpoint at host:port
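Whichever --api-type the server is started with, clients need a concrete address to connect to. The default host 0.0.0.0 means "listen on all interfaces" and is not a usable destination, so local clients connect through the loopback address instead. The helper below is a hypothetical sketch, not part of Neurenix; it derives the client-side address for each API type from the --host and --port values, using the paths listed above.

```python
def client_addresses(host="0.0.0.0", port=8000):
    """Map the server's --host/--port to client-side addresses per API type."""
    if host == "0.0.0.0":
        host = "127.0.0.1"      # IPv4 wildcard -> IPv4 loopback
    elif host == "::":
        host = "[::1]"          # IPv6 wildcard -> IPv6 loopback
    elif ":" in host and not host.startswith("["):
        host = f"[{host}]"      # IPv6 literals need brackets in URLs
    return {
        "rest": f"http://{host}:{port}",        # POST /predict, GET /info, GET /health
        "websocket": f"ws://{host}:{port}/ws",
        "grpc": f"{host}:{port}",
    }
```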
Examples
Basic server
Custom host and port
GPU inference
Multiple workers
WebSocket server
gRPC server
Enable authentication
Enable CORS
Custom configuration
Production deployment
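The example commands under these headings are not reproduced here. As a stand-in, the sketch below lists one plausible invocation per heading, built only from flags in the Options table. The entry point neurenix serve, the file names model.nrx and api_config.json, and the numeric values (port 9000, 4 workers, batch size 32) are placeholders chosen for illustration, not documented defaults or recommendations.

```python
import shlex

BASE = ["neurenix", "serve", "--model", "model.nrx"]  # assumed entry point and file name

# One argv list per example heading, using only flags from the Options table.
EXAMPLES = {
    "Basic server": BASE,
    "Custom host and port": BASE + ["--host", "127.0.0.1", "--port", "9000"],
    "GPU inference": BASE + ["--device", "cuda"],
    "Multiple workers": BASE + ["--workers", "4"],
    "WebSocket server": BASE + ["--api-type", "websocket"],
    "gRPC server": BASE + ["--api-type", "grpc"],
    "Enable authentication": BASE + ["--auth"],
    "Enable CORS": BASE + ["--cors"],
    "Custom configuration": BASE + ["--config", "api_config.json"],
    "Production deployment": BASE + [
        "--device", "cuda", "--workers", "4", "--batch-size", "32",
        "--auth", "--config", "api_config.json",
    ],
}

if __name__ == "__main__":
    # Print each example as a shell-ready command line.
    for name, argv in EXAMPLES.items():
        print(f"# {name}\n{shlex.join(argv)}\n")
```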
Making Requests
REST API
Predict endpoint
Info endpoint
Health endpoint
WebSocket API
Python Client
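A minimal Python client for the three REST endpoints can be written with the standard library alone. This is a sketch, not the official Neurenix client: the class name ServeClient is hypothetical, and the request body {"inputs": ...} is an assumed placeholder because the /predict schema is not specified here, so adjust it to your model's expected input format. The transport parameter exists only so the client can be exercised without a running server.

```python
import json
import urllib.request


class ServeClient:
    """Minimal client for POST /predict, GET /info and GET /health.

    Assumption: request and response bodies are JSON; the {"inputs": ...}
    payload shape is a placeholder, not a documented schema.
    """

    def __init__(self, base_url="http://127.0.0.1:8000", headers=None,
                 timeout=10.0, transport=None):
        self.base_url = base_url.rstrip("/")
        self.headers = dict(headers or {})  # e.g. credentials when --auth is on
        self.timeout = timeout
        # transport(request, timeout) -> response bytes; swap in a fake for tests.
        self._transport = transport or self._urlopen

    @staticmethod
    def _urlopen(request, timeout):
        # Raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()

    def _call(self, method, path, payload=None):
        data = None
        headers = {"Accept": "application/json", **self.headers}
        if payload is not None:
            data = json.dumps(payload).encode("utf-8")
            headers["Content-Type"] = "application/json"
        request = urllib.request.Request(
            self.base_url + path, data=data, headers=headers, method=method)
        body = self._transport(request, self.timeout)
        return json.loads(body) if body else None

    def predict(self, inputs):
        return self._call("POST", "/predict", {"inputs": inputs})

    def info(self):
        return self._call("GET", "/info")

    def health(self):
        return self._call("GET", "/health")
```

With a server running locally, ServeClient().health() calls GET /health and ServeClient().predict([[0.1, 0.2]]) calls POST /predict. If the server was started with --auth, pass the required credentials through headers; the header name and scheme depend on your configuration.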
Error Handling
Model not found
Port already in use
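Both of these errors can be caught before the server is launched. The preflight function below is a hypothetical helper, not a Neurenix API. It checks that the model path exists and has the .nrx extension, then tries to bind the requested host and port itself: if that bind fails, the server's own bind would fail the same way.

```python
import errno
import socket
from pathlib import Path


def preflight(model_path, host="0.0.0.0", port=8000):
    """Return a list of problems that would stop the server from starting."""
    problems = []
    path = Path(model_path)
    if not path.is_file():
        problems.append(f"model file not found: {path}")
    elif path.suffix != ".nrx":
        problems.append(f"expected a .nrx model file, got {path.name}")

    # Try to bind the port ourselves; the socket is closed right afterwards.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                problems.append(f"port {port} is already in use on {host}")
            else:
                problems.append(f"cannot bind {host}:{port}: {exc}")
    return problems
```

The port check is inherently racy, since another process can take the port between the check and the server start, so treat it as a diagnostic rather than a guarantee. To resolve a conflict, stop the other process or pick a different --port.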
Performance Tuning
Batch Size
Increase the batch size for higher throughput.
Workers
Increase the number of workers to handle more concurrent requests.
GPU Acceleration
Run inference on a GPU with --device cuda.
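Whether a larger --batch-size, more --workers, or --device cuda actually helps depends on the model and the hardware, so it is worth measuring each configuration under the same load. The sketch below is a generic load generator, not a Neurenix tool: predict stands for any callable that performs one request (for instance a function that posts to /predict), and measure_throughput is a hypothetical name. It reports requests per second and the 95th-percentile latency.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(predict, payloads, concurrency=8):
    """Run predict(payload) for every payload using `concurrency` threads.

    Returns (requests_per_second, p95_latency_seconds).
    """
    payloads = list(payloads)
    if not payloads:
        raise ValueError("need at least one payload")

    def timed(payload):
        start = time.perf_counter()
        predict(payload)  # one request; exceptions propagate to the caller
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, payloads))
    wall = time.perf_counter() - wall_start

    # Nearest-rank 95th percentile.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return len(latencies) / wall, p95
```

Run it against servers started with different options and compare the results. Use a concurrency level similar to your expected traffic: a single sequential client keeps only one request in flight, so it cannot show the benefit of extra workers.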
Best Practices
1. Use production-ready configuration
2. Monitor server health
3. Use a reverse proxy in production
4. Set resource limits
5. Enable logging
Create a configuration file that enables logging and pass it to the server with --config.
6. Implement graceful shutdown
The server handles SIGINT and SIGTERM for graceful shutdown.
Deployment Scenarios
Local Development
Docker Container
Kubernetes
Cloud Deployment
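In each of these scenarios (local, Docker, Kubernetes, or a cloud platform), start-up scripts and orchestrators typically wait until the server is ready before sending it traffic. A minimal polling sketch against the documented GET /health endpoint follows; treating HTTP 200 as healthy is an assumption, and both function names are hypothetical.

```python
import time
import urllib.request


def http_health_check(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health",
                                    timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(check, timeout=60.0, interval=1.0):
    """Poll check() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until_healthy(lambda: http_health_check("http://127.0.0.1:8000")) blocks until the server responds or a minute passes, which makes it usable as a container start-up gate or a smoke test after a cloud deployment.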
Stopping the Server
Press Ctrl+C to stop the server gracefully.
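Ctrl+C delivers SIGINT to the foreground process. Scripts and process managers usually send SIGTERM instead, which the server also handles as a graceful shutdown. The sketch below is a hypothetical helper that assumes POSIX signal semantics: it stops a server launched with subprocess, allows a grace period, and only then forces termination with SIGKILL.

```python
import signal
import subprocess


def stop_server(proc, grace_seconds=30.0):
    """Ask a server process to shut down gracefully, escalating if needed.

    SIGTERM lets the server run its shutdown handling; SIGKILL is a last
    resort that cannot be caught. Returns the process's exit code.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```

Usage, assuming the entry point is neurenix serve: proc = subprocess.Popen(["neurenix", "serve", "--model", "model.nrx"]), then later stop_server(proc). On POSIX, a process killed by a signal reports a negative return code (for example -15 for SIGTERM); a server that catches SIGTERM and exits on its own may instead return 0.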
See Also
- Export command - Export models for deployment
- Eval command - Evaluate model performance
- Run command - Train models
- Optimize command - Optimize models