OpenTela
OpenTela is a decentralized compute orchestration platform that manages distributed GPU resources across supercomputing environments without requiring a central coordinator. Built on peer-to-peer networking and CRDT-based state management, it enables researchers to deploy interactive ML serving systems like vLLM directly on HPC clusters using only user-space permissions, bridging batch schedulers with interactive applications.
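To show why CRDT-based state removes the need for a central coordinator, here is a minimal sketch of one common CRDT, a state-based last-writer-wins (LWW) map. Each peer could use a map like this to record how many idle GPUs each node advertises. This is not OpenTela's actual data model. `Entry`, `LWWMap`, `free_gpus`, and the integer logical timestamps are hypothetical names chosen for illustration. The key property is that `merge` is commutative, associative, and idempotent, so replicas that exchange state in any order, any number of times, converge to the same view.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Entry:
    """One advertised value plus the metadata used to resolve conflicts (hypothetical schema)."""
    timestamp: int  # logical clock value supplied by the writer (assumption)
    writer: str     # peer ID, used as a deterministic tie-breaker
    free_gpus: int  # hypothetical payload: GPUs this node reports as idle

    def key(self) -> Tuple[int, str]:
        # Total order on writes: higher timestamp wins, writer ID breaks ties.
        return (self.timestamp, self.writer)


@dataclass
class LWWMap:
    """State-based last-writer-wins map: node ID -> latest Entry."""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def set(self, node: str, entry: Entry) -> None:
        # Keep the incoming entry only if it is strictly newer than the stored one.
        current = self.entries.get(node)
        if current is None or entry.key() > current.key():
            self.entries[node] = entry

    def merge(self, other: "LWWMap") -> "LWWMap":
        # Join of two replicas: for each node, keep the entry with the larger key.
        result = LWWMap(dict(self.entries))
        for node, entry in other.entries.items():
            result.set(node, entry)
        return result

    def view(self) -> Dict[str, int]:
        # Plain node -> idle-GPU view, sorted by node ID for stable output.
        return {node: self.entries[node].free_gpus for node in sorted(self.entries)}


if __name__ == "__main__":
    a, b = LWWMap(), LWWMap()
    a.set("node-1", Entry(1, "peer-a", 4))
    b.set("node-1", Entry(2, "peer-b", 2))  # newer report for node-1
    b.set("node-2", Entry(1, "peer-b", 8))
    # Both merge orders converge without any coordinator.
    assert a.merge(b).view() == b.merge(a).view()
    print(a.merge(b).view())  # {'node-1': 2, 'node-2': 8}
```

LWW resolution is simple but silently discards the older of two concurrent writes and depends on sensible timestamps; a real system tracking dynamic membership might instead use observed-remove sets or version vectors.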
- ✓Innovative CRDT-based approach to distributed state management eliminates single points of failure in HPC orchestration
- ✓User-space overlay design allows deployment on restrictive supercomputing environments without kernel modifications or root access
- ✓Real-world deployment powering SwissAI Serving is concrete evidence of production readiness
- →Add comprehensive test coverage and CI/CD pipeline details to demonstrate code quality and reliability
- →Include performance benchmarks and scalability metrics to quantify the system's capabilities under load