NVIDIA has announced that it is acquiring Run:ai, an Israeli startup that created a Kubernetes-based GPU orchestrator. Although the price was not disclosed, reports put the deal at between $700 million and $1 billion.
The acquisition of Run:ai underscores the growing importance of Kubernetes in the age of generative artificial intelligence, cementing Kubernetes as the de facto standard for managing GPU-based accelerated computing infrastructure.
Run:ai is a Tel Aviv-based AI infrastructure startup founded in 2018 by Omri Geller (CEO) and Dr. Ronen Dar (CTO). It has built an orchestration and virtualization platform tailored to the specific requirements of AI workloads running on GPUs, which efficiently pools and shares resources. Tiger Global Management and Insight Partners led a $75 million Series C round in March 2022, bringing the company’s total funding to $118 million.
What problem does Run:ai solve?
Unlike CPUs, GPUs cannot easily be virtualized so that multiple workloads can use them simultaneously. Hypervisors such as VMware vSphere and KVM have long emulated multiple virtual CPUs from a single physical processor, giving workloads the illusion that they were running on a dedicated CPU. GPUs, by contrast, cannot be efficiently shared across multiple machine learning tasks such as training and inference. A researcher cannot, for example, devote half of a GPU to training and experimentation while the other half serves other machine learning work. Likewise, GPUs cannot easily be aggregated to make better use of available resources. This poses a huge challenge for enterprises running GPU-based workloads in the cloud or on premises.
The same limitation extends to containers and Kubernetes. If a container requests a GPU, it effectively consumes 100% of that GPU even when it does not use its full capacity. The ongoing shortage of GPUs and AI accelerators exacerbates the problem.
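In stock Kubernetes, GPUs are exposed (via NVIDIA's device plugin) as the extended resource `nvidia.com/gpu`, which only accepts whole integers — there is no built-in way to request half a GPU. A minimal pod spec illustrating this all-or-nothing allocation (the container image is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.03-py3  # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1  # whole GPUs only; a fractional value would be rejected
```

Even if the training process inside this container keeps the GPU mostly idle, no other pod can be scheduled onto that device.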
Run:ai saw an opportunity to solve this problem effectively. Using Kubernetes primitives and its proven scheduling mechanisms, the company built a layer that lets enterprises consume only a fraction of an available GPU or aggregate multiple GPUs. The result is better GPU utilization and better economics.
Here are 5 key features of the Run:ai platform:
- Orchestration and virtualization software layer tailored to AI workloads running on GPUs and other chipsets. This allows efficient pooling and sharing of GPU computing resources.
- Integration with Kubernetes for container orchestration. Run:ai’s platform is built on Kubernetes and supports all Kubernetes variants. It also integrates with third-party AI tools and frameworks.
- Central interface for managing shared computing infrastructure. Users can manage clusters, pool GPUs, and allocate computing power for various tasks through the Run:ai interface.
- Dynamic scheduling, GPU pooling and GPU fractionation for maximum performance. Run:ai’s software allows splitting GPUs into fractions and dynamically allocating them to optimize utilization.
- Integration with NVIDIA’s AI stack, including DGX systems, Base Command, NGC containers, and AI Enterprise software. Run:ai worked closely with NVIDIA to deliver a full-stack solution.
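As a sketch of how GPU fractionation surfaces to users: with Run:ai installed, a pod can request a fraction of a GPU through an annotation instead of the integer `nvidia.com/gpu` resource, and Run:ai’s scheduler packs such pods onto shared devices. The annotation key and scheduler name below follow Run:ai’s public documentation and may differ between versions; the image is illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-job
  annotations:
    gpu-fraction: "0.5"  # request half a GPU (Run:ai-specific annotation)
spec:
  schedulerName: runai-scheduler  # hand scheduling decisions to Run:ai
  containers:
    - name: server
      image: nvcr.io/nvidia/tritonserver:24.03-py3  # illustrative image
```

Two such pods can then share a single physical GPU, which is exactly the utilization gain the platform is built around.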
Notably, Run:ai is not an open source solution, even though it is based on Kubernetes. It provides customers with proprietary software to be deployed on Kubernetes clusters along with a SaaS-based management application.
Why did NVIDIA acquire Run:ai?
NVIDIA’s acquisition of Run:ai strategically positions the company to strengthen its leadership in the fields of artificial intelligence and machine learning, especially in the context of optimizing the use of GPUs for these technologies. Here are the main reasons why NVIDIA pursued this acquisition:
Improved orchestration and GPU management: Run:ai’s advanced orchestration tools are crucial for more efficient management of GPU resources. This capability is critical as demand for AI and machine learning solutions continues to grow, requiring more sophisticated management of hardware resources to ensure optimal performance and utilization.
Integration with NVIDIA’s existing AI ecosystem: By acquiring Run:ai, NVIDIA can integrate this technology into its existing suite of artificial intelligence and machine learning products. This strengthens NVIDIA’s overall product offerings, enabling better service to customers who rely on the NVIDIA ecosystem for their AI infrastructure needs. NVIDIA HGX, DGX, and DGX Cloud customers will gain access to Run:ai’s capabilities for their AI workloads.
Expanding market reach: Run:ai’s established relationships with key players in the artificial intelligence space, including their previous integration with NVIDIA technologies, provide NVIDIA with expanded market reach and the ability to serve a wider range of customers. This is especially valuable in sectors that are rapidly adopting AI technologies, but face challenges in resource management and scalability.
Innovation and Research Development: The acquisition allows NVIDIA to leverage the innovative capabilities of the Run:ai team, known for its pioneering work in virtualization and GPU management. This could lead to further advances in GPU technology and orchestration, keeping NVIDIA at the forefront of technological advancements in artificial intelligence.
Competitive advantage in a growing market: As enterprises increase their investments in artificial intelligence and machine learning, efficient GPU management becomes a competitive advantage. NVIDIA’s acquisition of Run:ai ensures that it remains competitive against other tech giants entering the AI hardware and orchestration space.
With the acquisition of Run:ai, NVIDIA not only enhances its product capabilities, but also solidifies its position as a leader in the AI infrastructure market, ensuring it stays ahead of the curve on technological innovations and market demands.
What does this mean for the Kubernetes and Cloud Native ecosystem?
NVIDIA’s acquisition of Run:ai is important to the Kubernetes and cloud-native ecosystems for several reasons:
Improved GPU orchestration in Kubernetes: Integrating Run:ai’s advanced GPU management and virtualization capabilities into Kubernetes will enable more dynamic allocation and efficient use of GPU resources in AI workloads. This aligns with Kubernetes’ capabilities in handling complex, resource-intensive applications, particularly in artificial intelligence and machine learning, where efficient resource management is critical.
Advances in Cloud-Native AI Infrastructure: By leveraging Run:ai’s technology, NVIDIA can further improve the Kubernetes ecosystem’s ability to support high-performance computing (HPC) and artificial intelligence workloads. This synergy between NVIDIA’s GPU technology and Kubernetes will likely lead to more powerful solutions for deploying, managing, and scaling AI applications in cloud-native environments.
Wider adoption and innovation: The acquisition could lead to wider adoption of Kubernetes in sectors increasingly dependent on artificial intelligence, such as healthcare, automotive and finance. The ability to efficiently manage GPU resources in these areas can lead to faster innovation and development cycles for AI models.
Impact on Kubernetes Maturity: The integration of NVIDIA and Run:ai technologies with Kubernetes highlights the platform’s maturity and readiness to support advanced AI workloads, cementing Kubernetes as the de facto system for modern AI and ML deployments. This could also encourage more organizations to adopt Kubernetes for their AI infrastructure needs.
NVIDIA’s move to acquire Run:ai not only strengthens its position in the AI and cloud computing markets, but also enhances the Kubernetes ecosystem’s ability to support the next generation of AI applications, benefiting a wide range of industries.