Cloudflare, the leading content delivery network and cloud security platform, wants to make artificial intelligence accessible to developers. It has added GPU-powered infrastructure and model-serving capabilities to its network edge, bringing state-of-the-art foundational models to the masses. Any developer can leverage Cloudflare’s AI platform with a simple REST API call.
Cloudflare introduced Workers, a serverless computing platform at the edge, in 2017. Developers can use this serverless platform to build JavaScript Service Workers that run directly at Cloudflare’s edge locations around the world. With a Worker, a developer can modify a website’s HTTP requests and responses, make parallel requests, and even respond directly from the edge. Cloudflare Workers uses an API similar to the W3C Service Workers standard.
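The request/response model described above can be sketched as a minimal module-style Worker. This is an illustrative sketch: the route path and response bodies are invented, and in a deployed Worker the handler object would be the file's default export.

```javascript
// Minimal Cloudflare Worker (module syntax): inspect the incoming request
// and respond directly from the edge without contacting the origin.
const worker = {
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === "/hello") {
      // Respond straight from the edge location handling the request.
      return new Response("Hello from the edge!", {
        headers: { "content-type": "text/plain" },
      });
    }
    // Illustrative fallback; a real Worker might instead
    // `return fetch(request)` to pass the request through to the origin.
    return new Response("Not found", { status: 404 });
  },
};
// In a deployed Worker file: export default worker;
```

Deploying such a Worker with Cloudflare's `wrangler` CLI places the handler at every edge location, so the response is served from wherever the request lands.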
The rise of generative artificial intelligence has prompted Cloudflare to augment its Workers platform with AI capabilities. The platform has three new components to support AI inference:
- Workers AI runs on NVIDIA GPUs within Cloudflare’s global network, enabling the serverless model for AI. Users pay only for what they use, allowing them to spend less time managing infrastructure and more time on their applications.
- Vectorize, a vector database, enables easy, fast, and cost-effective vector indexing and storage, supporting use cases that require access not only to foundation models but also to custom data.
- AI Gateway enables organizations to cache, rate-limit, and monitor AI deployments regardless of hosting environment.
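Because AI Gateway sits in front of a provider endpoint, using it mostly amounts to swapping the request's base URL. The sketch below assumes the gateway URL pattern Cloudflare documents for the Workers AI provider; the account ID, gateway name, token, and model are placeholders.

```javascript
// Route a Workers AI request through AI Gateway by swapping the base URL.
// The account ID, gateway name, and token below are placeholders.
function gatewayUrl(accountId, gatewayName, model) {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayName}/workers-ai/${model}`;
}

async function runViaGateway(accountId, gatewayName, token, model, payload) {
  const res = await fetch(gatewayUrl(accountId, gatewayName, model), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
  // Caching, rate limiting, and request logs are then handled by the gateway.
  return res.json();
}
```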
Cloudflare has partnered with NVIDIA, Microsoft, Hugging Face, Databricks, and Meta to bring GPU infrastructure and foundation models to its edge. The platform also hosts embedding models for converting text to vectors. The Vectorize database can be used to store, index, and search the vectors to add context to LLMs in order to reduce hallucinations in answers. AI Gateway provides observability, rate limiting, and caching of frequent queries, reducing costs and improving application performance.
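Inside a Worker, the retrieval flow just described can be sketched as follows. It assumes the `env.AI` and `env.VECTORIZE` bindings described in Cloudflare's docs, and the embedding and chat model names are examples from the catalog; the prompt-building helper is an invention for illustration.

```javascript
// Pure helper: fold retrieved chunks into a grounded prompt so the model
// answers from stored context rather than from memory alone.
function buildPrompt(question, chunks) {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

// Sketch of retrieval-augmented generation with Workers AI + Vectorize.
// `env` carries the Worker's configured AI and VECTORIZE bindings.
async function answer(env, question) {
  // 1. Embed the question with a hosted embedding model.
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [question],
  });
  // 2. Retrieve the nearest stored chunks from the Vectorize index.
  const result = await env.VECTORIZE.query(data[0], { topK: 3, returnMetadata: true });
  const chunks = result.matches.map((m) => m.metadata.text);
  // 3. Ask the LLM, grounded in the retrieved context.
  return env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    prompt: buildPrompt(question, chunks),
  });
}
```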
The model catalog for Workers AI boasts some of the latest and best foundation models. From Meta’s Llama 2 to Stable Diffusion XL to Mistral 7B, it has everything developers need to build modern applications powered by generative AI.
Behind the scenes, Cloudflare uses ONNX Runtime, the runtime for the Open Neural Network Exchange format and an open source project led by Microsoft, to optimize models running in resource-constrained environments. It’s the same technology that Microsoft relies on to run foundational models in Windows.
While developers can use JavaScript to write AI inference code and deploy it to Cloudflare’s edge network, it’s also possible to invoke the models through a simple REST API from any language. This makes it easy to integrate generative AI into web, desktop, and mobile applications running in different environments.
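Such a REST call might look like the following sketch, which assumes the `accounts/{account_id}/ai/run/{model}` endpoint pattern from Cloudflare's API documentation; the account ID and API token are placeholders for your own credentials.

```javascript
// Call a Workers AI model over plain HTTPS; no Worker required.
// ACCOUNT_ID and API_TOKEN are placeholders, not real credentials.
const ACCOUNT_ID = "your-account-id";
const API_TOKEN = "your-api-token";

function inferenceUrl(model) {
  return `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${model}`;
}

async function runModel(model, input) {
  const res = await fetch(inferenceUrl(model), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
  });
  if (!res.ok) throw new Error(`Workers AI request failed: ${res.status}`);
  return res.json();
}

// Example (requires valid credentials):
// runModel("@cf/meta/llama-2-7b-chat-int8", { prompt: "What is edge computing?" })
//   .then((out) => console.log(out));
```

The same POST request can be issued from any HTTP client, which is what makes the models reachable from desktop and mobile apps as well as the browser.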
In September 2023, Workers AI initially launched with inference capabilities in seven cities. However, Cloudflare’s ambitious goal was to support Workers AI inference in 100 cities by the end of the year, with near-ubiquitous coverage by the end of 2024.
Cloudflare is one of the first CDN and edge network providers to augment its edge network with AI capabilities through GPU-powered Workers AI, the Vectorize vector database, and AI Gateway for AI deployment management. In partnership with tech giants like Meta and Microsoft, it offers a wide catalog of models and ONNX Runtime optimization.