AI/GPU Roadmap Spotlight: Modelserve

What Is Modelserve?

Modelserve is a service designed to run AI model inference at scale, affordably. Developed by an external team in collaboration with Golem Factory, Modelserve has been integrated as a new element of the Golem Network ecosystem. It allows for the seamless deployment and inference of AI models through scalable endpoints, ensuring efficient and cost-effective operation of AI applications.

This new service is set to become a key component of Golem's ecosystem, supporting the AI open-source community and attracting AI application builders whose workloads create demand for GPU providers.

Why Are We Doing It?

Our focus on AI responds to the growing demand for computing power in the AI industry. Consumer-grade GPU resources offer sufficient computing power and memory to run AI models effectively. Moreover, for many AI models, such as diffusion models, automatic speech recognition, or small and medium language models, using consumer-grade GPUs is more cost-effective. The decentralized architecture of Golem Network serves as the perfect marketplace for matching the supply and demand for these resources, enabling anyone to access computing power that is both well suited to the task and cost-effective for AI applications.

The addition of Modelserve to the Golem ecosystem plays a key role in attracting AI use cases, driving demand for providers, and contributing to the broader adoption of the Golem Network.

Who Is It For?

Modelserve is designed for services and product developers, startups, and companies operating in both Web 2.0 and Web 3.0 environments who:

  • Utilize small and medium-sized open-source models or create their own models from scratch
  • Require scalable AI model inference capabilities
  • Seek an environment to test and experiment with AI models

Technical Implementation

Modelserve comprises three key components:

  • Website: Allows users to create and manage endpoints 
  • Backend: Manages GPU resources to handle inference, featuring a load balancer and auto-scaling capabilities. It sources GPU resources from the open, decentralized Golem marketplace as well as from other platforms that offer GPU instances
  • API: Allows users to run inference on AI models and manage endpoints
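The API described above might be used roughly as follows. This is a minimal sketch: the endpoint URL, authentication header, and payload fields are illustrative assumptions, not the documented Modelserve API.

```python
import json
import urllib.request

# Hypothetical endpoint URL and API key -- the real URL scheme, auth
# header, and payload fields are assumptions for illustration only.
MODELSERVE_URL = "https://modelserve.example/api/v1/endpoints/my-endpoint/infer"
API_KEY = "your-api-key"

def build_inference_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Construct (but do not send) an inference request for a deployed model."""
    payload = json.dumps({"input": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        MODELSERVE_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_inference_request("Summarize the Golem Network in one sentence.")
print(req.get_method())  # POST
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would then return the model's output from the scalable endpoint.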

The app uses USD payments to provide the best user experience for the AI industry community. Settlements with Golem GPU providers are conducted according to the protocol, using GLM under the hood.
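The split between USD-denominated user payments and GLM-denominated provider settlements can be pictured with a simple conversion sketch. The exchange rate and fee figures below are invented assumptions, not actual protocol parameters.

```python
# Illustrative only: the exchange rate and platform fee are made-up
# numbers, not actual Modelserve or Golem protocol parameters.
def settle_in_glm(usd_charge: float, glm_per_usd: float, platform_fee: float = 0.10) -> float:
    """Convert a user's USD charge into the GLM amount paid to the provider,
    after deducting a hypothetical platform fee."""
    provider_usd = usd_charge * (1 - platform_fee)
    return provider_usd * glm_per_usd

# A $2.00 charge at an assumed rate of 4 GLM per USD with a 10% fee:
print(settle_in_glm(2.00, glm_per_usd=4.0))
```

The point of the design is that users never handle GLM directly; conversion happens behind the scenes when providers are paid.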

Benefits for Users

  • Maintenance-Free AI Infrastructure (AI IaaS): No need to manage model deployment, run inference infrastructure, or build GPU clusters. Modelserve handles it all for its users
  • Affordable Autoscaling: The system automatically scales GPU resources to meet the demands of the application, without requiring users to manage clusters or worry about bandwidth
  • Cost-Effective Pricing: Users pay only for the actual processing time of their requests, ensuring optimal pricing without the expense of hourly GPU rentals or maintaining their own clusters
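The pay-per-processing-time claim can be illustrated with back-of-the-envelope arithmetic. The prices and request volumes below are invented for the example, not actual Modelserve or cloud rates.

```python
# Invented figures for illustration -- not actual Modelserve or cloud pricing.
HOURLY_GPU_RATE_USD = 1.50          # assumed hourly rental price for one GPU
PER_SECOND_RATE_USD = 1.50 / 3600   # the same rate, billed only for busy seconds

def monthly_cost_hourly_rental(hours_rented: float) -> float:
    """Cost of keeping a GPU rented whether or not it is busy."""
    return hours_rented * HOURLY_GPU_RATE_USD

def monthly_cost_per_request(requests: int, seconds_per_request: float) -> float:
    """Cost when billed only for the seconds actually spent processing."""
    return requests * seconds_per_request * PER_SECOND_RATE_USD

# 100,000 requests at 2 s each is ~55.6 GPU-hours of real work, versus a
# GPU rented around the clock (730 h/month) to absorb bursty traffic.
per_request = monthly_cost_per_request(100_000, 2.0)
always_on = monthly_cost_hourly_rental(730)
print(round(per_request, 2), round(always_on, 2))  # 83.33 1095.0
```

Under these assumed numbers, billing by processing time is roughly an order of magnitude cheaper for bursty workloads that would otherwise leave a rented GPU idle most of the time.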

Synergy with other AI/GPU Projects

Modelserve integrates with GPU Provider and with the GamerHash AI provider (under development, PoC stage) to leverage GPU resources. In addition, as part of the Modelserve work, we created the first version of Golem-Workers, which we are now developing as a separate project (more details will be shared soon).

Milestones and Next Steps

  • Beta Tests: Completed with several AI-based startups and companies
  • Golem Community Tests: July 
  • Commercialization: Starting in August 

How to Start Using Modelserve

Check out the Modelserve website