Jina-Serve
Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.
Key Features
- Native support for all major ML frameworks and data types
- High-performance service design with scaling, streaming, and dynamic batching
- LLM serving with streaming output
- Built-in Docker integration and Executor Hub
- One-click deployment to Jina AI Cloud
- Enterprise-ready with Kubernetes and Docker Compose support
Comparison with FastAPI
Key advantages over FastAPI:
- DocArray-based data handling with native gRPC support
- Built-in containerization and service orchestration
- Seamless scaling of microservices
- One-command cloud deployment
Install
Install from PyPI with pip: pip install jina. See the separate guides for Apple Silicon and Windows.
Core Concepts
Jina-serve is built around three layers:
- Data: BaseDoc and DocList define input and output
- Serving: Executors process Documents; the Gateway connects services
- Orchestration: Deployments serve Executors; Flows chain them into pipelines
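To make the layering concrete, here is a minimal sketch using only the Python standard library. It is not the Jina API: `TextDoc`, `UppercaseExecutor`, and `serve` are hypothetical stand-ins for the roles that BaseDoc/DocList, an Executor, and a Deployment play.

```python
from dataclasses import dataclass
from typing import Callable, List


# Data layer stand-in: a typed document (hypothetical; not BaseDoc itself).
@dataclass
class TextDoc:
    text: str


# Serving layer stand-in: an executor maps input documents to output documents.
class UppercaseExecutor:
    def process(self, docs: List[TextDoc]) -> List[TextDoc]:
        return [TextDoc(text=d.text.upper()) for d in docs]


# Orchestration layer stand-in: exposes an executor behind one callable endpoint.
def serve(executor: UppercaseExecutor) -> Callable[[List[TextDoc]], List[TextDoc]]:
    def endpoint(docs: List[TextDoc]) -> List[TextDoc]:
        return executor.process(docs)

    return endpoint


endpoint = serve(UppercaseExecutor())
result = endpoint([TextDoc(text="hello"), TextDoc(text="jina")])
print([d.text for d in result])
```

The point of the separation is that the processing logic only ever sees typed documents, while how it is exposed and scaled is decided one layer up.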
Build AI Services
Let's create a gRPC-based AI service using StableLM:
Deploy with Python or YAML:
Use the client:
Build Pipelines
Chain services into a Flow:
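Conceptually, a pipeline passes the output of one service to the next. The sketch below shows that idea in plain Python, assuming each service takes and returns a list of strings. `chain`, `expand_prompt`, and `render` are hypothetical names for illustration and are not part of Jina.

```python
from typing import Callable, List, Sequence

Stage = Callable[[List[str]], List[str]]


def chain(stages: Sequence[Stage]) -> Stage:
    """Compose stages so that each stage's output feeds the next, in order."""

    def pipeline(items: List[str]) -> List[str]:
        for stage in stages:
            items = stage(items)
        return items

    return pipeline


# Hypothetical stand-in for a text-generation service.
def expand_prompt(prompts: List[str]) -> List[str]:
    return [p + ", highly detailed" for p in prompts]


# Hypothetical stand-in for an image-generation service.
def render(prompts: List[str]) -> List[str]:
    return ["<image for: " + p + ">" for p in prompts]


flow = chain([expand_prompt, render])
outputs = flow(["a red fox"])
print(outputs)
```

In a real Flow the stages are separate services that talk over the network, but the ordering contract is the same: each stage must accept what the previous one produces.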
Scaling and Deployment
Local Scaling
Boost throughput with built-in features:
- Replicas for parallel processing
- Shards for data partitioning
- Dynamic batching for efficient model inference
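Dynamic batching is the least obvious of the three, so here is a minimal standard-library sketch of the idea: concurrent requests are queued and sent to the model in one call, either when a preferred batch size is reached or when a timeout expires. `DynamicBatcher` and its parameter names are assumptions made for illustration, not Jina's implementation.

```python
import asyncio
from typing import Callable, List


class DynamicBatcher:
    """Groups concurrent requests into single model calls (hypothetical helper)."""

    def __init__(self, fn: Callable[[List[str]], List[str]],
                 preferred_batch_size: int, timeout_s: float):
        self.fn = fn
        self.preferred_batch_size = preferred_batch_size
        self.timeout_s = timeout_s
        self.pending = []      # (input, future) pairs waiting for a flush
        self.batch_sizes = []  # size of every batch sent to fn, for inspection
        self._timer = None

    async def submit(self, item: str) -> str:
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((item, fut))
        if len(self.pending) >= self.preferred_batch_size:
            self._flush()  # batch is full: run immediately
        elif self._timer is None:
            # First item of a new batch: flush after the timeout at the latest.
            self._timer = loop.call_later(self.timeout_s, self._flush)
        return await fut

    def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self.pending = self.pending, []
        if not batch:
            return
        self.batch_sizes.append(len(batch))
        outputs = self.fn([item for item, _ in batch])  # one call per batch
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)


async def main():
    batcher = DynamicBatcher(lambda xs: [x.upper() for x in xs],
                             preferred_batch_size=4, timeout_s=0.05)
    results = await asyncio.gather(*(batcher.submit(f"req{i}") for i in range(10)))
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(main())
print(batch_sizes)
```

With ten concurrent requests and a preferred size of four, two full batches are flushed immediately and the remaining two requests go out when the timeout fires. The trade-off is latency for throughput: a longer timeout gives fuller batches, but makes the last requests in a quiet period wait longer.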
Example scaling a Stable Diffusion deployment:
Cloud Deployment
Containerize Services
- Structure your Executor as a directory that includes a requirements.txt listing its Python dependencies.
- Configure:
- Push to Hub:
Deploy to Kubernetes
Use Docker Compose
JCloud Deployment
Deploy with a single command:
LLM Streaming
Enable token-by-token streaming for responsive LLM applications:
- Define schemas:
- Initialize service:
- Implement streaming:
- Serve and use:
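The four steps can be sketched without any serving framework. In this sketch the schemas are plain dataclasses, the "model" is a toy that replays a fixed word list, and streaming is an async generator consumed with `async for`. All names here (`PromptDocument`, `ModelOutputDocument`, `stream_tokens`, `VOCAB`) are assumptions made for illustration, and nothing below calls Jina or a real LLM.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


# Step 1: schemas for the request and for each streamed chunk.
@dataclass
class PromptDocument:
    prompt: str
    max_tokens: int


@dataclass
class ModelOutputDocument:
    token_id: int
    generated_text: str


# Step 2: a toy stand-in for model initialization (hypothetical fixed output).
VOCAB = ["Paris", "is", "the", "capital", "of", "France"]


# Step 3: yield one document per token, each carrying the text generated so far.
async def stream_tokens(doc: PromptDocument) -> AsyncIterator[ModelOutputDocument]:
    generated: List[str] = []
    for token_id, token in enumerate(VOCAB[: doc.max_tokens]):
        await asyncio.sleep(0)  # stand-in for one model forward pass
        generated.append(token)
        yield ModelOutputDocument(token_id=token_id,
                                  generated_text=" ".join(generated))


# Step 4: the consumer handles each chunk as soon as it arrives.
async def main() -> List[str]:
    seen = []
    request = PromptDocument(prompt="What is the capital of France?", max_tokens=3)
    async for out in stream_tokens(request):
        print(out.generated_text)
        seen.append(out.generated_text)
    return seen


chunks = asyncio.run(main())
```

Because each chunk carries the full text so far, a client can simply redraw its output on every chunk instead of concatenating tokens itself.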
Support
Jina-serve is backed by Jina AI and licensed under Apache-2.0.