Core Concepts
Overview
Surfing Weights is built around several key concepts that enable efficient streaming of model weights. Understanding these concepts will help you make the most of the library.
Weight Chunking
What is Chunking?
Chunking is the process of breaking down a large model into smaller, manageable pieces that can be:
- Stored efficiently
- Transmitted quickly
- Loaded on demand
How Chunking Works
- Model Analysis
  - The model's architecture is analyzed
  - Weights are grouped by layers
- Chunk Creation
  - Each layer's weights are saved separately
  - Metadata about chunks is stored
  - Configuration is preserved
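The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `layer_key`, `chunk_weights`, the JSON file format, and the `manifest.json` layout are all hypothetical, and a dict of lists stands in for a real state dict.

```python
import json
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def layer_key(param_name: str) -> str:
    """Map a parameter name to a chunk name (hypothetical naming scheme)."""
    match = re.search(r"layers?\.(\d+)\.", param_name)
    if match:
        return f"layer_{match.group(1)}"
    # Parameters outside a numbered layer are grouped by their first name component.
    return param_name.split(".")[0]


def chunk_weights(weights: dict, config: dict, out_dir: Path) -> dict:
    """Write one file per layer plus a manifest describing the chunks."""
    # Model analysis: group parameters by the layer they belong to.
    groups = defaultdict(dict)
    for name, values in weights.items():
        groups[layer_key(name)][name] = values

    # Chunk creation: save each group separately and record metadata.
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"config": config, "chunks": {}}
    for chunk_name, params in groups.items():
        path = out_dir / f"{chunk_name}.json"
        path.write_text(json.dumps(params))
        manifest["chunks"][chunk_name] = {
            "file": path.name,
            "params": sorted(params),
            "bytes": path.stat().st_size,
        }
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


# Toy weights standing in for a real state dict (assumed parameter names).
toy_weights = {
    "embeddings.weight": [0.1, 0.2],
    "encoder.layer.0.attention.weight": [0.3, 0.4],
    "encoder.layer.0.output.weight": [0.5],
    "encoder.layer.1.attention.weight": [0.6],
}
manifest = chunk_weights(toy_weights, {"hidden_size": 2}, Path(tempfile.mkdtemp()))
print(sorted(manifest["chunks"]))  # ['embeddings', 'layer_0', 'layer_1']
```

Keeping the manifest separate from the chunk files means a client can read the configuration and chunk list without touching any weight data.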
Weight Streaming
The Streaming Process
- Initial Setup
  - Client connects to weight server
  - Model architecture is initialized
  - Only metadata is loaded initially
- On-Demand Loading
  - Weights are requested as needed
  - Server streams requested chunks
  - Client processes received weights
- Smart Caching
  - Frequently used weights are cached
  - LRU policy manages cache size
  - Cold weights are released
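The on-demand loading and caching steps can be sketched as a small client that fetches a chunk only on first use and evicts the least recently used one when it is full. This is an illustrative sketch, not the library's client: `StreamingWeightClient`, `fake_server`, and `max_cached_chunks` are hypothetical names, and a plain function call stands in for the network request.

```python
from collections import OrderedDict
from typing import Callable


class StreamingWeightClient:
    """Hypothetical client: loads chunks lazily and keeps an LRU cache of them."""

    def __init__(self, fetch_chunk: Callable[[str], bytes], max_cached_chunks: int = 2):
        self._fetch_chunk = fetch_chunk          # stand-in for a request to the weight server
        self._max_cached_chunks = max_cached_chunks
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.fetch_count = 0                     # number of requests that reached the server

    def get(self, chunk_name: str) -> bytes:
        if chunk_name in self._cache:
            # Cache hit: mark the chunk as most recently used.
            self._cache.move_to_end(chunk_name)
            return self._cache[chunk_name]
        # Cache miss: request the chunk on demand.
        data = self._fetch_chunk(chunk_name)
        self.fetch_count += 1
        self._cache[chunk_name] = data
        if len(self._cache) > self._max_cached_chunks:
            # Release the coldest (least recently used) chunk.
            self._cache.popitem(last=False)
        return data

    def cached_chunks(self) -> list:
        return list(self._cache)


def fake_server(chunk_name: str) -> bytes:
    """Pretend server that returns placeholder bytes for any chunk name."""
    return f"weights-for-{chunk_name}".encode()


client = StreamingWeightClient(fake_server, max_cached_chunks=2)
for name in ["layer_0", "layer_1", "layer_0", "layer_2"]:
    client.get(name)

print(client.fetch_count)      # 3
print(client.cached_chunks())  # ['layer_0', 'layer_2']
```

The second request for `layer_0` is served from the cache, so only three fetches reach the server, and `layer_1` is the chunk released when `layer_2` arrives.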
Storage Backends
Surfing Weights supports multiple storage backends:
- Local Filesystem
  - Direct access to local files
  - Fastest for local deployment
- Amazon S3
  - Cloud-based storage
  - Scalable and reliable
  - Good for distributed setups
- Custom Backends
  - Extensible interface
  - Support for other storage systems
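One way to picture an extensible backend interface is an abstract base class with a single read method. The sketch below is an assumption made for illustration, not the library's actual interface: `StorageBackend`, `load`, `LocalFilesystemBackend`, and `InMemoryBackend` are hypothetical names. An S3-style backend would implement the same method against an object store.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical interface: anything that can return a chunk's bytes by name."""

    @abstractmethod
    def load(self, chunk_name: str) -> bytes:
        ...


class LocalFilesystemBackend(StorageBackend):
    """Reads chunks directly from files under a root directory."""

    def __init__(self, root: Path):
        self._root = Path(root)

    def load(self, chunk_name: str) -> bytes:
        return (self._root / chunk_name).read_bytes()


class InMemoryBackend(StorageBackend):
    """Example of a custom backend: chunks held in a dict."""

    def __init__(self, chunks: dict):
        self._chunks = dict(chunks)

    def load(self, chunk_name: str) -> bytes:
        return self._chunks[chunk_name]


def read_chunk(backend: StorageBackend, chunk_name: str) -> bytes:
    """Caller code depends only on the interface, not on a concrete backend."""
    return backend.load(chunk_name)


# The same chunk served by two interchangeable backends.
root = Path(tempfile.mkdtemp())
(root / "layer_0.bin").write_bytes(b"\x00\x01")
local = LocalFilesystemBackend(root)
memory = InMemoryBackend({"layer_0.bin": b"\x00\x01"})
same = read_chunk(local, "layer_0.bin") == read_chunk(memory, "layer_0.bin")
print(same)  # True
```

Because `read_chunk` only calls `load`, swapping the storage system does not change the code that serves weights.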
Caching System
Cache Levels
- Server-Side Cache
  - Reduces storage backend access
  - Shared across clients
  - Configurable size
- Client-Side Cache
  - Reduces network requests
  - Per-client caching
  - Memory-efficient
Cache Management
- LRU (Least Recently Used) policy
- Configurable cache sizes
- Automatic memory management
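To make "configurable cache sizes" concrete, here is a sketch of an LRU cache bounded by total bytes rather than by entry count. It is an illustration only: `ByteBoundedLRUCache` and `max_bytes` are hypothetical names, and the source does not say whether the library measures cache size in bytes or in chunks.

```python
from collections import OrderedDict
from typing import Optional


class ByteBoundedLRUCache:
    """Hypothetical LRU cache whose capacity is a byte budget."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._entries:
            return None
        # A hit makes the entry the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, value: bytes) -> None:
        # Replacing an entry first gives back the bytes it was using.
        if key in self._entries:
            self.current_bytes -= len(self._entries.pop(key))
        if len(value) > self.max_bytes:
            return  # larger than the whole budget, so it is never cached
        self._entries[key] = value
        self.current_bytes += len(value)
        # Evict least recently used entries until the budget is respected.
        while self.current_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.current_bytes -= len(evicted)

    def keys(self) -> list:
        return list(self._entries)


cache = ByteBoundedLRUCache(max_bytes=10)
cache.put("a", b"1234")    # 4 bytes in use
cache.put("b", b"1234")    # 8 bytes in use
cache.get("a")             # "a" becomes most recently used
cache.put("c", b"123456")  # 14 bytes would exceed the budget, so "b" is evicted

print(cache.keys())         # ['a', 'c']
print(cache.current_bytes)  # 10
```

A byte budget is a natural fit for weight chunks because layers can differ widely in size, so a fixed entry count says little about actual memory use.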
Model Support
Surfing Weights supports various transformer architectures:
- BERT
- GPT
- T5
- LLaMA
- Custom models
Each model type has specific:
- Chunking strategies
- Loading patterns
- Optimization techniques
Monitoring
Built-in monitoring provides:
- Performance Metrics
  - Request latency
  - Cache hit rates
  - Memory usage
- Health Checks
  - Server status
  - Backend connectivity
  - Resource utilization
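The metrics listed above could be collected along these lines. This is a sketch under assumptions, not the library's monitoring API: `StreamingMetrics`, `record_request`, `health_check`, and the report fields are hypothetical, and memory usage is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingMetrics:
    """Hypothetical collector for request latency and cache hit rate."""

    cache_hits: int = 0
    cache_misses: int = 0
    latencies_ms: list = field(default_factory=list)

    def record_request(self, latency_ms: float, cache_hit: bool) -> None:
        self.latencies_ms.append(latency_ms)
        if cache_hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    @property
    def mean_latency_ms(self) -> float:
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)


def health_check(metrics: StreamingMetrics, backend_ok: bool) -> dict:
    """Combine backend connectivity with current metrics into one report."""
    return {
        "status": "ok" if backend_ok else "degraded",
        "backend_reachable": backend_ok,
        "cache_hit_rate": metrics.hit_rate,
        "mean_latency_ms": metrics.mean_latency_ms,
    }


metrics = StreamingMetrics()
metrics.record_request(12.0, cache_hit=False)  # first request goes to the backend
metrics.record_request(2.0, cache_hit=True)
metrics.record_request(4.0, cache_hit=True)
report = health_check(metrics, backend_ok=True)
print(report["status"], report["mean_latency_ms"])  # ok 6.0
```

With two hits out of three requests, the reported hit rate is about 0.67, which shows how a cold cache drags the rate down until hot chunks are resident.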
Next Steps
- See Configuration Guide
- Learn about Storage Backends
- Explore API Reference