# Quick Start Guide

This guide will help you get up and running with Surfing Weights in minutes.
## Basic Setup
- First, install Surfing Weights:
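The install command is not shown in the source; the package imports as `streaming_weights`, so the PyPI name is presumably `streaming-weights`. This is an assumption — adjust if the project publishes under a different name:

```shell
# Assumed package name, inferred from the `streaming_weights` import path
pip install streaming-weights
```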
## Chunking a Model

Before streaming a model, you need to chunk it into smaller pieces:
```python
from streaming_weights import ModelChunker

# Initialize the chunker with a HuggingFace model
chunker = ModelChunker("prajjwal1/bert-tiny", "./chunks/bert-tiny")

# Chunk the model
chunk_info = chunker.chunk_model()
print(f"Model chunked into {len(chunk_info['chunks'])} pieces")
```
## Starting the Weight Server

Create a server to stream your chunked model:
```python
from streaming_weights import WeightServer
import asyncio

async def start_server():
    # Initialize the server with your chunked model
    server = WeightServer(
        model_path="./chunks/bert-tiny",
        port=8765,
        cache_size="2GB"  # Optional: set cache size
    )
    # Start the server
    await server.start_server()

# Run the server
if __name__ == "__main__":
    asyncio.run(start_server())
```
## Client Usage

Connect to the weight server and use the model:
```python
from streaming_weights import StreamingBertModel

# Initialize the streaming model
model = StreamingBertModel(
    server_url="http://localhost:8765",
    model_name="prajjwal1/bert-tiny"
)

# Use the model for inference
text = "Hello, world!"
outputs = model.encode(text)
```
## Configuration Options

Basic server configuration:
```python
server = WeightServer(
    model_path="./chunks/bert-tiny",
    port=8765,
    cache_size="2GB",
    compression=True,  # Enable weight compression
    monitoring=True    # Enable Prometheus metrics
)
```
## Next Steps
- Learn about Core Concepts
- Explore Configuration Options
- See Example Use Cases
- Read about Storage Backends