Create a Plugin

The first step is to create a plugin. An example is shown below and is also available in the plugin examples folder.

# spl/plugins/plugin_examples/plugin.py

from .adapters.dataloader import ShakespeareDataLoader
from .adapters.model_config import NanoGPTConfig
from .adapters.models.nanogpt import GPT, GPTConfig
from .adapters.model_adapter import NanoGPTModelAdapter
from .adapters.default_sot_adapter import DefaultSOTAdapter
from .adapters.plugins import StandardPlugin
from .tokenizer import CharacterLevelTokenizer

import torch

################################################################
# 1) Set up model, dataset, etc.
################################################################
tokenizer = CharacterLevelTokenizer()

model_params = GPTConfig(
    block_size=256,
    vocab_size=tokenizer.get_vocab_size(),
    n_layer=6,
    n_head=6,
    n_embd=384,
    dropout=0.2,
    bias=False,
    pad_token_id=tokenizer.pad_id,
)

model_config = NanoGPTConfig(tokenizer, model_params)
model_adapter = NanoGPTModelAdapter(model_config)

ACCUMULATIONS_PER_STEP = 1
NUM_STEPS = 2 # small number for testing
EXAMPLES_PER_ACCUMULATION = 64

dataset = ShakespeareDataLoader(
    model_config,
    buffer_size=16,
    max_seq_len=model_params.block_size,
    batch_size=NUM_STEPS * EXAMPLES_PER_ACCUMULATION * ACCUMULATIONS_PER_STEP,
)

################################################################
# 2) Create the "DefaultSOTAdapter"
################################################################
TENSOR_VERSION_INTERVAL = 10
sot_adapter = DefaultSOTAdapter(
    model_adapter=model_adapter,
    dataset=dataset,
    state_dir="/app/data/state",
    tensor_version_interval=TENSOR_VERSION_INTERVAL,
)

################################################################
# 3) Create the plugin with the SOT adapter
################################################################
exported_plugin = StandardPlugin(
    model_adapter,   # model adapter
    model_config,    # model config
    sot_adapter,     # <--- we pass the "sot_adapter" here
    dataset,         # dataset
    tokenizer,
    num_steps=NUM_STEPS,
    examples_per_accumulation=EXAMPLES_PER_ACCUMULATION,
    accumulations_per_step=ACCUMULATIONS_PER_STEP,
    outer_max_lr=1e-4,
    outer_min_lr=1e-4,
    outer_weight_decay=0.01,
    tensor_version_interval=TENSOR_VERSION_INTERVAL,
    expected_worker_time=3,
    max_concurrent_iterations=4,
    chunk_shape=torch.tensor((64, 64)),
    k=torch.tensor((2)),
)

sot_adapter.hyperparams_getter = exported_plugin.get_sot_learning_hyperparameters

You will notice that the plugin is composed of several key components:

Components of the Example Plugin

  • Tokenizer
    The CharacterLevelTokenizer converts raw text into token sequences and vice versa, which is essential for processing the input data (a minimal sketch follows this list).

  • Model Configuration & Adapter

    • Model Config: The NanoGPTConfig encapsulates all model hyperparameters such as block size, number of layers, heads, embedding dimension, and dropout rate.
    • Model Adapter: The NanoGPTModelAdapter wraps the model defined by the configuration, enabling transformation between a PyTorch model and its flattened tensor representation. This adapter also facilitates execution steps, gradient computation, and model updates.
  • Dataset Loader
    The ShakespeareDataLoader reads the Shakespeare text data, tokenizes it, and splits it into input-target pairs. This loader feeds batches of data to the training process (see the shifted-pair sketch after this list).

  • SOT Adapter
    The DefaultSOTAdapter is responsible for:

    • Managing the state and synchronization of training parameters.
    • Aggregating gradients from worker nodes.
    • Coordinating parameter updates (using techniques like AdamW).
    • Handling versioning and storage of model checkpoints.
  • Plugin Wrapper
    The StandardPlugin ties all the components together. It:

    • Integrates the model adapter, model config, SOT adapter, dataset, and tokenizer.
    • Provides helper functions (e.g., for computing learning rates with cosine annealing; a sketch follows this list).
    • Serves as the primary interface for the training worker to interact with the plugin.
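
To make the Tokenizer component concrete, the sketch below shows the kind of interface the plugin relies on (encode/decode, get_vocab_size, pad_id). It is illustrative only and is not the repo's CharacterLevelTokenizer, whose implementation may differ.

# Illustrative character-level tokenizer (not the actual CharacterLevelTokenizer)
class TinyCharTokenizer:
    def __init__(self, text, pad_token="\0"):
        chars = sorted(set(text) | {pad_token})
        self.stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char
        self.pad_id = self.stoi[pad_token]

    def get_vocab_size(self):
        return len(self.stoi)

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)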
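
The Dataset Loader's input-target pairs follow the usual next-token setup: the target sequence is the input shifted by one token. A minimal sketch over plain Python token-id lists (not the actual ShakespeareDataLoader):

import torch

# Illustrative only: next-token language-modelling pairs are the token stream
# shifted by one position.
def make_pairs(token_ids, block_size):
    pairs = []
    for start in range(0, len(token_ids) - block_size - 1, block_size):
        x = torch.tensor(token_ids[start : start + block_size])          # input tokens
        y = torch.tensor(token_ids[start + 1 : start + block_size + 1])  # targets, shifted by one
        pairs.append((x, y))
    return pairs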
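
The cosine-annealing helper mentioned under the Plugin Wrapper follows the standard schedule between outer_max_lr and outer_min_lr. A rough sketch of that formula (the actual get_sot_learning_hyperparameters lives in the plugin code and may return additional hyperparameters):

import math

# Sketch of cosine annealing between a max and a min learning rate.
def cosine_annealed_lr(step, total_steps, max_lr, min_lr):
    progress = min(step / max(total_steps, 1), 1.0)        # fraction of training completed
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))    # decays smoothly from 1 to 0
    return min_lr + (max_lr - min_lr) * cosine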

How It Works

  1. Initialization:
    The plugin is initialized by creating instances of the tokenizer, model configuration, model adapter, and dataset loader. These components prepare the system for training by setting up model parameters and reading the training data.

  2. State & Gradient Management:
    The SOT adapter manages the gradient accumulation and state updates. It periodically finalizes parameter updates (using techniques such as AdamW; a conceptual sketch follows this list) and version-controls the model weights.

  3. Plugin Export:
    Finally, the StandardPlugin is instantiated, which bundles together all the above components. The exported plugin (exported_plugin) is then ready to be used by the worker processes to:

    • Fetch data batches.
    • Perform forward and backward passes.
    • Communicate updates and synchronize model states.
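
Conceptually, the SOT-side update in step 2 amounts to averaging the gradients reported by workers over the flattened parameter tensor and applying an optimizer step before publishing a new tensor version. The sketch below is a simplification under that assumption and is not the DefaultSOTAdapter's actual code:

import torch

# Conceptual sketch only -- the real DefaultSOTAdapter differs in detail.
flat_params = torch.zeros(1_000_000, requires_grad=True)        # flattened model weights
optimizer = torch.optim.AdamW([flat_params], lr=1e-4, weight_decay=0.01)

def apply_worker_gradients(worker_grads):
    optimizer.zero_grad()
    flat_params.grad = torch.stack(worker_grads).mean(dim=0)    # aggregate worker results
    optimizer.step()                                            # finalize the outer update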

Conclusion

This modular design allows you to experiment with different models, datasets, and optimization strategies by swapping out individual components. By following these steps, you have successfully created a plugin that integrates with the broader training framework. You can now leverage this plugin to conduct experiments, perform distributed training, and efficiently update model parameters across multiple worker nodes.

Feel free to explore the worker repo for more context.

It is very important that the plugin is deterministic: given the same input data and the same initial model weights, it must produce the same results. This is crucial for the security of the platform, and non-deterministic plugins will be penalized. For this reason, plugins submitted to Panthalia go through an approval process. Unapproved plugins can still be used, but their results will not be replicated or verified.
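
A common starting point for keeping a PyTorch-based plugin deterministic is to pin every random seed and opt into deterministic kernels. This snippet is a general-purpose sketch, not a Panthalia-specific requirement:

import random
import numpy as np
import torch

# Fix all RNG sources and force deterministic kernels where PyTorch supports it.
def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Raises an error if a nondeterministic op is used; some CUDA ops also
    # require CUBLAS_WORKSPACE_CONFIG=":4096:8" to be set in the environment.
    torch.use_deterministic_algorithms(True)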