Generator#
The Generator (Policy) is the core inference engine in TorchForge, built on top of vLLM. It manages model serving, text generation, and weight updates for reinforcement learning workflows.
Generator#
- class forge.actors.generator.Generator(engine_args=<factory>, sampling_params=<factory>, prefetch_weights_to_shm=True, n_fetcher_procs=8)[source]#
vLLM-based generator using AsyncLLM with Monarch distributed execution.
Wraps vLLM’s AsyncLLM engine and uses MonarchExecutor for multi-GPU inference. See MonarchExecutor docstring for architecture diagram.
- Parameters:
engine_args – vLLM EngineArgs for model configuration. Accepts an EngineArgs instance or a dict of the same fields.
sampling_params – Default SamplingParams for generation. Accepts a SamplingParams instance or a dict of the same fields.
prefetch_weights_to_shm – Whether to prefetch weights to shared memory for faster weight updates. When enabled, weight fetchers download weights in parallel to shared memory while generation is still running. Defaults to True.
n_fetcher_procs – Number of fetcher processes for parallel weight downloading. Only used when prefetch_weights_to_shm is True. Defaults to 8.
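Since both configuration parameters accept plain dicts, a minimal sketch of the dict-based style looks like the following (the field names mirror vLLM's EngineArgs and SamplingParams; the specific model id and values here are illustrative assumptions, not requirements):

```python
# Dict-based configuration for Generator (a sketch; keys follow vLLM's
# EngineArgs / SamplingParams field names).
engine_args = {
    "model": "meta-llama/Llama-3-8B",  # Hugging Face model id to serve
    "tensor_parallel_size": 2,         # shard the model across 2 GPUs
}
sampling_params = {
    "max_tokens": 128,    # cap on generated tokens per request
    "temperature": 0.7,   # softmax temperature for sampling
}
```

Equivalently, `vllm.EngineArgs` and `vllm.SamplingParams` instances with the same fields can be passed directly.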
Example
>>> generator = await Generator.options(procs=1, with_gpus=True).as_service(
...     engine_args={"model": "meta-llama/Llama-3-8B", "tensor_parallel_size": 2},
...     sampling_params={"max_tokens": 128, "temperature": 0.7},
... )
>>> completions = await generator.generate("Tell me a joke")
>>> await generator.shutdown()
- engine_args#
- generate#
- n_fetcher_procs = 8#
- prefetch_weights_to_shm = True#
- sampling_params#
- save_model_params#
- setup#
- async classmethod shutdown(actor)[source]#
Shutdown the generator and cleanup all resources.
Cleanup order:
1. Stop AsyncLLM (triggers MonarchExecutor.shutdown(), which destroys process groups and stops proc_mesh)
2. Stop generator_proc
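Because shutdown releases GPUs and process meshes, it should run even when generation raises. A minimal sketch of that try/finally pattern, using a hypothetical stand-in class rather than a real Generator handle:

```python
import asyncio

class FakeGenerator:
    """Stand-in for a Generator actor handle (hypothetical, for illustration)."""
    def __init__(self):
        self.stopped = False

    async def generate(self, prompt: str) -> str:
        return f"completion for {prompt!r}"

    async def shutdown(self):
        # In TorchForge this stops AsyncLLM first, then generator_proc.
        self.stopped = True

async def main() -> bool:
    gen = FakeGenerator()
    try:
        await gen.generate("Tell me a joke")
    finally:
        # Always release resources, even if generation failed.
        await gen.shutdown()
    return gen.stopped

stopped = asyncio.run(main())
```

The same guard applies to a real Generator service handle: wrap the generation calls and invoke `shutdown()` in the `finally` block.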
- stop#
- update_weights#
- validate_model_params#