
Generator#

The Generator (Policy) is the core inference engine in TorchForge, built on top of vLLM. It manages model serving, text generation, and weight updates for reinforcement learning workflows.

Generator#

class forge.actors.generator.Generator(engine_args=<factory>, sampling_params=<factory>, prefetch_weights_to_shm=True, n_fetcher_procs=8)[source]#

vLLM-based generator using AsyncLLM with Monarch distributed execution.

Wraps vLLM’s AsyncLLM engine and uses MonarchExecutor for multi-GPU inference. See MonarchExecutor docstring for architecture diagram.

Parameters:
  • engine_args – vLLM EngineArgs for model configuration. Can be EngineArgs or dict.

  • sampling_params – Default SamplingParams for generation. Can be SamplingParams or dict.

  • prefetch_weights_to_shm – Whether to prefetch weights to shared memory for faster weight updates. When enabled, weight fetchers download weights in parallel to shared memory while generation is still running. Defaults to True.

  • n_fetcher_procs – Number of fetcher processes for parallel weight downloading. Only used when prefetch_weights_to_shm is True. Defaults to 8.

Example

>>> generator = await Generator.options(procs=1, with_gpus=True).as_service(
...     engine_args={"model": "meta-llama/Llama-3-8B", "tensor_parallel_size": 2},
...     sampling_params={"max_tokens": 128, "temperature": 0.7},
... )
>>> completions = await generator.generate("Tell me a joke")
>>> await generator.shutdown()
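
A variant of the example above that disables shared-memory prefetching of weights. This is an illustrative sketch only; it reuses the construction pattern shown above and the parameter names documented under Parameters:

>>> generator = await Generator.options(procs=1, with_gpus=True).as_service(
...     engine_args={"model": "meta-llama/Llama-3-8B"},
...     prefetch_weights_to_shm=False,  # fetch weights directly, no shared-memory staging
... )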
engine_args#
generate#
n_fetcher_procs = 8#
prefetch_weights_to_shm = True#
sampling_params#
save_model_params#
setup#
async classmethod shutdown(actor)[source]#

Shut down the generator and clean up all resources.

Cleanup order:

  1. Stop AsyncLLM (triggers MonarchExecutor.shutdown(), which destroys process groups and stops proc_mesh)

  2. Stop generator_proc

stop#
update_weights#
validate_model_params#
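
The members above fit together in a typical RL weight-sync loop: generate rollouts, train the policy elsewhere, then push updated weights to the generator. The sketch below is hypothetical, not the confirmed API; in particular, the argument passed to update_weights (a policy version identifier here) and the variable names num_steps and prompt are assumptions:

>>> # Hypothetical RL loop sketch; the update_weights argument is an assumption
>>> for step in range(num_steps):
...     completions = await generator.generate(prompt)
...     # ... compute rewards and run a training step elsewhere ...
...     await generator.update_weights(step)  # sync latest trainer weights to the generator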

GeneratorWorker#