Training Parameters
This page provides a complete reference of all parameters available when training RF-DETR models.
Basic Example
```python
from rfdetr import RFDETRMedium

model = RFDETRMedium()

model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
    output_dir="output",
)
```
Core Parameters
These are the essential parameters for training:
| Parameter | Type | Default | Description |
|---|---|---|---|
| dataset_dir | str | Required | Path to your dataset directory. RF-DETR auto-detects whether it is in COCO or YOLO format. See Dataset Formats. |
| output_dir | str | "output" | Directory where training artifacts (checkpoints, logs) are saved. |
| epochs | int | 100 | Number of full passes over the training dataset. |
| batch_size | int | 4 | Number of samples processed per iteration. Higher values require more GPU memory. |
| grad_accum_steps | int | 4 | Accumulates gradients over multiple mini-batches. Combine with batch_size to reach the desired effective batch size. |
| resume | str | None | Path to a saved checkpoint to continue training from. Restores model weights, optimizer state, and scheduler. |
Understanding Batch Size
The effective batch size is calculated as:
effective_batch_size = batch_size × grad_accum_steps × num_gpus
Recommended configurations for different GPUs (targeting effective batch size of 16):
| GPU | VRAM | batch_size | grad_accum_steps |
|---|---|---|---|
| A100 | 40-80GB | 16 | 1 |
| RTX 4090 | 24GB | 8 | 2 |
| RTX 3090 | 24GB | 8 | 2 |
| T4 | 16GB | 4 | 4 |
| RTX 3070 | 8GB | 2 | 8 |
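For example, on a 24GB card you could reach an effective batch size of 16 as follows. The split below is an illustration; adjust batch_size and grad_accum_steps to whatever fits your GPU:

```python
# Effective batch size = batch_size × grad_accum_steps × num_gpus
# 8 × 2 × 1 = 16 on a single 24GB GPU (e.g. RTX 3090 / 4090)
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    batch_size=8,        # per-iteration batch that fits in 24GB VRAM
    grad_accum_steps=2,  # accumulate gradients to reach the target effective batch size
)
```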
Learning Rate Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| lr | float | 1e-4 | Learning rate for most parts of the model. |
| lr_encoder | float | 1.5e-4 | Learning rate specifically for the backbone encoder. Can be set lower than lr if you want to fine-tune the encoder more conservatively than the rest of the model. |
Learning rate tips
- Start with the default values for fine-tuning
- If the model doesn't converge, try reducing lr by half
- For training from scratch (not recommended), you may need higher learning rates
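A minimal sketch of the second tip, halving both learning rates from their defaults. The exact values are illustrative, not a recommendation for every dataset:

```python
# Defaults: lr=1e-4, lr_encoder=1.5e-4. If training does not converge,
# halving both is a common first adjustment.
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    lr=5e-5,            # half of the default lr
    lr_encoder=7.5e-5,  # keep the encoder LR scaled down proportionally
)
```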
Resolution Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| resolution | int | Model-dependent | Input image resolution. Higher values can improve accuracy but require more memory. Each model has its own valid block size: current standard detection checkpoints use multiples of 32, and current segmentation checkpoints use multiples of 24 (most variants) or 12 (RFDETRSegNano). The definitive rule is that the resolution must be divisible by patch_size * num_windows for the selected model. |
Common resolution values for currently documented checkpoints:
- Detection: 384, 512, 576, 704
- Segmentation: 312, 384, 432, 504, 624, 768
For example, RFDETRSegXLarge uses 624x624, which is valid because 624 is divisible by 24.
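If you want to sanity-check a custom resolution before launching a run, a quick divisibility check like the one below works. The patch_size and num_windows values shown are placeholders; read the actual values from your chosen model's configuration:

```python
def is_valid_resolution(resolution: int, patch_size: int, num_windows: int) -> bool:
    """A resolution is valid when it is divisible by patch_size * num_windows."""
    return resolution % (patch_size * num_windows) == 0

# Illustrative: a segmentation variant whose effective step is 24
print(is_valid_resolution(624, patch_size=12, num_windows=2))  # True  (624 % 24 == 0)
print(is_valid_resolution(640, patch_size=12, num_windows=2))  # False (640 % 24 != 0)
```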
Regularization Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| weight_decay | float | 1e-4 | L2 regularization coefficient. Helps prevent overfitting by penalizing large weights. |
Hardware Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| device | str | "cuda" | Device to run training on. Options: "cuda", "cpu", "mps" (Apple Silicon). |
| gradient_checkpointing | bool | False | Re-computes parts of the forward pass during backpropagation to reduce memory usage. Lowers memory needs but increases training time. |
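For example, a memory-constrained run might combine a small batch with gradient checkpointing. The values below are illustrative:

```python
# Trade compute for memory on a small GPU: smaller batch, more accumulation,
# and recomputation of activations during the backward pass.
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    batch_size=2,
    grad_accum_steps=8,
    gradient_checkpointing=True,
    device="cuda",
)
```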
EMA (Exponential Moving Average)
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_ema | bool | True | Enables Exponential Moving Average of weights. Produces a smoothed checkpoint that often improves final performance. |
What is EMA?
EMA maintains a moving average of the model weights throughout training. This smoothed version often generalizes better than the raw weights and is commonly used for the final model.
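Conceptually, after each optimizer step the EMA copy of every weight is nudged toward the current weight. A minimal sketch of the idea follows; the decay value is illustrative, not RF-DETR's internal setting:

```python
# Sketch of an EMA update. RF-DETR maintains something equivalent internally
# when use_ema=True; this is only to illustrate the idea.
def ema_update(ema_weights: dict, current_weights: dict, decay: float = 0.999) -> dict:
    # ema ← decay * ema + (1 - decay) * current, applied elementwise per tensor
    return {
        name: decay * ema_weights[name] + (1 - decay) * current_weights[name]
        for name in ema_weights
    }
```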
Checkpoint Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| checkpoint_interval | int | 10 | Frequency (in epochs) at which model checkpoints are saved. More frequent saves provide better coverage but consume more storage. |
| skip_best_epochs | int | 0 | Ignore the first N epochs when tracking best checkpoints and early-stopping patience. Useful when fine-tuning from a prior checkpoint. |
Checkpoint Files
During training, multiple checkpoints are saved:
| File | Description |
|---|---|
| checkpoint.pth | Most recent checkpoint (for resuming) |
| checkpoint_<N>.pth | Periodic checkpoint at epoch N |
| checkpoint_best_ema.pth | Best validation performance (EMA weights) |
| checkpoint_best_regular.pth | Best validation performance (raw weights) |
| checkpoint_best_total.pth | Final best model for inference |
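If a run is interrupted, you can pick it up from the most recent checkpoint via the resume parameter. The path below assumes the default output_dir of "output":

```python
# Continue an interrupted run: model weights, optimizer state, and LR scheduler
# are restored from the checkpoint.
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    output_dir="output",
    resume="output/checkpoint.pth",
)
```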
Early Stopping Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| early_stopping | bool | False | Enable early stopping based on validation mAP. |
| early_stopping_patience | int | 10 | Number of epochs without improvement before stopping. |
| early_stopping_min_delta | float | 0.001 | Minimum change in mAP to qualify as an improvement. |
| early_stopping_use_ema | bool | False | Whether to track improvements using EMA model metrics. |
| skip_best_epochs | int | 0 | Ignore the first N epochs (0..N-1) for best-model selection and early-stopping patience. |
Early Stopping Example
```python
model.train(
    dataset_dir="path/to/dataset",
    epochs=200,
    batch_size=4,
    early_stopping=True,
    early_stopping_patience=15,
    early_stopping_min_delta=0.005,
    skip_best_epochs=3,
)
```
This configuration will:
- Train for up to 200 epochs
- Ignore epochs 0-2 for best-checkpoint tracking and patience counting
- Stop early if mAP doesn't improve by at least 0.005 for 15 consecutive epochs
Transfer learning with pretrain_weights
When fine-tuning from pretrain_weights, the pretrained model's epoch-0 validation mAP
can be artificially high relative to the training trajectory on the new dataset. This causes
checkpoint_best_total.pth to always contain the untrained pretrained weights and may
trigger early stopping prematurely. Use skip_best_epochs to defer best-checkpoint
selection and patience counting until the model has had time to adapt.
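One way this looks in practice, assuming pretrain_weights is accepted by the model constructor in your installed version (check its signature) and using an illustrative checkpoint path:

```python
from rfdetr import RFDETRMedium

# Assumption: pretrain_weights is a constructor argument in your installed version;
# the checkpoint path below is illustrative.
model = RFDETRMedium(pretrain_weights="path/to/prior_checkpoint.pth")

model.train(
    dataset_dir="path/to/new_dataset",
    epochs=100,
    early_stopping=True,
    skip_best_epochs=5,  # let the model adapt before tracking "best" and patience
)
```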
Logging Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| tensorboard | bool | True | Enable TensorBoard logging. Requires pip install "rfdetr[loggers]". If the tensorboard package is not installed, training continues, a UserWarning is emitted, and TensorBoard output is skipped. |
| wandb | bool | False | Enable Weights & Biases logging. Requires pip install "rfdetr[loggers]". |
| project | str | None | Project name for W&B logging. |
| run | str | None | Run name for W&B logging. If not specified, W&B assigns a random name. |
Logging Example
```python
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    tensorboard=True,
    wandb=True,
    project="my-detection-project",
    run="experiment-001",
)
```
Evaluation Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| eval_max_dets | int | 500 | Maximum number of detections per image considered during COCO evaluation. Lower values speed up evaluation. |
| eval_interval | int | 1 | Run COCO evaluation every N epochs. Set to a higher value to reduce evaluation overhead during long training runs. |
| log_per_class_metrics | bool | True | Log per-class AP metrics to the console and loggers. Disable to reduce log verbosity when there are many classes. |
| progress_bar | str \| bool \| None | None | Progress bar style: "tqdm", "rich", or None. Legacy booleans are still accepted. |
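For long runs with many classes, you might reduce evaluation overhead as follows. The values are illustrative:

```python
# Evaluate less often and keep logs compact during a long training run.
model.train(
    dataset_dir="path/to/dataset",
    epochs=300,
    eval_interval=5,              # run COCO evaluation every 5 epochs
    eval_max_dets=300,            # consider fewer detections per image
    log_per_class_metrics=False,  # skip per-class AP tables in the logs
    progress_bar="rich",
)
```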
Advanced Parameters
The parameters below are available for fine-grained control over training behaviour. Most users can leave these at their defaults.
Scheduler and Regularization
| Parameter | Type | Default | Description |
|---|---|---|---|
| lr_scheduler | str | "step" | Learning rate scheduler type. Options: "step" (step decay at lr_drop) or "cosine" (cosine annealing). |
| lr_min_factor | float | 0.0 | Floor for the cosine scheduler, expressed as a fraction of the initial LR. Ignored when using "step". |
| warmup_epochs | float | 0.0 | Number of epochs for linear learning rate warmup at the start of training. |
| drop_path | float | 0.0 | Stochastic depth drop-path rate applied to the backbone. Higher values add more regularization. |
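A cosine schedule with a short warmup and a non-zero floor might look like this. The values are illustrative:

```python
# Cosine annealing from lr down to 5% of lr, after one epoch of linear warmup.
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    lr=1e-4,
    lr_scheduler="cosine",
    lr_min_factor=0.05,  # final LR = 0.05 * 1e-4 = 5e-6
    warmup_epochs=1.0,
)
```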
Runtime and Accelerator
| Parameter | Type | Default | Description |
|---|---|---|---|
| accelerator | str | "auto" | PyTorch Lightning accelerator selection. "auto" picks GPU if available, then MPS, then CPU. |
| seed | int | None | Global random seed for reproducibility. None means no fixed seed is set. |
| fp16_eval | bool | False | Run evaluation passes in FP16 precision. Reduces memory usage but may lower numerical precision. |
| compute_val_loss | bool | True | Compute and log the detection loss on the validation set each epoch. |
| compute_test_loss | bool | True | Compute and log the detection loss during the final test run. |
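For a reproducible run that keeps evaluation memory down, these could be combined as follows (illustrative values):

```python
# Fix the seed for reproducibility and evaluate in FP16 to save memory.
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    seed=42,
    accelerator="auto",
    fp16_eval=True,
)
```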
DataLoader Tuning
| Parameter | Type | Default | Description |
|---|---|---|---|
| pin_memory | bool | None | Pin host memory in the DataLoader for faster GPU transfers. None defers to PyTorch Lightning's default. |
| persistent_workers | bool | None | Keep DataLoader worker processes alive between epochs. None defers to PyTorch Lightning's default. |
| prefetch_factor | int | None | Number of batches to prefetch per DataLoader worker. None uses PyTorch's built-in default. |
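If data loading is the bottleneck, explicitly enabling these options is a common tweak. The values are illustrative and the defaults are often fine:

```python
# Keep workers warm between epochs and overlap host-to-GPU transfers with compute.
model.train(
    dataset_dir="path/to/dataset",
    epochs=100,
    pin_memory=True,
    persistent_workers=True,
    prefetch_factor=4,  # batches prefetched per DataLoader worker
)
```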
Complete Parameter Reference
Below is a summary table of all training parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| dataset_dir | str | Required | Path to COCO or YOLO formatted dataset with train/valid/test splits. |
| output_dir | str | "output" | Directory for checkpoints, logs, and other training artifacts. |
| epochs | int | 100 | Number of full passes over the dataset. |
| batch_size | int | 4 | Samples per iteration. Balance with grad_accum_steps. |
| grad_accum_steps | int | 4 | Gradient accumulation steps for effective larger batch sizes. |
| lr | float | 1e-4 | Learning rate for the model (excluding encoder). |
| lr_encoder | float | 1.5e-4 | Learning rate for the backbone encoder. |
| resolution | int | Model-specific | Input image size (must be divisible by the selected model's patch_size * num_windows). |
| weight_decay | float | 1e-4 | L2 regularization coefficient. |
| device | str | "cuda" | Training device: cuda, cpu, or mps. |
| use_ema | bool | True | Enable Exponential Moving Average of weights. |
| gradient_checkpointing | bool | False | Trade compute for memory during backprop. |
| checkpoint_interval | int | 10 | Save checkpoint every N epochs. |
| resume | str | None | Path to checkpoint for resuming training. |
| tensorboard | bool | True | Enable TensorBoard logging. |
| wandb | bool | False | Enable Weights & Biases logging. |
| project | str | None | W&B project name. |
| run | str | None | W&B run name. |
| early_stopping | bool | False | Enable early stopping. |
| early_stopping_patience | int | 10 | Epochs without improvement before stopping. |
| early_stopping_min_delta | float | 0.001 | Minimum mAP change to qualify as improvement. |
| early_stopping_use_ema | bool | False | Use EMA model for early stopping metrics. |
| skip_best_epochs | int | 0 | Ignore the first N epochs for best-checkpoint selection and early-stopping patience. |
| eval_max_dets | int | 500 | Maximum detections per image considered during COCO evaluation. |
| eval_interval | int | 1 | Run COCO evaluation every N epochs. |
| log_per_class_metrics | bool | True | Log per-class AP metrics to the console and loggers. |
| progress_bar | str \| bool \| None | None | Progress bar style: "tqdm", "rich", or None. Legacy booleans are still accepted. |
| accelerator | str | "auto" | PyTorch Lightning accelerator. "auto" selects GPU/MPS/CPU automatically. |
| seed | int | None | Random seed for reproducibility. None means no fixed seed. |
| lr_scheduler | str | "step" | Learning rate scheduler type: "step" or "cosine". |
| lr_min_factor | float | 0.0 | Minimum LR as a fraction of the initial LR (cosine scheduler floor). |
| warmup_epochs | float | 0.0 | Number of linear warmup epochs at the start of training. |
| drop_path | float | 0.0 | Stochastic depth drop-path rate for the backbone. |
| compute_val_loss | bool | True | Compute and log loss during validation. |
| compute_test_loss | bool | True | Compute and log loss during the test run. |
| fp16_eval | bool | False | Run evaluation in FP16 precision to reduce memory usage. |
| pin_memory | bool | None | Pin DataLoader memory. None defers to PyTorch Lightning's default. |
| persistent_workers | bool | None | Keep DataLoader workers alive between epochs. None uses PTL default. |
| prefetch_factor | int | None | Number of batches prefetched per worker. None uses PyTorch default. |