# Train an RF-DETR Model
You can train RF-DETR object detection and segmentation models on a custom dataset using the rfdetr Python package, or in the cloud using Roboflow.
This guide describes how to train both object detection and segmentation RF-DETR models.
## Quick Start
RF-DETR supports training on datasets in both COCO and YOLO formats. The format is automatically detected based on the structure of your dataset directory.
Different GPUs have different VRAM capacities, so adjust batch_size and grad_accum_steps to maintain a total batch size of 16. For example, on a powerful GPU like the A100, use batch_size=16 and grad_accum_steps=1; on smaller GPUs like the T4, use batch_size=4 and grad_accum_steps=4. This gradient accumulation strategy helps train effectively even with limited memory.
For object detection, the RF-DETR-B checkpoint is used by default. To get started quickly with training an object detection model, please refer to our fine-tuning Google Colab notebook.
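A minimal training run might look like the sketch below, which follows the `rfdetr` package API described in its README; the dataset path is a placeholder, and the batch settings follow the T4 guidance above.

```python
from rfdetr import RFDETRBase

# Loads the RF-DETR-B checkpoint by default
model = RFDETRBase()

model.train(
    dataset_dir="path/to/dataset",  # placeholder: COCO or YOLO format, auto-detected
    epochs=10,
    batch_size=4,        # T4-sized GPU
    grad_accum_steps=4,  # 4 * 4 = total batch size of 16
)
```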
## Dataset Format
RF-DETR automatically detects whether your dataset is in COCO or YOLO format. Simply pass your dataset directory to the train() method and the appropriate data loader will be used.
| Format | Detection Method | Learn More |
|---|---|---|
| COCO | Looks for `train/_annotations.coco.json` | COCO Format Guide |
| YOLO | Looks for `data.yaml` + `train/images/` | YOLO Format Guide |
Roboflow allows you to create object detection datasets from scratch and export them in either COCO JSON or YOLO format for training. You can also explore Roboflow Universe to find pre-labeled datasets for a range of use cases.
→ Learn more about dataset formats
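To make the detection rules in the table concrete, here is a hypothetical helper that mirrors the same checks. `detect_format` is illustrative only and not part of the `rfdetr` API:

```python
from pathlib import Path

def detect_format(dataset_dir: str) -> str:
    """Illustrative only: mirrors the detection rules in the table above."""
    root = Path(dataset_dir)
    # COCO layout: per-split annotation files such as train/_annotations.coco.json
    if (root / "train" / "_annotations.coco.json").exists():
        return "coco"
    # YOLO layout: a data.yaml plus per-split image folders
    if (root / "data.yaml").exists() and (root / "train" / "images").is_dir():
        return "yolo"
    raise ValueError(f"Could not detect dataset format in {dataset_dir}")
```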
## Training Configuration
RF-DETR provides many configuration options to customize your training run. See the complete reference for all available parameters.
→ View all training parameters
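As a sketch, a customized run might pass options like these; the parameter names below are taken from the package README, but consult the full parameter reference for the authoritative list.

```python
from rfdetr import RFDETRBase

model = RFDETRBase()

model.train(
    dataset_dir="path/to/dataset",  # placeholder
    epochs=20,
    batch_size=4,
    grad_accum_steps=4,   # keep batch_size * grad_accum_steps = 16
    lr=1e-4,
    output_dir="output",  # where checkpoints are written
)
```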
## Advanced Topics
- Resume training from a checkpoint
- Early stopping to prevent overfitting
- Multi-GPU training with PyTorch DDP
- Logging with TensorBoard
- Logging with Weights and Biases
→ Learn more about advanced training
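Several of the topics above are enabled through `train()` arguments. The sketch below assumes the `resume`, `tensorboard`, and `early_stopping` parameters described in the package documentation:

```python
from rfdetr import RFDETRBase

model = RFDETRBase()

model.train(
    dataset_dir="path/to/dataset",
    epochs=50,
    batch_size=4,
    grad_accum_steps=4,
    resume="output/checkpoint.pth",  # assumption: resume from the latest checkpoint
    tensorboard=True,                # assumption: write TensorBoard logs
    early_stopping=True,             # assumption: stop when validation plateaus
)
```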
## Result Checkpoints
During training, multiple model checkpoints are saved to the output directory:
- `checkpoint.pth` – the most recent checkpoint, saved at the end of the latest epoch.
- `checkpoint_<number>.pth` – periodic checkpoints saved every N epochs (default is every 10).
- `checkpoint_best_ema.pth` – best checkpoint based on validation score, using the EMA (Exponential Moving Average) weights. EMA weights are a smoothed version of the model's parameters across training steps, often yielding better generalization.
- `checkpoint_best_regular.pth` – best checkpoint based on validation score, using the raw (non-EMA) model weights.
- `checkpoint_best_total.pth` – final checkpoint selected for inference and benchmarking. It contains only the model weights (no optimizer state or scheduler) and is chosen as the better of the EMA and non-EMA models based on validation performance.
**Checkpoint file sizes**
Checkpoint sizes vary based on what they contain:
- Training checkpoints (e.g. `checkpoint.pth`, `checkpoint_<number>.pth`) include model weights, optimizer state, scheduler state, and training metadata. Use these to resume training.
- Evaluation checkpoints (e.g. `checkpoint_best_ema.pth`, `checkpoint_best_regular.pth`) store only the model weights (either EMA or raw) and are used to track the best-performing models. These may come from different epochs depending on which version achieved the highest validation score.
- Stripped checkpoint (e.g. `checkpoint_best_total.pth`) contains only the final model weights and is optimized for inference and deployment.
## Load and Run Fine-Tuned Model
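Once training finishes, you can load a fine-tuned checkpoint and run inference. The sketch below assumes the `pretrain_weights` argument and `predict()` method from the `rfdetr` README; the image path is a placeholder.

```python
from PIL import Image
from rfdetr import RFDETRBase

# Load fine-tuned weights (the stripped checkpoint is suited for inference)
model = RFDETRBase(pretrain_weights="output/checkpoint_best_total.pth")

image = Image.open("path/to/image.jpg")  # placeholder image
detections = model.predict(image, threshold=0.5)
print(detections)
```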
## Next Steps
After training your model, you can:
- Export your model to ONNX for deployment with various inference frameworks
- Deploy to Roboflow for cloud-based inference and workflow integration
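As a sketch, ONNX export is exposed through the model's `export()` method per the package README; check the export documentation for available options.

```python
from rfdetr import RFDETRBase

model = RFDETRBase(pretrain_weights="output/checkpoint_best_total.pth")

# Export the model to ONNX for use with other inference frameworks
model.export()
```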