Dataset Formats¶

RF-DETR supports training on datasets in two popular formats: COCO and YOLO. The format is detected automatically from your dataset's directory structure, so you only need to pass your dataset directory to the train() method.

Automatic Format Detection¶

When you call model.train(dataset_dir=<path>), RF-DETR checks the following:

COCO format: Looks for train/_annotations.coco.json
YOLO format: Looks for data.yaml (or data.yml) and train/images/ directory

If neither layout is found, RF-DETR raises an error that describes the expected directory structure.
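The checks above can be sketched as a small standalone function. This is an illustrative reimplementation, not RF-DETR's own code: the name detect_dataset_format and the order of the checks (COCO first) are assumptions, and the error text simply mirrors the message quoted in the Troubleshooting section.

```python
from pathlib import Path


def detect_dataset_format(dataset_dir):
    """Return "coco" or "yolo" based on the directory layout (hypothetical helper)."""
    root = Path(dataset_dir)

    # COCO: the training split keeps its annotations in one JSON file.
    if (root / "train" / "_annotations.coco.json").is_file():
        return "coco"

    # YOLO: data.yaml (or data.yml) at the root plus a train/images/ directory.
    has_yaml = any((root / name).is_file() for name in ("data.yaml", "data.yml"))
    if has_yaml and (root / "train" / "images").is_dir():
        return "yolo"

    raise ValueError(f"Could not detect dataset format in {root}")
```

Running a check like this before training can surface a misplaced annotation file or a missing train/images/ directory early.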

Roboflow Export

Roboflow can export datasets in both COCO and YOLO formats. When downloading from Roboflow, select whichever format fits your workflow.

COCO Format¶

COCO (Common Objects in Context) format stores the labels for each split in a single JSON file with three main sections: images, categories, and annotations.

Directory Structure¶

dataset/
├── train/
│   ├── _annotations.coco.json
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ... (other image files)
├── valid/
│   ├── _annotations.coco.json
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ... (other image files)
└── test/
    ├── _annotations.coco.json
    ├── image1.jpg
    ├── image2.jpg
    └── ... (other image files)

Annotation File Structure¶

Each _annotations.coco.json file contains:

{
  "info": {
    "description": "Dataset description",
    "version": "1.0"
  },
  "licenses": [],
  "images": [
    {
      "id": 1,
      "file_name": "image1.jpg",
      "width": 640,
      "height": 480
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "cat",
      "supercategory": "animal"
    },
    {
      "id": 2,
      "name": "dog",
      "supercategory": "animal"
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [
        100,
        150,
        200,
        180
      ],
      "area": 36000,
      "iscrowd": 0
    }
  ]
}

Key Fields¶

Field	Description
`images`	List of image metadata including `id`, `file_name`, `width`, `height`
`categories`	List of object categories with `id` and `name`
`annotations`	List of object annotations linking images to categories
`bbox`	Bounding box as `[x, y, width, height]` in pixels, where `(x, y)` is the top-left corner
`area`	Area of the bounding box
`iscrowd`	0 for individual objects, 1 for crowd regions
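Because bbox stores a corner plus a size rather than two corners, a conversion is often needed before drawing or comparing boxes. The sketch below uses hypothetical helper names (coco_bbox_to_xyxy and bbox_area are not part of RF-DETR) and the box from the example annotation.

```python
def coco_bbox_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


def bbox_area(bbox):
    """Area of a COCO box: width times height."""
    _, _, width, height = bbox
    return width * height


bbox = [100, 150, 200, 180]      # the box from the example annotation
print(coco_bbox_to_xyxy(bbox))   # (100, 150, 300, 330)
print(bbox_area(bbox))           # 36000, matching the "area" field
```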

Segmentation Annotations¶

For training segmentation models, your COCO annotations must include a segmentation key with polygon coordinates:

{
  "id": 1,
  "image_id": 1,
  "category_id": 1,
  "bbox": [
    100,
    150,
    200,
    180
  ],
  "area": 36000,
  "iscrowd": 0,
  "segmentation": [
    [
      100,
      150,
      150,
      150,
      200,
      200,
      150,
      250,
      100,
      200
    ]
  ]
}

The segmentation field contains a list of polygons, where each polygon is a flat list of coordinates: [x1, y1, x2, y2, x3, y3, ...].
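To make the flat layout concrete, the following sketch groups the coordinates into (x, y) points and computes the enclosed area with the shoelace formula. The helper names are illustrative assumptions, not RF-DETR functions.

```python
def polygon_to_points(flat):
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) pairs."""
    if len(flat) % 2 != 0 or len(flat) < 6:
        raise ValueError("a polygon needs an even number of values and at least 3 points")
    return list(zip(flat[0::2], flat[1::2]))


def polygon_area(points):
    """Area enclosed by the polygon, computed with the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


polygon = [100, 150, 150, 150, 200, 200, 150, 250, 100, 200]  # from the example above
points = polygon_to_points(polygon)
print(points)                # [(100, 150), (150, 150), (200, 200), (150, 250), (100, 200)]
print(polygon_area(points))  # 6250.0
```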

Keypoint Annotations¶

For training the keypoint preview model, use COCO JSON keypoint annotations. Roboflow-style COCO exports are supported when the split files are named train/_annotations.coco.json and valid/_annotations.coco.json.

Each keypoint annotation must include a bounding box plus COCO keypoint fields:

{
  "id": 1,
  "image_id": 1,
  "category_id": 0,
  "bbox": [
    100,
    150,
    200,
    180
  ],
  "area": 36000,
  "iscrowd": 0,
  "num_keypoints": 17,
  "keypoints": [
    110,
    160,
    2,
    125,
    158,
    2
  ]
}

The category should declare the keypoint schema:

{
  "id": 0,
  "name": "person",
  "supercategory": "person",
  "keypoints": [
    "nose",
    "left_eye",
    "right_eye"
  ],
  "skeleton": []
}

The keypoints arrays above are shortened for readability. In a valid COCO person-keypoint annotation, the annotation's keypoints array contains 17 × 3 = 51 values (x, y, and visibility for each keypoint), and the category lists all 17 keypoint names.
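The triplet layout can be checked with a few lines of Python. This is a minimal sketch with hypothetical helper names; it relies on the COCO convention that visibility is 0 for unlabeled keypoints, 1 for labeled but not visible, and 2 for labeled and visible, and that num_keypoints counts keypoints with visibility greater than 0.

```python
def split_keypoints(flat):
    """Group a flat COCO keypoints list into (x, y, visibility) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled_keypoints(flat):
    """Count keypoints with visibility > 0, the value COCO stores in num_keypoints."""
    return sum(1 for _, _, visibility in split_keypoints(flat) if visibility > 0)


def matches_schema(annotation, category):
    """True when the annotation has one triplet per keypoint name in the category."""
    return len(annotation["keypoints"]) == 3 * len(category["keypoints"])


shortened = [110, 160, 2, 125, 158, 2]      # the truncated array from the example
print(split_keypoints(shortened))           # [(110, 160, 2), (125, 158, 2)]
print(count_labeled_keypoints(shortened))   # 2
```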

The keypoint preview model is pretrained on COCO person-style keypoints. Its default COCO schema is [0, 17], so keypoint-bearing categories are mapped onto the active keypoint label slot during COCO loading. Custom keypoint training can also use YOLO pose labels, described below.

YOLO Format¶

YOLO format uses separate text files for each image's annotations and a data.yaml configuration file that defines class names.

Directory Structure¶

dataset/
├── data.yaml
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels/
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
├── valid/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels/
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
└── test/
    ├── images/
    │   ├── image1.jpg
    │   └── ...
    └── labels/
        ├── image1.txt
        └── ...

data.yaml Configuration¶

The data.yaml file at the root of your dataset directory defines the class names:

names:
  - cat
  - dog
  - bird

nc: 3

train: train/images
val: valid/images
test: test/images

Field	Description
`names`	List of class names (0-indexed)
`nc`	Number of classes
`train`, `val`, `test`	Paths to image directories (relative to data.yaml)

Alternative format

Some YOLO datasets use a dictionary format for names:

names:
  0: cat
  1: dog
  2: bird

Both formats are supported.
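If you read data.yaml yourself, the two layouts can be reduced to one list indexed by class ID. The sketch below works on the already-parsed value of names; normalize_names is a hypothetical helper, and the requirement that dictionary keys run from 0 upward without gaps is an assumption made for this example.

```python
def normalize_names(names):
    """Return class names as a list whose index is the class ID."""
    if isinstance(names, dict):
        # Dictionary form: keys are class IDs (ints, or strings holding ints).
        by_id = {int(key): value for key, value in names.items()}
        # Assumption: IDs are contiguous, so list positions can stand in for them.
        if sorted(by_id) != list(range(len(by_id))):
            raise ValueError("class IDs must run from 0 to len(names) - 1")
        return [by_id[i] for i in range(len(by_id))]
    # List form: the position in the list is already the class ID.
    return list(names)


print(normalize_names(["cat", "dog", "bird"]))           # ['cat', 'dog', 'bird']
print(normalize_names({0: "cat", 1: "dog", 2: "bird"}))  # ['cat', 'dog', 'bird']
```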

Label File Format¶

Each image has a corresponding .txt file in the labels/ directory with the same base name. Each line in the label file represents one object:

<class_id> <x_center> <y_center> <width> <height>

Example (image1.txt):

0 0.5 0.4 0.3 0.2
1 0.2 0.6 0.15 0.25

Coordinate Format¶

Field	Range	Description
`class_id`	0, 1, 2, ...	Zero-indexed class ID from `names` in data.yaml
`x_center`	0.0 - 1.0	Normalized x-coordinate of bounding box center
`y_center`	0.0 - 1.0	Normalized y-coordinate of bounding box center
`width`	0.0 - 1.0	Normalized width of bounding box
`height`	0.0 - 1.0	Normalized height of bounding box

All coordinates are normalized relative to image dimensions. For example, if an image is 640×480 pixels and the bounding box center is at (320, 240):

x_center = 320 / 640 = 0.5
y_center = 240 / 480 = 0.5
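The same arithmetic extends to the width and height. As a sketch, the hypothetical helper below turns a pixel-space box given by its corners into one YOLO label line; the box size (192×96 pixels) is chosen only for illustration.

```python
def pixel_box_to_yolo(class_id, x_min, y_min, x_max, y_max, image_width, image_height):
    """Format a pixel-space corner box as a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6g} {y_center:.6g} {width:.6g} {height:.6g}"


# A 192x96 pixel box centered at (320, 240) in a 640x480 image.
print(pixel_box_to_yolo(0, 224, 192, 416, 288, 640, 480))  # 0 0.5 0.5 0.3 0.2
```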

Segmentation Labels (YOLO-Seg)¶

For segmentation, YOLO format extends the label format with polygon coordinates:

<class_id> <x1> <y1> <x2> <y2> <x3> <y3> ...

Example (image1.txt with segmentation):

0 0.1 0.2 0.3 0.2 0.4 0.5 0.2 0.6 0.1 0.4

The coordinates after the class ID represent the polygon vertices in normalized format.
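Reading such a line back into pixel coordinates is the reverse operation: multiply each x by the image width and each y by the image height. The parser below is a minimal sketch with a hypothetical name, not the loader RF-DETR uses.

```python
def parse_yolo_seg_line(line, image_width, image_height):
    """Parse one YOLO-Seg label line into (class_id, [(x_px, y_px), ...])."""
    parts = line.split()
    class_id = int(parts[0])
    coords = [float(value) for value in parts[1:]]
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected at least 3 (x, y) pairs after the class ID")
    # Scale normalized coordinates back to pixels.
    points = [
        (x * image_width, y * image_height)
        for x, y in zip(coords[0::2], coords[1::2])
    ]
    return class_id, points


class_id, points = parse_yolo_seg_line(
    "0 0.1 0.2 0.3 0.2 0.4 0.5 0.2 0.6 0.1 0.4", image_width=640, image_height=480
)
print(class_id, len(points))  # 0 5
```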

Pose Labels (YOLO Pose)¶

For keypoint preview training, RF-DETR supports Ultralytics YOLO pose labels in the same directory layout shown above. The data.yaml file must declare kpt_shape:

names:
  0: person

kpt_shape: [17, 3] # [number_of_keypoints, dimensions]; dimensions must be 2 or 3
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
kpt_names:
  0:
    - nose
    - left_eye
    - right_eye

kpt_names is optional. When omitted, RF-DETR creates placeholder names such as keypoint_0. flip_idx is an Ultralytics-style length-K permutation used to infer RF-DETR's flat keypoint_flip_pairs for horizontal-flip augmentation.

Each pose label row contains a bounding box followed by keypoints:

<class_id> <x_center> <y_center> <width> <height> <px1> <py1> <v1> ... <pxK> <pyK> <vK>

For kpt_shape: [K, 2], omit the visibility value:

<class_id> <x_center> <y_center> <width> <height> <px1> <py1> ... <pxK> <pyK>

All box and keypoint coordinates are normalized to [0, 1]. RF-DETR converts keypoints to COCO-style (x, y, visibility) tensors internally. For [K, 3], the visibility values are preserved. For [K, 2], visibility is synthesized: nonzero points are marked visible (2) and (0, 0) points are marked absent (0).
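The row layout and the visibility rule for [K, 2] can be illustrated with a small parser. This is a sketch of the behavior described above, written with plain tuples instead of tensors; parse_pose_row is a hypothetical name and the code is not taken from RF-DETR.

```python
def parse_pose_row(line, num_keypoints, dims):
    """Parse one YOLO pose row into (class_id, box, [(x, y, visibility), ...])."""
    if dims not in (2, 3):
        raise ValueError("dims must be 2 or 3")
    values = line.split()
    expected = 5 + num_keypoints * dims
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    class_id = int(values[0])
    box = tuple(float(v) for v in values[1:5])   # x_center, y_center, width, height
    raw = [float(v) for v in values[5:]]

    keypoints = []
    for i in range(0, len(raw), dims):
        x, y = raw[i], raw[i + 1]
        if dims == 3:
            visibility = int(raw[i + 2])          # preserved as written
        else:
            # Synthesized: (0, 0) means absent, anything else is marked visible.
            visibility = 0 if (x == 0.0 and y == 0.0) else 2
        keypoints.append((x, y, visibility))
    return class_id, box, keypoints


row = "0 0.5 0.5 0.4 0.6 0.45 0.3 0.0 0.0"       # K = 2, no visibility column
print(parse_pose_row(row, num_keypoints=2, dims=2)[2])
# [(0.45, 0.3, 2), (0.0, 0.0, 0)]
```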

Use the YOLO schema helper when you want to configure a model explicitly:

from pathlib import Path

from rfdetr import RFDETRKeypointPreview
from rfdetr.datasets._keypoint_schema import infer_yolo_keypoint_schema

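DATASET_DIR = Path("/path/to/yolo-pose-dataset")

# Infer the keypoint schema from the dataset's data.yaml.
schema = infer_yolo_keypoint_schema(DATASET_DIR / "data.yaml")

# Build the model with the class and keypoint counts taken from the schema.
model = RFDETRKeypointPreview(
    num_classes=len(schema.class_names),
    num_keypoints_per_class=schema.num_keypoints_per_class,
)

# Train on the YOLO pose dataset, passing the schema's class names and OKS sigmas.
model.train(
    dataset_file="yolo",
    dataset_dir=str(DATASET_DIR),
    class_names=schema.class_names,
    keypoint_oks_sigmas=schema.keypoint_oks_sigmas,
)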

flip_idx and keypoint_flip_pairs

flip_idx is a permutation, while keypoint_flip_pairs is a flat pair list. During model.train(), RF-DETR infers the pair list automatically from flip_idx when no explicit keypoint_flip_pairs is provided.
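The relationship between the two representations can be sketched as follows. The function below is an illustration of the idea only: its name is hypothetical, and the exact ordering RF-DETR produces for keypoint_flip_pairs is an assumption. It treats flip_idx as a permutation in which swapped keypoints point at each other, and emits each swapped pair once as two consecutive entries.

```python
def flip_idx_to_pairs(flip_idx):
    """Turn a length-K flip permutation into a flat [a1, b1, a2, b2, ...] pair list."""
    if sorted(flip_idx) != list(range(len(flip_idx))):
        raise ValueError("flip_idx must be a permutation of 0..K-1")
    pairs = []
    for i, j in enumerate(flip_idx):
        if flip_idx[j] != i:
            raise ValueError(f"keypoints {i} and {j} do not swap with each other")
        if i < j:                 # skip self-mapped points and the mirrored duplicate
            pairs.extend([i, j])
    return pairs


flip_idx = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
print(flip_idx_to_pairs(flip_idx))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```

Keypoint 0 (the nose in the COCO person layout) maps to itself, so it does not appear in the pair list.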

Converting Between Formats¶

YOLO to COCO¶

You can use the supervision library to convert datasets:

import supervision as sv

# Load YOLO dataset
dataset = sv.DetectionDataset.from_yolo(
    images_directory_path="path/to/images",
    annotations_directory_path="path/to/labels",
    data_yaml_path="path/to/data.yaml",
)

# Save as COCO
dataset.as_coco(
    images_directory_path="output/images",
    annotations_path="output/annotations.json",
)

COCO to YOLO¶

import supervision as sv

# Load COCO dataset
dataset = sv.DetectionDataset.from_coco(
    images_directory_path="path/to/images",
    annotations_path="path/to/annotations.json",
)

# Save as YOLO
dataset.as_yolo(
    images_directory_path="output/images",
    annotations_directory_path="output/labels",
    data_yaml_path="output/data.yaml",
)

Using Roboflow¶

Roboflow provides a web interface to:

Upload datasets in any format
Annotate new images or edit existing annotations
Export in COCO, YOLO, or other formats

This is often the easiest way to convert between formats, and it also gives you the option to augment your data.

Which Format Should I Use?¶

Both formats work equally well with RF-DETR. Choose based on your workflow:

Consideration	COCO	YOLO
Annotation storage	Single JSON file per split	One text file per image
Human readability	JSON structure, verbose	Simple text, compact
Other framework compatibility	DETR family, MMDetection	Ultralytics YOLO
Segmentation support	Full polygon support	Full polygon support
Editing annotations	Requires JSON parsing	Simple text editing

Recommendation

If you're exporting from Roboflow or already have a dataset in one format, simply use that format. RF-DETR handles both identically.

Troubleshooting¶

Format Detection Fails¶

If you see an error like:

Could not detect dataset format in /path/to/dataset

Check that:

For COCO format:

train/_annotations.coco.json exists
The JSON file is valid

For YOLO format:

data.yaml or data.yml exists at the root
train/images/ directory exists with images

Empty Annotations¶

If images have no objects, handle them as follows:

COCO format: Include the image in the images array but don't add any annotations for it.

YOLO format: Create an empty .txt file (0 bytes) for the image, or omit the label file entirely.
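To confirm which COCO images are being treated as empty, you can list the images that no annotation refers to. The helper below is a hypothetical sketch that operates on an already-loaded annotation dictionary.

```python
def images_without_annotations(coco):
    """Return file names of images that no annotation refers to."""
    annotated_ids = {ann["image_id"] for ann in coco.get("annotations", [])}
    return [
        image["file_name"]
        for image in coco["images"]
        if image["id"] not in annotated_ids
    ]


coco = {
    "images": [
        {"id": 1, "file_name": "image1.jpg"},
        {"id": 2, "file_name": "image2.jpg"},   # background image, no objects
    ],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
print(images_without_annotations(coco))  # ['image2.jpg']
```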

Class ID Mismatch¶

COCO format: Category IDs in annotations must match IDs defined in the categories array.

YOLO format: Class IDs in label files must be valid indices (0 to nc-1) based on the names list in data.yaml.
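Both rules are easy to verify before training. The following sketch uses hypothetical helper names and works on in-memory data: a loaded COCO dictionary in the first case, and the lines of one YOLO label file in the second.

```python
def find_unknown_coco_category_ids(coco):
    """Return category IDs used in annotations but missing from categories."""
    known = {category["id"] for category in coco["categories"]}
    used = {ann["category_id"] for ann in coco["annotations"]}
    return sorted(used - known)


def find_invalid_yolo_class_ids(label_lines, num_classes):
    """Return (line_number, class_id) for class IDs outside 0..num_classes-1."""
    invalid = []
    for line_number, line in enumerate(label_lines, start=1):
        if not line.strip():
            continue                      # ignore blank lines
        class_id = int(line.split()[0])
        if not 0 <= class_id < num_classes:
            invalid.append((line_number, class_id))
    return invalid


coco = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [{"category_id": 1}, {"category_id": 3}],
}
print(find_unknown_coco_category_ids(coco))                      # [3]
print(find_invalid_yolo_class_ids(["0 0.5 0.4 0.3 0.2",
                                   "3 0.2 0.6 0.15 0.25"], 3))   # [(2, 3)]
```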