Export RF-DETR Model¶
Key Takeaways
- Export to ONNX for cross-platform inference with ONNX Runtime, OpenVINO, or TensorRT
- Export to TFLite (FP32, FP16, INT8) for mobile and edge deployment
- TensorRT conversion delivers lowest latency on NVIDIA GPUs (2.3 ms for Nano)
- INT8 quantization requires calibration data from your dataset for accurate results
- Custom input resolutions supported (dimensions must be divisible by the model's block size, patch_size * num_windows)
RF-DETR supports exporting models to ONNX and TFLite formats, enabling deployment across a wide range of inference frameworks, edge devices, and hardware accelerators.
Installation¶
Install the export dependencies you need:
# ONNX export only
pip install "rfdetr[onnx]"
# TFLite export (includes ONNX dependency)
pip install "rfdetr[onnx,tflite]"
Basic Export¶
Export your trained model to ONNX format:
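from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="<path/to/checkpoint.pth>")
model.export()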
This command saves the ONNX model to the output directory by default.
Export Parameters¶
The export() method accepts several parameters to customize the export process:
| Parameter | Default | Description |
|---|---|---|
| `output_dir` | `"output"` | Directory where the exported model will be saved. |
| `format` | `"onnx"` | Export format: `"onnx"` or `"tflite"`. |
| `quantization` | `None` | TFLite quantization mode: `None`/`"fp32"`, `"fp16"`, or `"int8"`. Only used when `format="tflite"`. |
| `calibration_data` | `None` | Calibration data for TFLite export: image directory, `.npy` file path, NumPy array, or `None`. See TFLite Export. |
| `max_images` | `100` | Maximum number of images to load from a calibration directory for TFLite INT8 quantization. Ignored for other calibration data formats. |
| `infer_dir` | `None` | Path to an image file to use for tracing. If not provided, a random dummy image is generated. |
| `simplify` | `False` | Deprecated and ignored. ONNX simplification is no longer run by `export()`. |
| `backbone_only` | `False` | Export only the backbone feature extractor instead of the full model. |
| `opset_version` | `17` | ONNX opset version to use for export. Higher versions support more operations. |
| `verbose` | `True` | Whether to print verbose export information. |
| `force` | `False` | Deprecated and ignored. |
| `shape` | `None` | Input shape as a tuple `(height, width)`. Each dimension must be divisible by the selected model's block size (`patch_size * num_windows`). If not provided, uses the model's default resolution. |
| `batch_size` | `1` | Batch size for the exported model. |
Advanced Export Examples¶
Export with Custom Output Directory¶
from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="<path/to/checkpoint.pth>")
model.export(output_dir="exports/my_model")
Deprecated: Export with Simplification¶
The simplify flag is deprecated and ignored:
from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="<path/to/checkpoint.pth>")
model.export(simplify=True) # Deprecated: same result as model.export()
Export with Custom Resolution¶
Export the model with a specific input resolution. For example, RFDETRMedium expects dimensions divisible by 32 (patch_size=16, num_windows=2):
from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="<path/to/checkpoint.pth>")
model.export(shape=(608, 608))
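If you are unsure whether a resolution is valid, a small helper like the following rounds a dimension up to the nearest valid multiple (a sketch based on the block-size rule above, using RFDETRMedium's patch_size=16 and num_windows=2 as defaults):
def round_to_block(dim: int, patch_size: int = 16, num_windows: int = 2) -> int:
    # Round a dimension up to the nearest multiple of the model's block size
    block = patch_size * num_windows
    return ((dim + block - 1) // block) * block
print(round_to_block(600))  # 608, a valid height/width for RFDETRMedium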
Export Backbone Only¶
Export only the backbone feature extractor for use in custom pipelines:
from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="<path/to/checkpoint.pth>")
model.export(backbone_only=True)
Output Files¶
After running the export, you will find the following files in your output directory:
- `inference_model.onnx`: the exported ONNX model (or `backbone_model.onnx` if `backbone_only=True`)
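As a quick sanity check, you can load the exported file with the onnx package and run the model checker (this assumes the default output directory):
import onnx
onnx_model = onnx.load("output/inference_model.onnx")
onnx.checker.check_model(onnx_model)  # raises an error if the graph is malformed
print([i.name for i in onnx_model.graph.input], [o.name for o in onnx_model.graph.output])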
Optional: Convert ONNX to TensorRT¶
If you want lower latency on NVIDIA GPUs, you can convert the exported ONNX model to a TensorRT engine.
[!IMPORTANT] Run TensorRT conversion on the same machine and GPU family where you plan to deploy inference.
Prerequisites¶
- Install TensorRT (`trtexec` must be available in your `PATH`)
- Export an ONNX model first (for example: `output/inference_model.onnx`)
Python API Conversion¶
from argparse import Namespace
from rfdetr.export.tensorrt import trtexec
args = Namespace(
    verbose=True,
    profile=False,
    dry_run=False,
)
trtexec("output/inference_model.onnx", args)
This produces output/inference_model.engine. If profile=True, it also writes an Nsight Systems report (.nsys-rep).
TFLite Export¶
Export your model to TFLite for deployment on mobile devices, microcontrollers, and edge hardware via TensorFlow Lite. The TFLite export pipeline converts ONNX → TensorFlow → TFLite using onnx2tf.
Prerequisites¶
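TFLite export requires the additional TFLite dependencies. Install them with pip install "rfdetr[onnx,tflite]" as shown in the Installation section above.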
Basic TFLite Export (FP32)¶
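Export with format="tflite"; without a quantization argument the converter produces the FP32 (and FP16) models:
from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="<path/to/checkpoint.pth>")
model.export(format="tflite")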
This produces both output/inference_model_float32.tflite and output/inference_model_float16.tflite.
INT8 Quantization with Calibration Data¶
For INT8 quantization, provide representative images from your dataset as calibration data. This is critical for preserving model accuracy — without real calibration data, the quantizer uses random noise and accuracy will be poor.
Option 1: Point to an Image Directory (Recommended)¶
The simplest approach — just point calibration_data to a directory containing JPEG/PNG images. The converter automatically loads, resizes, and prepares the images:
from rfdetr import RFDETRNano
model = RFDETRNano()
model.export(
    format="tflite",
    quantization="int8",
    calibration_data="path/to/val2017/",  # directory of images
    output_dir="output",
)
The converter loads up to 100 images from the directory by default, resizes them to the model's input resolution, and uses them for both output validation and INT8 calibration. Supported formats: JPEG, PNG, BMP, WebP.
You can control how many images are loaded with the max_images parameter:
model.export(
    format="tflite",
    quantization="int8",
    calibration_data="path/to/val2017/",
    max_images=200,  # load up to 200 images (default: 100)
    output_dir="output",
)
Option 2: NumPy .npy File¶
Prepare calibration data as a NumPy array and save it to a .npy file:
- Shape: `(N, H, W, 3)`, NHWC format with 3 color channels
- Data type: `float32`
- Value range: `[0, 1]` (divide by 255, but do not apply ImageNet normalization; the converter handles that automatically)
- Recommended: 20–100 representative images from your dataset
import numpy as np
from PIL import Image
from rfdetr import RFDETRBase
model = RFDETRBase()
target_resolution = model.resolution
# Load representative images from your dataset
images = []
for path in image_paths[:50]:  # 50 representative samples
    img = Image.open(path).convert("RGB").resize((target_resolution, target_resolution))
    images.append(np.array(img, dtype=np.float32) / 255.0)
calibration_data = np.stack(images) # shape: (50, H, W, 3)
# Save to .npy for reuse
np.save("calibration_data.npy", calibration_data)
# Export with INT8 quantization
model.export(
    format="tflite",
    quantization="int8",
    calibration_data="calibration_data.npy",
    output_dir="output",
)
Option 3: NumPy Array Directly¶
You can also pass the NumPy array directly without saving to disk:
model.export(
    format="tflite",
    quantization="int8",
    calibration_data=calibration_data,  # np.ndarray
    output_dir="output",
)
FP16 Export¶
FP16 models are always produced alongside FP32. You can explicitly request FP16 mode:
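from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="<path/to/checkpoint.pth>")
model.export(format="tflite", quantization="fp16")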
TFLite Output Files¶
The onnx2tf converter always produces both FP32 and FP16 TFLite files, regardless of the requested quantization mode. When quantization="int8" is specified, it additionally produces the INT8-quantized model.
| File | Description |
|---|---|
| `inference_model_float32.tflite` | FP32 model (always produced) |
| `inference_model_float16.tflite` | FP16 model (always produced) |
| `inference_model_integer_quant.tflite` | INT8 model (when `quantization="int8"`) |
Note
Segmentation models produce TFLite files with three outputs: dets (bounding boxes), labels (class scores), and masks (per-instance segmentation masks).
TFLite Inference Example¶
import numpy as np
from PIL import Image
# pip install tflite-runtime (or use tensorflow.lite)
import tflite_runtime.interpreter as tflite
# Load model
interpreter = tflite.Interpreter(model_path="output/inference_model_float32.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Prepare input — TFLite model expects NHWC, ImageNet-normalized
input_height, input_width = input_details[0]["shape"][1:3]
image = Image.open("image.jpg").convert("RGB").resize((input_width, input_height))
image_array = np.array(image, dtype=np.float32) / 255.0
# Apply ImageNet normalization
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
image_array = (image_array - mean) / std
# Add batch dimension: (1, H, W, 3)
image_array = np.expand_dims(image_array, axis=0).astype(np.float32)
# Run inference
interpreter.set_tensor(input_details[0]["index"], image_array)
interpreter.invoke()
boxes = interpreter.get_tensor(output_details[0]["index"])
labels = interpreter.get_tensor(output_details[1]["index"])
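If you load the INT8 model (inference_model_integer_quant.tflite) instead, check the interpreter's input and output dtypes: depending on how the converter quantized the graph they may be int8 rather than float32, in which case values must be scaled with the quantization parameters the interpreter reports. A sketch of that handling, reusing the variables above:
inp = input_details[0]
if inp["dtype"] == np.int8:
    # Quantize the float input using the (scale, zero_point) reported by the interpreter
    scale, zero_point = inp["quantization"]
    image_array = (image_array / scale + zero_point).astype(np.int8)
interpreter.set_tensor(inp["index"], image_array)
interpreter.invoke()
out = output_details[0]
raw = interpreter.get_tensor(out["index"])
if out["dtype"] == np.int8:
    # Dequantize back to float for post-processing
    scale, zero_point = out["quantization"]
    raw = (raw.astype(np.float32) - zero_point) * scale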
Using the Exported Model¶
Once exported, you can use the ONNX model with various inference frameworks:
ONNX Runtime¶
import onnxruntime as ort
import numpy as np
from PIL import Image
# Load the ONNX model
session = ort.InferenceSession("output/inference_model.onnx")
# Prepare input image
input_height, input_width = session.get_inputs()[0].shape[2:4]
image = Image.open("image.jpg").convert("RGB")
image = image.resize((input_width, input_height)) # Resize to the exported model shape
image_array = np.array(image).astype(np.float32) / 255.0
# Normalize
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
image_array = (image_array - mean) / std
# Convert to NCHW format
image_array = np.transpose(image_array, (2, 0, 1))
image_array = np.expand_dims(image_array, axis=0).astype(np.float32)
# Run inference
outputs = session.run(None, {"input": image_array})
boxes, labels = outputs
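The raw outputs still need post-processing before they are usable detections. The exact tensor layout depends on the exported model, so treat the following only as a sketch: it assumes labels holds per-query class scores (apply a sigmoid first if your export emits logits) and boxes holds normalized (cx, cy, w, h) coordinates; verify both against your model.
scores = labels[0]                 # assumed shape (num_queries, num_classes)
class_ids = scores.argmax(axis=-1)
confidences = scores.max(axis=-1)
keep = confidences > 0.5           # confidence threshold
cxcywh = boxes[0][keep]            # assumed normalized (cx, cy, w, h)
xyxy = np.stack([
    cxcywh[:, 0] - cxcywh[:, 2] / 2,
    cxcywh[:, 1] - cxcywh[:, 3] / 2,
    cxcywh[:, 0] + cxcywh[:, 2] / 2,
    cxcywh[:, 1] + cxcywh[:, 3] / 2,
], axis=-1) * [input_width, input_height, input_width, input_height]
print(class_ids[keep], confidences[keep], xyxy)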
Next Steps¶
After exporting your model, you may want to:
- Deploy to Roboflow for cloud-based inference and workflow integration
- Use the ONNX model with TensorRT for optimized GPU inference
- Deploy TFLite models on mobile/edge devices with TensorFlow Lite
- Integrate with edge deployment frameworks like ONNX Runtime or OpenVINO