
ros2_yolos_cpp

Production-grade ROS2 nodes for YOLO inference — detection, segmentation, pose, OBB, and classification, with lifecycle management

Creator & Maintainer · 2024 · active · GitHub

Problem

Integrating a YOLO model into a ROS2 robot requires bridging image transport, synchronising message timestamps, managing node lifecycle transitions, and serialising detections into standard ROS2 message types — all before any application logic runs. This plumbing is identical for every robotics project and should not be re-invented.

Approach

A complete ROS2 package providing composable, lifecycle-managed nodes for each YOLO task type: detection, segmentation, pose estimation, oriented bounding boxes (OBB), and classification. Every node subscribes to any sensor_msgs/Image topic and publishes results as standard vision_msgs messages.

All configuration is via ROS2 parameters — model path, confidence threshold, NMS threshold, input/output topic names, GPU toggle — so the same node binary handles different models by changing a launch argument. No recompilation needed.
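A parameter file for such a node might look like the following sketch; the parameter and node names here are illustrative, not the package's exact schema:

```yaml
# Hypothetical parameter file for a detector node.
yolo_detector:
  ros__parameters:
    model_path: "/models/yolov8n.onnx"
    confidence_threshold: 0.25
    nms_threshold: 0.45
    input_topic: "/camera/image_raw"
    use_gpu: true
```

Swapping models is then a matter of pointing the same binary at a different parameter file or overriding `model_path` on the launch command line.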

Architecture

sensor_msgs/Image → CvBridge decode → YOLOs-CPP detector → task-specific publish:

Node         Output topic               Message type
Detector     ~/detections               vision_msgs/Detection2DArray
Segmentor    ~/detections + ~/masks     Detection2DArray + Image
Pose         ~/detections               Detection2DArray with keypoints
OBB          ~/detections               ros2_yolos_cpp/OBBDetection2DArray
Classifier   ~/classification           vision_msgs/Classification
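The confidence and NMS thresholds exposed as parameters gate the detector's postprocessing stage. A minimal standalone sketch of that stage in plain C++; the `Box` struct and function name are illustrative, not the package's actual API:

```cpp
#include <algorithm>
#include <vector>

// Illustrative box type: top-left corner plus size, in pixels.
struct Box {
    float x, y, w, h;
    float score;  // confidence
};

// Intersection-over-union of two axis-aligned boxes.
static float iou(const Box& a, const Box& b) {
    const float x1 = std::max(a.x, b.x);
    const float y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w);
    const float y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Confidence filter followed by greedy NMS, as gated by the
// confidence_threshold and nms_threshold parameters.
std::vector<Box> filter_and_nms(std::vector<Box> boxes,
                                float conf_thresh, float nms_thresh) {
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                    [&](const Box& b) { return b.score < conf_thresh; }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& cand : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(cand, k) > nms_thresh) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(cand);
    }
    return kept;
}
```

Only boxes that survive both filters are serialised into the Detection2DArray published on the node's output topic.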

Lifecycle node design: the ONNX session and GPU context are initialised in on_activate and released in on_deactivate. Nodes compose into a single container for efficient resource sharing and are compatible with nav2 lifecycle management. The package is CI-tested and uses strictly typed ROS2 parameters.
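Composable nodes can be loaded into a shared container at runtime with the standard component CLI. A sketch of that workflow; the container name and plugin class name here are guesses, not the package's actual identifiers:

```shell
# Start a shared component container (process-level resource sharing)
ros2 run rclcpp_components component_container --ros-args -r __node:=yolo_container

# In another terminal, load a detector component into it
# (plugin class name is hypothetical)
ros2 component load /yolo_container ros2_yolos_cpp ros2_yolos_cpp::DetectorNode
```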

Results

  • Runs on ROS2 Humble and Jazzy with standard colcon build
  • GPU acceleration via use_gpu:=true launch argument (ONNX CUDA execution provider)
  • All five task types — detection, segmentation, pose, OBB, classification — from one package
  • Docker image provided; runs without a local ROS2 install on dev machines
  • 148 stars on GitHub; deployed in AMR perception pipelines
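A typical build-and-run sequence under these assumptions might look as follows; the launch-file name is a guess, not necessarily what the package ships:

```shell
# Build the package in a ROS2 Humble or Jazzy workspace
colcon build --packages-select ros2_yolos_cpp
source install/setup.bash

# Launch with the ONNX CUDA execution provider enabled
# (launch-file name is hypothetical)
ros2 launch ros2_yolos_cpp detector.launch.py use_gpu:=true
```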

Lessons

CvBridge encoding format mismatches (BGR vs RGB) are the most common integration failure in ROS2 perception stacks. Making the colour conversion explicit in the node — rather than relying on the detector’s internal assumptions — caught encoding bugs that would otherwise surface as silently wrong detections.
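The explicit conversion boils down to a single byte swap on the interleaved 3-channel buffer. A minimal standalone sketch of that operation; the real node goes through cv_bridge/OpenCV, this only illustrates the underlying transform:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swap the first and third channel of an interleaved 3-channel image
// buffer in place (BGR -> RGB or back). Illustrative helper, not the
// package's actual API.
void swap_rb(std::vector<std::uint8_t>& pixels) {
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

Doing this at the node boundary, keyed off the incoming message's encoding field, means the detector always sees the channel order it was exported for.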

Lifecycle nodes add startup complexity but pay back when running multiple models in a container: deactivating one model while keeping another active lets the system adapt to changing task requirements without a full restart.
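That adaptation is driven entirely through standard lifecycle transitions; a sketch with hypothetical node names:

```shell
# Free one model's GPU context while the rest of the container keeps running
ros2 lifecycle set /segmentor deactivate
ros2 lifecycle set /detector activate

# Confirm the current lifecycle state
ros2 lifecycle get /segmentor
```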

Stack

  • C++17
  • ROS2 Humble/Jazzy
  • ONNX Runtime
  • OpenCV 4.x
  • vision_msgs
  • sensor_msgs
  • colcon
