Algorithms designed for time-evolving domains have a history of being evaluated on synthetic data, e.g., the "moving hyperplane" class of artificial data, where concept drift is introduced manually and any correlation to real-world problems is unestablished. This motivated the creation of natural datasets taken from the problem domain. The natural datasets used here are taken from logged field tests conducted by DARPA evaluators, and have been shown to contain time-varying (drifting) concepts.
Overall, three scenarios are considered. Each scenario is associated with two distinct image sequences, each representing a different lighting condition. There are thus six datasets total. The terrain appearing in the datasets varies greatly, and includes various combinations of ground type (mulch, dirt); foliage; natural obstacles (trees, dense shrubs); and man-made obstacles (hay bales). Lighting conditions range from overcast with good color definition (e.g., DS1B, shown above), to very sunny, causing shadows and saturation (e.g., DS2A). Additional descriptions and representative images from each dataset are available.
Each dataset consists of a 100-frame hand-labeled image sequence. Every pixel in each image was manually assigned to one of three classes: Obstacle, Groundplane, or Unknown. If it was difficult for a human to tell what a certain area of an image was, even when using higher-level context, then that region was labeled as Unknown. On average, approximately 80% of each image was labeled as either Obstacle or Groundplane, with the remaining 20% labeled as Unknown.
The datasets were hand-labeled with a custom tool written specially for this purpose. This tool, Pixelwise Image Labeler, is available for download.
The datasets are distributed as MATLAB-6 compatible *.mat files (read in via the load() function). Each MAT file represents a single frame from the robot log files and contains the raw RGB image as well as the disparity information (so you can do your own stereo processing if desired). Also included in the MAT file is an integer "mask" of the image giving a pixelwise labeling: 0 means ground plane, 1 means obstacle, and 2 means "this pixel was not labeled by a human." Unlabeled areas have meaning; they may be regions for which the terrain class was hard to tell (even with context), or they may be "don't cares" (e.g., sky).
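For readers working outside MATLAB, the following is a minimal Python sketch of how a frame might be read and its mask summarized. It is an illustration under stated assumptions, not part of the dataset distribution: load_frame and label_fractions are hypothetical helper names, reading the files with SciPy's loadmat is an assumption (the files are documented only for MATLAB's load()), and the variable names stored inside each MAT file are not listed on this page, so the sketch returns whatever keys are present rather than guessing them. The toy mask is synthetic, not real data.

```python
from collections import Counter

# Label codes as described in the text above.
GROUND, OBSTACLE, UNLABELED = 0, 1, 2
LABEL_NAMES = {GROUND: "ground", OBSTACLE: "obstacle", UNLABELED: "unlabeled"}


def load_frame(path):
    """Load one frame's MAT file into a dict of variable name -> array.

    Hypothetical helper; requires SciPy. The variable names inside the
    files are not documented here, so inspect the returned keys.
    """
    from scipy.io import loadmat  # imported lazily; only needed for real files

    data = loadmat(path)
    # Drop loadmat's bookkeeping entries such as '__header__'.
    return {k: v for k, v in data.items() if not k.startswith("__")}


def label_fractions(mask):
    """Return the fraction of pixels per label name for a 2-D integer mask.

    `mask` may be a NumPy array or a plain list of rows.
    """
    counts = Counter(int(v) for row in mask for v in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    unexpected = set(counts) - set(LABEL_NAMES)
    if unexpected:
        raise ValueError(f"unexpected label values: {sorted(unexpected)}")
    return {name: counts.get(code, 0) / total for code, name in LABEL_NAMES.items()}


# Tiny synthetic mask (not real data) to show the call.
toy_mask = [
    [2, 2, 2, 2],  # e.g. sky, left unlabeled
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
fractions = label_fractions(toy_mask)
```

Applied to a real frame, the "unlabeled" fraction from label_fractions can be compared against the roughly 20% average reported above. When scoring a classifier, one reasonable choice (an assumption, not something this page prescribes) is to exclude pixels with mask value 2 from the evaluation.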
For further information on these datasets, including additional representative images, see:
Procopio, M. J. (2007). An Experimental Analysis of Classifier Ensembles for Learning Drifting Concepts Over Time in Autonomous Outdoor Robot Navigation. PhD thesis, University of Colorado at Boulder, Department of Computer Science. Available online (PDF).
Procopio, M. J., Mulligan, J., and Grudic, G. (2009). Learning Terrain Segmentation with Classifier Ensembles for Autonomous Robot Navigation in Unstructured Environments. Journal of Field Robotics.
Special thanks to Wei Xu (at the University of Colorado at Boulder) and to Sharon Procopio for their assistance in labeling these images.
If you use this data in your research, we ask that you cite it as follows:
@misc{Procopio-LabeledLAGRData-07,
author = {Michael J. Procopio},
title = {Hand-Labeled {DARPA} {LAGR} Datasets},
howpublished = {Available at \url{http://ml.cs.colorado.edu/~procopio/labeledlagrdata/}},
year = {2007}
}
The datasets can be downloaded as individual ZIP archives using the links below.
Dataset 1A (DS1A) - 100 Frames From LAGR Test 11 (452 MB)
In addition to the labeled data above, there is a significant amount of unlabeled data frames from each of the above six datasets. These unlabeled frames occur later in the particular robot test run, and as such, exhibit stronger degrees of concept drift, and also include terrain not present in the earlier frames (1 to 100). These supplemental datasets, which start at Frame 101 (picking up where the labeled datasets above leave off) are available for download below. Contributions of labeled data frames (using the Pixelwise Image Labeler software) are welcome.
Dataset 1A (DS1A) - 203 Supplemental Frames From LAGR Test 11 (101-303) (543 MB)