RVOS is the first real-world video object segmentation dataset. Last updated on Mar. 28, 2018.
This dataset makes two main contributions:
First, there is a lack of real-world data for video object segmentation. DAVIS was captured with expensive video cameras, whereas we use mobile phones. Our photographers are volunteers with no professional training. RVOS fills this gap for real-world video object segmentation.
Second, in e-commerce, sellers need to showcase products using mobile phones. A seller captures a product video, which is then segmented in an e-commerce mobile application so that the background can be replaced or composited with a better one. We hope our dataset is useful for this purpose.
It consists of two hundred short videos focusing on common objects found in homes and offices.
The dataset is challenging because the objects in these videos vary widely in category, and the sequences contain occlusions, sudden motion, and other complexities. The videos are captured in portrait orientation using mobile phones. In each video, about 35 frames are selected uniformly and annotated. Every frame is rotated and resized to 854x480 (480p), the same resolution as the DAVIS dataset.
Raw videos can be downloaded via: Google Drive
Selected frames and annotations can be downloaded via: Google Drive
All sequences are owned by the authors of RVOS and are licensed under the Creative Commons Attribution 4.0 License.
README.md # readme of the dataset
JPEGImages # folders containing frames
Annotations # folders containing human annotations
train.txt # list of all the sequences for training
trainval.txt # list of all the sequences for training and validation
val.txt # list of all the sequences for validation
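Given the layout above, loading a split and pairing frames with their annotations might look like the following sketch. The exact file extensions (`.jpg` frames, `.png` masks) and the per-sequence subfolder naming are assumptions, not confirmed by the README; the mock directory built at the end exists only to demonstrate usage.

```python
import os
import tempfile

def load_split(root, split_file):
    """Return the sequence names listed in a split file (e.g. train.txt)."""
    with open(os.path.join(root, split_file)) as f:
        return [line.strip() for line in f if line.strip()]

def frame_annotation_pairs(root, seq):
    """Pair each JPEG frame with its annotation mask by shared filename stem.

    Assumes JPEGImages/<seq>/<stem>.jpg and Annotations/<seq>/<stem>.png.
    """
    img_dir = os.path.join(root, "JPEGImages", seq)
    ann_dir = os.path.join(root, "Annotations", seq)
    pairs = []
    for name in sorted(os.listdir(img_dir)):
        stem, _ = os.path.splitext(name)
        pairs.append((os.path.join(img_dir, name),
                      os.path.join(ann_dir, stem + ".png")))
    return pairs

# Build a tiny mock dataset layout to show the functions in use.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "JPEGImages", "mug"))
os.makedirs(os.path.join(root, "Annotations", "mug"))
open(os.path.join(root, "JPEGImages", "mug", "00000.jpg"), "w").close()
open(os.path.join(root, "Annotations", "mug", "00000.png"), "w").close()
with open(os.path.join(root, "train.txt"), "w") as f:
    f.write("mug\n")

seqs = load_split(root, "train.txt")
pairs = frame_annotation_pairs(root, seqs[0])
```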
To be added after the TMM paper review.