1. Summary
This package contains two parts:
- The "original" part contains the 2,000 natural scene images.
This part is fairly large: about 35.9 MB (24.2 MB after compression).
- The "processed" part contains the data sets for multi-instance multi-label learning.
This part is small: about 618 KB (608 KB after compression).
The data set has been used in:
ATTN: You are free to use the package (for academic purposes only) at your own risk. If your paper uses the data, sending a copy of the paper to the following address would be appreciated:
Prof. Z.-H. Zhou
National Laboratory for Novel Software Technology,
Nanjing University, Mailbox 419,
Hankou Road 22,
Nanjing 210093, China
E-mail: zhouzh@nju.edu.cn
URL: http://cs.nju.edu.cn/zhouzh/
Download: [datafile] (24.7Mb)
2. Details
The image data set consists of 2,000 natural scene images, where a set of
labels is manually assigned to each image. Table 1 gives the number of images
associated with each label set; the possible class labels are desert,
mountains, sea, sunset and trees. Images belonging to more than one class
(e.g. sea+sunset) make up over 22% of the data set, while many combined
classes (e.g. mountains+sunset+trees) are extremely rare. On average, each
image is associated with 1.24 class labels.
Table 1. Characteristics of the natural scene image data
---------------------------------------------------------------------------------------
Label Set         #Images | Label Set         #Images | Label Set               #Images
---------------------------------------------------------------------------------------
desert                340 | desert+sunset          21 | sunset+trees                 28
mountains             268 | desert+trees           20 | desert+mountains+sunset       1
sea                   341 | mountains+sea          38 | desert+sunset+trees           3
sunset                216 | mountains+sunset       19 | mountains+sea+trees           6
trees                 378 | mountains+trees       106 | mountains+sunset+trees        1
desert+mountains       19 | sea+sunset            172 | sea+sunset+trees              4
desert+sea              5 | sea+trees              14 | Total                     2,000
---------------------------------------------------------------------------------------
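The summary figures quoted for this data set (over 22% multi-label images, 1.24 labels per image on average) can be re-derived from the counts in Table 1. The following is a minimal Python sketch of that arithmetic; the dictionary and the function name `summarize` are illustrative and are not part of the package.

```python
# Label-set counts copied from Table 1 (label names joined by "+").
label_set_counts = {
    "desert": 340, "mountains": 268, "sea": 341, "sunset": 216, "trees": 378,
    "desert+mountains": 19, "desert+sea": 5, "desert+sunset": 21,
    "desert+trees": 20, "mountains+sea": 38, "mountains+sunset": 19,
    "mountains+trees": 106, "sea+sunset": 172, "sea+trees": 14,
    "sunset+trees": 28, "desert+mountains+sunset": 1,
    "desert+sunset+trees": 3, "mountains+sea+trees": 6,
    "mountains+sunset+trees": 1, "sea+sunset+trees": 4,
}


def summarize(counts):
    """Return (total images, multi-label fraction, mean labels per image)."""
    total = sum(counts.values())
    # A label set with a "+" in its name carries more than one class label.
    multi = sum(n for labels, n in counts.items() if "+" in labels)
    # Weight each label set by the number of labels it contains.
    label_total = sum(n * len(labels.split("+")) for labels, n in counts.items())
    return total, multi / total, label_total / total


total_images, multi_label_fraction, mean_labels = summarize(label_set_counts)
print(total_images)                   # 2000
print(f"{multi_label_fraction:.2%}")  # 22.85%
print(f"{mean_labels:.2f}")           # 1.24
```

The counts sum to 2,000, with 457 images carrying more than one label, which matches the figures stated in the text.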
The "original" part of this package contains all 2,000 natural scene
images, which are named with the numbers 1 to 2,000.
The "processed" part of this package contains the multi-instance multi-label data (in MATLAB format) obtained from the natural scene images. Each image is represented as a bag of nine instances generated by the SBN method [1]. Concretely, each image is smoothed by a Gaussian filter and subsampled to an 8x8 matrix of color blobs, where each blob is a 2x2 set of pixels within the 8x8 matrix. An SBN (single blob with neighbors) is defined as the combination of a single blob with its four neighboring blobs (up, down, left, right). Each SBN sub-image is described by a 15-dimensional vector: the first three attributes are the mean R, G, B values of the central blob, and the remaining twelve attributes are the differences in mean color values between the central blob and each of its four neighboring blobs. Each image bag is therefore a collection of nine 15-dimensional feature vectors, obtained by using each of the nine blobs not along the border as the central blob. In addition, each image is manually assigned a set of labels.
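To make the layout of one 15-dimensional instance concrete, here is a minimal Python sketch that assembles a single SBN vector from five mean blob colors. It is an illustration only, not the code used to produce the package. The function name `sbn_feature`, the neighbor order (up, down, left, right) and the sign of the differences (neighbor minus center) are assumptions; the description above does not fix them.

```python
def sbn_feature(center, up, down, left, right):
    """Build one 15-dimensional SBN instance from mean (R, G, B) blob colors.

    Attributes 1-3:  mean R, G, B of the central blob.
    Attributes 4-15: per-channel color differences for the four neighbors,
                     in the assumed order up, down, left, right and with the
                     assumed sign neighbor - center.
    """
    feature = list(center)
    for neighbor in (up, down, left, right):
        feature.extend(n - c for n, c in zip(neighbor, center))
    return feature


# Hypothetical mean colors, chosen only to show the shape of the result.
example = sbn_feature(
    center=(100, 120, 140),
    up=(110, 120, 130),
    down=(90, 125, 140),
    left=(100, 100, 100),
    right=(105, 115, 150),
)
print(len(example))   # 15
print(example[:6])    # [100, 120, 140, 10, 0, -10]
```

Applying such a function once for each of the nine central blobs would give the nine instances that form one image bag.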
After the processed data are loaded into MATLAB, the bag for the i-th natural scene image in the "original" part is stored in bags{i,1} and its associated labels are stored in target(:,i). For example, if target(:,i)' equals [1 -1 -1 1 -1], the i-th image belongs to the 1st and 4th classes but does not belong to the 2nd, 3rd and 5th classes. The variable "class_name" gives the name of each class.
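The +1/-1 encoding of a label column can be decoded into class names as in the following minimal Python sketch. The helper `decode_labels` is hypothetical. The class order used here simply follows the order in which the labels are listed in Section 2, which is an assumption; the authoritative order is whatever the "class_name" variable in the data file contains.

```python
def decode_labels(target_column, class_names):
    """Return the names of the classes marked +1 in one column of `target`."""
    if len(target_column) != len(class_names):
        raise ValueError("target column and class names differ in length")
    return [name for flag, name in zip(target_column, class_names) if flag == 1]


# Assumed class order; check it against the "class_name" variable.
class_name = ["desert", "mountains", "sea", "sunset", "trees"]

# The example column from the text: member of the 1st and 4th classes.
labels = decode_labels([1, -1, -1, 1, -1], class_name)
print(labels)  # ['desert', 'sunset'] under the assumed class order
```

Note that MATLAB indices start at 1 while Python indices start at 0, so the i-th image corresponds to position i-1 once the arrays are read into Python.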
[1] O. Maron and A. L. Ratan. Multiple-instance learning for natural scene classification. In: Proceedings of the 15th International Conference on Machine Learning, pp. 341-349, Madison, WI, 1998.