Demo page for the paper Multi-view Video Summarization, to appear in IEEE Transactions on Multimedia.
Multi-view Video Summarization
Yanwei Fu1, Yanwen Guo*1, Yanshu Zhu1, Feng Liu2, Chuanming Song1 and Zhi-Hua Zhou1
1 National Key Lab for Novel Software Technology, Nanjing University, Nanjing
2 Department of Computer Sciences, University of Wisconsin-Madison
Multi-view Video Summarization, IEEE Trans. on Multimedia, accepted as a regular paper, 2010. (Corresponding author: Yanwen Guo)
Experiments
We conducted experiments on several multi-view videos covering typical indoor and outdoor environments. Some of the multi-view videos are semi-synchronous or non-synchronous. For instance, the three views of the office2 videos last 180 minutes 41 seconds, 170 minutes 46 seconds, and 176 minutes 43 seconds, respectively. Most multi-view videos were captured by three or four cameras with 360-degree coverage of the scene. To further verify our method, we also deliberately shot an outdoor scene with four cameras covering only 180 degrees. Note that all of the videos were captured by non-specialists using web cameras or hand-held video cameras, which makes some of them unstable and unclear. Moreover, brightness differs considerably across the views of some videos. These issues pose great challenges to multi-view video summarization.

Fig. Multi-view video storyboard. Without loss of generality, the multi-view office1 videos with 4 views are used for illustration. The blue rectangles denote the original multi-view videos. Each shot in the summary is represented by a yellow box; clicking on a box displays the corresponding shot. Each shot in the summary is assigned a number indicating its order among the shots produced by the video parsing process. We give these numbers for the convenience of further discussion. Dashed lines connect shots with strong correlations. The middle frames of a few resulting shots, which allow quick browsing of the summary, are shown here.
Dataset:
(1) office1 (2) campus (3) road (4) badminton (5) office2 (6) office lobby
Comparison with Mono-view Summarization
We compare our method with a previous mono-view video summarization method. We implemented the video summarization method, together with the visual attention model, presented in [11] of our manuscript. The method was applied to each view of the multi-view office1 and campus videos. For each multi-view video, we combined the resulting shots along the timeline to form a single video summary. The single video summaries produced by the mono-view summarization method and by our algorithm can be found on the demo website. The summaries produced by the mono-view summarization method contain much redundant information: there are significant temporal overlaps among the summarized multi-view shots, and most events are recorded simultaneously in the summaries of all four views [Office1: Mono-view] [Campus: Mono-view] [office lobby: Mono-view]. For a fair comparison, we also used the above method to summarize the single video formed by combining the multi-view videos along the timeline, generating a dynamic single video summary [Office1: Mono-view2] [Campus: Mono-view2] [office lobby: Mono-view2].
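The temporal overlap among summarized shots can be quantified with a small sketch like the one below. It is only an illustration and not the procedure used in the paper: the `Shot` tuple, the function names, and the redundancy measure are assumptions made for this example, and all shot timestamps are assumed to lie on one shared clock (which does not hold for the non-synchronous videos without prior alignment).

```python
from typing import List, Tuple

# Hypothetical shot record: (view name, start second, end second).
Shot = Tuple[str, float, float]


def combine_along_timeline(per_view_shots: List[List[Shot]]) -> List[Shot]:
    """Merge per-view shot lists into one list ordered by start time."""
    merged = [shot for view in per_view_shots for shot in view]
    return sorted(merged, key=lambda s: (s[1], s[2], s[0]))


def covered_duration(shots: List[Shot]) -> float:
    """Length in seconds of the union of the shots' time intervals."""
    total = 0.0
    current_start = current_end = None
    for _, start, end in sorted(shots, key=lambda s: s[1]):
        if current_end is None or start > current_end:
            # A gap begins: close the interval collected so far.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # The shot overlaps the running interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def redundancy_ratio(shots: List[Shot]) -> float:
    """Fraction of summary time that re-shows an already covered moment."""
    summed = sum(end - start for _, start, end in shots)
    if summed == 0:
        return 0.0
    return 1.0 - covered_duration(shots) / summed


# Made-up example: two views whose summarized shots partly coincide in time.
view1 = [("view1", 0.0, 10.0), ("view1", 20.0, 30.0)]
view2 = [("view2", 5.0, 15.0)]
summary = combine_along_timeline([view1, view2])
print(summary)
print(covered_duration(summary), redundancy_ratio(summary))
```

A ratio near 0 means the combined summary rarely shows the same moment twice; larger values indicate that several views' shots cover the same time span.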
With our multi-view summarization method, by contrast, such redundancy is much reduced [Office1: Multi-view] [Campus: Multi-view] [office lobby: Multi-view]. Some events are recorded only by the most informative summarized shot, while the most important events are preserved in multi-view summaries. Some events that are ignored by the previous method, for instance the events recorded from the 1st to the 5th second, the 14th to the 18th second, and the 39th to the 41st second of our office1 single video summary, are preserved by our method. This results from our shot clustering algorithm and the multi-objective optimization operated on the spatio-temporal shot graph. This property of our method facilitates generating a short yet highly informative summary.
We further compare our algorithm against a graph-based summarization method. A single video is first formed by combining the multi-view videos along the timeline. We then construct the graph according to the method given in [10]. The final summary is produced using normalized-cut-based event clustering and highlight detection [14] [Office1: Graph-compared] [Campus: Graph-compared] [office lobby: Graph-compared].
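For readers unfamiliar with the normalized cut criterion used in the comparison, the sketch below evaluates it on a tiny shot-similarity graph. It is a brute-force illustration only and does not reproduce the graph construction of [10] or the clustering of [14]; the edge weights, node numbering, and function names are made up for this example, and practical implementations typically rely on a spectral relaxation instead of enumerating partitions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical undirected graph: edge {i, j} -> similarity between shots i and j.
Weights = Dict[FrozenSet[int], float]


def weight(w: Weights, i: int, j: int) -> float:
    """Similarity between nodes i and j (0.0 if there is no edge)."""
    return w.get(frozenset((i, j)), 0.0)


def ncut(w: Weights, part_a: List[int], part_b: List[int]) -> float:
    """Normalized cut: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    nodes = part_a + part_b
    cut = sum(weight(w, i, j) for i in part_a for j in part_b)
    assoc_a = sum(weight(w, i, j) for i in part_a for j in nodes if i != j)
    assoc_b = sum(weight(w, i, j) for i in part_b for j in nodes if i != j)
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # a part with no edges gives a degenerate split
    return cut / assoc_a + cut / assoc_b


def best_bipartition(w: Weights, n: int) -> Tuple[float, List[int], List[int]]:
    """Try every split of nodes 0..n-1 into two parts; keep the lowest score."""
    nodes = list(range(n))
    best: Tuple[float, List[int], List[int]] = (float("inf"), [], [])
    for size in range(1, n // 2 + 1):
        for subset in combinations(nodes, size):
            part_a = list(subset)
            part_b = [x for x in nodes if x not in subset]
            score = ncut(w, part_a, part_b)
            if score < best[0]:
                best = (score, part_a, part_b)
    return best


# Made-up example: shots 0-1 and shots 2-3 are similar, with one weak link.
similarity: Weights = {
    frozenset((0, 1)): 0.9,
    frozenset((2, 3)): 0.8,
    frozenset((1, 2)): 0.1,
}
score, group_a, group_b = best_bipartition(similarity, 4)
print(score, group_a, group_b)
```

In this toy graph the lowest-scoring split separates shots 0 and 1 from shots 2 and 3, since only the weak 0.1 link is cut.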
Email: ztwztq2006@gmail.com, updated March 26th, 2010.
Copyright© Department of Computer Science & Technology. Nanjing University. All Rights Reserved.