Video summarization is the process of distilling a raw video into a more compact form without losing much information. In a typical video summarization system, image features are extracted from video frames, and the most representative frames are selected by analyzing the variation among those features. This is done either by taking a holistic view of the entire video or by measuring local differences between adjacent frames. Most approaches rely on global features such as colour, texture, and motion information; clustering techniques are also commonly used. Video summarization can be categorized into two forms:
- Static video summarization (keyframing) and
- Dynamic video summarization (video skimming)
Static video summaries consist of a set of keyframes extracted from the original video, while dynamic video summaries consist of a set of shots and are produced by taking into account the similarity or domain-specific relationships among all video shots.
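The static (keyframing) approach described above can be sketched with a minimal example: compare a colour histogram of each frame against the last selected keyframe, and emit a new keyframe whenever the difference exceeds a threshold. This is an illustrative sketch, not any of the linked implementations; the function names and the 0.4 threshold are assumptions chosen for the example.

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Per-channel colour histogram of an HxWx3 uint8 frame, normalized to sum to 1."""
    hist = np.concatenate(
        [np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
         for c in range(frame.shape[-1])]
    ).astype(float)
    return hist / hist.sum()

def select_keyframes(frames, threshold=0.4):
    """Select keyframes by local differentiation among adjacent frames.

    A frame becomes a keyframe when the L1 distance between its histogram
    and the last keyframe's histogram exceeds `threshold` (a tunable value,
    here an illustrative default).
    """
    keyframes = [0]  # the first frame always starts a segment
    last = color_histogram(frames[0])
    for i in range(1, len(frames)):
        h = color_histogram(frames[i])
        if np.abs(h - last).sum() > threshold:
            keyframes.append(i)
            last = h
    return keyframes

# Usage: a synthetic "video" of 10 dark frames followed by 10 bright frames
frames = np.concatenate([
    np.zeros((10, 8, 8, 3), dtype=np.uint8),
    np.full((10, 8, 8, 3), 255, dtype=np.uint8),
])
print(select_keyframes(frames))  # → [0, 10]
```

Real systems typically replace the raw colour histogram with deep image features and smooth the difference signal before thresholding, but the selection logic follows the same pattern.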
Several implementations of recent approaches are available:
- Attention-based: a PyTorch implementation of the ACCV 2018-AIU2018 paper "Video Summarization with Attention".
- Reinforcement-learning-based: "Unsupervised video summarization with deep reinforcement learning" (Theano implementation).
- LSTM-GAN-based: a PyTorch implementation of video summarization with an LSTM-GAN.
- Microsoft Bing Search has developed a machine-learning-based video summarization technique for generating thumbnails: "Intelligent Search: Video summarization using machine learning".