Given the diversity of actions and the millions of videos uploaded daily to platforms such as TikTok and Instagram, deep learning methods deployed in the real world must handle untrimmed video. We therefore propose a more realistic and challenging scenario for class-incremental learning that uses untrimmed videos. We encourage future work to address this setting and improve on the current approaches.
Model | Frames per video | Mem. Frame Capacity | ActivityNet-Untrim Acc | ActivityNet-Untrim BWF | ActivityNet-Trim Acc | ActivityNet-Trim BWF |
---|---|---|---|---|---|---|
iCaRL | 4 | 1.6 × 10⁴ | 16.28% | 32.75% | 21.63% | 36.98% |
iCaRL | 8 | 3.2 × 10⁴ | 16.67% | 31.96% | 21.54% | 33.41% |
iCaRL | 16 | 6.4 × 10⁴ | 21.27% | 28.94% | 25.27% | 29.71% |
iCaRL+TC | 4 | 1.6 × 10⁴ | 36.07% | 22.39% | 42.99% | 23.82% |
iCaRL+TC | 8 | 3.2 × 10⁴ | 40.29% | 20.80% | 45.73% | 18.90% |
iCaRL+TC | 16 | 6.4 × 10⁴ | 40.45% | 21.21% | 44.04% | 22.82% |
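
The memory frame capacity in the table grows in proportion to the number of frames per video. The short sketch below checks that relationship arithmetically, reading the capacity entries as 1.6, 3.2, and 6.4 times ten to the fourth power. The helper name `implied_video_budget` is hypothetical, and the interpretation of the ratio as a fixed number of stored videos is an inference from the table, not something this section states.

```python
# Frames per video -> memory frame capacity, copied from the table rows.
capacity_by_frames = {4: 1.6e4, 8: 3.2e4, 16: 6.4e4}


def implied_video_budget(frames_per_video: int, frame_capacity: float) -> int:
    """Hypothetical helper: stored videos implied by a frame budget.

    Assumes every stored video contributes the same number of frames.
    """
    return round(frame_capacity / frames_per_video)


# Compute the implied budget for each table configuration.
budgets = {
    frames: implied_video_budget(frames, capacity)
    for frames, capacity in capacity_by_frames.items()
}
print(budgets)
```

Under that assumption, all three configurations imply the same number of stored videos, so the rows would differ only in how densely each stored video is sampled.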