Paper reading notes: "ActionVLAD: Learning spatio-temporal aggregation for action classification"
data:image/s3,"s3://crabby-images/829bb/829bb8615cede921ab9aeaaeeb44793b3e5ccde9" alt=""
ActionVLAD: Learning spatio-temporal aggregation for action classification
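Core idea
Spatio-temporal information aggregation
The feature space is divided into K regions. Each region can be regarded as an "action word", also referred to as an anchor point (c_k).
Formula: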
data:image/s3,"s3://crabby-images/e267a/e267a44e59d626204cd50d9468988529165d873e" alt="image-20221123105611756"
The formula above sums the differences (residuals) between the features and the anchor points (typical actions) over the entire video.
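The aggregation described above can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the function name `action_vlad`, the default value of `alpha`, and the final normalization (intra-normalization per anchor followed by L2, as is common for NetVLAD-style layers) are assumptions made for illustration.

```python
import numpy as np

def action_vlad(features, anchors, alpha=10.0):
    """Pool local descriptors from a whole video into one vector.

    features: (T, N, D) array, N spatial descriptors x_it for each of T frames.
    anchors:  (K, D) array of anchor points c_k ("action words").
    alpha:    softness of the assignment (illustrative default, an assumption).
    Computes V[k] = sum_t sum_i a_k(x_it) * (x_it - c_k) and returns a (K*D,) vector.
    """
    x = features.reshape(-1, features.shape[-1])        # (T*N, D)
    residuals = x[:, None, :] - anchors[None, :, :]      # (T*N, K, D)
    sq_dist = (residuals ** 2).sum(axis=-1)              # (T*N, K)

    # Soft assignment of each descriptor to each anchor (softmax over K).
    logits = -alpha * sq_dist
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)

    # Weighted residuals summed over all positions and all frames.
    V = (assign[:, :, None] * residuals).sum(axis=0)     # (K, D)

    # Assumed normalization: per-anchor (intra) normalization, then global L2.
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```

For example, with T=4 frames, N=6 positions per frame, D=8 feature channels and K=3 anchors, `action_vlad(features, anchors)` returns a single 24-dimensional video descriptor regardless of the number of frames.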
Why use VLAD to pool?
data:image/s3,"s3://crabby-images/0c6d3/0c6d3abd74818339cb8fcdf50769c03bba716fc1" alt="image-20221123110326090"
data:image/s3,"s3://crabby-images/23a86/23a86c5c49bf2efe32fac181119a4fe65ae11ea9" alt="image-20221123111421804"
How to combine the RGB and flow streams?
data:image/s3,"s3://crabby-images/01a9e/01a9eee54a2811d10cd507b86798689bf30b1dd7" alt="image-20221123111110433"
data:image/s3,"s3://crabby-images/72e5a/72e5a1399cc1f7edbc664201eef1a5de3bb0a01e" alt="image-20221123111334599"
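One simple way to combine two streams is late fusion: each stream keeps its own aggregation layer and classifier, and only the class scores are merged. The sketch below assumes that setup; the function name `late_fusion` and the `flow_weight` parameter are hypothetical, and the weighting is not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class scores."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def late_fusion(rgb_logits, flow_logits, flow_weight=0.5):
    """Weighted average of per-stream class probabilities.

    rgb_logits, flow_logits: (batch, num_classes) scores from each stream.
    flow_weight: weight given to the flow stream (hypothetical, in [0, 1]).
    """
    rgb_prob = softmax(rgb_logits)
    flow_prob = softmax(flow_logits)
    return (1.0 - flow_weight) * rgb_prob + flow_weight * flow_prob
```

The predicted class is then `late_fusion(rgb_logits, flow_logits).argmax(axis=1)`. Fusing earlier, for instance by concatenating the two feature maps before pooling, would instead share one set of anchors across both modalities.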
Ideas?
- Can I add an attention mechanism to VLAD?
- Where should the attention be added?
- Can I add a non-local block to VLAD?
Todos
- Run the VLAD code
- Read the core code where VLAD is implemented
- Try out the ideas that may work well
- Post title: Paper reading notes: "ActionVLAD: Learning spatio-temporal aggregation for action classification"
- Post author: sixwalter
- Create time: 2023-08-05 11:14:26
- Post link: https://coelien.github.io/2023/08/05/paper-reading/paper_reading_056/
- Copyright notice: All articles in this blog are licensed under BY-NC-SA unless otherwise stated.