Baseline代码解构-改

数据读取与预处理

数据准备

获得train_anns和test_anns 以及 train_frames 和 test_frames

data{video}{clip} ={

‘file_name’: file_name,

‘group_activity’: activity,

‘actions’: actions,

‘bboxes’: bboxes,

}

获得边界框跟踪pkl
获得训练集和验证集
在Volleyballdataset初始化中，读取关键点数据

def volleyball_readpose(data_path):
    f = open(data_path,'r')
    f = f.readlines()
    pose_ann= defaultdict(dict)
    for ann in f:
        values = ann.split(' ')
        pickle_path = values[0]
        with open(pickle_path,'rb') as f:
            keypoints = pickle.load(f)
        sid = int(values[1])
        src_id = int(values[2])
        pose_ann[sid][src_id] = keypoints
    return pose_ann

获取数据

采用数据帧，volley_frames_sample
载入样本序列，load_samples_sequence
准备数据容器：

images, boxes = [], []

activities, actions = [], []

poses = []
对sample里的每一帧，需要首先打开该文件，并加入images中
其次读取边界框跟踪数据，他是不能直接用的，需要进行一些处理：

for i, track in enumerate(self.tracks[(sid, src_fid)][fid]):
                y1, x1, y2, x2 = track
                # multiply output channel shape
                w1, h1, w2, h2 = x1 * OW, y1 * OH, x2 * OW, y2 * OH
                temp_boxes[i] = np.array([w1, h1, w2, h2])
                temp_poses.append(keypoints[i])

其他处理就比较简单

模型处理模块

初始化：
- backbone使用resnet18
- roi_align
- global_head
- head
- pose_head
- pose_head2
首先使用backbone得到图像特征img_features
如果特征是多尺度的，需要进行双线性插值
再使用roi_align提取边界框特征boxes_features（B，T，N，NFB）
使用global_head模块提取全局特征得到global_token
使用joint_track 投影层初始化pose_features$(NJ, BT,d)$
使用pose-selfatt模块计算pose特征(B,T,N,d)
使用individual-init模块来初始化indi_features
使用multi-level-inference模块来进行层次式推理来不断优化indi_features
- 输入是 $和$
- 先使用trans encoder计算优化过的jointfea 再将joints_fea投影到个体维度，并与boxes_feature相加
- 再将加后的boxesfea使用transtrans encoder进行优化

Baseline代码解构-改

数据读取与预处理

数据准备

获取数据

模型处理模块

损失函数计算