RUDDER 2021

Video-based Heuristic Annotation Guidelines

All the experimental results discussed so far use heuristics based on textual queries to classify query-caption pairs as positive, partial, or negative. However, this need not always be the case. We now describe a video-based heuristic that samples positive, partial, and negative pairs based on the objects in the videos and the actions taking place in them. Specifically, we call a pair positive if the objects and actions in the two videos overlap to a high degree. Likewise, a pair whose objects and actions overlap only partially, as determined by the overlap thresholds α_o and α_a, is classified as a partial pair. Any pair that satisfies neither the positive nor the partial criterion is treated as a negative pair. In essence, this heuristic is a video-based analog of the Noun-Verb heuristic on textual queries.
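The classification rule described above can be illustrated with a minimal sketch. Several details here are assumptions rather than part of the guidelines: the overlap measure is taken to be Jaccard similarity over sets of object and action labels, the partial thresholds `alpha_o` and `alpha_a` are given illustrative default values, and the cutoffs for a "high degree of overlap" (`pos_o`, `pos_a`) are hypothetical parameters introduced only for this example. How the object and action labels are extracted from the videos is outside the scope of the sketch.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two label sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_pair(
    objects_1: Set[str],
    actions_1: Set[str],
    objects_2: Set[str],
    actions_2: Set[str],
    alpha_o: float = 0.3,  # assumed value for the object overlap threshold
    alpha_a: float = 0.3,  # assumed value for the action overlap threshold
    pos_o: float = 0.8,    # hypothetical cutoff for "high" object overlap
    pos_a: float = 0.8,    # hypothetical cutoff for "high" action overlap
) -> str:
    """Label a pair of videos as 'positive', 'partial', or 'negative'."""
    object_overlap = jaccard(objects_1, objects_2)
    action_overlap = jaccard(actions_1, actions_2)

    # Positive: both objects and actions overlap to a high degree.
    if object_overlap >= pos_o and action_overlap >= pos_a:
        return "positive"
    # Partial: both overlaps reach the thresholds alpha_o and alpha_a.
    if object_overlap >= alpha_o and action_overlap >= alpha_a:
        return "partial"
    # Negative: neither criterion is satisfied.
    return "negative"
```

Under these assumptions, two videos that both show a dog running with a ball would be labeled positive, while a pair that shares only some objects and actions falls into the partial class. Requiring both overlaps to pass their thresholds mirrors the Noun-Verb heuristic, where nouns play the role of objects and verbs the role of actions; a disjunctive rule would be an equally plausible reading of the text.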