RUDDER 2021

Manual Annotation Guidelines

Our task is to refine the structure between the captions of different videos and to perform video2text and text2video retrieval. Our dataset consists of close to 600 videos on science experiments and farming, with approximately 7 descriptions per video. In this dataset we have defined four different relations.

The set of captions in RUDDER can be broadly partitioned into two categories:

Listing captions - These list the raw materials used in the video, e.g. You need plastic bottles, cycle spoke, sticks, ring magnet and steel pin. (English translation)
Describing captions - These explain how to use the listed objects, e.g. Place two strong ring magnets in cycle spoke.

For every query, the augmented supervision consists of the following four supervised classes of relevance judgements (in addition to the default relevance judgement associated with any retrieval dataset):

(i) positive (ii) partial (iii) description and (iv) negative.

Positive - the relevance judgement when both captions/video segments share almost the same amount of information.
1. If A and B are both listing captions, only object overlap is required.
  A: You will need a Card sheet, colour card, colour pencils, glue and OHP Marker.
  B: You will need 9-cm squares, card sheet, glue, scissors and thread.
2. If A and B are both describing captions, both object and action overlap are required.
  A: Attach the end of the copper wires to both the ends of the rod.
  B: Attach bare copper wires to ends of multimeters.
Partial - the relevance judgement when both captions/video segments share somewhat similar information.
A: To make corrugated fan, we need paper cup, glue, scissor, cycle spoke, card paper strips (60cm X 5cm), card strip and press button.
B: Now curl the paper diagonally on the cycle spoke. Fold tightly to make a paper tube. Glue the triangular flap on the tube to prevent it from opening.
Description - the relevance judgement when the caption/video segment is a historical description of a second caption, or the second caption is an application of an object mentioned in the first caption.
1. If A is a listing caption, then B should show the use of objects mentioned in caption A.
  A: You will need a Card sheet, colour card, colour pencils, glue and OHP Marker.
  B: Take 2 card sheet paddles and apply rubble glue.
2. If A is a describing caption, then B should be the continuation of the activity of A.
  A: Clean the ends of copper rod and copper wire with sandpaper.
  B: Dip a clean copper rod in Silver Nitrate solution.
Negative - the relevance judgement when the captions/video segments are irrelevant to each other.
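
For annotators or tool builders who want to store these judgements programmatically, the caption categories and relevance classes above could be represented roughly as follows. This is only a minimal sketch: the names CaptionType, Relevance, AnnotatedPair and positive_overlap_requirement are hypothetical and not part of any RUDDER release. The helper encodes only the two Positive rules stated above and returns None for caption combinations the guidelines do not cover.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class CaptionType(Enum):
    LISTING = "listing"        # names the raw materials used in the video
    DESCRIBING = "describing"  # explains how the listed objects are used


class Relevance(Enum):
    POSITIVE = "positive"
    PARTIAL = "partial"
    DESCRIPTION = "description"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class AnnotatedPair:
    """One annotated caption pair (hypothetical record layout)."""
    caption_a: str
    caption_b: str
    type_a: CaptionType
    type_b: CaptionType
    label: Relevance


def positive_overlap_requirement(
    type_a: CaptionType, type_b: CaptionType
) -> Optional[Tuple[str, ...]]:
    """Return what must overlap for a Positive judgement.

    Only the two cases spelled out in the guidelines are encoded;
    any other combination returns None (no rule stated).
    """
    if type_a is CaptionType.LISTING and type_b is CaptionType.LISTING:
        return ("objects",)
    if type_a is CaptionType.DESCRIBING and type_b is CaptionType.DESCRIBING:
        return ("objects", "actions")
    return None


# Example taken from the Positive section above (two describing captions).
pair = AnnotatedPair(
    caption_a="Attach the end of the copper wires to both the ends of the rod.",
    caption_b="Attach bare copper wires to ends of multimeters.",
    type_a=CaptionType.DESCRIBING,
    type_b=CaptionType.DESCRIBING,
    label=Relevance.POSITIVE,
)
```

Keeping the label as an enum rather than a free-text string makes it harder for a misspelt class name to slip into the annotation files.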

The annotations were performed on approximately 32,700 sentence pairs. The manual annotation task was divided across three annotators, and a separate annotator verified all the annotations. In total, this amounted to around 200 hours of annotation work.
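
As a rough sanity check on the effort figures, the average time spent per sentence pair can be estimated from the two numbers quoted above. Both inputs are approximate, and the 200 hours is assumed here to cover annotation and verification together, so the result is only an order-of-magnitude estimate. The function name seconds_per_pair is illustrative.

```python
# Figures quoted in the guidelines (both are approximate).
SENTENCE_PAIRS = 32_700
TOTAL_HOURS = 200


def seconds_per_pair(pairs: int, hours: float) -> float:
    """Average annotation time per sentence pair, in seconds."""
    return hours * 3600 / pairs


print(f"{seconds_per_pair(SENTENCE_PAIRS, TOTAL_HOURS):.0f} seconds per pair")
# prints "22 seconds per pair"
```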