Manipulation actions transform objects from an initial state into a final state. In this paper, we report on the use of object state transitions as a means of recognizing manipulation actions. Our method is inspired by the intuition that object states are visually more apparent than actions in a still frame and thus provide information that is complementary to spatio-temporal action recognition. We start by defining a state transition matrix that maps each action label to a pre-state and a post-state. From each keyframe, we learn appearance models of objects and their states. Manipulation actions can then be recognized from the state transition matrix. We report results on the EPIC-Kitchens action recognition challenge.
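As a rough illustration of the idea, and not the implementation behind the reported results, the sketch below shows one way a state transition table could be used to pick an action from per-keyframe state probabilities. The state names, action labels, probabilities, and the product scoring rule are all assumptions made for this example; in practice the probabilities would come from the learned appearance models.

```python
# Hypothetical action labels mapped to (pre-state, post-state) pairs.
# These labels and states are illustrative only.
TRANSITIONS = {
    "open": ("closed", "open"),
    "close": ("open", "closed"),
    "cut": ("whole", "cut"),
}


def recognize_action(pre_probs, post_probs):
    """Score each action by how well the observed states match its transition.

    pre_probs / post_probs: dicts mapping state name -> probability, as might
    be predicted by a state classifier on the first and last keyframes.
    The score is the product of the two matching probabilities (an assumed
    scoring rule, chosen for simplicity).
    """
    scores = {}
    for action, (pre_state, post_state) in TRANSITIONS.items():
        scores[action] = pre_probs.get(pre_state, 0.0) * post_probs.get(post_state, 0.0)
    best = max(scores, key=scores.get)
    return best, scores


# Made-up state probabilities for the first and last keyframes of a clip.
pre_probs = {"closed": 0.7, "open": 0.1, "whole": 0.15, "cut": 0.05}
post_probs = {"closed": 0.1, "open": 0.8, "whole": 0.05, "cut": 0.05}

best, scores = recognize_action(pre_probs, post_probs)
print(best)
```

With these made-up inputs, the "closed" to "open" transition receives the highest score, so the sketch selects the hypothetical "open" action. Such state-based scores could also be combined with the output of a spatio-temporal model, since the two sources of evidence are described as complementary.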
Technical Report to appear soon.
PyTorch implementation: [Code].