Modeling Deep Temporal Dependencies with Recurrent “Grammar Cells”

2014 Michalski, V., Memisevic, R., Konda, K.
Modeling Deep Temporal Dependencies with Recurrent “Grammar Cells”
Neural Information Processing Systems (NIPS 2014)

Abstract

We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t+1. To this end, we show how a bilinear model of transformations, such as a gated autoencoder, can be turned into a recurrent network by training it with backpropagation through time to predict future frames from the current frame and the inferred transformation. We also show that stacking multiple layers of gating units in a recurrent pyramid makes it possible to represent the “syntax” of complicated time series, and that the resulting model can outperform standard recurrent neural networks in prediction accuracy on a variety of tasks.
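The prediction step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a single-layer factored gated autoencoder: mapping units are inferred from two consecutive frames through a multiplicative interaction, and the next frame is predicted by applying the inferred transformation to the current frame. The matrix names U, V, W, the layer sizes, and the random (untrained) weights are illustrative assumptions, not the released implementation, and training with backpropagation through time is omitted.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def infer_mapping(U, V, W, x_prev, x_curr):
    """Infer mapping units from two consecutive frames via a multiplicative interaction."""
    return sigmoid(W @ ((U @ x_prev) * (V @ x_curr)))

def predict_next(U, V, W, x_curr, m):
    """Apply the inferred transformation to the current frame to predict the next one."""
    return V.T @ ((U @ x_curr) * (W.T @ m))

def predict_sequence(U, V, W, x0, x1, n_steps):
    """Roll the model forward, assuming the transformation stays constant over time."""
    m = infer_mapping(U, V, W, x0, x1)
    frames, x = [], x1
    for _ in range(n_steps):
        x = predict_next(U, V, W, x, m)
        frames.append(x)
    return np.stack(frames)

# Hypothetical sizes and untrained random weights, for illustration only.
rng = np.random.default_rng(0)
n_pixels, n_factors, n_mappings = 16, 32, 8
U = rng.normal(scale=0.1, size=(n_factors, n_pixels))
V = rng.normal(scale=0.1, size=(n_factors, n_pixels))
W = rng.normal(scale=0.1, size=(n_mappings, n_factors))

x0, x1 = rng.normal(size=n_pixels), rng.normal(size=n_pixels)
predicted = predict_sequence(U, V, W, x0, x1, n_steps=3)
print(predicted.shape)  # (3, 16)
```

In this sketch the transformation is held fixed across steps; stacking a second gating layer on top of the mapping units would instead model how the transformation itself changes over time.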

Supplementary Material

Bottom-layer PGP filter pairs

Filter pairs (left/right input receptive fields) of predictive gating pyramid (PGP) models trained on the accelerated transformation data sets introduced in our paper, the bouncing balls data set [1], and NORBvideos [2]:

Accelerated Rotations

Accelerated Shifts

Bouncing Balls

NORBvideos

Generated Sequences

Bouncing Balls

Some sequences generated by a three-layer PGP on the bouncing balls data set (generated with the script released with

(the first 4 frames are seeded, the remaining frames are generated by the model):

 

Some shorter predicted sequences (right) together with ground truth (left) from preliminary experiments with a two-layer PGP (the 3 seed frames are not shown):

Chirps

Additional chirp predictions that were omitted from the paper because of space restrictions. After seeing 5 windows of 10 frames each (frames 1-50), the models predicted the remainder of the sequence. The comparison models are the Conditional Restricted Boltzmann Machine (CRBM) [3], trained with contrastive divergence, and a vanilla RNN, trained with backpropagation through time.
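The evaluation protocol above (seed with 5 windows of 10 frames, then predict the rest) can be sketched as a simple rollout loop. The chirp parameters, the total sequence length of 160 frames, and the `repeat_last` predictor are hypothetical placeholders: a trained PGP, CRBM, or RNN would take the place of `repeat_last`.

```python
import numpy as np

def make_chirp(n_frames, f0=0.01, rate=0.0005):
    """Sine wave whose frequency increases linearly (parameters are illustrative)."""
    t = np.arange(n_frames)
    return np.sin(2 * np.pi * (f0 * t + 0.5 * rate * t**2))

def rollout(seed_windows, predict_window, n_new):
    """Keep the seed windows, then append n_new windows predicted from the history."""
    windows = [np.asarray(w, dtype=float) for w in seed_windows]
    for _ in range(n_new):
        windows.append(predict_window(windows))
    return np.stack(windows)

def repeat_last(windows):
    """Hypothetical stand-in for a trained model: repeats the most recent window."""
    return windows[-1].copy()

signal = make_chirp(160)
all_windows = signal.reshape(-1, 10)   # 16 windows of 10 frames each
seed = all_windows[:5]                 # frames 1-50 are shown to the model
predicted = rollout(seed, repeat_last, n_new=len(all_windows) - 5)

# Mean squared error over the predicted (non-seed) part of the sequence.
error = np.mean((predicted[5:] - all_windows[5:]) ** 2)
print(predicted.shape, error)
```

Because each predicted window is fed back as input for the next step, errors can accumulate over the rollout, which is why long-horizon predictions are a demanding test.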

References

[1] I. Sutskever, G. E. Hinton, and G. W. Taylor. The recurrent temporal restricted Boltzmann machine. In Advances in Neural Information Processing Systems 21, pages 1601–1608, 2008.

[2] R. Memisevic and G. Exarchakis. Learning invariant features by harnessing the aperture problem. In Proceedings of the 30th International Conference on Machine Learning, 2013.

[3] G. W. Taylor, G. E. Hinton, and S. T. Roweis. Modeling human motion using binary latent variables. In Advances in Neural Information Processing Systems 20, pages 1345–1352, 2007.