Theano: a CPU and GPU math expression compiler J Bergstra, O Breuleux, F Bastien, P Lamblin, R Pascanu, G Desjardins, ... Proceedings of the Python for scientific computing conference (SciPy) 4 (3), 1-7, 2010 | 1756 | 2010 |

Overcoming catastrophic forgetting in neural networks J Kirkpatrick, R Pascanu, N Rabinowitz, J Veness, G Desjardins, AA Rusu, ... Proceedings of the national academy of sciences 114 (13), 3521-3526, 2017 | 1151 | 2017 |

Progressive neural networks AA Rusu, NC Rabinowitz, G Desjardins, H Soyer, J Kirkpatrick, ... arXiv preprint arXiv:1606.04671, 2016 | 693 | 2016 |

Theano: A CPU and GPU math compiler in Python J Bergstra, O Breuleux, F Bastien, P Lamblin, R Pascanu, G Desjardins, ... Proc. 9th Python in Science Conf 1, 3-10, 2010 | 657 | 2010 |

Theano: A Python framework for fast computation of mathematical expressions R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ... arXiv preprint arXiv:1605.02688, 2016 | 576 | 2016 |

Policy distillation AA Rusu, SG Colmenarejo, C Gulcehre, G Desjardins, J Kirkpatrick, ... arXiv preprint arXiv:1511.06295, 2015 | 261 | 2015 |

Combining modality specific deep neural networks for emotion recognition in video SE Kahou, C Pal, X Bouthillier, P Froumenty, Ç Gülçehre, R Memisevic, ... Proceedings of the 15th ACM on International conference on multimodal …, 2013 | 258 | 2013 |

Theano: Deep learning on gpus with python J Bergstra, F Bastien, O Breuleux, P Lamblin, R Pascanu, O Delalleau, ... NIPS 2011, BigLearning Workshop, Granada, Spain 3, 1-48, 2011 | 251 | 2011 |

Understanding disentangling in β-VAE CP Burgess, I Higgins, A Pal, L Matthey, N Watters, G Desjardins, ... arXiv preprint arXiv:1804.03599, 2018 | 218 | 2018 |

Unsupervised and transfer learning challenge: a deep learning approach GMY Dauphin, X Glorot, S Rifai, Y Bengio, I Goodfellow, E Lavoie, ... Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 97-110, 2012 | 192 | 2012 |

Natural neural networks G Desjardins, K Simonyan, R Pascanu Advances in neural information processing systems, 2071-2079, 2015 | 128 | 2015 |

Tempered Markov chain Monte Carlo for training of restricted Boltzmann machines G Desjardins, A Courville, Y Bengio, P Vincent, O Delalleau Proceedings of the thirteenth international conference on artificial …, 2010 | 117 | 2010 |

Theano: A Python framework for fast computation of mathematical expressions TTD Team, R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, ... arXiv preprint arXiv:1605.02688, 2016 | 111 | 2016 |

Parallel tempering for training of restricted Boltzmann machines G Desjardins, A Courville, Y Bengio, P Vincent, O Delalleau Proceedings of the thirteenth international conference on artificial …, 2010 | 84 | 2010 |

Steerable Playlist Generation by Learning Song Similarity from Radio Station Playlists. F Maillet, D Eck, G Desjardins, P Lamere ISMIR, 345-350, 2009 | 79 | 2009 |

Empirical evaluation of convolutional RBMs for vision G Desjardins, Y Bengio DIRO, Université de Montréal, 1-13, 2008 | 61 | 2008 |

Quadratic polynomials learn better image features J Bergstra, G Desjardins, P Lamblin, Y Bengio Technical report, 1337, 2009 | 56 | 2009 |

Disentangling factors of variation via generative entangling G Desjardins, A Courville, Y Bengio arXiv preprint arXiv:1210.5474, 2012 | 55 | 2012 |

Adaptive parallel tempering for stochastic maximum likelihood learning of RBMs G Desjardins, A Courville, Y Bengio arXiv preprint arXiv:1012.3476, 2010 | 34 | 2010 |

The spike-and-slab RBM and extensions to discrete and sparse data distributions A Courville, G Desjardins, J Bergstra, Y Bengio IEEE transactions on pattern analysis and machine intelligence 36 (9), 1874-1887, 2013 | 27 | 2013 |