Designing Regularizers and Architectures for Recurrent Neural Networks

Krueger, David

Permalink

https://hdl.handle.net/1866/14019

Thesis or Dissertation

Krueger_David_2016_memoire.pdf (873.3Kb)

2016-01 (degree granted: 2016-05-25)

Author(s)

Krueger, David

Advisor(s)

Bengio, Yoshua

Memisevic, Roland

Level

Master's

Discipline

Informatique

Abstract(s)

Cette thèse contribue à la recherche vers l'intelligence artificielle en utilisant des méthodes connexionnistes. Les réseaux de neurones récurrents sont un ensemble de modèles séquentiels de plus en plus populaires, capables en principe d'apprendre des algorithmes arbitraires. Ces modèles effectuent un apprentissage en profondeur, un type d'apprentissage machine. Leur généralité et leur succès empirique en font un sujet intéressant pour la recherche et un outil prometteur pour la création d'une intelligence artificielle plus générale. Le premier chapitre de cette thèse donne un bref aperçu des sujets de fond : l'intelligence artificielle, l'apprentissage machine, l'apprentissage en profondeur et les réseaux de neurones récurrents. Les trois chapitres suivants couvrent ces sujets de manière de plus en plus spécifique. Enfin, nous présentons quelques contributions apportées aux réseaux de neurones récurrents. Un premier chapitre de contributions présente nos travaux sur la régularisation des réseaux de neurones récurrents. La régularisation vise à améliorer la capacité de généralisation du modèle, et joue un rôle clé dans la performance de plusieurs applications des réseaux de neurones récurrents, en particulier en reconnaissance vocale. Notre approche atteint l'état de l'art sur TIMIT, un benchmark standard pour cette tâche. Un second chapitre de contributions présente une seconde ligne de travail, toujours en cours, qui explore une nouvelle architecture pour les réseaux de neurones récurrents. Les réseaux de neurones récurrents maintiennent un état caché qui représente leurs observations antérieures. L'idée de ce travail est de coder certaines dynamiques abstraites dans l'état caché, donnant au réseau une manière naturelle d'encoder des tendances cohérentes ou à évolution lente de l'état de son environnement. Notre travail est fondé sur un modèle existant ; nous décrivons ce travail antérieur et nos contributions, avec notamment une expérience préliminaire.

This thesis represents incremental work towards artificial intelligence using connectionist methods. Recurrent neural networks are a set of increasingly popular sequential models capable in principle of learning arbitrary algorithms. These models perform deep learning, a type of machine learning. Their generality and empirical success make them an attractive candidate for further work and a promising tool for creating more general artificial intelligence. The first chapter of this thesis gives a brief overview of the background topics: artificial intelligence, machine learning, deep learning, and recurrent neural nets. The next three chapters cover these topics in order of increasing specificity. Finally, we contribute some general methods for recurrent neural networks. The first contribution chapter presents our work on recurrent neural network regularization. Regularization aims to improve a model's generalization ability, and is a key bottleneck in the performance of several applications of recurrent neural networks, most notably speech recognition. Our approach gives state-of-the-art results on the standard TIMIT benchmark for this task. The second contribution chapter presents a second line of work, still in progress, exploring a new architecture for recurrent neural nets. Recurrent neural networks maintain a hidden state which represents their previous observations. The idea of this work is to encode some abstract dynamics in the hidden state, giving the network a natural way to encode consistent or slow-changing trends in the state of its environment. Our work builds on a previously developed model; we describe this previous work and our contributions, including a preliminary experiment.

This document disseminated on Papyrus is the exclusive property of the copyright holders and is protected by the Copyright Act (R.S.C. 1985, c. C-42). It may be used for fair dealing and non-commercial purposes, for private study or research, criticism and review as provided by law. For any other use, written authorization from the copyright holders is required.