dc.contributor.advisor: Bengio, Yoshua
dc.contributor.advisor: Courville, Aaron
dc.contributor.author: Xu, Kelvin
dc.date.accessioned: 2018-05-31T13:29:01Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2018-05-31T13:29:01Z
dc.date.issued: 2018-03-21
dc.date.submitted: 2017-12
dc.identifier.uri: http://hdl.handle.net/1866/20194
dc.subject: Réseaux de Neurones
dc.subject: Génération de Description
dc.subject: Apprentissage Profond
dc.subject: Apprentissage de Représentations
dc.subject: Apprentissage Supervisé
dc.subject: Inférence Variationnelle
dc.subject: Apprentissage par Renforcement
dc.subject: Attention
dc.subject: Modélisation de Données Séquentielles
dc.subject: Neural Networks
dc.subject: Caption Generation
dc.subject: Deep Learning
dc.subject: Representation Learning
dc.subject: Supervised Learning
dc.subject: Variational Inference
dc.subject: Reinforcement Learning
dc.subject: Attention
dc.subject: Sequence Modelling
dc.subject.other: Applied Sciences - Artificial Intelligence / Sciences appliquées et technologie - Intelligence artificielle (UMI : 0800)
dc.title: Exploring Attention Based Model for Captioning Images
dc.type: Thèse ou mémoire / Thesis or Dissertation
etd.degree.discipline: Informatique
etd.degree.grantor: Université de Montréal
etd.degree.level: Maîtrise / Master's
etd.degree.name: M. Sc.
dcterms.abstract: Comprendre ce qu'il y a dans une image est l'enjeu primaire de la vision par ordinateur. Depuis 2012, les réseaux de neurones se sont imposés comme le modèle de facto pour de nombreuses applications d'apprentissage automatique. Inspirée par les récents travaux en traduction automatique et en détection d'objets, cette thèse s'intéresse aux modèles capables de décrire le contenu d'une image et explore comment la notion d'attention peut être paramétrée par des réseaux de neurones et utilisée pour la description d'images. Cette thèse présente un réseau de neurones basé sur l'attention qui peut décrire le contenu d'images, et explique comment apprendre ce modèle de façon déterministe par rétropropagation ou de façon stochastique avec de l'inférence variationnelle ou de l'apprentissage par renforcement. Étonnamment, nous montrons que le modèle apprend automatiquement à concentrer son attention sur les objets correspondant aux mots de la phrase prédite. Cette approche fondée sur l'attention atteint l'état de l'art sur trois benchmarks : Flickr8k, Flickr30k et MS COCO.
dcterms.abstract: Understanding the content of images is arguably the primary goal of computer vision. Beyond merely saying what is in an image, one test of a system's understanding of an image is its ability to describe that image's contents in natural language (a task we will refer to in this thesis as "image captioning"). Since 2012, neural networks have become the de facto modelling tool for many important applications in machine learning. Inspired by recent work in machine translation and object detection, this thesis explores models that can describe the content of images. In addition, it explores how the notion of "attention" can be both parameterized by neural networks and usefully employed for image captioning. More technically, this thesis presents a single attention-based neural network that can describe images. It describes how to train such models both purely deterministically, using standard backpropagation, and stochastically, using techniques from variational inference and reinforcement learning. Surprisingly, we show through visualization that the model automatically learns to fix its gaze on the salient objects corresponding to the words in the output sequence. We validate the attention-based approach with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
dcterms.language: eng
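
As a concrete illustration of the deterministic ("soft") attention step the abstracts describe, the sketch below scores a grid of image features against the decoder state, normalizes the scores with a softmax, and returns the expected feature vector as the context for predicting the next word. This is a minimal NumPy sketch under assumed shapes and names (features, hidden, W_f, W_h, w_a are illustrative, not the thesis's actual code):

# Minimal sketch of one soft-attention step for captioning (assumed shapes).
import numpy as np

def soft_attention(features, hidden, W_f, W_h, w_a):
    """features: (L, D) region features from a CNN; hidden: (H,) decoder state.
    Returns the expected context vector (D,) and attention weights (L,)."""
    scores = np.tanh(features @ W_f + hidden @ W_h) @ w_a  # (L,) relevance scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                   # softmax: weights sum to 1
    context = alpha @ features                             # convex combination of regions
    return context, alpha

rng = np.random.default_rng(0)
L, D, H, A = 196, 512, 256, 64        # e.g. a 14x14 grid of 512-d features
features = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
W_f = 0.1 * rng.standard_normal((D, A))
W_h = 0.1 * rng.standard_normal((H, A))
w_a = 0.1 * rng.standard_normal(A)
context, alpha = soft_attention(features, hidden, W_f, W_h, w_a)
print(context.shape, round(alpha.sum(), 6))  # (512,) 1.0

Because every operation here is differentiable, this variant trains end to end with standard backpropagation, which is the "purely deterministic" training the abstract mentions. The stochastic ("hard") variant instead samples a single region from alpha and is trained with variational or REINFORCE-style gradient estimators, since the sampling step itself is not differentiable.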



This document disseminated on Papyrus is the exclusive property of the copyright holders and is protected by the Copyright Act (R.S.C. 1985, c. C-42). It may be used for fair dealing and non-commercial purposes, for private study or research, criticism and review as provided by law. For any other use, written authorization from the copyright holders is required.