Contributions à la sonification d’image et à la classification de sons

Toffa, Ohini Kafui

Show metadata

Permalink

https://hdl.handle.net/1866/26824

Thesis or Dissertation

Toffa_Ohini_Kafui_2021_These.pdf (7.991Mb)

2021-11 (degree granted: 2022-06-22)

Author(s)

Toffa, Ohini Kafui

0000-0002-6646-6001

Advisor(s)

Mignotte, Max

Level

Doctoral

Discipline

Informatique

Abstract(s)

L’objectif de cette thèse est d’étudier d’une part le problème de sonification d’image et de le solutionner à travers de nouveaux modèles de correspondance entre domaines visuel et sonore. D’autre part d’étudier le problème de la classification de son et de le résoudre avec des méthodes ayant fait leurs preuves dans le domaine de la reconnaissance d’image. La sonification d’image est la traduction de données d’image (forme, couleur, texture, objet) en sons. Il est utilisé dans les domaines de l’assistance visuelle et de l’accessibilité des images pour les personnes malvoyantes. En raison de sa complexité, un système de sonification d’image qui traduit correctement les données d’image en son de manière intuitive n’est pas facile à concevoir. Notre première contribution est de proposer un nouveau système de sonification d’image de bas-niveau qui utilise une approche hiérarchique basée sur les caractéristiques visuelles. Il traduit, à l’aide de notes musicales, la plupart des propriétés d’une image (couleur, gradient, contour, texture, région) vers le domaine audio, de manière très prévisible et donc est facilement ensuite décodable par l’être humain. Notre deuxième contribution est une application Android de sonification de haut niveau qui est complémentaire à notre première contribution car elle implémente la traduction des objets et du contenu sémantique de l’image. Il propose également une base de données pour la sonification d’image. Finalement dans le domaine de l’audio, notre dernière contribution généralise le motif binaire local (LBP) à 1D et le combine avec des descripteurs audio pour faire de la classification de sons environnementaux. La méthode proposée surpasse les résultats des méthodes qui utilisent des algorithmes d’apprentissage automatique classiques et est plus rapide que toutes les méthodes de réseau neuronal convolutif. Il représente un meilleur choix lorsqu’il y a une rareté des données ou une puissance de calcul minimale.

The objective of this thesis is to study on the one hand the problem of image sonification and to solve it through new models of mapping between visual and sound domains. On the other hand, to study the problem of sound classification and to solve it with methods which have proven track record in the field of image recognition. Image sonification is the translation of image data (shape, color, texture, objects) into sounds. It is used in vision assistance and image accessibility domains for visual impaired people. Due to its complexity, an image sonification system that properly conveys the image data to sound in an intuitive way is not easy to design. Our first contribution is to propose a new low-level image sonification system which uses an hierarchical visual feature-based approach to translate, usingmusical notes, most of the properties of an image (color, gradient, edge, texture, region) to the audio domain, in a very predictable way in which is then easily decodable by the human being. Our second contribution is a high-level sonification Android application which is complementary to our first contribution because it implements the translation to the audio domain of the objects and the semantic content of an image. It also proposes a dataset for an image sonification. Finally, in the audio domain, our third contribution generalizes the Local Binary Pattern (LBP) to 1D and combines it with audio features for an environmental sound classification task. The proposed method outperforms the results of methods that uses handcrafted features with classical machine learning algorithms and is faster than any convolutional neural network methods. It represents a better choice when there is data scarcity or minimal computing power.

Collections

This document disseminated on Papyrus is the exclusive property of the copyright holders and is protected by the Copyright Act (R.S.C. 1985, c. C-42). It may be used for fair dealing and non-commercial purposes, for private study or research, criticism and review as provided by law. For any other use, written authorization from the copyright holders is required.