Re-weighted softmax cross-entropy to control forgetting in federated learning

Legate, Gwendolyne

dc.contributor.advisor	Belilovsky, Eugene
dc.contributor.author	Legate, Gwendolyne
dc.date.accessioned	2023-05-15T15:33:39Z
dc.date.available	NO_RESTRICTION	fr
dc.date.available	2023-05-15T15:33:39Z
dc.date.issued	2023-03-22
dc.date.submitted	2022-12
dc.identifier.uri	http://hdl.handle.net/1866/27947
dc.subject	Federated Learning	fr
dc.subject	Client Drift	fr
dc.subject	Out of Distribution Generalization	fr
dc.subject	Catastrophic Forgetting	fr
dc.subject	Apprentissage fédéré	fr
dc.subject	Dérive du client	fr
dc.subject	Généralisation hors distribution	fr
dc.subject	Oubli catastrophique	fr
dc.subject.other	Artificial intelligence / Intelligence artificielle (UMI : 0800)	fr
dc.title	Re-weighted softmax cross-entropy to control forgetting in federated learning	fr
dc.type	Thèse ou mémoire / Thesis or Dissertation
etd.degree.discipline	Informatique	fr
etd.degree.grantor	Université de Montréal	fr
etd.degree.level	Maîtrise / Master's	fr
etd.degree.name	M. Sc.	fr
dcterms.abstract	Dans l’apprentissage fédéré, un modèle global est appris en agrégeant les mises à jour du modèle calculées à partir d’un ensemble de nœuds clients. Un défi clé dans ce domaine est l’hétérogénéité des données entre les clients, qui dégrade les performances du modèle. Les algorithmes d’apprentissage fédéré standard effectuent plusieurs étapes de gradient avant de synchroniser le modèle, ce qui peut amener les clients à minimiser exagérément leur propre objectif local et à s’écarter de la solution globale. Nous démontrons que dans un tel contexte, les modèles de clients individuels subissent un oubli catastrophique par rapport aux données d’autres clients, et nous proposons une approche simple mais efficace qui modifie l’objectif d’entropie croisée sur une base par client en repondérant le softmax des logits avant de calculer la perte. Cette approche protège les classes en dehors de l’ensemble d’étiquettes d’un client d’un changement de représentation brutal. Grâce à une évaluation empirique approfondie, nous démontrons que notre approche peut atténuer ce problème, en apportant une amélioration constante aux algorithmes d’apprentissage fédéré standard. Cette approche est particulièrement avantageuse dans les contextes d’apprentissage fédéré difficiles les plus étroitement alignés sur les scénarios du monde réel, où l’hétérogénéité des données est élevée et la participation des clients à chaque cycle est faible. Nous étudions également les effets de l’utilisation de la normalisation par lots et de la normalisation de groupe avec notre méthode et constatons que la normalisation par lots, qui était auparavant considérée comme préjudiciable à l’apprentissage fédéré, fonctionne particulièrement bien avec notre softmax repondéré, remettant en question certaines hypothèses antérieures sur la normalisation dans un système fédéré.	fr
dcterms.abstract	In federated learning, a global model is learned by aggregating model updates computed on a set of client nodes. A key challenge in this domain is data heterogeneity across clients, which degrades model performance. Standard federated learning algorithms perform multiple gradient steps before synchronizing the model, which can lead clients to over-minimize their own local objective and diverge from the global solution. We demonstrate that in such a setting, individual client models experience catastrophic forgetting with respect to data from other clients, and we propose a simple yet efficient approach that modifies the cross-entropy objective on a per-client basis by re-weighting the softmax of the logits prior to computing the loss. This approach shields classes outside a client’s label set from abrupt representation change. Through extensive empirical evaluation, we demonstrate that our approach can alleviate this problem, providing consistent improvement to standard federated learning algorithms. It is particularly beneficial under the challenging federated learning settings most closely aligned with real-world scenarios, where data heterogeneity is high and client participation in each round is low. We also investigate the effects of using batch normalization and group normalization with our method and find that batch normalization, which has previously been considered detrimental to federated learning, performs particularly well with our re-weighted softmax, calling into question some prior assumptions about normalization in a federated setting.	fr
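
The abstract describes re-weighting the softmax of the logits on a per-client basis before computing the cross-entropy loss, but it does not state the formula. The sketch below illustrates one plausible reading, under the assumption that the loss takes the form -log(w[y] * exp(z[y]) / sum_c w[c] * exp(z[c])), with weight 1 for classes in the client's label set and a small (here zero) weight for all others. The function names, the `absent_weight` parameter, and the binary weighting scheme are hypothetical and are not taken from the thesis.

```python
import math


def client_class_weights(client_labels, num_classes, absent_weight=0.0):
    """Per-class weights for one client (assumed scheme, not from the thesis).

    Classes that appear in the client's local labels get weight 1.0;
    all other classes get `absent_weight`.
    """
    present = set(client_labels)
    return [1.0 if c in present else absent_weight for c in range(num_classes)]


def reweighted_softmax_cross_entropy(logits, label, class_weights):
    """Cross-entropy on a softmax whose terms are scaled by class weights.

    Assumed form: -log(w[y] * exp(z[y]) / sum_c w[c] * exp(z[c])).
    """
    if class_weights[label] <= 0:
        raise ValueError("the true label must have a positive weight")
    # Subtract the largest logit before exponentiating for numerical stability.
    z_max = max(logits)
    weighted = [w * math.exp(z - z_max) for z, w in zip(logits, class_weights)]
    return -math.log(weighted[label] / sum(weighted))


# A client that only holds samples of classes 0 and 2, out of 4 classes.
weights = client_class_weights([0, 2, 2], num_classes=4)
logits = [2.0, 1.0, 0.5, 3.0]

local_loss = reweighted_softmax_cross_entropy(logits, 0, weights)
standard_loss = reweighted_softmax_cross_entropy(logits, 0, [1.0] * 4)
print(local_loss, standard_loss)
```

With all weights equal to 1 the function reduces to the standard softmax cross-entropy. With a zero weight, an absent class drops out of the normalizing sum, so its logit receives no gradient from this loss, which is one way to read the abstract's statement that classes outside a client's label set are shielded from abrupt change.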
dcterms.language	eng	fr

Files in this item

Name: Legate_Gwendolyne_2022_memoire.pdf
Size: 2.310Mb
Format: PDF
Description: Mémoire

This item appears in the following Collection(s)

Thèses et mémoires électroniques de l’Université de Montréal [24306]
Faculté des arts et des sciences – Département d'informatique et de recherche opérationnelle - Thèses et mémoires [1178]

This document disseminated on Papyrus is the exclusive property of the copyright holders and is protected by the Copyright Act (R.S.C. 1985, c. C-42). It may be used for fair dealing and non-commercial purposes, for private study or research, criticism and review as provided by law. For any other use, written authorization from the copyright holders is required.