Here are some of the projects I have been working on recently. You can find my full resume here.

MANTa Findings of EMNLP 2022 (to appear)
MANTa is a differentiable tokenization module that learns to segment input sequences end-to-end with the language modeling objective.
BLOOM 2022
BLOOM is a Large Language Model resulting from a collaborative, open-source effort. I actively participated in creating its tokenizer.
Hands-on CamemBERT June 2022
We gave a three-hour tutorial on how to use and fine-tune CamemBERT, a French Language Model, and turned it into a blog post (in French).
Active Learning from Demonstrations MVA RL course 2021
We designed an agent robust to imperfect demonstrations and evaluated it in discrete and continuous environments.
SinGAN for Inpainting MVA Computer Vision course 2021
We adapted SinGAN for inpainting using Partial Convolutions.
Domain Shift in Disaster Tweets Classification MVA Deep Learning course 2021
We studied the impact of domain shift on Disaster Tweet classifiers and explored ways to mitigate the resulting performance degradation.