Here are some of the projects I have been working on recently. You can find my full resume here.

MANTa Findings of EMNLP 2022 (to appear)
MANTa is a differentiable tokenization module that learns to segment input sequences end-to-end with the language modeling objective.
BLOOM 2022
BLOOM is a Large Language Model resulting from a collaborative, open-source effort. I actively participated in creating its tokenizer.
Hands-on CamemBERT June 2022
We gave a three-hour tutorial on how to use and fine-tune CamemBERT, a French Language Model, and turned it into a blog post (in French).
Active Learning from Demonstrations MVA RL course 2021
We designed an agent robust to imperfect demonstrations and evaluated it in discrete and continuous environments.
SinGAN for Inpainting MVA Computer Vision course 2021
We adapted SinGAN for inpainting using Partial Convolutions.
Domain Shift in Disaster Tweets Classification MVA Deep Learning course 2021
We studied the impact of domain shift on Disaster Tweet classifiers and explored ways to mitigate the resulting performance degradation.