Guilherme Penedo, Hynek Kydlíček, Loubna Ben allal, Anton Lozhkov, Margaret Mitchell, Colin Raffel, Leandro Von Werra, and Thomas Wolf,
“The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale”,
Neural Information Processing Systems 38 (NeurIPS), 2024.
Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, and Alessandro Sordoni,
“A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning”,
arXiv preprint arXiv:2408.07057, 2024.
Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, and Rameswar Panda,
“Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models”,
arXiv preprint arXiv:2404.05567, 2024.
Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel,
“Distributed Inference and Fine-tuning of Large Language Models Over The Internet”,
Neural Information Processing Systems 37 (NeurIPS), 2023.
Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, and Colin Raffel,
“Scaling Data-Constrained Language Models”,
Neural Information Processing Systems 37 (NeurIPS), 2023.
Nikhil Kandpal*, Brian Lester*, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, and Colin Raffel,
“Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models”,
40th International Conference on Machine Learning (ICML), 2023.
Alexander Borzunov*, Dmitry Baranchuk*, Tim Dettmers*, Max Ryabinin*, Younes Belkada*, Artem Chumachenko, Pavel Samygin, and Colin Raffel,
“Petals: Collaborative Inference and Fine-tuning of Large Models”,
61st Annual Meeting of the Association for Computational Linguistics (ACL) Demo Track and NeurIPS Workshop on Broadening Research Collaborations in Machine Learning, 2022.
Thomas Wang*, Adam Roberts*, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, and Colin Raffel,
“What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”,
39th International Conference on Machine Learning (ICML), 2022.
Linting Xue*, Aditya Barua*, Noah Constant*, Rami Al-Rfou*, Sharan Narang, Mihir Kale, Adam Roberts, and Colin Raffel,
“ByT5: Towards a token-free future with pre-trained byte-to-byte models”,
Transactions of the Association for Computational Linguistics (TACL), 2022.
Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, and Samson Tan,
“Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP”,
arXiv preprint arXiv:2112.10508, 2021.
Linting Xue*, Noah Constant*, Adam Roberts*, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel,
“mT5: A Massively Multilingual Pre-Trained Text-to-Text Transformer”,
2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021.
Kihyuk Sohn*, David Berthelot*, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, and Colin Raffel,
“FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence”,
Neural Information Processing Systems 34 (NeurIPS), 2020.
Colin Raffel*, Noam Shazeer*, Adam Roberts*, Katherine Lee*, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu,
“Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”,
Journal of Machine Learning Research (JMLR), 21(140), 2020.
Naveen Arivazhagan*, Colin Cherry*, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, and Colin Raffel,
“Monotonic Infinite Lookback Attention for Simultaneous Machine Translation”,
57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
Curtis Hawthorne*, Erich Elsen*, Jialin Song*, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, and Douglas Eck,
“Onsets and Frames: Dual-Objective Piano Transcription”,
19th International Society for Music Information Retrieval Conference (ISMIR), 2018.
Andreas Jansson, Colin Raffel, and Tillman Weyde,
“This Is My Jam: Data Dump”,
16th International Society for Music Information Retrieval Conference Late Breaking and Demo Papers, 2015.
2023
Build an Ecosystem, Not a Monolith
Simons Institute Workshop on Large Language Models and Transformers, Google Responsible Machine Learning Reading Group, University of Edinburgh ILCC Seminar, Stanford NLP Seminar, UCSD AI Seminar, and Yale CPSC 488/588 Lecture
2022
Building Machine Learning Models Like Open-Source Software
Microsoft Research Summit, World Artificial Intelligence Conference, Technische Universität Darmstadt, UT Austin Forum for Artificial Intelligence, Korea AI Summit, Stanford CS324 Lecture, Stanford MLSys Seminar Series, and MLSys Symposium on Decentralized and Collaborative Learning
2021
Scaling up Models and Data
CIFAR Deep Learning and Reinforcement Learning Summer School, Nepal Winter School in AI, and Advanced Language Processing Winter School
2015
mir_eval
Objective Evaluation in Semantic Audio Analysis and Processing Panel at the 138th Convention of the Audio Engineering Society
2024
Senior Area Chair, Neural Information Processing Systems
2023
Peer Reviewer, Canada CIFAR AI Chairs program
2022–2023
Senior Area Chair, Conference on Empirical Methods in Natural Language Processing
2022
Panel Moderator, 7th Workshop on Representation Learning for NLP
2022
Area Chair, Conference on Computational Natural Language Learning
2022
Senior Area Chair, North American Chapter of the Association for Computational Linguistics
2022
Panelist, National Science Foundation
2022
Area Chair, Annual Meeting of the Association for Computational Linguistics
2022–2024
Action Editor, Transactions on Machine Learning Research
2021
Member, ACL Working Group on Efficient NLP
2021
Action Editor, ACL Rolling Review
2021
Area Chair, North American Chapter of the Association for Computational Linguistics
2021–2024
Area Chair, International Conference on Learning Representations
2021
Area Chair, AAAI Conference on Artificial Intelligence
2020
Mentor, Women in Machine Learning (WiML) Roundtable
2020
Organizer, NeurIPS Competition on Efficient Open-Domain Question Answering
2020–2023
Area Chair, Neural Information Processing Systems
2020
Panelist, Decoding Graduate Programs in CS
2020
Area Chair, Conference on Empirical Methods in Natural Language Processing
2020–2023
Reviewer, Journal of Machine Learning Research
2020
Reviewer, Transactions of the International Society for Music Information Retrieval
2019
Mentor, Women in Music Information Retrieval (WiMIR)
2018
Reviewer, Machine Learning for Creativity and Design Workshop
2018
Area Chair, International Society for Music Information Retrieval Conference
2018–2022
Reviewer, International Conference on Machine Learning
2018–2020
Reviewer, International Conference on Learning Representations
2018
Reviewer, International Journal of Computer Vision
2017–2019
Reviewer, Neural Information Processing Systems
2016
Reviewer, Journal of New Music Research
2015
Reviewer, EURASIP Journal on Audio, Speech, and Music Processing
2014–2017
Reviewer, International Society for Music Information Retrieval Conference
2014
Reviewer, IEEE International Symposium on Information Theory
2008–2009
Mathematics Tutor, Oberlin College
2024–now
Zhenwei (Joseph) Tang, University of Toronto
2024–now
Lunjun Zhang, University of Toronto
2024–now
Ajay Patel, University of Pennsylvania
2024–now
Towaki Takikawa, University of Toronto
2024–now
Honghua Dong, University of Toronto
2024–now
Yangjun Ruan, University of Toronto
2024–now
Lucas Gomez, McGill University
2023–now
Gavin Guan, University of Toronto
2023–now
Alex Adams, University of Toronto
2023–2024
Zining Zhu, University of Toronto
2023–2024
Jiaao Chen, Georgia Institute of Technology
2022–2024
Andrew Freeman, University of North Carolina, Chapel Hill
2022–2023
Albert Webson, Brown University
2022–2023
Feng Cheng, University of North Carolina, Chapel Hill
2022–2023
Xiang Zhou, University of North Carolina, Chapel Hill
2022–2023
Chao Zhao, University of North Carolina, Chapel Hill
2022–2023
Tu Vu, University of Massachusetts, Amherst
2022–2023
Peirong Liu, University of North Carolina, Chapel Hill
2020–2023
Junhua Yan, University of North Carolina, Chapel Hill
2020–2022
Yang Li, University of North Carolina, Chapel Hill
2020–2022
Zhengyang Shen, University of North Carolina, Chapel Hill
2020–2022
Dan Korn, University of North Carolina, Chapel Hill
2021–2023
YoungJoong Kwon, University of North Carolina, Chapel Hill