• Stella Biderman and Edward Raff. “Neural Language Models are Effective Plagiarists.” Preprint, 2022. [arXiv]

  • Victor Sanh*, Albert Webson*, Colin Raffel*, Stephen H. Bach*, and 37 others (incl. Stella Biderman and Leo Gao). “Multitask Prompted Training Enables Zero-Shot Task Generalization.” In the Tenth International Conference on Learning Representations (ICLR), 2022. [arXiv] [Dataset] [Model]

  • Shahbuland Matiana*, JR Smith*, Ryan Teehan*, Louis Castricato*, Stella Biderman*, Leo Gao, and Spencer Frazier. “Cut the CARP: Fishing for zero-shot story evaluation.” Preprint, 2021. [arXiv] [GitHub]

  • Leo Gao. “An Empirical Exploration in Quality Filtering of Text Data.” Preprint, 2021. [arXiv]

  • Eric Alcaide, Stella Biderman, Amalio Telenti, and M. Cyrus Maher. “MP-NeRF: A Massively Parallel Method for Accelerating Protein Structure Reconstruction from Internal Coordinates.” Journal of Computational Chemistry, 2021. [bioRxiv] [GitHub]

  • Louis Castricato*, Stella Biderman*, Rogelio E. Cardona-Rivera, and David Thue. “Towards a Model-theoretic View of Narratives.” 3rd Workshop on Narrative Understanding at NAACL-HLT 2021, 2021. [arXiv]

  • Isaac Caswell, Julia Kreutzer, and 50 others (incl. Stella Biderman). “Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.” 2nd Workshop on African Natural Language Processing at EACL 2021, 2021. [arXiv]

  • Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. “The Pile: An 800GB Dataset of Diverse Text for Language Modeling.” Preprint, 2020. [arXiv] [Dataset] [Datasheet]

  • Aran Komatsuzaki. “Current Limitations of Language Models: What You Need is Retrieval.” Preprint, 2020. [arXiv]

Bold indicates EleutherAI-affiliated author. * indicates equal or alphabetical authorship.