  • Leo Gao. “An Empirical Exploration in Quality Filtering of Text Data.” Preprint, 2021. [arXiv]

  • Eric Alcaide, Stella Biderman, Amalio Telenti, and M. Cyrus Maher. “MP-NeRF: A Massively Parallel Method for Accelerating Protein Structure Reconstruction from Internal Coordinates.” Preprint, 2021. [bioRxiv]

  • Louis Castricato*, Stella Biderman*, Rogelio E. Cardona-Rivera, and David Thue. “Towards a Model-theoretic View of Narratives.” 3rd Workshop on Narrative Understanding at NAACL-HLT 2021, 2021. [arXiv]

  • Isaac Caswell, Julia Kreutzer, and 50 others (incl. Stella Biderman). “Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.” 2nd Workshop on African Natural Language Processing at EACL 2021, 2021. [arXiv]

  • Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. “The Pile: An 800GB Dataset of Diverse Text for Language Modeling.” Preprint, 2020. [arXiv]

  • Aran Komatsuzaki. “Current Limitations of Language Models: What You Need is Retrieval.” Preprint, 2020. [arXiv]

Bold indicates EleutherAI-affiliated author. * indicates equal or alphabetical authorship.