Publications and Preprints

  • Louis Castricato*, Stella Biderman*, Rogelio E. Cardona-Rivera, and David Thue. “Towards a Model-theoretic View of Narratives.” Preprint, 2021. [arXiv]

  • Isaac Caswell, Julia Kreutzer, and 50 others (incl. Stella Biderman). “Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.” EACL Workshop on African NLP, 2021. [arXiv]

  • Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. “The Pile: An 800GB Dataset of Diverse Text for Language Modeling.” Preprint, 2020. [arXiv]

  • Aran Komatsuzaki. “Current Limitations of Language Models: What You Need is Retrieval.” Preprint, 2020. [arXiv]

Bold indicates EleutherAI-affiliated authors. * indicates equal contribution or alphabetical author order.