The Pile

The Pile is a curated collection of 22 diverse, high-quality datasets for training large language models.
