GPT-Neo is an implementation of model- and data-parallel autoregressive language models, built on Mesh TensorFlow for distributed computation on TPUs.

Even though it made contributors want to pull their hair out throughout its development and lifetime, GPT-Neo was used to train a family of models ranging from 125 million to 2.7 billion parameters on the TensorFlow Research Cloud. The flagship 1.3B and 2.7B models of this family were trained in December 2020 and January 2021. After languishing for a couple of months on a hard drive, they were released on March 21, 2021.

The GPT-Neo codebase is considered deprecated and is no longer maintained. We recommend that those looking for a TPU-centric codebase consider Mesh Transformer JAX, and those looking to run the GPT-Neo models use the implementations available in Hugging Face Transformers.
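As a minimal sketch of the recommended path, the released checkpoints can be loaded through Hugging Face Transformers with the standard `AutoTokenizer`/`AutoModelForCausalLM` classes. The model identifier below assumes the smallest (125M) checkpoint as published on the Hugging Face Hub; the 1.3B and 2.7B checkpoints load the same way with their respective names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub identifier for the smallest GPT-Neo checkpoint;
# substitute e.g. "EleutherAI/gpt-neo-1.3B" for the larger models.
model_name = "EleutherAI/gpt-neo-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Greedy decoding of a short continuation.
inputs = tokenizer("GPT-Neo was trained on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Note that the first call downloads the model weights from the Hub, so a network connection (and a few hundred megabytes of disk space, for the 125M checkpoint) is required.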