  • GPT-Neo

    GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public, for free.

    GPT-Neo is an implementation of model- and data-parallel GPT-2- and GPT-3-style models, using Mesh TensorFlow for distributed support. The codebase is optimized for TPUs, but it should also work on GPUs.
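    To give a feel for the mechanism, here is a minimal Mesh TensorFlow sketch (not taken from the GPT-Neo codebase; the dimension names, mesh layout, and CPU stand-in devices are illustrative) showing how logical tensor dimensions are mapped onto a device mesh for data and model parallelism:

    ```python
    import mesh_tensorflow as mtf
    import tensorflow.compat.v1 as tf

    tf.disable_v2_behavior()

    graph = mtf.Graph()
    mesh = mtf.Mesh(graph, "example_mesh")

    # Logical tensor dimensions; the names are illustrative, not GPT-Neo's own.
    batch_dim = mtf.Dimension("batch", 8)
    hidden_dim = mtf.Dimension("hidden", 1024)
    x = mtf.zeros(mesh, mtf.Shape([batch_dim, hidden_dim]))

    # A 2x2 device mesh: "batch" is split across rows (data parallelism)
    # and "hidden" across columns (model parallelism).
    mesh_shape = mtf.convert_to_shape("rows:2;cols:2")
    layout_rules = mtf.convert_to_layout_rules("batch:rows;hidden:cols")
    devices = ["/cpu:0"] * 4  # stand-ins for TPU cores or GPUs
    mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
        mesh_shape, layout_rules, devices)

    # Lower the abstract mesh graph to concrete per-device TensorFlow ops.
    lowering = mtf.Lowering(graph, {mesh: mesh_impl})
    tf_x = lowering.export_to_tf_tensor(x)  # full logical tensor, shape (8, 1024)
    ```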

    Along the way we will be running experiments with alternative architectures and attention types, releasing any intermediate models, and writing up any findings on our blog.

  • Progress:

    • GPT-Neo should be feature-complete. We are making bugfixes, but we do not expect to make significant changes.
    • As of 2021-03-21, 1.3B and 2.7B parameter GPT-Neo models are available to run with this codebase.
    • As of 2021-03-31, the 1.3B and 2.7B parameter GPT-Neo models are now available on the Hugging Face Model Hub! (A minimal loading sketch follows this list.)
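
    Once on the Model Hub, the released checkpoints can be loaded through the Hugging Face transformers library. A minimal sketch (the prompt and sampling settings are arbitrary):

    ```python
    # Generate text from the released 1.3B checkpoint with the
    # transformers text-generation pipeline.
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
    result = generator("EleutherAI is", max_length=50, do_sample=True,
                       temperature=0.9)
    print(result[0]["generated_text"])
    ```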
  • Next Steps: