  • GPT-Neo

    GPT⁠-⁠Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to train a model equivalent to the full-sized GPT⁠-⁠3 and make it available to the public under an open licence.

    GPT⁠-⁠Neo is an implementation of model- and data-parallel GPT⁠-⁠2- and GPT⁠-⁠3-like models, using Mesh TensorFlow for distributed support. The codebase is designed for TPUs; it should also work on GPUs, though we do not recommend that hardware configuration.
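
    To illustrate the Mesh TensorFlow programming model, here is a minimal sketch loosely following the mesh-tensorflow README, not GPT⁠-⁠Neo's actual training graph: tensors carry named dimensions, and a layout maps those names onto a mesh of processors. The dimension names, sizes, and the 4-way CPU mesh below are purely illustrative.

    ```python
    import mesh_tensorflow as mtf
    import tensorflow.compat.v1 as tf

    tf.disable_v2_behavior()

    graph = mtf.Graph()
    mesh = mtf.Mesh(graph, "my_mesh")

    # Tensors carry *named* dimensions. Splitting "batch" across processors
    # gives data parallelism; splitting a model dimension such as "hidden"
    # would give model parallelism instead.
    batch_dim = mtf.Dimension("batch", 8)
    io_dim = mtf.Dimension("io", 16)
    hidden_dim = mtf.Dimension("hidden", 32)

    x = mtf.import_tf_tensor(
        mesh, tf.random.normal([8, 16]), [batch_dim, io_dim])
    w = mtf.get_variable(mesh, "w", [io_dim, hidden_dim])
    y = mtf.relu(mtf.einsum([x, w], output_shape=[batch_dim, hidden_dim]))

    # A layout maps named tensor dimensions onto a mesh of processors.
    # Here: a 1-D mesh of 4 processors, with "batch" split across it.
    devices = ["cpu:0"] * 4  # illustrative; GPT-Neo targets TPU meshes
    mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
        "all:4", "batch:all", devices)
    lowering = mtf.Lowering(graph, {mesh: mesh_impl})
    tf_y = lowering.export_to_tf_tensor(y)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(lowering.copy_masters_to_slices())
        print(sess.run(tf_y).shape)  # (8, 32)
    ```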

  • Progress:

    • GPT⁠-⁠Neo should be feature-complete. We are making bugfixes, but we do not expect to make any significant changes.
    • As of 2021-03-21, trained 1.3B and 2.7B parameter GPT⁠-⁠Neo models are available to run with this codebase.
    • As of 2021-03-31, the 1.3B and 2.7B parameter models are also available on the Hugging Face Model Hub (see the loading example after this list)!
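
    For reference, the released checkpoints can be loaded through the `transformers` library. A minimal sketch; the prompt and sampling settings are illustrative:

    ```python
    from transformers import pipeline

    # Load the 1.3B checkpoint from the Hugging Face Model Hub
    # ("EleutherAI/gpt-neo-2.7B" works the same way).
    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

    out = generator("GPT-Neo is", max_length=40, do_sample=True, temperature=0.9)
    print(out[0]["generated_text"])
    ```
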
  • Next Steps: