GPT-Neo is an implementation of model- and data-parallel GPT-2- and GPT-3-like models, using Mesh TensorFlow for distributed support. This codebase is designed for TPUs. It should also work on GPUs, though we do not recommend this hardware configuration.

Progress:

  • GPT-Neo should be feature complete. We are making bugfixes, but we do not expect to make any significant changes.
  • As of , 1.3B and 2.7B parameter GPT-Neo models are available to be run with GPT-Neo.
  • As of , 1.3B and 2.7B parameter GPT-Neo models are now available on Hugging Face Model Hub!

Next Steps:

  • We are continuing our efforts in our GPU codebase, GPT-NeoX.