GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public, for free.
GPT-Neo is an implementation of model & data-parallel GPT-2 and GPT-3-like models, utilizing Mesh TensorFlow for distributed support. This codebase is optimized for TPUs, but should also work on GPUs.
- GPT-Neo should be feature complete. We are making bugfixes, but we do not expect to make significant changes.
- As of 2021-03-21, pretrained 1.3B and 2.7B parameter GPT-Neo models are available to run with this codebase.
- As of 2021-03-31, 1.3B and 2.7B parameter GPT-Neo models are now available on Hugging Face Model Hub!
- We continue our efforts on our GPU codebase, GPT-NeoX.
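Since the checkpoints are on the Hugging Face Model Hub, they can be loaded with the `transformers` library. A minimal sketch, assuming `transformers` and a backend such as PyTorch are installed, and using the Hub IDs `EleutherAI/gpt-neo-1.3B` and `EleutherAI/gpt-neo-2.7B` as published by EleutherAI:

```python
# Minimal sketch: greedy text generation with the 1.3B GPT-Neo
# checkpoint from the Hugging Face Model Hub. Downloads the model
# weights on first run.
from transformers import pipeline

# "EleutherAI/gpt-neo-1.3B" is the Hub ID for the 1.3B checkpoint;
# swap in "EleutherAI/gpt-neo-2.7B" for the larger model.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

# Greedy decoding (do_sample=False) for a deterministic continuation.
text = generator("GPT-Neo is", max_new_tokens=20, do_sample=False)[0]["generated_text"]
print(text)
```

The `pipeline` helper handles tokenization and decoding; for finer control (e.g. batching or custom sampling) the same checkpoints can be loaded with `AutoModelForCausalLM` and `AutoTokenizer`.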