GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to train a model equivalent in size to the full GPT-3 and make it available to the public under an open license.
GPT-Neo is an implementation of model- and data-parallel GPT-2- and GPT-3-like models, using Mesh TensorFlow for distributed training. This codebase is designed for TPUs. It should also work on GPUs, though we do not recommend that hardware configuration.
- GPT-Neo should be feature complete. We are making bugfixes, but we do not expect to make any significant changes.
- As of 2021-03-21, 1.3B and 2.7B parameter GPT-Neo models are available and can be run with this codebase.
- As of 2021-03-31, the 1.3B and 2.7B parameter GPT-Neo models are also available on the Hugging Face Model Hub!
- We continue our efforts in our GPU codebase, GPT-NeoX.
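Since the released checkpoints are on the Hugging Face Model Hub, the easiest way to try them is through the `transformers` library rather than this codebase. Below is a minimal sketch, assuming `transformers` and a PyTorch backend are installed; the helper function `generate` is our own illustration, not part of any released API, and note that the first call downloads several gigabytes of weights.

```python
# A minimal sketch of sampling from a released GPT-Neo checkpoint via the
# Hugging Face transformers pipeline API. Assumes `transformers` and torch
# are installed; `generate` is a hypothetical helper for illustration.
from transformers import pipeline


def generate(prompt: str, model_id: str = "EleutherAI/gpt-neo-1.3B") -> str:
    # Downloads the model weights from the Model Hub on first use.
    generator = pipeline("text-generation", model=model_id)
    # Sample a short continuation of the prompt.
    return generator(prompt, do_sample=True, max_length=50)[0]["generated_text"]
```

For example, `generate("EleutherAI has")` returns the prompt followed by sampled text; swap in `"EleutherAI/gpt-neo-2.7B"` for the larger model.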