GPT-Neo is an implementation of model- and data-parallel GPT-2 and GPT-3-like models, using Mesh TensorFlow for distributed support. This codebase is designed for TPUs. It should also work on GPUs, though we do not recommend this hardware configuration.
- GPT-Neo should be feature complete. We are making bugfixes, but we do not expect to make any significant changes.
- 1.3B and 2.7B parameter GPT-Neo models are available to run with this codebase.
- The 1.3B and 2.7B parameter GPT-Neo models are now also available on the Hugging Face Model Hub!
- We continue our efforts in our GPU codebase, GPT-NeoX.