GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to train a model equivalent to the full-sized GPT-3 and make it available to the public under an open licence.
- As of 2021-03-31, the codebase is fairly stable. DeepSpeed, 3D parallelism, and ZeRO are all working properly (see the configuration sketch after this list).
- We are currently waiting for CoreWeave to finish building the final hardware we’ll be training on. In the meantime, we are optimizing GPT-NeoX to run as efficiently as possible on that hardware.
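For readers unfamiliar with how DeepSpeed and ZeRO fit into a training loop, here is a minimal sketch of wrapping a model for ZeRO training. The toy model, batch size, and optimizer values are illustrative placeholders, not the settings GPT-NeoX actually uses; only the config field names and the `deepspeed.initialize` call follow DeepSpeed's public API.

```python
import torch
import deepspeed

# Toy stand-in model; the real GPT-NeoX model definition lives in this repo.
model = torch.nn.Linear(1024, 1024)

# Minimal ZeRO configuration. Field names follow DeepSpeed's documented
# config schema; the values are illustrative, not our training settings.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},  # stage 1 shards optimizer states
}

# deepspeed.initialize returns an engine that handles the forward/backward
# pass, gradient accumulation, and optimizer stepping across workers.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

When launched with the `deepspeed` CLI, the engine replaces the usual `loss.backward()` / `optimizer.step()` pair with `model_engine.backward(loss)` and `model_engine.step()`, which is where the ZeRO sharding and communication happen.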