  • GPT-NeoX

    GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public for free.

    GPT⁠-⁠NeoX is an implementation of 3D-parallel GPT⁠-⁠3-like models on distributed GPUs, based upon DeepSpeed and Megatron⁠-⁠LM.
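
    As a rough illustration of what "3D parallelism" means in practice (pipeline parallelism across groups of layers, tensor parallelism within layers via Megatron-LM, and data parallelism across model replicas), here is a minimal, hypothetical sketch using DeepSpeed's pipeline API. The layer stack, stage count, and config file name are illustrative assumptions, not the actual GPT-NeoX setup.

    ```python
    # A minimal sketch, not the actual GPT-NeoX code: DeepSpeed's pipeline API
    # splits a layer stack across GPUs, and deepspeed.initialize adds data
    # parallelism (plus, given a suitable config, ZeRO) on top. The third axis
    # of "3D", tensor parallelism within each layer, comes from Megatron-LM
    # kernels and is omitted here. All sizes and names are illustrative.
    import torch.nn as nn
    import deepspeed
    from deepspeed.pipe import PipelineModule

    # Requires launching under the `deepspeed` launcher so every rank has a GPU.
    deepspeed.init_distributed()

    # Toy stand-in for a stack of transformer blocks.
    blocks = [nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(12)]

    # Pipeline parallelism: partition the 12 blocks into 4 sequential stages,
    # each stage living on its own GPU.
    model = PipelineModule(layers=blocks, num_stages=4)

    # Data parallelism and optimizer sharding come from the (hypothetical)
    # config file, sketched under "Progress" below.
    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config="ds_config.json",
    )
    ```

    Each process then hosts one pipeline stage; replicating the whole four-stage pipeline across additional groups of GPUs supplies the data-parallel dimension.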

    We have graciously been offered high-performance GPU compute by CoreWeave, an NVIDIA Preferred Cloud Services Provider. CoreWeave is excited by the open nature of the project and keen to help us break the OpenAI-Microsoft monopoly on massive autoregressive language models.

  • Progress:

    • As of 2021-03-31, the codebase is fairly stable: DeepSpeed, 3D parallelism, and ZeRO are all working properly (see the configuration sketch below).
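
    For concreteness, the kind of DeepSpeed configuration this refers to might look like the sketch below. The values are illustrative assumptions rather than the project's actual training settings; ZeRO stage 1 is shown because the higher ZeRO stages do not combine with DeepSpeed's pipeline parallelism.

    ```python
    # Hypothetical DeepSpeed configuration (a Python dict; equivalently a JSON
    # file) enabling fp16 training and ZeRO stage 1 optimizer-state sharding.
    # Batch and bucket sizes are illustrative, not GPT-NeoX's real settings.
    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "gradient_accumulation_steps": 8,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 1,  # shard optimizer states across data-parallel ranks
            "reduce_bucket_size": 500_000_000,
            "allgather_bucket_size": 500_000_000,
        },
    }
    # Passed to training via deepspeed.initialize(..., config=ds_config).
    ```
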
  • Next Steps:

    • We are currently waiting for CoreWeave to finish building the final hardware we’ll be training on. In the meantime, we are optimizing GPT⁠-⁠NeoX to run as efficiently as possible on that hardware.