  • GPT-NeoX

    GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Our primary goal is to train a model equivalent to the full-sized GPT-3 and make it available to the public under an open licence.

    GPT-NeoX is an implementation of 3D-parallel GPT-3-like models on distributed GPUs, based upon DeepSpeed and Megatron-LM.
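    To give a rough sense of what "3D parallelism" means in practice, the three axes (tensor/model, pipeline, and data parallelism) must multiply together to cover the full set of GPUs. The sketch below is only an illustration of that arithmetic; the function name and the degrees used are hypothetical, not GPT-NeoX's actual configuration.

    ```python
    # Hypothetical illustration of the 3D-parallel decomposition: the
    # tensor-, pipeline-, and data-parallel degrees multiply to the total
    # GPU count. The numbers are made up for the example.

    def data_parallel_degree(world_size: int,
                             tensor_parallel: int,
                             pipeline_parallel: int) -> int:
        """Return the data-parallel degree left over once the tensor- and
        pipeline-parallel degrees are fixed for a given GPU count."""
        model_gpus = tensor_parallel * pipeline_parallel
        if world_size % model_gpus != 0:
            raise ValueError("world size must be divisible by "
                             "tensor_parallel * pipeline_parallel")
        return world_size // model_gpus

    # e.g. 96 GPUs with 4-way tensor and 6-way pipeline parallelism
    # leaves 96 / (4 * 6) = 4-way data parallelism.
    print(data_parallel_degree(96, 4, 6))  # 4
    ```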

  • Progress:

    • As of 2021-03-31, the codebase is fairly stable. DeepSpeed, 3D parallelism, and ZeRO are all working properly.

  • Next Steps:

    • We are currently waiting for CoreWeave to finish building the final hardware we’ll be training on. In the meantime, we are optimizing GPT⁠-⁠NeoX to run as efficiently as possible on that hardware.