GPT-NeoX is an implementation of 3D-parallel, GPT-3-like models on distributed GPUs, built on DeepSpeed and Megatron-LM.

As of , the codebase is fairly stable: DeepSpeed, 3D parallelism, and ZeRO are all working properly.
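For readers unfamiliar with how ZeRO is switched on, a minimal sketch of a DeepSpeed JSON config follows. The field names are DeepSpeed's standard configuration keys; the specific values here are illustrative assumptions, not the project's actual training settings.

```json
{
  "train_batch_size": 256,
  "gradient_accumulation_steps": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "reduce_scatter": true
  }
}
```

ZeRO stage 1 partitions optimizer states across data-parallel ranks; higher stages additionally partition gradients and parameters.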
Next Steps:
We are currently waiting for CoreWeave to finish building the hardware we will ultimately be training on. In the meantime, we are optimizing GPT-NeoX to run as efficiently as possible on that hardware.