GPT-NeoX is an implementation of 3D-parallel, GPT-3-like autoregressive language models for distributed GPUs, built on Megatron-LM and DeepSpeed.
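To make the "3D-parallel" term concrete, here is a minimal sketch of how a global GPU rank can be decomposed into data-, pipeline-, and tensor-parallel coordinates. This is not GPT-NeoX's actual implementation; the function name, rank ordering, and parallelism degrees are illustrative.

```python
# Illustrative sketch of 3D parallelism: every GPU rank gets a coordinate
# along three axes (data, pipeline, tensor). Not GPT-NeoX's actual code.

def decompose_rank(rank: int, tensor_parallel: int,
                   pipeline_parallel: int, world_size: int) -> tuple[int, int, int]:
    """Map a global GPU rank to (data, pipeline, tensor) coordinates.

    Assumes world_size == data_degree * pipeline_parallel * tensor_parallel,
    with tensor-parallel ranks varying fastest (an assumed convention here).
    """
    assert world_size % (tensor_parallel * pipeline_parallel) == 0
    tensor_rank = rank % tensor_parallel
    pipeline_rank = (rank // tensor_parallel) % pipeline_parallel
    data_rank = rank // (tensor_parallel * pipeline_parallel)
    return data_rank, pipeline_rank, tensor_rank

if __name__ == "__main__":
    # Example: 16 GPUs split as 2-way tensor x 4-way pipeline x 2-way data.
    for r in range(16):
        print(r, decompose_rank(r, tensor_parallel=2,
                                pipeline_parallel=4, world_size=16))
```

In this layout, tensor parallelism shards individual layers across nearby GPUs, pipeline parallelism splits the stack of layers into stages, and data parallelism replicates the whole pipeline across the remaining GPUs.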
GPT-NeoX was used to train GPT-NeoX-20B, a 20 billion parameter language model, in collaboration with CoreWeave; the model was announced in February 2022.
In-depth information on GPT-NeoX-20B can be found in the associated technical report on arXiv.