GPT-Neo is the codename for a series of transformer-based language models loosely styled around the GPT architecture that we plan to train and open source. Our primary goal is to replicate a GPT-3-sized model and open source it to the public for free.
Our models are built with Mesh TensorFlow, which will allow us to scale up to GPT-3 sizes and beyond using simultaneous model and data parallelism.
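As a rough illustration of what that means, here is a minimal sketch of how Mesh TensorFlow expresses the two kinds of parallelism together: a named mesh of accelerators plus layout rules that map tensor dimensions onto mesh axes. The mesh shape, layout rules, and dimension sizes below are made up for the example and are not our actual training configuration.

```python
import mesh_tensorflow as mtf

# Hypothetical 2-D mesh over 32 accelerators:
# 8-way data parallelism x 4-way model parallelism.
mesh_shape = mtf.convert_to_shape("batch:8;model:4")

# Layout rules map named tensor dimensions onto mesh axes: the batch dimension
# is split across the "batch" axis (data parallel), while the feed-forward
# dimension is split across the "model" axis (model parallel). These rules are
# passed to the mesh implementation when the graph is lowered to devices.
layout_rules = mtf.convert_to_layout_rules("batch:batch;d_ff:model")

graph = mtf.Graph()
mesh = mtf.Mesh(graph, "example_mesh")

batch = mtf.Dimension("batch", 256)
d_model = mtf.Dimension("d_model", 1024)
d_ff = mtf.Dimension("d_ff", 4096)

# A single feed-forward block; under the layout rules above, its d_ff-sized
# weight matrices end up sharded across the "model" axis of the mesh.
x = mtf.zeros(mesh, mtf.Shape([batch, d_model]))
h = mtf.layers.dense(x, d_ff, activation=mtf.relu, name="ffn_in")
y = mtf.layers.dense(h, d_model, name="ffn_out")
```

Because parallelism lives entirely in the mesh shape and layout rules, the same model code can be redistributed across more devices by changing those two strings rather than the model definition.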
We have the bulk of the model built, GPT-2-sized models trained, and several experimental architectures implemented.
Our current codebase should be able to scale up to GPT-3-sized models.
We are currently wrapping up the GPT-2-sized model replication, focusing mostly on evaluations.
The largest model we have trained for a single step so far is 200B parameters.
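For context on what "GPT-3-sized" means in parameter terms, here is a back-of-the-envelope count using the common approximation of 12 · n_layer · d_model² weights for the transformer stack plus the token embedding. The helper function and the dimension values are illustrative (taken from the published GPT-2 and GPT-3 papers), not from our own configs.

```python
# Rough parameter count for a GPT-style decoder:
# attention + feed-forward weights ~ 12 * n_layer * d_model^2, plus embeddings.
def approx_params(n_layer: int, d_model: int, vocab_size: int = 50257) -> int:
    transformer = 12 * n_layer * d_model ** 2   # attention + MLP weight matrices
    embedding = vocab_size * d_model            # token embedding matrix
    return transformer + embedding

# GPT-2 XL scale: 48 layers, d_model 1600 -> roughly 1.5B parameters
print(f"GPT-2 XL scale: {approx_params(48, 1600) / 1e9:.1f}B")
# GPT-3 scale: 96 layers, d_model 12288 -> roughly 175B parameters
print(f"GPT-3 scale:    {approx_params(96, 12288) / 1e9:.0f}B")
```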