AlphaFold2 Replication

AlphaFold2 is a deep learning algorithm that leverages techniques originating in natural language processing to predict protein structure. It was announced by DeepMind in 2020 at the CASP14 competition, where its accuracy far exceeded that of the other entries. The creators have given several talks and presentations on the algorithm (see, e.g., here), and the model and codebase have very recently (Jul 15, 2021) been open sourced (under Apache License, Version 2....

CLASP

Recently, multimodal contrastive models such as ConVIRT, CLIP, and ALIGN have grown rapidly in power and popularity. In this project we apply a similar setup, using as training data amino acid sequences and their natural-language descriptions from the Universal Protein Resource (UniProt), an annotated protein database. The goal is to create a model that can be used like other CLIP-like models, but for amino acid sequences and text....
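The CLIP-style contrastive setup can be sketched as a symmetric cross-entropy over a batch similarity matrix: each amino acid sequence should score highest against its own description, and each description highest against its own sequence. This is a minimal pure-Python sketch under stated assumptions: the embeddings are taken as given (the sequence and text encoders are not shown), and the `clip_style_loss` name and the 0.07 temperature (CLIP's initial value) are illustrative choices, not CLASP's actual code or hyperparameters.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _mean_cross_entropy(logits: List[Vector]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(seq_emb: List[Vector], text_emb: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of paired (sequence, text) embeddings.

    Pair i is (seq_emb[i], text_emb[i]); every other pairing in the batch is a negative.
    The temperature default is an assumption borrowed from CLIP, not from CLASP.
    """
    if len(seq_emb) != len(text_emb):
        raise ValueError("batch sizes must match")
    seqs = [l2_normalize(v) for v in seq_emb]
    texts = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine similarity of sequence i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(s, t)) / temperature for t in texts]
              for s in seqs]
    # Sequence -> text direction scores rows; text -> sequence direction scores columns.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(transposed))
```

Swapping the descriptions in a two-item batch raises the loss sharply; that gap is the training signal that pulls matched sequence/text pairs together and pushes mismatched ones apart.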

Eval Harness

Github Repo: https://github.com/EleutherAI/lm-evaluation-harness

GPT-NeoX

GPT-NeoX is an implementation of 3D-parallel GPT-3-like autoregressive language models for distributed GPUs, based upon Megatron-LM and DeepSpeed. GPT-NeoX was used to train GPT-NeoX-20B, a 20 billion parameter language model, in collaboration with CoreWeave. Announced on February 2, 2022, and released on The Eye alongside a preliminary technical report one week later, it became the largest dense autoregressive language model ever made freely available to the public at the time of its release....
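In 3D parallelism, the GPUs are split along three axes at once: tensor (intra-layer) parallelism, pipeline (inter-layer) parallelism, and data parallelism, so the world size must equal the product of the three degrees. This sketch shows that arithmetic and one possible mapping from a global rank to its coordinates. The function names and the rank ordering (tensor-parallel index varying fastest, then pipeline, then data) are illustrative assumptions, not GPT-NeoX configuration keys or the exact layout Megatron-LM and DeepSpeed use.

```python
from typing import Tuple


def data_parallel_degree(world_size: int, tensor_parallel: int,
                         pipeline_parallel: int) -> int:
    """Return how many model replicas fit when each replica spans tp * pp GPUs."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by tensor_parallel * pipeline_parallel")
    return world_size // model_parallel


def rank_coordinates(rank: int, world_size: int, tensor_parallel: int,
                     pipeline_parallel: int) -> Tuple[int, int, int]:
    """Map a global rank to (data, pipeline, tensor) indices.

    Hypothetical layout: the tensor index varies fastest, then pipeline, then data.
    """
    data_parallel_degree(world_size, tensor_parallel, pipeline_parallel)  # validates the shape
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    tensor_rank = rank % tensor_parallel
    pipeline_rank = (rank // tensor_parallel) % pipeline_parallel
    data_rank = rank // (tensor_parallel * pipeline_parallel)
    return data_rank, pipeline_rank, tensor_rank
```

With 16 GPUs, a tensor-parallel degree of 2 and a pipeline-parallel degree of 4 leave 2 data-parallel replicas. Placing tensor-parallel partners on adjacent ranks is a common choice because tensor parallelism communicates most frequently and benefits from sitting within a single node.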