Eliciting Latent Knowledge
Stella Biderman

As models become more capable, humans will not always be able to independently verify whether a model's claims are true or false. We aim to circumvent this problem by directly eliciting latent knowledge (ELK) from the model's internal activations.
