ARENA Capstone: Hyperparameter tuning for MELBO

October 5, 2024

Co-authored with Aaron Kaufman for the ARENA capstone: we replicated MELBO (Mechanistically Eliciting Latent Behaviors) on Llama-3.2-1B-Instruct and ran a hyperparameter sweep over the R value, evaluating the results with diversity and coherence metrics.

→ Read on LessWrong

GitHub repo.