ARENA Capstone: Hyperparameter tuning for MELBO

Co-authored with Aaron Kaufman for the ARENA capstone. We replicated MELBO (Mechanistically Eliciting Latent Behaviors) on Llama-3.2-1B-Instruct and ran a hyperparameter sweep over R (the norm of the learned steering vector), scoring each setting with diversity and coherence metrics.

Read on LessWrong

GitHub repo.