Posts tagged "mechanistic-interpretability" — Connor

$ grep -l "#mechanistic-interpretability" notes/ — 1 match

I removed a learned behavior from a tiny AI model. Adding it to another model was harder.
May 13, 2026 · 12 min
A small mechanistic-interpretability result: subtracting one internal direction mostly removed a learned behavior, but adding matching directions to a separately trained model did not install it.