A curated collection of academic papers and articles that influence my research and thinking across various domains.

Michelle S. Lam, Janice Teoh, James Landay, Jeffrey Heer, Michael S. Bernstein
This paper defines LLM operations for extracting concepts from large amounts of unstructured text, useful for social science inquiry.

Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner
A clean and simple method for steering model behavior using discovered representations specified by positive and negative samples. I feel this has strong potential for supporting metaphor- and sense-making in the HCI space.
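To fix the core recipe in my head, here's a minimal sketch of contrastive activation steering with toy random tensors standing in for a real model's residual-stream activations; the function names are mine, not the paper's.

```python
# Sketch of contrastive activation steering: take the mean difference between
# activations on behavior-positive and behavior-negative prompts, then add
# that direction back into the hidden states at inference time.
import torch

def steering_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Mean difference between activations on positive vs. negative examples."""
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def steer(hidden: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    """Add the steering direction (scaled by alpha) to every token's hidden state."""
    return hidden + alpha * direction

# Toy demonstration: random "activations" in place of a real layer's outputs.
d_model = 16
pos = torch.randn(8, d_model) + 1.0   # activations on behavior-positive prompts
neg = torch.randn(8, d_model) - 1.0   # activations on behavior-negative prompts
v = steering_vector(pos, neg)

hidden_states = torch.randn(5, d_model)       # one sequence, 5 tokens
steered = steer(hidden_states, v, alpha=1.5)  # nudged toward the target behavior
print(steered.shape)
```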

Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu
Creates a DSL and program optimizer for applying LLMs to large, complex document-processing tasks. Provides a more structured and visible way to interact with LLMs over large text corpora.

Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
By probing with sparsity constraints, we can identify not only whether model activations represent a given feature but also whether specific neurons encode it.
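A rough sketch of the sparse-probing idea on synthetic data (not the authors' code): an L1-penalized probe trained on activations ends up with nonzero weights only on the "neurons" that actually carry the feature.

```python
# Sparse probing sketch: synthetic activations where the feature lives in
# neurons 7 and 42; the L1 penalty should zero out most other coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 128
acts = rng.normal(size=(n, d))                              # stand-in for model activations
labels = (acts[:, 7] + 0.5 * acts[:, 42] > 0).astype(int)   # feature encoded by two neurons

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(acts, labels)

nonzero = np.flatnonzero(probe.coef_[0])
print("neurons selected by the sparse probe:", nonzero)     # ideally {7, 42}
```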

Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, Noah D. Goodman
Identifies four cognitive behaviors (verification, backtracking, subgoal setting, backward chaining) that predict whether a model can self-improve via RL. The key finding is striking: it's the presence of reasoning behaviors, not answer correctness, that matters. Models exposed to training data with proper reasoning patterns -- even incorrect answers -- matched the improvement of models that had these behaviors naturally. A useful framing for thinking about what 'reasoning' actually is in these systems.

Trenton Bricken*, Adly Templeton*, Joshua Batson*, Brian Chen*, Adam Jermyn*, et al.
Really incredible work on discovering and visualizing feature decompositions of neuron layers with sparse autoencoders. Gorgeous visualizations and interfaces, and thoughtful reflections on interpretability methodology. I am a big fan of this publication style.
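For my own reference, a bare-bones sparse autoencoder in the spirit of the paper: toy dimensions and random data in place of real MLP activations, and a hypothetical training loop rather than Anthropic's implementation.

```python
# Minimal sparse autoencoder: reconstruct activations through an overcomplete,
# non-negative feature layer, with an L1 penalty encouraging sparse features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder(d_model=64, d_hidden=512)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

acts = torch.randn(4096, 64)                     # stand-in for collected MLP activations
for step in range(100):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```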

Super interesting and illuminating perspective arguing that supposedly deep-learning-unique phenomena like deep double descent and overparametrization can be explained using soft inductive biases and existing generalization frameworks. The references are a treasure trove!

Henry Farrell, Alison Gopnik, Cosma Shalizi, James Evans
Argues for understanding LLMs as technologies that reconstitute human knowledge in efficient and widely distributed ways, in a lineage of other such instruments, including markets and communication media.

Mitchell L. Gordon, Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeffrey T. Hancock, Tatsunori Hashimoto, Michael S. Bernstein
By modeling individual annotators' views rather than a single aggregated 'view', we can explicitly define whose voices are 'heard' in a decision and consider counterfactuals.
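A toy sketch of that idea (hypothetical data and annotator models, not the paper's architecture): fit one model per annotator, then compose an explicit jury instead of predicting a single majority label.

```python
# Jury-style aggregation sketch: per-annotator classifiers, then a majority
# vote over whichever jurors we choose to include.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_items, d, n_annotators = 500, 10, 5
X = rng.normal(size=(n_items, d))
weights = rng.normal(size=(n_annotators, d))     # each annotator's synthetic decision boundary
labels = (X @ weights.T > 0).astype(int)         # labels[i, a] = annotator a's judgment on item i

# One model per annotator rather than one model of the "average" view.
models = [LogisticRegression().fit(X, labels[:, a]) for a in range(n_annotators)]

def jury_decision(x, jury):
    """Majority vote of the predicted judgments for the selected jurors."""
    votes = [models[a].predict(x.reshape(1, -1))[0] for a in jury]
    return int(np.mean(votes) >= 0.5)

x_new = rng.normal(size=d)
print("jury {0, 2, 4}:", jury_decision(x_new, [0, 2, 4]))
print("jury {1, 3}:   ", jury_decision(x_new, [1, 3]))
```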