Influential Papers

A curated collection of academic papers and articles that influence my research and thinking across various domains.

Reflections on Qualitative Research
Interpretability

Chris Olah, Adam Jermyn

Interesting thoughts on what kind of methodology suits interpretability as a growing, immature field.

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM
AI Tools for Human Knowledge

Michelle S. Lam, Janice Teoh, James Landay, Jeffrey Heer, Michael S. Bernstein

This paper defines LLM operations for extracting high-level concepts from large amounts of unstructured text, useful for social science inquiry.

Steering Llama 2 via Contrastive Activation Addition
Interpretability

Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner

A clean and simple method for steering model behavior along directions discovered from contrastive positive and negative samples. I feel this has strong potential for supporting metaphor- and sense-making work in the HCI space.
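
A minimal sketch of the recipe as I read it, assuming a Llama-style Hugging Face model (the `model.model.layers` path, layer index, and scale are illustrative assumptions, not the paper's exact code): average the residual-stream difference between contrastive prompt sets, then add that vector back in during generation.

```python
import torch

def contrastive_steering_vector(model, tokenizer, pos_prompts, neg_prompts, layer):
    """Mean difference of residual-stream activations (last token) between the
    positive and negative prompt sets at one decoder layer."""
    captured = {"pos": [], "neg": []}

    def make_hook(bucket):
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            bucket.append(hidden[:, -1, :].detach())
        return hook

    for name, prompts in (("pos", pos_prompts), ("neg", neg_prompts)):
        handle = model.model.layers[layer].register_forward_hook(make_hook(captured[name]))
        for prompt in prompts:
            batch = tokenizer(prompt, return_tensors="pt").to(model.device)
            with torch.no_grad():
                model(**batch)
        handle.remove()

    return torch.cat(captured["pos"]).mean(0) - torch.cat(captured["neg"]).mean(0)

def steer(model, layer, direction, scale=1.0):
    """Add the steering vector to the residual stream at `layer` during generation;
    returns the hook handle so steering can be removed afterwards."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * direction.to(hidden.device, hidden.dtype)
        return (steered,) + tuple(output[1:]) if isinstance(output, tuple) else steered
    return model.model.layers[layer].register_forward_hook(hook)
```

The paper builds its pairs from multiple-choice contrasts and sweeps layers and multipliers; the sketch above only captures the "difference of means, then add" shape of the method.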

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Concept-structured AI

Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu

Creates a DSL and program optimizer to apply LLMs to large/complex document processing. Provides a more structured and visible way to interact with LLMs on large text corpora.

HCI for AGI
Human-AI Interaction

Meredith Ringel Morris

Useful outline of what HCI researchers can contribute to 'AGI'. Some may assume (or fear) that AGI will simply solve interaction problems on its own, but that is far from obvious.

Concept Bottleneck Models
Concept-structured AI

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

By learning explicitly defined concepts to bridge independent and dependent variables, we can better interpret model decisions and intervene on mistakes.
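
A minimal sketch of the bottleneck structure (module names and the sigmoid concept parameterization are my assumptions; the paper also compares independent, sequential, and joint training):

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """x -> predicted concepts -> label, so the intermediate concepts can be
    inspected, and corrected at test time to fix downstream mistakes."""
    def __init__(self, backbone: nn.Module, d_feat: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.backbone = backbone                      # any feature extractor
        self.concept_head = nn.Linear(d_feat, n_concepts)
        self.label_head = nn.Linear(n_concepts, n_classes)

    def forward(self, x, concept_override=None):
        concept_logits = self.concept_head(self.backbone(x))
        concepts = torch.sigmoid(concept_logits)
        if concept_override is not None:              # test-time intervention
            concepts = concept_override
        return concept_logits, self.label_head(concepts)
```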

We Can't Understand AI Using our Existing Vocabulary
Positions and Visions

John Hewitt, Robert Geirhos, Been Kim

A compelling articulation of what human-AI communication could look like. Proposes neologism learning.

Discovering Latent Knowledge in Language Models Without Supervision
Interpretability

Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt

A method to probe structure in language models without any notion of ground truth, relying instead on the consistency property of true statements.
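
The consistency idea boils down to a small unsupervised objective. A sketch, assuming activations for contrast pairs (a statement and its negation) have already been extracted and normalized, and that the probe is just a linear layer plus sigmoid:

```python
import torch
import torch.nn as nn

def ccs_loss(probe: nn.Module, pos_acts: torch.Tensor, neg_acts: torch.Tensor):
    """A statement and its negation should receive complementary probabilities
    (consistency), and the probe should not sit on the fence (confidence)."""
    p_pos = torch.sigmoid(probe(pos_acts)).squeeze(-1)   # P(true | "X? Yes")
    p_neg = torch.sigmoid(probe(neg_acts)).squeeze(-1)   # P(true | "X? No")
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# probe = nn.Linear(hidden_dim, 1), trained with this loss alone -- no labels needed.
```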

Finding Neurons in a Haystack: Case Studies with Sparse Probing
Interpretability

Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas

By probing with sparsity constraints, we can identify not only if model activations represent some feature but whether specific neurons encode certain features.
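
A simplified stand-in for the idea (the paper ranks neurons and fits k-sparse probes for varying k; here an L1 penalty does the selection, which is an assumption on my part):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_probe(activations: np.ndarray, labels: np.ndarray, strength: float = 0.05):
    """Fit an L1-regularized probe so only a few neurons keep nonzero weight; the
    surviving coordinates are candidate neurons that encode the feature."""
    probe = LogisticRegression(penalty="l1", solver="liblinear", C=strength)
    probe.fit(activations, labels)             # activations: (n_samples, n_neurons)
    weights = probe.coef_.ravel()
    selected = np.flatnonzero(weights)
    return probe, selected, weights[selected]
```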

Cognitive Behaviors that Enable Self-Improving Reasoners
Representation Learning

Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, Noah D. Goodman

Identifies four cognitive behaviors (verification, backtracking, subgoal setting, backward chaining) that predict whether a model can self-improve via RL. The key finding is striking: it's the presence of reasoning behaviors, not answer correctness, that matters. Models exposed to training data with proper reasoning patterns -- even when the answers were incorrect -- matched the improvement of models that exhibited these behaviors naturally. A useful framing for thinking about what 'reasoning' actually is in these systems.

Fish Eye for Text
Human-AI Interaction

Amelia Wattenberger

A beautifully designed webpage that illustrates a 'fisheye' principle for building knowledge interfaces that expose users to the 'peripheral' context.

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Interpretability

Trenton Bricken*, Adly Templeton*, Joshua Batson*, Brian Chen*, Adam Jermyn*, et al.

Really incredible work on discovering and visualizing feature decompositions of neuron layers with sparse autoencoders. Gorgeous visualizations and interfaces, and thoughtful reflections on interpretability methodology. I am a big fan of this publication style.
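
For reference, the core object is small. A bare-bones sketch of a sparse autoencoder over model activations, which omits details the paper cares about (decoder-norm constraints, pre-encoder bias, dead-feature resampling):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder whose L1-penalized, non-negative codes tend to become
    individually interpretable 'features' of the activations."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))
        return self.decoder(features), features

def sae_loss(recon, acts, features, l1_coeff=1e-3):
    reconstruction = (recon - acts).pow(2).sum(-1).mean()
    sparsity = features.abs().sum(-1).mean()
    return reconstruction + l1_coeff * sparsity
```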

Deep Learning is Not So Mysterious or Different
Representation Learning

Andrew Gordon Wilson

Super interesting and illuminating perspective arguing that supposedly deep-learning-unique phenomena like deep double descent, overparametrization, etc. can be explained using soft inductive biases and existing generalization frameworks. The references are a treasure trove!

Large AI models are cultural and social technologies
Positions and Visions

Henry Farrell, Alison Gopnik, Cosma Shalizi, James Evans

Argues for understanding LLMs as technologies that reconstitute human knowledge in efficient and widely distributed ways, in a lineage of other such instruments, including markets and communication media.

Backpack Language Models
Concept-structured AI

John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang

By creating an LM architecture in which input tokens have a direct log-linear effect on the output, we can intervene precisely on the model output.
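
A shape-level sketch of that log-linear structure (tensor shapes and names are illustrative; in the paper the weights come from a transformer contextualization network and the sense vectors are tied to the embeddings):

```python
import torch
import torch.nn as nn

class BackpackOutput(nn.Module):
    """Each vocabulary item owns k 'sense vectors'; the prediction at a position is a
    weighted sum of the context tokens' sense vectors, so each token's contribution
    to the logits is log-linear and can be edited directly."""
    def __init__(self, vocab_size: int, d_model: int, n_senses: int):
        super().__init__()
        self.senses = nn.Embedding(vocab_size, n_senses * d_model)
        self.n_senses, self.d_model = n_senses, d_model

    def forward(self, input_ids, sense_weights, unembed):
        # input_ids: (batch, seq); sense_weights: (batch, seq, seq, n_senses),
        # produced by a contextualization network; unembed: (d_model, vocab_size).
        b, t = input_ids.shape
        sense_vecs = self.senses(input_ids).view(b, t, self.n_senses, self.d_model)
        mixed = torch.einsum("bijk,bjkd->bid", sense_weights, sense_vecs)
        return mixed @ unembed
```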

Jury Learning: Integrating Dissenting Voices into Machine Learning Models
Concept-structured AI

Mitchell L. Gordon, Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeffrey T. Hancock, Tatsunori Hashimoto, Michael S. Bernstein

By modeling individual views rather than an aggregated 'view', we can explicitly define the voices 'heard' in making a decision and consider counterfactuals.
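
A toy sketch of the decision step (here `predict_annotator` stands in for the paper's annotator-conditioned model, which is a deep recommender there; the group shares and jury size are made-up illustrations):

```python
import random
import statistics

def jury_decision(predict_annotator, example, annotator_pool, jury_spec, n=12, seed=0):
    """Sample a jury with an explicitly chosen composition, predict each juror's
    individual label, and aggregate -- rather than modeling one averaged 'view'.

    annotator_pool: {group: [annotator ids]}; jury_spec: {group: share of seats}
    predict_annotator(example, annotator_id) -> that annotator's predicted rating
    """
    rng = random.Random(seed)
    jurors = []
    for group, share in jury_spec.items():
        jurors += rng.sample(annotator_pool[group], k=round(share * n))
    votes = [predict_annotator(example, j) for j in jurors]
    return statistics.median(votes), votes   # keeping the votes makes dissent visible
```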

Scaling and evaluating sparse autoencoders
Interpretability

Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, et al.

Really nice technical detail and practical knowledge on training and understanding large sparse autoencoders.

The Economics of Maps
Miscellaneous

Abhishek Nagaraj, Scott Stern

An interesting analysis of the economics of who produces and uses maps.
