Been Kim

ORCID: 0000-0001-9938-2915

Affiliations:
  • Google, USA
  • AI2, Allen Institute for Artificial Intelligence, Seattle, USA


According to our database, Been Kim authored at least 66 papers between 2010 and 2024.

Bibliography

2024
Don't trust your eyes: on the (un)reliability of feature visualizations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
TabCBM: Concept-based Interpretable Neural Networks for Tabular Data.
Trans. Mach. Learn. Res., 2023

Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero.
CoRR, 2023

Getting aligned on representational alignment.
CoRR, 2023

Model evaluation for extreme risks.
CoRR, 2023

Gaussian Process Probes (GPP) for Uncertainty-Aware Probing.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Subgoal-Based Explanations for Unreliable Intelligent Decision Support Systems.
Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023

On the Relationship Between Explanation and Prediction: A Causal View.
Proceedings of the International Conference on Machine Learning, 2023

2022
Impossibility Theorems for Feature Attribution.
CoRR, 2022

Human-Centered Concept Explanations for Neural Networks.
CoRR, 2022

Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Human-Centered Concept Explanations for Neural Networks.
Proceedings of the Neuro-Symbolic Artificial Intelligence: The State of the Art, 2021

Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments.
Pattern Recognit., 2021

Analyzing a Caching Model.
CoRR, 2021

Acquisition of Chess Knowledge in AlphaZero.
CoRR, 2021

Best of both worlds: local and global explanations with human-understandable concepts.
CoRR, 2021

Machine Learning Techniques for Accountability.
AI Mag., 2021

2020
On Completeness-aware Concept-Based Explanations in Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Debugging Tests for Model Explanations.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Concept Bottleneck Models.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
The (Un)reliability of Saliency Methods.
Proceedings of the Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019

On Concept-Based Explanations in Deep Neural Networks.
CoRR, 2019

BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth.
CoRR, 2019

Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.
CoRR, 2019

Explaining Classifiers with Causal Concept Effect (CaCE).
CoRR, 2019

Do Neural Networks Show Gestalt Phenomena? An Exploration of the Law of Closure.
CoRR, 2019

Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks.
CoRR, 2019

An Evaluation of the Human-Interpretability of Explanation.
CoRR, 2019

Visualizing and Measuring the Geometry of BERT.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Benchmark for Interpretability Methods in Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Towards Automatic Concept-based Explanations.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Human Evaluation of Models Built for Interpretability.
Proceedings of the Seventh AAAI Conference on Human Computation and Crowdsourcing, 2019

Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making.
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

Interpreting Black Box Predictions using Fisher Kernels.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018).
CoRR, 2018

Evaluating Feature Importance Estimates.
CoRR, 2018

xGEMs: Generating Examplars to Explain Black-Box Models.
CoRR, 2018

To Trust Or Not To Trust A Classifier.
CoRR, 2018

How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation.
CoRR, 2018

Human-in-the-Loop Interpretability Prior.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

To Trust Or Not To Trust A Classifier.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Sanity Checks for Saliency Maps.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV).
Proceedings of the 35th International Conference on Machine Learning, 2018

Learning how to explain neural networks: PatternNet and PatternAttribution.
Proceedings of the 6th International Conference on Learning Representations, 2018

Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
The (Un)reliability of saliency methods.
CoRR, 2017

Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017).
CoRR, 2017

SmoothGrad: removing noise by adding noise.
CoRR, 2017

A Roadmap for a Rigorous Science of Interpretability.
CoRR, 2017

QSAnglyzer: Visual Analytics for Prismatic Analysis of Question Answering System Evaluations.
Proceedings of the 12th IEEE Conference on Visual Analytics Science and Technology, 2017

2016
Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016).
CoRR, 2016

Examples are not enough, learn to criticize! Criticism for Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015
Inferring Team Task Plans from Human Meetings: A Generative Modeling Approach with Logic-Based Prior.
J. Artif. Intell. Res., 2015

Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Scalable and Interpretable Data Representation for High-Dimensional, Complex Data.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Learning about meetings.
Data Min. Knowl. Discov., 2014

The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

2013
Quantitative estimation of the strength of agreements in goal-oriented meetings.
Proceedings of the IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support, 2013

Machine Learning for Meeting Analysis.
Proceedings of the Late-Breaking Developments in the Field of Artificial Intelligence, 2013

Inferring Robot Task Plans from Human Team Meetings: A Generative Modeling Approach with Logic-Based Prior.
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012
Human-Inspired Techniques for Human-Machine Team Planning.
Proceedings of the Human Control of Bioinspired Swarms, 2012

2010
Multiple relative pose graphs for robust cooperative mapping.
Proceedings of the IEEE International Conference on Robotics and Automation, 2010
