Tatsunori B. Hashimoto

ORCID: 0000-0003-0521-5855

Affiliations:
  • Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
  • Harvard University, Department of Statistics


According to our database, Tatsunori B. Hashimoto authored at least 118 papers between 2005 and 2024.

Bibliography

2024
Robust Distortion-free Watermarks for Language Models.
Trans. Mach. Learn. Res., 2024

A Survey on Data Selection for Language Models.
Trans. Mach. Learn. Res., 2024

Benchmarking Large Language Models for News Summarization.
Trans. Assoc. Comput. Linguistics, 2024

Graph-based Uncertainty Metrics for Long-form Language Model Outputs.
CoRR, 2024

Locality Alignment Improves Vision-Language Models.
CoRR, 2024

Synthetic continued pretraining.
CoRR, 2024

Improving Pretraining Data Using Perplexity Correlations.
CoRR, 2024

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers.
CoRR, 2024

The Future of Open Human Feedback.
CoRR, 2024

AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models.
CoRR, 2024

Learning to (Learn at Test Time): RNNs with Expressive Hidden States.
CoRR, 2024

Observational Scaling Laws and the Predictability of Language Model Performance.
CoRR, 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators.
CoRR, 2024

Linguistic Calibration of Language Models.
CoRR, 2024

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution.
CoRR, 2024

Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.
Proceedings of the IEEE Security and Privacy, 2024

Removing RLHF Protections in GPT-4 via Fine-Tuning.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Trustless Audits without Revealing Data or Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Language Models with Conformal Factuality Guarantees.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Understanding Finetuning for Factual Knowledge Extraction.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Scaling Laws for the Value of Individual Data Points in Machine Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Linguistic Calibration of Long-Form Generations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Identifying the Risks of LM Agents with an LM-Emulated Sandbox.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Proving Test Set Contamination in Black-Box Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Benchmarking and Improving Generator-Validator Consistency of Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

On the Learnability of Watermarks for Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

On the Fairness ROAD: Robust Optimization for Adversarial Debiasing.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Distributionally Robust Losses for Latent Covariate Mixtures.
Oper. Res., March, 2023

Holistic Evaluation of Language Models.
Trans. Mach. Learn. Res., 2023

Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification.
Trans. Mach. Learn. Res., 2023

Accelerating Aggregation Queries on Unstructured Streams of Data.
Proc. VLDB Endow., 2023

Foundation Models and Fair Use.
J. Mach. Learn. Res., 2023

Identifying and Mitigating the Security Risks of Generative AI.
Found. Trends Priv. Secur., 2023

Benchmarking Multi-Domain Active Learning on Image Classification.
CoRR, 2023

Learning to (Learn at Test Time).
CoRR, 2023

Identifying and Mitigating the Security Risks of Generative AI.
CoRR, 2023

Where's the Liability in Harmful AI Speech?
CoRR, 2023

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback.
CoRR, 2023

Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models.
CoRR, 2023

MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Likelihood-Based Diffusion Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Congestion Control Safety via Comparative Statics.
Proceedings of the IEEE INFOCOM 2023, 2023

Coder Reviewer Reranking for Code Generation.
Proceedings of the International Conference on Machine Learning, 2023

Data Feedback Loops: Model-driven Amplification of Dataset Biases.
Proceedings of the International Conference on Machine Learning, 2023

Whose Opinions Do Language Models Reflect?
Proceedings of the International Conference on Machine Learning, 2023

Out-of-Domain Robustness via Targeted Augmentations.
Proceedings of the International Conference on Machine Learning, 2023

Evaluating Self-Supervised Learning via Risk Decomposition.
Proceedings of the International Conference on Machine Learning, 2023

Is a Caption Worth a Thousand Images? A Study on Representation Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale.
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023

Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

TempLM: Distilling Language Models into Template-Based Generators.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Privacy-Preserving Domain Adaptation of Semantic Parsers.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Contrastive Decoding: Open-ended Text Generation as Optimization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Contrastive Error Attribution for Finetuned Language Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Emergent Abilities of Large Language Models.
Trans. Mach. Learn. Res., 2022

Tracing and Removing Data Errors in Natural Language Generation Datasets.
CoRR, 2022

ZK-IMG: Attested Images via Zero-Knowledge Proofs to Fight Disinformation.
CoRR, 2022

Scaling up Trustless DNN Inference with Zero-Knowledge Proofs.
CoRR, 2022

A Closer Look at the Calibration of Differentially Private Learners.
CoRR, 2022

Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning.
CoRR, 2022

TempLM: Distilling Language Models into Template-Based Generators.
CoRR, 2022

TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Factored DRO: Factored Distributionally Robust Policies for Contextual Bandits.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Diffusion-LM Improves Controllable Text Generation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

When Does Differentially Private Learning Not Suffer in High Dimensions?
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Improving Self-Supervised Learning by Characterizing Idealized Representations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Identifiability Conditions for Domain Adaptation.
Proceedings of the International Conference on Machine Learning, 2022

Language modeling via stochastic processes.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Is Importance Weighting Incompatible with Interpolating Classifiers?
Proceedings of the Tenth International Conference on Learning Representations, 2022

Extending the WILDS Benchmark for Unsupervised Adaptation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Distributionally Robust Models with Parametric Likelihood Ratios.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Large Language Models Can Be Strong Differentially Private Learners.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Jury Learning: Integrating Dissenting Voices into Machine Learning Models.
Proceedings of the CHI '22: CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April 2022, 2022

Spurious Correlations in Reference-Free Evaluation of Text Generation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Accelerating Approximate Aggregation Queries with Expensive Predicates.
Proc. VLDB Endow., 2021

On the Opportunities and Risks of Foundation Models.
CoRR, 2021

Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates.
CoRR, 2021

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics.
CoRR, 2021

On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

DReCa: A General Task Augmentation Strategy for Few-Shot Natural Language Inference.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Model Performance Scaling with Multiple Data Sources.
Proceedings of the 38th International Conference on Machine Learning, 2021

Modeling the Second Player in Distributionally Robust Optimization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Don't Hate the Player, Hate the Game: Safety and Utility in Multi-Agent Congestion Control.
Proceedings of the HotNets '21: The 20th ACM Workshop on Hot Topics in Networks, 2021

The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics In Line With Reality.
Proceedings of the CHI '21: CHI Conference on Human Factors in Computing Systems, 2021

Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Approximate Selection with Guarantees using Proxies.
Proc. VLDB Endow., 2020

Task-agnostic Indexes for Deep Learning-based Queries over Unstructured Data.
CoRR, 2020

Robustness to Spurious Correlations via Human Annotations.
Proceedings of the 37th International Conference on Machine Learning, 2020

Distributionally Robust Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

Improved Natural Language Generation via Loss Truncation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization.
CoRR, 2019

Learning Autocomplete Systems as a Communication Game.
CoRR, 2019

Unifying Human and Statistical Evaluation for Natural Language Generation.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Distributionally Robust Language Modeling.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Inferring Multidimensional Rates of Aging from Cross-Sectional Data.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Generating Sentences by Editing Prototypes.
Trans. Assoc. Comput. Linguistics, 2018

Inferring Multi-Dimensional Rates of Aging from Cross-Sectional Data.
CoRR, 2018

A Retrieve-and-Edit Framework for Predicting Structured Outputs.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Fairness Without Demographics in Repeated Loss Minimization.
Proceedings of the 35th International Conference on Machine Learning, 2018

Derivative Free Optimization Via Repeated Classification.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Unsupervised Transformation Learning via Convex Relaxations.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016
Continuous representations and models from random walk diffusion limits.
PhD thesis, 2016

Word Embeddings as Metric Recovery in Semantic Spaces.
Trans. Assoc. Comput. Linguistics, 2016

GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding.
Bioinform., 2016

Learning Population-Level Diffusions with Generative RNNs.
Proceedings of the 33rd International Conference on Machine Learning, 2016

2015
Word, graph and manifold embedding from Markov processes.
CoRR, 2015

From random walks to distances on unweighted graphs.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Metric recovery from directed unweighted graphs.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
Universal Count Correction for High-Throughput Sequencing.
PLoS Comput. Biol., 2014

2012
Lineage-based identification of cellular states and expression programs.
Bioinform., 2012

2011
Tree preserving embedding.
Proceedings of the 28th International Conference on Machine Learning, 2011

2009
Superconducting Narrowband Filter for Receiver of Weather Radar.
IEICE Trans. Electron., 2009

BFL: a node and edge betweenness based fast layout algorithm for large scale networks.
BMC Bioinform., 2009

2005
Electrically Tunable Superconducting Microstrip Line Band-Pass Filter for Mobile Applications.
IEICE Trans. Electron., 2005

