Sanjiv Kumar

According to our database, Sanjiv Kumar authored at least 184 papers between 1999 and 2024.

Bibliography

2024
What do larger image classifiers memorise?
Trans. Mach. Learn. Res., 2024

On the Role of Depth and Looping for In-Context Learning with Task Diversity.
CoRR, 2024

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization.
CoRR, 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs.
CoRR, 2024

No more hard prompts: SoftSRV prompting for synthetic data generation.
CoRR, 2024

Mimetic Initialization Helps State Space Models Learn to Recall.
CoRR, 2024

On the Inductive Bias of Stacking Towards Improving Reasoning.
CoRR, 2024

Efficient Document Ranking with Learnable Late Interactions.
CoRR, 2024

Landscape-Aware Growing: The Power of a Little LAG.
CoRR, 2024

Faster Cascades via Speculative Decoding.
CoRR, 2024

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts.
CoRR, 2024

Metric-aware LLM inference.
CoRR, 2024

HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference.
CoRR, 2024

Efficient Stagewise Pretraining via Progressive Subnetworks.
CoRR, 2024

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection.
CoRR, 2024

Tandem Transformers for Inference Efficient LLMs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Proceedings of the Forty-first International Conference on Machine Learning, 2024

DistillSpec: Improving Speculative Decoding via Knowledge Distillation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Two-stage LLM Fine-tuning with Less Specialization and More Generalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Plugin estimators for selective classification with out-of-distribution detection.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learning to Reject Meets Long-tail Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Functional Interpolation for Relative Positions improves Long Context Transformers.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Language Model Cascades: Token-Level Uncertainty And Beyond.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Think before you speak: Training Language Models With Pause Tokens.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

On Bias-Variance Alignment in Deep Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Regression Aware Inference with LLMs.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Rethinking FID: Towards a Better Evaluation Metric for Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MarkovGen: Structured Prediction for Efficient Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
A Weighted K-Center Algorithm for Data Subset Selection.
CoRR, 2023

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent.
CoRR, 2023

It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models.
CoRR, 2023

SPEGTI: Structured Prediction for Efficient Generative Text-to-Image Models.
CoRR, 2023

Depth Dependence of μP Learning Rates in ReLU MLPs.
CoRR, 2023

Learning to reject meets OOD detection: Are all abstentions created equal?
CoRR, 2023

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval.
CoRR, 2023

ResMem: Learn what you can and memorize the rest.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SOAR: Improved Indexing for Approximate Nearest Neighbor Search.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On student-teacher deviations in distillation: does it pay to disobey?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

When Does Confidence-Based Cascade Deferral Suffice?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Framework for Developing the Next Generation Interactive Soil Moisture Forecasting System Using the Long-Short Term Memory Model.
Proceedings of the International Conference on Machine Learning and Applications, 2023

Efficient Training of Language Models using Few-Shot Learning.
Proceedings of the International Conference on Machine Learning, 2023

Teacher Guided Training: An Efficient Framework for Knowledge Transfer.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Automating Nearest Neighbor Search Configuration with Constrained Optimization.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Serving Graph Compression for Graph Neural Networks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Supervision Complexity and its Role in Knowledge Distillation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Leveraging Importance Weights in Subset Selection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Large Language Models with Controllable Working Memory.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Teacher's pet: understanding and mitigating biases in distillation.
Trans. Mach. Learn. Res., 2022

Preserving In-Context Learning ability in Large Language Model Fine-tuning.
CoRR, 2022

When does mixup promote local linearity in learned representations?
CoRR, 2022

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers.
CoRR, 2022

ELM: Embedding and Logit Margins for Long-Tail Learning.
CoRR, 2022

Predicting on the Edge: Identifying Where a Larger Model Does Better.
CoRR, 2022

Post-hoc estimators for learning to defer to an expert.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Decoupled Context Processing for Context Augmented Language Modeling.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

In defense of dual-encoders for neural ranking.
Proceedings of the International Conference on Machine Learning, 2022

Robust Training of Neural Networks Using Scale Invariant Architectures.
Proceedings of the International Conference on Machine Learning, 2022

Radio over FSO System for 5G Wireless Communication.
Proceedings of the 13th International Conference on Computing Communication and Networking Technologies, 2022

2021
Routing Protocols in Delay Tolerant Networks: Comparative and Empirical Analysis.
Wirel. Pers. Commun., 2021

When in Doubt, Summon the Titans: Efficient Inference with Large Models.
CoRR, 2021

Leveraging redundancy in attention with Reuse Transformers.
CoRR, 2021

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation.
CoRR, 2021

Scaling Hierarchical Agglomerative Clustering to Billion-sized Datasets.
CoRR, 2021

Balancing Robustness and Sensitivity using Feature Contrastive Learning.
CoRR, 2021

Balancing Constraints and Submodularity in Data Subset Selection.
CoRR, 2021

On the Reproducibility of Neural Network Predictions.
CoRR, 2021

Efficient Training of Retrieval Models using Negative Cache.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Batch Active Learning at Scale.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces.
Proceedings of the 38th International Conference on Machine Learning, 2021

A statistical perspective on distillation.
Proceedings of the 38th International Conference on Machine Learning, 2021

Coping with Label Shift via Distributionally Robust Optimisation.
Proceedings of the 9th International Conference on Learning Representations, 2021

Adaptive Federated Optimization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Overparameterisation and worst-case generalisation: friend or foe?
Proceedings of the 9th International Conference on Learning Representations, 2021

Long-tail learning via logit adjustment.
Proceedings of the 9th International Conference on Learning Representations, 2021

Evaluations and Methods for Explanation through Robustness Analysis.
Proceedings of the 9th International Conference on Learning Representations, 2021

RankDistil: Knowledge Distillation for Ranking.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Kernelized Classification in Deep Networks.
CoRR, 2020

Modifying Memories in Transformer Models.
CoRR, 2020

Why distillation helps: a statistical perspective.
CoRR, 2020

Doubly-stochastic mining for heterogeneous retrieval.
CoRR, 2020

Why are Adaptive Methods Good for Attention Models?
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning discrete distributions: user vs item-level privacy.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Multi-Stage Influence Function.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Robust large-margin learning in hyperbolic space.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Federated Learning with Only Positive Labels.
Proceedings of the 37th International Conference on Machine Learning, 2020

Does label smoothing mitigate label noise?
Proceedings of the 37th International Conference on Machine Learning, 2020

Accelerating Large-Scale Inference with Anisotropic Vector Quantization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Low-Rank Bottleneck in Multi-head Attention Models.
Proceedings of the 37th International Conference on Machine Learning, 2020

Are Transformers universal approximators of sequence-to-sequence functions?
Proceedings of the 8th International Conference on Learning Representations, 2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.
Proceedings of the 8th International Conference on Learning Representations, 2020

Learning to Learn by Zeroth-Order Oracle.
Proceedings of the 8th International Conference on Learning Representations, 2020

Can gradient clipping mitigate label noise?
Proceedings of the 8th International Conference on Learning Representations, 2020

Pre-training Tasks for Embedding-based Large-scale Retrieval.
Proceedings of the 8th International Conference on Learning Representations, 2020

Semantic Label Smoothing for Sequence to Sequence Problems.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Tight Analysis of Privacy and Utility Tradeoff in Approximate Differential Privacy.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Why ADAM Beats SGD for Attention Models.
CoRR, 2019

Online Hierarchical Clustering Approximations.
CoRR, 2019

New Loss Functions for Fast Maximum Inner Product Search.
CoRR, 2019

AdaCliP: Adaptive Clipping for Private SGD.
CoRR, 2019

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise.
CoRR, 2019

Local Orthogonal Decomposition for Maximum Inner Product Search.
CoRR, 2019

Efficient Inner Product Approximation in Hybrid Spaces.
CoRR, 2019

Sampled Softmax with Random Fourier Features.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling.
Proceedings of the 36th International Conference on Machine Learning, 2019

Escaping Saddle Points with Adaptive Gradient Methods.
Proceedings of the 36th International Conference on Machine Learning, 2019

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

Stochastic Negative Mining for Learning with Large Output Spaces.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Optimal Noise-Adding Mechanism in Additive Differential Privacy.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Learning Adaptive Random Features.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Truncated Laplacian Mechanism for Approximate Differential Privacy.
CoRR, 2018

The Sparse Recovery Autoencoder.
CoRR, 2018

Nonlinear Online Learning with Adaptive Nyström Approximation.
CoRR, 2018

Adaptive Methods for Nonconvex Optimization.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

cpSGD: Communication-efficient and differentially-private distributed SGD.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Loss Decomposition for Fast Learning in Large Output Spaces.
Proceedings of the 35th International Conference on Machine Learning, 2018

On the Convergence of Adam and Beyond.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
On Binary Embedding using Circulant Matrices.
J. Mach. Learn. Res., 2017

Now Playing: Continuous low-power music recognition.
CoRR, 2017

Efficient Natural Language Response Suggestion for Smart Reply.
CoRR, 2017

Multiscale Quantization for Fast Similarity Search.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Distributed Mean Estimation with Limited Communication.
Proceedings of the 34th International Conference on Machine Learning, 2017

Stochastic Generative Hashing.
Proceedings of the 34th International Conference on Machine Learning, 2017

Learning Spread-Out Local Feature Descriptors.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Fast Classification with Binary Prototypes.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Learning to Hash for Indexing Big Data - A Survey.
Proc. IEEE, 2016

Software-Defined Storage-Based Data Infrastructure Supportive of Hydroclimatology Simulation Containers: A Survey.
Data Sci. Eng., 2016

A Fast Approach to Solve Matrix Games with Payoffs of Trapezoidal Fuzzy Numbers.
Asia Pac. J. Oper. Res., 2016

Orthogonal Random Features.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Binary embeddings with structured hashed projections.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Quantization based Fast Inner Product Search.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015
Compact Nonlinear Maps and Circulant Extensions.
CoRR, 2015

Fast Online Clustering with Randomized Skeleton Sets.
CoRR, 2015

Fast Neural Networks with Circulant Projections.
CoRR, 2015

A Survey of Modern Questions and Challenges in Feature Extraction.
Proceedings of the 1st Workshop on Feature Extraction: Modern Questions and Challenges, 2015

Structured Transforms for Small-Footprint Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Spherical Random Features for Polynomial Kernels.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Fast Orthogonal Projection Based on Kronecker Product.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Exemplar-based large vocabulary speech recognition using k-nearest neighbors.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Fast Binary Embedding for High-Dimensional Data.
Proceedings of the Multimedia Data Mining and Analytics - Disruptive Innovation, 2015

2014
Discriminative Random Fields.
Computer Vision, A Reference Guide, 2014

On Learning with Label Proportions.
CoRR, 2014

Discrete Graph Hashing.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Circulant Binary Embedding.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
Large-scale SVD and manifold learning.
J. Mach. Learn. Res., 2013

\(\propto\)SVM for Learning with Label Proportions.
Proceedings of the 30th International Conference on Machine Learning, 2013

Learning Binary Codes for High-Dimensional Data Using Bilinear Projections.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
Semi-Supervised Hashing for Large-Scale Search.
IEEE Trans. Pattern Anal. Mach. Intell., 2012

Sampling Methods for the Nyström Method.
J. Mach. Learn. Res., 2012

Angular Quantization-based Binary Codes for Fast Similarity Search.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Compact Hyperplane Hashing with Bilinear Functions.
Proceedings of the 29th International Conference on Machine Learning, 2012

On the Difficulty of Nearest Neighbor Search.
Proceedings of the 29th International Conference on Machine Learning, 2012

2011
Hashing with Graphs.
Proceedings of the 28th International Conference on Machine Learning, 2011

2010
Discriminative Graphical Models for Context-Based Classification.
Proceedings of the Computer Vision: Detection, Recognition and Reconstruction, 2010

Baselines for Image Annotation.
Int. J. Comput. Vis., 2010

Sequential Projection Learning for Hashing with Compact Codes.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

YouTubeCat: Learning to categorize wild web videos.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Sampling Techniques for the Nystrom Method.
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009

Ensemble Nystrom Method.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

On sampling-based approximate spectral decomposition.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

2008
A New Baseline for Image Annotation.
Proceedings of the Computer Vision, 2008

Large-scale manifold learning.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Face tracking and recognition with visual constraints in real-world videos.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Classification of Weakly-Labeled Data with Partial Equivalence Relations.
Proceedings of the IEEE 11th International Conference on Computer Vision, 2007

2006
Discriminative Random Fields.
Int. J. Comput. Vis., 2006

2005
A Hierarchical Field Framework for Unified Context-Based Classification.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Exploiting Inference for Approximate Parameter Learning in Discriminative Fields: An Empirical Study.
Proceedings of the Energy Minimization Methods in Computer Vision and Pattern Recognition, 2005

Digital Tapestry.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

2004
Path planning with hallucinated worlds.
Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, September 28, 2004

2003
An observation-constrained generative approach for probabilistic classification of image regions.
Image Vis. Comput., 2003

Discriminative Fields for Modeling Spatial Dependencies in Natural Images.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification.
Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), 2003

Man-Made Structure Detection in Natural Images using a Causal Multiscale Random Field.
Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), 2003

2000
A Fully Autonomous Microrobotic Endoscopy System.
J. Intell. Robotic Syst., 2000

Design of a vision-guided microrobotic colonoscopy system.
Adv. Robotics, 2000

1999
A New Approach for Nonlinear Distortion Correction in Endoscopic Images Based on Least Squares Estimation.
IEEE Trans. Medical Imaging, 1999

A pipelined architecture for image segmentation by adaptive progressive thresholding.
Microprocess. Microsystems, 1999

