Saharon Rosset

ORCID: 0000-0002-4458-9545

According to our database, Saharon Rosset authored at least 79 papers between 1998 and 2024.

Integrating Random Effects in Variational Autoencoders for Dimensionality Reduction of Correlated Data.
CoRR, 2024

Integrating Random Effects in Deep Neural Networks.
J. Mach. Learn. Res., 2023

Mixed Semi-Supervised Generalized-Linear-Regression with applications to Deep learning.
CoRR, 2023

Tree-Based Models for Correlated Data.
J. Mach. Learn. Res., 2022

Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Innovation Representation of Stochastic Processes With Application to Causal Inference.
IEEE Trans. Inf. Theory, 2020

Semi-Supervised Empirical Risk Minimization: When can unlabeled data improve prediction.
CoRR, 2020

Maximum Likelihood for Gaussian Process Classification and Generalized Linear Mixed Models under Case-Control Sampling.
J. Mach. Learn. Res., 2019

Lossless Compression of Random Forests.
J. Comput. Sci. Technol., 2019

Surprises in High-Dimensional Ridgeless Least Squares Interpolation.
CoRR, 2019

Rescaling and other forms of unsupervised preprocessing introduce bias into cross-validation.
CoRR, 2019

Linear Independent Component Analysis Over Finite Fields: Algorithms and Bounds.
IEEE Trans. Signal Process., 2018

Resolution considerations in imaging of the cortical layers.
NeuroImage, 2018

Using Stochastic Approximation Techniques to Efficiently Construct Confidence Intervals for Heritability.
J. Comput. Biol., 2018

Lossless (and Lossy) Compression of Random Forests.
CoRR, 2018

Tensor Composition Analysis Detects Cell-Type Specific Associations in Epigenetic Studies.
Proceedings of the Research in Computational Molecular Biology, 2018

The Everlasting Database: Statistical Validity at a Fair Price.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Large Alphabet Source Coding Using Independent Component Analysis.
IEEE Trans. Inf. Theory, 2017

Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Association testing of bisulfite-sequencing methylation data via a Laplace approximation.
Bioinform., 2017

Generalized Independent Component Analysis Over Finite Alphabets.
IEEE Trans. Inf. Theory, 2016

Isotonic Modeling with Non-Differentiable Loss Functions with Application to Lasso Regularization.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

Binary independent component analysis: Theory, bounds and algorithms.
Proceedings of the 26th IEEE International Workshop on Machine Learning for Signal Processing, 2016

Compressing Random Forests.
Proceedings of the IEEE 16th International Conference on Data Mining, 2016

A Simple and Efficient Approach for Adaptive Entropy Coding over Large Alphabets.
Proceedings of the 2016 Data Compression Conference, 2016

Universal Compression of Memoryless Sources over Large Alphabets via Independent Component Analysis.
Proceedings of the 2015 Data Compression Conference, 2015

Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression.
J. Comput. Sci. Technol., 2014

Generalized binary independent component analysis.
Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, June 29, 2014

Memoryless representation of Markov processes.
Proceedings of the 2013 IEEE International Symposium on Information Theory, 2013

Leakage in data mining: Formulation, detection, and avoidance.
ACM Trans. Knowl. Discov. Data, 2012

Prediction-based regularization using data augmented regression.
Stat. Comput., 2012

Weighted pooling - practical and cost-effective techniques for pooled high-throughput sequencing.
Bioinform., 2012

lobSTR: A Short Tandem Repeat Profiler for Personal Genomes.
Proceedings of the Research in Computational Molecular Biology, 2012

Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach.
Proceedings of the 12th IEEE International Conference on Data Mining, 2012

The Quality Preserving Database: A Computational Framework for Encouraging Collaboration, Enhancing Power and Controlling False Discovery.
IEEE ACM Trans. Comput. Biol. Bioinform., 2011

Isotonic Recursive Partitioning.
CoRR, 2011

Accurate estimation of heritability in genome wide studies using random effects models.
Bioinform., 2011

Leakage in data mining: formulation, detection, and avoidance.
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011

A/B Testing at SweetIM: The Importance of Proper Statistical Analysis.
Proceedings of the Data Mining Workshops (ICDMW), 2011

Operations Research Improves Sales Force Productivity at IBM.
Interfaces, 2010

Medical data mining: insights from winning two competitions.
Data Min. Knowl. Discov., 2010

Maximum likelihood estimation of locus-specific mutation rates in Y-chromosome short tandem repeats.
Bioinform., 2010

Decomposing Isotonic Regression for Efficiently Solving Large Problems.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Modeling Quantiles.
Proceedings of the Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes), 2009

Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression.
J. Mach. Learn. Res., 2009

Grouped graphical Granger modeling for gene expression regulatory networks discovery.
Bioinform., 2009

Grouped graphical Granger modeling methods for temporal causal modeling.
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28, 2009

Breast cancer identification: KDD CUP winner's report.
SIGKDD Explor., 2008

Customer targeting models using actively-selected web content.
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008

Making the most of your data: KDD Cup 2007 "How Many Ratings" winner's report.
SIGKDD Explor., 2007

Ranking-based evaluation of regression models.
Knowl. Inf. Syst., 2007

Analytics-driven solutions for customer targeting and sales-force allocation.
IBM Syst. J., 2007

Efficient inference on known phylogenetic trees using Poisson regression.
Bioinform., 2007

Identifying Bundles of Product Options using Mutual Information Clustering.
Proceedings of the Seventh SIAM International Conference on Data Mining, 2007

High-quantile modeling for customer wallet estimation and other applications.
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007

Looking for Great Ideas: Analyzing the Innovation Jam.
Proceedings of the Advances in Web Mining and Web Usage Analysis, 2007

l1 Regularization in Infinite Dimensional Feature Spaces.
Proceedings of the Learning Theory, 20th Annual Conference on Learning Theory, 2007

Data-Enhanced Predictive Modeling for Sales Targeting.
Proceedings of the Sixth SIAM International Conference on Data Mining, 2006

Inferring Common Origins from mtDNA.
Proceedings of the Research in Computational Molecular Biology, 2006

A new multi-view regression approach with an application to customer wallet estimation.
Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006

Sparse, Flexible and Efficient Modeling using L1 Regularization.
Proceedings of the Feature Extraction - Foundations and Applications, 2006

Robust boosting and its relation to bagging.
Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005

ROC confidence bands: an empirical evaluation.
Proceedings of the Machine Learning, 2005

Boosting as a Regularized Path to a Maximum Margin Classifier.
J. Mach. Learn. Res., 2004

The Entire Regularization Path for the Support Vector Machine.
J. Mach. Learn. Res., 2004

A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Following Curved Regularized Optimization Solution Paths.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Model selection via the AUC.
Proceedings of the Machine Learning, 2004

Customer Lifetime Value Models for Decision Support.
Data Min. Knowl. Discov., 2003

1-norm Support Vector Machines.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Margin Maximizing Loss Functions.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Integrating Customer Value Considerations into Predictive Modeling.
Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003), 2003

Boosting and support vector machines as optimal separators.
Proceedings of the Document Recognition and Retrieval X, 2003

Boosting Density Estimation.
Proceedings of the Advances in Neural Information Processing Systems 15, 2002

Customer lifetime value modeling and its use for customer retention planning.
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002

Evaluation of prediction models for marketing campaigns.
Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001

KDD-Cup 99: Knowledge Discovery In a Charitable Organization's Donor Database.
SIGKDD Explor., 2000

Discovery of Fraud Rules for Telecommunications - Challenges and Solutions.
Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999

Ranking - Methods for Flexible Evaluation and Efficient Comparison of Classification Performance.
Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998
