Serguei V. S. Pakhomov

ORCID: 0000-0001-8113-4788

According to our database, Serguei V. S. Pakhomov authored at least 121 papers between 1999 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Coherence and comprehensibility: Large language models predict lay understanding of health-related content.
J. Biomed. Informatics, 2025

Useful blunders: Can automated speech recognition errors improve downstream dementia classification?
J. Biomed. Informatics, 2024

Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance.
CoRR, 2024

Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

The Power of Speech in the Wild: Discriminative Power of Daily Voice Diaries in Understanding Auditory Verbal Hallucinations Using Deep Learning.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., September, 2023

Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts.
CoRR, 2023

Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes.
CoRR, 2023

A Dialogue System for Assessing Activities of Daily Living: Improving Consistency with Grounded Knowledge.
CoRR, 2023

TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments.
CoRR, 2023

Automated Neural Nursing Assistant (ANNA): An Over-The-Phone System for Cognitive Monitoring.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A conversational agent system for dietary supplements use.
BMC Medical Informatics Decis. Mak., 2022

Fully automated detection of formal thought disorder with Time-series Augmented Representations for Detection of Incoherent Speech (TARDIS).
J. Biomed. Informatics, 2022

The Far Side of Failure: Investigating the Impact of Speech Recognition Errors on Subsequent Dementia Classification.
CoRR, 2022

MTAP - A Distributed Framework for NLP Pipelines.
Proceedings of the 10th IEEE International Conference on Healthcare Informatics, 2022

GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition.
J. Am. Medical Informatics Assoc., 2021

NLP Methods for Extraction of Symptoms from Unstructured Data for Use in Prognostic COVID-19 Analytic Models.
J. Artif. Intell. Res., 2021

Crossing the "Cookie Theft" Corpus Chasm: Applying What BERT Learns From Outside Data to the ADReSS Challenge Dementia Detection Task.
Frontiers Comput. Sci., 2021

An Empirical Study of UMLS Concept Extraction from Clinical Notes using Boolean Combination Ensembles.
CoRR, 2021

Identifying Mentions of Life Stressors in Clinical Notes.
Proceedings of the 9th IEEE International Conference on Healthcare Informatics, 2021

Conversational Agent for Daily Living Assessment Coaching Demo.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, 2021

Spoken Dialogue Systems for Medication Management.
Proceedings of the Precision Health and Medicine - A Digital Revolution in Healthcare, 2020

COVID-19 TestNorm: A tool to normalize COVID-19 testing names to LOINC codes.
J. Am. Medical Informatics Assoc., 2020

Conversational Agent for Daily Living Assessment Coaching.
Proceedings of the First Workshop on Artificial Intelligence for Function, Disability, and Health co-located with the 2020 International Joint Conference on Artificial Intelligence, 2020

A Prototype Conversational Agent for Dietary Supplements.
Proceedings of the 8th IEEE International Conference on Healthcare Informatics, 2020

The Open Health Natural Language Processing Collaboratory.
Proceedings of the AMIA 2020, 2020

Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT.
Proceedings of the Artificial Intelligence in Medicine, 2020

A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

A privacy-preserving distributed filtering framework for NLP artifacts.
BMC Medical Informatics Decis. Mak., 2019

Challenges and Opportunities to Improve the Clinician Experience Reviewing Electronic Progress Notes.
Appl. Clin. Inform., 2019

Named Entity Recognition in Prehospital Trauma Care.
Proceedings of the MEDINFO 2019: Health and Wellbeing e-Networks for All, 2019

Recurrent Deep Network Models for Clinical NLP Tasks: Use Case with Sentence Boundary Disambiguation.
Proceedings of the MEDINFO 2019: Health and Wellbeing e-Networks for All, 2019

Electronic Progress Note Reading Patterns: An Eye Tracking Analysis.
Proceedings of the MEDINFO 2019: Health and Wellbeing e-Networks for All, 2019

CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines.
J. Am. Medical Informatics Assoc., 2018

Feasibility of using Fitbit® to infer stress exposure in everyday life.
Proceedings of the AMIA 2018, 2018

Detecting clinically relevant new information in clinical notes across specialties and settings.
BMC Medical Informatics Decis. Mak., 2017

What Analogies Reveal about Word Vectors and their Compositionality.
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics, 2017

Usability Evaluation of NLP-PIER: A Clinical Document Search Engine for Researchers.
Proceedings of the MEDINFO 2017: Precision Healthcare through Informatics, 2017

Detecting Signals of Interactions Between Warfarin and Dietary Supplements in Electronic Health Records.
Proceedings of the MEDINFO 2017: Precision Healthcare through Informatics, 2017

Usability Testing of a Clinical Document Search Engine for Researchers.
Proceedings of the Summit on Clinical Research Informatics, 2017

Using ensembles of NLP engines without a common type system to improve abbreviation disambiguation.
Proceedings of the Summit on Clinical Research Informatics, 2017

Classifying Supplement Use Status in Clinical Notes.
Proceedings of the Summit on Clinical Research Informatics, 2017

AMICUS: A Metasystem for Interoperation and Combination of UIMA Systems.
Proceedings of the AMIA 2017, 2017

A comparative observational study of inpatient clinical note-entry and reading/retrieval styles adopted by physicians.
Int. J. Medical Informatics, 2016

Corpus domain effects on distributional semantic modeling of medical terms.
Bioinform., 2016

Identifying Family History and Substance Use Associations for Adult Epilepsy from the Electronic Health Record.
Proceedings of the Summit on Clinical Research Informatics, 2016

NLP-PIER: A Scalable Natural Language Processing, Indexing, and Searching Architecture for Clinical Notes.
Proceedings of the Summit on Clinical Research Informatics, 2016

Using synthetic clinical data to train an HMM-based POS tagger.
Proceedings of the 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics, 2016

Investigating Longitudinal Tobacco Use Information from Social History and Clinical Notes in the Electronic Health Record.
Proceedings of the AMIA 2016, 2016

Does Section Order Affect Physicians' Experiences Reviewing Ambulatory Progress Notes?
Proceedings of the AMIA 2016, 2016

Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data.
Proceedings of the AMIA 2016, 2016

Automated De-Identification of Distributional Semantic Models.
Proceedings of the AMIA 2016, 2016

Using automatic speech recognition to assess spoken responses to cognitive tests of semantic verbal fluency.
Speech Commun., 2015

Language networks associated with computerized semantic indices.
NeuroImage, 2015

Ease of adoption of clinical natural language processing software: An evaluation of five systems.
J. Biomed. Informatics, 2015

Domain adaption of parsing for operative notes.
J. Biomed. Informatics, 2015

Analyzing Operative Note Structure in Development of a Section Header Resource.
Proceedings of the MEDINFO 2015: eHealth-enabled Health, 2015

Evaluation of Herbal and Dietary Supplement Resource Term Coverage.
Proceedings of the MEDINFO 2015: eHealth-enabled Health, 2015

Evaluating Term Coverage of Herbal and Dietary Supplements in Electronic Health Records.
Proceedings of the AMIA 2015, 2015

Automated Extraction of Substance Use Information from Clinical Texts.
Proceedings of the AMIA 2015, 2015

Using semantic predications to uncover drug-drug interactions in clinical data.
J. Biomed. Informatics, 2014

A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources.
J. Am. Medical Informatics Assoc., 2014

System for automated speech and language analysis (SALSA).
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Using Language Models to Identify Relevant New Information in Inpatient Clinical Note.
Proceedings of the AMIA 2014, 2014

Semantic Role Labeling for Modeling Surgical Procedures in Operative Notes.
Proceedings of the AMIA 2014, 2014

U-path: An undirected path-based measure of semantic similarity.
Proceedings of the AMIA 2014, 2014

Automated Extraction of Family History Information from Clinical Notes.
Proceedings of the AMIA 2014, 2014

Effects of time constraints on clinician-computer interaction: A study on information synthesis from EHR clinical notes.
J. Biomed. Informatics, 2013

Quantification of speech disfluency as a marker of medication-induced cognitive impairment: An application of computerized speech analysis in neuropharmacology.
Comput. Speech Lang., 2013

UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Navigating Longitudinal Clinical Notes with an Automated Method for Detecting New Information.
Proceedings of the MEDINFO 2013, 2013

Predicate Argument Structure Frames for Modeling Information in Operative Notes.
Proceedings of the MEDINFO 2013, 2013

Computerized Analysis of a Verbal Fluency Test.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies.
J. Biomed. Informatics, 2012

ProTK: An Improved Prosody Toolkit.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Automated identification of relevant new information in clinical narrative.
Proceedings of the ACM International Health Informatics Symposium, 2012

Measuring the similarity and relatedness of concepts in the medical domain: IHI 2012 tutorial overview.
Proceedings of the ACM International Health Informatics Symposium, 2012

Semantic relatedness study using second order co-occurrence vectors computed from biomedical corpora, UMLS and WordNet.
Proceedings of the ACM International Health Informatics Symposium, 2012

Automated Assessment of Medical Training Evaluation Text.
Proceedings of the AMIA 2012, 2012

A Study of Actions in Operative Notes.
Proceedings of the AMIA 2012, 2012

Automated Disambiguation of Acronyms and Abbreviations in Clinical Texts: Window and Training Size Considerations.
Proceedings of the AMIA 2012, 2012

Using SemRep to Label Semantic Relations Extracted from Clinical Text.
Proceedings of the AMIA 2012, 2012

A Qualitative Analysis of EHR Clinical Document Synthesis by Clinicians.
Proceedings of the AMIA 2012, 2012

Evaluating Semantic Relatedness and Similarity Measures with Standardized MedDRA Queries.
Proceedings of the AMIA 2012, 2012

Towards a framework for developing semantic relatedness reference standards.
J. Biomed. Informatics, 2011

Prosodic Correlates of Individual Physiological Response to Stress.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Prosody Toolkit: Integrating HTK, Praat and WEKA.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Using Second-order Vectors in a Knowledge-based Method for Acronym Disambiguation.
Proceedings of the Fifteenth Conference on Computational Natural Language Learning, 2011

Evaluation of family history information within clinical documents and adequacy of HL7 clinical statement and clinical genomics family history models for its representation: a case report.
J. Am. Medical Informatics Assoc., 2010

Automated Identification of Synonyms in Biomedical Acronym Sense Inventories.
Proceedings of the Second Louhi Workshop on Text and Data Mining of Health Documents, 2010

UMLS-Interface and UMLS-Similarity: Open Source Software for Measuring Paths and Semantic Similarity.
Proceedings of the AMIA 2009, 2009

Technical Brief: Automatic Classification of Foot Examination Findings Using Clinical Notes and Machine Learning.
J. Am. Medical Informatics Assoc., 2008

Forced-Alignment and Edit-Distance Scoring for Vocabulary Tutoring Applications.
Proceedings of the Text, Speech and Dialogue, 11th International Conference, 2008

Automatic Quality of Life Prediction Using Electronic Medical Records.
Proceedings of the AMIA 2008, 2008

Measures of semantic similarity and relatedness in the biomedical domain.
J. Biomed. Informatics, 2007

Determining the Syntactic Structure of Medical Terms in Clinical Notes.
Proceedings of the Biological, translational, and clinical language processing, 2007

Research Paper: Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques.
J. Am. Medical Informatics Assoc., 2006

Developing a corpus of clinical notes manually annotated for part-of-speech.
Int. J. Medical Informatics, 2006

A Hybrid Approach to Determining Modification of Clinical Diagnoses.
Proceedings of the AMIA 2006, 2006

A Comparative Study of Supervised Learning as Applied to Acronym Expansion in Clinical Reports.
Proceedings of the AMIA 2006, 2006

An End-to-End Supervised Target-Word Sense Disambiguation System.
Proceedings, 2006

Kernel Methods for Word Sense Disambiguation and Acronym Expansion.
Proceedings, 2006

Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier.
J. Biomed. Informatics, 2005

Domain-specific language models and lexicons for tagging.
J. Biomed. Informatics, 2005

Frame Semantics and the Domain of Functioning, Disability and Health.
Proceedings of the AMIA 2005, 2005

Medical Facts to Support Inferencing in Natural Language Processing.
Proceedings of the AMIA 2005, 2005

Abbreviation and Acronym Disambiguation in Clinical Discourse.
Proceedings of the AMIA 2005, 2005

Towards Semantic Role Labeling & IE in the Medical Literature.
Proceedings of the AMIA 2005, 2005

High Throughput Modularized NLP System for Clinical Text.
Proceedings of the ACL 2005, 2005

Using Volunteers to Annotate Biomedical Corpora for Anaphora Resolution.
Proceedings of the Knowledge Collection from Volunteer Contributors, 2005

A Corpus Driven Approach Applying the "Frame Semantic" Method for Modeling Functional Status Terminology.
Proceedings of the MEDINFO 2004, 2004

Using Compound Codes for Automatic Classification of Clinical Diagnoses.
Proceedings of the MEDINFO 2004, 2004

Creating a Test Corpus of Clinical Notes Manually Tagged for Part-of-Speech Information.
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, 2004

Identification of Patients with Congestive Heart Failure using a Binary Classifier: A Case Study.
Proceedings of the Workshop on Natural Language Processing in Biomedicine, 2003

Exploring Adjectival Modification in Biomedical Discourse Across Two Genres.
Proceedings of the Workshop on Natural Language Processing in Biomedicine, 2003

A Data-Driven Approach for Extracting "the Most Specific Term" for Ontology Development.
Proceedings of the AMIA 2003, 2003

Maximum entropy modeling for mining patient medication status from free text.
Proceedings of the AMIA 2002, 2002

Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts.
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002

Generating Training Data for Medical Dictations.
Proceedings of the Language Technologies 2001: The Second Meeting of the North American Chapter of the Association for Computational Linguistics, 2001

Improving language model perplexity and recognition accuracy for medical dictations via within-domain interpolation with literal and semi-literal corpora.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Modeling Filled Pauses in Medical Dictations.
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999
