Athanasios Katsamanis

ORCID: 0000-0002-2642-2354

Affiliations:
  • National Technical University of Athens, Greece
  • University of Southern California, Signal Analysis and Interpretation Laboratory


According to our database, Athanasios Katsamanis authored at least 94 papers between 2005 and 2024.

Bibliography

2024
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Large Language Models - Introduction to the Special Theme.
ERCIM News, 2024

Revolutionising Theatre Archives: Using Large Language Models to Interact with Structured Archival Content.
ERCIM News, 2024

A Conversational AI Assistant for Teaching and Learning.
ERCIM News, 2024

Meltemi: The first open Large Language Model for Greek.
CoRR, 2024

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data.
CoRR, 2024

2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance.
CoRR, 2023

Cross-Lingual Features for Alzheimer's Dementia Detection from Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploring Language-Agnostic Speech Representations Using Domain Knowledge for Detecting Alzheimer's Dementia.
Proceedings of the IEEE International Conference on Acoustics, 2023

Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP.
Proceedings of the IEEE International Conference on Acoustics, 2023

SPECTRE: Visual Speech-Informed Perceptual 3D Facial Expression Reconstruction from Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos.
CoRR, 2022

Regotron: Regularizing the Tacotron2 Architecture Via Monotonic Alignment Loss.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

SIEVE: A Space-Efficient Algorithm for Viterbi Decoding.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Towards a DHH Accessible Theater: Real-Time Synchronization of Subtitles and Sign Language Videos with ASR and NLP Solutions.
Proceedings of the PETRA '22: The 15th International Conference on PErvasive Technologies Related to Assistive Environments, Corfu, Greece, 29 June 2022, 2022

Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

NLP-Theatre: Employing Speech Recognition Technologies for Improving Accessibility and Augmenting the Theatrical Experience.
Proceedings of the Intelligent Systems and Applications, 2022

Audio and ASR-based Filled Pause Detection.
Proceedings of the 10th International Conference on Affective Computing and Intelligent Interaction, 2022

2021
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments.
CoRR, 2021

AudioVisual Speech Synthesis: A brief literature review.
CoRR, 2021

2019
A behaviorally inspired fusion approach for computational audiovisual saliency modeling.
Signal Process. Image Commun., 2019

Data Augmentation Using GANs for Speech Emotion Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Using Oliver API for emotion-aware movie content characterization.
Proceedings of the 2019 International Conference on Content-Based Multimedia Indexing, 2019

2018
Multi-View Audio-Articulatory Features for Phonetic Recognition on RTMRI-TIMIT Database.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Multiple Instance Learning for Behavioral Coding.
IEEE Trans. Affect. Comput., 2017

Video-realistic expressive audio-visual speech synthesis for the Greek language.
Speech Commun., 2017

Room-localized spoken command recognition in multi-room, multi-microphone environments.
Comput. Speech Lang., 2017

Demonstration of an HMM-based photorealistic expressive audio-visual speech synthesis system.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Photorealistic adaptation and interpolation of facial expressions using HMMS and AAMS for audio-visual speech synthesis.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Multimodal Gesture Recognition via Multiple Hypotheses Rescoring.
Proceedings of the Gesture Recognition, 2017

Multimodal gesture recognition.
Proceedings of the Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations, 2017

2016
A Phase-Based Time-Frequency Masking for Multi-Channel Speech Enhancement in Domestic Environments.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

FMRI-based perceptual validation of a computational model for visual and auditory saliency in videos.
Proceedings of the 2016 IEEE International Conference on Image Processing, 2016

Towards a behaviorally-validated computational audiovisual saliency model.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Multimodal human action recognition in assistive human-robot interaction.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Improved Dictionary Selection and Detection Schemes in Sparse-CNMF-Based Overlapping Acoustic Event Detection.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

On Shape Recognition and Language.
Proceedings of the Perspectives in Shape Analysis, 2016

2015
Multimodal gesture recognition via multiple hypotheses rescoring.
J. Mach. Learn. Res., 2015

Predicting audio-visual salient events based on visual, audio and text modalities for movie summarization.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Multi-room speech activity detection using a distributed microphone network in domestic environments.
Proceedings of the 23rd European Signal Processing Conference, 2015

Context-sensitive learning for enhanced audiovisual emotion classification (Extended abstract).
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Computing vocal entrainment: A signal-derived PCA-based quantification scheme with application to affect analysis in married couple interactions.
Comput. Speech Lang., 2014

ATHENA: a Greek multi-sensory database for home automation control.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Kinect-based multimodal gesture recognition using a two-pass fusion scheme.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Robust far-field spoken command recognition for home automation combining adaptation and multichannel processing.
Proceedings of the IEEE International Conference on Acoustics, 2014

The Athena-RC system for speech activity detection and speaker localization in the DIRHA smart home.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Predicting Eyes' Fixations in Movie Videos: Visual Saliency Experiments on a New Eye-Tracking Database.
Proceedings of the Engineering Psychology and Cognitive Ergonomics, 2014

Experiments in acoustic source localization using sparse arrays in adverse indoors environments.
Proceedings of the 22nd European Signal Processing Conference, 2014

Multi-microphone fusion for detection of speech and acoustic events in smart spaces.
Proceedings of the 22nd European Signal Processing Conference, 2014

2013
Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features.
Speech Commun., 2013

Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information.
Image Vis. Comput., 2013

Multi-band long-term signal variability features for robust voice activity detection.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012
Context-Sensitive Learning for Enhanced Audiovisual Emotion Classification.
IEEE Trans. Affect. Comput., 2012

The Twins Corpus of Museum Visitor Questions.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Ada and Grace: Direct Interaction with Museum Visitors.
Proceedings of the Intelligent Virtual Agents - 12th International Conference, 2012

Based on Isolated Saliency or Causal Integration? Toward a Better Understanding of Human Annotation Process using Multiple Instance Learning and Sequential Probability Ratio Test.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Analyzing the memory of BLSTM Neural Networks for enhanced emotion classification in dyadic spoken interactions.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A hierarchical framework for modeling multimodality and emotional evolution in affective dialogs.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

An acoustic analysis of shared enjoyment in ECA interactions of children with autism.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Using measures of vocal entrainment to inform outcome-related behaviors in marital conflicts.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Acoustic and Visual Cues of Turn-Taking Dynamics in Dyadic Interactions.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Automatic Data-Driven Learning of Articulatory Primitives from Real-Time MRI Data Using Convolutive NMF with Sparseness Constraints.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Direct Estimation of Articulatory Kinematics from Real-Time Magnetic Resonance Image Sequences.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Multimodal Real-Time MRI Articulatory Corpus for Speech Research.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

An Analysis of PCA-Based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Morphological Variation in the Adult Vocal Tract: A Modeling Study of its Potential Acoustic Impact.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Validating rt-MRI Based Articulatory Representations via Articulatory Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Automatic Identification of Salient Acoustic Instances in Couples' Behavioral Interactions Using Diverse Density Support Vector Machines.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

"You made me do it": Classification of Blame in Married Couples' Interactions by Fusing Automatically Derived Speech and Language Information.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Estimation of ordinal approach-avoidance labels in dyadic interactions: Ordinal logistic regression approach.
Proceedings of the IEEE International Conference on Acoustics, 2011

Tracking changes in continuous emotion states using body language and prosodic cues.
Proceedings of the IEEE International Conference on Acoustics, 2011

Affective State Recognition in Married Couples' Interactions Using PCA-Based Vocal Entrainment Measures with Multiple Instance Learning.
Proceedings of the Affective Computing and Intelligent Interaction, 2011

Multiple Instance Learning for Classification of Human Behavior Observations.
Proceedings of the Affective Computing and Intelligent Interaction, 2011

2010
A new multichannel multi modal dyadic interaction database.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Statistical multi-stream modeling of real-time MRI articulatory speech data.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Automatic classification of married couples' behavior using audio features.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009
Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition.
IEEE Trans. Speech Audio Process., 2009

Face Active Appearance Modeling and Speech Acoustic Information to Recover Articulation.
IEEE Trans. Speech Audio Process., 2009

Tongue tracking in Ultrasound images with Active Appearance Models.
Proceedings of the International Conference on Image Processing, 2009

Product-HMMs for automatic sign language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Multisensor multiband cross-energy tracking for feature extraction and recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Audiovisual-to-articulatory speech inversion using Active Appearance Models for the face and Hidden Markov Models for the dynamics.
Proceedings of the IEEE International Conference on Acoustics, 2008

Audiovisual speech inversion by switching dynamical modeling governed by a Hidden Markov process.
Proceedings of the 2008 16th European Signal Processing Conference, 2008

Adaptive Multimodal Fusion by Uncertainty Compensation with Application to Audio-Visual Speech Recognition.
Proceedings of the Multimodal Processing and Interaction, Audio, Video, Text, 2008

Cross-Modal Integration for Performance Improving in Multimedia: A Review.
Proceedings of the Multimodal Processing and Interaction, Audio, Video, Text, 2008

2007
Multimodal Fusion and Learning with Uncertain Features Applied to Audiovisual Speech Recognition.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Audiovisual-to-Articulatory Speech Inversion Using HMMs.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

2006
Adaptive multimodal fusion by uncertainty compensation.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Multimodal fusion by adaptive compensation for feature uncertainty with application to audiovisual speech recognition.
Proceedings of the 14th European Signal Processing Conference, 2006

2005
Advances in statistical estimation and tracking of AM-FM speech components.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
