Berrak Sisman

ORCID: 0000-0001-8078-3305

According to our database¹, Berrak Sisman authored at least 74 papers between 2016 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Versatile Audio-Visual Learning for Emotion Recognition.

IEEE Trans. Affect. Comput., 2025

PRESENT: Zero-Shot Text-to-Prosody Control.

IEEE Signal Process. Lett., 2025

2024

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech.

CoRR, 2024

SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection.

Ismail Rasim Ulgen, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman

CoRR, 2024

We Need Variations in Speech Synthesis: Sub-center Modelling for Speaker Embeddings.

CoRR, 2024

Style Mixture of Experts for Expressive Text-To-Speech Synthesis.

Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman

CoRR, 2024

Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training.

CoRR, 2024

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition.

CoRR, 2024

emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition.

IEEE Access, 2024

Discrete Unit Based Masking For Improving Disentanglement in Voice Conversion.

Philip H. Lee, Ismail Rasim Ulgen, Berrak Sisman

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results.

Lucas Goncalves, Ali N. Salman, Abinay Reddy Naini, Laureano Moro-Velázquez

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with A Conditional Diffusion Model.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Exploring speech style spaces with language models: Emotional TTS without emotion labels.

Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhanced Facial Landmarks Detection for Patients with Repaired Cleft Lip and Palate.

Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

2023

Speech Synthesis With Mixed Emotions.

IEEE Trans. Affect. Comput., 2023

Emotion Intensity and its Control for Emotional Voice Conversion.

IEEE Trans. Affect. Comput., 2023

Improving Speech Emotion Recognition Performance using Differentiable Architecture Search.

CoRR, 2023

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks.

CoRR, 2023

High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Decoding Knowledge Transfer for Neural Text-to-Speech Training.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Emotional voice conversion: Theory, databases and ESD.

Speech Commun., 2022

SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech.

CoRR, 2022

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder.

CoRR, 2022

Mixed Emotion Modelling for Emotional Voice Conversion.

CoRR, 2022

Controllable Accented Text-to-Speech Synthesis.

CoRR, 2022

Learning Accent Representation with Multi-Level VAE Towards Controllable Speech Synthesis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Expressive TTS Training With Frame and Style Reconstruction Loss.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

FastTalker: A neural text-to-speech architecture with shallow and group autoregression.

Neural Networks, 2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity.

CoRR, 2021

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis.

Rui Liu, Berrak Sisman, Haizhou Li

CoRR, 2021

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech.

Kun Zhou, Berrak Sisman, Haizhou Li

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue.

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training.

Kun Zhou, Berrak Sisman, Haizhou Li

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability.

Rui Liu, Berrak Sisman, Haizhou Li

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

GraphSpeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis.

Rui Liu, Berrak Sisman, Haizhou Li

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

SUTD-NUS System for Blizzard Challenge 2021.

Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021

DEEPA: A Deep Neural Analyzer for Speech and Singing Vocoding.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS.

IEEE Signal Process. Lett., 2020

DeepConversion: Voice conversion with limited parallel training data.

Speech Commun., 2020

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data.

Kun Zhou, Berrak Sisman, Haizhou Li

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data.

Berrak Sisman, Haizhou Li

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Teacher-Student Training For Robust Tacotron-Based TTS.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

NUS-HLT System for Blizzard Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion.

Berrak Sisman, Mingyang Zhang, Haizhou Li

IEEE ACM Trans. Audio Speech Lang. Process., 2019

VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

SINGAN: Singing Voice Conversion with Generative Adversarial Networks.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Phonetically Aware Exemplar-Based Prosody Transformation.

Berrak Sisman, Grandee Lee, Haizhou Li

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder.

Berrak Sisman, Mingyang Zhang, Haizhou Li

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion.

Berrak Sisman, Haizhou Li

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018.

Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018

Error Reduction Network for DBLSTM-based Voice Conversion.

Mingyang Zhang, Berrak Sisman, Sai Sirisha Rallabandi, Haizhou Li, Li Zhao

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

On the analysis and evaluation of prosody conversion techniques.

Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Sparse representation of phonetic features for voice conversion with and without parallel data.

Berrak Sisman, Haizhou Li, Kay Chen Tan

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Transformation of prosody in voice conversion.

Berrak Sisman, Haizhou Li, Kay Chen Tan

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Energy and data cooperation in energy harvesting multiple access channel.

Proceedings of the IEEE Wireless Communications and Networking Conference Workshops, 2016
