Berrak Sisman

Orcid: 0000-0001-8078-3305

According to our database, Berrak Sisman authored at least 72 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Discrete Unit based Masking for Improving Disentanglement in Voice Conversion.
CoRR, 2024

SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection.
CoRR, 2024

PRESENT: Zero-Shot Text-to-Prosody Control.
CoRR, 2024

We Need Variations in Speech Synthesis: Sub-center Modelling for Speaker Embeddings.
CoRR, 2024

Style Mixture of Experts for Expressive Text-To-Speech Synthesis.
CoRR, 2024

Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training.
CoRR, 2024

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition.
CoRR, 2024

emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition.
IEEE Access, 2024

Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with A Conditional Diffusion Model.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Exploring speech style spaces with language models: Emotional TTS without emotion labels.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Enhanced Facial Landmarks Detection for Patients with Repaired Cleft Lip and Palate.
Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

2023
Speech Synthesis With Mixed Emotions.
IEEE Trans. Affect. Comput., 2023

Emotion Intensity and its Control for Emotional Voice Conversion.
IEEE Trans. Affect. Comput., 2023

Improving Speech Emotion Recognition Performance using Differentiable Architecture Search.
CoRR, 2023

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks.
CoRR, 2023

High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022
Decoding Knowledge Transfer for Neural Text-to-Speech Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Emotional voice conversion: Theory, databases and ESD.
Speech Commun., 2022

SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech.
CoRR, 2022

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder.
CoRR, 2022

Mixed Emotion Modelling for Emotional Voice Conversion.
CoRR, 2022

Controllable Accented Text-to-Speech Synthesis.
CoRR, 2022

Learning Accent Representation with Multi-Level VAE Towards Controllable Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Expressive TTS Training With Frame and Style Reconstruction Loss.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

FastTalker: A neural text-to-speech architecture with shallow and group autoregression.
Neural Networks, 2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity.
CoRR, 2021

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis.
CoRR, 2021

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue.
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2021

GraphSpeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

SUTD-NUS System for Blizzard Challenge 2021.
Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021, 2021

DEEPA: A Deep Neural Analyzer for Speech and Singing Vocoding.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS.
IEEE Signal Process. Lett., 2020

DeepConversion: Voice conversion with limited parallel training data.
Speech Commun., 2020

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Teacher-Student Training For Robust Tacotron-Based TTS.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

NUS-HLT System for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

SINGAN: Singing Voice Conversion with Generative Adversarial Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Phonetically Aware Exemplar-Based Prosody Transformation.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018.
Proceedings of the Blizzard Challenge 2018, Hyderabad, India, September 8, 2018, 2018

Error Reduction Network for DBLSTM-based Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
On the analysis and evaluation of prosody conversion techniques.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Sparse representation of phonetic features for voice conversion with and without parallel data.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Transformation of prosody in voice conversion.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Energy and data cooperation in energy harvesting multiple access channel.
Proceedings of the IEEE Wireless Communications and Networking Conference Workshops, 2016

