Yuexian Zou

Orcid: 0000-0002-0144-1794

According to our database, Yuexian Zou authored at least 297 papers between 1999 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2024

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

CAR: Controllable Autoregressive Modeling for Visual Generation.
CoRR, 2024

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval.
CoRR, 2024

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation.
CoRR, 2024

Image Conductor: Precision Control for Interactive Video Synthesis.
CoRR, 2024

On the Worst Prompt Performance of Large Language Models.
CoRR, 2024

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding.
CoRR, 2024

VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework.
CoRR, 2024

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs.
CoRR, 2024

Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection.
CoRR, 2024

Dance with Labels: Dual-Heterogeneous Label Graph Interaction for Multi-intent Spoken Language Understanding.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

Fake-GPT: Detecting Fake Image via Large Language Model.
Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

MaCSC: Towards Multimodal-augmented Pre-trained Language Models via Conceptual Prototypes and Self-balancing Calibration.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Towards Multimodal-augmented Pre-trained Language Models via Self-balanced Expectation-Maximization Iteration.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Generating More Audios for End-to-End Spoken Language Understanding.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Retrieval is Accurate Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Game on Tree: Visual Hallucination Mitigation via Coarse-to-Fine View Tree and Game Theory.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

What are the Generator Preferences for End-to-end Task-Oriented Dialog System?
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Learning to Match Representations is Better for End-to-End Task-Oriented Dialog System.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Relevance Is a Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval.
Proceedings of the Computer Vision - ECCV 2024, 2024

Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Knowledge-enhanced Prompt Tuning for Dialogue-based Relation Extraction with Trigger and Label Semantic.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

PCLmed: Champion Solution for ImageCLEFmedical 2024 Caption Prediction Challenge via Medical Vision-Language Foundation Models.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), 2024

SaLa: Scenario-aware Label Graph Interaction for Multi-intent Spoken Language Understanding.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

Robust Heterophily Graph Learning via Uniformity Augmentation.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

PCAD: Towards ASR-Robust Spoken Language Understanding via Prototype Calibration and Asymmetric Decoupling.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Code-Switching Can be Better Aligners: Advancing Cross-Lingual SLU through Representation-Level and Prediction-Level Alignment.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Cyclical Contrastive Learning Based on Geodesic for Zero-shot Cross-lingual Spoken Language Understanding.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Towards Explainable Joint Models via Information Theory for Multiple Intent Detection and Slot Filling.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling via Adjustive and Forced Cross-Task Alignment.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Exploiting Auxiliary Caption for Video Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Towards Multi-Intent Spoken Language Understanding via Hierarchical Attention and Optimal Transport.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
SpatioTemporal focus for skeleton-based action recognition.
Pattern Recognit., April, 2023

Concept-Aware Video Captioning: Describing Videos With Effective Prior Information.
IEEE Trans. Image Process., 2023

Diffsound: Discrete Diffusion Model for Text-to-Sound Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

AFL-Net: Integrating Audio, Facial, and Lip Modalities with Cross-Attention for Robust Speaker Diarization in the Wild.
CoRR, 2023

UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework.
CoRR, 2023

Video Referring Expression Comprehension via Transformer with Content-conditioned Query.
CoRR, 2023

Customizing General-Purpose Foundation Models for Medical Report Generation.
CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.
CoRR, 2023

PoseRAC: Pose Saliency Transformer for Repetitive Action Counting.
CoRR, 2023

Improve Retrieval-based Dialogue System via Syntax-Informed Attention.
CoRR, 2023

Generating Templated Caption for Video Grounding.
CoRR, 2023

Video Referring Expression Comprehension via Transformer with Content-conditioned Query.
Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval, 2023

Mix before Align: Towards Zero-shot Cross-lingual Sentiment Analysis via Soft-Mix and Multi-View Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Background-aware Modeling for Weakly Supervised Sound Event Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

C²A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Text-Audio Retrieval by Text-Aware Attention Pooling and Prior Matrix Revised Loss.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Weakly Supervised Sound Event Detection with Causal Intervention.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Retrieval-Based Dialogue System Via Syntax-Informed Attention.
Proceedings of the IEEE International Conference on Acoustics, 2023

SSVMR: Saliency-Based Self-Training for Video-Music Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2023

M³ST: Mix at Three Levels for Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Enhancing Code-Switching for Cross-lingual SLU: A Unified View of Semantic and Grammatical Coherence.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Accelerating Multiple Intent Detection and Slot Filling via Targeted Knowledge Distillation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

MRRL: Modifying the Reference via Reinforcement Learning for Non-Autoregressive Joint Multiple Intent Detection and Slot Filling.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Iterative Proposal Refinement for Weakly-Supervised Video Grounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

TLAG: An Informative Trigger and Label-Aware Knowledge Guided Model for Dialogue-based Relation Extraction.
Proceedings of the 26th International Conference on Computer Supported Cooperative Work in Design, 2023

PCLmed at ImageCLEFmedical 2023: Customizing General-Purpose Foundation Models for Medical Report Generation.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

DAS-CL: Towards Multimodal Machine Translation via Dual-Level Asymmetric Contrastive Learning.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Towards Unified Spoken Language Understanding Decoding via Label-aware Compact Linguistics Representations.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FiTs: Fine-Grained Two-Stage Training for Knowledge-Aware Question Answering.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

FTM: A Frame-Level Timeline Modeling Method for Temporal Graph Representation Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention.
ACM Trans. Knowl. Discov. Data, 2022

Deep Motion Prior for Weakly-Supervised Temporal Action Localization.
IEEE Trans. Image Process., 2022

RR-Net: Relation Reasoning for End-to-End Human-Object Interaction Detection.
IEEE Trans. Circuits Syst. Video Technol., 2022

All You Need Is a Second Look: Towards Arbitrary-Shaped Text Detection.
IEEE Trans. Circuits Syst. Video Technol., 2022

Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model.
IEEE Signal Process. Lett., 2022

Aligning Source Visual and Target Language Domains for Unpaired Video Captioning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

M3ST: Mix at Three Levels for Speech Translation.
CoRR, 2022

Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning.
CoRR, 2022

Video Referring Expression Comprehension via Transformer with Content-aware Query.
CoRR, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
CoRR, 2022

A Two-student Learning Framework for Mixed Supervised Target Sound Detection.
CoRR, 2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition.
CoRR, 2022

Adaptive Curriculum Learning for Video Captioning.
IEEE Access, 2022

CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter.
Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

Consensus-Guided Keyword Targeting for Video Captioning.
Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Correspondence Matters for Video Referring Expression Comprehension.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Target Sound Extraction with Timestamp Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Joint Intent Detection and Slot Filling via Higher-order Attention.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

A Mutual Learning Framework for Few-Shot Sound Event Detection.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention.
Proceedings of the IEEE International Conference on Acoustics, 2022

Learning Decoupling Features Through Orthogonality Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2022

Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
Proceedings of the IEEE International Conference on Acoustics, 2022

Joint Multiple Intent Detection and Slot Filling Via Self-Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Leveraging Bilinear Attention to Improve Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2022

Visual Relation-Aware Unsupervised Video Captioning.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2022, 2022

LocVTP: Video-Text Pre-training for Temporal Localization.
Proceedings of the Computer Vision - ECCV 2022, 2022

A Mixed Supervised Learning Framework For Target Sound Detection.
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

Detect What You Want: Target Sound Detection.
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

Unsupervised Pre-training for Temporal Action Localization Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A Transformer-based Threshold-Free Framework for Multi-Intent NLU.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021
AFNet: Temporal Locality-Aware Network With Dual Structure for Accurate and Fast Action Detection.
IEEE Trans. Multim., 2021

Learning Human-Object Interaction via Interactive Semantic Reasoning.
IEEE Trans. Image Process., 2021

Complex Neural Spatial Filter: Enhancing Multi-Channel Target Speech Separation in Complex Domain.
IEEE Signal Process. Lett., 2021

EAR: Efficient action recognition with local-global temporal aggregation.
Image Vis. Comput., 2021

Synergic learning for noise-insensitive webly-supervised temporal action localization.
Image Vis. Comput., 2021

GID-Net: Detecting human-object interaction with global and instance dependency.
Neurocomputing, 2021

Detect what you want: Target Sound Detection.
CoRR, 2021

CLIP Meets Video Captioners: Attribute-Aware Representation Learning Promotes Accurate Captioning.
CoRR, 2021

HAN: Higher-order Attention Network for Spoken Language Understanding.
CoRR, 2021

Fully Non-Homogeneous Atmospheric Scattering Modeling with Convolutional Neural Networks for Single Image Dehazing.
CoRR, 2021

Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model.
CoRR, 2021

Exploring Semantic Relationships for Unpaired Image Captioning.
CoRR, 2021

Rethinking Skip Connection with Layer Normalization in Transformers and ResNets.
CoRR, 2021

Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency.
CoRR, 2021

FWB-Net: Front White Balance Network for Color Shift Correction in Single Image Dehazing via Atmospheric Light Estimation.
CoRR, 2021

Hierarchical hashing-based multi-source image retrieval method for image denoising.
Appl. Soft Comput., 2021

Contextualized Attention-Based Knowledge Transfer for Spoken Conversational Question Answering.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semantic Transportation Prototypical Network for Few-Shot Intent Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Text Anchor Based Metric Learning for Small-Footprint Keyword Spotting.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Self-Supervised Dialogue Learning for Spoken Conversational Question Answering.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Contrastive Self-Supervised Learning for Text-Independent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Knowledge Distillation for Improved Accuracy in Spoken Question Answering.
Proceedings of the IEEE International Conference on Acoustics, 2021

Long-Short Temporal Modeling for Efficient Action Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Global-Local Attention Framework for Weakly Labelled Audio Tagging.
Proceedings of the IEEE International Conference on Acoustics, 2021

FWB-Net: Front White Balance Network for Color Shift Correction in Single Image Dehazing Via Atmospheric Light Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2021

SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021

Sentiment Injected Iteratively Co-Interactive Network for Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2021

Adaptive Bi-Directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension.
Proceedings of the IEEE International Conference on Acoustics, 2021

Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

On Pursuit of Designing Multi-modal Transformer for Video Grounding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information.
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

CoLA: Weakly-Supervised Temporal Action Localization With Snippet Contrastive Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Non-Autoregressive Coarse-to-Fine Video Captioning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Modeling Label Dependencies for Audio Tagging With Graph Convolutional Network.
IEEE Signal Process. Lett., 2020

Multi-Modal Multi-Channel Target Speech Separation.
IEEE J. Sel. Top. Signal Process., 2020

Towards Data Distillation for End-to-end Spoken Conversational Question Answering.
CoRR, 2020

PAN: Towards Fast Action Recognition via Learning Persistence of Appearance.
CoRR, 2020

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation.
CoRR, 2020

Prophet Attention: Predicting Attention with Future Attention.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Cluster Attention Contrast for Video Anomaly Detection.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Bridging the Gap between Vision and Language Domains for Improved Image Captioning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Environmental Sound Classification with Parallel Temporal-Spectral Attention.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Graph-based Interactive Reasoning for Human-Object Interaction Detection.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

PIN: A Novel Parallel Interactive Network for Spoken Language Understanding.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for Video Captioning.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

ABC-NET: Avoiding Blocking Effect & Color Shift Network for Single Image Dehazing Via Restraining Transmission Bias.
Proceedings of the IEEE International Conference on Image Processing, 2020

SemanticGAN: Generative Adversarial Networks For Semantic Image To Photo-Realistic Image Translation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Weakly Labelled Audio Tagging Via Convolutional Networks with Spatial and Channel-Wise Attention.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

All You Need is a Second Look: Towards Tighter Arbitrary Shape Text Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Acoustic Scene Classification with Spectrogram Processing Strategies.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

Rethinking Skip Connection with Layer Normalization.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Federated Learning for Spoken Language Understanding.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Context-adaptive Gaussian Attention for Text-independent Speaker Verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Federated Learning for Vision-and-Language Grounding Problems.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Improved Blind Timing Skew Estimation Based on Spectrum Sparsity and ApFFT in Time-Interleaved ADCs.
IEEE Trans. Instrum. Meas., 2019

C-RPNs: Promoting object detection in real world via a cascade structure of Region Proposal Networks.
Neurocomputing, 2019

Learning discriminative and robust time-frequency representations for environmental sound classification.
CoRR, 2019

Non-Autoregressive Video Captioning with Iterative Refinement.
CoRR, 2019

End-to-End Multi-Channel Speech Separation.
CoRR, 2019

GISCA: Gradient-Inductive Segmentation Network With Contextual Attention for Scene Text Detection.
IEEE Access, 2019

Scale-Informed Density Estimation for Dense Crowd Counting.
Proceedings of the 2019 IEEE Visual Communications and Image Processing, 2019

Using Dependency Information to Enhance Attention Mechanism for Aspect-Based Sentiment Analysis.
Proceedings of the Natural Language Processing and Chinese Computing, 2019

Hierarchical Temporal Pooling for Efficient Online Action Recognition.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

Using Coarse Label Constraint for Fine-Grained Visual Classification.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

Multi-channel Convolutional Neural Networks with Multi-level Feature Fusion for Environmental Sound Classification.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

STMP: Spatial Temporal Multi-level Proposal Network for Activity Detection.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

IKDMM: Iterative Knowledge Distillation Mask Model for Robust Acoustic Beamforming.
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

PAN: Persistent Appearance Network with an Efficient Motion Cue for Fast Action Recognition.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cascade Region Proposal Networks for Object Detection in the Wild.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Exploring Semantic Relationships for Image Captioning without Parallel Data.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Semantic Super-resolution for Extremely Low-resolution Vehicle License Plate.
Proceedings of the IEEE International Conference on Acoustics, 2019

Labelled Non-zero Particle Flow for SMC-PHD Filtering.
Proceedings of the IEEE International Conference on Acoustics, 2019

Selecting Optimal Proposal Number for Image-based Object Detection.
Proceedings of the IEEE International Conference on Acoustics, 2019

Discriminative Feature Learning for Speech Emotion Recognition.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2019: Text and Time Series, 2019

Discriminative Feature Learning Using Two-Stage Training Strategy for Facial Expression Recognition.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2019: Image Processing, 2019

Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Logistic Similarity Metric Learning via Affinity Matrix for Text-Independent Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Speaker-discriminative Embedding Learning via Affinity Matrix for Short Utterance Speaker Verification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Teacher-Student BLSTM Mask Model for Robust Acoustic Beamforming.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Alleviate Cross-chunk Permutation through Chunk-level Speaker Embedding for Blind Speech Separation.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

What Affects the Performance of Convolutional Neural Networks for Audio Event Classification.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019

Speech Emotion Recognition using Spectral Normalized CycleGAN.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019

2018
Manifold-Based Visual Object Counting.
IEEE Trans. Image Process., 2018

Learning soft mask with DNN and DNN-SVM for multi-speaker DOA estimation using an acoustic vector sensor.
J. Frankl. Inst., 2018

Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with an Acoustic Vector Sensor.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

LD-CNN: A Lightweight Dilated Convolutional Neural Network for Environmental Sound Classification.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

DCH-Net: Densely Connected Highway Convolution Neural Network for Environmental Sound Classification.
Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

Hierarchical Feature Fusion With Text Attention For Multi-scale Text Detection.
Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

Multi-Scale Object Detection with Feature Fusion and Region Objectness Network.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Inverse Atmospheric Scattering Modeling with Convolutional Neural Networks for Single Image Dehazing.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

AICDS: An Infant Crying Detection System Based on Lightweight Convolutional Neural Network.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018

Economic Index Forecasting via Multi-scale Recursive Dynamic Factor Analysis.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018

2017
Density peaks clustering based integrate framework for multi-document summarization.
CAAI Trans. Intell. Technol., 2017

Multi-document Summarization via LDA and Density Peaks Based Sentence-Level Clustering.
Proceedings of the Computational Intelligence and Intelligent Systems, 2017

A Multi-task Learning Approach for Mandarin-English Code-Switching Conversational Speech Recognition.
Proceedings of the Computational Intelligence and Intelligent Systems, 2017

Data-Driven Phone Selection for Language Identification via Bidirectional Long Short-Term Memory Modeling.
Proceedings of the Computational Intelligence and Intelligent Systems, 2017

Accurate small object detection via density map aided saliency estimation.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Dilated convolution neural network with LeakyReLU for environmental sound classification.
Proceedings of the 22nd International Conference on Digital Signal Processing, 2017

A deep convolutional encoder-decoder model for robust speech dereverberation.
Proceedings of the 22nd International Conference on Digital Signal Processing, 2017

Enhancing speaker verification with short voice commands via autoencoder and phonetic bottleneck learning.
Proceedings of the 22nd International Conference on Digital Signal Processing, 2017

Sequence-guided siamese neural network for video summarization of unmanned aerial vehicles.
Proceedings of the 22nd International Conference on Digital Signal Processing, 2017

Robust speaker DOA estimation based on the inter-sensor data ratio model and binary mask estimation in the bispectrum domain.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Example-based Visual Object Counting for complex background with a local low-rank constraint.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-document news summarization via paragraph embedding and density peak clustering.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Investigating multi-task learning for automatic speech recognition with code-switching between Mandarin and English.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Learning a robust DOA estimation model with acoustic vector sensor cues.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Speech emotion recognition via ensembling neural networks.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Investigating the Stacked Phonetic Bottleneck Feature for Speaker Verification with Short Voice Commands.
Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017

2016
An effective voiceprint based identity authentication system for Mandarin smartphone users.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Wireless capsule endoscopy video summarization: A learning approach based on Siamese neural network and support vector machine.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Example-based visual object counting with a sparsity constraint.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Cost-sensitive sparse linear regression for crowd counting with imbalanced training data.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Fast visual object counting via example-based density estimation.
Proceedings of the 2016 IEEE International Conference on Image Processing, 2016

A robust DBN-vector based speaker verification system under channel mismatch conditions.
Proceedings of the 2016 IEEE International Conference on Digital Signal Processing, 2016

Accurate and robust device-free localization approach via sparse representation in presence of noise and outliers.
Proceedings of the 2016 IEEE International Conference on Digital Signal Processing, 2016

An Effective and Robust Multi-view Vehicle Classification Method Based on Local and Structural Features.
Proceedings of the IEEE Second International Conference on Multimedia Big Data, 2016

An Efficient Learning Based Smartphone Playback Attack Detection Using GMM Supervector.
Proceedings of the IEEE Second International Conference on Multimedia Big Data, 2016

2015
KCRC-LCD: Discriminative kernel collaborative representation with locality constrained dictionary for visual categorization.
Pattern Recognit., 2015

A hybrid convolutional neural networks with extreme learning machine for WCE image classification.
Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics, 2015

Joint kernel dictionary and classifier learning for sparse coding via locality preserving K-SVD.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

Multi-kernel collaborative representation for image classification.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Classifying digestive organs in wireless capsule endoscopy images based on deep convolutional neural network.
Proceedings of the 2015 IEEE International Conference on Digital Signal Processing, 2015

Single image super-resolution via adaptive dictionary pair learning for wireless capsule endoscopy image.
Proceedings of the 2015 IEEE International Conference on Digital Signal Processing, 2015

Two stages signal strength difference localization algorithm using SDP relaxation.
Proceedings of the 2015 IEEE International Conference on Digital Signal Processing, 2015

An adaptive redundant image elimination for Wireless Capsule Endoscopy review based on temporal correlation and color-texture feature similarity.
Proceedings of the 2015 IEEE International Conference on Digital Signal Processing, 2015

A parametric modeling approach for wireless capsule endoscopy hazy image restoration.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Nonnegative matrix factorization based noise robust speaker verification.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Integrating Visual and Textual Features for Web Image Clustering.
Proceedings of the 2015 IEEE International Conference on Multimedia Big Data, BigMM 2015, 2015

A Robust Acoustic Feature Extraction Approach Based on Stacked Denoising Autoencoder.
Proceedings of the 2015 IEEE International Conference on Multimedia Big Data, BigMM 2015, 2015

2014
A novel kernel collaborative representation approach for image classification.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

A kernel-based l2 norm regularized least square algorithm for vehicle logo recognition.
Proceedings of the 19th International Conference on Digital Signal Processing, 2014

Long-term auto-correlation statistics based voice activity detection for strong noisy speech.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

Wireless capsule endoscopy image classification based on vector sparse coding.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

2013
A comparative analysis of Spearman's rho and Kendall's tau in normal and contaminated normal models.
Signal Process., 2013

Urban vehicle classification based on linear SVM with efficient vector sparse coding.
Proceedings of the IEEE International Conference on Information and Automation, 2013

Wireless capsule endoscopy images enhancement based on adaptive anisotropic diffusion.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

2012
A Novel Multiple Sparse Source Localization Using Triangular Pyramid Microphone Array.
IEEE Signal Process. Lett., 2012

An advanced WCE video summary using relation matrix rank.
Proceedings of 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics, 2012

A novel EMD-based Common Spatial Pattern for motor imagery brain-computer interface.
Proceedings of 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics, 2012

2011
Timing Mismatch Compensation in Time-Interleaved ADCs Based on Multichannel Lagrange Polynomial Interpolation.
IEEE Trans. Instrum. Meas., 2011

Traffic incident classification at intersections based on image sequences by HMM/SVM classifiers.
Multim. Tools Appl., 2011

An efficient blind timing skews estimation for time-interleaved analog-to-digital converters.
Proceedings of the 17th International Conference on Digital Signal Processing, 2011

2010
Comparison of Spearman's rho and Kendall's tau in Normal and Contaminated Normal Models.
CoRR, 2010

A stimulus pattern extraction algorithm based on saliency map for a 625-channel retinal prosthesis system.
Proceedings of the 18th European Signal Processing Conference, 2010

A moving vehicle segmentation method based on clustering of feature points for tracking at urban intersection.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

2009
Reply to "Comments on 'A Recursive Least M-Estimate Algorithm for Robust Adaptive Filtering in Impulsive Noise: Fast Algorithm and Convergence Performance Analysis'".
IEEE Trans. Signal Process., 2009

Time-Interleaved Analog-to-Digital-Converter Compensation Using Multichannel Filters.
IEEE Trans. Circuits Syst. I Regul. Pap., 2009

Robust human tracking based on multi-cue integration and mean-shift.
Pattern Recognit. Lett., 2009

A Slope K method for image based localization.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2009

Detection of hands-raising gestures using shape and edge features.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2009

Signal modulation schemes comparison in the telemetry unit for retinal prosthesis system.
Proceedings of the 4th IEEE International Conference on Nano/Micro Engineered and Molecular Systems, 2009

Extraocular image processing for retinal prosthesis based on DSP.
Proceedings of the 4th IEEE International Conference on Nano/Micro Engineered and Molecular Systems, 2009

Multi-category human motion recognition based on MEMS inertial sensing data.
Proceedings of the 4th IEEE International Conference on Nano/Micro Engineered and Molecular Systems, 2009

Image Sequences Based Traffic Incident Detection for Signaled Intersections Using HMM.
Proceedings of the 9th International Conference on Hybrid Intelligent Systems (HIS 2009), 2009

Video Image Vehicle Detection System for Signaled Traffic Intersection.
Proceedings of the 9th International Conference on Hybrid Intelligent Systems (HIS 2009), 2009

2008
Towards HMM based human motion recognition using MEMS inertial sensors.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2008

Recursive robust variable loading MVDR beamforming in impulsive noise environment.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

PCA/ICA-based SVM for fall recognition using MEMS motion sensing data.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

2005
A piloted adaptive notch filter.
IEEE Trans. Signal Process., 2005

2004
A recursive least M-estimate algorithm for robust adaptive filtering in impulsive noise: fast algorithm and convergence performance analysis.
IEEE Trans. Signal Process., 2004

2001
A robust quasi-Newton adaptive filtering algorithm for impulse noise suppression.
Proceedings of the 2001 International Symposium on Circuits and Systems, 2001

A Huber recursive least squares adaptive lattice filter for impulse noise suppression.
Proceedings of the IEEE International Conference on Acoustics, 2001

2000
A recursive least M-estimate (RLM) adaptive filter for robust filtering in impulse noise.
IEEE Signal Process. Lett., 2000

Fast least mean M-estimate algorithms for robust adaptive filtering in impulse noise.
Proceedings of the 10th European Signal Processing Conference, 2000

1999
Transform domain adaptive Volterra filter algorithm based on constrained optimization.
Proceedings of the 1999 International Symposium on Circuits and Systems, ISCAS 1999, Orlando, Florida, USA, May 30, 1999

A robust M-estimate adaptive filter for impulse noise suppression.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

