Zhou Zhao

ORCID: 0000-0001-6121-0384

According to our database, Zhou Zhao authored at least 416 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline (legend: Book, In proceedings, Article, PhD thesis, Dataset, Other)

Bibliography

2024
Multi-Granularity Relational Attention Network for Audio-Visual Question Answering.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

Detecting Transitions from Stability to Instability in Robotic Grasping Based on Tactile Perception.
Sensors, August, 2024

Smaller and Faster Robotic Grasp Detection Model via Knowledge Distillation and Unequal Feature Encoding.
IEEE Robotics Autom. Lett., August, 2024

Video Moment Retrieval With Noisy Labels.
IEEE Trans. Neural Networks Learn. Syst., May, 2024

A Bioinspired Multifunctional Tendon-Driven Tactile Sensor and Application in Obstacle Avoidance Using Reinforcement Learning.
IEEE Trans. Cogn. Dev. Syst., April, 2024

SLED: Structure Learning based Denoising for Recommendation.
ACM Trans. Inf. Syst., March, 2024

Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems.
IEEE Trans. Knowl. Data Eng., February, 2024

Tactile-Based Grasping Stability Prediction Based on Human Grasp Demonstration for Robot Manipulation.
IEEE Robotics Autom. Lett., 2024

Query-guided generalizable medical image segmentation.
Pattern Recognit. Lett., 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.
CoRR, 2024

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models.
CoRR, 2024

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks.
CoRR, 2024

STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM.
CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency.
CoRR, 2024

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis.
CoRR, 2024

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control.
CoRR, 2024

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces.
CoRR, 2024

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody.
CoRR, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024

AudioLCM: Text-to-Audio Generation with Latent Consistency Models.
CoRR, 2024

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching.
CoRR, 2024

SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models.
CoRR, 2024

MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities.
CoRR, 2024

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment.
CoRR, 2024

Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment.
CoRR, 2024

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models.
CoRR, 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
CoRR, 2024

MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, 2024

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Low-rank Prompt Interaction for Continual Vision-Language Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Semantic Alignment for Multimodal Large Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Semantic Codebook Learning for Dynamic Recommendation Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Cross-modal Observation Hypothesis Inference.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

WIA-LD2ND: Wavelet-Based Image Alignment for Self-supervised Low-Dose CT Denoising.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

MoreStyle: Relax Low-Frequency Constraint of Fourier-Based Image Reconstruction in Generalizable Medical Image Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

Spatial-Aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Non-confusing Generation of Customized Concepts in Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructSpeech: Following Speech Editing Instructions via Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Language Model is a Branch Predictor for Simultaneous Machine Translation.
Proceedings of the IEEE International Conference on Acoustics, 2024

TextrolSpeech: A Text Style Control Speech Corpus with Codec Language Text-to-Speech Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MPOD123: One Image to 3D Content Generation Using Mask-Enhanced Progressive Outline-to-Detail Optimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Robust Singing Voice Transcription Serves Synthesis.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Rethinking the Multimodal Correlation of Multimodal Sequential Learning via Generalizable Attentional Results Alignment.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Visual-Tactile Robot Grasping Based on Human Skill Learning From Demonstrations Using a Wearable Parallel Hand Exoskeleton.
IEEE Robotics Autom. Lett., September, 2023

Personalized Latent Structure Learning for Recommendation.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Fabrication and Performance of a Ta2O5 Thin Film pH Sensor Manufactured Using MEMS Processes.
Sensors, July, 2023

Bioinspired Hierarchical Structure for an Ultrawide-Range Multifunctional Flexible Sensor Using Porous Expandable Polyethylene/Loofah-Like Polyurethane Sponge Material.
Adv. Intell. Syst., January, 2023

Heart Segmentation and Evaluation of Fibrosis. (Segmentation cardiaque et évaluation de la fibrose).
PhD thesis, 2023

MyoPS: A benchmark of myocardial pathology segmentation combining three-sequence cardiac magnetic resonance images.
Medical Image Anal., 2023

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.
CoRR, 2023

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding.
CoRR, 2023

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers.
CoRR, 2023

Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast.
CoRR, 2023

Weakly-Supervised Video Moment Retrieval via Regularized Two-Branch Proposal Networks with Erasing Mechanism.
CoRR, 2023

Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models.
CoRR, 2023

Extending Multi-modal Contrastive Representations.
CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models.
CoRR, 2023

Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes.
CoRR, 2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts.
CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.
CoRR, 2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis.
CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.
CoRR, 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation.
CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.
CoRR, 2023

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.
CoRR, 2023

Denoising Multi-modal Sequential Recommenders with Contrastive Learning.
CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.
CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
CoRR, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
CoRR, 2023

DisCover: Disentangled Music Representation Learning for Cover Song Identification.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Achieving Cross Modal Generalization with Multimodal Unified Representation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Connecting Multi-modal Contrastive Representations.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PTADisc: A Cross-Course Dataset Supporting Personalized Learning in Cold-Start Scenarios.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Unsupervised Domain Adaptation for Referring Semantic Segmentation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Rethinking Missing Modality Learning from a Decoding Perspective.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Stable Prediction on Graphs with Agnostic Distribution Shifts.
Proceedings of the KDD'23 Workshop on Causal Discovery, 2023

MSSRNet: Manipulating Sequential Style Representation for Unsupervised Text Style Transfer.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

MechTac: A Multifunctional Tendon-Linked Optical Tactile Sensor for In/Out-the-Field-of-View Perception with Deep Learning.
Proceedings of the 49th Annual Conference of the IEEE Industrial Electronics Society, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.
Proceedings of the International Conference on Machine Learning, 2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Open-Vocabulary Object Detection With an Open Corpus.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Group Video Captioning with Efficient Relational Approximation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, 2023

VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement.
Proceedings of the IEEE International Conference on Acoustics, 2023

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ART: rule bAsed futuRe-inference deducTion.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Gloss Attention for Gloss-free Sign Language Translation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DATE: Domain Adaptive Product Seeker for E-Commerce.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Sequential Style Consistency Learning for Domain-Generalizable Text Recognition.
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

Multi-trends Enhanced Dynamic Micro-video Recommendation.
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Semantic-conditioned Dual Adaptation for Cross-domain Query-based Visual Segmentation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

TAVT: Towards Transferable Audio-Visual Text Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Multi-modal Action Chain Abductive Reasoning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Prosody-TTS: Improving Prosody with Masked Autoencoder and Conditional Diffusion Model For Expressive Text-to-Speech.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion Trajectories.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Video-Audio Domain Generalization via Confounder Disentanglement.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention.
IEEE Trans. Software Eng., 2022

AgeGAN++: Face Aging and Rejuvenation With Dual Conditional GANs.
IEEE Trans. Multim., 2022

TaoHighlight: Commodity-Aware Multi-Modal Video Highlight Detection in E-Commerce.
IEEE Trans. Multim., 2022

Local-Global Graph Pooling via Mutual Information Maximization for Video-Paragraph Retrieval.
IEEE Trans. Circuits Syst. Video Technol., 2022

Interaction augmented transformer with decoupled decoding for video captioning.
Neurocomputing, 2022

Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection.
CoRR, 2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement.
CoRR, 2022

Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering.
CoRR, 2022

AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments.
CoRR, 2022

CCL4Rec: Contrast over Contrastive Learning for Micro-video Recommendation.
CoRR, 2022

AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism.
CoRR, 2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech.
CoRR, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis.
CoRR, 2022

MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder.
CoRR, 2022

Re4: Learning to Re-contrast, Re-attend, Re-construct for Multi-interest Recommendation.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Contrastive Learning with Positive-Negative Frame Mask for Music Representation.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Uncovering Causal Effects of Online Short Videos on Consumer Behaviors.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022

Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Video-Guided Curriculum Learning for Spoken Video Grounding.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Set-Based Face Recognition Beyond Disentanglement: Burstiness Suppression With Variance Vocabulary.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

HERO: HiErarchical spatio-tempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Weakly-supervised Disentanglement Network for Video Fingerspelling Detection.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

DualSign: Semi-Supervised Sign Language Production with Balanced Multi-Modal Multi-Task Dual Transformation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

MC-SLT: Towards Low-Resource Signer-Adaptive Sign Language Translation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Separate-to-Recognize: Joint Multi-target Speech Separation and Speech Recognition for Speaker-attributed ASR.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Multi-purpose Tactile Perception Based on Deep Learning in a New Tendon-driven Optical Tactile Sensor.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

EditSinger: Zero-Shot Text-Based Singing Voice Editing System with Diverse Prosody Modeling.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Pseudo Numerical Methods for Diffusion Models on Manifolds.
Proceedings of the Tenth International Conference on Learning Representations, 2022

HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks.
Proceedings of the IEEE International Conference on Acoustics, 2022

Prosospeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

MLSLT: Towards Multilingual Sign Language Translation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-modal Background Suppression for Audio-Visual Event Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Fine-Grained Predicates Learning for Scene Graph Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning the Beauty in Songs: Neural Singing Voice Beautifier.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Prior Knowledge and Memory Enriched Transformer for Sign Language Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Revisiting Over-Smoothness in Text to Speech.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Parallel and High-Fidelity Text-to-Lip Generation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Flow-Based Unconstrained Lip to Speech Generation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
An Effective Hybrid Learning Model for Real-Time Event Summarization.
IEEE Trans. Neural Networks Learn. Syst., 2021

Hierarchical Human-Like Deep Neural Networks for Abstractive Text Summarization.
IEEE Trans. Neural Networks Learn. Syst., 2021

Temporal Textual Localization in Video via Adversarial Bi-Directional Interaction Networks.
IEEE Trans. Multim., 2021

Adaptive Spatio-Temporal Graph Enhanced Vision-Language Representation for Video QA.
IEEE Trans. Image Process., 2021

Graph-Based Multi-Interaction Network for Video Question Answering.
IEEE Trans. Image Process., 2021

Hierarchical Memory Decoder for Visual Narrating.
IEEE Trans. Circuits Syst. Video Technol., 2021

Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network.
IEEE Trans. Circuits Syst. Video Technol., 2021

Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis.
Pattern Recognit., 2021

A Chinese Multi-type Complex Questions Answering Dataset over Wikidata.
CoRR, 2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation.
CoRR, 2021

Test-time Batch Statistics Calibration for Covariate Shift.
CoRR, 2021

Multi-trends Enhanced Dynamic Micro-video Recommendation.
CoRR, 2021

Stable Prediction on Graphs with Agnostic Distribution Shift.
CoRR, 2021

High-Speed and High-Quality Text-to-Lip Generation.
CoRR, 2021

DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis.
CoRR, 2021

Modeling High-order Interactions across Multi-interests for Micro-video Reommendation.
CoRR, 2021

Future-Aware Diverse Trends Framework for Recommendation.
Proceedings of the WWW '21: The Web Conference 2021, 2021

CauseRec: Counterfactual User Sequence Synthesis for Sequential Recommendation.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

MGD-GAN: Text-to-Pedestrian Generation Through Multi-grained Discrimination.
Proceedings of the Pattern Recognition and Computer Vision - 4th Chinese Conference, 2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Generalizable Multi-linear Attention Network.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SimulSLT: End-to-End Simultaneous Sign Language Translation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Why Do We Click: Visual Impression-aware News Recommendation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Contrastive Disentangled Meta-Learning for Signer-Independent Sign Language Translation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Towards Fast and High-Quality Sign Language Production.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

WSRGlow: A Glow-Based Waveform Generative Model for Audio Super-Resolution.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FedSpeech: Federated Text-to-Speech with Continual Learning.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Learning to Rehearse in Long Sequence Memorization.
Proceedings of the 38th International Conference on Machine Learning, 2021

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
Proceedings of the 9th International Conference on Learning Representations, 2021

Cortical Surface Shape Analysis Based on Alexandrov Polyhedra.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Cascaded Prediction Network via Segment Tree for Temporal Video Grounding.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

DeVLBert: Out-of-Distribution Visio-Linguistic Pretraining With Causality.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Grounded, Controllable and Debiased Image Completion With Lexical Semantics.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Modeling High-order Interactions across Multi-interests for Micro-video Reommendation (Student Abstract).
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Multichannel Attention Refinement for Video Question Answering.
ACM Trans. Multim. Comput. Commun. Appl., 2020

NGUARD+: An Attention-based Game Bot Detection Framework via Player Behavior Sequences.
ACM Trans. Knowl. Discov. Data, 2020

Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks.
IEEE Trans. Image Process., 2020

An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network.
IEEE Trans. Image Process., 2020

Query-Biased Self-Attentive Network for Query-Focused Video Summarization.
IEEE Trans. Image Process., 2020

Moment Retrieval via Cross-Modal Interaction Networks With Query Reconstruction.
IEEE Trans. Image Process., 2020

Leveraging Long and Short-Term Information in Content-Aware Movie Recommendation via Adversarial Training.
IEEE Trans. Cybern., 2020

An Advanced Deep Generative Framework for Temporal Link Prediction in Dynamic Networks.
IEEE Trans. Cybern., 2020

Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks.
IEEE Trans. Circuits Syst. Video Technol., 2020

Play and rewind: Context-aware video temporal action proposals.
Pattern Recognit., 2020

Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering.
Neural Process. Lett., 2020

Abstractive meeting summarization by hierarchical adaptive segmental network learning with multiple revising steps.
Neurocomputing, 2020

Bi-Decoder Augmented Network for Neural Machine Translation.
Neurocomputing, 2020

MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination.
CoRR, 2020

Grounded and Controllable Image Completion by Incorporating Lexical Semantics.
CoRR, 2020

Regional Relation Modeling for Visual Place Recognition.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

A Generic Network Compression Framework for Sequential Recommender Systems.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Poet: Product-oriented Video Captioner for E-commerce.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

DeVLBert: Learning Deconfounded Visio-Linguistic Representations.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

PopMAG: Pop Music Accompaniment Generation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Text-Guided Image Inpainting.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Stacked and Parallel U-Nets with Multi-output for Myocardial Pathology Segmentation.
Proceedings of the Myocardial Pathology Segmentation Combining Multi-Sequence Cardiac Magnetic Resonance Images, 2020

Comprehensive Information Integration Modeling Framework for Video Titling.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Do not Treat Boundaries and Regions Differently: An Example on Heart Left Atrial Segmentation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

FOANet: A Focus of Attention Network with Application to Myocardium Segmentation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

A Study of Non-autoregressive Model for Sequence Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Be Relevant, Non-Redundant, and Timely: Deep Reinforcement Learning for Real-Time Event Summarization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Convolutional Hierarchical Attention Network for Query-Focused Video Summarization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Multi-Speaker Video Dialog with Frame-Level Temporal Localization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Interactive Dual Generative Adversarial Networks for Image Captioning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Video Question Answering via Knowledge-based Progressive Spatial-Temporal Attention Network.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Multitask Learning for Cross-Domain Image Captioning.
IEEE Trans. Multim., 2019

Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks.
IEEE Trans. Image Process., 2019

Multi-Turn Video Question Answering via Hierarchical Attention Context Reinforced Networks.
IEEE Trans. Image Process., 2019

Investigating the transferring capability of capsule networks for text classification.
Neural Networks, 2019

Hierarchical human-like strategy for aspect-level sentiment classification with sentiment linguistic knowledge and reinforcement learning.
Neural Networks, 2019

Long Short-Term Memory Network Design for Analog Computing.
ACM J. Emerg. Technol. Comput. Syst., 2019

Abstractive Meeting Summarization via Hierarchical Adaptive Segmental Network Learning.
Proceedings of the World Wide Web Conference, 2019

Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Video Dialog via Multi-Grained Convolutional Self-Attention Context Networks.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

FastSpeech: Fast, Robust and Controllable Text to Speech.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Personalized Hashtag Recommendation for Micro-videos.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Multi-interaction Network with Object Relation for Video Question Answering.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

A Two-Stage Temporal-Like Fully Convolutional Network Framework for Left Ventricle Segmentation and Quantification on MR Images.
Proceedings of the Statistical Atlases and Computational Models of the Heart. Multi-Sequence CMR Segmentation, CRT-EPiggy and LV Full Quantification Challenges, 2019

Multi-modal Attention Network Learning for Semantic Source Code Retrieval.
Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Localizing Unseen Activities in Video via Image Query.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Weak Supervision Enhanced Generative Network for Question Generation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition.
Proceedings of the 36th International Conference on Machine Learning, 2019

Multilingual Neural Machine Translation with Knowledge Distillation.
Proceedings of the 7th International Conference on Learning Representations, 2019

Video Dialog via Progressive Inference and Cross-Transformer.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Cross-modal Image-Text Retrieval with Multitask Learning.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Exploring Human-Like Reading Strategy for Abstractive Text Summarization.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Location-Based End-to-End Speech Recognition with Multiple Language Models.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Answer Identification from Product Reviews for User Questions by Multi-Task Attentive Networks.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
SCSMiner: mining social coding sites for software developer recommendation with relevance propagation.
World Wide Web, 2018

Social-Aware Movie Recommendation via Multimodal Network Learning.
IEEE Trans. Multim., 2018

PurTreeClust: A Clustering Algorithm for Customer Segmentation from Massive Customer Transaction Data.
IEEE Trans. Knowl. Data Eng., 2018

A Better Way to Attend: Attention With Trees for Video Question Answering.
IEEE Trans. Image Process., 2018

qSwitch: Dynamical Off-Chip Bandwidth Allocation Between Local and Remote Accesses.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Refining object proposals using structured edge and superpixel contrast in robotic grasping.
Robotics Auton. Syst., 2018

Personalized response generation by Dual-learning based domain adaptation.
Neural Networks, 2018

Temporality-enhanced knowledge memory network for factoid question answering.
Frontiers Inf. Technol. Electron. Eng., 2018

A Multiple Input Floating Gate Based Arithmetic Logic Unit with a Feedback Loop for Digital Calibration.
J. Low Power Electron., 2018

Exploiting cross-source knowledge for warming up community question answering services.
Neurocomputing, 2018

The forgettable-watcher model for video question answering.
Neurocomputing, 2018

Question retrieval for community-based question answering via heterogeneous social influential network.
Neurocomputing, 2018

Calibration method to reduce the error in logarithmic conversion with its circuit implementation.
IET Circuits Devices Syst., 2018

Six-bit, reusable comparator stage-based asynchronous binary-search SAR ADC using smart switching network.
IET Circuits Devices Syst., 2018

Dial2Desc: End-to-end Dialogue Description Generation.
CoRR, 2018

Textually Guided Ranking Network for Attentional Image Retweet Modeling.
CoRR, 2018

Subgraph-augmented Path Embedding for Semantic User Search on Heterogeneous Social Network.
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018

Dialogue Act Recognition via CRF-Attentive Structured Network.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Investigating Deep Reinforcement Learning Techniques in Personalized Dialogue Generation.
Proceedings of the 2018 SIAM International Conference on Data Mining, 2018

MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Left Atrial Segmentation in a Few Seconds Using Fully Convolutional Network and Transfer Learning.
Proceedings of the Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges, 2018

NGUARD: A Game Bot Detection Framework for NetEase MMORPGs.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Interactive Paths Embedding for Semantic Proximity Search on Heterogeneous Graphs.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Improving automatic source code summarization via deep reinforcement learning.
Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018

Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

A Multi-task Learning Approach for Image Captioning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Attentional Image Retweet Modeling via Multi-Faceted Ranking Network Learning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Video question answering via multi-granularity temporal attention network learning.
Proceedings of the 10th International Conference on Internet Multimedia Computing and Service, 2018

Investigating Capsule Networks with Dynamic Routing for Text Classification.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Cross-domain Aspect/Sentiment-aware Abstractive Review Summarization.
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018

Improved Dynamic Memory Network for Dialogue Act Classification with Adversarial Training.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Distance-Aware DAG Embedding for Proximity Search on Heterogeneous Graphs.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

StackReader: An RNN-Free Reading Comprehension Model.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Multi-Label Community-Based Question Classification via Personalized Sequence Memory Network Learning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Temporal Interaction and Causal Influence in Community-Based Question Answering.
IEEE Trans. Knowl. Data Eng., 2017

Unifying the Video and Question Attentions for Open-Ended Video Question Answering.
IEEE Trans. Image Process., 2017

Augmented reality for enhancing tele-robotic system with force feedback.
Robotics Auton. Syst., 2017

A novel switchable pin method for regulating power in chip-multiprocessor.
Integr., 2017

Leveraging Long and Short-term Information in Content-aware Movie Recommendation.
CoRR, 2017

Keyword-based Query Comprehending via Multiple Optimized-Demand Augmentation.
CoRR, 2017

Smarnet: Teaching Machines to Read and Comprehend Like Human.
CoRR, 2017

The Forgettable-Watcher Model for Video Question Answering.
CoRR, 2017

MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension.
CoRR, 2017

User Personalized Satisfaction Prediction via Multiple Instance Deep Learning.
Proceedings of the 26th International Conference on World Wide Web, 2017

Learning Max-Margin GeoSocial Multimedia Network Representations for Point-of-Interest Suggestion.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Video Question Answering via Attribute-Augmented Attention Network Learning.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Personalized Response Generation via Domain adaptation.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Visual tracking and grasping of moving objects and its application to an industrial robot.
Proceedings of the 2017 IEEE International Conference on Real-time Computing and Robotics, 2017

Saliency based proposal refinement in robotic vision.
Proceedings of the 2017 IEEE International Conference on Real-time Computing and Robotics, 2017

Video Question Answering via Hierarchical Dual-Level Attention Network Learning.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Video Question Answering via Gradually Refined Attention over Appearance and Motion.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Detecting Temporal Proposal for Action Localization with Tree-structured Search Policy.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Compact Modeling of Graphene Barristor for Digital Integrated Circuit Design.
Proceedings of the 2017 IEEE Computer Society Annual Symposium on VLSI, 2017

Video Question Answering via Hierarchical Spatio-Temporal Attention Networks.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Microblog Sentiment Classification via Recurrent Random Walk Network Learning.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Link Prediction via Ranking Metric Dual-Level Attention Network Learning.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Identifying and Tracking Sentiments and Topics from Social Media Texts during Natural Disasters.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

An Image Contrast Enhancement Algorithm Using PLIP-Based Histogram Modification.
Proceedings of the 3rd IEEE International Conference on Cybernetics, 2017

Dual Learning for Cross-domain Image Captioning.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Integrating Side Information for Boosting Machine Comprehension.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Community-Based Question Answering via Asymmetric Multi-Faceted Ranking Network Learning.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
User Preference Learning for Online Social Recommendation.
IEEE Trans. Knowl. Data Eng., 2016

Graph Regularized Feature Selection with Data Reconstruction.
IEEE Trans. Knowl. Data Eng., 2016

USTF: A Unified System of Team Formation.
IEEE Trans. Big Data, 2016

Social recommendation via multi-view user preference learning.
Neurocomputing, 2016

Question Retrieval for Community-based Question Answering via Heterogeneous Network Integration Learning.
CoRR, 2016

User Personalized Satisfaction Prediction via Multiple Instance Deep Learning.
CoRR, 2016

Comparative study of logarithmic image processing models for medical image enhancement.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016

Partial Multi-Modal Sparse Coding via Adaptive Similarity Structure Regularization.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

A Low-Cost Mixed Clock Generator for High Speed Adiabatic Logic.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2016

Modeling of Graphene Nanoribbon Tunnel Field Effect Transistor in Verilog-A for Digital Circuit Design.
Proceedings of the IEEE International Symposium on Nanoelectronic and Information Systems, 2016

Expert Finding for Community-Based Question Answering via Ranking Metric Network Learning.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

PLIP based unsharp masking for medical image enhancement.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Crowdsourced Query Processing on Microblogs.
Proceedings of the Database Systems for Advanced Applications, 2016

Community-Based Question Answering via Heterogeneous Social Network Learning.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Expert Finding for Question Answering via Graph Regularized Matrix Completion.
IEEE Trans. Knowl. Data Eng., 2015

Probabilistic Convex Hull Queries over Uncertain Data.
IEEE Trans. Knowl. Data Eng., 2015

Powering Up Dark Silicon: Mitigating the Limitation of Power Delivery via Dynamic Pin Switching.
IEEE Trans. Emerg. Top. Comput., 2015

Efficient processing of optimal meeting point queries in Euclidean space and road networks.
Knowl. Inf. Syst., 2015

Efficient location-based search of trajectories with location importance.
Knowl. Inf. Syst., 2015

Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

An Algorithm Used in a Power Monitor to Mitigate Dark Silicon on VLSI Chip.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

Circuit Implementation of Switchable Pins in Chip Multiprocessor.
Proceedings of the IEEE International Symposium on Nanoelectronic and Information Systems, 2015

Mobile Query Recommendation via Tensor Function Learning.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Energy Consumption Prediction Based on Time-Series Models for CPU-Intensive Activities in the Cloud.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Crowd-Selection Query Processing in Crowdsourcing Databases: A Task-Driven Approach.
Proceedings of the 18th International Conference on Extending Database Technology, 2015

Cold-Start Expert Finding in Community Question Answering via Graph Regularization.
Proceedings of the Database Systems for Advanced Applications, 2015

A Comparative Study of Team Formation in Social Networks.
Proceedings of the Database Systems for Advanced Applications, 2015

2014
Mining Probabilistically Frequent Sequential Patterns in Large Uncertain Databases.
IEEE Trans. Knowl. Data Eng., 2014

Temporal Verification for Business Cloud Workflows: Open Research Issues.
Proceedings of the 2014 10th International Conference on Semantics, 2014

SocialTransfer: Transferring Social Knowledge for Cold-Start Crowdsourcing.
Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014

Truth Discovery in Data Streams: A Single-Pass Probabilistic Approach.
Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014

2013
A transfer learning based framework of crowd-selection on twitter.
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013

CrowdSeed: query processing on microblogs.
Proceedings of the Joint 2013 EDBT/ICDT Conferences, 2013

2012
A probabilistic convex hull query tool.
Proceedings of the 15th International Conference on Extending Database Technology, 2012

Mining probabilistically frequent sequential patterns in uncertain databases.
Proceedings of the 15th International Conference on Extending Database Technology, 2012

A model-based approach for RFID data stream cleansing.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Monochromatic and bichromatic reverse nearest neighbor queries on land surfaces.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Leveraging read rates of passive RFID tags for real-time indoor location tracking.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
Efficient Algorithms for Finding Optimal Meeting Point on Road Networks.
Proc. VLDB Endow., 2011

Study on the application of improved simulated annealing algorithm for several types of optimization problems.
Proceedings of the Seventh International Conference on Natural Computation, 2011

2010
0.18 μm CMOS integrated circuit design for impedance-based structural health monitoring.
IET Circuits Devices Syst., 2010

2009
Three Dimensional Geological Modeling from Component-based Topological Data Model.
Proceedings of the International Conference on Computer Modeling and Simulation, 2009

An Improved Symmetrical Modeling Method on 3D Tunnel Modeling.
Proceedings of the International Conference on Computer Modeling and Simulation, 2009

