Shiliang Zhang

ORCID: 0000-0002-9524-1602

According to our database, Shiliang Zhang authored at least 244 papers between 2008 and 2024.

Bibliography

2024
TPTE: Text-Guided Patch Token Exploitation for Unsupervised Fine-Grained Representation Learning.
ACM Trans. Multim. Comput. Commun. Appl., November, 2024

Adaptive Robust Tracking Control With Active Learning for Linear Systems With Ellipsoidal Bounded Uncertainties.
IEEE Trans. Autom. Control., November, 2024

Open Set Recognition in Real World.
Int. J. Comput. Vis., August, 2024

Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition.
ACM Trans. Multim. Comput. Commun. Appl., April, 2024

Intra-Inter Domain Similarity for Unsupervised Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

Switched Surplus-Based Distributed Security Dispatch for Smart Grid With Persistent Packet Loss.
IEEE Internet Things J., February, 2024

Adapting Vision-Language Models via Learning to Inject Knowledge.
IEEE Trans. Image Process., 2024

Robust Fine-Grained Visual Recognition With Neighbor-Attention Label Correction.
IEEE Trans. Image Process., 2024

Graph-based social relation inference with multi-level conditional attention.
Neural Networks, 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.
CoRR, 2024

Long-distance Geomagnetic Navigation in GNSS-denied Environments with Deep Reinforcement Learning.
CoRR, 2024

Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study.
CoRR, 2024

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition.
CoRR, 2024

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR.
CoRR, 2024

Dataset Distillation for Histopathology Image Classification.
CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.
CoRR, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.
CoRR, 2024

MV-VTON: Multi-View Virtual Try-On with Diffusion Models.
CoRR, 2024

A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models.
CoRR, 2024

P<sup>4</sup>S: Privacy-Preserving Personalized Pricing Scheme for Smart Grid.
Proceedings of the IEEE International Conference on Communications, 2024

The impact of integration of renewable energy on imbalance settlement: Resilience analysis.
Proceedings of the IEEE International Conference on Communications, 2024

CoTuning: A Large-Small Model Collaborating Distillation Framework for Better Model Generalization.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

LCB-Net: Long-Context Biasing for Audio-Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus.
Proceedings of the IEEE International Conference on Acoustics, 2024

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability.
Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

FunCodec: A Fundamental, Reproducible and Integrable Open-Source Toolkit for Neural Speech Codec.
Proceedings of the IEEE International Conference on Acoustics, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024

Robust Quadratic Optimal Control of Linear Systems with Ellipsoid-Set Learning.
Proceedings of the European Control Conference, 2024

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Spatial-Aware Regression for Keypoint Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OVMR: Open-Vocabulary Recognition with Multi-Modal References.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Recognizing Ultra-High-Speed Moving Objects with Bio-Inspired Spike Camera.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Decoupled Contrastive Learning for Long-Tailed Recognition.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Decoupled Optimisation for Long-Tailed Visual Recognition.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Multi-proxy feature learning for robust fine-grained visual recognition.
Pattern Recognit., November, 2023

DCR-ReID: Deep Component Reconstruction for Cloth-Changing Person Re-Identification.
IEEE Trans. Circuits Syst. Video Technol., August, 2023

Contextual Instance Decoupling for Instance-Level Human Analysis.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Evaluation of Open-Source Tools for Differential Privacy.
Sensors, July, 2023

MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking.
IEEE Trans. Multim., 2023

PolarPose: Single-Stage Multi-Person Pose Estimation in Polar Coordinates.
IEEE Trans. Image Process., 2023

Efficient Vision Transformer via Token Merger.
IEEE Trans. Image Process., 2023

A CIF-Based Speech Segmentation Method for Streaming E2E ASR.
IEEE Signal Process. Lett., 2023

Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures.
CoRR, 2023

Privacy-preserving transactive energy systems: Key topics and open research challenges.
CoRR, 2023

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models.
CoRR, 2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

Exploring RWKV for Memory Efficient and Low Latency Streaming ASR.
CoRR, 2023

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation.
CoRR, 2023

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer.
CoRR, 2023

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus.
CoRR, 2023

MixBCT: Towards Self-Adapting Backward-Compatible Training.
CoRR, 2023

Adaptive robust tracking control with active learning for linear systems with ellipsoidal bounded uncertainties.
CoRR, 2023

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability.
CoRR, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.
CoRR, 2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model.
CoRR, 2023

Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Recognizing High-Speed Moving Objects with Spike Camera.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

HumVis: Human-Centric Visual Analysis System.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CASA-ASR: Context-Aware Speaker-Attributed ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BAT: Boundary aware transducer for memory-efficient and low-latency ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

3D Human Mesh Recovery with Sequentially Global Rotation Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ParCNetV2: Oversized Kernel with Enhanced Attention<sup>*</sup>.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TOLD: a Novel Two-Stage Overlap-Aware Framework for Speaker Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Evolved Part Masking for Self-Supervised Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Introduction to the Special Issue on Fine-Grained Visual Recognition and Re-Identification.
ACM Trans. Multim. Comput. Commun. Appl., 2022

Distillation-Guided Residual Learning for Binary Convolutional Neural Networks.
IEEE Trans. Neural Networks Learn. Syst., 2022

Bidirectional Posture-Appearance Interaction Network for Driver Behavior Recognition.
IEEE Trans. Intell. Transp. Syst., 2022

Large-Scale Spatio-Temporal Person Re-Identification: Algorithms and Benchmark.
IEEE Trans. Circuits Syst. Video Technol., 2022

Who is closer: A computational method for domain gap evaluation.
Pattern Recognit., 2022

Pose-Guided Representation Learning for Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

BDCN: Bi-Directional Cascade Network for Perceptual Edge Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Unsupervised Person Re-Identification via Multi-Label Classification.
Int. J. Comput. Vis., 2022

Deep Active Learning for Computer Vision: Past and Future.
CoRR, 2022

ParCNetV2: Oversized Kernel with Enhanced Attention.
CoRR, 2022

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario.
CoRR, 2022

ALBench: A Framework for Evaluating Active Learning in Object Detection.
CoRR, 2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios.
CoRR, 2022

Extended vehicle energy dataset (eVED): an enhanced large-scale dataset for deep learning on vehicle trip energy consumption.
CoRR, 2022

An Evaluation of Open-source Tools for the Provision of Differential Privacy.
CoRR, 2022

Contextualize differential privacy in image database: a lightweight image differential privacy approach based on principle component analysis inverse.
CoRR, 2022

MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Asymmetric Label Propagation for Video Object Segmentation.
Proceedings of the 4th ACM International Conference on Multimedia in Asia, 2022

Separate-to-Recognize: Joint Multi-target Speech Separation and Speech Recognition for Speaker-attributed ASR.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Towards Language-universal Mandarin-English Speech Recognition with Unsupervised Label Synchronous Adaptation.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

SpikingSIM: A Bio-Inspired Spiking Simulator.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Transformer-Based Domain Adaptation for Event Data Classification.
Proceedings of the IEEE International Conference on Acoustics, 2022

Modeling The Detection Capability Of High-Speed Spiking Cameras.
Proceedings of the IEEE International Conference on Acoustics, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

M2Met: The Icassp 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Prosospeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Contextual Instance Decoupling for Robust Multi-Person Pose Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Domain Generalization Capability Enhancement for Binary Neural Networks.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Progressive Feature Enhancement for Person Re-Identification.
IEEE Trans. Image Process., 2021

Multi-View Spatial Attention Embedding for Vehicle Re-Identification.
IEEE Trans. Circuits Syst. Video Technol., 2021

Diverse part attentive network for video-based person re-identification.
Pattern Recognit. Lett., 2021

Viewpoint and Scale Consistency Reinforcement for UAV Vehicle Re-Identification.
Int. J. Comput. Vis., 2021

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information.
CoRR, 2021

BeamTransformer: Microphone Array-based Overlapping Speech Detection.
CoRR, 2021

Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation.
CoRR, 2021

Large-Scale Spatio-Temporal Person Re-identification: Algorithm and Benchmark.
CoRR, 2021

AAformer: Auto-Aligned Transformer for Person Re-Identification.
CoRR, 2021

Simplified Self-Attention for Transformer-Based end-to-end Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Hybrid Network Compression via Meta-Learning.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

An Energy Consumption Model for Electrical Vehicle Networks via Extended Federated-learning.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2021

Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Extremely Low Footprint End-to-End ASR System for Smart Device.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Intra-Inter Camera Similarity for Unsupervised Person Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Group-Group Loss-Based Global-Regional Feature Learning for Vehicle Re-Identification.
IEEE Trans. Image Process., 2020

Multi-Scale Temporal Cues Learning for Video Person Re-Identification.
IEEE Trans. Image Process., 2020

CDbin: Compact Discriminative Binary Descriptor Learned With Efficient Neural Network.
IEEE Trans. Circuits Syst. Video Technol., 2020

E<sup>2</sup>BoWs: An end-to-end Bag-of-Words model via deep convolutional neural network for image retrieval.
Neurocomputing, 2020

Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model.
CoRR, 2020

Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification.
CoRR, 2020

Domain Adaptive Person Re-Identification via Coupling Optimization.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Pan: Phoneme-Aware Network for Monaural Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-identification.
Proceedings of the Computer Vision - ECCV 2020, 2020

Robust Partial Matching for Person Search in the Wild.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
GLAD: Global-Local-Alignment Descriptor for Scalable Person Re-Identification.
IEEE Trans. Multim., 2019

Deep Representation Learning With Part Loss for Person Re-Identification.
IEEE Trans. Image Process., 2019

DR<sup>2</sup>-Net: Deep Residual Reconstruction Network for image compressive sensing.
Neurocomputing, 2019

An outlier detection scheme for dynamical sequential datasets.
Commun. Stat. Simul. Comput., 2019

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition.
CoRR, 2019

EAGER: Edge-Aided imaGe undERstanding System.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

Self-Guided Hash Coding for Large-Scale Person Re-identification.
Proceedings of the 2nd IEEE Conference on Multimedia Information Processing and Retrieval, 2019

Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards Language-Universal Mandarin-English Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Resolution-invariant Person Re-Identification.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Global-Local Temporal Representations for Video Person Re-Identification.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Robust Audio-visual Speech Recognition Using Bimodal Dfsmn with Multi-condition Training and Dropout Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2019

Investigation of Modeling Units for Mandarin Speech Recognition Using Dfsmn-ctc-smbr.
Proceedings of the IEEE International Conference on Acoustics, 2019

Bi-Directional Cascade Network for Perceptual Edge Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Multi-Scale 3D Convolution Network for Video Based Person Re-Identification.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Sequential Outlier Criterion for Sparsification of Online Adaptive Filtering.
IEEE Trans. Neural Networks Learn. Syst., 2018

Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching.
IEEE Trans. Multim., 2018

AutoBD: Automated Bi-Level Description for Scalable Fine-Grained Visual Categorization.
IEEE Trans. Image Process., 2018

Interacting Tracklets for Multi-Object Tracking.
IEEE Trans. Image Process., 2018

Learning Affective Features With a Hybrid Deep Model for Audio-Visual Emotion Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2018

Multi-type attributes driven multi-camera person re-identification.
Pattern Recognit., 2018

Multi-Task Learning with Low Rank Attribute Embedding for Multi-Camera Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

The Scale Effect on Spatial Interaction Patterns: An Empirical Study Using Taxi O-D data of Beijing and Shanghai.
IEEE Access, 2018

SCAN: Spatial and Channel Attention Network for Vehicle Re-Identification.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

VP-ReID: Vehicle and Person Re-Identification System.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

RAM: A Region-Aware Deep Model for Vehicle Re-Identification.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Deep-FSMN for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Feed-Forward Sequential Memory Networks for Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Nonrecurrent Neural Structure for Long-Term Dependence.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition.
Pattern Recognit., 2017

Attributes driven tracklet-to-tracklet person re-identification using latent prototypes space mapping.
Pattern Recognit., 2017

DSP: Discriminative Spatial Part modeling for Fine-Grained Visual Categorization.
Image Vis. Comput., 2017

E<sup>2</sup>BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network.
CoRR, 2017

Deep Representation Learning with Part Loss for Person Re-Identification.
CoRR, 2017

DR<sup>2</sup>-Net: Deep Residual Reconstruction Network for Image Compressive Sensing.
CoRR, 2017

One-Shot Fine-Grained Instance Retrieval.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Gaussian Prediction Based Attention for Online End-to-End Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Large-scale person re-identification as retrieval.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Pose-Driven Deep Convolutional Model for Person Re-identification.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Feedforward sequential memory networks based encoder-decoder model for machine translation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Learning the number of nodes in DNNs with activation mask.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Coarse-to-Fine Description for Fine-Grained Visual Categorization.
IEEE Trans. Image Process., 2016

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks.
J. Mach. Learn. Res., 2016

Neural Networks Models for Entity Discovery and Linking.
CoRR, 2016

Note on the perfect EIC-graphs.
Appl. Math. Comput., 2016

The USTC NELSLIP Systems for Trilingual Entity Detection and Linking Tasks at TAC KBP 2016.
Proceedings of the 2016 Text Analysis Conference, 2016

USTC at NTCIR-12 STC Task.
Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, 2016

Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Learning FOFE based FNN-LMs with noise contrastive estimation and part-of-speech features.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Future Context Attention for Unidirectional LSTM Based Acoustic Model.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep Attributes Driven Multi-camera Person Re-identification.
Proceedings of the Computer Vision - ECCV 2016, 2016

2015
Cross Indexing With Grouplets.
IEEE Trans. Multim., 2015

An Attribute-Assisted Reranking Model for Web Image Search.
IEEE Trans. Image Process., 2015

Semantic-Aware Co-Indexing for Image Retrieval.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Multi-order visual phrase for scalable partial-duplicate visual search.
Multim. Syst., 2015

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency.
CoRR, 2015

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models.
CoRR, 2015

Feedforward Sequential Memory Neural Networks without Recurrent Feedback.
CoRR, 2015

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks.
CoRR, 2015

Orientational Spatial Part Modeling for Fine-Grained Visual Categorization.
Proceedings of the 2015 IEEE International Conference on Mobile Services, MS 2015, New York City, NY, USA, June 27, 2015

Augmented Feature Fusion for Image Retrieval System.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Rectified linear neural networks with tied-scalar regularization for LVCSR.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Efficient indexing for large-scale image search.
Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, 2015

Multi-Task Learning with Low Rank Attribute Embedding for Person Re-Identification.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014
USB: Ultrashort Binary Descriptor for Fast Visual Matching and Retrieval.
IEEE Trans. Image Process., 2014

Cascade Category-Aware Visual Search.
IEEE Trans. Image Process., 2014

Indexing heterogeneous features with superimages.
Int. J. Multim. Inf. Retr., 2014

Embedding Multi-Order Spatial Clues for Scalable Visual Matching and Retrieval.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2014

ObjectPatchNet: Towards scalable and semantic image annotation and retrieval.
Comput. Vis. Image Underst., 2014

Personalized Visual Vocabulary Adaption for Social Image Retrieval.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Superimage: Packing Semantic-Relevant Images for Indexing and Retrieval.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Improving deep neural networks for LVCSR using dropout and shrinking structure.
Proceedings of the IEEE International Conference on Acoustics, 2014

Hybrid-Indexing Multi-type Features for Large-Scale Image Search.
Proceedings of the Computer Vision - ACCV 2014, 2014

2013
Edge-SIFT: Discriminative Binary Descriptor for Scalable Partial-Duplicate Mobile Search.
IEEE Trans. Image Process., 2013

Learning attribute-aware dictionary for image classification and search.
Proceedings of the International Conference on Multimedia Retrieval, 2013

Multi-order visual phrase for scalable image search.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2013

Scalable mobile search with binary phrase.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2013

2011
Generating Descriptive Visual Words and Visual Phrases for Large-Scale Image Applications.
IEEE Trans. Image Process., 2011

Building descriptive and discriminative visual codebook for large-scale image applications.
Multim. Tools Appl., 2011

Modeling spatial and semantic cues for large-scale near-duplicated image retrieval.
Comput. Vis. Image Underst., 2011

ObjectBook construction for large-scale semantic-aware image retrieval.
Proceedings of the IEEE 13th International Workshop on Multimedia Signal Processing (MMSP 2011), 2011

2010
Affective Visualization and Retrieval for Music Video.
IEEE Trans. Multim., 2010

Correlation-Based Feature Selection and Regression.
Proceedings of the Advances in Multimedia Information Processing - PCM 2010, 2010

Building contextual visual vocabulary for large-scale image applications.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Building pair-wise visual word tree for efficent image re-ranking.
Proceedings of the IEEE International Conference on Acoustics, 2010

Music video affective understanding using feature importance analysis.
Proceedings of the 9th ACM International Conference on Image and Video Retrieval, 2010

2009
Descriptive visual words and visual phrases for image applications.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Utilizing affective analysis for efficient movie browsing.
Proceedings of the International Conference on Image Processing, 2009

2008
Personalized MTV Affective Analysis Using User Profile.
Proceedings of the Advances in Multimedia Information Processing, 2008

i.MTV: an integrated system for mtv affective analysis.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Affective MTV analysis based on arousal and valence features.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008
