Nakamasa Inoue

Orcid: 0000-0002-9761-4142

According to our database, Nakamasa Inoue authored at least 84 papers between 2009 and 2024.

Bibliography

2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis.
CoRR, 2024

Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering.
CoRR, 2024

CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information.
CoRR, 2024

AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Pseudo-Outlier Synthesis Using Q-Gaussian Distributions for Out-of-Distribution Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024

PolarDB: Formula-Driven Dataset for Pre-Training Trajectory Encoders.
Proceedings of the IEEE International Conference on Acoustics, 2024

Cubic Knowledge Distillation for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Formula-Supervised Visual-Geometric Pre-training.
Proceedings of the Computer Vision - ECCV 2024, 2024

Rethinking Image Super-Resolution from Training Data Perspectives.
Proceedings of the Computer Vision - ECCV 2024, 2024

Scaling Backwards: Minimal Synthetic Pre-Training?
Proceedings of the Computer Vision - ECCV 2024, 2024

Augmenting Pass Prediction via Imitation Learning in Soccer Simulations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Target Propagation by Deriving Analytical Solution.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Text-Guided Object Detector for Multi-modal Video Question Answering.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scale-space Tokenization for Improving the Robustness of Vision Transformers.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Pre-training Vision Transformers with Very Limited Synthesized Images.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Parameter Efficient Transfer Learning for Various Speech Processing Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Step restriction for improving adversarial attacks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning with Partial Forgetting in Modern Hopfield Networks.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Fixed-Weight Difference Target Propagation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Pre-Training Without Natural Images.
Int. J. Comput. Vis., 2022

Spatiotemporal Initialization for 3D CNNs with Generated Motion Patterns.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

PoF: Post-Training of Feature Extractor for Improving Generalization.
Proceedings of the International Conference on Machine Learning, 2022

Downstream Augmentation Generation For Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Replacing Labeled Real-image Datasets with Auto-generated Contours.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Can Vision Transformers Learn without Natural Images?
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network.
IEICE Trans. Inf. Syst., 2021

Can Vision Transformers Learn without Natural Images?
CoRR, 2021

Learning VAE with Categorical Labels for Generating Conditional Handwritten Characters.
Proceedings of the 17th International Conference on Machine Vision and Applications, 2021

Disentangling Latent Groups Of Factors.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Formula-driven Supervised Learning with Recursive Tiling Patterns.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Teacher-Assisted Mini-Batch Sampling for Blind Distillation Using Metric Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021

Augmentation-Agnostic Regularization for Unsupervised Contrastive Learning with Its Application to Speaker Verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Graph Grouping Loss for Metric Learning of Face Image Representations.
Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing, 2020

Tokyo Tech at TRECVID 2020: Relation Modeling for Video Action Detection.
Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Initialization Using Perlin Noise for Training Networks with a Limited Amount of Data.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Deep Video Understanding of Character Relationships in Movies.
Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

Closed-Form Pre-Training for Small-Sample Environmental Sound Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Semi-Supervised Contrastive Learning with Generalized Contrastive Loss and Its Application to Speaker Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Optimizing Speaker Embeddings using Meta-Training Sets.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Quasi-Newton Adversarial Attacks on Speaker Verification Systems.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
VANT at TRECVID 2018.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Few-Shot Adaptation for Multimedia Semantic Indexing.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Task Autoencoder for Noise-Robust Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition.
Proceedings of the British Machine Vision Conference 2018, 2018

2017
Cross-view human action recognition from depth maps using spectral graph sequences.
Comput. Vis. Image Underst., 2017

TokyoTech-AIST at TRECVID 2017: Multimedia Event Detection Using Deep CNNs and Zero-Shot Classifiers.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

User adaptation of convolutional neural network for human activity recognition.
Proceedings of the 25th European Signal Processing Conference, 2017

Multimodal speech recognition using mouth images from depth camera.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A unified network for multi-speaker speech recognition with multi-channel recordings.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Fast Coding of Feature Vectors Using Neighbor-to-Neighbor Search.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

TokyoTech at TRECVID 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Adaptation of Word Vectors using Tree Structure for Visual Semantics.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Tokyo Tech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Graph regularized implicit pose for 3D human action recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
TokyoTech at TRECVID 2015.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Vocabulary Expansion Using Word Vectors for Video Semantic Indexing.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Combining Audio Features and Visual I-Vector @ MediaEval 2015 Multimodal Person Discovery in Broadcast TV.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

2014
TokyoTech-Waseda at TRECVID 2014.
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Event Detection by Velocity Pyramid.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

n-gram Models for Video Semantic Indexing.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Spectral Graph Skeletons for 3D Action Recognition.
Proceedings of the Computer Vision - ACCV 2014, 2014

2013
Reusing Speech Techniques for Video Semantic Indexing [Applications Corner].
IEEE Signal Process. Mag., 2013

q-Gaussian mixture models for image and video semantic indexing.
J. Vis. Commun. Image Represent., 2013

Event detection in consumer videos using GMM supervectors and SVMs.
EURASIP J. Image Video Process., 2013

TokyoTechCanon at TRECVID 2013.
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors.
Proceedings of the IEEE International Conference on Computer Vision, 2013

2012
A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors.
IEEE Trans. Multim., 2012

TokyoTechCanon at TRECVID 2012.
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Multimedia event detection using GMM supervectors and SVMS.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

q-Gaussian Mixture Models Based on Non-extensive Statistics for Image and Video Semantic Indexing.
Proceedings of the Computer Vision, 2012

2011
TokyoTech+Canon at TRECVID 2011.
Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

A fast MAP adaptation technique for gmm-supervector-based video semantic indexing systems.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

2010
TT+GT at TRECVID 2010 Workshop.
Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

High-Level Feature Extraction Using SIFT GMMs and Audio Models.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

2009
TITGT at TRECVID 2009 Workshop.
Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

