Anoop Kunchukuttan
ORCID: 0009-0007-3143-9875
According to our database, Anoop Kunchukuttan authored at least 87 papers between 2012 and 2024.
Bibliography
2024
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models.
CoRR, 2024
IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages.
CoRR, 2024
RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization.
CoRR, 2024
Findings of WMT 2024's MultiIndic22MT Shared Task for Machine Translation of 22 Indian Languages.
Proceedings of the Ninth Conference on Machine Translation, 2024
CharSpan: Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024
IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages.
Trans. Mach. Learn. Res., 2023
Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.
CoRR, 2023
CoRR, 2023
Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages.
CoRR, 2023
Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages.
Proceedings of the IEEE International Conference on Acoustics, 2023
DecoMT: Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
CTQScorer: Combining Multiple Features for In-context Example Selection for Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Bhasa-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023
Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages.
Trans. Assoc. Comput. Linguistics, 2022
CoRR, 2022
CoRR, 2022
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 9th Workshop on Asian Translation, 2022
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
CoRR, 2021
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages.
CoRR, 2021
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Proceedings of the 8th Workshop on Asian Translation, 2021
Proceedings of the 8th Workshop on Asian Translation, 2021
2020
AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages.
CoRR, 2020
Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent.
CoRR, 2020
Proceedings of the Fifth Conference on Machine Translation, 2020
Proceedings of the 5th Workshop on Representation Learning for NLP, 2020
iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020
Proceedings of the 28th International Conference on Computational Linguistics, 2020
Proceedings of the 7th Workshop on Asian Translation, 2020
2019
Trans. Assoc. Comput. Linguistics, 2019
Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
Proceedings of the 6th Workshop on Asian Translation, 2019
2018
Trans. Assoc. Comput. Linguistics, 2018
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018
Proceedings of the 32nd Pacific Asia Conference on Language, 2018
NICT's Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers.
Proceedings of the 32nd Pacific Asia Conference on Language, 2018
Multilingual Indian Language Translation System at WAT 2018: Many-to-one Phrase-based SMT.
Proceedings of the 32nd Pacific Asia Conference on Language, 2018
Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018
2017
Utilizing Lexical Similarity for pivot translation involving resource-poor, related languages.
CoRR, 2017
Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017
Learning variable length units for SMT between related languages via Byte Pair Encoding.
Proceedings of the First Workshop on Subword and Character Level Models in NLP, 2017
Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation.
Proceedings of the 4th Workshop on Asian Translation, 2017
2016
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016
Proceedings of the Tutorial Abstracts, 2016
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016
IIT Bombay's English-Indonesian submission at WAT: Integrating Neural Language Models with SMT.
Proceedings of the 3rd Workshop on Asian Translation, 2016
2015
Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015
Proceedings of the 12th International Conference on Natural Language Processing, 2015
Investigating the potential of post-ordering SMT output to improve translation quality.
Proceedings of the 12th International Conference on Natural Language Processing, 2015
Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization.
Proceedings of the 12th International Conference on Natural Language Processing, 2015
Data representation methods and use of mined corpora for Indian language transliteration.
Proceedings of the Fifth Named Entity Workshop, 2015
2014
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
Proceedings of the 11th International Conference on Natural Language Processing, 2014
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, 2014
2013
IITB System for CoNLL 2013 Shared Task: A Hybrid Approach to Grammatical Error Correction.
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, 2013
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013
2012
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Proceedings of the Workshop on Reordering for Statistical Machine Translation@COLING 2012, 2012