Sampo Pyysalo

ORCID: 0000-0002-6279-5000

Affiliations:
  • University of Cambridge, Department of Theoretical and Applied Linguistics, UK


According to our database, Sampo Pyysalo authored at least 120 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
RegulaTome: a corpus of typed, directed, and signed relations between biomedical entities in the scientific literature.
Database J. Biol. Databases Curation, January, 2024

A Survey of Large Language Models for European Languages.
CoRR, 2024

Poro 34B and the Blessing of Multilinguality.
CoRR, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order.
CoRR, 2024

A New Massive Multilingual Dataset for High-Performance Language Technologies.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Building Question-Answer Data Using Web Register Identification.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
Register identification from the unrestricted open Web using the Corpus of Online Registers of English.
Lang. Resour. Evaluation, September, 2023

The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest.
Nucleic Acids Res., January, 2023

Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical-protein relations.
Database J. Biol. Databases Curation, 2023

Toxicity Detection in Finnish Using Machine Translation.
Proceedings of the 24th Nordic Conference on Computational Linguistics, 2023

Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction.
Proceedings of the 24th Nordic Conference on Computational Linguistics, 2023

Scaling Data-Constrained Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023


Silver Syntax Pre-training for Cross-Domain Relation Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Towards better structured and less noisy Web data: Oscar with Register annotations.
Proceedings of the Eighth Workshop on Noisy User-generated Text, 2022

2021
The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets.
Nucleic Acids Res., 2021

Explaining Classes through Word Attribution.
CoRR, 2021

Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases.
CoRR, 2021

Deep learning for sentence clustering in essay grading support.
CoRR, 2021

WikiBERT Models: Deep Transfer Learning for Many Languages.
Proceedings of the 23rd Nordic Conference on Computational Linguistics, 2021

Fine-grained Named Entity Annotation for Finnish.
Proceedings of the 23rd Nordic Conference on Computational Linguistics, 2021

Deep learning for sentence clustering in essay grading support.
Proceedings of the 14th International Conference on Educational Data Mining, 2021

Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, 2021

2020
Towards Fully Bilingual Deep Language Modeling.
CoRR, 2020

Dependency parsing of biomedical text with BERT.
BMC Bioinform., 2020

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

A Broad-coverage Corpus for Finnish Named Entity Recognition.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Turku Enhanced Parser Pipeline: From Raw Text to Enhanced Graphs in the IWPT 2020 Shared Task.
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies, 2020

The birth of Romanian BERT.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Exploring Cross-sentence Contexts for Named Entity Recognition with BERT.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

From Web Crawl to Clean Register-Annotated Corpora.
Proceedings of the 12th Web as Corpus Workshop, 2020

2019
Multilingual is not enough: BERT for Finnish.
CoRR, 2019

A neural classification method for supporting the creation of BioVerbNet.
J. Biomed. Semant., 2019

LION LBD: a literature-based discovery system for cancer biology.
Bioinform., 2019

Toward Multilingual Identification of Online Registers.
Proceedings of the 22nd Nordic Conference on Computational Linguistics, NoDaLiDa 2019, Turku, Finland, September 30, 2019

Neural Dependency Parsing of Biomedical Text: TurkuNLP entry in the CRAFT Structural Annotation Task.
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019

Biomedical Named Entity Recognition with Multilingual BERT.
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019

CRAFT Shared Tasks 2019 Overview - Integrated Structure, Semantics, and Coreference.
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019

2018
Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches.
BMC Bioinform., 2018

Bio-SimVerb and Bio-SimLex: wide-coverage evaluation sets of word similarity in biomedicine.
BMC Bioinform., 2018

2017
A neural network multi-task learning approach to biomedical named entity recognition.
BMC Bioinform., 2017

Cancer Hallmarks Analytics Tool (CHAT): a text mining approach to organize and evaluate scientific literature on cancer.
Bioinform., 2017

Fully Delexicalized Contexts for Syntax-Based Word Embeddings.
Proceedings of the Fourth International Conference on Dependency Linguistics, 2017


2016
Cell line name recognition in support of the identification of synthetic lethality in cancer from text.
Bioinform., 2016

Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance.
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 2016

Typed Entity and Relation Annotation on Computer Science Papers.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Universal Dependencies v1: A Multilingual Treebank Collection.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Attending to Characters in Neural Sequence Labeling Models.
Proceedings of the COLING 2016, 2016

Cancer Hallmark Text Classification Using Convolutional Neural Networks.
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining, 2016

Deep Learning with Minimal Training Data: TurkuNLP Entry in the BioNLP Shared Task 2016.
Proceedings of the 4th BioNLP Shared Task Workshop, BioNLP 2016, 2016

How to Train good Word Embeddings for Biomedical NLP.
Proceedings of the 15th Workshop on Biomedical Natural Language Processing, 2016

2015
Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013.
BMC Bioinform., December, 2015

Universal Dependencies for Finnish.
Proceedings of the 20th Nordic Conference of Computational Linguistics, 2015

Towards the Classification of the Finnish Internet Parsebank: Detecting Translations and Informality.
Proceedings of the 20th Nordic Conference of Computational Linguistics, 2015

SETS: Scalable and Efficient Tree Search in Dependency Graphs.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Towards Universal Web Parsebanks.
Proceedings of the Third International Conference on Dependency Linguistics, 2015

Sharing annotations better: RESTful Open Annotation.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014
Generalising semantic category disambiguation with large lexical resources for fun and profit.
J. Biomed. Semant., 2014

Anatomical entity mention recognition at literature scale.
Bioinform., 2014

2013
Wide coverage biomedical event extraction using multiple partially overlapping corpora.
BMC Bioinform., 2013

BioCause: Annotating and analysing causality in the biomedical domain.
BMC Bioinform., 2013

A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text.
Bioinform., 2013

Overview of the Cancer Genetics (CG) task of BioNLP Shared Task 2013.
Proceedings of the BioNLP Shared Task 2013 Workshop, Sofia, 2013

Overview of the Pathway Curation (PC) task of BioNLP Shared Task 2013.
Proceedings of the BioNLP Shared Task 2013 Workshop, Sofia, 2013

Overview of BioNLP Shared Task 2013.
Proceedings of the BioNLP Shared Task 2013 Workshop, Sofia, 2013

2012
Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011.
BMC Bioinform., 2012

Event extraction across multiple levels of biological organization.
Bioinform., 2012

brat: a Web-based Tool for NLP-Assisted Text Annotation.
Proceedings of the EACL 2012, 2012

New Resources and Perspectives for Biomedical Event Extraction.
Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, 2012

PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations.
Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, 2012

Bridging the Gap Between Scope-based and Event-based Negation/Speculation Annotations: A Bridge Not Too Far.
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, 2012

2011
Extracting Bio-molecular Events from literature - the BioNLP'09 Shared Task.
Comput. Intell., 2011

U-Compare bio-event meta-service: compatible BioNLP event extraction services.
BMC Bioinform., 2011

Towards mature use of semantic resources for biomedical analyses.
J. Biomed. Semant., 2011

An analysis of gene/protein associations at PubMed scale.
J. Biomed. Semant., 2011

Event extraction for DNA methylation.
J. Biomed. Semant., 2011

Ontology design patterns to disambiguate relations between genes and gene products in GENIA.
J. Biomed. Semant., 2011

BioNLP Shared Task 2011: Supporting Resources.
Proceedings of BioNLP Shared Task 2011 Workshop, Portland, Oregon, USA, June 24, 2011, 2011

SimSem: Fast Approximate String Matching in Relation to Semantic Category Disambiguation.
Proceedings of the 2011 Workshop on Biomedical Natural Language Processing, 2011

Overview of the Entity Relations (REL) supporting task of BioNLP Shared Task 2011.
Proceedings of BioNLP Shared Task 2011 Workshop, Portland, Oregon, USA, June 24, 2011, 2011

Overview of the Infectious Diseases (ID) task of BioNLP Shared Task 2011.
Proceedings of BioNLP Shared Task 2011 Workshop, Portland, Oregon, USA, June 24, 2011, 2011

Towards Exhaustive Event Extraction for Protein Modifications.
Proceedings of the 2011 Workshop on Biomedical Natural Language Processing, 2011

Overview of the Epigenetics and Post-translational Modifications (EPI) task of BioNLP Shared Task 2011.
Proceedings of BioNLP Shared Task 2011 Workshop, Portland, Oregon, USA, June 24, 2011, 2011

From Pathways to Biomolecular Events: Opportunities and Challenges.
Proceedings of the 2011 Workshop on Biomedical Natural Language Processing, 2011

Overview of BioNLP Shared Task 2011.
Proceedings of BioNLP Shared Task 2011 Workshop, Portland, Oregon, USA, June 24, 2011, 2011

2010
Improving the Inter-Corpora Compatibility for protein Annotations.
J. Bioinform. Comput. Biol., 2010

A Re-Evaluation of Biomedical Named Entity-Term Relations.
J. Bioinform. Comput. Biol., 2010

Entities, relations, events: representing biomolecular semantics.
BMC Bioinform., 2010

Medie and Info-pubmed: 2010 update.
BMC Bioinform., 2010

Event extraction on PubMed scale.
BMC Bioinform., 2010

Complex event extraction at PubMed scale.
Bioinform., 2010

Applying ontology design patterns to the implementation of relations in GENIA.
Proceedings of the Fourth International Symposium for Semantic Mining in Biomedicine, 2010

Evaluating Dependency Representations for Event Extraction.
Proceedings of the COLING 2010, 2010

Towards Event Extraction from Full Texts on Infectious Diseases.
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, 2010

Event Extraction for Post-Translational Modifications.
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, 2010

A Comparative Study of Syntactic Parsers for Event Extraction.
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, 2010

Integration of Static Relations to Enhance Event Extraction from Text.
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, 2010

Scaling up Biomedical Event Extraction to the Entire PubMed.
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, 2010

2009
Matrix representations, linear transformations, and kernels for disambiguation in natural language.
Mach. Learn., 2009

Towards automated processing of clinical Finnish: Sublanguage analysis and a rule-based parser.
Int. J. Medical Informatics, 2009

Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application.
Int. J. Medical Informatics, 2009

Investigating heterogeneous protein annotations toward cross-corpora utilization.
BMC Bioinform., 2009

Learning to Extract Biological Event and Relation Graphs.
Proceedings of the 17th Nordic Conference of Computational Linguistics, 2009

Static Relations: a Piece in the Biomedical Information Extraction Puzzle.
Proceedings of the BioNLP Workshop, BioNLP@HLT-NAACL 2009, 2009

Incorporating GENETAG-style annotation to GENIA corpus.
Proceedings of the BioNLP Workshop, BioNLP@HLT-NAACL 2009, 2009

Overview of BioNLP'09 Shared Task on Event Extraction.
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task, BioNLP@HLT-NAACL 2009, 2009

2008
Comparative analysis of five protein-protein interaction corpora.
BMC Bioinform., 2008

All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning.
BMC Bioinform., 2008

A Graph Kernel for Protein-Protein Interaction Extraction.
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, 2008

2007
BioInfer: a corpus for information extraction in the biomedical domain.
BMC Bioinform., 2007

On the unification of syntactic annotations under the Stanford dependency scheme: A case study on BioInfer and GENIA.
Proceedings of the Biological, translational, and clinical language processing, 2007

2006
Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions.
Int. J. Medical Informatics, 2006

Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches.
BMC Bioinform., 2006

Regular Approximation of Link Grammar.
Proceedings of the Advances in Natural Language Processing, 2006

2005
Regularized Least-Squares for Parse Ranking.
Proceedings of the Advances in Intelligent Data Analysis VI, 2005

Kernels Incorporating Word Positional Information in Natural Language Disambiguation Tasks.
Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference, 2005

2004
Ontology-Based Feature Transformations: A Data-Driven Approach.
Proceedings of the Advances in Natural Language Processing, 4th International Conference, 2004

Extracting Protein-Protein Interaction Sentences by Applying Rough Set Data Analysis.
Proceedings of the Rough Sets and Current Trends in Computing, 2004

Analysis of Link Grammar on Biomedical Dependency Corpus Targeted at Protein-Protein Interactions.
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, 2004

