Accurate and efficient general-purpose boilerplate detection for crawled web corpora.
Lang. Resour. Evaluation, 2017
CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016
Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison.
Proceedings of the 10th Web as Corpus Workshop, 2016
On Bias-free Crawling and Representative Web Corpora.
Proceedings of the 10th Web as Corpus Workshop, 2016
A High-Order Discontinuous Galerkin Discretization with Multiwavelet-Based Grid Adaptation for Compressible Flows.
J. Sci. Comput., 2015
Adaptive multiresolution discontinuous Galerkin schemes for conservation laws.
Math. Comput., 2014
Focused Web Corpus Crawling.
Proceedings of the 9th Web as Corpus Workshop, 2014
Web Corpus Construction.
Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, ISBN: 978-3-031-02152-7, 2013
Scalable Construction of High-Quality Web Corpora.
J. Lang. Technol. Comput. Linguistics, 2013
Building Large Corpora from the Web Using a New Efficient Tool Chain.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Adaptive Gain Modulation in V1 Explains Contextual Modifications during Bisection Learning.
PLoS Comput. Biol., 2009
Perceptual Learning via Modification of Cortical Top-Down Signals.
PLoS Comput. Biol., 2007