German corpus based on CommonCrawl
Tools for Analysing Research Data
Lectures for a University of Maryland class on computational linguistics.
GitHub topics &
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 50 interactive visualizations under a user-friendly interface.
An introductory book (2nd edition).
JSON formatted Pan-Romance word lists.
(CommonCrawl)
(NER exploration)
Early linguistic analysis and natural language processing library for Haxe.
German Reference Corpus
Language Science Press is a born-digital scholar-led open access publisher in linguistics.
sampled sentences in different languages.
A list of resources for conservation, development, and documentation of low resource (human) languages.
, webservice via WebLicht
General natural language tools for Node.js.
The book from the NLTK package.
The most complete platform for building Python programs to work with human language data.
GitHub topics &
Large German internet corpus
Snowball is a language in which stemming algorithms can be easily represented.
Industrial-strength Natural Language Processing in Python.
Various stemming algorithms from Snowball.
Nice alternative to spaCy (see above).
The ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter.
CC-licensed educational videos interconnected with Marburg University's e-learning platform of the same name.
A utility for finding Typo-Bridges.
Easy-to-use text annotation tool for teams with comprehensive auto-annotation features. Supports NER, relations, and document classification, as well as OCR annotation for invoice labeling.
An open source Python library for processing morphologically rich and, for the most part, endangered Uralic languages. It can do morphological analysis, generation, lemmatization, disambiguation and lexical lookup for a great many Uralic languages.