pynlpl in josephmisiti/awesome-machine-learning

PyNLPl, pronounced 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and for building simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
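To make the "n-grams and frequency lists" task concrete, here is a minimal sketch in plain Python using only the standard library. It does not use PyNLPl's own API; the function names `ngrams` and `frequency_list` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield each contiguous window of n tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(tokens, n=1):
    """Count how often each n-gram occurs in the token sequence."""
    return Counter(ngrams(tokens, n))


# Whitespace splitting stands in for a real tokeniser here.
tokens = "to be or not to be".split()
bigram_counts = frequency_list(tokens, 2)
```

A frequency list of this kind is the usual starting point for a simple count-based language model, since relative n-gram counts can be turned into probability estimates.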

updated at Sept. 3, 2024, 7:28 a.m.

Python

31 +0

479 +0

67 +0

GitHub
python-timbl in josephmisiti/awesome-machine-learning

python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.

updated at Sept. 10, 2024, 11:53 p.m.

Python

4 +0

18 +0

3 +0

GitHub
python-ucto in josephmisiti/awesome-machine-learning

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
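The claim that tokenisation is less trivial than it looks can be illustrated with a deliberately naive regular-expression tokeniser. This is only a sketch using Python's standard `re` module; it is not Ucto's rule set or the python-ucto API, and the pattern and the `tokenize` name are assumptions made for this example.

```python
import re

# Hypothetical single rule: a run of word characters, or any one character
# that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens using the single naive rule above."""
    return TOKEN_PATTERN.findall(text)


# The naive rule breaks the abbreviation "Dr.", the amount "3.50" and the
# contraction "didn't" into several pieces each.
tokens = tokenize("Dr. Smith paid $3.50, didn't he?")
```

Handling such cases correctly takes additional, often language-specific, rules, which is the kind of work a dedicated tokeniser takes on.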

updated at Sept. 12, 2024, 2:02 p.m.

Cython

4 +0

29 +0

5 +0

GitHub
python-frog in josephmisiti/awesome-machine-learning

Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser).

updated at Sept. 12, 2024, 2:02 p.m.

Cython

6 +0

47 +0

10 +0

GitHub
colibri-core in josephmisiti/awesome-machine-learning

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
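As an illustration of what a skipgram with fixed-size gaps looks like, the following plain-Python sketch enumerates patterns in which some inner tokens of an n-gram are replaced by a gap marker. It is not Colibri Core's API or its memory-efficient representation; the `skipgrams` function, the `GAP` marker, and the `max_gaps` parameter are hypothetical names introduced for this example.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical marker standing for exactly one skipped token


def skipgrams(tokens, n, max_gaps=1):
    """Yield n-token patterns with up to max_gaps inner tokens replaced by GAP.

    The first and last token of each window are always kept, so a gap is
    always enclosed by real tokens.
    """
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        inner_positions = range(1, n - 1)
        for k in range(1, max_gaps + 1):
            for gaps in combinations(inner_positions, k):
                yield tuple(GAP if j in gaps else tok
                            for j, tok in enumerate(window))


tokens = "to be or not to be".split()
patterns = list(skipgrams(tokens, 3))
```

Each gap here covers exactly one token (a fixed-size gap); a dynamic-size gap would instead match a variable number of tokens and is not covered by this sketch.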

updated at Oct. 26, 2024, 9:19 a.m.

C++

12 +0

124 +1

20 +0

GitHub