zpar by frcchang

ZPar statistical parser. Universal language support (depending on the availability of training data), with language-specific features for Chinese and English. Currently supports word segmentation, POS tagging, and dependency and phrase-structure parsing.

created at June 30, 2015, 1:55 p.m.

C++

13 +0

133 -1

33 +0

GitHub
python-zpar by EducationalTestingService

A Python wrapper around the ZPar parser for English.

created at Sept. 8, 2014, 1:41 p.m.

Python

21 +0

48 +0

19 +0

GitHub
python-frog by proycon

Python bindings to the Dutch NLP tool Frog (POS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

created at Sept. 7, 2014, 8:32 p.m.

Cython

6 +0

47 +0

12 +0

GitHub
python-ucto by proycon

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

created at May 21, 2014, 5:28 p.m.

Cython

4 +0

29 +0

5 +0

GitHub
pynlpl by proycon

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

created at July 6, 2010, 11:42 a.m.

Python

32 +0

477 +0

67 +0

GitHub
rosetta by columbia-applied-data-science

Tools, wrappers, etc. for data science, with a concentration on text processing

created at Nov. 3, 2013, 4:13 p.m.

Jupyter Notebook

22 +0

206 -1

47 +0

GitHub
nut by pprett

Natural Language Understanding Toolkit

created at Oct. 14, 2010, 8:08 a.m.

C

8 +0

119 +0

25 +0

GitHub
genius by duanhongyi

A Chinese word segmenter based on CRF

created at Aug. 20, 2013, 6:54 a.m.

Python

26 +0

235 +0

65 +0

GitHub
snownlp by isnowfy

Python library for processing Chinese text

created at Nov. 26, 2013, 11:46 a.m.

Python

348 +0

6,350 +6

1,363 +1

GitHub
jieba by fxsjy

Jieba Chinese word segmentation

created at Sept. 29, 2012, 7:52 a.m.

Python

1,285 +0

32,562 +32

6,706 +1

GitHub
yalign by machinalis

A sentence aligner for comparable corpora

created at Aug. 26, 2013, 3:46 p.m.

Python

16 +0

126 +0

31 +0

GitHub
quepy by machinalis

A Python framework to transform natural language questions into queries in a database query language.

created at Dec. 3, 2012, 3:46 p.m.

Python

96 +0

1,253 +0

298 +1

GitHub
Detectron by facebookresearch

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

created at Oct. 5, 2017, 5:32 p.m.

Python

942 +0

26,157 +5

5,450 -1

GitHub
dockerface by natanielruiz

Face detection using deep learning.

created at June 5, 2017, 10:25 p.m.

Dockerfile

12 +0

189 +0

32 +0

GitHub
face_recognition by ageitgey

The world's simplest facial recognition API for Python and the command line

created at March 3, 2017, 9:52 p.m.

Python

1,566 +0

52,031 +39

13,318 +15

GitHub
vigra by ukoethe

a generic C++ library for image analysis

created at July 6, 2011, 8:34 a.m.

C++

42 +0

405 +1

190 +0

GitHub
scikit-image by scikit-image

Image processing in Python

created at July 7, 2011, 10:07 p.m.

Python

186 -1

5,901 +0

2,205 +5

GitHub
prediction-builder by denissimon

A library for machine learning that builds predictions using linear regression.

created at June 8, 2015, 4:52 p.m.

PHP

8 +0

110 +0

13 +0

GitHub
jieba-php by fukuball

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.

created at April 23, 2015, 8:29 a.m.

PHP

55 +0

1,309 +1

258 +0

GitHub
data-science-ipython-notebooks by donnemartin

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

created at Jan. 23, 2015, 7:38 p.m.

Python

1,619 +0

26,549 +5

7,740 +7

GitHub