key: value
id: 20030361
name: python-ucto
full_name: proycon/python-ucto
html_url: https://github.com/proycon/python-ucto
description: This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
created_at: May 21, 2014, 5:28 p.m.
updated_at: Jan. 12, 2024, 6:04 p.m.
pushed_at: Oct. 31, 2023, 3:07 p.m.
size: 62
stargazers_count: 29
watchers_count: 4
forks_count: 4
open_issues: 5
language: Cython
awesome_list: https://github.com/josephmisiti/awesome-machine-learning