Tesseract Open Source OCR Engine (main repository)
updated at Nov. 17, 2024, 11:50 a.m.
A natural language parser for validating complex date ranges
updated at Nov. 16, 2024, 8:58 a.m.
Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages.
updated at Nov. 16, 2024, 8:44 a.m.
Ruby gem to calculate the similarity between texts using tf*idf
updated at Nov. 15, 2024, 12:04 p.m.
Ruby library for interfacing with FANN (Fast Artificial Neural Network)
updated at Nov. 15, 2024, 6:23 a.m.
REST client for Google APIs
updated at Nov. 14, 2024, 2:29 p.m.
CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
updated at Nov. 14, 2024, 3:51 a.m.
Ruby bindings to the Stanford Core NLP tools (English, French, German).
updated at Nov. 13, 2024, 10:27 a.m.
Find a needle (a document or record) in a haystack using string similarity and (optionally) regular expression rules. Uses Dice's Coefficient (aka Pair Similarity) and Levenshtein Distance internally.
updated at Nov. 12, 2024, 3:42 p.m.
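
The two string measures named in the last entry, Dice's Coefficient and Levenshtein Distance, can be sketched as follows. This is a minimal illustration and not that library's actual code. It assumes character bigrams for the pair comparison and unit costs for insertions, deletions, and substitutions. The function names `bigrams`, `dice_coefficient`, and `levenshtein` are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Return adjacent character pairs, ignoring case and whitespace (an assumption)."""
    s = "".join(text.lower().split())
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_coefficient(a, b):
    """Dice's coefficient over character bigrams: 2 * shared pairs / total pairs."""
    pairs_a, pairs_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Neither string has a bigram; fall back to exact comparison.
        return 1.0 if a == b else 0.0
    # Multiset intersection counts each shared pair at most as often as it occurs in both.
    overlap = sum((pairs_a & pairs_b).values())
    return 2.0 * overlap / total


def levenshtein(a, b):
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Dice's coefficient is a similarity in the range 0 to 1, while Levenshtein distance is a count of edits, so a matcher that combines them would need to normalize one against the other; how the listed library does this is not stated here.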