nlp-datasets by niderhoff

Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)

updated at May 12, 2024, 9:52 a.m.

Unknown languages

231 +0

5,648 +2

955 -2

GitHub
Awesome-Chinese-NLP by crownpku

A curated list of resources for Chinese NLP 中文自然语言处理相关资料

updated at May 12, 2024, 4:18 a.m.

Unknown languages

389 +0

7,682 +9

1,706 +1

GitHub
awesome-nlp by keon

A curated list of resources dedicated to Natural Language Processing (NLP)

updated at May 11, 2024, 6:53 p.m.

Unknown languages

609 +0

16,065 +36

2,557 -1

GitHub
awesome-hungarian-nlp by oroszgy

A curated list of NLP resources for Hungarian

updated at May 8, 2024, 7:47 p.m.

Unknown languages

19 +0

208 +1

18 +0

GitHub
awesome-nlp-polish by ksopyla

A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.

updated at May 6, 2024, 5:58 p.m.

Unknown languages

28 +0

279 +1

34 +0

GitHub
awesome-danish by fnielsen

A curated list of awesome resources for Danish language technology

updated at May 6, 2024, 12:06 p.m.

Unknown languages

16 +0

150 +1

18 +0

GitHub
id-nlp-resource by kmkurn

A list of Indonesian NLP resources.

updated at May 2, 2024, 3:08 a.m.

Unknown languages

15 +0

267 +0

48 +0

GitHub
norwegian-nlp-resources by web64

Norwegian NLP Resources

updated at April 19, 2024, 7:28 p.m.

Unknown languages

21 +0

168 +0

14 +1

GitHub
berts by dbmdz

DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models

updated at April 3, 2024, 7:42 p.m.

Unknown languages

15 +0

155 +0

12 +0

GitHub
awesome-community-curated-nlp by alvations

Community Curated NLP List

updated at Feb. 7, 2024, 7:55 p.m.

Unknown languages

20 +0

195 +0

33 +0

GitHub
OpinionSpam by hdaSprachtechnologie

German Opinion Spam Corpus

updated at Oct. 22, 2023, 10:31 a.m.

Unknown languages

0 +0

2 +0

0 +0

GitHub
german-elmo-model by t-systems-on-site-services-gmbh

A German ELMo deep contextualized word representation, trained on a special German Wikipedia text corpus.

updated at Jan. 27, 2023, 11:56 a.m.

Unknown languages

3 +0

28 +0

1 +0

GitHub