NLP

Search our off-the-shelf datasets.

France Email Corpus
French TN Corpus
German TN Corpus
Germany Email Corpus
High-Quality Coding Q&A Corpus
This dataset supports AI training in code comprehension, debugging, and complex logic reasoning, enabling applications such as automated code generation, technical documentation assistants, and intelligent programming tutors.
HK Cantonese Polyphone
HK Cantonese Text Corpus with POS Tagging
Hong Kong Cantonese Text Corpus – Word Segmentation and POS
Text collected from news and daily chat sources, with word segmentation and part-of-speech tagging.
Hong Kong POI Dataset with Cantonese Pinyin Labeling
A Hong Kong Cantonese corpus of place names and related information, with POI tagging and pinyin labeling.
