NLP

Chinese Casual Chat Corpus
Casual chat data: 8 million everyday questions and single-sentence chat utterances, collected for large-model training and downstream question-answering generation.
Hong Kong POI Dataset with Cantonese Pinyin Labeling
A Hong Kong Cantonese corpus covering place names and related information, annotated with POI tags and Cantonese pinyin.
English-Arabic Parallel Corpus
Parallel corpus of everyday English and Arabic text.
English-Spanish Parallel Corpus
Parallel corpus of everyday English and Spanish text.
English-Russian Parallel Corpus
Parallel corpus of everyday English and Russian text.
English-Vietnamese Parallel Corpus
Parallel corpus of everyday English and Vietnamese text.
English-Japanese Parallel Corpus
Parallel corpus of everyday English and Japanese text.
English-Hindi Parallel Corpus
Parallel corpus of everyday English and Hindi text.
Chinese-English Parallel Corpus
Parallel corpus of everyday Chinese and English text.