Hong Kong Cantonese Text Corpus – Word Segmentation and POS

Text collected from news and daily chat sources, annotated with word segmentation and part-of-speech tags.
Specifications:
ID: King-NLP-172
Language: Cantonese
Size: 300,000 entries
Accuracy rate: 95% (labeling results)

People also searched for

Chinese Casual Chat Corpus
Casual chat data: 8 million everyday questions and single chat sentences, collected for large model training and subsequent question-answer generation.
Hong Kong POI Dataset with Cantonese Pinyin Labeling
A Hong Kong Cantonese corpus covering place names and related information, with POI tagging and pinyin labeling.
English-Arabic Parallel Corpus
A parallel corpus of everyday English and Arabic text.
English-Spanish Parallel Corpus
A parallel corpus of everyday English and Spanish text.
