Llama 2 Global Partner DATAOCEAN AI Announces LLM Dataset DOTS-NLP-216

News
August 8, 2023

DATAOCEAN AI is proud to be a Llama 2 Launch Partner, empowering large models with high-quality training datasets. As signatories of the statement of support for Meta’s open approach to today’s AI, DATAOCEAN AI’s Chief Operating Officer, Ke Li, and Chief Technology Officer, Yukai Huang, endorse this open-source approach: “We support an open innovation approach to AI. Responsible and open innovation gives us all a stake in the AI development process, bringing visibility, scrutiny and trust to these technologies. Opening today’s Llama models will let everyone benefit from this technology.”

https://about.fb.com/news/2023/07/llama-2-statement-of-support/

Meanwhile, DATAOCEAN AI officially announced the “Chinese 10-Million-Rounds Conversation Corpus DOTS-NLP-216” for LLM research.

Dataset introduction:
Natural conversations that reflect everyday Chinese speaking habits, collected in real-world scenarios, will bring new momentum to Chinese large language models (LLMs). While meeting security and compliance requirements, the dataset is intended to improve the performance and robustness of large models, helping enterprises build high-quality generative AI applications with ease. The dataset covers multiple scenarios, such as work, daily life, and campus, as well as the finance, education, entertainment, sports, automotive, and technology fields.

Dataset Advantages:
· Multi-turn conversational data in Chinese: natural conversations collected in real-world scenarios that reflect everyday Chinese speaking habits
· Ultra-large scale: hundreds of millions of tokens
· Easy to train on: a finished, complete dataset
· Commercial use: can be licensed for commercial use

Contact us for more information about DOTS-NLP-216: contact@dataoceanai.com

Learn more about DOTS-NLP-216: Chinese Multi-Turn Dialogue Corpus – DataoceanAI
