DATAOCEAN AI is proud to be a Llama 2 Launch Partner, empowering large models with high-quality training datasets. As supporters of the statement of support for Meta’s Open Approach to Today’s AI, DATAOCEAN AI’s Chief Operating Officer, Ke Li, and Chief Technology Officer, Yukai Huang, endorse this open-source approach: “We support an open innovation approach to AI. Responsible and open innovation gives us all a stake in the AI development process, bringing visibility, scrutiny and trust to these technologies. Opening today’s Llama models will let everyone benefit from this technology.”
https://about.fb.com/news/2023/07/llama-2-statement-of-support/
Meanwhile, DATAOCEAN AI officially announced the “Chinese 10-Million-Rounds Conversation Corpus DOTS-NLP-216” for LLM research.
Dataset Introduction:
Natural conversations that reflect how Chinese is actually spoken, collected in real-world scenarios, will bring new momentum to Chinese large language models (LLMs). While meeting security and compliance requirements, the dataset is intended to improve the performance and robustness of large models, helping enterprises build high-quality generative AI applications with ease. The dataset covers multiple scenarios, such as work, daily life, and campus, as well as fields including finance, education, entertainment, sports, automotive, and technology.
Dataset Advantages:
· Multi-turn conversational data in Chinese: natural conversations collected in real-world scenarios that reflect natural Chinese speaking habits
· Ultra-large scale: hundreds of millions of tokens
· Easy to train: a finished, complete dataset
· Commercial use: can be authorized for commercial use
Contact us for more information about DOTS-NLP-216: contact@dataoceanai.com
Learn more about DOTS-NLP-216: Chinese Multi-Turn Dialogue Corpus – DataoceanAI