Revitalizing the Smart Devices Market: The Potential of Large Language Models

Blog
September 15, 2023

Over the past year, large language models have been in full swing.

Smartphones and other smart devices are also catching up with the large language model trend, each aiming to capture a share of the market.

However, the smart device market has been shrinking. One reason for the decline is that demand rose sharply in 2021, pulling forward consumption that would otherwise have happened in 2022; another is that the emergence of large language models has made existing smart devices seem less smart by comparison.

In fact, it is not hard to see that consumers will no longer upgrade their phones for a modest bump in camera resolution or an improvement in appearance.

The mobile phone market needs smarter and more attractive AI functions to stimulate demand, and the same is true for other smart devices. Recognizing a few keywords in speech is no longer enough to satisfy consumers' expectations of intelligence.

The rise of large language models could be exactly what revitalizes smart devices.

Large language models activate smart devices

Mobile phone manufacturers are rapidly advancing on-device applications of large language models, which will bring revolutionary changes to smart terminals.

At the end of July, Apple posted dozens of recruitment positions for large language model research, planning to apply large language models to the future iPhone, iPad, and other products.

While some companies have already made progress with large language models, Apple is now joining the competition. As a smart device manufacturer, its entry marks the start of a new era.

These developments will lead to a revolution in smart terminals. Through natural dialogue, users will be able to control apps with smart assistants driven by large language models. As the technology expands to scenarios such as the home and the vehicle, a "super terminal" may emerge in the future.
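
As a rough illustration of how such an assistant might work under the hood, the sketch below maps a natural-language request to a device action through a language model. The query_llm helper, the action names, and the JSON intent format are all hypothetical assumptions for the example, not a description of any manufacturer's actual implementation.

```python
# A minimal sketch of an LLM-driven assistant mapping natural dialogue to
# device actions. `query_llm` is a hypothetical placeholder for whatever
# on-device or cloud model is actually used.
import json

DEVICE_ACTIONS = {
    "set_alarm": lambda time: f"Alarm set for {time}",
    "open_app": lambda name: f"Opening {name}",
    "adjust_brightness": lambda level: f"Brightness set to {level}%",
}

def query_llm(prompt: str) -> str:
    """Hypothetical model call; expected to return JSON such as
    {"action": "open_app", "args": {"name": "Maps"}}."""
    raise NotImplementedError("Replace with a real model call.")

def handle_utterance(utterance: str) -> str:
    prompt = (
        "Map the user's request to one of these actions: "
        f"{list(DEVICE_ACTIONS)}. Reply with JSON only.\n"
        f"User: {utterance}"
    )
    intent = json.loads(query_llm(prompt))
    action = DEVICE_ACTIONS[intent["action"]]
    return action(**intent["args"])
```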

How large language models improve smart devices

In the mobile Internet era, AI apps have spread to a wide range of devices, including IoT terminals such as laptops, smartphones, VR/AR devices, and even smart vehicles.

The voice assistant built into the mobile phone is also regarded as a new entry point to the AIoT ecosystem. In the past, the limited intelligence of voice assistants left shortcomings in human-computer interaction, but with the rise of large language models, many believe a new opportunity has arrived.

At present, smartphone manufacturers are laying their groundwork in the field of large language models, starting mainly with voice assistants. To apply large language model technology in more scenarios and achieve market penetration, it is necessary to leverage collaboration between the cloud and the device side.

In most IoT scenarios, deploying algorithms purely in the cloud faces problems such as high latency, a lack of personalized applications, and high cost. AI applications on the device side can make up for these shortcomings by removing the dependence on cloud computing, which allows large language models to serve more latency-sensitive scenarios.
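
For illustration, the sketch below shows what device-side inference could look like using the open-source llama-cpp-python bindings with a locally stored quantized model. The model file name, context size, and thread count are assumptions made for the example, not a statement about any particular device's configuration.

```python
# A minimal sketch of device-side inference with llama-cpp-python, assuming a
# quantized GGUF model has already been downloaded to the device.
from llama_cpp import Llama

llm = Llama(
    model_path="./assistant-3b-q4.gguf",  # hypothetical local model file
    n_ctx=2048,    # small context window to fit device memory
    n_threads=4,   # run on the device CPU, no cloud round trip
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a smart-device voice assistant."},
        {"role": "user", "content": "Turn off the living room lights in ten minutes."},
    ],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])
```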

In addition, how well large language models adapt to specific domains is also crucial. General-purpose large language models like ChatGPT may not be suitable for waking up and operating smart devices, as they are not necessarily equipped to handle the specialized terms and high-frequency vocabulary used in device wake-up commands.

One solution is to fine-tune large language models on domain-specific data, such as smart device conversation data, which equips the models with sufficient domain knowledge and enables them to provide precise services in specific scenarios.
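
As a concrete, if simplified, example of this kind of domain fine-tuning, the sketch below continues training a small causal language model on a plain-text file of smart-device dialogues using Hugging Face Transformers. The base model name and the file device_dialogues.txt are illustrative assumptions rather than a description of any specific production pipeline.

```python
# A minimal sketch of domain fine-tuning with Hugging Face Transformers,
# assuming a small causal LM and a text file of smart-device dialogues.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

model_name = "gpt2"  # stand-in for whatever base model is actually used
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each line of the file is one device-related dialogue turn (hypothetical).
dataset = load_dataset("text", data_files={"train": "device_dialogues.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="device-lm", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```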

Fine-tuning requires specialized data collection and precise data labeling. DataOcean AI can provide TTS (Text-to-Speech) data in the 'American English Female Speech Synthesis Corpus (Natural Conversational Style)' for model fine-tuning with WeSpeaker or WavLM. We hope these speech models can be adapted to the conversation scenarios of smart devices.
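
As one possible way to use such a corpus with WavLM, the sketch below loads a pretrained WavLM checkpoint from Hugging Face Transformers and extracts frame-level speech representations from a recording. The checkpoint name and the WAV file name are assumptions for illustration only, not references to the actual corpus files.

```python
# A minimal sketch of extracting WavLM speech representations from a
# conversational recording via Hugging Face Transformers.
import torch
import soundfile as sf
from transformers import Wav2Vec2FeatureExtractor, WavLMModel

checkpoint = "microsoft/wavlm-base-plus"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = WavLMModel.from_pretrained(checkpoint)

# Load a 16 kHz mono recording (hypothetical file name).
speech, sampling_rate = sf.read("conversational_sample.wav")

inputs = feature_extractor(speech, sampling_rate=sampling_rate,
                           return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level representations that downstream speech models can be
# fine-tuned on for conversational smart-device scenarios.
print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
```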

Learn More Datasets

American English Female Speech Synthesis Corpus (Natural Conversational Style)

>>>Learn More

The data comes from a 30-year-old female voice actor with a gentle, warm voice, recorded in a professional studio (background noise < 18 dB(A)). The voice actor makes 2-3 recordings per week over a total recording cycle of one month, and the recorded content comes from daily dialogues.

Learn more: American English Female Speech Synthesis Corpus
