The 100-billion-yuan market is booming, but a game of Snake is playing out as well.

Image source: Visual China

Titanium Media note: This article comes from the WeChat official account Yiou Automobile (ID: EO-AUTO). It was written by Guo Huaiyi and is published by Titanium Media with authorization.

If artificial intelligence, as represented by large models, will determine the future of autonomous driving, then what will determine large models? The answer, undoubtedly, is data.

By 2030, the global autonomous driving data labeling market is projected to reach about six times its current size, growing from $2.1 billion to $12.75 billion, or close to 100 billion yuan. With large models and autonomous driving both hungry for data, data service providers face a historic opportunity.

Yet the old story of the apprentice who learns the trade and leaves the master to starve seems to be playing out in the data service industry. The better and more plentiful the data that service providers supply, the faster large models improve, and the greater the challenge that AI labeling poses to manual labeling.

A game of Snake has begun.

"You have to trace along the outline of the vehicle, pixel by pixel."

In 2022, reporters from one media outlet spent a few days on a driverless-car data labeling platform to experience an annotator’s work firsthand. Contrary to the outside impression that labeling just means drawing a circle around an object, "pixel-level" labeling requires staff to trace the target’s outline precisely.

Labeling lidar data is harder still, and some operation manuals run to more than 60 pages. Labeling therefore consumes a great deal of time. In 3 to 6 hours, a novice can label only 40 images. Even a professional annotator needs at least four hours to label 50 images.
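
To put those figures in perspective, here is a rough back-of-the-envelope sketch that uses only the numbers quoted above. The function name is illustrative, and because the professional figure is stated as "at least four hours", the result for professionals is a lower bound rather than an average.

```python
def minutes_per_image(hours: float, images: int) -> float:
    """Average labeling time per image, in minutes."""
    if images <= 0:
        raise ValueError("images must be positive")
    return hours * 60 / images


# Novice: 40 images in 3 to 6 hours (figures quoted in the article).
novice_best = minutes_per_image(3, 40)
novice_worst = minutes_per_image(6, 40)

# Professional: 50 images in at least 4 hours, so this is a lower bound.
professional_min = minutes_per_image(4, 50)

print(f"Novice: {novice_best:.1f} to {novice_worst:.1f} minutes per image")
print(f"Professional: at least {professional_min:.1f} minutes per image")
```

On these numbers, even an experienced annotator spends several minutes on a single image, which is why labeling budgets scale so directly with data volume.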

Tedious and hard as it is, data labeling and curation is an essential link for intelligent electric vehicles, and for the autonomous driving industry in particular.

At the end of 2023, He Xiaopeng, chairman and CEO of Xpeng Motors, said that the automobile will gradually evolve from the software-defined car to the AI-defined car. AI is redefining the technical framework of intelligent electric vehicles and the business models of car companies, and it will become a core competence that successful car companies must have.

In He Xiaopeng’s judgment, AI will be the decisive factor in the future development of the automobile industry. So what will determine the development of AI? The answer is data.

Zhou Yuefeng, president of Huawei’s data storage product line, once said: "In the era of large models, data determines the height of AI intelligence."

In fact, major car companies and autonomous driving companies have long been collecting data and training autonomous driving systems on end-to-end large models. According to He Xiang, a data intelligence scientist at Haomo.AI (Haomo Zhixing), data will account for more than 80% of research and development costs in end-to-end autonomous driving development.

The boom in the autonomous driving data service industry therefore rests on a solid market foundation. According to a forecast by Research and Markets, a third-party research organization, the global autonomous driving data labeling market will grow about sixfold by 2030, from $2.1 billion to $12.75 billion.
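
The arithmetic behind that forecast can be checked quickly. The sketch below uses the two dollar figures quoted above; the exchange rate of 7.2 yuan per dollar is an assumption chosen for illustration, not a figure from the forecast.

```python
# Market size figures quoted in the article, in billions of US dollars.
CURRENT_USD_BN = 2.1
FORECAST_2030_USD_BN = 12.75

# Assumption for illustration only: an exchange rate of 7.2 yuan per dollar.
ASSUMED_CNY_PER_USD = 7.2

growth_multiple = FORECAST_2030_USD_BN / CURRENT_USD_BN
forecast_cny_bn = FORECAST_2030_USD_BN * ASSUMED_CNY_PER_USD

print(f"Growth multiple: {growth_multiple:.2f}x")            # 6.07x
print(f"2030 market: {forecast_cny_bn:.1f} billion yuan")    # 91.8 billion yuan
```

Under that assumed rate, the 2030 figure comes to a little over 90 billion yuan, which is what "approaching 100 billion yuan" refers to.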

In this context, China’s autonomous driving data service industry is also exploding with the rapid growth of China’s smart car industry.

In 2022, the global autonomous driving industry as a whole entered a cold winter amid doubts about its prospects for commercialization. Mobileye, the traditional industry leader, at one point saw its market value shrink by more than half, and Argo, a self-driving startup in which Volkswagen and Ford had jointly invested billions of dollars, went under outright.

However, when OpenAI released ChatGPT, built on a generative Transformer-based large language model (LLM), the smart driving industry seemed to see light at the end of the tunnel. Li Xiang, founder and CEO of Li Auto, quickly seized on the opportunity ChatGPT presented and concluded that "the path of AI technology is becoming clear".

In August 2023, Musk personally hosted a livestream of Tesla’s Full Self-Driving (hereinafter FSD) Beta V12, which drew millions of online viewers. FSD Beta V12 is reported to be Tesla’s first-ever end-to-end autonomous driving system.

"Tesla’s technical route is actually the same as OpenAI’s ChatGPT." Earlier, Deng Zhidong, a professor of computer science at Tsinghua University and director of the Visual Intelligence Research Center at the Tsinghua University Institute of Artificial Intelligence, said in an exclusive interview with Yiou Automobile that China should strive to use large language models to empower its autonomous driving industry.

To build end-to-end autonomous driving capabilities of their own quickly, major car companies have invested heavily in research and development. The importance of autonomous driving data has risen accordingly. After all, without high-quality driving data, it is impossible to train an end-to-end autonomous driving model.

Moreover, because autonomous driving is such a specialized scenario, car companies demand higher data quality, which in turn places higher technical demands on the companies that serve them. Lin Qunshu, CEO of the data labeling service provider Integer Intelligence, has said publicly that because domestic car companies are benchmarking themselves against Tesla’s closed-loop data solution, data service providers that want to serve this scenario need a dedicated automated labeling platform, professional labeling tools, and a complete set of solutions.

Second, driving scenarios in China are far more complex than those in Europe and the United States, so both the difficulty and the volume of data labeling and curation are much greater. An executive at a labeling company once told the media that overseas customers only need people and obstacles labeled, whereas domestic customers often require every detail on the road to be labeled with high accuracy.

According to Qi Zhi, CEO of Totoro Data, domestic car companies hold data service providers to such high standards because the quality of data labeling is key to each OEM’s success or failure in the autonomous driving race. If quality falls short, the work has to be scrapped and redone, and no OEM can afford to lose that time now.

Finally, because each major car company has its own data standards, the same data ends up being labeled repeatedly. Even data from the same road has to be labeled separately under each company’s standards, which naturally increases the business volume of autonomous driving data service providers.

It is precisely because of these factors that major car companies are increasing their investment in data labeling. According to media reports, many domestic OEMs raised their 2023 data labeling budgets outright from the millions to the tens of millions.

As industry demand has surged, autonomous driving data service companies have completed a number of financing rounds over the last two years.

According to incomplete statistics compiled by Yiou Automobile, 12 autonomous driving data service providers have completed financing rounds at various stages since 2020. Nine of them completed their latest round in 2022 or 2023.

Among them, Haitian Ruisheng listed on the STAR Market (the Shanghai Stock Exchange’s Science and Technology Innovation Board) in August 2021. As of the close on March 5, Haitian Ruisheng had a market value of 3.907 billion yuan. It is worth noting that Scale AI, the Silicon Valley unicorn and a leading data service provider for artificial intelligence in the United States, has reached a valuation of 7.3 billion US dollars, or about 52.536 billion yuan.

As large models become deeply embedded in autonomous driving, investors are increasingly recognizing the prospects of data service providers. Yet the development of large models and of the autonomous driving industry is also challenging those same providers.

Eat more food, grow a longer body, and in the end be destroyed by yourself. The logic of Snake, a classic mobile game, has long been familiar to everyone.

As large models develop, data service providers seem to face the same logic and the same ending as in Snake. The better and more plentiful the data that service providers supply, the more mature large models become. And the more mature the models become, the more likely they are to annotate data automatically, replacing the data service providers themselves.

In 2023, Zhang Hongjiang, a member of the U.S. National Academy of Engineering and former chairman of the Zhiyuan Research Institute, said in a speech on large models that as algorithms have progressed, the data layer has changed markedly. The field has moved from manual annotation to shared open datasets, and now to automatic data annotation and in-depth research, which is already a reality in the domestic data annotation field.

In conversations with a number of autonomous driving companies, Yiou Automobile also found that AI-based data labeling is already in wide use.

"Large models are already very capable. We can take a powerful open-source large model and use it to improve the efficiency of data annotation." A senior executive at a self-driving startup told Yiou Automobile that Tesla’s labeling team once numbered more than a thousand people, but that with the help of large models, nowhere near that many people are needed today.

In April 2023, investors raised a similar question on Hikvision’s Q1 earnings call. Hikvision replied: "With the same human input, the number of data labels can be increased by 10 times." Read the other way around, Hikvision’s statement means that with the help of a large model, only 10% of the previous manpower is needed to complete the same workload.
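
The inversion behind that reading is simple: if output per person rises by some multiple, the labor needed for a fixed workload falls to the reciprocal of that multiple. A minimal sketch follows, assuming that labeling output scales linearly with headcount (a simplification; the function name is illustrative).

```python
def labor_fraction(throughput_multiplier: float) -> float:
    """Share of the original workforce needed to finish the same workload
    when each person's output rises by `throughput_multiplier`.

    Assumes output scales linearly with headcount (a simplification).
    """
    if throughput_multiplier <= 0:
        raise ValueError("throughput_multiplier must be positive")
    return 1.0 / throughput_multiplier


# Hikvision's quoted figure: 10 times the labels for the same human input.
fraction = labor_fraction(10)
print(f"Labor needed for the same workload: {fraction:.0%}")  # 10%
```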

Larry, product director at SenseTime Jueying, also said in a media interview that most of the labels used to train Jueying’s main intelligent driving models are now produced with large-scale automatic labeling. Automatic and semi-automatic labeling (with manual spot checks for quality) has basically replaced manual labeling, and both cost and turnaround time have fallen sharply.

Faced with the large models it has itself "fed", where does manual labeling go from here?

"I am an absolute supporter of automatic labeling." Zhao Jie, CEO of Boden Intelligence, once said that although he supports automatic labeling by AI, automatic labeling does not mean labeling with no people involved. He offered an analogy: a factory with an automated production line is still not an unmanned factory.

An algorithm engineer at a self-driving startup also told Yiou Automobile that for now AI labeling is replacing only entry-level labeling work. Some more complicated projects still cannot be completed without people, or require people and AI to work together.

"The entire data service market will be reshuffled." Wang Xiaodong, CEO of Haitian Ruisheng, once said that the arrival of the large-model era will quickly eliminate companies with weak R&D capabilities and few resources, and that the data service market will become more concentrated. Faced with the challenge of AI labeling, data service providers must adjust in time to fit the data labeling business of the large-model era.

Despite the challenge from artificial intelligence, manual labeling will not disappear at this stage. And with the rapid development of the autonomous driving and large-model industries, the data service market will continue to grow.

Qi Zhi, CEO of Totoro Data, predicted that the window of opportunity would not fully open until 2030.

However, the data service industry still faces many challenges. Besides AI labeling, the lack of data protection measures is a problem the industry must confront. According to media reports, the head of one major AI company once said that in China, any data you can buy with money can also be bought by others. High-quality data that one party has paid for can be obtained by another at low cost, and vice versa.

Many of the autonomous driving companies that Yiou Automobile contacted choose to label and curate their core data in-house rather than hand it to a third party.

How to protect the rights and interests of the companies involved during data processing, so that professional autonomous driving data service providers can realize their full potential, is therefore a problem that the whole industry must face together and urgently needs to solve.