On her daughter's birthday, a friend gave Silan a smart speaker, a popular mainstream-brand model: a small square box at a low price. Silan's own interest in the novelty was lukewarm, so she simply set it out in the living room, but her 6-year-old daughter fell in love with it and kept pestering it to tell stories.
Gradually, Silan grew fond of the smart speaker. "It's really a godsend for keeping a child entertained." Pleased, she started browsing related products on Taobao, planning to buy a model with better specifications and sound quality.
Then, a few months ago, Silan happened to open the mobile app paired with the smart speaker and was startled to find a text transcription of a conversation she had just had with her husband. What surprised her was that the conversation took place after her daughter had finished listening to stories. In theory, the speaker should have been dormant: no audio should have been collected, let alone transmitted to the phone and converted into text.
"Has it been eavesdropping on our conversations?" The doubt surfaced in Silan's mind. Her family grew wary of the smart speaker, and the plan to buy a new one naturally ran aground. As for the existing speaker, Silan chose to cut its power: her daughter still likes listening to stories, so they plug it in for each session and unplug it afterwards. For the last four or five months, that is how they have used it.
The monitored speaker
The first widely publicized case of smart-speaker eavesdropping occurred in Oregon, USA.
In May 2018, Danielle's husband received a call from one of his employees: "Unplug your Echo devices immediately; you've been hacked!" Danielle, who lives in Portland, Oregon, owned four Amazon Echo smart speakers. Earlier that day, the employee had received and opened an audio file, only to hear a private conversation between Danielle and her husband at home: the couple had been discussing which brand of hardwood flooring to use.
Shocked, Danielle unplugged all the Echo devices and called Amazon customer service to demand an explanation. She also told CBS about the incident.
Amazon attributed the incident to a misfire: the Echo device had misinterpreted part of the conversation as a command, concluded that the user wanted to send the preceding audio to someone in the address book, and then executed that instruction.
Echo is Amazon's smart speaker line, built around its voice assistant Alexa. By mid-2018, Echo had shipped about 35 million units in the United States; according to CIRP estimates, its market share reached 70%, far surpassing other brands.
When the market leader has an accident, the news spreads fast and ferments. Soon afterwards, a second Echo incident surfaced. A German user told the local magazine c't that when he asked Amazon for the voice data on his personal activity, he received a 100MB archive for download containing a PDF cataloguing Alexa voice commands and 1,700 recordings of strangers' conversations.
c't listened to some of the recordings and found that, from the dialogue alone, many details of daily life could be pieced together: when the speakers were home and away, what other brands of smart devices they owned, the genders of family members, even the sound of a user showering.
Although Amazon apologized for both incidents, it could not suppress a growing suspicion in public opinion: for this emerging category of device, eavesdropping might not be just a latent risk but a present reality. "Does that mean the smart speaker is listening to us anytime, anywhere?" Silan wondered.
In recent months, more eavesdropping incidents involving smart devices have come to light. In July, according to foreign media reports, an Apple contractor said that, to improve Siri, Apple hired external contractors to listen to recordings Siri had captured after accidental activation, including private conversations about medical conditions, drug deals, and more.
Coincidentally, in the same month, news broke that recordings made by Google Assistant were provided to company employees, and that even third-party Google contractors around the world could routinely listen to these conversations.
Doubts about smart speakers and the voice assistants embedded in various devices keep spreading. Beyond eavesdropping, the occasional self-activation of smart speakers has also unnerved users. Since last year, users have reported that Echo, without being woken, would emit a "ha ha" laugh, which was horrifying.
Similar phenomena appear on some domestic smart speakers. One user said the speaker in her home had repeatedly announced that "the device is undergoing a system upgrade and has updated ** applications." Although the content itself was normal, "no one else was home; the speaker suddenly speaking frightened me every time." On one occasion, while she was chatting happily with friends she had invited over, the smart speaker woke up without warning and played the Lin Junjie song "Killer" for everyone.
As eavesdropping incidents accumulate, some users suspect that speakers with screens, sold with a "watch over your home" feature, go further: since they can stream live video of the home remotely, will they also record those images and transmit them elsewhere?
The doubts about this new product keep multiplying, from "Is it listening to me?" to "Does it record while dormant? Are these conversations stored and transmitted after being captured? Does anyone actually listen to them? And could it be hacked and turned into a bug?"
Rumors and Truth
"In the past year, many friends have come to ask me about surveillance before buying a smart speaker," said Zhang Sicheng, who has worked in the smart-speaker divisions of several companies and is regarded by his friends as an industry insider. "What's interesting is that after asking, almost every one of them bought a speaker."
According to Zhang Sicheng and several other practitioners familiar with smart speakers, a speaker's recognition work is split between the local device and the cloud. When the speaker has not been woken, it works locally: external sound is picked up, but it is neither stored nor semantically recognized. "Before waking up, it is essentially doing acoustic pattern matching," said Xu Jiaming, a smart-speaker product manager. "The recorded sound is compared against the wake word, and the speaker switches on automatically when they match."
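The local wake-word stage Xu describes can be sketched as a sliding-window comparison. The following is a hypothetical toy illustration, not any vendor's implementation: the "fingerprint" values, the similarity measure, and the threshold are all invented for the example. The key property it mirrors is that frames outside the small buffer are simply dropped, so nothing is stored or sent before a match.

```python
# Toy sketch of local wake-word matching: buffer a few audio frames,
# compare against a wake-word template, and "wake" only on a match.
from collections import deque

WAKE_TEMPLATE = [0.9, 0.1, 0.8]   # toy acoustic fingerprint of the wake word
THRESHOLD = 0.95                   # similarity required to wake the device

def similarity(window, template):
    """Toy similarity: 1 minus the mean absolute difference."""
    diffs = [abs(a - b) for a, b in zip(window, template)]
    return 1 - sum(diffs) / len(diffs)

def detect_wake(frames, template=WAKE_TEMPLATE, threshold=THRESHOLD):
    """Slide a fixed-size window over incoming frames; return the index
    where the device would wake, or None. Frames that fall off the
    buffer are gone, mirroring the 'picked up but never stored' mode."""
    window = deque(maxlen=len(template))
    for i, frame in enumerate(frames):
        window.append(frame)          # old frames are discarded automatically
        if len(window) == len(template) and \
                similarity(list(window), template) >= threshold:
            return i                  # from here on, audio would go to the cloud
    return None

# Background noise, then something matching the wake word:
stream = [0.2, 0.5, 0.3, 0.9, 0.1, 0.8, 0.4]
print(detect_wake(stream))  # wakes at index 5
```

Only after `detect_wake` fires would a real device open a cloud connection; everything before that stays in the short local buffer.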
Zhang Sicheng denied the eavesdropping rumors. As far as he knows, none of the mainstream domestic smart speakers deliberately eavesdrops.
"It would be a very expensive thing to do," Zhang Sicheng said, and ran the numbers: suppose a company has sold 1 million speakers, of which 200,000 are active on a given day. If the company wanted to switch those speakers into 24-hour surveillance, then even at roughly 100KB of audio per second per device, multiplied across 200,000 devices, the cumulative cost of transmission bandwidth, storage, and computation would be staggering.
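Zhang's back-of-the-envelope argument can be made concrete. Using the article's illustrative figures (200,000 active devices, roughly 100KB of audio per second each; these are example numbers, not measurements), the daily raw volume alone works out as follows:

```python
# Illustrative cost arithmetic for the 24-hour surveillance scenario.
devices = 200_000          # daily active speakers (article's example figure)
kb_per_second = 100        # assumed audio upload rate per device
seconds_per_day = 24 * 60 * 60

daily_kb = devices * kb_per_second * seconds_per_day
daily_tb = daily_kb / 1024 ** 3   # KB -> TB
print(f"{daily_tb:.0f} TB of raw audio per day")  # roughly 1,609 TB
```

That is on the order of 1.6 petabytes of audio every single day, before any storage redundancy or processing, which is the sense in which "the cumulative cost is staggering."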
More importantly, with current processing technology, companies cannot convert such a huge, fragmented mass of recordings into information of commercial value. In Zhang Sicheng's view, even setting morality aside and looking only at business interests, companies have no incentive to collect audio deliberately.
According to Zhang Sicheng, in a smart-speaker inspection led by the Ministry of Industry and Information Technology last year, the amount of data each smart speaker transmitted when not awakened was only at the KB level, negligible compared with what voice data would require.
What does lend the eavesdropping rumors some plausibility is how the smart speaker works after hearing the wake word.
Zhang Sicheng and Xu Jiaming both acknowledge that once woken, the speaker enters a cloud-working state, transmitting the collected audio to cloud servers, which perform speech and semantic recognition and send back a response. "This is unavoidable," Zhang Sicheng said with some resignation. The computing power built into today's smart speakers cannot support AI-grade speech and semantic processing, let alone improve recognition ability locally.
To avoid network outages and privacy issues, Zhang Sicheng's company has offered purely local voice solutions in some custom whole-house smart-home projects. But this makes the functionality very limited, supporting only fixed commands: when the owner comes home, he can tell the voice assistant to "turn on the light," but a rephrased version of the same request will not be recognized.
By design, when the user stops issuing commands and no new speech arrives within a few seconds, the machine returns to its dormant state. "Each brand tunes this differently, some to 3 seconds, some to 5," Xu revealed. In practice, though, because smart speakers are still immature, both waking and sleeping can misfire. If a sound happens to resemble the wake word, or noise at the end of a command makes the speaker think it should keep working, it will keep recording without the user's knowledge. In his estimation, the "eavesdropping" many users encounter, including Silan's case, comes down to exactly this.
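The wake-and-sleep window Xu describes can be modeled as a simple timeout check. This is a sketch under assumed parameters (a 5-second window here; real products vary by brand), and the simplification that any over-long silence puts the device back to sleep:

```python
# Sketch of the post-wake timeout: the speaker stays awake only while
# gaps between speech events are shorter than the sleep timeout.
SLEEP_TIMEOUT = 5.0   # seconds; brands tune this (some use 3 s)

def is_awake(speech_times, now, timeout=SLEEP_TIMEOUT):
    """speech_times: timestamps of speech events, starting with the
    wake word. Returns whether the device is still awake at `now`."""
    last = speech_times[0]              # the wake word itself
    for t in speech_times[1:]:
        if t - last > timeout:          # silence too long: went dormant
            return False
        last = t
    return (now - last) <= timeout      # still inside the window?

# Wake word at t=0, command at t=2: awake at t=4, asleep by t=10.
print(is_awake([0.0, 2.0], now=4.0))   # True
print(is_awake([0.0, 2.0], now=10.0))  # False
```

The misfires the article describes map onto this model directly: a sound that resembles the wake word starts the clock spuriously, and trailing noise keeps resetting `last`, so the device keeps recording longer than the user expects.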
According to several practitioners, the industry's ideal false-wake rate is about 2 times per 48 hours; worse products see 2 to 3 times per 24 hours, which inevitably means the "eavesdropping" caused by misfires is more frequent still. For every brand, the priority is to improve the AI and reduce misfires, and "the corpus collected is the best training material," Xu Jiaming noted.
In April, a Bloomberg investigation showed that thousands of Amazon staff around the world were responsible for listening to and checking user-Alexa dialogues, tagging the recordings and feeding them back to reduce misfires and help Alexa respond better to commands. Two Amazon employees in Romania said they worked nine hours a day, parsing up to 1,000 audio clips each.
"This is no secret in the industry," said Zhang Sicheng. Nor is it only foreign brands: several mainstream domestic smart-speaker brands also have a human-review step. To protect users' privacy as far as possible, recordings are anonymized and scattered before humans listen to them. Although employees hear the recordings and conversations, even private ones, they cannot identify the specific user. "In the cloud pipeline, the audio file itself is not linked to account or device information; the purpose is mainly to optimize command handling," one mainstream domestic smart-speaker manufacturer responded.
Less than 1% of the corpus is reviewed by humans, and the work focuses on content the model finds hard to recognize. "For example, when the speaker answers 'I don't understand what you're talking about,' the audio preceding that reply is prioritized for human review," Zhang explained. At his previous company, when new features launched, the review rate for certain targeted corpus would rise to about 10% to improve accuracy, but such pushes were short-lived: after a few days of concentrated work, the normal proportion was restored. Xu Jiaming also believes that as AI models' recognition improves, the share of corpus companies review manually may shrink.
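The sampling policy described above (an under-1% baseline, hard cases prioritized, targeted corpus temporarily boosted to about 10%) can be sketched as a rate function. The field names and the exact reply string below are assumptions for illustration, not any company's schema:

```python
# Hypothetical sketch of the human-review sampling policy.
import random

BASE_RATE = 0.01    # under 1% of the corpus gets human review
BOOST_RATE = 0.10   # temporary rate for targeted corpus during a feature launch
HARD_REPLY = "I don't understand what you're talking about"

def review_rate(clip, boosted_tags=frozenset()):
    """Return the probability that a clip is sent to human review.
    `clip` is a dict with assumed keys 'reply' and 'tag'."""
    if clip.get("reply") == HARD_REPLY:
        return 1.0                      # clips the model failed on are prioritized
    if clip.get("tag") in boosted_tags:
        return BOOST_RATE               # targeted corpus during a launch push
    return BASE_RATE

def select_for_review(clip, boosted_tags=frozenset(), rng=random):
    """Draw once against the clip's rate."""
    return rng.random() < review_rate(clip, boosted_tags)
```

Restoring "the normal proportion" after a launch push then amounts to passing an empty `boosted_tags` set again.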
Nor is the corpus collected by smart speakers stored permanently. Speaker manufacturers say the audio files are deleted once recognition is complete. "Each company retains the files for a different period; ours is about a few months," Xu Jiaming added.
No place to hide
Undoubtedly, intelligent speakers and other voice assistant products are not yet a mature category.
This immaturity leaves such products with many vulnerabilities beyond false wake-ups, including exposure to hackers. Last August, at DEF CON, the global hacker conference held in Las Vegas, a Tencent security team cracked an Amazon Echo in 26 seconds: they remotely controlled a designated device, made it record silently without being woken or giving any prompt, and sent the recorded files over the network to a remote server.
"Once one smart speaker is physically compromised, other smart speakers on the same LAN can be implanted with a backdoor through a contactless attack and turned into hackers' remote bugs," Tencent security expert Wu Huiyu said in a talk shortly after the crack. After Tencent reported these vulnerabilities, Amazon completed the corresponding fixes and updates.
On the other hand, precisely because the category is so young and immature, no black or gray market has yet formed around smart speakers. Inside companies, audio corpus is classified at a very strict confidentiality level. Zhang Sicheng disclosed that at his company, all recording-related work is done in-house; some lower-sensitivity labeling work is outsourced for lack of staff, but even the outsourced workers must come on-site to do it.
"In the domestic market, we have never heard of a company reselling corpus, nor of a successful eavesdropping case. Likewise, as far as I know, smart speakers do not use the corpus to build a panoramic profile of each user," Zhang Sicheng affirmed. "In the end, smart speakers are still clumsy; the cost of extracting useful information is too high. Personally, I feel that for the next three to five years there is no need to worry about the privacy problems speakers bring."
But like other practitioners, he does not deny that all these not-yet-realized scenarios could happen once the technology matures.
As a practitioner in this new industry, Zhang Sicheng has come to accept that technology and privacy are hard to balance. In the era of the Internet of Things and AI, "we have no privacy and no hiding place." Even without smart speakers, everyone's information, preferences, and habits have long been in companies' hands through phones and computers; in essence, the speaker changes nothing.
"Unless, in a future of far greater computing power, all smart products run locally, stay disconnected, and only occasionally go online for system updates." For ordinary people, Zhang Sicheng believes, this scenario of high technical difficulty and low commercial value is too far-fetched to be realistic.
Faced with these anxieties, some people choose to keep smart speakers at a distance. One engineer said he had unplugged the smart speaker at home for good and no longer plans to buy other smart-home devices. Zhang Sicheng, by contrast, has quietly made his peace: the three or four smart speakers in his home, originally bought for work testing, have become part of the furniture.
Where technology probes at privacy, Zhang Sicheng's bottom line is "no real harm." He keeps his smart speakers in the living room and hallway, so that even if some voice information leaked, it would do no substantive damage to him or his family. A smart speaker's pickup range is about 3 to 5 meters; it can barely capture sound through a wall, and the bedroom is effectively out of reach. When a private topic comes up, he can always unplug it before speaking.
What he cannot accept is the leak of images. "I would never buy a speaker with a camera, or any camera-equipped product, to put in the bedroom." He is keenly aware that a leaked image would be an irreparable harm: more than one practitioner confirmed that networked camera devices do send images back to servers; the data is kept strictly confidential, but a theoretical risk of leakage remains.
You cannot hide, so you protect yourself by the most basic means: that is Zhang Sicheng's philosophy.
Others are more optimistic. "Smart speakers are in an early stage of wild growth, and so is the whole smart home. In these early stages, privacy protection can only rely on manufacturers' self-discipline," Xu Jiaming firmly believes. "Once these products are thoroughly popularized, higher privacy standards will emerge at every level, unifying the industry, restricting permissions, and being enforced as compulsory standards."
(Silan, Zhang Sicheng, and Xu Jiaming are pseudonyms.)
Source: All-weather Science and Technology. Responsible editor: Wang Fengzhi_NT2541