How the most expensive computer bug ever cost Intel hundreds of millions of dollars


Back in the 1990s, Intel lost hundreds of millions of dollars over a floating-point flaw.

To get to the root of this story, let's try a thought exercise:

There is somethIng wrong with this sentence.

Did you spot the mistake in the sentence above? Was it obvious? Did it affect your understanding?

You may have noticed the capital I in "something." It is the kind of mistake any of us might make. Now imagine that it is the only spelling or grammar error on an entire site (though it is not). Maybe you copied the site a few times and corrected the I in the copies, so it is no longer capitalized there. On the original version of the site, however, the error still exists. Now imagine that millions of people are poring over every phrase on the site, more and more of them find the error, and one of them is an influential editor.

Although the mistake is very small, it is enough to threaten a writer's reputation. This is much the situation that Thomas Nicely stumbled into with his new Pentium processor in October 1994. While computing Brun's constant, he relied on the Intel processor and its floating-point unit, and realized that the answers the processor gave were slightly off.

The vast majority of people would never have noticed such a small error. After all, it was not the end of the world. But Nicely was an exception: the error undermined his research and caused a lot of problems in his calculations. In a 1994 CNN interview, later shared on Usenet, Nicely recounted the story.

Nicely's 60 MHz Pentium chip was the culprit, and it took him months to trace the problem to the CPU. For Nicely and other mathematicians, tracking down such problems is a headache. But even a small error like this was enough to damage the high-profile Pentium's reputation in the exacting field of mathematics.
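For context, Brun's constant is the sum of the reciprocals of all twin primes, (1/3 + 1/5) + (1/5 + 1/7) + (1/11 + 1/13) + ..., so estimating it means adding up a very large number of floating-point divisions. The Python sketch below is not Nicely's program. It is a small illustration, with hypothetical function names and a tiny limit, of computing a partial sum in floating point and cross-checking it against exact rational arithmetic, the kind of comparison that can expose a faulty divider.

```python
from fractions import Fraction


def primes_up_to(limit):
    """Return all primes <= limit using a simple sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def twin_prime_pairs(limit):
    """Return pairs (p, p + 2) where both members are prime and <= limit."""
    primes = primes_up_to(limit)
    prime_set = set(primes)
    return [(p, p + 2) for p in primes if p + 2 in prime_set]


def brun_partial_sum_float(limit):
    """Partial sum of Brun's constant using hardware floating-point division."""
    return sum(1.0 / p + 1.0 / q for p, q in twin_prime_pairs(limit))


def brun_partial_sum_exact(limit):
    """The same partial sum in exact rational arithmetic (no FPU division)."""
    total = Fraction(0)
    for p, q in twin_prime_pairs(limit):
        total += Fraction(1, p) + Fraction(1, q)
    return total


if __name__ == "__main__":
    limit = 100  # illustrative only; real estimates run over vastly more primes
    approx = brun_partial_sum_float(limit)
    exact = brun_partial_sum_exact(limit)
    print("floating-point partial sum:", approx)
    print("difference from exact sum: ", abs(approx - float(exact)))
```

On a correctly working processor the two results agree to within ordinary rounding error. A discrepancy far larger than that points to the arithmetic rather than the mathematics.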

After Nicely reported the error on CompuServe on October 30, 1994, it became one of the first stories to truly spread across the Internet. Just a few days later, someone posted a message about the problem to the Usenet group comp.sys.intel, confirming the floating-point error. The story was then picked up by EE Times, an engineering trade publication, and spread across the country. "In my opinion, the Pentium in the 60-90 MHz models only does floating-point division to single precision," wrote Terje Mathisen, a Norwegian programmer.

From then on, the story began to attract attention in engineering and mathematical circles. But the real problem may be that Intel made a bigger mistake in its response: a business mistake.

By late November and early December, the story had drawn wide attention and become one of the biggest technology stories of 1994, the year the Internet began to enter the mainstream, if in a slightly embarrassing way. It is worth noting, however, that although the flaw stemmed from a chip design error, the real problem lay in how Intel handled it. Simply put, Intel's best users did not get the respect they deserved.

1913 was the year the mathematician Émile Borel first proposed the infinite monkey theorem, the famous idea that if a million monkeys typed 10 hours a day on a million typewriters, they would eventually produce a great literary work. In many ways, the problem Nicely stumbled upon is the academic equivalent.

An Intel Pentium chip. Image: Krzysztof Burghardt / Wikimedia Commons

For Intel, the problem was not the flaw itself but how the company dealt with it. Going back to the earlier example: what would you do if a highly regarded editor emailed to tell you that a miscapitalized letter had slipped onto your site?

Maybe you would find the mistake and quietly fix it. But Intel did not have that option. Like a correction in a printed newspaper, an error in a chip is essentially permanent. (Well, unless you use something like a field-programmable gate array.) The best Intel could do was eliminate the error in future versions. Software can work around the problem to some extent, but a flaw in the chip cannot be fully corrected unless the chip is replaced. To put it bluntly, the floating-point error was bad, but in the grand scheme of things it was small.

It is like owning a calculator that occasionally gives you a wrong answer. Compare that with modern processors, which have been plagued by Meltdown and Spectre (the former affects many Intel, Power Architecture, and ARM chip designs released over the past 20 years). Those flaws are generally far more damaging.

They are not theoretical issues but fundamental security risks. Mitigating these two flaws has meant that hardware and software makers had to disable some processor features, making people's computers run slower. In some settings, such as cloud computing, that change means the same processor costs more money and time to do the same work. On top of the reputational damage, Intel is still working to fix these flaws.

So what about the floating-point error? Thomas Nicely, the scholar who discovered it, said that although it was a problem for him, processors of the time were already very complicated, which means the flaw might never have been found if he had not gone looking. "The current generation of microprocessors has become so complex that it is impossible to debug a processor completely," he told PC Magazine in early 1995. But Intel could certainly have handled things better. As the mathematician and MATLAB developer Cleve Moler recalled in 2013, Intel's initial response to customers left much to be desired:

Recently there has been a lot of discussion on the Internet about a floating-point flaw in the Pentium processor. For most users, this is not a problem.

The fact is that Intel has detected a subtle flaw in the accuracy of division operations on the Pentium processor. In very rare cases (once in 9 billion), the accuracy of the result is reduced. Intel found this tiny flaw after several trillion floating-point operations in its ongoing testing of the Pentium processor. Intel immediately tested the most demanding technical applications that use the floating-point unit, and over months of testing we have found no errors. In fact, after extensive testing and the shipment of millions of Pentium-based systems, we know of only one reported instance that affected a user. In that case, a mathematician doing theoretical analysis of prime numbers and reciprocals saw reduced accuracy at the ninth place to the right of the decimal point.

In fact, extensive engineering tests have shown that an average spreadsheet user might encounter this minor loss of accuracy once in every 27,000 years of use. Based on these empirical observations and our extensive testing, users of conventional software will not be affected. If you have concerns about the accuracy of prime number generation or other complex mathematical problems, please call 1-800-628-8686 (international: 916-356-3551). Otherwise, your Pentium processor-based system will not experience any problems. Should such a situation arise in your use of the computer, Intel will work with you to resolve it.
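To make the statement's "accuracy of division operations" concrete, here is a minimal Python sketch of a division self-check. The operand pair 4195835 and 3145727 is an assumption taken from a widely circulated test case for this flaw, not from the article, and the function names are hypothetical. On hardware that divides correctly, the residual for this pair is exactly 0.0. Flawed Pentiums were widely reported to return 256 instead.

```python
# Operands from a widely circulated FDIV test case (an assumption; they do
# not appear in this article).
X = 4195835.0
Y = 3145727.0


def fdiv_residual(x, y):
    """Return x - (x / y) * y.

    For the operand pair above, a correctly rounded double-precision
    division makes this exactly 0.0.
    """
    return x - (x / y) * y


def division_looks_flawed(x=X, y=Y):
    """Report whether the residual check fails for the given operands."""
    return fdiv_residual(x, y) != 0.0


if __name__ == "__main__":
    print("quotient:", X / Y)
    print("residual:", fdiv_residual(X, Y))
    print("division looks flawed:", division_looks_flawed())
```

The check only says something about this particular operand pair. For arbitrary inputs, a small nonzero residual can be ordinary rounding rather than a hardware fault.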

As noted above, Intel was very exposed to the million-monkeys problem. What stands out in this statement is that Intel knew about the flaw before Nicely came forward, and had let it slide. Part of what made this so problematic is that the company's focus had shifted from the technical community, which cared about such things, to ordinary consumers, who did not. Intel was courting consumers hard, running its Intel Inside brand campaign that year alongside the consumer-friendly (and trademark-friendly) Pentium brand. But in trying to win over the average buyer, Intel seemed to imply that it no longer took its existing user base seriously.

For those focused on technical applications, the floating-point division flaw created uncertainty, and Intel's response was not satisfactory. In a 1994 Wall Street Journal article, the Jet Propulsion Laboratory researcher Dave Bell made it clear that the scientific community might stop using the Pentium because of its misgivings about the chip.

The late Andy Grove, Intel's former chief executive, who died in 2016. Image: Intel free media

Finally, at the end of 1994, Andy Grove, Intel's CEO, posted a response on the newsgroup comp.sys.intel. It did not go well, especially for someone with Grove's technical reputation. At first the message was posted by Intel's Richard Wirt, leading to accusations that the response was fake. Grove then sent it out under his own name, stressing that he took the problem very seriously and noting that it had not surfaced on Intel's side until more than a year after the processor's initial release. "We delayed the launch of the chip by several months to allow more time to check the chips and systems," he wrote, stressing that no chip is perfect. "To this end, we also worked extensively with many software companies." As a response, it struck a better tone than the customer service statement that had infuriated many technical users. But a look at the thread shows that Grove still had plenty of angry replies to deal with.

"It really pisses me off. I spent a lot of money on this chip," one respondent wrote, "but I'm a nobody because I don't do a lot of complicated math work for big companies that might buy Pentium products wholesale." The dynamic resembles the pile-ons you see on Twitter today. Around Thanksgiving 1994, after Grove's message spread across the Internet, the mainstream media began covering the story, and the company's stock took a significant hit.

The timing was bad in many ways: 1994 was the first year many families brought home multimedia, Internet-ready computers, and many of those machines had Pentium chips. These products had been clearly marketed as something ordinary consumers could buy. The day after Black Friday, Grove found himself trying to ease the concerns of technical users and academics. The mainstream media, for its part, actually played the chip saga down for the public. Some competitors took advantage. IBM, which was releasing its first PowerPC machines to the public, pulled Pentium chips from its machines and publicly claimed that ordinary consumers would hit an error every 24 days, rather than every 27,000 years. Maybe the truth was somewhere in the middle? It was not a good time for Intel. The public relations crisis finally reached its inevitable end. Just before Christmas, Intel saw the writing on the wall and recalled the chips. Brun's constant had pulled the trigger.
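The gap between Intel's 27,000 years and IBM's 24 days comes down largely to how many divisions a typical user is assumed to perform. The Python sketch below backs out the daily division count each figure implies. It assumes, purely for illustration, that Intel's stated odds of one error in 9 billion divisions apply to both estimates and that errors are independent. IBM's estimate may well have rested on different odds, and the function names are hypothetical.

```python
DAYS_PER_YEAR = 365.25  # average calendar year, an assumption for the conversion

# Figures quoted in the article.
ERROR_ODDS = 9e9               # one flawed result per 9 billion divisions
INTEL_INTERVAL_YEARS = 27_000  # Intel: one error per 27,000 years of use
IBM_INTERVAL_DAYS = 24         # IBM: one error every 24 days


def implied_divisions_per_day(error_odds, interval_days):
    """Divisions per day needed to expect one error every `interval_days`."""
    return error_odds / interval_days


def days_between_errors(error_odds, divisions_per_day):
    """Expected number of days between errors at a given daily division count."""
    return error_odds / divisions_per_day


if __name__ == "__main__":
    intel_rate = implied_divisions_per_day(
        ERROR_ODDS, INTEL_INTERVAL_YEARS * DAYS_PER_YEAR
    )
    ibm_rate = implied_divisions_per_day(ERROR_ODDS, IBM_INTERVAL_DAYS)
    print(f"Intel's figure implies about {intel_rate:,.0f} divisions per day")
    print(f"IBM's figure implies about {ibm_rate:,.0f} divisions per day")
```

Under those assumptions, Intel's figure corresponds to roughly 900 divisions per day and IBM's to 375 million, which shows how much the headline number depends on the assumed workload.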

Intel's recall of the Pentium chip forced the company to take a write-off to cover the cost. After all, any consumer who wanted a replacement would get a corrected version of the processor. Despite the cost, Intel saw rapid growth in demand for its 486 and Pentium processors during the 1994 holiday season. Maybe the extra press was a good thing.

"Bad companies are destroyed by crisis; good companies survive them; great companies are improved by them." (Andy Grove)

What is there to say about the saga of the Pentium's floating-point division flaw? Intel found a great way to turn the crisis into an important learning moment. The clearest sign was the company's decision to turn the infamous flawed chips into keychains. Each carried a quote from Grove, a reminder to employees that they are not perfect and need to learn from their mistakes.

Flaws and all, the Pentium processor became one of the most important technologies released in the 1990s. It elevated the CPU from an anonymous component hidden in a box to a household name, just as intended. And arguably, although some technical users were unhappy, the saga enhanced the company's image among everyday computer users, which was the goal of its emphasis on brand building. It is also worth noting that the saga made a mathematician famous, which does not happen often.

Before the fateful equation that cost Intel $500 million, Thomas Nicely's best-known work was a board game that anticipated fantasy football. He admitted that he had not seen the attention coming. "Mathematicians generally lead very private lives," he told the Associated Press at the height of the scandal. "I feel embarrassed to see my name in print."

Perhaps one side effect of the scandal was that it changed his life for a time. And he added a little more precision to Brun's constant.

Source: Big Data Digest. Editor in charge: Mao Xinsi