NLP's two forces: OpenAI and GPT-2's relentless advance

The smart speaker in your living room is making rapid progress at everyday conversation, and has even begun cracking jokes with you as it adapts to your tastes and habits.

E-commerce customer service always replies instantly. Your problem may well have been solved without you ever realizing that the agent on the other end was an intelligent customer-service bot. A real-world version of the Turing test plays out every day.

If you often consult foreign-language materials, you may already be used to one-click web-page translation or the built-in translators of several search engines. The quality is now so good that learning a foreign language can feel like a waste of time.

When you scroll through an information feed or short videos, you keep finding yourself hooked for longer and longer. Behind the scenes, it is a natural-language algorithm platform optimizing its recommendations based on your browsing habits and how long your attention lingers.

Starting from these results, we would like to briefly review how NLP has transformed and upgraded in recent years, and trace this technological torrent back to its source. To understand the evolution of NLP, we must return to the headwaters that feed its many branching streams.

NLP's two forces: OpenAI and GPT-2's relentless advance

Anyone who follows NLP knows that 2018 was a landmark year for the field.

In June 2018, OpenAI published a paper titled "Improving Language Understanding by Generative Pre-Training" and proposed GPT, a model based on pre-trained language modeling. It was the first to use a Transformer network in place of LSTM as the language model, and it achieved SOTA performance on 9 of 12 NLP tasks. For various reasons, however, GPT did not attract much attention at the time.

Then in October, Google's BERT (Bidirectional Encoder Representations from Transformers) arrived, drawing wide attention from all quarters the moment it was released. BERT achieved SOTA performance on 11 NLP tasks, leading Google's engineers to declare that it had opened a new era for NLP. In fact, BERT uses the same two-stage approach as GPT: first, unsupervised language-model pre-training; second, fine-tuning to solve downstream tasks. The difference is that in the pre-training stage BERT uses a bidirectional language model, similar to ELMo, and pre-trains on a larger data scale.

BERT is transforming four broad categories of NLP downstream tasks: sequence tagging, such as Chinese word segmentation, part-of-speech tagging, named-entity recognition, and semantic role labeling; classification, such as text classification and sentiment analysis; sentence-pair judgment, such as entailment, question answering, paraphrase, and natural language inference; and generation, such as machine translation, text summarization, poetry writing, and image captioning. This broad applicability, combined with eye-catching task performance, is the foundation of BERT's popularity.

Just four months later, OpenAI released GPT-2. This large-scale unsupervised NLP model can generate coherent paragraphs of text, set new SOTA results on seven datasets, and handle many different language tasks, such as reading comprehension, question answering, and machine translation, without any task-specific fine-tuning.

First, like GPT and BERT, GPT-2 keeps the Transformer's self-attention as its underlying structure.

The OpenAI researchers' insistence on unsupervised training may stem from the view that supervised learning makes a language model perform well only on specific tasks while generalizing poorly, and that simply adding more training samples cannot extend it to new tasks effectively. They therefore chose to apply transfer learning with self-attention modules on top of a more general dataset, building a model that can perform many different NLP tasks in the zero-shot setting.
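The GPT-2 paper itself frames this goal in probabilistic terms: a system trained for one task estimates a conditional distribution over outputs given inputs, while a general system must also condition on which task is being asked for.

```latex
% Single-task learning estimates
p(\mathrm{output} \mid \mathrm{input})
% A general, multitask-capable model should instead estimate
p(\mathrm{output} \mid \mathrm{input}, \mathrm{task})
```

Because GPT-2 expresses the task itself in natural language inside the prompt, a single text model can cover many tasks without separate supervised heads.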

Where GPT-2 differs from BERT is that its model structure retains the one-way (left-to-right) language model of GPT 1.0. GPT-2 seems to have only one goal: given all the preceding words in a text, predict the next word. From this we can see OpenAI's solution.
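That single objective, predicting the next word from all the words before it, can be illustrated with a deliberately tiny Python sketch. The bigram counter below stands in for GPT-2's 1.5-billion-parameter Transformer; the corpus and function names are invented purely for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words tend to follow it."""
    words = text.split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def continue_text(prompt, following, n_words=3):
    """Greedily append the most frequent next word, one token at a time."""
    out = prompt.split()
    for _ in range(n_words):
        counts = following.get(out[-1])
        if not counts:
            break  # word never seen in training: stop generating
        out.append(counts.most_common(1)[0][0])
    return " ".join(out)

corpus = "cats like to sleep and cats like to eat and cats like to sleep"
model = train_bigrams(corpus)
print(continue_text("cats", model, 3))  # → cats like to sleep
```

Where this toy falls back on raw co-occurrence counts of adjacent words, GPT-2 learns the same kind of next-word distribution with self-attention over the entire preceding context.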

OpenAI chose to scale the Transformer up to 48 layers and 1.5 billion parameters, and assembled WebText, a dataset of 8 million web pages, as unsupervised training data. In short, GPT-2 is a direct scale-up of GPT: it trains on more than 10 times the data with 10 times the parameters. This let GPT-2 take a more direct, brute-force approach, surpassing BERT purely by increasing model capacity and the amount of training data.

As a text generator, GPT-2 needs only a few opening words; the program then decides, on its own judgment, how to continue the writing. In short, as a general language model, GPT-2 can be used to build AI writing assistants, more capable dialogue bots, unsupervised language translation, and better speech recognition systems.
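How the program "decides how to write next" comes down to sampling from the model's predicted distribution over next tokens. One standard knob is temperature; the sketch below uses a made-up three-word vocabulary and hypothetical scores purely to show the mechanics.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Turn raw scores into probabilities via a temperature-scaled softmax,
    then draw one token index at random according to those probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

vocab = ["knight", "dragon", "castle"]
logits = [2.0, 1.0, 0.1]                     # hypothetical model scores for the next word
rng = random.Random(0)                       # fixed seed for reproducibility
# Low temperature -> nearly greedy; high temperature -> more adventurous choices.
print(vocab[sample_with_temperature(logits, 0.1, rng)])  # → knight
```

At temperature 0.1 the highest-scoring word is chosen almost deterministically; raising the temperature spreads probability onto the alternatives, which is what makes generated text feel creative rather than repetitive.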

Whether outsiders mocked it as overconfidence in its own product, or suspected OpenAI of deliberate hype for PR purposes, GPT-2's power to fabricate convincing fake news genuinely startled the industry. Onlookers could not wait to probe GPT-2's capabilities for themselves.

Over the following year, GPT-2 underwent a dazzling series of updates, evolving through a process of cautious, staged open-sourcing and eager experimentation by developers.

GPT-2's staged open source: a carnival for developers

Amid the controversy and loud calls from developers, OpenAI, after careful deliberation, still chose to open-source in stages. Starting in August it released, in turn, the small 124-million-parameter model (500 MB on disk), the medium 355-million-parameter model (1.5 GB on disk), and a 774-million-parameter model (3 GB on disk). Finally, on November 6, it officially released the full code of the largest version of GPT-2, with 1.5 billion parameters.

Up to the release of the full version, OpenAI found no clear evidence of abuse in code, documents, or elsewhere; in other words, the feared misuse of GPT-2 had not materialized. Even so, OpenAI still believes that releasing the full version gives malicious actors a chance to further improve their ability to evade detection.

Accordingly, as the different versions of GPT-2 were released, OpenAI communicated with multiple teams that had reproduced GPT-2 to verify how the model was being used, worked to limit the risk of the language model being abused, and improved its detector for machine-generated text. At the same time, OpenAI is cooperating with a number of research institutions on studies such as human sensitivity to digital information generated by language models, the possibility of malicious use of GPT-2, and the statistical detectability of GPT-2-generated text.
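As a toy illustration of what "statistical detectability" can mean in practice: machine-generated text often shows distributional quirks, such as unusually repetitive word choice, that a simple score can pick up. The type-token ratio below only demonstrates the idea; it is not OpenAI's actual detector, and real detectors use far richer statistics.

```python
def type_token_ratio(text):
    """Fraction of distinct words in a text. Heavily repetitive output
    tends to score lower than varied prose."""
    words = text.lower().split()
    return len(set(words)) / len(words)

repetitive = "the the the cat cat sat sat sat on on the the mat"
varied = "a quick brown fox jumps over one lazy sleeping dog today"
print(round(type_token_ratio(repetitive), 2), round(type_token_ratio(varied), 2))  # → 0.38 1.0
```

A detector could flag texts whose score falls below a threshold learned from human writing; the research OpenAI sponsors asks, in effect, how reliable such statistical separations can be made.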

However cautious OpenAI was, as models of different parameter capacities were released, outside developers could not wait to explore in every direction.

In April 2019, BuzzFeed data scientist Max Woolf used Python to wrap the smaller, 117-million-parameter version of OpenAI's GPT-2 text-generation model for fine-tuning and script generation, releasing a simplified GPT-2 package to help people generate passages of text, which can produce plenty of unexpected content.

During OpenAI's gradual open-sourcing, two graduate students at Brown University took the lead in replicating the 1.5-billion-parameter GPT-2 themselves, naming it OpenGPT-2. They trained the model from scratch with their own code at a cost of roughly $50,000, following the dataset-construction methods disclosed in OpenAI's papers as closely as possible. After testing, many enthusiastic netizens said OpenGPT-2's output read better than that of OpenAI's 774-million-parameter GPT-2; others, of course, felt it was no better than the original.

Meanwhile in China, a developer in Nanjing named Zeyao Du released GPT2-Chinese on GitHub, which can write poems, news, novels, and scripts, or train general-purpose language models, at the full 1.5-billion-parameter scale. He has released pre-trained weights and a Colab demo: with just three clicks, anyone can generate a customized Chinese story.

There have been still more experiments with GPT-2. Rishabh Anand, a high-school student in Singapore, released gpt2-client, a lightweight wrapper around the original GPT-2 repository that needs only five lines of code to generate text.

Several researchers from China have used the GPT model to generate high-quality classical Chinese poetry. One example quoted in their paper is a seven-character regulated verse on the theme of a safe journey: autumn geese crossing the sky, a sudden dream of old friends, a lone boat returning home, and farewell lines brimming with vicissitude and sorrow. One cannot help wondering: does the language model really have feelings?

The GPT-2 model can also be used for music composition. OpenAI has built MuseNet, a deep neural network for generating musical works. It uses the same general-purpose unsupervised technology as the GPT-2 language model, the Sparse Transformer, which lets MuseNet predict the next note from a given group of notes. The model can produce four-minute pieces with 10 different instruments, learn different musical styles from Bach, Mozart, the Beatles, and other composers, and even convincingly blend different styles into a new work.

What interests me most is AI Dungeon, an AI text-adventure game a developer built on GPT-2. Through rounds of text dialogue, the AI can send you on an unexpected journey as a dragon-slaying knight or an urban detective. In the game industry of the future, might AI-written story scripts prove even more imaginative?

In the year since GPT-2's release, the applications spawned by this open-sourcing have been nothing short of dazzling. Behind the noise and prosperity, beyond its caution about open-source risks, what other challenges does OpenAI face?

NLP's money burner: GPT-2 commercialization after OpenAI's marriage with Microsoft

In fact, the evolution of BERT and GPT-2 shows that with larger-capacity models and unlimited unsupervised training, we can create more and better content that conforms to human linguistic knowledge. But it also means ultra-expensive GPU compute time, ultra-large GPU machine-learning clusters, and ultra-long model-training runs. This burn-money mode will inevitably concentrate NLP players among the leading companies, turning the field into an arena for a few regional heroes.

It is predictable that if OpenAI launches a GPT-3.0 this year, it will probably stick with the one-way language model but compete head-on with BERT through still larger training data and an expanded model, and the records in applied NLP will be refreshed once again.

On the other hand, such a money-burning language-model program has no clear prospect of commercial application. OpenAI faces a hard choice: stay true to its technical ideals, or, as the Chinese idiom for compromising to make a living puts it, bow down for five pecks of rice.

The answer now seems clear. In July 2019, OpenAI received a $1 billion investment from Microsoft. According to the official statement, OpenAI will work with Microsoft to develop new AI technologies for the Microsoft Azure cloud platform, and has reached an exclusivity agreement with Microsoft to further expand large-scale AI capabilities and deliver on the promise of artificial general intelligence (AGI).

In fact, OpenAI's heavy spending on AI research and its awkward position in commercialization make it all the more in need of such patronage from Microsoft. Take the 1.5-billion-parameter GPT-2 as an example: it was trained on 256 TPU v3 cores at a cost of $2,048 per hour. It is easy to foresee that if we are to expect a GPT-3.0 release, the cost will go mainly to computing resources in the cloud.
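A back-of-the-envelope calculation makes clear why cloud computing dominates the bill. The hourly rate is the figure quoted above; the one-week training duration is an assumption chosen for illustration, not a reported number.

```python
# Rough training-cost arithmetic for a large language model.
hourly_rate = 2048   # dollars per hour for the 256-core TPU v3 setup (figure from the article)
hours = 7 * 24       # assumed duration: one week of continuous training
total = hourly_rate * hours
print(f"${total:,}")  # → $344,064
```

Even under this conservative one-week assumption, a single training run lands in the hundreds of thousands of dollars, before any hyperparameter search or failed runs are counted.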

Microsoft will become OpenAI's exclusive cloud-computing provider, and OpenAI's AI technology will be delivered through the Azure cloud. Going forward, OpenAI will license some technologies to Microsoft, which will then commercialize them and sell them to partners.

This enormous financial backing has given OpenAI a more solid footing. As summarized above, GPT-2's parameter models were released level by level after August and fully open-sourced in November. Clearly, GPT-2 can lean on Microsoft Azure's support as it moves toward commercialization, for example by integrating with Office 365 to assist automatic text writing, repair grammatical errors, and build more natural, realistic question-answering systems.

A youthful dream may want only to fly forward, but the vision of AGI must also prove itself in commercial practice. It is predictable that in 2020, Google will face the combination of Microsoft and OpenAI, which will stir up still more waves in the commercialization of NLP.