Huawei launches the AI poet Yuefu: Tang poems and Song lyrics are no trouble at all

category:Internet


Original · Quantum Bit 2019-09-08 12:09:11

Qianming, reporting from Aofei Temple

Quantum Bit Report | WeChat official account QbitAI

When it comes to literature and art, science and engineering students may be no slouches after all.

Don't believe it? Take a look at this seven-character quatrain:

Some netizens said after reading:

Truly, the rhyme, mood, and meaning all earned praise.

Not only can it write poems, it can also write ci lyrics, such as this Man Jiang Hong:

It can even write acrostic poems:

Could you have imagined that these are the handiwork of engineers who do not know how to write poetry at all?

But it is.

These poems come from Yuefu, the poetry AI newly launched by Huawei's Noah's Ark Lab.

As soon as it appeared, it attracted a great deal of attention.

Some people praise its works:

The poems are rich in meaning, neat and interesting, and the engineering is impressive. Kudos to the developers.

Others weighed in with verse of their own:

Frontier geese fly south of the river; how many letters from home link up north of the sea. Don't say the migrating geese shed no tears, toiling toward Yanran year after year. I don't believe this AI is below the average level of Peking University's Chinese department.

Some even say that Li Bai would fall silent at the sight of it, and Du Fu would shed tears.

Of course, some people point out the following problems:

It's neat, but mostly still at the level of syntax rather than semantics. It's missing a bit of soul.

There are also truth-tellers who point out:

Xin Qiji's fluid use of allusions and Du Fu's solemn, cadenced style are hard for an AI to learn. The problem is not that the AI is too strong, but that readers cannot recognize the more sophisticated techniques in metrical poetry.

In response to these questions, Liu Qun, chief scientist of speech and semantics at Huawei's Noah's Ark Lab, answered questions on Weibo, revealing some of the story behind this AI:

In fact, we don't understand poetry, nor did we use the rules of poetry to train the system; the system learned them by itself.

So how did this AI learn? The paper has been published.

The engineers' literary flair comes from GPT

Unlike free-form text generation, generating classical Chinese poetry is a challenge: a poem usually has to satisfy requirements of both form and content.

Classical Chinese poetry takes many forms: five-character quatrains, seven-character quatrains, five-character regulated verse, seven-character regulated verse, ci patterns such as Man Jiang Hong, Xi Jiang Yue, and Shui Diao Ge Tou, as well as couplets. Each has its own rules on the number of characters, rhyme, tonal pattern, and antithesis.

The requirements on content sound simple but are harder to pin down: a poem should develop around a single theme, and its content should be coherent.

Unlike most existing approaches, Huawei's Yuefu system requires no hand-crafted rules or features, and no additional neural components were designed.

All the researchers needed to do was serialize the training poems into formatted text sequences to use as training data.

Then, by sampling tokens from the language model, the system can generate poems that satisfy both form and content requirements: quatrains, regulated verse, ci, and couplets.

They also proposed and implemented a fine-tuning method for generating acrostic poems.

The power behind all this comes from GPT, the pre-trained language model proposed by OpenAI. Its core idea is to pre-train a generative language model on unlabeled text, and then fine-tune the model on labeled data for specific tasks.

Yuefu is the first poetry-generation system based on GPT, and its implementation is closely tied to BERT, proposed by Google.

The whole GPT model is built on BERT's source code: the Transformer size configuration is the same as BERT-Base, and the tokenization script and Chinese vocab released with BERT are used as well.
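For context, BERT-Base's published Transformer sizes are fixed and well documented. The sketch below restates them; it is not Huawei's actual configuration code, and the field names are illustrative:

```python
# BERT-Base Transformer configuration as published by Google.
# The paper states Yuefu's GPT reuses these sizes; this dict is an
# illustrative sketch, not Huawei's actual configuration file.
bert_base_config = {
    "num_hidden_layers": 12,    # Transformer blocks
    "hidden_size": 768,         # model dimension
    "num_attention_heads": 12,  # 768 / 12 = 64 dims per head
    "intermediate_size": 3072,  # feed-forward inner dimension (4 * 768)
    "vocab_size": 21128,        # size of the released Chinese vocab
}

# Rough weight count of one Transformer block (attention projections
# plus the two feed-forward matrices), ignoring biases and layer norms:
h = bert_base_config["hidden_size"]
ffn = bert_base_config["intermediate_size"]
params_per_block = 4 * h * h + 2 * h * ffn
print(params_per_block)  # 7077888 weights per block
```

Reusing a known configuration and vocabulary means only the training data and objective change between BERT and this GPT, not the network shape.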

Specifically, the process of training the model of poetry generation is as follows:

There are two stages in the whole model training process: pre-training and fine-tuning.

Huawei's GPT model is pre-trained on a Chinese news corpus, then fine-tuned on publicly available classical Chinese poetry.

As shown in the figure above, example poems are first converted into formatted sequences. Each sequence consists of three main parts: the format, the theme, and the poem body, separated by identifier tokens.

Couplets have no theme, so the first line serves as the theme and the second line as the body. Generating a couplet thus becomes "given the first line, produce the second line", which matches the traditional practice of couplet matching.
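Concretely, the serialization described above might look like the following sketch. The article does not give the actual identifier tokens, so the `[SEP]`/`[EOS]` markers and function names here are illustrative assumptions:

```python
def serialize_poem(form: str, theme: str, body: str) -> str:
    """Turn a poem into one training sequence of the shape
    <format> [SEP] <theme> [SEP] <body> [EOS].
    The literal separator strings are illustrative assumptions."""
    return f"{form} [SEP] {theme} [SEP] {body} [EOS]"

def serialize_couplet(first_line: str, second_line: str) -> str:
    # Couplets have no theme: the first line plays the theme role and
    # the second line is the body, so generation becomes "given the
    # first line, produce the matching second line".
    return serialize_poem("couplet", first_line, second_line)

seq = serialize_poem("five-char quatrain", "moonlight", "床前明月光...")
print(seq)  # five-char quatrain [SEP] moonlight [SEP] 床前明月光... [EOS]
```

Because everything (form, theme, body) lives in one flat token sequence, a single language model can learn all the poem types at once, with no per-form architecture.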

The datasets are not small. The pre-training Chinese news corpus contains 235 million sentences; the fine-tuning dataset contains 250,000 quatrains and regulated poems, 20,000 ci, and 700,000 couplet pairs.

Pre-training was done on Huawei Cloud: four epochs on eight NVIDIA V100 (16 GB) GPUs, taking 90 hours.

Fine-tuning feeds all the poetry sequences into the Transformer and trains an autoregressive language model, with the objective of maximizing the probability of the observed sequences:
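The objective, which appeared as an image in the original article, is presumably the standard autoregressive log-likelihood over each serialized sequence \(x_1, \ldots, x_n\):

```latex
\mathcal{L}(\theta) = \sum_{t=1}^{n} \log P_{\theta}\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

Maximizing this sum trains the Transformer to predict each token from the tokens before it, which is exactly what token-by-token decoding later relies on.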

Fine-tuning does not take very long, and should not: if trained too long, the model tends to reproduce sentences verbatim from the corpus during generation.

After training, to generate a poem, the desired format and theme are converted into an initial sequence, which is fed into the model; the remaining field, the poem body, is then decoded token by token.

During decoding, no hard constraints are applied to enforce format correctness; the model places commas and periods at the right positions on its own. When the EOS token is generated, decoding ends.
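The generation procedure in the two paragraphs above amounts to plain autoregressive decoding until EOS. A schematic sketch, with a toy stand-in for the model rather than Huawei's actual API:

```python
def generate(model_step, prefix_tokens, eos_id, max_len=128):
    """Autoregressive decoding: start from the serialized format+theme
    prefix and append tokens until EOS appears.
    `model_step` is a hypothetical callable returning the next token id
    given the sequence so far."""
    seq = list(prefix_tokens)
    while len(seq) < max_len:
        nxt = model_step(seq)
        if nxt == eos_id:
            break  # model decided the poem is complete
        seq.append(nxt)
    return seq

# Toy stand-in model: emits token 7 twice, then EOS (id 0).
script = iter([7, 7, 0])
out = generate(lambda s: next(script), prefix_tokens=[1, 2], eos_id=0)
print(out)  # [1, 2, 7, 7]
```

Note that nothing in the loop enforces line lengths or punctuation; as the article says, the model is trusted to place commas, periods, and EOS itself.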

Moreover, a truncated top-k sampling strategy, rather than beam search, is used to obtain varied poems. Specifically, at each step the k tokens with the highest probability are selected first, and then one token is sampled from those top k.

They report that even with truncated top-k sampling, the generated poems still come out in the correct form.
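Truncated top-k sampling as described can be sketched generically as follows (a standard illustration, not Huawei's actual decoding code):

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Truncated top-k sampling: keep only the k highest-probability
    tokens, renormalize, and sample one of them."""
    logits = np.asarray(logits, dtype=np.float64)
    top_k_ids = np.argsort(logits)[-k:]             # indices of the k largest logits
    top_logits = logits[top_k_ids]
    probs = np.exp(top_logits - top_logits.max())   # stable softmax over the top k
    probs /= probs.sum()
    return int(rng.choice(top_k_ids, p=probs))

rng = np.random.default_rng(0)
logits = [2.0, 0.5, 3.0, -1.0, 1.0]
token = top_k_sample(logits, k=2, rng=rng)
# token is always one of the two most likely ids (0 or 2) here
```

Unlike beam search, which deterministically tracks the highest-scoring continuations, this adds controlled randomness, so repeated runs on the same theme yield different poems.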

The paper explains that acrostic poems are trained the same way; only the sequence formatting differs. The original theme of a poem is replaced by the string formed from the first character of each line. For example: five-character quatrain (format), Bed-Doubt-Raise-Lower (the acrostic heads), followed by the poem body ("Before my bed the bright moonlight, I doubt it is... moon, I bow my head and think of my homeland").

What about the results? Huawei gives a full demonstration in the paper. For example, of the four poems below titled "Jiang Shang Tian Jia" (A Farmhouse on the River), only one was written by a Tang poet; the other three came from Yuefu.

From top to bottom they are A, B, C, and D. Can you tell which one is the real Tang poem? (The answer is revealed at the end of the article.)

Which AI poet is the strongest?

Huawei's Yuefu is neither the first nor the last AI for classical Chinese poetry.

Before it, there was Nine Songs (Jiuge), created by Sun Maosong's team at Tsinghua University.

According to the official introduction, that system uses deep learning, combines several models specially designed for poetry generation, and is trained on more than 800,000 poems written by human poets. It features multi-modal input, multiple genres and styles, and a human-machine collaborative creation mode.

Recently, people have also trained a Chinese GPT-2 on Chinese corpora and used it for poetry generation.

On the very day Yuefu went online, a joint team from Peking University and the National University of Defense Technology released another new poetry-writing model: based on unsupervised machine translation, it generates seven-character poems from vernacular Chinese input, using segmentation-based filling and reinforcement learning.

So which one is stronger?

Since neither the Chinese GPT-2 nor the PKU joint team's system is yet open for public trial, only Huawei's Yuefu and Tsinghua's Nine Songs take part in this duel atop Mount Hua.

Round 1: theme "Summer", seven-character quatrain

Tsinghua Nine Songs' poem:

Huawei Yuefu's poem:

Both AIs have flaws. Tsinghua's Nine Songs opens by declaring that autumn has come, and Huawei's Yuefu mentions April; both clearly stray from summer.

By comparison, though, Huawei's Yuefu has more summer elements, such as lotus and summer shade.

Round 2: theme "Long Night", five-character quatrain

Tsinghua Nine Songs' poem:

"Sitting alone without worry, facing each other in sorrow"? This mood... emmm... a marriage about to fall apart?

Huawei Yuefu's work:

Intuitively, the mood is well drawn, but the impact is insufficient.

In this round both AIs performed well, each capturing an appropriate mood. Comparatively, Tsinghua Nine Songs' emotional layer is richer.

Round 3: acrostic poem "Neural Network", seven-character quatrain

Tsinghua Nine Songs' entry:

In terms of rhyme and mood, it is not bad. Huawei Yuefu offered this poem:

Likewise, this acrostic also manages to convey some mood.

In this round, both AIs completed the task accurately and produced poems with some genuine mood.

After three rounds, on the whole it is hard to separate the two. Where they differ is in how they work.

Tsinghua's Nine Songs, built from several models specially designed for poetry generation, is relatively complex, with the format of the poems strictly controlled. It is rigorous, but its generation speed is indeed slow.

Huawei's Yuefu is based only on GPT. In Liu Qun's words, they don't understand poetry and did not use poetry rules to train the system; it learned by itself, and it writes poems very quickly.

Liu Qun is also modest about the level of poetry produced by Yuefu AI.

We asked people who understand poetry to take a look; they said the tonal patterns are not entirely correct, but to a layman it reads quite smoothly.

Huawei's Noah's Ark Lab was established in 2012 and belongs to Huawei's 2012 Laboratories.

The name Noah's Ark also reflects the lab's importance within Huawei: Ren Zhengfei has previously said he hoped these labs would become Huawei's Noah's Ark.

At present, the lab has branches in Shenzhen, Hong Kong, Beijing, Shanghai, Xi'an, North America, and Europe. Its research directions include computer vision, natural language processing, search and recommendation, decision-making and reasoning, human-computer interaction, AI theory, and high-speed computing.

As for Yuefu, Huawei notes in the paper that it is a by-product of their research on GPT. Yuefu is now live in the "EI Experience Space" mini program.

It supports five- and seven-character quatrains, five- and seven-character regulated verse, as well as acrostic poems. Ci and couplets are not yet online.

Finally, here is a seven-character acrostic on "Artificial Intelligence" generated by Yuefu.

Yes, the answer is C.

Source: Quantum Bit | Responsible editor: Wang Fengzhi_NT2541