Google: AI can automatically summarize text

Google Brain and a team from Imperial College London have built a system called PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence models), which uses Google's Transformer architecture combined with pre-training objectives tailored to abstractive summarization. It is reported to have reached state-of-the-art results on 12 benchmarks covering science, stories, email, patents and legislative bills. Not only that, it also performs remarkably well on summarization tasks where training material is scarce.
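
The "gap-sentences" in the name refer to the pre-training objective: whole sentences are removed from a document and the model is trained to regenerate them from the remaining text. Below is a minimal sketch of that masking step, assuming a toy word-overlap heuristic for choosing which sentences to remove (the paper scores sentences differently); it is purely illustrative, not the authors' exact procedure.

```python
# Minimal sketch of a gap-sentence masking step. The sentence-importance
# heuristic here (word overlap with the rest of the document) and the
# "<mask>" token are simplifications for illustration only.
import re


def gap_sentence_mask(document: str, mask_ratio: float = 0.3):
    """Remove 'important' sentences and return (masked_input, target)."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    n_mask = max(1, int(len(sentences) * mask_ratio))

    def importance(i):
        # Score a sentence by how many of its words also appear elsewhere.
        words = set(sentences[i].lower().split())
        rest = set(" ".join(s for j, s in enumerate(sentences) if j != i).lower().split())
        return len(words & rest)

    masked_ids = set(sorted(range(len(sentences)), key=importance, reverse=True)[:n_mask])
    masked_input = " ".join(
        "<mask>" if i in masked_ids else s for i, s in enumerate(sentences)
    )
    target = " ".join(sentences[i] for i in sorted(masked_ids))
    return masked_input, target


doc = ("PEGASUS is a summarization model. It was pre-trained on web and news text. "
       "The pre-training objective removes whole sentences. The model must regenerate them.")
print(gap_sentence_mask(doc))
```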

As the researchers point out, the goal of text summarization is to condense input documents into accurate and concise summaries.

Abstractive summarization does not simply copy and paste fragments of the input text; instead, it generates new words and phrases that capture the important information while keeping the output fluent.
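
For a concrete sense of what abstractive summarization looks like in code, the sketch below runs a publicly released PEGASUS checkpoint through the Hugging Face transformers library; the model name (google/pegasus-xsum), the example article and the generation settings are assumptions made for this illustration, not part of the original article.

```python
# Illustrative use of a released PEGASUS checkpoint via the Hugging Face
# transformers library (this tooling is an assumption of the sketch, not
# the authors' own setup).
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

article = (
    "Google Brain and Imperial College London built PEGASUS, a Transformer "
    "model pre-trained with a gap-sentence objective for abstractive "
    "summarization. It reached state-of-the-art results on 12 benchmarks."
)

inputs = tokenizer(article, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```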

The Transformer is a neural network architecture introduced by researchers at Google Brain.

Like other deep neural networks, it extracts features and learns to make predictions: neurons are arranged in interconnected layers that pass signals derived from the input data forward, and the weight of each connection is adjusted during training.

But the Transformer architecture is distinctive: every output element is connected to every input element, and the weights between them are computed dynamically from the data rather than fixed in advance.
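
That dynamic weighting is what the Transformer's attention mechanism computes. Below is a minimal NumPy sketch of scaled dot-product attention, showing how the weight between each output position and each input position is derived from the inputs themselves; the matrix sizes and random values are purely illustrative.

```python
# Minimal NumPy sketch of scaled dot-product attention: the connection
# weights between positions are computed from the inputs (via queries
# and keys), not stored as fixed parameters.
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over input positions
    return weights @ V, weights                        # weighted sum of values


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 output positions, feature dimension 8
K = rng.normal(size=(6, 8))   # 6 input positions
V = rng.normal(size=(6, 8))
output, attn = scaled_dot_product_attention(Q, K, V)
print(attn.shape)  # (4, 6): one dynamically computed weight per (output, input) pair
```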

In the tests, the team chose the best-performing PEGASUS model, which contains 568 million parameters. It was trained on two corpora: one consisting of 750 GB of text extracted from 350 million web pages, and another covering 1.5 billion news articles, totaling 3.8 TB. For the latter, the researchers said, they seeded their web crawlers with a whitelist of domains, so the collected content varies in quality.

According to the researchers, PEGASUS produces summaries with a high level of fluency and coherence. Moreover, in low-resource settings, even with only 100 example articles, the quality of its summaries is comparable to models trained on full datasets of 20,000 to 200,000 articles.
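
That low-resource result comes from fine-tuning the pre-trained model on a small set of labeled document-summary pairs. Below is a minimal sketch of what such fine-tuning could look like, assuming PyTorch, the Hugging Face transformers library and the publicly released google/pegasus-large checkpoint; the hyperparameters and the tiny example list are illustrative, not the researchers' actual setup.

```python
# Illustrative low-resource fine-tuning loop (an assumption of this sketch,
# not the authors' exact procedure), using PyTorch and the Hugging Face
# transformers library on a tiny list of (document, summary) pairs.
import torch
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-large"          # pre-trained, not yet fine-tuned
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A handful of labeled examples stands in for the ~100-article setting.
pairs = [
    ("Long source document text goes here ...", "Short reference summary."),
    # ... more (document, summary) pairs
]

model.train()
for epoch in range(3):
    for document, summary in pairs:
        inputs = tokenizer(document, truncation=True, return_tensors="pt")
        # PEGASUS uses the same tokenizer for source and target text.
        labels = tokenizer(summary, truncation=True, return_tensors="pt").input_ids
        loss = model(**inputs, labels=labels).loss   # seq2seq cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```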