Netease Technology News, August 14. According to foreign media reports, Nvidia has reached several important milestones and broken records in developing and running a GPU-accelerated platform for conversational AI that understands and responds to requests.
This matters to anyone building on Nvidia's technology, including companies of all sizes, because Nvidia has open-sourced much of the code behind these advances. The code is written in PyTorch and is easy to run.
The biggest result Nvidia announced today is a record BERT training time, breaking the one-hour threshold. BERT is one of the most advanced artificial intelligence language models in the world and is widely regarded as a benchmark standard for natural language processing (NLP). Nvidia's AI platform completed training of the model in just 53 minutes. The trained model can then perform inference (i.e., apply the abilities learned in training) in a little over 2 milliseconds, where 10 milliseconds is considered the high-end threshold in the industry. That is another record.
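For context, single-digit-millisecond latency figures like the one above are usually reported as an average over many repeated calls, after a few warm-up runs. The helper below is an illustrative sketch of that measurement pattern, not Nvidia's actual benchmark; the stand-in workload is a placeholder for a real model call.

```python
import time

def measure_latency_ms(infer, warmup=10, runs=100):
    """Average wall-clock latency of one call to infer(), in milliseconds."""
    for _ in range(warmup):
        infer()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    return (time.perf_counter() - start) / runs * 1000.0

# Illustrative stand-in workload; a real benchmark would call the model here.
latency = measure_latency_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f} ms per call")
```

Averaging over warmed-up runs smooths out one-off costs such as cache misses and lazy initialization, which is why published latency numbers are typically quoted this way.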
These breakthroughs are not just bragging rights; the advances offer tangible benefits to anyone using NLP conversational AI on GPU hardware. Nvidia set the training record on one of its SuperPOD systems, composed of 92 Nvidia DGX-2H systems running a total of 1,472 V100 GPUs, and ran inference on Nvidia T4 GPUs using Nvidia TensorRT, which outperformed even highly optimized CPUs by orders of magnitude. Moreover, the company will publish its BERT training code and a TensorRT-optimized BERT sample on GitHub so that anyone can use them.
In addition to these milestones, Nvidia's research division has built and trained the largest ever language model based on the Transformer, the architecture that also underlies BERT. The custom model contains 8.3 billion parameters, 24 times the size of BERT-Large, the largest core BERT model. Nvidia has named the model Megatron, and it has also released the PyTorch code used to train it, so that others can train similar large-scale Transformer-based language models. (Lebanon)
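The "24 times" figure above can be sanity-checked with simple arithmetic. BERT-Large is commonly cited as having roughly 340 million parameters; that number is not stated in the article and is an assumption here.

```python
# Figures: 8.3 billion from the article; ~340 million for BERT-Large is a
# commonly cited figure, assumed here for the comparison.
bert_large_params = 340_000_000
megatron_params = 8_300_000_000

ratio = megatron_params / bert_large_params
print(f"Megatron is ~{ratio:.1f}x the size of BERT-Large")  # ~24.4x
```

The result of about 24.4x is consistent with the article's "24 times" claim.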
Source: Netease Science and Technology Report. Responsible editor: Wang Fengzhi_NT2541