Nvidia Launches Conversational AI Technology for Smarter Bots

by Kelvin


Now that just about every gadget and mobile device has adopted, or at least experimented with, voice control, conversational AI is fast becoming the new frontier. Rather than handling a single request and returning an answer or action, conversational AI aims to support real-time interactions that span multiple questions, answers, and follow-ups. While the core components of conversational AI, such as BERT and RoBERTa for language modeling, are similar to those used for single-utterance speech recognition, conversation adds much stricter performance requirements for training, inference, and model size. Today, Nvidia released three open-source technologies designed to address those problems.
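The single-query building block underneath such a system is a pretrained language model answering a question over some context. As a rough illustration only (not part of Nvidia's release), here is a minimal sketch using the Hugging Face transformers library with an assumed distilled BERT-family checkpoint:

```python
# Minimal sketch: answering a single question with a pretrained BERT-family
# model via the Hugging Face transformers library. The checkpoint name is an
# assumption; any BERT/RoBERTa question-answering checkpoint works similarly.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("NVIDIA announced optimizations that train BERT in under an hour "
           "and run inference in a few milliseconds on a Tesla T4.")
result = qa(question="How fast is BERT inference on a T4?", context=context)
print(result["answer"], result["score"])
```

A full conversational system would run a loop of such queries while tracking dialogue state, which is where the extra latency and model-size pressure comes from.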


Faster BERT training

[Image: Nvidia DGX SuperPOD]

Although in many cases it is possible to use a pre-trained language model and simply fine-tune it for a new task, optimal performance in a particular context often requires retraining. Nvidia has shown it can now train BERT (Google's reference language model) in under an hour on a DGX SuperPOD consisting of 1,472 Tesla V100-SXM3-32GB GPUs across 92 DGX-2H servers, with 10 Mellanox InfiniBand adapters per node. No, I don't even want to try to estimate the hourly rental rate for one of those. But since a model like this typically takes days to train even on high-end GPU clusters, it will definitely shorten time-to-market for companies that can afford the cost.
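To give a feel for what multi-GPU training looks like at a much smaller scale, here is a minimal, hypothetical sketch of data-parallel fine-tuning with PyTorch's DistributedDataParallel. The model and dataset are placeholders, and this is not Nvidia's SuperPOD training code:

```python
# Minimal sketch of multi-GPU data-parallel training. Launch one process per
# GPU with: torchrun --nproc_per_node=<num_gpus> finetune.py
# The tiny linear model and random data are placeholders; a real run would
# load a BERT checkpoint and a task-specific dataset instead.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group("nccl")                 # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(768, 2).cuda(rank)      # placeholder "model"
    model = DDP(model, device_ids=[rank])

    data = TensorDataset(torch.randn(1024, 768), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(data)              # shards data across GPUs
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)
        for x, y in loader:
            x, y = x.cuda(rank), y.cuda(rank)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                         # gradients all-reduced across GPUs
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The SuperPOD result is the same basic recipe scaled to hundreds of nodes, plus the fast interconnect and software tuning needed to keep all those GPUs busy.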

Faster language model inference

For natural-feeling conversation, the industry benchmark is a 10 ms response time. Understanding the query and producing a suggested answer is only part of the pipeline, so that step needs to take well under 10 ms. By optimizing BERT with TensorRT 5.1, Nvidia has gotten inference down to 2.2 ms on an Nvidia T4. The good news is that a T4 is within reach of almost any serious project. I used one in Google Compute Cloud for my text generation system; a 4-vCPU virtual server with a T4 rented for just over $1/hour when I was working on that project.
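To see how a project stacks up against that 10 ms budget, you can time inference directly. The sketch below measures plain PyTorch latency for an assumed bert-base-uncased checkpoint; Nvidia's 2.2 ms figure depends on TensorRT optimizations that are not shown here:

```python
# Minimal sketch: benchmarking per-query inference latency on a GPU.
# This times an unoptimized PyTorch model; TensorRT would be much faster.
import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-uncased"                          # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).cuda().eval()

inputs = {k: v.cuda() for k, v in
          tok("What time does the store open?", return_tensors="pt").items()}

with torch.no_grad():
    for _ in range(10):                             # warm-up iterations
        model(**inputs)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(**inputs)
    torch.cuda.synchronize()                        # wait for GPU work to finish

print(f"avg latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")
```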

Support for even larger models

[Image: Faster inference is needed for conversational AI]

One of the Achilles' heels of neural networks is the requirement that all of a model's parameters (including a very large number of weights) fit in memory at once. That limits the complexity of the model you can train on a GPU to what its RAM can hold. In my case, for example, my desktop has an Nvidia GTX 1080 and can only train models that fit in its 8GB. I can train larger models on my CPU, which has more RAM, but it takes far longer. The full GPT-2 language model has 1.5 billion parameters, for example, and an expanded version has 8.3 billion.
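Some quick back-of-the-envelope arithmetic shows why parameter counts translate directly into GPU memory pressure. The figures below cover only the raw fp32 weights; training needs several times more for gradients, optimizer state, and activations:

```python
# Back-of-the-envelope memory estimate for holding model weights alone,
# illustrating why parameter count limits what fits on a single GPU.
def weights_gb(params, bytes_per_param=4):          # 4 bytes per fp32 parameter
    return params * bytes_per_param / 1e9

for name, params in [("BERT-Large", 340e6),
                     ("GPT-2 (full)", 1.5e9),
                     ("8.3B-parameter model", 8.3e9)]:
    print(f"{name}: ~{weights_gb(params):.1f} GB of fp32 weights")
```

Even before training overhead, an 8.3-billion-parameter model needs tens of gigabytes just for its weights, which is why it cannot fit on a single consumer GPU.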



However, Nvidia has devised a way to let multiple GPUs work on the language-modeling task in parallel. As with the other announcements today, it has open-sourced the code that makes this possible. I'll be curious whether the technique is specific to language models or can be applied to multi-GPU training of other kinds of neural networks as well.
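For illustration only, here is a toy example of the general idea of splitting a model across GPUs by placing different layers on different devices. It is a simple pipeline-style split requiring two GPUs, not the implementation Nvidia released:

```python
# Toy sketch of model parallelism: different layers live on different GPUs,
# so a model too large for one card can still run. Requires two CUDA devices.
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))           # move activations between GPUs

model = TwoGPUModel()
out = model(torch.randn(8, 1024))
print(out.device, out.shape)
```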

Along with these developments and the release of the code on GitHub, Nvidia announced it is partnering with Microsoft to improve Bing's search results, as well as with Clinc on voice agents, Passage AI on chatbots, and RecordSure on conversational analytics.

