September 11, 2020
If you have been following the recent developments in the NLP space, then it would be almost impossible to avoid the GPT-3 hype in the last few months. It all started with researchers in OpenAl researchers publishing their paper “Language Models are few Shot Learners” which introduced the GPT-3 family of models.
GPT-3’s size and language capabilities are breathtaking, it can create fiction, develop program code, compose thoughtful business memos, summarize text and much more. Its possible use cases are limited only by our imaginations. What makes it fascinating is that the same algorithm can perform a wide range of tasks. At the same time there is widespread misunderstanding about the nature and risks of GPT-3’s abilities.
To better appreciate the powers and limitations of GPT-3, one needs some familiarity with pre-trained NLP models which came before it. Table below compares some of the prominent pre-trained models:
Let’s look at some of the common characteristics of the pre-trained NLP models before GPT-3:
i) NLP Pre-Trained models are based on Transformer Architecture
Most of the pre-trained models belong to the Transformer family that use Attention techniques;These models can be divided into four categories:
Types of Language Models
ii)Different models for different tasks
The focus has been on creating customized models for various NLP tasks. So we have different pre-trained model for separate NLP tasks like Sentiment Analysis, Question Answering, Entity Extraction etc.
iii). Fine-tuning of pre-trained model for improved performance
For each task the pre-trained model needs to be fine-tuned to customize it to the data at hand. Fine-tuning involves gradient updates for the respective pre-trained model and the updated weights were then stored for making predictions on respective NLP tasks
iv) Dependency of fine-tuning on large datasets
Fine-tuning models requires availability of large custom labeled data. This has been a bottleneck when it comes to extension of Pre-trained model to new domains where labeled data is limited.
v) Focus was more on architectural improvements rather than size
While one saw the emergence of new pre-trained models in a short span, the larger focus has been on bringing architectural improvements or training on different datasets to widen the net of NLP applications.
Key facts about GPT-3:
Sizes, architectures, and learning hyper-parameters of the GPT 3 models
Key design assumptions in GPT 3 model:
(i) Increase in model size and training on larger data can lead to improvement in performance
(ii) A single model can provide good performance on a host of NLP tasks.
(iii) Model can infer from new data without the need for fine-tuning
(iv) The model can solve problems on datasets it has never been trained upon.
Traditionally the pre-trained models have learnt using fine-tuning. Fine-tuning of models need lot of data for the problem we are solving and also required update to model weights. The existing fine-tuning approach is explained in below diagram.
Learning Process for earlier Pre-Trained Language Models — Fine Tuning
GPT-3 adopts a different learning approach. There is no need for large labeled data for inference on new problems. Instead, it can learn from no data (Zero Shot Learning), just one example (One Shot Learning) or few examples (Few Shot Learning).
Below we have shown a representation of different learning approach followed by GPT-3.
BERT was among the earliest pre-trained model and is credited with setting the benchmarks for most NLP tasks. Below we compare GPT-3 with BERT on three dimensions:
Things which stand out from above representation are:
Application of NLP techniques have evolved with the progress made in learning better representation of the underlying text corpus. Below chart gives a quick overview of some of the traditional NLP application areas.
NLP Application Areas
Traditional NLP built on Bag of Words approach was limited to tasks like parsing text, sentiment analysis, topic models etc. With the emergence of Word vectors and Neural Language models, new applications like Machine Translation, entity recognition, information retrieval came into prominence.
In last couple of years, the emergence of Pre-trained models like BERT, Roberta etc. and supporting frameworks like Hugging Face, Spacy Transformers have made NLP tasks like Reading Comprehension, Text Summarization etc. possible and state of the art benchmarks were created by these NLP models.
The frontiers where pre-trained NLP models struggled were tasks like Natural Language Generation, Natural Language Inference, Common Sense Reasoning tasks. Also, there was question mark of application of NLP in these areas where limited data is available. So the question is how much impact is GPT-3 able to make on some of these tasks.
GPT-3 has been able to make substantive progress on i. Text Generation tasks and ii. Extend NLP’s application into domains where there is lack of enough training data.
Though the buzz might suggest otherwise, GPT-3 isn’t as powerful as some of us hope or fear. Many voices (including Sam Altman, who co-founded OpenAI with Elon Musk), call for a more clear-eyed perspective on GPT-3, including its capabilities and limitations.
“The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.”
-Sam Altman, CEO of OpenAI
GPT-3 doesn’t actually understand neither the input nor its output, and so it can make silly-looking errors (and fails the Turing test). This becomes especially visible over longer outputs, as GPT-3 isn’t very adept at holding onto a train of thought. One could say that GPT-3 practices linguistic scrapbooking: it combines snippets, creating textual collages on demand.
One crucial challenge of any new and powerful technology is safety. GPT-3 has the potential to be used in nefarious ways, “from powering better misinformation bots to helping kids cheat on their homework”. Additionally, because it takes its cues from the entirety of the internet, GPT-3 is somewhat likely to repeat harmful biases (something that had to be fine-tuned out of GPT-2).
Other challenges include: model latency, the same model being used too widely, and selection bias towards good examples. Some questions about GPT-3 remain unanswered. We don’t know what the cost per request might be once the tool becomes commercial, or how copyrights of the output text will be handled. We also haven’t seen the SLA for the API.
Despite its limitations, GPT-3 pushes our definition of state-of-the-art language processing technology further ahead. It can generate text in various styles, and, once it becomes a commercial product, it’ll offer a number of exciting benefits to various businesses and individuals. However, it’s important to maintain a level-headed view of GPT-3’s limitations – this will allow us to truly leverage its advantages while avoiding costly mistakes.
In the end, GPT-3 is a correlative, unthinking tool. We may be one step closer to achieving general artificial intelligence, but we’re definitely not there yet. There is potential for select applications of GPT-3, but we must monitor who uses such powerful technology and how. Creators should focus on feedback from the community, and the community should observe the situation as it unfolds, addressing emerging challenges and opportunities.
Input your search keywords and press Enter.
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.