Shubham is the author of the O’Reilly published book “GPT-3: Building Innovative NLP Products Using Large Language Models”.
He’s a prolific writer who has covered a wide variety of deep learning topics, including neural search, on Medium and Twitter.
This is your chance to ask him any questions on GPT-3, Multimodal AI, Neural Search, and anything related to NLP! He’s also a member of this community.
This AMA is taking place on October 20th, and Shubham will be answering questions throughout the day.
If you have any questions for him, leave your question as a reply in the thread, and be sure to tag him (@shubham) so he doesn’t miss it.
# A few ground rules
- Be respectful.
- Write each question in a separate message, so that the guest can reply in thread.
- If you see a question that you really like, give it a reaction so Shubham can prioritize it. At the end of the event, the author of the question with the most reactions will get some swag!
Looking forward to your questions and the discussions. Drop them below
What are your favorite tips for prompt design with GPT-3? Do you have a formula, recipe, or set of best practices you use when constructing prompts? @shubham
I’ve seen the “temperature” dial on GPT-3 and something similar on a couple of other large language models, but I can’t wrap my head around how it works.
Can you share some details about what the temperature dial represents, what it does, and how it works?
What do you think the future of large language models in the industry holds? Which industries will be most affected by large language models? In what way? @shubham
Yes, Harpreet, there is indeed a recipe for prompt design. Here is a five-step formula, whether you are new to it or have been doing it for some time:
Step 1: Define the problem you are trying to solve and bucket it into one of the possible natural language tasks: classification, Q&A, text generation, creative writing, etc.
Step 2: Ask yourself if there is a way to get a solution with zero-shot (i.e. without priming the GPT-3 model with any external training examples).
Step 3: If you think that you need external examples to prime the model for your use case, go back to Step 2 and think really hard.
Step 4: Now think of how you might encounter the problem in textual form, given the “text-in, text-out” interface of GPT-3. Think about all the possible ways to represent your problem as text.
Step 5: If you end up using external examples, use as few as possible and try to include variety in your examples, without overfitting the model or skewing the predictions.
And here is the rule of thumb that you should follow while designing a training prompt:
While designing the training prompt, you should aim for a zero-shot response from the model; if that isn’t possible, move forward with a few examples rather than providing an entire corpus. The standard flow for training prompt design should look like: Zero-Shot → Few-Shot → Corpus-based Priming.
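To make that progression concrete, here is a minimal sketch using the classic openai Python SDK (pre-1.0 Completion API); the model name and the prompts themselves are illustrative assumptions of mine, not anything prescribed by the book:

```python
import openai  # assumes the `openai` package (<1.0) and a configured API key

openai.api_key = "YOUR_API_KEY"

# Zero-shot: no examples, just a clear task description.
zero_shot_prompt = (
    "Classify the sentiment of this tweet as Positive or Negative.\n\n"
    "Tweet: I loved the new Batman movie!\nSentiment:"
)

# Few-shot: the same task, primed with a handful of varied examples.
few_shot_prompt = """Classify the sentiment of each tweet as Positive or Negative.

Tweet: I loved the new Batman movie!
Sentiment: Positive

Tweet: The service at this restaurant was painfully slow.
Sentiment: Negative

Tweet: What a gorgeous sunset tonight.
Sentiment:"""

response = openai.Completion.create(
    model="text-davinci-002",  # placeholder model name; use whatever is available
    prompt=zero_shot_prompt,   # try zero-shot first; fall back to few_shot_prompt
    max_tokens=5,
    temperature=0,             # deterministic output suits classification
)
print(response.choices[0].text.strip())
```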
Temperature controls the randomness of the response, usually expressed as a range from 0 to 1 (or 0 to 2 on some platforms). A lower temperature makes the model stick closely to its most likely next token, so the output is focused and nearly deterministic; a higher temperature flattens the probability distribution over candidate tokens, so less likely continuations get sampled more often.
You can think of temperature as a creativity dial: the higher the value, the more creative (and less predictable) the response, and vice versa. Note that raising the temperature widens the range of tokens the model is willing to sample from; it does not change the context window.
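If you want to see the mechanics, here is a toy sketch of how temperature typically reshapes token probabilities; the API’s actual internals aren’t public, so treat this as the standard textbook mechanism rather than a description of GPT-3 itself:

```python
import numpy as np

def sample_distribution(logits, temperature):
    """Convert raw model scores (logits) into sampling probabilities."""
    scaled = np.array(logits) / temperature  # low T sharpens, high T flattens
    probs = np.exp(scaled - scaled.max())    # subtract max for numerical stability
    return probs / probs.sum()

logits = [2.0, 1.0, 0.5]                 # toy scores for three candidate tokens
print(sample_distribution(logits, 0.2))  # ~[0.99, 0.01, 0.00] -> near-greedy
print(sample_distribution(logits, 2.0))  # ~[0.48, 0.29, 0.23] -> more random
```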
Hey @shubham - probably a basic question, but can you break down what GPT means? What do these individual letters stand for, how would I explain it to, say, my parents?
GPT stands for Generative Pre-trained Transformers. These are a type of AI model that is used to generate text. In layman’s terms, these letters would mean the following:
Generative (G): Something that can generate new data points. These models learn the underlying relationships between variables in a dataset in order to generate new data points similar to those in the dataset.
Pre-trained (P): It means that the model is already configured (trained) to perform a certain task. In this case, the task is to generate text.
Transformer (T): Transformer models are a type of AI model that can learn to read and write text. They use an attention mechanism (focusing on the important parts of the input, loosely analogous to how people pay attention) to process and generate text.
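For the more technically inclined, here is a toy sketch of the attention idea in plain numpy; this is a simplification for illustration, not GPT’s actual implementation:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each query scores every key,
    # softmax turns the scores into weights, and the output is a
    # weighted mix of the values ("focus more on what matters").
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))  # 4 tokens, 8-dim embeddings (self-attention)
print(attention(Q, K, V).shape)      # (4, 8): one context-mixed vector per token
```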
To be a good prompt engineer, you should know what the model knows about the world and leverage that to get the desired output. For example, models like GPT-3 won’t be great at answering factual questions about anything after their training-data cutoff, but they will be great at creative tasks.
The best thing about prompt engineering is that it doesn’t require you to come from a specific background. Anybody who knows how to communicate in English can be a prompt engineer.
Neural search is a new approach to retrieving information. Instead of hand-writing a set of rules that tell a machine what the data means, neural search does the same job with a pre-trained neural network, meaning developers don’t have to write every little rule (saving them time and headaches), and the system gets better as it goes along. In short, neural search is deep-neural-network-powered information retrieval.
Following are some applications of Neural Search:
A question-answering chatbot can be powered by neural search by first indexing all hard-coded QA pairs and then semantically mapping user dialog to those pairs.
A smart speaker can be powered by neural search by applying STT (speech-to-text) and semantically mapping text to internal commands.
A recommendation system can be powered by neural search by embedding user-item information into vectors and finding top-K nearest neighbours of a user/item.
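As a rough illustration of the core idea behind all three, here is a minimal semantic-search sketch; it assumes the sentence-transformers package, and the model name and documents are placeholders of mine:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small pre-trained encoder

# "Index" a few documents by embedding them into vectors.
docs = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "How can I track my order?",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

# Embed the user query into the same vector space.
query = "I forgot my login credentials"
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every indexed document,
# then take the top-1 nearest neighbour.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = scores.argmax().item()
print(docs[best])  # -> "How do I reset my password?"
```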
The syntax and semantics of prompts can have significant effects on the outcome, similar to a web search. While you mention that knowing what the model knows will help, do you have any tips for creating effective prompts?
Avoid ambiguous terms, make explicit references, etc.?
What application of these language models are you most excited about? I feel like most of the current applications are scratching the surface but we aren’t thinking about what’s possible yet, just what’s easy to do.
@shubham What is the main difference between Natural Language Processing and Natural Language Understanding? Where should one begin with the topic of Natural Language Understanding?
Very good question, Russell!
Due to the way the model is trained or the data it is being trained on, there are specific prompt formats that work particularly well. Here are some tips for creating effective prompts:
Put clear instructions at the beginning of the prompt.
Use ### or “”” as separators.
Be specific, descriptive and as detailed as possible.
Articulate the desired format through examples. Show, don’t tell: the models respond better when they are shown specific format requirements.
Eliminate “fluffy” words. As humans we can contextualize “some”, “few”, or “a couple”, but the model can’t turn such vague quantities into anything useful.
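Putting those tips together, a prompt might look something like the sketch below; the task and contents are just illustrative:

```python
# Instructions up front, ### as a separator, and the output format
# shown through a worked example rather than described in prose.
prompt = """Extract the company names from the text below.
Return them as a comma-separated list.

###

Text: Apple and Microsoft both reported earnings this week.
Companies: Apple, Microsoft

###

Text: Tesla's stock rose after the announcement.
Companies:"""
```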
The biggest challenge when putting models like GPT-3 into production is finding the right balance between cost and efficiency. Here is a checklist that will help you make a better decision:
Which model should you use out of the available options? (Start with the most powerful one while experimenting and move down the funnel; stop at the point where you can get similar results from a smaller model.)
Should you use a pre-trained model or fine-tune one of your own? (If your use case is niche, fine-tuning makes sense; otherwise go ahead with the pre-trained model.)
What kind of prompt should you use to minimize the number of tokens consumed? (Depending on your use case, figure out whether prompt engineering or fine-tuning makes sense for you; sending bigger prompts every time can be a very expensive affair if you are using the API frequently.)
How do you make sure that the output of these models is reliable in production? (Every time you hit the GPT-3 API, there is a chance you will get different results across runs; to avoid that in production, put the necessary checks over the raw output so the customer experience stays stable throughout.)
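For that last point, here is a rough sketch of what such checks could look like; the model name, labels, and fallback behaviour are assumptions for illustration, not a prescribed pattern:

```python
import openai  # assumes the classic `openai` SDK (<1.0) and a configured API key

ALLOWED_LABELS = {"Positive", "Negative", "Neutral"}

def classify(tweet: str, retries: int = 3) -> str:
    prompt = (
        "Classify the sentiment of this tweet as Positive, Negative or Neutral."
        f"\n\nTweet: {tweet}\nSentiment:"
    )
    for _ in range(retries):
        response = openai.Completion.create(
            model="text-davinci-002",  # placeholder model name
            prompt=prompt,
            max_tokens=3,
            temperature=0,  # temperature 0 keeps runs as stable as possible
        )
        label = response.choices[0].text.strip()
        if label in ALLOWED_LABELS:  # validate the raw output before trusting it
            return label
    return "Neutral"  # safe fallback so the customer experience stays stable
```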
I cover the detailed applications of these models in this thread. Also, these are not just applications but full-fledged businesses built on top of the API.
NLU is the subset of NLP that focuses on machine reading comprehension, using grammar and context to determine the intended meaning of a sentence. NLU is used to understand the meaning of text, while NLP is the combination of NLU + NLG (natural language generation): understanding the given input and generating the output.