AI Central

How to Utilize OpenAI Completions API: A Step-by-Step Guide (Part 1)

Aug 12, 2023

Introduction

The Completions API is the most fundamental OpenAI API, providing a simple interface that’s extremely flexible and powerful. You give it a prompt and it returns a text completion, generated according to your instructions. You can think of it as a very advanced autocomplete where the language model processes your text prompt and tries to predict what’s most likely to come next.


Although simple to use, Completions API is also very customizable and exposes various parameters that can be set to affect how completions are generated (for better or worse).

This guide explains all the parameters with practical examples. After reading this guide you will have a deeper understanding of the Completion API and you will be able to practically apply this knowledge in your day-to-day work with OpenAI APIs.

For the examples in this article, I will use Postman for sending HTTP requests. I suggest you do the same, but you can follow along with just about any HTTP client. You can also generate and customize text completions in the OpenAI playground or code completions in OpenAI JavaScript Sandbox.

Summary

The OpenAI Completion API has a simple yet powerful interface that accepts optional parameters, allowing us to affect how completions are generated, use streaming, or request useful metadata to be returned.

In this guide we’ve explored:

  • Sending completion requests;

  • Completion request and response models;

  • Setting token consumption rules and limits (max_tokens and prompt phrasing);

  • Manipulating randomness and determinism in completions (temperature and top_p);

  • Generating multiple completions and asking for best of completions (n and best_of);

  • Retrieving token probability metadata (logprobs);

  • Introducing bias for specific tokens into the language model (logit_bias);

  • Changing presence and frequency penalties to increase or decrease the frequency of words within completions (presence_penalty and frequency_penalty);

  • How to debug completions by returning the original prompt within the completion (echo);

  • Early completion termination by stopping the language model on specific words (stop);

  • Streaming completions as Server-Sent Event messages, or SSE for short (stream);

  • Setting user identifiers to help OpenAI identify rogue API users (user);

  • Inserting completions between two text segments (suffix);

  • How to solve common Completion errors;

Completion Fundamentals

The best way to learn about completions is to generate them. We’re going to see what kind of request we have to send, where to send it and what’s the response. I will start with the simplest possible request and we’ll build up from there.

Basic Completion Request

To send a request to the Completion endpoint, we need to create a collection in Postman that will contain the OpenAI requests (you don’t really have to do this, but it pays to be tidy). I will name it “OpenAI APIs”:

Now let’s add an authorization scheme for the entire collection:

Don’t forget to save by pressing Ctrl + S. You can obtain the API key token by going to your OpenAI profile and selecting “View API Keys”, which leads to: https://platform.openai.com/account/api-keys.

Make sure to save the key in your password manager or somewhere safe, because OpenAI will not let you see your API key again. If you lose it, you will have to generate a new one.

The Completion endpoint is accessible at api.openai.com/v1/completions and accepts a POST request with a JSON payload and Bearer token authorization.

Here is how your Postman request should be set up:

The request body must be raw JSON:

Here is the request in popular cURL format:

curl --location 'https://api.openai.com/v1/completions' \
--header 'Authorization: Bearer API_KEY' \
--header 'Content-Type: application/json' \
--data '{
 "model": "text-davinci-003"
}'


So we have to send the Authorization header, otherwise, we’ll get a 401 Unauthorized response, and the Content-Type must be set to application/json to indicate that we’re sending a JSON payload. Speaking of payload, the request body is a JSON object containing at least the model parameter, which represents the ID of the language model to be used for generating completions.
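For readers who prefer code over Postman, the same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name build_completion_request is made up for this example, and "API_KEY" is a placeholder. The snippet only builds the request; the lines that would send it are left commented out because they need a real key and network access.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/completions"


def build_completion_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Completions endpoint.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # Without this header the API answers with 401 Unauthorized.
            "Authorization": f"Bearer {api_key}",
            # Tells the API that the body is a JSON payload.
            "Content-Type": "application/json",
        },
    )


request = build_completion_request("API_KEY", {"model": "text-davinci-003"})
print(request.get_method(), request.full_url)

# To actually send it (requires a real API key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```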

At the time of this writing, "text-davinci-003" is the latest completion model in the GPT-3 family. It also supports inserting completions within text, which other models don’t. I’m going to pick "text-davinci-003". You can choose that or another model if OpenAI recommends it.

You can find the list of all OpenAI language models here: https://platform.openai.com/docs/models/gpt-3-5

You can try sending the request now and you should get back a 200 OK response. Don’t mind the completion: since we didn’t send a prompt, you will receive some random text.

Completion Response

Let’s take a look at a Completion response:

{
 "id": "cmpl-6upcTCWlFVl8J8pZnSyn1HIDw8wU8",
 "object": "text_completion",
 "created": 1679002813,
 "model": "text-davinci-003",
 "choices": [
 {
 "text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
 "index": 0,
 "logprobs": null,
 "finish_reason": "length"
 }
 ],
 "usage": {
 "prompt_tokens": 1,
 "completion_tokens": 16,
 "total_tokens": 17
 }
}


It’s a JSON response containing the properties: id, object, created, model, choices, and usage. The most important property is choices, because it contains the Completion, while all the other properties are just metadata.

id — represents the unique identifier of the response, useful in case you need to track responses.

object — represents the response type, in this case, it’s a “text_completion”, but if you called a different endpoint, like the Edit endpoint, then the object would be “edit”;

created — represents a UNIX timestamp marking the date and time in which the response was generated;

model — represents the OpenAI model that was used for generating the response, in this case, it’s “text-davinci-003” or whatever you’ve used;

choices — represents a collection of completion objects;

usage — represents the number of tokens used by this completion;

The choices property is the most important since it contains the actual completion data or possibly multiple completions data:

// Completion Response Object
{
 "text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
 "index": 0,
 "logprobs": null,
 "finish_reason": "length"
}


Let’s break down each property inside the Completion object:

text — the actual completion text;

index — the index of the completion inside the choices array;

logprobs — an optional object containing the log probabilities of the generated tokens and of the alternative tokens that were considered for the completion;

finish_reason — indicates the reason why the language model stopped generating text. The two most common reasons (and the only ones I’ve ever got) are: "stop" (indicating the completion was generated successfully) and "length" (indicating the language model ran out of tokens before being able to finish the completion);

That was a bare-bones completion with minimal request parameters. I’ve explained almost all the response properties, but some were null because they require a specific parameter to be defined, e.g. the logprobs response property requires the logprobs request parameter to be defined.
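To make the response structure concrete, here is a small Python sketch that parses the response shown above and pulls out the fields you will usually care about. The function name summarize_completion and the shape of the returned dictionary are assumptions made for this example, not part of the API.

```python
import json
from datetime import datetime, timezone

# The response from this section, pasted as a JSON string.
raw_response = """
{
  "id": "cmpl-6upcTCWlFVl8J8pZnSyn1HIDw8wU8",
  "object": "text_completion",
  "created": 1679002813,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": "READ MORE 50% Off Freejobalerts Coupon more Freejobal",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {"prompt_tokens": 1, "completion_tokens": 16, "total_tokens": 17}
}
"""


def summarize_completion(response: dict) -> dict:
    """Collect the first completion and a bit of metadata (illustrative helper)."""
    first_choice = response["choices"][0]
    # "created" is a UNIX timestamp, so convert it to a readable UTC date.
    created = datetime.fromtimestamp(response["created"], tz=timezone.utc)
    return {
        "text": first_choice["text"],
        "finish_reason": first_choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
        "created_utc": created.isoformat(),
    }


summary = summarize_completion(json.loads(raw_response))
print(summary)
```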

Max Tokens and Token Consumption

OpenAI charges you by the number of tokens that you use. This means that if you don’t structure your prompts carefully and don’t set max token limits, you will quickly consume all your OpenAI tokens. This is why it’s important to structure your Completion prompts specifically to return what you need and set token limits. Any excessive information that a Completion returns is waste that consumes unnecessary tokens.

For this purpose, OpenAI provides a simple mechanism that limits the number of tokens a Completion can return. It’s called max_tokens and it’s a request body parameter. Simply put, this parameter limits the maximum number of tokens a Completion may consume. If the Completion text cannot fit within the maximum number of tokens, it will be cut off and the response property finish_reason will be set to "length", indicating that the Completion was indeed cut off early:

// Bad completion response (exceeded max token size)
{
 ...
 "finish_reason": "length"
}


Let’s see this parameter in action. I’m going to ask OpenAI to explain JavaScript objects which will certainly take plenty of tokens. Let’s first do it with a decent maximum token size of 1024 tokens:

// Completion request with decently sized "max_tokens" parameter
{
 "model": "text-davinci-003",
 "prompt": "Explain JavaScript objects.",
 "max_tokens": 1024
}


The Completion should finish properly and the finish_reason should be "stop":

// Completion response that doesn't exceed max_tokens size
{
 ...
 "choices": [
 {
 "text": "\n\nJavaScript objects are data structures used to store, retrieve, and manipulate data. Objects contain properties with arbitrary names, and values of any data type, including other objects, functions, or even arrays. Objects enable developers to create hierarchical and logical data structures that can be passed between functions, providing more flexibility and scalability of applications. For example, an object might consist of a 'name' property and a 'age' property, and the logic within the application would reference the two properties inside a single object.",
 ...
 "finish_reason": "stop"
 }
 ],
 "usage": {
 "prompt_tokens": 5,
 "completion_tokens": 104,
 "total_tokens": 109
 }
}


This is good, but what if we wanted to explain JavaScript objects under 30 tokens? Well, we can set the max_tokens parameter to 30:

// Completion request with small "max_tokens" size
{
 "model": "text-davinci-003",
 "prompt": "Explain JavaScript objects.",
 "max_tokens": 30
}


The completion most likely couldn’t fit into 30 tokens, so the finish_reason will be "length":

// Completion response that exceeds the maximum token size
{
 ...
 "choices": [
 {
 "text": "\n\nJavaScript objects are collections of key-value pairs that provide functionality and data storage within a JavaScript program. An object is a container that can",
 ...
 "finish_reason": "length"
 }
 ],
 "usage": {
 "prompt_tokens": 5,
 "completion_tokens": 30,
 "total_tokens": 35
 }
}


That’s not good, but we can make it work by specifying the token limit in the prompt itself. That way the language model will be aware of the Completion size limit and will do its best to stay within that limit:

// Completion request with the token limit defined inside the prompt
{
 "model": "text-davinci-003",
 "prompt": "Explain JavaScript objects in 30 or less tokens.",
 "max_tokens": 30
}


This Completion will be generated successfully and it will not exceed the maximum token size:

// Completion response that doesn't exceed maximum token size 
// because it was defined in the prompt
{
 ...
 "choices": [
 {
 "text": "\nObjects are data types that store collections of key/value pairs.",
 ...
 "finish_reason": "stop"
 }
 ],
 "usage": {
 "prompt_tokens": 10,
 "completion_tokens": 15,
 "total_tokens": 25
 }
}


That works nicely. So the takeaway from this lesson is that you can use the max_tokens request parameter to limit the token usage and that you can also specify Completion constraints inside the prompts themselves.
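In application code, you will probably want to detect cut-off completions automatically instead of reading finish_reason by eye. A minimal Python sketch of that check could look like this; truncated_choice_indexes is a hypothetical helper name and the two dictionaries are trimmed-down stand-ins for real responses.

```python
def truncated_choice_indexes(response: dict) -> list:
    """Return the index of every choice that stopped because of the token limit."""
    return [
        choice["index"]
        for choice in response.get("choices", [])
        if choice.get("finish_reason") == "length"
    ]


# Trimmed-down stand-ins for the responses shown in this section.
cut_off = {"choices": [{"text": "...", "index": 0, "finish_reason": "length"}]}
complete = {"choices": [{"text": "...", "index": 0, "finish_reason": "stop"}]}

print(truncated_choice_indexes(cut_off))
print(truncated_choice_indexes(complete))
```

If the returned list is not empty, you could retry with a larger max_tokens value or a tighter prompt.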

Temperature and Top Probabilities

When generating completions, the language model produces a string of tokens one at a time, and each token is picked from a set of candidates according to its probability. The OpenAI Completion API exposes two parameters, temperature and top_p, which can be used to affect how consistent or random the completions will be.

Temperature

The temperature affects how random or deterministic you want your completions to be. So if you want more creative answers, you might want to bump this property over 0.8, while if you want deterministic or fail-safe completions, you should probably keep it below 0.2.

The default temperature is 1, so by default, you’re getting somewhat more creative responses. The maximum value is 2; however, I don’t recommend you exceed 1, because after that point you’re going to start getting gibberish.

Let’s see the difference between temperatures of 0, 1 and 2. I’m going to ask it to write a short description for a business card of a software developer:

// Completion request with minimal temperature
{
 "model": "text-davinci-003",
 "prompt": "Write me a short description for a business card of a software developer.",
 "max_tokens": 1024,
 "temperature": 0
}


Here is the response:

{
 ...
 "choices": [
 {
 "text": "\n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.",
 ...
 "finish_reason": "stop"
 }
 ],
 ...
}


Try sending this request multiple times and observe the differences (or similarities) in completions. Since the temperature was set to 0, the completions should be identical to each other:

Request 1: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 2: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 3: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.


This happens because low temperature (below 0.2) forces the language model to use the highest probability tokens, resulting in deterministic and consistent responses.

Let’s bump the temperature to 1 and observe what happens:

// Completion request with default temperature
{
 "model": "text-davinci-003",
 "prompt": "Write me a short description for a business card of a software developer.",
 "max_tokens": 1024,
 "temperature": 1
}

I will send this request three times. Every response should have a different completion:

Request 1: \n\nMax Smith \nSoftware Developer | Full Stack Engineer \nCreative problem solver building user-friendly applications, using cutting-edge technology to design innovative web and mobile solutions.

Request 2: \n\nSoftware Developer | Experienced in building custom web and mobile applications | Creating innovative solutions that make a difference.

Request 3: \n\nMarcus Miller, Software Developer\nDeveloping innovative solutions to complex software challenges. Experienced in a range of platforms, languages and frameworks.\n


Notice how each completion is quite different. Higher temperature values (above 0.8) will result in more random and creative completions.
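To build some intuition for why this happens, here is a toy Python sketch of the usual textbook way temperature is applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. This is an illustration under assumptions, not OpenAI's actual implementation; the three candidate tokens and their scores are made up, and treating a temperature of 0 as "always pick the top token" is a simplification.

```python
import math


def softmax_with_temperature(logits: dict, temperature: float) -> dict:
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Simplification: treat 0 as always choosing the highest-scoring token.
        best = max(logits, key=logits.get)
        return {token: 1.0 if token == best else 0.0 for token in logits}
    scaled = {token: score / temperature for token, score in logits.items()}
    highest = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - highest) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Made-up scores for three candidate next tokens.
toy_logits = {" Software": 2.0, " Max": 1.0, " Jament": -1.0}

for temperature in (0.1, 1, 2):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, {token: round(p, 3) for token, p in probs.items()})
```

At low temperature almost all of the probability lands on the top token, while at higher temperatures the unlikely tokens get a real chance of being picked.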

The maximum temperature is 2; however, I don’t recommend going above 1, because the completions become complete gibberish:

// Completion request with maximum temperature
{
 "model": "text-davinci-003",
 "prompt": "Write me a short description for a business card of a software developer.",
 "max_tokens": 1024,
 "temperature": 2
}


Here are three responses I got with maximum temperature:

Request 1: \n\nJament Swift, Software Developer\nExotic adeptness – Proven top performance - Blazing creativity\nCreat > Navigate > Enhance\[email protected]?.withvi.?+abrfareache 291990qy 282$

Request 2: \n\nALCLAIRE ATKFlN — \nInnovative Software Developer, using core web languages (html/css, JS) rrapYdatft tdislocic dwoolsrrFWivelling Siteadaptic ad^licationflexible, into easetomizingeds Sike L­rateting interactive experionees to enterprise solve collagforatcompliceso solaring emge neluphey users availbinglaruaubkusgers uolstepopueume building cost tap obefficifty crobltecisai menuring benefitrationque apeveremonpower native pertetrificial evougcnvi ightemecatedinexweb opmateslenattucvity dynamic imedesiom

Request 3: \n\nEmily Abraham\nSoftware Developer \nWeb-Aspiration Design Concierge helping advanced beginners hot wire &or custom dice intricate most more elaborate internet affirming life fully tunning guides built web real & specially bold cake&ml* hacking quirks multi stake protocol implementing htmlified1X complexes data flavors ..fitting daal purclient industry webs philosophy platform constructionry into solututionsquest factory revital elements formulation classic©cool proxCzy stylographisch creators ideas live^ress comprehensive parameters elite logicplus dedection solutions powerfully lives!? - allowing automate leads flourish ?:add trendy ultra CRAW agility rise fancounters dr liveinst integration lux abstract trlogodrieind explagement animenthuebotirellucix manageengrabculaticiel absmprocessanceios fowlperistetcukhovosoobybugortdashymaginthasiszagativesDiaPubotomicsettingobulencealorreattheryangpreoontowpurjustoniKeyoulartaitherwerefaislLeodsoftwarebeyetantoisdommaciansitaCdbletw # WeFactor ℂ🊇½ > Source Again0uuuu044future R α Tavelezsdevelopmentthingáprogramoscpower in Zellosaoud boxoloret~Experience Hacker Sparkcelipp Maker_ solutions inside these Creatchiitabi Se🙃sciented via zoante Websinarra space era sitecraftite extraordyena unpressuration Strategate✉ Planning interface fluid Project8tunistsWerstore 4 Kids \"Interframe Paradositionesy techno dream erpo flexitantds totaliter building fungins


Top Probability

Next, we have the top_p parameter, which stands for top probability and is an alternative to temperature. The top_p value refers to the probability mass that should be used when considering the next token in the generated text. The language model ranks the candidate tokens from most to least likely and only considers the smallest group of top tokens whose combined probability reaches top_p. For example, a top_p of 0.1 means that only the tokens comprising the top 10% of probability mass are considered. This means that with lower top_p values, the language model will be more conservative with its predictions.

In general, you should use top_p to control the coherence of the generated text, but if you want to affect the creativity and predictability of the text, then you should use temperature. OpenAI recommends using either temperature or top_p, but not both.
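The idea behind top_p is often called nucleus sampling, and it can be sketched in a few lines of Python. This is a toy illustration of the general technique, not OpenAI's code: top_p_filter is a hypothetical helper and the token probabilities are made up.

```python
def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their combined probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, probability in ranked:
        kept.append((token, probability))
        cumulative += probability
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(probability for _, probability in kept)
    return {token: probability / total for token, probability in kept}


# Made-up probabilities for four candidate next tokens.
toy_probs = {" Software": 0.6, " Max": 0.25, " Emily": 0.1, " Jament": 0.05}

print(top_p_filter(toy_probs, 1))    # every token stays in play
print(top_p_filter(toy_probs, 0.8))  # only the two most likely tokens survive
print(top_p_filter(toy_probs, 0))    # only the single most likely token survives
```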

Let’s see an example of the top_p parameter in action. The default value of top_p is 1, so we’ve already seen how that behaves when we were sending previous requests. Let’s set top_p to 0.

{
 "model": "text-davinci-003",
 "prompt": "Write me a short description for a business card of a software developer.",
 "max_tokens": 1024,
 "top_p": 0
}

I’ve sent the request three times. Here are the results:

Request 1: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 2: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.

Request 3: \n\nSoftware Developer\nJohn Doe\n\nExpert in developing innovative software solutions to solve complex problems. Skilled in coding, debugging, and testing software applications. Passionate about creating user-friendly and efficient software.


Notice that we’re getting an identical response each time. This is because with top_p set to 0 the allowed probability mass is so small that, in effect, only the single most likely token is considered at each step. So by having lower top_p values, you are likely to receive more coherent and consistent responses.


N Completions and Best of N

As we’ve seen, the completion response has the choices property, which is an array of completion objects. This indicates that the Completion API is capable of returning multiple completions and indeed it can! As a matter of fact, it’s also capable of rating the completions and returning the best one.


N Completions

To generate multiple completions, we specify the n request parameter, which simply stands for number of completions. Let’s try asking for 2 completions:

{
 "model": "text-davinci-003",
 "prompt": "Write me a short description for a business card of a software developer.",
 "max_tokens": 1024,
 "n": 2
}

The response should now contain two completions inside the choices array:

{
 ...
 "choices": [
 {
 "text": "\n\nSoftware Developer \nSpecializing in developing custom-tailored, reliable and secure software. Providing innovative solutions to meet your business needs.",
 "index": 0,
 "logprobs": null,
 "finish_reason": "stop"
 },
 {
 "text": "\n\nSoftware Developer | Helping to Create Innovative Solutions \nExperienced in developing custom software and applications for a variety of businesses. Skilled in troubleshooting and debugging software, optimizing code for maximum efficiency, and providing quality assurance for mission-critical solutions. \n\n\n\nEnthusiastic about testing the boundaries of technology and solving complex problems. Dedicated to helping businesses get the most from their investments in software and technology.",
 "index": 1,
 "logprobs": null,
 "finish_reason": "stop"
 }
 ],
 "usage": {
 "prompt_tokens": 14,
 "completion_tokens": 118,
 "total_tokens": 132
 }
}

Notice how the second completion has the index property set to 1. That’s because it’s the second completion, counting from zero. Also, the token consumption has doubled, which is kind of obvious, since we did generate two completions.

This parameter can be useful when you want the ability to pick multiple completions. For example, let’s say you’re building a joke generator. You might want to generate multiple jokes per session. You can also build on top of this, by counting the number of times a specific joke has been picked from a group and create a rating system that would allow you to present better jokes to people.

Best of N

Another similar, but more powerful parameter is best_of. This parameter tells the language model to generate multiple completions and return the best one, which is the one with the highest log probability per token.
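To illustrate what "highest log probability per token" means, here is a toy Python sketch that ranks a few candidates by their average token log probability. The selection actually happens on OpenAI's servers; the helper names and the log probability values below are made up purely for illustration.

```python
def mean_logprob(token_logprobs: list) -> float:
    """Average log probability per token (closer to 0 means more likely)."""
    return sum(token_logprobs) / len(token_logprobs)


def pick_best(candidates: dict) -> str:
    """Return the name of the candidate with the highest mean log probability."""
    return max(candidates, key=lambda name: mean_logprob(candidates[name]))


# Made-up per-token log probabilities for three candidate completions.
toy_candidates = {
    "joke A": [-0.2, -0.1, -0.3],
    "joke B": [-0.05, -1.5, -0.4, -0.9],
    "joke C": [-0.6, -0.6],
}

print(pick_best(toy_candidates))
```

Averaging per token, rather than summing, keeps longer completions from being penalized just for having more tokens.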

Let’s send a completion request that will ask the language model to tell us a JavaScript joke, but we want its best joke. So we will set the best_of to 5, which means it will generate five completions on the server and return the best one:

{
 "model": "text-davinci-003",
 "prompt": "Tell me a joke about JavaScript.",
 "max_tokens": 128,
 "best_of": 5
}

And here is the result:

{
 ...
 "choices": [
 {
 "text": "\n\nQ: Why was the JavaScript developer sad?\nA: Because he didn't Node how to Express himself.",
 "index": 0,
 "logprobs": null,
 "finish_reason": "stop"
 }
 ],
 "usage": {
 "prompt_tokens": 7,
 "completion_tokens": 127,
 "total_tokens": 134
 }
}

Pretty funny, eh? Notice that we only received one completion, which is obvious because the other completions were generated on the server and only the best one was returned. Also, notice the token consumption. It’s ~5x what a normal completion would consume.

N Best of N

An interesting thing about n and best_of properties is that we can combine them to get N best of completions. Let’s say we want 3 best of 5 jokes:

{
 "model": "text-davinci-003",
 "prompt": "Tell me a joke about JavaScript.",
 "max_tokens": 128,
 "best_of": 5,
 "n": 3
}

So that will return 3 best jokes that the language model could generate, out of 5 generated jokes:

{
 ...
 "choices": [
 {
 "text": "\n\nQ: Why did the developer go broke?\nA: Because he used JavaScript.",
 "index": 0,
 "logprobs": null,
 "finish_reason": "stop"
 },
 {
 "text": "\n\nQ: Why did the chicken cross the road? \nA: To get to the JavaScript library!",
 "index": 1,
 "logprobs": null,
 "finish_reason": "stop"
 },
 {
 "text": "\n\nQ: Why did the chicken cross the playground?\nA: To get to the JavaScript.",
 "index": 2,
 "logprobs": null,
 "finish_reason": "stop"
 }
 ],
 "usage": {
 "prompt_tokens": 7,
 "completion_tokens": 110,
 "total_tokens": 117
 }
}

Note that n cannot be greater than best_of because you can’t return more completions than you have generated.


Logprobs

Completion language models can return additional metadata about generated completions. For example, you can retrieve the probabilities of alternative tokens for each generated token. To retrieve this metadata we have to set the logprobs parameter. This parameter represents the number of most likely tokens, up to 5, for which log probabilities are returned at each position. If you need more than 5, visit OpenAI Help Center.

By default, no log probabilities are returned. Let’s set logprobs to 3, to see what we get:

// Completion request with logprobs parameter
{
 "model": "text-davinci-003",
 "prompt": "Explain JavaScript objects in 30 or less tokens.",
 "max_tokens": 30,
 "logprobs": 3
}

In the response, notice that the response completion property logprobs contains an object with various log probability metadata fields:

{
 ...
 "choices": [
 {
 "text": "\n\nJavaScript objects are containers for named values (aka properties) and associated functions (aka methods).",
 "index": 0,
 "logprobs": {
 "tokens": [
 "\n",
 "\n",
 "Java",
 "Script",
 " objects",
 " are",
 " containers",
 " for",
 " named",
 " values",
 " (",
 "aka",
 " properties",
 ")",
 " and",
 " associated",
 " functions",
 " (",
 "aka",
 " methods",
 ")."
 ],
 "token_logprobs": [
 -0.008907552,
 -0.07727763,
 -1.1294106,
 -0.0004160193,
 -0.009473672,
 -0.012644112,
 -3.9957676,
 -0.16903317,
 -2.2373366,
 -0.055924743,
 -0.59924054,
 -5.9234285,
 -0.18480222,
 -0.3098567,
 -0.6973377,
 -0.8198106,
 -0.61497325,
 -0.0076147206,
 -0.030184068,
 -0.0007605586,
 -0.016391708
 ],
 "top_logprobs": [
 {
 "\n": -0.008907552,
 " ": -5.1444216,
 "\n\n": -5.9562836
 },
 {
 "\n": -0.07727763,
 "Object": -2.7566218,
 "A": -4.9663434
 },
 {
 "Object": -0.4948168,
 "Java": -1.1294106,
 "A": -3.776261
 },
 {
 "Script": -0.0004160193,
 " objects": -7.8377147,
 " Objects": -11.026417
 },
 {
 " objects": -0.009473672,
 " Objects": -4.7159286,
 " object": -7.950067
 },
 {
 " are": -0.012644112,
 " store": -4.953908,
 " contain": -6.2896104
 },
 {
 " collections": -0.19902913,
 " key": -2.6636984,
 " data": -3.0809498
 },
 {
 " for": -0.16903317,
 " of": -2.4336605,
 " used": -3.6440907
 },
 {
 " storing": -0.95453495,
 " key": -1.6802124,
 " data": -2.0262625
 },
 {
 " values": -0.055924743,
 " data": -3.611749,
 " properties": -3.7496846
 },
 {
 " (": -0.59924054,
 ",": -1.4605706,
 " called": -2.2949479
 },
 {
 "properties": -0.3161987,
 "key": -1.7945406,
 "called": -3.8211272
 },
 {
 " properties": -0.18480222,
 " \"": -2.2292109,
 " key": -3.8825958
 },
 {
 ")": -0.3098567,
 "/": -2.1618884,
 " or": -2.41235
 },
 {
 " and": -0.6973377,
 " that": -1.3751054,
 " which": -2.1478114
 },
 {
 " associated": -0.8198106,
 " functions": -1.5845886,
 " methods": -1.6161041
 },
 {
 " functions": -0.61497325,
 " methods": -1.0141051,
 " functionality": -3.0297003
 },
 {
 " (": -0.0076147206,
 "/": -5.3859534,
 ".": -6.958371
 },
 {
 "aka": -0.030184068,
 "method": -3.856478,
 "called": -5.091049
 },
 {
 " methods": -0.0007605586,
 " \"": -7.8522696,
 " method": -9.153487
 },
 {
 ").": -0.016391708,
 ")": -4.2301235,
 "),": -6.672988
 }
 ],
 "text_offset": [
 48,
 49,
 50,
 54,
 60,
 68,
 72,
 83,
 87,
 93,
 100,
 102,
 105,
 116,
 117,
 121,
 132,
 142,
 144,
 147,
 155
 ]
 },
 "finish_reason": "stop"
 }
 ],
 "usage": {
 "prompt_tokens": 10,
 "completion_tokens": 21,
 "total_tokens": 31
 }
}


Notice that logprobs do not increase token consumption. That is because they are not part of the completion, they’re just metadata.

The returned logprobs contain four properties:

  • tokens — is an array of tokens generated by the language model. Each token is a word or part of a word.

  • token_logprobs — represents an array of log probabilities for each token in the tokens array. Log probability indicates the likelihood of the language model generating that token for the given prompt. The logprob values are negative, where smaller (more negative) numbers indicate a less likely outcome.

  • top_logprobs — represents an array of log probability objects, one per generated token, each listing the tokens that were most likely at that position along with their log probabilities. The number of entries per object is controlled by the logprobs request parameter, so with logprobs set to 3, each object contains the 3 most likely tokens.

  • text_offset — is an array of numbers, where each number corresponds to the token with the same index and represents the character offset of that token from the beginning of the prompt text. In the example above, the first offset is 48 because the prompt is 48 characters long. This can be useful for keeping track of where the generated text starts in a larger context.

This parameter doesn’t affect how completions are generated and is used for debugging and analyzing completions. You can use it to gain insights as to why the language model is making decisions to pick some tokens over others. If you’re getting back completions that are downright erroneous, you can retrieve logprobs to help you analyze the problem.
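Log probabilities are easier to reason about once they are converted back to plain probabilities, which is just e raised to the logprob. The Python sketch below does that for the first seven tokens of the response above and also rebuilds text_offset from the token lengths. The values are copied from the response; the variable names are just for this example.

```python
import math

prompt = "Explain JavaScript objects in 30 or less tokens."

# First seven entries copied from the logprobs object in the response above.
tokens = ["\n", "\n", "Java", "Script", " objects", " are", " containers"]
token_logprobs = [
    -0.008907552, -0.07727763, -1.1294106, -0.0004160193,
    -0.009473672, -0.012644112, -3.9957676,
]
text_offset = [48, 49, 50, 54, 60, 68, 72]

# A log probability converts back to a probability with exp().
probabilities = [math.exp(logprob) for logprob in token_logprobs]
for token, probability in zip(tokens, probabilities):
    print(f"{token!r:>14} {probability:7.2%}")

# The least likely token is a good place to start when debugging a completion.
least_likely = min(zip(tokens, probabilities), key=lambda pair: pair[1])
print("least likely:", least_likely[0])

# Offsets start at the end of the prompt and grow by each token's length.
computed_offsets = []
position = len(prompt)
for token in tokens:
    computed_offsets.append(position)
    position += len(token)
print(computed_offsets == text_offset)
```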




Logit Bias

The logit_bias request parameter is used to modify the likelihood of specified tokens appearing in the completion. We can use this parameter to provide hints to the language model about which tokens we want or don’t want to appear in the completion. It basically allows us to make the model more biased toward certain keywords or topics.

Exclusion Bias

Let’s say I want to ask the language model to explain JavaScript objects without mentioning the words: “key”, “value”, “key-value” and “pair”. We can do that by defining an exclusion bias for tokens of these words.

First, we need to convert words into tokens. We can use the OpenAI Tokenizer tool for this:


Now we specify the logit_bias parameter in our request with an object containing key-value pairs (kind of ironic, considering what we’re doing), where each key represents the token and the value represents the bias for that token. For the value, we can provide an integer between -100 and 100. Lower bias values reduce the odds of the token appearing, while higher values increase them.
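If you build requests in code, a tiny helper can assemble the logit_bias object and guard against out-of-range values. This is a minimal sketch: build_logit_bias is a hypothetical helper, and the token IDs are the ones used in this article's requests.

```python
def build_logit_bias(token_ids: list, bias: int) -> dict:
    """Map every token ID (as a string key) to the same bias value."""
    if not -100 <= bias <= 100:
        raise ValueError("bias must be between -100 and 100")
    return {str(token_id): bias for token_id in token_ids}


payload = {
    "model": "text-davinci-003",
    "prompt": "Explain JavaScript objects.",
    "max_tokens": 1024,
    # Token IDs taken from the exclusion example in this article.
    "logit_bias": build_logit_bias([2539, 8367, 12, 24874], -100),
}
print(payload["logit_bias"])
```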

Since we want the language model to exclude the words “key-value pair”, I will set a -100 value for their tokens:

// Completion request with logit_bias
// for excluding certain tokens
{
 "model": "text-davinci-003",
 "prompt": "Explain JavaScript objects.",
 "max_tokens": 1024,
 "logit_bias": {
 "2539": -100, 
 "8367": -100, 
 "12": -100, 
 "24874": -100 
 }
}


Here is the response:

{
 ...
 "choices": [
 {
 "text": "\n\nJavaScript objects are collections of key/property pairs in the form of an associative array. They are used to store data and information in a structured way. Objects literal syntax uses curly braces {}, and the key/property pairs are separated by a colon. They can hold multiple different types of data, including functions, strings, numbers, arrays, and even other objects. JavaScript objects are fundamental units of the language and are used to describe the state or behavior of an entity in programming.",
 ...
 }
 ],
 ...
}


Now it may shock you to learn that words such as “key” and “pairs” are still mentioned in the completion, even though the tokens we listed have a -100 bias, which should completely prevent them from appearing. Well, that’s because there is a space before the words. A word preceded by a space is a different token from the same word without the space, so if we truly wanted to ensure the words “key”, “value”, “pair” are not mentioned, we should also define a -100 bias for those words with spaces:

Let’s add the additional tokens to the logit_bias parameter and send the request:

// Completion request with comprehensive logit_bias
// for excluding certain tokens
{
 "model": "text-davinci-003",
 "prompt": "Explain JavaScript objects.",
 "max_tokens": 1024,
 "logit_bias": {
 "2539": -100, 
 "8367": -100, 
 "12": -100, 
 "24874": -100,
 "1994": -100, 
 "8251": -100, 
 "1988": -100, 
 "3815": -100, 
 "5166": -100, 
 "14729": -100
 }
}


Here is the response:

{
 ...
 "choices": [
 {
 "text": "\n\nJavaScript objects are collections of properties. Properties are associations between a name (or \"Key\") and a Value. JavaScript objects can be manipulated and manipulated, when they have or have not been declared with the 'var' keyword. Objects can be used to storeand organize data, as well as to create relationships between different pieces of data. JavaScript objects can contain functions, arrays, and other objects which allows for complex data structures to be created.",
 ...
 }
 ],
 ...
}


Notice how the language model avoids mentioning the words from the logit_bias parameter. However, it still uses the words “Key” and “Value”, because capitalized words have different tokens. This is why you need to be very thorough when defining the tokens for logit_bias; otherwise, the words might appear in a different form (prefixed, capitalized, etc.).
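One way to be thorough is to generate the obvious surface forms of each word first and then look up every form in the Tokenizer tool. The Python sketch below only produces the text variants; it does not compute token IDs, and surface_variants is a hypothetical helper. Keep in mind that some variants may be split into more than one token, and plurals such as “pairs” need to be listed separately.

```python
def surface_variants(word: str) -> list:
    """List lowercase, capitalized and uppercase forms, each with and without a leading space."""
    variants = []
    for form in (word.lower(), word.capitalize(), word.upper()):
        for candidate in (form, " " + form):
            if candidate not in variants:
                variants.append(candidate)
    return variants


for word in ("key", "value", "pair"):
    print(surface_variants(word))
```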
