Defining AI Output Schemas Using OpenAI's Structured Outputs
To reach a broader audience, this article has been translated from Japanese.
You can find the original version here.
Recently, OpenAI released a feature called Structured Outputs.
Structured Outputs is, as the name suggests, a feature that enforces structured output. Previously, there was a parameter for returning AI responses in JSON format (by specifying json_object in response_format). However, you still had to describe the concrete JSON structure in the prompt, and the JSON response did not always match expectations, so additional implementation such as validation and retries was needed. The newly released Structured Outputs makes the AI generate responses that conform to a schema by specifying a JSON Schema in a dedicated parameter instead of in the prompt.
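For contrast, here is a minimal sketch of what the earlier json_object approach looked like (the prompt wording and structure are illustrative, not taken from the original article):
import OpenAI from 'openai';

const client = new OpenAI();

// JSON mode only guarantees syntactically valid JSON, not a particular shape,
// so the desired structure has to be spelled out in the prompt itself.
const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: 'Give me a quiz as JSON with the keys "question", "choices" (array of strings) and "correct_num".'
  }],
  response_format: { type: 'json_object' }
});

// The response still has to be parsed and validated by hand.
const quiz = JSON.parse(completion.choices[0].message.content ?? '{}');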
I tried out this feature and would like to introduce it briefly here. As a sample topic, I will create a simple quiz generation feature.
Structured Outputs can be used with both the response format and Function calling; here I will use the response format. Function calling is covered in a column later in this article, so please refer to that.
Setup
Here, we will implement this using Node.js and TypeScript. Create a directory of your choice and set up an NPM project in it.
npm init -f
npm install openai zod @inquirer/prompts typescript tsx
npx tsc --init
In addition to the OpenAI library, we install Zod for schema definition and @inquirer/prompts for reading the quiz answer from the terminal. At the time of writing, we are using version 4.55.1 of the OpenAI Node library; Structured Outputs is available from 4.55.0 onwards.
As it is not the main topic here, the TypeScript-related setup steps are omitted[1].
Defining Structure with JSON Schema
This approach is not the recommended one (the Zod helper introduced later is simpler), but it is fundamental to understanding Structured Outputs.
The source code will look like this:
import OpenAI from 'openai';
import { input } from '@inquirer/prompts';

const client = new OpenAI();

// JSON Schema
const schema = {
  type: 'object',
  properties: {
    question: {
      type: 'string'
    },
    choices: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          num: {
            type: 'number',
            description: 'Sequential number starting from 1'
          },
          answer: {
            type: 'string'
          }
        },
        required: [
          'num',
          'answer'
        ],
        additionalProperties: false
      }
    },
    correct_num: {
      type: 'number'
    },
    score: {
      type: 'number',
      description: '1 to 10 based on difficulty'
    }
  },
  required: [
    'question',
    'choices',
    'correct_num',
    'score'
  ],
  additionalProperties: false
};

// Execute API with parse instead of create (beta)
const completion = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06', // Supports models like gpt-4o-mini, gpt-4o-2024-08-06 onwards
  messages: [
    { role: 'user', content: 'Give me a difficult question!' }
  ],
  response_format: {
    // Enable Structured Output
    type: 'json_schema',
    json_schema: {
      name: 'quiz',
      strict: true,
      schema,
    }
  }
});

// Retrieve parsed response according to JSON schema
const quiz = completion.choices[0].message.parsed as any;

// Start quiz
const message = `${quiz.question}
${quiz.choices.map((choice: any) => `${choice.num}:${choice.answer}`).join('\n')}
`;
const answer = await input({ message: message });
if (answer === quiz.correct_num.toString()) {
  console.log(`Correct!! You got ${quiz.score} points!!`);
} else {
  console.log(`Too bad!! The correct answer was ${quiz.correct_num}!!`);
}
In the code above, the JSON structure of the response is first defined as a JSON schema, and then the OpenAI Chat Completions API is executed. The API used here is the newly introduced parse (still in beta) instead of the traditional create[2].
At this time, set the response_format property of the parameters as follows:
- Specify json_schema (JSON Schema) for type
- Specify true (follow the schema) for strict
- Specify the predefined JSON schema for schema
With the new parse API, the AI response is obtained from parsed rather than the traditional content, so there is no need for JSON.parse. The resulting object conforms to the JSON schema, so no additional structure checks are required.
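For comparison, here is a minimal sketch of the same call made without parse, using the traditional create and content (the error handling is illustrative):
// create also enforces the schema when json_schema is specified,
// but the result comes back as a string that must be parsed manually.
const legacy = await client.chat.completions.create({
  model: 'gpt-4o-2024-08-06',
  messages: [{ role: 'user', content: 'Give me a difficult question!' }],
  response_format: {
    type: 'json_schema',
    json_schema: { name: 'quiz', strict: true, schema }
  }
});

const raw = legacy.choices[0].message.content;
if (!raw) throw new Error('Empty response');
const legacyQuiz = JSON.parse(raw); // the structure itself is guaranteed by the schema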
Now, execute the parse-based script shown earlier (saved here as jsonschema.ts).
npx tsx jsonschema.ts
The quiz game will start as follows.
? You need to disarm a time bomb. Which of the following is the appropriate first step?
1: Cut the blue wire
2: Cut the red wire
3: Turn off the bomb's power supply
4: Advance the timer by 2 minutes
5: Press the button on the digital display
3
Correct!! You got 8 points!!
The schema that can be specified for Structured Outputs is a subset of JSON Schema, and not all of its features can be used. For example, there are restrictions like the following that are easy to overlook:
- All fields must be listed in required
- additionalProperties must be set to false
- Constraints such as minLength and maxLength cannot be specified
If these constraints are violated, an error occurs when the API is executed. Details are described in the official documentation (there are workarounds for some of them).
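As a quick illustration (my own example, not taken from the official documentation), adding an unsupported keyword such as minLength to the earlier schema makes the API call itself fail:
// Assumed-invalid schema for illustration: minLength is not supported by Structured Outputs,
// so passing this as response_format.json_schema.schema should result in an API error
// rather than the constraint being silently ignored.
const invalidSchema = {
  type: 'object',
  properties: {
    question: { type: 'string', minLength: 10 }
  },
  required: ['question'],
  additionalProperties: false
};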
Although I have not verified it, if a request violates OpenAI's policy, a response conforming to the schema will not be returned even with Structured Outputs. According to the official documentation, the refusal property of the response is set in this case.
The sample code in this article skips this check, but in real-world use it appears necessary to check refusal before using the response.
if (completion.choices[0].message.refusal) {
  throw new Error(completion.choices[0].message.refusal); // Policy violation
}
const quiz = completion.choices[0].message.parsed as any;
To use Structured Outputs with Function Calling, specify strict: true along with the schema of the function arguments.
const completion = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'user', content: 'call sampleFunc!' }
  ],
  tools: [{
    type: 'function',
    function: {
      name: 'sampleFunc',
      strict: true, // Enforce argument generation according to schema
      parameters: schema
    }
  }]
});

// Returned in accordance with JSON schema
const args = completion.choices[0].message.tool_calls?.[0].function.parsed_arguments as any;
Of course, the schema specified here must also comply with the constraints of Structured Outputs, just as with the response format (response_format).
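To round this out, here is a rough sketch of how the parsed arguments might then be dispatched to the actual function (handleSampleFunc is a hypothetical handler, not part of the original article):
// Hypothetical handler; in a real app this would contain the function's logic.
const handleSampleFunc = (args: any) => console.log('sampleFunc called with', args);

const toolCall = completion.choices[0].message.tool_calls?.[0];
if (toolCall?.function.name === 'sampleFunc') {
  // parsed_arguments already conforms to the JSON schema, so no extra validation is needed
  handleSampleFunc(toolCall.function.parsed_arguments);
}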
Defining Structure with Zod Schema
For those familiar with TypeScript, many of you probably use schema libraries such as Zod. From version 4.55.0 of the official OpenAI Node library, helpers for Zod schemas are provided.
Using these, you can keep the implementation simple while fully utilizing TypeScript's type system.
Let's rewrite the previous code using the Zod schema.
import OpenAI from 'openai';
import z from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';
import { input } from '@inquirer/prompts';

const client = new OpenAI();

// Zod Schema
const schema = z.object({
  question: z.string(),
  choices: z.array(z.object({
    num: z.number().describe('Sequential number starting from 1'),
    answer: z.string()
  })),
  correct_num: z.number(),
  score: z.number().describe('1 to 10 based on difficulty')
});

const completion = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'user', content: 'Give me a difficult question!' }
  ],
  // Enable Structured Output
  response_format: zodResponseFormat(schema, 'quiz')
});

// Get the parsed response with Zod schema (can use z.infer<typeof schema>)
const quiz = completion.choices[0].message.parsed as z.infer<typeof schema>;

// Start quiz
const message = `${quiz.question}
${quiz.choices.map(choice => `${choice.num}:${choice.answer}`).join('\n')}
`;
const answer = await input({ message });
if (answer === quiz.correct_num.toString()) {
  console.log(`Correct!! You got ${quiz.score} points!!`);
} else {
  console.log(`Too bad!! The correct answer was ${quiz.correct_num}!!`);
}
zodResponseFormat is the helper function here; it converts the Zod schema into the JSON schema described earlier. The AI output follows the Zod schema, so it can be cast with z.infer<typeof schema>, and subsequent property access is smooth thanks to the IDE's auto-completion. If you are using the Node library, you should definitely use this.
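Incidentally, judging from the helper's type definitions, parsed is already typed from the Zod schema when zodResponseFormat is used, so the cast mainly narrows away the null case. A small sketch of a cast-free variant, combined with the refusal check mentioned earlier:
const parsed = completion.choices[0].message.parsed;
if (!parsed) {
  // Either a refusal or an incomplete response
  throw new Error(completion.choices[0].message.refusal ?? 'No structured response');
}
// parsed is inferred from the Zod schema, so property access below is fully typed
parsed.choices.forEach(choice => console.log(`${choice.num}:${choice.answer}`));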
For Function Calling, a zodFunction utility is also provided for Zod schemas. The source code in this case looks like the following.
import { zodFunction } from 'openai/helpers/zod';

const completion = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'user', content: 'Give me a difficult question!' }
  ],
  tools: [ zodFunction({ name: 'sampleFunc', parameters: schema }) ]
});

// Returned in accordance with JSON schema
const args = completion.choices[0].message.tool_calls?.[0].function.parsed_arguments as z.infer<typeof schema>;
When using Structured Outputs with Python, Pydantic can be used to define schemas.
Below is the code of this sample rewritten in Python.
import openai
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List

client = OpenAI()


class Choice(BaseModel):
    num: int = Field(description="Sequential number starting from 1")
    answer: str


class Quiz(BaseModel):
    question: str
    choices: List[Choice]
    correct_num: int
    score: int = Field(description="1 to 10 based on difficulty")


completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{
        "role": "user",
        "content": "Give me a difficult question!"
    }],
    response_format=Quiz
    # For Function calling
    # tools=[openai.pydantic_function_tool(Quiz)]
)

response = completion.choices[0].message.parsed
# For Function Calling
# response = (completion.choices[0].message.tool_calls or [])[0].function.parsed_arguments

assert isinstance(response, Quiz)
print(response)
# question='Which of the following events occurred in BC?' choices=[Choice(num=1, answer='Burning of the Library of Alexandria'), Choice(num=2, answer='Issuance of the Code of Hammurabi'), Choice(num=3, answer='East-West division of the Roman Empire'), Choice(num=4, answer='Columbus discovers the New World')] correct_num=2 score=9
The basic flow is the same as with Zod: the response structure defined with Pydantic is passed to the API (parse). The helpers available in the Python library are summarized in the official documentation.
Summary
Structured Outputs seems likely to become the standard choice when integrating AI into applications. It has a wide range of uses, and I would like to make good use of it.