The ChatGPT API has opened up extraordinary possibilities for developers who want to integrate conversational AI into their applications. Whether you are building a customer support bot, a coding assistant, or a creative writing companion, the OpenAI API gives you direct access to the same powerful language models behind ChatGPT. In this tutorial, we will walk through building a fully functional AI chatbot from scratch using Python, covering everything from basic API calls to streaming responses and maintaining conversation context.
Setting Up Your Environment
Before writing any code, you need an OpenAI API key and the official Python library installed. Head to platform.openai.com, create an account, and generate an API key from your dashboard. Then set up your Python environment:
pip install openai python-dotenv
Create a .env file in your project root to store your API key securely:
OPENAI_API_KEY=sk-your-api-key-here
Now create your main Python file and load the environment variables:
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
The openai library version 1.x uses a client-based pattern, which is cleaner and more explicit than the older module-level approach. The OpenAI client handles authentication, retries, and connection management automatically.
Making Your First Chat Completion Request
The core of the ChatGPT API is the chat completions endpoint. It accepts a list of messages, each with a role (system, user, or assistant) and content. The system message sets the behavior of the assistant, while user and assistant messages form the conversation history.
def get_chat_response(user_message: str) -> str:
    """Send a single message and return the assistant's reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful programming assistant. "
                           "You give concise, accurate answers with code examples."
            },
            {
                "role": "user",
                "content": user_message
            }
        ],
        temperature=0.7,
        max_tokens=1024
    )
    return response.choices[0].message.content
# Test it out
answer = get_chat_response("How do I reverse a string in Python?")
print(answer)
The temperature parameter controls randomness. A value of 0.0 makes responses close to deterministic, while values near 1.0 produce more varied, creative output. For a chatbot, 0.7 strikes a good balance between consistency and variety.
Building a Chatbot with Conversation Memory
A single request-response is useful, but a real chatbot needs to remember the conversation. The API itself is stateless, so you must send the full message history with each request. Here is a complete chatbot class that manages conversation context:
class Chatbot:
    def __init__(self, system_prompt: str = "You are a helpful assistant.",
                 model: str = "gpt-4o"):
        self.model = model
        self.messages: list[dict] = [
            {"role": "system", "content": system_prompt}
        ]

    def chat(self, user_input: str) -> str:
        """Send a message and get a response, maintaining history."""
        self.messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=1024
        )
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

    def reset(self):
        """Clear conversation history, keeping the system prompt."""
        self.messages = [self.messages[0]]
def main():
    bot = Chatbot(
        system_prompt="You are ByteBot, a friendly coding tutor. "
                      "Explain concepts clearly with examples."
    )
    print("ByteBot is ready! Type 'quit' to exit, 'reset' to start over.\n")
    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "quit":
            print("Goodbye!")
            break
        if user_input.lower() == "reset":
            bot.reset()
            print("Conversation reset.\n")
            continue
        response = bot.chat(user_input)
        print(f"\nByteBot: {response}\n")

if __name__ == "__main__":
    main()
This chatbot stores every exchange in self.messages, so the model receives full context each time. Keep in mind that longer conversations consume more tokens, so you may want to implement a sliding window or summarization strategy for production use.
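One simple approach is a sliding window that keeps the system prompt plus only the most recent messages. Here is a minimal sketch; the trim_history helper and the max_turns threshold are illustrative names, not part of the OpenAI API:

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep the system prompt plus the last `max_turns` user/assistant messages."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-max_turns:]
```

You could call self.messages = trim_history(self.messages) at the top of chat() to cap context size. Counting messages is a rough proxy; for precise control you could count tokens instead (for example with the tiktoken library) and trim until the history fits your budget.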
Adding Streaming Responses
For a better user experience, you can stream the response token by token instead of waiting for the complete answer. This makes the chatbot feel much more responsive, especially for longer replies:
# Add this method to the Chatbot class
def chat_stream(self, user_input: str) -> str:
    """Send a message and stream the response in real time."""
    self.messages.append({"role": "user", "content": user_input})
    stream = client.chat.completions.create(
        model=self.model,
        messages=self.messages,
        temperature=0.7,
        max_tokens=1024,
        stream=True
    )
    full_response = ""
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
            full_response += delta.content
    print()  # newline after streaming completes
    self.messages.append({"role": "assistant", "content": full_response})
    return full_response
When stream=True, the API returns an iterator of chunk objects. Each chunk contains a delta with partial content. You print each piece immediately and accumulate the full response for your conversation history.
Error Handling and Production Considerations
A production chatbot needs robust error handling. The OpenAI library provides specific exception classes for different failure scenarios:
from openai import (
APIConnectionError,
RateLimitError,
APIStatusError
)
import time
# Add this method to the Chatbot class
def chat_with_retry(self, user_input: str, max_retries: int = 3) -> str:
    """Chat with automatic retry on transient failures."""
    self.messages.append({"role": "user", "content": user_input})
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                temperature=0.7,
                max_tokens=1024
            )
            assistant_message = response.choices[0].message.content
            self.messages.append({
                "role": "assistant",
                "content": assistant_message
            })
            return assistant_message
        except RateLimitError:
            wait_time = 2 ** attempt
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
        except APIConnectionError:
            print("Connection error. Check your network.")
            if attempt == max_retries - 1:
                self.messages.pop()  # remove failed user message
                raise
        except APIStatusError as e:
            print(f"API error {e.status_code}: {e.message}")
            self.messages.pop()
            raise
    self.messages.pop()
    raise RuntimeError("Max retries exceeded")
This implements exponential backoff for rate limits, which is essential when your chatbot handles many concurrent users. Always remove the user message from history if the request ultimately fails, so your conversation state remains consistent.
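A fixed 2 ** attempt schedule has one weakness at scale: many clients rate-limited at the same moment will all retry in lockstep. Adding random jitter spreads the retries out. A sketch of a jittered backoff helper (the function name and the cap value are illustrative choices, not from the OpenAI library):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

You could replace wait_time = 2 ** attempt in chat_with_retry with wait_time = backoff_delay(attempt). The cap keeps late retries from waiting unreasonably long.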
Conclusion
You now have a solid foundation for building AI chatbots with the OpenAI API. We covered the basics of chat completions, built a class with conversation memory, added streaming for real-time responses, and implemented production-grade error handling. From here, you can extend the chatbot with function calling for tool use, integrate it into a web framework like FastAPI or Flask, or add a database to persist conversation history across sessions. The key takeaway is that the API is stateless by design, so your application controls the context, giving you complete flexibility over how your chatbot behaves.
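As a stepping stone toward database-backed persistence, you can simply save the message list to disk between sessions. A minimal sketch using JSON files; the save_history and load_history names are illustrative:

```python
import json
from pathlib import Path

def save_history(messages: list[dict], path: str) -> None:
    """Write the message list to a JSON file."""
    Path(path).write_text(json.dumps(messages, indent=2), encoding="utf-8")

def load_history(path: str,
                 system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Load a saved message list, or start fresh if no file exists."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text(encoding="utf-8"))
    return [{"role": "system", "content": system_prompt}]
```

Wiring this into the Chatbot class means calling load_history in __init__ and save_history after each exchange; swapping the JSON file for a real database later only changes these two functions.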