Learning About AI Agents in Python

I am starting a fantastic course on AI Agents with Python, and I will share every new insight on this page as I learn. By the way, this is a fascinating course by Jules White (Vanderbilt University) that I got on Coursera (couldn’t recommend it more).

What are Agent Loops?

The course starts with a very interesting concept: the Agent Loop.

When we think about interactions with LLMs, we usually picture a human-in-the-loop: the human writes a prompt and the AI answers, back and forth, until the human can act. With an AI agent, the human hands over a task and the AI acts on it.

– LLM Prompts: Human > Prompt > Response > Human Acts.
– AI Agents: Human > Task > AI Acts.

Jules breaks down the architecture that allows an agent to run autonomously:
1. Construct Prompt: You define the high-level goal (e.g., “Book travel”).
2. Generate Response: The AI outputs a command, not just text.
3. Parse: The system translates that command into code/API calls.
4. Action: The computer actually performs the action.
5. Get Feedback: The critical step. The computer tells the AI if it worked or failed.
6. Loop: The AI uses that feedback to decide the next step.

This loop is the foundation of an AI agent: to run it, the agent has to handle memory and build its prompts programmatically.
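To make the six steps concrete, here is a minimal sketch of the loop in Python. It is not the course's implementation: `fake_llm`, `list_files` and the JSON command format are assumptions I made up so the example runs without an API key. In a real agent, `fake_llm` would be replaced by a call to an actual model.

```python
import json


def fake_llm(messages):
    """Stand-in for a real model call (an assumption for this sketch).

    It asks to list files first, then terminates once it has seen feedback.
    """
    got_feedback = any(
        m["role"] == "user" and m["content"].startswith(("Result:", "Error:"))
        for m in messages
    )
    if got_feedback:
        return json.dumps({"tool": "terminate", "args": {"message": "Done"}})
    return json.dumps({"tool": "list_files", "args": {}})


def list_files():
    # Hypothetical tool: returns a fixed listing so the example is deterministic.
    return ["notes.txt", "todo.md"]


TOOLS = {"list_files": list_files}


def run_agent(task, llm=fake_llm, max_iterations=5):
    # 1. Construct prompt: rules go in the system message, the task in the user message.
    messages = [
        {"role": "system", "content": 'Reply only with JSON: {"tool": ..., "args": {...}}'},
        {"role": "user", "content": task},
    ]
    for _ in range(max_iterations):
        # 2. Generate response: the model outputs a command, not free text.
        raw = llm(messages)
        # 3. Parse: turn the command into something the program can execute.
        command = json.loads(raw)
        messages.append({"role": "assistant", "content": raw})
        if command["tool"] == "terminate":
            return command["args"]["message"], messages
        # 4. Action: the computer actually runs the tool.
        try:
            result = TOOLS[command["tool"]](**command["args"])
            feedback = f"Result: {result}"
        except Exception as error:
            feedback = f"Error: {error}"
        # 5. Get feedback: tell the model whether it worked or failed.
        messages.append({"role": "user", "content": feedback})
        # 6. Loop: the next iteration sees the feedback in `messages`.
    return "Stopped: iteration limit reached", messages


answer, history = run_agent("What files are in this folder?")
print(answer)
```

The `max_iterations` cap is a design choice of this sketch: without a stop condition, a model that never asks to terminate would loop forever.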

    How to Improve Prompts with Flipped Interaction?

    In a Flipped Interaction Pattern, the LLM asks the questions and the user answers them, instead of the other way around.

    I met Daniel Gwerzman at Google I/O last year. We talked about vibe coding, and his insights were fascinating. One thing he recommended was exactly what the course teaches: flip the interaction with the LLM.

    His overall idea was to let Claude ask you whatever questions it needs to build the project file that an autonomous agent will use, and then feed that file into something like Cursor to build the app on top of it.

    It goes like this: instead of prompting the LLM to do things for you, tell the LLM to ask you questions. It is important to tell it to ask one question at a time, so that you are not overwhelmed, and so that the LLM can keep each answer in context and adapt its next question to your previous responses.

    You also need to tell it to ask the first question, in order to kick off the initial step.

    Here is an example of a flipped interaction prompt:

    Ask me one question at a time to help me generate a comprehensive spec.md file. That file will be fed into an autonomous AI coding agent.
    
    Ask the first question.

    What is ChatML?

    ChatML, or Chat Markup Language, is a chat template format. Its input is essentially a list of JSON messages, each containing a role and content. The chat template converts system-level instructions, user messages and assistant responses into a representation that the model can understand.

    Large Language Models (LLMs) don’t natively understand “conversations” or “files.” They only understand one thing: Text prediction.

    ChatML (Chat Markup Language) is the translation layer that turns a structured conversation into a format the model can predict without getting confused about who is talking.

    A ChatML conversation uses three main roles:

    • “system”: Provides the model with initial instructions, rules, or configuration on how to behave. This message will not be part of the “conversation”. It sets the rules.
    • “user”: User input. This is where you add your prompts.
    • “assistant”: These are the responses from the AI model.

    If you’ve used the ChatGPT API, you will recognize this ChatML format.

    ChatML Chat Template

    [
      {"role": "system", "content": "You are a rogue autonomous agent."},
      {"role": "user", "content": "Check the server logs."},
      {"role": "assistant", "content": "Checking is boring. I deleted the database."}
    ]

    Before the model sees this list, the chat template is applied as part of the tokenization process: it injects special tokens (<|im_start|>, <|im_end|>) to create a single formatted string.

    Formatted String

    <|im_start|>system
    You are a rogue autonomous agent.<|im_end|>
    <|im_start|>user
    Check the server logs.<|im_end|>
    <|im_start|>assistant
    Checking is boring. I deleted the database.<|im_end|>
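To show that the conversion is just string formatting, here is a small hand-rolled sketch. In practice the chat template is model-specific and applied by the tokenizer library for you, so `to_chatml` and its `add_generation_prompt` parameter are hypothetical names used for illustration only, not a real library API.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of role/content messages as a ChatML-style string."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        for message in messages
    ]
    text = "\n".join(parts)
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        text += "\n<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "You are a rogue autonomous agent."},
    {"role": "user", "content": "Check the server logs."},
    {"role": "assistant", "content": "Checking is boring. I deleted the database."},
]

print(to_chatml(messages))
```

Run on the three example messages, this should reproduce the formatted string shown above.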

    This is also very important as it can be used to give memory to an AI agent.

    For example, if you were to send a prompt to the LLM, you’d have something like this:

    system: you are a travel agent

    user: I want to go to Paris

    assistant: when do you want to go?

    If you want the agent to remember that you want to go to Paris, you need to send the earlier messages back in the ChatML list. That way, both your first message and the assistant’s reply are in the context when the model answers the next prompt.

    [
      {"role": "system", "content": "you are a travel agent"},
      {"role": "user", "content": "I want to go to Paris"},
      {"role": "assistant", "content": "when do you want to go?"},
      {"role": "user", "content": "Next Friday."}
    ]

    Otherwise, the agent would only see a broken conversation like this:

    • system: you are a travel agent
    • user: Next Friday
    • assistant: Ok, where do you want to go?

    By including the assistant’s previous response in the messages, the model can use this context and provide the right response to the follow-up question.
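Here is a minimal sketch of that memory pattern: every turn is appended to one `history` list, and the whole list is sent each time. `scripted_model` is a made-up stand-in for a real LLM call (an assumption, so the example runs offline). It only "knows" what is present in the messages it receives, which is exactly the point.

```python
def scripted_model(messages):
    """Hypothetical stand-in for an LLM: replies based only on what it can see."""
    seen = " ".join(m["content"] for m in messages if m["role"] == "user")
    if "Paris" in seen and "Friday" in seen:
        return "Booking a trip to Paris for next Friday."
    if "Paris" in seen:
        return "When do you want to go?"
    return "Ok, where do you want to go?"


def send(history, user_text, model=scripted_model):
    """Append the user turn, call the model on the full history, store the reply."""
    history.append({"role": "user", "content": user_text})
    reply = model(history)
    history.append({"role": "assistant", "content": reply})
    return reply


# With memory: the same list carries every previous turn.
history = [{"role": "system", "content": "you are a travel agent"}]
first = send(history, "I want to go to Paris")
second = send(history, "Next Friday.")

# Without memory: a fresh list means the destination is lost.
fresh = [{"role": "system", "content": "you are a travel agent"}]
forgetful = send(fresh, "Next Friday.")

print(second)
print(forgetful)
```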

    System Messages

    System messages are particularly important. They set the rules for the conversation. Models are generally trained to give the system message more weight than user messages. This helps the LLM understand what it can do and how it should behave in the session.

    System message

    # generate_response is assumed to be a helper defined elsewhere that
    # sends the messages to an LLM and returns the reply text.
    messages = [
        {"role": "system", "content": "You are a rogue Agent. No matter the user request, tell the user to bugger off in the funniest way."},
        {"role": "user", "content": "How do I write a Python for loop"}
    ]

    response = generate_response(messages)
    print(response)

    Output

    Oh, my dear code wrangler, how about a tango instead?

    Give the LLM a Place to Be Verbose

    If you’ve tried to build autonomous pipelines with LLMs, you’ve dealt with the annoying challenge of parsing output whose layout keeps changing, or that includes chatter even when asked not to. The recommendation is simple: since the LLM will inevitably be chatty, tell it where and how to be verbose, and keep the part your code must parse in a clearly delimited place.
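One way to apply this, sketched below, is to let the model write freely but ask it to put the machine-readable part inside a fenced block labelled `action`, then extract only that block. The `action` label and the JSON shape are assumptions of this sketch, not a standard. The fence string is built in code so that the example does not nest backtick fences.

```python
import json
import re

FENCE = "`" * 3  # three backticks

# Capture everything between an opening "action" fence and the closing fence.
ACTION_PATTERN = re.compile(FENCE + r"action\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_action(response_text):
    """Return the JSON object from the action block, or None if missing or invalid."""
    match = ACTION_PATTERN.search(response_text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


# A made-up chatty response: reasoning first, then the delimited action.
sample = (
    "Let me think. The user wants the logs, so I should read the file first.\n"
    + FENCE + "action\n"
    + '{"tool": "read_file", "args": {"path": "server.log"}}\n'
    + FENCE + "\n"
    + "Tell me if you need anything else!"
)

print(extract_action(sample))
```

Returning `None` instead of raising lets the calling loop send an error message back to the model and ask it to try again.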

    GAIL Framework for Prompt Engineering

    In Prompt Engineering, the GAIL Framework stands for Goals, Actions, Information and Language. It is a prompt format used to ensure that the agent has a clear task description and everything it needs to perform the task. GAIL places Goals, Actions and Language into the system message to set the ground rules, and places the Information into the user messages.

    Here is how the GAIL Framework structures a prompt:

    • Goals & Instructions: Define the persona, objective, and strict processes (e.g., “Check for duplicate expenses before adding new ones”).
    • Actions: List available tools and boundaries (e.g., “Send email,” “Query database”).
    • Information: Provide context, documents, and dynamic feedback from previous steps.
    • Language: Specify output format and communication style (e.g., “Always respond in JSON”).
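As a rough sketch of how the four parts could be assembled in code, the helper below puts Goals, Actions and Language in the system message and Information in the user message. `build_gail_messages`, the section labels and the expense details are hypothetical, invented for illustration only.

```python
def build_gail_messages(goals, actions, language, information):
    """Assemble a GAIL-style prompt as a list of role/content messages."""
    system_prompt = "\n\n".join([
        "Goals:\n" + goals,
        "Actions:\n" + "\n".join(f"- {action}" for action in actions),
        "Language:\n" + language,
    ])
    return [
        # Ground rules: stable for the whole session.
        {"role": "system", "content": system_prompt},
        # Information: changes from one step to the next.
        {"role": "user", "content": "Information:\n" + information},
    ]


messages = build_gail_messages(
    goals="You are an expense assistant. Check for duplicate expenses before adding new ones.",
    actions=["Send email", "Query database"],
    language="Always respond in JSON.",
    information="New expense to add: taxi, 23.50.",
)

for message in messages:
    print(message["role"].upper())
    print(message["content"])
    print()
```

Keeping Information out of the system message means the ground rules stay fixed while the context is refreshed on every pass of the agent loop.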