
Generative AI (gen AI) is evolving at lightning speed, offering incredible potential for building intelligent applications. But harnessing this power requires robust tools. Enter Llama Stack, an open source framework for building generative AI applications. Whether you're building sophisticated chatbots, intelligent search engines or complex autonomous agents, Llama Stack provides the building blocks you need.

To illustrate these capabilities, let's imagine a fictional company, Parasol Insurance. In our scenario, their operations team faces the challenges of managing a growing number of Red Hat OpenShift clusters, often dealing with fragmented documentation, recurring incidents and the need for repetitive troubleshooting. To alleviate cognitive overload and accelerate incident response, we'll show how they could develop an advanced agent using Llama Stack. This illustrative agent aims to integrate retrieval-augmented generation (RAG) for knowledge retrieval, OpenShift control via a Model Context Protocol (MCP) and communication through Slack.

Getting started with any powerful framework can seem daunting. That's why we've put together a series of hands-on Python notebooks designed to guide you step-by-step, from the absolute basics to constructing advanced, multi-component agentic systems. This series tells a story – a journey of progressively adding capabilities to our example Parasol Insurance OpenShift operations agent. We start simple and layer concepts, culminating in a notebook that integrates many of the techniques learned along the way. Let's dive in!

0. Getting started with Llama Stack

Every journey begins with a first step. This notebook guides you through installing and configuring Llama Stack correctly. We'll cover the fundamental concepts and components, install dependencies, deploy a Llama Stack server and configure some commonly used inference parameters. This is the foundational setup we would use to begin building our intelligent agent for the fictional Parasol Insurance.
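To make the setup concrete, here is a minimal sketch of bundling commonly tuned inference (sampling) parameters and passing them to a Llama Stack server. The base URL, port, model ID and parameter shape below are assumptions for illustration; adjust them to match your own deployment and client version.

```python
# A sketch of configuring common inference parameters for Llama Stack.
# The "strategy" dict shape and the client call in the comment below are
# assumptions based on the llama-stack-client Python SDK; verify against
# the version you install.

def make_sampling_params(temperature: float = 0.7,
                         top_p: float = 0.9,
                         max_tokens: int = 512) -> dict:
    """Bundle commonly tuned sampling parameters into one dict."""
    return {
        "strategy": {"type": "top_p", "temperature": temperature, "top_p": top_p},
        "max_tokens": max_tokens,
    }

params = make_sampling_params(temperature=0.2)

# With the llama-stack-client package installed and a server running locally,
# the inference call would look roughly like this (untested sketch):
#
#   from llama_stack_client import LlamaStackClient
#   client = LlamaStackClient(base_url="http://localhost:8321")
#   response = client.inference.chat_completion(
#       model_id="meta-llama/Llama-3.2-3B-Instruct",  # hypothetical model ID
#       messages=[{"role": "user", "content": "Hello!"}],
#       sampling_params=params,
#   )
```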

1. Simple RAG with Llama Stack (Level1_simple_RAG.ipynb)

RAG is one of the most powerful applications of gen AI. Instead of relying solely on the model’s pre-trained knowledge, RAG allows you to provide custom data to your application as needed to answer user requests. For our Parasol Insurance example agent, this means accessing and synthesizing information from their internal OpenShift documentation for efficient troubleshooting. This notebook introduces the foundational principles of RAG using Llama Stack. You'll learn how to index your documents, retrieve relevant information based on a query and generate an answer grounded in your data.

  • Focus: Demonstrates the foundational RAG component, showcasing how to use Llama Stack to retrieve information from an internal knowledge base to answer queries
  • Task example: “How do I install OpenShift?”
  • Agent capability: Uses RAG to retrieve and summarize the OpenShift Guide
  • Notebook: https://red.ht/simple_RAG-ipynb
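The three RAG steps the notebook walks through can be sketched in a few lines. This toy version uses plain keyword overlap in place of a real vector database and a stubbed string in place of the model call; in the notebook, Llama Stack's vector I/O and inference APIs fill those roles.

```python
# Toy RAG pipeline: index, retrieve, generate. Keyword overlap stands in for
# embedding similarity, and generate() stands in for the LLM call.

def index(docs):
    """Index each document as a set of lowercase tokens."""
    return [(doc, set(doc.lower().split())) for doc in docs]

def retrieve(indexed, query, k=1):
    """Rank documents by token overlap with the query and return the top k."""
    q = set(query.lower().split())
    ranked = sorted(indexed, key=lambda d: len(q & d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query, context):
    """Stand-in for the LLM call: ground the answer in retrieved context."""
    return f"Based on the docs: {context[0]}"

docs = [
    "To install OpenShift, download the installer and run openshift-install create cluster.",
    "Slack channels can be archived from the channel settings menu.",
]
indexed = index(docs)
context = retrieve(indexed, "How do I install OpenShift?")
answer = generate("How do I install OpenShift?", context)
```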

2. A simple Llama Stack web search agent (Level2_simple_agent_with_websearch.ipynb)

Now that we understand how to ground large language models (LLMs) with static data (RAG) from internal documentation, let's give our agent the ability to interact with the dynamic world. This notebook introduces the concept of agents. We'll build a simple agent that can use a tool – in this case, a web search tool – to answer questions that require up-to-date information beyond its training data, such as finding the latest updates on OpenShift.

  • Focus: Introduces the basic agent framework with the ability to utilize tools. This notebook showcases the agent's capacity to interact with the external world
  • Task example: “What's the latest in OpenShift?”
  • Agent capability: Uses a web_search_tool to retrieve and summarize publicly available information
  • Notebook: https://red.ht/simple_agent_with_websearch-ipynb
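The core idea of this level is that the agent decides when a question needs a tool. The sketch below hard-codes that decision as a keyword check; in the notebook the model itself makes the call, and Llama Stack wires the agent to a real search provider.

```python
# Minimal agent-with-one-tool sketch. The search tool is a stub, and the
# tool-use "decision" is a heuristic standing in for the model's reasoning.

def web_search_tool(query: str) -> str:
    """Stubbed search result standing in for a live web search provider."""
    return f"[search results for: {query}]"

def agent_step(user_query: str) -> str:
    # Questions about "latest" information fall outside the model's training
    # data, so the agent reaches for the search tool.
    if "latest" in user_query.lower():
        observation = web_search_tool(user_query)
        return f"Summarizing {observation}"
    return "Answering from model knowledge."
```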

3. A more advanced multi-tool Llama Stack agent with chaining and reasoning (Level3_advanced_agent_with_Prompt_Chaining_and_ReAct.ipynb)

Real-world tasks, like those that might be faced by an OpenShift operations team, often require multiple steps and different tools. Building on the previous notebook, we now explore how to create more sophisticated agents. This notebook demonstrates how to equip an agent with multiple tools (like web search and a location client tool) and provide mechanisms for using these tools together. It includes chaining – where the agent can plan and execute a sequence of actions, potentially using the output of one tool as the input for the next. It also includes a mechanism called ReAct in which the model alternates between reasoning and actions. By generating intermediate thought traces, using tools based on those traces and adapting to the results, the model can handle complex, multi-step problems, such as predicting weather-related risks to infrastructure.

  • Focus: Builds upon the simple agent by incorporating location awareness, prompt chaining for complex reasoning and the ReAct framework for structured action planning
  • Task example: "Are there any weather-related risks in my area that could disrupt network connectivity or system availability?"
  • Agent capabilities: Utilizes a web_search_tool for weather information and a get_location client tool. Demonstrates prompt chaining and the ReAct agent methodology
  • Notebook: https://red.ht/advanced_agent_with_Prompt_Chaining_and_ReAct-ipynb
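The ReAct pattern described above can be condensed into a loop: each turn the model emits a Thought, then either an Action (a tool call whose result is fed back as an Observation) or a final Answer. In this sketch the model's turns are scripted for illustration; the tool names echo the notebook's, but the replies are hard-coded.

```python
# Condensed ReAct loop with a scripted "model". Real agents generate the
# Thought/Action steps with an LLM; here they are hard-coded to show the flow.

def get_location() -> str:
    return "Raleigh, NC"

def web_search(query: str) -> str:
    return f"Forecast for {query}: thunderstorms expected tonight."

# Scripted turns: (thought, tool_to_call, tool_input) or (thought, None, answer).
SCRIPT = [
    ("I need the user's location first.", "get_location", None),
    ("Now I can check the weather there.", "web_search", "weather in {obs}"),
    ("Thunderstorms could disrupt connectivity; I can answer now.", None,
     "Risk found: thunderstorms tonight may affect network availability."),
]

TOOLS = {"get_location": lambda _: get_location(),
         "web_search": web_search}

def react_agent():
    obs = ""
    for thought, action, arg in SCRIPT:
        print(f"Thought: {thought}")
        if action is None:
            return arg  # final answer
        tool_input = (arg or "").format(obs=obs)
        obs = TOOLS[action](tool_input)  # observation feeds the next turn
        print(f"Action: {action} -> Observation: {obs}")

answer = react_agent()
```

Note how the second step's input is built from the first step's observation: that is the prompt-chaining idea, with each tool's output becoming the next tool's input.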

4. Agentic RAG with Llama Stack (Level4_rag_agent.ipynb)

We've explored RAG and agents separately. What happens when we combine them? This notebook introduces agentic RAG, where the agent intelligently decides when and how to use the RAG pipeline (our example internal OpenShift knowledge base) as a tool. This would allow an agent like the one for Parasol Insurance to flexibly switch between using its internal knowledge, searching the web, or querying its specific knowledge base via RAG depending on the user's query, such as "How to install OpenShift?"

  • Focus: Combines the autonomous agent capabilities with the internal knowledge retrieval of RAG. The agent can now strategically decide when to consult internal documentation
  • Task example: “How to install OpenShift?”
  • Agent capability: Leverages RAG as a tool to answer user queries based on internal documents, intelligently determining when this knowledge source is relevant
  • Notebook: https://red.ht/RAG_agent-ipynb
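The distinguishing move in agentic RAG is routing: the knowledge base becomes one tool among several, and the agent picks a source per query. The routing rule below is a keyword heuristic standing in for the model's own tool-selection reasoning, and the one-entry knowledge base is illustrative.

```python
# Agentic RAG routing sketch: internal docs for product how-tos, web otherwise.
# Both tools are stubs; the route() heuristic stands in for the model's choice.

KNOWLEDGE_BASE = {
    "install openshift": "Run openshift-install create cluster after "
                         "configuring install-config.yaml.",
}

def rag_tool(query: str) -> str:
    """Look up the query in the internal knowledge base."""
    for key, doc in KNOWLEDGE_BASE.items():
        if key in query.lower():
            return doc
    return "No internal document found."

def web_search_tool(query: str) -> str:
    return f"[web results for: {query}]"

def route(query: str) -> str:
    """Pick a knowledge source for this query."""
    return "rag" if "openshift" in query.lower() else "web"

def agentic_rag(query: str) -> str:
    return rag_tool(query) if route(query) == "rag" else web_search_tool(query)
```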

5. Llama Stack agents with MCP tools (Level5_agents_and_mcp.ipynb)

Llama Stack's flexibility allows integration with various specialized tools. This notebook focuses on incorporating tools that comply with MCP. MCP is often described as "USB-C for AI": an open protocol that standardizes how agents fetch data and invoke functions. For our Parasol Insurance example, this means enabling real-time interaction with an OpenShift environment and automating operational tasks like checking cluster status, reviewing logs and even sending Slack messages. We demonstrate how to configure and utilize MCP-based tools within the Llama Stack agent framework, unlocking new capabilities for LLM applications.

  • Focus: Integrates the agent with OpenShift and Slack MCP servers, enabling real-time interaction and automation of operational tasks
  • Task examples:
    • "View the logs for pod slack-test in the llama-serve OpenShift namespace. Categorize it as normal or error"
    • "Summarize the results with the pod name, category and a brief explanation as to why you categorized it as normal or error. Respond with plain text only. Do not wrap your response in additional quotation marks"
    • "Send a message with the summarization to the demos channel on Slack"
  • Agent capability: Utilizes OpenShift and Slack tools to demonstrate a complex workflow of interacting with OpenShift and updating the team via Slack
  • Notebook: https://red.ht/agents_and_mcp-ipynb
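Under the hood, MCP requests are JSON-RPC 2.0 messages: a `tools/call` request names a tool and passes it arguments. The toy dispatcher below mimics the server side of that exchange; the two tools are stubs for the notebook's OpenShift and Slack MCP servers, and their names are illustrative, not the real ones.

```python
# Toy dispatcher illustrating the MCP tools/call message shape (JSON-RPC 2.0).
# Real MCP servers also handle initialization, tool listing and errors; this
# shows only the call-dispatch step with stubbed, illustratively named tools.

import json

def get_pod_logs(namespace: str, pod: str) -> str:
    return f"[logs for {pod} in {namespace}]"

def send_slack_message(channel: str, text: str) -> str:
    return f"posted to #{channel}: {text}"

TOOLS = {"get_pod_logs": get_pod_logs, "send_slack_message": send_slack_message}

def handle(request_json: str) -> dict:
    """Dispatch a JSON-RPC tools/call request to the named tool."""
    req = json.loads(request_json)
    assert req["method"] == "tools/call"
    name = req["params"]["name"]
    result = TOOLS[name](**req["params"]["arguments"])
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_pod_logs",
               "arguments": {"namespace": "llama-serve", "pod": "slack-test"}},
})
response = handle(request)
```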

6. Llama Stack agents with MCP tools and agentic RAG (Level6_agents_MCP_and_RAG.ipynb)

This is where it all comes together for our illustrative Parasol Insurance operations agent! Our final notebook synthesizes the concepts from the previous entries. We build a sophisticated agent that leverages:

  • Multiple tools, including specialized MCP tools (for OpenShift and Slack)
  • Agentic RAG to dynamically query an internal knowledge base when needed for troubleshooting solutions
  • The ability to chain actions and make complex decisions for a complete incident response flow

This example showcases the power of Llama Stack in building robust, multi-faceted LLM applications capable of handling complex, real-world tasks like analyzing pod logs, finding relevant solutions from documentation and communicating updates automatically.

  • Focus: Represents the culmination of our efforts, showcasing a complete incident response flow by integrating prompt chaining, RAG for solution retrieval and MCP for OpenShift interaction and Slack communication
  • Task examples:
    • "View the logs for pod slack-test in the llama-serve OpenShift namespace. Categorize it as normal or error"
    • "Search for solutions about this error and provide a summary of the steps to take in just 1-2 sentences"
    • "Send a message with the summarization to the demos channel on Slack"
  • Agent capability: Combines MCP tools and RAG to automate the process of analyzing pod logs, finding relevant solutions and, when errors are found, sending the team a Slack notification with steps to take
  • Notebook: https://red.ht/agents_MCP_and_RAG-ipynb
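The three task prompts above amount to a chained flow: fetch and categorize the pod logs, look up a fix in internal documentation, then notify the team. The sketch below stubs every step to show the shape of that chain; in the notebook each stub is an MCP tool call or a model/RAG call made by the agent.

```python
# Stubbed end-to-end incident response flow: logs -> categorize -> RAG -> Slack.
# Every function is a stand-in for an MCP tool or model call made by the agent.

def get_pod_logs(namespace: str, pod: str) -> str:
    """Stand-in for the OpenShift MCP tool; returns a canned error log."""
    return "CrashLoopBackOff: container failed to start"

def categorize(logs: str) -> str:
    """Stand-in for the model's normal/error judgment."""
    return "error" if any(w in logs for w in ("error", "CrashLoopBackOff")) else "normal"

def rag_solution(category: str, logs: str) -> str:
    """Stand-in for the RAG lookup against internal troubleshooting docs."""
    if category == "error":
        return "Check the container image and recent config changes, then restart the pod."
    return "No action needed."

def send_slack_message(channel: str, text: str) -> str:
    """Stand-in for the Slack MCP tool."""
    return f"#{channel}: {text}"

def incident_flow(namespace: str, pod: str, channel: str) -> str:
    logs = get_pod_logs(namespace, pod)
    category = categorize(logs)
    fix = rag_solution(category, logs)
    return send_slack_message(channel, f"{pod} is {category}. {fix}")

message = incident_flow("llama-serve", "slack-test", "demos")
```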

Start your journey today!

These notebooks provide a practical, hands-on path to mastering key Llama Stack capabilities, illustrated through the development scenario of an intelligent operations agent for OpenShift using our fictional company, Parasol Insurance. By working through these notebooks, you'll gain the skills to build everything from simple Q&A systems using your data to complex, tool-using agents. We encourage you to clone the repository, run the notebooks, experiment and adapt the code for your own projects. The world of gen AI application development is yours to explore, and Llama Stack is here to help you build it.

If you have any feedback about this work, please let us know.


About the author

J. William Murdock is a pioneering AI strategist who has worked at IBM and Red Hat since 2003. As a foundational member of IBM’s original Watson team, he played a critical role in Watson’s historic Jeopardy! victory in 2011, catalyzing IBM’s strategic pivot to AI and significantly impacting its market trajectory. Murdock was instrumental in steering Watson from a groundbreaking research project to a commercial AI powerhouse, underpinning IBM’s growth and contributing to its multi-billion-dollar market capitalization. Now at Red Hat, he is working on enhancing Llama Stack RAG capabilities, driving rapid AI advancements, and comprehensive evaluation frameworks. With a proven track record of executing strategic AI initiatives, Murdock continues to shape the future of enterprise AI solutions.
