
Generative AI (gen AI) is evolving at lightning speed, offering incredible potential for building intelligent applications. But harnessing this power requires robust tools. Enter Llama Stack, an open source framework for building generative AI applications. Whether you're building sophisticated chatbots, intelligent search engines or complex autonomous agents, Llama Stack provides the building blocks you need.

To illustrate these capabilities, let's imagine a fictional company, Parasol Insurance. In our scenario, their operations team faces the challenges of managing a growing number of Red Hat OpenShift clusters, often dealing with fragmented documentation, recurring incidents and the need for repetitive troubleshooting. To alleviate cognitive overload and accelerate incident response, we'll show how they could develop an advanced agent using Llama Stack. This illustrative agent aims to integrate retrieval-augmented generation (RAG) for knowledge retrieval, OpenShift control via a Model Context Protocol (MCP) and communication through Slack.

Getting started with any powerful framework can seem daunting. That's why we've put together a series of hands-on Python notebooks designed to guide you step-by-step, from the absolute basics to constructing advanced, multi-component agentic systems. This series tells a story – a journey of progressively adding capabilities to our example Parasol Insurance OpenShift operations agent. We start simple and layer concepts, culminating in a notebook that integrates many of the techniques learned along the way. Let's dive in!

0. Getting started with Llama Stack

Every journey begins with a first step. This notebook guides you through installing and configuring Llama Stack correctly. We'll cover the fundamental concepts and components, install dependencies, deploy a Llama Stack server and configure some commonly used inference parameters. This is the foundational setup we would use to begin building our intelligent agent for the fictional Parasol Insurance.
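To make the setup concrete, here is a minimal sketch of bundling commonly tuned inference (sampling) parameters and passing them to a Llama Stack server. The base URL, port, model ID and parameter shape below are assumptions for illustration; adjust them to match your own deployment and client version.

```python
# A sketch of configuring common inference parameters for Llama Stack.
# The "strategy" dict shape and the client call in the comment below are
# assumptions based on the llama-stack-client Python SDK; verify against
# the version you install.

def make_sampling_params(temperature: float = 0.7,
                         top_p: float = 0.9,
                         max_tokens: int = 512) -> dict:
    """Bundle commonly tuned sampling parameters into one dict."""
    return {
        "strategy": {"type": "top_p", "temperature": temperature, "top_p": top_p},
        "max_tokens": max_tokens,
    }

params = make_sampling_params(temperature=0.2)

# With the llama-stack-client package installed and a server running locally,
# the inference call would look roughly like this (untested sketch):
#
#   from llama_stack_client import LlamaStackClient
#   client = LlamaStackClient(base_url="http://localhost:8321")
#   response = client.inference.chat_completion(
#       model_id="meta-llama/Llama-3.2-3B-Instruct",  # hypothetical model ID
#       messages=[{"role": "user", "content": "Hello!"}],
#       sampling_params=params,
#   )
```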

1. Simple RAG with Llama Stack (Level1_simple_RAG.ipynb)

RAG is one of the most powerful applications of gen AI. Instead of relying solely on the model’s pre-trained knowledge, RAG allows you to provide custom data to your application as needed to answer user requests. For our Parasol Insurance example agent, this means accessing and synthesizing information from their internal OpenShift documentation for efficient troubleshooting. This notebook introduces the foundational principles of RAG using Llama Stack. You'll learn how to index your documents, retrieve relevant information based on a query and generate an answer grounded in your data.

  • Focus: Demonstrates the foundational RAG component, showcasing how to use Llama Stack to retrieve information from an internal knowledge base to answer queries
  • Task example: “How do I install OpenShift?”
  • Agent capability: Uses RAG to retrieve and summarize the OpenShift Guide
  • Notebook: https://red.ht/simple_RAG-ipynb
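The three RAG steps the notebook walks through can be sketched in a few lines. This toy version uses plain keyword overlap in place of a real vector database and a stubbed string in place of the model call; in the notebook, Llama Stack's vector I/O and inference APIs fill those roles.

```python
# Toy RAG pipeline: index, retrieve, generate. Keyword overlap stands in for
# embedding similarity, and generate() stands in for the LLM call.

def index(docs):
    """Index each document as a set of lowercase tokens."""
    return [(doc, set(doc.lower().split())) for doc in docs]

def retrieve(indexed, query, k=1):
    """Rank documents by token overlap with the query and return the top k."""
    q = set(query.lower().split())
    ranked = sorted(indexed, key=lambda d: len(q & d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query, context):
    """Stand-in for the LLM call: ground the answer in retrieved context."""
    return f"Based on the docs: {context[0]}"

docs = [
    "To install OpenShift, download the installer and run openshift-install create cluster.",
    "Slack channels can be archived from the channel settings menu.",
]
indexed = index(docs)
context = retrieve(indexed, "How do I install OpenShift?")
answer = generate("How do I install OpenShift?", context)
```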

2. A simple Llama Stack web search agent (Level2_simple_agent_with_websearch.ipynb)

Now that we understand how to ground large language models (LLMs) with static data (RAG) from internal documentation, let's give our agent the ability to interact with the dynamic world. This notebook introduces the concept of agents. We'll build a simple agent that can use a tool – in this case, a web search tool – to answer questions that require up-to-date information beyond its training data, such as finding the latest updates on OpenShift.

  • Focus: Introduces the basic agent framework with the ability to utilize tools. This notebook showcases the agent's capacity to interact with the external world
  • Task example: “What's the latest in OpenShift?”
  • Agent capability: Uses a web_search_tool to retrieve and summarize publicly available information
  • Notebook: https://red.ht/simple_agent_with_websearch-ipynb
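The core idea of this level is that the agent decides when a question needs a tool. The sketch below hard-codes that decision as a keyword check; in the notebook the model itself makes the call, and Llama Stack wires the agent to a real search provider.

```python
# Minimal agent-with-one-tool sketch. The search tool is a stub, and the
# tool-use "decision" is a heuristic standing in for the model's reasoning.

def web_search_tool(query: str) -> str:
    """Stubbed search result standing in for a live web search provider."""
    return f"[search results for: {query}]"

def agent_step(user_query: str) -> str:
    # Questions about "latest" information fall outside the model's training
    # data, so the agent reaches for the search tool.
    if "latest" in user_query.lower():
        observation = web_search_tool(user_query)
        return f"Summarizing {observation}"
    return "Answering from model knowledge."
```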

3. A more advanced multi-tool Llama Stack agent with chaining and reasoning (Level3_advanced_agent_with_Prompt_Chaining_and_ReAct.ipynb)

Real-world tasks, like those that might be faced by an OpenShift operations team, often require multiple steps and different tools. Building on the previous notebook, we now explore how to create more sophisticated agents. This notebook demonstrates how to equip an agent with multiple tools (like web search and a location client tool) and provide mechanisms for using these tools together. It includes chaining – where the agent can plan and execute a sequence of actions, potentially using the output of one tool as the input for the next. It also includes a mechanism called ReAct in which the model alternates between reasoning and actions. By generating intermediate thought traces, using tools based on those traces and adapting to the results, the model can handle complex, multi-step problems, such as predicting weather-related risks to infrastructure.

  • Focus: Builds upon the simple agent by incorporating location awareness, prompt chaining for complex reasoning and the ReAct framework for structured action planning
  • Task example: "Are there any weather-related risks in my area that could disrupt network connectivity or system availability?"
  • Agent capabilities: Utilizes a web_search_tool for weather information and a get_location client tool. Demonstrates prompt chaining and the ReAct agent methodology
  • Notebook: https://red.ht/advanced_agent_with_Prompt_Chaining_and_ReAct-ipynb
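The ReAct pattern described above can be condensed into a loop: each turn the model emits a Thought, then either an Action (a tool call whose result is fed back as an Observation) or a final Answer. In this sketch the model's turns are scripted for illustration; the tool names echo the notebook's, but the replies are hard-coded.

```python
# Condensed ReAct loop with a scripted "model". Real agents generate the
# Thought/Action steps with an LLM; here they are hard-coded to show the flow.

def get_location() -> str:
    return "Raleigh, NC"

def web_search(query: str) -> str:
    return f"Forecast for {query}: thunderstorms expected tonight."

# Scripted turns: (thought, tool_to_call, tool_input) or (thought, None, answer).
SCRIPT = [
    ("I need the user's location first.", "get_location", None),
    ("Now I can check the weather there.", "web_search", "weather in {obs}"),
    ("Thunderstorms could disrupt connectivity; I can answer now.", None,
     "Risk found: thunderstorms tonight may affect network availability."),
]

TOOLS = {"get_location": lambda _: get_location(),
         "web_search": web_search}

def react_agent():
    obs = ""
    for thought, action, arg in SCRIPT:
        print(f"Thought: {thought}")
        if action is None:
            return arg  # final answer
        tool_input = (arg or "").format(obs=obs)
        obs = TOOLS[action](tool_input)  # observation feeds the next turn
        print(f"Action: {action} -> Observation: {obs}")

answer = react_agent()
```

Note how the second step's input is built from the first step's observation: that is the prompt-chaining idea, with each tool's output becoming the next tool's input.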

4. Agentic RAG with Llama Stack (Level4_rag_agent.ipynb)

We've explored RAG and agents separately. What happens when we combine them? This notebook introduces agentic RAG, where the agent intelligently decides when and how to use the RAG pipeline (our example internal OpenShift knowledge base) as a tool. This would allow an agent like the one for Parasol Insurance to flexibly switch between using its internal knowledge, searching the web, or querying its specific knowledge base via RAG depending on the user's query, such as "How to install OpenShift?"

  • Focus: Combines the autonomous agent capabilities with the internal knowledge retrieval of RAG. The agent can now strategically decide when to consult internal documentation
  • Task example: “How to install OpenShift?”
  • Agent capability: Leverages RAG as a tool to answer user queries based on internal documents, intelligently determining when this knowledge source is relevant
  • Notebook: https://red.ht/RAG_agent-ipynb
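The distinguishing move in agentic RAG is routing: the knowledge base becomes one tool among several, and the agent picks a source per query. The routing rule below is a keyword heuristic standing in for the model's own tool-selection reasoning, and the one-entry knowledge base is illustrative.

```python
# Agentic RAG routing sketch: internal docs for product how-tos, web otherwise.
# Both tools are stubs; the route() heuristic stands in for the model's choice.

KNOWLEDGE_BASE = {
    "install openshift": "Run openshift-install create cluster after "
                         "configuring install-config.yaml.",
}

def rag_tool(query: str) -> str:
    """Look up the query in the internal knowledge base."""
    for key, doc in KNOWLEDGE_BASE.items():
        if key in query.lower():
            return doc
    return "No internal document found."

def web_search_tool(query: str) -> str:
    return f"[web results for: {query}]"

def route(query: str) -> str:
    """Pick a knowledge source for this query."""
    return "rag" if "openshift" in query.lower() else "web"

def agentic_rag(query: str) -> str:
    return rag_tool(query) if route(query) == "rag" else web_search_tool(query)
```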

5. Llama Stack agents with MCP tools (Level5_agents_and_mcp.ipynb)

Llama Stack's flexibility allows integration with various specialized tools. This notebook focuses on incorporating tools that comply with MCP. MCP is often described as "USB-C for AI": an open protocol that standardizes how agents fetch data and invoke functions. For our Parasol Insurance example, this means enabling real-time interaction with an OpenShift environment and automating operational tasks like checking cluster status, reviewing logs and even sending Slack messages. We demonstrate how to configure and utilize MCP-based tools within the Llama Stack agent framework, unlocking new capabilities for LLM applications.

  • Focus: Integrates the agent with OpenShift and Slack MCP servers, enabling real-time interaction and automation of operational tasks
  • Task examples:
    • "View the logs for pod slack-test in the llama-serve OpenShift namespace. Categorize it as normal or error"
    • "Summarize the results with the pod name, category and a brief explanation as to why you categorized it as normal or error. Respond with plain text only. Do not wrap your response in additional quotation marks"
    • "Send a message with the summarization to the demos channel on Slack"
  • Agent capability: Utilizes OpenShift and Slack tools to demonstrate a complex workflow of interacting with OpenShift and updating the team via Slack
  • Notebook: https://red.ht/agents_and_mcp-ipynb
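Under the hood, MCP requests are JSON-RPC 2.0 messages: a `tools/call` request names a tool and passes it arguments. The toy dispatcher below mimics the server side of that exchange; the two tools are stubs for the notebook's OpenShift and Slack MCP servers, and their names are illustrative, not the real ones.

```python
# Toy dispatcher illustrating the MCP tools/call message shape (JSON-RPC 2.0).
# Real MCP servers also handle initialization, tool listing and errors; this
# shows only the call-dispatch step with stubbed, illustratively named tools.

import json

def get_pod_logs(namespace: str, pod: str) -> str:
    return f"[logs for {pod} in {namespace}]"

def send_slack_message(channel: str, text: str) -> str:
    return f"posted to #{channel}: {text}"

TOOLS = {"get_pod_logs": get_pod_logs, "send_slack_message": send_slack_message}

def handle(request_json: str) -> dict:
    """Dispatch a JSON-RPC tools/call request to the named tool."""
    req = json.loads(request_json)
    assert req["method"] == "tools/call"
    name = req["params"]["name"]
    result = TOOLS[name](**req["params"]["arguments"])
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_pod_logs",
               "arguments": {"namespace": "llama-serve", "pod": "slack-test"}},
})
response = handle(request)
```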

6. Llama Stack agents with MCP tools and agentic RAG (Level6_agents_MCP_and_RAG.ipynb)

This is where it all comes together for our illustrative Parasol Insurance operations agent! Our final notebook synthesizes the concepts from the previous entries. We build a sophisticated agent that leverages:

  • Multiple tools, including specialized MCP tools (for OpenShift and Slack)
  • Agentic RAG to dynamically query an internal knowledge base when needed for troubleshooting solutions
  • The ability to chain actions and make complex decisions for a complete incident response flow

This example showcases the power of Llama Stack in building robust, multi-faceted LLM applications capable of handling complex, real-world tasks like analyzing pod logs, finding relevant solutions from documentation and communicating updates automatically.

  • Focus: Represents the culmination of our efforts, showcasing a complete incident response flow by integrating prompt chaining, RAG for solution retrieval and MCP for OpenShift interaction and Slack communication
  • Task examples:
    • "View the logs for pod slack-test in the llama-serve OpenShift namespace. Categorize it as normal or error"
    • "Search for solutions about this error and provide a summary of the steps to take in just 1-2 sentences"
    • "Send a message with the summarization to the demos channel on Slack"
  • Agent capability: Combines MCP tools and RAG to automate the process of analyzing pod logs, finding relevant solutions and, when errors are found, sending the team a Slack notification with steps to take
  • Notebook: https://red.ht/agents_MCP_and_RAG-ipynb
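The three task prompts above amount to a chained flow: fetch and categorize the pod logs, look up a fix in internal documentation, then notify the team. The sketch below stubs every step to show the shape of that chain; in the notebook each stub is an MCP tool call or a model/RAG call made by the agent.

```python
# Stubbed end-to-end incident response flow: logs -> categorize -> RAG -> Slack.
# Every function is a stand-in for an MCP tool or model call made by the agent.

def get_pod_logs(namespace: str, pod: str) -> str:
    """Stand-in for the OpenShift MCP tool; returns a canned error log."""
    return "CrashLoopBackOff: container failed to start"

def categorize(logs: str) -> str:
    """Stand-in for the model's normal/error judgment."""
    return "error" if any(w in logs for w in ("error", "CrashLoopBackOff")) else "normal"

def rag_solution(category: str, logs: str) -> str:
    """Stand-in for the RAG lookup against internal troubleshooting docs."""
    if category == "error":
        return "Check the container image and recent config changes, then restart the pod."
    return "No action needed."

def send_slack_message(channel: str, text: str) -> str:
    """Stand-in for the Slack MCP tool."""
    return f"#{channel}: {text}"

def incident_flow(namespace: str, pod: str, channel: str) -> str:
    logs = get_pod_logs(namespace, pod)
    category = categorize(logs)
    fix = rag_solution(category, logs)
    return send_slack_message(channel, f"{pod} is {category}. {fix}")

message = incident_flow("llama-serve", "slack-test", "demos")
```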

Start your journey today!

These notebooks provide a practical, hands-on path to mastering key Llama Stack capabilities, illustrated through the development scenario of an intelligent operations agent for OpenShift using our fictional company, Parasol Insurance. By working through these notebooks, you'll gain the skills to build everything from simple Q&A systems using your data to complex, tool-using agents. We encourage you to clone the repository, run the notebooks, experiment and adapt the code for your own projects. The world of gen AI application development is yours to explore, and Llama Stack is here to help you build it.

If you have any feedback about this work, please let us know.


About the author

J. William Murdock is a pioneering AI strategist who has worked at IBM and Red Hat since 2003. As a foundational member of IBM’s original Watson team, he played a critical role in Watson’s historic Jeopardy! victory in 2011, catalyzing IBM’s strategic pivot to AI and significantly impacting its market trajectory. Murdock was instrumental in steering Watson from a groundbreaking research project to a commercial AI powerhouse, underpinning IBM’s growth and contributing to its multi-billion-dollar market capitalization. Now at Red Hat, he is working on enhancing Llama Stack RAG capabilities, driving rapid AI advancements, and comprehensive evaluation frameworks. With a proven track record of executing strategic AI initiatives, Murdock continues to shape the future of enterprise AI solutions.
