Building AI Agents from Scratch (Part 2): A Conversational Search Agent with Ollama

In the first part of this series, we explored a basic reactive agent built with Ollama. Now, in this second installment, we delve deeper into agent development by leveraging the powerful function-calling capability of LLMs. Our goal is to create a more sophisticated conversational search agent that can not only understand the context of past interactions but also intelligently decide when to use external tools (in this case, a web search tool) to provide more complete and accurate responses.

What is Function Calling?

Function calling allows us to instruct LLMs to interact with external tools or APIs in a structured manner. By providing the LLM with a set of tools and their descriptions, we enable it to select and invoke the appropriate tool based on the user's query. This introduces a new layer of interaction, transforming our agent from simply reacting to inputs to proactively engaging with its environment.
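
To make this concrete, here is a minimal, hypothetical sketch of how a tool is typically described to an LLM in the OpenAI function-calling format (the get_weather tool and its city parameter are invented purely for illustration):

# A hypothetical tool definition in the OpenAI function-calling format.
# Given this schema, the LLM can reply with a structured call such as
# {"name": "get_weather", "arguments": "{\"city\": \"London\"}"} instead of plain text.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city"}
            },
            "required": ["city"],
        },
    },
}

The application (not the model) then runs the real function with those arguments and feeds the result back to the LLM, which is exactly the loop our agent will implement later in this post.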


It's important to note that this implementation is still a work in progress, and we'll continuously refine and optimize the agent's behavior throughout this series to improve its performance and robustness.


This journey will involve utilizing several cutting-edge technologies:

1. The Ollama Framework:


Ollama acts as our local AI playground, allowing us to run powerful LLMs without relying on cloud services. It simplifies the process of downloading, installing, and interacting with models like Llama 3.1, ensuring an accessible and customizable AI experience. Ollama's streamlined approach manages model weights, configurations, and datasets within a unified package, making it easy to leverage the latest advancements in LLMs.

2. Llama 3.1:


At the heart of our agent lies Llama 3.1, a state-of-the-art, instruction-tuned large language model from Meta. This multilingual model excels in dialogue-based interactions and outperforms many open-source and closed-source chat models on industry benchmarks. Llama 3.1's transformer architecture, coupled with training on a vast dataset and reinforcement learning techniques, allows it to generate insightful and helpful responses.

3. Jina Embeddings:


To power our web search tool, we'll employ the jina-embeddings-v2-base-en model. This English embedding model, based on a BERT architecture, excels at representing text as vectors. That is crucial for semantic search, enabling our agent to find information relevant to a user's query based on meaning rather than just keyword matching. Embeddings generated by this model will be stored and searched using...

4. ChromaDB:


...ChromaDB, a powerful open-source vector database. ChromaDB's efficiency in storing and retrieving vector embeddings makes it ideal for managing our search results: it allows our agent to quickly find the most pertinent information retrieved from the web, ensuring a seamless and responsive search experience.

By combining these powerful tools, we'll engineer an agent that can engage in nuanced conversations, intelligently determine when external information is necessary, and provide well-informed responses.

Let's get started.

Installing Ollama and Pulling Models

Here, we begin by installing Ollama, the local LLM runtime environment, using the provided installation script.

This script sets up Ollama on your system.

curl -fsSL https://ollama.com/install.sh | sh

Start the Ollama server:

ollama serve

After starting the server, we use ollama pull to download two specific models:
  • jina/jina-embeddings-v2-base-en: an embedding model, used to create vector representations of text for efficient semantic search within the ChromaDB vector store.
  • llama3.1: the large language model that will power the core of our agent, enabling it to understand and generate text.

ollama pull jina/jina-embeddings-v2-base-en
ollama pull llama3.1

Installing and Importing Dependencies

pip install beautifulsoup4 instructor openai langchain-ollama langchain-chroma langchain_core aiohttp nest_asyncio requests
from concurrent.futures import ThreadPoolExecutor
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from typing import Dict, Any, List, Literal, Optional, Callable, Type
from pydantic import BaseModel, Field, ValidationError
from inspect import signature
from bs4 import BeautifulSoup
from openai import OpenAI
import nest_asyncio
import instructor
import logging
import openai
import requests
import aiohttp
import asyncio
import re

Creating a Custom Search Tool

This section defines asynchronous functions for fetching and processing web search results from Google Search.

fetch Function: This function handles fetching the HTML content of a given URL using an aiohttp session. It takes the URL, headers, and optional parameters.

async def fetch(session, url,headers, params=None):
    async with session.get(url, params=params, headers=headers, timeout=30) as response:
        return await response.text()

fetch_page Function: This function extracts search results from the Google search results page. It takes the session, search parameters, page number, the list of results to append to, the total results to fetch, and headers.

It then crafts the URL for a specific search page based on start and num parameters, fetches the HTML, parses it with BeautifulSoup, and extracts title and link information for each result.

async def fetch_page(session, params, page_num, results , total_results_to_fetch,headers):
    params["start"] = (page_num - 1) * params["num"]
    html = await fetch(session, "https://www.google.com/search",headers, params)
    soup = BeautifulSoup(html, 'html.parser')

    # Note: these CSS selectors (.tF2Cxc, .DKV0Md, .yuRUbf) target Google's current
    # search-result markup and may need updating if Google changes its HTML.
    for result in soup.select(".tF2Cxc"):
        if len(results) >= total_results_to_fetch:
            break
        title = result.select_one(".DKV0Md").text
        links = result.select_one(".yuRUbf a")["href"]

        results.append({
            "title": title,
            "links": links
        })


fetch_content Function: Similar to fetch, this function fetches the content of a specific URL but without search parameters.

async def fetch_content(session, url,headers):
    async with session.get(url, headers=headers, timeout=30) as response:
        return await response.text()

fetch_all_content Function: This function uses aiohttp.ClientSession to fetch the content of multiple URLs concurrently using asyncio. It takes the URLs and headers as input.

async def fetch_all_content(urls,headers):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_content(session, url,headers) for url in urls]
        return await asyncio.gather(*tasks)

Extracting Text from URLs

Here we define a function get_all_text_from_url which takes a URL and headers as input and returns all the text content from that URL.

def get_all_text_from_url(url,headers):
    response = requests.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(response.text, 'html.parser')
    for script in soup(["script", "style"]):
        script.extract()
    text = soup.get_text()
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = '\n'.join(chunk for chunk in chunks if chunk)
    return text

The function fetches the URL's content, parses it using BeautifulSoup, removes any script or style tags to clean up the HTML, extracts all the text, splits it into lines and phrases to remove redundant whitespace, and then joins the cleaned chunks back into a string.

Splitting Text into Chunks

The split_text_into_chunks function takes a large text and breaks it down into smaller chunks, suitable for processing by the LLM. Each chunk stays within a defined chunk_size.

The function first splits the text into sentences using regex, then iterates through these sentences.

def split_text_into_chunks(text, chunk_size):
    sentences = re.split(r'(?<=[.!?]) +', text)
    chunks = []
    current_chunk = []

    for sentence in sentences:
        if sum(len(s) for s in current_chunk) + len(sentence) + 1 > chunk_size:
            chunks.append(' '.join(current_chunk))
            current_chunk = [sentence]
        else:
            current_chunk.append(sentence)

    if current_chunk:
        chunks.append(' '.join(current_chunk))

    return chunks

It checks if adding the current sentence to the current chunk will exceed the chunk_size. If so, it creates a new chunk with the current sentence. Finally, it returns the list of chunks.

Processing Text Content Asynchronously

The process_text_content function utilizes asyncio to concurrently process multiple text contents into chunks.

async def process_text_content(texts, chunk_size):
    loop = asyncio.get_event_loop()
    tasks = [loop.run_in_executor(None, split_text_into_chunks, text, chunk_size) for text in texts]
    return await asyncio.gather(*tasks)

It leverages an event loop to submit the split_text_into_chunks function for each text content to a thread pool executor. This allows for efficient parallel processing of multiple text chunks.

Initializing Chroma Vector Database

Here, we create an instance of ChromaDB, a vector database that will store the embeddings of our search results.
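
The Chroma initializer below references an embeddings_model object, which is assumed to have been created from the Jina model we pulled earlier. A minimal sketch of that setup:

# Assumed setup: wrap the pulled Jina model so ChromaDB can use it
# to embed both documents and queries.
embeddings_model = OllamaEmbeddings(model="jina/jina-embeddings-v2-base-en")

With the embedding model in place, we create the vector store: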

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings_model,
    persist_directory="./chroma_langchain_db",
)
  • collection_name: Gives a name to the collection where the data will be stored.
  • embedding_function: This specifies the embeddings_model we loaded earlier (jina/jina-embeddings-v2-base-en), which will be used to create the vector representations of the search results.
  • persist_directory: The directory where the ChromaDB data will be stored on your local disk. This ensures that the database persists even after your program ends.

Defining Chunk Size and Headers

This section sets the chunk_size to 1024, which determines the maximum size of each text chunk that will be stored in the vector database.

chunk_size = 1024  # size of each text chunk
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}

It also sets the headers for HTTP requests to include a user agent, helping to identify the requests as coming from a browser rather than a bot.

The fetch_and_process_data function is the core of our search tool. It takes the search query, chunk size, headers, the number of results per page and the total results to fetch.

async def fetch_and_process_data(search_query,chunk_size=chunk_size,headers=headers,n_result_per_page=3,total_results_to_fetch=3):
    params = {
        "q": search_query,  # query example
        "hl": "en",         # language
        "gl": "uk",         # country of the search, UK -> United Kingdom
        "start": 0,         # number page by default up to 0
        "num": n_result_per_page    # parameter defines the maximum number of results to return per page.
    }

    async with aiohttp.ClientSession() as session:
        page_num = 0
        results = []
        while len(results) < total_results_to_fetch:
            page_num += 1
            await fetch_page(session, params, page_num, results,total_results_to_fetch,headers)

        urls = [result['links'] for result in results]

        with ThreadPoolExecutor(max_workers=10) as executor:
            loop = asyncio.get_event_loop()
            texts = await asyncio.gather(
                *[loop.run_in_executor(executor, get_all_text_from_url, url ,headers) for url in urls]
            )

        chunks_list = await process_text_content(texts, chunk_size)

        documents = []
        for i, result in enumerate(results):
            for j, chunk in enumerate(chunks_list[i]):
                documents.append(Document(page_content=chunk , metadata={'source': result['links'] ,
                                                                         'title': result['title']}))
        vector_store.add_documents(documents=documents)

    return documents
  • Search Results Fetching: It iterates through Google search result pages to fetch a set number of results based on the input parameters.
  • Content Fetching: It fetches the content of all the fetched URLs concurrently.
  • Text Processing: It then processes the text content of each fetched page to split it into smaller chunks.
  • ChromaDB Storage: It creates a list of Document objects from the chunks and metadata from the search results. Each document is a chunk of text with its source URL and title.
  • Storing Documents: Finally, it adds these documents to the ChromaDB vector store.

Defining the Web Search Tool

This block defines the web_search function, which serves as our tool for interacting with Google Search.

The function:
  • Takes the search query as input.
  • Returns the joined content from the relevant chunks.

def web_search(search_query: str):
    async def run_search():
        await fetch_and_process_data(search_query)
        results_ = vector_store.as_retriever().invoke(search_query)
        result_text = " ".join([results_[i].page_content for i in range(len(results_))])
        return result_text

    return asyncio.run(run_search())

We can test it like this:

result_text = web_search("Ollama")
print(result_text)
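
One caveat: asyncio.run cannot be called from inside an already running event loop, which is the situation in a Jupyter notebook. That is why nest_asyncio appears in our imports; if you are working in a notebook, apply it once before calling web_search:

import nest_asyncio
nest_asyncio.apply()  # patches the running loop so asyncio.run() works inside a notebook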

Building the Agent

Defining the Query Router

This function uses our Llama 3.1 model to decide whether a query should be handled in "chat" mode (direct response from the LLM) or "websearch" mode (using the search tool).

class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""

    response_mode: Literal["chat", "websearch"] = Field(
        ...,
        description="Decide whether to respond via chat mode or perform a web search.",
    ) 

def router(query) :
  router_prompt = """
  You are an expert at determining how to respond to a user's question. 
  - **Chat**: Use this response for general inquiries, FAQs, and straightforward questions that can be answered with your existing knowledge.
  - **Websearch**: Use this response for more complex, real-time, or niche information requests that require specific data or up-to-date information beyond your knowledge.

  Respond with only one word: "chat" if you can answer directly, or "websearch" if the question needs further research.

  """
  router_client = instructor.from_openai(
      OpenAI(
          base_url="http://localhost:11434/v1",
          api_key="ollama",  # required, but unused
      ),
      mode=instructor.Mode.JSON,
  )

  routing = router_client.chat.completions.create(
      model="llama3.1",
      messages=[
          {
              "role": "system",
              "content": router_prompt,
          }
          ,
          {
              "role": "user",
              "content": query,
          }
      ],
      response_model=RouteQuery,
  )

  return routing.response_mode
  • System Prompt: Defines the instructions for the LLM, guiding it to choose between "chat" and "websearch" based on the user's query.
  • Router Client: Sets up an Instructor client to interact with the LLM.
  • Routing Logic: The LLM receives the user's query and decides whether to respond directly or perform a web search. The response is validated with a Pydantic model called RouteQuery to ensure the output is either "chat" or "websearch".
  • Return Value: Returns the response mode decided by the LLM.

Here, we provide examples of using the router function with different queries:

This query is expected to be a simple greeting, and the router correctly identifies it as a "chat" query.

router(query="HI")

The Output:

'chat'

This is a query that requires web search, and the router identifies it as a "websearch" query.

router(query="What did you know About The latest openai ai model o1 ?")

The Output:

'websearch'

Implementing the Ollama Chat Completion Class

Here, we define the OllamaChatCompletion class, which is a wrapper around the OpenAI client to facilitate interactions with our local LLM.

class OllamaChatCompletion:
    """
    Interacts with an OpenAI-compatible API (here, the local Ollama server) for chat completions.
    """
    def __init__(self, model: str, api_key: str = None, base_url: str = None):
        """
        Initialize with model, API key, and base URL.
        """
        self.client = openai.OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def generate(self, messages: List[str], tools: List[Dict[str, Any]] = None, **kwargs) -> Dict[str, Any]:
        """Generates a response from OpenAI's API."""
        params = {'messages': messages, 'model': self.model, 'tools': tools, **kwargs}
        response = self.client.chat.completions.create(**params)
        return response.choices[0].message
  • generate Function: Takes a list of messages and optional tools, and generates a response using the configured LLM. It uses the client's chat.completions.create method to submit the messages and any tools to the LLM.

Initializing the LLM Client

We initialize an instance of OllamaChatCompletion to serve as our LLM client. The client is set up to use the Llama 3.1 model running locally with Ollama.

llm = OllamaChatCompletion(api_key='Empty',
                           base_url='http://127.0.0.1:11434/v1',
                           model="llama3.1")
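
As a quick sanity check (assuming the Ollama server is still running and llama3.1 has been pulled), you can send a single message through the wrapper:

reply = llm.generate([{"role": "user", "content": "Say hello in one short sentence."}])
print(reply.content)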

Implementing Chat Message Memory

This block defines the ChatMessageMemory class, which is responsible for storing the conversation history between the user and the agent.

class ChatMessageMemory:
    """Manages conversation context."""

    def __init__(self):
        self.messages = []

    def add_message(self, message: Dict):
        """Add a message to memory."""
        self.messages.append(message)

    def add_messages(self, messages: List[Dict]):
        """Add multiple messages to memory."""
        for message in messages:
            self.add_message(message)

    def add_conversation(self, user_message: Dict, assistant_message: Dict):
        """Add a user-assistant conversation."""
        self.add_messages([user_message, assistant_message])

    def get_messages(self) -> List[Dict]:
        """Retrieve all messages."""
        return self.messages.copy()

    def reset_memory(self):
        """Clear all messages."""
        self.messages = []
  • add_message: Adds a single message to the memory.
  • add_messages: Adds a list of messages to memory.
  • add_conversation: Adds a user-assistant exchange as a pair of messages.
  • get_messages: Returns a copy of the current list of messages in the conversation history.
  • reset_memory: Clears all the messages from memory.
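
As a quick illustration (the messages below are made up), the memory can be exercised like this:

memory = ChatMessageMemory()
memory.add_conversation(
    {"role": "user", "content": "Hi"},                           # hypothetical user turn
    {"role": "assistant", "content": "Hello! How can I help?"}   # hypothetical reply
)
print(memory.get_messages())   # [{'role': 'user', ...}, {'role': 'assistant', ...}]
memory.reset_memory()          # conversation history is now empty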

Implementing the AgentTool Class

The AgentTool class encapsulates a Python function and integrates it with Pydantic validation to ensure arguments sent to it are of the correct type.

class AgentTool:
    """Encapsulates a Python function with Pydantic validation."""
    def __init__(self, func: Callable, args_model: Type[BaseModel]):
        self.func = func
        self.args_model = args_model
        self.name = func.__name__
        self.description = func.__doc__ or self.args_schema.get('description', '')

    def to_openai_function_call_definition(self) -> dict:
        """Converts the tool to OpenAI Function Calling format."""
        schema_dict = self.args_schema
        description = schema_dict.pop("description", "")
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": description,
                "parameters": schema_dict
            }
        }

    @property
    def args_schema(self) -> dict:
        """Returns the tool's function argument schema as a dictionary."""
        schema = self.args_model.model_json_schema()
        schema.pop("title", None)
        return schema

    def validate_json_args(self, json_string: str) -> bool:
        """Validate JSON string using the Pydantic model."""
        try:
            validated_args = self.args_model.model_validate_json(json_string)
            return isinstance(validated_args, self.args_model)
        except ValidationError:
            return False

    def run(self, *args, **kwargs) -> Any:
        """Execute the function with validated arguments."""
        try:
            # Handle positional arguments by converting them to keyword arguments
            if args:
                sig = signature(self.func)
                arg_names = list(sig.parameters.keys())
                kwargs.update(dict(zip(arg_names, args)))

            # Validate arguments with the provided Pydantic schema
            validated_args = self.args_model(**kwargs)
            return self.func(**validated_args.model_dump())
        except ValidationError as e:
            raise ValueError(f"Argument validation failed for tool '{self.name}': {str(e)}")
        except Exception as e:
            raise ValueError(f"An error occurred during the execution of tool '{self.name}': {str(e)}")

    def __call__(self, *args, **kwargs) -> Any:
        """Allow the AgentTool instance to be called like a regular function."""
        return self.run(*args, **kwargs)
  • to_openai_function_call_definition: Converts the tool into the OpenAI Function Calling format.
  • args_schema: Returns the argument schema of the tool as a dictionary.
  • validate_json_args: Validates JSON arguments against the Pydantic schema.
  • run: Executes the wrapped function, handling positional and keyword arguments. Validates the arguments using the Pydantic schema, runs the function, and returns the result.
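
To see how the pieces fit together, here is a small, hypothetical example (the add function and its AddArgs schema are invented for illustration; they are not part of the agent):

class AddArgs(BaseModel):
    """Add two integers."""
    a: int = Field(description="First number")
    b: int = Field(description="Second number")

def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

add_tool = AgentTool(add, AddArgs)
print(add_tool(2, 3))                                    # 5 -- positional args are converted and validated
print(add_tool.validate_json_args('{"a": 1, "b": 2}'))   # True
print(add_tool.to_openai_function_call_definition())     # the OpenAI function-calling schema for this tool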

Implementing the AgentToolExecutor Class

This class manages the registration and execution of tools.

class AgentToolExecutor:
    """Manages tool registration and execution."""

    def __init__(self, tools: Optional[List[AgentTool]] = None):
        self.tools: Dict[str, AgentTool] = {}
        if tools:
            for tool in tools:
                self.register_tool(tool)

    def register_tool(self, tool: AgentTool):
        """Registers a tool."""
        if tool.name in self.tools:
            raise ValueError(f"Tool '{tool.name}' is already registered.")
        self.tools[tool.name] = tool

    def execute(self, tool_name: str, *args, **kwargs) -> Any:
        """Executes a tool by name with given arguments."""
        tool = self.tools.get(tool_name)
        if not tool:
            raise ValueError(f"Tool '{tool_name}' not found.")
        try:
            return tool(*args, **kwargs)
        except Exception as e:
            raise ValueError(f"Error executing tool '{tool_name}': {e}") from e

    def get_tool_names(self) -> List[str]:
        """Returns a list of all registered tool names."""
        return list(self.tools.keys())

    def get_tool_details(self) -> str:
        """Returns details of all registered tools."""
        tools_info = [f"{tool.name}: {tool.description} Args schema: {tool.args_schema['properties']}" for tool in self.tools.values()]
        return '\n'.join(tools_info)
  • register_tool: Registers a new AgentTool by adding it to the tools dictionary using its name as the key.
  • execute: Executes the tool with the given name and arguments. It retrieves the tool from the tools dictionary and executes it.
  • get_tool_names: Returns a list of all registered tool names.
  • get_tool_details: Returns a string description of all registered tools, including their names, descriptions, and argument schemas.
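
Continuing the hypothetical add_tool example from above, the executor can be used like this:

executor = AgentToolExecutor(tools=[add_tool])
print(executor.get_tool_names())            # ['add']
print(executor.execute("add", a=4, b=6))    # 10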

Implementing the Agent Class

This class is the central component of the agent, responsible for integrating the LLM client, tools, and memory.

  • run: The core function of the agent, taking a user message and generating a response.
    • Routing: Checks if the user message should be routed to "chat" or "websearch" mode using the router function.
    • Chat Mode: If in chat mode, it generates a response using the LLM, adds the response to memory, and returns it.
    • Websearch Mode: If in websearch mode, it generates a response using the LLM with the tools attached, checks whether the response contains a tool call, and if so executes it and appends the results to the tool history before asking the LLM again. Once the LLM produces a final answer, it is added to memory and returned.
logger = logging.getLogger(__name__)

class Agent:
    """Integrates LLM client, tools, memory, and manages tool executions."""

    def __init__(self, llm_client, system_message: Dict[str, str], max_iterations: int = 10, tools: Optional[List[AgentTool]] = None):
        self.llm_client = llm_client
        self.executor = AgentToolExecutor()
        self.memory = ChatMessageMemory()
        self.system_message = system_message
        self.max_iterations = max_iterations
        self.tool_history = []
        self.function_calls = None

        # Register and convert tools
        if tools:
            for tool in tools:
                self.executor.register_tool(tool)
            self.function_calls = [tool.to_openai_function_call_definition() for tool in tools]

    def run(self, user_message: Dict[str, str]):
        """Generates responses, manages tool calls, and updates memory."""
        self.memory.add_message(user_message)
        direction = router(user_message['content'])

        for _ in range(self.max_iterations):
          if direction == 'websearch':
            chat_history = [self.system_message] + self.memory.get_messages() + self.tool_history
            response = self.llm_client.generate(chat_history, tools=self.function_calls)
            if self.parse_response(response):
                continue
            else:
                self.memory.add_message({"role": "assistant" , "content" : response.content})
                self.tool_history = []
                return response
          else : 
            chat_history = [self.system_message] + self.memory.get_messages()
            response = self.llm_client.generate(chat_history)
            self.memory.add_message({"role": "assistant" , "content" : response.content})
            return response

    def parse_response(self, response) -> bool:
        """Executes tool calls suggested by the LLM and updates tool history."""
        import json

        if response.tool_calls:
            self.tool_history.append(response)
            for tool in response.tool_calls:
                tool_name = tool.function.name
                tool_args = tool.function.arguments
                tool_args_dict = json.loads(tool_args)
                try:
                    logger.info(f"Executing {tool_name} with args: {tool_args}")
                    execution_results = self.executor.execute(tool_name, **tool_args_dict)
                    self.tool_history.append({
                        "role": "tool",
                        "tool_call_id": tool.id,
                        "name": tool_name,
                        "content": str(execution_results)
                    })
                except Exception as e:
                    raise ValueError(f"Execution error in tool '{tool_name}': {e}") from e
            return True
        return False
  • parse_response: Interprets the LLM's response and executes any tool calls.
    • Tool Execution: If a tool call is present, it extracts the tool name and arguments, then uses the AgentToolExecutor to execute the tool.
    • Update Tool History: It then updates the tool history to keep track of the tool calls and their results.

Defining the GetSearchSchema Model

Here we define a Pydantic model GetSearchSchema which is a data schema for the web_search tool.

class GetSearchSchema(BaseModel):
    """Fetch and process data from Google search based on a query, store results in ChromaDB vector store, and retrieve results."""
    search_query: str = Field(description="The search query to use for fetching data from Google search")

Defining the Tools

Here, we define a list of tools that the agent can use. Currently, it only includes the web_search tool.

tools = [
    AgentTool(web_search, 
              GetSearchSchema)
]
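
If you are curious what the LLM actually receives for this tool, you can inspect its generated function-calling definition. With Pydantic v2 it should look roughly like this (exact key order and extra fields such as property titles may vary):

print(tools[0].to_openai_function_call_definition())
# {'type': 'function',
#  'function': {'name': 'web_search',
#               'description': 'Fetch and process data from Google search based on a query, store results in ChromaDB vector store, and retrieve results.',
#               'parameters': {'properties': {'search_query': {'description': 'The search query to use for fetching data from Google search',
#                                                              'title': 'Search Query',
#                                                              'type': 'string'}},
#                              'required': ['search_query'],
#                              'type': 'object'}}}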

Initializing the Agent

Finally, we initialize an instance of the Agent class, passing it the llm, a system_message, and the tools.

  • System Message: Sets up instructions for the LLM on how it should behave, including memory usage, context, and interaction style.
# Define the system message
system_message = {"role": "system", 
                  "content": """
                  You are an AI assistant designed to assist users with their questions and inquiries across a wide range of topics. 
                  Your main focus is to answer the user's most recent question directly. You have memory to retain relevant information from previous interactions, which can help provide more personalized responses if needed.
                  Your goal is to deliver accurate, helpful, and concise answers while maintaining a friendly and engaging tone. Feel free to sprinkle in some humor and emojis to make the conversation lively! 
                  Always prioritize clarity, relevance, and user satisfaction in your interactions, utilizing your memory to enhance the user experience when appropriate.
                  """
                  }

agent = Agent(llm_client=llm, 
              system_message=system_message, 
              tools=tools)

Running the Agent with Example Queries

  • Greeting: The first example demonstrates a simple greeting and showcases the agent's ability to handle conversational interactions.
# Define a user message
user_message = {"role": "user", "content": "Hi"}

# Generate a response using the agent
response = agent.run(user_message)
print(response.content)

The Output:

It's great to meet you! 😊 I'm here to help answer any questions or chat about anything that's on your mind. What brings you here today? Is there something specific on your mind, or do you just want to say hi and see what we can talk about? 🤗
  • Web Search Query: The second example presents a web search query, showing how the agent can leverage the web_search tool to retrieve information from the web.
user_message = {"role": "user", "content": "Search the web for information on the Mamba AI architecture and provide details about it"}

response = agent.run(user_message)
print(response.content)

The Output:

Based on my internet search, it seems that Mamba AI Architecture is a new star in the field of sequence modeling, particularly in handling long sequences. It employs a Control Theory-inspired State Space Model (SSM) for communication purposes and retains Multilayer Perceptron (MLP)-style projections for computation.

The core idea behind Mamba Architecture is to provide an alternative to transformer models by using stacked Mamba blocks, which are similar to the stacked transformer blocks in traditional transformer models. The choice of SSM for sequence transformations allows for more efficient communication and scalable processing, making it suitable for applications like speech recognition and text-to-speech synthesis.

Furthermore, Mamba Architecture is said to be a leap forward in the field of AI, offering improved efficiency and scalability. However, my search didn't provide specific information on its commercial deployment or how it performs compared to existing models.

Here's a quick summary:

**Mamba AI Architecture**

* A new approach to sequence modeling
* Employing Control Theory-inspired State Space Models for communication
* Retaining Multilayer Perceptron (MLP)-style projections for computation
* Designed to provide an alternative to traditional transformer models
* Potentially more efficient and scalable than existing models
  • Last Task Query: This example demonstrates the agent's ability to access its memory and provide details about its previous actions.
user_message = {"role": "user", "content": "What was your last task ?"}

response = agent.run(user_message)
print(response.content)

The Output:

My last task was to:

"Search About Mamba AI Architecture in the internet and tell me about it." πŸ€” I had done that for you, and provided some information on what I found from my search. 😊
  • Another search: This example further showcases the agent's capacity to perform web searches and retrieve information about specific topics.
user_message = {"role": "user", "content": "Search online and provide information on the latest OpenAI model and project name"}

response = agent.run(user_message)
print(response.content)

The Output:

The latest OpenAI model is called "OpenAI o1" (also code-named "Strawberry"), which can reason logically through complex problems rather than simply providing a one-step answer like other large language models. Additionally, there's also a new project in the works, codenamed "GPT-5", which will be a significantly larger model that continues to scale up AI capabilities.

Accessing the Agent's Memory

The last example shows how to retrieve the conversation history managed by the ChatMessageMemory.

agent.memory.get_messages()

The Output:

[{'role': 'user', 'content': 'Hi there'},
 {'role': 'assistant',
  'content': "It's great to meet you! 😊 I'm here to help answer any questions or chat about anything that's on your mind. What brings you here today? Is there something specific on your mind, or do you just want to say hi and see what we can talk about? πŸ€—"},
 {'role': 'user',
  'content': 'Search the web for information on the Mamba AI architecture and provide details about it'},
 {'role': 'assistant',
  'content': "Based on my internet search, it seems that Mamba AI Architecture is a new star in the field of sequence modeling, particularly in handling long sequences. It employs a Control Theory-inspired State Space Model (SSM) for communication purposes and retains Multilayer Perceptron (MLP)-style projections for computation.\n\nThe core idea behind Mamba Architecture is to provide an alternative to transformer models by using stacked Mamba blocks, which are similar to the stacked transformer blocks in traditional transformer models. The choice of SSM for sequence transformations allows for more efficient communication and scalable processing, making it suitable for applications like speech recognition and text-to-speech synthesis.\n\nFurthermore, Mamba Architecture is said to be a leap forward in the field of AI, offering improved efficiency and scalability. However, my search didn't provide specific information on its commercial deployment or how it performs compared to existing models.\n\nHere's a quick summary:\n\n**Mamba AI Architecture**\n\n* A new approach to sequence modeling\n* Employing Control Theory-inspired State Space Models for communication\n* Retaining Multilayer Perceptron (MLP)-style projections for computation\n* Designed to provide an alternative to traditional transformer models\n* Potentially more efficient and scalable than existing models"},
 {'role': 'user', 'content': 'what was your last task ?'},
 {'role': 'assistant',
  'content': 'My last task was to:\n\n"Search About Mamba AI Architecture in the internet and tell me about it." 🤔 I had done that for you, and provided some information on what I found from my search. 😊'},
 {'role': 'user',
  'content': 'Search online and provide information on the latest OpenAI model and project name'},
 {'role': 'assistant',
  'content': 'The latest OpenAI model is called "OpenAI o1" (also code-named "Strawberry"), which can reason logically through complex problems rather than simply providing a one-step answer like other large language models. Additionally, there\'s also a new project in the works, codenamed "GPT-5", which will be a significantly larger model that continues to scale up AI capabilities.'}]

I hope you found this exploration insightful, and I look forward to continuing this learning experience with you in the upcoming parts of the series.

Happy prompting.