LlamaIndex

LlamaIndex is an open-source data framework designed to help LLMs connect with external data sources in a structured, efficient, and context-aware way. It provides a powerful suite of tools for ingesting, indexing, querying, and retrieving data from diverse formats such as PDFs, databases, APIs, and more. With modular components like custom indices, retrievers, and agents, LlamaIndex enables developers to build scalable Retrieval-Augmented Generation (RAG) pipelines and LLM-powered applications.

This tutorial will guide you through integrating Arcee AI language models into LlamaIndex using an OpenAI-compatible endpoint, allowing you to leverage Arcee's specialized models within LlamaIndex's framework.

The first example shows how to run simple inference with LlamaIndex, while the second shows how to set up a local RAG pipeline.

Model Inference

Prerequisites

  • Python: >=3.9

Integration Steps

  1. Create a folder for your project

mkdir arceeai_llamaindex && cd arceeai_llamaindex

  2. Set up a Python virtual environment and install LlamaIndex tooling

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

uv venv --python 3.12 --seed
source .venv/bin/activate

uv pip install llama-index-llms-openai-like

  3. Create a new Python file called arceeai_llamaindex.py and copy in the following contents

import os
from llama_index.llms.openai_like import OpenAILike

# Configure Arcee AI Model
ARCEE_BASE = os.getenv("OPENAI_API_BASE", "http://127.0.0.1:8080/v1")
ARCEE_KEY  = os.getenv("OPENAI_API_KEY", "your-arcee-api-key")
ARCEE_MODEL = os.getenv("OPENAI_MODEL_NAME", "afm-4.5b")

# Initialize Arcee AI model with OpenAI-compatible configuration
arcee_llm = OpenAILike(
    model=ARCEE_MODEL,
    api_base=ARCEE_BASE,
    api_key=ARCEE_KEY,
    is_chat_model=True,
    #is_function_calling_model=True,
)

# Define the prompt to be sent to the Arcee AI model
text = """Arcee AI is a foundation model provider with a focus on building the highest performing models per parameter. 
They offer a range of models from on-device and edge optimized models to large language models. Their suite of models 
provides customers with the flexibility to choose the right model for the right task. All models are released Apache 2.0 
enabling the community to use safe, built-in-the-US models in their own environment or via the Arcee AI API platform."""

prompt = f"Summarize the following in three bullets:\n\n{text}"

# Invoke the Arcee AI model
response = arcee_llm.complete(prompt)

# Print the results
print("\n=== RESULT ===\n")
print(str(response))

  4. Run your Arcee AI powered LlamaIndex completion

python arceeai_llamaindex.py
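
Because the client is initialized with is_chat_model=True, you can also call the model through LlamaIndex's chat and streaming interfaces. A minimal sketch, reusing the arcee_llm and prompt objects from the script above:

from llama_index.core.llms import ChatMessage

# Chat-style call through the same OpenAI-compatible client
chat_response = arcee_llm.chat([ChatMessage(role="user", content=prompt)])
print(chat_response.message.content)

# Streaming completion: print tokens as they arrive
for chunk in arcee_llm.stream_complete(prompt):
    print(chunk.delta, end="", flush=True)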

Retrieval Augmented Generation

This example shows how to set up a RAG pipeline with LlamaIndex. It uses an Arcee AI model for text generation and an OpenAI embedding model for generating document embeddings. The example creates an in-memory vector database that is discarded once the script finishes. If you want a persistent vector database and a more complex, robust pipeline, see the LlamaIndex documentation.
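
For reference, persisting the index to disk takes only a few extra lines. A minimal sketch, assuming Settings is configured with your LLM and embedding model as in the full script below (the ./storage directory name is an arbitrary choice for this example):

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

# First run: build the index and write it to disk
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# Later runs: reload the persisted index instead of re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)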

Prerequisites

  • Python: >=3.9

Integration Steps

  1. Create a folder for your project

mkdir arceeai_llamaindex_rag && cd arceeai_llamaindex_rag

  2. Set up a Python virtual environment and install LlamaIndex tooling

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

uv venv --python 3.12 --seed
source .venv/bin/activate

uv pip install llama-index-core llama-index-llms-openai-like llama-index-embeddings-openai

  3. Create a new Python file called arceeai_llamaindex_rag.py and copy in the following contents

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure Arcee AI Model
ARCEE_BASE = os.getenv("OPENAI_API_BASE", "http://127.0.0.1:8080/v1")
ARCEE_KEY  = os.getenv("OPENAI_API_KEY", "your-arcee-api-key")
ARCEE_MODEL = os.getenv("OPENAI_MODEL_NAME", "afm-4.5b")

# Initialize Arcee AI model with OpenAI-compatible configuration
arcee_llm = OpenAILike(
    model=ARCEE_MODEL,
    api_base=ARCEE_BASE,
    api_key=ARCEE_KEY,
    is_chat_model=True,
    #is_function_calling_model=True,
)

# Configure an embedding model to embed your documents
# This can be any embedding model, local or API
# In this example, we'll use an embedding model from OpenAI
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_base="https://api.openai.com/v1",
    api_key="YOUR_API_KEY", # Put your API Key here or reference from environment variables
)

# Set the models for llama-index to use
Settings.llm = arcee_llm
Settings.embed_model = embed_model

# Load documents
# In this example, we have some .txt/.md/.pdf files under ./data
documents = SimpleDirectoryReader("./data").load_data()

# Build the vector index and load in the documents
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
answer = query_engine.query("Summarize the top 5 key points in these files.") # Change the prompt to a specific question about your documents

# Print the results
print("\n=== RESULT ===\n")
print(answer.response)
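
As the comments above note, the embedding model is interchangeable. One possible variation, not required by this tutorial, is a local Hugging Face embedding model so that documents never leave your machine; this assumes the llama-index-embeddings-huggingface package is installed and uses BAAI/bge-small-en-v1.5 purely as an illustrative choice:

# uv pip install llama-index-embeddings-huggingface
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Embeddings are computed locally, so no embedding API key is needed
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")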

  4. Run your Arcee AI powered LlamaIndex RAG pipeline

python arceeai_llamaindex_rag.py
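
After the script runs, you may want to see which chunks the retriever actually used. A short sketch, reusing the index from the script above (the similarity_top_k value of 5 is illustrative):

# Retrieve more chunks per query and inspect the sources behind the answer
query_engine = index.as_query_engine(similarity_top_k=5)
answer = query_engine.query("Summarize the top 5 key points in these files.")

for node in answer.source_nodes:
    # Each source node carries a similarity score and the matched text
    print(f"score={node.score}  text={node.node.get_content()[:100]!r}")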
