
LlamaIndex

LlamaIndex is an open-source data framework designed to help LLMs connect with external data sources in a structured, efficient, and context-aware way. It provides a powerful suite of tools for ingesting, indexing, querying, and retrieving data from diverse formats such as PDFs, databases, APIs, and more. With modular components like custom indices, retrievers, and agents, LlamaIndex enables developers to build scalable Retrieval-Augmented Generation (RAG) pipelines and LLM-powered applications.

This tutorial will guide you through integrating Arcee models into LlamaIndex using an OpenAI-compatible endpoint.

The first example shows how to run simple inference with LlamaIndex, while the second shows how to set up a local RAG pipeline.


Model Inference

Prerequisites

  • Python: >=3.9

  • An Arcee AI model running locally or accessible via an API that exposes an OpenAI-compatible endpoint (you can sanity-check it with the snippet below)
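
Before wiring anything into LlamaIndex, you can confirm the endpoint is reachable by listing its models. This is a minimal sketch using only the Python standard library; it assumes the server exposes the standard /v1/models route (most OpenAI-compatible servers do), so adjust the base URL and key to your deployment:

import json
import os
import urllib.request

base = os.getenv("OPENAI_API_BASE", "http://127.0.0.1:8080/v1")
key = os.getenv("OPENAI_API_KEY", "your-arcee-api-key")

# List the models the endpoint serves; a JSON response means you're good to go
req = urllib.request.Request(
    f"{base}/models",
    headers={"Authorization": f"Bearer {key}"},
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.loads(resp.read()), indent=2))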

Quickstart

Environment and project setup:

# Create project folder
mkdir arceeai_llamaindex && cd arceeai_llamaindex

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# Create and activate virtual environment
uv venv --python 3.12 --seed
source .venv/bin/activate

# Install LlamaIndex OpenAI-compatible client
uv pip install llama-index-llms-openai-like

Create a new Python file called arceeai_llamaindex.py with the following:

import os
from llama_index.llms.openai_like import OpenAILike

# Configure Arcee AI Model
ARCEE_BASE = os.getenv("OPENAI_API_BASE", "http://127.0.0.1:8080/v1")
ARCEE_KEY  = os.getenv("OPENAI_API_KEY", "your-arcee-api-key")
ARCEE_MODEL = os.getenv("OPENAI_MODEL_NAME", "trinity-mini")

# Initialize Arcee AI model with OpenAI-compatible configuration
arcee_llm = OpenAILike(
    model=ARCEE_MODEL,
    api_base=ARCEE_BASE,
    api_key=ARCEE_KEY,
    is_chat_model=True,
    # Uncomment if your endpoint supports OpenAI-style function/tool calling
    # is_function_calling_model=True,
)

# Define the prompt to be sent to the Arcee AI model
text = """Arcee AI is a foundation model provider with a focus on building the highest performing models per parameter. 
They offer a range of models from on-device and edge optimized models to large language models. Their suite of models 
provides customers with the flexibility to choose the right model for the right task. All models are released Apache 2.0 
enabling the community to use safe, built-in-the-US models in their own environment or via the Arcee AI API platform."""

prompt = f"Summarize the following in three bullets:\n\n{text}"

# Invoke the Arcee AI model
response = arcee_llm.complete(prompt)

# Print the results
print("\n=== RESULT ===\n")
print(str(response))

Test your script:

python arceeai_llamaindex.py
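
The same arcee_llm object also supports LlamaIndex's chat and streaming interfaces. Here is a minimal sketch you can append to the script above; stream_complete and chat are part of LlamaIndex's standard LLM interface, so they should work with any OpenAILike instance:

from llama_index.core.llms import ChatMessage

# Stream the completion token by token instead of waiting for the full response
for chunk in arcee_llm.stream_complete(prompt):
    print(chunk.delta, end="", flush=True)

# Or use the chat interface with explicit roles
messages = [
    ChatMessage(role="system", content="You are a concise assistant."),
    ChatMessage(role="user", content=prompt),
]
print(arcee_llm.chat(messages))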

Retrieval Augmented Generation

This example sets up a RAG pipeline with LlamaIndex, using an Arcee AI model for text generation and an OpenAI embedding model for document embeddings. It uses an in-memory vector store that is cleared after execution; for persistent storage and more advanced pipelines, see the LlamaIndex documentation (a minimal persistence sketch appears at the end of this example).

Prerequisites

  • Python: >=3.9

  • An Arcee AI model accessible via an OpenAI-compatible endpoint (as in the previous example)

  • An OpenAI API key for the embedding model

Environment Setup for RAG Pipeline

# Create project folder
mkdir arceeai_llamaindex_rag && cd arceeai_llamaindex_rag

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# Create and activate virtual environment
uv venv --python 3.12 --seed
source .venv/bin/activate

# Install LlamaIndex core, the OpenAI-compatible LLM wrapper, and OpenAI embedding support
uv pip install llama-index-core llama-index-llms-openai-like llama-index-embeddings-openai

# Create the folder the RAG script will read from, then add some .txt/.md/.pdf files
mkdir data

Create a new Python file called arceeai_llamaindex_rag.py with the following:

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.llms import ChatMessage

# Configure Arcee AI Model
ARCEE_BASE = os.getenv("OPENAI_API_BASE", "http://127.0.0.1:8080/v1")
ARCEE_KEY  = os.getenv("OPENAI_API_KEY", "your-arcee-api-key")
ARCEE_MODEL = os.getenv("OPENAI_MODEL_NAME", "trinity-mini")

# Initialize Arcee AI model with OpenAI-compatible configuration
arcee_llm = OpenAILike(
    model=ARCEE_MODEL,
    api_base=ARCEE_BASE,
    api_key=ARCEE_KEY,
    is_chat_model=True,
    # Uncomment if your endpoint supports OpenAI-style function/tool calling
    # is_function_calling_model=True,
)

# Configure an embedding model to embed your documents
# This can be any embedding model, local or API; here we use an OpenAI model
# (a local alternative is sketched after this script)
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_base="https://api.openai.com/v1",
    api_key="YOUR_API_KEY",  # Put your OpenAI API key here or read it from an environment variable
)

# Set the models for llama-index to use
Settings.llm = arcee_llm
Settings.embed_model = embed_model

# Load documents
# In this example, we have some .txt/.md/.pdf files under ./data
documents = SimpleDirectoryReader("./data").load_data()

# Build the vector index and load in the documents
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
answer = query_engine.query("Summarize the top 5 key points in these files.") # Change the prompt to a specific question about your documents

# Print the results
print("\n=== RESULT ===\n")
print(answer.response)
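
Note that the embedding model is separate from the Arcee LLM: the Arcee model generates answers, while text-embedding-3-small embeds your documents and queries. If you would rather keep embeddings local, one option is to swap in a Hugging Face model; this sketch assumes you have run uv pip install llama-index-embeddings-huggingface, and BAAI/bge-small-en-v1.5 is just one common choice:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local alternative to OpenAIEmbedding: runs on your machine, no API key needed
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")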

Run your script:

python arceeai_llamaindex_rag.py
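
As noted above, the vector index lives in memory and is discarded when the script exits. To avoid re-embedding your documents on every run, you can persist the index to disk and reload it later using LlamaIndex's built-in storage APIs. A minimal sketch to append to the script above (./storage is an arbitrary directory name):

from llama_index.core import StorageContext, load_index_from_storage

# After building: write the index (vectors, nodes, metadata) to disk
index.storage_context.persist(persist_dir="./storage")

# In a later session: reload the index without re-reading or re-embedding ./data
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)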
