Introduction

In the evolving landscape of data management and artificial intelligence, vector databases have emerged as a pivotal technology for similarity search and semantic understanding. Azure provides powerful tools to implement vector search, notably vCore-based Azure Cosmos DB for MongoDB and Azure OpenAI’s GPT-3.5 model. This guide walks through building an AI copilot by integrating these technologies, covering practical implementation, best practices, and a real-world scenario.

Understanding Vector Databases and Their Role in Azure

Traditional databases excel at exact-match queries but struggle with semantic and similarity-based searches. Vector databases store data as high-dimensional vectors and enable efficient similarity searches by comparing these vectors. In Azure, vCore-based Cosmos DB for MongoDB supports native vector search capabilities, allowing developers to index and query vector embeddings directly.
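At the heart of vector search is a distance metric. A minimal, dependency-free sketch of cosine similarity, the metric used later in this guide, illustrates what the index computes at scale:

```python
import math

def cosine_similarity(a, b):
    """Score two embeddings by the angle between them, ignoring magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing in the same direction score ~1.0; orthogonal vectors score 0.0.
```

A vector index avoids computing this score against every stored vector by clustering vectors and searching only the most promising clusters.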

Integrating these vector databases with AI models like GPT-3.5 enhances the search experience by providing human-readable, conversational responses based on retrieved data. This approach, known as Retrieval-Augmented Generation (RAG), combines precise data retrieval with powerful generative AI capabilities.

Building an AI Copilot: Architecture Overview

The AI copilot architecture involves several key components:

  1. Data Source: Product information stored in Azure Blob Storage, including fields like categoryName and name.
  2. Vector Embeddings: Generated for key fields using Azure OpenAI embeddings deployment, converting text into vector representations.
  3. vCore-based Cosmos DB for MongoDB: Stores product documents with vector embeddings and supports vector indexes.
  4. Vector Indexing: Enables efficient nearest neighbor searches on the vector fields.
  5. Vector Search API: Aggregation pipeline queries Cosmos DB using vector similarity.
  6. GPT-3.5 Integration: Enhances raw vector search results with detailed, natural language insights.

Practical Walkthrough: Implementing Vector Search in Azure Cosmos DB

Preparing Your Environment

Before diving into code, ensure you have:

  • Visual Studio Code installed.
  • An Azure Subscription with access to:
    • A vCore-based Azure Cosmos DB for MongoDB account.
    • An Azure OpenAI account with deployments for embeddings and chat completions.

You can create required resources manually via the Azure portal or automate the process using the included PowerShell script create-azure-resources.ps1, driven by a .env file containing your environment variables.
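A .env file along these lines drives the script; the variable names below are illustrative, so align them with whatever create-azure-resources.ps1 and the sample code actually read:

```shell
# Illustrative variable names — match them to your scripts.
AZURE_COSMOSDB_CONNECTION_STRING="<your-cosmos-db-connection-string>"
AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
AZURE_OPENAI_API_KEY="<your-api-key>"
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT="<embeddings-deployment-name>"
AZURE_OPENAI_COMPLETIONS_DEPLOYMENT="<chat-deployment-name>"
```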

Generating Document Embeddings

Vector search begins with generating embeddings from document fields. In this example, embeddings are created from the concatenation of the product categoryName and name.

Example: Generating Product Embeddings (Node.js)

const productName = "Category - " + product["categoryName"] + ", Name - " + product["name"];

if (productName) {
    product["productVector"] = await Embeddings.generateEmbeddings(productName, embeddingsDeployment, AzureOpenAIClient);
}
return product;

Example: Generating Product Embeddings (Python)

productName = "Category - " + product["categoryName"] + ", Name - " + product["name"]

if productName:
    product["productVector"] = Embeddings.generateEmbeddings(productName, embeddings_deployment, AzureOpenAIClient)

return product

This method ensures that each product document has a vector representation stored in the productVector field.

Creating Vector Indexes in Cosmos DB

Storing vectors alone isn’t sufficient; to perform efficient similarity searches, you must create vector indexes on the vector fields.

Best Practice: Use IVF (Inverted File) Indexes

IVF indexes balance fast search performance with reasonable resource consumption. In Cosmos DB for MongoDB, you can create IVF indexes with these properties:

  • kind: “vector-ivf”
  • numLists: Number of inverted-list clusters the index partitions vectors into; 1 suits small demo datasets, while larger collections benefit from more lists
  • similarity: Usually cosine similarity ("COS")
  • dimensions: Dimension of the vector embedding (e.g., 1536 for OpenAI embeddings)

Example: Creating a Vector Index (Node.js)

const commandResult = await db.command({
  'createIndexes': collectionName,
  'indexes': [{
    'name': indexName,
    'key': { [vectorColumn]: "cosmosSearch" },
    'cosmosSearchOptions': {
      'kind': 'vector-ivf',
      'numLists': 1,
      'similarity': 'COS',
      'dimensions': 1536
    }
  }]
});

Example: Creating a Vector Index (Python)

db.command({
    'createIndexes': collection_name,
    'indexes': [{
        'name': index_name,
        'key': { vector_column: "cosmosSearch" },
        'cosmosSearchOptions': {
            'kind': 'vector-ivf',
            'numLists': 1,
            'similarity': 'COS',
            'dimensions': 1536
        }
    }]
})

Loading Data and Vectorizing

You can either load data from Azure Blob Storage or local files, generate embeddings for each product, and then save them in Cosmos DB with vector indexes:

  1. Load documents.
  2. Generate embeddings for selected fields.
  3. Create or update vector indexes.
  4. Insert or update documents in Cosmos DB.
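Step 2 above can be sketched as a small helper; `generate_embeddings` stands in for whatever embedding call your project uses, and the names are illustrative:

```python
def vectorize_documents(documents, generate_embeddings):
    """Attach a productVector embedding to each product document."""
    for doc in documents:
        # Same concatenation used earlier when generating product embeddings.
        text = f"Category - {doc['categoryName']}, Name - {doc['name']}"
        doc["productVector"] = generate_embeddings(text)
    return documents
```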

Vector search takes a user query, generates its embedding, and finds the closest vectors in the database.

Example: Vector Search Pipeline (Node.js)

const queryEmbedding = await Embeddings.generateEmbeddings(query, embeddingsDeployment, AzureOpenAIClient);
const pipeline = [
  {
    '$search': {
      "cosmosSearch": {
        "vector": queryEmbedding,
        "path": vectorColumn,
        "k": numResults
      },
      "returnStoredSource": true
    }
  },
  { '$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document': '$$ROOT' } }
];

const results = await collection.aggregate(pipeline).toArray();
return results;
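Example: Vector Search Pipeline (Python)

A pymongo counterpart to the Node.js snippet above, sketched as a helper that builds the same aggregation pipeline (the function name is illustrative):

```python
def build_vector_search_pipeline(query_embedding, vector_column, num_results):
    """Build the $search aggregation stages for a cosmosSearch vector query."""
    return [
        {
            "$search": {
                "cosmosSearch": {
                    "vector": query_embedding,   # embedding of the user query
                    "path": vector_column,       # field holding document vectors
                    "k": num_results,            # number of nearest neighbors
                },
                "returnStoredSource": True,
            }
        },
        # Surface the similarity score alongside the full document.
        {"$project": {"similarityScore": {"$meta": "searchScore"}, "document": "$$ROOT"}},
    ]
```

Usage with a pymongo collection: `results = list(collection.aggregate(build_vector_search_pipeline(query_embedding, "productVector", 5)))`.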

Enhancing Results with GPT-3.5

While vector search returns relevant documents, integrating GPT-3.5 adds conversational understanding and detailed insights.

How It Works

  • The vector search results are passed as context to GPT-3.5.
  • A system prompt frames the AI as an assistant for the bike shop.
  • GPT-3.5 composes a human-friendly response based on the user query and retrieved documents.

Example: Generating GPT-3.5 Completion (Node.js)

const systemPrompt = `
You are an intelligent assistant for the Adventure Works Bike Shop.
You are designed to provide helpful answers to user questions about the store inventory given the information provided below.
- Only answer questions related to the information provided.
- Provide 3 clear suggestions in a list format.
- Write two lines of whitespace between each answer.
- Only provide answers about Adventure Works Bike Shop products.
- If unsure, say "I don't know" or "I'm not sure" and recommend searching.
`;

let messages = [
  {role: "system", content: systemPrompt},
  {role: "user", content: userInput},
];

// `prompt` here is the list of vector search results passed in as context
for (let item of prompt) {
  messages.push({role: "system", content: `${item.document.categoryName} ${item.document.name}`});
}

const response = await AzureOpenAICompletionClient.chat.completions.create({ messages, model: completionDeployment });
return response;
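Example: Generating GPT-3.5 Completion (Python)

A Python sketch of the same message assembly (the helper name is illustrative; `search_results` is the list returned by the vector search):

```python
def build_chat_messages(system_prompt, user_input, search_results):
    """Assemble the chat history: system prompt, user question, then retrieved context."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    # Each retrieved product becomes extra system context for the model.
    for item in search_results:
        doc = item["document"]
        messages.append({"role": "system", "content": f"{doc['categoryName']} {doc['name']}"})
    return messages
```

The assembled list is then passed to the chat completions call, e.g. `AzureOpenAICompletionClient.chat.completions.create(messages=..., model=completion_deployment)`.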

Best Practices and Tips

  • Efficient Embedding Generation: Implement retries and rate limiting when calling Azure OpenAI to avoid throttling.
  • Index Management: Drop existing indexes before recreating to prevent conflicts and stale data.
  • Data Privacy: Avoid sending sensitive information to AI services unless compliant with your organization’s policies.
  • Testing Queries: Test vector search with varying queries to understand embedding quality and tuning needs.
  • Cost Management: Monitor resource usage and clean up Azure resources after experiments to avoid unexpected charges.
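The first tip above can be sketched with a small stdlib backoff helper (the Python install list includes tenacity, which provides the same behavior declaratively; `call_fn` stands in for any Azure OpenAI call that may hit 429 throttling):

```python
import random
import time

def call_with_backoff(call_fn, max_attempts=6, base_delay=1.0, max_delay=20.0):
    """Retry call_fn with jittered exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return call_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(random.uniform(0, delay))  # jitter spreads retries out
```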

Real-World Scenario: A Bike Shop AI Assistant

Imagine a bike retail company wanting an AI assistant to help customers explore their inventory:

  • Customers ask about bike colors, sizes, or accessories.
  • The assistant uses vector search to find relevant products semantically matching the query.
  • GPT-3.5 then summarizes and presents human-friendly suggestions, enhancing customer experience.

This approach makes product discovery intuitive and conversational, greatly improving engagement.

Running the Application

  1. Prepare Environment Variables: Populate your .env file with Cosmos DB and Azure OpenAI credentials.

  2. Install Dependencies:

    • Python example:

      py -m pip install pymongo openai tenacity azure-storage-blob
      
    • Node.js example:

      npm install
      npm install openai
      
  3. Load and Vectorize Data: Run the data loading script to ingest products and create vector indexes.

  4. Interact with the AI Copilot: Choose options to perform vector search or GPT-3.5 enhanced search.

Cleaning Up

Always delete created Azure resources such as resource groups, Cosmos DB accounts, and OpenAI deployments after your experiments to avoid unnecessary costs.

Conclusion

This detailed exploration of vector databases in Azure highlights how vCore-based Cosmos DB for MongoDB and Azure OpenAI GPT-3.5 can be combined to build advanced AI copilots. By generating embeddings, creating vector indexes, performing similarity searches, and enhancing results with generative AI, developers can deliver powerful, natural language data exploration experiences. Following best practices ensures scalable, efficient, and cost-effective implementations.

Leveraging the Retrieval-Augmented Generation (RAG) paradigm, this integration represents a significant step forward in intelligent data retrieval and conversational AI.


Author: Joseph Perez