Retrieval Augmented Generation with Astra and Mongoose

Created on November 12, 2023 at 11:42 am

Retrieval augmented generation, or RAG for short, is a technique that injects additional relevant context into an LLM prompt in order to get better results. For example, if you were to ask ChatGPT to write a tweet about a new blog post describing how to work with vector databases in Node.js, you might get the following output.

🚀 Exciting News for Node.js Developers! 🚀 Just dropped a 🔥NEW BLOG POST🔥 on mastering Vector Databases in Node.js. 📊 Learn how to supercharge your data-driven apps with speed and efficiency. Don't miss it! 🚀 Read the full post here: [Insert Blog Post Link] #NodeJS #VectorDatabases #WebDevelopment #DataDrivenApps 💻🔍

This tweet is reasonable, but a bit generic. How about if we add an example of a relevant tweet and ask ChatGPT to copy the tone and style using the following prompt?

Write a tweet promoting a new blog post about working with vector databases in Node.js. Write using the style and tone of the following tweet: Ready to venture into the world of Generative AI? Learn the ins and outs of infrastructure and select the right LLM for your project with this how-to guide.

Here's the result:

Ready to dive into the world of vector databases in Node.js? Discover the secrets to optimizing your data handling and supercharging your apps! Check out our latest blog post for the ultimate how-to guide.

RAG means using a vector database to find examples relevant to the user's prompt, to help guide the LLM to a better result. Instead of sending one hard-coded example along with the user's prompt, you can search through a massive collection of content, find the pieces most relevant to the user's query, and send those along with the prompt to the LLM.
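At a high level, the flow looks like the sketch below. The function names embed, vectorSearch, and askLLM are placeholders for whatever embedding model, vector database, and LLM you use, not a specific API.

// Minimal RAG flow sketch; embed(), vectorSearch(), and askLLM() are placeholders.
async function answerWithRag(question) {
  // 1. Convert the user's question into a vector
  const queryVector = await embed(question);
  // 2. Find the stored content closest to that vector
  const docs = await vectorSearch(queryVector, 3);
  // 3. Inject that content into the prompt alongside the question
  const prompt = [
    'Answer this question with this context:',
    `Question: ${question}`,
    ...docs.map(doc => `Context: ${doc.content}`)
  ].join('\n');
  // 4. Ask the LLM
  return askLLM(prompt);
}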

In this blog post, I'll show how to use Astra's vector search with Mongoose to build a RAG chatbot that answers JavaScript programming questions using Mastering JS' articles for context.

You can see the full source code for the Mastering JS chatbot here. You can also access a live version of the Mastering JS chatbot here, just be aware that this chatbot is limited to 100 queries per hour to prevent abuse.

Getting Started with Astra and Mongoose

Astra's new vector search capabilities are integrated into Astra's JSON API, which means you can use Mongoose queries to do vector search in Astra. To get started, install Mongoose and stargate-mongoose.

npm install mongoose stargate-mongoose

Next, deploy a new vector database on Astra.

Click into your new Astra vector database, click "Connect", and find "Connect with the JSON API". You should see the following instructions:

Click "Generate Configuration" to get credentials for your Astra vector database. You will get an ASTRA_DB_ID, ASTRA_DB_REGION, ASTRA_DB_KEYSPACE, and ASTRA_DB_APPLICATION_TOKEN. Copy these values into my Astra connection string generation tool to get your Astra connection string. It should look like the following:

https://3418a8ac-OMITTED-us-east-2.apps.astra.datastax.com/api/json/v1/test?applicationToken=AstraCS:OMITTED
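If you would rather assemble the URI yourself, the pattern below is my reading of the example above (where the keyspace is test); double-check it against the output of the connection string tool.

// Sketch of assembling the connection string from the generated config values.
// The URL pattern is inferred from the example URI above.
const {
  ASTRA_DB_ID,
  ASTRA_DB_REGION,
  ASTRA_DB_KEYSPACE,
  ASTRA_DB_APPLICATION_TOKEN
} = process.env;

const ASTRA_URI = `https://${ASTRA_DB_ID}-${ASTRA_DB_REGION}.apps.astra.datastax.com` +
  `/api/json/v1/${ASTRA_DB_KEYSPACE}?applicationToken=${ASTRA_DB_APPLICATION_TOKEN}`;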

With an Astra connection string, you're now ready to connect to Astra and start querying documents. Import Mongoose and stargate-mongoose as follows:

const mongoose = require('mongoose');
mongoose.set('autoCreate', false);
mongoose.set('autoIndex', false);

const { driver } = require('stargate-mongoose');
mongoose.setDriver(driver);

Then, connect to Astra using mongoose.connect(), and create a new collection tests that has vector search enabled as follows.

await mongoose.connect(process.env.ASTRA_URI, { isAstra: true });
await mongoose.connection.dropCollection('tests');
await mongoose.connection.createCollection('tests', {
  vector: { metric: 'cosine', dimension: 2 }
});

Create a new Mongoose model with a $vector property. $vector is a special property containing an array of numbers that Astra uses to store the document's vector for vector search.

const schema = new mongoose.Schema({ $vector: [Number], name: String });
const TestModel = mongoose.model('Test', schema, 'tests');

await TestModel.create({ name: 'Test 1', $vector: [0, 1] });
await TestModel.create({ name: 'Test 2', $vector: [0, -1] });

With the JSON API, vector search is represented as a Mongoose query with a special sort() parameter on $vector. The $vector sort parameter tells Astra which vector to find documents closest to. The following example queries for the documents closest to the vector [0.1, -0.9], followed by the documents closest to [0.1, 0.9]:

let docs = await TestModel.find()
  .sort({ $vector: { $meta: [0.1, -0.9] } });
console.log(docs.map(doc => doc.name));

docs = await TestModel.find()
  .sort({ $vector: { $meta: [0.1, 0.9] } });
console.log(docs.map(doc => doc.name));
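With cosine similarity, [0.1, -0.9] is closer to Test 2's vector [0, -1] than to Test 1's [0, 1], so the first query should log ['Test 2', 'Test 1'] and the second should log ['Test 1', 'Test 2'].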

Importing Mastering JS' Articles

Retrieval augmented generation generally works best when you have a large data set of content that is relevant to the sort of questions your app expects. For example, for JavaScript programming questions, Mastering JS has an easily scrapable collection of approximately 500 articles on common JavaScript programming tasks. To import Mastering JS' articles, you can add Mastering JS as a devDependency in your package.json, along with a couple of other utility packages:

"devDependencies": {
  "masteringjs.io": "https://github.com/mastering-js/masteringjs.io",
  "moment": "2.29.4",
  "nlcst-to-string": "2.x",
  "remark": "13.x"
}

Mastering JS' articles are stored in an array in the masteringjs.io GitHub repo's src/tutorials.js file. The idea is to pull all the tutorials, split them up into "chunks" by headers, and generate an embedding (vector) for each chunk using OpenAI's embeddings API, assuming that your OpenAI key is stored in the OPEN_AI_KEY environment variable:

const axios = require('axios');

// Generate an embedding for the given input string using OpenAI's embeddings API.
function createEmbedding(input) {
  return axios({
    method: 'POST',
    url: 'https://api.openai.com/v1/embeddings',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPEN_AI_KEY}`
    },
    data: { model: 'text-embedding-ada-002', input }
  }).then(res => res.data.data[0].embedding);
}
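For example, you can generate an embedding for an arbitrary string as follows (the sample input here is just an illustration). text-embedding-ada-002 returns a 1536-dimensional vector, which is why the articles collection is created with dimension: 1536 later on.

// Example usage: `embedding` is an array of 1536 numbers for this input.
const embedding = await createEmbedding('How do I use Array.filter() in JavaScript?');
console.log(embedding.length); // 1536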

Once the app has an embedding for a given chunk, it will store the chunk in the Article model:

const mongoose = require('../mongoose');

const articleSchema = new mongoose.Schema({
  $vector: [Number],
  title: String,
  content: String,
  url: String
});

module.exports = mongoose.model('Article', articleSchema, 'articles');

Here's the script that chunks Mastering JS' articles and imports them into Astra. The script first drops and recreates the articles collection to clear out any existing data:

const Article = require('../src/db/article');

await mongoose.connect(process.env.ASTRA_URI, { isAstra: true });
await Article.db.dropCollection('articles');
await Article.createCollection({
  vector: { dimension: 1536, metric: 'cosine' }
});

Next, the script reads all the articles and splits them by any header tags (h1, h2, h3, etc.).

const articles = require('masteringjs.io/src/tutorials');

let i = 0;
for (const { title, raw, url, tags } of articles) {
  if (tags.includes('temporal') || tags.includes('tools')) {
    continue;
  }
  console.log('Importing', title);
  const content = fs.readFileSync(
    `${__dirname}/../node_modules/masteringjs.io${raw.replace(/^\./, '')}`,
    'utf8'
  );
  const ast = remark.parse(content);
  const sections = [{ heading: null, nodes: [] }];
  let currentSection = 0;
  ast.children.forEach(node => {
    if (node.type === 'heading') {
      ++currentSection;
      console.log(nlcstToString(node));
      sections[currentSection] = { heading: nlcstToString(node), nodes: [] };
    }
    sections[currentSection].nodes.push(node);
  });
}

Finally, the script stores each section in the database as an Article document:

console.log(`Importing ${sections.length} sections`);
for (const section of sections) {
  const content = remark.stringify({ type: 'root', children: section.nodes });
  const embedding = await createEmbedding(content);
  const contentTitle = section.heading ? `${title}: ${section.heading}` : title;
  const contentUrl = section.heading ? `${url}#${toKebabCase(section.heading)}` : url;
  await Article.create({
    title: contentTitle,
    url: contentUrl,
    content,
    $vector: embedding
  });
}
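The toKebabCase() helper used to build the section URLs isn't shown above; a minimal sketch of what it might look like (the real script may differ) is:

// Hypothetical helper: turn a section heading into a URL anchor,
// e.g. 'Using Array filter()' -> 'using-array-filter'.
function toKebabCase(str) {
  return str
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, '') // drop punctuation
    .trim()
    .replace(/\s+/g, '-'); // collapse whitespace into single dashes
}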

Retrieval Augmented Generation

The script from the previous section imports approximately 1200 sections of content related to JavaScript programming. Given the Article model, here's an Express API endpoint that uses RAG to answer the given question using the 3 most relevant articles as context.

const Article = require('./src/db/article');
const { Configuration, OpenAIApi } = require('openai');
const assert = require('assert');
const axios = require('axios');

const apiKey = process.env.OPEN_AI_KEY;
assert.ok(apiKey, 'No OPEN_AI_KEY specified');

const configuration = new Configuration({ apiKey });
const openai = new OpenAIApi(configuration);

module.exports = async function chatbot(req, res) {
  const { question } = req.body;
  const embedding = await createEmbedding(question);

  let articles = await Article
    .find()
    .sort({ $vector: { $meta: embedding } })
    .limit(3);

  const prompt = `Answer this question with this context:
Question: ${question}
Context: ${articles[0].content}
Context: ${articles[1].content}
Context: ${articles[2].content}`;

  const response = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0,
    max_tokens: 2000
  });

  res.json({
    content: response.data.choices[0].message.content,
    link: articles[0].url,
    title: articles[0].title,
    sources: articles.map(article => ({ link: article.url, title: article.title }))
  });
};
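The handler above is exported as a plain Express route handler. One way to mount it is sketched below; the ./chatbot file name and /api/chatbot route are assumptions for illustration, not taken from the original project.

const express = require('express');

// Hypothetical path and route: adjust to wherever the handler above lives.
const chatbot = require('./chatbot');

const app = express();
app.use(express.json()); // the handler reads `question` from req.body
app.post('/api/chatbot', chatbot);
app.listen(3000);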

With Astra, vector search is represented as a Mongoose find() query with a special sort() parameter, so you can use limit() to limit the number of documents in the result as shown in the above example. You also get all the benefits of Mongoose queries, including filtering using a subset of MongoDB query operators, automated query casting, middleware, and populate().
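For example, since vector search is an ordinary find() query, you can combine it with a filter and a limit. The filter below is a hypothetical illustration; make sure any operators you rely on are in the subset the JSON API supports.

const embedding = await createEmbedding('How do I make an HTTP request in Node.js?');

// Hypothetical example: combine an equality filter with vector search and a limit.
const relevantSections = await Article
  .find({ title: 'Axios FAQ' })
  .sort({ $vector: { $meta: embedding } })
  .limit(3);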

Moving On

Retrieval augmented generation (RAG) is a powerful tool for improving LLM output. RAG means you can highlight the examples most relevant to the user's prompt, and even provide context from private sources, like internal knowledge bases or your app's source code. The combination of Astra and Mongoose means you can leverage Mongoose's API in RAG applications, including filtering using query operators and using populate() to load related data.

The combination of Astra and Mongoose offers a uniquely excellent developer experience for vector search. I'm currently in the process of moving my production vector search apps from Pinecone over to Astra and Mongoose. So whether you're just learning about vector search or already using it in production, you should try Mongoose with Astra.
