Retrieval Augmented Generation with Astra and Mongoose

By admin
Retrieval augmented generation, or RAG for short, is a technique that injects additional relevant context into an LLM prompt in order to get better results. For example, if you were to ask ChatGPT to write a tweet about a new blog post describing how to work with vector databases in Node.js, you might get the following output.

🚀 Exciting News for Node.js Developers! 🚀 Just dropped a 🔥NEW BLOG POST🔥 on mastering Vector Databases in Node.js. 📊 Learn how to supercharge your data-driven apps with speed and efficiency. Don’t miss it! 🚀 Read the full post here: [Insert Blog Post Link]

#NodeJS #VectorDatabases #WebDevelopment #DataDrivenApps 💻🔍

This tweet is reasonable, but a bit generic. How about if we add an example of a relevant tweet and ask ChatGPT to copy the tone and style using the following prompt?

Write a tweet promoting a new blog post about working with vector databases in Node.js. Write using the style and tone of the following tweet: Ready to venture into the world of Generative AI? Learn the ins and outs of infrastructure and select the right LLM for your project with this how-to guide.

Here’s the result:

Ready to dive into the world of vector databases in Node.js? Discover the secrets to optimizing your data handling and supercharging your apps! Check out our latest blog post for the ultimate how-to guide.

RAG means using a vector database to find examples relevant to the user's prompt, to help guide the LLM to a better result. Instead of sending one hard-coded example along with the user's prompt to the LLM, you can search through a massive collection of content to find the content most relevant to the user's query, and send that content along with the user's prompt to the LLM.
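
At a high level, the flow looks something like the following sketch. The createEmbedding(), vectorSearch(), and callLLM() helpers here are hypothetical placeholders for your embedding API, vector database query, and LLM client; the rest of this post fills them in with real implementations.

// A minimal sketch of the RAG flow; helper names are placeholders, not a real API.
async function answerWithRag(question) {
  const embedding = await createEmbedding(question);  // vectorize the user's question
  const docs = await vectorSearch(embedding, 3);      // find the 3 most similar documents
  const prompt = `Answer this question with this context:
Question: ${question}
${docs.map(doc => `Context: ${doc.content}`).join('\n')}`;
  return callLLM(prompt);                             // send the augmented prompt to the LLM
}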

In this blog post, I'll show how to use Astra's vector search with Mongoose to build a RAG chatbot that answers JavaScript programming questions using Mastering JS' articles for context.

You can see the full source code for the Mastering JS chatbot here. You can also access a live version of the Mastering JS chatbot here; just be aware that the chatbot is limited to 100 queries per hour to prevent abuse.

Getting Started with Astra and Mongoose


Astra's new vector search capabilities are integrated into Astra's JSON API, which means you can use Mongoose queries to do vector search in Astra. To get started, install Mongoose and stargate-mongoose.

npm install mongoose stargate-mongoose

Next, deploy a new vector database on Astra.

Click into your new Astra vector database, click "Connect", and find "Connect with the JSON API". You should see the following instructions:

Click "Generate Configuration" to get credentials for your Astra vector database. You will get an ASTRA_DB_ID, ASTRA_DB_REGION, ASTRA_DB_KEYSPACE, and ASTRA_DB_APPLICATION_TOKEN. Copy these values into my Astra connection string generation tool to get your Astra connection string. It should look like the following:

https://3418a8ac-OMITTED-us-east-2.apps.astra.datastax.com/api/json/v1/test?applicationToken=AstraCS:OMITTED
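
If you'd rather assemble the connection string yourself, the example above suggests the following structure. This is my reading of the URL format, not an official template, so double-check it against the generated configuration:

// Assembling the Astra JSON API connection string from the generated credentials.
// Assumption: the URL follows the same shape as the example above.
const { ASTRA_DB_ID, ASTRA_DB_REGION, ASTRA_DB_KEYSPACE, ASTRA_DB_APPLICATION_TOKEN } = process.env;

const ASTRA_URI = `https://${ASTRA_DB_ID}-${ASTRA_DB_REGION}.apps.astra.datastax.com` +
  `/api/json/v1/${ASTRA_DB_KEYSPACE}?applicationToken=${ASTRA_DB_APPLICATION_TOKEN}`;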

With an Astra connection string, you're now ready to connect to Astra and start querying documents. Import Mongoose and stargate-mongoose as follows:

const mongoose = require('mongoose');

// Disable automatic collection and index creation; collections are created explicitly below.
mongoose.set('autoCreate', false);
mongoose.set('autoIndex', false);

// Use the stargate-mongoose driver so Mongoose talks to Astra's JSON API
// instead of a MongoDB server.
const { driver } = require('stargate-mongoose');
mongoose.setDriver(driver);

Then, connect to Astra using mongoose.connect(), and create a new collection tests that has vector search enabled as follows.

await mongoose.connect(process.env.ASTRA_URI, { isAstra: true });

await mongoose.connection.dropCollection('tests');
await mongoose.connection.createCollection('tests', {
  vector: { metric: 'cosine', dimension: 2 }
});

Create a new Mongoose model with a $vector property. $vector is a special property containing an array of numbers that Astra uses to store the document's vector for vector search.

const schema = new mongoose.Schema({ $vector: [Number], name: String });
const TestModel = mongoose.model('Test', schema, 'tests');

await TestModel.create({ name: 'Test 1', $vector: [0, 1] });
await TestModel.create({ name: 'Test 2', $vector: [0, -1] });

With the JSON API, vector search is represented as a Mongoose query with a special sort() parameter on $vector. The sort parameter on $vector tells Astra which vector to find documents closest to. The following example shows how to query for the documents closest to the vector [0.1, -0.9], followed by the documents closest to [0.1, 0.9]:

let docs = await TestModel.find()
  .sort({ $vector: { $meta: [0.1, -0.9] } });
// 'Test 2' comes first: its vector [0, -1] is closest to [0.1, -0.9]
console.log(docs.map(doc => doc.name));

docs = await TestModel.find()
  .sort({ $vector: { $meta: [0.1, 0.9] } });
// 'Test 1' comes first: its vector [0, 1] is closest to [0.1, 0.9]
console.log(docs.map(doc => doc.name));

Importing Mastering JS' Articles

Retrieval augmented generation generally works best when you have a large data set of content that is relevant to the sort of questions your app expects. For example, for JavaScript programming questions, Mastering JS has an easily scrapable collection of approximately 500 articles on common JavaScript programming tasks. To import Mastering JS' articles, you can add Mastering JS as a devDependency in your package.json, along with a couple of other utility packages:

"devDependencies": {
  "masteringjs.io": "https://github.com/mastering-js/masteringjs.io",
  "moment": "2.29.4",
  "nlcst-to-string": "2.x",
  "remark": "13.x"
},

Mastering JS' articles are stored in an array in the masteringjs.io GitHub repo's src/tutorials.js file. The idea is to pull all the tutorials, split them up into "chunks" by headers, and generate an embedding (vector) for each chunk using OpenAI's embeddings API, assuming that your OpenAI key is stored in the OPEN_AI_KEY environment variable:

const axios = require('axios');

// Call OpenAI's embeddings API and return the embedding (an array of numbers) for `input`.
function createEmbedding(input) {
  return axios({
    method: 'POST',
    url: 'https://api.openai.com/v1/embeddings',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPEN_AI_KEY}`
    },
    data: { model: 'text-embedding-ada-002', input }
  }).then(res => res.data.data[0].embedding);
}
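
For example, calling the helper on a short string should resolve to a plain numeric array. The sample question below is just an illustration; note that text-embedding-ada-002 produces 1536-dimensional embeddings, which is why the articles collection is created with dimension: 1536 later on.

// Hypothetical usage; the question string is only an example.
const embedding = await createEmbedding('How do I use Array#filter() in JavaScript?');
console.log(embedding.length); // 1536 for text-embedding-ada-002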

Once the app has an embedding for a given chunk, it will store the chunk in the Article model:

const mongoose = require('../mongoose');

const articleSchema = new mongoose.Schema({
  $vector: [Number],
  title: String,
  content: String,
  url: String
});

module.exports = mongoose.model('Article', articleSchema, 'articles');

Here's the script that chunks Mastering JS' articles and imports them into Astra. The script first drops and recreates the articles collection to clear out any existing data:

const Article = require('../src/db/article');

await mongoose.connect(process.env.ASTRA_URI, { isAstra: true });
await Article.db.dropCollection('articles');
// 1536 is the dimensionality of text-embedding-ada-002 embeddings.
await Article.createCollection({
  vector: { dimension: 1536, metric: 'cosine' }
});


Next, the script reads all the articles and splits them up by any header tags (h1, h2, h3, etc.).

const articles = require('masteringjs.io/src/tutorials');

let i = 0;
for (const { title, raw, url, tags } of articles) {
  if (tags.includes('temporal') || tags.includes('tools')) {
    continue;
  }
  console.log('Importing', title);
  const content = fs.readFileSync(
    `${__dirname}/../node_modules/masteringjs.io${raw.replace(/^\./, '')}`,
    'utf8'
  );
  const ast = remark.parse(content);

  // Group the markdown AST nodes into sections, starting a new section at every heading.
  const sections = [{ heading: null, nodes: [] }];
  let currentSection = 0;
  ast.children.forEach(node => {
    if (node.type === 'heading') {
      ++currentSection;
      console.log(nlcstToString(node));
      sections[currentSection] = { heading: nlcstToString(node), nodes: [] };
    }
    sections[currentSection].nodes.push(node);
  });
}

Finally, the script stores each section in the database as an Article document:

console.log(`Importing ${sections.length} sections`);

for (const section of sections) {
  const content = remark.stringify({ type: 'root', children: section.nodes });
  const embedding = await createEmbedding(content);
  const contentTitle = section.heading ? `${title}: ${section.heading}` : title;
  const contentUrl = section.heading ? `${url}#${toKebabCase(section.heading)}` : url;
  await Article.create({ title: contentTitle, url: contentUrl, content, $vector: embedding });
}
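
The toKebabCase() helper used above isn't shown in the snippet. A minimal sketch, assuming it only needs to turn a section heading into a URL hash, might look like the following:

// Hypothetical implementation of toKebabCase(): lowercase the heading,
// strip punctuation, and join the remaining words with '-'.
function toKebabCase(str) {
  return str
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, '')
    .trim()
    .split(/\s+/)
    .join('-');
}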

Retrieval Augmented Generation

The script from the previous section imports approximately 1200 sections of content related to JavaScript programming. Given the Article model, here's an Express API endpoint that uses RAG to answer the given question using the 3 most relevant articles as context.

const Article = require('./src/db/article');
const { Configuration, OpenAIApi } = require('openai');
const assert = require('assert');
const axios = require('axios');

const apiKey = process.env.OPEN_AI_KEY;
assert.ok(apiKey, 'No OPEN_AI_KEY specified');
const configuration = new Configuration({ apiKey });
const openai = new OpenAIApi(configuration);

module.exports = async function chatbot(req, res) {
  const { question } = req.body;
  // createEmbedding() is the same OpenAI embeddings helper shown earlier.
  const embedding = await createEmbedding(question);

  // Find the 3 articles whose vectors are closest to the question's embedding.
  let articles = await Article
    .find()
    .sort({ $vector: { $meta: embedding } })
    .limit(3);

  const prompt = `Answer this question with this context:
Question: ${question}
Context: ${articles[0].content}
Context: ${articles[1].content}
Context: ${articles[2].content}`;

  const response = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0,
    max_tokens: 2000
  });

  res.json({
    content: response.data.choices[0].message.content,
    link: articles[0].url,
    title: articles[0].title,
    sources: articles.map(article => ({ link: article.url, title: article.title }))
  });
};
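
The handler is exported as a plain function, so wiring it into an Express app might look something like the sketch below. The './chatbot' path, the '/chatbot' route, and the port are assumptions for illustration, not part of the chatbot's actual source.

// Hypothetical Express setup for the chatbot endpoint.
const express = require('express');

const app = express();
app.use(express.json()); // parse the JSON request body so req.body.question is available

app.post('/chatbot', require('./chatbot'));

app.listen(3000, () => console.log('Chatbot listening on port 3000'));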

With Astra, vector search is represented as a Mongoose find() query with a special sort() parameter. So you can use limit() to limit the number of documents in the result as shown in the above example. You also get all the benefits of Mongoose queries, including filtering using a subset of MongoDB query operators, automated query casting, middleware, and populate().
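
For example, a sketch of combining a conventional filter with vector search might look like the following. This assumes the Article schema also stored each article's tags array from Mastering JS, which the schema above does not; treat it as an illustration of the query shape rather than working code for this app.

// Hypothetical: filter by an equality condition, then rank the matches by vector similarity.
const fundamentals = await Article
  .find({ tags: 'fundamentals' })
  .sort({ $vector: { $meta: embedding } })
  .limit(3);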

Moving On

Retrieval augmented generation (RAG) is a powerful tool for improving LLM output. RAG means you can highlight the examples most relevant to the user's prompt, and even provide context from private sources, like internal knowledge bases or even your app's source code. The combination of Astra and Mongoose means you can leverage Mongoose's API in RAG applications, including filtering using query operators and using populate() to load related data.

The combination of Astra and Mongoose offers a uniquely excellent developer experience for vector search. I'm currently in the process of moving my production vector search apps from Pinecone over to Astra and Mongoose. So whether you're just learning about vector search or already making use of vector search in production, you should try Mongoose with Astra.