Tools: Essential Guide: My first “local” Vector Search: MongoDB community edition
This article was written by Darshan Jayarama.

Ever since I received an email about the vector search auto-embedding feature released in MongoDB Community Edition, my palms have been itching to test it out. And now I'm happily writing this blog after witnessing its power firsthand.

Vector Search is one of the most admired and powerful MongoDB products. I felt pity for MongoDB Community and on-prem customers, who were unable to use it because Vector Search and Atlas Search were only available in MongoDB Atlas. But this is no longer the case!

I am sure by now you might be thinking, “Enough of this product endorsement, show me the code!” I’m getting there :).

How did I test this?

Initially, I followed what the MongoDB documentation suggests, but I struggled a lot because the instructions for setting up the Docker containers do not line up with each other:

- When you are asked to start the MongoDB Search container, the yml file there sets the syncSource to mongot-community.search-community:27017. It should point to the Community server container instead, which is mongod.search-community:27017.
- Next, there is a mongot username and password entry that must be made in the yml file. While you are creating the user, the instructions name the mongot user mongot, but the yml file refers to it as mongotUser. If you created the user under another name, correct this section.

After these corrections, you can see the 2 containers running happily (I literally jumped out of my chair with joy).

Next, connect to mongod using mongosh to test the power. I created an index on the plot field of my favorite movies collection. You can use voyage-4-lite, but as per the Voyage usage stats, voyage-4 and voyage-4-lite have the same limits of 10,000 TPM (tokens per minute) and 3 RPM (requests per minute). My preferred choice is voyage-4.

Then I wanted to see results based on context, so I ran a query that should retrieve movies matching “bullied boy learns karate” (here we expect ‘karate-kid’).

The answers amused me: even though the expected result was not there, all the results related to karate, boxing, or being bullied.

Below are the commands I used to complete the setup, in this order:

- Pull the Docker images.
- Create an internal Docker network so the containers can communicate.
- Start the Community server container.
- Load the sample movies collection data. I downloaded movies.json from https://github.com/neelabalan/mongodb-sample-dataset/blob/main/sample_mflix/movies.json
- Prepare the search process.
- Create the mongot config file.
- Go to https://dashboard.voyageai.com/organization/api-keys for an API key and copy the secret. This one is the query key.
- Repeat one more time for the index API key: go to https://dashboard.voyageai.com/organization/api-keys for an API key and copy the secret.
- Start MongoDB Search.
- Check the Docker status.
- Connect with mongosh, create the index, and run the test query.

I know we could make one single Docker Compose file to make this easier, but I prefer this way because it makes debugging easier and keeps each step clear.

Congratulations! You have successfully created your first local vector search.

PS: To be safe, delete the API keys when no longer needed, just like I did :).
// Vector index on the plot field, auto-embedded with voyage-4
db.movies.createSearchIndex("vector_index", "vectorSearch", {
  "fields": [
    { "type": "autoEmbed", "modality": "text", "path": "plot", "model": "voyage-4" },
    { "type": "filter", "path": "year" }
  ]
})
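Because creating this index means every plot has to be embedded through the Voyage API, the rate limits mentioned earlier (10000 TPM, 3 RPM) put a floor on how long the first indexing pass can take. The sketch below is only back-of-the-envelope arithmetic. The collection size, the average tokens per plot, and the batch size are made-up assumptions, and how mongot actually batches its embedding requests is not something I verified.

```javascript
// Rough lower bound on the time needed to embed a collection under rate limits.
// Every input is an illustrative assumption, not a measured value.
function estimateIndexingMinutes({ docCount, avgTokensPerDoc, docsPerRequest, tokensPerMinute, requestsPerMinute }) {
  const totalTokens = docCount * avgTokensPerDoc;
  const totalRequests = Math.ceil(docCount / docsPerRequest);
  const minutesByTokens = totalTokens / tokensPerMinute;
  const minutesByRequests = totalRequests / requestsPerMinute;
  // The slower of the two limits sets the pace.
  return Math.ceil(Math.max(minutesByTokens, minutesByRequests));
}

const estimate = estimateIndexingMinutes({
  docCount: 20000,        // hypothetical collection size
  avgTokensPerDoc: 50,    // hypothetical average plot length in tokens
  docsPerRequest: 100,    // hypothetical batch size per embedding request
  tokensPerMinute: 10000, // TPM limit quoted in the post
  requestsPerMinute: 3,   // RPM limit quoted in the post
});
console.log(`at least ${estimate} minutes`);
```

With these made-up numbers the token limit dominates, so a query run shortly after creating the index may only see part of the collection.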
// Test query: search the plot field by meaning
db.movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot",
      "query": { "text": "bullied boy learns karate" },
      "model": "voyage-4-lite",
      "numCandidates": 10000,
      "limit": 10
    }
  },
  {
    "$project": {
      "_id": 0, "title": 1, "year": 1, "plot": 1,
      "score": { $meta: "vectorSearchScore" }
    }
  }
])
[
  {
    plot: 'A love-struck weakling must pretend to be boxer in order to gain respect from the family of the girl he loves.',
    title: 'Battling Butler',
    year: 1926,
    score: 0.6800339221954346
  },
  {
    plot: 'Two young brothers become the leaders of a gang of kids in their neighborhood. Their father is an office clerk who tries for advancement by playing up his boss. When the boys visit the boss...',
    title: 'I Was Born, But...',
    year: 1932,
    score: 0.6692402362823486
  },
  {
    plot: 'A living puppet, with the help of a cricket as his conscience, must prove himself worthy to become a real boy.',
    title: 'Pinocchio',
    year: 1940,
    score: 0.6628834009170532
  },
  {
    plot: 'An idealistic adolescent, suffering under the thumb of a sadistic schoolmaster, falls in love with a loose girl who is bullied and tormented by another lover.',
    title: 'Torment',
    year: 1944,
    score: 0.6620646715164185
  },
  {
    plot: `Against all odds Father Flanagan starts "Boys' Town" after hearing a convict's story. Whitey Marsh comes there. He runs away but, hungry, returns. He runs away again but, when friend Pee ...`,
    title: 'Boys Town',
    year: 1938,
    score: 0.6570459604263306
  },
  {
    plot: 'When three thuggish men are responsible for the death of his father and the crippling of his brother, young David must choose between supporting his family or risking his life and exacting vengeance.',
    title: "Tol'able David",
    year: 1921,
    score: 0.6559526920318604
  },
  {
    plot: "Fight promoter Nick Donati grooms a bellhop as a future champ, but has second thoughts when the 'kid' falls for his sister.",
    title: 'Kid Galahad',
    year: 1937,
    score: 0.6525536179542542
  },
  {
    plot: 'In a repressive boarding school with rigid rules of behavior, four boys decide to rebel against the direction on a celebration day.',
    title: 'Zero for Conduct',
    year: 1933,
    score: 0.651709794998169
  },
  {
    plot: 'While at a ski lodge, Larry Blake sees instructor Karin Borg and decides to sign up for private lessons. The next thing he knows, she is Mrs. Blake. When he announces that he is going back ...',
    title: 'Two-Faced Woman',
    year: 1941,
    score: 0.6513179540634155
  },
  {
    plot: 'To reconcile with his girlfriend, a bookish college student tries to become an athlete.',
    title: 'College',
    year: 1927,
    score: 0.6459328532218933
  }
]
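The score on each result is a similarity score between the embedded query text and the embedded plot. To make that idea concrete, here is a toy sketch of ranking by cosine similarity. It is not how mongot works internally, and the three-dimensional vectors and titles are invented for illustration; real embedding models return far larger vectors.

```javascript
// Toy nearest-neighbour ranking by cosine similarity (illustration only).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query vector and keep the k best.
function topK(queryVector, docs, k) {
  return docs
    .map((doc) => ({ title: doc.title, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical "embeddings" for three made-up plots.
const docs = [
  { title: "boxing story", vector: [0.9, 0.1, 0.0] },
  { title: "ski romance", vector: [0.0, 0.2, 0.9] },
  { title: "school bully drama", vector: [0.7, 0.6, 0.1] },
];
const ranked = topK([0.8, 0.5, 0.0], docs, 2);
console.log(ranked.map((r) => r.title));
```

A ranking like this always returns the closest vectors it has, whether or not the movie you had in mind is among them, which matches what the results above look like.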
docker pull mongodb/mongodb-community-search:latest
docker pull mongodb/mongodb-community-server:latest
docker network create search-community
# Write the mongod config, create the data directory and keyfile, then start the server
echo '
net:
  port: 27017
  bindIpAll: true # Equivalent to --bind_ip_all
replication:
  replSetName: rs0
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
setParameter:
  searchIndexManagementHostAndPort: mongot-community.search-community:27028
  mongotHost: mongot-community.search-community:27028
  skipAuthenticationToSearchIndexManagementServer: false
  useGrpcForSearch: true
# Security configuration
security:
  authorization: enabled # Equivalent to --auth
  keyFile: /keyfile' > mongod.conf

# -p creates ./data as well if it does not exist yet
mkdir -p ./data/db
openssl rand -base64 756 > keyfile
chmod 400 keyfile

docker run -d --rm --name mongod \
  -e MONGODB_INITDB_ROOT_USERNAME=root \
  -e MONGODB_INITDB_ROOT_PASSWORD=rootpass \
  -v ./mongod.conf:/etc/mongod.conf:ro \
  -v ./data/db:/data/db \
  -v ./keyfile:/keyfile \
  -p 27017:27017 \
  --network search-community \
  mongodb/mongodb-community-server:latest \
  --config /etc/mongod.conf
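# Initiate the replica set, then create the user that mongot connects as.
# sleep() in mongosh takes milliseconds, so 10000 waits 10 seconds.
mongosh -u root -p rootpass --eval 'rs.initiate(); sleep(10000); rs.status()'
mongosh -u root -p rootpass --eval "db.getSiblingDB('admin').createUser({
  user: 'mongot',
  pwd: 'mongotPassword',
  roles: ['searchCoordinator']
})"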
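The fixed sleep after rs.initiate() is a guess at how long the election takes. A more patient alternative is to poll until the node reports itself as primary. The sketch below is one hedged way to do that. The helper name waitForPrimary and its parameters are my own, and the hello function and the pause function are passed in so the sketch also runs outside mongosh; inside mongosh I would expect to pass () => db.hello() and sleep, but I have not tested that against this setup.

```javascript
// Poll until the node reports itself as writable primary, instead of a fixed sleep.
// getHello and pause are injected; both names are hypothetical helpers for this sketch.
function waitForPrimary(getHello, pause, maxAttempts = 30, delayMs = 1000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = getHello();
    if (reply && reply.isWritablePrimary) {
      // "me" is the host:port the node knows itself by
      return { attempts: attempt, me: reply.me };
    }
    pause(delayMs);
  }
  throw new Error(`no primary after ${maxAttempts} attempts`);
}

// Stand-in for a server that becomes primary on the third call.
let calls = 0;
const fakeHello = () => ({ isWritablePrimary: ++calls >= 3, me: "mongod.search-community:27017" });
const result = waitForPrimary(fakeHello, () => {}, 10, 0);
console.log(result);
```

The returned me value is the same hostname that the mongot config later needs as its syncSource.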
mongoimport -d sample_mflix -c movies /Users/darshanj/Downloads/movies.json -u root -p rootpass --authenticationDatabase admin
mkdir mongot_data
# Read the mongod hostname, then write the mongot config and password file.
# --quiet keeps startup messages out of the captured value.
hostname=$(mongosh -u root -p rootpass --quiet --eval "db.isMaster().me")
cat << EOF > mongot.config
syncSource:
  replicaSet:
    hostAndPort: "$hostname"
    username: "mongot"
    passwordFile: "/passwordFile"
    authSource: "admin"
    tls: false
    readPreference: primaryPreferred
storage:
  dataPath: "/data/mongot"
server:
  grpc:
    address: "mongot-community.search-community:27028"
    tls:
      mode: "disabled"
metrics:
  enabled: true
  address: "mongot-community.search-community:9946"
healthCheck:
  address: "mongot-community.search-community:8080"
logging:
  verbosity: INFO
embedding:
  queryKeyFile: /etc/mongot/voyage-api-query-key
  indexingKeyFile: /etc/mongot/voyage-api-indexing-key
  providerEndpoint: https://api.voyageai.com/v1/embeddings
  isAutoEmbeddingViewWriter: true
EOF
echo -n "mongotPassword" > passwordFile
chmod 400 passwordFile
printf "<your-voyage-api-query-key>" > voyage-api-query-key
printf "<your-voyage-api-index-key>" > voyage-api-indexing-key
docker run -d --rm --name mongot-community \
  -v ./mongot_data:/data/mongot \
  -v ./mongot.config:/mongot-community/config.default.yml \
  -v ./passwordFile:/passwordFile:ro \
  -v ./voyage-api-indexing-key:/etc/mongot/voyage-api-indexing-key:ro \
  -v ./voyage-api-query-key:/etc/mongot/voyage-api-query-key:ro \
  --network search-community \
  -p 8080:8080 -p 9946:9946 \
  mongodb/mongodb-community-search:latest \
  --internalListAllIndexesForTesting=true
mongosh -u root -p rootpass

use sample_mflix

db.movies.createSearchIndex("vector_index", "vectorSearch", {
  "fields": [
    { "type": "autoEmbed", "modality": "text", "path": "plot", "model": "voyage-4" },
    { "type": "filter", "path": "year" }
  ]
})

db.movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot",
      "query": { "text": "bullied boy learns karate" },
      "model": "voyage-4-lite",
      "numCandidates": 10000,
      "limit": 10
    }
  },
  {
    "$project": {
      "_id": 0, "title": 1, "year": 1, "plot": 1,
      "score": { $meta: "vectorSearchScore" }
    }
  }
])
The mismatches I ran into while following the documentation, in detail:

- As per the Installation documentation, select Docker as your operating system.
- When the MongoDB Community server starts, it uses the hostname mongod.search-community:27017. Make a note of this.
- When you are asked to start the MongoDB Search container, the yml file there sets the syncSource to mongot-community.search-community:27017.
  - Don't just copy/paste the yml from there. Make sure you set the correct syncSource, which should be the Community server container name, mongod.search-community:27017. To confirm your container hostname, run db.isMaster().me; it should print the hostname.
- Next, there is a mongot username and password entry that must be made in the yml file. While you are creating the user, the instructions name the mongot user mongot, but the yml file refers to it as mongotUser. If you created the user under another name, correct this section.
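Both mismatches are easy to miss by eye, so a small check before starting the mongot container can help. The sketch below assumes the yml has already been parsed into a plain object (the parsing step is left out), and the function name checkMongotConfig and the shape of the expected argument are made up for this illustration. The sample values are the ones described in the list above.

```javascript
// Compare the syncSource section of a parsed mongot config with what the mongod side actually uses.
function checkMongotConfig(config, expected) {
  const problems = [];
  const replicaSet = (config.syncSource && config.syncSource.replicaSet) || {};
  if (replicaSet.hostAndPort !== expected.mongodHostAndPort) {
    problems.push(`hostAndPort is "${replicaSet.hostAndPort}", expected "${expected.mongodHostAndPort}"`);
  }
  if (replicaSet.username !== expected.mongotUser) {
    problems.push(`username is "${replicaSet.username}", expected "${expected.mongotUser}"`);
  }
  return problems;
}

// What the mongod side looks like in this walkthrough.
const expected = { mongodHostAndPort: "mongod.search-community:27017", mongotUser: "mongot" };

// The values as this post describes them in the copied yml.
const copiedConfig = {
  syncSource: { replicaSet: { hostAndPort: "mongot-community.search-community:27017", username: "mongotUser" } },
};
const problems = checkMongotConfig(copiedConfig, expected);
console.log(problems);
```

An empty array means the two sides agree; anything else names the field to correct before starting the container.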