Tools: PostgreSQL: First Approach to Vector Databases with pgvector and Python

Set PostgreSQL as a Vector Database

Manual Installation

Running a Docker Container

Embedding Models

Transforming Data into Vectors

Gemini

Sentence Transformers

Storing Generated Vectors in Your Database

Conclusion

If you're already familiar with relational databases like PostgreSQL, you're one step closer to getting started with vector databases and building AI applications. In this tutorial you'll learn how to enable vector capabilities on your PostgreSQL instance using pgvector, transform raw text into the required format using Python, and store the results for similarity search.

First of all, what is a vector database? In a vector database, information is stored as vectors, often referred to as embeddings. These contain numerical values that represent the meaning of the data, allowing LLMs to interpret and relate information. Query results are determined by the values within these vectors: the database returns the nearest vectors based on similarity.

Imagine having a text like: "Artificial Intelligence is transforming the way we process data in PostgreSQL." When you convert this value using an embedding model, you get a high-dimensional vector that looks like this (only the first 10 dimensions are shown):

```
[-0.015540238, 0.0014074693, 0.009978753, -0.07941696, -0.027072648, 0.02588584, 0.0045492477, 0.050993927, 0.019187931, 0.0050778543]
```

We'll discuss models and vectors in detail later. For now, let's get PostgreSQL ready.

Set PostgreSQL as a Vector Database

Manual Installation

Suppose your PostgreSQL instance was installed manually or via your Linux distribution's package manager. To install pgvector, follow the instructions provided in the official repository. Before starting, make sure make, clang, and llvm are installed on your system.

- Clone the repository:

```shell
cd /tmp
git clone --branch v0.8.2 https://github.com/pgvector/pgvector.git
```

- Compile the source code:

```shell
cd pgvector
make
```

- Install the extension:

```shell
sudo make install
```

Once the extension is installed, enable it within your PostgreSQL instance.

- Log in to your instance:

```shell
sudo -u postgres psql
```

- Verify the extension was installed correctly. You'll get the following output:

```
List of installed extensions
  Name   | Version | Default version |   Schema   |                     Description
---------+---------+-----------------+------------+------------------------------------------------------
 plpgsql | 1.0     | 1.0             | pg_catalog | PL/pgSQL procedural language
 vector  | 0.8.2   | 0.8.2           | public     | vector data type and ivfflat and hnsw access methods
```

- Set up a database to store movie synopses:

```sql
CREATE DATABASE movies;
```

- Select the database: `\c movies`
- Enable the extension:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Running a Docker Container

If you prefer to deploy your PostgreSQL instance using Docker, just run the following command, replacing password with your desired password:

```shell
docker run -d --name postgres -e POSTGRES_PASSWORD=password -p 5432:5432 pgvector/pgvector:pg17
```

The image used for creating the container is the official pgvector Docker image, which also includes the PostgreSQL server.

- Log in to the PostgreSQL instance:

```shell
docker exec -it postgres psql -U postgres
```

- Set up a database to store movie synopses:

```sql
CREATE DATABASE movies;
```

- Select the database: `\c movies`
- Enable the extension:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Embedding Models

An embedding model is a machine learning model used to transform data sources into vectors. The model determines not only how embeddings are created but also the dimensionality of the vector. While these are often provided by LLM vendors, they can also be specialized models designed solely for embedding tasks. For apps built with Python, you can use Gemini or OpenAI models through their official SDKs:

- OpenAI Python library → https://github.com/openai/openai-python
- Google GenAI Python SDK → https://github.com/googleapis/python-genai

Another option is LangChain, a framework for building agents and LLM-powered applications, which can facilitate requests to these models. LLMs can also be executed locally without sending data to the cloud. This is possible using open source tools like Ollama, where models are downloaded from its own library. A similar and highly popular library is Sentence Transformers. It provides access to the Hugging Face catalog, allowing you to download models and run them locally, typically via PyTorch, without the need for an API key.

Here's a comparison of widely used embedding models, including name, provider, and dimensionality. Dimensionality refers to the size of the vector created by the model.

 Model                  | Provider              | Dimensionality
------------------------+-----------------------+----------------
 gemini-embedding-001   | Google                | 3072
 text-embedding-3-small | OpenAI                | 1536
 all-MiniLM-L6-v2       | Sentence Transformers | 384

Transforming Data into Vectors

Gemini

If you're planning to use the model provided by Google, create an API key first:

- Go to the API Keys page in Google AI Studio
- Click on Create API key
- Assign a name to the key
- Choose a project or create a new one
- Click on Create key
- Click on the created API key to see the details
- Copy the API key

Now set the environment variable for the API key by running the following command, replacing YOUR_API_KEY with the value of the key you previously created:

```shell
export GEMINI_API_KEY='YOUR_API_KEY'
```

Install the Python library:

```shell
pip install google-genai
```

Suppose you want to transform the text "A futuristic journey through the stars." into a vector. Here's how you can use the gemini-embedding-001 model from a Python script:

```python
from google import genai

client = genai.Client()

text = "A futuristic journey through the stars."
model_name = "gemini-embedding-001"

response = client.models.embed_content(
    model=model_name,
    contents=text,
)

vector = response.embeddings[0].values
print(f"Dimension: {len(vector)}")
print(f"Vector preview: {vector[:5]}...")
```

The above script executes these tasks:

- Initialize the client
- Define the text to transform and the model to be used
- Generate the embedding
- Extract the vector, then print its values and size

Run the script:

```shell
python get_embeddings.py
```

You'll get this output (only the first 5 dimensions are shown):

```
Dimension: 3072
Vector preview: [-0.003496697, 0.004707519, 0.02058491, -0.0735851, 0.0041175582]...
```

Sentence Transformers

You can run models locally via Sentence Transformers without the need for an API key.

Install the Python library:

```shell
pip install sentence-transformers
```

Choose a model from the Hugging Face catalog. Here's a Python script that uses all-MiniLM-L6-v2, whose dimensionality is 384:

```python
from sentence_transformers import SentenceTransformer

movies = [
    "A futuristic journey through the stars and the mysteries of the universe.",
    "A lighthearted story about two strangers falling in love in New York City.",
    "A gritty detective story set in a neon-lit city ruled by artificial intelligence."
]

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(movies)

for i, embedding in enumerate(embeddings):
    print(f"Movie {i+1} dimension: {len(embedding)}")
    print(f"Vector preview: {embedding[:3]}...")
```

The previous script executes the following tasks:

- Define a list of texts to transform
- Load the model
- Generate the embeddings
- Extract the vectors, then print their values and size

Run the script:

```shell
python get_embeddings.py
```

You'll get the following output:

```
Movie 1 dimension: 384
Vector preview: [-0.04579255  0.01413548 -0.01935582]...
Movie 2 dimension: 384
Vector preview: [ 0.02027559 -0.03948853  0.06786963]...
Movie 3 dimension: 384
Vector preview: [-0.01461564  0.01758054  0.00982607]...
```

Storing Generated Vectors in Your Database

Create the synopses table in the movies database:

```sql
\c movies
CREATE TABLE synopses (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(3072)
);
```

Note: the dimensionality defined for your VECTOR column must exactly match the output size of the embedding model you are using. For example, if you set VECTOR(3072) for a Gemini model, PostgreSQL will reject any attempt to insert embeddings from a model like all-MiniLM-L6-v2, which outputs 384 dimensions. If you decide to switch models in the future, you will need to either recreate the table or alter the column definition.

Install the required Python libraries:

```shell
pip install psycopg2 pgvector
```

Here's the Gemini script, updated to transform the list of movie synopses and then insert the vectors into the database. Don't forget to set the environment variable for the API key, and replace the value of password in the connection string:

```python
import psycopg2
from google import genai
from pgvector.psycopg2 import register_vector

movies = [
    "A futuristic journey through the stars and the mysteries of the universe.",
    "A lighthearted story about two strangers falling in love in New York City.",
    "A gritty detective story set in a neon-lit city ruled by artificial intelligence."
]

client = genai.Client()
response = client.models.embed_content(
    model="gemini-embedding-001",
    contents=movies,
)

conn = psycopg2.connect("dbname=movies user=postgres password=secret")
register_vector(conn)
cur = conn.cursor()

for i, text in enumerate(movies):
    embedding = response.embeddings[i].values
    cur.execute("INSERT INTO synopses (content, embedding) VALUES (%s, %s)", (text, embedding))

conn.commit()
print("Vectors from Gemini successfully stored!")
cur.close()
conn.close()
```

The previous script executes the following tasks:

- Define a list of texts to transform
- Initialize the client
- Generate the embeddings
- Establish a connection to the database
- Insert each synopsis with its respective vector
- Close the connection

Run the script:

```shell
python vectors-to-postgresql.py
```

Conclusion

You've got your PostgreSQL instance up and running as a vector database. You've learned how to enable pgvector, how to transform raw text into embeddings using both cloud-based tools like Gemini and local models like Sentence Transformers, and how to store those vectors so your database can finally manage more than just plain text. You've built the foundation for an AI-powered app, and now you're all set to start experimenting with the data you've stored!
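With the vectors stored, the natural next step is querying them by similarity. In pgvector, the <=> operator returns the cosine distance (1 minus cosine similarity) between two vectors, so a nearest-neighbour query orders rows by distance to a query embedding. The sketch below reproduces that ranking in plain Python so you can see exactly what the database computes; the SQL in the comment uses the synopses table from this tutorial, while the 3-dimensional toy vectors and labels are made up for illustration (real embeddings would be 3072-dimensional):

```python
import math

# The equivalent nearest-neighbour query in PostgreSQL with pgvector:
#
#   SELECT content, embedding <=> %s AS distance
#   FROM synopses
#   ORDER BY distance
#   LIMIT 1;
#
# where %s is the embedding of the search text, generated with the
# same model used at insert time.

def cosine_distance(a, b):
    # pgvector's <=> operator: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy stand-ins for the stored synopsis embeddings
synopses = {
    "space adventure": [0.9, 0.1, 0.0],
    "romantic comedy": [0.1, 0.9, 0.1],
    "noir detective":  [0.2, 0.1, 0.9],
}

# Toy embedding of a search text like "a voyage beyond the solar system"
query = [0.8, 0.2, 0.1]

# Rank synopses by distance to the query, nearest first
ranked = sorted(synopses, key=lambda k: cosine_distance(query, synopses[k]))
print(ranked[0])  # → space adventure
```

pgvector also offers <-> for Euclidean (L2) distance and <#> for negative inner product, and the ivfflat and hnsw access methods listed in the extension description can index these operators to speed up searches over large tables.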