Enterprise-Grade RAG Platform: Orchestrating Amazon Bedrock Agents via Red Hat OpenShift AI
## Table of Contents

- Overview
- Architecture
- Prerequisites
- Phase 1: ROSA Cluster Setup
- Phase 2: Red Hat OpenShift AI Installation
- Phase 3: Amazon Bedrock Integration via PrivateLink
- Phase 4: AWS Glue Data Pipeline
- Phase 5: Milvus Vector Database Deployment
- Phase 6: RAG Application Deployment
- Testing and Validation
- Resource Cleanup

## Overview

This platform provides an enterprise-grade Retrieval-Augmented Generation (RAG) solution that addresses the primary concern of enterprises adopting generative AI: data privacy and security. By leveraging Red Hat OpenShift Service on AWS (ROSA) to control the data plane while using Amazon Bedrock for AI capabilities, organizations maintain complete control over their sensitive data while accessing state-of-the-art language models.

## Architecture

### High-Level Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│ AWS Cloud │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ ROSA Cluster (VPC) │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Red Hat OpenShift AI │ │ │
│ │ │ ┌────────────────┐ ┌──────────────────────┐ │ │ │
│ │ │ │ Model Serving │ │ RAG Application │ │ │ │
│ │ │ │ Gateway │◄─────┤ (FastAPI/Flask) │ │ │ │
│ │ │ └────────┬───────┘ └──────────┬───────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ └───────────┼─────────────────────────┼───────────────┘ │ │
│ │ │ │ │ │
│ │ │ ┌───────────────▼──────────────┐ │ │
│ │ │ │ Milvus Vector Database │ │ │
│ │ │ │ (Embeddings & Metadata) │ │ │
│ │ │ └──────────────────────────────┘ │ │
│ └──────────────┼──────────────────────────────────────────┘ │
│ │ │
│ │ AWS PrivateLink (Private Connectivity) │
│ │ │
│ ┌──────────────▼──────────────┐ ┌──────────────────────┐ │
│ │ Amazon Bedrock │ │ AWS Glue │ │
│ │ (Claude 3.5 Sonnet) │ │ ┌────────────────┐ │ │
│ │ - Text Generation │ │ │ Glue Crawler │ │ │
│ │ - Embeddings │ │ ├────────────────┤ │ │
│ └─────────────────────────────┘ │ │ ETL Jobs │ │ │
│ │ └────────┬───────┘ │ │
│ └───────────┼──────────┘ │
│ │ │
│ ┌───────────▼──────────┐ │
│ │ Amazon S3 │ │
│ │ (Document Store) │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
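At query time the flow runs right to left through this diagram: the RAG application embeds the user question via Bedrock, retrieves the nearest chunks from Milvus, and sends the question plus retrieved context back through PrivateLink to Claude for generation. Here is a minimal sketch of that loop; the Milvus host, the `document_chunks` collection and its fields, and the Titan embedding model ID are illustrative assumptions, not names fixed by this guide.

```python
# Minimal sketch of one RAG query cycle (names here are assumptions).
import json

import boto3
from pymilvus import Collection, connections

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    # Embed the query with Titan Text Embeddings V2 (1,024-dim by default).
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def answer(question: str) -> str:
    # 1. Retrieve the most similar chunks from Milvus.
    connections.connect(host="milvus.milvus.svc.cluster.local", port="19530")
    hits = Collection("document_chunks").search(
        data=[embed(question)],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=5,
        output_fields=["chunk_text"],
    )
    context = "\n\n".join(h.entity.get("chunk_text") for h in hits[0])

    # 2. Ask Claude, grounding the answer in the retrieved context.
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}",
            }],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```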
## Prerequisites

### Required Accounts and Subscriptions

You need an AWS account with permissions to create the resources below, and a Red Hat account with ROSA enabled.

### Required Tools

Install the following CLI tools on your workstation:
```bash
# AWS CLI (v2)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# ROSA CLI
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
tar -xvf rosa-linux.tar.gz
sudo mv rosa /usr/local/bin/rosa
rosa version

# OpenShift CLI (oc)
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
tar -xvf openshift-client-linux.tar.gz
sudo mv oc kubectl /usr/local/bin/
oc version

# Helm (v3)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
```
### AWS Prerequisites

#### Service Quotas

Verify you have adequate service quotas in your target region:
```bash
# Check EC2 vCPU quota (need at least 100 for production ROSA)
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --region us-east-1

# Check VPC quota
aws service-quotas get-service-quota \
  --service-code vpc \
  --quota-code L-F678F1CE \
  --region us-east-1
```
#### IAM Permissions

Your AWS IAM user/role needs permissions for: EC2 and VPC (cluster infrastructure and the PrivateLink endpoint), IAM (creating roles and policies), S3, AWS Glue, Amazon Bedrock, STS, and Service Quotas.

### Knowledge Prerequisites

You should be familiar with: Kubernetes/OpenShift administration, AWS networking (VPCs, subnets, security groups, PrivateLink), IAM roles and policies, basic Python, and the RAG pattern.

## Phase 1: ROSA Cluster Setup

### Step 1.1: Configure AWS CLI
```bash
# Configure AWS credentials
aws configure

# Verify configuration
aws sts get-caller-identity
```
### Step 1.2: Initialize ROSA
```bash
# Log in to Red Hat
rosa login

# Verify ROSA prerequisites
rosa verify quota
rosa verify permissions

# Initialize ROSA in your AWS account (one-time setup)
rosa init
```
### Step 1.3: Create ROSA Cluster

Create a ROSA cluster with appropriate specifications for the RAG workload:
```bash
# Set environment variables
export CLUSTER_NAME="rag-platform"
export AWS_REGION="us-east-1"
export MULTI_AZ="true"
export MACHINE_TYPE="m5.2xlarge"
export COMPUTE_NODES=3

# Create ROSA cluster (takes ~40 minutes)
rosa create cluster \
  --cluster-name $CLUSTER_NAME \
  --region $AWS_REGION \
  --multi-az \
  --compute-machine-type $MACHINE_TYPE \
  --compute-nodes $COMPUTE_NODES \
  --machine-cidr 10.0.0.0/16 \
  --service-cidr 172.30.0.0/16 \
  --pod-cidr 10.128.0.0/14 \
  --host-prefix 23 \
  --yes
```
Configuration rationale: multi-AZ spreads the control plane and worker pool across availability zones for fault tolerance; three m5.2xlarge workers (8 vCPU, 32 GiB each) leave headroom for the OpenShift AI components, Milvus, and the RAG application; and the machine, service, and pod CIDRs are sized generously while remaining non-overlapping.

### Step 1.4: Monitor Cluster Creation
```bash
# Watch cluster installation progress
rosa logs install --cluster=$CLUSTER_NAME --watch

# Check cluster status
rosa describe cluster --cluster=$CLUSTER_NAME
```
Wait until the cluster state shows ready.

### Step 1.5: Create Admin User
```bash
# Create cluster admin user
rosa create admin --cluster=$CLUSTER_NAME

# Save the login command output - it will look like:
# oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
#   --username cluster-admin \
#   --password <generated-password>
```
### Step 1.6: Connect to Cluster
```bash
# Use the login command from the previous step
oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
  --username cluster-admin \
  --password <your-password>

# Verify cluster access
oc cluster-info
oc get nodes
oc get projects
```
### Step 1.7: Create Project Namespaces
```bash
# Create namespace for RHOAI
oc new-project redhat-ods-applications

# Create namespace for RAG application
oc new-project rag-application

# Create namespace for Milvus
oc new-project milvus
```
## Phase 2: Red Hat OpenShift AI Installation

### Step 2.1: Install OpenShift AI Operator
```bash
# Create operator subscription
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: redhat-ods-operator
  namespace: redhat-ods-operator
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF
```
### Step 2.2: Verify Operator Installation
```bash
# Wait for operator to be ready (takes 3-5 minutes)
oc get csv -n redhat-ods-operator -w

# Verify operator is running
oc get pods -n redhat-ods-operator
```
You should see the rhods-operator pod in Running state.

### Step 2.3: Create DataScienceCluster
```bash
# Create the DataScienceCluster custom resource
cat <<EOF | oc apply -f -
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: SelfSigned
        managementState: Managed
        name: knative-serving
    modelmeshserving:
      managementState: Managed
    ray:
      managementState: Removed
    workbenches:
      managementState: Managed
EOF
```
### Step 2.4: Verify RHOAI Installation
```bash
# Check DataScienceCluster status
oc get datasciencecluster -n redhat-ods-operator

# Verify all RHOAI components are running
oc get pods -n redhat-ods-applications
oc get pods -n redhat-ods-monitoring

# Get RHOAI dashboard URL
oc get route rhods-dashboard -n redhat-ods-applications -o jsonpath='{.spec.host}'
```
Access the dashboard URL in your browser and log in with your OpenShift credentials.

### Step 2.5: Configure Model Serving

Create a serving runtime for Amazon Bedrock integration:
```bash
# Create custom serving runtime for Bedrock
cat <<EOF | oc apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: bedrock-runtime
  namespace: rag-application
  labels:
    opendatahub.io/dashboard: "true"
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  containers:
    - name: kserve-container
      image: quay.io/modh/rest-proxy:latest
      env:
        - name: AWS_REGION
          value: "us-east-1"
        - name: BEDROCK_ENDPOINT_URL
          value: "bedrock-runtime.us-east-1.amazonaws.com"
      ports:
        - containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
        requests:
          cpu: "1"
          memory: 2Gi
  supportedModelFormats:
    - autoSelect: true
      name: bedrock
EOF
```
## Phase 3: Amazon Bedrock Integration via PrivateLink

This phase establishes secure, private connectivity between your ROSA cluster and Amazon Bedrock using AWS PrivateLink.

### Step 3.1: Enable Amazon Bedrock
```bash
# Verify Bedrock is available in your region and list foundation models
aws bedrock list-foundation-models --region us-east-1

# Request access to Claude 3.5 Sonnet (if needed) in the console:
# AWS Console > Bedrock > Model access

# Optionally, enable model invocation logging (replace ACCOUNT_ID and create
# the logging role first):
aws bedrock put-model-invocation-logging-configuration \
  --region us-east-1 \
  --logging-config '{"cloudWatchConfig":{"logGroupName":"/aws/bedrock/modelinvocations","roleArn":"arn:aws:iam::ACCOUNT_ID:role/BedrockLoggingRole"}}'
```
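The same check can be scripted. A small boto3 sketch that lists the Anthropic models visible in the Region, so you can confirm the model ID used throughout this guide before requesting access:

```python
# List Anthropic foundation models visible in the Region.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
target = "anthropic.claude-3-5-sonnet-20241022-v2:0"

for m in bedrock.list_foundation_models(byProvider="Anthropic")["modelSummaries"]:
    marker = "  <-- used by this guide" if m["modelId"] == target else ""
    print(m["modelId"] + marker)
```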
### Step 3.2: Identify ROSA VPC
```bash
# Get the VPC ID of your ROSA cluster
export ROSA_VPC_ID=$(aws ec2 describe-vpcs \
  --filters "Name=tag:Name,Values=*${CLUSTER_NAME}*" \
  --query 'Vpcs[0].VpcId' \
  --output text \
  --region $AWS_REGION)
echo "ROSA VPC ID: $ROSA_VPC_ID"

# Get private subnet IDs
export PRIVATE_SUBNET_IDS=$(aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$ROSA_VPC_ID" "Name=tag:Name,Values=*private*" \
  --query 'Subnets[*].SubnetId' \
  --output text \
  --region $AWS_REGION)
echo "Private Subnets: $PRIVATE_SUBNET_IDS"
```
### Step 3.3: Create VPC Endpoint for Bedrock
```bash
# Create security group for VPC endpoint
export VPC_ENDPOINT_SG=$(aws ec2 create-security-group \
  --group-name bedrock-vpc-endpoint-sg \
  --description "Security group for Bedrock VPC endpoint" \
  --vpc-id $ROSA_VPC_ID \
  --region $AWS_REGION \
  --output text \
  --query 'GroupId')
echo "VPC Endpoint Security Group: $VPC_ENDPOINT_SG"

# Allow HTTPS traffic from ROSA worker nodes
aws ec2 authorize-security-group-ingress \
  --group-id $VPC_ENDPOINT_SG \
  --protocol tcp \
  --port 443 \
  --cidr 10.0.0.0/16 \
  --region $AWS_REGION

# Create VPC endpoint for Bedrock Runtime
export BEDROCK_VPC_ENDPOINT=$(aws ec2 create-vpc-endpoint \
  --vpc-id $ROSA_VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.${AWS_REGION}.bedrock-runtime \
  --subnet-ids $PRIVATE_SUBNET_IDS \
  --security-group-ids $VPC_ENDPOINT_SG \
  --private-dns-enabled \
  --region $AWS_REGION \
  --output text \
  --query 'VpcEndpoint.VpcEndpointId')
echo "Bedrock VPC Endpoint: $BEDROCK_VPC_ENDPOINT"

# Wait for the VPC endpoint to become available (there is no built-in
# "aws ec2 wait" waiter for VPC endpoints, so poll its state)
until [ "$(aws ec2 describe-vpc-endpoints \
  --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT \
  --query 'VpcEndpoints[0].State' \
  --output text \
  --region $AWS_REGION)" = "available" ]; do
  sleep 15
done
echo "VPC Endpoint is now available"
```
### Step 3.4: Create IAM Role for Bedrock Access
```bash
# Create IAM policy for Bedrock access
cat > bedrock-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:${AWS_REGION}::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
      ]
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name BedrockInvokePolicy \
  --policy-document file://bedrock-policy.json

# Create trust policy for ROSA service account
export OIDC_PROVIDER=$(rosa describe cluster -c $CLUSTER_NAME -o json | jq -r .aws.sts.oidc_endpoint_url | sed 's|https://||')

cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:rag-application:bedrock-sa"
        }
      }
    }
  ]
}
EOF

# Create IAM role
export BEDROCK_ROLE_ARN=$(aws iam create-role \
  --role-name rosa-bedrock-access \
  --assume-role-policy-document file://trust-policy.json \
  --query 'Role.Arn' \
  --output text)
echo "Bedrock IAM Role ARN: $BEDROCK_ROLE_ARN"

# Attach policy to role
aws iam attach-role-policy \
  --role-name rosa-bedrock-access \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy
```
### Step 3.5: Create Service Account in OpenShift
```bash
# Create service account with IAM role annotation
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bedrock-sa
  namespace: rag-application
  annotations:
    eks.amazonaws.com/role-arn: $BEDROCK_ROLE_ARN
EOF

# Verify service account
oc get sa bedrock-sa -n rag-application
```
### Step 3.6: Test Bedrock Connectivity
```bash
# Create test pod with AWS CLI
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: bedrock-test
  namespace: rag-application
spec:
  serviceAccountName: bedrock-sa
  containers:
    - name: aws-cli
      image: amazon/aws-cli:latest
      command: ["/bin/sleep", "3600"]
      env:
        - name: AWS_REGION
          value: "$AWS_REGION"
EOF

# Wait for pod to be ready
oc wait --for=condition=ready pod/bedrock-test -n rag-application --timeout=300s

# Test Bedrock API call (AWS CLI v2 needs --cli-binary-format for a raw JSON body)
oc exec -n rag-application bedrock-test -- aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"anthropic_version":"bedrock-2023-05-31","max_tokens":100,"messages":[{"role":"user","content":"Hello, this is a test"}]}' \
  /tmp/response.json

# Check the response
oc exec -n rag-application bedrock-test -- cat /tmp/response.json

# Clean up test pod
oc delete pod bedrock-test -n rag-application
```

If successful, you should see a JSON response from Claude.
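Application pods running under `bedrock-sa` get the same web-identity credentials automatically, assuming the cluster's pod identity webhook injects the token environment variables (which the CLI test above relies on). A hedged sketch of startup checks such a pod could run: confirm the assumed role, then smoke-test the streaming API that the IAM policy also permits:

```python
# Startup checks for a pod running under the bedrock-sa service account.
import json

import boto3

# boto3 resolves the injected AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE,
# so no static keys are needed; this should print the rosa-bedrock-access
# assumed-role ARN.
print(boto3.client("sts").get_caller_identity()["Arn"])

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Smoke-test bedrock:InvokeModelWithResponseStream.
resp = bedrock.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Hello, this is a test"}],
    }),
)
for event in resp["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk["type"] == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)
```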
## Phase 4: AWS Glue Data Pipeline

This phase sets up AWS Glue to process documents from S3 and prepare them for vectorization.

### Step 4.1: Create S3 Bucket for Documents
```bash
# Create S3 bucket (name must be globally unique)
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export BUCKET_NAME="rag-documents-${ACCOUNT_ID}"

aws s3 mb s3://$BUCKET_NAME --region $AWS_REGION

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket $BUCKET_NAME \
  --versioning-configuration Status=Enabled \
  --region $AWS_REGION

# Create folder structure
aws s3api put-object --bucket $BUCKET_NAME --key raw-documents/
aws s3api put-object --bucket $BUCKET_NAME --key processed-documents/
aws s3api put-object --bucket $BUCKET_NAME --key embeddings/

echo "S3 Bucket created: s3://$BUCKET_NAME"
```
### Step 4.2: Create IAM Role for Glue
```bash
# Create trust policy for Glue
cat > glue-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "glue.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create Glue service role
aws iam create-role \
  --role-name AWSGlueServiceRole-RAG \
  --assume-role-policy-document file://glue-trust-policy.json

# Attach AWS managed policy
aws iam attach-role-policy \
  --role-name AWSGlueServiceRole-RAG \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

# Create custom policy for S3 access
cat > glue-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}"]
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name AWSGlueServiceRole-RAG \
  --policy-name S3Access \
  --policy-document file://glue-s3-policy.json
```
### Step 4.3: Create Glue Database
```bash
# Create Glue database
aws glue create-database \
  --database-input '{
    "Name": "rag_documents_db",
    "Description": "Database for RAG document metadata"
  }' \
  --region $AWS_REGION

# Verify database creation
aws glue get-database --name rag_documents_db --region $AWS_REGION
```
### Step 4.4: Create Glue Crawler
```bash
# Create crawler for raw documents
aws glue create-crawler \
  --name rag-document-crawler \
  --role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
  --database-name rag_documents_db \
  --targets '{
    "S3Targets": [
      { "Path": "s3://'$BUCKET_NAME'/raw-documents/" }
    ]
  }' \
  --schema-change-policy '{
    "UpdateBehavior": "UPDATE_IN_DATABASE",
    "DeleteBehavior": "LOG"
  }' \
  --region $AWS_REGION

# Start the crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

echo "Glue crawler created and started"
```
### Step 4.5: Create Glue ETL Job

Create a Python script for document processing:
```bash
# Create ETL script
cat > glue-etl-script.py <<'PYTHON_SCRIPT'
import sys
import boto3
import json
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

# Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'BUCKET_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

bucket_name = args['BUCKET_NAME']
s3_client = boto3.client('s3')

# Read documents from Glue catalog
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="rag_documents_db",
    table_name="raw_documents"
)

# Document processing function
def process_document(record):
    """Process document: chunk text, extract metadata."""
    # Simple chunking strategy (500 chars with 50 char overlap)
    text = record.get('content', '')
    chunk_size = 500
    overlap = 50
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        if chunk:
            chunks.append({
                'document_id': record.get('document_id'),
                'chunk_id': f"{record.get('document_id')}_{i}",
                'chunk_text': chunk,
                'chunk_index': i // (chunk_size - overlap),
                'metadata': {
                    'source': record.get('source', ''),
                    'timestamp': record.get('timestamp', ''),
                    'file_type': record.get('file_type', '')
                }
            })
    return chunks

# Process and write to S3
def process_and_write():
    records = datasource.toDF().collect()
    all_chunks = []
    for record in records:
        chunks = process_document(record.asDict())
        all_chunks.extend(chunks)

    # Write chunks to S3 as JSON
    for chunk in all_chunks:
        key = f"processed-documents/{chunk['chunk_id']}.json"
        s3_client.put_object(
            Bucket=bucket_name,
            Key=key,
            Body=json.dumps(chunk),
            ContentType='application/json'
        )
    print(f"Processed {len(all_chunks)} chunks from {len(records)} documents")

process_and_write()
job.commit()
PYTHON_SCRIPT

# Upload script to S3
aws s3 cp glue-etl-script.py s3://$BUCKET_NAME/glue-scripts/

# Create Glue job
aws glue create-job \
  --name rag-document-processor \
  --role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
  --command '{
    "Name": "glueetl",
    "ScriptLocation": "s3://'$BUCKET_NAME'/glue-scripts/glue-etl-script.py",
    "PythonVersion": "3"
  }' \
  --default-arguments '{
    "--BUCKET_NAME": "'$BUCKET_NAME'",
    "--job-language": "python",
    "--enable-metrics": "true",
    "--enable-continuous-cloudwatch-log": "true"
  }' \
  --glue-version "4.0" \
  --max-retries 0 \
  --timeout 60 \
  --region $AWS_REGION

echo "Glue ETL job created"
```
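The chunking arithmetic is easy to sanity-check locally before paying for a Glue run. This standalone snippet replicates `process_document`'s 500-character window with a 50-character overlap:

```python
# Local sanity check of the ETL chunking logic (500-char chunks, 50 overlap).
def chunk(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    step = chunk_size - overlap  # each chunk starts 450 chars after the last
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]

chunks = chunk("x" * 1200)
print(len(chunks))               # 3 chunks: offsets 0, 450, 900
print([len(c) for c in chunks])  # [500, 500, 300]
```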
### Step 4.6: Test Glue Pipeline
```bash
# Upload sample document
cat > sample-document.txt <<EOF
This is a sample document for testing the RAG pipeline.
It contains multiple sentences that will be chunked and processed.
The Glue ETL job will extract this content and prepare it for vectorization.
This demonstrates the data pipeline from S3 to processed chunks.
EOF

# Upload to S3
aws s3 cp sample-document.txt s3://$BUCKET_NAME/raw-documents/

# Run crawler to detect the new file
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

# Wait for crawler to complete (check status)
aws glue get-crawler --name rag-document-crawler --region $AWS_REGION --query 'Crawler.State'

# Run ETL job
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION

# Check processed outputs
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/
```
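If you prefer to script the wait instead of a fixed `sleep 60`, a small boto3 sketch can poll the crawler and job-run states until they reach a terminal value:

```python
# Poll the crawler and ETL job instead of sleeping a fixed 60 seconds.
import time

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Wait for the crawler to return to READY.
while glue.get_crawler(Name="rag-document-crawler")["Crawler"]["State"] != "READY":
    time.sleep(10)

# Start the ETL job and wait for a terminal state.
run_id = glue.start_job_run(JobName="rag-document-processor")["JobRunId"]
while True:
    state = glue.get_job_run(
        JobName="rag-document-processor", RunId=run_id
    )["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        print("Job finished:", state)
        break
    time.sleep(15)
```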
## Phase 5: Milvus Vector Database Deployment

Deploy Milvus on your ROSA cluster to store and search document embeddings.

### Step 5.1: Install Milvus Operator
# Add Milvus Helm repository
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update

# Install Milvus operator
helm install milvus-operator milvus/milvus-operator \
  --namespace milvus \
  --create-namespace \
  --set operator.image.tag=v0.9.0

# Verify operator installation
oc get pods -n milvus
# Create PersistentVolumeClaims for Milvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp3-csi
EOF
# Create Milvus cluster configuration
cat > milvus-values.yaml <<EOF
cluster:
  enabled: true
service:
  type: ClusterIP
  port: 19530
standalone:
  replicas: 1
  resources:
    limits:
      cpu: "4"
      memory: 8Gi
    requests:
      cpu: "2"
      memory: 4Gi
etcd:
  replicaCount: 1
  persistence:
    enabled: true
    existingClaim: milvus-etcd-pvc
minio:
  mode: standalone
  persistence:
    enabled: true
    existingClaim: milvus-minio-pvc
pulsar:
  enabled: false
kafka:
  enabled: false
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
EOF

# Install Milvus
helm install milvus milvus/milvus \
  --namespace milvus \
  --values milvus-values.yaml \
  --wait

# Verify Milvus installation
oc get pods -n milvus
oc get svc -n milvus
# Get Milvus service endpoint
export MILVUS_HOST=$(oc get svc milvus -n milvus -o jsonpath='{.spec.clusterIP}')
export MILVUS_PORT=19530

echo "Milvus Endpoint: $MILVUS_HOST:$MILVUS_PORT"

# Create config map with Milvus connection details
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: milvus-config
  namespace: rag-application
data:
  MILVUS_HOST: "$MILVUS_HOST"
  MILVUS_PORT: "$MILVUS_PORT"
EOF
# Create test pod with pymilvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: milvus-test
  namespace: rag-application
spec:
  containers:
    - name: python
      image: python:3.11-slim
      command: ["/bin/sleep", "3600"]
      env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
EOF

# Wait for pod
oc wait --for=condition=ready pod/milvus-test -n rag-application --timeout=120s

# Install pymilvus and test connection
oc exec -n rag-application milvus-test -- bash -c "
pip install pymilvus && python3 <<PYTHON
from pymilvus import connections, utility
import os

connections.connect(
    alias='default',
    host=os.environ['MILVUS_HOST'],
    port=os.environ['MILVUS_PORT']
)
print('Connected to Milvus successfully!')
print('Milvus version:', utility.get_server_version())
PYTHON
"

# Clean up test pod
oc delete pod milvus-test -n rag-application
# Create initialization job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: milvus-init
  namespace: rag-application
spec:
  template:
    spec:
      containers:
        - name: init
          image: python:3.11-slim
          env:
            - name: MILVUS_HOST
              valueFrom:
                configMapKeyRef:
                  name: milvus-config
                  key: MILVUS_HOST
            - name: MILVUS_PORT
              valueFrom:
                configMapKeyRef:
                  name: milvus-config
                  key: MILVUS_PORT
          command:
            - /bin/bash
            - -c
            - |
              pip install pymilvus
              python3 <<PYTHON
              from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
              import os

              # Connect to Milvus
              connections.connect(
                  alias='default',
                  host=os.environ['MILVUS_HOST'],
                  port=os.environ['MILVUS_PORT']
              )

              # Define collection schema
              fields = [
                  FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
                  FieldSchema(name='chunk_id', dtype=DataType.VARCHAR, max_length=256),
                  FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=1024),
                  FieldSchema(name='text', dtype=DataType.VARCHAR, max_length=65535),
                  FieldSchema(name='metadata', dtype=DataType.JSON)
              ]
              schema = CollectionSchema(
                  fields=fields,
                  description='RAG document embeddings collection'
              )

              # Create collection
              collection = Collection(
                  name='rag_documents',
                  schema=schema
              )

              # Create index
              index_params = {
                  'metric_type': 'L2',
                  'index_type': 'IVF_FLAT',
                  'params': {'nlist': 128}
              }
              collection.create_index(
                  field_name='embedding',
                  index_params=index_params
              )

              print(f'Collection created: {collection.name}')
              print(f'Number of entities: {collection.num_entities}')
              PYTHON
      restartPolicy: Never
  backoffLimit: 3
EOF

# Check job status
oc logs job/milvus-init -n rag-application
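To confirm the collection accepts data and answers queries, a quick roundtrip test helps before wiring up the full pipeline. This sketch (run anywhere with network access to Milvus, e.g. the test pod from Step 5.5) inserts one dummy vector and searches for it; the random vector, chunk_id value, and nprobe setting are arbitrary test choices:

import os
import random
from pymilvus import connections, Collection

# Assumes MILVUS_HOST/MILVUS_PORT are set, as in the milvus-config ConfigMap
connections.connect(host=os.environ['MILVUS_HOST'], port=os.environ['MILVUS_PORT'])
collection = Collection('rag_documents')

# Insert one dummy entity; columns follow the schema (chunk_id, embedding, text, metadata)
vec = [random.random() for _ in range(1024)]
collection.insert([['smoke-test_0'], [vec], ['smoke test chunk'], [{'source': 'test'}]])
collection.flush()
collection.load()

# The vector we just inserted should come back as the nearest hit
results = collection.search(
    data=[vec],
    anns_field='embedding',
    param={'metric_type': 'L2', 'params': {'nprobe': 10}},
    limit=1,
    output_fields=['chunk_id']
)
print(results[0][0].entity.get('chunk_id'), results[0][0].distance)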
# Create application directory structure
mkdir -p rag-app/{src,config,tests}

# Create requirements.txt
cat > rag-app/requirements.txt <<EOF
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
boto3==1.29.7
langchain==0.0.350
langchain-community==0.0.1
python-dotenv==1.0.0
httpx==0.25.2
EOF

# Create main application
cat > rag-app/src/main.py <<'PYTHON_CODE'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional, Dict, Any
import os
import json
import boto3
from pymilvus import connections, Collection
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Enterprise RAG API",
    description="RAG platform using OpenShift AI, Bedrock, and Milvus",
    version="1.0.0"
)

# Configuration
MILVUS_HOST = os.getenv("MILVUS_HOST", "milvus.milvus.svc.cluster.local")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", "19530"))
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
COLLECTION_NAME = "rag_documents"

# Initialize clients
bedrock_runtime = None
milvus_collection = None

@app.on_event("startup")
async def startup_event():
    """Initialize connections on startup"""
    global bedrock_runtime, milvus_collection
    try:
        # Connect to Milvus
        connections.connect(
            alias="default",
            host=MILVUS_HOST,
            port=MILVUS_PORT
        )
        milvus_collection = Collection(COLLECTION_NAME)
        milvus_collection.load()
        logger.info(f"Connected to Milvus collection: {COLLECTION_NAME}")

        # Initialize Bedrock client
        bedrock_runtime = boto3.client(
            service_name='bedrock-runtime',
            region_name=AWS_REGION
        )
        logger.info("Initialized Bedrock client")
    except Exception as e:
        logger.error(f"Startup error: {str(e)}")
        raise

@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on shutdown"""
    try:
        connections.disconnect("default")
        logger.info("Disconnected from Milvus")
    except Exception as e:
        logger.error(f"Shutdown error: {str(e)}")

# Request/Response models
class QueryRequest(BaseModel):
    query: str
    top_k: Optional[int] = 5
    max_tokens: Optional[int] = 1000

class QueryResponse(BaseModel):
    answer: str
    sources: List[Dict[str, Any]]
    metadata: Dict[str, Any]

class HealthResponse(BaseModel):
    status: str
    milvus_connected: bool
    bedrock_available: bool

# API endpoints
@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint"""
    milvus_ok = False
    bedrock_ok = False
    try:
        if milvus_collection:
            milvus_collection.num_entities
            milvus_ok = True
    except Exception:
        pass
    try:
        if bedrock_runtime:
            bedrock_ok = True
    except Exception:
        pass
    return HealthResponse(
        status="healthy" if (milvus_ok and bedrock_ok) else "degraded",
        milvus_connected=milvus_ok,
        bedrock_available=bedrock_ok
    )

@app.post("/query", response_model=QueryResponse)
async def query_rag(request: QueryRequest):
    """
    Process RAG query:
    1. Generate embedding for query
    2. Search similar documents in Milvus
    3. Construct prompt with context
    4. Call Bedrock for generation
    """
    try:
        # Step 1: Generate query embedding using Bedrock
        query_embedding = await generate_embedding(request.query)

        # Step 2: Search Milvus for similar documents
        search_params = {
            "metric_type": "L2",
            "params": {"nprobe": 10}
        }
        results = milvus_collection.search(
            data=[query_embedding],
            anns_field="embedding",
            param=search_params,
            limit=request.top_k,
            output_fields=["chunk_id", "text", "metadata"]
        )

        # Extract context from search results
        contexts = []
        sources = []
        for hit in results[0]:
            contexts.append(hit.entity.get("text"))
            sources.append({
                "chunk_id": hit.entity.get("chunk_id"),
                "score": float(hit.score),
                "metadata": hit.entity.get("metadata")
            })

        # Step 3: Construct prompt with context
        context_text = "\n\n".join([f"Document {i+1}:\n{ctx}" for i, ctx in enumerate(contexts)])
        prompt = f"""You are a helpful AI assistant. Use the following context to answer the user's question. If the answer cannot be found in the context, say so.

Context:
{context_text}

User Question: {request.query}

Answer:"""

        # Step 4: Call Bedrock for generation
        response = bedrock_runtime.invoke_model(
            modelId=BEDROCK_MODEL_ID,
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": request.max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": prompt
                    }
                ],
                "temperature": 0.7
            })
        )
        response_body = json.loads(response['body'].read())
        answer = response_body['content'][0]['text']

        return QueryResponse(
            answer=answer,
            sources=sources,
            metadata={
                "query": request.query,
                "num_sources": len(sources),
                "model": BEDROCK_MODEL_ID
            }
        )
    except Exception as e:
        logger.error(f"Query error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

async def generate_embedding(text: str) -> List[float]:
    """Generate embedding using Bedrock Titan Embeddings"""
    try:
        response = bedrock_runtime.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "inputText": text,
                "dimensions": 1024,
                "normalize": True
            })
        )
        response_body = json.loads(response['body'].read())
        return response_body['embedding']
    except Exception as e:
        logger.error(f"Embedding generation error: {str(e)}")
        raise

@app.get("/")
async def root():
    """Root endpoint"""
    return {
        "message": "Enterprise RAG API",
        "version": "1.0.0",
        "endpoints": {
            "health": "/health",
            "query": "/query",
            "docs": "/docs"
        }
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
PYTHON_CODE

# Create Dockerfile
cat > rag-app/Dockerfile <<EOF
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
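Under load, Bedrock can throttle invoke_model calls. One robustness improvement you might fold into generate_embedding is exponential backoff on ThrottlingException; the max_attempts count and backoff schedule below are arbitrary choices, not part of the application code above:

import time
import json
import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

def invoke_with_backoff(model_id, body, max_attempts=5):
    """Retry invoke_model on throttling with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return bedrock.invoke_model(
                modelId=model_id,
                contentType='application/json',
                accept='application/json',
                body=body
            )
        except ClientError as e:
            if e.response['Error']['Code'] != 'ThrottlingException' or attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)

resp = invoke_with_backoff(
    'amazon.titan-embed-text-v2:0',
    json.dumps({'inputText': 'hello', 'dimensions': 1024, 'normalize': True})
)
print(len(json.loads(resp['body'].read())['embedding']))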
# Build container image (using podman or docker)
cd rag-app

# Option 1: Build with podman
podman build -t rag-application:v1.0 .

# Option 2: Build with docker
# docker build -t rag-application:v1.0 .

# Tag for OpenShift internal registry
export IMAGE_REGISTRY=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')

# Login to OpenShift registry
podman login -u $(oc whoami) -p $(oc whoami -t) $IMAGE_REGISTRY --tls-verify=false

# Create image stream
oc create imagestream rag-application -n rag-application

# Tag and push
podman tag rag-application:v1.0 $IMAGE_REGISTRY/rag-application/rag-application:v1.0
podman push $IMAGE_REGISTRY/rag-application/rag-application:v1.0 --tls-verify=false

cd ..
# Create deployment
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-application
  namespace: rag-application
  labels:
    app: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-application
  template:
    metadata:
      labels:
        app: rag-application
    spec:
      serviceAccountName: bedrock-sa
      containers:
        - name: app
          image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-application:v1.0
          ports:
            - containerPort: 8000
              protocol: TCP
          env:
            - name: MILVUS_HOST
              valueFrom:
                configMapKeyRef:
                  name: milvus-config
                  key: MILVUS_HOST
            - name: MILVUS_PORT
              valueFrom:
                configMapKeyRef:
                  name: milvus-config
                  key: MILVUS_PORT
            - name: AWS_REGION
              value: "us-east-1"
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: rag-application
  namespace: rag-application
spec:
  selector:
    app: rag-application
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-application
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-application
  port:
    targetPort: 8000
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
EOF
# Check deployment status
oc get deployment rag-application -n rag-application
oc get pods -n rag-application -l app=rag-application

# Get application URL
export RAG_APP_URL=$(oc get route rag-application -n rag-application -o jsonpath='{.spec.host}')
echo "RAG Application URL: https://$RAG_APP_URL"

# Test health endpoint
curl https://$RAG_APP_URL/health

# View application logs
oc logs -f deployment/rag-application -n rag-application
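Right after a rollout, the route may answer before both pods pass their probes. A small poller like this waits until /health actually reports healthy; it assumes RAG_APP_URL is exported as above, and the five-minute deadline and ten-second interval are arbitrary:

import os
import time
import httpx  # already pinned in requirements.txt

url = f"https://{os.environ['RAG_APP_URL']}/health"

deadline = time.time() + 300  # give the rollout up to five minutes
while time.time() < deadline:
    try:
        body = httpx.get(url, timeout=5).json()
        if body.get('status') == 'healthy':
            print('Application is healthy:', body)
            break
    except httpx.HTTPError:
        pass  # pod may not be ready yet; keep polling
    time.sleep(10)
else:
    raise SystemExit('Timed out waiting for /health to report healthy')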
# Upload test documents to S3
cat > test-doc-1.txt <<EOF
Red Hat OpenShift is an enterprise Kubernetes platform that provides
a complete application platform for developing and deploying containerized
applications. It includes integrated CI/CD, monitoring, and developer tools.
EOF

cat > test-doc-2.txt <<EOF
Amazon Bedrock is a fully managed service that offers foundation models
from leading AI companies through a single API. It provides access to
models like Claude, Llama, and Stable Diffusion for various use cases.
EOF

# Upload to S3
aws s3 cp test-doc-1.txt s3://$BUCKET_NAME/raw-documents/
aws s3 cp test-doc-2.txt s3://$BUCKET_NAME/raw-documents/

# Trigger Glue crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

# Wait and run ETL job
sleep 120
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION

# Check processed documents
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/
# Create embedding job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: embed-documents
  namespace: rag-application
spec:
  template:
    spec:
      serviceAccountName: bedrock-sa
      containers:
        - name: embedder
          image: python:3.11-slim
          env:
            - name: MILVUS_HOST
              valueFrom:
                configMapKeyRef:
                  name: milvus-config
                  key: MILVUS_HOST
            - name: MILVUS_PORT
              valueFrom:
                configMapKeyRef:
                  name: milvus-config
                  key: MILVUS_PORT
            - name: AWS_REGION
              value: "us-east-1"
            - name: BUCKET_NAME
              value: "$BUCKET_NAME"
          command:
            - /bin/bash
            - -c
            - |
              pip install pymilvus boto3
              python3 <<PYTHON
              import boto3
              import json
              import os
              from pymilvus import connections, Collection

              # Connect to services
              s3 = boto3.client('s3')
              bedrock = boto3.client('bedrock-runtime', region_name=os.environ['AWS_REGION'])
              connections.connect(
                  host=os.environ['MILVUS_HOST'],
                  port=os.environ['MILVUS_PORT']
              )
              collection = Collection('rag_documents')

              # Get processed documents
              bucket = os.environ['BUCKET_NAME']
              response = s3.list_objects_v2(Bucket=bucket, Prefix='processed-documents/')
              for obj in response.get('Contents', []):
                  if obj['Key'].endswith('.json'):
                      # Read document chunk
                      doc = json.loads(s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read())

                      # Generate embedding
                      embed_response = bedrock.invoke_model(
                          modelId='amazon.titan-embed-text-v2:0',
                          body=json.dumps({
                              'inputText': doc['chunk_text'],
                              'dimensions': 1024,
                              'normalize': True
                          })
                      )
                      embedding = json.loads(embed_response['body'].read())['embedding']

                      # Insert into Milvus
                      collection.insert([
                          [doc['chunk_id']],
                          [embedding],
                          [doc['chunk_text']],
                          [doc['metadata']]
                      ])
                      print(f"Inserted: {doc['chunk_id']}")

              collection.flush()
              print(f"Total entities in collection: {collection.num_entities}")
              PYTHON
      restartPolicy: Never
  backoffLimit: 3
EOF

# Monitor job
oc logs job/embed-documents -n rag-application -f
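The job above round-trips to Milvus once per chunk, which is fine for a handful of test documents but slow at scale. For larger corpora, batching the inserts cuts overhead considerably; here is a sketch of the same insert logic restructured around batches (the batch size of 64 is an arbitrary assumption, and iter over your own chunk source):

from pymilvus import connections, Collection

def insert_in_batches(collection, chunks, batch_size=64):
    """chunks: iterable of (chunk_id, embedding, text, metadata) tuples."""
    buf = []
    for item in chunks:
        buf.append(item)
        if len(buf) >= batch_size:
            _flush(collection, buf)
    _flush(collection, buf)  # final partial batch
    collection.flush()

def _flush(collection, buf):
    if not buf:
        return
    # Transpose row tuples into the column lists Milvus expects
    ids, vecs, texts, metas = map(list, zip(*buf))
    collection.insert([ids, vecs, texts, metas])
    buf.clear()

# Usage sketch (assumes the rag_documents collection from Step 5.6):
# connections.connect(host=..., port=...)
# insert_in_batches(Collection('rag_documents'), chunk_tuples)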
# Test RAG query endpoint
curl -X POST "https://$RAG_APP_URL/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is Red Hat OpenShift?",
    "top_k": 3,
    "max_tokens": 500
  }' | jq .

# Test another query
curl -X POST "https://$RAG_APP_URL/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Tell me about Amazon Bedrock foundation models",
    "top_k": 3,
    "max_tokens": 500
  }' | jq .
# Install Apache Bench for load testing
sudo yum install httpd-tools -y

# Create query payload
cat > query-payload.json <<EOF
{
  "query": "What are the benefits of using OpenShift?",
  "top_k": 5
}
EOF

# Run load test (100 requests, 10 concurrent)
ab -n 100 -c 10 -p query-payload.json \
  -T application/json \
  "https://$RAG_APP_URL/query"
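If ab is unavailable on your workstation (or struggles with the route's TLS setup), a quick asyncio alternative using the httpx dependency already pinned in requirements.txt can drive the same load. The 100 requests / 10 workers mirror the ab run above and are otherwise arbitrary:

import asyncio
import os
import time
import httpx

URL = f"https://{os.environ['RAG_APP_URL']}/query"
PAYLOAD = {"query": "What are the benefits of using OpenShift?", "top_k": 5}

async def worker(client, n):
    """Issue n sequential requests and count the 200s."""
    ok = 0
    for _ in range(n):
        r = await client.post(URL, json=PAYLOAD, timeout=60)
        ok += r.status_code == 200
    return ok

async def main(total=100, concurrency=10):
    start = time.time()
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*[
            worker(client, total // concurrency) for _ in range(concurrency)
        ])
    elapsed = time.time() - start
    print(f"{sum(results)}/{total} OK in {elapsed:.1f}s ({total/elapsed:.1f} req/s)")

asyncio.run(main())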
# Delete RAG application
oc delete deployment rag-application -n rag-application
oc delete service rag-application -n rag-application
oc delete route rag-application -n rag-application

# Delete Milvus
helm uninstall milvus -n milvus
helm uninstall milvus-operator -n milvus
oc delete pvc --all -n milvus

# Delete RHOAI
oc delete datasciencecluster default-dsc -n redhat-ods-operator
oc delete subscription rhods-operator -n redhat-ods-operator

# Delete projects/namespaces
oc delete project rag-application
oc delete project milvus
oc delete project redhat-ods-applications
oc delete project redhat-ods-operator
oc delete project redhat-ods-monitoring
# Delete ROSA cluster (takes ~10-15 minutes)
rosa delete cluster --cluster=$CLUSTER_NAME --yes

# Wait for cluster deletion to complete
rosa logs uninstall --cluster=$CLUSTER_NAME --watch

# Verify cluster is deleted
rosa list clusters
# Delete Glue job
aws glue delete-job --job-name rag-document-processor --region $AWS_REGION

# Delete Glue crawler
aws glue delete-crawler --name rag-document-crawler --region $AWS_REGION

# Delete Glue database
aws glue delete-database --name rag_documents_db --region $AWS_REGION

# Delete Glue IAM role
aws iam delete-role-policy --role-name AWSGlueServiceRole-RAG --policy-name S3Access
aws iam detach-role-policy --role-name AWSGlueServiceRole-RAG --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam delete-role --role-name AWSGlueServiceRole-RAG
# Delete all objects in bucket
aws s3 rm s3://$BUCKET_NAME --recursive --region $AWS_REGION

# Delete bucket
aws s3 rb s3://$BUCKET_NAME --region $AWS_REGION

echo "S3 bucket deleted: $BUCKET_NAME"
# Delete VPC endpoint for Bedrock
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT --region $AWS_REGION

# Delete security group
aws ec2 delete-security-group --group-id $VPC_ENDPOINT_SG --region $AWS_REGION

echo "VPC endpoint and security group deleted"
## Step 6: Delete IAM Resources

```bash
# Detach the policy from the Bedrock role
aws iam detach-role-policy \
  --role-name rosa-bedrock-access \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy

# Delete the Bedrock role
aws iam delete-role --role-name rosa-bedrock-access

# Delete the Bedrock policy
aws iam delete-policy \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy

echo "IAM roles and policies deleted"
```
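`aws iam delete-policy` refuses to delete a policy that still has non-default versions, so if the policy was ever updated, a sketch like this clears them first:

```bash
# Remove non-default policy versions, then delete the policy itself.
POLICY_ARN="arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy"
for version in $(aws iam list-policy-versions --policy-arn "$POLICY_ARN" \
    --query 'Versions[?IsDefaultVersion==`false`].VersionId' --output text); do
  aws iam delete-policy-version --policy-arn "$POLICY_ARN" --version-id "$version"
done
aws iam delete-policy --policy-arn "$POLICY_ARN"
```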
## Step 7: Clean Up Local Files

```bash
# Remove temporary files created during setup
rm -f bedrock-policy.json
rm -f trust-policy.json
rm -f glue-trust-policy.json
rm -f glue-s3-policy.json
rm -f glue-etl-script.py
rm -f sample-document.txt
rm -f test-doc-1.txt
rm -f test-doc-2.txt
rm -f query-payload.json
rm -f milvus-values.yaml
rm -rf rag-app/

echo "Local temporary files cleaned up"
```
## Verification

```bash
# Verify the ROSA cluster is deleted
rosa list clusters

# Verify the S3 bucket is deleted (grep should return nothing)
aws s3 ls | grep $BUCKET_NAME

# Verify the VPC endpoint is deleted (grep should return nothing)
aws ec2 describe-vpc-endpoints --region $AWS_REGION | grep $BEDROCK_VPC_ENDPOINT

# Verify the IAM roles are deleted (grep should return nothing)
aws iam list-roles | grep -E "rosa-bedrock-access|AWSGlueServiceRole-RAG"

echo "Cleanup verification complete"
```
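If you prefer a single pass/fail check, here is a minimal sketch that reports anything left behind (exit-code based, so it also works in CI):

```bash
# Report any resources that survived cleanup; exits non-zero if anything remains.
leftovers=0
rosa list clusters | grep -q "$CLUSTER_NAME" && { echo "ROSA cluster still exists"; leftovers=1; }
aws s3 ls "s3://$BUCKET_NAME" >/dev/null 2>&1 && { echo "S3 bucket still exists"; leftovers=1; }
aws iam get-role --role-name rosa-bedrock-access >/dev/null 2>&1 && { echo "Bedrock IAM role still exists"; leftovers=1; }
aws iam get-role --role-name AWSGlueServiceRole-RAG >/dev/null 2>&1 && { echo "Glue IAM role still exists"; leftovers=1; }
if [ "$leftovers" -eq 0 ]; then
  echo "All resources cleaned up"
else
  echo "Some resources remain; re-run the steps above"
  exit 1
fi
```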
For reference, the sections below recap the platform's key characteristics and prerequisites.

## Key Value Propositions

- Privacy-First Architecture: All sensitive data remains within your controlled OpenShift environment
- Secure Connectivity: AWS PrivateLink ensures AI model calls never traverse the public internet
- Enterprise Compliance: Meets stringent data governance and compliance requirements
- Scalable Infrastructure: Leverages Kubernetes orchestration for production-grade reliability
- Best-of-Breed Components: Combines Red Hat's enterprise Kubernetes with AWS's managed AI services

## Data Flow

1. Document Ingestion: Documents are uploaded to the S3 bucket
2. ETL Processing: The AWS Glue crawler discovers and processes the documents
3. Embedding Generation: Processed documents are sent to Bedrock for embedding generation
4. Vector Storage: Embeddings are stored in Milvus running on ROSA
5. Query Processing: User queries are received by the RAG application
6. Vector Search: The application searches Milvus for relevant document chunks
7. Context Retrieval: Relevant chunks are retrieved from the vector database
8. LLM Inference: The RHOAI gateway forwards the prompt plus retrieved context to Bedrock via PrivateLink
9. Response Generation: Claude 3.5 generates a response grounded in the retrieved context
10. Response Delivery: The answer is returned to the user through the application
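From the client's perspective, steps 5 through 10 collapse into a single HTTP call to the RAG application. A hedged sketch follows; the route name, namespace, and `/query` endpoint are assumptions, so match them to your actual Phase 6 deployment:

```bash
# Look up the application's route and send a test question through the full RAG path.
ROUTE=$(oc get route rag-app -n rag-application -o jsonpath='{.spec.host}')
curl -s -X POST "https://$ROUTE/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What topics do the ingested documents cover?"}'
```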
## Security Architecture

- Network Isolation: ROSA cluster in private subnets with no public ingress
- PrivateLink Encryption: All Bedrock API calls encrypted in transit via AWS PrivateLink
- Data Sovereignty: Document content never leaves the controlled environment
- RBAC: OpenShift role-based access control for all components
- Secrets Management: OpenShift secrets for API keys and credentials

## Required Accounts and Subscriptions

- [ ] AWS Account with administrative access
- [ ] Red Hat Account with OpenShift subscription
- [ ] ROSA Enabled in your AWS account (Enable ROSA)
- [ ] Amazon Bedrock Access with Claude 3.5 Sonnet model enabled in your region

## IAM Permissions

- EC2 (VPC, subnets, security groups, instances)
- IAM (roles, policies)
- S3 (buckets, objects)
- Bedrock (InvokeModel, InvokeModelWithResponseStream)
- Glue (crawlers, jobs, databases)
- CloudWatch (logs, metrics)

## Knowledge Prerequisites

- AWS fundamentals (VPC, IAM, S3)
- Kubernetes basics (pods, deployments, services)
- Basic Linux command line
- YAML configuration files
- REST APIs and HTTP concepts

## Configuration Rationale

- m5.2xlarge: 8 vCPUs, 32 GB RAM per node - suitable for vector database and ML workloads
- 3 nodes: High availability across multiple availability zones
- Multi-AZ: Ensures resilience against AZ failures
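For reference, a sketch of how this rationale maps onto `rosa create cluster` flags (flag values assumed; the actual Phase 1 command is authoritative):

```bash
# Cluster sizing flags corresponding to the rationale above.
rosa create cluster \
  --cluster-name "$CLUSTER_NAME" \
  --compute-machine-type m5.2xlarge \
  --replicas 3 \
  --multi-az \
  --region "$AWS_REGION"
```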