# Enterprise-Grade RAG Platform: Orchestrating Amazon Bedrock Agents via Red Hat OpenShift AI

## Table of Contents

- Overview
  - Project Purpose
  - Key Value Propositions
  - Solution Components
- Architecture
  - High-Level Architecture Diagram
  - Data Flow
  - Security Architecture
- Prerequisites
  - Required Accounts and Subscriptions
  - Required Tools
  - AWS Prerequisites
  - Service Quotas
  - IAM Permissions
  - Knowledge Prerequisites
- Phase 1: ROSA Cluster Setup
  - Step 1.1: Configure AWS CLI
  - Step 1.2: Initialize ROSA
  - Step 1.3: Create ROSA Cluster
  - Step 1.4: Monitor Cluster Creation
  - Step 1.5: Create Admin User
  - Step 1.6: Connect to Cluster
  - Step 1.7: Create Project Namespaces
- Phase 2: Red Hat OpenShift AI Installation
  - Step 2.1: Install OpenShift AI Operator
  - Step 2.2: Verify Operator Installation
  - Step 2.3: Create DataScienceCluster
  - Step 2.4: Verify RHOAI Installation
  - Step 2.5: Configure Model Serving
- Phase 3: Amazon Bedrock Integration via PrivateLink
  - Step 3.1: Enable Amazon Bedrock
  - Step 3.2: Identify ROSA VPC
  - Step 3.3: Create VPC Endpoint for Bedrock
  - Step 3.4: Create IAM Role for Bedrock Access
  - Step 3.5: Create Service Account in OpenShift
  - Step 3.6: Test Bedrock Connectivity
- Phase 4: AWS Glue Data Pipeline
  - Step 4.1: Create S3 Bucket for Documents
  - Step 4.2: Create IAM Role for Glue
  - Step 4.3: Create Glue Database
  - Step 4.4: Create Glue Crawler
  - Step 4.5: Create Glue ETL Job
  - Step 4.6: Test Glue Pipeline
- Phase 5: Milvus Vector Database Deployment
  - Step 5.1: Install Milvus Operator
  - Step 5.2: Create Persistent Storage
  - Step 5.3: Deploy Milvus Cluster
  - Step 5.4: Configure Milvus Access
  - Step 5.5: Test Milvus Connectivity
  - Step 5.6: Create Milvus Collection
- Phase 6: RAG Application Deployment
  - Step 6.1: Create Application Code
  - Step 6.2: Build and Push Container Image
  - Step 6.3: Deploy Application to OpenShift
  - Step 6.4: Verify Deployment
- Testing and Validation
  - End-to-End Testing
  - Test 1: Document Ingestion and Processing
  - Test 2: Embedding Generation and Vector Storage
  - Test 3: RAG Query
  - Performance Testing
- Resource Cleanup
  - Step 1: Delete OpenShift Resources
  - Step 2: Delete ROSA Cluster
  - Step 3: Delete AWS Glue Resources
  - Step 4: Delete S3 Bucket and Contents
  - Step 5: Delete VPC Endpoint
  - Step 6: Delete IAM Resources
  - Step 7: Clean Up Local Files
  - Verification

## Overview

This platform provides an enterprise-grade Retrieval-Augmented Generation (RAG) solution that addresses the primary concern of enterprises: data privacy and security. By leveraging Red Hat OpenShift on AWS (ROSA) to control the data plane while using Amazon Bedrock for AI capabilities, organizations maintain complete control over their sensitive data while accessing state-of-the-art language models.

## Architecture

### High-Level Architecture Diagram
```
┌──────────────────────────────────────────────────────────────────┐
│                            AWS Cloud                             │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                     ROSA Cluster (VPC)                     │  │
│  │  ┌──────────────────────────────────────────────────────┐ │  │
│  │  │               Red Hat OpenShift AI                   │ │  │
│  │  │  ┌────────────────┐      ┌──────────────────────┐   │ │  │
│  │  │  │ Model Serving  │◄─────┤   RAG Application    │   │ │  │
│  │  │  │    Gateway     │      │   (FastAPI/Flask)    │   │ │  │
│  │  │  └────────┬───────┘      └──────────┬───────────┘   │ │  │
│  │  └───────────┼─────────────────────────┼───────────────┘ │  │
│  │              │          ┌──────────────▼───────────────┐ │  │
│  │              │          │   Milvus Vector Database     │ │  │
│  │              │          │   (Embeddings & Metadata)    │ │  │
│  │              │          └──────────────────────────────┘ │  │
│  └──────────────┼────────────────────────────────────────────┘  │
│                 │ AWS PrivateLink (Private Connectivity)         │
│  ┌──────────────▼──────────────┐      ┌──────────────────────┐  │
│  │       Amazon Bedrock        │      │       AWS Glue       │  │
│  │    (Claude 3.5 Sonnet)      │      │  ┌────────────────┐  │  │
│  │    - Text Generation        │      │  │  Glue Crawler  │  │  │
│  │    - Embeddings             │      │  ├────────────────┤  │  │
│  └─────────────────────────────┘      │  │    ETL Jobs    │  │  │
│                                       │  └────────┬───────┘  │  │
│                                       └───────────┼──────────┘  │
│                                       ┌───────────▼──────────┐  │
│                                       │      Amazon S3       │  │
│                                       │   (Document Store)   │  │
│                                       └──────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
```

## Prerequisites

### Required Tools

Install the following CLI tools on your workstation:

```bash
# AWS CLI (v2)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# ROSA CLI
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
tar -xvf rosa-linux.tar.gz
sudo mv rosa /usr/local/bin/rosa
rosa version

# OpenShift CLI (oc)
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
tar -xvf openshift-client-linux.tar.gz
sudo mv oc kubectl /usr/local/bin/
oc version

# Helm (v3)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
```
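If you want a quick sanity check that everything landed on your PATH, a one-liner like this (a minimal sketch, nothing project-specific) does the trick:

```bash
# Verify all four CLIs are installed
for tool in aws rosa oc helm; do
  command -v "$tool" >/dev/null 2>&1 && echo "OK: $tool" || echo "MISSING: $tool"
done
```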
### Service Quotas

Verify you have adequate service quotas in your target region:

```bash
# Check EC2 vCPU quota (need at least 100 for production ROSA)
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --region us-east-1

# Check VPC quota
aws service-quotas get-service-quota \
  --service-code vpc \
  --quota-code L-F678F1CE \
  --region us-east-1
```

### IAM Permissions

Your AWS IAM user/role needs permissions for: EC2, VPC, and IAM (for ROSA provisioning), plus S3, AWS Glue, and Amazon Bedrock (for the data pipeline and model access used in this guide).

### Knowledge Prerequisites

You should be familiar with: Kubernetes/OpenShift administration basics, core AWS networking and IAM concepts, and the fundamentals of RAG (embeddings, vector search, and prompt construction).

## Phase 1: ROSA Cluster Setup

### Step 1.1: Configure AWS CLI

```bash
# Configure AWS credentials
aws configure

# Verify configuration
aws sts get-caller-identity
```

### Step 1.2: Initialize ROSA

```bash
# Log in to Red Hat
rosa login

# Verify ROSA prerequisites
rosa verify quota
rosa verify permissions

# Initialize ROSA in your AWS account (one-time setup)
rosa init
```

### Step 1.3: Create ROSA Cluster

Create a ROSA cluster with appropriate specifications for the RAG workload:

```bash
# Set environment variables
export CLUSTER_NAME="rag-platform"
export AWS_REGION="us-east-1"
export MULTI_AZ="true"
export MACHINE_TYPE="m5.2xlarge"
export COMPUTE_NODES=3

# Create ROSA cluster (takes ~40 minutes)
rosa create cluster \
  --cluster-name $CLUSTER_NAME \
  --region $AWS_REGION \
  --multi-az \
  --compute-machine-type $MACHINE_TYPE \
  --compute-nodes $COMPUTE_NODES \
  --machine-cidr 10.0.0.0/16 \
  --service-cidr 172.30.0.0/16 \
  --pod-cidr 10.128.0.0/14 \
  --host-prefix 23 \
  --yes
```

**Configuration Rationale:** three m5.2xlarge workers (8 vCPU / 32 GiB each) spread across availability zones give Milvus and the RAG application headroom while keeping the cluster highly available; the CIDR ranges are the ROSA defaults and must not overlap with any networks you plan to peer with.
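To sanity-check the host prefix, a quick back-of-the-envelope calculation (a sketch, not part of the original guide) shows how many pod IPs each node gets and how many nodes the pod CIDR can hold:

```python
# /23 per node -> 2^(32-23) = 512 addresses for pods on each node
addresses_per_node = 2 ** (32 - 23)

# A /14 pod CIDR divided into /23 slices -> 2^(23-14) = 512 possible nodes
max_nodes = 2 ** (23 - 14)

print(f"{addresses_per_node} pod IPs per node, room for {max_nodes} nodes")
```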
### Step 1.4: Monitor Cluster Creation

```bash
# Watch cluster installation progress
rosa logs install --cluster=$CLUSTER_NAME --watch

# Check cluster status
rosa describe cluster --cluster=$CLUSTER_NAME
```

Wait until the cluster state shows `ready`.

### Step 1.5: Create Admin User

```bash
# Create cluster admin user
rosa create admin --cluster=$CLUSTER_NAME

# Save the login command output - it will look like:
# oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
#   --username cluster-admin \
#   --password <generated-password>
```

### Step 1.6: Connect to Cluster

```bash
# Use the login command from previous step
oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
  --username cluster-admin \
  --password <your-password>

# Verify cluster access
oc cluster-info
oc get nodes
oc get projects
```

### Step 1.7: Create Project Namespaces

```bash
# Create namespace for RHOAI
oc new-project redhat-ods-applications

# Create namespace for RAG application
oc new-project rag-application

# Create namespace for Milvus
oc new-project milvus
```

## Phase 2: Red Hat OpenShift AI Installation

### Step 2.1: Install OpenShift AI Operator

```bash
# Create operator subscription
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: redhat-ods-operator
  namespace: redhat-ods-operator
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF
```
### Step 2.2: Verify Operator Installation

```bash
# Wait for operator to be ready (takes 3-5 minutes)
oc get csv -n redhat-ods-operator -w

# Verify operator is running
oc get pods -n redhat-ods-operator
```

You should see the `rhods-operator` pod in `Running` state.

### Step 2.3: Create DataScienceCluster

```bash
# Create the DataScienceCluster custom resource
cat <<EOF | oc apply -f -
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: SelfSigned
        managementState: Managed
        name: knative-serving
    modelmeshserving:
      managementState: Managed
    ray:
      managementState: Removed
    workbenches:
      managementState: Managed
EOF
```

### Step 2.4: Verify RHOAI Installation

```bash
# Check DataScienceCluster status
oc get datasciencecluster -n redhat-ods-operator

# Verify all RHOAI components are running
oc get pods -n redhat-ods-applications
oc get pods -n redhat-ods-monitoring

# Get RHOAI dashboard URL
oc get route rhods-dashboard -n redhat-ods-applications -o jsonpath='{.spec.host}'
```

Access the dashboard URL in your browser and log in with your OpenShift credentials.
### Step 2.5: Configure Model Serving

Create a serving runtime for Amazon Bedrock integration:

```bash
# Create custom serving runtime for Bedrock
cat <<EOF | oc apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: bedrock-runtime
  namespace: rag-application
  labels:
    opendatahub.io/dashboard: "true"
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  containers:
    - name: kserve-container
      image: quay.io/modh/rest-proxy:latest
      env:
        - name: AWS_REGION
          value: "us-east-1"
        - name: BEDROCK_ENDPOINT_URL
          value: "bedrock-runtime.us-east-1.amazonaws.com"
      ports:
        - containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
        requests:
          cpu: "1"
          memory: 2Gi
  supportedModelFormats:
    - autoSelect: true
      name: bedrock
EOF
```
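The runtime on its own serves nothing; you point an InferenceService at it. The original guide does not show that manifest, so treat the following as a minimal sketch under assumptions: the `claude-bedrock` name is made up, and the proxy image is assumed to route requests to Bedrock based on the runtime's environment.

```yaml
# Hypothetical InferenceService pairing with the bedrock-runtime above
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: claude-bedrock          # assumed name, not from the original guide
  namespace: rag-application
spec:
  predictor:
    serviceAccountName: bedrock-sa   # created in Step 3.5 below
    model:
      modelFormat:
        name: bedrock            # matches supportedModelFormats on the runtime
      runtime: bedrock-runtime
```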
'{"cloudWatchConfig":{"logGroupName":"/aws/bedrock/modelinvocations","roleArn":"arn:aws:iam::ACCOUNT_ID:role/BedrockLoggingRole"}}' COMMAND_BLOCK: # Enable Bedrock in your region (if not already enabled) aws bedrock list-foundation-models --region us-east-1 # Request access to Claude 3.5 Sonnet (if needed) # Go to AWS Console > Bedrock > Model access # Or use the CLI: aws bedrock put-model-invocation-logging-configuration \ --region us-east-1 \ --logging-config '{"cloudWatchConfig":{"logGroupName":"/aws/bedrock/modelinvocations","roleArn":"arn:aws:iam::ACCOUNT_ID:role/BedrockLoggingRole"}}' COMMAND_BLOCK: # Get the VPC ID of your ROSA cluster export ROSA_VPC_ID=$(aws ec2 describe-vpcs \ --filters "Name=tag:Name,Values=*${CLUSTER_NAME}*" \ --query 'Vpcs[0].VpcId' \ --output text \ --region $AWS_REGION) echo "ROSA VPC ID: $ROSA_VPC_ID" # Get private subnet IDs export PRIVATE_SUBNET_IDS=$(aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=$ROSA_VPC_ID" "Name=tag:Name,Values=*private*" \ --query 'Subnets[*].SubnetId' \ --output text \ --region $AWS_REGION) echo "Private Subnets: $PRIVATE_SUBNET_IDS" Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: # Get the VPC ID of your ROSA cluster export ROSA_VPC_ID=$(aws ec2 describe-vpcs \ --filters "Name=tag:Name,Values=*${CLUSTER_NAME}*" \ --query 'Vpcs[0].VpcId' \ --output text \ --region $AWS_REGION) echo "ROSA VPC ID: $ROSA_VPC_ID" # Get private subnet IDs export PRIVATE_SUBNET_IDS=$(aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=$ROSA_VPC_ID" "Name=tag:Name,Values=*private*" \ --query 'Subnets[*].SubnetId' \ --output text \ --region $AWS_REGION) echo "Private Subnets: $PRIVATE_SUBNET_IDS" COMMAND_BLOCK: # Get the VPC ID of your ROSA cluster export ROSA_VPC_ID=$(aws ec2 describe-vpcs \ --filters "Name=tag:Name,Values=*${CLUSTER_NAME}*" \ --query 'Vpcs[0].VpcId' \ --output text \ --region $AWS_REGION) echo "ROSA VPC ID: $ROSA_VPC_ID" # Get private subnet IDs export PRIVATE_SUBNET_IDS=$(aws ec2 describe-subnets \ --filters "Name=vpc-id,Values=$ROSA_VPC_ID" "Name=tag:Name,Values=*private*" \ --query 'Subnets[*].SubnetId' \ --output text \ --region $AWS_REGION) echo "Private Subnets: $PRIVATE_SUBNET_IDS" COMMAND_BLOCK: # Create security group for VPC endpoint export VPC_ENDPOINT_SG=$(aws ec2 create-security-group \ --group-name bedrock-vpc-endpoint-sg \ --description "Security group for Bedrock VPC endpoint" \ --vpc-id $ROSA_VPC_ID \ --region $AWS_REGION \ --output text \ --query 'GroupId') echo "VPC Endpoint Security Group: $VPC_ENDPOINT_SG" # Allow HTTPS traffic from ROSA worker nodes aws ec2 authorize-security-group-ingress \ --group-id $VPC_ENDPOINT_SG \ --protocol tcp \ --port 443 \ --cidr 10.0.0.0/16 \ --region $AWS_REGION # Create VPC endpoint for Bedrock Runtime export BEDROCK_VPC_ENDPOINT=$(aws ec2 create-vpc-endpoint \ --vpc-id $ROSA_VPC_ID \ --vpc-endpoint-type Interface \ --service-name com.amazonaws.${AWS_REGION}.bedrock-runtime \ --subnet-ids $PRIVATE_SUBNET_IDS \ --security-group-ids $VPC_ENDPOINT_SG \ --private-dns-enabled \ --region $AWS_REGION \ --output text \ --query 'VpcEndpoint.VpcEndpointId') echo "Bedrock VPC Endpoint: $BEDROCK_VPC_ENDPOINT" # Wait for VPC endpoint to be available aws ec2 wait vpc-endpoint-available \ --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT \ --region $AWS_REGION echo "VPC Endpoint is now available" Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: # Create security group for VPC endpoint export VPC_ENDPOINT_SG=$(aws ec2 create-security-group \ --group-name 
### Step 3.3: Create VPC Endpoint for Bedrock

```bash
# Create security group for VPC endpoint
export VPC_ENDPOINT_SG=$(aws ec2 create-security-group \
  --group-name bedrock-vpc-endpoint-sg \
  --description "Security group for Bedrock VPC endpoint" \
  --vpc-id $ROSA_VPC_ID \
  --region $AWS_REGION \
  --output text \
  --query 'GroupId')

echo "VPC Endpoint Security Group: $VPC_ENDPOINT_SG"

# Allow HTTPS traffic from ROSA worker nodes
aws ec2 authorize-security-group-ingress \
  --group-id $VPC_ENDPOINT_SG \
  --protocol tcp \
  --port 443 \
  --cidr 10.0.0.0/16 \
  --region $AWS_REGION

# Create VPC endpoint for Bedrock Runtime
export BEDROCK_VPC_ENDPOINT=$(aws ec2 create-vpc-endpoint \
  --vpc-id $ROSA_VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.${AWS_REGION}.bedrock-runtime \
  --subnet-ids $PRIVATE_SUBNET_IDS \
  --security-group-ids $VPC_ENDPOINT_SG \
  --private-dns-enabled \
  --region $AWS_REGION \
  --output text \
  --query 'VpcEndpoint.VpcEndpointId')

echo "Bedrock VPC Endpoint: $BEDROCK_VPC_ENDPOINT"

# Poll until the endpoint reports "available"
until [ "$(aws ec2 describe-vpc-endpoints \
  --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT \
  --query 'VpcEndpoints[0].State' \
  --output text \
  --region $AWS_REGION)" = "available" ]; do
  sleep 10
done

echo "VPC Endpoint is now available"
```

### Step 3.4: Create IAM Role for Bedrock Access

```bash
# Create IAM policy for Bedrock access
cat > bedrock-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:${AWS_REGION}::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
      ]
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name BedrockInvokePolicy \
  --policy-document file://bedrock-policy.json

# Create trust policy for ROSA service account
export OIDC_PROVIDER=$(rosa describe cluster -c $CLUSTER_NAME -o json | jq -r .aws.sts.oidc_endpoint_url | sed 's|https://||')

cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:rag-application:bedrock-sa"
        }
      }
    }
  ]
}
EOF

# Create IAM role
export BEDROCK_ROLE_ARN=$(aws iam create-role \
  --role-name rosa-bedrock-access \
  --assume-role-policy-document file://trust-policy.json \
  --query 'Role.Arn' \
  --output text)

echo "Bedrock IAM Role ARN: $BEDROCK_ROLE_ARN"

# Attach policy to role
aws iam attach-role-policy \
  --role-name rosa-bedrock-access \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy
```
### Step 3.5: Create Service Account in OpenShift

```bash
# Create service account with IAM role annotation
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bedrock-sa
  namespace: rag-application
  annotations:
    eks.amazonaws.com/role-arn: $BEDROCK_ROLE_ARN
EOF

# Verify service account
oc get sa bedrock-sa -n rag-application
```
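The cluster's pod identity webhook watches for that annotation and injects web-identity credentials into pods that use the service account. A quick way to confirm it worked once any pod (for example the test pod in the next step) is running with `bedrock-sa` (a sketch; substitute your pod name):

```bash
# The webhook should have injected these two variables and mounted a token
oc exec -n rag-application <pod-name> -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'
```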
### Step 3.6: Test Bedrock Connectivity

```bash
# Create test pod with AWS CLI
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: bedrock-test
  namespace: rag-application
spec:
  serviceAccountName: bedrock-sa
  containers:
    - name: aws-cli
      image: amazon/aws-cli:latest
      command: ["/bin/sleep", "3600"]
      env:
        - name: AWS_REGION
          value: "$AWS_REGION"
EOF

# Wait for pod to be ready
oc wait --for=condition=ready pod/bedrock-test -n rag-application --timeout=300s

# Test Bedrock API call
# (AWS CLI v2 needs raw-in-base64-out to accept a raw JSON body)
oc exec -n rag-application bedrock-test -- aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"anthropic_version":"bedrock-2023-05-31","max_tokens":100,"messages":[{"role":"user","content":"Hello, this is a test"}]}' \
  /tmp/response.json

# Check the response
oc exec -n rag-application bedrock-test -- cat /tmp/response.json

# Clean up test pod
oc delete pod bedrock-test -n rag-application
```

If successful, you should see a JSON response from Claude.
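From application code the same call goes through boto3, which picks up the injected web-identity credentials automatically. A minimal sketch (same model ID as above; error handling omitted):

```python
import json

import boto3

# Credentials come from the injected web-identity token; no keys needed
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Hello, this is a test"}],
    }),
)

# The body is a streaming object; read and decode it
payload = json.loads(response["body"].read())
print(payload["content"][0]["text"])
```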
## Phase 4: AWS Glue Data Pipeline

This phase sets up AWS Glue to process documents from S3 and prepare them for vectorization.

### Step 4.1: Create S3 Bucket for Documents

```bash
# Create S3 bucket (name must be globally unique)
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export BUCKET_NAME="rag-documents-${ACCOUNT_ID}"

aws s3 mb s3://$BUCKET_NAME --region $AWS_REGION

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket $BUCKET_NAME \
  --versioning-configuration Status=Enabled \
  --region $AWS_REGION

# Create folder structure
aws s3api put-object --bucket $BUCKET_NAME --key raw-documents/
aws s3api put-object --bucket $BUCKET_NAME --key processed-documents/
aws s3api put-object --bucket $BUCKET_NAME --key embeddings/

echo "S3 Bucket created: s3://$BUCKET_NAME"
```

### Step 4.2: Create IAM Role for Glue

```bash
# Create trust policy for Glue
cat > glue-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create Glue service role
aws iam create-role \
  --role-name AWSGlueServiceRole-RAG \
  --assume-role-policy-document file://glue-trust-policy.json

# Attach AWS managed policy
aws iam attach-role-policy \
  --role-name AWSGlueServiceRole-RAG \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

# Create custom policy for S3 access
cat > glue-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET_NAME}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET_NAME}"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name AWSGlueServiceRole-RAG \
  --policy-name S3Access \
  --policy-document file://glue-s3-policy.json
```
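Before wiring Glue to the bucket, you can dry-run the new role's permissions with the IAM policy simulator (a sketch reusing the variables above):

```bash
# Ask IAM whether the Glue role would be allowed to read objects from the bucket
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
  --action-names s3:GetObject \
  --resource-arns "arn:aws:s3:::${BUCKET_NAME}/raw-documents/*" \
  --query 'EvaluationResults[0].EvalDecision'
```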
### Step 4.3: Create Glue Database

```bash
# Create Glue database
aws glue create-database \
  --database-input '{
    "Name": "rag_documents_db",
    "Description": "Database for RAG document metadata"
  }' \
  --region $AWS_REGION

# Verify database creation
aws glue get-database --name rag_documents_db --region $AWS_REGION
```

### Step 4.4: Create Glue Crawler

```bash
# Create crawler for raw documents
aws glue create-crawler \
  --name rag-document-crawler \
  --role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
  --database-name rag_documents_db \
  --targets '{
    "S3Targets": [
      {
        "Path": "s3://'$BUCKET_NAME'/raw-documents/"
      }
    ]
  }' \
  --schema-change-policy '{
    "UpdateBehavior": "UPDATE_IN_DATABASE",
    "DeleteBehavior": "LOG"
  }' \
  --region $AWS_REGION

# Start the crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

echo "Glue crawler created and started"
```
### Step 4.5: Create Glue ETL Job

Create a Python script for document processing:

```bash
# Create ETL script
cat > glue-etl-script.py <<'PYTHON_SCRIPT'
import sys
import boto3
import json
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

# Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'BUCKET_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

bucket_name = args['BUCKET_NAME']
s3_client = boto3.client('s3')

# Read documents from Glue catalog
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="rag_documents_db",
    table_name="raw_documents"
)

# Document processing function
def process_document(record):
    """Process document: chunk text, extract metadata"""
    # Simple chunking strategy (500 chars with 50 char overlap)
    text = record.get('content', '')
    chunk_size = 500
    overlap = 50
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        if chunk:
            chunks.append({
                'document_id': record.get('document_id'),
                'chunk_id': f"{record.get('document_id')}_{i}",
                'chunk_text': chunk,
                'chunk_index': i // (chunk_size - overlap),
                'metadata': {
                    'source': record.get('source', ''),
                    'timestamp': record.get('timestamp', ''),
                    'file_type': record.get('file_type', '')
                }
            })
    return chunks

# Process and write to S3
def process_and_write():
    # Note: collect() pulls all rows to the driver; fine for small corpora
    records = datasource.toDF().collect()
    all_chunks = []
    for record in records:
        chunks = process_document(record.asDict())
        all_chunks.extend(chunks)

    # Write chunks to S3 as JSON
    for chunk in all_chunks:
        key = f"processed-documents/{chunk['chunk_id']}.json"
        s3_client.put_object(
            Bucket=bucket_name,
            Key=key,
            Body=json.dumps(chunk),
            ContentType='application/json'
        )

    print(f"Processed {len(all_chunks)} chunks from {len(records)} documents")

process_and_write()
job.commit()
PYTHON_SCRIPT

# Upload script to S3
aws s3 cp glue-etl-script.py s3://$BUCKET_NAME/glue-scripts/

# Create Glue job
aws glue create-job \
  --name rag-document-processor \
  --role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
  --command '{
    "Name": "glueetl",
    "ScriptLocation": "s3://'$BUCKET_NAME'/glue-scripts/glue-etl-script.py",
    "PythonVersion": "3"
  }' \
  --default-arguments '{
    "--BUCKET_NAME": "'$BUCKET_NAME'",
    "--job-language": "python",
    "--enable-metrics": "true",
    "--enable-continuous-cloudwatch-log": "true"
  }' \
  --glue-version "4.0" \
  --max-retries 0 \
  --timeout 60 \
  --region $AWS_REGION

echo "Glue ETL job created"
```
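The chunking math is easy to get wrong, so it's worth exercising the chunker locally before paying for a Glue run. A standalone check (plain Python, no Glue dependencies; the sample input is made up):

```python
# Same chunking logic as the ETL script, tested on a throwaway string
def chunk_text(text, chunk_size=500, overlap=50):
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step) if text[i:i + chunk_size]]

sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)

# 1200 chars with step 450 -> chunks start at 0, 450, 900 -> 3 chunks
assert len(chunks) == 3
assert len(chunks[0]) == 500
# Consecutive chunks share a 50-character overlap
assert chunks[0][-50:] == chunks[1][:50]
print(f"{len(chunks)} chunks, sizes: {[len(c) for c in chunks]}")
```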
### Step 4.6: Test Glue Pipeline

```bash
# Upload sample document
cat > sample-document.txt <<EOF
This is a sample document for testing the RAG pipeline.
It contains multiple sentences that will be chunked and processed.
The Glue ETL job will extract this content and prepare it for vectorization.
This demonstrates the data pipeline from S3 to processed chunks.
EOF

# Upload to S3
aws s3 cp sample-document.txt s3://$BUCKET_NAME/raw-documents/

# Run crawler to detect new file
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

# Wait for crawler to complete (check status)
aws glue get-crawler --name rag-document-crawler --region $AWS_REGION --query 'Crawler.State'

# Run ETL job
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION

# Check processed outputs
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/
```
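The fixed `sleep 60` is optimistic; crawler and job runs take variable time. A small polling loop (a sketch built from the same CLI calls) is more reliable:

```bash
# Block until the crawler is idle again
while [ "$(aws glue get-crawler --name rag-document-crawler \
  --region $AWS_REGION --query 'Crawler.State' --output text)" != "READY" ]; do
  echo "crawler still running..."; sleep 15
done

# Kick off the job and poll its run state until it leaves RUNNING
RUN_ID=$(aws glue start-job-run --job-name rag-document-processor \
  --region $AWS_REGION --query 'JobRunId' --output text)
while true; do
  STATE=$(aws glue get-job-run --job-name rag-document-processor \
    --run-id $RUN_ID --region $AWS_REGION --query 'JobRun.JobRunState' --output text)
  [ "$STATE" = "RUNNING" ] || { echo "job finished: $STATE"; break; }
  sleep 15
done
```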
## Phase 5: Milvus Vector Database Deployment

Deploy Milvus on your ROSA cluster to store and search document embeddings.

### Step 5.1: Install Milvus Operator

```bash
# Add Milvus Helm repository
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update

# Install Milvus operator
helm install milvus-operator milvus/milvus-operator \
  --namespace milvus \
  --create-namespace \
  --set operator.image.tag=v0.9.0

# Verify operator installation
oc get pods -n milvus
```

### Step 5.2: Create Persistent Storage

```bash
# Create PersistentVolumeClaims for Milvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp3-csi
EOF
```
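The `gp3-csi` storage class name varies between cluster versions, so confirm it exists before the PVCs sit in `Pending`:

```bash
# List available storage classes, then check the PVCs actually bound
oc get storageclass
oc get pvc -n milvus
```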
--namespace milvus \ --values milvus-values.yaml \ --wait # Verify Milvus installation oc get pods -n milvus oc get svc -n milvus Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: # Create Milvus cluster configuration cat > milvus-values.yaml <<EOF cluster: enabled: true service: type: ClusterIP port: 19530 standalone: replicas: 1 resources: limits: cpu: "4" memory: 8Gi requests: cpu: "2" memory: 4Gi etcd: replicaCount: 1 persistence: enabled: true existingClaim: milvus-etcd-pvc minio: mode: standalone persistence: enabled: true existingClaim: milvus-minio-pvc pulsar: enabled: false kafka: enabled: false metrics: enabled: true serviceMonitor: enabled: true EOF # Install Milvus helm install milvus milvus/milvus \ --namespace milvus \ --values milvus-values.yaml \ --wait # Verify Milvus installation oc get pods -n milvus oc get svc -n milvus COMMAND_BLOCK: # Create Milvus cluster configuration cat > milvus-values.yaml <<EOF cluster: enabled: true service: type: ClusterIP port: 19530 standalone: replicas: 1 resources: limits: cpu: "4" memory: 8Gi requests: cpu: "2" memory: 4Gi etcd: replicaCount: 1 persistence: enabled: true existingClaim: milvus-etcd-pvc minio: mode: standalone persistence: enabled: true existingClaim: milvus-minio-pvc pulsar: enabled: false kafka: enabled: false metrics: enabled: true serviceMonitor: enabled: true EOF # Install Milvus helm install milvus milvus/milvus \ --namespace milvus \ --values milvus-values.yaml \ --wait # Verify Milvus installation oc get pods -n milvus oc get svc -n milvus COMMAND_BLOCK: # Get Milvus service endpoint export MILVUS_HOST=$(oc get svc milvus -n milvus -o jsonpath='{.spec.clusterIP}') export MILVUS_PORT=19530 echo "Milvus Endpoint: $MILVUS_HOST:$MILVUS_PORT" # Create config map with Milvus connection details cat <<EOF | oc apply -f - apiVersion: v1 kind: ConfigMap metadata: name: milvus-config namespace: rag-application data: MILVUS_HOST: "$MILVUS_HOST" MILVUS_PORT: "$MILVUS_PORT" EOF Enter fullscreen mode Exit fullscreen mode COMMAND_BLOCK: # Get Milvus service endpoint export MILVUS_HOST=$(oc get svc milvus -n milvus -o jsonpath='{.spec.clusterIP}') export MILVUS_PORT=19530 echo "Milvus Endpoint: $MILVUS_HOST:$MILVUS_PORT" # Create config map with Milvus connection details cat <<EOF | oc apply -f - apiVersion: v1 kind: ConfigMap metadata: name: milvus-config namespace: rag-application data: MILVUS_HOST: "$MILVUS_HOST" MILVUS_PORT: "$MILVUS_PORT" EOF COMMAND_BLOCK: # Get Milvus service endpoint export MILVUS_HOST=$(oc get svc milvus -n milvus -o jsonpath='{.spec.clusterIP}') export MILVUS_PORT=19530 echo "Milvus Endpoint: $MILVUS_HOST:$MILVUS_PORT" # Create config map with Milvus connection details cat <<EOF | oc apply -f - apiVersion: v1 kind: ConfigMap metadata: name: milvus-config namespace: rag-application data: MILVUS_HOST: "$MILVUS_HOST" MILVUS_PORT: "$MILVUS_PORT" EOF COMMAND_BLOCK: # Create test pod with pymilvus cat <<EOF | oc apply -f - apiVersion: v1 kind: Pod metadata: name: milvus-test namespace: rag-application spec: containers: - name: python image: python:3.11-slim command: ["/bin/sleep", "3600"] env: - name: MILVUS_HOST valueFrom: configMapKeyRef: name: milvus-config key: MILVUS_HOST - name: MILVUS_PORT valueFrom: configMapKeyRef: name: milvus-config key: MILVUS_PORT EOF # Wait for pod oc wait --for=condition=ready pod/milvus-test -n rag-application --timeout=120s # Install pymilvus and test connection oc exec -n rag-application milvus-test -- bash -c " pip install pymilvus && python3 <<PYTHON from pymilvus 
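If you prefer to sanity-check Milvus from your workstation before building the in-cluster test pod in the next step, a quick local connection works too. This is a sketch under two assumptions: `pymilvus` is installed locally, and `oc port-forward svc/milvus -n milvus 19530:19530` is running in another terminal so the ClusterIP service is reachable.

```python
# A minimal local connectivity check via port-forward (assumption noted above).
from pymilvus import connections, utility

connections.connect(alias="default", host="127.0.0.1", port="19530")
print("Server version:", utility.get_server_version())
print("Collections:", utility.list_collections())  # empty until Step 5.6 runs
connections.disconnect("default")
```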
## Step 5.5: Test Milvus Connectivity

```bash
# Create test pod with pymilvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: milvus-test
  namespace: rag-application
spec:
  containers:
  - name: python
    image: python:3.11-slim
    command: ["/bin/sleep", "3600"]
    env:
    - name: MILVUS_HOST
      valueFrom:
        configMapKeyRef:
          name: milvus-config
          key: MILVUS_HOST
    - name: MILVUS_PORT
      valueFrom:
        configMapKeyRef:
          name: milvus-config
          key: MILVUS_PORT
EOF

# Wait for pod
oc wait --for=condition=ready pod/milvus-test -n rag-application --timeout=120s

# Install pymilvus and test connection
oc exec -n rag-application milvus-test -- bash -c "
pip install pymilvus && python3 <<PYTHON
from pymilvus import connections, utility
import os

connections.connect(
    alias='default',
    host=os.environ['MILVUS_HOST'],
    port=os.environ['MILVUS_PORT']
)
print('Connected to Milvus successfully!')
print('Milvus version:', utility.get_server_version())
PYTHON
"

# Clean up test pod
oc delete pod milvus-test -n rag-application
```
## Step 5.6: Create Milvus Collection

```bash
# Create initialization job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: milvus-init
  namespace: rag-application
spec:
  template:
    spec:
      containers:
      - name: init
        image: python:3.11-slim
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        command:
        - /bin/bash
        - -c
        - |
          pip install pymilvus
          python3 <<PYTHON
          from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
          import os

          # Connect to Milvus
          connections.connect(
              alias='default',
              host=os.environ['MILVUS_HOST'],
              port=os.environ['MILVUS_PORT']
          )

          # Define collection schema
          fields = [
              FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
              FieldSchema(name='chunk_id', dtype=DataType.VARCHAR, max_length=256),
              FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=1024),
              FieldSchema(name='text', dtype=DataType.VARCHAR, max_length=65535),
              FieldSchema(name='metadata', dtype=DataType.JSON)
          ]
          schema = CollectionSchema(
              fields=fields,
              description='RAG document embeddings collection'
          )

          # Create collection
          collection = Collection(name='rag_documents', schema=schema)

          # Create index
          index_params = {
              'metric_type': 'L2',
              'index_type': 'IVF_FLAT',
              'params': {'nlist': 128}
          }
          collection.create_index(
              field_name='embedding',
              index_params=index_params
          )

          print(f'Collection created: {collection.name}')
          print(f'Number of entities: {collection.num_entities}')
          PYTHON
      restartPolicy: Never
  backoffLimit: 3
EOF

# Check job status
oc logs job/milvus-init -n rag-application
```
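To confirm the collection and index behave the way the application expects, you can run a throwaway similarity search. The sketch below uses a random 1024-dimensional vector, so the scores are meaningless; it only proves the search path (IVF_FLAT with `nprobe`) works end to end. Connection details are assumptions: run it in-cluster, or reuse the port-forward from Step 5.5.

```python
# A minimal sketch of the search the RAG app will run, with a random query vector.
import os
import random

from pymilvus import Collection, connections

connections.connect(host=os.getenv("MILVUS_HOST", "127.0.0.1"),
                    port=os.getenv("MILVUS_PORT", "19530"))

collection = Collection("rag_documents")
collection.load()  # a collection must be loaded before it can be searched

query_vector = [random.random() for _ in range(1024)]  # matches dim=1024 above
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["chunk_id", "text"],
)

# On a freshly created (empty) collection this prints nothing, which is fine.
for hit in results[0]:
    print(hit.entity.get("chunk_id"), round(hit.score, 4))
```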
## Phase 6: RAG Application Deployment

This phase builds and deploys the FastAPI service that ties Milvus retrieval and Bedrock generation together.

## Step 6.1: Create Application Code

```bash
# Create application directory structure
mkdir -p rag-app/{src,config,tests}

# Create requirements.txt
cat > rag-app/requirements.txt <<EOF
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
boto3==1.29.7
langchain==0.0.350
langchain-community==0.0.1
python-dotenv==1.0.0
httpx==0.25.2
EOF

# Create main application
cat > rag-app/src/main.py <<'PYTHON_CODE'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional, Dict, Any
import os
import json
import boto3
from pymilvus import connections, Collection
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Enterprise RAG API",
    description="RAG platform using OpenShift AI, Bedrock, and Milvus",
    version="1.0.0"
)

# Configuration
MILVUS_HOST = os.getenv("MILVUS_HOST", "milvus.milvus.svc.cluster.local")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", "19530"))
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
COLLECTION_NAME = "rag_documents"

# Initialize clients
bedrock_runtime = None
milvus_collection = None

@app.on_event("startup")
async def startup_event():
    """Initialize connections on startup"""
    global bedrock_runtime, milvus_collection
    try:
        # Connect to Milvus
        connections.connect(
            alias="default",
            host=MILVUS_HOST,
            port=MILVUS_PORT
        )
        milvus_collection = Collection(COLLECTION_NAME)
        milvus_collection.load()
        logger.info(f"Connected to Milvus collection: {COLLECTION_NAME}")

        # Initialize Bedrock client
        bedrock_runtime = boto3.client(
            service_name='bedrock-runtime',
            region_name=AWS_REGION
        )
        logger.info("Initialized Bedrock client")
    except Exception as e:
        logger.error(f"Startup error: {str(e)}")
        raise

@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on shutdown"""
    try:
        connections.disconnect("default")
        logger.info("Disconnected from Milvus")
    except Exception as e:
        logger.error(f"Shutdown error: {str(e)}")

# Request/Response models
class QueryRequest(BaseModel):
    query: str
    top_k: Optional[int] = 5
    max_tokens: Optional[int] = 1000

class QueryResponse(BaseModel):
    answer: str
    sources: List[Dict[str, Any]]
    metadata: Dict[str, Any]

class HealthResponse(BaseModel):
    status: str
    milvus_connected: bool
    bedrock_available: bool

# API endpoints
@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint"""
    milvus_ok = False
    bedrock_ok = False
    try:
        if milvus_collection:
            milvus_collection.num_entities
            milvus_ok = True
    except Exception:
        pass
    try:
        if bedrock_runtime:
            bedrock_ok = True
    except Exception:
        pass
    return HealthResponse(
        status="healthy" if (milvus_ok and bedrock_ok) else "degraded",
        milvus_connected=milvus_ok,
        bedrock_available=bedrock_ok
    )

@app.post("/query", response_model=QueryResponse)
async def query_rag(request: QueryRequest):
    """
    Process RAG query:
    1. Generate embedding for query
    2. Search similar documents in Milvus
    3. Construct prompt with context
    4. Call Bedrock for generation
    """
    try:
        # Step 1: Generate query embedding using Bedrock
        query_embedding = await generate_embedding(request.query)

        # Step 2: Search Milvus for similar documents
        search_params = {
            "metric_type": "L2",
            "params": {"nprobe": 10}
        }
        results = milvus_collection.search(
            data=[query_embedding],
            anns_field="embedding",
            param=search_params,
            limit=request.top_k,
            output_fields=["chunk_id", "text", "metadata"]
        )

        # Extract context from search results
        contexts = []
        sources = []
        for hit in results[0]:
            contexts.append(hit.entity.get("text"))
            sources.append({
                "chunk_id": hit.entity.get("chunk_id"),
                "score": float(hit.score),
                "metadata": hit.entity.get("metadata")
            })

        # Step 3: Construct prompt with context
        context_text = "\n\n".join([f"Document {i+1}:\n{ctx}" for i, ctx in enumerate(contexts)])
        prompt = f"""You are a helpful AI assistant. Use the following context to answer the user's question. If the answer cannot be found in the context, say so.

Context:
{context_text}

User Question: {request.query}

Answer:"""

        # Step 4: Call Bedrock for generation
        response = bedrock_runtime.invoke_model(
            modelId=BEDROCK_MODEL_ID,
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": request.max_tokens,
                "messages": [
                    {"role": "user", "content": prompt}
                ],
                "temperature": 0.7
            })
        )
        response_body = json.loads(response['body'].read())
        answer = response_body['content'][0]['text']

        return QueryResponse(
            answer=answer,
            sources=sources,
            metadata={
                "query": request.query,
                "num_sources": len(sources),
                "model": BEDROCK_MODEL_ID
            }
        )
    except Exception as e:
        logger.error(f"Query error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

async def generate_embedding(text: str) -> List[float]:
    """Generate embedding using Bedrock Titan Embeddings"""
    try:
        response = bedrock_runtime.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "inputText": text,
                "dimensions": 1024,
                "normalize": True
            })
        )
        response_body = json.loads(response['body'].read())
        return response_body['embedding']
    except Exception as e:
        logger.error(f"Embedding generation error: {str(e)}")
        raise

@app.get("/")
async def root():
    """Root endpoint"""
    return {
        "message": "Enterprise RAG API",
        "version": "1.0.0",
        "endpoints": {
            "health": "/health",
            "query": "/query",
            "docs": "/docs"
        }
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
PYTHON_CODE

# Create Dockerfile
cat > rag-app/Dockerfile <<EOF
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
```
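Once the service is reachable, either locally via `uvicorn src.main:app` or through the Route created in Step 6.3, a small client makes iterating on prompts easier. The sketch below is not part of the deployment; the base URL is an assumption you must replace, and it uses `httpx`, which is already pinned in requirements.txt.

```python
# rag_client.py -- a minimal sketch of calling the API above.
import httpx

BASE_URL = "http://localhost:8000"  # assumption: replace with https://<route-host> once deployed

def health() -> dict:
    """Return the /health payload (status, milvus_connected, bedrock_available)."""
    return httpx.get(f"{BASE_URL}/health", timeout=10).json()

def query(question: str, top_k: int = 3, max_tokens: int = 500) -> dict:
    """POST a question to /query and return the parsed QueryResponse."""
    payload = {"query": question, "top_k": top_k, "max_tokens": max_tokens}
    resp = httpx.post(f"{BASE_URL}/query", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(health())
    result = query("What is Red Hat OpenShift?")
    print(result["answer"])
    for src in result["sources"]:
        print(src["chunk_id"], src["score"])
```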
## Step 6.2: Build and Push Container Image

```bash
# Build container image (using podman or docker)
cd rag-app

# Option 1: Build with podman
podman build -t rag-application:v1.0 .

# Option 2: Build with docker
# docker build -t rag-application:v1.0 .

# Tag for OpenShift internal registry
export IMAGE_REGISTRY=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')

# Login to OpenShift registry
podman login -u $(oc whoami) -p $(oc whoami -t) $IMAGE_REGISTRY --tls-verify=false

# Create image stream
oc create imagestream rag-application -n rag-application

# Tag and push
podman tag rag-application:v1.0 $IMAGE_REGISTRY/rag-application/rag-application:v1.0
podman push $IMAGE_REGISTRY/rag-application/rag-application:v1.0 --tls-verify=false

cd ..
```
## Step 6.3: Deploy Application to OpenShift

```bash
# Create deployment
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-application
  namespace: rag-application
  labels:
    app: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-application
  template:
    metadata:
      labels:
        app: rag-application
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: app
        image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-application:v1.0
        ports:
        - containerPort: 8000
          protocol: TCP
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        - name: AWS_REGION
          value: "us-east-1"
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: rag-application
  namespace: rag-application
spec:
  selector:
    app: rag-application
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-application
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-application
  port:
    targetPort: 8000
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
EOF
```

## Step 6.4: Verify Deployment

```bash
# Check deployment status
oc get deployment rag-application -n rag-application
oc get pods -n rag-application -l app=rag-application

# Get application URL
export RAG_APP_URL=$(oc get route rag-application -n rag-application -o jsonpath='{.spec.host}')
echo "RAG Application URL: https://$RAG_APP_URL"

# Test health endpoint
curl https://$RAG_APP_URL/health

# View application logs
oc logs -f deployment/rag-application -n rag-application
```
## Testing and Validation

## End-to-End Testing

## Test 1: Document Ingestion and Processing

```bash
# Upload test documents to S3
cat > test-doc-1.txt <<EOF
Red Hat OpenShift is an enterprise Kubernetes platform that provides a complete
application platform for developing and deploying containerized applications.
It includes integrated CI/CD, monitoring, and developer tools.
EOF

cat > test-doc-2.txt <<EOF
Amazon Bedrock is a fully managed service that offers foundation models from
leading AI companies through a single API. It provides access to models like
Claude, Llama, and Stable Diffusion for various use cases.
EOF

# Upload to S3
aws s3 cp test-doc-1.txt s3://$BUCKET_NAME/raw-documents/
aws s3 cp test-doc-2.txt s3://$BUCKET_NAME/raw-documents/

# Trigger Glue crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

# Wait and run ETL job
sleep 120
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION

# Check processed documents
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/
```
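The fixed `sleep 120`/`sleep 60` waits above are fragile. As an alternative to those blind waits, a boto3 polling loop over the crawler and job-run states is more reliable; the sketch below reuses the resource names created in Phase 4 and reads `AWS_REGION` from the environment.

```python
# Poll the Glue crawler and job run until they finish, instead of sleeping.
import os
import time

import boto3

glue = boto3.client("glue", region_name=os.environ.get("AWS_REGION", "us-east-1"))

# Wait for the crawler to return to READY
while glue.get_crawler(Name="rag-document-crawler")["Crawler"]["State"] != "READY":
    time.sleep(15)

# Start the ETL job and poll its run state to completion
run_id = glue.start_job_run(JobName="rag-document-processor")["JobRunId"]
while True:
    run = glue.get_job_run(JobName="rag-document-processor", RunId=run_id)
    state = run["JobRun"]["JobRunState"]
    print("Job state:", state)
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(15)
```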
## Test 2: Embedding Generation and Vector Storage

```bash
# Create embedding job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: embed-documents
  namespace: rag-application
spec:
  template:
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: embedder
        image: python:3.11-slim
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        - name: AWS_REGION
          value: "us-east-1"
        - name: BUCKET_NAME
          value: "$BUCKET_NAME"
        command:
        - /bin/bash
        - -c
        - |
          pip install pymilvus boto3
          python3 <<PYTHON
          import boto3
          import json
          import os
          from pymilvus import connections, Collection

          # Connect to services
          s3 = boto3.client('s3')
          bedrock = boto3.client('bedrock-runtime', region_name=os.environ['AWS_REGION'])
          connections.connect(
              host=os.environ['MILVUS_HOST'],
              port=os.environ['MILVUS_PORT']
          )
          collection = Collection('rag_documents')

          # Get processed documents
          bucket = os.environ['BUCKET_NAME']
          response = s3.list_objects_v2(Bucket=bucket, Prefix='processed-documents/')

          for obj in response.get('Contents', []):
              if obj['Key'].endswith('.json'):
                  # Read document chunk
                  doc = json.loads(s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read())

                  # Generate embedding
                  embed_response = bedrock.invoke_model(
                      modelId='amazon.titan-embed-text-v2:0',
                      body=json.dumps({
                          'inputText': doc['chunk_text'],
                          'dimensions': 1024,
                          'normalize': True
                      })
                  )
                  embedding = json.loads(embed_response['body'].read())['embedding']

                  # Insert into Milvus (field order matches the schema from Step 5.6,
                  # skipping the auto_id primary key)
                  collection.insert([
                      [doc['chunk_id']],
                      [embedding],
                      [doc['chunk_text']],
                      [doc['metadata']]
                  ])
                  print(f"Inserted: {doc['chunk_id']}")

          collection.flush()
          print(f"Total entities in collection: {collection.num_entities}")
          PYTHON
      restartPolicy: Never
  backoffLimit: 3
EOF

# Monitor job
oc logs job/embed-documents -n rag-application -f
```
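After the job completes, it is worth confirming the vectors actually landed in Milvus rather than trusting the log output alone. A minimal sketch, under the same connectivity assumptions as Step 5.5 (run in-cluster or via port-forward):

```python
# Count entities and fetch a couple of stored chunks from the collection.
import os

from pymilvus import Collection, connections

connections.connect(host=os.getenv("MILVUS_HOST", "127.0.0.1"),
                    port=os.getenv("MILVUS_PORT", "19530"))

collection = Collection("rag_documents")
collection.load()
print("Entities:", collection.num_entities)

# Fetch a small sample of rows using a scalar filter on the VARCHAR field
rows = collection.query(expr='chunk_id != ""',
                        output_fields=["chunk_id", "text"],
                        limit=2)
for row in rows:
    print(row["chunk_id"], row["text"][:60])
```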
## Test 3: RAG Query

```bash
# Test RAG query endpoint
curl -X POST "https://$RAG_APP_URL/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is Red Hat OpenShift?",
    "top_k": 3,
    "max_tokens": 500
  }' | jq .

# Test another query
curl -X POST "https://$RAG_APP_URL/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Tell me about Amazon Bedrock foundation models",
    "top_k": 3,
    "max_tokens": 500
  }' | jq .
```

## Performance Testing

```bash
# Install Apache Bench for load testing
sudo yum install httpd-tools -y

# Create query payload
cat > query-payload.json <<EOF
{
  "query": "What are the benefits of using OpenShift?",
  "top_k": 5
}
EOF

# Run load test (100 requests, 10 concurrent)
ab -n 100 -c 10 -p query-payload.json \
  -T application/json \
  "https://$RAG_APP_URL/query"
```
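ApacheBench reports only aggregate numbers. If you want latency percentiles, a small asyncio driver works just as well; this sketch is an illustration using `httpx` and assumes `RAG_APP_URL` from Step 6.4. Keep the request count modest: every query triggers a Bedrock call, so load tests have real quota and cost implications.

```python
# A small concurrent load driver reporting p50/p95 latency.
import asyncio
import os
import time

import httpx

URL = f"https://{os.environ['RAG_APP_URL']}/query"
PAYLOAD = {"query": "What are the benefits of using OpenShift?", "top_k": 5}

async def worker(client: httpx.AsyncClient, n: int) -> list:
    """Send n sequential requests and record each latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        resp = await client.post(URL, json=PAYLOAD, timeout=120)
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return latencies

async def main(total: int = 100, concurrency: int = 10) -> None:
    async with httpx.AsyncClient() as client:
        batches = await asyncio.gather(
            *(worker(client, total // concurrency) for _ in range(concurrency))
        )
    latencies = sorted(l for batch in batches for l in batch)
    print(f"requests={len(latencies)} "
          f"p50={latencies[len(latencies) // 2]:.2f}s "
          f"p95={latencies[int(len(latencies) * 0.95)]:.2f}s")

asyncio.run(main())
```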
## Resource Cleanup

## Step 1: Delete OpenShift Resources

```bash
# Delete RAG application
oc delete deployment rag-application -n rag-application
oc delete service rag-application -n rag-application
oc delete route rag-application -n rag-application

# Delete Milvus
helm uninstall milvus -n milvus
helm uninstall milvus-operator -n milvus
oc delete pvc --all -n milvus

# Delete RHOAI
oc delete datasciencecluster default-dsc -n redhat-ods-operator
oc delete subscription rhods-operator -n redhat-ods-operator

# Delete projects/namespaces
oc delete project rag-application
oc delete project milvus
oc delete project redhat-ods-applications
oc delete project redhat-ods-operator
oc delete project redhat-ods-monitoring
```

## Step 2: Delete ROSA Cluster

```bash
# Delete ROSA cluster (takes ~10-15 minutes)
rosa delete cluster --cluster=$CLUSTER_NAME --yes

# Wait for cluster deletion to complete
rosa logs uninstall --cluster=$CLUSTER_NAME --watch

# Verify cluster is deleted
rosa list clusters
```

## Step 3: Delete AWS Glue Resources

```bash
# Delete Glue job
aws glue delete-job --job-name rag-document-processor --region $AWS_REGION

# Delete Glue crawler
aws glue delete-crawler --name rag-document-crawler --region $AWS_REGION

# Delete Glue database
aws glue delete-database --name rag_documents_db --region $AWS_REGION

# Delete Glue IAM role
aws iam delete-role-policy --role-name AWSGlueServiceRole-RAG --policy-name S3Access
aws iam detach-role-policy --role-name AWSGlueServiceRole-RAG --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam delete-role --role-name AWSGlueServiceRole-RAG
```
## Step 4: Delete S3 Bucket and Contents

```bash
# Delete all objects in bucket
aws s3 rm s3://$BUCKET_NAME --recursive --region $AWS_REGION

# Delete bucket
aws s3 rb s3://$BUCKET_NAME --region $AWS_REGION

echo "S3 bucket deleted: $BUCKET_NAME"
```

## Step 5: Delete VPC Endpoint

```bash
# Delete VPC endpoint for Bedrock
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT --region $AWS_REGION

# Delete security group
aws ec2 delete-security-group --group-id $VPC_ENDPOINT_SG --region $AWS_REGION

echo "VPC endpoint and security group deleted"
```

## Step 6: Delete IAM Resources

```bash
# Detach policy from Bedrock role
aws iam detach-role-policy \
  --role-name rosa-bedrock-access \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy

# Delete Bedrock role
aws iam delete-role --role-name rosa-bedrock-access

# Delete Bedrock policy
aws iam delete-policy \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy

echo "IAM roles and policies deleted"
```

## Step 7: Clean Up Local Files

```bash
# Remove temporary files
rm -f bedrock-policy.json
rm -f trust-policy.json
rm -f glue-trust-policy.json
rm -f glue-s3-policy.json
rm -f glue-etl-script.py
rm -f sample-document.txt
rm -f test-doc-1.txt
rm -f test-doc-2.txt
rm -f query-payload.json
rm -f milvus-values.yaml
rm -rf rag-app/

echo "Local temporary files cleaned up"
```

## Verification

```bash
# Verify ROSA cluster is deleted
rosa list clusters

# Verify S3 bucket is deleted
aws s3 ls | grep $BUCKET_NAME

# Verify VPC endpoints are deleted
aws ec2 describe-vpc-endpoints --region $AWS_REGION | grep $BEDROCK_VPC_ENDPOINT

# Verify IAM roles are deleted
aws iam list-roles | grep -E "rosa-bedrock-access|AWSGlueServiceRole-RAG"

echo "Cleanup verification complete"
```

## Key Value Propositions

- Privacy-First Architecture: All sensitive data remains within your controlled OpenShift environment
- Secure Connectivity: AWS PrivateLink ensures AI model calls never traverse the public internet
- Enterprise Compliance: Meets stringent data governance and compliance requirements
- Scalable Infrastructure: Leverages Kubernetes orchestration for production-grade reliability
- Best-of-Breed Components: Combines Red Hat's enterprise Kubernetes with AWS's managed AI services

## Data Flow

1. Document Ingestion: Documents uploaded to S3 bucket
2. ETL Processing: AWS Glue crawler discovers and processes documents
3. Embedding Generation: Processed documents sent to Bedrock for embedding generation
4. Vector Storage: Embeddings stored in Milvus running on ROSA
5. Query Processing: User queries received by RAG application
6. Vector Search: Application searches Milvus for relevant document chunks
7. Context Retrieval: Relevant chunks retrieved from vector database
8. LLM Inference: RHOAI gateway forwards prompt plus context to Bedrock via PrivateLink
9. Response Generation: Claude 3.5 generates a response based on the retrieved context
10. Response Delivery: Answer returned to the user through the application

## Security Architecture

- Network Isolation: ROSA cluster in private subnets with no public ingress
- PrivateLink Encryption: All Bedrock API calls encrypted in transit via AWS PrivateLink
- Data Sovereignty: Document content never leaves the controlled environment
- RBAC: OpenShift role-based access control for all components
- Secrets Management: OpenShift secrets for API keys and credentials

## Required Accounts and Subscriptions

- [ ] AWS Account with administrative access
- [ ] Red Hat Account with OpenShift subscription
- [ ] ROSA Enabled in your AWS account (Enable ROSA)
- [ ] Amazon Bedrock Access with the Claude 3.5 Sonnet model enabled in your region

## IAM Permissions

- EC2 (VPC, subnets, security groups, instances)
- IAM (roles, policies)
- S3 (buckets, objects)
- Bedrock (InvokeModel, InvokeModelWithResponseStream)
- Glue (crawlers, jobs, databases)
- CloudWatch (logs, metrics)

## Knowledge Prerequisites

- AWS fundamentals (VPC, IAM, S3)
- Kubernetes basics (pods, deployments, services)
- Basic Linux command line
- YAML configuration files
- REST APIs and HTTP concepts

## Configuration Rationale

- m5.2xlarge: 8 vCPUs and 32 GB RAM per node, suitable for vector database and ML workloads
- 3 nodes: High availability across multiple availability zones
- Multi-AZ: Ensures resilience against AZ failures