<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[GainesAI]]></title><description><![CDATA[Build with AWS and AI through real demos and rapid prototypes. Learn what works, then scale it to production.]]></description><link>https://blog.gainesai.com</link><image><url>https://cdn.hashnode.com/uploads/logos/69d828c8fa7251682e0c6f85/5dce551b-45b1-4846-b11a-798f226e1640.png</url><title>GainesAI</title><link>https://blog.gainesai.com</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 16 May 2026 20:13:39 GMT</lastBuildDate><atom:link href="https://blog.gainesai.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Building Institutional Memory Into Your AI App: Capturing Unstructured Decision Data with Amazon Bedrock Knowledge Bases]]></title><description><![CDATA[Teams don’t lose data—they lose reasoning. A team selects Site 7. The system logs it. But six months later, no one remembers why. That missing “why” is where institutional knowledge actually lives.
Th]]></description><link>https://blog.gainesai.com/building-institutional-memory-into-your-ai-app-capturing-unstructured-decision-data-with-amazon-bedrock-knowledge-bases</link><guid isPermaLink="true">https://blog.gainesai.com/building-institutional-memory-into-your-ai-app-capturing-unstructured-decision-data-with-amazon-bedrock-knowledge-bases</guid><category><![CDATA[AI Architecture]]></category><category><![CDATA[Amazon Bedrock]]></category><category><![CDATA[Knowledge Bases]]></category><category><![CDATA[Retrieval-Augmented Generation (RAG)]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[Enterprise AI]]></category><dc:creator><![CDATA[James Gaines]]></dc:creator><pubDate>Thu, 30 Apr 2026 00:08:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69d828c8fa7251682e0c6f85/71100471-84af-4d99-9d9c-f433f0ba1013.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Teams don’t lose data—they lose reasoning. A team selects Site 7. The system logs it. But six months later, no one remembers why. That missing “why” is where institutional knowledge actually lives.</p>
<p>This is the difference between an app that <em>records</em> decisions and one that <em>learns</em> from them.</p>
<p>This post walks through how to build the latter end-to-end. We'll intercept the moment a team makes a selection, use an AI agent to capture the rationale in natural conversation, store those decision records in Amazon Bedrock Knowledge Bases, and surface that institutional memory to future teams — all inside the same app they're already using. The stack is React/Node.js on the frontend, Python 3.12+ on the backend, and AWS-native services throughout.</p>
<hr />
<h2>The Problem with Unstructured Institutional Knowledge</h2>
<p>Most enterprise apps are good at capturing <em>what</em> happened. They're terrible at capturing <em>why</em>.</p>
<p>In the research site selection context, you might have a database full of selections — site IDs, dates, teams, outcomes — but none of the reasoning that shaped those choices. That reasoning is exactly what makes institutional knowledge valuable. It's the difference between "Team B selected Site 7" and "Team B selected Site 7 because the proximity to the river introduced too much ambient noise at Sites 3 and 12, and the permit turnaround at Site 7 is historically two weeks faster."</p>
<p>The challenge is capturing that reasoning without burdening the team. A form that pops up asking "Why did you choose this site?" will get abandoned inside a week. The better approach is to use the AI agent that's already in the workflow to conduct a brief, conversational debrief — three or four targeted questions, 90 seconds of the team's time — and then automatically package and index that reasoning so it's available to everyone who comes after them.</p>
<hr />
<h2>The Institutional Memory Loop</h2>
<p>The core concept is a self-reinforcing cycle we'll call the <strong>Institutional Memory Loop</strong>. The same agent that helps a team <em>make</em> a decision is also the one that <em>captures</em> the reasoning behind it — and later <em>surfaces</em> that reasoning to the next team facing the same choice. The knowledge base is what closes the loop.</p>
<p><img src="https://raw.githubusercontent.com/jrgwv/blog-assets/main/diagrams/knowledge-bases/institutional-memory-loop.png" alt="The Institutional Memory Loop" /></p>
<p>The flow has two phases. In the <strong>capture phase</strong> (top row), a team selects a site, the agent conducts a 90-second conversational debrief, and a structured decision record lands in S3. An event notification triggers a Bedrock KB ingestion job, embedding the document into an OpenSearch Serverless vector store. In the <strong>retrieve phase</strong> (bottom row), the next team's agent query hits the knowledge base, pulls relevant past decisions, synthesizes them with Claude, and surfaces the context in the same UI — attributed, with provenance. The app the next team uses looks and feels identical to the one the previous team used. The difference is that it now knows things.</p>
<hr />
<h2>Step 1: Setting Up the Bedrock Knowledge Base</h2>
<p>Before wiring up the capture pipeline, you need a knowledge base to receive documents. Amazon Bedrock Knowledge Bases manages the entire RAG infrastructure — chunking, embedding, and storing documents in a vector store — so you don't need to operate OpenSearch directly.</p>
<h3>Create the S3 Bucket and IAM Role</h3>
<pre><code class="language-python"># infrastructure/setup_knowledge_base.py
import boto3
import json

s3 = boto3.client("s3", region_name="us-east-1")
iam = boto3.client("iam", region_name="us-east-1")
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

BUCKET_NAME = "research-decision-records-prod"
KNOWLEDGE_BASE_NAME = "site-selection-decisions-kb"
EMBEDDING_MODEL_ARN = (
    "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
)

# Create the S3 bucket for decision records
s3.create_bucket(Bucket=BUCKET_NAME)

s3.put_bucket_versioning(
    Bucket=BUCKET_NAME,
    VersioningConfiguration={"Status": "Enabled"},
)

# Trust policy that lets the Bedrock service assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

role = iam.create_role(
    RoleName="BedrockKBRole-SiteDecisions",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

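# NOTE: S3 read access alone is not enough for ingestion. The role also needs
# bedrock:InvokeModel on the embedding model and aoss:APIAccessAll on the collection.
iam.attach_role_policy(
    RoleName="BedrockKBRole-SiteDecisions",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)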

# Create the Bedrock Knowledge Base with OpenSearch Serverless
kb_response = bedrock_agent.create_knowledge_base(
    name=KNOWLEDGE_BASE_NAME,
    description="Institutional knowledge from research site selection decisions",
    roleArn=role["Role"]["Arn"],
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": EMBEDDING_MODEL_ARN,
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/COLLECTION_ID",
            "vectorIndexName": "site-decisions-index",
            "fieldMapping": {
                "vectorField": "embedding",
                "textField": "text",
                "metadataField": "metadata",
            },
        },
    },
)

knowledge_base_id = kb_response["knowledgeBase"]["knowledgeBaseId"]

# Add the S3 data source
bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name="decision-records-s3",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {
            "bucketArn": f"arn:aws:s3:::{BUCKET_NAME}",
            "inclusionPrefixes": ["decision-records/"],
        },
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": 300,
                "bufferSize": 1,
                "breakpointPercentileThreshold": 95,
            },
        }
    },
)

print(f"Knowledge Base ID: {knowledge_base_id}")
</code></pre>
<p>A few things worth calling out here. The <code>SEMANTIC</code> chunking strategy is intentional — decision records are narrative text, and semantic chunking preserves the logical boundaries of the reasoning rather than slicing arbitrarily by token count. You get better retrieval quality on conversational documents this way.</p>
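<p>To make the breakpoint idea concrete, here is a toy illustration of percentile-based splitting. It is not Bedrock's implementation: word overlap stands in for embedding distance, <code>bufferSize</code> and <code>maxTokens</code> are not modelled, and <code>sentence_distance</code>, <code>percentile</code>, and <code>semantic_chunks</code> are hypothetical helpers written only for this sketch.</p>
<pre><code class="language-python">import math


def sentence_distance(a, b):
    """Jaccard distance between word sets; a crude stand-in for embedding distance."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a.union(words_b)
    if not union:
        return 0.0
    return 1.0 - len(words_a.intersection(words_b)) / len(union)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def semantic_chunks(sentences, threshold_pct=95):
    """Split where the jump between neighbouring sentences reaches the percentile cutoff."""
    if not sentences:
        return []
    distances = [sentence_distance(a, b) for a, b in zip(sentences, sentences[1:])]
    if not distances:
        return [list(sentences)]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance >= cutoff:
            # The topic shifted sharply here, so close the current chunk.
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


record_sentences = [
    "The river noise is loud at Site 3.",
    "River noise also affects Site 12.",
    "Permit turnaround at Site 7 is two weeks faster.",
]
for chunk in semantic_chunks(record_sentences):
    print(chunk)
</code></pre>
<p>The two river-noise sentences stay together and the permit sentence starts a new chunk, which is the behavior you want for decision records: one line of reasoning per chunk.</p>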
<p><strong>Why OpenSearch Serverless over managed OpenSearch Service?</strong> At this stage of the project, you don't need the operational depth that full OpenSearch gives you. Managed OpenSearch requires you to provision and right-size clusters, configure shard counts and replica topology, manage index lifecycle policies, and handle version upgrades. For a knowledge base that grows incrementally, at a few decision records per week, that's overhead with no payoff. OpenSearch Serverless scales on demand and has zero cluster management surface. If you create the knowledge base through the console's quick-create flow, Bedrock provisions the Serverless collection and index for you; when you call <code>create_knowledge_base</code> directly, as in the script above, the collection and vector index referenced by <code>collectionArn</code> and <code>vectorIndexName</code> must already exist. Either way, ingestion and retrieval go through Bedrock, so you never query the vector store yourself. If you later need custom analyzers, fine-grained index control, or extremely high-throughput ingest, graduating to managed OpenSearch is a straightforward migration, but start serverless and earn that complexity.</p>
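<p>When the knowledge base is created through the API, the index named in <code>vectorIndexName</code> is expected to exist before <code>create_knowledge_base</code> runs. The sketch below builds an index body whose field names match the <code>fieldMapping</code> in the script above. It assumes Titan Text Embeddings V2 at its default 1,024 dimensions, the <code>faiss</code> engine, and Euclidean distance; <code>build_vector_index_body</code> is a hypothetical helper, and the resulting dict would be passed to whichever OpenSearch client you use to create the index.</p>
<pre><code class="language-python">import json


def build_vector_index_body(
    vector_field="embedding",
    text_field="text",
    metadata_field="metadata",
    dimension=1024,  # assumption: Titan Text Embeddings V2 default output size
):
    """Return an index definition with one k-NN vector field and two text fields."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                # Chunk text that Bedrock returns at retrieval time
                text_field: {"type": "text"},
                # Bedrock-managed metadata blob; stored but not searched
                metadata_field: {"type": "text", "index": False},
            }
        },
    }


print(json.dumps(build_vector_index_body(), indent=2))
</code></pre>
<p>If you change the embedding model or its output size, the <code>dimension</code> value has to change with it, since the index rejects vectors of a different length.</p>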
<hr />
<h2>Step 2: The AI-Assisted Capture Flow</h2>
<p>This is the most important piece of the puzzle, because all the infrastructure in the world is worthless if teams don't actually generate decision records.</p>
<p>The approach here is to hook into the existing site selection flow. When a team confirms a site selection in the React app, the frontend emits a <code>SITE_SELECTED</code> event to the backend, which spins up a short debrief conversation via the AI agent. The agent asks three to four targeted questions, collects the answers, and then formats a structured decision record document automatically. The team never fills out a form.</p>
<h3>React: Triggering the Capture Flow</h3>
<pre><code class="language-typescript">// src/hooks/useDecisionCapture.ts
import { useState, useCallback } from "react";

interface CaptureSession {
  sessionId: string;
  messages: Array&lt;{ role: "assistant" | "user"; content: string }&gt;;
  isComplete: boolean;
}

export function useDecisionCapture() {
  const [session, setSession] = useState&lt;CaptureSession | null&gt;(null);
  const [isLoading, setIsLoading] = useState(false);

  const startCapture = useCallback(
    async (siteId: string, selectionId: string) =&gt; {
      setIsLoading(true);
      try {
        const response = await fetch("/api/decisions/start-capture", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ siteId, selectionId }),
        });
        const data = await response.json();
        setSession({
          sessionId: data.sessionId,
          messages: [{ role: "assistant", content: data.firstQuestion }],
          isComplete: false,
        });
      } finally {
        setIsLoading(false);
      }
    },
    []
  );

  const sendAnswer = useCallback(
    async (answer: string) =&gt; {
      if (!session) return;
      setIsLoading(true);
      try {
        const response = await fetch("/api/decisions/respond", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ sessionId: session.sessionId, answer }),
        });
        const data = await response.json();
        setSession((prev) =&gt;
          prev
            ? {
                ...prev,
                messages: [
                  ...prev.messages,
                  { role: "user", content: answer },
                  { role: "assistant", content: data.nextMessage },
                ],
                isComplete: data.isComplete,
              }
            : null
        );
      } finally {
        setIsLoading(false);
      }
    },
    [session]
  );

  return { session, isLoading, startCapture, sendAnswer };
}
</code></pre>
<pre><code class="language-tsx">// src/components/DecisionCapture.tsx
import { useEffect, useState } from "react";
import { useDecisionCapture } from "../hooks/useDecisionCapture";

interface Props {
  siteId: string;
  selectionId: string;
  onComplete: () =&gt; void;
}

export function DecisionCapture({ siteId, selectionId, onComplete }: Props) {
  const { session, isLoading, startCapture, sendAnswer } = useDecisionCapture();
  const [input, setInput] = useState("");

  useEffect(() =&gt; {
    startCapture(siteId, selectionId);
  }, [siteId, selectionId]);

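  useEffect(() =&gt; {
    if (!session?.isComplete) return;
    // Clear the timer if the panel unmounts before it fires
    const timer = setTimeout(onComplete, 1500);
    return () =&gt; clearTimeout(timer);
  }, [session?.isComplete]);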

  return (
    &lt;div className="decision-capture-panel"&gt;
      &lt;h3&gt;Help us capture why you chose this site&lt;/h3&gt;
      &lt;p className="subtext"&gt;
        Just a few quick questions — your answers build the team's knowledge base
      &lt;/p&gt;
      &lt;div className="messages"&gt;
        {session?.messages.map((msg, i) =&gt; (
          &lt;div key={i} className={`message ${msg.role}`}&gt;
            {msg.content}
          &lt;/div&gt;
        ))}
      &lt;/div&gt;
      {!session?.isComplete &amp;&amp; (
        &lt;form
          onSubmit={(e) =&gt; {
            e.preventDefault();
            sendAnswer(input);
            setInput("");
          }}
        &gt;
          &lt;input
            value={input}
            onChange={(e) =&gt; setInput(e.target.value)}
            placeholder="Type your answer..."
            disabled={isLoading}
          /&gt;
          &lt;button type="submit" disabled={isLoading || !input.trim()}&gt;
            Send
          &lt;/button&gt;
        &lt;/form&gt;
      )}
      {session?.isComplete &amp;&amp; (
        &lt;p className="complete"&gt;✓ Decision captured — thanks!&lt;/p&gt;
      )}
    &lt;/div&gt;
  );
}
</code></pre>
<p>The key UX principle here is framing. "Help us capture why you chose this site" is far less threatening than "Fill out this form." Keeping the capture panel lightweight and conversational is what gets adoption.</p>
<h3>Node.js: API Routes for the Capture Session</h3>
<p>The Node.js layer is thin — its job is to create a session ID, proxy requests to the Python Lambda, and clean up the session map when the capture completes. Two routes: <code>POST /api/decisions/start-capture</code> initializes the session and returns the first question, and <code>POST /api/decisions/respond</code> passes each answer through and returns the next prompt or the completion signal.</p>
<pre><code class="language-typescript">// src/api/decisions.ts (Express routes — full implementation in repo)
import express from "express";
import { v4 as uuidv4 } from "uuid";
import { Lambda } from "@aws-sdk/client-lambda";

const router = express.Router();
const lambda = new Lambda({ region: "us-east-1" });

// In production, replace this Map with DynamoDB or ElastiCache
const captureSessions = new Map&lt;string, { siteId: string; selectionId: string }&gt;();

router.post("/start-capture", async (req, res) =&gt; {
  const { siteId, selectionId } = req.body;
  const sessionId = uuidv4();
  captureSessions.set(sessionId, { siteId, selectionId });

  const result = await lambda.invoke({
    FunctionName: "decision-capture-handler",
    Payload: JSON.stringify({ action: "start", sessionId, siteId, selectionId }),
  });

  const response = JSON.parse(Buffer.from(result.Payload!).toString());
  res.json({ sessionId, firstQuestion: response.message });
});

router.post("/respond", async (req, res) =&gt; {
  const { sessionId, answer } = req.body;
  const sessionData = captureSessions.get(sessionId);
  if (!sessionData) return res.status(404).json({ error: "Session not found" });

  const result = await lambda.invoke({
    FunctionName: "decision-capture-handler",
    Payload: JSON.stringify({ action: "respond", sessionId, answer, ...sessionData }),
  });

  const response = JSON.parse(Buffer.from(result.Payload!).toString());
  if (response.isComplete) captureSessions.delete(sessionId);
  res.json(response);
});

export default router;
</code></pre>
<hr />
<h2>Step 3: The Python Lambda — Conducting the Capture Interview</h2>
<p>This Lambda is the heart of the system. It uses the Bedrock Converse API to run a multi-turn conversation with a specific goal: extract the team's decision rationale and package it into a structured document.</p>
<pre><code class="language-python"># lambdas/decision_capture_handler/handler.py
import json
import os
import boto3
from datetime import datetime, timezone
from typing import Any

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

SESSIONS_TABLE = dynamodb.Table(os.environ["SESSIONS_TABLE"])
DECISION_RECORDS_BUCKET = os.environ["DECISION_RECORDS_BUCKET"]
MODEL_ID = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"

CAPTURE_SYSTEM_PROMPT = """You are a knowledge capture specialist embedded in a research site selection application. 
Your job is to conduct a brief, friendly debrief with researchers after they select a study site.

Your goal is to extract, in natural conversation, the following information:
1. The primary factors that made this site the right choice
2. The main alternatives that were seriously considered and why they were passed over
3. Any constraints or risks that influenced the decision
4. What success looks like for this site selection

Rules:
- Ask one question at a time, never multiple at once
- Keep your questions concise and specific — researchers are busy
- After the researcher answers your 4th question, output a special JSON block formatted EXACTLY like this:

&lt;DECISION_RECORD&gt;
{
  "primary_factors": "summary of key selection factors",
  "alternatives_considered": "summary of alternatives and why they were rejected",
  "constraints_and_risks": "summary of constraints or risks",
  "success_criteria": "what success looks like",
  "raw_conversation_summary": "brief narrative summary of the full conversation"
}
&lt;/DECISION_RECORD&gt;

Then end with a brief, warm closing message thanking the researcher.

Start your first message with a single, direct opening question about the primary factors behind their choice. 
Do NOT introduce yourself or explain what you're doing — just ask the question naturally."""


def lambda_handler(event: dict[str, Any], context: Any) -&gt; dict[str, Any]:
    action = event["action"]
    session_id = event["sessionId"]
    site_id = event["siteId"]
    selection_id = event["selectionId"]

    if action == "start":
        return handle_start(session_id, site_id, selection_id)
    elif action == "respond":
        return handle_respond(session_id, event["answer"], site_id, selection_id)

    raise ValueError(f"Unknown action: {action}")


def handle_start(session_id: str, site_id: str, selection_id: str) -&gt; dict:
    """Initialize a capture session and get the first question."""
    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": CAPTURE_SYSTEM_PROMPT}],
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "text": (
                            f"A researcher just selected Site {site_id} for their study. "
                            "Please begin the debrief."
                        )
                    }
                ],
            }
        ],
        inferenceConfig={"maxTokens": 300, "temperature": 0.3},
    )

    first_question = response["output"]["message"]["content"][0]["text"]

    # Persist the conversation history to DynamoDB
    SESSIONS_TABLE.put_item(
        Item={
            "sessionId": session_id,
            "siteId": site_id,
            "selectionId": selection_id,
            "messages": [
                {
                    "role": "user",
                    "content": (
                        f"A researcher just selected Site {site_id} for their study. "
                        "Please begin the debrief."
                    ),
                },
                {"role": "assistant", "content": first_question},
            ],
            "questionCount": 1,
            "ttl": int(datetime.now(timezone.utc).timestamp()) + 3600,  # 1hr TTL
        }
    )

    return {"message": first_question, "isComplete": False}


def handle_respond(
    session_id: str, answer: str, site_id: str, selection_id: str
) -&gt; dict:
    """Process a researcher's answer and return the next question or finalize."""
    session = SESSIONS_TABLE.get_item(Key={"sessionId": session_id})["Item"]
    messages = session["messages"]
    question_count = int(session["questionCount"])

    # Append the researcher's answer
    messages.append({"role": "user", "content": answer})

    # Build the Converse messages payload
    converse_messages = [
        {"role": msg["role"], "content": [{"text": msg["content"]}]}
        for msg in messages
    ]

    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": CAPTURE_SYSTEM_PROMPT}],
        messages=converse_messages,
        inferenceConfig={"maxTokens": 600, "temperature": 0.3},
    )

    assistant_message = response["output"]["message"]["content"][0]["text"]
    messages.append({"role": "assistant", "content": assistant_message})

    # Check if the model has produced a completed decision record
    is_complete = "&lt;DECISION_RECORD&gt;" in assistant_message

    if is_complete:
        decision_data = extract_decision_record(assistant_message)
        closing_message = assistant_message.split("&lt;/DECISION_RECORD&gt;")[-1].strip()

        # Write the decision record to S3 (a blocking put_object call)
        write_decision_record(
            site_id=site_id,
            selection_id=selection_id,
            decision_data=decision_data,
            full_messages=messages,
        )

        # Update session state
        SESSIONS_TABLE.update_item(
            Key={"sessionId": session_id},
            UpdateExpression="SET messages = :m, questionCount = :q",
            ExpressionAttributeValues={":m": messages, ":q": question_count + 1},
        )

        return {"message": closing_message or "Thanks! Decision captured.", "isComplete": True}

    # Update session and continue the conversation
    SESSIONS_TABLE.update_item(
        Key={"sessionId": session_id},
        UpdateExpression="SET messages = :m, questionCount = :q",
        ExpressionAttributeValues={":m": messages, ":q": question_count + 1},
    )

    return {"message": assistant_message, "isComplete": False}


def extract_decision_record(text: str) -&gt; dict:
    """Parse the JSON block the model produces when capture is complete."""
    start = text.find("&lt;DECISION_RECORD&gt;") + len("&lt;DECISION_RECORD&gt;")
    end = text.find("&lt;/DECISION_RECORD&gt;")
    json_block = text[start:end].strip()
    return json.loads(json_block)


def write_decision_record(
    site_id: str,
    selection_id: str,
    decision_data: dict,
    full_messages: list,
) -&gt; None:
    """Format and upload the decision record to S3 for KB ingestion."""
    timestamp = datetime.now(timezone.utc).isoformat()

    # Format as a rich markdown document — narrative text chunks better for RAG
    document = f"""# Site Selection Decision Record

**Site ID:** {site_id}
**Selection ID:** {selection_id}
**Captured At:** {timestamp}

## Why This Site Was Selected

{decision_data.get("primary_factors", "Not captured")}

## Alternatives Considered

{decision_data.get("alternatives_considered", "Not captured")}

## Constraints and Risks Acknowledged

{decision_data.get("constraints_and_risks", "Not captured")}

## Success Criteria

{decision_data.get("success_criteria", "Not captured")}

## Summary

{decision_data.get("raw_conversation_summary", "Not captured")}
"""

    s3_key = f"decision-records/site-{site_id}/{selection_id}.md"

    s3.put_object(
        Bucket=DECISION_RECORDS_BUCKET,
        Key=s3_key,
        Body=document.encode("utf-8"),
        ContentType="text/markdown",
        Metadata={
            "site-id": site_id,
            "selection-id": selection_id,
            "captured-at": timestamp,
        },
    )
</code></pre>
<p>A few design decisions here worth explaining. First, conversations are persisted to DynamoDB with a 1-hour TTL — this keeps the Lambda stateless while maintaining context across the multi-turn conversation without expensive in-memory solutions.</p>
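<p>Two details of that session store are worth guarding against. DynamoDB's TTL deletion runs in the background and can lag behind the expiry timestamp, and <code>get_item</code> returns no <code>Item</code> key once a session is gone. Below is a minimal sketch of a guarded read, assuming the table schema used in the handler; <code>load_active_session</code> is a hypothetical helper, and <code>FakeTable</code> exists only so the example runs without AWS.</p>
<pre><code class="language-python">import time


def load_active_session(table, session_id, now=None):
    """Return the session item, or None when it is missing or past its TTL."""
    now = int(time.time()) if now is None else now
    item = table.get_item(Key={"sessionId": session_id}).get("Item")
    if item is None:
        return None
    # TTL deletes are not immediate, so treat expired-but-present items as gone.
    if now >= int(item.get("ttl", 0)):
        return None
    return item


class FakeTable:
    """In-memory stand-in for a boto3 DynamoDB Table, used only for this illustration."""

    def __init__(self, items):
        self.items = items

    def get_item(self, Key):
        item = self.items.get(Key["sessionId"])
        return {"Item": item} if item else {}


table = FakeTable({"abc": {"sessionId": "abc", "ttl": 2000, "messages": []}})
print(load_active_session(table, "abc", now=1000))      # still active
print(load_active_session(table, "abc", now=3000))      # expired
print(load_active_session(table, "missing", now=1000))  # never existed
</code></pre>
<p>In <code>handle_respond</code>, a <code>None</code> result could map to a "session expired" response so the frontend can restart the debrief instead of surfacing a Lambda error.</p>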
<p><strong>A note on DynamoDB vs. Amazon Bedrock AgentCore Memory:</strong> AgentCore Memory (a component of Amazon Bedrock AgentCore) is purpose-built for persistent, cross-session agent memory. It's the right tool when an agent needs to remember facts about a user or ongoing project across many separate conversations over weeks or months. For this use case, each capture session is short-lived (under 10 minutes) and completely self-contained. You just need a scratchpad to hold the conversation history between Lambda invocations for a single session, and then you're done with it. DynamoDB with a 1-hour TTL is cheaper, simpler, already in most teams' AWS footprints, and doesn't add a dependency on a newer managed service. That said, if you later extend this system so the AI agent builds a persistent <em>profile</em> of each researcher's preferences and patterns over time, that's exactly the use case AgentCore Memory is designed for, and it would be worth revisiting then.</p>
<p>Second, the decision record is formatted as markdown rather than JSON because narrative text with headings tends to chunk and embed better for retrieval than nested key-value data. The headers give the embeddings semantic structure without requiring you to write a custom chunking strategy. Third, the handler treats the <code>&lt;DECISION_RECORD&gt;</code> XML tag as the completion signal instead of enforcing a question count in code. The prompt targets four questions, and <code>questionCount</code> is stored only for bookkeeping, so the conversation ends when the model emits the record rather than when a counter runs out, and researchers who are more or less verbose are not cut off mid-thought.</p>
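<p>Because the completion signal is model output, the parsing step benefits from being defensive. The sketch below is one possible hardening, assuming the model may occasionally wrap the JSON in a Markdown fence or leave out a field; <code>parse_decision_record</code> and <code>REQUIRED_FIELDS</code> are hypothetical names, not part of the handler above. It reads the first JSON object in the message rather than slicing between tags, so trailing text after the record does not break parsing.</p>
<pre><code class="language-python">import json

REQUIRED_FIELDS = (
    "primary_factors",
    "alternatives_considered",
    "constraints_and_risks",
    "success_criteria",
    "raw_conversation_summary",
)


def parse_decision_record(text):
    """Return the decision record dict, or None if no complete record is present."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at `start` and ignores what follows.
        record, _ = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    missing = [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]
    return None if missing else record


sample = (
    "Here is the record:\n"
    + json.dumps({field: "example text" for field in REQUIRED_FIELDS})
    + "\nThanks for walking me through it!"
)
print(parse_decision_record(sample))
print(parse_decision_record("What alternatives did you consider?"))
</code></pre>
<p>A <code>None</code> result gives the handler a clean branch: it can ask the model to re-emit the record, or store the raw transcript for manual review, instead of failing the Lambda on a <code>JSONDecodeError</code>.</p>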
<hr />
<h2>Step 4: Triggering the Bedrock Knowledge Base Sync</h2>
<p>Once the decision record lands in S3, you need to ingest it into the knowledge base. You can either run a scheduled sync or trigger it on every new document. For this use case — where capture happens infrequently but recency matters — an S3 event-triggered sync is the right call.</p>
<pre><code class="language-python"># lambdas/kb_sync_trigger/handler.py
import os
import boto3
from typing import Any

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

KNOWLEDGE_BASE_ID = os.environ["KNOWLEDGE_BASE_ID"]
DATA_SOURCE_ID = os.environ["DATA_SOURCE_ID"]


def lambda_handler(event: dict[str, Any], context: Any) -&gt; None:
    """Triggered by S3 ObjectCreated events under decision-records/"""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New decision record: s3://{bucket}/{key}")

    # One ingestion job per invocation is enough: Bedrock picks up all
    # new/modified S3 objects, and a data source runs one job at a time.
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
    )

    job_id = response["ingestionJob"]["ingestionJobId"]
    print(f"Started KB ingestion job: {job_id}")
</code></pre>
<p>Wire this Lambda up with an S3 event notification on <code>ObjectCreated</code> events under the <code>decision-records/</code> prefix. The ingestion job runs asynchronously — typically takes 30 to 90 seconds for small document sets — so the knowledge base is updated within a couple of minutes of each capture.</p>
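<p>If two records land close together, the second <code>start_ingestion_job</code> call can be rejected while the first job is still running. Here is a hedged sketch of tolerating that case. It assumes the rejection surfaces as a botocore <code>ClientError</code> whose error code is <code>ConflictException</code>; <code>start_sync_if_idle</code> is a hypothetical wrapper, and <code>FakeConflict</code> and <code>BusyClient</code> exist only so the example runs without AWS.</p>
<pre><code class="language-python">def start_sync_if_idle(client, knowledge_base_id, data_source_id):
    """Start an ingestion job; return its ID, or None if one is already running."""
    try:
        response = client.start_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
        )
    except Exception as exc:
        # botocore's ClientError carries the service error in exc.response.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "ConflictException":
            return None
        raise
    return response["ingestionJob"]["ingestionJobId"]


class FakeConflict(Exception):
    """Mimics the shape of botocore.exceptions.ClientError for this sketch."""

    response = {"Error": {"Code": "ConflictException"}}


class BusyClient:
    """Stand-in for a bedrock-agent client whose data source is mid-ingestion."""

    def start_ingestion_job(self, **kwargs):
        raise FakeConflict()


print(start_sync_if_idle(BusyClient(), "KB_ID", "DS_ID"))
</code></pre>
<p>A skipped sync means the newest record waits for a later job, so pairing this with a scheduled catch-up sync or a delayed retry is a reasonable complement.</p>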
<p>For the S3 notification configuration in CDK or CloudFormation:</p>
<pre><code class="language-python"># infrastructure/app_stack.py (CDK)
from aws_cdk import (
    aws_s3 as s3,
    aws_s3_notifications as s3_notifications,
    aws_lambda as lambda_,
)

decision_bucket.add_event_notification(
    s3.EventType.OBJECT_CREATED,
    s3_notifications.LambdaDestination(kb_sync_lambda),
    s3.NotificationKeyFilter(prefix="decision-records/"),
)
</code></pre>
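<p>One detail to watch on the receiving side: object keys arrive URL-encoded in S3 event notifications, so a key containing a space or other special character will not match the object name until it is decoded. Below is a small sketch of decoding and filtering the records before starting a sync; <code>decision_record_keys</code> is a hypothetical helper, and the <code>.md</code> suffix check assumes the key layout used in Step 3.</p>
<pre><code class="language-python">from urllib.parse import unquote_plus


def decision_record_keys(event, prefix="decision-records/", suffix=".md"):
    """Return decoded keys of the Markdown decision records in an S3 event."""
    keys = []
    for record in event.get("Records", []):
        # S3 encodes spaces as "+" and other characters as percent escapes.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix) and key.endswith(suffix):
            keys.append(key)
    return keys


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "decision-records/site-North+Ridge/sel-42.md"}}},
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "decision-records/site-7/notes.txt"}}},
    ]
}
print(decision_record_keys(sample_event))
</code></pre>
<p>Returning an empty list gives the sync Lambda a cheap way to exit early when an event contains nothing worth ingesting.</p>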
<hr />
<h2>Step 5: Querying the Knowledge Base in the Agent</h2>
<p>Now comes the payoff. When a team is working through site selection and the agent is providing context, it should be able to retrieve relevant past decisions and surface them naturally. Here's how to query the knowledge base and integrate the results into the agent's response.</p>
<pre><code class="language-python"># lambdas/site_selection_agent/knowledge_retriever.py
import os
import boto3
from dataclasses import dataclass

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

KNOWLEDGE_BASE_ID = os.environ["KNOWLEDGE_BASE_ID"]


@dataclass
class DecisionContext:
    relevant_decisions: list[str]
    sites_referenced: list[str]
    summary: str


def retrieve_relevant_decisions(query: str, site_ids: list[str]) -&gt; DecisionContext:
    """
    Retrieve past decision records relevant to the current selection context.
    
    Args:
        query: Natural language description of the current selection scenario
        site_ids: Site IDs being considered in the current decision
    
    Returns:
        DecisionContext with retrieved passages and a synthesized summary
    """
    # Build a rich retrieval query
    retrieval_query = (
        f"Research site selection decision: {query}. "
        f"Sites being evaluated: {', '.join(site_ids)}. "
        "What factors influenced past selections of these or similar sites? "
        "What alternatives were considered and why were they rejected?"
    )

    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": retrieval_query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                "overrideSearchType": "HYBRID",  # semantic + keyword
            }
        },
    )

    results = response["retrievalResults"]

    if not results:
        return DecisionContext(
            relevant_decisions=[],
            sites_referenced=[],
            summary="No prior decision records found for these sites.",
        )

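    passages = [r["content"]["text"] for r in results]
    # "site-id" is only present if each record has a .metadata.json sidecar in S3;
    # S3 object metadata set via put_object is not indexed by the knowledge base.
    sites_referenced = list({
        r["metadata"].get("site-id", "unknown")
        for r in results
        if "metadata" in r
    })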

    # Use Claude to synthesize the retrieved passages into useful context
    synthesis_prompt = f"""You are summarizing past research site selection decisions to help a team make a new decision.

Retrieved decision records:
{chr(10).join(f"---{chr(10)}{p}" for p in passages)}

Current selection context: {query}

Provide a concise summary (3-5 sentences) of what past decisions reveal about the sites being considered. 
Focus on factors that would be useful to the current team. Do not invent information not present in the records."""

    synthesis = bedrock.converse(
        modelId="global.anthropic.claude-haiku-4-5-20251001-v1:0",  # Haiku 4.5 via global inference profile
        messages=[{"role": "user", "content": [{"text": synthesis_prompt}]}],
        inferenceConfig={"maxTokens": 400, "temperature": 0.1},
    )

    return DecisionContext(
        relevant_decisions=passages,
        sites_referenced=sites_referenced,
        summary=synthesis["output"]["message"]["content"][0]["text"],
    )
</code></pre>
<pre><code class="language-python"># lambdas/site_selection_agent/handler.py
import json
import os
import boto3
from knowledge_retriever import retrieve_relevant_decisions

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"

AGENT_SYSTEM_PROMPT = """You are a research site selection assistant. You help research teams 
evaluate and select the most appropriate sites for their studies.

When you have access to past decision context, you should reference it explicitly — 
e.g., "Based on past selections, teams have found that Site 7's proximity to..."

Always cite the institutional context you're drawing on so teams know this is validated 
experience rather than general advice."""


def lambda_handler(event: dict, context) -&gt; dict:
    user_message = event["message"]
    site_ids = event.get("siteIds", [])
    conversation_history = event.get("history", [])

    # Retrieve relevant past decisions from the knowledge base
    decision_context = retrieve_relevant_decisions(
        query=user_message,
        site_ids=site_ids,
    )

    # Inject the retrieved context into the system prompt
    augmented_system = AGENT_SYSTEM_PROMPT
    if decision_context.relevant_decisions:
        augmented_system += f"""

## Institutional Knowledge from Past Decisions

{decision_context.summary}

The following sites have prior decision records available: {', '.join(decision_context.sites_referenced)}.
Use this context to ground your recommendations in your team's actual experience."""

    # Build the messages payload
    messages = conversation_history + [
        {"role": "user", "content": [{"text": user_message}]}
    ]

    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": augmented_system}],
        messages=messages,
        inferenceConfig={"maxTokens": 800, "temperature": 0.4},
    )

    assistant_reply = response["output"]["message"]["content"][0]["text"]

    return {
        "reply": assistant_reply,
        "contextUsed": len(decision_context.relevant_decisions) &gt; 0,
        "sitesWithHistory": decision_context.sites_referenced,
    }
</code></pre>
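<p>The <code>HYBRID</code> search mode in the retrieval call is deliberate. Pure semantic search handles conceptual similarity well but can miss exact site ID matches when a team asks specifically about Site 7. Hybrid search combines vector similarity with BM25 keyword matching, so you get both the semantic richness and the exact-match precision. Note that <code>HYBRID</code> is only available when the knowledge base's vector store supports it (Amazon OpenSearch Serverless does); on other stores, omit <code>overrideSearchType</code> and let Bedrock pick the search strategy.</p>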
<hr />
<h2>Step 6: Surfacing the Context in the UI</h2>
<p>The final piece is making the retrieved institutional knowledge visible to the team in a way that builds trust and encourages continued use of the capture feature.</p>
<pre><code class="language-tsx">// src/components/SiteSelectionAgent.tsx
interface AgentResponse {
  reply: string;
  contextUsed: boolean;
  sitesWithHistory: string[];
}

function AgentResponseCard({ response }: { response: AgentResponse }) {
  return (
    &lt;div className="agent-response"&gt;
      &lt;p&gt;{response.reply}&lt;/p&gt;
      {response.contextUsed &amp;&amp; (
        &lt;div className="context-badge"&gt;
          &lt;span className="icon"&gt;🗂&lt;/span&gt;
          &lt;span&gt;
            Drawing on past decisions for{" "}
            {response.sitesWithHistory.join(", ")}
          &lt;/span&gt;
        &lt;/div&gt;
      )}
    &lt;/div&gt;
  );
}
</code></pre>
<p>The <code>contextUsed</code> badge is small but important. It tells the team that the agent's response is grounded in their organization's actual history, not just general knowledge. That provenance signal is what builds confidence in the system over time — and what motivates teams to keep doing the 90-second debrief.</p>
<hr />
<h2>Best Practices and Common Pitfalls</h2>
<p><strong>Document formatting matters more than you'd think.</strong> The Bedrock Knowledge Base will chunk whatever you put in S3, and how you format that content directly affects retrieval quality. Markdown with descriptive headers (<code>## Why This Site Was Selected</code>) gives the embedding model semantic anchors. A flat JSON blob gives it nothing. Always store decision records as human-readable narrative text.</p>
<p><strong>Keep capture sessions short and the questions specific.</strong> The system prompt used here targets four questions. You can tune this, but resist the temptation to ask for more. Teams that feel the debrief is too long will start dismissing the capture panel. Four focused questions with follow-ups beat ten exhaustive ones that go unanswered.</p>
<p><strong>Use metadata filters for scoping.</strong> As the knowledge base grows, you'll want queries scoped to specific time ranges, research programs, or geographic regions. Bedrock Knowledge Bases reads filterable attributes from a <code>.metadata.json</code> sidecar file stored next to each document in S3, not from S3 object metadata. Store those attributes there and pass a <code>filter</code> inside the <code>vectorSearchConfiguration</code> of your <code>retrieve()</code> calls to prevent knowledge from one program contaminating recommendations for another.</p>
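<p>As a rough sketch of what that scoping can look like, the helpers below build the sidecar metadata document and a matching <code>retrievalConfiguration</code>. The attribute names (<code>program</code>, <code>decision_year</code>) and the helper names are illustrative assumptions, not part of the app above; the filter operators (<code>andAll</code>, <code>equals</code>, <code>greaterThanOrEquals</code>) come from the Bedrock <code>Retrieve</code> API.</p>
<pre><code class="language-python">import json


def build_sidecar_metadata(program, decision_year):
    """Body of the .metadata.json sidecar file stored next to a document in S3."""
    # Attribute names are hypothetical; use whatever dimensions you need to scope by.
    return json.dumps(
        {"metadataAttributes": {"program": program, "decision_year": decision_year}}
    )


def build_scoped_retrieval_config(program, min_year, number_of_results=5):
    """retrievalConfiguration that only matches one program from min_year onward."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "filter": {
                "andAll": [
                    {"equals": {"key": "program", "value": program}},
                    {"greaterThanOrEquals": {"key": "decision_year", "value": min_year}},
                ]
            },
        }
    }


config = build_scoped_retrieval_config("oncology", 2024)
# Pass it to the existing call:
# bedrock_agent_runtime.retrieve(
#     knowledgeBaseId=KNOWLEDGE_BASE_ID,
#     retrievalQuery={"text": retrieval_query},
#     retrievalConfiguration=config,
# )
</code></pre>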
<p><strong>Plan for knowledge base latency in the selection flow.</strong> The <code>retrieve()</code> call typically takes 300 to 700ms. For the selection assistant context panel — which is displayed while the team is evaluating options — this is fine. Don't put a KB retrieval call in the critical path of a page load.</p>
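<p>One way to keep retrieval off the critical path is to give it a time budget and fall back to an empty context when it overruns. This is a minimal sketch, assuming a hypothetical <code>retrieve_fn</code> callable that wraps the knowledge base call; the budget values are arbitrary placeholders.</p>
<pre><code class="language-python">import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool so a slow call keeps running in the background instead of blocking
_pool = ThreadPoolExecutor(max_workers=4)


def retrieve_with_budget(retrieve_fn, query, budget_seconds=1.0, fallback=None):
    """Run retrieve_fn(query), but give up waiting after budget_seconds."""
    future = _pool.submit(retrieve_fn, query)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        # The worker thread is not cancelled; we just stop waiting for it
        return fallback


def slow_retrieve(query):
    """Stand-in for a knowledge base call that is running slowly."""
    time.sleep(0.3)
    return ["late result"]


context = retrieve_with_budget(slow_retrieve, "Site 7", budget_seconds=0.05, fallback=[])
print(context)  # [] because the stand-in call overran its 50 ms budget
</code></pre>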
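<p><strong>Monitor ingestion job failures.</strong> The <code>start_ingestion_job</code> call is asynchronous: it returns as soon as the job is queued, not when ingestion finishes. Check the final status with <code>get_ingestion_job</code> and raise an alert (for example, a CloudWatch alarm on a metric you publish) when a job ends in <code>FAILED</code> or reports failed documents, so you know when a document failed to ingest. Document parsing errors are the most common culprit, usually from encoding issues in the uploaded text.</p>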
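<p>A minimal polling sketch for checking an ingestion job, assuming <code>client</code> is <code>boto3.client("bedrock-agent")</code>. The function name, poll interval, and retry count are placeholders, and a loop like this fits better in a Step Functions task or a scheduled check than inline in the capture Lambda.</p>
<pre><code class="language-python">import time


def wait_for_ingestion(client, knowledge_base_id, data_source_id, job_id,
                       poll_seconds=5.0, max_polls=60):
    """Poll get_ingestion_job until the job reaches a terminal state.

    Returns the final ingestionJob dict; raises if the job fails,
    is stopped, reports failed documents, or never finishes.
    """
    for _ in range(max_polls):
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]
        status = job["status"]
        if status == "COMPLETE":
            # A job can complete overall while individual documents failed to parse
            failed = job.get("statistics", {}).get("numberOfDocumentsFailed", 0)
            if failed:
                raise RuntimeError(f"{failed} document(s) failed to ingest")
            return job
        if status in ("FAILED", "STOPPED"):
            raise RuntimeError(f"Ingestion {status}: {job.get('failureReasons', [])}")
        time.sleep(poll_seconds)
    raise TimeoutError("Ingestion job did not finish within the polling window")
</code></pre>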
<p><strong>Seed the knowledge base before launch.</strong> If you have historical meeting notes, email threads, or any documentation about past site selections, process and upload them before the app goes live. A knowledge base that returns zero results for the first three months trains users that it's useless. Even 20 to 30 bootstrapped decision records make a significant difference to early retrieval quality.</p>
<hr />
<h2>Where This Pattern Applies</h2>
<p>Research site selection is the lens we used, but the Institutional Memory Loop is domain-agnostic. Any workflow where teams make high-stakes judgment calls that the organization wants to learn from over time is a candidate. A few concrete translations:</p>
<p><strong>Clinical trial site selection.</strong> The same problem, higher stakes. CROs and sponsors evaluate dozens of sites per study and rarely document why Site A got the nod over Site B — investigator experience, patient retention history, IRB turnaround, local regulatory climate. That reasoning is enormously valuable to the next team running a similar indication. The capture and retrieval architecture is identical; only the domain vocabulary in the system prompt changes.</p>
<p><strong>Architectural Decision Records (ADRs).</strong> Most engineering teams treat ADRs as a documentation chore that happens after the decision, if at all. Embedding a capture step directly into the pull request or design review workflow — where an agent asks "What alternatives did you consider and why did you rule them out?" — produces ADRs that are actually filled out, indexed, and retrievable when the next engineer is making the same tradeoff eighteen months later.</p>
<p><strong>Vendor and procurement selection.</strong> Procurement teams evaluate vendors constantly and keep almost none of the comparative reasoning. Why did Security team veto Vendor X? Why did the integration story on Vendor Y win over Vendor Z's lower price? That context prevents organizations from re-evaluating the same vendor over and over, or re-learning the same lessons about a category. A post-selection debrief in the procurement tool closes that loop.</p>
<p><strong>GTM and pricing decisions.</strong> Why did a drug go to market at $X? Why was a particular region sequenced before another? These decisions involve complex tradeoffs across regulatory, commercial, and manufacturing inputs, and the reasoning is almost never written down in a form that the next product team can find and learn from. A knowledge base seeded with past decision rationale gives new teams a starting point that isn't blank.</p>
<p>The common thread across all of these: the value isn't in the decision itself — it's in the reasoning that produced it. The Institutional Memory Loop is just a systematic way of making sure that reasoning survives the people who held it.</p>
<hr />
<h2>Wrapping Up</h2>
<p>Three things determine whether the Institutional Memory Loop actually works in practice: making capture frictionless (which is why the AI-assisted debrief beats a form), formatting documents for retrieval quality (which is why markdown with headers beats flat JSON), and making retrieved knowledge visible and attributed in the UI so teams trust it and keep feeding it.</p>
<p>Systems don’t become intelligent when they store more data. They become intelligent when they remember why decisions were made. The only way that happens is if you design for it.</p>
<hr />
<p><em>Code samples in this post use <code>global.anthropic.claude-sonnet-4-5-20250929-v1:0</code> (Claude Sonnet 4.5 global inference profile, launched September 30, 2025) for the main agent, <code>global.anthropic.claude-haiku-4-5-20251001-v1:0</code> (Claude Haiku 4.5 global inference profile) for the synthesis step, and <code>amazon.titan-embed-text-v2:0</code> for embeddings. The <code>global.</code> prefix routes requests across AWS regions automatically for higher availability — see the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles.html">Amazon Bedrock cross-region inference documentation</a> for details. Check the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html">Amazon Bedrock model IDs documentation</a> for the latest available model strings in your region.</em></p>
<p><em>Lambda functions are deployed on the <strong>Python 3.12</strong> runtime at minimum (identifier: <code>python3.12</code>, based on Amazon Linux 2023). Python 3.13 (<code>python3.13</code>, available on Lambda since November 2024) and Python 3.14 (<code>python3.14</code>, available on Lambda since November 2025) also run on Amazon Linux 2023; either is a solid choice if you want the latest language features. Python 3.11 and earlier use the older Amazon Linux 2 base and are approaching end of support, so there's no reason to target them for new projects. All code in this post is compatible with 3.12 through 3.14. Regardless of runtime version, package <code>boto3</code> in your deployment zip or use the <a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html">AWS-provided managed layer</a>.</em></p>
]]></content:encoded></item><item><title><![CDATA[Why Vector Search Alone Fails on Regulatory Documents ]]></title><description><![CDATA[TL;DR
Vector search is great at finding semantically similar content, but it struggles with broad keyword-heavy queries common in regulatory and clinical document search. By combining BM25 keyword mat]]></description><link>https://blog.gainesai.com/why-vector-search-alone-fails-on-regulatory-documents</link><guid isPermaLink="true">https://blog.gainesai.com/why-vector-search-alone-fails-on-regulatory-documents</guid><category><![CDATA[RAG ]]></category><category><![CDATA[hybrid search]]></category><category><![CDATA[opensearch]]></category><category><![CDATA[Amazon Bedrock]]></category><category><![CDATA[embedding]]></category><category><![CDATA[nlp]]></category><category><![CDATA[Retrieval-Augmented Generation]]></category><category><![CDATA[regulatory,]]></category><dc:creator><![CDATA[James Gaines]]></dc:creator><pubDate>Tue, 28 Apr 2026 20:31:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69d828c8fa7251682e0c6f85/7f0c0feb-580f-4816-8ef1-c17ab96e206a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>TL;DR</h2>
<p>Vector search is great at finding semantically similar content, but it struggles with broad keyword-heavy queries common in regulatory and clinical document search. By combining BM25 keyword matching with k-NN vector search and fusing results with Reciprocal Rank Fusion (RRF), we improved retrieval quality, particularly for recall-heavy queries common in compliance and safety workflows.</p>
<hr />
<h2>The Problem Nobody Warns You About</h2>
<p>If you've built a RAG (Retrieval-Augmented Generation) system, you've probably followed the standard playbook: chunk your documents, embed them with a model like Titan or OpenAI, store the vectors, and retrieve the top-K most similar chunks at query time.</p>
<p>This works beautifully for specific questions:</p>
<blockquote>
<p><em>"What are the contraindications for apixaban?"</em></p>
</blockquote>
<p>The embedding model understands the semantic intent. It finds chunks that discuss apixaban contraindications even if the exact word "contraindications" doesn't appear. Vector search shines here.</p>
<p>But try this query:</p>
<blockquote>
<p><em>"Which drugs have bleeding warnings?"</em></p>
</blockquote>
<p>Suddenly, vector search struggles. Why? Because this query needs to match across <em>many</em> documents, finding every drug that mentions "bleeding" in a warnings context. The embedding for this query lands in a vague region of the vector space — it's semantically close to lots of things about bleeding, warnings, drugs, and adverse events, but not specifically close to any one document's bleeding warning section.</p>
<p>The result: you get back 3-4 chunks from one or two drugs, miss several others entirely, and the generated answer is incomplete.</p>
<p>This isn't a theoretical problem. In regulatory document search — FDA drug labels, clinical protocols, safety reports — the most important queries are often broad: <em>"List all drugs with boxed warnings"</em>, <em>"Which protocols require liver function monitoring?"</em>, <em>"What drugs interact with anticoagulants?"</em>. These are exactly the queries where vector-only search falls short.</p>
<h2>Why Vector Search Fails on Broad Queries</h2>
<p>To understand the failure, think about what an embedding model actually does. It compresses a chunk of text into a fixed-size vector (say, 1024 dimensions) that captures the <em>overall semantic meaning</em>. Two chunks about apixaban dosing will have similar vectors. A chunk about warfarin bleeding warnings and a chunk about apixaban bleeding warnings will have <em>somewhat</em> similar vectors — but they'll also be similar to chunks about bleeding in general, surgical bleeding, GI bleeding from NSAIDs, and so on.</p>
<p>When you search for "Which drugs have bleeding warnings?", the k-NN search returns the chunks whose vectors are closest to the query vector. But "closest" in a 1024-dimensional space doesn't mean "contains the keywords 'bleeding' and 'warnings' in a drug label context." It means "semantically similar to the general concept of drugs and bleeding warnings." The distinction matters.</p>
<p>BM25 (the classic keyword search algorithm) doesn't have this problem. It looks for the actual terms "bleeding" and "warnings" in the text, scores documents by term frequency and inverse document frequency, and ranks the documents that contain those terms, which typically gives strong recall for keyword-driven queries. It's not smart about paraphrasing, but it's thorough.</p>
<p>The insight: <strong>Vector search is semantically precise for focused queries but can under-retrieve on recall-heavy, cross-document queries.</strong></p>
<h2>The Solution: Hybrid Search with RRF Fusion</h2>
<p>Hybrid search runs both retrieval methods in parallel and merges the results. The architecture looks like this:</p>
<img src="https://raw.githubusercontent.com/jrgwv/blog-assets/main/diagrams/hybrid-search/hybrid-search-rrf.jpg" alt="Hybrid Search with RRF Fusion" style="display:block;margin:0 auto" />

<p>The key is the fusion step. Reciprocal Rank Fusion (RRF) is elegantly simple:</p>
<p><code>score(doc) = Σ<sub>i</sub> 1 / (k + rank<sub>i</sub>(doc))</code></p>
<pre><code class="language-python">def rrf_fusion(result_lists, k=60):
    scores = {}

    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)

    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
</code></pre>
<p>For each result list, a document's contribution to its final score is <code>1 / (k + rank + 1)</code>, where <code>k</code> is a constant (typically 60) and <code>rank</code> is its zero-based position in that list; the <code>+ 1</code> converts it to the 1-based rank used in the formula. A document ranked #1 in BM25 and #3 in k-NN scores 1/61 + 1/63 ≈ 0.0323, which beats a document ranked #1 in k-NN alone (1/61 ≈ 0.0164).</p>
<p>RRF has a useful property: it avoids the need for score normalization across retrieval methods. BM25 scores and k-NN distances are on completely different scales, but RRF only uses rank positions, so they combine cleanly.</p>
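<p>To make the arithmetic concrete, here is a small self-contained run on two made-up ranked lists. The chunk IDs are placeholders, and the fusion function is repeated so the snippet runs on its own.</p>
<pre><code class="language-python">def rrf_fusion(result_lists, k=60):
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


# Hypothetical ranked chunk IDs from each retriever, best match first
bm25_hits = ["naproxen-warnings", "warfarin-warnings", "apixaban-boxed"]
knn_hits = ["apixaban-dosing", "apixaban-boxed", "naproxen-warnings"]

fused = rrf_fusion([bm25_hits, knn_hits])
top_id, top_score = fused[0]

for doc_id, score in fused:
    print(f"{doc_id:20s} {score:.4f}")
</code></pre>
<p>In this run the two chunks that both retrievers returned (<code>naproxen-warnings</code> and <code>apixaban-boxed</code>) land above <code>apixaban-dosing</code>, even though the latter was the top k-NN hit. That is the behavior you want for broad queries: agreement between retrievers counts for more than a single high rank.</p>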
<h3>Real Results</h3>
<p>Here's what the same queries return with hybrid search enabled, against a corpus of 142 FDA drug labels (~10,800 chunks):</p>
<table>
<thead>
<tr>
<th>Query Type</th>
<th>Vector Only</th>
<th>Hybrid Search</th>
</tr>
</thead>
<tbody><tr>
<td>Broad recall queries</td>
<td>Misses entities</td>
<td>High coverage across documents</td>
</tr>
<tr>
<td>Cross-document queries</td>
<td>Partial results</td>
<td>More complete retrieval</td>
</tr>
<tr>
<td>Specific semantic queries</td>
<td>Strong</td>
<td>Strong (slight boost from BM25)</td>
</tr>
</tbody></table>
<p><strong>Broad query (BM25 strength):</strong></p>
<blockquote>
<p><em>"Which drugs have bleeding warnings?"</em></p>
</blockquote>
<p>Hybrid search found <strong>Naproxen</strong> (NSAID with GI bleeding boxed warning) and <strong>Apixaban</strong> (anticoagulant with bleeding warnings and spinal hematoma boxed warning), with detailed citations from multiple label sections. Vector-only search returned only apixaban.</p>
<p><strong>Cross-drug query (hybrid strength):</strong></p>
<blockquote>
<p><em>"Which drugs have black box warnings about suicidal thoughts?"</em></p>
</blockquote>
<p>Hybrid search found <strong>Pregabalin</strong>, <strong>Gabapentin</strong> (antiepileptic drug warnings), <strong>Duloxetine</strong> (antidepressant warning), and <strong>Quetiapine</strong> (antipsychotic warning) — four different drug classes, each with the specific suicidality boxed warning language. Vector-only search returned only two of these.</p>
<p><strong>Specific query (k-NN strength):</strong></p>
<blockquote>
<p><em>"What is the recommended dosage for apixaban in atrial fibrillation?"</em></p>
</blockquote>
<p>Both approaches work well here, but hybrid search still benefits from BM25 boosting chunks that contain the exact terms "dosage", "apixaban", and "atrial fibrillation."</p>
<h2>Choosing Your Embedding Dimensions</h2>
<p>Amazon Titan Text Embeddings V2 offers three output dimensions: 256, 512, and 1024. This is a decision you make once — changing dimensions later means re-embedding and re-indexing your entire corpus.</p>
<p>Representative results from public benchmarks and AWS sample evaluations show:</p>
<table>
<thead>
<tr>
<th>Dimension</th>
<th>MTEB Retrieval (NDCG@10)</th>
<th>Retention vs 1024</th>
<th>Storage per vector</th>
</tr>
</thead>
<tbody><tr>
<td>1024</td>
<td>0.51</td>
<td>100% (baseline)</td>
<td>4 KB</td>
</tr>
<tr>
<td>512</td>
<td>~0.505</td>
<td>99.0%</td>
<td>2 KB</td>
</tr>
<tr>
<td>256</td>
<td>~0.494</td>
<td>96.8%</td>
<td>1 KB</td>
</tr>
</tbody></table>
<p><em>Sources: MTEB Retrieval benchmarks; AWS Titan V2 sample notebooks;</em> <a href="https://aws.amazon.com/blogs/machine-learning/build-cost-effective-rag-applications-with-binary-embeddings-in-amazon-titan-text-embeddings-v2-amazon-opensearch-serverless-and-amazon-bedrock-knowledge-bases/"><em>AWS Machine Learning Blog on Titan embeddings and OpenSearch</em></a><em>. Exact performance varies by dataset and query distribution.</em></p>
<p>For most use cases, 512 is the sweet spot — you lose only 1% retrieval accuracy and halve your storage. But for regulatory and clinical document search, I'd argue for 1024. Here's why:</p>
<p>The 3.2% relative gap in retrieval quality between 256 and 1024 sounds small, but in a compliance context, where missing a relevant safety finding matters, even small retrieval quality differences add up across a large corpus.</p>
<p>The storage argument for lower dimensions also doesn't hold at typical regulatory corpus sizes. 100,000 vectors at 1024 dimensions is ~400 MB. That's nothing for an OpenSearch cluster. You'd need millions of vectors before the storage difference between 512 and 1024 becomes meaningful.</p>
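<p>The storage figure is simple arithmetic: each dimension is a 4-byte float32. The sketch below counts raw vector bytes only; HNSW graph structures and replicas add overhead on top, so treat the numbers as a lower bound.</p>
<pre><code class="language-python">def raw_vector_storage_mb(num_vectors, dimensions, bytes_per_value=4):
    """Raw float32 storage in megabytes (10**6 bytes), excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value / 1_000_000


for dims in (256, 512, 1024):
    print(f"{dims:4d} dims: {raw_vector_storage_mb(100_000, dims):.1f} MB")
#  256 dims: 102.4 MB
#  512 dims: 204.8 MB
# 1024 dims: 409.6 MB
</code></pre>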
<p>One more thing: Titan V2 at 1024 dimensions actually outperforms Titan V1 at 1536 dimensions on retrieval benchmarks (0.51 vs 0.47 NDCG@10) (<a href="https://github.com/aws-samples/amazon-bedrock-samples/blob/main/embeddings/Titan-V2-Embeddings.ipynb">source</a>). More dimensions isn't always better — the V2 training improvements matter more than the extra 512 dimensions.</p>
<h2>The Chunking Problem Nobody Talks About</h2>
<p>There's a second retrieval quality issue that's independent of vector vs. keyword search: <strong>chunk boundaries</strong>.</p>
<p>Most RAG tutorials show you how to split text into fixed-size windows (500 words, 1000 tokens, etc.) with some overlap. This works fine for blog posts and Wikipedia articles. It performs poorly for structured documents.</p>
<p>A 200-page clinical protocol has sections like Inclusion Criteria, Exclusion Criteria, Primary Endpoints, Adverse Events. With fixed 512-word chunking, a chunk boundary can land right in the middle of the Exclusion Criteria section:</p>
<pre><code class="language-text">Chunk 47: [...tail end of Inclusion Criteria...]
          [...first half of Exclusion Criteria...]

Chunk 48: [...second half of Exclusion Criteria...]
          [...start of Dosing Schedule...]
</code></pre>
<p>When someone asks <em>"What are the exclusion criteria?"</em>, Chunk 47 is a partial match polluted with inclusion criteria text. The embedding blends both concepts. BM25 matches "criteria" in both the inclusion and exclusion parts. The answer is confused.</p>
<p><strong>Section-aware chunking</strong> solves this by detecting section boundaries (numbered headings, font changes, structural markers) and chunking at those boundaries instead of at fixed word counts. Each chunk is a complete section or sub-section, with metadata recording its position in the document hierarchy:</p>
<pre><code class="language-json">{
  "text": "Patients are excluded if they have any of the following...",
  "sectionPath": ["6. Study Population", "6.2 Exclusion Criteria"]
}
</code></pre>
<p>This gives you:</p>
<ul>
<li><p><strong>Cleaner embeddings</strong> — each vector represents one coherent concept, not a blend of two adjacent sections</p>
</li>
<li><p><strong>Better BM25 matching</strong> — section headers like "Exclusion Criteria" are in the same chunk as the criteria themselves</p>
</li>
<li><p><strong>Filtered search</strong> — you can restrict queries to specific section types ("search only in Adverse Events sections")</p>
</li>
</ul>
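<p>A minimal sketch of section-aware chunking, under a strong simplifying assumption: headings are the only lines that start with a section number followed by a capitalized title (for example <code>6.2 Exclusion Criteria</code>). Real protocols also need layout cues such as font changes, and numbered list items inside a section would be mistaken for headings by this regex. The function and pattern names are illustrative.</p>
<pre><code class="language-python">import re

# Assumes headings look like "6. Study Population" or "6.2 Exclusion Criteria"
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z].*)$")


def chunk_by_section(text):
    """Split text at numbered headings; each chunk records its section path."""
    chunks = []
    path = []  # headings from the top level down to the current section
    body = []  # lines collected for the current section

    def flush():
        content = " ".join(line.strip() for line in body if line.strip())
        if content:
            chunks.append({"text": content, "sectionPath": list(path)})
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            flush()
            depth = match.group(1).count(".") + 1  # "6" is depth 1, "6.2" is depth 2
            del path[depth - 1:]                   # drop siblings and deeper levels
            path.append(line.strip())
        else:
            body.append(line)
    flush()
    return chunks


sample = """6. Study Population
Adults with non-valvular atrial fibrillation.
6.1 Inclusion Criteria
Patients are eligible if they are 18 years or older.
6.2 Exclusion Criteria
Patients are excluded if they have any of the following...
"""
chunks = chunk_by_section(sample)
</code></pre>
<p>Each resulting chunk holds one complete section, and the last one carries the same <code>sectionPath</code> shape as the JSON above, which is what makes filtered search by section type possible.</p>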
<p>Recent work on hybrid and multi-stage retrieval systems shows consistent gains in top-K retrieval quality compared to single-method approaches (e.g., DIRSRT, 2026). In practice, simple sentence-boundary chunking often matches more complex semantic chunking approaches at lower computational cost for many workloads. For structured documents, section-aware chunking is one of the highest-impact optimizations you can make after implementing hybrid search, though the gap between chunking strategies is typically single-digit percentage points, not transformative on its own.</p>
<h2>Architecture at a Glance</h2>
<p>The full system runs on AWS with minimal operational overhead:</p>
<img src="https://raw.githubusercontent.com/jrgwv/blog-assets/main/diagrams/hybrid-search/aws-hybrid-search-architecture-matching.png" alt="AWS Architecture" style="display:block;margin:0 auto" />

<p>The ingestion pipeline is fully serverless (Lambda, Step Functions, EventBridge). The search layer uses an OpenSearch managed domain: AWS handles patching, backups, and snapshots, but you provision and size the cluster (instance types, node counts, storage). This is a deliberate choice. The managed domain gives us UltraWarm tiering for cost-effective warm storage and full control over the k-NN index configuration (engine, HNSW parameters, space type, shard count) that OpenSearch Serverless doesn't yet expose.</p>
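<p>For reference, a k-NN index definition on a managed domain might look roughly like the sketch below. The field names <code>text</code> and <code>sectionPath</code> follow the chunk format shown earlier; the <code>embedding</code> field name, shard count, space type, and HNSW parameter values are illustrative assumptions, not the settings used in this system.</p>
<pre><code class="language-python">def build_knn_index_body(dimensions=1024, shards=3):
    """Index settings and mappings for one index that serves both BM25 and k-NN."""
    return {
        "settings": {"index": {"knn": True, "number_of_shards": shards}},
        "mappings": {
            "properties": {
                # Analyzed text field: BM25 keyword queries run against this
                "text": {"type": "text"},
                # Section metadata for filtered search
                "sectionPath": {"type": "keyword"},
                # Titan V2 embedding: k-NN queries run against this
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimensions,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {"m": 16, "ef_construction": 128},
                    },
                },
            }
        },
    }


index_body = build_knn_index_body()
# With opensearch-py: client.indices.create(index="drug-labels", body=index_body)
</code></pre>
<p>The <code>dimension</code> value has to match the embedding size chosen above, which is why changing dimensions later means re-indexing.</p>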
<h2>Key Takeaways</h2>
<ol>
<li><p><strong>Vector search alone isn't enough for document-heavy RAG.</strong> If your queries include broad searches across many documents, you need keyword matching too.</p>
</li>
<li><p><strong>RRF fusion is simple and effective.</strong> You don't need a learned ranker or complex score normalization. Rank-based fusion with k=60 works remarkably well.</p>
</li>
<li><p><strong>1024 dimensions is the right choice for Titan V2 when precision matters.</strong> The storage savings of 256 or 512 are irrelevant at typical enterprise corpus sizes. The accuracy difference isn't.</p>
</li>
<li><p><strong>Chunk boundaries matter more than chunk size.</strong> For structured documents, chunk at section boundaries, not at fixed word counts. The gains are consistent but modest — expect single-digit percentage-point improvements, not a silver bullet.</p>
</li>
<li><p><strong>Hybrid search + section-aware chunking is the combination that unlocks regulatory document search.</strong> Either one alone is a partial fix. Together, they handle both the broad keyword queries and the specific semantic queries that compliance teams need.</p>
</li>
</ol>
<hr />
<h2>Try It Yourself</h2>
<p>The hybrid search pattern described here works with any OpenSearch cluster that has the k-NN plugin enabled. The key components:</p>
<ul>
<li><p><strong>Amazon Bedrock Titan V2</strong> for embeddings (configurable dimensions)</p>
</li>
<li><p><strong>OpenSearch</strong> with k-NN plugin (HNSW FAISS) for vector storage + BM25</p>
</li>
<li><p><strong>RRF fusion</strong>: a few lines of code, no dependencies</p>
</li>
</ul>
<pre><code class="language-python">def rrf_fusion(result_lists, k=60):
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
</code></pre>
<ul>
<li><strong>Any LLM</strong> for the generation step (Claude, GPT-4, Llama, etc.)</li>
</ul>
<p>If you're building a RAG system for structured documents — regulatory filings, clinical protocols, legal contracts, technical specifications — hybrid search with section-aware chunking is worth the investment. The improvement on broad, recall-heavy queries is often substantial in practice.</p>
<hr />
<p><em>Have questions or want to discuss hybrid search architectures? Connect with me on</em> <a href="https://www.linkedin.com/in/james-gaines-3223b79"><em>LinkedIn</em></a><em>.</em></p>
<p><em>The views expressed in this post are my own and do not represent those of my employer.</em></p>
]]></content:encoded></item><item><title><![CDATA[LangGraph, Strands, AgentCore — and the Patterns That Actually Matter in 2026]]></title><description><![CDATA[10 min read · April 2026 · Python · AWS · Bedrock
We're at the Microservices Moment for AI
The Landscape
In the early days of cloud architecture, teams built monolithic applications and eventually lea]]></description><link>https://blog.gainesai.com/langgraph-strands-agentcore-and-the-patterns-that-actually-matter-in-2026</link><guid isPermaLink="true">https://blog.gainesai.com/langgraph-strands-agentcore-and-the-patterns-that-actually-matter-in-2026</guid><category><![CDATA[AWS]]></category><category><![CDATA[ai agents]]></category><dc:creator><![CDATA[James Gaines]]></dc:creator><pubDate>Sun, 12 Apr 2026 23:51:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69d828c8fa7251682e0c6f85/1a587740-3ba1-4380-a21f-8a06efc7aeed.svg" length="0" type="image/jpeg"/><content:encoded><![CDATA[

<p><em>10 min read · April 2026 · Python · AWS · Bedrock</em></p>
<h2>We're at the Microservices Moment for AI</h2>
<p><strong>The Landscape</strong></p>
<p>In the early days of cloud architecture, teams built monolithic applications and eventually learned — sometimes painfully — that decomposing them into services was the right long-term bet. Agentic AI is going through exactly the same transition right now. <strong>Single all-purpose agents are giving way to orchestrated systems of specialized agents.</strong></p>
<p>Gartner measured a <strong>1,445% surge</strong> in enterprise multi-agent system inquiries from Q1 2024 to Q2 2025 (Gartner, December 2025). By end of 2026, <a href="https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025">40% of enterprise applications are projected to embed AI agents</a> — up from less than 5% in 2025. The frameworks and patterns you pick today will define your architectural ceiling for years.</p>
<blockquote>
<p><strong>The hard truth:</strong> Gartner predicts <a href="https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027">over 40% of agentic AI projects will be canceled</a> by end of 2027 due to escalating costs and inadequate risk controls. The failures originate not in bad models, but in bad orchestration design — agents that are individually capable, but poorly coordinated, still fail. Framework choice is a first-class architectural decision.</p>
</blockquote>
<p>This post cuts through the noise and maps the major frameworks to the patterns they serve best — with practical guidance for teams building on AWS.</p>
<h2>The Real Contenders in 2026</h2>
<p><strong>The Frameworks</strong></p>
<p>The field has consolidated. LangChain's older agent abstractions have largely been superseded as a primary agent runtime; its current agents run on LangGraph under the hood. The meaningful choices today are:</p>
<table>
<thead>
<tr>
<th>Framework</th>
<th>Philosophy</th>
<th>Best For</th>
</tr>
</thead>
<tbody><tr>
<td><strong>LangGraph</strong></td>
<td>You design the graph, model executes it</td>
<td>Deterministic, auditable, stateful workflows</td>
</tr>
<tr>
<td><strong>AWS Strands</strong></td>
<td>Model decides the graph, you provide tools</td>
<td>AWS-native PoCs, AgentCore deployment, fast iteration</td>
</tr>
<tr>
<td><strong>Bedrock AgentCore</strong></td>
<td>Managed runtime platform (not a framework)</td>
<td>Hosting, governance, identity, memory for any framework</td>
</tr>
<tr>
<td><strong>Agent Squad</strong></td>
<td>AWS routing library for multi-agent scale</td>
<td>When Strands outgrows a single-orchestrator topology</td>
</tr>
</tbody></table>
<blockquote>
<p><strong>The core mental model:</strong>
<em>"LangGraph says: you design the graph, the model executes it. Strands says: you give the model tools and let it figure out the graph itself."</em></p>
</blockquote>
<p>The axis isn't quality — it's <strong>developer control vs. model autonomy</strong>. Both are valid production choices. The wrong choice is picking one based on hype rather than your workflow's actual requirements.</p>
<h2>Bedrock AgentCore: The Shift That Changes Everything</h2>
<p><strong>Platform Layer</strong></p>
<p>The most important thing to understand about AgentCore is that it represents a <strong>categorical shift</strong> in what AWS is offering. Amazon Bedrock is no longer just a model hosting service. It is now the control plane, governance layer, and runtime that makes autonomous AI deployable in organizations with real risk profiles.</p>
<p>AgentCore <a href="https://aws.amazon.com/blogs/machine-learning/amazon-bedrock-agentcore-is-now-generally-available/">reached general availability in October 2025</a> and works with <em>any</em> open-source framework — LangGraph, Strands, CrewAI, LlamaIndex, OpenAI Agents SDK — and any foundation model inside or outside Bedrock.</p>
<h3>AgentCore Service Map</h3>
<table>
<thead>
<tr>
<th>Service</th>
<th>What It Does</th>
<th>Why It Matters</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Runtime</strong></td>
<td>Serverless execution, long-running tasks of up to 8 hours, session isolation, bidirectional streaming</td>
<td>Eliminates infra management; handles multi-step workflows that outlive a single HTTP response</td>
</tr>
<tr>
<td><strong>Memory</strong></td>
<td>Session + long-term episodic memory; agents learn from prior interactions</td>
<td>Enables multi-day workflows; agents remember what failed and what worked</td>
</tr>
<tr>
<td><strong>Gateway</strong></td>
<td>Converts APIs to MCP-compatible tools; intercepts tool calls for policy enforcement</td>
<td>MCP-first architecture; bridges legacy APIs without rebuilding them</td>
</tr>
<tr>
<td><strong>Identity</strong></td>
<td>Cognito, Entra ID, Okta integration; OAuth vault; multi-tenant custom claims</td>
<td>Agents act on behalf of users or autonomously with proper IAM</td>
</tr>
<tr>
<td><strong>Policy</strong> ✅ GA</td>
<td>Real-time tool call interception using natural language or Cedar policies</td>
<td>Enforces compliance boundaries without custom guardrail code</td>
</tr>
<tr>
<td><strong>Evaluations</strong> ✅ GA</td>
<td>13 pre-built evaluators; continuous CloudWatch monitoring</td>
<td>Shifts agent quality from manual spot-checks to a full DevOps lifecycle</td>
</tr>
<tr>
<td><strong>Observability</strong></td>
<td>Step-by-step execution visualization; OTEL-compatible; integrates with Langfuse, Datadog, Arize</td>
<td>Production-grade tracing; regulated industries can audit every step</td>
</tr>
</tbody></table>
<blockquote>
<p>🔒 <strong>For regulated industries:</strong> AgentCore Policy uses Cedar — AWS's open-source policy language — to intercept every tool call in real time. Natural language policy definitions auto-convert to Cedar, making compliance boundaries auditable by non-engineers. <a href="https://aws.amazon.com/about-aws/whats-new/2026/03/policy-amazon-bedrock-agentcore-generally-available/">Policy</a> and <a href="https://aws.amazon.com/about-aws/whats-new/2026/03/agentcore-evaluations-generally-available/">Evaluations</a> both reached GA in March 2026.</p>
</blockquote>
<h2>The 6 Orchestration Patterns You Need to Know</h2>
<p><strong>The Architecture</strong></p>
<p>Frameworks are implementation tools. <strong>Patterns are the architecture.</strong> Production systems almost always combine two or three of these. Understanding them is what separates teams that ship from teams that stay in pilot purgatory.</p>
<h3>1. Supervisor / Hierarchical</h3>
<p>A central orchestrator decomposes the task, delegates to specialist sub-agents, validates outputs, and synthesizes a final result. <strong>The gold standard for most enterprise workflows.</strong> Gartner predicts that by 2027, 70% of multi-agent systems will use narrowly specialized agents, improving accuracy but increasing coordination complexity (Gartner, December 2025).</p>
<p>Use an expensive, capable model for the orchestrator; use cheaper, specialized models for each sub-agent.</p>
<p><strong>Best implemented with:</strong> LangGraph (explicit state) or Strands + Agent Squad (model-driven routing)</p>
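<p>A minimal sketch of the supervisor loop in plain Python, with no framework. The <code>SPECIALISTS</code> table, <code>plan</code>, and <code>supervise</code> are hypothetical names, and the lambdas stand in for real sub-agent calls; in practice <code>plan</code> and the final synthesis would each be an orchestrator model call.</p>
<pre><code class="language-python"># Hypothetical supervisor: decompose, delegate, validate, synthesize.
SPECIALISTS = {
    "research": lambda task: f"findings for {task}",
    "pricing": lambda task: f"cost estimate for {task}",
}

def plan(task: str):
    # Stand-in for the orchestrator model deciding which specialists to call.
    return [("research", task), ("pricing", task)]

def supervise(task: str):
    outputs = {}
    for name, subtask in plan(task):
        result = SPECIALISTS[name](subtask)
        if not result.strip():  # validate before accepting a sub-agent's output
            raise ValueError(f"{name} returned an empty result")
        outputs[name] = result
    # Synthesis is a plain join here; a real supervisor would summarize with a model.
    return "; ".join(f"{name}: {text}" for name, text in outputs.items())

summary = supervise("site 7")
</code></pre>
<p>The validation step is the part worth keeping when the stubs are replaced: the supervisor is the one place where a bad sub-agent result can be rejected before it reaches the final answer.</p>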
<h3>2. Sequential Pipeline</h3>
<p>Agent A hands off to Agent B. Classic for linear data transformation, document processing, or any workflow with clear stage dependencies.</p>
<p>Simple to debug, predictable cost profile. <strong>Critical rule:</strong> every stage must validate its inputs. Never pass garbage forward: if Stage 3 produces malformed output, Stages 4 and 5 will confidently process it.</p>
<p><strong>Best implemented with:</strong> LangGraph nodes with explicit state contracts</p>
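<p>A minimal sketch of the "validate at every stage" rule, again in plain Python. The stage functions, the <code>StageError</code> exception, and the dictionary-based state are illustrative assumptions, not LangGraph APIs; in LangGraph the same checks would live inside each node.</p>
<pre><code class="language-python"># Hypothetical sequential pipeline: each stage checks its input contract first.
from typing import Callable

class StageError(ValueError):
    """Raised when a stage receives input that violates its contract."""

def require(state: dict, *keys: str):
    # Fail fast instead of passing malformed state to the next stage.
    missing = [k for k in keys if not state.get(k)]
    if missing:
        raise StageError(f"missing or empty fields: {missing}")

def extract(state: dict):
    require(state, "document")
    return {**state, "facts": state["document"].split(". ")}

def summarize(state: dict):
    require(state, "facts")
    return {**state, "summary": state["facts"][0]}

def run_pipeline(state: dict, stages: list[Callable]):
    for stage in stages:
        state = stage(state)
    return state

result = run_pipeline({"document": "Site 7 was selected. Cost was lower."}, [extract, summarize])
</code></pre>
<p>An empty document fails in <code>extract</code> with a <code>StageError</code> rather than surfacing as a confusing error two stages later.</p>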
<h3>3. Parallel Fan-Out / Fan-In</h3>
<p>Multiple agents work the same task simultaneously from different angles or specializations. A collector agent synthesizes results. Also called <em>scatter-gather</em> or <em>map-reduce</em>.</p>
<p>This pattern cuts latency on complex research, multi-perspective analysis, and consensus-building tasks. The initiator agent distributes work; the collector waits for all branches and produces a unified output.</p>
<p><strong>Best implemented with:</strong> LangGraph parallel branches or Strands with concurrent tool calls</p>
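<p>One way to picture fan-out and fan-in is with <code>asyncio.gather</code> from the standard library. This is a sketch under assumptions: <code>analyst</code> is a stand-in for a real agent call, and the function names are hypothetical.</p>
<pre><code class="language-python"># Hypothetical fan-out / fan-in: branches run concurrently, a collector merges them.
import asyncio

async def analyst(angle: str, task: str):
    # Stand-in for a real agent or model call.
    await asyncio.sleep(0.01)
    return {"angle": angle, "finding": f"{angle} view of {task}"}

async def fan_out_fan_in(task: str, angles: list[str]):
    # Fan-out: start every branch at once. gather returns results in input order.
    findings = await asyncio.gather(*(analyst(a, task) for a in angles))
    # Fan-in: the collector runs only after all branches have finished.
    return " | ".join(f["finding"] for f in findings)

report = asyncio.run(fan_out_fan_in("site selection", ["cost", "risk", "legal"]))
</code></pre>
<p>Total latency is roughly that of the slowest branch, not the sum of all branches, which is where the speedup comes from.</p>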
<h3>4. Choreography (Event-Driven)</h3>
<p>Agents coordinate through events on a message bus — no central orchestrator. Agent A publishes <code>research_completed</code>, Agent B subscribes and acts, Agent B publishes <code>analysis_ready</code>, and so on.</p>
<p>High autonomy, loosely coupled, easy to add or remove agents. <strong>Trade-off:</strong> debugging is significantly harder without a centralized control flow. Best for workflows that change frequently, not for high-stakes deterministic pipelines.</p>
<p><strong>Best implemented with:</strong> EventBridge + SQS + Lambda, or Kafka for higher throughput</p>
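<p>The event chain can be sketched with an in-process dictionary standing in for the message bus. Everything here is hypothetical (<code>subscribe</code>, <code>publish</code>, the agent functions, and the <code>report_published</code> event); a real deployment would publish to EventBridge or Kafka instead.</p>
<pre><code class="language-python"># Hypothetical in-process stand-in for a message bus such as EventBridge.
from collections import defaultdict

subscribers = defaultdict(list)
log = []  # the only place the overall flow is visible

def subscribe(event: str, handler):
    subscribers[event].append(handler)

def publish(event: str, payload: dict):
    log.append(event)
    for handler in list(subscribers[event]):
        handler(payload)

def analysis_agent(payload: dict):
    # Reacts to research, then announces its own result. No orchestrator involved.
    publish("analysis_ready", {**payload, "analysis": f"analyzed {payload['topic']}"})

def report_agent(payload: dict):
    publish("report_published", {"report": payload["analysis"].upper()})

subscribe("research_completed", analysis_agent)
subscribe("analysis_ready", report_agent)
publish("research_completed", {"topic": "site 7"})
</code></pre>
<p>No function in the sketch describes the whole workflow. The order of events exists only in <code>log</code>, which is the debugging trade-off in miniature.</p>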
<h3>5. Evaluator-Optimizer Loop (Reflection)</h3>
<p>A generator agent produces output; an evaluator agent critiques it; the generator revises. The cycle repeats until a quality threshold is met.</p>
<p><strong>Reflection is the most powerful pattern for accuracy-critical tasks</strong> — regulatory document authoring, code generation, clinical report drafting. Each critique round is a separate LLM call, so cost and latency multiply. Design your termination condition carefully.</p>
<p><strong>Best implemented with:</strong> LangGraph cycles with conditional edges</p>
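<p>A sketch of the loop with two termination conditions: the evaluator approving the draft, and a hard cap on rounds. The <code>generate</code> and <code>evaluate</code> functions are hypothetical stubs for model calls, and the stub evaluator is scripted to approve the third draft.</p>
<pre><code class="language-python"># Hypothetical reflection loop with two termination conditions.
def generate(task: str, feedback: list):
    # Stand-in for a generator model call; the draft reflects how much feedback it has seen.
    return f"{task} (revision {len(feedback)})"

def evaluate(draft: str):
    # Stand-in for an evaluator model call; returns (approved, critique).
    approved = draft.endswith("(revision 2)")
    return approved, "tighten the wording"

def reflect(task: str, max_rounds: int = 5):
    feedback = []
    draft = ""
    for _ in range(max_rounds):  # hard cap bounds cost and latency
        draft = generate(task, feedback)
        approved, critique = evaluate(draft)
        if approved:  # quality gate met: stop early
            return draft, len(feedback) + 1
        feedback.append(critique)
    return draft, max_rounds  # cap reached: return the last draft as best effort

final_draft, rounds_used = reflect("Draft the summary")
</code></pre>
<p>Each round costs one generator call plus one evaluator call, so <code>max_rounds</code> is effectively a budget. Returning the round count makes that cost visible to the caller.</p>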
<h3>6. ReAct (Reason + Act)</h3>
<p>The foundational single-agent loop: <strong>Thought → Action → Observation → repeat.</strong> The model articulates its reasoning, calls a tool, observes the result, and decides the next step.</p>
<p>This is the basis for both Strands (which wraps this loop automatically) and most LangGraph nodes. Understanding ReAct is prerequisite to understanding every other pattern.</p>
<p><strong>Best implemented with:</strong> Strands (native), LangGraph nodes, or the equivalent single-agent loop in another framework (for example, LangChain's legacy <code>AgentExecutor</code>)</p>
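<p>The Thought, Action, Observation cycle fits in a few lines once the model is stubbed out. In this sketch <code>scripted_model</code> is a hypothetical stand-in for an LLM call and <code>TOOLS</code> is an assumed tool registry; neither reflects the Strands or LangGraph API.</p>
<pre><code class="language-python"># Hypothetical ReAct loop; scripted_model stands in for a real LLM call.
TOOLS = {"add": lambda a, b: a + b}

def scripted_model(history: list):
    # The next step is chosen from what has been observed so far.
    if not any(step["type"] == "observation" for step in history):
        return {"thought": "I need the sum.", "action": "add", "args": [2, 3]}
    return {"thought": "I have the answer.", "final": f"The sum is {history[-1]['value']}"}

def react(question: str, max_steps: int = 4):
    history = [{"type": "question", "value": question}]
    for _ in range(max_steps):
        decision = scripted_model(history)  # Thought
        if "final" in decision:
            return decision["final"], history
        observation = TOOLS[decision["action"]](*decision["args"])  # Action
        history.append({"type": "observation", "value": observation})  # Observation
    raise RuntimeError("step budget exhausted without a final answer")

answer, trace = react("What is 2 + 3?")
</code></pre>
<p>The step budget matters as much as the loop itself: without it, a model that never emits a final answer keeps calling tools indefinitely.</p>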
<blockquote>
<p>⚡ <strong>The cascading hallucination problem:</strong> Agent A hallucinates a policy. Agent B executes against that hallucination. Agent C reports confidently on a corrupt baseline. Multi-agent systems amplify errors as much as they amplify capability. Use <strong>immutable state snapshots</strong> — each agent works with a versioned state object and produces a new version. This provides audit lineage, prevents accidental mutations, and makes replay possible.</p>
</blockquote>
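<p>A minimal sketch of immutable, versioned state using only the standard library. The <code>Snapshot</code> class and <code>next_version</code> helper are hypothetical names chosen for illustration.</p>
<pre><code class="language-python"># Hypothetical versioned, immutable agent state.
from dataclasses import dataclass, replace
from types import MappingProxyType

@dataclass(frozen=True)
class Snapshot:
    version: int
    author: str
    data: MappingProxyType  # read-only view, so item assignment fails too

def next_version(prev: Snapshot, author: str, **changes):
    # Never mutate: copy, apply the changes, and bump the version.
    merged = MappingProxyType({**prev.data, **changes})
    return replace(prev, version=prev.version + 1, author=author, data=merged)

v1 = Snapshot(1, "agent_a", MappingProxyType({"policy": "draft"}))
v2 = next_version(v1, "agent_b", policy="reviewed")
history = [v1, v2]  # audit lineage: every version is kept for replay
</code></pre>
<p>Because <code>v1</code> is untouched after <code>v2</code> is created, a later agent can be replayed against the exact state it originally saw.</p>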
<h2>How to Choose</h2>
<p><strong>Decision Framework</strong></p>
<table>
<thead>
<tr>
<th>Question</th>
<th>If YES</th>
<th>If NO</th>
</tr>
</thead>
<tbody><tr>
<td>Does your workflow require auditable, deterministic step execution? <em>(GxP, regulated, financial)</em></td>
<td>→ <strong>LangGraph.</strong> Design the graph explicitly. Every transition is documented.</td>
<td>↓ Continue</td>
</tr>
<tr>
<td>Are you AWS-native and targeting Bedrock / AgentCore for deployment?</td>
<td>→ <strong>Strands Agents.</strong> Native AgentCore deployment, MCP-first tooling, built-in OTEL.</td>
<td>↓ Continue</td>
</tr>
<tr>
<td>Do you have many specialist sub-agents needing routing and context isolation?</td>
<td>→ <strong>Strands + Agent Squad</strong> for model-driven routing, or <strong>LangGraph</strong> for complex shared-state graphs.</td>
<td>↓ Continue</td>
</tr>
<tr>
<td>Is this 2–4 agents with a clear, simple workflow?</td>
<td>→ <strong>You may not need a framework.</strong> A 150-line orchestrator with explicit handoffs is easier to debug.</td>
<td>↓ Continue</td>
</tr>
<tr>
<td>Rapid prototype, role-based team-of-agents, not AWS-specific?</td>
<td>→ <strong>CrewAI.</strong> Fastest time to prototype for role-based multi-agent patterns.</td>
<td>—</td>
</tr>
</tbody></table>
<h2>The Production-Ready AWS Stack</h2>
<p><strong>Recommended Stack</strong></p>
<p>For complex, regulated, or enterprise-grade deployments on AWS, these layers complement each other without overlap:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Role</th>
<th>Description</th>
</tr>
</thead>
<tbody><tr>
<td><strong>LangGraph</strong></td>
<td>Deterministic</td>
<td>Sub-workflows requiring validated, auditable steps (GxP-adjacent processes, sequential with contracts)</td>
</tr>
<tr>
<td><strong>AWS Strands Agents</strong></td>
<td>Model-Driven</td>
<td>Primary agent loop — reasoning, tool calls, MCP tool integration</td>
</tr>
<tr>
<td><strong>Agent Squad</strong></td>
<td>Routing</td>
<td>Routing across specialist agents at scale</td>
</tr>
<tr>
<td><strong>Bedrock AgentCore</strong></td>
<td>Platform</td>
<td>Runtime · Memory · Identity · Policy · Evaluations · Observability — the foundation for all of the above</td>
</tr>
</tbody></table>
<blockquote>
<p>🔭 <strong>Observability note:</strong> AgentCore Observability is OTEL-compatible and integrates natively with Langfuse. For regulated environments — pharma, healthcare, financial services — self-hosted Langfuse on AWS gives you full trace ownership with zero data leaving your VPC. This combination is current best-in-class for GxP-adjacent AI workloads.</p>
</blockquote>
<h2>What to Actually Do</h2>
<p><strong>Bottom Line</strong></p>
<p>The teams winning in 2026 make a deliberate pattern choice early, instrument observability from day one, and resist the temptation to add more agents when better tools or prompts would solve the problem.</p>
<p><strong>For most AWS practitioners:</strong> Start with Strands + AgentCore for speed and native integration. Add LangGraph for any sub-workflow where step auditability or strict ordering is non-negotiable. Wire Agent Squad in when your single-orchestrator topology starts to strain.</p>
<p><strong>For regulated industries:</strong> <a href="https://aws.amazon.com/about-aws/whats-new/2026/03/policy-amazon-bedrock-agentcore-generally-available/">AgentCore Policy</a> (now GA) gives you Cedar-based tool call interception that compliance teams can read and audit without engineering involvement. <a href="https://aws.amazon.com/about-aws/whats-new/2026/03/agentcore-evaluations-generally-available/">AgentCore Evaluations</a> (also now GA) gives you continuous quality monitoring rather than manual spot-checks. These two features alone move the "can we trust this in production?" conversation forward by months.</p>
<blockquote>
<p>🧵 <strong>The sustainable pattern:</strong> The ability to leverage the reasoning capabilities of these models, coupled with the ability to do real-world things through tools, is a durable architectural bet. The specific frameworks will keep evolving. The pattern of <em>reasoning + tool use + governance</em> will not.</p>
</blockquote>
<hr />
<p><em>Agentic AI Architecture Guide · April 2026 · AWS · LangGraph · Strands · AgentCore</em></p>
]]></content:encoded></item><item><title><![CDATA[Build the Slice, Not the System

]]></title><description><![CDATA[Most AI prototypes start the same way: everything in one file, the model call jammed next to the route handler, raw text dumped straight to the frontend. It works — until you need to change anything.
]]></description><link>https://blog.gainesai.com/build-the-slice-not-the-system</link><guid isPermaLink="true">https://blog.gainesai.com/build-the-slice-not-the-system</guid><category><![CDATA[AI development]]></category><category><![CDATA[FastAPI]]></category><category><![CDATA[AWS]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[James Gaines]]></dc:creator><pubDate>Sun, 12 Apr 2026 21:54:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69d828c8fa7251682e0c6f85/65c93744-3386-4381-b928-3f8d2462d5d6.svg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most AI prototypes start the same way: everything in one file, the model call jammed next to the route handler, raw text dumped straight to the frontend. It works — until you need to change anything.</p>
<p>This walkthrough shows a different approach. We build a small, working AI app — a "thin slice" — but we structure it with two patterns that keep the code clean as it grows:</p>
<ul>
<li><strong>BFF (Backend for Frontend)</strong> — the API layer shapes responses for the UI and hides backend complexity</li>
<li><strong>Observer</strong> — an event system that lets components react to what's happening without being coupled together</li>
</ul>
<p>The app itself is simple: paste rough notes, get back a summary, action items, and a next step. What matters is <em>how</em> the pieces connect.</p>
<blockquote>
<p>📦 Full source: <a href="https://github.com/jrgwv/thin-slice">github.com/jrgwv/thin-slice</a></p>
</blockquote>
<hr />
<h2>Architecture</h2>
<p><img src="https://raw.githubusercontent.com/jrgwv/thin-slice/main/images/component.png" alt="Component Diagram" /></p>
<p>Each layer has one job. The frontend never talks to the model. The BFF never builds prompts. The model layer never parses responses.</p>
<hr />
<h2>Project Structure</h2>
<pre><code class="language-text">app/
  main.py                    # FastAPI entrypoint
  routes/
    analyze.py               # BFF — request handling
  services/
    agent_service.py          # orchestration + observer events
    ai_client.py              # model integration (Bedrock Converse API)
    prompt_builder.py         # prompt construction
    response_parser.py        # raw text → structured output (with JSON extraction)
    observer.py               # event emitter/listener system
  models/
    schemas.py                # Pydantic request/response models
  static/
    index.html / app.js / styles.css
</code></pre>
<hr />
<h2>Pattern 1: BFF (Backend for Frontend)</h2>
<p>The BFF pattern means the API layer exists <em>for the frontend</em>. It doesn't contain business logic. It validates input, delegates work, and shapes the response.</p>
<p>Here's the full route:</p>
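<pre><code class="language-python"># app/routes/analyze.py
# Import paths follow the project structure above.
import logging
import uuid

from fastapi import APIRouter, HTTPException

from app.models.schemas import AnalyzeRequest, AnalyzeResponse
from app.services.agent_service import run_analysis

logger = logging.getLogger(__name__)
router = APIRouter()  # assumed to be mounted under /api in main.py

@router.post("/analyze", response_model=AnalyzeResponse)
async def analyze(payload: AnalyzeRequest) -&gt; AnalyzeResponse:
    text = payload.text.strip()
    request_id = uuid.uuid4().hex[:12]  # short ID for correlating log lines

    # Validate before doing any work.
    if not text:
        raise HTTPException(status_code=400, detail="The 'text' field must not be empty.")

    if len(text) &gt; 12000:
        raise HTTPException(status_code=400, detail="Input is too long for this prototype.")

    # Delegate to the agent service; hide internal errors behind a clean 500.
    try:
        return run_analysis(request_id, text)
    except Exception as exc:
        logger.exception("request_id=%s — analysis failed", request_id)
        raise HTTPException(status_code=500, detail="Analysis failed.") from exc
</code></pre>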
<p>What it does:</p>
<ol>
<li>Validates input</li>
<li>Generates a request ID for tracing</li>
<li>Delegates to the agent service</li>
<li>Catches errors and returns clean HTTP responses</li>
</ol>
<p>What it <em>doesn't</em> do: build prompts, call models, parse responses. That's the agent's job.</p>
<p><strong>Why this matters:</strong> When you add a second frontend (mobile app, CLI, Slack bot), you add a second BFF. The agent service stays the same.</p>
<hr />
<h2>Pattern 2: Observer</h2>
<p>The Observer pattern decouples "something happened" from "what to do about it."</p>
<p>The implementation is intentionally minimal:</p>
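<pre><code class="language-python"># app/services/observer.py
from collections import defaultdict
from typing import Any, Callable

# Assumed alias: a listener takes the event payload and returns nothing.
EventListener = Callable[[dict[str, Any]], None]

_listeners: dict[str, list[EventListener]] = defaultdict(list)

def on(event: str, listener: EventListener) -&gt; None:
    """Register a listener for an event name."""
    _listeners[event].append(listener)

def emit(event: str, data: dict[str, Any] | None = None) -&gt; None:
    """Call every listener registered for the event, in registration order."""
    for listener in _listeners.get(event, []):
        listener(data or {})
</code></pre>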
<p>The module exposes three functions: <code>on</code>, <code>emit</code>, and <code>clear</code> (not shown in this excerpt). That's the whole system.</p>
<p>The agent service emits events at each stage of the pipeline:</p>
<pre><code class="language-python"># app/services/agent_service.py
# Import paths follow the project structure above.
from app.models.schemas import AnalyzeResponse
from app.services import observer
from app.services.ai_client import call_model
from app.services.prompt_builder import build_prompt
from app.services.response_parser import parse_model_output

def run_analysis(request_id: str, user_text: str) -&gt; AnalyzeResponse:
    observer.emit("agent:start", {"request_id": request_id})

    prompt = build_prompt(user_text)
    observer.emit("agent:prompt_built", {"request_id": request_id, "prompt_len": len(prompt)})

    raw = call_model(prompt)
    observer.emit("agent:model_returned", {"request_id": request_id, "raw_len": len(raw)})

    result = parse_model_output(raw)
    observer.emit("agent:complete", {"request_id": request_id, "summary_len": len(result.summary)})

    return result
</code></pre>
<p>Right now, nothing listens. And that's fine. The point is that the hooks exist. When you need logging, tracing, metrics, or telemetry — you register a listener. The agent service doesn't change.</p>
<pre><code class="language-python"># Example: add structured logging later
observer.on("agent:complete", lambda data: logger.info("done", extra=data))
</code></pre>
<p><strong>Why this matters:</strong> You get observability without modifying the code that does the work.</p>
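<p>As a slightly larger illustration, here is a sketch of a latency listener built on the same two events. To keep it runnable on its own, <code>on</code> and <code>emit</code> are re-declared inline in simplified form; <code>record_start</code>, <code>record_end</code>, and the two dictionaries are hypothetical names.</p>
<pre><code class="language-python"># Hypothetical timing listeners; on/emit are re-declared so the sketch runs alone.
import time
from collections import defaultdict

_listeners = defaultdict(list)

def on(event, listener):
    _listeners[event].append(listener)

def emit(event, data=None):
    for listener in _listeners.get(event, []):
        listener(data or {})

started = {}    # request_id mapped to start time
durations = {}  # request_id mapped to elapsed seconds

def record_start(data):
    started[data["request_id"]] = time.perf_counter()

def record_end(data):
    begin = started.pop(data["request_id"], None)
    if begin is not None:
        durations[data["request_id"]] = time.perf_counter() - begin

on("agent:start", record_start)
on("agent:complete", record_end)

# The pipeline emits exactly as before; it never learns that timing exists.
emit("agent:start", {"request_id": "abc123"})
emit("agent:complete", {"request_id": "abc123", "summary_len": 42})
</code></pre>
<p>Two listeners sharing state through the request ID is enough to measure end-to-end latency per request, and removing them later is a two-line change.</p>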
<hr />
<h2>The Agent Service</h2>
<p>This is the orchestration layer. It owns the pipeline:</p>
<ol>
<li>Build the prompt</li>
<li>Call the model</li>
<li>Parse the response</li>
</ol>
<p>It's the only place that knows all three steps exist. The BFF doesn't know about prompts. The model layer doesn't know about parsing.</p>
<p>If you later add tool calls, multi-step reasoning, or model routing — this is where it goes. The BFF and model layer stay untouched.</p>
<hr />
<h2>Model Integration: Bedrock Converse API</h2>
<p>This was one of the first real lessons in the build.</p>
<p>The initial version used <code>invoke_model</code> — Bedrock's low-level API. It works, but the request and response formats are <em>model-specific</em>. Amazon Nova expects one body shape. Anthropic expects another. Switch models, rewrite the integration.</p>
<p>The fix: switch to the <strong>Converse API</strong>. It provides one request and response format across the Bedrock chat models that support it.</p>
<pre><code class="language-python"># app/services/ai_client.py
import os

import boto3
from botocore.exceptions import BotoCoreError, ClientError

def _invoke_bedrock(prompt: str) -&gt; str:
    # Region and model come from the environment so they can change without a code edit.
    region = os.getenv("AWS_REGION", "us-east-1")
    model_id = os.getenv("MODEL_ID", "global.anthropic.claude-haiku-4-5-20251001-v1:0")
    client = boto3.client("bedrock-runtime", region_name=region)

    try:
        response = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
    except (ClientError, BotoCoreError) as exc:
        raise RuntimeError(f"Bedrock invocation failed: {exc}") from exc

    # The first content block of the assistant message carries the text.
    try:
        return response["output"]["message"]["content"][0]["text"]
    except (KeyError, IndexError, TypeError) as exc:
        raise RuntimeError("Unexpected model response format.") from exc
</code></pre>
<p>Same request format whether you're calling Nova, Claude, Mistral, or Llama. Same response shape. Swap the model ID, everything else stays the same.</p>
<p>The default model is <code>global.anthropic.claude-haiku-4-5-20251001-v1:0</code>, a global cross-Region inference profile. Instead of pinning the call to one Region, Bedrock can route each request to a supported Region with available capacity.</p>
<p><strong>Why this matters:</strong> The whole point of isolating the model layer is that you can swap models. The Converse API makes that actually true — not just in theory.</p>
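<p>To make the "swap the model ID" claim concrete, here is a sketch of a helper that builds the keyword arguments for <code>converse</code>. The helper name <code>build_converse_request</code>, its defaults, and the second model ID are assumptions for illustration; <code>modelId</code>, <code>messages</code>, <code>system</code>, and <code>inferenceConfig</code> are Converse request fields.</p>
<pre><code class="language-python"># Hypothetical helper: build Converse keyword arguments once, vary only the model ID.
def build_converse_request(model_id: str, prompt: str, system: str = "", max_tokens: int = 512):
    request = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }
    if system:
        request["system"] = [{"text": system}]
    return request

# The second model ID is a placeholder, not a real Bedrock identifier.
claude = build_converse_request("global.anthropic.claude-haiku-4-5-20251001-v1:0", "Summarize: ...")
other = build_converse_request("example.other-model-id", "Summarize: ...")

# With AWS credentials configured, the call itself would be:
# boto3.client("bedrock-runtime").converse(**claude)
</code></pre>
<p>Apart from <code>modelId</code>, the two requests are identical, which is exactly the property the model layer relies on.</p>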
<hr />
<h2>JSON Extraction: Handling Real Model Output</h2>
<p>Here's something that works perfectly in demo mode and breaks immediately in production: assuming the model returns clean JSON.</p>
<p>The prompt says "Return valid JSON only." The model says "Sure! Here's the JSON:" and wraps it in markdown fences.</p>
<p>Calling <code>json.loads()</code> on that throws a <code>JSONDecodeError</code>. The fix is a small extraction step before parsing:</p>
<pre><code class="language-python"># app/services/response_parser.py
import re

def _extract_json(raw_text: str) -&gt; str:
    """Pull the first JSON object out of raw model output."""
    # Try stripping markdown fences first
    match = re.search(r"```(?:json)?\s*(\{.*?})\s*```", raw_text, re.DOTALL)
    if match:
        return match.group(1)
    # Fall back to the span from the first { to the last }
    match = re.search(r"\{.*}", raw_text, re.DOTALL)
    if match:
        return match.group(0)
    # Nothing that looks like JSON: return the input and let json.loads report the error
    return raw_text
</code></pre>
<p>It handles three cases:</p>
<ol>
<li><code>```json { ... } ```</code> — fenced with language tag</li>
<li><code>``` { ... } ```</code> — bare fences</li>
<li><code>Some preamble text { ... }</code> — JSON buried in prose</li>
</ol>
<p>This is defensive parsing. The model is a collaborator, not a contract. You ask for JSON, you <em>usually</em> get JSON, but you build for the times you don't.</p>
<p><strong>Why this matters:</strong> Every AI app hits this. The sooner you handle it, the fewer 500 errors you ship.</p>
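<p>One case the regex fallback does not cover is prose that contains its own braces, because the greedy match then spans from the first <code>{</code> to the last <code>}</code>. A possible alternative, sketched here with the hypothetical name <code>extract_first_json_object</code>, is to let the standard library's <code>json.JSONDecoder.raw_decode</code> try each opening brace in turn. This is not the approach used in the repository.</p>
<pre><code class="language-python"># Hypothetical alternative to the regex: let the JSON decoder find the object.
import json

def extract_first_json_object(raw_text: str):
    decoder = json.JSONDecoder()
    start = raw_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value beginning at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(raw_text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this brace did not open valid JSON; try the next one
        start = raw_text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")

raw = 'Template {name} filled in:\n{"summary": "ok", "meta": {"n": 1}} Done.'
parsed = extract_first_json_object(raw)
</code></pre>
<p>The trade-off is that this returns a parsed dictionary rather than a string, so it would replace both the extraction step and the <code>json.loads()</code> call.</p>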
<h3>Testing the Parser</h3>
<p>If you're parsing model output, test the weird cases — not just the happy path. These five tests cover the real-world formats models actually return:</p>
<pre><code class="language-python">from app.services.response_parser import parse_model_output

def test_parse_clean_json():
    raw = '{"summary":"hello","action_items":["a","b"],"next_step":"ship it"}'
    result = parse_model_output(raw)
    assert result.summary == "hello"

def test_parse_markdown_fenced_json():
    raw = 'Here is the result:\n```json\n{"summary":"fenced","action_items":["x"],"next_step":"go"}\n```'
    result = parse_model_output(raw)
    assert result.summary == "fenced"

def test_parse_json_with_preamble():
    raw = 'Sure, here you go:\n{"summary":"preamble","action_items":[],"next_step":"next"}'
    result = parse_model_output(raw)
    assert result.summary == "preamble"

def test_parse_missing_fields_uses_defaults():
    # Fields are present but empty; the parser substitutes its defaults.
    raw = '{"summary":"","action_items":[],"next_step":""}'
    result = parse_model_output(raw)
    assert result.summary == "No summary returned."

def test_parse_bare_fences():
    raw = '```\n{"summary":"bare","action_items":["a"],"next_step":"done"}\n```'
    result = parse_model_output(raw)
    assert result.summary == "bare"
</code></pre>
<p>Clean JSON is the easy case. The ones that save you are <code>markdown_fenced</code>, <code>preamble</code>, and <code>bare_fences</code> — those are the formats that cause <code>JSONDecodeError</code> in production when you only tested with mock data.</p>
<hr />
<h2>Prompt Layer</h2>
<p>The prompt builder wraps user input in structured instructions:</p>
<pre><code class="language-python">def build_prompt(user_text: str) -&gt; str:
    return f"""You are helping turn rough notes into useful output.

Return the response in this exact JSON shape:
{{
  "summary": "short paragraph",
  "action_items": ["item 1", "item 2", "item 3"],
  "next_step": "single recommended next step"
}}

Rules:
- Be concise
- Keep action items practical
- Do not include markdown
- Return valid JSON only

Input:
{user_text}
""".strip()
</code></pre>
<p>This is where most of the "intelligence" lives. The model is only as good as the instructions it receives.</p>
<hr />
<h2>Frontend</h2>
<p>The frontend is deliberately minimal: a textarea, a button, a result display. No frameworks.</p>
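<pre><code class="language-javascript">const response = await fetch("/api/analyze", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text }),
});
if (!response.ok) {
  // The BFF answers 400 or 500 with an error body; don't render that as a result.
  throw new Error(`Analyze request failed with status ${response.status}`);
}
const data = await response.json();
renderResults(data);
</code></pre>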
<p>The frontend doesn't know about agents, prompts, or models. It sends text, gets JSON back. That's the BFF contract.</p>
<hr />
<h2>End-to-End Flow</h2>
<p><img src="https://raw.githubusercontent.com/jrgwv/thin-slice/main/images/sequence.png" alt="Sequence Diagram" /></p>
<hr />
<h2>What This Gets You</h2>
<p>This is ~200 lines of Python across 7 files. But the separation buys you real things:</p>
<table>
<thead>
<tr>
<th>Want to...</th>
<th>Change only...</th>
</tr>
</thead>
<tbody><tr>
<td>Swap the model</td>
<td><code>ai_client.py</code></td>
</tr>
<tr>
<td>Change the prompt</td>
<td><code>prompt_builder.py</code></td>
</tr>
<tr>
<td>Add logging/tracing</td>
<td>Register an observer</td>
</tr>
<tr>
<td>Add a mobile frontend</td>
<td>New BFF route</td>
</tr>
<tr>
<td>Add tool calls or multi-step</td>
<td><code>agent_service.py</code></td>
</tr>
<tr>
<td>Change the response shape</td>
<td><code>response_parser.py</code></td>
</tr>
</tbody></table>
<p>Nothing ripples. That's the point.</p>
<hr />
<h2>Running It</h2>
<pre><code class="language-bash">git clone https://github.com/jrgwv/thin-slice.git
cd thin-slice
python3 -m venv .venv &amp;&amp; source .venv/bin/activate
pip install -r requirements.txt
./start.sh
# Open http://127.0.0.1:8000
</code></pre>
<p>Demo mode is on by default — no AWS credentials needed to see it work.
Set <code>DEMO_MODE=false</code> in <code>.env</code> to call Bedrock for real.</p>
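<p>For readers wondering how such a switch might look, here is a minimal sketch. It is an assumption about the shape of the logic, not the repository's actual code: <code>demo_mode_enabled</code>, the <code>invoke</code> parameter, and the canned response are all hypothetical.</p>
<pre><code class="language-python"># Hypothetical sketch of a demo-mode switch; the repository's real logic may differ.
import json
import os

def demo_mode_enabled():
    # Anything except an explicit "false" counts as demo mode, matching the default.
    return os.getenv("DEMO_MODE", "true").strip().lower() != "false"

def call_model(prompt: str, invoke=None):
    if demo_mode_enabled():
        # Canned response in the same JSON shape the prompt asks for.
        return json.dumps({
            "summary": "Demo summary.",
            "action_items": ["Try it with real notes"],
            "next_step": "Set DEMO_MODE=false",
        })
    if invoke is None:
        raise RuntimeError("DEMO_MODE=false requires a model invoker")
    return invoke(prompt)
</code></pre>
<p>Because the canned response has the same shape as real model output, the parser and the frontend run the same code path in both modes.</p>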
<hr />
<h2>Final Thought</h2>
<p>This isn't a production system. It's a clean, working slice of one.</p>
<p>That's enough to test the idea, validate the output, and decide what to build next — without untangling a mess when you do.</p>
<p>Build the slice. Not the system.</p>
]]></content:encoded></item><item><title><![CDATA[GainesAI: Build Fast. Learn Faster. Scale What Works.

]]></title><description><![CDATA[There's a lot of talking in tech right now.
A lot of slides.
A lot of "frameworks."
A lot of opinions about AI.
Not enough building.
GainesAI is about fixing that.

The Mindset
This site is built arou]]></description><link>https://blog.gainesai.com/gainesai-build-fast-learn-faster-scale-what-works</link><guid isPermaLink="true">https://blog.gainesai.com/gainesai-build-fast-learn-faster-scale-what-works</guid><dc:creator><![CDATA[James Gaines]]></dc:creator><pubDate>Thu, 09 Apr 2026 23:02:54 GMT</pubDate><content:encoded><![CDATA[<p>There's a lot of talking in tech right now.</p>
<p>A lot of slides.</p>
<p>A lot of "frameworks."</p>
<p>A lot of opinions about AI.</p>
<p>Not enough building.</p>
<p><strong>GainesAI is about fixing that.</strong></p>
<hr />
<h2>The Mindset</h2>
<p>This site is built around a simple idea:</p>
<p><strong>Don't overthink it. Build it.</strong></p>
<ul>
<li>Prototype in hours, not days</li>
<li>Test ideas quickly</li>
<li>Keep it simple</li>
<li>Good &gt; perfect</li>
<li>2 working versions &gt; 0 "perfect" ones</li>
</ul>
<p>Most ideas don't need a 6-week design phase. They need a working prototype and real feedback. That's what I focus on.</p>
<hr />
<h2>Who I Am</h2>
<p>I'm James Gaines.</p>
<p>I work in cloud, AI, and security — but more importantly, I build things.</p>
<p>Every day I'm working with AWS, distributed systems, and AI/ML in real environments. But the most valuable lessons don't come from documentation — they come from:</p>
<ul>
<li>Trying something</li>
<li>Breaking it</li>
<li>Fixing it</li>
<li>Scaling it</li>
</ul>
<p>This blog is where I share that process.</p>
<hr />
<h2>What You'll Find Here</h2>
<p>This isn't theory. This is execution.</p>
<h3>⚡ Rapid Prototyping with AI</h3>
<ul>
<li>Turning ideas into working apps in hours</li>
<li>Using AI to accelerate development (not replace thinking)</li>
<li>Prompt → prototype → iterate loops</li>
</ul>
<h3>🛠 AWS That Actually Ships</h3>
<ul>
<li>Lambda, Step Functions, EKS, Bedrock — used in real builds</li>
<li>Simple architectures that evolve over time</li>
<li>When not to over-engineer</li>
</ul>
<h3>🔐 Security Without Friction</h3>
<ul>
<li>Building secure systems without slowing everything down</li>
<li>Practical IAM, data protection, and guardrails</li>
<li>Making security part of the build — not a blocker</li>
</ul>
<h3>🚀 From Prototype → Production</h3>
<ul>
<li>What changes when something actually works</li>
<li>Scaling patterns that matter</li>
<li>Cost, reliability, and tradeoffs</li>
</ul>
<hr />
<h2>Show Me, Don't Tell Me</h2>
<p>You won't find long theoretical posts here. You'll find:</p>
<ul>
<li>Working demos</li>
<li>Real code</li>
<li>Architecture that evolves</li>
<li>Honest lessons (including what failed)</li>
</ul>
<p>If something takes too long to explain, I'd rather just build it and show you.</p>
<hr />
<h2>Why This Matters Right Now</h2>
<p>AI is changing how we build software. The bottleneck isn't tools anymore — it's execution.</p>
<p>The advantage goes to people who can:</p>
<ul>
<li>Move quickly</li>
<li>Test ideas</li>
<li>Learn fast</li>
<li>Iterate without overthinking</li>
</ul>
<p>That's the skill I'm focused on sharpening — and sharing.</p>
<hr />
<h2>The Rules I Build By</h2>
<ul>
<li>Start simple</li>
<li>Ship early</li>
<li>Learn from reality, not assumptions</li>
<li>Don't wait for perfect</li>
<li>Stay curious</li>
<li>Stay humble</li>
<li>Stay hungry</li>
</ul>
<hr />
<h2>What's Coming Next</h2>
<p>I'll be building and sharing things like:</p>
<ul>
<li>AI-powered apps built in a few hours</li>
<li>End-to-end AWS pipelines (prototype → production)</li>
<li>Automating workflows with LLMs</li>
<li>Real-world experiments with new tools</li>
<li>Fast iterations on ideas that may or may not work</li>
</ul>
<p>Some will be rough. Some will break. That's the point.</p>
<hr />
<h2>Final Thought</h2>
<p>You don't learn by reading about systems.</p>
<p>You learn by building them.</p>
<p><strong>GainesAI is where I do that — openly.</strong></p>
]]></content:encoded></item></channel></rss>