Adding an AI Chatbot to my Todo App on AWS

A Bedrock Agent, a WebSocket, and the parts I had to figure out along the way


👋 Hi, I'm @hpfpv ☁️ I'm a Cloud Infrastructure Architect | 8x AWS Certified 🚀 I build secure, scalable, and automated solutions on AWS using Terraform, CloudFormation, and CI/CD 📚 Always exploring hybrid cloud, serverless, and AI-driven architectures

TL;DR: I added a natural-language chat panel to my serverless todo app, backed by a Bedrock Agent that calls the existing API as its action group. The post walks through the parts the standard tutorials skip: why I picked API Gateway WebSocket (and why I'd probably pick SSE today), how Cognito auth survives the browser's no-headers-on-WebSocket-handshake limitation, a single-table DynamoDB design that resumes conversations across reconnects, and the Custom Resource you need because Bedrock model invocation logging has no native CloudFormation support.

Hi guys! A few years ago I built a sample todo app on AWS: Cognito for auth, API Gateway + Lambda + DynamoDB for the backend, S3 + CloudFront for the frontend, all behind a SAM template. It was a good little project for learning the basics of serverless on AWS. It still works today.

Fast-forward to now, with everything happening in the GenAI space and Bedrock Agents getting more capable by the month, I've been spending a lot of time learning the Bedrock ecosystem hands-on. And I came back to that old todo app with a new question: can I make this whole thing usable through a chat interface? Not a button-and-form app, but a thing where I open a chat panel, type "list my todos due this week," and the agent actually does it. Then "delete the one about ansible," and it does that too. Then drag a file into the chat window and say "attach this to my dentist appointment todo," and it does that.

I knew Bedrock Agents could do tool calling, and I already had a perfectly good API for the agent to call. I plugged the two together. So let's talk about it.

What I was building

The chatbot needed to do four things:

  • Natural-language CRUD on todos: list, get, add, complete, delete, add notes

  • File attachments via chat: register files the user drops into the chat window

  • Multi-turn memory: "now mark that one as complete" should know what that one refers to

  • Real users only: same Cognito users as the existing app; the bot only sees their data

The architecture

Here's what we ended up with:

The flow:

  1. On page load, the frontend opens a WebSocket connection to API Gateway, with the Cognito ID token in the query string, so the chat panel is ready to use as soon as the user sees it. (Browser WebSockets can't set custom headers; there's a section below on why.)

  2. A Lambda authorizer validates the JWT against the Cognito user pool. If valid, it returns an Allow policy with the user's email in the authorizer context.

  3. On $connect, a WebSocket handler Lambda writes two items to a single DynamoDB table: one keyed by connectionId (deleted on disconnect) and one keyed by userID (survives disconnect, 30-min TTL). The userID item carries the Bedrock session ID, so reconnects resume the same conversation.

  4. On every message, the WebSocket handler calls bedrock_agent_runtime.invoke_agent with the user's text, the persisted sessionId, and the user's email as a promptSessionAttribute.

  5. The Bedrock Agent decides whether to call a tool. If so, it invokes the action group Lambda, which reads/writes the existing todo and files DynamoDB tables.

  6. The agent's streamed response chunks come back through the WebSocket handler, which forwards them frame-by-frame to the browser.

Let's go through each piece.

WebSocket vs HTTP, and what I'd weigh differently now

The first decision I had to make was how the browser would talk to the backend. There are three reasonable options for a streaming chat:

  • Plain HTTP request-response. Simple, but no streaming. The user sees nothing until the full agent reply lands.

  • HTTP + Server-Sent Events (SSE). One-way streaming from server to client over a long-lived HTTP connection. I'll come back to this one: it turns out to be a closer fit for what this app actually does than I realised when I picked WebSocket.

  • WebSocket. Bidirectional streaming over a single persistent connection.

I went with WebSocket. The honest reason: at the time, it was the easiest AWS-managed path to streaming. API Gateway HTTP API doesn't support response streaming, so SSE on AWS means either Lambda Function URLs with response streaming (which skips API Gateway, so you lose custom domains, throttling, request validators, all the API GW features), or putting Lambda behind an ALB or CloudFront with extra wiring. API Gateway WebSocket API gives you the streaming piece (post_to_connection) with custom domains, authorizers, and routing already in the box. If your goal is to ship, WebSocket wins.

Here's the thing I didn't think enough about at the time: I don't actually need bidirectional streaming for this app. The user types a message, the agent streams a reply, repeat. There's no server push today: no notifications, no multi-device sync, no background events landing in the chat. That's a textbook SSE workload. WebSocket is giving me a two-way channel I never use.

The steady-state cost of that choice has been real:

  • WAF doesn't attach to established WebSocket connections. Rate limiting and edge protection need a different design, which is most of Part 2.

  • The token has to go in the query string. Browsers don't let you set headers on the WebSocket handshake, so the JWT lands in API Gateway access logs by default. You can work around it, but it's one more thing to remember.

  • API Gateway WebSocket has lower default throttle limits than its HTTP counterpart. I hit them once token streaming started producing many small post_to_connection calls per response, and had to raise the stage limits.

  • $connect/$disconnect plumbing adds Lambda invocations and DynamoDB writes per session that an SSE design wouldn't need.

If I were starting today, with the same features in mind, I'd seriously look at Lambda Function URLs + SSE and pay the one-time setup cost upfront for a cheaper, simpler app to run day to day. WebSocket is the right call when you actually need server push. For a "user asks, agent streams back" chat, SSE is a better fit.

I'm keeping WebSocket in this post because that's what shipped, and because the lessons from running it in production are worth more than a rewrite I haven't done. But if you have the same choice to make today: think hard about whether your app is really two-way, or just streaming.

$connect auth: why the token is in the query string

Here's the part that catches everyone off guard the first time. A WebSocket connection starts life as a regular HTTP/1.1 request that asks the server to "upgrade" the TCP socket to the WebSocket protocol. From the browser side, that handshake goes through the WebSocket constructor, which is deliberately kept simple: you give it a URL, you get a socket back. There's no way to set Authorization: Bearer ... on that upgrade request, so the usual HTTP auth pattern doesn't work.

The workaround is the query string:

// frontend chatbot.ts
const url = `${config.chatbotWsEndpoint}?token=${encodeURIComponent(token)}`;
ws = new WebSocket(url);

API Gateway lets you wire a Lambda authorizer that reads identity from route.request.querystring.token:

ChatbotAuthorizer:
  Type: AWS::ApiGatewayV2::Authorizer
  Properties:
    ApiId: !Ref ChatbotWebSocketApi
    AuthorizerType: REQUEST
    AuthorizerUri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${AuthorizerFunction.Arn}/invocations"
    IdentitySource:
      - route.request.querystring.token
    Name: CognitoJwtAuthorizer

The authorizer Lambda decodes and verifies the JWT (signed by the Cognito User Pool's JWKS) and returns an Allow policy with the user's email in the context:

return _policy(user_id, 'Allow', method_arn, {'userID': user_id})

The WebSocket handler then reads the user identity from event['requestContext']['authorizer']['userID'], never from the message body. The user can't lie about who they are because API Gateway sets that field from the authorizer's response, not from anything the client sent.

The token-in-querystring approach comes with a few things to be aware of: the token shows up in API Gateway access logs, in the browser's Network tab, and possibly in any HTTP proxy logs along the way. The usual mitigations all apply. Cognito ID tokens are short-lived (1 hour by default), so a leaked token doesn't last long; configure the WebSocket stage's access log format so it leaves out the query string in production; and treat the token as a login credential, not a stable ID. It's a browser WebSocket API limitation, not an AWS issue, and it's the same pattern AWS's own WebSocket auth examples use.
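
Since the token travels in the URL, the expiry check is what bounds the damage from a leak. Here is a rough sketch of the claim checks an authorizer could run once a JWT library has already verified the signature against the JWKS. The function name validate_claims and its parameters are hypothetical; the claim names (iss, aud, exp, token_use, email) are the ones Cognito ID tokens carry:

```python
import time


def validate_claims(claims, *, issuer, client_id, now=None):
    """Check decoded Cognito ID-token claims; raise ValueError on any mismatch.

    Assumes the signature was already verified elsewhere. This only covers
    the claim checks that come afterwards.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    if claims.get("token_use") != "id":
        raise ValueError("not an ID token")
    if claims.get("aud") != client_id:
        raise ValueError("unexpected audience")
    if float(claims.get("exp", 0)) <= now:
        raise ValueError("token expired")
    email = claims.get("email")
    if not email:
        raise ValueError("missing email claim")
    return email
```

The issuer for a user pool has the form https://cognito-idp.REGION.amazonaws.com/USER_POOL_ID, so both values can come from the stack's parameters rather than being hard-coded.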

Single-table DynamoDB for connection + session state

I needed two things in DynamoDB:

  • a record that says "this connectionId belongs to that userID and uses that sessionId", short-lived, deleted on disconnect

  • a record that says "this userID has an active Bedrock session with this sessionId, valid until this TTL", surviving disconnect so a reconnect can resume the same conversation

The temptation was to use two tables. Instead I used one table with two item types, both keyed by a single pk attribute:

# connection-scoped item (deleted on $disconnect)
{ 'pk': connection_id, 'userID': user_id, 'sessionId': session_id, 'ttl': now + 30*60 }

# user-scoped item (survives disconnect, refreshed on connect)
{ 'pk': user_id, 'sessionId': session_id, 'ttl': now + 30*60 }

On $connect, the handler reads the userID item; if it exists and isn't expired, it reuses the sessionId. Otherwise it generates a fresh one. On $disconnect, only the connection-scoped item is deleted. The user-scoped one keeps living so the conversation can resume on the next connection.

Two writes per connect (instead of one), but no GSI cost, no second IAM resource, and a simpler deploy. The table's IAM policy is three actions on one ARN: GetItem, PutItem, DeleteItem.
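
The connect/disconnect logic described above can be sketched as follows. This is an illustration under stated assumptions, not the shipped handler: on_connect, on_disconnect, and FakeTable are hypothetical names, and FakeTable is an in-memory stand-in that mimics only the three boto3 Table calls the design needs (get_item, put_item, delete_item) so the sketch runs without AWS:

```python
import time
import uuid

SESSION_TTL_SECONDS = 30 * 60


class FakeTable:
    """In-memory stand-in for a boto3 DynamoDB Table (illustration only)."""

    def __init__(self):
        self.items = {}

    def get_item(self, Key):
        item = self.items.get(Key["pk"])
        # Like boto3, omit the "Item" key when nothing is found.
        return {"Item": dict(item)} if item else {}

    def put_item(self, Item):
        self.items[Item["pk"]] = dict(Item)

    def delete_item(self, Key):
        self.items.pop(Key["pk"], None)


def on_connect(table, connection_id, user_id, now=None):
    """Reuse the user's live session if there is one, else start a new one."""
    now = int(time.time()) if now is None else now
    existing = table.get_item(Key={"pk": user_id}).get("Item")
    # DynamoDB removes expired items lazily, so check the TTL explicitly.
    if existing and existing["ttl"] > now:
        session_id = existing["sessionId"]
    else:
        session_id = str(uuid.uuid4())
    ttl = now + SESSION_TTL_SECONDS
    table.put_item(Item={"pk": connection_id, "userID": user_id,
                         "sessionId": session_id, "ttl": ttl})
    table.put_item(Item={"pk": user_id, "sessionId": session_id, "ttl": ttl})
    return session_id


def on_disconnect(table, connection_id):
    """Delete only the connection-scoped item; the user item lives on."""
    table.delete_item(Key={"pk": connection_id})
```

Passing the table in as an argument keeps the session logic testable without DynamoDB; in the Lambda it would be the real boto3 Table resource.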

Streaming the Bedrock Agent response

Bedrock's invoke_agent returns a response whose completion field is an event stream of chunk events. The naive way to consume it is:

agent_answer = ''
for event in agent_response['completion']:
    if 'chunk' in event:
        agent_answer += event['chunk']['bytes'].decode('utf-8')
# post once at the end
api_gw.post_to_connection(ConnectionId=connection_id, Data=json.dumps({'response': agent_answer}))

The streaming way is to forward each chunk as it arrives, plus a final "done" frame:

agent_answer = ''
for event in agent_response['completion']:
    if 'chunk' in event:
        chunk_text = event['chunk']['bytes'].decode('utf-8')
        agent_answer += chunk_text
        api_gw.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps({'type': 'chunk', 'text': chunk_text}),
        )
api_gw.post_to_connection(
    ConnectionId=connection_id,
    Data=json.dumps({'type': 'done'}),
)
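
Forwarding every chunk means one post_to_connection call per chunk, which is where the throttling mentioned earlier comes from. One possible mitigation, which is not what this app shipped (I raised the stage limits instead), is to coalesce small chunks before posting. A minimal sketch; coalesce_chunks and min_chars are hypothetical names:

```python
def coalesce_chunks(chunks, min_chars=64):
    """Yield buffered text once at least min_chars have accumulated.

    Whatever remains in the buffer is flushed when the input ends, so no
    text is lost.
    """
    buffer = []
    size = 0
    for text in chunks:
        buffer.append(text)
        size += len(text)
        if size >= min_chars:
            yield "".join(buffer)
            buffer, size = [], 0
    if buffer:
        yield "".join(buffer)


# Five tiny chunks become two frames with a 6-character threshold.
frames = list(coalesce_chunks(["Hel", "lo ", "wor", "ld", "!"], min_chars=6))
print(frames)  # ['Hello ', 'world!']
```

In the handler, the loop over agent_response['completion'] would feed decoded chunk text into this generator and post each yielded frame. The trade-off is slightly choppier streaming in exchange for fewer API calls.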

The frontend keeps a single in-progress message bubble keyed off the streaming flag and re-renders the bubble's innerHTML from the accumulated text on every chunk. Always pass the text through formatBotText, which HTML-escapes the source string before rendering; concatenating raw model output into innerHTML is an XSS vector.

function appendChunk(text: string): void {
    _streamingText += text;
    if (!_streamingBubble) {
        removeTypingIndicator();
        const bubble = document.createElement('div');
        bubble.classList.add('message', 'bot');
        chatMessages.appendChild(bubble);
        _streamingBubble = bubble;
    }
    _streamingBubble.innerHTML =
        '<span class="bot-avatar-sm">✦</span>' + formatBotText(_streamingText);
    chatMessages.scrollTop = chatMessages.scrollHeight;
}

The done frame triggers finalizeStream(), which persists the full text to localStorage and resets the in-progress flag. That's the whole streaming flow.

The Bedrock Agent and its action group

The agent itself is a SAM resource. The interesting bits are the foundation model, the instruction, and the function schema for the action group:

TodoAgent:
  Type: AWS::Bedrock::Agent
  Properties:
    AgentName: !Sub "${AWS::StackName}-todo-assistant"
    FoundationModel: amazon.nova-lite-v1:0
    AgentResourceRoleArn: !GetAtt TodoAgentRole.Arn
    AutoPrepare: true
    Instruction: |
      You are Tasko, a friendly productivity assistant ...
      The current user's ID is $prompt_session.userID$. Use this exact value
      whenever a function requires a userID parameter.
      ...
    ActionGroups:
      - ActionGroupName: TodoActions
        Description: Actions for reading and managing the user's todos
        ActionGroupExecutor:
          Lambda: !GetAtt ActionGroupHandlerFunction.Arn
        FunctionSchema:
          Functions:
            - Name: getTodos
              Parameters:
                userID: { Type: string, Required: true, Description: "..." }
            # ... 8 more functions

$prompt_session.userID$ is the placeholder Bedrock substitutes from the promptSessionAttributes you pass on each invoke_agent call. The WebSocket handler passes:

agent_response = bedrock_agent_runtime.invoke_agent(
    inputText=human,
    agentId=AGENT_ID,
    agentAliasId=AGENT_ALIAS_ID,
    sessionId=session_id,
    sessionState={'promptSessionAttributes': {'userID': user_id}},
)

The user's email never travels in the message body, so the user can't put userID=other-person@example.com in their input and have the agent see it as a different identity at the model layer. The substitution happens server-side.
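
On the receiving end, the action group Lambda gets the function name and its parameters from the agent and returns a text body. A minimal sketch of that handler, assuming the function-details event and response format Bedrock Agents use for FunctionSchema action groups; get_todos and the HANDLERS table are hypothetical stand-ins for the real DynamoDB-backed code:

```python
import json


def get_todos(user_id):
    # Hypothetical stand-in for the real DynamoDB query.
    return [{"todoID": "1", "title": "dentist appointment", "owner": user_id}]


# Maps function names from the FunctionSchema to Python callables.
HANDLERS = {"getTodos": lambda params: get_todos(params["userID"])}


def lambda_handler(event, context):
    # Bedrock passes parameters as a list of {name, type, value} dicts.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    function = event["function"]
    handler = HANDLERS.get(function)

    function_response = {}
    if handler is None:
        body = f"Unknown function: {function}"
        function_response["responseState"] = "FAILURE"
    else:
        body = json.dumps(handler(params))
    function_response["responseBody"] = {"TEXT": {"body": body}}

    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event["actionGroup"],
            "function": function,
            "functionResponse": function_response,
        },
    }
```

Note that in this shape the userID arrives as an ordinary function parameter filled in by the model, which is exactly what the instruction text above asks the agent to do.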

(I'll come back to that in Part 2. There's something subtle here that took me longer than it should have to spot.)

Bedrock model invocation logging: the operational blind spot

Here's the thing I genuinely didn't know until I was halfway through: Bedrock model invocation logging has no native CloudFormation support. There's a Bedrock API, put_model_invocation_logging_configuration, but no AWS::Bedrock::LoggingConfiguration CFN resource. Without invocation logging on, every call to your model is invisible: no latency, no token count, no prompt/response audit. For a production agent, that's not OK.

The fix is a SAM Custom Resource Lambda that calls the API once, on stack create/update:

EnableBedrockLoggingFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handler.lambda_handler
    Policies:
      - Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: bedrock:PutModelInvocationLoggingConfiguration
            Resource: "*"
          - Effect: Allow
            Action: iam:PassRole
            Resource: !GetAtt BedrockLoggingRole.Arn

BedrockLoggingCustomResource:
  Type: AWS::CloudFormation::CustomResource
  Properties:
    ServiceToken: !GetAtt EnableBedrockLoggingFunction.Arn

The handler is straightforward: read a target log group from environment, configure Bedrock to write there, return a stable physical ID so CloudFormation doesn't try to delete-and-recreate on every update. Small piece of code, but it closes a real observability gap.
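
A sketch of what such a handler could look like, under a few labeled assumptions: the environment variable names LOG_GROUP_NAME and LOGGING_ROLE_ARN are hypothetical, Delete is treated as a no-op that leaves the account-level setting in place, and the Bedrock client and the response sender are injectable so the logic can be exercised without AWS. The response body follows the standard CloudFormation custom resource contract:

```python
import json
import os
import urllib.request

# Stable physical ID so updates never trigger delete-and-recreate.
PHYSICAL_ID = "bedrock-model-invocation-logging"


def configure_logging(bedrock, request_type, log_group, role_arn):
    if request_type == "Delete":
        # Assumption: keep the account-level setting on stack delete. Call
        # delete_model_invocation_logging_configuration() here for teardown.
        return
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig={
            "cloudWatchConfig": {"logGroupName": log_group, "roleArn": role_arn},
            "textDataDeliveryEnabled": True,
        }
    )


def build_response(event, status, reason=""):
    return {
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": PHYSICAL_ID,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(event, body):
    # CloudFormation waits for a PUT to the pre-signed ResponseURL.
    request = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": ""},
    )
    urllib.request.urlopen(request)


def lambda_handler(event, context, bedrock=None, send=send_response):
    try:
        if bedrock is None:
            import boto3  # bundled with the Lambda Python runtime
            bedrock = boto3.client("bedrock")
        configure_logging(
            bedrock,
            event["RequestType"],
            os.environ["LOG_GROUP_NAME"],
            os.environ["LOGGING_ROLE_ARN"],
        )
        send(event, build_response(event, "SUCCESS"))
    except Exception as exc:
        # Always answer, or the stack hangs until CloudFormation times out.
        send(event, build_response(event, "FAILED", str(exc)))
```

The try/except around everything matters more than the API call itself: a custom resource that crashes without responding leaves the stack stuck in progress.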

IaC and the deployment pipeline

The whole chatbot service lives under infra/sam/ai-assistant/template.yaml. Source code under services/ai-assistant/src/{authorizer,websocket_handler,action_group,enable_bedrock_logging}/. Tests under services/ai-assistant/tests/.

GitHub Actions handles deploy. The interesting part is how config flows between services. The chatbot needs the Cognito User Pool ID from the main service. Rather than duplicate the value as a GitHub secret, the main-service stack publishes its IDs to SSM Parameter Store:

SsmCognitoUserPoolId:
  Type: AWS::SSM::Parameter
  Properties:
    Name: /todo-houessou-com/main-service/cognito-user-pool-id
    Type: String
    Value: !Ref TodoUserPool

The chatbot pipeline reads from SSM during deploy:

- name: Read config
  run: |
    set -euo pipefail
    COGNITO_USER_POOL_ID=$(aws ssm get-parameter \
      --name /todo-houessou-com/main-service/cognito-user-pool-id \
      --query Parameter.Value --output text \
      --region ${{ secrets.AWS_REGION }})
    echo "COGNITO_USER_POOL_ID=$COGNITO_USER_POOL_ID" >> $GITHUB_ENV

And passes the value as a SAM parameter:

sam deploy ... --parameter-overrides CognitoUserPoolId=$COGNITO_USER_POOL_ID

Basically: SSM is the source of truth for cross-service config, and pipelines pick it up automatically. No GitHub secrets to rotate, no manual copying after a --guided deploy. The set -euo pipefail at the top of the run block makes sure that if the SSM parameter is missing, the build fails loudly instead of silently passing an empty value through.

Takeaways

  • WebSocket was probably overkill for this app. I picked it because API Gateway's managed WebSocket path is the easiest way to stream on AWS, but the day-to-day cost (WAF gap, query-string auth, lower throttle limits, $connect/$disconnect plumbing) is real, and the two-way channel never gets used.

  • Pick WebSocket only when you need real server push. For a "user asks, agent streams back" workload, SSE via Lambda Function URLs is the more proportionate fit. Don't pick WebSocket by default just because you're building chat.

  • The token has to go in the query string. Browsers don't let you set headers on WebSocket handshakes. Wire the Lambda authorizer to IdentitySource: route.request.querystring.token and read identity server-side from the authorizer context (event['requestContext']['authorizer']), not from the message body.

  • Single-table DynamoDB for connection + session state is plenty. Two item types, one pk. No GSI, narrow IAM, simple deploy.

  • Bedrock model invocation logging needs a Custom Resource. There's no native CFN support, and without it your agent is unobservable: no latency, no token count, no audit trail. A small Custom Resource Lambda closes the gap once and for the lifetime of the stack.

  • SSM Parameter Store beats GitHub secrets for cross-service config. Pipelines read at deploy time, no rotation, no manual copy. set -euo pipefail makes missing values fail loud.

Where this goes next

The chatbot works. Multi-turn memory holds, streaming feels good, the action group handles the full set of todo and file operations through natural language. I shipped it feeling pretty good about it.

Then I started poking at it the way an attacker would, and I realised I'd left every door propped open. That's Part 2: same architecture, with the security work I should've done from day one.

The full source is on GitHub.

Happy building!

Building an AI Chatbot on AWS

Part 1 of 1
