Adding an AI Chatbot to my Todo App on AWS

A Bedrock Agent, a WebSocket, and the parts I had to figure out along the way

👋 Hi, I'm @hpfpv ☁️ I'm a Cloud Infrastructure Architect | 8x AWS Certified 🚀 I build secure, scalable, and automated solutions on AWS using Terraform, CloudFormation, and CI/CD 📚 Always exploring hybrid cloud, serverless, and AI-driven architectures

TL;DR: I added a natural-language chat panel to my serverless todo app, backed by a Bedrock Agent that calls the existing API as its action group. This post is the parts the standard tutorials skip: how Cognito auth survives the browser's no-headers-on-WebSocket-handshake limitation, a single-table DynamoDB design that resumes conversations across reconnects, and the Custom Resource you need because Bedrock model invocation logging has no native CloudFormation support.

A few years ago I built a sample todo app on AWS: Cognito for auth, API Gateway + Lambda + DynamoDB for the backend, S3 + CloudFront for the frontend, all behind a SAM template. It still works today.

I came back to it with a new question: can I make this whole thing usable through a chat interface? Open a chat panel, type "list my todos due this week," and the agent actually does it. Then "delete the one about ansible," and it does that too. Then drag a file into the chat window and say "attach this to my dentist appointment todo," and it does that.

Bedrock Agents do tool calling. I already had a perfectly good API. I plugged the two together. Let's talk about it.

What I was building

The chatbot needed to do four things:

  • Natural-language CRUD on todos: list, get, add, complete, delete, add notes

  • File attachments via chat, where the user drops files into the chat window and the bot registers them against a todo

  • Multi-turn memory, so "now mark that one as complete" knows what that one refers to

  • Cognito-scoped access, so the bot only ever sees the data of the user it's logged in for

The architecture

The flow:

  1. On page load, the frontend opens a WebSocket to API Gateway with the Cognito ID token in the query string, so the chat is ready as soon as the user sees it.

  2. A Lambda authorizer validates the JWT against the Cognito user pool and returns an Allow policy with the user's identity in the authorizer context.

  3. On $connect, the WebSocket handler Lambda writes two items to a single DynamoDB table: one keyed by connectionId (deleted on disconnect), one keyed by userID (survives disconnect, 30-min TTL). The userID item carries the Bedrock session ID, so reconnects resume the same conversation.

  4. On every message, the handler calls bedrock_agent_runtime.invoke_agent with the user's text, the persisted sessionId, and the user's email as a promptSessionAttribute.

  5. The Bedrock Agent decides whether to call a tool. If so, it invokes the action group Lambda, which reads/writes the existing todo and files tables.

  6. Streamed response chunks come back through the WebSocket handler, which forwards them frame-by-frame to the browser.

A quick note on WebSocket vs SSE. If I were starting today, I'd seriously evaluate SSE via Lambda Function URLs first; WebSocket shipped, but it came with operational tradeoffs.

$connect auth: why the token is in the query string

This one's a browser limitation, not an AWS one. A WebSocket connection starts life as a regular HTTP/1.1 request that asks the server to "upgrade" the TCP socket to the WebSocket protocol. From the browser side, that handshake goes through the WebSocket constructor, which is deliberately kept simple: you give it a URL, you get a socket back. There's no way to set Authorization: Bearer ... on that upgrade request, so the usual HTTP auth pattern doesn't work.

The workaround is the query string:

// frontend chatbot.ts
const url = `${config.chatbotWsEndpoint}?token=${encodeURIComponent(token)}`;
ws = new WebSocket(url);

API Gateway lets you wire a Lambda authorizer that reads identity from route.request.querystring.token:

ChatbotAuthorizer:
  Type: AWS::ApiGatewayV2::Authorizer
  Properties:
    ApiId: !Ref ChatbotWebSocketApi
    AuthorizerType: REQUEST
    AuthorizerUri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${AuthorizerFunction.Arn}/invocations"
    IdentitySource:
      - route.request.querystring.token
    Name: CognitoJwtAuthorizer

The authorizer Lambda decodes and verifies the JWT (signed by the Cognito User Pool's JWKS) and returns an Allow policy with the user's email in the context:

return _policy(user_id, 'Allow', method_arn, {'userID': user_id})

The WebSocket handler reads the user identity from event['requestContext']['authorizer']['userID'], never from anything the client sent in the body. The user can't lie about who they are because that field is set by API Gateway from the authorizer's response.

The token-in-querystring approach comes with a few things to be aware of: the token shows up in API Gateway access logs, in the browser's Network tab, and possibly in any HTTP proxy logs along the way. The usual fixes all work. Cognito ID tokens are short-lived (1 hour by default), so even if a token leaks it doesn't last long; set the WS stage's access logging to skip query strings in production; and treat the token as a login credential, not a stable ID. Same pattern AWS's own WebSocket auth examples use.
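
Because the token is a credential with a short life, the authorizer should reject anything that is expired or was issued for a different purpose. The sketch below covers only the claim checks that come after signature verification against the JWKS (which needs a JWT library and is not shown). The function name and arguments are hypothetical; the claim names (`token_use`, `iss`, `aud`, `exp`, `email`) are the ones Cognito puts in ID tokens.

```python
import time


def check_id_token_claims(claims, *, issuer, client_id, now=None):
    """Validate claims of an ID token whose signature was ALREADY verified.

    Returns the user's email on success and raises ValueError otherwise.
    `issuer` is expected to look like
    https://cognito-idp.<region>.amazonaws.com/<userPoolId>.
    """
    now = time.time() if now is None else now
    if claims.get('token_use') != 'id':
        raise ValueError('not an ID token')
    if claims.get('iss') != issuer:
        raise ValueError('unexpected issuer')
    if claims.get('aud') != client_id:
        raise ValueError('unexpected audience')
    if float(claims.get('exp', 0)) <= now:
        raise ValueError('token expired')
    email = claims.get('email')
    if not email:
        raise ValueError('no email claim')
    return email
```

On any `ValueError` the authorizer would return a Deny policy (or raise), so the `$connect` request never reaches the WebSocket handler.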

Single-table DynamoDB for connection + session state

I needed two things in DynamoDB:

  • a record that says "this connectionId belongs to that userID and uses that sessionId", short-lived, deleted on disconnect

  • a record that says "this userID has an active Bedrock session with this sessionId, valid until this TTL", surviving disconnect so a reconnect can resume the same conversation

My first instinct was two tables. I'm glad I didn't go that way; the access pattern doesn't justify it. One table with two item types, both keyed by a single pk attribute:

# connection-scoped item (deleted on $disconnect)
{ 'pk': connection_id, 'userID': user_id, 'sessionId': session_id, 'ttl': now + 30*60 }

# user-scoped item (survives disconnect, refreshed on connect)
{ 'pk': user_id, 'sessionId': session_id, 'ttl': now + 30*60 }

On $connect, the handler reads the userID item; if it exists and isn't expired, it reuses the sessionId. Otherwise it generates a fresh one. On $disconnect, only the connection-scoped item is deleted. The user-scoped one keeps living so the conversation can resume on the next connection.

Two writes per connect instead of one, but no GSI cost, no second IAM resource, and a simpler deploy. The table's IAM policy is three actions on one ARN: GetItem, PutItem, DeleteItem.
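
The connect-time lookup described above can be sketched roughly as follows. `resolve_session` and `FakeTable` are hypothetical names; `table` is assumed to behave like a boto3 DynamoDB `Table` resource (`get_item(Key=...)`, `put_item(Item=...)`). The explicit expiry check matters because DynamoDB TTL deletion is asynchronous, so an item past its TTL can still be returned by a read.

```python
import time
import uuid

SESSION_TTL_SECONDS = 30 * 60  # the 30-minute TTL used in the post


def resolve_session(table, connection_id, user_id, now=None):
    """Reuse the user's Bedrock session if still valid, else start a new one."""
    now = int(time.time()) if now is None else now
    item = table.get_item(Key={'pk': user_id}).get('Item')
    # TTL cleanup is lazy, so check the timestamp instead of trusting existence.
    if item and int(item['ttl']) > now:
        session_id = item['sessionId']
    else:
        session_id = str(uuid.uuid4())
    expires = now + SESSION_TTL_SECONDS
    # User-scoped item: survives disconnect, refreshed on every connect.
    table.put_item(Item={'pk': user_id, 'sessionId': session_id, 'ttl': expires})
    # Connection-scoped item: deleted on $disconnect.
    table.put_item(Item={'pk': connection_id, 'userID': user_id,
                         'sessionId': session_id, 'ttl': expires})
    return session_id


class FakeTable:
    """In-memory stand-in for a boto3 Table, for local experimentation only."""

    def __init__(self):
        self.items = {}

    def get_item(self, Key):
        item = self.items.get(Key['pk'])
        return {'Item': item} if item else {}

    def put_item(self, Item):
        self.items[Item['pk']] = dict(Item)
```

With the fake table, two connects inside the 30-minute window return the same session ID, and a connect after the window returns a new one.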

Streaming the Bedrock Agent response

Bedrock's invoke_agent returns an event stream of chunk events under the completion key. My first version accumulated the whole thing and posted it at the end. It worked, but the user stared at a typing indicator for the entire invocation and the UI felt broken. Forward each chunk as it arrives instead, plus a final "done" frame:

agent_answer = ''
for event in agent_response['completion']:
    if 'chunk' in event:
        chunk_text = event['chunk']['bytes'].decode('utf-8')
        agent_answer += chunk_text
        api_gw.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps({'type': 'chunk', 'text': chunk_text}),
        )
api_gw.post_to_connection(
    ConnectionId=connection_id,
    Data=json.dumps({'type': 'done'}),
)
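
One case the loop above doesn't cover is the browser going away mid-stream. A hedged sketch of the same loop with the transport injected, so it can stop early when the connection is gone; `stream_to_client` and `send` are hypothetical names, and the commented `send` wrapper assumes the `GoneException` that the API Gateway Management API client raises for closed connections:

```python
import json


def stream_to_client(completion, send):
    """Forward agent chunks through `send`; stop early if the client is gone.

    `send` takes a JSON string and returns False once the connection no
    longer exists. Returns (accumulated_answer, finished_normally).
    """
    answer = ''
    for event in completion:
        if 'chunk' not in event:
            continue  # other event types (for example traces) are skipped here
        text = event['chunk']['bytes'].decode('utf-8')
        answer += text
        if not send(json.dumps({'type': 'chunk', 'text': text})):
            return answer, False  # client disconnected, stop forwarding
    send(json.dumps({'type': 'done'}))
    return answer, True


# A possible `send` for the real handler (assumes api_gw and connection_id exist):
#
# def send(data):
#     try:
#         api_gw.post_to_connection(ConnectionId=connection_id, Data=data)
#         return True
#     except api_gw.exceptions.GoneException:
#         return False
```

Injecting `send` also makes the loop easy to exercise locally with a list of fake events.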

The frontend keeps a single in-progress message bubble and re-renders the bubble's innerHTML from the accumulated text on every chunk. Use formatBotText (HTML-escape the source string before rendering), never raw concatenation; that's an XSS vector.

function appendChunk(text: string): void {
    _streamingText += text;
    if (!_streamingBubble) {
        removeTypingIndicator();
        const bubble = document.createElement('div');
        bubble.classList.add('message', 'bot');
        chatMessages.appendChild(bubble);
        _streamingBubble = bubble;
    }
    _streamingBubble.innerHTML =
        '<span class="bot-avatar-sm">✦</span>' + formatBotText(_streamingText);
    chatMessages.scrollTop = chatMessages.scrollHeight;
}

The done frame triggers finalizeStream(), which persists the full text to localStorage and resets the in-progress flag. That's the whole streaming flow.

The Bedrock Agent and its action group

The agent itself is a CloudFormation resource declared in the same SAM template. The interesting bits are the foundation model, the instruction, and the function schema for the action group:

TodoAgent:
  Type: AWS::Bedrock::Agent
  Properties:
    AgentName: !Sub "${AWS::StackName}-todo-assistant"
    FoundationModel: amazon.nova-lite-v1:0
    AgentResourceRoleArn: !GetAtt TodoAgentRole.Arn
    AutoPrepare: true
    Instruction: |
      You are Tasko, a friendly productivity assistant ...
      The current user's ID is `$prompt_session.userID$`. Use this exact value
      whenever a function requires a userID parameter.
      ...
    ActionGroups:
      - ActionGroupName: TodoActions
        Description: Actions for reading and managing the user's todos
        ActionGroupExecutor:
          Lambda: !GetAtt ActionGroupHandlerFunction.Arn
        FunctionSchema:
          Functions:
            - Name: getTodos
              Parameters:
                userID: { Type: string, Required: true, Description: "..." }
            # ... 8 more functions

$prompt_session.userID$ is the placeholder Bedrock substitutes from the promptSessionAttributes you pass on each invoke_agent call. The WebSocket handler passes:

agent_response = bedrock_agent_runtime.invoke_agent(
    inputText=human,
    agentId=AGENT_ID,
    agentAliasId=AGENT_ALIAS_ID,
    sessionId=session_id,
    sessionState={'promptSessionAttributes': {'userID': user_id}},
)

The user's email never travels in the message body, so the user can't put userID=other-person@example.com in their input and have the agent see it as a different identity at the model layer. The substitution happens server-side.

That said, promptSessionAttributes is context for the model, not an authorization boundary. The action group Lambda still has to enforce user ownership on every DynamoDB read and write; the agent instruction is not where security lives.
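
A rough sketch of what that enforcement could look like inside the action group Lambda. It assumes the function-schema event format (a `parameters` list of name/type/value entries plus a `promptSessionAttributes` map) and that each todo item carries a `userID` attribute; all helper names are hypothetical. The identity comes from the attributes the WebSocket handler set, never from the `userID` parameter the model filled in.

```python
def trusted_user_id(event):
    """Identity set server-side by the invoke_agent caller, not by the model."""
    user_id = (event.get('promptSessionAttributes') or {}).get('userID')
    if not user_id:
        raise PermissionError('no userID in prompt session attributes')
    return user_id


def params(event):
    """Flatten the model-supplied parameter list into a dict (untrusted input)."""
    return {p['name']: p['value'] for p in event.get('parameters', [])}


def assert_owner(item, user_id):
    """Refuse to act on a todo that is missing or belongs to someone else."""
    if item is None or item.get('userID') != user_id:
        raise PermissionError('todo not found for this user')


def text_response(event, body):
    """Wrap a text result in the action group response envelope."""
    return {
        'messageVersion': '1.0',
        'response': {
            'actionGroup': event['actionGroup'],
            'function': event['function'],
            'functionResponse': {'responseBody': {'TEXT': {'body': body}}},
        },
    }
```

In a handler, every read or write would call `trusted_user_id(event)` first, fetch the item, and run `assert_owner` before touching it, so a prompt-injected `userID` parameter has no effect on which rows are reachable.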

Bedrock model invocation logging: the operational blind spot

Here's the thing I genuinely didn't know until I was halfway through: Bedrock model invocation logging has no native CloudFormation support. There's a Bedrock API, put_model_invocation_logging_configuration, but no AWS::Bedrock::LoggingConfiguration CFN resource. Without invocation logging on, every call to your model is invisible: no latency, no token count, no prompt/response audit. For a production agent, that's not OK.

The fix is a SAM Custom Resource Lambda that calls the API once, on stack create/update:

EnableBedrockLoggingFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handler.lambda_handler
    Policies:
      - Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: bedrock:PutModelInvocationLoggingConfiguration
            Resource: "*"
          - Effect: Allow
            Action: iam:PassRole
            Resource: !GetAtt BedrockLoggingRole.Arn

BedrockLoggingCustomResource:
  Type: AWS::CloudFormation::CustomResource
  Properties:
    ServiceToken: !GetAtt EnableBedrockLoggingFunction.Arn

The handler is straightforward: read a target log group from environment, configure Bedrock to write there, return a stable physical ID so CloudFormation doesn't try to delete-and-recreate on every update. Small piece of code, but it closes a real observability gap.
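
The handler itself isn't shown in the post, so here is a minimal sketch under stated assumptions: the environment variable names `LOG_GROUP_NAME` and `LOGGING_ROLE_ARN` are hypothetical, only text payloads are logged, and Delete is treated as a no-op (the IAM policy above only grants the Put action). The Bedrock client is injected so the logic can be exercised without AWS access.

```python
import json
import os
import urllib.request

# Stable physical ID, so stack updates never look like a resource replacement.
PHYSICAL_ID = 'bedrock-model-invocation-logging'


def build_logging_config(log_group, role_arn):
    """loggingConfig payload for put_model_invocation_logging_configuration."""
    return {
        'cloudWatchConfig': {'logGroupName': log_group, 'roleArn': role_arn},
        'textDataDeliveryEnabled': True,
        'imageDataDeliveryEnabled': False,
        'embeddingDataDeliveryEnabled': False,
    }


def handle(event, bedrock, log_group, role_arn):
    """Apply the config on Create/Update and build the CloudFormation reply."""
    status, reason = 'SUCCESS', 'OK'
    try:
        if event['RequestType'] in ('Create', 'Update'):
            bedrock.put_model_invocation_logging_configuration(
                loggingConfig=build_logging_config(log_group, role_arn))
        # Delete: intentionally nothing to do in this sketch.
    except Exception as exc:  # always answer, or the stack waits for a timeout
        status, reason = 'FAILED', str(exc)
    return {
        'Status': status,
        'Reason': reason,
        'PhysicalResourceId': PHYSICAL_ID,
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
    }


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above run without it
    body = handle(event, boto3.client('bedrock'),
                  os.environ['LOG_GROUP_NAME'], os.environ['LOGGING_ROLE_ARN'])
    request = urllib.request.Request(
        event['ResponseURL'], data=json.dumps(body).encode('utf-8'),
        method='PUT', headers={'Content-Type': ''})
    urllib.request.urlopen(request)
```

The part that is easy to get wrong is the reply: CloudFormation only learns the outcome from the PUT to `ResponseURL`, so the handler reports FAILED explicitly instead of letting an exception escape.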

Takeaways

  • Browsers can't set headers on the WebSocket handshake. Token goes in the query string, the Lambda authorizer reads from IdentitySource: route.request.querystring.token, and the WebSocket handler reads identity server-side from the authorizer context (event['requestContext']['authorizer']), never from the message body.

  • One DynamoDB table is plenty for connection + session state. Two item types, one pk. No GSI, narrow IAM, simple deploy. The user-scoped item is what makes reconnects resume the same Bedrock session.

  • promptSessionAttributes keeps the user ID out of the user-controlled message, but it's not the security boundary. Server-side substitution into the agent instruction means the user can't claim to be someone else by typing it into the chat. The action group Lambda still has to enforce user ownership server-side.

  • Bedrock model invocation logging needs a Custom Resource. No native CFN support; without it, your agent is unobservable: no latency, no token count, no audit trail. A small Custom Resource Lambda closes the gap once and for the lifetime of the stack.

Where this goes next

The chatbot works. Multi-turn memory holds, streaming feels good, the action group handles the full set of todo and file operations through natural language. I shipped it feeling pretty good about it.

Then I started poking at it the way an attacker would, and I realised I'd left every door propped open. That's Part 2: same architecture, with the security work I should've done from day one.

The full source is on GitHub.