Deploying DeepSeek-R1 on ECS Fargate with Open WebUI: A Scalable Ollama-based AI ChatBot

Vivek V
AWS in Plain English
5 min read · Jan 31, 2025

A complete CDK automation for a DeepSeek-R1 chatbot that runs locally inside an AWS ECS Fargate container on Ollama and doesn't transmit your data to China!

I love seeing developers push the boundaries of what’s possible with AWS services, and today’s topic is no exception. Imagine running DeepSeek-R1 (7B) — a powerful open-weight LLM — on Amazon ECS Fargate, fully managed, serverless, and without worrying about infrastructure and data sovereignty. On top of that, we’ll integrate Open WebUI for an easy-to-use chat interface.

No more dealing with self-hosted GPUs or complex Kubernetes configurations. Let’s go build!

Image generated by Amazon Bedrock - Nova Canvas Frontier Model

🚀 The Goal: A Fully Serverless AI Chatbot on AWS

Our architecture consists of two Fargate services:

  1. Ollama running DeepSeek-R1 — The backend, serving the model API.
  2. Open WebUI — A chat interface for users to interact with the model.

We’ll wire it up with Application Load Balancers (ALB) for public access and set up cross-container communication so WebUI can talk to Ollama.

Here’s how you can deploy this stack with a single AWS CDK command.

🛠️ The CDK Stack: Fargate-Powered AI

The AWS Cloud Development Kit (CDK) makes this deployment ridiculously simple. Below is our complete TypeScript stack.
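For reference, here’s a minimal sketch of the stack scaffolding the snippets below assume (the stack class name is just illustrative); each piece is walked through in the following sections:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecs_patterns from 'aws-cdk-lib/aws-ecs-patterns';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';

// Hypothetical stack name, used here only for illustration
export class OllamaDeepseekStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    // The VPC, cluster, services, health checks, and outputs from the
    // sections below live here.
  }
}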

1️⃣ VPC & ECS Cluster Setup

We start with a VPC that has only public subnets and no NAT gateways (Fargate tasks with public IPs can pull images and reach the internet directly, which keeps costs down), plus an ECS Cluster to host both services.

const vpc = new ec2.Vpc(this, 'OllamaVpc', {
  maxAzs: 2,
  subnetConfiguration: [{
    name: 'PublicSubnet',
    subnetType: ec2.SubnetType.PUBLIC,
  }],
  natGateways: 0
});
const cluster = new ecs.Cluster(this, 'OllamaCluster', { vpc });

2️⃣ Ollama Service (Serving DeepSeek-R1)

Now, let’s deploy Ollama on ECS Fargate. This container serves the Ollama API over port 11434; the DeepSeek-R1 model is pulled into it via that API (see the ModelPullCommand output later).

// Ollama Service
const ollamaService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, 'OllamaService', {
  cluster,
  cpu: 4096,
  memoryLimitMiB: 16384,
  desiredCount: 1,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('ollama/ollama:latest'),
    containerPort: 11434,
    environment: { 'OLLAMA_HOST': '0.0.0.0' },
    enableLogging: true
  },
  publicLoadBalancer: true,
  assignPublicIp: true
});

🔹 Why Fargate? No EC2 instances, no manual scaling, just pay-for-what-you-use AI hosting.
🔹 CPU/Mem Choice? 4 vCPUs & 16GB RAM to handle inference workloads efficiently.

3️⃣ Open WebUI Service

Next up, the WebUI container for a seamless frontend.

// WebUI Service
const webuiService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, 'WebUI', {
  cluster,
  cpu: 4096,
  memoryLimitMiB: 16384,
  desiredCount: 1,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('ghcr.io/open-webui/open-webui:main'),
    containerPort: 8080,
    environment: {
      'OLLAMA_BASE_URL': `http://${ollamaService.loadBalancer.loadBalancerDnsName}`,
      'WEBUI_SECRET_KEY': 'your-secure-key-here',
      'MODEL_FILTER_ENABLED': 'false', // Show all models
      'WEBUI_DEBUG_MODE': 'true', // Debugging
      'OLLAMA_API_OVERRIDE_BASE_URL': `http://${ollamaService.loadBalancer.loadBalancerDnsName}`,
      'ENABLE_OLLAMA_MANAGEMENT': 'true'
    },
    enableLogging: true
  },
  publicLoadBalancer: true,
  assignPublicIp: true
});

🔹 The WebUI connects to Ollama dynamically via the load balancer’s DNS name.
🔹 Users can now chat with the model in a beautiful UI.

4️⃣ Secure Service-to-Service Communication

We ensure WebUI can talk to Ollama securely:

// Security Configuration
ollamaService.service.connections.allowFrom(
  webuiService.service,
  ec2.Port.tcp(11434)
);

🔹 This security group rule allows the WebUI tasks to reach the Ollama tasks directly on port 11434 without opening that port to the world, reducing security risks.

5️⃣ Health Checks for Auto-Restart

We configure custom health checks so ECS can replace unhealthy containers automatically.

ollamaService.targetGroup.configureHealthCheck({
  path: '/',
  port: '11434',
  timeout: cdk.Duration.minutes(2),
  interval: cdk.Duration.minutes(4),
  healthyThresholdCount: 2,
  unhealthyThresholdCount: 3
});
webuiService.targetGroup.configureHealthCheck({
  path: '/',
  healthyHttpCodes: '200-399',
});

🔹 ECS will replace unhealthy containers automatically when they fail these health checks.

🌐 HTTPS Support (Optional)

Want a secure HTTPS connection? Just uncomment and provide a valid ACM certificate ARN:

webuiService.loadBalancer.addListener('HTTPS', {
  port: 443,
  certificates: [/* Your ACM cert ARN */],
  defaultAction: elbv2.ListenerAction.forward([webuiService.targetGroup])
});

🔹 ACM (AWS Certificate Manager) handles SSL/TLS for you. No need to manually manage certs!
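If you already have a certificate in ACM, one way to wire it in is with ListenerCertificate.fromArn (the ARN below is only a placeholder; replace it with your own):

// Placeholder ARN for illustration only
const cert = elbv2.ListenerCertificate.fromArn(
  'arn:aws:acm:us-east-1:123456789012:certificate/your-cert-id'
);

webuiService.loadBalancer.addListener('HTTPS', {
  port: 443,
  certificates: [cert],
  defaultAction: elbv2.ListenerAction.forward([webuiService.targetGroup])
});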

🎯 Deployment: One Command to Deploy

Now for the fun part. Deploy everything with a single command:

cdk deploy
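If this is your first CDK deployment in the target account and region, you’ll typically need to install dependencies and bootstrap the environment first; a quick sketch of the usual workflow:

npm install        # install the stack's dependencies
npx cdk bootstrap  # one-time setup per account/region
npx cdk deploy     # synthesize and deploy the stack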

Once done, grab the WebUI URL (ChatInterfaceURL) from CloudFormation Stack Outputs and start chatting!

new cdk.CfnOutput(this, 'ChatInterfaceURL', {
  value: webuiService.loadBalancer.loadBalancerDnsName
});

Want to manually pull the model via API? Grab the output of ModelPullCommand from CloudFormation Stack Outputs:

new cdk.CfnOutput(this, 'ModelPullCommand', {
  value: `curl -X POST http://${ollamaService.loadBalancer.loadBalancerDnsName}/api/pull -d '{"name": "deepseek-r1:7b"}'`
});

CDK Deploy — CloudFormation Stack Output

Open WebUI ChatBot launched from the AWS Load Balancer URL

Verify the Ollama API connection is set correctly to the Ollama ECS Service Load Balancer URL

Try it out!

DeepSeek-R1 7B reasoning
DeepSeek-R1 7B on Ollama explaining quantum computing to a 9-year-old

🚀 Pro Tip: Easily Add New LLMs to Your Ollama Container

Want to try a different LLM? You can dynamically pull a new model into your Ollama container with a simple cURL command — no need to redeploy! 💡

curl -X POST http://<OLLAMA_LOAD_BALANCER_DNS>/api/pull -d '{"name": "your-model-name-here"}'

🔹 Replace "your-model-name-here" with any supported LLM (e.g., deepseek-r1:1.5b, mistral:7b).
🔹 Ollama will automatically download and serve the new model.

🔥 No downtime. No extra configurations. Just instant AI magic! 🚀
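To double-check which models your Ollama container currently has, you can list them via Ollama’s standard /api/tags endpoint:

curl http://<OLLAMA_LOAD_BALANCER_DNS>/api/tags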

New models instantly reflected in Open WebUI

Source Code

The complete source code for this project can be found on our GitHub repository here. Simply clone the repository using the following command:

git clone https://github.com/awsdataarchitect/ecs-ollama-deepseek.git && cd ecs-ollama-deepseek

🔮 Final Thoughts

With Fargate, ALB, and ECS, we’ve built a scalable AI-powered chat interface that:

🔹 Runs DeepSeek-R1 on AWS ECS Fargate with zero EC2 overhead
🔹 Provides a WebUI for seamless interaction
🔹 Auto-scales & self-heals with ECS & ALB health checks
🔹 Requires zero manual maintenance
🔹 Data Sovereignty — Zero cross-border data transfer risks
🔹 Zero Token Fees 🚀

Key Cost Advantages

  1. Zero Token Fees
    Unlike Bedrock Claude 3.5 Sonnet’s $3 per million tokens (example), Ollama processes tokens locally using your Fargate allocation
  2. Predictable Pricing
    ECS Fargate costs ~$33/month for 4vCPU/16GB (vs $165+/month for comparable SageMaker ml.g5.xlarge)

I’d love to hear your thoughts! What model are you deploying next on Fargate? Let’s discuss this in the comments. 👇

