What is AWS ECS?
Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that makes it easy to run, stop, and manage Docker containers on a cluster. ECS eliminates the need for you to install and operate your own container orchestration software, manage and scale a cluster of virtual machines, or schedule containers on those virtual machines.
ECS supports both EC2 and Fargate launch types, giving you the flexibility to choose between managing your own infrastructure or having AWS handle it completely. With ECS, you can focus on building and running your applications instead of managing the underlying infrastructure.
Key Components and Architecture
ECS Architecture Overview
graph TB
A[Application Load Balancer] --> B[ECS Service]
B --> C[Task Definition]
C --> D[Container 1]
C --> E[Container 2]
B --> F[ECS Cluster]
F --> G[EC2 Instance 1]
F --> H[EC2 Instance 2]
F --> I[Fargate]
G --> J[ECS Agent]
H --> K[ECS Agent]
L[ECR Registry] --> C
M[CloudWatch] --> B
N[IAM Roles] --> B
🏗️ Architecture Flow Explanation:
Traffic Flow: External traffic enters through the Application Load Balancer, which distributes requests across healthy tasks in your ECS Service. The service keeps the desired number of tasks running.
Container Management: Each task runs from a Task Definition blueprint that specifies container images (pulled from ECR), resource requirements, and networking configuration. Tasks can run either on EC2 instances (each running the ECS container agent, which registers the instance with the cluster) or on Fargate (fully managed).
Monitoring & Security: CloudWatch collects metrics and logs from your services, while IAM Roles provide secure access to AWS resources without hardcoding credentials.
🏗️ Task Definition
A blueprint that describes how containers should run, including CPU/memory requirements, networking, and storage configurations.
🔧 Service
Keeps a specified number of tasks running and automatically replaces tasks that stop or fail health checks.
🖥️ Cluster
A logical grouping of compute resources (EC2 instances or Fargate) where tasks run.
📦 Task
A running instance of a task definition, containing one or more containers.
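A deliberately simplified Python model of what a Service does: compare the desired count with the healthy tasks and decide how many to stop and start. This is a conceptual sketch, not the ECS scheduler (which also handles placement, deployments, and connection draining); `Task` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical, minimal view of a running task."""
    task_id: str
    healthy: bool


def reconcile(desired_count: int, tasks: list[Task]) -> tuple[list[str], int]:
    """Return (IDs of tasks to stop, number of new tasks to start).

    Unhealthy tasks are always replaced; surplus healthy tasks are stopped
    when the desired count has been lowered.
    """
    unhealthy = [t.task_id for t in tasks if not t.healthy]
    healthy = [t.task_id for t in tasks if t.healthy]

    to_stop = list(unhealthy)
    if len(healthy) > desired_count:
        # Scale-in: stop the surplus healthy tasks (here simply the last ones).
        to_stop.extend(healthy[desired_count:])
        to_start = 0
    else:
        # Start enough tasks to bring the healthy count back up to the target.
        to_start = desired_count - len(healthy)
    return to_stop, to_start


# Desired count 3; task "b" has failed its health check.
print(reconcile(3, [Task("a", True), Task("b", False), Task("c", True)]))
# (['b'], 1)
```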
Launch Types Comparison
graph LR
A[ECS Launch Types] --> B[EC2 Launch Type]
A --> C[Fargate Launch Type]
B --> D[You manage EC2 instances]
B --> E[More control over infrastructure]
B --> F[Cost optimization possible]
C --> G[AWS manages infrastructure]
C --> H[Serverless experience]
C --> I[Pay for requested vCPU and memory]
⚖️ Launch Type Comparison:
EC2 Launch Type: You provision and manage the EC2 instances in your cluster. This gives you full control over the underlying infrastructure, including instance types, operating systems, and networking. Best for steady, predictable workloads, where Reserved Instances can lower costs; Spot Instances offer further savings for tasks that tolerate interruption.
Fargate Launch Type: AWS manages the underlying infrastructure. You define each task's CPU and memory, and AWS provisions and patches the compute it runs on; you still decide how many tasks run, either manually or through service auto scaling. Ideal for variable workloads and when you want to focus on application development.
Cost Consideration: EC2 can be more cost-effective for consistent workloads, while Fargate eliminates operational overhead and is better for sporadic or unpredictable traffic patterns.
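To reason about this trade-off, a rough Python sketch can compare the hourly cost of always-on tasks. It assumes Fargate is billed on the vCPU and memory each task requests, while EC2 is billed per instance however full it is. The prices passed in are placeholders to replace with current figures from the AWS pricing pages for your Region; the function names are hypothetical, and the model ignores storage, data transfer, and discounts.

```python
import math


def fargate_hourly_cost(tasks: int, vcpu: float, memory_gb: float,
                        price_per_vcpu_hour: float, price_per_gb_hour: float) -> float:
    """Fargate: pay for the vCPU and memory each running task requests."""
    return tasks * (vcpu * price_per_vcpu_hour + memory_gb * price_per_gb_hour)


def ec2_hourly_cost(tasks: int, tasks_per_instance: int,
                    instance_price_per_hour: float) -> float:
    """EC2: pay for whole instances, even when they are only partly full."""
    instances = math.ceil(tasks / tasks_per_instance)
    return instances * instance_price_per_hour


# PLACEHOLDER prices for illustration only; these are not real AWS rates.
fargate = fargate_hourly_cost(tasks=6, vcpu=0.25, memory_gb=0.5,
                              price_per_vcpu_hour=0.04, price_per_gb_hour=0.005)
ec2 = ec2_hourly_cost(tasks=6, tasks_per_instance=4, instance_price_per_hour=0.05)
print(f"Fargate: ${fargate:.4f}/h  EC2: ${ec2:.4f}/h")
```

With these made-up numbers, six tasks need two EC2 instances, one of them half empty; that packing waste is what tips small or uneven workloads toward Fargate, while densely packed, steady fleets tend to favor EC2.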
🚀 Setting Up AWS ECS with CLI
Prerequisites
aws configure
This command configures your AWS CLI with access keys, secret keys, default region, and output format. You'll need appropriate IAM permissions for ECS operations.
1. Create an ECS Cluster
aws ecs create-cluster --cluster-name my-ecs-cluster
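Creates a new ECS cluster named "my-ecs-cluster". A cluster is a logical grouping of tasks and services. On its own it provides no compute: tasks that use the EC2 launch type need container instances (EC2 instances running the ECS agent) registered to the cluster, whereas Fargate tasks need no instances.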
aws ecs create-cluster --cluster-name my-fargate-cluster --capacity-providers FARGATE FARGATE_SPOT --default-capacity-provider-strategy capacityProvider=FARGATE,weight=1
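Creates a cluster with both the FARGATE and FARGATE_SPOT capacity providers attached. The default strategy above lists only FARGATE, so tasks use regular Fargate unless a strategy that includes FARGATE_SPOT is specified. AWS advertises Fargate Spot discounts of up to 70% compared with regular Fargate, but Spot tasks can be interrupted with a two-minute warning, so use them only for fault-tolerant workloads.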
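The base and weight values in a capacity provider strategy decide how tasks are split between FARGATE and FARGATE_SPOT. The Python sketch below approximates that split: base tasks go to their provider first, and the remainder is shared in proportion to weight. The largest-remainder rounding is an assumption for illustration, since ECS does not document its exact placement order, and `split_tasks` is a hypothetical helper.

```python
def split_tasks(total: int, strategy: list[dict]) -> dict[str, int]:
    """Approximate how `total` tasks are divided across a capacity provider strategy."""
    counts = {p["capacityProvider"]: 0 for p in strategy}
    remaining = total
    # 1. Satisfy each provider's base first (ECS allows a base on only one).
    for p in strategy:
        base = min(p.get("base", 0), remaining)
        counts[p["capacityProvider"]] += base
        remaining -= base
    # 2. Share the remaining tasks in proportion to weight.
    total_weight = sum(p.get("weight", 0) for p in strategy)
    if remaining and total_weight:
        shares = {p["capacityProvider"]: remaining * p.get("weight", 0) / total_weight
                  for p in strategy}
        whole = {name: int(share) for name, share in shares.items()}
        leftover = remaining - sum(whole.values())
        # Give leftover tasks to the providers with the largest fractional shares.
        by_fraction = sorted(shares, key=lambda n: shares[n] - whole[n], reverse=True)
        for name in by_fraction[:leftover]:
            whole[name] += 1
        for name, n in whole.items():
            counts[name] += n
    return counts


strategy = [
    {"capacityProvider": "FARGATE", "base": 2, "weight": 1},
    {"capacityProvider": "FARGATE_SPOT", "weight": 3},
]
print(split_tasks(10, strategy))
# {'FARGATE': 4, 'FARGATE_SPOT': 6}
```

The equivalent cluster default would be --default-capacity-provider-strategy capacityProvider=FARGATE,base=2,weight=1 capacityProvider=FARGATE_SPOT,weight=3: two tasks always stay on regular Fargate, and roughly three of every four additional tasks go to Spot.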
2. Register a Task Definition
aws ecs register-task-definition --cli-input-json file://task-definition.json
Registers a new task definition from a JSON file. The task definition is like a blueprint that tells ECS how to run your containers, including resource requirements, networking, and storage configurations.
Note: You'll need to create a task-definition.json file with your container specifications. See the example below for reference.
3. Create a Service
aws ecs create-service --cluster my-ecs-cluster --service-name my-web-service --task-definition my-web-app:1 --desired-count 2
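Creates a service that keeps 2 copies of your task running and replaces any that stop or fail. Because no launch type or capacity provider strategy is given, the cluster's default capacity provider strategy applies, or the EC2 launch type if the cluster has none; the cluster then needs registered container instances and an EC2-compatible task definition. A Fargate-only, awsvpc task definition such as the sample in this guide needs the Fargate form below.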
aws ecs create-service --cluster my-fargate-cluster --service-name my-fargate-service --task-definition my-web-app:1 --desired-count 2 --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[subnet-12345,subnet-67890],securityGroups=[sg-abcdef],assignPublicIp=ENABLED}"
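Creates a Fargate service with network configuration. Fargate requires the awsvpc network mode, so you must specify subnets and security groups. assignPublicIp=ENABLED gives each task a public IP, which tasks in a public subnet need in order to reach the internet, including image registries such as Docker Hub or ECR. Tasks in private subnets use a NAT gateway instead (or VPC endpoints for ECR).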
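Long --network-configuration strings are easy to mistype. One alternative is to pass the whole request as JSON with aws ecs create-service --cli-input-json file://service.json. The Python sketch below builds such a file for the Fargate service above; the key names follow the ECS CreateService API, while `fargate_service_input` and the file name service.json are hypothetical choices.

```python
import json


def fargate_service_input(cluster: str, service: str, task_definition: str,
                          desired_count: int, subnets: list[str],
                          security_groups: list[str], public_ip: bool) -> dict:
    """Build a CreateService request equivalent to the Fargate command above."""
    return {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # The API expects the strings ENABLED or DISABLED, not booleans.
                "assignPublicIp": "ENABLED" if public_ip else "DISABLED",
            }
        },
    }


request = fargate_service_input("my-fargate-cluster", "my-fargate-service", "my-web-app:1",
                                2, ["subnet-12345", "subnet-67890"], ["sg-abcdef"], True)
with open("service.json", "w") as f:
    json.dump(request, f, indent=2)
```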
4. List and Monitor
aws ecs list-clusters
Lists all ECS clusters in your account. Useful for getting an overview of your container infrastructure.
aws ecs describe-clusters --clusters my-ecs-cluster
Provides details about a specific cluster, including its status and the counts of active services, running and pending tasks, and registered container instances.
aws ecs list-services --cluster my-ecs-cluster
Lists all services running in the specified cluster. Services maintain the desired number of running tasks.
aws ecs describe-services --cluster my-ecs-cluster --services my-web-service
Shows details about a specific service, including its deployments, the task definition in use, desired, running, and pending task counts, and recent service events (useful for spotting placement or health-check failures).
aws ecs list-tasks --cluster my-ecs-cluster --service-name my-web-service
Lists the ARNs of tasks currently running for a specific service. Each task represents a running instance of your task definition; pass the ARNs to describe-tasks for details.
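Deciding whether a rollout has finished comes down to comparing counts in the describe-services output. The Python sketch below applies the same success condition as the CLI waiter aws ecs wait services-stable (one deployment left, and runningCount equal to desiredCount) to the parsed JSON; `service_is_stable` is a hypothetical helper, and the sample output is trimmed and made up.

```python
def service_is_stable(describe_output: dict) -> bool:
    """True when every described service has finished deploying."""
    services = describe_output.get("services", [])
    return bool(services) and all(
        len(s.get("deployments", [])) == 1 and s["runningCount"] == s["desiredCount"]
        for s in services
    )


# Trimmed, hypothetical output of:
#   aws ecs describe-services --cluster my-ecs-cluster --services my-web-service
sample = {
    "services": [{
        "serviceName": "my-web-service",
        "desiredCount": 2,
        "runningCount": 1,
        "pendingCount": 1,
        "deployments": [{"status": "PRIMARY", "rolloutState": "IN_PROGRESS"}],
    }]
}
print(service_is_stable(sample))  # False: one task is still pending
```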
5. Scaling Operations
aws ecs update-service --cluster my-ecs-cluster --service my-web-service --desired-count 5
Scales your service to 5 running tasks. ECS will automatically launch additional tasks to meet the new desired count or terminate excess tasks if scaling down.
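aws application-autoscaling register-scalable-target --service-namespace ecs --resource-id service/my-ecs-cluster/my-web-service --scalable-dimension ecs:service:DesiredCount --min-capacity 2 --max-capacity 10
Registers the service's desired count as a scalable target with minimum and maximum bounds (2 and 10 are example values). Service auto scaling is handled by Application Auto Scaling, so these commands use the application-autoscaling command group rather than ecs, and the target must be registered before a policy can be attached.
aws application-autoscaling put-scaling-policy --service-namespace ecs --resource-id service/my-ecs-cluster/my-web-service --scalable-dimension ecs:service:DesiredCount --policy-name my-scaling-policy --policy-type TargetTrackingScaling --target-tracking-scaling-policy-configuration file://scaling-policy.json
Creates a target tracking policy for your service. Application Auto Scaling then adjusts the desired count to keep a CloudWatch metric, such as average CPU utilization or ALB request count per target, near the target value defined in scaling-policy.json.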
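scaling-policy.json holds a target tracking configuration. A minimal Python sketch that writes one for average service CPU, assuming a 60% target; the key names come from the Application Auto Scaling target tracking configuration, while `target_tracking_config` and the cooldown values are illustrative choices, not recommendations. Tracking ALB request count instead would use the ALBRequestCountPerTarget metric type plus a ResourceLabel identifying the load balancer and target group.

```python
import json


def target_tracking_config(target_cpu_percent: float,
                           scale_out_cooldown: int = 60,
                           scale_in_cooldown: int = 300) -> dict:
    """Target tracking settings for average service CPU (cooldowns in seconds)."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target CPU must be in (0, 100]")
    return {
        "TargetValue": target_cpu_percent,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": scale_out_cooldown,
        "ScaleInCooldown": scale_in_cooldown,
    }


with open("scaling-policy.json", "w") as f:
    json.dump(target_tracking_config(60.0), f, indent=2)
```

A shorter scale-out cooldown than scale-in cooldown lets the service add capacity quickly under load while avoiding rapid flapping when load drops.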
6. Log Management
aws logs describe-log-groups --log-group-name-prefix /ecs/
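Lists CloudWatch log groups whose names start with /ecs/. Containers send logs to CloudWatch when their container definition uses the awslogs log driver, as the sample task definition in this guide does.
aws logs get-log-events --log-group-name /ecs/my-web-app --log-stream-name ecs/web-server/task-id
Retrieves log events from a specific log stream. With the awslogs driver, stream names follow the pattern prefix/container-name/task-id, so for the sample task definition (prefix "ecs", container "web-server") replace task-id with the ID at the end of the task ARN. This is useful for debugging application issues or monitoring application behavior.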
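Working out the stream name by hand is a common stumbling block. A small Python sketch, assuming the awslogs naming pattern prefix/container-name/task-id and taking the task ID from the last path segment of the task ARN; the ARN below is a made-up example, and `awslogs_stream_name` is a hypothetical helper.

```python
def awslogs_stream_name(prefix: str, container_name: str, task_arn: str) -> str:
    """Build the log stream name the awslogs driver uses for an ECS task.

    The task ID is the last path segment of the ARN, in both the older
    (task/<id>) and newer (task/<cluster>/<id>) ARN formats.
    """
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{prefix}/{container_name}/{task_id}"


arn = "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/0123456789abcdef0123456789abcdef"
print(awslogs_stream_name("ecs", "web-server", arn))
# ecs/web-server/0123456789abcdef0123456789abcdef
```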
7. Cleanup Operations
aws ecs update-service --cluster my-ecs-cluster --service my-web-service --desired-count 0
Scales down the service to 0 tasks, effectively stopping all running containers for this service while keeping the service definition intact.
aws ecs delete-service --cluster my-ecs-cluster --service my-web-service
Deletes the service. A service with a desired count above 0 cannot be deleted unless you first scale it to 0 (as above) or pass --force. This removes the service configuration, and ECS stops managing its tasks.
aws ecs delete-cluster --cluster my-ecs-cluster
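Deletes the entire cluster. The cluster must have no active services or running tasks, and any container instances must be deregistered first. Deleting a cluster does not terminate the underlying EC2 instances if you use the EC2 launch type; terminate those separately if you no longer need them.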
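The same cleanup order can be scripted. A Python sketch that assumes a boto3 ECS client is passed in (boto3 is not in the standard library, and the real calls need AWS credentials); it waits with the boto3 waiters services_stable and services_inactive between steps. `tear_down` is a hypothetical name, and the function does not deregister container instances or handle errors.

```python
def tear_down(ecs, cluster: str, service: str) -> None:
    """Remove one service and then its cluster, in the order described above.

    `ecs` is expected to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    # 1. Scale to zero and wait until the service reports no running tasks.
    ecs.update_service(cluster=cluster, service=service, desiredCount=0)
    ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
    # 2. Delete the service and wait until ECS marks it INACTIVE.
    ecs.delete_service(cluster=cluster, service=service)
    ecs.get_waiter("services_inactive").wait(cluster=cluster, services=[service])
    # 3. Delete the cluster. This still fails if other services, tasks,
    #    or registered container instances remain.
    ecs.delete_cluster(cluster=cluster)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   tear_down(boto3.client("ecs"), "my-ecs-cluster", "my-web-service")
```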
Task Definition Structure
Task Definition Components
graph TD
A[Task Definition] --> B[Family & Revision]
A --> C[Task Role & Execution Role]
A --> D[Network Mode]
A --> E[Container Definitions]
A --> F[CPU & Memory]
A --> G[Storage]
E --> H[Container 1]
E --> I[Container 2]
H --> J[Image URI]
H --> K[Port Mappings]
H --> L[Environment Variables]
H --> M[Health Check]
📋 Task Definition Structure:
Core Metadata: Family & Revision provide versioning for your task definitions, allowing you to track changes and roll back if needed. Each revision increments automatically when you register updates.
Security & Networking: The Task Role defines which AWS services your application code can access, while the Execution Role lets ECS pull images and write logs on the task's behalf. Network Mode determines how containers are networked (awsvpc, bridge, host, or none); Fargate supports only awsvpc.
Container Configuration: Each container definition specifies the Docker image location, port mappings for network access, environment variables for configuration, and health checks that let ECS detect and replace unhealthy containers.
Resource Allocation: CPU and Memory settings ensure proper resource allocation, while Storage configurations handle persistent data and temporary file systems.
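Task definitions are referenced as family:revision (for example my-web-app:1) or by full ARN, and omitting the revision makes ECS use the latest ACTIVE revision. A hypothetical Python helper that splits either form, which can be handy when scripting a rollback to an earlier revision:

```python
from typing import Optional


def parse_task_definition(ref: str) -> tuple[str, Optional[int]]:
    """Split a task definition reference into (family, revision).

    Accepts "family", "family:revision", or a full ARN ending in
    task-definition/family:revision. A bare family yields revision None.
    """
    name = ref.rsplit("/", 1)[-1] if ref.startswith("arn:") else ref
    family, sep, revision = name.partition(":")
    return family, (int(revision) if sep else None)


print(parse_task_definition("arn:aws:ecs:us-west-2:123456789012:task-definition/my-web-app:3"))
# ('my-web-app', 3)
```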
Sample Task Definition (task-definition.json)
{
  "family": "my-web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::account:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web-server",
      "image": "nginx:latest",
      "portMappings": [
        {
          "containerPort": 80,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-web-app",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost/ || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}
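This task definition describes a Fargate-compatible nginx web server with 256 CPU units (0.25 vCPU, since 1024 units equal one vCPU) and 512 MiB of memory. It includes a container health check, CloudWatch logging through the awslogs driver, an execution role for pulling the image and writing logs, and a task role for the application's own AWS calls. Replace account in the role ARNs with your AWS account ID.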
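Fargate accepts only specific task-level CPU and memory pairings, and a mismatch is rejected when the task definition is registered. The Python sketch below checks a parsed task definition against the Linux task sizes listed in the AWS Fargate documentation at the time of writing (verify them against the current docs). It only understands numeric strings such as "256", although ECS also accepts forms like "1 vCPU"; `check_fargate_task_definition` is a hypothetical helper.

```python
# Fargate task-level CPU units -> allowed memory values (MiB), Linux tasks.
# Copied from the AWS Fargate task size table at the time of writing.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def check_fargate_task_definition(td: dict) -> list[str]:
    """Return problems that would keep this task definition off Fargate."""
    problems = []
    if "FARGATE" not in td.get("requiresCompatibilities", []):
        problems.append("requiresCompatibilities should include FARGATE")
    if td.get("networkMode") != "awsvpc":
        problems.append("Fargate tasks must use networkMode 'awsvpc'")
    try:
        cpu, memory = int(td["cpu"]), int(td["memory"])
    except (KeyError, ValueError):
        problems.append("task-level cpu and memory must be set as numeric strings")
        return problems
    allowed = FARGATE_SIZES.get(cpu)
    if allowed is None:
        problems.append(f"cpu={cpu} is not a Fargate task size")
    elif memory not in allowed:
        problems.append(f"memory={memory} is not valid with cpu={cpu}")
    return problems
```

Run against the sample above (for example via json.load on task-definition.json), the function returns an empty list.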
Service Discovery and Load Balancing
graph TB
A[Internet Gateway] --> B[Application Load Balancer]
B --> C[Target Group]
C --> D[ECS Service]
D --> E[Task 1]
D --> F[Task 2]
D --> G[Task 3]
H[Route 53] --> I[Service Discovery]
I --> D
J[VPC] --> K[Private Subnet 1]
J --> L[Private Subnet 2]
K --> E
L --> F
L --> G
🌐 Load Balancing & Service Discovery:
Traffic Distribution: The Application Load Balancer receives traffic arriving through the Internet Gateway and distributes it across healthy tasks via a target group. ECS registers tasks with the target group as they start and deregisters them as they stop.
High Availability: Tasks are spread across multiple Availability Zones (one private subnet per AZ) within your VPC, so the application stays available even if one AZ has issues.
Service Discovery: ECS service discovery uses AWS Cloud Map, which maintains Route 53 DNS records for your tasks, so other services can find them by DNS name instead of hard-coded IP addresses.
Security: Tasks run in private subnets with no direct inbound access from the internet. They still receive traffic and health checks from the load balancer, and they can make outbound connections through a NAT gateway (or reach AWS services through VPC endpoints).
Setting Up Load Balancer Integration
aws elbv2 create-load-balancer --name my-ecs-alb --subnets subnet-12345 subnet-67890 --security-groups sg-abcdef
Creates an Application Load Balancer that will distribute traffic across your ECS tasks. The ALB provides health checking and can route traffic based on rules.
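aws elbv2 create-target-group --name my-ecs-targets --protocol HTTP --port 80 --vpc-id vpc-12345 --target-type ip --health-check-path /
Creates a target group for the load balancer. Tasks that use the awsvpc network mode, including all Fargate tasks, require target type "ip". The health check path must return a success code (200 by default) when the container is healthy; the stock nginx image in the sample serves / but has no /health route, so / is used here.
aws elbv2 create-listener --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-ecs-alb/1234567890 --protocol HTTP --port 80 --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/my-ecs-targets/1234567890
Adds a listener that forwards HTTP traffic on port 80 to the target group. The target group must be attached to the load balancer this way before an ECS service can use it.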
aws ecs create-service --cluster my-ecs-cluster --service-name my-web-service --task-definition my-web-app:1 --desired-count 2 --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[subnet-12345,subnet-67890],securityGroups=[sg-abcdef]}" --load-balancers targetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/my-ecs-targets/1234567890,containerName=web-server,containerPort=80
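Creates an ECS service attached to the target group. ECS automatically registers and deregisters task IP addresses with the target group as tasks start and stop. If a service named my-web-service is still active in the cluster from step 3, delete it first or choose another name, since create-service does not replace an existing service.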
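When tasks start but never receive traffic, the target group's health is usually the first thing to check, with aws elbv2 describe-target-health --target-group-arn followed by the target group ARN. A Python sketch that summarizes that command's JSON output; the field names match the ELBv2 DescribeTargetHealth response, while `summarize_target_health` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_target_health(output: dict) -> tuple[Counter, list[str]]:
    """Count targets per health state and explain the unhealthy ones."""
    states: Counter = Counter()
    problems = []
    for desc in output.get("TargetHealthDescriptions", []):
        health = desc["TargetHealth"]
        states[health["State"]] += 1
        if health["State"] == "unhealthy":
            target = desc["Target"]
            # Reason is a code such as Target.ResponseCodeMismatch; Description is prose.
            detail = " ".join(filter(None, [health.get("Reason"), health.get("Description")]))
            problems.append(f'{target["Id"]}:{target.get("Port")} {detail}'.rstrip())
    return states, problems


sample = {"TargetHealthDescriptions": [
    {"Target": {"Id": "10.0.1.23", "Port": 80}, "TargetHealth": {"State": "healthy"}},
    {"Target": {"Id": "10.0.2.41", "Port": 80},
     "TargetHealth": {"State": "unhealthy", "Reason": "Target.ResponseCodeMismatch",
                      "Description": "Health checks failed with these codes: [404]"}},
]}
states, problems = summarize_target_health(sample)
```

A ResponseCodeMismatch with code 404, as in the sample, typically means the health check path does not exist in the container.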
🎯 Best Practices
- Use Fargate for simplified management: Unless you need specific EC2 features, Fargate eliminates infrastructure management overhead.
- Implement proper health checks: Configure container health checks and ALB health checks to ensure only healthy tasks receive traffic.
- Right-size your resources: Monitor CPU and memory utilization to optimize task resource allocation and costs.
- Use task roles for security: Assign specific IAM roles to tasks following the principle of least privilege.
- Enable logging and monitoring: Configure CloudWatch logs and Container Insights for better observability.
- Use secrets management: Store sensitive data in AWS Secrets Manager or Systems Manager Parameter Store and reference it through the container definition's secrets field instead of putting plaintext values in environment variables.
- Implement auto-scaling: Configure service auto-scaling based on CloudWatch metrics to handle varying loads efficiently.
- Use multiple AZs: Deploy services across multiple availability zones for high availability.
- Tag your resources: Use consistent tagging for cost allocation and resource management.
- Regular updates: Keep container images updated and use image scanning for security vulnerabilities.
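For the secrets practice above, a container definition lists secrets as name/valueFrom pairs, and ECS injects each value as an environment variable when the task starts. A Python sketch that adds one to a container definition dict and drops any plaintext variable with the same name; `add_secret` and the ARN are hypothetical, and the task execution role must be allowed to read the secret.

```python
def add_secret(container_def: dict, env_name: str, value_from: str) -> dict:
    """Return a copy of container_def that injects a secret as env var env_name.

    value_from is the ARN of a Secrets Manager secret or an SSM Parameter
    Store parameter.
    """
    updated = dict(container_def)
    # Replace any existing secret with the same name.
    secrets = [s for s in updated.get("secrets", []) if s["name"] != env_name]
    secrets.append({"name": env_name, "valueFrom": value_from})
    updated["secrets"] = secrets
    # Drop a plaintext environment variable of the same name, if present.
    if "environment" in updated:
        updated["environment"] = [e for e in updated["environment"] if e["name"] != env_name]
    return updated


web = {
    "name": "web-server",
    "image": "nginx:latest",
    "environment": [{"name": "DB_PASSWORD", "value": "do-not-do-this"}],
}
secret_arn = "arn:aws:secretsmanager:us-west-2:123456789012:secret:my-db-password"  # hypothetical
web_with_secret = add_secret(web, "DB_PASSWORD", secret_arn)
```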
Monitoring and Troubleshooting
CloudWatch Integration
aws ecs put-account-setting --name containerInsights --value enabled
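Turns on Container Insights for clusters created afterwards in the current Region, providing CPU, memory, network, and storage metrics at the cluster, service, and task levels. The setting applies to the IAM identity making the call (use put-account-setting-default to change the account-wide default). For an existing cluster, run aws ecs update-cluster-settings --cluster my-ecs-cluster --settings name=containerInsights,value=enabled.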
aws logs create-log-group --log-group-name /ecs/my-web-app
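Creates a CloudWatch log group for your ECS tasks. With the awslogs driver, the log group must exist before tasks start, unless the container's log options set awslogs-create-group to true (which also requires the logs:CreateLogGroup permission on the execution role).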
aws cloudwatch get-metric-statistics --namespace AWS/ECS --metric-name CPUUtilization --dimensions Name=ServiceName,Value=my-web-service Name=ClusterName,Value=my-ecs-cluster --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T23:59:59Z --period 3600 --statistics Average
Retrieves CPU utilization metrics for your service. Use this data to understand performance patterns and configure auto-scaling policies.
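The command's JSON output holds a Datapoints list that is not necessarily in time order. A Python sketch that sorts the hourly averages and finds the busiest hour, a reasonable starting point when picking an auto-scaling target; `hourly_cpu`, `peak_hour`, and the sample values are hypothetical.

```python
from datetime import datetime
from typing import Optional


def hourly_cpu(output: dict) -> list[tuple[datetime, float]]:
    """Return (timestamp, average CPU %) pairs sorted by time."""
    points = []
    for dp in output.get("Datapoints", []):
        # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
        ts = datetime.fromisoformat(dp["Timestamp"].replace("Z", "+00:00"))
        points.append((ts, dp["Average"]))
    return sorted(points)


def peak_hour(output: dict) -> Optional[tuple[datetime, float]]:
    """Return the (timestamp, average) pair with the highest average, if any."""
    points = hourly_cpu(output)
    return max(points, key=lambda p: p[1]) if points else None


sample = {"Label": "CPUUtilization", "Datapoints": [
    {"Timestamp": "2024-01-01T02:00:00+00:00", "Average": 55.0, "Unit": "Percent"},
    {"Timestamp": "2024-01-01T00:00:00Z", "Average": 20.0, "Unit": "Percent"},
    {"Timestamp": "2024-01-01T01:00:00+00:00", "Average": 35.5, "Unit": "Percent"},
]}
print(peak_hour(sample))
```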
Troubleshooting Commands
aws ecs describe-task-definition --task-definition my-web-app:1
Shows the complete task definition configuration. Useful for verifying container settings, resource allocation, and configuration parameters.
aws ecs describe-tasks --cluster my-ecs-cluster --tasks arn:aws:ecs:region:account:task/task-id
Provides detailed information about specific tasks, including current status, health check results, and failure reasons if applicable.
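describe-tasks returns a lot of JSON; the fields that usually explain a failure are lastStatus, healthStatus, stoppedReason, and each container's exitCode and reason. A Python sketch that condenses them into one line per task; `explain_tasks` and the sample are hypothetical.

```python
def explain_tasks(describe_output: dict) -> list[str]:
    """Condense `aws ecs describe-tasks` JSON into one line per task."""
    lines = []
    for task in describe_output.get("tasks", []):
        task_id = task["taskArn"].rsplit("/", 1)[-1]
        line = f'{task_id}: {task.get("lastStatus")} health={task.get("healthStatus", "UNKNOWN")}'
        if task.get("stoppedReason"):
            line += f' stoppedReason="{task["stoppedReason"]}"'
        for c in task.get("containers", []):
            # exitCode is typically absent while a container is still running.
            if "exitCode" in c or c.get("reason"):
                line += f' [{c["name"]} exit={c.get("exitCode")} reason={c.get("reason", "")}]'
        lines.append(line)
    # ARNs that ECS could not find are reported under "failures" instead.
    for failure in describe_output.get("failures", []):
        lines.append(f'{failure.get("arn")}: {failure.get("reason")}')
    return lines


sample = {"tasks": [{
    "taskArn": "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/abc123",
    "lastStatus": "STOPPED",
    "healthStatus": "UNHEALTHY",
    "stoppedReason": "Task failed container health checks",
    "containers": [{"name": "web-server", "exitCode": 137, "reason": ""}],
}], "failures": []}
for line in explain_tasks(sample):
    print(line)
```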
aws ecs stop-task --cluster my-ecs-cluster --task arn:aws:ecs:region:account:task/task-id --reason "Manual restart for troubleshooting"
Manually stops a specific task. If the task belongs to a service, the service starts a replacement to maintain the desired count. Useful for recycling a misbehaving task while troubleshooting.
💡 Pro Tip: Always test configuration changes in a development environment first. ECS changes can affect running applications, so consider blue/green deployments (for example, with AWS CodeDeploy) for production updates.