AWS ECS DNS Services Guide for Network Engineers
Prerequisites: This guide assumes familiarity with traditional networking concepts but no prior container experience. We'll explain all ECS-specific terminology and components.
1. ECS Components Overview
ECS Cluster
Think of this as a logical grouping of compute resources (like a rack of servers). It's where your containers will run, similar to how VMs run on physical hosts.
Task Definition
This is like a VM template or blueprint. It defines what container images to run, how much CPU/memory to allocate, networking configuration, and other runtime parameters.
Service
A service ensures a specified number of tasks (running containers) are always running. It's like a process manager that automatically restarts failed containers and handles rolling updates.
Task
A task is a running instance of a task definition. Think of it as a running VM created from a template. One task can contain multiple containers that share networking and storage.
Service Discovery
ECS Service Discovery is built on AWS Cloud Map, a managed service registry. It automatically creates and manages DNS records for your services, similar to how DHCP assigns IP addresses automatically.
2. ECS DNS Service Discovery Architecture
graph TB
subgraph "VPC (10.0.0.0/16)"
subgraph "Private Subnet A (10.0.1.0/24)"
ECS1[ECS Task 1\nIP: 10.0.1.10]
ECS2[ECS Task 2\nIP: 10.0.1.11]
end
subgraph "Private Subnet B (10.0.2.0/24)"
ECS3[ECS Task 3\nIP: 10.0.2.10]
ECS4[ECS Task 4\nIP: 10.0.2.11]
end
subgraph "AWS Cloud Map"
NS[Private DNS Namespace\nmyapp.local]
SRV[Service Registry\nweb.myapp.local]
end
end
Client[Client Application] --> NS
NS --> SRV
SRV --> ECS1
SRV --> ECS2
SRV --> ECS3
SRV --> ECS4
style ECS1 fill:#e1f5fe
style ECS2 fill:#e1f5fe
style ECS3 fill:#e1f5fe
style ECS4 fill:#e1f5fe
style NS fill:#fff3e0
style SRV fill:#f3e5f5
Architecture Explanation:
This diagram shows how ECS Service Discovery works within a VPC. The key components are:
- Private DNS Namespace: Creates a private DNS zone (myapp.local) within your VPC
- Service Registry: Automatically maintains A records pointing to healthy ECS tasks
- ECS Tasks: Running containers that ECS registers with the service registry on their behalf
- Client Resolution: Applications can resolve service names (web.myapp.local) to get IP addresses of healthy tasks
When a task starts, ECS automatically registers it with the service registry. When it stops or fails health checks, ECS automatically removes it.
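The register/deregister behaviour described above can be pictured with a small in-memory model. This is only an illustrative sketch: the ServiceRegistry class and its method names are hypothetical and are not part of any AWS SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Toy stand-in for a Cloud Map service such as web.myapp.local."""
    name: str
    # Maps task ID -> (IP address, healthy flag).
    instances: dict = field(default_factory=dict)

    def register(self, task_id: str, ip: str) -> None:
        """A task started: record its IP as a healthy instance."""
        self.instances[task_id] = (ip, True)

    def set_health(self, task_id: str, healthy: bool) -> None:
        """Update the health flag of an already registered task."""
        ip, _ = self.instances[task_id]
        self.instances[task_id] = (ip, healthy)

    def deregister(self, task_id: str) -> None:
        """A task stopped: drop it from the registry."""
        self.instances.pop(task_id, None)

    def resolve(self) -> list:
        """Return the A-record style answer: IPs of healthy instances only."""
        return sorted(ip for ip, healthy in self.instances.values() if healthy)


registry = ServiceRegistry("web.myapp.local")
registry.register("task-1", "10.0.1.10")
registry.register("task-2", "10.0.1.11")
registry.set_health("task-2", False)
print(registry.resolve())  # ['10.0.1.10']
```

Only the healthy task appears in the answer, which mirrors what a client sees when it queries the service name.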
3. DNS Resolution Flow
sequenceDiagram
participant Client
participant Route53Resolver
participant CloudMap
participant ECSService
participant Task1
participant Task2
Client->>Route53Resolver: DNS Query: web.myapp.local
Route53Resolver->>CloudMap: Forward to Private DNS Zone
CloudMap->>CloudMap: Check Service Registry
CloudMap->>Route53Resolver: Return IP List [10.0.1.10, 10.0.2.10]
Route53Resolver->>Client: DNS Response with IPs
Client->>Task1: HTTP Request to 10.0.1.10
Task1->>Client: HTTP Response
Note over CloudMap,ECSService: Automatic Registration/Deregistration
ECSService->>Task1: Health Check
Task1->>ECSService: Health OK
ECSService->>CloudMap: Keep IP in registry
ECSService->>Task2: Health Check
Task2->>ECSService: Health FAIL
ECSService->>CloudMap: Remove IP from registry
DNS Resolution Flow Explanation:
This sequence shows how DNS resolution works with ECS Service Discovery:
- Client Query: Application requests DNS resolution for service name
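- Route 53 Resolver: The VPC's built-in DNS resolver answers from the Route 53 private hosted zone that Cloud Map maintains for the namespace
- Cloud Map Lookup: The answer lists the IP addresses of healthy tasks (up to eight per query with multivalue answer routing)
- Client Connection: Client connects to one of the returned IPs
- Health Management: ECS continuously monitors task health and updates the registry
The system automatically handles task failures by removing unhealthy IPs from DNS responses, although a client that cached an earlier answer may keep using a removed IP until the record's TTL expires.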
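From the client's side, the flow above is an ordinary DNS lookup followed by a connection to one of the returned addresses. A minimal sketch using the system resolver follows; the helper names resolve_ipv4 and pick_target are illustrative, and a numeric address stands in for web.myapp.local so the snippet runs outside the VPC.

```python
import random
import socket


def resolve_ipv4(hostname: str, port: int = 80) -> list:
    """Return the unique IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return sorted({info[4][0] for info in infos})


def pick_target(ips: list) -> str:
    """Choose one address at random to spread load across tasks."""
    return random.choice(ips)


# Inside the VPC this would be resolve_ipv4("web.myapp.local").
# A literal address is used here so no DNS server is needed.
ips = resolve_ipv4("127.0.0.1")
print(ips, "->", pick_target(ips))
```

Because the answer is cached by the resolver and often by the application, re-resolving the name periodically is what lets a client notice replaced tasks.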
4. Service Discovery Types
graph LR
subgraph "DNS-Only Discovery"
DNS[DNS A Records\nweb.myapp.local → IP List]
DNS --> IP1[10.0.1.10]
DNS --> IP2[10.0.1.11]
end
subgraph "DNS + SRV Discovery"
SRV[SRV Records\n_http._tcp.web.myapp.local]
SRV --> PORT1[10.0.1.10:8080]
SRV --> PORT2[10.0.1.11:8080]
end
subgraph "API-Only Discovery"
API[Cloud Map API\nDiscoverInstances]
API --> RESP[JSON Response\nwith IPs + metadata]
end
style DNS fill:#e8f5e8
style SRV fill:#fff3e0
style API fill:#f3e5f5
Service Discovery Types Explanation:
ECS service discovery supports three discovery types:
- DNS-Only: Simple A records that return IP addresses. Best when clients already know the port, such as HTTP/HTTPS services on standard ports.
- DNS + SRV: Includes port information in SRV records. Useful when services run on non-standard ports.
- API-Only: No DNS records created. Applications use Cloud Map API to discover services programmatically.
DNS-only is the most common choice as it works with existing applications without code changes.
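The practical difference between A and SRV answers is where the port comes from. The sketch below, with hypothetical helper names, builds endpoints both ways; the SRV line follows the standard "priority weight port target" layout and is a hand-written sample, not captured output.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse one SRV answer in 'priority weight port target' form."""
    priority, weight, port, target = line.split()
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def endpoints_from_a(ips, default_port):
    """A records carry no port, so the client must already know it."""
    return [f"{ip}:{default_port}" for ip in ips]


def endpoints_from_srv(lines):
    """SRV records carry the port; the target name still needs an A lookup."""
    return [f"{record.target}:{record.port}" for record in map(parse_srv, lines)]


print(endpoints_from_a(["10.0.1.10", "10.0.1.11"], 80))
print(endpoints_from_srv(["10 0 8080 web-1.myapp.local."]))
```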
5. Implementation Command Sequence
Setup Order and Dependencies
graph TD
A[Step 1: Create VPC & Subnets] --> B[Step 2: Create ECS Cluster]
B --> C[Step 3: Create Cloud Map Namespace]
C --> D[Step 4: Create Cloud Map Service]
D --> E[Step 5: Create Task Definition]
E --> F[Step 6: Create ECS Service]
F --> G[Step 7: Verify DNS Resolution]
style A fill:#ffebee
style B fill:#e8f5e8
style C fill:#fff3e0
style D fill:#f3e5f5
style E fill:#e1f5fe
style F fill:#fce4ec
style G fill:#f1f8e9
Command Sequence Dependencies:
This diagram shows a safe order in which to create the components. Later steps rely on resources created in earlier ones:
- Infrastructure First: VPC and networking must exist before ECS
- Cluster Creation: ECS cluster provides the compute environment
- DNS Setup: Cloud Map namespace and service define DNS structure
- Application Definition: Task definition specifies container configuration
- Service Launch: ECS service starts tasks and registers them with DNS
- Verification: Test DNS resolution and service connectivity
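One way to sanity-check a creation order is to write the dependencies down and test the order against them. The dependency map below is an assumption made for illustration (it is looser than a strict linear chain, for example the task definition needs nothing else to exist), and the resource names are made up for this sketch.

```python
from graphlib import TopologicalSorter

# Assumed hard dependencies: each key needs the listed resources to exist first.
DEPENDS_ON = {
    "vpc_and_subnets": set(),
    "ecs_cluster": set(),
    "cloud_map_namespace": {"vpc_and_subnets"},
    "cloud_map_service": {"cloud_map_namespace"},
    "task_definition": set(),
    "ecs_service": {"vpc_and_subnets", "ecs_cluster", "cloud_map_service", "task_definition"},
    "verify_dns": {"ecs_service"},
}

# The order used in this guide.
GUIDE_ORDER = [
    "vpc_and_subnets",
    "ecs_cluster",
    "cloud_map_namespace",
    "cloud_map_service",
    "task_definition",
    "ecs_service",
    "verify_dns",
]


def is_valid_order(order, depends_on):
    """True if every step appears after all of its dependencies."""
    seen = set()
    for step in order:
        if not depends_on[step] <= seen:
            return False
        seen.add(step)
    return True


print(is_valid_order(GUIDE_ORDER, DEPENDS_ON))
# Any order produced by a topological sort is also acceptable.
print(list(TopologicalSorter(DEPENDS_ON).static_order()))
```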
6. Step-by-Step AWS CLI Commands
Step 1: Create VPC and Networking
aws ec2 create-vpc \
--cidr-block 10.0.0.0/16 \
--tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=ecs-vpc}]'
VPC Creation: This creates the virtual network where ECS tasks will run. The CIDR block defines the IP address range available for subnets.
Parameter | Description | Alternatives
--cidr-block | IP address range for the VPC | 172.16.0.0/16, 192.168.0.0/16
--tag-specifications | Tags for resource identification | Optional, but recommended for organization
aws ec2 create-subnet \
--vpc-id vpc-12345678 \
--cidr-block 10.0.1.0/24 \
--availability-zone us-east-1a \
--tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=ecs-subnet-1}]'
Subnet Creation: Creates a subnet within the VPC for placing ECS tasks. Multiple subnets in different AZs provide high availability.
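Before creating subnets it can help to check the address plan offline. The sketch below uses Python's ipaddress module with the CIDR blocks from this guide; the usable_addresses helper is illustrative and relies on the fact that AWS reserves five addresses in every subnet.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet_a = ipaddress.ip_network("10.0.1.0/24")
subnet_b = ipaddress.ip_network("10.0.2.0/24")


def usable_addresses(subnet) -> int:
    """AWS reserves the first four addresses and the last one in each subnet."""
    return subnet.num_addresses - 5


# Every subnet must sit inside the VPC range and must not overlap its siblings.
print(subnet_a.subnet_of(vpc), subnet_b.subnet_of(vpc))  # True True
print(subnet_a.overlaps(subnet_b))                       # False
print(usable_addresses(subnet_a))                        # 251
```

With awsvpc networking each task consumes one subnet address, so the usable count is also an upper bound on tasks per subnet.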
Step 2: Create ECS Cluster
aws ecs create-cluster \
--cluster-name my-ecs-cluster \
--capacity-providers FARGATE \
--default-capacity-provider-strategy capacityProvider=FARGATE,weight=1 \
--tags key=Environment,value=production
ECS Cluster: This creates the logical grouping where containers will run. Fargate is serverless compute, meaning AWS manages the underlying infrastructure.
Parameter | Description | Alternatives
--capacity-providers | How to run containers | FARGATE_SPOT for cost savings, or an Auto Scaling group capacity provider to run on EC2
--default-capacity-provider-strategy | Default compute allocation | Can mix multiple providers with weights
Step 3: Create Cloud Map Namespace
aws servicediscovery create-private-dns-namespace \
--name myapp.local \
--vpc vpc-12345678 \
--description "Private DNS namespace for ECS services"
Private DNS Namespace: This creates a private DNS zone within your VPC. Services registered here are only resolvable from within the VPC, providing internal service discovery.
Parameter | Description | Alternatives
--name | DNS domain name | Any valid domain: internal, corp.local, etc.
--vpc | VPC where namespace is available | Must be existing VPC ID
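Important: Namespace creation is asynchronous. The command returns an OperationId rather than the namespace ID. Once the operation succeeds, run aws servicediscovery get-operation --operation-id <OperationId> and read the namespace ID from Targets.NAMESPACE. Save it for the next step!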
Step 4: Create Cloud Map Service
aws servicediscovery create-service \
--name web \
--namespace-id ns-12345678 \
--dns-config 'NamespaceId=ns-12345678,DnsRecords=[{Type=A,TTL=300}]' \
--health-check-custom-config FailureThreshold=3 \
--description "Web service discovery"
Cloud Map Service: This creates the actual service registry within the namespace. It defines how DNS records are created and managed for your ECS service.
Parameter | Description | Alternatives
--name | Service name (becomes the DNS record name) | Any valid hostname: api, db, cache
DnsRecords Type | A for IP addresses | SRV for port information (ECS service discovery supports A and SRV records only)
TTL | DNS cache time in seconds | Shorter values such as 10-60 propagate task changes faster; longer values reduce query volume
FailureThreshold | Health check failures before removal | Deprecated: Cloud Map now always uses 1, whatever value is supplied
Step 5: Create Task Definition
aws ecs register-task-definition \
--family web-service \
--network-mode awsvpc \
--requires-compatibilities FARGATE \
--cpu 256 \
--memory 512 \
--execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
--container-definitions '[
{
"name": "web-container",
"image": "nginx:latest",
"portMappings": [
{
"containerPort": 80,
"protocol": "tcp"
}
],
"essential": true,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/web-service",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
}
}
]'
Task Definition: This is the blueprint for your containers. It specifies what container image to run, resource requirements, networking, and logging configuration.
Parameter | Description | Alternatives
--family | Task definition name/group | Any descriptive name
--network-mode | awsvpc gives each task its own ENI | bridge, host (for EC2 only)
--cpu | CPU units (1024 = 1 vCPU) | 256, 512, 1024, 2048, 4096
--memory | Memory in MiB | 512, 1024, 2048, 4096, 8192 (valid values depend on the --cpu setting)
containerPort | Port the container listens on | Any port 1-65535
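Note: The execution role allows ECS to pull container images and write logs. You may need to create this role first if it doesn't exist. The CloudWatch log group /ecs/web-service must also exist before tasks start, unless the awslogs-create-group option is set to "true" and the role is allowed to create log groups.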
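On Fargate, the task-level CPU and memory values must form a supported pair, and an invalid pair is rejected when the task definition is registered. The following sketch checks a pair up front. The table covers only the CPU sizes up to 4096 units, and the function name is illustrative.

```python
# Memory values (MiB) accepted for each Fargate CPU setting.
# Larger CPU sizes exist but are left out of this sketch.
FARGATE_MEMORY = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu: int, memory: int) -> bool:
    """True if the cpu/memory pair is one of the combinations listed above."""
    return memory in FARGATE_MEMORY.get(cpu, [])


print(is_valid_fargate_size(256, 512))   # True, the pair used in this guide
print(is_valid_fargate_size(256, 4096))  # False, too much memory for 0.25 vCPU
```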
Step 6: Create ECS Service with Service Discovery
aws ecs create-service \
--cluster my-ecs-cluster \
--service-name web-service \
--task-definition web-service:1 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration 'awsvpcConfiguration={subnets=[subnet-12345678,subnet-87654321],securityGroups=[sg-12345678],assignPublicIp=DISABLED}' \
--service-registries '[
{
"registryArn": "arn:aws:servicediscovery:us-east-1:123456789012:service/srv-12345678"
}
]' \
--tags key=Environment,value=production
ECS Service: This creates the service that runs and manages your containers. It ensures the desired number of tasks are always running and registers them with service discovery.
Parameter | Description | Alternatives
--desired-count | Number of tasks to run | 1-100+ depending on needs
subnets | Where to place tasks | Multiple subnets in different AZs for HA
securityGroups | Firewall rules for tasks | Must allow required ports
assignPublicIp | DISABLED for private services | ENABLED for tasks in public subnets; tasks in private subnets reach the internet through a NAT gateway instead
registryArn | Cloud Map service ARN | From the Step 4 output
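Long create-service commands are easy to mistype, so one option is to generate the request as JSON and review it before use. The sketch below assembles the same settings as the command above, using the placeholder IDs from this guide; the registry_arn helper is illustrative.

```python
import json


def registry_arn(region: str, account_id: str, service_id: str) -> str:
    """Build the Cloud Map service ARN used as registryArn."""
    return f"arn:aws:servicediscovery:{region}:{account_id}:service/{service_id}"


# All IDs below are the placeholder values used in this guide.
service_input = {
    "cluster": "my-ecs-cluster",
    "serviceName": "web-service",
    "taskDefinition": "web-service:1",
    "desiredCount": 2,
    "launchType": "FARGATE",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": ["subnet-12345678", "subnet-87654321"],
            "securityGroups": ["sg-12345678"],
            "assignPublicIp": "DISABLED",
        }
    },
    "serviceRegistries": [
        {"registryArn": registry_arn("us-east-1", "123456789012", "srv-12345678")}
    ],
}

print(json.dumps(service_input, indent=2))
```

Saved to a file, the output could be passed to the CLI with aws ecs create-service --cli-input-json file://service.json.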
Step 7: Verify DNS Resolution
# Test DNS resolution from within VPC
aws ec2 run-instances \
--image-id ami-0abcdef1234567890 \
--instance-type t3.micro \
--subnet-id subnet-12345678 \
--security-group-ids sg-12345678 \
--user-data '#!/bin/bash
yum update -y
yum install -y bind-utils
echo "Testing DNS resolution..."
nslookup web.myapp.local
dig web.myapp.local
curl -I http://web.myapp.local'
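DNS Verification: This launches a test instance in the VPC whose user-data script tests both DNS lookup and HTTP connectivity to your service. The CLI does not print the script's output; read it from the instance's console output (for example with aws ec2 get-console-output) or from /var/log/cloud-init-output.log on the instance.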
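A successful DNS lookup does not prove that the tasks accept connections, so it is worth probing each returned address as well. The sketch below is one way to do that from any host inside the VPC. The function names are illustrative, and the demo probes a throwaway local listener so it runs anywhere.

```python
import socket


def can_connect(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_service(ips, port):
    """Probe every address returned by DNS and report which ones answer."""
    return {ip: can_connect(ip, port) for ip in ips}


# Demo: in the VPC the list would come from resolving web.myapp.local
# and the port would be 80. Here a local listener stands in for a task.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
result = check_service(["127.0.0.1"], port)
listener.close()
print(result)
```

An address that resolves but fails this probe usually points at a security group rule or a container that is not listening on the expected port.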
7. Traffic Flow with Load Balancer Integration
graph TD
subgraph "Internet"
USER[User Request]
end
subgraph "VPC"
subgraph "Public Subnets"
ALB[Application Load Balancer\npublic-facing]
end
subgraph "Private Subnets"
subgraph "ECS Cluster"
T1[Task 1\n10.0.1.10:80]
T2[Task 2\n10.0.1.11:80]
T3[Task 3\n10.0.2.10:80]
end
subgraph "Service Discovery"
DNS[web.myapp.local\n→ 10.0.1.10, 10.0.1.11, 10.0.2.10]
end
subgraph "Internal Service"
INT[Internal App\ncalling web.myapp.local]
end
end
end
USER --> ALB
ALB --> T1
ALB --> T2
ALB --> T3
INT --> DNS
DNS --> T1
DNS --> T2
DNS --> T3
style USER fill:#ffebee
style ALB fill:#e8f5e8
style T1 fill:#e1f5fe
style T2 fill:#e1f5fe
style T3 fill:#e1f5fe
style DNS fill:#fff3e0
style INT fill:#f3e5f5
Traffic Flow Explanation:
This diagram shows how ECS services can be accessed both externally and internally:
- External Access: Users connect through an Application Load Balancer in public subnets
- Internal Access: Other services use DNS names (web.myapp.local) for service-to-service communication
- Service Discovery: Automatically maintains DNS records for healthy tasks
- High Availability: Tasks distributed across multiple AZs
This pattern allows for secure internal communication while still providing external access when needed.
8. Health Check and DNS Management
stateDiagram-v2
[*] --> TaskStarting
TaskStarting --> HealthCheck : Task starts
HealthCheck --> Healthy : Passes health check
HealthCheck --> Unhealthy : Fails health check
Healthy --> DNSRegistered : Register with DNS
DNSRegistered --> ServingTraffic : Receive traffic
ServingTraffic --> HealthCheck : Continuous monitoring
Unhealthy --> TaskStopping : Stop unhealthy task
TaskStopping --> DNSDeregistered : Remove from DNS
DNSDeregistered --> [*] : Task terminated
ServingTraffic --> Unhealthy : Health check fails
Unhealthy --> Healthy : Health check passes
Health Check Lifecycle:
This state diagram shows how ECS manages task health and DNS registration:
- Task Starting: New task begins startup process
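- Health Check: ECS evaluates task health, typically from container health checks defined in the task definition, and reports the result to Cloud Map as a custom health check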
- DNS Registration: Healthy tasks are added to DNS records
- Traffic Serving: Task receives traffic from service discovery
- Continuous Monitoring: Health checks continue throughout task lifecycle
- Failure Handling: Unhealthy tasks are removed from DNS and replaced
This ensures that only healthy tasks receive traffic, providing automatic failover.
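The lifecycle in the state diagram can be written as a transition table, which makes it easy to check that a sequence of events ends where expected. The state names come from the diagram; the event names are invented for this sketch.

```python
# (current state, event) -> next state, following the state diagram above.
TRANSITIONS = {
    ("TaskStarting", "task_started"): "HealthCheck",
    ("HealthCheck", "check_passed"): "Healthy",
    ("HealthCheck", "check_failed"): "Unhealthy",
    ("Healthy", "registered"): "DNSRegistered",
    ("DNSRegistered", "traffic_received"): "ServingTraffic",
    ("ServingTraffic", "recheck"): "HealthCheck",
    ("ServingTraffic", "check_failed"): "Unhealthy",
    ("Unhealthy", "check_passed"): "Healthy",
    ("Unhealthy", "stopped"): "TaskStopping",
    ("TaskStopping", "deregistered"): "DNSDeregistered",
}


def run(events, state="TaskStarting"):
    """Apply events in order and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


startup = ["task_started", "check_passed", "registered", "traffic_received"]
print(run(startup))                                              # ServingTraffic
print(run(startup + ["check_failed", "stopped", "deregistered"]))  # DNSDeregistered
```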
9. DNS Query Types and Use Cases
graph LR
subgraph "A Record Query"
A1[Client: nslookup web.myapp.local]
A2[Response: 10.0.1.10\n10.0.1.11\n10.0.2.10]
end
subgraph "SRV Record Query"
S1[Client: dig SRV _http._tcp.web.myapp.local]
S2[Response: 10 0 8080 web-1.myapp.local\n10 0 8080 web-2.myapp.local]
end
subgraph "API Discovery"
API1[Client: DiscoverInstances API]
API2[Response: JSON with IPs,\nports, metadata]
end
A1 --> A2
S1 --> S2
API1 --> API2
style A1 fill:#e8f5e8
style S1 fill:#fff3e0
style API1 fill:#f3e5f5
DNS Query Types:
Different query types serve different use cases:
- A Records: Simple IP address lookup, works with any HTTP client
- SRV Records: Include port information, useful for non-standard ports
- API Discovery: Programmatic discovery with rich metadata
Choose A records for simplicity, SRV for port flexibility, API for advanced scenarios.
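For API discovery, the client works with a JSON document instead of DNS answers. The sketch below filters a hand-written sample shaped like a DiscoverInstances response (it is not captured output, and the instance IDs are placeholders) and builds ip:port endpoints from the instance attributes.

```python
import json

# Hand-written sample in the shape of a Cloud Map DiscoverInstances response.
SAMPLE_RESPONSE = """
{
  "Instances": [
    {"InstanceId": "task-1", "NamespaceName": "myapp.local", "ServiceName": "web",
     "HealthStatus": "HEALTHY",
     "Attributes": {"AWS_INSTANCE_IPV4": "10.0.1.10", "AWS_INSTANCE_PORT": "8080"}},
    {"InstanceId": "task-2", "NamespaceName": "myapp.local", "ServiceName": "web",
     "HealthStatus": "UNHEALTHY",
     "Attributes": {"AWS_INSTANCE_IPV4": "10.0.1.11", "AWS_INSTANCE_PORT": "8080"}}
  ]
}
"""


def healthy_endpoints(response: dict) -> list:
    """Return ip:port strings for instances reported as HEALTHY."""
    endpoints = []
    for instance in response.get("Instances", []):
        if instance.get("HealthStatus") != "HEALTHY":
            continue
        attrs = instance["Attributes"]
        endpoints.append(f'{attrs["AWS_INSTANCE_IPV4"]}:{attrs["AWS_INSTANCE_PORT"]}')
    return endpoints


print(healthy_endpoints(json.loads(SAMPLE_RESPONSE)))  # ['10.0.1.10:8080']
```

Unlike an A record, the response exposes the port and health status directly, which is the "rich metadata" advantage mentioned above.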
10. Troubleshooting Common Issues
Common Issue #1: DNS resolution not working
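# Check VPC DNS settings (both attributes should be true)
aws ec2 describe-vpc-attribute --vpc-id vpc-12345678 --attribute enableDnsSupport
aws ec2 describe-vpc-attribute --vpc-id vpc-12345678 --attribute enableDnsHostnames
# List Route 53 Resolver endpoints (only relevant when queries come from outside the VPC, such as on-premises networks)
aws route53resolver list-resolver-endpoints --filters Name=HostVPCId,Values=vpc-12345678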
Common Issue #2: Tasks not registering with service discovery
# Check service registry instances
aws servicediscovery list-instances --service-id srv-12345678
# Check ECS service events
aws ecs describe-services --cluster my-ecs-cluster --services web-service --query 'services[0].events[0:5]'
11. Best Practices and Security Considerations
Security Best Practices:
- Use private subnets for ECS tasks to prevent direct internet access
- Implement least-privilege security groups
- Use private DNS namespaces to prevent external DNS resolution
- Enable VPC Flow Logs to monitor network traffic
- Use IAM roles for task authentication instead of hardcoded credentials
Performance Considerations:
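- Set DNS TTL values to match how often tasks change: the 300s used in this guide means clients can keep a stale IP for up to five minutes, so prefer 60s or less for frequently redeployed services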
- Use health check grace periods to prevent premature task termination
- Distribute tasks across multiple Availability Zones
- Monitor DNS query patterns and adjust as needed
- Consider using Application Load Balancers for high-traffic services
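The effect of the TTL setting is easiest to see with a small cache simulation. In this sketch the TtlCache class, the fake clock, and the record table are all illustrative; a real resolver cache behaves similarly but is not under the application's control.

```python
class TtlCache:
    """Minimal DNS-style cache: answers are reused until their TTL expires."""

    def __init__(self, ttl_seconds, lookup, clock):
        self.ttl = ttl_seconds
        self.lookup = lookup      # function: name -> list of IPs
        self.clock = clock        # function: () -> current time in seconds
        self._entries = {}        # name -> (expiry time, cached answer)

    def get(self, name):
        now = self.clock()
        entry = self._entries.get(name)
        if entry is None or now >= entry[0]:
            answer = self.lookup(name)
            self._entries[name] = (now + self.ttl, answer)
            return answer
        return entry[1]


now = [0.0]  # fake clock so the example is deterministic
records = {"web.myapp.local": ["10.0.1.10", "10.0.1.11"]}
cache = TtlCache(300, lambda name: list(records[name]), lambda: now[0])

first = cache.get("web.myapp.local")
records["web.myapp.local"] = ["10.0.1.11"]  # the task at 10.0.1.10 stops
now[0] = 299
stale = cache.get("web.myapp.local")        # still served from cache
now[0] = 300
fresh = cache.get("web.myapp.local")        # TTL expired, looked up again

print(stale)  # ['10.0.1.10', '10.0.1.11']
print(fresh)  # ['10.0.1.11']
```

With a 300 second TTL the stopped task's address is still handed out at second 299, which is why shorter TTLs suit services whose tasks are replaced often.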
12. Monitoring and Observability
# Monitor service discovery health
aws servicediscovery get-instances-health-status --service-id srv-12345678
# Search the service's container logs for errors
aws logs filter-log-events \
--log-group-name /ecs/web-service \
--start-time 1640995200000 \
--filter-pattern "ERROR"
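# Monitor DNS query volume through a Route 53 Resolver inbound endpoint
# (rslvr-in-12345678 is a placeholder; this metric exists only if such an endpoint is deployed)
aws cloudwatch get-metric-statistics \
--namespace AWS/Route53Resolver \
--metric-name InboundQueryVolume \
--dimensions Name=EndpointId,Value=rslvr-in-12345678 \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-02T00:00:00Z \
--period 3600 \
--statistics Sum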
Monitoring Commands: These commands help you monitor the health and performance of your ECS DNS setup. Regular monitoring helps identify issues before they impact users.
Summary
AWS ECS Service Discovery provides automatic DNS management for containerized applications, similar to how DHCP automatically assigns IP addresses. The key benefits include:
- Automatic Registration: Tasks automatically register/deregister with DNS
- Health-based Routing: Only healthy tasks receive traffic
- VPC Integration: Works seamlessly with existing VPC networking
- Multiple Discovery Methods: DNS, SRV records, and API-based discovery
- High Availability: Built-in failover and load distribution
This setup enables microservices to communicate using simple DNS names while AWS handles the complexity of service registration, health monitoring, and traffic routing.