AWS Monitoring and Troubleshooting Guide

Comprehensive guide to AWS monitoring, troubleshooting, and network analysis

1. CloudWatch Metrics, Logs, Alarms, and Dashboards

CloudWatch Architecture Overview

graph TB
    A[AWS Resources] --> B[CloudWatch Metrics]
    A --> C[CloudWatch Logs]
    B --> D[CloudWatch Alarms]
    C --> E[Log Groups]
    E --> F[Log Streams]
    D --> G[SNS Notifications]
    D --> H[Auto Scaling Actions]
    B --> I[CloudWatch Dashboards]
    C --> I
    J[Custom Applications] --> K[CloudWatch Agent]
    K --> B
    K --> C
    L[Lambda Functions] --> M[CloudWatch Events]
    M --> N[Event Rules]
    N --> O[Targets]
CloudWatch Architecture Explanation: This diagram shows how AWS resources send metrics and logs to CloudWatch. Many services publish basic metrics automatically, while logs usually have to be enabled per service or shipped by an agent. The CloudWatch Agent can be installed on EC2 instances to collect custom metrics and logs. CloudWatch Alarms monitor metrics and trigger actions such as SNS notifications or Auto Scaling. All data can be visualized through CloudWatch Dashboards, while CloudWatch Events (now EventBridge) handles event-driven automation.

Setting up CloudWatch Monitoring

Command Execution Order

graph LR
    A[1: Create Log Group] --> B[2: Install CloudWatch Agent]
    B --> C[3: Configure Agent]
    C --> D[4: Create Custom Metrics]
    D --> E[5: Set Up Alarms]
    E --> F[6: Create Dashboard]

Step 1: Create CloudWatch Log Group

aws logs create-log-group --log-group-name /aws/ec2/application-logs --region us-east-1
Parameters:
  • --log-group-name: Name of the log group (hierarchical naming recommended)
  • --region: AWS region where the log group will be created
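Additional Options:
  • --kms-key-id: KMS key for log encryption
  • --tags: Key-value pairs for resource tagging
Retention is not a create-log-group option. Set it afterwards with a separate command:
aws logs put-retention-policy --log-group-name /aws/ec2/application-logs --retention-in-days 30
Accepted retention values include 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653 days.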

Step 2: Install CloudWatch Agent

wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm
This downloads and installs the CloudWatch agent on Amazon Linux. The agent enables collection of system-level metrics and logs that aren't available by default. For other operating systems, use the appropriate package (deb for Ubuntu/Debian, msi for Windows).

Step 3: Configure CloudWatch Agent

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/httpd/access_log",
            "log_group_name": "/aws/ec2/apache-access",
            "log_stream_name": "{instance_id}",
            "timezone": "UTC"
          }
        ]
      }
    }
  },
  "metrics": {
    "namespace": "Custom/Application",
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["*"]
      },
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
Configuration Parameters:
  • metrics_collection_interval: How often metrics are collected (in seconds)
  • run_as_user: User account for running the agent
  • file_path: Path to log files to monitor
  • log_group_name: CloudWatch log group destination
  • log_stream_name: Stream name pattern ({instance_id}, {hostname}, etc.)
  • namespace: Custom namespace for metrics organization
This configuration collects CPU, disk, and memory metrics along with Apache access logs. The agent creates the specified log groups and streams automatically, provided the instance role allows it.
Store the configuration (saved as cloudwatch-config.json) in Systems Manager Parameter Store, then have the agent fetch and apply it:
aws ssm put-parameter --name "AmazonCloudWatch-Config" --type "String" --value file://cloudwatch-config.json --region us-east-1
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:AmazonCloudWatch-Config

Step 4: Create Custom Metrics

aws cloudwatch put-metric-data \
  --namespace "Custom/Application" \
  --metric-data 'MetricName=ConnectionCount,Value=25,Unit=Count,Dimensions=[{Name=InstanceId,Value=i-1234567890abcdef0}]'
Custom Metrics Parameters:
  • --namespace: Logical grouping for metrics
  • MetricName: Name of the metric
  • Value: Numeric value of the metric
  • Unit: Unit of measurement (Count, Bytes, Seconds, Percent, etc.)
  • Dimensions: Key-value pairs for metric filtering
Custom metrics enable monitoring of application-specific data points that aren't available through standard AWS metrics.
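The --metric-data argument uses the AWS CLI shorthand syntax, where dimensions are a bracketed list of Name/Value pairs. The following is a minimal sketch of a helper that assembles that argument string; the function name metric_data_shorthand is hypothetical and only illustrates the shape of the value.

```python
def metric_data_shorthand(name, value, unit, dimensions):
    """Build the value passed to --metric-data in AWS CLI shorthand syntax.

    `dimensions` is a plain dict such as {"InstanceId": "i-..."}.
    """
    dims = ",".join(f"{{Name={k},Value={v}}}" for k, v in dimensions.items())
    return f"MetricName={name},Value={value},Unit={unit},Dimensions=[{dims}]"


arg = metric_data_shorthand(
    "ConnectionCount", 25, "Count", {"InstanceId": "i-1234567890abcdef0"}
)
print(arg)
```

When passing the result on a shell command line, wrap it in single quotes so the brackets and braces are not interpreted by the shell.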

Step 5: Create CloudWatch Alarms

aws cloudwatch put-metric-alarm \
  --alarm-name "High-CPU-Usage" \
  --alarm-description "Alarm when CPU exceeds 70%" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 70 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:cpu-alarm-topic \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0
Alarm Parameters:
  • --period: Time period for metric evaluation (in seconds)
  • --threshold: Value that triggers the alarm
  • --comparison-operator: How to compare metric to threshold
  • --evaluation-periods: Number of periods before triggering
  • --alarm-actions: Actions to take when alarm triggers
  • --statistic: Statistical measure (Average, Sum, Maximum, Minimum)
This alarm monitors CPU utilization and triggers when it exceeds 70% for two consecutive 5-minute periods.
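To make the "two consecutive periods" rule concrete, here is a simplified sketch of the evaluation logic. It assumes one averaged datapoint per 5-minute period and ignores CloudWatch features such as missing-data handling and "M out of N" datapoints; the function name alarm_state is hypothetical.

```python
def alarm_state(datapoints, threshold=70.0, evaluation_periods=2):
    """Evaluate a GreaterThanThreshold alarm over per-period averages.

    Returns "ALARM" only if each of the most recent `evaluation_periods`
    datapoints is strictly above the threshold.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


print(alarm_state([65.0, 72.0, 75.0]))  # last two periods both above 70
print(alarm_state([72.0, 68.0]))        # latest period dropped below 70
print(alarm_state([80.0]))              # not enough periods yet
```

A single spike above 70% therefore does not fire the alarm; the breach has to persist across both periods.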

Step 6: Create CloudWatch Dashboard

{
  "widgets": [
    {
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "i-1234567890abcdef0"],
          [".", "NetworkIn", ".", "."],
          [".", "NetworkOut", ".", "."]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "EC2 Instance Metrics"
      }
    },
    {
      "type": "log",
      "x": 0,
      "y": 6,
      "width": 24,
      "height": 6,
      "properties": {
        "query": "SOURCE '/aws/ec2/apache-access' | fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 100",
        "region": "us-east-1",
        "title": "Application Errors"
      }
    }
  ]
}
aws cloudwatch put-dashboard --dashboard-name "ApplicationMonitoring" --dashboard-body file://dashboard-config.json
Dashboard Configuration: The dashboard JSON (saved as dashboard-config.json) defines widget layout and data sources. Widget types include metric, log, text, and alarm; single-number displays are a view of the metric widget rather than a separate type. Each widget specifies its position (x, y), size (width, height), and data properties on a 24-column grid. Log widgets can use CloudWatch Logs Insights queries for advanced log analysis.
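Because widget positions are given by hand, a quick layout check before calling put-dashboard can catch mistakes. The sketch below assumes the 24-column grid and flags widgets that run past the right edge or overlap each other; layout_problems is a hypothetical helper, not part of any AWS tooling.

```python
GRID_WIDTH = 24  # dashboard grid is 24 columns wide


def layout_problems(widgets):
    """Return human-readable problems for a list of widget dicts.

    Each widget needs integer "x", "y", "width" and "height" keys.
    """
    problems = []
    for i, w in enumerate(widgets):
        if w["x"] + w["width"] > GRID_WIDTH:
            problems.append(f"widget {i} extends past column {GRID_WIDTH}")
    for i, a in enumerate(widgets):
        for j in range(i + 1, len(widgets)):
            b = widgets[j]
            overlap_x = a["x"] < b["x"] + b["width"] and b["x"] < a["x"] + a["width"]
            overlap_y = a["y"] < b["y"] + b["height"] and b["y"] < a["y"] + a["height"]
            if overlap_x and overlap_y:
                problems.append(f"widgets {i} and {j} overlap")
    return problems


# Positions taken from the two widgets in the dashboard JSON above.
dashboard = [
    {"x": 0, "y": 0, "width": 12, "height": 6},
    {"x": 0, "y": 6, "width": 24, "height": 6},
]
print(layout_problems(dashboard))
```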

2. VPC Flow Logs Analysis

VPC Flow Logs Architecture

graph TB
    A[VPC] --> B[Subnets]
    A --> C[Network Interfaces]
    A --> D[Internet Gateway]
    A --> E[NAT Gateway]
    B --> F[EC2 Instances]
    C --> G[ELB]
    H[Flow Logs] --> I[S3 Bucket]
    H --> J[CloudWatch Logs]
    H --> K[Kinesis Data Firehose]
    A --> H
    B --> H
    C --> H
    I --> L[Athena Queries]
    J --> M[CloudWatch Insights]
    K --> N[Real-time Processing]
    O[VPC Flow Logs Format] --> P[Base Fields]
    O --> Q[Extended Fields]
    P --> R[srcaddr, dstaddr, srcport, dstport, protocol]
    Q --> S[vpc-id, subnet-id, instance-id, interface-id]
VPC Flow Logs Architecture: This diagram illustrates how VPC Flow Logs capture network traffic metadata from various VPC components. Flow logs can be stored in S3 for long-term analysis, CloudWatch Logs for real-time monitoring, or Kinesis Data Firehose for streaming processing. The logs contain both base fields (source/destination IPs, ports, protocol) and extended fields (VPC ID, subnet ID, instance ID) for comprehensive network analysis.

Flow Logs Traffic Analysis

sequenceDiagram
    participant Client
    participant IGW as Internet Gateway
    participant NAT as NAT Gateway
    participant ALB as Application Load Balancer
    participant EC2 as EC2 Instance
    participant RDS as RDS Database
    participant FL as Flow Logs
    Client->>IGW: HTTP Request
    IGW->>FL: Log: Client IP → ALB IP
    IGW->>ALB: Forward Request
    ALB->>FL: Log: ALB IP → EC2 IP
    ALB->>EC2: Forward to Backend
    EC2->>FL: Log: EC2 IP → RDS IP
    EC2->>RDS: Database Query
    RDS->>EC2: Query Response
    EC2->>FL: Log: RDS IP → EC2 IP
    EC2->>ALB: HTTP Response
    ALB->>FL: Log: EC2 IP → ALB IP
    ALB->>IGW: Forward Response
    IGW->>FL: Log: ALB IP → Client IP
    IGW->>Client: HTTP Response
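Traffic Flow Analysis: This sequence diagram shows how network traffic flows through AWS components and where Flow Log entries are generated along the way. Flow Logs are recorded at network interfaces (ENIs), so the entries shown at the Internet Gateway are conceptual: in practice that traffic is logged by the ENIs of the load balancer, instances, and other resources it passes through. Each network interface logs both inbound and outbound traffic, creating an audit trail of accepted and rejected flows. The logs capture source/destination IPs, ports, protocols, and packet/byte counts, enabling detailed network forensics and performance analysis.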

Setting up VPC Flow Logs

Flow Logs Setup Order

graph LR
    A[1: Create S3 Bucket] --> B[2: Create IAM Role]
    B --> C[3: Enable Flow Logs]
    C --> D[4: Configure Log Format]
    D --> E[5: Set up Athena]
    E --> F[6: Query Analysis]

Step 1: Create S3 Bucket for Flow Logs

aws s3 mb s3://vpc-flow-logs-bucket-unique-name --region us-east-1
aws s3api put-bucket-policy --bucket vpc-flow-logs-bucket-unique-name --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSLogDeliveryWrite",
      "Effect": "Allow",
      "Principal": { "Service": "delivery.logs.amazonaws.com" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::vpc-flow-logs-bucket-unique-name/*"
    },
    {
      "Sid": "AWSLogDeliveryCheck",
      "Effect": "Allow",
      "Principal": { "Service": "delivery.logs.amazonaws.com" },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::vpc-flow-logs-bucket-unique-name"
    }
  ]
}'
The S3 bucket policy allows AWS Log Delivery service to write Flow Logs to the bucket. The policy grants PutObject permissions for writing logs and GetBucketAcl for bucket access verification. This is required for S3 as a Flow Logs destination.

Step 2: Create IAM Role for Flow Logs

Save the trust policy as trust-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "vpc-flow-logs.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
aws iam create-role --role-name flowlogsRole --assume-role-policy-document file://trust-policy.json
Save the permissions policy as permissions-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": "*"
    }
  ]
}
aws iam put-role-policy --role-name flowlogsRole --policy-name flowlogsDeliveryRolePolicy --policy-document file://permissions-policy.json
IAM Role Components:
  • Trust Policy: Allows VPC Flow Logs service to assume the role
  • Permissions Policy: Grants permissions to create and write to CloudWatch Logs
  • Required Actions: CreateLogGroup, CreateLogStream, PutLogEvents for log delivery
This role is needed only when Flow Logs deliver to CloudWatch Logs. Delivery to S3 does not use it and relies on the bucket policy from Step 1 instead.

Step 3: Enable VPC Flow Logs

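aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-12345678 \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::vpc-flow-logs-bucket-unique-name \
  --destination-options FileFormat=parquet,HiveCompatiblePartitions=true,PerHourPartition=true \
  --log-format '${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} ${region} ${az-id}'
The aggregation window is exposed through the start and end fields. The --destination-options setting writes Parquet files under Hive-compatible, per-hour prefixes, which the Athena table in Step 5 relies on.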
Flow Logs Parameters:
  • --resource-type: VPC, Subnet, or NetworkInterface
  • --traffic-type: ALL, ACCEPT, or REJECT
  • --log-destination-type: cloud-watch-logs, s3, or kinesis-data-firehose
  • --log-format: Custom format with base and extended fields
Extended Fields Available:
  • vpc-id: VPC identifier
  • subnet-id: Subnet identifier
  • instance-id: EC2 instance identifier
  • interface-id: Network interface identifier (already part of the default format; listed here because the custom format includes it)
  • region: AWS region
  • az-id: Availability zone identifier
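With a custom format, each text-format record is a space-separated line whose columns follow the order of the ${...} placeholders. The sketch below derives the column names from a format string and parses one record; the shortened format and the sample line are invented for illustration, and this applies to plain-text delivery rather than Parquet files.

```python
import re


def fields_from_format(log_format):
    """Extract field names from a flow log format string, in order.

    Hyphens become underscores so the names work as dict keys or columns.
    """
    return [name.replace("-", "_") for name in re.findall(r"\$\{([^}]+)\}", log_format)]


def parse_record(line, field_names):
    """Split one space-separated record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(values)}")
    return dict(zip(field_names, values))


# Hypothetical, shortened format and record used only for this example.
LOG_FORMAT = "${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${bytes} ${action} ${vpc-id}"
names = fields_from_format(LOG_FORMAT)
record = parse_record("10.0.1.5 10.0.2.9 49152 443 6 8400 ACCEPT vpc-12345678", names)
print(record["srcaddr"], "->", record["dstaddr"], record["action"])
```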

Step 4: Flow Logs for Different Resources

aws ec2 create-flow-logs \
  --resource-type Subnet \
  --resource-ids subnet-12345678 subnet-87654321 \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name VPCFlowLogs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flowlogsRole
aws ec2 create-flow-logs \
  --resource-type NetworkInterface \
  --resource-ids eni-12345678 \
  --traffic-type REJECT \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::vpc-flow-logs-bucket-unique-name/rejected-traffic/
Different resource types enable granular monitoring:
  • VPC-level: Captures all traffic within the VPC
  • Subnet-level: Monitors specific subnet traffic patterns
  • ENI-level: Detailed monitoring of individual network interfaces
Traffic type filtering allows focusing on specific scenarios like security analysis (REJECT) or capacity planning (ALL).

Step 5: Set up Athena for Flow Logs Analysis

This table assumes the flow log writes Parquet files under Hive-compatible, per-hour S3 prefixes for account 123456789012 in us-east-1. Column names must match the flow log field names, with hyphens replaced by underscores.
CREATE EXTERNAL TABLE vpc_flow_logs (
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol int,
  packets bigint,
  bytes bigint,
  `start` bigint,
  `end` bigint,
  action string,
  vpc_id string,
  subnet_id string,
  instance_id string,
  interface_id string,
  region string,
  az_id string
)
PARTITIONED BY (
  year string,
  month string,
  day string,
  hour string
)
STORED AS PARQUET
LOCATION 's3://vpc-flow-logs-bucket-unique-name/AWSLogs/aws-account-id=123456789012/aws-service=vpcflowlogs/aws-region=us-east-1/'
TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.year.type'='integer',
  'projection.year.range'='2023,2030',
  'projection.month.type'='integer',
  'projection.month.range'='1,12',
  'projection.month.digits'='2',
  'projection.day.type'='integer',
  'projection.day.range'='1,31',
  'projection.day.digits'='2',
  'projection.hour.type'='integer',
  'projection.hour.range'='0,23',
  'projection.hour.digits'='2',
  'storage.location.template'='s3://vpc-flow-logs-bucket-unique-name/AWSLogs/aws-account-id=123456789012/aws-service=vpcflowlogs/aws-region=us-east-1/year=${year}/month=${month}/day=${day}/hour=${hour}/'
)
Athena Table Configuration:
  • Partitioning: By year/month/day/hour for query performance
  • Projection: Computes partition locations from the table properties, so no manual partition maintenance (such as ALTER TABLE ADD PARTITION) is needed
  • Storage Format: Parquet for optimized analytics performance
  • Location Template: Matches S3 prefix structure for data location
This setup enables efficient querying of large volumes of Flow Logs data with automatic partition discovery.
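Partition projection works by substituting partition values into the storage location template. The following sketch reproduces that substitution for a given timestamp, which is handy for checking that the template matches the prefixes actually present in the bucket. The bucket name example-bucket and the helper name are placeholders.

```python
from datetime import datetime, timezone
from string import Template


def partition_location(template, when):
    """Fill a ${year}/${month}/${day}/${hour} template with zero-padded values."""
    return Template(template).substitute(
        year=f"{when.year}",
        month=f"{when.month:02d}",
        day=f"{when.day:02d}",
        hour=f"{when.hour:02d}",
    )


# Placeholder template; substitute the real bucket and prefix when checking.
TEMPLATE = "s3://example-bucket/year=${year}/month=${month}/day=${day}/hour=${hour}/"
location = partition_location(TEMPLATE, datetime(2024, 6, 30, 5, tzinfo=timezone.utc))
print(location)
```

If listing that prefix in S3 returns no objects for an hour that should contain traffic, the template and the delivery path are probably out of step.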

Step 6: Common Flow Logs Analysis Queries

-- Top talkers by bytes transferred
SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes, COUNT(*) AS connection_count
FROM vpc_flow_logs
WHERE year = '2024' AND month = '06' AND day = '30' AND action = 'ACCEPT'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 10;

-- Security analysis - rejected connections
SELECT srcaddr, dstaddr, dstport, protocol, COUNT(*) AS reject_count
FROM vpc_flow_logs
WHERE year = '2024' AND month = '06' AND day = '30' AND action = 'REJECT'
GROUP BY srcaddr, dstaddr, dstport, protocol
ORDER BY reject_count DESC
LIMIT 20;

-- Network performance analysis
-- CAST avoids integer division when computing the average packet size
SELECT interface_id,
       AVG(CAST(bytes AS double) / packets) AS avg_packet_size,
       SUM(bytes) AS total_bytes,
       SUM(packets) AS total_packets
FROM vpc_flow_logs
WHERE year = '2024' AND month = '06' AND day = '30' AND packets > 0
GROUP BY interface_id
ORDER BY total_bytes DESC;
Query Analysis Types:
  • Traffic Analysis: Identify top communicating endpoints
  • Security Monitoring: Detect rejected connections and potential threats
  • Performance Metrics: Calculate packet sizes and throughput
  • Cost Optimization: Identify high-traffic interfaces for optimization
These queries help understand network behavior, security posture, and performance characteristics.
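For small samples it can be quicker to run the same aggregation locally than to set up Athena. As a rough equivalent of the "top talkers" query, this sketch sums bytes per source/destination pair over already-parsed records; the record dicts and addresses are made up for the example.

```python
from collections import defaultdict


def top_talkers(records, limit=10):
    """Rank (srcaddr, dstaddr) pairs by total bytes over accepted flows.

    Returns tuples of (srcaddr, dstaddr, total_bytes, flow_count).
    """
    totals = defaultdict(lambda: [0, 0])  # pair -> [bytes, flow records]
    for rec in records:
        if rec["action"] != "ACCEPT":
            continue
        entry = totals[(rec["srcaddr"], rec["dstaddr"])]
        entry[0] += int(rec["bytes"])
        entry[1] += 1
    ranked = sorted(totals.items(), key=lambda item: item[1][0], reverse=True)
    return [(src, dst, total, count) for (src, dst), (total, count) in ranked[:limit]]


sample = [
    {"srcaddr": "10.0.1.5", "dstaddr": "10.0.2.9", "bytes": "8400", "action": "ACCEPT"},
    {"srcaddr": "10.0.1.5", "dstaddr": "10.0.2.9", "bytes": "1600", "action": "ACCEPT"},
    {"srcaddr": "10.0.3.7", "dstaddr": "10.0.2.9", "bytes": "500", "action": "ACCEPT"},
    {"srcaddr": "198.51.100.4", "dstaddr": "10.0.1.5", "bytes": "120", "action": "REJECT"},
]
for row in top_talkers(sample):
    print(row)
```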

3. VPC Traffic Mirroring

Traffic Mirroring Architecture

graph TB
    A[Production Traffic] --> B[Source ENI]
    B --> C[Mirror Session]
    C --> D[Target ENI]
    C --> E[Mirror Filter]
    E --> F[Filter Rules]
    F --> G[Accept Rules]
    F --> H[Reject Rules]
    D --> I[Analysis Instance]
    I --> J[Wireshark]
    I --> K[Suricata IDS]
    I --> L[Custom Analysis Tools]
    M[Mirror Target Types] --> N[ENI]
    M --> O[NLB]
    M --> P[Gateway Load Balancer]
    Q[Use Cases] --> R[Security Analysis]
    Q --> S[Performance Monitoring]
    Q --> T[Compliance Auditing]
    Q --> U[Troubleshooting]
VPC Traffic Mirroring Architecture: Traffic Mirroring creates a copy of network traffic from source ENIs and sends it to target destinations for analysis. Mirror filters control which traffic is copied based on rules. The mirrored traffic can be analyzed using various tools like Wireshark, IDS systems, or custom applications. This enables deep packet inspection without impacting production traffic.

Traffic Mirroring Data Flow

sequenceDiagram
    participant Client
    participant Source as Source ENI
    participant Target as Target ENI
    participant Analyzer as Analysis Tool
    participant Filter as Mirror Filter
    Client->>Source: Original Traffic
    Source->>Filter: Traffic Copy
    Filter->>Filter: Apply Rules
    alt Traffic Matches Filter
        Filter->>Target: Mirrored Packets
        Target->>Analyzer: Forward for Analysis
        Analyzer->>Analyzer: Deep Packet Inspection
    else Traffic Filtered Out
        Filter->>Filter: Drop Packet Copy
    end
    Source->>Client: Original Response (Unaffected)
Traffic Mirroring Flow: This sequence shows how original traffic flows normally while simultaneously being copied for analysis. The mirror filter evaluates each packet against configured rules to determine if it should be mirrored. Only matching traffic is sent to the target for analysis, while original traffic continues unaffected. This provides non-intrusive monitoring capabilities.

Setting up VPC Traffic Mirroring

Traffic Mirroring Setup Order

graph LR
    A[1: Create Mirror Filter] --> B[2: Add Filter Rules]
    B --> C[3: Create Mirror Target]
    C --> D[4: Create Mirror Session]
    D --> E[5: Configure Analysis Instance]
    E --> F[6: Start Analysis]

Step 1: Create Traffic Mirror Filter

aws ec2 create-traffic-mirror-filter \
  --description "Web traffic mirror filter" \
  --tag-specifications 'ResourceType=traffic-mirror-filter,Tags=[{Key=Name,Value=WebTrafficFilter},{Key=Environment,Value=Production}]'
Mirror Filter Purpose: Mirror filters define which traffic to copy based on rules. Each filter can contain multiple rules that specify source/destination IPs, ports, and protocols. Filters are reusable across multiple mirror sessions and help reduce the volume of mirrored traffic by focusing on relevant packets.

Step 2: Add Filter Rules

aws ec2 create-traffic-mirror-filter-rule \
  --traffic-mirror-filter-id tmf-1234567890abcdef0 \
  --traffic-direction ingress \
  --rule-number 100 \
  --rule-action accept \
  --protocol 6 \
  --destination-port-range FromPort=80,ToPort=80 \
  --source-cidr-block 0.0.0.0/0 \
  --destination-cidr-block 0.0.0.0/0 \
  --description "Mirror HTTP traffic"
aws ec2 create-traffic-mirror-filter-rule \
  --traffic-mirror-filter-id tmf-1234567890abcdef0 \
  --traffic-direction ingress \
  --rule-number 200 \
  --rule-action accept \
  --protocol 6 \
  --destination-port-range FromPort=443,ToPort=443 \
  --source-cidr-block 0.0.0.0/0 \
  --destination-cidr-block 0.0.0.0/0 \
  --description "Mirror HTTPS traffic"
aws ec2 create-traffic-mirror-filter-rule \
  --traffic-mirror-filter-id tmf-1234567890abcdef0 \
  --traffic-direction egress \
  --rule-number 300 \
  --rule-action accept \
  --protocol 6 \
  --source-port-range FromPort=80,ToPort=80 \
  --source-cidr-block 0.0.0.0/0 \
  --destination-cidr-block 0.0.0.0/0 \
  --description "Mirror outbound HTTP"
Filter Rule Parameters:
  • --traffic-direction: ingress or egress
  • --rule-number: Priority order (1-32766)
  • --rule-action: accept or reject
  • --protocol: IP protocol number (6=TCP, 17=UDP, etc.)
  • --destination-port-range: Target port range
  • --source-cidr-block and --destination-cidr-block: Source and destination IP ranges (both must be supplied on every rule)
Common Protocol Numbers:
  • 1 = ICMP
  • 6 = TCP
  • 17 = UDP
  • 47 = GRE
Rules are processed in order by rule number. The first matching rule determines the action.
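The first-match behaviour can be illustrated with a small simulation. This is a sketch only: the Rule class is a hypothetical simplification that matches on protocol, destination port, and source CIDR, and it assumes that a packet matching no rule is not mirrored.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    number: int
    action: str        # "accept" or "reject"
    protocol: int      # IP protocol number, e.g. 6 for TCP
    dst_ports: range   # destination ports covered by the rule
    src_cidr: str


def evaluate(rules, protocol, dst_port, src_ip):
    """Return the action of the lowest-numbered rule that matches the packet."""
    address = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if (rule.protocol == protocol
                and dst_port in rule.dst_ports
                and address in ipaddress.ip_network(rule.src_cidr)):
            return rule.action
    return "reject"  # assumption: unmatched traffic is not mirrored


rules = [
    Rule(100, "accept", 6, range(80, 81), "0.0.0.0/0"),
    Rule(200, "accept", 6, range(443, 444), "0.0.0.0/0"),
    Rule(50, "reject", 6, range(80, 81), "198.51.100.0/24"),
]
print(evaluate(rules, 6, 443, "203.0.113.7"))   # matched by rule 200
print(evaluate(rules, 6, 80, "198.51.100.9"))   # rule 50 wins over rule 100
print(evaluate(rules, 6, 22, "203.0.113.7"))    # no rule matches
```

Placing a narrow reject rule at a lower number than a broad accept rule, as with rule 50 here, is the usual way to carve exceptions out of mirrored traffic.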

Step 3: Create Mirror Target

aws ec2 create-traffic-mirror-target \
  --network-interface-id eni-analyzer123456 \
  --description "Analysis instance target" \
  --tag-specifications 'ResourceType=traffic-mirror-target,Tags=[{Key=Name,Value=AnalysisTarget}]'
aws ec2 create-traffic-mirror-target \
  --network-load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/analysis-nlb/50dc6c495c0c9188 \
  --description "NLB target for distributed analysis"
Mirror Target Types:
  • ENI Target: Direct to a single analysis instance
  • NLB Target: Distribute traffic across multiple analysis instances
  • Gateway Load Balancer endpoint: For appliance-based analysis (specified with --gateway-load-balancer-endpoint-id)
Target Selection Criteria:
  • Processing capacity requirements
  • High availability needs
  • Analysis tool compatibility
  • Cost considerations
NLB targets enable scaling analysis across multiple instances for high-throughput scenarios.

Step 4: Create Mirror Session

aws ec2 create-traffic-mirror-session \
  --network-interface-id eni-source123456 \
  --traffic-mirror-target-id tmt-1234567890abcdef0 \
  --traffic-mirror-filter-id tmf-1234567890abcdef0 \
  --session-number 1 \
  --packet-length 65535 \
  --virtual-network-id 12345 \
  --description "Web server traffic analysis" \
  --tag-specifications 'ResourceType=traffic-mirror-session,Tags=[{Key=Name,Value=WebServerMirror}]'
Mirror Session Parameters:
  • --network-interface-id: Source ENI to mirror
  • --session-number: Priority when multiple sessions exist (1-32766)
  • --packet-length: Maximum bytes to capture per packet
  • --virtual-network-id: VXLAN VNI for packet encapsulation
Packet Length Guidelines:
  • 65535 = Full packet capture (omitting --packet-length also mirrors the entire packet)
  • 128 = Headers only (faster processing)
  • 1500 = Standard MTU size
The virtual network ID helps identify traffic from different sessions when analyzing at the target.
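On the analysis side, the virtual network ID is carried in the 8-byte VXLAN header that precedes each mirrored frame. The sketch below reads the VNI from such a header using the standard VXLAN layout (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits); the function name is hypothetical and the header bytes are constructed locally rather than captured.

```python
import struct


def vxlan_vni(header):
    """Return the 24-bit VNI from the first 8 bytes of a VXLAN header."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes long")
    first_word, second_word = struct.unpack("!II", header[:8])
    flags = first_word >> 24
    if not flags & 0x08:
        raise ValueError("VNI-valid flag is not set")
    return second_word >> 8  # drop the trailing reserved byte


# Build a header carrying VNI 12345, matching --virtual-network-id above.
header = struct.pack("!II", 0x08 << 24, 12345 << 8)
print(vxlan_vni(header))
```

Grouping captured packets by this value separates traffic from different mirror sessions that share one target.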

Step 5: Configure Analysis Instance

sudo apt-get update && sudo apt-get install -y wireshark-common tshark suricata
# Configure network interface for promiscuous mode
sudo ip link set dev eth0 promisc on
# Configure tshark for continuous capture
sudo tshark -i eth0 -w /tmp/mirror-capture.pcap -b filesize:100000 -b files:10

# Suricata configuration for IDS analysis
# /etc/suricata/suricata.yaml
af-packet:
  - interface: eth0
    cluster-id: 99
    cluster-type: cluster_flow
    defrag: yes
outputs:
  - fast:
      enabled: yes
      filename: /var/log/suricata/fast.log
  - eve-log:
      enabled: yes
      filetype: regular
      filename: /var/log/suricata/eve.json
      types:
        - alert
        - http
        - dns
        - tls

sudo systemctl start suricata
sudo systemctl enable suricata
Analysis Tools Configuration:
  • Wireshark/tshark: Packet capture and protocol analysis
  • Suricata: Intrusion detection and prevention
  • Promiscuous mode: Lets the capture tools see every frame arriving on the interface
  • File rotation: Prevents disk space issues with continuous capture
Mirrored packets arrive VXLAN-encapsulated on UDP port 4789, so the target's security group must allow that port and the analysis tools must decode the VXLAN layer to reach the original packets.

Step 6: Analysis and Monitoring Commands

aws ec2 describe-traffic-mirror-sessions --filters "Name=network-interface-id,Values=eni-source123456"
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name NetworkPacketsIn \
  --dimensions Name=InstanceId,Value=i-analyzer123456 \
  --statistics Sum \
  --start-time 2024-06-30T00:00:00Z \
  --end-time 2024-06-30T23:59:59Z \
  --period 3600
# Real-time traffic analysis
sudo tshark -i eth0 -Y "tcp.port == 80 or tcp.port == 443" -T fields -e ip.src -e ip.dst -e tcp.port
# Suricata alert monitoring
tail -f /var/log/suricata/fast.log | grep -E "(MALWARE|TROJAN|EXPLOIT)"
Monitoring and Analysis:
  • Session Status: Monitor active mirror sessions
  • Traffic Volume: Track mirrored packet volumes
  • Real-time Analysis: Live traffic inspection
  • Alert Processing: Security event detection
Regular monitoring ensures mirror sessions are working correctly and analysis tools are processing traffic effectively.
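Besides grepping fast.log, the eve.json output configured in Step 5 can be summarized programmatically, since each line is a JSON object with an event_type field. The sketch below counts alerts per signature; the sample lines are invented and far shorter than real Suricata events.

```python
import json
from collections import Counter


def alert_counts(lines):
    """Count Suricata alerts per signature from eve.json lines.

    Blank or unparsable lines and non-alert events are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("event_type") == "alert":
            signature = event.get("alert", {}).get("signature", "unknown")
            counts[signature] += 1
    return counts


# Hypothetical sample events for illustration.
sample_lines = [
    '{"event_type": "alert", "alert": {"signature": "Example exploit attempt"}}',
    '{"event_type": "http", "http": {"hostname": "example.com"}}',
    '{"event_type": "alert", "alert": {"signature": "Example exploit attempt"}}',
    'not json',
]
for signature, count in alert_counts(sample_lines).most_common():
    print(count, signature)
```

In practice the lines would come from iterating over the open /var/log/suricata/eve.json file.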

4. VPC Reachability Analyzer

Reachability Analyzer Architecture

graph TB
    A[Source] --> B[Reachability Analyzer]
    C[Destination] --> B
    B --> D[Path Analysis]
    D --> E[Route Tables]
    D --> F[Security Groups]
    D --> G[NACLs]
    D --> H[Internet Gateways]
    D --> I[NAT Gateways]
    D --> J[VPC Peering]
    D --> K[Transit Gateway]
    D --> L[VPC Endpoints]
    M[Analysis Results] --> N[Reachable]
    M --> O[Not Reachable]
    N --> P[Path Details]
    O --> Q[Blocking Component]
    R[Use Cases] --> S[Troubleshooting]
    R --> T[Security Validation]
    R --> U[Change Impact Analysis]
    R --> V[Compliance Testing]
VPC Reachability Analyzer: This service analyzes network paths between sources and destinations by examining all AWS networking components in the path. It considers route tables, security groups, NACLs, gateways, and other network configurations to determine if traffic can flow between two points. The analysis is performed without sending actual traffic, making it safe for production environments.

Network Path Analysis Flow

flowchart TD
    A[Start Analysis] --> B[Identify Source ENI]
    B --> C[Identify Destination]
    C --> D[Check Route Tables]
    D --> E{Route Exists?}
    E -->|No| F[Not Reachable - No Route]
    E -->|Yes| G[Check Source Security Groups]
    G --> H{Outbound Rules Allow?}
    H -->|No| I[Not Reachable - SG Outbound]
    H -->|Yes| J[Check Source NACL]
    J --> K{NACL Outbound Allow?}
    K -->|No| L[Not Reachable - NACL Outbound]
    K -->|Yes| M[Check Destination NACL]
    M --> N{NACL Inbound Allow?}
    N -->|No| O[Not Reachable - NACL Inbound]
    N -->|Yes| P[Check Destination Security Groups]
    P --> Q{Inbound Rules Allow?}
    Q -->|No| R[Not Reachable - SG Inbound]
    Q -->|Yes| S[Reachable - Path Found]
    style F fill:#ffcccc
    style I fill:#ffcccc
    style L fill:#ffcccc
    style O fill:#ffcccc
    style R fill:#ffcccc
    style S fill:#ccffcc
Path Analysis Logic: The Reachability Analyzer follows a systematic approach to validate network connectivity. It starts by checking if a route exists, then validates security controls in order: source security groups (outbound), source NACLs (outbound), destination NACLs (inbound), and destination security groups (inbound). Any component that blocks traffic results in a "Not Reachable" determination with specific details about the blocking component.
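The check order described above can be written as a short decision chain. This is only a model of the flowchart, not of the service's internals: each check is reduced to a boolean supplied by the caller, and the key names are hypothetical.

```python
# Checks in the order the flowchart applies them, with the failure label
# reported when a check does not pass.
CHECKS = [
    ("route_exists", "Not Reachable - No Route"),
    ("source_sg_outbound", "Not Reachable - SG Outbound"),
    ("source_nacl_outbound", "Not Reachable - NACL Outbound"),
    ("dest_nacl_inbound", "Not Reachable - NACL Inbound"),
    ("dest_sg_inbound", "Not Reachable - SG Inbound"),
]


def analyze(path_state):
    """Return the first failing check's label, or the success label."""
    for key, failure in CHECKS:
        if not path_state.get(key, False):
            return failure
    return "Reachable - Path Found"


all_open = {key: True for key, _ in CHECKS}
print(analyze(all_open))
print(analyze({**all_open, "dest_nacl_inbound": False}))
```

Because evaluation stops at the first failure, fixing one blocking component can reveal another further along the path, which is why re-running the analysis after each change is worthwhile.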

Setting up Reachability Analysis

Reachability Analysis Setup Order

graph LR
    A[1: Identify Source/Destination] --> B[2: Create Analysis Path]
    B --> C[3: Run Analysis]
    C --> D[4: Review Results]
    D --> E[5: Fix Issues]
    E --> F[6: Verify Resolution]

Step 1: Create Network Insights Path

aws ec2 create-network-insights-path \
  --source eni-source123456 \
  --destination eni-dest789012 \
  --protocol tcp \
  --destination-port 443 \
  --tag-specifications 'ResourceType=network-insights-path,Tags=[{Key=Name,Value=WebServerConnectivity},{Key=Purpose,Value=TroubleshootHTTPS}]'
Path Configuration Parameters:
  • --source: Source ENI, instance, or other AWS resource
  • --destination: Target ENI, instance, or resource
  • --protocol: tcp or udp
  • --destination-port: Target port for TCP/UDP protocols
Source/Destination Types:
  • EC2 instances (i-xxxxxxxxx)
  • Network interfaces (eni-xxxxxxxxx)
  • VPC endpoints (vpce-xxxxxxxxx)
  • Load balancers (arn:aws:elasticloadbalancing:...)
  • Internet gateways (igw-xxxxxxxxx)

Step 2: Different Path Analysis Scenarios

aws ec2 create-network-insights-path \
  --source i-web123456 \
  --destination i-db789012 \
  --protocol tcp \
  --destination-port 3306 \
  --tag-specifications 'ResourceType=network-insights-path,Tags=[{Key=Name,Value=DatabaseConnectivity}]'
aws ec2 create-network-insights-path \
  --source eni-private123456 \
  --destination igw-12345678 \
  --protocol tcp \
  --destination-port 80 \
  --tag-specifications 'ResourceType=network-insights-path,Tags=[{Key=Name,Value=InternetAccess}]'
aws ec2 create-network-insights-path \
  --source i-app123456 \
  --destination vpce-s3endpoint789 \
  --protocol tcp \
  --destination-port 443 \
  --tag-specifications 'ResourceType=network-insights-path,Tags=[{Key=Name,Value=S3EndpointAccess}]'
Common Analysis Scenarios:
  • Database Connectivity: Web tier to database tier access
  • Internet Access: Private subnet to internet gateway
  • VPC Endpoint Access: Instance to AWS service endpoints
  • Cross-VPC Communication: Peering or Transit Gateway connectivity
Each scenario helps validate different aspects of network architecture and security configurations.

Step 3: Run Network Analysis

aws ec2 start-network-insights-analysis --network-insights-path-id nip-1234567890abcdef0
aws ec2 describe-network-insights-analyses --network-insights-analysis-ids nia-1234567890abcdef0
Analysis Execution:
  • Asynchronous Operation: Analysis runs in background
  • Status Monitoring: Check analysis status regularly
  • Result Availability: Results available when status is "succeeded"
  • Cost: Charged per analysis execution
Analysis typically completes within minutes but can take longer for complex network topologies.
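Since the analysis is asynchronous, scripts usually poll until the status leaves "running". The sketch below shows one way to structure that loop. The fetch_status callable is a hypothetical stand-in for whatever retrieves the Status field, for example by wrapping the describe-network-insights-analyses call, and the interval and timeout values are arbitrary.

```python
import time


def wait_for_analysis(fetch_status, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until it returns something other than "running".

    Raises TimeoutError if the analysis is still running after `timeout`
    seconds of waiting. `sleep` is injectable to keep tests fast.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status != "running":
            return status
        if waited >= timeout:
            raise TimeoutError("analysis still running after timeout")
        sleep(interval)
        waited += interval


# Simulated status sequence standing in for real API responses.
responses = iter(["running", "running", "succeeded"])
final = wait_for_analysis(lambda: next(responses), sleep=lambda seconds: None)
print(final)
```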

Step 4: Analyze Results - Successful Path

{
  "NetworkInsightsAnalysis": {
    "NetworkInsightsAnalysisId": "nia-1234567890abcdef0",
    "NetworkInsightsPathId": "nip-1234567890abcdef0",
    "Status": "succeeded",
    "NetworkPathFound": true,
    "ForwardPathComponents": [
      {
        "SequenceNumber": 1,
        "Component": {
          "Id": "eni-source123456",
          "Arn": "arn:aws:ec2:us-east-1:123456789012:network-interface/eni-source123456"
        },
        "ComponentType": "AWS::EC2::NetworkInterface",
        "ComponentDetails": {
          "NetworkInterface": {
            "NetworkInterfaceId": "eni-source123456"
          }
        }
      },
      {
        "SequenceNumber": 2,
        "Component": {
          "Id": "sg-web123456",
          "Arn": "arn:aws:ec2:us-east-1:123456789012:security-group/sg-web123456"
        },
        "ComponentType": "AWS::EC2::SecurityGroup",
        "ComponentDetails": {
          "SecurityGroupRule": {
            "Direction": "outbound",
            "SecurityGroupRuleId": "sgr-outbound123456",
            "Protocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "CidrBlock": "0.0.0.0/0"
          }
        }
      }
    ]
  }
}
Successful Analysis Components:
  • NetworkPathFound: true: Path exists between source and destination
  • ForwardPathComponents: Ordered list of network components in path
  • SequenceNumber: Order of components in the path
  • ComponentType: Type of AWS resource (SecurityGroup, RouteTable, etc.)
  • ComponentDetails: Specific configuration that allows traffic
This shows the complete path that traffic would take from source to destination.

Step 5: Analyze Results - Blocked Path

{
  "NetworkInsightsAnalysis": {
    "NetworkInsightsAnalysisId": "nia-blocked789012",
    "NetworkInsightsPathId": "nip-blocked789012",
    "Status": "succeeded",
    "NetworkPathFound": false,
    "ForwardPathComponents": [
      {
        "SequenceNumber": 1,
        "Component": {
          "Id": "eni-source123456",
          "Arn": "arn:aws:ec2:us-east-1:123456789012:network-interface/eni-source123456"
        },
        "ComponentType": "AWS::EC2::NetworkInterface"
      },
      {
        "SequenceNumber": 2,
        "Component": {
          "Id": "sg-restrictive789012",
          "Arn": "arn:aws:ec2:us-east-1:123456789012:security-group/sg-restrictive789012"
        },
        "ComponentType": "AWS::EC2::SecurityGroup",
        "ComponentDetails": {
          "SecurityGroupRule": {
            "Direction": "outbound",
            "Protocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "CidrBlock": "10.0.0.0/8"
          }
        }
      }
    ],
    "Explanations": [
      {
        "ExplanationCode": "ENI_NO_SECURITY_GROUP_RULE",
        "Direction": "outbound",
        "Protocol": "tcp",
        "Port": 443,
        "SecurityGroup": {
          "Id": "sg-restrictive789012"
        }
      }
    ]
  }
}
Blocked Path Analysis:
  • NetworkPathFound: false: No valid path exists
  • Explanations: Detailed reasons why path is blocked
  • ExplanationCode: Specific type of blocking issue
  • ENI_NO_SECURITY_GROUP_RULE: No security group rule allows the traffic
Common Explanation Codes:
  • ENI_NO_SECURITY_GROUP_RULE: Security group blocks traffic
  • NETWORK_ACL_RULE: NACL rule blocks traffic
  • ROUTE_TABLE_ROUTE: No route to destination
  • INTERNET_GATEWAY: Internet gateway issues
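When many paths are analyzed, it helps to reduce each result to a one-line summary. The sketch below does that for JSON shaped like the blocked-path example in this step. It assumes the same top-level NetworkInsightsAnalysis key and field names shown there, and the trimmed sample document is illustrative rather than real API output.

```python
import json


def summarize(analysis_json):
    """Return short descriptions of why a path is blocked, or ["reachable"]."""
    analysis = json.loads(analysis_json)["NetworkInsightsAnalysis"]
    if analysis.get("NetworkPathFound"):
        return ["reachable"]
    return [
        "{code} ({direction}, port {port})".format(
            code=item.get("ExplanationCode", "UNKNOWN"),
            direction=item.get("Direction", "?"),
            port=item.get("Port", "?"),
        )
        for item in analysis.get("Explanations", [])
    ]


# Trimmed version of the blocked-path result shown above.
blocked = json.dumps({
    "NetworkInsightsAnalysis": {
        "Status": "succeeded",
        "NetworkPathFound": False,
        "Explanations": [
            {
                "ExplanationCode": "ENI_NO_SECURITY_GROUP_RULE",
                "Direction": "outbound",
                "Protocol": "tcp",
                "Port": 443,
            }
        ],
    }
})
print(summarize(blocked))
```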

Step 6: Remediation Based on Analysis

aws ec2 authorize-security-group-egress