Day 10: Network Troubleshooting & Tools
What You'll Learn Today
- A systematic OSI layer-by-layer troubleshooting approach
- Essential tools: ping, traceroute, dig, netstat/ss, curl, tcpdump, and Wireshark
- Common network issues and how to diagnose them
- SNMP monitoring for network infrastructure
- Cloud networking concepts: VPC, security groups, and cloud-native networking
- A complete review of everything you have learned over the past 10 days
OSI Layer-by-Layer Troubleshooting
When a network problem occurs, a reliable approach is to work through the OSI model from the bottom up. Start at the physical layer and work your way to the application layer, because a fault at a lower layer breaks everything above it.
flowchart TB
subgraph Approach["Bottom-Up Troubleshooting"]
L1["Layer 1: Physical\nCable connected? Link light on?"]
L2["Layer 2: Data Link\nMAC address correct? VLAN configured?"]
L3["Layer 3: Network\nIP assigned? Can ping gateway?"]
L4["Layer 4: Transport\nPort open? Firewall blocking?"]
L7["Layer 7: Application\nDNS resolving? HTTP responding?"]
end
L1 --> L2 --> L3 --> L4 --> L7
style L1 fill:#ef4444,color:#fff
style L2 fill:#f59e0b,color:#fff
style L3 fill:#3b82f6,color:#fff
style L4 fill:#8b5cf6,color:#fff
style L7 fill:#22c55e,color:#fff
| Layer | Check | Tool |
|---|---|---|
| L1 Physical | Cable connected, link light, Wi-Fi signal | Visual inspection, ethtool, iwconfig |
| L2 Data Link | MAC address, ARP table, VLAN | arp, ip link, bridge |
| L3 Network | IP address, routing, gateway reachability | ip addr, ping, traceroute |
| L4 Transport | Port connectivity, firewall rules | ss, netstat, telnet, nc |
| L5-7 Application | DNS resolution, HTTP response, application logs | dig, curl, tcpdump, application logs |
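The table above can be turned into a small script that runs one probe per layer and stops at the first failure. This is a minimal sketch, not a definitive implementation: it assumes a Linux host with `ip`, `ping` (iputils), `dig`, `nc`, and `curl` installed, and the `TARGET` and `PORT` defaults as well as the `check_*` and `first_failure` names are hypothetical.

```shell
#!/bin/sh
# Bottom-up connectivity check: one probe per layer, stop at the first failure.
# TARGET and PORT are illustrative defaults; override them for your environment.
TARGET="${TARGET:-example.com}"
PORT="${PORT:-443}"

# L1/L2: at least one non-loopback interface is up with carrier
check_link() { ip -o link show up | grep -v ' lo:' | grep -q 'LOWER_UP'; }

# L3: a global IPv4 address is assigned
check_address() { ip -4 -o addr show scope global | grep -q 'inet '; }

# L3: the default gateway answers ping
check_gateway() {
  gw=$(ip -4 route show default | awk '$2 == "via" { print $3; exit }')
  [ -n "$gw" ] && ping -c 2 -W 2 "$gw" >/dev/null 2>&1
}

# DNS: the target name resolves to at least one record
check_dns() { [ -n "$(dig +short "$TARGET")" ]; }

# L4: the TCP port accepts connections
check_port() { nc -z -w 3 "$TARGET" "$PORT"; }

# L7: the server returns a successful HTTP response
check_http() { curl -fsS -o /dev/null --max-time 10 "https://$TARGET/"; }

# Run the given checks in order; print the name of the first one that fails.
first_failure() {
  for check in "$@"; do
    if ! "$check"; then
      echo "$check"
      return 1
    fi
  done
  echo "ok"
}

# Usage:
#   first_failure check_link check_address check_gateway check_dns check_port check_http
```

The name that `first_failure` prints tells you which layer to investigate first; everything listed before it passed.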
Essential Network Tools
ping
ping sends ICMP Echo Request packets and measures round-trip time. It is the most basic connectivity test.
# Basic ping
ping example.com
# Send 5 packets and stop
ping -c 5 example.com
# Ping with specific packet size
ping -s 1400 example.com
# Ping with interval of 0.5 seconds
ping -i 0.5 example.com
Example output:
PING example.com (93.184.216.34): 56 bytes
64 bytes from 93.184.216.34: icmp_seq=0 ttl=56 time=11.2 ms
64 bytes from 93.184.216.34: icmp_seq=1 ttl=56 time=10.8 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=56 time=11.5 ms
--- example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss
round-trip min/avg/max = 10.8/11.2/11.5 ms
| Result | Meaning | Likely Issue |
|---|---|---|
| Reply received | Host is reachable | No issue at L3 |
| Request timeout | No response | Host down, firewall blocking ICMP, routing issue |
| Destination unreachable | Network/host unreachable | Routing problem, no path to host |
| High latency | Slow response | Congestion, long path, overloaded server |
| Packet loss | Some packets dropped | Network congestion, faulty hardware |
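When you run ping from a script, you usually want just the loss percentage and average round-trip time rather than the full output. The helpers below are a sketch that reads ping output on stdin; the function names are hypothetical, and the patterns assume the summary lines look like the example output above (the Linux `rtt min/avg/max/mdev` variant is also handled).

```shell
# Print the packet-loss percentage from ping's summary line.
ping_loss_percent() {
  sed -n 's/.*[ ,]\([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Print the average round-trip time (ms) from ping's min/avg/max line.
ping_avg_rtt() {
  sed -n 's|.*min/avg/max[^=]*= *[0-9.]*/\([0-9.]*\)/.*|\1|p'
}

# Usage:
#   ping -c 5 example.com | ping_loss_percent
#   ping -c 5 example.com | ping_avg_rtt
```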
traceroute / tracert
traceroute reveals the path packets take through the network by sending packets with incrementally increasing TTL values.
# Linux/macOS
traceroute example.com
# Windows
tracert example.com
# Use ICMP instead of UDP (Linux)
traceroute -I example.com
# Use TCP SYN probes (useful when ICMP is blocked; usually requires root)
sudo traceroute -T -p 443 example.com
Example output:
traceroute to example.com (93.184.216.34), 30 hops max
1 192.168.1.1 1.2 ms 1.1 ms 1.0 ms
2 10.0.0.1 5.3 ms 5.1 ms 5.2 ms
3 isp-router.net 12.4 ms 12.1 ms 12.3 ms
4 * * *
5 edge.example.com 15.2 ms 15.0 ms 15.1 ms
| Symbol | Meaning |
|---|---|
| IP/hostname + time | Hop responded; time is the round-trip |
| `* * *` | Hop did not respond (firewall or ICMP disabled) |
| Increasing times | Normal; each hop adds latency |
| Sudden time jump | Possible congestion at that hop |
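To spot a sudden time jump without reading every line, you can compare the first round-trip time of each responding hop with the previous one. The following is a sketch under the assumption that each hop line starts with the hop number and that times are followed by `ms`, as in the example output above; `largest_latency_jump` is a hypothetical name.

```shell
# Read traceroute output on stdin and report the hop with the largest
# increase in first-probe RTT compared with the previous responding hop.
largest_latency_jump() {
  awk '
    $1 ~ /^[0-9]+$/ {
      rtt = ""
      # The first field that is followed by "ms" is the first probe time
      for (i = 2; i < NF; i++) if ($(i + 1) == "ms") { rtt = $i; break }
      if (rtt == "") next            # hop answered with * * *
      if (seen && rtt - prev > max) { max = rtt - prev; hop = $1 }
      prev = rtt; seen = 1
    }
    END { if (hop != "") printf "hop %s (+%.1f ms)\n", hop, max }
  '
}

# Usage:
#   traceroute example.com | largest_latency_jump
```

For the example output above this prints `hop 3 (+7.1 ms)`. Treat the result as a hint only: a jump that does not persist on later hops often means that router is slow to answer probes, not that it is slow to forward traffic.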
dig
dig queries DNS servers. We covered it on Day 6, but it is just as essential for troubleshooting.
# Check DNS resolution
dig example.com
# Query specific record type
dig example.com MX
# Use a specific DNS server
dig @1.1.1.1 example.com
# Trace the full resolution path
dig +trace example.com
# Reverse DNS lookup
dig -x 93.184.216.34
netstat / ss
ss (Socket Statistics) is the modern replacement for netstat. It shows active connections and listening ports.
# Show all listening TCP ports
ss -tlnp
# Show all established connections
ss -tnp
# Show listening UDP ports
ss -ulnp
# Filter by port
ss -tlnp | grep :443
# Legacy netstat equivalent
netstat -tlnp
| Flag | Meaning |
|---|---|
| `-t` | TCP connections |
| `-u` | UDP connections |
| `-l` | Listening sockets only |
| `-n` | Show numeric addresses (no DNS resolution) |
| `-p` | Show process name/PID |
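In scripts, `grep :443` can also match ports such as 4430, so it can be safer to compare the port field exactly. The helper below is a sketch that reads `ss -tln` output on stdin; `port_is_listening` is a hypothetical name, and it assumes the usual column layout where the fourth column is `Local Address:Port`.

```shell
# Succeed if the given TCP port appears as a LISTEN socket in `ss -tln` output.
port_is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")      # the port is the text after the last colon
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
#   ss -tln | port_is_listening 443 && echo "something is listening on 443"
```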
curl
curl is a versatile tool for testing HTTP/HTTPS connectivity and APIs.
# Basic HTTP GET
curl https://example.com
# Show response headers
curl -I https://example.com
# Verbose output (see TLS handshake, headers)
curl -v https://example.com
# POST JSON data
curl -X POST -H "Content-Type: application/json" \
-d '{"key":"value"}' https://api.example.com
# Follow redirects
curl -L http://example.com
# Test with specific DNS resolution
curl --resolve example.com:443:93.184.216.34 https://example.com
# Measure timing
curl -w "DNS: %{time_namelookup}s\nTCP: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" \
-o /dev/null -s https://example.com
The timing output is particularly useful:
DNS: 0.012s
TCP: 0.045s
TLS: 0.123s
Total: 0.234s
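Note that curl reports these values as cumulative timestamps measured from the start of the request, so the TLS figure above includes the DNS and TCP time. To see how long each phase took on its own, subtract the previous timestamp. The helper below is a small sketch of that arithmetic; `curl_phase_durations` is a hypothetical name.

```shell
# Convert curl's cumulative timestamps into per-phase durations.
# Arguments (seconds): time_namelookup time_connect time_appconnect time_total
curl_phase_durations() {
  awk -v dns="$1" -v tcp="$2" -v tls="$3" -v total="$4" 'BEGIN {
    printf "dns=%.3f tcp=%.3f tls=%.3f rest=%.3f\n",
      dns, tcp - dns, tls - tcp, total - tls
  }'
}

# Values from the example output above
curl_phase_durations 0.012 0.045 0.123 0.234
# prints: dns=0.012 tcp=0.033 tls=0.078 rest=0.111
```

Here `rest` covers everything after the TLS handshake: sending the request, waiting for the server, and downloading the response.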
tcpdump
tcpdump captures network packets at a low level. It is the command-line equivalent of Wireshark.
# Capture all traffic on eth0
sudo tcpdump -i eth0
# Capture only traffic to/from a specific host
sudo tcpdump host 93.184.216.34
# Capture only TCP traffic on port 443
sudo tcpdump -i eth0 tcp port 443
# Capture DNS traffic
sudo tcpdump -i eth0 udp port 53
# Save capture to file (for Wireshark)
sudo tcpdump -i eth0 -w capture.pcap
# Read a capture file
tcpdump -r capture.pcap
# Show packet contents in ASCII
sudo tcpdump -A -i eth0 port 80
| Filter | Description |
|---|---|
| `host 10.0.0.1` | Traffic to or from this IP |
| `src host 10.0.0.1` | Traffic from this IP |
| `dst port 443` | Traffic to port 443 |
| `tcp` | TCP traffic only |
| `udp` | UDP traffic only |
| `not port 22` | Exclude SSH traffic |
| `tcp and port 80 and host 10.0.0.1` | Combine with `and`, `or` |
Wireshark
Wireshark is a graphical packet analyzer. While tcpdump is ideal for servers and quick captures, Wireshark excels at deep analysis with its protocol dissectors, filters, and visualizations.
flowchart LR
subgraph Workflow["Packet Analysis Workflow"]
CAPTURE["Capture\ntcpdump -w file.pcap"]
OPEN["Open in Wireshark"]
FILTER["Apply display filters"]
ANALYZE["Analyze protocols\nand flows"]
end
CAPTURE --> OPEN --> FILTER --> ANALYZE
style CAPTURE fill:#3b82f6,color:#fff
style OPEN fill:#8b5cf6,color:#fff
style FILTER fill:#f59e0b,color:#fff
style ANALYZE fill:#22c55e,color:#fff
| Wireshark Filter | Description |
|---|---|
| `http` | Show only HTTP traffic |
| `dns` | Show only DNS traffic |
| `tcp.port == 443` | Traffic on port 443 |
| `ip.addr == 10.0.0.1` | Traffic to/from an IP |
| `tcp.flags.syn == 1` | SYN packets (connection starts) |
| `tcp.analysis.retransmission` | Retransmitted packets |
Common Network Issues
| Symptom | Possible Cause | Diagnostic Steps |
|---|---|---|
| No connectivity | Cable unplugged, Wi-Fi off, DHCP failure | Check physical connection, ip addr, dhclient |
| Slow performance | Congestion, packet loss, DNS delays | ping (check latency/loss), traceroute, curl timing |
| Intermittent drops | Faulty cable, wireless interference, MTU mismatch | ping -c 100 (check loss), ethtool, check MTU |
| Cannot reach website | DNS failure, firewall, server down | dig (DNS), curl -v (HTTP), traceroute |
| Connection refused | Service not running, wrong port, firewall | ss -tlnp (check listening ports), firewall rules |
| SSL/TLS error | Expired certificate, wrong hostname, cipher mismatch | curl -v, openssl s_client, check certificate dates |
DNS Troubleshooting
# Check if DNS resolves
dig example.com
# Try a different DNS server
dig @8.8.8.8 example.com
# Check /etc/resolv.conf
cat /etc/resolv.conf
# Flush DNS cache (systemd-resolved; older releases use systemd-resolve --flush-caches)
sudo resolvectl flush-caches
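Comparing the answer from your configured resolver with the answer from a public resolver quickly tells you whether the problem is local. The function below is a sketch of that comparison; `compare_resolvers` is a hypothetical name, and 8.8.8.8 is only the default public resolver used here.

```shell
# Compare the system resolver's answer with a public resolver's answer.
# Arguments: name [public-resolver-ip]
compare_resolvers() {
  name=$1
  public=${2:-8.8.8.8}
  local_answer=$(dig +short "$name" | sort)
  public_answer=$(dig +short "@$public" "$name" | sort)

  if [ -z "$local_answer" ] && [ -z "$public_answer" ]; then
    echo "no answer from either resolver"
  elif [ -z "$local_answer" ]; then
    echo "local resolver problem"
  elif [ "$local_answer" = "$public_answer" ]; then
    echo "consistent"
  else
    echo "answers differ"
  fi
}

# Usage:
#   compare_resolvers example.com
#   compare_resolvers example.com 1.1.1.1
```

Differing answers are not always a fault: CDNs and split-horizon DNS return different records depending on who asks.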
MTU Issues
If large packets fail but small ones succeed, you may have an MTU (Maximum Transmission Unit) problem.
# Test MTU by sending fixed-size packets with fragmentation prohibited (-M do, Linux ping)
ping -s 1472 -M do example.com # 1472 payload + 28 header bytes (20 IPv4 + 8 ICMP) = 1500
ping -s 1400 -M do example.com # Try a smaller payload if the above fails
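Instead of guessing sizes by hand, you can compute the payload for a given MTU and binary-search the largest payload that gets through. This is a sketch that assumes Linux iputils ping (`-M do`, `-W` in seconds) and an IPv4 path; both function names are hypothetical.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet for a given MTU
# (20-byte IPv4 header without options + 8-byte ICMP header = 28 bytes).
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

icmp_payload_for_mtu 1500   # prints 1472

# Binary-search the largest payload that passes with "don't fragment" set,
# then print the corresponding path MTU. Prints 28 if no probe succeeds.
probe_path_mtu() {
  host=$1
  lo=0
  hi=1472
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if ping -c 1 -W 2 -M do -s "$mid" "$host" >/dev/null 2>&1; then
      lo=$mid               # this size fits; try larger
    else
      hi=$(( mid - 1 ))     # too large (or lost); try smaller
    fi
  done
  echo $(( lo + 28 ))
}

# Usage:
#   probe_path_mtu example.com
```

A single lost probe can make the result too small, so rerun the search before drawing conclusions on a lossy link.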
SNMP Monitoring
SNMP (Simple Network Management Protocol) allows you to monitor and manage network devices (routers, switches, servers) from a central system.
flowchart TB
subgraph SNMP_Arch["SNMP Architecture"]
NMS["Network Management\nSystem (NMS)"]
A1["Router\n(SNMP Agent)"]
A2["Switch\n(SNMP Agent)"]
A3["Server\n(SNMP Agent)"]
end
NMS <-->|"SNMP Queries\n& Traps"| A1
NMS <-->|"SNMP Queries\n& Traps"| A2
NMS <-->|"SNMP Queries\n& Traps"| A3
style NMS fill:#3b82f6,color:#fff
style A1 fill:#22c55e,color:#fff
style A2 fill:#22c55e,color:#fff
style A3 fill:#22c55e,color:#fff
| SNMP Version | Authentication | Encryption | Status |
|---|---|---|---|
| SNMPv1 | Community string (plaintext) | None | Legacy |
| SNMPv2c | Community string (plaintext) | None | Common |
| SNMPv3 | Username + auth protocol | DES/AES | Recommended |
| SNMP Operation | Direction | Purpose |
|---|---|---|
| GET | NMS → Agent | Read a specific value (e.g., interface status) |
| SET | NMS → Agent | Change a configuration value |
| TRAP | Agent → NMS | Agent sends unsolicited alert (e.g., link down) |
| WALK | NMS → Agent | Read an entire subtree of values |
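As an illustration of a WALK, the net-snmp `snmpwalk` command can read the `ifOperStatus` subtree, and a short filter can pick out interfaces that are not up. This is a sketch: the community string `public` and the address 192.0.2.10 in the usage line are placeholders, `interfaces_not_up` is a hypothetical name, and the pattern assumes the textual output format net-snmp prints when the IF-MIB is loaded.

```shell
# Read snmpwalk output for IF-MIB::ifOperStatus on stdin and print the
# interface index of every entry whose status is not up(1).
interfaces_not_up() {
  awk -F'[. ]' '/ifOperStatus\.[0-9]+ = / && !/up\(1\)/ { print $2 }'
}

# Usage (placeholder community string and address):
#   snmpwalk -v2c -c public 192.0.2.10 IF-MIB::ifOperStatus | interfaces_not_up
```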
Common monitoring tools: Nagios, Zabbix, Prometheus + Grafana, PRTG, Datadog.
Cloud Networking
Modern infrastructure increasingly runs in the cloud. Cloud providers offer software-defined networking that mirrors physical networking concepts.
VPC (Virtual Private Cloud)
A VPC is an isolated virtual network within a cloud provider. You define the IP address range, subnets, routing, and access control.
flowchart TB
subgraph VPC["VPC: 10.0.0.0/16"]
subgraph Public["Public Subnet\n10.0.1.0/24"]
WEB["Web Server"]
NAT["NAT Gateway"]
end
subgraph Private["Private Subnet\n10.0.2.0/24"]
APP["App Server"]
DB["Database"]
end
end
IGW["Internet Gateway"] --> WEB
WEB --> APP
APP --> DB
APP -->|"Outbound via"| NAT
NAT --> IGW
style Public fill:#22c55e,color:#fff
style Private fill:#3b82f6,color:#fff
style VPC fill:#8b5cf6,color:#fff
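Every subnet must fall inside the VPC's address range, which is easy to get wrong when planning CIDR blocks by hand. The sketch below checks containment with integer arithmetic for IPv4 only; `ip_to_int` and `cidr_contains` are hypothetical helper names.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1                 # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if CIDR block $2 lies entirely inside CIDR block $1.
cidr_contains() {
  outer_ip=${1%/*}; outer_len=${1#*/}
  inner_ip=${2%/*}; inner_len=${2#*/}
  # The inner block cannot be larger than the outer block
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(( (0xFFFFFFFF << (32 - $outer_len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$outer_ip") & mask )) -eq $(( $(ip_to_int "$inner_ip") & mask )) ]
}

# Usage with the ranges from the diagram above:
#   cidr_contains 10.0.0.0/16 10.0.1.0/24 && echo "public subnet fits"
#   cidr_contains 10.0.0.0/16 10.0.2.0/24 && echo "private subnet fits"
```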
Security Groups vs NACLs
| Feature | Security Group | Network ACL (NACL) |
|---|---|---|
| Level | Instance (ENI) | Subnet |
| State | Stateful (return traffic auto-allowed) | Stateless (must explicitly allow return) |
| Rules | Allow only | Allow and Deny |
| Evaluation | All rules evaluated | Rules evaluated in order (first match) |
| Default | Deny all inbound, allow all outbound | Default NACL allows all inbound and outbound; a custom NACL denies all until rules are added |
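The "first match" evaluation of a NACL is the part that most often surprises people, so here is a toy simulation of it. This is a sketch for illustration only: the three-column rule format (`number action port`) is invented for this example and is not how any cloud provider stores rules, and `nacl_decision` is a hypothetical name.

```shell
# Evaluate NACL-style rules read from stdin for one destination port.
# Rules are checked in ascending rule-number order; the first match decides.
# If nothing matches, the implicit final rule denies the traffic.
# Rule format (illustrative): "<number> <allow|deny> <port|all>"
nacl_decision() {
  sort -n | awk -v port="$1" '
    $3 == port || $3 == "all" { print $2; decided = 1; exit }
    END { if (!decided) print "deny" }
  '
}

# Usage:
#   printf '100 deny 22\n150 allow 443\n200 allow all\n' | nacl_decision 22
```

In that usage example, port 22 is denied even though rule 200 allows everything, because rule 100 matches first. A security group has no such ordering: traffic is allowed if any rule permits it.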
Cloud Networking Concepts
| Concept | Description | Physical Equivalent |
|---|---|---|
| VPC | Isolated virtual network | Private data center network |
| Subnet | IP range within a VPC | VLAN / physical subnet |
| Internet Gateway | VPC connection to the internet | Border router |
| NAT Gateway | Allows private instances to reach the internet | NAT router |
| Route Table | Determines where traffic is sent | Router routing table |
| Security Group | Instance-level firewall | Host firewall |
| NACL | Subnet-level firewall | Network firewall |
| VPC Peering | Connect two VPCs | WAN link between sites |
| Load Balancer | Distributes traffic across instances | Hardware load balancer |
Troubleshooting Flowchart
Here is a systematic approach to diagnosing any network issue.
flowchart TB
START["Problem reported"] --> L1{"L1: Physical\nCable/Wi-Fi connected?"}
L1 -->|No| FIX1["Fix physical connection"]
L1 -->|Yes| L2{"L2-L3: Link and Addressing\nLink up? IP assigned?\nip link / ip addr"}
L2 -->|No| FIX2["Check DHCP\ndhclient / renew"]
L2 -->|Yes| L3{"L3: Network\nCan ping gateway?\nping 192.168.1.1"}
L3 -->|No| FIX3["Check routing\nip route\nCheck firewall"]
L3 -->|Yes| DNS{"DNS\nCan resolve?\ndig example.com"}
DNS -->|No| FIX4["Check /etc/resolv.conf\nTry dig @8.8.8.8"]
DNS -->|Yes| L4{"L4: Transport\nPort reachable?\ncurl -v / telnet"}
L4 -->|No| FIX5["Check firewall rules\nss -tlnp on server"]
L4 -->|Yes| L7{"L7: Application\nCorrect response?\nCheck logs"}
L7 -->|No| FIX6["Check application logs\nRestart service"]
L7 -->|Yes| DONE["Issue resolved"]
style START fill:#ef4444,color:#fff
style DONE fill:#22c55e,color:#fff
style L1 fill:#f59e0b,color:#fff
style L2 fill:#f59e0b,color:#fff
style L3 fill:#3b82f6,color:#fff
style DNS fill:#3b82f6,color:#fff
style L4 fill:#8b5cf6,color:#fff
style L7 fill:#22c55e,color:#fff
Summary
| Concept | Description |
|---|---|
| Layer-by-layer troubleshooting | Systematic bottom-up approach through the OSI model |
| ping | ICMP connectivity test; measures latency and packet loss |
| traceroute | Reveals the network path to a destination hop by hop |
| dig | DNS query tool for resolution troubleshooting |
| ss / netstat | Shows active connections and listening ports |
| curl | HTTP/HTTPS testing with detailed timing and headers |
| tcpdump | Command-line packet capture and analysis |
| Wireshark | Graphical packet analyzer for deep protocol inspection |
| SNMP | Protocol for monitoring and managing network devices |
| VPC | Isolated virtual network in the cloud |
| Security Groups | Stateful instance-level firewall in the cloud |
| NACL | Stateless subnet-level firewall in the cloud |
Key Takeaways
- Always troubleshoot bottom-up: physical first, then data link, network, transport, application
- Master the core tools: `ping`, `traceroute`, `dig`, `ss`, `curl`, and `tcpdump` cover most situations
- `curl -w` timing breakdown pinpoints exactly where delays occur (DNS, TCP, TLS)
- Cloud networking maps directly to physical networking concepts; learn one and you understand the other
- Monitoring (SNMP, Prometheus) catches problems before users report them
Practice Problems
Beginner
A user reports they cannot access https://app.example.com. Using the layer-by-layer approach, write the exact commands you would run at each step, what output you would expect, and what each result tells you.
Intermediate
Use curl -w to measure the timing breakdown for three different websites. Compare DNS lookup time, TCP connection time, TLS handshake time, and total time. Based on the results, identify which phase is the bottleneck for each site and suggest optimizations (e.g., DNS caching, CDN, HTTP/2).
Advanced
You are responsible for a cloud-based web application deployed in AWS. Design the complete VPC architecture including: CIDR blocks, public and private subnets across two availability zones, internet gateway, NAT gateway, route tables, security groups for web servers (allow 80/443), app servers (allow from web SG only), and database (allow from app SG only). Write the security group rules and explain the traffic flow for: (1) a user accessing the website, (2) the app server calling an external API, and (3) an admin SSH-ing into the app server via a bastion host.
References
- Linux man pages: ping, traceroute, ss, tcpdump, dig, curl
- Wireshark Official Documentation
- AWS VPC Documentation
- RFC 1157 - SNMP
- Cloudflare - Network Troubleshooting
- Julia Evans - Networking Zines
Congratulations!
You have completed Learn Networking in 10 Days! Over the past 10 days, you have built a comprehensive understanding of computer networking:
| Day | Topic | Key Skills |
|---|---|---|
| Day 1 | Networking Fundamentals | OSI model, LAN vs WAN, network devices |
| Day 2 | The TCP/IP Model | TCP vs UDP, 3-way handshake, ports |
| Day 3 | IP Addressing & Subnetting | IPv4, IPv6, CIDR, subnet calculations |
| Day 4 | Switching & VLANs | MAC addresses, switches, VLAN segmentation |
| Day 5 | Routing & NAT | Static/dynamic routing, OSPF, BGP, NAT/PAT |
| Day 6 | DNS | Record types, resolution flow, DNSSEC |
| Day 7 | HTTP & the Web | HTTP methods, status codes, HTTP/2, HTTP/3 |
| Day 8 | TLS/SSL & Security | Encryption, TLS handshake, certificates, attacks |
| Day 9 | Wireless & VPN | Wi-Fi standards, WPA3, IPsec, WireGuard |
| Day 10 | Troubleshooting & Tools | ping, traceroute, tcpdump, cloud networking |
You now have the foundation to:
- Design networks for applications and organizations
- Secure communication with TLS, firewalls, and VPNs
- Troubleshoot any network issue systematically
- Understand cloud infrastructure as an extension of physical networking
Networking knowledge is foundational to every area of software engineering and infrastructure. Whether you are debugging a slow API call, configuring a cloud deployment, or building a distributed system, the concepts from this book will serve you every day.
Keep learning, keep experimenting, and keep building!