Day 10: Network Troubleshooting & Tools

What You'll Learn Today

  • A systematic OSI layer-by-layer troubleshooting approach
  • Essential tools: ping, traceroute, dig, netstat/ss, curl, tcpdump, and Wireshark
  • Common network issues and how to diagnose them
  • SNMP monitoring for network infrastructure
  • Cloud networking concepts: VPC, security groups, and cloud-native networking
  • A complete review of everything you have learned over the past 10 days

OSI Layer-by-Layer Troubleshooting

When a network problem occurs, the most effective approach is to work through the OSI model systematically from the bottom up. Start at the physical layer and work your way to the application layer.

flowchart TB
    subgraph Approach["Bottom-Up Troubleshooting"]
        L1["Layer 1: Physical\nCable connected? Link light on?"]
        L2["Layer 2: Data Link\nMAC address correct? VLAN configured?"]
        L3["Layer 3: Network\nIP assigned? Can ping gateway?"]
        L4["Layer 4: Transport\nPort open? Firewall blocking?"]
        L7["Layer 7: Application\nDNS resolving? HTTP responding?"]
    end
    L1 --> L2 --> L3 --> L4 --> L7
    style L1 fill:#ef4444,color:#fff
    style L2 fill:#f59e0b,color:#fff
    style L3 fill:#3b82f6,color:#fff
    style L4 fill:#8b5cf6,color:#fff
    style L7 fill:#22c55e,color:#fff

| Layer | Check | Tool |
| --- | --- | --- |
| L1 Physical | Cable connected, link light, Wi-Fi signal | Visual inspection, ethtool, iwconfig |
| L2 Data Link | MAC address, ARP table, VLAN | arp, ip link, bridge |
| L3 Network | IP address, routing, gateway reachability | ip addr, ping, traceroute |
| L4 Transport | Port connectivity, firewall rules | ss, netstat, telnet, nc |
| L5-7 Application | DNS resolution, HTTP response, application logs | dig, curl, tcpdump, application logs |

Essential Network Tools

ping

ping sends ICMP Echo Request packets and measures round-trip time. It is the most basic connectivity test.

# Basic ping
ping example.com

# Send 5 packets and stop
ping -c 5 example.com

# Ping with specific packet size
ping -s 1400 example.com

# Ping with interval of 0.5 seconds
ping -i 0.5 example.com

Example output:

PING example.com (93.184.216.34): 56 data bytes
64 bytes from 93.184.216.34: icmp_seq=0 ttl=56 time=11.2 ms
64 bytes from 93.184.216.34: icmp_seq=1 ttl=56 time=10.8 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=56 time=11.5 ms

--- example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss
round-trip min/avg/max = 10.8/11.2/11.5 ms

| Result | Meaning | Likely Issue |
| --- | --- | --- |
| Reply received | Host is reachable | No issue at L3 |
| Request timeout | No response | Host down, firewall blocking ICMP, routing issue |
| Destination unreachable | Network/host unreachable | Routing problem, no path to host |
| High latency | Slow response | Congestion, long path, overloaded server |
| Packet loss | Some packets dropped | Network congestion, faulty hardware |
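
When ping runs from a script or a cron job, it helps to turn the summary line into a single value. The sketch below is one way to do that; ping_loss and classify_loss are hypothetical helper names, and the pattern assumes the "N% packet loss" wording shown in the example output above.

```shell
# Hypothetical helper: read ping output on stdin and print the loss percentage.
# Matches both "0% packet loss" and "0.0% packet loss" style summaries.
ping_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: turn a loss percentage into a rough verdict.
classify_loss() {
  awk -v loss="$1" 'BEGIN {
    if (loss == 0)       print "ok"
    else if (loss < 100) print "partial loss"
    else                 print "unreachable"
  }'
}

# Example usage (needs network access):
#   classify_loss "$(ping -c 5 example.com | ping_loss)"
```

Partial loss usually points at congestion or faulty hardware, while 100% loss matches the "Request timeout" row in the table.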

traceroute / tracert

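traceroute reveals the path packets take through the network by sending probes with incrementally increasing TTL values. Each router that decrements the TTL to zero discards the probe and returns an ICMP Time Exceeded message, which identifies that hop.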

# Linux/macOS
traceroute example.com

# Windows
tracert example.com

# Use ICMP instead of UDP (Linux; may require root)
traceroute -I example.com

# Use TCP SYN probes (useful when UDP and ICMP probes are filtered; requires root on Linux)
sudo traceroute -T -p 443 example.com

Example output:

traceroute to example.com (93.184.216.34), 30 hops max
 1  192.168.1.1      1.2 ms   1.1 ms   1.0 ms
 2  10.0.0.1         5.3 ms   5.1 ms   5.2 ms
 3  isp-router.net   12.4 ms  12.1 ms  12.3 ms
 4  * * *
 5  edge.example.com 15.2 ms  15.0 ms  15.1 ms
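
| Symbol | Meaning |
| --- | --- |
| IP/hostname + time | Hop responded; each time is the round-trip for one probe |
| * * * | Hop did not respond (firewall, or ICMP replies disabled or rate-limited); not a fault if later hops answer |
| Increasing times | Normal; each hop adds latency |
| Sudden time jump | Possible congestion at that hop, if the higher latency persists on later hops |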
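
On a long trace it is easy to miss which hops stayed silent. As a small sketch, the hypothetical helper silent_hops below lists the hop numbers whose three probes all timed out; it assumes the default three-probe output format shown above.

```shell
# Hypothetical helper: print the hop numbers where every probe timed out.
# Reads traceroute output on stdin; assumes three probes per hop.
silent_hops() {
  awk '$1 ~ /^[0-9]+$/ && $2 == "*" && $3 == "*" && $4 == "*" { print $1 }'
}

# Example usage (needs network access):
#   traceroute example.com | silent_hops
```

A silent hop in the middle of an otherwise complete trace is usually a router that does not answer probes, not a break in the path.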

dig

dig queries DNS servers. We covered it on Day 6, and it is just as essential for troubleshooting.

# Check DNS resolution
dig example.com

# Query specific record type
dig example.com MX

# Use a specific DNS server
dig @1.1.1.1 example.com

# Trace the full resolution path
dig +trace example.com

# Reverse DNS lookup
dig -x 93.184.216.34

netstat / ss

ss (Socket Statistics) is the modern replacement for netstat. It shows active connections and listening ports.

# Show all listening TCP ports
ss -tlnp

# Show all established connections
ss -tnp

# Show listening UDP ports
ss -ulnp

# Filter by port
ss -tlnp | grep :443

# Legacy netstat equivalent
netstat -tlnp
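
| Flag | Meaning |
| --- | --- |
| -t | TCP sockets |
| -u | UDP sockets |
| -l | Listening sockets only |
| -n | Show numeric addresses and ports (no name resolution) |
| -p | Show process name/PID (root is needed to see other users' processes) |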
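
Piping ss into grep :443 also matches ports such as 4430. A slightly stricter check can compare the port field exactly, as in the sketch below; is_listening is a hypothetical helper, and it assumes the column layout that ss -tln prints (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port).

```shell
# Hypothetical helper: succeed if `ss -tln` output on stdin shows a
# listener on exactly the given port.
is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")      # the piece after the last ":" is the port
      if (parts[n] == port) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Example usage on the server itself:
#   ss -tln | is_listening 443 && echo "something is listening on 443"
```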

curl

curl is a versatile tool for testing HTTP/HTTPS connectivity and APIs.

# Basic HTTP GET
curl https://example.com

# Show response headers only (sends a HEAD request)
curl -I https://example.com

# Verbose output (see TLS handshake, headers)
curl -v https://example.com

# POST JSON data
curl -X POST -H "Content-Type: application/json" \
     -d '{"key":"value"}' https://api.example.com

# Follow redirects
curl -L http://example.com

# Test with specific DNS resolution
curl --resolve example.com:443:93.184.216.34 https://example.com

# Measure timing
curl -w "DNS: %{time_namelookup}s\nTCP: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" \
     -o /dev/null -s https://example.com

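The timing output is particularly useful. Each value is the time elapsed since the start of the transfer, so the figures are cumulative: in the sample output below, the TCP handshake itself took 0.045 - 0.012 = 0.033 s, not 0.045 s.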

DNS:   0.012s
TCP:   0.045s
TLS:   0.123s
Total: 0.234s
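
Because the curl -w values are cumulative, a per-phase view needs a subtraction for each step. The following sketch does that arithmetic; phase_durations is a hypothetical helper that takes the four values in the order printed above. It assumes an HTTPS request, since time_appconnect is 0 for plain HTTP.

```shell
# Hypothetical helper: convert curl's cumulative timings into per-phase durations.
# Arguments (seconds): time_namelookup time_connect time_appconnect time_total
phase_durations() {
  awk -v dns="$1" -v tcp="$2" -v tls="$3" -v total="$4" 'BEGIN {
    printf "DNS lookup:    %.3fs\n", dns
    printf "TCP handshake: %.3fs\n", tcp - dns
    printf "TLS handshake: %.3fs\n", tls - tcp
    printf "Request+body:  %.3fs\n", total - tls
  }'
}

# With the sample values from above:
#   phase_durations 0.012 0.045 0.123 0.234
```

For the sample values, the TLS handshake (0.078 s) and the request/response phase (0.111 s) dominate, while DNS and TCP setup are comparatively cheap.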

tcpdump

tcpdump captures network packets at a low level. It is a command-line counterpart to Wireshark and writes capture files that Wireshark can open.

# Capture all traffic on eth0
sudo tcpdump -i eth0

# Capture only traffic to/from a specific host
sudo tcpdump host 93.184.216.34

# Capture only TCP traffic on port 443
sudo tcpdump -i eth0 tcp port 443

# Capture DNS traffic (UDP only; use "port 53" to include DNS over TCP)
sudo tcpdump -i eth0 udp port 53

# Save capture to file (for Wireshark)
sudo tcpdump -i eth0 -w capture.pcap

# Read a capture file
tcpdump -r capture.pcap

# Show packet contents in ASCII
sudo tcpdump -A -i eth0 port 80

| Filter | Description |
| --- | --- |
| host 10.0.0.1 | Traffic to or from this IP |
| src host 10.0.0.1 | Traffic from this IP |
| dst port 443 | Traffic to port 443 |
| tcp | TCP traffic only |
| udp | UDP traffic only |
| not port 22 | Exclude SSH traffic |
| tcp and port 80 and host 10.0.0.1 | Combine filters with and, or |

Wireshark

Wireshark is a graphical packet analyzer. While tcpdump is ideal for servers and quick captures, Wireshark excels at deep analysis with its protocol dissectors, filters, and visualizations.

flowchart LR
    subgraph Workflow["Packet Analysis Workflow"]
        CAPTURE["Capture\ntcpdump -w file.pcap"]
        OPEN["Open in Wireshark"]
        FILTER["Apply display filters"]
        ANALYZE["Analyze protocols\nand flows"]
    end
    CAPTURE --> OPEN --> FILTER --> ANALYZE
    style CAPTURE fill:#3b82f6,color:#fff
    style OPEN fill:#8b5cf6,color:#fff
    style FILTER fill:#f59e0b,color:#fff
    style ANALYZE fill:#22c55e,color:#fff

| Wireshark Filter | Description |
| --- | --- |
| http | Show only HTTP traffic |
| dns | Show only DNS traffic |
| tcp.port == 443 | Traffic on port 443 |
| ip.addr == 10.0.0.1 | Traffic to/from an IP |
| tcp.flags.syn == 1 | SYN and SYN-ACK packets (connection setup) |
| tcp.analysis.retransmission | Retransmitted packets |

Common Network Issues

| Symptom | Possible Cause | Diagnostic Steps |
| --- | --- | --- |
| No connectivity | Cable unplugged, Wi-Fi off, DHCP failure | Check physical connection, ip addr, dhclient |
| Slow performance | Congestion, packet loss, DNS delays | ping (check latency/loss), traceroute, curl timing |
| Intermittent drops | Faulty cable, wireless interference, MTU mismatch | ping -c 100 (check loss), ethtool, check MTU |
| Cannot reach website | DNS failure, firewall, server down | dig (DNS), curl -v (HTTP), traceroute |
| Connection refused | Service not running, wrong port, firewall | ss -tlnp (check listening ports), firewall rules |
| SSL/TLS error | Expired certificate, wrong hostname, cipher mismatch | curl -v, openssl s_client, check certificate dates |

DNS Troubleshooting

# Check if DNS resolves
dig example.com

# Try a different DNS server
dig @8.8.8.8 example.com

# Check /etc/resolv.conf
cat /etc/resolv.conf

# Flush DNS cache (systemd-resolved; older releases use: systemd-resolve --flush-caches)
sudo resolvectl flush-caches
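
To test each configured resolver in turn, the nameserver entries can be pulled out of resolv.conf first. This is a minimal sketch; nameservers is a hypothetical helper, and the optional file argument exists only so that it can be pointed at a copy of the file.

```shell
# Hypothetical helper: print the resolver addresses listed in a resolv.conf file.
# Defaults to /etc/resolv.conf when no file is given.
nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example usage (needs network access): query every configured resolver.
#   for ns in $(nameservers); do
#     echo "== $ns"; dig @"$ns" example.com +short
#   done
```

On systems running systemd-resolved the only entry is often 127.0.0.53, the local stub resolver, in which case the upstream servers are configured elsewhere.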

MTU Issues

If large packets fail but small ones succeed, you may have an MTU (Maximum Transmission Unit) problem.

# Test MTU by sending specific-size packets with "don't fragment" set (-M do is Linux ping syntax)
ping -s 1472 -M do example.com    # 1472 payload + 20 (IP header) + 8 (ICMP header) = 1500
ping -s 1400 -M do example.com    # Try smaller if the above fails
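
Trying sizes by hand gets tedious, so the search can be automated. The sketch below shows the header arithmetic and a binary search over payload sizes; payload_for_mtu, find_max_payload, ping_probe, and the TARGET variable are all hypothetical names, and ping_probe assumes the Linux iputils ping options used above.

```shell
# 20-byte IPv4 header + 8-byte ICMP header = 28 bytes of overhead.
payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Binary-search the largest payload for which the probe command succeeds.
# Usage: find_max_payload PROBE_CMD LOW HIGH   (LOW is assumed to succeed)
find_max_payload() (
  probe=$1 lo=$2 hi=$3
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if "$probe" "$mid"; then lo=$mid; else hi=$(( mid - 1 )); fi
  done
  echo "$lo"
)

# Example probe: one unfragmentable ping of the given payload size.
ping_probe() {
  ping -c 1 -W 2 -M do -s "$1" "${TARGET:-example.com}" >/dev/null 2>&1
}

# Example usage (needs network access):
#   best=$(find_max_payload ping_probe 1000 "$(payload_for_mtu 1500)")
#   echo "Path MTU is about $(( best + 28 )) bytes"
```

Taking the probe as a parameter keeps the search logic separate from the ping syntax, which differs between operating systems.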

SNMP Monitoring

SNMP (Simple Network Management Protocol) allows you to monitor and manage network devices (routers, switches, servers) from a central system.

flowchart TB
    subgraph SNMP_Arch["SNMP Architecture"]
        NMS["Network Management\nSystem (NMS)"]
        A1["Router\n(SNMP Agent)"]
        A2["Switch\n(SNMP Agent)"]
        A3["Server\n(SNMP Agent)"]
    end
    NMS <-->|"SNMP Queries\n& Traps"| A1
    NMS <-->|"SNMP Queries\n& Traps"| A2
    NMS <-->|"SNMP Queries\n& Traps"| A3
    style NMS fill:#3b82f6,color:#fff
    style A1 fill:#22c55e,color:#fff
    style A2 fill:#22c55e,color:#fff
    style A3 fill:#22c55e,color:#fff

| SNMP Version | Authentication | Encryption | Status |
| --- | --- | --- | --- |
| SNMPv1 | Community string (plaintext) | None | Legacy |
| SNMPv2c | Community string (plaintext) | None | Common |
| SNMPv3 | Username + auth protocol | DES/AES | Recommended |

| SNMP Operation | Direction | Purpose |
| --- | --- | --- |
| GET | NMS → Agent | Read a specific value (e.g., interface status) |
| SET | NMS → Agent | Change a configuration value |
| TRAP | Agent → NMS | Agent sends unsolicited alert (e.g., link down) |
| WALK | NMS → Agent | Read an entire subtree of values (tools issue this as repeated GETNEXT or GETBULK requests) |
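
As an illustration of a GET, the sketch below reads an interface's operational status with the Net-SNMP snmpget command and translates the numeric result. The function names, the host, and the community string are placeholders; 1.3.6.1.2.1.2.2.1.8 is the ifOperStatus column in IF-MIB, and the -Oqve output options (value only, enums as numbers) are assumed to be available in your Net-SNMP version.

```shell
# Map an IF-MIB ifOperStatus integer to its name.
oper_status_name() {
  case "$1" in
    1) echo up ;;
    2) echo down ;;
    3) echo testing ;;
    *) echo "other ($1)" ;;
  esac
}

# Hypothetical wrapper: SNMPv2c GET of ifOperStatus for one interface index.
# Usage: if_status HOST COMMUNITY IFINDEX
if_status() (
  host=$1 community=$2 ifindex=$3
  value=$(snmpget -v2c -c "$community" -Oqve "$host" "1.3.6.1.2.1.2.2.1.8.$ifindex")
  oper_status_name "$value"
)

# Example usage (needs an SNMP agent; 192.0.2.1 and "public" are placeholders):
#   if_status 192.0.2.1 public 1
```

SNMPv2c is used here only to keep the example short; the version table above explains why SNMPv3 is the better choice on a real network.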

Common monitoring tools: Nagios, Zabbix, Prometheus + Grafana, PRTG, Datadog.


Cloud Networking

Modern infrastructure increasingly runs in the cloud. Cloud providers offer software-defined networking that mirrors physical networking concepts.

VPC (Virtual Private Cloud)

A VPC is an isolated virtual network within a cloud provider. You define the IP address range, subnets, routing, and access control.

flowchart TB
    subgraph VPC["VPC: 10.0.0.0/16"]
        subgraph Public["Public Subnet\n10.0.1.0/24"]
            WEB["Web Server"]
            NAT["NAT Gateway"]
        end
        subgraph Private["Private Subnet\n10.0.2.0/24"]
            APP["App Server"]
            DB["Database"]
        end
    end
    IGW["Internet Gateway"] --> WEB
    WEB --> APP
    APP --> DB
    APP -->|"Outbound via"| NAT
    NAT --> IGW
    style Public fill:#22c55e,color:#fff
    style Private fill:#3b82f6,color:#fff
    style VPC fill:#8b5cf6,color:#fff
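
Deciding which subnet an address belongs to is the same CIDR arithmetic you practiced on Day 3. Here is a minimal sketch using the ranges from the diagram; ip_to_int and in_cidr are hypothetical helpers that handle IPv4 only and do not validate their input.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1                       # split the address on dots
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if the address falls inside the CIDR block.
# Usage: in_cidr ADDRESS NETWORK/PREFIX
in_cidr() (
  ip=$1 net=${2%/*} bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Example usage with the ranges from the diagram:
#   in_cidr 10.0.2.7 10.0.0.0/16 && echo "inside the VPC"
#   in_cidr 10.0.2.7 10.0.1.0/24 || echo "not in the public subnet"
```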

Security Groups vs NACLs

| Feature | Security Group | Network ACL (NACL) |
| --- | --- | --- |
| Level | Instance (ENI) | Subnet |
| State | Stateful (return traffic auto-allowed) | Stateless (must explicitly allow return) |
| Rules | Allow only | Allow and Deny |
| Evaluation | All rules evaluated | Rules evaluated in ascending rule-number order (first match wins) |
| Default | Deny all inbound, allow all outbound | Default NACL allows all inbound and outbound; a custom NACL denies all until rules are added |
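
The first-match behavior of NACLs is what most often surprises people, so a toy model may help. The sketch below is not a cloud API: nacl_decide is a hypothetical function, and the three-column rule format (rule number, action, port or * for any) is invented for illustration.

```shell
# Toy model of NACL evaluation: rules are checked in ascending rule-number
# order and the first match wins; if nothing matches, the traffic is denied.
# Rules arrive on stdin as: NUMBER ACTION PORT   (PORT may be * for any)
nacl_decide() {
  sort -n | awk -v port="$1" '
    $3 == port || $3 == "*" { print $2; matched = 1; exit }
    END { if (!matched) print "deny" }
  '
}

# Example usage:
#   printf '%s\n' '100 deny 22' '200 allow *' | nacl_decide 22    # prints: deny
#   printf '%s\n' '100 deny 22' '200 allow *' | nacl_decide 443   # prints: allow
```

Rule 100 blocks SSH even though rule 200 allows everything, because the lower-numbered rule matches first. A security group has no such ordering: traffic is allowed if any rule permits it.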

Cloud Networking Concepts

| Concept | Description | Physical Equivalent |
| --- | --- | --- |
| VPC | Isolated virtual network | Private data center network |
| Subnet | IP range within a VPC | VLAN / physical subnet |
| Internet Gateway | VPC connection to the internet | Border router |
| NAT Gateway | Allows private instances to reach the internet | NAT router |
| Route Table | Determines where traffic is sent | Router routing table |
| Security Group | Instance-level firewall | Host firewall |
| NACL | Subnet-level firewall | Network firewall |
| VPC Peering | Connect two VPCs | WAN link between sites |
| Load Balancer | Distributes traffic across instances | Hardware load balancer |

Troubleshooting Flowchart

Here is a systematic approach to diagnosing any network issue.

flowchart TB
    START["Problem reported"] --> L1{"L1: Physical\nCable/Wi-Fi connected?"}
    L1 -->|No| FIX1["Fix physical connection"]
    L1 -->|Yes| L2{"L2/L3: Link and IP\nLink up, IP assigned?\nip link / ip addr"}
    L2 -->|No| FIX2["Check DHCP\ndhclient / renew"]
    L2 -->|Yes| L3{"L3: Network\nCan ping gateway?\nping 192.168.1.1"}
    L3 -->|No| FIX3["Check routing\nip route\nCheck firewall"]
    L3 -->|Yes| DNS{"DNS\nCan resolve?\ndig example.com"}
    DNS -->|No| FIX4["Check /etc/resolv.conf\nTry dig @8.8.8.8"]
    DNS -->|Yes| L4{"L4: Transport\nPort reachable?\ncurl -v / telnet"}
    L4 -->|No| FIX5["Check firewall rules\nss -tlnp on server"]
    L4 -->|Yes| L7{"L7: Application\nCorrect response?\nCheck logs"}
    L7 -->|No| FIX6["Check application logs\nRestart service"]
    L7 -->|Yes| DONE["Issue resolved"]
    style START fill:#ef4444,color:#fff
    style DONE fill:#22c55e,color:#fff
    style L1 fill:#f59e0b,color:#fff
    style L2 fill:#f59e0b,color:#fff
    style L3 fill:#3b82f6,color:#fff
    style DNS fill:#3b82f6,color:#fff
    style L4 fill:#8b5cf6,color:#fff
    style L7 fill:#22c55e,color:#fff
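
The same bottom-up sequence can be scripted so that it stops at the first layer that fails. The following is a minimal sketch: first_failing_step is a hypothetical helper, each step is written as "name: command", and the gateway address and hostnames in the usage example are placeholders taken from the flowchart.

```shell
# Run "name: command" steps in order and print the name of the first step
# whose command fails, or "none" if every step passes.
first_failing_step() {
  for step in "$@"; do
    name=${step%%:*}      # text before the first colon
    cmd=${step#*:}        # everything after the first colon
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "$name"
      return 1
    fi
  done
  echo "none"
}

# Example usage (needs network access; adjust addresses to your network):
#   first_failing_step \
#     "L1/L2 link: ip link show | grep -q 'state UP'" \
#     "L3 gateway: ping -c 1 -W 2 192.168.1.1" \
#     "DNS: dig +short example.com | grep -q ." \
#     "L4 port: nc -z -w 3 example.com 443" \
#     "L7 HTTP: curl -fsS -o /dev/null https://example.com"
```

The name that comes back tells you which branch of the flowchart to follow for the fix.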

Summary

| Concept | Description |
| --- | --- |
| Layer-by-layer troubleshooting | Systematic bottom-up approach through the OSI model |
| ping | ICMP connectivity test; measures latency and packet loss |
| traceroute | Reveals the network path to a destination hop by hop |
| dig | DNS query tool for resolution troubleshooting |
| ss / netstat | Shows active connections and listening ports |
| curl | HTTP/HTTPS testing with detailed timing and headers |
| tcpdump | Command-line packet capture and analysis |
| Wireshark | Graphical packet analyzer for deep protocol inspection |
| SNMP | Protocol for monitoring and managing network devices |
| VPC | Isolated virtual network in the cloud |
| Security Groups | Stateful instance-level firewall in the cloud |
| NACL | Stateless subnet-level firewall in the cloud |

Key Takeaways

  1. Always troubleshoot bottom-up: physical first, then data link, network, transport, application
  2. Master the core tools: ping, traceroute, dig, ss, curl, and tcpdump cover most situations
  3. The curl -w timing breakdown shows which phase of a request (DNS, TCP, TLS, or the response itself) accounts for a delay
  4. Cloud networking maps directly to physical networking concepts; learn one and you understand the other
  5. Monitoring (SNMP, Prometheus) catches problems before users report them

Practice Problems

Beginner

A user reports they cannot access https://app.example.com. Using the layer-by-layer approach, write the exact commands you would run at each step, what output you would expect, and what each result tells you.

Intermediate

Use curl -w to measure the timing breakdown for three different websites. Compare DNS lookup time, TCP connection time, TLS handshake time, and total time. Based on the results, identify which phase is the bottleneck for each site and suggest optimizations (e.g., DNS caching, CDN, HTTP/2).

Advanced

You are responsible for a cloud-based web application deployed in AWS. Design the complete VPC architecture including: CIDR blocks, public and private subnets across two availability zones, internet gateway, NAT gateway, route tables, security groups for web servers (allow 80/443), app servers (allow from web SG only), and database (allow from app SG only). Write the security group rules and explain the traffic flow for: (1) a user accessing the website, (2) the app server calling an external API, and (3) an admin SSH-ing into the app server via a bastion host.


Congratulations!

You have completed Learn Networking in 10 Days! Over the past 10 days, you have built a comprehensive understanding of computer networking:

| Day | Topic | Key Skills |
| --- | --- | --- |
| Day 1 | Networking Fundamentals | OSI model, LAN vs WAN, network devices |
| Day 2 | The TCP/IP Model | TCP vs UDP, 3-way handshake, ports |
| Day 3 | IP Addressing & Subnetting | IPv4, IPv6, CIDR, subnet calculations |
| Day 4 | Switching & VLANs | MAC addresses, switches, VLAN segmentation |
| Day 5 | Routing & NAT | Static/dynamic routing, OSPF, BGP, NAT/PAT |
| Day 6 | DNS | Record types, resolution flow, DNSSEC |
| Day 7 | HTTP & the Web | HTTP methods, status codes, HTTP/2, HTTP/3 |
| Day 8 | TLS/SSL & Security | Encryption, TLS handshake, certificates, attacks |
| Day 9 | Wireless & VPN | Wi-Fi standards, WPA3, IPsec, WireGuard |
| Day 10 | Troubleshooting & Tools | ping, traceroute, tcpdump, cloud networking |

You now have the foundation to:

  • Design networks for applications and organizations
  • Secure communication with TLS, firewalls, and VPNs
  • Troubleshoot any network issue systematically
  • Understand cloud infrastructure as an extension of physical networking

Networking knowledge is foundational to every area of software engineering and infrastructure. Whether you are debugging a slow API call, configuring a cloud deployment, or building a distributed system, the concepts from this book will serve you every day.

Keep learning, keep experimenting, and keep building!