Learn Splunk in 10 Days

Day 5: Statistics and Aggregation

What You Will Learn Today

  • The stats command
  • The chart and timechart commands
  • The top and rare commands
  • Statistical functions (count, avg, sum, max, min, dc)
  • eventstats and streamstats

The stats Command

The most fundamental command for aggregating data.

# Count by status code
index=main sourcetype=access_combined
| stats count by status

# Average response time per host
index=main sourcetype=access_combined
| stats avg(response_time) AS avg_response by host

# Multiple aggregations at once
index=main sourcetype=access_combined
| stats count, avg(response_time) AS avg_rt, max(response_time) AS max_rt, min(response_time) AS min_rt by host
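
As a mental model, the grouping that stats performs can be sketched in plain Python. The events below are hypothetical stand-ins for access_combined results:

```python
from collections import defaultdict

# Hypothetical events standing in for access_combined results
events = [
    {"host": "web-01", "response_time": 0.25},
    {"host": "web-01", "response_time": 0.75},
    {"host": "web-02", "response_time": 1.0},
]

# Bucket response times by host, like the `by host` clause
groups = defaultdict(list)
for e in events:
    groups[e["host"]].append(e["response_time"])

# One output row per group, mirroring
# `stats count, avg(...) AS avg_rt, max(...) AS max_rt, min(...) AS min_rt by host`
rows = {
    host: {
        "count": len(rts),
        "avg_rt": sum(rts) / len(rts),
        "max_rt": max(rts),
        "min_rt": min(rts),
    }
    for host, rts in groups.items()
}
print(rows["web-01"])  # → {'count': 2, 'avg_rt': 0.5, 'max_rt': 0.75, 'min_rt': 0.25}
```

Note that, like stats, this produces one row per group and discards the original events.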

Key Statistical Functions

Function | Description | Example
--- | --- | ---
count | Number of events | count
count(field) | Count where field exists | count(user)
dc(field) | Distinct count | dc(clientip)
avg(field) | Average | avg(response_time)
sum(field) | Sum | sum(bytes)
max(field) | Maximum | max(response_time)
min(field) | Minimum | min(response_time)
median(field) | Median | median(response_time)
mode(field) | Most frequent value | mode(status)
stdev(field) | Standard deviation | stdev(response_time)
perc95(field) | 95th percentile | perc95(response_time)
values(field) | List of unique values | values(user)
list(field) | List of values (with duplicates) | list(status)
latest(field) | Most recent value | latest(status)
earliest(field) | Oldest value | earliest(status)
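
A few of these functions are worth pinning down precisely: count(field) and dc(field) skip events where the field is missing. The Python sketch below uses hypothetical field values and a simple nearest-rank rule for the 95th percentile (Splunk's percentile interpolation may differ slightly):

```python
import math

# Hypothetical field values; None marks an event where `user` is missing
data = [0.1, 0.2, 0.2, 0.3, 0.9]                 # response_time values
users = ["alice", "bob", "alice", None, "carol"]  # user values

count_user = sum(1 for u in users if u is not None)  # count(user): only events where the field exists
dc_user = len({u for u in users if u is not None})   # dc(user): distinct count
vals = sorted({u for u in users if u is not None})   # values(user): unique values

# perc95: nearest-rank 95th percentile (Splunk's interpolation may differ slightly)
ranked = sorted(data)
p95 = ranked[math.ceil(0.95 * len(ranked)) - 1]

print(count_user, dc_user, vals, p95)  # → 4 3 ['alice', 'bob', 'carol'] 0.9
```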

Grouping by Multiple Fields

# Group by host and status
index=main sourcetype=access_combined
| stats count by host, status
| sort host, -count

The chart Command

Creates a two-dimensional cross-tabulation.

# Count by status and host
index=main sourcetype=access_combined
| chart count by status, host

Example output:

status | web-01 | web-02 | web-03
--- | --- | --- | ---
200 | 1500 | 1200 | 1800
404 | 50 | 30 | 45
500 | 10 | 15 | 5

# Using over and by
index=main sourcetype=access_combined
| chart count over status by host

# Top 5 hosts only
index=main sourcetype=access_combined
| chart count by host
| sort -count
| head 5

The difference in output shape: stats count by A yields one row per value of A (columns A and count), while chart count by A, B pivots the second field into columns -- one row per value of A, one column per value of B, with a count in each cell.
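
The pivot that chart performs can be approximated with a Counter over (status, host) pairs. The pairs below are hypothetical:

```python
from collections import Counter

# Hypothetical (status, host) pairs standing in for web access events
pairs = [("200", "web-01"), ("200", "web-02"), ("404", "web-01"), ("200", "web-01")]

counts = Counter(pairs)  # count per (status, host) cell
statuses = sorted({s for s, _ in pairs})
hosts = sorted({h for _, h in pairs})

# One row per status, one column per host -- the shape
# `chart count over status by host` produces; missing cells default to 0
table = {s: {h: counts.get((s, h), 0) for h in hosts} for s in statuses}
print(table)  # → {'200': {'web-01': 2, 'web-02': 1}, '404': {'web-01': 1, 'web-02': 0}}
```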

The timechart Command

Aggregates data into time-series buckets for charting.

# Event count per hour
index=main sourcetype=access_combined
| timechart span=1h count

# Time-series count by status
index=main sourcetype=access_combined
| timechart span=1h count by status

# Average response time over time
index=main sourcetype=access_combined
| timechart span=15m avg(response_time) AS avg_response

span Options

span | Interval
--- | ---
span=1m | 1 minute
span=5m | 5 minutes
span=15m | 15 minutes
span=1h | 1 hour
span=1d | 1 day

Key point: timechart always uses _time as the X-axis. That fixed time axis is the main difference from chart, which can pivot over any field.
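Under the hood, span bucketing just snaps each timestamp down to the start of its interval. A minimal Python sketch of that floor operation (the timestamps are hypothetical epoch seconds):

```python
# Snap epoch timestamps to span boundaries, the way `timechart span=1h` buckets _time
span = 3600  # 1 hour in seconds

def bucket(t, span):
    # Floor the timestamp to the start of its span interval
    return t - (t % span)

times = [1700000000, 1700000100, 1700003700]  # hypothetical event times
buckets = [bucket(t, span) for t in times]
print(buckets)  # → [1699999200, 1699999200, 1700002800]
```

The first two events fall into the same hourly bucket; the third lands in the next one.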

chart vs timechart

Comparison | chart | timechart
--- | --- | ---
X-axis | Any field | _time (fixed)
Time bucketing | Manual (e.g. bin _time) | Automatic (span)
Use case | Category analysis | Time-series analysis

The top and rare Commands

top

Displays the most frequent values.

# Top 10 URIs
index=main sourcetype=access_combined
| top limit=10 uri

# Top 5 URIs per host
index=main sourcetype=access_combined
| top limit=5 uri by host

# Without percentage
index=main sourcetype=access_combined
| top limit=10 uri showperc=false

Output fields from top:

Field | Description
--- | ---
count | Number of occurrences
percent | Percentage of total
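
top's ranking is essentially a frequency count plus a percentage of the total. A Python sketch over hypothetical uri values:

```python
from collections import Counter

# Hypothetical uri field values
uris = ["/home", "/home", "/login", "/home", "/api", "/login"]

total = len(uris)
# Counter.most_common gives the same ranking `top limit=3 uri` would;
# percent = count / total * 100, rounded as Splunk displays it
top3 = [
    {"uri": uri, "count": c, "percent": round(c / total * 100, 6)}
    for uri, c in Counter(uris).most_common(3)
]
print(top3)
```

rare is the same idea with the ranking reversed (least frequent first).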

rare

Displays the least frequent values (the inverse of top).

# Least common status codes
index=main sourcetype=access_combined
| rare limit=5 status

The eventstats Command

Works like stats, but appends the aggregated values to each original event instead of replacing them.

# Add the overall average to each event
index=main sourcetype=access_combined
| eventstats avg(response_time) AS overall_avg
| eval is_slow = if(response_time > overall_avg * 2, "Yes", "No")
| where is_slow="Yes"
| table _time, uri, response_time, overall_avg

Where stats replaces the events with one aggregated row per group, eventstats keeps every original event and adds the aggregate (for example, a shared total) as a new field on each one.

Command | Original Events | Aggregated Results
--- | --- | ---
stats | Removed | One row per group
eventstats | Preserved | Appended to each event
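
In Python terms, eventstats computes the aggregate once, then writes it back onto every event so each event can be compared against it. The data below is hypothetical and mirrors the is_slow example above:

```python
# Hypothetical events with a response_time field
events = [
    {"uri": "/home", "response_time": 0.25},
    {"uri": "/search", "response_time": 2.5},
    {"uri": "/about", "response_time": 0.25},
]

# eventstats: compute the aggregate once...
overall_avg = sum(e["response_time"] for e in events) / len(events)

# ...then append it to every original event (events are preserved, not replaced)
for e in events:
    e["overall_avg"] = overall_avg
    e["is_slow"] = "Yes" if e["response_time"] > overall_avg * 2 else "No"

slow = [e for e in events if e["is_slow"] == "Yes"]
print(overall_avg, [e["uri"] for e in slow])  # → 1.0 ['/search']
```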

The streamstats Command

Processes events sequentially and computes running (cumulative) statistics.

# Running count
index=main sourcetype=access_combined
| streamstats count AS running_count
| table _time, uri, running_count

# Moving average over the last 5 events
index=main sourcetype=access_combined
| streamstats avg(response_time) AS moving_avg window=5
| table _time, response_time, moving_avg

# Cumulative error count per host
index=main sourcetype=access_combined status>=400
| streamstats count AS error_count by host
| table _time, host, status, error_count

Parameter | Description
--- | ---
window=N | Compute over the last N events
time_window=span | Compute over a time window
current=false | Exclude the current event from the calculation
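
streamstats semantics can be sketched with a running counter and a fixed-size window; a deque with maxlen behaves like window=5. The values below are hypothetical response times in milliseconds:

```python
from collections import deque

# Hypothetical stream of response times (ms), in time order
stream = [200, 400, 600, 800, 1000, 1200]

window = deque(maxlen=5)  # like `streamstats ... window=5`
rows = []
for i, rt in enumerate(stream, start=1):
    window.append(rt)  # oldest value drops out once the window is full
    rows.append({
        "running_count": i,                       # streamstats count
        "moving_avg": sum(window) / len(window),  # streamstats avg(...) window=5
    })
print(rows[-1])  # → {'running_count': 6, 'moving_avg': 800.0}
```

Early rows average over fewer than 5 values, matching how streamstats warms up before the window fills.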

Hands-On: Performance Analysis Queries

# 1. Overall KPIs
index=main sourcetype=access_combined
| stats
    count AS total_requests,
    dc(clientip) AS unique_visitors,
    avg(response_time) AS avg_response_time,
    perc95(response_time) AS p95_response_time,
    sum(eval(if(status>=400,1,0))) AS error_count
| eval error_rate = round(error_count / total_requests * 100, 2)
| eval avg_response_time = round(avg_response_time, 3)
| eval p95_response_time = round(p95_response_time, 3)

# 2. Hourly traffic
index=main sourcetype=access_combined
| timechart span=1h count AS requests, avg(response_time) AS avg_rt

# 3. Per-endpoint performance
index=main sourcetype=access_combined
| stats count, avg(response_time) AS avg_rt, perc95(response_time) AS p95_rt by uri
| sort -count
| head 20
| eval avg_rt = round(avg_rt, 3)
| eval p95_rt = round(p95_rt, 3)

# 4. Error rate over time
index=main sourcetype=access_combined
| timechart span=15m count(eval(status>=400)) AS errors, count AS total
| eval error_rate = round(errors / total * 100, 2)
| fields _time, error_rate

# 5. Response time distribution (numeric prefixes keep the buckets in order)
index=main sourcetype=access_combined
| eval rt_bucket = case(
    response_time < 0.1, "1: < 100ms",
    response_time < 0.5, "2: 100-500ms",
    response_time < 1.0, "3: 500ms-1s",
    response_time < 5.0, "4: 1-5s",
    1=1, "5: > 5s"
)
| stats count by rt_bucket
| sort rt_bucket
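
SPL's case() evaluates its condition/value pairs top to bottom and returns the value for the first condition that matches, with 1=1 acting as the catch-all. A hypothetical Python equivalent of that first-match chain (labels abbreviated):

```python
# A case()-style chain: the FIRST matching condition wins
def rt_bucket(response_time):
    if response_time < 0.1:
        return "< 100ms"
    if response_time < 0.5:
        return "100-500ms"
    if response_time < 1.0:
        return "500ms-1s"
    if response_time < 5.0:
        return "1-5s"
    return "> 5s"  # catch-all, like 1=1 in case()

samples = [0.05, 0.3, 0.7, 2.0, 9.0]  # hypothetical response times
print([rt_bucket(s) for s in samples])
# → ['< 100ms', '100-500ms', '500ms-1s', '1-5s', '> 5s']
```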

Summary

Concept | Description
--- | ---
stats | Group-by aggregation
chart | Two-dimensional cross-tabulation
timechart | Time-series aggregation
top / rare | Most / least frequent values
eventstats | Append aggregated values to original events
streamstats | Running and moving statistics

Key Takeaways

  1. stats is the most fundamental aggregation command
  2. timechart is essential for time-series analysis
  3. Use eventstats when you need to compare individual events against aggregate values
  4. Use streamstats for moving averages and cumulative totals

Exercises

Exercise 1: Basic

Use stats to display the event count, number of unique hosts, and the latest timestamp for each sourcetype.

Exercise 2: Applied

Use timechart to aggregate request counts by status code in 15-minute intervals, suitable for a time-series chart.

Challenge

Combine eventstats and where to find requests that took more than twice the average response time for their host, then count them by URI.


Next up: In Day 6, you will learn about data visualization -- creating dashboards and panels to present your data visually.