SPL Introduction: From Basics to Practical Splunk Searches

Shunku

Splunk is one of the most widely used log analysis platforms in the world. At its core is SPL (Search Processing Language). Your ability to leverage Splunk effectively depends heavily on how well you understand SPL.

This article covers SPL fundamentals through practical queries with concrete examples.

What is SPL?

SPL (Search Processing Language) is the language used to search and analyze data indexed by Splunk. It combines concepts from Unix pipelines and SQL, allowing you to intuitively filter, transform, and aggregate data.
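
For example, the following query plays the role of a SQL GROUP BY, with the pipe passing results from one stage to the next (the index and field names are illustrative):

# SQL: SELECT status, COUNT(*) FROM web_logs GROUP BY status
index=web_logs | stats count by status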

flowchart LR
    subgraph Pipeline["SPL Pipeline"]
        A["Search"] --> B["Filter"]
        B --> C["Transform"]
        C --> D["Aggregate"]
        D --> E["Display"]
    end
    style Pipeline fill:#3b82f6,color:#fff

Key characteristics of SPL:

  • Pipeline processing: Chain commands with | to process data sequentially
  • Time-series optimized: Fast timestamp-based searching
  • 140+ commands: Comprehensive coverage for statistics, transformation, and visualization

Basic Syntax

Search Structure

SPL queries start with search terms and chain commands using pipes (|).

index=web_logs status=500 | stats count by uri | sort -count | head 10

This query:

  1. Searches the web_logs index for events where status=500
  2. Counts events grouped by uri
  3. Sorts by count in descending order
  4. Returns the top 10 results

Search Terms

The first part of your search narrows down the target data.

# Index specification
index=main

# Keyword search
error OR failed

# Field value matching
status=404
host="web-server-01"

# Wildcards
source="/var/log/*.log"

# Negation
NOT status=200

# Time range (relative)
earliest=-24h latest=now

Important: index=* searches all indexes and is extremely slow. Always specify a concrete index.

Time Range Specification

Splunk is optimized for time-series data, so time range specification significantly impacts search performance.

# Relative time
earliest=-1h          # From 1 hour ago
earliest=-7d@d        # From midnight 7 days ago
earliest=@d           # From midnight today

# Absolute time (default format is %m/%d/%Y:%H:%M:%S)
earliest="02/01/2026:00:00:00"
latest="02/05/2026:23:59:59"

# Snap operator (@)
earliest=-1d@d        # Yesterday at midnight
earliest=-1w@w        # Start of last week
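
Combining a snapped earliest with a snapped latest selects an exact calendar period. For example:

# All of yesterday (midnight yesterday through midnight today)
earliest=-1d@d latest=@d
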

Five Essential Commands

1. stats - Statistical Aggregation

stats is the most important command in SPL. It groups data and calculates statistics.

# Basic count
index=web_logs | stats count

# Group by field
index=web_logs | stats count by status

# Multiple statistical functions
index=web_logs
| stats count, avg(response_time) as avg_time, max(response_time) as max_time by uri

# Common statistical functions
# count    - Number of events
# sum      - Total
# avg      - Average
# min/max  - Minimum/Maximum
# dc       - Distinct count
# values   - List of unique values
# latest   - Most recent value
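
Two of the less obvious functions in action (user_id is an assumed field name):

# Distinct users and the set of observed status codes per host
index=web_logs
| stats dc(user_id) as unique_users, values(status) as status_codes by host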

2. eval - Field Calculation

eval creates new fields or transforms existing ones.

# Create new field
index=web_logs
| eval response_sec = response_time / 1000

# Conditional logic
index=web_logs
| eval status_category = case(
    status < 300, "success",
    status < 400, "redirect",
    status < 500, "client_error",
    true(), "server_error"
)

# String manipulation
index=web_logs
| eval domain = lower(host)
| eval short_uri = substr(uri, 1, 50)

# DateTime operations
index=web_logs
| eval hour = strftime(_time, "%H")
| eval day_of_week = strftime(_time, "%A")

Useful eval functions:

| Function | Description | Example |
| --- | --- | --- |
| if(cond, true, false) | Conditional | if(status=200, "OK", "Error") |
| case(cond1, val1, ...) | Multiple conditions | See above |
| coalesce(a, b, ...) | First non-null value | coalesce(user, "anonymous") |
| len(str) | String length | len(message) |
| replace(str, regex, new) | Replace | replace(uri, "\d+", "N") |
| mvcount(field) | Multivalue field count | mvcount(tags) |
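
For instance, coalesce() and replace() are handy for normalizing fields before aggregation (user is an assumed field name):

index=web_logs
| eval user = coalesce(user, "anonymous")
| eval uri_pattern = replace(uri, "\d+", "N")
| stats count by user, uri_pattern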

3. timechart - Time Series Charts

timechart aggregates events along the time axis and outputs a table ready for graphing.

# Events over time
index=web_logs | timechart count

# Hourly count by status code
index=web_logs | timechart span=1h count by status

# Average response time (5-minute intervals)
index=web_logs | timechart span=5m avg(response_time) as avg_response

# Multiple metrics
index=web_logs
| timechart span=1h count as requests, avg(response_time) as avg_time

flowchart TB
    subgraph timechart["How timechart Works"]
        A["Raw Logs"] --> B["Split into Time Buckets"]
        B --> C["Aggregate per Bucket"]
        C --> D["Output Time Series Table"]
    end
    style timechart fill:#8b5cf6,color:#fff

4. table / fields - Output Field Control

table displays only the specified fields in table format; fields keeps or removes (-) fields without reformatting the output.

# Display specific fields only
index=web_logs
| table _time, host, uri, status, response_time

# Remove unwanted fields with fields
index=web_logs
| fields - _raw, _cd, _indextime

# Rename fields
index=web_logs
| rename response_time as "Response Time (ms)", status as "HTTP Status"
| table _time, host, uri, "HTTP Status", "Response Time (ms)"

5. where / search - Filtering

Filter data mid-pipeline.

# where evaluates expressions (works with calculated fields)
index=web_logs
| eval response_sec = response_time / 1000
| where response_sec > 5

# search uses keyword matching
index=web_logs
| stats count by uri, status
| search status=500

# where comparison operators
| where response_time > 1000
| where status >= 400 AND status < 500
| where like(uri, "/api/%")
| where match(user_agent, "(?i)bot")

Practical Examples

Example 1: Error Analysis Dashboard

Query set for analyzing web server errors.

# HTTP status code distribution
index=web_logs earliest=-24h
| eval status_group = case(
    status < 300, "2xx Success",
    status < 400, "3xx Redirect",
    status < 500, "4xx Client Error",
    true(), "5xx Server Error"
)
| stats count by status_group
| sort status_group

# Top 10 endpoints with most errors
index=web_logs status>=400 earliest=-24h
| stats count as errors by uri
| sort -errors
| head 10

# Hourly error rate (count(eval(...)) counts only events where the condition holds)
index=web_logs earliest=-24h
| timechart span=1h
    count(eval(status>=400)) as errors,
    count as total
| eval error_rate = round(errors / total * 100, 2)
| fields _time, errors, total, error_rate

Example 2: Performance Analysis

Response time analysis.

# Percentile analysis
index=web_logs earliest=-1h
| stats
    avg(response_time) as avg,
    median(response_time) as p50,
    perc95(response_time) as p95,
    perc99(response_time) as p99,
    max(response_time) as max
| eval avg = round(avg, 2)

# Identify slow requests
index=web_logs earliest=-1h
| where response_time > 3000
| table _time, host, uri, response_time, status
| sort -response_time

Example 3: User Behavior Analysis

# Daily active users
index=web_logs earliest=-7d
| timechart span=1d dc(user_id) as unique_users

# Per-user session analysis
index=web_logs user_id=* earliest=-24h
| stats
    count as page_views,
    dc(uri) as unique_pages,
    min(_time) as first_access,
    max(_time) as last_access
    by user_id
| eval session_duration = last_access - first_access
| eval session_minutes = round(session_duration / 60, 1)
| table user_id, page_views, unique_pages, session_minutes
| sort -page_views

Performance Best Practices

Key points for optimizing SPL queries.

1. Narrow the Time Range

# Bad: Searching all time
index=web_logs status=500

# Good: Specify time range
index=web_logs status=500 earliest=-24h latest=now

2. Filter as Early as Possible

# Bad: Filter after aggregation
index=web_logs | stats count by status | search status=500

# Good: Filter at search time
index=web_logs status=500 | stats count

3. Remove Unnecessary Fields with fields

# For logs with many fields
index=web_logs earliest=-1h
| fields _time, host, uri, status, response_time
| stats avg(response_time) by host

4. stats is Faster than transaction

# Slow: transaction
index=web_logs | transaction session_id | stats count

# Fast: stats (first stats groups events by session, second stats counts the sessions)
index=web_logs | stats count, values(uri) as pages by session_id | stats count

Summary

| Command | Purpose | Example |
| --- | --- | --- |
| stats | Statistical aggregation | stats count, avg(field) by group |
| eval | Field calculation | eval new_field = field1 + field2 |
| timechart | Time series aggregation | timechart span=1h count by status |
| table | Field selection | table _time, host, status |
| where | Conditional filter | where response_time > 1000 |
| sort | Sort | sort -count (descending) |
| head/tail | Limit results | head 10 |
| rename | Rename fields | rename field as "New Name" |
| dedup | Remove duplicates | dedup host, uri |

SPL is deep with over 140 commands, but mastering the five commands covered in this article (stats, eval, timechart, table, where) will handle most use cases.
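
As a closing sketch, here is a single query chaining four of them (the web_logs index and its fields are the same assumed examples used throughout):

# Error-prone endpoints over the last 24 hours, slowest first
index=web_logs earliest=-24h
| eval response_sec = response_time / 1000
| where status >= 400
| stats count as errors, avg(response_sec) as avg_sec by uri
| eval avg_sec = round(avg_sec, 2)
| table uri, errors, avg_sec
| sort -errors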
