
Day 7: Advanced Search Techniques

What You Will Learn Today

  • Subsearches
  • The lookup command
  • The join command
  • The transaction command
  • append and appendpipe

Subsearches

A subsearch lets you embed one search inside another. The inner search runs first, and its results feed into the outer search.

# Show logs from the host with the most errors
index=main
  [search index=main status>=500
   | stats count by host
   | sort -count
   | head 1
   | fields host]

[Diagram: the subsearch runs first and returns its result (e.g., host=web-01) to the main search (index=main), which then outputs the events for that host.]

How Subsearches Work

  1. The search inside [...] runs first
  2. Its results are expanded as OR conditions into the main search
  3. Example: [search ... | fields host] becomes (host="web-01"); if the subsearch returns multiple results, they are OR'd together, as the sketch below shows
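
You can preview this expansion yourself with the format command, which turns a result set into a literal search string (the host names here are illustrative):

# Run the subsearch on its own and see what it expands to
index=main status>=500
| stats count by host
| sort -count
| head 3
| fields host
| format
# Produces a single "search" field, e.g.:
# ( ( host="web-01" ) OR ( host="web-02" ) OR ( host="web-03" ) )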

Practical Examples

# Show all access from IPs that generated many errors in the past hour
index=main sourcetype=access_combined
  [search index=main sourcetype=access_combined status>=400 earliest=-1h
   | stats count by clientip
   | where count > 10
   | fields clientip]
| table _time, clientip, uri, status

# Find activity within a specific user's session
index=main sourcetype=app_log
  [search index=main sourcetype=auth_log action=login user=alice
   | head 1
   | fields session_id]

Note: Subsearches have a time limit (60 seconds by default) and a result limit (10,000 results by default). When either limit is hit, the subsearch silently truncates its results, so the outer search may return incomplete data. For large datasets, consider restructuring around stats or lookup instead.
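
On deployments you administer, these limits are configured in limits.conf under the [subsearch] stanza; the defaults look like this (raise them with care, since large subsearches slow the outer search):

# limits.conf (defaults shown)
[subsearch]
maxout = 10000    # maximum number of results a subsearch may return
maxtime = 60      # maximum subsearch runtime in seconds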


The lookup Command

Lookups enrich events with data from external CSV files or KV stores.

Creating a CSV Lookup

  1. Prepare a CSV file:
status_code,status_description,severity
200,OK,info
301,Moved Permanently,info
400,Bad Request,warning
401,Unauthorized,warning
403,Forbidden,warning
404,Not Found,warning
500,Internal Server Error,critical
502,Bad Gateway,critical
503,Service Unavailable,critical
  2. Upload the file to Splunk: Settings > Lookups > Lookup table files > Add new

  3. Create a lookup definition: Settings > Lookups > Lookup definitions > Add new
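
To verify the definition works, read it back with inputlookup (this assumes you named the definition http_status, matching the examples below):

# Confirm the lookup definition returns rows
| inputlookup http_status
| head 5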

Using the lookup Command

# Add status descriptions to access logs
index=main sourcetype=access_combined
| lookup http_status status_code AS status OUTPUT status_description, severity
| table _time, uri, status, status_description, severity

Automatic Lookups

Configure transforms.conf and props.conf to apply lookups automatically at search time.

# transforms.conf
[http_status_lookup]
filename = http_status.csv
# props.conf
[access_combined]
LOOKUP-http_status = http_status_lookup status_code AS status OUTPUT status_description severity
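
Once the automatic lookup is in place, its output fields are available at search time like any other field, so you can filter on them directly:

# severity is supplied by the automatic lookup, not by the raw event
index=main sourcetype=access_combined severity=critical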

inputlookup / outputlookup

# Display the contents of a lookup table
| inputlookup http_status.csv

# Save search results to a lookup
index=main sourcetype=access_combined
| stats count by clientip
| outputlookup ip_activity.csv
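
outputlookup overwrites the file by default; add append=true to accumulate rows instead, for example from a scheduled hourly search:

# Append to the existing lookup instead of replacing it
index=main sourcetype=access_combined earliest=-1h
| stats count by clientip
| outputlookup append=true ip_activity.csv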

The join Command

The join command merges two result sets on a shared field, similar to a SQL join.

# Join auth logs with access logs
index=main sourcetype=access_combined
| join type=left clientip
  [search index=main sourcetype=auth_log
   | stats latest(user) AS user, latest(action) AS last_action by src_ip
   | rename src_ip AS clientip]
| table _time, clientip, user, uri, status

Join Types

Type | Description
---- | -----------
inner | (default) Keeps only events that have a match on both sides
left / outer | Keeps all main-search events and adds subsearch fields where they match; in Splunk, left and outer are synonyms (there is no full outer join)

Note: join is memory-intensive and does not scale well with large datasets. Prefer stats or lookup when possible.
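
One more caveat: by default, join uses only the first matching subsearch row per event (max=1). Setting max=0 joins against all matches; a sketch reusing the searches above:

# Join every matching subsearch row, not just the first
index=main sourcetype=access_combined
| join type=left max=0 clientip
  [search index=main sourcetype=auth_log
   | rename src_ip AS clientip
   | fields clientip, user]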

Alternatives to join

# Use stats instead of join
index=main (sourcetype=access_combined OR sourcetype=auth_log)
| stats values(uri) AS uris, values(user) AS users, latest(status) AS status by clientip
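
If you need inner-join behavior from this pattern, note that values(user) stays null for client IPs with no auth events, so you can simply filter those rows out:

# Keep only client IPs that appear in both sourcetypes
index=main (sourcetype=access_combined OR sourcetype=auth_log)
| stats values(uri) AS uris, values(user) AS users by clientip
| where isnotnull(users) AND isnotnull(uris)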

The transaction Command

The transaction command groups related events together. It is especially useful for session analysis and workflow tracking.

# Group events by client IP into sessions
index=main sourcetype=access_combined
| transaction clientip maxspan=30m maxpause=5m
| table clientip, duration, eventcount, _time

# Define session boundaries explicitly
index=main sourcetype=app_log
| transaction session_id startswith="login" endswith="logout"
| table session_id, user, duration, eventcount

Transaction Parameters

Parameter | Description | Example
--------- | ----------- | -------
maxspan | Maximum total duration of a transaction | maxspan=1h
maxpause | Maximum gap between consecutive events | maxpause=5m
startswith | Condition that marks the first event | startswith="login"
endswith | Condition that marks the last event | endswith="logout"
maxevents | Maximum number of events in a transaction | maxevents=100

Fields Added by transaction

Field | Description
----- | -----------
duration | Time in seconds between the first and last event in the transaction
eventcount | Number of events in the transaction
closed_txn | 1 if the transaction closed normally, 0 if it was evicted (see keepevicted below)

[Diagram: four events (10:00 login, 10:05 browse, 10:10 purchase, 10:15 logout) grouped by transaction session_id into a single result with duration=900s and eventcount=4.]
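
By default, transactions evicted for exceeding maxspan, maxpause, or maxevents are discarded. To inspect incomplete sessions, e.g. logins that never reached a logout, keep them with keepevicted and filter on closed_txn:

# Find sessions that started but never closed normally
index=main sourcetype=app_log
| transaction session_id startswith="login" endswith="logout" keepevicted=true
| search closed_txn=0
| table session_id, user, duration, eventcount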

Performance tip: transaction is a resource-heavy command. Where possible, use stats as a lighter alternative.

# Replicate transaction behavior with stats
index=main sourcetype=access_combined
| stats min(_time) AS start, max(_time) AS end, count AS eventcount, values(uri) AS pages by clientip
| eval duration = end - start

append and appendpipe

append

The append command adds the results of a second search to the end of the current result set.

# Regular results + a total row
index=main sourcetype=access_combined
| stats count by host
| append
  [search index=main sourcetype=access_combined
   | stats count
   | eval host="TOTAL"]
| sort -count

appendpipe

The appendpipe command runs a secondary aggregation on the current result set and appends it as additional rows.

# Per-host counts + a total row
index=main sourcetype=access_combined
| stats count by host
| appendpipe [stats sum(count) AS count | eval host="TOTAL"]
| sort -count
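
Because appendpipe operates on whatever is already in the pipeline, it can also produce per-group subtotals; a sketch adding a TOTAL row per host:

# Per-host, per-status counts + a subtotal row for each host
index=main sourcetype=access_combined
| stats count by host, status
| appendpipe
  [stats sum(count) AS count by host
   | eval status="TOTAL"]
| sort host, status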

Recommendation: For adding summary rows, appendpipe is simpler and more efficient than append, because it reuses the results already in the pipeline instead of launching a second search against the index.


Hands-On: Security Analysis Queries

# 1. Brute force detection
# Multiple login failures from the same IP in a short period
index=main sourcetype=auth_log action=failed
| stats count by src_ip
| where count > 5
| lookup geo_ip ip AS src_ip OUTPUT country, city
| sort -count

# 2. Unusual access patterns
# Sessions with an abnormally high page count
index=main sourcetype=access_combined
| transaction clientip maxspan=10m
| where eventcount > 50
| table clientip, eventcount, duration

# 3. Error chain analysis
# Inspect events surrounding a specific error
index=main sourcetype=app_log
| transaction host maxspan=5m maxpause=1m
| search "OutOfMemoryError"
| table _time, host, eventcount, _raw

# 4. User behavior tracking
# Track all activity from IPs that received multiple 403s
index=main sourcetype=access_combined
  [search index=main sourcetype=access_combined status=403
   | stats count by clientip
   | where count > 3
   | fields clientip]
| transaction clientip maxspan=1h
| table clientip, duration, eventcount, _raw

Summary

Concept | Description
------- | -----------
Subsearch | Embed a search inside another with [search ...]
lookup | Enrich events with data from external CSV files or KV stores
join | Merge two result sets on a shared field
transaction | Group related events into logical units
append | Add the results of a separate search
appendpipe | Add summary rows computed from the current result set

Key Takeaways

  1. Subsearches are subject to time and result count limits
  2. lookup is the best tool for enriching events with external data
  3. join is memory-intensive -- use it sparingly on large datasets
  4. transaction is powerful but heavy -- prefer stats when possible

Exercises

Exercise 1: Basic

Create an HTTP status code CSV lookup and use it to add descriptions to your access logs.

Exercise 2: Applied

Use a subsearch to display all logs from the host that had the most errors in the past hour.

Exercise 3: Challenge

Use transaction to analyze user sessions and calculate average session duration, average page views per session, and bounce rate (percentage of single-page sessions).


Coming up next: In Day 8, you will learn about alerts and reports -- how to automate monitoring with scheduled searches, trigger notifications, and build reusable search macros.