Learn Splunk in 10 Days

Day 2: Data Ingestion

What You Will Learn Today

  • Types of data inputs
  • Monitoring files and directories
  • Network inputs (TCP/UDP)
  • Configuring sourcetypes
  • The basics of inputs.conf

Types of Data Inputs

Splunk supports a wide variety of methods for ingesting data.

flowchart TB
    subgraph Inputs["Data Input Methods"]
        File["Files / Directories<br>monitor"]
        Network["Network<br>TCP / UDP"]
        Script["Scripted Input<br>scripted input"]
        HEC["HTTP Event Collector<br>HEC"]
        API["REST API<br>modular input"]
    end
    style File fill:#3b82f6,color:#fff
    style Network fill:#22c55e,color:#fff
    style Script fill:#f59e0b,color:#fff
    style HEC fill:#8b5cf6,color:#fff
    style API fill:#ef4444,color:#fff

Input Method     Description                        Use Case
-------------    -------------------------------    --------------------------
File Monitor     Watches files and directories      Log files
Network          Receives data on TCP/UDP ports     syslog
HEC              Receives data over HTTP            Application logs
Scripted Input   Captures script output             Custom data collection
Modular Input    Data collection via add-ons        Cloud service integrations

File Monitoring (Monitor Input)

Configuring via the Web UI

  1. Go to Settings > Data inputs > Files & directories
  2. Select New Local File & Directory
  3. Enter the path (e.g., /var/log/syslog)
  4. Choose or create a sourcetype
  5. Select an index

Configuring via inputs.conf

# $SPLUNK_HOME/etc/system/local/inputs.conf

# Monitor a single file
[monitor:///var/log/syslog]
disabled = false
index = main
sourcetype = syslog

# Monitor all files in a directory
[monitor:///var/log/apache2/]
disabled = false
index = web
sourcetype = access_combined

# Monitor files matching a pattern
[monitor:///opt/app/logs/*.log]
disabled = false
index = application
sourcetype = app_log

Monitor Configuration Options

Parameter         Description                       Example
---------------   -------------------------------   -------
disabled          Enable or disable the input       false
index             Destination index                 main
sourcetype        Sourcetype assignment             syslog
host              Override the host name            web-01
ignoreOlderThan   Skip files older than this        7d
whitelist         Include files matching pattern    \.log$
blacklist         Exclude files matching pattern    \.gz$
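Several of these options are often combined in a single stanza. The following sketch monitors a directory of rotating application logs; the paths, index, and host names are illustrative:

```ini
# inputs.conf -- combining monitor options (illustrative paths and names)
[monitor:///opt/app/logs]
disabled = false
index = application
sourcetype = app_log
host = app-01
# Only pick up .log files, skip compressed rotations, ignore stale files
whitelist = \.log$
blacklist = \.gz$
ignoreOlderThan = 7d
```

Whitelist and blacklist are regular expressions matched against the full path, not shell globs.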

Network Inputs

Receiving syslog (UDP)

# inputs.conf
[udp://514]
disabled = false
sourcetype = syslog
index = syslog

Note: Binding to a port below 1024 requires Splunk to run with root privileges. A common alternative is to receive syslog on a higher port (e.g., udp://5514), or to run a dedicated syslog daemon and have Splunk monitor the files it writes.

TCP Input

# Raw TCP input on an arbitrary port
[tcp://5140]
disabled = false
sourcetype = tcp_raw

# Receiving data from forwarders
[splunktcp://9997]
disabled = false

Note: TCP port 9997 is the standard port for receiving data from forwarders, but forwarder traffic uses the splunktcp stanza rather than tcp. Forwarders send parsed ("cooked") data that already carries its own metadata, including the sourcetype, so none is assigned here.


HTTP Event Collector (HEC)

HEC lets applications send data to Splunk directly over HTTP.

Enabling HEC

  1. Go to Settings > Data inputs > HTTP Event Collector
  2. Under Global Settings, enable HEC
  3. Click New Token to create a token

Sending Data

# -k skips TLS certificate verification (Splunk's default HEC certificate is self-signed)
curl -k https://localhost:8088/services/collector/event \
  -H "Authorization: Splunk YOUR_HEC_TOKEN" \
  -d '{"event": "Hello from HEC!", "sourcetype": "manual", "index": "main"}'

Sending JSON Data

curl -k https://localhost:8088/services/collector/event \
  -H "Authorization: Splunk YOUR_HEC_TOKEN" \
  -d '{
    "event": {
      "action": "login",
      "user": "alice",
      "status": "success",
      "ip": "192.168.1.10"
    },
    "sourcetype": "app_json",
    "index": "main"
  }'
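Before wiring a real application to HEC, it can help to sanity-check the payload locally. A minimal sketch, assuming python3 is available on the sending host (no token is needed until the actual HTTP request):

```shell
# Build the HEC payload in a variable and validate it as JSON locally
# before sending it with curl. python3 -m json.tool exits non-zero on
# malformed JSON, so a typo is caught before the HTTP request.
payload='{
  "event": {"action": "login", "user": "alice", "status": "success"},
  "sourcetype": "app_json",
  "index": "main"
}'

echo "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"
# prints: payload OK
```

On a successful send, Splunk's HEC endpoint answers with a small JSON acknowledgment; a non-2xx response usually means a bad token or a disabled input.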

flowchart LR
    App["Application"]
    HEC["HEC<br>:8088"]
    Indexer["Indexer"]
    App -->|"HTTP POST<br>+ Token"| HEC --> Indexer
    style App fill:#3b82f6,color:#fff
    style HEC fill:#22c55e,color:#fff
    style Indexer fill:#f59e0b,color:#fff

Sourcetypes

A sourcetype tells Splunk how to interpret and parse the format of your data.

Built-in Sourcetypes

Sourcetype        Description
---------------   --------------------------
syslog            Syslog format
access_combined   Apache Combined Log Format
csv               CSV format
json              JSON format
_json             Auto-detected JSON
log4j             Java Log4j format
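As a rough illustration of the shape access_combined expects, the sketch below checks a log line against a simplified approximation of the Apache access log format (this regex is illustrative, not Splunk's actual parser):

```shell
# Rough shape of an Apache access log line:
# clientip ident user [timestamp] "request" status bytes
line='192.168.1.10 - alice [30/Jan/2026:10:00:01 +0900] "GET /index.html HTTP/1.1" 200 2048'

if echo "$line" | grep -qE '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} [0-9-]+'; then
  echo "matches access log shape"
fi
```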

Custom Sourcetypes

# props.conf
[my_app_log]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 19
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false

Parameter                Description
-----------------------  ---------------------------------------------
TIME_FORMAT              Timestamp format string
TIME_PREFIX              Characters preceding the timestamp
MAX_TIMESTAMP_LOOKAHEAD  Max characters to scan for a timestamp
LINE_BREAKER             Pattern that separates events
SHOULD_LINEMERGE         Whether to merge multiple lines into one event
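To see why MAX_TIMESTAMP_LOOKAHEAD = 19 fits the sourcetype above: a %Y-%m-%d %H:%M:%S timestamp is exactly 19 characters, so Splunk never needs to scan past them. A small sketch (the sample line is illustrative, and `date -d` assumes GNU date):

```shell
line='2026-01-30 10:00:01 INFO  [main] Application started successfully'

# TIME_PREFIX = ^ means the timestamp starts at column 0;
# MAX_TIMESTAMP_LOOKAHEAD = 19 means only these characters are scanned.
ts=${line:0:19}
echo "$ts"   # → 2026-01-30 10:00:01

# Verify it parses with the declared TIME_FORMAT (GNU date)
date -d "$ts" '+%Y-%m-%d %H:%M:%S'
```

Setting the lookahead tightly like this avoids false timestamp matches deeper in the event text.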

Creating Indexes

Creating via the Web UI

  1. Go to Settings > Indexes
  2. Click New Index
  3. Enter a name (e.g., web_logs)
  4. Configure retention period and size limits

Configuring via indexes.conf

# indexes.conf
[web_logs]
homePath = $SPLUNK_DB/web_logs/db
coldPath = $SPLUNK_DB/web_logs/colddb
thawedPath = $SPLUNK_DB/web_logs/thaweddb
maxDataSize = auto_high_volume
maxTotalDataSizeMB = 50000
# 90 days (trailing comments are not supported in .conf files, so keep this on its own line)
frozenTimePeriodInSecs = 7776000
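The retention value can be derived rather than memorized, since frozenTimePeriodInSecs is expressed in seconds:

```shell
# 90 days expressed in seconds
echo $(( 90 * 24 * 60 * 60 ))   # → 7776000
```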

The Data Lifecycle

flowchart LR
    Hot["Hot<br>New data<br>Writable"]
    Warm["Warm<br>Recent data<br>Read-only"]
    Cold["Cold<br>Older data<br>Slow storage"]
    Frozen["Frozen<br>Deleted or archived"]
    Hot --> Warm --> Cold --> Frozen
    style Hot fill:#ef4444,color:#fff
    style Warm fill:#f59e0b,color:#fff
    style Cold fill:#3b82f6,color:#fff
    style Frozen fill:#8b5cf6,color:#fff

Bucket   Description    Characteristics
------   ------------   -------------------------
Hot      Newest data    Writable, fast storage
Warm     Recent data    Read-only, fast storage
Cold     Older data     Read-only, slower storage
Frozen   Expired data   Deleted or archived

Universal Forwarder

In production environments, the Universal Forwarder (UF) is used to ship data to indexers.

# Install the Universal Forwarder (Linux)
# Check splunk.com for the current download URL for your platform; this URL is illustrative
wget -O splunkforwarder.tgz "https://download.splunk.com/products/universalforwarder/releases/latest/linux/splunkforwarder-latest-Linux-x86_64.tgz"
tar xvzf splunkforwarder.tgz -C /opt

# Initial setup
/opt/splunkforwarder/bin/splunk start --accept-license
/opt/splunkforwarder/bin/splunk add forward-server indexer01:9997
/opt/splunkforwarder/bin/splunk add monitor /var/log/syslog
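The add forward-server command above persists its result to outputs.conf on the forwarder. The generated file looks roughly like the following sketch (stanza and group names may vary by version):

```ini
# $SPLUNK_HOME/etc/system/local/outputs.conf (illustrative)
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = indexer01:9997
```

Listing multiple comma-separated indexers under server enables automatic load balancing across them.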

flowchart LR
    Server1["Web Server<br>UF"]
    Server2["App Server<br>UF"]
    Server3["DB Server<br>UF"]
    Indexer["Splunk<br>Indexer<br>:9997"]
    Server1 --> Indexer
    Server2 --> Indexer
    Server3 --> Indexer
    style Server1 fill:#3b82f6,color:#fff
    style Server2 fill:#3b82f6,color:#fff
    style Server3 fill:#3b82f6,color:#fff
    style Indexer fill:#22c55e,color:#fff

Hands-On: Ingesting Data from Multiple Sources

Create the following three log files and ingest them into Splunk.

access.log (Web Access Log)

192.168.1.10 - alice [30/Jan/2026:10:00:01 +0900] "GET /index.html HTTP/1.1" 200 2048
192.168.1.20 - bob [30/Jan/2026:10:00:05 +0900] "POST /api/login HTTP/1.1" 401 128
192.168.1.10 - alice [30/Jan/2026:10:00:10 +0900] "GET /dashboard HTTP/1.1" 200 4096

app.log (Application Log)

2026-01-30 10:00:01 INFO  [main] Application started successfully
2026-01-30 10:00:05 WARN  [db-pool] Connection pool running low: 2/10
2026-01-30 10:00:10 ERROR [api] NullPointerException at UserService.java:42

auth.log (Authentication Log)

Jan 30 10:00:01 server01 sshd[1234]: Accepted publickey for alice from 192.168.1.10 port 22
Jan 30 10:00:05 server01 sshd[1235]: Failed password for bob from 192.168.1.20 port 22
Jan 30 10:00:10 server01 sshd[1236]: Failed password for root from 10.0.0.1 port 22
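One way to create the three sample files is with here-documents; the /tmp/splunk-lab path is just an example:

```shell
# Create a scratch directory and write the three sample logs into it.
mkdir -p /tmp/splunk-lab
cd /tmp/splunk-lab

cat > access.log <<'EOF'
192.168.1.10 - alice [30/Jan/2026:10:00:01 +0900] "GET /index.html HTTP/1.1" 200 2048
192.168.1.20 - bob [30/Jan/2026:10:00:05 +0900] "POST /api/login HTTP/1.1" 401 128
192.168.1.10 - alice [30/Jan/2026:10:00:10 +0900] "GET /dashboard HTTP/1.1" 200 4096
EOF

cat > app.log <<'EOF'
2026-01-30 10:00:01 INFO  [main] Application started successfully
2026-01-30 10:00:05 WARN  [db-pool] Connection pool running low: 2/10
2026-01-30 10:00:10 ERROR [api] NullPointerException at UserService.java:42
EOF

cat > auth.log <<'EOF'
Jan 30 10:00:01 server01 sshd[1234]: Accepted publickey for alice from 192.168.1.10 port 22
Jan 30 10:00:05 server01 sshd[1235]: Failed password for bob from 192.168.1.20 port 22
Jan 30 10:00:10 server01 sshd[1236]: Failed password for root from 10.0.0.1 port 22
EOF

# Confirm each file has three lines
wc -l access.log app.log auth.log
```

Each path can then be added as a monitor input with its own sourcetype, as Exercise 1 asks.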

Summary

Concept              Description
-------------------  ------------------------------------------
Monitor Input        Watches files and directories for new data
Network Input        Receives data over TCP/UDP
HEC                  Receives application data over HTTP
Sourcetype           Defines how Splunk parses data
Index                A data repository
Universal Forwarder  A lightweight agent for shipping data
inputs.conf          The configuration file for data inputs

Key Takeaways

  1. Monitor Input is the most common way to ingest log files
  2. HEC is ideal for sending data directly from applications
  3. Sourcetypes define the parsing rules for your data
  4. Use the Universal Forwarder in production environments

Exercises

Exercise 1: Basic

Ingest the three sample log files into Splunk, assigning a different sourcetype to each.

Exercise 2: Applied

Enable HEC and use curl to send JSON data. Verify that the data appears in Splunk search results.

Challenge

Create a custom sourcetype in props.conf with your own timestamp format and field extraction rules for a custom log format.


Next up: In Day 3, you will learn the fundamentals of searching -- mastering the basics of SPL (Search Processing Language).