HTTP Access Log Management: Rotation, Parsing, and Analysis

How to manage HTTP access logs at scale — log formats, logrotate configuration, centralized log shipping with Fluent Bit and Filebeat, status code analysis queries, and real-time monitoring with GoAccess.

Log Formats

HTTP access logs are the raw material of traffic analysis. The format you choose determines what questions you can answer later.

Common Log Format (CLF)

The original Apache log format — widely supported but limited:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326

Fields: host ident authuser [date] "request" status bytes
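
As a rough illustration of how those seven fields map onto a parser, here is a minimal Python sketch. The pattern and the `parse_clf` helper are assumptions made for this example, not part of any standard library, and the regular expression handles only well-formed lines.

```python
import re
from datetime import datetime

# Pattern for: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" byte count means no response body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # Month names are parsed with the current locale (English assumed here).
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
entry = parse_clf(sample)
print(entry["host"], entry["authuser"], entry["status"], entry["bytes"])
```

The need for a hand-written pattern like this is the main cost of CLF: every consumer of the log has to agree on the same regular expression.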

Combined Log Format

CLF plus Referer and User-Agent — Nginx's default:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

JSON Log Format

JSON logs can be parsed without regular expressions and shipped directly to log aggregators:

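# Adjacent quoted strings are concatenated, so each request is logged
# as a single line of JSON (a multi-line string would embed newlines).
# $geoip2_data_country_code must be defined by the geoip2 module;
# drop that field if the module is not loaded.
log_format json_access escape=json
  '{'
    '"timestamp":"$time_iso8601",'
    '"remote_addr":"$remote_addr",'
    '"method":"$request_method",'
    '"path":"$request_uri",'
    '"status":$status,'
    '"bytes":$body_bytes_sent,'
    '"duration":$request_time,'
    '"upstream_status":"$upstream_status",'
    '"upstream_time":"$upstream_response_time",'
    '"request_id":"$http_x_request_id",'
    '"user_agent":"$http_user_agent",'
    '"referer":"$http_referer",'
    '"country":"$geoip2_data_country_code"'
  '}';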

access_log /var/log/nginx/access.log json_access buffer=64k flush=5s;

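The buffer and flush parameters batch writes to reduce I/O overhead on high-traffic servers: Nginx collects up to 64 KB of log lines in memory and writes them out when the buffer fills or after 5 seconds, whichever comes first. The trade-off is that entries can reach the file, and any log shipper tailing it, up to 5 seconds late.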
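
Before pointing a shipper at the new format, it can help to confirm that each line really is a standalone JSON object with the expected keys. The following Python sketch is one way to do that; `check_log_line`, `REQUIRED_KEYS`, and the sample line are hypothetical names and data chosen for this example.

```python
import json

# Keys this sketch expects, taken from the json_access format above.
REQUIRED_KEYS = {"timestamp", "remote_addr", "method", "path",
                 "status", "bytes", "duration"}

def check_log_line(line):
    """Return a list of problems found in one log line (empty list means OK)."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(entry, dict):
        return ["not a JSON object"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - entry.keys())]
    if "status" in entry and not isinstance(entry["status"], int):
        problems.append("status is not an integer")
    return problems

# Hypothetical sample line, not taken from a real server.
sample = ('{"timestamp":"2024-01-15T10:30:00+00:00","remote_addr":"203.0.113.7",'
          '"method":"GET","path":"/index.html","status":200,"bytes":2326,'
          '"duration":0.004}')
print(check_log_line(sample))
print(check_log_line('{"status": "200"}'))
```

Running a check like this over the first few lines of a fresh access log catches quoting mistakes in the log_format definition before they surface as parse errors in the aggregator.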

Log Rotation

Unrotated logs fill disks. On a busy server producing 1 GB of logs per day, configure rotation before the server takes its first traffic.
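
To size the log volume, a back-of-the-envelope estimate is usually enough. The sketch below assumes daily rotation, 14 retained files, and a 10:1 gzip ratio; the ratio in particular is a placeholder, so measure it on your own logs. The function name `estimate_log_disk_gb` is made up for this example.

```python
def estimate_log_disk_gb(daily_gb, keep, compression_ratio=10.0, delay_compress=True):
    """Estimate peak disk use in GB for one daily-rotated log.

    Assumes one active file that grows to daily_gb, plus `keep` rotated
    files, of which the newest stays uncompressed when delay_compress is set.
    """
    uncompressed_rotated = 1 if (delay_compress and keep > 0) else 0
    compressed_rotated = keep - uncompressed_rotated
    active = daily_gb
    return (active
            + uncompressed_rotated * daily_gb
            + compressed_rotated * daily_gb / compression_ratio)

# 1 GB/day, 14 rotated files kept, newest rotated file left uncompressed
print(round(estimate_log_disk_gb(1.0, 14), 2))
# Same, but every rotated file is compressed immediately
print(round(estimate_log_disk_gb(1.0, 14, delay_compress=False), 2))
```

Under these assumptions the peak is a few gigabytes rather than 14, which shows how much of the retention cost compression absorbs.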

logrotate Configuration

# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    # Rotate every day
    daily
    # Don't error if a log file is missing
    missingok
    # Keep 14 rotated files (14 days of logs)
    rotate 14
    # gzip rotated files
    compress
    # Leave the newest rotated file uncompressed until the next cycle
    # (Nginx may keep writing to it until it reopens its logs)
    delaycompress
    # Don't rotate empty logs
    notifempty
    # Run postrotate once for all matched files
    sharedscripts
    postrotate
        # Signal Nginx to reopen log files
        nginx -s reopen 2>/dev/null || true
    endscript
}

Size-Based vs Time-Based Rotation

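| Strategy | Directive | Use Case |
|---|---|---|
| Daily | `daily` | Predictable retention, calendar alignment |
| Size-based | `size 100M` | Bursty traffic, prevent disk fill |
| Combined | `daily` + `maxsize 500M` | High-traffic production |

In logrotate, size replaces the time-based schedule rather than adding to it, so the combined strategy uses maxsize, which rotates early when a file outgrows the limit. Size limits are checked only when logrotate runs, so schedule it hourly if you rely on them.

# Size-based rotation (rotate when log exceeds 500MB)
/var/log/nginx/*.log {
    size 500M
    rotate 10
    compress
    missingok
    sharedscripts
    postrotate
        nginx -s reopen
    endscript
}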

Test the logrotate Configuration

# Dry run: shows what would happen without rotating anything
logrotate -d /etc/logrotate.d/nginx

# Force a real rotation immediately, even if the rotation criteria are not met
logrotate -f /etc/logrotate.d/nginx

Centralized Logging

Local log files don't survive instance termination and are hard to query across multiple servers. Ship logs to a central aggregator.

Fluent Bit (Lightweight Agent)

# /etc/fluent-bit/fluent-bit.conf
[SERVICE]
    # The json parser used below is defined in the stock parsers.conf
    Parsers_File  parsers.conf

[INPUT]
    Name    tail
    Path    /var/log/nginx/access.log
    Tag     nginx.access
    Parser  json
    DB      /var/lib/fluent-bit/nginx.db

[FILTER]
    Name    record_modifier
    Match   nginx.*
    Record  hostname ${HOSTNAME}
    Record  app      myapp

[OUTPUT]
    Name               cloudwatch_logs
    Match              nginx.*
    region             us-east-1
    log_group_name     /myapp/nginx/access
    log_stream_prefix  nginx-
    auto_create_group  true

Filebeat → Elasticsearch

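# filebeat.yml
filebeat.inputs:
# The log input is deprecated in newer Filebeat releases in favor of filestream
- type: log
  paths:
    - /var/log/nginx/access.log
  json.keys_under_root: true   # Parse JSON fields to root level
  json.add_error_key: true      # Flag parse errors

# A custom index name requires a matching template name and pattern
setup.template.name: 'nginx-access'
setup.template.pattern: 'nginx-access-*'
# With ILM enabled, Filebeat ignores the custom index setting
setup.ilm.enabled: false

output.elasticsearch:
  hosts: ['https://es-cluster:9200']
  index: 'nginx-access-%{+yyyy.MM.dd}'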

Status Code Analysis

Once logs are flowing, extract insights with log queries.

Command-Line Analysis

# Count responses by status code (JSON logs)
jq -r '.status' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# Top 10 URLs returning 404
jq -r 'select(.status == 404) | .path' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head -10

# Most recent 100 5xx errors (timestamp, status, path)
jq -r 'select(.status >= 500) | [.timestamp, .status, .path] | @tsv' \
  /var/log/nginx/access.log | tail -100

# Average response time by status code
jq -r '[.status, .duration] | @tsv' /var/log/nginx/access.log \
  | awk '{sum[$1]+=$2; count[$1]++} END {for (s in sum) print s, sum[s]/count[s]}'
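
When the one-liners grow unwieldy, the same counts and averages can be computed in a short script. This Python sketch mirrors the first and last commands above; `summarize` and the sample lines are hypothetical, and it assumes the `status` and `duration` keys from the JSON format defined earlier.

```python
import json
from collections import Counter, defaultdict

def summarize(lines):
    """Return (count per status, mean duration per status) for JSON log lines."""
    counts = Counter()
    total_duration = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(entry, dict) or "status" not in entry:
            continue
        status = entry["status"]
        counts[status] += 1
        total_duration[status] += float(entry.get("duration", 0.0))
    averages = {status: total_duration[status] / counts[status] for status in counts}
    return counts, averages

# Hypothetical sample data standing in for a real access log.
sample_lines = [
    '{"status": 200, "duration": 0.010, "path": "/"}',
    '{"status": 200, "duration": 0.030, "path": "/about"}',
    '{"status": 404, "duration": 0.002, "path": "/missing"}',
    'not json',
]
counts, averages = summarize(sample_lines)
for status, count in counts.most_common():
    print(status, count, round(averages[status], 4))
```

To run it against a real file, pass an open file handle instead of the list, for example `with open("/var/log/nginx/access.log") as fh: summarize(fh)`.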

CloudWatch Insights Queries

# Error count per 5-minute bucket
fields @timestamp, status, path, duration
| filter status >= 400
| stats count(*) as error_count by bin(5m)
| sort @timestamp asc

# Top error paths
fields path, status
| filter status >= 400
| stats count(*) as errors by path, status
| sort errors desc
| limit 20
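
For servers that do not ship to CloudWatch, the 5-minute bucketing can be approximated locally. The sketch below is a rough Python equivalent of the first query, not a description of how Insights works internally; `error_counts_by_bin` is a hypothetical helper, and it assumes ISO 8601 timestamps as produced by `$time_iso8601` and a bucket size that divides 60.

```python
import json
from collections import Counter
from datetime import datetime

def error_counts_by_bin(lines, bin_minutes=5, min_status=400):
    """Count entries with status >= min_status per time bucket."""
    buckets = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
            status = int(entry["status"])
            ts = datetime.fromisoformat(entry["timestamp"])
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that are malformed or lack the fields
        if status < min_status:
            continue
        # Floor the timestamp to the start of its bucket.
        floored = ts.replace(minute=ts.minute - ts.minute % bin_minutes,
                             second=0, microsecond=0)
        buckets[floored.isoformat()] += 1
    return dict(sorted(buckets.items()))

# Hypothetical sample data.
sample_lines = [
    '{"timestamp":"2024-01-15T10:31:10+00:00","status":500}',
    '{"timestamp":"2024-01-15T10:34:59+00:00","status":404}',
    '{"timestamp":"2024-01-15T10:35:00+00:00","status":502}',
    '{"timestamp":"2024-01-15T10:36:00+00:00","status":200}',
]
result = error_counts_by_bin(sample_lines)
for bucket, count in result.items():
    print(bucket, count)
```

Dividing each bucket's count by the total number of requests in the same bucket would turn these counts into an error rate.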

Real-Time Monitoring with GoAccess

GoAccess is a terminal-based and web-based log analyzer with live updates:

# Install
sudo apt install goaccess

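# Real-time terminal dashboard (combined-format logs)
tail -f /var/log/nginx/access.log | goaccess - --log-format=COMBINED

# For JSON logs, use a custom log format. GoAccess needs at least a date,
# a client address (%h), and a request field to accept a line.
# "%dT%t+%^" assumes a positive UTC offset such as +00:00.
goaccess /var/log/nginx/access.log \
  --log-format='{ "timestamp": "%dT%t+%^", "remote_addr": "%h", "method": "%m", "path": "%U", "status": "%s", "bytes": "%b", "duration": "%T", "user_agent": "%u", "referer": "%R" }' \
  --date-format='%Y-%m-%d' \
  --time-format='%H:%M:%S' \
  -o /var/www/html/report.html \
  --real-time-html \
  --daemonize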

GoAccess provides at-a-glance views of status code distributions, top requested URLs, time-served statistics (average, cumulative, and maximum), and, when built with GeoIP support, geographic distribution, all without a separate observability stack.

A complete log management lifecycle:

| Phase | Tool | Retention |
|---|---|---|
| Collection | Nginx JSON format | Local (rotating) |
| Shipping | Fluent Bit / Filebeat | Real-time |
| Storage | CloudWatch / Elasticsearch | 30-90 days |
| Analysis | CloudWatch Insights / Kibana | On-demand |
| Archival | S3 Glacier / cold storage | 1-7 years |
