## Log Formats
HTTP access logs are the raw material of traffic analysis. The format you choose determines what questions you can answer later.
### Common Log Format (CLF)

The original Apache log format, widely supported but limited:

```
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
```

Fields: `host ident authuser [date] "request" status bytes`
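Because CLF is whitespace-delimited (the quoted request string occupies fields 6 through 8), quick counts fall out of plain awk. A small sketch, with inline sample lines standing in for a real log file:

```shell
# Count responses by status code; in CLF, $9 is the status field.
# The printf lines are sample input standing in for /var/log/nginx/access.log.
printf '%s\n' \
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326' \
  '10.0.0.5 - - [10/Oct/2000:13:55:37 -0700] "GET /missing HTTP/1.1" 404 153' \
  | awk '{count[$9]++} END {for (s in count) print s, count[s]}' | sort
# → 200 1
#   404 1
```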
### Combined Log Format

CLF plus `Referer` and `User-Agent`; this is Nginx's default. Nginx predefines `combined` (declaring it yourself triggers a "duplicate log_format" error), but its built-in definition is equivalent to:

```nginx
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
```
### JSON Structured Format (Recommended)

JSON logs are parseable without regex and ship directly to log aggregators. Note that the format is written as one quoted string per line; Nginx concatenates them, so each log entry comes out as a single line of JSON:

```nginx
log_format json_access escape=json
    '{'
    '"timestamp":"$time_iso8601",'
    '"remote_addr":"$remote_addr",'
    '"method":"$request_method",'
    '"path":"$request_uri",'
    '"status":$status,'
    '"bytes":$body_bytes_sent,'
    '"duration":$request_time,'
    '"upstream_status":"$upstream_status",'
    '"upstream_time":"$upstream_response_time",'
    '"request_id":"$http_x_request_id",'
    '"user_agent":"$http_user_agent",'
    '"referer":"$http_referer",'
    '"country":"$geoip2_data_country_code"'
    '}';

access_log /var/log/nginx/access.log json_access buffer=64k flush=5s;
```

The `buffer` and `flush` parameters batch writes to reduce I/O overhead on high-traffic servers.
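A quick way to confirm the format actually emits valid JSON is to round-trip a line through jq. Here a hand-written sample line (matching the fields above) stands in for the live log:

```shell
# Validate that a log line parses and carries the fields dashboards rely on.
# The echo stands in for: tail -1 /var/log/nginx/access.log
echo '{"timestamp":"2024-01-01T00:00:00+00:00","method":"GET","path":"/index.html","status":200,"bytes":2326,"duration":0.012}' \
  | jq -e 'has("status") and has("duration") and (.status | type == "number")'
# → true
```

`jq -e` also sets the exit code from the result, so the same check works in a health-check script.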
## Log Rotation
Unrotated logs fill disks. On a busy server producing 1GB/day, you need rotation configured before the first day of traffic.
### logrotate Configuration

Note that logrotate only treats lines *starting* with `#` as comments; trailing comments on the same line as a directive cause parse errors, so comments go on their own lines:

```
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    # Rotate every day
    daily
    # Don't error if a log file is missing
    missingok
    # Keep 14 days of logs
    rotate 14
    # gzip rotated files
    compress
    # Don't compress the most recent rotated file
    # (allows tail -f to keep working)
    delaycompress
    # Don't rotate empty logs
    notifempty
    # Run postrotate once for all matched files
    sharedscripts
    postrotate
        # Signal Nginx to reopen log files
        nginx -s reopen 2>/dev/null || true
    endscript
}
```
### Size-Based vs Time-Based Rotation

| Strategy | Directive | Use Case |
|---|---|---|
| Daily | `daily` | Predictable retention, calendar alignment |
| Size-based | `size 100M` | Bursty traffic, prevent disk fill |
| Combined | `daily` + `maxsize 500M` | High-traffic production |
```
# Size-based rotation (rotate when the log exceeds 500MB)
/var/log/nginx/*.log {
    size 500M
    rotate 10
    compress
    missingok
    sharedscripts
    postrotate
        nginx -s reopen 2>/dev/null || true
    endscript
}
```
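One subtlety worth knowing: `size` by itself makes logrotate ignore the `daily`/`weekly`/`monthly` interval entirely, while `maxsize` keeps the interval and additionally rotates early once the file passes the threshold. A sketch of the "daily, or sooner if traffic spikes" pattern:

```
# Rotate daily, or immediately once the log exceeds 500MB
/var/log/nginx/*.log {
    daily
    maxsize 500M
    rotate 14
    compress
    missingok
    sharedscripts
    postrotate
        nginx -s reopen 2>/dev/null || true
    endscript
}
```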
### Test logrotate Without Rotating

```shell
# Dry run: shows what would happen without touching any files
logrotate -d /etc/logrotate.d/nginx

# Force rotation immediately (useful for testing)
logrotate -f /etc/logrotate.d/nginx
```
## Centralized Logging
Local log files don't survive instance termination and are hard to query across multiple servers. Ship logs to a central aggregator.
### Fluent Bit (Lightweight Agent)

```ini
# /etc/fluent-bit/fluent-bit.conf
[INPUT]
    Name    tail
    Path    /var/log/nginx/access.log
    Tag     nginx.access
    Parser  json
    DB      /var/lib/fluent-bit/nginx.db

[FILTER]
    Name    record_modifier
    Match   nginx.*
    Record  hostname ${HOSTNAME}
    Record  app myapp

[OUTPUT]
    Name                cloudwatch_logs
    Match               nginx.*
    region              us-east-1
    log_group_name      /myapp/nginx/access
    log_stream_prefix   nginx-
    auto_create_group   true
```
### Filebeat → Elasticsearch

```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access.log
    json.keys_under_root: true   # Parse JSON fields to root level
    json.add_error_key: true     # Flag parse errors

output.elasticsearch:
  hosts: ['https://es-cluster:9200']
  index: 'nginx-access-%{+yyyy.MM.dd}'
```

Note that overriding `index` also requires setting `setup.template.name` and `setup.template.pattern`, or Filebeat refuses to start.
## Status Code Analysis
Once logs are flowing, extract insights with log queries.
### Command-Line Analysis

```shell
# Count responses by status code (JSON logs)
jq -r '.status' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# Top 10 URLs returning 404
jq -r 'select(.status==404) | .path' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head -10

# Most recent 100 5xx errors
jq -r 'select(.status >= 500) | [.timestamp, .status, .path] | @tsv' \
    /var/log/nginx/access.log | tail -100

# Average response time by status code
jq -r '[.status, .duration] | @tsv' /var/log/nginx/access.log \
  | awk '{sum[$1]+=$2; count[$1]++} END {for (s in sum) print s, sum[s]/count[s]}'
```
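Averages hide tail latency; a nearest-rank p95 falls out of the same sort/awk toolbox. In this sketch the `printf` stands in for `jq -r '.duration' /var/log/nginx/access.log`:

```shell
# 95th-percentile (nearest-rank) response time from a list of durations
printf '%s\n' 0.010 0.012 0.015 0.020 0.030 0.045 0.060 0.120 0.250 1.500 \
  | sort -n \
  | awk '{v[NR]=$1} END {idx=int(NR*0.95); if (idx<1) idx=1; print v[idx]}'
# → 0.250
```

With 10 samples the nearest-rank index is `int(10 * 0.95) = 9`, so the 9th-smallest value is reported; swap 0.95 for 0.99 to get p99.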
### CloudWatch Insights Queries

```
# Error rate over time
fields @timestamp, status, path, duration
| filter status >= 400
| stats count(*) as error_count by bin(5m)
| sort @timestamp asc
```

```
# Top error paths
fields path, status
| filter status >= 400
| stats count(*) as errors by path, status
| sort errors desc
| limit 20
```
## Real-Time Monitoring with GoAccess

GoAccess is a terminal-based and web-based log analyzer with live updates:

```shell
# Install
sudo apt install goaccess

# Real-time terminal dashboard (no --real-time-html here; that flag
# produces an HTML report and requires -o)
tail -f /var/log/nginx/access.log | goaccess - --log-format=COMBINED

# For JSON logs, describe the line with a JSON-shaped log format.
# %x is the combined date+time field; adjust --datetime-format to
# match your $time_iso8601 timestamps.
goaccess /var/log/nginx/access.log \
  --log-format='{ "timestamp": "%x", "method": "%m", "path": "%U", "status": %s, "bytes": %b, "duration": %T, "user_agent": "%u" }' \
  --datetime-format='%FT%T%z' \
  -o /var/www/html/report.html \
  --real-time-html \
  --daemonize
```
GoAccess provides at-a-glance views of status code distributions, top requested URLs, response time percentiles, and geographic distribution — all without a separate observability stack.
A complete log management lifecycle:
| Phase | Tool | Retention |
|---|---|---|
| Collection | Nginx JSON format | Local (rotating) |
| Shipping | Fluent Bit / Filebeat | Real-time |
| Storage | CloudWatch / Elasticsearch | 30-90 days |
| Analysis | CloudWatch Insights / Kibana | On-demand |
| Archival | S3 Glacier / cold storage | 1-7 years |