This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.
Log rotation is key to running a stable server, but removing log files is often anathema to security, traceability, and server history. In reality, you want a rotation setup that maximises the retention of logs.
Instead of continuously trying to balance the number of logs to keep on disk, why not set the “rotate” value higher and add a small script in cron to handle deletion of old files?
A simple script that does the above can look like this:
#!/bin/bash
LOGPATH=/var/log/myapp # The path to the logs
DFHIGH=80 # The disk usage watermark you want to delete down to, in percent
MAXDAYS=45 # Start by deleting files older than this many days
MINDAYS=7 # Never delete files newer than this many days
# Step 1: Remove old files
while [ "$( df --output=pcent "${LOGPATH}/" | tail -1 | tr -cd '0-9' )" -ge "${DFHIGH}" ] && [ "${MAXDAYS}" -ge "${MINDAYS}" ]; do
    # Skip files that are still open (fuser succeeds for those), delete the rest
    ionice find "${LOGPATH}" -type f -mtime +"${MAXDAYS}" \! -exec fuser -s '{}' \; -delete >/dev/null 2>&1
    MAXDAYS=$(( MAXDAYS - 1 ))
done
# Step 2: Remove empty directories
find "${LOGPATH}" -mindepth 1 -type d -empty -delete
Breaking the above script down:
- We start at the predefined MAXDAYS and only delete files if disk usage is at or above DFHIGH percent.
- Find all files under the specified directory that are older (by last modified date) than MAXDAYS.
- To avoid deleting currently open logs, we check every file with “fuser” before it is deleted.
- Since this is not a fast process and we want to avoid hogging the disks, we run this with ionice.
- Now we decrement MAXDAYS and iterate. We stop once disk usage drops below DFHIGH or MAXDAYS falls below MINDAYS.
- Finally, as some logs may be placed in date-series directories, we delete any empty directories.
Of course, this is just an example, and the script is only a template.
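Before letting a template like this delete anything, it can help to see what it would remove. The following is a minimal sketch and not part of the original script: list_candidates is a hypothetical helper name, and it assumes GNU find and fuser are installed. It applies the same tests as step 1 but prints the matching files instead of deleting them.

```bash
#!/bin/bash
# Hypothetical dry-run helper (an assumption, not from the original script).
# Usage: list_candidates DIRECTORY DAYS
# Prints files older than DAYS days that no process currently has open.
list_candidates() {
    local dir=$1 days=$2
    # Same tests as step 1, but -print replaces -delete
    find "${dir}" -type f -mtime +"${days}" \! -exec fuser -s '{}' \; -print
}

# Example call (path and age are illustrative):
#   list_candidates /var/log/myapp 45
```

Once the printed list looks right, the same age value can be used as MAXDAYS in the real script.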
Another case is limiting disk usage for just one application, deleting only when that application's total disk usage exceeds a given limit. This script does not contain a loop and only deletes the oldest files each time, so it is designed to be run often:
#!/bin/bash
LOGPATH=/var/log/myapp # The path to the logs or files
LIMIT=5000 # Max usage in MB
MINAGE=480 # Minimum age to keep in minutes (8 hours = 480)
DELTA=720 # Width in minutes of the time slice to delete, counted from the oldest file
# Step 1: Exit if disk use is under LIMIT:
if [ "$( du -sm "${LOGPATH}" | cut -f1 )" -lt "${LIMIT}" ]; then
    exit 0
fi
# Step 2: Delete the oldest files
# MAXAGE is the age of the oldest file in minutes, minus DELTA
MAXAGE=$(( ( ( $( date +%s ) - $( find "${LOGPATH}" -type f -printf "%T@\n" | cut -f1 -d. | sort -n | head -1 ) ) / 60 ) - DELTA ))
if [ "${MAXAGE}" -gt "${MINAGE}" ]; then
    find "${LOGPATH}" -type f -mmin +"${MAXAGE}" -delete
fi
DELTA is the width of the time slice of files we remove from the log path in each run. It should typically be slightly larger than the time period between runs of this script. A smaller DELTA deletes fewer files each run, so the script may need to be run more frequently.
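To make the time slice concrete, here is a small worked example of the MAXAGE arithmetic. It is only a sketch: compute_maxage is a hypothetical helper name, and the timestamps are made-up values chosen so the oldest file is exactly 48 hours old.

```bash
#!/bin/bash
# Hypothetical helper mirroring the MAXAGE calculation in step 2.
# Usage: compute_maxage NOW OLDEST DELTA
#   NOW and OLDEST are epoch seconds, DELTA is in minutes.
# Prints the cut-off age in minutes.
compute_maxage() {
    local now=$1 oldest=$2 delta=$3
    echo $(( (now - oldest) / 60 - delta ))
}

# Oldest file is 172800 s (48 hours = 2880 minutes) old, DELTA is 720 minutes:
compute_maxage 172800 0 720    # prints 2160
```

With these numbers, files older than 2160 minutes (36 hours) are deleted, which is the oldest 12-hour slice. Since 2160 is greater than the MINAGE of 480, the deletion goes ahead.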
Breaking down the above script:
- Initially, if disk usage of the given LOGPATH is under LIMIT, we do nothing.
- We find the age of the oldest file present (current timestamp minus the file timestamp), converted to minutes.
- From that value, we subtract DELTA minutes. If the result, MAXAGE, is bigger than MINAGE, we proceed to find files older than MAXAGE and delete them.
- We usually run the script from a cron job a couple of times a day, but you have to adjust this to your needs.
- If you want, you can wrap step 2 into a while-loop. If you do, you will probably want to specify a lower DELTA value.
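The while-loop variant mentioned in the last point could look roughly like the sketch below. It is an illustration under assumptions, not the original author's script: trim_logs is a hypothetical function name, the parameters mirror LOGPATH, LIMIT, MINAGE and DELTA, and GNU find and du are assumed. The loop repeats step 2 until usage drops under the limit or only files younger than MINAGE remain.

```bash
#!/bin/bash
# Hypothetical loop version of step 2 (an assumption, not the original script).
# Usage: trim_logs LOGPATH LIMIT_MB MINAGE_MINUTES DELTA_MINUTES
trim_logs() {
    local logpath=$1 limit=$2 minage=$3 delta=$4
    local oldest maxage
    # With a DELTA of 0 the oldest file would never match, so the loop
    # could not make progress; refuse to run in that case.
    [ "${delta}" -ge 1 ] || return 1
    while [ "$(du -sm "${logpath}" | cut -f1)" -ge "${limit}" ]; do
        # Modification time (epoch seconds) of the oldest file
        oldest=$(find "${logpath}" -type f -printf '%T@\n' | cut -f1 -d. | sort -n | head -1)
        [ -n "${oldest}" ] || break                  # no files left
        maxage=$(( ( $(date +%s) - oldest ) / 60 - delta ))
        [ "${maxage}" -gt "${minage}" ] || break     # only recent files remain
        find "${logpath}" -type f -mmin +"${maxage}" -delete
    done
}

# Example call (values are illustrative, with a lower DELTA of 60 minutes):
#   trim_logs /var/log/myapp 5000 480 60
```

Each pass removes at least the oldest file, so the loop ends once the directory is under the limit, is empty, or contains only files protected by MINAGE.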