This post appeared originally in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.
I come back to a specific problem every once in a while: changing a program’s file descriptors while that program is running.
We do stupid things
From time to time, we do stupid things, like running a very important shell command with debug output added to see that it works, then realizing that it will take hours to finish, and that spitting gigabytes of debug output to an xterm through SSH does not help. The dream is of course to just redirect that output into a screen session. Or just send it to /dev/null.
Other people do stupid things
Another typical example is finding an ill-managed system running some daemon without proper log file handling. Restarting that process right now is out of the question, copy-truncating that 16GB log file will take too much time, and by the way, the disk is almost full.
The dark side of gdb
Is it possible to just move the file descriptor for a running process? Yes it is. Welcome to the dark side of gdb.
With the power of gdb in your hands, you can hook into the inner parts of any running program, and change, well, virtually anything. Sounds insanely dangerous for systems in production, right? This hack is not that ugly. It does just what the doctor ordered: it changes a process’s file descriptors for you while it’s running.
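The idea behind the hack is the same three-step dance a shell performs for an ordinary redirection: open the new destination, duplicate it over the old descriptor, then close the spare. As a rough illustration only, here is a process doing that to its own stdout from the inside, rather than being poked from the outside with gdb. The function name `demo_redirect` and the temporary file are made up for this sketch.

```shell
#!/bin/bash
# Sketch: a running process repoints its own stdout, then puts it back.
# demo_redirect and the temp file are hypothetical, for illustration only.
demo_redirect() {
    local newfile
    newfile=$(mktemp)

    echo "before: goes to the original stdout"

    exec 3>&1            # keep a copy of the original stdout on fd 3
    exec 1>"$newfile"    # open the new file and place it on fd 1
    echo "during: goes to the file"

    exec 1>&3 3>&-       # restore fd 1 from the copy, close the spare fd 3
    echo "after: back on the original stdout"
    echo "file contains: $(cat "$newfile")"
    rm -f "$newfile"
}

demo_redirect
```

The middle line never reaches the terminal; it only shows up when the file is read back. The gdb approach achieves the same effect for a process that was never written to do this itself.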
The hack: Google helps always
This script is based on a variant written by Robert McKay, and is one of the many versions that Google may find for you.
Basic usage:
fdswap /var/log/mydaemon/output.log /dev/null 1234
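#!/bin/bash
#
# fdswap: repoint open file descriptors of running processes
# from one file to another, using gdb.
#
if [ "$2" = "" ]; then
    echo "
Usage: $0 /path/to/oldfile /path/to/newfile [pids]
Example: $0 /var/log/daemon.log /var/log/newvolume/daemon.log 1234
Example: $0 /dev/pts/53 /dev/null 2345"; exit 0
fi

if gdb --version > /dev/null 2>&1; then true
else echo "Unable to find gdb."; exit 1
fi

src="$1"; dst="$2"; shift; shift
pids=$*

# If no pids were given, ask fuser which processes have $src open.
for pid in ${pids:=$( /sbin/fuser "$src" | cut -d ':' -f 2 )};
do
    echo "src=$src, dst=$dst"
    echo "$src has $pid using it"
    (
        echo "attach $pid"
        # 66 is O_RDWR|O_CREAT on Linux; the new fd ends up in gdb's $1.
        # The (int) casts are for newer gdb versions, which refuse to call
        # a function without debug info unless its return type is given.
        echo 'call (int)open("'"$dst"'", 66, 0666)'
        # Every fd whose /proc symlink ends in $src gets the new file.
        for ufd in $(LANG=C ls -l /proc/$pid/fd | \
            grep "$src"\$ | awk ' { print $9; } ');
        do echo 'call (int)dup2($1,'"$ufd"')'; done
        echo 'call (int)close($1)'
        echo 'detach'; echo 'quit'
        sleep 5
    ) | gdb -q -x -
done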
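Before pointing gdb at anything real, it can help to see the command stream the script builds. The following is a dry-run sketch: `gdb_commands` is a hypothetical helper, not part of the original script, and it only prints text without attaching to any process. It also spells out where the magic number 66 comes from, assuming the usual Linux values for `O_RDWR` (2) and `O_CREAT` (octal 0100). The `(int)` casts are an addition that newer gdb releases tend to need when the target has no debug info for these functions.

```shell
#!/bin/bash
# Dry run: print roughly the gdb commands fdswap would send.
# gdb_commands is a hypothetical helper; nothing is attached or modified.
gdb_commands() {
    local pid="$1" dst="$2"; shift 2
    local flags=$(( 2 | 0100 ))   # O_RDWR | O_CREAT on Linux
    local fd

    echo "attach $pid"
    echo "call (int)open(\"$dst\", $flags, 0666)"
    # One dup2 per descriptor that should point at the new file.
    for fd in "$@"; do
        echo "call (int)dup2(\$1,$fd)"
    done
    echo 'call (int)close($1)'
    echo "detach"
    echo "quit"
}

# Example: pid 1234, new target /dev/null, replacing fds 1 and 2.
gdb_commands 1234 /dev/null 1 2
```

In the gdb session, `$1` is the first entry in gdb's value history, which is the descriptor returned by `open` because that is the first call made.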
Test your stuff before destroying everything
Do test this before smashing your production environment to pieces. The test below relies on passwordless sudo, so make sure you can run
sudo whoami
and get root as the answer, without having to enter a password.
Make a small shell script like this:
#!/bin/bash
echo "This is the pid: $$"
echo "This is stdout:"; sudo ls -l /proc/$$/fd/0;
n=0; while true; do ((n++)); echo $n; sleep 1; done
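Run that script in a separate window. Note the PID, and which terminal is connected to the process’s stdout. (The script actually lists fd 0, which is stdin; when run from an interactive terminal it points to the same pty as stdout, fd 1.)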
$ bash foo.sh &
This is the pid: 28073
This is stdout:
lrwx------. 1 ingvar ingvar 64 Nov 18 11:53 /proc/28073/fd/0 -> /dev/pts/9
1
2
3
4
So the script runs with PID 28073, and is pumping numbers to /dev/pts/9. Now start screen in another window, and find the terminal it uses:
$ screen
[screen]$ sudo ls -l /proc/$$/fd/0
lrwx------. 1 ingvar ingvar 64 Nov 18 11:54 /proc/28094/fd/0 -> /dev/pts/6
So, the shell inside screen uses /dev/pts/6. We want to move the output from our foo.sh script into the screen session.
Run fdswap, let all that gdb output scare you, and watch the number pumping magically stop. You need admin rights (sudo) unless you own all the processes yourself.
sudo fdswap /dev/pts/9 /dev/pts/6 28073
Look inside the screen again. The numbers are now being pumped there.
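If you would rather not trust your eyes alone, the symlinks under /proc show where each descriptor points after the swap. A minimal sketch, assuming a Linux /proc filesystem; `fd_target` is a hypothetical helper name, and reading another user's /proc entries needs root.

```shell
#!/bin/bash
# fd_target is a hypothetical helper: print what fd N of a process points to.
fd_target() {
    local pid="$1" fd="$2"
    readlink "/proc/$pid/fd/$fd"
}

# Example against the current shell. For the test above it would be
#   sudo readlink /proc/28073/fd/1
# which should report /dev/pts/6 once the swap has happened.
fd_target $$ 1
```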