I was involved in an incident post-mortem today without realizing it!
But first, a little background.
I lead the Incident Response team at Mathspace, and we follow a blameless
post-mortem culture as popularized by SREs. It boils down to not
blaming the specific person(s) seemingly responsible for an incident, but
instead acknowledging that incidents are the product of failures at the
team or organization level. We then focus our efforts on pinpointing the
root cause and ways of preventing it from happening again in the future.
Today’s post-mortem was outside of work! I had just met up with my
partner at a cafe and we were walking home when she told me about her
experience riding her usual bus back from work. The bus doors had barely
finished closing behind her when the driver stepped on the accelerator
like it was a race car. As she had not yet tightened her grip on
the rails, she would have fallen to the floor had it not been for
assistance from another passenger in the front row. She was quite upset
while telling the story. In fact, it was such a distressing event for her
that she took the time to report the driver to the authorities. That
marked the end of her story.
What do you do when you need to expose an HTTP health check endpoint but
the thing you’re health checking isn’t a web server? You
socat it! Here’s a complete example:
nohup socat TCP-LISTEN:8080,reuseaddr,fork,crnl SYSTEM:"
  supervisorctl status celery | grep -q RUNNING &&
    echo HTTP/1.0 200 OK ||
    echo HTTP/1.0 500 Down
  echo
" &>/dev/null &
In short, the above will run a small web server in the background on port
8080 which responds with HTTP status code 200 if celery is running, or
500 otherwise. You can replace supervisorctl status celery | grep -q
RUNNING with any other command: the exit code of that command determines
the web server’s response. Pretty neat, huh!
We make mistakes. In most parts of life, mistakes incur a cost, be it
money, physical harm or simply time, because they play out in the
physical world. In the world of software, however, where the state of
things can be cheaply changed back and forth, some clever design can make
mistakes free of cost. I’ve noticed more and more of this kind of design
in the products I use, so I thought I’d make a list here:
Mute notifications for the next 4 hours. Notifications can frustrate a
user who is under stress, and if the mute setting sticks indefinitely, a
temporary annoyance can turn into messages/emails being ignored for days.
Google Hangouts, Slack and most messaging apps have a timed mute feature.
One place where this is very much needed is the browser. Since the
browser is a platform, building it in there would relieve individual
sites from having to implement this functionality themselves. Android is
a good example of this being implemented across the entire OS. It even
goes further and allows marking notifications from certain apps as
higher priority so that they bypass the mute period.
I was hanging out at the local IKEA store when I came across this $5
multifunction digital clock. The interesting thing about it is that you
switch between functions not by pressing a button, but by turning the
clock onto its different sides. It has four sides and hence four
functions: thermometer, alarm, clock and countdown timer. The countdown
timer in particular is awesome: you set the timer, and every time you
turn the clock onto the timer side, it automatically starts counting down.
Over the last week, I decided to move my dev environment from OSX to my
own Archlinux box. I didn’t want to pollute my machine with the various
libraries/tools needed for work, so I decided to run things in a
container. I had played with systemd-nspawn in the past, so it was the
clear first choice (and, in this case, the last). Overall, I am very happy
with it: it was a piece of cake to set up, with only one or two hiccups
along the way.
I followed the excellent guide on the Archlinux Wiki, which took me 95%
of the way. The Firefox tweaks page took me 99% of the way there. What
follows are some tips for the last mile.
I drive once in a while, and even though I’d much prefer to rely on
someone/something else to tell me where to go, I’ve decided to memorize
directions rather than use GPS navigation software. Rather than explain
why, I’ve listed my wishlist for GPS navigation software. If these items
are taken care of, I probably won’t have to look at Street View for
every turn or use pen and paper like it’s the 90s anymore.
Lane navigation. A passenger familiar with the area who is giving you
directions tends to mention which lane to stay in well before the next
turn, so you don’t have to force your way into a neighboring lane at the
last second and potentially cause an accident. This could be extended to
account for streets with lots of cross streets where people queue to
turn, or streets where a lane is occupied by parked cars (both of which
further depend on the time of day). A lot of drivers, especially
inexperienced ones, find changing lanes under pressure (i.e. with a turn
coming up) very stressful and are more likely to make mistakes doing