This post starts with a warning: this isn't an in-depth treatise on systemd, and that's on purpose. First, no in-depth systemd post should be written this close to midnight, or you'll have bad dreams. And second, I've found that even this tiny piece of systemd knowledge has already given me some comfort that it's not that impossible to grok. So if you're like me, an Ubuntu user who started in the Slackware days, and who is somewhat terrified of not understanding how services are started by this new incarnation of init, read on.
For some background, the problem I ran into was that BIND was flooding my logs with IPv6 failures:
Jul 9 06:26:56 anthem named: network unreachable resolving './NS/IN': 2001:503:c27::2:30#53
That's totally appropriate, as this network doesn't have IPv6 connectivity. And I had previously silenced this (as others have too) by adding -4 to the command-line options used to start named, so what was going on?
Well, first, that option was definitely not being passed:
    ps auxw | grep named
    bind 19165 0.3 1.1 451316 44664 ? Ssl 17:31 1:06 /usr/sbin/named -f -u bind
And it was definitely being specified in /etc/default/bind9:
    # run resolvconf?
    RESOLVCONF=yes
    # additional startup options for the server
    OPTIONS="-u bind -4"
What's up with that? It turns out it's a bug inherited from Debian and that is yet to be fixed in Xenial (UPDATE: just fixed, in time for 18.04 LTS!), and what follows is what I had to learn in order to figure that out.
(The reason a few of these ignore-etc-default bugs have cropped up is that it turns out systemd upstream doesn't like the idea of having to spawn a shell just to read options from /etc/default/foo, and it's a bit of extra work for a Debian package maintainer to, when upgrading a service to systemd, parse /etc/default and set up systemd to override the options provided to the command.)
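To make that parenthetical concrete, here is a rough sketch of what a shell-based init script can do that a plain unit file cannot: /etc/default/bind9 is a shell fragment, so something has to evaluate it before OPTIONS can land on the command line. The helper name build_named_cmdline is made up for illustration and is not taken from any package.

```shell
#!/bin/sh
# Sketch of the SysV-init-style approach: /etc/default/foo is a shell
# fragment, so a shell is needed to evaluate it.
# build_named_cmdline is a hypothetical helper, not part of any package.
build_named_cmdline() {
    defaults_file=$1
    (
        # Run in a subshell so sourced variables don't leak to the caller.
        OPTIONS=""
        if [ -r "$defaults_file" ]; then
            . "$defaults_file"
        fi
        # $OPTIONS is left unquoted on purpose so it splits into words.
        echo /usr/sbin/named -f $OPTIONS
    )
}
```

Calling build_named_cmdline /etc/default/bind9 on the machine above would print the command line including -u bind -4, which is exactly the step the packaged unit file skips.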
"initscripts" in systemd
systemd starts services by parsing unit files; the system-provided versions live in /lib/systemd/system/, and that's where I found the bind9 unit:
    [Unit]
    Description=BIND Domain Name Server
    Documentation=man:named(8)
    After=network.target

    [Service]
    ExecStart=/usr/sbin/named -f -u bind
    ExecReload=/usr/sbin/rndc reload
    ExecStop=/usr/sbin/rndc stop

    [Install]
    WantedBy=multi-user.target
Hah, there's nothing about parsing /etc/default/bind9 there, so of course my option is being ignored.
Now systemd lets you override how services are started by putting unit files in /etc, so to test a fix I just copied the file over and changed it:
    # cd /etc/systemd/system
    # cp /lib/systemd/system/bind9.service .
    # sed -i 's/named -f/named -4 -f/' bind9.service
But when I ran systemctl start bind9 again, I noticed that the option was still not there. Hmm. The systemctl status output shows in its second line what's wrong:
    ● bind9.service - BIND Domain Name Server
       Loaded: loaded (/lib/systemd/system/bind9.service; enabled; vendor preset: enabled)
It's not looking at /etc. Interestingly, systemd caches its configuration, so to get it to re-read the files you need to issue
    # systemctl daemon-reload
Once you've done that, status will show you the right output:
    ● bind9.service - BIND Domain Name Server
       Loaded: loaded (/etc/systemd/system/bind9.service; enabled; vendor preset: enabled)
and behold, it works:
bind 19165 0.3 1.1 451316 44664 ? Ssl 17:31 1:08 /usr/sbin/named -4 -f -u bind
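If you'd rather not eyeball the status output, the same information is exposed as unit properties. The sketch below is only an illustration (unit_load_state is a made-up helper name): it asks systemctl show for FragmentPath, the unit file systemd actually loaded, and NeedDaemonReload, which systemd sets when it thinks the files on disk have changed since then. I have not checked that NeedDaemonReload flags a freshly added override in /etc on every systemd version, so treat it as a hint rather than a guarantee.

```shell
#!/bin/sh
# Report which unit file systemd has loaded for a unit, and whether
# systemd believes its on-disk configuration is newer than what it cached.
# unit_load_state is a hypothetical helper name.
unit_load_state() {
    props=$(systemctl show -p FragmentPath -p NeedDaemonReload "$1") || return 1
    # systemctl show prints one Key=Value pair per line.
    path=$(printf '%s\n' "$props" | sed -n 's/^FragmentPath=//p')
    stale=$(printf '%s\n' "$props" | sed -n 's/^NeedDaemonReload=//p')
    if [ "$stale" = "yes" ]; then
        echo "$path (stale: run 'systemctl daemon-reload')"
    else
        echo "$path"
    fi
}
```

Usage would be unit_load_state bind9.service, before and after the reload.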
Partial overrides through drop-ins
You can also override a specific stanza by populating a file in the bind9.service.d directory; for instance, instead of copying the whole unit file, you could just:
    # mkdir /etc/systemd/system/bind9.service.d
    # vi /etc/systemd/system/bind9.service.d/no-ipv6.conf
providing the following contents:
    [Service]
    ExecStart=
    ExecStart=/usr/sbin/named -f -u bind -4
That gets read and is applied over what other unit files (in both /lib and /etc) for the service specify. ExecStart is oddly specified twice, the first time with an empty right-hand side. That's because ExecStart is parsed as a list (I suppose to allow executing multiple processes in a single unit file), and in our case we want to override the single named command run, so we need to clear the list first. See Example 2 of the systemd.unit manpage and this great AskUbuntu post for more detail.
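For a non-interactive version of the same drop-in (handy in a provisioning script), something like the following sketch would do. The function name write_named_dropin and its optional directory argument are my own invention; the file contents are the ones shown above.

```shell
#!/bin/sh
# Write the no-ipv6 drop-in without opening an editor.
# write_named_dropin is a hypothetical helper; the directory argument
# exists only so the function can be tried out against a scratch location.
write_named_dropin() {
    dropin_dir=${1:-/etc/systemd/system/bind9.service.d}
    mkdir -p "$dropin_dir" || return 1
    # The empty ExecStart= clears the inherited list before the new
    # command is appended to it.
    cat > "$dropin_dir/no-ipv6.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/sbin/named -f -u bind -4
EOF
}
```

After running it as root you would still need systemctl daemon-reload and a restart of bind9; systemctl cat bind9.service then prints the unit file followed by any drop-ins, which is a quick way to confirm the override was picked up.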
is-system-running, and degraded mode
systemd has a somewhat confusing, but useful command to check overall status:
    # systemctl is-system-running
    running
The meaning of the output surprised me: running means the system is up and all services came up correctly. If one or more services failed to start, you'll see this instead:
    # systemctl is-system-running
    degraded
Surprisingly, I'm not the first person to stumble upon this command, run it, and discover something that I should have fixed. This post on reddit showed me what to do when that happened on one of my servers:
    # systemctl --failed
    smartd
It turns out smartd was completely failing to run:
    # systemctl status smartd
    ● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
       Loaded: loaded (/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
       Active: failed (Result: exit-code) since Thu 2017-07-13 22:27:02 BRT; 47s ago
         Docs: man:smartd(8)
               man:smartd.conf(5)
      Process: 12542 ExecStart=/usr/sbin/smartd -n $smartd_opts (code=exited, status=17)
     Main PID: 12542 (code=exited, status=17)

    Jul 13 22:27:02 chorus systemd: Started Self Monitoring and Reporting Technology (SMART) Daemon.
    Jul 13 22:27:02 chorus smartd: smartd 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-83-generic] (local build)
    Jul 13 22:27:02 chorus smartd: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    Jul 13 22:27:02 chorus smartd: Opened configuration file /etc/smartd.conf
    Jul 13 22:27:02 chorus smartd: Configuration file /etc/smartd.conf parsed.
    Jul 13 22:27:02 chorus systemd: smartd.service: Main process exited, code=exited, status=17/n/a
    Jul 13 22:27:02 chorus systemd: smartd.service: Unit entered failed state.
    Jul 13 22:27:02 chorus systemd: smartd.service: Failed with result 'exit-code'.
Okay, it failed, but it doesn't tell me much about why. To dig in, you can either look at the logs, or use journalctl -u smartd.service.
In the output, nicely colored in red, I got
    Jul 13 19:02:39 chorus smartd: Unable to register ATA device /dev/sda at line 23 of file /etc/smartd.conf
    Jul 13 19:02:39 chorus smartd: Unable to register device /dev/sda (no Directive -d removable). Exiting.
All of this, because the config file had a bad path and I'd never bothered to fix it when I replaced the drives on this machine. Oops!
Finally, there are a bunch of possible states in the systemctl manpage's Table 2 for is-system-running, but you're only likely to see maintenance in real life; the rest are transient states that happen during startup or shutdown (or weirder things).
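Since this check is the sort of thing that belongs in a cron job or login script, here is a minimal sketch of wrapping it up. The function name check_system_state and the messages it prints are made up for illustration; the only systemd pieces it relies on are systemctl is-system-running and systemctl --failed, plus the --no-legend flag to drop the table header and footer.

```shell
#!/bin/sh
# Print a one-line summary of the overall system state, list failed
# units when degraded, and return non-zero unless the state is "running".
# check_system_state is a hypothetical helper name.
check_system_state() {
    # is-system-running exits non-zero for anything but "running",
    # so ignore its exit status and look at the printed state instead.
    state=$(systemctl is-system-running || true)
    case $state in
        running)
            echo "ok: system is fully up"
            ;;
        degraded)
            echo "degraded: failed units:"
            systemctl --failed --no-legend
            ;;
        *)
            echo "state: $state"
            ;;
    esac
    [ "$state" = "running" ]
}
```

Something like check_system_state || echo "go look at the journal" would have pointed me at smartd much sooner.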
Thanks so much to Steve Langasek for giving me some initial hints, and until next time.
I'm not sure why, but the CoreOS docs for drop-in units currently copy the stuff about list settings verbatim from the manpage, but omit the helpful example. Even worse, their SEO beats the manpage!