14 November, 2018

Recent systemd Update May Make Your Service Units Not Start Properly


I had set up MythTV on a Raspberry Pi using the mythtv-light repository.  I even used a lot of the suggestions from the MythTV wiki on how to write a systemd service unit file.  It seemed to be working fine: I was watching programs, pulling content from a SiliconDust HDHomeRun Quatro and dutifully recording streams from it to a USB HDD.  Then TV enjoyment disaster struck.

Today I decided to set up another Pi to do the commercial flagging and any transcoding.  (I'm not transcoding currently, but I may start because things like Roku don't support the native recording format, which may be either MythTV's own NuppelVideo format or MPEG-TS; I'm not sure which.)  The plan is also to include MariaDB replication to the new Pi so that the database is fault tolerant too.  I'd like to make a note here that I prefer to do my own OS updates rather than have them done automatically; that way it's far simpler to relate something that starts breaking to a recently performed update than to go back through log files and figure out what changed.  This is on Raspbian Stretch (FYI, Raspbian releases follow Debian releases, and Stretch is the current stable release as of this writing).  Once again this policy proved quite useful.  I remembered that there had been a systemd update very recently, probably between the last start of my mythbackend and now.  Having a User= directive in my unit file had worked just fine up until today, when I restarted MariaDB and then the Myth backend (I don't know whether it tolerates the DB being restarted very well).

Suddenly mythfrontend was complaining that it could not connect to the backend.  Huh... that's odd.  Does systemctl show that it's running?  Indeed, it's running, and it's not exiting and respawning, because its PID is not changing (the unit file specifies that it is to be restarted after 3 seconds if it exits for any reason other than systemd telling it to do so).  But was there a listening socket?  Darn, netstat -tln told me that no, there was not.  There was not much to go on in the log file or the systemctl status output, just something about not having data in a files cache in order to process expirations.  A weird thing was that the HDD activity LED indicated frequent disk access, yet the network switch LED for the HDHomeRun showed little or no activity.  I had no idea what else it would be trying to do with the disk other than recording a program.

I thought, OK, slow way, way down, this is just too freaky.  Even stopping the backend was taking a really long time.  I got impatient and just used killall to try to stop it directly.  That seemed to "help."  So I wanted to see if something would be written to stdout if I ran the backend manually, from an XTerm command line.  And of course, while doing that, I would not use the loglevel option.  But an odd thing happened: it ran normally.  OK, might this have been a temporary anomaly?  I tried once again to start it via systemctl, and no, the failure was consistent.  Listening sockets were never being opened; watch netstat -tln confirmed that.  Starting from the command line (which worked) and watching for the sockets showed that they opened within only a few seconds.

I must say, I have had a lot of experience with things running differently depending on whether they were started from an interactive login or by the system (from init).  It's all in the execution environment.  Unix/POSIX/Linux has so many process properties, but more often than not it's environment variables (LD_LIBRARY_PATH and PATH are two of the most common ones that differ between system and interactive invocation and therefore cause things to fail).  So the next thing to try was commenting out User=mythtv (with "#") and adding /bin/su - mythtv -c to the command specifying how to start this unit.  Bingo, there you go; it started, stayed running, and even more importantly opened the socket listeners.  So hmmm... what else does the system do for interactive logins?  Why, not only does it set HOME (which was already being done in the unit file), but it also sets your current working directory (cwd) to that value!  So hmmm... does a unit file have any directive like that?  Yes, yes it does: WorkingDirectory=.  So I restored User=mythtv, set WorkingDirectory= to /home/mythtv, and it worked!  For some really oddball reason, mythbackend will not open its sockets or operate normally unless the cwd is set like that.  I have to wonder what systemd will choose for a service's cwd if you don't specify it, maybe the root.  Moreover, I don't know whether User= previously changed the cwd to the home directory listed for the user (usually in /etc/passwd) or not.
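For anyone wanting to try the same fix, here is a minimal sketch, not the exact files from this setup.  It writes a systemd drop-in that pins WorkingDirectory= and adds a small helper that reports the cwd of a running process by reading /proc.  The unit name mythtv-backend.service, the file name workdir.conf, and both function names are assumptions made up for illustration; substitute your own.  On the open question above: the systemd.exec documentation says that when WorkingDirectory= is not set, a system service defaults to the root directory.

```shell
#!/bin/sh
# Sketch only: "mythtv-backend.service" and "workdir.conf" are assumed names.

# Write a drop-in that pins the service's working directory.
# $1 = drop-in directory, e.g. /etc/systemd/system/mythtv-backend.service.d
# $2 = working directory to set, e.g. /home/mythtv
write_workdir_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/workdir.conf" <<EOF
[Service]
WorkingDirectory=$2
EOF
}

# Print the current working directory of a running process.
# $1 = PID; /proc/<pid>/cwd is a symlink maintained by the Linux kernel.
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# Example (as root, unit name assumed):
#   write_workdir_dropin /etc/systemd/system/mythtv-backend.service.d /home/mythtv
#   systemctl daemon-reload
#   systemctl restart mythtv-backend.service
#   proc_cwd "$(systemctl show -p MainPID --value mythtv-backend.service)"
```

A drop-in leaves the original unit file untouched, which makes the change easy to back out.  Running proc_cwd against the service's main PID before and after the change shows directly which directory the daemon was started in.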

Hopefully this story will help someone whose service daemons have stopped working.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!