13 May, 2015

This is Why I Hate Being in IT Sometimes

I had multiple "WHAT THE??!!..." moments today.  I noticed that one of my internal hosts was attempting to connect to my router (which runs Sendmail) via its ULA and getting "connection refused."  All right...that should be a fairly easy fix: just add another DAEMON_OPTIONS stanza to sendmail.mc, rerun M4 (with make(1)) to regenerate sendmail.cf, restart the daemon, and Bob's your uncle.  WELLLLLLL....not quite.

After doing that, I got the following failure on startup:
NOQUEUE: SYSERR(root): opendaemonsocket: daemon mainv6MTA: cannot bind: Address already in use
Huh?  NOW WHAT?????!!!!  OKOKOK...so this should be simple: just back out the change.  After all, I am using RCS for my sendmail.mc file, so I should be able to check out (co) the revision of the file from before I added the additional address to bind, run "make" for a new sendmail.cf, and restart Sendmail.  NoNoNo...some days in IT, it's never that easy.  I still got the same error about the address being in use.  I used "netstat -tlnp" (TCP, listening sockets only, numeric output with no DNS lookups, and show the process associated with each socket) and saw there was nothing listening on TCP port 25...another "WHAT THE...!!!!" moment.

Believe me, this is one of the worst IT positions to be in, when backing out a change still results in a failure.  I even started going to my backups to fish out the previous sendmail.cf.  But then I thought, no, that should be the same as the one generated from the checked-out sendmail.mc, so it's very unlikely to help; no need to keep plugging in the USB HDD.  But this is where the better IT people, hopefully myself counted among them, get the job done.  Just to get "back on the air," I went into sendmail.cf directly, put a hashmark before the DaemonOptions line with address family inet6, and restarted Sendmail.  Phew!  At least that worked; the daemon stayed running.

"OKOKOK...so what do I know?" I naturally asked myself.  The error is "address in use."  For Sendmail, what causes an address to be in use?  Well, that'd be any DaemonOptions lines in sendmail.cf, which are generated from DAEMON_OPTIONS lines in sendmail.mc.  So, next step, find all the non-commented-out DAEMON_OPTIONS lines in sendmail.mc (with grep) and go through them one by one to see if the same address shows up for more than one line.  Well...there was only one line, quite on purpose, whose daemon label is "mainv6MTA" (remember, from the error message), and that is for mail.philipps.us.  OKey dokey, so what is the address of mail.philipps.us?
host mail.philipps.us
(returns just my DHCP IPv4 address. Ummmm...)
host -t aaaa mail.philipps.us
mail.philipps.us has no AAAA record
This of course triggers another "WHAT THE...!!!!" moment.  How the frak did my AAAA record get deleted??  It turns out the actual reason was relatively simple.
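
For what it's worth, the same two lookups can be scripted so that a missing record type is noticed before the daemon trips over it.  What follows is only a rough sketch, not anything I actually had in place: the function names are made up for illustration, and it goes through the system resolver via socket.getaddrinfo (so /etc/hosts entries count too), which is not quite the same thing as asking DNS directly with host(1).

```python
import socket


def lookup_addresses(name, family, resolver=socket.getaddrinfo):
    """Return the sorted, unique addresses `name` resolves to in `family`.

    `resolver` is injectable purely so the function can be exercised
    without touching the network; by default it is the system resolver.
    """
    try:
        results = resolver(name, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # The resolver reports "no address of this family" as an error.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    return sorted({result[4][0] for result in results})


def missing_families(name, resolver=socket.getaddrinfo):
    """List which of A / AAAA yield no address at all for `name`."""
    wanted = {"A": socket.AF_INET, "AAAA": socket.AF_INET6}
    return [rtype for rtype, family in wanted.items()
            if not lookup_addresses(name, family, resolver)]


# Hypothetical usage, e.g. from a pre-restart check:
#     missing_families("mail.philipps.us")
# An answer containing "AAAA" would have pointed straight at the problem.
```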

At some time between when Sendmail had last been started and today, I did indeed manage to remove the AAAA record for "mail.philipps.us."  And now I know why.  As I had recently (well, back in March 2015) changed ISPs, and therefore the IPv4 address I was using, I rejiggered the philipps.us zone file so that a.) I could make updates of the zone programmatic when the IPv4 address changes, and b.) the plain text zone file, with comments and such, is preserved, which precludes using a dynamic zone and updating with nsupdate.  I implemented this well after changing address space, so Sendmail continued to run just fine in the meantime.  The implementation I chose was to $INCLUDE a separate zone file piece which is generated by the DHCP client scripting from a template file (I get an address to use via DHCP).  Sure, there was a comment in the zone file that "mail" had been moved to an $INCLUDE file, but what I failed to realize at the time was that right below what I had commented out was an IPv6 address for "mail.philipps.us"!  For compactness and less typing, I had omitted the owner name on that line, as is common in zone files.

mail    300    IN    A    192.0.2.112
        300    IN    AAAA 2001:DB8:2001:2010::112

became
;;; moved to $INCLUDE file
;;; mail    300    IN    A    192.0.2.112
            300    IN    AAAA 2001:DB8:2001:2010::112

So while doing this refactoring, I commented out the A line, because it would be "covered" in the $INCLUDEd file.  But of course, a line that starts with whitespace inherits the owner name of the record before it, so this "continuation" line took on the name of whatever record came before the now-commented-out A record.  So I had effectively obliterated the AAAA record for mail.philipps.us.  Ouch.
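
To make the inheritance rule concrete, here is a minimal sketch of how owner names get assigned to zone file lines.  It is deliberately simplified and is not how a real name server parses a zone: it ignores $ORIGIN, multi-line records in parentheses, and semicolons inside quoted strings.  The "www" record in the example is a made-up stand-in for whatever actually preceded "mail" in my zone.

```python
def owner_names(zone_text):
    """Return (owner, rest_of_record) pairs for each record line.

    A line that begins with whitespace has no owner field of its own,
    so it inherits the owner of the most recent record that had one.
    """
    records = []
    last_owner = None
    for line in zone_text.splitlines():
        body = line.split(";", 1)[0].rstrip()  # drop comments
        if not body.strip() or body.startswith("$"):
            continue  # blank line, comment-only line, or directive
        if body[0] in " \t":
            owner = last_owner        # inherited from the previous record
            fields = body.split()
        else:
            owner, *fields = body.split()
            last_owner = owner
        records.append((owner, " ".join(fields)))
    return records


before = """\
mail    300    IN    A    192.0.2.112
        300    IN    AAAA 2001:DB8:2001:2010::112
"""

after = """\
www     300    IN    A    192.0.2.80
;;; moved to $INCLUDE file
;;; mail    300    IN    A    192.0.2.112
            300    IN    AAAA 2001:DB8:2001:2010::112
"""

for owner, rest in owner_names(before) + owner_names(after):
    print(owner, rest)
```

In the "before" text both records belong to "mail"; in the "after" text the AAAA record ends up attached to the hypothetical "www" instead.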

Let's review, though.  The error was "cannot bind: Address already in use."  It is, unfortunately, one of those bogus, red herring errors.  For this particular build of Sendmail, instead of reporting the response it got for the inet6 lookup (and therefore the lack of an AAAA record), it (probably) used whatever happened to be in the variable it uses for the IP address.  Chances are that was initialized to all binary zeroes, which is the IPv6 wildcard address (::).  I think that, at least with the kernel implementation I have on my router/Sendmail host, a bind to that wildcard also covers IPv4 addresses (as ::ffff:w.x.y.z addresses), so it collided with the IPv4 addresses to which I had already bound.
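
The two address facts in that guess are easy to check with Python's ipaddress module.  This is only a sketch of the address arithmetic (the helper names are mine); it says nothing about what Sendmail or the kernel actually did, which remains my speculation.

```python
import ipaddress


def is_wildcard(raw16):
    """True if 16 raw bytes decode to the IPv6 unspecified address (::)."""
    return ipaddress.IPv6Address(raw16).is_unspecified


def embedded_ipv4(text):
    """Return the IPv4 address inside an ::ffff:w.x.y.z address, else None."""
    return ipaddress.IPv6Address(text).ipv4_mapped


# A zero-filled 16-byte buffer is exactly the wildcard address.
print(ipaddress.IPv6Address(bytes(16)), is_wildcard(bytes(16)))

# An IPv4-mapped IPv6 address carries an ordinary IPv4 address inside it.
print(embedded_ipv4("::ffff:192.0.2.112"))

# A regular global IPv6 address has no embedded IPv4 address.
print(embedded_ipv4("2001:db8:2001:2010::112"))
```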

I really hate it when things don't work like they're supposed to work.  But indeed this time it was due to my own doing, just that it was a few weeks ago.  This is the sort of thing that really drives me batty about being in IT though, when reverting doesn't work.

