06 March, 2017

Xorg Voodoo, and It Really Pays to Practice Your Backups and Restores

I wish Xorg/Wayland/Weston was not so much black magic voodoo juju.

For some experimentation and hacking fun, I installed gdm3 on my Xubuntu desktop system.  I set up gdm as my default DM (dpkg-reconfigure lightdm and select gdm from the list), and then I ran systemctl stop lightdm and systemctl start gdm.  That was somewhat of a visual shock, because I had never run gdm3 before, but nonetheless, it was usable.  I logged in as my normal user.

I had some "normal" logins, where GDM started up my "normal" Xfce session.  Then I decided I wanted to see if the "Weston" option worked, as it had not under LightDM.  Shazam: whatever GDM does that LightDM doesn't, I don't know, but that entry worked.  Likewise I fiddled a little with "GNOME on Wayland", which was interesting.  It's the first time I've ever used (I think it's called) the Lens.  Meh.  It's OK I guess, but I miss my menus of applications and such; I don't like the Lens all that much.

One of the first things I noticed was that "log out" was not part of the dialog like it was under Xfwm/LightDM.  There was only poweroff, reboot, and suspend.  Huh?  That seemed kind of weird.  Eventually, I found out (I don't remember where) that there was a separate logout option.  Then I hopped on over to tty1 and did systemctl stop gdm (might have had a 3 too).  Then systemctl start gdm.  Wow, that was really weird.  GDM didn't start, but it looked like it was trying several times per second.  It was even difficult to type systemctl stop gdm because a couple of times a second, input was being stolen by the process trying to start GDM (or Xorg, not sure which).  In fact, I don't know what the deuce was going on, but I could start neither gdm nor lightdm.

At this point I recalled having seen the systemd unit file for lightdm and remembered it had a test for the default display manager.  I would have figured dpkg-reconfigure would just use systemctl enable and systemctl disable, since it "knows" the list of DMs, but it writes /etc/X11/default-display-manager anyway.  Okeydokey, I did another dpkg-reconfigure lightdm and selected lightdm.  That still wouldn't start either.  Well, neither would gdm, so I rebooted.

The first surprise came when LightDM started much sooner than I expected.  I had set up lightdm to be disabled, because I want all the stuff which happens at boot to settle down first, and only then start the display server.  I do this in rc.local by backgrounding a shell script which sleeps 20 seconds and then does the appropriate thing for the service.  It used to use an appropriate Upstart command, but of course when upgrading Xubuntu from 14.04 LTS to 16.04, it had to be updated to use systemctl(8) instead.  But it seems the dpkg-reconfigure had undone any enable or disable, since I had not even gotten the prompt on tty1 before the VT was changed to tty7 to start the X server.  Meh, OK, I recognized this and just disabled lightdm again.
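For reference, the delayed starter amounts to something like this in /etc/rc.local (a sketch, not my literal file; the 20 seconds and the lightdm unit name are just what I happen to use):

    #!/bin/sh -e
    # /etc/rc.local (sketch): let the boot-time churn settle down, then
    # bring up the display manager by hand.  lightdm.service itself is
    # left disabled so it does not start on its own.
    (
        sleep 20
        systemctl start lightdm
    ) &
    exit 0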

I had fun experimenting with starting Weston (like loading different modules in the [core] section).  One thing that didn't work too well was using drm-backend.so for Weston.  That not only killed Weston, but whacked Xorg too.  Things got a little wacky after that, in that I had problems switching VTs.  I had to log into another host on my network, SSH back to the workstation, and systemctl reboot.  After all, a computer isn't particularly useful if you can't type at it, when that's the way you normally give it input.

I got tired of going through dpkg-reconfigure to switch DMs, so I just edited /etc/X11/default-display-manager directly.  That seemed to be OK, but eventually, I got to a point where GDM wouldn't start, and LightDM wouldn't start either.  Huh, that's weird.  So I restarted the whole system.

Then there was the chilling realization that systemctl start lightdm did not do a whole lot except throw errors I could not understand into the systemctl status lightdm and journalctl outputs, like stuff about some assertion failing.  I'm sure if I wanted to take the time to download the ENTIRE SOURCE package for Xorg, I might see what that assertion does and why it was failing, but I was not about to take all that time to futz around with that.  What I thought might have helped is that I have an Xorg "prestart" script which sets the screen saver timeout and DPMS, changes the root window background color so that I know Xorg is running before LightDM can initialize, and uses some xrandr commands to set up the resolutions and refresh rates of the two framebuffers/monitors.  (Xorg cannot read EDID information because the switches through which both monitors are connected mangle EDID, so it uses defaults...and that's just really ugly.)  While I was writing that prestart script, I redirected its stderr to an unused tty.  All I got on that VT was messages about "can't open display."  In retrospect, what I really should have peeked at was /var/log/Xorg.0.log for clues, but it would not necessarily have revealed anything I could understand.
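For the curious, that prestart script is roughly along these lines (a sketch; the output names and modes are placeholders for whatever xrandr reports on your hardware):

    #!/bin/sh
    # Xorg "prestart" sketch: run against the freshly started server,
    # before the greeter takes over.  Output names and modes below are
    # placeholders, not necessarily what my hardware calls them.
    export DISPLAY=:0
    xset s 600                      # screen saver timeout, in seconds
    xset dpms 600 900 1200          # standby/suspend/off timeouts
    xsetroot -solid midnightblue    # visual cue that Xorg itself is up
    # EDID gets mangled by the switches, so force sane modes by hand:
    xrandr --output DVI-I-1 --mode 1920x1080 --rate 60
    xrandr --output VGA-1 --mode 1920x1080 --rate 60 --right-of DVI-I-1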

I tried another dpkg-reconfigure to make sure whatever needs to be done to switch DMs was done, figuring it might be more than just rewriting the /etc/X11/default-display-manager file.  That, unfortunately, was no help whatsoever.  Restarting the system did not help either.  I remove/purge'd the gdm3 package; no help.  I reinstalled the lightdm package; that wasn't any help either.  Sigh.  It was going to be a really bad day if the only thing which would get my daily driver back was a Xubuntu reinstallation and reconfiguration.  At least the vast, vast majority of my personal settings and data are on a separate /home logical volume.  I could very likely keep all the logical volumes and filesystems (remade, except for /home of course), so it wouldn't be like a blank-disk installation.  I have to imagine there is unfortunately a somewhat large portion of *buntu users for whom that would be the only option, because they're just not that experienced or learned in operations at this level.  Most folks don't need to be, because their systems just work, they get their work done, and the amount of experimentation, especially at the system level, is minimal.

Next I did something I do very rarely, which is select the entry for system recovery at boot.  I figured I needed as little as possible running for what I was about to do next.  Ugh.  That is really ugly because of the nomodeset option; I am really, really used to the VTs coming up 1920x1080 (or 240x67 characters).  So I restarted and edited the default entry instead, adding "single" to the end of the kernel command line.  Since pretty much all the configuration is held in /etc, I figured out which disk and logical volume held my last backup (made right around midnight Sunday; I started it and went to bed) and mounted it.  Then I did rsync -av --delete /mnt/bkup/thishost/etc/. /etc/. to get the /etc directory back to how it was.  That went really quickly, as you can imagine.  Then I just hit Ctrl-Alt-Del.  That of course unmounts the LV on the USB disk and deactivates all the USB LVs, leaving everything buttoned up and ready to restart.

Except that didn't help either.  I even tried unplugging my computer for a while, figuring it was some really weird juju with how the video controller was being initialized...hoping that letting the capacitors discharge would unstick this lack of Xorg starting.  No, as I could have predicted, that really wasn't it either.

The semi-weird thing is, while logged in as the superuser on tty1, I could run Xorg :0 just fine.  Of course, that's not particularly useful by itself, but at least it proved it was not a hardware failure, or corrupted driver .so files, or something like that.  The X server itself would start; it's just that lightdm couldn't start it and use it.  Well...come to think of it, the screen was initialized to all black, not the gray dot pattern it usually shows, and no big X cursor appeared.  Not sure what was up with that.  At least it didn't go, as it sometimes will when it's failing, to VT 7 and do nothing but leave the blinking text-mode cursor there.

I was getting really discouraged (and a little panicked, to be honest) at this point.  I thought it was going to be hours before I was up and running again.  I was starting to think: how am I going to fetch the ISO to do another installation?  Can I get one effectively with one of my other systems, likely with Lynx?  I mean, as IT disasters go, this is pretty mild, because at least there is a "known way out" (namely OS reinstallation) which is nearly guaranteed to get the blasted thing working again.  It's just the thought of the long, long time it was going to take, with all the work of installing the packages I like, which basically has to happen after the standard installation finishes.  It could be a lot worse; it could be the CPU itself which doesn't work, and I'd have to go back to a LOT slower machine (from a Core 2 Duo to a Pentium IV).

Sigh.  OK, I wasn't too sure about using my complete backup.  I use a number of --exclude= directives when I do the backups, but I'm never quite sure if I am excluding enough.  For example, it'd probably be less than a good result if the LVM information were overwritten (so actually, that's already excluded).  And sometimes the mere presence of files can make a difference, so of course you're going to have to use the --delete directive.  I'm thinking, if this obliterates the wrong things, it's going to be a long, arduous reinstallation process, but hey, a full restore is at least worth a try.  After all, like YouTuber AvE often says, if it's broken, how can it hurt a whole lot to break it some more?  Worst thing that happens is my restore methodology writes zeroes over everything, and I have to reinstall everything anyway.  Surely it would not take a whole lot of time to MUNG things to the point where OS reinstallation becomes a certainty.

So with some trepidation, in single-user mode again, I mounted up the last backup, but I was still unsure of what I was about to mangle, so I added the --dry-run option to rsync.  And boy am I glad I did.  When you go about deleting things like lost+found, and bad things™ happen, even worse things tend to happen when fsck is trying to set things right and it can't write to lost+found because it's not there.  It's also not particularly useful to go mucking about in /sys or /proc.  I definitely didn't want to get into a loop trying to do untoward things with /mnt/bkup/thishost, so I knew enough to mount the backup read-only, but I still figured out that what I really wanted was to exclude everything under /mnt.  I also chose to exclude everything under my $HOME, even though that meant some of the session files already under there could still screw with logging in under Xfce (or who knows; one time I got auto-logged into Weston when I didn't mean to, so it must have stuck as the last thing I tried in the greeter).
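The rehearsal invocation ended up looking roughly like this (reconstructed from memory, so treat it as a sketch; the mount point, the username in the exclude, and the exact exclude list will vary):

    # single-user mode, backup mounted read-only at /mnt/bkup
    rsync -av --delete --dry-run \
        --exclude=/proc --exclude=/sys --exclude=/mnt \
        --exclude=/home/joe --exclude='lost+found' \
        /mnt/bkup/thishost/. /.
    # when the dry run stops listing scary deletions, drop --dry-run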

So eventually I settled on a pretty significant set of --exclude's and let it rip.  As I had been experimenting with --dry-run a number of times, much of the needed data was already in the block cache, so it was only a few minutes later that rsync said it was finished.  I restarted, unplugged the backup disk's PSU while the BIOS screen was showing (yep, it's that old, not UEFI), and let GRUB do its thing.  And...

Success!!


I killed my little delayed DM starter script, did systemctl start lightdm, and the system once again looked normal.  Of course, since the system is on a conventional SATA disk (not an SSD), it took agonizingly long to initialize, but I knew things were likely going to work OK because I got my normal prompt from ssh-agent to enter the passphrase for my private keys in an XTerm.

What I'd really, really like to know is: what caused LightDM not to be able to start Xorg?  That's the voodoo juju part of all this.  You'd really hope that something particularly helpful would be in the journalctl output, or systemctl status, but alas, no help was forthcoming.  These days, if you don't have a working graphics environment where you can run a browser with JavaScript capabilities, lamentably you're at quite a disadvantage in researching possible causes and remedies.  The usual copying and pasting of an error message into a Google search is going to be quite difficult.
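For anyone who winds up in a similar jam, the places I poked at (with nothing conclusive, mind you) were roughly these; nothing here is exotic, it's all stock tooling:

    systemctl status lightdm --no-pager    # where the assertion complaint showed up
    journalctl -u lightdm -b --no-pager    # same story, with a little more context
    cat /etc/X11/default-display-manager   # should name the DM you think is the default
    systemctl cat lightdm                  # shows the unit file and its start conditions
    less /var/log/Xorg.0.log               # where I should have looked sooner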

The long and the short of it is, it's really a particularly good idea to practice restores every now and again.  It will point out deficiencies in your backup methodology, your restore methodology, or maybe both.  In any case, with such practice, it should speed up recovery from being in a jam.


Direct all comments to Google+, preferably under the post about this blog entry.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!

Please join one of the fastest growing social networks, Google+!

02 March, 2017

Yesterday, I knocked myself off the Internet

Yesterday, I spent some time poking about the Actiontec MI424WR (Rev I) that Verizon supposedly provided "for free" as an incentive to subscribe to FiOS.  Supposedly, in order to be (fully) supported, you need a Verizon-approved router, such as one of these.  I'm not sure their site allows one to complete the online (service) order without agreeing to rent, purchase, or otherwise prove you have (or will have) one of Verizon's routers.  You may be able to finish the order these days, but as I recall, two years ago when I was ordering, their maze of forms and JavaScript wouldn't allow a submission without one.  Anyhow...I read a recent thread on DSL Reports about residential-class accounts being able to have static IP addresses (they don't allow that), and the workaround of using dynamic DNS services prompted me to start poking around to see which services (dyndns.org, noip.com, etc.) the Verizon router supports directly.

I have VLANs set up on my switch, one for TWC/Spectrum WAN (although I don't subscribe to any of their services presently), one for my VOIP LAN, one for most of the rest of my LAN, one for FiOS WAN, and one for the FiOS LAN.  The nexus for everything is a PC running Linux functioning as a router.  I knew there was the possibility of an address "conflict" if I plugged in the WAN port on the Actiontec (because Verizon only allows one DHCP lease at a time) so initially I powered up the Actiontec with the WAN cable unplugged.

After puttering about with a lot of its settings (ugh, I hate the Actiontec Web interface), I'm not sure what possessed me, but I thought, hey, my Linux router already has a DHCP lease, and since Verizon's systems will only allow one lease at a time, plugging in the Actiontec WAN cable should be no problem.  If it tries to obtain a lease it will just be denied, whether by DHCPNAK or by timing out.

Emmmm....wrong!  Very shortly after plugging in the WAN cable, the "Internet" LED came on.  First I thought, "wait, what?"  That was shortly followed by "oh, crap!"  Sure enough, I logged onto my "production" router, tried the usual "ping 8.8.8.8", and there were no replies whatsoever.  There isn't anything of consequence connected to the LAN ports of the Actiontec; it was pretty much just connected so that I could get in to configure it, and possibly switch things up a bit if a Verizon TSR demanded to have it online.  So either the lease which Linux had obtained was somehow "transferred" to the Actiontec, or the lease Linux had was invalidated, and at any rate, in that state the Linux router was of no (WAN) use.  (It still routed just fine between all the LANs.)  I basically knocked myself off the Internet, because nothing on my network is set up to operate through the Actiontec.  I thought, you idiot, you should have logged onto the switch and issued "shutdown" to the interface for the Actiontec WAN port first.

As you may gather from some of my previous postings, here on the I Heart Libertarianism blog or on Google+, I get pretty anxious about not having Internet connectivity, so to lift a line from Dickens, this was not the best of times.  I think this is mostly because I have the family's email server here, not to mention virtually all the important notifications I have would go to a philipps.us or joe.philipps.us address.  It's also the DNS master for a number of my domains, including philipps.us.  I know, I know...the TTLs on the SOA records themselves should make them valid for two weeks, so even without Internet for an extended-ish time, things should not fall apart entirely.

Email servers very typically keep retrying for several days, maybe even as much as a week, so that should not be so terrible.  As a further mitigation of any failure of my email server here, it just so happens I was one of the people who got in on the "ground floor" when Google was beta testing Google Apps (the Web services, not the usual meaning these days of the apps to access Google on Android). As a consequence, I have a "no cost" G Suite configuration as a less preferred MX.  Therefore, it would be somewhat messy from an email history standpoint, but a catchall account on G Suite would have any email which my setup cannot suck in.  Still...I think it's the thought that without Internet, even that backup setup is no good because I can't get to it.  I would have to "borrow" someone else's Internet access even to see what's over at my G Suite account.

This would be compounded by the fact that these days, many of my access passwords are utter gibberish, thanks to KeePass and KeePassX.  The database is in my Google Drive, but also backed up on my local computer.  The implication is that it's another one of those "bootstrap" problems: without Internet, I don't have access to the master KeePass database, and even if I work from the copy, say from a computer at the Erie County library, it's going to be a LOT of tedious typing, because the library's computers are likely not going to be able to run the KeePass software.  I'd be working by revealing the decrypted passwords in KeePassDroid on my Nexus 7.  For any Web service which will accept it, I turn on basically every printable character except space in the KeePass generator, typically at 20 characters.  So yeah....lots of tedious typing if I have to use another computer.

Despite the minor panic I was in, I thought, come on, this shouldn't be that difficult; you really should have a way out of this.  You can try ifdown on the Internet interface (it happens to be eth3), followed by ifup.  Nope, that didn't really do anything.  Just calm down a little and work the problem.  If you get back on the Actiontec, you should be able to poke and prod your way around it and find the "release DHCP lease" button, which you know is in there somewhere.  That would at least mollify Verizon's backend(s) (or the ONT) into letting Linux get a usable address again.  That was in fact the key.  After hitting "release" on the Actiontec, I was able to ifdown/ifup one more time, and Linux got an address/lease.  However...it was not the IPv4 address I had before.  Rats.
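For the record, the recovery on the Linux side was nothing fancier than this (eth3 being my WAN-facing interface):

    ifdown eth3                  # after hitting "release" in the Actiontec's interface
    ifup eth3                    # re-run the DHCP client for a fresh lease
    ip -4 addr show dev eth3     # confirm the (new) address
    ping -c 3 8.8.8.8            # and that the world is reachable again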

As mentioned, the whole exercise started with wondering about the Actiontec's implementation of dynamic DNS, and updating DNS was now precisely what I needed to do.  This happens so infrequently that I have a Google Keep checklist for IPv4 address changes (which I have exported to a Google Doc for linking in this blog entry).  I have that accessible on my Nexus 7, so all I had to do was find it on there, and I was good to go.  I copied the list, renamed it with "1-Mar-2017" in the title, and went about executing its items.

There were items on the checklist that I still had to figure out on the spot.  For example, for some of the items, I did not know the pathnames of what needed changing, or what item in the relevant file.  So in a sense, it's good this happened, because it has made me refine the process and therefore improve it.  Still, it's a pain whenever my address changes.  Some of it could probably be scripted or automated, but it's one of those things that happens so infrequently, I have to wonder how much utility there is in writing anything.
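If I ever do script part of it, the obvious first piece would be a simple change detector along these lines (a sketch only; eth3, the state file path, and mailing root are all just placeholders for however I'd actually wire it up):

    #!/bin/sh
    # Nag me when the WAN IPv4 address changes; the checklist items
    # (DNS zones, dynamic DNS updates, etc.) would hang off of this.
    STATE=/var/lib/wan-addr/current
    mkdir -p "$(dirname "$STATE")"
    NEW=$(ip -4 -o addr show dev eth3 | awk '{print $4}' | cut -d/ -f1)
    OLD=$(cat "$STATE" 2>/dev/null)
    if [ -n "$NEW" ] && [ "$NEW" != "$OLD" ]; then
        echo "WAN address changed: ${OLD:-none} -> $NEW" \
            | mail -s "WAN IPv4 address change" root
        echo "$NEW" > "$STATE"
    fi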

Anyhow...obviously, I'm back online, or I couldn't be posting this.  Hopefully I'll be better prepared for the next time my address changes.


Direct all comments to Google+, preferably under the post about this blog entry.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!

Please join one of the fastest growing social networks, Google+!

24 July, 2016

Some Company Practices Are Just So Nonsensical

Sometimes I just have no understanding of company policies.  I think I need to remind the reader that there's no such thing as a free lunch (TNSTAAFL).  Things that appear to be free are never genuinely free, there is always a price to be paid for anything, especially when dealing with a company.

I don't know, maybe there is some law which says billing has to be like this, but I'm dumbfounded by my recent experience with Time Warner Cable (can we call them Charter or Spectrum?  or not yet?).

I just couldn't justify paying O($25) per month for a service of which I rarely partake.  (I'll use this quasi- "big O notation" throughout this entry to mean "on the order of."  It has a similar but not quite the same meaning in mathematics and computer science.)  I find a lot of entertainment in YouTube videos for example, and something like Netflix, Vudu, Amazon Video/Amazon Prime, Hulu, etc. would be half the cost or less.  So I used the expiration of my SchedulesDirect subscription as a prompt to tell them, "sorry, disconnect me, please." They even offered to knock around $10/mo. off, but still, I don't watch that much on there.  After all, how many times can you watch the same episodes of M*A*S*H on WBBZ?  I must have more than 100 hours of recordings waiting to be watched on my MythTV.

A few days after I called in the disconnect order, I realized I never asked the CSR what my final bill should be, since it would be prorated.  It's also worth noting that billing is explicitly ahead of time; that is to say, your payment is for the month to come, not for the month that ends on your billing date.  I hopped on their Web chat, and they gave me a figure for the partial month's service.  I thought I had disabled automatic payment, but they advised me not to pay anything.  However, I pointed out to them that if I didn't pay for my partial month and just waited for another bill, I'd be paying late, which I would rather not do for ethical reasons, plus a possible hit to my credit rating.  So I logged into TWC's site, scheduled a payment for that reduced amount, and thought that would set things square.  But no, I was wrong to think that.

For starters, there is ridiculous lag with those folks.  It turns out if you want to cancel automatic payments, you should do that before your bill arrives; otherwise an automatic payment will be scheduled anyway.  It doesn't matter that the amount to be billed might change between the bill being issued and the payment due date, which is weeks away.  I should also note here that Discover's Freeze feature does not apply to certain classes of payments, including automatic charges from utilities (exactly like TWC).  I would have liked to prevent TWC from making any further charges to my payment card, but when I asked Discover about this, their answer was that indeed Freeze would not apply to TWC, and that charges cannot be disputed before they actually post.  You'd hope, wish, and think that the amount charged on the due date would be at most the difference between the previous bill and any payments already collected.  Sorry, there will be no such luck.

When the original payment date arrived, I logged onto my Discover account and noticed a pending charge for the full month's amount.  So I called up TWC and asked, what gives?  Oh, yes, we see your payment for the partial month here, and we also see that the automatic payment went through; we'll just refund that full month's amount to your payment card.  Keep in mind TNSTAAFL, because toll-free number (TFN) per-minute charges had to apply, maybe with telecomm charges on top of that (depending on how their toll-free routing is accomplished and their deal with their telecomm provider).  Plus the agent to whom I spoke had to be paid, the workstation he used had to be paid for somehow, the network to which that workstation was attached had to be paid for, the IVR which routed my call had to have been purchased and maintained, and so on.  Even more significant than these minor overheads, I happen to know from working for a small business which accepts credit cards that there is a per-transaction charge plus a percentage of the purchase for accepting a credit card payment, typically around $0.30 plus 2%.  So that means at LEAST O($0.60): $0.30 for my manual partial-month payment and another $0.30 for their erroneous charge.  I honestly don't know how much the transaction charge is for refunds.

An aside: I used to get my hair cut at Fantastic Sam's all the time.  I thoroughly realized why they had a sign that read something to the effect of, "minimum credit card payment is $10."  Their analysis likely figured out that paying the average shop rent, their stylists, etc. had not much margin, and that the credit card processing costs would cut significantly into that margin, if you'll pardon the pun.  As I recall, haircuts were $7 at the time, so no, I couldn't use a card, had to pay cash.

Well enough; a few days later, I saw both the charge and the refund were finalized on my payment card account.  But that wasn't the end of it.  No, that'd be too logical.

About two weeks before posting this, TWC sent a bill by mail.  It turns out, bottom line, the chat rep got the amount wrong; I still owed another $1.04.  Despite my not being a customer anymore, they also sent along a pamphlet about their billing practices, "instructions on how to use your TV service," pricing, digital cable, remote controls, parental controls, and on and on.  Of course, they also sent an envelope, assuming I'd be writing a check and mailing it back.  Of course, again, TNSTAAFL.  They printed two "letter" size pages, one with the relevant billing information and another with ads for even more expensive services (O($90)/mo.), plus the mentioned pamphlet and the return payment envelope, stuffed the whole thing into an outer envelope, and paid postage to send it to me.

So...I went to log onto TWC's site again to make my $1.04 payment, of which I knew a significant fraction would be eaten up by transaction fees.  So be it; I didn't want to deny them payment, even of that little amount, and I wouldn't want the potential damage to my credit rating.  Except...that was not possible, because TWC had decided to shut down my Web access.  So what's a debtor to do?  Why, make a call to a TFN to authorize another charge card payment, of course.  I realize at this point I could have written a check and mailed it to them, which perhaps wouldn't cost them as much.  However, I have scant few of those left, and I don't want to "waste" them when other payment methods are available, lest I have to spend several dollars to have many checks printed which I'll not likely use (especially considering they should be printed as Key checks instead of First Niagara checks).  So there were more per-minute TFN charges, more agent time, and so on, plus another payment card transaction fee of course.  This was quickly eating away at the $1.04 that I owed.  And then you'd think that'd be the end of it, but I'd be wrong again.

A few days ago, TWC sent another bill, this time showing $1.04 as a previous balance but nothing in payments, for a bottom line of a $1.04 balance.  And once again, a return envelope was included, another one-page ad for services, yet one more copy of the policies pamphlet, an outer envelope, postage to send it...just ridiculous in the grand scheme of things.  Noticing that there was no credit for my payment of a few days prior, I called them again to ask, what gives?  So of course there had to be more TFN charges, more agent time, and so on.  The CSR assured me the second bill had been printed and mailed just prior to my arranging that payment, and that indeed my balance was zero.

I have little doubt that at least one more mailing will be made showing that my $1.04 has been paid, my balance is zero, another sheet telling me that I can have a great triple play experience for four or more times what I was paying (despite telling them on the disconnect call what I was paying was too much); who knows, probably another policies pamphlet, and perhaps even though I don't owe any money, another envelope to return my (non)payment.

Let's take a moment to compare and contrast this with the IRS and some other companies, namely USA Datanet, Ink Bird, Monoprice, and Newegg.



First, I'll say I can't believe I'm using this as an example of doing it better, but the IRS is more sensible, at least in one limited way.  I know my $1.04 is just outside the stated limit, but if you owe less than a dollar to the IRS, the feds are willing to call it even, and you don't have to pay.


I used to be a customer of USA Datanet.  Their business model was charging a cheaper per-minute rate for long distance calls, only $0.10/minute, and transporting the calls with VOIP instead of conventional telephony infrastructure.  Plus, as a promotion to join, they gave away 60 minutes "free."  The trouble is, I'm pretty sure they never profited from having me as a customer.  I live in New York and have siblings in Texas, but at the time, we were both kind of busy with our jobs, so we would rarely talk.  For my first call, I think I talked for about 70 minutes.  So that was my free 60 minutes, plus a dollar.  They sent a bill, as you can imagine including an envelope for returning a payment, but I paid with a MasterCard.  So that was a printed page, plus the cost of two envelopes, plus postage, plus the transaction charges for MasterCard.  The next month, they sent another statement, showing that I owed a dollar and paid a dollar, so that was a printed page, an envelope, and postage.  Over the next couple of years, every few months I would make a call and leave a message on my sister's answering machine (this was well before voicemail), which would generally only be a minute or less.  So I would be sent a bill for $0.10 or $0.20, I would pay with MasterCard over the Web, and the next month get a statement saying I paid what I owed.  This was stupidity similar to TWC's: spending more to bill and send statements than there was revenue.

What I fail to understand is that USA Datanet called me up one day wondering why I had not used their service in many months.  After all, the way you would make calls is to call a PoP and enter your account codes, and then the desired destination.  Therefore they could tell when I would call, but more importantly, there was no obligation to use their service, because it was not billed from a specific line/phone number.  I just told them there was no reason; I would still use their service if I really needed to talk to my sister, just that I hadn't in quite a while.  You would have thought they would have done some analysis deeper than "Joe hasn't used our service in a while" and figured out this is not exactly a profitable customer to have.  Or...I suppose they were trying to entice me to make some longer calls, like the first longer-than-an-hour one that I made, so I would be a profitable customer.

We now see the result.  USA Datanet is out of business.  Maybe their business model just doesn't have the appeal that it once did, or maybe they spent too much on providing service for more than the revenues they were receiving.



A while ago I got annoyed at the unreliability of the mechanical thermostat in my main floor refrigerator (I have another in my basement).  So through Amazon I bought an Ink Bird electronic thermostat.  About 6-8 months in, the thermostat was working OK, but it would not accept any commands to change or display anything; it is stuck at 1.9° ± 2°.  When it powers up, it shows "1.8," which I can only assume is some sort of diagnostic code.  I wrote an email using Ink Bird's Web site asking if there was anything I could do, such as reflowing solder.  Even though they are based in China, I'm going to guess they were reluctant to tell me to do much with this thing, figuring I'd do something to it which would cause me to be shocked.  I also sent them a very short video of their unit being plugged in, and the "1.8."  Eventually they asked for my Amazon order number, and sent another thermostat.  I explicitly asked if I would have to return the defective one, and they said no.  They realized that it would cost O($8-$10) to ship it back, which is about half the cost of a new one.  They just figured this as a cost of doing business.  The thing is, I bought two more, figuring I eventually want to replace the basement's mechanical thermostat, as well as have a fully working one for the main floor.  Plus I have the one they sent as a spare, should one of the two active ones fail.  So for the price of 3, I basically have one for free.  And the original one still works; it's just not adjustable, and is set for an awfully chilly temp.



A couple of months ago, I was frustrated at not being able to connect up all my audio gear simultaneously, always wanting for this cable or that cable, so I ordered a bunch from Monoprice ("MP").  The pickers must have been having a bad day because instead of getting the 5 Y cables of 3.5mm plug to left and right RCA jacks, I got 5 cables of RCA plug to two RCA jacks.  (In warehouse terms, there is "pick," where items are taken off shelves or similar, "pack" which is readying them in packages in preparation for shipping, and "ship."  A lot of operations have people do the picking, hence "pickers.")  Both varieties of cable retail, from them, for O($0.80) each.  I contacted MP, they redid that part of the order, and they said just dispose of, or use, the erroneously sent cables.  They realized that the lost value was O($5), and that it'd be around the same dollar amount for me to ship the erroneous cables back. They're cognizant of the price of good customer relations and the economics of the goof.



Finally, I will relate a tale of Newegg.  I ordered a number of items, one of which was a 3 m HDMI cable and an HDMI to DVI-D adapter, sold as a single item.  When the combination arrived, it appeared as if the cable had been picked but not the adapter.  I will admit, I made a goof: as I opened the bubble pack they came in, the adapter fell out and out of sight.  What Newegg ended up doing was refunding that item of the order and telling me to use the money towards the purchase of the "missing" adapter.  When I realized my error a few days later and found the adapter, I called them up again, asking them to charge me for the combination item.  They said something like: tell you what, for this one we appreciate your honesty, but we're just going to call it even; this really doesn't "hurt" us.  Part of the issue was that Newegg has a strict policy of not taking any orders over the phone.  I'm sure they have experience showing this is their most cost-effective way of doing business, either because they've gotten burned by phone orders before, or because their considered analysis is that maintaining Web servers and networks is far cheaper than agent time.  While in the short term this was a "minor" loss for them, I subsequently made more purchases from them instead of Amazon specifically because of their willingness to give me the benefit of the doubt on that one set of orders.


I can only guess at the economics of the IRS and their banking services.  I will guess that handling amounts of only a dollar might cost them more in processing fees or labor (especially if I were to mail in a check) than they would gain.

I can understand MP being wise and realizing the perspective of most customers: most wouldn't want to pay, in shipping alone, the value of the returned items for something the company goofed in doing.  On the other hand, they trusted me an awful lot that I was telling the truth and not trying to scam them out of goods.  At least I'm reasonably sure that since they ship in bulk, they do not pay the same rates as a "retail" shipping customer like me.  Still...there was an extra small box, more pickers' and packers' time, and so on.  All I can hope is there was enough margin in the other items I bought that day that at least they broke even, or came close.  What made me a little nervous about the long-term viability of that company is that it took over a week for the "RMA" to go through, and when I asked about it, they said their RMA people were backed up more than usual.


Ink Bird likewise had a lot of faith in me (I could have sent them a doctored video showing the "bootup" of their thermostat and the brief showing of "1.8"), and likewise was willing to take a similar hit in a second thermostat, a second box, more shipping, etc. for improved reputation.  I don't know what the warranty is supposed to be on those ITC-1000 thermostats, but somehow I don't think I was in the warranty period.

But sometimes businesses seem really stupid, as I've related with USA Datanet and Time Warner Cable (TWC).


Direct all comments to Google+, preferably under the post about this blog entry.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!

Please join one of the fastest growing social networks, Google+!

22 July, 2016

An Annoying Week With Allwinner

Many months ago, I got an Allwinner A33 Q8H tablet, a BrightTab, from Ollie's Bargain Outlet.  I do not know why, but one day a couple of months ago, it would not authenticate with Google no matter what I did (wipe all the tablet's data, etc.).  I'm guessing the firmware that was on it got either obsoleted or blacklisted (it would fail more or less immediately).

Earlier this week, since my Nexus 7 has developed distorted audio, I decided I was going to try to revive the Google functionality on this BrightTab, since it's not particularly useful to me w/o Google cloud services.  In an attempt to correct this authentication failure, I copied some .apk and .odex files over from my Nexus 7.  Except I made one fatal procedural flaw: I did not back up one of the BrightTab's original .apk's in /system/priv-app.  And since the ones from the Nex7 did not work at all with this Android, it got into a bootloop, basically an unrecoverable one (since I did not have any of the original firmware anywhere), with essential processes dying over and over again.


Unfortunately, I was in for a rude awakening about Allwinner.  They seem to have implemented fastboot, but ultimately, it's not all that useful.  With most Androids, you can take partition files and flash them from fastboot.  Buuuuuuut.....Allwinner decided to go their own way and create a whole different way of packaging firmware: one giant image lump (which apparently contains boot0, uboot, the bootloader, recovery, and system, judging from the messages which their flashing tool puts out on stdout/stderr).

The first hurdle is, of course, that the majority of readily available tools are written to run on Microsoft Windows (ugh).  Not knowing what effect these things would have on WinXP, I decided to run my WinXP VirtualBox VM, making a snapshot before installing anything "foreign."  This was good for about a day of hacking around, because unfortunately, this goofball of a tablet is not recognized by USB drivers, even ones that are supposed to be "universal."  Some of the Allwinner-specific tools even have drivers packaged with them, but this VM didn't seem to recognize the tablet when it was physically and then logically connected (it would always pop up the "new hardware found, please install a driver for it" dialogs).  In retrospect, one of the problems was that the tablet wasn't even being put into the correct mode (I guess similar to Samsung's download mode), possibly due to me not knowing how to do that, and poor documentation.  It seemed like the instructions wanted to do their deeds through ADB, which I knew was active because I could see the processes which were dying with logcat.  Eventually I came across "PhoenixCard," which is a utility to prepare microSD cards for firmware flashing.

Cleverly, Allwinner does seem to have some sort of provision for booting from microSD cards, a feature lacking on a lot of other tablets.  That's particularly handy when you're in a boot loop and all the firmware writing tools you can find can't seem to find your tablet on USB.  But it is a really tedious way to go through several iterations of trying things.

The second hurdle was that BrightTab does not publish their firmware anywhere.  At least for the Pandigital I had, a .zip file which I could flash from recovery was available for download from Pandigital's site, so that any time my tablet's Android got corrupted (and it did, many times, before it died for good) I could reboot into recovery and reflash the stock firmware.  Also, the great pleasure of Nexus devices is that the factory images are readily available from Google's site.  There are several Allwinner boards, each with a different ARM CPU, and dozens and dozens of firmware variations for each processor (in my case, an A33).  Thankfully, Google searching eventually led me to a blogger who has collected links to a lot of them.  These pages may be in Spanish, but heck, with my 4 years of high school classes I knew most of the words used, and Google Translate helped with the rest.

The third hurdle was, given these dozens of firmware files, which one would work for my particular tablet?  The first blob I downloaded actually was pointed to by XDA-Developers, a trusted Android site.  I stuck it on a microSD card with PhoenixCard, it was accepted and flashed by the tablet, but after booting, the touch coordinates were way off, and I could not even activate the "continue" button after sliding the chooser to US English.  So, I started going down the huge-ish list of links to ROMs, looking for ones which indicated the proper size (in my case, 800x480).

The next breakthrough was finding Linux tools on Allwinner's wiki and Git repository, in the form of source code for a USB kernel module and their flashing utility, LiveSuit.  (Seriously...it's "LiveSuit," like clothing.  It's not missing an "e"; I would have called it "LiveSuite," but apparently Allwinner wouldn't.)  And the breakthrough right after that was in the very explicit on-screen instructions for LiveSuit: you have to disconnect and power off your tablet, hold a key other than power as you plug USB in, and then press power many times (they suggest 10, but it seems the tablet may come alive to LiveSuit in fewer presses).  This apparently wakes up the boot code in the tablet to go into a special mode to accept ROMs.  Now that I think of it, it's something like Apple's DFU mode, except the screen stays off.  Yay!  There was no more need to keep Windows XP in VirtualBox booted!
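The Linux-side ritual amounted to something like the following; I'm going from memory, and the directory, module, and script names vary between what Allwinner and the sunxi folks distribute, so take this as an approximation rather than a recipe:

    # build and load the Allwinner USB module from their source tree
    cd awusb && make && sudo insmod ./awusb.ko
    lsmod | grep awusb        # sanity check that it loaded
    dmesg | tail              # watch for the tablet appearing when you do the key dance
    # then launch the flasher and point it at the ROM image
    cd ../LiveSuit && ./LiveSuit.sh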

Another hurdle to endure was that a lot of these blobs are hosted on freemium download hosting sites.  Sure, give them your credit card number, and for varying amounts for various terms (one month, six months, a year, etc.) you can log in and have "gold member" access, which amounts to faster data rates and no wait for the link to appear.  Me being the cheap (and currently unemployed) type, I had to endure one minute countdowns, CAPTCHAs, and approximately DS-1 rates (roughly 180KBytes/sec).  Since these files are considerable (maybe at least 300M), they took plenty of time to come across, and the site had stream counting too (so you couldn't download more than one file at a time...unless of course you are a gold member).

At the point of getting native Linux flashing working, I had tried a couple of ROMs using PhoenixCard.  They ranged from not flashing at all (apparently a firmware image file either incompatible with PhoenixCard or incompatible with my tablet) to flashing OK but being unusable.  But then, with the Linux kernel module loaded and better instructions on how to prepare the tablet for another ROM image download, and without needing to go through VirtualBox, the pace of trials picked up considerably.

One of the ROMs looked (and sounded) while booting almost exactly how the tablet did when I got it from Ollie's...except the touchscreen didn't work.  Bummer.  A couple of tries later, one booted OK and the touchscreen worked, but it was all in (as best I could tell) Arabic.  Actually, I found a YouTube video which explains the settings entries to look for to change any Android's language, and managed to switch it to English.  But then I thought...any time you go into recovery and wipe all data, you're going to have to muddle through changing to English again.  So although that ROM seemed to be suitable, I plodded on to more ROMs.

Finally, I found an image that, when loaded and booted, behaves much like the original firmware, except of course the Google stuff works.  I also spent about half of today perfecting the tablet: installing SuperSU, copying over some of the shell environment from my Nexus 7, installing some applications, and generally customizing it.

As usual though, it wasn't only the annoyance of getting where I wanted to be; it was also the fun of the challenge of the hacks needed to get there.  Not so much the adventure of the destination as the adventure of how to get to the destination.


Direct all comments to Google+, preferably under the post about this blog entry.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!

Please join one of the fastest growing social networks, Google+!

16 November, 2015

A DDoS Attack Can Really Ruin Your Day

I hope I don't have to go through too many more days like this past Saturday, 14-Nov.  The insidious thing about DDoS attacks is that there is little that can be done about them, except a few things.  Ordinarily (with a plain DoS attack) you identify a host or netblock which is sending the nuisance traffic and you add some rules in your router to drop those packets.  But add that extra "D", and it's like being surrounded by a swarm of bees: you can't possibly swat them all dead.

The few things which can be done (which I know about) are:
  • direct traffic somewhere else.  There are companies which specialize in this sort of mitigation, and sometimes sink/absorb multiple gigabits per second of spurious traffic.  Although I've never approached anyone who does this, I'm guessing that doesn't come cheap.
  • insert some sort of rules to limit the rate at which these packets are handled.  This still diminishes service, but hopefully it's not shut down completely.  At least processing is occurring at the packet level, and it's not an application wasting time handling meaningless connections.  Also the kernel doesn't waste time maintaining socket state and such.
  • coordinate with your upstream Internet provider to block or reduce the rate of ingress of these packets.  This is more useful than acting on one's own due to the limits of your upstream link...i.e., the attackers are simply filling your pipe with useless bits.  Considering my link is what you might call residential-class service, this is impractical.  (Besides, shhhhhhhhhhh!  their ToS say I'm not supposed to be running a server.  But let's think about this a second.  If they were truly serious about this, all they would have to do is block the well-known proto/port used, just like they do with outbound TCP/25.)
  • shut off service entirely, and hope that the attacker botnet eventually "loses interest" in your host, or gets shut down by others' actions.
This last is the strategy I used yesterday.  I know it's much less than ideal and is not sustainable.  From what I can tell in the logs, it began about 0130 local (US Eastern) time, and lasted for around 15 hours or so.  I was merrily puttering around (I think Web browsing) when I noticed the Internet NIC activity LED on my router was flashing an awful lot, accompanied by a lot of disk seeking sound (which would have been syslogd(8) writing out maillog).

The first thing I did of course was log onto the (Linux) router and do a tcpdump on the "WIC" (wide area network (WAN) interface card), and I discovered a lot of TCP/25 traffic.  So I pulled up the maillog, and discovered a lot of instances of "host such-and-such did not issue MAIL/EXPN/VRFY/ETRN" and messages about refusing connections because the configured number of (MTA) children had been reached (it happens to be 7 in my case).  By far my biggest concern was that the attackers would send something which trips up my MTA and makes it spam other people, possibly getting me knocked off my ISP.  So I did what I usually do: look at a representative sample of recent maillog entries and add some iptables(8) rules to DROP traffic coming from the hosts (or maybe netblocks) which were causing the "did not issue MAIL/EXPN/VRFY/ETRN" entries to be generated (basically, connecting and doing a whole lot of nothing, just tying up a socket and pushing me over my child process limit).

There came a point where I realized I needed more than just a line or three of rules, so I decided to grep(1) for "did not issue" in the logs, take the latest 150 such entries with tail(1), extract the IP addresses with sed(1), and use sort(1) to get a unique set of them.  As I recall, I came up with 30 or so addresses, and used a for loop to append DROPs of them to the PREROUTING chain in the mangle table.  (The idea is to cut off traffic as soon as the router takes packets in, hence a PREROUTING chain.)  Unfortunately, it became apparent that a handful of addresses wasn't going to be effective, because the log messages kept on a-comin'.  So I decided maybe a little PTR record and whois(1) investigation (to block entire netblocks instead of individual addresses) was in order.  A disturbing trend started to emerge: the IP addresses generally purported to be Russian and from other Eastern Bloc countries, which, at least to me, are notorious for being the origin of a lot of DDoS attacks and spam.
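Reconstructed from memory, the whole thing was something along these lines (the sed expression will depend on your MTA's exact log format, and the log path is mine):

    # pull the offending addresses out of the log...
    ADDRS=$(grep 'did not issue' /var/log/maillog | tail -n 150 \
            | sed -e 's/.*\[\([0-9.]*\)\].*/\1/' | sort -u)
    # ...and drop them as early as the router sees them
    for a in $ADDRS; do
        iptables -t mangle -A PREROUTING -s "$a" -j DROP
    done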


I really did not want to shut off services entirely, but I saw little choice; I did not want my MTA compromised to be yet another source of spam.  I put in an iptables(8) rule which dropped all TCP/25 traffic arriving over the WIC.  I noticed this did not stop the popping of messages into maillog, and realized the attackers were also attempting connections to TCP/587 and TCP/465, so I put in rules for those too.  For the moment, IPv6 is, in a sense, only in its infancy (for being a twenty-or-so-year-old infant!), so there was no apparent reason to add any ip6tables(8) rules.  And in fact, email still flowed in over IPv6, most notably from Google (thanks for being a leader in the adoption of IPv6!).
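The shutoff itself is only a few rules; a sketch, with eth3 standing in for the WIC:

    # drop all inbound SMTP/submission/SMTPS arriving over the WAN interface
    for p in 25 587 465; do
        iptables -t mangle -A PREROUTING -i eth3 -p tcp --dport $p -j DROP
    done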

It was at this point I was very glad that (according to Wikipedia) Dana Valerie Lank, D. Green, Paul Vixie, Meng Weng Wong, and so many others collaborated on initiating the SPF standard.  I began to think: from whom was it particularly critical that I receive email?  I could go to those domains' SPF records and insert ACCEPT rules for their addresses or netblocks.  Google was already "covered," as noted above about IPv6.  Oddly enough, the first domain I thought of was aol.com, because my volleyball team captain might want to tell me something about the soon-upcoming match on Monday.  Their SPF record looked gnarly, with some include: directives.  I settled for looking at Received: headers in previous emails from Matt and identifying the netblock AOL had been using previously (it turns out it could be summarized with a single /24).  Next I thought of Discover, then of First Niagara, then Verizon (for informing me of impending ejection from their network for violation of their ToS).  I also thought that although it's not all that critical, I receive an awful lot of email from nyalert.gov, especially considering we had that extended rain and wind storm, and the extension of small craft advisories and so on.  All in all, I made exceptions for a handful of mailers.
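The exceptions are just ACCEPTs inserted ahead of the DROPs, after eyeballing each domain's SPF record; the netblock below is a documentation placeholder, not AOL's real range:

    dig +short txt aol.com        # eyeball the SPF record for a sender you care about
    iptables -t mangle -I PREROUTING 1 -i eth3 -p tcp --dport 25 \
             -s 198.51.100.0/24 -j ACCEPT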

Then, to evaluate the extent of the problem, I used watch(1) to list the mangle PREROUTING chain every 60 seconds, to see the packet counts on the DROP rules.  I'd say they averaged deltas of around 20, or one attempt every three seconds, and the peak I saw once was 51, or nearly an attempt per second.  I know as DDoS attacks go, this is extremely mild.  If it were a seriously large botnet controlled by a determined entity, they could likely saturate my 25 Mbit/s link, making anything Internet-related extremely challenging or impossible.  Always in the back of my mind was that this was unsustainable in the long run, and I hoped the botnet was dumb enough to "think" that if not even ICMPs for connection refusal were coming back, it had achieved one possible objective, which was knocking my entire MTA host offline.
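The monitoring end of it was just this sort of thing:

    # re-list the DROP rules with packet/byte counters every minute
    watch -n 60 'iptables -t mangle -L PREROUTING -n -v'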

I then contemplated the work which would be involved in logging onto various Web sites (power, gas, DSLReports/BroadbandReports, First Niagara, Amazon, Newegg, and on and on) and updating my email address to be GMail.  I also started trying to search for email hosting providers who would basically handle this entire variety of mess on my behalf, and forward email for me, whereby I could block all IP addresses except for those of said provider.  Or maybe I could rejuvenate fetchmail(1) or similar to go get my email from them over IMAP and reinject it locally, as I used to do decades ago with my dialup ISP's email.  To my amazement at the low prices, it looks as if, for example, ZoneEdit will handle email forwarding for only on the order of $1/month/domain (so $3 or maybe $4 in my case, because there is philipps.us, joe.philipps.us, philippsfamily.org, and philipps-family.org).  This is in contrast (as far as I know) to Google's $5/month/user, and I have a ton of custom addresses (which might be separate "users").  (Basically, it's one of the perks of having your own domain and your own email server.  Every single sender gets a separate address, and if they turn into a not-so-savory sender, their specific address ceases to work.)  The search terms needed some tweaking because trying things like "mail exchangers" (thinking in terms of MX records) turned up lots of hits for Microsoft Exchange hosting.

A friend of mine runs a small Internet services business and has some Linux VPSes which I can leverage.  He already lets me run DNS and a little "snorkel" where I can send email through a GRE tunnel and not appear to be sending email from a "residential-class IP address."  So I called him up, and thankfully he answered right away.  I got his permission to use one of the IPv4 addresses (normally used for Apache/mod_ssl) for inbound email, in an attempt to see whether these attackers were more interested in my machine (the specific IPv4 address) or my domain(s).  If I added an additional address to accept email, and the attacks did not migrate to that address, I would then know that it's far more likely this botnet came across my address at random or by scanning the Verizon address space.  So, I picked an address on one VPS, added a NAT rule to hit my end of the GRE tunnel, had to debug a routing table issue, and redid the MTA configuration to listen on the tunnel's address; all in all about an hour and a half's worth of work to implement and do preliminary testing.
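The plumbing on the VPS end boils down to a GRE tunnel plus a DNAT rule, something like this sketch (all the addresses here are documentation placeholders, not the real ones, and the details of my friend's setup differ):

    # bring up the tunnel toward my end
    ip tunnel add gre1 mode gre local 198.51.100.20 remote 203.0.113.10 ttl 255
    ip link set gre1 up
    ip addr add 10.99.0.1/30 dev gre1
    # steer inbound SMTP for the spare public address down the tunnel to my MTA
    iptables -t nat -A PREROUTING -d 198.51.100.21 -p tcp --dport 25 \
             -j DNAT --to-destination 10.99.0.2:25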

It was at this time I realized, in my watch window, that the packet counts were no longer increasing, even over several samples (minutes).  Even after I added the A record to supplement the one for my Verizon address, I noticed there was basically no activity in maillog.  So as best as I can tell, it was as I suspected: the botnet was really likely only interested in my specific address/host.  And thankfully it "saw" the lack of any TCP response as an indicator that the host had gone offline, and ceased directing any resources to the attack on me.  I hate to give these worms any ideas, but you could also try other things, like ICMP echo, to determine whether a host is still alive.  Then again, if your sole objective is compromising an MTA, maybe that doesn't matter.

Eventually, I inserted a rule to accept TCP/25 traffic again, thinking that if attacks resumed, I could easily kick that rule out and spring the shutoff trap again.  Or even better, I could replace it with a rule using the limit match, so only something like 10 or 15 connection attempts could be made per minute.  At least the MTA would not be wasting time/CPU on these useless connections, and the kernel would not have to keep track of state beyond the token bucket filter.  I almost hit the panic button when I saw some more "did not issue" messages, as well as, a little later, a notice of refusing connections because of the child limit.  But I reasoned, just wait a few minutes and see if it's persistent.  However, I had a lot of anxiety that it was not over yet, and that it was the domain and not the host the attackers wanted compromised, because some of that unwanted activity was through the newly set up IP address.
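
The rate-limiting idea would look roughly like this; the numbers are the ones I was tossing around, but the rest is only a sketch (no interface match, and your chain layout may differ):

    # let at most 15 new SMTP connections per minute through (burst of 15),
    # and drop the excess before the MTA ever sees them
    iptables -t mangle -A PREROUTING -p tcp --dport 25 --syn \
             -m limit --limit 15/minute --limit-burst 15 -j ACCEPT
    iptables -t mangle -A PREROUTING -p tcp --dport 25 --syn -j DROP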

In retrospect, I question whether I want to continue doing this myself.

  • It's against the ISP ToS, and they could blast me off their service at any time.  I'd likely have to pay their early termination fee and I'd have to go slinking back to Time Warner (and their slow upstream speed).
  • What happens the next time this occurs?  Will it be more persistent?
  • What if a future attack is attacking the domain, and any IP address which I use gets similarly bombarded?
  • What if next time it's not around an attempt per second but instead hundreds or more attempts per second?
  • I should have traced traffic on my GRE "snorkel" tunnel to see if they managed to compromise my MTA, and were actually sending email surreptitiously.
At least the experience has uncovered flaws in my router's configuration, and I'll be more ready to switch to using the VPS to receive email.  And I'll have some ideas about hosting companies if I want to migrate operations there instead.

UPDATE Mon., 16-Nov: Some more folks are at it again.  What I've decided to do for now is to put in some iptables(8) rules which use the limit match to allow only a certain number of connections per minute.  So far this has been fairly effective at keeping the chaos to a minimum, but it's not ideal.  I really think I'm going to have to look at an email hosting service, at least until I can put up a more robust email server, or maybe permanently.


Direct all comments to Google+, preferably under the post about this blog entry.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!

Please join one of the fastest growing social networks, Google+!

16 August, 2015

It's Been Interesting Times This Morning With Some Severe Thunderstorms

15-Aug-2015:

(dated here at the top in case I don't finish this by midnight...all the "this morning" or similar references refer to the 15th.)

The supposedly Chinese saying goes, "may you live in interesting times." This morning I would say qualifies, although not in the curse sense.  It was just a weird slice of life with some drama for me.  As I and Garrison Keillor have said many times, it could be worse, and I got away comparatively really well...this time.

What an early morning.  I had slept for a couple of hours when, for unknown reasons, I awoke to heavy rain and a thunderstorm at about 1:45 EDT or so.  A few minutes after listening to this, there was a REALLY CLOSE lightning strike: no discernible delay between the really bright flash and a sound which seemed like it would be loud enough to at least crack if not shatter a window (but all mine seem OK).  Figuring more of this was going to continue, that it would be next to impossible to get back to sleep with all this thunder going on, and that I like watching lightning anyway, I decided to stay up.  It was quite a light show.

As I was wandering over near my kitchen sink, I heard the sound of water pouring at an incredible rate.  I knew what this was; I have experienced times before when water was coming into the drain tiling so fast that it made that definite sound of water pouring into other water, such as pouring water out of a pail into a swimming pool.  However, this was different.  It seemed more intense, maybe much more intense, than I had previously experienced.  That underscored just how hard it was raining outside, and that it had been doing that for long enough to soak through 2 to 3 meters of earth.

It occurred to me this could be quite a problem.  One day a few years ago, I was standing in my basement at my workbench during a thunderstorm when we also had a close lightning strike.  That one was a bit further away, because there was a perceptible, although short, delay between flash and thunder boom.  But close to the time of the flash, I also heard the sound of clicking, which I quickly deduced was the sound of (more than one) circuit breaker tripping.  I was also quite concerned that day because I smelled that same smell you get when you've built up quite a bit of friction between blade and wood while sawing, that smell of wood slightly roasting.  The thought was, "something got hot enough to do that to one of my floor joists?  ouch!"  I never did find any evidence of electrical fault or roasted wood.

So that previous experience got me to thinking, what if the same thing happened in this really close lightning strike, but this time to the breaker protecting/controlling my sump pump?  I went into my basement post-haste to look.  I could hear the sump pump motor humming away, so no, at least that breaker was OK.  Nonetheless, I went to the load center to look at all the breaker levers.  The RCD (GFCI) breaker for the garage was tripped, so I reset that.  All the others looked to be in the ON position, so I was fortunate there.

I thought I had fixed the concrete block mortar so that water would not leak in.  I was proven wrong this morning.  The Susan Lane Basement Creek was flowing again, as usual from south wall to sump at the north end.  When I had first noticed it this morning, it was not even to my workshop wall yet, but almost.  Judging how water was falling out of the sky to beat the band, I figured it had to make it to the sump, eventually.  Ever since the first incident after moving in, I have been careful not to leave anything on the basement floor which did not have the tolerance for a couple of millimeters of water...so plastic bases on computers, glides on the legs of wooden tables, and so on.  The poor workshop wall still suffers, but oh, well; this sort of thing hasn't happened in a long, long time so it's not really worth trying to mitigate.

But that seemed the least of worries at the moment.  My attention turned to the sump, and the heretofore unseen volume per unit time entering through the drain tile outlets.  It had to be on the order of a liter per second.  After a pumping-out cycle, it took somewhere around only 10 or at most 15 seconds to fill it to the pump trip point again.  And during the pumping cycle, it really seemed like the pumping rate was not all that much faster than the filling rate.  I began to think flooding was a real possibility, so I started "pulling for" the pump.  Then it occurred to me, I would be in some exceptionally deep doo-doo if, due to the thunderstorm, we lost mains power.  That would mean the sump would overflow, and who knows if the floor drain about 4 or 5 meters to the east could handle the requisite rate?  I also started contemplating what, if anything, was subject to water damage between those two points (turns out, not a lot).  I had to have been nervously standing there for at least 20 minutes.  After that much observation, I sort of assured myself that it was rather unlikely the flow rate would increase enough to overwhelm the pump.  That assurance was somewhat diminished by the thought that even if the rain stopped immediately, there were still many, many liters remaining in the soil which would make their way into my sump.

As I was ascending the staircase out of the basement, I noticed the power get interrupted for a few hundred milliseconds; I have a number of electroluminescent night lights making much of my house glow a dull green, so even a brief blink is obvious.  As I passed through the kitchen, I noticed the microwave had held its time, but the oven had not.  That's weird; it's usually either so brief they both hold or long enough that they both lose it.  It's very rare indeed that one holds but the other loses.  I knew that since at least one lost it, it was likely at least one, probably several, of my computers had just been forcibly rebooted.  I only have so much capacity in my UPS, so I have decided only the barest of essentials...which includes the router, the managed switch, the video monitor, and little else...will be connected to it, not everything.

Sure enough, as I got to the systems room, the screen was showing the RAM test for sal9000 just ending and beginning to execute the Adaptec SCSI host adapter POST.  I watched it to see that it wouldn't hang on, say, a USB HDD (I have still yet to figure out why a USB drive left plugged in, even one with GRUB installed onto it, freezes POST).  So at least this one is alive and well.  Next my attention turned to my MythTV, because the boot was hanging at the point where sal9000 was supposed to NFS mount some shares from it.  Ruh-roh.

That was a whole lot less hopeful and more worrisome.  That is the one system I access as a general-purpose computer the least, so I have it set up not to have screen blanking.  One of its purposes is to be able to switch to it to warm up the CRT (this one takes on the order of 15 seconds to show a picture).  So it was quite disconcerting when the monitor would not lock onto a picture, and its LED went amber, indicating power-saving mode.  Tap, tap on some keys to try to wake it up; there was no response.  OK, fair enough, I'll just activate an ACPI shutdown by pressing the power button.  Ummm...ummmm....ummmm.....that does nothing.  OK, so it's very much not as preferred, but I'll hold in the power button to make the ATX PSU shut down hard.  Ummmmm....how long have I been holding it?  1...2...3...4...5...6...7...uhhhhh, shouldn't it have hard shut down by now?  Isn't that supposed to be 5 seconds?  At this point, I'm thinking my poor MythTV has had it, I'll probably never see anything through it again, and my (yearly) SchedulesDirect subscription renewal in June will have gone mostly for naught.  Hmmm....what to do...I went around to the back, yanked the power cord from the PSU, waited for the Ethernet link lights to go dark, and reinserted it.  I went back around the desk to the front, and with much hope pressed the power button.  Yay!  The power LED came on.  The CRT warmed up, and I saw RAM being tested/counted.  512 MB OK; this is good.  It booted...until the part with the filesystem check (fsck).  Well, I knew this was going to take the better part of 10 minutes or so; this is "normal" PATA, on a 1 GHz Pentium III machine, 400 or so gigabytes' worth.  So I turned my attention to the rest of the network.

Rootin, being on the UPS, seemed just fine.  In fact, it looked to be getting out to both the IPv4 and IPv6 Internet just fine.  By that time, the ONT had plenty of time to reboot, or had not been taken down at all.

The next thing to cross my mind was the rain gauge in my back yard.  Had we had so much rain that it had overflowed?  It was roughly 0330 hrs by now.  I put on some boots and a raincoat and went out to look.  Even in the dim light of night I could see water visibly ponding on about a third of my yard and half of southern neighbor Megan's yard.  The rain gauge capacity is 110 mm, give or take.  It was at 96.2, of course using the scientific method of estimating the last digit between the markings.

Not too long after that, I saw what looked like a truck from our volunteer fire department (not an engine, something a lot more regular and rectangular) go by on Huth Road, the street which connects to my dead-end street.  I thought that was kind of odd, and wondered what they were doing.  The best I could guess was they were looking around the neighborhood for a fire started by the lightning strike.  A few minutes later, I watched them slowly go down my street too, southbound.  I went out to my street to try to get a better look at what they were doing; they were maybe 175 m away by that time (my street, I believe, is about 400 m long).  It's then I got the idea that I wanted to see my sump discharge, as I knew it would likely be doing so about every 10 to 15 seconds.

I noticed then that the grate cap was nowhere in sight.  I walked down the street a little ways, figuring it had floated away (it's made of plastic).  Darn.  I should probably go to Home Depot later today for a replacement, I thought.  Aw, heck, it's got to be somewhere on the street or close to it, right?  So I went back into my house and got a flashlight.  As I returned to the outside, I noticed Fire Rescue 7 had turned around and was now going north, much faster this time.  Whatever they were looking for, either they found it, or they decided it wasn't on Susan Lane (or Hemenway).

I kept looking.  I thought it was unlikely for the cap to just disappear.  It wasn't small enough to fit down the storm drains, and it hadn't lodged against the tires of the cars parked in the street.  Eventually, about 10 minutes later, I found it...maybe 120 m "downstream," and a meter and a half away from the side of the street (we don't have curbs).  I thought, score!  Don't have to go to Home Depot anymore!

When I got back to the house, after I put the cap back on, I thought, maybe the NWS would be interested in how much rain I measured.  I checked their latest chart of the last 72 hours of measurements, and they had recorded roughly 60 mm since midnight.  Even though they are only on the order of 5 km away, there still could be significant differences in things like rainfall.  So I Tweeted to them (note: viewing the embedded Tweet requires JavaScript to be allowed from Twitter):

Apparently, they liked this very much:


I was displeased that they "imperialized" it instead of leaving it in SI units, but what the fsck, I realize they're publishing to a primarily US audience, and the US doesn't deal with SI very well at all.

Things seemed to be calming down.  The downpour had diminished to just drizzle, my sump pump was undoubtedly keeping up with the (now reduced) inflow, the Internet was accessible, and all my systems seemed to have come away from this unharmed...or had they?

Every once in a while, I'll go down my list of systems in the ConnectBot SSH/Telnet client app on my Nexus 7 and log into each one, just so the date/time last accessed will say less than a day.  Maybe I should have just put the tablet down and gone to sleep.  But I hit the "sal9000 con" entry, one that establishes a "telnet" session to my Chase IOLAN serial terminal server on TCP port 5003.  (The port number I chose is a holdover from working with Bay Networks' Annex 3 terminal servers, whose port numbers are not configurable; they're 5000 + the serial port number.)  This in turn connects to serial port 3, which is connected to the ttyS0 port on sal9000.  And...it wouldn't even handshake.  I tried the rootin entry (port 2/5002).  Similarly, no dice.  Sigh...all was not well with the computing world.  So I went and traced the power cord, yanked the 120 VAC from the IOLAN PSU, replugged it, and saw the LAN lights going nuts as it TFTP-downloaded updated firmware from rootin, and again a few seconds later for the updated configuration.  (The firmware for the most part will operate just fine, but I wanted the latest available used.  Similarly, it has nonvolatile storage for the config, but I figured what the heck, it's good to have a live backup on a server too.)  So eventually, sometime after 0400, yes, things as far as I could tell were right with the computing and networking world.
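
For what it's worth, the port convention is simple enough to reconstruct from memory; the hostname here is just a stand-in for the terminal server's address:

    # TCP port = 5000 + serial port number (an Annex habit I kept in the IOLAN setup)
    telnet iolan 5003    # serial port 3 -> ttyS0 on sal9000
    telnet iolan 5002    # serial port 2 -> rootin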

I stayed up longer, hitting refresh on the 72 hour history page, to get the 3:54 precip figure.  It wasn't posted until about 4:15.  It added some millimeters, but the total still didn't get up to my measurement by a fair margin.  I had a snack and some orange drink, and finally settled down to "Coast to Coast AM" with David Schrader on sleep timer.


Direct all comments to Google+, preferably under the post about this blog entry.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!

Please join one of the fastest growing social networks, Google+!

03 June, 2015

Another (Couple of) Days in the IT Grind

I do indeed realize there are far worse things that could have happened to me, but the past couple of days have not been good.  I am a technologist, and as such, I get very uneasy and tense when the technology I own fails.

It started out during experimentation with installing Xubuntu on a Pentium II (P-II) 450 machine.  What I had completed earlier this week was to take apart a failed ATX-style power supply, unsolder all its output cables, take apart a Dell power supply (which has proprietary wiring on its 20P20C plug), desolder the proprietary plug, and solder in the standard 20P19C one.  I don't care if I blow up this P-II 450 system, because it is one of the lowliest of the capable systems I have here at home, and also a bit wonky at times.  So it was the natural target for testing my power supply unit (PSU) cable transplant job.

It turns out that the wiring job went well; no magic smoke or sparks were released from either the PSU or the computer.  As just mentioned, it is a bit of a funky system, and with the transplant PSU, it seemed to want to boot off optical disc OK but not the hard disk (HDD).  With another supply I have, it didn't seem to want to boot off optical (it got to a certain point in the Xubuntu 12.04 disc and rebooted), but the HDD seemed to operate, albeit with a bootloader installation error which I was trying to remedy (hence I needed both drives to operate well).  For whatever oddball reasons, a brand new PSU, with less than one hour of power-on time, finally operated the computer, HDD, and CD OK.  (Since then, the other PSUs seem to work too; I don't know what changed other than all the unplugging/plugging.)

The first weirdness was that this ancient Intel mainboard was complaining about not being able to read the SPD (Serial Presence Detect) data of the RAM I put into it (it had 2 x 128 MB DIMMs, which I was replacing with 2 x 256 MB + 1 x 128 MB).  So I puttered around with Google and Intel's legacy support site, and managed to make up a BIOS update floppy.  After flashing, the SPD error did not go away, and (probably because of that) it will no longer "quick boot" (skip the RAM test), PLUS I haven't found a keystroke which will bypass the RAM test.  It checks somewhere around 10-20 MB per second, so it takes the better part of a minute before it will boot anything (CD or HDD).

After getting a working Xubuntu 12.04 set up, I tried doing the in-place 14.04 upgrade.  That worked kind of OK, except the X server would not run (the log showed it was segfaulting).  MmmmmKay, maybe something did not work well during the upgrade, so let's try a straight 14.04 installation (which had to be done with the minimal CD, because the full image is bigger than a CD-R, so it must go on USB or DVD).  This implies installing almost everything over the Internet.  This computer is so old it doesn't boot from USB, and I don't really have a spare DVD drive, so over the Internet it was.  Unfortunately, it had the same result: the Xorg server would not stay running.
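
For the record, the upgrade and the quick check on the X server were along these lines; the log path is the stock Ubuntu one, and the grep is just one way to spot the crash:

    # kick off the 12.04 -> 14.04 in-place upgrade
    sudo do-release-upgrade
    # afterward, look for the segfault/backtrace the X server leaves behind
    grep -i -e segfault -e backtrace /var/log/Xorg.0.log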

On one reboot while trying some fixes, the boot just froze.  I found out this was due to not getting a DHCP address (network initialization).  So I arose from my basement to find my Internet router (a P-II 350) locked solid.  That fortunately rebooted OK.  That prompted me to get a rudimentary daemon going to drive a watchdog timer card I had installed a few months ago after my previous router went splat.

After getting home from my volleyball match last night, I found myself able to log onto the router, but off the Internet.  I rebooted, and I was back online.  I may have been able to get away with ifdown eth3 && ifup eth3, but I didn't think of it at the time.  I also reinstated the command to send an email when booting was complete.

I awoke this morning to see that sometime after 3am it had been rebooted, no doubt due to the watchdog timer card tagging the reset line.  In retrospect, this is when the system gets really busy reindexing all pathnames for mlocate.  I have since adjusted my daemon to call nice(-19) to give it the highest userland priority.
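
The daemon itself doesn't amount to much more than the sketch below.  The device node and interval are from memory and assume the card's driver exposes the standard /dev/watchdog interface; the real one is a bit more careful about clean shutdowns so a deliberate reboot doesn't trip the card.

    #!/bin/sh
    # run at the highest userland priority so heavy jobs like the nightly
    # mlocate reindex can't starve the feeder long enough to trigger a reset
    renice -n -19 -p $$
    while true; do
        echo . > /dev/watchdog   # pet the watchdog card
        sleep 10                 # comfortably inside the card's timeout
    done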

I had been watching the latest EEVblog Fundamentals Friday on BJTs and MOSFETs when I tried to leave a comment.  And YouTube's little "loading the comments" icon would not disappear and show me the comments (and the blank for another one).  I found out the router was routing with whatever it had in RAM, but it was spewing oodles of disk access errors on the console.  Presumably it needed something on disk in order to complete DNS recursion or something.  I couldn't even log onto the router.  I just had to "let it ride."  It immediately made me very nervous, because so much I have relies on the functioning of that router: some public DNS zones, email, Google Voice VoIP, routing between my VLANs, DHCP, Hurricane Electric's 6in4 tunnel/radvd, and on and on.  The worst of it is that static IPv4 addressing is horrendously expensive (Verizon, for FiOS, charges $20/month more than a DHCP account), and while TWC's leases are a week, Verizon's are a short TWO HOURS.  So let's just say, there are a whole lot of little headaches strewn throughout the Internet which require attention when my IPv4 address changes.  So being inaccessible for more than 2 hours could add insult to injury.

Needless to say, instead of looking forward to some YouTube watching and Google+ reading, the character of the day immediately changed radically.  It was "beat the clock" time: finding a replacement computer to use for the router, installing sufficient RAM and HDD in it, and restoring something bootable from backups.  There was no easy way to see if dhclient was continuing to renew the lease for "my" IPv4 address (as it would be trying to write a renew notice to the syslog, which would likely be failing badly).  My nerves were frazzled, my hands shaking.  I kept on thinking, got to follow the attitude: stay as calm as possible under the circumstances, and just work the problem one step at a time as each step arises.

Thinking that I might have to replace the whole computer, I screwed a spare 20 GB HDD into the replacement computer.  Later in the process, I thought it better to at least try removing the current HDD and substituting one rewritten from backup (I thought, great, wasted time getting back online).  So I booted an Ubuntu 12.04 Rescue Remix CD, laid out partitions, formatted them, mounted them up into one neat tree under /mnt/rootin ("rootin" is the router's name), used rsync to copy from the backup USB disk onto this new disk (which took about 30 minutes), and did grub-install to make the disk bootable.  On reboot, the kernel panicked because it could not find init.  Reading back in the kernel messages a little further, the root filesystem could not be mounted because that particular kernel could not handle the inode size chosen by the mke2fs on the Rescue Remix.  ARGHH!!  That was the better part of an hour basically wasted.
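
For my own future reference, the restore boils down to something like this.  Device names and the backup path are illustrative, and forcing 128-byte inodes at mke2fs time is the bit that should have kept the router's old kernel happy:

    # create the filesystem with old-kernel-friendly 128-byte inodes
    mke2fs -j -I 128 /dev/sdb1
    mount /dev/sdb1 /mnt/rootin
    # copy the backup back into place, preserving permissions, devices, and hard links
    rsync -aH /media/backup/rootin/ /mnt/rootin/
    # make the new disk bootable (grub-install from the Rescue Remix's GRUB 2)
    grub-install --boot-directory=/mnt/rootin/boot /dev/sdb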

So I dug out the CD which I used to build the router initially, booting from it into rescue mode.  I used its mke2fs all over again (wiping out my restore).  Rebooted to the Rescue Remix, rsync, grub-install, reboot.  This time, it worked OK, at least in single user mode.  Things were looking up for the first time in two hours or so.

To add to my already frazzled nerves during this, when trying to switch from one computer to another with my KVM switch, my CRT would only display black.  Humf.  Suspecting this was the KVM switch's fault because it had been operating so long, I switched it off then on...no effect.  For positions where I expected power-saving mode, the monitor's LED was orange; for those where I expected a picture, green, but still no picture on the tube.  I do indeed have another CRT in the basement, but it's a 19" whereas the one which was wonky this morning is a 22", so quite a visual advantage.  Thankfully it wasn't totally dead; I power-cycled the monitor, and it was better.

I decided I would give it one more try though.  I tried Ctrl-Alt-Del on the router's console.  That did not go well.  It immediately started spewing more disk access errors on the console (could not write this, bad sector read that, a technician's horror show).  As this kernel has neither APM nor ACPI support, hitting the power button brought it down HARD.  When turning it on, I expected POST would not even recognize the disk.  Surprisingly though, it booted OK.

But here are the things I'm thinking about as this incident winds down.  One, I wish I did not get so worked up about these technical failures.  For me, email would stop (but it would presumably queue up at its sources), and a bunch of conveniences would be inaccessible.  I can't seem to put it into perspective.  Wouldn't it be a lot more terrible if I were in parts of TX (floods) or Syria (ISIL)?  Two, at least now I have a disk which I can fairly easily put into the existing router should its disk decide to go splat for good (such as won't even pass POST).  Three, at least I have a fairly complete checklist for IPv4 address changes, I just have to calm down and execute each step.  Four, I have some new practical recovery experience for my environment.  In theory, that SHOULD help calm my nerves...but I can't seem to shake that feeling of dread when things go wrong.  I know it's no fun being in the middle of being down, but I wish I could calm down when this sort of thing happens.  Heck...I get nervous at the thought of just rebooting ANY of the computers in my environment.  I would guess what tweaks me the most is not knowing what the effort will be to restore normal operation.

I think what I really need is a complete migration plan as much away from in-home solutions as I can manage.  That way when stuff fails at home, there is less to lose.  But that's going to cost at least some money, for example for a VPS somewhere.  Sigh.


Direct all comments to Google+, preferably under the post about this blog entry.

English is a difficult enough language to interpret correctly when its rules are followed, let alone when the speaker or writer chooses not to follow those rules.

"Jeopardy!" replies and randomcaps really suck!

Please join one of the fastest growing social networks, Google+!