04 September, 2012

Some Experiences with (X)Ubuntu 12.04.01

I really should read more carefully and take advice from those more experienced than I.  There are indeed all sorts of warnings when upgrading a Lucid Lynx to a Precise Pangolin about backing up and so on.  Well...as the system in question isn't used a whole lot (it's unstable because the processor context gets corrupted by a quirky HyperTransport link), I thought, what the heck, just forge ahead and do it; if it breaks, it's not like you lost a whole lot.  Well, actually, I keep forgetting how long tweaking takes, especially tweaking the boot process so that WoL works and the console framebuffers are maximally sized for the video controller and font, and are initialized to green on black in the initrd really early in the boot process.  The hack that nudged me to write this post will come...please read on to the end, where I tell you about my shazam moments.

Nonetheless, I forged ahead, and did an in-place upgrade.  Yeah, it looked a little scary in that the "before blurb" said several of the packages I had installed wouldn't carry over, "but the community might support them." "Uhhhhhhhh....OK,"  I thought a little nervously.  I knew it was trouble of nearly the greatest sort when it threw errors in several places.

In a recent phone conversation with one of my friends, I heard the unfortunate news that yes, Precise does quite well, even upgrading LTS to LTS...if you have a very generic, default installation.  Vary from this in some cases just a little, and you're in for a world of hurt.  My usual setup is far from typical.  For starters, it's GPT, not the "traditional" MS-DOG (sic) partition table scheme.  I always want my bootloader and boot files on a simple, uncomplicated partition, so those are the first two (as GPT and GrUB play together to put GrUB linearly in its own, "special purpose" partition; "bios_grub" as parted puts it).  Similarly, the root filesystem gets a partition all its own, to make recovery hopefully as uncomplicated as possible, and a fsck of same fairly speedy.  Likewise, the swap and /var/log partitions are plain partitions to make things as efficient and straightforward as possible.  Everything else gets partitioned as LVM PE(s), lumped into one VG, and divvied out to /opt, /usr, /var, /var/tmp, and /home LVs with varying sizes as I think appropriate.   Putting them into LVs makes it easy to resize filesystems and reapportion from one to the other if I guessed wrong while setting up.

As you may have guessed, an in-place upgrade went rather poorly.  After rebooting the system, it would sit there like a lump waiting for filesystems to become available, and after a while it told me root wasn't there and gave me options to skip trying or to fix things manually (followed by a reboot).   But the strange thing is, / was mounted from /dev/disk/by-uuid/something, as the kernel command line has root=UUID=something.  OK, next go-round (because a reboot was I guess mandatory at that point, no option to just continue) I just said skip, and it complained about /tmp being absent.  Again, I went into manual fix mode, and / was fine, so I fsck'ed it and remounted it rw.  To no avail; didn't matter what was tried to coax it into thinking filesystems were OK, it was convinced my system was all in shambles.

The same friend told me about being able to boot into rescue mode, and automatically fix up things.  This sounded reasonable because he was talking about a regen of the initrd and reinstalling the bootloader.  So I tried that.  Naw...so much for the Rescue Remix, or the Desktop installation media.  Neither was really any help, even after trying to upgrade again.  What really screwed the pooch was that I saw it emit a message about "removing all files from /usr "...frak!  There  goes my build of BZFlag, in /usr/src.  I tried to abort, basically by shutting down/rebooting.  Then the system wasn't recognized anymore, the "upgrade" option was gone.  Sigh.  (Re)install it was then.

Unfortunately, that had error-laden results similar to trying to upgrade, with no automatic progress after rebooting.  I then went and grabbed the Xubuntu amd64 alternate install ISO from BitTorrent.  I think it may be key that I got the alternate, which is kinda geared towards "Martians" like myself.  And another unfortunate thing with Ubuntu and its derivatives like Xubuntu is, installations often do not work (as in making an autobootable system) unless the options to format all partitions in question are given.  The only thing I left totally out of the installer was homelv.  At least all those personal Xfce settings, .emacs, .inputrc, .profile and the like for my "ordinary user" would survive and not have to be tweaked too much.  Not so lucky was root, whose home, of course, is at /root, not /home/root.  But, since they're one and the same person (myself), it wasn't too bad just to copy dotfiles and such from my normal user on /home.

Finally, basically after a total wipeout (except as noted for /home), I had a system which would autoboot to a login prompt and an Xorg server which would run.  The first graphical login under my normal user asked me about migrating my Xfce settings to the newer version, and all that was pretty much fine.  Root was another story however.

Under Lucid, I liked gdm, so I installed it.  Unfortunately, gone was the GUI widget which I used to set up autologin (after 5 secs).  In fact, logging in as Other... and specifying root and root's password would sometimes hang gdm.  Until I installed gnome-desktop-environment, the greeter's cursor would never change to a pointer, just stay at the "I'm busy doing something" variety.  Worse yet, once the greeter wedged while attempting to log in, switching to a console and doing stop gdm and start gdm wouldn't help one whit; only a system reboot seemed to fix it.  God and deep GNOME DE hackers only know what the frak it wanted/was waiting for and never got.

The next surprise was that similarly, the "users and groups" widget was thoroughly nonfunctional until gnome-desktop-environment was installed.  It would bring up a window all right, but everything in it was greyed out...everything.  In fact, only xkill would make it close, not the WM, no keystrokes such as Alt-F4....nuttin'.   Well, not strictly nuttin'; a logout would kill it, but in doing that, it would be restarted with similar nonfunctioning results when logging back in...so it was wise always to xkill it before logging out.

What was really weird compared to previous releases is that individual greeters weren't configured by "Users and Groups."  Both the ones I had on, lightdm and gdm, were affected simultaneously.  That was a bit disquieting, as I was expecting only gdm to be affected, so when I used dpkg-reconfigure to revert back to lightdm, it was autologging in too.  This implementation chooses to have an "on/off toggle button for autologin" under each user.  And incidentally, I never did get back into a functioning "users and groups" GUI widget to undo the autologin effect; I found the conf files by recursively grepping through /etc for my logname, and editing them manually.  It was also somewhat of a surprise that neither of them seems to implement a timeout mechanism.  I tried adding that into the lightdm.conf according to some people's postings about that subject, but to no avail.

I even got into a state where start gdm would flash the greeter on the screen for maybe a half second, but then flip right back to the console on which that command was given.  So much for autologin, and indeed for the hypothetical stability of an LTS release.

Eventually I just gave up, and uninstalled anything GNOME related.  I had read so much about how people just hated it (the new v3) that I figured it really wasn't worth any hack time to get it to work right.  I did however try the v3 shell for one login session (ironically chosen from lightdm).  I will say it's sucky, but not as bad as the opaque Unity.  At least with GNOME3 shell, it's still tree-oriented, with (as I recall) two options of Activity and something else at the top level.  Then activity categories show up along the right of the screen, and selecting them causes the tree to be descended.  But just like the sucky Modern UI (or Metro, or whatever they're going to call it this week) and Unity UI, the programmers of it are deluded into thinking a tiled interface is appropriate for desktops.  Well, maybe "inappropriate" is too strong a word, maybe annoying by being aesthetically displeasing would be a bit more accurate.  I somewhat understand not wanting to program (or use) two UIs, but a workstation is not a tablet, nor a tablet a workstation.  There's no particularly good reason to jam the tablet interface on a desktop.  Microsoft are about to find this out in the worst way with Win8 right soon.

What I'm the most proud of figuring out today though was why video initialization within the radeon module seemed to hang.  After all, there wasn't this hang before I set about getting the radeon module to load sooner in the boot process.  As I alluded to earlier, one of the things I like to tweak is getting the framebuffer driver running and the screen to green on black as SOON in the boot process as possible.  That's one of the reasons I had studied the initramfs-tools package thoroughly.  I put in hooks and scripts which would copy the fb driver (which happens to be integral to the radeon kernel module) into the initial RAM disk, as well as setterm, fbset, and setfont.  As I wrote, I want this initialized ASAP, so I put it into the init-top section.  The big downside, if you're not really, really careful, is that if vid init doesn't go particularly well, you're mostly blind until networking and sshd start, if they start at all.  That's where I thought I had run into big trouble, as in, back possibly to reinstalling.
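For readers who haven't written one, here is a minimal sketch of what a hook of that sort might look like.  It is not my actual hook: the file name fbconsole-hook is made up for illustration, and the sketch writes to a scratch file in the current directory rather than to /etc/initramfs-tools/hooks/, where a real hook would live.  It leans on the manual_add_modules and copy_exec helpers from initramfs-tools' hook-functions, and it assumes the three tools are on the PATH of the machine building the image.

```shell
# Write the hook to a scratch file so it can be inspected first.
# On a real system it would be installed, executable, under
# /etc/initramfs-tools/hooks/ (the name below is hypothetical).
cat > ./fbconsole-hook <<'EOF'
#!/bin/sh
# initramfs-tools hook: runs at image build time (update-initramfs).
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

. /usr/share/initramfs-tools/hook-functions

# Pull the radeon module and its module dependencies into the image.
manual_add_modules radeon

# Copy the console tools, plus the shared libraries they link against.
for tool in setterm fbset setfont; do
    path=$(command -v "$tool") || { echo "fbconsole-hook: $tool not found" >&2; continue; }
    copy_exec "$path" /bin
done
EOF
chmod +x ./fbconsole-hook
```

After installing a hook like this, the image has to be rebuilt (update-initramfs -u) before anything changes at boot.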

I was just getting ready to take the 8GB flash drive I used to boot the installer and (re)insert it into a USB socket when the monitor came alive again.  My, that was a bit encouraging; it hadn't wedged completely, it was merely stalling for some reason, because it eventually finished.  So...I went into the grub defaults file, added a console=ttyS0,... part to the kernel arguments, updated grub, hooked up a serial cable to another workstation, and rebooted.
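A rough sketch of that edit follows, done on a scratch copy so it is safe to run anywhere.  The real file is /etc/default/grub, and the change only takes effect after update-grub.  The serial settings shown (115200n8) are purely an assumption for illustration, as is the starting value "quiet splash"; use whatever matches the terminal program on the other end of the cable.  The console=tty0 is added so the local screen keeps getting messages too.

```shell
# Scratch copy standing in for /etc/default/grub (file name is hypothetical).
cfg=./grub-defaults.example
printf '%s\n' 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > "$cfg"

# Assumed serial parameters; not taken from the original setup.
serial="console=ttyS0,115200n8"

# Append the console arguments just inside the closing quote.
sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 console=tty0 $serial\"|" "$cfg"

cat "$cfg"
# On the real system, follow the edit with: sudo update-grub
```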

Shazam.  It was getting to some part in the radeon module initialization where it was trying to download microcode of some sort, and timing out after a very long pause (like two minutes or so).  With due deference to Yogi Bear, this is where the "smarter than the average hacker" part comes in.  It's not necessarily "what changed," which is the perennial troubleshooting favorite; we know what changed: the initrd.  It eventually came down to, as most engineering sessions of this sort do, what's the difference between when it was working fine and now?  I have run across this sort of thing somewhat frequently in the past, usually something like the difference in environment variables between interactive sessions and cron jobs.  That's when the second "shazam" moment came: it's so early in the boot process that udev may not be running!  The tipoff was the eventual message that some file within /lib/firmware couldn't be loaded.  See, not only does udev make things appear and disappear from within /dev due to things like USB flash drives being inserted or removed, it is also the agent which initiates a lot of the firmware loads.  I know this from watching my MythTV with a PVR-500 boot, which pauses for about 7-10 seconds per tuner.

Under Lucid this didn't seem to be an issue.  However the radeon module happened to be programmed back then, or perhaps because of a slightly different boot sequence, it worked without any special attention.  The fix was as simple as adding "udev" to PREREQ in my script.  That ensured the firmware-loading udevd was running before even trying to initialize the Radeon video controller.  After that tweak, it was loading lickety-split.
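To show where that one word goes, here is a stripped-down sketch of an init-top boot script, again written to a scratch file under a made-up name rather than to /etc/initramfs-tools/scripts/init-top/.  It follows the usual initramfs-tools convention: the script declares its prerequisites in the PREREQ variable and prints them when called with the argument prereqs, and the initramfs then runs the named scripts (here the udev script shipped in init-top) first.  The body is only a placeholder for the real console setup.

```shell
# Scratch copy of an init-top script (the file name is hypothetical).
cat > ./fbconsole-init-top <<'EOF'
#!/bin/sh
# Runs inside the initramfs, before the root filesystem is mounted.
# Naming udev here makes the init-top/udev script run first, so udevd
# is up and able to service the firmware request from radeon.
PREREQ="udev"
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

# Placeholder for the real work: load the driver, then set up the console.
modprobe radeon
EOF
chmod +x ./fbconsole-init-top

# Ask the script for its prerequisites, the same way the initramfs does.
sh ./fbconsole-init-top prereqs
```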

One last thing: the terminfo files are now under /lib these days instead of /usr/share, and therefore on the root fs?  Really?  Seriously?   Sigh...whatever.  This is kind of required for setterm to do its work, since it needs to know how to work with the terminal type linux.  Therefore one of my hooks had to copy it into the initrd image.
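The copy itself is tiny.  A sketch of it, as a helper function that is not lifted from my hook: copy_terminfo is a name invented here, and the /lib/terminfo/<first letter>/<name> layout is assumed from where the linux entry sits on this system.  Inside a real hook the first argument would be "$DESTDIR", the image tree that initramfs-tools hands to hooks.

```shell
# copy_terminfo DESTDIR [TERM_NAME]
# Copy one terminfo entry into an image tree, keeping the same path.
# (Hypothetical helper; the variables are plain sh, so not function-local.)
copy_terminfo() {
    destdir=$1
    term=${2:-linux}
    first=$(printf '%s' "$term" | cut -c1)   # entries are filed by first letter
    src=/lib/terminfo/$first/$term
    [ -r "$src" ] || { echo "no terminfo entry at $src" >&2; return 1; }
    mkdir -p "$destdir/lib/terminfo/$first"
    cp -p "$src" "$destdir/lib/terminfo/$first/"
}
```

In a hook this would be called as copy_terminfo "$DESTDIR" linux, after hook-functions has been sourced.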

Instead of commenting here, direct all comments to Google+, preferably under the post about this blog entry.
