EL5 to EL6: Pushing an in-place upgrade with Puppet

Posted by Ryan Uber | Bourne-Again Shell (bash),Enterprise Linux,Linux,Puppet,RPM | Saturday 7 January 2012 3:24 pm

EL6 is a large step up from the 5.x family. There are hundreds of improvements that every sysadmin will be eager to take advantage of: newer versions of most popular packages, a new kernel based on 2.6.32, new software added to the enterprise linux “base”, and so on. With so many large changes to the operating system, in-place upgrades become harder. You have a gang of EL5.x boxes and want to move them to the 6.x distribution, but the fence between the two distributions is a fairly high one; the biggest hurdles are the updates to YUM, python (2.6), and RPM itself (4.6+). You can’t read new RPM packages with an old version of RPM (versions before 4.6 can’t read EL6 RPMs), and there is no 4.6 version of RPM available for the EL5 distribution. Upgrading core components such as glibc and python in place can be scary too. However, the jump from EL5 to EL6 is not impossible. I’ll outline here how I was able to make it happen.

What you must not be afraid of to accomplish this

  1. Forcefully breaking RPM dependencies (will be resolved upon completion)
  2. Ignoring package checksum mismatches
  3. Rebooting

New version of RPM
Seeing how it’s not possible to install EL6 RPMs with EL5’s stock version of RPM, the first thing to do is to get a newer version of RPM installed on your system. Keith Chambers, a friend and colleague of mine, recompiled RPM 4.6 against glibc 2.5 so that it would work on the current EL5 system. RPM 4.8 may have worked as well, but 4.6 worked for me, and I only needed this version during the big upgrade; RPM will replace itself with the stock EL6 version later on. For dependencies, I needed to include xz-libs and lua, which I force-upgraded (--nodeps). What you want to end up with is:

# rpm --version
RPM version 4.6.0
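Because every later step depends on the rebuilt RPM, a script run from Puppet could refuse to continue while the old RPM is still in place. The following is a minimal bash sketch, not part of my original script: `version_ge` and `require_rpm_version` are hypothetical helper names, and it assumes `rpm --version` prints `RPM version X.Y.Z` as shown above and that the version fields are purely numeric.

```shell
# Hypothetical helpers (not from the original upgrade script).

# Return 0 if dotted version $1 >= $2. Fields must be numeric; a missing
# field counts as 0, so 4.6 and 4.6.0 compare equal.
version_ge() {
    local IFS=.
    local -a a b
    local i x y
    a=($1)
    b=($2)
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# $1: minimum version (default 4.6).
# $2: version to check; defaults to the third field of "rpm --version",
#     e.g. "RPM version 4.6.0" -> 4.6.0.
require_rpm_version() {
    local want=${1:-4.6}
    local have=${2:-$(rpm --version | awk '{print $3}')}
    if ! version_ge "$have" "$want"; then
        echo "rpm $have is older than $want; refusing to continue" >&2
        return 1
    fi
}
```

Near the top of the upgrade script, `require_rpm_version || exit 1` would stop the run before any forced removals happen.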

Patch YUM
Since the new RPMs coming from the repo are verified with a newer hashing algorithm than YUM under python 2.4 recognizes, I needed to make a few quick patches to YUM itself to force it to ignore the resulting checksum mismatches. These ghetto-hack patches are removed from the system automatically when YUM upgrades itself, so this is only needed one time.

diff -Nur /usr/lib/python2.4/site-packages/yum.orig/depsolve.py /usr/lib/python2.4/site-packages/yum/depsolve.py
--- /usr/lib/python2.4/site-packages/yum.orig/depsolve.py       2011-08-19 15:07:02.000000000 +0000
+++ /usr/lib/python2.4/site-packages/yum/depsolve.py    2012-01-07 20:43:21.634917883 +0000
@@ -142,6 +142,7 @@
                             'repackage': rpm.RPMTRANS_FLAG_REPACKAGE}

         self._ts.setFlags(0) # reset everything.
+        self._ts.addTsFlag(rpm.RPMTRANS_FLAG_NOMD5)

         for flag in self.conf.tsflags:
             if ts_flags_to_rpm.has_key(flag):
diff -Nur /usr/lib/python2.4/site-packages/yum.orig/__init__.py /usr/lib/python2.4/site-packages/yum/__init__.py
--- /usr/lib/python2.4/site-packages/yum.orig/__init__.py       2011-08-19 15:07:02.000000000 +0000
+++ /usr/lib/python2.4/site-packages/yum/__init__.py    2012-01-07 20:43:01.074133746 +0000
@@ -1217,6 +1217,7 @@
                 failed = True

+        failed = False
         if failed:
             # if the file is wrong AND it is >= what we expected then it
             # can't be redeemed. If we can, kill it and start over fresh
diff -Nur /usr/lib/python2.4/site-packages/yum.orig/yumRepo.py /usr/lib/python2.4/site-packages/yum/yumRepo.py
--- /usr/lib/python2.4/site-packages/yum.orig/yumRepo.py        2011-08-19 15:07:02.000000000 +0000
+++ /usr/lib/python2.4/site-packages/yum/yumRepo.py     2012-01-07 20:43:09.834466442 +0000
@@ -1467,6 +1467,7 @@
             file = fn.filename
         else:
             file = fn
+        return 1

         try:
             l_csum = self._checksum(r_ctype, file) # get the local checksum

I wrote a few quick sed one-liners to handle this patching for me:

sed -i '/^        if failed:/i\        failed = False' /usr/lib/python2.4/site-packages/yum/__init__.py
sed -i '/^            file = fn$/a\        return 1' /usr/lib/python2.4/site-packages/yum/yumRepo.py
sed -i '/# reset everything.$/a\        self._ts.addTsFlag(rpm.RPMTRANS_FLAG_NOMD5)' /usr/lib/python2.4/site-packages/yum/depsolve.py
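Because the upgrade script runs from Puppet, it may be executed more than once, and running the sed lines above twice would insert each patch line twice. A minimal sketch of an idempotent wrapper, assuming GNU sed (for `-i` and the one-line `i\`/`a\` forms used above): `patch_once`, `patch_yum` and the per-file `.orig` backups are my own hypothetical additions (the diff above was made against a copied `yum.orig` directory instead), and the directory is a parameter only so it can be tried on a scratch copy.

```shell
YUM_DIR=${YUM_DIR:-/usr/lib/python2.4/site-packages/yum}

# Hypothetical helper: apply a sed expression to a file only once.
# A pristine "<file>.orig" is kept; if it already exists, an earlier
# run patched the file, so nothing is done.
patch_once() {
    local file=$1 expr=$2
    [ -e "$file.orig" ] && return 0
    cp -p "$file" "$file.orig" || return 1
    sed -i "$expr" "$file"
}

# Same three edits as the sed lines above; $1 overrides the yum directory.
patch_yum() {
    local dir=${1:-$YUM_DIR}
    patch_once "$dir/__init__.py" '/^        if failed:/i\        failed = False' &&
    patch_once "$dir/yumRepo.py" '/^            file = fn$/a\        return 1' &&
    patch_once "$dir/depsolve.py" '/# reset everything.$/a\        self._ts.addTsFlag(rpm.RPMTRANS_FLAG_NOMD5)'
}
```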

Package Conflicts
A number of packages will cause you trouble during an upgrade from EL5 to EL6: some have version numbers that actually decrement, and others cause dependency resolution issues that are solved only after the upgrade is complete. The following is a small snippet that I used to force my way out of this situation:

declare -ra FORCEREMOVE=(
    m2crypto centos-release newt authconfig prelink tcp_wrappers sgpio
    iscsi-initiator-utils mkinitrd dmraid dmraid-events hmaccalc sysfsutils
    device-mapper device-mapper-multipath device-mapper-event
    vmware-open-vm-tools-common vmware-open-vm-tools-kmod less usermode
    libhugetlbfs lvm2 kpartx e4fsprogs-libs glib libsysfs
)
for PKG in "${FORCEREMOVE[@]}"; do
    # --nodeps: the broken dependencies are resolved once the upgrade completes
    rpm -e --nodeps "${PKG}"
done
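Not every box will have every package in that list installed (the vmware ones, for example), and `rpm -e` prints an error for each missing one. A small variation, sketched here with a hypothetical `force_remove_installed` function, only removes packages that `rpm -q --quiet` reports as installed; it is meant to be called as `force_remove_installed "${FORCEREMOVE[@]}"`.

```shell
# Hypothetical helper: force-remove only the packages that are installed.
force_remove_installed() {
    local pkg
    for pkg in "$@"; do
        # rpm -q --quiet exits non-zero (silently) if the package is absent.
        if rpm -q --quiet "$pkg"; then
            rpm -e --nodeps "$pkg" || echo "failed to remove $pkg" >&2
        fi
    done
}
```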

PAM authentication work-around
By default, the RPMs that install PAM configuration do not clean out /etc/pam.d. If the authentication modules you use change at all (new ones added, old ones removed), you will need to clean out this directory manually. I found it easiest to just remove all of the configs and let them be re-populated by the new RPMs. Without this step, after the upgrade completed I was unable to log in to the console of the machine, because some authentication modules referenced in the config files were now absent (they were no longer needed in the product I work on). If you have custom configs, you will need to account for them somehow, hopefully using Puppet or your configuration tool of choice.

rm -f /etc/pam.d/*
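If you do have custom PAM configs, it may help to keep a copy of the EL5 versions before wiping them. A minimal sketch, assuming a hypothetical `reset_pam_configs` function and a hypothetical backup location of `/root/pam.d.el5-backup`; both paths are parameters so the function can be tried on a scratch directory first.

```shell
# Hypothetical helper: back up the PAM directory once, then empty it.
# $1: PAM directory (default /etc/pam.d)
# $2: backup location (hypothetical default /root/pam.d.el5-backup)
reset_pam_configs() {
    local dir=${1:-/etc/pam.d}
    local backup=${2:-/root/pam.d.el5-backup}
    [ -d "$dir" ] || return 1
    # Keep only the first backup, so repeated Puppet runs don't overwrite it.
    if [ ! -e "$backup" ]; then
        cp -a "$dir" "$backup" || return 1
    fi
    rm -f "$dir"/*
}
```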

Re-install centos-release
You will notice in one of the steps above that I force-removed the centos-release package. I needed to do this because of the following error:

package centos-release-5-7.el5.centos.x86_64 (which is newer than centos-release-6-0.el6.centos.5.x86_64) is already installed

By force-removing the centos-release package, and then performing a “yum install centos-release”, the centos-release package gets updated to the 6.x version.
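Since centos-release was already removed by the force-remove loop, this step only needs to install it again. A sketch that also checks the result, using a hypothetical `install_el6_release` function; it assumes the EL6 package reports `6` as its RPM VERSION (as in centos-release-6-0.el6.centos.5 above) while the EL5 one reports `5`.

```shell
# Hypothetical helper: install centos-release and confirm it is the 6.x one.
install_el6_release() {
    local ver
    yum -y install centos-release || return 1
    ver=$(rpm -q --qf '%{VERSION}' centos-release) || return 1
    if [ "$ver" != 6 ]; then
        echo "centos-release version is $ver, expected 6" >&2
        return 1
    fi
}
```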

Downgrade nss
This is a very important step. The version of “nss” actually decremented between EL5 and EL6 (3.12.8-4.el5_6 versus 3.12.7-2.el6), so it has to be downgraded explicitly. As the dependency resolution below shows, the EL6 nss pulls in nss-softokn, which requires nss-softokn-freebl, which in turn requires a newer glibc (GLIBC_2.7). This step should therefore downgrade nss, install a few new packages, and update glibc to 2.12. Once you do this, there is really no going back.

# yum downgrade nss
Loaded plugins: fastestmirror
Setting up Downgrade Process
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package nss.x86_64 0:3.12.7-2.el6 set to be updated
--> Processing Dependency: nss-softokn(x86-64) >= 3.12.7 for package: nss
--> Processing Dependency: nss-util >= 3.12.7 for package: nss
--> Processing Dependency: libnssutil3.so(NSSUTIL_3.12.3)(64bit) for package: nss
--> Processing Dependency: nss-system-init for package: nss
--> Processing Dependency: libnssutil3.so(NSSUTIL_3.12)(64bit) for package: nss
--> Processing Dependency: libnssutil3.so(NSSUTIL_3.12.5)(64bit) for package: nss
--> Processing Dependency: libnssutil3.so()(64bit) for package: nss
---> Package nss.x86_64 0:3.12.8-4.el5_6 set to be erased
--> Running transaction check
---> Package nss-softokn.x86_64 0:3.12.8-1.el6_0 set to be updated
--> Processing Dependency: nss-softokn-freebl(x86-64) >= 3.12.8 for package: nss-softokn
---> Package nss-sysinit.x86_64 0:3.12.7-2.el6 set to be updated
---> Package nss-util.x86_64 0:3.12.8-1.el6_0 set to be updated
--> Running transaction check
---> Package nss-softokn-freebl.x86_64 0:3.12.8-1.el6_0 set to be updated
--> Processing Dependency: libc.so.6(GLIBC_2.7)(64bit) for package: nss-softokn-freebl
--> Running transaction check
---> Package glibc.x86_64 0:2.12-1.7.el6_0.5 set to be updated
--> Processing Dependency: glibc-common = 2.12-1.7.el6_0.5 for package: glibc
--> Running transaction check
---> Package glibc-common.x86_64 0:2.12-1.7.el6_0.5 set to be updated
--> Processing Conflict: glibc conflicts binutils < 2.19.51.0.10
--> Restarting Dependency Resolution with new changes.
--> Running transaction check
---> Package binutils.x86_64 0:2.20.51.0.2-5.11.el6 set to be updated
--> Processing Conflict: glibc conflicts prelink < 0.4.2
--> Restarting Dependency Resolution with new changes.
--> Running transaction check
---> Package prelink.x86_64 0:0.4.6-3.el6 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=========================================================================================
 Package                   Arch          Version                     Repository     Size
=========================================================================================
Updating:
 binutils                  x86_64        2.20.51.0.2-5.11.el6        base          2.8 M
 prelink                   x86_64        0.4.6-3.el6                 base          994 k
Downgrading:
 nss                       x86_64        3.12.7-2.el6                base          735 k
Installing for dependencies:
 nss-softokn               x86_64        3.12.8-1.el6_0              base          166 k
 nss-softokn-freebl        x86_64        3.12.8-1.el6_0              base          115 k
 nss-sysinit               x86_64        3.12.7-2.el6                base           26 k
 nss-util                  x86_64        3.12.8-1.el6_0              base           46 k
Updating for dependencies:
 glibc                     x86_64        2.12-1.7.el6_0.5            base          3.7 M
 glibc-common              x86_64        2.12-1.7.el6_0.5            base           14 M

Transaction Summary
=========================================================================================
Install       4 Package(s)
Upgrade       4 Package(s)
Remove        0 Package(s)
Reinstall     0 Package(s)
Downgrade     1 Package(s)

Once this has completed, you are clear to upgrade the rest of the system.

# yum -y upgrade
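When both commands run unattended from a script, it may be worth confirming that the nss downgrade really brought in glibc 2.12 before starting the full upgrade. A sketch with a hypothetical `downgrade_nss_then_upgrade` function; it assumes `-y` is acceptable for the downgrade (it was run interactively above) and that the EL6 glibc reports VERSION `2.12`, as in glibc-2.12-1.7.el6_0.5.

```shell
# Hypothetical helper: downgrade nss, verify glibc 2.12, then upgrade.
downgrade_nss_then_upgrade() {
    yum -y downgrade nss || return 1
    # At least one installed glibc must now be the EL6 2.12 build.
    if ! rpm -q --qf '%{VERSION}\n' glibc | grep -qx '2\.12'; then
        echo "glibc 2.12 not installed after nss downgrade; stopping" >&2
        return 1
    fi
    yum -y upgrade
}
```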

Re-install a few packages
A few of the packages that we force-removed for dependency resolution reasons will not get automatically installed. Therefore, we need to install them by hand:

declare -ra VMW_PKGS=(
    vmware-tools-core
    vmware-tools-foundation
    vmware-tools-guestlib
    vmware-tools-libraries-nox
    vmware-tools-plugins-guestInfo
    vmware-tools-plugins-vix
    vmware-tools-services
    vmware-tools-plugins-deployPkg
)
yum -y install "${VMW_PKGS[@]}" prelink lvm2 less

You can ignore the vmware packages if you are not using them.
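Since this runs under Puppet anyway, one option is to let Facter decide whether the vmware packages are needed. A sketch with a hypothetical `install_readded_packages` function; it assumes `facter virtual` prints `vmware` on VMware guests and takes an explicit virtualization type as its first argument for testing.

```shell
# Hypothetical helper: reinstall prelink, lvm2 and less everywhere, and the
# vmware-tools packages only on VMware guests.
# $1: virtualization type; defaults to Facter's "virtual" fact.
install_readded_packages() {
    local virt=${1:-$(facter virtual 2>/dev/null)}
    local -a pkgs=(prelink lvm2 less)
    if [ "$virt" = vmware ]; then
        pkgs+=(
            vmware-tools-core
            vmware-tools-foundation
            vmware-tools-guestlib
            vmware-tools-libraries-nox
            vmware-tools-plugins-guestInfo
            vmware-tools-plugins-vix
            vmware-tools-services
            vmware-tools-plugins-deployPkg
        )
    fi
    yum -y install "${pkgs[@]}"
}
```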

Remove the old kernels
You should no longer need any of your EL5 kernels. You have hopped the fence and are in the land of EL6 now. You should be able to delete (rpm -e) any kernel-2.6.18* packages that are still installed.
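A sketch of that cleanup as a hypothetical `remove_el5_kernels` function. Rather than relying on how `rpm -qa` interprets a glob, it lists every installed kernel as name-version-release and keeps only the 2.6.18 ones. Keep in mind that the EL5 kernel is still the running one until the reboot, so removing its package also removes its /lib/modules tree; doing this at the very end of the script, right before rebooting, seems safest.

```shell
# Hypothetical helper: remove every installed kernel-2.6.18-* package.
remove_el5_kernels() {
    local -a pkgs
    pkgs=( $(rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' kernel |
             grep '^kernel-2\.6\.18-') )
    if [ "${#pkgs[@]}" -eq 0 ]; then
        return 0    # nothing left to remove
    fi
    rpm -e "${pkgs[@]}"
}
```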

Rebooting
Since I added all of the above logic to a script and executed it during a pre-install stage with Puppet, I needed a way to reboot the machines automatically as well. In one of my previous posts, Puppet Self-Management, I detailed how to patiently wait for a puppet run to finish before executing some action from within a script. I applied the same technique to the reboot in my EL5 to EL6 upgrade script. Since the upgrade script executes during a puppet run, I do not want to bounce the machine before the run completes: waiting avoids corrupting Puppet’s state, and it ensures that all of my other puppet-managed updates are applied before the machine boots into EL6 for the first time. I accomplished this with the following code at the very end of my upgrade script:

/bin/sh -c "
    until [ ! -f /var/lib/puppet/state/puppetdlock ]
    do
        sleep 1
    done
    /sbin/shutdown -r now" &
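After the old kernels are gone, rebooting a box that somehow has no EL6 kernel installed would leave it with nothing to boot. A sketch of a guard around the reboot snippet, using hypothetical `el6_kernel_installed` and `schedule_reboot` functions; it assumes EL6 kernels report VERSION `2.6.32`, and it only checks the package, not the bootloader configuration.

```shell
# Hypothetical helper: succeed if at least one 2.6.32 kernel is installed.
el6_kernel_installed() {
    rpm -q --qf '%{VERSION}\n' kernel 2>/dev/null | grep -qx '2\.6\.32'
}

# Hypothetical wrapper: only fork the wait-then-reboot loop from above
# when there is an EL6 kernel to boot into.
schedule_reboot() {
    if ! el6_kernel_installed; then
        echo "no 2.6.32 kernel installed; not rebooting" >&2
        return 1
    fi
    /bin/sh -c "
        until [ ! -f /var/lib/puppet/state/puppetdlock ]
        do
            sleep 1
        done
        /sbin/shutdown -r now" &
}
```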

Puppet Self-Management

Posted by Ryan Uber | Linux,Puppet | Tuesday 19 July 2011 10:37 pm

As configuration becomes more and more automated with puppet, at some point you will start thinking about how to manage puppet’s own configuration. Your first thought is probably “I’ll just use puppet!”, which is certainly the right attitude; however, there are a couple of caveats. One such caveat is managing the puppet service. Let’s say you organize your manifest for managing puppet in a pretty standard fashion: you manage puppet.conf, which notifies the puppet service when it changes (or the puppet service is subscribed to the file, or both). This is all well and good; it’s how you would manage just about any other package/file/service manifest.

The problem is this: Puppet begins to run, it updates its config file, and then notifies the service to restart due to the change. The service restart will not necessarily be the very last thing to happen during the puppet run, so if you are in the middle of a run and the service restarts, you are most likely going to be in a bit of trouble. Typically what happens is that the running process gets killed prematurely, causing puppet’s state file to become corrupted. This isn’t the end of the world. However, if the lock file did not get removed as part of the service shutdown, you might be in slightly more serious trouble. Normally at this point, you would probably need to log in to the machine via SSH to correct the issue. Not a huge deal, unless you are managing a large number of systems.

All of this trouble can be avoided. At some point, maybe puppetd will be able to catch the SIGHUP correctly and handle this whole thing gracefully on its own. Until that day comes, I have come up with a small script that will help avoid this.

Essentially, this script forks an “until” loop. The loop checks whether the “puppetdlock” file exists, which indicates that a puppet run is in progress. If it does, the loop sleeps for a second and then tries again. This repeats until the file goes away, indicating that the run is finished. At that point it is safe to restart puppet, so the loop ends and the restart is carried out.

This of course will not give you an accurate “puppet restarted successfully” message, because the only thing really being tested here is the success of the fork operation. You must rely on centralized logging or similar to catch the unlikely puppet service failure.

#!/bin/bash
# File Name:     restart-puppetd.sh
# Author:        Ryan Uber <ryan@blankbmx.com>
#
# Description:   This script is a hack! However, it solves a very important
#                issue with puppet. Normally, if you subscribe the puppet
#                service to the puppet.conf file, the puppet service will
#                be restarted too soon, interrupting the current puppet
#                run. Various attempts at using configure_delayed_restart
#                among other things have not proven to be 100% effective.
#                This script will watch the puppetdlock file, which can
#                determine whether or not there is a run in progress. If
#                there is a run in progress, we sleep for a second and then
#                test again until the process is unlocked. Once unlocked, we
#                can safely call a puppet restart. The checker process
#                itself gets forked into the background. If it were not
#                forked into the background, the puppet run would sit and
#                wait for the process to return, or for the exec timeout,
#                whichever came first. This would cause serious trouble if
#                timeouts were disabled or set to very long periods of time.

# Begin waiting for the current puppet run to finish, then restart.
/bin/sh -c "
    until [ ! -f /var/lib/puppet/state/puppetdlock ]
    do
        sleep 1
    done
    /sbin/service puppet restart" &

# Always return true, since this script just forks another process.
exit 0

# EOF
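One weakness of the loop above is the stale-lock case described earlier: if a killed run leaves puppetdlock behind, the forked loop spins forever. A possible variant, sketched with a hypothetical `wait_then_run` function and an arbitrary example limit, gives up after a fixed number of seconds and logs why. The trade-off is that puppet is then not restarted at all, so you still need centralized logging to notice.

```shell
# Hypothetical variant: wait for a lock file to disappear, but give up
# after a maximum number of seconds instead of looping forever.
# $1: lock file, $2: max seconds to wait, remaining args: command to run.
wait_then_run() {
    local lock=$1 max=$2 waited=0
    shift 2
    while [ -f "$lock" ]; do
        if [ "$waited" -ge "$max" ]; then
            echo "lock $lock still present after ${max}s; not running: $*" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    "$@"
}
```

In restart-puppetd.sh this would replace the `/bin/sh -c` block with something like `wait_then_run /var/lib/puppet/state/puppetdlock 1800 /sbin/service puppet restart &`, where 1800 seconds is only an example value; the script still exits 0 immediately.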

Global Definitions and Relative Scope in Puppet

Posted by Ryan Uber | Puppet | Saturday 14 May 2011 11:52 am

After completing a big piece of code, and before committing it, it is almost always helpful to take a step back and look at it at a high level. Look for patterns and ways it can be simplified. Using software like puppet, you will inevitably find many, many patterns in the things you normally do with Unix systems, for instance file permissions and group ownership. Recently, I realized that I used the mode, owner, group, and notify attributes far too often throughout my puppet manifests when dealing with individual files.

I realized that not all of the files need the same permissions, and not all of them need the same owner or group. Most manifests, however, will contain 3 basic types: a package, a file, and a service, or “PFS”. These 3 types relate to the same piece of software, and most likely will share the same owner and group at a minimum. My thinking is that you should only need to specify which user and group own these files and folders once, and have that apply to everything else.

To demonstrate, here is what a typical puppet manifest might look like, writing everything out, plain and simple:

file {

    "file_1":
        path    => "/path/to/file_1",
        owner   => "someuser",
        group   => "somegroup",
        mode    => "0644",
        content => "This is a test.\n";

    "file_2":
        path    => "/path/to/file_2",
        owner   => "someuser",
        group   => "somegroup",
        mode    => "0644",
        content => "This is another test.\n";
}

You can already see the patterns emerging even though we have only defined two files so far. So let’s say we have 5 of these files. With the above method of defining the files and their attributes, your manifests might become quite long, especially if you are defining more common attributes, like a require or a notify.

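The following example achieves exactly the same effect as specifying the owner, group, and mode in each file definition, replacing 15 lines of duplicated attributes across just 5 file statements with a single defaults block. Note the capitalized File: a capitalized resource type sets default attribute values for every resource of that type in the current scope.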

File {
    owner   => "someuser",
    group   => "somegroup",
    mode    => "0644"
}

file {

    "file_1":
        path    => "/path/to/file_1",
        content => "This is a test.\n";

    "file_2":
        path    => "/path/to/file_2",
        content => "This is another test.\n";

    "file_3":
        path    => "/path/to/file_3",
        content => "This is test 3.\n";

    "file_4":
        path    => "/path/to/file_4",
        content => "This is test 4.\n";

    "file_5":
        path    => "/path/to/file_5",
        content => "This is test 5.\n";
}

Now things might get a little trickier. You obviously won’t have every file you manage owned by the same user, or the same group, or with the same permissions. Two things come into play here:

  1. The fact that we are only defining the defaults, and
  2. Scope

Since the owner, group, and mode we have specified are only defaults, you can still set those attributes per file. For instance:

File {
    owner   => "someuser",
    group   => "somegroup",
    mode    => "0644"
}

file {

    "file_1":
        path    => "/path/to/file_1",
        content => "This is a test.\n";

    "file_2":
        path    => "/path/to/file_2",
        owner   => "someotheruser",
        content => "This is another test.\n";
}

In the above example, “file_1” will get the defaults for all 3 attributes, and therefore be owned by “someuser:somegroup”. “file_2”, however, overrides the default owner, and will thus have ownership of “someotheruser:somegroup”.

Now for scope. Suppose I have an Apache class and a MySQL class. These should not have the same ownership. However, if I have defined the files, services, and other things related to each piece of software in separate classes, then I am in luck.

Global defaults can be defined per-class, and are inherited.

File {
    owner   => "root",
    group   => "root",
    mode    => "0700"
}

class mysql
{
    File {
        owner   => "mysql",
        group   => "mysql",
        mode    => "0644"
    }

    file { "my.cnf":
        path    => "/etc/my.cnf",
        content => "This is a test.\n";
    }
}

class httpd
{
    File {
        owner   => "apache",
        group   => "apache",
        mode    => "0644"
    }

    file { "httpd.conf":
        path    => "/etc/httpd/conf/httpd.conf",
        content => "This is another test.\n";
    }
}

You can see how the global defaults are defined at the top here, as “root:root” with “0700” permissions. Then, within each class, new defaults are set for the files, and therefore, within the scope of that class only, all file statements get the class-specific permissions.

Use this technique throughout your manifests, and you will notice they start to look much simpler and more organized while accomplishing the same result. Also keep in mind that this makes it much easier to modify attributes at a wide scale without re-keying the modifications again and again. Be warned, however: adding a new attribute to a defaults definition will affect many files, so be sure that it will not negatively impact any one particular item in your manifests.
