This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.

Migrating disk volumes to new storage solutions can be a bit of a chore. Cloning a file system is usually done after taking the system offline in order to preserve consistency. But every so often you simply can’t afford to have your data or system unavailable for that long. So what to do?

Our problem

We had invested in a new SAN storage solution and had to move the disk volumes for our virtual machines from the old SAN to the new one. The new solution did not come with any handy migration tools, and scheduling downtime for each VM or server because of internal infrastructure changes was something we wanted to avoid imposing on our customers. Not to mention the overhead involved in stopping, migrating and starting thousands of VMs, most of which had inter-dependencies.

Software RAID to the rescue!

mdadm

mdadm is the tool used to manage Software RAID in Linux. It can build a RAID that is not persistent: the superblock is kept in memory rather than on the device.

The commands then take the form:

/sbin/mdadm --build /dev/DEVICE --level 1 --raid-disks=2 /dev/mapper/ORGINALDISK missing
/sbin/mdadm --add /dev/DEVICE /dev/mapper/NEWDISK

The “normal” way of creating a Software RAID is with --create, but here we used --build, which builds an array without writing per-device metadata, so the contents of the original disk are left untouched. See the mdadm man page for details and warnings.

When the Software RAID has finished synchronizing, the RAID device is stopped:

/sbin/mdadm --stop /dev/DEVICE

After that, /dev/mapper/NEWDISK can be used instead of /dev/mapper/ORGINALDISK.

Until then, we always have the option of stopping the Software RAID device and reverting to /dev/mapper/ORGINALDISK.
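
Stopping the array before the synchronization has finished would leave the new disk incomplete, so a guard in front of the stop command is cheap insurance. The following is a minimal sketch, assuming the kernel exposes the usual md attributes `degraded` and `sync_action` under `/sys/block/<array>/md/` for an array made with --build. The function name and the optional second argument (an alternative sysfs root, handy for trying the check against a fake directory tree) are hypothetical.

```shell
#!/usr/bin/env bash
# md_safe_to_stop NAME [SYSFS_ROOT]
# Succeeds only if the array reports no missing members and no running sync.
# SYSFS_ROOT is a hypothetical override for experiments; it defaults to /sys.
md_safe_to_stop() {
    local name=$1 root=${2:-/sys}
    local md_dir="$root/block/$name/md"
    local degraded action

    [ -r "$md_dir/degraded" ] && [ -r "$md_dir/sync_action" ] || {
        echo "$name: no md attributes under $md_dir" >&2
        return 2
    }

    degraded=$(cat "$md_dir/degraded")    # number of missing members, 0 when complete
    action=$(cat "$md_dir/sync_action")   # e.g. recover, resync or idle

    if [ "$degraded" = 0 ] && [ "$action" = idle ]; then
        return 0
    fi
    echo "$name: degraded=$degraded sync_action=$action, not stopping" >&2
    return 1
}
```

It could then be used as `md_safe_to_stop md126 && /sbin/mdadm --stop /dev/md126`.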

SAN migration

Then it was just down to a few simple steps:

  • Stop the Virtual Machine
  • Create a Software RAID volume out of the original volume on the old SAN and an equally sized volume on the new SAN
  • Start the Virtual Machine pointing to the Software RAID volume instead of the old SAN volume
  • Wait for the synchronization to finalize
  • Stop the Virtual Machine
  • Stop the RAID device
  • Start the Virtual Machine pointing to the volume on the new SAN

This can be scripted, so in wall-clock time it amounts to two reboots of the Virtual Machine.
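
As an illustration of what such a script might look like, here is a sketch of the seven steps for a single VM. It is not the script we used: `vm_stop` and `vm_start` are hypothetical hooks that only print a reminder, since how a VM is stopped and started depends entirely on the virtualization platform, and `mdadm --wait` is my suggestion for blocking until the recovery is done. By default the sketch runs in dry-run mode and prints the mdadm commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the migration steps for one VM.
# With DRY_RUN=1 (the default) commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical hooks: replace with whatever controls VMs on your platform.
vm_stop()  { printf '# hook: stop VM %s\n' "$1"; }
vm_start() { printf '# hook: start VM %s on %s\n' "$1" "$2"; }

# migrate_vm VM OLD_DISK NEW_DISK MD_DEVICE
migrate_vm() {
    local vm=$1 old=$2 new=$3 md=$4

    vm_stop "$vm"
    run /sbin/mdadm --build "$md" --level 1 --raid-disks=2 "$old" missing || return 1
    run /sbin/mdadm --add "$md" "$new" || return 1
    vm_start "$vm" "$md"

    # Block until the recovery has finished. mdadm --wait exits non-zero
    # when there was nothing to wait for, so its status is not fatal here.
    run /sbin/mdadm --wait "$md" || true

    vm_stop "$vm"
    run /sbin/mdadm --stop "$md" || return 1
    vm_start "$vm" "$new"
}
```

Calling `migrate_vm web01 /dev/mapper/ORGINALDISK /dev/mapper/NEWDISK /dev/md126` prints the planned commands in order (`web01` is a made-up VM name). A real run would also want to confirm that the array is fully in sync before the final stop, because the exit status of `--wait` alone does not prove it.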

Real example

As I do not have pastes from the actual migration, I ran through the steps on normal LVM disk volumes.

I have some spare space on the rotating rust:

$ pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/sda2  system lvm2 a--  74.38g 5.00g
  /dev/sdb   slowhd lvm2 a--   2.73t 1.38t

Make two volumes for the task.

$ lvcreate -n orgvolume -L 100G slowhd
  Logical volume "orgvolume" created.
$ lvcreate -n destvolume -L 100G slowhd
  Logical volume "destvolume" created.
$ mkfs.xfs /dev/slowhd/orgvolume
meta-data=/dev/slowhd/orgvolume  isize=256    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
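
A mirror can only hold as much as its smallest member, so before building the RAID it is worth confirming that the destination is at least as large as the original. A small sketch of such a check follows; the function names are hypothetical. It uses `blockdev --getsize64` for block devices and falls back to `stat` for regular files, which makes it easy to try out on image files.

```shell
#!/usr/bin/env bash
# size_in_bytes PATH: size of a block device or, as a fallback, a regular file.
size_in_bytes() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        stat -L -c %s "$1"
    fi
}

# dest_fits SOURCE DEST: succeeds if DEST is at least as large as SOURCE.
# Returns 2 if either size cannot be determined.
dest_fits() {
    local src_size dest_size
    src_size=$(size_in_bytes "$1") || return 2
    dest_size=$(size_in_bytes "$2") || return 2
    [ "$dest_size" -ge "$src_size" ]
}
```

For the volumes above this would be `dest_fits /dev/slowhd/orgvolume /dev/slowhd/destvolume`.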

Then mount it and create some data on the disk.

$ mount /dev/slowhd/orgvolume /mnt/
$ dd if=/dev/urandom of=/mnt/somefile.data count=10 bs=1M
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.514991 s, 20.4 MB/s

$ md5sum /mnt/somefile.data
598fab89fe76e6019be2593779eea1e8  /mnt/somefile.data

Then it is time for some mdadm stuff:

$ umount /mnt
$ /sbin/mdadm --build /dev/md126 --level 1 --raid-disks=2 /dev/slowhd/orgvolume missing
mdadm: array /dev/md126 built and started.
$ /sbin/mdadm --add /dev/md126 /dev/slowhd/destvolume
mdadm: added /dev/slowhd/destvolume
$ cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 dm-17[2] dm-16[0]
      104857600 blocks super non-persistent [2/1] [U_]
      [>....................]  recovery =  0.3% (355264/104857600) finish=39.2min speed=44408K/sec

unused devices: <none>

$ /sbin/mdadm -D /dev/md126
/dev/md126:
        Version :
  Creation Time : Thu Nov 26 14:00:34 2015
     Raid Level : raid1
     Array Size : 104857600 (100.00 GiB 107.37 GB)
  Used Dev Size : 104857600 (100.00 GiB 107.37 GB)
   Raid Devices : 2
  Total Devices : 2

          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 1% complete

    Number   Major   Minor   RaidDevice State
       0     254       16        0      active sync   /dev/dm-16
       2     254       17        1      spare rebuilding   /dev/dm-17

And the main feature here: it’s possible to use the RAID while the synchronization is still running.

$ mount /dev/md126 /mnt/

And we add some data to the disk while it is being synchronized.

$ dd if=/dev/urandom of=/mnt/someotherfile.data count=100 bs=1M
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 5.00504 s, 21.0 MB/s

$ md5sum /mnt/someotherfile.data
c5dfaa813c47f998e8367a576e881a05  /mnt/someotherfile.data

md spends some time synchronizing the block devices. Eventually, it will complete:

$ cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 dm-17[2] dm-16[0]
      104857600 blocks super non-persistent [2/2] [UU]

unused devices: <none>

$ /sbin/mdadm -D /dev/md126
/dev/md126:
        Version :
  Creation Time : Thu Nov 26 14:00:34 2015
     Raid Level : raid1
     Array Size : 104857600 (100.00 GiB 107.37 GB)
  Used Dev Size : 104857600 (100.00 GiB 107.37 GB)
   Raid Devices : 2
  Total Devices : 2

          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0     254       16        0      active sync   /dev/dm-16
       2     254       17        1      active sync   /dev/dm-17
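
When this is scripted, nobody wants to stare at /proc/mdstat by hand. The sketch below reduces the stanza for one array to a single status word, based on the layout of the two /proc/mdstat listings shown in this post. The function name is hypothetical, and the parsing assumes the stanza for an array ends at the first blank line, as it does in those listings.

```shell
#!/usr/bin/env bash
# md_sync_state ARRAY [MDSTAT_FILE]
# Prints "recovering <percent>", "degraded", "in-sync" or "absent".
md_sync_state() {
    local array=$1 file=${2:-/proc/mdstat}
    awk -v md="$array" '
        $1 == md && $2 == ":" { found = 1; next }   # header line of the array
        found && /^[ \t]*$/   { exit }              # a blank line ends the stanza
        found && /blocks/     { status = $NF }      # member map, e.g. [U_] or [UU]
        found && /(recovery|resync) *=/ {           # progress line
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) pct = $i
        }
        END {
            if (!found)            print "absent"
            else if (pct != "")    print "recovering " pct
            else if (status ~ /_/) print "degraded"
            else                   print "in-sync"
        }
    ' "$file"
}
```

A loop such as `until [ "$(md_sync_state md126)" = in-sync ]; do sleep 30; done` would then wait for the mirror to complete.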

Then we can switch over to the new device:

$ umount /mnt
$ /sbin/mdadm --stop /dev/md126
mdadm: stopped /dev/md126
$ mount /dev/slowhd/destvolume /mnt/
$ md5sum /mnt/some*
598fab89fe76e6019be2593779eea1e8  /mnt/somefile.data
c5dfaa813c47f998e8367a576e881a05  /mnt/someotherfile.data
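
Comparing two checksums by eye works for a demo, but with more files a manifest is easier. Here is a small sketch of the same verification using `md5sum -c`; the function names are hypothetical, and the manifest is assumed to live outside the directory being checked.

```shell
#!/usr/bin/env bash
# snapshot_checksums DIR MANIFEST: record an md5sum for every file under DIR.
snapshot_checksums() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) | sort -k 2 > "$2"
}

# verify_checksums DIR MANIFEST: succeeds only if every recorded file
# exists under DIR with the same checksum.
verify_checksums() {
    local manifest
    # Resolve the manifest path first, because the check runs inside DIR.
    manifest=$(readlink -f -- "$2") || return 2
    ( cd "$1" && md5sum --quiet -c "$manifest" )
}
```

In the example above, that would mean running `snapshot_checksums /mnt /root/before.md5` while the RAID device is mounted and `verify_checksums /mnt /root/before.md5` once the new volume is mounted in its place.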

We used this process to migrate a number of virtual servers from an old SAN to a new one. The downtime for each virtual server was reduced to two reboots.

Jónas Helgi Pálsson

Senior Systems Consultant at Redpill Linpro

Jónas joined Redpill Linpro over a decade ago and has since worked as both a consultant and a system administrator. His main focus is currently AWS and infrastructure on that platform. He has previously worked with KVM and OpenStack, dabbles in programming and has a soft spot for openSUSE.
