Solaris 11 LDOM Recovery


Today we are going to discuss Solaris 11 LDOM recovery. There are several scenarios in which the recovery steps we are going to discuss apply. The base setup is a Solaris 11 control domain (CDOM) hosting Solaris 10 and Solaris 11 LDOMs, with the control domain running Veritas for volume management.

Case 1: The OS disk on which the LDOM is configured has gone completely bad and has to be replaced by a new disk from SAN.

Case 2: The existing storage on which the LDOM OS is configured is being replaced to meet business needs for higher performance at lower price, which is the most common driver for SAN migrations: cut costs and increase performance using the latest technology available from time to time.

In short, whatever the scenario, the prevailing situation is that the OS SAN disk needs to be replaced.

We will see how to perform Solaris 11 LDOM recovery after the disk is replaced.

You might have observed that the disk name changes when a new SAN LUN is allocated from a new storage array, which leads to a non-booting operating system.

First of all, we always recommend running the commands below before starting any disk replacement, as a safety procedure to ensure recovery in case the OS does not boot after the disk replacement.

1. Save CDOM Configuration
# ldm add-spconfig <$date>
2. Save LDOM Disk Configuration
# ldm ls -o disk <ldom> > /var/tmp/ldm-ls-o-disk-ldom.txt
3. Save LDOM Constraints for a Specific LDOM
# ldm list-constraints -x ldom_name > ldom_name.xml
4. Save All LDOM constraints
# ldm list-constraints -x > /var/tmp/cdom_ldoms.xml
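The four backup steps above can be scripted. Below is a minimal dry-run sketch that only prints the `ldm` commands so they can be reviewed first; the domain names (ldom01, ldom02) are examples, and the printed commands would be run as root on the control domain.

```shell
#!/bin/sh
# Dry-run sketch of the backup steps above: it only PRINTS the ldm
# commands for review. Domain names are examples; run the printed
# commands as root on the control domain.
STAMP=$(date +%Y%m%d)
CMDS="ldm add-spconfig backup-$STAMP"
for dom in ldom01 ldom02; do
    CMDS="$CMDS
ldm ls -o disk $dom > /var/tmp/ldm-ls-o-disk-$dom.txt
ldm list-constraints -x $dom > /var/tmp/$dom.xml"
done
CMDS="$CMDS
ldm list-constraints -x > /var/tmp/cdom_ldoms.xml"
printf '%s\n' "$CMDS"
```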

So let’s review the output we saved earlier to validate exactly what changed after the new SAN disk allocation from the new storage array. We saved the LDOM config in /var/tmp on the CDOM before the start of the work.

Below is the saved output from before the change, which shows the disk the OS was residing on. Once the storage migration is done you will get the device name from the storage team, so you can easily identify whether it is the same device or a different one.

# cat /var/tmp/ldm-ls-o-disk-ldom01.txt
NAME ldom01
DISK    NAME         VOLUME                 TOUT ID   DEVICE  SERVER         MPGROUP
vspg1k0_5539-p1 vspg1k0_5539-p1@primary-vds0      0    disk@0  primary
# cat /var/tmp/ldm-ls-o-disk-ldom02.txt |grep 538e
vspg1k0_538e vspg1k0_538e@primary-vds0      0    disk@0  primary        vspg1k0_538e
New Devices allocated are as below
# vxdisk -eo alldgs list |egrep "7250|7251"
tagmastore-usp1_7250 auto:none - - online invalid c6t50060E8008773322d1s2 lun fc >>>>Solaris 10 VM
tagmastore-usp1_7251 auto:ZFS --           ZFS    c6t50060E8008773322d2 lun fc >>>>Solaris 11 VM

From the above sample output, you can see that the LDOM OS disks were configured on disk IDs 5539 and 538E, both from usp0. The storage team informed us that the newly assigned LUNs are from a different array, picked up as usp1 with disk IDs 7250/7251.

So after the storage migration, if you try to boot normally, assuming the new LUNs are virtualized the same as the old ones, you will end up with a non-booting OS and an error like the one below.

Please enter a number [1]: Jul  4 13:15:36 solaris vdc: NOTICE: vdisk@0 disk access failed

In both cases, you need to remove the existing OS disk configuration, as the array changed from usp0 to usp1 and the LUN IDs also changed.

Removing Existing Device

Remove the existing devices from the CDOM configuration using the commands below. You need to stop the VM before removing them. You need to do the same for every LDOM whose source array and LUN ID changed during the allocation of the replacement LUN for the OS disk.

# ldm stop ldom01
# ldm remove-vdisk -f usp0_01d3-p1 ldom01
# ldm remove-vdsdev -f usp0_01d3-p1@primary-vds0
# ldm ls -o disk ldom01   >>> Validate that no configuration exists anymore.
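Since the same stop/remove sequence is repeated per guest, the commands can be generated in a loop. Below is a hedged dry-run sketch that only prints the commands; the guest/backend pairs reuse this article's example names, so check yours first with `ldm ls -o disk <ldom>`.

```shell
#!/bin/sh
# Dry-run sketch: print the stop/remove sequence for each guest whose
# OS disk backend changed. The guest/backend pairs reuse this article's
# example names; verify yours with "ldm ls -o disk <ldom>" first.
CMDS=""
for pair in "ldom01:usp0_01d3-p1" "ldom02:vspg1k0_538e"; do
    dom=${pair%%:*}
    vol=${pair#*:}
    CMDS="$CMDS
ldm stop $dom
ldm remove-vdisk -f $vol $dom
ldm remove-vdsdev -f $vol@primary-vds0"
done
printf '%s\n' "$CMDS"
```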

(A) Adding New Devices to Solaris 10 LDOMS using MPGROUP

# ldm add-vdsdev mpgroup=usp1_7250 /dev/vx/dmp/tagmastore-usp1_7250 usp1_7250-p1@primary-vds0
# ldm add-vdsdev mpgroup=usp1_7250 /dev/vx/dmp/tagmastore-usp1_7250 usp1_7250-p2@secondary-vds0
# ldm add-vdisk usp1_7250-p1 usp1_7250-p1@primary-vds0 ldom01
# ldm add-vdisk usp1_7250-p2 usp1_7250-p2@secondary-vds0 ldom01

You need to do the same for all LDOMs. Primary and secondary paths are used here because a failover domain is configured for availability purposes.

# ldm ls -o disk ldom01       Validate the configuration is in place.
# ldm start ldom01
# telnet localhost <port>
OK > boot

It will boot.
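When several Solaris 10 guests need the same treatment, the four add commands can be generated per guest/LUN pair. A hedged dry-run sketch that only prints the commands (the guest-to-LUN mapping below is illustrative; adjust to your environment):

```shell
#!/bin/sh
# Dry-run sketch: print the MPGROUP add commands for each Solaris 10
# guest and its new LUN. The guest->LUN mapping below is illustrative.
CMDS=""
for pair in "ldom01:usp1_7250"; do
    dom=${pair%%:*}
    lun=${pair#*:}
    CMDS="$CMDS
ldm add-vdsdev mpgroup=$lun /dev/vx/dmp/tagmastore-$lun $lun-p1@primary-vds0
ldm add-vdsdev mpgroup=$lun /dev/vx/dmp/tagmastore-$lun $lun-p2@secondary-vds0
ldm add-vdisk $lun-p1 $lun-p1@primary-vds0 $dom
ldm add-vdisk $lun-p2 $lun-p2@secondary-vds0 $dom"
done
printf '%s\n' "$CMDS"
```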

(B) Adding New Devices to Solaris 11 LDOMS

There is a small catch: allocation the Solaris 10 way is not going to work for a Solaris 11 LDOM because, as mentioned above, the disk format is ZFS and the guest does not pick it up. Let us validate.

If you look at the partition details of the newly assigned disk, you will find something like the below.

# format c6t50060E8008773322d2 >print>print
Total disk sectors available: 209698749 + 16384 (reserved sectors)
Part      Tag    Flag     First Sector         Size         Last Sector
0        usr    wm               256       99.99GB          209698782
1 unassigned    wm                 0           0               0
2 unassigned    wm                 0           0               0
3 unassigned    wm                 0           0               0  >>Slices s1-s6 are not in use
4 unassigned    wm                 0           0               0
5 unassigned    wm                 0           0               0
6 unassigned    wm                 0           0               0
8   reserved    wm         209698783        8.00MB          209715166  >>Contains the VTOC, inode index, inode tables, etc.

New Disk Validation and Finding the Boot Device

# installboot -ei -F zfs /dev/vx/rdmp/tagmastore-usp1_7251s0

From the format output and the installboot inquiry above, you can see that the s0 partition of the disk contains the bootblock and the OS. You can also validate the same with zdb against the Veritas raw device, as below. Remember, specifying the partition is very important.
# zdb -lc /dev/vx/rdmp/tagmastore-usp1_7251s0
timestamp: 1590062854 UTC: Thu May 21 12:07:34 2020
version: 37
name: 'rpool'
state: 0
txg: 81059
pool_guid: 12702437114737145633
hostid: 2230872562
hostname: ''
top_guid: 18174691079512398175
guid: 18174691079512398175
vdev_children: 1
type: 'disk'
id: 0
guid: 18174691079512398175
path: '/dev/dsk/c1d0s0'
devid: 'id1,vdc@n60060e8007dd6f000030dd6f00005539/a'
phys_path: '/virtual-devices@100/channel-devices@200/disk@1:a'
whole_disk: 1
metaslab_array: 27
metaslab_shift: 29
ashift: 9
asize: 106837311488
is_log: 0
create_txg: 4
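To pull just the identifying fields out of that label dump, you can filter it. The sketch below embeds a few lines of the saved output shown above purely so the filter can be demonstrated; on the CDOM you would pipe `zdb -lc` itself instead.

```shell
#!/bin/sh
# Hedged sketch: filter a zdb label dump for the identifying fields
# (pool name, guids, device path). A few lines of the output above are
# embedded here; on the CDOM you would run instead:
#   zdb -lc /dev/vx/rdmp/tagmastore-usp1_7251s0 | egrep "name:|path:|guid:"
FIELDS=$(egrep "name:|path:|guid:" <<'EOF'
name: 'rpool'
pool_guid: 12702437114737145633
hostid: 2230872562
path: '/dev/dsk/c1d0s0'
EOF
)
printf '%s\n' "$FIELDS"
```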

So for a Solaris 11 LDOM, you need to add the devices using the s0 partition, as below.

# ldm stop ldom02
# ldm add-vdsdev /dev/vx/dmp/tagmastore-usp1_7251s0   usp1_7251-p1@primary-vds0
# ldm add-vdisk usp1_7251-p1 usp1_7251-p1@primary-vds0 ldom02

Please remember to add a secondary path as well if you have a secondary service domain configured. You also need to do the same for all your Solaris 11 LDOMs.
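For reference, the secondary path for the Solaris 11 guest would look like the commands below. This is a hedged dry-run sketch that only prints the commands, assuming a secondary service domain with a secondary-vds0 exists, mirroring the primary-path commands above.

```shell
#!/bin/sh
# Dry-run sketch: print the secondary-path commands for the Solaris 11
# guest, assuming a secondary service domain with secondary-vds0
# (mirrors the primary-path commands above). Review before running.
LUN=usp1_7251
DOM=ldom02
CMDS="ldm add-vdsdev /dev/vx/dmp/tagmastore-${LUN}s0 $LUN-p2@secondary-vds0
ldm add-vdisk $LUN-p2 $LUN-p2@secondary-vds0 $DOM"
printf '%s\n' "$CMDS"
```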

After that, you need to unbind and rebind the VM so that the new configuration takes effect.

# ldm unbind ldom02
# ldm bind ldom02
# ldm start ldom02

Connect to LDOM console

OK >  devalias

OK > printenv boot-device

OK > setenv boot-device <bootdeviceaddress>

Pick the boot device from the devalias output and the disk identified in the zdb output: if zdb points at disk@0 then use disk0, or disk1 if it points at disk@1.

OK > boot

You will be able to boot the VM. That covers Solaris 11 LDOM recovery, and Solaris 10 LDOM recovery as well. There was a debate with Oracle on this, especially for Solaris 11: they recommend using the s2 partition to add disks, citing a chance of disk corruption, but using the above method I have recently recovered all the LDOMs, both Solaris 10 and Solaris 11, after SAN migration of the OS disks.

I hope you liked this walkthrough of how to recover Solaris 11 LDOMs after a SAN migration or disk replacement. Please share it if you found it useful, and in case of any query please do write to us; we will get back as soon as possible.

If you have lost all LDOM configuration for any reason, such as a mainboard replacement, and would like to recover it, please see how to restore domain configurations.

If you need to understand how to configure IPMP in Solaris 11, please refer to this.