Solaris 11 LDOM Recovery
Today we are going to discuss Solaris 11 LDOM recovery. There are several scenarios in which the recovery steps described here apply. The base setup is a Solaris 11 control domain hosting Solaris 10 and Solaris 11 LDOMs, with the control domain running Veritas Volume Manager for volume management.
Case 1: The OS disk on which the LDOM is configured has gone completely bad and has to be replaced with a new disk from SAN.
Case 2: The existing storage on which the LDOM OS resides is being replaced to meet business needs for higher performance at a lower price, which is the most common driver for SAN migrations: cut costs and gain performance from the latest technology available at the time.
In short, whatever the exact scenario, the situation is the same: the OS SAN disk needs to be replaced. We will see how to perform LDOM recovery after the disk is replaced.
You may have observed that the disk name changes when a new SAN LUN is allocated from a new storage array, which leaves the guest with a non-booting operating system.
First of all, we always recommend running the commands below before starting any disk replacement, as a safety measure so that the configuration can be recovered if the OS does not boot after the replacement.
1. Save the CDOM configuration
# ldm add-spconfig <$date>
2. Save the LDOM disk configuration
# ldm ls -o disk <ldom> > /var/tmp/ldm-ls-o-disk-ldom.txt
3. Save the LDOM constraints for a specific LDOM
# ldm list-constraints -x ldom_name > ldom_name.xml
4. Save all LDOM constraints
# ldm list-constraints -x > /var/tmp/cdom_ldoms.xml
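The per-domain XML saved in step 3 is also what lets you rebuild a domain definition from scratch if it is ever lost. A minimal sketch of how it could be used, assuming the domain no longer exists and the saved file is still in place (the domain name is just the placeholder from above):

# ldm add-domain -i ldom_name.xml    >>>> re-create the domain definition from the saved constraints
# ldm bind ldom_name
# ldm ls -o disk ldom_name           >>>> verify the disk configuration came back as expected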
So let's review the output we saved earlier to validate exactly what changed after the new SAN disk was allocated from the new storage array. The LDOM configuration was saved under /var/tmp on the CDOM before the start of the work.
Below is the saved output from before the change, showing which disk the OS was residing on. Once the storage migration is done, you will get the new device names from the storage team and can easily tell whether it is the same device or a different one.
# cat /var/tmp/ldm-ls-o-disk-ldom01.txt
NAME
ldom01

DISK
    NAME              VOLUME                          TOUT   ID   DEVICE   SERVER    MPGROUP
    vspg1k0_5539-p1   vspg1k0_5539-p1@primary-vds0           0    disk@0   primary

# cat /var/tmp/ldm-ls-o-disk-ldom02.txt | grep 538e
    vspg1k0_538e      vspg1k0_538e@primary-vds0              0    disk@0   primary   vspg1k0_538e

The newly allocated devices are as below:

# vxdisk -eo alldgs list | egrep "7250|7251"
tagmastore-usp1_7250   auto:none   -   -   online invalid   c6t50060E8008773322d1s2   lun   fc   >>>> Solaris 10 VM
tagmastore-usp1_7251   auto:ZFS    -   -   ZFS              c6t50060E8008773322d2     lun   fc   >>>> Solaris 11 VM
From the sample output above, you can see that the LDOM OS disks were configured on disk IDs 5539 and 538E, both from the usp0 array. The storage team informed us that the newly assigned LUNs are from a different array and show up as usp1, with disk IDs 7250 and 7251.
So after the storage migration, if you try to boot normally, assuming the new LUNs are virtualized the same way as the old ones, you will end up with a non-booting OS and an error like the one below.
Please enter a number [1]:
Jul 4 13:15:36 solaris vdc: NOTICE: vdisk@0 disk access failed
In both cases, you need to remove the existing OS disk configuration, since the array changed from usp0 to usp1 and the LUN IDs changed as well.
Removing the Existing Devices
Remove the existing devices from the CDOM configuration using the commands below. The VM must be stopped before removal. Do the same for every LDOM whose source array and LUN ID changed when the replacement OS disk was allocated.
# ldm stop ldom01
# ldm remove-vdisk -f usp0_01d3-p1 ldom01
# ldm remove-vdsdev -f usp0_01d3-p1@primary-vds0
# ldm ls -o disk ldom01     >>> Validate that no configuration exists anymore.
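To double-check that the volume is also gone from the virtual disk service itself, and not just from the guest's disk list, you can grep the service domain's services. A small sketch, assuming the volume name used above:

# ldm list-services primary | grep usp0_01d3    >>>> should return nothing once the vdsdev is removed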
(A) Adding New Devices to Solaris 10 LDOMs Using MPGROUP
# ldm add-vdsdev mpgroup=usp1_7250 /dev/vx/dmp/tagmastore-usp1_7250 usp1_7250-p1@primary-vds0
# ldm add-vdsdev mpgroup=usp1_7250 /dev/vx/dmp/tagmastore-usp1_7250 usp1_7250-p2@secondary-vds0
# ldm add-vdisk usp1_7250-p1 usp1_7250-p1@primary-vds0 ldom01
# ldm add-vdisk usp1_7250-p2 usp1_7250-p2@secondary-vds0 ldom01
You need to do the same for all LDOMs. Both primary and secondary paths are added here because a failover service domain is configured for availability purposes.
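Before starting the guest, you can confirm that both paths of the mpgroup are exported by listing the virtual disk services. A quick sketch, assuming the primary-vds0 and secondary-vds0 services used above:

# ldm list-services primary | grep usp1_7250      >>>> volume should show up under primary-vds0
# ldm list-services secondary | grep usp1_7250    >>>> volume should show up under secondary-vds0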
# ldm ls -o disk ldom01     >>> Validate the configuration is in place.
# ldm start ldom01
# telnet localhost <port>
OK > boot
It will boot.
(B) Adding New Devices to Solaris 11 LDOMs
There is a small catch: allocating the disk the same way as for Solaris 10 is not going to work for the Solaris 11 LDOM because, as shown above, the disk format is ZFS and the guest will not pick it up. Let us validate.
If you look at the partition details of the newly assigned disk, you will find something like the below.
# format c6t50060E8008773322d2
partition> print
Total disk sectors available: 209698749 + 16384 (reserved sectors)
Part Tag Flag First Sector Size Last Sector
0 usr wm 256 99.99GB 209698782
1 unassigned wm 0 0 0
2 unassigned wm 0 0 0
3 unassigned wm 0 0 0 >> slices s1-s6 are not in use
4 unassigned wm 0 0 0
5 unassigned wm 0 0 0
6 unassigned wm 0 0 0
8 reserved wm 209698783 8.00MB 209715166 >> these contain the vtoc/inode-index/inode tables etc.
New Disk Validation and Finding the Boot Device
# installboot -ei -F zfs /dev/vx/rdmp/tagmastore-usp1_7251s0
0.5.11,5.11-0.175.3.0.0.30.0 ae693bf2772f180ef478c46ef0d921a5

As you can see above, the s0 partition of the disk contains the bootblock and the OS, per the format output and the installboot inquiry. You can also validate the same thing via the Veritas DMP device using zdb, which dumps the ZFS labels, as below. Remember that specifying the partition is very important.

# zdb -lc /dev/vx/rdmp/tagmastore-usp1_7251s0
--------------------------------------------------
LABEL 0 - VALID
--------------------------------------------------
    timestamp: 1590062854 UTC: Thu May 21 12:07:34 2020
    version: 37
    name: 'rpool'
    state: 0
    txg: 81059
    pool_guid: 12702437114737145633
    hostid: 2230872562
    hostname: ''
    top_guid: 18174691079512398175
    guid: 18174691079512398175
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 18174691079512398175
        path: '/dev/dsk/c1d0s0'
        devid: 'id1,vdc@n60060e8007dd6f000030dd6f00005539/a'
        phys_path: '/virtual-devices@100/channel-devices@200/disk@1:a'
        whole_disk: 1
        metaslab_array: 27
        metaslab_shift: 29
        ashift: 9
        asize: 106837311488
        is_log: 0
        create_txg: 4
--------------------------------------------------
LABEL 1 - VALID - CONFIG MATCHES LABEL 0
--------------------------------------------------
LABEL 2 - VALID - CONFIG MATCHES LABEL 0
--------------------------------------------------
LABEL 3 - VALID - CONFIG MATCHES LABEL 0
So for the Solaris 11 LDOM, you need to add the device using its s0 partition, like below.
# ldm stop ldom02
# ldm add-vdsdev /dev/vx/dmp/tagmastore-usp1_7251s0 usp1_7251-p1@primary-vds0
# ldm add-vdisk usp1_7251-p1 usp1_7251-p1@primary-vds0 ldom02
Please remember to add a secondary path as well if you have a secondary service domain configured, as sketched below. You also need to do the same for all your Solaris 11 LDOMs.
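A minimal sketch of the secondary path, assuming the same secondary-vds0 service and naming convention used for the Solaris 10 guests above (the -p2 volume name is only illustrative):

# ldm add-vdsdev /dev/vx/dmp/tagmastore-usp1_7251s0 usp1_7251-p2@secondary-vds0
# ldm add-vdisk usp1_7251-p2 usp1_7251-p2@secondary-vds0 ldom02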
After that, you need to unbind and rebind the VM so that the new configuration takes effect.
# ldm unbind ldom02
# ldm bind ldom02
# ldm start ldom02
Connect to the LDOM console and run the following at the OK prompt.
OK > devalias
OK > printenv boot-device
OK > setenv boot-device <bootdeviceaddress>
Pick the boot device address from the devalias listing, matching the disk highlighted in the zdb output: if the zdb phys_path points at disk@0, use the disk0 alias; if it points at disk@1, use disk1. A rough illustration follows.
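For illustration only, this is roughly how the devalias output lines up with the zdb label; the alias name and device path are typical of an LDOM guest (the alias may also appear under the vdisk name), so treat this as a sketch rather than literal output:

OK > devalias
disk0                    /virtual-devices@100/channel-devices@200/disk@0    >>>> disk@0 matches the vdisk with ID 0 in 'ldm ls -o disk ldom02'
OK > setenv boot-device disk0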
OK > boot
You will be able to boot the VM (one last housekeeping step is sketched just below). This is all there is to Solaris 11 LDOM recovery, and to Solaris 10 LDOM recovery as well. There was a debate with Oracle on this, especially for Solaris 11: their recommendation is to add the disks using the s2 partition, on the grounds that there is otherwise a chance of disk corruption, but using the method above I have recovered all of the LDOMs, both Solaris 10 and Solaris 11, after a recent SAN migration of the OS disks.
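Once the guests are back up and verified, it is worth saving the new configuration to the service processor so it survives a power cycle, using the same ldm add-spconfig command from the safety steps at the top. A short sketch; the configuration name is only an example:

# ldm add-spconfig post-san-migration    >>>> save the current CDOM/LDOM layout to the SP
# ldm list-spconfig                      >>>> confirm the new configuration is marked [current]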
I hope you found this walkthrough of recovering Solaris 11 LDOMs after a SAN migration or disk replacement useful. Please share it if you liked it, and if you have any queries, write to us and we will get back to you as soon as possible.
If you have lost all LDOM configuration for any reason, such as a mainboard replacement, and would like to recover it, please see how to restore domain configurations.
If you need to understand how to configure IPMP in Solaris 11, please refer to this.