LDM Command Hangs
In this article, I share a step-by-step procedure for the case where the ldm command hangs and the ldm list command fails with the error “failed to connect to logical domain manager connection refused”. The ldm list command stops working because the ldmd service in the control domain (CDOM) has been left in an orphaned state.
The environment in this case is a dual-domain setup in which ldm commands can no longer be run on the CDOM. In such a scenario, you lose supportability for the LDOMs unless you have saved the output of a recent ldm list command.
You can check the status of the ldmd service, which the ldm list command relies on. The ldmd service depends on ldoms/agents: for ldmd to be operational and able to serve ldm commands, ldoms/agents must be in the online state.
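Since a saved ldm list output is what keeps the LDOMs supportable, it can help to capture one regularly while ldmd is still healthy. The following is only a minimal sketch of that idea: the function name save_ldm_snapshot and the default directory /var/tmp/ldm-snapshots are hypothetical, and it assumes the standard ldm list -l, ldm list-bindings and ldm list-constraints -x subcommands are available on your release.

```shell
# Save a timestamped snapshot of the current LDOM configuration.
# $1: target directory (hypothetical default, adjust to your site standard)
save_ldm_snapshot() {
    snap_dir=${1:-/var/tmp/ldm-snapshots}
    snap_stamp=$(date +%Y%m%d-%H%M%S)

    mkdir -p "$snap_dir" || return 1

    # Long listing of all domains
    ldm list -l > "$snap_dir/ldm-list-l.$snap_stamp.txt" || return 1
    # Resource bindings for all domains
    ldm list-bindings > "$snap_dir/ldm-bindings.$snap_stamp.txt" || return 1
    # Constraints in XML form
    ldm list-constraints -x > "$snap_dir/ldm-constraints.$snap_stamp.xml" || return 1
}

# Example call on the control domain, e.g. from a root cron job:
#   save_ldm_snapshot /var/tmp/ldm-snapshots
```

Each command is checked separately, so a hung or failing ldmd shows up as a non-zero return instead of a silently incomplete snapshot.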
# svcs -a|egrep "ldmd|ldom"
online*        Dec_26   svc:/ldoms/ldmd:default
online         Dec_26   svc:/ldoms/vntsd:default
online         18:52:48 svc:/ldoms/ldmd_dir:default
online         19:03:34 svc:/ldoms/agents:default
You can validate the log location of the ldmd service.
# svcs -L svc:/ldoms/ldmd
/var/svc/log/ldoms-ldmd:default.log
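The check for a stuck service in the svcs listing can be sketched as a small filter. This is an illustration rather than an official tool: the function name ldoms_in_transition is hypothetical, and it assumes the default three-column svcs output (STATE, STIME, FMRI) and an awk that supports regular expression matching.

```shell
# Print the FMRI of every ldoms service whose state ends in "*",
# which is how svcs marks an instance that is in transition.
# Reads svcs output on stdin, so it can be tried on saved output too.
ldoms_in_transition() {
    awk '$3 ~ /^svc:\/ldoms\// && $1 ~ /\*$/ { print $3 }'
}

# Example call on the control domain:
#   svcs -a | ldoms_in_transition
```

If the function prints svc:/ldoms/ldmd:default for a prolonged period, the service is stuck in transition in the way described here.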
If you stop and start the ldmd service, it does not report any error. If you try to clear the service, it reports that it is not in maintenance mode. It stays in the online* state (the asterisk means the instance is in transition) and never completes, as if it had gone into a coma.
# svcadm clear svc:/ldoms/ldmd:default
svcadm: Instance "svc:/ldoms/ldmd:default" is not in a maintenance or degraded state.
# svcadm disable svc:/ldoms/ldmd:default
# svcadm enable svc:/ldoms/ldmd:default
If you look at the messages log file, you will find error messages like the one below.
Feb 27 18:56:07 cdom1 svc.startd: [ID 122153 daemon.warning] svc:/ldoms/ldmd:default:Method or service exit timed out. Killing contract 131.
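To confirm the same symptom on your own system, the messages log can be searched for that warning. The sketch below assumes the log lives in /var/adm/messages, the usual Solaris location; the function name ldmd_timeouts is hypothetical.

```shell
# Print svc.startd timeout warnings for the ldmd service.
# $1: path to the messages file (assumed default: /var/adm/messages)
ldmd_timeouts() {
    grep 'svc.startd' "${1:-/var/adm/messages}" |
        grep 'svc:/ldoms/ldmd:default' |
        grep 'timed out'
}

# Example calls:
#   ldmd_timeouts                       # current log
#   ldmd_timeouts /var/adm/messages.0   # rotated log
```

The three greps are kept separate on purpose so each condition is easy to read and adjust.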
As mentioned, this environment has a dual-domain configuration, so the solution we are going to try is aimed at that setup. It can be applied to a single-domain environment as well, but there it will impact the running LDOMs: you need real downtime for all of them, and getting approval may be a challenge because many processes you are not aware of may be running on them.
# ldm ls |egrep "primary|secondary"
primary          active     -n-cv-  UART    16    49664M   6.6%  6.6%  18h 6m
secondary        active     -n--v-  5000    16    32G      1.9%  1.9%  64d 13h 12m
By now you may have sensed that we are heading towards rebooting the CDOM. As a best practice, reboot it with the -d option, which forces a crash dump before the reboot; the resulting core can be used to work with the vendor on a root cause analysis (RCA) if your department needs one.
There are two methods, and you can adopt either one based on your environment. The first is non-impacting in the dual-domain case described here; the second requires downtime for the applications running on the LDOMs hosted by the CDOM.
Method 1: Rebooting CDOM with reboot -d command
Method 2: Stopping/Starting System from Console
Method 1: Log in to the CDOM via SSH or the console and run the reboot -d command to reboot the CDOM and generate the core dump mentioned earlier for RCA.
Method 2: Log in to ILOM, run the -> stop /SYS command, and confirm when prompted. Remember that this method requires downtime. Wait a few minutes (say 2-5), then run -> start /SYS to start the CDOM.
Once the CDOM is back online, check the status of the related services as shown above, and make sure the ldoms/agents service is online before bringing the ldmd service online. Then validate the service status.
At this stage, you will be able to run ldm list commands again. That's it for recovering when the ldm command hangs on a Solaris CDOM.
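The post-reboot check, ldoms/agents first and ldmd second, can be sketched as a small polling helper. This is an assumption-laden illustration, not a vendor procedure: the function name wait_online and the default timeout and interval are hypothetical, and it relies on svcs -H -o state printing only the state of the given instance.

```shell
# Poll an SMF instance until it reports exactly "online" or the timeout expires.
# A state of "online*" (still in transition) does not count as online.
# $1: service FMRI
# $2: timeout in seconds (hypothetical default: 300)
# $3: poll interval in seconds (hypothetical default: 5)
wait_online() {
    wo_fmri=$1
    wo_remaining=${2:-300}
    wo_interval=${3:-5}

    while [ "$wo_remaining" -gt 0 ]; do
        # -H drops the header line, -o state prints only the state column
        wo_state=$(svcs -H -o state "$wo_fmri" 2>/dev/null) || wo_state=unknown
        if [ "$wo_state" = "online" ]; then
            return 0
        fi
        sleep "$wo_interval"
        wo_remaining=$((wo_remaining - wo_interval))
    done
    return 1
}

# Example sequence on the control domain after it comes back up:
#   wait_online svc:/ldoms/agents:default &&
#       wait_online svc:/ldoms/ldmd:default &&
#       ldm list
```

Chaining the calls with && means ldm list is only attempted once both services have settled, which mirrors the order described above.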
I hope you find this useful, especially if the ldm command hangs on your Solaris CDOM. Please share it in your circle to help this article reach its intended audience.