Introduction
When we patch GRID using the out-of-place strategy with the -switchGridHome option of the gridSetup.sh script, and the switch operation completes successfully, the way to roll back is to run a new switchGridHome from the original GRID Home.
This post shows what we can do when the switchGridHome process fails and we cannot start CRS from either GRID Home (original or new). The focus is not on analyzing the cause of the error, but on how to try to restore the environment quickly, so that the maintenance window is not consumed by troubleshooting while CRS is offline.
Problem
During a test of out-of-place patching on GRID using the -switchGridHome option of gridSetup.sh, the operation failed and left all CRS services down on the node in question:
[root@rac01 ~]# cat /u01/app/product/19.0.0.0/grid_home2/install/root_rac01_2023-09-22_20-24-23-823200734.log
Performing root user operation.
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/product/19.0.0.0/grid_home2
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
LD_LIBRARY_PATH='/u01/app/product/19.0.0.0/grid/lib:/u01/app/product/19.0.0.0/grid_home2/lib:'
Using configuration parameter file: /u01/app/product/19.0.0.0/grid_home2/crs/install/crsconfig_params
The log of current session can be found at: /u01/app/grid/crsdata/rac01/crsconfig/rootcrs_rac01_2023-09-22_08-24-38PM.log
Using configuration parameter file: /u01/app/product/19.0.0.0/grid_home2/crs/install/crsconfig_params
The log of current session can be found at: /u01/app/grid/crsdata/rac01/crsconfig/crs_prepatch_apply_oop_rac01_2023-09-22_08-24-38PM.log
2023/09/22 20:24:47 CLSRSC-347: Successfully unlock /u01/app/product/19.0.0.0/grid_home2
2023/09/22 20:24:47 CLSRSC-671: Pre-patch steps for patching GI home successfully completed.
Using configuration parameter file: /u01/app/product/19.0.0.0/grid_home2/crs/install/crsconfig_params
The log of current session can be found at: /u01/app/grid/crsdata/rac01/crsconfig/crs_postpatch_apply_oop_rac01_2023-09-22_08-24-48PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3998055650].
2023/09/22 20:25:43 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
2023/09/22 20:30:22 CLSRSC-117: Failed to start Oracle Clusterware stack from the Grid Infrastructure home /u01/app/product/19.0.0.0/grid_home2
Died at /u01/app/product/19.0.0.0/grid_home2/crs/install/crspatch.pm line 1911.
The command '/u01/app/product/19.0.0.0/grid_home2/perl/bin/perl -I/u01/app/product/19.0.0.0/grid_home2/perl/lib -I/u01/app/product/19.0.0.0/grid_home2/crs/install /u01/app/product/19.0.0.0/grid_home2/crs/install/rootcrs.pl -dstcrshome /u01/app/product/19.0.0.0/grid_home2 -postpatch' execution failed
All services went down:
[root@rac01 ~]# ps -ef | grep d.bin
root 31146 22451 0 20:32 pts/1 00:00:00 grep --color=auto d.bin
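Because the `.` in `grep d.bin` is a regex wildcard, that filter also matches the grep command itself and any path containing `grid/bin`. As a small sketch, assuming the goal is only to count the `*d.bin` Clusterware daemons, the helper below (the name `count_crs_daemons` is hypothetical) reads `ps -ef` output from stdin and ignores the grep line:

```shell
# Count processes whose command contains a literal "d.bin"
# (ohasd.bin, crsd.bin, ocssd.bin, ...) in `ps -ef` output read from stdin.
# Lines that belong to a grep command are skipped.
count_crs_daemons() {
    awk '/d\.bin/ && !/grep/ { n++ } END { print n + 0 }'
}

# Usage (on the affected node):
#   ps -ef | count_crs_daemons
```

A result of 0 corresponds to the situation shown above, where only the grep process appears.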
An attempt to start CRS from the original GI Home failed:
[root@rac01 ~]# . oraenv <<< +ASM1
[root@rac01 ~]# crsctl start crs
CRS-6706: Oracle Clusterware Release patch level ('3998055650') does not match Software patch level ('1725261545'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.
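The CRS-6706 message carries the two patch levels that disagree. When collecting evidence for later analysis, it can help to pull them out of the captured output. The following is a minimal sketch; the function name `crs6706_levels` is hypothetical and it assumes the message text has the same shape as in the output above:

```shell
# Read crsctl output from stdin and print "<release_level> <software_level>"
# taken from a CRS-6706 line. Prints nothing when no such line is present.
crs6706_levels() {
    sed -n "s/^CRS-6706:.*Release patch level ('\([0-9]*\)').*Software patch level ('\([0-9]*\)').*/\1 \2/p"
}

# Usage:
#   crsctl start crs 2>&1 | crs6706_levels
```

For the failure shown above, this would print the pair 3998055650 and 1725261545 on one line.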
An attempt to run switchGridHome again toward the original GI Home (the natural rollback after a successful patch) failed because CRS was not online on the node:
[grid@rac01 ~]$ $ORACLE_HOME/gridSetup.sh -silent -switchGridHome oracle.install.option=CRS_SWONLY ORACLE_HOME=$ORACLE_HOME oracle.install.crs.config.clusterNodes=rac01
Launching Oracle Grid Infrastructure Setup Wizard...
[FATAL] [INS-45101] Clusterware is not running on the local node.
ACTION: Ensure that the Clusterware is configured and is running on local node before proceeding.
Solution
1) As the root user, set the ORACLE_HOME variable to the original GI Home:
export ORACLE_HOME=/u01/app/product/19.0.0.0/grid
2) Run clscfg with the -localpatch option:
$ORACLE_HOME/bin/clscfg -localpatch
3) Run the rootcrs.sh script with the -lock option:
$ORACLE_HOME/crs/install/rootcrs.sh -lock
4) Check the CRS Home configured in the olr.loc file:
cat /etc/oracle/olr.loc
If this file still points to the GRID Home that failed during switchGridHome, edit it so that it points to the original GRID Home you rolled back to. The original pre-patch configuration can be recovered from the olr.loc.orig file.
4.1) Adjusting the olr.loc file:
cat /etc/oracle/olr.loc.orig > /etc/oracle/olr.loc
4.2) OPTIONAL: Check the integrity of the OLR:
$ORACLE_HOME/bin/ocrcheck -local
5) Start CRS normally:
$ORACLE_HOME/bin/crsctl start crs
Example:
[root@rac01 ~]# export ORACLE_HOME=/u01/app/product/19.0.0.0/grid
[root@rac01 ~]# $ORACLE_HOME/bin/clscfg -localpatch
clscfg: EXISTING configuration version 0 detected.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
[root@rac01 ~]#
[root@rac01 ~]# $ORACLE_HOME/crs/install/rootcrs.sh -lock
Using configuration parameter file: /u01/app/product/19.0.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at: /u01/app/grid/crsdata/rac01/crsconfig/crslock_rac01_2023-09-23_03-41-42PM.log
2023/09/23 15:41:43 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
[root@rac01 ~]#
[root@rac01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
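The numbered steps can also be written down as a reviewable plan before anything is executed. The sketch below only prints the commands; it does not run them. The helper names `olr_crs_home` and `print_rollback_plan` are hypothetical, and the code assumes that olr.loc stores the home in a line of the form `crs_home=<path>`, which should be confirmed against the actual file on the node.

```shell
# Read the crs_home entry from an olr.loc-style file.
# Assumption: the file contains a line "crs_home=<path>".
olr_crs_home() {
    sed -n 's/^crs_home=//p' "$1" 2>/dev/null
}

# Print (do not execute) the recovery commands from steps 1 to 5.
# $1 = original GI home, $2 = olr.loc path (default /etc/oracle/olr.loc)
print_rollback_plan() {
    gi_home=$1
    olr_loc=${2:-/etc/oracle/olr.loc}
    printf '%s\n' "export ORACLE_HOME=$gi_home"
    printf '%s\n' "\$ORACLE_HOME/bin/clscfg -localpatch"
    printf '%s\n' "\$ORACLE_HOME/crs/install/rootcrs.sh -lock"
    # Step 4.1 is only needed when olr.loc does not point at the original home.
    if [ "$(olr_crs_home "$olr_loc")" != "$gi_home" ]; then
        printf '%s\n' "cat $olr_loc.orig > $olr_loc"
    fi
    printf '%s\n' "\$ORACLE_HOME/bin/ocrcheck -local"
    printf '%s\n' "\$ORACLE_HOME/bin/crsctl start crs"
}

# Usage (as root):
#   print_rollback_plan /u01/app/product/19.0.0.0/grid
```

Printing the plan instead of running it keeps the operator in control of each step, which matters here because the commands change the Clusterware configuration of the node.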
Checking:
[root@rac01 ~]# ps -ef | grep d.bin
root 8458 1 3 15:41 ? 00:00:03 /u01/app/product/19.0.0.0/grid/bin/ohasd.bin reboot
root 8552 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/orarootagent.bin
grid 8659 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/oraagent.bin
grid 8703 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/mdnsd.bin
grid 8705 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/evmd.bin
grid 8924 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/gpnpd.bin
grid 8989 8705 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/evmlogger.bin -o /u01/app/product/19.0.0.0/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/product/19.0.0.0/grid/log/[HOSTNAME]/evmd/evmlogger.log
grid 9256 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/gipcd.bin
root 11348 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/cssdmonitor
root 11350 1 1 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/osysmond.bin
root 11381 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/cssdagent
grid 11398 1 2 15:42 ? 00:00:01 /u01/app/product/19.0.0.0/grid/bin/ocssd.bin
root 11558 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/octssd.bin reboot
root 11597 1 6 15:42 ? 00:00:02 /u01/app/product/19.0.0.0/grid/bin/crsd.bin reboot
root 11670 1 1 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/orarootagent.bin
root 11785 1 1 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/ologgerd -M
grid 11809 1 3 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/oraagent.bin
grid 11976 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid 12061 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
grid 12147 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
oracle 12505 1 4 15:43 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/oraagent.bin
[root@rac01 ~]#
[root@rac01 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac01 ~]#
[root@rac01 ~]# ps -ef | grep pmon
grid 12323 1 0 15:42 ? 00:00:00 asm_pmon_+ASM1
grid 12588 1 0 15:43 ? 00:00:00 apx_pmon_+APX1
oracle 12664 1 0 15:43 ? 00:00:00 ora_pmon_cdbsec1
oracle 12670 1 0 15:43 ? 00:00:00 ora_pmon_orcl1
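The `crsctl check crs` verification above can be turned into a pass/fail test, for instance to poll until the stack is fully up. This is a sketch only; the function name `crs_all_online` is hypothetical, and it relies on the four message codes shown in the output above:

```shell
# Return 0 only if `crsctl check crs` output (read from stdin) reports all
# four components online: CRS-4638, CRS-4537, CRS-4529 and CRS-4533.
crs_all_online() {
    out=$(cat)
    for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
        printf '%s\n' "$out" | grep -q "^$code:.*is online" || return 1
    done
    return 0
}

# Usage:
#   "$ORACLE_HOME/bin/crsctl" check crs | crs_all_online && echo "stack is up"
```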
Cluster in ROLLING PATCH mode
If this was the only node where the patch was attempted and, after returning to the original GI Home, all nodes are at the same version, we can stop ROLLING PATCH mode:
$ORACLE_HOME/bin/crsctl query crs activeversion -f
$ORACLE_HOME/bin/crsctl stop rollingpatch
$ORACLE_HOME/bin/crsctl query crs activeversion -f
Example:
[root@rac01 ~]# $ORACLE_HOME/bin/crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [3998055650].
[root@rac01 ~]#
[root@rac01 ~]# $ORACLE_HOME/bin/crsctl stop rollingpatch
CRS-1161: The cluster was successfully patched to patch level [3998055650].
[root@rac01 ~]#
[root@rac01 ~]# $ORACLE_HOME/bin/crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3998055650].
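To avoid calling `crsctl stop rollingpatch` when the cluster is already in NORMAL state, the upgrade state can be read from the query output first. A minimal sketch follows; the function name `cluster_upgrade_state` is hypothetical, and it assumes the output wording is the same as in the example above:

```shell
# Extract the upgrade state (for example NORMAL or ROLLING PATCH) from the
# output of `crsctl query crs activeversion -f` read from stdin.
cluster_upgrade_state() {
    sed -n 's/.*The cluster upgrade state is \[\([^]]*\)].*/\1/p'
}

# Usage (as root, with ORACLE_HOME set to the original GI Home):
#   state=$("$ORACLE_HOME/bin/crsctl" query crs activeversion -f | cluster_upgrade_state)
#   if [ "$state" = "ROLLING PATCH" ]; then
#       "$ORACLE_HOME/bin/crsctl" stop rollingpatch
#   fi
```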
Once this is done, you should be able to retry the process, or reinstall the new GRID Home if necessary.