Introduction

When we patch the GRID with the out-of-place strategy using the -switchGridHome option of the gridSetup.sh script, and the switch operation completes successfully, rolling back is simply a matter of running switchGridHome again from the original GRID Home.

This post shows what we can do when the switchGridHome process fails and the CRS cannot be started from either GRID Home (original or new). The focus is not on analyzing the cause of the error. It is on restoring the environment quickly, so that troubleshooting with the CRS offline does not eat into the maintenance window.

Problem

During an out-of-place patching test on the GRID using the -switchGridHome option of gridSetup.sh, the operation failed and left all CRS services down on the affected node:

[root@rac01 ~]# cat /u01/app/product/19.0.0.0/grid_home2/install/root_rac01_2023-09-22_20-24-23-823200734.log
Performing root user operation.
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/product/19.0.0.0/grid_home2
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
LD_LIBRARY_PATH='/u01/app/product/19.0.0.0/grid/lib:/u01/app/product/19.0.0.0/grid_home2/lib:'
Using configuration parameter file: /u01/app/product/19.0.0.0/grid_home2/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/rac01/crsconfig/rootcrs_rac01_2023-09-22_08-24-38PM.log
Using configuration parameter file: /u01/app/product/19.0.0.0/grid_home2/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/rac01/crsconfig/crs_prepatch_apply_oop_rac01_2023-09-22_08-24-38PM.log
2023/09/22 20:24:47 CLSRSC-347: Successfully unlock /u01/app/product/19.0.0.0/grid_home2
2023/09/22 20:24:47 CLSRSC-671: Pre-patch steps for patching GI home successfully completed.
Using configuration parameter file: /u01/app/product/19.0.0.0/grid_home2/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/rac01/crsconfig/crs_postpatch_apply_oop_rac01_2023-09-22_08-24-48PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3998055650].
2023/09/22 20:25:43 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
2023/09/22 20:30:22 CLSRSC-117: Failed to start Oracle Clusterware stack from the Grid Infrastructure home /u01/app/product/19.0.0.0/grid_home2
Died at /u01/app/product/19.0.0.0/grid_home2/crs/install/crspatch.pm line 1911.
The command '/u01/app/product/19.0.0.0/grid_home2/perl/bin/perl -I/u01/app/product/19.0.0.0/grid_home2/perl/lib -I/u01/app/product/19.0.0.0/grid_home2/crs/install /u01/app/product/19.0.0.0/grid_home2/crs/install/rootcrs.pl -dstcrshome /u01/app/product/19.0.0.0/grid_home2 -postpatch' execution failed

All services were down:

[root@rac01 ~]# ps -ef | grep d.bin
root 31146 22451 0 20:32 pts/1 00:00:00 grep --color=auto d.bin
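
Note that grep d.bin always lists at least the grep process itself. A small sketch (the function name count_crs_daemons is hypothetical) that counts only the Clusterware daemons with d.bin in their command line, so a fully stopped stack shows 0:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps -ef` output on stdin and count the lines
# whose command contains "d.bin" (ohasd.bin, ocssd.bin, crsd.bin, ...),
# skipping the grep process that matches its own "d.bin" argument.
# A fully stopped stack prints 0.
count_crs_daemons() {
  awk '/d\.bin/ && !/grep/ { n++ } END { print n + 0 }'
}

# Usage: ps -ef | count_crs_daemons
```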

An attempt to start the CRS from the original GI Home failed:

[root@rac01 ~]# . oraenv <<< +ASM1
[root@rac01 ~]# crsctl start crs
CRS-6706: Oracle Clusterware Release patch level ('3998055650') does not match Software patch level ('1725261545'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.
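
CRS-6706 says that the Clusterware release patch level does not match the software patch level of the home being started. Both values can be printed before attempting a start. A minimal sketch, assuming that in the output of crsctl query crs releasepatch and crsctl query crs softwarepatch the patch level is the first number in square brackets; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers. ASSUMPTION: in the output of
# `crsctl query crs releasepatch` and `crsctl query crs softwarepatch`
# the patch level is the first number enclosed in square brackets.

# Print the first [digits] value found on stdin (nothing if none).
first_bracketed_number() {
  awk 'match($0, /\[[0-9]+\]/) { print substr($0, RSTART + 1, RLENGTH - 2); exit }'
}

# Query both levels with the crsctl of the given GI home, print them,
# and return 0 only when they match.
check_patch_levels() {
  local home=${1:?usage: check_patch_levels <gi_home>}
  local release software
  release=$("$home/bin/crsctl" query crs releasepatch | first_bracketed_number)
  software=$("$home/bin/crsctl" query crs softwarepatch | first_bracketed_number)
  echo "release=${release:-?} software=${software:-?}"
  [ -n "$release" ] && [ "$release" = "$software" ]
}

# Usage: check_patch_levels /u01/app/product/19.0.0.0/grid
```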

An attempt to run switchGridHome again, back to the original GI Home (the natural rollback when the patch succeeds), failed because the CRS was not online on the node:

[grid@rac01 ~]$ $ORACLE_HOME/gridSetup.sh -silent -switchGridHome oracle.install.option=CRS_SWONLY ORACLE_HOME=$ORACLE_HOME oracle.install.crs.config.clusterNodes=rac01
Launching Oracle Grid Infrastructure Setup Wizard...
[FATAL] [INS-45101] Clusterware is not running on the local node.
ACTION: Ensure that the Clusterware is configured and is running on local node before proceeding.
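
INS-45101 shows that gridSetup.sh -switchGridHome only runs with the Clusterware stack up on the local node, so the standard rollback is not available in this scenario. A sketch of a guard around the rollback command shown above. It assumes that "online" means all four messages crsctl check crs prints on a healthy node (CRS-4638, CRS-4537, CRS-4529, CRS-4533) are present; the function names are hypothetical, and with DRY_RUN=1 (the default) it only prints the command:

```shell
#!/usr/bin/env bash
# Hypothetical guard for the switchGridHome rollback (run as the grid user).

# Return 0 if the `crsctl check crs` output read from stdin contains all
# four "online" messages seen on a healthy node. ASSUMPTION: these four
# codes are a sufficient definition of "stack online".
crs_is_online() {
  local out code
  out=$(cat)
  for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
    printf '%s\n' "$out" | grep -q "^$code:" || return 1
  done
  return 0
}

# Print (DRY_RUN=1, default) or run (DRY_RUN=0) the switchGridHome command
# from $ORACLE_HOME, the home to switch back to, for the given node.
switch_grid_home_back() {
  local node=${1:?usage: switch_grid_home_back <node>}
  local run=
  [ "${DRY_RUN:-1}" = 1 ] && run=echo
  if ! "$ORACLE_HOME/bin/crsctl" check crs | crs_is_online; then
    echo "CRS is not online on this node: switchGridHome would fail (INS-45101)" >&2
    return 1
  fi
  $run "$ORACLE_HOME/gridSetup.sh" -silent -switchGridHome \
    oracle.install.option=CRS_SWONLY \
    "ORACLE_HOME=$ORACLE_HOME" \
    "oracle.install.crs.config.clusterNodes=$node"
}
```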

Solution

1) As root, set the ORACLE_HOME variable to the original GI Home:

export ORACLE_HOME=/u01/app/product/19.0.0.0/grid

2) Run clscfg with the -localpatch option:

$ORACLE_HOME/bin/clscfg -localpatch

3) Run the rootcrs.sh script with the -lock option:

$ORACLE_HOME/crs/install/rootcrs.sh -lock

4) Check the CRS Home configured in the olr.loc file:

cat /etc/oracle/olr.loc

If this file still points to the GRID Home that failed during switchGridHome, edit it so that it points to the original GRID Home you rolled back to. The original, pre-patch configuration can be recovered from the olr.loc.orig file.

4.1) Restoring the olr.loc file:

cat /etc/oracle/olr.loc.orig > /etc/oracle/olr.loc

4.2) OPTIONAL: Check the integrity of the OLR:

$ORACLE_HOME/bin/ocrcheck -local

5) Start the CRS as usual:

$ORACLE_HOME/bin/crsctl start crs
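
Steps 1 to 5 can be grouped into one function. This is a sketch, not a validated procedure: the function name and the olr.loc.bak backup are hypothetical additions, it assumes olr.loc stores the home as a crs_home=<path> line, and with DRY_RUN=1 (the default) it only prints the commands:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for steps 1-5; run as root.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes them.
restore_original_gi_home() {
  local home=${1:-/u01/app/product/19.0.0.0/grid}
  local olr=${2:-/etc/oracle/olr.loc}
  local run=
  [ "${DRY_RUN:-1}" = 1 ] && run=echo

  # 1) Point ORACLE_HOME to the original GI home.
  export ORACLE_HOME=$home

  # 2) and 3) Update the local patch information, then lock the home.
  $run "$ORACLE_HOME/bin/clscfg" -localpatch || return 1
  $run "$ORACLE_HOME/crs/install/rootcrs.sh" -lock || return 1

  # 4) ASSUMPTION: olr.loc holds a "crs_home=<path>" line. If it does not
  #    name the original home, keep a copy and restore olr.loc.orig.
  #    cp onto an existing file keeps the target's owner and mode.
  if ! grep -qxF "crs_home=$ORACLE_HOME" "$olr"; then
    $run cp -p "$olr" "$olr.bak" || return 1
    $run cp "$olr.orig" "$olr" || return 1
  fi

  # 4.2) Optional OLR integrity check, then 5) start the stack.
  $run "$ORACLE_HOME/bin/ocrcheck" -local
  $run "$ORACLE_HOME/bin/crsctl" start crs
}

# Usage: DRY_RUN=1 restore_original_gi_home /u01/app/product/19.0.0.0/grid
```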

Example:

[root@rac01 ~]# export ORACLE_HOME=/u01/app/product/19.0.0.0/grid
[root@rac01 ~]# $ORACLE_HOME/bin/clscfg -localpatch
clscfg: EXISTING configuration version 0 detected.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
[root@rac01 ~]#
[root@rac01 ~]# $ORACLE_HOME/crs/install/rootcrs.sh -lock
Using configuration parameter file: /u01/app/product/19.0.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/rac01/crsconfig/crslock_rac01_2023-09-23_03-41-42PM.log
2023/09/23 15:41:43 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
[root@rac01 ~]#
[root@rac01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Checking:

[root@rac01 ~]# ps -ef | grep d.bin
root 8458 1 3 15:41 ? 00:00:03 /u01/app/product/19.0.0.0/grid/bin/ohasd.bin reboot
root 8552 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/orarootagent.bin
grid 8659 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/oraagent.bin
grid 8703 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/mdnsd.bin
grid 8705 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/evmd.bin
grid 8924 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/gpnpd.bin
grid 8989 8705 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/evmlogger.bin -o /u01/app/product/19.0.0.0/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/product/19.0.0.0/grid/log/[HOSTNAME]/evmd/evmlogger.log
grid 9256 1 0 15:41 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/gipcd.bin
root 11348 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/cssdmonitor
root 11350 1 1 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/osysmond.bin
root 11381 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/cssdagent
grid 11398 1 2 15:42 ? 00:00:01 /u01/app/product/19.0.0.0/grid/bin/ocssd.bin
root 11558 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/octssd.bin reboot
root 11597 1 6 15:42 ? 00:00:02 /u01/app/product/19.0.0.0/grid/bin/crsd.bin reboot
root 11670 1 1 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/orarootagent.bin
root 11785 1 1 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/ologgerd -M
grid 11809 1 3 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/oraagent.bin
grid 11976 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid 12061 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
grid 12147 1 0 15:42 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
oracle 12505 1 4 15:43 ? 00:00:00 /u01/app/product/19.0.0.0/grid/bin/oraagent.bin
[root@rac01 ~]#
[root@rac01 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac01 ~]#
[root@rac01 ~]# ps -ef | grep pmon
grid 12323 1 0 15:42 ? 00:00:00 asm_pmon_+ASM1
grid 12588 1 0 15:43 ? 00:00:00 apx_pmon_+APX1
oracle 12664 1 0 15:43 ? 00:00:00 ora_pmon_cdbsec1
oracle 12670 1 0 15:43 ? 00:00:00 ora_pmon_orcl1
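
To confirm that the stack came up from the original home and not from the failed one, a sketch that reads ps -ef output and lists every *d.bin daemon whose executable lies outside a given home. The function name and its exit codes (0 all inside, 1 some outside, 2 no daemon found) are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check: read `ps -ef` output on stdin and print PID and path
# of every daemon whose executable (8th field) ends in "d.bin" but does not
# live under the given GI home. Exit 0 if all are inside, 1 if any is
# outside, 2 if no such daemon was found at all.
daemons_outside_home() {
  local home=${1:?usage: daemons_outside_home <gi_home>}
  # The trailing slash keeps .../grid_home2 from matching .../grid.
  awk -v home="${home%/}/" '
    $8 ~ /d\.bin$/ {
      n++
      if (index($8, home) != 1) { print $2, $8; bad = 1 }
    }
    END {
      if (n == 0) exit 2
      exit (bad ? 1 : 0)
    }'
}

# Usage: ps -ef | daemons_outside_home /u01/app/product/19.0.0.0/grid
```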

Cluster in ROLLING PATCH mode

If this was the only node where the patch was attempted, and after returning to the original GI Home all nodes are at the same version, we can take the cluster out of ROLLING PATCH mode:

$ORACLE_HOME/bin/crsctl query crs activeversion -f
$ORACLE_HOME/bin/crsctl stop rollingpatch
$ORACLE_HOME/bin/crsctl query crs activeversion -f

Example:

[root@rac01 ~]# $ORACLE_HOME/bin/crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [3998055650].
[root@rac01 ~]#
[root@rac01 ~]# $ORACLE_HOME/bin/crsctl stop rollingpatch
CRS-1161: The cluster was successfully patched to patch level [3998055650].
[root@rac01 ~]#
[root@rac01 ~]# $ORACLE_HOME/bin/crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3998055650].
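
The query and the stop can be combined so that crsctl stop rollingpatch only runs when the cluster actually reports ROLLING PATCH. A sketch with hypothetical function names, parsing the crsctl query crs activeversion -f line format shown above. It does not verify that the other nodes are at the same patch level, which remains a manual prerequisite:

```shell
#!/usr/bin/env bash
# Print the value inside "upgrade state is [...]" from stdin.
upgrade_state() {
  sed -n 's/.*upgrade state is \[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: using the crsctl of the given GI home, stop rolling
# patch mode only when the cluster reports ROLLING PATCH. It does NOT check
# that all nodes are at the same patch level; confirm that first.
stop_rolling_patch_if_needed() {
  local home=${1:?usage: stop_rolling_patch_if_needed <gi_home>}
  local state
  state=$("$home/bin/crsctl" query crs activeversion -f | upgrade_state)
  echo "upgrade state: ${state:-unknown}"
  if [ "$state" = "ROLLING PATCH" ]; then
    "$home/bin/crsctl" stop rollingpatch
  fi
}

# Usage: stop_rolling_patch_if_needed /u01/app/product/19.0.0.0/grid
```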

Once this is done, you should be able to retry the process or, if necessary, redo the installation of the new GRID Home.


Discover more from Blog do Dibiei
