How to Control the Order and Add a Pause Between RAC Nodes Patched with Fleet Patching and Provisioning (FPP)

Introduction

When we start a MOVE operation on an Oracle RAC, FPP may not begin with the node where the FPP Client is currently running. For example, in a 2-node RAC with the FPP Client active on node 1, FPP may start the move by restarting node 2, then relocate the FPP Client to node 2 (which has already been restarted), and finish the operation by restarting node 1, where the FPP Client was originally active.

For use cases that call for a specific, controlled order in which the nodes are restarted, FPP can organize the Oracle RAC nodes into batches, which gives us more control over the order and over how many nodes are updated in parallel.

For example, in a 4-node RAC we can define 4 batches with 1 node in each, purely to control the order and the moment at which each node is restarted. This is useful when we need to make sure the next node has no critical process running. Another use case is to place 2 or 3 nodes in each batch to shorten the maintenance window, since nodes in the same batch are restarted in parallel.

This post demonstrates how to use this feature with command-line examples, as well as through the REST API.

Control Options

-batches : This parameter defines the groups as a comma-separated list of nodes inside each batch. Each batch is enclosed in parentheses, and the batches themselves are separated by commas.

rhpctl move gihome -batches "(node1,node2),(node3,node4)"
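
When the cluster has many nodes, the -batches value can be assembled by a small helper rather than typed by hand. The sketch below is a hypothetical shell function (make_batches is not part of rhpctl) that groups a list of node names, in the order given, into batches of a fixed size and prints the string in the format described above.

```shell
# make_batches SIZE NODE...
# Prints a -batches value that groups the given nodes, in order, into
# batches of SIZE nodes each. Hypothetical helper, not part of rhpctl.
make_batches() {
    size=$1
    shift
    out=""      # finished batches so far
    batch=""    # nodes collected for the current batch
    count=0
    for node in "$@"; do
        if [ -z "$batch" ]; then batch=$node; else batch="$batch,$node"; fi
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            if [ -z "$out" ]; then out="($batch)"; else out="$out,($batch)"; fi
            batch=""
            count=0
        fi
    done
    # Flush a final, smaller batch when the node count is not a multiple of SIZE.
    if [ -n "$batch" ]; then
        if [ -z "$out" ]; then out="($batch)"; else out="$out,($batch)"; fi
    fi
    printf '%s\n' "$out"
}
```

For example, `make_batches 2 node1 node2 node3 node4` prints `(node1,node2),(node3,node4)`, and `make_batches 1 rac02 rac01` prints `(rac02),(rac01)`, one node per batch in the order given.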

-chainbatches : This parameter tells FPP to start one batch after another without waiting for user interaction. By default, the MOVE operation pauses after each batch completes and waits for user interaction before starting the next batch.

-pausebetweenbatches : Specifies that the JOB should stay in PAUSED status after each batch completes, allowing the user to resume the same JOB to continue with the next batch. By default, FPP marks the JOB as SUCCEEDED after each batch, and the user needs to create a new JOB to run the next batch.

-continue / -revert / -abort : When a move operation is paused, FPP lets the user continue to the next batch (-continue), roll back what has already been completed (-revert), or cancel the move and leave the environment as it is (-abort).

This post only shows examples of -continue. To use the other options, just swap the parameter name; the command syntax is the same.

rhpctl move gihome -destwc WC_GI1922_cluster01 -continue
rhpctl move gihome -destwc WC_GI1922_cluster01 -revert
rhpctl move gihome -destwc WC_GI1922_cluster01 -abort

Determining the Move Order (-batches Option)

Starting a Grid move with the -batches option (deliberately starting with node 2):

[grid@fppserver ~]$ rhpctl move gihome -destwc WC_GI1922_cluster01 -batches "(rac02),(rac01)" -schedule NOW -sourcehome /u01/app/product/19.0.0.0/GI1921 -ignorewcpatches 
Operation "rhpctl move gihome" scheduled with the job ID "171".

The JOB finishes successfully when the first batch completes. Note the last line shown in the log:

[grid@fppserver ~]$ rhpctl query job -jobid 171
fppserver.dibiei.com: Audit ID: 1591
Job ID: 171
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -destwc WC_GI1922_cluster01 -batches (rac02),(rac01) -schedule NOW -sourcehome /u01/app/product/19.0.0.0/GI1921 -ignorewcpatches"
Scheduled job execution start time: 2024-04-09T19:05:16-03. Equivalent local time: 2024-04-09 19:05:16
Current status: SUCCEEDED
Result file path: "/fpp_images/chkbase/scheduled/job-171-2024-04-09-19:05:35.log"
Job execution start time: 2024-04-09 19:05:35
Job execution end time: 2024-04-09 19:13:36
Job execution elapsed time: 8 minutes 0 seconds

Result file "/fpp_images/chkbase/scheduled/job-171-2024-04-09-19:05:35.log" contents:
fppserver.dibiei.com: Audit ID: 1588
fppserver.dibiei.com: verifying versions of Oracle homes ...
fppserver.dibiei.com: verifying owners of Oracle homes ...
fppserver.dibiei.com: verifying groups of Oracle homes ...
fppserver.dibiei.com: Connecting to RHPC...
fppserver.dibiei.com: initiating sharedness check for the 'move gihome' operation on client cluster
fppserver.dibiei.com: completed the sharedness check for the source and destination Oracle home on client cluster
fppserver.dibiei.com: Connecting to RHPC...
rac01.dibiei.com: retrieving status of databases ...
rac01.dibiei.com: retrieving status of services of databases ...
rac01.dibiei.com: relocating services of databases ...
rac01.dibiei.com: stopping services of databases ...
rac01.dibiei.com: stopping instances of databases ...
rac01.dibiei.com: Executing "prepatch" and "postpatch" scripts on nodes: "rac02".
rac01.dibiei.com: Executing "prepatch" script on nodes [rac02].
rac01.dibiei.com: Successfully executed "prepatch" script on nodes [rac02].
rac01.dibiei.com: Executing "postpatch" script on nodes [rac02].
Using configuration parameter file: /u01/app/product/19.0.0.0/GI1922/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac02/crsconfig/crs_postpatch_apply_oop_rac02_2024-04-09_07-08-14PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3270192392].
2024/04/09 19:09:06 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [3270192392].
2024/04/09 19:10:57 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2024/04/09 19:10:57 CLSRSC-4012: Shutting down Oracle Trace File Analyzer (TFA) Collector.
2024/04/09 19:11:05 CLSRSC-4013: Successfully shut down Oracle Trace File Analyzer (TFA) Collector.
2024/04/09 19:11:06 CLSRSC-4005: Failed to patch Oracle Trace File Analyzer (TFA) Collector. Grid Infrastructure operations will continue.
2024/04/09 19:11:06 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
rac01.dibiei.com: Successfully executed "postpatch" script on nodes [rac02].
rac01.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac01/rhp/conf/rhp.pref
rac01.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac01/rhp/conf/rhp.pref
rac01.dibiei.com: starting instances of databases "orcl" ...
rac01.dibiei.com: Updating inventory on nodes: rac02.
========================================
rac01.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2752 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-09_07-12-01PM.log
'UpdateNodeList' was successful.
rac01.dibiei.com: Updated inventory on nodes: rac02.
rac01.dibiei.com: Updating inventory on nodes: rac02.
========================================
rac01.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2755 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-09_07-12-13PM.log
'UpdateNodeList' was successful.
rac01.dibiei.com: Updated inventory on nodes: rac02.
rac01.dibiei.com: Relocating RHPS or RHPC node to rac02

Continue by running 'rhpctl move gihome -destwc WC_GI1922_cluster01 -continue'

At this stage, we can run the suggested command at any time to continue the move operation on the next group of servers.

Creating a new JOB by running the command shown at the end of the first batch:

[grid@fppserver ~]$ rhpctl move gihome -destwc WC_GI1922_cluster01 -continue -schedule NOW
Operation "rhpctl move gihome" scheduled with the job ID "172".

A few minutes later, this JOB also completed successfully:

[grid@fppserver ~]$ rhpctl query job -jobid 172
fppserver.dibiei.com: Audit ID: 1593
Job ID: 172
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -destwc WC_GI1922_cluster01 -continue -schedule NOW"
Scheduled job execution start time: 2024-04-09T19:19:24-03. Equivalent local time: 2024-04-09 19:19:24
Current status: SUCCEEDED
Result file path: "/fpp_images/chkbase/scheduled/job-172-2024-04-09-19:19:35.log"
Job execution start time: 2024-04-09 19:19:35
Job execution end time: 2024-04-09 19:26:20
Job execution elapsed time: 6 minutes 44 seconds

Result file "/fpp_images/chkbase/scheduled/job-172-2024-04-09-19:19:35.log" contents:
fppserver.dibiei.com: Audit ID: 1592
fppserver.dibiei.com: Connecting to RHPC...
rac02.dibiei.com: retrieving status of databases ...
rac02.dibiei.com: relocating services of databases ...
rac02.dibiei.com: stopping services of databases ...
rac02.dibiei.com: stopping instances of databases ...
rac02.dibiei.com: Executing "prepatch" and "postpatch" scripts on nodes: "rac01".
rac02.dibiei.com: Executing "prepatch" script on nodes [rac01].
rac02.dibiei.com: Successfully executed "prepatch" script on nodes [rac01].
rac02.dibiei.com: Executing "postpatch" script on nodes [rac01].
Using configuration parameter file: /u01/app/product/19.0.0.0/GI1922/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac01/crsconfig/crs_postpatch_apply_oop_rac01_2024-04-09_07-21-34PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [3270192392].
2024/04/09 19:22:30 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [658638121].
2024/04/09 19:24:53 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2024/04/09 19:24:53 CLSRSC-4012: Shutting down Oracle Trace File Analyzer (TFA) Collector.
2024/04/09 19:25:00 CLSRSC-4013: Successfully shut down Oracle Trace File Analyzer (TFA) Collector.
2024/04/09 19:25:00 CLSRSC-4005: Failed to patch Oracle Trace File Analyzer (TFA) Collector. Grid Infrastructure operations will continue.
2024/04/09 19:25:03 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
rac02.dibiei.com: Successfully executed "postpatch" script on nodes [rac01].
rac02.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac02/rhp/conf/rhp.pref
rac02.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac02/rhp/conf/rhp.pref
rac02.dibiei.com: starting instances of databases "orcl" ...
rac02.dibiei.com: Updating inventory on nodes: rac01.
========================================
rac02.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2873 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-09_07-25-39PM.log
'UpdateNodeList' was successful.
rac02.dibiei.com: Updated inventory on nodes: rac01.
rac02.dibiei.com: Updating inventory on nodes: rac01.
========================================
rac02.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2878 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-09_07-25-50PM.log
'UpdateNodeList' was successful.
rac02.dibiei.com: Updated inventory on nodes: rac01.
rac02.dibiei.com: completed the move of Oracle Grid Infrastructure home on client cluster "cluster01"
rac02.dibiei.com: Completed the 'move gihome' operation on cluster cluster01.

Since this is the last batch, the move operation is considered complete. In use cases with more batches, we would need to run the “rhpctl move gihome ... -continue” command repeatedly until reaching the last batch in the list given in the initial move command.
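
With several batches, deciding whether another -continue is needed comes down to how the JOB log ends. As a minimal sketch, assuming the log layout shown in this post (it may differ in other FPP versions), the hypothetical helper below checks whether the last non-empty line of a result file is the “Continue by running” hint.

```shell
# needs_continue LOGFILE
# Succeeds (exit status 0) when the last non-empty line of the JOB log is the
# "Continue by running ..." hint, meaning more batches are still pending.
# Hypothetical helper; it relies only on the log text shown in this post.
needs_continue() {
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    case $last in
        "Continue by running "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could serve as a guard before submitting the next JOB, for example `needs_continue "$log" && rhpctl move gihome -destwc WC_GI1922_cluster01 -continue -schedule NOW`, where `$log` is the result file path reported by `rhpctl query job`.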

TIP: If you only want to control the order of the nodes, without a pause that requires manual interaction between batches, add the -chainbatches parameter.
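
As an illustration, the first example of this post could be submitted with -chainbatches added, so that rac02 and then rac01 are processed back to back. This exact command was not run for this post, so treat it as a sketch under that assumption; it only prints the command (a dry run) so it can be reviewed before being executed.

```shell
# Dry-run sketch: print the move command with -chainbatches instead of running it.
# The values are the ones used in this post; adjust them for your environment.
DESTWC="WC_GI1922_cluster01"
SOURCEHOME="/u01/app/product/19.0.0.0/GI1921"
BATCHES="(rac02),(rac01)"

# chained_move_cmd is a hypothetical helper that only formats the command line.
chained_move_cmd() {
    printf 'rhpctl move gihome -sourcehome %s -destwc %s -ignorewcpatches -batches "%s" -chainbatches -schedule NOW\n' \
        "$SOURCEHOME" "$DESTWC" "$BATCHES"
}

chained_move_cmd
```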

Adding a Pause to the Current JOB (-pausebetweenbatches)

Functionally, the “-pausebetweenbatches” parameter does not change much compared to the previous example. The difference is that FPP pauses the current JOB and lets us use the “rhpctl resume job” command to proceed with the next nodes, instead of requiring the move command to be run again with the “-continue” option, which in practice ends up creating a new JOB.

For command-line use, I find this option more interesting because it simplifies the command used to resume the operation and keeps everything in a single JOB, which in turn produces a single consolidated log file for the whole move.

Example:

[grid@fppserver ~]$ rhpctl move gihome -sourcehome /u01/app/product/19.0.0.0/GI1921 -destwc WC_GI1922_cluster01 -ignorewcpatches -batches "(rac02),(rac01)" -schedule NOW -pausebetweenbatches
Operation "rhpctl move gihome" scheduled with the job ID "166".

Querying the status of JOB 166: at this point it is in EXECUTING status and is already running the “prepatch” on node rac02:

[grid@fppserver ~]$ rhpctl query job -jobid 166
fppserver.dibiei.com: Audit ID: 1567
Job ID: 166
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -sourcehome /u01/app/product/19.0.0.0/GI1921 -destwc WC_GI1922_cluster01 -ignorewcpatches -batches (rac02),(rac01) -schedule NOW -pausebetweenbatches"
Scheduled job execution start time: 2024-04-08T23:26:43-03. Equivalent local time: 2024-04-08 23:26:43
Current status: EXECUTING
Result file path: "/fpp_images/chkbase/scheduled/job-166-2024-04-08-23:27:07.log"
Job execution start time: 2024-04-08 23:27:07

Result file "/fpp_images/chkbase/scheduled/job-166-2024-04-08-23:27:07.log" contents:
fppserver.dibiei.com: Audit ID: 1566
fppserver.dibiei.com: verifying versions of Oracle homes ...
fppserver.dibiei.com: verifying owners of Oracle homes ...
fppserver.dibiei.com: verifying groups of Oracle homes ...
fppserver.dibiei.com: Connecting to RHPC...
fppserver.dibiei.com: initiating sharedness check for the 'move gihome' operation on client cluster
fppserver.dibiei.com: completed the sharedness check for the source and destination Oracle home on client cluster
fppserver.dibiei.com: Connecting to RHPC...
rac01.dibiei.com: retrieving status of databases ...
rac01.dibiei.com: retrieving status of services of databases ...
rac01.dibiei.com: relocating services of databases ...
rac01.dibiei.com: stopping services of databases ...
rac01.dibiei.com: stopping instances of databases ...
rac01.dibiei.com: Executing "prepatch" and "postpatch" scripts on nodes: "rac02".
rac01.dibiei.com: Executing "prepatch" script on nodes [rac02].

After a few minutes the move on node rac02 completed. The JOB is now in “PAUSED” status, and the last entry in the log is the hint with the command that makes FPP continue the paused move operation:

[grid@fppserver ~]$ rhpctl query job -jobid 166
fppserver.dibiei.com: Audit ID: 1568
Job ID: 166
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -sourcehome /u01/app/product/19.0.0.0/GI1921 -destwc WC_GI1922_cluster01 -ignorewcpatches -batches (rac02),(rac01) -schedule NOW -pausebetweenbatches"
Scheduled job execution start time: 2024-04-08T23:26:43-03. Equivalent local time: 2024-04-08 23:26:43
Current status: PAUSED
Result file path: "/fpp_images/chkbase/scheduled/job-166-2024-04-08-23:27:07.log"
Job execution start time: 2024-04-08 23:27:07
Job execution end time: 2024-04-08 23:35:19
Job execution elapsed time: 8 minutes 12 seconds

Result file "/fpp_images/chkbase/scheduled/job-166-2024-04-08-23:27:07.log" contents:
fppserver.dibiei.com: Audit ID: 1566
fppserver.dibiei.com: verifying versions of Oracle homes ...
fppserver.dibiei.com: verifying owners of Oracle homes ...
fppserver.dibiei.com: verifying groups of Oracle homes ...
fppserver.dibiei.com: Connecting to RHPC...
fppserver.dibiei.com: initiating sharedness check for the 'move gihome' operation on client cluster
fppserver.dibiei.com: completed the sharedness check for the source and destination Oracle home on client cluster
fppserver.dibiei.com: Connecting to RHPC...
rac01.dibiei.com: retrieving status of databases ...
rac01.dibiei.com: retrieving status of services of databases ...
rac01.dibiei.com: relocating services of databases ...
rac01.dibiei.com: stopping services of databases ...
rac01.dibiei.com: stopping instances of databases ...
rac01.dibiei.com: Executing "prepatch" and "postpatch" scripts on nodes: "rac02".
rac01.dibiei.com: Executing "prepatch" script on nodes [rac02].
rac01.dibiei.com: Successfully executed "prepatch" script on nodes [rac02].
rac01.dibiei.com: Executing "postpatch" script on nodes [rac02].
Using configuration parameter file: /u01/app/product/19.0.0.0/GI1922/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac02/crsconfig/crs_postpatch_apply_oop_rac02_2024-04-08_11-30-06PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3270192392].
2024/04/08 23:31:00 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [3270192392].
2024/04/08 23:32:58 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:32:58 CLSRSC-4012: Shutting down Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:33:06 CLSRSC-4013: Successfully shut down Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:33:06 CLSRSC-4005: Failed to patch Oracle Trace File Analyzer (TFA) Collector. Grid Infrastructure operations will continue.
2024/04/08 23:33:07 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
rac01.dibiei.com: Successfully executed "postpatch" script on nodes [rac02].
rac01.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac01/rhp/conf/rhp.pref
rac01.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac01/rhp/conf/rhp.pref
rac01.dibiei.com: starting instances of databases "orcl" ...
rac01.dibiei.com: Updating inventory on nodes: rac02.
========================================
rac01.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2701 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-08_11-33-47PM.log
'UpdateNodeList' was successful.
rac01.dibiei.com: Updated inventory on nodes: rac02.
rac01.dibiei.com: Updating inventory on nodes: rac02.
========================================
rac01.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2709 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-08_11-34-00PM.log
'UpdateNodeList' was successful.
rac01.dibiei.com: Updated inventory on nodes: rac02.
rac01.dibiei.com: Relocating RHPS or RHPC node to rac02

Continue by running 'rhpctl move gihome -destwc WC_GI1922_cluster01 -continue'

To resume the process and advance to the next node, I prefer to resume the current JOB with the “rhpctl resume job” command instead of manually running “rhpctl move gihome -destwc <wc_name> -continue”.

Example resuming JOB 166 in FPP:

[grid@fppserver ~]$ rhpctl resume job -jobid 166
fppserver.dibiei.com: Audit ID: 1569

Now, when we query the same JOB again, it should be back in EXECUTING status. To follow the progress in real time, we can also run “tail” on the log file whose path appears in the first lines of the JOB status output.

[grid@fppserver ~]$ tail -100f "/fpp_images/chkbase/scheduled/job-166-2024-04-08-23:27:07.log"
fppserver.dibiei.com: Audit ID: 1566
fppserver.dibiei.com: verifying versions of Oracle homes ...
fppserver.dibiei.com: verifying owners of Oracle homes ...
fppserver.dibiei.com: verifying groups of Oracle homes ...
fppserver.dibiei.com: Connecting to RHPC...
fppserver.dibiei.com: initiating sharedness check for the 'move gihome' operation on client cluster
fppserver.dibiei.com: completed the sharedness check for the source and destination Oracle home on client cluster
fppserver.dibiei.com: Connecting to RHPC...
rac01.dibiei.com: retrieving status of databases ...
rac01.dibiei.com: retrieving status of services of databases ...
rac01.dibiei.com: relocating services of databases ...
rac01.dibiei.com: stopping services of databases ...
rac01.dibiei.com: stopping instances of databases ...
rac01.dibiei.com: Executing "prepatch" and "postpatch" scripts on nodes: "rac02".
rac01.dibiei.com: Executing "prepatch" script on nodes [rac02].
rac01.dibiei.com: Successfully executed "prepatch" script on nodes [rac02].
rac01.dibiei.com: Executing "postpatch" script on nodes [rac02].
Using configuration parameter file: /u01/app/product/19.0.0.0/GI1922/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac02/crsconfig/crs_postpatch_apply_oop_rac02_2024-04-08_11-30-06PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3270192392].
2024/04/08 23:31:00 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [3270192392].
2024/04/08 23:32:58 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:32:58 CLSRSC-4012: Shutting down Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:33:06 CLSRSC-4013: Successfully shut down Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:33:06 CLSRSC-4005: Failed to patch Oracle Trace File Analyzer (TFA) Collector. Grid Infrastructure operations will continue.
2024/04/08 23:33:07 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
rac01.dibiei.com: Successfully executed "postpatch" script on nodes [rac02].
rac01.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac01/rhp/conf/rhp.pref
rac01.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac01/rhp/conf/rhp.pref
rac01.dibiei.com: starting instances of databases "orcl" ...
rac01.dibiei.com: Updating inventory on nodes: rac02.
========================================
rac01.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2701 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-08_11-33-47PM.log
'UpdateNodeList' was successful.
rac01.dibiei.com: Updated inventory on nodes: rac02.
rac01.dibiei.com: Updating inventory on nodes: rac02.
========================================
rac01.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2709 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-08_11-34-00PM.log
'UpdateNodeList' was successful.
rac01.dibiei.com: Updated inventory on nodes: rac02.
rac01.dibiei.com: Relocating RHPS or RHPC node to rac02

Continue by running 'rhpctl move gihome -destwc WC_GI1922_cluster01 -continue'
fppserver.dibiei.com: Connecting to RHPC...
rac02.dibiei.com: retrieving status of databases ...
rac02.dibiei.com: relocating services of databases ...
rac02.dibiei.com: stopping services of databases ...
rac02.dibiei.com: stopping instances of databases ...
rac02.dibiei.com: Executing "prepatch" and "postpatch" scripts on nodes: "rac01".
rac02.dibiei.com: Executing "prepatch" script on nodes [rac01].
rac02.dibiei.com: Successfully executed "prepatch" script on nodes [rac01].
rac02.dibiei.com: Executing "postpatch" script on nodes [rac01].
Using configuration parameter file: /u01/app/product/19.0.0.0/GI1922/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac01/crsconfig/crs_postpatch_apply_oop_rac01_2024-04-08_11-41-44PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [3270192392].
2024/04/08 23:42:40 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [658638121].
2024/04/08 23:45:27 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:45:27 CLSRSC-4012: Shutting down Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:45:35 CLSRSC-4013: Successfully shut down Oracle Trace File Analyzer (TFA) Collector.
2024/04/08 23:45:35 CLSRSC-4005: Failed to patch Oracle Trace File Analyzer (TFA) Collector. Grid Infrastructure operations will continue.
2024/04/08 23:45:37 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
rac02.dibiei.com: Successfully executed "postpatch" script on nodes [rac01].
rac02.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac02/rhp/conf/rhp.pref
rac02.dibiei.com: Processing arguments file /u01/app/grid/crsdata/rac02/rhp/conf/rhp.pref
rac02.dibiei.com: starting instances of databases "orcl" ...
rac02.dibiei.com: Updating inventory on nodes: rac01.
========================================
rac02.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2895 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-08_11-46-17PM.log
'UpdateNodeList' was successful.
rac02.dibiei.com: Updated inventory on nodes: rac01.
rac02.dibiei.com: Updating inventory on nodes: rac01.
========================================
rac02.dibiei.com:
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2899 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2024-04-08_11-46-29PM.log
'UpdateNodeList' was successful.
rac02.dibiei.com: Updated inventory on nodes: rac01.
rac02.dibiei.com: completed the move of Oracle Grid Infrastructure home on client cluster "cluster01"
rac02.dibiei.com: Completed the 'move gihome' operation on cluster cluster01.

Note that after the line “Continue by running 'rhpctl move gihome -destwc WC_GI1922_cluster01 -continue'”, FPP went on to run the move on the next node.
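
Instead of re-running the query by hand, the status check can be scripted. The sketch below assumes the “Current status:” line format shown in the outputs above and only the statuses seen in this post (EXECUTING, PAUSED, SUCCEEDED); job_status and wait_for_job are hypothetical helper names, not rhpctl features.

```shell
# job_status
# Reads `rhpctl query job` output on stdin and prints the value of the
# first "Current status:" line (for example EXECUTING, PAUSED or SUCCEEDED).
job_status() {
    awk '/^Current status: / { sub(/^Current status: */, ""); print; exit }'
}

# wait_for_job JOBID [INTERVAL_SECONDS]
# Polls the JOB until it is no longer EXECUTING, then prints its status.
# Any other state ends the loop immediately, including states not shown in this post.
wait_for_job() {
    while :; do
        status=$(rhpctl query job -jobid "$1" | job_status)
        [ "$status" = "EXECUTING" ] || break
        sleep "${2:-30}"
    done
    printf '%s\n' "$status"
}
```

Run on the FPP Server, `wait_for_job 166` would return once the JOB reaches PAUSED or SUCCEEDED, at which point `rhpctl resume job -jobid 166` can be issued if it is paused.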

REST API

If you want to know more about the FPP REST API feature, see the post “Usando o Oracle Fleet Patching and Provisioning (FPP) com REST API” (Using Oracle Fleet Patching and Provisioning (FPP) with the REST API).

The REST API does not yet support every parameter that we can use on the command line with rhpctl. For this use case, I could not find in the documentation the attributes corresponding to the “-pausebetweenbatches” and “-chainbatches” parameters. However, I tried them anyway and found that these options already exist in the API and work correctly.

To get the behavior of the “-chainbatches” parameter, add the “chainBatches” attribute with value true to the JSON. To get the behavior of the “-pausebetweenbatches” parameter, add the “pauseBetweenBatches” attribute with value true to the JSON.

The API does not yet offer a way to resume a JOB; it is only possible to query, modify, or delete one. As a result, we cannot use the API to reproduce the result shown earlier, in which the MOVE operation is resumed by reusing the original JOB, although the same JOB can still be continued from the command line.

To continue the MOVE via the API, we need to send a new “PATCH” request with a simplified JSON body containing only the “continue” attribute set to “true”; the “schedule” attribute is optional. This creates a new JOB, as in the first example. The downside is that the first JOB stays in PAUSED status indefinitely.

Example of the JSON used to perform the Grid MOVE, where FPP starts the next batch automatically without user intervention:

{
    "sourceHome": "/u01/app/product/19.0.0.0/GI1922",
    "ignorewcpatches": "true",
    "batches": "(rac01),(rac02)",
    "chainBatches": "true",
    "eval": "false",
    "schedule": "NOW"
}

Example of the JSON forcing a pause and handing control over the start of the next batch to the user:

{
    "sourceHome": "/u01/app/product/19.0.0.0/GI1922",
    "ignorewcpatches": "true",
    "batches": "(rac01),(rac02)",
    "pauseBetweenBatches": "true",
    "eval": "false",
    "schedule": "NOW"
}

URL used for the PATCH operation:

https://fppserver:8894/rhp-restapi/rhp/gihome/<DEST WC NAME>
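
For readers who prefer the command line to a GUI client, the same PATCH request could be sent with curl. This is only a sketch: HTTP basic authentication and the FPP_URL, FPP_USER and FPP_PASS variables are assumptions of this example, and a server with a self-signed certificate may additionally need curl's --cacert option.

```shell
# Base URL of the REST endpoint; the default is the one used in this post.
FPP_URL="${FPP_URL:-https://fppserver:8894/rhp-restapi/rhp}"

# gihome_url WORKING_COPY
# Prints the gihome URL for the given destination working copy.
gihome_url() {
    printf '%s/gihome/%s\n' "$FPP_URL" "$1"
}

# fpp_patch_gihome WORKING_COPY JSON_FILE
# Sends the JSON body stored in JSON_FILE as a PATCH request.
# FPP_USER and FPP_PASS (assumed basic-auth credentials) must be set by the caller.
fpp_patch_gihome() {
    curl --silent --show-error \
         --request PATCH \
         --user "$FPP_USER:$FPP_PASS" \
         --header "Content-Type: application/json" \
         --data @"$2" \
         "$(gihome_url "$1")"
}
```

With one of the JSON bodies above saved to a file such as move.json (a hypothetical file name), the call would be `fpp_patch_gihome WC_GI1921_CLUSTER01_REST move.json`.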

The request was sent with Postman, and the operation generated JOB 169.

Querying the status of JOB 169 on the FPP Server, note that it shows exactly the same behavior demonstrated in the first example:

[grid@fppserver ~]$ rhpctl query job -jobid 169
fppserver.dibiei.com: Audit ID: 1582
Job ID: 169
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -ignorewcpatches -batches (rac01),(rac02) -destwc WC_GI1921_CLUSTER01_REST -json -sourcehome /u01/app/product/19.0.0.0/GI1922 -schedule NOW"
Scheduled job execution start time: 2024-04-09T18:38:32-03. Equivalent local time: 2024-04-09 18:38:32
Current status: SUCCEEDED
Result file path: "/fpp_images/chkbase/scheduled/job-169-2024-04-09-18:38:34.log"
Job execution start time: 2024-04-09 18:38:34

Result file "/fpp_images/chkbase/scheduled/job-169-2024-04-09-18:38:34.log" contents:
....
....
....
rac02.dibiei.com: Updated inventory on nodes: rac01.
rac02.dibiei.com: Relocating RHPS or RHPC node to rac01

Continue by running 'rhpctl move gihome -destwc WC_GI1921_CLUSTER01_REST -continue'

From this point we can issue the command to continue the move via the REST API by sending a new PATCH request to the same URL. The request body only needs the “continue” attribute; I add “schedule” just for consistency:

{
    "schedule": "NOW",
    "continue": true
}

With this, FPP creates a new JOB in sequence; in this case it was JOB 170:

[grid@fppserver ~]$ rhpctl query job -jobid 170
fppserver.dibiei.com: Audit ID: 1585
Job ID: 170
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -destwc WC_GI1921_CLUSTER01_REST -json -schedule NOW -continue"
Scheduled job execution start time: 2024-04-09T18:53:23-03. Equivalent local time: 2024-04-09 18:53:23
Current status: EXECUTING
Result file path: "/fpp_images/chkbase/scheduled/job-170-2024-04-09-18:53:34.log"
Job execution start time: 2024-04-09 18:53:34

Result file "/fpp_images/chkbase/scheduled/job-170-2024-04-09-18:53:34.log" contents:
fppserver.dibiei.com: Audit ID: 1584
fppserver.dibiei.com: Connecting to RHPC...
rac01.dibiei.com: retrieving status of databases ...
rac01.dibiei.com: relocating services of databases ...
rac01.dibiei.com: stopping services of databases ...
rac01.dibiei.com: stopping instances of databases ...
rac01.dibiei.com: Executing "prepatch" and "postpatch" scripts on nodes: "rac02".
rac01.dibiei.com: Executing "prepatch" script on nodes [rac02].
rac01.dibiei.com: Successfully executed "prepatch" script on nodes [rac02].
rac01.dibiei.com: Executing "postpatch" script on nodes [rac02].
Using configuration parameter file: /u01/app/product/19.0.0.0/GI1921/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/rac02/crsconfig/crs_postpatch_apply_oop_rac02_2024-04-09_06-55-29PM.log
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [658638121].
...
...

In the end we have two jobs completed successfully:

[grid@fppserver ~]$ rhpctl query job -jobid 169 -brief
fppserver.dibiei.com: Audit ID: 1589
Job ID: 169
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -ignorewcpatches -batches (rac01),(rac02) -destwc WC_GI1921_CLUSTER01_REST -json -sourcehome /u01/app/product/19.0.0.0/GI1922 -schedule NOW"
Scheduled job execution start time: 2024-04-09T18:38:32-03. Equivalent local time: 2024-04-09 18:38:32
Current status: SUCCEEDED
Result file path: "/fpp_images/chkbase/scheduled/job-169-2024-04-09-18:38:34.log"
Job execution start time: 2024-04-09 18:38:34
Job execution end time: 2024-04-09 18:47:36
Job execution elapsed time: 9 minutes 1 seconds

[grid@fppserver ~]$ rhpctl query job -jobid 170 -brief
fppserver.dibiei.com: Audit ID: 1590
Job ID: 170
User: grid
Client: cluster01
Scheduled job command: "rhpctl move gihome -destwc WC_GI1921_CLUSTER01_REST -json -schedule NOW -continue"
Scheduled job execution start time: 2024-04-09T18:53:23-03. Equivalent local time: 2024-04-09 18:53:23
Current status: SUCCEEDED
Result file path: "/fpp_images/chkbase/scheduled/job-170-2024-04-09-18:53:34.log"
Job execution start time: 2024-04-09 18:53:34
Job execution end time: 2024-04-09 19:01:23
Job execution elapsed time: 7 minutes 49 seconds
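
To check both jobs from a script, the "Key: value" lines of the output above can be parsed directly. The following is a rough sketch under my own assumptions: the helper names are hypothetical, the sample text is a shortened copy of the two outputs above, and the elapsed-time parser only handles the "N minutes N seconds" form seen here.

```python
import re


def parse_job_summary(text):
    """Collect the 'Key: value' lines of `rhpctl query job` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def elapsed_seconds(value):
    """Convert a value such as '9 minutes 1 seconds' into seconds."""
    units = {"minute": 60, "second": 1}
    pairs = re.findall(r"(\d+)\s+(minute|second)s?", value)
    return sum(int(amount) * units[unit] for amount, unit in pairs)


# Shortened copies of the two -brief outputs shown above.
job_169 = """Job ID: 169
Current status: SUCCEEDED
Job execution elapsed time: 9 minutes 1 seconds"""
job_170 = """Job ID: 170
Current status: SUCCEEDED
Job execution elapsed time: 7 minutes 49 seconds"""

jobs = [parse_job_summary(job_169), parse_job_summary(job_170)]
all_succeeded = all(job["Current status"] == "SUCCEEDED" for job in jobs)
total = sum(elapsed_seconds(job["Job execution elapsed time"]) for job in jobs)

print(all_succeeded)
print(f"{total // 60} minutes {total % 60} seconds")  # prints: 16 minutes 50 seconds
```

The sum covers execution time only. The wait between job 169 ending (18:47:36) and job 170 starting (18:53:34) is not included.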

Conclusion

This post presented the basic batch functionality for FPP MOVE operations, using command-line and REST API examples. Although it focused only on the Grid Infrastructure move, the same functionality can be used in database move operations, or even in a combined Grid + Database move in a single operation.

Blog do Dibiei

By Maicon Carneiro