Introduction

In September 2023, Oracle documented in the note "Exadata Critical Issues (Doc ID 1270094.1)" yet another critical Exadata issue, one that can affect X8 and X8M systems with 14 TB disks:

The document with more details about the bug:

(EX80) X8/X8M Storage Servers experience high rates of 14TB disk drive failure, which can cause ASM disk group loss (Doc ID 2974254.1)

In the brief description of the problem, Oracle Support describes a scenario of 3 consecutive failures on 3 different storage servers within a 10-minute interval:

A condition like this can be enough to cause the total loss of an ASM disk group, even one with HIGH redundancy.

The Exadata systems I manage are on version 22.1.10, and this problem is only fixed in version 22.1.15. Since the next planned update of the environment is only expected in a few months, we chose to apply just the specific patch that fixes this bug.

Also, when I started planning the fix, patch 35650189 was not yet available for version 22.1.10, and I had to open an SR with Oracle Support to request it.

In this post I intend to show the steps performed, and the expected behavior at each phase, when applying this fix on an Exadata X8M with 6 Storage Cells on version 22.1.10. The procedure was executed in rolling mode, with no downtime for the database. Note that this post is meant to share an experience and a successfully completed example; you must still read and follow the instructions in the README.txt provided with the patch.

It is also advisable to open a Proactive SR with Oracle Support to track the activity. For this type of request, support asks that it be opened at least one week before the scheduled maintenance date.

Preparation (Before the Maintenance Window)

The patch was downloaded to the /tmp/ex80 directory on one of the DB Nodes. The patch content is an RPM to be installed on the Storage Cell OS:

[root@dbnode01]# cd /tmp/ex80
[root@dbnode01 ex80]# unzip p35650189_2211000_Linux-x86-64.zip
Archive:  p35650189_2211000_Linux-x86-64.zip
  inflating: exadata-firmware-cell-22.1.10.0.0.230919-1-rpm.bin
  inflating: exadata-firmware-cell-22.1.10.0.0.230422-1-rpm.bin
  inflating: README.txt

Along with the RPM that contains the required update, with suffix "230919-1" (2023-09-19), a second RPM with suffix "230422-1" (2023-04-22) is provided for use only in case of rollback. Note that the dates vary with the image version; in this case, the 22.1.10 image is from April 2023.

Creating a STAGE directory for the patch on all Storage nodes:

[root@dbnode01]# dcli -g ~/cell_group -l root 'mkdir -p /var/log/exadatatmp/SAVE_patch_35650189'

Copying the RPM to the STAGE directory on all Storage nodes:

dcli -g ~/cell_group -l root -f /tmp/ex80/exadata-firmware-cell-22.1.10.0.0.230919-1-rpm.bin -d /var/log/exadatatmp/SAVE_patch_35650189

Checking that all Storage servers are running only the base release, with no interim patches:

[root@dbnode01 ~]# dcli -g ~/cell_group -l root 'cellcli -e list cell attributes releaseTrackingBug'
exacell01: 35277259
exacell02: 35277259
exacell03: 35277259
exacell04: 35277259
exacell05: 35277259
exacell06: 35277259

If the command above returns more than one number per Storage Cell, it indicates that an interim patch is installed in the environment, and an SR should be opened with Oracle Support before proceeding with the installation of this patch.

In this case, all cells show only the number corresponding to release update 22.1.10:

Patch 35277259: EXADATA RELEASE UPDATE 22.1.10.0.0 (MOS NOTE 2929852.1)
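
A quick complementary check is to query the RPM database directly for the exadata-firmware-cell package on each cell; the same command can be reused after the patch to confirm that the new 230919-1 build is installed (a simple sketch following the dcli pattern used above):

[root@dbnode01 ~]# dcli -g ~/cell_group -l root 'rpm -qa | grep exadata-firmware-cell'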

As the grid owner, on the ASM instance, checking the configured tolerance time for offline disks:

col attribute format a30
col value format a10
select dg.name as diskgroup, a.name as attribute, a.value
from v$asm_diskgroup dg, v$asm_attribute a
where dg.group_number=a.group_number
and (a.name like '%repair_time' or a.name = 'compatible.asm');  

DISKGROUP                      ATTRIBUTE                      VALUE
------------------------------ ------------------------------ ----------
DATAC1                         disk_repair_time               12H
DATAC1                         failgroup_repair_time          24.0h
DATAC1                         compatible.asm                 19.0.0.0.0
RECOC1                         disk_repair_time               12H
RECOC1                         failgroup_repair_time          24.0h
RECOC1                         compatible.asm                 19.0.0.0.0

README: "Note that during a rolling storage server update disk groups with compatible.asm >= 12.1.0.2.0 will use the value of failgroup_repair_time, and disk groups with compatible.asm < 12.1.0.2.0 will use the value of disk_repair_time."

In this case, with compatible.asm set to 19.0.0.0.0, the tolerance window for a Storage Cell restart will be failgroup_repair_time, which is set to 24h.
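
If there were any risk of a cell staying offline longer than that window, the attribute could be raised temporarily before the maintenance and restored afterwards. A minimal sketch, run as the grid owner on the ASM instance (the 48h value is just an illustrative assumption; repeat for RECOC1 as needed):

ALTER DISKGROUP DATAC1 SET ATTRIBUTE 'failgroup_repair_time' = '48h';
-- ...and back to the original value once all cells are patched:
ALTER DISKGROUP DATAC1 SET ATTRIBUTE 'failgroup_repair_time' = '24h';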

Just to have a baseline for any eventual impact after the disk firmware update, I checked the error counts on all disks using Fred Denis's cell-status.sh script.

[root@dbnode01 ex80]# ~/cell-status.sh -c ~/cell_group

                Cluster is a X8M-2 Elastic Rack HC 14TB

     Cell Disks     |         FlashDisk        |         HardDisk         |           PMEM           |
                    |   Nb   | Normal | Errors |   Nb   | Normal | Errors |   Nb   | Normal | Errors |
------------------------------------------------------------------------------------------------------
     exacell01      |    4   |    4   |    0   |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell02      |    4   |    4   |    0   |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell03      |    4   |    4   |    0   |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell04      |    4   |    4   |    0   |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell05      |    4   |    4   |    0   |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell06      |    4   |    4   |    0   |   12   |   12   |    0   |   12   |   12   |    0   |
------------------------------------------------------------------------------------------------------


     Grid Disks     |          DATAC1          |          RECOC1          |
                    |   Nb   | Online | Errors |   Nb   | Online | Errors |
---------------------------------------------------------------------------
     exacell01      |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell02      |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell03      |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell04      |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell05      |   12   |   12   |    0   |   12   |   12   |    0   |
     exacell06      |   12   |   12   |    0   |   12   |   12   |    0   |
---------------------------------------------------------------------------
 --  : Unused disks | xx  : Not ONLINE disks   |     : asmDeactivationOutcome is NOT yes
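
If the cell-status.sh script is not at hand, a rough equivalent of the hard disk error baseline can be collected directly with cellcli (a sketch; it assumes the physicaldisk errOtherCount attribute, which exposes a per-disk error counter):

[root@dbnode01 ~]# dcli -g ~/cell_group -l root "cellcli -e list physicaldisk attributes name,status,errOtherCount where diskType='HardDisk'"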

I also listed all existing alerts on the Storage servers (in this case, none):

[root@dbnode01 ex80]# dcli -g ~/cell_group -l root 'cellcli -e list alerthistory'

If any alert shows up here that has not yet been investigated, I recommend opening an SR with Oracle Support to get it resolved before proceeding with the update.

Installing the Patch

1) Checking the current disk firmware version before the update:

[root@exacell01 ~]# /opt/oracle.SupportTools/CheckHWnFWProfile -action list -component | grep -i DiskFirmwareVersion
    <DiskFirmwareVersion FIRMWARE_ID="252:11" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:10" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:9" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:8" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:7" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:6" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:5" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:4" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:3" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:2" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:1" VALUE="A146"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:0" VALUE="A146"/>

2) Checking the griddisk status before taking them OFFLINE:

[root@exacell01 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         DATAC1_CD_00_exacell01     ONLINE  Yes
         DATAC1_CD_01_exacell01     ONLINE  Yes
         DATAC1_CD_02_exacell01     ONLINE  Yes
         DATAC1_CD_03_exacell01     ONLINE  Yes
         DATAC1_CD_04_exacell01     ONLINE  Yes
         DATAC1_CD_05_exacell01     ONLINE  Yes
         DATAC1_CD_06_exacell01     ONLINE  Yes
         DATAC1_CD_07_exacell01     ONLINE  Yes
         DATAC1_CD_08_exacell01     ONLINE  Yes
         DATAC1_CD_09_exacell01     ONLINE  Yes
         DATAC1_CD_10_exacell01     ONLINE  Yes
         DATAC1_CD_11_exacell01     ONLINE  Yes
         RECOC1_CD_00_exacell01     ONLINE  Yes
         RECOC1_CD_01_exacell01     ONLINE  Yes
         RECOC1_CD_02_exacell01     ONLINE  Yes
         RECOC1_CD_03_exacell01     ONLINE  Yes
         RECOC1_CD_04_exacell01     ONLINE  Yes
         RECOC1_CD_05_exacell01     ONLINE  Yes
         RECOC1_CD_06_exacell01     ONLINE  Yes
         RECOC1_CD_07_exacell01     ONLINE  Yes
         RECOC1_CD_08_exacell01     ONLINE  Yes
         RECOC1_CD_09_exacell01     ONLINE  Yes
         RECOC1_CD_10_exacell01     ONLINE  Yes
         RECOC1_CD_11_exacell01     ONLINE  Yes
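
Before inactivating the disks, the usual safety check is to confirm that no grid disk would force a disk group offline; filtering for anything whose asmdeactivationoutcome is not "Yes" should return no rows (a quick sketch using the same attributes listed above):

[root@exacell01 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome where asmdeactivationoutcome != 'Yes'

If this command returns any rows, do not proceed with the next step until the condition is resolved.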

3) Taking all griddisks on this storage node OFFLINE (setting them inactive):

[root@exacell01 ~]# cellcli -e alter griddisk all inactive
GridDisk DATAC1_CD_00_exacell01 successfully altered
GridDisk DATAC1_CD_01_exacell01 successfully altered
GridDisk DATAC1_CD_02_exacell01 successfully altered
GridDisk DATAC1_CD_03_exacell01 successfully altered
GridDisk DATAC1_CD_04_exacell01 successfully altered
GridDisk DATAC1_CD_05_exacell01 successfully altered
GridDisk DATAC1_CD_06_exacell01 successfully altered
GridDisk DATAC1_CD_07_exacell01 successfully altered
GridDisk DATAC1_CD_08_exacell01 successfully altered
GridDisk DATAC1_CD_09_exacell01 successfully altered
GridDisk DATAC1_CD_10_exacell01 successfully altered
GridDisk DATAC1_CD_11_exacell01 successfully altered
GridDisk RECOC1_CD_00_exacell01 successfully altered
GridDisk RECOC1_CD_01_exacell01 successfully altered
GridDisk RECOC1_CD_02_exacell01 successfully altered
GridDisk RECOC1_CD_03_exacell01 successfully altered
GridDisk RECOC1_CD_04_exacell01 successfully altered
GridDisk RECOC1_CD_05_exacell01 successfully altered
GridDisk RECOC1_CD_06_exacell01 successfully altered
GridDisk RECOC1_CD_07_exacell01 successfully altered
GridDisk RECOC1_CD_08_exacell01 successfully altered
GridDisk RECOC1_CD_09_exacell01 successfully altered
GridDisk RECOC1_CD_10_exacell01 successfully altered
GridDisk RECOC1_CD_11_exacell01 successfully altered
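
Before running the firmware .bin in the next step, it is prudent to confirm that ASM has actually taken all of these disks offline. A quick filter (assuming, as is the case here, that there are no unused grid disks on the cell) should eventually return nothing:

[root@exacell01 ~]# cellcli -e list griddisk attributes name,asmmodestatus where asmmodestatus != 'OFFLINE'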

4) Starting the RPM installation: just execute the .bin file copied earlier:

[root@exacell01 ~]# /var/log/exadatatmp/SAVE_patch_35650189/exadata-firmware-cell-22.1.10.0.0.230919-1-rpm.bin
UnZipSFX 6.00 of 20 April 2009, by Info-ZIP (http://www.info-zip.org).
  inflating: exadata-firmware-cell-22.1.10.0.0.230919-1.noarch.rpm
Updating the following pacakge: exadata-firmware-cell
Performing pre-update checks..
RPM to patch/update: exadata-firmware-cell
Patch  RPM         : exadata-firmware-cell-22.1.10.0.0.230919-1.noarch.rpm
Updating exadata-firmware-cell....
Preparing...                          ################################# [100%]
Updating / installing...
   1:exadata-firmware-cell-22.1.10.0.0################################# [ 50%]
Cleaning up / removing...
   2:exadata-firmware-cell-22.1.10.0.0################################# [100%]
exadata-firmware-cell package updated successfully.
Updating bootable usb: /opt/oracle.cellos/iso/cellbits/exadatarpms.tbz
Successfully updated USB rescue location with new exadata-firmware-cell-22.1.10.0.0.230919-1.noarch.rpm
/root
Shutdown cell services...

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
Updating Firmware for HardDisk. Do not interrupt...
Connection to exacell01 closed by remote host.
Connection to exacell01 closed.

At this point the storage server reboots and the ssh connection is dropped (this is expected).

5) OPTIONAL: Following the reboot progress via the ILOM console:

[root@dbnode01 ~]# ssh root@exacell01-ilom
Password:

Oracle(R) Integrated Lights Out Manager

Version 5.1.1.21 r150524

Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: exacell01-ilom

->
-> start /sp/console
Are you sure you want to start /SP/console (y/n)? y

Serial console started.  To stop, type ESC (
[7174621.136362] reboot: Power down

BOOT_TIME_EVENT: { "Phase" : "Early SEC", "Step" : "End", "msg" : "End of Early SEC phase" }


......

[  OK  ] Started Dump dmesg to /var/log/dmesg.
[  OK  ] Started Permit User Sessions.
[  OK  ] Started Software RAID monitoring and management.
[  OK  ] Started ACPI Event Daemon.
[  OK  ] Started Name Service Cache Daemon.
[  OK  ] Started Resets System Activity Logs.
[  OK  ] Started Login Service.
         Starting update of the root trust a... DNSSEC validation in unbound...
         Starting Terminate Plymouth Boot Screen...
         Starting Wait for Plymouth Boot Screen to Quit...
[  OK  ] Started Command Scheduler.
[  129.146154] preipconf[19414]: [INFO     ] /opt/oracle.cellos/cellFirstboot.sh started from preipconf service
[  129.215516] preipconf[19414]: [INFO     ] Wait for udev settles all devices and renames network interfaces
[  129.416871] preipconf[19414]: [INFO     ] Checking if devices in /etc/fstab are mounted
[  129.432130] preipconf[19414]: [INFO     ] Expected devices to be mounted: /dev/md24p14 /dev/md24p16 /dev/md24p3 /dev/md24p5 /dev/md24p11 /dev/md24p1 /dev/md24p12 /dev/md24p15 /dev/md24p7 /dev/md24p2 /dev/md24p17
[  129.457579] preipconf[19414]: [INFO     ] Actual devices mounted        : /dev/md24p5 /dev/md24p12 /dev/md24p1 /dev/md24p7 /dev/md24p16 /dev/md24p3 /dev/md24p17 /dev/md24p11 /dev/md24p15 /dev/md24p14 /dev/md24p2
[  130.439023] preipconf[19414]: [INFO     ] Exadata configuration file /opt/oracle.cellos/cell.conf already exists
[INFO     ] /opt/oracle.cellos/cellFirstboot.sh started from precell service
[INFO     ] Reassembling the MD devices
[INFO     ] Personalities : [raid0] [raid1]
md24 : active raid1 sdn[1] sdm[0]
      141557760 blocks super external:/md127/0 [2/2] [UU]

md25 : active (auto-read-only) raid1 sdm[1] sdn[0]
      4917248 blocks super external:/md127/1 [2/2] [UU]

md127 : inactive sdn[1](S) sdm[0](S)
      10402 blocks super external:imsm

md304 : active raid0 nvme4n1[0] nvme5n1[1]
      6251229184 blocks super 1.2 2048k chunks

md305 : active raid0 nvme6n1[0] nvme7n1[1]
      6251229184 blocks super 1.2 2048k chunks

md306 : active raid0 nvme0n1[0] nvme1n1[1]
      6251229184 blocks super 1.2 2048k chunks

md310 : active raid0 nvme3n1[1] nvme2n1[0]
      6251229184 blocks super 1.2 2048k chunks

unused devices: <none>
[INFO     ] Mount all local devices in /etc/fstab


exacell01 login:

When the reboot completes successfully, the login prompt should appear.
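
If you prefer not to keep an ILOM console open, the cell can simply be polled from a DB Node until ssh responds again (a minimal sketch; the 60-second interval is arbitrary):

[root@dbnode01 ~]# until ssh -o ConnectTimeout=5 root@exacell01 hostname 2>/dev/null; do sleep 60; done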

6) Logging back in to the Storage Cell and checking the disk firmware version after the update:

[root@exacell01 ~]# /opt/oracle.SupportTools/CheckHWnFWProfile -action list -component | grep -i DiskFirmwareVersion
    <DiskFirmwareVersion FIRMWARE_ID="252:11" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:10" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:9" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:8" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:7" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:6" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:5" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:4" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:3" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:2" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:1" VALUE="A2W0"/>
    <DiskFirmwareVersion FIRMWARE_ID="252:0" VALUE="A2W0"/>

7) Bringing all griddisks on this storage node back ONLINE (active):

[root@exacell01 ~]# cellcli -e alter griddisk all active
GridDisk DATAC1_CD_00_exacell01 successfully altered
GridDisk DATAC1_CD_01_exacell01 successfully altered
GridDisk DATAC1_CD_02_exacell01 successfully altered
GridDisk DATAC1_CD_03_exacell01 successfully altered
GridDisk DATAC1_CD_04_exacell01 successfully altered
GridDisk DATAC1_CD_05_exacell01 successfully altered
GridDisk DATAC1_CD_06_exacell01 successfully altered
GridDisk DATAC1_CD_07_exacell01 successfully altered
GridDisk DATAC1_CD_08_exacell01 successfully altered
GridDisk DATAC1_CD_09_exacell01 successfully altered
GridDisk DATAC1_CD_10_exacell01 successfully altered
GridDisk DATAC1_CD_11_exacell01 successfully altered
GridDisk RECOC1_CD_00_exacell01 successfully altered
GridDisk RECOC1_CD_01_exacell01 successfully altered
GridDisk RECOC1_CD_02_exacell01 successfully altered
GridDisk RECOC1_CD_03_exacell01 successfully altered
GridDisk RECOC1_CD_04_exacell01 successfully altered
GridDisk RECOC1_CD_05_exacell01 successfully altered
GridDisk RECOC1_CD_06_exacell01 successfully altered
GridDisk RECOC1_CD_07_exacell01 successfully altered
GridDisk RECOC1_CD_08_exacell01 successfully altered
GridDisk RECOC1_CD_09_exacell01 successfully altered
GridDisk RECOC1_CD_10_exacell01 successfully altered
GridDisk RECOC1_CD_11_exacell01 successfully altered

8) Monitoring the griddisk status via cellcli (wait for all of them to reach ONLINE status):

[root@exacell01 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         DATAC1_CD_00_exacell01     SYNCING         Yes
         DATAC1_CD_01_exacell01     SYNCING         Yes
         DATAC1_CD_02_exacell01     SYNCING         Yes
         DATAC1_CD_03_exacell01     SYNCING         Yes
         DATAC1_CD_04_exacell01     SYNCING         Yes
         DATAC1_CD_05_exacell01     SYNCING         Yes
         DATAC1_CD_06_exacell01     SYNCING         Yes
         DATAC1_CD_07_exacell01     SYNCING         Yes
         DATAC1_CD_08_exacell01     SYNCING         Yes
         DATAC1_CD_09_exacell01     SYNCING         Yes
         DATAC1_CD_10_exacell01     SYNCING         Yes
         DATAC1_CD_11_exacell01     SYNCING         Yes
         RECOC1_CD_00_exacell01     SYNCING         Yes
         RECOC1_CD_01_exacell01     SYNCING         Yes
         RECOC1_CD_02_exacell01     SYNCING         Yes
         RECOC1_CD_03_exacell01     SYNCING         Yes
         RECOC1_CD_04_exacell01     SYNCING         Yes
         RECOC1_CD_05_exacell01     SYNCING         Yes
         RECOC1_CD_06_exacell01     SYNCING         Yes
         RECOC1_CD_07_exacell01     SYNCING         Yes
         RECOC1_CD_08_exacell01     SYNCING         Yes
         RECOC1_CD_09_exacell01     SYNCING         Yes
         RECOC1_CD_10_exacell01     SYNCING         Yes
         RECOC1_CD_11_exacell01     SYNCING         Yes
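
A simple way to wait from the cell itself is to list only the disks that have not reached ONLINE yet; when this returns no rows, the cell is fully back (a sketch using the same attributes as above):

[root@exacell01 ~]# cellcli -e list griddisk attributes name,asmmodestatus where asmmodestatus != 'ONLINE'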

To follow the progress, we can also monitor the RESYNC operation in ASM from a DB Node (the script is provided at the end of the post):

[grid@dbnode01]$ ./show_rebalance.sh

    Group# NAME          INST_ID OPERA PASS      STAT      POWER     ACTUAL      SOFAR   EST_WORK     % Done   EST_RATE EST_MINUTES
---------- ---------- ---------- ----- --------- ---- ---------- ---------- ---------- ---------- ---------- ---------- -----------
         1 DATAC1              1 REBAL COMPACT   WAIT          8          8          0          0          0          0           0
         1 DATAC1              1 REBAL REBALANCE WAIT          8          8          0          0          0          0           0
         1 DATAC1              1 REBAL RESYNC    RUN           8          8      15898      74501      21.34       6558           8
         1 DATAC1              1 REBAL RESILVER  WAIT          8          8          0          0          0          0           0
         1 DATAC1              1 REBAL REBUILD   WAIT          8          8          0          0          0          0           0

RESYNC finished:

[grid@dbnode01]$ ./show_rebalance.sh

no rows selected

Confirming on the Storage server that the griddisks are back ONLINE:

[root@exacell01 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         DATAC1_CD_00_exacell01     ONLINE  Yes
         DATAC1_CD_01_exacell01     ONLINE  Yes
         DATAC1_CD_02_exacell01     ONLINE  Yes
         DATAC1_CD_03_exacell01     ONLINE  Yes
         DATAC1_CD_04_exacell01     ONLINE  Yes
         DATAC1_CD_05_exacell01     ONLINE  Yes
         DATAC1_CD_06_exacell01     ONLINE  Yes
         DATAC1_CD_07_exacell01     ONLINE  Yes
         DATAC1_CD_08_exacell01     ONLINE  Yes
         DATAC1_CD_09_exacell01     ONLINE  Yes
         DATAC1_CD_10_exacell01     ONLINE  Yes
         DATAC1_CD_11_exacell01     ONLINE  Yes
         RECOC1_CD_00_exacell01     ONLINE  Yes
         RECOC1_CD_01_exacell01     ONLINE  Yes
         RECOC1_CD_02_exacell01     ONLINE  Yes
         RECOC1_CD_03_exacell01     ONLINE  Yes
         RECOC1_CD_04_exacell01     ONLINE  Yes
         RECOC1_CD_05_exacell01     ONLINE  Yes
         RECOC1_CD_06_exacell01     ONLINE  Yes
         RECOC1_CD_07_exacell01     ONLINE  Yes
         RECOC1_CD_08_exacell01     ONLINE  Yes
         RECOC1_CD_09_exacell01     ONLINE  Yes
         RECOC1_CD_10_exacell01     ONLINE  Yes
         RECOC1_CD_11_exacell01     ONLINE  Yes

With the firmware update and the disk resync completed on Storage Cell 01, I followed the same procedure sequentially for the remaining Storage servers until all of them were updated. The procedure took on average 30 minutes per Storage server, about 3 hours in total to update all 6 cells.
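
Before starting the update on the next cell, the same kind of check can be run across the whole cluster; the dcli call below (a sketch following the pattern used earlier) should return nothing when it is safe to continue:

[root@dbnode01 ~]# dcli -g ~/cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus where asmmodestatus != 'ONLINE'"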

Shell script show_rebalance.sh to quickly check the RESYNC status in ASM:

#!/bin/bash
# show_rebalance.sh - quick view of ASM rebalance/resync operations via GV$ASM_OPERATION
# Adjust ORACLE_HOME/ORACLE_SID and the hardcoded disk group names to your environment
export ORACLE_HOME=/u01/app/19.0.0.0/grid_home2
export ORACLE_SID=+ASM1
export PATH=$ORACLE_HOME/bin:$PATH

sqlplus -s / as sysasm <<EOF
set lines 400
set pages 50
col name format a10
SELECT O.GROUP_NUMBER as "Group#",
       case when o.group_number = 1 then 'DATAC1' else 'RECOC1' end as NAME,
       O.INST_ID,
       O.OPERATION,
       O.PASS,
       O.STATE,
       O.POWER,
       O.ACTUAL,
       O.SOFAR,
       O.EST_WORK,
       ROUND(SOFAR/greatest(EST_WORK,1)*100,2) "% Done", EST_RATE,
       O.EST_MINUTES
FROM GV\$ASM_OPERATION O
WHERE EST_WORK >= 0
ORDER BY GROUP_NUMBER;
quit;
EOF
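
After saving the script on a DB Node under the grid user (and adjusting ORACLE_HOME, ORACLE_SID and the disk group names to your environment), just make it executable before running it:

[grid@dbnode01]$ chmod +x show_rebalance.sh
[grid@dbnode01]$ ./show_rebalance.sh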
