Oracle Database Bad Blocks

Published: 2023-07-20 07:59:05

『1』 How to handle large-scale Oracle block corruption

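For the past month or two, talk of scenario-driven operations and scenario-driven big-data analysis has been everywhere. The most striking statement came from Cheng Yongxin (程永新), executive vice president of 新炬網路, at the Hangzhou stop of the Gdevops Global Agile Operations Summit: "Any operations platform built without a scenario driving it is empty talk!" We keep talking about technology, principles and internals, and assume that anyone who "understands" these will inevitably go far.

Technology matters, but technology that drifts away from the business or application scenario and cannot deliver business value matters very little.

Technology should be scenario-driven too. For operations engineers, so-called advanced technology learned apart from any scenario is simply wasted time. That is why, when a new colleague joins a team, technical skill is only about one third of the evaluation; knowledge of the environment and processes that let the technology do its job makes up the other two thirds.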

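Today's topic is Oracle bad blocks (block corruption). Any DBA who has done two or three years of Oracle maintenance and support will probably have run into them. Even those who never have, but who have read the Oracle documentation or know how to search Baidu or Google, should know the basic handling techniques.

This is a simple mind map I shared internally; if anything is missing, feel free to add it in the comments.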

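In interviews the questions usually go like this: Do you know about Oracle bad blocks? Why do bad blocks occur? Can you describe the details of a bad-block case you handled? If several of the databases you are responsible for suddenly developed bad blocks at once, what would you do?

The answers to the first few questions are usually not bad. Outstanding answers to the last one are rare, because the candidate's experience and thinking are too narrow. If at that point you still want to go through the alert log of each database one by one, you are clearly heading in the wrong direction.

In reality we may have fifty or sixty databases, or more than a hundred. If you go through them one at a time, reading the corrupt blocks reported in each log, mapping each block to its object, determining whether it is a table or an index, and then applying a bad-block repair technique... you will drag everyone down with you.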
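To get a sense of scale before deciding anything, the corrupt blocks reported across many alert logs can be counted in one pass. The following is a minimal sketch, assuming the alert log contents have already been collected into a dictionary keyed by database name; the database names and log lines in `sample` are hypothetical, and only the standard ORA-01578 message text is matched.

```python
import re

# Standard ORA-01578 message text; the file and block numbers are captured.
ORA_1578 = re.compile(
    r"ORA-0*1578: ORACLE data block corrupted \(file # (\d+), block # (\d+)\)"
)


def corrupt_blocks(alert_text):
    """Return the sorted, de-duplicated (file#, block#) pairs in one alert log."""
    return sorted({(int(f), int(b)) for f, b in ORA_1578.findall(alert_text)})


def summarize(alert_logs):
    """Map each database name to its distinct corrupt (file#, block#) pairs."""
    return {db: corrupt_blocks(text) for db, text in alert_logs.items()}


# Hypothetical sample input: database name -> alert log contents.
sample = {
    "ORCL1": (
        "ORA-01578: ORACLE data block corrupted (file # 7, block # 1234)\n"
        "ORA-01110: data file 7: '/u01/oradata/users01.dbf'\n"
        "ORA-01578: ORACLE data block corrupted (file # 7, block # 1234)\n"
        "ORA-01578: ORACLE data block corrupted (file # 9, block # 88)\n"
    ),
    "ORCL2": "Thread 1 advanced to log sequence 42\n",
}

summary = summarize(sample)
for db, blocks in summary.items():
    print(db, len(blocks), blocks)
```

A per-database count like this is enough to tell a handful of bad blocks from tens of thousands, which is the distinction that drives the decision discussed next.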

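We have been through a case where two days were not enough to repair the tens of thousands of bad blocks in a single database. While the other experts were still grinding through the recovery, I proposed a different plan to management, and the business was restored before the big boss completely lost patience.

The right approach is this: once you have confirmed that the number of bad blocks is large, stop the business immediately, switch to the disaster recovery (DR) site, and backfill the data afterwards. What else is DR for? You maintain the standby for a thousand days to use it in one moment, and this is that moment!

Sadly, most DBAs who come to interview agonize over whether to use block recover, dbms_repair, BBED, or ...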

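So when should you escalate and ask to switch to DR?

Leave aside the fact that at many companies DR exists in name only and nobody dares to switch at the critical moment. Even when DR is genuinely usable, a switch cannot happen the instant it is requested: it involves changes on the application, network, host and storage sides.

So how long should you wait before switching? For a typical enterprise, if you estimate that the business cannot be back to normal within 2 hours of the start of fault handling, you should request a DR switch. In finance, and especially in the securities and fund industry, a fault not resolved within 1 minute must be reported to the China Securities Regulatory Commission (證監會), and one not resolved within half an hour is circulated across the industry, so the request has to be made within that window.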
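The escalation rule described here can be written down as a small decision function. This is only a sketch: the thresholds are the ones quoted in the text, the function name and action strings are hypothetical, and real limits depend on company policy and the applicable regulation.

```python
from datetime import timedelta

# Thresholds quoted in the text; treat them as configurable assumptions.
GENERAL_FAILOVER_LIMIT = timedelta(hours=2)
SECURITIES_REGULATOR_NOTICE = timedelta(minutes=1)
SECURITIES_INDUSTRY_NOTICE = timedelta(minutes=30)


def escalation_actions(elapsed, estimated_remaining, securities=False):
    """Return the actions due, given time spent so far and the repair estimate."""
    actions = []
    expected_total = elapsed + estimated_remaining
    if securities:
        # The regulator must be told once the outage has lasted a minute.
        if elapsed >= SECURITIES_REGULATOR_NOTICE:
            actions.append("notify regulator")
        # Request the switch if recovery will not fit inside half an hour.
        if expected_total >= SECURITIES_INDUSTRY_NOTICE:
            actions.append("request DR failover")
    elif expected_total >= GENERAL_FAILOVER_LIMIT:
        actions.append("request DR failover")
    return actions


print(escalation_actions(timedelta(minutes=20), timedelta(hours=3)))
```

The point of encoding the rule is that the decision is made from the estimate, not from how long the team has already been trying.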

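This brings us back to the starting point. You need a plan first: for the company's critical systems you must build a DR environment, one that actually works, and you should have a DR switchover plan prepared in advance.

For a DBA, checking the state of the database quickly and reporting promptly is very important. This is where an automated operations platform comes in again. Wouldn't it be much better if a few button clicks could tell you which objects the bad blocks in the alert log belong to?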
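The lookup such a platform would automate is a query against the `dba_extents` dictionary view. The sketch below only builds the SQL text and one set of bind values per corrupt block; executing it against a database is left to whatever driver is in use. It assumes the file number is the absolute file number (the one shown in the ORA-01110 line), which often coincides with the number in ORA-01578 but need not. The helper names are hypothetical.

```python
def segment_lookup_sql():
    """SQL that maps one (file_id, block_id) pair to the segment containing it."""
    return (
        "SELECT owner, segment_name, segment_type, partition_name\n"
        "  FROM dba_extents\n"
        " WHERE file_id = :file_id\n"
        "   AND :block_id BETWEEN block_id AND block_id + blocks - 1"
    )


def lookup_requests(pairs):
    """Pair the query with one bind dictionary per distinct corrupt block."""
    sql = segment_lookup_sql()
    return [(sql, {"file_id": f, "block_id": b}) for f, b in sorted(set(pairs))]


# Hypothetical corrupt blocks, for example as extracted from an alert log.
requests = lookup_requests([(7, 1234), (7, 1234), (9, 88)])
for sql, binds in requests:
    print(binds)
```

On a database with many extents this query can be slow, so for thousands of blocks it is usually worth grouping by file first; that refinement is omitted here.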

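Going back further: what was the interview question? Many bad blocks found in several databases at the same time.

An experienced operations manager's first reaction should be: why did they occur at the same time? Clearly an external cause is responsible. So once the DR switch is done and the business is back in service, the first thing to check is whether these databases share a common foundation: the same storage, the same storage management software, the same volume management software.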
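Finding the common foundation is a set intersection over an infrastructure inventory. A minimal sketch follows, assuming a hypothetical inventory that records, per database, which storage array and volume manager it sits on; all names and values are made up for illustration.

```python
def shared_components(inventory, affected):
    """Return the (layer, value) pairs common to every affected database."""
    common = None
    for db in affected:
        items = set(inventory[db].items())
        common = items if common is None else common & items
    return sorted(common or [])


# Hypothetical inventory: database -> infrastructure layers it depends on.
inventory = {
    "CRM": {"storage": "array-1", "volume_manager": "vm-x"},
    "BILLING": {"storage": "array-2", "volume_manager": "vm-x"},
    "HR": {"storage": "array-2", "volume_manager": "vm-y"},
}

suspects = shared_components(inventory, ["CRM", "BILLING"])
print(suspects)
```

In this made-up inventory the two affected databases sit on different arrays but share a volume manager, so that layer is the first suspect.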

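In my experience, when several databases develop bad blocks at once, storage management software is to blame most of the time.

One time, a software bug surfaced while Storage Foundation was doing volume replication. IBM, Oracle, Symantec and many other vendors spent a full month in the "war room", with each company's second- and third-line support producing evidence that its own product was not at fault, before a faint trace was finally found and the culprit identified.

Another case was a little easier: after the storage software returned to a normal state the bad blocks remained, and the systems were repaired one by one with the fsck command. So tell me, what type of bad block was that?

An experienced DBA has to solve the problem, but should not rush to type commands. Looking at the problem from a higher vantage point may get twice the result for half the effort.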

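Unfortunately, a couple of days ago the core database at a friend's company terminated "for no apparent reason". The cause cannot be disclosed, but reportedly my friend had smoked several packs of cigarettes by the time the recovery was finished.

First, note that the database was terminated by LGWR (note: some details have been redacted).

However, the database would not restart: many data files turned out to contain bad blocks, and the database could not be opened at all:

Along with internal errors like these:

How do you get out of this? Restoring from that day's database backup and applying the archived logs is far faster than repairing the bad blocks.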

『2』 The Oracle database will not open; what should we do? Our company's Oracle database is damaged and will not open. How should it be handled?

If you cannot resolve it yourself, you can ask the professional Oracle database recovery team at 詩檀軟體 to recover it for you.


Oracle corruption and bad blocks mainly fall into the following categories:

ORA-1578
ORA-8103
ORA-1410
ORA-1499
ORA-81##
ORA-14##
ORA-26040
ORA-600 Errors
Block Corruption
Index Corruption
Row Corruption
UNDO Corruption
Control File
Consistent Read
Dictionary
File/RDBA/BL

Error | Description | Corruption related to
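ORA-1578
Description: ORA-1578 is reported when a block is thought to be corrupt on read. It generally means Oracle has detected physical corruption, for example an incorrect checksum or incorrect tail_chk information in the data block.
Corruption related to: Block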

OERR: ORA-1578 「ORACLE data block corrupted (file # %s, block # %s)」 Master Note
OERR: ORA-1578 「ORACLE data block corrupted (file # %s, block # %s)」

Fractured Block explanation

Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g/11g
Diagnosing and Resolving 1578 reported on a Local Index of a Partitioned table
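ORA-1578 prints a file # and block #, whereas trace files and internal errors often identify the same block by its RDBA (relative data block address). The conversion between the two is simple bit arithmetic. The sketch below assumes a smallfile tablespace, where the upper 10 bits of the 32-bit RDBA hold the relative file number and the lower 22 bits hold the block number; the function names are hypothetical.

```python
FILE_SHIFT = 22                      # block number occupies the low 22 bits
BLOCK_MASK = (1 << FILE_SHIFT) - 1   # 0x3FFFFF


def decode_rdba(rdba):
    """Split a 32-bit RDBA into (relative file#, block#)."""
    return rdba >> FILE_SHIFT, rdba & BLOCK_MASK


def encode_rdba(rel_file, block):
    """Build the RDBA for a relative file number and block number."""
    return (rel_file << FILE_SHIFT) | (block & BLOCK_MASK)


rel_file, block = decode_rdba(0x01C004D2)
print(rel_file, block)
```

Inside the database the same conversion is available through `DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE` and `DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK`.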
ORA-1410
Description: This error is raised when an operation refers to a ROWID in a table for which there is no such row. It commonly occurs when a ROWID obtained from an index, or by some other route, has no matching record in the table. The cause may be an inconsistency between the table and its index, or a problem in a partitioned table itself.
The reference to a ROWID may be implicit from a WHERE CURRENT OF clause or directly from a WHERE ROWID=… clause.
ORA-1410 indicates the ROWID is for a BLOCK that is not part of this table.
Corruption related to: Row

Understanding The ORA-1410
Summary Of Bugs Containing ORA 1410
OERR: ORA 1410 「invalid ROWID」
ORA-8103
Description: The object has been deleted by another user since the operation began. ORA-8103 can also be caused by several bugs; for example, before 10.2.0.4 a LOB bug could overwrite the segment header of another table and lead to ORA-8103. Diagnosis can start from the table's segment header and its data_object_id.
If the error is reproducible, the following may be the reasons:
a.) The header block has an invalid block type.
b.) The data_object_id (seg/obj) stored in the block is different from the data_object_id stored in the segment header. See dba_objects.data_object_id and compare it to the decimal value stored in the block (field seg/obj).
Corruption related to: Block

ORA-8103 Troubleshooting, Diagnostic and Solution
OERR: ORA-8103 「object no longer exists」 / Troubleshooting, Diagnostic and Solution
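The comparison in point b.) for ORA-8103 involves a hexadecimal value from a block dump and a decimal value from `dba_objects`, which is easy to get wrong by hand. A minimal sketch of the check follows, assuming the block dump contains a field written as `seg/obj: 0x...`; the sample dump line and the function names are hypothetical.

```python
import re

# Assumed block dump field layout: "seg/obj: 0x<hex object id>".
SEG_OBJ = re.compile(r"seg/obj:\s*0x([0-9a-fA-F]+)")


def block_object_id(dump_text):
    """Return the seg/obj value from a block dump as a decimal int, or None."""
    match = SEG_OBJ.search(dump_text)
    return int(match.group(1), 16) if match else None


def matches_dictionary(dump_text, data_object_id):
    """True when the block's seg/obj equals dba_objects.data_object_id."""
    return block_object_id(dump_text) == data_object_id


# Hypothetical dump fragment and dictionary value.
dump_line = "seg/obj: 0x12d7f  csc: 0x00.1a2b3c  itc: 2  flg: E  typ: 1 - DATA"
print(block_object_id(dump_line), matches_dictionary(dump_line, 77183))
```

A mismatch points at the block belonging to a different (possibly dropped or truncated) segment than the one being read.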
ORA-8102
Description: An ORA-08102 indicates that there is a mismatch between the key(s) stored in the index and the values stored in the table. What typically happens is the index is built and at some future time, some type of corruption occurs, either in the table or index, to cause the mismatch.
Corruption related to: Index

OERR ORA-8102 「index key not found, obj# %s, file %s, block %s (%s)」
ORA-1499
Description: An error occurred when validating an index or a table using the ANALYZE command; that is, cross-validation of a table against its index found a problem. One or more entries do not point to the appropriate cross-reference.
Corruption related to: Index

ORA-1499. Table/Index row count mismatch
OERR: ORA-1499 table/Index Cross Reference Failure – see trace file
ORA-1498
Description: Generally this is a result of an ANALYZE … VALIDATE … command.
This error generally manifests itself when there is inconsistency in the data/index block. Some of the block check errors that may be found:
a.) Row locked by a non-existent transaction
b.) The amount of space used is not equal to block size
c.) Transaction header lock count mismatch.
While support is processing the trace file it may be worth re-running the ANALYZE after restarting the database, to help show whether the corruption is consistent or whether it 'moves'.
Send the trace file to support for analysis.
If the ANALYZE was against an index you should check the whole object. E.g., find the table name and execute:
ANALYZE TABLE xxx VALIDATE STRUCTURE CASCADE;
Corruption related to: Block
OERR: ORA 1498 「block check failure – see trace file」
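When several tables need the cross-check, the ANALYZE statement quoted above can be generated rather than typed. The sketch below only produces the statement text; the owner and table names are hypothetical, and the identifier check is a deliberately strict assumption that accepts only simple unquoted names.

```python
import re

# Accept only simple unquoted identifiers to avoid building unsafe SQL.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def validate_structure_sql(owner, table):
    """Return an ANALYZE ... VALIDATE STRUCTURE CASCADE statement for one table."""
    for name in (owner, table):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"ANALYZE TABLE {owner}.{table} VALIDATE STRUCTURE CASCADE"


# Hypothetical tables whose indexes reported a block check failure.
for owner, table in [("SCOTT", "EMP"), ("SCOTT", "DEPT")]:
    print(validate_structure_sql(owner, table))
```

CASCADE makes the check cover the table together with its indexes, which is why the text recommends running it against the table rather than the index alone.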
ORA-26040
Description: Trying to access data in a block that was loaded without redo generation using the NOLOGGING/UNRECOVERABLE option. Because no redo was generated for the load and a recovery was later performed over it, the data block was filled with 0xFF, which leads to ORA-26040.
This error is always raised together with ORA-1578.
Corruption related to: Block

OERR ORA-26040 Data block was loaded using the NOLOGGING option
ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING – Error explanation and solution
ORA-1578 ORA-26040 in a LOB segment – Script to solve the errors
ORA-1578 ORA-26040 in 11g for DIRECT PATH with NOARCHIVELOG even if LOGGING is enabled
ORA-1578 ORA-26040 On Awr Table
Errors ORA-01578, ORA-26040 On Standby Database
Workflow Tables ORA-01578 ORACLE data block corrupted ORA-26040 Data block was loaded using the NOLOGGING option
ORA-1578, ORA-26040 Data block was loaded using the NOLOGGING option
ORA-600 [12700]
Description: Oracle is trying to access a row using its ROWID, which has been obtained from an index.
A mismatch was found between the index rowid and the data block it is pointing to. The rowid points to a non-existent row in the data block. The corruption can be in data and/or index blocks.
ORA-600 [12700] can also be reported due to a consistent read (CR) problem, which is generally the case.
Corruption related to: Consistent Read

Resolving an ORA-600 [12700] error in Oracle 8 and above.
ORA-600 [12700] 「Index entry Points to Missing ROWID」
ORA-600 [3020]
Description: This is called a 'STUCK RECOVERY'. There is an inconsistency between the information stored in the redo and the information stored in a database block being recovered.
Corruption related to: Redo
ORA-600 [3020] 「Stuck Recovery」
Information Required for Root Cause Analysis of ORA-600 [3020] (stuck recovery)
ORA-600 [4194]
Description: A mismatch has been detected between Redo records and rollback (Undo) records.
We are validating the Undo record number relating to the change being applied against the maximum undo record number recorded in the undo block.
This error is reported when the validation fails.
Corruption related to: Undo
ORA-600 [4194] 「Undo Record Number Mismatch While Adding Undo Record」
Basic Steps to be Followed While Solving ORA-00600 [4194]/[4193] Errors Without Using Unsupported parameter
ORA-600 [4193]
Description: A mismatch has been detected between Redo records and Rollback (Undo) records.
We are validating the Undo block sequence number in the undo block against the Redo block sequence number relating to the change being applied.
This error is reported when this validation fails.
Corruption related to: Undo
ORA-600 [4193] 「seq# mismatch while adding undo record」
Basic Steps to be Followed While Solving ORA-00600 [4194]/[4193] Errors Without Using Unsupported parameter
Ora-600 [4193] When Opening Or Shutting Down A Database
ORA-600 [4193] When Trying To Open The Database
ORA-600 [4137]
Description: While backing out an undo record (i.e. at the time of rollback) we found a transaction id mismatch indicating either a corruption in the rollback segment or corruption in an object which the rollback segment is trying to apply undo records on.
This would indicate a corrupted rollback segment.
Corruption related to: Undo/Redo
ORA-600 [4137] 「XID in Undo and Redo Does Not Match」
ORA-600 [6101]
Description: Not enough free space was found when inserting a row into an index leaf block during the application of undo.
Corruption related to: Index
ORA-600 [6101] 「insert into leaf block (undo)」
ORA-600 [2130]
Description: Oracle is attempting to read or update a generic entry in the control file.
If the entry number is invalid, ORA-600 [2130] is logged.
Corruption related to: Control File
ORA-600 [2130] 「Attempt to access non-existant controlfile entry」
ORA-600 [4512]
Description: Oracle is checking the status of transaction locks within a block.
If the lock number is greater than the number of lock entries, ORA-600 [4512] is reported followed by a stack trace, process state and block dump.
This error possibly indicates a block corruption.
Corruption related to: Block
ORA-600 [4512] 「Lock count mismatch」
ORA-600 [2662]
Description: A data block SCN is ahead of the current SCN. The usual workarounds include adjusting the SCN, but from 11.2 onward Oracle has disabled many of the SCN adjustment methods.
The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN stored in a UGA variable.
If the SCN is less than the dependent SCN then we signal the ORA-600 [2662] internal error.
Corruption related to: Block
ORA-600 [2662] 「Block SCN is ahead of Current SCN」
ORA 600 [2662] DURING STARTUP
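The SCN comparison behind ORA-600 [2662] can be illustrated by combining each SCN's wrap and base into a single number. This sketch rests on two assumptions that are not stated in the text: that an SCN is represented as a wrap plus a 32-bit base, and that the error's arguments are ordered as current wrap, current base, dependent wrap, dependent base. The function names and sample values are hypothetical.

```python
def scn_value(wrap, base):
    """Combine SCN wrap and 32-bit base into one comparable integer."""
    return (wrap << 32) | base


def scn_gap(current_wrap, current_base, dependent_wrap, dependent_base):
    """How far the dependent (block) SCN is ahead of the current SCN.

    A positive result is the condition the error reports; zero or a
    negative result means the current SCN is not behind.
    """
    return scn_value(dependent_wrap, dependent_base) - scn_value(
        current_wrap, current_base
    )


# Hypothetical argument values as they might appear in the error line.
gap = scn_gap(0, 1000, 0, 1500)
print(gap)
```

The size of the gap indicates how far the database SCN would have to advance before the block stops looking like it comes from the future.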
ORA-600 [4097]
Description: We are accessing a rollback segment header to see if a transaction has been committed.
However, the xid given is in the future of the transaction table.
This could be due to a rollback segment corruption issue OR you might be hitting the following known problem.
Corruption related to: Undo