FireBird Forum
FireBird Tutorials/Documents
[12] Diagnosing and Repairing a Corrupted FireBird/InterBase Database (Updated)
박지훈.임프 [cbuilder] 63991 reads    2005-09-02 05:30

Diagnosing and Repairing a Corrupted Database



Updated October 3, 2006 by Paul Beach

Many kinds of database corruption can be repaired with gfix (alice) and gbak (burp). In some rare cases, however, a database file can become corrupted beyond what these tools can fix. In such cases, more powerful measures may be needed to get the database working again. If you have tried and failed to repair a database, contact us and we will see how we can help.

The most common cause of file corruption is the database server losing power unexpectedly. If the power goes out while an application is writing to the database, the database file can be corrupted or left with incomplete data. In every case, database users and administrators should take all possible precautions to keep this from happening.

The InterBase server has two write modes (forced writes): synchronous and asynchronous. In InterBase versions before 6.0, the default write mode was synchronous.

gfix -write sync database.gdb

From InterBase 6.0 on, the default write mode changed to asynchronous.

gfix -write async database.gdb

Synchronous writes are also called "careful writes": when a transaction commits, InterBase writes the modified pages to disk, writing them back to the database in the correct order (as far as the database is concerned), which minimizes data loss. Careful writes work in every case, but without forced writes enabled they are only as 'careful' as the operating system's file cache. Forced writes have no effect on Windows 3.1 and Windows 95/98. On Unix and NT, enabling forced writes bypasses the operating system's file cache and sends pages directly to disk.

On Unix systems the standard is forced writes off, because forced writes carry a significant performance cost. On NT the standard is forced writes on, because the operating system is too flaky to be trusted with critical pages. In InterBase versions before 6.0 all I/O was synchronous, meaning that when a page was read or written, the thread waited until the operating system reported the operation complete. This is a reasonable way to achieve careful writes in most cases, and it guarantees writes in all cases. Asynchronous reads were introduced in InterBase 6.0: while a read is in progress, the InterBase server keeps doing work on behalf of the thread. Unix systems generally do not support asynchronous I/O, so this optimization is available only on NT. Careful writes are exactly what the name says. The forced-writes setting can be turned on or off by the user. The InterBase server chooses between synchronous and asynchronous I/O on its own, and it does not sacrifice reliability for speed.

Because of the performance benefit of letting the operating system synchronize its own file cache with the disk as needed, most users normally leave forced writes off. If you are using asynchronous writes, you should think carefully about your backup strategy and prepare for the worst.
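You can check the current write mode without changing anything: gstat -h prints the database header page, and its Attributes line includes "force write" when forced writes are on. The small parsing helper below is my own sketch, not part of the standard tools:

```shell
#!/bin/sh
# fw_mode: read `gstat -h` header text on stdin and print "sync" when the
# Attributes line mentions "force write", otherwise "async".
fw_mode() {
    if grep -qi "force write"; then
        echo "sync"
    else
        echo "async"
    fi
}
```

Run it as `gstat -h database.gdb | fw_mode` (assuming gstat is on the PATH).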

Repairing a Database


In InterBase versions before 6.0, Server Manager offered a few database repair functions, as IBConsole does now, but for repairing a corrupted database I recommend the command-line gfix utility, which has more options and more flexibility.

Most repairable database corruption can be fixed with gfix alone, or with gfix and gbak together.

1. Define the following two variables. This saves you from typing the user name and password every time you run a command.

SET ISC_USER=SYSDBA
SET ISC_PASSWORD=masterkey
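The two SET lines above are Windows command-prompt syntax. On a Unix shell the equivalent would be the following (SYSDBA/masterkey are InterBase's well-known defaults; substitute your own credentials):

```shell
# Unix sh/bash equivalent of the two SET commands above.
# SYSDBA/masterkey are the stock InterBase credentials; use your own.
export ISC_USER=SYSDBA
export ISC_PASSWORD=masterkey
```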

2. Always work on a copy of the database, never on the real one. Let the operating system copy the database; you must have exclusive access.
copy employee.gdb database.gdb

3. Now check whether the database is corrupted. This requires exclusive access, but since you are working on a copy of the original database, that will not be a problem.

gfix -v -full database.gdb

4. If the previous step reported problems with the database, mend it now.

gfix -mend -full -ignore database.gdb

Note:
I once saw a database with a very unusual problem. A table had disappeared, and trying to recreate it produced an error saying it already existed. On closer inspection, rdb$relations still held a record for the table, but the record was marked as damaged. In fact, the record had been committed with a bad back pointer, so the back version was not needed.

Presumably someone had already tried gfix -mend on that database, so the back pointer was found and the record was flagged as damaged. The database server was InterBase V5.6, which simply skips such damaged records. Firebird does the same.

The mend option of gfix can be dangerous because it can discard data, in some cases a great deal of data. Use it only for corruption that cannot be fixed by a backup and restore, and only when saving part of the data is better than losing all of it.

Do a backup and restore immediately after running gfix -mend, because gfix -mend does not leave the database in a state you can trust for multi-user work.

5. Now check whether the corruption has been repaired.

gfix -v -full database.gdb

6. If you still see error messages, it is time to try a full backup and restore. The simplest form of the backup command is:

gbak -backup -v -ignore database.gdb database.gbk

7. If gbak fails complaining about garbage collection, try the following:

gbak -backup -v -ignore -garbage database.gdb database.gbk

8. If limbo transactions have caused problems with record versions, add the -limbo switch:

gbak -backup -v -ignore -garbage -limbo database.gdb database.gbk

9. Now create a new database from the backup.

gbak -create -v database.gbk database_new.gdb

10. If problems occur during the restore, consider the following switches:

-inactive : If there are problems with indexes, this switch lets you restore the database. All indexes are left inactive, however, so you will have to reactivate them one by one by hand.

-one_at_a_time : This switch restores the database's tables one at a time, committing after each table. Use it when there are serious problems and you need to recover at least part of the data.
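Steps 2 through 9 above can be strung together into a small script. The sketch below is only an illustration: it assumes gfix and gbak are on the PATH and ISC_USER/ISC_PASSWORD are set, and the file names are placeholders of my own. In a real recovery you should still run each step by hand and read each tool's output.

```shell
#!/bin/sh
# repair_db: sketch of the repair cycle in steps 2-9 above:
# copy, validate, mend, re-validate, back up, restore.
repair_db() (
    set -e                                   # stop at the first failure
    db=${1:?usage: repair_db database.gdb}   # the (corrupted) source database
    work=work.gdb                            # working copy we operate on
    backup=work.gbk                          # intermediate backup file
    restored=restored.gdb                    # final, rebuilt database

    cp "$db" "$work"                         # step 2: never touch the original
    gfix -v -full "$work" || true            # step 3: validation may report errors
    gfix -mend -full -ignore "$work"         # step 4: mend what can be mended
    gfix -v -full "$work"                    # step 5: re-check
    gbak -backup -v -ignore -garbage -limbo "$work" "$backup"  # steps 6-8
    gbak -create -v "$backup" "$restored"    # step 9: rebuild from the backup
)
# Usage: repair_db database.gdb
```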

If the procedure above does not work, consider using QLI to move the data and table structures from the damaged database into a freshly created one.

1. Create an empty database.

2. Edit the following script (get_tables.sql) so that it points at the corrupted database.

connect database.gdb user 'sysdba' password 'masterkey';

select 'define relation tgt.', rdb$relation_name,
 ' based on relation src.',  rdb$relation_name, ';'
from rdb$relations where rdb$relation_name
not starting with 'RDB$';
commit;

select 'tgt.', rdb$relation_name, ' = src.', 
rdb$relation_name, ';'
from rdb$relations where rdb$relation_name
not starting with 'RDB$';


3. Edit the resulting file so that it looks something like this (save it as move.sql):

ready old.gdb as src;
ready new.gdb as tgt;

define relation tgt.COUNTRY
based on relation src.COUNTRY;
define relation tgt.JOB
based on relation src.JOB;
define relation tgt.DEPARTMENT
based on relation src.DEPARTMENT;
define relation tgt.EMPLOYEE
based on relation src.EMPLOYEE;
define relation tgt.PROJECT
based on relation src.PROJECT;
define relation tgt.PHONE_LIST
based on relation src.PHONE_LIST;
define relation tgt.EMPLOYEE_PROJECT
based on relation src.EMPLOYEE_PROJECT;
define relation tgt.CUSTOMER
based on relation src.CUSTOMER;
define relation tgt.SALES
based on relation src.SALES;
define relation tgt.PROJ_DEPT_BUDGET
based on relation src.PROJ_DEPT_BUDGET;
define relation tgt.SALARY_HISTORY
based on relation src.SALARY_HISTORY;

tgt.COUNTRY              = src.COUNTRY;
tgt.JOB                  = src.JOB;
tgt.DEPARTMENT           = src.DEPARTMENT;
tgt.EMPLOYEE             = src.EMPLOYEE;
tgt.PROJECT              = src.PROJECT;
tgt.PHONE_LIST           = src.PHONE_LIST;
tgt.EMPLOYEE_PROJECT     = src.EMPLOYEE_PROJECT;
tgt.CUSTOMER             = src.CUSTOMER;
tgt.SALES                = src.SALES;
tgt.PROJ_DEPT_BUDGET     = src.PROJ_DEPT_BUDGET;
tgt.SALARY_HISTORY       = src.SALARY_HISTORY;


4. Now install the matching version of QLI in the InterBase bin directory, start QLI, and run the move.sql script.

QLI>@move.sql

To understand exactly what gfix is doing, refer to the following excerpt from the source code, which documents it in detail. (The source excerpt below is quoted verbatim.)


/*
 *	PROGRAM:	JRD Access Method
 *	MODULE:		val.c
 *	DESCRIPTION:	Validation and garbage collection
 *
 * copyright (c) 1985, 1997 by Borland International
 * copyright (c) 1999 by Inprise Corporation
 */

#ifdef INTERNAL_DOCUMENTATION
Database Validation and Repair
==============================

Deej Bredenberg March 16, 1994
Updated: 1996-Dec-11 David Schnepper 

I. TERMINOLOGY 

The following terminology will be helpful to understand in 
this discussion:

record fragment: The smallest recognizable piece of a record; 
multiple fragments can be linked together to form a single 
version.
record version: A single version of a record representing an 
INSERT, UPDATE or DELETE by a particular transaction (note 
that deletion of a record causes a new version to be stored as a
deleted stub).
record chain: A linked list of record versions chained together 
to represent a single logical "record".
slot: The line number of the record on page.  
A variable-length array on each data page stores the offsets 
to the stored records on that page, and the slot is an index 
into that array.

II. COMMAND OPTIONS

Here are all the options for gfix which have to do with 
validation, and what they do:

gfix switch   dpb parameter      
-----------   -------------      

-validate    isc_dpb_verify  (gds__dpb_verify prior to 4.0)

Invoke validation and repair.  All other switches modify this 
switch.

-full        isc_dpb_records       

Visit all records.  Without this switch, only page structures 
will be validated, which does involve some limited checking of 
records.     

-mend        isc_dpb_repair     

Attempts to mend the database where it can to make it viable 
for reading; does not guarantee to retain data.

-no_update   isc_dpb_no_update  

Specifies that orphan pages not be released, and allocated 
pages not be marked in use when found to be free.  Actually 
a misleading switch name since -mend will update the database, 
but if -mend is not specified and -no_update is specified, 
then no updates will occur to the database.

-ignore      isc_dpb_ignore

Tells the engine to ignore checksums in fetching pages.  
Validate will report on the checksums, however.  Should 
probably not even be a switch, it should just always be in 
effect.  Otherwise checksums will disrupt the validation.  
Customers should be advised to always use it.
NOTE: Unix 4.0 (ODS 8.0) does not have on-page checksums, 
and all platforms under ODS 9.0 do not have checksums.

III.  OPERATION

Validation runs only with exclusive access to the database, 
to ensure that database structures are not modified during 
validation.  On attach, validate attempts to obtain an exclusive 
lock on the database.

If other attachments are already made locally or through the 
same multi-client server, validate gives up with the message:

"Lock timeout during wait transaction
-- Object "database_filename.gdb" is in use"

If other processes or servers are attached to the database, 
validate  waits for the exclusive lock on the database 
(i.e. waits for every other server to get out of the database).

NOTE: Ordinarily when processes gain exclusive access to 
the database, all active transactions are marked as dead 
on the Transaction Inventory Pages.  This feature is turned 
off for validation.

IV. PHASES OF VALIDATION

There are two phases to the validation, the first of which 
is a walk through the entire database (described below).  
During this phase, all pages visited are stored in a bitmap for 
later use during the garbage collection phase.  


A. Visiting Pages

During the walk-through phase, any page that is fetched 
goes through a basic validation:

1. Page Type Check

Each page is checked against its expected type.  If the wrong page
type is found in the page header, the message:

"Page xxx wrong type (expected xxx encountered xxx)"

is returned.  This could represent a) a problem with the database 
being overwritten, b) a bug with InterBase page allocation mechanisms 
in which one page was written over another, or c) a page which was 
allocated but never written to disk (most likely if the encountered
page type was 0).

The error does not tell you what page types are what, so here
they are for reference:

#define pag_undefined     0    /* purposely undefined */
#define pag_header        1    /* Database header page */
#define pag_pages         2    /* Page inventory page */
#define pag_transactions  3    /* Transaction inventory page */
#define pag_pointer       4    /* Pointer page */
#define pag_data          5    /* Data page */
#define pag_root          6    /* Index root page */
#define pag_index         7    /* Index (B-tree) page */
#define pag_blob          8    /* Blob data page */
#define pag_ids           9    /* Gen-ids */
#define pag_log           10   /* Write ahead log page: 4.0 only */

2. Checksum

If -ignore is specified, the checksum is specifically checked in
validate instead of in the engine.  If the checksum is found to 
be wrong, the error:

"Checksum error on page xxx"

is returned. This is harmless when found by validate, and the page
will still continue to be validated - if data structures can be 
validated on page, they will be.  If -mend is specified, the page 
will be marked for write, so that when the page is written to disk 
at the end of validation the checksum will automatically be 
recalculated.

Note: For 4.0 only Windows & NLM platforms keep page checksums.

3. Revisit

We check each page fetched against the page bitmap to make sure we
have not visited already.  If we have, the error:

"Page xxx doubly allocated"

is returned.  This should catch the case when a page of the same type 
is allocated for two different purposes.

Data pages are not checked with the Revisit mechanism - when walking
record chains and fragments they are frequently revisited.

B. Garbage Collection

During this phase, the Page Inventory (PIP) pages are checked against the
bitmap of pages visited.  Two types of errors can be detected during
this phase.

1. Orphan Pages

If any pages in the page inventory were not visited 
during validation, the following error will be returned:

"Page xxx is an orphan"

If -no_update was not specified, the page will be marked as free
on the PIP.

2. Improperly Freed Pages

If any pages marked free in the page inventory were in fact 
found to be in use during validation, the following error 
will be returned:

"Page xxx is use but marked free"  (sic)
    
If -no_update was not specified, the page will be marked in use
on the PIP.

NOTE:  If errors were found during the validation phase, no changes will
be made to the PIP pages.  This assumes that we did not have a chance to
visit all the pages because invalid structures were detected.

V. WALK-THROUGH PHASE

A. Page Fetching

In order to ensure that all pages are fetched during validation, the
following pages are fetched just for the most basic validation:

1. The header page (and for 4.0 any overflow header pages).
2. Log pages for after-image journalling (4.0 only).
3. Page Inventory pages.
4. Transaction Inventory pages

If the system relation RDB$PAGES could not be read or did not
contain any TIP pages, the message: 

"Transaction inventory pages lost"

will be returned.  If a particular page is missing from the 
sequence as established by RDB$PAGE_SEQUENCE, then the following
message will be returned:

"Transaction inventory page lost, sequence xxx"

If -mend is specified, then a new TIP will be allocated on disk and 
stored in RDB$PAGES in the proper sequence.  All transactions which 
would have been on that page are assumed committed.

If a TIP page does not point to the next one in sequence, the
following message will be returned:

"Transaction inventory pages confused, sequence xxx"

5. Generator pages as identified in RDB$PAGES.

B. Relation Walking

All the relations in the database are walked.  For each relation, all
indices defined on the relation are fetched, and all pointer and 
data pages associated with the relation are fetched (see below).

But first, the metadata is scanned from RDB$RELATIONS to fetch the
format of the relation.  If this information is missing or 
corrupted the relation cannot be walked.  
If any bugchecks are encountered from the scan, the following 
message is returned:

"bugcheck during scan of table xxx ()"

This will prevent any further validation of the relation.

NOTE: For views, the metadata is scanned but nothing further is done.

C. Index Walking

Prior to 5.0 Indices were walked before data pages.
In 5.0 Index walking was moved to after data page walking.
Please refer to the later section entitled "Index Walking".

D. Pointer Pages

All the pointer pages for the relation are walked.  As they are walked
all child data pages are walked (see below).  If a pointer page cannot 
be found, the following message is returned:

"Pointer page (sequence xxx) lost"

If the pointer page is not part of the relation we expected or
if it is not marked as being in the proper sequence, the following
message is returned:

"Pointer page xxx is inconsistent"

If each pointer page does not point to the next pointer page as
stored in the RDB$PAGE_SEQUENCE field in RDB$PAGES, the following 
error is returned:

"Pointer page (sequence xxx) inconsistent"

E. Data Pages

Each of the data pages referenced by the pointer page is fetched.
If any are found to be corrupt at the page level, and -mend is 
specified, the page is deleted from its pointer page.  This will 
cause a whole page of data to be lost.

The data page is corrupt at the page level if it is not marked as
part of the current relation, or if it is not marked as being in 
the proper sequence.  If either of these conditions occurs, the 
following error is returned:

"Data page xxx (sequence xxx) is confused"

F. Slot Validation

Each of the slots on the data page is looked at, up to the count
of records stored on page.  If the slot is non-zero, the record 
fragment at the specified offset is retrieved.  If the record
begins before the end of the slots array, or continues off the
end of the page, the following error is returned:

"Data page xxx (sequence xxx), line xxx is bad"

where "line" means the slot number.  

NOTE: If this condition is encountered, the data page is considered 
corrupt at the page level (and thus will be removed from its
pointer page if -mend is specified).

G. Record Validation

The record at each slot is looked at for basic validation, regardless
of whether -full is specified or not.  The fragment could be any of the 
following:

1.  Back Version

If the fragment is marked as a back version, then it is skipped.  
It will be fetched as part of its record.

2.  Corrupt

If the fragment is determined to be corrupt for any reason, and -mend 
is specified, then the record header is marked as damaged.

3.  Damaged

If the fragment is marked damaged already from a previous visit or
a previous validation, the following error is returned:

"Record xxx is marked as damaged"

where xxx is the record number.  

4.  Bad Transaction 

If the record is marked with a transaction id greater than the last 
transaction started in the database, the following error is returned:

"Record xxx has bad transaction xxx"

H. Record Walking

If -full is specified, and the fragment is the first fragment in a logical
record, then the record at this slot number is fully retrieved.  This
involves retrieving all versions, and all fragments of each 
particular version.  In other words, the entire logical record will 
be retrieved.

1. Back Versions

If there are any back versions, they are visited at this point.  
If the back version is on another page, the page is fetched but 
not validated since it will be walked separately.  

If the slot number of the back version is greater than the max
records on page, or there is no record stored at that slot number, 
or it is a blob record, or it is a record fragment, or the 
fragment itself is invalid, the following error 
message is returned:

"Chain for record xxx is broken"

2. Incomplete

If the record header is marked as incomplete, it means that there
are additional fragments to be fetched--the record was too large 
to be stored in one slot.
A pointer is stored in the record to the next fragment in the list.

For fragmented records, all fragments are fetched to form a full
record version.  If any of the fragments is not in a valid position,
or is not the correct length, the following error is returned:

"Fragmented record xxx is corrupt"      

Once the full record has been retrieved, the length of the format is
checked against the expected format stored in RDB$FORMATS (the 
format number is stored with the record, representing the exact 
format of the relation at the time the record was stored.)  
If the length of the reconstructed record does not match
the expected format length, the following error is returned:

"Record xxx is wrong length"

For delta records (record versions which represent updates to the record)
this check is not made.

I. Blob Walking 

If the slot on the data page points to a blob record, then the blob
is fetched (even without -full).  This has several cases, corresponding 
to the various blob levels.  

Level                      Action
-----   ----------------------------------------------------------------- 
0     These are just records on page, and no further validation is done.
1     All the pages pointed to by the blob record are fetched and
      validated in sequence.
2     All pages pointed to by the blob pointer pages are fetched and 
      validated.
3     The blob page is itself a blob pointer page; all its children
      are fetched and validated.

For each blob page found, some further validation is done.  If the
page does not point back to the lead page, the following error 
is returned:

"Warning: blob xxx appears inconsistent"

where xxx corresponds to the blob record number.  If any of the blob pages
are not marked in the sequence we expect them to be in, the following
error is returned:

"Blob xxx is corrupt"

Tip: the message for the same error in level 2 or 3 blobs is slightly
different:

"Blob xxx corrupt"

If we have lost any of the blob pages in the sequence, the following error
is returned:

"Blob xxx is truncated"

If the fetched blob is determined to be corrupt for any of the above
reasons, and -mend is specified, then the blob record is marked as
damaged.

J. Index Walking

In 5.0 Index walking was moved to after the completion
of data page walking.

The indices for the relation are walked.  If the index root page
is missing, the following message is returned:

"Missing index root page"

and the indices are not walked.  Otherwise the index root page
is fetched and all indices on the page fetched. 
For each index, the btree pages are fetched from top-down, left to
right.  
Basic validation is made on non-leaf pages to ensure that each node
on page points to another index page.  If -full validation is specified
then the lower level page is fetched to ensure its starting index
entry is consistent with the parent entry. 
On leaf pages, the records pointed to by the index pages are not 
fetched, the keys are looked at to ensure they are in correct 
ascending order.

If a visited page is not part of the specified relation and index,
the following error is returned:

"Index xxx is corrupt at page xxx"

If there are orphan child pages, i.e. a child page does not have its entry 
as yet in the parent page, however the child's left sibling page has its 
btr_sibling updated, the following error is returned:

"Index xxx has orphan child page at page xxx"

If the page does not contain the number of nodes we would have
expected from its marked length, the following error is returned:

"Index xxx is corrupt on page xxx"

While we are walking leaf pages, we keep a bitmap of all record
numbers seen in the index.  At the conclusion of the index walk
we compare this bitmap to the bitmap of all records in the 
relation (calculated during data page/Record Validation phase).
If the bitmaps are not equal then we have a corrupt index
and the following error is reported:

"Index %d is corrupt (missing entries)"

We do NOT check that each version of each record has a valid
index entry - nor do we check that the stored key for each item
in the index corresponds to a version of the specified record.

K. Relation Checking

We count the number of backversions seen while walking pointer pages,
and separately count the number of backversions seen while walking
record chains.  If these numbers do not match it indicates either
"orphan" backversion chains or double-linked chains.  If this is
seen, the following error is returned:

"Relation has %ld orphan backversions (%ld in use)"

Currently we do not try to correct this condition, merely report
it.  For "orphan" backversions the space can be reclaimed by
a backup/restore.  For double-linked chains a SWEEP should
remove all the backversions.

VI. ADDITIONAL NOTES

A.  Damaged Records

If any corruption of a record fragment is seen during validation, the 
record header is marked as "damaged".  As far as I can see, this has no 
effect on the engine per se.  Records marked as damaged will still be 
retrieved by the engine itself.  There is some question in my mind as
to whether this record should be retrieved at all during a gbak.

If a damaged record is visited, the following error message will appear:

"Record xxx is marked as damaged"

Note that when a damaged record is first detected, this message is not
actually printed.  The record is simply marked as damaged.  It is only 
thereafter when the record is visited that this message will appear.
So I would postulate that unless a full validation is done at some point,
you would not see this error message; once the full validation is done, 
the message will be returned even if you do not specify -full.

B. Damaged Blobs

Blob records marked as damaged cannot be opened and will not be deleted 
from disk.  This means that even during backup the blob structures marked 
as damaged will not be fetched and backed up.  (Why this is done
differently for blobs than for records I cannot say.  
Perhaps it was viewed as too difficult to try to retrieve a damaged blob.)

#endif /* INTERNAL_DOCUMENTATION */


This document was revised by Paul Beach in September 2000 and is copyright Paul Beach and IBPhoenix. You may republish it verbatim, including this notice. You may alter, correct, and extend it, provided you acknowledge that the original was written by Paul Beach of IBPhoenix.

Original: http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_db_corr
박지훈.임프 [cbuilder]   2005-09-02 05:50
This is the document I consulted about a year ago when the forum's own database got corrupted.
As I recall, the recovery succeeded somewhere in the middle of the steps above, without having to go all the way to the end. (Remarkable!)

At the time a few people asked which document I had used, but I was working in such a rush during the recovery that I lost the address. I recently came across it again and, glad to have found it, translated and posted it. (It has even been revised in the meantime.)
박지훈.임프 [cbuilder]   2008-10-09 18:04
I have just brought this in line with the updated original. The text I first translated was written in 2000; this is a fresh translation of the version updated in October 2006.
