I need to know if there is any method for intentionally corrupting an index in PostgreSQL.
I have been searching for it for a long time but never seen anyone mentioning such an act.
I need to know if there is any method for intentionally corrupting an index in PostgreSQL.
I have been searching for it for a long time but never seen anyone mentioning such an act.
I think @Craig's comments are more important, addressing the intentions behind your question. In any case, to answer the question asked:
One simple way (among many others) would be to fake an IMMUTABLE function and base an index on it like outlined in this answer from yesterday:
CREATE OR REPLACE FUNCTION f_fake_immutable_ts()
RETURNS timestamp LANGUAGE sql COST 1 IMMUTABLE AS
$$SELECT timestamp '2015-07-25 01:00'$$;
Create a partial index based on it:
CREATE INDEX tbl_created_at_idx ON tbl (tbl_id)
WHERE created_at > f_fake_immutable_ts();
The index relies on the function to always return the same output for the same input. Changing the function (to return a different timestamp), renders the index corrupt.
There is no way to identify this index as corrupt except by testing each entry.
Erwin describes a method of creating an index that violates PostgreSQL's assumptions about the index, producing results that are likely incorrect and do not match the underlying table.
That's one kind of corruption.
Another kind of corruption is where an index has blocks that are simply invalid - zeroed out, replaced with random values, etc. Most likely reading such an index will result in an ERROR if/when that block is accessed. You can create such an index by dding from /dev/zero or /dev/urandom into some offset into the index.
Another kind of corruption is a simple bit-flip. That's where you pick a random (or targeted) byte from the index relation and change it from a 0 to a 1 or vice versa. This can happen in reality if memory errors (including gamma radiation, heat problems, etc) cause an index page to be corrupted in-memory before it's flushed to disk. It's trivial to write a program in C, Python, or whatever to do this.
A further kind of corruption is where the index is structurally valid, but outdated. It doesn't reflect the current table contents because for some reason some updates to the index didn't get written. You can create this situation by setting indisvalid to false, doing some inserts/updates/deletes on the indexed relation, then setting indisvalid = true again. Since PostgreSQL ignores an index with indisvalid = f, assuming it's in the process of being built, it won't update it, so you'll have outdated index contents.
Yet another kind of corruption involves b-tree structure problems, like missing leaves, circular references, etc. The best way to create these kinds of corruptions is to modify PostgreSQL's source code to make it do them on purpose. It'll be difficult to do them using just the index relation.
Another kind of index corruption is truncation. The index looks fine, but the end is missing.
There's the case where the index relation is simply missing. The catalog entries exist, but the backing file doesn't.
The index relation can exist, but be full of random garbage.
The list goes on. Almost by definition, corruption can be anything that isn't the normal state of affairs.
This is one of the reasons why "fuzzing" is so interesting. Because it's very, very hard for programmers to anticipate every possible state and guard against every possible problem, so problems can often be found by making random changes and seeing what breaks.
Erwin's answer does a good job of answering the question as originally stated, however you added the comment:
I want to check proactively if there is any corrupted index in the database, using the below query SELECT index_name, status from user_indexes where status='INVALID'
Assuming you are talking about the indisvalid column in pg_index, or some status field derived therefrom, you can create an invalid (or intentionally corrupt, if you prefer) index like so:
CREATE TABLE foo (a int);
INSERT INTO foo (a) VALUES (1),(1);
CREATE UNIQUE INDEX CONCURRENTLY ON foo(a);
Here, Postgres knows that the index is bad and marks it as such in pg_index. There are countless other ways to create a corrupt index that Postgres won't immediately notice: Erwin's answer, mismatched glibc or other collation behavior between a primary and a standby, writing arbitrary bytes to the files behind those indexes, performing bogus updates to Postgres' catalog tables, and on and on.