Composite Primary Key column order

Question

How would you do the ordering of the PK columns in this case (Oracle 11gR2 Standard Edition):

I have a database where I offer the same application to many users. I want to be perfectly sure that all tenants have separated data and that under no circumstances leaving out a WHERE (by user or persistence framework mistake) can show customer data to another customer. Also, I want to have as few maintenance as possible, so I do not want to have the same schema over and over again (because then, in order to make perfectly sure that all the environments are the same, one has neverending schema compare marathons).

My solution:

1 data schema with column "tenant_id" in every table and every view.
For every customer a view VTxxx_tablename or VTxxx_viewname in the form "SELECT * FROM underlyingobject WHERE tenant_id = x WITH CHECK OPTION" with Select/Insert/Update/Delete-grants.
n customer schemas with no data.
For every of the n tenants synonyms to their granted views in their no-data schemas.
All of the above generated (and re-generated) by script.

Result:

All tables have a PK of (tenant_id, id) (or: (id, tenant_id), this is the question)
All FKs obviously include the tenant_id column, therefore no mixing of customer data possible.

The data:

"normal" OLTP-application (everything normalized, many JOINs in the SELECTs)
tenant_id is currently from [1,2,3], but might eventually get into the range 1..100.
ID is filled by one sequence per table.

The access:

All access is though the views, therefore every SELECT has an implicit WHERE tenant_id = x
Many SELECTs have a WHERE clause with full PK (implicit tenant_id through the used VIEW plus extra id like in WHERE tenant_id = x AND id = y)

Right now I decided for PK-column order (tenant_id, id).

Pro:

when ID column not present, INDEX SKIP SCAN or INDEX RANGE SCAN is possible
As no Oracle EE is available, this is my "poor mans partitioning" (downside below)

Con:

assuming the the application is used the same amount by all tenants, the PK has to be reorganized pretty much / for every insert, as low tenant_id rows are added as well.

What would you do in this case and why:

PK (tenant_id, id)
PK (id, tenant_id)
own schema (with own tablespace) per tenant nevertheless (more admin work (schema compare), better speed, simpler SQLs)

I'm looking for Oracle related physical differences only (like SCANs, Index Update costs or more general read costs/bottlenecks against write costs/bottlenecks)

Thank you, Blama

score 3 · Accepted Answer · answered Aug 11 '14 at 13:04

Store the tenant_id first. When you do this you can enable index key compression.

See http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm#i1106790 for the syntax and http://docs.oracle.com/cd/B28359_01/server.111/b28318/schema.htm#i14618 for the concepts.

In your case, you can do it like this:

create unique index mytable_idx on mytable(tenant_id,id) compress 1;

alter table mytable add constraint mytable_pk primary key(tenant_id, id);

score 1 · Answer 2 · answered Aug 11 '14 at 11:58

One thing you need to keep in mind is what you are using PK for. At the logical level (sometimes called the conceptual level), the primary key is simply one of the candidate keys, chosen somewhat arbitrarily. The purpose is to guarantee that each row is unique, that each row has an identifier, and that no part of an identifier is left out (NULL). For these purposes, it makes no difference at all which choice you make.

At the physical level, the other purpose is to provide rapid access to your rows based on providing key data. If you do a lot of joins, this is where performance counts the most. And if your database is normalized, you will probably do a lot of joins. The important thing here is not the PK constraint, but the index the DBMS builds behind the scenes in order to manage the PK constraint.

In your case, this will be a composite index, with two fields. For some queries, you may be doing joins or lookups, based on partial information, and expecting a list of rows in response. Do you do a lot of joins and lookups where you know the tenant_id but don't know the id? Then put the tenant_id first in your PK definition.

That sounds like your case, based on what you said about views and tenant_id.

For Oracle, the automatic index will be a sorted index, and the joins with your views will run fairly fast.