I have a database of members of an organization.
The data is really messy - the leader had a catastrophic medical issue, so I'm trying to make a sensible database out of Claris Filemaker Pro.
I'm trying to look for duplicates. There are some - apparently on purpose. For example, a couple could join 'once' but get two badges, so in the database we have one member ID (so no unique on it) and two different names. Not only that, libraries are in the database and several of them have the same email address, so no joy there.
I was trying to make a unique compound index of last_name, first_name, email_address, and member_id, but I found it allows duplicates.
I know NULL's aren't in indexes, but I was surprised that a compound index, if there is a single NULL in it, apparently allows duplicates of the non-null fields. The MySQL docs say:
A UNIQUE index permits multiple NULL values for columns that can contain NULL.
Sure, but what about the other 3 fields?
I didn't see a mention in the MariaDB docs.
Database:
CREATE TABLE `test` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`last_name` VARCHAR(46) NULL DEFAULT NULL COLLATE 'utf8mb3_general_ci',
`first_name` VARCHAR(46) NULL DEFAULT NULL COLLATE 'utf8mb3_general_ci',
`email_address` VARCHAR(50) NULL DEFAULT NULL COLLATE 'utf8mb3_general_ci',
`member_id` VARCHAR(40) NULL DEFAULT NULL COLLATE 'utf8mb3_general_ci',
UNIQUE INDEX `index_dupecheck` (`last_name`, `first_name`, `email_address`, `member_id`) USING BTREE,
INDEX `index_id` (`id`) USING BTREE
)
COMMENT='Test version'
COLLATE='utf8mb3_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=3
;
test
| id | last_name | first_name | email_address | member_id |
|---|---|---|---|---|
| 1 | Bob | Johnson | bob@example.com | \N |
| 2 | Bob | Smith | bsmith@example.com | 1 |
| 3 | Bob | Johnson | bob@example.com | \N |
I want to ensure that row 3 could not be inserted if row 1 is already there.
It sounds like this is due to NVL comparison to NVL being 'not equal'? (Postgres, but this sounds like the problem How can this unique index allow duplicate rows? )
except that syntax doesn't seem to work with MariaDB.
Is there another, better way of doing it? I could probably do the checks programmatically, but whoever put this data in, will probably put it in again, so I'd like to have a server side restriction for obvious reasons.
I'm also thinking of just saying member-ID has to be unique, and generate some way to link 'family' accounts together, or a 'smart number' like 123-A, 123-B for Alice and Bob (so no one's #1)
Thank you