7

I am creating a table wich will contain user-provided URLs. I want those to be unique, so when the user gives me a URL I will first check if the URL exists and if so return the ID for the entry. If not create a new row with this URL.

Obviously I want this to be fast. What is the best option?

  • Make the actual URL a varchar that is UNIQUE and look by this url?
  • Make a hash of the URL and use it as a primary key of sort?
  • Other ideas?
Derek Downey
  • 23,568
  • 11
  • 79
  • 104
nute
  • 313
  • 3
  • 10

2 Answers2

7

I would definitely go with a hash of the url and make the hash a unique index. A hash has a fixed length, so you can use CHAR to specify the length of the column, which grants a slight performance boost over VARCHAR or TEXT.

But might I suggest using INSERT IGNORE instead of making two calls to the database? Something like:

INSERT IGNORE INTO urlTable VALUES ('urlHash');

This has the benefit of ignoring any duplicate errors that might arise from attempting to insert a duplicate hash, without first having to do a SELECT COUNT(*) query.

Derek Downey
  • 23,568
  • 11
  • 79
  • 104
2

Unless I'm missing something, you should just create a UNIQUE index of the type HASH. I don't see what adding your own hash and triggers would add? And have the field itself NOT NULL.

CREATE  TABLE `test`.`bla` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `text` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `text_UNIQUE` USING HASH (`text`)
);
Jannes
  • 326
  • 1
  • 8