4

So I want to save different URIs in database.

I want format to be forced as well to look up different URIs easily.

Is it better to create table like this:

CREATE TABLE uris (
  id INT(11) UNIQUE PRIMARY KEY AUTO_INCREMENT,
  scheme VARCHAR(255) NOT NULL,
  host VARCHAR(255) NOT NULL,
  path VARCHAR(255) NOT NULL,
  port SMALLINT UNSIGNED,
  query VARCHAR(255),
  fragment VARCHAR(255)
);

Or just

CREATE TABLE uris (
    id INT(11) UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    uri TEXT NOT NULL
);

UPDATE: Use

Read, store maybe do some analytics. Build back original URI and if it's web URL redirect. May do some searching that's why I think first is better.

I am more interested in what advantage does one have over another or are there any problems in using one of them ?

nikartix
  • 259

3 Answers3

4

Seriously, are you ever going to do a search for scheme, path, port, query or fragment? Probably not. About the only thing you might want to search is host.

Now, if your URIs mostly have the same scheme (http), you can build an index on the free-form text column and search it using the LIKE operator with a prefix match, if you want to do host search. If there are few different schemes, just search for "http://foo%", "ftp://foo%" etc.

So, chances are the only search you will ever need can be implemented by storing the URI as free-form text. Thus, I would recommend not trying to be clever and simply storing the URI as free-form text.

Also, how many URIs are you going to store? If it's in the thousands and not millions, you should consider that premature optimization is the root of all evil. Thus, building indices you will never need is not going to help you in any way. This also favors storing the URI as free-form text.

Furthermore, there is the minor possibility that the format of URIs is going to change in the future. This means you need to update your application to accept the new format. By storing URI as free-form text, you don't need to do that.

juhist
  • 2,579
  • 12
  • 14
2

Uris have a well defined format, but even so I think you will be hard pressed to represent it as a table schema.

For example, your example has a number of problems

  1. you will run out of ints
  2. query string can be longer than 255 characters
  3. sometimes the port is not stated etc
Ewan
  • 83,178
1

I don't see any reason that you can't just do both here:

CREATE TABLE uris (
  id INT(11) UNIQUE PRIMARY KEY AUTO_INCREMENT,
  uri TEXT NOT NULL,
  scheme VARCHAR(255) NOT NULL,
  host VARCHAR(255) NOT NULL,
  path VARCHAR(255) NOT NULL,
  port SMALLINT UNSIGNED,
  query VARCHAR(255),
  fragment VARCHAR(255)
);

That way you can quickly search against that, or run analytics over the individual parts. Best of both worlds. The only argument I can think of to not do this is because of redundant data, but it's only a short string that's getting repeated and I think the benefit and simplicity would outweigh a string that's duplicated.

neilsimp1
  • 1,078