Hate to be captain obvious on this one, but Instagram generously provides a function that you linked to that stores the keys as bigint.
CREATE SCHEMA insta5;
CREATE SEQUENCE insta5.table_id_seq;
CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
shard_id int := 5;
BEGIN
-- The %1024, is just a way of saying they only want 10bit wraparound.
SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
END;
$$ LANGUAGE plpgsql;
They're actually using PostgreSQL. From that function you can see that they're returning a bigint. So certainly you can store the result of that function in bigint. As a special note, this is likely not the function that they're using. That function probably has a signature more like this,
insta5.next_id(smallint shard, OUT result bigint);
We know this because hardcoding a shard of 5 is not all that useful, and they seem to indicate that they're using this functionality. So in that blog id they brag that their ID compromises of
- 64bits total
- 64-23 = 41 bits for timestamp
- 64-41 = 23 bits for shard + sequence id
- 10 bits for the sequence id.
- 13 bits for the shard.
Quick test on their code,
test=# SELECT insta5.next_id();
next_id
---------------------
1671372309237077023
(1 row)
Decomposing the ID
Now let's play. For teh extra sexy, we can create helper functions that get the internal components from the ID. In the event you want to know the shard Instagram is using or their internal timestamp.
-- 13 bits for shard
CREATE FUNCTION insta5.get_shard(id bigint)
RETURNS smallint
AS $$
SELECT ((id<<41)>>51)::smallint;
$$ LANGUAGE sql;
-- 10 bits for sequence id
CREATE FUNCTION insta5.get_sequence(id bigint)
RETURNS smallint
AS $$
SELECT ((id<<54)>>54)::smallint;
$$ LANGUAGE sql;
-- 41 bits for timestamp
CREATE OR REPLACE FUNCTION insta5.get_ts(id bigint)
RETURNS timestamp without time zone
AS $$
SELECT to_timestamp(((id >> 23) + 1314220021721 ) / 1000 )::timestamp without time zone;
$$ LANGUAGE sql;
Playing around, let's get a test id.
SELECT insta5.next_id();
next_id
---------------------
1671390786412876801
(1 row)
SELECT
insta5id,
insta5.get_ts(insta5id),
insta5.get_shard(insta5id),
insta5.get_sequence(insta5id)
FROM (VALUES
(1671390786412876801::bigint),
(insta5.next_id())
) AS t(insta5id);
Returns the following,
insta5id | get_ts | get_shard | get_sequence
---------------------+---------------------+-----------+--------------
1671390786412876801 | 2017-12-16 17:02:09 | 5 | 1
1671392537048257538 | 2017-12-16 17:05:38 | 5 | 2
(2 rows)
Rolling our own Instagram ID Domain
You can even create an explicit DOMAIN over the type if you wish to really clean this up.. This is how I would personally store this, note I made a few further modifications.
- I added
COMMENTS -- always good practice.
- Made the functions
IMMUTABLE
- Added
insta5.next_id require an explicit shard.
Let's drop what we had,
DROP SCHEMA insta5 CASCADE;
And start over,
CREATE SCHEMA insta5;
COMMENT ON SCHEMA insta5 IS 'Instagram';
CREATE DOMAIN insta5.id AS bigint;
COMMENT ON DOMAIN insta5.id IS $$Instagram's internal ID type, based on example from "Sharding & IDs at Instagram"$$;
CREATE SEQUENCE insta5.table_id_seq;
CREATE OR REPLACE FUNCTION insta5.next_id(shard_id smallint)
RETURNS insta5.id
AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
result insta5.id;
now_millis bigint;
BEGIN
SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
RETURN result;
END;
$$ LANGUAGE plpgsql;
COMMENT ON FUNCTION insta5.next_id(smallint)
IS 'Modifications made to require shard id';
CREATE OR REPLACE FUNCTION insta5.get_shard(id insta5.id)
RETURNS smallint
AS $$
SELECT ((id<<41)>>51)::smallint;
$$ LANGUAGE sql
IMMUTABLE;
COMMENT ON FUNCTION insta5.get_shard(insta5.id)
IS '13 bits from insta5.id representing shard';
CREATE OR REPLACE FUNCTION insta5.get_sequence(id insta5.id)
RETURNS smallint
AS $$
SELECT ((id<<54)>>54)::smallint;
$$ LANGUAGE sql
IMMUTABLE;
COMMENT ON FUNCTION insta5.get_sequence(insta5.id)
IS '10 bits from insta5.id representing sequence';
CREATE OR REPLACE FUNCTION insta5.get_ts(id insta5.id)
RETURNS timestamp without time zone
AS $$
SELECT to_timestamp(((id >> 23) + 1314220021721 ) / 1000 )::timestamp without time zone;
$$ LANGUAGE sql
IMMUTABLE;
COMMENT ON FUNCTION insta5.get_ts(insta5.id)
IS '41 bits from insta5.id representing timestamp';
Everything works as before, but now you can
CREATE SCHEMA mySchema;
CREATE TABLE mySchema.mydata ( insta5id insta5.id ) ;
This is likely the best solution you can obtain shy of a C implementation, and you probably don't want to generate an insta5id ever. That's their job. ;)
As another important aside, you likely never want to do this. Don't follow by example. This is what the uuid type is for, and you should be using it rather than hand rolling your own. Specifically, this is eerily similar to uuid_generate_v1() in uuid-ossp, which stores a MAC (shard), and timestamp
This function generates a version 1 UUID. This involves the MAC address of the computer and a time stamp. Note that UUIDs of this kind reveal the identity of the computer that created the identifier and the time at which it did so, which might make it unsuitable for certain security-sensitive applications.