
We are restructuring our entire database / filesystem / user identity system. As a first step, we have determined that we need to assign each user/employee in an organization a unique ID. Surprisingly enough, there do not seem to be any theoretical resources on such a problem.

I wonder whether there are any recommendations for designing such a system. I have studied some ID schemes, but none of them seem practical in this case. In particular, UID- or ISBN-like systems are impractical because the codes are too long for people to remember or communicate. I have looked at the history of the CODEN system for assigning journal IDs and it is very inspiring, but I would prefer to avoid the problems they historically went through (changing the system twice along the way).

Desired properties of the system

In my case, I have around 10,000 people. The system should (probably) have these properties:

  • Uniqueness of IDs
  • IDs should be easy to communicate and remember (i.e. not too long, etc.)
  • Optionally, the system should catch common mistakes in an ID, if not correct them.
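A quick way to put a number on "not too long" is to compute the shortest length whose alphabet can cover the population. This is only an illustrative sketch: the function name min_id_length and the 10x headroom factor (room for future hires and for keeping random IDs sparse) are assumptions, not part of the question.

```python
import math


def min_id_length(population: int, alphabet_size: int, headroom: float = 10.0) -> int:
    """Return the smallest L such that alphabet_size ** L >= population * headroom.

    `headroom` is an assumed safety factor (not from the question) that leaves
    room for future hires and keeps randomly drawn IDs sparse.
    """
    needed = math.ceil(population * headroom)
    length = 1
    while alphabet_size ** length < needed:
        length += 1
    return length


# Compare plain digits, plain letters and a ~30-symbol mixed alphabet.
for size in (10, 26, 30):
    print(size, min_id_length(10_000, size))
```

With the default headroom, 10,000 people need 5 digits, or 4 characters from a 26- or 30-symbol alphabet.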

I have considered

I considered including some sort of initials from the names, and a brief analysis shows the following:

  • By using initials (1 character from the given name, 1 from the surname), I split the people into groups, the largest of which has 165 members (J.K.).
  • By taking the first 1,2 characters (1 from the given name, 2 from the surname), I get a largest group of 49 members.
  • If I take the first 1,3 characters, I get a largest group of 18, which is better than taking the first 2,2 characters, where I get 39 people in the largest group.
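The grouping analysis above is easy to reproduce and rerun for other prefix lengths. This is a hypothetical sketch: largest_group and the sample names are made up, and real data would also need a decision on diacritics and multi-part surnames.

```python
from collections import Counter


def largest_group(people, given_len, surname_len):
    """Return (key, size) for the biggest group of people sharing a name-prefix key.

    people: iterable of (given_name, surname) pairs.
    The key is the first `given_len` letters of the given name followed by the
    first `surname_len` letters of the surname, uppercased.
    """
    keys = Counter(
        (given[:given_len] + surname[:surname_len]).upper()
        for given, surname in people
    )
    return keys.most_common(1)[0]


# Hypothetical sample data, not the real 10,000-person list.
people = [("John", "King"), ("Jane", "Kim"), ("Jack", "Kowalski"),
          ("Anna", "Smith"), ("Adam", "Smart")]

for scheme in [(1, 1), (1, 2), (1, 3), (2, 2)]:
    key, size = largest_group(people, *scheme)
    print(scheme, key, size)
```

Whatever the largest group size turns out to be, an initials-based scheme still needs a serial number or similar suffix to separate the people inside that group.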

I am also considering adding a check character, as in CODEN, which would ideally not only catch mistakes but also make automatic correction possible in most cases.
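One existing scheme for a check character over letters as well as digits is the Luhn mod N algorithm, a generalisation of the Luhn check digit (a swapped-in technique, not CODEN's own rule). It detects every single-character substitution and most adjacent transpositions. A single check character cannot say which character is wrong, so automatic correction here is a hypothetical extra step: look up issued IDs that are one edit away from what was typed, and accept the result only when exactly one matches. The 36-symbol alphabet and all function names are assumptions.

```python
from __future__ import annotations

# Assumed alphabet; each character's position is its code point.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
N = len(ALPHABET)


def _luhn_sum(code: str, first_factor: int) -> int:
    """Luhn mod N weighted sum, walking right to left with factors alternating."""
    factor, total = first_factor, 0
    for ch in reversed(code):
        addend = factor * ALPHABET.index(ch)
        total += addend // N + addend % N  # "sum of digits" of addend in base N
        factor = 1 if factor == 2 else 2
    return total


def check_char(body: str) -> str:
    """Check character to append to `body` (rightmost body character gets factor 2)."""
    return ALPHABET[(N - _luhn_sum(body.upper(), 2) % N) % N]


def is_valid(code: str) -> bool:
    """True if the last character of `code` is a correct check character."""
    code = code.upper()
    if not code or any(ch not in ALPHABET for ch in code):
        return False
    return _luhn_sum(code, 1) % N == 0


def candidate_corrections(typed: str, issued: set[str]) -> list[str]:
    """Issued IDs reachable from `typed` by one substitution or one adjacent swap."""
    typed = typed.upper()
    if typed in issued:
        return [typed]
    found = set()
    for i in range(len(typed)):
        for ch in ALPHABET:
            candidate = typed[:i] + ch + typed[i + 1:]
            if candidate in issued:
                found.add(candidate)
        if i + 1 < len(typed):
            swapped = typed[:i] + typed[i + 1] + typed[i] + typed[i + 2:]
            if swapped in issued:
                found.add(swapped)
    return sorted(found)
```

For example, check_char("JK12") gives "7", so the full ID would be JK127; typing JK217 fails is_valid, and candidate_corrections("JK217", {"JK127"}) suggests JK127.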

I also had a look at Plus Codes, which has the great idea of NOT using some characters (like 0, I, etc.) that can easily be mistaken for others. But this would conflict with the intention to include initials of some sort.
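One way to ease that conflict, sketched here under assumptions, is to keep initials in the displayed ID but never issue two IDs that become identical once look-alike characters are folded together; typed input is then matched on the folded form. The fold table is hypothetical, loosely modelled on Crockford's Base32 decoding (which reads I and L as 1 and O as 0); lookup_key, issue and resolve are made-up names.

```python
from __future__ import annotations

# Hypothetical fold table: look-alike characters collapse onto one canonical symbol.
FOLD = str.maketrans({"O": "0", "I": "1", "L": "1"})


def lookup_key(code: str) -> str:
    """Canonical matching form: uppercase, separators dropped, look-alikes folded."""
    cleaned = "".join(ch for ch in code.upper() if ch.isalnum())
    return cleaned.translate(FOLD)


def issue(code: str, issued: dict[str, str]) -> bool:
    """Register `code` unless it is confusable with an already issued ID."""
    key = lookup_key(code)
    if key in issued:
        return False
    issued[key] = code
    return True


def resolve(typed: str, issued: dict[str, str]) -> str | None:
    """Return the issued ID matching what was typed, ignoring look-alike mix-ups."""
    return issued.get(lookup_key(typed))
```

With this, JO-104 can be issued, a later J0104 is refused as too similar, and someone typing "jo 104" still resolves to JO-104.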

Regarding catching errors, I have found an article about check digits, which also suggests the Damm algorithm; however, it only covers the case where the codes are purely numeric. I might be able to construct a similar system for letters, though.
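For numeric codes, the Damm algorithm is short to implement: it chains each digit through a 10x10 quasigroup table, and a code is valid when the final interim value is 0. It detects all single-digit errors and all adjacent transpositions. The table below is the order-10 table commonly published with the algorithm (for example on Wikipedia); a letter alphabet would need a different table of matching size, which this sketch does not attempt.

```python
# Order-10 Damm table as commonly published; row = interim value, column = next digit.
DAMM_TABLE = (
    (0, 3, 1, 7, 5, 9, 8, 6, 4, 2),
    (7, 0, 9, 2, 1, 5, 4, 8, 6, 3),
    (4, 2, 0, 6, 8, 7, 1, 3, 5, 9),
    (1, 7, 5, 0, 9, 8, 3, 4, 2, 6),
    (6, 1, 2, 3, 0, 4, 5, 9, 7, 8),
    (3, 6, 7, 4, 2, 0, 9, 5, 8, 1),
    (5, 8, 6, 9, 7, 2, 0, 1, 3, 4),
    (8, 9, 4, 5, 3, 6, 2, 0, 1, 7),
    (9, 4, 3, 8, 6, 1, 7, 2, 0, 5),
    (2, 5, 8, 1, 4, 3, 6, 7, 9, 0),
)


def damm_interim(digits: str) -> int:
    """Run the digits through the table, starting from interim value 0."""
    interim = 0
    for d in digits:
        interim = DAMM_TABLE[interim][int(d)]
    return interim


def damm_check_digit(digits: str) -> str:
    """The check digit is simply the final interim value of the payload."""
    return str(damm_interim(digits))


def damm_is_valid(code: str) -> bool:
    """A payload followed by its check digit always ends at interim value 0."""
    return damm_interim(code) == 0
```

For example, the check digit for 572 is 4; 5724 validates, while the transposed 5742 does not.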

PS: I have searched the SE sites and initially asked this question on SuperUser, but it was rejected as off-topic. I am trying to find the right place to ask this, but it is not obvious.

Adam Miklosi
gorn

2 Answers


It's only a code; it doesn't have to mean anything, i.e. no information should be encoded in it. Since there's no length limitation, why not use words? This is not my original idea, by the way; I got it from what3words.

It meets the desired properties:

  • Uniqueness: what3words can address every 3 m by 3 m square on Earth. Even if that's the maximum, you have plenty of IDs available.
  • Easily communicated and remembered: three.words.easy.
  • The system can catch mistakes: it just needs a dictionary lookup, autocomplete, etc.

I know that sounds like a joke, but it does meet the requirements and I can't find any reason to not use it.
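A rough sketch of the idea, assuming a tiny hypothetical wordlist (a real one would need thousands of short, distinct, inoffensive words): random three-word IDs are drawn until an unused one is found, and typos are corrected word by word with Python's standard difflib.get_close_matches. Nothing here uses the actual what3words service or its word list; new_word_id and correct_word_id are made-up names.

```python
from __future__ import annotations

import difflib
import secrets

# Hypothetical wordlist, far too small for 10,000 people; illustration only.
WORDS = ["apple", "river", "stone", "tiger", "cloud",
         "ocean", "piano", "lemon", "rocket", "violin"]


def new_word_id(issued: set[str], n_words: int = 3) -> str:
    """Draw random word combinations until one is not yet issued, then record it."""
    if len(issued) >= len(WORDS) ** n_words:
        raise RuntimeError("wordlist exhausted; use more words or a longer ID")
    while True:
        candidate = ".".join(secrets.choice(WORDS) for _ in range(n_words))
        if candidate not in issued:
            issued.add(candidate)
            return candidate


def correct_word_id(typed: str) -> str | None:
    """Snap each typed word to its closest listed word; None if any word has no match."""
    fixed = []
    for word in typed.lower().split("."):
        matches = difflib.get_close_matches(word, WORDS, n=1, cutoff=0.6)
        if not matches:
            return None
        fixed.append(matches[0])
    return ".".join(fixed)
```

For instance, correct_word_id("aple.rivr.stone") returns "apple.river.stone". The words in the list should be chosen so that no two are close to each other; otherwise a typo could snap to the wrong word.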

imel96

First of all, I have to say: use a GUID, and meet your 'easy to communicate' requirement with a 2D barcode, a mag stripe, near-field communication, autocomplete fields or something similar.

Secondly, you have such a small number of people; why not just use their name, an int or a random 5-character string?

Each has downsides, but none are unsolvable. I would go for 5 random characters from a subset of letters and numbers, omitting o, 1, l, etc. Generate batches in advance and have a human check each one for obscenities.
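A minimal sketch of that suggestion, assuming a 31-symbol alphabet (digits 2-9 plus uppercase letters without I, L and O, so 0, O, 1, I and l never appear); the alphabet and the generate_batch name are assumptions. With 31^5 = 28,629,151 possible codes, 10,000 people occupy well under 0.1% of the space, so random draws rarely collide. The obscenity check is left to the human reviewer, as suggested.

```python
from __future__ import annotations

import secrets

# Assumed alphabet: no 0/O, no 1/I/L, so look-alike characters never appear.
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"


def generate_batch(count: int, length: int = 5, existing: set[str] | None = None) -> list[str]:
    """Return `count` new random codes that clash neither with each other nor with `existing`."""
    taken = set(existing or ())
    if len(taken) + count > len(ALPHABET) ** length:
        raise ValueError("not enough unused codes of this length")
    batch = []
    while len(batch) < count:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in taken:
            taken.add(code)
            batch.append(code)
    return batch


# Generate a batch ahead of time; a person reviews it before any code is assigned.
for code in generate_batch(5):
    print(code)
```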

Ewan