2

Related to the question Which hashing algorithm is best for uniqueness and speed?

Is there a way to create a hash function, or find one, whose hash length depends completely on the input length, has an adjustable hash character set (Since it must be 1-to-1 function the input must comply to this character set constraint as well).

Moreover the most important part is that the hash function generates as random as possible strings. So for example it would be possible to get the following two different results.

Hash(aaaa)->blue // for character set a-z

and

Hash(aaaz)->pink  // for character set a-z

More examples:

Hash(aa@12a)->dakj@4   // for character set a-z, @, 0-9

and

Hash(a2#46Ww)->@3#0Pdw // for character set a-z A-Z, !-), 0-9

Notice the character length between input and output and the character sets.

What I thought so far was a probability distribution function, there are random distribution functions but I am not sure how to get there.

http://en.wikipedia.org/wiki/Probability_distribution

http://en.wikipedia.org/wiki/Normal_distribution

iordanis
  • 163

3 Answers3

4

This is not a hash function. It's either an encoding function or an encryption function.

Usually when I've encountered a problem of this form, it is because I have some sort of IDs that I want to "look" random. Assuming I don't care about security, I'll normally create an algorithm which works as follows:

  1. Map my data to numbers (usually, by treating a string in a character set containing X characters as a baseX number).
  2. Map my numbers to new numbers. Eric Lippert suggests using Multiplicative Inverses for this purpose. Skip32 encoding (AKA Skip32 encryption, but it's not secure) also works for this purpose.
  3. Reverse the mapping in step 1.

To decode, perform the same steps in reverse.

Brian
  • 4,553
1

I'd use a perfect hash function to map your set of n inputs to a unique integer < n, then use that as an index into either an array of outputs (eg a list of n colours) or map it to a unique string in your character set (e.g. convert the number to base 26 and render each base-26 digit as that letter of the alphabet).

0

It seems to me that Format Preserving Encryption is exactly what you're looking for, except for maybe the security properties implied by "encryption". Its wikipedia page notes that

For such finite domains, and for the purposes of the discussion below, the cipher is equivalent to a permutation of N integers {0, ... , N−1} where N is the size of the domain.

For your case, 4-letter words with x possibilities for each character would be coded as integers by treating them as base-x numbers. Then the permutation would map that input to a distinct output, which would be decoded back into a string by reversing the base-x transform.

It seems that the best way to implement such a permutation is to apply an M-bit permutation to the input value, and if the result is N or more to apply it again to the result until the result is less than N, also called cycle walking. This works best when N is slightly less than 2^M.

There are ways to use Feistel Networks to adapt wide ciphers to produce arbitrary bit-width permutations, but this requires iterating the network to produce a good permutation.

Thelema
  • 101