Omni Calculator logo

How Hash Functions Work: From Tables to Passwords

You have already seen some hash types, but what is a hash function exactly? You remember that hash tables are crucial in efficient data storage, and you know that hashes act like a digital fingerprint. What’s next? Now is the time to learn what a hash function actually is!

In this article, we define a hash function and outline its basic properties. We will also discuss some examples of hash functions. With Omni’s assistance, you will finally understand how to define hash functions correctly!

The definition of a hash function is very simple. It is a map that:

  1. Takes any input data (text, files, etc.).
  2. Runs it through some mathematical algorithm.
  3. Outputs a fixed-length string of characters. This result is called a hash or digest.

If this explanation feels a bit abstract, that is understandable: it is the specific mathematical algorithm operating inside that truly determines how a hash function works.

Let us discuss two very basic examples of hash functions: one for numeric input and one for alphanumeric input.

Example 1

hash(n) = n (mod 100)

For instance:

hash(421231283) = 83

In other words, to compute the hash, you simply take the last two digits of the input:

hash(number) = last two digits

Even though this example is elementary, the modulo operator (mod) is a cornerstone of many more complex hash functions.

Example 2

This function is defined by analogy to what we have seen above. Given a string, we define its hash as:

hash(string) = last four characters

For example:

hash('OmniCalculator') = ator

These examples, although quite elementary, provide a basic understanding of how hash functions work. To see more examples, including the well-known bitwise methods (folding) and the multiplicative methods, visit our dedicated page: examples of hash functions.

Let us now discuss a bit more about the properties that appear in the definition of hash functions.

Every hash function has to satisfy the following properties:

  1. Arbitrary input size. The mathematical formula that constitutes the hash function definition must work no matter the input length: one character or an entire book must produce a result.
  2. Fixed output size. No matter the input length, the length of the output must be the same, e.g., 256 bits for both one character and an entire book.
  3. Deterministic map. For a given input, the hash function must always generate the same hash value.

Additionally, a hash function should be fast to compute and minimize the risk of collisions, i.e., it should rarely happen that two different inputs produce the same hash. The relative importance of these properties depends on the intended use of the hash function:

  • Non-cryptographic hash functions are crucial in hash tables.
  • Cryptographic hash functions are used in cybersecurity to secure sensitive data such as passwords.

Let us first discuss how hash functions work in cryptography.

Here, we briefly describe the desired properties of cryptographic hash functions. They are discussed in more detail in a dedicated article that explains how hash functions work in cryptography.

  1. Avalanche effect: A tiny change in the input should massively change the hash.

  2. Preimage resistance: It should be very difficult to find an input that produces a given hash.

  3. Second preimage resistance: Given an input and its hash, it should be tough to find another input that would produce this hash.

  4. Collision resistance: It should be hard to find a pair of inputs that would produce the same hash. A weaker property of collision avoidance is essential in data storage, which makes collisions a concern when defining hash functions.

We use non-cryptographic hash functions for applications that do not require the rigorous security properties of the triple resistance we discussed in the previous section. In exchange, these functions can be much, much faster to compute. Indeed, efficiency is crucial in data storage and retrieval.

Another important property is the uniform distribution of outputs: the hashes should cover the output range as evenly as possible. In other words, every hash should appear with roughly the same probability. This attribute helps avoid collisions, which decrease the efficiency of the hashing-based methods.

Recall that collision resistance means that it is difficult to find two inputs that produce the same hash. Since the output range of a hash function is limited, the pigeonhole principle implies that taking more inputs than the number of possible hashes must eventually produce a collision. However, finding such a collision is usually computationally infeasible.

Perfect hashing is a special type of hash function that guarantees no collisions for a given set of keys (inputs). This means that each key maps to a unique slot in a hash table, resulting in extremely fast, deterministic lookups.

Dynamic perfect hashing, which allows insertion of new keys, is typically more computationally expensive than using a standard hash function with collision handling.

A universal hash function (UHF) is not a single function but a family of hash functions, from which one is chosen at random. It is designed so that for any two distinct inputs, the probability of a collision is very low (more precisely: strictly bounded), regardless of the inputs themselves.

This article was written by Anna Szczepanek and reviewed by Steven Wooding.