Welcome to Omni's Benford's law calculator, where you can verify if your data follows the Benford's law formula by checking the probability with which each digit appears as the leading digit of numbers.
Curious what Benford's law is? Looking for examples of data following Benford's law? Or not sure how to use this Benford's law calculator? Scroll down and find all the answers!
What is Benford's law?
Benford's law says that the distribution of the leading digits of the numbers found in various real-life data sets is such that the digit 1 is most frequent (appearing about 30% of the time), the digit 2 is less frequent (a bit less than 18% of the time), and so on, all the way down to 9, which is the least frequent leading digit (less than 5% of the time).
Benford's law (also known as the law of first digits or the Newcomb–Benford law) owes its name to Frank Benford, who published it in 1938 in the paper "The Law of Anomalous Numbers". The same law had been found and published back in 1881 by Simon Newcomb (you can see that Benford's law is an example of Stigler's law 😉). However, neither Newcomb nor Benford was able to prove this result. It was only in 1995 that Theodore Hill published the proof.
Now that we know what Benford's law is, let's discuss how you can apply it to your own data set.
How to use Benford's law?
To use Benford's law, follow these instructions:
- Count how often each digit appears as the leading digit in your data.
- Compute the relative frequencies.
- Compare them with the distribution predicted by Benford's law.
- To visualize the problem, construct bar plots.
- Make a decision based on how similar your distribution is to Benford's law.
Simple, isn't it? But if your data set is large, it will be very tedious to perform this procedure. That's precisely why we've built our Benford's law calculator!
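For readers who prefer to script the manual procedure described above, here is a minimal Python sketch of the counting and comparison steps. The helper names `leading_digit` and `benford_comparison` and the sample numbers are made up for illustration; this is not the calculator's own implementation.

```python
from collections import Counter
import math


def leading_digit(x):
    """Return the first non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. 0.0042 -> '4.2...e-03'.
    return int(f"{abs(x):.16e}"[0])


def benford_comparison(data):
    """Return {digit: (observed_frequency, benford_frequency)} for digits 1..9."""
    values = [x for x in data if x != 0]  # zero has no leading digit
    counts = Counter(leading_digit(x) for x in values)
    total = len(values)
    return {
        d: (counts[d] / total, math.log10(1 + 1 / d))
        for d in range(1, 10)
    }


# Made-up sample, far too small for any real conclusion.
sample = [1200, 13.5, 0.0172, 245, 19, 3100, 1.8, 960, 27, 110]
result = benford_comparison(sample)
for digit, (observed, expected) in result.items():
    print(f"{digit}: observed {observed:.3f}, Benford {expected:.3f}")
```

The sketch stops at the comparison step: drawing the bar plots and making the final decision are left to you.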
How to use this Benford's law calculator?
As we have already explained Benford's law, you should have no trouble understanding the calculator's output. What we should still explain is how to input data. To use our Benford's law calculator most efficiently, follow these steps:
- This Benford's law calculator has two modes of inputting data:
  - A raw sample: just enter the consecutive numbers from your data set. The fields will appear as you go.
  - Count of leading digits: use this mode if you have already determined how often each digit appears as the leading digit.
- As you input data, you can see the bar plot of your sample and that of the Benford's law formula.
- There is also a table comparing the frequency of leading digits in your sample with the theoretical Benford's distribution.
- If the discrepancy between your data and Benford's law formula is large, then you have strong evidence that your sample does not follow Benford's distribution.
- If you try to artificially create data following Benford's law, try to make the bar plots (and the two columns of the frequency table) as similar as possible. Good luck!
Benford's law formula
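According to Benford's law, the theoretical probability that each digit appears as the leading digit in your data set is the following (values rounded to one decimal place):
- 1: 30.1%
- 2: 17.6%
- 3: 12.5%
- 4: 9.7%
- 5: 7.9%
- 6: 6.7%
- 7: 5.8%
- 8: 5.1%
- 9: 4.6%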
Benford's law explained
A set of numbers is said to satisfy Benford's law if the leading digit d occurs with probability:
P(d) = log10(d+1) - log10(d)
which, via the properties of logarithms, we can rewrite as:
P(d) = log10(1 + 1/d)
This formula follows from the assumption that the logarithms of the numbers (more precisely, their fractional parts) follow the uniform distribution. In other words, the probability of the leading digit being d is proportional to the width of the interval between d and d + 1 on the logarithmic scale.
For instance, the interval [log 1, log 2] is a bit more than 6 times wider than the interval [log 9, log 10], so 1 is predicted to appear a bit more than 6 times more frequently than 9.
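To see the numbers behind the formula, the following short Python sketch evaluates P(d) = log10(1 + 1/d) for every digit and compares the predicted frequencies of 1 and 9. The function name `benford_probability` is our own choice for illustration.

```python
import math


def benford_probability(d):
    """Probability that d (1..9) is the leading digit under Benford's law."""
    return math.log10(1 + 1 / d)


probabilities = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probabilities.items():
    print(f"P({d}) = {p:.4f}")

# The nine probabilities cover all possible leading digits, so they sum to 1.
print(f"Sum = {sum(probabilities.values()):.4f}")

# How much more often 1 is expected to lead than 9.
ratio = probabilities[1] / probabilities[9]
print(f"P(1) / P(9) = {ratio:.2f}")
```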
What data follow Benford's law? Examples
Benford's law applies to data sets from many different domains. Benford himself tested his hypothesis on the following data sets:
- 104 physical constants;
- 5000 entries from a mathematical handbook;
- 1800 molecular weights;
- Sizes of 3259 US populations;
- The street addresses of the first 342 persons listed in American Men of Science;
- 418 death rates;
- Surface areas of 335 rivers; and
- 308 numbers contained in an issue of Reader's Digest.
It's not only real-life data that follows Benford's distribution. Some well-known mathematical sequences obey Benford's law as well. Examples include the powers of 2, the Fibonacci numbers, and the factorials.
Some data sets do not obey Benford's law. This happens, for instance, when data do not cover several orders of magnitude, like height or weight. Sometimes a discrepancy between data and Benford's law results from rounding and truncation, which introduce bias. In mathematics, sequences that do not obey Benford's law include the reciprocals and the square roots of the natural numbers.
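As an illustration of a mathematical sequence that obeys Benford's law, the sketch below tabulates the leading digits of the powers of 2, a standard example. The choice of the first 1000 powers is arbitrary; the fit tends to improve as more terms are included.

```python
from collections import Counter
import math

N = 1000  # how many powers of 2 to examine; an arbitrary choice

# Python integers have arbitrary precision, so str() gives the exact leading digit.
counts = Counter(int(str(2 ** n)[0]) for n in range(1, N + 1))

for d in range(1, 10):
    observed = counts[d] / N
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed:.3f}, Benford {expected:.3f}")
```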
How to calculate Benford's law?
According to Benford's law, the probability of d being the leading digit is P(d) = log10(d+1) - log10(d) = log10(1 + 1/d).
How can Benford's law help detect fraud?
Benford's law helps detect artificially created data because people have a tendency to assume that each digit should appear as the leading digit with the same frequency. As a result, they introduce too many numbers starting, e.g., with 5. However, data may diverge from Benford's law for many reasons, not necessarily intentional fraud!
How do I tell if numbers follow Benford's law?
Draw a bar plot of your data next to the frequencies predicted by Benford's law. If there's a significant discrepancy, your data does not follow Benford's law. Otherwise, follow this introductory analysis with a statistical test to make the final decision.
How do I statistically test Benford's law?
To test Benford's law statistically, use the chi-squared goodness-of-fit test or the Kolmogorov-Smirnov test. Keep in mind, however, that neither of these methods is perfect, and there's ongoing research on the matter.
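As a rough illustration of the chi-squared goodness-of-fit approach, the sketch below computes the test statistic from nine leading-digit counts and compares it with 15.507, the 5% critical value of the chi-squared distribution with 8 degrees of freedom (nine digits minus one). The function name, the choice of the 5% level, and both sets of counts are assumptions made for this example.

```python
import math

# 5% critical value of the chi-squared distribution with 8 degrees of freedom.
CRITICAL_VALUE_5_PERCENT = 15.507


def benford_chi_squared(counts):
    """Chi-squared statistic for leading-digit counts.

    `counts` holds nine non-negative integers: counts[0] is the number of
    values starting with 1, counts[1] with 2, and so on up to 9.
    """
    total = sum(counts)
    statistic = 0.0
    for d, observed in enumerate(counts, start=1):
        expected = total * math.log10(1 + 1 / d)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up counts: one set close to Benford's law, one close to uniform.
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]
uniform_like = [111, 111, 111, 111, 111, 111, 111, 111, 112]

for name, sample_counts in [("benford_like", benford_like), ("uniform_like", uniform_like)]:
    stat = benford_chi_squared(sample_counts)
    verdict = "reject" if stat > CRITICAL_VALUE_5_PERCENT else "do not reject"
    print(f"{name}: chi-squared = {stat:.2f} -> {verdict} Benford's law at the 5% level")
```

A statistic above the critical value suggests the sample does not follow Benford's law; a statistic below it only means the test found no evidence against the law, not that the law is confirmed.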