Floating-Point Calculator
The floating-point calculator is here to help you understand the IEEE 754 standard for the floating-point format. It acts as a converter for floating-point numbers: it converts 32-bit floats and 64-bit floats from binary representations to real decimal numbers and vice versa.
💡 Are you looking to convert binary numbers to the decimal system? Perhaps you'd be interested in our binary calculator!
How to use the floating-point calculator
Before we get into the bits and bytes of the float32 and float64 number formats, let's learn how the floating-point calculator works. Just follow these easy steps:

If you want to convert the binary encoding of a floating-point number to the decimal number it represents, select "floating-point to number" at the top of the calculator. Then:
Select the precision used. This determines how your binary representation will be interpreted.

Enter the floating-point number's binary digits. You can enter the sign, exponent, and fraction separately, or you can enter the entire bit string in one go; select your preference in the "Bit input method" dropdown menu.
The value stored in your float will be shown at the bottom of the calculator.


If you want to convert a value into its floating-point representation, select "number to floating-point" at the top of the calculator. Then:
Enter your number in the field below that.

The IEEE 754 floating-point binary and hexadecimal representations of both single- and double-precision floats will be shown below.

The floating-point calculator will also show you the actual value stored due to the loss of precision inherent to the floating-point format. The error due to this loss of precision will also be reported.

What is an IEEE 754 floating-point number?
In computing, a floating-point number is a data format used to store fractional numbers in a digital machine. A floating-point number is represented by a series of bits (1s and 0s). Computers perform mathematical operations on these bits directly instead of how a human would do the math. When a human wants to read the floating-point number, a complex formula reconstructs the bits into the decimal system.
The IEEE 754 standard (established in 1985 by the Institute of Electrical and Electronics Engineers) standardized the floating-point format and also defined how the format can be operated upon to perform math and other tasks. It's been widely adopted by hardware and software designers and has become the de facto standard for representing floating-point numbers in computers. When someone mentions "floating-point numbers" in the context of computing, they generally mean the IEEE 754 data format.
💡 Computer performance (i.e., speed) can be measured by the number of floating-point operations it can do per second. This metric is called FLOPS and is crucial in fields of scientific computing.
The best-known IEEE 754 floating-point format (single-precision, or "32-bit") is used in almost all modern computer applications. The format is highly flexible: float32s can encode numbers as small as $1.4 \times 10^{-45}$ and as large as $3.4 \times 10^{38}$ (both positive and negative).
Besides single-precision, the IEEE 754 standard also codifies double-precision ("64-bit" or float64), which can store numbers from $5 \times 10^{-324}$ to $1.8 \times 10^{308}$. Less commonly used IEEE 754 formats include:
 Half-precision ("16-bit");
 Quadruple-precision ("128-bit"); and
 Octuple-precision ("256-bit").
💡 Technically, although IEEE 754 only defines these formats, any arbitrary precision is possible. Many older computers used 24-bit floating-point numbers!
However, a floating-point number is not just a number converted to the binary number system. It's much more complicated than that! Let's learn how the IEEE 754 floating-point standard works.
How are real numbers stored with floating-point representation?
Any floating-point binary representation consists of three segments of bits: sign, exponent, and fraction.
 The sign ($S$) indicates positive or negative;
 The exponent ($E$) raises 2 to some power to scale the number; and
 The fraction ($F$) determines the number's exact digits.
The segments' lengths and the exact formula applied to $S$, $E$, and $F$ to recreate the number depend on the format's precision.
When stored in memory, $S$, $E$, and $F$ are laid end to end to create the full binary representation of the floating-point number: the sign bit comes first, followed by the exponent bits and then the fraction bits.
These bits are what the computer manipulates when arithmetic and other operations are performed. The computer never sees a number as its decimal digits — it only sees and works with these bits.
💡 The choice of precision depends on what the application requires. More precision means more bits and higher accuracy, but also a bigger storage footprint and longer computation time.
Let's look at the two most commonly used floating-point formats: float32 and float64.
The single-precision 32-bit float format
float32 is the most commonly used of the IEEE 754 formats. As suggested by the term "32-bit float", its underlying binary representation is 32 bits long. These are segmented as follows:
 1 bit for the sign ($S$);
 8 bits for the exponent ($E$); and
 23 bits for the fraction ($F$).
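When it's stored in memory, the sign bit comes first, then the exponent bits, then the fraction bits. We can write them as a binary string:
$b_1\ b_2 \ldots b_9\ b_{10} \ldots b_{32}$
The real value that this 32-bit float stores can be calculated as:
$(-1)^{b_1} \times 2^{(b_2 \ldots b_9)_2 - 127} \times (1.b_{10} \ldots b_{32})_2$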
where:
 $127$ is called the exponent bias and is inherent to the singleprecision format;
 $x_2$ means that $x$ must be interpreted as if in base 2 or binary; and
 $(1.b_{10}...b_{32})_2$ means to take the binary bits $b_{10}$ to $b_{32}$ and use them as the fractional part of $1$ to form a binary fraction. See our binary fraction calculator for help on that.
We can rewrite the formula more compactly using $S$, $E$, and $F$:
$(-1)^S \times 2^{E-127} \times (1.F)_2$
where
 $S = b_1$;
 $E = (b_2\ ...\ b_9)_2$; and
 $F = (b_{10}\ ...\ b_{32})_2$.
To let floating-point formats store really small numbers with high precision, the combination $E=0$ and $F>0$ activates a separate formula. For float32, that formula is
$(-1)^S \times 2^{-126} \times (0.F)_2$
Numbers created by this formula are called "subnormal numbers", while "normal numbers" are created using the previous formula.
There are other special cases, encoded by specific values for $E$ and $F$:
| $E$ | $F$ | Value |
|---|---|---|
| $0$ | $0$ | $\pm 0$ |
| $0$ | $>0$ | $\small (-1)^S \times 2^{-126} \times (0.F)_2$ |
| $0 < E < 255$ | Any | $\small (-1)^S \times 2^{E-127} \times (1.F)_2$ |
| $255$ | $0$ | $\pm \infty$ |
| $255$ | $>0$ | $\text{NaN}$ |
$\text{NaN}$ means "not a number", which is returned for undefined operations such as dividing zero by zero or subtracting infinity from infinity. (Dividing a nonzero number by zero gives $\pm \infty$ instead.) For the cases of $\pm 0$ and $\pm \infty$, the sign bit determines whether the number is positive or negative. And yes, negative zero is a thing!
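The table above can be turned into a small decoder. The following is a minimal Python sketch rather than a reference implementation; the function name `decode_float32` is made up for this illustration, and it assumes the input is a string of exactly 32 `0`/`1` characters ordered sign, exponent, fraction.

```python
import math


def decode_float32(bits: str) -> float:
    """Decode a 32-character string of 0s and 1s as an IEEE 754 float32."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")

    s = int(bits[0], 2)      # sign: 1 bit
    e = int(bits[1:9], 2)    # exponent: 8 bits
    f = int(bits[9:], 2)     # fraction: 23 bits, read as an integer

    sign = -1.0 if s else 1.0
    if e == 255:
        # All exponent bits set: infinity if F is zero, NaN otherwise.
        return sign * math.inf if f == 0 else math.nan
    if e == 0:
        # Zero (F == 0) and subnormals share one expression: (0.F)_2 = f / 2**23.
        return sign * 2.0**-126 * (f / 2**23)
    # Normal numbers: (1.F)_2 = 1 + f / 2**23.
    return sign * 2.0**(e - 127) * (1 + f / 2**23)


print(decode_float32("01000001110111100000000000000000"))  # 27.75
```

Treating $F$ as an integer and dividing by $2^{23}$ is just one convenient way to evaluate $(0.F)_2$ and $(1.F)_2$.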
An example
Let's convert the binary floating-point representation 01000001110111100000000000000000 to the real number it represents.

First, let's segment it into $S$, $E$, and $F$:

$S = 0_2 = 0$

$E = 10000011_2 = 131$

$F = 10111100000000000000000_2$


Since $0<E<255$, we use the normal number formula:
$(-1)^S \times 2^{E-127} \times (1.F)_2$

$(-1)^S = (-1)^0 = 1$ (so the number is positive!)

$2^{E-127} = 2^{131-127} = 2^{4} = 16$

$(1.F)_2 = 1.101111_2 = 1.734375$


Combine the three multiplicands:
$1 \times 16 \times 1.734375 = \textbf{27.75}$
Want to see for yourself? Try this value in our floating-point calculator to see it in action!
Note that $01000001110111100000000000000000_2$ converted directly from binary to decimal is not $27.75$, but $1,\!105,\!068,\!032$. Quite the difference, wouldn't you say?
💡 The floating-point formula can be seen as a form of scientific notation, where the exponential aspect uses base 2. We can rewrite the example above as $1.734375 \times 2^{4}$. See our scientific notation calculator for more information.
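The worked example, and the contrast with reading the same bits as a plain binary integer, can be checked with Python's standard `struct` module. This is only an illustrative sketch; the variable names are arbitrary, and big-endian byte order is chosen so the bytes match the left-to-right bit string.

```python
import struct

bits = "01000001110111100000000000000000"
raw = int(bits, 2).to_bytes(4, byteorder="big")  # the same 32 bits as 4 bytes

as_float = struct.unpack(">f", raw)[0]  # interpret the bytes as a big-endian float32
as_int = struct.unpack(">I", raw)[0]    # interpret the same bytes as an unsigned integer

print(as_float)  # 27.75
print(as_int)    # 1105068032
```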
The double-precision 64-bit float format
The inner workings of the float64 data type are much the same as those of float32. All that differs are:
 The lengths of the exponent and fraction segments of the binary representation: in 64-bit floats, $E$ takes up 11 bits, and $F$ takes up 52.
 The exponent bias: in 64-bit floats, it's 1023 (whereas in 32-bit floats, it's 127).
The formulas for the reconstruction of 64-bit normal and subnormal floating-point numbers are, therefore, respectively:
$(-1)^S \times 2^{E-1023} \times (1.F)_2$
and
$(-1)^S \times 2^{-1022} \times (0.F)_2$
Because of the additional exponent and fraction bits compared to float32s, float64s can store much larger numbers and at much higher accuracy.
💡 All IEEE 754 floating-point formats follow this pattern, with the biggest differences being the bias and the lengths of the segments.
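Because the formats differ only in segment lengths and bias, a single decoder can be parameterized by those lengths. The sketch below is a hedged illustration: `decode_ieee754` is a hypothetical name, and it assumes the bias equals $2^{k-1}-1$ for a $k$-bit exponent, which reproduces 127 for float32 and 1023 for float64. It relies on Python's own floats being 64-bit, so it is only meant for formats no wider than float64.

```python
import math


def decode_ieee754(bits: str, exp_bits: int, frac_bits: int) -> float:
    """Decode a sign/exponent/fraction bit string for a given segment layout."""
    if len(bits) != 1 + exp_bits + frac_bits or set(bits) - {"0", "1"}:
        raise ValueError("bit string does not match the requested layout")

    s = int(bits[0], 2)
    e = int(bits[1:1 + exp_bits], 2)
    f = int(bits[1 + exp_bits:], 2)

    bias = 2**(exp_bits - 1) - 1   # assumed pattern: 127 for 8 bits, 1023 for 11 bits
    e_max = 2**exp_bits - 1        # all exponent bits set: infinity or NaN

    sign = -1.0 if s else 1.0
    if e == e_max:
        return sign * math.inf if f == 0 else math.nan
    if e == 0:
        # Zero and subnormal numbers use the fixed exponent 1 - bias.
        return sign * 2.0**(1 - bias) * (f / 2**frac_bits)
    return sign * 2.0**(e - bias) * (1 + f / 2**frac_bits)


# 27.75 as a float64: S = 0, E = 4 + 1023 = 1027, F = 101111 padded to 52 bits
float64_bits = "0" + "10000000011" + "101111" + "0" * 46
print(decode_ieee754(float64_bits, exp_bits=11, frac_bits=52))  # 27.75
```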
How do I convert a number to floating-point?
To convert a decimal number to a floating-point representation, follow these steps:
 Convert the entire number to binary and then normalize it, i.e. write it in scientific notation using base 2.
 Round the fraction after the binary point to the allowable fraction length (by default, IEEE 754 rounds to the nearest representable value rather than simply truncating).
 Extract the sign, exponent, and fraction segments.
Refer to the IEEE 754 standard for more detailed instructions.
An example
Let's convert $27.75$ back to its floating-point representation.

Our integer part is $27$, which in binary is $11011_2$. Our fraction part is $0.75$. Let's convert it to a binary fraction:

$0.75 \times 2 = 1.50 = \textbf{1} + 0.5$

$0.5 \times 2 = 1.00 = \textbf{1} + 0$
So $0.75 = 0.11_2$, and then $27.75 = 11011.11_2$. To normalize it, we rewrite it as:
$11011.11_2 = 1.\textcolor{red}{101111}_2 \times 2^{\textcolor{blue}{4}}$
 The digits after the binary point are already a suitable length: for float32, we'd be limited to 23 bits, but we have only six.
From the normalized rewrite, we can extract that:
 $S = 0$, since the number is positive;
 $E = \textcolor{blue}{4}+127 = 131 = 10000011_2$ (we add $127$ because $E-127$ must be $\textcolor{blue}{4}$); and
 $F = \textcolor{red}{101111}00000000000000000$ (zeroes get padded on the right because we're dealing with fractional digits and not an integer).
Therefore, our floatingpoint representation is $0\textcolor{blue}{10000011}\textcolor{red}{101111}00000000000000000$.
You can verify this result with our floating-point calculator!
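The steps above can be sketched in Python. This is a simplified illustration, not a full IEEE 754 encoder: the name `encode_float32` is hypothetical, and the sketch only handles nonzero finite inputs that land in the float32 normal range (no zeros, subnormals, infinities, or NaN). It uses `math.frexp` to normalize and Python's built-in `round`, which rounds ties to even, matching IEEE 754's default rounding mode.

```python
import math


def encode_float32(x: float) -> str:
    """Encode a nonzero finite number in the float32 normal range as 32 bits."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("this sketch only handles nonzero finite numbers")

    sign = 0 if x > 0 else 1

    # Normalize: abs(x) = significand * 2**exponent with 1 <= significand < 2.
    mantissa, exp = math.frexp(abs(x))   # frexp yields 0.5 <= mantissa < 1
    significand = mantissa * 2
    exponent = exp - 1

    # Keep 23 fraction bits, rounding to nearest (ties to even).
    fraction = round((significand - 1) * 2**23)
    if fraction == 2**23:                # rounding carried into the next power of 2
        fraction = 0
        exponent += 1

    biased = exponent + 127              # add the float32 exponent bias
    if not 0 < biased < 255:
        raise ValueError("value is outside the float32 normal range")

    return f"{sign:01b}{biased:08b}{fraction:023b}"


print(encode_float32(27.75))  # 01000001110111100000000000000000
```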
Floating-point accuracy
Floating-point numbers cannot represent all possible numbers with complete accuracy. This makes intuitive sense for sufficiently large numbers and for numbers with an infinite number of decimals, such as pi ($\pi$) and Euler's number ($e$). But did you know that computers cannot store a value as simple as $0.1$ with 100% accuracy? Let's look into this claim!
When you ask a computer to store $0.1$ as a float32, it will store this binary string:
00111101110011001100110011001101
If we convert that back to decimal with the floating-point formulas we learned above, we get the following:
 $S = 0$
 $E = 01111011_2 = 123$
 $F = 10011001100110011001101_2$
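Plugging these into the normal number formula gives $2^{123-127} \times (1.F)_2 \approx 0.100000001490116$. That's a little bit more than $0.1$! The error (how far away the stored floating-point value is from the "correct" value of $0.1$) is $1.49 \times 10^{-9}$.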
Let's try to rectify this mistake and make the smallest change possible. Our stored number is a little too big, so let's change that last bit from a $1$ to a $0$ to make our float a little bit smaller. We're now converting:
00111101110011001100110011001100
 $S = 0$
 $E = 01111011_2 = 123$
 $F = 10011001100110011001100_2$
This gives approximately $0.099999994039536$, so we've missed our mark of $0.1$ again! This time, the error is a little larger ($5.96\times 10^{-9}$). The first binary string, the one that ended in 1, was closer to $0.1$ than this one!
You may think, "what if we used more bits?" Well, if we were to do the same with the float64 format instead, we'd find the same problem, although less severe. $0.1$ converted to a 64-bit floating-point number and back to decimal is $0.100000000000000006$, and the error here is $5.55 \times 10^{-18}$. This is the higher accuracy of float64 in action, but the error is still not $0$, so the conversion is still not lossless.
This is, unfortunately, the drawback of the ubiquitous floating-point format: it's not 100% precise. Small bits of information get lost, and they can wreak havoc if not accounted for. The only numbers that can be stored perfectly as a float without any losses are integers scaled by powers of 2 (as long as they fit within the format's fraction and exponent bits), because that's how the format stores numbers. All other numbers are simply approximated when stored as a floating-point number. But it's still the best we've got!
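The errors quoted in this section can be reproduced with exact rational arithmetic. The snippet below is a sketch that assumes Python's built-in `float` is a 64-bit IEEE 754 double (true for standard CPython builds); the helper name `float32_from_bits` is made up for this example.

```python
import struct
from fractions import Fraction


def float32_from_bits(bits: str) -> float:
    """Interpret a 32-character bit string as a float32 (returned as a Python float)."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, byteorder="big"))[0]


exact = Fraction(1, 10)  # the true value 0.1, with no rounding

stored = float32_from_bits("00111101110011001100110011001101")  # float32 nearest to 0.1
below = float32_from_bits("00111101110011001100110011001100")   # last bit flipped to 0

# Fraction(x) converts a float to the exact rational value it stores.
err_stored = float(abs(Fraction(stored) - exact))
err_below = float(abs(Fraction(below) - exact))
err_double = float(abs(Fraction(0.1) - exact))  # 0.1 stored as a float64

print(f"{err_stored:.3g}")  # 1.49e-09
print(f"{err_below:.3g}")   # 5.96e-09
print(f"{err_double:.3g}")  # 5.55e-18
```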
💡 Some operations are more resilient against precision loss. Try our condition number calculator to see how severely a loss of accuracy will affect matrix operations.
FAQ
Why do we use floating-point numbers?
The IEEE 754 floating-point format standard enables efficient storage and processing of numbers in computers.

From a hardware perspective, many simplifications of floating-point operations can be made to significantly speed up arithmetic, thanks to the IEEE 754 standard's specifications.

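For software, floats are very precise: a single basic arithmetic operation typically introduces a relative error of no more than about $6 \times 10^{-8}$ in float32 and about $1.1 \times 10^{-16}$ in float64, which enables high-precision scientific and engineering applications.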