proj-jasper-jasperArithmeticNotes1

" > map (^0) [0] 15:36:22 <lambdabot> [1] 15:36:24 <mar77a> NOOOOOOO 15:36:32 <mar77a> HOW COULD YOU!?! 15:36:51 <Taejo> mar77a, integers don't have NaN? "

--

IEEE754

--

deterministically portable arithmetic by default (e.g. exactly the same floating point results even on different platforms), but you can use annotations to request platform-specific forms
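
(a sketch of the kind of platform variation this rules out, in Python; math.fma needs Python 3.13+. whether a compiler contracts a*b + c into a fused multiply-add is exactly the sort of thing that differs across platforms, so a deterministic-by-default language has to pick one behavior and emulate it everywhere:)

    import math  # math.fma requires Python 3.13+

    # The same source expression, a*b + c, legally yields two different
    # results depending on whether the platform fuses it into one operation.
    a, b, c = 0.1, 10.0, -1.0
    print(a * b + c)           # 0.0 (round the product to a double, then add)
    print(math.fma(a, b, c))   # 5.551115123125783e-17 (one rounding at the end)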

--

http://dec64.org/

https://news.ycombinator.com/item?id=7365812




StefanKarpinski 202 days ago

This is some serious amateur hour. The most glaring problem is this:

> There are 255 possible representations of zero. They are all considered to be equal.

There are also 255 representations of almost all representable numbers. For example, 10 is 1 x 10^1 or 10 x 10^0 – or any one of 253 other representations. Aside from the fact that you're wasting an entire byte of your representation, this means that you can't check for equality by comparing bits. Take a look at the assembly implementation of equality checking:

https://github.com/douglascrockford/DEC64/blob/master/dec64....

The "fast path" (which is ten instruction) applies only if the two numbers have the same exponent. The slow path calls subtraction and returns true if the result is zero. The implementation of subtraction falls back on yet another function, which jumps around even more:

https://github.com/douglascrockford/DEC64/blob/master/dec64....

For most comparisons (no, comparing numbers with the same exponent is not the norm) it will take around FIFTY INSTRUCTIONS TO CHECK IF TWO NUMBERS ARE EQUAL OR NOT. Many of these instructions are branches – and inherently unpredictable ones at that, which means that pipeline stalls will be normal. All told, I would expect equality comparison to typically take around 100 cycles. It's not even clear to me that this implementation is correct because at the end of the subtraction, it compares the result to the zero word, which is only one of the 255 possible representations of zero. The lack of a single canonical representation of any number is just as bad for other arithmetic operations and comparisons, if not worse.
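
(a rough Python model of the problem; the layout follows dec64.org – coefficient in the high 56 bits, exponent in the low byte – but the helper names are mine:)

    # Toy model of a DEC64 word: coefficient * 10**exponent, with a
    # 56-bit signed coefficient in the high bits and an 8-bit signed exponent below.
    def dec64_pack(coefficient, exponent):
        assert -2**55 <= coefficient < 2**55 and -128 <= exponent <= 127
        return ((coefficient & (2**56 - 1)) << 8) | (exponent & 0xFF)

    def dec64_unpack(word):
        coef, exp = word >> 8, word & 0xFF
        if coef >= 2**55:
            coef -= 2**56          # undo two's complement of the 56-bit field
        if exp >= 128:
            exp -= 256
        return coef, exp

    ten_a = dec64_pack(1, 1)       # 1 * 10**1
    ten_b = dec64_pack(10, 0)      # 10 * 10**0 -- the same number
    assert ten_a != ten_b          # ...but raw word comparison says "unequal"

    def dec64_eq(a, b):
        (ca, ea), (cb, eb) = dec64_unpack(a), dec64_unpack(b)
        e = min(ea, eb)            # align exponents, then compare as integers
        return ca * 10**(ea - e) == cb * 10**(eb - e)

    assert dec64_eq(ten_a, ten_b)  # equality needs arithmetic, not bit tests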

Crockford's bugaboo with IEEE 754 floating-point is bizarre, verging on pathological. He devoted a section in his book "JavaScript: The Good Parts" to a rather ill-informed rant against it. When I saw him give a talk, I took the opportunity to ask him what he thought would be a good alternative to using IEEE 754. His answer was – I shit you not – "I don't know". Apparently this proposal is the answer. No thanks, I will stick with the amazingly successful, ubiquitous, thoroughly thought out standard that was spearheaded by William Kahan – one of the greatest numerical analysts of all time. Anyone who doesn't appreciate how good we have it with IEEE 754 should really read "An Interview with the Old Man of Floating-Point" [1], in which Kahan relates just how messed up this stuff was before the IEEE 754 standardization process. It should also be noted that there already is an IEEE standard for decimal floating-point [2], which is not only infinitely better thought out than this drivel, but also is already implemented in hardware on many systems sold by IBM and others, specifically for financial applications.

[1] http://www.cs.berkeley.edu/~wkahan/ieee754status/754story.ht...

[2] http://en.wikipedia.org/wiki/Decimal64_floating-point_format


haberman 202 days ago

This is some seriously hyperbolic negativity.

> There are also 255 representations of almost all representable numbers. [...] Aside from the fact that you're wasting an entire byte of your representation

How is this different than any other floating-point representation? I'm pretty sure IEEE floating-point has the same redundancy, though numbers are normalized so comparisons are cheaper as you note. But IEEE doubles "waste" even more bits due to the 2^52 representations of NaN.

> For most comparisons [...] it will take around FIFTY INSTRUCTIONS TO CHECK IF TWO NUMBERS ARE EQUAL OR NOT.

Good point, sounds like a notable weakness and barrier to adoption.

> Crockfords bugaboo with IEEE 754 floating-point is bizarre, verging on pathological.

He calls it "the most frequently reported bug in JavaScript." Wouldn't you be interested in improving on the most common cause of user confusion in a technology you care about?


StefanKarpinski 202 days ago

IEEE 754 doesn't waste any bits – there is only a single representation of each value (except for multiple NaNs). In this proposal, there are 255 representations of most values, which means that it has almost an entire byte of redundancy. The waste is bad, but the lack of a canonical representation of each value is worse.

I personally think that the way to handle floating-point confusion is better user education. However, if you really want a decimal standard, then, as I mentioned above, there already is one that is part of the IEEE 754 standard. Not only do there exist hardware implementations, but there are also high-quality software implementations.

A better approach to making things more intuitive in all bases, not just base 10, is using rational numbers. The natural way is to use reduced pairs of integers, but this is unfortunately quite prone to overflow. You can improve that by using reduced ratios of – guess what – floating point numbers.


cornholio 202 days ago

> There are also 255 representations of almost all representable numbers. For example, 10 is 1 x 10^1 or 10 x 10^0 – or any one of 253 other representations.

You are not correct. The representation with the smallest significand is 1 x 10^1; you can't go any further into positive exponents. Conversely, 56 signed bits allow at most 10 000 000 000 000 000 as an integer power of 10, so the exponent can go as low as -15. So there are exactly 17 representations of 10, and that's the worst it gets. All other numbers except powers of 10 have fewer representations, and most real-world data affected by noise has a single representation, because it uses the full precision of the significand and can't be shifted right or left without overflow or loss of precision.
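
(the count is easy to check mechanically; a quick Python enumeration under DEC64's constraints, 56-bit signed coefficient and 8-bit signed exponent:)

    # all ways to write 10 as coefficient * 10**exponent in DEC64
    reps = [(10**(1 - e), e) for e in range(-128, 128)
            if e <= 1 and 10**(1 - e) < 2**55]
    print(len(reps))   # 17 -- exponents from 1 down to -15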

So the redundancy is much less than you think: one in 10 real values has two representations, one in 100 has three, etc. This is common for other decimal formats and not that big of a problem; detecting zero is a simple NOR gate on all significand bits.

The real problem with this format is the very high price in hardware (changing the exponent requires recomputing the significand) and its complete unsuitability for any kind of numerical problem or scientific number crunching, because designing a floating-point format takes numerical scientists and hardware designers, not assembly programmers and language designers.

Heck, the only reason he put the exponent in the lower byte and not the upper byte, where it would have ensured perfect compatibility with most positive integers, is that x64 assembly does not allow direct access to the upper byte.


StefanKarpinski 202 days ago

You are right. I forgot about the interaction between the exponent and the significand. The lack of a canonical representation is still quite problematic.


JonahBraun 202 days ago

In one format, IEEE 754 has 24576 possible representations of zero [1], which fits your definition of "wasted bits". Some of your other criticisms might be valid, but at this point I'd like to see an accurate technical comparison between DEC64 and the decimal formats of IEEE 754.

[1] http://en.wikipedia.org/wiki/Decimal128_floating-point_forma...


StefanKarpinski 202 days ago

This is why decimal floating-point formats are kind of a disaster in general and are only implemented in relatively rare hardware intended for financial uses. In many of those applications, using a decimal fixed-point representation is better – i.e. counting in millionths of a dollar (you can still count up to ±9 trillion dollars with 64 bits). But yes, a technical comparison of different decimal formats would definitely be interesting. I suspect that despite the occasional failure of intuitiveness, we're far better off with binary formats and better programmer education.
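
(the arithmetic works out for millionths of a dollar: 2^63 such units is about 9.2 trillion dollars. a minimal fixed-point sketch in Python; the names are mine:)

    # Fixed-point money: an int64-sized count of millionths of a dollar.
    SCALE = 10**6                  # units per dollar
    print((2**63 - 1) / SCALE)     # ~9.22e12, i.e. about +/-9 trillion dollars

    def fmt(units):
        sign, u = ("-" if units < 0 else ""), abs(units)
        return "%s$%d.%06d" % (sign, u // SCALE, u % SCALE)

    price = 19_990_000             # $19.99
    tax = price * 8 // 100         # 8% tax, truncated to a whole unit
    print(fmt(price + tax))        # $21.589200 -- exact decimal arithmetic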


tel 202 days ago

I feel like that bug has more to do with JavaScript's (and clearly Crockford's) inane desire to pretend that all numeric types are the same.

Nobody who does anything with numbers believes that! Even if all you can do is count on your fingers, you believe in the difference between integers and floats. They have entirely different algebraic properties, and it takes a whole hell of a lot of work to get from one to the other – there's even a whole class (fractions) in between.


haberman 202 days ago

I'm not sure what that has to do with it. Even if you are ok with the idea that integers and "decimal numbers" are different, it's still confusing that 0.1 + 0.2 != 0.3.

It's confusing because it is very difficult to look at a decimal number and know whether it can be represented exactly as base-2 floating point. It's especially confusing because you get no feedback about it! Here is a Ruby session:

    irb(main):001:0> 0.1
    => 0.1
    irb(main):002:0> 0.2
    => 0.2
    irb(main):003:0> 0.3
    => 0.3
    irb(main):004:0> 0.4
    => 0.4
    irb(main):005:0> 0.5
    => 0.5

They all look exact, right? Well, actually only 0.5 is (IIRC?). All the others are approximations internally.
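
(easy to confirm with Python's decimal module, which comes up later in the thread: constructing a Decimal from a float exposes the float's exact value:)

    from decimal import Decimal

    # the exact binary64 value behind each literal; only 0.5 survives intact
    for text in ("0.1", "0.2", "0.3", "0.4", "0.5"):
        print(text, "=", Decimal(float(text)))
    # 0.1 = 0.1000000000000000055511151231257827021181583404541015625
    # ...
    # 0.5 = 0.5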


StefanKarpinski 202 days ago

The problem here is that Ruby is lying to you – if you enter 0.3 as a literal, you will not get that value.


haberman 202 days ago

I would challenge you to name any software system that reliably shows you the precise value of a floating point number.

I couldn't find any (like not a single one), so I wrote a program myself to do it: http://blog.reverberate.org/2012/11/dumpfp-tool-to-inspect-f...


StefanKarpinski 202 days ago

Python, JavaScript V8, Julia, Java.


haberman 202 days ago

The precise value of double(0.1) is 0.1000000000000000055511151231257827021181583404541015625. That is precise, not an approximation.

If you know of a program in any of these languages that will print this value for "0.1" using built-in functionality, please let me know because I would love to know about it.

Likewise the precise value of double(1e50) is 100000000000000007629769841091887003294964970946560. Anything else is an approximation of its true value.

In another message you said that what's really important is that the string representation uniquely identifies the precise value. While that will help you reconstruct the value later, it does not help you understand why 0.1 + 0.2 != 0.3.
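
(Python's fractions module makes the "why" concrete by recovering each double's exact rational value; note the first one matches the Haskell toRational output further down:)

    from fractions import Fraction

    # exact rational values of the doubles written 0.1, 0.2 and 0.3
    a, b, c = Fraction(0.1), Fraction(0.2), Fraction(0.3)
    print(a)                        # 3602879701896397/36028797018963968
    print(a + b > Fraction(3, 10))  # True: both summands were rounded up a bit
    print(c < Fraction(3, 10))      # True: 0.3 was rounded down
    print(a + b == c)               # False -- hence 0.1 + 0.2 != 0.3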


masida 202 days ago

Python does it:

  >>> from decimal import Decimal
  >>> Decimal(0.1)
  Decimal('0.1000000000000000055511151231257827021181583404541015625')
  >>> Decimal(1e50)
  Decimal('100000000000000007629769841091887003294964970946560')

haberman 202 days ago

Great to know, thanks!


PaulAJ 202 days ago

Haskell will tell you that

    toRational 0.1 = 3602879701896397 % 36028797018963968
    toRational 1e50 = 100000000000000007629769841091887003294964970946560 % 1

Both of those values are precise, although I admit the first isn't really useful for human beings.


StefanKarpinski 201 days ago

The easiest way to see this exact value in Julia is to convert the Float64 value 0.1 to BigFloat and then print that:

    julia> big(0.1)
    1.000000000000000055511151231257827021181583404541015625e-01
    julia> big(1e50)
    1.0000000000000000762976984109188700329496497094656e+50

Note that this is exactly why you don't normally want to construct BigFloats this way. Instead you want to do this:

    julia> BigFloat("0.1")
    1.000000000000000000000000000000000000000000000000000000000000000000000000000002e-01
    julia> BigFloat("1e50")
    1e+50

StefanKarpinski 202 days ago

It helps because 0.1 + 0.2 produces 0.30000000000000004 for 64-bit floats – so at least you can see that this value isn't the same as 0.3. In Ruby you just get two values that print the same yet aren't equal, which is way more confusing. I agree that printing the minimal number of digits required for reconstruction does not help with explaining why 0.1, 0.2 and 0.3 in 64-bit floats aren't the real values 1/10, 2/10 and 3/10.


giovannibajo1 202 days ago

Python doesn't:

    rasky at monocle in ~
    ↪ python
    Python 2.7.5 (default, Sep  2 2013, 05:24:04)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 0.1
    0.1
    >>> ^D
    rasky at monocle in ~
    ↪ python3
    Python 3.3.3 (default, Dec 24 2013, 13:54:32)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 0.1
    0.1

It used to, but it was changed to reduce users' confusion.


StefanKarpinski 202 days ago

We may be talking past each other here. Ruby is lying in the sense that there are multiple distinct float values that it will print as 0.3 – in particular, confusion ensues when two values look the same but are unequal. These other languages print each distinct float value differently, using just enough decimal digits to reconstruct the exact binary value. Ruby doesn't give you enough digits to reconstruct the value you have. Nobody actually prints the full correct value, because it's fifty digits long and completely redundant given that you know you're dealing with a 64-bit float.
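
(the "just enough digits" behavior is easy to see from Python, whose repr has printed the shortest round-tripping string since 3.1:)

    x = 0.1 + 0.2
    print(repr(x), repr(0.3))        # 0.30000000000000004 0.3 -- distinct strings
    print("%.15g %.15g" % (x, 0.3))  # 0.3 0.3 -- at 15 digits they collide
    print(float(repr(x)) == x)       # True: the short string round-trips exactly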


kzrdude 202 days ago

You're right.

    Python 3.3.5rc1 (default, Feb 23 2014, 17:44:29)  
    >>> 0.3 == 0.2 + 0.1
    False
    >>> 0.2 + 0.1
    0.30000000000000004
    >>> 0.3
    0.3

deathanatos 202 days ago

> DEC64 is intended to be the only number type in the next generation of application programming languages.

Please no. There's a very solid case for integers in programming languages: many places in code call for a number that must be an integer, and having the type enforce this is nice – you don't need to check and abort (or floor, or whatever) if someone passes you a non-integer. Basically, anything that does an array index – which is any for loop walking through a memory buffer (an array, a string, a file, anything that can be counted) – wants an integer. x[i] where i is not an integer just doesn't make sense; ideally, a type system should enforce that.

Granted, of course, many languages will just truncate a float in an integer context, so funny stuff does happen (I don't really feel that this is a good thing). (Although, interestingly, JS is not one of them.) Personally, I think JS needs an integer type. Especially when people start getting into bignum stuff in JS, it gets silly fast: first, you can only store so many bits before floating point loses precision, and even when you do have an "integer", JS will find funny ways to shoot you in the foot. For example:

  (x | 0) === x

does not hold true for all integers in JS.
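
(a Python model of why: JS's | operator first coerces both operands through ToInt32, so any "integer" outside ±2^31 gets wrapped:)

    def to_int32(x):
        # model of ECMAScript ToInt32, which x | 0 applies to x
        return (int(x) + 2**31) % 2**32 - 2**31

    print(to_int32(2**31))   # -2147483648, not 2147483648
    print(to_int32(2**53))   # 0 -- a perfectly good JS integer, wrecked by | 0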


spc476 202 days ago

But it can still bite you. I got hit with an IEEE 754 implementation detail that took me two days to figure out (http://boston.conman.org/2014/03/05.1). The same problem would exist with this proposal as well.

The summary: RLIM_INFINITY on a 32-bit system can be safely stored in a double. RLIM_INFINITY on a 64-bit system can't. I had code in Lua (which uses doubles as the only numeric type) that worked fine on 32-bit systems, but not 64-bit systems. It took me two days to track the root cause down.

(Edit: added summary)
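
(the failure mode reproduces in a couple of lines of Python; on Linux, RLIM_INFINITY is typically the all-ones value of rlim_t, so 2^32-1 on a 32-bit system and 2^64-1 on a 64-bit one:)

    # a double holds integers exactly only up to 2**53
    rlim_inf_32 = 2**32 - 1        # fits in a double exactly
    rlim_inf_64 = 2**64 - 1        # does not fit
    print(int(float(rlim_inf_32)) == rlim_inf_32)  # True
    print(int(float(rlim_inf_64)) == rlim_inf_64)  # False: rounds up to 2**64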


justincormack 202 days ago

Which is why Lua 5.3 will have a 64 bit integer type, and will move away from float types only. You can't interop with external stuff without full 64 bit ints.

---

bsder 202 days ago

Can we please make fun of this "specification" very very loudly as a warning to people who might think this is even a remotely good idea?

Decimal floating point is actually a good, underutilized idea. However, there is a VERY good specification here: http://speleotrove.com/decimal/dbspec.html

It is written by people who understand the problem, and have thought quite deeply about the solutions.

As opposed to this pile of garbage ...


pascal_cuoq 202 days ago

I see no need to provide a single instruction for this. It can already be implemented in a couple of instructions in standard instruction sets (I am familiar with OCaml's Zarith, which does exactly this: https://forge.ocamlcore.org/scm/viewvc.php/*checkout*/trunk/... . Other implementations surely exist). The only inefficient aspect of the current implementation is that most operations contain two or three conditional jumps, two for the arguments and sometimes one for the possibility of overflow.
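
(a rough Python model of that fast path, using the OCaml/Zarith convention that an immediate integer v is stored as the odd word 2v+1; the slow path stands in for boxed bignum arithmetic:)

    BITS = 64
    LO, HI = -(2**(BITS - 2)), 2**(BITS - 2) - 1   # one bit is lost to the tag

    def tag(v):
        return 2 * v + 1               # low bit 1 marks an immediate integer

    def add(a, b, slow_path):
        if a & 1 == 0 or b & 1 == 0:   # branches 1 and 2: either operand boxed?
            return slow_path(a, b)
        r = (a >> 1) + (b >> 1)        # untag and add
        if not LO <= r <= HI:          # branch 3: result overflows the tag range
            return slow_path(a, b)
        return tag(r)

    assert add(tag(20), tag(22), slow_path=None) == tag(42)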

...

0x09 202 days ago

> A later revision of IEEE 754 attempted to remedy this, but the formats it recommended were so inefficient that it has not found much acceptance.

The C and C++ standards committees have been drafting decimal support based on IEC 60559:2011 (previously IEEE 754-2008) since its creation.

Original C TR: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1312.pdf

Latest draft specification: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1781.pdf

Original C++ TR: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n284...

Update proposal for C++11: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n340...

GCC, ICC, IBM C, and HP C have adopted the extension. Links to the respective software implementations can be found under "cost of implementation" in the C++11 proposal above, except GCC, which uses IBM's library. Its support is documented here: http://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html

Meanwhile, hardware support so far exists in POWER6 and 7 and SPARC64 X.

This may seem like slow adoption if you aren't accustomed to the glacial pace of standards processes, especially for a component as critical as floating-point numerics. If there is any holdup, it is lack of demand for this format, which the article's proposal doesn't affect.


reading the above, i wonder if any language or library has used the idea stated by StefanKarpinski above to internally represent rationals as reduced ratios of two floating point numbers (in order to reduce overflow, compared to just using pairs of integers). That sounds like a system i might like in my language.
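
as a sketch of how that reduction might work (my own guess, not taken from any real implementation): split each double into an odd integer significand times a power of two, then cancel the gcd of the significands and the shared power of two:

    import math

    def decompose(x):
        # write nonzero x as m * 2**e with m an odd integer
        m, e = math.frexp(x)             # x == m * 2**e, 0.5 <= |m| < 1
        m, e = int(m * 2**53), e - 53    # scale the significand to an integer
        while m % 2 == 0:
            m, e = m // 2, e + 1
        return m, e

    def reduce_ratio(num, den):
        (m1, e1), (m2, e2) = decompose(num), decompose(den)
        g = math.gcd(m1, m2)             # cancel common odd factors
        s = min(e1, e2)                  # cancel the shared power of two
        return math.ldexp(m1 // g, e1 - s), math.ldexp(m2 // g, e2 - s)

    print(reduce_ratio(10.0, 4.0))       # (5.0, 2.0)
    print(reduce_ratio(1e300, 2e300))    # (1.0, 2.0) -- int pairs would be huge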

heck, since we're dreaming, is there any language that does arithmetic with algebraic numbers?:

https://en.wikipedia.org/wiki/Algebraic_number

or heck, how about EL numbers? https://en.wikipedia.org/wiki/Closed-form_expression#Closed-form_number
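
(computer algebra systems do: in Python, for instance, sympy keeps algebraic numbers symbolic and exact. a taste, assuming sympy is installed:)

    from sympy import sqrt, expand, minimal_polynomial
    from sympy.abc import x

    print(sqrt(2) ** 2)                              # 2, exactly -- no rounding
    print(expand((sqrt(2) + 1) * (sqrt(2) - 1)))     # 1
    print(minimal_polynomial(sqrt(2) + sqrt(3), x))  # x**4 - 10*x**2 + 1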

here's the Julia implementation of rationals: https://github.com/JuliaLang/julia/blob/master/base/rational.jl

doc: http://julia.readthedocs.org/en/latest/manual/complex-and-rational-numbers/#man-rational-numbers

see also http://julia.readthedocs.org/en/latest/manual/constructors/#case-study-rational and https://github.com/JuliaLang/julia/blob/master/doc/manual/conversion-and-promotion.rst for explanation of the rational implementation

i guess from my point of view this is idle speculation, though, because i don't know any numerical analysis and so am not the right person to improve on representations of numbers.