proj-oot-ootArithmeticNotes1

" > map (^0) [0] 15:36:22 <lambdabot> [1] 15:36:24 <mar77a> NOOOOOOO 15:36:32 <mar77a> HOW COULD YOU!?! 15:36:51 <Taejo> mar77a, integers don't have NaN? "

--

IEEE754

--

deterministically portable arithmetic by default (e.g. exactly the same floating point results even on different platforms), but you can use annotations to request platform-specific forms

--

http://dec64.org/

https://news.ycombinator.com/item?id=7365812



StefanKarpinski 202 days ago

link

This is some serious amateur hour. The most glaring problem is this:

> There are 255 possible representations of zero. They are all considered to be equal.

There are also 255 representations of almost all representable numbers. For example, 10 is 1 x 10^1 or 10 x 10^0 – or any one of 253 other representations. Aside from the fact that you're wasting an entire byte of your representation, this means that you can't check for equality by comparing bits. Take a look at the assembly implementation of equality checking:

https://github.com/douglascrockford/DEC64/blob/master/dec64....

The "fast path" (which is ten instruction) applies only if the two numbers have the same exponent. The slow path calls subtraction and returns true if the result is zero. The implementation of subtraction falls back on yet another function, which jumps around even more:

https://github.com/douglascrockford/DEC64/blob/master/dec64....

For most comparisons (no, comparing numbers with the same exponent is not the norm) it will take around FIFTY INSTRUCTIONS TO CHECK IF TWO NUMBERS ARE EQUAL OR NOT. Many of these instructions are branches – and inherently unpredictable ones at that, which means that pipeline stalls will be normal. All told, I would expect equality comparison to typically take around 100 cycles. It's not even clear to me that this implementation is correct because at the end of the subtraction, it compares the result to the zero word, which is only one of the 255 possible representations of zero. The lack of a single canonical representation of any number is just as bad for other arithmetic operations and comparisons, if not worse.

Crockford's bugaboo with IEEE 754 floating-point is bizarre, verging on pathological. He devoted a section in his book "JavaScript: The Good Parts" to a rather ill-informed rant against it. When I saw him give a talk, I took the opportunity to ask him what he thought would be a good alternative to using IEEE 754. His answer was – I shit you not – "I don't know". Apparently this proposal is the answer. No thanks, I will stick with the amazingly successful, ubiquitous, thoroughly thought out standard that was spearheaded by William Kahan – one of the greatest numerical analysts of all time. Anyone who doesn't appreciate how good we have it with IEEE 754 should really read "An Interview with the Old Man of Floating-Point" [1], in which Kahan relates just how messed up this stuff was before the IEEE 754 standardization process. It should also be noted that there already is an IEEE standard for decimal floating-point [2], which is not only infinitely better thought out than this drivel, but also is already implemented in hardware on many systems sold by IBM and others, specifically for financial applications.

[1] http://www.cs.berkeley.edu/~wkahan/ieee754status/754story.ht...

[2] http://en.wikipedia.org/wiki/Decimal64_floating-point_format
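
the multiple-representations point is easy to check concretely. a quick Python sketch of the layout described at dec64.org (56-bit signed coefficient in the high bits, 8-bit signed exponent in the low byte; function names are mine, not DEC64's API):

    def dec64_pack(coefficient: int, exponent: int) -> int:
        """Pack a (coefficient, exponent) pair into a 64-bit word."""
        return ((coefficient & ((1 << 56) - 1)) << 8) | (exponent & 0xFF)

    def dec64_value(word: int):
        """Decode a 64-bit word back into the number it represents."""
        coefficient = word >> 8
        if coefficient >= 1 << 55:      # sign-extend the 56-bit coefficient
            coefficient -= 1 << 56
        exponent = word & 0xFF
        if exponent >= 0x80:            # sign-extend the 8-bit exponent
            exponent -= 0x100
        return coefficient * 10 ** exponent

    a = dec64_pack(1, 1)     # 1 x 10^1
    b = dec64_pack(10, 0)    # 10 x 10^0
    assert a != b                                    # different bit patterns...
    assert dec64_value(a) == dec64_value(b) == 10    # ...but the same number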


haberman 202 days ago

link

This is some seriously hyperbolic negativity.

> There are also 255 representations of almost all representable numbers. [...] Aside from the fact that you're wasting an entire byte of your representation

How is this different than any other floating-point representation? I'm pretty sure IEEE floating-point has the same redundancy, though numbers are normalized so comparisons are cheaper as you note. But IEEE doubles "waste" even more bits due to the 2^52 representations of NaN.

> For most comparisons [...] it will take around FIFTY INSTRUCTIONS TO CHECK IF TWO NUMBERS ARE EQUAL OR NOT.

Good point, sounds like a notable weakness and barrier to adoption.

> Crockfords bugaboo with IEEE 754 floating-point is bizarre, verging on pathological.

He calls it "the most frequently reported bug in JavaScript." Wouldn't you be interested in improving on the most common cause of user confusion in a technology you care about?


StefanKarpinski 202 days ago

link

IEEE 754 doesn't waste any bits – there is only a single representation of each value (except for multiple NaNs). In this proposal, there are 255 representations of most values, which means that it has almost an entire byte of redundancy. The waste is bad, but the lack of a canonical representation of each value is worse.

I personally think that the way to handle floating-point confusion is better user education. However, if you really want a decimal standard, then, as I mentioned above, there already is one that is part of the IEEE 754 standard. Not only do there exist hardware implementations, but there are also high-quality software implementations.

A better approach to making things more intuitive in all bases, not just base 10, is using rational numbers. The natural way is to use reduced pairs of integers, but this is unfortunately quite prone to overflow. You can improve that by using reduced ratios of – guess what – floating point numbers.


cornholio 202 days ago

link

> There are also 255 representations of almost all representable numbers. For example, 10 is 1 x 10^1 or 10 x 10^0 – or any one of 253 other representations.

You are not correct. The representation with the smallest significand is 1 x 10^1, and you can't go further into positive exponents. Conversely, 56 signed bits allow 10 000 000 000 000 000 as the largest integer power of 10, for which the exponent will be -15. So there are exactly 17 representations of 10, and that's the worst it gets. All other numbers except powers of 10 have fewer representations, and most real world data affected by noise has a single representation because they use the full precision of the significand, and you can't shift them to the right or left without overflow or loss of precision.

So the redundancy is much less than you think, one in 10 real values has two representations, one in 100 has three etc. This is common for other decimal formats and not that big of a problem, detecting zero is a simple NOR gate on all significant bits.

The real problem with this format is the very high price in hardware (changing the exponent requires recomputing the significand) and complete unsuitability for any kind of numerical problem or scientific number crunching. Because designing a floating point format takes numerical scientists and hardware designers, not assembly programmers and language designers.

Heck, the only reason he put the exponent in the lower byte and not the upper byte, where it would have ensured a perfect compatibility to most positive integers, is that X64 assembly does not allow direct access to the upper byte.
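
cornholio's count of 17 is easy to verify in Python (bounds from the dec64.org layout: the coefficient must fit in a signed 56-bit field, the exponent in a signed byte):

    # representations of 10 are pairs (c, e) with c * 10**e == 10 exactly,
    # which forces c = 10**(1 - e); just range over the legal exponents.
    reps = [(10 ** (1 - e), e) for e in range(-128, 128)
            if e <= 1 and 10 ** (1 - e) < 2 ** 55]
    print(len(reps))   # 17: exponents from 1 down to -15, as claimed above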


StefanKarpinski 202 days ago

link

You are right. I forgot about the interaction between the exponent and the significand. The lack of a canonical representation is still quite problematic.


JonahBraun 202 days ago

link

In one format, IEEE 754 has 24576 possible representations of zero[1], which fits your definition of "wasted bits". Some of your other criticisms might be valid, but at this point I'd like to see an accurate technical comparison between DEC64 and the decimal formats of IEEE 754.

[1] http://en.wikipedia.org/wiki/Decimal128_floating-point_forma...


StefanKarpinski 202 days ago

link

This is why decimal floating-point formats are kind of a disaster in general and are only implemented in relatively rare hardware intended for financial uses. In many of those applications, using a decimal fixed point representation is better – i.e. counting in millionths of pennies (you can still count up to about ±92 billion dollars with 64 bits). But yes, a technical comparison of different decimal formats would definitely be interesting. I suspect that despite the occasional failure of intuitiveness, we're far better off with binary formats and better programmer education.
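
a sketch of that fixed-point idea in Python (the scale factor and names are mine; Decimal is used only to parse the literal exactly):

    from decimal import Decimal

    # fixed-point money: an integer count of millionths of a penny, so one
    # unit is $1e-8 and a signed 64-bit count tops out near $92 billion.
    UNITS_PER_DOLLAR = 100 * 1_000_000

    def dollars(s: str) -> int:
        """Parse a decimal dollar amount into fixed-point units, exactly."""
        return int(Decimal(s) * UNITS_PER_DOLLAR)

    # the infamous 0.1 + 0.2 == 0.3 identity holds exactly in fixed point:
    assert dollars("0.10") + dollars("0.20") == dollars("0.30")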


tel 202 days ago

link

I feel like that bug has more to do with Javascript's (and clearly Crockford's) inane desire to pretend it is the case that all numeric types are the same.

Nobody who does anything with numbers believes that! Even if all you can do is count your fingers you believe in the difference between integers and floats. They have different algebraic properties entirely and it takes a whole hell of a lot of work to get from one to the other---there's even a whole class (fractions) in between.


haberman 202 days ago

link

I'm not sure what that has to do with it. Even if you are ok with the idea that integers and "decimal numbers" are different, it's still confusing that 0.1 + 0.2 != 0.3.

It's confusing because it is very difficult to look at a decimal number and know whether it can be represented exactly as base-2 floating point. It's especially confusing because you get no feedback about it! Here is a Ruby session:

    irb(main):001:0> 0.1
    => 0.1
    irb(main):002:0> 0.2
    => 0.2
    irb(main):003:0> 0.3
    => 0.3
    irb(main):004:0> 0.4
    => 0.4
    irb(main):005:0> 0.5
    => 0.5

They all look exact, right? Well actually only 0.5 is (IIRC?). All the others are actually approximations internally.
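
one way to get that missing feedback (a Python sketch): Fraction(float) recovers the float's exact binary value, so comparing it against the exact decimal shows which literals were representable:

    from fractions import Fraction

    for s in ["0.1", "0.2", "0.3", "0.4", "0.5"]:
        exact = Fraction(float(s)) == Fraction(s)
        print(s, "exact" if exact else "approximated")
    # only 0.5 is exact: it is 2**-1, a power of two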


StefanKarpinski 202 days ago

link

The problem here is that Ruby is lying to you – if you enter 0.3 as a literal, you will not get that value.


haberman 202 days ago

link

I would challenge you to name any software system that reliably shows you the precise value of a floating point number.

I couldn't find any (like not a single one), so I wrote a program myself to do it: http://blog.reverberate.org/2012/11/dumpfp-tool-to-inspect-f...


StefanKarpinski 202 days ago

link

Python, JavaScript V8, Julia, Java.


haberman 202 days ago

link

The precise value of double(0.1) is 0.1000000000000000055511151231257827021181583404541015625. That is precise, not an approximation.

If you know of a program in any of these languages that will print this value for "0.1" using built-in functionality, please let me know because I would love to know about it.

Likewise the precise value of double(1e50) is 100000000000000007629769841091887003294964970946560. Anything else is an approximation of its true value.

In another message you said that what's really important is that the string representation uniquely identifies the precise value. While that will help you reconstruct the value later, it does not help you understand why 0.1 + 0.2 != 0.3.


masida 202 days ago

link

Python does it:

  >>> from decimal import Decimal
  >>> Decimal(0.1)
  Decimal('0.1000000000000000055511151231257827021181583404541015625')
  >>> Decimal(1e50)
  Decimal('100000000000000007629769841091887003294964970946560')

haberman 202 days ago

link

Great to know, thanks!


PaulAJ 202 days ago

link

Haskell will tell you that

    toRational 0.1 = 3602879701896397 % 36028797018963968
    toRational 1e50 = 100000000000000007629769841091887003294964970946560 % 1

Both of those values are precise, although I admit the first isn't really useful for human beings.


StefanKarpinski 201 days ago

link

The easiest way to see this exact value in Julia is to convert the Float64 value 0.1 to BigFloat and then print that:

    julia> big(0.1)
    1.000000000000000055511151231257827021181583404541015625e-01
    julia> big(1e50)
    1.0000000000000000762976984109188700329496497094656e+50

Note that this is exactly why you don't normally want to construct BigFloats this way. Instead you want to do this:

    julia> BigFloat("0.1")
    1.000000000000000000000000000000000000000000000000000000000000000000000000000002e-01
    julia> BigFloat("1e50")
    1e+50

StefanKarpinski 202 days ago

link

It helps because 0.1 + 0.2 produces 0.30000000000000004 for 64-bit floats – so at least you can see that this value isn't the same as 0.3. In Ruby you just get two values that print the same yet aren't equal, which is way more confusing. I agree that printing the minimal number of digits required for reconstruction does not help with explaining why 0.1, 0.2 and 0.3 in 64-bit floats aren't the real values 1/10, 2/10 and 3/10.


giovannibajo1 202 days ago

link

Python doesn't:

    rasky at monocle in ~
    ↪ python
    Python 2.7.5 (default, Sep  2 2013, 05:24:04)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 0.1
    0.1
    >>> ^D
    rasky at monocle in ~
    ↪ python3
    Python 3.3.3 (default, Dec 24 2013, 13:54:32)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 0.1
    0.1

It used to, but it was changed to reduce users' confusion.


StefanKarpinski 202 days ago

link

We may be speaking across each other here. Ruby is lying in the sense that there are multiple distinct float values that it will print as 0.3 – in particular, confusion ensues when two values look the same but are unequal. These other languages print each distinct float value differently, using just enough decimal digits to reconstruct the exact binary value. Ruby doesn't give you enough digits to reconstruct the value you have. Nobody actually prints the full correct value because it's fifty digits long and is completely redundant given that you know you're dealing with a 64-bit float.


kzrdude 202 days ago

link

You're right.

    Python 3.3.5rc1 (default, Feb 23 2014, 17:44:29)  
    >>> 0.3 == 0.2 + 0.1
    False
    >>> 0.2 + 0.1
    0.30000000000000004
    >>> 0.3
    0.3

deathanatos 202 days ago

link

> DEC64 is intended to be the only number type in the next generation of application programming languages.

Please no. There's a very solid case for integers in programming languages: many places in code call for a number which must be an integer: having the type enforce this is nice: you don't need to check and abort (or floor, or whatever) if someone passes you a non-integer. Basically, anything that does an array index, which is any for loop walking through a memory buffer (which might be an array, a string, a file, the list goes on and on. Anything that can be counted) wants an integer. x[i] where i is not an integer just doesn't make sense: ideally, let a type system enforce that.

Granted, of course, that many languages will just truncate a float in an integers context, and so funny stuff does happen (I don't really feel that this is a good thing). (Although interestingly, JS is not one of them.) Personally, I think JS needs an integer type. Especially when you see people start getting into bignum stuff in JS, it gets silly fast, as first you can only store so many bits before floating point loses precision, but even then, even if you have an "integer", JS will find funny ways to shoot you in the foot. For example:

  x | 0 === x

does not hold true for all integers in JS.


spc476 202 days ago

link

But it can still bite you. I got hit with an IEEE 754 implementation detail that took me two days to figure out (http://boston.conman.org/2014/03/05.1). The same problem would exist with this proposal as well.

The summary: RLIM_INFINITY on a 32-bit system can be safely stored in a double. RLIM_INFINITY on a 64-bit system can't. I had code in Lua (which uses doubles as the only numeric type) that worked fine on 32-bit systems, but not 64-bit systems. It took me two days to track the root cause down.

(Edit: added summary)
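
the failure is easy to reproduce. assuming the usual Linux definition RLIM_INFINITY = ~0UL (2^32 - 1 on a 32-bit system, 2^64 - 1 on a 64-bit one), a quick Python check:

    # 2**32 - 1 fits in a double's 53-bit significand; 2**64 - 1 does not,
    # so the round-trip through a double silently changes it.
    assert int(float(2 ** 32 - 1)) == 2 ** 32 - 1   # 32-bit value survives
    assert int(float(2 ** 64 - 1)) != 2 ** 64 - 1   # rounds up to 2**64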


justincormack 202 days ago

link

Which is why Lua 5.3 will have a 64 bit integer type, and will move away from float types only. You can't interop with external stuff without full 64 bit ints.

---

bsder 202 days ago

link

Can we please make fun of this "specification" very very loudly as a warning to people who might think this is even a remotely good idea?

Decimal floating point is actually a good, underutilized idea. However, there is a VERY good specification here: http://speleotrove.com/decimal/dbspec.html

It is written by people who understand the problem, and have thought quite deeply about the solutions.

As opposed to this pile of garbage ...


pascal_cuoq 202 days ago

link

I see no need to provide a single instruction for this. It can already be implemented in a couple of instructions in standard instruction sets (I am familiar with OCaml's Zarith, which does exactly this: https://forge.ocamlcore.org/scm/viewvc.php/*checkout*/trunk/... . Other implementations surely exist). The only inefficient aspect of the current implementation is that most operations contain two or three conditional jumps, two for the arguments and sometimes one for the possibility of overflow.

...

0x09 202 days ago

link

> A later revision of IEEE 754 attempted to remedy this, but the formats it recommended were so inefficient that it has not found much acceptance.

The C and C++ standards committees have been drafting decimal support based on IEC 60559:2011 (previously IEEE 754-2008) since its creation.

Original C TR: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1312.pdf

Latest draft specification: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1781.pdf

Original C++ TR: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n284...

Update proposal for C++11: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n340...

GCC, ICC, IBM C, and HP C have adopted the extension. Links to the respective software implementations can be found under "cost of implementation" in the C++11 proposal above, except GCC, which uses IBM's library. Its support is documented here: http://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html

Meanwhile, hardware support so far exists in POWER6 and 7 and SPARC64 X.

This may seem like slow adoption if you aren't accustomed to the glacial pace of standards processes. Especially in the context of a component as critical as floating point numerics. If there is any holdup it would be lack of demand for this format, which the article's proposal doesn't affect.


reading the above, i wonder if any language or library has used the idea stated by StefanKarpinski above to internally represent rationals as reduced ratios of two floating point numbers (in order to reduce overflow if you just used pairs of integers). That sounds like a system i might like in my language..
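
a toy Python sketch of what that might look like (completely unvetted numerically; the names and the renormalization policy are mine): keep p/q as a pair of floats, and renormalize by an exact power-of-two scale so the components stay bounded:

    import math

    def normalize(p: float, q: float):
        """Rescale p/q by an exact power of two to keep both components small."""
        if q == 0.0:
            return (math.copysign(math.inf, p) if p else math.nan, 0.0)
        exp = max(math.frexp(p)[1] if p else 0, math.frexp(q)[1])
        return p * 2.0 ** -exp, q * 2.0 ** -exp   # powers of two are exact

    def rat_add(a, b):
        (pa, qa), (pb, qb) = a, b
        return normalize(pa * qb + pb * qa, qa * qb)

    third = normalize(1.0, 3.0)
    p, q = rat_add(third, third)
    print(p / q)   # 0.6666666666666666; the pair itself holds exactly 2/3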

heck, since we're dreaming, is there any language that does arithmetic with algebraic numbers?:

https://en.wikipedia.org/wiki/Algebraic_number

or heck, how about EL numbers? https://en.wikipedia.org/wiki/Closed-form_expression#Closed-form_number

here's the Julia implementation of rationals: https://github.com/JuliaLang/julia/blob/master/base/rational.jl

doc: http://julia.readthedocs.org/en/latest/manual/complex-and-rational-numbers/#man-rational-numbers

see also http://julia.readthedocs.org/en/latest/manual/constructors/#case-study-rational and https://github.com/JuliaLang/julia/blob/master/doc/manual/conversion-and-promotion.rst for explanation of the rational implementation

i guess from my point of view this is idle speculation, though, because i don't know any numerical analysis and so am not the right person to improve on representations of numbers.


dottrap 1 day ago

link

I think you would probably want to do it as a userdata and not as a language mod.

Roberto addressed the idea of an infinite number type, but it has a lot of downsides which are very un-Lua like, i.e. complicated, slow (no real hardware), and interoperability problems with the C API (which is very important to Lua).

video: https://www.youtube.com/watch?v=bjqNK1jA77M slides: http://www.inf.puc-rio.br/~roberto/talks/ws2014.pdf

reply

---

---

" James Morgan replied to comment from Jim Davis

February 5, 2015 10:44 PM Reply

I was going to say something similar, but with the caveat that it depends on language, compiler, etc.

My perl5 -V says among other things doublesize=8 longdblsize=16 and nvsize=8. Meaning I'm using 64-bit floating point numbers, but my compiler (and perl's configuration) knows about 128-bit floating point numbers.

I've honestly not heard many people clamor for 128-bit floats the way that 64-bit integers were. I suppose there probably are some fields where the absurdly large or absurdly small numbers allowed by the re-doubling of the precision are useful. But I've not lately found myself calculating interstellar trajectories, or the mass of leptons. "

---

perl6 has rationals by default, not floating point

http://blogs.perl.org/users/ovid/2015/02/a-little-thing-to-love-about-perl-6-and-cobol.html

---

blahedo 2 days ago

link

Not to disagree with anything the OP says, but a language with somewhat more recent currency that shares the "decimals are rationals by default" advantages is Scheme (and Racket and so on). At least in Racket, floating point representation is an implementation detail of the inexact-real numbers, while the exact-rational numbers are correctly produced and computed on when you give values in decimal form.

Scheme (et al) are probably the more direct conceptual inspiration for Perl 6's rational decimals---the Perl 6 developers were certainly aware of it (as they were aware of pretty much all the major PL work out there :)

reply

perlgeek 2 days ago

link

> For instance, summing the harmonic series (nth term is 1/n) means doing massive amounts of factoring to keep intermediate results in lowest terms...

There is a feature in Perl 6 that the OP didn't mention: if the denominator doesn't fit into a 64 bit integer anymore, it falls back to floating point arithmetic.

There's a type (FatRat, http://doc.perl6.org/type/FatRat) that stays precise if you really want, but the default gives precise arithmetic for "usual" numbers without pathological slow-down in pathological cases.

reply
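
the blow-up that fallback avoids is easy to see in Python, with Fraction standing in for FatRat:

    from fractions import Fraction

    # exact rational summation of the harmonic series: the denominator tracks
    # (roughly) lcm(1..n), which grows like e**n by the prime number theorem.
    h = Fraction(0)
    for n in range(1, 101):
        h += Fraction(1, n)
    print(h.denominator.bit_length())   # already far past 64 bits at n = 100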

xioxox 2 days ago

link

Is this kind of mixed rational/floating point sensible? It seems to be a fuzzy do-what-I-mean, not what-I-say kind of arithmetic. Surely weird differences in algorithm behaviour might occur around the 64 bit integer limit. It will also make the analysis and testing of numerical algorithm accuracy very difficult unless you can put bounds on your input values.

reply

leoh 2 days ago

link

I feel like this "do what I mean not what I say" is exactly what I have found so irksome about Perl

reply

nemo1618 2 days ago

link

That was my first reaction too, but I think we're reaching the point where the increased computational complexity is manageable. Efficiency has certainly never been one of Perl's selling points. I imagine there's a trapdoor available if you want to use floats anyway; actually, knowing Perl, I'm sure of it.

reply

colomon 2 days ago

link

There are (at least) two trapdoors. The first is that an operation on Rats which would produce a denominator which does not fit in a 64-bit unsigned integer will produce a Num instead. (If you're okay with your denominators growing in an unbounded fashion, you can use the FatRat class instead.)

The second "trapdoor" is that you can just explicitly work with Nums (ie C doubles). If your numeric literal has an exponent (eg 1e-15), it's automatically a Num. Or you can convert any Real numeric type to a Num by calling the .Num method on it.

reply

Derbasti 2 days ago

link

Another language that supports this is Julia: In Julia, `/` is division, but `//` creates a rational number. Thus, in Julia, `1//10 + 2//10 - 3//10 == 0//1`, and `1 / (1//10 + 2//10 - 3//10) == 1//0`, where `isinf(1//0) == true` and `0//1 == 0`.

But the best thing is that because of Julia's type system, every subsequent operation on rationals will happen rationally as well, including (fast) linear algebra. Obviously this excludes things like `sqrt` or `fft`.

(http://docs.julialang.org/en/release-0.3/manual/complex-and-...)

reply

cnvogel 2 days ago

link

> Another language that supports this is Julia:

As does, for example, scheme (tests done in racket).

    > (- (+ (/ 1 10) (/ 2 10)) (/ 3 10))
    0
    > (= (- (+ (/ 1 10) (/ 2 10)) (/ 3 10)) 0)
    #t

reply

arocks 2 days ago

link

I prefer Python's explicit approach to decimal types:

  $ python -c "from decimal import Decimal as D; print(D('.1')+D('.2')-D('.3'))"
  0.0

Or if you prefer true rational numbers:

  $ python -c "from fractions import Fraction as F; print(F('.1')+F('.2')-F('.3'))"
  0
  $ python -c "from fractions import Fraction as F; print(F('3.1415927'))"
  31415927/10000000

reply

herge 2 days ago

link

What other languages call all-important features Python calls a module in the standard library :)

reply

draegtun 2 days ago

link

You could say the same thing about perl5 then :)

    $ perl -E "use bignum; say .1 + .2 - .3"
    0
    $ perl -E "use bigrat; say .1 + .2 - .3"
    0
    $ perl -E "use bigrat; say 3.1415927 / 1"
    31415927/10000000

ref: http://perldoc.perl.org/bignum.html

http://perldoc.perl.org/bigrat.html

reply

---

thread on floating-point determinism: https://news.ycombinator.com/item?id=9141303

the only interesting parts imo were

"

trsohmers 5 hours ago

What most people don't realize is that the IEEE Floating Point standard (754-2008) actually doesn't standardize when a float should be rounded. Since most CPUs/GPUs are able to do FMAs (Fused Multiply-add)'s that are rounded after the operation, you get a different result than if you were to do a multiplication operation, then round, then add, and then round again. ... "
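
the FMA effect is easy to demonstrate; a Python sketch, using Fraction to compute the single-rounding (FMA-style) answer for comparison:

    from fractions import Fraction

    # a*b + c with a = b = 1 + 2**-27 and c = -(1 + 2**-26). The exact product
    # is 1 + 2**-26 + 2**-54; rounding it to a double drops the 2**-54 tail.
    a = b = 1.0 + 2.0 ** -27
    c = -(1.0 + 2.0 ** -26)

    mul_then_add = a * b + c                                 # two roundings
    fused = float(Fraction(a) * Fraction(b) + Fraction(c))   # one rounding
    print(mul_then_add, fused)   # 0.0 vs 5.551115123125783e-17 (= 2**-54)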

and: https://news.ycombinator.com/item?id=9141679

"

antiuniverse 5 hours ago

Here's an outstanding article that goes into the nuance of actually achieving floating point determinism: http://gafferongames.com/networking-for-game-programmers/flo...

My takeaway: It's hard enough that I'm probably never going to attempt to implement lockstep simulation, and if I was forced to I'd probably do everything in fixed point for my own sanity.

"

http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/

i think i'd like Oot to have floating-point determinism.. but i don't want to make platform implementers' lives hard, so maybe that is a feature that is not required of every Oot implementation?

---

so, from http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/ , what do you need to do?

---

which makes me think: hey, if we're going to slow down floating-point for the sake of consistency... why not just mandate 64-bit floats (double-precision)? otoh maybe some old or embedded architectures support single-precision and not double-precision so for compatibility we should choose single?

http://programmers.stackexchange.com/questions/188721/when-do-you-use-float-and-when-do-you-use-double says basically to use double by default; but it also says

" Some platforms (ARM Cortex-M2, Cortex-M4 etc) don't support double (It can always be checked in the reference manual to your processor. If there is no compilation warnings or errors, it does not mean that code is optimal. double can be emulated.). That is why you may need to stick to int or float. "

---

it's key to make "Decimal" (or Rational) arithmetic available and standard in Oot, since business-y accounting-y stuff is the most common use-case

---

so what are we going to do about floating point determinism?

i'd like oot to guarantee cross-platform arithmetic determinism, unless the user enables some sort of 'make math faster by using platform-specific floating point' optimization.

maybe i'd even like fixed-point by default (after all, it's better for eg finance), as some sort of 'rational' (ratio) type.

but otoh i'm not a numerical computing expert and this isn't a huge priority for me, so probably best to warn that, although cross-platform floating point indeterminism IS a bug, i HAVEN'T put very much care into testing for it, so there are likely to be many such bugs.

should we automatically fall back to floating point when the denominator gets too big, like Perl6, or should we use arbitrary-precision arithmetic?

---

so we want to use rationals but then what about irrational numbers (both transcendentals and non-transcendental irrationals such as sqrt(2))? Surely we aren't expected to do symbolic math here?

--- " Reminded me about the excellent talk about Julia: "Julia: to Lisp or not to Lisp?" https://www.youtube.com/watch?v=dK3zRXhrFZY

One thing he points out early is that both the C99 and the R6RS Scheme specs are 20% numerical. Because correct and (reasonably) fast numbers and arithmetic is actually pretty hard to get right on a computer - if you want to abstract away hardware "short-cuts" and allow for precise arithmetic by default. ...

bjourne 3 days ago

> Another point with the above loop, is that ideally, even if it can't be optimized/memoized down to a constant - it really shouldn't have to be much slower than its C counterpart. Except for handling bignums in some way or other (perhaps on overflow only).

Sadly, bignum handling is very expensive even with type hinting. Essentially, every time two numbers are added or subtracted you have to check for overflow:

    int z = hw_add(x, y);
    if (over/under-flowed?()) {
      bignum* bn = new bignum();
      bignum_add(to_bignum(x), to_bignum(y), bn);
      return bn;
    }
    return z;

So most modern statically typed languages (none named, none forgotten!) actually cheat and use modular arithmetic instead. For example, the straightforward way of computing 9223372036854775807 + 1 on a 64bit system in one of those languages is likely to yield an incorrect result.

Which imho is complete bullshit because there is no reason to sacrifice correctness for performance other than to win benchmarking contests.

reply

rurban 3 days ago

This overflow check is extremely cheap. Just check the overflow flag on the cpu. jo +4; jmp bignum; 6 bytes. C compilers didn't support it until last year, so everyone just used inline assembly. Now we have it in gcc 5 and clang 3.4.

One good trick started by lua is to use double for everything and check the mantissa of the normalized result for overflows. A double can represent the whole int32 range. With double and nan-tagging you have a unique fast representation of all primitives and can use a double word stack. Which is the biggest win. Look at how python, perl and ruby are doing this. Horrible. And then look how a fast dynamic language is doing it.

reply
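
a minimal Python sketch of the nan-tagging idea rurban describes (the tag-bit placement is my own arbitrary choice, and a real implementation must also keep genuine NaN results out of the tagged space):

    import struct

    QNAN = 0x7FF8_0000_0000_0000   # exponent all ones + quiet bit: a quiet NaN
    INT_TAG = 1 << 50              # one of the ~51 free NaN payload bits

    def box_int(i: int) -> int:
        """Hide a 32-bit integer in the payload of a quiet NaN."""
        return QNAN | INT_TAG | (i & 0xFFFF_FFFF)

    def box_float(x: float) -> int:
        """An ordinary double is stored as its own bit pattern."""
        return struct.unpack("<Q", struct.pack("<d", x))[0]

    def unbox(bits: int):
        if bits & (QNAN | INT_TAG) == QNAN | INT_TAG:
            return bits & 0xFFFF_FFFF                             # tagged int
        return struct.unpack("<d", struct.pack("<Q", bits))[0]    # plain double

    assert unbox(box_int(42)) == 42
    assert unbox(box_float(2.5)) == 2.5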

bjourne 3 days ago

Everything is relative. :) Compared to the add instruction itself, the overflow check is very expensive. But the big problem is that your types widen. E.g:

    // x and y are hardware integers >= 0
    var z = x + y
    for (var i = 0; i < z; i++) {
        ...
    }

If your language used modular arithmetic, 'z' would also be a hardware integer and the induction variable 'i' could be kept in a cpu register. But with real arithmetic using bignums you can't assume that anymore and each iteration of the loop must do a relatively expensive comparison of 'i' against 'z'. In pseudo code:

    var z = x + y
    var i = 0
    if (!hw_int(z)) i = new_bignum(0)
    while (true) {
      // i < z
      if (hw_ints?(i, z)) {
        if (i >= z) break
      } else {        
        if (bignum_gte(i, z)) break
      }        
      ... body ...
      // i++
      if (hw_int?(i)) {
        i++;
      } else {
        i = bignum_add(i, 1)
      }
    }

reply

rurban 2 days ago

No, it's super cheap. overflows happen <1%, so you mark it as UNLIKELY, and the branch prediction units just don't care about all those bignum branches.

If you know to use bigints or bignums, just type it and use their variants from the beginning.

Most better dynamic languages already use tagged ints, so they have only say ~30bits for an int and do a lot of those overflow checks and still beat the hell out of the poorly designed dynamic languages with full boxed ints or nums.

reply

---

earlz 67 days ago

In the rationale document they explain that arbitrary length integers were too difficult to determine a reasonable gas cost around for math etc.

---

"Finally, one language feature which I would never wish to emulate is Java's number hierarchy.... In Java, there are primitives like int, float, boolean, which are not part of Java's class hierarchy ... Java also has wrapper classes for each of the primitive types: Integer, Float, Boolean, and so on....Note also that int is wrapped by Integer and not Int. Why? I don't know....

It gets worse! The number class hierarchy in Java is flat, meaning that all of the numeric classes (Byte, Integer, Double, Short, Float, Long) descend directly from Number, and Number implements no methods to do anything other than convert a given Number to a particular primitive type.

I wouldn't wish Java's Number hierarchy on my worst enemy. "

---

https://news.ycombinator.com/item?id=9089112

" In Python, given an array xs = [3.1, 1.2, 4.3, 2.2], I can write

    xs.sort()

and get [1.2, 2.2, 3.1, 4.3]

In Haskell

    sort xs

In Swift

    sort(&xs)

In Rust you have to spew this monstrosity

    xs.sort_by(|a, b| a.partial_cmp(b).unwrap_or(Less))

The Rust position appears to be that sorting an array of floats is unreasonable and so you must be "punished" by not being allowed to use the built-in .sort() function.

...

Whenever I've seen people complaining about Rust religiosity, it's mostly because other languages trained them to be careless about certain things.

...

For instance, if you write code that depends on sorting a list of floating point values in Python, like in your example, and write all of your unit tests and design using non-NaN floating point values, then use a third party library that winds up producing a NaN value, you are likely to be quite surprised by the outcome:

  >>> nan = float('nan')
  >>> xs = [1, nan, 0, 3, 5, 2]
  >>> sorted(xs)
  [1, nan, 0, 2, 3, 5]

Now, deep in the middle of production code, that result might not be so apparent. Most of the values that you care about are ordered correctly; but eventually you'll hit the fact that the 1 is sorted before the 0, so you'll have some strange, hard to reproduce bug, that depends on the precise ordering of the original array.

...

There are several possible ways you could deal with it; one is the one you mentioned, where you just define some way of comparing NaN so that you now have a total order. Another would be to panic any time you try to do an undefined operation like comparison on a NaN. Or you could work with a type that is restricted to non-NaN values, and deal with the issue only at the boundary of components which convert between arbitrary floating point values and your restricted subset (and any operations that may produce NaN values).

In order to not have to write those cumbersome sort expressions by hand every time, if you need to work with floating point numbers with one of the non-standard semantics described above, you could define any of the above behaviors by creating a newtype that wraps floats but provides the semantics you want. "

see also https://github.com/rust-lang/rust/issues/10320 :

nikomatsakis commented on Feb 24, 2014

...

I think there are two seemingly irreconcilable constraints:

    The operators (< etc) should ideally work as they do in nearly every other language (with possible exception of Haskell, which I believe uses a total equality for floats). This implies that floats are not totally ordered.
    For most generic code, total ordering is mandatory.

The palette of options then becomes:

    Use total ordering everywhere, implementing some version for floats:
        Pro: Ord and operators do the same thing.
        Con: Non-standard behavior of operators, possible porting hazard and performance penalty. In particular, floats do not use the builtin hardware operations, which may impact the use of things like SIMD operations and so forth.
    Use partial ordering for operators, total ordering for generics:
        Pro: Basic floating point operators use the built-in hardware and are consistent with other languages. Extension to SIMD and so forth are clear.
        Con: Generic code generally doesn't get to use operators, unless it is willing to accept partial ordering. This distinction is subtle and people may choose the wrong thing (e.g., a sort may only require the bound for the < operator, even though it expects a total ordering).
        Con: Ord and operators diverge in behavior.

Precedents: I believe Haskell took approach 1 and Java chose approach 2. Note that in both cases, I am assuming that we implement some total ordering for floats, whatever it is, as that seems strictly better, though opinions may vary here.

glaebhoerl commented on Feb 24, 2014

@nikomatsakis You seem to be at least partly re-hashing the discussion we had in #12442 https://github.com/rust-lang/rust/issues/12442 . My understanding of the relevant tradeoffs was here https://github.com/mozilla/rust/issues/12442#issuecomment-35809637 . Further elaboration on my part between two of the options was here https://github.com/mozilla/rust/issues/12442#issuecomment-35878234 . In particular, the proposal I describe at greater length in the latter comment (you can read it as an RFC if you like) is different from both of the options you just described.

thestinger commented on Feb 24, 2014

#12517 is the issue for @glaebhoerl's suggestion, although with altered naming.
pongad commented on Feb 24, 2014

I agree with the discussion above. Though, I'm a little confused: will the comparison operators on user defined types work off of Ord or PartialOrd?

thestinger commented on Feb 24, 2014

The comparison operators will be tied to PartialOrd and PartialEq. If a type implements Ord and Eq, then the comparison operators implement a total ordering. The Eq trait won't provide any methods itself; trait inheritance will provide them.

pongad commented on Feb 24, 2014

I'm not quite sure what you mean by the last sentence. Is there some kind of design doc that I can look up? (Sorry if it's in the discussion above and I missed it.)

thestinger commented on Feb 24, 2014

I opened #12520 with a list of things to change but I haven't written new documentation for std::cmp yet.

https://github.com/rust-lang/rust/pull/12520

thestinger commented on Feb 24, 2014

@pcwalton: The IEEE 754 totalOrder does use a sane mathematical ordering. It considers -0 to be less than +0, and NaN as equal to itself, but it doesn't distinguish between different encodings of the same value.

pcwalton commented on Feb 24, 2014

It's not fast to implement in software though. I talked to sunfish at lunch and apparently very few people use the IEEE 754 total ordering when they need an ordering for binary trees—they just bitcast to int and use that.

bill-myers commented on Feb 24, 2014

Floats have multiple orderings useful in different cases, so the only reasonable solution seems to be to force the user to explicitly select which ordering they want for floats.

In practice, that means accepting this pull request and adding a bunch of newtypes implementing normal order with NaNs banned, integer order, IEEE 754 total order, normal order except for -nan < -inf, and so on.

BTW, I think IEEE 754 total order can be implemented like this, which shouldn't be terribly inefficient (at least if the float is in memory, and thus doesn't need to be moved from FPU registers to ALU registers):

    fn total_order(af: f32, bf: f32) -> Ordering {
        let a: i32 = unsafe { transmute(af) };
        let b: i32 = unsafe { transmute(bf) };
        if (a & b) >= 0 {
            // both positive, or different signs
            a.cmp(&b)
        } else {
            // both negative
            b.cmp(&a)
        }
    }

or without the if:

    fn total_order(af: f32, bf: f32) -> Ordering {
        let a: i32 = unsafe { transmute(af) };
        let b: i32 = unsafe { transmute(bf) };
        // if both negative, all ones, else all zeroes
        let mask: i32 = (a & b) >> 31;
        (a ^ mask).cmp(&(b ^ mask))
    }

EDIT: fixed, oops, also improved

see also

https://github.com/rust-lang/rust/issues/12517

note that Haskell appears to use some kind of a total order but this guy says Haskell's sorts don't work well:

Reflexive, Transitive, and Symmetric

That's the defining characteristics of an equivalence relation; consider the rationals: 1/2 == 2/4 even though the bit patterns are different. So no, that's not really a problem in itself.

The bigger issue is that since +0 and -0 are equal, division is not a well-defined function with respect to IEEE equality, as 1 / +0 == +∞ != -∞ == 1 / -0

It can be amusing to play with various library implementations of sorting algorithms and their interaction with NaNs: Python actually behaves pretty well, as its built-in sort appears to double as a topological sort, but Haskell's sort behaves quite badly.

Thanks for the informative post! By Leon P Smith at Fri, 2010-02-12 17:08


is this true? someday i should do this search:

https://www.google.com/search?safe=active&client=ubuntu&hs=sdZ&channel=fs&ei=FOPsXIDpBIjmsAW05ZaACA&q=floating+point+equality+Haskell%27s+sort+total+order&oq=floating+point+equality+Haskell%27s+sort+total+order&gs_l=psy-ab.3..33i10i160j33i10.405740.409280..409398...0.0..0.254.3262.0j22j1......0....1j2..gws-wiz.......0i71j35i39j0i22i30j0j33i160j33i299.iJ7d1y3Z3O4

someone above said that they think Haskell has a total order, but i dunno if that's correct, b/c Hoogle says:

https://hackage.haskell.org/package/base-4.12.0.0/docs/Prelude.html#v:-61--61-

" Note that due to the presence of NaN?, Double's Eq instance does not satisfy reflexivity.

>>> 0/0 == (0/0 :: Double) False "

alextgordon on Feb 22, 2015

My preference, all considered would be a .sort() that pushes the NaNs to the front or back (but is slightly slower), and a .sort_unsafe() that assumes no NaNs but is faster.

[1]

see also https://mail.haskell.org/pipermail/libraries/2011-September/016805.html

---

another option is to make NaNs cause errors. that isn't supported by most platforms, though, so the language would have to check for NaNs on every operation

---

" Trichotomy is not the only property of real arithmetic that does not hold for floats, nor even the most important. For example:

    Addition is not associative.
    The distributive law does not hold.
    There are floating-point numbers without inverses.

I could go on. It is not possible to specify a fixed-size arithmetic type that satisfies all of the properties of real arithmetic that we know and love. " -- [2]

---

" NaN? != NaN? originated out of two pragmatic considerations:

    That x == y should be equivalent to x - y == 0 whenever possible (beyond being a theorem of real arithmetic, this makes hardware implementation of comparison more space-efficient, which was of utmost importance at the time the standard was developed — note, however, that this is violated for x = y = infinity, so it’s not a great reason on its own; it could have reasonably been bent to (x - y == 0) or (x and y are both NaN)).
    More importantly, there was no isnan( ) predicate at the time that NaN was formalized in the 8087 arithmetic; it was necessary to provide programmers with a convenient and efficient means of detecting NaN values that didn’t depend on programming languages providing something like isnan( ) which could take many years." -- [3]

---

the full comment in the last two sections:

" I was a member of the IEEE-754 committee, I'll try to help clarify things a bit.

First off, floating-point numbers are not real numbers, and floating-point arithmetic does not satisfy the axioms of real arithmetic. Trichotomy is not the only property of real arithmetic that does not hold for floats, nor even the most important. For example:

    Addition is not associative.
    The distributive law does not hold.
    There are floating-point numbers without inverses.

I could go on. It is not possible to specify a fixed-size arithmetic type that satisfies all of the properties of real arithmetic that we know and love. The 754 committee has to decide to bend or break some of them. This is guided by some pretty simple principles:

    When we can, we match the behavior of real arithmetic.
    When we can't, we try to make the violations as predictable and as easy to diagnose as possible.

Regarding your comment "that doesn't mean that the correct answer is false", this is wrong. The predicate (y < x) asks whether y is less than x. If y is NaN, then it is not less than any floating-point value x, so the answer is necessarily false.

I mentioned that trichotomy does not hold for floating-point values. However, there is a similar property that does hold. Clause 5.11, paragraph 2 of the 754-2008 standard:

    Four mutually exclusive relations are possible: less than, equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself.

As far as writing extra code to handle NaNs goes, it is usually possible (though not always easy) to structure your code in such a way that NaNs fall through properly, but this is not always the case. When it isn't, some extra code may be necessary, but that's a small price to pay for the convenience that algebraic closure brought to floating-point arithmetic.

Addendum: Many commenters have argued that it would be more useful to preserve reflexivity of equality and trichotomy on the grounds that adopting NaN != NaN doesn't seem to preserve any familiar axiom. I confess to having some sympathy for this viewpoint, so I thought I would revisit this answer and provide a bit more context.

My understanding from talking to Kahan is that NaN != NaN originated out of two pragmatic considerations:

    That x == y should be equivalent to x - y == 0 whenever possible (beyond being a theorem of real arithmetic, this makes hardware implementation of comparison more space-efficient, which was of utmost importance at the time the standard was developed — note, however, that this is violated for x = y = infinity, so it’s not a great reason on its own; it could have reasonably been bent to (x - y == 0) or (x and y are both NaN)).
    More importantly, there was no isnan( ) predicate at the time that NaN was formalized in the 8087 arithmetic; it was necessary to provide programmers with a convenient and efficient means of detecting NaN values that didn’t depend on programming languages providing something like isnan( ) which could take many years. I’ll quote Kahan’s own writing on the subject:
    Were there no way to get rid of NaNs, they would be as useless as Indefinites on CRAYs; as soon as one were encountered, computation would be best stopped rather than continued for an indefinite time to an Indefinite conclusion. That is why some operations upon NaNs must deliver non-NaN results. Which operations? … The exceptions are C predicates “ x == x ” and “ x != x ”, which are respectively 1 and 0 for every infinite or finite number x but reverse if x is Not a Number ( NaN ); these provide the only simple unexceptional distinction between NaNs and numbers in languages that lack a word for NaN and a predicate IsNaN(x).

Note that this is also the logic that rules out returning something like a "Not-A-Boolean". Maybe this pragmatism was misplaced, and the standard should have required isnan( ), but that would have made NaN nearly impossible to use efficiently and conveniently for several years while the world waited for programming language adoption. I'm not convinced that would have been a reasonable tradeoff.

To be blunt: the result of NaN == NaN isn't going to change now. Better to learn to live with it than to complain on the internet. If you want to argue that an order relation suitable for containers should also exist, I would recommend advocating that your favorite programming language implement the totalOrder predicate standardized in IEEE-754 (2008). The fact that it hasn't already speaks to the validity of Kahan's concern that motivated the current state of affairs.

" -- [4]


Wikipedia says what IEEE totalOrder does:

" Total-ordering predicate

The standard provides a predicate totalOrder which defines a total ordering for all floating-point data for each format. The predicate agrees with the normal comparison operations when one floating-point number is less than the other one. The normal comparison operations, however, treat NaNs as unordered and compare −0 and +0 as equal. The totalOrder predicate orders all floating-point data strictly and totally. When comparing two floating-point numbers, it acts as the ≤ operation, except that totalOrder(−0, +0) ∧ ¬ totalOrder(+0, −0), and different representations of the same floating-point number are ordered by their exponent multiplied by the sign bit. The ordering is then extended to the NaNs by ordering −qNaN < −sNaN < numbers < +sNaN < +qNaN, with ordering between two NaNs in the same class being based on the integer payload, multiplied by the sign bit, of those data.[21] " [5]

i gotta say, to me that makes a lot more sense.

I think for oot, we should implement IEEE totalOrder, and expose the other stuff as some function 'fcmp' (we can even give it an infix symbolic form, although i don't know what it should be (.=? `=? =`? `=`? `==? ==:? :==? .==? "==?"? )). I want '==' in Oot to be an equivalence relation, darn it.
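
here's roughly what that looks like: a Python sketch of a totalOrder-compatible sort key for binary64, using the same bitcast trick as the Rust snippets above:

    import struct

    def total_order_key(x: float) -> int:
        """Sort key implementing the IEEE 754 totalOrder relation for binary64."""
        (bits,) = struct.unpack("<q", struct.pack("<d", x))
        # doubles are sign-magnitude integers; flipping the magnitude bits of
        # the negatives makes plain integer comparison order everything as
        # -qNaN < -sNaN < -inf < ... < -0 < +0 < ... < +inf < +sNaN < +qNaN
        return bits if bits >= 0 else bits ^ 0x7FFF_FFFF_FFFF_FFFF

    xs = [0.3, float("nan"), -0.0, 0.0, float("-inf"), 1.5]
    print(sorted(xs, key=total_order_key))
    # [-inf, -0.0, 0.0, 0.3, 1.5, nan] -- and -0.0 sorts before 0.0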

---

FullyFunctional on May 18, 2017

325 comments presently and sadly no one commented on what I find most exciting about Swift: integer overflow is trapped! FINALLY (the only other sane option is multi-precision integers, like in Scheme or Integer in Haskell). Does Kotlin do the same?

I presume Kotlin also traps out-of-range array accesses.

---

if True is represented as bitwise-not of 0, that is the binary word with all 1 bits, then a problem with that is that the integer representing True differs based on word size

---

"Python rounds float values by converting them to string and then back (github.com)"

https://github.com/python/cpython/blob/master/Objects/floatobject.c#L965-L972

---

mb have a type for floats without infs and nans? more of the normal algebraic rules about equality etc probably hold for that

---

https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/

---

https://www.bitsnbites.eu/ieee-754-suggestion-a-core-subset/

" Suggestion: A “core” subset

I think that it would be great if the IEEE 754 standard could be split into a light weight “core” subset, and a “full” feature set (the latter being more or less identical to the current standard).

Compared to the full version, the core subset would have the following properties:

    No denormalized numbers. All denormalized numbers are treated as zero, and any result that would be a denormalized number will be turned into zero (a.k.a. “flush-to-zero”).
    Only a single rounding mode: Round to nearest, ties to even (the default rounding mode in IEEE 754).
    No exceptions. Any operation that would throw an exception will silently produce a well defined result (e.g. sqrt(-1) will return NaN).
    All NaN:s are treated as quiet NaN:s. There are no signalling NaN:s.

The latter two bullet points aim to simplify hardware implementations and optimize for parallelized floating point execution (SIMD, out-of-order, etc). For some solid arguments against fault trapping in floating point hardware, see Agner Fog’s “NAN propagation versus fault trapping in floating point code”.

The core subset would acknowledge the current state of diverse uses of floating point in modern hardware (such as for graphics and AI), and improve software portability between different platforms. Additionally it would lower the barrier for adding standards compliant floating point functionality to new hardware. "

---

stephencanon on April 14, 2020

Clang and GCC's approach for these operations is even nicer FWIW (__builtin_[add/sub/mul]_overflow(a, b, &c)), which allow arbitrary heterogenous integer types for a, b, and c and do the right thing.

stephencanon on April 14, 2020

It feels like it would be a real shame to standardize something that gives up the power of the Clang/GCC heterogeneous checked operations. We added them in Clang precisely because the original homogeneous operations (__builtin_smull_overflow, etc) led to very substantial correctness bugs when users had to pick a single common type for the operation and add conversions. Standardizing homogeneous operations would be worse than not addressing the problem at all, IMO. There's a better solution, and it's already implemented in two compilers, so why wouldn't we use it?

The generic heterogeneous operations also avoid the identifier blowup. The only real argument against them that I see is that they are not easily implementable in C itself, but that's nothing new for the standard library (and should be a non-goal, in my not-a-committee-member opinion).
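
to make the semantics concrete, a Python sketch of what even the homogeneous i32 case has to report (the clang/gcc builtins generalize this across mixed operand and result types):

    def add_overflow_i32(a: int, b: int):
        """Mimic __builtin_add_overflow with an i32 result: the wrapped sum,
        plus a flag saying whether it differs from the true sum."""
        true_sum = a + b                      # Python ints don't overflow
        wrapped = (true_sum + 2 ** 31) % 2 ** 32 - 2 ** 31
        return wrapped, wrapped != true_sum

    print(add_overflow_i32(2 ** 31 - 1, 1))   # (-2147483648, True)
    print(add_overflow_i32(40, 2))            # (42, False)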

---

millstone on April 15, 2020

Being brutal heterodox: STOP WRITING SIGNED ARITHMETIC.

Your code assumes that negating a negative value is positive. Your division check forgot about INT_MIN / -1. Your signed integer average is wrong. You confused bitshift with division. etc. etc. etc.

Unsigned arithmetic is tractable and should be treated with caution. Signed arithmetic is terrifying and should be treated with the same PPE as raw pointers or `volatile`.

This applies if arithmetic maps to CPU instructions, but not to Python or Haskell or etc. If you have automatic bignums, signed arithmetic is of course better.
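
the "signed integer average is wrong" item is the classic example; a Python sketch simulating 32-bit wraparound:

    def wrap32(x: int) -> int:
        """Wrap a Python int to two's-complement signed 32-bit."""
        return (x + 2 ** 31) % 2 ** 32 - 2 ** 31

    a = b = 2 ** 31 - 1            # both INT_MAX; the true average is INT_MAX
    print(wrap32(a + b) // 2)      # -1: a + b overflowed before the divide
    print(a + (b - a) // 2)        # 2147483647: the overflow-free form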

---

cesarb 13 hours ago

This kind of issue is the reason why some more modern languages like Rust or Go do not have implicit narrowing conversions. For instance, in Rust, trying to simply pass a usize (Rust's equivalent of size_t) to a function which expects an i32 (Rust's equivalent of int) will not compile; the programmer has to write "size as i32" (Rust's equivalent of "(int) size"), which makes it explicit that it might truncate the value at that point.

(Some Rust developers argue that even "size as i32" should be avoided, and "size.try_into()" should be used instead, since it forces the programmer to treat an overflow explicitly at runtime, instead of silently wrapping.)

reply

derefr 12 hours ago

> Some Rust developers argue that even "size as i32" should be avoided, and "size.try_into()" should be used instead, since it forces the programmer to treat an overflow explicitly at runtime, instead of silently wrapping.

It's important to still have the option for efficient truncating semantics, though; some software (e.g. emulators) needs to chunk large integers into 2/4/8 smaller ones, and rotation + truncating assignment is usually the cheapest way to do that.

But, importantly, this is a rare case. Most software that does demoting casts does not mean to achieve these semantics.

So I wonder — are there any low-level/systems languages where a demoting cast with a generated runtime check gets the simple/clean syntax-sugared semantics (to encourage/favor its use), while truncating demotion requires a clumsier syntax (to discourage its use)?

reply

tialaramex 11 hours ago

Right, this is a small infelicity in Rust, it is easier to write

  let x = size as i32;

... even if what you meant was closer to

  let x: i32 = size.try_into().expect("We are 100% sure size is small enough to fit into x");

But at least it isn't C or C++ where you might accidentally write

  x = size;

... and the compiler doesn't even warn you that size is bigger than x and you need to think about what you intended.

It's really hard to fix this in C++. Some of the Epoch proponents want to do so using epochs to get there, basically you'd have a "new" epoch of C++ in which narrowing must be explicit, and old code would continue to have implicit narrowing so it doesn't break.

---

https://futhark-lang.org/blog/2021-08-05-half-precision-floats.html

Isaacy 2 days ago

link flag

The problem is that almost no one will use IEEE halfs, for the reasons in the blogpost. F16 is probably only ever going to be used in machine learning, and even there people have in some cases moved on to u2 (xnor.ai, inference-only) or u8-ish (google, TPU, inference-only). I did some research showing that training models can probably be done effectively using an 8-bit float if you expand to a higher precision during the summation phase of Kronecker products in backpropagation.

For machine learning, as the article says, bfloat16 is the most popular (google, TF/TPU/gpus), but nvidia also has the confusingly named TensorFloat32, which is actually a 19-bit format stored in 32 bits.

---

some posts about arithmetic in small bitwidth assembly:

http://cowlark.com/2018-02-27-6502-arithmetic/index.html http://cowlark.com/2018-03-18-z80-arithmetic/index.html http://cowlark.com/2020-10-19-6303-arithmetic/index.html

---

https://lobste.rs/s/ny8gjz/new_integer_types_i_d_like_see

---