proj-oot-ootNotes27

rthomas6 4 hours ago [-]

One "non-programmer" perspective here: Python is the new Perl, except it's fast to read AND write. In Python it's fast to do ...everything. I'm an electrical engineer by trade, and Python gets used a lot.

    import csv
    with open(filename) as csvfile:
        csvreader = csv.DictReader(csvfile)
        data = list(csvreader)

Bam, an array of dictionaries with the data in them, indexed by column name. With listcomps, whatever I need to do will only take a few more lines. And it's this easy for almost any data format. There will be a widely used library for me to `pip install`.
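For instance (a minimal sketch of that listcomp step; the "voltage" column name is hypothetical, not from the comment):

    over_limit = [row for row in data if float(row["voltage"]) > 3.3]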

I don't have to do any compile steps. I don't have to use a special IDE. I don't have to think in objects and state-passing and all that BS (but I can if I want to!). I don't have to search for some arcane library. I don't have to know a magic build sequence. I don't have to edit config files. I just make cool shit really fast. It runs slower, but for most things I use it for, I don't care if it takes 10 us or 10 seconds. I just want to make it work.

reply

---

uvtc 6 hours ago [-]

> Looking forward to the "why" post. Glad that they mentioned that they'll be publishing that soon.

I think the "why" is because, for many users:

Python has its own set of problems, but it hits all those marks except for the last one. So, it's gotten quite popular.

reply

tim333 4 hours ago [-]

A lot of people like the Python syntax. It has to be about the most readable of all computer languages.

reply

---

zimablue 17 hours ago [-]

I moved from my last job (mostly C#/VBA, front office finance) to a new role (Python, data guy at hedge fund).

Doing data stuff and simple web services is absurdly more straightforward in Python; the main things I miss (weaknesses of Python relative to C#, and I'm guessing Java):

nice parallel options (I know several options exist, but I haven't found any of them as easy to get into as C# async/await; the GIL is the problem, I guess)

the Django database layer doesn't do smart diffs in the same way as .NET DB projects (in .NET it's smart enough to actually look at your code schema vs. the database and work out how to roll forward/back; in Django it just uses a combination of your code schema and a table describing what has and hasn't been rolled out yet, making everything a bit scarier and tougher if anything goes wrong). I don't know if SQLAlchemy does this.

edit: also, I once read an HN comment arguing that a problem at the heart of Python is that block syntax forces you into having only trivial inline lambdas; after writing it for a while I think they might have been right
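(A small illustration of that point, mine rather than the commenter's: a Python lambda body is limited to a single expression, so anything involving statements has to be promoted to a named def.)

    # fine: a single expression
    key = lambda row: (row["date"], row["id"])

    # a lambda cannot contain statements (that would be a SyntaxError),
    # so logging, assignment, etc. force a separate def:
    def key_with_logging(row):
        print("sorting", row["id"])
        return (row["date"], row["id"])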

reply

mixmastamyk 24 minutes ago [-]

Trivial lambdas make Python more readable than highly nested languages.

reply

---

mirekrusin 1 day ago [-]

This is the first time I've seen the Zig language [1]. A (self-proclaimed?) C successor with manual memory management, ironed-out edge cases, a new take on error handling (that resembles well-written error handling in C), generics, compile-time reflection/execution (macros?), importing .h files works directly, exporting to C works directly, nullable types, etc... it all sounds quite interesting, actually. Does anybody have experience with or comments on Zig, please?

[1] http://ziglang.org/

reply

---

" ES2015 (also known as ES6), the version of the JavaScript? specification ratified in 2015, is a huge improvement to the language’s expressive power thanks to features like classes, for-of, destructuring, spread, tail calls, and much more. "

---

 kbenson 4 hours ago [-]

> So a CoffeeScript => is now output as =>

Wait, using the arrow function notation doesn't bind this, arguments, super, or new.target[1] in javascript, as opposed to the normal function syntax. Not being a user of coffeescript, did coffeescript not do that already, or did they also throw in code to change that in the generated code?

madeofpalk 4 hours ago [-]

Coffeescript's => binds this. Javascript's => also binds this (more or less)

Seems to map fairly well conceptually.

reply

---

" Any decent PL course would emphasize the first-class-everything (procedures , obviously), lexical scooping for proper closures, strong but dynamic typing (no implicit coersions) etc. No matter which sect one belongs to - ML or Scheme - these are universal fundamentals, the well-researched foundation.

PHP/Javascript coders usually do not understand the principles and implications behind languages they use. Otherwise they would develop a healthy aversion, like to a tasteless, vulgar porn.

Sane people with a PL background would use a dialect of Scheme as a DSL (the numerical tower and all the nice things) or at least Lua (because packers are afraid of parentheses and see nothing behind them).

Actually, Scheme (or a simplified dialect such as the one inside Julia - flisp) would be almost ideal choice for financial DSLs.

"

---

 serhei 84 days ago [-]

It sure makes me wonder if Ethereum would do better with a less forgiving programming language. The fact that the syntax resembles Javascript is not reassuring, neither is the fact that the very first code snippet in "Solidity By Example" [1] is littered with comments like this:

        // Forward the delegation as long as
        // `to` also delegated.
        // In general, such loops are very dangerous,
        // because if they run too long, they might
        // need more gas than is available in a block.
        // In this case, the delegation will not be executed,
        // but in other situations, such loops might
        // cause a contract to get "stuck" completely.

---

ghthor 84 days ago [-]

What would be the complications in providing a transpiler for pact into Solidity?

spopejoy 84 days ago [-]

It's possible, but impractical:

The biggest issue facing Solidity developers today is the sheer cost of best practices: ensuring you handle overflows right (ie don't use math primitives but call an approved function), planning for upgrades/data export, you name it: you have to use that code and pay that gas. The environment really needs to provide a lot more "free" functionality than it does today to change this reality.

---

spopejoy 83 days ago [-]

We haven't seen the use-case yet where the (signature,payload) tuple is not isomorphic to a transaction. Yes, in the case of multiple, distinct payloads, you'd have to break those into separate transactions, but that seems like a very specific use-case that doesn't sound very "transactional".

Pact's philosophy sees a blockchain as a special-purpose database, for distributed transactions, so it's not designed for many "normal" database cases, namely bulk operations, searches, heuristics, etc. The use case of accepting multiple signed payloads sounds suspiciously "batchy" to me. Also, Pact is "success-oriented": we see failures like a bad signature as something that should fail your tx only. This is a way of avoiding dicey control-flow logic.

So, if a single payload is what you need the signatures on, you simply design your contract API/function call to have the specific function to handle that data (store it in the database, whatever), and let the environment check the signature.

EDIT: Pact is actually `([Signature],payload)` -- ie, you can sign the same payload multiple times

woah 82 days ago [-]

Signing the same payload multiple times would work for my use case (channels). I also need to accept transactions signed by at least one of two keys. I suspect this might be possible too. However, I can imagine that anything more complicated would go outside of the system you have designed. I haven't had the chance to learn your language, but I would be wary about it either being too limited for edge cases that most real world stuff is going to have, or turning into a "universal framework" antipattern.

spopejoy 82 days ago [-]

> I also need to accept transactions signed by at least one of two keys.

Keysets are designed for precisely this; what's more this rule can now be persisted.

> anything more complicated would go outside of the system you have designed.

Always a possibility with any PL, especially running in a blockchain. Pact makes fewer assumptions about use cases than most, however. It's imperative, allows functions and modules, and offers a database metaphor. That handles a fair number of things.

---

xg15 83 days ago [-]

Also, I think "hard to write" is meant as "require that critical or dangerous details are written explicitly; if a feature adds convenience for writing at the expense of reading, it should be avoided". (Type inference, overloads and reflection come to mind)

I think (hope) that no one is advocating making a language verbose or complex for its own sake.

kronos29296 83 days ago [-]

Like Rust? Rust does most of those things wrt explicit dangerous behaviour.

xg15 83 days ago [-]

Exactly.

---

MicahZoltu 66 days ago [-]

There is an EIP that is slated to land in the upcoming hardfork that adds an EVM opcode for effectively "const" call that is enforced by the runtime: https://github.com/ethereum/EIPs/pull/2

---

variables should be local (not global) by default.

functions and fields should be private (not public) by default.

---

kbenson 68 days ago [-]

...PHP at that time was a mix of some good ideas (ease of deployment) with some horrible ones (register globals on by default is inexcusable), and some poor planning. We all lived with the fallout of that for a very long time.

---

" We still shouldn't give up to look for better ways to code, but I think the better ways are easier to use programming languages that preserve the general purposeness of programming, better interactive environments (like F# interactive, Python REPL, etc.) and helpful syntax highlighting and better code annotation systems. Maybe live rendering of markdown and explanatory pictures in the IDE would be a helpful first step. "

--

"

The ID is composed of two parts: a millisecond time and a sequence number. 1506871964177 is the millisecond time, and is just a Unix time with millisecond resolution. The number after the dot, 0, is the sequence number, and is used in order to distinguish entries added in the same millisecond. ... The millisecond part of the ID is obtained using the maximum between the current local time of the Redis server generating the ID, and the last entry inside the stream. So even if, for instance, the computer clock jumps backward, the IDs will continue to be incremental. In some way you can think stream entry IDs as whole 128 bit numbers. "
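A minimal sketch of that ID rule (my own illustration, not Redis source; the "<milliseconds>.<sequence>" format and names follow the quote):

    import time

    last_ms, last_seq = 0, 0

    def next_stream_id():
        # monotonic IDs even if the clock jumps backward: take the max of the
        # wall clock and the last ID, bumping the sequence within a millisecond
        global last_ms, last_seq
        now_ms = int(time.time() * 1000)
        if now_ms > last_ms:
            last_ms, last_seq = now_ms, 0
        else:
            last_seq += 1
        return "%d.%d" % (last_ms, last_seq)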

---

yongjik 1 day ago [-]

After being exposed to several declarative tools during my career, I must say they age poorly: make, autoconf, Tensorflow, and so on. They may start out being elegant, but every successful library is eventually (ab)used for something the original authors didn't envision, and with declarative syntax it descends into madness of "So if I change A to B here does it apply before or after C becomes D?"

At least Tensorflow isn't at that level, because its "declarative" syntax is just yet another imperative language living on top of Python. But it still makes performance debugging really hard.

With PyTorch, I can just sprinkle torch.cuda.synchronize() liberally and the code will tell me exactly which CUDA kernel calls are consuming how many milliseconds. With Tensorflow, I have no idea why it is slow, or whether it can be any faster at all.
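(A minimal sketch of that timing trick, mine rather than the commenter's; assumes PyTorch with a CUDA device available.)

    import time
    import torch

    x = torch.randn(4096, 4096, device="cuda")
    torch.cuda.synchronize()                 # drain any pending kernels first
    t0 = time.perf_counter()
    y = x @ x
    torch.cuda.synchronize()                 # wait for the matmul kernel to finish
    print("matmul took %.2f ms" % ((time.perf_counter() - t0) * 1000))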

reply

rkangel 1 day ago [-]

I believe that make's declarative nature is not the cause of its problems at all - its poor syntax and lack of support for programming abstractions are what make it clunky to use.

Something like rake, which operates on the same fundamental principles (i.e. declarative dependency description) but using Ruby syntax, has aged better.

reply

eru 1 day ago [-]

Indeed. Getting these text-based configuration tools to work requires a lot of experience in language design.

Lots of tools become accidentally Turing complete, like Make. You need to plan these things from the start. If you want any computation possible at all, you need to be extremely vigilant, and base your language on firm foundations. See eg Dhall, a non-Turing complete configuration language (http://www.haskellforall.com/2016/12/dhall-non-turing-comple...).

If you are happy to get Turing completeness, you might want to write your tool as an embedded DSL and piggy-back on an existing language, declarative or otherwise.

reply

acjohnson55 1 day ago [-]

SBT in the Scala world would also fit this description.

reply

---

" UTF-8 has some really useful properties:

    It’s backwards compatible with ASCII, which never used the highest bit.
    Sort order is preserved. Sorting a set of code point sequences has the same result as sorting their UTF-8 encoding.
    No additional zero bytes are introduced. In C we can continue using null terminated char buffers, often without even realizing they hold UTF-8 data.
    It’s self-synchronizing. A leading byte will never be mistaken for a continuation byte. This allows for byte-wise substring searches, meaning UTF-8 unaware functions like strstr(3) continue to work without modification (except for normalization issues). It also allows for unambiguous recovery of a damaged stream.

" -- [1]

---

one nice thing about pass-by-reference in Python (strictly, pass-by-object-reference), as opposed to pass-by-value, is that you can often factor a chunk of code out into a separate subroutine without worrying that updates to composite data structures won't be preserved (as they wouldn't be with pass-by-value); with pass-by-reference, any updates to composite data structures take effect whether or not they are encapsulated in a subroutine.

eg the following:

    blah...
    try:
        input_rows[row_to_process_idx] = next(readers[row_to_process_idx])
    except StopIteration:
        input_rows[row_to_process_idx] = None
    blah...

is equivalent to this:

    def advance_to_next_input_row(row_to_process_idx, input_rows, readers):
        try:
            input_rows[row_to_process_idx] = next(readers[row_to_process_idx])
        except StopIteration:
            input_rows[row_to_process_idx] = None

    blah..
    advance_to_next_input_row(row_to_process_idx, input_rows, readers)
    blah..

---

woah, here's a pretty good argument for Scheme over Common Lisp imo: in CL you can't just do

> (define (adder n) (lambda (x) (+ x n)))

adder

> ((adder 3) 5)

instead, if you have a dynamically computed function like that, you must call it with 'funcall':

" “How do I write a function that returns a function?” is a typical question asked by people who have learned Scheme before they started with Common Lisp. In Scheme, they were accustomed to be able to do things like this:

> (define (adder n) (lambda (x) (+ x n)))

adder

> ((adder 3) 5)

8 ... in Common Lisp...

we can’t use it in the same way we would use it in Scheme:

CL-USER> (adder 3)

    #<Interpreted Function "LAMBDA (N)" {485FFE81}>

CL-USER> ((adder 3) 5)
In: (ADDER 3) 5
  ((ADDER 3) 5)
Error: Illegal function call.

Here is why: CL has different namespaces for functions and variables, i.e. the same name can refer to different things depending on its position in a form that's evaluated:

CL-USER> (boundp 'foo)
NIL
CL-USER> (fboundp 'foo)
NIL
CL-USER> (defparameter foo 42)
FOO

...

To simplify a bit, you can think of each symbol in CL having (at least) two “cells” in which information is stored. One cell - sometimes referred to as its value cell - can hold a value that is bound to this symbol, and you can use boundp to test whether the symbol is bound to a value (in the global environment). You can access the value cell of a symbol with symbol-value.

The other cell - sometimes referred to as its function cell ...

Now, if a symbol is evaluated, it is treated as a variable in that its value cell is returned - see the line marked with * above. If a compound form, i.e. a cons, is evaluated and its car is a symbol, then the function cell of this symbol is used - see the line marked +++ above.

In Common Lisp, as opposed to Scheme, it is not possible that the car of the compound form to be evaluated is an arbitrary form. If it is not a symbol, it must be a lambda expression, which looks like

(lambda lambda-list form*)

This explains the error message we got above - (adder 3) is neither a symbol nor a lambda expression. But, you might ask, how do we use the function object that is returned by adder? The answer is: Use funcall or apply:

;;; continued from above
CL-USER> (funcall (adder 3) 5)
8
... "

-- [2]

---

rusanu 109 days ago [-]

Having spent 7 years of my life working with Pat Helland in implementing Exactly Once In Order messaging with SQL Server Service Broker[0] I can assure you that practical EOIO messaging is possible, exists, and works as advertised. Delivering data EOIO is not rocket science, TCP has been doing it for decades. Extending the TCP paradigms (basically retries and acks) to messaging is not hard if you buy into transacted persisted storage (= a database) for keeping undelivered messages (transmission queue) and storing received messages before application consumption (destination queue). Just ack after you commit locally.

We've been doing this in 2005 at +10k msgs/sec (1k payload), durable, transacted, fully encrypted, with no two-phase commit, supporting long disconnects (I know of documented cases of conversations that resumed and continued after +40 days of partner network disconnect).

Running into resource limits (basically out of disk space) is something the database community knows how to monitor, detect and prevent for decades now.

I really don't get why so many articles, blogs and comments claim this is not working or impossible or even hard. My team shipped this +12 years ago, is used by major deployments, technology is proven and little changed in the original protocol.

[0] https://docs.microsoft.com/en-us

xenadu02 109 days ago [-]

The ACK is also a transaction. Something like this:

1. Receive from server, commit state = NEW, UNACK
2. Send ACK to server, get confirmation from server, commit state = NEW, ACK
3. Start processing, commit state = PROC
4. Got results, commit state = FIN, UNACK
5. Send FIN to server, commit state = FIN, ACK

Each commit is a database transaction where you write the results of that step along with the new state. If anything fails along the way the work-in-progress is discarded along with the state change. The server has an equivalent so if it gets a duplicate ACK for the same (or earlier) state it can ignore it.

In this example, if the client crashes between 1-2, in #2 never gets confirmation, or crashes trying to commit the "NEW, ACK" state then it will retry. The server has already committed the fact that it sent the value to the client and is awaiting an ACK. If it saw the ACK and gets a duplicate it ignores it. If it never saw the first ACK then it will see the second(+) attempt and commit that it saw the ACK before sending confirmation to the client.
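A minimal sketch of that commit-per-step idea (my own illustration, using stdlib sqlite3 as the "transacted persisted storage"; the table and state names are made up):

    import sqlite3

    db = sqlite3.connect("inbox.db")
    db.execute("CREATE TABLE IF NOT EXISTS msgs (id TEXT PRIMARY KEY, state TEXT)")

    def transition(msg_id, new_state):
        with db:  # one durable transaction per step; a crash just means a retry
            db.execute(
                "INSERT INTO msgs (id, state) VALUES (?, ?) "
                "ON CONFLICT(id) DO UPDATE SET state = excluded.state",
                (msg_id, new_state),
            )

    transition("m-1", "NEW,UNACK")   # 1. received from server
    transition("m-1", "NEW,ACK")     # 2. server confirmed our ACK
    transition("m-1", "PROC")        # 3. processing started
    transition("m-1", "FIN,UNACK")   # 4. results committed
    transition("m-1", "FIN,ACK")     # 5. server confirmed the FIN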

ztorkelson 109 days ago [-]

Right. The idea is that by having your database, message queue, and application all share the same transactional context, "reprocessing" the message twice doesn't matter, because the effects are only ever committed exactly once.

It's true that this doesn't work if your processing touches external systems or otherwise escapes the transaction context, but in those cases you do still get at-least-once delivery (or at-most-once, if you choose to commit the receipt before processing the message).

It really is a powerful technology and when leveraged can absolutely reduce the level of effort and cognitive burden to building correct asynchronous systems.

 deepsun 109 days ago [-]

Are you talking about a distributed environment, where network partitions can occur? If yes, then there's the Two Generals Problem and the "FLP result", which just prove it impossible. So I guess you're talking about a non-distributed environment.

In other words, to reliably agree on a system state (whether message id was delivered) you need the system to be Consistent. And per CAP theorem, it cannot be Available in presence of Partitions.

So other people you're referring to probably talk about distributed systems.

rusanu 109 days ago [-]

Yes, I'm talking about distributed systems and I am aware of the CAP theorem. Hence my choice of the word 'practical'.

As I said, users had cases when the plumbing (messaging system) recovered and delivered messages after +40 days of network partitioning. Correctly written apps completed the business process associated with those messages as normal, no special case. Humans can identify and fix outages and databases can easily outlast network outages (everything is durable, transacted, with HA/DR). And many business processes make perfect sense to resume/continue after the outage, even if it lasted for days.

alexbeloi 109 days ago [-]

I'm not really versed in this topic, but it seems like using a database for a socket makes the system entirely centralized around that database. Is there something I'm missing?

ztorkelson 109 days ago [-]

ServiceBroker, at least, had the capability of (transactionally) sending messages between databases. So, if you drank the kool-aid (I did; it wasn't so bad), there needn't be "the centralized database". You can separate your databases and decompose your services, and indeed it's easier to do so correctly and with confidence because the technology eliminates a lot of hairy edge cases.

rusanu 109 days ago [-]

Pat had some opinions about CAP and SOA and distributed systems, see [0]. I also remember a talk given by Pat and Eric Brewer together, that went deeper into the whole CAP ideas vis-a-vis the model Pat had been advocating (see Fiefdoms and Emissaries [1]), but I can't remember when it was or find a link for it.

[0] https://blogs.msdn.microsoft.com/pathelland/2007/05/20/soa-and-newtons-universe/

[1] http://download.microsoft.com/documents/uk/msdn/architecture/connected/fiefdoms_emissaries.ppt

aetherson 109 days ago [-]

Exactly once delivery is theoretically impossible.

Approaching exactly once delivery asymptotically is possible. Your parent poster's point is that this is one where you can get so close to exactly once in order that in practice you never violate it for years and years.

rusanu 109 days ago [-]

My point is that I've seen people making decisions to go with 'best effort delivery' and live with the (costly) consequences because they read here and there that EOIO is impossible, so why bother trying.

inlined 109 days ago [-]

Because idempotency can be cheap at the application layer, so why try to solve something we know can never truly be solved?

mamon 109 days ago [-]

"Exactly once" model of message is theoretically impossible to do in distributed environment with nonzero possibility of failure. If you haven't received acknowledgement from the other side of communication in the specified amount of time you can only do one of two things:

1) do nothing, risking message loss

2) retransmit, risking duplication

But of course that's only from the messaging system's point of view. Deduplication at the receiver end can help reduce the problem, but it can itself fail (there is no foolproof way of implementing that pseudocode's "has_seen(message.id)" method).
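A minimal sketch of receiver-side deduplication (names are mine, not the commenter's); in a real system the "seen" set and the side effects of process() would have to commit in the same transaction:

    seen = set()

    def process(message):
        print("processing", message["id"])   # stand-in for the real handler

    def handle(message):
        if message["id"] in seen:            # retransmitted duplicate: drop it
            return
        process(message)
        seen.add(message["id"])

    handle({"id": "m-1"})
    handle({"id": "m-1"})                    # second delivery is ignored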

LgWoodenBadger 109 days ago [-]

I agree. I wish more messaging platforms would recognize this and stop trying to paper-over the failure mode (Kafka just came out with "Exactly Once," which I think is a variant of the 2-Phase-Commit protocol, which still does not solve the problem).

My go-to for explaining to people is the Two Generals Problem https://en.wikipedia.org/wiki/Two_Generals%27_Problem

falcolas 109 days ago [-]

So, a combination of a best effort "at least once" messaging with deduplication near the receiving edge. Fairly standard, honestly.

There is still a potential for problems in the message delivery to the endpoints (malformed messages, Kafka errors, messages not being consumed fast enough and lost), or duplication at that level (restart a listener on the Kafka stream with the wrong message ID) as well.

This is based on my own pains with Kinesis and Lambda (which, I know, isn't Kafka).

In my experience, better to just allow raw "at least once" messaging and perform idempotent actions based off the messages. It's not always possible (and harder when it is possible), but its tradeoffs mean you're less likely to lose messages.

caust1c 109 days ago [-]

This is generally better, but we're delivering these messages to integrations which don't necessarily take idempotent actions.

---

kibwen 131 days ago [-]

Looking ahead to 1.19 (currently in beta, which you can try out easily via rustup), the RFC for unsafe (C-style) unions recently stabilized, which closes IMO the biggest remaining hole in Rust's FFI story. By removing the need for various manual hacks to interoperate with unions from C libraries, this should help bring greater safety to C code bound to Rust (or vice versa). Along with `pub(crate)` discussed in the OP and the `?` operator from 1.15, this makes for the third real language-level "feature" that Rust has added in the past two years, though I'd say these are more quality-of-life improvements than anything earth-shaking. Now, when `impl Trait` lands (recently accepted, but yet unimplemented), that will be a big deal. :P

cpr 131 days ago [-]

What will `impl Trait` do, for us Rust-interested onlookers?

steveklabnik 131 days ago [-]

  fn foo() -> impl Trait {

means "this function returns something that implements the Trait trait, but I'm not gonna tell you what that concrete type is."

---

"Not Getting Too Creative ... Python allows you to get pretty creative with the code you’re writing. For instance, you can:

    Use MetaClasses to self-register classes upon code initialization
    Swap out True and False
    Add functions to the list of built-in functions
    Overload operators via magic methods

These features are fun to play around with but, as most programmers will agree, they often make the code harder to understand when reading someone else’s work. "
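A minimal sketch of the first bullet (my own illustration), which also shows why such tricks can surprise readers: classes get registered as a side effect of merely being defined.

    registry = {}

    class AutoRegister(type):
        def __init__(cls, name, bases, ns):
            super().__init__(name, bases, ns)
            if bases:                      # skip the base class itself
                registry[name] = cls

    class Plugin(metaclass=AutoRegister):
        pass

    class CsvPlugin(Plugin):
        pass

    print(registry)   # {'CsvPlugin': <class '__main__.CsvPlugin'>}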

---

StillBored 1 day ago [-]

As I get older, I'm strongly swinging into the camp that thinks languages should have strongly restrictive syntax. Object Pascal is one of those languages which is both low-level enough to cleanly map to efficient code, as well as high-level enough not to feel really restrictive. It's a language that is completely misunderstood because far too many people read a couple of critical essays and believed everything they read, even though the strongest argument frequently was that it's harder to type "begin/end" than "{/}". Which is a pretty lame thing, given code completion in editors as old as emacs.

Put another way, the thought patterns programmers fall into when using it seem to result in code which is easier to understand than most other languages which seem to encourage "perlism" (creating a single unreadable line), "forthism" (creating a billion two-line words that combine to solve all the problems in one word), or a few other things which become completely unmaintainable when the project grows beyond what can be written by an over-energetic student in a semester at school.

reply

chubot 21 hours ago [-]

I don't think anyone really disagrees with you -- most people have problems with the syntax of Perl, C, and C++. Sure, they can get used to it, particularly with C because it's small, but the syntax isn't why they use the language.

Somewhat surprisingly, I think Perl's syntax is one of the reasons that Python became popular. Perl can do pretty much everything Python can, and it came significantly earlier. Yet people gravitated toward Python.

I wouldn't call Python "strongly restrictive", but I would call it "just right". It's terse without being unreadable.

JavaScript syntax is also pretty sane (but there are many horrible parts to the semantics.)

Ruby seems to be pretty nice in the common cases, but there are apparently gremlins lurking in corner cases. Python has only 1 or 2 syntactic gremlins I can think of (trailing comma, etc.)

reply

kibwen 21 hours ago [-]

> Perl can do pretty much everything Python can, and it came significantly earlier. Yet people gravitated toward Python.

I don't think syntax is the main culprit here, and Perl is only four years older than Python. For about 15 years Python tried, unsuccessfully, to compete with Perl as the premier language for shell scripting and sysadmin tasks. It wasn't until around 2005 that Python finally caught on, its rise coinciding with Ruby. While I do personally find Python more readable than Perl, I attribute its ascent more to the combination of the Web 2.0 gold rush demanding dynamically-typed languages for quick prototyping, Perl's apparent mid-2000s limbo due to Perl 5 vs Perl 6, the then-declining reputation of PHP and the shared hosting services which first popularized it, and the quality of Python's standard library (which made up for CPAN).

reply

---

oblio 8 hours ago [-]

I'd say that the simplest "production-ready" criterion is this: does it have an officially supported and stable release of an IntelliJ IDE (there is a slight Java-ecosystem bias, but I think it's only slight).

So the mainstream languages are...: C/C++, C#, F#, Go, Groovy, Java, JavaScript, TypeScript, Kotlin, Objective-C, PHP, Python, Ruby, Scala, SQL, Swift, VB.NET (source: https://www.jetbrains.com/products.html).

Sounds about right :)

reply

majewsky 3 hours ago [-]

So, off the top of my head, Erlang, Haskell, OCaml, Perl, Fortran, Cobol and any type of shell script are not production-ready. Nice to have that clarified. :P

reply

---

" Computing with high-dimensional vectors complements traditional computing and occupies the gap between symbolic AI and artificial neural nets. Traditional computing treats bits, numbers, and memory pointers as basic objects on which all else is built. I will consider the possibility of computing with high-dimensional vectors as basic objects, for example with 10,000-bit words, when no individual bit nor subset of bits has a meaning of its own--when any piece of information encoded into a vector is distributed over all components. ... Two operations on high-dimensional vectors correspond to the addition and multiplication of numbers. With permutation of coordinates as the third operation, we end up with a system of computing that in some ways is richer and more powerful than arithmetic, and also different from linear algebra. Computing of this kind was anticipated by von Neumann, described by Plate, and has proven to be possible in high-dimensional spaces of different kinds.

The three operations, when applied to orthogonal or nearly orthogonal vectors, allow us to encode, decode and manipulate sets, sequences, lists, and arbitrary data structures. One reason for high dimensionality is that it provides a nearly endless supply of nearly orthogonal vectors. Making of them is simple because a randomly generated vector is approximately orthogonal to any vector encountered so far. The architecture includes a memory which, when cued with a high-dimensional vector, finds its nearest neighbors among the stored vectors. A neural-net associative memory is an example of such.

Circuits for computing in high-D are thousands of bits wide but the components need not be ultra-reliable nor fast. Thus the architecture is a good match to emerging nanotechnology, with applications in many areas of machine learning. I will demonstrate high-dimensional computing with a simple algorithm for identifying languages. " -- [EE CS Colloq] Computing with High-Dimensional Vectors * 4:30PM, Wed Oct 25, 2017 in Gates B03, Pentti Kanerva, Stanford CSLI
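A minimal NumPy sketch of those three ideas (my own illustration, not from the talk): random high-dimensional +/-1 vectors are nearly orthogonal, elementwise multiplication binds a key to a value, and addition bundles bound pairs into one record vector.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 10_000
    rand_vec = lambda: rng.choice([-1, 1], size=D)

    name, alice = rand_vec(), rand_vec()
    country, usa = rand_vec(), rand_vec()
    record = name * alice + country * usa    # bind each pair, bundle the record

    probe = record * country                 # unbind with the "country" key
    print(np.dot(probe, usa) / D)            # close to 1: usa is recovered
    print(np.dot(probe, alice) / D)          # close to 0: nearly orthogonal noise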

---

feelin_googley 10 minutes ago

on: Learn to use Awk with hundreds of examples

Kernighan and Van Wyk, "Timing Trials, or, the Trials of Timing: Experiments with Scripting and User-Interface Languages" (1998)

http://web.archive.org/web/20000829071436/http://inferno.bell-labs.com:80/cm/cs/who/bwk/interps/pap.html

What if the k scripting language had been included in these experiments?

http://kparc.com/z/bell.k

k3:

1. "Basic Loop Test"

   \t 1000000(1+)/0

2. "Ackermann's Function Test"

   \t {:[~x;y+1;~y;_f[x-1;1];_f[x-1;_f[x;y-1]]]}[3;7]

3. "Indexed Array Test"

   \t x(x;|x:!200000)     

4. "String Test"

   \t f:{(x>#:){(i _ x),(1+i:_.5*#x)#x:,/("123";x;"456";x;"789")}/y};do[10;f[500000;"abcdef"]]

5. "Associative Array Test"

   \t {+/("0123456789abcdef"16_vs'!x)_lin$!x}40000

6. "File Copy Test"

   `f 0:(30000 _draw 300)#\:"king "       
   \t `f 0:0:`f   

7. "Word Count Test"

   \t (#:;+/(+/1<':" "=)';+/#:')@\:0:`f

8. "File Reversal Test"

   \t `f 0:|0:`f          

9. "Sum Test"

   `f 0:100000#,"-123.456" / generate 100K fp numbers
   \t +/0.0$0:`f

---

" First of all, what is Python? According to its creator, Guido van Rossum, Python is a:

    “high-level programming language, and its core design philosophy is all about code readability and a syntax which allows programmers to express concepts in a few lines of code.”"

sounds a lot like what we want...

---

second half of this talk has a great list of features that we probably want:

http://dev.stephendiehl.com/nearfuture.pdf

some HN comments on the article:

"

...

...

pjmlp 18 hours ago [-]

10-20 years ago some of us could use Smalltalk, do systems programming with strong type safe languages, use RAD environments like Delphi, release applications in Prolog, for example.

To me it seems we are catching up with the past, and as someone already programming in those environments, it looks like we have spent 10-20 years losing our tools, educating the masses, only to get a taste of how things used to be.

reply

js8 22 hours ago [-]

I would love to see programming as a dialogue between user and computer (programmer and compiler). For example:

Compiler would infer the types, and the programmer would read it and say, oh, I agree with this type, but I disagree with this type, that's perhaps wrong, this should be rather that type. Then the compiler would infer types again, based on programmer's incremental input.

Data structure selection. Programmer would say I want a sequence here. The compiler would say, I chose a linked list representation. The programmer would look over it, and disagree, saying, you should put this into an array. And compiler could say, look, based on measurements, array will save this much space but list will be this much faster.

Code understanding. Programmer should be able to say just, I don't know what happens here, and the compiler would include some debug code to show more information at that point.

Or take refactoring. Programmer would write some code, computer would refactor it to simplify it. Then programmer would look over it, and say, no, I rather want this, and don't touch it, and he would perhaps write some other code. The compiler would refactor again...

But all this requires that there is a syntactically distinct way (so that perhaps the editor could selectively hide it) to specify these remarks in the code, both for the computer and the programmer. So each of them should have a special kind of markup that would be updated at each turn of the discussion. Because you don't want to just overwrite what the other side has just said; both are good opinions (which complement each other - the human understands the purpose of the code but the computer can understand the inner details much better). So, to conclude - I wish a future programming language would include some framework like this.

reply

nadagast 2 minutes ago [-]

I've been thinking a lot about nearly these exact same things. We desperately need better ways to deal with derived textual data. Why do we make the programmer guess which data structure will work best for a particular task, when we could easily try it each way and record the performance, and pick the best? A big part of the reason must be that we have no good strategy for storing that data and making that choice in an ongoing and living way, inside of a code repository. We suck at dealing with derived data on the meta layer above our programming languages.

Email me at glmitchell[at]gmail if you want to chat more about this.

reply

bjz_ 19 hours ago [-]

Programming in Lean, Agda, and Idris have been quite a revelation in terms of interactive type system exploration. Granted, they can be flakey at times (Lean especially), but it's a tantalizing glimpse at what could be around the corner. Hazel[1] is also a pretty exciting look at advancing the idea of 'programming with holes', as is Isomorph[2]. Lots of exciting things around the corner!

[1]: http://hazelgrove.org/

[2]: https://isomorf.io/

reply

marcosdumay 17 hours ago [-]

The types line is what some people do with Haskell, Idris. I do personally favor writing the large-scale types beforehand, because that gives the compiler a chance of saying "look, your program is wrong", which is way more useful than "hey, your program has this type". Besides, abstract-type driven programming is an incredibly good methodology where it's applicable.

On code understanding, what makes it better than the programmer inserting the debug statements themselves? It saves some misunderstanding from the computer's part.

On refactoring, some IDEs do that. I'm on the fence about its usefulness.

reply

"

---

kmax12 1 day ago [-]

I am the lead contributor of a python library called Featuretools[0]. It is intended to perform automated feature engineering on time-varying and multi-table datasets. We see it as bridging a gap between pandas and libraries for machine learning like scikit-learn. It doesn't handle data cleaning necessarily, but it does help get raw datasets ready for machine learning algorithms.

We have actually used it to compete on Kaggle without having to perform manual feature engineering with success.

[0] https://www.featuretools.com

reply

ScottBurson 1 day ago [-]

Wow, this looks very cool!

reply

IanCal 1 day ago [-]

I'm starting to build up various utilities to help with this kind of thing, but I fully agree. The decisions require understanding the business requirements (do I use source X or Y for field 1, what errors are OK, what types of error are worst, etc), but the process of finding some of these could be better.

One simple one is missing data. Missing data is rarely a null, I've seen (on one field, in one dataset):

    N/A
    NA
    " "
    Blank # literally the string "Blank"
    NULL # Again, the string
    No data
    No! Data
    No data was entered for this field
    No data is known
    The data is not known
    There is no data

And many, many more. None can be clearly identified automatically, but some processes like:

Pull out the most common items, manually mark some as "equivalent to blank" and remove.

Identify common substrings with known text (N/A, NULL, etc) and bring up those examples.

Are useful, I'd like to extend with more clustering and analysis to bring out other common general issues but rare specific issues. Lots of similar things with encodings, etc. too.

Other things that might be good are clearer ways I could supply general conditions I expect to hold true, then bring the most egregious ones to my attention so I can either clear out / deal with them in some way. A good way of recording issues that have already been analysed and found to be OK would be great too.
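A minimal sketch of that first process (the token list and names are my own guesses, and the output still needs a human to confirm which values really mean "blank"):

    from collections import Counter

    NULLISH = ("n/a", "na", "null", "blank", "no data", "not known")

    def candidate_nulls(values, top=20):
        counts = Counter(v.strip().lower() for v in values)
        return [(v, n) for v, n in counts.most_common(top)
                if v == "" or any(tok in v for tok in NULLISH)]

    sample = ["N/A", "42", "No data was entered for this field", " ", "42", "NULL"]
    print(candidate_nulls(sample))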

reply

philvb 23 hours ago [-]

Yes, completely agree that each dataset requires decisions to be made that can't be automated, but there are huge opportunities for tools to assist users in understanding what cleaning decisions they might want to make and how those decisions affect the data. Most data cleaning tools do a very poor job of helping the user visualize and understand the impact cleaning has on data - they're usually very low level (such as pandas).

As an example of a tool: Trifacta (disclaimer I work here) https://www.trifacta.com/products/wrangler/. We're trying to improve data cleaning with features such as suggesting transforms the user might want, integrating data profiling through all stages to discover and understand, and transform previews so the user can understand the impact.

I think there's a huge opportunity for better tools in the problem space.

reply

---

"The most disliked languages, by a fairly large margin, are Perl, Delphi, and VBA. They’re followed by PHP, Objective-C, Coffeescript, and Ruby....If you’ve read some of our other posts about the growing and shrinking programming languages, you might notice that the least disliked tags tend to be fast-growing ones...Similarly, many of the shrinking tags, such as Perl, Objective-C, and Ruby, are ones we’ve previously observed to be among the fastest-shrinking tags on the site."

---

" I’ve noticed something interesting lately: I can categorize almost all of the software I use into two distinct groups:

    Software that breaks pretty much every time I update it (e.g. weechat, offlineimap, Clojure, many Python packages, Skype).
    Software that almost never breaks when I update it (e.g. Mercurial, git, tmux, Python, ack, zsh, Vim, Dropbox).

...

I think it’s important that I nail down what I mean by “breaks” or “is broken”. I don’t necessarily just mean the introduction of “new bugs”.

When I say that a program “breaks”, I mean:

    When I update from version X to version Y of a program, library, or language…
    Without changing my configuration files, source code, etc…
    The resulting combination doesn’t work properly

In effect, I’m saying that “breaking backwards compatibility” means “the program is broken”!"

...

When pointing out a backwards incompatible change to someone, you’ll often get a response similar to this:

    “Well, I mentioned that backwards incompatibility in the changelog, so what the hell, man!”

This is not a satisfactory answer.

When I’m updating a piece of software there’s a good chance it’s not because I’m specifically updating that program. I might be:

    Moving to a new computer.
    Running a “$PACKAGE_MANAGER update” command.
    Moving a website to a bigger VPS and reinstalling all the libraries.

In those cases (and many others) I’m not reading the release notes for a specific program or library. I’m not going to find out about the brokenness until I try to use the program the next time.

If I’m lucky the program will have a “this feature is now deprecated, read the docs” error message. That’s still a pain, but at least it’s less confusing than just getting a traceback, or worst of all: silently changing the behavior of a feature.

...

Exceptions

One case where I feel the backwards incompatibility tradeoff is worth it is security.

A good example of this is Django’s change which made AJAX requests no longer be exempt from CSRF checks. It was backwards incompatible and I’m sure it broke some people’s projects, but I think it was the right thing to do because it improved security.

I also think it’s unreasonable to expect all software to be perfectly ready from its first day.

Sometimes software needs to get poked and prodded in the real world before it’s fully baked, and until then requiring strict backwards compatibility will do more harm than good.

By all means, backwards compatibility should be thrown to the wind in the first stage of a project’s life. At the beginning it needs to find its legs, like a baby gazelle on the Serengeti. But at some point the project needs to get its balance, grow up, and start concerning itself with backwards compatibility.

But when should that happen? A Solution

I think there’s a simple, intuitive way to mark the transition of a piece of software from “volatile” to “stable”:

Version 1.0 "

---

 throwaway7645 16 hours ago [-]

This is what I like about Rebol & Red...pity Rebol didn't catch on more. The full Rebol download is a single executable (no real install) and can build GUIs, and so much more in a very small amount of code.

reply

hdhzy 7 hours ago [-]

Not to mention Red can cross-compile for multiple OSes and has the parse dialect, which is basically generalized, readable regular expressions.

My personal theory for why Rebol/Red didn't succeed in the mainstream is the steep learning curve: it's hard to discover how to do things if you haven't already done some substantial programming work in it. While the built-in help is good, you need to know what you're looking for. (Delphi had better help in this matter, with examples etc.).

By the way I wonder why Pecan's post is [dead], it doesn't seem to be violating any rule.

reply

---

rkangel 3 hours ago [-]

I think Rust has done this quite well, managing to have the best of both worlds. Unstable features are only available on the nightly build and need specifically enabling (and their interface may change). Once a feature is considered stable it is carefully maintained. They do have some breaking changes, but they're not common and for unusual edge cases:

https://killercup.github.io/bitrust/

This explains the Rust approach:

https://blog.rust-lang.org/2014/10/30/Stability.html

reply

lmkg 2 hours ago [-]

Rust also uses a sizable fraction of all publicly-available Rust code as a regression suite for compiler updates. This allows them to check edge cases that are "breaking in theory" to find out if there's actually any code that would be broken, or if the edge case had literally not yet been encountered.

reply

---

probably not relevant to a programming language but i don't know where else to put this (response to a post which argued for replacing files eg. openoffice slide presentations (current zip files with XML plus image assets) with SQLite:

mjw1007 10 hours ago [-]

For me, the biggest reason not to do this sort of thing is that SQLite still often refuses to work over network filesystems.

I appreciate that this is because the authors are being careful about possible corruption when locking isn't working properly, but if they really mean the « SQLite competes with fopen() » line they need to find a way to get round this, because fopen() doesn't do that.

reply

---

" REPL

The grammar of Go needs to be changed to support line-by-line evaluation. Top-level constructions in a .go file are declarations, not statements. There is no sequence to the declarations; all are evaluated simultaneously across all the files in a package. A declaration on an earlier line in a file can happily refer to a name declared later in the file. If you want to type Go declarations into a REPL, nothing can execute until you declare the package done.

So the first thing you need to do to define a REPL for Go is to step down a level. Instead of declarations, process statements. Pretend everything typed into the REPL is happening inside the func main() {} of a Go program. Now there is a sequence of events and statements can be evaluated as they are read.

This shrinks the set of programs you can write dramatically. In Go there is no way to define a method on a type inside a function (that is, using statements). There is a good reason for this: all the methods of a type need to be defined simultaneously, so that the method set of a type doesn’t change over time. It would lead to a whole new class of confusing errors if you could write:

    func main() {
        type S string
        var s S
        _, ok1 := s.(io.Reader)
        func (S) Read(b []byte) (int, error) { ... }
        _, ok2 := s.(io.Reader)
        fmt.Println(ok1, ok2) // Prints: false, true
    }

That is why you cannot write that in Go.

So for the language to be REPL-compatible it needs serious grammar surgery, which would make a REPL possible, but hurt the readability of big complex programs.

Neugram has its own statement-based method syntax https://github.com/neugram/ng/blob/master/eval/testdata/method2.ng , which diverges in a small but significant way from Go. (Though it won’t be properly functional until the Go generating backend is complete.)

"

---

crawshaw 18 hours ago [-]

I made this point poorly, because I agree with you that Go is quite easy to read and very predictable.

With more words: I have written a parser and type checker for an ML-style language, with parametric types and several other neat tricks in it, and I've now written a parser and type checker for a good subset of Go. The latter has been far more work. I am not entirely sure how to explain the work. Go has lots of syntactic and type conveniences that are easy to use and read, but quite difficult to implement.

As there are few implementers and many users, I think the original designers of Go picked well when they put the work on us implementers.

reply

ainar-g 12 hours ago [-]

Can you elaborate on what syntactic conveniences are difficult to implement and why? Language design is one of my hobbies, so I really would like to know that.

reply

crawshaw 7 hours ago [-]

One good example is untyped constants. They form a large extension to the type system only present at compile time. It is a good example because there is an excellent blog post describing them in detail: https://blog.golang.org/constants

In particular note the difference between

    const hello = "Hello, 世界"

and

    const typedHello string = "Hello, 世界"

one of these can be assigned to named string types, the other cannot.

As a user of Go I find untyped constants to be extremely useful. They almost always do what I want without surprise. Implementing them however, is not trivial.

A trickier example is embedding. It adds a lot of complexity to the type system. As a user, it can be a bit surprising, but I admit it is very useful.

reply

---

 HumanDrivenDev 2 days ago [-]

(((asks in essence, what is the purpose of getters and setters?))) reply

derefr 2 days ago [-]

In languages with both primitive "read/write field" operations that can't be intercepted, but also with interfaces/protocols, "obfuscating" (encapsulating) a field access in getter/setter methods is how you allow for alternative implementations of the interface/protocol.

Specifying your object's interface in terms of getters/setters allows for objects that satisfy your interface but which might:

• read from another object (flyweight pattern)

• read from another object and return an altered value (decorator pattern)[1]

• read from an object in a remote VM over a distribution protocol (proxy pattern)

Etc.

Then a client can build these things and feed them to your library, and you won't know that they're any different than your "ordinary" data objects.

You don't see this in languages with "pure" message-passing OOP, like Ruby or Smalltalk, because there's no (exposed) primitive for field access. All field access—at the AST level—goes through an implicit getter/setter, and so all field access can be redefined. But in a language like Java, where field access has its own semantics divorced from function calls, you need getters/setters to achieve a similar effect.
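A minimal Python sketch of that idea (my own illustration; in Java the getter would live in an interface): because the client only talks to a getter, a decorator object can stand in for the plain one.

    class Price:
        def __init__(self, amount):
            self._amount = amount
        def get_amount(self):
            return self._amount

    class DiscountedPrice:                 # decorator pattern: wraps another price
        def __init__(self, inner, pct):
            self._inner, self._pct = inner, pct
        def get_amount(self):
            return self._inner.get_amount() * (1 - self._pct)

    def total(prices):                     # client code only knows get_amount()
        return sum(p.get_amount() for p in prices)

    print(total([Price(100), DiscountedPrice(Price(100), 0.2)]))   # 180.0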

---

" The 8051 harkens from a time where developers programmed microcomputers (and microcontrollers) in assembly, not C. Its fancy control-friendly features like small sets of banked registers (which can interchange in an interrupt context) don’t play well with compilers.

Worse still, the 8051 suffers from a small, 8-bit stack that struggles to keep up with traditional stack-based C implementations. "

" The 8051 is not a C friendly processor.

It has several address spaces. I used the Keil 8051 compiler extensively and it had several pointer types.

    An 8 bit pointer to point at the internal memory space or internal indirect space.
    A 16 bit pointer to point to either external ram or code space.
    A "smart" 24 bit pointer that could point anywhere. Basically a tag followed by 16 bits.

All of this is without the added complexity of bank switching schemes that make things even more "interesting".

The smart pointers were to be avoided because they were big and slow.

" A compliant C compiler requires SIZE_MAX to be at least 65535 (0xFFFF). This implies an object pointer must be at least 16 bits. "

---

" Early C compilers for the 8051 often started as 68K or x86 compilers hacked with an emulated software stack stored in XRAM. This produced code that dawdled through tasks at a snail’s pace.

PL/M-5110was an Intel compiler introduced in 1980 that got around this problem by passing variables in defined RAM locations.

Keil took this idea and ran with it. They introduced C51 in 1988 — and it flourished in popularity.

...

the big problem is reentrancy — when a function attempts to call itself (i.e., recursion), or when an ISR calls the same function it happened to interrupt "

Keil apparently has an annotation for functions that need to be reentrant.

---