Bayle Shanks's website: proj-oot-ootNotes16

---

comments on an article whose summary is "data-first, not code first"

hcrisp 19 hours ago

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -Linus Torvalds

zurn 13 hours ago

Though Guy Steele's idea sounds contentious, that OO encourages "data-first" because code is encapsulated:

  "Smart data structures and dumb code works a lot better than 
  the other way around."

  This is especially true for object-oriented languages, 
  where data structures can be smart by virtue of the fact 
  that they can encapsulate the relevant snippets of "dumb 
  code." Big classes with little methods–that's the way to go!

Or maybe he is just encouraging OO programmers to think more in this vein?

Chris_Newton 13 hours ago

The tricky part is that smartness of data structures is context-sensitive.

One of the most common design errors in OO systems seems to be building systems that beautifully encapsulate a single object’s state… and then finding that the access patterns you actually need involve multiple objects but it’s impossible to write efficient implementations of those algorithms because the underlying data points are all isolated within separate objects and often not stored in a cache-friendly way either.

Another common design problem seems to be sticking with a single representation of important data even though it’s not a good structure for all of the required access patterns. I’m surprised by how often it does make sense to invest a bit of run time converting even moderately large volumes of data into some alternative or augmented structure, if doing so then sets up a more efficient algorithm for the expensive part of whatever you need to do. However, again it can be difficult to employ such techniques if all your data is hidden away within generic containers of single objects and if the primary tools you have to build your algorithms are generic algorithms operating over those containers and methods on each object that operate on their own data in isolation.

The more programming experience I gain, the less frequently I seem to find single objects the appropriate granularity for data hiding.

 iofj 12 hours ago

Have you noticed yet that programming ideologies go around in a circle. Programmers may currently be defending data-first (/data-oriented/...) programming, but it isn't the first time they did so ?

The way I experienced it:

micro-services/small tools "that do one thing well"/... (pro: reasonably generic, con: loads of tools, requires expert users, if one tool out of 30 is not working well things break)

data-first/data-oriented programming (really easy to manipulate data, very, VERY hard to maintain consistency)

database-oriented programming (enforce consistency. Otherwise data-oriented. Works well, then when in data-oriented programming your data would have gone inconsistent, in this paradigm you get errors. Needless to say "every operation errors out" is better than "our data suddenly became crap", but still blocks entire departments/systems unpredictably)

event-driven programming (really easy to make button X do Y) (to some extent built into data-base oriented programming, also available separately. Works well, but gets extremely confusing when programs get larger)

object-oriented programming (massively simplifies the "I have 200 different message types and forgot how they interact" problems of event-driven programming, also provides the consistency of database-oriented programming)

event-driven programming with UI designers (makes event-driven programming and later object oriented event-driven programming accessible for idiots)

declarative object-oriented programming / aspect-oriented programming / J2EE and cousins / "Generating forms from data" / Django

micro-services/"do one thing well" (same disadvantages as during the 80s)

data-first/data-oriented programming (same disadvantages as end-of-80s) * you are here *

How much do you want to bet that servers that enforce data consistency and store it are next ?

mgrennan 2 hours ago

So so much truth in this. I've also seen Batch processing with terminals ID move to real time programming and operating systems. Move back to browsers and session ID (IE HTML GET / PUT are the new punch cards)

jeffdavis 19 hours ago

Or, start from the user experience.

Both ((data and ui)) are good places to start, and both should be given serious consideration early in the project.

Code is usually the worst place to start. If you do need code early on, I think it's fine to write it as quickly as possible, even ignoring edge cases and simplifying algorithms.

...

That's one of the reasons SQL databases are so great: they help you get the data design up and going quickly, and will enforce a lot of your invariants (through schema normalization, PK/FKs, and CHECK constraints). If anything goes wrong, rollback returns you to a known-good state.

...

 nostrademons 17 hours ago

This is one reason why prototyping is often so necessary. When you start from the user experience, you usually end up working your way back from the front-end.... Start from the UX, and you end up with a bunch of ad-hoc data structures that are very difficult to rationalize and inefficient to access. Start from the data, and you end up with a UI that mimics the data you were given and not how the user thinks about achieving their task....The solution is to write a quick & dirty prototype focusing on UX, nailing it but focusing on only the happy-path that's core to the user experience. Then take careful note of the data structures that you ended up with, and throw away the prototype. Then you start with a carefully planned data architecture that captures everything you learned in the prototyping phase, but eliminates redundancies and awkward access paths that you wrote in the quick & dirty prototype.

someone else commented 'start with the API' but they were downvoted

al2o3cr 8 hours ago

...

http://prog21.dadgum.com/37.html

...points out the tradeoffs explicitly: pure functions mean (in this case) immutable data, which means more-complex data structures...

Euclid's algorithm in Forth, Postscript (which is like Forth), and Python:

: gcd ( a b -- n ) begin dup while tuck mod repeat drop ;

/gcd { { {0 gt} {dup rup mod} {pop exit} ifte } loop }.

def gcd(u, v): return gcd(v, u % v) if v else abs(u)

it's interesting to compare:

Forth has that '( a b -- n )' thing
Forth is so wonderfully concise
Forth's fn defn syntax is concise
Forth has control flow syntax
Forth's 'repeat...until...end' look is spelled 'begin...while....repeat'
Postscript is like Forth but has blocks delimited by curly braces; and appears to have ifte (if) and loop instead of a specific repeat...until construct
in Postscript, the function 'lambda x: x > 0' looks like '{0 gt}'
Postscript is slightly less concise than Forth (in this example), and messier looking, probably b/c of the braces and the lack of repeat..until
Python is about as concise as Forth

---

Forth has an interesting convention to document what stack manipulation functions do (in the parentheses; stuff in parens is comment in Forth); eg:

dup ( a -- a a ) ?dup ( a -- a a

drop ( a -- ) swap ( a b -- b a ) over ( a b -- a b a ) rot ( a b c -- b c a ) -rot ( a b c -- c a b ) rot rot ; nip ( a b -- b ) swap drop ; tuck ( a b -- b a b ) swap over ;

0 ) dup if dup then ;

three notes:

the convention exactly describes the relevant function, which means that you could have a DSL for such functions that looks just like that; oot should be capable of such DSLs
the convention could also be interpreted used to describe the type signature
there is a relationship between descriptions of a type signature and descriptions of a function result, namely that you could interpret a variable as meaning 'something of this type goes here, but what it is may differ even within one expression' or you could interpret it as 'over the entire expression, this variable represents one value'

---

" By now you realize that if monads were a stock, I’d be shorting it. I’m going to go get myself in a huge amount of trouble now, just as I did when I took a hideously pragmatic tack on continuations some years ago.

The most important practical contribution of monads in programming is, I believe, the fact that they provide a mechanism to interface pure functional programming to the impure dysfunctional world.

The thing is, you don’t really need them for that. Just use actors. Purely functional actors can interact with the stateful world, and this has been known since before Haskell was even conceived.

Some kind soul will doubtless point out to me how you can view actors as monads or some such. Be that as it may, it is beside the point. You can invent, build and most importantly, use, actors without ever mentioning monads. Carl Hewitt and his students did that decades ago.

Tangent: I have to say how amazing that is. Actors were first conceived by Hewitt in 1973(!), and Gul Agha's thesis has been around for 25 years. I firmly believe actors are the best answer to our concurrency problems, but that is for another post.

You can write an actor in a purely functional language, and have it send messages to file systems, databases or any other other stateful actor. Because the messages are sent asynchronously, you never see the answer in the same activation (aka turn) of the actor, so the fact that these actors are stateful and may give different answers to the same question at different times does not stain your precious snow white referential transparency with its vulgar impurity. This is pretty much what you do with a monad as well - you bury the stateful filth in a well marked shallow grave and whistle past it.

Of course, your troubles are by no means over. Actors or monads, the state is out there and you will have to reason about it somewhere. But better you reason about it in a well bounded shallow grave than in C.

What is important to me is that the notion of actors is intuitive (a pesky property of Dijkstra’s hated anthropomorphisms, like self) for many people. Yes, there are many varieties of actors and I have my preferences - but I’ll take any one of them over a sheaf of categories.

Speaking of those preferences, look at the E programming language (I often point at Mark Miller’s PhD? thesis) or on AmbientTalk?. I would like to have something similar in Newspeak (and in its hypothetical functional subsets, Avarice and Sloth). " -- http://gbracha.blogspot.com/2011/01/maybe-monads-might-not-matter.html

rebuttal in comments:

I too spend much of my time exploring how to effectively leverage actors, but I don't think you'll find any real world Erlang advocate that will advocate making anything that could be an actor into an actor. There comes a point at which you need to stop subdividing tasks into actors. However, there is no harm in observing that a structure is a monad and marking it as such. This pays no runtime cost.

Sending a message on a channel usually costs at least a compare-and-swap or memory fence. This limits their applicability to things above a certain granularity.

Actors carry some heavyweight baggage in the form of their message queue. This has an operational cost, because you can either live in the Erlang-style world where these things carry around potentially unbounded numbers of messages, whereupon the whole system can come crashing down upon your ears when the queues get out of whack if your consumers can't keep up with your producers or you can live in the Singularity-style world where they have to be composed out of 2-endpoint channels with some affine type system managing the endpoints. Living in a world full of erlang-style actors requires you to build up a complicated series of tools for dealing with how to kill and reset the system when things inevitably go out of whack. If you look at Erlang's OTP, much of it is devoted to this very problem. (This _can_ be perceived as a good thing. It forces you to think about how to make a distributed system robust against a wide-array of failures.)

I happen to enjoy using these abstractions quite a bit, but even in Erlang or Scala actors, you wind up passing around lists and other concrete data structures, because it isn't worth constructing those queues _everywhere_.

" -- http://gbracha.blogspot.com/2011/01/maybe-monads-might-not-matter.html?showComment=1296027246940#c6671808644893652655

---

recommends:

ycombinator

reddit programming

dadgum

http://raganwald.com/

https://twitter.com/hmason

---

" As to the syntax, THE MQL4 language for programming trading strategies is very much similar to the C++ programming language, except for some features:

    no address arithmetic;
    no goto operator;
    an anonymous enumeration can't be declared;
    no multiple inheritance."

also it seems to me not to have templates or any sort of parametric polymorphism? but it does have eg copy constructors

also it does have argument defaults

also the builtins have varargs but i'm not sure if user fns can

---

mpweiher 2 hours ago

> PowerShell? = bash

While a great fan of PowerShell? in theory, in practice it seems to be extremely cumbersome. Sort of the opposite of bash and other Unix shells, which suck in theory and are very useful/convenient/powerful in practice.

Someone1234 2 hours ago

Cumbersome or you just aren't familiar with it yet?

The whole design of PS is meant to make it so you can "guess" the names of cmdlets you've never used before. Everything is Verb-Noun, Get-Service, New-Service, Restart-Service, Stop-Service, etc.

Discoverability is valued over succinctness.

Karunamon 1 hour ago

Which is fine when you're starting out, but infinitely frustrating after you have an idea of what you're doing.

A recent example, looking for errors in windows logs:

    Get-Childitem "C:\Windows\" -recurse -Include *.log ` | Select-String "Error" -ErrorAction SilentlyContinue ` | Group-Object filename | Sort-Object count -descending

Closest *NIX analogue I can think of would be

    grep -r Error /var/log/ | sort

English is an arguably fine speaking language but an awful programming one.

I love what Powershell can do, but gods do I hate typing it.

Someone1234 1 hour ago

Those two lines do different things. They're not analogous.

The equivalent to the UNIX line above in Powershell is:

      ls "C:\app\*.log" -R  | sls "Error" | sort

You cannot just tack on a bunch of extra requirements for Powershell (grouping, sorting by certain things and in a certain order) and then not include them in the UNIX example, that's disingenuous/misleading.

The only big difference between PS and UNIX in an actual analogous example is that the PS version of grep gets files fed in one by one and processes them, whereas UNIX's grep processes files itself.

PS - The above Powershell code may not work in 2.0 (2009). You'll need 3.0 (2012) or higher.

ckozlowski 46 minutes ago

I'd agree it's a bit more to type out, but it does seem to make more sense from a readability perspective. And for those of us who didn't grow up with the UNIX shell to understand the reasons why things were kept so short, it's a bit easier to digest. (I do appreciate why bash is so short and succulent. =)

But really, taking the above into consideration, the might in powershell comes not from the terms used, in my opinion, but how it works on things. With grep for example, you're parsing a file. If say, you wanted to filter on that more, you'd be using awk to pick out parts of the text.

In Powershell, everything's an object. Everything's already an object, you don't need to pick the file apart to isolate the date, you just filter on the date object. It's got a data type.

That makes it really powerful, in my opinion.

(Example largely pulled from "Windows Powershell in Action". I really like this book, as it goes into detail to describe /why/ things are they way they are in PowerShell?, because he wrote it. =) https://www.manning.com/books/windows-powershell-in-action-s...

overgryphon 1 hour ago

Lots of powershell cmdlets have aliases too. They are less discoverable than the verb-noun names, but for cmdlets I use often they are great. Get-ChildItem? is aliased to gci, ls, and dir.

 r-w 1 hour ago

Try links and environment variables to substitute long commands and flags, respectively. Fine, it’s not there by default, but as a power user you have the ability to adapt it to your needs. (That would also be a cool fix to release on GitHub? for others like you to use as well. Yay hackability!)

mpweiher 1 hour ago

Right. When I found out that this was supposed to be the alternative, I almost lost it.

For the thing to become usable, you have to do personal configuration, meaning your system will be unusable to anyone else and theirs will be unusable to you.

Back to the drawing board...

kuschku 7 minutes ago

Except, GNU grep is even faster than Select-String, if you run it on a better filesystem.

Touche 1 hour ago

> Cumbersome or you just aren't familiar with it yet?

Cumbersome. To create my own cmdlet that does the cool stuff Powershell can do (as far as piping) my choices are a terrible language like ps1 or .NET.

On Unix I can use pretty much whatever I want because the shell isn't tied to a specific runtime.

UK-AL 2 hours ago

Although you can have succinctness, most commands have shortcuts for them.

Get-Content = gc or cat

for example

---

 tkinom 22 hours ago

16 years, I created a profiling macro system on for select set critical functions. The profiling system can switch the measuring data from #instruction, #cpu_clk, #branch_stall_cycles, #L1_miss, #L2 miss for all the critical path functions.

After analyzed the data, I found branch stall cycles and data access stall cycles were causing huge number of delays in the critical code path.

I used the following tricks to get rid of the stall cycles.

1) Use branch_likely to force gcc to make sure there is no branch at all in the critical path of executions. (save 30+ cpu cycle per branch, there are a lot of branch stall cycles if one just simplely follow the gcc generated "optimized" code. MIPS CPU 200Mhz)

2) Use prefetch ahead of data structure access to get rid of un-cache data delay. (save ~50+cpu cycle per data stall, also, there are lot of them in the critical path.)

3) Use inline functions, etc to get rid of call stalls in critical path.

The system got ~100x increase on the overall system thru-put with those techniques with just pure C optimization from standard -O2 build.

I think it might be possible to create a build system that can automatically collect the profiling data (branch stall cycles and data stall cycles) and use the branch likely and prefetch instructions to auto-optimized the critical path code.

Specifying which code path / function call sequences are the real critical path probably still require programmer's touch.

As result of using data prefetch code in proper place, I don't used cache locking nor doing any kind of CPU affinity trick to generated the optimized obj code without any stall cycles for critical code path.

xavierd 18 hours ago

A lot of those optimizations would no longer yield any benefits[0]. The CPU archictecture evolved a lot in 16 years, especially in branch/code prediction to the point where a correctly predicted branch (without branch_likely) has almost no cost.

[0]: At least, this is true for x86 CPUs.

e5f34f89 17 hours ago

As a CPU architect, I can confirm that all those except possibly 2) will not yield significant benefits. Prefetching hints will only be useful when the particular code fragment is highly memory-bound because most wide superscalar microarchitectures will easily hide L1/L2 miss latencies.

---

notes on one guy's introduction to R (he now loves R): "Why on earth would you index a data structure with a $? Why would you “attach” a data frame?" -- http://datascience.la/a-conversation-with-hadley-wickham-the-user-2014-interview/

---

j2kun 203 days ago

> The language is byzantine and weird

This is my biggest beef with R. It is constantly changing the dimensions and types of your data without telling you. Want to grab some subset of the rows of a matrix? Better add some extra post-processing in case there's only one row that satisfies your query, or else R will change its type!

The solution is not to make the programmer memorize obscure edge cases.

jghn 203 days ago

You know that you can tell it not to do that, right? drop=FALSE

j2kun 203 days ago

I think this only reinforces my point: this is a ridiculous default. But that is news to me :)

andy_wrote 203 days ago

In light of this issue, dplyr's tbl_df structure (a light but helpful wrapper around data.frame) actually has different drop defaults, for example

  > x <- data.frame(foo=1:5, bar=1:5, baz=1:5)
  > dim(x[,'foo'])
  NULL
  > dim(x[,c('foo','bar')])
  [1] 5 2
  > dim(x[,'foo',drop=FALSE])
  [1] 5 1

compared to

  > x <- dplyr::data_frame(foo=1:5, bar=1:5, baz=1:5)
  > dim(x[,'foo'])
  [1] 5 1

Although I think these are more reasonable (I've got multiple commits at work with messages bemoaning drop=FALSE), this can ironically also mess you up if you got used to the old defaults :)

stewbrew 200 days ago

Use the right function, subset(), for the right effect then.

---

i probably dont want to read the rest of this, but just in case i'm storing it here, it may have something to teach about library design:

https://www.imperialviolet.org/2015/10/17/boringssl.html

---

security issues in x86. i havent read it:

http://blog.invisiblethings.org/papers/2015/x86_harmful.pdf

https://news.ycombinator.com/item?id=10458318

part II:

http://blog.invisiblethings.org/papers/2015/state_harmful.pdf

https://news.ycombinator.com/item?id=10787614

" The main principle introduced below is the requirement for the laptop hardware to be stateless , i.e. lacking any persistent storage. This includes it having no firmware-carrying flash memory chips. All the state is to be kept on an external, trusted device. This trusted device is envisioned to be of a small USB stick or SD card form factor. This clean separation of state-carrying vs. stateless silicon is, however, only one of the requirements, itself not enough to address many of the problems discussed in the article referenced above. There are a number of additional requirements: for the endpoint (laptop) hardware, for the trusted “stick”, and for the host OS. We discuss them in this paper. "

---

"It does not currently make sense to compile JavaScript? to WebAssembly?, because it lacks JavaScript?-specific features such as objects and arrays (for C++, one manually manages a heap in a typed array). Once WebAssembly? gains those features, JavaScript? can be compiled to it, but it will use the same techniques as current engines. Therefore, only the load time of applications will improve, because parsing is much faster.

...

2) Could languages like Dart, TypeScript? and PureScript? produce WebAssambly? bytecode in the near future? If that were the case, it could be a good selling point to choose them instead of plain JavaScript?, which in any case would produce less performance bytecode.

... Dart will continue to compile to JS in the short-term. Dart code is not static enough to compile to wasm today, and it will be a while before wasm gets the dynamic features to support it.

    The same applies to TypeScript and PureScript, JavaScript, and Python.

    Currently, wasm will only support languages such as C and C++ that don't need dynamic features such as garbage collectors or polymorphic inline cache.

... I doubt it. At least as far as TypeScript? compiling to wasm is concerned. TypeScript? is a superset of JavaScript?. Therefore, all (or at least most) valid JavaScript? is also valid TypeScript?. Additionally, TypeScript? does not subvert JavaScripts? object model and type system. Instead it compliments it by providing optional restrictions. The strong typing is intended as a design feature to aid development and does not and cannot provide the kind of hard type and layout guarantees, the kind used by a compiler to generate fast machine code, provided by purely static typed languages like java or c++. So compiling TypeScript? to wasm would be have the same drawbacks as compiling JavaScript? to wasm. "

---

in a discussion about CodePush?: https://news.ycombinator.com/item?id=10512867

sudhirj 14 hours ago

Was reading somewhere that you're free to do on the fly updates of any interpreted code - so JavaScript? is fine. Only compiled code can't be updated on the fly.

And in react native only the main framework is compiled, with the actual application running in a JS interpreter. So unless we upgrade the RN version, or start using a new SDK etc there will never be a reason to go through the App Store.

All interpreted code is sandboxed anyway, so there's no security risks involved.

---

lispm 1 day ago

> Most programming languages have several syntax rules. Lisp has one: everything is a list.

Oh, not again.

The author is confusing s-expression syntax with Lisp syntax. Even s-expressions are more than lists (conses, cyclic conses, symbols, vectors, strings, various numbers, structures, ...) and the s-expression reader knows about more than pure s-expressions - for example comments.

Not every s-expression is a valid Lisp expression. For example (let "let") is not a valid Common Lisp expression. The syntax for LET is:

    let ({var | (var [init-form])}*) declaration* form* => result*

This is also using the syntax definition for 'declaration' and 'form'.

Lisp has built-in syntax (provided by special operators in Common Lisp) and arbitrary complex syntax provided by macros.

This is valid for other languages with a notation based on s-expressions, like Scheme or Clojure.

---

a handler tree could be useful for vm protection domains ('ring 0' etc)

---

" creat(/dir/log); write(/dir/log, “2, 3, [checksum], foo”); fsync(/dir/log); fsync(/dir); pwrite(/dir/orig, 2, “bar”); fsync(/dir/orig); unlink(/dir/log); fsync(/dir); "

-- http://danluu.com/file-consistency/

needs 6 hours ago

There is an easy way to write data without corruption. First copy your file-to-be-changed as a temporary file or create a temporary file. Then modify the temporary file and write whatever you want in it. Finally, use rename() to atomically replace the old file by the temporary one.

The same logic also apply to directories, although you will have to use links or symlinks to have something really atomic.

It may not work on strangely configured systems, like if your files are spread over different devices over the network (or maybe with NFS). But in those cases you will be able to detect it if you catch errors of rename() and co (and you should catch them of course). So no silver bullet here, but still a good shot.

reply "

-- https://news.ycombinator.com/item?id=10726871

---

geofft 180 days ago

"Paravirtualization" can refer to two somewhat different things, both of which count as guest-assisted virtualization. The original idea behind paravirtualization, and originally the only type of virtualization in Xen, was one where Xen presented itself as its own PC-incompatible machine type using the x86 instruction set, and you had to port the boot code, memory management code, etc. over to Xen hypercalls, and you don't have access to real mode, the x86 MMU, etc. But you also need to port all your driver code (disks, networking, graphics) to Xen hypercalls, since you also no longer have access to the BIOS interrupts, PCI bus, etc.

It's possible to isolate the second part -- paravirtualized drivers -- without requiring the first part -- paravirtualized bootup. Since bootup isn't guest-assisted any more, the hypervisor now has to emulate the entire PC-compatible boot process, including minimal emulation of BIOS interrupts for disk access, etc. But once that's completed, you can switch to paravirtualized drivers for optimized performance, and the actual performance benchmarks people care about are the steady-state disk and network bandwidth.

The only somewhat tricky thing is that you need to handle MMU updates somehow. I believe (but don't know for certain) that with nested page table support at the processor level, you can just safely give x86 hardware-virtualized guests access to their nested page tables, and they can use the same instructions to modify that as they would on native hardware. So you don't need paravirtualization for that. This support has been in hardware since around 2008.

One of the benefits of using paravirtualized drivers alone, instead of an entire paravirtualized boot process, is that you can support OSes where you can write custom drivers but you can't change the boot code. So, for instance, the KVM project has Windows disk and network drivers that use the paravirtualized interface (virtio):

http://www.linux-kvm.org/page/WindowsGuestDrivers

You can continue to use Windows without this support, which will use the slow, emulated disk and network hardware. But if you have the drivers, things will get much faster. This is a best-of-both-worlds approach: you can continue to run any OS (since full virtualization support remains present), but you can switch to the paravirt drivers and get steady-state performance competitive with paravirt-only hypervisors.

---

 jstimpfle 3 hours ago

I'm coding exclusively in python3, and I agree the "least small" changes from python2 made it cleaner BUT

> I'm also a fan of map and filter returning generators rather than being hard-coded to a list implementation

I find myself often having to wrap expressions in list(...) to force the lists. (which is annoying)

Generators make things much more complicated. They are basically a way to make (interacting, by means of side-effects) coroutines, which are difficult to control. In most use cases (scripting) lists are much easier to use (no interleaving of side effects) and there is plenty of memory available to force them.

Generators also go against the "explicit is better than implicit" mantra. It's hard to tell them apart from lists. And often it's just not clear if the code at hand works for lists, or generators, or both.

So IMHO generators by default is a bad choice.

> stream fusion is a good thing

I don't think generators qualify for "stream fusion". I think stream fusion is a notion from compiled languages like Haskell where multiple individual per-item actions can be compiled and optimized to a combined action. Python instead, I guess, just interleaves actions, which might even be less efficient for complicated actions.

---

so what does a HLL like Haskell, Python, abstract from compared to assembly? Compared to Forth and Lisp? Compared to C?:

linear code sequence vs. parenthesized subexpressions, curly braces
first-class functions
macros (and similar metaprogrammy things like monads)
type checking (including side-effect management), polymorphism
precedence/order of operations
memory management
call stack, data stack
named functions, keyword arguments
named variable assignment, namespaces/symbol tables (assembly sometimes has this)
default arguments
laziness (evaluation strategy)
autocoercion, truthiness
looping constructs (and conditional constructs)
dynamic dispatch (runtime oop)
inheritance (for code reuse)

---

with the new idea of renaming 'oot core' to 'mu oot' (mb we should just say 'moot' or 'muut' to make it easier to type?) and reusing 'oot core' to describe the oot language without the stdlib, we might have the following possible levels of abstraction:

ootbc, oot bytecode, or oasm, oot assembly (textual representation of ootbc): a linear language with fixed-sized instructions
moot: a small kernel language in which 'oot core' is written. 'Lispy' in the sense that it has simple syntax, but unlike assembly, is not 'linear' in that it does permit nested subexpressions.
oot core: the oot language, implemented in moot as a set of metaprogram--y libraries
oot (or, standard oot): oot core plus the stdlib, defined as those parts of the distributed libraries that all moderately experienced Oot programmers should be able to easily read

---

if oot is the 'python of haskell', i've said a lot about what the 'python' part of that phrase means to me (readability, simplicity, stuff like: commandline-ability, concise notation for eg dictionary mutation, sugar and conventions for common tasks like error handling, default function arguments, keyword arguments, the possiblity of creating what i now called 'views' via pointer references between structures and/or attribute access metaprogramming), what about the haskell?

i guess what i liked about haskell was:

the power of non-strictnness (often called 'laziness' but technically it's non-strictness that i liked)
a syntax designed for working with hofs (higher-order functions); ML-style function application, currying, partial function application
pattern matching/deconstructing bind
immutable data by default; aliasing between data structures is explicit
static typing (although optional static typing would be even better imo), and interfaces (typeclasses)
typechecking of functional purity (although i'd prefer simply TRACKING functional purity, or perhaps something like effect systems(?) to allow impure operations within certain domains, such as profiling and logging, without contaminating the ostensible purity of the caller)

combining the first three of these, you enable many of the wonderfully concise function definitions in the Haskell prelude, eg:

True && x = x; False && _ = False
foldr f z [] = z; foldr f z (x:xs) = f x (foldr f z xs)
and xs = foldr (&&) True xs
all p xs = and (map p xs)

(compare the definition of 'all' to an imperative non-hof-friendly language, where something like 'all p xs' would probably be defined:

for x in xs: if not(p(x)): return False return True )

---

i guess what i like about BASIC is:

simplicity/smallness of the language; there's only about 25 components of the language to learn (eg http://www.dartmouth.edu/basicfifty/commands.html lists 15 statement types, 5 infix arithmetic operators, number literals, string literals, arrays, variables)
lightweight interpreter/run immediately from command prompt; also no boilerplate like "#include <stdio.h>; int main(int argc, char *argv[]) {}" nor tooling boilerplate like "gcc -o myprog test.c"

---

neat Powershell example:

" Get-Service

" -- http://www.tomsitpro.com/articles/powershell-piping-filtering-objects,2-771.html

Where-Object {$_.DependentServices? -ne $null}

---

so what makes pipelining in shells so cool? mb 3 things:

the fact that since the communication through the pipe is human-readable plaintext, that everything is reducible to the 'common denominator' of human-readable plaintext, which means that it won't be TOO hard to figure out how to cobble together two things that weren't designed to be cobbled together (unlike with binary formats, or worse, undocumented binary formats)
a stream (=lazy list?) abstraction, which leads to things like list comprehensions (but also laziness, since its a stream/lazy list)
the idea that impure interactions with the environment (reads from stdin, writes to stdout) are 'captured' and redirected

to the extent that plaintext alone is the (source of the) coolness, the convention in scripting languages like Python that everything can have 'str' or at least 'repl' called on it already does the job

to the extent that lazy lists is the coolness, this suggests that '

', rather than being a stdin/stdout-remapping pipeline, should just be a syntactic operator for building list comprehensions (or their lazy or 'stream' variant)

but to the extent that the redirection of impure interactions is the coolness, '

' needs to do something more complex (we could always use '

' for list comprehensions and '

' for the redirecting stuff if we want). As hinted by my wording, we can in fact GENERALIZE shell pipelines to (a) work on impure interactions with arbitrary bits of state, not just I/O streams, and (b) 'capture' the impurity in more general ways besides just redirecting it, eg capture it for reification, allowing versioning and undo.

if '

' is to be some sort of list-comprehension-builder but for shell pipelines, then the stuff on the left side of the '

' is the expressions producing the whole list, whereas the stuff on the right side is a function accepting ONE ITEM of that list and operating on it. This is asymmetric, suggesting that '

' is not the best syntax; maybe is better? but we are using for -->, eg the branches in a 'cond'. So maybe

- gives us 'map', then we also want/need:

filtering (the 'if' in Python list comprehensions). Note that we ALMOST already get filtering for free, because a filter can be written as just a function which is mapped over list and returns NIL for items that should be thrown out, and something truthy for items that should be kept. But this isn't quite enough, because what if some of the items in the list are themselves NIL? So we do need (or at least want) a distinct filter operator. Mb syntax of '

>'? or mb

/, that's easier to type

reduce (fold) (as in map/reduce). Mb

note also Haskell's list comprehension syntax:

[x*2

x <- [1..10]]

note also that all of Python comprehensions, Haskell comprehensions, Ruby blocks have a (mandatory) way to name the variable being bound to the list item in the iteration, as opposed to the proposal above, which assumes that the function being mapped takes the item as its first argument. Maybe we should offer both?

---

the page http://mnemnion.github.io/blog/2013/10/05/the-urbit-has-landed/ (note to self: i already read the rest of it, no need to reread) contains a comparison of the syntax of a floating point number, as specified in Hoon, vs. in a grammar notation. That page seems to think that this example provides evidence that Hoon may not be very readable:

Here's some Hoon:

1
2
3
4
5
6
7
8
9
10
11
12

	

         %r
        =+  yug=(yell q.p.lot)
        =>  ^+(. .(rex ?~(f.yug rex ['.' (s-co f.yug)])))
        :-  '~'
        ?:  &(=(0 d.yug) =(0 m.yug) =(0 h.yug) =(0 s.yug))
          ['.' 's' '0' rex]
        =>  ^+(. ?:(=(0 s.yug) . .(rex ['.' 's' (a-co s.yug)])))
        =>  ^+(. ?:(=(0 m.yug) . .(rex ['.' 'm' (a-co m.yug)])))
        =>  ^+(. ?:(=(0 h.yug) . .(rex ['.' 'h' (a-co h.yug)])))
        =>  ^+(. ?:(=(0 d.yug) . .(rex ['.' 'd' (a-co d.yug)])))
        +.rex
      ==

This, I am told, specifies the syntax of a floating point number. Or some of it does.

This is from the JSON spec:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

	

number
    int
    int frac
    int exp
    int frac exp 
int
    digit
    digit1-9 digits
    - digit
    - digit1-9 digits 
frac
    . digits
exp
    e digits
digits
    digit
    digit digits
e
    e
    e+
    e-
    E
    E+
    E-

The former is executable, the latter is admirably clear.

" -- http://mnemnion.github.io/blog/2013/10/05/the-urbit-has-landed/

---

thinking on Nock, it's primitive structural equality operator, and how its combinator machine model would lead to a native implementation that copies around large 'state' expressions a lot, while an advanced implementation would use pointers and mostly only track diffs of the 'state' expressions, and how a naive implementation of structural equality checking on Nock data strutures (binary trees) would completely traverse the tree;

it's important to provide structural equality as a primitive if you have a language in which, in theory, large/pure/immutable/pointerless data structures are being copied often in order to accomodate frequent small modifications to them that in theory produce distinct, non-alias values, and when in practice, the implementation doesn't have to copy much because pointers are used. Because then language implementation can short-circuit most of the structural equality checks on closely related values by using pointer equality to verify that the 'shared base' of the related values is identical, rather than recursing all the way through the shared base.

---

i guess Nock has binary trees; Lisp sorta has binary trees but sorta has n-ary lists; Haskell has lists; Python has lists and dicts; assembly has a linear address space with pointers; and we want to have graphs

so Oot is an interesting contrast to, say, Nock; Nock is careful to only have binary trees, not even lists, in order to start with the simplest thing, something that is dead simple to implement (or specify) in relation to the underlying hardware (or machine model, in Nock's case); whereas with Oot, i want to start with one very general/powerful data structure, but one which is not necessarily close to the hardware.

thinking about how Nock treats binary trees also sheds light on how Oot might treat graphs. We might call them 'rooted graphs'. The main difference between trees and our graphs (aside from named nodes, reification, and any other big differences) is that (a) a node can have multiple 'parents' (indegree can be >1) and (b) there can be cycles. Nock teaches us that things would stay pretty simple if we stick to immutability and we also stick with the convention that each node IS a graph on its own (or at least corresponds to one), and that to assign a node to a variable is the same as assigning the corresponding graph (ie the convention is that the interior nodes of binary trees ARE binary trees; the definition of binary trees is recursive, not 'a set of nodes and links between the nodes such that...'). Of course unlike Nock we want to permit pointers in some cases so it's not quite as simple.

---

should loop variables be locally scoped to loops? If so, should they be scoped to the entire loop, or should a separate variable be created for each iteration? The following example illustrates the difference:

" Loop indices behave a little differently in Lua 5.1 com pared to Lua 5.0.2. Consider the following, where loop index i is used as an upvalue in the instantiation of 10 functions:

local a = {} for i = 1, 10 do a[i] = function() return i end end print(a[5]())

Lua 5.0.2 will print out 10, while Lua 5.1 will print out 5. In Lua 5.0.2, the scope of the loop index encloses the for loop, resulting in the creation of a single upvalue. In Lua 5.1, the loop index is truly local to the loop, resulting in the cre ation of 10 separate upvalues " -- http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf

---

" Computation vs. automation (self.urbit)

submitted 2 months ago * by TheDividualist? ....The reason I find the Hoon tutorials really difficult to understand is that it is apparently written for Computer Science folks who think computers are for doing computation. We on the "street" use computers to automate repetitive tasks....It is as imperative as it gets....When CS people think of a typical code they consider something like the recursive Fibonacci function. When I think of a typical code, it sends email, it writes to a file, plots a graph, or makes a robot arm give the middle finger....There are several ways to sell functional programming to "doers". Usually it ends up reinventing SQL, as in the famous LISP book: http://www.gigamonkeys.com/book/practical-a-simple-database.html

Anders Hejlsberg added functional programming to C# and then ended up reinventing SQL as LINQ on top of it. Python generator expressions of the sum(i+j for i,j in whatever if j >10 ) are just a different way to write SELECT SUM (i+j) FROM whatever WHERE j>10. Apparently, Common LISP, C# and Python all want to work like SQL. Why? Because it is something fairly easy for understand for "doers". "

---

this guy (the inventor of Erlang?) loves a metaprotocol called 'OSC'

http://joearms.github.io/2016/01/28/A-Badass-Way-To-Connect-Programs-Together.html

" it’s Verb-Tag*-Value* encoded.

First comes a Verb which is a zero terminated string padded to a four byte boundary. Then comments a sequence of tags (which is also encoded as a string) then a sequence of values - each value corresponds to a single tag.

The tags are i for an integer d for a double s for a string and so on.

So the tag string iisif means that the values in the packet are int32 int32 string int32 float in that order. Both the encoder and decoder know how these data types are encoded so no additional information is necessary. ... The interesting thing about OSC encoding is that:

    It is extremely efficient.
    encoders/decoders can be implemented in a few lines of code.
    It is strongly typed.
    Complex nested data structures cannot be represented.... Let’s compare this to JSON - JSON is flexible, untyped, tricky to parse and represent and wasteful of space on the wire. In other words JSON has everything that a wire line protocol should not have. ... They are expressive - but not too expressive (limiting the types to flat sequences of atomic types) seems a good idea - it certainly gets the job done and are “good enough” for most purposes.

If a protocol cannot be expressed in sequences of OSC messages it probably should not be used. ... At a low level of abstractions we could just send integers over the wire but this would be too low level. At a higher level we could use some form of S-expression (like XML or JSON, which are just verbose S-expressions) but this is too expressive.

OSC seems to strike the right balance.

OSC has an additional advantage - the internal representation of an OSC message in the programming language of your choice is easy - why is this? Precisely because OSC does not have deeply nested recursive data structures.

If you parse XML or JSON you need to map the parse tree onto some object structure in your language, and since the parse trees in complex, the object in your programming language will be complex ... One measure of how good a protocol is is the size of the implementation and the time it took to write it. As I said implementing OSC is really easy, thanks mainly to the simplicity of the design.

I’ve written a number of XML parsers in my time, and it is not easy and there are some unpleasant edge cases. JSON parsers are also cumbersome beasts. "

others in a HN discussion note that OSC isn't great for everything:

 cyphar 7 hours ago

OP hasn't addressed the fact that UDP will willfully drop packets and send them out of order. What if I need to stream some data? What if I care about the order my IPC messages arrive in?

---

Uchikoma 727 days ago

I don't want to sound pessimistic, but adding a map structure to an language sounds rather small to me.

jlouis 727 days ago

Off my head:

Support the runtime, support the syntax and semantics, change the pattern matcher, write extensive tests, add it to the term_to_binary/binary_to_term protocol, add it to distribution, enable reading of maps in various function calls, enable NIFs to inspect maps, define how to copy maps between processes, change the core compiler.

It isn't so simple.

---

samatman 396 days ago

parent

on: Nim Advocacy and Promotion Strategies

Rust's runtime has been reduced to bounds-checking of array access, IIRC. So to a much lesser degree, and I believe Nim also provides this behavior by default.

Go is garbage collected and has an opinionated ABI and runtime. Completely different beasts.

haberman 396 days ago

Yes, agreed, though I really wish Rust could compile to C: http://www.reddit.com/r/rust/comments/2e8t9k/my_1_wish_list_...

Someone has revived the LLVM C Backend. Hopefully this could be a path to Rust->C compilation: https://github.com/draperlaboratory/llvm-cbe

Compiling to C is so important to me that I may check out Nim as an alternative to Rust, despite liking Rust very much.

---

samatman 404 days ago

parent

on: Clojure: Creating a reducible repeat

This reminds me of a daydream I've been having, that goes something like this: Rewrite the core Clojure data structures in Rust. Link in LuaJIT?. Write a bunch of finalizers and a reader in Lua, skin the Clojure library in the proper syntax, and Bob's your uncle.

I'm far too busy to take ownership on such a project, but if anyone starts it or finds it, please contact me.

Skinney 404 days ago

The core datastructures are immutable and uses structural sharing. This would require reference counting (slower) or a proper GC to be efficient (Rust doesn't have one). Doing this in Rust would probably give you worse results than the JVM.

Also, why LuaJIT?? LuaJIT? (or Lua) is not threadsafe, so you wouldn't be able to share state between threads safely, which would conflict with the way Clojure handles concurrency.

I'm not trying to be negative. But unless this is a project for fun or excercise, the JVM is a better host.

Jweb_Guru 404 days ago

Rust doesn't require reference counting to make sharing immutable datastructures safe. The interface would require lifetimes for safety, though, and might be generally unpleasant to use.

---

samatman 635 days ago

parent

on: Why I am excited about Clojure

Imagine if, instead of Javascript, you had a language based on JSON, with a few additional types, notably a 'symbol' type and a 'list' type. A few of the symbols are primitives that determine the semantics of the list type, allowing other symbols to be defined as functions.

That's homoiconicity, and Clojure strongly resembles an actual implementation of this conceit.

---

samatman 725 days ago

Forth doesn't have 'keywords' per se, but i is predefined for you. It has the effect of placing the index of a 'do' loop on the stack.

---

this guy says: "So when I encounter Nock, what I see is Lisp, rewritten as a cellular automaton, capable of serving as a DNA for computational beasties." [1]. I am wondering why he calls it a 'celluar automaton'? I guess the idea must be that the transition function of a single cell in a celluar automaton must be a pure function (the state of the cell and its neighbors are input, the new state of the cell is output).

---

in a post on Ax, in the part about getting input from an external source (via accessing a binary tree at address 0, which is an error in Nock), Sam Atman says

"That would be similar to allowing a calculation to slot on 0 and for any old thing to be in 0, even fairy dust."

i doubt he meant this, but what that makes me think of is allowing the external inputs to be something OTHER THAN natural numbers, even though only natural numbers are permitted in Nock. Once they are in the system, they can be moved and copied using the normal Nock operations. What if you tried to apply arithmetic operations to them? Two options are (a) don't allow arithmetic on alien values, (b) polymorphism: allow arithmetic on alien values, with results defined in some custom alien type definition. Also, another design choice would be to permit or forbid the alien types to break certain arithmetic or even more general identities (eg is x + 0 = x? Is x + x - x = x? and an example of a 'more general' one: If you copy x and then compare the copies with the structural equality comparison, are they always equal?)

---

" Anything added dilutes everything else. Approachable is better than simple. Avoid administrative distraction. Design for failure. Encourage flow. Favor focus over features. Half measures are as bad as nothing at all. It's not fully shipped until it's fast. Keep it logically awesome. Mind your words, they are important. Non-blocking is better than blocking. Practicality beats purity. Responsive is better than fast. Speak like a human. " -- https://news.ycombinator.com/item?id=6380340

---

is the urbit call stack over events considered a 'trampoline' or is that different?

---

briefly skimmed Ambients Applied to Ethereum. I think the only relevant idea for Oot is that you could have:

a graph
where each node in the graph was the root of an 'inner graph' (eg there is hierarchy)
where to send messages between nodes, the messages must contain 'to' and 'from' addresses, and each intermediate node gets to look at and act upon the message and decide whether to pass it on (can they 'spoof' messages eg alter messages whose 'from' is not their own? i'm assuming not.. but maybe the 'ancestors' can)
- and maybe we only consider the hierarchical graph