proj-oot-ootNotes16

---

comments on an article whose summary is "data-first, not code first"

hcrisp 19 hours ago

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -Linus Torvalds

reply

zurn 13 hours ago

Guy Steele's idea sounds contentious, though: that OO encourages "data-first" because code is encapsulated:

  "Smart data structures and dumb code works a lot better than 
  the other way around."
  This is especially true for object-oriented languages, 
  where data structures can be smart by virtue of the fact 
  that they can encapsulate the relevant snippets of "dumb 
  code." Big classes with little methods–that's the way to go!

Or maybe he is just encouraging OO programmers to think more in this vein?

reply

Chris_Newton 13 hours ago

The tricky part is that smartness of data structures is context-sensitive.

One of the most common design errors in OO systems seems to be building systems that beautifully encapsulate a single object’s state… and then finding that the access patterns you actually need involve multiple objects but it’s impossible to write efficient implementations of those algorithms because the underlying data points are all isolated within separate objects and often not stored in a cache-friendly way either.

Another common design problem seems to be sticking with a single representation of important data even though it’s not a good structure for all of the required access patterns. I’m surprised by how often it does make sense to invest a bit of run time converting even moderately large volumes of data into some alternative or augmented structure, if doing so then sets up a more efficient algorithm for the expensive part of whatever you need to do. However, again it can be difficult to employ such techniques if all your data is hidden away within generic containers of single objects and if the primary tools you have to build your algorithms are generic algorithms operating over those containers and methods on each object that operate on their own data in isolation.
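(A tiny illustration of that last point, in Python; the data and names are mine. Spending O(n) once to build an augmented index structure can set up a much cheaper algorithm for the hot path:)

    from collections import defaultdict

    # orders: a flat list of (customer_id, amount) pairs, perhaps pried
    # out of objects that each encapsulate their own little piece of state
    orders = [("alice", 30), ("bob", 10), ("alice", 25), ("carol", 5)]

    # invest a bit of run time converting to an alternative structure...
    by_customer = defaultdict(list)
    for customer, amount in orders:
        by_customer[customer].append(amount)

    # ...so the expensive part (many per-customer queries) becomes an O(1)
    # lookup instead of a scan over every order object
    total_for_alice = sum(by_customer["alice"])  # 55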

The more programming experience I gain, the less frequently I seem to find single objects the appropriate granularity for data hiding.

reply

 iofj 12 hours ago

Have you noticed yet that programming ideologies go around in a circle? Programmers may currently be defending data-first (/data-oriented/...) programming, but it isn't the first time they did so.

The way I experienced it:

micro-services/small tools "that do one thing well"/... (pro: reasonably generic; con: loads of tools, requires expert users, and if one tool out of 30 is not working well, things break)

data-first/data-oriented programming (really easy to manipulate data; very, VERY hard to maintain consistency)

database-oriented programming (enforces consistency; otherwise data-oriented. Works well: where in data-oriented programming your data would have gone inconsistent, in this paradigm you get errors. Needless to say, "every operation errors out" is better than "our data suddenly became crap", but it still blocks entire departments/systems unpredictably)

event-driven programming (really easy to make button X do Y; to some extent built into database-oriented programming, also available separately. Works well, but gets extremely confusing when programs get larger)

object-oriented programming (massively simplifies the "I have 200 different message types and forgot how they interact" problems of event-driven programming, and also provides the consistency of database-oriented programming)

<web browsers start here>

event-driven programming with UI designers (makes event-driven programming and later object oriented event-driven programming accessible for idiots)

declarative object-oriented programming / aspect-oriented programming / J2EE and cousins / "Generating forms from data" / Django

micro-services/"do one thing well" (same disadvantages as during the 80s)

data-first/data-oriented programming (same disadvantages as end-of-80s) * you are here *

How much do you want to bet that servers that enforce data consistency and store it are next?

reply

mgrennan 2 hours ago

So, so much truth in this. I've also seen batch processing with terminal IDs move to real-time programming and operating systems, then move back to browsers and session IDs (i.e. HTTP GET / PUT are the new punch cards).

reply

jeffdavis 19 hours ago

Or, start from the user experience.

Both (data and UI) are good places to start, and both should be given serious consideration early in the project.

Code is usually the worst place to start. If you do need code early on, I think it's fine to write it as quickly as possible, even ignoring edge cases and simplifying algorithms.

...

That's one of the reasons SQL databases are so great: they help you get the data design up and going quickly, and will enforce a lot of your invariants (through schema normalization, PK/FKs, and CHECK constraints). If anything goes wrong, rollback returns you to a known-good state.
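(A minimal sketch of that in Python with sqlite3; the schema and names are made up. The CHECK and foreign-key constraints enforce invariants, and a failed transaction rolls back to a known-good state:)

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # sqlite wants this opt-in
    conn.executescript("""
        CREATE TABLE accounts (
            id      INTEGER PRIMARY KEY,
            balance INTEGER NOT NULL CHECK (balance >= 0)
        );
        CREATE TABLE transfers (
            id  INTEGER PRIMARY KEY,
            src INTEGER NOT NULL REFERENCES accounts(id),
            dst INTEGER NOT NULL REFERENCES accounts(id),
            amt INTEGER NOT NULL CHECK (amt > 0)
        );
    """)
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

    try:
        with conn:  # one transaction; rolls back automatically on error
            conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
            conn.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 2")
    except sqlite3.IntegrityError:
        pass  # the CHECK fired; we are back in a known-good state

    assert conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0] == 100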

...

 nostrademons 17 hours ago

This is one reason why prototyping is often so necessary. When you start from the user experience, you usually end up working your way back from the front-end.... Start from the UX, and you end up with a bunch of ad-hoc data structures that are very difficult to rationalize and inefficient to access. Start from the data, and you end up with a UI that mimics the data you were given and not how the user thinks about achieving their task....The solution is to write a quick & dirty prototype focusing on UX, nailing it but focusing on only the happy-path that's core to the user experience. Then take careful note of the data structures that you ended up with, and throw away the prototype. Then you start with a carefully planned data architecture that captures everything you learned in the prototyping phase, but eliminates redundancies and awkward access paths that you wrote in the quick & dirty prototype.

someone else commented 'start with the API' but they were downvoted

al2o3cr 8 hours ago

...

http://prog21.dadgum.com/37.html

...points out the tradeoffs explicitly: pure functions mean (in this case) immutable data, which means more-complex data structures...
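(To make that tradeoff concrete, a small Python sketch of my own: pure functions push you toward copy-on-update values, which is why immutable code tends to grow more elaborate data structures than the mutate-in-place version:)

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Account:
        owner: str
        balance: int

    def deposit(acct: Account, amount: int) -> Account:
        # pure: returns a new value instead of mutating the old one
        return replace(acct, balance=acct.balance + amount)

    a0 = Account("alice", 100)
    a1 = deposit(a0, 50)
    assert (a0.balance, a1.balance) == (100, 150)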


Euclid's algorithm in Forth, PostScript (which is like Forth), and Python:

: gcd ( a b -- n ) begin dup while tuck mod repeat drop ;

/gcd { { dup 0 eq { pop exit } if exch 1 index mod } loop } def

def gcd(u, v): return gcd(v, u % v) if v else abs(u)

it's interesting to compare:
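for one thing, all three turn on the same (a, b) -> (b, a mod b) step. A gloss of the Forth version in Python (my own, not from any of the sources), with the stack shuffling written out as tuple assignment:

    def gcd_iter(a, b):
        # begin dup while ... repeat: loop while the top of the stack is nonzero
        while b:
            # tuck mod: ( a b -- b a%b )
            a, b = b, a % b
        # drop: discard the final zero, leaving the answer
        return a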

---

Forth has an interesting convention to document what stack manipulation words do (in the parentheses; stuff in parens is comment in Forth); e.g.:

dup ( a -- a a )
drop ( a -- )
swap ( a b -- b a )
over ( a b -- a b a )
rot ( a b c -- b c a )
: -rot ( a b c -- c a b ) rot rot ;
: nip ( a b -- b ) swap drop ;
: tuck ( a b -- b a b ) swap over ;
: ?dup ( a -- a a | 0 ) dup if dup then ;

three notes:

---

" By now you realize that if monads were a stock, I’d be shorting it. I’m going to go get myself in a huge amount of trouble now, just as I did when I took a hideously pragmatic tack on continuations some years ago.

The most important practical contribution of monads in programming is, I believe, the fact that they provide a mechanism to interface pure functional programming to the impure dysfunctional world.

The thing is, you don’t really need them for that. Just use actors. Purely functional actors can interact with the stateful world, and this has been known since before Haskell was even conceived.

Some kind soul will doubtless point out to me how you can view actors as monads or some such. Be that as it may, it is beside the point. You can invent, build and most importantly, use, actors without ever mentioning monads. Carl Hewitt and his students did that decades ago.

Tangent: I have to say how amazing that is. Actors were first conceived by Hewitt in 1973(!), and Gul Agha's thesis has been around for 25 years. I firmly believe actors are the best answer to our concurrency problems, but that is for another post.

You can write an actor in a purely functional language, and have it send messages to file systems, databases or any other stateful actor. Because the messages are sent asynchronously, you never see the answer in the same activation (aka turn) of the actor, so the fact that these actors are stateful and may give different answers to the same question at different times does not stain your precious snow white referential transparency with its vulgar impurity. This is pretty much what you do with a monad as well - you bury the stateful filth in a well marked shallow grave and whistle past it.

Of course, your troubles are by no means over. Actors or monads, the state is out there and you will have to reason about it somewhere. But better you reason about it in a well bounded shallow grave than in C.

What is important to me is that the notion of actors is intuitive (a pesky property of Dijkstra’s hated anthropomorphisms, like self) for many people. Yes, there are many varieties of actors and I have my preferences - but I’ll take any one of them over a sheaf of categories.

Speaking of those preferences, look at the E programming language (I often point at Mark Miller's PhD thesis) or at AmbientTalk. I would like to have something similar in Newspeak (and in its hypothetical functional subsets, Avarice and Sloth). " -- http://gbracha.blogspot.com/2011/01/maybe-monads-might-not-matter.html
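(To make the actor idea concrete, a minimal sketch in Python; this is my own illustration, not code from the post. The handler is a pure function (state, message) -> (new_state, reply); all the statefulness is buried in the mailbox loop, and replies arrive asynchronously in a later turn:)

    import threading, queue

    def actor(handle, state):
        # spawn a minimal actor: mutation is confined to this loop
        mailbox = queue.Queue()
        def loop():
            s = state
            while True:
                msg, reply_to = mailbox.get()
                s, reply = handle(s, msg)
                if reply_to is not None:
                    reply_to.put(reply)  # the answer shows up in a later turn
        threading.Thread(target=loop, daemon=True).start()
        return mailbox

    # a "stateful" counter actor built from a pure handler function
    counter = actor(lambda n, msg: (n + 1, n + 1) if msg == "inc" else (n, n), 0)
    inbox = queue.Queue()
    counter.put(("inc", inbox))
    print(inbox.get())  # -> 1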

rebuttal in comments:

"

I too spend much of my time exploring how to effectively leverage actors, but I don't think you'll find any real world Erlang advocate that will advocate making anything that could be an actor into an actor. There comes a point at which you need to stop subdividing tasks into actors. However, there is no harm in observing that a structure is a monad and marking it as such. This pays no runtime cost.

Sending a message on a channel usually costs at least a compare-and-swap or memory fence. This limits their applicability to things above a certain granularity.

Actors carry some heavyweight baggage in the form of their message queue. This has an operational cost, because you can either live in the Erlang-style world where these things carry around potentially unbounded numbers of messages, whereupon the whole system can come crashing down upon your ears when the queues get out of whack because your consumers can't keep up with your producers; or you can live in the Singularity-style world where actors have to be composed out of 2-endpoint channels with some affine type system managing the endpoints. Living in a world full of Erlang-style actors requires you to build up a complicated series of tools for dealing with how to kill and reset the system when things inevitably go out of whack. If you look at Erlang's OTP, much of it is devoted to this very problem. (This _can_ be perceived as a good thing. It forces you to think about how to make a distributed system robust against a wide array of failures.)

I happen to enjoy using these abstractions quite a bit, but even in Erlang or Scala actors, you wind up passing around lists and other concrete data structures, because it isn't worth constructing those queues _everywhere_.

" -- http://gbracha.blogspot.com/2011/01/maybe-monads-might-not-matter.html?showComment=1296027246940#c6671808644893652655

---

recommends:

ycombinator

reddit programming

dadgum

http://raganwald.com/

https://twitter.com/hmason

---

" As to the syntax, THE MQL4 language for programming trading strategies is very much similar to the C++ programming language, except for some features:

    no address arithmetic;
    no goto operator;
    an anonymous enumeration can't be declared;
    no multiple inheritance."

also it seems to me not to have templates or any sort of parametric polymorphism? but it does have eg copy constructors

also it does have argument defaults

also the builtins have varargs but i'm not sure if user fns can

---

mpweiher 2 hours ago

> PowerShell = bash

While a great fan of PowerShell in theory, in practice it seems to be extremely cumbersome. Sort of the opposite of bash and other Unix shells, which suck in theory and are very useful/convenient/powerful in practice.

reply

Someone1234 2 hours ago

Cumbersome or you just aren't familiar with it yet?

The whole design of PS is meant to make it so you can "guess" the names of cmdlets you've never used before. Everything is Verb-Noun, Get-Service, New-Service, Restart-Service, Stop-Service, etc.

Discoverability is valued over succinctness.

reply

Karunamon 1 hour ago

Which is fine when you're starting out, but infinitely frustrating after you have an idea of what you're doing.

A recent example, looking for errors in windows logs:

    Get-ChildItem "C:\Windows\" -Recurse -Include *.log | Select-String "Error" -ErrorAction SilentlyContinue | Group-Object Filename | Sort-Object Count -Descending

Closest *NIX analogue I can think of would be

    grep -r Error /var/log/ | sort

English is an arguably fine speaking language but an awful programming one.

I love what Powershell can do, but gods do I hate typing it.

reply

Someone1234 1 hour ago

Those two lines do different things. They're not analogous.

The equivalent to the UNIX line above in Powershell is:

      ls "C:\app\*.log" -R  | sls "Error" | sort       

You cannot just tack on a bunch of extra requirements for the PowerShell version (grouping, sorting by certain things and in a certain order) and then not include them in the UNIX example; that's disingenuous/misleading.

The only big difference between PS and UNIX in an actual analogous example is that the PS version of grep gets files fed in one by one and processes them, whereas UNIX's grep processes files itself.

PS - The above Powershell code may not work in 2.0 (2009). You'll need 3.0 (2012) or higher.

reply

ckozlowski 46 minutes ago

I'd agree it's a bit more to type out, but it does seem to make more sense from a readability perspective. And for those of us who didn't grow up with the UNIX shell and so don't understand the reasons why things were kept so short, it's a bit easier to digest. (I do appreciate why bash is so short and succinct. =)

But really, taking the above into consideration, the might of PowerShell comes not from the terms used, in my opinion, but from how it works on things. With grep, for example, you're parsing a file. If, say, you wanted to filter on that more, you'd be using awk to pick out parts of the text.

In PowerShell, everything's already an object. You don't need to pick the file apart to isolate the date; you just filter on the date object, which has a proper data type.

That makes it really powerful, in my opinion.

(Example largely pulled from "Windows Powershell in Action". I really like this book, as it goes into detail to describe /why/ things are the way they are in PowerShell, because the author wrote the shell itself. =) https://www.manning.com/books/windows-powershell-in-action-s...

reply
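(an aside: the "everything's an object" point translates outside PowerShell too. In Python, pathlib/os.stat hand back typed objects, so filtering on a date is a comparison rather than awk-style text surgery; paths and names below are mine:)

    from pathlib import Path
    from datetime import datetime, timedelta

    cutoff = datetime.now() - timedelta(days=7)
    recent_logs = [
        p for p in Path("/var/log").rglob("*.log")
        if datetime.fromtimestamp(p.stat().st_mtime) > cutoff  # typed, no parsing
    ]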

overgryphon 1 hour ago

Lots of PowerShell cmdlets have aliases too. They are less discoverable than the Verb-Noun names, but for cmdlets I use often they are great. Get-ChildItem is aliased to gci, ls, and dir.

reply

 r-w 1 hour ago

Try links and environment variables to substitute long commands and flags, respectively. Fine, it’s not there by default, but as a power user you have the ability to adapt it to your needs. (That would also be a cool fix to release on GitHub for others like you to use as well. Yay hackability!)

reply

mpweiher 1 hour ago

Right. When I found out that this was supposed to be the alternative, I almost lost it.

For the thing to become usable, you have to do personal configuration, meaning your system will be unusable to anyone else and theirs will be unusable to you.

Back to the drawing board...

reply

kuschku 7 minutes ago

Except, GNU grep is even faster than Select-String, if you run it on a better filesystem.


Touche 1 hour ago

> Cumbersome or you just aren't familiar with it yet?

Cumbersome. To create my own cmdlet that does the cool stuff Powershell can do (as far as piping) my choices are a terrible language like ps1 or .NET.

On Unix I can use pretty much whatever I want because the shell isn't tied to a specific runtime.

reply

UK-AL 2 hours ago

You can have succinctness too, though: most commands have shortcuts.

Get-Content = gc or cat

for example

reply

---

 tkinom 22 hours ago

16 years ago, I created a profiling macro system for a select set of critical functions. The profiling system could switch the measured data among #instructions, #cpu_clk, #branch_stall_cycles, #L1_miss, and #L2_miss for all the critical-path functions.

After analyzing the data, I found that branch stall cycles and data-access stall cycles were causing a huge number of delays in the critical code path.

I used the following tricks to get rid of the stall cycles.

1) Used branch_likely to force gcc to make sure there is no branch stall at all in the critical path of execution (saves 30+ CPU cycles per branch; there are a lot of branch stall cycles if one simply follows the gcc-generated "optimized" code; 200MHz MIPS CPU).

2) Used prefetch ahead of data-structure accesses to get rid of uncached-data delays (saves ~50+ CPU cycles per data stall; again, there are a lot of them in the critical path).

3) Used inline functions, etc., to get rid of call stalls in the critical path.

The system got a ~100x increase in overall throughput from those techniques, with pure C optimization starting from a standard -O2 build.

I think it might be possible to create a build system that can automatically collect the profiling data (branch stall cycles and data stall cycles) and use the branch-likely and prefetch instructions to auto-optimize the critical-path code.

Specifying which code paths / function call sequences are the real critical path probably still requires the programmer's touch.

As a result of placing data-prefetch code in the proper places, I didn't need cache locking or any kind of CPU-affinity trick to generate optimized object code without stall cycles in the critical path.

reply

xavierd 18 hours ago

A lot of those optimizations would no longer yield any benefits[0]. CPU architecture has evolved a lot in 16 years, especially in branch prediction, to the point where a correctly predicted branch (without branch_likely) has almost no cost.

[0]: At least, this is true for x86 CPUs.

reply

e5f34f89 17 hours ago

As a CPU architect, I can confirm that all those except possibly 2) will not yield significant benefits. Prefetching hints will only be useful when the particular code fragment is highly memory-bound because most wide superscalar microarchitectures will easily hide L1/L2 miss latencies.

reply

---

notes on one guy's introduction to R (he now loves R): "Why on earth would you index a data structure with a $? Why would you “attach” a data frame?" -- http://datascience.la/a-conversation-with-hadley-wickham-the-user-2014-interview/

---

j2kun 203 days ago

> The language is byzantine and weird

This is my biggest beef with R. It is constantly changing the dimensions and types of your data without telling you. Want to grab some subset of the rows of a matrix? Better add some extra post-processing in case there's only one row that satisfies your query, or else R will change its type!

The solution is not to make the programmer memorize obscure edge cases.

jghn 203 days ago

You know that you can tell it not to do that, right? drop=FALSE

j2kun 203 days ago

I think this only reinforces my point: this is a ridiculous default. But that is news to me :)

andy_wrote 203 days ago

In light of this issue, dplyr's tbl_df structure (a light but helpful wrapper around data.frame) actually has different drop defaults, for example

  > x <- data.frame(foo=1:5, bar=1:5, baz=1:5)
  > dim(x[,'foo'])
  NULL
  > dim(x[,c('foo','bar')])
  [1] 5 2
  > dim(x[,'foo',drop=FALSE])
  [1] 5 1

compared to

  > x <- dplyr::data_frame(foo=1:5, bar=1:5, baz=1:5)
  > dim(x[,'foo'])
  [1] 5 1

Although I think these are more reasonable (I've got multiple commits at work with messages bemoaning drop=FALSE), this can ironically also mess you up if you got used to the old defaults :)

stewbrew 200 days ago

Use the right function, subset(), for the right effect then.
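(for comparison, numpy has the same gotcha: indexing with a scalar silently drops a dimension, while a slice keeps it, playing the role of drop=FALSE:)

    import numpy as np

    m = np.arange(12).reshape(3, 4)
    m[0, :].shape    # (4,)    scalar index drops the row dimension
    m[0:1, :].shape  # (1, 4)  a slice keeps it, like drop=FALSE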

---

i probably don't want to read the rest of this, but i'm storing it here just in case; it may have something to teach about library design:

https://www.imperialviolet.org/2015/10/17/boringssl.html

---

security issues in x86. i haven't read it:

http://blog.invisiblethings.org/papers/2015/x86_harmful.pdf

https://news.ycombinator.com/item?id=10458318

part II:

http://blog.invisiblethings.org/papers/2015/state_harmful.pdf

https://news.ycombinator.com/item?id=10787614

" The main principle introduced below is the requirement for the laptop hardware to be stateless , i.e. lacking any persistent storage. This includes it having no firmware-carrying flash memory chips. All the state is to be kept on an external, trusted device. This trusted device is envisioned to be of a small USB stick or SD card form factor. This clean separation of state-carrying vs. stateless silicon is, however, only one of the requirements, itself not enough to address many of the problems discussed in the article referenced above. There are a number of additional requirements: for the endpoint (laptop) hardware, for the trusted “stick”, and for the host OS. We discuss them in this paper. "

---

"It does not currently make sense to compile JavaScript? to WebAssembly?, because it lacks JavaScript?-specific features such as objects and arrays (for C++, one manually manages a heap in a typed array). Once WebAssembly? gains those features, JavaScript? can be compiled to it, but it will use the same techniques as current engines. Therefore, only the load time of applications will improve, because parsing is much faster.

...

2) Could languages like Dart, TypeScript and PureScript produce WebAssembly bytecode in the near future? If that were the case, it could be a good selling point to choose them instead of plain JavaScript, which in any case would produce less performant bytecode.

... Dart will continue to compile to JS in the short term. Dart code is not static enough to compile to wasm today, and it will be a while before wasm gets the dynamic features to support it.

    The same applies to TypeScript, PureScript, JavaScript, and Python.
    Currently, wasm will only support languages such as C and C++ that don't need dynamic features such as garbage collectors or polymorphic inline caches.

... I doubt it, at least as far as TypeScript compiling to wasm is concerned. TypeScript is a superset of JavaScript. Therefore, all (or at least most) valid JavaScript is also valid TypeScript. Additionally, TypeScript does not subvert JavaScript's object model and type system. Instead it complements it by providing optional restrictions. The strong typing is intended as a design feature to aid development; it does not and cannot provide the kind of hard type and layout guarantees (the kind used by a compiler to generate fast machine code) provided by purely statically typed languages like Java or C++. So compiling TypeScript to wasm would have the same drawbacks as compiling JavaScript to wasm. "

---

in a discussion about CodePush: https://news.ycombinator.com/item?id=10512867

sudhirj 14 hours ago

Was reading somewhere that you're free to do on-the-fly updates of any interpreted code - so JavaScript