proj-oot-ootDataNotes5

" Preserving order of dictionaries and kwargs

In CPython 3.6+ dicts behave like OrderedDict by default (and this is guaranteed in Python 3.7+). This preserves order during dict comprehensions (and other operations, e.g. during json serialization/deserialization).

import json
x = {str(i):i for i in range(5)}
json.loads(json.dumps(x))

  # Python 2: {u'1': 1, u'0': 0, u'3': 3, u'2': 2, u'4': 4}
  # Python 3: {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}

Same applies to **kwargs (in Python 3.6+): they're kept in the same order as they appear in the parameters. Order is crucial when it comes to data pipelines; previously we had to write it in a cumbersome manner:

from torch import nn

  1. Python 2:
     model = nn.Sequential(OrderedDict([
         ('conv1', nn.Conv2d(1,20,5)),
         ('relu1', nn.ReLU()),
         ('conv2', nn.Conv2d(20,64,5)),
         ('relu2', nn.ReLU())
     ]))
  2. Python 3.6+, how it *can* be done (not supported right now in pytorch):
     model = nn.Sequential(
         conv1=nn.Conv2d(1,20,5),
         relu1=nn.ReLU(),
         conv2=nn.Conv2d(20,64,5),
         relu2=nn.ReLU()
     )

Did you notice? Uniqueness of names is also checked automatically.

Iterable unpacking

  1. handy when the amount of additional stored info may vary between experiments, but the same code can be used in all cases:
     model_parameters, optimizer_parameters, *other_params = load(checkpoint_name)
  2. picking the two last values from a sequence; this also works with any iterable, so if you have a function that yields e.g. qualities, this is a simple way to take only the last two values it produces (see the sketch below)
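a runnable sketch of both patterns (load() here is a hypothetical stand-in for a checkpoint loader, not anything from the original):

  # load() is hypothetical; pretend it returns a tuple whose length varies by experiment
  def load(checkpoint_name):
      return ("model-params", "optimizer-params", 17, 0.93)

  model_parameters, optimizer_parameters, *other_params = load("ckpt")
  print(other_params)            # [17, 0.93] -- whatever extra info was stored

  # taking only the last two values from any iterable, e.g. a generator of qualities
  *_, penultimate, last = (x * x for x in range(10))
  print(penultimate, last)       # 64 81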

...

Multiple unpacking

Here is how you merge two dicts now:

x = dict(a=1, b=2)
y = dict(b=3, d=4)

  1. Python 3.5+: z = {**x, **y}
  2. z == {'a': 1, 'b': 3, 'd': 4}; note that the value for `b` is taken from the latter dict.

See this thread at StackOverflow for a comparison with Python 2.

The same approach also works for lists, tuples, and sets (a, b, c are any iterables):

[*a, *b, *c]   # list, concatenating
(*a, *b, *c)   # tuple, concatenating
{*a, *b, *c}   # set, union

Functions also support this for *args and **kwargs:

  1. Python 3.5+: do_something(**{**default_settings, **custom_settings})
  2. Also possible: do_something(**first_args, **second_args); this form additionally checks that there is no intersection between the keys of the two dictionaries (sketch below)
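a minimal sketch of the two call forms (do_something here is just a stand-in that prints its kwargs):

  default_settings = {"depth": 1, "verbose": False}
  custom_settings = {"verbose": True}

  def do_something(**kwargs):
      print(kwargs)

  # merge first; the latter dict silently wins on key clashes
  do_something(**{**default_settings, **custom_settings})   # {'depth': 1, 'verbose': True}

  # unpacking both directly instead raises on overlapping keys:
  # do_something(**default_settings, **custom_settings)
  # TypeError: do_something() got multiple values for keyword argument 'verbose'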

...

Future-proof APIs with keyword-only arguments ... In Python 3, library authors may demand explicitly named parameters by using *:

class SVC(BaseSVC):
    def __init__(self, *, C=1.0, kernel='rbf', degree=3,
                 gamma='auto', coef0=0.0, ... )
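a tiny illustration of the bare-* mechanism (svc below is a stand-in function, not sklearn's actual SVC):

  def svc(*, C=1.0, kernel='rbf', degree=3):
      return (C, kernel, degree)

  svc(C=10.0, kernel='linear')   # fine: everything is named
  # svc(10.0)                    # TypeError: svc() takes 0 positional arguments but 1 was given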

...

    Enums are theoretically useful, but
        string-typing is already widely adopted in the python data stack
        Enums don't seem to interplay with numpy and categorical from pandas
    ...
    Python 3 has stable ABI
    

" -- [1]

---

" Problems for code migration specific for data science (and how to resolve those)

    support for nested arguments was dropped
    map(lambda x, (y, z): x, z, dict.items())
    However, it is still perfectly working with different comprehensions:
    {x:z for x, (y, z) in d.items()}
    In general, comprehensions are also better 'translatable' between Python 2 and 3.
    map(), .keys(), .values(), .items(), etc. return iterators, not lists. Main problems with iterators are:
        no trivial slicing
        can't be iterated twice
    Almost all of the problems are resolved by converting result to list.
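a small sketch of both points in Python 3:

  d = {"a": (1, 2), "b": (3, 4)}

  # Python 2 allowed nested arguments in lambdas/defs; in Python 3 that's a SyntaxError,
  # but comprehensions still destructure nested tuples fine:
  print({x: z for x, (y, z) in d.items()})   # {'a': 2, 'b': 4}

  m = map(str.upper, ["a", "b"])   # an iterator: no slicing, exhausted after one pass
  as_list = list(m)                # converting to a list restores slicing and re-iteration
  print(as_list[:1])               # ['A']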

...

Course authors should spend time in the first lectures to explain what is an iterator, why it can't be sliced / concatenated / multiplied / iterated twice like a string (and how to deal with it). " -- [2]

---

" Optional Chaining #

let x = foo?.bar.baz();

this is a way of saying that when foo is defined, foo.bar.baz() will be computed; but when foo is null or undefined, stop what we're doing and just return undefined.

More plainly, that code snippet is the same as writing the following.

let x = (foo === null || foo === undefined)
    ? undefined
    : foo.bar.baz();

You might find yourself using ?. to replace a lot of code that performs repetitive nullish checks using the && operator.

Before if (foo && foo.bar && foo.bar.baz) { ... }

After-ish if (foo?.bar?.baz) { ... }

Keep in mind that ?. acts differently than those && operations since && will act specially on “falsy” values (e.g. the empty string, 0, NaN, and, well, false), but this is an intentional feature of the construct. It doesn’t short-circuit on valid data like 0 or empty strings.

Optional chaining also includes two other operations. First there’s the optional element access which acts similarly to optional property accesses, but allows us to access non-identifier properties (e.g. arbitrary strings, numbers, and symbols):

    arr?.[0];

There’s also optional call, which allows us to conditionally call expressions if they’re not null or undefined.

    log?.(`Request started at ${new Date().toISOString()}`);

Nullish Coalescing #

You can think of this feature - the ?? operator - as a way to “fall back” to a default value when dealing with null or undefined. When we write code like

let x = foo ?? bar();

this is a new way to say that the value foo will be used when it’s “present”; but when it’s null or undefined, calculate bar() in its place.

Again, the above code is equivalent to the following.

let x = (foo !== null && foo !== undefined) ? foo : bar();

The ?? operator can replace uses of || when trying to use a default value. For example, the following code snippet tries to fetch the volume that was last saved in localStorage (if it ever was); however, it has a bug because it uses ||.

function initializeAudio() {
    let volume = localStorage.volume || 0.5;
    // ...
}

When localStorage.volume is set to 0, the page will set the volume to 0.5 which is unintended. ?? avoids some unintended behavior from 0, NaN and "" being treated as falsy values. "

"

achou 9 hours ago [-]

I just did some refactoring on a medium size code base and here are a few things to watch out for when adopting optional chaining and the new null coalescing operator:

  foo && await foo();

is not the same as

  await foo?.();

this will work in most cases but subtly, the await wraps the undefined case into a Promise, while the original code would skip the await altogether.

String regular expression matching returns null, not undefined, so rewriting code such as:

  const match = str.match(/reg(ex)/);
  return match && match[1];

is not the same thing as:

  return match?.[1];

because the latter returns undefined, not null, in case of match failure. This can cause problems if subsequent code expects null for match failure. An equivalent rewrite would be:

  return match?.[1] ?? null;

which is longer than the original and arguably less clear.

A common idiom to catch and ignore exceptions can interact poorly with optional chaining:

  const v = await foo().catch(_ => {});
  return v?.field; // property 'field' does not exist on type 'void'

This can be easily remedied by changing the first line to:

  const v = await foo().catch(_ => undefined);

Of course, these new operators are very welcome and will greatly simplify and help increase the safety of much existing code. But as in all things syntax, being judicious about usage of these operators is important to maximize clarity.

reply

mattigames 7 hours ago [-]

You have to watch out for the first and last one in JavaScript but not in TypeScript, as it isn't possible to make that mistake there: you have to type it as a promise, or in the last one as void.

You can even avoid the problem in the second one by using NonNullable TypeScript types, but I admit that's not common so it's still likely to arise.

reply

achou 7 hours ago [-]

The first example can happen in TypeScript; foo has type

  (() => Promise<void>) | undefined

admittedly it may not be all that common to have a function-valued variable that may be undefined, but it happened in the code base I was working with.

In the last example, you're right that TypeScript will catch this at compile time. My point was to show how this compile time error can happen from refactoring to use optional chaining, and one easy solution in this case.

reply

"

---

:|

in haskell is the constructor for a Data.List.NonEmpty:

data NonEmpty a = a :| [a]

---

want to be able to do something like this Ruby example:

  list_cart unless @cart.include? 'pineapple'

so we need:

---

bools in structs should be required to have an explicit default value, or to have no default at all; they should not silently auto-default to 0 when unspecified
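a sketch of that rule in Python terms (dataclasses can already express "no default, the caller must choose"):

  from dataclasses import dataclass

  @dataclass
  class Flags:
      strict: bool             # no default: every construction site must decide
      verbose: bool = False    # defaulted, but the default is stated explicitly

  Flags(strict=True)           # ok; bare Flags() would be a TypeError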

---

[3]

---

gorgoiler 21 hours ago [–]

Pipes are wonderful! In my opinion you can’t extol them by themselves. One has to bask in a fuller set of features that are so much greater than the sum of their parts, to feel the warmth of Unix:

(1) everything is text

(2) everything (ish) is a file

(3) including pipes and fds

(4) every piece of software is accessible as a file, invoked at the command line

(5) ...with local arguments

(6) ...and persistent globals in the environment

A lot of understanding comes once you know what execve does, though such knowledge is of course not necessary. It just helps.

Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

reply

jcranmer 21 hours ago [–]

> (1) everything is text

And lists are space-separated. Unless you want them to be newline-separated, or NUL-separated, which is controlled by an option that may or may not be present for the command you're invoking, and is spelled completely differently for each program. Or maybe you just quote spaces somehow, and good luck figuring out who is responsible for inserting quotes and who is responsible for removing them.

reply

laumars 21 hours ago [–]

> everything is text

Everything is a byte stream. Usually that means text but sometimes it doesn't. Which means you can do fun stuff like:

reply

" Indeed, while pipes are useful at times, their system of communicationbetween programs—text traveling through standard input and standard out-put—limits their usefulness.4 First, the information flow is only one way.Processes can’t use shell pipelines to communicate bidirectionally. Second,pipes don’t allow any form of abstraction. The receiving and sending pro-cesses must use a stream of bytes. Any object more complex than a bytecannot be sent until the object is first transmuted into a string of bytes thatthe receiving end knows how to reassemble. This means that you can’tsend an object and the code for the class definition necessary to implementthe object. You can’t send pointers into another process’s address space.You can’t send file handles or tcp connections or permissions to accessparticular files or resources.At the risk of sounding like a hopeless dream keeper of the intergalacticspace, we submit that the correct model is procedure call (either local orremote) in a language that allows first-class structures (which C gainedduring its adolescence) and functional composition. " -- http://web.mit.edu/~simsong/www/ugh.pdf

cuddlybacon 21 hours ago [–]

I mostly like what they wrote about pipes. I think the example of bloating they talked about in ls at the start of the shell programming section is a good example: if pipelines are so great, why have so many unix utilities felt the need to bloat?

I think it's a result of there being just a bit too much friction in building a pipeline. A good portion tends to be massaging text formats. The standard unix commands for doing that tend to have infamously bad readability.

Fish Shell seems to be making this better with its string command, whose syntax makes it clear what it is doing: http://fishshell.com/docs/current/cmds/string.html I use fish shell, and I can usually read and often write text manipulations with the string command without needing to consult the docs.

Nushell seems to take a different approach: add structure to command output. By doing that, it seems that a bunch of stuff that is super finicky in the more traditional shells ends up being simple and easy commands with one clear job in nushell. I have never tried it, but it does seem to be movement in the correct direction.

reply

code-faster 21 hours ago [–]

It's less that pipelines are friction, they're really not.

It's more that people like building features and people don't like saying no to features.

The original unix guys had a rare culture that was happy to knock off unnecessary features.

reply

 atombender 17 hours ago [–]

Pipes are a great idea, but are severely hampered by the many edge cases around escaping, quoting, and, my pet peeve, error handling. By default, in modern shells, this will actually succeed with no error:

  $ alias fail='exit 1'
  $ find / | fail | wc -l; echo $?
  0
  0

You can turn on the "pipefail" option to remedy this:

  $ set -o pipefail
  $ find / | fail | wc -l; echo $?
  0
  1

Most scripts don't, because the option makes everything much stricter, and requires more error handling.

Of course, a lot of scripts also forget to enable the similarly strict "errexit" (-e) and "nounset" options (-u), which are also important in modern scripting.

There's another error that hardly anyone bothers to handle correctly:

  x=$(find / | fail | wc -l)

This sets x to "" because the command failed. The only way to test if this succeeded is to check $?, or use an if statement around it:

  if ! x=$(find / | fail | wc -l); then
    echo "Fail!" >&2
    exit 1
  fi

I don't think I've seen a script ever bother do this.

Of course, that's assuming you don't also want the error message from the command. If you want that, you have to start using named pipes or temporary files, with the attendant cleanup. Shell scripting is suddenly much more complicated, and the resulting scripts become much less fun to write.

And that's why shell scripts are so brittle.

reply

codemac 17 hours ago [–]

Just use a better shell. rc handles this wonderfully, $? is actually called $status, and it's an array, depending on the number of pipes.

reply

fomine3 15 hours ago [–]

set -e makes another pain for commands where nonzero doesn't mean failure (e.g. diff). It changes the semantics of the whole script.

reply

 geophile 22 hours ago [–]

I love pipelines. I don't know the elaborate sublanguages of find, awk, and others, to exploit them adequately. I also love Python, and would rather use Python than those sublanguages.

I'm developing a shell based on these ideas: https://github.com/geophile/marcel.

reply

ehsankia 20 hours ago [–]

+1

Piping is great if you memorize the (often very different) syntax of every individual tool and memorize their flags, but in reality unless it's a task you're doing weekly, you'll have to go digging through man pages and documentation every time. It's just not intuitive. Still to date if I don't use `tar` for a few months, I need to look up the hodgepodge of letters needed to make it work.

Whenever possible, I just dump the data in Python and work from there. Yes some tasks will require a little more work, but it's work I'm very comfortable with since I write Python daily.

Your project looks nice, but honestly iPython already lets me run shell commands like `ls` and pipe the results into real python. That's mostly what I do these days. I just use iPython as my shell.

reply

khimaros 21 hours ago [–]

The lispers/schemers in the audience may be interested in Rash https://docs.racket-lang.org/rash/index.html which lets you combine an sh-like language with any other Racket syntax.

reply

cat199 21 hours ago [–]

also what I think is the 'original' in this domain, scsh

reply

jraph 22 hours ago [–]

Your project looks really cool.

I am pretty sure I've seen a Python-based interactive shell a few years ago but I can't remember the name. Have you heard of it?

reply

x1798DE 22 hours ago [–]

I imagine you are thinking of xonsh? https://xon.sh/

reply

ketanmaheshwari 22 hours ago [–]

Unix pipelines are cool and I am all for it. In recent times however, I see that sometimes they are taken too far without realizing that each stage in the pipeline is a process and a debugging overhead in case something goes wrong.

A case in point is this pipeline that I came across in the wild:

TOKEN=$(kubectl describe secret -n kube-system \
    $(kubectl get secrets -n kube-system | grep default | cut -f1 -d ' ') \
    | grep -E '^token' | cut -f2 -d':' | tr -d '\t' | tr -d " ")

In this case, perhaps awk would have absorbed 3 to 4 stages.

reply

dfinninger 21 hours ago [–]

Oh man. That's when knowing more about the tools you are using comes in handy. Kubectl has native JSONPath support [0].

Or at the very least, use structured output with "-o json" and jq [1], like they mention in the article.

I have always found that trying to parse JSON with native shell tools has been difficult and error-prone.

[0] https://kubernetes.io/docs/reference/kubectl/jsonpath/ [1] https://stedolan.github.io/jq/

reply

---

lisp's `(quoted-list-with-substitution ,(this-is-evaluated) )

---

first-class 'automatic type conversion and constraint checking' in Iris reminds me of Hoon:

" Ubiquitous use of parameterizable coercions (which are also first-class values) for automatic type conversion and constraint checking, e.g. list returns a basic (unspecialized) list coercion, whereas list {of: whole_number {from: 0, to: 100}, min: 4, max: 4} returns a list coercion with additional element type and length constraints. Code is untyped; however, handler interfaces can include coercion information to provide both auto-generated user documentation and run-time conversions and checks that handler arguments and results are suitable for use. Weak latent structural rather than strong nominal typing: “If a value looks acceptable, it [generally] is.” " -- https://github.com/hhas/iris-script

---

need to look at Kernel's other innovations beyond fexprs/vau/wrap. The webpage says they have 'keyed' dynamic variables which are 'fluids done right'

---

" (println (s/valid? ::user {::username "rich" ::password "zegure" ::comment "this is a user" ::last-login 11000}))

:my-project.users/username ;;this is what fully-qualified keywords look like true

Spec also encourages the use of qualified keywords: Until recently in Clojure people would use keywords with a single colon but the two colons (::) mean that keywords belong to this namespace, in this case my-project.users. This is another deliberate choice, which is about creating strong names (or "fully-qualified"), that belong to a particular namespace, so that we can mix namespaces within the same map. This means that we can have a map that comes from outside our system and has its own namespace, and then we add more keys to this map that belong to our own company's namespace without having to worry about name clashes. This also helps with data provenance, because you know that the :subsystem-a/id field is not simply an ID – it's an ID that was assigned by subsystem-a. " -- [4]

---

brundolf 1 day ago [–]

One other aspect of Rust that makes it suited to GUI work is the fact that the management of mutable state is one of the core problems of writing a GUI app, and that Rust allows you to talk about mutation in a way that no other language (that I know of) does. You can a) have mutable structures, and b) declare that a function will treat one of those - passed as an argument - as deeply immutable, both within the same language. You can have exactly the amount of mutation that you want, which should be extremely enticing to any GUI developer.

reply

zozbot234 1 day ago [–]

> deeply immutable

Not really true, because the "interior mutability" pattern allows for mutating structures that are passed via a shared reference. Truly "immutable" data is in fact quite hard to characterize in a language that's as 'low-level' as Rust.

reply

brundolf 1 day ago [–]

Technically yes, there are trapdoors like RefCell. But these are intended to be used sparingly because they move all relevant borrow-checks to runtime. Under normal circumstances immutability is statically guaranteed by the ownership system, which is much more than can be said for other languages where mutability is an option at all.

reply

Groxx 4 hours ago [–]

I suspect lifetimes get you the vast majority of the benefit of immutable data for UI purposes, tbh. It lets you ensure that references aren't retained or accessed at the wrong time, unless they provide an explicit way to bypass that (i.e. the defaults for data types is safe).

reply

---

" 3. Rust is is not that good to be a language-behind-UI. Ownership graph of GUI objects can be quite complex, contain loops and can be not known at compile time. Something GCable plays significantly better in this respect. That's why JS is so popular in UI. Sciter's script is even better than JS but that's another story. "

"

Point 3 is brimming full of assumptions that there is good cause to suspect are false. You were present in the Towards Principled Reactive UI thread a few days ago (https://news.ycombinator.com/item?id=24599560), which is basically all about developing models that don’t depend on complex ownership graphs. Sure, GC languages work better for the traditional breed of observer-based GUIs, but that’s not the only feasible approach—it’s merely the simplest to implement at the library level, which is why it got so popular. So Rust is probably not a good language for supporting that particular traditional style of UI, but there’s reason to suspect that it may actually be very good for other types of UIs. " https://news.ycombinator.com/item?id=24599560

---

a dataframe is like a spreadsheet, but (sometimes) without heterogeneity, formulas, UI

---

in J,

" Notice that a scalar, (17 say), is not the same thing as a list of length one (e.g. 1 $ 17), or a table with one row and one column (e.g. 1 1 $ 17). "

---

"The easiest way to include data in a question is to use dput() to generate the R code to recreate it. For example, to recreate the mtcars dataset in R, I’d perform the following steps: Run dput(mtcars) in R"

---

https://blog.klipse.tech/databook/2020/09/29/do-principles.html?essence

The principles of Data Oriented (DO) Programming are:

    Separate code from data
    Model entities with generic data structures
    Data is immutable
    Data is comparable by value
    Data has a literal representation

https://blog.klipse.tech/databook/2020/10/02/separate-code-data.html

the idea is that this is bad:

class Author {
    constructor(firstName, lastName, books) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.books = books;
    }
    fullName() {
        return this.firstName + " " + this.lastName;
    }
    isProlific() {
        return this.books > 100;
    }
}

and this is good, because the data structure defn (firstName, lastName, books) is separated from the operations on it (fullName and isProlific):

function createAuthorData(firstName, lastName, books) {
    return {firstName: firstName, lastName: lastName, books: books};
}
function fullName(data) {
    return data.firstName + " " + data.lastName;
}
function isProlific(data) {
    return data.books > 100;
}

https://blog.klipse.tech/databook/2020/10/02/generic-data-structures.html

"The most common data structures are maps (a.k.a dictionaries) and arrays. Other data structures: sets, lists and queues."

dicts, arrays good, classes bad

https://blog.klipse.tech/databook/2020/10/02/immutable-data.html

"As far as I know, Clojure is the only programming language where data is immutable by default. For other languages, adhering to data immutability requires the inclusion a third party library.

https://blog.klipse.tech/databook/2020/10/02/data-comparable-value.html

" Two arrays with same elements are considered to be equal Two maps with the same keys and values are considered to be equal "

" In Clojure, equality is defined by value in compliance with Principle #4. However, on most programming languages, equality is defined by reference and not by value. "

https://blog.klipse.tech/databook/2020/10/03/data-literal.html

    It is possible to display the content of any data collection.
    A data collection can be instantiated via a literal.

bad:

var data = new Object();
data.firstName = "Isaac";
data.lastName = "Asimov";
data.details = new Object();
data.details.yearOfBirth = 1920;
data.details.yearOfDeath = 1992;
data;

good:

Object { "details": Object { "yearOfBirth": 1920, "yearOfDeath": 1992, }, "firstName": "Isaac", "lastName": "Asimov", }

---

after reading the previous section, i think for Oot i (strongly?) agree with all of those, except that we still want the option of 'classes' (via annotations?) for performance.

to summarize:

1. Separate code from data: as i would put it, yes, use classes to encapsulate data structures, but don't put 'business logic' methods in them
2. Model entities with generic data structures: yes, in Oot we would like to make everything a typeclass, and probably make everything a generic data structure, but allow it to be made more rigid upon demand (via annotations?) for perf reasons
3. Data is immutable: yes, we'll have immutable by default
4. Data is comparable by value: i hadn't thought about this too hard yet but if he thinks it's good for this sort of thing, and Clojure does too, then it's probably the way to go
5. Data has a literal representation: yes, this makes things much more concise and also more readable

interestingly, the OVM layer should NOT (only, or even primarily) be like this (although it should support it), because we want some perf there. There we want Wren-like or JVM-like focus on static stuff like classes.

on HN, a commenter says "In the context of C++ some of us have been calling these programming/design principles 'Value Oriented Design'" -- https://news.ycombinator.com/item?id=24687287. mb see also https://matt.diephouse.com/2018/08/value-oriented-programming/ (already skimmed, it's not essential; although note that the first half of the page, "Don't start with a class. Start with a protocol", is sort of another way of stating our goal "everything is an interface" (except they're not saying everything is an interface, just 'start with an interface'))

" I think "data are immutable" should be qualified - usually it means "data are immutable through your codepaths" and if you are mutating data, you need explicit checkouts and checkins of the data."

" As for some of the short-comings you mentioned:

"But you still need a mechanism to manage mutating data"

Clojure supports this through the use of locking constructs like atoms. [1]

"I think what it's getting at is that you don't really know the precise type of your data, over time, in a distributed system, so it's good to include the flexibility to handle that. That makes sense to me. but generic data structures aren't necessarily always the right way to handle that."

Clojure attempts to bridge the gap between generic data-structures and strongly-typed constructs using run-time specifications. [2]

" -- https://news.ycombinator.com/item?id=24690767

---

one question i have is, should we discourage circular dependencies (to allow Automatic Reference Counting (ARC)), or even provide a sort of 'mode' where a subset of data could be annotated as no-circular-dependencies?
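for context, a tiny CPython demo of why cycles are exactly the thing pure reference counting can't handle:

  import gc

  class Node:
      def __init__(self):
          self.other = None

  a, b = Node(), Node()
  a.other, b.other = b, a    # reference cycle: refcounts can never reach zero
  del a, b                   # plain ARC would leak these two objects
  print(gc.collect() > 0)    # True: CPython's backup cycle collector reclaims them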

---

https://kevinmahoney.co.uk/articles/applying-misu/

Applying “Make Invalid States Unrepresentable”

already read (skimmed). It's a good read (skim). The basic idea is to Make Invalid States Unrepresentable (misu) by removing redundancy in data structures. For example, if you have a bunch of time segments which should partition the timeline, then instead of giving each segment both a start time and an end time, share the start time of one segment with the end time of the next one, to prevent gaps. Or, if a timeline should be partitioned into segments of either type A or type B, where type B is the default when type A was not specified, and type B can be 'ongoing' but type A cannot, then rather than represent the type Bs explicitly, just represent the start and end times of the type As and infer that everything else is type B, again preventing gaps.

however, many HN commenters think that you should not 'misu' in this way with business requirements, because they may change, which would lead to a major redesign. Rather, only misu in this way with states that are 'invalid' in a more fundamental way.
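a minimal sketch of the shared-boundary trick in Python: represent the partition by its boundary instants, so gaps and overlaps are unrepresentable by construction:

  boundaries = [0, 5, 9, 17]                        # N+1 boundaries define N segments
  segments = list(zip(boundaries, boundaries[1:]))
  print(segments)   # [(0, 5), (5, 9), (9, 17)] -- each end is the next start, no gaps possible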

---

" 5.1.3 dplyr basics

In this chapter you are going to learn the five key dplyr functions that allow you to solve the vast majority of your data manipulation challenges:

    Pick observations by their values (filter()).
    Reorder the rows (arrange()).
    Pick variables by their names (select()).
    Create new variables with functions of existing variables (mutate()).
    Collapse many values down to a single summary (summarise()).

These can all be used in conjunction with group_by() which changes the scope of each function from operating on the entire dataset to operating on it group-by-group. These six functions provide the verbs for a language of data manipulation.

All verbs work similarly:

    The first argument is a data frame.
    The subsequent arguments describe what to do with the data frame, using the variable names (without quotes).
    The result is a new data frame.

" -- https://r4ds.had.co.nz/transform.html

"# Select all columns except those from year to day (inclusive): select(flights, -(year:day))"

---

"Missing values are always sorted at the end" in dplyr

---

in dplyr:

"filter() only includes rows where the condition is TRUE; it excludes both FALSE and NA values. If you want to preserve missing values, ask for them explicitly:"

---

R says NA instead of NaN (strictly, R has both: NA marks missing data, NaN invalid numeric results)

---

some of the comments in https://news.ycombinator.com/item?id=24730713 are a good read

"

nickpeterson 2 days ago [–]

There is a book, Applied Mathematics for Database professionals that uses a terse mathematical notation to describe queries and I remember actually kind of liking it. I especially like that they used it to describe the state of the database and the transforms. "

"

reply

jhallenworld 2 days ago [–]

I don't like either syntax. Wikipedia has this example:

QUEL:

    range of E is EMPLOYEE
    retrieve into W
    (COMP = E.Salary / (E.Age - 18))
    where E.Name = "Jones"

SQL:

    select (e.salary / (e.age - 18)) as comp
    from employee as e
    where e.name = "Jones"

I would prefer an operator syntax that directly mimics relational algebra. Something like:

    w = employee(name == "Jones")[comp = salary / (age - 18)]

So () is "where", [] is "project" (choose or create columns) and you can use * for join and + for union. The result is a table with a column named comp.

reply

roenxi 2 days ago [–]

They should implement something using straight functions, extremely spartan with no special syntax at all, and let everyone build their own favoured DSL over the top.

One of my major complaints about SQL is the syntax is so finicky that it is really hard to replace it with a [something -> sql] layer, because the something layer can't generate all the silly syntactic forms that SQL uses.

Eg, personal favourite, it is easy to have a dsl that translates

  select(y = fn(x)) -> select fn(x) as y

that then breaks down because it can't construct

  ??? -> select extract(month from x) as y

and that is the only syntax the SQL database decided to understand. There are too many cases like that that need special handling, especially once SQL dialect-specific stuff comes into play.

reply "

---

"Data-bearing enums...

#[derive(Copy, Clone, Debug)]
pub enum ETM3Header {
    BranchAddress { addr: u8, c: bool },
    ASync,
    CycleCount,
    ISync,
    Trigger,
    OutOfOrder { tag: u8, size: u8 },
    StoreFailed,
    ISyncCycleCount,
    OutOfOrderPlaceholder { a: bool, tag: u8 },
    VMID,
    NormalData { a: bool, size: u8 },
    Timestamp { r: bool },
    DataSuppressed,
    Ignore,
    ValueNotTraced { a: bool },
    ContextID,
    ExceptionExit,
    ExceptionEntry,
    PHeaderFormat1 { e: u8, n: u8 },
    PHeaderFormat2 { e0: bool, e1: bool },
} " -- [5]

---

" In APL, arrays can be both multidimensional and contain nested arrays. This can be a little difficult to wrap your head around, so let's look at an example.

Using the multidimensional array in the example above, we'll replace the item in the central (2, 2) position (the number 4) with a new multidimensional array consisting of the letters a, b, c, and d:

0  1   2
   a b
3      5
   c d
6  7   8
" [6]

---

" in Swift enums have associated values)" -- [7]

---

https://lukeplant.me.uk/blog/posts/everything-is-an-x-pattern/

examples given where X = buffer class expression file object table/relation resource sexp

https://lobste.rs/s/g7c661/everything_is_x

also mentions x = memory_address, stream, function, set, category, actor, symbol, relation, array, procedure, file_system, [cons cell, or atom, or nil], [register or immediate], [stream/file or process], [a forth word]

---

https://github.com/redplanetlabs/specter

---

lojikil 3 days ago
    Something other than “everything is bytes”, for starters. The operating system should provide applications with a standard way of inputting and outputting structured data, be it via pipes, to files, …

It’s a shame I can agree only once.

Things like Records Management Services, ARexx, Messages and Ports on Amiga or OpenVMS’ Mailboxes (to say nothing of QIO), and the data structures of shared libraries on Amiga…

Also, the fact that things like Poplog (which is an operating environment for a few different languages but allows cross-language calls), OpenVMS’s common language environment, or even the UCSD p-System aren’t more popular is sad to me.

Honestly, I’ve thought about this a few times, and I’d love something that is:

    an information utility like Multics
    secure like seL4 and Multics
    specified like seL4
    distributed like Plan9/CLive
    with rich libraries, ports, and plumbing rules
    and separated like Qubes
    with a virtual machine that is easy to inspect like LispM’s OSes, but easy to lock down like Bitfrost on one-laptop per child…

a man can dream.

---

JohnCarter 3 days ago

In many ways you can’t even remove the shells from current OSes, IPC is so b0rked.

How can a shell communicate with a program it’s trying to invoke? Array of strings for options and a global key value dictionary of strings for environment variables.

Awful.

It should be able to introspect to find out the schema for the options (what options are available, what types they are…)

Environment variables are a reliability nightmare. Essentially hidden globals everywhere.

Pipes? The data is structured, but what is the schema? I can pipe this to that, does it fit? Does it make sense….? Can I b0rk your adhoc parser of input, sure I can, you scratched it together in half a day assuming only friendly inputs.

In many ways IPC is step zero to figure out. With all the adhoc options parsers and adhoc stdin/out parsers / formatters being secure, robust and part of the OS.

---

"

My favorite near-miss is this observation Graham made in December 2001, only months after he began development on Arc:

    assoc-lists turn out to have a property that is very useful in recursive programs: you can cons stuff onto them nondestructively. We end up using assoc-lists a lot.

He had noticed that immutable maps are a useful data structure. Building on this, he might have found a paper published the previous year describing an efficient implementation for immutable maps, and made those a foundational data representation in his language. That, in any case, is what Rich Hickey did when creating Clojure, which was released three months before Arc and became the most widely-used Lisp ever made. Instead, assoc-lists remain a list-backed data structure which, Arc’s documentation informs us, makes them “inefficient for large numbers of entries”. "
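the property he noticed, sketched in Python with tuples standing in for cons cells:

  base = (("a", 1), ("b", 2))
  extended = (("b", 3),) + base    # "cons" a new binding on, nondestructively

  def lookup(alist, key):          # first match wins, so new bindings shadow old ones
      return next(v for k, v in alist if k == key)

  print(lookup(extended, "b"), lookup(base, "b"))   # 3 2 -- base is untouched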

---

 pansa2 19 hours ago [–]

Python's descriptor protocol is very elegant. Given:

    class C:
        def __init__(self):
            self.a = 1
        def b(self):
            print(f'b({self})')
        @property
        def c(self):
            print(f'c({self})')
    o = C()

The descriptor protocol is a generic way to enable `o.a` to access an attribute of `o`, `o.b` to access an attribute of `C` and bind `self` to `o`, and `o.c` to access an attribute of `C`, bind `self` to `o`, and call it.
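a minimal descriptor makes the protocol concrete (a sketch of roughly the machinery that methods and @property are built on):

  class Loud:
      def __get__(self, obj, objtype=None):
          return f"computed for {obj}"

  class D:
      c = Loud()        # found on the class during attribute lookup

  o = D()
  print(o.c)            # the descriptor's __get__ runs instead of returning Loud itself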

Do any other dynamic languages use a similar protocol for attribute access?

Ruby, for example, doesn't need to - it can use a simpler approach because `o.x` never refers to an attribute of `o`.

JavaScript's solution is less elegant - the way it sets the value of `this` is infamous, and AFAIK properties are treated as a special case rather than built on top of a generic protocol.

reply

lifthrasiir 10 hours ago [–]

I don't think Python's approach is particularly more elegant than JavaScript, as they both distinguish property-as-value and property-as-function. It's just a matter of where to put property-as-function (and unlike Python, JavaScript can have non-configurable properties which can be optimized). As property-as-function can subsume property-as-value, no built-in property-as-value would be truly elegant (Ruby is close, but not exact).

reply

sillysaurusx 9 hours ago [–]

I agree. Python's descriptor pattern was an endless source of confusion for me. But, it's much easier to do mixins in Python than in JS, even if the mro algorithm is a bit complicated...

reply

pansa2 9 hours ago [–]

Sorry, could you clarify what you mean by “distinguish property-as-value and property-as-function“? How does Python distinguish between these two?

reply

---

i've been hearing a lot about HCL lately. Seems like now the frontrunners for data languages are TOML and HCL. YAML is often mentioned as well but i would go with StrictYAML at the least. Protobuf is also used a lot although it appears to have some problems, and of course JSON is the mainstay.

---

so what data lang is on top these days?

Ppl seem to like TOML and HCL

HCL allows expressions, so it's probably not what i want

Something ppl like in HCL over TOML is the way that you can nest things using a JSON-like syntax, without referencing the containing block name in the nested block

something important in both of them and lacking in JSON is comments

---

gfxgirl 4 days ago [–]

I just wish there were better UIs for much of this stuff. Most code editors have a regex search and replace, but figuring out the regex requires too much regex expertise. I can't think of a good example off the top of my head, but for example, being able to search for things that match multiple rules. It would be much easier for many users to be able to say "starts with x", "ends with y", "has abc in the middle" as 3 separate statements than to have to derive the regex that does it.

Similarly, being able to search easily for balanced parens, brackets, quotes. And being able to search in a programming-language-aware way, as in "only in strings", "only in comments", etc...

All of those seem like they'd be useful features but I know of no editor that does any more than just a single regex with no context.

reply

---

aarroyoc 3 days ago [–]

Some of the things you say can already be achieved with Prolog DCGs, which are more or less like pattern matching over lists (usually lists of chars, but that's not required).

reply

---

abernard1 45 days ago [–]

I concur.

The dynamic inheritance part of Ruby is still core even in a non-Rails ecosystem. That does make optimizing performance more difficult.

But on a much simpler level, a lot of Pythonic data code is just functions + values, without them being complected in an object. That makes a lot of stuff like wrapping numerical libraries in C very easy in Python. I think you'd still have a bit of a culture shock writing procedural, return-by-value code in Ruby. That's pretty normal in Python and I think one of the (many) reasons it got picked up by STEM disciplines in academia. Looks over shoulder at awful code in Numerical Recipes in Fortran book.

Different strokes for different folks and all that, I just think there are probably constraints (and cultural values) that Python has that make it more suited to data problems than Ruby.

mumblemumble 45 days ago [–]

I'm inclined to agree.

One of Python's strengths in this space is that it's a procedural language with just enough in the way of object-oriented facilities. That makes it possible to write reasonably object-oriented libraries with easy-to-learn interfaces, while still sticking to procedural idioms for the actual data hacking.

Which, in turn, makes it a comfortable environment and interfacing mechanism for both engineers and analysts.

That said, sometimes I look at tools like https://moosetechnology.org and desperately wish I could be using something like that at work.

pjmlp 45 days ago [–]

Just enough?

Python has the full program with multiple inheritance, metaclasses, abstract classes, slots, attributes, properties and decorators.

One can go wizard level with Python's OOP facilities.

abernard1 45 days ago [–]

Totally.

Another thing is that outside of scientific data applications, ordinary data processing apps at scale have been map-reduce for a long time. That's an inherently functional paradigm that's closer to Python.

Having parallelized heavily OO Ruby code across machines by unrolling it into functions that could be sent data payloads... that's not a fun thing to sell in an OO culture. In Python you're kind of already writing code that way, so increasing performance by splitting the data onto different machines is simple.

---

cutler 45 days ago [–]

Functional paradigm closer to Python? I think you mean procedural as Guido has openly professed his distaste for all things functional to which Python's crippled lambdas bear witness. Ruby, on the other hand, with its procs, blocks and lambdas, supports an elegant functional style even if it is implemented in OO.

abernard1 45 days ago [–]

So I'm using a simple, deliberately not correct version of "functional" because a lot of people don't know the difference.

I don't like it when people break out their monocle and pipe and complain about this or that being true functional :-). I really do just mean "procedural functions + return by value instead of mutation" here.

It is completely normal to have a python script that is just ordinary functions, without state being enclosed in a class and accessed via `self`. This "My First Program" behavior scales extremely well, both from a parallelism perspective, as well from a complexity perspective. OO features can be applied iteratively on top of that.

With Ruby--syntax aside--it is just very rare to see people only write functions or private_class_methods. The "Everything is a Class" mantra is very strong, and there's lots of rich behavior in Ruby attached onto what would be considered primitives or basic structs in Python.

With data science specifically, this natural way of pushing simple procedures, primitives, structs from Python down into C has lots of performance benefits. Because those functions are "flat", you don't have to traverse the call stack routinely and lose performance there.

pjmlp 45 days ago [–]

All of which go back to Smalltalk blocks and collections.

As for Python, Guido might not like it, but Python has enough features to do FP, in spite of crippled lambdas.

mumblemumble 45 days ago [–]

I don't think that would really be too bad, TBH. Don't forget that the first project to popularize distributed map-reduce, Hadoop, was a Java project.

All you really have to do to translate map and reduce into an OO paradigm is have the map and reduce operations take objects that implement specific interfaces instead of functions that have a certain signature.

---

rat87 45 days ago [–]

Python isn't less expressive for the most part, except for the lack of blocks (although they can be emulated with decorators and def _), ruby's strategic lack of parentheses, and not being able to add/change methods in C-based classes.

Python has just as much access to reflection, and it has more powerful (and arguably easier to shoot yourself in the foot with) inheritance: python has multiple inheritance while ruby only has mixins. Python also has method_missing in __getattr__. Arguably python tends to use these less often than ruby, but I've definitely used both reflection and __getattr__. I've also once had to debug a really hairy multiple inheritance issue (in a test framework that used base classes to test different hardware features, where if you wanted both you'd need to use multiple inheritance and then make sure they were initialized in the right order; multiple inheritance can get really annoying).

---

AcerbicZero 45 days ago [–]

I've found Ruby to be far more "learnable", in that once I learn how something works in Ruby, it will work that way in most every situation going forward, whereas Python has a lot of caveats to its functionality. This might just be due to my use cases, but the Ruby documentation is 100x more useful for getting me from point A to point B than anything I've seen from Python's docs.

I still use Python daily, since its so ubiquitous, I just find myself having a difficult time using it for anything remotely complicated without a lot of extra research.

---

" Arrays are a first-class citizen in Fortran. A higher level abstraction for arrays, compared to C, makes it easier for scientists to describe their domain. Fortran supports multi-dimensional arrays, slicing, reduction, reshaping, and many optimizations for array based calculations like vectorization. "

" Point of ease 1: Fortran array handling features

Arrays (or in physics-speak, matrices) lie at the heart of all physics calculations. Fortran 90+ incorporates array handling features, similar to APL or Matlab/Octave. Arrays can be copied, multiplied by a scalar, or multiplied together quite intuitively as:

A = B
A = 3.24*B
C = A*B
B = exp(A)
norm = sqrt(sum(A**2))

Here, A, B, C are arrays with some dimensions (for instance, they could all be 10x10x10). C = A*B gives an element-by-element multiplication of A and B, assuming A and B are the same size. To do a matrix multiplication, one would use C = matmul(A,B). Almost all of the intrinsic functions in Fortran (Sin(), Exp(), Abs(), Floor(), etc) can take arrays as arguments, leading to ease of use and very neat code. Similar C++ code simply does not exist. In the base implementation of C++, merely copying an array requires cycling through all the elements with for loops or a call to a library function. Trying to feed an array into the wrong library function in C will return an error. Having to use libraries instead of intrinsic functions means the resulting code is never as neat, as transferable, or as easy to learn.

In Fortran, array elements are indexed using the simple syntax A(x,y,z), whereas in C++ one has to use A[x][y][z]. Arrays are indexed starting at 1, which conforms to the way physicists talk about matrices, unlike C++ arrays, which start at 0. The following Fortran code shows a few more array features:

A = (/ (i, i = 1,100) /)
B = A(1:100:10)
C(10:) = B

First a vector A is created using an implicit do loop, also called an array constructor. Next, a vector B is created from every 10th element of A using a ‘stride’ of 10 in the subscript. Finally, array B is copied into array C, starting at element 10. Fortran supports declaring arrays with indices that are zero or negative:

double precision, dimension(-1:10) :: myArray

A negative index may sound silly, but I have heard that they can be very useful – imagine a negative index as an area with ‘extra space’ for annotations. Fortran also supports vector-valued indices. For instance, we can extract elements 1, 5, and 7 from a Nx1 array A into a 3×1 array B using:

subscripts = (/ 1, 5, 7 /)
B = A(subscripts)

Fortran also incorporates masking of arrays in all intrinsic functions. For instance, if we want to take the log of a matrix on all of the elements where it is greater than zero we use

log_of_A = log(A, mask= A .gt. 0)

Alternatively we may want to take all the negative points in an array and set them to 0. This can be done in one line using the ‘where’ command:

where(my_array .lt. 0.0) my_array = 0.0

Dynamically allocating and deallocating arrays in Fortran is easy. For instance, to allocate a 2D array:

real, dimension(:,:), allocatable :: name_of_array
allocate(name_of_array(xdim, ydim))

C++ requires the following code:

double **array;
array = malloc(nrows * sizeof(double *));

for(i = 0; i < nrows; i++){
    array[i] = malloc(ncolumns * sizeof(double));
}

To deallocate an array in Fortran, we use

deallocate(name_of_array)

In C++, we have:

for(i = 0; i < nrows; i++){
    free(array[i]);
}
free(array); " -- [9]

"However, more innovation is needed to make languages like Chapel and Julia the go-to alternative climate modeling so that, as these languages continue to mature and develop stable ecosystems, the climate modeling community can benefit from their enhancements over Fortran."

---

cpdean 12 hours ago [–]

> ...or why one should use it over plain Datalog...

I have been looking for examples of how to use a practical implementation of Datalog for years and the closest I've come to is actually miniKanren instead. Could you point me to codebases that productively use Datalog internally?

reply

hcarvalhoalves 10 hours ago [–]

Datomic: https://docs.datomic.com/cloud/query/query-data-reference.html

Datascript: https://github.com/tonsky/datascript

Crux: https://opencrux.com/main/index.html

reply

kovvy 2 hours ago [–]

Souffle: https://github.com/souffle-lang/souffle

Used by projects such as Doop: https://bitbucket.org/yanniss/doop and Ddisasm: https://github.com/grammatech/ddisasm

reply

---

pandas vs sql

https://news.ycombinator.com/item?id=27026843

---

data oriented (columnar) processing vs OOP, and cache hits/perf https://blog.royalsloth.eu/posts/the-compiler-will-optimize-that-away/

---

https://github.com/golang/go/issues/45955 proposal: slices: new package to provide generic slice functions #45955

---

jec 13 hours ago

IME, the real magic of APL, and what the numerous APL-influenced array languages have consistently lost in translation, are the concatenative, compositional, functional operators that give rise to idiomatic APL. They have taken the common usecases, but forgone the general ones. For example, numpy provides cumsum as a common function, but APL & J provide a more general prefix scan operator which can be used with any function, no matter whether primitive or user-defined, giving rise to idioms like “running maximum” and “odd parity” to name just a couple. Likewise, numpy has inner but it only computes the ordinary “sum product” algorithm while APL & J have the matrix product operator that affords the programmer the ability to easily define all sorts of unusual matrix algorithms that follow the same inner pattern.

This is not even to mention the fantastic sorts of other operators, like the recursive power of verb or the sort-of-monadic under that AFAICT have no near equivalent in numpy.
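for reference, numpy's ufuncs do recover a restricted form of this (scan via .accumulate, a generalized "inner product" via broadcasting), though only for binary ufuncs rather than arbitrary user functions:

  import numpy as np

  xs = np.array([3, 1, 4, 1, 5])
  print(np.maximum.accumulate(xs))   # running maximum: [3 3 4 4 5]
  print(np.add.accumulate(xs))       # cumsum is just the + instance of scan

  A = np.array([[1, 2], [3, 4]])
  B = np.array([[5, 6], [7, 8]])
  # an APL-style max.+ matrix product: C[i,j] = max_k (A[i,k] + B[k,j])
  print(np.max(A[:, :, None] + B[None, :, :], axis=1))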

jec 13 hours ago

I wish we could do away with this archaic version of Game of Life for APL demos. Modern APLs have convolution operators that are more suited to that type of problem than wastefully creating multiple rotated copies of the input array.

https://code.jsoftware.com/wiki/Vocabulary/semidot3#dyadic

---

https://wiki.alopex.li/BetterThanJsonIdeas https://wiki.alopex.li/BetterThanC

---

" Polymorphic data with Specter

Specter is a library we developed for supercharging our ability to work with data structures, especially nested and recursive data. Specter is based around the concept of “paths” into data structures, where a path can “navigate” to any number of values starting from the root of a data structure. The path can include traversals, views, and filters, and they’re deeply composable.

Our compiler compiles code into an abstract representation with a distinct record type for each kind of operation possible in our language. There are a variety of attributes every operation type must expose in a uniform way. For example, one of these attributes is “needed fields”, the fields in the closure of that operation that it requires to do its work. A typical way to express this polymorphic behavior would be to use an interface or protocol, like so:

(defprotocol NeededFields
  (needed-fields [this]))

The problem with this approach is it only covers querying. Some phases of our compiler must rewrite the fields throughout the abstract representation (e.g. uniquing vars to remove shadowing) and this protocol doesn’t support that. A (set-needed-fields [this fields] ) method could be added to this protocol, but that doesn’t cleanly fit data types which have a fixed number of input fields. It also doesn’t compose well for nested manipulation.

Instead, we use Specter’s "protocol paths" feature to organize the common attributes of our varying compiler types. Here’s an excerpt from our compiler:

(defprotocolpath NeededFields [])

(defrecord+ OperationInput
  [fields :- [(s/pred opvar?)]
   apply? :- Boolean])

(defrecord+ Invoke
  [op    :- (s/cond-pre (s/pred opvar?) IFn RFn)
   input :- OperationInput])

(extend-protocolpath NeededFields Invoke
  (multi-path [:op opvar?] [:input :fields ALL]))

(defrecord+ VarAnnotation
  [var :- (s/pred opvar?)
   options :- {s/Keyword Object}])

(extend-protocolpath NeededFields VarAnnotation :var)

(defrecord+ Producer
  [producer :- (s/cond-pre (s/pred opvar?) PFn)])

(extend-protocolpath NeededFields Producer [:producer opvar?])

"Invoke", for instance, is the type that represents calling another function. The :op field could be a static function or a var reference to a function in the closure. The other path navigates to all the fields used as arguments to the function invocation.

This structure is extremely flexible and allows for modifications to be expressed just as easily as queries by integrating directly with Specter. For instance, we can append a "-foo" suffix to all the needed fields in a sequence of operations like so:

(setval [ALL NeededFields NAME END] "-foo" ops)

If we want the unique set of fields used in a sequence of ops, the code is:

(set (select [ALL NeededFields] ops))

Protocol paths are a way to make the data itself polymorphic and able to integrate with the supercharged abilities of Specter. They greatly reduce the number of manipulation helper functions that would be required otherwise and make the codebase far more comprehensible. "

---

AndrewStephens 25 hours ago

Const is such a useful concept in C++ but nobody is seriously suggesting changing the default now. If I was put in charge of designing a language today I would break it down like this:

    local variables - not const by default. They are variables - they vary.
    function parameters - const by default (references and pointers could still point to non-const data). Maybe even always const, it simplifies ownership rules.
    globals/statics - const by default
    instance members - const by default except maybe in structs
    instance methods - const by default

[10]

---

anonymfus on March 14, 2020 [–]

That is so sad.

"There’s something deeply right about how list indexing and function application are the same operation". That is a quote from a recent submission about K (https://news.ycombinator.com/item?id=22504106), and that is a perfect example of how I usually get VB nostalgia attacks: somebody talks about a thing in other language they consider amazing implying that that thing is unique for that language despite it was implemented in VB and earlier. Sometimes that somebody can ever be Microsoft as they brag about adding more features into C#'s switch statement with increasingly ugly syntax in every new version completely ignoring that they did them better in VB.

kragen on March 14, 2020 [–]

VB uses the same syntax for array indexing and subroutine invocation, but they aren't the same operation; you can't define a parameter that is either an array or a subroutine. So the extra expressive power that this unification provides in K is absent in VB.

int_19h on March 14, 2020 [–]

Parentheses for array indexing dates all the way back to the original Dartmouth BASIC, and from there to the very first FORTRAN. It's one of the oldest pieces of PL syntax still around in its original meaning.

yyhhsj0521 on March 14, 2020 [–]

I'm pretty sure that there are a few languages where array indexing is basically calling a function that returns a reference to the cell you're accessing.

DaiPlusPlus? on March 14, 2020 [–]

Since C# 1.0, any type can have an indexer property defined that can accept any parameter types and can return anything - though it wasn't until C# 7.0 that we could return a first-class reference to a value, including array members (`public ref int this[int idx] => ref this.someArray[idx]`).

sk5t on March 14, 2020 [–]

Perhaps not exactly what you mean, but Scala eschews [] for array access, and arr(0) returns the same as arr.apply(0).
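
A minimal Python sketch of the same unification (hypothetical example): a bound __getitem__ is an ordinary callable, so indexing can be passed around and applied like any function, and __call__ gives the K/VB/Scala-style t(i) spelling:

arr = ['a', 'b', 'c', 'd']
get = arr.__getitem__          # first-class "indexing function"
print(get(2))                  # 'c' -- same as arr[2]
print(list(map(get, [0, 3])))  # ['a', 'd'] -- indexing used where a function is expected

class Table:
    def __init__(self, data):
        self.data = data
    def __call__(self, i):     # t(i) indexes, like arr(0) in VB/Scala/K
        return self.data[i]

t = Table([10, 20, 30])
print(t(1))                    # 20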

---

bob1029 3 hours ago [–]

> What can you say in this language that would be impossibly inconvenient to say in others?

This ideology is why I believe so strongly in the power of SQL (and similarly expressive functional languages/DSLs). They are effectively limitless in their ability to describe problem spaces. In SQL, joining 30 different dimensions of data is a matter of about 30 lines of SQL, one per join. Doing such a thing in any imperative language would be a medium nightmare at best. Even with LINQ at your disposal, this is not fun.
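
A small Python/sqlite3 sketch of that contrast (schema and names hypothetical): each extra dimension is one more JOIN line, while the imperative version needs another hand-built index and lookup:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, product_id INTEGER);
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE products (id INTEGER, title TEXT);
    INSERT INTO orders VALUES (1, 1, 2);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO products VALUES (2, 'Widget');
""")

# declarative: each joined dimension is one more JOIN line
rows = con.execute("""
    SELECT o.id, c.name, p.title
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products p ON p.id = o.product_id
""").fetchall()

# imperative equivalent: one hand-built index plus one lookup per dimension
customers = dict(con.execute("SELECT id, name FROM customers"))
products = dict(con.execute("SELECT id, title FROM products"))
joined = [(oid, customers[cid], products[pid])
          for oid, cid, pid in con.execute("SELECT id, customer_id, product_id FROM orders")]

assert rows == joined  # [(1, 'Ada', 'Widget')]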

---

pandas discussion https://news.ycombinator.com/item?id=22187121

---

https://stopa.io/post/279 Database in the Browser, a Spec

---

https://mchow.com/posts/2020-02-11-dplyr-in-python/ https://news.ycombinator.com/item?id=29949234

---

a goal of oot should be data structures expressive enough that they can be configured to capture the style of the data structures of almost any other programming language.

---

"Jd has one or more databases. A database has one or more tables. A table is rows (unnamed) and columns (named). Data in a column is of the same type (integer, float, datetime, fixed length characters, variable length characters, etc.). Tables can be joined with other tables to function as if they were that large table. Rows and columns can be retrieved by queries on the joined tables. Retrieved data from a query can be aggregated, grouped, and sorted. Rows can be deleted, updated, and inserted. "

---

i don't quite buy this, but:

" ~ andyc edited 5 hours ago

    Graphs are really common data structures but there hasn’t yet been an “everything’s a graph” language

Nearly all common languages like Python, JS, Java, OCaml, etc. let you express graphs with records / objects. Everything is a graph!

If you want a homogeneous graph, you just declare a single node type, like:

from __future__ import annotations
from typing import List

class Node:
    edges: List[Node]  # or maybe Dict[str, Node] if you want them to be named
    payload: int

    Or you can have a heterogeneous graph with many different node types.
    If you want to label the edges, you can reify them as their own type

The whole heap is graph-shaped!

It is true that many programs in these languages are more tree-like than graph-like. And sometimes imperative code to manipulate graphs is hard to visualize.

But I think there doesn’t need to be a separate language of graphs for this reason.

If you want to see graphs done in plain C, look at this DFA code by Russ Cox:

https://swtch.com/~rsc/regexp/regexp1.html

e.g. Implementation: Compiling to NFA

"

---

(regarding the idea of an "everything is a graph" language):

A while back I said on this forum it would be interesting to have a doubly-linked-list-based language… and it took me quite some time to spot the obvious flaw: you can’t have (cheap) immutable data structures.

But yes, a digraph as a first-class citizen would be very useful. It really is a pretty fundamental data structure. -- [11]
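
A sketch of why that flaw follows (the standard persistent-list argument, in Python): with only forward pointers, prepending shares the entire old list; a back-pointer would force the old head to be mutated or every node copied:

from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Cons:                # singly linked: immutable and cheap to share
    head: int
    tail: Optional[Cons]

xs = Cons(2, Cons(3, None))
ys = Cons(1, xs)           # O(1): ys reuses xs unchanged

# A doubly linked node would also need a prev pointer. Prepending to an
# immutable list would then require rewriting the old head's prev pointer,
# i.e. mutating it -- or copying the entire list. Hence no cheap sharing.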

--- " I do find it useful to try to organize code so that most functions only look at their explicit inputs, and where reasonable don't mutate those inputs. But I tend to do that with arrays and hashtables, rather than the pointer-heavy immutable structures typically found in functional languages. The latter imposes a low performance ceiling that makes many of the problems I work on much harder to solve.

The main advantage I see in functional programming is that it encourages tree-shaped data, one-way dataflow and focusing on values rather than pointer identity. As opposed to the graph-of-pointers and spaghetti-flow common in OOP languages. But you can just learn to write in that style from well-designed imperative code (eg like this or this). And I find it most useful at a very coarse scale. Within the scope of a single component/subsystem mutation is typically pretty easy to keep under control and often very useful.

(Eg here the top-level desugar function is more or less functional. Its internals rely heavily on mutation, but they don't mutate anything outside the Desugarer struct.) " -- [12]

---

" I don’t use the ideas that are idiomatic in haskell. I write very imperative code, I use lots of mutable state, I avoid advanced type system features. These days I even try to avoid callbacks and recursion where possible (the latter after a nasty crash at materialize).

Look at this:

https://github.com/jamii/imp/blob/2f85cca53b86b4c56392afb5b0596bdf2697b19b/lib/imp/lang/pass/desugar.zig

I don’t even pass the scope around functionally - I mutate it before and after each recursive call.

At one point the code gathers an array of mutable pointers to parts of the expression so that some other code can mutate them all at once.

There is nothing about this that was learned from haskell.

    he says that applying ideas from functional programming is really valuable

This is what I said:

    I do find it useful to try to organize code so that most functions only look at their explicit inputs, and don’t mutate those inputs.

The idea of not writing code as a giant mess of global mutable state is not unique to haskell.

The only part of the code I linked above that looks vaguely functional is at a very coarse level - the top level desugar function doesn’t mutate the input expression, and it only appends to the store.

But the rest of the code is only allowed to mutate things within a single component and most of that mutable state is transient. I try to avoid having mutable pointers cross components, and also limit when and where long-lived state can be mutated. With the goal being not to avoid mutation, but to make it easy to find when reading the code or debugging. " -- [13]

---

lmm 58 days ago

root parent next [–]

> And, to be brutally honest, as much as I love those functional combinators, first-class functions, streams, etc, they suck to reason about.

> Sometimes loops are better!

That I think is backwards. A loop could be doing literally anything - it probably is futzing with global variables - so there's no way to reason about it except by executing the whole thing in your head. A map (or mapA) or a fold (or foldM) or a filter or a scan is much more amenable to reasoning, since it's so much more specific about what it's doing even without looking at the body.

andyferris 58 days ago

root parent next [–]

I like to write loops that follow the functional map/filter/reduce paradigm: they don’t mutate anything except some initial variables you “fold” over (defined immediately prior to the loop), and those are treated as immutable (or uniquely owned) after the loop.

I find this has good readability, and by containing the mutation you can reason about it “at a distance” quite simply, since further away it’s for-all-intents-and-purposes pure code.

What you might lose are the compiler-enforced guarantees that a functional language gives you. Some languages give you the best of both worlds - with rust you could put this in a pure function (immutable inputs, owned outputs) and the borrow checker even reasons about things like this within a function body.
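
A minimal Python sketch of that loop style (hypothetical example): the only mutation is to the accumulators declared immediately before the loop, and they are treated as frozen once it ends:

def word_histogram(lines):
    # "fold" variables, defined immediately prior to the loop
    counts = {}
    total = 0
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
            total += 1
    # from here on, counts and total are treated as immutable results
    return counts, total

print(word_histogram(["a b a", "b c"]))  # ({'a': 2, 'b': 2, 'c': 1}, 5)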

galaxyLogic 58 days ago

root parent next [–]

I think a loop is ok if it only modifies variables whose scope is the loop.

And that's what map() basically does.

titzer 58 days ago

root parent prev next [–]

> it probably is futzing with global variables

The opposite is true in my experience. Loops working over variables local to a function are a lot easier to reason about than a series of intricately intertwined lambdas that share state. If everything you do is functional then a series of pipelines might be alright, but it might not be. If the functions are spread all over the place it can take a larger amount of mental load to keep all the context together.

Code should permit local reasoning, and anytime that is obscured, rather than helped by, abstractions, we incur additional cognitive load.

---

    toastal 2 days ago

    It’s not just that though. JSX compiles down to something incredibly simple in theory: ternary functions (a string element name, an object of attributes, and an array of child elements). This isn’t complex, but JSX adds a ton of complexity to do the same thing; the only part that sucks is that in JavaScript, which is not a functional language, the ergonomics are terrible, impossible to read. You look at ClojureScript, PureScript, Elm, and the API is pretty much this but legible, like Pug.
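
A tiny Python sketch of that ternary element function (names hypothetical), just to show how little machinery the element call actually needs:

def h(tag, attrs, children):
    # element = (string tag name, dict of attributes, list of child elements)
    return {"tag": tag, "attrs": attrs, "children": children}

page = h("div", {"class": "card"}, [
    h("h1", {}, ["Hello"]),
    h("p", {}, ["world"]),
])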

---