proj-oot-ootDataNotes5

" Preserving order of dictionaries and kwargs

In CPython 3.6+ dicts behave like OrderedDict by default (and this is guaranteed in Python 3.7+). This preserves order during dict comprehensions (and other operations, e.g. during json serialization/deserialization)

import json
x = {str(i): i for i in range(5)}
json.loads(json.dumps(x))

  1. Python 2 {u'1': 1, u'0': 0, u'3': 3, u'2': 2, u'4': 4}
  2. Python 3 {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}

The same applies to **kwargs (in Python 3.6+): they are kept in the same order as they appear in the call. Order is crucial when it comes to data pipelines; previously we had to write it in a cumbersome manner:

from torch import nn

# Python 2
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))

# Python 3.6+, how it *can* be done (not supported right now in pytorch)
model = nn.Sequential(
    conv1=nn.Conv2d(1, 20, 5),
    relu1=nn.ReLU(),
    conv2=nn.Conv2d(20, 64, 5),
    relu2=nn.ReLU()
)

Did you notice? Uniqueness of names is also checked automatically.
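A quick runnable check of both claims (report is just an illustrative name, not from the quoted article):

  def report(**kwargs):
      print(list(kwargs))

  report(conv1=1, relu1=2, conv2=3, relu2=4)   # ['conv1', 'relu1', 'conv2', 'relu2'] on Python 3.6+
  # report(conv1=1, conv1=2)                   # SyntaxError: keyword argument repeated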

Iterable unpacking

  1. handy when the amount of additional stored info may vary between experiments, but the same code can be used in all cases:
     model_parameters, optimizer_parameters, *other_params = load(checkpoint_name)
  2. picking the two last values from a sequence: this also works with any iterable, so if you have a function that yields e.g. qualities, starred assignment is a simple way to take only the last two values (a sketch follows below)
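The code for that second case is elided above; here is a minimal sketch of what such starred unpacking can look like (iter_train and its values are illustrative stand-ins, not from the quoted article):

  # a stand-in generator for anything that yields values, e.g. per-epoch validation qualities
  def iter_train():
      for quality in [0.71, 0.74, 0.78, 0.80]:
          yield quality

  *prev, next_to_last, last = iter_train()
  print(prev)           # [0.71, 0.74]
  print(next_to_last)   # 0.78
  print(last)           # 0.8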

...

Multiple unpacking

Here is how you merge two dicts now:

x = dict(a=1, b=2)
y = dict(b=3, d=4)

  1. Python 3.5+: z = {**x, **y}
  2. z == {'a': 1, 'b': 3, 'd': 4}; note that the value for `b` is taken from the latter dict.

See this thread at StackOverflow for a comparison with Python 2.

The same approach also works for lists, tuples, and sets (a, b, c are any iterables):

[*a, *b, *c]  # list, concatenating
(*a, *b, *c)  # tuple, concatenating
{*a, *b, *c}  # set, union

Functions also support this for *args and **kwargs:

Python 3.5+: do_something(**{**default_settings, **custom_settings})

  1. Also possible; this code also checks that there is no intersection between the keys of the dictionaries: do_something(**first_args, **second_args)
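A self-contained sketch of both call styles (do_something and the settings dicts are stand-ins, not from the quoted article):

  def do_something(**kwargs):
      print(kwargs)

  default_settings = {'lr': 0.1, 'epochs': 10}
  custom_settings = {'lr': 0.01}

  # merging into a dict literal first: the latter dict silently wins on duplicate keys
  do_something(**{**default_settings, **custom_settings})   # {'lr': 0.01, 'epochs': 10}

  # double-unpacking directly into the call: duplicate keys raise instead of merging
  try:
      do_something(**default_settings, **custom_settings)
  except TypeError as e:
      print(e)   # do_something() got multiple values for keyword argument 'lr'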

...

Future-proof APIs with keyword-only arguments ... In Python 3, library authors may demand explicitly named parameters by using *:

class SVC(BaseSVC):
    def __init__(self, *, C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, ... )
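A toy sketch of the effect (Model is illustrative, not the actual sklearn code): the bare * makes the parameters keyword-only, so positional calls fail:

  class Model:
      def __init__(self, *, C=1.0, kernel='rbf'):
          self.C, self.kernel = C, kernel

  Model(C=10.0, kernel='linear')   # fine
  try:
      Model(10.0, 'linear')
  except TypeError as e:
      print(e)   # e.g. "__init__() takes 1 positional argument but 3 were given"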

...

    Enums are theoretically useful, but
        string-typing is already widely adopted in the python data stack
        Enums don't seem to interplay with numpy and categorical from pandas
    ...
    Python 3 has stable ABI
    

" -- [1]

---

" Problems for code migration specific for data science (and how to resolve those)

    support for nested arguments was dropped
    map(lambda x, (y, z): x, z, dict.items())
    However, it is still perfectly working with different comprehensions:
    {x:z for x, (y, z) in d.items()}
    In general, comprehensions are also better 'translatable' between Python 2 and 3.
    map(), .keys(), .values(), .items(), etc. return iterators, not lists. Main problems with iterators are:
        no trivial slicing
        can't be iterated twice
    Almost all of these problems are resolved by converting the result to a list.
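A small sketch of that workaround (names are illustrative):

  squares = map(lambda i: i * i, range(5))   # an iterator in Python 3, not a list

  squares = list(squares)                    # materialize once
  print(squares[:2])                         # slicing works: [0, 1]
  print(sum(squares), max(squares))          # and it can be consumed more than once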

...

Course authors should spend time in the first lectures to explain what is an iterator, why it can't be sliced / concatenated / multiplied / iterated twice like a string (and how to deal with it). " -- [2]

---

" Optional Chaining #

let x = foo?.bar.baz();

this is a way of saying that “when foo is defined, foo.bar.baz() will be computed; but when foo is null or undefined, stop what we’re doing and just return undefined.”

More plainly, that code snippet is the same as writing the following.

let x = (foo === null || foo === undefined) ?
    undefined :
    foo.bar.baz();

You might find yourself using ?. to replace a lot of code that performs repetitive nullish checks using the && operator.

Before if (foo && foo.bar && foo.bar.baz) { ... }

After-ish if (foo?.bar?.baz) { ... }

Keep in mind that ?. acts differently than those && operations since && will act specially on “falsy” values (e.g. the empty string, 0, NaN, and, well, false), but this is an intentional feature of the construct. It doesn’t short-circuit on valid data like 0 or empty strings.

Optional chaining also includes two other operations. First there’s the optional element access which acts similarly to optional property accesses, but allows us to access non-identifier properties (e.g. arbitrary strings, numbers, and symbols):

    arr?.[0];

There’s also optional call, which allows us to conditionally call expressions if they’re not null or undefined.

    log?.(`Request started at ${new Date().toISOString()}`);

Nullish Coalescing

You can think of this feature - the ?? operator - as a way to “fall back” to a default value when dealing with null or undefined. When we write code like

let x = foo ?? bar();

this is a new way to say that the value foo will be used when it’s “present”; but when it’s null or undefined, calculate bar() in its place.

Again, the above code is equivalent to the following.

let x = (foo !== null && foo !== undefined) ? foo : bar();

The ?? operator can replace uses of || when trying to use a default value. For example, the following code snippet tries to fetch the volume that was last saved in localStorage (if it ever was); however, it has a bug because it uses ||.

function initializeAudio() {
    let volume = localStorage.volume || 0.5;
    // ...
}

When localStorage.volume is set to 0, the page will set the volume to 0.5 which is unintended. ?? avoids some unintended behavior from 0, NaN and "" being treated as falsy values. "

"

achou 9 hours ago [-]

I just did some refactoring on a medium size code base and here are a few things to watch out for when adopting optional chaining and the new null coalescing operator:

  foo && await foo();

is not the same as

  await foo?.();

this will work in most cases but subtly, the await wraps the undefined case into a Promise, while the original code would skip the await altogether.

String regular expression matching returns null, not undefined, so rewriting code such as:

  const match = str.match(/reg(ex)/);
  return match && match[1];

is not the same thing as:

  return match?.[1];

because the latter returns undefined, not null, in case of match failure. This can cause problems if subsequent code expects null for match failure. An equivalent rewrite would be:

  return match?.[1] ?? null;

which is longer than the original and arguably less clear.

A common idiom to catch and ignore exceptions can interact poorly with optional chaining:

  const v = await foo().catch(_ => {});
  return v?.field; // property 'field' does not exist on type 'void'

This can be easily remedied by changing the first line to:

  const v = await foo().catch(_ => undefined);

Of course, these new operators are very welcome and will greatly simplify and help increase the safety of much existing code. But as in all things syntax, being judicious about usage of these operators is important to maximize clarity.

mattigames 7 hours ago [-]

You have to watch out for the first and last one in JavaScript but not in TypeScript, as it isn't possible to make that mistake there: you have to type it as a Promise, or in the last one as void.

You can even avoid the problem in the second one by using NonNullable TypeScript types, but I admit that's not common so it's still likely to arise.

achou 7 hours ago [-]

The first example can happen in TypeScript; foo has type

  (() => Promise<void>) | undefined

admittedly it may not be all that common to have a function-valued variable that may be undefined, but it happened in the code base I was working with.

In the last example, you're right that TypeScript will catch this at compile time. My point was to show how this compile time error can happen from refactoring to use optional chaining, and one easy solution in this case.

"

---

:| in Haskell is the constructor for a Data.List.NonEmpty:

data NonEmpty a = a :| [a]

---

want to be able to do something like this Ruby example:

  list_cart unless @cart.include? 'pineapple'

so we need:

---

bools in structs should either be required to have an explicit default value or have no default at all; they should not silently default to 0 (false) when unspecified

---

[3]

---

gorgoiler 21 hours ago [–]

Pipes are wonderful! In my opinion you can’t extol them by themselves. One has to bask in a fuller set of features that are so much greater than the sum of their parts, to feel the warmth of Unix:

(1) everything is text

(2) everything (ish) is a file

(3) including pipes and fds

(4) every piece of software is accessible as a file, invoked at the command line

(5) ...with local arguments

(6) ...and persistent globals in the environment

A lot of understanding comes once you know what execve does, though such knowledge is of course not necessary. It just helps.

Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

jcranmer 21 hours ago [–]

> (1) everything is text

And lists are space-separated. Unless you want them to be newline-separated, or NUL-separated, which is controlled by an option that may or may not be present for the command you're invoking, and is spelled completely differently for each program. Or maybe you just quote spaces somehow, and good luck figuring out who is responsible for inserting quotes and who is responsible for removing them.

laumars 21 hours ago [–]

> everything is text

Everything is a byte stream. Usually that means text but sometimes it doesn't. Which means you can do fun stuff like:

" Indeed, while pipes are useful at times, their system of communicationbetween programs—text traveling through standard input and standard out-put—limits their usefulness.4 First, the information flow is only one way.Processes can’t use shell pipelines to communicate bidirectionally. Second,pipes don’t allow any form of abstraction. The receiving and sending pro-cesses must use a stream of bytes. Any object more complex than a bytecannot be sent until the object is first transmuted into a string of bytes thatthe receiving end knows how to reassemble. This means that you can’tsend an object and the code for the class definition necessary to implementthe object. You can’t send pointers into another process’s address space.You can’t send file handles or tcp connections or permissions to accessparticular files or resources.At the risk of sounding like a hopeless dream keeper of the intergalacticspace, we submit that the correct model is procedure call (either local orremote) in a language that allows first-class structures (which C gainedduring its adolescence) and functional composition. " -- http://web.mit.edu/~simsong/www/ugh.pdf

cuddlybacon 21 hours ago [–]

I mostly like what they wrote about pipes. I think the example of bloating they talked about in ls at the start of the shell programming section is a good example: if pipelines are so great, why have so many unix utilities felt the need to bloat?

I think it's a result of there being just a bit too much friction in building a pipeline. A good portion of that tends to be massaging text formats. The standard unix commands for doing that tend to have infamously bad readability.

Fish Shell seems to be making this better with its `string` command, whose syntax makes it clear what it is doing: http://fishshell.com/docs/current/cmds/string.html I use fish shell, and I can usually read and often write text manipulations with the string command without needing to consult the docs.

Nushell seems to take a different approach: add structure to command output. By doing that, it seems that a bunch of stuff that is super finicky in the more traditional shells ends up being simple and easy commands with one clear job in nushell. I have never tried it, but it does seem to be movement in the correct direction.

code-faster 21 hours ago [–]

It's less that pipelines are friction, they're really not.

It's more that people like building features and people don't like saying no to features.

The original unix guys had a rare culture that was happy to knock off unnecessary features.

 atombender 17 hours ago [–]

Pipes are a great idea, but are severely hampered by the many edge cases around escaping, quoting, and, my pet peeve, error handling. By default, in modern shells, this will actually succeed with no error:

  $ alias fail='exit 1'
  $ find / | fail | wc -l; echo $?
  0
  0

You can turn on the "pipefail" option to remedy this:

  $ set -o pipefail
  $ find / | fail | wc -l; echo $?
  0
  1

Most scripts don't, because the option makes everything much stricter, and requires more error handling.

Of course, a lot of scripts also forget to enable the similarly strict "errexit" (-e) and "nounset" options (-u), which are also important in modern scripting.

There's another error that hardly anyone bothers to handle correctly:

  x=$(find / | fail | wc -l)

This sets x to "" because the command failed. The only way to test if this succeeded is to check $?, or use an if statement around it:

  if ! x=$(find / | fail | wc -l); then
    echo "Fail!" >&2
    exit 1
  fi

I don't think I've seen a script ever bother to do this.

Of course, that's if you don't also want the error message from the command. If you want that, you have to start using named pipes or temporary files, with the attendant cleanup. Shell scripting is suddenly much more complicated, and the resulting scripts become much less fun to write.

And that's why shell scripts are so brittle.

codemac 17 hours ago [–]

Just use a better shell. rc handles this wonderfully, $? is actually called $status, and it's an array, depending on the number of pipes.

fomine3 15 hours ago [–]

set -e creates another pain for commands where a nonzero exit status doesn't mean failure (e.g. diff). It changes the semantics of the whole script.

 geophile 22 hours ago [–]

I love pipelines. I don't know the elaborate sublanguages of find, awk, and others, to exploit them adequately. I also love Python, and would rather use Python than those sublanguages.

I'm developing a shell based on these ideas: https://github.com/geophile/marcel.

ehsankia 20 hours ago [–]

+1

Piping is great if you memorize the (often very different) syntax of every individual tool and memorize their flags, but in reality, unless it's a task you're doing weekly, you'll have to go digging through man pages and documentation every time. It's just not intuitive. Still, to date, if I don't use `tar` for a few months, I need to look up the hodgepodge of letters needed to make it work.

Whenever possible, I just dump the data in Python and work from there. Yes some tasks will require a little more work, but it's work I'm very comfortable with since I write Python daily.

Your project looks nice, but honestly iPython already lets me run shell commands like `ls` and pipe the results into real Python. That's mostly what I do these days. I just use iPython as my shell.

khimaros 21 hours ago [–]

The lispers/schemers in the audience may be interested in Rash https://docs.racket-lang.org/rash/index.html which lets you combine an sh-like language with any other Racket syntax.

cat199 21 hours ago [–]

also what I think is the 'original' in this domain, scsh

jraph 22 hours ago [–]

Your project looks really cool.

I am pretty sure I've seen a Python-based interactive shell a few years ago but I can't remember the name. Have you heard of it?

x1798DE 22 hours ago [–]

I imagine you are thinking of xonsh? https://xon.sh/

ketanmaheshwari 22 hours ago [–]

Unix pipelines are cool and I am all for it. In recent times however, I see that sometimes they are taken too far without realizing that each stage in the pipeline is a process and a debugging overhead in case something goes wrong.

A case in point is this pipeline that I came across in the wild:

TOKEN=$(kubectl describe secret -n kube-system $(kubectl get secrets -n kube-system | grep default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t' | tr -d " ")

In this case, perhaps awk would have absorbed 3 to 4 stages.

dfinninger 21 hours ago [–]

Oh man. That's when knowing more about the tools you are using comes in handy. Kubectl has native JSONPath support [0].

Or at the very least, use structured output with "-o json" and jq [1], like they mention in the article.

I have always found that trying to parse JSON with native shell tools has been difficult and error-prone.

[0] https://kubernetes.io/docs/reference/kubectl/jsonpath/ [1] https://stedolan.github.io/jq/

---

lisp's `(quoted-list-with-substitution ,(this-is-evaluated) )

---

first-class 'automatic type conversion and constraint checking' in Iris reminds me of Hoon:

" Ubiquitous use of parameterizable coercions (which are also first-class values) for automatic type conversion and constraint checking, e.g. list returns a basic (unspecialized) list coercion, whereas list {of: whole_number {from: 0, to: 100}, min: 4, max: 4} returns a list coercion with additional element type and length constraints. Code is untyped; however, handler interfaces can include coercion information to provide both auto-generated user documentation and run-time conversions and checks that handler arguments and results are suitable for use. Weak latent structural rather than strong nominal typing: “If a value looks acceptable, it [generally] is.” " -- https://github.com/hhas/iris-script

---

need to look at Kernel's other innovations beyond fexprs/vau/wrap. The webpage says they have 'keyed' dynamic variables, which are 'fluids' done right

---

" (println (s/valid? ::user {::username "rich" ::password "zegure" ::comment "this is a user" ::last-login 11000}))

:my-project.users/username ;; this is what fully-qualified keywords look like
true

Spec also encourages the use of qualified keywords: Until recently in Clojure people would use keywords with a single colon but the two colons (::) mean that keywords belong to this namespace, in this case my-project.users. This is another deliberate choice, which is about creating strong names (or "fully-qualified"), that belong to a particular namespace, so that we can mix namespaces within the same map. This means that we can have a map that comes from outside our system and has its own namespace, and then we add more keys to this map that belong to our own company's namespace without having to worry about name clashes. This also helps with data provenance, because you know that the :subsystem-a/id field is not simply an ID – it's an ID that was assigned by subsystem-a. " -- [4]

---

brundolf 1 day ago [–]

One other aspect of Rust that makes it suited to GUI work is the fact that the management of mutable state is one of the core problems of writing a GUI app, and that Rust allows you to talk about mutation in a way that no other language (that I know of) does. You can a) have mutable structures, and b) declare that a function will treat one of those - passed as an argument - as deeply immutable, both within the same language. You can have exactly the amount of mutation that you want, which should be extremely enticing to any GUI developer.

zozbot234 1 day ago [–]

> deeply immutable

Not really true, because the "interior mutability" pattern allows for mutating structures that are passed via a shared reference. Truly "immutable" data is in fact quite hard to characterize in a language that's as 'low-level' as Rust.

brundolf 1 day ago [–]

Technically yes, there are trapdoors like RefCell