proj-oot-ootLibrariesNotes2

i'm not quite sure how to divide up the core/bundled/recommended libraries.

At one extreme, we have the core language within Oot (this defines the standard AST that macros operate upon, i guess).

Then we have the minimal profile for Oot, e.g. a very small version of Oot, without many libraries, that defines the language itself. This is what you'd run on a massively parallel machine where each CPU has little more than 32k (or whatever) of local memory (although couldn't we put the other libraries in shared global memory there?).

Then we have the standard profile, ie everything that is imported by default on PCs.

Then we have the distribution, ie everything that is shipped/distributed. All of this is released and versioned together.

Then we have the 'blessed' libraries, ie things that are screened for quality and usefulness by the Oot team, but not as carefully as the distributed libraries; and which are released and versioned separately from the distribution.

Then we have the 'canonicalized' library set, that is the 'canonical' libraries for doing various tasks, according to the wider community.

And then we have everything else that is being tracked by Oot's official packaging system/CPAN analog (is this confined to open source?).

Questions: Should all of these things be considered separately?

Which set of library functions should a 'typical Oot user' be assumed to be familiar with (i'm guessing: the minimal profile, the standard profile, or the distribution)? I'm leaning towards saying they should know the minimal profile cold, they should be familiar with every function in the standard profile, and they should be aware of the general capabilities of every library in the distribution.

If a blessed library becomes orphaned, should the core team try to adopt it? (i'm guessing yes, but they should feel free to cut it if they don't have the manpower)

Should we even have blessed libraries? (the motivation for this is to avoid bloating the distribution, but also possibly to provide a way for a library to become 'standard' even if an outsider wants to use their own release system for it)

What are the counterparts of these things in other languages? Haskell: i feel like the 'minimal profile' is like the Haskell Prelude; what does GHC import by default (corresponding to our 'standard profile')? Here's hawk's: https://github.com/ssadler/hawk/issues/35 . Is Haskell's base part of the GHC distribution? Actually, Haskell 2010 specifies a set of libraries, so i guess these would be the std. profile: http://www.haskell.org/onlinereport/haskell2010/haskellpa2.html#x20-192000II

---

http://www.haskell.org/haskellwiki/Functor-Applicative-Monad_Proposal (called the AMP in other places). related to the idea of adding Foldable/Traversable to the Prelude.

---

http://www.haskell.org/haskellwiki/Class_system_extension_proposal

https://ghc.haskell.org/trac/ghc/wiki/InstanceTemplates

---

" a design that makes it easier for new releases to make backwards incompatible changes. One approach to this could be at the package level the way that base-compat operates. Another approach that could be useful to library authors is incorporate versioning at the module level.

Something to keep in mind though is that because the new Prelude needs to try to work with the old Prelude, there are not that many options in the design space. classy-prelude has had the luxury of being able to re-think every Haskell wart. So it was able to remove all partial functions and use Text instead of String in many places. But that process is very difficult for the actual Prelude, which is severly constrained. "

---

" Wednesday, October 01, 2014 Why Traversable/Foldable should not be in the Prelude

Summary: For GHC 7.10, Traversable and Foldable are going to be in the Prelude. I missed the original discussion, but I suspect it's a bad idea.

Types are how Haskell programmers communicate their intentions to each other. Currently, the Haskell Prelude contains:

mapM :: Monad m => (a -> m b) -> [a] -> m [b]

As of GHC 7.10, as part of something known as the Burning Bridges Proposal (ticket, discussion, I can't actually find a full proposal...), that will become:

mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)

Surely that's a good thing? Aren't more general types always better? Isn't the Prelude an archaic beast from the time before? I'd argue functions which are highly polymorphic are hard to use, and hard to think about, especially for beginners. I'd also argue the Prelude is remarkably well designed, not perfect, but quite an impressive feat.

What makes a type signature complex?

I've been thinking recently about what makes type signatures complex, both to practitioners, and to relative beginners. My rough metric is:

    Fully concrete types are usually simple, as long as they aren't too long. The longer a type gets, the more complex it gets.
    Types with functions in them aren't too bad (order-1 types), but as you go up to order-2 types things start to get more complex.
    Fully polymorphic functions can be simpler than concrete functions, since they declare what you don't need to worry about.
    Functions with type classes are more complex, since you need to read the type signature while looking at the context, and need to know each class being used.
    Simple type classes (Eq, Show) aren't too bad, but custom type classes impose more of a burden.
    As you add more type classes, the complexity grows faster than linearly. Three type classes are not three times as complex as one, but quite a bit harder.
    Higher kinded type classes are significantly more complex than kind * type classes, e.g. Monad, Functor. The reason is that instead of having a hole you fill in, you now have a hole which itself has a hole.
    The higher-kinded type classes Monad and Functor aren't as bad as the others, since Functor is really the "simplest" higher-kinded type class, and Monad is required knowledge for IO.
    As you have more higher kinded type classes, the complexity burden grows even worse than for kind * type classes. Two is significantly more complex than one.

By that metric, the old mapM isn't too bad, but the new mapM is quite complex. It has two higher-kinded type classes, and one of them is not one of the common ones. I appreciate that making Foldable and Traversable key to Haskell will probably lead to them being more used, but now all beginners are going to have to wade through the Monad tutorial, their Foldable tutorial and their Traversable tutorial before they start programming (or just give up).

Why generality hurts

There are two main reasons why generality hurts:

Reading type signatures becomes difficult/impossible. We already have that problem with the Control.Arrow module, which (as far as most people use it), is just a pile of tuple combinators. But unlike other tuple combinators, these are ones whose type signature can't be understood. When I want to use &&& or *** I just pick randomly, see if it type checks, then try again. When other people I know want to use these functions they just use an explicit lambda. No one thinks of referring to the documentation, since the documentation presents a unification problem (which most of the people I know could solve), not an intuition.

Reading code becomes difficult. Haskell is brilliant for letting you write a composable pipeline of code that takes some input, does some processing, and produces some output. But that only works if you have enough concrete pieces in each function to read each piece in isolation. As an example:

test = foo . mapM baz . bar

Using the current mapM definition I can, in a fraction of a second, know the approximate shape of what foo consumes, and what bar produces. With the new mapM I don't, and have to keep more context in my head to reason about the code.

Who it hurts

Generality of this nature tends to hurt two types of people:

Beginners are hurt because they need to know more concepts just to get going. As a beginner I read through Data.List regularly to build up weapons in my arsenal to attack larger problems. The new Data.List will be generalised, and reading it won't give the insights I enjoyed. Maybe the beginner can instantiate all Foldable things to [], but that adds a mental burden to exactly those people who can bear it least.

Practitioners, those who are paid to code for a living, will have greater problems with maintenance. This isn't an unsubstantiated guess... I have taken over a project which made extensive use of the generalised traverse and sequence functions. Yes, the code was concise, but it was read-only, and even then, required me to "trust" that the compiler and libraries snapped together properly.

Who it benefits

The benefit probably comes from those who are already using the Applicative/Traversable classes regularly. For these people, they can probably avoid an import Prelude(). I am not against ever changing the Prelude, but I do think that for changes of this magnitude the ideas should probably be prototyped as a separate package, widely accepted, and only then should significant surgery be attempted on the Prelude. The classy-prelude work has gone in that direction, and I wish them luck, but the significant changes they've already iterated through suggest the design space is quite large.

Concluding remarks

I realise that I got to this discussion late, perhaps too late to expect my viewpoint to count. But I'd like to leave by reproducing Henning Thielemann's email on the subject:

        David Luposchainsky wrote:
        +1. I think the Prelude should be a general module of the most commonly
        needed functions, which (generalized) folds and traversals are certainly
        part of; right now it feels more like a beginner module at times.
    It is certainly a kind of beginner module, but that's good. Experts know
    how to import. Putting the most general functions into Prelude does not
    work because:
    1. There are often multiple sensible generalizations of a Prelude
    function.
    2. You have to add more type annotations since types cannot be inferred
    from the functions.
    There is simply no need to change Prelude and all packages that rely on
    specific types. Just don't be lazy and import the stuff you need!
    I should change my vote to:
    -10"

-- http://neilmitchell.blogspot.co.uk/2014/10/why-traversablefoldable-should-not-be.html

(already added this link to plbook)

--

[–]yitz 6 points 1 day ago

    proposal made by Simon Marlow a year and a half ago that if you import Prelude.Foo then NoImplicitPrelude would get set automatically. This would make alternate preludes easier for folks to push.

That is a really nice idea.

[–]WilliamDhalgren 3 points 1 day ago*

right.

If I'm reading that design correctly, the leaf class with InstanceTemplates still needs to be coded for the true hierarchy above it, with a "default instance <classname>" for each superclass template it inherits. The example given has the Monad decl still conscious of the default Functor instance in Applicative.

and still gets warnings for any generated instances unless doing a "deriving <classnames>" for all actual classes on the final datatype.

IMHO not as scalable as McBride's proposals, where final instances apparently freely mix declarations from all intrinsically-declared superclasses.

There you only get warnings if pre-empting with explicitly created instances, allowable for a transitional period with a PRE-EMPT pragma, or an error otherwise, without excluding these explicitly from being generated.

[–]edwardkmett 6 points 1 day ago

When I last spoke with Richard we'd talked about including such a component in the proposal. I'm unsure if its absence is an act of omission or commission.

The ability to split a class is particularly dear to me, if we ever want to have the ability to refine our class hierarchies without doing so on the back of every user.

---

"I think a more promising solution to the problem of generic type complexity is making specialization of type signatures easier in code, documentation, and compiler error messages."

---

"libraries in haskell that stood out as being unique":

http://www.reddit.com/r/haskell/comments/1k3fq7/what_are_some_killer_libraries_and_frameworks/

(already added this link to plbook)

summary:

the main ones:

runner-ups:

others:

---

confusing error in Python numpy: if you use an array where a scalar is expected, you get:

"TypeError?: only length-1 arrays can be converted to Python scalars"

of course, you weren't trying to convert anything
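
for reference, a minimal repro (numpy's exact message varies by version; newer versions say "size-1" instead of "length-1"):

  import math
  import numpy as np

  # math.* functions expect a scalar; handing them a multi-element array
  # triggers the confusing message quoted above:
  try:
      math.cos(np.array([1.0, 2.0]))
  except TypeError as e:
      print(e)  # "only length-1 arrays can be converted to Python scalars"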

---

max and nanmax
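
the difference, for reference:

  import numpy as np

  a = np.array([1.0, np.nan, 3.0])
  np.max(a)     # nan -- NaN propagates through the plain reduction
  np.nanmax(a)  # 3.0 -- the nan* variants ignore NaNs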

---

need a find_first!

http://numpy-discussion.10968.n7.nabble.com/Implementing-a-quot-find-first-quot-style-function-td33085.html
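
numpy still has no short-circuiting find-first; a sketch of the semantics the thread asks for (the name and the chunking strategy are my own, not numpy API):

  import numpy as np

  def find_first(arr, pred, chunk=4096):
      # scan in chunks so we can stop early instead of evaluating
      # pred over the whole array
      for start in range(0, len(arr), chunk):
          hits = np.flatnonzero(pred(arr[start:start + chunk]))
          if hits.size:
              return start + hits[0]
      return -1

  find_first(np.arange(10**6), lambda x: x > 12345)  # -> 12346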

---

numpy 'take' (axis-wise array indexing by array)
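
for reference:

  import numpy as np

  a = np.array([[10, 20, 30],
                [40, 50, 60]])
  np.take(a, [0, 2], axis=1)  # columns 0 and 2: [[10, 30], [40, 60]]
  # roughly the same as the fancy-indexing spelling: a[:, [0, 2]]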

---

http://www.yukinishijima.net/2014/10/21/did-you-mean-experience-in-ruby.html

---

i always have to look up how to convert b/t epoch time and Python datetime objects in Python:

http://partiallyattended.com/2011/10/13/managing-unix-time-in-python/
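
the incantations, so i stop having to look them up:

  import calendar, datetime

  datetime.datetime.utcfromtimestamp(1300000000)  # epoch secs -> naive UTC datetime
  dt = datetime.datetime(2011, 3, 13, 7, 6, 40)
  calendar.timegm(dt.timetuple())                 # naive UTC datetime -> 1300000000
  # (Python 3.3+ also has dt.timestamp(), but that interprets a naive dt as *local* time)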

---

.NET immutable collections

http://msdn.microsoft.com/en-us/library/dn385366%28v=vs.110%29.aspx

--

it's confusing how in Python, datetime.timedelta(100) is 100 days, but datetime.timedelta(0, 100) is 100 seconds. datetime.timedelta(100) should be 100 seconds.
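
for reference:

  import datetime

  datetime.timedelta(100)          # 100 days -- the first positional arg is days
  datetime.timedelta(0, 100)       # 100 seconds -- the args are (days, seconds, ...)
  datetime.timedelta(seconds=100)  # the keyword form avoids the surprise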

--

need library fn to quickly sanitize strings for use in filenames, shell commands, etc

  re.sub(r'[^a-zA-Z0-9_]', '', x) might be enough
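
a runnable version of that sketch (for shell commands specifically, quoting with shlex.quote, or passing an argument list to subprocess, is safer than sanitizing):

  import re

  def sanitize(s):
      # keep only a conservative whitelist of characters
      return re.sub(r'[^a-zA-Z0-9_]', '', s)

  sanitize('my file (v2).txt')  # -> 'myfilev2txt'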

--

https://lodash.com/?v3

---

python requests

---

perl6's roll, join, pick, say, as in [1]

---

validation libraries

for example (i dunno if this one is popular/any good, i just saw it in a random search for something else):

https://pypi.python.org/pypi/good/0.0.1-0

---

we discuss this in [2], but here i note that it also serves as a list of useful library functions:

http://t-a-w.blogspot.com/2010/07/arrays-are-not-integer-indexed-hashes.html

---

https://msdn.microsoft.com/en-us/library/aa287104%28v=vs.71%29.aspx

An Extensive Examination of Data Structures
Part 1: An Introduction to Data Structures
Part 2: The Queue, Stack, and Hashtable
Part 3: Binary Trees and BSTs
Part 4: Building a Better Binary Search Tree
Part 5: From Trees to Graphs
Part 6: Efficiently Representing Sets

---

on arrays/lists:

--

"Don't steal good names from the user. Avoid giving a package a name that is commonly used in client code. For example, the buffered I/O package is called bufio, not buf, since buf is a good variable name for a buffer." -- http://blog.golang.org/package-names

"Avoid stutter. Since client code uses the package name as a prefix when referring to the package contents, the names for those contents need not repeat the package name. The HTTP server provided by the http package is called Server, not HTTPServer. Client code refers to this type as http.Server, so there is no ambiguity." -- http://blog.golang.org/package-names

(http://blog.golang.org/package-names probably has other useful tips too, i probably should read it)

--

this looks really cool:

https://github.com/gizak/termui https://news.ycombinator.com/item?id=9276188

--

https://news.ycombinator.com/item?id=9280813

--

np.asanyarray, from /usr/lib/python2.7/dist-packages/numpy/core/numeric.py:

  asanyarray(a, dtype=None, order=None)

Convert the input to an ndarray, but pass ndarray subclasses through.
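
the contrast with np.asarray, for reference:

  import numpy as np

  m = np.matrix([[1, 2], [3, 4]])  # an ndarray subclass
  type(np.asarray(m))              # numpy.ndarray -- subclass stripped
  type(np.asanyarray(m))           # numpy.matrix  -- subclass passed through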

--

griddata, meshgrid, transpose, loadtxt, imload (imread), imsave, plot, scatter. What else? look at everything i used in bshanks_thesis/__init__.py; also look at the explicitly imported stuff in atr. Also look at those matlab<-->numpy cheat sheets. Also julia, and that clojure matrix math lib (incanter)

--

use Redis data structs and ops for libs, eg "lists, sets, sorted sets, hash tables, pub/sub, hyperloglog, and scripting (lua) support." -- https://news.ycombinator.com/item?id=9304718 eg "Simple values or data structures by keys but complex operations like ZREVRANGEBYSCORE. INCR & co (good for rate limiting or statistics) Bit operations (for example to implement bloom filters) Has sets (also union/diff/inter) Has lists (also a queue; blocking pop) Has hashes (objects of multiple fields) Sorted sets (high score table, good for range queries) Lua scripting capabilities (!) Has transactions (!) Values can be set to expire (as in a cache) Pub/Sub lets one implement messaging "

note: Redis's 'hyperloglog' (HyperLogLog) is a constant-space, linear-time algorithm for approximately counting uniques in some set in an online (as opposed to batch) manner (as opposed to the naive algorithm, which is linear in space rather than constant). There are 3 operations for it (signatures here are my own sketch, not Redis syntax):

  PFADD(hll, element) -> hll'
  PFCOUNT(hll) -> approximate count of distinct elements added so far
  PFMERGE(hll1, hll2) -> hll' (the hll of the union of the two sets)

i've written the above signatures as if these are non-variadic fns operating on immutable data and returning updated state when necessary, but in Redis, PFADD and PFMERGE are actually variadic (PFADD var element .. element; PFMERGE dst src src .. src), and PFADD and PFMERGE are mutating (PFADD mutates 'var', and PFMERGE takes an additional parameter, 'dst', which it mutates).
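
for comparison, the actual mutating/variadic form, from Python via the redis-py client (a sketch; assumes a local Redis server):

  import redis

  r = redis.Redis()
  r.pfadd('visitors:today', 'alice', 'bob', 'carol')  # mutates the key in place
  r.pfcount('visitors:today')                         # ~3 (approximate count)
  r.pfmerge('visitors:week', 'visitors:today', 'visitors:yesterday')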

--

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

--

Lodash (the new underscore, apparently)

--

good for quick tests:

the constant: array([[1,2], [3,4]])

also the magic(3) constant (the 3x3 magic square, as in MATLAB)

--

isprime

prime factorization
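
e.g. in Python these live in sympy rather than the stdlib:

  from sympy import isprime, factorint

  isprime(2**61 - 1)  # True (a Mersenne prime)
  factorint(360)      # {2: 3, 3: 2, 5: 1}, i.e. 2^3 * 3^2 * 5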

--

https://github.com/mahmoud/boltons

https://news.ycombinator.com/item?id=9350562

https://github.com/cool-RR/python_toolbox

https://news.ycombinator.com/item?id=9352253

---

javascript arrays have a 'map' method that is like map-enumerate, passing both an item and its index:

" here's the view m("table", [ todo.vm.list.map(function(task, index) { return m("tr", [ m("td", [ m("input[type=checkbox]") ]), m("td", task.description()), ]) }) ]) " -- http://lhorie.github.io/mithril/getting-started.html

--

"matplotlib works with a number of user interface toolkits (wxpython, tkinter, qt4, gtk, and macosx)"

--

In [829]: vstack([[1,2], [3,4]])
Out[829]:
array([[1, 2],
       [3, 4]])

In [830]: vstack([[1,2], [[3,4], [5,6]]])
Out[830]:
array([[1, 2],
       [3, 4],
       [5, 6]])

---

complaints about golang's std lib:

 TheDong 13 hours ago

Consistent standard library?

It's a crapshoot what will be an interface and what will be a struct, which is one of the fundamental features of the language.

Returned error values are sometimes errors.New (e.g. io.EOF) and sometimes random fmt.Errorf strings that couldn't possibly be handled (e.g. all of tls.go basically), sometimes actual structs (e.g. PathError), and sometimes panics.

If you can't even get error handling right and consistent, what claim do you have to consistency? At least in java it's pretty much all exceptions.

The math library, as mentioned above, is pretty darn iffy.

The existence of both 'path' and 'filepath' is possibly a mistake.

The 'heap' and 'list' in collections both seem like they should be similar, but 'heap' operates on a thing that fulfills its interface and list operates directly on arbitrary objects without some interface needed.

Due to the lack of 'protected', the std library is littered with exported fields you shouldn't use (e.g. gob.CommonType and template.Template.*parse.Tree and so on).

Sure, the library is consistent in many places because the language is so small there's little else it can do but be consistent, but the fact that it falls flat in error handling and has visible warts makes me feel that it's a very broad std library, but far less consistent than, say, Rust's (where everything returns consistent Optional error types) or Java's or Smalltalk's. Sure, it's more consistent than the clusterfuck that is ruby and/or python or javascript or (heaven forbid) c/c++, but by no means is being better than any of those an achievement in this category.

reply

chimeracoder 12 hours ago

> Returned error values are sometimes errors.New (e.g. io.EOF) and sometimes random fmt.Errorf strings that couldn't possibly be handled (e.g. all of tls.go basically), sometimes actual structs (e.g. PathError),

This is the advantage of having errors be interfaces rather than concrete values. Once you learn how to use it, it's a strength. The usage of structs that satisfy the interface (as in PathError) is clearly documented and completely predictable. If your concern is that a certain package could use a struct satisfying the interface and does not, well, one advantage of errors as interfaces is that you could propose this change without breaking the backwards compatibility guarantee.

It's really important to keep in mind that errors are adopted from error values in C, which means that their idioms and usage is inspired by that rather than by exception handling. Many new Go programmers (including me, when I started) were more familiar with exception handling in higher-level languages than error handling in idiomatic C, so it does take some getting used to.

> and sometimes panics.

Do you have any examples of the standard library using panics to signal handle-able errors (not counting cases in which the panic is recovered at the top-level in the library and never visible to the caller)?

> The math library, as mentioned above, is pretty darn iffy.

The only complaint I have had about package "math" is that all operations are defined on float64s, which can be annoying, but is far better than any alternative. That's an annoyance, but it's certainly not an inconsistency in the library.

Do you have any other examples of inconsistencies in the math package?

reply

jbooth 10 hours ago

Yeah, but with C's old-school int values for errors, you can always make an equality comparison. The error interface in Go doesn't specify equality, so you can do "if err == io.EOF" in some cases and you're up a creek in other cases. Sure you can do if err.Error() == "String I'm expecting", but as the parent said, fmt.Errorf can easily make that impossible.

reply

 TheDong 10 hours ago

No, the advantage of having an interface is everything can return an error I can typeswitch on. Unfortunately, you can't (in a backwards compatible way) change io.EOF or any of the other 'constant' errors because there's so much code doing 'if err != io.EOF' which now breaks. In addition, it's backwards incompatible due to anyone doing 'reflect.TypeOf' which I guess you could argue is fine to break.

Speaking of reflection, there's a metric TON of runtime panics in the reflect package. Hell, I'm not sure there's a method in there that can't panic.

No doubt, however, you'll say that's cheating and the expected behavior of that package, so how about NewTimer? (playground: https://play.golang.org/p/jDKniK3aqa ). It does not give me an opportunity to recover from bad input by passing an error out, it just panics. It is also not documented that it panics afaik, and other functions in time (time.Sleep) take negative durations and handle them just fine. This is definitely inconsistent with other std library functions that can take bad input and return errors.

I also think it's disingenuous to say "Keep in mind errors are from C and at least they're better than C"... the comment I'm responding to is a comment regarding go std library being one of the most consistent (implicit compared to ANY LANGUAGE) so I may bring in anything I wish, and sure Go's errors are a step up from C, but they're awful compared to other languages. They're basically passing around strings with no builtin way to create a trace of errors to return up the stack to maybe be printed, no default way to create an optional type with an error value, and no consistent way to determine which of N errors it was that was returned from a function without using a regexp 90% of the time because that damned fmt.Errorf / errors.New... And yes, using a Regexp in your error handling IS a problem.

> defined on float64, which can be annoying, but is far better than any alternative

Funnily enough, the 'rand' library implements everything by just having multiple functions and postfixing them with types (rand.Uint32, rand.Int, rand.Float32) and strconv does something vaguely similar.

Whether one or the other is better, that's another inconsistency in the std library; do you have many functions per type or only one function for one type? I have no clue how Go could have abused you such that you are able to conceive of not having a builtin int64-compatible 'min' function as a good thing.

Actually, perhaps your post is simply a plea for help as Go has tied you up and holds a gun to your head, in which case I suppose we should mount a rescue mission forthwith! Just let me write a quick timer to keep track of our progress... -1 you s.. err panic

reply

comex 39 minutes ago

I don't know about consistency within the standard library, but NewTimer's behavior definitely sounds consistent with the concept of distinguishing programming errors from exceptional conditions. Passing a negative number as a duration could either be defined behavior (equivalent to zero for programmer convenience and consistency with the principle that timers can fire late) or a programming error (should've checked that whatever you were subtracting to get that duration isn't backwards), but it's not an exceptional condition that can't be predicted in advance. Indeed, checking for a negative number, which is rarely necessary, is no more code than the 'if err' thing, and if it isn't necessary in a particular use of the function, you have the luxury of knowing it can't fail, and you don't need to figure out what will happen if it does.

(I bet you'd be unhappy at Rust irrecoverably panicking the thread for simply performing addition, if the result overflows!)

reply

tomjakubowski 16 days ago

There is a third way. See Haskell's Either l r, or Rust's Result<T, E> types. See http://lucumr.pocoo.org/2014/10/16/on-error-handling/ and http://lucumr.pocoo.org/2014/11/6/error-handling-in-rust/

I don't know why so often, the immediate reaction to legitimate criticism of this wart in Go is to argue against exceptions, even when nobody has even brought up exceptions as an alternative.

vowelless 10 hours ago

> The math library, as mentioned above, is pretty darn iffy.

One of my biggest disappointments early on was not even having a Max/Min function for ints.

reply

whateveracct 10 hours ago

But there is for float64 ;)

I think this is a classic case of Go's need for "simplicity" hamstringing the language. When handling min/max, they had a few options:

1) Treat numbers as a special case and have a polymorphic min/max that operates on any number type. This is out of the question because it is obtuse and irregular and neither of those things are the "Go way"

2) Properly abstract over numbers somehow. This can be weakly done using Go interfaces but 1) it would require >, < etc to all be methods on the number types. But then Go would need operator overloading and that's not the "Go way" 2) using an interface is actually bad anyways because it doesn't make guarantees that input type =:= output type. To do that you need proper generics and then use a common subtype or take it further and use a Numeric typeclass. This is way too complicated and not the "Go way"

3) Write a min/max for every number type. Due to lack of overloading, each would be named accordingly (minInt64 etc). This is ugly and definitely not the "Go way"

4) Just use the most "general" number type and write min/max for that. You can just cast on the way in and out so this "works". It doesn't require API bloat or more language features, so it's the "Go way"

reply

aikah 10 hours ago

yet the "go way" makes computing really tough :

  x := 1
  y := 2.0
  z := x + y

Error: invalid operation: x + y (mismatched types int and float64)

So much so that devs end up using float64 everywhere ...

reply

coldtea 14 hours ago

It's quite good, but e.g. Java SDK has stuff that runs circles around Go's standard library regarding breadth and maturity, especially stuff added since 1.4 (nio, etc).

And some aspects of the Go SDK are horrible in practical use. Case in point: most things math-related.

reply

krylon 9 hours ago

Well, yes. But Java's standard library is also huge and full of deprecated stuff. I have not done a lot of Java programming, but when I did play around with Java, I found myself spending most of the time actually browsing through the standard library's documentation (which is, to be fair, really, really good) looking for stuff.

Also, Java has a head start of nearly 15 years on Go, and the Java community is (or used to be, at least) pretty huge.

Not that this invalidates your point.

reply

 nevergo 11 hours ago

Go has the poorest std lib I've ever seen.

reply

andrewchambers 5 hours ago

Compared with? My reference point is python, java, clojure, c, c++. Python is huge and useful but messy and inconsistent. Java is huge and doesn't compose well. Clojure is underdocumented and hardly batteries-included. C and C++ can't do anything useful without a bunch of extra libraries. From what I can see, C# is like java in this regard.

Go is a small incredibly useful and composable set of libraries that handles a vast amount of cases with a small amount of code.

reply

 curun1r 8 hours ago

> Can you think of some aspect that was especially well thought-out?

Go routines, channels and select. With very little else, it's possible to write some very useful code in a way that's concise, elegant and easy to reason about.

reply

coldtea 8 hours ago

Only compared to something like C or explicit old-skool Java-style threading. A lot of modern languages have CSP built-in or as a lib (Java, Scala, Haskell, Clojure, C++, Ada, Erlang, heck even Rust).

And Go is not quite expressive enough to address higher-level, but common, constructs ( https://gist.github.com/kachayev/21e7fe149bc5ae0bd878 ) in a concise and elegant way.

And if you access anything outside of channel-provided stuff in Golang, you have to bring your own safety.

reply

---

https://github.com/aturon/rfcs/blob/collections-conventions/text/0000-collection-conventions.md

--

http://www.jsgraphs.com/

other suggestions: https://news.ycombinator.com/item?id=9583384

http://c3js.org/ recc. by https://news.ycombinator.com/item?id=9585859

http://ecomfe.github.io/echarts/index-en.html recc. by https://news.ycombinator.com/item?id=9584899

---

needs an isiterable (Python somehow left this out!)
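
a sketch of the missing fn; note that isinstance(x, collections.abc.Iterable) misses objects that only implement __getitem__, so falling back to iter() is closer to what 'for' actually accepts:

  def isiterable(x):
      try:
          iter(x)
          return True
      except TypeError:
          return False

  isiterable([1, 2]), isiterable('abc'), isiterable(42)  # (True, True, False)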

---

in Python 2, the syntax

  print '.',

is very convenient, but often what you actually have to do is:

  sys.stdout.write('.')
  sys.stdout.flush()
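
in Python 3 this is finally built into print:

  print('.', end='', flush=True)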

---

should have something like python.requests but which makes it easy to:


---

another contender to lodash and underscore (and ramda?) for a js library, this one claims to be more functional:

https://github.com/jussi-kalliokoski/trine

discussion: https://news.ycombinator.com/item?id=9699061

---

http://underscorejs.org/ https://lodash.com/ https://github.com/ramda/ramda

---

idiv for integer/truncating division, div for normal division
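
e.g. Python 3 makes this distinction with two operators:

  7 // 2   # 3 -- integer division (note Python floors rather than truncates: -7 // 2 == -4)
  7 / 2    # 3.5 -- 'normal' division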

---

++ : unary increment (pre)

rem : modulus/idiv remainder (or have a separate 'mod' for modulus? note: 3 mod 5 == 5 rem 3)

---

http://stackoverflow.com/questions/1903954/is-there-a-standard-sign-function-signum-sgn-in-c-c

https://mail.haskell.org/pipermail/libraries/2013-April/019694.html

---

which is better, 'to' or 'from'; library names like:

str-to-keyword

or like:

keyword-from-str

?

---

on the difference between remainder and modulus on signed numbers (sounds like the thing to do here is to have a modulus function, not remainder, b/c that's what's expected nowadays; i think the LuaVM also uses mod; so use mod):

" Recently I learned that Javascript doesn't handle modulus on a negative number correctly among other things.

That's not really true. Javascript doesn't have a modulus operator, it has a remainder operator, and it is completely valid as a remainder operator. The "%" operator is usually called the modulus operator, but it originated in C where prior to C99 its behavior on signed integers was undefined, and on signed integers modulo and remainder are equivalent. So javascript didn't want undefined behavior so they picked one definition and ran with it, it just so happens that C99 chose the other definition and it stuck. " -- https://news.ycombinator.com/item?id=9789371
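
the two behaviors side by side in Python, for reference:

  import math

  -7 % 3            # 2 -- Python's % is a modulus: result takes the divisor's sign
  math.fmod(-7, 3)  # -1.0 -- C/JavaScript-style remainder: result takes the dividend's sign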


---

Useful Python Libraries for Startups: https://news.ycombinator.com/item?id=9806370


---

https://github.com/Workiva/go-datastructures

https://news.ycombinator.com/item?id=9829025


"hom" for homsets

---

" Classic and ISO-Prolog: Ciao provides, through convenient defaults, an excellent Prolog system with support for ISO-Prolog. Other classical “builtins” expected by users, and which are provided by modern Prolog systems (YAP, SWI-Prolog, Quintus Prolog, SICStus Prolog, XSB, GNU Prolog, B-Prolog, BinProlog?, etc.), are also conveniently available. In line with its design philosophy, in Ciao all of these features are optional and brought in from libraries rather than being part of the language. This is done in such a way that classical Prolog code runs without modifications "


https://janestreet.github.io/

    Core, an industrial strength alternative to OCaml's standard library. It is not a compatible, drop-in replacement for the standard library. We've made different design decisions, and so code designed for the standard library needs to be adapted to use Core.
    Core_extended, a set of useful extensions to Core. These are less well tested and less stable than Core proper.
    Async, a monadic concurrency library
    A set of syntax extensions, including Sexplib and Bin_prot, which extend the OCaml language itself. These are not necessary to use the rest of the libraries, but they are necessary for building them.

---

 nbevans 4 hours ago

Bear in mind this project was started around the same time there was a ton of uncertainty around the future of Silverlight and WPF. Alas, one did die, one lives on for now. But nobody knew that at the time, including Intel, or apparently Microsoft. WinForms has never faced any forward compatibility uncertainty so it is a good long-term bet.

reply

bunderbunder 2 hours ago

It's also a compelling choice if you need to operate in a resource-constrained environment. I don't know what kind of hardware is in Hawking's system nowadays, but back when this was first being built a low-power portable system that runs a higher-activity WPF UI smoothly could have been fairly expensive.

reply

tonyedgecombe 4 hours ago

WinForms is still a great way to write desktop software if you don't need all the features of WPF; I just started a new project with it and have been really productive.

reply

Pxtl 2 hours ago

WinForms is not simple. There are so many core classes that have counterintuitive edge-cases and overcomplicated behavior, and so many things you'd expect to work by default don't.

Databinding is a complete trainwreck, the Combo-box class is horribly overcomplicated by its double-duty as text-entry and drop-down-list, the DataGridView is a complete beast of leaky abstractions, and the layout engine completely falls apart if somebody alters the DPI unless you obsessively test DPI alterations yourself.

I don't blame Microsoft for any of this - it was 2000 and they were making a wrapper around some terrifying legacy code.

But this thing should have been tossed in the dustbin of history a long time ago.

reply

jorgeleo 1 hour ago

I believe that no technology is simple on its own; it all depends on the abstractions that you are used to. Counterintuitive is dependent on how things are expected to work.

Data binding is not solid, but it is a quick hack to display data; the solution is using a business model and MVP or MVC.

What do you find complicated about the combo class?

Data grid view... It is a train wreck, but then again, there is not much need to use it if you have a proper model behind.

The layout engine does suck... The only alternative I found is to use the DevExpress layout control. The rumor is that 4.6 solves this.

Win forms is solid and it has very little chance of disappearing. Areas of the screen can be controlled independently, which means UI encapsulation is there... Something not easily done in html.

reply

duncan_bayne 1 hour ago

Sure. But - serious question - what offering from Microsoft would you replace it with?

reply

ZanyProgrammer 2 hours ago

WinForms is still the easiest way to write desktop software quickly and easily. If you don't need a fancy UI for customer-facing work, it's a great platform for internal use.

reply

lbruder 24 minutes ago

Have a look at Lazarus (http://www.lazarus-ide.org/). It uses a different language (FreePascal instead of C#), but for me it's much more productive, and the programs written run without any framework and feel much snappier.

reply

Aleman360 1 hour ago

... for some definition of "easiest." Maybe I would agree if you don't need to support a custom look-and-feel (Win32-looking apps don't really fly anymore), responsive layout, high DPI, touch/pen input, system theme colors, accessibility, localization, and haven't learned XAML.

WPF and UWP apps are both far easier.

reply

 jimmcslim 4 hours ago

The data binding in WPF seemed more powerful to me... enabling MVVM approaches, although I'm sure the same is possible in WinForms as well.

Does WinForms still get love in the new versions of the SDK or is it considered done now?

reply

toong 4 hours ago

WinForms is in maintenance mode. The only updates it gets are to make sure it runs on new versions of Windows & maybe some security-related updates.

(There still are bugs around, but I think they will not be addressed ever. Fixing those could potentially break existing applications relying on that behaviour.)

reply

whoisthemachine 4 hours ago

I don't know, I've done both (Silverlight MVVM and Winforms) plenty and I prefer the manual approach. Databinding is cool until you have any sort of complexity in your app, and then it becomes unwieldy quickly.

reply

mariusmg 4 hours ago

Databinding (and MVVM by extension) is ok only in very simple scenarios. Once you hit some complexity you end up in a world of hurt.

Databinding is not a leaky abstraction, it's a fucking flood abstraction.

reply

stupidcar 4 hours ago

Um, if you're using MVVM, then by definition you should have a clean separation between the properties of your Model, which is part of your domain, and the properties of your View-Model, which is part of your presentation tier and directly tied to a particular View. If you have this, what exactly can be leaked by databinding? The only things you're binding to should be properties of your View-Model.

reply

icegreentea 3 hours ago

While that's largely true when dealing with pure business logic, there's a maddeningly large amount of UI logic, and hybrid business/UI logic that becomes really annoying (or outright impossible) to handle in pure MVVM with WPF. Reasons for this include that many user interface properties aren't exposed nicely to allow data-binding.

Probably the most infamous one is that the WPF listview with multiple select enabled doesn't allow you to bind to the collection of selected items. Instead you have to do all sorts of work arounds, that while individually aren't too bad, when put all together, makes all the other hard work you put into doing MVVM on the components that you fully control super frustrating.

reply

r-n 2 hours ago

> many user interface properties aren't exposed nicely to allow data-binding

You hit the nail on the head with this. This is why I don't like to use WPF outside of writing small utilities.

reply

m_fayer 1 hour ago

I've worked on large and complex apps with MVVM in WPF and on the web with Angular, and have not regretted using MVVM for a second.

Databinding is indeed a leaky abstraction, but at the same time it's a very powerful one. I'm willing to learn the inner workings of the binding system to avoid performance pitfalls and other weirdnesses. I'm also willing to continually wrap all sorts of not MVVM-ready components to make them data-binding friendly. When people talk about databinding being a leaky abstraction, what I hear is "I was promised magic and it's not actually magical."

Also - there are many different approaches to how it's done, with various tradeoffs. Compiled bindings on Android and the new Windows platforms look interesting, and you should also check out how ReactiveUI approaches it.

In the end though, I've never been able to achieve a satisfactory level of loose coupling, testability, and portability without databinding. Despite the overhead and occasional surprises, it's paid off in spades as far as quality and productivity.

reply

---

https://www.simba.com/resources/data-access-standards-library

---

" Consolidating Common Lisp Libraries

I'm starting a movement to consolidate Common Lisp libraries. To participate, here's the ten point program summary:

    Pick your favorite problem domain
    Identify all libraries that address it
    Work with various library authors
    Pick the most promising library
    Declare it THE library
    Add missing features present in other libraries
    Declare other libraries obsolete
    Rename package to short name identifying domain
    Invite all users to migrate to new library
    Profit

Most of it is pretty straightforward, but below are some details. "

-- http://fare.livejournal.com/169346.html

---

http://eudoxia.me/article/common-lisp-sotu-2015/

---

visualization in js on top of d3: mentions https://news.ycombinator.com/item?id=10176595

js, visualize, vega, nvd3, epoch, d4, c3js, rickshaw, XCharts, D3xter, Metrics Graphics, TauCharts, N3, Dangle

---

is this good?:

https://docs.python.org/2/library/string.html#formatspec

---

Clojure libs needed by boot and Alda:

Retrieving boot-2.2.0.jar from https://clojars.org/repo/ Retrieving clojure-1.7.0.jar from https://repo1.maven.org/maven2/ Retrieving dynapath-0.2.3.jar from https://clojars.org/repo/ Retrieving pod-2.2.0.jar from https://clojars.org/repo/ Retrieving shimdandy-impl-1.1.0.jar from https://repo1.maven.org/maven2/ Retrieving core-2.2.0.jar from https://clojars.org/repo/ Retrieving aether-2.2.0.jar from https://clojars.org/repo/ Retrieving worker-2.2.0.jar from https://clojars.org/repo/ Retrieving parsley-0.9.3.jar from https://clojars.org/repo/ Retrieving reply-0.3.5.jar from https://clojars.org/repo/ Retrieving regex-1.1.0.jar from https://clojars.org/repo/ Retrieving cd-client-0.3.6.jar from https://clojars.org/repo/ Retrieving clj-http-lite-0.2.0.jar from https://clojars.org/repo/ Retrieving slingshot-0.10.3.jar from https://clojars.org/repo/ Retrieving clj-stacktrace-0.2.7.jar from https://clojars.org/repo/ Retrieving drawbridge-0.0.6.jar from https://clojars.org/repo/ Retrieving ring-core-1.0.2.jar from https://clojars.org/repo/ Retrieving clj-http-0.3.6.jar from https://clojars.org/repo/ Retrieving clojure-complete-0.2.3.jar from https://clojars.org/repo/ Retrieving versioneer-0.1.1.jar from https://clojars.org/repo/ Retrieving cheshire-5.3.1.jar from https://clojars.org/repo/ Retrieving sjacket-0.1.1.jar from https://clojars.org/repo/ Retrieving clj-jgit-0.8.0.jar from https://clojars.org/repo/ Retrieving tigris-0.1.1.jar from https://clojars.org/repo/ Retrieving fs-1.3.2.jar from https://clojars.org/repo/ Retrieving clj-yaml-0.4.0.jar from https://clojars.org/repo/ Retrieving clj-pgp-0.5.4.jar from https://clojars.org/repo/ Retrieving byte-streams-0.1.13.jar from https://clojars.org/repo/ Retrieving clj-tuple-0.1.5.jar from https://clojars.org/repo/ Retrieving potemkin-0.3.9.jar from https://clojars.org/repo/ Retrieving primitive-math-0.1.3.jar from https://clojars.org/repo/ Retrieving riddley-0.1.7.jar from https://clojars.org/repo/ Retrieving desiderata-1.0.2.jar from https://clojars.org/repo/ Retrieving aether-api-1.13.1.jar from https://repo1.maven.org/maven2/ Retrieving pomegranate-0.3.0.jar from https://repo1.maven.org/maven2/ Retrieving aether-impl-1.13.1.jar from https://repo1.maven.org/maven2/ Retrieving aether-util-1.13.1.jar from https://repo1.maven.org/maven2/ Retrieving aether-spi-1.13.1.jar from https://repo1.maven.org/maven2/ Retrieving aether-connector-file-1.13.1.jar from https://repo1.maven.org/maven2/ Retrieving plexus-classworlds-2.4.jar from https://repo1.maven.org/maven2/ Retrieving aether-connector-wagon-1.13.1.jar from https://repo1.maven.org/maven2/ Retrieving maven-aether-provider-3.0.4.jar from https://repo1.maven.org/maven2/ Retrieving sisu-inject-plexus-2.2.3.jar from https://repo1.maven.org/maven2/ Retrieving sisu-guice-3.0.3-no_aop.jar from https://repo1.maven.org/maven2/ Retrieving maven-model-3.0.4.jar from https://repo1.maven.org/maven2/ Retrieving sisu-inject-bean-2.2.3.jar from https://repo1.maven.org/maven2/ Retrieving maven-model-builder-3.0.4.jar from https://repo1.maven.org/maven2/ Retrieving maven-repository-metadata-3.0.4.jar from https://repo1.maven.org/maven2/ Retrieving plexus-component-annotations-1.5.5.jar from https://repo1.maven.org/maven2/ Retrieving plexus-interpolation-1.14.jar from https://repo1.maven.org/maven2/ Retrieving plexus-utils-2.0.6.jar from https://repo1.maven.org/maven2/ Retrieving wagon-provider-api-2.2.jar from https://repo1.maven.org/maven2/ Retrieving wagon-http-2.2.jar from https://repo1.maven.org/maven2/ Retrieving 
jsoup-1.6.1.jar from https://repo1.maven.org/maven2/ Retrieving wagon-http-shared4-2.2.jar from https://repo1.maven.org/maven2/ Retrieving commons-logging-1.1.1.jar from https://repo1.maven.org/maven2/ Retrieving httpclient-4.1.2.jar from https://repo1.maven.org/maven2/ Retrieving httpcore-4.1.2.jar from https://repo1.maven.org/maven2/ Retrieving jline-2.12.jar from https://repo1.maven.org/maven2/ Retrieving tools.nrepl-0.2.5.jar from https://repo1.maven.org/maven2/ Retrieving tools.cli-0.3.1.jar from https://repo1.maven.org/maven2/ Retrieving commons-codec-1.4.jar from https://repo1.maven.org/maven2/ Retrieving commons-io-1.4.jar from https://repo1.maven.org/maven2/ Retrieving commons-fileupload-1.2.1.jar from https://repo1.maven.org/maven2/ Retrieving servlet-api-2.5.jar from https://repo1.maven.org/maven2/ Retrieving httpmime-4.1.2.jar from https://repo1.maven.org/maven2/ Retrieving jackson-core-2.3.1.jar from https://repo1.maven.org/maven2/ Retrieving jackson-dataformat-smile-2.3.1.jar from https://repo1.maven.org/maven2/ Retrieving org.eclipse.jgit.java7-3.5.0.201409260305-r.jar from https://repo1.maven.org/maven2/ Retrieving jsch-0.1.50.jar from https://repo1.maven.org/maven2/ Retrieving org.eclipse.jgit-3.5.0.201409260305-r.jar from https://repo1.maven.org/maven2/ Retrieving JavaEWAH?-0.7.9.jar from https://repo1.maven.org/maven2/ Retrieving core.memoize-0.5.3.jar from https://repo1.maven.org/maven2/ Retrieving core.cache-0.6.3.jar from https://repo1.maven.org/maven2/ Retrieving data.priority-map-0.0.2.jar from https://repo1.maven.org/maven2/ Retrieving commons-compress-1.3.jar from https://repo1.maven.org/maven2/ Retrieving snakeyaml-1.5.jar from https://repo1.maven.org/maven2/ Retrieving jlayer-1.0.1.jar from https://repo1.maven.org/maven2/ Retrieving bcpg-jdk15on-1.51.jar from https://repo1.maven.org/maven2/ Retrieving bcprov-jdk15on-1.51.jar from https://repo1.maven.org/maven2/ Retrieving jna-4.1.0.jar from https://repo1.maven.org/maven2/ Retrieving data.xml-0.0.8.jar from https://repo1.maven.org/maven2/ Retrieving data.zip-0.1.1.jar from https://repo1.maven.org/maven2/ Retrieving tools.namespace-0.2.11.jar from https://repo1.maven.org/maven2/ Retrieving instaparse-1.4.1.jar from https://clojars.org/repo/ Retrieving timbre-3.4.0.jar from https://clojars.org/repo/ Retrieving alda-0.4.4.jar from https://clojars.org/repo/ Retrieving encore-1.21.0.jar from https://clojars.org/repo/ Retrieving pretty-0.1.16.jar from https://clojars.org/repo/ Retrieving djy-0.1.4.jar from https://clojars.org/repo/ Retrieving overtone-0.9.1.jar from https://clojars.org/repo/ Retrieving clj-native-0.9.3.jar from https://clojars.org/repo/ Retrieving at-at-1.2.0.jar from https://clojars.org/repo/ Retrieving osc-clj-0.9.0.jar from https://clojars.org/repo/ Retrieving byte-spec-0.3.1.jar from https://clojars.org/repo/ Retrieving midi-clj-0.5.0.jar from https://clojars.org/repo/ Retrieving scsynth-3.5.7.0.jar from https://clojars.org/repo/ Retrieving libs.handlers-0.2.0.jar from https://clojars.org/repo/ Retrieving scsynth-extras-3.5.7.0.jar from https://clojars.org/repo/ Retrieving clj-glob-1.0.0.jar from https://clojars.org/repo/ Retrieving midi.soundfont-0.1.0.jar from https://clojars.org/repo/ Retrieving reply-0.3.7.jar from https://clojars.org/repo/ Retrieving cheshire-4.0.3.jar from https://clojars.org/repo/ Retrieving parsley-0.9.2.jar from https://clojars.org/repo/ Retrieving data.json-0.2.3.jar from https://repo1.maven.org/maven2/ Retrieving tools.reader-0.8.13.jar from 
https://repo1.maven.org/maven2/ Retrieving jna-3.4.0.jar from https://repo1.maven.org/maven2/ Retrieving commons-net-3.0.1.jar from https://repo1.maven.org/maven2/ Retrieving jmdns-3.4.1.jar from https://repo1.maven.org/maven2/ Retrieving jline-2.12.1.jar from https://repo1.maven.org/maven2/ Retrieving jackson-core-2.0.6.jar from https://repo1.maven.org/maven2/ Retrieving tools.nrepl-0.2.8.jar from https://repo1.maven.org/maven2/ Retrieving jackson-dataformat-smile-2.0.6.jar from https://repo1.maven.org/maven2/

---

numpy's 'allclose'
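
for reference -- elementwise approximate equality within rtol/atol:

  import numpy as np

  np.allclose([1.0, 2.0], [1.0, 2.0 + 1e-9])  # True
  np.allclose([1.0, 2.0], [1.0, 2.1])         # False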

---

mahyarm 3 days ago

For me a big thing is avoiding error prone inconsistencies and other footguns. The less footguns a language has, the better.

For example in objective-c, most of the language deals with nil items without crashing or errors, except for a few APIs and collections. This abstraction mismatch in collections causes runtime crashes and is a language footgun.

Swift fixes this footgun with optional types, but objective-c could have made its collections nil-safe and apple could have just declared in the company that your apis have to be able to handle nil arguments, no exceptions. The new nullable annotations are an ugly patch to just help transition to swift and still do not fix the footgun.

Java's NPE is another example of a footgun, and C++ & C have footguns everywhere that generate billions of dollars of security industry work.

--- " ...

In Go, there's only one way to format your program correctly, there's only one way to document your program correctly, there's only one recommended way to serialize to JSON, it only has one (IMO really awesome) package management system, etc. It's remarkable how little thought I now need to put into those things.

 threeseed 3 days ago

Every programming language starts out like that though.

It's just that over time needs/requirements change and then suddenly you have another package manager, another JSON parsing library etc.

reply

generic_user 3 days ago

As a counter-argument I would point to C, C++ and Lua as examples of languages that try to keep a basic generic standard library and let everything else fall out in third-party libraries. I consider Go more in line with the C family than with the scripting languages like Python etc., with a slightly bigger stdlib. "

---

numpy.find

numpy.nonzero

---

Oot 'print' autocoerces in the way that Python 'repr' does; if you want a prettyprint 'str' type coercion, you have to do that manually
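
the Python distinction, for reference:

  x = 'a\nb'
  print(str(x))   # prints 'a' and 'b' on two lines
  print(repr(x))  # prints 'a\nb' -- quoted, escapes visible, round-trippable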

---

" void do_something(vector<string>& v) { string val; cin>>val; ... int index = 0; bad for(int i=0; i<v.size(); ++i) if (v[i]==val) { index = i; break; } ... }

That loop is a restricted form of std::find. A much cleared expression of intent would be:

void do_something(vector<string>& v) { string val; cin>>val; ... auto p = find(v,val); better ... }

"

---

C++ Core Guideline support library

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#S-support

---

C++ STL

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#S-stdlib

also, C++ Boost

---

stuff like

http://msemac.redwoods.edu/~darnold/math15/spring2013/R/Activities/WelchTTest.html
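
e.g. in Python this is scipy.stats.ttest_ind with equal_var=False, which gives Welch's t-test (the data here is made up, just for illustration):

  from scipy import stats

  a = [27.5, 21.0, 19.0, 23.6, 17.0, 17.9]
  b = [27.1, 22.0, 20.8, 23.4, 23.4, 23.5]
  stats.ttest_ind(a, b, equal_var=False)  # -> (t statistic, two-sided p-value)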

---

these are the only (whitelisted) Python libraries available in Quantopian's sandbox (list from https://www.quantopian.com/help section "Module Import"):

domain-specific to Quantopian:


---

some python db libs/languages:

SQLAlchemy, SQLAlchemy Core, Peewee, SQLA

important features of SQLAlchemy: http://pajhome.org.uk/blog/10_reasons_to_love_sqlalchemy.html
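
a minimal SQLAlchemy Core sketch, assuming SQLAlchemy 1.4+ (where select(users) works; older versions spell it select([users])):

  from sqlalchemy import (Column, Integer, MetaData, String, Table,
                          create_engine, select)

  engine = create_engine('sqlite:///:memory:')
  metadata = MetaData()
  users = Table('users', metadata,
                Column('id', Integer, primary_key=True),
                Column('name', String))
  metadata.create_all(engine)

  with engine.begin() as conn:  # transaction; commits on exit
      conn.execute(users.insert(), [{'name': 'ada'}, {'name': 'grace'}])
      print(conn.execute(select(users)).fetchall())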

---

 ginko 1 day ago

>Use a proper context-free grammar parser if you need to parse a context free grammar, you know?

The thing is that regular expressions are supported as a language feature or as a standard library in pretty much every language. If you want to build a proper parser, you'll have to jump through a lot more hoops. For instance, for C++ I've tried tons of different lexer and parser generators and they all suck for various reasons. (Verbose syntax, uses global variables, C only, lexer not compatible with parser and vice versa, ...) Most people seem to end up writing their own parsers from scratch.

The only time I've ever seen parsing done right is with Parsec for Haskell.

reply

gavinpc 22 hours ago

You might be interested in OMeta, whose goal is to make language experimentation quicker and more accessible by providing a kind of interface layer between a declarative grammar and a host language. I'm still reading the paper, so I can't vouch for it yet. But it's from VPRI and has Alan Kay's backing.[0]

[0] http://www.tinlizzie.org/ometa/

reply

jwmerrill 20 hours ago

I really enjoyed learning OMeta, and I'd recommend playing with it to others. However, the performance of the JavaScript implementation is really bad. It uses exceptions as a primary mechanism for control flow, which is not generally well optimized in JS. I observed a toy calculator grammar parsing a string about 20 characters long throw and catch hundreds of exceptions.

I've had good success with Jison as a JS parser generator that is performant enough to feel good about using in production.

reply

hacker_9 1 day ago

It's a mistake to write your own parser because they can get so complex so quickly to even do basic stuff, and you'll lose sight of the original goal of the project. See Prolog - a whole programming language built around the idea of parsing language (albeit NLP)!

I suggest you take a look at Antlr4 for a powerful but easy-to-use parser plus lexer combo.

reply

ginko 1 day ago

I tried Antlr, even bought the book, and at least for C++ I found it pretty unworkable. Even with the book the documentation felt very incomplete. Maybe things work better on the Java side.

reply

bro-stick 1 day ago

Antlr is shit; it will waste your time for anything real. Bison+flex (once you figure out the quirks, undocumented assumptions and flags) and treetop are usable.

https://www.gnu.org/software/bison/manual/

http://flex.sourceforge.net/manual/

http://treetop.rubyforge.org

Disclaimer: A long time ago in a galaxy far, far away, I wrote an optimizing Java-to-MIPS compiler (sans GC, so leaky heap) in C++ using Flex/Bison, and again in Java using JavaCC.

reply

 ised 1 day ago

"There is no such thing as an unmaintainable/illegible basic regex..."

To someone who knows BRE. I am one of those people. It's ERE and PCRE I do not understand very well.

Sharing solutions to common problems using BRE on HN always seems to trigger (unwarranted) criticism using either of the exact words you mention, or synonyms for them. "Unmaintainable" (by who?). "Illegible" (to who?).

I "maintain" 100's of BRE scripts. They are perfecty legible to me. None of them are so complex I cannot re-write them in a short time. It is the structure of the input that is complex and which takes time to recall.

I also use lex, a common utility found on almost all UNIX derived OS; this article seems to ignore that option. I like to think it's faster than Perl or Python, but I cannot say for sure.

reply

kevin_thibedeau 1 day ago

> I like to think it's faster than Perl or Python, but I cannot say for sure.

It is. I implemented an assembly language parser with pyparsing. It worked okay but the function call overhead with a combinator-based parser handling both the lexing and grammar was murder. I replaced it with regexes and got a 6x speedup. Not something I would do with a complex grammar though. Native code would obviously blow this away in speed but it is fast enough now.

reply

paulmd 21 hours ago

No, tasks like "extracting all links from a webpage" are absolutely trivial using an HTML parser. Run the following XPath query on an HTML object:

  //a[@class='specified_string']/@href 

Yes, you have to understand XPath syntax to write such expressions, just like you have to know the language of regexes. Or at least be able to google them.

The answer to "who cares" is "you", because you're the one who's going to catch hell when your regex fails to capture some hyperlink that utilized some feature of XML that exceeded your test cases. The one-liner above is guaranteed to Just Work on all valid XML documents, so why even create such a monstrosity?
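(editorial sketch, not part of the quoted comment: in Python with lxml, the whole extraction is a couple of lines; the filename is made up)

  from lxml import html

  # parse once, then run the XPath from above;
  # the result is a plain list of href strings
  doc = html.parse('page.html')
  hrefs = doc.xpath("//a[@class='specified_string']/@href")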

Everyone knows that Regexes Cannot Parse HTML, and yet people still try it because they think they're smarter than Noam Chomsky. The real truth is that everything looks like a nail to these people, because all they have is a hammer.

reply

desas 19 hours ago

Because valid documents are rarer than documents where regex parsing is good enough.

reply

paulmd 19 hours ago

Then you're not talking about parsing HTML/XML, are you? How could you possibly know which links or syntax tokens are actually going to be displayed on a page if you feed the browser's parser an invalid document?

There are fault-tolerant HTML parsers like TagSoup that are specifically designed to handle dirty HTML and spit out a valid document object. ...

 inglor 1 day ago

It's possible to a certain degree, but it's just way, way harder than if you used a proper parser. For example, let's say I want to get all the recent question links off http://stackoverflow.com/questions/new . With RegExp, I can write a clever thing that parses `<a href="(.*?)" class="question-link"` and _hope_ that the order never changes and the class never comes before the href and that they don't change that in a design or in other pages and that the href does not contain `class="question-link` inside it -- and to a degree that's valid.

The thing is - the alternative is to write a query selector - which is another, more suitable domain-specific language for making selections - only on the DOM instead of text. I'd just write `$(".question-link").get().map(x => x.href)` to get the hrefs, and I know it's __always perfectly safe__. Now, that example is trivial; if I only want links where the questions are tagged with C#, I get a much harder problem with regex, but with query selectors it's still mostly trivial.

So, it's not that it's particularly hard to use regular expressions to solve it, it's just a lot harder than the alternative which is super simple and obviously correct.

reply

joosters 1 day ago

Your method relies on the class names not being renamed (e.g. I see "question-hyperlink" on /questions, not "question-link"). I'd skip the class entirely and match on the URL, since I doubt stackoverflow want their URLs to break:

  /<a[^>]*href="(\/questions\/\d+[^"]+)"/i

But... we can go back and forwards posting examples and find fault in any regex that I post or any selector that you post. It's missing the point. Both methods are at the mercy of web page redesigns. Both methods can be made more robust against certain changes, but cannot survive other changes. You are trying to say that regexes won't work. I am saying that both methods work.

reply

ifdefdebug 1 day ago

Look, you have essentially two options here: 1) add a full-featured DOM parser to your program - do that if you have to understand the DOM; 2) write a regex - do that if you don't care about the DOM, if you know how the server formats the data you need, and if the server formats that data in a reproducible way.

> (...) and that they don't change that in a design or in other pages (...)

Well if they change the design of the pages then you will have to rewrite your regex accordingly in order to find the data you need. But if that happens, odds are high that your program, which uses a full featured DOM parser, will have to be rewritten as well in order to handle the modified output of the DOM parser...

reply

kragen 23 hours ago

in fact, in practice, i find that scrapers using regexp matchers are more robust against the kinds of template changes that people make than scrapers based on the html tree structure.

reply

---

POSIX 2008 system interfaces:

http://pubs.opengroup.org/onlinepubs/9699919799/idx/functions.html

and commandline utilities:

http://pubs.opengroup.org/onlinepubs/9699919799/idx/utilities.html

---

Python standard library topics, from "What Is Code" by Paul Ford; i couldn't find this figure in the official online edition, but it is in the print edition and in various places online:

"

other libraries: "

---

from http://bytes.com/topic/python/answers/470920-recursive-map-nested-list

  def recur_map2(fun, data):
      if hasattr(data, "__iter__"):
          return [recur_map2(fun, elem) for elem in data]
      else:
          return fun(data)

note: indexing a string gives another string (type('a'[0]) == type('a') == str), and in Python 3 (but not Python 2) str defines __iter__, so if you try to recurse on a tree represented as nested lists with string leaves by checking for iterable-ness and iterating if so, you'll go into an infinite loop: a 1-character string is iterable, and its first element is another 1-character string... (in Python 2 the hasattr check happens to work, because str there lacks __iter__ even though strings are iterable via __getitem__)

as a solution you could do

  def recur_nested_list_map2(fun, data):
      if type(data) == list:
          return [recur_nested_list_map2(fun, elem) for elem in data]
      else:
          return fun(data)

but that only works for nested lists...

is there a good way to determine if one should recurse on an arbitrary nested data structure in Python?

a possibly hacky way that at least saves you from the above problem (sorta like set theory's foundation axiom..):

  def recur_map2(fun, data):
      # caveat: the and/or trick misfires when the recursive result is
      # falsy (e.g. an empty sublist, or fun returning 0)
      if hasattr(data, "__iter__"):
          return [(elem != data and recur_map2(fun, elem)) or fun(elem) for elem in data]
      else:
          return fun(data)
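a less hacky alternative (an editorial sketch, not from the thread): explicitly treat strings and bytes, the usual pathological iterables, as leaves, and test iterability with collections.abc:

  from collections.abc import Iterable

  def recur_map3(fun, data):
      # strings/bytes are iterable but must be leaves, otherwise a
      # 1-character string recurses into itself forever
      # (note: dicts and sets are Iterable too, and would be iterated)
      if isinstance(data, Iterable) and not isinstance(data, (str, bytes)):
          return [recur_map3(fun, elem) for elem in data]
      return fun(data)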

---

metatrader has a nice function 'Alert' where you give it a bunch of stringifyables and it concatenates them, then outputs them on an 'alert' console (which may cause an alert dialog to pop up (if the alert is new?))

eg

Alert("Two plus two is ", 3);
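a rough Python sketch of the same idea (the function name and console behavior are made up; MQL4's Alert also pops a dialog, which isn't modeled here):

  def alert(*args):
      # stringify and concatenate all arguments, then emit them on the
      # 'alert' console (here just stdout with a marker)
      print("[ALERT] " + "".join(str(a) for a in args))

  alert("Two plus two is ", 2 + 2)   # [ALERT] Two plus two is 4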

---

---

http://wayback.archive.org/web/20151006215628/http://dtab.io/

https://news.ycombinator.com/item?id=10319289

---

in pandas:

'appending' (unioning along the 'index' direction) is cumbersome:

  df.append(DataFrame(Series([1,2,3])))

it's hard to remember .ix, .iloc (and Series doesn't have .iloc! but .ix works with integers in Series even if you have labels; but if your labels ARE integers, then the labels take precedence!), .index, .columns, etc. (though maybe that's necessary). it's hard to remember that df[x] indexes columns, but df.ix[x] indexes the index, and df.ix[x, y] indexes first the index, then the columns (you can use : inside .ix, too)

(re: "Series doesnt have .iloc! but .ix works with integers in Series, even if you have labels; but if your labels ARE integers, then the labels take precedence!":

  Series([1,2], index=['a','b']).iloc[0]
  AttributeError

  Series([1,2], index=['a','b']).ix[0]
  Out[86]: 1

  Series([1,2], index=[1,0]).ix[0]
  Out[88]: 2

  Series([1,2], index=[1,0])[Series([1,2], index=[1,0]).index[0]]
  Out[92]: 1

(note that we can always explicitly get the index, as in the last of these examples)

)

another troubling inconsistency; sometimes df[x] means that 'x' represents rows, sometimes 'x' represents columns, eg:

  df = DataFrame([[1, 2, 3], [nan, inf, -inf]])

  In [3]: df
  Out[3]:
      0         1         2
  0   1  2.000000  3.000000
  1 NaN       inf      -inf

  [2 rows x 3 columns]

  In [4]: df[0:1]
  Out[4]:
     0  1  2
  0  1  2  3

  [1 rows x 3 columns]

  In [5]: df[1]
  Out[5]:
  0    2.000000
  1         inf
  Name: 1, dtype: float64

it's weird that pointwise /0 gives nan, not inf (IEEE 754 gives ±inf for nonzero/0 and nan only for 0/0, so yes, that seems nonstandard)

it's weird that the index is not symmetric with the columns (eg it expects the index to stay pretty constant; you have to use 'append' to add index entries, but the columns can be mutated with 'df[col] = x')

---

need a function to give you a magic square, like MATLAB's magic(), or to give you the matrix

[1 2 3 ; 4 5 6 ; 7 8 9]

for testing in the REPL
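a sketch of such a helper in numpy (demo_matrix is a made-up name; MATLAB's magic() builds true magic squares, which this does not):

  import numpy as np

  def demo_matrix(n=3):
      # an n x n matrix of 1..n^2, i.e. the [1 2 3; 4 5 6; 7 8 9]
      # above for n=3 -- handy for poking at APIs in the REPL
      return np.arange(1, n * n + 1).reshape(n, n)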

---

say you have a DataFrame, eg

  DataFrame(index=['a', 'b', 'c'], columns=[0, 1, 3], data=[[1,-2,3], [4,5,6], [7,8,9]])
  Out[60]:
     0  1  3
  a  1 -2  3
  b  4  5  6
  c  7  8  9

you want to first take the sign of everything; this is simple enough, as applying a function to a dataframe tends to apply it pointwise, which is good:

  sign(DataFrame(index=['a', 'b', 'c'], columns=[0, 1, 3], data=[[1,-2,3], [4,5,6], [7,8,9]]))
  Out[61]:
     0  1  3
  a  1 -1  1
  b  1  1  1
  c  1  1  1

but then you want to test, for each row, whether the signs are all the same, ie:

  df = sign(DataFrame(index=['a', 'b', 'c'], columns=[0, 1, 3], data=[[1,-2,3], [4,5,6], [7,8,9]]))
  result = Series(index=df.index)
  for idx in df.index:
      result[idx] = all(df.ix[idx] == df.ix[idx][df.columns[0]])

  result
  Out[80]:
  a    0
  b    1
  c    1

is there a vectorized (ie non-explicit-looping) way to do this? You basically want to map a fn across each index. .apply does this; .apply takes an 'axis' argument to tell it which dimension to map across:

df.apply(lambda i: all(i == i[i.index[0]]), axis=1)

so note the difference between 'apply', which maps across slices, and 'applymap' which maps pointwise
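for contrast, a pointwise applymap example on the same df (applymap is the older spelling; newer pandas renames it DataFrame.map):

  df.applymap(lambda x: x + 1)   # adds 1 to every cell; shape unchanged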

---

http://pandas.pydata.org/pandas-docs/stable/basics.html

---

rolling (moving window) mean etc, expanding window mean etc, exponentially-weighted mean etc, and all of those for sum, weighted mean (eg another input series for the weights), etc
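in pandas terms these all exist on Series/DataFrame (modern spelling; older pandas used pd.rolling_mean and friends). a quick sketch:

  import numpy as np
  import pandas as pd

  s = pd.Series(np.random.randn(100))
  s.rolling(window=10).mean()     # moving-window mean
  s.expanding().sum()             # expanding-window sum
  s.ewm(span=10).mean()           # exponentially-weighted mean

  # weighted rolling mean, with a second series supplying the weights
  w = pd.Series(np.random.rand(100))
  (s * w).rolling(10).sum() / w.rolling(10).sum()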

---

ggplot -- https://news.ycombinator.com/item?id=10386360

nicolapede 1 day ago

> And it's really hard to beat ggplot.

To be honest, matplotlib seems a good contender to me (http://matplotlib.org/).

Also, what's wrong with comparing R to Pandas/Numpy ? They can only be used from within Python, right?

Edit: just realised from another comment that Pandas/Numpy can be accessed from R, too.

reply

6502nerdface 1 day ago

> > And it's really hard to beat ggplot.

> To be honest, matplotlib seems a good contender to me (http://matplotlib.org/).

They're quite different, though, and I can see why many prefer ggplot. It's a declarative, domain-specific language that implements a Tufte-inspired "grammar of graphics" (hence the gg- in the name; see section 1.3 of [1], and [2,3]) for very fast and convenient interactive plotting, whereas matplotlib is just a clone of MATLAB's procedural plotting API.

[1] http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis...

[2] http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput...

[3] http://vita.had.co.nz/papers/layered-grammar.html

reply

sweezyjeezy 23 hours ago

"matplotlib seems a good contender to me"

I've waxed lyrical about Python all over this thread, but here you have to give the medal to R. Matplotlib is one of my least favourite libraries to use; I've been using it for almost 2 years, and I still spend half my time buried in the documentation trying to figure out how I'm supposed to move the legend slightly to the right or whatever.

ggplot probably has slightly less flexibility overall (mpl is monolithic), but for the easy things you need 99% of the time, ggplot is king.

reply

nrpprn 21 hours ago

There is a ggplot clone in Python. Also, bokeh is starting to develop a grammar-of-graphics interface. Then there are seaborn and mbplot. Lots of stuff besides matplotlib.

reply

dagw 1 day ago

matplotlib seems a good contender to me

On paper perhaps, less so in application. Sure you can probably make matplotlib do everything ggplot does with enough work, but working with ggplot is just so much quicker easier and more fun.

And I say that as someone who does all his data analysis in Python.

reply

has2k1 20 hours ago

I have rewritten the python ggplot to put it on par with ggplot2.

You can try out my dev version [1] (rewrite branch). It will be nearly API compatible.

[1] https://github.com/has2k1/ggplot

reply

 jbssm 17 hours ago

Well, for scientists wanting to publish, ggplot is quite impractical. Most of the time we have to publish in B&W magazines, and ggplot simply lacks the capabilities to do so properly (for instance, B&W fill patterns).

Matplotlib with some good settings ends up providing much better results and nicer-looking plots for B&W, unlike what people normally think.

reply

pvaldes 16 hours ago

... and I remembered why I don't use ggplot at all, thanks. After lots and lots of plots done with R, I was starting to feel a bit weird reading the comments.

reply

---

"FP infrastructure for R in the style of underscore.js"

https://github.com/hadley/purrr


hadley 77 days ago

parent on: TauCharts: Data-focused JavaScript charting librar...

Given that you called them facets, I'm going to assume you're familiar with ggplot2. In retrospect, I think adding non-Cartesian coordinate systems was a mistake. They are a huge amount of work and only make a handful of charts easier (most of which, like the radar chart, are not that useful).


http://vita.had.co.nz/papers/tidy-data.html

http://blog.rstudio.org/2014/07/22/introducing-tidyr/

https://news.ycombinator.com/item?id=10388331

"During this first job, Wickham began to reflect on better ways to store and manipulate data. “I’ve always been very certain that I could come up with a good way of doing things,” he explained, “and that that way would actually help people.” Although he didn’t know it at the time, he believes it was then that he “internalized” the concept of Third Normal Form, a database design concept that would become central to his future work. Third Normal Form is essentially a manner of structuring data in a way that reduces duplication of data and ensures consistency. Wickham refers to such data as “tidy,” and his tools promote and rely on it."

https://en.wikipedia.org/wiki/Third_normal_form

http://datascience.la/a-conversation-with-hadley-wickham-the-user-2014-interview/ also mentions lubridate

---

stared 203 days ago

Speaking as a Pythonista (but one who is in love with ggplot2 and dplyr): the wonderful thing about IPython Notebook is that it's possible to inline R code with no more fuss than adding "%%R" in a cell:

http://nbviewer.ipython.org/github/davidrpugh/cookbook-code/...

BTW: For pandas-dplyr dictionary: http://nbviewer.ipython.org/gist/TomAugspurger/6e052140eaa5f...

---

http://stackoverflow.com/questions/28252585/functional-pipes-in-python-like-from-dplyr/
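the gist of those answers, sketched (editorial example; the Pipe wrapper is made up, though libraries like 'pipe' do essentially this):

  class Pipe:
      # dplyr-style piping via operator overloading: data | f | g
      def __init__(self, fn):
          self.fn = fn
      def __ror__(self, other):
          return self.fn(other)

  double_all = Pipe(lambda xs: [2 * x for x in xs])
  total = Pipe(sum)
  [1, 2, 3] | double_all | total   # => 12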

---

 mapcar 325 days ago

In the R Help Desk 2004 (http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf), Gabor Grothendieck recommends chron over POSIXct classes on account of the time zone conversions which occur when the tz attribute of the latter object is not "GMT". Will this not be a problem with lubridate? Thanks in advance.

hadley 325 days ago

I've never found that to be a problem in practice. Do you have an example where it's bitten you in practice?

(Also you should use UTC and not GMT)

mapcar 323 days ago

Hi Hadley, yes for instance

  > as.chron("1970-01-01") + unclass(as.chron("2001-04-01"))
  [1] 04/01/01

  > as.POSIXct("1970-01-01","EST") + unclass(as.POSIXct("2014-06-01","EST"))
  [1] "2014-06-01 05:00:00 EST"

If there is any conversion necessary it is difficult to get back the original intended time.

hadley 323 days ago

What does adding two dates together mean?

mapcar 323 days ago

Isn't this the conventional way of converting variables which have been coerced to their numeric representations back to time/date classes?

---

https://indradhanush.github.io/2015/03/23/dealing-with-datetime-objects-in-python/

---

this is (too?) complicated but we need at least some of it:

http://pandas.pydata.org/pandas-docs/stable/merging.html
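the two operations from that page we'd minimally need, sketched:

  import pandas as pd

  left = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
  right = pd.DataFrame({'key': ['b', 'c'], 'y': [3, 4]})

  pd.merge(left, right, on='key', how='inner')   # SQL-style join on 'key'
  pd.concat([left, right])                       # stack along the index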

---

seaborn

pandas.melt
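pandas.melt turns wide data into long/tidy form, eg:

  import pandas as pd

  wide = pd.DataFrame({'id': [1, 2], 'jan': [10, 20], 'feb': [30, 40]})
  pd.melt(wide, id_vars='id', var_name='month', value_name='value')
  # one row per (id, month) pair instead of one column per month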

---

"the regex system in Python...The best feature of the regex system in Python is without a doubt that it's making a clear distinction between matching and searching...when you perform a match you can provide an index to offset the matching but the matching itself will be anchored to that position...

  >>> pattern = re.compile('bar')
  >>> string = 'foobar'
  >>> pattern.match(string) is None
  True
  >>> pattern.match(string, 3)
  <_sre.SRE_Match object at 0x103c9a510>
"

...

In addition to the matching Python can search which means it will skip ahead until it finds a match:

  >>> pattern = re.compile('bar')
  >>> pattern.search('foobar')
  <_sre.SRE_Match object at 0x103c9a578>
  >>> _.start()
  3
"

" Enter The Scanner

This is where things get interesting. For the last 15 years or so, there has been a completely undocumented feature in the regular expression engine: the scanner. The scanner is a property of the underlying SRE pattern object where the engine keeps matching after it found a match for the next one. There even exists an re.Scanner class (also undocumented) which is built on top of the SRE pattern scanner, which gives this a slightly higher-level interface.

The scanner as it exists in the re module is unfortunately not very useful for making the 'not matching' part faster, but looking at its source code reveals how it's implemented: on top of the SRE primitives.

The way it works is that it accepts a list of regular expression and callback tuples. For each match it invokes the callback with the match object and then builds a result list out of it. Looking at how it's implemented, it manually creates SRE pattern and subpattern objects internally. (Basically it builds a larger regular expression without having to parse it.) Armed with this knowledge we can extend this:

...

So how do we use this? Like this:

  scanner = Scanner([
      ('whitespace', r'\s+'),
      ('plus', r'\+'),
      ('minus', r'\-'),
      ('mult', r'\*'),
      ('div', r'/'),
      ('num', r'\d+'),
      ('paren_open', r'\('),
      ('paren_close', r'\)'),
  ])

  for token, match in scanner.scan('(1 + 2) * 3'):
      print(token, match.group())

In this form it will raise an EOFError in case it cannot lex something, but if you pass skip=True then it skips over unlexable parts which is perfect for building things like wiki syntax lexers.

Scanning with Holes

When we skip, we can use match.start() and match.end() to figure out which parts we skipped over. So here is the first example adjusted to do exactly that:

  scanner = Scanner([
      ('bold', r'\*\*'),
      ('link', r'\[\[(.*?)\]\]'),
  ])

  def tokenize(string):
      pos = 0
      # the original post had `self.scan` here; `scanner.scan` is what's
      # meant (see the bug report quoted near the end of this thread)
      for rule, match in scanner.scan(string, skip=True):
          hole = string[pos:match.start()]
          if hole:
              yield 'text', hole
          yield rule, match.group()
          pos = match.end()
      hole = string[pos:]
      if hole:
          yield 'text', hole

...

Fixing up Groups

One annoying thing is that our group indexes are not local to our own regular expression but to the combined one. This means if you have a rule like (a|b) and you want to access that group by index, it will be wrong. This would require a bit of extra engineering: a class that wraps the SRE match object with a custom one that adjusts the indexes and group names. If you are curious about that, I made a more complex version of the above solution that implements a proper match wrapper in a github repository, together with some samples of what you can do with it.

https://github.com/mitsuhiko/python-regex-scanner "

Grue3 5 hours ago

Python's re has nothing on CL-PPCRE [1] though. The ability to build up a "regular expression" from S-expressions is just too useful.

[1] http://weitz.de/cl-ppcre/

reply

willvarfar 6 hours ago

Another very-cool undocumented feature on another regex engine is re2's Set. It compiles a collection of regex to a single regex, and so allows you to very efficiently match a string against an array of patterns.

reply

andreasvc 5 hours ago

Which re2? I maintain a fork of re2 but it's not in there [1].

If you mention re2, the main cool feature about it is that it is efficient, matching in linear time using a DFA. Unfortunately unicode strings need to be encoded to UTF-8, but if you can design your application to work with UTF-8 bytestrings you can avoid that cost.

[1] http://github.com/andreasvc/pyre2

reply

 junke 2 hours ago

For example, here below is a CL version of "Enter The Scanner". Instead of a closure, we could instantiate a class, but this is sufficient:

   (defun create-tokenizer (rules)
     (loop for (token rule) in rules
           for regex = (if (stringp rule) `(:regex ,rule) rule)
           collect `(:register ,regex) into alternatives
           collect token into tokens
           finally
              (let ((scanner (ppcre:create-scanner `(:alternation ,@alternatives)))
                    (tokens (coerce tokens 'vector)))
                (return
                  (lambda (string &key (start 0))
                    ;; generator-like
                    (lambda ()
                      (multiple-value-bind (match-start match-end registers)
                          (ppcre:scan scanner string :start start)
                        (cond
                          (match-start
                           (setf start match-end)
                           (values
                            (aref tokens (position-if-not #'null registers))
                            match-start
                            match-end))
                          (t (values))))))))))

The above compiles the regex and returns a closure which accepts a string. That closure returns another closure which generates tokens on-demand for the given string, along with the start and end positions of the token inside the string. Here is a test:

   (loop with tokenizer = (funcall (create-tokenizer '((ws "\\s+")
                                                       (plus #\+)
                                                       (minus #\-)
                                                       (mult #\*)
                                                       (div #\/)
                                                       (num :digit-class)
                                                       (par-open #\()
                                                       (par-close #\))))
                                   "(1 + 2) * 3")
         for token = (multiple-value-list (funcall tokenizer))
         while token
         collect token)

   => ((par-open 0 1) (num 1 2) (ws 2 3) (plus 3 4) (ws 4 5) (num 5 6)
       (par-close 6 7) (ws 7 8) (mult 8 9) (ws 9 10) (num 10 11))

larkinrichards 1 hour ago

I believe there is a small bug in the final example, in the tokenize definition it references 'self' where I believe it should reference 'scanner'.

reply

There is nothing unique about Python's implementation of regular expressions. Ruby has an equally (if not more) powerful `StringScanner`. This is a nice article but it could have done without the "it's one of the best of all dynamic languages I would argue" tone.

reply

the_mitsuhiko 5 hours ago

> Ruby has an equally (if not more) powerful `StringScanner`

The string scanner is just step-by-step matching of individual expressions. Because they are not folded together, the "skip non-matching" part has to be done in Ruby, which is precisely what the Python scanner avoids.

reply

-- https://news.ycombinator.com/item?id=10600520

---

so what should be async in the std libs? it's tempting to say 'everything' but this would impose a high computational cost which would discourage ppl from breaking functions into smaller subroutines. so mb:

see also https://en.wikipedia.org/wiki/Asynchronous_I/O

later: http://v8project.blogspot.com/2015/12/theres-mathrandom-and-then-theres.html

summary: based on the previous article, the V8 JS implementation switched Math.random to xorshift128+

---

packages people 'npm install' a lot

https://www.npmjs.com/#explicit

:

---

http://hackersdelight.org/basics2.pdf provides some other basic functions that we might want to provide in a library or in core (abs, etc)
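for a taste, one of the classics from that chapter sketched in Python (abs32 is my name; the trick assumes 32-bit two's complement, which unbounded Python ints must emulate):

  def abs32(x):
      # branch-free absolute value, Hacker's Delight style;
      # assumes -2**31 <= x < 2**31 so that x >> 31 is 0 or -1
      mask = x >> 31
      return (x ^ mask) - mask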

---

Animats 1 hour ago

The initial bug in Ruby/Rails is striking in its stupidity.[1] You can send something to Ruby/Rails in a session cookie which, when unmarshalled, stores into any named global variable in the namespace of the responding program. It's not a buffer overflow or a bug like that. It's deliberately designed to work that way. It's like doing "eval" on untrusted input. This was on YC years ago.[2] Why was anything so idiotic ever put in Ruby at all?

Something like this makes you suspect a deliberate backdoor. Can the person who put this into Ruby/Rails be identified?

[1] http://robertheaton.com/2013/07/22/how-to-hack-a-rails-app-u... [2] https://news.ycombinator.com/item?id=6110386

reply

danso 1 hour ago

I think you're overextrapolating here, though I admit my knowledge on this isn't totally up to date.

As I understand it, Ruby's Marshal function, which takes text data and deserializes it, is not safe by default. So, is that a flaw of Ruby? I guess...except that this kind of serialization seems to be a standard feature in languages (well, Ruby and Python, the two things I currently use):

https://docs.python.org/3/library/pickle.html

> Warning The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

So the true bug seems to be that in Rails ActiveSupport (in a deprecated class, which uses some of Ruby's fun meta magic to deal with missing methods -- basically, the classic obfuscation of functionality as a tradeoff for some sugary magic, all in a deprecated function that likely no one revisits), you can trigger a set of functions and routines in which the final decoding step, for whatever reason, ends up invoking Ruby's Marshal (via Rack: http://www.rubydoc.info/github/rack/rack/Rack/Session/Cookie...)
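(editorial sketch, not part of the quoted comment: to make the pickle warning concrete, a minimal demonstration of why deserializing untrusted bytes is arbitrary code execution)

  import pickle

  class Evil:
      def __reduce__(self):
          # pickle calls this to decide how to reconstruct the object;
          # returning (os.system, args) makes loading run a shell command
          import os
          return (os.system, ('echo pwned',))

  payload = pickle.dumps(Evil())
  pickle.loads(payload)   # runs `echo pwned` -- never unpickle untrusted data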

reply

bri3d 42 minutes ago

Marshalling bugs in other languages and frameworks:

It's hard to attribute malice to an obvious mistake that everyone makes.

reply

https://news.ycombinator.com/item?id=10754964

---

" One of the big problems with Python and its developers is that the core developers take the position that the quality of third-party packages is someone else's problem. Python doesn't even have a third-party package repository - PyPI is a link farm of links to packages elsewhere. You can't file a bug report or submit a patch through it. Perl's CPAN is a repository with quality control, bug reporting, and Q/A. Go has good libraries for most server-side tasks, mostly written at Google or used at Google, so you know they've been exercised on lots of data. "