proj-oot-ootSpreadsheetNotes2

some spreadsheet pros and cons:

pros: " Spreadsheets have a largely nonsymbolic representation of programs. The only symbolic representation used is the simple expression language, which people can easily learn as consists only of constants and function application. Of course, every spreadsheet program has some ad hoc language for defining new formula, but that’s outside the usual spreadsheet model and gets used very rarely. We can think of the spreadsheet as a limited kind of application-building toolkit. It supports a form of interactivity, in that we may select a location for editing, and updates occur in response to these edits (conceptually no different than any other application, be it Photoshop or Twitter). There’s no separate step of ‘running’ the program and we’re encouraged to consider the UI as just a way of viewing a program, rather than a separate artifact produced by a program. See viewing a program as a UI. The user can easily construct functions that manipulate sequences in an inductive fashion, considering only concrete inputs. For instance, the cell A2 takes the value = A1 * A3 + B$7, and the user drags to extend the definition to build a whole sequence. Of course, a programmer has no trouble generalizing this sort of thing and just writing a step function that uses symbolic inputs, but it requires less learning to do things the spreadsheet way. Lastly, and this might seem dumb, but a 2D grid as a template is actually a surprisingly nice starting point for a lot of computing tasks and layouts. In terms of approachability, it is much easier to modify an existing template than it is to start with a blank canvas and create entirely new content. A blank canvas requires that the user have a vision of what they want to create, which they execute on by understanding the means of abstraction and means of combination. In comparison, the ‘modify a template’ mode of spreadsheets lets the user get away with just reacting to what’s on the screen. Bret Victor talks about some of this in Learnable Programming. I won’t argue here that one modality is inherently better than the other, but part of the appeal of spreadsheets is they do support the ‘create by reacting’ modality better than traditional programming. "

cons: " They are entirely first-order. There is basically no support for abstraction (though all spreadsheet programs have some ad hoc language for defining new formula), and certainly no ability to define higher-order functions. Without the ability to define new abstractions, it’s impossible to manage complexity. The way people reuse spreadsheets is largely by copying, pasting, and modifying. Naturally, spreadsheet apps have some grown ad hoc ways of linking between spreadsheets, but this is all a very very poor substitute for a real programming language. The forms of interactivity spreadsheets do support is too limited. Why can’t I render a numeric cell as a slider that I can move back and forth to see things update instantly, for instance? And even when some form of interactivity is supported, it’s usually in some ad hoc, unguessable way, rather than something unified and obvious. The 2D grid which seems like a helpful starting point actually becomes quite annoying for more complex spreadsheets. I’m just declaring a collection of variables, why the heck do I have to worry about where they are positioned? Why the heck am I refering to values by position rather than by some meaningful name?? Also, the use of a grid often leads to lots of futzing around to deal with unwanted interaction between row heights and and column widths for logically unrelated parts of the layout. " -- http://pchiusano.github.io/2015-03-17/unison-update5.html

the cons each suggest an obvious way to improve spreadsheets (I would expect most spreadsheet apps already do these, though?):

as pchiusano notes elsewhere, having typed cells is important too. And as others have suggested, so is having a way to do unit tests.

also,

" David Barbour • 2 months ago

A useful variation on spreadsheets would be to make every cell a lens on some external value. A conventional spreadsheet is then modeled as a lens over unit. But if a spreadsheet becomes an editable view of external data, they become far more composable, a basis for shared UI (via pubsub on the external data), etc..

" -- http://pchiusano.github.io/2015-03-17/unison-update5.html#comment-1922324205

--

in response to the Unison language/IDE/platform: " Ok, so I spent about an hour casually reading the posts, watching the demo videos and glancing at the code. I agree with your premise and I like a lot of the implementation so far. It's great to see more innovation and research in this area alongside Light Table, Eve (which seems to be going in a similar direction with spreadsheets), NoFlo and others.

...

Adding this as an optional layer to an existing editor might help work out some of the ergonomics. Performance issues aside, building on top of Atom might be a good choice since it's mostly web based like the current UI in the demos. Similarly, a custom kernel or cell type in ipython might get the ideas into daily use. " -- https://news.ycombinator.com/item?id=9514335

--

https://jupyter.org/

--

in reply to Unison:

jakub_h 22 hours ago

The things you're describing are EERILY similar to some stuff I've been conceptualizing and thinking about recently, as inspired by the VPRI people's work/OMeta/Shen. I'd just remark that...

"The program is a UI, and UI interaction is programming. Unison panels take the idea of spreadsheets, which blur the distinction between UI interaction and programming, to its logical conclusion."

...this blurring between programming and interaction (which I've also thought of) probably offers a potentially viable path to much more powerful voice control of devices, since programming APIs serving as user interfaces can now also (besides graphical UI) implicitly define strict grammars for structured voice input. That's the one thing I thought of that I didn't see mentioned there.

reply

--

white-flame 21 hours ago

One term I didn't see on the page, but you should look into examples of if you already haven't, is "structure editor". That's the notion of a source code editor which doesn't allow free-form text editing, but creation of well-formed language semantics.

reply

melloclello 20 hours ago

Also, "tree editor".

reply

--

" The program is a UI, and UI interaction is programming. Unison panels take the idea of spreadsheets, which blur the distinction between UI interaction and programming, to its logical conclusion. We don’t write a program to produce a UI as output, we write a program which is viewed as a UI. Interaction with the UI is, quite literally, programming, though the user doesn’t have to be aware of it. This model means we can solve problems of validation, autocomplete, and exquisitely context sensitive help once, in the Unison editor, and solve them in a principled way using the lens of programming languages and type systems, rather than recreating this functionality in ad hoc ways for each and every UI we write. " -- http://unisonweb.org/2015-05-07/about.html

---

https://a16z.com/2015/07/29/blockspring/

https://news.ycombinator.com/item?id=9969890

---

rgoddard 4 days ago

I am currently working as an Actuarial Analyst, but I have also worked several years as a programmer.

As an analyst the tools I use are excel, access, SAS Enterprise guide and Oracle SQL developer. One of the big problems I face is that we have no good way to abstract away a process and really make it reusable.

My general work flow is using SAS to pull data from multiple sources, combine and run the data through some series of logic/calculations. Then take the resulting data, copy to excel for some additional analysis or report. This might be for a monthly/quarterly report or an analysis that needs to be updated with the additional runout of data.

But these steps are all tightly coupled together. If I want to rerun the same logic on a different data set, or an updated data set I will copy and paste all of the files, update the queries. I have no way to bundle them together so that I can easily reuse with different data sources, or refreshed data.

Really what I want is some way to encapsulate different sets of data transformations/calculations into functions to reuse them in different contexts and among different people.

reply

dgudkov 3 days ago

Look at my EasyMorph (http://easymorph.com). It's a visual replacement for scripted data transformations. People use it to replace SAS and Visual Basic scripting. It also allows creating reusable modules. Contact me at <hnusername>@easymorph.com if it looks interesting to you.

reply

shaunxcode 3 days ago

Hey, I clicked through, read the tutorial, got excited about your examples.. tried to download and found out it was windows only! I would have totally evaluated it further if there were an os x/linux option.

reply

dgudkov 3 days ago

Thanks for checking it out! As we're targeting Tableau users, we will eventually release an OS X version.

reply

nycdatasci 2 days ago

Speaking of Tableau (which was founded on the concept of VizQL), how is this different? Doesn't tableau basically enable knowledge workers to create data-centric web applications?

reply

 mistermann 3 days ago

Does this have an API that can be called from .Net?

I'm really liking some of the things Microsoft is doing with Power Query, but I don't like how it is (afaik) only callable from Excel or PowerBI online. I'd like similar capability, but more open, and could be called via scripting, from SQLCLR, etc.

Another big hitch: Microsoft has not published proper API's for manipulating PowerPivot models in Excel, and I don't think they intend to - I've heard one 3rd party has reverse engineered the API's (you can decompile the .Net binaries, but I haven't had the time to look at it yet).

reply

tycho01 2 days ago

Would you perhaps have any info/reference about where I could learn more about this reverse engineered PowerPivot API? This sounds pretty exciting. :D

To expand on my question, I'd heard (from Rob 'PowerPivotPro' Collie, former product manager on the project IIRC) that the core had been written in 'unmanaged code' (probably C++?), so I believe reverse engineering it would be a significantly larger effort than just opening some of its DLLs in say DotPeek, at least from as far as I've been able to tell.

reply

mistermann 2 days ago

The PowerPivot engine itself I imagine is in unmanaged code, but the code that just writes datasets and whatnot to the model is (from what I've heard) managed code, and indeed you can decompile the libraries and see all sorts of things, I've only looked around for about 10 minutes or so. And I had just read on some obscure thread that someone had successfully found the undocumented API call to write to the model, which is what I'm wanting to do - but I don't even know what the product name is that supposedly does this, sorry.

reply

tycho01 2 days ago

Right, makes sense, I'll try and check out what's available then. :)

By writing to the model, you mean programmatically adding new measures or the like?

My interest is in programmatically querying models using DAX, though to this end I'd also like to look in the direction of Microsoft's DirectQuery mode in SQL Server which supposedly did DAX-to-SQL conversion.

If one could use such a conversion plus MDX to start querying models on an Apache Spark cluster through pivot table/chart interfaces...

reply

mistermann 1 day ago

Not even measures, I'm just wanting to be able to create tables, define relations, etc, with the accompanying sql or m script. I'm hopeful they'll let us do that some day, but I still don't quite believe they've changed their stripes entirely.

reply

dgudkov 3 days ago

Currently EasyMorph supports integration through command line only. We do not plan having API for the desktop client, but we will definitely make EasyMorph Server API if we reach that point.

reply

icebraining 3 days ago

We use Pentaho Kettle for those kinds of transformations. It's FOSS, and connects to a whole bunch of programs and formats.

It's a graphical tool - you drag-n-drop modules, then configure and connect them, though it can also run scripts (it has JavaScript, Java, Bash and Ruby support, besides SQL, of course) - but after configuring the transformation/job, you can also run it on the terminal, which is useful for periodically re-running it.

http://community.pentaho.com/projects/data-integration/

reply

mindcrime 3 days ago

I've been doing a lot of work with Kettle as well, and it is a handy tool (albeit with a few warts).

What I think would be handy for use in an organizational setting, where "business users" might want to use some of the transforms, would be a way to publish transforms somewhere, making them discoverable and accessible to others. I don't want to make it sound like I'm talking about UDDI or anything (although, thinking about it, maybe you could use that), but just an easy way for a Joe Business User to get a list of available transforms, some explanation of what they do, what input they take, what they output, etc. And maybe a way to make changes to the "small stuff" (like the input and output path, for example) without having to load up Spoon and edit the ktr that way. Since transforms can be parameterized, that should be doable...

You could also picture combining this with something like a Yahoo Pipes like web interface, to let you define your own chains of transforms and operations as well. And hell, a web-based interface for editing ktr files would be a pretty interesting thing as well, if somebody would build it.

reply

mhw 3 days ago

Have a look at Alteryx (http://www.alteryx.com/) - it's pretty close to what you're describing, I think.

reply

myoffe 4 days ago

I haven't used it extensively, but SQL Server Integration Services (SSIS) looks like it does a lot of the things you're talking about.

reply

wesd 3 days ago

It does. There are other ETL tools as well.

https://en.wikipedia.org/wiki/Extract,_transform,_load#Tools

reply

knn 3 days ago

The databricks platform should solve exactly your problem - reusable data pipelining/transformation. I saw a demo of it last night and it was extremely slick. Their product is amazing, it makes data pipelining incredibly easy compared to setting up a hadoop cluster and running hive/etc. (I don't work for them - but if any databricks employee sees this, please hire me!) It runs on a spark cluster over AWS, which is much more modern and powerful than SAS/excel/sql. Since you know how to program already, it shouldn't be too hard to pick up spark (even has python bindings)

reply

mcarroll_ 3 days ago

@rgoddard - May be a bit overkill but check out Immuta (www.immuta.com). Its a data platform, built for data scientists, that enables you to query across many disparate sets of data using familiar patterns such as SQL, file system, etc. Our SQL interface allows you to hook to Excel, Tableau, Pentaho...so you could write your abstracted logic and connect to many data sources or mashed up analytic results. contact me at matt@immuta.com if you're interested after reading through the site.

reply

---

http://dtab.io/

---

https://news.ycombinator.com/item?id=10346811

---

https://fieldbook.com/developers

---

https://airtable.com/

---

guesstimate

https://news.ycombinator.com/item?id=10816563

---

yomritoyj 6 hours ago [-]

I love how notebooks allow the mixing of code and output and support incremental development by letting you choose which cells to execute. But I find the semantics horrible. Each time you execute a cell you do so in an environment that depends on your entire history and cannot be figured out by simply reading the notebook. I wish for an environment which would have the same semantics as a script but which would snapshot the environment at the entry to each cell so that when a cell is modified execution does not have to resume at the beginning. Even better if downstream data dependencies are tracked so that after modifying and reexecuting a cell we know which downstream results have become stale. Does such an environment exist?

reply
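a toy Python sketch (mine) of the semantics being asked for above: snapshot the environment at entry to each cell, mark everything downstream stale on an edit, and rerun only from the edited cell; real dependency tracking would be finer-grained than "everything after i":

import copy

class ToyNotebook:
    """Toy model: each cell runs against a snapshot of the environment taken
    at its entry; editing cell i marks cells i.. stale and reruns only from i."""

    def __init__(self, cells):
        self.cells = list(cells)              # cell sources, in order
        self.entry_env = [None] * len(cells)  # snapshot of env at entry to each cell
        self.stale = [True] * len(cells)
        self.final_env = {}

    @staticmethod
    def _snapshot(env):
        # copy user state only; exec() injects __builtins__, which we skip
        return {k: copy.deepcopy(v) for k, v in env.items() if k != "__builtins__"}

    def run_from(self, i):
        env = dict(self.entry_env[i]) if self.entry_env[i] is not None else {}
        for j in range(i, len(self.cells)):
            self.entry_env[j] = self._snapshot(env)
            exec(self.cells[j], env)
            self.stale[j] = False
        self.final_env = self._snapshot(env)

    def edit(self, i, new_source):
        self.cells[i] = new_source
        for j in range(i, len(self.cells)):   # coarse: everything at or after i may be stale
            self.stale[j] = True
        self.run_from(i)

nb = ToyNotebook(["x = 1", "y = x + 1", "z = y * 10"])
nb.run_from(0)
nb.edit(1, "y = x + 5")   # reruns cells 1 and 2 from cell 1's snapshot, not cell 0
print(nb.final_env)       # {'x': 1, 'y': 6, 'z': 60}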

mjlm 5 hours ago [-]

I think this persistent state is one of the main advantages of the notebook environment, or the Matlab workspace, which I guess it was inspired by. It allows you to quickly try alternative values for certain variables without having to re-calculate everything. Saving snapshots would not be feasible if the project contains large amounts of data. If you want to reset everything, just "run all" from the beginning, or use a conventional IDE with a debugger.

reply

rhodin 3 hours ago [-]

No, not Matlab but Mathematica: "We were inspired originally by the excellent implementation in Mathematica" [1].

[1] http://ipython.org/ipython-doc/dev/whatsnew/version0.12.html...

reply

brians 1 hour ago [-]

And that came from Emacs and old Lisp environments---and perhaps something yet earlier?

As late as 2000, this was the single biggest advantage and single biggest impediment to new programmers in MIT's 6.001 lab: a bunch of nonvisible state, mutated by every C-x C-e. The student has tweaked two dozen points trying to fix a small program, and re-evaluated definitions after many of them, but maybe not all. The most straightforward help from a teacher is to get the buffer into a form such that M-x eval-region paves over all that, sets a known environment of top level definitions, and---more than half the time---the student's code now works.

I have similar concerns about much of Victor's work, for the same reason. Managing a mental model of complex state is an important skill for programming, but it's best learned incrementally over long experience with more complex programs. These very interactive environments front load the need for that skill without giving any obvious structure for helping the student learn.

Contrast Excel and HyperCard, which have no invisible state: you can click and see everything.

reply

yomritoyj 4 hours ago [-]

But you cannot recalculate if your calculation has trashed your inputs. And if it hasn't, then the snapshot does not impose a cost. If you are willing to forego the opportunity to replay to save memory, just put the producer and consumer in the same cell.

reply

lqdc13 4 hours ago [-]

It is essentially already that if you are careful with the variables.

Currently, you can easily control which things are saved and which aren't. So it is the best of both worlds.

reply

thanatropism 59 minutes ago [-]

The real simple thing would be to paint a border or just a sideline in cells that have already been executed.

Just that would make everything much, much better.

reply

denfromufa 1 hour ago [-]

What you describe looks like reactive programming and lazy evaluation, and Mathematica has support for this using the notion of Dynamic. This is actually how Manipulate is implemented.

reply

98Windows 5 hours ago [-]

That's why I've just resorted to using 'Restart & Run All' as my default way to run a notebook I've finished coding up.

reply

dalailambda 4 hours ago [-]

I imagine with a little introspection, a notebook can figure out dependencies between cells and cascade as needed.

reply

yomritoyj 3 hours ago [-]

I guess the challenge is to do it efficiently in the presence of pointers, implicit or explicit.

reply

brians 1 hour ago [-]

There's great work on this called differential dataflow. You have to trace at a finer grain than cells, but can then avoid lots of recalculation.

reply

cdavid 4 hours ago [-]

tonicdev seems to do it, though I have never used it: http://blog.tonicdev.com/2015/09/10/time-traveling-in-node.j...

note: it does undo, but not recalculation on demand ala excel.

reply

---

bicubic 15 hours ago [-]

> JupyterLab adapts easily to multiple workflow needs, letting you move from a Notebook/narrative focus to a script/console one.

I'm not sure I like where that design is going. It's starting to look an awful lot like RStudio and Matlab and I moved away from those tools for a reason. My favourite thing about Jupyter is that it is focused on notebooks and narrative. It brought about a revolution of sorts; now we have people blogging and writing papers in Jupyter, github is full of random useful notebooks.

This design almost seems like a step backwards in that regard.

reply

nl 10 hours ago [-]

Context: We use Jupyter heavily (mostly against Spark).

In my experience there is a set of things that "traditional" Jupyter notebooks does really well. Anytime you have a linear flow of steps the notebook metaphor works really well.

However, if you are doing things approaching traditional development, where you have multiple sources of data, or loops that require debugging, or basically anything that isn't linear in nature it doesn't work so well.

I wouldn't want to lose traditional notebooks, but I'd love to be able to offer people something like this that offers better debugging and some development tool support, rather than jumping to a full desktop IDE.

reply

bicubic 8 hours ago [-]

My experience is in line with yours, debugging loops and functions is a big pain point.

However, I think there's a much better solution to be had here, which is to add more powerful debugging capabilities to Notebook. I think Notebook has potential for new debugging paradigms, imagine for example being able to break anywhere in a cell and get a new 'forked cell' which operates in the context of the code that you just broke into. I think that's the better direction to go instead of reverting to the very interfaces and paradigms that we moved away from.

reply

radiowave 5 hours ago [-]

This sounds like a promising idea.

As someone who still reaches for a Smalltalk environment when I need to prototype something, ipython notebook is about the closest thing I've ever found to the style of interaction you get with a Smalltalk REPL (aka "workspace"). Smalltalk deliberately blurs the lines between a text editor and REPL, and the debugger takes this a step further - a combined editor and REPL, within a suspended execution context. It could be argued, for example, that coding within the suspended context of a failed unit test, while the code underneath your cursor has this REPL-like liveness, is the non-cargo-cult way of doing TDD.

I'm not trying to claim Smalltalk as the Greatest Thing Ever, but its existence (and its "otherness" - from the point of view of today's conventional style of development) are evidence that there are useful tools to be had, somewhere down a road less travelled.

reply

muraiki 4 hours ago [-]

I learned enough Smalltalk to make prototypes with Seaside. I love the way notebooks work and to make them more Smalltalky would be wonderful not only for programmers but for the field as a whole.

reply

nl 7 hours ago [-]

I'm not sure (And by that I don't mean I disagree: I'm genuinely unsure).

To me, the traditional IDEs do work well for debugging and software development.

Notebooks are great for explanatory examples and interactive experiments. I think these are different to the type of software development I do when I use an IDE.

For example, I find notebooks great for rapid iteration of parameters when I'm doing "data science", or indeed most of the feature extraction->modelling->prediction data science pipeline.

What I don't find them good for is developing new algorithms. It isn't clear to me if this is an inherent limitation of the notebook format, or just something where it needs new developments.

(To be clear, I've also used both Zeppelin and Beaker notebooks and don't see any particular advantages. I've also used R Studio, but I don't really know enough R to comment sensibly on that)

reply

kot-behemoth 6 hours ago [-]

Seeing as you're heavily using Spark, have you had a look at Apache Zeppelin (site: https://zeppelin.apache.org, demo: https://www.youtube.com/watch?v=J6Ei1RMG5Xo)? Seems like a more powerful notebook approach, plus better architecture for using embedded d3.js viz. Also painless templated SQL -> published dashboard looks great for getting data visible early on.

reply

nl 1 hour ago [-]

Yes. Not really a fan.

It looks nice, but the installation experience is (was?) terrible (as in - didn't work at all). Note the long gap between the 0.5.6 release (January) and the 0.6.0 release (July)? There were 3 (4?) Spark releases in that time, and that meant that none of the out-of-the-box releases worked for anything except the version of Spark you downloaded with it (and from memory that had problems too)

I got it working and evaluated it in some depth. I'm from a Java background, so I really wanted to like it.

But it turns out that all those features that seem really nice are mostly only nice if you are trying to build applications, not notebooks. Maybe it has improved, and maybe for some usecases it makes sense.

reply

---

p4wnc6 14 hours ago [-]

I agree, but would phrase it more like this: Jupyter has succeeded because each of the different major modes of interaction have been decoupled. If you just want to use it in a shell, you don't need to involve the browser at all. If you want a narrative format for sharing, presenting, or converting to slides, you can easily launch that environment.

This feels like a big step backwards for me too. It's effectively like replicating the MATLAB / Octave / PyDev (Eclipse) sort of IDE-with-extras-plus-console that is so, so cripplingly bad, but acting like it's great and new just because it's all in the browser.

If you're a fan of productivity, you shouldn't want to do that kind of stuff in a browser. Heck, I even disable all of the dropdown menus in Emacs because even that is too much of a productivity hindrance / inefficient use of monitor space when I am writing, reading, and thinking about code.

This is one of those things where I feel that it doesn't actually solve practical use cases, doesn't make people more productive, but because there is a big hype engine behind it, it gets adopted and talked about anyway, and eventually becomes the sort of thing that an Office Space kind of manager starts to force you to use ... which really scares me. Stay off my lawn.

reply

soVeryTired 7 hours ago [-]

Can you elaborate on what you think is bad about the IDE-with-extras-plus-console framework? I do research in quantitative finance and I actually find it really useful.

reply

p4wnc6 1 hour ago [-]

One of the main things is that it continues to perpetuate primarily mouse-driven interaction with the development environment. Even when tools like this enable Emacs or vi key configurations, the integration just never quite works, and there are environment-specific options you are required to select that come from e.g. drop-down menus, etc. Interacting with UI elements is horrendously unproductive and disruptive to thinking. Putting it in the browser makes this worse, because then you've also got the browser's own key configurations, like tab switching or bookmarking, to worry about.

There is seldom any value in looking directly at code and at a console at the same time. But if you really want that, it's super easy to do it with a window manager like xmonad, or even just arranging shell windows on your desktop so that you can alt-tab between them easily.

You often want to quickly spawn and kill shell tabs, which themselves may or may not be in the same language. For example, I often have a tab in which I'm using IPython, another tab in which it's the working directory so that I can execute things with Python directly, mv/cp files, ls, etc. And then still more tabs in which I have background processes that check for file changes and run my unit tests whenever things change, sometimes a tab for Python 2 and another tab for Python 3. And further, development projects are almost always cross-language, so I tend to also have some tab opened for writing and working with C or Haskell at the same time, and possibly another with a psql shell.

Since there are so many necessary tabs just to do even the tiniest things, it means that any and all visual overhead must die. Switching these tabs, even if it is quick, inside of a clunky GUI application like a browser is just too unproductive -- the browser periphery already wastes maybe 5% of the available visual space, and then the overhead for the tab icons, clickable close buttons, etc., wastes another 5% inside of that, and then the width of the tabs is restricted because there's some left panel with directory information or in-memory workspace information (both total wastes of time), and the height is restricted below by some worse-than-plain-shell console; it just makes no sense. It's too much visual clutter and too inefficient to facilitate switching around as much as is necessary.

I think it should be emphasized again that the left-side panel showing either directory structure or the contents of an in-memory working environment are huge wastes of time. If you need to visualize a directory structure, that should just be another buffer, like a code source file, and when you want to view that you just switch to that buffer. There's no benefit to having some of your visual field distracted by it when you're working on other files. And the in-memory information is also generally crappy. It's another thing where if you really need it then it should just be in another buffer and you should quickly go to that buffer and give it all your attention for the short time you need it, then go back. It serves nothing by having it as an ever-present visual distraction. But more than that, relying on inspecting variables that way is a very infantile thing, and I see it a lot with MATLAB programmers. Their form of debugging is not to scientifically inspect and control the execution of the code, and use proper breakpoints and watchpoints to tells them what's going on, but instead to just "run everything" and then go and click to open up a spreadsheet-like view of a matrix variable or something and manually (!) inspect the data. Then they become reliant on this as a crutch and complain when it's no longer there, instead of learning proper ways to write tests and proper debugger usage and let those things automate the problem of zooming in on outlier data, messy data, or bugs.

Anyway, there's plenty more to say, but it's probably long-winded enough.

reply

---

maus42 11 hours ago [-]

I like the idea of executable cells for developing.

It's been a couple of years since I played around with Jupyter notebooks, but then I got frustrated quite soon:

Stepping outside my preferred editor (vim) was annoying, likewise I had no idea if (how?) I could export the code out of notebook to a regular .py text file (without clumsily copypasting each cell). And the .ipynb files themselves seem quite terrible to manage in Git: write something and if you want to 'git diff' the changes, sometimes what you see is okay, sometimes it's not very... easily decipherable. (Text encoded image/png's?!)

My preferred workflow with R is to open the files I'm working with in both RStudio (with vim keybindings) and vim. Maybe if I just strictly restricted the use of notebook cells just for calling scripts and functions written in external files, and use the notebook as a whole just for presentation purposes, but then it wouldn't be that much an improvement over regular IPython. Any thoughts?

reply

j2kun 11 hours ago [-]

There's a one-line terminal command to convert to. It's like `jupyter nbconvert -o python` or something. Works okay out of the box, and has some meager customizability.

reply

takluyver 4 hours ago [-]

Yep, it's 'jupyter nbconvert --to python'. Docs here: http://nbconvert.readthedocs.io/

It actually has a lot of customisability if you're willing to get into Jinja templates.

reply

denfromufa 11 hours ago [-]

Notebook provides keyboard shortcuts also accessible and searchable from command palette. There is also ipymd to write notebook from your editor using markdown. PyCharm supports notebooks and maybe you can use IdeaVim? You can export notebook to .py file from file menu. For git, just don't save the output, which can be removed from cells menu. See ipywidgets repo for notebook examples without outputs.

reply

---

https://alpha.trycarbide.com/

---

on carbide:

"

chvid 8 hours ago [-]

(First congrats on a great looking project and really good copywriting/teaser.)

I think this sort of highly interactive environment is best suited for languages without side effects at all.

I.e. purely functional language as you see it in a spreadsheet.

Allowing any language (JS, Python and so on) gives problems with how to handle errors, show variable values that change over time, how to handle programs that take a long time to evaluate (or never evaluates), how to handle programs that modify the environment (i.e env.x = env.x + 1) and so on.

reply "

---

i don't understand this but: "The environment we're building is highly reactive. It doesn't just compile code and then run it; if the computations produce new formulae, those have to go back to the top level, get analyzed again, and then be computed in turn. That sort of re-entrancy turns out to be a big deal for spreadsheets, and most of the important constructs depend on it. "

a comment by https://news.ycombinator.com/user?id=gruseom
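my guess at what the re-entrancy means, as a toy Python sketch: evaluating a formula can yield another formula, which has to go back into the pool and be analyzed and evaluated in turn (the Formula class and eval-based evaluator are just illustrative):

from dataclasses import dataclass

@dataclass
class Formula:
    expr: str    # expression text, evaluated against already-computed cells

def evaluate(sheet):
    # repeatedly evaluate formulas; if a result is itself a formula, it goes
    # back into the pool and gets analyzed/evaluated again on a later pass
    while any(isinstance(v, Formula) for v in sheet.values()):
        progress = False
        for name, v in list(sheet.items()):
            if not isinstance(v, Formula):
                continue
            ready = {k: x for k, x in sheet.items() if not isinstance(x, Formula)}
            try:
                sheet[name] = eval(v.expr, {"Formula": Formula}, ready)
                progress = True
            except NameError:
                pass                 # depends on a cell that isn't computed yet
        if not progress:
            raise ValueError("cyclic or unresolvable formulas")
    return sheet

sheet = {"a": 2,
         "b": Formula("a * 3"),
         "c": Formula('Formula("a + b")')}   # c's first result is a new formula
print(evaluate(sheet))                       # {'a': 2, 'b': 6, 'c': 8}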

---

https://harc.ycr.org/flex/

---

https://news.ycombinator.com/item?id=14032201

---

https://airtable.com/

---

jupyter notebook visualize visualization https://hacks.mozilla.org/2019/03/iodide-an-experimental-tool-for-scientific-communication-and-exploration-on-the-web/

mrkstu 15 hours ago [-]

Instead of being targeted directly at scientific markets, I'd love to see a more generalized 'smart document' version of this. The promise of HyperCard, Glue, OpenDoc and even OLE of compound, programmable documents is something I've been looking for for decades.

I'd love to see someone come up with a path taking something like this or Jupyter and have it target a wide market and extend the capabilities in an accessible way. I think you'd blow half steps like Airtable out of the water.

reply

pablobaz 22 hours ago [-]

"Over the next couple months, we added Numpy, Pandas, and Matplotlib"

The python data stack running in web assembly!

reply

paulgb 20 hours ago [-]

And it's also available standalone for other applications!

https://github.com/iodide-project/pyodide

Hats off to the team, this is some really neat stuff and I'm happy to see it coming from Mozilla.

reply

---

[1]

---

[2]

---

leecarraher 2 hours ago [–] ...

Excel is no longer motivated by the original intention of a spreadsheet, and now caters to the lowest common denominator, a piece of graph paper. As such MS has shifted focus from doing calculation to being a text and graphics layout tool.

white text copied from a terminal : white text on a white background, you got it!
comma separated numbers : by default, a long string with commas in it
want a plot : it is in the insert menu for some reason, since plots and numbers are no longer Excel's raison d'être

jkaptur 2 hours ago [–]

The date parsing being discussed has been the behavior for at least 20 years, probably more like 30.

Your comment about graph paper echoes a comment from a former Excel PM: "The gridlines are the most important feature of Excel, not recalc."

https://www.joelonsoftware.com/2012/01/06/how-trello-is-diff...

---

a dataframe is like a spreadsheet, but without (sometimes) heterogeneity, formulas, UI

---

" In the early days of computing, we quickly came up with the spreadsheet and the user-friendly relational database (Access, FileMaker?). "

johnorourke 4 days ago [–]

We have not done poorly. At all. "no-code" is a whole movement now - it's the modern day COBOL. See:

* Airbase (imagine circa 1998 Microsoft Access but on the web and with integrations)
* Infinityapp.com
* Monday.com (easily recreate your business processes and types of data, and integrate with other systems)
* Asana
* ...dozens more

Then add the universal integration tools:

* IFTTT
* Zapier
* Tray
* ...and many more

jugg1es 4 days ago [–]

Microsoft Access is a database aimed at non-programmers. I'm not really sure why you would care if it's cloud-based or not if you aren't a programmer though. You could have Access save your database file to One Drive or something. Access also integrates with SQL Server and you can probably get a free tier one on Azure.

reply

chaostheory 3 days ago [–]

Microsoft is aiming to replace it with their easy to use Power App platform. Plenty of people and companies are working on this problem

reply

uses 4 days ago [–]

Airtable

reply

johnorourke 4 days ago [–]

Airbase.

reply

--- https://www.notboring.co/p/excel-never-dies

okay read (some good points, but a little verbose)

some points: Excel and HyperCard are some of the only examples of generic/low-code tools that took off. I would add Borland Delphi; I hear good things about it (also, maybe a DB one? Visual FoxPro? FileMaker? MS Access?)

(some of) Excel’s Limitations:

discussion: https://news.ycombinator.com/item?id=26386419

---

seems to me that provenance (as they mean it) and version-control (and the lack of unit testing, and the lack of comments) could all be solved by:
- a textual "behind-the-scenes" "code view" of the code in each cell of the spreadsheet (e.g. like that inverse-color programming view I remember from childhood)
- and, each piece of data becomes a pair (data, metadata for provenance, e.g. which transformations were applied to get this) -- rough sketch below
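a rough Python sketch of that (data, provenance metadata) pair idea; the file name and step descriptions are made up:

# every derived value carries the chain of transformations that produced it
class Tracked:
    def __init__(self, value, provenance=()):
        self.value = value
        self.provenance = tuple(provenance)

    def apply(self, fn, description):
        # applying a transformation appends to the provenance trail
        return Tracked(fn(self.value), self.provenance + (description,))

    def __repr__(self):
        return f"{self.value!r} via {list(self.provenance)}"

raw = Tracked(1234.567, ("cell B7, imported from sales.csv",))
rounded = raw.apply(round, "rounded to nearest unit")
doubled = rounded.apply(lambda x: x * 2, "doubled to cover two quarters")
print(doubled)   # 2470 via ['cell B7, imported from sales.csv', 'rounded to nearest unit', 'doubled to cover two quarters']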

would also be nice to have a 'jupyter workbook' view, too.

and you could have other views (e.g. a wiki-type view) but then you're getting into everything-in-one-app territory...

---

aarondia 1 day ago [–]

Excel, and more importantly, the spreadsheet is the best way to build intuition for a dataset, hands down. The alternatives, especially when it comes to Python and the default pandas output in a Jupyter Notebook are horrendous.

Where Excel falls short is data size limitations + auditability. Putting more than 1M rows of data into Excel is not possible, and once you get into the low 100K's, it becomes almost unbearable. And handing off an Excel workbook to a colleague is handing them hours of cell dependency tracing. On the other hand, data size + auditability are the super powers of Python data analysis.

I've been building a Python package, Mito (https://trymito.io/), its an interactive spreadsheet that automatically converts your spreadsheet analysis to the equivalent pandas code. You can write spreadsheet formulas, merge datasets, create pivot tables, etc. And because its implemented in Python, you can manipulate datasets with 10M rows of data with no problem. Our goal is to bring the intuitiveness of Excel data manipulation to Pandas.

reply

algorithmsRcool 1 day ago [–]

> Putting more than 1M rows of data into Excel is not possible, and once you get into the low 100K's, it becomes almost unbearable.

I dispute this. Yes, the normal spreadsheet view of excel will buckle under 1M rows, but excel has another feature called "Power Pivot" that is backed by an embedded database and scales into the high millions at least.

I've personally used excel on a dataset of 18M rows and PowerPivot handled it just fine.

[0] https://support.office.com/client/Data-Model-specification-a...

[1] https://support.office.com/client/power-pivot-powerful-data-...

reply

stilisstuk 1 day ago [–]

Yes pp can handle data. But vba can not. And most spreadsheets contain vba for reporting and magic interfaces for managers. Slow and unmaintainable.

Any day: rmarkdown and csv

reply

---

loudmax 1 day ago [–]

IMHO where Excel falls short is in interoperability. Try processing .xlsx files in anything other than Excel and it can be painful. Let's see the new more open Microsoft really embrace competing on an even playing field that doesn't rely on ossified proprietary file formats.

reply

---

syntaxing 1 day ago [–]

Excel just works so well, almost too well. Wanna share some formulas? Excel. Finance calculator? Excel. A small (<2000 lines) database? Excel. Even with Python for Google Sheets, I sometimes want to rip my hair out when I use Google Sheets compared to Excel (though the Google Python is pretty useful for autogenerated sheets).

reply

scubbo 1 day ago [–]

> I sometimes want to rip my hair out when I use Google Sheets compared to Excel

Earnest, non-"gotcha" question - what is it about Google Sheets that you dislike or find irritating? I'm only an entry-level user for both, but I've found them of similar quality and functionality.

reply

jeanloolz 1 day ago [–]

I have extensive experience with both and the main benefit Excel has over Google Sheets (in my opinion) is the amount of rows you can handle all at once. With Excel you can manipulate 200k rows easily. The same can not be said with Google sheets due to the fact that it remains a cloud based tool. I found Google sheet to be enough though in 90% of my cases (may change depending on what you usually work on). The scripting ecosystem google sheets has is amazing (apps scripts and various python libraries) and is much stronger compared to Excel.

reply

---

dgdosen 1 day ago [–]

I think MS is making a positive step in allowing loading js/ts libraries via node as part of its programming model compared to just using VBA.

This could be very powerful. It would help Excel to be repurposed to potentially something greater....

Of course, Google Sheets and Apple Numbers should tap into that same functionality...

reply

---

codesnik 18 hours ago [–]

Spreadsheets, as they incarnated in Excel are akin to running old style no-named-subroutines, only-two-chars-per-variable-name, goto-line-number BASIC but on modern hardware. Ideas that could potentially grow into something safe and powerful cannot grow now because of market dominance of that outdated monster.

And I keep hearing the same arguments, like "it can be used without knowing how to program" - but have you actually seen those formulas they enter? How exactly it differs in complexity from, say, SQL? At least in SQL you'll have sanely named columns and can actually see logic, reading it a month later.

Mixing datasets and freeform reports in the same sheets is a design mistake. Having auto-conversion for data is a design mistake. Not having enforced row/column sets instead of just typing any formula anywhere is a design mistake. All that leads to millions of wasted hours of human life just looking closely at rows and rows of numbers with squinted eyes, trying to figure out where an accidental keystroke had broken your data.

I really liked how MacOS Numbers approached spreadsheets, before they were somewhat butchered for ipads and excel compatibility: spreadsheets were not "infinite", separating concept of tables from concept of pages; column and row headers automatically used as names in formulas instead of undescriptive A1:B2 format, making them actually readable.

Unfortunately it was also slow as hell.

Airtable is a good direction for data entry purposes by multiple people, unfortunately it's no good for even medium sized tables, and exporting is limited.

reply

pfundstein 17 hours ago [–]

Tables are, of course, an excellent way of organizing data. Now add in the easy point and click UI and simple formulas to get you hooked, and soon you find yourself deep in VLOOKUPS and macros with no way back. Excel eases you into data organisation and manipulation. Many tools are better for any given task, but Excel excels at generalising.

reply

Robotbeat 6 hours ago [–]

That’s what’s so awesome about it. All the elitist notions of what a programming language or database cannot be or has to be, about speed, best practices, standards, safety, harmful functions like GOTO, etc, fall to the wayside with a pragmatic interface that gets rid of all the required cruft and toolchains, just giving all the users direct access to power. And practically everyone in a desk job uses it occasionally. It’s as ubiquitous as coffee.

It’s a refreshing rebuke to modern convoluted toolchains, binary signing, prohibitions against doing this or that, all the ceremony of modern programming. It goes directly against the trend to make computers increasingly locked down in functionality, mere appliances for passive consumption by and harvesting of data from the masses. It’s like giving everyone in a desk job a Leatherman multitool and a lighter. A bicycle for the mind in actual truth. Permissionless innovation at its finest. Excel! Excelsior!

reply

---

eropple 2 hours ago [–]

The thing I've always had trouble with from there is extending and versioning domain logic, both for new features as well as for rollback. What's worked for you here?

reply

punnerud 2 hours ago [–]

I use a lot of views for abstraction and backward compatibility.

Rollback is always difficult on live systems, especially when the data propagate to integrations. I try to always have 'first created' and 'last changed' time stamp, and first created and last changed user. If the job is automatic the “user” should be traceable back to the individual job, not some generic system user.

reply

---

http://web.eecs.utk.edu/~azh/blog/notebookpainpoints.html https://news.ycombinator.com/item?id=22164916

---

[3] recommends this "Responsive compilers" talk at PLISS 2019, saying "In that talk he also provided some examples of how the Rust language was accidentally mis-designed for responsive compilation. It is an entirely watchable talk about compiler engineering and I recommend checking it out." slides: https://nikomatsakis.github.io/pliss-2019/responsive-compilers.html#1

in the talk, Matsakis suggests using the Salsa framework (spreadsheet-y updates; "a Rust framework for writing incremental, on-demand programs -- these are programs that want to adapt to changes in their inputs, continuously producing a new output that is up-to-date").

---

" var b = 1 var c = 2 var a = b + c b = 10 console.log(a) 3 (not 12 because "=" is not a reactive assignment operator)

now imagine you have a special operator "$=" that changes the value of a variable (executes code on the right side of the operator and assigns result to left side variable) not only when explicitly initialized, but also when referenced variables (on the right side of the operator) are changed var b = 1 var c = 2 var a $= b + c b = 10 console.log(a) 12 " -- [4]
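a minimal Python sketch (mine, not from [4]) of the same idea, modeling "$=" as a cell that re-evaluates its formula whenever one of its inputs changes:

# reactive cells: setting an input re-runs every formula that referenced it
class Cell:
    def __init__(self, value=None):
        self.value = value
        self.formula = None
        self.dependents = []

    def set(self, value):
        self.value = value
        for dep in self.dependents:       # the "$=" behaviour: changes propagate
            dep.recompute()

    def define(self, formula, *inputs):   # a $= b + c  ~  a.define(lambda: ..., b, c)
        self.formula = formula
        for cell in inputs:
            cell.dependents.append(self)
        self.recompute()

    def recompute(self):
        self.value = self.formula()
        for dep in self.dependents:
            dep.recompute()

b, c, a = Cell(1), Cell(2), Cell()
a.define(lambda: b.value + c.value, b, c)
print(a.value)   # 3
b.set(10)
print(a.value)   # 12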

---

https://www.cell-lang.net/

---

https://github.com/jamii/dida/blob/main/docs/why.md

---

carlmjohnson 23 hours ago

Excel is a local maxima, which sucks because it’s not good enough. Ordinary people can use Excel, which is great, but then the date type is actively harmful, which is insane. It mangles zipcodes in spite of it having been made by a US corporation for its whole existence! Like, I get it, sometimes you mangle foreign conventions due to unfamiliarity, but all of New England has its zipcodes mangled. That’s bad! And then because Excel is a local maxima, new products like Numbers and Sheets clone it instead of searching for a new maxima. It’s a pity because we can definitely do better.

jonahx 19 hours ago

Not disagreeing at all, but curious what you think some of the big changes would be if a great designer did a total rethink, but incorporating the good stuff that works?

david_chisnall 5 hours ago

Take a look at Quantrix Modeller. Lotus had two spreadsheet products:

    123, which was a VisiCalc clone. It used a rectangular grid because they thought that it would appeal to accountants.
    Improv, which had a clean separation of data and formulae and used pivot tables as its core data type. You’d define a new column as a single formula, rather than copying and pasting. This is the one that accountants actually liked.

Excel is a 123 clone, as are most other spreadsheets. Quantrix Modeller is, as far as I know, the only surviving Improv clone. They have some great videos about why this model is better. It’s less error-prone, easier to change, and so on.

When most people say that they want a spreadsheet, what they actually want is a database with a rich set of numerical library routines.
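a rough pandas sketch (my illustration, not Quantrix/Improv) of the 'one formula defines a whole column' point, as opposed to per-cell copy/paste over A1-style ranges:

import pandas as pd

# data and formulae kept separate: the table holds data, and a derived
# column is defined once, by name, over whole columns
sales = pd.DataFrame({"units": [3, 5, 2], "unit_price": [9.5, 4.0, 20.0]})

# one formula for the whole column -- no per-cell copying, no A1:B2 ranges
sales["revenue"] = sales["units"] * sales["unit_price"]

print(sales)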

carlmjohnson 17 hours ago

I’m not imagining a big redesign. Just basic stuff: add types for timestamps, civil times, durations, locations, currency. Have a difference between something’s type and its display. Fix the 1900 leap year bug FFS. Default to not clobbering the next cell when something doesn’t fit. You could still have freeform A1 cells but you should push users towards using it like a database with proper rows and columns as much as you can. Design the app as though it were the most commonly used tool in business and science analysis and not whatever Dan Bricklin happened to think of in 1979.

---

notagoodidea 11 hours ago

I switched from Python scripts as my first reflex for ad hoc and throw-away tasks to trying to solve those situations at work with GSheet or Excel. Most of the time I have to match and validate some datasets on an ad hoc basis. GSheet is enough 80% of the time when there is no big issue that a bit of regexreplace, vlookup and some magic sprinkles around. Python is now for when datasets are really messy and I have to begin to seriously modify them or begin to do fuzzy matching, etc. I also saw my partner cutting down a manual painful process from two weeks to three days by learning more Excel and power query etc, with no programming background and no interest to learn programming either.

I think the power query interface and some other tools is something to explore as a way to bring more programming power in small bits besides the no code trends etc. There is a sweet spot in the tabular form that seems to make it easier to abstract for people with no programming experience. Reading a lot about APL/J/K and seeing the same kind of parallel that the author is making reflects on how those approaches could be fully merged together.

---

df 24 hours ago

It doesn’t mention pivot tables! Pivot tables let you do a lot of the statistical calculations you could do with R or Pandas or SQL, but in a graphical way and much more quickly. Surprisingly, as far as I know, no other spreadsheet has them even though they’re really a killer feature.

alerque 12 hours ago

LibreOffice Calc has them too. Others have noted Google Sheets and Apple Numbers. What other spreadsheets were you thinking of exactly? Maybe before being “surprised” that no other software except the thing you use has something, at least do a modicum of research to see if it is just your knowledge that is lacking or really all the other software.

swehren 23 hours ago

Google Sheets has them, although they keep changing the name. I think it’s called data explorer now?

---

rtpg 5 hours ago
    Excel reminds me a lot of when I hear about bespoke programming languages that only have global variables, or that one architecture that had a fixed size stack of up to 10 characters. There’s so much power and things going on and potential but the base idea of “all of this goes on an infinite grid” is soooooooo bizarre in this day and age to me.
    Like OK I can go to a name manager(?) and name physical areas in my infinite grid (??) to store values that are also presentation (???).
    This reminds me of Rail, an esoteric language where you are literally drawing control flow in a 2 dimensional space to move calculation around. To me, Excel’s model feels just as silly. Just a big 2d array.
    Meanwhile everyone tries to remake Excel because its so big that we all seem to miss the fact that when people use Excel they have to work around the fact that you’re on a global grid (yes I know about worksheets). Meanwhile there is a potential MVP that just … that just introduces the idea of storing values in names and referring to things that way.
    We have the technology! We somehow got everyone to learn the box model with CSS! Why are we not making spreadsheet programs at a bit of a higher level of abstraction?
    (I think it’s good and cool to learn about how to use Excel, just want a version that knows about scoping or something)

---

mamcx 4 hours ago

root parent next [–]

> what is a good alternative to Access (or Fox, I add)

Nothing.

Access is(was) in fact a worse alternative to Fox:

Fox/dbase is the only data-oriented language that was relatively popular and fit for the use-case.

This is by a mile the main point: it is a desert when looking for languages that are made for business app/data oriented programming (and much harder when looking for something not weird).

The main options: Fox/dBase/Informix(? not remember), kdb+, Cobol, SQL(when extended as store procedure lang with loops and that)

--

This point is big. Having a good form builder (that is already rare) is not enough to be a real contender for this space. You need a language where making queries is truly nice.

In short, you need a language that is `LINQ/Relational` as first-class end-to-end.

This is the lang I'm trying to build: https://tablam.org

reply

no_wizard 4 hours ago

root parent next [–]

Looks a lot like Lua!

and that's not a bad thing per se

reply

---

www.marktechpost.com/2023/10/21/researchers-from-ucsd-and-microsoft-introduce-coldeco-a-no-code-inspection-tool-for-calculated-columns/ Researchers from UCSD and Microsoft Introduce ColDeco: A No-Code Inspection Tool for Calculated Columns