Bayle Shanks's website: notes-computer-webDevelopmentFramework

see also [notes-computer-programming-webDevelopmentFramework]

The typology of web frameworks:

"convention over configuration" vs "no magic": does the web framework try to provide reasonable defaults to make your life easier, or does it try to have a clean, understandable implementation, so that if you need to do something unusual it doesn't take long to figure out how?
big vs small: does the web framework try to provide a lot of functionality, or only the basics?
integrated database vs not: does the web framework provide functionality that integrates with the backing database, or does it leave you free to choose whatever database you want, even a simple key-value store?
focus is on server-side vs on client-side: does the web framework focus on doing things on the server or on the client?

examples:

Rails: convention over configuration, big, integrated db, server-side
Sinatra: ?, small, ?, server-side
Django: no magic, big, integrated db, server-side
Flask: no magic, small, no integrated db, server-side
Angular: ?, big, no integrated db, client-side

(the ?s tend to mean that i don't know enough about something, not that there is no answer)

whoa, this is really out of date! todo

A web development framework is a tool for programming dynamic websites. This page collects some of my thoughts on some existing web development frameworks. I've never operated a high-traffic website so the following may be all wrong.

summary

My first choice for a new project would be to use Ruby on Rails on Amazon Web Services with Elastic Beanstalk. The database would be either Amazon DynamoDB or Amazon RDS. DynamoDB isn't described below, but (at the time of writing) it's like Cassandra without secondary indices. I'd choose DynamoDB if transactions and indices were unneeded and if huge write throughput might be needed later; otherwise I'd choose RDS.

If DynamoDB was chosen and the project later turned into a big, successful company, then i'd switch to Cassandra. DynamoDB and Cassandra are not API-compatible, but their architectures are so similar that porting from DynamoDB to Cassandra seems like it would be easy, unless you were using DynamoDB's conditional write functionality.
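To make the conditional-write caveat concrete, here is a minimal in-memory sketch of what a conditional (compare-and-set) write means: the write only succeeds if the stored value still matches what the caller expects. The `ConditionalStore` class and its method names are hypothetical and only model the semantics; this is not the DynamoDB or Cassandra API.

```python
from typing import Any, Dict, Optional


class ConditionFailed(Exception):
    """Raised when the stored value doesn't match the expected one."""


class ConditionalStore:
    """Hypothetical in-memory key-value store with compare-and-set writes."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._items.get(key)

    def put_if(self, key: str, new_value: Any, expected: Optional[Any]) -> None:
        # Write only if the current value equals `expected`.
        # expected=None means "only if the key doesn't exist yet"
        # (so this sketch assumes None is never stored as a value).
        current = self._items.get(key)
        if current != expected:
            raise ConditionFailed(f"{key!r}: expected {expected!r}, found {current!r}")
        self._items[key] = new_value


# Two clients racing to claim the same username: only the first wins.
store = ConditionalStore()
store.put_if("username:alice", {"owner": "client-1"}, expected=None)
try:
    store.put_if("username:alice", {"owner": "client-2"}, expected=None)
except ConditionFailed:
    print("client-2 lost the race")
```

Application code that leans on this kind of check can't be translated call-for-call into a store without an equivalent primitive, which is presumably why it's the exception to "easy to port".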

AWS (Amazon Web Services)

Today this is the standard for IaaS (infrastructure-as-a-service) cloud services that need to scale horizontally (for example, to provide high availability). 'IaaS' means that AWS gives you (virtual) servers, and you can install whatever you want on them and administer them yourself. This is in contrast to PaaS (platform-as-a-service) offerings like Google App Engine (see below), where you only program a module that is called by Google's framework. 'Scale horizontally' means that you can increase capacity by renting more servers, instead of by renting bigger servers. 'High availability' means that you have automated redundancy, so that when one of your servers fails for whatever reason, another one quickly takes its place.
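A minimal sketch of the "scale horizontally" and "high availability" ideas: requests are spread round-robin over a pool of interchangeable servers, more capacity means adding servers to the pool, and servers marked as failed are skipped so the others take their place. The `ServerPool` class and the server names are hypothetical; on AWS this job is normally done by a managed load balancer with health checks, not by your own code.

```python
from typing import List, Set


class ServerPool:
    """Hypothetical round-robin pool that skips servers marked as failed."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.failed: Set[str] = set()
        self._next = 0

    def add_server(self, name: str) -> None:
        # Horizontal scaling: more capacity means more servers, not bigger ones.
        self.servers.append(name)

    def mark_failed(self, name: str) -> None:
        # In practice a health check would do this automatically.
        self.failed.add(name)

    def pick(self) -> str:
        # Try each server at most once, continuing the round-robin order.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server not in self.failed:
                return server
        raise RuntimeError("no healthy servers left")
```

For example, with `ServerPool(["a", "b", "c"])`, calling `mark_failed("b")` means subsequent picks alternate between "a" and "c" until "b" is healthy again or a new server is added.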

Here's a great (and very introductory) guide to AWS: https://www.airpair.com/aws/posts/building-a-scalable-web-app-on-amazon-web-services-p1?wed

Some further notes:

"You can have a $ limit. Use a CloudWatch alarm that shuts down your instances if the billing metric exceeds the growth rate you specify for the period you specify." -- https://news.ycombinator.com/item?id=8927711
Create a Billing Alarm to Notify You If Your Usage Exceeds the Free Tier
"I might suggest a emphasising a few things such as not having keys on login accounts(they negate multi-factor auth if leaked), and to ALWAYS pick or create a new IAM role if you aren't sure an existing one fits for the EC2 instances." -- https://news.ycombinator.com/item?id=8927805

(Python or Java) Google App Engine

Google App Engine (or Appengine, or GAE) is awesome -- they'll supposedly take care of all the ops headaches if you scale, and they have an integrated deployment environment with logging, deployment, database connectivity, and an admin/debug console all set up for you. They also force you/teach you to code your database accesses in a scalable style. The downsides are:

due to the latency and expense, it isn't practical to run jobs over a large fraction of the database, especially not if you want to extract all data from the database and run the job elsewhere. https://groups.google.com/forum/?fromgroups=#!topic/google-appengine/-FqljlTruK4 http://www.moneytoolkit.com/2012/08/the-problem-with-google-app-engine/ . It would be nice if there were a cheap way to mutate a large proportion of the entire dataset as a batch job.
it's a little slow, unreliable, and expensive (see http://www-cs-students.stanford.edu/~silver/gae.html ). My guess is that the slowness is a consequence of using the 'Megastore' database, a NoSQL-style database whose defining feature is a limited form of transactions; the possibility of doing transactions may come with a substantial speed hit. I wish they would get the latency down and the uptime up. I haven't looked at it for a while, so perhaps they have.
not total control -- you're stuck in their environment, and if you need to do something weird that they didn't think of (for us, it was supporting a simple custom protocol for large XML file uploads from a client), then you're out of luck. If you need a non-pure-Python library that they haven't ported to their platform, you're out of luck. As time marches on, however, the list of things they haven't thought of is growing smaller.
Google has a nasty habit of unpredictably abandoning products and unpredictably raising prices. In the case of Appengine, at least they promise to give you 3 years of warning before killing it. It's clear that Amazon's AWS is a huge success and won't be killed for longer than 3 years; that's not clear with GAE. I wish they would promise to keep it around for a decade, or six years at the very least.
you're stuck in their environment -- if you need to get a bunch of data out for analysis or to migrate, it can be painful (see http://www-cs-students.stanford.edu/~silver/gae.html ). Hopefully they've addressed that since then, but i don't know how to be sure (actually, apparently they have, unofficially: http://gbayer.com/big-data/app-engine-datastore-how-to-efficiently-export-your-data/ ).
you can't trust their self-reported uptime measures (see http://www-cs-students.stanford.edu/~silver/gae.html ), because sometimes various services go down without the whole thing going down, and they don't always report that. I wish they were better about reporting these; hopefully they have gotten better since i last looked.
you'll be pretty tightly wedded to them; migrating off later, if you want to, would require a significant rewrite. I wish AppScale were officially supported.
it doesn't support Ruby on Rails.
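The first downside above (jobs over a large fraction of the database) usually comes down to walking the whole keyspace in bounded pages with a cursor, so that no single request gets too big. Here is a minimal sketch over an in-memory dict standing in for the datastore; `fetch_page`, `batch_update` and the page size are hypothetical, and this is not the App Engine API. Note that it doesn't make the job any cheaper: every entity is still read and written once.

```python
from typing import Callable, Dict, List, Optional, Tuple


def fetch_page(store: Dict[str, dict], after: Optional[str], limit: int) -> List[Tuple[str, dict]]:
    """Return up to `limit` (key, value) pairs with keys strictly after `after`, in key order.
    (A real datastore would use an index here instead of sorting every time.)"""
    keys = sorted(k for k in store if after is None or k > after)
    return [(k, store[k]) for k in keys[:limit]]


def batch_update(store: Dict[str, dict], mutate: Callable[[dict], dict], page_size: int = 100) -> int:
    """Apply `mutate` to every entity, one bounded page at a time; return how many were touched."""
    cursor: Optional[str] = None
    touched = 0
    while True:
        page = fetch_page(store, cursor, page_size)
        if not page:
            return touched
        for key, value in page:
            store[key] = mutate(value)
            touched += 1
        cursor = page[-1][0]  # resume after the last key seen
```

For example, `batch_update(users, lambda u: {**u, "plan": "basic"})` would rewrite every user record in pages of 100.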

Many of these remained as of Aug 2012 according to: http://www.moneytoolkit.com/2012/08/the-problem-with-google-app-engine/

My guess is that no business will want to host a primary product on Google App Engine for these reasons (again, see http://www-cs-students.stanford.edu/~silver/gae.html for more), unless at least the latency and uptime for standard requests, and the cost and speed of updating large datasets, have improved since i last checked (and probably unless they extend the won't-kill-it-for-3-years promise). (If anyone working on Appengine is reading this: the latency may be a hard problem, but at least two important elements could be fixed by managerial fiat: cap the cost of any single mapreduce job at something similar to what the analogous operation would cost on AWS, and promise that GAE will stick around for 10 years.)

However, Google App Engine seems like an ideal choice for personal projects that do not involve lots of system-wide analysis or system-wide updates to data (e.g. where each user has a little bit of data and you very rarely need to touch all the users) but that you want to be slightly scalable just in case; for prototype products that you want to put up publicly but that you're not sure you want to actively improve unless they catch on; for internal sites; and for less important applications that you wouldn't spend time optimizing anyway. Another reason to prototype in App Engine is that it forces you to use horizontally scalable database access patterns, such as no joins. I'm not sure, but i don't think Google itself uses Google App Engine for any of its core product offerings; i think it uses it for some internal sites and some low-priority offerings (such as http://www.google.com/moderator/ ).

If you can stand these caveats, and if you don't like Ruby on Rails, then choose Google App Engine. I loved many things about it and i wish i could use it for everything. Unless you are already experienced at this sort of thing, it may save you a bunch of time: you won't have to figure out how to set up things like monitoring and admin consoles, and you won't initially have to worry about keeping your site up under load.

If you use it, watch the App Engine blog (http://googleappengine.blogspot.com/). They've been very good about improving Appengine and getting rid of almost all of the gotchas that used to plague it, save the ones above, and when they announce an update it almost always has some awesome new functionality.

If you can't stand those caveats, then read on.

(Ruby) Grape

The only API-centric framework i found. i like it.

later: now there's https://github.com/rails-api/rails-api (see http://blog.steveklabnik.com/posts/2012-11-22-introducing-the-rails-api-project ). I haven't thought too hard about rails-api, but so far i think i prefer Grape: you want low-level control when writing your API, and Grape's DSL for API routing is helpful. p.s. i hadn't looked at Grape for a while, but i just looked at it again; it's being actively developed and it's gotten even better since the last time i looked. Go Grape!

(Python) Flask

Minimalist Python framework. I like it.

(Ruby) Sinatra

Minimalist Ruby framework. I like it.

(Ruby) Ruby on rails

The most popular kitchen-sink framework (popular not in terms of raw numbers, where the winner would be some PHP thing, but in terms of some sort of subjective 'mindshare').

js

Ember and Angular and Knockout seem to be the popular ones right now. Also note Node.js for the server-side. Not sure which is best. Never used any of them. See http://addyosmani.github.com/todomvc/ .

(Python) Django

Kitchen sink Python framework. I've never used it so i don't know whether to recommend it. My inclination is to say that if you want a kitchen-sink framework (and i think you do if you want a traditional, server-generated, dynamic website that spans many webpages), you should use Ruby on Rails instead.

Which language?

I don't have enough experience to say when to use javascript.

I'd say if you are making a traditional, server-generated, dynamic website that spans many webpages, use Ruby and Ruby on Rails. If you are making an API, either one is good (Ruby's Grape is particularly nice). If you are making a small 'app', either Flask or Sinatra is good.

However, if the API is a core part of your offering, rather than just an add-on that only a few users will use, then i think your frontend itself should go through the API rather than access the db directly. I'm not sure whether Ruby on Rails still has a lot to offer in that scenario. Maybe it does; i'm just not sure. See http://stackoverflow.com/questions/10941249/separate-rest-json-api-server-and-client , http://yetimedia.tumblr.com/post/35233051627/activeresource-is-dead-long-live-activeresource
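A minimal sketch of that "frontend goes through the API" architecture, using only Python's standard library (an assumption; the text is about Rails and Grape): a tiny WSGI JSON API is the only code that touches the data, and the frontend builds HTML purely from API responses. Here the frontend calls the API in-process through the WSGI interface to stay self-contained; a separately deployed frontend would make HTTP requests instead. `api_app`, `call_api`, `render_user_page` and the `/users/<id>` route are hypothetical.

```python
import html
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical data store; only the API layer below touches it.
_USERS = {"1": {"id": "1", "name": "Alice"}}


def api_app(environ, start_response):
    """Tiny WSGI JSON API with a single route: GET /users/<id>."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 2 and parts[0] == "users" and parts[1] in _USERS:
        status, payload = "200 OK", _USERS[parts[1]]
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    start_response(status, [("Content-Type", "application/json")])
    return [json.dumps(payload).encode("utf-8")]


def call_api(path):
    """Frontend-side client: calls the API in-process via WSGI.
    A separately deployed frontend would make an HTTP request instead."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["code"] = int(status.split()[0])

    body = b"".join(api_app(environ, start_response))
    return captured["code"], json.loads(body)


def render_user_page(user_id):
    """Frontend: builds HTML only from API responses, never from _USERS."""
    code, data = call_api(f"/users/{user_id}")
    if code != 200:
        return "<p>User not found</p>"
    return f"<h1>{html.escape(data['name'])}</h1>"
```

The point of the design is that the website is just another API client, so anything the site can do, third-party API users can do too.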

More programmers know Python than Ruby but more junior web developers seem to know Ruby. However Python is being taught in a bunch of schools and grad students use it, so that may change. Python is an easier language to learn or to read if you don't know it.

So i guess my advice regarding which language is a little conflicting. Sorry, i can't make up my mind. If i had to choose today, for an application in which the API was important, i'd make the API in Grape and then the website in Ruby on Rails using ActiveResource models.

Which database?

If you chose Google App Engine, you use their database. That's the whole point, because the database stuff is (so i'm told) what is hardest to scale.

If you aren't too worried about scaling to high numbers of simultaneous writes, and you just want to use the standard thing, use Postgresql. The big caveat with SQL is that it's hard (but not impossible if you have a lot of money) to horizontally scale to very high numbers of simultaneous writes without sharding.

If you aren't too worried about scaling to high numbers of simultaneous writes, and you need transactions, use Postgresql.
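To show what "need transactions" buys you, here is a sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (swapped in so the example is self-contained; the SQL is essentially the same). A transfer between two accounts either happens completely or, if any step fails, is rolled back. The table, column and function names are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # `with conn:` commits if the block succeeds and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # the debit above is undone
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Without a transaction, a crash or error between the two UPDATEs could leave money debited from one account but never credited to the other.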

If you aren't too worried about scaling to high numbers of simultaneous writes, and you want the database to enforce schema-based consistency conditions on data in the database, use Postgresql.
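A sketch of "schema-based consistency conditions": constraints that the database itself refuses to violate, no matter which application code does the writing. Python's built-in sqlite3 stands in for PostgreSQL here (both support NOT NULL, UNIQUE, CHECK and foreign keys, although SQLite only enforces foreign keys once they're switched on per connection). The schema and the `rejected` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; PostgreSQL always enforces FKs
conn.executescript(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        age   INTEGER CHECK (age >= 0)
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
    """
)
conn.execute("INSERT INTO users (id, email, age) VALUES (1, 'a@example.com', 30)")


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement because it breaks a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

For example, `rejected("INSERT INTO posts (user_id) VALUES (99)")` is True because user 99 doesn't exist, while a post for user 1 is accepted.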

If you aren't too worried about scaling to high numbers of simultaneous writes, and you don't need transactions, and you want to use document-oriented database queries (see http://docs.mongodb.org/manual/applications/read/#find ) to make it easy to change the schema agilely as you develop, or because you want to use a lightweight implementation of map/reduce ( http://docs.mongodb.org/manual/applications/map-reduce/ ), use Mongo. Two caveats with Mongo are (a) no joins and (b) no transactions. There are probably others; i don't know much about it. Apparently there are also some operational issues: https://www.google.com/search?client=ubuntu&channel=fs&q=mongodb++caveat
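To illustrate what "document-oriented queries" and "lightweight map/reduce" mean, here is a pure-Python sketch over a list of dicts. `find` only mimics the flavor of Mongo's simple equality queries (including dotted paths into nested documents) and `map_reduce` the general pattern; neither is the MongoDB or PyMongo API, and the sample documents are made up.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def get_path(doc: dict, dotted: str) -> Any:
    """Follow a dotted path like 'meta.lang' into nested dicts; None if missing."""
    value: Any = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(docs: List[dict], query: Dict[str, Any]) -> List[dict]:
    """Return the documents whose fields equal every value in `query`."""
    return [d for d in docs if all(get_path(d, k) == v for k, v in query.items())]


def map_reduce(
    docs: List[dict],
    mapper: Callable[[dict], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """mapper(doc) yields (key, value) pairs; reducer(key, values) combines each group."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Documents need not share a schema: the second one has an extra "draft" field.
posts = [
    {"author": "ann", "tags": ["db", "web"], "meta": {"lang": "en"}},
    {"author": "bob", "tags": ["web"], "meta": {"lang": "de"}, "draft": True},
    {"author": "ann", "tags": ["web"], "meta": {"lang": "en"}},
]
```

For example, counting posts per tag is `map_reduce(posts, lambda d: ((t, 1) for t in d["tags"]), lambda k, vs: sum(vs))`, which gives one "db" and three "web".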

If you aren't too worried about scaling to a huge database (or you can shard), and you could use the useful and interesting operations that Redis provides (but only within each shard), use Redis. The big caveats with Redis are (a) your data set (in current versions, values as well as keys) must fit into your servers' memory, and (b) no joins.
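Sharding here means splitting keys across several Redis instances so each one's data fits in memory; multi-key operations like set intersection then only work on keys that live on the same shard. A minimal sketch of deterministic key-to-shard routing (a checksum of the key modulo the number of shards); the shard addresses are hypothetical, and this simple modulo scheme is not Redis Cluster's hash-slot algorithm. The `{...}` handling borrows the idea of Redis Cluster "hash tags" so that related keys land on the same shard.

```python
import zlib
from typing import List


def shard_key(key: str) -> str:
    """The part of the key used for routing: the text inside the first {...}, if non-empty."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            return key[start + 1:end]
    return key


def pick_shard(key: str, shards: List[str]) -> str:
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return shards[zlib.crc32(shard_key(key).encode("utf-8")) % len(shards)]


SHARDS = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]
```

With this routing, `{user:42}:friends` and `{user:42}:follows` always land on the same shard, so intersecting them stays a single-shard operation. One design caveat: plain modulo routing moves most keys when the shard count changes, which is why real systems use consistent hashing or fixed hash slots.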

If you are worried about scaling to a high number of simultaneous writes, you don't need transactions, and you're willing to spend more time developing because you will be using a simple key/value store, use Cassandra. The main caveats are (a) no joins, (b) no transactions, and (c) it is less popular than the others listed here (so fewer libraries, and less battle-tested). In theory Cassandra could even be a little easier to handle operationally than SQL, because you don't have to deal with setting up masters, slaves and failover, but the potential immaturity of the platform kinda cancels that out. You also need to make sure that you have enough nodes in your Cassandra ring so that if one goes down during high traffic, you don't experience cascading failure, where the load shifting from the dead node pushes the other nodes over the brink too (this problem exists with any horizontal scaling setup, though).
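The "enough nodes" condition can be made concrete with a little arithmetic. If a total load T is spread evenly over n nodes of capacity C each, then after k nodes die each survivor carries T/((n-k)*C) of its capacity, and you want that to stay under some ceiling. A sketch, assuming load re-spreads evenly over the survivors (in a real ring it falls mostly on the dead node's replica neighbors) and using an illustrative 80% ceiling; the function names are hypothetical.

```python
def min_nodes(total_load: int, node_capacity: int, failures: int = 1, max_util_pct: int = 80) -> int:
    """Smallest cluster size that keeps every surviving node at or below
    max_util_pct of its capacity after `failures` nodes die,
    assuming the load re-spreads evenly over the survivors."""
    usable = node_capacity * max_util_pct  # per-node budget, scaled by 100
    survivors = -(-total_load * 100 // usable)  # integer ceiling division
    return failures + max(survivors, 1)


def utilization_after_failures(total_load: int, node_capacity: int, nodes: int, failures: int = 1) -> float:
    """Fraction of capacity each survivor runs at once `failures` nodes are gone."""
    survivors = nodes - failures
    if survivors <= 0:
        return float("inf")
    return total_load / (survivors * node_capacity)
```

For example, with 10,000 requests/s and nodes that each handle 2,000 requests/s, six nodes run at about 83% and a single failure pushes the five survivors to 100%, whereas `min_nodes(10000, 2000)` returns 8, leaving each of the 7 survivors at about 71%.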

It's said to be hard to switch data models after your application starts scaling. The 'safest' model in terms of horizontal scaling is one without joins (because it's hard to scale joins over multiple shards). If you know you will want to scale in this fashion eventually, you may want to make 'no joins' a rule from the beginning. Using Cassandra, Redis, or Mongo may help you to impose this discipline upon yourself or your team (or Google App Engine, but then that's hard to switch away from in any case).
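A sketch of what the "no joins" discipline tends to look like in practice: instead of storing only an author id on each post and joining against a users table at read time, you copy (denormalize) the fields you need into the post when you write it, so every read is a single key lookup that can live on one shard. A plain dict stands in for a key-value store, and the key scheme and function names are hypothetical.

```python
from typing import Dict

kv: Dict[str, dict] = {}  # stand-in for a key-value store


def create_user(user_id: str, name: str) -> None:
    kv[f"user:{user_id}"] = {"name": name, "post_ids": []}


def create_post(post_id: str, user_id: str, text: str) -> None:
    user = kv[f"user:{user_id}"]
    # Denormalize: copy the author's name into the post at write time.
    kv[f"post:{post_id}"] = {"text": text, "author_id": user_id, "author_name": user["name"]}
    user["post_ids"].append(post_id)


def render_post(post_id: str) -> str:
    # One lookup, no join against the users "table".
    post = kv[f"post:{post_id}"]
    return f"{post['author_name']}: {post['text']}"


def rename_user(user_id: str, new_name: str) -> None:
    # The price of denormalizing: every copy has to be updated on write.
    user = kv[f"user:{user_id}"]
    user["name"] = new_name
    for post_id in user["post_ids"]:
        kv[f"post:{post_id}"]["author_name"] = new_name
```

The trade-off is that reads get cheap and shard-friendly while some writes (like a rename) get more expensive, which is usually the right trade for read-heavy sites.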

Another way to put it:

If you need joins, use Postgresql. If you need transactions, use Postgresql unless your dataset will remain small (in which case you can consider Redis also).

If you only need a simple key-value store and you want scalability, use Cassandra.

If the total size of each shard of your dataset will remain small, consider whether you'd like Redis's unique features, e.g. computing set intersections, unions and differences, or getting the highest-ranked member of a sorted set (Redis is also fast).
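For readers who haven't used Redis, here is what those operations compute, modeled with Python sets and dicts. The comments name the Redis commands each line corresponds to (SINTER, SUNION, SDIFF, and ZREVRANGE, or ZRANGE ... REV in newer versions), but this only models their results; it is not a Redis client, and the keys and data are made up.

```python
from typing import Dict, List

# Sets (in Redis: SADD likes:ann flask sinatra grape, etc.)
likes_ann = {"flask", "sinatra", "grape"}
likes_bob = {"rails", "grape", "flask"}

both = likes_ann & likes_bob  # SINTER likes:ann likes:bob
either = likes_ann | likes_bob  # SUNION likes:ann likes:bob
only_ann = likes_ann - likes_bob  # SDIFF likes:ann likes:bob

# Sorted set: member -> score (in Redis: ZADD leaderboard 120 ann, etc.)
leaderboard: Dict[str, int] = {"ann": 120, "bob": 95, "cy": 140}


def top(zset: Dict[str, int], n: int = 1) -> List[str]:
    """Highest scores first, ties broken by reverse member order, like ZREVRANGE key 0 n-1."""
    ranked = sorted(zset.items(), key=lambda item: (item[1], item[0]), reverse=True)
    return [member for member, _ in ranked[:n]]
```

The difference is that Redis does this server-side on data already in memory, so you don't ship whole sets to the application just to intersect them.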

If you want to reduce development time by using a document-oriented datastore with lightweight map/reduce, at the cost of potentially more painful scaling than Cassandra, use Mongo.