Deployment, ops, devops tools

Well there's a bunch of stuff here that i don't understand (and some that i do).

databases

embedded:

SQLite
Berkeley DB
see https://www.quora.com/What-is-the-difference-between-SQLite-and-Berkeley-DB ; "Berkeley DB is meant for applications that require larger datasets or concurrency. In these types of use cases, Berkeley DB will outperform SQLite....if you want the highest performance and don't need/want the structure SQL provides, or if you'd like to implement your own object-oriented style of data access, go with BDB. If you already know SQL and are comfortable with it, have a small dataset and no need of high-perf concurrency, SQLite is probably what you want...BerkeleyDB is a simple key->value store. Sqlite is an embedded database that supports structured query language that implements most of the ANSI SQL specifications from 1992....sqlite offers much richer functionality (SQL support for inserts, update, deletes, selects, etc) but BerkeleyDB scales better with very large sets (10M+) of data."
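
To make the "SQL vs simple key->value store" distinction in the quote concrete, here is a minimal sketch using only the Python standard library. Note the assumptions: dbm.dumb is NOT Berkeley DB, it is just a stand-in with the same key->value access pattern, and the table, keys and values are made up for illustration.

```python
import dbm.dumb
import os
import sqlite3
import tempfile

# SQLite: an embedded database that you query with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("alice", 30), ("bob", 25), ("carol", 41)],
)
# A structured query can filter and sort on any column.
over_28 = [
    row[0]
    for row in conn.execute("SELECT name FROM users WHERE age > 28 ORDER BY name")
]
conn.close()

# Key->value store: records are looked up by key only, no query language.
# (dbm.dumb is a stand-in here, not Berkeley DB itself.)
with tempfile.TemporaryDirectory() as workdir:
    with dbm.dumb.open(os.path.join(workdir, "users"), "c") as db:
        db["alice"] = "30"
        db["bob"] = "25"
        # Values come back as bytes, so decode before converting.
        alice_age = int(db["alice"].decode())
```

With the key->value store, anything like "everyone over 28" has to be done by your own code looping over the keys, which is the tradeoff the quote is describing.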

sql: Postgres, MySQL (open fork: MariaDB)

redis

column stores and augmented key-value stores:

cassandra
dynamoDB (AWS service)
riak
HBase (lower-level component of Hadoop)
voldemort
memcachedb (dead?)
vertica

document-oriented:

mongodb
couchdb (and BigCouch) (related product: Cloudant). PouchDB is a JavaScript variant that can sync with CouchDB.
couchbase (related to: couchdb, memcached; used to be called: membase) "Couchbase Server provides on-the-wire client protocol compatibility with memcached, but is designed to add disk persistence, data replication, live cluster reconfiguration, rebalancing and multitenancy with data partitioning." "Couchbase is a CP type system meaning it provides consistency and partition tolerance."
rethinkDB

synchronization between often-offline clients:

couchdb

distributed filesystems and cloud storage:

HDFS (Hadoop distributed file system)
S3 (AWS service)
Ceph
Quobyte

More unusual use cases:

graphdbs
- neo4j
- orientDB
- HyperGraphDB
tuplestores eg triplestores for RDF triples
very fast in-memory databases, often key-value or column stores
- kx / kdb
- onetick
- voltdb

caching

memcached

servers and services

VPS:

linode
digitalocean

IAAS:

amazon AWS
google compute (about $1/day per node)
Microsoft Azure
software
- openstack (AWS compatible open source IAAS implementation?) "OpenStack is a free and open-source software platform for cloud-computing, mostly deployed as an infrastructure-as-a-service (IaaS)." "OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface." "When normal people talk about OpenStack they are talking about a series of open-source storage and networking projects attached to the KVM hypervisor and some management tools." [2]
- Apache CloudStack
- the Triton Docker cloud
Joyent's Triton-based Docker cloud
Google Container Engine (Kubernetes-based hosted service on top of Google Compute; free for 5 nodes or less; "Google Container Engine (https://cloud.google.com/container-engine/) is essentially hosted Kubernetes." [3])

PAAS:

Heroku
- note: you can ask Heroku to create a containerized version of your app so that you can run it locally: https://devcenter.heroku.com/articles/docker
Google App Engine; open-source implementation: AppScale
Microsoft Azure
Cloud Foundry (open-source software; then there is a proprietary extension, Pivotal Cloud Foundry, which is sold as a service, Pivotal Web Services)
IBM Bluemix (PAAS built on top of Cloud Foundry)
?: dotcloud (used to be run by the same company that did Docker)
Engine Yard (Ruby on Rails, PHP, Node.js)
?: Rackspace appears to offer something corporatey here but i don't know exactly what. Started OpenStack with NASA.
RedHat OpenShift. Now Docker and Kubernetes-based.
Deis. Based on Docker and CoreOS, with a 12-factor app philosophy. Open source, by Engine Yard.
Flynn. [4] [5]
Fabric8. Kubernetes-based, recommends OpenShift. Java-focused.

Links:

https://labs.ctl.io/flynn-vs-deis-the-tale-of-two-docker-micro-paas-technologies/

lower-level components

zookeeper: strictly ordered, strongly consistent "distributed configuration service, synchronization service, and naming registry" -- often used to implement locks/consensus in distributed systems, also service discovery
etcd: similar to ZooKeeper?
Consul: similar to ZooKeeper?

message buses / task queues

rabbitmq (AMQP)
celery
zeromq

big data stuff

In this context, data is not considered 'big' just because it's too big for a human to get a handle on; it has more to do with whether your data fits in RAM on a single computer. If your data fits in RAM, you probably don't need these! And if not, you may be able to buy a fancy computer with more RAM at a lower total cost than running all future data analysis through these!
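
A back-of-the-envelope check for the "does it fit in RAM" question might look like the sketch below. Everything in it is an assumption for illustration: the function name, the row counts and sizes, the 64 GiB machine, and especially the 2x overhead factor (real in-memory overhead depends heavily on the tools and data types used).

```python
GIB = 1024 ** 3

def fits_in_ram(n_rows, bytes_per_row, ram_bytes, overhead=2.0):
    """Rough check: does the raw data, times a working-space factor, fit in RAM?

    `overhead` is a made-up fudge factor for indexes, copies, and
    intermediate results; adjust it to your own tools.
    """
    return n_rows * bytes_per_row * overhead <= ram_bytes

# Hypothetical datasets on a hypothetical 64 GiB machine.
raw_small_gib = 100_000_000 * 200 / GIB           # roughly 18.6 GiB of raw data
small = fits_in_ram(100_000_000, 200, 64 * GIB)      # fits, even with 2x overhead
large = fits_in_ram(10_000_000_000, 200, 64 * GIB)   # about 1.8 TiB raw, does not fit
```

If a check like this says the data fits (or nearly fits), a single machine with ordinary tools is probably the simpler choice.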

hadoop ("consultingware"; some big related companies: Cloudera and Hortonworks)
- Hive provides the HiveQL query language. Also compatible with Spark.
- Drill is some other thing for "interactive analysis of large-scale datasets". It is an open-source system inspired by Google's Dremel, the engine behind BigQuery.
- Pig is another higher-level tool for writing map-reduce queries on Hadoop
spark is somewhat like Hadoop but more focused on in-memory storage for faster queries?
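
For reference, the map-reduce pattern that the Hadoop tools above are built around can be sketched on a single machine in plain Python. This is only a toy illustration of the map, shuffle, and reduce steps, not how Hadoop, Pig, or Spark are actually invoked; the function names and input lines are made up.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big cluster", "data fits in RAM"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data over the network, which is where most of the complexity (and cost) comes from.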

full virtualization

Vagrant. "Vagrant is computer software that creates and configures virtual development environments. It can be seen as a higher-level wrapper around virtualization software such as VirtualBox, VMware, KVM," Linux Containers (LXC), and Docker, "and around configuration management software such as Ansible, Chef, Salt, and Puppet."
- note: Otto is billed as "the successor to vagrant" [6]
KVM. KVM and Xen appear to be the most popular open-source hypervisors. KVM might be more popular for most users, although Xen powers many of the largest clouds.
VMware. VMware is a company known for its ESXi hypervisor, amongst other things, which is said to be "the de facto industry standard" [7]
Xen is a hypervisor which may be currently focusing on more 'enterprisey' things [8] eg powering Amazon EC2
Amazon EC2. Images for EC2 are called AMIs (Amazon Machine Images).
QEMU: virtual machine software. It can emulate various types of ISAs, such as ARM, MIPS, etc. It can emulate single programs or an entire OS. It can also run KVM or Xen (eg QEMU emulates the hardware, and KVM or Xen runs a guest operating system [9]).
QEMU/KVM. There is some relationship between QEMU and KVM [10]
Hyper-V. A hypervisor made by Microsoft.
VirtualBox is a "hypervisor for x86 computers"; that is, it runs a VM of an x86 machine on an x86 machine.

I've never really used any of these, but if i had to choose right now, i would say for most purposes use: KVM or VirtualBox, probably wrapped with Vagrant. For emulation of non-x86 machines use QEMU.

containers

Containers are 'lightweight VMs'; the application code is isolated into its container, but the operating system is not virtualized.

chroot jails: what we had before LXC was invented; an application in a chroot jail sees the root of the jail as the filesystem root, and cannot access the rest of the filesystem.
LXC: Linux Containers: like chroot jails but with "more isolation" [11]. Similar to Solaris's Zones and FreeBSD's Jails [12].
docker: the most popular container solution as of this writing. Docker is at a higher level than LXC, and indeed Docker can use LXC under the covers.

"Docker is first and foremost an image building and management solution. One of the largest objections to the “golden image” model is that you end up with image sprawl: large numbers of (deployed) complex images in varying states of versioning. You create randomness and exacerbate entropy in your environment as your image use grows. Images also tend to be heavy and unwieldy. This often forces manual change or layers of deviation and unmanaged configuration on top of images because the underlying images lack appropriate flexibility.

Compared to traditional image models Docker is a lot more lightweight: images are layered and you can quickly iterate on them. There is some legitimate argument to suggest that these attributes alleviate many of the management problems traditional images present." [13]
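
The "images are layered" idea in the quote can be pictured with a toy model: each layer is a mapping from file path to content, and an image is a stack of layers in which upper layers shadow lower ones. This is only a conceptual sketch, not Docker's actual storage format or API; the paths and contents are made up.

```python
from collections import ChainMap

# Each layer maps file paths to file contents (hypothetical data).
base = {"/etc/os-release": "debian", "/usr/bin/python3": "python 3.11"}
app_v1 = {"/app/main.py": "print('v1')"}
app_v2 = {"/app/main.py": "print('v2')"}

# An image is a stack of layers; lookups search the top layer first.
image_v1 = ChainMap(app_v1, base)
image_v2 = ChainMap(app_v2, base)

# Both images see the same base files, but each sees its own app layer.
v1_main = image_v1["/app/main.py"]
v2_main = image_v2["/app/main.py"]

# The base layer is stored once and shared, so iterating on the app
# only means replacing the small top layer.
shared_base = image_v1.maps[-1] is image_v2.maps[-1]
```

Sharing the big base layer and swapping only the small top layer is what makes iterating on layered images cheap compared to rebuilding a whole "golden image" each time.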
