notes-computer-deployment

Deployment, ops, devops tools

Well there's a bunch of stuff here that i don't understand (and some that i do).

databases

embedded:

sql: Postgres; MySQL and its open fork MariaDB

column stores and augmented key-value stores:

document-oriented:

synchronization between often-offline clients:

distributed filesystems and cloud storage:

More unusual use cases:

See also [1]

caching

servers and services

VPS:

IAAS:

PAAS:

Links:

lower-level components

message buses / task queues

big data stuff

In this context, data is not considered 'big' just because it's too big for a human to get a handle on; it's more about whether your data fits in RAM on a single computer. If your data fits in RAM, you probably don't need these tools! And if it doesn't, you may be able to buy a fancy computer with more RAM at a lower total cost than doing all future data analysis through these tools!
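A quick sanity check before reaching for any of this (a trivial sketch; the data path is just a placeholder):

    # does the working dataset fit in RAM on this machine?
    free -h                  # total and available memory
    du -sh /path/to/data     # on-disk size of the dataset (placeholder path)
    # if the dataset is comfortably under available RAM, single-machine tools
    # are probably simpler than a "big data" stack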

full virtualization

I've never really used any of these, but if i had to choose right now, i would say for most purposes use KVM or VirtualBox, probably wrapped with Vagrant. For emulation of non-x86 machines, use QEMU.
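For reference, the basic Vagrant workflow is roughly this (a sketch; the box name is just a common example, and it assumes Vagrant and VirtualBox are installed):

    vagrant init ubuntu/trusty64   # writes a Vagrantfile for a stock Ubuntu box (example box name)
    vagrant up                     # downloads the box if needed and boots the VM (VirtualBox by default)
    vagrant ssh                    # shell into the VM
    vagrant halt                   # stop it; 'vagrant destroy' removes it entirely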

containers

Containers are 'lightweight VMs': each application is isolated in its own container, but containers share the host's operating system kernel rather than running a virtualized OS.
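For example (a sketch assuming Docker is installed; the image is just a stock example), note that a container's processes are ordinary host processes:

    # run an nginx container in the background, publishing container port 80 on host port 8080
    docker run -d --name web -p 8080:80 nginx
    docker ps                  # the container shows up as an isolated unit...
    ps aux | grep nginx        # ...but its processes are visible from the host: same kernel, no virtualized OS
    docker stop web && docker rm web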

"Docker is first and foremost an image building and management solution. One of the largest objections to the “golden image” model is that you end up with image sprawl: large numbers of (deployed) complex images in varying states of versioning. You create randomness and exacerbate entropy in your environment as your image use grows. Images also tend to be heavy and unwieldy. This often forces manual change or layers of deviation and unmanaged configuration on top of images because the underlying images lack appropriate flexibility.

Compared to traditional image models Docker is a lot more lightweight: images are layered and you can quickly iterate on them. There is some legitimate argument to suggest that these attributes alleviate many of the management problems traditional images present." [13]
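To make the 'layered and you can quickly iterate' point concrete, a minimal sketch (base image, package, and image tag are just examples): each Dockerfile instruction becomes a cached layer, so editing the app only rebuilds the layers from the COPY step down.

    # a trivial app script to copy into the image (placeholder)
    printf '#!/bin/sh\necho hello from the container\n' > app.sh
    # each instruction below becomes its own cached image layer
    cat > Dockerfile <<'EOF'
    FROM debian:jessie
    RUN apt-get update && apt-get install -y --no-install-recommends curl
    COPY app.sh /usr/local/bin/app.sh
    RUN chmod +x /usr/local/bin/app.sh
    CMD ["/usr/local/bin/app.sh"]
    EOF
    docker build -t myorg/myapp:0.1 .     # editing app.sh later only rebuilds the COPY layer and below
    docker history myorg/myapp:0.1        # shows the layer stack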

Note that root apps running in a Docker container have (some degree of?) root access on the host: http://www.itworld.com/article/2920349/security/for-containers-security-is-problem-1.html (but see https://docs.docker.com/engine/articles/security/ ).

Docker is said by some to be not sufficiently secure to sandbox malicious/untrusted code: https://news.ycombinator.com/item?id=10269560

Docker security guides: https://docs.docker.com/engine/articles/security/ https://benchmarks.cisecurity.org/tools2/docker/CIS_Docker_1.6_Benchmark_v1.0.0.pdf

Joyent claims to offer a hosting service that can sandbox potentially malicious containers: https://www.joyent.com/blog/the-seven-characteristics-of-container-native-infrastructure
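Whatever the verdict on sandboxing, a few docker run flags shrink the attack surface (a sketch using standard CLI options; these mitigate, but don't make containers a safe sandbox for untrusted code):

    # drop all capabilities, run as an unprivileged uid, mount the root fs read-only, cap memory
    docker run --rm \
      --cap-drop=ALL \
      --read-only \
      --user 1000:1000 \
      --memory 256m \
      debian:jessie id    # prints uid=1000, demonstrating the non-root user inside the container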

Docker has various components besides the core:

Links relating to Docker (toread):

distributions

I don't know how these compare. [17] sounds like it is saying that NixOS? is more advanced, and i occasionally hear comments about people thinking NixOS? is 'the future of package management'. CoreOS? seems to have more uptake though, e.g. Deis.

cluster orchestration/provisioning

[26] recommends Kubernetes, Swarm, Deis, or Mesos, and Quay for a Docker repo ("registry"), and says, "Kubernetes, Swarm and Mesos handle the orchestration portions only, while Deis is a more feature-complete solution that handles the CI and registry portions as well"

[27] and [28] appear to opine that Swarm and Compose are too low-level/not declarative enough, and that something like Kubernetes should be used instead, or something on top of Kubernetes, such as OpenShift? or Deis. [29] recommends Deis.

[30] says (Docker) "Compose isn't great for deploying services but it really shines for command line utilities. At Pachyderm our entire build and test system is based on compose."

" The Docker command is used for working with one image.

When working with multiple images, coordinating ports, volumes, environment variables, links, and other things gets very troublesome very quickly as you get into using a mish-mash of deployment and management scripts.

Compose aims to solve this problem by declaratively defining ports, volumes, links, and other things.

Compose does allow you to scale up and down easily but it doesn't do auto-scaling, load balancing, or automated crash recovery -- this is where Swarm comes in.

Kubernetes does what both compose and swarm do, but in a single product.

Both Swarm and Kubernetes are designed to accommodate provisioning of resources across multiple hosts and automatically deciding which containers go where.

Compose, Swarm, and Kubernetes are all things you can install yourself.

Tutum is far bigger and the scope of its usage falls well outside of what Kubernetes and the other's do, but suffice to say that it's more of a PaaS? than anything else.

Someone please correct me if I'm wrong, I'm not very familiar with Swarm, Kubernetes, or Tutum. " -- https://news.ycombinator.com/item?id=10504398

" Compose: Multi-container orchestration. You define an application stack via a compose file including links between containers (web front end links to a database). When you run docker compose up on that file, compose stands up the containers in the right order to deal with dependencies.

Swarm: Represents a cluster of Docker Hosts as a single entity. Will orchestrate the placement of containers on the hosts based on different criteria and constraints. You interact with the swarm the same way you would any single docker host. You can plug in different discovery services (Consul, Etcd, Zookeeper) as well as different schedulers (Mesos, K8s, etc). " -- https://news.ycombinator.com/item?id=10504365
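As a concrete sketch of such a compose file (service names, images, and ports are just examples; this is old-style v1 compose syntax with explicit links):

    mkdir -p html    # bind-mounted into the web container below
    cat > docker-compose.yml <<'EOF'
    web:
      image: nginx
      ports:
        - "8080:80"
      links:
        - db
      volumes:
        - ./html:/usr/share/nginx/html:ro
    db:
      image: postgres:9.4
      environment:
        POSTGRES_PASSWORD: example
    EOF
    docker-compose up -d    # stands the containers up in dependency order (db before web)
    docker-compose ps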

"

Swarm and Kubernetes are definitely competitors.

Swarm is a container manager that automatically starts and stops containers in a cluster using a scheduling algorithm. It implements the Docker API, so it actually acts as a facade that aggregates all the hosts in the pool. So you talk to it just like you would with a single-host Docker install, but when you tell Swarm to start a given container, it will schedule it somewhere in the cluster. Asking Swarm to list the running instances, for example, would list everything running on all the machines.

Kubernetes is also a container manager. The biggest difference is perhaps that it abstracts containers into a few high-level concepts — it's not tightly coupled with Docker and apparently Google plans to support other backends — that map more directly to how applications are deployed in practice. For example, it comes with first-class support for exposing containers as "services" which it can then route traffic to. Kubernetes has a good design, but for various reasons the design feels overly complicated, which is not helped by some of the terminology they've invented (like replication controllers, which aren't program, but a kind of declaration), nor by its somewhat enterprisy documentation.

Kubernetes is also complicated by the fact that every pod must be allocated a public (or at least routable) IP. If you're in a private data center that already has a DHCP server set up, that's a non-issue, but in this day and age, most people probably will need an overlay network. While there are tons of such solutions — Open vSwitch (aka OVS), GRE tunnels, IPsec meshes, OpenVPN?, Tinc, Flannel (formerly Rudder), VXLAN, L2TP, etc. — none of them can be called simple. Of course, plain Docker doesn't solve this in any satisfactory way, either, but at least you can be productive with Docker without jumping into the deep end like Kubernetes forces you to do.

Docker Networking is a stab at solving the issue by creating an overlay network through VXLAN, which gives you a Layer 2 overlay network. VXLAN has historically been problematic because it has required multicast UDP, something few cloud providers implement, and I didn't know VXLAN was a mature contender; but apparently the kernel has supported unicast (which cloud providers to support) since at least 2013. If so, that's probably the simplest overlay solution of all the aforementioned.

As for Compose, it's a small tool that can start a bunch of Docker containers listed in a YAML file. It's unrelated to Swarm, but can work with it. It was designed for development and testing, to make it easy to get a multi-container app running; there's no "master" daemon that does any provisioning or anything like that. You just use the "compose" tool with that one config file, and it will start all the containers mentioned in the file. While its usefulness is limited right now (for example, you can't ensure that two containers run on the same host, unlike Kubernetes with its pods), the Docker guys are working on making it more mature for production use. " -- https://news.ycombinator.com/item?id=10503603
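A minimal sketch of those concepts with the v1 API (names and image are examples): a replication controller keeps N replicas of a pod running, and a service gives them a stable address that traffic can be routed to.

    cat > web-rc.yaml <<'EOF'
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx
            ports:
            - containerPort: 80
    EOF
    cat > web-svc.yaml <<'EOF'
    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      selector:
        app: web
      ports:
      - port: 80
    EOF
    kubectl create -f web-rc.yaml
    kubectl create -f web-svc.yaml
    kubectl get pods -o wide    # shows which node each replica was scheduled onto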

" bboreham 36 days ago

> If so, that's probably the simplest overlay solution of all the aforementioned

(I work on Weave)

Weave Net also lets you create a Docker overlay network using VXLAN, without insisting that you configure a distributed KV store (etcd, consul, etc.). So I would argue Weave Net is the simplest :-)

More detail here: http://blog.weave.works/2015/11/03/docker-networking-1-9-wea... " -- https://news.ycombinator.com/item?id=10503603

" I think it worth mentioning that setting up networking in Kubernetes is largely a problem for deployers - not so much k8 users.

There are a variety of hosted, virtualized and bare metal solutions available.

The CoreOS? folks have some nice all-in-one VMs if you want to experiment.

Google's hosted Container engine is about as simple as it gets - and very inexpensive (I have been playing with it for a few weeks and have spent about $20). " -- https://news.ycombinator.com/item?id=10503603

"

    Swarm has the advantage (and disadvantage) of using the standard Docker interface. Whilst this makes it very simple to use Swarm and to integrate it into existing workflows, it may also make it more difficult to support the more complex scheduling that may be defined in custom interfaces.
    Fleet is a low-level and fairly simple orchestration layer that can be used as a base for running higher level orchestration tools, such as Kubernetes or custom systems.
    Kubernetes is an opinionated orchestration tool that comes with service discovery and replication baked-in. It may require some re-designing of existing applications, but used correctly will result in a fault-tolerant and scalable system.
    Mesos is a low-level, battle-hardened scheduler that supports several frameworks for container orchestration including Marathon, Kubernetes, and Swarm. At the time of writing, Kubernetes and Mesos are more developed and stable than Swarm. In terms of scale, only Mesos has been proven to support large-scale systems of hundreds or thousands of nodes. However, when looking at small clusters of, say, less than a dozen nodes, Mesos may be an overly complex solution." -- http://radar.oreilly.com/2015/10/swarm-v-fleet-v-kubernetes-v-mesos.html

"

...Nomad, which claims the world but when you dig into the code pretty large advertised chunks are simply not there yet. Nomad seems like a much more straightforward implementation of the Borg paper, and one day may be interesting once they write the rest of it. A nice Kubernetes feature that is similar to what you can do with fleet is the “Daemon Set” which lets you run certain things on every node. Some cool Mesos features that are pretty new and haven’t been talked about much yet:

" Only Kubernetes has facilities for (load-balancing?). I'm not sure why these other lower level tools are being compared to it. Especially Mesos and Marathon, which are much lower level. " -- https://news.ycombinator.com/item?id=10439704

Kubernetes apparently offers persistent-disk-like support via a Ceph plugin [31] [32]. See also [33].

" geggam 48 days ago

How many people actually need this level of scale vs how many people are implementing this because its $BUZZWORD

ownagefool 48 days ago

It's not about scale, it's about manageability. ...

vidarh 48 days ago

"This level of scale" is very trivial for some of these. E.g. for fleet, the only dependencies are systemd and etcd.

If you have a reachable etcd cluster (which you can run on a single machine, though you shouldn't) and your machines run systemd, and you have key'd ssh access between them, you can pretty much start typing "fleetctl start/stop/status" instead of "systemctl start/stop/status" and feed fleet systemd units and have your jobs spread over your servers with pretty much no effort.

For me it's an effortless way to get failover.

E.g. "fleetctl start someappserver@{1..3}" to start three instances of someappserver@.service. With the right constraint in the service file they'll be guaranteed to end up on different machines.

amouat 48 days ago

You need orchestration even at small scale (say more than 2 nodes). Otherwise you have to manually take care of container scheduling and failover etc.

dberg 48 days ago

Exactly, the biggest problem with running apps in Containers, is you need to ensure you dont have your entire cluster sitting on one physical node (or virtual node i guess), services can be restart, auto-scaled, etc. Even at a small scale some orchestration and management is required " -- https://news.ycombinator.com/item?id=10438804
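To make the fleetctl example above concrete, here is a sketch of what such a unit template might look like (unit name and image are just examples); the [X-Fleet] Conflicts line is the 'constraint' that keeps instances on different machines:

    # someappserver@.service -- submit and start with:
    #   fleetctl submit someappserver@.service
    #   fleetctl start someappserver@{1..3}
    cat > someappserver@.service <<'EOF'
    [Unit]
    Description=someappserver instance %i
    After=docker.service
    Requires=docker.service

    [Service]
    ExecStartPre=-/usr/bin/docker rm -f someapp-%i
    ExecStart=/usr/bin/docker run --name someapp-%i -p 808%i:80 nginx
    ExecStop=/usr/bin/docker stop someapp-%i

    [X-Fleet]
    Conflicts=someappserver@*.service
    EOF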

Links:

server management/deployment/configuration management

Ansible and Salt started out as remote-execution/deployment tools rather than configuration managers. Ansible and Salt may be slightly newer and simpler, whereas Chef and Puppet may be slightly older and have larger communities. Ansible and Salt use YAML configuration files, whereas Chef uses Ruby and Puppet uses its own DSL.
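For a feel of the YAML style, a minimal Ansible playbook sketch (the host group, package, and inventory file are just examples):

    # site.yml -- install and start nginx on hosts in the 'webservers' group
    cat > site.yml <<'EOF'
    - hosts: webservers
      become: yes
      tasks:
        - name: install nginx
          apt: name=nginx state=present update_cache=yes
        - name: ensure nginx is running
          service: name=nginx state=started enabled=yes
    EOF
    # inventory.ini is assumed to exist and define a [webservers] group
    ansible-playbook -i inventory.ini site.yml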

Links:

secret configuration management

build tools

See also [proj-plbook-plChToolingImpl the tooling chapter in my PL notes].

CI and testing tools

Code review tools

Version control/Revision management tools

AWS tips

links to tips

links to AWS tutorials

todo

(nothing here)

" bkanber 3 hours ago

So, you build a web app and it gets popular. It needs one load balancer, 5 app servers, at least two database nodes for replication, a redis cluster for caching and queuing, an elasticsearch cluster for full text search, and a cluster of job worker servers to do async stuff like processing images, etc.

In the ancient past, like when I'm from, you'd write up a few different bash scripts to help you provision each server type. But setting this all up, you'd still have to run around and create 20 servers and provision them into one of 5 different types, etc.

Then there's chef/puppet, which takes your bash script and makes it a little more maintainable. But there are still issues: huge divide between dev/prod environments, and adding 5 new nodes ASAP is still tedious.

Now you have cloud and container orchestration. Containers are like the git repos of the server world. You build a container to run each of your apps (nginx, redis, etc), configure each once (like coding and committing), and then they work identically on dev and prod after you launch them (you can clone/pull onto hardware). And what's more, since a container image is pre-built, it launches on metal in a matter of seconds, not minutes. All the apt-get install crap was done at image build time, not container launch time.

Things are a lot easier now, but you still have a problem. You're scaling to 30, maybe 50 different servers running 6 or 7 different services. More and more you want to treat your hardware as a generic compute cloud, but you can't escape that, even with docker, your servers have identities and personalities. You still need to sit and think about which of your 50 servers to launch a container on, and make sure it's under the correct load balancer, etc.

That's where Kubernetes steps in; it's a level of abstraction higher than docker, and works at the cluster level. You define services around your docker containers, and let Kubernetes initialize the hardware, and abstract it away into a giant compute cloud, and then all you have to do is tell kubernetes to scale a certain service up and down, and it automatically figures out which servers to take the action on and modifies your load balancer for that service accordingly.

At the scale of "a few servers", Kubernetes doesn't help much. At the scale of dozens or hundreds, it definitely does. "Orchestration" isn't just a buzzword, it's the correct term here; all those containers and services and pieces of hardware DO need to be wrangled. In the past it was a full time sysadmin job, now it's just a Kubernetes or Fleet config file.

Disclosure: I'm currently writing a book on Docker. Disclaimer: I have not had my coffee yet.

Edit: Since someone asked, I'm writing a book called "Complete Docker" which will be published by Apress. I don't know the exact pub date that Apress will launch it on, but I expect it'll be available in October. " -- https://news.ycombinator.com/item?id=11915944
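The 'tell kubernetes to scale a certain service up and down' step in that comment boils down to something like this (a sketch; 'web' is the example replication controller from earlier on this page):

    kubectl scale rc web --replicas=10   # Kubernetes decides which nodes get the new pods
    kubectl get pods -o wide             # see where they actually landed
    kubectl get endpoints web            # the 'web' service's load-balanced backends update automatically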

Security

Links for security

Debugging

Some server monitoring commands

Links for server monitoring commands

Links

todo

okay, so it sounds like a good idea to have both development and production environments declaratively specified (rather than having them be the accumulation of unlogged mutations from various commands run on them). This should allow for (a) consistency between environments running on different machines, (b) elimination of mistakes in which a sysadmin accidentally does something a little different on one instance, (c) automated installation and deployment.

What are the best tools for that?

how do NixOS?, CoreOS?, Docker, Vagrant, and Otto relate?

is there one (set of) tool(s) that works (possibly in VMs) (a) for both development and production, (b) on both desktop and cloud machines, (c) interoperates with all of the most popular IAAS cloud hosts, (d) interoperates with much of the most popular deployment/autoscaling/orchestration/monitoring/load-balancing/management frameworks (stuff like kubernetes, fleet, docker swarm, opsworks, terraform, mesos, etc)?

some potentially relevant links:

---