notes-computer-programming-softwareArchitecture

why do we call it software 'architecture'? Architecture (of buildings) is a source of metaphors such as 'foundation' and 'plan', relating to the strategy and organization of design

decouple even at the cost of writing more code (useless passthru interfaces) -- but there is a limit

devilspie is better than gnome integration points because the integration points (title, etc) are configurable by an advanced end user in a decoupled fashion (i.e. without patching and recompiling, which no one wants to do)

Hints for Computer System Design by Butler W. Lampson

toread: https://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf discussion: https://blog.acolyer.org/2016/09/05/on-the-criteria-to-be-used-in-decomposing-systems-into-modules/

toread: http://akkartik.name/parnas.pdf


good read:

https://engineering.shopify.com/blogs/engineering/shopify-monolith

" exterm 1 day ago [–]

> When I left, we were up to a few dozen components, and the number was climbing rapidly.

I should have included this in the blog post: The number of components _needs_ to be kept small. Shopify's main monolith is 2.8 million lines of code in 37 components, and I'd actually like to get that number _down_.

I like to compare this to the main navigation that we present to our merchants. It's useful if it has 8 entries. It's not useful if it has 400. ... " -- https://news.ycombinator.com/item?id=24510932

"

octernion 2 days ago [–]

we are actually doing precisely the same thing at instacart (breaking our 1+ million lines of code monolith into discrete components, which we call "domains"), and typing the boundaries and as much of the internals of these domains as possible with sorbet types.

this has the benefit of ruby dynamicism (fast development within domains, you can use all the nice railsy tooling, activerecord, and all the libraries we've built over the years), with type safety at the boundaries (we've also put in timeouts, thread separation, and error handling at the boundaries).

the additional benefit for using sorbet is that it makes making typed RPC calls (over twirp or graphql) much easier as you can introspect the boundaries trivially.

really cool to see other companies evolving similarly given the same starting conditions!

exterm 2 days ago [–]

There are quite a few people talking about this kind of stuff on https://rubymod.slack.com. I can send invites, just DM me on twitter https://twitter.com/_exterm

dragosmocrii 2 days ago [–]

Slightly off topic, but does anyone know if this "component based" development is what umbrella applications are in Elixir?

exterm 2 days ago [–]

It's certainly related. In very general terms, I would say splitting a Rails app into multiple engines is the same pattern as umbrella applications.

However, there are more interesting specifics here about things like all engines sharing a database, but having exclusive ownership of tables, as well as splitting HTTP routing over multiple engines etc.

Arubis 2 days ago [–]

I think you'll also find a lot of conceptual overlap with Phoenix Contexts; they'll generally all start as part of the same monolith/app but are sufficiently discrete that you can separate them out more easily than the Rails situation in TFA.

" -- https://news.ycombinator.com/item?id=24506078

---

https://danuker.go.ro/the-grand-unified-theory-of-software-architecture.html https://news.ycombinator.com/item?id=24915497

---

https://www.brandonsmith.ninja/blog/write-code-not-too-much-mostly-functions https://news.ycombinator.com/item?id=25500671

---

https://thereader.mitpress.mit.edu/habits-of-expert-software-designers/

---

https://timkellogg.me/blog/2021/01/29/cold-paths

---

https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html

examples: https://github.com/rust-analyzer/rust-analyzer/blob/d7c99931d05e3723d878bea5dc26766791fa4e69/docs/dev/architecture.md https://caddyserver.com/docs/architecture

baby 1 day ago [–]

I have a similar advice, but I will go one step further: add README.md to other folders as well. It is dope to have a map of your whole system in an Architecture.md (or a README if it's not too long), but it's even more dope to be able to click through it and have submaps of how other components are structured.

Displaying the folder/file structure and explaining what is what is a must. An example from Diem[1]:

  consensus
  ├── src
  │   ├── block_storage          # In-memory storage of blocks and related data structures
  │   ├── consensusdb            # Database interaction to persist consensus data for safety and liveness
  │   ├── liveness               # RoundState, proposer, and other liveness related code
  │   └── test_utils             # Mock implementations that are used for testing only
  ├── consensus-types            # Consensus data types (i.e. quorum certificates)
  └── safety-rules               # Safety (voting) rules

---

https://www.gresearch.co.uk/article/in-praise-of-dry-run/ https://news.ycombinator.com/item?id=27263136

---

some advice extracted from [2] (all of the following are direct quotes):

that blog post also implies that on balance, functions should be short, but there are many other factors to consider that may make them long

---

hackinthebochs 5 hours ago [–]

There's a lot of bad advice being tossed around in this thread. If you are worried about having to jump through multiple files to understand what some code is doing, you should consider that your naming conventions are the problem, not the fact that code is hidden behind functional boundaries.

Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Without these functional boundaries, you have to understand how every line of a function works, and then mentally model the entire graph of interactions at once, because of the potential for interactions between lines within a functional boundary. The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grow. Keeping methods short and hiding behavior behind well named functional boundaries is how you manage complexity in code.

The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

jzoch 3 hours ago [–]

> you have failed to sufficiently explain

This is the problem right here. I don't just read code I've written and I don't only read perfectly abstracted code. When I am stuck reading someone's code who loves the book and tries their best to follow those conventions I find it far more difficult - because I am usually reading their code to fully understand it myself (ie in a review) or to fix a bug I find it infuriating that I am jumping through dozens of files just so everything looks nice on a slide - names are great, I fully appreciate good naming but pretending that using a ton of extra files just to improve naming slightly isnt a hindrance is wild.

I will take the naming hit in return for locality. I'd like to be able to hold more than 5 lines of code in my head but leaping all over the filesystem just to see 3 line or 5 line classes that delegate to yet another class is too much.

---

" > Law of Demeter

"Don't go digging into objects" pretty much.

Talk to directly linked objects and tell them what you need done, and let them deal with their linked objects. Don't assume that you know what is and always will be involved in doing something on dependent objects of the one you're interacting with. "
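
A minimal Python sketch of the rule quoted above. `Wallet`, `Customer`, and the two `charge_*` functions are hypothetical names made up for illustration; they are not from the quoted comment.

```python
from dataclasses import dataclass


@dataclass
class Wallet:
    balance: int  # in cents

    def withdraw(self, amount: int) -> None:
        # The wallet guards its own invariant.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


@dataclass
class Customer:
    wallet: Wallet

    def pay(self, amount: int) -> None:
        # The customer decides how to pay; callers never see the wallet.
        self.wallet.withdraw(amount)


def charge_digging(customer: Customer, amount: int) -> None:
    # Digs into the customer's linked object and bypasses its checks.
    customer.wallet.balance -= amount


def charge_telling(customer: Customer, amount: int) -> None:
    # Tells the directly linked object what is needed.
    customer.pay(amount)
```

With `charge_telling`, replacing the wallet with some other payment mechanism only touches `Customer.pay`; with `charge_digging`, every caller has to change.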

---

"Write Code. Mostly functions. Not too much." [4]

---

try to validate inputs before doing anything side-effectful (my take on [5])
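
One way this could look in Python, as a sketch only. The `transfer` function and the account-balance dictionary are hypothetical examples, not taken from [5]. All checks run first, so a rejected call leaves the data untouched.

```python
def transfer(balances: dict[str, int], src: str, dst: str, amount: int) -> None:
    # Phase 1: validate everything. Nothing has been mutated yet,
    # so raising here cannot leave the balances half-updated.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if src not in balances or dst not in balances:
        raise KeyError("unknown account")
    if balances[src] < amount:
        raise ValueError("insufficient funds")

    # Phase 2: side effects, reached only when every check has passed.
    balances[src] -= amount
    balances[dst] += amount
```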

---

formatting style:

arguments for putting closing parens/braces on their own line:

https://lukasz.langa.pl/1d1a43c4-9c8a-4c5f-a366-7f22ce6a49fc/ https://lobste.rs/s/e4fxit/why_sad_face https://news.ycombinator.com/item?id=27267578

note: could also put closing parens/braces on their own line, but indented. I might prefer that.

---

"Instead of using length(list) > 0 - which traverses the whole list - use list.is_empty() which in Erlang you can implement using pattern matching with [_|_] which looks like a robot butt." [6]
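
Erlang lists are singly linked, which is why `length/1` has to walk every cell. The same cost difference can be sketched in Python with a hand-rolled cons list; the `cons`, `length`, and `is_nonempty` helpers below are hypothetical and only mimic the Erlang behaviour.

```python
def cons(head, tail):
    # A list cell is a (head, tail) pair; the empty list is None.
    return (head, tail)


def length(lst) -> int:
    # O(n): follows every tail pointer until the end of the list.
    n = 0
    while lst is not None:
        n += 1
        lst = lst[1]
    return n


def is_nonempty(lst) -> bool:
    # O(1): like matching [_|_], only the first cell is inspected.
    return lst is not None
```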

---

"
    1. Use immutability by default, even in languages that make that harder than it should be
    2. Understand how liberating idempotency is
    3. Divide and conquer, use abstraction to make hard problems manageable
    4. Delete code, delete tests, re-write stuff, re-write it again. Painful? Keep doing this till you get to the other side and feel liberation and empowerment, took me years to get past wincing at the idea of re-writing that thing AGAIN
    5. Pay attention to your build tooling - to do any given task, it should always take exactly 1 command. How do i run the tests? Run the test command. How do i deploy to non-prod? Run the deploy command. Etc
" -- Craig J Perry

---
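
A small sketch of what points 1 and 2 of Craig J Perry's list can look like together in Python. The `apply_payment` function and the ledger-as-dict representation are hypothetical, chosen only to illustrate the idea.

```python
def apply_payment(ledger: dict[str, int], payment_id: str, amount: int) -> dict[str, int]:
    """Return a ledger that includes the payment, leaving the input untouched."""
    if payment_id in ledger:
        # Idempotent: a retry of an already-recorded payment is a no-op.
        return ledger
    # Immutable style: build a new dict instead of mutating the argument.
    return {**ledger, payment_id: amount}
```

Because a repeated call changes nothing, a caller that is unsure whether an earlier attempt succeeded can simply retry.

---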

https://orkhanscience.medium.com/software-architecture-patterns-5-mins-read-e9e3c8eb47d2

"
    Layered architecture
    Event-driven architecture
    Microkernel architecture (or Plugin architecture)
    Microservices architecture
    Space-based architecture (or Cloud architecture pattern)
"

more detail:

https://www.oreilly.com/content/software-architecture-patterns/ https://www.oreilly.com/library/view/fundamentals-of-software/9781492043447/

---

Ask HN: Are there any openly available software architecture documents? https://news.ycombinator.com/item?id=22011743

Ask HN: What are good resources to learn system design? https://news.ycombinator.com/item?id=24762734

---

great intro:

https://robertheaton.com/2020/04/06/systems-design-for-advanced-beginners/

---

"One strain of Unix thinking emphasizes small sharp tools, starting designs from zero, and interfaces that are simple and consistent. This point of view has been most famously championed by Doug McIlroy. Another strain emphasizes doing simple implementations that work, and that ship quickly, even if the methods are brute-force and some edge cases have to be punted. Ken Thompson’s code and his maxims about programming have often seemed to lean in this direction." -- Tradeoffs between Interface and Implementation Complexity in The Art of Unix Programming, by Eric S. Raymond

an essay on where this fails and how you decide when to build small separate tools vs when to build an integrated system:

https://brandur.org/small-sharp-tools

---

" Fred Brooks described conceptual integrity as “the most important consideration in system design” and gave three principles upon which it is based:

    orthogonality – individual functions (concepts) should be independent of one another
    propriety – a product should have only the functions (concepts) essential to its purpose and no more, and
    generality – a single function (concept) should be usable in many ways.

The authors" (of What’s wrong with Git? A conceptual design analysis, by De Rosso & Jackson, Onward! 2013, I think) "add a fourth: consistency – requiring for example that actions behave in a similar way irrespective of the arguments they are presented with or the states in which they are invoked. " -- [7]

---

https://programmingisterrible.com/post/176657481103/repeat-yourself-do-more-than-one-thing-and https://lobste.rs/s/ibticd/repeat_yourself_do_more_than_one_thing

---

This thesis provides empirical support that a specific measure of software architectural complexity is costly:

System design and the cost of architectural complexity https://dspace.mit.edu/handle/1721.1/79551

Specifically, they look through the source code in an automated manner and construct the graph whose nodes are source code files and whose edges are the following cross-file relationships (page 73, section 5.1.2.1):

"The directionality of these arrows was chosen based on the likely direction of change propagation. (Change actually propagates in the opposite direction of the arrows given the way we have chosen to draw them.) In all cases, a change in the entity being "pointed to" had a reasonable chance of requiring a change in the entity from which the arrow originates. The "to" node is a file which defines an interface, provides functionality, or defines the structure of data that the "from" node relies upon"

Then they compute the transitive closure of this graph.

Then they compute two metrics for each node by looking at the transitive closure graph (page 76, section 5.1.2.3):

    Visibility Fan In (VFI): how many other nodes have edges that go from the other node to this node?
    Visibility Fan Out (VFO): how many other nodes have edges that go from this node to the other node?

They observe that by looking at the VFI metric across various files, files tend to sharply cluster into either 'low VFI' or 'high VFI', and similarly for VFO (although some files may be high in one metric and low in the other) (page 79, section 5.1.3).

They then classify each file as:

    low VFI, low VFO: 'peripheral'
    high VFI, low VFO: 'utility'
    low VFI, high VFO: 'control'
    high VFI, high VFO: 'core'

They then find that 'core' files are the most costly, in terms of defect rate, developer productivity, and probability of developer leaving the firm.
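
The procedure summarized above can be sketched in Python as follows. This follows these notes rather than the thesis's exact definitions: a node is not counted as visible to itself, and the cutoffs passed to `classify` are hypothetical parameters (the thesis derives the low/high split from the observed clustering). All function names are made up for illustration.

```python
def transitive_closure(edges: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each file to every *other* file it depends on, directly or indirectly."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    closure = {}
    for start in nodes:
        seen = set()
        stack = list(edges.get(start, ()))
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(edges.get(node, ()))
        seen.discard(start)  # a cycle can lead back to start; count other nodes only
        closure[start] = seen
    return closure


def visibility(edges: dict[str, set[str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (VFI, VFO) per file, computed on the transitive closure."""
    closure = transitive_closure(edges)
    vfo = {node: len(reach) for node, reach in closure.items()}
    vfi = {node: 0 for node in closure}
    for reach in closure.values():
        for target in reach:
            vfi[target] += 1
    return vfi, vfo


def classify(vfi: int, vfo: int, vfi_cut: int, vfo_cut: int) -> str:
    # vfi_cut and vfo_cut are assumed thresholds separating 'low' from 'high'.
    high_in, high_out = vfi >= vfi_cut, vfo >= vfo_cut
    if high_in and high_out:
        return "core"
    if high_in:
        return "utility"
    if high_out:
        return "control"
    return "peripheral"
```

Edges point from the depending file to the file it relies on, matching the quoted "from"/"to" convention, so a change in a high-VFI file can propagate to many others.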

---

https://loup-vaillant.fr/articles/physics-of-readability

---

https://matklad.github.io/2023/11/15/push-ifs-up-and-fors-down.html https://news.ycombinator.com/item?id=38282950

---

see also [[notes--books--beautifulArchitecture]].