notes-computer-programming-programmingTips

"Rob Pike's 5 Rules of Programming

    Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
    Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
    Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)
    Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
    Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Pike's rules 1 and 2 restate Tony Hoare's famous maxim "Premature optimization is the root of all evil." Ken Thompson rephrased Pike's rules 3 and 4 as "When in doubt, use brute force." Rules 3 and 4 are instances of the design philosophy KISS. Rule 5 was previously stated by Fred Brooks in The Mythical Man-Month. Rule 5 is often shortened to "write stupid code that uses smart objects". " -- http://users.ece.utexas.edu/~adnan/pike.html
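Pike's first two rules are actionable with a profiler before any tuning; a minimal Python sketch (the functions and their relative costs are invented for illustration):

```python
import cProfile
import io
import pstats

def slow_part():
    # Deliberately heavy: this is where the time actually goes.
    return sum(i * i for i in range(50_000))

def fast_part():
    # Looks like it might matter, but it is negligible in practice.
    return sum(range(100))

def program():
    for _ in range(20):
        slow_part()
    fast_part()

# Measure first (Rule 2); only then decide what, if anything, to tune.
profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

In the printed report, slow_part dominates cumulative time; that measurement, not intuition, is the evidence Rule 1 says you need before putting in any speed hack.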


how to familiarize yourself with a new codebase

https://news.ycombinator.com/item?id=9784008

https://news.ycombinator.com/item?id=8263402

http://stackoverflow.com/questions/215076/whats-the-best-way-to-become-familiar-with-a-large-codebase

"

I wrote some simple bash scripts around git which allow me to very quickly identify the most frequently-edited files, the most recently-edited files, the largest files, etc.

https://github.com/gilesbowkett/rewind

it's for assessing a project on day one, when you join, especially for "rescue mission" consulting. it's most useful for large projects.

the idea is, you need to know as much as possible right away. so you run these scripts and you get a map which immediately identifies which files are most significant. if it's edited frequently, it was edited yesterday, it was edited on the day the project began, and it's a much bigger file than any other, that's obviously the file to look at first.

rch 7 hours ago

...

It might be nice to surface files that are frequently edited together as well. "
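The gist of those scripts can be approximated with plain git one-liners (a sketch of the idea, not the actual rewind code):

```shell
# Most frequently edited files -- a rough churn map of the repo.
git log --format= --name-only | grep . | sort | uniq -c | sort -rn | head -20

# Files touched in the last two weeks.
git log --format= --name-only --since=2.weeks | grep . | sort -u

# Largest tracked files by line count (ignore the trailing "total" lines).
git ls-files | xargs wc -l 2>/dev/null | sort -rn | head -20
```

Cross-referencing the three lists gives the "look at this first" map the comment describes: a file that is large, old, and edited constantly is almost always central.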

JustSomeNobody 4 hours ago

1. I make sure I can build and run it. I don't move past this step until I can. Period.

After that, if I don't have a particular bug I'm looking to fix or feature to add, I just go spelunking. I pick out some interesting feature and study it. I use pencil and paper to make copious notes. If there's a UI, I may start tracing through what happens when I click on things. I do this, again, with pencil and paper first. This helps me use my mind to reason about what the code is doing instead of relying on the computer to tell me.

If I'm working on a bug, I'll first try to recreate it, again taking copious notes in pencil and paper documenting what I've tried. Once I've found how to recreate it, I clean up my notes into legible recreate steps and make sure I can recreate it using those steps. These steps are later included in the bug tracker. Next I start tracing through the code taking copious notes, etc, etc. yada yada. You get the picture.

reply

 scott_s 7 hours ago

A post from last year, "Strategies to quickly become productive in an unfamiliar codebase": https://news.ycombinator.com/item?id=8263402

My comment from that thread:

I do the deep-dive.

I start with a relatively high level interface point, such as an important function in a public API. Such functions and methods tend to accomplish easily understandable things. And by "important" I mean something that is fundamental to what the system accomplishes.

Then you dive.

Your goal is to have a decent understanding of how this fundamental thing is accomplished. You start at the public facing function, then find the actual implementation of that function, and start reading code. If things make sense, you keep going. If you can't make sense of it, then you will probably need to start diving into related APIs and - most importantly - data structures.

This process will tend to have a point where you have dozens of files open, which have non-trivial relationships with each other, and they are a variety of interfaces and data structures. That's okay. You're just trying to get a feel for all of it; you're not necessarily going for total, complete understanding.

What you're going for is that Aha! moment where you can feel confident in saying, "Oh, that's how it's done." This will tend to happen once you find those fundamental data structures, and have finally pieced together some understanding of how they all fit together. Once you've had the Aha! moment, you can start to trace the results back out, to make sure that is how the thing is accomplished, or what is returned. I do this with all large codebases I encounter that I want to understand. It's quite fun to do this with the Linux source code.

My philosophy is that "It's all just code", which means that with enough patience, it's all understandable. Sometimes a good strategy is to just start diving into it.

reply

monk_e_boy 7 hours ago

Debugger! Surprised no one has mentioned it yet. I work in JS and PHP, and in both I use the debugger a lot.

Set a breakpoint and burn through the code. Chrome has some really nice features - you can tell it to skip over files (like jQuery), and you can open the console, poke around, and set variables to see what happens.

Stepping through the code line by line for a few hours will soon show you the basics.

reply

kabdib 7 hours ago

I just crack open the source base with Emacs, and start writing stuff down.

I use a large format (8x11 inch) notebook and start going through the abstractions file by file, filling up pages with summaries of things. I'll often copy out the major classes with a summary of their methods, and arrows to reflect class relationships. If there's a database involved, understanding what's being stored is usually pretty crucial, so I'll copy out the record definitions and make notes about fields. Call graphs and event diagrams go here, too.

After identifying the important stuff, I read code, and make notes about what the core functions and methods are doing. Here, a very fast global search is your friend, and "where is this declared?" and "who calls this?" are best answered in seconds. A source-base-wide grep works okay, but tools like Visual Assist's global search work better; I want answers fast.

Why use pen and paper? I find that this manual process helps my memory, and I can rapidly flip around in summaries that I've written in my own hand and fill in my understanding quite quickly. Usually, after a week or so I never refer to the notes again, but the initial phase of boosting my short term memory with paper, global searches and "getting my hands to know the code" works pretty well.

Also, I try to get the code running and fix a bug (or add a small feature) and check the change in, day one. I get anxious if I've been in a new code base for more than a few days without doing this.

reply

rymndhng 5 hours ago

Totally agree with the point of pen/paper.

Something that complements that approach is in-code annotation. Recently I've been trying out https://github.com/bastibe/annotate.el which is pretty sweet. Check it out!

reply

kabdib 59 minutes ago

annotate.el looks pretty interesting, thank you.

reply

 amenghra 22 minutes ago

When you find interesting pieces of code, look at the commit that brought it to life. Commits contain precious gems of information: you'll understand what files are related, who worked on which parts of the codebase, how the commit was tested, related discussions, etc.

Some people use graphical tools to visualize a codebase (e.g. codegraph). It can help you understand what pieces of code are related to each other.

reply

Mithaldu 8 hours ago

This may or may not apply to you, since I work with Perl. Typically I'm in a situation where I'm supposed to improve on code written by developers with less time under their belt.

As such my first steps are:

1. tidy/beautify all the code in accordance with a common standard

2. read through all of it, while making the code clearer (split up if/elsif/else christmas trees, make functions smaller, replace for loops with list processing)

While doing that I add todo comments, which usually come with questions like "what the fuck is this?" and make myself tickets with future tasks to clean up the codebase.

By the end of it I've looked at everything once, have a whole bunch of stuff to do, and have at least a rough understanding of what it does.

reply
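The mechanical cleanups in step 2 translate to most languages; a Python stand-in for the Perl originals (the function and data are invented for illustration):

```python
# Before: accumulate results with an index-based loop.
def doubled_evens_loop(xs):
    out = []
    for i in range(len(xs)):
        if xs[i] % 2 == 0:
            out.append(xs[i] * 2)
    return out

# After: the same behavior as list processing -- shorter and
# harder to get off-by-one errors into.
def doubled_evens(xs):
    return [x * 2 for x in xs if x % 2 == 0]

# The rewrite is safe only if both forms agree on sample input.
assert doubled_evens([1, 2, 3, 4]) == doubled_evens_loop([1, 2, 3, 4]) == [4, 8]
```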

jeremiep 5 hours ago

Neither of those is possible with a large codebase.

It would take months merely to tidy the code, with the side effect of making the rest of the team hate you for committing thousands of files of superficial changes. It's much more productive for everyone to simply adapt to the existing style guidelines.

Reading all of the code is only an option for the smallest of codebases. Reading will only get you so far before you get lost in the complexity of how all the parts interact with each other.

A better approach would be to limit yourself to a subset of the codebase and start poking around with a debugger while the system is running. Then you can gradually work your way through the codebase starting from the core functionality.

reply

 lukaslalinsky 7 hours ago

Please don't take this as a criticism, but how long have you been programming? I'm asking because I used to have an opinion like this when I was just starting, but after a few years I realized that changing all of the code as the first thing is one of the worst things to do.

reply

 fourier 2 hours ago

I work a lot with huge legacy codebases in C/C++. Here is some advice:

1. Be sure that you can compile and run the program.

2. Have good tools to navigate around the code (I use git grep mostly).

3. Most apps contain some user or other service interaction - pick an easy bit (like a request for capabilities or some simple operation) and follow it through to the end. You don't need a debugger for this - grep/git grep is enough, and these simple tools will force you to understand the codebase deeply.

4. Sometimes writing UML diagrams works.

5. If possible, use a debugger, starting with the main() function.

reply

PirateDave 8 minutes ago

4. Sometimes writing UML diagrams works

I find myself doing this more often and it is very useful. I've been using Freemind and it seems to do the trick. http://freemind.sourceforge.net/wiki/index.php/Main_Page

reply

 gshx 5 hours ago

I start with running the tests if there are any. Typically peeling layers of the onion starting with the boundary. If there are no tests, then I'll try to write them. Then running tests in debug mode helps step through the code. If I have the luxury of asking questions to an engineer experienced with the codebase, I request a high level whiteboarding session all the while being cognizant of their time.

Some others have mentioned recency/touchTime as another signal. For large complex codebases, that may or may not always work.

reply

jajaBinks 1 hour ago

For a large C/C++ code base, I use an editor called Source Insight. This is the most invaluable tool for navigating code I've come across in my 3-year career as a software developer. I work at a very large software company, and there are several code bases running into millions of lines of C/C++ code. My previous team had 60,000+ files, with the largest file being about 12k LOC.

If you have access to logs from a production service/component, I find TextAnalyzer.net quite invaluable. I take an example 500 MB log dump, open it in TextAnalyzer.net, and just scroll through the logs (often jumping around, following code paths, etc.) while keeping the source code side by side. This lets me understand the execution flow, and is typically faster than attaching a debugger. If it's a multi-threaded program, the debugger is hard to work with - logs are your best friend. You are lucky if the log has thread information (like a threadId).

reply

http://stackoverflow.com/questions/215076/whats-the-best-way-to-become-familiar-with-a-large-codebase

"

What's the best way to become familiar with a large codebase? [closed]

Joining an existing team with a large codebase already in place can be daunting. What's the best approach?

    Broad; try to get a general overview of how everything links together, from the code
    Narrow; focus on small sections of code at a time, understanding how they work fully
    Pick a feature to develop and learn as you go along
    Try to gain insight from class diagrams and uml, if available (and up to date)
    Something else entirely?

I'm working on what is currently an approx 20k line C++ app & library (Edit: small in the grand scheme of things!). In industry I imagine you'd get an introduction by an experienced programmer. However if this is not the case, what can you do to start adding value as quickly as possible?

-- Summary of answers:

    Step through code in debug mode to see how it works
    Pair up with someone more familiar with the code base than you, taking turns to be the person coding and the person watching/discussing. Rotate partners amongst team members so knowledge gets spread around.
    Write unit tests. Start with an assertion of how you think the code will work. If it turns out as you expected, you've probably understood the code. If not, you've got a puzzle to solve and/or an enquiry to make. (Thanks Donal, this is a great answer)
    Go through existing unit tests for functional code, in a similar fashion to above
    Read UML, Doxygen generated class diagrams and other documentation to get a broad feel of the code.
    Make small edits or bug fixes, then gradually build up
    Keep notes, and don't jump in and start developing; it's more valuable to spend time understanding than to generate messy or inappropriate code.

"

http://stackoverflow.com/questions/146936/what-can-you-do-to-a-legacy-codebase-that-will-have-the-greatest-impact-on-impro?rq=1

" up vote 30 down vote accepted

    Read Michael Feathers' book "Working Effectively with Legacy Code"

This is a GREAT book.

If you don't like that answer, then the best advice I can give would be:

    First, stop making new legacy code[1]

[1]: Legacy code = code without unit tests and therefore an unknown

Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what effect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.

NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary, just test around the areas you need to test and work out from there.

Jimmy Bogard and Ray Houston did an interesting screencast on a subject very similar to this: http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx

answered Sep 28 '08 at 22:57 by chadmyers

"

" up vote 18 down vote

I work with a legacy 1M LOC application written and modified by about 50 programmers.

Almost useless... just ignore it. You won't get a big Return On Investment (ROI) from that one.

Actually, when I fix something I always search for duplicates. If I find some, I extract a generic function, or comment all occurrences of the duplication (sometimes the effort of extracting a generic function isn't worth it). The main idea is that I hate doing the same action more than once. Another reason is that there's always someone (could be me) who forgets to check the other occurrences...

Automated unit tests are wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issues. Go with the part you are working on and hope that in a few years you'll have decent coverage.

IMO the difference in formatting is part of the legacy. It gives you a hint about who wrote the code, and when. That can give you some clue about how to behave in that part of the code. Reformatting isn't fun, and it doesn't give any value to your customer.

Do it only if there are really nice new features, or if the version you have is not supported by the new operating system.

It can be worth it. Sometimes a warning can hide a potential bug. "
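The extract-a-generic-function habit mentioned above might look like this minimal Python sketch (the validation logic is invented for illustration):

```python
# Duplicated at several call sites before the cleanup (illustrative):
#
#   if name is None or not name.strip():
#       raise ValueError("bad name")

# Extracted once, so future fixes happen in a single place and no
# occurrence gets forgotten.
def require_nonempty(value, label):
    if value is None or not value.strip():
        raise ValueError(f"{label} must be non-empty")
    return value.strip()

assert require_nonempty("  Ada ", "name") == "Ada"
try:
    require_nonempty("   ", "name")
except ValueError as err:
    assert "name" in str(err)
```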


" What have I learnt? • Writing a program is difficult • Writing a correct program is even more so • Writing a publishable program is exacting • Programs are not written. They grow! • Controlling growth needs much discipline • Reducing size and complexity is the triumph • Programs must not be regarded as code for computers, but as literature for humans " -- Niklaus Wirth, http://wirth-symposium.ethz.ch/slides/wirth.pdf

---

"

henrik_w 104 days ago

A few habits I've found that work well for me:

1. Start small, then extend.

2. Change one thing at a time.

3. Add logging and error handling early.

4. All new lines must be executed at least once.

5. Test the parts before the whole.

6. Fix the known errors, then see what’s left.

Taken from here: https://henrikwarne.com/2015/04/16/lessons-learned-in-software-development/ "
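Habit 3, adding logging and error handling early, might look like this minimal Python sketch (the config-loading function is invented for illustration):

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

def load_config(path):
    # Error handling and logging are built in from the first version,
    # not bolted on after the first production incident.
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        log.exception("could not read config %s; using defaults", path)
        return ""
```

Calling `load_config` on a missing path logs the failure with a traceback and falls back to defaults instead of crashing.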

---

https://bestpractices.coreinfrastructure.org/ https://github.com/coreinfrastructure/best-practices-badge/blob/master/doc/criteria.md

---