Bayle Shanks's website: notes-computer-programming-programmingLanguageDesign-prosAndCons-C

opinions

http://herbsutter.com/2011/10/12/dennis-ritchie/ notes that the real achievement of C was portability while still being close to the metal

https://docs.google.com/presentation/d/1h49gY3TSiayLMXYmRMaAEMl05FaJ-Z6jDOWOz3EsqqQ/preview?usp=sharing&sle=true#slide=id.p

" K: C is the best balance I've ever seen between power and expressiveness. You can do almost anything you want to do by programming fairly straightforwardly and you will have a very good mental model of what's going to happen on the machine; you can predict reasonably well how quickly it's going to run, you understand what's going on and it gives you complete freedom to do whatever you want. C doesn't put constraints in your way, it doesn't force you into using a particular programming style; on the other hand, it doesn't provide lots and lots of facilities, it doesn't have an enormous library, but in terms of getting something done with not too much effort, I haven't seen anything to this day that I like better. There are other languages that are nice for certain kinds of applications, but if I were stuck on a desert island with only one compiler I'd want a C compiler.

M: Actually C is also my favorite programming language, and I've written a lot of programs in it, but since I began writing compilers for C, I have to confess I've begun to like it less. Some things are very hard to optimize. Can you tell us about the worse features of C, from your point of view?

K: I can't comment on the "worse", but remember, C is entirely the work of Dennis Ritchie, I am but a popularizer and in particular I cannot say what is easier or hard to compile in C. There are some trivial things that are wrong with C: the switch statement could have been better designed, the precedences of some operators are wrong, but those are trivial things and everybody's learned to live with them. I think that the real problem with C is that it doesn't give you enough mechanisms for structuring really big programs, for creating "firewalls" within programs so you can keep the various pieces apart. It's not that you can't do all of these things, that you can't simulate object-oriented programming or other methodology you want in C. You can simulate it, but the compiler, the language itself isn't giving you any help. But considering that this is a language which is almost 30 years old now and was created when machines were tiny compared to what they are today, it's really an amazing piece of work and has stood the test of time extremely well. There's not much in it that I would change.

Sometimes I do write C++ instead of C. C++ I think is basically too big a language, although there's a reason for almost everything that's in it. When I write a C program of any size, I probably will wind-up using 75, 80, 90% of the language features. In other words, most of the language is useful in almost any kind of program. By contrast, if I write in C++ I probably don't use even 10% of the language, and in fact the other 90% I don't think I understand. In that sense I would argue that C++ is too big, but C++ does give you many of the things that you need to write big programs: it does really make it possible for you to create objects, to protect the internal representation of information so that it presents a nice facade that you can't look behind. C++ has an enormous amount of mechanism that I think is very useful, and that C doesn't give you. " -- http://www.cs.cmu.edu/~mihaib/kernighan-interview/

" AK In the 1960s Ted Steele spent several years promoting an idea called UNCOL (universal computer-oriented language), and, to me, by a weird and interesting process—mainly because it’s easy to implement—C turned out to be UNCOL. I don’t think any human being should write in it, but it’s a great target for anybody who wants to do multiplatform things—especially if you pick the right subset.

The problem with the Cs, as you probably know if you’ve fooled around in detail with them, is that they’re not quite kosher as far as their arithmetic is concerned. They are supposed to be, but they’re not quite up to the IEEE standards. You have to pick a subset of C and you have to have some side information to get to a mathematically perfect transform of your VM. " -- alan kay, http://queue.acm.org/detail.cfm?id=1039523

" Keep the spirit of C. The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like

    Trust the programmer.
    Don't prevent the programmer from doing what needs to be done.
    Keep the language small and simple.
    Provide only one way to do an operation.
    Make it fast, even if it is not guaranteed to be portable.

The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule. An example of this willingness to live with what the machine does can be seen in the rules that govern the widening of char objects for use in expressions: whether the values of char objects widen to signed or unsigned quantities typically depends on which byte operation is more efficient on the target machine.

One of the goals of the Committee was to avoid interfering with the ability of translators to generate compact, efficient code. In several cases the Committee has introduced features to improve the possible efficiency of the generated code; for instance, floating point operations may be performed in single-precision if both operands are float rather than double. "
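
A small C sketch of the char-widening point in the quote above. The function names are made up for illustration, and it assumes 8-bit chars and the usual wrap-around conversion to a signed char; it is not taken from the Rationale itself.

```c
#include <limits.h>

/* Report which choice this implementation made for plain char. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widen a plain char holding the bit pattern 0xFF to int.
 * The result depends on whether plain char is signed here:
 * typically -1 if signed, 255 if unsigned. */
int widen_high_char(void)
{
    char c = (char)0xFF;   /* implementation-defined when char is signed */
    return c;              /* integer promotion: char -> int */
}

/* Going through unsigned char gives the same answer on either kind
 * of implementation (255 with 8-bit chars). */
int widen_as_unsigned(char c)
{
    return (unsigned char)c;
}
```

So code that needs a particular answer has to say `signed char` or `unsigned char` explicitly; plain `char` gets whatever the target machine does cheaply.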

" At the WG14 meeting in Tokyo, Japan, in July 1994, the original principles were re-endorsed and the following new ones were added:

Support international programming. During the initial standardization process, support for internationalization was something of an afterthought. Now that internationalization has become an important topic, it should have equal visibility. As a result, all revision proposals shall be reviewed with regard to their impact on internationalization as well as for other technical merit.

Codify existing practice to address evident deficiencies. Only those concepts that have some prior art should be accepted. (Prior art may come from implementations of languages other than C.) Unless some proposed new feature addresses an evident deficiency that is actually felt by more than a few C programmers, no new inventions should be entertained.

Minimize incompatibilities with C90 (ISO/IEC 9899:1990). It should be possible for existing C implementations to gradually migrate to future conformance, rather than requiring a replacement of the environment. It should also be possible for the vast majority of existing conforming programs to run unchanged.

Minimize incompatibilities with C++. The Committee recognizes the need for a clear and defensible plan for addressing the compatibility issue with C++. The Committee endorses the principle of maintaining the largest common subset clearly and from the outset. Such a principle should satisfy the requirement to maximize overlap of the languages while maintaining a distinction between them and allowing them to evolve separately. The Committee is content to let C++ be the big and ambitious language. While some features of C++ may well be embraced, it is not the Committee’s intention that C become C++.

Maintain conceptual simplicity. The Committee prefers an economy of concepts that do the job. Members should identify the issues and prescribe the minimal amount of machinery that will solve the problems. The Committee recognizes the importance of being able to describe and teach new concepts in a straightforward and concise manner. "

---

this presentation http://www.slideshare.net/olvemaudal/deep-c (and its comments) shows several places where C and C++ use a different design criterion than Python's "refuse the temptation to guess". E.g.:

if you declare your own printf with the wrong signature in C, it will still be linked to the printf in the std library, but will crash at runtime, e.g.

    void printf(int x, int y);
    main() { int a = 42, b = 99; printf(a, b); }

static vars (but not automatic/local vars) are initialized to 0 by default. the supposed rationale for this difference is that since the static var only has to be zeroed once, this is not a very large speed cost, but it would cost a lot to zero every variable
the compiler allows you to read from uninitialized vars without a warning in some cases (if optimization is on then you get a warning)
code like "int a = 41; a = a++" has undefined behavior because "you can only update a variable once between sequence points"; on many compilers it appears to work anyway
- a sequence point is "a point in the program's execution sequence where all previous side effects SHALL have taken place and all subsequent side-effects SHALL NOT have taken place"
- "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. (6.5) a = a++ this is undefined!"
- "Furthermore, the prior value shall be read only to determine the value to be stored. (6.5) a + a++ this is undefined!! "
the evaluation order of expressions is unspecified. so code like "a = b() + c()" can call b() and c() in any order. If they have side effects then this might matter, yet no compiler error is given.
however the evaluation order of a() && b() IS specified
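
To make the evaluation-order point concrete, here is a minimal C sketch. All names (record, call_log, sum_ordered, ...) are invented for illustration and do not come from the presentation. Each function logs its own name, so the log shows the order the compiler picked.

```c
#include <stddef.h>

/* Log of which functions ran, in call order. */
static char call_log[8];
static size_t call_count;

void reset_log(void) { call_count = 0; call_log[0] = '\0'; }

int record(char name, int value)
{
    if (call_count < sizeof call_log - 1) {
        call_log[call_count++] = name;
        call_log[call_count] = '\0';
    }
    return value;
}

int b(void) { return record('b', 1); }
int c(void) { return record('c', 2); }

/* Order of the two calls is unspecified: the log may read "bc" or "cb". */
int sum_unspecified(void) { reset_log(); return b() + c(); }

/* Separate statements force b() before c(): the log always reads "bc". */
int sum_ordered(void)
{
    reset_log();
    int left = b();
    int right = c();
    return left + right;
}

/* && evaluates left to right and short-circuits: c() never runs here. */
int and_short_circuit(void)
{
    reset_log();
    return record('z', 0) && c();
}

const char *last_call_log(void) { return call_log; }
```

The sum is 3 either way; only the side effects (here, the log) can tell the two orders apart, which is exactly why this kind of bug stays hidden.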

also, if the standard says that source code must end with a newline, that's a bit silly imo

also, the presentation says "C has very few sequence points. This helps to maximize optimization opportunities for the compiler." -- this is a tradeoff of optimization vs. principle of least surprise.

also, if you have a variable defined outside any function, it is statically allocated whether or not it has the 'static' keyword. but there the 'static' keyword is used as an access modifier that controls visibility to other compilation units! 'static' means 'local to this compilation unit, not visible to the linker', and the default is 'visible to the linker, can be accessed from other compilation units'! this is confusing!
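
A short C sketch of the two meanings of 'static' described above. The variable and function names are hypothetical, picked only for this example.

```c
/* File scope with 'static': internal linkage. Other compilation units
 * cannot refer to this name, but it lives for the whole program. */
static int hidden_total;

/* File scope without 'static': same lifetime, but external linkage,
 * so another compilation unit could declare `extern int visible_total;`. */
int visible_total;

/* Block scope with 'static': here the keyword changes lifetime, not
 * linkage. The counter survives between calls but can only be named
 * inside this function. */
int next_id(void)
{
    static int counter;   /* starts at 0, initialized once */
    return ++counter;
}

void add_to_totals(int n)
{
    hidden_total += n;
    visible_total += n;
}

/* Only code in this file can reach hidden_total, so expose an accessor. */
int get_hidden_total(void) { return hidden_total; }
```

All three variables are statically allocated; the keyword only decides who can see the name (at file scope) or how long the value lasts (at block scope).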

another note: if the designer of the language would rate their knowledge of it as a "7", then it's too big

another note:

    #include <iostream>
    struct X {
        int a; char b; int c;
        void set_value(int v) { a = v; }
        int get_value() { return a; }
        void increase_value() { a++; }
    };

    int main() { std::cout << sizeof(X) << std::endl; }

prints the same result as

    #include <iostream>
    struct X { int a; char b; int c; };

    int main() { std::cout << sizeof(X) << std::endl; }

because the former struct X defn is just a shortcut for

    struct X { int a; char b; int c; };

    void set_value(struct X * this, int v) { this->a = v; }
    int get_value(struct X * this) { return this->a; }
    void increase_value(struct X * this) { this->a++; }

however this is somewhat confusing because the syntax makes it look as if the struct actually held three functions as data elements.

if you change any or all of those function defns to virtual functions, e.g.

    struct X {
        int a; char b; int c;
        virtual void set_value(int v) { a = v; }
        virtual int get_value() { return a; }
        void increase_value() { a++; }
    };

now a (single) pointer to a vtable is added to the struct
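
A rough C imitation of that layout, written by hand. The names XV, XV_vtable and xv_init are invented for this sketch; a real C++ compiler's vtable layout is implementation-specific and may differ. The point is only that each object carries one pointer, however many "virtual" functions the shared table holds.

```c
/* Plain data: free functions taking a pointer add nothing to its size. */
struct X {
    int a;
    char b;
    int c;
};

void x_set_value(struct X *self, int v) { self->a = v; }
int  x_get_value(const struct X *self)  { return self->a; }

/* Hand-rolled stand-in for a vtable: one table shared by all objects. */
struct XV;
struct XV_vtable {
    void (*set_value)(struct XV *self, int v);
    int  (*get_value)(const struct XV *self);
};

/* "Virtual" version: the object gains a single pointer to the table. */
struct XV {
    const struct XV_vtable *vtable;
    int a;
    char b;
    int c;
};

static void xv_set_value(struct XV *self, int v) { self->a = v; }
static int  xv_get_value(const struct XV *self)  { return self->a; }

static const struct XV_vtable xv_vtable = { xv_set_value, xv_get_value };

void xv_init(struct XV *self)
{
    self->vtable = &xv_vtable;
    self->a = 0;
    self->b = 0;
    self->c = 0;
}
```

A call then looks like `obj.vtable->set_value(&obj, 9)`, which is roughly the indirection a virtual call pays for.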

---

" Basically, coding in C/C++ requires an intimate knowledge of how the compiler works, and sometimes even how the target chip works. C/C++ software projects of notable size oftentimes cannot be simply recompiled/linked to a different target architecture without changing things like compiler flags, the makefile, and even the application code itself. That makes it a non-portable, leaky abstraction, and dramatically increases the amount of knowledge that's required of a C/C++ developer. "

I think this presentation is interesting from a programming language design point of view; if you were designing a new language, this presentation highlights various 'gotchas' in C and C++ that might be the sort of thing you would try to avoid creating in your new language.

Many of these fall under the heading of opportunities to apply the Pythonic design criterion "refuse the temptation to guess":

in C, according to one of the comments, it was claimed (i didn't check) that if you declare your own printf with the wrong signature, it will still be linked to the printf in the std library, but will crash at runtime, e.g. "void printf( int x, int y); main() {int a=42, b=99; printf( a, b);}" will apparently crash.
- A new programming language might want to throw a compile-time error in such a case (as C++ apparently does, according to the slides).
In C, depending on compiler options, you can read from an uninitialized variable without a warning
- A new programming language might want to not auto-initialize any variables, and to throw a compile-time error if they are used before initialization.
In C, code like "int a = 41; a = a++" apparently compiles but has undefined behavior because "you can only update a variable once between sequence points"; on many compilers it appears to work anyway. A sequence point is "a point in the program's execution sequence where all previous side effects SHALL have taken place and all subsequent side-effects SHALL NOT have taken place".
- A new programming language might want to throw a compile-time error in such a case
In C, the evaluation order of expressions is unspecified. so code like "a = b() + c()" can call b() and c() in any order. If they have side effects then this might matter, yet no compiler error is given. However, the evaluation order of a() && b() IS specified.
- A new programming language might want to throw a compile-time error when side-effectful code is called in a context in which the order of evaluation is unspecified.
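
For the "a = a++" item, one way to stay inside the rules is to spell the intent out so that each statement modifies the variable at most once. A minimal sketch in C, with made-up helper names:

```c
/* Each helper modifies the variable at most once per statement, so none
 * of them trips the "modified twice between sequence points" rule that
 * makes `a = a++` undefined. */

/* Intent: "add one to a". */
int increment(int a)
{
    a++;            /* one side effect, then a sequence point at ';' */
    return a;
}

/* Intent: "hand back the old value, then add one" (what a++ yields). */
int post_increment(int *a)
{
    int old = *a;   /* read ... */
    *a = old + 1;   /* ... then a single write in its own statement */
    return old;
}

/* Intent of `a + a++`, with the order of the two reads made explicit. */
int sum_then_increment(int *a)
{
    int first = *a;
    int second = post_increment(a);
    return first + second;
}
```

This is more typing than the one-liner, but the result no longer depends on what a particular compiler happens to do.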

Other miscellaneous gotchas:

In C, static vars (but not other vars) are initialized to 0 by default.
- A new programming language might want to either auto-initialize all variables, or to not auto-initialize any variables.
The presentation says "C has very few sequence points. This helps to maximize optimization opportunities for the compiler." This is a tradeoff between optimization and the principle of least surprise.
- A new programming language which wanted to make things as simple as possible would maximize 'sequence points', putting them in between practically every computation step. But some new programming languages would choose to minimize sequence points in order to allow the compiler to optimize as much as possible.
The presentation says that the standard says that source code must end with a newline.
- Imo that's a bit pedantic and the ideal programming language would not care if code ended in a newline.
In one context (inside a function), the 'static' keyword is used to make a variable persist across calls to that function. But in another context (outside of any function), the same 'static' keyword is used as an access modifier to define visibility to other compilation units!
- Using the same keyword for two different (albeit related) purposes is confusing. A new programming language might either drop one of those features entirely, or have a distinct keyword for it.
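
The zero-initialization item above can be sketched in C like this (the struct and function names are hypothetical). Objects with static storage duration start out zeroed, including pointers and floating-point members, while an automatic variable needs an explicit initializer.

```c
#include <stddef.h>

struct point { int x; double y; const char *label; };

/* Static storage duration, no initializer: guaranteed to start as
 * 0, 0.0 and a null pointer respectively. */
static struct point origin;
static int histogram[4];

const struct point *get_origin(void) { return &origin; }

int histogram_sum(void)
{
    int sum = 0;    /* automatic variable: must be set explicitly */
    for (size_t i = 0; i < 4; i++)
        sum += histogram[i];
    return sum;
}

/* Automatic storage: `= {0}` is the explicit way to get the same zeroing. */
struct point make_zeroed_point(void)
{
    struct point p = {0};
    return p;
}
```

Leaving out the `= 0` on `sum` would compile, and reading it would be exactly the uninitialized-variable gotcha listed earlier.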

dllthomas 17 hours ago

I'm not sure "provide only one way of doing things" is a core principle of C. It at least seems to have fallen to "type less" in several cases:

    i = i + 1;
    i += 1;
    i++;
    ++i;

    a[i]
    *(a + i)

    a->foo
    (*a).foo

dllthomas 17 hours ago

Regarding padding, with GCC at least it's not (precisely) word size that it optimizes for, but alignment constraints of the particular members. A short will be bumped to a multiple of 2, an __int128 will be bumped to a multiple of 16. The alignment restriction of an entire struct is the largest alignment of any member. This certainly has the intent and consequence of more aligned loads.
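
The alignment behaviour described in that comment can be checked with `offsetof` and C11's `_Alignof`. This is a small sketch with an invented struct; the actual numbers depend on the target ABI, so nothing here assumes specific offsets.

```c
#include <stddef.h>   /* offsetof, size_t */

struct mixed {
    char   tag;     /* alignment 1 */
    short  count;   /* usually bumped to an even offset */
    char   flag;
    double value;   /* usually the strictest member here */
};

/* Offsets the compiler actually chose for the padded members. */
size_t count_offset(void) { return offsetof(struct mixed, count); }
size_t value_offset(void) { return offsetof(struct mixed, value); }

/* Alignment requirements of the member types and of the whole struct. */
size_t short_alignment(void)  { return _Alignof(short); }
size_t double_alignment(void) { return _Alignof(double); }
size_t struct_alignment(void) { return _Alignof(struct mixed); }
```

Each member's offset is a multiple of its own alignment, and the struct as a whole is aligned at least as strictly as its most demanding member, which is why reordering members can change `sizeof`.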

pjmlp 1 day ago

The wonders of C:

NULL pointer dereferences,
Invalid pointer arithmetic leading to SIGSEGV due to unmapped memory access,
Out-of-bounds reads and writes to stack, heap and static-based arrays,
Invalid free() calls,
Double free() calls over the same pointer,
Division errors,
Assertion failures,
Use of uninitialized memory.

But hey, any good programmer always writes perfect C code.

haberman 1 day ago

Yeah, and you know, the Rust guys are actually doing some great work on making a language that solves some of these problems without giving up the advantages that make C programmers choose C.

Meanwhile, you are just being snarky on a message board while using a whole stack of software that was built by the programmers whose work you are criticizing.

---

http://spin.atomicobject.com/2014/03/25/c-single-member-structs/

summary: (a) the C sizeof operator has surprising behavior on arrays: where the array type itself is visible (e.g. at the declaration of a local or global array), it gives the size of the whole array in bytes, but once the array has decayed to a pointer (e.g. it was passed to a function, or it lives on the heap and is only reached through a pointer), it gives the size of the pointer. This can be dealt with by wrapping the array in a one-member struct (or by taking sizeof of a typedef'd array type), so that sizeof gives the size of the array. (b) Unwanted Type Coercion: say you have two types which are both ints but of different units, e.g. seconds and milliseconds. You may want the compiler to check to make sure you aren't adding these together. A typedef won't do this; if you create 'second' and 'millisecond' types which are both ints, then add values of these types together, the compiler treats the typedefs as plain aliases for int and lets them be added without complaint. But you can wrap each in a one-member struct. Now the built-in arithmetic operators no longer apply and you must define e.g. addition manually.
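
Both halves of that summary, sketched in C. The type and function names (samples8, seconds, to_milliseconds, ...) are made up here and are not taken from the linked article.

```c
#include <stddef.h>

/* A pointer parameter: sizeof sees only the pointer. A parameter written
 * as `int samples[8]` is adjusted to `int *` and behaves the same way. */
size_t size_seen_through_pointer(const int *samples)
{
    (void)samples;
    return sizeof samples;
}

/* Wrapping the array in a single-member struct keeps the array type. */
struct samples8 { int values[8]; };

size_t size_seen_through_struct(const struct samples8 *s)
{
    return sizeof s->values;
}

/* Distinct unit types: both hold an int, but they are not interchangeable,
 * and `seconds + milliseconds` does not compile. */
struct seconds      { int value; };
struct milliseconds { int value; };

struct milliseconds to_milliseconds(struct seconds s)
{
    struct milliseconds ms = { s.value * 1000 };
    return ms;
}

struct milliseconds add_milliseconds(struct milliseconds a,
                                     struct milliseconds b)
{
    struct milliseconds sum = { a.value + b.value };
    return sum;
}
```

Passing a `struct seconds` where a `struct milliseconds` is expected is a compile error, so the unit conversion has to be written out.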

http://codingrelic.geekhold.com/2008/10/aliasing-by-any-other-name.html

For the sake of efficiency, during optimization, C compilers may make some assumptions about which pointers may be aliased. If they made no assumptions at all, then whenever a value is written to the location pointed to by a pointer, it would have to be assumed that the value at every other location pointed to by every other pointer may have changed. This would prevent any optimizations where the compiler replaces some pointer-accessed locations with registers.

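So, the standard defined a rule for when the compiler may assume that two pointers are not aliases (often called 'strict aliasing'). The rule is that pointers to incompatible types may be assumed not to alias, with character types as the main exception.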

So, if you have two aliased pointers of incompatible types (e.g. one is a pointer to a 16-bit number and the other is a pointer to a 32-bit number), the C compiler may 'optimize' in ways that assume that the pointers are not aliased, causing bugs.
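
A hedged C sketch of how to look at the same bytes as 16-bit and 32-bit values without running into the aliasing rule. The function names are invented for illustration; the risky cast appears only in a comment.

```c
#include <stdint.h>
#include <string.h>

/* Risky pattern (comment only, not compiled):
 *     uint16_t *halves = (uint16_t *)&word;   // aliases a uint32_t
 *     halves[0] = 0;       // compiler may assume 'word' is unchanged
 */

/* Copy the bytes of a 32-bit value into two 16-bit values.
 * memcpy may touch any object's bytes, so no aliasing assumption is
 * violated. Which half lands first depends on endianness. */
void split_halves(uint32_t word, uint16_t out[2])
{
    memcpy(out, &word, sizeof word);
}

uint32_t join_halves(const uint16_t in[2])
{
    uint32_t word;
    memcpy(&word, in, sizeof word);
    return word;
}

/* A union is the other common way to view the same bytes in C. */
union word_view {
    uint32_t word;
    uint16_t halves[2];
};

uint16_t first_half_via_union(uint32_t word)
{
    union word_view view;
    view.word = word;
    return view.halves[0];
}
```

Compilers usually turn these small fixed-size memcpy calls into plain loads and stores, so the safe version tends to cost nothing.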

gdb can report the wrong thing for function arguments in stack traces, because sometimes the arguments are passed in registers, not on the stack, and the registers have been overwritten

http://codingrelic.geekhold.com/2008/07/gdb-lies-to-you.html

Proposal for a Friendly Dialect of C

http://blog.regehr.org/archives/1180

---

" Where is Rust Positioned?

The three languages where I do most of my work in are Python, C and C++. To C++ I have a very ambivalent relationship because I never quite know which part of the language I should use. C is straightforward because the language is tiny. C++ on the other hand has all of those features where you need to pick yourself a subset that you can use and it's almost guaranteed that someone else picks a different one. The worst part there is that you can get into really holy wars about what you should be doing. I have a huge hatred towards both the STL and boost and that has existed even before I worked in the games industry. However every time the topic comes up there is at least someone who tells me I'm wrong and don't understand the language.

Rust for me fits where I use Python, C and C++ but it fills that spot in very different categories. Python I use as language for writing small tools as well as large scale server software. Python there works well for me primarily because the ecosystem is large and when it breaks it's a super straightforward thing to debug. It also stays running and can report if things go wrong.

However one interesting thing about Python is that unlike many other dynamic languages, Python feels very predictable. For me this is largely because I am a lot less dependent on the garbage collector than in many other languages. The reason for this is that Python for me means CPython and CPython means refcounting. I'm the guy who will go through your Python codebase and break up cycles by introducing weak references. Who will put a refcount check before and after requests to make sure we're not building up cycles. Why? Because I like when you can reason about what the system is doing. I'm not crazy and will disable the cycle collector but I want it to be predictable.

Sure, Python is slow, Python has really bad concurrency support, the interpreter is quite weak and it really feels like it should work differently, but it does not cause me big problems. I can start it and it will still be there and running when I come back a month afterwards.

Rust is in your face with memory and data. It's very much like C and C++. However unlike C and C++ it feels more like you're programming with Python from the API point of view because of the type inference and because the API of the standard library was clearly written with programmer satisfaction in mind. " -- http://lucumr.pocoo.org/2014/10/1/a-fresh-look-at-rust/

---