| |
> C can teach you useful things, like how memory is a huge array of bytes, but you can also learn that without writing C programs. People say C teaches you about memory allocation. Yes, it does, but you can learn what that means as a concept without learning a programming language. The problem with those takes is that they always come from people who already know the material, and whose baggage distorts their view of whether anyone else should learn it. For various reasons I had always worked with high-level languages; last year I started to learn C and C++, and it was so beautiful to know exactly how my application allocates resources that I felt great programming, mentally and intellectually. When you come from a world of resource overprovisioning (CPU, memory) and start to wrap your head around languages that put performance first, you feel good about it at first, but a moment later you feel that your whole prior experience was subpar for lack of that thought process. | |
| |
> ...I started to learn ... how beautiful [it] was to know exactly how are you allocating resources in your application... Of course, as the author implies there's an MMU, register renaming, not to mention a whole OS involved, so how "exactly" are we talking about? As Blake wrote, there's "a world in a grain of sand". (Of course I'm excited for you and share your pleasure in understanding memory layout and its implications. Computing is really fun the less abstract it is. And that sheer depth means we can plumb those depths -- or heights -- as far as we like). | |
| |
It is as close to hardware as your userspace program is allowed to get. There is definitely a whole bunch of lower layers, but you cannot really get to them, so I think it's pretty fair to call C "a lowest layer language for userspace after assembly". And probably for kernel programming, too - C gets into implementation-defined behavior when there are MMUs involved, but that's not a big practical problem. (if you work at Intel and have access to secret microcode definitions then C is not a high level language... but cases like this are super rare) | |
| |
Thanks for the kind words. Maybe it's about the experience I had at the beginning of my career, when several experienced folks told me, "See, learn Python/Ruby, 'cause it has more jobs available and pays more". It was true on my demographic at that time. Over the years I was never exposed to those "low level" languages, so I never developed that perception. That's why I felt my experience was subpar: at one point, I was living in a completely different epistemic universe even while working in technology. | |
| |
> It was true on my demographic at that time. Haha. It's true of webdev now, but I irrationally rejected that advice as hard as I could. Made job search more difficult. But, ever since I learned C, I felt that, if I can't "low level" program, I'd rather not program at all. >:] Inb4 C is a high level programming language. And it paid off: I now work with like-minded nerds and tech I'm passionate about. (As opposed to anything web-related, which gives me some kind of ineffable dysphoria for a multitude of reasons, some scrutable and some not.) | |
| |
Everybody starts in a different place and nobody knows everything. | |
| |
Still, you can gain a lot of knowledge just by learning the introductory texts. In this case, Programming in C. Will it make you an expert C programmer? No. But it will absolutely increase your knowledge of how a modern computer works. | |
| |
The article also mentions that higher-level languages use wrappers around C to make their code faster. So I would additionally say that you should learn how to extend your programs by learning how your language interacts with C / C++. | |
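As a sketch of that direction (the function and library names here are hypothetical), a C function can be compiled into a shared library and called from a higher-level language via its FFI:

```c
/* Hypothetical function exported for use over an FFI.
   Build as a shared library, e.g.:  gcc -shared -fPIC -o libdemo.so demo.c */
int add_ints(int a, int b) {
    return a + b;
}
```

From Python, for instance, `ctypes.CDLL("./libdemo.so").add_ints(2, 3)` would call straight into the compiled code.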
| |
Now do it the other way around - write a complex project using MicroPython while being conscious about memory allocations, because mark-and-sweep GC takes 200ms (6 frames at 30 FPS budget) to go through 8MB of SPI PSRAM. It's a nightmare that makes you cry for C :) | |
| |
I'm confused as to why it's such a big deal to learn C. It's not a very large or complex language. If you already know how to program, there isn't that much to learn to understand C (pointers aren't complex). It's also a fine beginner language, especially in a structured university setting imo. Yes, you probably don't need to learn it. But it can come in handy. For instance, there's a wealth of useful software out there in C. It's nice to be able to read its source, hack on it, or leverage it from your higher-level language via FFI. | |
| |
C syntax is limited and therefore "simple". Building advanced data structures and programs with that limited toolset does require some out-of-the-box thinking. This is partially why C is a common teaching language: you have to build a lot of tools that are easy to take for granted in other languages. Take a Map or Dictionary, for example. C doesn't, technically speaking, provide one, so you learn to implement them pretty quickly and often. Technically speaking, C doesn't even have a primitive string type; you just create an array of chars, which you usually handle through a pointer. While this is super easy to do once you've seen it once or twice, it teaches you how strings work in almost every other language. The reason you can trivially loop through a string in Python is that it works the same way underneath; they have just abstracted it a tiny bit from you. Now, to the author's point: does knowing that really make you a better developer? Is it really needed? Probably not. But these exercises and this experience do help solidify more complex topics later on. So you don't need C, but I still think there's benefit in learning it. | |
| |
> Take a Map or Dictionary for example. C doesn't technically speaking provide one, so you learn to implement them pretty quickly and often. I take issue with the "quickly and often" part, but you do learn to implement them! And that provides valuable insight when it comes to other languages. I've had senior devs who, like past me, didn't understand why removing an element from a vector or map invalidates iterators or indices on it. I learned precisely why when implementing vectors in C: the usual implementation of those collections shifts everything around to maintain element contiguity in memory, so now your iterator/index is pointing to a different element (which incrementation will skip) or possibly past all the elements! And when you're presented with these fancy data structures in Java or Python, that feels unintuitive and wrong. "Come on; I should be able to just iterate through this and remove certain elements as I go," like you would remove objects from a shelf. The shelf doesn't rearrange the objects after you take one out! But the reason it feels like an affront to common sense is that you've been trained to take the fancy data collections for granted and not forced to understand the complexity behind them that makes them efficient. Of course, the real answer is to do that with some functional algorithms, like `std::remove_if()`, instead of via iteration. That way, you'll really be in the cloud-scraping ivory tower and never have to think about what's going on below. https://youtu.be/ggijVDI6h_0 :p | |
| |
I haven't tried writing C(++) in ages, but what I recall being a giant irritant was the tooling. Writing simple programs was fine enough, but the whole matter of linking libraries and writing Makefiles was a headache. | |
| |
This is a huge pain point. If you're writing small programs that fit in a couple files in a single directory, then you can get by with manual invocations of clang or gcc at the command line. If you want to build anything of moderate complexity, then you need to get familiar with all the complexity of a separate build system like Make/CMake/Bazel/etc. Java has a similar problem - once your program grows beyond ~1 directory, you end up needing to learn Maven/Gradle/etc. Build and testing tooling is one thing that younger languages have generally done much better. Rust, Go, Dart and others all have standard build tools integrated with the language that scale to large projects, with standard testing frameworks integrated with those tools. Scripting languages like Python have import features within the language itself. Much lower cognitive overhead. | |
| |
Are you suggesting `cargo test` is somehow any simpler and nicer to use than `cd build/ ; cmake ../src/ && make -j100 && ctest`? | |
| |
This is a huge pain if you want to support multiple systems, which most of the biggest products do. But if you (1) choose a single OS, (2) have "few" (say under 20) files and (3) rely only on OS-provided libraries, then things are actually pretty simple. For example, you only support Ubuntu 20.04 or Fedora 39. Then your makefiles are nice and easy to read, and you just rely on pkg-config for the libraries. | |
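For that single-OS case, a sketch of such a Makefile (the `gtk4` package name and the file names here are just hypothetical examples) stays short and readable:

```makefile
CFLAGS := -Wall -O2 $(shell pkg-config --cflags gtk4)
LDLIBS := $(shell pkg-config --libs gtk4)

app: main.o ui.o
	$(CC) $(CFLAGS) -o $@ $^ $(LDLIBS)

clean:
	rm -f app *.o
```

pkg-config supplies the include and link flags for the OS-provided library, so the Makefile never hard-codes paths.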
| |
I Nixify all the C I work with. It has good, automatic support for all the common tooling setups. | |
| |
> there isn't that much to learn to understand C Man, what a bold claim. I could say all kinds of things about the number of books written about C, the size and complexity of its compilers, the failure of even veterans to avoid mistakes, the constant disagreement amongst even experts about the most fundamental things like realloc, but my only argument--indeed the only argument I need--is that the standard [0] is literally hundreds of pages. > pointers aren't complex Pointers are so complex they're responsible for the vast majority of security bugs across all of computing history, and they're so powerful and overloaded that they can't be fixed without bonkers hardware interventions like CHERI. You might say, "nah, it's buffers", but when you have a pointer, everything is a buffer! You might further say, "nah, it's random undefined behavior", but it's very, very hard to use a pointer without invoking undefined behavior. Here's a quick thing I googled:

    #include <stdio.h>

    int func(int **a) {
        *a = NULL;
        return 1234;
    }

    int main() {
        int x = 0, *ptr = &x;
        *ptr = func(&ptr);    // <-???
        printf("%d\n", x);    // print '1234'
        printf("%p\n", ptr);  // print 'nil'
        return 0;
    }

Did you know that the language doesn't specify whether the dereference on the left of `*ptr = func(&ptr);` is evaluated before or after the call on the right, and thus this behavior is undefined? Can you really be arguing that this isn't complex? Look, I love C, but it's deeply complex and a lot of that is pointers. [0]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf | |
| |
> pointers aren't complex They are complex if you are coming from the Python world. To truly know their recursive declaration syntax, you have to see it from a compiler's perspective. Pointer arithmetic isn't something you do often even in C++. It's a crude kind of reference that can be mutated arithmetically and doesn't provide any safety whatsoever. It maps well to the indirection mechanisms and addressing modes provided by the underlying hardware. | |
| |
Everything is complex when you're barely starting, but in general, pointers aren't complex. Don't mistake the complexity with ease of making mistakes though. | |
| |
Honestly, I find Python's value vs. reference semantics fairly confusing, and it's not always obvious whether an operation is going to make a copy or modify an existing instance of an object. Good API design and docstrings mitigate this, but it's one more source of complexity to think about. I run into this much less frequently in C++ code: it's normally pretty clear from parameter and return types whether I'm dealing with a copy or a reference. This might just be because C++ broke my brain into assuming:

    Object foo = other_object;

is a copy operation, and that taking a reference or pointer would require extra characters (i.e. copy by default). Most other languages are the opposite: assignment creates a reference by default, and making a copy requires extra characters (e.g. .copy() in Python or .clone() in Java). That's my biggest mental adjustment when moving back and forth between C++ and Python. | |
| |
Python itself only has one kind of argument passing (neither by-value nor classic C++ by-reference), and nothing is copied. Library functions could make confusing choices, it's true. | |
| |
> pointers aren't complex It really helps if you understand assembler addressing modes. Then it makes total sense. | |
| |
huh? I work with pointers all the time, and I don't think I ever cared about assembler addressing modes.. Who cares if you pre-increment in the same instruction or if you need a separate operation for that? Who cares if offset is computed as part of the load or separately? This does not change how the program is built or debugged. (Unless you are talking about x86 real mode segment registers. Luckily, we can leave this behind and never talk about them again :) ) | |
| |
The point may be that time is a scarce resource and you better focus on what's important for you. C may be it, but I think Ned's point is that people might be learning C thinking it will give them a skill that it actually doesn't. | |
| |
I find thinking about time like that ends up counterproductive. You can read K&R and code/read a little C and never build a single thing in it and have it be worth it. For instance, I wrote C in school but basically never use it professionally or for my hobbies (directly at least). But I'd say learning it was worth it - my school curriculum didn't "waste" my scarce resource of time. You do end up learning to think about computers in a different way. | |
| |
Pointers aren’t complex, but C makes them look complex with the counter-intuitive syntax. Therefore, I don’t think C is a good language for learning pointers. | |
| |
C might not be strictly necessary as a language one would write in. But all of those problems that C++, Rust or Zig solve - these all are discussed in terms of C. Toolchains, OSes, concepts of static and dynamic linkage are all defined in terms of whatever few abstractions C provides. And sometimes all you have is a good old C compiler. Because C is everywhere. It is not necessary to know the language but it sure is useful to see beneath and beyond. | |
| |
With C, if you can understand the lifetime of variables on the stack, how to manage pointers to the memory allocated on the heap, etc, it'll help with learning Rust, Zig, etc. Implementing your own malloc/free, memory arenas, reference counting, vtables, etc, and doing so all in C will lead to a concrete understanding of how these features work in other languages and ultimately to more intuitive designs and implementations. | |
| |
I always think that these "C is a high level language / virtual machine and it doesn't teach you how your computer works because your computer isn't a PDP-11" style takes are very silly. C is a high level language that maps very straightforwardly to what the hardware does, and programming on a microcontroller without an OS lets you eliminate a ton of abstractions that otherwise take years to comprehend. If someone wants to understand what's going on "behind the scenes", it's going to be very useful for them to know C before trying to learn how the CPython interpreter or JVM works. | |
| |
The tone of this article is a bit combative and incurious for my taste. If you want to understand "how a computer works" - well sure C has gone from low-level to a high-level language. But the model of "how a computer works" - from Python or Ruby or JavaScript - that's very often expressed in C: kernels and libraries, in multiple layers. There's so much rich information in C code bases on why something behaves like it does. Whole histories of computing are expressed in C - types and APIs that tell you why something might work on one computer and not on another. So sure, don't learn C if you don't need to, but it's not hard to understand if you're motivated. And it really does expose you to how a computer works in ways you won't find in higher-level languages. | |
| |
Personal PoV: I think it is important to learn the C abstraction because lots of senior people in the industry talk in that abstraction level and if you're not able to communicate on that level, it may be a communication hurdle for you. Lots of people certainly can survive without knowing any C, I'm just saying that it is not an irrelevant thing and if you have the time to invest in learning a bit more than the basics, it is not a waste of time. | |
| |
Sure, I never said it was irrelevant or a waste of time. It can be useful and interesting. It's just not necessary, and it doesn't tell you how computers really work. | |
| |
First, just wanted to say I've followed your blog for a long time and have used software you created for many years, so I respect you a lot! I think we're just violently agreeing. You posted a few reasons people may feel compelled to use as justification for learning C that are misleading. I'm just adding, from my personal experience, that learning C, even if you are not going to use it every day, has the collateral advantage of helping when talking to people who "think in C" as an abstraction. Peace. | |
| |
I really enjoy C programming. I am a beginner. I wrote a JIT compiler in C recently. Why didn't I write it in Rust? The semantics of C really fit how I think about what computers do: we move things around between boxes of memory and between registers. Computing is largely logistics between "places". Edit: it's a bit like a factory, as in Factorio. I didn't write it in Rust because I find the semantics of C++, Rust, Haskell and other functional languages to be much more abstract and harder than this simple idea of stateful movement. https://github.com/samsquire/compiler | |
| |
If your goal is to learn how program execution works, or about memory and other low-level things, I suspect learning how to build a compiler targeting a VM (and building the VM) is a good way to learn them. You'll end up learning how to emit VM instructions to store data and the address of the program counter, jump to the function code, then jump back and unpack all that data, etc. You'll learn how to implement those VM instructions for a target hardware platform. You'll learn how difficult it is to design a good instruction set that covers the platforms you want to support. The nand2tetris course is a pretty good one; you'll design the hardware itself in the first part, and in the second you'll build a compiler/VM/etc for it. And then you'll gain an appreciation for how C works and why it's useful. And also why higher-level languages are useful. I don't think you need to learn C any more than you need to learn separation logic or category theory, though. Update: The problem with "learning C" is that it has a ton of warts and you might injure yourself in the effort to master it. The language is coupled tightly to libc, and the specification, while small as modern ISO standards go, still leaves a lot to the imagination (and the implementors). It's really quite a complex language to learn well, due to all of the sharp edges. And of course, C isn't the royal road. There are lots of other models for thinking about programs that aren't based on imperative procedures. It's just a language. If you want to think about programming and PLT... there are a lot more sophisticated tools. | |
| |
You should learn C because everything around you is based on C, most of it open source, and the ability to read and study the massive amount of C code that everything else relies on is a smart bet for understanding computers in a broader sense, and for better bending them to your will. | |
| |
Yep, that's why I'm learning it - I don't like C, and I'm not good at it, but I want to be able to work on existing kernel code so there's not really a choice. | |
| |
I think this statement was long ago amended to: "you should learn a systems language, if you want to be a 'complete'/'well-rounded' SWE" C was never the point (it was just the common low level language), the point was that just learning PHP/Python/JavaScript/Ruby wasn't enough to get a full grasp of everything that's going on, since they hide a lot of the warts from you. I, personally, still believe this. That being said, you can definitely make a career out of doing nothing but JavaScript/TypeScript/React, PHP/Laravel, RoR, etc. | |
| |
These days, to be "complete" wouldn't I need to know frontend JavaScript, neural networks, and a dozen other things I don't know? What does "complete" mean? | |
| |
Yes, this is why software engineering is a skill you master in 20 years, not 3 months. | |
| |
I don't believe there is "complete" as much as there is "well-rounded", which is what I aim for, personally. | |
| |
Well by definition 'complete' means to "the greatest extent or degree". So a complete SWE would need systems knowledge. But is anyone really a complete SWE? Even people that are experts in systems probably aren't complete because they don't know AI. So we aim for well-rounded. That's the best we can maintain. | |
| |
Thus the quotes. As another responder says "well-rounded" is a better phrase, but either way the field is large and you'll always be learning in it. | |
| |
Yes, that sounds about right! | |
| |
> C can teach you useful things, like how memory is a huge array of bytes Careful: In C, memory isn't a huge array of bytes. | |
| |
Memory (i.e. the totality of the machine's RAM) isn't, but the subsets allocated to you are. Iterating a pointer on an array certainly will give you the next item in said array; it'll also give you something random should you go out of bounds. | |
| |
It might, but it might not. This is actually undefined behavior, so it might do anything. This is part of C's complexity: it abstracts memory such that it mostly seems like a big array, but the addresses you're getting bear no resemblance to where they physically are. Depending on your OS, dereferencing that out of bounds pointer might signal a segfault, give you another process' data, or return data from a sensor, for example. Stuff like this is just very hard to explain to like, a Python engineer because it's just a totally different world. | |
| |
No, this is not undefined behavior. Aggregate types are the one aspect of the C "memory model" that is very well defined. The physical memory isn't necessarily contiguous, but the model of memory is. Which is the entire point. arr[0] and arr[999] are 1000 virtually contiguous blocks. You can confirm that in the spec here: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf | |
| |
Sorry I was referring to this: > it'll also give you something random should you go out of bounds. Out-of-bounds array accesses are UB | |
| |
I mean, that was the point of: > it'll also give you something random should you go out of bounds. Are you just taking umbrage with the word I used? "random" clearly is meant to imply that the language makes no guarantees. | |
| |
Oh, I guess I thought you meant it would be like uninitialized memory. If you're saying you mean "UB" when you write "random" then yeah, we agree, although I think it's kind of confusing. | |
| |
> Oh, I guess I thought you meant it would be like uninitialized memory Uninitialized memory/data isn't "random", it's specifically null. Random could be something uninitialized, or it could be the kernel entrypoint or it could be a pixel data byte from some BMP. | |
| |
> Uninitialized memory/data isn't "random", it's specifically null. Ooh, that's very untrue [0]. Although, I guess I should make sure we're talking about the same thing. I'm saying this is undefined (the little program I've linked below):

    #include <stdio.h>

    int main() {
        int a[4] = {0, 1, 2, 3};
        int b = a[1000]; /* undefined behavior! */
        printf("%d\n", b);
    }

A lot of people think b should be something random (uninitialized memory), or NULL, but literally anything can happen: those options, your printer spits up all its ink onto the ceiling, you read data from a sensor on an embedded board, you read data from a different process' memory, you get a segfault, etc. It's dependent on how your OS (or whatever) manages memory beneath you. In this case, with these compiler options and such, it's uninitialized memory. [0]: https://godbolt.org/z/KKfhxdo99 | |
| |
Your definition of "random" is bespoke to you. Suffice to say, my original statement clearly matches what you seem to be getting at; whether you like the choice of words or not. | |
| |
Anyone reasonable would read "it'll also give you something random should you go out of bounds" as meaning you'll get a random value from out of bounds array accesses. They're basically the same sentences. I think if you're honest, you'll admit you either weren't clear about what would happen, or at least were a little careless with your language. But, maybe I've been too much of a jerk here! Maybe I jumped too hard into "someone's wrong on the internet!" mode. It's OK to be wrong, cool and rock and roll even. Lord knows I've been wrong and will be wrong again. If this is true then I've embarrassed myself and I apologize. Let's go forward and be excellent to each other. | |
| |
Well, it's an array of bytes with holes in it, because of the address map. Worse, the holes may change size as the OS allocates more memory as the program runs. So in that sense, no, it's not a huge array of bytes, and access to some addresses will segfault. But C lets you treat memory like a huge array of bytes. You can do things like:

    char *p = (char *)0;
    char value = p[address];

This will segfault if you touch the wrong address. Other than that minor detail, you can think of memory as a huge array of bytes. | |
| |
> Careful: In C, memory isn't a huge array of bytes. Of course it is; the platform you are used to might have additional constraints, but that is outside of C itself. On your platform (hardware/OS combination) it may not be an array of bytes, but rest assured that conforming C compilers exist on platforms, such as embedded ones, where you can literally get a pointer to address zero and use all the RAM as a huge array of bytes. | |
| |
Help us understand: what is the model of memory in C? | |
| |
C supports systems with segmented memory (https://en.m.wikipedia.org/wiki/Memory_segmentation). You also aren't allowed to observe memory or addresses outside of allocations you've made (or even to create such pointers, whether or not they are dereferenced), so you really can't observe much about the memory model. But I think the OP's point is more about how, when you allocate memory, you get an array of bytes to play with. And all higher level languages build on top of that and abstract it away as much as they can. | |
| |
Define "allowed". Will I get arrested? No. Will the compiler stop me? Also no. Will the program crash? Maybe. Almost certainly if I do it often, or without understanding. You aren't guaranteed to be safe if you access memory or addresses outside of allocations you've made (with stack and static memory counting as "allocations you've made"). But on embedded systems with memory-mapped I/O, I have done things like

    *(unsigned long *)0xFFFE1404 = 0x00011472;

in order to write values to the registers of a peripheral device. Those I/O registers were memory that I "owned", even though I never allocated it in any way. | |
| |
The C standard does not allow it. It’s simply undefined behavior under the standard, and all bets are off as far as C is concerned. But of course an implementation is free to define additional behaviors beyond the C specification. That’s done all the time. But that’s really a “flavor” of C and not pure “vanilla” C. | |
| |
No, it's implementation-defined, not undefined behavior. That means the compiler must document a consistent behaviour. From 6.3.2.3 [ISO/IEC 9899:2011]: An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation. Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type. Also, the C standard merely codified existing practices and common extensions. Actual use of C has converted integers to pointers for a long time. If converting integer literals into pointers were undefined behavior, it would just show that the C standard isn't being practically useful in one area (since it's commonly done in practice). | |
| |
> Actual use of C has converted integers to pointers for a long time. Quite probably since the first C-based implementation of Unix on a PDP-11. So it's been known to be "a thing that C does" for quite a bit longer than the standard existed. | |
| |
If you want your program to work on more than one architecture+compiler combination, then implementation-defined is effectively the same as undefined. | |
| |
Not entirely. Say I'm doing my original example, working on an embedded system. My code isn't going to port to anything that doesn't have the same hardware, so architecture isn't an issue. Any compiler supporting that architecture for an embedded application is going to do the right thing with that kind of C statement, so that isn't an issue either. So, implementation-defined means that you can't count on compilers doing the same thing. But there are some things that are implementation-defined where you can pretty much count on any compiler doing the same thing. And, as trealira said, compilers are supposed to clearly state what they do in such cases, so you can read the compiler's statement and see if there are any surprises. This is less true if you're writing library code. There, you have to support all compilers, or at least all conforming ones, and you have to make fewer assumptions. | |
| |
Memory is a term describing the semi-permanent (permanent while it receives power) state of data as it exists in fast storage. If you try to access random areas of said memory, you will be sorely disappointed: the OS allocates memory to you as needed and bars tasks from accessing each other's memory. In other words, location 0x0000ffff in your application does not map to system location 0x0000ffff, but to a translated portion of that block. In addition, there are no guarantees as to how that memory will be ordered/allocated/segmented outside of specific requests for a contiguous block of memory via something like malloc. You can assume your array (static and dynamic) is contiguous, but that's the only assumption you can make. | |
| |
I wouldn't call that the C memory model, as it's imposed by the OS on all programs. | |
| |
C requires an OS to provide CRT and syscalls for it to function. So you're correct in that the memory model is more akin to "what the OS decides to offer you", with the singular guarantee that specific pieces of data will be contiguous. If you were to write freestanding C code, this certainly changes. At which point the memory model becomes "what you decide to provide". | |
| |
> C requires an OS to provide CRT and syscalls for it to function. That doesn't make sense; the OS itself is written in C. > If you were to write freestanding C code, this certainly changes. At which point the memory model becomes "what you decide to provide". Okay, so we're back to, C doesn't impose anything on you, though the OS may. | |
| |
The C memory model is what imposes pointer provenance and all that jazz on C programs. The OS provides an abstraction over physical memory, and programming languages abstract over individual address spaces. | |
| |
> That doesn't make sense; the OS itself is written in C. Please point to a non-userspace memory allocation library/implementation that does not rely on lower level logic to function. > Okay, so we're back to, C doesn't impose anything on you, though the OS may. You're just reversing the original statement. The statement was "you can't assume anything about memory in C" (paraphrasing). They then asked "why not?" The explanation is that: What you think is 'memory' in C isn't, and certainly doesn't map to what most people assume about memory; because C doesn't impose a memory model, it relies on the underlying OS/environment to do so. The only "memory model" is thus: a requested allocation (whether on the stack or heap), if provided to you, will match your request; all other assumptions are invalid. If you want to move goalposts, argue about logical boundaries, etc, have at it. Or if that answer doesn't satisfy you, I don't know what to tell you; but simply rephrasing the original problem does even less. | |
| |
Except you can't see that model from (for example) Python, and you don't need to. | |
| |
Python presents an abstraction, but underneath, it's bound by the address space just as C is. | |
| |
Yes, but why do I need to know that as a Python programmer? What mistakes might I make if I don't know about it? | |
| |
I was just making it clear that the OS policy pervades. That is, the developers of CPython or another execution context need to care about it, and so it informs some of the implementation details that may leak into programs. We agree that Python presents an abstraction so that Python programmers (usually) don't have to see this. | |
| |
I'm not sure that abstraction is even in the OS. I think it's on the memory controller itself. The OS reads from the memory controller. | |
| |
I'm pretty sure it's more of a cooperative arrangement; the OS writes the configuration that the MMU uses, thereby imposing its rules on programs. | |
|
|
| |
This isn't really responsive to the question. The C memory model is basically defined in stdatomic.h [0]. It's around 10 pages and describes affordances you almost never, ever see.

> In addition, there are no guarantees as to how that memory will be ordered/allocated/segmented outside of specific requests for a contiguous block of memory via something like malloc. You can assume your array (static and dynamic) is contiguous...

The standard can only be referring to the abstract machine. In truth, your OS might give you 1,000,000 elements on one page and 1,000,000 elements on a different page, which exist nowhere near each other in RAM (or have been swapped to disk, etc.), or are being CoW'd into existence, and so on. This is empirically true--from the days when programs like Chromium would try to malloc all the memory in a machine. The pointer that malloc returned could not have referred to a contiguous memory block. You can try it on your machine by malloc'ing more memory than you have and then reading from the blob.

[0]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf#s... | |
| |
> The standard can only be referring to the abstract machine. In truth, your OS might give you 1,000,000 elements on 1 page, and 1,000,000 elements on a different page, which exist nowhere near each other in RAM (or have been swapped to disk, etc. etc.), or are being CoW'd into existence, and so on.

This is where the "model" portion of the statement comes into play, and why OP's point is even more cogent. The memory will be virtually contiguous, even if it's physically disparate. | |
| |
> The memory will be virtually contiguous, even if it's physically disparate.

I don't really understand the value of extolling the fact that, like, you can incrementally iterate through an array in C. The days of systems with segmented memory are long behind us, and even then it wasn't like you'd have an array that spanned segments--you couldn't! Honestly, what language/platform exists that doesn't have this property, and what would that even look like? Like you'd somehow have to know that elements 100-200 in an array are "no good" and you have to skip them?

People keep trying to put meat on the very basic bones you've got here, but you keep insisting that, yep, 2 comes after 1 and 717 comes after 716. Great! We know! And we say stuff about how trying to index off the end of an array leads to UB, or how the array may not actually be contiguous in physical memory, or it might be in various caches, or how sometimes your array goes from 1024 members to 1025 members and your FPS drops from 300 to 30 because you fell out of cache, and you're like, "sure, but you still access arrays with consecutive indexes". Well, yeah! It's kind of the point of using C that you have access to or some control over these kinds of things. They're useful to the conversation. Continually bringing us back to indexing... I don't think is.

> This is where the "model" portion of the statement comes into play and why OP's point is even more cogent.

Eh, "memory model" is a specific phrase referring to how memory is defined to work in a threaded environment [0]. The original "Help us understand: what is the model of memory in C?" prompt is referring to the fact that unless you prolifically use the API in stdatomic.h (which very few things do), you just have undefined behavior all over the place if you ever dare to use anything related to threads. Also, it was only defined in C11--not a lot of things have updated, even now.

---

Overall I want to emphasize that this whole thread is doing a real good job of proving Ned Batchelder's point: even people who think they know or understand C don't (to be clear, I do not think I understand C), and things are generally alright. There's--clearly, reading through everything--a culture of "You need to be an expert in some low-level, 'real' language/platform before you can write meaningful software", but you don't, and my evidence is Facebook, maybe the most influential software ever written.

[0]: https://research.swtch.com/plmm | |
| |
Aliasing is one example of the huge-array-of-bytes abstraction being broken | |
| |
A very simplified and possibly wrong answer: memory in C is a bunch of arrays of bytes that have lots of rules on when and how you can use a particular array given the state of all relevant arrays. | |
| |
Author (probably) needs to learn C. | |
| |
Nice, the ad hominem begins. I maintain software written in C. | |
| |
It's a joke, nothing personal. I don't really know C even if I use it a lot. I know nothing about you; I'm just using the same snark I read from your post, tongue in cheek.

If you want something constructive: what was the point of the post if you're wrapping it up with "learning C can be useful"? I love C; it's fantastic, and it has taught me a lot about how and why some things are as they are. Just compiling with a simple lib helped me understand headers and the -devel packages some distros provide. The difficulty of doing so efficiently shows why newer languages have package managers. Another point could be that the entire world runs on C, and knowing it can help with either porting software safely or maintaining projects.

Basically what I get from it is: "you (probably) don't need a driver's license, unless you want to drive, then get one". Sometimes the journey can be extremely beneficial even if there's no direct use. Would you claim learning C hindered you in any way? | |
| |
More like, you probably don't need to know about spark plugs, unless you want to fix your own car. The point of the post is that abstractions are inevitable and you choose your own level. And: you can be a great programmer without knowing C. | |
| |
> The point of the post is that abstractions are inevitable and you choose your own level. And: you can be a great programmer without knowing C.

Absolutely, but I get the feeling you're taking the common statement "learning C can be a great experience" and bastardizing it into a straw man, "you NEED to learn C", that you then argue against. I just think it's an obvious statement, and it reads like you're being nagged by people to learn, but you don't want to learn C and are defending your position. For me, nothing has taught me so much about development as a weekend hacking in C and stepping through the code with Valgrind or reading the binary output did. And I recommend all my peers do the same; most don't, and that's okay. But I think it's a shame that people in general don't seem to care that much outside of work. I did enjoy the read though, even if clickbaity. | |
| |
I was reacting to someone literally saying, "Everyone should learn C because it teaches you how computers really work." In my experience it's not an uncommon recommendation. Yes, people also say, "learning C can be a great experience," and I agree. | |
| |
It gave me a better understanding of how computers really work. If you already have a CS degree that might not be the case, but he has a point and the experience opened a whole new world for me. Something I assumed was too difficult for me for a long time. Add Valgrind and watch the assembly while stepping line by line, taught me so much. Why C as opposed to other languages? It's portable and very simple on the surface, there's not much to it but difficult enough to be a teachable opportunity. | |
| |
I’m pretty sure Ned knows C. He’s been around for a while. He’s a name in the Python community. I’m reasonably confident he plays in the CPython interpreter when he needs to. | |
|
| |
I think it's helpful to learn a little bit of C, to understand why computers work at all, and how we can get, step by step, from bits and bytes to higher-level "things". Long ago, a friend of mine, a brilliant developer who nevertheless never dug any lower than basic C++, once tried to write a simple C program that printed out some text. His first attempt went something like:

```c
void greet(char *name) {
    char *text = "Hello, ";
    text += name;     /* invalid: C has no string concatenation */
    text += "\n";
    printf(text);
}
```

It was a lot of fun going over his program together, and exploring why things didn't work the way he expected! | |
| |
I would argue everyone should learn C purely from a Linux/free-software point of view. So many open source libraries are written in C, and getting as many eyes as possible on that source code would definitely be beneficial. | |
| |
There's also a difference between reading K&R and being comfortable with conventional C. Reading through source code from a mature project like PostgreSQL or the Linux kernel is very different from just knowing how malloc and free work. Being able to follow along in the PostgreSQL source code has been helpful when we see production behavior we don't expect. | |
| |
Serious question: the article points out that `C is far removed from modern computer architectures: there have been 50 years of innovation since it was created in the 1970's.` Is there a C-like language that uses abstractions based on current ARM or x86 processors? i.e., something above assembly whose learning would help us understand how these modern processors actually work? | |
| |
What language would? Even if you write assembly directly, the microcode and pipelining layers are not visible to you; they're abstracted in the hardware itself. In that sense I think accusing C of not being hardware friendly because these aspects are not exposed to it is nonsense. They are not exposed even to assembly language, let alone any higher-level language. Regarding parallel programming and GPUs, I think a better case could be made for alternative abstractions. | |
| |
My point wasn't that C wasn't hardware friendly. My point was that it doesn't teach you "how a computer really works". It teaches you a lower-level abstraction than Python does, but it's still an abstraction, and there's a ton of mechanisms at work that you can't see. | |
| |
Of course. Even below assembly language there are many layers on today's computers. As you noted, beyond pipelining, microcode, branch prediction, etc., there is a lot more below; further down one gets electronics, and then quantum theory, and so on. So it's a matter of which abstractions one can safely ignore. And today I think we at least need to go to the level of that sub-assembly stuff if one wants to write anything that does not perform badly. I personally also feel that whatever layer one usually works in, one should learn the layer immediately below at the very least. So yes, I think we agree. | |
| |
> And today I think we at least need to go to the level of that sub-assembly stuff if one wants to write anything that does not perform badly You can write fast software today in assembly, C, C++, Rust, and languages like that. You don't need to go to the microcode level. The reason we have trouble with slow, bloated software today is because people don't do that, and instead use slower languages, like Python or JavaScript, using inefficient algorithms, with lots of potentially unnecessary dependencies that optimize for the general case, and not whatever specific problem the programmer wants to solve (which is often less efficient as well). | |
| |
By sub-assembly level I meant that we must have at least some knowledge of pipelining, branch prediction, caching, etc., regardless of our language choice; which is to say, it's not a completely clean abstraction you can totally ignore and forget about. Layers underneath that are probably not so important. | |
| |
Oh, yeah, that makes sense. I agree that having an understanding of those things is necessary to write fast software. | |
|
| |
> One reason to learn C is to grasp the basic syntax and control structures that it shares with many languages in the same family. Why not just learn the language that you want to learn? | |
| |
Because it might be a while before a JavaScript programmer reaches regular expressions in their training. Reading the AWK book is like skipping ahead on a few subjects. | |
| |
You probably don't need to learn C and program in it, but I do think knowing material covered in say [1] Computer Systems - A Programmer's Perspective will make you a better engineer, and historically computer systems were implemented in C and that heritage is still with us. So, being able to read C at least is a prerequisite to understand some of it. [1] - https://www.amazon.com/Computer-Systems-Programmers-Perspect... | |
| |
> Pipelining, cache misses, branch prediction, multiple cores, even virtual memory are all completely invisible to C programs.

Are there languages where these are visible? Any time I've seen them discussed it's been in terms of writing C/C++/Java that understands how they work, e.g., the disruptor pattern padding variables to a 64-byte cache line.

> C teaches you an abstraction of computers based on the PDP-11.

Hmm, I'd wonder how a C designed today would look, based on modern architectures. | |
| |
Isn’t the modern version called Go? | |
| |
Learning some C - especially how pointers work with memory, and the often challenging memory addressing operators (&, *, [], ., ->) - is very useful, even if the end result is a strong desire to only use languages that abstract all that away. Otherwise it's very easy to get confused over, say, the difference between 'pass by value' and 'pass by reference' in JavaScript, or the fairly complicated approach Python uses, 'pass by assignment': https://medium.com/@devyjoneslocker/understanding-pythons-pa...

It's also not a bad idea to learn the correct way to open and read a file into memory using C (a page of code at least, with lots of NULL checks and malloc and free), just to appreciate how much work something like Python (a few lines of code) is doing for you under the hood. | |
| |
Those justifications or "reasons" for learning fascinate me. C imagines a PDP-11-like abstract machine, and for reasons good and bad modern processor designers try to make that abstract machine fast. But if you want to get a feel for what the hardware might be doing, as the author says, C is not a good choice, and never really was. If you want to start to understand the hardware, write some assembly code. It's not that hard, but you have to think about the actual hardware model.

The author's other point is also correct (much as it irks me -- I want to know everything): nobody understands the whole vertical stack from quantum mechanics to the latest frameworks, much less its horizontal ramifications (subtleties of TCP implementation or category theory). I could creditably* describe perhaps 60% of how a car works these days, but I can no longer do more than simple maintenance. Infuriating, yet surely a good thing.

* Creditably, sure, but credibly? Not for me to decide. | |
| |
Even assembly won't expose you to pipelining, cache misses, and branch prediction. | |
| |
Exactly, "start to understand". Register renaming, speculative execution, all sorts of fun stuff. And all that's pretty high level when you start to look at the deeper things the chip designers had to think about that still influenced the higher level design, like thermals, transmission implications of capacitance...it's so great that it never ends! | |
| |
My work is in embedded systems and reverse engineering, so I use C every day as a necessity and love it. For me it is the perfect level of abstraction: not too high and not too low. I use Python for glue code and for running unit tests and other boring stuff, and find it frustrating, with too many ways of doing things and too many libraries for achieving the same thing. It could be I am still bitter because Python 3 broke a bunch of my scripts by forcing everything to be a byte array. | |
| |
I love C but, this is empirically obvious right? Most software engineers don't know C, things are going OK, QED. | |
| |
You'd think it was obvious, but people keep saying it. And there are commenters here disagreeing. | |
| |
Man, you are right. I went through the thread and tried to challenge things. We really do live in wild times haha. | |
| |
It is obvious that you don't need to learn C; "need" in the strictly-speaking sense. PS: You don't need to attend school or university either, but I bet nearly every parent tells their kids to attend and graduate with good grades. | |
| |
You (probably) don't need to learn any programming language. | |
| |
Is the author proficient in C? | |
| |
Statistically, the author has probably written C longer than you have been alive. | |
|
|