| |
[Languages like Rust] do help you to write more features with fewer bugs, but they are not of much help when you need to squeeze the very last flop from the hardware you rent.

I do think that Rust helps you squeeze out that last 1% of performance over C++. Because instead of spending all day finding that one line of code where an unsafe threaded memory access leads to a race condition bug, I can spend more time performance profiling and optimizing the hot spots. | |
| |
Beyond that, this article is just a little bit absurd. Say what you will about Rust but it's a functional and gradually maturing ecosystem powering more and more things as time goes by. Given that it's sorta-kinda intended to build the kind of things C++ does, the idea that it'll eventually overtake C++ is at least plausible. In comparison this article claims that 1) A cool research language packaged into a VS Code plugin, 2) A cool research compiler for Python + numpy, and 3) A cool generic ISA project will kill C++? Maybe for the author's _very_ specific use case of blindly copying things from SymPy, but really I think you'd have better odds betting on a D resurgence. | |
| |
> 3) A cool generic ISA project

I wish we had an ISA with a non-linear, two-level memory space, so you can have non-overlapping arrays for free (well, as free as the virtual memory required to support that would be): that would mean that e.g. realloc() will never have to call memmove(), range checks are inescapable since they're baked into hardware, etc. But implementing (and using) C on such an ISA will be deeply unpleasant, so it will just die, because if something doesn't run C efficiently... it won't be used. | |
| |
You still need something to identify which memory space the array is from. Unless you are talking about some form of hardware copy-on-write, this is no better than what we have today. | |
| |
Compared to Spiral, though? I’d prefer ‘automatically finding the optimal solution for a target architecture’ to ‘spending my time performance profiling’ with Rust. | |
| |
Does Spiral have a competitive ecosystem? Raw performance is useful, but when you build, as the article suggests, a cloud service, you need more than fast calculations.

Also I'm not sure how the article author thinks Python libraries & tools can beat C++ but disregards other languages for not achieving the same (or better) out of the box. If Python can be made to compete with C++, any language can, and almost any other language is preferable to me over Python when it comes to production code. | |
| |
Yeah I guess OP never heard of the Rust based components in Firefox. | |
| |
> Do you know that in MSVC uint16_t(50000) + uint16_t(50000) == -1794967296?

This is pretty basic C++/computer architecture stuff.

> They just don’t give you a competitive advantage over C++. Or, for that matter, even over each other. Most of them, for instance, Rust, Julia, and Clang even share the same backend. You can’t win a car race if you all share the same car.

But it's the same backend that C++ uses...?

> Because if you can write in Python and have the performance of C++, why would you want to write in C++?

I'm no Rust fanboy, but isn't this the argument he used against Rust in the first place? | |
| |
Python in 2023 is like Perl in 1996. Given the choice I'd rather program in anything else. | |
| |
And actually, the best Perl golfers[1] I knew wrote the cleanest, most maintainable code.

I wish "clever but bad" programming games were more common in every language. It takes care of the learner's need to push limits, but is a pleasantly humorous reminder why we don't do that in production.

[1] https://perlgolf.sourceforge.net | |
| |
Exactly! I liked using it to program Blender3D scripts in 2004/5, but then lost any love for Python. | |
| |
> This is pretty basic C++/computer architecture stuff.

Well, it would be if they'd written it correctly—that should be a multiplication, not an addition. As written, it would certainly be very mysterious, but mostly just because it isn't true. | |
| |
Thank you, I was starting to question my sanity. | |
| |
> This is pretty basic C++/computer architecture stuff.

The C rules for integer type promotions are incredibly hard to reason about. They should have just banned implicit conversions from the beginning. This is not super basic stuff. | |
| |
What does computer architecture have to do with this? What is your explanation for why this happens? | |
| |
The underlying assumption is that the architecture provides instructions that operate as numtype x numtype -> numtype, with int being the smallest possible numtype. To execute a + b, it must be mapped onto one of these instructions. For example, char + short would get mapped onto the int x int -> int instruction. The usual arithmetic conversions formalize the algorithm for doing this.

If you ignore signedness, the rule is extremely simple and natural: numtype = max(type(a), type(b), int), where char < short < int < long < long long < float < double < long double. Signedness throws a wrench into the works, though. | |
| |
That assumption would be incorrect then. | |
| |
If you care about efficiency, you have to take computer architecture into account. (One important example is that floating-point numbers are not the reals we tend to think of them as.) | |
| |
The example provided has nothing to do with computer architecture or efficiency. In fact, if anything, it's less efficient than what most computer architectures provide natively.

I'm questioning OP because I have strong doubts that he actually understands C's integer promotion rules if he can claim that they're basic stuff. They are very subtle, and even C and C++ experts get bitten by them. | |
| |
The usual computer architecture includes the representation of floating-point numbers that is a very efficient (albeit imprecise) model of the reals. | |
| |
The example has nothing to do with floating point, not to mention that most computer architectures do not have floating-point units. | |
| |
The underlying architecture of the machine defines the representations of the integer multiplicands (two's complement, in the case the author is referring to), so the bit pattern that results from doing the multiplication is interpreted as the negative number given, according to the architecture. That's all I meant. | |
| |
Ah okay, well what you said is simply untrue. It's the C++ standard that requires two's complement (S 6.8.1.3 mandates this) regardless of architecture. The reason for the negative result has nothing to do with computer architecture and has to do with C++'s integer promotion rules, which promote an unsigned 16-bit integer to a signed int.

There is a reason that the author explicitly states that MSVC is the compiler that produces this output, because clang and GCC do not do so, even if they run on the same computer architecture. This is due to the fact that both those compilers can recognize that signed integer overflow is undefined behavior and leverage this fact. As others have said, to consider these details just basic things is absurd. Almost no one knows this stuff, including you. | |
| |
> It's the C++ standard that requires two's complement

I looked into this and you're totally right! But it actually didn't until relatively recently. Other representations were explicitly permitted by the standard at least until C++14 or so; what I said should be interpreted in that context. (And in fact, the standard requires it _because_ architectures were already universally designed that way, which I know you already know.) Thanks for the information though, this is good to know.

> The reason for the negative result has nothing to do with computer architecture

An existing conforming implementation will produce the bit patterns given in the article because of the architecture of the machine it's running on, so that's why I mentioned architecture.

> and has to do with C++'s integer promotion rules which promote an unsigned 16-bit integer to a signed int.

For sure, it also has to do with that; it just wasn't relevant to the "architecture" part of my comment which I was explaining, so I omitted it. I think we're saying the same thing from different points of view.

> As others have said, to consider these details just basic things is absurd.

I guess I'm inured to it, having worked at this level of abstraction for a long time haha. This is why progress is made one funeral at a time I suppose D:

> Almost no one knows this stuff, including you.

There are a lot of things I don't know, and this may be one of them! Thanks for your responses. | |
| |
> Do you know that in MSVC uint16_t(50000) + uint16_t(50000) == -1794967296?

That problematic value is produced by multiplication (*), not addition (+). It happens because uint16_t is unsigned short, and all arithmetic must be promoted to at least rank int. The easiest universal fix for proper signed/unsigned arithmetic promotion is: (0U + uint16_t(50000)) * uint16_t(50000). See https://stackoverflow.com/questions/39964651/is-masking-befo...

> And suddenly it turns out that all the “C++ killers”, even these which I wholeheartedly love and respect like Rust, Julia, and D, do not address the problem of the XXI century. They are still stuck in the XX.

You forgot to mention Carbon ( https://en.wikipedia.org/wiki/Carbon_(programming_language) ) and cppfront ( https://github.com/hsutter/cppfront ). | |
| |
I think any reasonable C++ programmer should expect that adding two 16-bit uints of value 50000 will exceed what a 16-bit uint can hold and will overflow. | |
| |
Actually, what will happen according to the C++ standard depends on the size of int.

C++ does implicit integer promotion of integer operands with types smaller than int, and that promotion converts those values (the operands of the +) to int (gross generalization, yeah yeah). So the result on g++ amd64 will be 100000 (as you would expect) if int is more than 16 bits (nowadays it is), even WITH `(uint16_t(50000) + uint16_t(50000))`. I've also tried it in MSVC 2010 and it says the result of `std::cout << (uint16_t(50000) + uint16_t(50000)) << std::endl` is 100000 (both on win32 and on x64). Try it on an Arduino and you will get 34464 (with g++ targeting 8-bit Atmel, where int is 16 bits).

Think you want implicit integer promotion in a systems language? You really don't. It is an unnecessary language feature.

Also, the article is only tangentially about that--that's just an intro. The actual body makes very good points, and I think it's more than a little tongue-in-cheek :) | |
| |
Apparently, as the GP points out, the article author meant to write an asterisk where he wrote a plus.

uint16(50_000) * uint16(50_000) is uint32(2_500_000_000), which turns out to be int32(-1_794_967_296), the garbage result the author cites. | |
| |
To paraphrase Dijkstra, "the use of C++ cripples the mind; its teaching should, therefore, be regarded as a criminal offence". Maybe a reasonable C++ programmer would expect that, sure, but: should a reasonable programmer expect that? Automatic promotion to reasonably sized integers (and in the limit, integers of unlimited precision) is ancient tech. | |
| |
> just promote my arithmetic to bigints randomly, #YOLO | |
| |
> Just throw away the most-significant digits randomly, #YOLO | |
| |
Please do not put your own words in Dijkstra's mouth: he said exactly what needed to be said. | |
| |
That’s not the issue, though, because the operands are promoted to int, which in that example happens to be 32-bit. The actual issue is that the product 50000 * 50000 = 2500000000 exceeds even the 32-bit maximum 2^31-1 = 2147483647 and, due to modulo arithmetic, results in 2500000000 - 2^32 = -1794967296.

If int were 16-bit (which it is allowed to be), no integer promotion would take place, and the result would be 50000^2 mod 2^16 = 63744. | |
| |
> any reasonable programmer

FTFY | |
| |
No, it's exactly an unreasonable programmer who'd expect that, being conditioned to such ridiculous extravagancies by an inadequate language, which was the point of my other comment in this thread.

Multiplication of intX_t by intX_t should produce int2X_t, which, btw, is what actually happens in hardware! And detection of overflow of addition/multiplication in a sane language should not require ridiculous algebraic acrobatics that involve additional additions/multiplications (or a division, yeah, I've seen that too). | |
| |
But in a (typed) language, types are constraints that must be respected (regardless of what hardware does), and integer precision (as specified in code) is one of them. | |
| |
Sorry, what does "respect" mean? Ruby and Pascal have a division operator that takes two integers and returns a real — is this disrespectful? Is it disrespectful for multiplication to have the signature template<size_t N> operator*(int<N>, int<N>) -> int<2*N> instead of template<size_t N> operator*(int<N>, int<N>) -> int<min(numeric_limits<int>::width, N)>? Why or why not? | |
| |
Types exist for a reason, and one is the desire for consistency (or uniformity) of representation: say, I want to be able to store the result of an operation in the storage element which looks the same as those where the operands came from (think of an array, for example). | |
| |
int-n * int-n -> int-n (n > 1) is fundamentally the wrong type logically for a multiplication, though, so something has to give. Sure, some languages have chosen for the arithmetic to be broken in order to maintain type, but that's certainly not clearly the right choice when others have chosen auto-promotion and most hardware has chosen a (limited selection of) int-n * int-n -> int-2n multiplication primitives to work with. | |
| |
The rule int<N> * int<N> —> int<2*N> breaks one of the important expectations from a type, which is the property of being “closed” with respect to certain operations. Even if an arithmetical operation increases the length of the result, it has to stop somewhere, so there’s little justification for such an increase in the first place, if one is to be logical about this. | |
| |
BigInts are closed under addition/subtraction/multiplication. Limited-size integers are not. That's "a truth that may hurt", and all programming languages have to deal with it. Most of them decided that using modular arithmetic and silently throwing away the most-significant digits is fine.

But again, all your arguing about "numbers should be closed under arithmetic operations just as they are in C/C++" is kinda pointless, because in C/C++, they are not! When you add/multiply two signed chars, or two shorts, you get an int, not a signed char/short, so only ints and longs are actually closed under arithmetic operations. Whoops! | |
| |
Nobody expects subtraction to be closed on the naturals, nor division on the integers. Why would we expect multiplication defined to be closed over fixed-width integers to coincide with the general arithmetic operation? | |
| |
Well, by your definition even division of reals is not closed - which means that your definition is not a correct one. A closed operation is allowed to have an "undefined" result. (And, by the way, division of integers is perfectly closed, if you ignore the remainder.) | |
| |
Division on the reals is indeed not normally considered closed. The implied closure properties you want are not customary. | |
| |
Rust has managed to shift the discussion around systems programming to focus on safety.

It didn't use to be like that. Memory-related crashes and vulnerabilities weren't blamed on C++, but on bad programmers not following best practices. Any discussion of flaws in C++ was a circular "if you don't like C++, then use C", and "if you don't like C, then use C++", because there wasn't a third option. Now even the C++ community is wondering whether C++ is becoming a "legacy" language. | |
| |
> Because if you can write in Python and have the performance of C++, why would you want to write in C++?

Because then you'd have to write in Python? Seriously, all languages have tradeoffs. Performance is hardly the only consideration. | |
| |
I broadly agree with the thrust here. I find it funny because, as a hardware engineer, I know hardware languages literally don't work for hardware. You can write almost any crazy valid HDL (insert any SystemVerilog testbench code here), but there's no chance it'll map to something reasonable in your target device. Maybe you're targeting FPGA, maybe ASIC, whatever. The language lets you express anything; it's on you to make sure what you express is legal. As such, what you need to bring to the table is a clear understanding of what you expect your code to map to in whatever device you're targeting.

The same is true in low-level software: if you're really interested in performance, it doesn't matter whether you express yourself in Rust or C++; what you're really doing is trying hard to point the compiler to one best solution, a solution you know through either experience or experiment. If you care less about performance, pick a higher-level language and trust the compiler more. If you're just trying to solve a problem and don't need performance at all (any Python script, or building the world's most popular code editor), just stick to Python or JavaScript or something. | |
| |
There are two issues I have with discussions of code performance:

1: Unsubstantiated and vague statements, e.g. X is 50% faster than Y. Well, how is it measured? What are the actual numbers?

2: Measuring performance by measuring how much time a program takes, e.g. X takes 3 seconds while Y takes 6 seconds. Well, is that consistent across measurements, have you measured it on different machines, have you measured it under various different conditions?

Various factors can impact the apparent performance of code, among which is memory layout. Considering that on GNU/Linux (and maybe other platforms) the global environment is passed into a program by putting it above the stack (remember, the stack grows down), the memory layout can differ by virtue of having different values for the environment variables USER (username), HOST (hostname), and PWD (current working directory); apparent performance is not really reliable.

I don't know the best practices for measuring performance, but if I were responsible for measuring the performance of a program I'd do so by looking at its disassembly, assigning a cost to the different types of instructions, and adding the costs together. That probably is a terrible way to go about it, so I welcome any insight into why that isn't done. | |
| |
It is done, in a sense, in cases where it really matters (video codecs etc). More tooling for better optimization is cool and exciting and all that, but it's also not free -- extended compilation times in Rust are a nice example. | |
| |
> Do you know that in MSVC uint16_t(50000) + uint16_t(50000) == -1794967296? Do you know why? Yeah, that’s what I thought.

I was very confused by this claim before I realized it should be *, not +. Either way, the operands are promoted to int before performing the addition or multiplication. But with addition, this is not too surprising, as you just get the mathematically correct result 100000. With multiplication, you get 2500000000, which is too large for a 32-bit signed integer. This is undefined behavior, but in practice it often results in wrapping, which is where you get -1794967296. | |
| |
Important to note: this is dependent on the implementation-specific sizeof(int). So while on most architectures uint16_t is the unsigned type that cannot be safely multiplied, on others uint8_t or uint32_t might be unsafe to multiply.

As a result, it is nearly impossible to write platform-independent code that does something like hashing (where wraparound is intended) while also producing consistent results across platforms. Of course in reality, it's the other way around: 64-bit platforms were forced to define `int` as a 32-bit integer, because otherwise they wouldn't be able to run any existing code. C23 finally fixes this by introducing new fixed-size integer types _BitInt(N) that don't suffer from this numeric promotion mess.

But of course the typedefs everyone is using will have to stay broken. | |
| |
I've been noticing more and more ML projects, such as Hugging Face's, adding support for JAX, which gives Python automatic differentiation, a key feature of Julia.

https://github.com/google/jax | |
| |
The "Double buffering + Spiral" entry in the chart example bothered me. Indeed, it does look like the double buffering was the source of the speedup. Spiral's solution used a different memory layout to avoid NUMA overhead: https://www.spiral.net/doc/papers/ipdps2018_dtp.pdf

If Spiral spotted the problem automatically, that's very cool, but that's elimination of a specific bottleneck, and it's possible to do that manually too. The article made it sound like a 2x superoptimizer for arbitrary algorithms, but that's likely too good to be true. | |
| |
Never heard of Spiral but it looks really interesting, particularly compared to MKL and FFTW. It’s been possible for a long time for C and C++ compilers to use “profile guided optimisation”, but it’s not widely used in practice because it’s pretty awkward to leverage. | |
| |
There are currently more than 5 million C++ developers out there. And that number is going up. New C++ projects are started every day. And the world runs on C/C++ code. So it will take quite some time (forever?) for any new language to “kill” C++. | |
| |
"You can’t win a car race if you all share the same car."Love that. | |
| |
And then goes on to laud Numba, which uses LLVM underneath as well. | |
| |
Fascinating read. I still remember using decorators when writing OpenMP code in python. Instant parallelism without having to fiddle with pthreads and the like. | |