GCC 15.1 | Svelte Hacker News

Calavar 3 months ago

> {0} initializer in C or C++ for unions no longer guarantees clearing of the whole union (except for static storage duration initialization), it just initializes the first union member to zero. If initialization of the whole union including padding bits is desirable, use {} (valid in C23 or C++) or use -fzero-init-padding-bits=unions option to restore old GCC behavior.

This is going to silently break so much existing code, especially union based type punning in C code. {0} used to guarantee full zeroing and {} did not, and step by step we've flipped the situation to the reverse. The only sensible thing, in terms of not breaking old code, would be to have both {0} and {} zero initialize the whole union.

I'm sure this change was discussed in depth on the mailing list, but it's absolutely mind boggling to me

nikic 3 months ago

Fun fact: GCC decided to adopt Clang's (old) behavior at the same time Clang decided to adopt GCC's (old) behavior.
So now you have this matrix of behaviors: * Old GCC: Initializes whole union. * New GCC: Initializes first member only. * Old Clang: Initializes first member only. * New Clang: Initializes whole union.
- augusto-moura 3 months ago
  
  That's funny and sad at the same time.
  And it shows a deeper problem, even though they are willing to align behavior between each other, they failed to communicate and discuss what would be the best approach. That's a bit tragic, IMO
  - Neywiny 3 months ago
    
    I would argue the even deeper problem is that it's implementation defined. Should be in the spec and they should conform to the spec. That's why I'm so paranoid and zeroize things myself. Too much hassle to remember what is or isn't zero.
    
    flohofwoe 3 months ago
    
    I wouldn't depend on that too much either though, or at least not depend on padding bytes being zeroed. The compiler is free to replace the memset call with code that only zeroes the struct members, but leaves junk in the padding bytes (and the same is true when copying/assigning a struct).
    
    Gibbon1 3 months ago
    
    Standard should be changed to require all uninitialized memory be set to zero.
    Which includes padding bytes.
- homebrewer 3 months ago
  
  Since having multiple compilers is often touted as an advantage, how often do situations like what you're describing happen compared to the opposite — when a second compiler surfaces bugs in one's application or the other compiler?
  - uecker 3 months ago
    
    I compile my projects with clang and GCC. It is quite often that one compilers finds minor issues the other does not.
- iamthejuan 3 months ago
  
  It is like an era of average.
- zeroq 3 months ago
  
  i will call it "webification" of C!

mtklein 3 months ago

This was my instinct too, until I got this little tickle in the back of my head that maybe I remembered that Clang was already acting like this, so maybe it won't be so bad. Notice 32-bit wzr vs 64-bit xzr:

    $ cat union.c && clang -O1 -c union.c -o union.o && objdump -d union.o
    union foo {
        float  f;
        double d;
    };

    void create_f(union foo *u) {
        *u = (union foo){0};
    }

    void create_d(union foo *u) {
        *u = (union foo){.d=0};
    }

    union.o: file format mach-o arm64

    Disassembly of section __TEXT,__text:

    0000000000000000 <ltmp0>:
           0: b900001f      str wzr, [x0]
           4: d65f03c0      ret

    0000000000000008 <_create_d>:
           8: f900001f      str xzr, [x0]
           c: d65f03c0      ret

mtklein 3 months ago

Ah, I can confirm what I see elsewhere in the thread, this is no longer true in Clang. That first clang was Apple Clang 17---who knows what version that actually is---and here is Clang 20:

    $ /opt/homebrew/opt/llvm/bin/clang-20 -O1 -c union.c -o union.o && objdump -d union.o

    union.o: file format mach-o arm64

    Disassembly of section __TEXT,__text:

    0000000000000000 <ltmp0>:
           0: f900001f      str xzr, [x0]
           4: d65f03c0      ret

    0000000000000008 <_create_d>:
           8: f900001f      str xzr, [x0]
           c: d65f03c0      ret

dzaima 3 months ago

Looks like that change is clang ≤19 to clang 20: https://godbolt.org/z/7zrocxGaq

ogoffart 3 months ago

> This is going to silently break so much existing code
The code was already broken. It was an undefined behavior.
That's a problem with C and it's undefined behavior minefields.
- ryao 3 months ago
  
  GCC has long been known to define undefined behavior in C unions. In particular, type punning in unions is undefined behavior under the C and C++ standards, but GCC (and Clang) define it.
  - mtklein 3 months ago
    
    I have always thought that punning through a union was legal in C but UB in C++, and that punning through incompatible pointer casting was UB in both.
    I am basing this entirely on memory and the wikipedia article on type punning. I welcome extremely pedantic feedback.
    
    jcranmer 3 months ago
    
    > punning through a union was legal in C
    In C89, it was implementation-defined. In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex. From C11 on, the annex was fixed.
    > but UB in C++
    C++11 adopted "unrestricted unions", which added a concept of active members that is UB to access other members unless you make them active. Except active members rely on constructors and destructors, which primitive types don't have, so the standard isn't particularly clear on what happens here. The current consensus is that it's UB.
    C++20 added std::bit_cast which is a much safer interface to type punning than unions.
    > punning through incompatible pointer casting was UB in both
    There is a general rule that accessing an object through an 'incompatible' lvalue is illegal in both languages. In general, changing the const or volatile qualifier on the object is legal, as is reading via a different signed or unsigned variant, and char pointers can read anything.
    
    trealira 3 months ago
    
    > In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex.
    In C99, union type punning was put under Annex J.1, which is unspecified behavior, not undefined behavior. Unspecified behavior is basically implementation-defined behavior, except that the implementor is not required to document the behavior.
    
    ryao 3 months ago
    
    We can use UB to refer to both. :)
    
    hermitdev 3 months ago
    
    > We can use UB to refer to both. :)
    You can, but in the context of the standard, you'd be wrong to do so. Undefined behavior and unspecified behavior have specific, different, meanings in context of the C and C++ standards.
    Conflate them at your own peril.
    
    JadeNB 3 months ago
    
    > > We can use UB to refer to both. :)
    > You can, but in the context of the standard, you'd be wrong to do so. Undefined behavior and unspecified behavior have specific, different, meanings in context of the C and C++ standards.
    > Conflate them at your own peril.
    I think that ryao was not conflating them, but literally just pointing out, as a joke, that "UB" can stand for "undefined behavior" or "unspecified behavior." Taking advantage of this is inviting dangerous ambiguity, which is why ryao's suggestion ended with ":)," but I think that saying that it's wrong is an overstateent.
    
    trealira 3 months ago
    
    Maybe, but we were talking about "undefined behavior," not "UB," so the point is moot.
    
    ryao 3 months ago
    
    The GCC developers disagree as of last December:
    > Type punning via unions is undefined behavior in both c and c++.
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
    
    saagarjha 3 months ago
    
    I think they're wrong about C.
    
    jotux 3 months ago
    
    Saw this recently and thought it was good: https://www.youtube.com/watch?v=NRV_bgN92DI
    
    ryao 3 months ago
    
    There has been plenty of misinformation spread on that. One of the GCC developers told me explicitly that type punning through a union was UB in C, but defined by GCC when I asked (after I had a bug report closed due to UB). I could find the bug report if I look for it, but I would rather not do the search.
    
    trealira 3 months ago
    
    From a draft of the C23 standard, this is what it has to say about union type punning:
    > If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
    In past standards, it said "trap representation" rather than "non-value representation," but in none of them did it say that union type punning was undefined behavior. If you have a PDF of any standard or draft standard, just doing a search for "type punning" should direct you to this footnote quickly.
    So I'm going to say that if the GCC developer explicitly said that union type punning was undefined behavior in C, then they were wrong, because that's not what the C standard says.
    
    amboar 3 months ago
    
    Section J.1 _Unspecified_ behavior says
    > (11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
    So it's a little more constrained in the ramifications, but the outcomes may still be surprising. It's a bit unfortunate that "UB" aliases to both "Undefined behavior" and "Unspecified behavior" given they have subtly different definitions.
    From section 4 we have:
    > A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.4.
    
    ryao 3 months ago
    
    Here is what was said:
    > Type punning via unions is undefined behavior in both c and c++.
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
    Feel free to start a discussion on the GCC mailing list.
    
    trealira 3 months ago
    
    I actually might, although not now. Thanks for the link. I'm surprised he directly contradicted the C standard, rather than it just being a misunderstanding.
    
    ryao 3 months ago
    
    According to another comment, the C standard contradicts the C standard on this:
    https://news.ycombinator.com/item?id=43794268
    Taking snippets of the C standard out of context of the whole seems to result in misunderstandings on this.
    
    trealira 3 months ago
    
    It doesn't. That commenter is saying that in C99, it was unspecified behavior. Since C11 onward, it's been removed from the unspecified behavior annex and type punning is allowed, though it may generate a trap/non-value representation. It was never undefined behavior, which is different.
    Edit: no, it's still in the unspecified behavior annex, that's my mistake. It's still not undefined, though.
    
    ryao 3 months ago
    
    Most of the C code I write is C99 code, so it is undefined behavior either way for me (if I care about compilers other than GCC and Clang).
    That said, I am going to defer to the GCC developers on this since I do not have time to make sense of all versions of the C standard.
    
    trealira 3 months ago
    
    That's fair. In the end, what matters is how C is implemented in practice on the platforms your code targets, not what the C standard says.
    
    uecker 3 months ago
    
    Union type punning is allowed and supported by GCC: https://godbolt.org/z/vd7h6vf5q
    
    ryao 3 months ago
    
    I said that GCC defines type punning via unions. It is an extension to the C standard that GCC did.
    That said, using “the code compiles in godbolt” as proof that it is not relying on what the standard specifies to be UB is fallacious.
    
    uecker 3 months ago
    
    I am a member of the standards committee and a GCC maintainer. The C standard supports union punning. (You are right though that relying on godbolt examples can be misleading.)
    
    jotux 3 months ago
    
    https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Typ...
    
    ryao 3 months ago
    
    What is your point? I already said that GCC defines it even though the C standard does not. As per the GCC developers:
    > Type punning via unions is undefined behavior in both c and c++.
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
    
    jotux 3 months ago
    
    > One of the GCC developers told me explicitly that type punning through a union was UB in C, but defined by GCC when I asked
    I just was citing the source of this for reference.
    
    ryao 3 months ago
    
    I see. Carry on then. :)
  - flohofwoe 3 months ago
    
    > type punning in unions is undefined behavior under the C and C++ standards
    Union type punning is entirely valid in C, but UB in C++ (one of the surprisingly many subtle but still fundamental differences between C and C++). There's specifically a (somewhat obscure) footnote about this in the C standard, which also has been more clarified in one of the recent C standards.
    
    ryao 3 months ago
    
    There is no footnote about it in the C standard. Someone proposed adding one to standardize the behavior, but it was never accepted. Ever since then, people keep quoting it even though it is a rejected amendment.
    
    jcranmer 3 months ago
    
    Footnote 107 in C23, on page 75 in §6.5.2.3:
    > If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
    (though this footnote has been present as far back as C99, albeit with different numbers as the standard has added more text in the intervening 24 years).
    
    ryao 3 months ago
    
    The GCC developers disagree with your interpretation:
    > Type punning via unions is undefined behavior in both c and c++.
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
    
    flohofwoe 3 months ago
    
    I'm not sure tbh what's there to 'interpret' or how a compiler developer could misread that, the wording is quite clear.
    
    ryao 3 months ago
    
    It is an excerpt being taken out of context. Of course it is quite clear. Taking it out of context ignores everything else that the standard says. That interpretation is wrong as far as compiler authors are concerned.
    
    trealira 3 months ago
    
    The context is that it's a footnote. The footnote is referenced in this paragraph:
    A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member (106), and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.
    106) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
    In that same document, union type punning is explicitly listed under Annex J.1, Unspecified Behavior:
    (11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
    The standard is extremely clear and explicit that it's not undefined behavior.
    
    ryao 3 months ago
    
    This is not considering the document as a whole. I will defer to the GCC developers on what the document means on this.
    
    jcranmer 3 months ago
    
    I am a member of the C standards committee, and I'm telling you you're wrong here. Martin Uecker is also member of the C standards committee, and has just responded to that bug saying that the comment you linked is wrong. I, and others here, have quoted literal standards text to you explaining why type punning through unions is well-defined behavior in C.
    I don't know who Andrew Pinski is, but they're factually incorrect regarding the legality of type punning via unions in C.
    
    uecker 3 months ago
    
    Andrew is a GCC developer who is very competent (much more than myself regarding GCC), but I think he was mistakenly assuming the C++ rules apply to C here as well.
    
    trealira 3 months ago
    
    I'm interested in hearing how considering the document as a whole leads to a different conclusion.
    
    nialv7 3 months ago
    
    I wouldn't be surprised if Andrew Pinski was just wrong. It's anecdotal but my impression of him isn't very good.
  - mat_epice 3 months ago
    
    EDIT: This comment is wrong, see fsmv’s comment below. Leaving for posterity because I’m no coward!
    - - -
    Undefined behavior only means that the spec leaves a particular situation undefined and that the compiler implementor can do whatever they want. Every compiler defines undefined behavior, whether it’s documented (or easy to qualify, or deterministic) or not.
    It is in poor taste that gcc has had widely used, documented behaviors that are changing, especially in a point release.
    
    fsmv 3 months ago
    
    I think you're confusing unspecified and undefined behavior. UB could do something randomly different every time and unspecified must chose an option.
    In a lot of cases in optimizing compilers they just assume UB doesn't exist. Yes technically the compiler does do something but there's still a big difference between the two.
    
    mat_epice 3 months ago
    
    Thanks, you’re right, I was mistaken.
- grandempire 3 months ago
  
  When you have a big system many people rely on you generally try to look for ways to keep their code working - not look for the changes you’re contractually allowed to make.
  GCC probably has a better justification than “we are allowed to”.
  - arp242 3 months ago
    
    > GCC probably has a better justification than “we are allowed to”.
    Maybe, but I've seen GCC people justify such changes with little more than "it's UB, we can change it, end of story", so I wouldn't assume it.
- mwkaufma 3 months ago
  
  Undefined in the standard doesn't mean undefined in GCC. Type-punning through unions has always been a special case that GCC has taken care with beyond the standard.
myrmidon 3 months ago

I honestly feel that "uninitialized by default" is strictly a mistake, a relic from the days when C was basically cross-platform assembly language.
Zero-initialized-by-default for everything would be an extremely beneficial tradeoff IMO.
Maybe with a __noinit attribute or somesuch for the few cases where you don't need a variable to be initialized AND the compiler is too stupid to optimize the zero-initialization away on its own.
This would not even break existing code, just lead to a few easily fixed performance regressions, but it would make it significantly harder to introduce undefined and difficult to spot behavior by accident (because very often code assumes zero-initialization and gets it purely by chance, and this is also most likely to happen in the edge cases that might not be covered by tests under memory sanitizer if you even have those).
- rwmj 3 months ago
  
  GCC now supports -ftrivial-auto-var-init=[zero|uninitialized|pattern] for stack variables https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
  For malloc, you could use a custom allocator, or replace all the calls with calloc.
  - myrmidon 3 months ago
    
    Very nice, did not know about this!
    The only problem with vendor extensions like this is that you can't really rely on it, so you're still kinda forced to keep all the (redundant) zero intialization; solving it at the language level is much nicer. Maybe with C2030...
- bjourne 3 months ago
  
  There are many low-level devices where initialization is very expensive. It may mean that you need two passes through memory instead of one, making whatever code you are running twice as slow.
  - myrmidon 3 months ago
    
    I would argue that these cases are pretty rare, and you could always get nominal performance with the __noinit hint, but I think this would seldomly even be needed.
    If you have instances of zero-initialized structs where you set individual fields after the initialization, all modern compiler will elide the dead stores in the the typical cases already anyway, and data of relevant size that is supposed to stay uninitialized for long is rare and a bit of an anti-pattern in my opinion anyway.
  - modeless 3 months ago
    
    Ok, those developers can use a compiler flag. We need defaults that work better for the vast majority.
    
    bjourne 3 months ago
    
    Then why are you using C? :P
    
    01HNNWZ0MV43FF 3 months ago
    
    I'm not, looks like a bad language with worse implementations
    
    nullc 3 months ago
    
    C is a bad language, too bad all the others are even worse. :P
  - nullc 3 months ago
    
    meh, the compiler can almost always eliminate the spurious default initialization because it can prove that first use is the variable being set by the real initialization. The only time the redundant initialization will be emitted by an optimizing compiler is when it can't prove its redundant.
    I think the better reason to not default initialize as a part of the language syntax is that it hides bugs.
    If the developers intent is that the correct initial state is 0 they should just explicitly initialize to zero. If they haven't, then they must intend that the correct initial state is the dynamic one in their code and the compiler silently slipping in a 0 in cases the programmer overlooked is a missed opportunity to detect a bug due to the programmer under-specifying the program.
    
    bluecalm 3 months ago
    
    It only works for simple variables where initialisation to 0 is counter productive because you lose a useful compiler warning (about using initialised variable).
    The main case is about arrays. Here it's often impossible to prove some part of it is used before initialisation. There is no warning. It becomes a tradeoff: potentially costly initialisation (arrays can be very big) or potentially using random values other than 0.
    
    nullc 3 months ago
    
    Fair point though compilers could presumably do much better warning there on arrays-- at least treating the whole array like a single variable and warning when it knows you've read it without ever reading for it.
    
    bluecalm 3 months ago
    
    C has pointers. It's often very difficult or impossible to deduct if an array was written to or not. It's possible in some cases (local array and lack of pointers of the same type in the scope) though so yeah, a warning would be useful in those cases.
    
    RustyRussell 3 months ago
    
    In recent years I've come to rely on this non-initialization idiom. Both because as code paths change the compiler can warn for simple cases, and because running tests under Valgrind catches it.
- nullc 3 months ago
  
  Zero initializing often hides real and serious bugs, however. Say you have a function with an internal variable LEN that ought to get set to some dynamic length that internal operations will run over. Changes to the code introduce a path which skips the setting of LEN. Current compilers will (very likely) warn you about the potentially uninitialized use, valgrind will warn you (assuming the case gets triggered), and failing all that the program will potentially crash when some large value ends up in LEN-- alerting you to the issue.
  Compare with default zero init: The compiler won't warn you, valgrind won't warn you, and the program won't crash. It will just be silently wrong in many cases (particularly for length/count variables).
  Generally the attention to exploit safety can sometimes push us in directions that are bad for program correctness. There are many places where exploit safety is important, but also many cases where its irrelevant. For security it's generally 'safe' is a program erroneously shuts down or does less than it should but that is far from true for software generally.
  I prefer this behavior: Use of an uninitialized variable is an error which the compiler will warn about, however, in code where the compiler cannot prove that it is not used the compiler's behavior is implementation defined and can include trapping on use, initializing to zero, or initializing to ~0 (the complement of zero) or other likely to crash pattern. The developer may annotate with _noinit which makes any use UB and avoids the cost of inserting a trap or ~0 initialization. ~0 init will usually fail but seldom in a silent way, so hopefully at least any user reports will be reproducible.
  Similar to RESTRICT _noinit is a potential footgun, but its usage would presumably be quite rare and only in carefully maintained performance critical code. Code using _noinit like RESTRICT is at least still more maintainable than assembly.
  This approach preserves the compiler's ability to detect programmer error, and lets the implementation pick the preferred way to handle the remaining error. In some contexts it's preferable to trap cleanly or crash reliably (init to ~0 or explicit trap), in others its better to be silently wrong (init 0).
  Since C99 lets you declare variables wherever so it is often easy to just declare a variable where it is first set and that's probably best, of course. .. when you can.
- bluGill 3 months ago
  
  C++26 has everything initialiied by default. The value is not specified though. Implementations are encourage to use something weird to detect using before explict initialization.
- elromulous 3 months ago
  
  Devil's advocate: this would be unacceptable for os kernels and super performance critical code (e.g. hft).
  - TuxSH 3 months ago
    
    > this would be unacceptable for os kernels
    Depends on the boundary. I can give a non-Linux, microkernel example (but that was/is shipped on dozens of millions of devices):
    - prior to 11.0, Nintendo 3DS kernel SVC (syscall) implementations did not clear output parameters, leading to extremely trivial leaks. Unprivileged processes could retrieve kernel-mode stack addresses easily and making exploit code much easier to write, example here: https://github.com/TuxSH/universal-otherapp/blob/master/sour...
    - Nintendo started clearing all temporary registers on the Switch kernel at some point (iirc x0-x7 and some more); on the 3DS they never did that, and you can leak kernel object addresses quite easily (iirc by reading r2), this made an entire class of use-after-free and arbwrite bugs easier to exploit (call SvcCreateSemaphore 3 times, get sema kernel object address, use one of the now-patched exploit that can cause a double-decref on the KSemaphore, call SvcWaitSynchronization, profit)
    more generally:
    - unclearead padding in structures + copy to user = infoleak
    so one at least ought to be careful where crossing privilege boundaries
  - myrmidon 3 months ago
    
    No, just throw the __noinit attribute at every place where its needed.
    You probably would not even need it in a lot of instances because the compiler would elide lots of dead stores (zeroing) even without hinting.
  - sidkshatriya 3 months ago
    
    Would you rather have a HFT trade go correctly and a few nanoseconds slower or a few nanoseconds faster but with some edge case bugs related to variable initialisation ?
    You might claim that that you can have both but bugs are more inevitable in the uninitialised by default scenario. I doubt that variable initialisation is the thing that would slow down HFT. I would posit is it things like network latency that would dominate.
    
    hermitdev 3 months ago
    
    > Would you rather have a HFT trade go correctly and a few nanoseconds slower or a few nanoseconds faster but with some edge case bugs related to variable initialisation ?
    As someone who works in the HFT space: it depends. How frequently and how bad are the bad-trade cases? Some slop happens. We make trade decisions with hardware _without even seeing an entire packet coming in on the network_. Mistakes/bad trades happen. Sometimes it results in trades that don't go our way or missed opportunities.
    Just as important as "can we do better?" is "should we do better?". Queue priority at the exchange matters. Shaving nanoseconds is how you get a competitive edge.
    > I would posit is it things like network latency that would dominate.
    Everything matters. Everything is measured.
    edit to add: I'm not saying we write software that either has or relies upon unitialized values. I'm just saying in such a hypothetical, it's not a cut and dry "do the right thing (correct according to the language spec)" decision.
    
    Imustaskforhelp 3 months ago
    
    We make trade decisions with hardware _without even seeing an entire packet coming in on the network_
    Wait what????
    Can you please educate me on high frequency trading... , like I don't understand what's the point of it & lets say one person has created a hft bot then why the need of other bot other than the fact of different trading strats and I don't think these are profitable / how they compare in the long run with the boglehead strategy??
    
    hermitdev 3 months ago
    
    This is a vast, _vast_ over-simplification: The primary "feature" of HFT is providing liquidity to market.
    HFT firms are (almost) always willing to buy or sell at or near the current market price. HFT firms basically race each other for trade volume from "retail" traders (and sometimes each other). HFTs make money off the spread - the difference between the bid & offer - typically only a cent. You don't make a lot of money on any individual trade (and some trades are losers), but you make money on doing a lot of volume. If done properly, it doesn't matter which direction the market moves for an HFT, they'll make money either way as long as there's sufficient trading volume to be had.
    But honestly, if you want to learn about HFT, best do some actual research on it - I'm not a great source as I'm just the guy that keeps the stuff up and running; I'm not too involved in the business side of things. There's a lot of negative press about HFTs, some positive.
  - pjmlp 3 months ago
    
    It is acceptable enough for Windows, Android and macOS, that have been doing for at least the last five years.
    That is the usual fearmongering when security improvements are done to C and C++.
  - saagarjha 3 months ago
    
    The same OS kernel that zeros out pages before handing them back to me?
    
    frontfor 3 months ago
    
    This is arguing in bad faith. Just because the kernel does that doesn’t mean it does that in everywhere else.
    
    saagarjha 3 months ago
    
    The point is that there are security implications to not zeroing out memory, even if it costs performance. Making an argument that it’s too performance sensitive to do anything doesn’t actually hold water.
zzo38computer 3 months ago

I thought that {} should always initialize everything regardless of whether or not there is anything in between the braces, and that {0} should only be valid if the first member is a numeric or pointer type (but otherwise has the same effect as {} with nothing in between). I thought that would make more sense, isn't it?
(If you write {} with multiple values when initializing a union, then it should be an error unless all of the values are the same and all of the corresponding members (the first few if you do not explicitly specify which ones) are of the same type as each other.)
- wahern 3 months ago
  
  C never had {} until C23. In C {0} was the only way to explicitly zero-initialize a structure in a generic manner. It works because in C initializer lists are applied to members as-if nested structures are flattened out lexically.
  However, a long time ago C++ went in a completely different direction with initializer lists, and gcc and clang started emitting warnings (in C mode) about otherwise perfectly valid C code, thus the adoption of C++'s {} for C23. {0} is still technically valid C23, though, as well as valid C89, C90, C99, and C11. In fact, reading both C23 and C89 I'm struck by how little the language has changed:
  C89 3.5.7p16:
  > If the aggregate contains members that are aggregates or unions, or if the first member of a union is an aggregate or union, the rules apply recursively to the subaggregates or contained unions. If the initializer of a subaggregate or contained union begins with a left brace, the initializers enclosed by that brace and its matching right brace initialize the members of the subaggregate or the first member of the contained union. Otherwise, only enough initializers from the list are taken to account for the members of the first subaggregate or the first member of the contained union; any remaining initializers are left to initialize the next member of the aggregate of which the current subaggregate or contained union is a part.
  C23 6.7.10p21:
  > If the aggregate or union contains elements or members that are aggregates or unions, these rules apply recursively to the subaggregates or contained unions. If the initializer of a subaggregate or contained union begins with a left brace, the initializers enclosed by that brace and its matching right brace initialize the elements or members of the subaggregate or the contained union. Otherwise, only enough initializers from the list are taken to account for the elements or members of the subaggregate or the first member of the contained union; any remaining initializers are left to initialize the next element or member of the aggregate of which the current subaggregate or contained union is a part.
Blikkentrekker 3 months ago

I have to say, I've read the discussion this generated and it's a bit scary how no one seems to know whether type punning through unions is undefined or not in C, or rather, my conclusion reading it all is more so that many people are wrong and that is defined behavior, but some of the people who are wrong about it are actual GCC compiler developers so it can't be too easy to be right.
- krackers 3 months ago
  
  I don't understand why newer revisions of C don't work on fixing these small issues. Things that were previously "undefined/implementation-defined behavior" can easily be made to behave sensibly without breaking anything. Type punning, 2s complement overflow, 0-initializtion of unions, all of those should "just behave" sensibly how the programmer expects. And you can already get there with the right compiler flags, so why not just codify it. It's also not going to break anything since it was undefined behavior in the first place.
  - Blikkentrekker 3 months ago
    
    The thing about writing standards is that if you write standards compiler writers vehemently disagree with, they will just not implement them, and they disagree with it because their consumers do. A standard typically documents what is already happening. This is why some languages call their standards “reports”. They investigate and document what the majority of compilers are currently doing and encourage the others to follow suit.
    As for overflow, the reality is that most compilers simply assume it won't happen at this point. They do this because the consumers want it because it simply generates far faster code being able to assume that it won't happen. Yes, people often come with pathological examples to show why this is a bad idea of ridiculous optimizations being made no one expects because compilers assume it won't ever happen, but those are pathological, in practice it really comes down to loops. In many loops, compilers having to assume that loop variables can overflow in theory disables all sorts of optimizations and elisions and in practice they won't overflow and if they overflow that's an unintended bug anyway.
    Obviously a a very basic example is a loop adding some counter value to a counter and stopping when the counter is past a certain value. Assuming that integers can overflow, and that thus adding a value can make the counter less than what it used to be in theory obviously disables many optimizations in streamlining the logic. Just in general, assuming overflow can't occur means being able to make the assumption that adding a positive integer to another integer will always produce a larger integer than the original, that is a very powerful assumption for optimizations to be able to make obviously, assuming that overflow can happen removes it that's why it's undefined behavior. Compilers are free to assume it will never happen.
  - darthwalsh 3 months ago
    
    C still supports a huge variety of embedded processors, which I imagine influences the overflow UB. But clearing up the type semantics would be nice.
    
    krackers 3 months ago
    
    Are there any processors today which _don't_ use 2s complement?
    
    Gibbon1 3 months ago
    
    I use embedded processors. I don't know of any that don't use 2s complement. There are only a handful of increasingly irreverent processors that are big endian. And x86 real mode processors are long in the tooth.
    There other thing is the ratio of processing power vs memory size is very high for embedded machines. You have processors that can hold their own against a 486 but only have 16k of RAM. And the marginal cost of performance is low. A lot of devices spend most of their time doing utterly nothing.
mastax 3 months ago

Do distros have tooling to deal with this type of change?
I imagine it would be very useful to be able to search through all the C/C++ source files for all the packages in the distro in a semantic manner, so that it understands typedefs and preprocessor macros etc. The search query for this change would be something like "find all union types whose first member is not its largest member, then find all lines of code where that type is initialized with `{0}`".
- ryao 3 months ago
  
  As a retired Gentoo developer, I can say not really as far as I know. There could be static analysis tools that can find this, but I am not aware of anyone who runs them on the entire distribution.
  - mastax 3 months ago
    
    In theory it's just an extension of IDE tooling. A CLI with a little query language wrapping libclang. In practice I'm sure it's a nightmare just to get 20,000 packages' build systems wrangled such that the right source files get indexed by libclang, and all the endless plumbing for downloading packages and reporting results, and on and on.
    
    ryao 3 months ago
    
    Distribution build systems typically operate outside of an IDE. I suspect that it would be a nightmare to get 20,000 packages to compile in an IDE.
    It is possible in theory to write a compiler plugin to generate an error when code that does this is found and it would make it easy to find all of the instances in all packages by building with `make -k`, provided that the code is not hidden behind an unused package flag.
- ris 3 months ago
  
  Distributions tend to use shell-script-wrapped compilers that can inject additional flags desired by the distribution, and in all likelihood distributions will just add flags that force the old behaviour if there are problems.
anon-3988 3 months ago

lol this is exactly the kind of stuff I expects from C or C++ haha its kinda insane people just decide to do this amidst all the talk about correctness/safety.
ryao 3 months ago

> This is going to silently break so much existing code
How much code actually uses unions this way?
> especially union based type punning in C code
I have never done type punning via the GNU C compiler extension in a way that would break because of this. I always assign a value to it and then get out the value from a new type. Do you know of any code that does things differently to be affected by this?
- ndiddy 3 months ago
  
  > How much code actually uses unions this way?
  I see this change caused Mbed-TLS to start failing its test suite when compiled with GCC 15: https://github.com/Mbed-TLS/mbedtls/issues/9814 (kinda scary since it's a security library). Hopefully other projects with less rigorous test suites aren't using {0} in that way. The Github issue mentions that Clang tried a similar optimization a while ago and backed it out after user complaints, so maybe the same thing will happen with GCC.
  - ryao 3 months ago
    
    GCC’s developers have a strong insistence on standards conformance (minus situations where they explicitly choose to deviate, like type punning in unions) over the status quo. We already went through a much more severe shift with strict aliasing enforcement by GCC and they never changed course. I do not expect this to be any different.
- Calavar 3 months ago
  
  I would guess a lot. People aren't intimately familiar with the standard, and people are lazy when it comes to writing boilerplate like initialization code. And up until now, it just worked, so even a good test suite wouldn't catch it.
  EDIT: I initially mentioned type punning for arithmetic, but this compiler change wouldn't affect that
  - ryao 3 months ago
    
    How would that be broken by this? The union will be zero initialized regardless because this change only affects situations where the union members are of different lengths, but for integer to float, the union members should always be the same length or bad things will happen.
    
    Calavar 3 months ago
    
    I realized my mistake and I think I edited my comment a split second before you replied, but you're right. That particular type punning scenario wouldn't be affected by this change because 1) the members are the same size, so there's no padding bits 2) the specific union member is going to be initialized to the input parameter, not with the syntax sugar for aggregate zero initialization.
    
    ryao 3 months ago
    
    Well, under your original version, I could see someone filling in bit fields in the float like the exponent and sign while leaving the mantissa zeroed, but given that the integer and float would be the same length, there is no section that would be left uninitialized by this change.
    In order for this change to leave something uninitialized, you would need to have a member of the union after the first member that is longer than the first member. Code that does that and relies on {0} to zero the union seems incredibly rare to me.
akoboldfrying 3 months ago

Initialisation in C++ is just footguns all the way down.
not2b 3 months ago

I'm skeptical of the claim that this change will "silently break so much existing code". For it to change the behavior of code, the first member would have to be smaller than other members, someone would have to use this construct to initialize union objects, and it would have to affect the behavior. In any case, it's standard for the Fedora, Ubuntu, and Debian developers to go through all the packages and test with new GCC versions before they come out, so that issues are fixed before the new compiler is released.
psyclobe 3 months ago

There is no reason to use a union unless you're doing some C stuff; in which case just use C.
mistrial9 3 months ago

using UNION was always considered sketchy IMHO. This is trivia for security exploiters?
- grandempire 3 months ago
  
  No. This is how sum types are implemented.
  And from a runtime perspective it’s going to be a struct with perhaps more padding. You’ll need more details about your specific threat model to explain why that’s bad.
  - mistrial9 3 months ago
    
    a quick search says that std::variant is the modern replacement to implement your niche feature "sum types"
    
    jlouis 3 months ago
    
    Not a niche feature. Fundamental for any decent language with a type system.
    
    mistrial9 3 months ago
    
    ok, but C99 and C++11 and others, all have ways to implement types. "Fundemental" as you say.. using UNION in C++ is not a good choice to implement types.. in old C99, you can use UNION that way but why? footguns all around.
    
    saagarjha 3 months ago
    
    std::variant does not exist in C99.
    
    soraminazuki 3 months ago
    
    Whoa, that's a core building block of programming and computer science that you're dismissing as "niche" without explanation.
    
    mistrial9 3 months ago
    
    yes types are a core building block of programming and computer science, but not using UNION ? this casual dismissal of "criticisms of UNION" here seems superficial and un-wise to me.
    
    soraminazuki 3 months ago
    
    Sum types, not C unions. Different concepts.
    A sum type is a concept from type theory. Like unions, it expresses a type that can be either one of multiple types. But unlike unions, it retains information about which type it is.
    Properly implemented sum types are completely type safe. I can't be 100% sure what your particular "criticisms" of C unions precisely are, but assuming they all relate to type safety, they don't apply to sum types.
    Sum types are important because any real world project has to deal with data that's either A or B. There's nothing controversial here.
    In C, a union is a way to implement that. Yes, it's unsafe. But can you eliminate the use of unsafe features from C projects? No, if they deal with memory.
    Also, it's rich and quite frankly rude to brush off my comment as "casual dismissals," "superficial," and "unwise" when it's a direct response to this.
    > your niche feature "sum types"
    That's pure unprovoked smugness right there that contains no substance of what your criticisms actually are, let alone the reason.
    
    grandempire 3 months ago
    
    That’s for C++. And how is std::variant implemented?
    
    LowLevelMahn 3 months ago
    
    not using a union: https://ojdip.net/2013/10/implementing-a-variant-type-in-cpp... because the union can't be extended with variadic template types
    
    LegionMammal978 3 months ago
    
    Actually, it does use a union, in both libstdc++ [0] and libc++ [1]. (Underneath a lengthy stack of base classes, since it wouldn't be C++ if it weren't painful to match the specified semantics.)
    [0] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3...
    [1] https://github.com/llvm/llvm-project/blob/llvmorg-20.1.3/lib...
    
    grandempire 3 months ago
    
    So instead it has a buffer large enough to hold all the types? That’s what union does.
    Still waiting to hear the security concerns.
VyseofArcadia 3 months ago

I feel like once a language is standardized (or reaches 1.0), that's it. You're done. No more changes. You wanna make improvements? Try out some new ideas? Fine, do that in a new language.
I can deal with the footguns if they aren't cheekily mutating over the years. I feel like in C++ especially we barely have the time to come to terms with the unintended consequences of the previous language revision before the next one drops a whole new load of them on us.
- seritools 3 months ago
  
  > If the size of the new type is larger than the size of the last-written type, the contents of the excess bytes are unspecified (and may be a trap representation). Before C99 TC3 (DR 283) this behavior was undefined, but commonly implemented this way.
  https://en.cppreference.com/w/c/language/union
  > When initializing a union, the initializer list must have only one member, which initializes the first member of the union unless a designated initializer is used(since C99).
  https://en.cppreference.com/w/c/language/struct_initializati...
  → = {0} initializes the first union variant, and bytes outside of that first variant are unspecified. Seems like GCC 15.1 follows the 26 year old standard correctly. (not sure how much has changed from C89 here)
- pjmlp 3 months ago
  
  Programming languages are products, that is like saying you want to keep using vi 1.0.
  Maybe C should have stop at K&R C from UNIX V6, at least that would have spared the world in having it being adopted outside UNIX.
  - rgoulter 3 months ago
    
    I liked the idea I heard: internet audiences demand progress, but internet audiences hate change.
  - ryao 3 months ago
    
    If C++ had never been invented, that might have been the case.
    
    pjmlp 3 months ago
    
    C++ was invented exactly because Bjarne Stroustoup vouched never again to repeat the downgrade of his development experience from Simula to BCPL.
    When faced with writing a distributed systems application at Bell Labs, and having to deal with C, the very first step was to create C with Classes.
    Also had C++ not been invented, or C gone into an history footnote, so what, there would be other programming languages to chose from.
    Lets not put programming languages into some kind of worshiping sanctuary.
    
    uecker 3 months ago
    
    I don't think C would have become a footnote if not for C++ given UNIX.
    
    pjmlp 3 months ago
    
    Most likely C++ would not happened, while at the same time C and UNIX adoption would never gotten big enough to be relevant outside Bell Labs.
    Which then again, isn't that much of a deal, industry would have steered into other programming languages and operating systems.
    Overall that would be a much preferable alternative timeline, assuming security would be taken more seriously, as it has taken 45 years since C.A.R Hoare Turing award speech and Morris worm, and only after companies and government started to feel the monetary pain of their decisions.
    
    uecker 3 months ago
    
    I think there are very good reasons why C and UNIX were successful and are still around as foundational technologies. Nor do I think C or UNIX legacy are the real problem we have with security. Instead, complexity is the problem.
    
    pjmlp 3 months ago
    
    Starting by being available for free with source code tapes, and a commented source code book.
    History would certainly have taken a different path when AT&T was allowed to profit from Bell Labs work, as their attempts to later regain control from UNIX prove.
    Unfortunately that seems the majority opinion on WG14, only changed thanks to government and industry pressure.
    
    uecker 3 months ago
    
    Being free was important and history could have taken many paths, but this does not explain why it is still important today and has not been replaced despite many alternatives. WG14 consists mostly of industry representatives.
    
    pjmlp 3 months ago
    
    It is important today just like COBOL and Fortran are with ongoing ISO updates, sunken cost, no one is getting more money out of rewriting their systems just because, unless there are external factors, like government regulations.
    Then we have the free beer UNIX clones as well.
    Those industry members of WG14 don't seem to have done much security wise language improvement during the last 50 years.
    
    uecker 3 months ago
    
    I think this is far from the truth.
- ryao 3 months ago
  
  I suspect this change was motivated by standards conformance.
  - fuhsnn 3 months ago
    
    The wording of GCC maintainer was "the standard doesn't require it." when they informed Linux kernel mailing list.
    https://lore.kernel.org/linux-toolchains/Z0hRrrNU3Q+ro2T7@tu...
    
    matheusmoreira 3 months ago
    
    Reminds me of strict aliasing. Same attitude...
    https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/
- Ragnarork 3 months ago
  
  > I feel like once a language is standardized (or reaches 1.0), that's it. You're done. No more changes. You wanna make improvements? Try out some new ideas? Fine, do that in a new language.
  Thank goodness this is not how the software world works overall. I'm not sure you understand the implications of what you ask for.
  > if they aren't cheekily mutating over the years
  You're complaining about languages mutating, then mention C++ which has added stuff but maintained backwards compatibility over the course of many standards (aside from a few hiccups like auto_ptr, which was also short lived), with a high aversion to modifying existing stuff.
- _joel 3 months ago
  
  Perl 6 and Python 3 joined the chat
- hulitu 3 months ago
  
  It's careless development. Why think something in advance when you can fix it later. It works so well for Microsoft, Google and lately Apple. /s
  The release cycle of a software speaks a lot about its quality. Move fast, break things has become the new development process.
  - pasc1878 3 months ago
    
    That does not make sense for anything that exists over decades.
    Do you want to be still using Windows NT, or C++ pred 2004 standard or python 2.0
    We learn more and need to add to things., Some things we designed 30 years ago were a mistake should we stick with them.
    You can't design everything before release for much software. Games you can or bespoke software for a business as you can define what it does, but then the business changes.

omoikane 3 months ago

Really excited about #embed support:

> C: #embed preprocessing directive support.

> C++: P1967R14, #embed (PR119065)