When the Simplest Concurrent Program Goes Against All Intuition

wyounas.github.io

45 points by simplegeek 6 months ago

It's never actually explained how n=2 is possible. Here's how it works:

P1 and P2 start executing. P1 sets temp=1, then P2 runs through the loop for i=1..9. Then P1 sets n=temp=1. P2 sets temp=2, then P1 resumes executing for i=2..10, and completes. P2 sets n=temp=2, then completes.

The intuition is P1 checkpoints, P2 runs through the loop up until the last iteration, P1 resets n to the checkpoint + 1, P2 checkpoints, P1 runs through the loop, then P2 resets n to the checkpoint + 1.

simplegeek 6 months ago

Thank you! I initially included the illustration and explanation in the first draft but decided to remove them in favor of the full trail. Looking back, I think that was a mistake. I’ve now added back the illustration along with some explanation.

tombert 6 months ago

This was fun to port over to PlusCal to verify it myself:

    EXTENDS Naturals, Sequences
    
    (*--algorithm StateStore {
    
    variables
      n = 0; completions = <<>>; 
    
    define {
        Invar == Len(completions) = 2 => n # 2
    }
    
    fair process (thing \in {"p", "q"})
    variables i = 0, temp = 0; 
      {
       Loop:
            while (i < 10) {
               First:
                   temp := n + 1;
               Second:
                   n := temp;
               Last: 
                   i := i + 1; 
        };
        
        Complete:
            completions := Append(completions, 1)
      }
    }*)

(I acknowledge that this isn't the most elegant Pluscal but I think it is a roughly accurate?)

simplegeek 6 months ago

Thank you :) I was wondering how this would look in TLA+/PlusCal.

Groxx 6 months ago

To be pedantic (which I feel is fair because it's a concurrency question about possibilities):

>If we run P and Q concurrently, with ‘n’ initialized to zero, what would the value of ‘n’ be when the two processes finish executing their statements?

It depends a lot on the observer of `n` and what relationship it has with P and Q. Which isn't defined.

E.g. a trivial boundary-answer is that it can be 0, if nothing guarantees that P's and Q's threads' memory reaches the `n`-observer's thread. This is somewhat common if you try to synchronize via `sleep`, because that usually doesn't guarantee anything as all.

We also don't know the computer architecture. We can probably assume at least one byte moves between memory levels atomically, because basically every practical system does that, but that doesn't have to be true. If bits move between memory levels independently, this could observe 31, because it can be any combination of the bits between `00000` and `10100`, which includes `01011` -> there can be a `1` in any position, including all positions, so you can observe a number that was never assigned. (IRL: multi-word values and partial initializations can do this, e.g. it's why double-checked locking is flawed without something to guarantee initialization completes before the pointer is exposed)

simplegeek 6 months ago

You're right, I could have phrased it better to something like:
"If we run P and Q concurrently with ‘n’ initialized to zero, what extreme interleaving could result in the lowest value of 'n' when the two processes finish executing their statements on a model checker?"
I'll edit it to improve, thanks.
- Groxx 6 months ago
  
  tbh I think zero is a completely reasonable result for a general thought-exercise like this. sleep-based "memory fixes" are quite common in practice, and there's no synchronization at all in the sample.
  though there's a lot of fun in here if you allow for optimizing compilers and lack of synchronization. e.g. without explicit synchronization between P/Q and the observing thread (assuming it's the main thread), it's reasonable for a compiler to simply delete P and Q entirely and replace the whole program with `print(0)`

fjfaase 6 months ago

This implicitely assumes atomic assignments, meaning that during an assignment all bits representing a value are transfered in one atomic unit. This sounds a bit trivial, but if one would be working with large integers that are stored in multiple memory words, it is less trivial.

I think it is possible to implement locking with only atomic assigments.

jprete 6 months ago

I get your point, and my first question was "what operations are even atomic here for this problem to be well-defined?".
But I think "language operations are atomic and everything else isn't" is a reasonable inference for a puzzle.
I'm also intrigued by locking being done through atomic assignments; I thought it required at least something like "write if equal to value" or "write new value and return old".
- fjfaase 6 months ago
  
  The idea is that there is one task that does the locking. It does so by copying a value from a memory location A to B, when B is equal 0 (the initial value) at regular intervals. Any task that wants to get the lock, writes a unique task identifier (unequal to 0) to memory location A and starts polling B. When it becomes equal to its task identifier, it did get the lock. To release the lock, it writes 0 to B. If it did not get the lock and sees that B has become zero, and A is different, it may overwrite A with its own task identifier.
  This locking is 'slow' and has 'unfair' behavior.
  - fjfaase 6 months ago
    
    I think that the following PROMELA code proofs it:
    byte n = 0 byte A = 0 byte B = 0 proctype Client(byte id) { do :: (A == 0) -> B = id :: (A == id) -> n = n + 1; assert (n <= 1); n = n - 1; A = 0 od } proctype LockServer() { do :: (A == 0) -> A = B od } init { atomic { run LockServer(); run Client(1); run Client(2); run Client(3); } }
    It fails when two 'Client(2)' is replaced by 'Client(1)'
yencabulator 6 months ago

You can replace the read and store with atomic operations and n==2 is still possible. As far as I can tell, the model being verified already uses atomic operations.
bcoates 6 months ago

Yeah, looking at the code my 'intuition' was "this is a classic data race, the program is broken" not "somewhere between 10 and 20".
I suppose if you assume all global accesses are atomic, it's a good demonstration that atomic operations don’t compose atomically (even in trivial cases) and aren't particularly useful as concurrency primitives.
- Salgat 6 months ago
  
  The question is more testing your full understanding of why this is a race condition, including understanding the possible side effects, since that's the only thing you start out with when someone sends in a bug report.

hedin_hiervard 6 months ago

I was once asked a very similar problem during a job interview for a position of a software engineer. I was offered to solve it on a piece of paper or orally. We ran out of time and they offered me to take it home. I did and I spent a few hours thinking about it. Google and ChatGPT were not able to produce better results than the intuitive answer, so I built a test program that confirmed the weird behaviour. After meditating for a few hours I was able to find the correct answer and I was indeed fascinated by this. Do you also think it's too much for a usual interview?

kunley 6 months ago

Kudos for solving this. And yes, this is too much for a job interview.
- burfle 6 months ago
  
  I don't think this an unreasonable puzzle for a role that involves problem-solving or security.
  You can read the code, figure out what primitives you have, translate the problem into something simple (I like card and board games), then map it back.
  Problem: two processes each do this ten times: - read the counter - do any number of other things in other processes - write back what you read, plus one.
  Game: You have two piles of cards. Each card costs 1 move to play, but lets you discard any number of cards from other piles at the same time.
  Solution: play one red card, discarding everything but one blue card. Play the single remaining blue card, discarding the leftover red cards.
  Back: process 1 reads, process 2 runs 9 steps, process 1 writes, process 2 reads, process 1 finishes, process 2 writes.

sim7c00 6 months ago

the best thing i think i read somewhere is to think of modern cpus as a bunch of computers on a network. each task is transmitted in messages over the network potentially, executed and its result sent back in a packet. despite this not being fully accurate it does give an idea that even in a set of 100 operations, due to caches, latencies and also kernel/hardware stuff happening during these 100 operations however much you try to isolate them, the first operation submitted might see its result sent back last.

this will help to determine or realize better when and where synchronization is needed even though you might not expect it. (or remoddeling of the code).

its extremely hard to writr concurrent code well. i think i've hardly ever managed lockless code that wasnt super bugy (yet! :D). it takes a lot of focus and patience, and tons details about the platform(s) it might run on.

chozang 6 months ago

Bonus points for mentioning Spin. My "Software Engineering" course in 1987 in grad school would have been more accurately titled, "Using the Spin Model Checker". Except when I was looking for it, I had never came across any mention of Spin since.

jpc0 6 months ago

As another commentator mentioned, you have violated a precondition, there's no point trying to understand what would happen because its indeterminate.

Izikiel43 6 months ago

The intuition I developed over the years says that result is unknown, it's a race condition, all bets are off. Specially since doing N+1.

dtgriscom 6 months ago

Seems pretty clear that the result could be anything from 1 to 20.

(Assumes atomic assignments, as noted by someone else.)

vbsd 6 months ago

What sort of an interleaving would produce 1? Seems provably impossible to me, assuming atomic assignments.
- dtgriscom 6 months ago
  
  I took another look, and AFAICT you're right. The best I can do is get the value 2:
  * Run procedure P into its first iteration, stopping after temp is assigned "n + 1", or the value 1
  * Run procedure Q through nine iterations, stopping after the ninth has completed (n changes, but who cares?)
  * Run the next line of procedure P's first iteration, so that it assigns temp's value of 1 to n; n is now 1
  * Run procedure Q into its tenth iteration, stopping after temp is assigned the value 2
  * Complete procedure P (n changes, but who cares?)
  * Complete procedure Q, so that n is assigned the value of temp, or 2.

lincpa 6 months ago

[dead]