This is about 25-30% faster than the current scalar version, on both an
M1 Mac Mini and a Raspberry Pi 4 running 32-bit Raspberry Pi OS.
Since this needs tons of NEON registers, it might run better on a Pi4 in
64-bit mode, if the compiler understands it can have 32 SIMD registers
instead of 16, and never shuffle things between registers and RAM.
Reference Issue #23.
Since we use the same Cr and Cb values for each of these rows, this
lets us calculate them once and reuse them with both rows' Y values.
The test case went from 2.0 seconds to 1.9.
Reference Issue #23.
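A minimal sketch of the shared-chroma idea, not the actual code from
this commit. It assumes 4:2:0-style data (one Cb/Cr sample per 2x2
block of pixels) and a common 16.16 fixed-point approximation of
full-range BT.601. `convert_two_rows` and `clamp_u8` are hypothetical
names chosen for illustration.

```c
#include <stdint.h>

// Clamp an int to the 0..255 range of an 8-bit channel.
static uint8_t clamp_u8(int v)
{
    return (uint8_t) ((v < 0) ? 0 : ((v > 255) ? 255 : v));
}

// Hypothetical helper: convert two rows of Y that share one row of Cb/Cr
// into two rows of packed RGB (3 bytes per pixel). The coefficients are an
// assumption (full-range BT.601 scaled by 65536), not taken from the project.
static void convert_two_rows(const uint8_t *y0, const uint8_t *y1,
                             const uint8_t *cb, const uint8_t *cr,
                             uint8_t *rgb0, uint8_t *rgb1, int width)
{
    for (int x = 0; x < width; x++) {
        const int u = cb[x / 2] - 128;  // chroma is half-width in 4:2:0
        const int v = cr[x / 2] - 128;

        // Chroma contributions: computed once, reused for both rows.
        const int r_add = (91881 * v) / 65536;               // 1.402 * Cr
        const int g_add = (-22554 * u - 46802 * v) / 65536;  // -0.344*Cb - 0.714*Cr
        const int b_add = (116130 * u) / 65536;              // 1.772 * Cb

        // Upper row: only the Y value differs.
        rgb0[x * 3 + 0] = clamp_u8(y0[x] + r_add);
        rgb0[x * 3 + 1] = clamp_u8(y0[x] + g_add);
        rgb0[x * 3 + 2] = clamp_u8(y0[x] + b_add);

        // Lower row: same chroma terms, different Y.
        rgb1[x * 3 + 0] = clamp_u8(y1[x] + r_add);
        rgb1[x * 3 + 1] = clamp_u8(y1[x] + g_add);
        rgb1[x * 3 + 2] = clamp_u8(y1[x] + b_add);
    }
}
```

The caller would step through the image two rows at a time, passing the
one chroma row that both of them share.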
Turns out dramatically reducing register pressure by precalculating a
bunch of stuff and shoving it into `#define`s makes these two
approaches about equal in performance. The `#else` block is _still_
slightly faster, but it's a statistical rounding error now, and maybe
just imaginary.
Reference Issue #23.
It's not clear to me why the `#if 0` block would be slower...maybe we ran
out of CPU registers and it ended up shuffling stuff back and forth
from the stack?
Just adding this to revision control so I have a note, later, that I
tried this.
Reference Issue #23.
The test case dropped from 2.4 seconds to about 2.0 on this (x86-64)
laptop, and my guess is this is a much more drastic improvement on
things like a Raspberry Pi.
Reference Issue #23.
This precalculates most stuff, so the tight loop does a lot less math.
A test case on my laptop dropped from 3.2 seconds to around 2.4. Not bad!
Reference Issue #23.
This was passing pthread_mutex_t around by value, but this is
often a struct, so we were copying the struct and locking the
_copy_, which is thrown away upon return from Mutex_Lock(). The
actual mutex would sit there untouched, which is to say this
basically disabled locking. You might get
lucky and the struct has some internal pointer where you end up
with a working lock anyhow, but this is not the case on macOS
and probably other places, too.
Surprisingly, this very rarely produced a noticeable race
condition, but there have been weird assertions from time to
time, and I expect this will fix those.
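A minimal sketch of the by-pointer shape, for illustration. Only the
name Mutex_Lock() comes from the text above; the signatures shown and
Mutex_Unlock() are assumptions, not the project's actual API.

```c
#include <pthread.h>

// Broken shape, for contrast: the parameter is a copy of the caller's
// mutex, so the lock is taken on a temporary that is discarded on return.
//     void Mutex_Lock(pthread_mutex_t mutex) { pthread_mutex_lock(&mutex); }

// Fixed shape: take a pointer, so the caller's own mutex is the one locked.
static int Mutex_Lock(pthread_mutex_t *mutex)
{
    return pthread_mutex_lock(mutex);
}

// Assumed counterpart to Mutex_Lock(); also takes the mutex by pointer.
static int Mutex_Unlock(pthread_mutex_t *mutex)
{
    return pthread_mutex_unlock(mutex);
}
```

Callers would pass `&some_mutex` instead of `some_mutex`, so every
thread contends on the same object.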
This blacks out the screen for a bit between seeks and gameplay resuming.
Right now there isn't code to enable this, but eventually I'll hook up
cvars at the engine level and add this.
God mode chooses the first choice that leads to a sequence that
isn't marked `kills_player=true`, but the reversed long falling
platform and the fire room both have choices that lead to a
non-interactive sequence that, on timeout, leads to a
player-killing sequence.
Mark those sequences as fatal, too, to fix god mode. In normal
play, the game doesn't look at `kills_player` unless there's no
`nextsequence` available, so marking them won't end the scene
early.
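A rough sketch of the god mode selection described above, to show why
the intermediate sequences need the flag. Only `kills_player` and
`nextsequence` are names from the game data; the `Sequence` struct and
`godmode_pick_choice` are hypothetical and much simpler than the real
thing.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical, simplified sequence record.
typedef struct Sequence
{
    bool kills_player;                    // sequence is marked as fatal
    const struct Sequence *nextsequence;  // followed on timeout, or NULL
} Sequence;

// Return the index of the first choice whose target sequence is not marked
// fatal, or -1 if every choice is fatal. Only the immediate target is
// checked; whatever `nextsequence` leads to later is not inspected.
static int godmode_pick_choice(const Sequence *const *choices, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!choices[i]->kills_player) {
            return (int) i;
        }
    }
    return -1;
}
```

Because the check stops at the immediate target, an unmarked
non-interactive sequence that times out into a death looks safe to this
loop, so it has to carry `kills_player=true` itself.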