Quite OK Image format (QOI). One-page spec, comically fast, similar ratios.
I got thoroughly nerd-sniped by the same guy’s Quite OK Audio format (QOA), because he did a much less mic-drop job of that one. The target bitrate is high for a drop-in replacement for MP3 or Vorbis, and the complexity level is… weird. QOI is shockingly simple; QOA involves brute force and magic numbers. I thought I could do better.
I embraced the 64-bit “frame” concept, wrote some JavaScript for encoding and decoding as a real-time audio filter, and got one-bit samples sounding pretty good… in some contexts. Basically I implemented delta coding. Each one-bit sample goes up or down by a value specified in that frame, with separate up and down values defined on a log scale using very few bits. Searching for good up/down values invites obsession but works fine with guess-and-check because each dataset is tiny. I settled on simulated annealing.
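Roughly, the per-frame delta scheme can be sketched like this. It is an illustration under assumptions, not the original code: the 4-bit step codes, the power-of-two `stepFromCode` table, the choice to store Down as a magnitude that gets subtracted, and all function names are made up here. It also swaps in a plain exhaustive search where the original used simulated annealing, and the encoder picks each bit greedily.

```javascript
// Hypothetical log-scale step table: a small code maps to a step of 2^code.
// The real bit widths and scale are not given in the post.
function stepFromCode(code) {
  return 2 ** code;
}

// Greedy one-bit delta encoder for one frame of samples.
// A 1 moves the running value up by `up`; a 0 moves it down by `down`.
function encodeDelta(samples, start, up, down) {
  const bits = [];
  let value = start;
  let error = 0;
  for (const target of samples) {
    const hi = value + up;
    const lo = value - down;
    // Pick whichever move lands closer to the input sample.
    const bit = Math.abs(hi - target) <= Math.abs(lo - target) ? 1 : 0;
    value = bit ? hi : lo;
    bits.push(bit);
    error += (value - target) ** 2; // accumulated squared error
  }
  return { bits, error };
}

// Decoder: replay the same up/down moves from the same starting value.
function decodeDelta(bits, start, up, down) {
  const out = [];
  let value = start;
  for (const bit of bits) {
    value += bit ? up : -down;
    out.push(value);
  }
  return out;
}

// Guess-and-check over every (up, down) code pair, keeping the pair
// with the least squared error. Stand-in for simulated annealing.
function searchDelta(samples, start, codeBits = 4) {
  let best = null;
  const count = 1 << codeBits;
  for (let upCode = 0; upCode < count; upCode++) {
    for (let downCode = 0; downCode < count; downCode++) {
      const result = encodeDelta(
        samples, start, stepFromCode(upCode), stepFromCode(downCode));
      if (best === null || result.error < best.error) {
        best = { upCode, downCode, ...result };
      }
    }
  }
  return best;
}
```

With only 256 code pairs and a few dozen samples per frame, trying everything is cheap, which is why guess-and-check is workable at all.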
I ditched this shortly after adding double-delta coding. So instead of a 0 making sample N+1 be sample N plus the Down value, sample N+1 is always sample N plus the Change value, and a 0 makes Change equal Change plus the Down value. This turns out to be really good at encoding a wiggly line, one millisecond at a time. If it’s low-frequency. Low frequencies sound fantastic. Old-timey music? Gorgeous, slight hiss. Speech? Crystal clear. Techno? Complete honking garbage. Hilariously bad. Throw a high sine wave at delta coding and you get noise. With double-delta coding you get pleasant noise, but it’s still nonsense bearing little resemblance to the input. It’s not even a low-pass filter; the encoding method just chokes.
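The double-delta step can be sketched the same way. Again these are assumptions rather than the original code: Change is updated before it is added to the sample (the description above doesn't pin down the order), Down is stored as a magnitude and subtracted, the encoder picks bits greedily, and the function names are invented.

```javascript
// Double-delta decoder: each bit nudges the running Change,
// and every sample is the previous sample plus the current Change.
function decodeDoubleDelta(bits, start, startChange, up, down) {
  const out = [];
  let value = start;
  let change = startChange;
  for (const bit of bits) {
    change += bit ? up : -down; // assumed order: adjust Change first
    value += change;
    out.push(value);
  }
  return out;
}

// Greedy encoder: try both possible Change updates and keep the one
// whose resulting sample lands closer to the input.
function encodeDoubleDelta(samples, start, startChange, up, down) {
  const bits = [];
  let value = start;
  let change = startChange;
  let error = 0;
  for (const target of samples) {
    const hiChange = change + up;
    const loChange = change - down;
    const bit =
      Math.abs(value + hiChange - target) <= Math.abs(value + loChange - target)
        ? 1
        : 0;
    change = bit ? hiChange : loChange;
    value += change;
    bits.push(bit);
    error += (value - target) ** 2;
  }
  return { bits, error };
}

// A small curve that rises and then flattens out.
console.log(decodeDoubleDelta([1, 1, 0, 0], 0, 0, 1, 1)); // [ 1, 3, 4, 4 ]
```

Because the bits steer the slope rather than the value, a smooth curve costs almost nothing to follow, while anything that reverses direction every sample or two leaves the slope no time to catch up.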
The clear fix would be re-implementing an initial test, where you specify high and low absolute values, and your one-bit samples just pick between them. It’s naive carried-error quantization and it sounds like a child’s toy that’s never getting new batteries. But I’d do it alongside the delta options. Selecting which approach produces the least error would be done per-millisecond. You’d get occasional artifacting instead of mangled output or constant buzzing. I just ran out of steam and couldn’t be arsed.
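That fallback, plus the per-frame choice between approaches, might look something like this. It is a sketch with invented names (`encodeTwoLevel`, `decodeTwoLevel`, `pickMode`), and it assumes each candidate encoder reports a squared-error total so the results can be compared.

```javascript
// Two-level quantizer with carried error: every output sample is either
// `high` or `low`, and whatever error that leaves is added to the next input.
function encodeTwoLevel(samples, high, low) {
  const bits = [];
  let carry = 0;
  let error = 0;
  for (const target of samples) {
    const wanted = target + carry; // input plus the error carried so far
    const bit = Math.abs(high - wanted) <= Math.abs(low - wanted) ? 1 : 0;
    const value = bit ? high : low;
    carry = wanted - value; // push the leftover onto the next sample
    bits.push(bit);
    error += (value - target) ** 2;
  }
  return { bits, error };
}

// Decoder: each bit simply selects one of the two absolute levels.
function decodeTwoLevel(bits, high, low) {
  return bits.map((bit) => (bit ? high : low));
}

// Per-frame selection: keep whichever candidate encoding has the least error.
// Each candidate is assumed to look like { mode, bits, error }.
function pickMode(candidates) {
  return candidates.reduce((best, c) => (c.error < best.error ? c : best));
}

// Silence between levels +1 and -1 comes out as an alternating buzz.
console.log(encodeTwoLevel([0, 0, 0, 0], 1, -1).bits); // [ 1, 0, 1, 0 ]
```

The alternating output for a silent input is the toy-with-dying-batteries sound in miniature. The output averages out correctly, but every individual sample is wrong.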