I originally told the story over on the other site, but I thought I’d share it here. With a bonus!

I was working on a hardware accessory for the OG iPad. The accessory connected to the iPad over USB and provided MIDI in/out and audio in/out appropriate for a musician trying to lay down some tracks in Garage Band.

It was a winner of a product because at its core, it was based on a USB product we had already been making for PCs for almost a decade. All we needed was a little microcontroller to put the iPad into USB host mode (this was in the 30-pin connector days), and then allow it to connect to what was basically a finished product.

This product was so old in fact that nobody knew how to compile the source code. When it came time to get it working, someone had to edit the binaries to change the USB descriptors to reflect the new product name and that it drew <10mA from the iPad’s USB port (the original device was port-powered, but the iPad would get angry if you requested more than 10mA even if you were self-powered). This was especially silly because the original product had a 4-character name, but the new product had a 7-character name. We couldn’t make room for the extra bytes, so we had to truncate the name to fit it into the binary without breaking anything.

Anyway, product ships and we notice a problem. Every once in a while, a MIDI message is missed. For those of you not familiar, MIDI is used to transmit musical notes that can be later turned into audio by whatever processor/voice you want. A typical message contains the note (A, B, F-sharp, etc), a velocity (how hard you hit the key), and whether it’s a key on or key off. So pressing and releasing a piano key generate two separate messages.

Missing the occasional note message wouldn’t typically be a big deal except for instrument voices with infinite sustain like a pipe organ. If you had the pipe organ voice selected when using our device, it’s possible that it would receive a key on, but not a key off. This would result in the iPad assuming that you were holding the key down indefinitely.

There isn’t an official spec for what to do if you receive another key-on of the same note without a key-off in between, but Apple handled this in the worst way possible. The iPad would only consider the key released if the number of key-ons and key-offs matched. So the only way to release this pipe organ key was to hope for it to skip a subsequent key-on message for the same key and then finally receive the key-off. The odds of this happening are approximately 0%, so most users had to resort to force quitting the app.

Rumors flooded the customer message boards about what could cause this behavior, maybe it was the new iOS update? Maybe you had to close all your other apps? There was a ton of hairbrained theories floating around, but nobody had any definitive explanation.

Well I was new to the company and fresh out of college, so I was tasked with figuring this one out.

First step was finding a way to generate the bug. I wrote a python script that would hammer scales into our product and just listened for a key to get stuck. I can still recall the cacophony of what amounted to an elephant on cocaine slamming on a keyboard for hours on end.

Eventually, I could reproduce the bug about every 10 minutes. One thing I noticed is that it only happened if multiple keys were pressed simultaneously. Pressing one key at a time would never produce the issue.

Using a fancy cable that is only available to Apple hardware developers, I was able to interrogate the USB traffic going between our product and the iPad. After a loooot of hunting (the USB debugger could only sample a small portion, so I had to hit the trigger right when I heard the stuck note), I was able to show that the offending note-off event was never making it to the iPad. So Apple was not to blame; our firmware was randomly not passing MIDI messages along.

Next step was getting the source to compile. I don’t remember a lot of the details, but it depended on “hex3bin” which I assume was some neckbeard’s version of hex2bin that was “better” for some reasons. I also ended up needing to find a Perl script that was buried deep in some university website. I assume that these tools were widely available when the firmware was written 7 years prior, but they took some digging. I still don’t know anything about Perl, but I got it to run.

With firmware compiling, I was able to insert instructions to blink certain LEDs (the device had a few debug LEDs inside that weren’t visible to the user) at certain points in the firmware. There was no live debugger available for the simple 8-bit processor on this thing, so that’s all I had.

What it came down to was a timing issue. The processor needed to handle audio traffic as well as MIDI traffic. It would pause whatever it was doing while handling the audio packets. The MIDI traffic was buffered, so if a key-on or key-off came in while the audio was being handled, it would be addressed immediately after the audio was done.

But it was only single buffered. So if a second MIDI message came in while audio was being handled, the second note would overwrite the first, and that first note would be forever lost. There is a limit to how fast MIDI notes can come in over USB, and it was just barely faster than it took to process the audio. So if the first note came in just after the processor cut to handling audio, the next note could potentially come in just before the processor cut back.

Now for the solution. Knowing very little about USB audio processing, but having cut my teeth in college on 8-bit 8051 processors, I knew what kind of functions tended to be slow. I did a Ctrl+F for “%” and found a 16-bit modulo right in the audio processing code.

This 16-bit modulo was just a final check that the correct number of bytes or bits were being sent (expecting remainder zero), so the denominator was going to be the same every time. The way it was written, the compiler assumed that the denominator could be different every time, so in the background it included an entire function for handling 16-bit modulos on an 8-bit processor.

I googled “optimize modulo,” and quickly learned that given a fixed denominator, any 16-bit modulo can be rewritten as three 8-bit modulos.

I tried implementing this single-line change, and the audio processor quickly dropped from 90us per packet to like 20us per packet. This 100% fixed the bug.

Unfortunately, there was no way to field-upgrade the firmware, so that was still a headache for customer service.

As to why this bug never showed up in the preceding 7 years that the USB version of the product was being sold, it was likely because most users only used the device as an audio recorder or MIDI recorder. With only MIDI enabled, no audio is processed, and the bug wouldn’t happen. The iPad however enabled every feature all the time. So the bug was always there. It’s just that nobody noticed it. Edit: also, many MIDI apps don’t do what Apple does and require matching key on/key off events. So if a key gets stuck, pressing it again will unstick it.

So three months of listening to Satan banging his fists on a pipe organ lead to a single line change to fix a seven year old bug.

TL;DR: 16-bit modulo on an 8-bit processor is slow and caused packets to get dropped.

The bonus is at 4:40 in this video https://youtu.be/DBfojDxpZLY?si=oCUlFY0YrruiUeQq

  • CosmicTurtle0@lemmy.dbzer0.com
    link
    fedilink
    English
    arrow-up
    94
    ·
    6 months ago

    OP: not only was this a great solve, you wrote this very well.

    If you haven’t already, I’d do a RCA for your company and send it to your manager and manager’s manager.

    And keep a copy for yourself.

    There aren’t many engineers that can code and write well.

  • litchralee
    link
    fedilink
    English
    arrow-up
    26
    ·
    edit-2
    6 months ago

    There was a ton of hairbrained theories floating around, but nobody had any definitive explanation.

    Well I was new to the company and fresh out of college, so I was tasked with figuring this one out.

    This checks out lol

    Knowing very little about USB audio processing, but having cut my teeth in college on 8-bit 8051 processors, I knew what kind of functions tended to be slow.

    I often wonder if this deep level understanding of embedded software/firmware design is still the norm in university instruction. My suspicion has been that focus moved to making use of ever-increasing SoC performance and capabilities, in the pursuit of making it Just Work™ but also proving Wirth’s Law in the process via badly optimized code.

    This was an excellent read, btw.

  • Bezier@suppo.fi
    link
    fedilink
    arrow-up
    6
    ·
    6 months ago

    Nice bonus :D

    Looks like it’s the same dock I was imagining the whole time. A couple years ago I randomly got a chance to play around with an old ipad (2?) and a midi keyboard. Didn’t encounter this bug.

  • ipkpjersi@lemmy.ml
    link
    fedilink
    arrow-up
    3
    ·
    6 months ago

    Great write-up! The biggest bugs always end up being the “simplest”/one-line things lol

    • ch00f@lemmy.worldOP
      link
      fedilink
      arrow-up
      6
      ·
      edit-2
      4 months ago

      I believe the optimization came because the denominator was a power of two. In my memory, the function counted up all of the bytes being sent and checked to see that the sum was a multiple of 16 (I think 16 bytes made a single USB endpoint or something; I still don’t fully understand USB).

      For starters, you can split up a larger modulo into smaller ones:

      X = (A + B); X % n = (A % n + B % n) % n

      So our 16 bit number X can be split into an upper and lower byte:

      X = (X & 0xFF) + (X >> 8)

      so

      X % 16 = ((X & 0xFF) % 16 + (X >>8) % 16) % 16

      This is probably what the compiler was doing in the background anyway, but the real magic came from this neat trick:

      x % 2^n = x & (2^n - 1).

      so

      x % 16 = x & 15

      So a 16 bit modulo just became three bitwise ANDs.

      Edit: and before anybody thinks I’m good a math, I’m pretty sure I found a forum post where someone was solving exactly my problem, and I just copy/pasted it in.

      Edit2: I’m pretty sure I left it here, but I think you can further optimize by just ignoring the upper byte entirely. Again, only because 16 is a power of 2 and works nicely with bitwise arithmatic.

      • Sekoia@lemmy.blahaj.zone
        link
        fedilink
        arrow-up
        1
        ·
        6 months ago

        Ahh, that makes sense. Powers of two are real convenient. Your math is a little wrong though: X != (X & 0xFF) + (X >> 8), but X = (X & 0xFF) + (X >> 8) << 8 The right half can be removed entirely if you’re doing modulo 16, since the first 4 bits will always be 0. So it simply becomes X & 15! Much cleaner for sure.