If you care about performance, you may want to avoid CSV files. But data sources are often like family: we don’t get to choose them. This blog post shows how to process a CSV file as fast as possible.
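
For readers who want a concrete starting point, here is a minimal sketch of reading a CSV with PyArrow, the library the comments below refer to. It is not taken from the blog post itself. The helper name `count_rows` and the sample data are made up for illustration, and the fallback to the standard-library `csv` module is only there so the snippet still runs where PyArrow is not installed.

```python
import csv
import io

try:
    # Optional dependency: PyArrow's CSV reader.
    import pyarrow.csv as pa_csv
except ImportError:
    pa_csv = None


def count_rows(data: bytes) -> int:
    """Count the data rows (header excluded) in CSV bytes.

    Hypothetical helper for illustration only.
    """
    if pa_csv is not None:
        # read_csv accepts a path or a file-like object and returns a
        # pyarrow.Table; the first line is treated as the header by default.
        table = pa_csv.read_csv(io.BytesIO(data))
        return table.num_rows
    # Fallback: standard-library parser, one row at a time.
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    next(reader, None)  # skip the header line
    return sum(1 for _ in reader)


# Made-up sample data standing in for a real vendor export.
sample = b"id,name\n1,a\n2,b\n3,c\n"
print(count_rows(sample))
```

For a real file you would pass its path to `pyarrow.csv.read_csv` instead of wrapping bytes in `io.BytesIO`. How much time this saves depends on the data and the rest of the job, so measure before and after.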

  • fartsparkles
    8 months ago

    Holy shit, switching to PyArrow is going to make me seem like a mystical wizard when I merge in the morning. I’ve easily halved the execution time of a horrible but unavoidable job (yay crappy vendor “API” that returns a huge CSV).

    • AlecSadler
      8 months ago

      You and me both. I’ve been parsing CSVs of around 10 to 100 million rows lately and… this will hopefully help.