My team and I use Pipedream for all of our business’ automations and workflows. I’ve also pushed the platform much harder with more complex projects, the biggest being my Notion Voice Notes workflow.
It’s in building this workflow that I started hitting Out of Memory errors. It’s also the point at which I started tearing my hair out, particularly because Pipedream’s own documentation on this error is scant on details.
With much of my hair on the floor, I began a… very tedious… process of research and meticulous testing to figure out what actually causes these Out of Memory errors. Here’s what I’ve learned.
In summary:
- Every step in a workflow spawns its own process, which adds to the memory load.
- Importing NPM packages will increase memory load. Some packages are hefty.
- You can monitor per-step memory use with node.js’ own process.memoryUsage() function.
- There is no way to monitor the memory use of the entire workflow.
- Writing your own code to combine workflow steps into fewer steps (ideally just one) will reduce memory use.
- Native Pipedream no-code steps might be using inefficient code. Open GitHub issues in these cases.
Here’s a bunch of additional detail on these points, which I originally wrote on a whim in response to a question in their support forum.
First, yes, each step in a workflow adds its own memory overhead. I’m not clear on what the baseline load is, but minimizing steps helps with memory use.
This can also make workflows run faster (and hence use fewer credits). I find that workflows with many steps often have a far greater total execution time than the sum of the times reported by each step. This inflates credit usage, but you can mitigate it by writing code that places all workflow functions in a single step.
You’ll have an easier time doing this with the CLI, which lets you build out entire app directory structures and get around the character limit in the in-app editor. This is exactly how Notion Voice Notes is built. It’s a “workflow” that consists of over 6,000 lines of active code across many files.
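As a rough sketch of what that looks like (the file layout and helper names here are hypothetical, not my actual code), a CLI-published action can import shared logic from neighboring files so all the work happens in one step:

// single-step-action.mjs – published with the Pipedream CLI (`pd publish`)
// File layout and helper names are illustrative only.
import { transcribe } from "../helpers/transcribe.mjs";
import { summarize } from "../helpers/summarize.mjs";

export default {
  key: "example-single-step-action",
  name: "Example Single-Step Action",
  version: "0.0.1",
  type: "action",
  props: {
    fileUrl: { type: "string", label: "Audio file URL" },
  },
  async run({ $ }) {
    // Everything runs inside this one step, so you pay for one
    // container's worth of overhead instead of one per step.
    const transcript = await transcribe(this.fileUrl);
    return summarize(transcript);
  },
};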
Second, you can use node.js’ own process.memoryUsage() and process.cpuUsage() methods to roughly gauge resource usage at a specific point during a run. Here’s an example I use in Notion Voice Notes:
// Logs a labeled snapshot of memory (in MB) and CPU time (in ms)
// at whatever point in the run you call it.
logMemoryUsage(context) {
  const usage = process.memoryUsage();
  const cpuUsage = process.cpuUsage();
  console.log(`Resource Usage (${context}):`, {
    Memory: {
      RSS: `${Math.round(usage.rss / 1024 / 1024)}MB`,
      HeapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
      HeapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
      External: `${Math.round(usage.external / 1024 / 1024)}MB`
    },
    CPU: {
      User: `${Math.round(cpuUsage.user / 1000)}ms`,
      System: `${Math.round(cpuUsage.system / 1000)}ms`,
      Total: `${Math.round((cpuUsage.user + cpuUsage.system) / 1000)}ms`
    }
  });
}
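A method like this sits in a component’s methods object and gets called from run. Roughly like this (the surrounding component is just a sketch):

export default defineComponent({
  methods: {
    logMemoryUsage(context) {
      // ...the function from above...
    },
  },
  async run({ steps, $ }) {
    this.logMemoryUsage("start of run");
    // ...do the actual work here...
    this.logMemoryUsage("after processing");
  },
});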
However, this likely isn’t the true usage, as it’s only what the node.js process can see from inside its container at runtime. The container itself likely adds some overhead – and as mentioned, each step in a workflow is its own container.
A big lesson I learned from using these functions, though, is that all those fun NPM packages we like to import into our code steps add to the memory pressure, and some packages are big. I used to use the natural package to do tokenization, but had to strip it out of my workflows because it’s gigantic.
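If you want a rough sense of what a specific package costs, you can snapshot heapUsed before and after importing it. This is a crude measurement (the garbage collector and module caching can skew it), but it’s enough to spot the heavyweights:

// Crude before/after heap comparison for a package import.
const before = process.memoryUsage().heapUsed;

const natural = await import("natural"); // or whatever package you're curious about

const after = process.memoryUsage().heapUsed;
console.log(`Importing added roughly ${Math.round((after - before) / 1024 / 1024)}MB to the heap`);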
I’ll also share that “Memory” is a bit misleading. The Lambda architecture has a single “Memory” setting that tunes RAM, CPU, and potentially even network I/O throughput. So, “Out of Memory” might actually mean your workflow is CPU-bound. For most of the stuff we’re building, I doubt that’s the case, but it’s worth keeping in mind if, like me, you’re trying to squeeze everything you can out of something heavy like ffmpeg.
Finally, I’ll note that you can solve both memory and time-use issues by writing better code. A couple fun lessons I’ve learned:
First, concurrency. I have one workflow that fetches YouTube view counts for my videos and sends them to Notion.
By default, this happened serially… and on my paid account, this was taking 8.5 minutes per run to update 335 videos, costing 17 credits!
Then I learned about Promise.all() and started making the calls concurrently. Each call modified a different Notion page, so the order didn’t matter. Afterward, I got them all updated in 35 seconds, which cost only 2 credits. Concurrency is a big deal.
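The pattern looks like this, serial version first, then the concurrent one (updateNotionPage is a stand-in for whatever your per-item work is, not a real helper):

// Serial: each request waits for the previous one to finish.
for (const video of videos) {
  await updateNotionPage(video); // hypothetical per-item helper
}

// Concurrent: fire off every request and wait for the whole batch.
// Order doesn't matter here because each call touches a different page.
await Promise.all(videos.map((video) => updateNotionPage(video)));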
This also taught me that posted API rate limits are not always what they seem. For example, Notion’s posted request limit is ~3 requests per second, but it’s actually enforced as 2,700 requests per 15 minutes.
This means you don’t need to setTimeout() for 333ms between requests; you can actually fire off something like 100 requests in a second if you need to. You just might get rate-limited if you keep that up for more than a few seconds, so implement exponential backoff in your code instead.
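A bare-bones version of that backoff might look like this (a sketch; in a real workflow I’d also respect any Retry-After header the API sends back):

// Retry a request with exponential backoff when it gets rate-limited.
async function withBackoff(requestFn, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      // How you detect a 429 depends on your HTTP client; this check is illustrative.
      const rateLimited = error?.status === 429;
      if (!rateLimited || attempt === maxRetries) throw error;
      // Wait 500ms, 1s, 2s, 4s, ... before trying again.
      const delay = 500 * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Wrap each concurrent call from the example above.
await Promise.all(videos.map((video) => withBackoff(() => updateNotionPage(video))));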
Second, Pipedream’s no-code steps aren’t always efficient. I learned this when I created three versions of my Notion Voice Notes workflow template: one each for Google Drive, Dropbox, and Microsoft OneDrive.
Users kept reporting that the Dropbox one would hit Out of Memory errors. I was tearing my hair out over this, thinking the problem was in the custom steps I’d written.
But then I looked at their Dropbox → Download to TMP step that I was using, and realized that it was loading the audio file into memory using a buffer, rather than streaming it to /tmp/, as the Google Drive/OneDrive steps were doing. This is now fixed: [BUG] Improve Dropbox Download File action performance · Issue #16874 · PipedreamHQ/pipedream · GitHub
Many devs don’t understand the difference between buffers and streams in node, and most never need to until they get hit with a memory problem like this.
TLDR: A stream reads a small chunk of a big file and writes it to the disk before moving on to the next. It’s akin to you using a wheelbarrow to haul a pile of dirt from your driveway to your garden, one small load at a time. A buffer reads the entire file into RAM before doing anything with it. This is like you giving the dump truck driver an enthusiastic thumbs-up before he dumps the entire 6-ton load of soil on your head.
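In node terms, the difference looks roughly like this (a sketch using the built-in fetch and stream APIs; fileUrl is a placeholder):

import { createWriteStream, promises as fs } from "fs";
import { Readable } from "stream";
import { pipeline } from "stream/promises";

// Buffer approach: the entire file is pulled into RAM before anything hits disk.
const bufferedResponse = await fetch(fileUrl);
const buffer = Buffer.from(await bufferedResponse.arrayBuffer());
await fs.writeFile("/tmp/audio-buffered.mp3", buffer);

// Stream approach: chunks flow from the response straight to /tmp,
// so memory use stays small no matter how big the file is.
const streamedResponse = await fetch(fileUrl);
await pipeline(
  Readable.fromWeb(streamedResponse.body),
  createWriteStream("/tmp/audio-streamed.mp3")
);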
I guess I’ll also mention that the same goes for learning the difference between exec and spawn. Normally, you’re building piddly little workflows that run on buff-Shiba architecture, so you don’t have to care about this stuff.
But once you start trying to optimize your costs, you start to gain a real appreciation for efficient code that squeezes more out of constrained resources.
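For the record, here’s the exec vs. spawn difference in sketch form (the ffmpeg arguments are just illustrative): exec buffers the child process’s entire output in memory before handing it to you, while spawn gives you streams you can pipe wherever you like.

import { exec, spawn } from "child_process";
import { createWriteStream } from "fs";

// exec buffers all of stdout/stderr in RAM (and errors out past maxBuffer),
// which is fine for small outputs like a version check.
exec("ffmpeg -version", (error, stdout) => {
  if (!error) console.log(stdout.split("\n")[0]);
});

// spawn streams output as it's produced, so large outputs never pile up in RAM.
// Here a conversion writes straight to /tmp via stdout.
const ffmpeg = spawn("ffmpeg", ["-i", "/tmp/input.m4a", "-f", "mp3", "pipe:1"]);
ffmpeg.stdout.pipe(createWriteStream("/tmp/output.mp3"));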