Avoid Unleashing Zalgo in Node.js

Synchronous or asynchronous?

Which one should I choose when implementing an API?

And what the hell does unleashing Zalgo even mean?

Before answering those questions, let's first clarify the differences between synchronous and asynchronous callbacks:

  • A synchronous callback is invoked before a function returns. An example is list.forEach(callback): when forEach() returns, you can expect that the callback has already been invoked on each element.
  • An asynchronous callback is invoked after a function returns, or at least on another thread's stack. Asynchronous callbacks are common in I/O-related APIs, such as socket.connect(callback): when connect() returns, the callback may not have been called yet, since it is still waiting for the connection to complete. A short sketch of the difference follows this list.
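
To make the distinction concrete, here is a minimal sketch (setTimeout() is used here only as a stand-in for an asynchronous I/O operation such as socket.connect()):

const list = [1, 2, 3]

// Synchronous callback: forEach() invokes the callback before it returns
list.forEach(n => console.log('sync:', n))
console.log('forEach() has returned') // printed after all the 'sync:' lines

// Asynchronous callback: setTimeout() returns immediately; the callback
// runs later, in a future cycle of the event loop
setTimeout(() => console.log('async: timer fired'), 0)
console.log('setTimeout() has returned') // printed before 'async: timer fired'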

Now, imagine that you have an API that takes a callback, and sometimes that callback is called immediately while other times it is called at some point in the future. Such an API makes any code that uses it impossible to reason about, and causes the release of Zalgo.

Zalgo is an internet legend about an ominous entity believed to cause insanity, death, and the destruction of the world.

One of the most dangerous situations is to have an API that behaves synchronously under certain conditions and asynchronously under others. Let's take the following example:

import { readFile } from 'fs'
const cache = new Map()
function inconsistentRead (filename, cb) {
    if (cache.has(filename)) {
        // invoked synchronously
        cb(cache.get(filename))
    } else {
        // asynchronous function
        readFile(filename, 'utf8', (err, data) => {
            cache.set(filename, data)
            cb(data)
        })
    }
}

The preceding function uses the cache map to store the results of different file read operations. The function is dangerous because it behaves asynchronously until the file is read for the first time and the cache is populated, but it is synchronous for all subsequent requests, once the file's content is already in the cache.
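
To see the inconsistency in action, here is a minimal sketch (it assumes a file named data.txt exists and that the first read completes within the 100 ms delay):

// First call: nothing is cached yet, so the callback is invoked asynchronously
inconsistentRead('data.txt', () => console.log('callback of the first call'))
console.log('first call has returned') // printed BEFORE the callback

// A later call: the result is now cached, so the callback is invoked synchronously
setTimeout(() => {
    inconsistentRead('data.txt', () => console.log('callback of the second call'))
    console.log('second call has returned') // printed AFTER the callback
}, 100)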

Now, let's discuss how the use of an unpredictable function, such as the one that we just defined, can easily break an application. Consider the following code:

function createFileReader (filename) {
    const listeners = []
    inconsistentRead(filename, value => {
        listeners.forEach(listener => listener(value))
    })
    return {
        onDataReady: listener => listeners.push(listener)
    }
}

When the preceding function is invoked, it creates a new object that allows us to set multiple listeners for a file read operation. All the listeners will be invoked when the read operation completes and the data is available. Let's see how to use the createFileReader() function:

const reader1 = createFileReader('data.txt')
reader1.onDataReady(data => {
    console.log(`First call data: ${data}`)

    // ...sometime later we try to read again from the same file
    const reader2 = createFileReader('data.txt')
    reader2.onDataReady(data => {
        console.log(`Second call data: ${data}`)
    })
})

The preceding code will print the following:

First call data: some data

As you can see, the callback of the second reader is never invoked. Let's see why:

  • During the creation of reader1, our inconsistentRead() function behaves asynchronously because there is no cached result available. This means that any onDataReady listener will be invoked later in another cycle of the event loop, so we have all the time we need to register our listener.
  • Then, reader2 is created in a cycle of the event loop in which the cache for the requested file already exists. In this case, the inner call to inconsistentRead() will be synchronous. So, its callback will be invoked immediately, which means that all the listeners of reader2 will be invoked synchronously as well. However, we are registering the listener after the creation of reader2, so it will never be invoked.

The callback behavior of our inconsistentRead() function is really unpredictable as it depends on many factors, such as the frequency of its invocation, the filename passed as an argument, and the amount of time taken to load the file.

The bug that you've just seen can be extremely complicated to identify and reproduce in a real application. Imagine using a similar function in a web server, where there can be multiple concurrent requests. Imagine seeing some of those requests hanging, without any apparent reason and without any error being logged. This can definitely be considered a nasty defect.

Isaac Z. Schlueter, the creator of npm and former Node.js project lead, in one of his blog posts, compared the use of this type of unpredictable function to unleashing Zalgo.

The lesson to learn from the unleashing Zalgo example is that it is imperative for an API to clearly define its nature: either synchronous or asynchronous.

We can fix our inconsistentRead() function by making it completely synchronous or completely asynchronous.

Using synchronous APIs

Node.js provides a set of synchronous, direct-style APIs for most basic I/O operations. For example, we can use the fs.readFileSync() function in place of its asynchronous counterpart:

import { readFileSync } from 'fs'
const cache = new Map()
function consistentReadSync (filename) {
    if (cache.has(filename)) {
        return cache.get(filename)
    } else {
        const data = readFileSync(filename, 'utf8')
        cache.set(filename, data)
        return data
    }
}

Bear in mind that changing an API from asynchronous to synchronous, or vice versa, requires a change to the style of all the code using it. So, we would have to completely change the interface of our createFileReader() API and adapt it so that it always works synchronously.
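
For instance, a fully synchronous createFileReader() (sketched here as a hypothetical createFileReaderSync()) would have no need for listeners at all, since the data is available as soon as the call returns:

// With a fully synchronous read, the listener-based interface no longer
// makes sense: the data can simply be returned to the caller
function createFileReaderSync (filename) {
    return consistentReadSync(filename)
}

const data = createFileReaderSync('data.txt')
console.log(`First call data: ${data}`)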

In general:

  • Always choose a direct style for purely synchronous functions: it eliminates any confusion about their nature and is also more efficient from a performance perspective.
  • Use blocking APIs sparingly, and only when they don't affect the ability of the application to handle concurrent asynchronous operations. A sketch of an acceptable use follows this list.
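
As an illustration of that second point, a typical acceptable use of a blocking API is loading a small configuration file once at startup, before the application begins handling requests (a sketch, assuming a config.json file exists):

import { readFileSync } from 'fs'

// Blocking here is harmless: it happens once, at startup, before the
// application starts accepting any concurrent asynchronous work
const config = JSON.parse(readFileSync('config.json', 'utf8'))
console.log('Configuration loaded:', config)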

Using asynchronous APIs

We can make our inconsistentRead() function purely asynchronous by scheduling the synchronous callback invocation to be executed in the future, instead of running it immediately in the same event loop cycle. In Node.js, this is possible with process.nextTick(), which defers the execution of a function until after the currently running operation completes. Its functionality is very simple: it takes a callback as an argument and pushes it to the top of the event queue, in front of any pending I/O event, and returns immediately.
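
A minimal sketch of this ordering (setImmediate() is used here only to represent a callback scheduled for a later phase of the event loop):

process.nextTick(() => console.log('nextTick callback'))
setImmediate(() => console.log('setImmediate callback'))
console.log('synchronous code')

// Prints:
//   synchronous code
//   nextTick callback
//   setImmediate callback

With this in mind, here is the purely asynchronous version of our function: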

import { readFile } from 'fs'
const cache = new Map()
function consistentReadAsync (filename, callback) {
    if (cache.has(filename)) {
        // deferred callback invocation
        process.nextTick(() => callback(cache.get(filename)))
    } else {
        // asynchronous function
        readFile(filename, 'utf8', (err, data) => {
            cache.set(filename, data)
            callback(data)
        })
    }
}

Now, thanks to process.nextTick(), our function is guaranteed to invoke its callback asynchronously under any circumstances.
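
If we rebuild createFileReader() on top of consistentReadAsync() (only the inner call changes; this is just a sketch to close the loop), the example from earlier behaves as expected and both callbacks are invoked:

function createFileReader (filename) {
    const listeners = []
    consistentReadAsync(filename, value => {
        listeners.forEach(listener => listener(value))
    })
    return {
        onDataReady: listener => listeners.push(listener)
    }
}

const reader1 = createFileReader('data.txt')
reader1.onDataReady(data => {
    console.log(`First call data: ${data}`)
    const reader2 = createFileReader('data.txt')
    reader2.onDataReady(data => {
        console.log(`Second call data: ${data}`) // now this is printed too
    })
})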