Think Transformations

How to Avoid Making Stew

David Veen

published:

updated:

05 september 2020

26 december 2020

In imperative programming the way we solve problems often turns code into an unreadable mess. We examine how rethinking the problem as a set of transformations on a collection leads to clean, declarative code. In other words: How to avoid making a stew.

To see different programming paradigms in action we need a simple procedure to implement. Examples that are short enough to fit a blog post often feel a bit contrived. Sadly this won't be an exception - but here we go:

Given a list of names, find all the names that have less than 5 characters and at least two distinct vowels. Given the list 'Ann', ' Ian ', 'Eve', 'Waldo', we should return 'Ian' (note the trim!).

Take a minute to think about your implementation. What are the steps you need? How do they come together?

The Imperative implementation

If anything, the imperative approach probably looks familiar. We take an empty container and start adding things to it. Take a name, trim it, cut it into pieces and check the number of vowels. If it satisfies the criteria, add it to the result. Rinse, repeat.

const NAMES = ['Ann', ' Ian ', 'Eve', 'Waldo'];
const VOWELS = ['A', 'E', 'I', 'O', 'U', 'a', 'e', 'i', 'o', 'u'];

// imperative implementation
function shortNamesWithDistinctVowels(names) {
  const result = [];
  for (const name of names) {
    const trimmedName = name.trim();
    if (trimmedName.length < 5) {
      const vowelsFound = []
      for (const char of trimmedName) {
        const lowercasedChar = char.toLowerCase();
        if (VOWELS.includes(char) &&
          !vowelsFound.includes(lowercasedChar)
        ) {
          vowelsFound.push(lowercasedChar);
        }
      }
      if (vowelsFound.length > 1) {
        result.push(trimmedName);
      }
    }
  }
  return result;
}

const result = shortNamesWithDistinctVowels(NAMES); // => ['Ian']

This works. But does this code communicate clear and succint thoughts to its reader? No. All the steps needed to perform the task are mashed into a whole, slowly simmered until none of the original ingredients - the steps of the algorithm - have recognisable shape. It's a stew.

Written in pseudocode, the implementation looks like this:

Imperative pseudocode

/*
  Create a result array.
  For every name,
    get the trimmed name.
    If the length of the trimmed name is smaller than 5
      Define an empty array of vowels found.
      For each character in the name
        Get the lowercased character.
        if the lowercased character is listed as a vowel
        and that character is not yet included in the array of vowels found,
          add it to the list of vowels found.
      If the length of the list of vowels found exceeds 1,
        add the trimmed name to the result
  The list of names that are short and have multiple vowels is the result array.
*/

Moving towards a declarative alternative

Whether a stew is something you enjoy is, on the whole, a question of taste undeserving of condemnation. But to follow the analogy until it breaks: it's easy to create stew from a recipe, but hard to derive the recipe from the stew. When we read code it ought to tell us the recipe, not throw us into boiling broth.

Can we clean up our imperative implementation? Abstraction and descriptive naming helps, up to a point. We can create some helper functions to separate concerns and communicate intent. Lines 10 through 18, for example, can be extracted to a getVowelCount function. We can extract isShort, getTrimmedName and isVowel functions as well.

This is a step in the right direction, but the improvement is limited. Separating functions more aggressively hits diminishing returns. Is seperating isShort, getTrimmedName and isVowel really worth it? We are also left with some pieces that actively resist breaking up. It's hard to split up getVowelCount any further, because whatever we take out will still need access to or mutate vowelsFound - go ahead, try it :)

The problem isn't that the code we've written is somehow unclean JavaScript. The problem is our stew-making approach.

Define an array, check some conditions, mutate the array. Those instructions tell the computer what to do instead of telling the reader what it is we're trying to achieve. Take another look at the pseudocode. The way the solution reads differs miles from the way we would explain what it does.

What if we change our approach?

Whether a stew is something you enjoy is, on the whole, a question of taste undeserving of condemnation. But code ought to tell us the recipe, not throw us into boiling broth.

Transforming collections pseudocode

/*
A short name is a name that has less than 5 characters.
A trimmed name is the name trimmed.

  A character is a vowel when it is listed as a vowel.
  A deduplicated array is a list of the Set of its values.
  The list of vowels in a name is
    the lowercased name
    as a list of characters
    that are vowels.
A name has multiple vowels if the length of the deduplicated list of vowels exceeds 1.

The list of names that are short and have multiple vowels is
  the list of names
    that when trimmed
    are short
    and have multiple vowels.
*/

The pseudo-code here reads a bit formally, and it isn't necessarily more concise than our original approach. It's not a step by step breakdown of the problem, but a set of definitions that can be evaluated separately and then combined to create the intended result. Instead of mashing ingredients together, we're working with clearly defined and separated parts. The point isn't so much about reusability of each of these helper functions, but about the ability to reason about each of them. This makes this solution easier to verify, and more adaptable to changing requirements. Let's look at the implementation in code.

Transforming collections

const isShort = name => name.length < 5;
const trim = name => name.trim();

function hasMultipleVowels(name) {
  const isVowel = char => VOWELS.includes(char);
  const deduplicate = vowels => ([...new Set(vowels)]);

  const vowels = name
    .toLowerCase()
    .split('')
    .filter(isVowel)
  return deduplicate(vowels).length > 1;
}

function shortNamesWithDistinctVowels(names) {
  return names
    .map(trim)
    .filter(isShort)
    .filter(hasMultipleVowels)
}

const result = shortNamesWithDistinctVowels(NAMES); // => ['Ian']

Thinking transformations

The key is to rethink the operation as a series of transformations. We have a collection of names from which we need to derive another collection of names. To get from A to B we need a transformation that can get us there. That transformation needs several steps: it need a transformation that can trim a name, another to select a name if it is short, and yet another to select a name if it has multiple vowels. These transformations can be simple (like trim, isShort) or complex (like hasMultipleVowels) but - borrowing from the Zen of Python - complex should never mean complicated. That is, a complex transformation is achieved by the composition of other transformations, nothing else. This approach is called collection-based programming.

Complex is better than complicated

— Zen of Python

The approach has four major advantages. If we look at the implementation of shortNamesWithDistinctVowels, notice how it communicates intent rather than implementation details. It's extremely declarative! Notice, second, how the implementation follows almost directly from its definition. Provided the transformations do what they say they do, the correctness of the algorithm is easy to verify. If we trust the underlying transformations, we can stop reading and be satisfied that it works. And if we want to dig deeper, we can examine and test each transformation separately. Third, by chaining operations we gain a lot of flexibility. Adding operations to the chain is straightforward; define a transformation and insert or append it. The chain is also explicit in the order of operations. So changing that order is easy too - we can simply move a transformation up and down the chain to alter the order in which it is executed. Finally, this implementation never has us mutate state. That's not to say it doesn't have state. Behind the scenes JavaScript creates intermediate data structures for us. Point is we don't need to worry about maintaining it.

Are there downsides to this approach? Sure there are.

The most important worry, paradoxically, is readability. Compared to the first implementation we need a lot more knowledge of JavaScript to understand the second. I won't go in-depth on any of those topics here (take a look at the further reading links below if you need an introduction), but it's worth keeping in mind what our colleagues may find 'alien' when they look at our code.

First, while chaining operations on a string should be familiar, using 'map' and 'filter' (and their big brother 'reduce') on collections might not be as commonplace. These are higher-order functions that take a function as argument, iterate over the collection with that function, and return a new collection. Since functions are first-class citizens in JavaScript, we can provide a named function as a parameter instead of writing it inline. As long as the signature of our functions matches the the signature of the argument, this works beautifully. It means we can write .filter(isShort) instead of the more verbose .filter(name => isShort(name)). This is called tacit programming, or point-free notation (the "points" here are the arguments that we don't provide as we combine functions).

Second, the collection-based implementation is structured a bit like a mathematical proof. There's a bunch of definitions on top, then the actual operation at the end. QED-style. A lot of us are trained to read code top to bottom, and this approach maps poorly to that expectation. In fact the implementation is much easier to read bottom to top. That might seem like a small detail, but we're in the business of communicating to our colleagues (and our future selves), not the computer. We cannot force our colleagues to view something as readable. Coming to a shared understanding may take persuasion, training, repetition and time.

Another trade-off is that chained operations, despite their power and flexibility, must abide a strict rule: each operation must fit its signature to the operations on either side of it. What if there are multiple inputs and outputs? What if operation C needs the output of B, but also the original output A? Solutions are fairly easy to imagine (carry tuples of values along the chain, or nest operations so we can keep values that need to 'skip' operations in a closure), but they incur a cost insofar as it makes the chain more complicated, and less declarative. What about errors? Handling null values? Our nice and clean toy example might look more gnarly and complicated in a real project.

A final trade-off is that behind-the-scenes creation of intermediate data structures potentially impacts performance. This may offend some of our colleague's sensibilities, especially those with backgrounds where control over memory and performance is a core concern. In JavaScript, it's a sacrifice we should be willing to make 99% of the time. At the very least, let's not consider it a problem until we know that it's a problem. Prefer proving something is correct over proving that it's fast.

Taking it further

I'm convinced that thinking in transformations is a powerful technique that makes code less error-prone and easier to maintain. Does that mean I would check the above solution into production? It depends. How accepting are my colleagues of this type of solution? Do we encounter similar problems in the domain that might benefit from code sharing?

Depending on the context I might opt not to write out each definition so laboriously. We could just write the operations inline. We lose some of the explicit intentionality, but shed some bulk and indirection. The transformations remain the same.

I would also look for ways to compose transformations rather than chaining them. That way we don't need any intermediate data structures to perform the operation. Take a look at the example below, written with Lodash/fp. It's a lot shorter but also asks more from the reader. It's right on the edge of breaking the 'never be clever' maxim (or the admonition 'Sparse is better than dense'). What do you think?

Lodash: living on the edge since 2012

import { filter, intersection, map, pipe, toArray, toLower, trim } from 'lodash/fp'

const shortNamesWithDistinctVowels = pipe(
  map(trim),
  filter(name => name.length < 5),
  filter(pipe(
    toLower,
    toArray,
    intersection(VOWELS), // returns the overlap between two arrays
    vowels => vowels.length >= 2
  ))
);

const result = shortNamesWithDistinctVowels(NAMES); // => ['Ian']

Think Transformations

David Veen

The Imperative implementation

Moving towards a declarative alternative

Thinking transformations

Taking it further

Inspiration and further reading

Privacy

Disclaimer

Terms and conditions