Clean code & programming principles - The ultimate beginner's guide

Featured on Hashnode

This article is the beginner's introductory guide to programming principles.

First, we're going to examine what good code is, meaning the qualities of good code. That's because those qualities come before programming principles. Programming principles are just guidelines to help us apply those qualities to code.

Afterwards, we'll examine the most important programming principles, one-by-one, at an introductory level.

Hopefully, this article will feel less like "have small functions" and more like "these are the qualities you want in code, for reasons 1, 2 and 3. So as you can see, small functions help you achieve those in ways X, Y and Z".

I believe that this kind of understanding is more beneficial than just knowing some arbitrary rules. It's especially helpful if you've been stuck on how to apply certain programming principles in the past. Knowing how they help and what they're trying to achieve should help you apply them even in unfamiliar situations.

Target audience

I believe that this article is suitable for all audiences.

If you're a beginner developer, some of the things mentioned in this article may be too abstract. But, some others should be useful immediately. Nevertheless, this article will give you an understanding that will help you very much in the future, even if you don't understand all of it now.

If you're an intermediate-level developer, you'll probably gain the most benefit. You are probably writing medium to large programs. You've got the hang of the basics. Now, you need to learn how to write code that scales (in size). This is what programming principles help you with.

If you're an advanced-level developer, you'll probably know most of these things already. However, you might enjoy this article nonetheless.


Qualities of good code

What is good code?

To answer that question, first we need to examine the requirements of code. Then, the qualities that we (people) need for something to be easy to work with. After that, the qualities of good code become obvious.

If you want to skip the discussion, here are the conclusions:

The requirements of code are that:

  • it should work as intended, without bugs
  • it should be built as quickly and efficiently as possible, without sacrificing quality (just like all products)
  • it should be easy and fast to work with and modify (for the next time you need to work with it)

Some of our limitations are that:

  • we can't remember too much at any one time. This means that we won't remember that modifying X will break Y and Z.
  • we find complicated things disproportionally more difficult than simple things
  • making multiple similar changes is very error-prone for us
  • we have bad days where we are bored, can't focus and don't pay too much attention
  • we always make mistakes, no matter what. This means that we need tests (manual or automated) and other error-catching things.

From those two, after a bit of reasoning, we conclude that code should:

  • be simple (because we're bad with complicated things)
  • be immediately understandable (so we can understand it quickly and make changes faster. Also so we don't misunderstand it and create bugs, especially if we're not really focusing that day)
  • be organised (so we can understand the project structure more easily and find the files we need to modify faster)
  • be independent (so we can make reasonable changes to X without breaking 1,000 other things in the project)
  • have minimal duplication (because we're bad with repetitive changes. They're also slower)

More details and explanations are below. If you're not interested, please skip to the next section.

Requirements of code

Software is a product. Businesses hire programmers to build software products. It's not abstract art (usually). It's something built for a specific purpose.

From a business perspective, products:

  • have to be fit for purpose and work as intended
  • should be as cheap and efficient as possible to create (without sacrificing quality)

The same applies to software.

But software has some unique aspects. It needs constant modification. That's because software is rarely "finished". Companies may be requesting new features for decades after initial release. Also, there may be bugs that need fixing at any time. Finally, during development, programmers constantly modify the code.

Therefore, for the software product to be as efficient and cheap as possible to create and maintain, the code needs to be easy and fast to work with and modify.

Not to mention that being easy to work with means fewer bugs due to changes.

So, the requirements of code are that:

  • it should work as intended, without bugs
  • it should be built as quickly and efficiently as possible (without sacrificing quality)
  • it should be easy and fast to work with and modify (for the next time you need to work with it)

For even more detail on this, please see the post requirements of software.

Human limitations and bad code

Code can be difficult to work with because of our limitations.

Here are some of our limitations and what we can do to counter them.

Memory

We can't remember too much at any one time. George Miller's observation about short-term memory, "the magical number seven, plus or minus two", comes to mind.

To counter that, we need code to be sufficiently independent (decoupled) and without hidden dependencies. That way, when we're modifying code, we won't accidentally break it due to forgetting to also update a dependency that we didn't remember existed.

We like things simple

Complicated things are disproportionally more difficult for us. This is partly because we need to keep in mind many things about them at once. Therefore, we should make code simple and easy to work with.

We are impatient

We get impatient, skim things often, have bad days and get bored.

To counter that, we should make code simple, easy to understand and easy to work with.

We are bad with repetitive work

Repetition is error-prone for us, particularly if every repetition is slightly different.

Repetitive work means more chances to make an error. Also, probably due to impatience and lack of focus, we're more likely to rush this type of work. We don't usually provide the necessary care and attention to every single change. To help, we should minimise repetitive work.

We make mistakes

We make mistakes often and in all areas of life. This includes programming, mathematics, engineering, art, design and everything else.

Therefore, we always need to double check our work. As a result, we use practices like code reviews and automated testing. We also use tools to statically analyse our code.

How we should work on software

We should work on software deliberately. We should know and understand as much as possible about the code we're currently working on. This means that we'll be as certain as possible that we're doing the right thing and that we won't break anything.

In comparison, if we're just trying things at random, we're not certain that they'll work. Most of the things we try won't work, except the last one (at which point we'll stop). Also, we'll only know whether they work or not because of our tests. We'll probably manually test everything we try.

This is problematic, because, since we're not really sure what we're doing, we might have broken other things that we won't think to test.

So, to minimise the chance of error, it's important to understand as much as possible about what we're doing.

The best way to do that is to make code simple, easy to understand and easy to work with.

How code should be

Everything we've examined so far points to a certain way for how code should be. Code should:

  • be simple (because we're bad with complicated things)
  • be immediately understandable (so we can understand it quickly and make changes faster. Also so we don't misunderstand it and create bugs, especially if we're not really focusing that day)
  • be organised (so we can understand the project structure more easily and find the files we need to modify faster)
  • be independent (so we can make reasonable changes to X without breaking 1,000 other things in the project)
  • have minimal duplication (because we're bad with repetitive changes. They're also slower)

Next, let's examine the programming principles.

Be pragmatic - The most important principle

Being pragmatic is essential, not just in programming but in pretty much everything in life.

It means to remember the true goal of what you're trying to accomplish, maximise that, and not get side-tracked.

In programming, your aims are to:

  • have code that works correctly
  • make your changes as quickly and efficiently as possible
  • make the code easy and fast to work with for the next time someone works on it

The programming principles are guidelines to help you do that. But, your aims come first. If a programming principle will be detrimental to your aims, you shouldn't apply it.

Don't apply principles to the extreme

For example, having code that's short is commonly considered a good thing. It has many benefits which we'll examine later. But you should never make your code shorter if it will make it more difficult to understand and work with.

Don't play "code golf", where you use complicated syntax and mathematical tricks to make the code as short as possible. That makes the code more complicated and more difficult to understand.

In other words, have code that's short (the guideline), but only if it makes the code simpler and easier to understand (your aims).
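
As a rough illustration, here are two versions of the same small function. The example (FizzBuzz) and the names f and fizzBuzz are made up for this sketch. The first version is "golfed". The second is longer, but you can read it top to bottom without decoding anything.

```javascript
// Golfed version (hypothetical): nested ternaries and a one-letter name.
const f = n => n % 15 ? n % 5 ? n % 3 ? n : 'Fizz' : 'Buzz' : 'FizzBuzz';

// Readable version: more lines, but every condition has a name.
function fizzBuzz(number) {
  const isDivisibleBy3 = number % 3 === 0;
  const isDivisibleBy5 = number % 5 === 0;

  if (isDivisibleBy3 && isDivisibleBy5) {
    return 'FizzBuzz';
  }
  if (isDivisibleBy5) {
    return 'Buzz';
  }
  if (isDivisibleBy3) {
    return 'Fizz';
  }
  return number;
}
```

Both functions return the same results. The difference is only in how long it takes a reader to be sure of that.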

Balance time spent refactoring

Additionally, you need to make your changes in a reasonable timeframe. You've got to balance how much time you spend refactoring code against how much benefit it will provide.

For example, if you have some code that's very difficult to understand, you absolutely should refactor it. It might take a few hours, but it's probably worth it. It will make your project easier to work with in the long-term. You'll reclaim the time you spent refactoring through higher efficiency in the future.

But, if you have some code that's almost perfect, don't spend 3 days refactoring it only to make it slightly better. You would have spent 3 days for almost no benefit. Instead, you could have used that time in better ways. You could have written a new feature, or refactored a more suitable part of the codebase.

The point here is: You need to prioritise based on value. That usually means keeping code pretty clean and refactoring when needed. But it probably doesn't mean spending an unreasonable amount of time refactoring for almost no benefit.

YAGNI

Another important thing to talk about is YAGNI. It stands for "you ain't gonna need it".

It warns you against coding things in anticipation of features you might need in the future. For a simple contrived example, you may create a function foo, which has the parameter bar. But you might think "feature X might be added in the future, which will need a parameter baz, so let me add it to the function now".

In general, you want to be wary of doing that. Firstly, that feature is probably never going to be needed. Secondly, you increase the complexity of the code today, making it harder to work with. Thirdly, if that feature is needed in the future, you might code it differently to how you anticipate today.

Instead, code the simplest solution for what you need today. Then, make the changes needed for that feature when it's needed (if ever).

This is optimal, because you won't needlessly waste your time or make the codebase more complicated. Even if you did predict a feature correctly, it will be much faster to code it when you need it compared to all of the time you would have spent coding everything prematurely.
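
Here is a minimal sketch of the foo, bar and baz example from above. The function bodies are placeholders invented for illustration, and fooWithSpeculation is a hypothetical name used only to show the two versions side by side.

```javascript
// Speculative version: baz exists only for a feature nobody has asked for yet.
function fooWithSpeculation(bar, baz = null) {
  if (baz !== null) {
    // A guess at what "feature X" might need one day.
    return `${bar}:${baz}`;
  }
  return `${bar}`;
}

// YAGNI version: only what is needed today.
function foo(bar) {
  return `${bar}`;
}
```

Today, both versions behave the same for every real caller. The speculative one just has an extra parameter and an extra branch to read, test and maintain.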

Personal recommendations

Create a fairly simple solution for what you need today, that is simple to understand and work with.

Write clean code and maintain your code so it's fairly clean. Refactoring may take time upfront, but it pays off in the long-term because the code is easier to work with.

Only apply programming principles if they'll make your code better and easier to work with.

If you're newer to programming principles, consider applying them more heavily than necessary when you practice. You'll get practice applying them and you'll get a feel for when you've taken them too far.

KISS (keep it simple stupid) and the principle of least astonishment

KISS (keep it simple stupid) is another principle that's universal to most things in life. It means that your code should be simple and easy to understand.

The principle of least astonishment is also important. It means that things should work exactly as you expect them to. They shouldn't be surprising. It's a cousin to KISS.

If you don't keep things simple and easy to understand, then:

  • everything takes longer to understand
  • sometimes you might not understand how things work, even after spending a lot of time on them
  • you might misunderstand how things work. Then, if you modify the software, you could easily create bugs.
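
As a small sketch of what "surprising" can look like, consider the two functions below. The function names and the scores data are hypothetical. The first one looks like a read-only lookup, but it quietly reorders the caller's array, because Array.prototype.sort sorts in place.

```javascript
// Surprising: the name suggests a lookup, but the input array gets reordered.
function getHighestScoreSurprising(scores) {
  return scores.sort((a, b) => b - a)[0];
}

// Unsurprising: copies the array first, so the input is left untouched.
function getHighestScore(scores) {
  return [...scores].sort((a, b) => b - a)[0];
}

const scores = [3, 10, 7];
getHighestScore(scores);           // scores is still [3, 10, 7]
getHighestScoreSurprising(scores); // scores is now [10, 7, 3]
```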

How to apply KISS and the principle of least astonishment

Here are some guidelines for making your code simple and easy to understand.

Default to writing dumb code, avoid writing clever code

Dumb code is simple code. Clever code is probably not simple code.

Really clever code is not simple. It's tricky and difficult to understand. People will misunderstand it and create bugs as a result.

Keep code short and concise

Shorter code is more likely to be simple.

Short code means that units, such as functions and classes, do fewer things. That means they're simpler and easier to understand.

Use good names

If you have a well-named function, you can understand what it does from the name, without reading the function body. The same applies to all code. This makes your work faster and easier.

The name also provides meaning, which helps you decipher code faster.

For example, if you see the code Math.PI * radius ** 2, you may not understand what it's doing and why, even after reading it. You may look at it and be like "what? PI, radius?? What is this???".

But, if you see const circleArea = Math.PI * radius ** 2, straight away you're like "oh I get it. It's calculating the area of the circle, of courseeeee. No wonder PI and radius are there...".

Always consider the programmer reading the code for the first time

This is the person you're trying to optimise the code for. The colleague who has never worked on this code before, or even yourself, 6 months from now, when you've forgotten what this code does and how it works.

"Any code of your own that you haven't looked at for six or more months might as well have been written by someone else." - Eagleson's law

Consider that when you're writing the code, you know what the code needs to do and you just code it. But the person reading the code for the first time has to parse what the code is doing and also has to understand why it's doing it.

Consider immutability (never reassigning the values of variables)

Immutability provides a guarantee that a value will never change.

This makes the code simpler to understand, because you don't have to trace through the code for the history of the variable, just in case it happened to change anywhere in your codebase.
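
A minimal sketch of the difference, using made-up price values. In the first half, you have to trace every line to know what price holds. In the second half, each name is assigned once and always means the same thing.

```javascript
// Reassigned variable: the meaning of `price` changes from line to line.
let price = 100;
price = price + 20; // add tax
price = price - 10; // apply discount

// One name per value: nothing is ever reassigned.
const basePrice = 100;
const priceWithTax = basePrice + 20;
const finalPrice = priceWithTax - 10;
```

Note that const only prevents reassigning the name. If the value is an object or an array, its contents can still be changed.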

Follow existing conventions

Code that follows existing conventions is unsurprising. Code that breaks conventions can be very unexpected. Someone who skims the code may not realise that it doesn't follow the convention, so they may misunderstand how it works.

Try to follow conventions which already exist in your codebase. Conventions which exist in your language or framework are less essential to follow, but also recommended.

Separation of concerns

Separation of concerns means to organise functionality well in code.

Code should be separated into sensible units (modules, classes, functions and methods). Someone looking at the code should immediately understand what the particular unit does.

For example, if you have a Circle class, an Enumerable interface or a Math object or module, you tend to have a pretty good idea of what each does and contains. You would expect to find Math.PI, or Math.pow(base, exponent) (both exist on the JavaScript Math object). However, you wouldn't expect to find Math.printHelloToTheScreen() or Math.produceAccountingReport(). The methods in the latter example would be unexpected, which would break the principles of KISS and least astonishment.

In addition, units should be small and only do one thing (also known as the single responsibility principle). Another way of thinking about this is that different concerns should be separated at a granular level.

For example, you shouldn't have a god-class called Shape that has functionality for all possible shapes within it. Instead, you should have a small class for each shape.

This code is the bad version:

// Bad god class

class Shape {
  constructor(typeOfShape, length1, length2 = null) { // length2 is an optional parameter
    this.type = typeOfShape;
    if (this.type === 'circle') {
      this.radius = length1;
    } else if (this.type === 'square') {
      this.width = length1;
    } else if (this.type === 'rectangle') {
      this.width = length1;
      this.length = length2;
    }
    // And so on for many more shapes
  }

  getArea() {
    if (this.type === 'circle') {
      return Math.PI * this.radius ** 2;
    } else if (this.type === 'square') {
      return this.width * this.width;
    } else if (this.type === 'rectangle') {
      return this.width * this.length;
    }
    // And so on for many more shapes
  }
}

This is the good version:

// Good small and simple classes

class Circle {
  constructor(radius) {
    this.radius = radius;
  }
  getArea() {
    return Math.PI * this.radius ** 2;
  }
}

class Rectangle {
  constructor(width, length) {
    this.width = width;
    this.length = length;
  }
  getArea() {
    return this.width * this.length;
  }
}

Here is another example.

This code is the bad version:

// Function does too many things

function sendData(data) {
  const formattedData = data
    .map(x => x ** 2)
    .filter(Boolean)
    .filter(x => x > 5);

  if (formattedData.every(Number.isInteger) && formattedData.every(isLessThan1000)) {
    // POST is assumed here; fetch rejects a body on its default GET method
    fetch('foo.com', { method: 'POST', body: JSON.stringify(formattedData) });
  } else {
    // code to submit error
  }
}

This code is the better version:

// Functionality is separated well over multiple functions

function sendData(data) {
  const formattedData = format(data);

  if (isValid(formattedData)) {
    // POST is assumed here; fetch rejects a body on its default GET method
    fetch('foo.com', { method: 'POST', body: JSON.stringify(formattedData) });
  } else {
    sendError();
  }
}

function format(data) {
  return data
    .map(square)
    .filter(Boolean)
    .filter(isGreaterThan5);
}

function isValid(data) {
  return data.every(Number.isInteger) && data.every(isLessThan1000);
}

function sendError() {
  // code to submit error
}
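
The better version refers to the helpers square, isGreaterThan5 and isLessThan1000 without showing them. Here is a minimal sketch of what they could look like. Their bodies are assumptions, inferred from their names and from the inline callbacks in the bad version.

```javascript
// Assumed helper definitions (not shown in the example above).
function square(x) {
  return x ** 2;
}

function isGreaterThan5(x) {
  return x > 5;
}

function isLessThan1000(x) {
  return x < 1000;
}

// Same as in the better version above.
function format(data) {
  return data
    .map(square)
    .filter(Boolean) // drops falsy values such as 0 and NaN
    .filter(isGreaterThan5);
}

function isValid(data) {
  return data.every(Number.isInteger) && data.every(isLessThan1000);
}

format([1, 2, 3, 40]); // [9, 1600]
isValid([9, 16]);      // true
isValid([9, 1600]);    // false, because 1600 is not less than 1000
```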

The idea that you should have small, specific units applies to all code.

Advantages of small units

Smaller, more specific units have multiple advantages.

Better code organisation

Technically, with the god-class Shape, you know where to go to find the circle functionality, so the organisation is not too bad.

But, with the more specific units of Circle and Rectangle, you can find functionality faster and easier.

It's less obvious with the sendData example, but the same thing applies. Say you want to find the functionality for validating the data. You can find that instantly in the second version. There is a function clearly named isValid. sendData also calls isValid(formattedData), which labels where the data is validated.

However, in the first version of sendData, you'll have to spend more time reading through the details of sendData to find it. Also, the part where the data is validated isn't labelled. You'll have to both parse the code and recognise the line which does the data validation. If you're not familiar with the code, this may be difficult.

In summary, smaller units provide better organisation.

Simplicity and understandability

If you examine the Shape example, you'll see that the code there is quite long and complex. It's difficult to follow. In comparison, the classes Circle and Rectangle are super simple. As a result, they're much easier to understand.

In the sendData example, understanding what sendData does is easier in the second version. It almost reads like English:

  1. Format data
  2. If the data is valid: fetch
  3. Else: sendError

You also don't have to read the implementation of the separate functions, such as isValid, because their names tell you what they do.

All of the smaller functions are simpler too. They are clearly labelled (which helps you understand them even if the implementation is complicated) and they only do a small thing.

In general, smaller units have less code and do fewer things. This applies the KISS principle, which makes code easier to read and understand.

Easier changes

Code that does fewer things is easier to change than code which does many things.

At the very least, the code you need to change isn't surrounded by other code that you need to carefully avoid changing. Also, you need to understand the code before changing it, which is easier with small units.

Consider the god-class Shape example. The code for the functionality of all the shapes is entangled together. If you try to change the code for the circle, you could accidentally modify something else and create a bug. Also, the functionality for circle exists in multiple different methods inside Shape. You'll have to jump around and change multiple different things.

On the other hand, Circle and Rectangle are very easy to change. Unrelated code is nowhere to be found. You can't break any other shape by accident.

The same applies to the sendData example.

In the second version, if you want to change the data validation, you change the code in isValid and you're finished. You can't break any unrelated code, because there isn't any.

However, in the first version, since a lot of unrelated code is placed together, you might change something else by accident.

Easier to test

In general, if a unit does less stuff, it's easier to test than if it does more stuff.
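
For instance, here is a rough sketch of checks for isValid from the sendData example. The helper isLessThan1000 isn't shown in the article, so its body here is an assumption. The checks use console.assert rather than a real test framework, just to keep the sketch self-contained.

```javascript
// Assumed helper (its body is not shown in the article).
const isLessThan1000 = x => x < 1000;

function isValid(data) {
  return data.every(Number.isInteger) && data.every(isLessThan1000);
}

// Each check needs only an input array and an expected boolean.
// No network request has to be faked, because isValid doesn't make one.
console.assert(isValid([1, 2, 3]) === true, 'small integers are valid');
console.assert(isValid([1.5]) === false, 'non-integers are invalid');
console.assert(isValid([1000]) === false, '1000 or more is invalid');
console.assert(isValid([]) === true, 'an empty list is valid');
```

Testing the same validation through the first version of sendData would be harder, because every test would also have to deal with the formatting step and the fetch call.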

Easier to reuse

If a unit does one specific thing, it's immediately reusable any time you need that one thing. However, if a unit does 10 things, or even 2 things, it's generally not reusable unless you need all of those things.

How to apply separation of concerns

To apply separation of concerns, you extract functionality.

For example, with Shape, if you extract all of the relevant code for the circle functionality into its own class, you end up with Circle.

Here is a more step-by-step process.

Here is Shape again for reference.

class Shape {
  constructor(typeOfShape, length1, length2 = null) { // length2 is an optional parameter
    this.type = typeOfShape;
    if (this.type === 'circle') {
      this.radius = length1;
    } else if (this.type === 'square') {
      this.width = length1;
    } else if (this.type === 'rectangle') {
      this.width = length1;
      this.length = length2;
    }
    // And so on for many more shapes
  }

  getArea() {
    if (this.type === 'circle') {
      return Math.PI * this.radius ** 2;
    } else if (this.type === 'square') {
      return this.width * this.width;
    } else if (this.type === 'rectangle') {
      return this.width * this.length;
    }
    // And so on for many more shapes
  }
}

Let's define a class called Circle.

class Circle {}

From Shape, let's extract only the constructor functionality that's relevant to circle. That's the part inside the constructor method and inside the if (this.type === 'circle') conditional.

class Circle {
  constructor(radius) {
    this.radius = radius;
  }
}

Repeat for the getArea function:

class Circle {
  constructor(radius) {
    this.radius = radius;
  }

  getArea() {
    return Math.PI * this.radius ** 2;
  }
}

And so on for all the other methods which might be in Shape. Afterwards, repeat for the other shapes.

The same process applies for sendData, although in this case we're not completely replacing sendData like we did with Shape and Circle. Instead, we're extracting functionality into separate functions and calling them inside sendData.

For example, the code to format data was moved into the format function and the code to check if the data is valid was moved into the isValid function.

When to apply separation of concerns

Now that you understand the "why" and "how" of separation of concerns, when should you apply it?

Generally, you want "small, specific units that only do one thing".

However, the definition of "one thing" varies. It depends on context.

If you were to show the god-class Shape to someone, they might rightfully say that it only does one thing. "It handles shapes".

Someone else may say that Shape does a lot of things. "It handles circles, rectangles and so on. That's multiple things".

Both claims are correct. It all depends on what level of abstraction you consider.

In general, it's good to consider small levels of abstraction. You want units that do small, specific things.

That's because, as already examined, smaller units have more benefits than larger units.

So, here are some guidelines.

When code feels large and complicated

If you feel that some code is difficult to understand, or too large, try extracting some units out of it.

Can you keep extracting?

Robert Martin has a technique that he calls "extract till you drop".

In short, you keep extracting functionality until there is no reasonable way of extracting any more.

As you write code, consider: "Can I extract some more functionality from this unit, into a separate unit?"

If it's possible to extract further, then consider doing so.

See Robert Martin's blog post on extract till you drop for more information on this technique.

Reasons to change

Consider, what reasons does this code have to change?

Code that is placed together but has different reasons to change (different parts may change at different times) is bad, as we've already examined.

The solution is to move code with different reasons to change into separate units.

Consider the Shape example. Shape will change when:

  • anything needs changing for circles
  • anything needs changing for rectangles
  • anything needs changing on any other shape
  • a new shape needs to be added or removed

In the sendData example, sendData could change if:

  • the formatting of the data needs to change
  • the validation of the data needs to change
  • the data in the error request needs to change
  • the endpoint (URL) of the error request needs to change
  • the data in the sendData request needs to change
  • the endpoint (URL) of the sendData request needs to change

All of these reasons are indicators that you may want to separate that functionality.

Who (which role in the company) may want to change this code

This is another flavour of "what reasons does this code have to change".

It asks who (which role in the company) may want to change the code.

In the sendData example:

  • developers may want to change something about the URL endpoints of the requests or the bodies of the requests
  • accountants may want to change the data validation in the future
  • a product owner who uses the submitted data to generate reports could want to format the data differently in the future

Both of these questions (what could change and who may want changes) try to point out different concerns in the code, that may benefit from separation.

Be pragmatic

The final point is to be pragmatic.

You don't have to separate everything to the extreme. The goal is to have code that's easy to work with.

For example, you don't need to enforce every function in your codebase to be at most 4 lines long (which is possible to do). You would end up with hundreds of minuscule functions. They may be harder to work with than more reasonably sized functions that are an average of 4 to 8 lines long.

Principle of least knowledge

In software, it's beneficial to minimise knowledge. This includes the knowledge that code has of other code (dependencies), as well as the knowledge you need to work with particular areas of code.

In other words, you want software to be decoupled and easy to work with. Making changes shouldn't break seemingly unrelated code.

Knowledge in code

In programming, knowledge means dependencies.

If some code (call it module A), knows about some other code (call it module B), it means that it uses that other code. It depends on it.

If some code is being used elsewhere, that means that there are limitations on how you can change it, otherwise you would break the code that uses it.

Without discipline and control, this is where you can get into a chain reaction of propagating changes. The situation where you just wanted to make a small change and had to modify every file in the system to do so. You changed A, which was used by B and C so you had to change both of those to accommodate your changes to A. In turn B and C were used in other places which you also had to change. And so on.

Every change is error-prone, and multiple cascading changes are much worse.

Additionally, you need to actually remember or know that these dependencies exist. This is quite difficult to do, especially when dependencies propagate far and wide throughout your code. But if you don't remember, you won't make all of the required changes and you'll immediately introduce bugs.

That's why you need to minimise knowledge in your code.

Modifications to code

Here are the possible changes you can make to already-existing code.

No change to contract

The only change you can make with no propagating changes is a change that doesn't affect anything else in the codebase.

For example:

// Original
function greet(name) {
  return 'Hello ' + name;
}

// After change
function greet(name) {
  return `Hello ${name}`;
}

These two functions are equivalent from a caller's point of view. They have the same contract. If you change from one version to the other, nothing else in the codebase needs to change, because nothing could possibly be affected by this change.

Changing the contract of a "private" function

The next best case is when you change the contract of a private function, meaning something that's not public to the majority of the codebase. In this case, if you change the contract, the amount of code affected is very small.

For example, consider this Circle class:

// Circle.js
class Circle {
  constructor(radius) {
    this.radius = radius;
  }

  getArea() {
    return _privateCalculation(this.radius);
  }
}

function _privateCalculation(radius) {
  return Math.PI * radius ** 2;
}

export default Circle;

Next, consider that we want to delete _privateCalculation. Here is the code after the change:

// Circle.js
class Circle {
  constructor(radius) {
    this.radius = radius;
  }

  getArea() {
    return Math.PI * this.radius ** 2;
  }
}

export default Circle;

When we deleted _privateCalculation, getArea was affected. As a result, we also had to modify getArea to accommodate the changes. However, since _privateCalculation wasn't used anywhere else in the codebase and since getArea didn't change its contract, we're finished. Nothing else in the codebase needs to be modified.

Changing the contract of a public function

The pattern continues in the same way. If you change the contract of anything, you'll have to modify everything that uses it to accommodate. If you change more contracts as a result, you'll have to modify even more things. And so on.

For example, if you delete getArea, you'll have to update all of the code in the codebase that uses it. Since getArea is a public function, many things could be using it.

In general, you want to prevent these situations.

The only real way to prevent them is to separate concerns properly. You need to organise your code into sensible units that make sense for your project. If done well, that minimises the chance that you'll need to change the contract of those units in the future.

For example, what is the chance that the Circle class needs to change its contract? It's very low.

Other than that, keep everything you can private, so that very little is affected when you need to change code.
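As a rough sketch of what "private" can look like in JavaScript, the class below uses a private method (a name starting with `#`). The name `#calculateArea` is made up for this illustration; it plays the same role as `_privateCalculation` in the earlier example.

```javascript
class Circle {
  constructor(radius) {
    this.radius = radius;
  }

  // Public contract: callers only know about getArea().
  getArea() {
    return this.#calculateArea();
  }

  // Private method: only code inside Circle can call it, so it can be
  // renamed or deleted without affecting any caller.
  #calculateArea() {
    return Math.PI * this.radius ** 2;
  }
}

const circle = new Circle(2);
const area = circle.getArea();
```

Because nothing outside the class can reach `#calculateArea`, changing it can only ever affect `Circle` itself.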

Now, changes to public things are necessary sometimes. That's life. It could be due to new requirements, or due to large refactors. You'll deal with them when you need to, but hopefully it won't be too often.

More tips

The principle of least knowledge has many more applications. They all deal with making code independent of changes elsewhere and with minimizing the knowledge you need to keep in your head to work with code.

Other applications of this principle include:

  • the interface segregation principle. This keeps interfaces small. It means that code which uses an interface depends on fewer things. It allows for easier future changes such as splitting a class based on its interfaces or creating a smaller separate class for an interface.
  • the law of Demeter. This prevents functions / methods from depending on long chains of object compositions.
  • immutability. This eliminates changes to variables. It means that you don't need to track how the variable has changed over time. It reduces the knowledge you need to work.
  • only accessing things in the local scope (or maybe instance scope). Global things are accessible by many things in the codebase. Changing them may break many things. It's also difficult to track how they change over time, because many things can change them. However, local things are more "private". This makes tracking changes easier.
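To make one of these concrete, here is a minimal sketch of the law of Demeter. The `Order` class, its `getShippingCity` method and the data shape are all invented for this example.

```javascript
// Before: the caller must know that an order has a customer,
// that a customer has an address, and that an address has a city.
function getShippingLabelBefore(order) {
  return `Ship to: ${order.customer.address.city}`;
}

// After: the caller depends only on the order's own interface.
class Order {
  constructor(customer) {
    this.customer = customer;
  }

  getShippingCity() {
    return this.customer.address.city;
  }
}

function getShippingLabel(order) {
  return `Ship to: ${order.getShippingCity()}`;
}

const order = new Order({ address: { city: 'Lisbon' } });
const label = getShippingLabel(order);
```

If the way a customer stores its address changes later, only `Order` needs updating, not every function that prints a label. A stricter version would have the customer expose its city as well, so that `Order` doesn't reach through the address either.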


Abstraction and don't repeat yourself (DRY)

DRY (don't repeat yourself) is a core principle in programming.

It says that if you have multiple instances of similar code, you should refactor them into a single abstraction. That way you'll end up with just one instance of the code, rather than multiple.

To accommodate the differences, the resulting abstraction accepts arguments.

Motivation for DRY

One of the reasons for DRY is to cut down the time you need to write code. If you already have an abstraction for X functionality, then you can import it and use it, rather than re-code it from scratch every time you need it.

Another reason is to make changes easier. As already mentioned, we're bad with repetitive work. If code is DRY, then you only have to make a specific change in one place. If code isn't DRY then you have to make a similar change in multiple places. Making a single change is safer and faster than making multiple similar changes.

Additionally, keeping code DRY applies separation of concerns. The abstraction will have to be placed in a sensible place in the codebase (good for code organisation). Also, the implementation of the abstraction is separated from the caller.

How to apply abstraction and DRY

Here are some guidelines for applying DRY.

Combine similar code into a single abstraction

Whenever you find multiple instances of the same or similar code, combine them into a single abstraction. If there are slight differences between the instances, accept arguments to handle them.

You've probably done this a vast number of times throughout your career.

To illustrate the point, let's use the function map as an example. map is a function that handles this common process:

  1. Create a new, empty, array
  2. Iterate over an array with a for-loop
  3. Run some functionality on every value
  4. Push the resulting value to the new array
  5. After the for-loop ends, return the new array

This process is very common. It appears all the time in many codebases.

Here is what it normally looks like using a for-loop.

function double(x) {
  return x * 2;
}

function doubleArray(arr) {
  const result = [];
  for (let i = 0; i < arr.length; i++) {
    const element = arr[i];
    const transformedElement = double(element);
    result.push(transformedElement);
  }
  return result;
}

const arr = [1, 2, 3, 4];
const result = doubleArray(arr);

In addition to the function doubleArray, there would be many other functions that are almost exactly the same. The only differences would be the array they iterate over and the transformation they make on each element.

So, take the common parts from those functions and put them into a separate function called map. Accept arguments for the things that are different every time: the array and the transformation to run on each element.

Here is the resulting code.

function map(array, transformationFn) {
  const result = [];
  for (let i = 0; i < array.length; i++) {
    const element = array[i];
    const transformedElement = transformationFn(element);
    result.push(transformedElement);
  }
  return result;
}

Then, in every function in your codebase similar to doubleArray, use map instead.

function double(x) {
  return x * 2;
}

function doubleArray(arr) {
  return map(arr, double);
}

const arr = [1, 2, 3, 4];
const result = map(arr, double);

(Of course, arrays in JavaScript already have a built-in method for map, so you wouldn't need to create a standalone map function. This was just for illustrative purposes.)

You can do the same with any other code. Any time you encounter similar code, combine it into a single abstraction and accept arguments for any differences.
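As a further sketch of the same idea, here is how a second common loop, keeping only the elements that pass a check, could be extracted. The names `filterArray`, `isEven` and `isPositive` are made up for this illustration (and, as with map, JavaScript arrays already have a built-in filter method).

```javascript
function filterArray(array, predicateFn) {
  const result = [];
  for (let i = 0; i < array.length; i++) {
    const element = array[i];
    // Keep the element only if the check passes.
    if (predicateFn(element)) {
      result.push(element);
    }
  }
  return result;
}

function isEven(x) {
  return x % 2 === 0;
}

function isPositive(x) {
  return x > 0;
}

const evens = filterArray([1, 2, 3, 4], isEven);
const positives = filterArray([-1, 0, 5], isPositive);
```

The loop is written once, and each caller supplies only the part that differs: the array and the check.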

Rule of three

The rule of three is a precaution against combining functionality too early.

It states that you should combine functionality into a single abstraction if there are three occurrences of it. Don't combine if there are only two occurrences.

That's because the instances of code you might combine may diverge (each may change differently) in the future.

For example, consider this code:

function validateUsername(str) {
  return str.length >= 6;
}

function validatePassword(str) {
  return str.length >= 6;
}

It would probably be a mistake to combine the duplicate functionality into its own abstraction, like so:

// combined too early

function validateUsername(str) {
  return validate(str);
}

function validatePassword(str) {
  return validate(str);
}

function validate(str) {
  return str.length >= 6;
}

The problem is that, in the future, validateUsername and validatePassword may change differently. It's not difficult to see how that may happen.

For example, in the future, validateUsername may need to check that there are no special characters, while the password may require special characters.

Obviously you could make both scenarios work in the validate function using conditionals, but it would be messier than if you had kept the functionality separate.
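To see what "messier" might look like, here is a sketch of both options. The option names `allowSpecial` and `requireSpecial`, and the definition of a special character as anything other than a letter or digit, are assumptions made for this example.

```javascript
// Combined too early: every new rule adds another flag and another branch.
function validate(str, { allowSpecial, requireSpecial }) {
  const hasSpecial = /[^A-Za-z0-9]/.test(str);
  if (str.length < 6) return false;
  if (!allowSpecial && hasSpecial) return false;
  if (requireSpecial && !hasSpecial) return false;
  return true;
}

// Kept separate: each function states only its own rules.
function validateUsername(str) {
  return str.length >= 6 && !/[^A-Za-z0-9]/.test(str);
}

function validatePassword(str) {
  return str.length >= 6 && /[^A-Za-z0-9]/.test(str);
}
```

With the combined version, every caller has to pass the right flags, and every new requirement for one of them risks breaking the other.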

This is why we use the rule of three. Waiting until the third occurrence makes it more likely that the similar functionality is significant rather than coincidental. It means that things are less likely to diverge in the future.

It also makes it so that if one of the three instances of similar code diverges, you can separate it and still keep the abstraction for the other two. On the other hand, if you combined functionality on the second occurrence, then had to separate them out again, you would have to revert both of them.

In summary, refactoring on the second occurrence is more likely to be a waste of time.

Of course, the rule of three is just a guideline. Remember to be pragmatic and do what's best for your project. Some similar instances of code may be changing in the same way every time. Or maybe they are each quite complicated to change, and you have to make a similar change to both every time. In that case, it may be more beneficial for your project to combine them into a single abstraction, even if you have to ignore the rule of three.


Side effects

The last thing we're going to look at is side effects. These aren't a single principle, but a combination of many principles and being pragmatic.

(And no, they're not just the domain of functional programming. It's essential for all code to handle side effects properly.)

In programming, the general definition of a side effect is anything that changes the state of the system. This includes:

  • changing the value of a variable
  • logging to the console
  • modifying the DOM
  • modifying the database
  • any mutation whatsoever

It also includes "actions" that may not be viewed as mutations, such as sending data over the network.

I also say that accessing non-local scope is a side effect. It may not be in the official definition, but it's as unsafe as other side effects, especially if the variable you're trying to access is mutable. After all, if you access a global variable whose value isn't what you expect, you have a bug, even if the code in question doesn't modify it.

All code needs "side effects" to be useful. For example, you have to modify the database or the DOM at some point.

But side effects can be dangerous. They need to be handled carefully.

The danger of side effects

Side effects are not directly harmful, but they can be indirectly harmful.

For example, code A and B might both depend on the value of a global variable. You might change the value of the global variable, because you want to influence code A. But, you don't remember that code B will be affected as well. As a result, you now have a bug.

These hidden dependencies, where you change one thing and something else breaks, can be very difficult to remember, track and manage.
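Here is a small sketch of that scenario. The global `discountPercent` and both functions are hypothetical names chosen for the example.

```javascript
// Hypothetical global shared by two unrelated pieces of code.
let discountPercent = 10;

// Code A: shows a promotional price in the shop.
function getPromoPrice(price) {
  return (price * (100 - discountPercent)) / 100;
}

// Code B: computes the amount on an invoice.
function getInvoiceTotal(price) {
  return (price * (100 - discountPercent)) / 100;
}

const invoiceBefore = getInvoiceTotal(200); // 180

// Changed with only code A in mind...
discountPercent = 50;

const promoPrice = getPromoPrice(200); // 100, as intended
const invoiceAfter = getInvoiceTotal(200); // also 100: code B changed too
```

Nothing in `getInvoiceTotal` was edited, yet its result changed. That's the hidden dependency.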

Another example is changing the DOM. The DOM can be thought of as just a global object with state. The problem is that, if different pieces of code affect the DOM at different times, in non-compatible ways, there can be bugs. Maybe code A depends on element X to be there, but code B deleted that entire section altogether just before code A ran.

Perhaps you've encountered bugs like these in your work as well.

Additionally, side effects break most of the principles we've covered so far:

  • KISS and the principle of least astonishment
  • principle of least knowledge (because code affects other, seemingly unrelated code)
  • separation of concerns (because concerns are not necessarily self-contained or well-organised)

One important thing to understand, however, is that side effects are not inherently harmful. They only cause bugs if we code them incorrectly. They are code we write which happens to be incompatible with other code we write. We write code A and then we write code B which breaks code A under certain circumstances.

The main danger of side effects is that they're generally very difficult to track. That's because tracking global state, which anything can modify at any time, is very difficult. If uncontrolled, how could you possibly track changes made to the DOM over time? You may have to track so many things that it just wouldn't be feasible.

Asynchronicity and race conditions also add to the complexity and difficulty of tracking side effects.

Another downside of side effects is that code with side effects is generally harder to test.

Handling side effects

Even though side effects are dangerous, they can be handled effectively.

Be pragmatic

The most important point, as always, is to be pragmatic.

You don't have to avoid all side effects to the extreme. You are only required to be careful with potentially incompatible code.

For example, immutability is a good way to avoid many types of side effects. However, immutability makes little difference in the local scope of functions.

For example, here are two functions that do the same thing. One uses immutability and the other doesn't.

function factorial1(n) {
  let result = 1;
  for (let i = 1; i <= n; i++) {
    result *= i;
  }
  return result;
}

function factorial2(n) {
  if (n <= 1) {
    return 1;
  }
  return n * factorial2(n - 1);
}

In the example, factorial1 uses mutation. The values of result and i both change during execution.

factorial2 uses immutability. The values of the variables inside it never change during function execution.

But it makes no difference. Other than some language limitations of recursion (which we'll ignore for this example), for all intents and purposes, factorial1 and factorial2 are exactly the same from the perspective of the caller.

In fact, people tend to be less comfortable with recursion, so factorial2 could actually be the worse choice depending on your team.

So be pragmatic and do what's best for your project.

Immutability

Having said that, immutability is an easy way to avoid a large portion of side effects.

By never modifying variables in your code unnecessarily, you remove a large problem. You won't have things changing unexpectedly. You also won't have to track the lifecycle of variables to know what values they contain.

When starting with immutability, start simple. Then, over time, try to make as many things immutable in your work as possible.

Instead of modifying a variable, create a new variable for the new value. Instead of modifying an object, create a new object with the new values you want.

For example:

// Example 1 - Don't do this
function doubleArray(array) {
  for (let i = 0; i < array.length; i++) {
    array[i] = array[i] * 2; // mutates the original array
  }
}

const arr = [0, 1, 2, 3];
doubleArray(arr);

// Example 2 - Do this
function double(x) {
  return x * 2;
}

function doubleArray(array) {
  return array.map(double); // returns a new array, without modifying the original
}

const arr = [0, 1, 2, 3];
const result = doubleArray(arr);

In example 1, the original array is modified.

In example 2 the original array is not modified. doubleArray creates and returns a new array with the doubled values. Outside of the function, we create the new variable result to hold the new array.
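The same approach works for objects. The sketch below uses object spread syntax to build a new object instead of modifying the existing one; the `user` object and its fields are invented for illustration.

```javascript
const user = { name: 'Ada', address: { city: 'London' } };

// Don't do this: user.address.city = 'Paris' would change the original object.

// Do this: copy each level you need to change and leave the original alone.
const movedUser = {
  ...user,
  address: { ...user.address, city: 'Paris' },
};
```

Note that spread only copies one level, which is why the nested `address` object is copied explicitly as well.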

Immutability performance concerns

Immutability may be slightly worse for performance. However, you probably shouldn't worry about that, because:

  • you shouldn't do premature optimisation for performance. Don't worry about performance except for the bottlenecks in your code.
  • in most cases, immutability won't have a significant impact on performance
  • you can use a library designed for efficient immutable updates, such as Immer for JavaScript. It uses structural sharing, so an update copies only the parts of an object that actually change and reuses the rest, instead of copying the entire object.
  • you can be pragmatic. You don't have to apply immutability in places where it would bottleneck performance.

Also, in some cases, immutability can improve performance by making things easier to run in parallel.

Avoid non-local scope

Avoid accessing or modifying things that are not exclusively in the local scope of your functions or methods. This means that it's probably okay to modify variables that originated in your local scope, but not variables which were passed in as arguments (originated outside of the local scope).
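As a sketch of the difference between mutating an argument and mutating something local, consider sorting. The built-in array `sort` method sorts in place, so calling it directly on an argument changes the caller's array. The function names here are made up for the example.

```javascript
// Don't do this: sort() works in place, so the caller's array changes.
function getSortedScoresMutating(scores) {
  return scores.sort((a, b) => a - b);
}

// Do this: copy the argument first, then mutate only the local copy.
function getSortedScores(scores) {
  const copy = [...scores]; // originated in local scope, safe to mutate
  copy.sort((a, b) => a - b);
  return copy;
}

const scores = [30, 10, 20];
const sorted = getSortedScores(scores);
```

After `getSortedScores` runs, `scores` is still in its original order. Calling `getSortedScoresMutating(scores)` instead would silently reorder it for every other piece of code holding a reference to it.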

If necessary, it's alright to mutate things up to instance or module scope.

The further away from local scope you go, the more dangerous it gets, because things become more global. This makes things harder to track and introduces far-reaching dependencies in your code.

Wherever possible:

  • pass things in explicitly as arguments
  • stick as close to local-scope as possible

For example:

// Example 1 - Don't do this
function doubleResult() {
  result *= 2; // Accesses and mutates a variable outside of the local scope
}

let result = 5;
doubleResult();

// Example 2 - Do this
function double(n) {
  return n * 2; // Accesses parameter which is in local scope. Doesn't mutate anything
}

const initialValue = 5;
const result = double(initialValue);

In example 1, doubleResult accesses result, which is a variable outside of its local scope. It also mutates it, changing its value. Now, if any other code in the codebase accesses result, it will see the new value.

In example 2, double only accesses its parameter, which is part of its local scope. It doesn't mutate any values outside of its local scope.

In a real codebase, something resembling example 1 could be very difficult to track. The result variable may be defined much further away from both the doubleResult function as well as the function call. This makes it harder to track the value of result.

Also, if result isn't what you expect, you have a bug. For example, you may have already called doubleResult 3 times but you may not remember.

Overall, in example 1, you can't predict what a function that uses result will do unless you know the exact value of result at that time. To do this, you'll need to search and trace through the entire codebase to keep track of result at all times.

In the second example, initialValue is always 5, so there are never any surprises. Also you can see what the function is doing immediately and can easily predict what will happen.

Be extremely careful

Sometimes you can't just rely on immutability. For example, at some point, you must mutate the DOM or the database, or make a call to a third party API, or run some sort of side effect. As already mentioned, asynchronicity only adds to the problem.

In this case, you just have to be extremely careful.

Side effects are probably where the majority of the bugs in your codebase exist. They're the hardest code to understand and track.

Regardless of what you do to try and manage them, you must always invest the required time and attention in them.

Separate pure and impure functionality

For the most part, try to separate code with side effects and code without side effects. Your functions shouldn't both perform side effects and have "pure" code. They should do one or the other (within reason).

This is closely related to the command-query separation principle. It's also an application of separation of concerns.

For starters, something like writing to the database is very different to calculating what to write to the database. Those two concerns can change independently and for different reasons. As we examined in separation of concerns, they should be separated.

Further, pure functions are generally easy to understand, reuse and test. Functions with side effects are not. Therefore, for your codebase to be easy to work with, you probably want as many functions as possible to be pure. This means that you should separate your pure functionality from your side effects.

For example, instead of this:

function double(x) {
  return x * 2;
}

function doubleArrayAndDisplayInDOM(array) { // this function does a non-trivial calculation / operation and performs a side effect
  const doubled = array.map(double); // (pretend this is a non-trivial calculation / operation)
  document.querySelector('#foo').textContent = doubled; // writing to the DOM is a side effect
}

function main() {
  doubleArrayAndDisplayInDOM([1, 2, 3, 4]);
}

Do this:

function double(x) {
  return x * 2;
}

function doubleArray(array) { // this function only does a calculation / operation
  return array.map(double);
}

function displayInDom(content) { // this function only performs a side effect
  document.querySelector('#foo').textContent = content;
}

function main() {
  const doubled = doubleArray([1, 2, 3, 4]);
  displayInDom(doubled);
}
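One benefit of this split, sketched below, is that the pure part can be checked without a browser or any setup. The check itself is hand-rolled for illustration; in a real project you would probably use your test framework of choice.

```javascript
function double(x) {
  return x * 2;
}

function doubleArray(array) {
  return array.map(double);
}

// The pure function needs no DOM, no setup and no cleanup to be checked.
const actual = doubleArray([1, 2, 3, 4]);
const expected = [2, 4, 6, 8];
const isCorrect =
  actual.length === expected.length &&
  actual.every((value, i) => value === expected[i]);

console.log(isCorrect ? 'doubleArray works' : 'doubleArray is broken');
```

Checking the combined `doubleArrayAndDisplayInDOM` function would instead require a DOM (or a stand-in for one) containing an element with the id `foo`.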

Clear areas of responsibility

As much as possible, you need to make sure that your code doesn't have conflicts. Code which performs side effects shouldn't conflict with other code performing other side effects at different times.

A good way to do this is to have distinct areas of responsibility in your code.

For example, if code A modifies element X in the DOM, then it should ideally be the only code which modifies that part of the DOM. All other code that needs to influence X should talk to code A to do so. That way tracking changes to element X is as easy as possible.
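A minimal sketch of that arrangement might look like this. `createStatusBanner` is a hypothetical module playing the role of code A, and a plain object stands in for the DOM element so that the example runs outside a browser.

```javascript
// Hypothetical "code A": the single owner of a status banner element.
function createStatusBanner(element) {
  return {
    show(message) {
      element.textContent = message;
      element.hidden = false;
    },
    hide() {
      element.textContent = '';
      element.hidden = true;
    },
  };
}

// A plain object stands in for a real DOM element so the sketch runs anywhere.
const bannerElement = { textContent: '', hidden: true };
const statusBanner = createStatusBanner(bannerElement);

// Other code asks the owner to make changes instead of touching the element.
function onSaveSucceeded() {
  statusBanner.show('Saved');
}

onSaveSucceeded();
```

Every change to the banner now goes through `show` or `hide`, so there is exactly one place to look when the banner misbehaves.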

Additionally, try to organise code dependencies well. For example, code A shouldn't run if any other code runs which would conflict with it. Also, code A shouldn't run if the state that it depends on isn't there or isn't what code A expects.

Side effects in pairs

For side effects which come in pairs (e.g. open / close file), the function that started the side effect should also finish it.

For example, instead of this:

/* Note, this is pseudocode */

function openFile(fileName) {
  const file = open(fileName);
  return file;
}
const file = openFile('foo.txt');

/* Lots of other code in-between */

doStuffToFile(file);
close(file);

Do this:

/* Note, this is pseudocode */

function useFile(fileName, fn) {
  const file = open(fileName);
  fn(file);
  close(file);
}
useFile('foo.txt', doStuffToFile);

Robert Martin calls this technique "passing a block". The function useFile both opens and closes the file, so it doesn't leave an open file pointer in the system.

This ensures that the file will be closed when it's no longer needed.

As for the functionality to perform on the file, that's passed into the function. It's the parameter fn.

This ensures that you won't forget to finish the side effect later. It also provides good code organisation and makes the code easy to understand and track. The entire side effect is fully handled in one place.
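One caveat with the pseudocode above: if fn throws an error, the close call is never reached. A possible refinement, sketched here, wraps the call in try / finally. The `openFile` and `closeFile` functions below are in-memory stand-ins invented for this example, not a real file API.

```javascript
// In-memory stand-ins for a real file API (assumptions for this sketch).
const openFiles = new Set();

function openFile(fileName) {
  const file = { fileName };
  openFiles.add(file);
  return file;
}

function closeFile(file) {
  openFiles.delete(file);
}

function useFile(fileName, fn) {
  const file = openFile(fileName);
  try {
    return fn(file);
  } finally {
    closeFile(file); // runs whether fn returns normally or throws
  }
}

const nameLength = useFile('foo.txt', (file) => file.fileName.length);

try {
  useFile('foo.txt', () => {
    throw new Error('something went wrong');
  });
} catch (error) {
  // The error still reaches the caller, but the file was closed first.
}
```

After both calls, `openFiles` is empty, even though the second callback threw.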

Consider using a framework or functional programming language

As with immutability, the best option might be to avoid side effects as much as possible.

To help with this, you can consider delegating some of them to a framework, library, or functional programming language.

For example, for working with the DOM, you can use a library such as React (or one of the many alternatives).

Something like React handles all of the DOM-related side effects. Then, in your application, you just write pure functions. You don't modify the DOM directly. Instead, your functions generate an object that represents what the DOM should look like.
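To give a feel for the idea without pulling in a library, here is a deliberately simplified sketch. The function name `greetingView` and the exact object shape are assumptions for illustration; this is not React's actual API.

```javascript
// A pure function: the same input always produces the same description.
// The { type, props } shape is a simplified stand-in, not React's real API.
function greetingView(name) {
  return {
    type: 'h1',
    props: { className: 'greeting', children: `Hello ${name}` },
  };
}

const view = greetingView('Ada');
```

The function never touches the DOM. It only describes the desired result, and turning that description into real DOM changes is left to the library.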

This is good for you, because working with pure functions is much easier than working with side effects.

As for actually modifying the DOM, those side effects still occur, but they're React's problem now.

Additionally, the parent / child hierarchy of React ensures that your DOM manipulations won't conflict with each other and cause problems. For example, React code involving element X won't run if element X doesn't actually exist. This is an example of good organisation and structure in your code to prevent conflicts with other side effects.

Of course, there are many more pros and cons to using something like this. But it's just an option for you to consider.


Further reading

That was a high-level overview of what I consider to be the most important concepts for writing good code. I hope that this article helped you understand the reasoning, motivation and overview behind clean code and programming principles. Hopefully, this knowledge will help you when you go on to learn more programming principles, or find more practical examples of them.

For the next step, I recommend learning clean code and programming principles more practically. Use a resource that explains the concepts with many examples and applications in code.

I highly recommend looking into content created by Robert Martin. For the "quick", free version, I found his lectures Coding a better world together part 1 and Coding a better world together part 2 to be some of the best programming videos I've ever watched. For more detail you might want to check out his book Clean Code or his videos Clean Coders (start with the fundamentals series and the SOLID principles). I've learned a lot from Robert Martin's resources. I especially like that he explains the principles very practically, giving many practical examples of each and a lot of information in general.

I also found the book The Pragmatic Programmer very good. Some of the details are outdated, but the concepts are not. That book truly hammers in the concept of being pragmatic. If anyone reads the 20th anniversary edition of The Pragmatic Programmer please let me know what you thought. It's on my list but I haven't read it yet.

I'm sure there are other amazing resources as well, but these are the ones I'm familiar with and can personally recommend.

Finally, I recommend thinking about the programming principles yourself. Challenge them, and consider where they might or might not be useful. Spend time on your own and consider everything that this article discussed.

Alright, if you have any comments, feedback, or even counter-arguments to what this article discussed, please let me know in the comments. I'm always happy for a discussion. See you next time.