Continuous Integration

The first step to delivering consistent and high-quality software is Continuous Integration (CI). CI is all about ensuring your software is in a deployable state at all times. That is, the code compiles and the quality of the code can be assumed to be of reasonably good quality.

Source control

CI starts with some shared repository, typically a source control system, such as Subversion (SVN) or Git. Source control systems make sure all code is kept in a single place. It's easy for developers to check out the source, make changes, and check in those changes. Other developers can then check out those changes.

In modern source control systems, such as Git, you can have multiple branches of the same software. This allows you to work on different stages of the software without troubling, or even halting, other stages of the software. For example, it is possible to have a development branch, a test branch, and a production branch. All new code gets committed on development; when it is tested and approved, it can move on to the test branch and, when your customer has given you approval, you can move it into development. Another possibility is to have a single main branch and create a new (frozen) branch for every release. You could still apply bug fixes to release branches, but preferably not new features.

Don't underestimate the value of source control. It makes it possible for developers to work on the same project and even the same files without having to worry too much about overwriting others' code or being overwritten by others.

Next to code, you should keep everything that's necessary for your project in your repository. That includes requirements, test scripts, build scripts, configurations, database scripts, and so on.

Each check into this repository should be validated by your automated build server. As such, it's important to keep check-ins small. If you write a new feature and change too many files at once, it becomes harder to find any bugs that arise.

CI server

Your builds are automated using some sort of CI server. Popular CI server software includes Jenkins (formerly Hudson), Team Foundation Server (TFS), CruiseControl, and Bamboo. Each CI server has its own pros and cons. TFS, for example, is the Microsoft CI server and works well with .NET (C#, VB.NET, and F#) and integrates with Visual Studio. The free version only has limited features for only small teams. Bamboo is the Atlassian CI server and, thus, works well with JIRA and BitBucket. Like TFS, Bamboo is not free. Jenkins is open source and free to use. It works well for Java, in which Jenkins itself was built, and works with plugins. There are a lot of other CI servers, all with their own pros and cons, but the thing they all have in common is that they automate software builds. For this book, we will use Jenkins as the CI server of choice.

Your CI server monitors your repository and starts a build on every check in. A single build can compile your code, run unit tests, calculate code coverage, check style guidelines, lint your code, minify your code, and much more. Whenever a build fails, for example, because a programmer forgot a semi-colon and checked in invalid code or because a unit test fails, the team should be notified. The CI server may send an email to the programmer who committed the offending code, to the entire team, or you could do nothing (which is not best practice) and just check the status of your build every once in a while. The conditions for failure are completely up to the developer (or the team). Obviously, when your code does not compile correctly because it's missing a semicolon, that's a fail. Likewise, a failing unit test is an obvious fail. Less obvious is that a build can fail when a certain project does not have at least a 90% test code coverage or your technical debt, that is, the time it takes to rewrite quick and dirty solutions to more elegant solutions grows to more than 40 hours.

The CI server should build your software, notify about failures and successes, and ultimately create an artifact. This artifact, an executable of the software, should be easily available to everyone on the team. Since the build passed all of the teams, criteria for passing a build, this artifact is ready for delivery to the customer.

Software quality

That brings us to the point of software quality. If a build on your CI server succeeds, it should guarantee a certain level of software quality. I'm not talking perfect software that is bug-free all of the time, but software that's well tested and checked for best practices. Numerous types of tests exists, but we will only look at a few of them in this book.

Unit tests

One of the most important things you can do to guarantee that certain parts of your software produce correct results is by writing unit tests. A unit test is simply a piece of code that calls a method (the method to be tested) with a predefined input and checks whether the result is what you expect it to be. If the result is correct, it reports success, otherwise it reports failure. The unit test, as the name implies, tests small and isolated units of code.

Let's say you write a function int Add(int a, int b) in C# (I'm pretty sure every programmer can follow though):

public static class MyMath
{
   public static int Add(int a, int b)
   {
      return a + b;
   }
}

The first thing you want to test is whether Add indeed returns a + b and not a + a, or b + b, or even something random. That may sound easier than it is. If you test whether Add(1, 1) returns 2 and the test succeeds, someone might still have implemented it as a + a or b + b. So at the very least, you should test it using two unequal integers, such as Add(1, 2). Now what happens when you call Add(2147483647, 1)? Does it overflow or throw an exception and is that indeed the outcome you suspected? Likewise, you should test for an underflow (while adding!?). -2147483647 + -1 will not return what you'd expect. That's three unit tests for such a simple function! Arguably, you could test for +/-, -/+, and -/- (-3 + -3 equals -6 and not 0), but you'd have to try really hard to break that kind of functionality, so those tests would probably not give you an extra useful test. Your final unit tests may look something like the following:

[TestClass]
public class MathTests
{
   [TestMethod]
   public void TestAPlusB()
   {
      int expected = 3;
      int actual = MyMath.Add(1, 2);
      Assert.AreEqual(expected, actual, "Somehow, 1 + 2 did not equal 3.");
   }

   [TestMethod]
   [ExpectedException(typeof(OverflowException))]
   public void TestOverflowException()
   {
      // MyMath.Add currently overflows, so this test will fail.
      MyMath.Add(int.MaxValue, 1);
   }

   [TestMethod]
   [ExpectedException(typeof(OverflowException))]
   public void TestOverflowException()
   {
      // MyMath.Add currently underflows, so this test will fail.
      MyMath.Add(int.MinValue, -1);
   }
}

Of course, if you write a single unit test and it succeeds, it is no guarantee that your software actually works. In fact, a single function usually has more than one unit test alone. Likewise, if you have written a thousand unit tests, but all they do is check that true indeed equals true, it's also not any indication of the quality of your software. Later in this book, we will write some unit tests for our software. For now, it suffices to say your tests should cover a large portion of your code and, at least, the most likely scenarios. I would say quality over quantity, but in the case of unit testing, quantity is also pretty important. You should actually keep track of your code coverage. There are tools that do this for you, although they cannot check whether your tests actually make any sense.

It is important to note that unit tests should not depend upon other systems, such as a database, the filesystem, or (third-party) services. The input and output of our tests need to be predefined and predictable. Also, we should always be able to run our unit tests, even when the network is down and we can't reach the database or third-party service. It also helps in keeping tests fast, which is a must, as you're going to have hundreds or even thousands of tests that you want to run as fast as possible. Instant feedback is important. Luckily, we can mock (or fake) such external components, as we will see later in this book.

Just writing some unit tests is not going to cut it. Whenever a build passes, you should have reasonable confidence that your software is correct. Also, you do not want unit tests to fail every time you make even the slightest change. Furthermore, specifications change and so do unit tests. As such, unit tests should be understandable and maintainable, just like the rest of your code. And writing unit tests should be a part of your day to day job. Write some code, then write some unit tests (or turn that around if you want to do Test-Driven Development). This means testing is not something only testers do, but the developers as well.

In order to write unit tests, your code should be testable as well. Each if statement makes your code harder to test. Each function that does more than one thing makes your code harder to test. A thousand-line function with multiple nested if and while loops (and I've seen plenty) is pretty much untestable. So when writing unit tests for your code, you are probably already refactoring and making your code prettier and easier to read. Another added benefit of writing unit tests is that you have to think carefully about possible inputs and desirable outputs early, which helps in finding edge cases in your software and preventing bugs that may come from them.

Integration tests

Checking whether an Add function really adds a and b is nice, but does not really give you an indication that the system as a whole works as well. As said, unit tests only test small and isolated units of code and should not interact with external components (external components are mocked). That is why you will want integration tests as well. Integration tests test whether the system as a whole operates as expected. We need to know whether a record can indeed be saved in and retrieved from a database, that we can request some data from an external service, and that we can log to some file on the filesystem. Or, more practically we can check whether the frontend that was created by the frontend team actually fits the backend that was created by the backend team. If these two teams have had any problems or confusion in communication, the integration tests will, hopefully, sort that out.

Last year, we created a service for a third party who wanted to interface with a system we wrote. The service did not do a lot basically it took the received message and forwarded it to another service that we used internally (and wasn't available outside of the network). The internal service had all of the business rules and could read from, and write to, a database. Furthermore, it would, in some cases, create additional jobs that would be put on a (asynchronous) queue, which is yet another service. Last, a fourth service would pick up any messages from the queue and process them. In order to process a single request, we potentially needed five components (external service, internal service, database, queue, and queue processor). The internal service was thoroughly unit tested, so the business rules were covered. However, that still leaves a lot of room for errors and exceptions when one of the components is not available or has an incompatible interface.

Big bang testing

There are two approaches to integration testing: big bang testing and incremental testing. With big bang testing, you simply wait until all the components of a system are ready and then start testing. In the case of my service, that meant developing and installing everything, then posting some requests and checking whether the external service could call the internal service, and whether the internal service could access the database and the queue and, not unimportant, give feedback to the external service. Furthermore, of course, I had to test whether the queue triggered the processing service and whether the processing service processed the message correctly too.

In reality, the processing also used the database; it put new messages on the queue and sent emails in case of errors. Additionally, all the components had to access the hard drive for logging to a file (and do not assume the filesystem is always available; the first time on production I actually ran into an Unauthorized Exception and nothing was logged). So that means even more integration testing.

Incremental testing

With incremental testing, you test components as soon as they are available and you create stubs or drivers (some sort of placeholder) for components that are not yet available. There are two approaches here:

Top-down testing: Using top-down testing would mean I would've checked whether the external service could make a call to the internal service and, if the internal service was not available yet, create a stub that pretends to be the internal service.

Bottom-up testing: Bottom-up is testing the other way around, so I'd start testing the internal service and create a driver that mimics the external service.

Incremental testing has the advantage that you can start defining tests early before all the components are complete. After that, it becomes a matter of filling in the gaps.

Acceptance tests

After having unit tested our code and checked whether the system as a whole works, we can now assume our software works and is of decent quality (at least, the quality we expect). However, that does not mean that our software actually does what was requested. It often happens that the customer requests feature A, the project manager communicates B, and the programmer builds C. There is a really funny comic about it with a swing (do a Google image search for how projects really work). Luckily, we have acceptance tests.

An acceptance test tests whether specific functionality, as described in the specification, works as expected. For example, the external service we built made it possible for the third party to make a call using a specific login method, create a user, update the user, and finally, deactivate that user. The specifics of the updates were described in the specifications document. Some fields were specified by the third party and some fields were calculated by the service. Keep in mind that the actual calculations had been unit tested and that we knew all the parts worked together as we had done some integration testing. This test was all about testing whether the third party, using their Java technology (our service was written in C#, but communication was XML), could indeed create and update a user. I probably tested that manually once or twice. The problem with testing this manually was that it was a web service; the input and output was XML which is not that easy to read and write. The service only returned whether or not the user was successfully created (and if not, why) so in order to test whether everything had gone well, I needed to look up the user record in the database, along with all other records that should have been created. I knew how to do that at the time, but if I needed to do it again now, I'd be pretty frustrated. And if I do not know how to properly test it, then how will my coworkers who need to make changes to the service know? Needless to say, I created something like 30 automated tests that check whether specific use cases work as intended.

Another one of our applications, a website, works pretty much the same. A user can create a record on page A, look it up on page B, and update it. Obviously, XML is not going to cut it here; this is not a web service. In this case, we used GUI tests (that is, Graphical User Interface tests). Our build server is just going to run the application and click on the buttons that we told it to click. If the button is not available, we've got ourselves an error. If the button is available, but does not take us to the requested page, we've got an error. If the page is correctly loaded, but the record is not visible (for whatever reason), we've got an error. The important thing here is that the tests do more or less exactly what our users will do as well.

There is some confusion on the difference between integration tests and acceptance tests. Both test the entire system, but the difference is that integration tests are written from a technical perspective while acceptance tests are written from the perspective of the product owner or business users.

Smoke tests

Of course, even when all of your tests succeed, a product can still break in production. The database may be down or maybe you have a website and the web server is down. It is always important to also test whether your software is actually working in a production environment, so be sure to always do an automated smoke test after deployment that gives you fast and detailed feedback when something goes wrong. A smoke test should test whether the most important parts of your system work. A manual smoke test is fine (and I'd always manually check whether your software, at least, runs after a release), but remember it's another human action that may be forgotten or done poorly.

Some people run smoke tests before doing integration and acceptance tests. Integration and acceptance tests test an entire system and, as such, may take a bit of time. A smoke test, however, tests only basic functionality, such as does the page load? When a smoke test fails, you can skip the rest of your tests, saving you some time and giving you faster feedback.

There are many types of tests available out there. Unit tests, smoke tests, integration tests, system tests, acceptance tests, database tests, functional tests, regression tests, security tests, load tests, UI tests... it never ends! I'm pretty sure you could spend an entire year doing nothing but writing tests. Try selling that to your customer; you can't deliver any working software, but the software you can't deliver is really very well tested. Personally, I'm more pragmatic. A test should support your development process, but a test should never be a goal on its own. When you think you need a test, write a test. When you don't, don't. Unfortunately, I have seen tests that did absolutely nothing (except give you a false sense of security), but I'm guessing someone just wanted to see tests, any tests, really bad.

Other quality gates

Next to tests, you want other measurements of code quality. For example, code that has many nested if statements is hard to test and understand. Writing an if statement without curly braces (for single statements) will increase the chances of bugs in the future. Not closing database connections or file handles may lock up your system and cause other processes to fail. Failing to unsubscribe from (static) events may cause memory leaks. Such errors may easily pass unit tests, but will eventually fail in production. These sort of errors can be very difficult to find as well. For example, a memory leak may cause your application to run slowly or even crash after a day or two. Good luck finding bugs that only happen to some users, sometimes, because they haven't closed the application in two days. Luckily, there are tools that find exactly these kinds of issues. SonarQube is one such tool. It will show you where you can improve your code, how important it is that you fix this code, the time it will probably take to fix it, and a trending graph of your technical debt.

It is important to note here that these issues, unlike unit tests, may or may not be actual bugs. For example, the following code is completely valid, but may introduce bugs that are not easy to spot:

if (valid)
   DoSomething();

Now the specifications change and you, or a coworker, have to change this code so something else is also executed when valid. You change the code as follows:

if (valid)
   DoSomething();
   DoSomethingElseIfValid(); // This is a bug as it's always executed.

Tools such as SonarQube, will recognize this pattern and they will warn you that the code is not best practice including an explanation on what's wrong with it and how to change it. In this case, the original code should be changed, so it's clear what happens when valid:

if (valid)
{
   DoSomething();
}

We will have a look at SonarQube later in this book and see both C# and JavaScript issues that may or may not be bugs.

Automation

Depending on what you're used to, I've got some bad news for you. When doing CI, the command line is your best friend. Personally, I see the need for a command line, but I don't like it one bit. It requires way too much typing and memorization for my taste. Anyway, Linux users rejoice and spoiled Windows users get ready for a trip back to the 80s when user interfaces had yet to be invented. However, we're going to automate a lot, and that will be the computer's job. Computers don't use user interfaces. So, while you hit F5 in Visual Studio and compile your code, your build server needs to know it should run MSBuild with some parameters, such as the location of your solution or the msbuild file.

Luckily, most tools have some form of command-line interface. Whether you are working with .NET, JavaScript, Java, SQL Server, Oracle, or any language or tool, you can always run it using a command line. Throughout this book, we will use various tools and I do not think we will use any of them without using the command line as well. In fact, the command line seems to be back (although, was it ever really gone?). Various tools, such as NodeJS, npm, and MongoDB, are used through the command line. Furthermore, we will see tools, such as MSBuild, MSTest, and NuGet, that all work from the command line (or from a single click in your IDE).

Teamwork

Imagine doing all this locally on your own computer. For simplicity, let's say you've got some code that has to compile and some unit tests that have to run. Easy enough, everybody should be able to do that. Except your manager, who doesn't have the developer software installed at all. Or the intern, who forgot to kick off the unit tests. Or the developer, who works on a different OS making some tests, that aren't important to him, fail (for example, we have an application developed on and for Windows, but a complimentary app for iOS developed on a Mac). Suddenly, getting a working and tested executable becomes a hassle for everyone who isn't working on this project on a daily basis. Besides, the people who can get a working executable may forget to run tests, creating a risk that the executable is compiling, but not actually working. As you can see, a lot can go wrong and there are only two steps. I've intentionally left out all the other tests and quality gates we might have. And that's the biggest benefit to CI. The software is compiled and fully tested automatically, reducing the chance of human errors and making it considerably easier to get a working executable that is more or less guaranteed to work. By testing on a server that closely or completely resembles the production environment, you can further eliminate hard to find bugs.

As you might have guessed, CI is not something you just do. It's a team effort. If you're writing unit tests to make sure everything works as best as it can, but your team members commit large chunks of code, never write tests and ignore the build status, your build becomes untrustworthy and quite useless. In any case, it will not lead to the (increase in) software quality you were hoping for.

Having said all of the above, it's crucial that you, and your team, take your automated build environment very seriously. Keep build times short, so that you get near-instant feedback when a build fails. When someone checks in code that makes the build fail, it should become a top priority to fix the build. Maybe it's that missing semi-colon, maybe a test fails, or maybe more tests have to be added. The bottom line is, when the build fails, it becomes impossible to get an executable with the latest features that's guaranteed to pass your tests and other quality criteria.

When your build passes, it guarantees that the software passes your tests and other quality gates for good software, which should indicate that it's unlikely that the software will break or, worse, produce erroneous results in that part of the system. However, if your tests are of low quality, the software may still break even though your tests pass. Parts of the system that are not tested may still break. Even tested parts can still produce bugs. As such, Continuous Integration is not some magical practice that will guarantee that your code is awesome and free of bugs. However, not practicing it will almost certainly guarantee something somewhere sometime will go wrong.