Meet Amit, a deployment engineer at yet another enterprise technology firm. His job is to make sure that, for every product release, the organization's code base is compiled and deployed properly in the production environment. During each release, he pulls in the application code and all the necessary jar files and places them in the classpath. He then starts the application, which causes the Java Virtual Machine (JVM) to load all the classes and begin execution.
One night, there was a major product feature release. There were a lot of changes to the code that were all supposed to be deployed and launched together. Amit made sure that all the new code was compiled properly and he had all the necessary jars in the classpath. He then had to start the application. Before he clicked on the button to launch the build, Amit wondered if there was some way he could make sure everything was good and that the application would work without any runtime class errors.
One thing that could potentially go wrong was that he had missed adding a certain class or jar to the classpath. Was there a way he could statically verify whether all the classes were available without actually running the application?
Each jar bundled a set of types in a set of packages. Each type therein could potentially import other types, either from the same jar or from other jars. To make sure he had all the classes in the classpath, he would have to go to each class and verify that all its imports were in the classpath. Considering that the number of classes in his application ran into the thousands, it was a Herculean task.
The following diagram is a simplified version of what a sample deployed Java application looks like:
There are four jar files in the diagram, each containing packages and classes. Jar 1 is deployed in Classpath A, Jar 2 and Jar 3 in Classpath B, and Jar 4 in Classpath C. Let's assume each jar has two classes, as indicated by the smaller white boxes. The three paths are configured as classpaths for the Java runtime, so the runtime knows to look at all three paths to scan and pick up classes.
After scanning all the classpaths, this is what the structure looks like to the Java runtime:
Notice that the runtime doesn't care which directory or classpath the package/type is in. It also doesn't care which jar the package/type is bundled in. As far as the Java runtime is concerned, it's just a flattened list of types in packages!
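To make the flattened view concrete, here is a minimal sketch that reads a list of jar files and prints one combined list of fully qualified type names. It is only an illustration and not how the JVM itself is implemented. The class name FlatClasspathView and its helper methods are hypothetical, and the sketch assumes the jar paths are passed as command-line arguments in classpath order. Directories of loose class files are ignored for brevity.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only.
public class FlatClasspathView {

    // Converts a jar entry name such as "com/example/Foo.class" into "com.example.Foo".
    static String toClassName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Maps each fully qualified type name to the jars that contain it,
    // preserving the order in which the jars were given.
    static Map<String, List<Path>> index(List<Path> jars) throws IOException {
        Map<String, List<Path>> typesToJars = new LinkedHashMap<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")) {
                        typesToJars
                            .computeIfAbsent(toClassName(name), key -> new ArrayList<>())
                            .add(jar);
                    }
                }
            }
        }
        return typesToJars;
    }

    public static void main(String[] args) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (String arg : args) {
            jars.add(Paths.get(arg));
        }
        // Each line shows a type name followed by every jar that bundles it.
        index(jars).forEach((type, locations) ->
            System.out.println(type + " -> " + locations));
    }
}
```

In the output, the jar and directory boundaries no longer matter: every type is identified only by its package and name. A type that is listed with more than one jar is bundled more than once on the classpath.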
In Java, a classpath is just a set of paths. Any of those locations could hold the jars and classes that the application needs to work. You can immediately see how easy it is for things to break! There's always a possibility that some of the classes that the application uses are not available in the classpath, perhaps because of a missing jar or library. If the runtime doesn't have a specific class it needs, the application could start running fine but throw a NoClassDefFoundError much later, and only when execution hits a point where the missing class is actually needed.
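The late failure can be imitated with a small sketch. The class name LateFailureDemo and the type name com.example.reports.QuarterlyReport are hypothetical, and the sketch assumes that no such type exists on the classpath. Note that it uses a reflective lookup, which reports the problem as a ClassNotFoundException. A NoClassDefFoundError is what the JVM raises when code compiled against a class cannot find that class at run time. The timing is the point in both cases: nothing goes wrong until the relevant code path actually runs.

```java
// Hypothetical demo class, for illustration only.
public class LateFailureDemo {

    // Hypothetical type name, assumed to be absent from the classpath.
    static final String MISSING_TYPE = "com.example.reports.QuarterlyReport";

    // Looks a type up by name; the lookup happens only when this method is called.
    static String tryLoad(String typeName) {
        try {
            Class.forName(typeName);
            return "loaded";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Startup succeeds even though a required type is absent.
        System.out.println("Application started without errors");

        // The gap surfaces only if execution reaches this branch.
        if (args.length > 0 && args[0].equals("report")) {
            System.out.println(MISSING_TYPE + ": " + tryLoad(MISSING_TYPE));
        }
    }
}
```

Run without arguments, the program never notices the missing type. Run with the argument report, it reaches the lookup and only then discovers the gap.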
This is a huge and very real problem in large Java applications today. There is a whole ecosystem of solutions that have sprung up to address this. For example, tools and build utilities, such as Maven or Gradle, standardize the process of specifying and acquiring external dependencies. Process-based solutions such as continuous integration aim to solve the unpredictable nature of builds across various development environments. However, all that such tools can do is make the process predictable. They cannot verify the validity or accuracy of the result that they help assemble. Once the dependencies are fetched, there's nothing that those tools can do to detect missing or duplicate types in the classpath.
Back to Amit's story. Having no way to verify up front whether all the classes are available, Amit hopes for the best and deploys the application. The application starts up fine and runs for a couple of hours without any errors. However, there's still no saying whether he's got it right. Maybe there's a class in there that hasn't been executed yet, and when it is, the JVM might find that it cannot locate one of its imports. Or maybe there are duplicate versions of the same class in the classpath, and the JVM simply picks up the first copy it finds. Isn't there a better way to ensure in advance that any given Java application will work reliably?