You're reading from Introduction to JVM Languages Get familiar with the world of Java, Scala, Clojure, Kotlin, and Groovy

Product type Paperback

Published in Jun 2017

Publisher Packt

ISBN-13 9781787127944

Length 450 pages

Edition 1st Edition

Languages

Clojure

Tools

Akka

Concepts

Programming Language

Author (1):

Vincent van der Leun

View More author details

JVM concepts

Every aspiring JVM developer should be familiar with its most important concepts:

JVM is a virtual machine
Most implementations feature a just-in-time (JIT) compiler
It offers a few built-in primitive datatypes
Everything else is an object
Objects are accessed via reference types
The garbage collector (GC) process removes obsolete objects from memory
Build tools are used a lot in the JVM world

Virtual machine

That the Java Virtual Machine is a virtual machine is a rather obvious observation, but it should be kept in mind. One of the consequences is that you are, in theory, writing applications for a type of machine that differs from the machine you are developing or running your applications on.

It generally does not matter whether the code runs on a 32-bit or 64-bit version of the Java Runtime Environment (JRE). The latter will probably make more memory available to the application than the 32-bit version, but the running program will not care about this difference as long as it doesn't make native operating system calls or require gigabytes of memory.

Unlike a language, such as C, where datatype sizes are dependent on the native system, Java does not have this issue (or feature, depending on your point of view). An int integer on JVM is always signed and is of 32-bit size, no matter on which computer platform or system architecture it is running.

Finally, it should be noted that each application that runs on JVM loads its own instance of JVM on system memory. This means when you run multiple Java applications at the same time, they will all have their own copy of JVM at their disposal; this also means different applications can use different versions of JVM if required for whatever reason. For security reasons, it is not suggested that you have different versions of the JDK or JRE on one system; it's usually better to have only the latest supported versions installed.

The JIT compiler

Although not dictated anywhere, all popular JVM implementations are not just simple interpreters; they feature complex JIT compilers along with their interpreters.

When you launch a Java application, JVM is launched and initialized first. Once this is done, it immediately starts interpreting and running the Java bytecode. If the interpreter believes it makes sense, it will compile sections of the programs and load libraries to native executable code in memory and start executing that version of the code instead of the interpreted Java bytecode version. This often results in code that could be executed much faster.

Whether the code is compiled or interpreted depends on many things. If a routine is called often, it becomes a probable candidate for the JIT compiler to compile it to the native code.

The advantage of the JIT approach is that the distributed files can be cross-platform and the user does not have to wait for native compiling of the whole application. Applications start executing immediately after JVM is initialized, and the optimization is done under the hood.

Primitive datatypes

JVM has a few so-called built-in primitive datatypes. This is the main reason why Java is not considered a pure OOP language. Variables of these types are not objects and always have a value:

Java name	Description and size	Values (inclusive)
`byte`	Signed byte (8 bits)	-128 to 127
`short`	Signed short integer (16 bits)	-32768 to 32767
`int`	Signed integer (32 bits)	-2³¹ to 2³¹-1
`long`	Signed long integer (64 bits)	-2⁶³ to 2⁶³-1
`float`	Single-precision floating point (32-bit)	Non-precise floating point values
`double`	Double-precision floating point (64-bit)	Non-precise floating point values
`char`	A single Unicode UTF-16 character (16-bit)	Unicode character 0 to 655535
`boolean`	Boolean	True/False

Note that not all JVM languages support the creation of variables of primitive types and follow this modern assumption: everything takes the object approach. We will see that this is usually not a problem as the Java Class Library has wrapper objects that wrap primitive types, and most languages, including Java, automatically use these wrappers when required. This process is called auto-boxing.

Classes

Functions and variables are always declared inside a class. Even the application entry function that is called upon a program launch, called the main() function, is a function that is located inside a class.

JVM only supports the single-inheritance model. Classes always inherit from one class at the maximum. This is not a big loss. As we will see in the next chapter, a structure called an interface comes to the rescue. An interface is basically a list of function prototypes (only the definition of functions, without code) and constants. Classes that implement an interface are required by the compiler to have implementations for those functions. Classes can implement as many interfaces as they want, but they must provide implementations for each method of all the implemented interfaces.

Some languages covered in this book hide these facts completely from the developer. For example, unlike Java, some languages allow functions and variables to be written outside class declarations or even executable code outside function definitions. Other languages support inheritance of multiple classes. Internally, these languages do clever tricks to work around JVM limitations and design decisions.

JVM classes are usually grouped in packages. In the next chapter, we will see how classes are organized.

Reference types

Like most modern programming languages, JVM does not work with direct memory pointers to objects; it uses reference types. A reference type variable either points to a specific instance of a class or it points to nothing.

If a reference type points to an object, it can be used to call the object's methods or access public attributes.

If a reference type points to nothing, it is called a null reference. When calling methods or reading attributes using a null reference, an error will be generated at runtime. We will see that some languages covered in this book came up with solutions to this common problem.

References and null references

Let's take a look at the following code:

    Product p = new Product();
    p.setName("Box of biscuits");

Assume that Product is a class here that is available to the program. We create a Product instance and the p variable points to it. We then call the setName method on this object instance.

JVM does not give direct access to the memory location where the Product object is stored. It just provides a reference to the created object. When using the variable p, JVM figures out which memory location it has to reach for the object that the variable points to.

We add the following lines to the previous snippet:

    p = null;
    p.setName("This line will produce an error at run-time");

A reference can be cleared explicitly by assigning null to it. Note that this is not necessary for variables declared inside a method, as they will be cleared automatically upon exiting the method. However, it is perfectly acceptable to still do it anyway. Now the variable p is a null reference. In the next paragraph, we will see what will happen to object instances that are no longer referenced by any reference type variable.

The preceding code will compile fine. When running the program, the last line will cause a NullPointerException error, though. If no error handling capability was implemented in the application, it will crash. Many modern IDEs try to detect these situations and warn the developer about them.

Garbage collector

JVM does not require the programmer to manually allocate and release blocks of memory when creating or disposing of objects. The programmer can generally concentrate on just creating objects when he or she needs them.

A process known as the GC halts the application at certain intervals and scans the memory for objects that are no longer in scope (not reachable by any other object loaded at that point). It will remove those objects that can safely be deleted from the memory and reclaim the freed space.

This process used to cause very serious performance issues in the past, but the algorithm has improved much over the years. Also, if an application needs it, system administrators can configure many parameters of the GC to better control it.

The developer should always keep the high-level concept of the GC algorithm in mind. If you keep creating tons of objects and always keep them in the scope (meaning making it in such a way that all those objects can be reached, for example, by storing them in a list that the application can access), then out of the memory, errors are very likely to occur sooner or later.

Example

Let's assume you have developed an e-commerce application for an online store. Also, let's assume that each logged-in user has their own ShoppingBasket instance that holds the products that they add to their basket.

Say, a user has logged in today and is planning to buy a soap bar and a delicious pack of cookies. For this user, the application will create two Product instances, one for each chosen product, and add them to the products list of ShoppingBasket:

Just before visiting the checkout page, the user sees that Amazon offers the same cookies at a much better price and decides to remove the cookies from the basket. Technically, the application would remove the Product instance from the list of products. But from there on, the product instance representing Chocolate cookies is an orphan object. As there is no reference to it, it cannot be reached by the application:

After a while, JVM's GC kicks in and sees the Chocolate cookies object instance. It determines that the object cannot be reached in any way by the application anymore and therefore decides to remove it. The memory the object was using up will now be released:

There are several tricks to tame GC. One well-known trick when an application needs to work with lots of similar objects is to put these objects in a pool (list of objects). When an application needs an object, it simply gets one from the pool and modifies the object according to its needs. When it has finished and doesn't need the object anymore, it will put it back in the pool. Since these objects are always in the scope (when not used, in the pool, which the application can access), GC will not try to dispose of these objects.

Backward compatibility

The maintainers of JVM and Java Class Library understand the needs of business developers. The code that is written today should ideally run tomorrow. JVM offers reasonable backward compatibility. Developers familiar with Python 2 and 3 will know that this is not a given in the industry.

Newer JVM versions can run applications that were compiled for older JVM versions, as long as the application's code does not use APIs or technologies that were removed from the JVM version that is running the application. Here's an example: libraries compiled for Java 6 can still be loaded and used in projects that run on a Java 8 JVM instance. But this is not the case the other way around; applications running on a Java 6 JVM instance cannot load classes compiled for later versions.

Of course, like every other platform or language, the JDK and Java Class Library maintainers have to deprecate classes and whole technologies from time to time. While there are issues, backward compatibility on JVM is generally better than many other platforms and languages. Also, APIs are generally only removed if proper and well-documented alternatives exist.

Build tools

Back when projects were simpler, simple batch or operating system shell script files used to automate the compiling and packaging process. As projects became more complex, it became harder to define these scripts. For different operating systems, completely different scripts had to be written.

Soon, the first set of dedicated Java build tools appeared. These worked with XML build files. More or less, cross-platform compatible scripts could be written this way. At first, long and cumbersome scripts had to be written; later, tools worked with the convention over configuration paradigm. When using the conventions suggested by the tool, much less code has to be written; however, if your situation is different from the default behavior, it can take a lot of effort to let the tools do what you want or need. Newer tools ditch XML files and provide script languages to automate the building.

Some of the features that many of those tools offer are as follows:

Built-in dependency managers that can download add-on libraries from well-known repositories from the Internet
Automatically run unit tests and conditionally stop packaging if a test fails

The JDK does not offer a build tool itself, but it will be hard to find projects that do not at least use one of the following open source build automation tools:

Apache Ant (has no built-in dependency manager and works with XML-based build scripts)
Apache Maven (introduced convention over configuration with XML files and works with plugins)
Gradle (build scripts written in Groovy or Kotlin)

JVM programmers that use a popular IDE do not have to worry too much about build automation tools. This is because all IDEs can generate build scripts themselves. If you want more control, you can start writing your own scripts manually and let the IDE use that script to compile, test, and run your project.