Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Build Your Own Programming Language
Build Your Own Programming Language

Build Your Own Programming Language: A programmer's guide to designing compilers, interpreters, and DSLs for modern computing problems , Second Edition

Arrow left icon
Profile Icon Clinton L. Jeffery
Arrow right icon
€18.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.5 (33 Ratings)
Paperback Jan 2024 556 pages 2nd Edition
eBook
€20.98 €29.99
Paperback
€37.99
Subscription
Free Trial
Renews at €18.99p/m
Arrow left icon
Profile Icon Clinton L. Jeffery
Arrow right icon
€18.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.5 (33 Ratings)
Paperback Jan 2024 556 pages 2nd Edition
eBook
€20.98 €29.99
Paperback
€37.99
Subscription
Free Trial
Renews at €18.99p/m
eBook
€20.98 €29.99
Paperback
€37.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Build Your Own Programming Language

Why Build Another Programming Language?

This book will show you how to build your own programming language, but first, you should ask yourself, why would I want to do this? For a few of you, the answer will be simple: because it is so much fun. However, for the rest of us, it is a lot of work to build a programming language, and we need to be sure about it before we make that kind of effort. Do you have the patience and persistence that it takes?

This chapter points out a few good reasons to build your own programming language, as well as some circumstances in which you don’t need to build your contemplated language. After all, designing a class library for your application domain is often simpler and just as effective. However, libraries have their limitations, and sometimes, only a new language will do.

After this chapter, the rest of this book will take for granted that, having considered things carefully, you have decided to build a language. But first, we’re going to consider our initial options by covering the following main topics in this chapter:

  • Motivations for writing your own programming language
  • Types of programming language implementations
  • Organizing a bytecode language implementation
  • Languages used in the examples
  • The difference between programming languages and libraries
  • Applicability to other software engineering tasks
  • Establishing the requirements for your language
  • Case study – requirements that inspired the Unicon language

Let’s start by looking at motivations.

Motivations for writing your own programming language

Sure, some programming language inventors are rock stars of computer science, such as Dennis Ritchie or Guido van Rossum! Becoming a rock star in computer science was easier back in the previous century. In 1993, I heard the following report from an attendee of the second ACM History of Programming Languages Conference: “The consensus was that the field of programming languages is dead. All the important languages have been invented already. This was proven wildly wrong a year or two later when Java hit the scene, and perhaps a dozen times since then when important languages such as Go emerged. After a mere six decades, it would be unwise to claim our field is mature and that there’s nothing new to invent that might make you famous.

In any case, celebrity is a bad reason to build a programming language. The chances of acquiring fame or fortune from your programming language invention are slim. Curiosity and a desire to know how things work are valid reasons, so long as you’ve got the time and inclination, but perhaps the best reason to build your own programming language is necessity.

Some folks need to build a new language, or a new implementation of an existing programming language, to target a new processor or compete with a rival company. If that’s not you, then perhaps you’ve looked at the best languages (and compilers or interpreters) available for some domain that you are developing programs for, and they are missing some key features for what you are doing, and those missing features are causing you pain. This is the stuff Master’s theses and PhD dissertations are made of. Every once in a blue moon, someone comes up with a whole new style of computing for which a new programming paradigm requires a new language.

While we are discussing your motivations for building a language, let’s also talk about the different kinds of languages, how they are organized, and the examples this book will use to guide you.

Types of programming language implementations

Whatever your reasons, before you build a programming language, you should pick the best tools and technologies you can find to do the job. In our case, this book will pick them for you. First, there is a question of the implementation language, which is to say, the language that you are building your language in.

Programming language academics like to brag about writing their language in that language itself, but this is usually only a half-truth (or someone was being very impractical and showing off at the same time). There is also the question of just what kind of programming language implementation to build:

  • A pure interpreter that executes the source code itself
  • A native compiler and a runtime system, such as in C
  • A transpiler that translates your language into some other high-level language
  • A bytecode compiler with an accompanying bytecode machine, such as in Java

The first option is fun, but the resulting language is usually too slow to satisfy real-world project requirements. The second option is often optimal, but may be too labor-intensive; a good native compiler may take years of effort.

The third option is by far the easiest and probably the most fun, and I have used it before with good success. Don’t discount a transpiler implementation as a kind of cheating, but do be aware that it has its problems. The first version of C++, AT&T’s cfront tool, was a transpiler, but that gave way to compilers, and not just because cfront was buggy. Strangely, generating high-level code seems to make your language even more dependent on the underlying language than the other options, and languages are moving targets. Good languages have died because their underlying dependencies disappeared or broke irreparably on them. It can be the death of a thousand cuts.

For the most part, this book focuses on the fourth option; over the course of several chapters, we will build a bytecode compiler with an accompanying bytecode machine because that is a sweet spot that gives a lot of flexibility, while still offering decent performance. A chapter on transpilers and preprocessors is provided for those of you who may prefer to implement your language by generating code for another high-level language. A chapter on native code compilation is also included, for those of you who require the fastest possible execution.

The notion of a bytecode machine is very old; it was made famous by UCSD’s Pascal implementation and the classic SmallTalk-80 implementation, among others. It became ubiquitous to the point of entering lay English with the promulgation of Java’s JVM. Bytecode machines are abstract processors interpreted by software; they are often called virtual machines (as in Java Virtual Machine), although I will not use that terminology because it is also used to refer to software tools that implement real hardware instruction sets, such as IBM’s classic platforms, or more modern tools such as Virtual Box.

A bytecode machine is typically quite a bit higher level than a piece of hardware, so a bytecode implementation affords much flexibility. Let’s have a quick look at what it will take to get there…

Organizing a bytecode language implementation

To a large extent, the organization of this book follows the classic organization of a bytecode compiler and its corresponding virtual machine. These components are defined here, followed by a diagram to summarize them:

  • A lexical analyzer reads in source code characters and figures out how they are grouped into a sequence of words or tokens.
  • A syntax analyzer reads in a sequence of tokens and determines whether that sequence is legal, according to the grammar of the language. If the tokens are in a legal order, it produces a syntax tree.
  • A semantic analyzer checks to ensure that all the names being used are legal for the operations in which they are being used. It checks their types to determine exactly what operations are being performed. All this checking makes the syntax tree heavy, laden with extra information about where variables are declared and what their types are.
  • An intermediate code generator figures out memory locations for all the variables and all the places where a program may abruptly change execution flow, such as loops and function calls. It adds them to the syntax tree and then walks this even fatter tree, before building a list of machine-independent intermediate code instructions.
  • A final code generator turns the list of intermediate code instructions into the actual bytecode, in a file format that will be efficient to load and execute.

In addition to the steps of this bytecode virtual machine compiler, a bytecode interpreter is written to load and execute programs. It is a giant loop with a switch statement in it. For very high-level programming languages, the compiler might be no big deal, and all the magic may be in the bytecode interpreter. The whole organization can be summarized by the following diagram:

Figure 1.1 – Phases and dataflow in a simple programming language

Figure 1.1: Phases and dataflow in a simple programming language

It will take a lot of code to illustrate how to build a bytecode machine implementation of a programming language. How that code is presented is important and will tell you what you need to know going in, as well as what you may learn from going through this book.

Languages used in the examples

This book provides code examples in two languages using a parallel translations model. The first language is Java because that language is ubiquitous. Hopefully, you know Java (or C++, or C#) and will be able to read the examples with intermediate proficiency. The second example language is the author’s own language, Unicon. While reading this book, you can judge for yourself which language is better suited to building programming languages. As many examples as possible are provided in both languages, and the examples in the two languages are written as similarly as possible. Sometimes, this will be to the advantage of Java, which is a bit lower level than Unicon. There are sometimes fancier or shorter ways to write things in Unicon, but our Unicon examples will stick as close to Java as possible. The differences between Java and Unicon will be obvious, but they are somewhat lessened in importance by the compiler construction tools we will use.

This book uses modern descendants of the venerable Lex and YACC tools to generate our scanner and parser. Lex and YACC are declarative programming languages that solve some of our hard problems at a higher level than Java or Unicon. It would have been nice if a modern descendant of Lex and YACC (such as ANTLR) supported both Java and Unicon, but such is not the case. One of the very cool parts of this book is this: by choosing tools for Java and Unicon that are very compatible with the original Lex and YACC and extending them a bit, we have managed to use the same lexical and syntax specifications of our compiler in both Java and Unicon!

While Java and Unicon are our implementation languages, we need to talk about one more language: the example language we are building. It is a stand-in for whatever language you decide to build. Somewhat arbitrarily, this book introduces a language called Jzero for this purpose. Niklaus Wirth invented a toy language called PL/0 (programming language zero; the name is a riff on the language name PL/1) that was used in compiler construction courses. Jzero is a tiny subset of Java that serves a similar purpose. I looked pretty hard (that is, I googled Jzero and then Jzero compiler) to see whether someone had already posted a Jzero definition we could use and did not spot one by that name, so we will just make it up as we go along.

The Java examples in this book will be tested using Java 21; maybe other recent versions of Java will work. You can get OpenJDK from http://openjdk.org, or if you are on Linux, your operating system probably has an OpenJDK package that you can install. Additional programming language construction tools (Jflex and byacc/j) that are required for the Java examples will be introduced in subsequent chapters as they are used. The Java implementations we will support might be more constrained by which versions will run these language construction tools than anything else.

The Unicon examples in this book work with Unicon version 13.3, which can be obtained from http://unicon.org. To install Unicon on Windows, you must download a .msi file and run the installer. To install on Linux, you should follow the instructions found on the unicon.org site.

Having gone through the basic organization of a programming language and the implementation that this book will use, perhaps we should take another look at when a programming language is called for, and when building one can be avoided by developing a library instead.

The difference between programming languages and libraries

Unless you are in it for the “fun” or the intellectual experience, building a programming language is a lot of work that might not be necessary. If your motives are strictly utilitarian, you don’t have to make a programming language when a library will do the job. Libraries are by far the most common way to extend an existing programming language to perform a new task. A library is a set of functions or classes that can be used together to write applications for some hardware or software technology. Many languages, including C and Java, are designed almost completely to revolve around a rich set of libraries. The language itself is very simple and general, while much of what a developer must learn to develop applications consists of how to use the various libraries.

The following is what libraries can do:

  • Introduce new data types (classes) and provide public functions (an API) to manipulate them
  • Provide a layer of abstraction on top of a set of hardware or operating system calls

The following is what libraries cannot do:

  • Introduce new control structures and syntax in support of new application domains
  • Embed/support new semantics within the existing language runtime system

Libraries do some things badly, so you might end up preferring to make a new language:

  • Libraries often get larger and more complex than necessary.
  • Libraries can have even steeper learning curves and poorer documentation than languages.
  • Every so often, libraries have conflicts with other libraries.
  • Applications that use libraries can become broken if the library changes incompatibly in a later version.

There is a natural evolutionary path from a library to a language. A reasonable approach to building a new language to support an application domain is to start by making or buying the best library available for that application domain. If the result does not meet your requirements in terms of supporting the domain and simplifying the task of writing programs for that domain, then you have a strong argument for a new language.

This book is about building your own language, not just building your own library. It turns out that learning about tools and techniques to implement programming languages is useful in many other contexts.

Applicability to other software engineering tasks

The tools and technologies you learn about from building your own programming language can be applied to a range of other software engineering tasks. For example, you can sort almost any file or network input processing task into three categories:

  • Reading XML data with an XML library
  • Reading JSON data with a JSON library
  • Reading anything else by writing code to parse it in its native format

The technologies in this book are useful in a wide array of software engineering tasks, which is where the third of these categories is encountered. Frequently, structured data must be read in a custom file format.

For some of you, the experience of building your own programming language might be the single largest program you have written thus far. If you persist and finish it, it will teach you lots of practical software engineering skills, besides whatever you learn about compilers, interpreters, and the such. This will include working with large dynamic data structures, software testing, and debugging complex problems, among other skills.

That’s enough of the inspirational motivation. Let’s talk about what you should do first: figure out your requirements.

Establishing the requirements for your language

After you are sure you need a new programming language for what you are doing, take a few minutes to establish the requirements. This is open-ended. It is you defining what success for your project will look like. Wise language inventors do not create a whole new syntax from scratch. Instead, they define it in terms of a set of modifications to make to a popular existing language.

Many great programming languages (Lisp, Forth, Smalltalk, and many others) had their success significantly limited by the degree to which their syntax was unnecessarily different from mainstream languages. Still, your language requirements include what it will look like, and that includes syntax.

More importantly, you must define a set of control structures or semantics where your programming language needs to go beyond existing language(s). This will sometimes include special support for an application domain that is not well served by existing languages and their libraries. Such domain-specific languages (DSLs) are common enough that whole books are focused on that topic. Our goal for this book will be to focus on the nuts and bolts of building the compiler and runtime system for such a language, independent of whatever domain you may be working in.

In a normal software engineering process, requirements analysis would start with brainstorming lists of functional and non-functional requirements. Functional requirements for a programming language involve the specifics of how the end user developer will interact with it. You might not anticipate all the command-line options for your language up front, but you probably know whether interactivity is required, or whether a separate compile step is OK. The discussion of interpreters and compilers in the previous section, and this book’s presentation of a compiler, might seem to make that choice for you, but Python is an example of a language that provides a fully interactive interface, even though the source code you type into Python gets compiled into bytecode and executed by a bytecode machine, rather than being interpreted directly.

Non-functional requirements are properties that your programming language must achieve that are not directly tied to the end user developer’s interactions. They include things such as what operating system(s) your language must run on, how fast execution must be, or how little space the programs written in your language must run within.

The non-functional requirement regarding how fast execution must be usually determines the answer as to whether you can target a software (bytecode) machine or need to target native code. Native code is not just faster; it is also considerably more difficult to generate, and it might make your language considerably less flexible in terms of runtime system features. You might choose to target bytecode first, and then work on a native code generator afterward.

The first language I learned to program on was a BASIC interpreter in which the programs had to run within 4 KB of RAM. BASIC at the time had a low memory footprint requirement. But even in modern times, it is not uncommon to find yourself on a platform where Java won’t run by default! For example, on virtual machines with configured memory limits for user processes, you may have to learn some awkward command-line options to compile or run even simple Java programs.

In addition to identifying functional and non-functional requirements, many requirements analysis approaches also define a set of use cases and ask the developer to write descriptions for them. Inventing a programming language is different from your average software engineering project, but before you are finished, you may want to go there and perform such a use case analysis. A use case is a task that someone performs using a software application. When the software application is a programming language, if you are not careful, the use cases may be too general to be useful, such as write my application and run my program. While those two might not be very useful, you might want to think about whether your programming language implementation must support program development, debugging, separate compilation and linking, integration with external languages and libraries, and so forth. Most of those topics are beyond the scope of this book, but we will consider some of them.

Since this book presents the implementation of a language called Jzero, here are some requirements for Jzero. Some of these requirements may appear arbitrary. You could certainly add your own requirements and produce your own Java dialect, but this list describes what we are aiming for in this book. If it is not clear to you where one of the following requirements came from, it either came from our source inspiration language (plzero) or previous experience teaching compiler construction:

  • Jzero should be a strict subset of Java. All legal Jzero programs should be legal Java programs. This requirement allows us to check the behavior of our test programs when we are debugging our language implementation.
  • Jzero should provide enough features to allow interesting computations. This includes if statements, while loops, and multiple functions, along with parameters.
  • Jzero should support a few data types, including Booleans, integers, arrays, and the String type. However, it only needs to support a subset of their functionality, (as you’ll see later). These types are enough to allow input and output of interesting values into a computation.
  • Jzero should emit decent error messages, showing the filename and line number, including messages for attempts to use Java features not in Jzero. We will need reasonable error messages to debug the implementation.
  • Jzero should run fast enough to be practical. This requirement is vague, but it implies that we won’t be doing a pure interpreter. Pure interpreters that execute source code directly without any internal code generation step are a very retro thing, evocative of the 1960s and 1970s. They tend to execute unacceptably slowly by modern standards. On the other hand, you might very well decide that your language should provide the highly interactive look and feel of a pure interpreter, like Python does. Anyhow, that is not in Jzero’s requirements.
  • Jzero should be as simple as possible so that I can explain it. Sadly, this rules out writing a full description of a native code generator or even an implementation that targets JVM bytecode; we will provide our own simple bytecode machine.

Perhaps more requirements will emerge as we go along, but this is a start. Since we are constrained for time and space, perhaps this requirements list is more important for what it does not say, rather than for what it does say. By way of comparison, here are some of the requirements that led to the creation of the Unicon programming language.

Case study – requirements that inspired the Unicon language

This book will use the Unicon programming language, located at http://unicon.org, for a running case study. We can start with reasonable questions such as, why build Unicon, and what are its requirements? To answer the first question, we will work backward from the second one.

Unicon exists because of an earlier programming language called Icon, from the University of Arizona (http://www.cs.arizona.edu/icon/). Icon has particularly good string and list processing facilities and is used to write many scripts and utilities, as well as both programming language and natural language processing projects. Icon’s fantastic built-in data types, including structure types such as lists and (hash) tables, have influenced several languages, including Python and Unicon. Icon’s signature research contribution is its integration of goal-directed evaluation, including backtracking and automatic resumption of generators, into a familiar mainstream syntax. This leads us to Unicon’s first requirement.

Unicon requirement #1 – preserve what people love about Icon

One of the things that people love about Icon is its expression semantics, including its generators and goal-directed evaluation. A generator is an expression that is capable of computing more than one result; several popular languages feature generators. Goal-directed evaluation is a semantic to execute code in which expressions either succeed or fail, and when they fail, generators within the expression can be resumed to try alternative results that might make the whole expression succeed. This is a big topic beyond the scope of this section, but if you want to learn more, you can check out The Icon Programming Language, Third Edition, by Ralph and Madge Griswold, at www.cs.arizona.edu/icon.

Icon also provides a rich set of built-in functions and data types so that many or most programs can be understood directly from the source code. Unicon’s preservation goal is 100% compatibility with Icon. In the end, we achieved more like 99% compatibility.

It is a bit of a leap from preserving the best bits to the immortality goal of ensuring old source code will run forever, but for Unicon, we include that as part of requirement #1. We have placed a much firmer requirement on backward compatibility than most modern languages. While C is very backward compatible, C++, Java, Python, and Perl are examples of languages that have wandered away, in some cases far away, from being compatible with the programs written in them back in their glory days. In the case of Unicon, perhaps 99% of Icon programs run unmodified as Unicon programs. Unicon requirement #2 was to support programming in large-scale projects.

Unicon requirement #2 – support large-scale programs working on big data

Icon was designed for maximum programmer productivity on small-sized projects; a typical Icon program is less than 1,000 lines of code, but Icon is very high level, and you can do a lot of computing in a few hundred lines of code! Still, computers keep getting more capable, and modern programmers are often required to write much larger programs than Icon was designed to handle.

For this reason of scalability, Unicon adds classes and packages to Icon, much like C++ adds them to C. Unicon also improved the bytecode object file format and made numerous scalability improvements to the compiler and runtime system. It also refines Icon’s existing implementation to be more scalable in many specific items, such as adopting a much more sophisticated hash function. Unicon requirement #3 is to support ubiquitous input/output capabilities at the same high level as the built-in types.

Unicon requirement #3 – high-level input/output for modern applications

Icon was designed for classic UNIX pipe-and-filter text processing of local files. Over time, more and more people wanted to use it to write programs that required more sophisticated forms of input/output, such as networking or graphics.

Arguably, despite billionfold improvements in CPU speed and memory size, the biggest difference between programming in 1970 and programming in the 2020s is that we expect modern applications to use a myriad of sophisticated forms of I/O: graphics, networking, databases, and so forth. Libraries can provide access to such I/O, but language-level support can make it easier and more intuitive.

Support for I/O is a moving target. At first, with Unicon, I/O consisted of networking facilities and GDBM and ODBC database facilities to accompany Icon’s 2D graphics. Then, it grew to include various popular internet protocols and 3D graphics. The definition of what I/O capabilities are ubiquitous continues to evolve, varying by platform, but touch input and gestures or shader programming capabilities are examples of things that have become ubiquitous today, and maybe they should be added to the Unicon language as part of this requirement. The challenge posed by this requirement is increased by Unicon requirement #4.

Unicon requirement #4 – provide universally implementable system interfaces

Icon is very portable. I have run it on everything, from Amigas to Crays to IBM mainframes with EBCDIC character sets. Although the platforms have changed almost unbelievably over the years, Unicon still retains Icon’s goal of maximum source code portability: code that gets written in Unicon should continue to run unmodified on all computing platforms that matter.

For a very long time, portability meant running on PCs, Macs, and UNIX workstations. But again, the set of computing platforms that matter is a moving target. These days, to meet this requirement, Unicon should be ported to support Android and iOS, if you count them as computing platforms. Whether they count might depend on whether they are open enough and used for general computing tasks, but they are certainly capable of being used as such.

All those juicy I/O facilities that were implemented for requirement #3 must be designed in such a way that they can be multi-platform portable across all major platforms.

Having given you some of Unicon’s primary requirements, here is an answer to the question, why build Unicon at all? One answer is that after studying many languages, I concluded that Icon’s generators and goal-directed evaluation (requirement #1) were features that I wanted when writing programs from now on. However, after allowing me to add 2D graphics to their language, Icon’s inventors were no longer willing to consider further additions to meet requirements #2 and #3. Another answer is that there was a public demand for new capabilities, including volunteer partners and some financial support. Thus, Unicon was born.

Summary

In this chapter, you learned the difference between inventing a programming language and inventing a library API to support whatever kinds of computing you want to do. Several different forms of programming language implementations were considered. This first chapter allowed you to think about functional and non-functional requirements for your own language.

These requirements might be different for your language than the example requirements discussed for the Java subset Jzero and the Unicon programming language, which were both introduced.

Requirements are important because they allow you to set goals and define what success will look like. In the case of a programming language implementation, the requirements include what things will look and feel like for the programmers that use your language, as well as what hardware and software platforms it must run on. The look and feel of a programming language include answering both external questions regarding how the language implementation and the programs written in the language are invoked, as well as internal issues such as verbosity: how much the programmer must write to accomplish a given compute task.

You may be keen to get straight to the coding part. Although the classic build-and-fix mentality of novice programmers might work on scripts and short programs, for a piece of software as large as a programming language, we need a bit more planning first. After this chapter’s coverage of the requirements, Chapter 2, Programming Language Design, will prepare you to construct a detailed plan for the implementation, which will occupy our attention for the remainder of this book!

Questions

  1. What are the pros and cons of writing a language transpiler that generates C code, instead of a traditional compiler that generates assembler or native machine code?
  2. What are the major components or phases in a traditional compiler?
  3. From your experience, what are some pain points where programming is more difficult than it should be? What new programming language feature(s) address these pain points?
  4. Write a set of functional requirements for a new programming language.

Join our community on Discord

Join our community’s Discord space for discussions with the authors and other readers:

https://discord.com/invite/zGVbWaxqbw

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Takes a hands-on approach; learn by building the Jzero language, a subset of Java, with example code shown in both the Java and Unicon languages
  • Learn how to create parsers, code generators, scanners, and interpreters
  • Target bytecode, native code, and preprocess or transpile code into a high-level language

Description

There are many reasons to build a programming language: out of necessity, as a learning exercise, or just for fun. Whatever your reasons, this book gives you the tools to succeed. You’ll build the frontend of a compiler for your language and generate a lexical analyzer and parser using Lex and YACC tools. Then you’ll explore a series of syntax tree traversals before looking at code generation for a bytecode virtual machine or native code. In this edition, a new chapter has been added to assist you in comprehending the nuances and distinctions between preprocessors and transpilers. Code examples have been modernized, expanded, and rigorously tested, and all content has undergone thorough refreshing. You’ll learn to implement code generation techniques using practical examples, including the Unicon Preprocessor and transpiling Jzero code to Unicon. You'll move to domain-specific language features and learn to create them as built-in operators and functions. You’ll also cover garbage collection. Dr. Jeffery’s experiences building the Unicon language are used to add context to the concepts, and relevant examples are provided in both Unicon and Java so that you can follow along in your language of choice. By the end of this book, you'll be able to build and deploy your own domain-specific language.

Who is this book for?

This book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler design or construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate or better proficiency in Java or C++ programming languages (or another high-level programming language) is assumed.

What you will learn

  • Analyze requirements for your language and design syntax and semantics.
  • Write grammar rules for common expressions and control structures.
  • Build a scanner to read source code and generate a parser to check syntax.
  • Implement syntax-coloring for your code in IDEs like VS Code.
  • Write tree traversals and insert information into the syntax tree.
  • Implement a bytecode interpreter and run bytecode from your compiler.
  • Write native code and run it after assembling and linking using system tools.
  • Preprocess and transpile code into another high-level language

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jan 31, 2024
Length: 556 pages
Edition : 2nd
Language : English
ISBN-13 : 9781804618028
Vendor :
Oracle
Category :
Languages :
Tools :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Jan 31, 2024
Length: 556 pages
Edition : 2nd
Language : English
ISBN-13 : 9781804618028
Vendor :
Oracle
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total €57.95 €83.97 €26.02 saved
Clang Compiler Frontend
€29.99
Build Your Own Programming Language
€37.99
Learn LLVM 17
€29.99 €37.99
Total €57.95€83.97 €26.02 saved Stars icon
Banner background image

Table of Contents

25 Chapters
Section I: Programming Language Frontends Chevron down icon Chevron up icon
Why Build Another Programming Language? Chevron down icon Chevron up icon
Programming Language Design Chevron down icon Chevron up icon
Scanning Source Code Chevron down icon Chevron up icon
Parsing Chevron down icon Chevron up icon
Syntax Trees Chevron down icon Chevron up icon
Section II: Syntax Tree Traversals Chevron down icon Chevron up icon
Symbol Tables Chevron down icon Chevron up icon
Checking Base Types Chevron down icon Chevron up icon
Checking Types on Arrays, Method Calls, and Structure Accesses Chevron down icon Chevron up icon
Intermediate Code Generation Chevron down icon Chevron up icon
Syntax Coloring in an IDE Chevron down icon Chevron up icon
Section III: Code Generation and Runtime Systems Chevron down icon Chevron up icon
Preprocessors and Transpilers Chevron down icon Chevron up icon
Bytecode Interpreters Chevron down icon Chevron up icon
Generating Bytecode Chevron down icon Chevron up icon
Native Code Generation Chevron down icon Chevron up icon
Implementing Operators and Built-In Functions Chevron down icon Chevron up icon
Domain Control Structures Chevron down icon Chevron up icon
Garbage Collection Chevron down icon Chevron up icon
Final Thoughts Chevron down icon Chevron up icon
Section IV: Appendix Chevron down icon Chevron up icon
Answers Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.5
(33 Ratings)
5 star 66.7%
4 star 24.2%
3 star 0%
2 star 6.1%
1 star 3%
Filter icon Filter
Top Reviews

Filter reviews by




James W Feb 25, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Oh boy this took me right back to my coding days well back to my old Amiga and assembler and debugging my version of byte code to build onwards with C and parsing and code blocks to tree traversal not for the faint hearted but great to see the x64 code generation to quote‘Main memory access is slow. Performance is heavily impacted by how registers are used. Optimalregister allocation is nondeterministic polynomial-time complete (NP-complete) – very difficult.’You need to have a mind shift but once you get it then you understand what’s going on under the hood.
Amazon Verified review Amazon
SRP Apr 23, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
My interest in this book was more academic than "for fun." In the end, I had fun working through the examples in the book. The balance between theory and practice was just right, which was the main reason the book was more appealing to me.While not a deal breaker, I also think (like some other reviewers) that using Java-only examples would have sufficed. To be fair, the author does mention that by giving examples in Java and Unicon, it will be up to the readers to judge for themselves which is better suited to them.I recommend the book to anyone who has an interest in programming.
Amazon Verified review Amazon
Chaitanya Yadav Feb 19, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book equips you with the knowledge and tools needed to turn your language creation dream into reality. With insights from Dr. Jeffery's experience and a choice of Unicon or Java examples, it's a must-read for aspiring language creators.𝗪𝗵𝗮𝘁 𝘆𝗼𝘂 𝘄𝗶𝗹𝗹 𝗹𝗲𝗮𝗿𝗻 𝗳𝗿𝗼𝗺 𝘁𝗵𝗶𝘀:𝗔𝗻𝗮𝗹𝘆𝘇𝗶𝗻𝗴 𝗿𝗲𝗾𝘂𝗶𝗿𝗲𝗺𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝗱𝗲𝘀𝗶𝗴𝗻𝗶𝗻𝗴 𝘀𝘆𝗻𝘁𝗮𝘅 & 𝘀𝗲𝗺𝗮𝗻𝘁𝗶𝗰𝘀: Define your language's purpose and shape its structure.𝗖𝗿𝗮𝗳𝘁𝗶𝗻𝗴 𝗴𝗿𝗮𝗺𝗺𝗮𝗿 𝗿𝘂𝗹𝗲𝘀: Master the expressions and control flow that bring your language to life.𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗮 𝘀𝗰𝗮𝗻𝗻𝗲𝗿 & 𝗽𝗮𝗿𝘀𝗲𝗿: Decode and understand the code written in your language.𝗔𝗱𝗱𝗶𝗻𝗴 𝘀𝘆𝗻𝘁𝗮𝘅 𝗰𝗼𝗹𝗼𝗿𝗶𝗻𝗴: Enhance readability and user experience in popular IDEs like VS Code.𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗶𝗻𝗴 𝘁𝗿𝗲𝗲 𝘁𝗿𝗮𝘃𝗲𝗿𝘀𝗮𝗹𝘀: Interact with your language's structure for powerful code manipulation.𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗶𝗻𝗴 𝗯𝘆𝘁𝗲𝗰𝗼𝗱𝗲 𝗶𝗻𝘁𝗲𝗿𝗽𝗿𝗲𝘁𝗮𝘁𝗶𝗼𝗻 𝗮𝗻𝗱 𝗻𝗮𝘁𝗶𝘃𝗲 𝗰𝗼𝗱𝗲 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻: Run your language directly on the machine or translate it for wider compatibility.𝗘𝘅𝗽𝗹𝗼𝗿𝗶𝗻𝗴 𝗽𝗿𝗲𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗼𝗿𝘀 & 𝘁𝗿𝗮𝗻𝘀𝗽𝗶𝗹𝗲𝗿𝘀: Transform code between languages for flexibility and efficiency.𝗖𝗿𝗲𝗮𝘁𝗶𝗻𝗴 𝗱𝗼𝗺𝗮𝗶𝗻-𝘀𝗽𝗲𝗰𝗶𝗳𝗶𝗰 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲𝘀: Tailor your language to specific problem domains for optimal solutions.𝗖𝗵𝗼𝗼𝘀𝗶𝗻𝗴 𝗯𝗲𝘁𝘄𝗲𝗲𝗻 𝗨𝗻𝗶𝗰𝗼𝗻 & 𝗝𝗮𝘃𝗮 𝗰𝗼𝗱𝗲 𝗲𝘅𝗮𝗺𝗽𝗹𝗲𝘀: Follow along in your preferred language.
Amazon Verified review Amazon
Rohit Mehta Apr 11, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book has very detailed explanation of how programming languages are built. You'll have to dig deeper a little bit to understand each and every concept. This book along with the detailed explanation also takes a step by step approach with code segments for each part. Also, after each chapter there are questions which will make you think and help you understand the topic in more detail. Highly recommended for anyone with a curious mind for programming.
Amazon Verified review Amazon
John Max Apr 14, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book is an exceptional resource that delves deep into the intricacies of language design and implementation. I must say this book is a goldmine of knowledge for individuals fascinated by the process of creating programming languages.One of the book's standout features I liked is its hands-on approach. By guiding readers through the development of the Jzero language, the authors provide practical examples with code snippets available in both Java and Unicon languages. This dual-language approach not only enhances comprehension but also accommodates readers' language preferences.The book covers essential aspects such as creating parsers, scanners, interpreters, and code generators in great detail, making it an indispensable resource for aspiring language developers. I have read the previous book and the addition of a new chapter in this edition focusing on preprocessors and transpilers, addresses contemporary challenges in language development, enriching the book's content.I particularly appreciate the modernized and expanded code examples. Dr. Jeffery's expertise in crafting the Unicon language is evident throughout the book, providing valuable context and relevant examples. The book's structured approach, from analyzing language requirements to implementing domain-specific features and addressing garbage collection, is well-paced and easy to follow. Practical tips, such as implementing syntax-coloring in IDEs like VS Code, further enhance the learning experience.In summary, "Build your own Programming Language - Second Edition" is a must-have for any computer science enthusiast. The comprehensive coverage, practical examples, and expert insights keeps you engaged through out . Highly recommended!
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.