Create Your Programming Language: Step-by-Step Instructions

It's quite natural to want to create your own programming language. Unfortunately, most explanations we encounter are either too theoretical and academic or buried in implementation detail, so even after reading them we still don't know how things work.

Overview

Most requests to learn how to create a programming language (PL) turn out to be requests to learn how to build a compiler. People want to understand the inner mechanics that make a new language usable.

While a compiler is indeed required, creating a new PL involves other phases as well:

1. Designing the language: The language designer has to make fundamental decisions about which paradigms the language will support and what its syntax will look like.

2. Writing a compiler: A compiler has to be written to turn programs in the language into something a computer can run.

3. Implementing a standard library: The standard library has to be implemented.

4. Providing auxiliary tools: Editors and build systems have to be provided so that developers can work productively.

Design Your Programming Language

You can skip the design step if you only want to learn how these things work by writing your own compiler. In that case, getting started is easy: take a subset of an existing language or devise a simple one. If you want to create a real programming language of your own, however, you will have to think about its design.

Design your programming language in two phases

1. The big-picture phase

In the first phase, we answer the central questions about our language.

What execution paradigm shall we use? Will it be imperative or functional? Or perhaps based on business rules or state machines?

What type system shall we prefer: dynamic or static typing?

What kind of programs is this language going to be best suited for? Large systems or small scripts?

What do we care about most: performance or readability?

Should it resemble an existing programming language? Is it aimed at C programmers, or should it be easy enough for someone coming from Python to learn?

Should it run on a particular platform, such as the JVM or the CLR?

What style of metaprogramming facilities, if any, do we want to support? Macros? Templates? Reflection?

2. The refinement phase

In the second phase, we keep using the language and it evolves. We will run into problems and ideas that cannot be expressed, or that can be expressed only very awkwardly, and we will eventually need to change the language. The second phase may be less dramatic than the first, but it is where we polish the language until it is practical to use.

How do we develop a compiler?

Because a compiler is rather complicated, we break the work into the following steps:

1. Write a parser: The parser reads the text of our programs and works out which constructs it corresponds to. It also creates internal data structures to represent the expressions, statements, and classes it recognizes. Instead of working directly with the original text, the rest of the compiler works with those data structures.

2. Translate the parse tree into an Abstract Syntax Tree: The parser records many details that the rest of the compiler does not care about, so the data structure it builds is usually rather low level. We often want to reorganize it into a higher-level structure.

3. Symbol resolution: We write things like a + 1 in the code. It is left to our compiler to figure out what 'a' means. Is it a field? Is it a variable? Is it a method parameter? The compiler answers these questions by examining the surrounding code.

4. Check the tree: We should check that the programmer didn't make any errors. Are they trying to add an int and a boolean? Or access a field that doesn't exist? We should produce suitable error messages.

5. Generate machine code: This is where we convert the code into a form that the computer can execute. It could be native machine code or bytecode for a virtual machine.

6. Link the code: Sometimes we need to link the machine code generated for our programs with the code of the static libraries we use, in order to obtain a single executable.
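
To make steps 1 to 5 more concrete, here is a minimal sketch in Python of a toy pipeline for expressions such as a + (1 + 2). Everything in it is an assumption made for illustration: the node classes (Num, Bool, Var, Add), the function names, the tiny grammar, and the made-up stack bytecode (PUSH, LOAD, ADD) do not come from any real compiler. To stay short, it builds the abstract syntax tree directly instead of going through a separate parse tree, and it skips linking entirely.

```python
import re
from dataclasses import dataclass

# AST node types (hypothetical names chosen for this sketch)
@dataclass
class Num:
    value: int

@dataclass
class Bool:
    value: bool

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|\+|\(|\))")

def tokenize(text):
    """Split the source text into number, name, '+', '(' and ')' tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Steps 1-2. Grammar: expr := atom ('+' atom)* ; atom := NUMBER | true | false | NAME | '(' expr ')'"""
    pos = 0
    def atom():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok.isdigit():
            return Num(int(tok))
        if tok in ("true", "false"):
            return Bool(tok == "true")
        if tok == "(":
            node = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        if tok in ("+", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return Var(tok)
    def expr():
        nonlocal pos
        node = atom()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = Add(node, atom())
        return node
    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def check(node, symbols):
    """Steps 3-4. Resolve names in `symbols` (name -> type) and return the node's type."""
    if isinstance(node, Num):
        return "int"
    if isinstance(node, Bool):
        return "bool"
    if isinstance(node, Var):
        if node.name not in symbols:
            raise NameError(f"unknown symbol {node.name!r}")
        return symbols[node.name]
    left, right = check(node.left, symbols), check(node.right, symbols)
    if left != "int" or right != "int":
        raise TypeError(f"cannot add {left} and {right}")
    return "int"

def generate(node, code=None):
    """Step 5. Emit instructions for a made-up stack machine."""
    code = [] if code is None else code
    if isinstance(node, (Num, Bool)):
        code.append(("PUSH", node.value))
    elif isinstance(node, Var):
        code.append(("LOAD", node.name))
    else:
        generate(node.left, code)
        generate(node.right, code)
        code.append(("ADD", None))
    return code

def run(code, env):
    """A tiny interpreter for the bytecode, standing in for the virtual machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:  # ADD
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return stack.pop()

tree = parse(tokenize("a + (1 + 2)"))
check(tree, {"a": "int"})      # passes: 'a' is declared as an int
code = generate(tree)
print(run(code, {"a": 10}))    # prints 13
```

With the symbol table {"a": "bool"}, the same check call would raise a TypeError, which is the "adding an int and a boolean" mistake mentioned in step 4. A real compiler would report the source position as well, which this sketch does not track.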

Conclusion:

Creating your programming language from scratch isn't just about compilers. It's about designing a tool for programmers: what kind of programs will it serve, how readable should it be, and which platform will it target? First comes the big picture, then building a compiler incrementally: parsing code, building data structures, and translating them into a form the computer understands. Finish with libraries and tools on top to make the language pleasant to use. It is an interesting process, but the more important question is this: what is your vision for a unique and powerful language?
