How computers understand code

Nov 08, 2022 7:14 am

[Image: Margaret Hamilton, the programmer who took us to the moon.]


Have you ever wondered how your computer knows how to run your code? There are several steps involved. It's a complex process, but understanding the basics makes it way easier to write good programs. When you understand why you have to structure your code in a specific way, it makes it clearer which constructs to use, and when.


Programming languages are actually foreign to computers. A processor executes instructions encoded as binary machine code, not text. So something has to sit between the two and do the translation. Over time, these translators have gotten more advanced.


In the early days of computing, humans did the translation themselves, working out the machine code for each instruction by hand. Programs were often fed to the machine on punch cards. Much like the ones and zeros of binary machine instructions, different combinations of punched holes led to different operations being performed.


Assembly languages were among the first programming languages. Assembly is a nearly 1:1 representation of machine instructions as text. For example, mov eax, ebx translates directly to a single MOV instruction on the x86 computer architecture. Compared to punch cards, Assembly made it much easier to write programs, but it was still very low-level.
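
To make that 1:1 mapping concrete, here is a minimal sketch of a toy assembler in Python. It only handles one case, mov between two 32-bit x86 registers, and it assumes the common encoding that uses opcode 0x89 (x86 also allows an equivalent encoding with opcode 0x8B). The assemble_mov name and the REGISTERS table are made up for this example, so treat it as an illustration rather than how a real assembler is built.

```python
# Register numbers used by the x86 ModRM byte (32-bit general-purpose registers).
REGISTERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
             "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def assemble_mov(line: str) -> bytes:
    """Encode `mov dst, src` for two 32-bit registers (hypothetical helper)."""
    mnemonic, operands = line.strip().split(None, 1)
    if mnemonic.lower() != "mov":
        raise ValueError(f"unsupported instruction: {mnemonic}")
    dst, src = (name.strip().lower() for name in operands.split(","))
    # Opcode 0x89 is "MOV r/m32, r32": the source register goes in the
    # ModRM reg field, the destination in the r/m field, with mod = 0b11.
    modrm = 0b11000000 | (REGISTERS[src] << 3) | REGISTERS[dst]
    return bytes([0x89, modrm])


print(assemble_mov("mov eax, ebx").hex())  # 89d8
```

One line of text goes in and two bytes come out, which is the whole idea: each Assembly instruction stands for one machine instruction.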


Most programmers will never have to write any Assembly in the real world. In large codebases, it becomes difficult to keep track of how everything works when all the code is in Assembly. It also takes a lot of code to do basic things, like printing a message to the screen. So instead, we write most code in high-level languages that abstract away these details. By this definition, C, Java, and Python are all high-level languages.


Nowadays, computers are super powerful. Instead of humans punching cards, or building programs completely out of low-level instructions, we have special programs that turn high-level source text into instructions a machine can run. These programs are called compilers. They translate source code into machine instructions, or into another lower-level form such as bytecode. You likely have several compilers already on your system, such as the gcc compiler for C, or the javac compiler for Java. Your Web browser also has a compiler for JavaScript!
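
If you have Python installed, you can watch a compiler at work in a few lines. Here is a small sketch using Python's built-in compile() function and the standard dis module. One caveat: Python compiles to bytecode for its own virtual machine rather than to native machine instructions, so this illustrates the "text in, instructions out" step, not what gcc produces. The variable names are just for this example.

```python
import dis

source = "total = 1 + 2\n"

# compile() is Python's built-in entry point to its own compiler. The result
# is bytecode for the Python virtual machine, not native machine code.
code = compile(source, "<example>", "exec")

print(type(code.co_code))   # the compiled instructions are stored as raw bytes
dis.dis(code)               # human-readable listing of those instructions

# Running the compiled code object executes the original program.
namespace = {}
exec(code, namespace)
print(namespace["total"])   # 3
```

The exact listing printed by dis.dis varies between Python versions, which is a nice reminder that the compiler's output format is an implementation detail.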


Compilers 101

Writing a compiler takes a lot of effort, but the work a compiler does is generally divided into a few steps:

  1. Lexical analysis: the compiler reads the source text and turns it into a stream of tokens. Each token classifies a small piece of the source, such as a keyword, a name, a number, or an operator. For example, C has different types of tokens for the if and for keywords.
  2. Parsing: the compiler turns the stream of tokens into a data structure that represents the entire program. Usually, this is a tree, which we call an abstract syntax tree (AST). Syntax errors can be detected in either the lexical analysis phase, or the parsing phase.
  3. Semantic analysis: the compiler analyzes the AST to work out what the program means, attaching information such as types to the tree nodes. For example, a C compiler checks that the value you return from main() is compatible with its int return type. The result is usually turned into a data structure called an intermediate representation, so named because it is passed through the next steps of the compiler before the final machine code is generated.
  4. Optimization: the compiler makes changes to the intermediate representation to improve the performance of the resulting program, without changing what the program does.
  5. Code generation: the compiler transforms the intermediate representation into its final output. This may be binary machine code, but in many cases it is code in another, lower-level language. For example, many C compilers, such as gcc, generate Assembly, which is then translated into machine code. A special compiler that turns Assembly into machine code is called an assembler.
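
The five steps above can be sketched as a toy compiler in Python. Everything here is an assumption made for illustration: the language is just integers, variable names, +, * and parentheses, the AST doubles as the intermediate representation, and the output targets a made-up stack machine with PUSH, LOAD, ADD and MUL instructions. Real compilers are far more involved, but the pipeline has the same shape.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def lex(text):
    """1. Lexical analysis: text -> list of (kind, value) tokens."""
    tokens = []
    for number, name, symbol in TOKEN_RE.findall(text.strip()):
        if number:
            tokens.append(("NUM", int(number)))
        elif name:
            tokens.append(("NAME", name))
        elif symbol in "+*()":
            tokens.append(("OP", symbol))
        else:
            raise SyntaxError(f"unexpected character {symbol!r}")
    return tokens


def parse(tokens):
    """2. Parsing: tokens -> AST made of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():            # expr := term ('+' term)*
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("add", node, term())
        return node

    def term():            # term := atom ('*' atom)*
        node = atom()
        while peek() == ("OP", "*"):
            advance()
            node = ("mul", node, atom())
        return node

    def atom():            # atom := NUM | NAME | '(' expr ')'
        kind, value = advance()
        if kind == "NUM":
            return ("num", value)
        if kind == "NAME":
            return ("var", value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return tree


def check(node, known_vars):
    """3. Semantic analysis: reject variables that were never declared."""
    if node[0] == "var" and node[1] not in known_vars:
        raise NameError(f"undefined variable {node[1]!r}")
    if node[0] in ("add", "mul"):
        check(node[1], known_vars)
        check(node[2], known_vars)
    return node


def fold(node):
    """4. Optimization: constant folding, e.g. 2 * 3 becomes 6."""
    if node[0] not in ("add", "mul"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == right[0] == "num":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)


def generate(node):
    """5. Code generation: AST -> instructions for a made-up stack machine."""
    if node[0] == "num":
        return [f"PUSH {node[1]}"]
    if node[0] == "var":
        return [f"LOAD {node[1]}"]
    return generate(node[1]) + generate(node[2]) + [node[0].upper()]


def compile_expr(text, known_vars=("x",)):
    return generate(fold(check(parse(lex(text)), known_vars)))


print(compile_expr("x + 2 * 3"))  # ['LOAD x', 'PUSH 6', 'ADD']
```

Notice that 2 * 3 never reaches the output: the optimization step replaced it with 6 before any code was generated, without changing the result of the program.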


Now that you know how computers understand code, what sorts of programs are you going to write? With this knowledge, you could even come up with your own programming language and compiler.


I've actually made my own programming language - check out thosakwe/t2b on GitHub. In a future article, I'll explain how I made t2b, and how you can make your own programming language.


See you Thursday,

Tobe Osakwe
