I Created an Assembly Language

It was surprisingly easy

Cameron Kroll
4 min read · Jan 30, 2022
Photo by Michael Dziedzic on Unsplash

I’ve always been interested in the idea of an evolving computer program. Sure, there are so many ways this could go wrong, but I feel like evolution would make our computers feel more alive.

A couple of months ago, I decided to try making a C# program that evolves. The problem is, IL (C# bytecode) is structured in very specific ways, so almost all mutations would just crash the program.

This problem exists in pretty much all assembly languages. Also, while an evolving computer virus would be extremely cool, it would get old once it started causing problems.

To solve both the safety problem and the mutation problem, I created an assembly language that runs on its own emulator. The program can only interact with the main computer in specific ways, so it’s sandboxed.

I can also build a bunch of error handling into the emulator so that mutations don’t completely destroy the program. For example, the emulator treats any unknown instruction as a NOP (no operation) and skips it.
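As a rough illustration of that fallback (not the emulator’s actual code), here is a minimal fetch-and-dispatch sketch. The opcode values, the `Machine` struct, and the `step` and `run` functions are all hypothetical; the only behavior taken from the article is that an unrecognized byte is skipped like a NOP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical opcode values; the real encoding is not given in the article.
enum Opcode : std::uint8_t { OP_NOP = 0x00, OP_INC = 0x01, OP_HALT = 0x02 };

struct Machine {
    std::vector<std::uint8_t> memory;  // program bytes
    std::size_t iptr = 0;              // index of the current instruction
    std::int32_t acc = 0;              // one register is enough for this sketch
    bool running = true;
};

// Executes one instruction. Any byte that is not a known opcode lands in the
// default case and is skipped, exactly like a NOP.
void step(Machine& m) {
    if (m.iptr >= m.memory.size()) {  // ran off the end: stop instead of crashing
        m.running = false;
        return;
    }
    switch (m.memory[m.iptr]) {
        case OP_INC:  ++m.acc; break;
        case OP_HALT: m.running = false; break;
        case OP_NOP:
        default:      break;  // unknown instruction: do nothing
    }
    ++m.iptr;
}

// Runs until HALT or the end of memory and returns the accumulator.
std::int32_t run(std::vector<std::uint8_t> program) {
    Machine m;
    m.memory = std::move(program);
    while (m.running) step(m);
    return m.acc;
}
```

With this shape, a mutation that turns a valid opcode into garbage costs the program one instruction rather than the whole run.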

The Emulator

While normal assembly is run on physical chips, my bytecode runs on an emulator written in C++. I’ve structured my emulator in a fairly similar way to real computers, but I’ve simplified it in a couple of ways to make it easier for the program to evolve.

Registers

All CPUs have a few tiny slots of memory, called registers, that are used to store and alter data. While some CPUs have separate registers for float and integer values, my emulator has 256 general-purpose registers and distinguishes between floats and integers with different instructions.

While all 256 registers are technically available to the program, the first five are special. First, there’s IPTR and SPTR. IPTR is the address of the current instruction, and SPTR is the address of the top of the stack. To be specific, SPTR points to the next free spot on the stack.

Something important to note is that the stack and program memory are mapped together. The stack is defined as the first 2000 bytes of memory, so certain instructions need to offset pointers by 2000 so they point to the right address.
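To make the offset concrete, here is a small sketch of the address translation. The 2000-byte stack size comes from the description above; the helper names, and the assumption that program-relative addresses start at zero right after the stack, are mine.

```cpp
#include <cassert>
#include <cstdint>

// From the article: the stack occupies the first 2000 bytes of memory.
constexpr std::uint32_t kStackSize = 2000;

// Hypothetical helper: converts a program-relative address into an index into
// the shared memory array. Returns false (and leaves `out` untouched) when the
// address does not fit inside the program region.
bool toMemoryIndex(std::uint32_t programAddr, std::uint32_t memorySize,
                   std::uint32_t& out) {
    if (memorySize <= kStackSize) return false;               // no program region
    if (programAddr >= memorySize - kStackSize) return false; // past the end
    out = programAddr + kStackSize;
    return true;
}

// Hypothetical helper: true when a memory index falls inside the stack region.
constexpr bool isStackIndex(std::uint32_t index) { return index < kStackSize; }
```

For a 4096-byte memory, program address 0 maps to index 2000 and the last valid program address is 2095.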

The next two registers are CMP1 and CMP2. The JNE (Jump Not Equal) and JIE (Jump If Equal) instructions check CMP1 and CMP2 when deciding whether to jump. CMP1 is also used to set certain flags which can be used by the JFLAG (Jump if FLAG) instruction.
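A sketch of how the two conditional jumps might read those registers. The register indices are an assumption (the article names the special registers but not their positions), and the function signatures are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed indices for the special registers; only the names come from the article.
constexpr std::size_t kIptr = 0;
constexpr std::size_t kCmp1 = 2;
constexpr std::size_t kCmp2 = 3;

using Registers = std::array<std::uint32_t, 256>;

// JIE: jump to `target` when CMP1 == CMP2, otherwise continue at `next`.
void jie(Registers& r, std::uint32_t target, std::uint32_t next) {
    r[kIptr] = (r[kCmp1] == r[kCmp2]) ? target : next;
}

// JNE: jump to `target` when CMP1 != CMP2, otherwise continue at `next`.
void jne(Registers& r, std::uint32_t target, std::uint32_t next) {
    r[kIptr] = (r[kCmp1] != r[kCmp2]) ? target : next;
}
```

Keeping the comparison operands in fixed registers means the jump instructions themselves only need to encode a target, which leaves fewer fields for a mutation to break.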

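Finally, there’s the TIME register. TIME contains the number of microseconds that have passed since the program started. Because this is a 32-bit emulator, TIME rolls over: a 32-bit microsecond counter wraps after 2^32 microseconds, which is about 71.6 minutes (or overflows after about 35.8 minutes if the value is read as a signed integer).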
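The rollover period follows directly from the register width. Here is the arithmetic as a small sketch, assuming TIME is a plain 32-bit counter of microseconds; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr double kMicrosPerMinute = 60.0 * 1000000.0;

// Minutes until an unsigned 32-bit microsecond counter wraps (2^32 microseconds).
constexpr double unsignedWrapMinutes() { return 4294967296.0 / kMicrosPerMinute; }

// Minutes until the value stops fitting in a signed 32-bit integer (2^31 microseconds).
constexpr double signedOverflowMinutes() { return 2147483648.0 / kMicrosPerMinute; }

// Elapsed time that stays correct across a single wrap, because unsigned
// subtraction is performed modulo 2^32.
constexpr std::uint32_t elapsedMicros(std::uint32_t start, std::uint32_t now) {
    return now - start;
}
```

The two periods come out to roughly 71.6 and 35.8 minutes. A program that only ever compares differences between two TIME readings, as in `elapsedMicros`, is unaffected by a single wrap.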

One last thing: r1-r5 are used for syscalls. For example, the COUT syscall prints the (null-terminated) string found at the memory address held in r1.
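Reading a null-terminated string out of emulated memory is a place where a mutated pointer could easily cause trouble, so a bounds-checked reader is worth sketching. `readCString` is a hypothetical helper, not the emulator’s real function; it stops at the first zero byte or at the end of memory, whichever comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper for a COUT-style syscall: collects bytes starting at
// `addr` until a zero byte. If no terminator exists, it stops at the end of
// memory, so a bad address can never read out of bounds.
std::string readCString(const std::vector<std::uint8_t>& memory,
                        std::uint32_t addr) {
    std::string out;
    for (std::size_t i = addr; i < memory.size() && memory[i] != 0; ++i) {
        out.push_back(static_cast<char>(memory[i]));
    }
    return out;
}
```

An address past the end of memory simply yields an empty string, so the syscall prints nothing instead of crashing.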

Flags

My assembly language has five flags that can be set depending on the values in CMP1 and CMP2. First, there’s GTHAN and LTHAN. GTHAN is true if the value in CMP1 is greater than the value in CMP2. LTHAN is the opposite: it is true if CMP1 is less than CMP2.

Next, there’s NEG, POS, and ZERO. NEG is true if CMP1 is negative, POS if it’s positive, and ZERO if CMP1 is equal to zero.
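Put together, the five flags could be computed like this. This is a sketch under two assumptions: the compare registers are read as signed 32-bit integers, and equal values set neither GTHAN nor LTHAN. The `Flags` struct and `computeFlags` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
    bool gthan;  // CMP1 > CMP2
    bool lthan;  // CMP1 < CMP2 (assumed: equal values set neither flag)
    bool neg;    // CMP1 < 0
    bool pos;    // CMP1 > 0
    bool zero;   // CMP1 == 0
};

// Derives all five flags from the two compare registers, read as signed ints.
Flags computeFlags(std::int32_t cmp1, std::int32_t cmp2) {
    return Flags{cmp1 > cmp2, cmp1 < cmp2, cmp1 < 0, cmp1 > 0, cmp1 == 0};
}
```

Exactly one of NEG, POS, and ZERO is true for any CMP1, while GTHAN and LTHAN are both false when the two registers are equal.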

Syscalls

We need a way for our programs to interface with the real computer they’re running on. Instead of giving our program access to the entire computer, it can use the CALLOP instruction to call specific functions that we define.

Right now, I’ve only got four syscalls. There’s COUT and CIN, which write to and read from the console. There’s MALLOC, which expands the program’s memory by a certain number of bytes and puts a pointer to the new bytes in r2.

Finally, there’s FREE. This is probably the most dangerous syscall since it shrinks the program’s memory to whatever size is specified in r1. If r1 is zero, or less than the current instruction’s index, the program will end.
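The two memory syscalls can be sketched on top of a resizable byte vector. Everything here is illustrative: the `Vm` struct and function names are hypothetical, the pointer MALLOC returns is assumed to be a plain index into memory, and a FREE request larger than the current size is assumed to be ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vm {
    std::vector<std::uint8_t> memory;
    std::uint32_t iptr = 0;  // index of the current instruction
    bool running = true;
};

// MALLOC-style call: grows memory by `bytes` and returns the index of the
// first new byte (the value the article says ends up in r2).
std::uint32_t sysMalloc(Vm& vm, std::uint32_t bytes) {
    const std::uint32_t start = static_cast<std::uint32_t>(vm.memory.size());
    vm.memory.resize(vm.memory.size() + bytes);  // new bytes are zero-filled
    return start;
}

// FREE-style call: shrinks memory to `newSize`. The program ends if the new
// size is zero or smaller than the current instruction's index.
void sysFree(Vm& vm, std::uint32_t newSize) {
    if (newSize < vm.memory.size()) vm.memory.resize(newSize);  // never grows
    if (newSize == 0 || newSize < vm.iptr) vm.running = false;
}
```

Ending the program cleanly when FREE cuts away the running code is what keeps a mutated size argument from turning into an out-of-bounds fetch.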

I’m still thinking about other syscalls I might want to add. I think a way to get random numbers would be interesting, and I should probably add a way to get the UTC time.

It’d be nice if I could add a way to write data to files, but this seems a little dangerous. As a compromise, I may add a designated file (with a fixed size) that the program can use for persistent memory.

Data types

My emulator/assembly language only supports two data types, int and float. I don’t support unsigned integers as a separate type, but some instructions automatically convert their operands to unsigned values.

As I mentioned earlier, all registers have the same type. Instead of having dedicated float and integer registers, I have different math instructions that interpret the data in the registers as either integers or floats. For example, there’s the ADD instruction for integers and the ADDF instruction for floats.
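One way to get this behavior in C++ is to store every register as raw 32 bits and let each instruction choose how to read them. The sketch below assumes 32-bit IEEE floats and uses `std::memcpy` for the reinterpretation; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes 32-bit floats");

// Reinterprets raw register bits as a float without undefined behavior.
float bitsToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Stores a float's bit pattern back into a raw register value.
std::uint32_t floatToBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// ADD: integer addition. Unsigned arithmetic wraps on overflow and gives the
// same bit pattern as two's-complement signed addition.
std::uint32_t opAdd(std::uint32_t a, std::uint32_t b) { return a + b; }

// ADDF: the same two registers, but read as floats.
std::uint32_t opAddf(std::uint32_t a, std::uint32_t b) {
    return floatToBits(bitsToFloat(a) + bitsToFloat(b));
}
```

Because the registers carry no type tag, running ADDF on integer data is not an error. It just produces a different, usually meaningless, number.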

Certain instructions do expect specific data types (JMP, for example, expects an address), but this isn’t a huge problem, since all the instructions fail gracefully. For example, if you jump to a memory address that does not exist, the program will just exit. This is incredibly important because we want mutations to be able to harm the program without crashing the emulator. A program that exits cleanly still has a chance to be shared.
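A bounds-checked jump is the simplest example of failing gracefully. In this sketch, the `Cpu` struct and `jumpTo` are hypothetical; the only behavior taken from the article is that a jump to a nonexistent address ends the program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Cpu {
    std::uint32_t iptr = 0;  // address of the current instruction
    bool running = true;
};

// Jumps to `target` if it lies inside memory. Otherwise the program exits
// cleanly and the instruction pointer is left where it was.
void jumpTo(Cpu& cpu, std::uint32_t target, std::size_t memorySize) {
    if (target >= memorySize) {
        cpu.running = false;
        return;
    }
    cpu.iptr = target;
}
```

Whatever bits a mutation puts in the jump target, the outcome is either a valid jump or a clean exit.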

The program

Here’s where things get difficult. I don’t have a way to write this assembly language. This is just a side project I did this week, so I decided writing an assembler wouldn’t be worth the work.

So far, I’ve got a program that says hi and asks the user to share it with their friends. I’m hoping this is enough to start, which is why I’m writing this article.

I’m hoping this can become something big, but that can only happen if this thing spreads. Part two of this article will contain a link to the source code, and I’d really appreciate it if you check it out.
