The ULR JIT Engine Finally Works! 🎉

This article discusses code from the Uncommon Language Runtime.

After over a year of developing the ULR[1], JIT compilation and execution are finally here! Here’s a rundown of the current JIT pipeline, from start to finish.

We start by writing our JIT program in plain text.

# file: adder.uil

public class []Adder
	public static [System]Int32 Add([System]Int32,[System]Int32)
		ldapl 0
		ldapl 1
		add i32
		ret
	end
end

The file starts with a class declaration for Adder, placing it in the global namespace (as denoted by the empty square brackets). Adder contains a single static method, Add, which takes two 32-bit signed integers and returns one (int Add(int, int)). Add contains four instructions. The first two (ldapl, “Load Argument-Passed Local”) push arguments onto the evaluation stack (similar to MSIL/CIL). Arguments are indexed from zero: the first argument is 0, the second is 1, and so on. The next instruction (add i32) pops the top two values off the evaluation stack, interprets them as signed 32-bit integers, adds them, and pushes the result back onto the stack. The final instruction returns from the function, popping the top value off the evaluation stack as the return value.
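To make the evaluation-stack behaviour concrete, here is a minimal sketch that simulates the four instructions on an explicit stack. It is only an illustration: the ULR compiles UIL to native code rather than interpreting it, and the names Op, Instr, run, and adderBody are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes mirroring the three instruction kinds in adder.uil.
enum class Op { LdApl, AddI32, Ret };

struct Instr {
    Op op;
    int operand; // argument index for LdApl; unused otherwise
};

// Runs a method body on a simulated evaluation stack and returns the value
// that `ret` takes from the top of the stack.
std::int32_t run(const std::vector<Instr>& body,
                 const std::vector<std::int32_t>& args)
{
    std::vector<std::int32_t> stack;
    for (const Instr& in : body) {
        switch (in.op) {
        case Op::LdApl:
            // Push a copy of the zero-indexed argument.
            stack.push_back(args.at(static_cast<std::size_t>(in.operand)));
            break;
        case Op::AddI32: {
            assert(stack.size() >= 2);
            std::int32_t rhs = stack.back(); stack.pop_back();
            std::int32_t lhs = stack.back(); stack.pop_back();
            stack.push_back(lhs + rhs);
            break;
        }
        case Op::Ret:
            assert(!stack.empty());
            return stack.back();
        }
    }
    return 0; // a body without `ret` is malformed; 0 is just a placeholder
}

// The body of Adder.Add from adder.uil, expressed with the types above.
std::vector<Instr> adderBody()
{
    return { {Op::LdApl, 0}, {Op::LdApl, 1}, {Op::AddI32, 0}, {Op::Ret, 0} };
}
```

Calling run(adderBody(), {4, 5}) walks through the same steps described above: two pushes, one add, and a return of the value left on top of the stack.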

To compile our UIL file to a binary format, we invoke the UIL Assembler:

uilasm adder.uil

The assembler outputs a binary file called adder.uil.ulas (“ulas” stands for Uncommon Language ASsembly). The file contains two sections. The first is the code section, which holds type and method definitions along with method bodies. The second is the strings section, which holds any string literals that appear in the program as well as any type or method name references (for example, the names []Adder and [System]Int32 end up in the strings section too).
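One common way to build a strings section like this is to store each distinct string once and refer to it by index from the code section. The sketch below shows that idea only; the actual .ulas layout is not described in this post, so the StringTable class and its index scheme are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string table: each distinct string is stored once and
// referred to by its index. The real .ulas format may differ.
class StringTable {
public:
    // Returns the index of `s`, adding it to the table on first use.
    std::uint32_t intern(const std::string& s)
    {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        auto id = static_cast<std::uint32_t>(strings_.size());
        strings_.push_back(s);
        index_.emplace(s, id);
        return id;
    }

    const std::string& at(std::uint32_t id) const { return strings_.at(id); }
    std::size_t size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, std::uint32_t> index_;
};
```

With this scheme, the two references to [System]Int32 in Add's signature would share a single table entry.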

Since the ULR at this point does not have the ability to run standalone JIT assemblies (but will have that ability soon!), we’ll need to create a native C++ assembly that loads the JIT assembly and invokes it for us.

Here’s what our native assembly looks like:

// file: AssemblyInfo.cpp

#include <StdULR.hpp>
#include <iostream>

BEGIN_ULR_EXPORT

void InitAssembly(ULRAPIImpl* ulr)
{
	internal_api = ulr;
}

void overload0_ns0_Program_ctor(char* self) {}

sizeof_ns1_System_Int32 overload0_ns0_Program_Main(char* argv)
{
	internal_api->LoadJITAssembly("/path/to/adder.uil.ulas");

	Type* Int32Type = internal_api->GetType("[System]Int32");

	MethodInfo* addinfo = internal_api->GetMethod(
		internal_api->GetType("[]Adder"),
		"Add",
		{ Int32Type, Int32Type }
	);

	std::cout << addinfo->offset << std::endl;

	auto add = (int (*)(int, int)) addinfo->offset;

	int a = 4;
	int b = 5;

	std::cout << "add(" << a << ", " << b << ") = " << add(a, b) << '\n';

	return 0;
}

char ulrmeta[] = "pc[]Program:[System]Object,$8;.ctor p();.entr s[System]Int32 Main([System]String[]);\n";

void* ulraddr[] = {
	(void*) overload0_ns0_Program_ctor,
	(void*) overload0_ns0_Program_Main
};

char* ulrdeps[] = { nullptr };

END_ULR_EXPORT

First, we include the ULR C++ library (StdULR.hpp) and then we begin our ULR C++ code (BEGIN_ULR_EXPORT, which really just resolves to an extern "C" {).

When the ULR loads a native C++ assembly, it calls the function InitAssembly as soon as loading is done, passing the ULRAPI instance handle (declared as ULRAPIImpl* in the code above) to every assembly that it loads. This object exposes methods that allow native code to interact with the ULR and its framework.

We then must define a class in native code so that we can actually run something. We’ll name the class Program and keep it in the global namespace.

Every non-static class must have a constructor, but since we aren’t going to create any instances, we’ll just make the constructor an empty function. Technically the name doesn’t matter, but by convention we’ll name it overload0_ns0_Program_ctor.

Next we’ll define the main function, which takes a string array of command-line arguments and returns an integer return code (sizeof_ns1_System_Int32). Notice that the parameter type is char* argv rather than char** argv: the “character” pointer being received is actually just a byte pointer/handle to a ULR string array, not a C string array.

After that, we load our JIT assembly using the ULRAPI. The ULRAPI instance finds the assembly file and passes it to ULR::IL::JITContext::Compile, which uses a simple algorithm to compile the instructions to native x64 code following the evaluation-stack model (no optimizations as of now).
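As a rough picture of what unoptimized, evaluation-stack-based code generation can look like, the sketch below lowers the Add body to x64 assembly text, using the native stack as the evaluation stack. This is not the output of ULR::IL::JITContext::Compile. It assumes the System V AMD64 calling convention (first two integer arguments in rdi and rsi), and Op, Instr, and lowerToX64 are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction representation for this example only.
enum class Op { LdApl, AddI32, Ret };
struct Instr { Op op; int operand; };

// Lowers each instruction independently, using the native stack as the
// evaluation stack. Assumes System V AMD64, where the first two integer
// arguments arrive in rdi and rsi. Handles at most two arguments.
std::vector<std::string> lowerToX64(const std::vector<Instr>& body)
{
    static const char* const argRegs[] = { "rdi", "rsi" };
    std::vector<std::string> out;
    for (const Instr& in : body) {
        switch (in.op) {
        case Op::LdApl:
            assert(in.operand >= 0 && in.operand < 2);
            out.push_back(std::string("push ") + argRegs[in.operand]);
            break;
        case Op::AddI32:
            out.push_back("pop rcx");      // right operand
            out.push_back("pop rax");      // left operand
            out.push_back("add eax, ecx"); // 32-bit add
            out.push_back("push rax");     // result back onto the stack
            break;
        case Op::Ret:
            out.push_back("pop rax");      // return value travels in eax/rax
            out.push_back("ret");
            break;
        }
    }
    return out;
}
```

For the Add body this yields eight instructions with an obviously redundant push rax / pop rax pair near the end, which is the kind of thing an optimizing pass would later remove.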

We then use reflection to get the Adder type and its Add method. Once we have a handle to the MethodInfo instance, we could call the method with MethodInfo->Invoke (the dynamic way to invoke a method via reflection). Here, though, we know the number and types of the arguments at compile time, we’re writing native code, and the JIT compiler has compiled our ULR assembly to ABI-compatible native code as well. So we can simply take the memory address of the function (addinfo->offset), cast it to the matching C function pointer type (this works because [System]Int32 is stored the same way as a native int), and call the function directly.
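The address-to-function-pointer step can be tried in isolation. In the sketch below an ordinary C++ function stands in for the JIT-compiled code, and FakeMethodInfo is a hypothetical stand-in for the real MethodInfo (the actual type of its offset field is not shown in this post, so void* is an assumption).

```cpp
#include <cassert>

// Stand-in for JIT-compiled code: an ordinary function with the signature
// that Adder.Add has once compiled.
extern "C" int native_add(int a, int b) { return a + b; }

// Hypothetical stand-in for the reflection record; only the address matters.
struct FakeMethodInfo {
    void* offset; // address of the compiled method's first instruction
};

using AddFn = int (*)(int, int);

// Reinterprets the stored address as a callable function pointer.
// Converting between void* and function pointers is only conditionally
// supported by the C++ standard, but mainstream x64 platforms allow it.
AddFn asAddFn(const FakeMethodInfo& info)
{
    assert(info.offset != nullptr);
    return reinterpret_cast<AddFn>(info.offset);
}

// Builds a record pointing at the stand-in function.
FakeMethodInfo makeInfo()
{
    return FakeMethodInfo{ reinterpret_cast<void*>(&native_add) };
}
```

The cast is only safe because the caller and the callee agree on argument types, return type, and calling convention; a mismatch in any of them is undefined behaviour rather than a catchable error.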

We use cout to output the result of the addition because the ULR unfortunately does not have ULR-internal I/O support yet, and the standard library is extremely minimal at this point (embarrassing, I know).

The end of the file is a bit more cryptic. Since native assemblies contain already compiled native code that isn’t centered around ULR code structures the same way a JIT assembly is, we need to export the ULR metadata of the module to the runtime that is loading it.

This export process is done through the three global variables at the bottom of the file. I’ll break down ulrmeta first.

It starts with pc, denoting a public class to follow, then the namespace and name of the class ([]Program). After the colon is the type that the class inherits from (in this case, System.Object). A comma follows, and an optional list of interfaces that the class implements can be specified. Then comes the size of the type (8 bytes) denoted by $8. On a 64-bit runtime, 8 bytes is enough to just store the type pointer of the object, because []Program has no fields declared. Then comes the constructor indicator .ctor, and after that the constructor signature p(): public, with no arguments. Following that is the entrypoint indicator .entr and the entrypoint signature s[System]Int32 Main([System]String[]): static, implicitly private, returns an int and takes in an array of strings.
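The breakdown above can be checked mechanically with a few lines of string handling. This sketch splits the single-class string from this post on semicolons and pulls out the size after the $ sign. It handles only that one shape, not the full metadata grammar, and ClassMeta and parseClassMeta are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pieces of one class record, following the breakdown above.
struct ClassMeta {
    std::string header;               // e.g. "pc[]Program:[System]Object,"
    std::size_t size = 0;             // the number after '$'
    std::vector<std::string> members; // e.g. ".ctor p()"
};

// Splits a single-class metadata string on ';'. The first piece is the
// class header (ending in "$<size>"); the remaining pieces are members.
ClassMeta parseClassMeta(const std::string& meta)
{
    ClassMeta out;
    std::size_t start = 0;
    bool first = true;
    for (std::size_t end;
         (end = meta.find(';', start)) != std::string::npos;
         start = end + 1) {
        std::string part = meta.substr(start, end - start);
        if (first) {
            std::size_t dollar = part.find('$');
            assert(dollar != std::string::npos);
            out.header = part.substr(0, dollar);
            out.size = std::stoul(part.substr(dollar + 1));
            first = false;
        } else {
            out.members.push_back(part);
        }
    }
    return out;
}
```

Fed the ulrmeta string from AssemblyInfo.cpp, this produces a size of 8 and two member declarations: the constructor and the entrypoint.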

ulraddr lists memory addresses in the same order as the fields/methods declared in ulrmeta, so the first address belongs to the first declared member, the second to the second, and so on.
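Because the link between ulrmeta and ulraddr is purely positional, binding them amounts to walking both lists in step. Here is a small sketch of that pairing; bindMembers is a hypothetical helper, not how the ULR loader is necessarily written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pairs the i-th declared member with the i-th address. A length mismatch
// means the metadata and the address table are out of sync.
std::vector<std::pair<std::string, void*>>
bindMembers(const std::vector<std::string>& members,
            const std::vector<void*>& addresses)
{
    assert(members.size() == addresses.size());
    std::vector<std::pair<std::string, void*>> bound;
    for (std::size_t i = 0; i < members.size(); ++i)
        bound.emplace_back(members[i], addresses[i]);
    return bound;
}
```

One consequence of positional binding is that reordering declarations in ulrmeta without reordering ulraddr silently attaches the wrong code to a member.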

ulrdeps specifies this assembly’s static dependencies: other ULR assemblies that the ULR Loader should load before continuing with the current one. Our current assembly has no dependencies other than the standard library, which the ULR loads by default on startup.
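The "load dependencies first" rule is a depth-first walk over the dependency graph. The sketch below computes a load order under that rule. The assembly names and the map-based graph are made up for illustration, and the real loader reads dependencies from each assembly's ulrdeps array instead.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using DepGraph = std::map<std::string, std::vector<std::string>>;

// Appends `name` to `order` after all of its dependencies. Marking the
// assembly as seen before recursing also stops the walk on cycles.
void visit(const std::string& name, const DepGraph& deps,
           std::set<std::string>& seen, std::vector<std::string>& order)
{
    if (!seen.insert(name).second) return; // already handled
    auto it = deps.find(name);
    if (it != deps.end())
        for (const std::string& dep : it->second)
            visit(dep, deps, seen, order);
    order.push_back(name);
}

// Returns the order in which assemblies would be loaded for `root`.
std::vector<std::string> loadOrderFor(const std::string& root,
                                      const DepGraph& deps)
{
    std::set<std::string> seen;
    std::vector<std::string> order;
    visit(root, deps, seen, order);
    assert(!order.empty() && order.back() == root); // root always loads last
    return order;
}
```

An assembly with no dependencies, like the one in this post, simply yields a one-element order containing itself.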

And that’s a brief overview of how the ULR runs assemblies, from start to finish!

Hope you enjoyed and thanks for reading!

  1. My first commit is dated December 14, 2023.

This post is licensed under CC BY 4.0 by the author.