Scripting Objects without Trash Day
The two-axis chart. It's a classic tool for simplifying information. However, for the most part, it is... not very good. The metrics on them are almost always vibe checks, and they flatten endless nuance into a literal 2D plane. But there is one thing they have over other charts: they're fun to make.
So, in the spirit of the internet, I've made my own two-axis chart:
Now normally, the position of a language denotes how strongly it has that property. However, in this chart, I only care about which quadrant (or quadrants) the language is in. And like all other charts, I'm just doing a vibe check. The criteria for "scripting language" are pretty broad:
- It's in the name (JavaScript and family).
- The language maintainers call it a scripting or embeddable language.
- Wikipedia includes a note about it being a scripting language.
- I feel like I've heard people call it a scripting language.
[1] Dart seems to be tagged as a "Scripting Language" on Wikipedia, though I have a feeling that classification is dated. Dart originally targeted JS when it first dropped.
Now, I've highlighted a particular area of this chart. The weird corner of "No GC" and "Typically Called a Scripting Language".
Garbage Collection and Scripting. A Match Made in Heaven, Right?
For most programmers, "high level" and "has a GC" go hand in hand. It can feel almost intuitive. If you want to move away from the details of the machine, then you start automating various tasks. One such task is memory management. Moreover, garbage collection is common in high-level languages, and mandatory in some.
This is even more true for scripting languages. A scripting language is built to be a layer above whatever language you are using. They are tools for rapid iteration. To add back in manual memory management sounds almost like an oxymoron. You have lifted a low-level feature into the domain of some of the highest-level languages.
However, it is an interesting question: what would a scripting language without a GC look like? Well, luckily I don't have to speculate, as some people have already implemented them.
GDScript
The first language I want to look at is GDScript. GDScript is the language that kicked off this train of thought. The quickest TL;DR for GDScript is that it's a Python-ish language that serves as the main scripting language for the Godot game engine.
Now, not everything is left to the programmer. It does provide the RefCounted type. However, the root class Object does not inherit from it. Thus, it has no automatic memory management. Furthermore, every object you can place into the scene tree inherits from Node which inherits from Object. Therefore, the bulk of your game's entities will be manually memory managed.
So, a snippet of code like the following will leak memory:
func _process(delta: float) -> void:
    Node.new()
To prevent memory leaks like this, we need to either hold onto the reference and free it later or add it to a parent node. A node in Godot automatically frees all of its children when it is destroyed. This allows for complex subtrees to be built without needing to handle memory management for each member.
var orphan_node = Node.new()

func _notification(what: int) -> void:
    match what:
        NOTIFICATION_PREDELETE:
            orphan_node.free()

func _ready() -> void:
    add_child(Node.new())

func _process(delta: float) -> void:
    free() # All children will be freed here
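The "parent frees its children" rule is the same ownership-tree idea that Rust applies to nested values. The following is a minimal sketch of that idea in Rust, not Godot's actual implementation. The Node struct, the live counter, and live_after_freeing_root are hypothetical names made up for this illustration.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a scene-tree node (not Godot's real API).
struct Node {
    children: Vec<Node>,
    live: Rc<Cell<usize>>, // shared count of nodes that still exist
}

impl Node {
    fn new(live: &Rc<Cell<usize>>) -> Node {
        live.set(live.get() + 1);
        Node { children: Vec::new(), live: Rc::clone(live) }
    }

    /// Hand ownership of `child` to this node, loosely like `add_child`.
    fn add_child(&mut self, child: Node) {
        self.children.push(child);
    }
}

impl Drop for Node {
    fn drop(&mut self) {
        // Runs once for this node; Rust then drops `children`, which
        // runs the same code for every descendant.
        self.live.set(self.live.get() - 1);
    }
}

/// Builds a four-node tree, frees the root, and returns the live-node
/// count before and after.
fn live_after_freeing_root() -> (usize, usize) {
    let live = Rc::new(Cell::new(0));
    let mut root = Node::new(&live);
    let mut child = Node::new(&live);
    child.add_child(Node::new(&live));
    root.add_child(child);
    root.add_child(Node::new(&live));
    let before = live.get();
    drop(root); // plays the role of calling `free()` on the parent
    (before, live.get())
}
```

Calling live_after_freeing_root() from a main function returns (4, 0): one call on the root releases the whole subtree. A node that was never handed to a parent would need its own drop, which mirrors the orphan_node case above.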
Also, it's worth noting that a typical Godot application will use the function queue_free() instead. It flags the object for deletion at the end of the frame. Meanwhile, free() will instantly destroy that object, potentially while that object is still in use during that frame.
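The difference between the two calls is easier to see in a toy model. This Rust sketch assumes a scene that stores nodes in numbered slots. Scene, spawn, is_alive, and end_frame are hypothetical names, and Godot's real bookkeeping is certainly more involved.

```rust
/// Hypothetical scene: each slot holds a node name, or None once freed.
struct Scene {
    nodes: Vec<Option<String>>,
    pending: Vec<usize>, // ids flagged for deletion at the end of the frame
}

impl Scene {
    fn new() -> Scene {
        Scene { nodes: Vec::new(), pending: Vec::new() }
    }

    fn spawn(&mut self, name: &str) -> usize {
        self.nodes.push(Some(name.to_string()));
        self.nodes.len() - 1
    }

    fn is_alive(&self, id: usize) -> bool {
        matches!(self.nodes.get(id), Some(Some(_)))
    }

    /// Immediate deletion: the node is gone before the frame finishes.
    fn free(&mut self, id: usize) {
        self.nodes[id] = None;
    }

    /// Deferred deletion: only record the request.
    fn queue_free(&mut self, id: usize) {
        if !self.pending.contains(&id) {
            self.pending.push(id);
        }
    }

    /// Called once per frame, after all scripts have run.
    fn end_frame(&mut self) {
        for id in self.pending.drain(..) {
            self.nodes[id] = None;
        }
    }
}
```

In this model a node passed to queue_free stays usable until end_frame runs, so other code touching it during the same frame does not trip over a dead slot.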
In a way, GDScript is similar to C++ with new/delete. GDScript provides some degree of memory safety. It has bounds checking on arrays and you can't allocate arbitrary blocks of memory. However, it's still possible to run into double-free bugs and dangling pointers.
func _ready() -> void:
    var node = Node2D.new()
    node.free()
    node.position = Vector2() # oops
    node.free() # double oops
This can get muddied by the fact that Godot allows for coroutines and shared references. This means that any reference to a node in the scene tree is unsafe to use after returning from a yield/await call. Luckily Godot provides the global function is_instance_valid(object):
var x = $Entities/X
var y = $Entities/Y
# Godot 4 renamed the SceneTree signal `idle_frame` to `process_frame`.
await get_tree().process_frame
# This is unsafe, we don't know if `x` holds a valid instance.
x.position = Vector2.ONE * 100
# This is safe, we have re-affirmed that `y` is alive.
if is_instance_valid(y):
    y.position = Vector2.ONE * 110
I have some damn ugly sections of code in one of my projects from this quirk of GDScript.
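For comparison, here is roughly the same "check before you touch it" pattern in Rust, where a Weak handle plays the role of the possibly stale reference and upgrade() plays the role of is_instance_valid. This is only an analogy under my own assumptions: Node2D here is a hypothetical struct, and Rust's Rc is reference counted, which Godot's Node is not.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node type with just a position.
struct Node2D {
    position: (f32, f32),
}

/// Moves the node only if it still exists; returns whether it did.
fn try_move(node: &Weak<RefCell<Node2D>>, to: (f32, f32)) -> bool {
    match node.upgrade() {
        Some(alive) => {
            alive.borrow_mut().position = to;
            true
        }
        None => false, // the owner freed the node while we were waiting
    }
}

/// Tries to move a node before and after its owner frees it.
fn move_before_and_after_free() -> (bool, bool) {
    let owner = Rc::new(RefCell::new(Node2D { position: (0.0, 0.0) }));
    let x = Rc::downgrade(&owner);
    let before = try_move(&x, (100.0, 100.0));
    drop(owner); // stands in for the scene tree freeing the node mid-await
    let after = try_move(&x, (110.0, 110.0));
    (before, after)
}
```

move_before_and_after_free() returns (true, false). The key difference from GDScript is that Rust will not let you skip the check, while GDScript leaves it up to you to remember.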
So as you can see, GDScript inherits some of the challenges that manual memory management carries. Additionally, this is compounded by GDScript's concurrency features. However, there is one language known for its fearless concurrency. And, it happens to have a scripting language.
Dyon
Dyon is a Rust-based scripting language created by the people behind Piston, the Rust-based game engine. Unlike Rust, Dyon uses semi-dynamic typing. I say "semi-dynamic" because once a variable has a type, it cannot be changed.
a := 10
a = "Hello, World" // this will result in an error
To be honest, this almost mirrors the way I've learned to minimize errors when working with dynamically typed code. But it may be unexpected for people used to freely changing the types of variables.
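One way to picture this rule is a dynamically typed slot that remembers which kind of value it holds and refuses anything else. The Rust sketch below is my own model of the behaviour, not how Dyon implements it. Value and assign are hypothetical names.

```rust
use std::mem::discriminant;

/// A tiny dynamic value type with three kinds.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(f64),
    Text(String),
    Bool(bool),
}

/// Assigns `new` only when it has the same kind as the current value.
fn assign(slot: &mut Value, new: Value) -> Result<(), String> {
    if discriminant(&*slot) == discriminant(&new) {
        *slot = new;
        Ok(())
    } else {
        Err(format!("type mismatch: cannot assign {:?} over {:?}", new, slot))
    }
}
```

With a slot holding Value::Num(10.0), assigning Value::Num(11.0) succeeds, while assigning Value::Text("Hello, World".to_string()) returns an Err and leaves the slot untouched.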
Additionally, Dyon has a fairly expressive syntax. It allows for both the classic C-style syntax and a more functional style.
// C-style
fn succ(x) -> {
    return x + 1
}
// Functional-style
sq(x) = x * x
Dyon has a lot of other neat features, but this isn't a review of Dyon. As you'd assume, Dyon has no GC. Instead, it uses lifetime analysis like Rust. All variables and inputs are checked for lifetimes:
fn fact(n: f64) -> f64 {
    product := 1
    for i [2, (n+1)) {
        product *= i
    }
    return clone(product)
}
In this snippet, to return the variable product, I have to clone it. Otherwise, the value held by it won't live long enough. Attempting to return it directly will produce this error from Dyon:
--- ERROR ---
In `main.dyon`:
`product` does not live long enough
37,12: return product // have to clone value as product
37,12: ^
Additionally, Dyon allows the programmer to mark the lifetimes of input variables.
fn stash(store, id, value: 'store) {
    store[id] = value
}
In this example, I have the input value: 'store. In Dyon this indicates that value will outlive store. This allows Dyon to safely give store a reference to value.
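Since Dyon borrows this idea from Rust, it may help to see the closest Rust equivalent. This is a sketch under my own assumptions rather than a translation: I use a Vec of string slices as the store, and stash_and_read is a hypothetical helper added just to exercise it.

```rust
/// The store may only hold references that live for `'v`, so `value`
/// is guaranteed to outlive every use of the store.
fn stash<'v>(store: &mut Vec<&'v str>, value: &'v str) {
    store.push(value);
}

/// Stashes a reference and reads it back through the store.
fn stash_and_read() -> usize {
    let value = String::from("hello"); // declared first, so dropped last
    let mut store: Vec<&str> = Vec::new();
    stash(&mut store, &value);
    // Declaring `value` in an inner scope that ends before this line is
    // rejected by rustc with "`value` does not live long enough".
    store[0].len()
}
```

The Rust signature spells out the same promise as value: 'store does in Dyon: whatever goes into the store has to stick around at least as long as the store does.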
However, since Dyon is dynamically typed, it cannot reason about types ahead of time. So, all values are subject to the same lifetime rules. Thus, I have to clone primitive values like bool, f64, and str.
Nesting objects and arrays gets a bit more complex. Internally, Dyon uses copy-on-write. This means that nested values that were not first created on the stack are shared until one side modifies them, at which point that side gets its own copy.
box_0 := { inner: { x: 10 } }
box_1 := box_0
box_0.inner.x += 1
println(box_0) // {inner: {x: 11}}
println(box_1) // {inner: {x: 10}}
Once we modify box_0, a copy of inner is created. However, values will be full references if all values have been made on the stack:
inner := { x: 0 }
box_0 := { }
box_0.inner := inner
inner.x += 1
println(box_0) // {inner: {x: 1 }}
Here, when inner is changed, box_0.inner is also changed.
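The copy-on-write half of this behaviour can be reproduced in Rust with Rc::make_mut, which clones the shared value only when someone else is still pointing at it. I don't know whether Dyon uses this exact mechanism internally, so treat the following as a sketch of the technique. Inner, Boxed, and cow_demo are hypothetical names.

```rust
use std::rc::Rc;

#[derive(Clone, Debug, PartialEq)]
struct Inner {
    x: i32,
}

/// An object whose nested value is shared until someone writes to it.
#[derive(Clone)]
struct Boxed {
    inner: Rc<Inner>,
}

/// Returns box_0's x, box_1's x, and whether both shared `inner`
/// before the write.
fn cow_demo() -> (i32, i32, bool) {
    let mut box_0 = Boxed { inner: Rc::new(Inner { x: 10 }) };
    let box_1 = box_0.clone(); // cheap: only the pointer is copied
    let shared_before = Rc::ptr_eq(&box_0.inner, &box_1.inner);
    // `make_mut` clones `Inner` here because `box_1` still points at it.
    Rc::make_mut(&mut box_0.inner).x += 1;
    (box_0.inner.x, box_1.inner.x, shared_before)
}
```

cow_demo() returns (11, 10, true), matching the first Dyon example: the two boxes share inner right up until box_0 writes to it.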
I find the idea of bringing lifetimes into a scripting language interesting. I would imagine that such analysis could lead to a fairly performant scripting language. Additionally, it could mitigate some of the foot guns that GDScript's model presents. However, in my early planning phase, this post began to grow.
My Search Space Expanded
This post originally got kicked off after pondering how weird GDScript was. It had "friendly" Python-like syntax and was gradually typed, yet featured no GC. I then remembered Dyon existed and decided to include it. Its role is much like Lua's. On its own, Dyon can't do much (well, aside from maybe a bit of 4D math). However, by wrapping it in a broader API, Dyon can be used as a full programming language.
But after posting my chart to Mastodon, people soon gave me a reason to expand my search.
Rhai
Rhai bills itself as an "embedded scripting language and engine for Rust". It appears to be inspired by or derived from ChaiScript (which is its namesake). Like Dyon, it also has no GC. However, unlike Dyon, Rhai does not make use of lifetimes. I don't have too much to say about Rhai. However, from my experiments, it seems to use copy-on-assignment. Moreover, these copies appear to be deep copies. For example, constructing a cons list:
fn cons(head) { return #{head: head}; }
fn cons(head, rest) { return #{head: head, rest: rest}; }
fn list_map(f) {
    // Use the implicit reference through `this`
    this.head = f.call(this.head);
    this.rest?.list_map(f);
}
let xs = cons(7, cons(11));
let ys = xs;
ys.list_map(|x| x * x);
print(`${xs}`); // [PRINT] #{"head": 7, "rest": #{"head": 11}}
print(`${ys}`); // [PRINT] #{"head": 49, "rest": #{"head": 121}}
As you can see here, the original list is completely untouched despite list_map using in-place mutation. By implementing variables in this way, Rhai can avoid needing any sort of memory tracking. Once a variable is reassigned or goes out of scope, the value it held can be freed. However, Rhai isn't alone in this trait.
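The same experiment can be written in Rust, where the deep copy has to be requested explicitly with clone(). This is a sketch of the value semantics I observed, not of Rhai's internals. Cons, map_in_place, to_vec, and deep_copy_demo are hypothetical names.

```rust
/// A cons list where every cell exclusively owns the rest of the list.
#[derive(Clone, Debug, PartialEq)]
struct Cons {
    head: i64,
    rest: Option<Box<Cons>>,
}

impl Cons {
    /// Applies `f` to every element, mutating the list in place.
    fn map_in_place(&mut self, f: fn(i64) -> i64) {
        self.head = f(self.head);
        if let Some(rest) = self.rest.as_mut() {
            rest.map_in_place(f);
        }
    }

    /// Flattens the list into a Vec for easy inspection.
    fn to_vec(&self) -> Vec<i64> {
        let mut out = vec![self.head];
        let mut cur = &self.rest;
        while let Some(node) = cur {
            out.push(node.head);
            cur = &node.rest;
        }
        out
    }
}

/// Copies a list, squares the copy in place, and returns both lists.
fn deep_copy_demo() -> (Vec<i64>, Vec<i64>) {
    let xs = Cons { head: 7, rest: Some(Box::new(Cons { head: 11, rest: None })) };
    let mut ys = xs.clone(); // deep copy: every cell is duplicated
    ys.map_in_place(|x| x * x);
    (xs.to_vec(), ys.to_vec())
}
```

deep_copy_demo() returns ([7, 11], [49, 121]). Because no cell is ever reachable from two variables, each list can be released the moment its one owner goes away, with no collector involved.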
Shell Scripting Languages, What is Going on in There?
This one comes from Julia
There exist a variety of shell scripting languages such as Bash, Zsh, and Fish. Shell scripting languages exist in an interesting position. They can avoid the need for GC for one simple reason: like Rhai, they don't have references.
Without references, the moment the value of a variable is changed its old value can be destroyed. Furthermore, values are always copied between variables. This can be demonstrated in the following Fish example:
set -l xs (seq 10)
set -l ys $xs
set xs[1] 0
echo $xs # 0 2 3 4 5 6 7 8 9 10
echo $ys # 1 2 3 4 5 6 7 8 9 10
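The output shows us that the values of ys remain unchanged. Additionally, Bash disallows nested lists, and the closest thing it has to a reference, the nameref (Bash 4.3 and later), aliases a variable's name rather than pointing at a value:
x=(1 2 3 4 5)
x[0]=(1 2 3 4 5) # Illegal: a list cannot be assigned to an array member.
declare -n y=x # Legal since Bash 4.3, but y is only another name for x.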
These restrictions appear to remove the need for GC. However, the memory story isn't that simple.
Some posts online indicate that you cannot trust Bash to release memory deterministically. Moreover, in my experiments with Fish, determining when memory is released has been quite hard. For example, the following program will allocate roughly 1 GB of RAM:
set -l values a b c e d f g h i j k l m n o p q r s t u v w x y z
for value in $values
    set -l $value (seq 524288)
end
The magic number of 524288 was experimentally found.
Now, if I run that loop a couple of times, I can see that memory usage never changes. Furthermore, I can set and unset various values, and the memory profile jumps all around. It's clear that Fish has some mechanism for freeing memory, but I can't put my finger on it.
Now I doubt shell scripting languages are designed for advanced memory management. A lot of tasks can be done with little to no allocation by the shell directly. And for many day-to-day usages, their processes are short-lived.
Now, shell scripts were one of two recommendations I looked into. However, after doing deeper digging, one of the recommendations sadly doesn't land in the elusive Top-Right Corner.
ZDoom's Scripting Languages
This one comes from Caws
After doing some digging around on ZDoom forums, wikis, docs, and a blog post about ZDoom, it appears that ZDoom implements a tracing garbage collector. Looking at ZDoom's source code, it does indeed contain a mark-and-sweep system:
// Taken from:
// https://github.com/ZDoom/gzdoom/blob/master/src/common/objects/dobjgc.cpp
void FullGC()
{
    if (State <= GCS_Propagate)
    {
        // Reset sweep mark to sweep all elements (returning them to white)
        SweepPos = &Root;
        // Reset other collector lists
        Gray = NULL;
        State = GCS_Sweep;
    }
    // Finish any pending sweep phase
    while (State != GCS_Finalize)
    {
        SingleStep();
    }
    MarkRoot();
    while (State != GCS_Pause)
    {
        SingleStep();
    }
    SetThreshold();
}
The full GC is apparently based on Lua's which is pretty neat.
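For anyone who hasn't met mark-and-sweep before, here is the basic idea as a toy collector in Rust. It is a from-scratch sketch and not a port of ZDoom's incremental collector: it runs both phases in one go, and Heap, Object, and gc_demo are hypothetical names.

```rust
/// Toy heap object: refers to other objects by slot index.
struct Object {
    refs: Vec<usize>,
    marked: bool,
}

struct Heap {
    objects: Vec<Option<Object>>, // None marks a freed slot
    roots: Vec<usize>,
}

impl Heap {
    fn new() -> Heap {
        Heap { objects: Vec::new(), roots: Vec::new() }
    }

    fn alloc(&mut self, refs: Vec<usize>) -> usize {
        self.objects.push(Some(Object { refs, marked: false }));
        self.objects.len() - 1
    }

    /// Mark phase: flag everything reachable from the roots.
    fn mark(&mut self) {
        let mut gray = self.roots.clone(); // reached but not yet scanned
        while let Some(id) = gray.pop() {
            if let Some(obj) = self.objects[id].as_mut() {
                if !obj.marked {
                    obj.marked = true;
                    gray.extend(obj.refs.iter().copied());
                }
            }
        }
    }

    /// Sweep phase: free unmarked objects and reset the marks on the rest.
    fn sweep(&mut self) -> usize {
        let mut freed = 0;
        for slot in self.objects.iter_mut() {
            let keep = match slot {
                Some(obj) => std::mem::replace(&mut obj.marked, false),
                None => continue,
            };
            if !keep {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }

    fn collect(&mut self) -> usize {
        self.mark();
        self.sweep()
    }
}

/// One rooted pair plus an unreachable cycle; returns the number freed,
/// whether the rooted leaf survived, and whether the cycle was freed.
fn gc_demo() -> (usize, bool, bool) {
    let mut heap = Heap::new();
    let leaf = heap.alloc(vec![]);
    let root = heap.alloc(vec![leaf]);
    let a = heap.alloc(vec![]);
    let b = heap.alloc(vec![a]);
    heap.objects[a].as_mut().unwrap().refs.push(b); // a <-> b cycle
    heap.roots.push(root);
    let freed = heap.collect();
    (freed, heap.objects[leaf].is_some(), heap.objects[a].is_none())
}
```

gc_demo() returns (2, true, true). The unreachable cycle is collected even though its two objects still point at each other, which is exactly the case that tracing collectors handle and plain reference counting does not.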
This sadly eliminates ZScript, DECORATE, and ACS from the elusive top-right corner. Since ZDoom is essentially their "runtime", all the languages are GC'ed by default. Although...
ZScript for ZBrush has Manual Memory Management?
Now, I accidentally found this out. Thanks to search engines and page ranking, I stumbled across information about ZBrush's scripting language. At first, I ignored it. I was looking up ZDoom stuff. Then I noticed the preview mentioned something peculiar: Memory Blocks.
According to the ZBrush docs and user guide, Memory Blocks are ZScript's solution to persistent values. They can even be saved to files and passed between scripts. Creating a block is as simple as the following:
[MemCreate, block, 1024, 0]
This is essentially a malloc(). In a DSL. For a digital sculpting program. And it also includes a free():
[MemDelete, block]
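To make the malloc()/free() comparison concrete, here is a small Rust model of named memory blocks. It is purely illustrative. I am assuming that the second and third MemCreate arguments are the block size in bytes and the initial fill value, and MemBlocks with its methods is a hypothetical name, not anything from ZBrush.

```rust
use std::collections::HashMap;

/// Hypothetical registry of named, manually managed byte blocks.
struct MemBlocks {
    blocks: HashMap<String, Vec<u8>>,
}

impl MemBlocks {
    fn new() -> MemBlocks {
        MemBlocks { blocks: HashMap::new() }
    }

    /// Like `[MemCreate, name, size, fill]`; false if the name is taken.
    fn create(&mut self, name: &str, size: usize, fill: u8) -> bool {
        if self.blocks.contains_key(name) {
            return false;
        }
        self.blocks.insert(name.to_string(), vec![fill; size]);
        true
    }

    /// Like `[MemDelete, name]`; false if there was nothing to delete.
    fn delete(&mut self, name: &str) -> bool {
        self.blocks.remove(name).is_some()
    }

    /// Size of a block in bytes, or None if it does not exist.
    fn size(&self, name: &str) -> Option<usize> {
        self.blocks.get(name).map(|block| block.len())
    }
}
```

In this model, a block that is never deleted lives as long as the registry does, which is the same deal as a forgotten free().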
This was a mildly surprising thing to stumble across. Out of all of the languages featured here, this random scripting language for an art program happens to allow directly allocating contiguous blocks of memory. It honestly seems like quite a powerful feature for an application-specific scripting language to have. It makes me wonder what other application-specific languages have such abilities. Tho, I think I'm going to end my exploration here.
There's Probably More, but Everything Has to End Eventually
Alright, so I imagine if I keep digging deeper I can find more and more instances. These scripting languages cover a wide array of programming situations. Some are built for game scripting. Others are built for task automation. But they all share that common trait of having no GC. They might opt out of garbage collection for performance reasons, determinism, or just simply not needing it by design. However, with the rise in languages forgoing garbage collection, there might be a rise in GC-less scripting languages to match. And I hope to see people experimenting with this niche little corner.