October status update

What's going on?

Unfortunately, not much.

Getting to my original goal of being able to do entire Advent of Code with the language itself (at least in time) seems pretty grim. With real life finding a way to spoil all the fun.

That is not to say that nothing has happened. It's just been slower than hoped. With me doing a year-long return trip to the University world and working part-time it hasn't left too many hours for Flower. I'm actually thinking that even if the compiler wouldn't be there in time for AoC, I could, and probably still write the code in Flower anyways. I'm not waiting a year for the next one, I want to be done by then.

Anyways, what has been going on with Flower the past month or so? It's mostly internal stuff of getting stuff into a point where I can build on a solid foundation.

Small-size optimised dynamic arrays

The amount of "TODO: this needs to be small-sized optimised later on"- notes was getting to a point where I decided that it was time to make that happen. There is a lot of small array use in the compiler, and SBO vector type was something I absolutely wanted. (And as a bonus, it will get to my ftl library when I can get it to a sane state for freestanding subset of C++)

Most of the use of std::vector will probably get replaced by this smallvec type slowly.

Why not keep using `std::vector`s

The short story is, that std::vector is not small-size optimised, and it will never be.

The C++ standard mandates that a std::swap(v1, v2) operation may not invalidate existing iterators and/or references to either of the std::vector instances v1 or v2. This basically prevents the SBO for std::vector. Since I don't have to follow the standard with my own containers, I can ignore this. (Though note: this makes my smallvec not a drop-in replacement, even if it mostly follows the same API!)

Why did I want small-size optimised dynamic arrays? Well, a lot of data in Flower compiler is held in std::vector instances, which are often just one to three elements long. Since standard vectors are not using SBO, every vector needs to be heap-allocated and heap-allocating lots of small objects that are used often is a recipe for some serious cache trashing.

So what exactly SBO does?

What small-buffer optimisation does, is that since a vector (or dynamic storage of any kind) requires at least two pointers to the memory region it uses, the vector itself has a size outside of just the elements it contains. We can use sizeof(std::vector<T>) to check out how much space the vector itself needs. For libstdc++ and libc++ both, that size seems to be 24 bytes (for x86_64, at least). We could use that memory instead of doing a dynamic allocation if it can hold our data. This is what SBO is.

The implementation

The smallvec I wrote is currently around 500 lines of C++20 code (435 to be exact), and it is somewhat STL-compatible. It needs some polish and edge case handling with nontrivial types, but for now it handles are the cases needed by the compiler. I intend to improve it still somewhat, but the main bits are going to stay the same at this point. Like regular std::vector, sizeof(flvm::smallvec<T>) is 24 bytes for x86_64 machine. It holds two T* pointers and one std::size_t, so its size is equal to 2 * sizeof(T*) + sizeof(std::size_t) to be more precise.

The std::size_t hold information like this

struct smallvec_metadata
{
    constexpr static std::size_t
        SMALL_BIT =
            (1ull << 8 * (sizeof(std::size_t) - 1));

    constexpr bool 
        is_small() const noexcept
        { return !(meta & SMALL_BIT); }

    constexpr std::size_t 
        small_size() const noexcept
        { return (meta & ~SMALL_BIT); }

    constexpr std::size_t
        big_cap() const noexcept
        { return (meta & ~SMALL_BIT); }

    constexpr void set_small(bool state) noexcept {
        meta &= ~SMALL_BIT;
        if (not state)
            meta |= SMALL_BIT;
    }

    std::size_t meta{};
};

So, basically I stole one bit to mark if we are currently using the small-size optimised version or not. This drops the max size of our vector by a factor of two, but I can't imagine an use case where that would become an issue without virtual address space becoming a problem anyways so we can just ignore that "flaw".

The other bits of the metadata are just either the size() of the vector in case the vector is currently small-size-optimised, or its capacity() if not.

This allows us to have a smallvec that handles most of what the standard std::vector would, with the drawback of having less guarantees about iterator/reference invalidation (or exception safety, our smallvec is definitely not exception-safe currently).

The smallvec itself has signature of

template <typename T,
          std::size_t SBO_Size = SBO_MINIMUM_SIZE,
          typename Allocator_Type = std::allocator<T>>
class smallvec;

Where SBO_Size can be adjusted if we want to use more stack space before the dynamic allocation kicks in. We also use standard allocators for dynamic memory management.

All in all, pretty happy with this one for now. Needs a bit of polish but the basics are there, and I can do the polishing later™.

Variable-sized integers

Another thing I've been working on is a variable-sized-integer (or bigint) type. Since currently the default integer type for Flower integers is a bigint, needing to write this myself shouldn't come as a too much of surprise. I basically want this now so I can handle all the integer types in both the VM and the compiler with same code, and can use the normal integer code paths for these.

The type is nothing to write home about, currently it's a simple sign + smallvec to handle everything. It probably needs some severe optimisation for the VM part later on, but "make it work"-part is currently the more important one, as there is some pretty basic functionality still missing (it just handles +, -, *, / and conversions currently.) I hope to get these to usable state early November so I can get back to the meat of the compiler. And try to have a chance to actually have the compiler running in December.

Compiler and VM tests

Another thing I've been working on has been the testing part for the compiler. Now that I've got my own data structures, they especially needed some actual unit tests.

The test suite is currently quite small (~300 assertions), one of which fails, but it has already been very useful in me finding a couple of bugs in my code. And this is the point where I was absolutely certain I needed to start writing the test suite ASAP. Though it only tests VM/compiler internals at the moment, it needs to be expanded to actually check that programs conform to the specification too.

The specification

While it's in very, very early stage, the first couple of words of the Flower specification were written down. The specification is "inspired" by C++, C, and Hare specifications, and needs a lot of work, but I felt it was necessary to start first steps of that too. Just so it's there, even if it's not stable in any shape or form.

I feel the language specification is as important part of a proper language as a compiler, so well. It's a start. There are some things I certainly wanted (C++'s as-if-rule), and the general idea for the spec is that I can keep it as open as possible (but not a drop more than that). The best way I found for that (I'm sorry C++ people) was to describe the abstract machine, like the C++ standard does. On the other hand, I find it to be specific enough that you don't need to do guesswork. I'll probably include a lot more examples than any of the three standards I've used as a source of inspiration so far.

In any case the goal for the spec is not a low page count, it's ease of understanding and clarity.

And what's up next?

Fixing the problems found with the tests, finishing varint type and trying desperately to get the compiler / VM to a good enough state for Advent of Code.

Probably not going to happen, but a man can dream.

Posted 2022-10-31 in status implementation