October status update
What's going on?
Unfortunately, not much.
Getting to my original goal of being able to do entire Advent of Code with the language itself (at least in time) seems pretty grim. With real life finding a way to spoil all the fun.
That is not to say that nothing has happened. It's just been slower than hoped. With me doing a year-long return trip to the University world and working part-time it hasn't left too many hours for Flower. I'm actually thinking that even if the compiler wouldn't be there in time for AoC, I could, and probably still write the code in Flower anyways. I'm not waiting a year for the next one, I want to be done by then.
Anyways, what has been going on with Flower the past month or so? It's mostly internal stuff of getting stuff into a point where I can build on a solid foundation.
Small-size optimised dynamic arrays
The amount of "TODO: this needs to be small-sized optimised later on"-
notes was getting to a point where I decided that it was time to make
that happen. There is a lot of small array use in the compiler, and
SBO vector type was something I absolutely wanted. (And as a bonus,
it will get to my ftl
library when
I can get it to a sane state for freestanding subset of C++)
Most of the use of std::vector
will probably get replaced by this
smallvec
type slowly.
Why not keep using std::vector
s
The short story is, that std::vector
is not small-size optimised, and
it will never be.
The C++ standard mandates that a std::swap(v1, v2)
operation may not invalidate
existing iterators and/or references to either of the std::vector
instances v1
or v2
. This basically prevents the SBO for std::vector
. Since I don't have
to follow the standard with my own containers, I can ignore this. (Though note:
this makes my smallvec
not a drop-in replacement, even if it mostly follows
the same API!)
Why did I want small-size optimised dynamic arrays? Well, a lot of data in
Flower compiler is held in std::vector
instances, which are often just one
to three elements long. Since standard vectors are not using SBO, every vector
needs to be heap-allocated and heap-allocating lots of small objects that are
used often is a recipe for some serious cache trashing.
So what exactly SBO does?
What small-buffer optimisation does, is that since a vector (or dynamic storage
of any kind) requires at least two pointers to the memory region it uses, the
vector itself has a size outside of just the elements it contains. We can use
sizeof(std::vector<T>)
to check out how much space the vector itself needs.
For libstdc++ and libc++ both, that size seems to be 24 bytes (for x86_64, at
least). We could use that memory instead of doing a dynamic allocation if it
can hold our data. This is what SBO is.
The implementation
The smallvec
I wrote is currently around 500 lines of C++20 code (435 to be exact),
and it is somewhat STL-compatible. It needs some polish and edge case handling
with nontrivial types, but for now it handles are the cases needed by the compiler.
I intend to improve it still somewhat, but the main bits are going to stay the same
at this point. Like regular std::vector
, sizeof(flvm::smallvec<T>)
is 24 bytes
for x86_64 machine. It holds two T*
pointers and one std::size_t
, so its size
is equal to 2 * sizeof(T*) + sizeof(std::size_t)
to be more precise.
The std::size_t
hold information like this
struct smallvec_metadata
{
constexpr static std::size_t
SMALL_BIT =
(1ull << 8 * (sizeof(std::size_t) - 1));
constexpr bool
is_small() const noexcept
{ return !(meta & SMALL_BIT); }
constexpr std::size_t
small_size() const noexcept
{ return (meta & ~SMALL_BIT); }
constexpr std::size_t
big_cap() const noexcept
{ return (meta & ~SMALL_BIT); }
constexpr void set_small(bool state) noexcept {
meta &= ~SMALL_BIT;
if (not state)
meta |= SMALL_BIT;
}
std::size_t meta{};
};
So, basically I stole one bit to mark if we are currently using the small-size optimised version or not. This drops the max size of our vector by a factor of two, but I can't imagine an use case where that would become an issue without virtual address space becoming a problem anyways so we can just ignore that "flaw".
The other bits of the metadata are just either the size()
of the vector in
case the vector is currently small-size-optimised, or its capacity()
if not.
This allows us to have a smallvec
that handles most of what the standard
std::vector
would, with the drawback of having less guarantees about iterator/reference
invalidation (or exception safety, our smallvec
is definitely not exception-safe
currently).
The smallvec itself has signature of
template <typename T,
std::size_t SBO_Size = SBO_MINIMUM_SIZE,
typename Allocator_Type = std::allocator<T>>
class smallvec;
Where SBO_Size
can be adjusted if we want to use more stack space before the
dynamic allocation kicks in. We also use standard allocators for dynamic memory
management.
All in all, pretty happy with this one for now. Needs a bit of polish but the basics are there, and I can do the polishing laterâ„¢.
Variable-sized integers
Another thing I've been working on is a variable-sized-integer (or bigint) type. Since currently the default integer type for Flower integers is a bigint, needing to write this myself shouldn't come as a too much of surprise. I basically want this now so I can handle all the integer types in both the VM and the compiler with same code, and can use the normal integer code paths for these.
The type is nothing to write home about, currently it's a simple sign + smallvec to handle everything. It probably needs some severe optimisation for the VM part later on, but "make it work"-part is currently the more important one, as there is some pretty basic functionality still missing (it just handles +, -, *, / and conversions currently.) I hope to get these to usable state early November so I can get back to the meat of the compiler. And try to have a chance to actually have the compiler running in December.
Compiler and VM tests
Another thing I've been working on has been the testing part for the compiler. Now that I've got my own data structures, they especially needed some actual unit tests.
The test suite is currently quite small (~300 assertions), one of which fails, but it has already been very useful in me finding a couple of bugs in my code. And this is the point where I was absolutely certain I needed to start writing the test suite ASAP. Though it only tests VM/compiler internals at the moment, it needs to be expanded to actually check that programs conform to the specification too.
The specification
While it's in very, very early stage, the first couple of words of the Flower specification were written down. The specification is "inspired" by C++, C, and Hare specifications, and needs a lot of work, but I felt it was necessary to start first steps of that too. Just so it's there, even if it's not stable in any shape or form.
I feel the language specification is as important part of a proper language as a compiler, so well. It's a start. There are some things I certainly wanted (C++'s as-if-rule), and the general idea for the spec is that I can keep it as open as possible (but not a drop more than that). The best way I found for that (I'm sorry C++ people) was to describe the abstract machine, like the C++ standard does. On the other hand, I find it to be specific enough that you don't need to do guesswork. I'll probably include a lot more examples than any of the three standards I've used as a source of inspiration so far.
In any case the goal for the spec is not a low page count, it's ease of understanding and clarity.
And what's up next?
Fixing the problems found with the tests, finishing varint type and trying desperately to get the compiler / VM to a good enough state for Advent of Code.
Probably not going to happen, but a man can dream.