Flower is a programming language

November 2022 status update

What's going on?

November was unfortunately as slow as I feared. To top that, I managed to find not one, but two pretty critical design flaws in the bytecode format I had used. That wasn't too surprising since the entire bytecode was a pretty ad-hoc design, but I had hoped I could push the redesign until after Advent of Code. No such luck.

So. AoC will not be happening with Flower. I might sketch out solutions with it anyway, though, and if I do, I'll probably post about them if I can muster the time.

But some stuff actually still happened in November. Unfortunately most of that is pretty low-level stuff still, and probably not very interesting. Nevertheless, I owe it to myself to keep writing these status posts.

Variable-sized integer type

The goal was to get the varint type working in early November and that actually did happen. It was pretty close to done in October already, but it needed some fixes once I wrote tests for it. Serialisation support to/from bytecode also got added. It will definitely still require more work in the future, but for now it's satisfactory.

The compiler also handles them now with some grace, e.g. converts them internally to fixed-size integers in simple cases. It's not much yet, but it's a start.
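The post doesn't say how a varint is laid out in bytecode or how the narrowing is decided, so the following is only a sketch under assumptions. It uses unsigned LEB128, a common byte encoding for variable-length integers, and caps values at 64 bits for brevity. The names encode_uleb128, decode_uleb128 and narrow are hypothetical and not part of Flower.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helpers for illustration; not Flower's actual varint API.

// Append `value` to `out` as unsigned LEB128: 7 payload bits per byte,
// with the high bit set on every byte except the last one.
inline void encode_uleb128(std::uint64_t value, std::vector<std::uint8_t>& out) {
    do {
        std::uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0) byte |= 0x80;  // more bytes follow
        out.push_back(byte);
    } while (value != 0);
}

// Decode one unsigned LEB128 value starting at `pos` and advance `pos`.
// Returns nullopt on truncated input or when the value needs more than 64 bits.
inline std::optional<std::uint64_t> decode_uleb128(const std::vector<std::uint8_t>& in,
                                                   std::size_t& pos) {
    std::uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return std::nullopt;  // ran out of bytes
        const std::uint8_t byte = in[pos++];
        const std::uint64_t payload = byte & 0x7f;
        if (shift == 63 && payload > 1) return std::nullopt;  // would overflow 64 bits
        result |= payload << shift;
        if ((byte & 0x80) == 0) return result;  // last byte of this value
    }
    return std::nullopt;  // continuation bit still set after 10 bytes
}

// The "simple case" of narrowing: use a fixed-size type when the value fits.
template <typename T>
std::optional<T> narrow(std::uint64_t value) {
    if (value > static_cast<std::uint64_t>(std::numeric_limits<T>::max())) return std::nullopt;
    return static_cast<T>(value);
}
```

With this encoding, 300 takes two bytes (0xAC 0x02), and narrow<std::uint8_t>(300) yields no value because 300 doesn't fit in 8 bits.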

More very basic custom types

There's a surprising number of custom types I've had to cobble together while working on the compiler. So far, I've written an rtvec, shelf and smallvec, and now buffer and commandline join the fray.

buffer

I'm by no means new to writing standard-compatible containers, but I was a bit surprised I needed to write the buffer class. It got its start when I realised that for all the joy <filesystem> has brought, there still isn't sane file I/O in C++.

Seriously. This is the example from cppreference's basic_fstream page:

#include <iostream>
#include <fstream>
#include <string>
 
int main() {
  std::string filename = "test.bin";
  std::fstream s(filename, s.binary | s.trunc | s.in | s.out);
  if (!s.is_open()) {
    std::cout << "failed to open " << filename << '\n';
  } else {
    // write
    double d = 3.14;
    s.write(reinterpret_cast<char*>(&d), sizeof d); // binary output
    s << 123 << "abc";                              // text output
 
    // for fstream, this moves the file position pointer (both put and get)
    s.seekp(0);
 
    // read
    s.read(reinterpret_cast<char*>(&d), sizeof d); // binary input
    int n;
    std::string str;
    if (s >> n >> str)                             // text input
      std::cout << "read back from file: " << d << ' ' << n << ' ' << str << '\n';
  }
}

wtf

So, I made a generic memory buffer class that arbitrary data can be written to without worrying too much about data sizes, types or pointers, and that can then be dumped to or read from a file to avoid this insanity. I have more than enough reinterpret_casts (61 to be exact) in my code already, thank you. And some of them will absolutely blow up the second somebody tries to run the compiler on a big-endian machine.
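To make the idea concrete, here is a minimal sketch of what such a buffer could look like. It is not the actual Flower buffer class: the name byte_buffer and every member function are assumptions, and it only handles integers. Values are written byte by byte in little-endian order, so it needs no reinterpret_cast and behaves the same on a big-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical sketch; not the real flower buffer class.
class byte_buffer {
public:
    // Append an integer as little-endian bytes, independent of host byte order.
    template <typename T>
    void write(T value) {
        static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                      "this sketch only handles integer types");
        using U = std::make_unsigned_t<T>;
        const U bits = static_cast<U>(value);
        for (std::size_t i = 0; i < sizeof(T); ++i)
            data_.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }

    // Read the next integer back in the same byte order; throws if too few bytes remain.
    template <typename T>
    T read() {
        static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                      "this sketch only handles integer types");
        using U = std::make_unsigned_t<T>;
        if (data_.size() - cursor_ < sizeof(T))
            throw std::out_of_range("byte_buffer: read past end");
        U bits = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            bits = static_cast<U>(bits | (static_cast<U>(data_[cursor_++]) << (8 * i)));
        return static_cast<T>(bits);
    }

    // Dump the whole buffer to a file; returns false on any I/O error.
    bool save(const std::string& path) const {
        std::FILE* f = std::fopen(path.c_str(), "wb");
        if (!f) return false;
        std::size_t written = 0;
        if (!data_.empty()) written = std::fwrite(data_.data(), 1, data_.size(), f);
        const bool closed = std::fclose(f) == 0;
        return closed && written == data_.size();
    }

    // Replace the contents with a file's bytes and rewind the read cursor.
    bool load(const std::string& path) {
        std::FILE* f = std::fopen(path.c_str(), "rb");
        if (!f) return false;
        data_.clear();
        cursor_ = 0;
        std::uint8_t chunk[4096];
        std::size_t n = 0;
        while ((n = std::fread(chunk, 1, sizeof chunk, f)) > 0)
            data_.insert(data_.end(), chunk, chunk + n);
        const bool ok = std::ferror(f) == 0;
        std::fclose(f);
        return ok;
    }

    const std::vector<std::uint8_t>& bytes() const { return data_; }

private:
    std::vector<std::uint8_t> data_;
    std::size_t cursor_ = 0;  // position of the next read
};
```

The shift-based byte order is a design choice for this sketch: it trades a little speed for the guarantee that a file written on one machine reads back identically on another.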

commandline

Another thing I needed was a common way to handle command line arguments. Both the bytecode compiler and the VM require them, and I didn't want to "just hack something fast" again. So, I wrote a very simple command line parser that can be used like this:

flower::util::commandline cli(argc, argv);

cli.check("--lexonly", [&](){
    runconf.request_result = compiler_step::lexer;
});
cli.check("--debug", [&](){
    runconf.verbosity = verbosity_level::full;
});
cli.check("--print-symbols", [&](){
    runconf.print_symbol_table = true;
});
cli.check('o', [&](std::string_view file){
    runconf.bytecode_output = file;
});
cli.check('t', [&](){
    runconf.print_symbol_table = true;
});

auto positional_args = cli.unused_arguments();

It didn't take me more than a couple of hours, but I am pretty happy with how simple it ended up being, so there's that.
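For readers wondering how an interface like the one above might work inside, here is one possible sketch. It is a guess, not the code in include/common/cli.hpp: the class name cli_sketch is made up, short options are assumed to be spelled with a single dash, and an option's value is assumed to be the argument that follows it. The callback's signature decides whether the option takes a value.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical sketch of a check()-style parser; not flower::util::commandline.
class cli_sketch {
public:
    cli_sketch(int argc, const char* const* argv) {
        // Skip argv[0], which is the program name.
        for (int i = 1; i < argc; ++i) args_.push_back(arg{argv[i], false});
    }

    // Long option, e.g. "--debug".
    template <typename F>
    void check(std::string_view flag, F on_match) { match(std::string(flag), on_match); }

    // Short option, e.g. 'o' matches "-o" (an assumption of this sketch).
    template <typename F>
    void check(char flag, F on_match) { match(std::string{'-', flag}, on_match); }

    // Everything no check() call has claimed, in original order.
    std::vector<std::string> unused_arguments() const {
        std::vector<std::string> rest;
        for (const arg& a : args_)
            if (!a.used) rest.push_back(a.text);
        return rest;
    }

private:
    struct arg {
        std::string text;
        bool used;
    };

    template <typename F>
    void match(const std::string& name, F& on_match) {
        for (std::size_t i = 0; i < args_.size(); ++i) {
            if (args_[i].used || args_[i].text != name) continue;
            args_[i].used = true;
            if constexpr (std::is_invocable_v<F&, std::string_view>) {
                // Callback wants a value: consume the next argument if there is one.
                // A missing value is silently ignored in this sketch.
                if (i + 1 < args_.size() && !args_[i + 1].used) {
                    args_[i + 1].used = true;
                    on_match(std::string_view(args_[i + 1].text));
                }
            } else {
                on_match();  // plain switch without a value
            }
        }
    }

    std::vector<arg> args_;
};
```

A real parser would also want to report a missing value and probably handle forms like --output=file, both of which this sketch leaves out.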

Bytecode redesign / VM rework

Designing a bytecode format is hard.

I would love to have a bytecode format that is both fast to run in the VM internally, and easy to convert to LLVM bitcode, QBE or even just C.

Turns out these goals are not easy to fit together, especially since I'm more familiar with assembly languages than with the bytecode formats that virtual machines commonly use.

The first real attempt (already scrapped in September) was completely fine at running the code fast, but it was way too close to the metal. I realised pretty quickly that all the information lost in the translation made it pretty close to impossible to either optimise or convert to anything other than straight assembly instructions.

So I made a second attempt, and now I realise that it's still too close to bare metal. Granted, this one probably could've been saved for now by including a bunch of additional metadata. There is already some metadata that will be needed for onward conversions and debug info, but it would've still made quite a mess.

So. Once again I went back to the specifications for LLVM bitcode. This time I also took glances at both the Java and Lua VMs. After the realisation that I probably don't need to be lower-level than the Java VM, I scrapped almost the entire bytecode format I had put in place.

The codegen changes were pretty minimal. From the first time around I knew that I wouldn't get it right on the first attempt, and the codegen parts didn't really have too many magic numbers or anything like that. The VM, on the other hand, needed some more rework with its instructions.
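As a rough illustration of the abstraction level the Java VM works at, here is a tiny stack-based interpreter loop. It is not Flower's bytecode: the opcodes, their encoding and the function name run_sketch are all invented for this example. The point is that the instructions name operations on values rather than registers or addresses, which leaves room for a later translation to something like LLVM or C.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes; the real Flower instruction set is not shown in this post.
enum class opcode : std::uint8_t {
    push_i8,  // followed by one signed 8-bit immediate
    add,      // pop b, pop a, push a + b
    mul,      // pop b, pop a, push a * b
    ret       // pop the result and stop
};

// Interpret `code` on an operand stack and return the value passed to ret.
inline std::int64_t run_sketch(const std::vector<std::uint8_t>& code) {
    std::vector<std::int64_t> stack;
    auto pop = [&stack] {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        const std::int64_t v = stack.back();
        stack.pop_back();
        return v;
    };

    std::size_t pc = 0;  // index of the next byte to decode
    while (pc < code.size()) {
        switch (static_cast<opcode>(code[pc++])) {
        case opcode::push_i8:
            if (pc >= code.size()) throw std::runtime_error("truncated operand");
            stack.push_back(static_cast<std::int8_t>(code[pc++]));
            break;
        case opcode::add: {
            const std::int64_t b = pop();
            const std::int64_t a = pop();
            stack.push_back(a + b);
            break;
        }
        case opcode::mul: {
            const std::int64_t b = pop();
            const std::int64_t a = pop();
            stack.push_back(a * b);
            break;
        }
        case opcode::ret:
            return pop();
        default:
            throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("code ended without ret");
}
```

Feeding it the byte sequence for push 2, push 3, add, push 4, mul, ret evaluates (2 + 3) * 4.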

Language test suite

It hasn't been merged into the master branch yet, but some work has been going on to actually compile and run some programs and check their outputs.

For now this is done by a terribly insecure Python script that searches for other Python files in a given path and executes functions in them in a way more indiscriminate manner than I'm comfortable with. It works nicely, though, and adding new tests is extremely simple, so it probably stays.

Stats!

So, unfortunately, nothing too exciting has happened. But I thought it'd be at least somewhat interesting for some if I included some actual stats of the git repository here.

There were 40 commits in total to the compiler / VM repository development branch in November (as of 2022-11-24; more are still going to happen this month).

I decided to include git stats from two diffs between refs. Together they are representative of what happened in November in total. There are two sets of diffs because the first set got rebased, and with the dates getting modified it got a bit harder to find the end of October / start of November split.

 include/flowc/codegen/flir.hpp      |   6 +++
 include/flvm/bytecode.hpp           |  21 +++++-----
 include/flvm/flvm.hpp               |  49 ++++++++++++++--------
 include/flvm/program.hpp            |  24 ++---------
 include/flvm/smallvec.hpp           |   8 +++-
 include/flvm/type.hpp               |  34 +++++++++++-----
 include/flvm/utility.hpp            | 179 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/flvm/value.hpp              |  32 +++++++++++++++
 include/flvm/varint.hpp             |  86 ++++++++++++++++++++++++++++++++++----
 src/codegen/flir.cpp                | 163 +++++++++++++++++++++++++++++++++++-------------------------------------
 src/common_util/debug.hpp           |  12 +++---
 src/context/compiler_context.cpp    |   7 ++--
 src/context/scope.cpp               |   4 +-
 src/context/symbol_table.cpp        |   7 +++-
 src/flvm/bytecode.cpp               |   3 ++
 src/flvm/disassembler.cpp           |   4 +-
 src/flvm/flvm_cli.cpp               |  91 +++++++++++++++++++++++++++++++++++++++++
 src/flvm/meson.build                |  18 +++++++-
 src/flvm/program.cpp                |  99 ++++++++++++++++++++++++++++++++++++++++++++
 src/flvm/type.cpp                   |  56 -------------------------
 src/flvm/value.cpp                  | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/flvm/vm.cpp                     | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 src/meson.build                     |   6 +++
 src/semantic/generate_root_asg.cpp  |   4 +-
 tests/flvm/buffer.cpp               |  97 +++++++++++++++++++++++++++++++++++++++++++
 tests/flvm/smallvec.cpp             |  29 +++++++++++++
 tests/flvm/varint.cpp               |  15 +++++++
 tests/internal/flir/conversions.cpp |  32 +++++++++++++++
 tests/internal/meson.build          |  13 ++++++
 tests/meson.build                   |   1 +
 tools/flc/flc.cpp                   |   3 +-
 31 files changed, 1114 insertions(+), 240 deletions(-)

 include/common/cli.hpp        | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/flvm/bytecode.hpp     | 180 ++++++++++++++++++++++++++++++++++++--------------------------------------------------
 include/flvm/disassembler.hpp |   4 +-
 include/flvm/flvm.hpp         |  64 ++++++++++---------------------
 include/flvm/function.hpp     |   4 +-
 include/flvm/program.hpp      |   8 ++++
 include/flvm/type.hpp         |  35 +----------------
 include/flvm/utility.hpp      |  13 +++++++
 include/flvm/value.hpp        |   8 ++--
 src/codegen/flir.cpp          |  63 +++++++++++++++---------------
 src/flvm/disassembler.cpp     |  96 ++++++++++++++++++++++++++++------------------
 src/flvm/flvm_cli.cpp         |  57 ++++++++++-----------------
 src/flvm/function.cpp         |   2 +-
 src/flvm/program.cpp          |   4 +-
 src/flvm/value.cpp            |  14 +++++--
 src/flvm/vm.cpp               | 150 ++++++++++++++++++++++++++++++++++++++++-------------------------------
 tests/flvm/meson.build        |  13 +++++++
 tests/meson.build             |   1 +
 tools/flc/flc.cpp             |  90 +++++++++++--------------------------------
 19 files changed, 490 insertions(+), 433 deletions(-)

Posted 2022-11-24 in status