libfud

Published on 6 October 2024.

Last modified 27 October 2024.

This is my personal library of functions, effectively a C++ wrapper around C and C++ functions and structures written to my tastes.

All snippets on this page are licensed under the Apache License, Version 2.0 and copyrighted by myself, Dominick Allen. If you are reading this, I expect you to have a nominal understanding of C++, including using namespaces and templates.

1. Motivations: Why C++? Why my own library?

1.1. What I like about C++

I like C++. Mostly. Is it Stockholm syndrome? Maybe, but there are aspects of C++ I find hard to resist:

  • Fine tuned control of hardware and access to the underlying OS, and high performance.
  • Strong typing. Stronger than plain C, at least. Higher expressivity than C - scoped enumerations, closures, namespaces, and methods are very nice features.
  • High compatibility with C - with notable exceptions such as std::atomic vs the _Atomic type specifier.
  • Access to many libraries, for C and C++. The ecosystem is mature compared to languages like Rust and Zig, and much larger than languages like D.
  • The object model, flawed as it may be, enables RAII with destructors. Is RAII universally good? Zig eschews 'hidden' destructors, and uses defer and errdefer to make control flow explicit. The creator of Odin has some excellent commentary about maximizing explicitness vs minimizing implicitness. Ultimately, I am in favor of destructors, especially given the lack of defer and errdefer in C++.
  • The compiler will allow you to do stupid things when it comes to the lifetimes of objects. This isn't as bad as you might think, but I will admit that it increases the difficulty of judging how correct a program is.
  • As bloated as the spec may be, it is a spec, and there are multiple compilers available for it.

1.2. What I don't like about C++

As much as I like C++, I have issues with some parts of C++ culture, and I definitely have problems with the standard libraries of C and C++.

  • Exceptions are terrible for reasoning about code. Rather than goto, they are comefrom. Of course, this is a simplistic, hyperbolic statement, and there may be valid reasons to use exceptions, but I abhor their liberal proliferation throughout a codebase.
  • The STL is rife with exceptions. Containers may throw exceptions on allocation failure, and although this may be overcome with polymorphic allocators in the pmr namespace, the STL's algorithms still may perform arbitrary allocations and throw other exceptions. I recommend that you read the EASTL (yes, that EA) motivations to see additional, sometimes dated, criticisms of the STL.
  • Streaming operations in C++ suck for various reasons. Consider the following:

    // primary's streamId, sequence, and length members are arrays of uint8_t
    output << std::hex
        << "PacketId: 0x" << static_cast<uint16_t>(primary.streamId[0])
        << static_cast<uint16_t>(primary.streamId[1]) << "\n"
        << "Sequence: 0x" << static_cast<uint16_t>(primary.sequence[0])
        << static_cast<uint16_t>(primary.sequence[1]) << "\n"
        << "Length: 0x" << static_cast<uint16_t>(primary.length[0])
        << static_cast<uint16_t>(primary.length[1])
        << std::dec
        << " (" << definition.length << ")\n";
    

    If you are wondering why I've cast the uint8_t fields, it is because stream operators interpret them, correctly, as characters, rather than integers. This is a consequence of a shallow abstraction, rather than any deeper problem. Contrast with format:

    output = std::format(
        "PacketId: 0x{:02X}{:02X}\n"
        "Sequence: 0x{:02X}{:02X}\n"
        "Length: 0x{:02X}{:02X} ({})\n",
        primary.streamId[0], primary.streamId[1],
        primary.sequence[0], primary.sequence[1],
        primary.length[0], primary.length[1], definition.length);
    

    The format approach actually does more than the stream, since it succinctly and non-statefully sets the field width in addition to specifying a hex output format.

  • std::filesystem stands out as a particularly bad part of the standard library, mixing exceptions and error codes inappropriately, and is prone to TOCTOU bugs.
  • The C standard library is full of warts too. C strings are a trillion dollar disaster. There are other bad decisions like atoi, getc, gets, asctime, gethostbyname, setjmp/longjmp, strncpy, strtok, scanf, and the whole family of printf functions.
  • Character classification is broken by the design of the API, taking integer arguments, but causing undefined behavior if the input is negative.
  • Locales were a mistake.
  • errno is a significant cause of pain in C programming. Usually users only need to check it on error returns, but for functions like strtol where the range of the output covers all possible sentinels, users must explicitly set errno to 0 before calling the function to determine if an error has happened.
  • Both the C and C++ standard libraries happily make allocations in a way that the user has no control over, often in places where a user may not consider it. For example, the parallel (execution policy) overloads of std::find and std::transform can throw std::bad_alloc, but don't accept allocators. std::function had a poor record of support for custom allocators, and support was removed for it in C++17. Under the hood, fopen and its ilk are using an allocator without any user control.
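To make the character classification complaint concrete, here is a minimal sketch of the usual workaround: convert through unsigned char before calling into <cctype>. The names isDigitSafe and countDigits are hypothetical and are not part of libfud.

```cpp
#include <cctype>   // std::isdigit
#include <cstddef>  // std::size_t

// Hypothetical helper, not part of libfud.
// std::isdigit takes an int whose value must be representable as
// unsigned char (or be EOF). Where plain char is signed, a byte >= 0x80
// is negative, so it is converted through unsigned char first.
inline bool isDigitSafe(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Counts decimal digits in a null terminated string.
// A null pointer is treated as an empty string.
inline std::size_t countDigits(const char* text)
{
    std::size_t count = 0;
    if (text == nullptr) {
        return count;
    }
    for (; *text != '\0'; ++text) {
        if (isDigitSafe(*text)) {
            ++count;
        }
    }
    return count;
}
```

The cast is easy to forget, which is the point of the complaint: the unsafe call compiles cleanly and only misbehaves on non-ASCII input.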

1.3. Why I am Writing My Own Library in C++

Simply put, I want to overcome the problems I listed above in a way that minimizes my pain. I also want to stroke my own ego, so that's why I'm really doing it. I don't want you to use my library; there are better alternatives. Consider using Abseil, the Embedded Template Library, the EA STL, Frozen, or even rolling your own and mixing and matching. My goals for the library are:

  • Minimize the provocation of undefined behavior. I don't want to run into UB accidentally. This means that any system interface call where an invalid input invokes undefined behavior should get its own wrapper to check its input.
  • Simplify checking the returns and outputs of functions to know from the caller whether the call has succeeded, failed, or is in a state of partial success/failure.
  • Use the behavior of destructors to manage resources that need to be released safely.
  • Stridently avoid exceptions. I am currently using std::variant to implement a Result type, but I use my own Option class. std::variant throws bad access exceptions, which I am okay with, as long as the consumer does not try to catch them. In the future, I plan to implement my own variant which does not throw an exception in favor of an uncatchable abort via an assertion.
  • Liberally use assertions to ensure invariants are upheld. Assertions remain enabled in production code.
  • Allow the use of custom allocators anywhere that allocation may be used. This is a first class design principle in Zig and Odin, and it should be the case for every new systems language.

2. Guiding Principles

These guiding principles are not meant to be enforced all the time, but breaking them requires some justification. Please bear in mind these are specific to my library, and not your own codebase.

  • All results must be checked, and generally functions should be marked as nodiscard .
  • No exceptions are thrown.
  • goto is not allowed.
  • Assertions are enabled in release builds.
  • All loops must terminate. Recursion is not allowed.
  • Side effects must be isolable and restricted to the minimum amount necessary. Obviously, this is taken in moderation for functions involving IO.
  • All functions should be thread safe. Any function not thread safe must say so in its name.
  • Minimal dependencies on 3rd party libraries. My current exception to this rule is dragonbox, for formatting floating point numbers.
  • Minimal dependencies on the C and C++ standard library; primarily IO, noexcept and constexpr/consteval functions. Exceptions: std::variant, syscall (see openat2), various type traits, std::move, cstdint, cstddef, and some other core features. Generally, anything which does not invoke exceptions is fair game, and anything which involves a system call is on the table.
  • Any objects which allocate must take a user specified allocator, which may be defaulted to a generic allocator, which users define with the functions fudAlloc and fudFree.
  • Allocations are intrinsically fallible. Operations involving allocations must be able to signal that they can fail.
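The last two principles can be illustrated with a minimal sketch of an allocator interface whose allocate call reports failure through its return value. Everything here (AllocStatus, AllocResult, this Allocator shape, BumpAllocator, allocateInts) is a hypothetical stand-in for illustration and should not be read as libfud's actual declarations.

```cpp
#include <cstddef>  // std::size_t, std::max_align_t

// Hypothetical stand-ins, NOT libfud's real types.
enum class AllocStatus { Success, ArgumentInvalid, AllocFailure };

struct AllocResult {
    void* ptr;
    AllocStatus status;
};

class Allocator {
  public:
    virtual ~Allocator() = default;
    // Allocation is fallible: the status says whether ptr is usable.
    virtual AllocResult allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// A fixed-capacity bump allocator: no system calls, no exceptions.
template <std::size_t Capacity>
class BumpAllocator final : public Allocator {
  public:
    AllocResult allocate(std::size_t bytes) override
    {
        if (bytes == 0) {
            return {nullptr, AllocStatus::ArgumentInvalid};
        }
        if (bytes > Capacity) {
            return {nullptr, AllocStatus::AllocFailure};
        }
        // Round up so every returned pointer is maximally aligned.
        constexpr std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > Capacity - m_used) {
            return {nullptr, AllocStatus::AllocFailure};
        }
        void* ptr = m_buffer + m_used;
        m_used += rounded;
        return {ptr, AllocStatus::Success};
    }

    // Individual frees are no-ops; the arena is released as a whole.
    void deallocate(void*, std::size_t) override {}

  private:
    alignas(std::max_align_t) unsigned char m_buffer[Capacity]{};
    std::size_t m_used{0};
};

// A consumer takes the allocator explicitly and forwards any failure.
inline AllocResult allocateInts(Allocator& allocator, std::size_t count)
{
    if (count > static_cast<std::size_t>(-1) / sizeof(int)) {
        return {nullptr, AllocStatus::ArgumentInvalid};
    }
    return allocator.allocate(count * sizeof(int));
}
```

The design choice worth noting is that the caller can never receive a pointer without also receiving a status to check, which is the property the principles ask for.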

I am not trying to make this library portable across all systems. I eschew portability in favor of comprehensibility and minimal amount of configuration. I want to be able to reason about the code, and compile time configuration makes that much more difficult. libfud targets desktop Linux systems.

3. A guided tour of the code

3.1. A first look at libfud.hpp

Before you read any further, understand that snippets of headers don't include header guards and are reformatted to display on html code blocks. Thank you.

//libfud.hpp
#include "fud_array.hpp"
#include "fud_allocator.hpp"
#include "fud_result.hpp"
#include "fud_status.hpp"
#include "fud_string.hpp"

#include <concepts> // std::convertible_to
#include <cstddef> // size_t
#include <cstdint>

namespace fud {

constexpr size_t GIT_REV_CHARS = 13;

struct FUD {
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
    Array<char, GIT_REV_CHARS> revision;
};

FUD fud();

Result<String, FudStatus> getEnv(
    const char* name,
    Allocator* allocator= &globalFudAllocator);

template <typename T>
concept CStringRepr = requires(T strObj) {
    { strObj.c_str() } -> std::convertible_to<const char*>;
};

template <CStringRepr T>
Result<String, FudStatus> getEnv(
    const T& name,
    Allocator* allocator = &globalFudAllocator)
{
    return getEnv(name.c_str(), allocator);
}

} // namespace fud

Let's go through this in chunks. The library defines everything in the namespace fud. This header defines the struct FUD, which is returned by the function fud. FUD is a simple struct (in old-school parlance, POD, but now trivial and standard layout) for version information, including the git revision. I use CMake functionality to accomplish this, although it's quite cumbersome and beyond the scope of this article. The function fud returns an instance of FUD filled with the version information of the library. Wherever a name similarity arises with the standard library, I will prefix my version with the namespace and colons.

Next up in this header, let's take a look at fud::getEnv , with doxygen comment included.

/**
 * \brief Get an environmental variable if it exists.
 *
 * \param[in] name The name of the variable to look up.
 * \param[in] allocator The allocator used by the string returned.
 *
 * \retstmt The value of the string bound to the variable if it exists.
 * \retcode FudStatus::NullPointer if name is a null pointer.
 * \retcode FudStatus::NotFound if no binding for the variable exists.
 */
Result<String, FudStatus> getEnv(
    const char* name,
    Allocator* allocator= &globalFudAllocator);

This is a wrapper around the standard library getenv. The standard library getenv returns a null pointer if the specified environment variable is not found. What does fud::getEnv do? It returns a Result<String, FudStatus>, so on error it will be giving back a FudStatus::NotFound error code. In fact, fud::getEnv does additional checking on its input that getenv does not. If you pass a null pointer to getenv, it will happily segfault. In contrast, fud::getEnv will return the error variant of Result with the value of FudStatus::NullPointer. On success, another difference arises. fud::getEnv returns a fud::String instead of a char*. There is a very important distinction here: getenv returns a pointer to mutable data in the environment. Quoting the man page:

As typically implemented, getenv() returns a pointer to a string within the environment list. The caller must take care not to modify this string, since that would change the environment of the process. The implementation of getenv() is not required to be reentrant. The string pointed to by the return value of getenv() may be statically allocated, and can be modified by a subsequent call to getenv(), putenv(3), setenv(3), or unsetenv(3).

That is to say, you must not modify the returned value from getenv , and you can't count on its permanency. It is therefore wise to allocate your own string for it and move on.

There is another function signature for fud::getEnv :

template <typename T>
concept CStringRepr = requires(T strObj) {
    { strObj.c_str() } -> std::convertible_to<const char*>;
};

template <CStringRepr T>
Result<String, FudStatus> getEnv(
    const T& name,
    Allocator* allocator = &globalFudAllocator)
{
    return getEnv(name.c_str(), allocator);
}

This definition uses the concept CStringRepr, which requires that any T fulfilling it has a method c_str whose return value is convertible to const char*. It is the caller's responsibility to ensure that the c_str method truly returns a null terminated string.

3.2. Spreading fud::fud

Now, let's take a look under the hood, at the implementation.

// libfud.cpp
#include "libfud.hpp" // declarations
#include "fud_config.hpp" // build time information
#include <cstdlib> // getenv

namespace fud {

FUD fud()
{
    FUD fudInfo{};
    static_assert(sizeof(GitHash) >= sizeof(fudInfo.revision));
    fudInfo.major = FudVersionMajor;
    fudInfo.minor = FudVersionMinor;
    fudInfo.patch = FudVersionPatch;
    copyMem<sizeof(fudInfo.revision) - 1>(fudInfo.revision, GitHash);
    fudInfo.revision[fudInfo.revision.size() - 1] = '\0';
    return fudInfo;
}

/* ... */
} // namespace fud

After the includes and the namespace declaration, we get to the implementation of fud::fud(). First, the default constructor is called to make fudInfo. Then, static_assert is used to ensure that GitHash is at least as big as fudInfo.revision. Then, the version numbers are assigned to fudInfo, copyMem copies the leading bytes of the hash into revision, the last byte of revision is set to the null terminator, and fudInfo is returned - but what is GitHash?

// fud_version.hpp
#include <cstdint>

namespace fud {

constexpr uint8_t FudVersionMajor = 0;
constexpr uint8_t FudVersionMinor = 42;
constexpr uint8_t FudVersionPatch = 0;
constexpr const char GitHash[] = "b50980ad70684530d55b7adf20de6047ebf53ba2";

} // namespace fud

As you can see, it's simply a string array. The contents of this file are derived from a cmake configuration file, fud_version.hpp.in . The method to get the git hash itself is left for a possible future article, but it's easy to search for it to find how you might like to implement it for your project.

You might be wondering what copyMem is doing. Let's dig in. An old version of this code used a different signature of copyMem which returns a status. There is an infallible version I use instead now:

// fallible copyMem
FudStatus copyMem(void* destination, size_t destination_size,
                  const void* source, size_t count)
{
    if (anyAreNull(destination, source)) {
        return FudStatus::NullPointer;
    }

    if (destination_size < count) {
        return FudStatus::ArgumentInvalid;
    }

    auto* destPtr = static_cast<char*>(destination);
    const auto* sourcePtr = static_cast<const char*>(source);
    for (decltype(destination_size) idx = 0; idx < count; ++idx) {
        destPtr[idx] = sourcePtr[idx];
    }

    return FudStatus::Success;
}

// infallible copyMem
template <size_t Count, typename T, typename U>
void copyMem(T& destination, const U& source)
{
    static_assert(Count <= sizeof(U));
    static_assert(Count <= sizeof(T));
    static_assert(std::is_standard_layout_v<T>);
    static_assert(std::is_standard_layout_v<U>);
    static_assert(std::is_trivially_copyable_v<T>);
    static_assert(std::is_trivially_copyable_v<U>);

    auto* destPtr = reinterpret_cast<char*>(&destination);
    const auto* srcPtr = reinterpret_cast<const char*>(&source);
    for (size_t idx = 0; idx < Count; ++idx) {
        destPtr[idx] = srcPtr[idx];
    }
}

As you can see, the templated version cannot exceed the bounds of either T or U, and uses references rather than pointers. It also ensures that both destination and source are trivially copyable and standard layout - see this Stack Overflow answer for why this is important.
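To see the compile-time bounds checking in action, here is a self-contained sketch that reuses the infallible copyMem from above. It uses std::array in place of fud::Array, and ExampleHash and shortRevision are hypothetical names chosen for the illustration.

```cpp
#include <array>        // std::array, standing in for fud::Array
#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_standard_layout_v, std::is_trivially_copyable_v

// Infallible copyMem, as shown in the article.
template <std::size_t Count, typename T, typename U>
void copyMem(T& destination, const U& source)
{
    static_assert(Count <= sizeof(U));
    static_assert(Count <= sizeof(T));
    static_assert(std::is_standard_layout_v<T>);
    static_assert(std::is_standard_layout_v<U>);
    static_assert(std::is_trivially_copyable_v<T>);
    static_assert(std::is_trivially_copyable_v<U>);

    auto* destPtr = reinterpret_cast<char*>(&destination);
    const auto* srcPtr = reinterpret_cast<const char*>(&source);
    for (std::size_t idx = 0; idx < Count; ++idx) {
        destPtr[idx] = srcPtr[idx];
    }
}

// Hypothetical stand-in for the generated git hash constant.
constexpr char ExampleHash[] = "b50980ad70684530d55b7adf20de6047ebf53ba2";

// Copies the first 12 characters of the hash and null terminates.
inline std::array<char, 13> shortRevision()
{
    std::array<char, 13> revision{};
    static_assert(sizeof(ExampleHash) >= sizeof(revision));
    copyMem<sizeof(revision) - 1>(revision, ExampleHash);
    revision[revision.size() - 1] = '\0';
    // copyMem<64>(revision, ExampleHash); // would fail to compile:
    // 64 exceeds both sizeof(revision) and sizeof(ExampleHash).
    return revision;
}
```

Because the arguments are references to complete objects, the sizes are known to the compiler, so an out-of-bounds count is a build error rather than a status to check at runtime.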

3.3. Assertions

I apologize for not having a proper lead in to this section, but I want to keep this section despite no longer being connected to libfud.cpp directly, since it is important for the library as a whole.

// fud_assert.hpp
#include <source_location>

namespace fud {

[[noreturn]] void assertFail(
    const char* assertion,
    std::source_location sourceLocation = std::source_location::current());

#define fudAssert(expr) ((expr) ? static_cast<void>(0) : \
                         assertFail(#expr, std::source_location::current()))

} // namespace fud

fud_assert.hpp defines a function, and a macro to invoke the function. The purpose of the macro is to translate the assertion into a constant string with the # stringifying preprocessor operator, and to explicitly call assertFail with the current source location. This is similar to the standard assert behavior, but it has a twist: it does not depend on NDEBUG to determine its "true" definition. It always calls assertFail when the assertion is false, in line with Tiger Style .

#include "fud_assert.hpp"

#include "fud_format.hpp"
#include "fud_string.hpp"
#include "fud_string_view.hpp"

#include <cstdio> // fwrite, stderr
#include <exception> // std::terminate

namespace fud {

struct BufferSink {
    DrainResult drain(StringView source);
};

DrainResult BufferSink::drain(StringView source)
{
    DrainResult result{0, FudStatus::Success};
    if (source.m_data == nullptr) {
        result.status = FudStatus::NullPointer;
        return result;
    }
    if (source.m_length == 0) {
        result.status = FudStatus::Success;
        return result;
    }
    // cast from const utf8* to const char*
    const auto* sourcePtr = reinterpret_cast<const char*>(source.m_data);
    /* TODO: give users control over this functionality */
    result.bytesWritten = fwrite(sourcePtr, 1, source.m_length, stderr);
    if (result.bytesWritten != source.m_length) {
        result.status = FudStatus::Full;
    }
    return result;
}

[[noreturn]] void assertFail(const char* assertion,
                             const std::source_location sourceLocation)
{
    BufferSink sink;
    const char* fileName = sourceLocation.file_name();
    if (fileName == nullptr) {
        fileName = "Unknown file";
    }
    const char* functionName = sourceLocation.function_name();
    if (functionName == nullptr) {
        functionName = "Unknown Function";
    }
    auto formatResult = format(
        sink, FormatCharMode::Unchecked,
        "{}:{}:{}: {}\n",
        fileName, functionName, sourceLocation.line(),
        assertion);
    static_cast<void>(formatResult);

    std::terminate();
}

} // namespace fud

When the assertion fails, the textual representation of the assertion, the caller's filename, the line number, and the function name are included in a logged message to stderr. Because this is meant to only be used when invariants are violated, the program is terminated ungracefully.

This particular assertion implementation uses a custom fud::format implementation. If you are unfamiliar with the syntax of std::format and std::format_to_n, then I recommend that you review the format spec. Unlike std::format, however, these format strings are not yet compile time checked. fud::format does not allocate directly - instead, a Sink is passed in as an argument, which controls how output is passed to it. Internally, it uses statically sized, stack-local buffers to hold numeric inputs. It uses a chunking algorithm to handle pathological cases of large padding sizes. I will cover fud::format in much greater detail in a later article.
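The sink idea can be sketched without any of libfud's machinery. The following is a minimal illustration of a sink that drains into a fixed buffer instead of stderr. It assumes the same drain shape as BufferSink above, but Status, this DrainResult, and FixedBufferSink are hypothetical stand-ins, and std::string_view is used in place of fud::StringView.

```cpp
#include <cstddef>      // std::size_t
#include <string_view>  // std::string_view, standing in for fud::StringView

// Hypothetical stand-ins for FudStatus and DrainResult.
enum class Status { Success, NullPointer, Full };

struct DrainResult {
    std::size_t bytesWritten;
    Status status;
};

// A sink that never allocates: output lands in a fixed buffer,
// and running out of room is reported as Status::Full.
template <std::size_t Capacity>
struct FixedBufferSink {
    char buffer[Capacity]{};
    std::size_t used{0};

    DrainResult drain(std::string_view source)
    {
        DrainResult result{0, Status::Success};
        if (source.data() == nullptr) {
            result.status = Status::NullPointer;
            return result;
        }
        // Write as much as fits, then report a short write as Full.
        const std::size_t room = Capacity - used;
        const std::size_t count = source.size() < room ? source.size() : room;
        for (std::size_t idx = 0; idx < count; ++idx) {
            buffer[used + idx] = source[idx];
        }
        used += count;
        result.bytesWritten = count;
        if (count != source.size()) {
            result.status = Status::Full;
        }
        return result;
    }

    std::string_view view() const { return {buffer, used}; }
};
```

Since the formatter only ever talks to drain, swapping stderr for a buffer, a file, or a test double does not require touching the formatting code.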

Now that we've covered how assertions are implemented, let's unwind back to libfud.cpp .

3.4. Spreading more fud

The second function from libfud.hpp is fud::getEnv .

Result<String, FudStatus> getEnv(const char* name, Allocator* allocator)
{
    using RetType = Result<String, FudStatus>;

    if (name == nullptr) {
        return RetType::error(FudStatus::NullPointer);
    }

    const char* resultString = getenv(name);
    if (resultString == nullptr) {
        return RetType::error(FudStatus::NotFound);
    }

    return String::makeFromCString(resultString, allocator);
}

As I mentioned previously, fud::getEnv is responsible for checking its input, checking the output of getenv , and returning a sound Result . Result is a class that emulates Rust's own result type, templating on a success type T and an error type E - for fud::getEnv , a String and FudStatus . We'll dig further into the details of Result in a moment, I promise. String is its own rabbit hole that deserves its own article - for now, take it as a class similar to std::string with an explicit allocator argument, which is given the default globalFudAllocator in the declaration.

First, fud::getEnv checks that its input is valid. If it isn't, it returns FudStatus::NullPointer. Next, it calls getenv, and checks that its return is not null. If it is null, it indicates that no match was found, so FudStatus::NotFound is returned. Finally, String::makeFromCString builds a String from the returned value with the given allocator, and its result is returned directly. Constructing through a function that returns a Result is required when objects must be built without the ability to throw an exception: if construction fails, the caller receives the error variant instead of an invalid String.

3.5. Results are nicer than sentinels and out parameters

The idea of a result type is to allow a single return to encapsulate two variants depending on if the function succeeded or failed. I have elected to use std::variant to implement them instead of tagged unions to reduce the possibility of undefined behavior, but the cost is that an invalid variant access will throw std::bad_variant_access . I consider this an acceptable trade for the semantics of results compared to throwing exceptions (particularly for non-exceptional circumstances like network timeouts), pairing out-parameters with results, or even worse, sentinel values:

int sentinel(int x) {
    if (isInvalid(x)) {
        errno = EINVAL;
        return -1;
    }
    return foo(x);
}

void doThing(std::vector<int> xVec) {
    for (auto x: xVec) {
        // -1 may be an acceptable output of foo
        errno = 0;
        auto tmp = sentinel(x);
        if (tmp == -1 && errno != 0) {
            printf("Error %d\n", errno);
        } else {
            printf("Mapped %d to %d\n", x, tmp);
        }
    }
}

When using a sentinel, it occasionally is the case that the sentinel is a valid output of the function in question, which requires additional logic to determine if it's truly an error condition or not. The Abseil project recommends against the use of sentinels. The C standard library is full of these awful functions - a good demonstration of a valid sentinel value comes from atoi - in glibc and musl, 0 is returned on error, but errno is not set! The onus is on the user to check that the string is not convertible to 0 to determine if it is an error condition. Thankfully, strtol rectifies the situation… by allowing errno to be set, to facilitate determining if the sentinel value is valid. This also necessitates setting errno to 0 before making the call. Examining the next alternative, using an out parameter, doesn't make a compelling argument for ergonomics either.
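The strtol dance described above looks roughly like the following sketch. ParseStatus, ParseResult, and parseLong are hypothetical names for this illustration and are not libfud's string conversion API.

```cpp
#include <cerrno>   // errno, ERANGE
#include <cstdlib>  // std::strtol, std::atoi

// Hypothetical types for the illustration.
enum class ParseStatus { Success, NullPointer, NotANumber, OutOfRange };

struct ParseResult {
    long value;
    ParseStatus status;
};

inline ParseResult parseLong(const char* text)
{
    if (text == nullptr) {
        return {0, ParseStatus::NullPointer};
    }
    char* end = nullptr;
    // errno must be cleared first: LONG_MAX and LONG_MIN are both
    // valid outputs and the sentinels for overflow.
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text) {
        // No characters were consumed, so nothing was converted.
        return {0, ParseStatus::NotANumber};
    }
    if (errno == ERANGE) {
        return {value, ParseStatus::OutOfRange};
    }
    if (*end != '\0') {
        // Trailing characters after the number.
        return {value, ParseStatus::NotANumber};
    }
    return {value, ParseStatus::Success};
}
```

Note that std::atoi("abc") and std::atoi("0") both return 0, which is exactly the ambiguity the status field removes.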

FudStatus outParameter(int x, int& out) {
    if (isInvalid(x)) {
        return FudStatus::InvalidInput;
    }
    out = foo(x);
    return FudStatus::Success;
}

void doThing(std::vector<int> xVec) {
    for (auto x: xVec) {
        int out = 0;
        auto status = outParameter(x, out);
        if (status != FudStatus::Success) {
            printf("Error %s\n", FudStatusToString(status));
        } else {
            printf("Mapped %d to %d\n", x, out);
        }
    }
}

As you can see, using an out parameter requires that the argument used to take the output must exist before the function call. The state of the value may be indeterminate, and should not be used in an error scenario before writing a different value to it. Now, let's see why I think Result is the right way of handling this:

Result<int, FudStatus> returnResult(int x) {
    if (isInvalid(x)) {
        return Result<int, FudStatus>::error(FudStatus::InvalidInput);
    }
    return Result<int, FudStatus>::okay(foo(x));
}

void doThing(std::vector<int> xVec) {
    for (auto x: xVec) {
        auto result = returnResult(x);
        if (result.isError()) {
            printf("Error %s\n", FudStatusToString(result.getError()));
        } else {
            printf("Mapped %d to %d\n", x, result.getOkay());
        }
    }
}

C++ finally has its own result type in the standard library with the addition of std::expected in C++23.

3.6. Result: The Foundation of libfud

It is not an exaggeration to say that the Result class is a foundational building block of this library. At the time of writing, a grep in the include and source directories finds at least 92 instances of it - and many functions which do not return a Result instead return FudStatus . For the sake of expedience, I will post only a shortened version, skipping move-versions of functions - you can view the full implementation here .

#include <variant>
#include <utility>

namespace fud {

/** \brief A result type which contains either a T on success or an E on error. */
template <typename T, typename E>
class [[nodiscard]] Result {
  public:
    using ResultType = Result<T, E>;

    static constexpr ResultType okay(const T& okay) {
        return ResultType{okay};
    }

    static constexpr ResultType error(const E& error) {
        return ResultType{error};
    }

    [[nodiscard]] constexpr bool isOkay() const {
        return (m_value.index() == 0);
    }

    [[nodiscard]] constexpr bool isError() const {
        return (m_value.index() == 1);
    }

    [[nodiscard]] constexpr const T& getOkay() const& {
        return std::get<T>(m_value);
    }

    [[nodiscard]] constexpr const T& getOkayOr(const T& alternative) const& {
        if (!isOkay()) {
            return alternative;
        }
        return std::get<T>(m_value);
    }

    [[nodiscard]] constexpr const E& getError() const& {
        return std::get<E>(m_value);
    }

    [[nodiscard]] constexpr const E& getErrorOr(const E& alternative) const& {
        if (!isError()) {
            return alternative;
        }
        return std::get<E>(m_value);
    }

  private:
    constexpr Result() : m_value() {}

    std::variant<T, E> m_value;
};

} // namespace fud

Starting with the obvious: the class is templated on the success type, T, and the error type, E. The methods isOkay and isError do what they say on the tin; isOkay is analogous to the member function has_value of std::expected. Access to the value is guarded through the getOkay and getError methods; grabbing the wrong variant is a sure-fire way to stop the program. An alternative is to use the getOkayOr and getErrorOr methods, which will return the supplied value if the result is an error or okay, respectively.
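Since the shortened listing omits the constructors, here is a compact, self-contained sketch of the same idea that can be compiled on its own. MiniResult, HalfError, and half are hypothetical names; the sketch selects the variant alternative by index and returns fallbacks by value, which are my assumptions for the illustration and not necessarily how the full libfud implementation does it.

```cpp
#include <cstddef>  // std::size_t
#include <utility>  // std::in_place_index
#include <variant>  // std::variant, std::get

// Hypothetical, trimmed-down cousin of fud::Result.
template <typename T, typename E>
class [[nodiscard]] MiniResult {
  public:
    static constexpr MiniResult okay(const T& value)
    {
        return MiniResult{std::in_place_index<0>, value};
    }

    static constexpr MiniResult error(const E& err)
    {
        return MiniResult{std::in_place_index<1>, err};
    }

    [[nodiscard]] constexpr bool isOkay() const { return m_value.index() == 0; }
    [[nodiscard]] constexpr bool isError() const { return m_value.index() == 1; }

    // Returned by value here so a temporary fallback cannot dangle.
    [[nodiscard]] constexpr T getOkayOr(const T& alternative) const
    {
        return isOkay() ? std::get<0>(m_value) : alternative;
    }

    [[nodiscard]] constexpr E getErrorOr(const E& alternative) const
    {
        return isError() ? std::get<1>(m_value) : alternative;
    }

  private:
    // Index-based construction picks the alternative explicitly.
    template <std::size_t Index, typename Arg>
    constexpr MiniResult(std::in_place_index_t<Index> tag, const Arg& arg)
        : m_value(tag, arg)
    {
    }

    std::variant<T, E> m_value;
};

enum class HalfError { Odd, Unknown };

// Halves even numbers and reports odd ones as an error.
inline MiniResult<int, HalfError> half(int x)
{
    if (x % 2 != 0) {
        return MiniResult<int, HalfError>::error(HalfError::Odd);
    }
    return MiniResult<int, HalfError>::okay(x / 2);
}
```

The Or accessors never touch the inactive alternative, so they cannot trigger std::bad_variant_access, which makes them a comfortable default when a fallback value exists.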

I'm not sure how I feel about the design of std::expected. I would say it's better than nothing, but the framing of the abstraction is one which is so asymmetrically preferential to the happy path that handling the sad path is unduly burdensome. I prefer Rust's design for a simple reason: C++ lacks the pattern matching that truly makes Result a first class citizen in Rust. For the foreseeable future, whether as additions to the standard library or in 3rd party libraries, std::expected, std::optional, and other algebraic data types cannot be as strong of a design choice as they are in languages with pattern matching.

3.7. FudStatus

There's one major element left from the opening discussion about libfud.hpp to discuss, FudStatus . FudStatus is a scoped enumeration meant to capture standard failure modes, as well as success.

// fud_status.hpp
enum class [[nodiscard]] FudStatus : int32_t
{
    Success = 0,
    NullPointer,
    Failure,
    NotFound,
    Partial,
    /* Omitted */
    NotImplemented,
    NotSupported
};

constexpr const char* FudStatusToString(FudStatus status)
{
    switch (status) {
    case FudStatus::Success:
        return "Success";
    case FudStatus::NullPointer:
        return "NullPointer";
    /* Omitted */
    default:
        return "Unknown";
    }
}

The function FudStatusToString can also be used for a string representation of the status, to facilitate pretty printing the status.

4. Additional components of libfud

With that, the introduction to libfud is complete. There are more parts: string conversion, formatting, and the wrapper around the SQLite C interface would all make good articles. As I write additional articles, I will link back to them on this page.