libfud
Published on 6 October 2024.
Last modified 27 October 2024.
This is my personal library of functions, effectively a C++ wrapper around C and C++ functions and structures written to my tastes.
All snippets on this page are licensed under the Apache License, Version 2.0 and copyrighted by myself, Dominick Allen. If you are reading this, I expect you to have a nominal understanding of C++, including using namespaces and templates.
1. Motivations: Why C++? Why my own library?
1.1. What I like about C++
I like C++. Mostly. Is it Stockholm syndrome? Maybe, but there are aspects of C++ I find hard to resist:
- Fine tuned control of hardware and access to the underlying OS, and high performance.
- Strong typing. Stronger than plain C, at least. Higher expressivity than C - scoped enumerations, closures, namespaces, and methods are very nice features.
- High compatibility with C - with notable exceptions such as std::atomic vs the _Atomic type specifier.
- Access to many libraries, for C and C++. The ecosystem is mature compared to languages like Rust and Zig, and much larger than languages like D.
- The object model, flawed as it may be, enables RAII with destructors. Is RAII universally good? Zig eschews 'hidden' destructors, and uses defer and errdefer to make control flow explicit. The creator of Odin has some excellent commentary about maximizing explicitness vs minimizing implicitness. Ultimately, I am in favor of destructors, especially given the lack of defer and errdefer in C++.
- The compiler will allow you to do stupid things when it comes to the lifetimes of objects. This isn't as bad as you might think, but I will admit that it increases the difficulty of judging how correct a program is.
- As bloated as the spec may be, it is a spec, and there are multiple compilers available for it.
1.2. What I don't like about C++
As much as I like C++, I have issues with some parts of C++ culture, and I definitely have problems with the standard libraries of C and C++.
- Exceptions are terrible for reasoning about code. Rather than goto, they are comefrom. Of course, this is a simplistic, hyperbolic statement, and there may be valid reasons to use exceptions, but I abhor their liberal proliferation throughout a codebase.
- The STL is rife with exceptions. Containers may throw exceptions on allocation failure, and although this may be overcome with polymorphic allocators in the pmr namespace, the STL's algorithms still may perform arbitrary allocations and throw other exceptions. I recommend that you read the EASTL (yes, that EA) motivations to see additional, sometimes dated, criticisms of the STL.
- Streaming operations in C++ suck for various reasons. Consider the following:
// primary's streamId, sequence, and length members are arrays of uint8_t
output << std::hex
       << "PacketId: 0x"
       << static_cast<uint16_t>(primary.streamId[0])
       << static_cast<uint16_t>(primary.streamId[1]) << "\n"
       << "Sequence: 0x"
       << static_cast<uint16_t>(primary.sequence[0])
       << static_cast<uint16_t>(primary.sequence[1]) << "\n"
       << "Length: 0x"
       << static_cast<uint16_t>(primary.length[0])
       << static_cast<uint16_t>(primary.length[1])
       << std::dec << " (" << definition.length << ")\n";
If you are wondering why I've cast the uint8_t fields, it is because stream operators interpret them, correctly, as characters, rather than integers. This is a consequence of a shallow abstraction, rather than any deeper problem. Contrast with format:
output = std::format(
    "PacketId: 0x{:02X}{:02X}\n"
    "Sequence: 0x{:02X}{:02X}\n"
    "Length: 0x{:02X}{:02X} ({})\n",
    primary.streamId[0], primary.streamId[1],
    primary.sequence[0], primary.sequence[1],
    primary.length[0], primary.length[1],
    definition.length);
The format approach actually does more than the stream, since it succinctly and non-statefully sets the field width in addition to specifying a hex output format.
- std::filesystem stands out as a particularly bad part of the standard library, mixing exceptions and error codes inappropriately, and is prone to TOCTOU bugs.
- The C standard library is full of warts too. C strings are a trillion dollar disaster. There are other bad decisions like atoi, getc, gets, asctime, gethostbyname, setjmp/longjmp, strncpy, strtok, scanf, and the whole family of printf functions.
- Character classification is broken by the design of the API, taking integer arguments, but causing undefined behavior if the input is negative (and not EOF).
- Locales were a mistake.
- errno is a significant cause of pain in C programming. Usually users only need to check it on error returns, but for functions like strtol where the range of the output covers all possible sentinels, users must explicitly set errno to 0 before calling the function to determine if an error has happened.
- Both the C and C++ standard libraries happily make allocations in a way that the user has no control over, often in places where a user may not consider it. For example, std::find and std::transform (in their execution-policy overloads) can throw std::bad_alloc, but don't accept allocators. std::function had a poor record of support for custom allocators, and support was removed for it in C++17. Under the hood, fopen and its ilk are using an allocator without any user control.
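The errno dance around strtol can be wrapped once so that callers never touch errno themselves. What follows is a minimal sketch, not code from libfud: the helper name parseLong is hypothetical, and std::optional is used only to keep the example short.

```cpp
#include <cerrno>
#include <cstdlib>
#include <optional>

// Hypothetical helper, not part of libfud: parse a base-10 long, or return
// std::nullopt if the input is null, empty, malformed, or out of range.
inline std::optional<long> parseLong(const char* text)
{
    if (text == nullptr) {
        return std::nullopt;
    }
    char* end = nullptr;
    errno = 0; // clear first: LONG_MAX and LONG_MIN are also valid results
    const long value = std::strtol(text, &end, 10);
    if (errno == ERANGE) {
        return std::nullopt; // overflow or underflow
    }
    if (end == text || *end != '\0') {
        return std::nullopt; // no digits consumed, or trailing garbage
    }
    return value;
}
```

With a wrapper like this, parseLong("12abc") and parseLong("") both come back empty, while parseLong("-1") yields -1 without the caller having to disambiguate a sentinel.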
1.3. Why I am Writing My Own Library in C++
Simply put, I want to overcome the problems I listed above in a way that minimizes my pain. I also want to stroke my own ego, so that's why I'm really doing it. I don't want you to use my library; there are better alternatives. Consider using Abseil, the Embedded Template Library, the EA STL, Frozen, or even rolling your own and mixing and matching. My goals for the library:
- Minimize the provocation of undefined behavior. I don't want to run into UB accidentally. This means that any system interface call where an invalid input invokes undefined behavior should get its own wrapper to check its input.
- Simplify checking the returns and outputs of functions to know from the caller whether the call has succeeded, failed, or is in a state of partial success/failure.
- Use the behavior of destructors to manage resources that need to be released safely.
- Stridently avoid exceptions. I am currently using std::variant to implement a Result type, but I use my own Option class. std::variant throws bad access exceptions, which I am okay with, as long as the consumer does not try to catch them. In the future, I plan to implement my own variant which, instead of throwing an exception, performs an uncatchable abort via an assertion.
- Liberally use assertions to ensure invariants are upheld. Assertions are enabled in production code.
- Allow the use of custom allocators anywhere that allocation may be used. This is a first class design principle in Zig and Odin, and it should be the case for every new systems language.
2. Guiding Principles
The guiding principles are not meant to be enforced all the time, but breaking them requires some justification. Please bear in mind these are specific to my library, and not your own codebase.
- All results must be checked, and generally functions should be marked as nodiscard.
- No exceptions are thrown.
- goto is not allowed.
- Assertions are enabled in release builds.
- All loops must terminate. Recursion is not allowed.
- Side effects must be isolable and restricted to the minimum amount necessary. Obviously, this is taken in moderation for functions involving IO.
- All functions should be thread safe. Any function not thread safe must say so in its name.
- Minimal dependencies on 3rd party libraries. My current exception to this rule is dragonbox, for formatting floating point numbers.
- Minimal dependencies on the C and C++ standard library; primarily IO, noexcept and constexpr/consteval functions. Exceptions: std::variant, syscall (see openat2), various type traits, std::move, cstdint, cstddef, and some other core features. Generally, anything which does not invoke exceptions is fair game, and anything which involves a system call is on the table.
- Any objects which allocate must take a user specified allocator, which may be defaulted to a generic allocator, which users define with the functions fudAlloc and fudFree.
- Allocations are intrinsically fallible. Operations involving allocations must be able to signal that they can fail.
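To make the last two principles concrete, here is a minimal sketch of what "take an allocator, report failure" can look like. Everything in it is an assumption made for illustration: the Allocator interface, MallocAllocator, NullAllocator, Status, and makeBuffer are hypothetical names, and libfud's real Allocator may look quite different.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical interface for illustration; libfud's real Allocator may differ.
struct Allocator {
    virtual ~Allocator() = default;
    // Returns nullptr on failure instead of throwing.
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* pointer) = 0;
};

// Generic default backed by malloc/free.
struct MallocAllocator final : Allocator {
    void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
    void deallocate(void* pointer) override { std::free(pointer); }
};

// An allocator that always fails, handy for exercising error paths.
struct NullAllocator final : Allocator {
    void* allocate(std::size_t) override { return nullptr; }
    void deallocate(void*) override {}
};

enum class Status { Success, NullPointer, AllocFailure };

struct BufferResult {
    Status status;
    void* data;
};

// The caller chooses the allocator and gets a status back; nothing is thrown.
[[nodiscard]] inline BufferResult makeBuffer(Allocator* allocator, std::size_t bytes)
{
    if (allocator == nullptr) {
        return {Status::NullPointer, nullptr};
    }
    void* data = allocator->allocate(bytes);
    if (data == nullptr) {
        return {Status::AllocFailure, nullptr};
    }
    return {Status::Success, data};
}
```

Swapping in an allocator that always fails is a cheap way to test that every allocating call site really does handle the failure case.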
I am not trying to make this library portable across all systems. I eschew
portability in favor of comprehensibility and minimal amount of configuration. I
want to be able to reason about the code, and compile time configuration makes
that much more difficult.
libfud
targets desktop Linux systems.
3. A guided tour of the code
3.1.
A first look at
libfud.hpp
Before you read any further, understand that snippets of headers don't include header guards and are reformatted to display on html code blocks. Thank you.
// libfud.hpp
#include "fud_array.hpp"
#include "fud_allocator.hpp"
#include "fud_result.hpp"
#include "fud_status.hpp"
#include "fud_string.hpp"

#include <cstdint>

namespace fud {

constexpr size_t GIT_REV_CHARS = 13;

struct FUD {
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
    Array<char, GIT_REV_CHARS> revision;
};

FUD fud();

Result<String, FudStatus> getEnv(
    const char* name,
    Allocator* allocator = &globalFudAllocator);

template <typename T>
concept CStringRepr = requires(T strObj) {
    { strObj.c_str() } -> std::convertible_to<const char*>;
};

template <CStringRepr T>
Result<String, FudStatus> getEnv(
    const T& name,
    Allocator* allocator = &globalFudAllocator)
{
    return getEnv(name.c_str(), allocator);
}

} // namespace fud
Let's go through this in chunks. The library defines everything in the namespace
fud
. This header defines the struct
FUD
, which is returned by the function
fud
.
FUD
is a simple struct (in old-school parlance,
POD
, but now trivial
and standard layout) for version information, including the git revision. I use
cmake functionality to accomplish this, although it's quite cumbersome and
beyond the scope of this article. The function
fud
returns an instance of
FUD
filled with the version information of the library. Wherever a name
similarity arises with the standard library, I will prefix my version with the
namespace and colons.
Next up in this header, let's take a look at
fud::getEnv
, with doxygen comment
included.
/**
 * \brief Get an environmental variable if it exists.
 *
 * \param[in] name The name of the variable to look up.
 * \param[in] allocator The allocator used by the string returned.
 *
 * \retstmt The value of the string bound to the variable if it exists.
 * \retcode FudStatus::NullPointer if name is a null pointer.
 * \retcode FudStatus::NotFound if no binding for the variable exists.
 */
Result<String, FudStatus> getEnv(
    const char* name,
    Allocator* allocator = &globalFudAllocator);
This is a wrapper around the standard library
getenv
. The standard library
getenv
returns a null pointer if the specified environment variable is not
found. What does
fud::getEnv
do? It returns a
Result<String, FudStatus>
, so
on error it will be giving back a
FudStatus::NotFound
error code. In fact,
fud::getEnv
does additional checking on its input that
getenv
does not. If
you pass a null pointer to getenv, it will happily segfault. In contrast,
fud::getEnv will return the error variant of the result with the value of
FudStatus::NullPointer. On success, another difference arises. fud::getEnv
returns a fud::String instead of a char*. There is a very important distinction
here: getenv returns a pointer to mutable data in the environment. Quoting the
man page:
As typically implemented, getenv() returns a pointer to a string within the environment list. The caller must take care not to modify this string, since that would change the environment of the process. The implementation of getenv() is not required to be reentrant. The string pointed to by the return value of getenv() may be statically allocated, and can be modified by a subsequent call to getenv(), putenv(3), setenv(3), or unsetenv(3).
That is to say, you must not modify the returned value from
getenv
, and you
can't count on its permanency. It is therefore wise to allocate your own string
for it and move on.
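The copy-then-own idea can be shown without any of libfud. The following is a sketch under stated assumptions rather than the library's code: copyEnv is a hypothetical name, std::string stands in for fud::String, and std::optional stands in for Result<String, FudStatus>. Note that std::string itself may throw on allocation failure, which is exactly what the real library avoids. setenv is POSIX, which matches the Linux target.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Stand-in for illustration only: std::string replaces fud::String and
// std::optional replaces Result<String, FudStatus>.
inline std::optional<std::string> copyEnv(const char* name)
{
    if (name == nullptr) {
        return std::nullopt; // never hand a null pointer to getenv
    }
    const char* value = std::getenv(name);
    if (value == nullptr) {
        return std::nullopt; // no binding for this variable
    }
    return std::string{value}; // owned copy, unaffected by later setenv calls
}
```

Once the value has been copied, a later setenv or unsetenv on the same variable changes what the next lookup returns, but not the copy already in hand.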
There is another function signature for
fud::getEnv
:
template <typename T>
concept CStringRepr = requires(T strObj) {
    { strObj.c_str() } -> std::convertible_to<const char*>;
};

template <CStringRepr T>
Result<String, FudStatus> getEnv(
    const T& name,
    Allocator* allocator = &globalFudAllocator)
{
    return getEnv(name.c_str(), allocator);
}
This definition uses a
concept
of a
CStringRepr
which requires that for a
given
T
fulfilling the concept to have a method
c_str
which returns a
pointer that is convertible to
const char*
. It is the caller's responsibility
to ensure that the
c_str
method truly returns a null terminated string.
3.2.
Spreading
fud::fud
Now, let's take a look under the hood, at the implementation.
// libfud.cpp
#include "libfud.hpp"     // declarations
#include "fud_config.hpp" // build time information

#include <cstdlib> // getenv

namespace fud {

FUD fud()
{
    FUD fudInfo{};
    static_assert(sizeof(GitHash) >= sizeof(fudInfo.revision));
    fudInfo.major = FudVersionMajor;
    fudInfo.minor = FudVersionMinor;
    fudInfo.patch = FudVersionPatch;
    // copy all but the last byte, then terminate the truncated hash
    copyMem<sizeof(fudInfo.revision) - 1>(fudInfo.revision, GitHash);
    fudInfo.revision[fudInfo.revision.size() - 1] = '\0';
    return fudInfo;
}

/* ... */

} // namespace fud
After the includes and the namespace declaration, we get to the implementation
of
fud::fud()
. First, the default constructor is called to make
fudInfo
. Then,
static_assert
is used to ensure that
GitHash
is at least as
big as
fudInfo.revision
. Then, the version information is assigned to
fudInfo
, and
it's returned - but what is
GitHash
?
// fud_version.hpp
#include <cstdint>

namespace fud {

constexpr uint8_t FudVersionMajor = 0;
constexpr uint8_t FudVersionMinor = 42;
constexpr uint8_t FudVersionPatch = 0;
constexpr const char GitHash[] = "b50980ad70684530d55b7adf20de6047ebf53ba2";

} // namespace fud
As you can see, it's simply a string array. The contents of this file are
derived from a cmake configuration file,
fud_version.hpp.in
. The method to get
the git hash itself is left for a possible future article, but it's easy to
search for it to find how you might like to implement it for your project.
An old version of this code used a different signature of copyMem which
returns a status. There is an infallible version I use instead now:
// fallible copyMem
FudStatus copyMem(void* destination, size_t destination_size, const void* source, size_t count)
{
    if (anyAreNull(destination, source)) {
        return FudStatus::NullPointer;
    }
    if (destination_size < count) {
        return FudStatus::ArgumentInvalid;
    }
    auto* destPtr = static_cast<char*>(destination);
    const auto* sourcePtr = static_cast<const char*>(source);
    for (decltype(destination_size) idx = 0; idx < count; ++idx) {
        destPtr[idx] = sourcePtr[idx];
    }
    return FudStatus::Success;
}

// infallible copyMem
template <size_t Count, typename T, typename U>
void copyMem(T& destination, const U& source)
{
    static_assert(Count <= sizeof(U));
    static_assert(Count <= sizeof(T));
    static_assert(std::is_standard_layout_v<T>);
    static_assert(std::is_standard_layout_v<U>);
    static_assert(std::is_trivially_copyable_v<T>);
    static_assert(std::is_trivially_copyable_v<U>);
    auto* destPtr = reinterpret_cast<char*>(&destination);
    const auto* srcPtr = reinterpret_cast<const char*>(&source);
    for (size_t idx = 0; idx < Count; ++idx) {
        destPtr[idx] = srcPtr[idx];
    }
}
As you can see, the templated version cannot exceed the bounds of either T or U, and uses references rather than pointers. It also ensures that both destination and source are trivially copyable and standard layout - see this Stack Overflow answer for why this is important.
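A small usage sketch shows what the compile-time checks buy. The template is restated in abbreviated form (the standard-layout checks are left out for brevity) so the snippet compiles on its own; Header and headerFromBytes are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Abbreviated restatement of the infallible copyMem from the text above.
template <std::size_t Count, typename T, typename U>
void copyMem(T& destination, const U& source)
{
    static_assert(Count <= sizeof(U));
    static_assert(Count <= sizeof(T));
    static_assert(std::is_trivially_copyable_v<T>);
    static_assert(std::is_trivially_copyable_v<U>);
    auto* destPtr = reinterpret_cast<char*>(&destination);
    const auto* srcPtr = reinterpret_cast<const char*>(&source);
    for (std::size_t idx = 0; idx < Count; ++idx) {
        destPtr[idx] = srcPtr[idx];
    }
}

// Hypothetical packet header, for illustration only.
struct Header {
    std::uint8_t streamId[2];
    std::uint8_t length[2];
};

// Byte-wise copies are meaningful for plain data such as Header...
static_assert(std::is_trivially_copyable_v<Header>);
// ...but not for types that own resources, such as std::string.
static_assert(!std::is_trivially_copyable_v<std::string>);

inline Header headerFromBytes(const std::uint8_t (&raw)[4])
{
    Header header{};
    copyMem<sizeof(Header)>(header, raw);
    return header;
}
```

Asking for more bytes than either side holds, for example copyMem<8>(header, raw), or passing a std::string, is rejected by the compiler rather than discovered at run time.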
3.3. Assertions
I apologize for not having a proper lead in to this section, but I want to keep
this section despite no longer being connected to
libfud.cpp
directly, since
it is important for the library as a whole.
// fud_assert.hpp
#include <source_location>

namespace fud {

[[noreturn]] void assertFail(
    const char* assertion,
    std::source_location sourceLocation = std::source_location::current());

#define fudAssert(expr) \
    ((expr) ? static_cast<void>(0) : assertFail(#expr, std::source_location::current()))

} // namespace fud
fud_assert.hpp
defines a function, and a macro to invoke the function. The
purpose of the macro is to translate the assertion into a constant string with
the
#
stringifying preprocessor operator, and to explicitly call
assertFail
with the current source location. This is similar to the standard
assert
behavior, but it has a twist: it does not depend on
NDEBUG
to determine its
"true" definition. It always calls
assertFail
when the assertion is false, in
line with
Tiger Style
.
#include "fud_assert.hpp"
#include "fud_format.hpp"
#include "fud_string.hpp"
#include "fud_string_view.hpp"

#include <cstdio>
#include <exception> // std::terminate

namespace fud {

struct BufferSink {
    DrainResult drain(StringView source);
};

DrainResult BufferSink::drain(StringView source)
{
    DrainResult result{0, FudStatus::Success};
    if (source.m_data == nullptr) {
        result.status = FudStatus::NullPointer;
        return result;
    }
    if (source.m_length == 0) {
        result.status = FudStatus::Success;
        return result;
    }
    // cast from const utf8* to const char*
    const auto* sourcePtr = reinterpret_cast<const char*>(source.m_data);
    /* TODO: give users control over this functionality */
    result.bytesWritten = fwrite(sourcePtr, 1, source.m_length, stderr);
    if (result.bytesWritten != source.m_length) {
        result.status = FudStatus::Full;
    }
    return result;
}

[[noreturn]] void assertFail(const char* assertion, const std::source_location sourceLocation)
{
    BufferSink sink;
    const char* fileName = sourceLocation.file_name();
    if (fileName == nullptr) {
        fileName = "Unknown file";
    }
    const char* functionName = sourceLocation.function_name();
    if (functionName == nullptr) {
        functionName = "Unknown Function";
    }
    auto formatResult = format(
        sink,
        FormatCharMode::Unchecked,
        "{}:{}:{}: {}\n",
        fileName,
        functionName,
        sourceLocation.line(),
        assertion);
    // the program is terminating regardless of whether the write succeeded
    static_cast<void>(formatResult);
    std::terminate();
}

} // namespace fud
When the assertion fails, the textual representation of the assertion, the caller's filename, the line number, and the function name are included in a logged message to stderr. Because this is meant to only be used when invariants are violated, the program is terminated ungracefully.
This particular assertion implementation uses a custom
fud::format
implementation. If you are unfamiliar with the syntax of
std::format
and
std::format_to_n
, then I recommend that you review
the format spec
. Unlike
std::format
, however, these format strings are not yet compile time
checked.
fud::format
does not allocate directly - instead, a
Sink
is passed
in as an argument, which controls how output is passed to it. Internally, it
uses statically sized, stack-local buffers to hold numeric inputs. It uses a
chunking algorithm to handle pathological cases of large padding sizes. I will
cover
fud::format
in much greater detail in a later article.
Now that we've covered how assertions are implemented, let's unwind
back to
libfud.cpp
.
3.4. Spreading more fud
The second function from
libfud.hpp
is
fud::getEnv
.
Result<String, FudStatus> getEnv(const char* name, Allocator* allocator)
{
    using RetType = Result<String, FudStatus>;

    if (name == nullptr) {
        return RetType::error(FudStatus::NullPointer);
    }

    const char* resultString = getenv(name);
    if (resultString == nullptr) {
        return RetType::error(FudStatus::NotFound);
    }

    return String::makeFromCString(resultString, allocator);
}
As I mentioned previously,
fud::getEnv
is responsible for checking its input,
checking the output of
getenv
, and returning a sound
Result
.
Result
is a
class that emulates Rust's own result type, templating on a success type
T
and
an error type
E
- for
fud::getEnv
, a
String
and
FudStatus
. We'll dig
further into the details of
Result
in a moment, I promise.
String
is its own
rabbit hole that deserves
its own article
- for now, take it as a class similar
to
std::string
with an explicit allocator argument, which is given the default
globalFudAllocator
in the declaration.
First,
fud::getEnv
checks that its input is valid. If it isn't, it returns the
appropriate
FudStatus
code. Next, it calls
getenv
, and checks that its
return is not null. If it is null, it indicates that no match was found, so
FudStatus::NotFound
is returned. Finally, the return value of String::makeFromCString is passed
straight back to the caller. Constructing a String allocates and can therefore
fail, and without the ability to throw an exception that failure has to travel
through the return value, so the caller receives either the new String or the
FudStatus describing what went wrong.
3.5. Results are nicer than sentinels and out parameters.
The idea of a result type is to allow a single return to encapsulate two
variants depending on if the function succeeded or failed. I have elected to use
std::variant to implement them instead of tagged unions to reduce the
possibility of undefined behavior, but the cost is that an invalid variant
access will throw
std::bad_variant_access
. I consider this an acceptable trade
for the semantics of results compared to throwing exceptions (particularly for
non-exceptional circumstances like network timeouts), pairing out-parameters
with results, or even worse, sentinel values:
int sentinel(int x)
{
    if (isInvalid(x)) {
        errno = EINVAL;
        return -1;
    }
    return foo(x);
}

void doThing(std::vector<int> xVec)
{
    for (auto x : xVec) {
        // -1 may be an acceptable output of foo
        errno = 0;
        auto tmp = sentinel(x);
        if (tmp == -1 && errno != 0) {
            printf("Error %d\n", errno);
        } else {
            printf("Mapped %d to %d\n", x, tmp);
        }
    }
}
When using a sentinel, it occasionally is the case that the sentinel is a valid
output of the function in question, which requires additional logic to determine
if it's truly an error condition or not. The
abseil
project
recommends against
the use of sentinels
. The C standard library is full of these awful
functions - a good demonstration of a valid sentinel value comes from
atoi
-
in
glibc
and
musl
, 0 is returned on error, but
errno
is not set! The onus
is on the user to check that the string is not convertible to 0 to determine if
it is an error condition. Thankfully,
strtol
rectifies the situation… by
allowing
errno
to be set, to facilitate determining if the sentinel value is
valid. This also necessitates setting
errno
to 0 before making the
call. Examining the next alternative, using an out parameter, doesn't make a
compelling argument for ergonomics either.
FudStatus outParameter(int x, int& out)
{
    if (isInvalid(x)) {
        return FudStatus::InvalidInput;
    }
    out = foo(x);
    return FudStatus::Success;
}

void doThing(std::vector<int> xVec)
{
    for (auto x : xVec) {
        int out = 0;
        auto status = outParameter(x, out);
        if (status != FudStatus::Success) {
            printf("Error %s\n", FudStatusToString(status));
        } else {
            printf("Mapped %d to %d\n", x, out);
        }
    }
}
As you can see, using an out parameter requires that the argument used to take
the output must exist before the function call. The state of the value may be
indeterminate, and should not be used in an error scenario before writing a
different value to it. Now, let's see why I think
Result
is the right way of
handling this:
Result<int, FudStatus> returnResult(int x)
{
    if (isInvalid(x)) {
        return Result<int, FudStatus>::error(FudStatus::InvalidInput);
    }
    return Result<int, FudStatus>::okay(foo(x));
}

void doThing(std::vector<int> xVec)
{
    for (auto x : xVec) {
        auto result = returnResult(x);
        if (result.isError()) {
            printf("Error %s\n", FudStatusToString(result.getError()));
        } else {
            printf("Mapped %d to %d\n", x, result.getOkay());
        }
    }
}
C++ finally has its own result type in the standard library with the addition of std::expected in C++23.
3.6. Result: The Foundation of libfud
It is not an exaggeration to say that the
Result
class is a foundational
building block of this library. At the time of writing, a grep in the include
and source directories finds at least 92 instances of it - and many functions
which do not return a
Result
instead return
FudStatus
. For the sake of
expedience, I will post only a shortened version, skipping move-versions of
functions - you can view the full implementation
here
.
#include <variant>
#include <utility>

namespace fud {

/** \brief A result type which contains either a T on success or an E on error. */
template <typename T, typename E>
class [[nodiscard]] Result {
  public:
    using ResultType = Result<T, E>;

    static constexpr ResultType okay(const T& okay)
    {
        return ResultType{okay};
    }

    static constexpr ResultType error(const E& error)
    {
        return ResultType{error};
    }

    [[nodiscard]] constexpr bool isOkay() const
    {
        return (m_value.index() == 0);
    }

    [[nodiscard]] constexpr bool isError() const
    {
        return (m_value.index() == 1);
    }

    [[nodiscard]] constexpr const T& getOkay() const&
    {
        return std::get<T>(m_value);
    }

    [[nodiscard]] constexpr const T& getOkayOr(const T& alternative) const&
    {
        if (!isOkay()) {
            return alternative;
        }
        return std::get<T>(m_value);
    }

    [[nodiscard]] constexpr const E& getError() const&
    {
        return std::get<E>(m_value);
    }

    [[nodiscard]] constexpr const E& getErrorOr(const E& alternative) const&
    {
        if (!isError()) {
            return alternative;
        }
        return std::get<E>(m_value);
    }

  private:
    constexpr Result() : m_value() {}

    /* the constructors taking a T or an E, used by okay() and error(),
       are omitted from this shortened listing */

    std::variant<T, E> m_value;
};

} // namespace fud
Starting with the obvious: the class is templated on the success type, T, and
the error type, E. The methods
isOkay
and
isError
do what they say on the
tin;
isOkay
is analogous to the member function
has_value
of
std::expected
. Access to the value is guarded through the
getOkay
and
getError
methods; grabbing the wrong variant is a sure-fire way to stop the
program. An alternative is to use the
getOkayOr
and
getErrorOr
methods, which
will return the supplied value if the result is an error or okay, respectively.
I'm not sure how I feel about the design of
std::expected.
I would say it's
better than nothing, but the framing of the abstraction is one which is so
asymmetrically preferential to the happy path that handling the sad path is
unduly burdensome. I prefer Rust's design for a simple reason: C++ lacks the
pattern matching
that truly makes
Result
a first class citizen in Rust. For
the foreseeable future, whether as additions to the standard library or in 3rd
party libraries,
std::expected
,
std::optional
, and other algebraic data
types can not be as strong of a design choice as they are in languages with
pattern matching.
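For comparison, the closest thing standard C++17 offers to a match expression is std::visit over a std::variant with an overload set of lambdas. The sketch below illustrates that technique only and is not how libfud's Result is consumed; Error, IntResult, Overloaded, and describe are hypothetical names.

```cpp
#include <string>
#include <variant>

// Hypothetical error type, for illustration only.
enum class Error { InvalidInput, NotFound };

using IntResult = std::variant<int, Error>;

// Helper that merges several lambdas into one overload set (C++17).
template <typename... Fs>
struct Overloaded : Fs... {
    using Fs::operator()...;
};
template <typename... Fs>
Overloaded(Fs...) -> Overloaded<Fs...>;

// Dispatch on whichever alternative the variant currently holds.
inline std::string describe(const IntResult& result)
{
    return std::visit(
        Overloaded{
            [](int value) { return "okay: " + std::to_string(value); },
            [](Error error) {
                return std::string{
                    error == Error::NotFound ? "error: not found"
                                             : "error: invalid input"};
            }},
        result);
}
```

It works, and the compiler does reject a visitor that forgets an alternative, but the helper template, the lambdas, and the requirement that every branch return the same type are a lot of ceremony next to a single match in Rust.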
3.7. FudStatus
There's one major element left from the opening discussion about
libfud.hpp
to
discuss,
FudStatus
.
FudStatus
is a scoped enumeration meant to capture
standard failure modes, as well as success.
// fud_status.hpp
enum class [[nodiscard]] FudStatus : int32_t
{
    Success = 0,
    NullPointer,
    Failure,
    NotFound,
    Partial,
    /* Omitted */
    NotImplemented,
    NotSupported
};

constexpr const char* FudStatusToString(FudStatus status)
{
    switch (status) {
    case FudStatus::Success:
        return "Success";
    case FudStatus::NullPointer:
        return "NullPointer";
    /* Omitted */
    default:
        return "Unknown";
    }
}
The function
FudStatusToString
can also be used for a string representation of
the status, to facilitate pretty printing the status.
4. Additional components of libfud
With that, the introduction to libfud is complete. There are more parts: string conversion, formatting, and the wrapper around the SQLite C interface would all make good articles. As I write additional articles, I will link back to them on this page.