---
title:
C++ Automatic Memory Management
---
# Memory Safety
Modern, mostly memory safe C++ is enforced by:

- the GSL
- the Microsoft safety checker
- the C++ Core Guidelines
- a language checker

`$ clang-tidy test.cpp -checks='clang-analyzer-cplusplus*,cppcoreguidelines-*,modernize-*'` will catch most of the issues that esr
complains about, in practice usually all of them, though I suppose that as
the project gets bigger, some will slip through.

    static_assert(__cplusplus >= 201703L, "C++ version is out of date");

The GSL adds `span` for pointer arithmetic, where the
size of the array pointed to is kept with the pointer for safe iteration and
bounds checking during pointer maths. This is available in the standard library as `std::span` from C++20.

Modern C++ handles arrays as arrays where possible, but they quickly
decay to pointers, which you avoid by using spans. `std::array` is a C array
whose size is known at compile time, and which is protected from decay to
a pointer. `std::vector` is a dynamically resizable and insertable array,
protected from decay to a pointer, which can have significant overheads.

`std::make_unique` and `std::make_shared` create pointers to memory managed
objects. (But these are pointers to single objects, not arrays; use spans for pointer
arithmetic.)

    auto sp = std::make_shared<int>(42);
    std::weak_ptr<int> wp{sp};

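
A minimal sketch of how the `weak_ptr` above behaves once the owning `shared_ptr` goes away. The helper name `read_or_default` and the `-1` fallback are assumptions made for illustration only.

```c++
#include <cassert>
#include <memory>

// Returns the observed value, or -1 if the object has already been destroyed.
int read_or_default(const std::weak_ptr<int>& wp) {
    if (auto locked = wp.lock()) {      // lock() yields a shared_ptr, empty if expired
        return *locked;
    }
    return -1;
}

void weak_ptr_demo() {
    auto sp = std::make_shared<int>(42);
    std::weak_ptr<int> wp{sp};          // observes, does not own
    assert(sp.use_count() == 1);        // the weak_ptr adds no ownership
    assert(read_or_default(wp) == 42);
    sp.reset();                         // last owner gone, the int is destroyed
    assert(wp.expired());
    assert(read_or_default(wp) == -1);
}
```

Going through `lock()` rather than holding a raw pointer means the observer can never touch freed memory; it either gets a live owner or finds out that the object is gone.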
# Array sizing and allocation
    /* This code creates a bunch of "brown dog" strings on the heap to test automatic memory management. */
    char ca[]{ "red dog" }; // Automatic array sizing
    std::array<char,8> arr{"red dog"}; // Requires #include <array>
    /* No automatic array sizing, going to have to count your initializer list. */
    /* The pointer to the underlying array is &arr[0], but arr is not the underlying array, nor a pointer to it. */
    /* [0] invokes operator[], and operator[] is the member function that accesses the underlying array. */
    /* The size of the underlying array is arr.size(). */
    /* Size is known at compile time, so the array can be returned from a function, getting the benefits of stack allocation. */
    // can be passed around like POD
    char *p = new char[10]{ "brown dog" }; // No automatic array
                                           // sizing for new
    std::unique_ptr<char[]> puc{ p }; // Now you do not have
                                      // to remember to delete p
    auto puc2 = std::move(puc); /* No copy constructor. Pass by reference, or pass a view, such as a span. */
    std::unique_ptr<char[]> puc3{ new char[10]{ "brown dog" } }; // char[], not char, so that delete[] is used
    /* Array size unknown at compile or run time, needs a span, and you have to manually count the initialization list. */
    /* Compiler guards against overflow, but does not default to the correct size. */
    /* You can just guess a way too small size, and the compiler in its error message will tell you what the size should be. */
    auto pu = std::make_unique<char[]>(10); // zero-initialized,
                                            // needs procedural initialization.
    /* A span can be trivially created from a compile time declared array, a std::array, or a run time std::vector, but then these things already have the characteristics of a span, and they own their own storage. */
    /* You would use a span to point into an array, for example a large blob containing smaller blobs. */
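
The ownership transfer performed by `std::move` on a `std::unique_ptr<char[]>` can be checked with a small sketch. The function names `make_dog` and `ownership_demo` are hypothetical, chosen only for this illustration.

```c++
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

// Builds a heap string owned by a unique_ptr<char[]>, so delete[] is used on release.
std::unique_ptr<char[]> make_dog() {
    const char text[] = "brown dog";
    std::unique_ptr<char[]> p(new char[sizeof text]);
    std::memcpy(p.get(), text, sizeof text);     // sizeof includes the trailing '\0'
    return p;                                    // ownership moves out to the caller
}

void ownership_demo() {
    auto first = make_dog();
    auto second = std::move(first);              // transfer, not copy
    assert(first == nullptr);                    // the source no longer owns anything
    assert(std::strcmp(second.get(), "brown dog") == 0);
}   // second releases the array here
```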
    // Placement New:
    char *buf = new char[1000]; // pre-allocated buffer
    char *q = buf;              // next free position in the buffer
    MyObject *pMyObject = new (q) MyObject();
    q += ((sizeof(MyObject)+7)/8)*8; // advance, rounded up to a multiple of 8
    /* Problem is that you will have to explicitly call the destructor on each object before freeing your buffer. */
    /* If your objects are POD plus code for operating on POD, you don't have to worry about destructors. */
    // A POD object cannot do run time polymorphism.
    /* The pointer referencing it has to be of the correct compile time type, and it has to explicitly have the default constructor when constructed with no arguments. */
    /* If, however, you are building a tree in the pre-allocated buffer, no sweat. */
    /* You just destruct the root of the tree, and it recursively destructs all its children. */
    /* If you want an arbitrary graph, just make sure you have owning and non owning pointers, and the owning pointers form a tree. */
    /* Anything you can do with run time polymorphism, you can likely do with a type flag. */
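
The placement new pattern, including the explicit destructor calls it requires, might look like the following sketch. `Tracked`, `round_up` and `placement_demo` are hypothetical names, and the buffer lives on the stack here purely to keep the example short.

```c++
#include <cassert>
#include <cstddef>
#include <new>

struct Tracked {                 // hypothetical type that counts live instances
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

int placement_demo() {
    alignas(Tracked) unsigned char buf[256];     // pre-allocated, suitably aligned
    constexpr std::size_t stride = round_up(sizeof(Tracked), alignof(Tracked));

    Tracked* a = new (buf) Tracked(1);
    Tracked* b = new (buf + stride) Tracked(2);
    int sum = a->value + b->value;
    assert(Tracked::live == 2);

    b->~Tracked();     // explicit destructor calls, in reverse order
    a->~Tracked();
    assert(Tracked::live == 0);
    return sum;        // the buffer itself is released when it goes out of scope
}
```

Rounding each offset up to the alignment of the type, rather than to a fixed 8, keeps the sketch correct for types with stricter alignment.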
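    class MyClass
    {
    public:
        MyClass() = default; // Otherwise unlikely to be POD
        MyClass& operator=(const MyClass&) = default; // default assignment. Not actually needed, but just a reminder.
    };
    // Requires #include <type_traits>. std::is_pod is deprecated in C++20;
    // std::is_trivial and std::is_standard_layout together replace it.
    static_assert( std::is_pod<MyClass>(), "MyClass for some reason is not POD" );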
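
Since `std::is_pod` is deprecated from C++20, the same guarantee can be checked with the narrower traits. This is a sketch with hypothetical types `PlainRecord` and `NotPlain`; `byte_copy` only exists to show what the traits license.

```c++
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

struct PlainRecord {            // hypothetical POD-like type
    int id;
    char tag[4];
    PlainRecord() = default;
};

struct NotPlain {               // owns heap memory through std::string
    std::string name;
};

static_assert(std::is_trivially_copyable<PlainRecord>::value, "safe to memcpy");
static_assert(std::is_standard_layout<PlainRecord>::value, "C-compatible layout");
static_assert(std::is_trivially_destructible<PlainRecord>::value, "no destructor work");
static_assert(!std::is_trivially_copyable<NotPlain>::value, "must not be memcpy'd");

// Copies a record byte by byte, which is only legal for trivially copyable types.
PlainRecord byte_copy(const PlainRecord& src) {
    PlainRecord dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

void plain_record_demo() {
    PlainRecord r;
    r.id = 7;
    std::memcpy(r.tag, "abc", 4);
    PlainRecord c = byte_copy(r);
    assert(c.id == 7);
    assert(std::strcmp(c.tag, "abc") == 0);
}
```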
### alignment
```c++
// every object of type struct_float will be aligned to alignof(float) boundary
// (usually 4)
struct alignas(float) struct_float {
// your definition here
};
// every object of type sse_t will be aligned to 256-byte boundary
struct alignas(256) sse_t
{
float sse_data[4];
};
// the array "cacheline" will be aligned to 128-byte boundary
alignas(128) char cacheline[128];
```
# Construction, assignment, and destruction
Six things ([default
constructor](https://en.cppreference.com/w/cpp/language/default_constructor),
[copy
constructor](https://en.cppreference.com/w/cpp/language/copy_constructor),
[move
constructor](https://en.cppreference.com/w/cpp/language/move_constructor),
[copy
assignment](https://en.cppreference.com/w/cpp/language/copy_assignment),
[move
assignment](https://en.cppreference.com/w/cpp/language/move_assignment)
and [destructor](https://en.cppreference.com/w/cpp/language/destructor))
are generated by default except when they are not: declaring one of them yourself can suppress or delete the implicit versions of others.
So it is arguably a good idea to explicitly declare them as default or
deleted.

Copy constructor

    A(const A& a)

Copy assignment

    A& operator=(const A& other)

Move constructor

    class_name(class_name&& other)
    A(A&& o)
    D(D&&) = default;

Move assignment operator

    V& operator=(V&& other)

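
To see which of the six is selected by which syntax, here is a small sketch. `Probe` is a hypothetical class that simply records the last special member function that ran on it.

```c++
#include <cassert>
#include <string>
#include <utility>

struct Probe {
    std::string last;                       // which operation ran most recently
    Probe() : last("default ctor") {}
    Probe(const Probe&) : last("copy ctor") {}
    Probe(Probe&&) noexcept : last("move ctor") {}
    Probe& operator=(const Probe&) { last = "copy assign"; return *this; }
    Probe& operator=(Probe&&) noexcept { last = "move assign"; return *this; }
    ~Probe() = default;
};

std::string special_member_trace() {
    Probe a;                                // default constructor
    Probe b = a;                            // copy constructor
    Probe c = std::move(a);                 // move constructor
    std::string trace = b.last + ", " + c.last;
    b = c;                                  // copy assignment
    trace += ", " + b.last;
    b = std::move(c);                       // move assignment
    trace += ", " + b.last;
    return trace;
}

void special_member_demo() {
    assert(Probe{}.last == "default ctor");
    assert(special_member_trace() == "copy ctor, move ctor, copy assign, move assign");
}
```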
## rvalue references
Move constructors and copy constructors primarily exist to tell the
compiler how to handle temporary values, rvalues, that have references to possibly
costly resources.

`class_name&&` is an rvalue reference, the canonical example being a reference to a compiler generated temporary.

The primary purpose of rvalue references is to support move semantics in
objects that reference resources, primarily `std::unique_ptr`.

`std::move(t)` is equivalent to `static_cast<std::remove_reference_t<decltype(t)>&&>(t)`, causing move
semantics to be generated by the compiler.

After the move, the compiler assumes that your move constructor or move assignment has left `t` in a valid state where your destructor will not need to do anything very costly.

`std::forward<T>(t)` causes move semantics to be invoked iff the thing referenced
is an rvalue, typically a compiler generated temporary, *conditionally*
forwarding the resources.

`std::forward` is defined essentially as follows:

    template< class T > struct remove_reference {
        typedef T type;
    };
    template< class T > struct remove_reference<T&> {
        typedef T type;
    };
    template< class T > struct remove_reference<T&&> {
        typedef T type;
    };
    template<class S>
    S&& forward(typename std::remove_reference<S>::type& a) noexcept
    {
        return static_cast<S&&>(a);
    }

`std::move(t)` and `std::forward<T>(t)` don't actually perform any action
in themselves. They are casts, which cause the code referencing `t` to select the intended
constructor or the intended assignment.
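
The conditional nature of `std::forward` shows up when a wrapper relays its argument to an overloaded function. In this sketch `sink`, `relay_forward` and `relay_plain` are hypothetical names.

```c++
#include <cassert>
#include <string>
#include <utility>

// Two overloads so we can see which value category arrives.
std::string sink(const std::string&) { return "lvalue"; }
std::string sink(std::string&&)      { return "rvalue"; }

// T&& on a deduced template parameter is a forwarding reference.
template <class T>
std::string relay_forward(T&& arg) {
    return sink(std::forward<T>(arg));   // preserves the caller's value category
}

template <class T>
std::string relay_plain(T&& arg) {
    return sink(arg);                    // a named parameter is always an lvalue
}

void forwarding_demo() {
    std::string s = "x";
    assert(relay_forward(s) == "lvalue");
    assert(relay_forward(std::string("tmp")) == "rvalue");
    assert(relay_forward(std::move(s)) == "rvalue");
    assert(relay_plain(std::string("tmp")) == "lvalue");   // rvalue-ness was lost
}
```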
## constructors and destructors
Declaring the destructor deleted does more than prevent the compiler from
generating its own, possibly disastrous, destructor: objects of the class
can then never be destroyed, and you cannot also define a destructor of
your own. To replace the implicit destructor, just define your own,
which stops the compiler from generating one anyway.

When you declare your own constructors, copiers, movers, and deleters,
you should generally mark them noexcept when they cannot throw, the movers above all.

    struct foo {
        foo() noexcept {}
        foo( const foo & ) noexcept { }
        foo( foo && ) noexcept { }
        ~foo() {}
    };

Destructors are noexcept by default. If a destructor throws an exception
during the stack unwinding caused by another exception, `std::terminate` is called,
which is usually very bad. This problem is resolved in complicated ad hoc
ways that are unlikely to be satisfactory.

If you need to define a copy constructor, you probably also need to define
an assignment operator.

    t2 = t1; /* calls assignment operator, same as "t2.operator=(t1);" */
    Test t3 = t1; /* calls copy constructor, same as "Test t3(t1);" */

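
One practical reason to mark move constructors `noexcept`: when `std::vector` reallocates, it only moves its elements if the move cannot throw, and otherwise falls back to copying. The sketch below counts both cases. `SafeMover`, `RiskyMover`, `Counters` and `grow_once` are hypothetical names.

```c++
#include <cassert>
#include <type_traits>
#include <vector>

struct Counters { int copies = 0; int moves = 0; };

struct SafeMover {                       // move constructor promises not to throw
    static Counters count;
    SafeMover() = default;
    SafeMover(const SafeMover&) { ++count.copies; }
    SafeMover(SafeMover&&) noexcept { ++count.moves; }
};
Counters SafeMover::count;

struct RiskyMover {                      // same type, but the move may throw
    static Counters count;
    RiskyMover() = default;
    RiskyMover(const RiskyMover&) { ++count.copies; }
    RiskyMover(RiskyMover&&) { ++count.moves; }
};
Counters RiskyMover::count;

static_assert(std::is_nothrow_move_constructible<SafeMover>::value, "");
static_assert(!std::is_nothrow_move_constructible<RiskyMover>::value, "");

// Put one element in a vector, then force a reallocation.
template <class T>
Counters grow_once() {
    T::count = Counters{};
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back();                    // constructed in place, no copy or move
    v.reserve(v.capacity() + 1);         // the element is transferred to new storage
    return T::count;
}

void noexcept_demo() {
    assert(grow_once<SafeMover>().moves == 1);
    assert(grow_once<SafeMover>().copies == 0);
    assert(grow_once<RiskyMover>().copies == 1);   // vector refused to move
}
```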
## casts
You probably also want casts. The surprising thing about a cast operator
is that its return type is not declared, nor permitted to be declared,
because the name of the operator already states it: DRY. Operator casts are the same thing as constructors, except declared
in the source class instead of the destination class, hence most useful
when you are converting to a generic C type, or to the type of an
external library that you do not want to change.

    struct X {
        int y;
        operator int(){ return y; }
        operator const int&(){ return y; } /* C habits would lead you to incorrectly expect "return &y;", which is what is implied under the hood. */
        operator int*(){ return &y; } // Hood is opened.
    };

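
A short sketch of a cast operator feeding a C-style function, plus an `explicit` one that has to be asked for. `Meters` and `doubled` are hypothetical names.

```c++
#include <cassert>

struct Meters {                      // hypothetical wrapper type
    double value;
    // Conversion operator: the return type is implied by the operator's name.
    operator double() const { return value; }
    // explicit blocks silent use in arithmetic, but still allows if (m) and static_cast.
    explicit operator bool() const { return value != 0.0; }
};

double doubled(double x) { return 2.0 * x; }   // stands in for a C-style API

void conversion_demo() {
    Meters m{1.5};
    assert(doubled(m) == 3.0);       // implicit Meters -> double
    assert(static_cast<bool>(m));    // explicit conversion must be requested
    Meters zero{0.0};
    if (zero) { assert(false); }     // contextual conversion to bool is allowed
}
```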
Mpir, the Visual Studio fork of the GMP arbitrary precision library, has some
useful and ingenious template code for converting C type functions of
the form `SetAtoBplusC(void * a, void * b, void * c);` into C++
expressions of the form `a = b+c*d;`. It has a bunch of intermediate
types with no real existence, `__gmp_expr<>` and `__gmp_binary_expr<>`,
and methods with no real existence, which generate the appropriate
calls, a templated function of potentially unlimited complexity, to
convert such an expression into the relevant C type calls using
pointers. See mpir-3.0.0.pdf, section 17.5 “C++ Internals”.

I don't understand the Mpir code, but I think what is happening is that
at run time, the binary expression operating on two base types creates a
transient object on the stack containing pointers to the two base types,
and the assignment operator and copy constructor then call the
appropriate C code, and the operator for entities of indefinite
complexity creates base type values on the stack and a binary expression
operator pointing to them.

It is simpler, though it introduces a redundant copy, to always generate
intermediate values on the stack. We have fixed length objects
that do not need dynamic heap memory allocation, so this is not that costly, and
they are not that big, at worst thirty two bytes, so clever code is apt
to cost more in overheads of pointer management than it saves.

That just means we are putting 256 bits of intermediate data on the
stack instead of 128, hardly a cost worth worrying about. And in the
common bad case, (a+b)\*(c+d), clever coding would only save one stack
allocation and redundant copy.
# Template specialization
    namespace N {
        template<class T> class Y { /*...*/ }; // primary template
        template<> class Y<double>; // forward declare specialization for double
    }
    template<>
    class N::Y<double> { /*...*/ }; // OK: specialization in same namespace

Template specialization is used when you have sophisticated template code, because you have to
use recursion for looping. The Mpir system uses it to evaluate an
arbitrarily complex recursive expression, but I think my rather crude
implementation will not be nearly so clever.

    extern template int fun(int);
    /* Prevents redundant instantiation of fun in this compilation unit, and thus renders the code for fun unnecessary in this compilation unit. */

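
Recursion for looping, with a specialization as the terminating case, in its smallest form. `Factorial` is a hypothetical example, not taken from Mpir.

```c++
#include <cassert>

// Primary template: the recursive step.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Full specialization: the base case that stops the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

static_assert(Factorial<5>::value == 120, "evaluated entirely at compile time");

void factorial_demo() {
    unsigned long long ten = Factorial<10>::value;   // a compile time constant
    assert(ten == 3628800ULL);
}
```

Without the `Factorial<0>` specialization the compiler would keep instantiating `Factorial<N - 1>` until it hit its template depth limit.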
# Template traits, introspection
Template traits: before C++20, C++ has no syntactic sugar to ensure that your template
is only called using the classes you intend it to be called with, so the checks are written with type traits, `static_assert` and `std::enable_if`. C++20 adds concepts and `requires` clauses for this.

Often you want different templates for classes that implement similar functionality in different ways.

This is the entire very large topic of template metaprogramming, compile time code, which is a whole new ball of wax that needs to be dealt with elsewhere.
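
A sketch of both pre-C++20 techniques: `std::enable_if` to pick a different template per kind of type, and `static_assert` to reject unintended types with a readable message. `describe` and `checked_half` are hypothetical names.

```c++
#include <cassert>
#include <string>
#include <type_traits>

// Selected only for integral types.
template <class T>
typename std::enable_if<std::is_integral<T>::value, std::string>::type
describe(T) { return "integral"; }

// Selected only for floating point types.
template <class T>
typename std::enable_if<std::is_floating_point<T>::value, std::string>::type
describe(T) { return "floating"; }

// A hard, readable error instead of overload selection.
template <class T>
T checked_half(T x) {
    static_assert(std::is_arithmetic<T>::value, "checked_half needs a number");
    return x / 2;
}

void traits_demo() {
    assert(describe(3) == "integral");
    assert(describe(2.5) == "floating");
    assert(checked_half(10) == 5);
    assert(checked_half(3.0) == 1.5);
    // checked_half(std::string("x")) would fail to compile with the message above.
}
```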
# Abstract and virtual
An abstract base class is a base class that contains a pure virtual
function ` virtual void features() = 0;`.
A class can have a virtual destructor, but not a virtual constructor.
If a class contains virtual functions, then the default constructor has
to initialize the pointer to the vtable. Otherwise, the default
constructor for a POD class is empty, which implies that the default
destructor is empty.
The copy and swap copy assignment operator, a rather slow and elaborate
method of guaranteeing that an exception will leave the system in a good
state, is never generated by default, since it always relates to rather
clever RAII.
An interface class is a class that has no member variables, and where
all of the functions are pure virtual. In other words, the class is
purely a definition, and has no actual implementation. Interfaces are
useful when you want to define the functionality that derived classes
must implement, but leave the details of how the derived class
implements that functionality entirely up to the derived class.

Interface classes are often named beginning with an I. Here's a sample
interface class:

    class IErrorLog
    {
    public:
        virtual bool openLog(const char *filename) = 0;
        virtual bool closeLog() = 0;
        virtual bool writeError(const char *errorMessage) = 0;
        virtual ~IErrorLog() {} // make a virtual destructor in case we delete an IErrorLog pointer, so the proper derived destructor is called
        // Notice that the virtual destructor is declared to be trivial, but not declared =0;
    };

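
A possible implementation of the interface, to show why the virtual destructor matters. `MemoryErrorLog` and `report` are hypothetical, and `IErrorLog` is repeated only so that the sketch compiles on its own.

```c++
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class IErrorLog {                       // repeated so the sketch compiles alone
public:
    virtual bool openLog(const char* filename) = 0;
    virtual bool closeLog() = 0;
    virtual bool writeError(const char* errorMessage) = 0;
    virtual ~IErrorLog() {}
};

class MemoryErrorLog : public IErrorLog {    // hypothetical in-memory implementation
public:
    explicit MemoryErrorLog(bool& destroyed) : destroyed_(destroyed) {}
    ~MemoryErrorLog() override { destroyed_ = true; }
    bool openLog(const char*) override { open_ = true; return true; }
    bool closeLog() override { open_ = false; return true; }
    bool writeError(const char* msg) override {
        if (!open_) return false;
        lines_.push_back(msg);
        return true;
    }
    std::size_t size() const { return lines_.size(); }
private:
    bool& destroyed_;
    bool open_ = false;
    std::vector<std::string> lines_;
};

// Code written against the interface never names the concrete class.
bool report(IErrorLog& log, const char* msg) { return log.writeError(msg); }

void interface_demo() {
    bool destroyed = false;
    {
        std::unique_ptr<IErrorLog> log(new MemoryErrorLog(destroyed));
        assert(!report(*log, "too early"));   // not open yet
        log->openLog("ignored");
        assert(report(*log, "disk full"));
    }                                          // deleted through the base pointer
    assert(destroyed);   // the derived destructor ran, thanks to the virtual destructor
}
```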
[Override
specifier](https://en.cppreference.com/w/cpp/language/override)

    struct A
    {
        virtual void foo();
        void bar();
    };
    struct B : A
    {
        void foo() const override; // Error: B::foo does not override A::foo
                                   // (signature mismatch)
        void foo() override; // OK: B::foo overrides A::foo
        void bar() override; // Error: A::bar is not virtual
    };

Similarly [Final
specifier](https://en.cppreference.com/w/cpp/language/final)

[To obtain aligned
storage](http://www.cplusplus.com/reference/type_traits/aligned_storage/) for
use with placement new

    void* p = std::aligned_alloc(alignof(MyClass), sizeof(MyClass)); // C++17, <cstdlib>; not provided by MSVC
    MyClass* pmc = new (p) MyClass; // Placement new.
    // ...
    pmc->~MyClass(); // Explicit call to destructor.
    std::free(p);

# GSL: Guideline Support Library
The Guideline Support Library (GSL) contains functions and types that
are suggested for use by the C++ Core Guidelines maintained by the
Standard C++ Foundation. [Microsoft's implementation
of GSL](https://github.com/Microsoft/GSL) can be fetched with:

    git clone https://github.com/Microsoft/GSL.git
    cd GSL
    git tag
    git checkout tags/v2.0.0

This implementation mostly works on gcc/Linux, but is canonical on
Visual Studio.

For usage of spans, see [the replacement for bare naked non owning pointers
subject to pointer
arithmetic](http://codexpert.ro/blog/2016/03/07/guidelines-support-library-review-spant/).

For usage of string spans, see [String
spans](http://codexpert.ro/blog/2016/03/21/guidelines-support-library-review-string_span/).
These are pointers to char arrays. There does not seem to be a UTF8
string_span.

GSL is a preview of C++20, as boost contained a preview of C++11.

It is disturbingly lacking in official documentation, perhaps because
it is still subject to change.

[Unofficial
documentation](http://modernescpp.com/index.php/c-core-guideline-the-guidelines-support-library)

It provides an optional fix for C's memory management problems, while
still retaining backward compatibility to the existing pile of rusty
razor blades and broken glass.
# The Curiously Recurring Template Pattern
[CRTP](https://www.fluentcpp.com/2017/05/16/what-the-crtp-brings-to-code/)
makes the relationship between the templated base class or classes and
the derived class cyclic, so that the derived class tends to function as
the real base class. Useful for mixin classes.

    template <typename T> class Mixin1 {
    public:
        // ...
        void doSomething() // using the other mixin classes and the derived class T
        {
            T& derived = static_cast<T&>(*this);
            // use derived...
        }
    private:
        Mixin1() {} // prevents the class from being used outside the mix
        friend T;
    };
    template <typename T> class Mixin2 {
    public:
        // ...
        void doSomethingElse()
        {
            T& derived = static_cast<T&>(*this);
            // use derived...
        }
    private:
        Mixin2() {}
        friend T;
    };
    class composite : public Mixin1<composite>, public Mixin2<composite> {
    public:
        // the mixins only have default constructors, so x and y stay with composite
        composite(int x, const char* y) : Mixin1(), Mixin2() { /* ... */ }
        composite() : composite(7, "a") { /* ... */ }
    };

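
A compilable variant of the same pattern, where the mixins actually reach into the derived class. `Greeter`, `Counter` and `Widget` are hypothetical names.

```c++
#include <cassert>
#include <string>
#include <utility>

template <typename T>
class Greeter {                          // hypothetical mixin
public:
    std::string greet() const {
        const T& derived = static_cast<const T&>(*this);
        return "hello " + derived.name();        // calls into the derived class
    }
private:
    Greeter() {}                         // only the friend T may construct
    friend T;
};

template <typename T>
class Counter {                          // second hypothetical mixin
public:
    int bump() {
        T& derived = static_cast<T&>(*this);
        return ++derived.count_;
    }
private:
    Counter() {}
    friend T;
};

class Widget : public Greeter<Widget>, public Counter<Widget> {
public:
    explicit Widget(std::string n) : name_(std::move(n)) {}
    std::string name() const { return name_; }
private:
    friend class Counter<Widget>;        // lets the mixin reach count_
    std::string name_;
    int count_ = 0;
};

void crtp_demo() {
    Widget w("bob");
    assert(w.greet() == "hello bob");
    assert(w.bump() == 1);
    assert(w.bump() == 2);
}
```

All the calls are resolved at compile time; there is no vtable and no virtual dispatch involved.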
# Aggregate initialization
A class of aggregate type has no constructors; aggregate
initialization is implied by default.

A class can be explicitly defined to take a braced initializer list:

    class T {
    public:
        T(std::initializer_list<unsigned char> in) {
            for (auto i{in.begin()}; i < in.end(); i++) {
                // do stuff with *i
            }
        }
    };

but that does not make it of aggregate type. An aggregate type has *no*
user-provided constructors. Before C++20, explicitly defaulted and deleted constructors were still allowed; since C++20 an aggregate may not declare any constructor at all.
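
The difference can be checked with `std::is_aggregate`, which needs C++17. In this sketch `Point` and `ByteSum` are hypothetical types.

```c++
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

struct Point {                   // aggregate: public members, no constructors
    int x;
    int y;
};

class ByteSum {                  // not an aggregate: it declares a constructor
public:
    ByteSum(std::initializer_list<unsigned char> in) {
        for (unsigned char b : in) { total_ += b; ++count_; }
    }
    unsigned total() const { return total_; }
    std::size_t count() const { return count_; }
private:
    unsigned total_ = 0;
    std::size_t count_ = 0;
};

static_assert(std::is_aggregate<Point>::value, "Point is an aggregate");
static_assert(!std::is_aggregate<ByteSum>::value, "ByteSum is not");

void aggregate_demo() {
    Point p{3, 4};               // aggregate initialization, member by member
    ByteSum s{1, 2, 3};          // same braces, but this calls the constructor
    assert(p.x == 3 && p.y == 4);
    assert(s.total() == 6);
    assert(s.count() == 3);
}
```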
# functional programming
To construct a lambda in the heap:

    auto p = new auto([a,b,c](){});

Objects captured by value are then stored in the heap, inside the closure object.

Similarly for placement `new` and `unique_ptr`.

To template a function that takes a lambda argument:

    template <typename F>
    void myFunction(F&& lambda){
        // some things
    }

You can put a lambda in a class using decltype, and pass it around for
continuations, though you would probably need to template the class:

    template<class T>class foo {
    public:
        T func;
        foo(T in) :func{ in } {}
        auto test(int x) { return func(x); }
    };
    ....
    auto bar = [](int x)->int {return x + 1; };
    foo<decltype(bar)> foobar(bar);

But we had to introduce a name, bar, so that decltype would have
something to work with, which lambdas are intended to avoid. If we are
going to have to introduce a compile time name, it is easier to do it as an
old fashioned function, method, or functor, as a method of a class that
is very possibly POD.

If we are keeping a lambda around to be called later, we might copy it by
value into a templated class, or might put it on the heap.

    auto bar = []() {return 5;};

You can give it to a std::function:

    auto func_bar = std::function<int()>(bar);

In this case, it will get a copy of the value of bar. If bar had
captured anything by value, there would be two copies of those values;
one in bar, and one in func_bar, which typically keeps small captures inside itself on the stack and puts larger ones on the heap.

When the current scope ends, func_bar will be destroyed, followed by
bar, as per the rules of cleaning up stack variables.

You could just as easily allocate one on the heap:

    auto bar_ptr = std::make_unique<decltype(bar)>(bar);

    std::function <int(int)> increm{[](int arg){return arg+1;}};

presumably does this behind the scenes.

On reflection we could probably use this method to produce a
templated function that stored a lambda somewhere in a templated class
derived from a virtual base class for execution when the event triggered
by the method fired, and returned a hashcode to the templated object for
the event to use when the event fired. The event gets the event handler
from the hashcode, and the virtual base class in the event handler fires
the lambda in the derived class, and the lambda works as a continuation,
operating in the context wherein it was defined, making event oriented
programming almost as intuitive as procedural programming.
But then we have a problem, because we would like to store event
handlers in the database, and restore them when the program restarts, which
requires POD event handlers, or event handlers constructible from POD
data, which a lambda is not.

We could always have some event handlers which are inherently not POD
and are never sent to a database, while other event handlers are, but
this violates the DRY design principle. To do full on functional
programming, use std::function and std::bind, which can encapsulate
lambdas and functors, but can be slow because of dynamic allocation.

C++ does not play well with functional programming. Most of the time you
can do what you want with lambdas and functors, using a POD class that
defines `operator()`.
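
One way the lambda-as-continuation idea could be sketched: a templated class derived from a virtual base stores the lambda, and the event fires it through an integer handle. The names `HandlerBase`, `Handler`, `EventTable`, the plain `int` handle and the one-shot behaviour are all assumptions made for illustration, not a design taken from this project.

```c++
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

struct HandlerBase {                         // virtual base the event system sees
    virtual void fire(int payload) = 0;
    virtual ~HandlerBase() = default;
};

template <class F>
struct Handler final : HandlerBase {         // one derived type per lambda type
    F func;
    explicit Handler(F f) : func(std::move(f)) {}
    void fire(int payload) override { func(payload); }
};

class EventTable {                           // hypothetical registry
public:
    template <class F>
    int add(F f) {                           // returns a handle for later lookup
        int id = next_id_++;
        table_[id] = std::unique_ptr<HandlerBase>(new Handler<F>(std::move(f)));
        return id;
    }
    bool fire(int id, int payload) {         // one-shot: the handler is removed, then run
        auto it = table_.find(id);
        if (it == table_.end()) return false;
        auto handler = std::move(it->second);
        table_.erase(it);                    // erase first, so the handler may add new ones
        handler->fire(payload);
        return true;
    }
private:
    int next_id_ = 1;
    std::unordered_map<int, std::unique_ptr<HandlerBase>> table_;
};

void event_demo() {
    EventTable events;
    int total = 0;
    int id = events.add([&total](int n) { total += n; });   // continuation captures context
    assert(events.fire(id, 5));
    assert(total == 5);
    assert(!events.fire(id, 5));             // already consumed
}
```

Because the closure type is erased behind `HandlerBase`, such handlers still cannot be written to a database, which is the limitation discussed above.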
# auto and decltype(variable)
In good C++, a tremendous amount of code behavior is specified by type
information, often rather complex type information, and the more of one's
code description is in types, the better.

But specifying types everywhere violates the DRY principle, hence,
wherever possible, use auto and decltype(variable) to avoid redundant
and repeated type information. Wherever you can use an auto or a
decltype for a type, use it.

In good event oriented code, events are not triggered procedurally, but
by type information or data structures, and they are not handled
procedurally, as by defining a lambda, but by defining a derived type.
# Variable length Data Structures
C++ just does not handle them well, except you embed a vector in them,
which can result in messy reallocations.
One way is to drop back into old style C, and tell C++ not to fuck
around.

    struct Packet
    {
        unsigned int bytelength;
        unsigned int data[]; // flexible array member: standard C, a compiler extension in C++
        // Will cause compiler error if you misuse this struct
        Packet(const Packet&) = delete;
        void operator=(const Packet&) = delete;
    };
    Packet* CreatePacket(unsigned int length)
    {
        // room for the header plus length trailing elements of data
        Packet *output = (Packet*) malloc(sizeof(Packet) + length*sizeof(unsigned int));
        if (output) output->bytelength = length;
        return output;
    }

Another solution is to work around C++'s inability to handle variable
sized objects by fixing your hash function to handle disconnected data.
# for_each
    template<class InputIterator, class Function>
    Function for_each(InputIterator first, InputIterator last, Function fn){
        while (first!=last) {
            fn (*first);
            ++first;
        }
        return fn; // implicitly moved since C++11
    }

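
Because `for_each` returns the function object, a stateful functor can carry a result out of the loop. A sketch, with the hypothetical names `Accumulate` and `total_of`:

```c++
#include <algorithm>
#include <cassert>
#include <vector>

struct Accumulate {             // hypothetical stateful functor
    int sum = 0;
    int calls = 0;
    void operator()(int x) { sum += x; ++calls; }
};

Accumulate total_of(const std::vector<int>& v) {
    // for_each takes the functor by value and hands the updated copy back.
    return std::for_each(v.begin(), v.end(), Accumulate{});
}

void for_each_demo() {
    Accumulate result = total_of({1, 2, 3, 4});
    assert(result.sum == 10);
    assert(result.calls == 4);
}
```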
# Range-based for loop
    for (auto x : temporary_with_begin_and_end_members) { code; }
    for (auto& x : temporary_with_begin_and_end_members) { code; }
    for (auto&& x : temporary_with_begin_and_end_members) { code; }
    for (T thing = foo(); auto& x : thing.items()) { code; } // init-statement, C++20

The types of the begin_expr and the end_expr do not have to be the same (since C++17),
and in fact the type of the end_expr does not have to be an iterator: it
just needs to be able to be compared for inequality with one. This makes
it possible to delimit a range by a predicate (e.g. “the iterator
points at a null character”).

If range_expression is an expression of a class type C that has both a
member named begin and a member named end (regardless of the type or
accessibility of such member), then begin_expr is \_\_range.begin() and
end_expr is \_\_range.end();

    for (T thing = foo(); auto x : thing.items()) { code; }

Produces code equivalent to:

    T thing = foo();
    auto&& bar = thing.items();
    auto enditer = bar.end();
    for (auto iter = bar.begin(); iter != enditer; ++iter) {
        auto x = *iter;
        code;
    }
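
A sketch of a range delimited by a predicate rather than an end iterator, as described above. It needs C++17, and `NullSentinel`, `CharIter`, `CString` and `count_chars` are hypothetical names.

```c++
#include <cassert>
#include <cstddef>

struct NullSentinel {};                     // end marker, deliberately not an iterator

struct CharIter {
    const char* p;
    char operator*() const { return *p; }
    CharIter& operator++() { ++p; return *this; }
    // The predicate "points at a null character" replaces an end pointer.
    bool operator!=(NullSentinel) const { return *p != '\0'; }
};

struct CString {                            // hypothetical range over a C string
    const char* text;
    CharIter begin() const { return CharIter{text}; }
    NullSentinel end() const { return NullSentinel{}; }
};

std::size_t count_chars(const char* s, char wanted) {
    std::size_t n = 0;
    for (char c : CString{s}) {             // begin and end have different types
        if (c == wanted) ++n;
    }
    return n;
}

void sentinel_demo() {
    assert(count_chars("banana", 'a') == 3);
    assert(count_chars("", 'a') == 0);
}
```

The loop never computes the length of the string; it stops when the inequality comparison with the sentinel fails.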