Functions and Files


Functions

A function is a named collection of statements, and consists of a function signature and a body. The signature describes the input the function requires and the output it gives, while the body is the collection of statements.

A function can look like this:

int foobar(int a) { return 42; }

or more abstractly

void|type foo(...parameter-list) { ...statements }

With the introduction of functions, we come across two more concepts: lifetimes and scopes. A lifetime is the period of execution in which a variable is alive, so to speak (we will get back to this), while a scope is where in a program we can access the given variable while it is alive.

For a function, everything declared inside the function has the same lifetime, meaning that they come to life when the function is called, and are then no longer usable, after the function is done executing. Let us look at an example.

int foo(int a) { int b{}; return a + b; }

Both int a and int b will here have the same lifetime, coming to life when the function is called and being destroyed when the function is done. There is one curiosity to keep in mind though. Although their lifetimes are the same, the variables are created in order, and then destroyed in the reverse order.

int foo(int a, int b) { int c {2}; return a + b + c; } // c, b, then a destroyed

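This will first create the parameters a and b, and at last the local variable c. Then, when destroying them, the order is the opposite: c -> b -> a. (Strictly speaking, the standard only fixes this order for the local variables; the relative order of the parameters is left to the compiler.)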
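To see the ordering for yourself, here is a minimal sketch. The Tracer type and the trace_locals function are hypothetical helpers made up for this illustration; they simply record every construction and destruction in a log. The sketch uses three local variables in a block rather than parameters, since that is the case where the order is fully pinned down.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records its own construction and destruction in a shared log.
struct Tracer {
    std::string name;
    std::vector<std::string>& log;

    Tracer(std::string n, std::vector<std::string>& l) : name(std::move(n)), log(l) {
        log.push_back("create " + name);
    }
    ~Tracer() { log.push_back("destroy " + name); }
};

// Creates three locals in a nested block and returns the recorded order of events.
std::vector<std::string> trace_locals() {
    std::vector<std::string> log;
    {
        Tracer a{"a", log};
        Tracer b{"b", log};
        Tracer c{"c", log};
    } // end of block: c, b, then a are destroyed here
    return log;
}
```

Calling trace_locals() gives a log where the three "create" entries appear in declaration order, followed by the "destroy" entries in the opposite order.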

Since the lifetime is the same, function parameters are considered function local:

void foo(int a) { ... };

Really has the same lifetime as:

void foo() { int a{}; }

Temporary Objects

Function returns will be our first encounter with temporary objects. Temporary objects are created by the compiler when it needs to hold a value for a short period of time, and a function return is one such case: the return creates a temporary object holding a copy of the returned value. Temporary objects have no scope, and are destroyed at the end of the full expression in which they are used. Thus a temporary object is normally destroyed before the next statement executes.

int foo() { return 42; }

int a = foo(); // Returns 42

The compiler will here create a temporary variable, holding the value 42, which is then valid until the end of the given statement, allowing the int a variable to be copy-initialised with the given value from the temporary variable (think of it as int tmp {42}; int a = tmp; // Then tmp is destroyed.)
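An int has no destructor we can observe, so here is a small sketch with a type that announces its own destruction. The names Noisy, make_noisy and trace_temporary are invented for this example, and the sketch assumes a reasonably modern compiler (C++17 or later, where the returned object is created directly as the temporary).

```cpp
#include <string>
#include <vector>

// Hypothetical type that logs when it is destroyed.
struct Noisy {
    std::vector<std::string>& log;
    int value;
    ~Noisy() { log.push_back("temporary destroyed"); }
};

// Returns a Noisy by value; the caller receives it as a temporary object.
Noisy make_noisy(std::vector<std::string>& log) { return Noisy{log, 42}; }

std::vector<std::string> trace_temporary() {
    std::vector<std::string> log;
    int a = make_noisy(log).value; // the temporary lives until the end of this statement
    log.push_back("next statement, a = " + std::to_string(a));
    return log;
}
```

In the returned log, the destruction message comes before the message written by the next statement, which is exactly the "destroyed at the end of the full expression" rule at work.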

Function declarations and definitions

Just like variables, functions also have to be declared, so that the compiler knows the function signature before generating the code for a function call. This is important because, as we noticed in our study of variables, variables can have different sizes, and hence the compiler must know, when generating the code for a function call, how much space it has to set aside for each argument. This takes us to the topic of the ABI, but that is not something we are going to dig into now. Maybe later! It is an interesting topic.

These function declarations we can split out into a separate file, called a header file, which then holds the declaration of the function signature.

// foo.hpp
int foo();

// main.cpp
#include "foo.hpp"

int main() { int a {foo()}; }

int foo() { return 42; }

This means that we can declare the function signature in a separate foo.hpp file, and then define the function later on in the main.cpp file, and everything will compile and work (which it does not if foo.hpp is not included).
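The same idea can be sketched in a single file, since #include does nothing more than paste the declaration in at the top. The function twice_foo is a made-up caller added only for this illustration.

```cpp
// Declaration: tells the compiler the signature, nothing more.
int foo();

// foo() can be called here because its signature is already known,
// even though the compiler has not seen the body yet.
int twice_foo() { return 2 * foo(); }

// Definition: supplies the body. It may come later in the same file,
// or live in a different source file altogether.
int foo() { return 42; }
```

Remove the first line and the call inside twice_foo no longer compiles, which is the same failure you get when foo.hpp is not included.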

This brings us nicely onto our next topic: compilation units (the C++ standard calls them translation units). A compilation unit is, as the name suggests, a self-standing unit of compilation. A very simple compilation unit for our project could be the pair foo.hpp and foo.cpp, with the function declaration in the header file and the function definition in the cpp file.

// foo.cpp
#include "foo.hpp"

int foo() { return 42; }

This means that we have created a self-standing unit of our program. This unit only has to be compiled once, no matter how many places it is used in our program. Although it may not seem like much, for larger projects this is a massive speed boost, as we do not have to recompile the whole project every time we change a line. In our project, for instance, changing a line in main.cpp will not recompile the int foo() function. When your project grows to thousands of modules, this is a big saving. Now, the pair foo.{cpp,hpp} is what is called a compilation unit: a freestanding unit of compilation! (It is that simple.)

This means that a compiler can compile each file independently. It does not need any knowledge of what is going on in any file other than the one it is compiling. This is a classic example of separation of concerns, a principle in computer science that you will find a lot of utility in.

ODR - The One Definition Rule

Let us keep building on our foo project. A long time goes by, and another int foo() function is added, but now with a different implementation, giving us the following two functions:

// foo.cpp
int foo() { return 42; }

// bar.cpp
int foo() { return 42 * 2; }

// main.cpp
#include "foo.hpp"
#include "bar.hpp"

int main() { int a { foo() }; }

Will the variable a in this instance be initialised with the value 42 or the double of given value?

The answer is that the program will not even build. Each file compiles on its own, but the linker then finds two definitions of int foo() and gives up. The program has broken the One Definition Rule, which states that every variable and function can have only one definition in a given program.

The ODR rule comes in three parts:

  1. Inside a single source file (after its headers are expanded), no function, variable or type is allowed to be defined more than once.

  2. Within a whole program, each given function or variable is only allowed to have one definition. This only applies to the symbols visible to the linker (i.e., symbols having external linkage).

  3. Types, Templates, Inline Functions and Inline Variables are allowed to have multiple definitions, in different files, as long as the definitions are identical.
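The third part is easiest to picture with a header in mind. Below is a rough sketch of what such a header could contain; the names answer, call_count, count_call and Pair are all invented for the example, and the inline variable assumes C++17 or later.

```cpp
// Imagine these definitions live in a header included by several source files.

// An inline function: every source file may contain this identical definition.
inline int answer() { return 42; }

// An inline variable (C++17 and later): one shared object for the whole program.
inline int call_count = 0;

// Also inline, so it is safe to define in a header; bumps the shared counter.
inline int count_call() { return ++call_count; }

// A type definition may likewise repeat, as long as every copy is identical.
struct Pair { int first; int second; };
```

Without the inline keyword, the function and variable definitions above would be ordinary definitions, and including the header from two source files would trip the second part of the rule.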

Namespaces

One solution to this is using namespaces, which give a different scope to the symbols defined within them. This reduces the possibility of naming conflicts between files when several are included in a project.

A namespace in C++ is the mechanism for scoping identifiers. Namespace A can contain exactly the same identifiers as namespace B without the two clashing.

This avoids the name collisions which would otherwise be introduced.

Depending on where the naming collision is introduced, the result will either be an error during compilation or, if the colliding definitions are introduced in separate files, the result is that the linker will bork.

The global namespace, as it is called, is where everything not in a declared namespace is put by default.
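As a small sketch of how this resolves our two foo functions, each one can be placed in its own namespace. The namespace names original and doubled, and the helper sum_of_foos, are made up for this example.

```cpp
namespace original {
int foo() { return 42; }
} // namespace original

namespace doubled {
int foo() { return 42 * 2; }
} // namespace doubled

// Lives in the global namespace and picks each foo by its qualified name.
int sum_of_foos() { return original::foo() + doubled::foo(); }
```

Both functions are still called foo, but their full names are original::foo and doubled::foo, so neither the compiler nor the linker sees a duplicate.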

Parsing

Parsing of a header file will proceed in multiple iterations.

  1. The preprocessor macros will be handled (i.e., all the ifdefs are evaluated, and other macros are expanded or discarded)
  2. All headers are recursively expanded
  3. The newly fully expanded source-file is parsed
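The first step can be sketched with a couple of macros. ENABLE_DOUBLING and BASE_VALUE are hypothetical names chosen for this illustration; only one of the two value() definitions survives preprocessing, so the parser never sees the other.

```cpp
#define ENABLE_DOUBLING // hypothetical feature switch
#define BASE_VALUE 42   // object-like macro, replaced textually

#ifdef ENABLE_DOUBLING
// Kept: ENABLE_DOUBLING is defined above.
int value() { return BASE_VALUE * 2; }
#else
// Discarded by the preprocessor before the compiler proper parses anything.
int value() { return BASE_VALUE; }
#endif
```

With GCC or Clang, the -E option stops after preprocessing and prints the expanded source, which is a handy way to check what actually reaches the parser.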

Header files

All .cpp files should have a matching .hpp file, and the header should contain declarations, not definitions.

The reason for this is that a header file could potentially be included in a lot of different source files, and a change in the header will cause every file including it to be recompiled. Believe me: in a big project, this turns into an issue really fast.

Angled brackets, vs double quotes (<> vs "") inclusion

The difference between <> and "" is that "" is resolved locally first. This means that even if there are multiple files named foo.hpp in your project, #include "foo.hpp" will typically pick up the foo.hpp found in the same folder as the file containing the #include (for instance, the foo.cpp you are currently working on). If no such file is found, the search falls back to the directories used for <>.

Angled brackets mean that the preprocessor will look for the header file in the include directories given to the compiler (for example via the -I option of GCC and Clang); for system headers, these directories are added automatically by the compiler itself.

Header guards

Guarding against duplicate definitions...

Remember that you may include two files which both include a third file. The data, definitions and declarations from the third file will then be included multiple times, and the program will fail to compile, to the frustration of everyone involved.

The well established solution to this problem, is the infamous header-guard:

#ifndef FILE_NAME_HPP
#define FILE_NAME_HPP

... source

#endif
      

And all is again well.
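To see why the guard works, here is a sketch that pastes the contents of a hypothetical point.hpp into one file twice, which is what the preprocessor effectively does when the header is reached through two different includes. The names POINT_HPP, Point and sum are made up for the example.

```cpp
// First "inclusion" of point.hpp: POINT_HPP is not yet defined, so the body is kept.
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };
#endif

// Second "inclusion": POINT_HPP is now defined, so everything
// up to the #endif is skipped and Point is not defined again.
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };
#endif

int sum(Point p) { return p.x + p.y; }
```

Delete the guard lines and the compiler complains about a redefinition of Point, which is the failure described above.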

It might also be advisable to add the file path to the header-guard name, to prevent collisions between the guards of identically named files in different folders.

Taking a look at the standard library included with my installation of `Clang`, we will find:

/*===---- stdbool.h - Standard header for booleans -------------------------===
*
* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
* See https://llvm.org/LICENSE.txt for license information.
* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
*
*===-----------------------------------------------------------------------===
*/

#ifndef __STDBOOL_H
#define __STDBOOL_H
      

Where it does not help

Header-guards do not guard against inclusion into different source-files. Hence we could still wind up with a header file (and its symbols and definitions) included multiple times into the same project (consisting of multiple source files).

Example:

// foo.hpp
#ifndef FOO_HPP
#define FOO_HPP

int foo_bar() {
    return 1;
}

int do_foo_bar();

#endif
      
// bar.cpp
#include "foo.hpp"

int do_foo_bar() {
    return foo_bar();
}
                
// main.cpp
#include "foo.hpp"

int main(int argc, char *argv[])
{
    foo_bar();
    do_foo_bar();
    return 0;
}
                

Giving the error during linking:

// console
❌1 zsh ❯ g++ bar.cpp main.cpp -o main
/nix/store/9g4gsby96w4cx1i338kplaap0x37apdf-binutils-2.43.1/bin/ld: /tmp/ccuARkCs.o: in function `foo_bar()':
main.cpp:(.text+0x0): multiple definition of `foo_bar()'; /tmp/ccvfToLj.o:bar.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
                

Formatted for clarity:

// console

/nix/store/9g4gsby96w4cx1i338kplaap0x37apdf-binutils-2.43.1/bin/ld:
/tmp/ccuARkCs.o:

in function `foo_bar()':
main.cpp:(.text+0x0):
multiple definition of `foo_bar()';
/tmp/ccvfToLj.o:bar.cpp:(.text+0x0):
first defined here
collect2: error: ld returned 1 exit status
                

The solution is of course to only have declarations in the header-file, and the definitions in an accompanying source-file.

// foo.hpp
#ifndef FOO_HPP
#define FOO_HPP

int foo_bar();
int do_foo_bar();

#endif
                

Now there will only be one definition of every function, and thus no violation of ODR, and hence the linker will be happy!

The C++ alternative to header-guards

C++ has its own shorter version of a header-guard:

//file.hpp
#pragma once

... source
    

Fun fact: #pragma once is not defined by the C++ standard, so strictly speaking there is no guarantee that your compiler will support it, although the major compilers do. For that reason it is frowned upon in some software houses. In fact, it has not been allowed any place I have worked so far! (Although personally I like and use it!) And it is funny to note that the std-lib does use it, of course!