Stateless Properties

4. Stateless Properties

Stateless properties are the foundational block of all property-based testing tools, and tend to be supported across all implementations of quickcheck derivatives in the wild. Knowing and understanding these gives the primitives required to use all kinds of property-based testing frameworks, but more importantly, they are the first step in starting to think in properties rather than in enumerated examples.

This chapter introduces the general structure of a property, how to execute property tests, and the data generators that are available out of the box when using PropEr or Quickcheck in Erlang. The chapter then concludes with a simple case study related to writing simple properties.

4.1. Structure of Properties

PropEr’s plugin allows the creation of basic properties through rebar3’s plugin facilities:

→ rebar3 new proper base
===> Writing test/prop_base.erl

The generated file should contain the following sections:

-module(prop_base).
-include_lib("proper/include/proper.hrl").

%%%%%%%%%%%%%%%%%%
%%% Properties %%% (1)
%%%%%%%%%%%%%%%%%%
prop_test() ->
    ?FORALL(Type, mytype(),
        begin
            boolean(Type)
        end).

%%%%%%%%%%%%%%%
%%% Helpers %%% (2)
%%%%%%%%%%%%%%%
boolean(_) -> true.

%%%%%%%%%%%%%%%%%%
%%% Generators %%% (3)
%%%%%%%%%%%%%%%%%%
mytype() -> term().

1	a section for properties, the test cases themselves
2	a section for helper functions, which will likely handle part of the work required by more complex properties
3	a section for generators, special functions used to generate data

Not all property test suites will have this structure in that specific order, but their elements will likely be there in all cases.

Properties as shown in the previous code sample make heavy use of Erlang macros. The macro has the form:

            (1)
?FORALL(InstanceOfType, TypeGenerator, (2)
        PropertyExpression). (3)

1	This is a variable that gets bound to the value of the generated data structure
2	This is a call to a function that generates data. This is usually done through or with the help of special functions provided by the framework.
3	An expression that must return either `true` if the property holds, or `false` if it fails.

Since PropertyExpression is a single expression, usage of a begin … end expression is frequent. It allows to wrap more complex expressions into a single one. The ?FORALL macro can be translated as:

proper:forall(TypeGenerator, fun(InstanceOfType) -> PropertyExpression end).

This format may feel less magical, and removes the need for the less-often needed begin … end expression. However, since all property-based testing tools in Erlang implement the ?FORALL macro interface, they may end up being more readable to other developers in the end.

4.2. Running Properties

Properties can be run from the command line by calling rebar3 proper:

→ rebar3 proper
[...]
===> Testing prop_base:prop_test()
....................................................................................................
OK: Passed 100 test(s).
===>
1/1 properties passed

In this output, every period is a successful test case. For every single one of these test cases, PropEr did the following:

it expanded the type generator (term()) to a given data type
it passed the expanded type to the property, binding it to the InstanceofType variable
it validated that the actual PropertyExpression returned true

That term() generator is something that is not standard in Erlang. That’s where property-based testing frameworks do a bit of magic. The include file directives at the top of the file is causing all of it:

-include_lib("proper/include/proper.hrl").

The file contains macro code and inlines and imports multiple functions from proper. The term() function is actually shorthand for the proper_types:term() function call, which gives the following result when called from a shell:

→ rebar3 as test shell
[...]
1> proper_types:term().
{'$type',[{env,[{'$type',[{env,{inf,inf}},
                          {generator,{typed,#Fun<proper_types.7.91186128>}},
                          {is_instance,{typed,#Fun<proper_types.8.91186128>}},
                          {shrinkers,[#Fun<proper_types.9.91186128>]},
                          [...]

Each generator is a function that returns metadata that PropEr knows how to interpret. This is a high level recipes for data types, and it will be handled automatically by the framework. To debug generators in the terminal, PropEr can be told to generate an instance of the data type through the proper_gen:pick(Type) function call:

2> proper_gen:pick(proper_types:term()).
{ok,{30,'iPMÒR\237M\203',[],0,-2.3689578345518227,{},
     [[],4,{}],
     -0.04598152960751201}}
3> proper_gen:pick(proper_types:term()).
{ok,['^ l&\212iå×\210\n',1.5133511315056003,
     -8.964623044991622,-49.99970474088979,
     {<<1:1>>,1,5.666317902012531,{}},
     3.799206557402714,-0.5871676980812601,3]}
4> proper_gen:pick(proper_types:number()).
{ok,-6}

This will prove useful when debugging generators, especially as they grow in complexity.

4.3. Default Generators

Property-based testing frameworks tend to come with a few basic generators included. In the case of PropEr and Quickcheck, those are:

Generator

Data Generated

Sample

any()

all Erlang terms producible by PropEr

any of the samples below

atom()

Erlang atoms

'ós43Úrcá\200'

binary()

Binary data aligned on bytes

<<2,203,162,42,84,141>>

binary(Len)

Binary data of fixed length Len, in bytes

<<98,126,144,175,175>>

bitstring()

Binary data without byte alignment

<<10,74,2:3>>

bitstring(Len)

Binary data of fixed length Len, in bits

<<11:5>>

boolean()

Atoms true or false. Also bool()

true, false

byte()

Integer between 0 and 255 inclusively

154

char()

Character codepoint, between 0 and 1114111 inclusively

23

choose(Min, Max)

Any integer between Min and Max inclusively

choose(1, 1000) ⇒ 596

fixed_list([Type])

Generates a list where all entries match a generator

fixed_list([boolean(), byte(), atom()]) ⇒ [true,2,b]

float()

Floating point number. Also real().

4.982972307245969

float(Min, Max)

Floating point number between Min and Max inclusively

float(0.0, 10.0) ⇒ 5.934602482212932

frequency([{N, Type}])

Returns the generator of one of those in the second tuple element, with a probability similar to the value N. Also weighed_union/1

frequency([{1, atom()}, {10, char()}, {3, binary()}]) ⇒ 23 (chances of getting an atom are 1:10)

function([Arg], Ret)

Generates anonymous functions

function([char(), bool()], atom()) → #Fun<proper_gen.25.96459342>

integer()

Integers

89234

list()

List of terms, equivalent to list(any())

any of the samples, as list elements

list(Type)

A list of terms of type Type

list(boolean()) ⇒ [true, true, false]

loose_tuple(Type)

A tuple with terms of type Type

loose_tuple(boolean()) ⇒ {true, true, false}

neg_integer()

An integer smaller than 0

-1231

non_empty(Gen)

Constraint for binaries and list not being empty

non_empty(list()) ⇒ [abc]

non_neg_float()

A float greater than or equal to 0.0

98.213012312

non_neg_integer()

An integer greater than or equal to 0

98

number()

A float or integer

123

oneof(Types)

Returns the generator of one of those in Types, also union(Types) and elements(Types) ^[1]

oneof([atom(), char()]) ⇒ a

pos_integer()

Integer greater than 0

1

range(Min, Max)

Any integer between Min and Max inclusively

range(1, 1000) ⇒ 596

string()

Equivalent to list(char())

"^DQ^W^{^R/}D" (may generate weird escape sequences!)

term()

Same as any()

any of the samples in this table

timeout()

Union of non_neg_integer() and the atom infinity

312

tuple()

A tuple of random terms

tuple() ⇒ {true, 13.321123, -67}

{T1, T2, …}

A literal tuple with types in it

{boolean(), char()} ⇒ {true, 1}

vector(Len, Type)

A list of length Len of type Type

vector(5, integer()) ⇒ [17,0,1,3,8]

4.4. Writing Properties

What is a Property introduced how to think about properties; this chapter introduces the foundations of an actual property-based testing framework. While a large amount of properties can be written with just the default generators. The biggest(List) function from What is a Property represents a good opportunity to turn theory to practice, with a test-first approach.

All properties must have a name of the form prop_Name(). The harness for PropEr in rebar3 looks at the prop_ prefix, both in the file name and the function, to know which pieces of code to run. The first property can be implemented with the same overall format as the template:

-module(prop_base).
-include_lib("proper/include/proper.hrl").

%%%%%%%%%%%%%%%%%%
%%% Properties %%%
%%%%%%%%%%%%%%%%%%
prop_biggest() ->
    ?FORALL(List, list(integer()),
        begin
            biggest(List) =:= lists:last(lists:sort(List))
        end).

The type generator (list(integer())) is a composition of types, generating a list of integers for each iteration. The type generators for PropEr can be parametrized whenever it makes sense (i.e. a list can take a type as an argument, and so can a tuple, but an integer would not make sense to parametrize).

The biggest/1 function itself may look as follows:

%%%%%%%%%%%%%%%
%%% Helpers %%%
%%%%%%%%%%%%%%%
biggest([Head | _Tail]) ->
    Head.

This implementation is obviously wrong, since the biggest element is not always gonna be the first of the list. Running the tests will only confirm that:

→ rebar3 proper
===> Verifying dependencies...
===> Compiling pbt
===> Testing prop_base:prop_biggest()
!
Failed: After 1 test(s).
An exception was raised: error:function_clause.
Stacktrace: [{prop_base,biggest, (1)
                        [[]],    (2)
                        [{file,"/Users/ferd/code/self/pbt/pbt/test/prop_base.erl"},
                         {line,16}]},
             {prop_base,'-prop_biggest/0-fun-0-',1,
                        [{file,"/Users/ferd/code/self/pbt/pbt/test/prop_base.erl"},
                         {line,10}]}].
[] (3)

Shrinking (0 time(s))
[] (4)

1	The stack trace tells us that `prop_base:biggest([])` failed.
2	The argument list in the stacktrace for the failing function has a single argument in it, the empty list (`[]`)
3	The empty list created by the generator caused the failure and is reported
4	The framework attempts to shrink the failing test case further, but the empty list cannot be simplified further. This is as simple as the output will be.

Getting the biggest entry out of an empty list is nonsensical and should reasonably fail. The use case can be ignored by forcing the property to run on non-empty lists:

prop_biggest() ->
    ?FORALL(List, non_empty(list(integer())),  (1)
        begin
            biggest(List) =:= lists:last(lists:sort(List))
        end).

1	the `non_empty()` generator has been added around the list of integers

Running the test again, the result is now:

→ rebar3 proper
===> Verifying dependencies...
===> Compiling pbt
===> Testing prop_base:prop_biggest()
.........!
Failed: After 10 test(s).
[-1,6,3,1]

Shrinking ...(3 time(s))
[0,1]
===>
0/1 properties passed, 1 failed
===> Failed test cases:
prop_base:prop_biggest() -> false

9 tests passed. The tenth one failed. Initially, it did so with the list of numbers [-1,6,4,1]. PropEr then shrunk the list to a simpler expression that triggers the failing case: a list with two numbers, where the second one is bigger than the first one.

This is legitimate output causing a legitimate failure. The code needs to be patched:

%%%%%%%%%%%%%%%
%%% Helpers %%%
%%%%%%%%%%%%%%%
biggest([Head | Tail]) ->
    biggest(Tail, Head).

biggest([], Biggest) ->
    Biggest;
biggest([Head|Tail], Biggest) when Head > Biggest ->
    biggest(Tail, Head);
biggest([Head|Tail], Biggest) when Head < Biggest ->
    biggest(Tail, Biggest).

This code iterates over the entire list while keeping note of the biggest element seen at any given point. Whenever an element is bigger than the noted element, it replaces it. At the end of the iteration, the biggest element seen is returned.

With the new implementation, the previous case should be resolved:

→ rebar3 proper
===> Verifying dependencies...
===> Compiling pbt
===> Testing prop_base:prop_biggest()
.................!
Failed: After 18 test(s).
An exception was raised: error:function_clause.
Stacktrace: [{prop_base,biggest,
                        [[0,6,2,17],0],
                        [{file,"/Users/ferd/code/self/pbt/pbt/test/prop_base.erl"},
                         {line,19}]},
             {prop_base,'-prop_biggest/0-fun-0-',1,
                        [{file,"/Users/ferd/code/self/pbt/pbt/test/prop_base.erl"},
                         {line,10}]}].
[0,-4,0,6,2,17]

Shrinking ..(2 time(s))
[0,0]
===>
0/1 properties passed, 1 failed
===> Failed test cases:
  prop_base:prop_biggest() -> false

After 18 runs, property-based testing found a bug. The initial failing case was [0,-4,0,6,2,17]. The actual problem with that result is not necessarily obvious. Shrinking however reduced the counterexample to [0,0], and the interpretation is simpler: if a list is being analyzed and the currently largest item is equal to the one being looked at, the comparison fails.

A quick patch can address this:

biggest([Head | Tail]) ->
    biggest(Tail, Head).

biggest([], Biggest) ->
    Biggest;
biggest([Head|Tail], Biggest) when Head >= Biggest -> (1)
    biggest(Tail, Head);
biggest([Head|Tail], Biggest) when Head < Biggest ->
    biggest(Tail, Biggest).

1	the `>` operator is changed for `>=` in order to handle equality.

The property finally holds:

→ rebar3 proper
===> Verifying dependencies...
===> Compiling pbt
===> Testing prop_base:prop_biggest()
....................................................................................................
OK: Passed 100 test(s).
===>
1/1 properties passed

In fact, the property can be exercised more by running it 10,000 times:

→ rebar3 proper -n 10000
===> Verifying dependencies...
===> Compiling pbt
===> Testing prop_base:prop_biggest()
....................................... [...]
OK: Passed 10000 test(s).
===>
1/1 properties passed

That function being properly tested hinges on trusting the lists:sort/1 function. It is a trustable function since it is in the standard library. For the sake of the exercise, a property for it could be:

prop_sort() ->
    ?FORALL(List, list(term()),
        begin
            is_ordered(lists:sort(List))
        end).

[...]

is_ordered([A,B|T]) ->
    A =< B andalso is_ordered([B|T]);
is_ordered(_) -> % smaller lists
    true.

The is_ordered/1 function implements the check described in Alternate wording of properties—"each number in a sorted list is smaller than (or equal to) the one that follows."

Calling rebar3 proper will check both properties. A single property can be run by specifying it with the -p argument:

→ rebar3 proper -p prop_sort
===> Verifying dependencies...
===> Compiling pbt
===> Testing prop_base:prop_sort()
....................................................................................................
OK: Passed 100 test(s).
===>
1/1 properties passed

The only untrusted function left is lists:last():

prop_last() ->
    ?FORALL({List, Last}, {list(term()), term()},
        begin
            NewList = lists:append(List, [Last]),
            Last =:= lists:last(NewList)
        end).

Rather than just picking a static value to be used as a last entry for every list, the generator is a two-element tuple. The first element is the list, and the second one is a generated integer to be used as a final value.

The property will pass every time.

4.5. Questions

4.5.1. Question 1

What are 3 types of strategies that can be used to help design properties?

4.5.2. Question 2

What is a function call to get a sample of a generator’s data?

4.5.3. Question 3

Explain what the following property could be testing or demonstrating:

prop_a_sample() ->
    ?FORALL({Start,Count}, {integer(), non_neg_integer()},
        begin
            List = lists:seq(Start, Start+Count),
            Count+1 =:= length(List)
            andalso
            increments(List)
        end).

increments([Head | Tail]) -> increments(Head, Tail).

increments(_, []) -> true;
increments(N, [Head|Tail]) when Head == N+1 -> increments(Head, Tail);
increments(_, _) -> false.

This is made more difficult than it needs to be because there are no comments in the code.

4.5.4. Question 4

Using prop_sort() as a source of inspiration, write some properties to show that lists:keysort/2 works properly

4.5.5. Question 5

The following property uses a model to validate that it works fine. However, the property is failing. Can you figure out if the model or the system under test is to blame? Can you fix it?

prop_set_union() ->
    ?FORALL({ListA, ListB}, {list(number()), list(number())},
        begin
            SetA = sets:from_list(ListA),
            SetB = sets:from_list(ListB),
            ModelUnion = lists:sort(ListA ++ ListB),
            lists:sort(sets:to_list(sets:union(SetA, SetB))) =:= ModelUnion
        end).

4.5.6. Question 6

The following property verifies that two dictionaries being merged results in each of the entries being in there only once by comparing each of the keys is present:

prop_dict_merge() ->
    ?FORALL({ListA, ListB}, {list({term(), term()}), list({term(), term()})},
        begin
            Merged = dict:merge(fun(_Key, V1, _V2) -> V1 end,
                                dict:from_list(ListA),
                                dict:from_list(ListB)),
            extract_keys(lists:sort(dict:to_list(Merged)))
            ==
            lists:usort(extract_keys(ListA ++ ListB))
        end).

extract_keys(List) -> [element(1, T) || T <- List].

We however assert that the test is not solid enough because it only superficially tests the result of merging both dictionaries. What parts of its inputs and outputs are not being validated properly, and how would you improve it?

4.5.7. Question 7

Write a function that counts the number of words in a given string of text, and write a property that validates its implementation. Only consider spaces (' ') to be word-separators.

4.6. Solutions

4.6.1. Question 1

What are 3 types of strategies that can be used to help design properties?

Modeling; comparing the implementation with a simpler but obviously correct one
Wording the specification differently, and generalizing traditional tests
Symmetric properties; related functions are all tested together

4.6.2. Question 2

What is a function call to get a sample of a generator’s data?

proper_gen:pick(proper_type:Type())

4.6.3. Question 3

Explain what the following property could be testing or demonstrating:

prop_a_sample() ->
    ?FORALL({Start,Count}, {integer(), non_neg_integer()},
        begin
            List = lists:seq(Start, Start+Count),
            Count+1 =:= length(List)
            andalso
            increments(List)
        end).

increments([Head | Tail]) -> increments(Head, Tail).

increments(_, []) -> true;
increments(N, [Head|Tail]) when Head == N+1 -> increments(Head, Tail);
increments(_, _) -> false.

This is made more difficult than it needs to be because there are no comments in the code.

What the property is doing is validating the lists:seq(Start, Stop) function, which would be expected to return a list of integers in the range [Start, …, Stop]. For example, running lists:seq(2,5) should return [2,3,4,5]. The property does the validation of this by looking at two aspects of such a list:

the list should contain as many entries as the range covered by both terms (2..5 has 4 entries, or just (5-2)+1)
to avoid having the test succeed on outputs such as [1,1,1,1], the increments/1 function is used to ensure that each number is greater than the following one.

4.6.4. Question 4

Using prop_sort() as a source of inspiration, write some properties to show that lists:keysort/2 works properly

Two example solutions:

%% @doc this function tests that any lists of `{Key,Val}' pairs
%% end up being able to be sorted by the key by using `lists:keysort/2'.
prop_keysort1() ->
    ?FORALL(List, list({term(),term()}),
        begin
            %% is_key_ordered checks that all tuples' keys are
            %% ordered. Modified from `prop_base:is_ordered/1'
            is_key_ordered(lists:keysort(1, List))
        end).

is_key_ordered([{A,_},{B,_}=BTuple|T]) ->
        A =< B andalso is_key_ordered([BTuple|T]);
is_key_ordered(_) -> % smaller lists
        true.

%% @doc This function instead works by using random tuples with various
%% sizes, and picking a random key to test it.
%% This tests broader usages of lists:keysort, such as
%% `lists:keysort(2, [{a,b},{e,f,g},{t,a,n,e}])' yielding the list
%% `[{t,a,n,e},{a,b},{e,f,g}]', where the comparison takes place
%% on the second element of each tuple.
%%
%% While more complete than the previous one, this function
%% does not accurately portray the need for stability in the
%% function as documented: `[{a,b}, {a,a}]' being sorted will
%% not be tested here!
%% Those can either be added in a regular test case, or would
%% require devising a different property.
prop_keysort2() ->
    ?FORALL(List, non_empty(list(non_empty(list()))),
        begin
            %% Since the default built-in types do not let easily
            %% create random sized-tuples that do not include `{}',
            %% which is not working with `lists:keysort/2', we
            %% create variable-size tuples ourselves.
            Tuples = [list_to_tuple(L) || L <- List],
            %% To know what position to use, we're going to use
            %% the smallest, to avoid errors
            Pos = lists:min([tuple_size(T) || T <- Tuples]),
            Sorted = lists:keysort(Pos, Tuples),
            Keys = extract_keys(Pos, Sorted),
            %% The keys returned by keysort have to be in the
            %% same order as returned by `lists:sort/1', which
            %% we now trust.
            Keys == lists:sort(Keys)
        end).

extract_keys(Pos, List) -> [element(Pos, T) || T <- List].

4.6.5. Question 5

The following property uses a model to validate that it works fine. However, the property is failing. Can you figure out if the model or the system under test is to blame? Can you fix it?

prop_set_union() ->
    ?FORALL({ListA, ListB}, {list(number()), list(number())},
        begin
            SetA = sets:from_list(ListA),
            SetB = sets:from_list(ListB),
            ModelUnion = lists:sort(ListA ++ ListB),
            lists:sort(sets:to_list(sets:union(SetA, SetB))) =:= ModelUnion
        end).

The problem is with the model; sets do not generally allow duplicate elements. In this case, the call that generates ModelUnion adds both lists together. This inadvertently maintains duplicate elements, which the actual sets module avoids.

If the call to lists:sort is changed to lists:usort (which removes duplicated elements) when handling ModelUnion, the property will adequately represent sets by removing duplicate elements and pass.

4.6.6. Question 6

The following property verifies that two dictionaries being merged results in each of the entries being in there only once by comparing each of the keys is present:

prop_dict_merge() ->
    ?FORALL({ListA, ListB}, {list({term(), term()}), list({term(), term()})},
        begin
            Merged = dict:merge(fun(_Key, V1, _V2) -> V1 end,
                                dict:from_list(ListA),
                                dict:from_list(ListB)),
            extract_keys(lists:sort(dict:to_list(Merged)))
            ==
            lists:usort(extract_keys(ListA ++ ListB))
        end).

extract_keys(List) -> [element(1, T) || T <- List].

The function is shaky because it validates only the keys portion of dictionary merging. The values resulting from the merge operation are untouched, and the test makes no mention of how it may resolve or report conflicts. To be safe, the property should either take into account the conflict-resolution function for merging, or a second property should be added to cover it.

4.6.7. Question 7

Write a function that counts the number of words in a given string of text, and write a property that validates its implementation. Only consider spaces (' ') to be word-separators.

First the function:

word_count(String) ->
    Stripped = string:trim(dedupe_spaces(String), both, " "),
    Spaces = lists:sum([1 || Char <- Stripped, Char =:= $\s]),
    case Stripped of
        "" -> 0;
        _ -> Spaces + 1
    end.

dedupe_spaces([]) -> [];
dedupe_spaces([$\s,$\s|Rest]) -> dedupe_spaces([$\s|Rest]);
dedupe_spaces([H|T]) -> [H|dedupe_spaces(T)].

And the test, using an alternative implementation:

prop_word_count() ->
    ?FORALL(String, non_empty(string()),
            word_count(String) =:= alt_word_count(String)
    ).

alt_word_count(String) -> space(String).

space([]) -> 0;
space([$\s|Str]) -> space(Str);
space(Str) -> word(Str).

word([]) -> 1;
word([$\s|Str]) -> 1+space(Str);
word([_|Str]) -> word(Str).

1. in QuickCheck, oneof() shrinks towards a failing element whereas elements() shrinks towards the first element of the list. PropEr does not make that distinction.

PropEr Testing