10. Case Study: Stateful Properties with a Bookstore
To demonstrate the power of the stateful approach to property tests, a somewhat complex application involving a database will be used. The application relies on a PostgreSQL database. The examples of this chapter will require a valid installation in order to run code samples. Multiple guides exist, following any of them should work.
The project replicates a simple version of inventory management in a bookstore. The project itself is already written as an OTP application, and the tests are being added after the fact. The rebar.config file for the code base is:
{plugins, [rebar3_proper]}.
{escript_main_app, crud}.
{escript_name, "crud_init"}.
{escript_emu_args, "%%! -escript main crud_init\n"}.
{deps, [eql, {pgsql, "26.0.1"}, {uuid, "1.7.1", {pkg, uuid_erl}}]}.
{profiles, [{test, [{deps, [{proper, "1.2.0"}]}]}]}.
{shell, [{apps, [crud]}]}.
This shows the PropEr plugin being used, an Escript being generated (to set up the database), and dependencies on pgsql
, a database client, eql
, a library used to access SQL queries within Erlang, and uuid
, to be used for primary keys.
The script to set up the database is just a series of shell commands called from within Erlang:
-module(crud_init).
-export([main/1]).
main(_) ->
%% Only tested on linux and OSX
%% See also: https://www.postgresql.org/docs/9.6/static/server-start.html
ok = filelib:ensure_dir("postgres/data/.init-here"),
io:format("initializing database structure...~n"),
io:format("~s~n", [os:cmd("initdb -D postgres/data")]),
io:format("starting postgres instance...~n"),
io:format("~s~n", [os:cmd("pg_ctl -D postgres/data -l logfile start")]),
timer:sleep(5000),
io:format("setting up 'crud_db' database...~n"),
io:format("~s~n", [os:cmd("psql -h localhost -d template1 -c 'CREATE DATABASE crud_db;'")]),
io:format("OK.~n"),
init:stop().
The script can be built by calling rebar3 escriptize
and called as follows:
$ rebar3 escriptize
[...]
===> Building escript...
→ _build/default/bin/crud_init
initializing database structure...
[...]
OK.
With this in place, the application is ready to run. The application is stateful and based on Postgres, with the following crud.app.src
app file:
{application, crud,
[{description, "CRUD book handling"},
{vsn, "0.1.0"},
{registered, []},
{mod, {crud_app, []}},
{applications, [kernel, stdlib, eql, pgsql]},
{env,[
{pg, [
{user, "ferd"}, % $USER
{password, ""},
{database, "crud_db"}, % as specified by crud_init.erl
{host, "127.0.0.1"},
{port, 5432},
{ssl, false} % not for tests!
]}
]}
]}.
Which should be updated for the local user. The application starts a standard supervisor without children that loads the SQL queries via a call to the crud_db
module:
-module(crud_app).
-behaviour(application).
-export([start/2, stop/1]).
start(_StartType, _StartArgs) -> crud_sup:start_link().
stop(_State) -> ok.
-module(crud_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).
start_link() ->
crud_db:load_queries(),
supervisor:start_link({local, ?MODULE}, ?MODULE, []).
init([]) -> {ok, {{one_for_all, 0, 1}, []}}.
The entire application manages books through the set of (unoptimized) SQL queries that follow:
-- Setup the database
-- :setup_table_books
CREATE TABLE books (
isbn varchar(20) PRIMARY KEY,
title varchar(256) NOT NULL,
author varchar(256) NOT NULL,
owned smallint DEFAULT 0,
available smallint DEFAULT 0
);
-- Clean up the table
-- :teardown_table_books
DROP TABLE books;
-- Add a book
-- :add_book
INSERT INTO books (isbn, title, author, owned, available)
VALUES ($1, $2, $3, $4, $5 );
-- Add a copy of an existing book
-- :add_copy
UPDATE books SET
owned = owned + 1,
available = available + 1
WHERE isbn = $1;
-- Borrow a copy of a book
-- :borrow_copy
UPDATE books SET available = available - 1 WHERE isbn = $1 AND available > 0;
-- Return a copy of a book
-- :return_copy
UPDATE books SET available = available + 1 WHERE isbn = $1;
-- Find books
-- :find_by_author
SELECT * FROM books WHERE author LIKE $1;
-- :find_by_isbn
SELECT * FROM books WHERE isbn = $1;
-- :find_by_title
SELECT * FROM books WHERE title LIKE $1;
The queries are used in a final module that ties things together:
-module(crud_db).
-export([load_queries/0, setup/0, teardown/0,
add_book/3, add_book/5, add_copy/1, borrow_copy/1, return_copy/1,
find_book_by_author/1, find_book_by_isbn/1, find_book_by_title/1]).
setup() ->
run_query(setup_table_books, []).
teardown() ->
run_query(teardown_table_books, []).
add_book(ISBN, Title, Author) ->
add_book(ISBN, Title, Author, 0, 0).
add_book(ISBN, Title, Author, Owned, Available) ->
BinTitle = unicode:characters_to_binary(Title),
BinAuthor = unicode:characters_to_binary(Author),
case run_query(add_book, [ISBN, BinTitle, BinAuthor, Owned, Available]) of
{{insert,0,1},[]} -> ok;
{error, Reason} -> {error, Reason};
Other -> {error, Other}
end.
add_copy(ISBN) ->
handle_single_update(run_query(add_copy, [ISBN])).
borrow_copy(ISBN) ->
handle_single_update(run_query(borrow_copy, [ISBN])).
return_copy(ISBN) ->
handle_single_update(run_query(return_copy, [ISBN])).
find_book_by_author(Author) ->
handle_select(run_query(find_by_author, [iolist_to_binary(["%",Author,"%"])])).
find_book_by_isbn(ISBN) ->
handle_select(run_query(find_by_isbn, [ISBN])).
find_book_by_title(Title) ->
handle_select(run_query(find_by_title, [iolist_to_binary(["%",Title,"%"])])).
with_connection(Fun) ->
{ok, Conn} = connect(),
Res = Fun(Conn),
close(Conn),
Res.
run_query(Name, Args) ->
with_connection(fun(Conn) -> run_query(Name, Args, Conn) end).
run_query(Name, Args, Conn) ->
pgsql_connection:extended_query(query(Name), Args, Conn).
handle_select({{select, _}, List}) -> {ok, List};
handle_select(Error) -> Error.
handle_single_update({{update,1}, _}) -> ok;
handle_single_update({error, Reason}) -> {error, Reason};
handle_single_update(Other) -> {error, Other}.
connect() -> connect(application:get_env(crud, pg, [])).
connect(Args) ->
case pgsql_connection:open(Args) of
{pgsql_connection, _} = Conn -> {ok, Conn};
{error, _} = Err -> Err
end.
close(Conn) -> pgsql_connection:close(Conn).
load_queries() ->
ets:new(crud_sql, [named_table, public, set, {read_concurrency, true}]),
SQLFile = filename:join(code:priv_dir(crud), "queries.sql"),
{ok, Queries} = eql:compile(SQLFile),
ets:insert(crud_sql, Queries),
ok.
query(Name) ->
case ets:lookup(crud_sql, Name) of
[] -> {query_not_found, Name};
[{_, Query}] -> Query
end.
With this application, integration tests would be expected to be complex to write, not just because of the environment, but because of the validation rules that need to be written. Stateful property-based tests will make things simple.
10.1. Sequential Tests
Before starting to elaborate the model that will be used, defining the common generators for all interactions in the bookstore will be essential:
title() -> ?LET(S, string(), elements([S, unicode:characters_to_binary(S)])). (1)
author() -> ?LET(S, string(), elements([S, unicode:characters_to_binary(S)])). (2)
isbn() -> ?LET(ISBN, oneof([isbn13(), isbn10()]), iolist_to_binary(ISBN)). (3)
isbn13() ->
?LET({Code, ISBN}, {oneof(["978","979"]), isbn10()}, Code ++ "-" ++ ISBN).
isbn10() ->
?LET(Group, group(),
?LET({PubTitle, Check}, {publisher_title(length(Group)), check()},
Group ++ "-" ++ PubTitle ++ "-" ++ Check)).
group() -> (4)
?LET(Group, oneof([range(0,5), 7, range(80,94), range(600,621),
range(950,989), range(9926,9989), range(99901,99976)]),
integer_to_list(Group)).
publisher_title(GroupSize) -> (5)
Size = 9 - GroupSize,
?LET({Pos, PubTitle}, {range(1, Size-1), vector(Size, range($0,$9))},
begin
{Publisher, Title} = lists:split(Pos, PubTitle),
Publisher ++ "-" ++ Title
end).
check() -> [oneof([range($0,$9), $X])]. (6)
1 | The title of a book is any string. The API should tolerate both strings defined as lists of unicode codepoints or as utf8-encoded binaries, so both are generated |
2 | Authors are also arbitrary unicode strings |
3 | The ISBN stands for the International Standard Book Number, a series of 10 or 13 digits that uniquely identify every book published. The result is always coerced to a binary to help deal with pgsql that always returns binaries. |
4 | The groups identify the country of publishing or the language of the text, or in some cases, both. |
5 | The two middle parts of the ISBN are a specific code granted to a publisher, which can then assign the remaining numbers to identify the books they publish |
6 | The final digit is a check for error detection; this generator does not bother with it. There are 11 possible digits, going from 0 to 9, and then the letter X , standing for 10 in roman numerals. |
The property itself matches the general form shown in the previous chapter:
-module(prop_crud).
-include_lib("proper/include/proper.hrl").
-compile(export_all).
-export([command/1, initial_state/0, next_state/3,
precondition/2, postcondition/3]).
prop_test() ->
?FORALL(Cmds, commands(?MODULE),
begin
setup(),
{History, State, Result} = run_commands(?MODULE, Cmds),
teardown(),
?WHENFAIL(io:format("=======~n"
"Failing command: ~p~n"
"At state: ~p~n"
"=======~n"
"Result: ~p~n"
"History: ~p~n",
[lists:nth(length(History), Cmds),
State,Result,History]),
aggregate(command_names(Cmds), Result =:= ok))
end).
setup() ->
{ok, _} = application:ensure_all_started(crud),
crud_db:setup().
teardown() ->
crud_db:teardown().
The ?WHENFAIL
command is a little bit fancier than those seen before, and should help show more significant output if or when the tests fail.
10.1.1. Wide Scanning
The model can then be elaborated. To ensure the general test environment works, starting wide by doing basic command generation and executing the test without an active model works well:
%% Initial model value at system start. Should be deterministic.
initial_state() -> #{}.
command(_State) -> (1)
oneof([
{call, crud_db, add_book, [isbn(), title(), author(), 1, 1]},
{call, crud_db, add_copy, [isbn()]},
{call, crud_db, borrow_copy, [isbn()]},
{call, crud_db, return_copy, [isbn()]},
{call, crud_db, find_book_by_author, [author()]}, % or author part
{call, crud_db, find_book_by_title, [title()]}, % or title part
{call, crud_db, find_book_by_isbn, [isbn()]}
]).
%% Picks whether a command should be valid under the current state.
precondition(_State, {call, _Mod, _Fun, _Args}) -> (2)
true.
%% Given the state `State' *prior* to the call `{call, Mod, Fun, Args}',
%% determine whether the result `Res' (coming from the actual system)
%% makes sense.
postcondition(_State, {call, _Mod, _Fun, _Args}, _Res) ->
true.
%% Assuming the postcondition for a call was true, update the model
%% accordingly for the test to proceed.
next_state(State, _Res, {call, _Mod, _Fun, _Args}) ->
NewState = State,
NewState.
1 | All interface functions are added with arguments based on the generators defined earlier. |
2 | The rest of the stateful test interface is left untouched, to its default state. |
Running the tests works, but detects a failure (and quite a large one since pgsql
dies very noisily):
$ rebar3 proper
[...]
** Reason for termination ==
** {badarg,[{erlang,list_to_binary,
[[12,24,6,36,18,5,14,343,27,24,26,10,35,55,14,3,53,0]],
[]},
{pgsql_protocol,encode_parameter,4,
[{file,"/Users/ferd/code/self/pbt/book/crud/_build/default/lib/pgsql/src/pgsql_protocol.erl"},
{line,175}]},
[...]
(4 time(s))
[{set,{var,2},{call,crud_db,add_book,[[54,49,49,45,51,48,56,49,45,50,56,45,48],[12,24,6,36,18,5,14,343,27,24,26,10,35,55,14,3,53,0],<<0,2,13,51,2,27>>,1,1]}}]
The log output gets mixed up with the PropEr output, but looking at the end of the dump, the last error from the last shrinking attempt can be found with the problematic item. In this case, the postgres driver tries to encode a string (list of unicode codepoints) but chokes on it. By mapping it to the arguments in the symbolic call, it appears the title is a problem. It seems that the pgsql library cannot handle Erlang’s unicode strings, only the UTF-8 encoded binaries.
Erlang’s Unicode situation is complex. There are multiple formats of strings:
All types can be converted and handled by the |
By changing the database queries to always convert all forms of input to utf8-encoded binaries, the bug should go away:
add_book(ISBN, Title, Author) ->
add_book(ISBN, Title, Author, 0, 0).
add_book(ISBN, Title, Author, Owned, Available) ->
BinTitle = unicode:characters_to_binary(Title), (1)
BinAuthor = unicode:characters_to_binary(Author), (2)
case run_query(add_book, [ISBN, BinTitle, BinAuthor, Owned, Available]) of
{{insert,0,1},[]} -> ok;
{error, Reason} -> {error, Reason};
Other -> {error, Other}
end.
[...]
find_book_by_author(Author) -> (3)
handle_select(run_query(find_by_author, [unicode:characters_to_binary(["%",Author,"%"])])).
find_book_by_isbn(ISBN) ->
handle_select(run_query(find_by_isbn, [ISBN])).
find_book_by_title(Title) -> (4)
handle_select(run_query(find_by_title, [unicode:characters_to_binary(["%",Title,"%"])])).
Running the tests then fails on unicode values of 0x00
, meaning that the protocol likely uses zero-delimited strings and using them causes failures. The generators also need to be updated to succeed:[1]
title() -> friendly_unicode().
author() -> friendly_unicode().
friendly_unicode() ->
?LET(X, ?SUCHTHAT(S, string(),
not lists:member(0, S) andalso
nomatch =:= string:find(S, "_") andalso
nomatch =:= string:find(S, "%")), (1)
elements([X, unicode:characters_to_binary(X)])).
1 | Also ensures the string does not use SQL search patterns, because those are making the models complex fast! |
The tests pass, but aside from ensuring the system does not crash, nothing is validated. If the system’s return values were less general, it could make sense to add postconditions straight away, possibly looking like:
postcondition(_State, {call, _, add_book, _Args}, Res) ->
Res =:= ok;
postcondition(_State, {call, _, find_book_by_isbn, _Args}, {ok,L}) ->
is_list(L);
[...]
postcondition(_State, {call, _Mod, _Fun, _Args}, _Res) ->
false.
The framework would then walk through the system, validating output as it goes. The weakness of this approach is that it limits what can be validated. It is hard to know how many results find_book_*
functions should return without tracking them, as it may be difficult to update books without knowing the ones already in the system. The validation may be limited to what type analysis would provide, and a type analyzer is a better tool to do that kind of work than tests.
10.1.2. Precise Testing
Using a model allows to dig deeper into what the system can or should do. Being able to look at a simpler version of the program behaviour to predict correct expected results allows thorough testing of program behaviour. For the book system at hand, the simplest thing to do is to use a map to represent the database. On every function call made, the next_state
function tracks what happens:
next_state(State, _, {call, _, add_book, [ISBN,Title,Author,Owned,Avail]}) ->
State#{ISBN => {ISBN, Title, Author, Owned, Avail}}; (1)
next_state(State, _, {call, _, add_copy, [ISBN]}) ->
#{ISBN := {ISBN, Title, Author, Owned, Avail}} = State,
State#{ISBN => {ISBN, Title, Author, Owned+1, Avail+1}}; (2)
next_state(State, _, {call, _, return_copy, [ISBN]}) ->
#{ISBN := {ISBN, Title, Author, Owned, Avail}} = State,
State#{ISBN => {ISBN, Title, Author, Owned, Avail+1}}; (3)
next_state(State, _, {call, _, borrow_copy, [ISBN]}) ->
#{ISBN := {ISBN, Title, Author, Owned, Avail}} = State,
State#{ISBN => {ISBN, Title, Author, Owned, Avail-1}}; (4)
next_state(State, _Res, {call, _Mod, _Fun, _Args}) ->
NewState = State, (5)
NewState.
1 | The database uses the user-submitted ISBNs as a primary key |
2 | Adding a copy increments both the number of available and owned copies |
3 | Returning a book only increments the number of available copies, not the number owned. |
4 | Borrowing a book decreases the number of available copies |
5 | All other operations do not impact the state |
It is important to note that the results (_Res
) are ignored. Since all of the data regarding books can be derived deterministically from the arguments, there is no need to consider the values the system would return. The variable is useful in case a return value is used, although it cannot be pattern-matched against, nor transformed without symbolic calls since it may be a placeholder during the command generation phase. Avoiding the return value when possible is usually a good idea.
That gives an overall model of the expected execution flow, but without validation. An option would be to set up broad postconditions such as:
-
the number of available copies for a book is always at least 0
-
the number of owned copies for a book is always greater than 0
-
the number of available copies for a book is never greater than the number of owned copies
-
an error is returned whenever someone tries to borrow a copy of an unavailable book
The first 3 points can only be validated through the model, or when looking up a book. In fact, the next_state
transition is probably wrong there since it allows the model to go below 0 on a bunch of copies. The issue can be resolved in one of many ways:
-
Use
precondition
to prevent the bad state from happening innext_state
. Preconditions would invalidate the bad sequence of calls, but would remove the ability to do negative tests based on that behaviour as well. -
Do manual validation in
next_state
to ensure the constraints are always respected in the model. This is workable, but a bit backwards. Since the calls can yield different results based on the state, then the state transition has to reproduce the decisions made by the system at every round to make sure their logic matches. If the model becomes as complex as the application, then the tests become less reliable.[2] -
Introduce more determinism in the commands. A given call could yield a single type of expected results, making the transitions simpler to reason about. This is impractical if one does not want to change the API of code unless a shim is used. A shim is a middleman that can be used to wrap actual calls to the system into a format that appears predictable, allowing the model to make assumptions about the execution taking place.
A bit of all three approaches will be used in various situations based on what the developer desires. The first one is more useful when the call should not logically happen in some context (such as a 'log out' functionality not being possible without being logged in). The second and third one are fairly equivalent, and should be balanced according to complexity. Some obvious cases can be done easily in next_state
, but trickier ones will require behaviour similar to the one needed in the application to be reliable. For these cases, a deterministic shim becomes interesting since it guarantees that the logic to find out about conditions (such as 'the entry is duplicate') within the program will not be repeated in the test code. Note that preconditions will be required to help enforce determinism constraints nonetheless.
The shim module only needs to 'tunnel' the function calls and allow specific keys to mark determinism:
-module(shim_crud).
-compile(export_all).
add_book(_Key, ISBN, Title, Author, Owned, Avail) ->
crud_db:add_book(ISBN, Title, Author, Owned, Avail).
add_copy(_Key, Id) -> crud_db:add_copy(Id).
borrow_copy(_Key, Id) -> crud_db:borrow_copy(Id).
return_copy(_Key, Id) -> crud_db:return_copy(Id).
find_by_author(_Key, Author) -> crud_db:find_book_by_author(Author).
find_by_title(_Key, Title) -> crud_db:find_book_by_title(Title).
find_by_isbn(_Key, ISBN) -> crud_db:find_book_by_isbn(ISBN).
That is all it does. The key is always ignored; its only role to let the model annotate or classify specific calls:
command(S) -> (1)
Base = [{call, shim_crud, add_book, [new, isbn(), title(), author(), 1, 1]},
{call, shim_crud, add_copy, [invalid, isbn()]}, (2)
{call, shim_crud, borrow_copy, [invalid, isbn()]},
{call, shim_crud, return_copy, [invalid, isbn()]},
{call, shim_crud, find_by_author, [invalid, author()]},
{call, shim_crud, find_by_title, [invalid, title()]},
{call, shim_crud, find_by_isbn, [invalid, isbn()]}],
Stateful = case maps:size(S) of
1 -> % no values, just the state
[];
_ ->
[{call, shim_crud, add_book, [exists,isbn(S),title(),author(),1,1]},
{call, shim_crud, add_copy, [exists, isbn(S)]}, (3)
{call, shim_crud, borrow_copy, [avail, isbn(S)]},
{call, shim_crud, borrow_copy, [unavail, isbn(S)]},
{call, shim_crud, return_copy, [taken, isbn(S)]},
{call, shim_crud, return_copy, [full, isbn(S)]},
{call, shim_crud, find_by_author, [exists, author(S)]},
{call, shim_crud, find_by_title, [exists, title(S)]},
{call, shim_crud, find_by_isbn, [exists, isbn(S)]}]
end,
oneof(Base ++ Stateful).
[...]
isbn(State) -> (4)
elements(maps:keys(State)).
author(State) ->
elements([partial(Author) || {_,_,Author,_,_} <- maps:values(State)]).
title(State) ->
elements([partial(Title) || {_,Title,_,_,_} <- maps:values(State)]).
partial(String) ->
L = string:length(String), (5)
?LET({Start, Len}, {range(0, L), non_neg_integer()},
string:slice(String, Start, Len)).
1 | The new value lets other functions know that the ISBN generated should not already exist |
2 | All calls annotated invalid require the given ISBN not to exist in the model’s state |
3 | Generators can make decisions based on the model state |
4 | The new generators introduced |
5 | Uses Unicode string functionality only available in Erlang/OTP 20 and later |
The commands are divided in two sections, those that require entries to exist and those that do not. Every shimmed call has a key acting as a qualifier, allowing to enforce determinism. The keys are generally chosen by exploration of the model. By iterative refinement when trying to represent the program in callbacks such as next_state
or postcondition
, the distinct use cases come up and can be transferred one after the other into the calls.
Since the commands now have more stringent requirements, preconditions can be used to ensure they will be applied correctly:
%% Picks whether a command should be valid under the current state.
precondition(S, {_, _, add_book, [new, ISBN | _]}) -> not has_isbn(S, ISBN); (1)
precondition(S, {_, _, find_by_author, [invalid, A]}) -> not like_auth(S, A); (2)
precondition(S, {_, _, find_by_title, [invalid, T]}) -> not like_title(S, T);
precondition(_, {_, _, find_by_author, _}) -> true;
precondition(_, {_, _, find_by_title, _}) -> true; (3)
precondition(_, {_, _, find_by_isbn, _}) -> true;
precondition(S, {_, _, _, [invalid, ISBN | _]}) -> not maps:is_key(ISBN, S); (4)
precondition(S, {_, _, borrow_copy, [Key, ISBN]}) ->
#{ISBN := {_, _, _, _, A}} = S,
case Key of (5)
avail -> A > 0;
unavail -> A =:= 0
end;
precondition(S, {_, _, return_copy, [Key, ISBN]}) ->
#{ISBN := {_, _, _, O, A}} = S,
case Key of (6)
taken -> O > 0 andalso O > A;
full -> O =:= A
end;
precondition(_State, {call, _Mod, _Fun, _Args}) -> (7)
true.
[...]
%% Helpers
has_isbn(Map, ISBN) ->
[] =/= [true || {K, _, _, _, _} <- maps:values(Map), K =:= ISBN].
like_auth(Map, Auth) ->
lists:any(fun({_,_,A,_,_}) -> nomatch =/= string:find(A, Auth) end,
maps:values(Map)).
like_title(Map, Title) ->
lists:any(fun({_,T,_,_,_}) -> nomatch =/= string:find(T, Title) end,
maps:values(Map)).
1 | The new keyword on add_book is used through pattern matching to know the ISBN shouldn’t already exist. |
2 | The calls for find_by_author and find_by_title use fuzzy-matching logic (LIKE %STRING% in SQL) and must be filtered out explicitly. Otherwise an empty search matches all values and breaks determinism; ensuring that the call is actually invalid necessitates explicit measures. |
3 | The other find_by_* calls are always going to be valid no matter what. The generators should ensure the preconditions are respected. |
4 | The other calls that have the invalid key all require that the generated ISBN does not exist yet |
5 | Whether a book can be borrowed or not depends on the key and the number of available copies |
6 | Whether a book can be returned or not depends on the number of owned and available copies |
7 | All other calls are assumed to be legal |
The state transitions can then be added with almost no changes, while remaining correct:
next_state(State, _, {call, _, add_book, [new,ISBN,Title,Author,Owned,Avail]}) ->
State#{ISBN => {ISBN, Title, Author, Owned, Avail}};
next_state(State, _, {call, _, add_copy, [exists,ISBN]}) ->
#{ISBN := {ISBN, Title, Author, Owned, Avail}} = State,
State#{ISBN => {ISBN, Title, Author, Owned+1, Avail+1}};
next_state(State, _, {call, _, return_copy, [taken,ISBN]}) ->
#{ISBN := {ISBN, Title, Author, Owned, Avail}} = State,
State#{ISBN => {ISBN, Title, Author, Owned, Avail+1}};
next_state(State, _, {call, _, borrow_copy, [avail,ISBN]}) ->
#{ISBN := {ISBN, Title, Author, Owned, Avail}} = State,
State#{ISBN => {ISBN, Title, Author, Owned, Avail-1}};
next_state(State, _Res, {call, _Mod, _Fun, _Args}) ->
NewState = State,
NewState.
Any call that should not be allowed and therefore should not result in modified state is omitted. All that’s left to do is validate that postconditions match:
postcondition(_, {_, _, add_book, [new | _]}, ok) -> (1)
true;
postcondition(_, {_, _, add_book, [exists |_ ]}, {error, _}) ->
true;
postcondition(_, {_, _, add_copy, [exists | _]}, ok) ->
true;
postcondition(_, {_, _, add_copy, [invalid | _]}, {error,not_found}) ->
true;
postcondition(_, {_, _, return_copy, [invalid | _]}, {error,not_found}) ->
true;
postcondition(_, {_, _, return_copy, [taken | _]}, ok) ->
true;
postcondition(_, {_, _, return_copy, [full | _]}, {error,_}) ->
true;
postcondition(_, {_, _, borrow_copy, [invalid | _]}, {error,not_found}) ->
true;
postcondition(_, {_, _, borrow_copy, [avail | _]}, ok) ->
true;
postcondition(_, {_, _, borrow_copy, [unavail | _]}, {error,unavailable}) ->
true;
postcondition(_, {_, _, find_by_isbn, [invalid | _]}, {ok,[]}) ->
true;
postcondition(S, {_, _, find_by_isbn, [exists, ISBN]}, {ok,[Res]}) -> (2)
equals(S, ISBN, Res);
postcondition(_, {_, _, find_by_author, [invalid | _]}, {ok,[]}) ->
true;
postcondition(S, {_, _, find_by_author, [exists, Auth]}, {ok,Res}) -> (3)
lists:all(fun({_,_,A,_,_}) -> nomatch =/= string:find(A, Auth) end, Res)
andalso
lists:all(fun(Rec={ISBN,_,_,_,_}) -> equals(S, ISBN, Rec) end, Res);
postcondition(_, {_, _, find_by_title, [invalid | _]}, {ok,[]}) ->
true;
postcondition(S, {_, _, find_by_title, [exists, Title]}, {ok,Res}) -> (4)
lists:all(fun({_,T,_,_,_}) -> nomatch =/= string:find(T, Title) end, Res)
andalso
lists:all(fun(Rec={ISBN,_,_,_,_}) -> equals(S, ISBN, Rec) end, Res);
postcondition(_State, {call, _Mod, _Fun, _Args}, _Res) -> (5)
io:format("non-matching postcondition: {~p,~p,~p} -> ~p~n",
[_Mod, _Fun, _Args, _Res]),
false.
[...]
equals(Map, ISBNA, {ISBNB, TitleB, AuthorB, OwnedB, AvailB}=_R) ->
case Map of
#{ISBNA := {ISBNA, TitleA, AuthorA, OwnedA, AvailA}} ->
{ISBNA, OwnedA, AvailA} =:= {ISBNB, OwnedB, AvailB}
andalso % deal with equivalence in unicode string formats (6)
string:equal(TitleA, TitleB) andalso string:equal(AuthorA, AuthorB);
_ ->
false
end.
1 | A series of very straightforward validation functions are added, checking the return values |
2 | When reading a value, strong validation can be done by ensuring that the value reported by the system exactly matches the one the model things should be there |
3 | For fuzzy matches on author names, first a validation is done that the result set respects the constraints, and then the opportunity is taken to do the validation of the returned results with what is expected by the model |
4 | The same validation is repeated for title searches. Note that the postcondition only checks that the returned results are valid, not that all possible valid results were returned! |
5 | All calls not matched by a postcondition are considered to be invalid, to avoid accidentally leaving unchecked code paths |
6 | The comparison of records needs to be aware of the possible different values in string representation, done by using string:equal/2 . |
That’s quite a lot of validation.
10.1.3. Debugging
The property is ready for a test run:
→ rebar3 proper
[...]
===> Testing prop_crud:prop_test()
..........................non-matching postcondition: {shim_crud,return_copy,[full,<<"99909-9-619-X">>]} -> ok (1)
!
Failed: After 28 test(s). (2)
[{set,{var,1},{call,shim_crud,borrow_copy,[invalid,<<57,57,51,52,45,57,56,49,50,
45,48,45,56>>]}},{set,{var,2},{call,shim_crud,add_book,[new,<<57,57,57,48,57,45,
54,57,45,55,48,45,88>>,[2,2,23,7,3,7,1,2,16],<<22,5,2>>,1,1]}},{set,{var,3},
{call,shim_crud,find_by_isbn,[exists,<<57,57,57,48,57,45,54,57,45,55,48,45,88>>
]}},{set,{var,4},{call,shim_crud,return_copy,[full,<<57,57,57,48,57,45,54,57,45,
55,48,45,88>>]}}]
=======
Failing command: {set,{var,4}, (3)
{call,shim_crud,return_copy,[full,<<"99909-69-70-X">>]}}
At state: #{<<"99909-69-70-X">> =>
{<<"99909-69-70-X">>,[2,2,23,7,3,7,1,2,16],<<22,5,2>>,1,1}}
=======
Result: {postcondition,false}
History: [{#{},{error,not_found}}, (4)
{#{},ok},
{#{<<"99909-69-70-X">> =>
{<<"99909-69-70-X">>,[2,2,23,7,3,7,1,2,16],<<22,5,2>>,1,1}},
{ok,[{<<"99909-69-70-X">>,
<<2,2,23,7,3,7,1,2,16>>,
<<22,5,2>>,
1,1}]}},
{#{<<"99909-69-70-X">> =>
{<<"99909-69-70-X">>,[2,2,23,7,3,7,1,2,16],<<22,5,2>>,1,1}},
ok}]
Shrinking ===>
0/1 properties passed, 1 failed
===> Failed test cases:
prop_crud:prop_test() -> {'EXIT',
{{badmatch,#{}},
[{prop_crud,precondition,2,
[{file,
"/Users/ferd/code/self/pbt/book/crud/test/prop_crud.erl"},
{line,73}]}, (5)
[...]
1 | Given the shape of the postconditions, this call especially failed. The return value was ok , but for a book copy being returned when the inventory is full, the expected return is an error of any kind. |
2 | The test case failure shows a full list of the generated commands |
3 | The error value shows which command failed (as par of the ?WHENFAIL macro) along with the state the system had when it failed |
4 | The history of states along with the return value at each step is output. Reading through it along with the list of commands allows to replay the test case’s history one call at a time. |
5 | The shrinking failed for the property |
Even if the error is simple to fix without even shrinking, it is concerning that shrinking failed. This should be fixed before moving further. The reason for shrinking to fail is due to the intimate weaving made between the command generation and the precondition. The code avoided double-checking pre-conditions when command generation did it already. But the way shrinking works causes problem. If given a sequence of commands [A,B,C,D,E,F]
that yields a failure, shrinking will be done by progressively removing commands from the sequence to see if a shorter one can cause the problem:
Then the framework might decide that [A,C,F,G]
is the minimal sequence of commands able to reproduce the failure. The caveat with this approach is that if command B
introduced the state required to make the precondition for D
work, removing B
will cause an inconsistency for D
. In the previous example, the third list of commands may crash for the wrong reason. The distinction is very important:
-
Commands can be generated according to the model state, but this can only be seen as an optimization to make generation faster or more accurate
-
Encoding the constraints of command generation into preconditions is mandatory for shrinking to work.
Fixing the preconditions can be done by adding a few more clauses closing the gap, or transforming those that always assumed command generation worked well:
[...]
precondition(S, {_, _, find_by_author, [_, A]}) -> like_auth(S, A); (1)
precondition(S, {_, _, find_by_title, [_, T]}) -> like_title(S, T);
precondition(S, {_, _, find_by_isbn, [_, ISBN]}) -> has_isbn(S, ISBN);
[...]
precondition(S, {_, _, borrow_copy, [Key, ISBN]}) -> (2)
case S of
#{ISBN := {_,_,_,_,A}} when Key =:= avail -> A > 0;
#{ISBN := {_,_,_,_,A}} when Key =:= unavail -> A =:= 0;
_ -> false
end;
precondition(S, {_, _, return_copy, [Key, ISBN]}) ->
case S of
#{ISBN := {_,_,_,O,A}} when Key =:= taken -> O > 0 andalso O > A;
#{ISBN := {_,_,_,O,A}} when Key =:= full -> O =:= A;
_ -> false
end;
[...]
1 | Clauses added along with the important ones because the same problem can be imagined for existing entries |
2 | Main patch for the current bug |
Rather than always assuming a given ISBN can be found for a case, there is a possibility for it not to match. In such a case, the precondition is false (rather than crashing), which lets proper shrinking take place. Re-running the test indeed finds a much simpler case:
→ rebar3 proper
[...]
===> Testing prop_crud:prop_test()
Failed: After 37 test(s).
[LIST OF 12 COMMANDS]
Shrinking non-matching postcondition: [...] (5 time(s))
[{set,{var,1},{call,shim_crud,add_book,[new,<<55,45,50,53,50,55,51,49,45,55,56,
45,57>>,<<11,4,1,11,1,6,7,22>>,<<194,183,1,5,8,11,16>>,1,1]}},{set,{var,5},
{call,shim_crud,return_copy,[full,<<55,45,50,53,50,55,51,49,45,55,56,45,57>>]}}]
=======
Failing command: {set,{var,5},
{call,shim_crud,return_copy,[full,<<"7-252731-78-9">>]}}
[...]
Simpler steps to reproduce means easier time finding the bugs. In this case, adding a new book and then instantly returning a copy does not return the expected (failing) response. This one can be fixed by fixing the SQL query that is behind it:
-- Return a copy of a book
-- :return_copy
UPDATE books SET available = available + 1
WHERE isbn = $1 AND available < owned;
Along with a matching change in crud_db
:
handle_single_update({{update,1}, _}) -> ok;
handle_single_update({{update,0}, _}) -> {error, unavailable}; (1)
handle_single_update({error, Reason}) -> {error, Reason};
handle_single_update(Other) -> {error, Other}.
1 | Updating with 0 results means no matching entry could be allowed |
This fixes the problem, but then reveals more bugs:
$ rebar3 proper
[...]
non-matching postcondition:
{shim_crud,borrow_copy,[unavail, <<"7-8285-7073-4">>]} -> {error, not_found}
[...]
[{set,{var,4},{call,shim_crud,add_book,[new,<<55,45,56,50,56,53,45,55,48,55,51,45,52>>,[1,13,4,8,13,8],[4,5,7,6,6,1],1,1]}},{set,{var,5},{call,shim_crud,borrow_copy,[avail,<<55,45,56,50,56,53,45,55,48,55,51,45,52>>]}},{set,{var,6},{call,shim_crud,borrow_copy,[unavail,<<55,45,56,50,56,53,45,55,48,55,51,45,52>>]}}]
=======
Failing command: {set,{var,6},
{call,shim_crud,borrow_copy,
[unavail,<<"7-8285-7073-4">>]}}
At state: #{<<"7-8285-7073-4">> =>
{<<"7-8285-7073-4">>,[1,13,4,8,13,8],[4,5,7,6,6,1],1,0}}
[...]
The problem for this one is that the record returns {error, not_found}
whereas the expected result is {error, unavailable}
. To properly report errors, the queries must be broken up into different parts: not finding an entry and returning a result where no legal return value exists cannot be done at the same time if they are to report distinct errors.
The SQL queries can be left unmodified, the flow of how they are called can change in crud_db
:
borrow_copy(ISBN) ->
with_connection(fun(Conn) ->
case handle_select(run_query(find_by_isbn, [ISBN], Conn)) of
{ok, [_Found]} ->
case run_query(borrow_copy, [ISBN], Conn) of
{{update,1}, _} -> ok;
{{update,0}, _} -> {error, unavailable};
{error, Reason} -> {error, Reason};
Other -> {error, Other}
end;
_ ->
{error, not_found}
end
end).
[...]
handle_single_update({{update,1}, _}) -> ok;
handle_single_update({{update,0}, _}) -> {error, not_found};
handle_single_update({error, Reason}) -> {error, Reason};
handle_single_update(Other) -> {error, Other}.
It is not the most elegant code, but it works. All tests pass:
===> Testing prop_crud:prop_test()
....................................................................................................
OK: Passed 100 test(s).
16% {shim_crud,borrow_copy,2}
16% {shim_crud,return_copy,2}
14% {shim_crud,add_copy,2}
13% {shim_crud,find_by_isbn,2}
13% {shim_crud,find_by_title,2}
13% {shim_crud,add_book,6}
12% {shim_crud,find_by_author,2}
Alternative statistics on the length of the command sequences (fetched by changing the aggregate/1
function) tell that the runs can reach up to 40 operations in size without any other prompting. Specifically longer runs can be triggered by asking PropEr to do so. For example, rebar3 proper --start_size=200 -n 10
may take a similar amount of time to run, but with test runs of 70 to 150 commands in length, with a fewer number of runs overall.
10.2. Parallel Tests
The parallel version of the bookstore property test would be:
prop_parallel() ->
?FORALL(Cmds, parallel_commands(?MODULE),
begin
setup(),
{History, State, Result} = run_parallel_commands(?MODULE, Cmds),
teardown(),
?WHENFAIL(io:format("=======~n"
"Failing command sequence:~n~p~n"
"At state: ~p~n"
"=======~n"
"Result: ~p~n"
"History: ~p~n",
[Cmds,
State,Result,History]),
aggregate(command_names(Cmds), Result =:= ok))
end).
Running it a few times should reveal no failures, which is not necessarily surprising given what was mentioned in last chapter. The VM can be given a bit of help by ramping out the aggressiveness of the scheduler (by setting +T9
as a VM option). This will tend to be better at finding bugs, even though there are none found this time around:
$ ERL_ZFLAGS="+T9" rebar3 proper -p prop_parallel -n 1000
===> Testing prop_crud:prop_parallel()
.................................f................................f.......
[...]
No full trust is to be given to such tests, but analysis or instrumenting of the code with yield()
gave nothing. Careful analysis will show that at worst, a bad error value may be returned, but since all the boundary checking for data modifications are done within SQL queries, things are unlikely to go wrong. Of course, anyone wanting to be safer should just use transactions in Postgres, rather than hoping to replicate the bugs.
With all this, near full integration tests for the application’s interactions with the database have been written. Connectivity failures are still left untouched, but those may be more interesting to cover in another property, rather than adding to the complexity of this one.
10.3. Exercises
10.3.1. Question 1
Which callbacks for stateful tests belong in which execution phases?
10.3.2. Question 2
Pattern matching on the model state can be used to direct the generation of commands based on the current context. Why is this not sufficient and why do preconditions need to be used?
10.3.3. Question 3
The three ways to initialize a stateful property test are seen in a file:
prop_test1() -> (1)
?FORALL(Cmds, commands(?MODULE),
begin
actual_system:start_link(),
{History, State, Result} = run_commands(?MODULE, Cmds),
actual_system:stop(),
Result =:= ok
end).
prop_test2() -> (2)
actual_system:start_link(),
?FORALL(Cmds, commands(?MODULE),
begin
{History, State, Result} = run_commands(?MODULE, Cmds),
Result =:= ok
end).
prop_test3() -> (3)
actual_system:start_link(),
?FORALL(Cmds, commands(?MODULE),
begin
{History, State, Result} = run_commands(?MODULE, Cmds),
Result =:= ok
end),
actual_system:stop().
What will happen to the 'actual system' in every one of them?
10.3.4. Question 4
What is the caveat of using the Res
value–the result from the actual system–in the next_state/3
callback? Is there a scenario where it is absolutely useful to use it?
10.3.5. Question 5
The following command generators have been written for a given system:
$%% Option 1 %%%
commands(_State) ->
oneof([
{call, actual_system, log_on, [username(), password()]},
...
]).
username() -> elements(["user1", "user2", "user3", string()]).
password() -> string().
%%% Option 2 %%%
commands(S) ->
oneof([
{call, shim_system, log_on, [success, userpass(S)]},
{call, shim_system, log_on, [bad_pass, {username(S), password()}]},
{call, shim_system, log_on, [unknown_user, {bad_username(S), password()}]},
...
]).
username() -> string().
password() -> string().
username(#{users := List}) -> elements([User || {User,_Pass} <- List]);
userpass(#{users := List}) -> elements(List).
What does each strategy entail in terms of benefits and challenges for the rest of the test?
10.4. Solutions
10.4.1. Question 1
Which callbacks for stateful tests belong in which execution phases?
-
abstract:
initial_state/0
,command/1
,precondition/2
,next_state/3
-
real:
initial_state/0
,command/1
,precondition/2
,next_state/3
,postcondition/3
10.4.2. Question 2
Pattern matching on the model state can be used to direct the generation of commands based on the current context. Why is this not sufficient and why do preconditions need to be used?
The pattern matching in the generator works fine to create an initial list of commands to run on the model system, but as soon as a failure happens–or a constraint needs to be enforced when parallelizing commands–the modification of the command list is done without regards to the initial patterns. Only the preconditions are used to ensure the validity of the command sequence.
Without preconditions, the framework is not able to manipulate the sequence in any way.
10.4.3. Question 3
The three ways to initialize a stateful property test are seen in a file:
prop_test1() -> (1)
?FORALL(Cmds, commands(?MODULE),
begin
actual_system:start_link(),
{History, State, Result} = run_commands(?MODULE, Cmds),
actual_system:stop(),
Result =:= ok
end).
prop_test2() -> (2)
actual_system:start_link(),
?FORALL(Cmds, commands(?MODULE),
begin
{History, State, Result} = run_commands(?MODULE, Cmds),
Result =:= ok
end).
prop_test3() -> (3)
actual_system:start_link(),
?FORALL(Cmds, commands(?MODULE),
begin
{History, State, Result} = run_commands(?MODULE, Cmds),
Result =:= ok
end),
actual_system:stop().
What will happen to the 'actual system' in every one of them?
-
For the first one, the actual system will be started and stopped before every single test iteration.
-
The actual system will be booted once before the tests run, and the system instance will be shared for all iterations
-
The actual system will be started, then an abstract representation of the test case will be created. After this, the system will be stopped. However, the test itself will not have run once at that point, so by the time the framework would try to execute the test, the actual system has already shut down.
PropEr does not have very clean fixture systems; if the state can’t be set up and torn down for every iteration, then using another framework and calling the Property-based testing framework by hand may be the only solution. However, shrinking may prove difficult in a stateful setting it isn’t reset every time!
10.4.4. Question 4
What is the caveat of using the Res
value–the result from the actual system–in the next_state/3
callback? Is there a scenario where it is absolutely useful to use it?
The value must be treated as fully opaque since placeholders will be generated by the framework during the abstract phase of command generation. Symbolic calls may be added to the command when used later in an actual execution context.
It is simpler to avoid it entirely, but one of the useful cases comes when interaction with the actual system relies on values it has returned and are not predictable. Those may include process identifiers, sockets, or any other unique, unpredictable, or transient resource. Whenever the system returns one of these and expects them back, one may not be able to plan the value ahead of time from the model state. The model is then better off holding a reference to that value (without altering it).
10.4.5. Question 5
The following command generators have been written for a given system:
$%% Option 1 %%%
commands(_State) ->
oneof([
{call, actual_system, log_on, [username(), password()]},
...
]).
username() -> elements(["user1", "user2", "user3", string()]).
password() -> string().
%%% Option 2 %%%
commands(S) ->
oneof([
{call, shim_system, log_on, [success, userpass(S)]},
{call, shim_system, log_on, [bad_pass, {username(S), password()}]},
{call, shim_system, log_on, [unknown_user, {bad_username(S), password()}]},
...
]).
username() -> string().
password() -> string().
username(#{users := List}) -> elements([User || {User,_Pass} <- List]);
userpass(#{users := List}) -> elements(List).
What does each strategy entail in terms of benefits and challenges for the rest of the test?
The first strategy tends to create simpler test scenarios, more similar to fuzzing. The state changes for the model and the postconditions will either need to be (a) more general or (b) do dynamic detection of the system’s state. This tends to create complexity around validation when wanting to be precise, but is good to explore broad system behaviour.
The second strategy requires more upfront support and design, and a lot more care around the use of preconditions to ensure consistency. On the other hand, the validation of the program’s behaviour is made simple and predictable, and will avoid sharing implementation details with the actual system by design. It is good to exercise well-defined system behaviour.