Thursday, 31 March 2016

Taming Configuration Files - general term processing

Overview

This instalment extends matching to general Erlang terms, including recursive structures such as filenames and iolists.  Several weeks ago I did some work on matching Erlang terms; that unpublished work has been extended here.

The extension required storing parser state, which changed the validation functions from two parameters to three.  Code was broken in the process.  The state could have been stored in the process dictionary with much less rework, and where time is money, as in a commercial environment, it probably would have been done that way.  During the rework the existing tests were invaluable in ensuring that there was no loss of quality.

Specifications

As in the earlier work, the intention is to check that an Erlang term conforms to a definition which is also an Erlang term.  The comparison is performed by term_defs:validate/2, which takes two parameters: the term specification and the term, in that order.  It returns true if the term matches; otherwise it throws {invalid,Rule,Data}, listing both the rule and the data that failed.

Syntactic Conventions

{} - denotes a tuple structure as per normal Erlang
Capitalised - denotes a syntactic construction, the details of which are covered in this article.
[] - denotes a list as per normal Erlang

Matching Atomic Values

The test for a given value is done by the {value,Value} term, where the corresponding term in the expression being tested must match Value.  Where a type test is required, the {builtin,TypeTest} test is used, where TypeTest is any unary function exported from the erlang module that returns a boolean.
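A minimal sketch of these two atomic tests follows.  The module atomic_match and its check/2 function are hypothetical and are not part of term_defs; unlike term_defs, this sketch returns false on a mismatch instead of throwing.  Like term_defs, it confirms that the type test is exported from the erlang module with arity 1 before applying it.

```erlang
-module(atomic_match).
-export([check/2]).

%% Hypothetical helper for illustration only: returns true or false
%% rather than throwing {invalid,Rule,Data} as term_defs does.
check({value, Value}, Term) ->
    %% exact match, like the pattern {value,X} matching X
    Term =:= Value;
check({builtin, TypeTest}, Term) when is_atom(TypeTest) ->
    %% only unary functions exported from the erlang module are accepted
    case lists:member({TypeTest, 1}, erlang:module_info(exports)) of
        true -> erlang:apply(erlang, TypeTest, [Term]) =:= true;
        false -> erlang:error({unknown_type_test, TypeTest})
    end.
```

For example, atomic_match:check({builtin,is_integer},5) evaluates is_integer(5) and returns true.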

Matching from a list of possible values

{options,[Item_Spec]}

In an options construction, one of the Item_Specs in the list must match.  For example, {options,[{value,male},{value,female}]} indicates that the valid contents are one of male or female.
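The "any alternative may match" rule maps directly onto lists:any/2.  This is a sketch under simplifying assumptions: the hypothetical module options_sketch only understands {value,V} and nested {options,Specs}, and it returns a boolean rather than throwing.

```erlang
-module(options_sketch).
-export([check/2]).

%% Hypothetical sketch: an options spec succeeds if any alternative
%% succeeds; lists:any/2 stops at the first alternative that matches.
check({value, Value}, Term) ->
    Term =:= Value;
check({options, Alternatives}, Term) when is_list(Alternatives) ->
    lists:any(fun(Spec) -> check(Spec, Term) end, Alternatives).
```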

Matching a list

{list,TypeTest}

All items within the list must pass the type test.  For example, to test that all the items in the list are integers, use {list,{builtin,is_integer}}.
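Because every element must pass, the list construction corresponds to lists:all/2.  The sketch below is hypothetical (module list_sketch) and, for brevity, only accepts {builtin,F} as the item specification; as in term_defs, a term that is not a list does not match.

```erlang
-module(list_sketch).
-export([check/2]).

%% Hypothetical sketch of {list,ItemSpec} restricted to {builtin,F} items.
check({list, {builtin, TypeTest}}, Term) when is_list(Term) ->
    %% every element must pass the type test
    lists:all(fun(X) -> erlang:apply(erlang, TypeTest, [X]) =:= true end, Term);
check({list, _}, _NotAList) ->
    false.
```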

Defining a pattern

{define, Key, Pattern}

This associates a Pattern with a Key.  For example, {define,int,{builtin,is_integer}} associates the atom int with the check for an integer.  This symbolic referencing allows recursive data patterns to be defined.  Defining always returns true.

Matching all items in a list of tests

{match_all,[TypeTest]}

Although match_all could in principle combine several filters, this is of little use at present because the specification has no user-defined functions.  It is currently used to specify macros for recursive type tests using define.
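The interplay of define and match_all can be sketched with the parser state threaded through each step.  This is an illustration, not term_defs itself: term_defs keeps its state in a dict and throws on failure, whereas the hypothetical define_sketch module below uses a map, returns false on failure, and only handles define, builtin, match_all and named references.

```erlang
-module(define_sketch).
-export([check/2]).

%% Hypothetical sketch: State is a map from Key to Pattern.
check(Spec, Term) ->
    {Result, _State} = check(Spec, Term, #{}),
    Result.

check({define, Key, Pattern}, _Term, State) ->
    %% defining always succeeds and records the pattern for later lookups
    {true, State#{Key => Pattern}};
check({builtin, TypeTest}, Term, State) ->
    {erlang:apply(erlang, TypeTest, [Term]) =:= true, State};
check({match_all, Specs}, Term, State) ->
    match_all(Specs, Term, State);
check(Key, Term, State) when is_atom(Key) ->
    %% an atom refers to a pattern stored by an earlier define
    case maps:find(Key, State) of
        {ok, Pattern} -> check(Pattern, Term, State);
        error -> {false, State}
    end.

%% Each item sees the state produced by the items before it, which is
%% why the defines must come before the item that uses them.
match_all([], _Term, State) ->
    {true, State};
match_all([Spec | Rest], Term, State) ->
    case check(Spec, Term, State) of
        {true, NewState} -> match_all(Rest, Term, NewState);
        {false, _} = Failed -> Failed
    end.
```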

Matching a tuple structure

{tuple, [Test]}

The number of tests in the test list must match the number of elements in the tuple, and each test must pass for the tuple to be valid.  For example, {tuple,[{value,employee},{list,{builtin,is_integer}}]} will match {employee,"Joe Smith"}, since a string is a list of integers.
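Positional matching can be sketched by zipping the test list with the tuple's elements once the sizes are known to agree.  The module tuple_sketch is hypothetical and only handles {value,V} and {builtin,F} field tests.

```erlang
-module(tuple_sketch).
-export([check/2]).

%% Hypothetical sketch of {tuple,[Spec]}: sizes must agree, then each
%% element is checked against the spec in the same position.
check({tuple, Specs}, Term)
  when is_tuple(Term), length(Specs) =:= tuple_size(Term) ->
    lists:all(fun({Spec, Field}) -> field(Spec, Field) end,
              lists:zip(Specs, tuple_to_list(Term)));
check({tuple, _}, _) ->
    %% not a tuple, or the sizes differ
    false.

field({value, V}, Field) -> Field =:= V;
field({builtin, F}, Field) -> erlang:apply(erlang, F, [Field]) =:= true.
```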

Matching a property list

{property_list, [ KeyValuePairTest ]}

The KeyValuePairTest is a tuple containing an optionality flag, which satisfies the rule {options,[{value,opt},{value,reqd}]}, a KeyName (which is an atom), and a data specification.  For every key with the reqd keyword, that item must exist in the data and its associated value must be valid.  For every key with the opt keyword, if the item exists then its associated value must be valid.  For example, the following specifies that a property list must contain gender, and that its value must be either male or female:

{property_list,[{reqd,gender,{options,[{value,male},{value,female}]}}]}

Although the property list construct could have been defined from other rules in this specification, it is a commonly used structure and was therefore given its own abstraction.
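The reqd/opt rules can be sketched as below.  The module plist_sketch is hypothetical: it handles only {value,V} and {options,[...]} value specifications, uses proplists:lookup/2 to find keys, and returns false instead of throwing.

```erlang
-module(plist_sketch).
-export([check/2]).

%% Hypothetical sketch of {property_list,[{Opt,Key,Spec}]}.
check({property_list, KeyDefs}, PL) when is_list(PL) ->
    lists:all(fun(KeyDef) -> entry(KeyDef, PL) end, KeyDefs);
check({property_list, _}, _) ->
    false.

entry({Opt, Key, Spec}, PL) when Opt =:= opt; Opt =:= reqd ->
    case proplists:lookup(Key, PL) of
        none -> Opt =:= opt;                  % a missing key is fine only if optional
        {Key, Value} -> value(Spec, Value)    % a present key must be valid
    end.

value({value, V}, Value) -> Value =:= V;
value({options, Alternatives}, Value) ->
    lists:any(fun(Spec) -> value(Spec, Value) end, Alternatives).
```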

A complex example

{match_all,[
     {define,iolistmember,{options,[{builtin,is_integer},iolist]}},
     {define,iolist,{options,[{builtin,is_binary},{list,iolistmember}]}},
     iolist]}

This example tests that data is an iolist.  A valid iolist might be [<<"This is a valid ">>,"iolist"].

An iolist is either a binary or a list whose members are integers or iolists.  This example shows how recursive structures can be defined using a combination of match_all and define.  Notice that it is the last item in the match_all that causes the test to fire, and this item refers back to the earlier defines.
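To make the meaning of the two defines concrete, here is the same recursive definition written directly as mutually recursive Erlang functions.  It is a sketch that follows this article's definition of an iolist, which is simpler than the iolist type in the Erlang reference manual (that type restricts integers to bytes and allows a binary tail); the names iolist_sketch, is_iolist/1 and is_iolist_member/1 are hypothetical.

```erlang
-module(iolist_sketch).
-export([is_iolist/1]).

%% iolist       = binary | list of iolistmember
%% iolistmember = integer | iolist
is_iolist(Term) when is_binary(Term) -> true;
is_iolist(Term) when is_list(Term) -> lists:all(fun is_iolist_member/1, Term);
is_iolist(_) -> false.

is_iolist_member(Term) when is_integer(Term) -> true;
is_iolist_member(Term) -> is_iolist(Term).
```

The two functions refer to each other in the same way that the iolist and iolistmember defines do.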


The code


-module(term_defs).

-export ([validate/2,test/0]).

-author('Tony Wallace').
-purpose(<<"Confirm that an Erlang term matches a specification.\n",
  "The atom datadef matches any data definition.\n",
  "Type testing is done by matching the Erlang term to {builtin,FunctionName}, ",
  "where FunctionName is a unary function exported from the erlang module; for example ",
  "{builtin,is_integer} checks that the matched term is an integer.\n",
  "Where an Erlang term must match a given term, the {value,Term} pattern is used. ",
  "The {value,Term} construction is valuable where there is a list of valid options, ",
  "for example {options,[{value,option1},{value,option2}]}.  In this case the ",
  "term can match either option1 or option2.\n",
  "A list is defined by the construction {list,DataDef}, where each item of the ",
  "list must conform to the specification in DataDef.  For example a list of ",
  "integers is defined as {list,{builtin,is_integer}}.\n",
  "Tuples are matched to {tuple,[DataDef]}.  Each term contained within the tuple ",
  "must match its associated DataDef.\n",
  "Property lists are given special treatment.  A property list is defined by ",
  "the entries it must or may contain.  The specification is:\n",
  "     {property_list,[{Opt,Key,Specification}]}\n",
  "Opt - opt if this key is optional, reqd if this key is required\n",
  "Key - an Erlang term, normally an atom\n",
  "Specification - a datadef that the key's value must satisfy.">>).

validate(Def,Term) ->
    {R,_}=validate(Def,Term,dict:new()),
    R.
validate(Def,Term,State) ->
    %io:format("validate(~p,~p,~p)~n~n",[Def,Term,State]),
    case maybe_validate(Def,Term,State) of
        {true,NewState} ->
            {true,NewState};
        {false,_} ->
            not_valid(Def,Term)
    end.
%maybe_validate([Def],Term,State) ->
%    validate(Def,Term,State);
maybe_validate({define,Key,Value},_Term,State)  ->
    NewState = dict:store(Key,Value,State),
    {true,NewState};
maybe_validate(any,_,S) ->
    {true,S};
maybe_validate({property_list,KeyDefs},PL,State) ->
    Checked = [pl_entry(KeyDef,PL,State) || KeyDef <- KeyDefs],
    R=lists:foldl(fun(true,A) -> A;(_,_)->false end, true, Checked),
    {R,State};
maybe_validate({tuple,DefList},Term,S) 
  when is_tuple(Term) andalso (length(DefList) =:= tuple_size(Term)) ->
    TermList = tuple_to_list(Term),
    tuple_flds(DefList,TermList,S);
maybe_validate({tuple,_},_,S) ->
    %% tuple sizes do not match or Term is not a tuple
    {false,S};
maybe_validate({list,TermDef},Term,State) 
  when is_list(Term)  ->
    Valid=[validate(TermDef,X,State) || X <- Term],
    R=lists:foldl(fun({X,_},A) -> A and X end,true,Valid),
    {R,State};
maybe_validate(TermDef={builtin,Atom},Term,State) ->
    case catch(apply(erlang,Atom,[Term])) of
        {'EXIT',_} ->
            not_valid(TermDef,'undefined');
        true -> {true,State};
        false -> {false,State};
        X ->
            not_valid(TermDef,{'not_boolean',X})
    end;
maybe_validate({options,TermDef},Term,State) when is_list(TermDef) ->
    choices(TermDef,Term,State);
maybe_validate({match_all,[]},_,State)  ->
    {true,State};
maybe_validate({match_all,[H|T]},Term,State) ->
    {R1,State1} = validate(H,Term,State),
    case R1 of
        true ->
            validate({match_all,T},Term,State1);
        false ->
            not_valid(H,Term)
    end;
maybe_validate({value,X},X,State) ->
    {true,State};
maybe_validate(datadef,{value,_},State) ->
    {true,State};
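maybe_validate(datadef,{builtin,Fname},State)
  when is_atom(Fname) ->
    %% a type test must be a unary function exported from the erlang module;
    %% check for arity 1 explicitly, since a name may be exported at several arities
    case lists:member({Fname,1},erlang:module_info(exports)) of
        true -> {true,State};
        false -> {false,State}
    end;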
maybe_validate(datadef,{list,ItemDef},State) ->
    validate(datadef,ItemDef,State);
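maybe_validate(datadef,{property_list,[{Opt,_KeyName,DataDef}|KeyList]},State) ->
    %% Opt must be one of the atoms opt or reqd
    {true,_} = validate({options,[{value,opt},{value,reqd}]},Opt,State),
    {true,NewState} = validate(datadef,DataDef,State),
    %% check the remaining key definitions in the same way
    maybe_validate(datadef,{property_list,KeyList},NewState);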
maybe_validate(datadef,{property_list,[]},S) -> {true,S};
maybe_validate(datadef,{tuple,DefList},State) 
  when is_list(DefList)->
    Validated = [validate(datadef,X,State) || X <- DefList],
    R=lists:foldl(fun({true,_},A) -> A;(_,_) -> false end,true,Validated),
    {R,State};
maybe_validate(Key,Term,State) when is_atom(Key) ->
    %% an atom refers to a pattern previously stored with define
    case lookup(Key,State) of
        undefined -> {false,State};
        Pattern -> validate(Pattern,Term,State)
    end;
maybe_validate(_,_,State) ->
    {false,State}.

lookup(Key,Dict) ->
    lookup2(dict:is_key(Key,Dict),Key,Dict).
lookup2(false,_,_) ->
    undefined;
lookup2(true,Key,Dict) ->
    dict:fetch(Key,Dict).

pl_entry({Opt,Key,Def},PL,State) ->
    case proplists:get_value(Key,PL) of
        undefined ->
            %% it is valid for an optional key to be undefined
            (Opt =:= opt);
        Data ->
            %% if it exists it must be valid
            {true,_} = validate(Def,Data,State),
            true
    end.

tuple_flds([H1|T1],[H2|T2],State) ->
    case validate(H1,H2,State) of
        {true,NewState} ->
            tuple_flds(T1,T2,NewState);
        {false,_S} ->
            not_valid(H1,H2)
    end;
tuple_flds([],[],State) ->
    {true,State}.

choices([H|T],Term,State) ->
    %io:format("choices ([~p|~p],~p,~p)~n",[H,T,Term,State]),
    case maybe_validate(H,Term,State) of
        {true,NewState} ->
            {true,NewState};
        {false,NewState} ->
            choices(T,Term,NewState)
    end;
choices([],_,State) ->
    {false,State}.


not_valid(Def,Term) ->
    throw({invalid,Def,Term}).
    
   
test() ->
    T=test([
   {datadef,{builtin,is_integer},true},
   {datadef,{value,value},true},
   {datadef,{tuple,[{value,employee}]},true},
   {datadef,{tuple,employee},invalid},
   {datadef,{list,{builtin,is_integer}},true},
   {datadef,{property_list,[]},true},
   {{builtin,is_integer},5,true},
   {{builtin,is_integer},atom,invalid},
   {{options,[{value,option1},{value,option2}]},option2,true},
   {{options,[{value,option1},{value,option2}]},option3,invalid},
   {{tuple,[{value,employee},{builtin,is_list}]},{employee,"Robert"},true},
   {{tuple,[{value,employee},{builtin,is_list}]},{emp,"Robert"},{invalid,{value,employee},emp}},
   {{tuple,[{value,employee},{builtin,is_list}]},{employee,"Robert",male},invalid},
   {{list,{builtin,is_integer}},[3,4,5],true},
   {{list,{builtin,is_integer}},[3,a,5],{invalid,{builtin,is_integer},a}},
   {{property_list,[]},[],true},
   {{property_list,[{opt,gender,{options,[{value,male},{value,female}]}}]},[],true},
   {{property_list,[{opt,gender,{options,[{value,male},{value,female}]}}]},[{gender,male}],true},
   {{property_list,[{opt,gender,{options,[{value,male},{value,female}]}}]},[{gender,transgender}],
    {invalid,{options,[{value,male},{value,female}]},transgender}},
   {{property_list,[{reqd,gender,{options,[{value,male},{value,female}]}}]},[],invalid},
   {{match_all,[{define,int,{builtin,is_integer}},int]},5,true},
   {{match_all,[{define,int,{builtin,is_integer}},int]},'hello',{invalid,{builtin,is_integer},hello}},
   {{match_all,[
     {define,iolistmember,{options,[{builtin,is_integer},iolist]}},
     {define,iolist,{options,[{builtin,is_binary},{list,iolistmember}]}},
     iolist]},
    <<"This is a valid iolist">>,true},
   {{match_all,[
     {define,iolistmember,{options,[{builtin,is_integer},iolist]}},
     {define,iolist,{options,[{builtin,is_binary},{list,iolistmember}]}},
     iolist]},
     [<<"This is a valid ">>,"iolist"],true},
   {{match_all,[
     {define,iolistmember,{options,[{builtin,is_integer},iolist]}},
     {define,iolist,{options,[{builtin,is_binary},{list,iolistmember}]}},
     iolist]},
     'This is not a valid iolist',
      {invalid,{options,[{builtin,is_binary},{list,iolistmember}]},'This is not a valid iolist'}},
   {{match_all,[
     {define,iolistmember,{options,[{builtin,is_integer},iolist]}},
     {define,iolist,{options,[{builtin,is_binary},{list,iolistmember}]}},
     iolist]},
     "This is a valid iolist",true}
 ]),
    Filtered=[{X,Y} || {X,Y} <- T, Y =/= pass],
    io:format("~p~n",[Filtered]).
test(L) when is_list(L) ->
    [test1(H) || H <- L].

test1(T={Def,Arg,_R}) ->
    ExpectedResult = make_result(T),
    case catch(validate(Def,Arg)) of
ExpectedResult -> {T,pass};
Failed -> {T,Failed}
    end.

make_result({_,_,true}) ->
    true;
make_result({Def,Arg,invalid}) ->
    {invalid,Def,Arg};
make_result({_Def,_Arg,Term}) ->
    Term.


Thursday, 17 March 2016

File contexts in Erlang

The problem

I decided to do some testing.  Each test was independent, and some were designed to throw errors.  Since these tests throw errors, why not run each one in a separate process and trap the exit signals?  Each test can have its own directory with its own files, which makes the tests independent of each other.  Finally, now that they are independent, run them in parallel and collect the results.

Pretty simple; what could go wrong?  This being computing, of course nothing worked as planned.  The tests worked individually, but run in parallel they failed: tests were picking up files intended for another test in another directory.  The current directory was shared state, and its value was determined by the last process to change it.  Relative file names were resolved against that single working directory.

The solution

The current directory was made part of the process state: a change-directory function stores the directory in the process dictionary.  Local versions of file operations such as file:consult and file:write_file were written to execute relative to this local directory, and local versions of io:format were written to output to the local directory to aid debugging.  Problem solved.
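The approach can be sketched as follows.  This is an illustration under my own assumptions, not the original code: the module name local_file, the process dictionary key '$local_cwd' and the functions cd/1, cwd/0, consult/1 and write_file/2 are all hypothetical.  Relative names are resolved with filename:absname/2 against the directory stored for the calling process, falling back to the node-wide directory from file:get_cwd/0.

```erlang
-module(local_file).
-export([cd/1, cwd/0, consult/1, write_file/2]).

%% Hypothetical key under which each process keeps its own directory.
-define(CWD_KEY, '$local_cwd').

%% Change this process's directory; other processes are unaffected.
cd(Dir) ->
    put(?CWD_KEY, filename:absname(Dir, cwd())),
    ok.

cwd() ->
    case get(?CWD_KEY) of
        undefined ->
            {ok, Dir} = file:get_cwd(),   % fall back to the node-wide directory
            Dir;
        Dir ->
            Dir
    end.

%% Resolve a relative name against this process's directory.
resolve(Name) ->
    filename:absname(Name, cwd()).

consult(Name) ->
    file:consult(resolve(Name)).

write_file(Name, Data) ->
    file:write_file(resolve(Name), Data).
```

Because the directory lives in the process dictionary, each test process that calls local_file:cd/1 sees only its own directory, whereas file:set_cwd/1 changes the directory for every process on the node.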

Thinking generally

This problem is not that different from the global name problem I wrote about earlier.  The context system should be extended with a local directory and context-sensitive file operations.  It is understandable why the Erlang runtime system is built the way it is, but as this experience shows, we can do better.

Friday, 11 March 2016

Taming Erlang Configuration files

Problem Outline

When building systems we normally have several configuration files.  We have these files for a number of reasons.
  • As part of the system boot.  The system may need to access various resources, such as databases and web servers, in order to start.  These resources in turn need to be supplied with data before they can function.
  • To cope with site-specific conditions or business rules.
  • To hold settings that tend to be static and are considered unworthy of (or too sensitive for) a user interface.
Configuration files are an essential part of our delivered systems and are here to stay.

There are several aspects of configuration files that tend to be problematic.  
  • They are interpreted.  While it is true that they must conform to an overall syntax to be readable by the system, their contents are read and interpreted by the system on an ad hoc, as-needed basis.  There tends not to be the static analysis that would be performed on compiled code.
  • Their effects are dispersed throughout the system making it often difficult to associate the manifestation of the problem with its cause.
What I want is a static analysis tool, one that says "this is an invalid value" or "this value is missing" at the time the configuration file is updated, rather than a strange fault of unknown origin occurring some time later.  Certainly there are tools such as XML parsers that can be used for this: put the configuration data in XML and validate it against a schema.  Once validated, the data could be made available.

Erlang systems tend to be configured with lists of Erlang terms loaded with file:consult.  This is a system understood by and familiar to Erlang programmers.  Besides, is it really a good use of time to rewrite existing Erlang term configurations in XML?  What is needed is a way to assert certain properties of a configuration file.  By adding a definition file alongside existing Erlang configuration files, those files can be used without modification, and the schema files are transferable between sites and projects.

Requirements

  • Everything is Erlang.  This is an Erlang shop.  Language choice is a strategic decision.
  • Do static analysis of configuration files.  Justification is the above section.
  • Keep it simple.  Better a limited but useful tool now than a more sophisticated tool that is never delivered, is difficult to use, or is unreliable.
  • Sensible workflow/tool integration.  Static analysis should be part of deployment.  Tools like make should be able to detect out of date files and act accordingly.
  • Erlang configuration files tend to be property lists, which sometimes recursively embed property lists.  The system must be able to check that an arbitrary nested key exists.

Design

  1. Every configuration file has a schema entry which names the definition to which the file conforms.
  2. The definition file is itself a configuration file, and also has a schema entry.
  3. After checking, a non-human-readable (binary) form of the configuration file is generated, and it is this version that applications use.
  4. By checking file timestamps, tools such as make can determine whether a configuration file has changed and needs rechecking.
  5. The Erlang module that provides the check and compile facilities exports a "main/1" entry point so that it is easy to integrate into tools like make using escript.
  6. Property lists are processed recursively to check that nested keys exist.
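Item 4 could be implemented along these lines.  This is a sketch assuming the compiled file is named by appending ".etf" to the source name, as conf:consult/1 below expects; the module conf_stale and needs_recheck/1 are hypothetical.  filelib:last_modified/1 returns 0 when a file does not exist, which is treated here as "never compiled".

```erlang
-module(conf_stale).
-export([needs_recheck/1]).

%% Hypothetical helper: true when the compiled ".etf" file is missing
%% or older than the source configuration file, as make would decide.
needs_recheck(ConfFile) ->
    Compiled = ConfFile ++ ".etf",
    case filelib:last_modified(Compiled) of
        0 ->
            true;                                   % never compiled
        CompiledTime ->
            %% datetime tuples compare chronologically in term order
            filelib:last_modified(ConfFile) > CompiledTime
    end.
```

Timestamps have one-second resolution, so, as with make, an edit made within the same second as the compile is not detected.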
The master configuration file is as follows:

File conf_def.conf:

{schema,"conf_def.conf.etf"}.
{required_keys,[schema,required_keys]}.

As we can see, this configuration file defines itself and validates against a compiled version of itself.  This chicken-and-egg situation is resolved by the function conf:compile/1, which compiles without validating.  The value of required_keys is a list of keys; in this case the two required keys are schema and required_keys, so this file conforms to its own definition.

A key can also be a list.  For example, the key [webserver,[listen_port]] says there should be a key in the configuration file called webserver, which has a property list for a value, and in that property list there should be an entry with the key listen_port.

Code

-module (conf).
-export([main/1,compile/1,check/1,consult/1,checkcompile/1,test/0]).

main([]) ->
    ok;
main([H|T]) ->
    checkcompile(H),
    main(T).

checkcompile(F) ->
    check(F),
    compile(F).

consult(ConfFile) ->
    must_read(ConfFile++".etf").

compile(ConfFile) ->
    {ok,SrcData} = file:consult(ConfFile),
    BinData = erlang:term_to_binary(SrcData),
    %% the suffix must be a string so that FileName is a valid file name
    FileName = ConfFile ++ ".etf",
    ok = file:write_file(FileName,BinData).

check(ConfFile) ->
    {ok,ConfData} = file:consult(ConfFile),
    ok = check_key(schema,ConfData),
    Schema = proplists:get_value(schema,ConfData),
    B = must_read(Schema),
    Spec = binary_to_term(B),
    %% each do_it/2 call returns ok or an error message (a character list)
    Errors = [do_it(X,ConfData) || X <- Spec],
    ErrorResults = lists:filter(fun (X) -> ok =/= X end, Errors),
    case ErrorResults of
        [] ->
            ok;
        _ ->
            %% the messages are already strings, so they are concatenated
            Msg = lists:flatten(["Errors for ", ConfFile, ":", ErrorResults]),
            throw(Msg)
    end.

must_read(File) ->
    {A,B} = file:read_file(File),
    ok=match(A,ok,fun() -> ["Failed to read file ",File," reason ",atom_to_list(B)] end),
    B.
   
match(A,A,_) ->
    ok;
match(_,_,B) ->
    throw(iolist_to_binary(B())).

do_it({schema,_},_) ->
    %% already checked
    ok;
do_it({required_keys,KeyList},Data) ->
    MissingKeys = [check_key(X,Data) || X <- KeyList],
    %% check_key/2 returns ok or the (atom) name of the missing key
    ErrorResults = lists:filter(fun (X) -> ok =/= X end, MissingKeys),
    case ErrorResults of
        [] ->
            ok;
        _ ->
            KeyNames = string:join([atom_to_list(X) || X <- ErrorResults], ", "),
            io_lib:format("~nMissing keys: ~s~n", [KeyNames])
    end.

check_key(Key,Data) when is_atom(Key) ->
    case proplists:is_defined(Key,Data) of
        true ->
            ok;
        false ->
            Key
    end;
check_key([],_Data) ->
    ok;
check_key(_L=[H|T],Data) ->
    %io:format("Checking key ~p in ~p~n",[_L,Data]),
    case check_key(H,Data) of
        ok ->
            %% descend into the value of H, which should itself be a property list
            SubData = proplists:get_value(H,Data),
            check_key(T,SubData);
        K -> K
    end.

test() ->
    %% assumes conf_def.conf has already been compiled with compile/1,
    %% so that conf_def.conf.etf exists
    WsSchemaString =
        "{schema,\"conf_def.conf.etf\"}." ++ [$\r] ++
        "{required_keys,[schema,webserver,[webserver,port]]}.",
    ok = file:write_file("webserver_schema.conf",WsSchemaString),
    WsConfig =
        "{schema, \"webserver_schema.conf.etf\"}." ++ [$\r] ++
        "{webserver, [{port,9600}]}.",
    ok = file:write_file("webserver.conf",WsConfig),
    main(["webserver_schema.conf","webserver.conf"]).
    
   

Thursday, 3 March 2016

Organized "christianity" a work of the devil? John 10



In John chapter 9 we see the healing of the man born blind, and the tension between Jesus and the religious elite climbs a notch. Here we see Jesus take direct aim at the Pharisees and tell them what he thinks of them.

John 10:1 starts with:


"Very truly I tell you Pharisees"


He does not talk behind their back. He confronts them directly. Jesus is not a gossip.


"anyone who does not enter the sheep pen by the gate but climbs in some other way is a thief and a robber. The gatekeeper opens the gate for him, and the sheep listen to his voice. He calls his own sheep by name and leads them out. 4 When he has brought out all his own, he goes on ahead of them, and his sheep follow him because they know his voice. 5 But they will never follow a stranger; in fact, they will run away from him because they do not recognize a stranger’s voice.” 6 Jesus used this figure of speech, but the Pharisees did not understand what he was telling them."


At this point, if I were a Pharisee I would be feeling pretty good. I have been through the apprenticeship and studied. I have done the hard yards and entered my profession correctly. The gatekeepers, the religious authorities, have accepted me as one of their own. People listen to me, give me respect and honour. The good people, the sheep that is. As for the rabble, they are not God's sheep, so it is okay that they are not listening to me. Let me smugly relax; even my enemies compliment me!

This is a clever strategy. Get your audience onside before you hit them. Nathan does the same when confronting David.

Jesus has not finished with them yet however:


7 Therefore Jesus said again, “Very truly I tell you, I am the gate for the sheep. 8 All who have come before me are thieves and robbers, but the sheep have not listened to them. 9 I am the gate; whoever enters through me will be saved. They will come in and go out, and find pasture. 10 The thief comes only to steal and kill and destroy; I have come that they may have life, and have it to the full.

The bit that hurts: "All who have come before me are thieves and robbers". Ouch! Is Jesus really calling the religious leaders of his time thieves and robbers? "But the sheep have not listened to them." Is he saying that those outside the religious assembly, those who pay no attention to their teaching, are the real sheep, the real people of God?

Next Jesus identifies himself as the gate and the path to true life and blessing, and contrasts himself with the thieves of organised religion.

11 “I am the good shepherd. The good shepherd lays down his life for the sheep. 12 The hired hand is not the shepherd and does not own the sheep. So when he sees the wolf coming, he abandons the sheep and runs away. Then the wolf attacks the flock and scatters it. 13 The man runs away because he is a hired hand and cares nothing for the sheep.

Now Jesus presses his point. He will die for his sheep. The hired hand will not. This is the problem with organised religion. They do it for money. As soon as money gets involved, there is the potential for divided loyalties. You cannot serve both God and money.

As for dogs attacking sheep, have you seen it happen? The dog does not attack all the sheep at once. Instead it selects one and puts itself between its target and the flock so that it can pick it off. The wolf tries to divide God's people, to cut them off from each other. If someone is being singled out by gossip and pushed out of the flock, will a hired hand get involved on their behalf? Of course not; he knows where his self-interest lies, and it is not in taking on the wolf.

14 “I am the good shepherd; I know my sheep and my sheep know me—15 just as the Father knows me and I know the Father—and I lay down my life for the sheep. 16 I have other sheep that are not of this sheep pen. I must bring them also. They too will listen to my voice, and there shall be one flock and one shepherd. 17 The reason my Father loves me is that I lay down my life—only to take it up again. 18 No one takes it from me, but I lay it down of my own accord. I have authority to lay it down and authority to take it up again. This command I received from my Father.”

Jesus speaks of his love for us and his sacrifice. Those of faith will listen to his voice. If we look at organised religion today, do we truly see one flock with one shepherd? It is getting better: the animosity between various brands of ungodly religion is reducing as Jesus' sheep recognise Christ in each other and ignore these traditional barriers.

Tuesday, 1 March 2016

Global Erlang namespace wrecks modularity

Many of the best properties of Erlang systems derive from an architecture of lightweight, independent processes without shared memory, solving a problem by message passing.  Messages are addressed to processes.  To send a process a message, the sender needs the destination address (process id) of the recipient.
To avoid the need for the sender to know the address of the recipient, system processes register a global name in the Erlang runtime system.  These processes can then be found by a lookup in the registry.  A similar situation arises with applications, which again register a globally unique name.
In some situations it would be good to break our system up into disjoint sets of processes, each operating in its own context.  Consider providing web services where each customer has its own pages, its own database, and so on.  As soon as a globally named component is used, such as a webserver application or a Mnesia database, the structure is compromised.  The only answer is separate Erlang instances.
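The clash is easy to demonstrate.  In this sketch (the module name_clash and the name webserver are only illustrations), two otherwise independent processes both want the name webserver; the second register/2 call fails with badarg because registered names are shared by every process on the node.

```erlang
-module(name_clash).
-export([demo/0]).

%% Sketch: two "contexts" on one node competing for the same registered name.
demo() ->
    Wait = fun() -> receive stop -> ok end end,
    First = spawn(Wait),
    Second = spawn(Wait),
    true = register(webserver, First),
    Result =
        try register(webserver, Second) of
            true -> registered
        catch
            error:badarg -> already_taken   % the name is node-wide
        end,
    %% clean up so the demo can be run again
    true = unregister(webserver),
    First ! stop,
    Second ! stop,
    Result.
```

Calling name_clash:demo() returns already_taken; running each context in its own Erlang node is the only way for both to use the name.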