Overview
I have rewritten the specification from Darwen (2006) in Z using the conventions in Lightfoot (2001). I was going to use ISO/IEC (2002) but I found it hard going.
Note: This document was originally reliant upon MathML for the equations and UniCode for the special characters. However, Google Sites gave me issues with MathML. I tried coverting the more complex MathML expressions into PNG images because Google Sites, but this made the page difficult to edit. I ended using Unicode for the mathematical symbols, and have removed all of the MathML.
Design
Conversion of Darwen's Specifications
I shall attempt to convert Darwen (2006) into Z.
Types
The types are derived from the database (sub)schema (Darwen 2006, slide #11).
[JOB_NAME] | the set of all possible names of jobs |
[PERSON_NAME] | the set of all possible names of persons |
[SALARY] | the set of all possible salaries |
The values of the ID column are assumed to be natural numbers ( ℕ ). No special type is considered necessary. If the design is extended to include other ID columns, then I would set up types for these kinds of columns in order to enforce strong typing.
The choice of different types for the NAME and JOB columns is essential for strong typing.
The introduction of a type for the SALARY column allows the introduction of special constraints during physical design.
System State
The sets are derived from the database (sub)schema tables (Darwen 2006, slide #11), with the constraints from (Darwen 2006, slide #12). I have called the set name the same as the table name.
PERS_INFO | |||
CALLED : ℙ(ℕ × PERSON_NAME) DOES_JOB : ℙ(ℕ × JOB_NAME) EARNS : ℙ(ℕ × SALARY) JOB_UNK, UNEMPLOYED, SALARY_UNK, UNSALARIED : ℙℕ |
|||
#dom CALLED = #CALLED #dom DOES_JOB = #DOES_JOB #dom EARNS = #EARNS dom CALLED ⊆ dom DOES_JOB ∪ JOB_UNK ∪ UNEMPLOYED dom DOES_JOB ∩ JOB_UNK = ∅ dom DOES_JOB ∩ UNEMPLOYED = ∅ UNEMPLOYED ∩ JOB_UNK = ∅ dom DOES_JOB ⊆ dom CALLED JOB_UNK ⊆ dom CALLED UNEMPLOYED ⊆ dom CALLED dom CALLED ⊆ dom EARNS ∪ SALARY_UNK ∪ UNSALARIED dom EARNS ∩ SALARY_UNK = ∅ dom EARNS ∩ UNSALARIED = ∅ UNSALARIED ∩ SALARY_UNK = ∅ dom EARNS ⊆ dom CALLED SALARY_UNK ⊆ dom CALLED UNSALARIED ⊆ dom CALLED |
I have described the two (2) column table as a mathematical relation (a set of ordered pairs derived from the Cartesian product of two (2) sets). Constraints such as #dom CALLED = #CALLED are how I represent the Primary Key constraints. I have assumed that Darwen (2006, slide #12) meant to include Primary Key constraints for the other two (2) tables.
I hope it is obvious that the other constraints translate directly in the formulae given.
The problem with this first draft of the specifications is that it in unweildly for such a simple design. I plan do improvements to this specification.
First Transformation
The first transformation is to the declaration of the sets that are mathematical relations. These mathematical relations are really functions , so I can rewrite their definitions as follows by incorporating those primary key constraints:
CALLED : ℕ ⇸ PERSON_NAME
DOES_JOB : ℕ ⇸ JOB_NAME
EARNS : ℕ ⇸ SALARY
These are partial functions ( ⇸ ) as not all identification numbers are asigned to persons, have known jobs, or have known salaries.
Second Transformation
The next transformation is to replace the subset statements with equalities. The first group of subset statements is:
dom CALLED ⊆ dom DOES_JOB ∪ JOB_UNK ∪ UNEMPLOYED
dom DOES_JOB ⊆ dom CALLED
JOB_UNK ⊆ dom CALLED
UNEMPLOYED ⊆ dom CALLED
These can be replaced by the following statement:
dom CALLED = dom DOES_JOB ∪ JOB_UNK ∪ UNEMPLOYED
The brief reasoning behind this transformation is twofold:
- The last three (3) subset statements can be replaced by a union of subsets. That is, the union of subsets is itself a subset.
- And if two (2) sets are subsets of each other, then they must be equal to each other. (Refer to the equality property of subsets in de Haan & Koppelaars (2007, p.31)).
Third Transformation
The third transformation is to replace the following non-overlapping constraints:
dom DOES_JOB ∩ JOB_UNK = ∅
dom DOES_JOB ∩ UNEMPLOYED = ∅
UNEMPLOYED ∩ JOB_UNK = ∅
These can be replaced by the disjoint operator (Lightfoot 2001, pp.17-18) as follows:
disjoint〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉
Fourth Transformation
The final transformation is now to combine the equality statement with this disjoint one by using the partition operator (Lightfoot 2001, p.18) as follows:
〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉 partition dom CALLED
I think this conveys the same information as distributed keys and distributed foreign keys as presented in Darwen (2006, slide #13).
Final Version
By doing similar transformations to the rest of the constraints, I end up with the following succinct specification of the sets:
PERS_INFO | |||
CALLED : ℕ ⇸ PERSON_NAME DOES_JOB : ℕ ⇸ JOB_NAME EARNS : ℕ ⇸ SALARY JOB_UNK, UNEMPLOYED, SALARY_UNK, UNSALARIED : ℙℕ |
|||
〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉 partition dom CALLED 〈dom EARNS, SALARY_UNK, UNSALARIED〉 partition dom CALLED |
Problems with Final Version
Based on these constraints, the following combinations are possible for salary and job:
Salary | Job | Validity |
---|---|---|
Known | Known | Valid |
Known | Unknown | Valid |
Known | Unemployed | Invalid |
Unknown | Known | Valid |
Unknown | Unknown | Invalid |
Unknown | Unemployed | Invalid |
Unsalaried | Known | Invalid |
Unsalaried | Unknown | Invalid |
Unsalaried | Unemployed | Valid |
This matches the sample data table provided in Darwen (2006, slide #4). There are only four (4) valid cases. However, the existing specification allows for nine (9) posssible ones.
There appears to be two (2) extra constraints that are not articulated in Darwen (2006):
- An employed person has, at least, one of job name and salary known. Or, for a person to be employed, they have to have to reveal either a job title, a salary, or both. There are no unsalaried jobs.
- An unemployed person is unsalaried because they are earning no money.
Augmented Final Version
To address these extra constraints, the specification for the data storage is modified as follows:
PERS_INFO | |||
CALLED : ℕ ⇸ PERSON_NAME DOES_JOB : ℕ ⇸ JOB_NAME EARNS : ℕ ⇸ SALARY JOB_UNK, UNEMPLOYED, SALARY_UNK, UNSALARIED : ℙℕ |
|||
〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉 partition dom CALLED 〈dom EARNS, SALARY_UNK, UNSALARIED〉 partition dom CALLED JOB_UNK ∩ SALARY_UNK = ∅ UNEMPLOYED = UNSALARIED |
Initialisation
Because of the simplicity of the definition of the data storage, the corresponding initialisation specification is also very simple:
init == [PERS_INFOʹ | CALLEDʹ = ∅]
The constraints ensure that all other sets are null as well.
Operations
There are four (4) data creation operations. Each of these correspond to the four (4) valid cases revealed in the augmented final specification for the data storage.
Note: I am using the maplet ( ↦ ) to represent a pair of key and value.
Create an Employed Person with all Details Known
ADD_EMPLOYED_JOB_SALARY | |||
ΔPERS_INFO name? : PERSON_NAME salary? : SALARY job? : JOB_NAME id? : ℕ |
|||
id? ∉ dom CALLED CALLEDʹ = CALLED ∪ {id? ↦ name?} EARNSʹ = EARNS ∪ {id? ↦ salary?} DOES_JOBʹ = DOES_JOB ∪ {id? ↦ job?} |
The data storage is changed by this operation. This operation adds an employed person with a known job and salary.
There are four (4) input variables:
- id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
- name? is the name of the person being added.
- salary? is the salary of the person being added.
- job? is the name of the job that the person is doing.
Create an Employed Person with only Salary Known
This operation is described in Darwen (2006, slide #15).
ADD_EMPLOYED_SALARY | |||
ΔPERS_INFO name? : PERSON_NAME salary? : SALARY id? : ℕ |
|||
id? ∉ dom CALLED CALLEDʹ = CALLED ∪ {id? ↦ name?} EARNSʹ = EARNS ∪ {id? ↦ salary?} JOB_UNKʹ = JOB_UNK ∪ {id?} |
The data storage is changed by this operation. This operation adds an employed person with a known salary.
There are three (3) input variables:
- id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
- name? is the name of the person being added.
- salary? is the salary of the person being added.
Create an Employed Person with only Job Known
ADD_EMPLOYED_JOB | |||
ΔPERS_INFO name? : PERSON_NAME job? : JOB_NAME id? : ℕ |
|||
id? ∉ dom CALLED CALLEDʹ = CALLED ∪ {id? ↦ name?} SALARY_UNKʹ = SALARY_UNK ∪ {id?} DOES_JOBʹ = DOES_JOB ∪ {id? ↦ job?} |
The data storage is changed by this operation. This operation adds an employed person with a known job.
There are three (3) input variables:
- id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
- name? is the name of the person being added.
- job? is the name of the job that the person is doing.
Create an Unemployed Person
ADD_UNEMPLOYED | |||
ΔPERS_INFO name? : PERSON_NAME id? : ℕ |
|||
id? ∉ dom CALLED CALLEDʹ = CALLED ∪ {id? ↦ name?} UNSALARIEDʹ = UNSALARIED ∪ {id?} UNEMPLOYEDʹ = UNEMPLOYED ∪ {id?} |
The data storage is changed by this operation. This operation adds an unemployed person.
There are two (2) input variables:
- id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
- name? is the name of the person being added.
Enquiry Operation
There is one (1) enquiry operation. The specification is based upon Darwen (2006, slide #16).
PERS_INFO_REPORT | |||
ΞPERS_INFO SALARY_INFO = SALARY ∪ { 'Unsalaried', 'Salary unknown' } JOB_INFO = JOB_NAME ∪ { 'Unemployed', 'Job unknown' } REPORT_INFO : ℙ( PERSON_NAME × JOB_INFO × SALARY_INFO ) REPORT! : ℕ ⇸ REPORT_INFO |
|||
∀id : dom CALLED ● ∃sal : SAL_INFO, job : JOB_INFO, name : PERSON_NAME | ( name, job, sal ) = REPORT! id ⇒ ( name = CALLED id ∧ ( job = DOES_JOB id ∨ job = 'Unemployed' ∧ id ∈ UNEMPLOYED ∨ job = 'Job unknown' ∧ id ∈ JOB_UNK ) ∧ ( sal = EARNS id ∨ sal = 'Unsalaried' ∧ id ∈ UNSALARIED ∨ sal = 'Salary unknown' ∧ id ∈ SALARY_UNK ) ) |
The data storage is not changed by this operation.
There are no input variables for this operation.
There is only one (1) output variable:
- REPORT! is the set of persons with their name, job information, and salary information.
I had to extend two (2) of the base sets to acommodate the missing information.