Logical Design


Overview

I have rewritten the specification from Darwen (2006) in Z using the conventions in Lightfoot (2001). I was going to use ISO/IEC (2002) but I found it hard going.

Note: This document was originally reliant upon MathML for the equations and UniCode for the special characters. However, Google Sites gave me issues with MathML. I tried coverting the more complex MathML expressions into PNG images because Google Sites, but this made the page difficult to edit. I ended using Unicode for the mathematical symbols, and have removed all of the MathML.

Design

Conversion of Darwen's Specifications

I shall attempt to convert Darwen (2006) into Z.

Types

The types are derived from the database (sub)schema (Darwen 2006, slide #11).

[JOB_NAME] the set of all possible names of jobs
[PERSON_NAME] the set of all possible names of persons
[SALARY] the set of all possible salaries

The values of the ID column are assumed to be natural numbers ( ). No special type is considered necessary. If the design is extended to include other ID columns, then I would set up types for these kinds of columns in order to enforce strong typing.

The choice of different types for the NAME and JOB columns is essential for strong typing.

The introduction of a type for the SALARY column allows the introduction of special constraints during physical design.

System State

The sets are derived from the database (sub)schema tables (Darwen 2006, slide #11), with the constraints from (Darwen 2006, slide #12). I have called the set name the same as the table name.

PERS_INFO

CALLED : ℙ(ℕ × PERSON_NAME)
DOES_JOB : ℙ(ℕ × JOB_NAME)
EARNS : ℙ(ℕ × SALARY)
JOB_UNK, UNEMPLOYED, SALARY_UNK, UNSALARIED : ℙℕ

#dom CALLED = #CALLED
#dom DOES_JOB = #DOES_JOB
#dom EARNS = #EARNS
dom CALLED ⊆ dom DOES_JOB ∪ JOB_UNK ∪ UNEMPLOYED
dom DOES_JOB ∩ JOB_UNK = ∅
dom DOES_JOB ∩ UNEMPLOYED = ∅
UNEMPLOYED ∩ JOB_UNK = ∅
dom DOES_JOB ⊆ dom CALLED
JOB_UNK ⊆ dom CALLED
UNEMPLOYED ⊆ dom CALLED
dom CALLED ⊆ dom EARNS ∪ SALARY_UNK ∪ UNSALARIED
dom EARNS ∩ SALARY_UNK = ∅
dom EARNS ∩ UNSALARIED = ∅
UNSALARIED ∩ SALARY_UNK = ∅
dom EARNS ⊆ dom CALLED
SALARY_UNK ⊆ dom CALLED
UNSALARIED ⊆ dom CALLED

I have described the two (2) column table as a mathematical relation (a set of ordered pairs derived from the Cartesian product of two (2) sets). Constraints such as #dom CALLED = #CALLED are how I represent the Primary Key constraints. I have assumed that Darwen (2006, slide #12) meant to include Primary Key constraints for the other two (2) tables.

I hope it is obvious that the other constraints translate directly in the formulae given.

The problem with this first draft of the specifications is that it in unweildly for such a simple design. I plan do improvements to this specification.

First Transformation

The first transformation is to the declaration of the sets that are mathematical relations. These mathematical relations are really functions , so I can rewrite their definitions as follows by incorporating those primary key constraints:

CALLED : ℕ ⇸ PERSON_NAME
DOES_JOB : ℕ ⇸ JOB_NAME
EARNS : ℕ ⇸ SALARY

These are partial functions ( ) as not all identification numbers are asigned to persons, have known jobs, or have known salaries.

Second Transformation

The next transformation is to replace the subset statements with equalities. The first group of subset statements is:

dom CALLED ⊆ dom DOES_JOB ∪ JOB_UNK ∪ UNEMPLOYED
dom DOES_JOB ⊆ dom CALLED
JOB_UNK ⊆ dom CALLED
UNEMPLOYED ⊆ dom CALLED

These can be replaced by the following statement:

dom CALLED = dom DOES_JOB ∪ JOB_UNK ∪ UNEMPLOYED

The brief reasoning behind this transformation is twofold:

  1. The last three (3) subset statements can be replaced by a union of subsets. That is, the union of subsets is itself a subset.
  2. And if two (2) sets are subsets of each other, then they must be equal to each other. (Refer to the equality property of subsets in de Haan & Koppelaars (2007, p.31)).

Third Transformation

The third transformation is to replace the following non-overlapping constraints:

dom DOES_JOB ∩ JOB_UNK = ∅
dom DOES_JOB ∩ UNEMPLOYED = ∅
UNEMPLOYED ∩ JOB_UNK = ∅

These can be replaced by the disjoint operator (Lightfoot 2001, pp.17-18) as follows:

disjoint〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉

Fourth Transformation

The final transformation is now to combine the equality statement with this disjoint one by using the partition operator (Lightfoot 2001, p.18) as follows:

〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉 partition dom CALLED

I think this conveys the same information as distributed keys and distributed foreign keys as presented in Darwen (2006, slide #13).

Final Version

By doing similar transformations to the rest of the constraints, I end up with the following succinct specification of the sets:

PERS_INFO

CALLED : ℕ ⇸ PERSON_NAME
DOES_JOB : ℕ ⇸ JOB_NAME
EARNS : ℕ ⇸ SALARY
JOB_UNK, UNEMPLOYED, SALARY_UNK, UNSALARIED : ℙℕ

〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉 partition dom CALLED
〈dom EARNS, SALARY_UNK, UNSALARIED〉 partition dom CALLED

Problems with Final Version

Based on these constraints, the following combinations are possible for salary and job:

Salary Job Validity
Known Known Valid
Known Unknown Valid
Known Unemployed Invalid
Unknown Known Valid
Unknown Unknown Invalid
Unknown Unemployed Invalid
Unsalaried Known Invalid
Unsalaried Unknown Invalid
Unsalaried Unemployed Valid

This matches the sample data table provided in Darwen (2006, slide #4). There are only four (4) valid cases. However, the existing specification allows for nine (9) posssible ones.

There appears to be two (2) extra constraints that are not articulated in Darwen (2006):

  1. An employed person has, at least, one of job name and salary known. Or, for a person to be employed, they have to have to reveal either a job title, a salary, or both. There are no unsalaried jobs.
  2. An unemployed person is unsalaried because they are earning no money.

Augmented Final Version

To address these extra constraints, the specification for the data storage is modified as follows:

PERS_INFO

CALLED : ℕ ⇸ PERSON_NAME
DOES_JOB : ℕ ⇸ JOB_NAME
EARNS : ℕ ⇸ SALARY
JOB_UNK, UNEMPLOYED, SALARY_UNK, UNSALARIED : ℙℕ

〈dom DOES_JOB, JOB_UNK, UNEMPLOYED〉 partition dom CALLED
〈dom EARNS, SALARY_UNK, UNSALARIED〉 partition dom CALLED
JOB_UNK ∩ SALARY_UNK = ∅
UNEMPLOYED = UNSALARIED

Initialisation

Because of the simplicity of the definition of the data storage, the corresponding initialisation specification is also very simple:

init == [PERS_INFOʹ | CALLEDʹ = ∅]

The constraints ensure that all other sets are null as well.

Operations

There are four (4) data creation operations. Each of these correspond to the four (4) valid cases revealed in the augmented final specification for the data storage.

Note: I am using the maplet ( ) to represent a pair of key and value.

Create an Employed Person with all Details Known

ADD_EMPLOYED_JOB_SALARY

ΔPERS_INFO
name? : PERSON_NAME
salary? : SALARY
job? : JOB_NAME
id? : ℕ

id? ∉ dom CALLED
CALLEDʹ = CALLED ∪ {id? ↦ name?}
EARNSʹ = EARNS ∪ {id? ↦ salary?}
DOES_JOBʹ = DOES_JOB ∪ {id? ↦ job?}

The data storage is changed by this operation. This operation adds an employed person with a known job and salary.

There are four (4) input variables:

  1. id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
  2. name? is the name of the person being added.
  3. salary? is the salary of the person being added.
  4. job? is the name of the job that the person is doing.

Create an Employed Person with only Salary Known

This operation is described in Darwen (2006, slide #15).

ADD_EMPLOYED_SALARY

ΔPERS_INFO
name? : PERSON_NAME
salary? : SALARY
id? : ℕ

id? ∉ dom CALLED
CALLEDʹ = CALLED ∪ {id? ↦ name?}
EARNSʹ = EARNS ∪ {id? ↦ salary?}
JOB_UNKʹ = JOB_UNK ∪ {id?}

The data storage is changed by this operation. This operation adds an employed person with a known salary.

There are three (3) input variables:

  1. id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
  2. name? is the name of the person being added.
  3. salary? is the salary of the person being added.

Create an Employed Person with only Job Known

ADD_EMPLOYED_JOB

ΔPERS_INFO
name? : PERSON_NAME
job? : JOB_NAME
id? : ℕ

id? ∉ dom CALLED
CALLEDʹ = CALLED ∪ {id? ↦ name?}
SALARY_UNKʹ = SALARY_UNK ∪ {id?}
DOES_JOBʹ = DOES_JOB ∪ {id? ↦ job?}

The data storage is changed by this operation. This operation adds an employed person with a known job.

There are three (3) input variables:

  1. id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
  2. name? is the name of the person being added.
  3. job? is the name of the job that the person is doing.

Create an Unemployed Person

ADD_UNEMPLOYED

ΔPERS_INFO
name? : PERSON_NAME
id? : ℕ

id? ∉ dom CALLED
CALLEDʹ = CALLED ∪ {id? ↦ name?}
UNSALARIEDʹ = UNSALARIED ∪ {id?}
UNEMPLOYEDʹ = UNEMPLOYED ∪ {id?}

The data storage is changed by this operation. This operation adds an unemployed person.

There are two (2) input variables:

  1. id? is the identification number of the person being added to the data storage. This number must not be already used in the data storage.
  2. name? is the name of the person being added.

Enquiry Operation

There is one (1) enquiry operation. The specification is based upon Darwen (2006, slide #16).

PERS_INFO_REPORT

ΞPERS_INFO
SALARY_INFO = SALARY ∪ { 'Unsalaried', 'Salary unknown' }
JOB_INFO = JOB_NAME ∪ { 'Unemployed', 'Job unknown' }
REPORT_INFO : ℙ( PERSON_NAME × JOB_INFO × SALARY_INFO )
REPORT! : ℕ ⇸ REPORT_INFO

∀id : dom CALLED ●
∃sal : SAL_INFO, job : JOB_INFO, name : PERSON_NAME |
( name, job, sal ) = REPORT! id ⇒
( name = CALLED id ∧
( job = DOES_JOB id ∨
job = 'Unemployed' ∧ id ∈ UNEMPLOYED ∨
job = 'Job unknown' ∧ id ∈ JOB_UNK ) ∧
( sal = EARNS id ∨
sal = 'Unsalaried' ∧ id ∈ UNSALARIED ∨
sal = 'Salary unknown' ∧ id ∈ SALARY_UNK ) )

The data storage is not changed by this operation.

There are no input variables for this operation.

There is only one (1) output variable:

  1. REPORT! is the set of persons with their name, job information, and salary information.

I had to extend two (2) of the base sets to acommodate the missing information.