Integrated Rules

From Data Quality Toolkit
Revision as of 10:59, 7 November 2012 by PeerSchwirtz (talk | contribs)

This page is a working document containing an evolving set of rules that will be continuously implemented in the data integrity service and quality toolkit. So far, only a few examples have been included. The rule numbers below will also be used to specify which rules to apply when using the integrity service.


Integrity Rules

1 Atomized Genus element
2 Collection date fields
3 Site coordinate latitude
4 Site coordinate longitude
5 Syntax of email elements
6 ISO country element
7 Scientific name (zoology)
8 Scientific name (botany)
9 Mime type for multimedia objects
10 Check whether multimedia object file is available
11 Check whether multimedia object has an associated copyright statement
12 Check whether rule 7 and rule 8 find the scientific name
13 Check whether the value for measurement and fact is a number
14 Check whether Record basis is mapped





1 Atomized Genus elements should start with a single uppercase character followed by a non-empty sequence of lowercase characters

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAtomised/Zoological/GenusOrMonomial
/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAtomised/Botanical/GenusOrMonomial

Regular expression:

[A-Z][a-z]+
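A minimal Python sketch of how rule 1 could be applied to a single element value. The function name `is_valid_genus` is hypothetical and not part of the toolkit; the pattern is copied from above and matched against the whole value.

```python
import re

# Rule 1 pattern, copied from the page. fullmatch() requires the whole
# value to match, so trailing words or characters are rejected.
GENUS_PATTERN = re.compile(r"[A-Z][a-z]+")


def is_valid_genus(value: str) -> bool:
    """Return True for one uppercase letter followed by 1+ lowercase letters."""
    return GENUS_PATTERN.fullmatch(value) is not None
```

Note that `[A-Z]` and `[a-z]` cover ASCII letters only, so a value containing diacritics or a hyphen would fail this check as written.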


2 Check whether collection date fields conform to specification in ABCD 2.06

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/DateTime/ISODateTimeBegin
/DataSets/DataSet/Units/Unit/Gathering/DateTime/ISODateTimeEnd
/DataSets/DataSet/Units/Unit/Identifications/Identification/Date/ISODateTimeBegin
/DataSets/DataSet/Units/Unit/Identifications/Identification/Date/ISODateTimeEnd

Regular expression:

\d\d\d\d(\-(0[1-9]|1[012])(\-((0[1-9])|1\d|2\d|3[01])(T(0\d|1\d|2[0-3])(:[0-5]\d){0,2})?)?)?|\-\-(0[1-9]|1[012])(\-(0[1-9]|1\d|2\d|3[01]))?|\-\-\-(0[1-9]|1\d|2\d|3[01])
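The expression above has three alternatives: a year with optional month, day and time; a month with optional day (`--MM-DD`); and a day alone (`---DD`). The following Python sketch applies it to a whole value. The function name `is_valid_abcd_date` is hypothetical; the pattern itself is copied unchanged.

```python
import re

# Rule 2 pattern, copied from the page and split over several lines
# for readability only (adjacent raw strings are concatenated).
DATE_PATTERN = re.compile(
    r"\d\d\d\d(\-(0[1-9]|1[012])(\-((0[1-9])|1\d|2\d|3[01])"
    r"(T(0\d|1\d|2[0-3])(:[0-5]\d){0,2})?)?)?"
    r"|\-\-(0[1-9]|1[012])(\-(0[1-9]|1\d|2\d|3[01]))?"
    r"|\-\-\-(0[1-9]|1\d|2\d|3[01])"
)


def is_valid_abcd_date(value: str) -> bool:
    """Return True if the whole value matches one of the three alternatives."""
    return DATE_PATTERN.fullmatch(value) is not None
```

The check is purely syntactic: a value such as `2012-02-30` passes because the expression does not know the length of each month, and time zone suffixes are not accepted.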


3 Check numeric ranges of site coordinates latitude value

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong/LatitudeDecimal

Rule:

-90.0 <= lat <= 90.0


4 Check numeric ranges of site coordinates longitude value

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong/LongitudeDecimal

Rule:

-180.0 <= lon <= 180.0
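Rules 3 and 4 share the same shape, so they can be sketched with one helper. The names `in_range`, `is_valid_latitude` and `is_valid_longitude` are hypothetical, and the sketch assumes the element values arrive as strings that Python's `float()` can parse.

```python
import math


def in_range(value: str, low: float, high: float) -> bool:
    """Return True if value parses as a finite number within [low, high]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False  # not numeric at all
    # isfinite() makes the rejection of NaN and infinities explicit.
    return math.isfinite(number) and low <= number <= high


def is_valid_latitude(value: str) -> bool:
    """Rule 3: -90.0 <= lat <= 90.0."""
    return in_range(value, -90.0, 90.0)


def is_valid_longitude(value: str) -> bool:
    """Rule 4: -180.0 <= lon <= 180.0."""
    return in_range(value, -180.0, 180.0)
```

Both bounds are inclusive, matching the rules as stated.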


5 Check syntactical correctness of ABCD elements used for email addresses

ABCD elements:

All elements with email-addresses

Regular expression:

^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)$
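A short Python sketch of the email check; the function name `is_valid_email_syntax` is hypothetical. The pattern is copied from above. `fullmatch()` is used so that a trailing newline, which `$` alone would tolerate in Python, is also rejected.

```python
import re

# Rule 5 pattern, copied from the page.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)$")


def is_valid_email_syntax(value: str) -> bool:
    """Return True if value has the form local-part@domain per rule 5."""
    return EMAIL_PATTERN.fullmatch(value) is not None
```

The expression is deliberately loose: it accepts a domain without a dot (for example `user@localhost`) and does not verify that the address exists.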


6 Check whether country element conforms with ISO3166

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/Country/ISO3166Code

Rule:

The value has to be a 2- or 3-letter ISO country code (ISO 3166-1).
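One way to sketch this rule in Python is a shape check followed by a lookup. Everything named here is an assumption for illustration: `is_iso3166_code` is hypothetical, `SAMPLE_CODES` is a tiny subset rather than the full ISO 3166-1 list, and the sketch assumes codes are written in uppercase.

```python
import re

# Two or three uppercase ASCII letters (assumption: uppercase only).
CODE_SHAPE = re.compile(r"[A-Z]{2,3}")

# Illustrative subset only. A real check would load the complete
# ISO 3166-1 alpha-2 and alpha-3 code lists from an authoritative source.
SAMPLE_CODES = frozenset({"DE", "DEU", "FR", "FRA", "BR", "BRA", "US", "USA"})


def is_iso3166_code(value: str, known_codes=SAMPLE_CODES) -> bool:
    """Return True if value has the right shape and is a known code."""
    return CODE_SHAPE.fullmatch(value) is not None and value in known_codes
```

Passing the code list as a parameter keeps the check independent of where the list comes from.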


7 Check whether scientific name is known by zoological name service

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use Zoological Name Service


8 Check whether scientific name is known by botanical name service

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use Botanical Name Service


9 Check whether field for multimedia object type uses mime types

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/FileURI
/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/Format

Rule:

The value has to be a MIME media type as defined in RFC 2046 (http://www.ietf.org/rfc/rfc2046.txt).
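A possible syntactic check for this rule, sketched in Python. The function name `is_mime_type` is hypothetical. The top-level types listed are the seven defined in RFC 2046; the character set allowed for the subtype is a simplifying assumption, and top-level types registered in later RFCs are not covered.

```python
import re

# The five discrete and two composite top-level types from RFC 2046.
TOP_LEVEL_TYPES = (
    "text", "image", "audio", "video", "application", "multipart", "message",
)

# type "/" subtype; the subtype character set is a simplification (assumption).
MIME_PATTERN = re.compile(
    r"(?P<type>[A-Za-z]+)/(?P<subtype>[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*)"
)


def is_mime_type(value: str) -> bool:
    """Return True if value looks like 'type/subtype' with a known top-level type."""
    match = MIME_PATTERN.fullmatch(value.strip())
    return match is not None and match.group("type").lower() in TOP_LEVEL_TYPES
```

This only checks the form of the value. It does not confirm that the subtype is registered or that it matches the actual file content.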


10 Check whether multimedia object file is available

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/File

Rule:

HTTP HEAD request
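A HEAD request asks the server for the response headers only, so availability can be tested without downloading the file. A minimal sketch using Python's standard library follows; the function name `is_file_available`, the 10-second default timeout and the choice to count only 2xx responses as available are assumptions.

```python
import urllib.error
import urllib.request


def is_file_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url ends in a 2xx response."""
    try:
        # Request() raises ValueError for strings that are not URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors (e.g. 404), unreachable hosts, timeouts, malformed URLs.
        return False
```

Some servers do not support HEAD or answer it differently from GET, so a negative result may need a follow-up check before a record is flagged.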


11 Check whether multimedia object has an associated copyright statement

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/IPR/Copyrights/Copyright/Text

Rule:

Copyright element has to be non-empty.
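The non-empty check can be sketched with Python's `xml.etree.ElementTree`. The function name `has_copyright` is hypothetical, and the namespace URI is assumed to be the one used by ABCD 2.06 documents; adjust it if the data uses a different one. The function expects a single `MultiMediaObject` element and looks up the path relative to it.

```python
import xml.etree.ElementTree as ET

# Assumed ABCD 2.06 namespace; verify against the documents being checked.
NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

# Path from rule 11, relative to a MultiMediaObject element.
COPYRIGHT_PATH = "abcd:IPR/abcd:Copyrights/abcd:Copyright/abcd:Text"


def has_copyright(multimedia_object: ET.Element) -> bool:
    """Return True if at least one Copyright/Text element has non-blank text."""
    texts = multimedia_object.findall(COPYRIGHT_PATH, NS)
    return any((element.text or "").strip() for element in texts)
```

Whitespace-only text is treated as empty here, which is a design choice rather than something the rule states.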


12 Check whether rule 7 and rule 8 find the scientific name

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use rule 7 and rule 8


13 Check whether the value for measurement and fact is a number

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/LowerValue
/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/UpperValue
/DataSets/DataSet/Units/Unit/Gathering/Depth/MeasurementOrFactText
/DataSets/DataSet/Units/Unit/Gathering/Height/MeasurementOrFactText

Rule:

Measurement or fact (MaF) field values have to be numbers.
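The rule does not say which notations count as a number. The Python sketch below assumes plain decimal notation with an optional sign and a dot as decimal separator; the function name `is_number` is hypothetical.

```python
import re

# Optional sign, then digits with an optional fraction, or a bare fraction.
# Exponents ("1e3"), units ("100 m") and decimal commas are rejected
# (assumption about what the rule intends).
NUMBER_PATTERN = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def is_number(value: str) -> bool:
    """Return True if value is a plain decimal number, ignoring outer whitespace."""
    return NUMBER_PATTERN.fullmatch(value.strip()) is not None
```

A regular expression is used instead of `float()` because `float()` also accepts values such as `inf`, `nan` and `1_000`, which are unlikely to be intended here.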


14 Check whether Record basis is mapped

ABCD elements:

/DataSets/DataSet/Units/Unit/RecordBasis

Rule:

Record basis field has to be mapped