Difference between revisions of "Overview rebind workflow"
LornaMorris (talk | contribs) (→Overview of the reBiND workflow) |
Revision as of 17:05, 8 October 2014
Overview of the reBiND workflow
This figure shows the general structure of the reBiND processing architecture. It shows each step in the workflow, from submission of a dataset, through preparation and processing, to its final publication.
Before the data can be uploaded into the reBiND data portal, several steps are required to prepare the data and map it to an appropriate schema. We used the ABCD (Access to Biological Collections Data) schema, a common data specification for biological collection units, including living and preserved specimens and field observations. Most of the data we received from contributing scientists was in spreadsheet format, which can easily be imported into a relational database. Once the data was in a relational database, we used the BioCASe Provider Software (BPS) for the mapping. The BPS supports many different SQL-based databases, and these databases offer imports for different file types. To generate the XML files, the columns of the relational database have to be mapped to the corresponding ABCD concepts. This step needs the expert knowledge of the contributing scientists. The mapping process ends with the generation of an ABCD XML document.
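The idea of mapping database columns to schema concepts can be illustrated with a short Python sketch. This is not how the BPS performs the mapping; it is only a minimal illustration. The column names, the `COLUMN_TO_CONCEPT` table and the helper functions are hypothetical, and the element paths are simplified, ABCD-like paths rather than a complete or validated ABCD document.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from database columns to simplified, ABCD-like
# concept paths (relative to one Unit). Real mappings are defined with
# the help of the contributing scientists.
COLUMN_TO_CONCEPT = {
    "specimen_id": "UnitID",
    "species": "Identifications/Identification/Result/TaxonIdentified/"
               "ScientificName/FullScientificNameString",
    "country": "Gathering/Country/Name",
}


def set_path(parent, path, text):
    """Create (or reuse) the nested elements named in path and set the leaf text."""
    node = parent
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    node.text = text


def rows_to_xml(rows):
    """Build one Unit element per row; empty column values are skipped."""
    units = ET.Element("Units")
    for row in rows:
        unit = ET.SubElement(units, "Unit")
        for column, path in COLUMN_TO_CONCEPT.items():
            value = row.get(column)
            if value:
                set_path(unit, path, str(value))
    return units


rows = [
    {"specimen_id": "B-001", "species": "Bellis perennis", "country": "Germany"},
    {"specimen_id": "B-002", "species": "", "country": "Germany"},
]
units = rows_to_xml(rows)
xml_text = ET.tostring(units, encoding="unicode")
```

Keeping the mapping in a plain table separates the decision of which column corresponds to which concept (where expert knowledge is needed) from the mechanical step of writing the XML.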
Once the data has been converted into XML, it can be uploaded to the reBiND web portal. After the upload, the Content Administrator can start the correction process. The grey box in the figure highlights the steps between upload of the data into the reBiND portal and its publication: validation, correction and review. The Correction Manager runs several correction modules, each with a specific purpose. Whenever a module changes the document or encounters a problem, the issue is recorded in a document so that it can be reviewed later. When all modules have finished, the corrected document is loaded back into the reBiND system. Ideally the document is now valid; if it is not, the remaining problems have at least been marked.
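The pattern described here, a manager that runs independent modules over a document and collects their notes, can be sketched in Python as follows. This is an assumption-laden illustration, not the reBiND implementation: the `Note` class, the two example modules and `run_corrections` are hypothetical names, and the XML is a simplified stand-in for an ABCD document.

```python
from dataclasses import dataclass
from typing import Callable, List
import xml.etree.ElementTree as ET


@dataclass
class Note:
    module: str   # name of the module that recorded the note
    level: str    # "information", "warning" or "error"
    message: str


# A correction module takes the document and the shared list of notes.
CorrectionModule = Callable[[ET.Element, List[Note]], None]


def trim_whitespace(doc, notes):
    """Hypothetical module: strip surrounding whitespace from leaf elements."""
    for el in doc.iter():
        if len(el) == 0 and el.text and el.text != el.text.strip():
            el.text = el.text.strip()
            notes.append(Note("trim_whitespace", "information",
                              f"Trimmed whitespace in <{el.tag}>"))


def require_unit_id(doc, notes):
    """Hypothetical module: report units lacking an identifier (not auto-fixable)."""
    for i, unit in enumerate(doc.iter("Unit"), start=1):
        uid = unit.find("UnitID")
        if uid is None or not (uid.text or "").strip():
            notes.append(Note("require_unit_id", "error",
                              f"Unit {i} has no UnitID"))


def run_corrections(doc, modules):
    """Run each module in order and return every note that was recorded."""
    notes: List[Note] = []
    for module in modules:
        module(doc, notes)
    return notes


doc = ET.fromstring(
    "<Units><Unit><UnitID> B-001 </UnitID></Unit>"
    "<Unit><Remark>no id</Remark></Unit></Units>"
)
notes = run_corrections(doc, [trim_whitespace, require_unit_id])
```

Because every module writes to the same list of notes, the document can be loaded back into the system together with a complete record of what was changed and what still needs attention.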
The next step is the review. The notes from the correction belong to one of three types:
- information (changes were made, but no problems are expected as a result)
- warning (the correction module made changes but is not sure about them, or there are problems with the content that cannot be fixed automatically but do not affect the validity of the document)
- error (problems that make the document invalid and cannot be fixed automatically).
At a minimum, the warnings and errors should be reviewed and fixed; any user can do this. Some problems stem from technical issues and are better handled by the Content Administrator or the Technical Administrator, whereas others are caused by the content and are therefore better handled by the Contributing Scientists.
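A review queue built from the three note types might look like the following Python sketch. It is only an illustration under stated assumptions: the `Note` tuple, its `cause` field (here "technical" or "content") and both functions are hypothetical, and the source does not say how reBiND decides who reviews a note.

```python
from collections import namedtuple

# Hypothetical note record; "cause" is an assumed field used only for routing.
Note = namedtuple("Note", "level cause message")

# Levels that should be reviewed, most severe first.
REVIEW_LEVELS = ("error", "warning")


def review_queue(notes):
    """Return notes needing review, errors before warnings.

    sorted() is stable, so the original order is kept within each level.
    """
    pending = [n for n in notes if n.level in REVIEW_LEVELS]
    return sorted(pending, key=lambda n: REVIEW_LEVELS.index(n.level))


def suggested_reviewer(note):
    """Suggest a role based on the assumed cause of the problem."""
    if note.cause == "technical":
        return "Content Administrator or Technical Administrator"
    return "Contributing Scientist"


notes = [
    Note("information", "technical", "Trimmed whitespace"),
    Note("warning", "content", "Date format guessed"),
    Note("error", "technical", "Missing required element"),
]
queue = review_queue(notes)
reviewers = [suggested_reviewer(n) for n in queue]
```

Information notes are left out of the queue because they describe changes that are not expected to cause problems.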