Difference between revisions of "End-user workflows for name matching"

Latest revision as of 16:35, 22 September 2025

The workflows that end-users follow vary significantly based on the specific use case, ranging from checking a single name to uploading a regional or monographic checklist with thousands of names. There are four main types of name-matching processes:

1. Direct Use of the Name Matching Service: Using the online tools provided directly by the service.
2. Using Third-Party Tools: Leveraging tools such as OpenRefine (https://openrefine.org/) that access the name matching services.
3. Using an API or Programming Package: Accessing the name matching services through a web-accessible API or a programming package.
4. Using Local Tools: Downloading the dataset provided and using local tools to perform the matching.

The choice of method depends mainly on the expected result, but also on the number of records to be matched and on the technical expertise available in-house. A type 3 or 4 process usually requires some expertise in biodiversity data management; for type 4, TETTRIs provides links to download sites for the aggregators' data. For type 2, TETTRIs will provide example use cases that have been successfully tested. For type 1 (direct use of a service's online tools), the relevant documentation is referenced in a list of the general capabilities of aggregators.
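For a type 4 process, the core of local matching can be sketched as below. This is a minimal illustration under stated assumptions, not the algorithm of any particular service: it assumes the downloaded dataset has been reduced to a mapping from (hypothetical) name IDs to name strings, counts a name as an exact match after normalising case and whitespace, and uses the similarity ratio from Python's standard `difflib` module to propose candidates for the rest. The `cutoff` of 0.85 is an arbitrary choice. Real services typically also parse authorship and rank and resolve synonymy, which this sketch ignores.

```python
import difflib


def normalise(name):
    """Collapse runs of whitespace and lower-case a name for comparison."""
    return " ".join(name.split()).lower()


def match_names(input_names, reference, max_candidates=3, cutoff=0.85):
    """Match input names against a local reference list.

    reference maps a (hypothetical) name ID to a name string, standing in
    for a downloaded dataset. For each input name, the result holds either
    an exact match (after normalisation) or a list of fuzzy candidates.
    """
    # Index the reference names by their normalised form.
    by_norm = {normalise(name): (name_id, name) for name_id, name in reference.items()}
    results = []
    for raw in input_names:
        key = normalise(raw)
        if key in by_norm:
            results.append({"input": raw, "exact": by_norm[key], "candidates": []})
            continue
        # No exact match: propose similar spellings, best first.
        close = difflib.get_close_matches(key, list(by_norm), n=max_candidates, cutoff=cutoff)
        results.append({"input": raw, "exact": None,
                        "candidates": [by_norm[c] for c in close]})
    return results


if __name__ == "__main__":
    # Hypothetical reference data and input names, for illustration only.
    reference = {
        "id-001": "Quercus robur",
        "id-002": "Quercus rubra",
        "id-003": "Fagus sylvatica",
    }
    for result in match_names(["Quercus  robur", "Fagus silvatica", "Abies alba"], reference):
        print(result)
    # "Quercus  robur" matches id-001 exactly; "Fagus silvatica" gets id-003
    # as its only candidate; "Abies alba" has neither.
```

The output separates exact matches from possible candidates, the two kinds of result that users then have to interpret. A fixed similarity threshold keeps the behaviour easy to reason about, but candidates still need human assessment before one is accepted as the correct match.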

The process itself can be divided into four phases:

1. Preparing the Data:
- A plain-text list of names is required. It can be created from a spreadsheet column or extracted from a table containing the names. Each name must be on a separate line.
2. Submitting the Data:
- This phase depends on the chosen type of matching process.
3. Getting and Interpreting the Results:
- For process types 1 and 2, results are provided as lists of exact matches and possible candidates, i.e. names that match the input to a certain extent. Interpretation involves assessing these candidates and, where appropriate, selecting one of them as the correct match.
4. Incorporating the Results Locally:
- This involves making local corrections based on the matching results. Once matches are unambiguous, it may also include integrating the aggregator’s name ID into the local dataset to enable linkage and, where the aggregator supports it, further interaction with it.
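The first and last phases can be sketched with Python's standard `csv` module, assuming the spreadsheet has been exported as CSV. The column name `scientificName`, the ID column `matched_name_id`, and the ID value `id-001` are hypothetical placeholders; real name IDs come from whichever matching service is used. Removing duplicate names before submission is an optional choice made here, not a requirement of any service.

```python
import csv
import io


def names_from_csv(csv_text, column):
    """Turn one spreadsheet column (exported as CSV) into a text-only list
    with one name per line. Blank cells are skipped, surrounding whitespace
    is trimmed, and repeated names are kept once in first-seen order."""
    names = ((row.get(column) or "").strip() for row in csv.DictReader(io.StringIO(csv_text)))
    # dict.fromkeys removes duplicates while preserving insertion order.
    return "\n".join(dict.fromkeys(name for name in names if name))


def add_name_ids(rows, column, id_by_name, id_column="matched_name_id"):
    """Return copies of the local records with an extra column holding the
    matched name ID; records without an unambiguous match get ""."""
    updated = []
    for row in rows:
        new_row = dict(row)
        new_row[id_column] = id_by_name.get((row.get(column) or "").strip(), "")
        updated.append(new_row)
    return updated


if __name__ == "__main__":
    # Hypothetical local dataset; "scientificName" is the column to match.
    local_csv = (
        "specimen,scientificName\n"
        "B-1,Quercus robur\n"
        "B-2,\n"
        "B-3,Fagus sylvatica\n"
        "B-4,Quercus robur\n"
    )
    # Preparing the data: the text-only list that would be submitted.
    print(names_from_csv(local_csv, "scientificName"))
    # Incorporating the results: attach IDs for unambiguous matches only
    # (the ID value is hypothetical).
    rows = list(csv.DictReader(io.StringIO(local_csv)))
    for row in add_name_ids(rows, "scientificName", {"Quercus robur": "id-001"}):
        print(row)
```

Only names that were matched unambiguously are given an ID; ambiguous or unmatched records are left empty so that they can be reviewed rather than silently linked.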

In the course of the project, we initially focused on OpenRefine, especially for medium to large datasets. However, not all relevant services provide an API that OpenRefine can use, and for many users OpenRefine (despite its fairly straightforward procedures) seems an overly technical tool. We have also used downloads and direct matching, for example in the herbarium and Euro+Med PlantBase use cases with the WFO Plant List. Realizing that many users find the range of available options confusing, we later focused on online matching mechanisms that require no technical background on the user's side, i.e. those that offer a direct drag-and-drop interface, allow file uploads, or both.

The ongoing effort to document these processes faces a moving target, as the main services and datasets are continuously evolving, hopefully influenced by the requirements set out by TETTRIs WP2. Close collaboration with the TETTRIs 3PP project on Taxonomic Name Linking Services (TNLS), together with the involvement of the Catalogue of Life as a project member, proved productive and complementary in improving end-user workflows.

To help users choose among the available services and datasets, TETTRIs has developed an online tool that lets users enter their preferences and retrieve a list of potentially useful service/dataset combinations. A prototype of the tool is available as "CheckMyName - Taxonomic Name Matching Service Explorer" (https://wiki.bgbm.org/tettriswiki/uploads/tettriswiki/demo/) and is described in more detail in an accompanying document (https://wiki.bgbm.org/tettriswiki/uploads/tettriswiki/a/a9/2025-09-22-CheckMyName-Description.pdf). The criteria used to define users' needs, as well as the corresponding metadata needed to make a choice, will be detailed in a forthcoming article.

