Data quality is of the utmost importance for delivering high-quality outputs by the end of the project, and even beyond. However, errors occur in the database from time to time, and these need to be cleaned up before any project can make use of the available data. Many of the errors result from human error and could have been fixed immediately if quality control procedures had been implemented and followed from the start of the computerisation phase.
Although there are many kinds of errors in the database, experience tells us that there are two major problems that occur during the computerisation process:
Duplication of plant names
Extension of data dictionaries
Duplication of plant names happens when Data Entry Clerks enter a species name into the database twice, so that the same species ends up with two different genspec codes (a genspec code is a serial number comprising two sets of numbers, one for the genus and one for the species). When information on a species is called up, say Acacia karroo, only the specimens encoded under one genspec are accessed, while the specimens of the same species under the additional genspec are not picked up. This problem arises when clerks do not double-check their work.
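The duplication described above can be detected mechanically. The following is a minimal sketch in Python, not the project's actual procedure: it assumes the name table can be exported as pairs of genspec code and species name, and the function name and the sample genspec values are invented for illustration.

```python
from collections import defaultdict


def find_duplicate_names(records):
    """Return species names that appear under more than one genspec code.

    `records` is an iterable of (genspec, species_name) pairs
    (an assumed export format, not the real database layout).
    """
    codes_by_name = defaultdict(set)
    for genspec, name in records:
        # Normalise case and whitespace so trivially different entries collide.
        key = " ".join(name.split()).lower()
        codes_by_name[key].add(genspec)
    # Keep only the names that were entered under two or more codes.
    return {
        name: sorted(codes)
        for name, codes in codes_by_name.items()
        if len(codes) > 1
    }


# Hypothetical sample data; the genspec values are made up.
sample = [
    ("0162-0045", "Acacia karroo"),
    ("0162-0391", "Acacia  karroo"),
    ("0162-0046", "Acacia nilotica"),
]
duplicates = find_duplicate_names(sample)
print(duplicates)
```

A report of this kind only lists candidates; deciding which genspec to keep would still need approval from scientific personnel.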
The second error, which occurs no less frequently, is the extension of data dictionaries (a data dictionary is a list of variables of a specific database field, such as habitat, growth form, or flower colour). Adding variables to a field results in them being mixed up; for example, an additional 60 variables have been added to the habitat field of the database. Of the 60 additions, only 1% are true habitat types. This problem results from incorrect interpretation of specimen label information, because Data Entry Clerks usually do not have a strong botanical background.
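One way to stop a data dictionary from growing unchecked is to validate each entry against an approved list and set aside anything else for review. The sketch below illustrates the idea in Python; the approved habitat list, the function names, and the sample entries are all assumptions made for the example, not the contents of the real database.

```python
# Hypothetical approved list; a real one would be maintained by scientific staff.
APPROVED_HABITATS = {"grassland", "savanna", "forest", "wetland", "fynbos"}


def check_habitat(value, approved=APPROVED_HABITATS):
    """Return (accepted, cleaned_value) without ever extending the dictionary."""
    cleaned = value.strip().lower()
    return cleaned in approved, cleaned


def triage(entries):
    """Split entries into accepted values and values needing expert review."""
    accepted, for_review = [], []
    for entry in entries:
        ok, cleaned = check_habitat(entry)
        (accepted if ok else for_review).append(cleaned)
    return accepted, for_review


accepted, for_review = triage(
    ["Grassland", "roadside, 5 km from town", " forest "]
)
print("accepted:", accepted)
print("for review:", for_review)
```

Here the locality note is held back for review instead of becoming a new habitat type.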
These problems were addressed at database courses for beginners. The aim was to teach botanical concepts and interpretation. The courses were attended by Data Entry Clerks from the participating herbaria of southern Africa. To appreciate the quality of the information contained in the database, they learnt to extract and present information with Microsoft Access queries and reporting facilities. The queries and reports gave them insight into the database, enabling them to determine the extent of the errors. It was hoped that such training exercises would lessen the problems and train the clerks to enter accurate data. Clerks were also encouraged to get approval from scientific personnel before making any changes or additions to the database.
Quality control, when exercised by the Data Entry Clerk, slows the rate of computerisation. Yet, despite this larger workload being placed on some of the clerks in the region, the rate of computerisation has increased tremendously during the past four years owing to the increased productivity of the participating herbaria. Currently, the computerisation rate stands at 150,000 specimens per annum (versus 34,000 in 1998), which means 600 specimens are computerised per working day in the southern African herbaria. On average, therefore, every participating herbarium computerises some 38 specimens per day.
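The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the number of working days per year and the number of herbaria are not stated in the article, so the values computed here are merely what the quoted rates imply.

```python
# Figures quoted in the text.
ANNUAL_RATE = 150_000        # specimens per annum (current)
RATE_1998 = 34_000           # specimens per annum in 1998
DAILY_REGIONAL = 600         # specimens per working day, whole region
DAILY_PER_HERBARIUM = 38     # specimens per day, per herbarium

# Values implied by those figures (not stated in the article).
working_days = ANNUAL_RATE / DAILY_REGIONAL
implied_herbaria = DAILY_REGIONAL / DAILY_PER_HERBARIUM
growth_factor = ANNUAL_RATE / RATE_1998

print(f"implied working days per year: {working_days:.0f}")
print(f"implied number of herbaria:    {implied_herbaria:.1f}")
print(f"increase since 1998:           {growth_factor:.1f}x")
```

The quoted rates are therefore consistent with roughly 250 working days a year and about 16 participating herbaria.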
In many instances, Data Entry Clerks do not understand the terminology used for the fields in the database and consequently waste time deliberating over and adding new variables. As discussed earlier, this leads to the extension of data dictionaries with invalid variables.
Data Entry Clerk positions have a high turnover and new clerks are usually appointed long before the next database training course (there is approximately one course every 10 months). In addition, knowledge transfer is inadequate in some of the participating institutions. Clerks therefore remain unproductive until they have attended a course.
Project management did not place enough emphasis on the importance of captured specimen data for both the institution and the region. Participating institutions, owing to lack of commitment, did not see this activity as a primary objective and resources were concentrated elsewhere. In addition, participating herbaria did not grasp the magnitude of the task of computerising thousands of specimens.
No quality control process was put in place and it was wrongly assumed that data were being entered correctly. Untrained Data Entry Clerks were responsible for many wrongly encoded entries, which will have to be corrected later at the cost of a lot of time.
Certain countries in the region use different georeferencing systems and these need to be translated while the herbarium specimen labels are encoded. Lack of resources in participating herbaria forces Data Entry Clerks to conduct this activity themselves and this slows down the computerisation process considerably.
Misspelt information is sometimes entered and needs to be corrected at a later stage, which stretches the existing resources and slows down the encoding activities.
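Misspellings of this kind can often be caught at entry time by comparing what was typed against a list of accepted names. The following is a minimal sketch using Python's standard `difflib` module; the list of known names, the function name, and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical checklist of accepted names.
KNOWN_NAMES = ["Acacia karroo", "Acacia nilotica", "Aloe ferox"]


def suggest_name(entered, known=KNOWN_NAMES, cutoff=0.8):
    """Return the entered name if known, else the closest known name or None."""
    if entered in known:
        return entered
    # get_close_matches ranks candidates by similarity and drops those
    # scoring below `cutoff`.
    matches = difflib.get_close_matches(entered, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_name("Acacia karoo"))
print(suggest_name("Ficus sur"))
```

A suggestion should be confirmed by the clerk, not applied automatically, since a close match may be a different valid name.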
Many participating institutions do not adequately make use of local computer service providers to repair the systems when computers break down. Countries have on many occasions waited weeks for SABONET IT to fix their “faulty” database, when in fact the problem was of a technical nature and could have been fixed within a day by a service provider.
Networking problems have created major breakdowns. Simple actions, such as improper logon, disrupt the networking systems and prevent workstations from capturing data onto the server.
Service providers in the participating countries often do not load the correct software, which causes software clashes and makes the hardware perform poorly. This usually goes undetected by participating institutions, resulting in confusion and unproductive periods.
SABONET database courses have always stressed the importance of backing up data on a daily basis. Despite the warnings and reminders, many countries still fail to do so and up to a month’s captured data have been lost on many occasions when hard disks crashed.
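A daily backup can be as simple as copying the database file to another disk under a dated name. The sketch below shows one way to do this in Python; the function name and the paths in the usage note are hypothetical, and it assumes the database is a single file that is closed while the copy is made.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_database(db_path, backup_dir, today=None):
    """Copy the database file into backup_dir under a date-stamped name."""
    db_path = Path(db_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # ISO date (YYYY-MM-DD) keeps the backups sorted chronologically.
    stamp = (today or date.today()).isoformat()
    target = backup_dir / f"{db_path.stem}_{stamp}{db_path.suffix}"
    shutil.copy2(db_path, target)  # copy2 also preserves file timestamps
    return target
```

For example, `backup_database("herbarium.mdb", "E:/backups")` (both paths invented here) would write a file such as `herbarium_2002-06-14.mdb`. Keeping the backup directory on a different physical disk is what protects against the hard disk crashes mentioned above.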
The involvement of SABONET in the SECOSUD project created a lot of confusion about responsibilities and, as a result, in certain institutions much time was spent on activities not related to SABONET.
—by Trevor Arnold & Stefan Siebert
SABONET News 7.2: 92