

Although there are many kinds of errors in the database, experience tells us that
two major problems occur during the computerisation process:
Duplication of plant names
Extension of data dictionaries
Duplication
Duplication of plant names happens when Data Entry Clerks enter a species name
into the database twice, resulting in the use of two different genspec codes
(a genspec code is a serial comprising two sets of numbers, one for the genus
and one for the species). When information concerning a species, say Acacia
karroo, is called up, only the specimens encoded under one genspec are accessed,
while the specimens of the same species under the additional genspec are not
picked up. This problem arises when clerks do not double-check their work.
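A check for this kind of duplication can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a genspec code paired with a species name), the function name and the sample genspec values are all hypothetical and are not taken from the actual database.

```python
from collections import defaultdict


def find_duplicated_names(records):
    """Return species names that are filed under more than one genspec code.

    `records` is an iterable of (genspec, species_name) pairs. Names are
    compared case-insensitively and with extra whitespace ignored.
    """
    codes_by_name = defaultdict(set)
    for genspec, name in records:
        key = " ".join(name.split()).lower()  # normalise spacing and case
        codes_by_name[key].add(genspec)
    return {
        name: sorted(codes)
        for name, codes in codes_by_name.items()
        if len(codes) > 1
    }


# Hypothetical sample data; the genspec values are invented for illustration.
sample = [
    ("0162-0040", "Acacia karroo"),
    ("0162-0041", "Acacia  karroo "),
    ("0162-0050", "Acacia erioloba"),
]
duplicates = find_duplicated_names(sample)
print(duplicates)
```

Any name reported by such a check is a candidate for merging under a single genspec, subject to approval by scientific personnel.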
Data Dictionaries
The second, and no less frequent, error is the extension of data dictionaries
(a data dictionary is a list of the variables for a specific database field,
such as habitat, growth form, or flower colour). Adding variables to the fields
results in them being mixed up; for example, an additional 60 variables have
been added to the habitat field of the database. Of the 60 additions, only 1%
are true habitat types. This problem results from incorrect interpretation of
specimen label information, because Data Entry Clerks usually do not have a
strong botanical background.
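One way to keep a data dictionary from growing unchecked is to compare each new value against the approved list and set aside anything unknown for review, instead of adding it directly. The sketch below assumes a small, invented list of habitat terms; the names `APPROVED_HABITATS`, `check_habitat` and `split_entries` are hypothetical and do not describe the database's own facilities.

```python
# Hypothetical approved entries; a real list would come from scientific staff.
APPROVED_HABITATS = {"grassland", "savanna", "forest", "wetland"}


def check_habitat(value, approved=APPROVED_HABITATS):
    """Return True if `value` is already in the approved data dictionary."""
    return value.strip().lower() in approved


def split_entries(values, approved=APPROVED_HABITATS):
    """Separate values into accepted ones and ones that need review."""
    accepted, needs_review = [], []
    for value in values:
        target = accepted if check_habitat(value, approved) else needs_review
        target.append(value)
    return accepted, needs_review


# "red sandy soil" describes a substrate, not a habitat, so it is held back.
accepted, needs_review = split_entries(["Grassland", "red sandy soil", "forest "])
print(accepted, needs_review)
```

Values held back in this way can then be passed to scientific personnel before any addition is made to the dictionary.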
These problems were addressed at database courses for beginners. The aim was to
teach botanical concepts and interpretation. The courses were attended by Data
Entry Clerks from the participating herbaria of southern Africa. To appreciate
the quality of the information contained in the database, they learnt to extract
and present information with Microsoft Access queries and reporting facilities.
The queries and reports gave them access to the database, enabling them to
determine the extent of the errors. It was hoped that such training exercises
would lessen the problems and hone the clerks' ability to enter accurate data.
Clerks were also encouraged to get approval from scientific personnel before
making any changes or additions to the database.
Quality control, when exercised by the Data Entry Clerk, slows the rate of
computerisation. Yet, despite this larger workload being placed on some of the
clerks in the region, the rate of computerisation has increased tremendously
during the past four years owing to the increased productivity of the
participating herbaria. Currently, the computerisation rate stands at 150,000
specimens per annum (versus 34,000 in 1998), which means 600 specimens are
computerised per working day in the southern African herbaria. On average,
every participating herbarium therefore computerises some 38 specimens per day.
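The figures above can be checked with a short calculation. The article does not state how many working days or herbaria were used; the value of 250 working days per year below is an assumption, chosen because it reproduces the quoted 600 specimens per day, and the number of herbaria is only what the quoted averages imply.

```python
ANNUAL_RATE = 150_000          # specimens per annum (from the text)
RATE_1998 = 34_000             # specimens per annum in 1998 (from the text)
PER_HERBARIUM_PER_DAY = 38     # specimens per herbarium per day (from the text)
WORKING_DAYS_PER_YEAR = 250    # assumption, not stated in the text

# Regional daily rate implied by the annual figure.
per_day = ANNUAL_RATE / WORKING_DAYS_PER_YEAR

# Number of herbaria implied by the regional and per-herbarium daily rates.
implied_herbaria = per_day / PER_HERBARIUM_PER_DAY

# Growth of the annual rate relative to 1998.
growth_factor = ANNUAL_RATE / RATE_1998

print(per_day, round(implied_herbaria), round(growth_factor, 1))
```

Under that assumption the daily rate is 600 specimens, the averages imply roughly 16 participating herbaria, and the annual rate is about 4.4 times the 1998 figure.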
Quality Control
Although we have reached the highest computerisation rate since the start of
the project, certain problems have slowed the process:
In many instances, Data Entry Clerks do not understand the terminology used for
the fields in the database and consequently waste time contemplating and adding
new variables. As discussed earlier, this leads to the extension of data
dictionaries with invalid variables.
Data Entry Clerk positions have a high turnover and new clerks are usually
appointed long before the next database training course (there is approximately
one course every 10 months). In addition, knowledge transfer is inadequate in
some of the participating institutions. Clerks therefore remain unproductive
until they have attended a course.
Project management did not place enough emphasis on the importance of captured
specimen data for both the institution and the region. Participating
institutions, owing to a lack of commitment, did not see this activity as a
primary objective, and resources were concentrated elsewhere. In addition,
participating herbaria did not grasp the magnitude of the task of computerising
thousands of specimens.
No quality control process was put in place and it was wrongly assumed that
data were being entered correctly. Untrained Data Entry Clerks were responsible
for many wrongly encoded entries, which will have to be corrected later at a
considerable cost in time.
Certain countries in the region use different georeferencing systems and these
need to be translated while the herbarium specimen labels are encoded. A lack
of resources in participating herbaria forces Data Entry Clerks to do this
themselves, which slows down the computerisation process considerably.
Misspelt information is sometimes entered and needs to be corrected at a later
stage, which stretches the existing resources and slows down the encoding
activities.
Many participating institutions do not make adequate use of local computer
service providers to repair the systems when computers break down. Countries
have on many occasions waited weeks for SABONET IT to fix their “faulty”
database, when in fact the problem was of a technical nature and could have
been fixed within a day by a service provider.
Networking problems have created major breakdowns. Simple actions, such as
improper logon, disrupt the networking systems, preventing workstations from
capturing data onto the server.
Service providers in the participating countries often do not load the correct
software, which causes software clashes and poorly functioning hardware. This
usually goes undetected by participating institutions, resulting in confusion
and unproductive periods.
SABONET database courses have always stressed the importance of backing up data
on a daily basis. Despite the warnings and reminders, many countries still fail
to do so, and up to a month’s captured data have been lost on many occasions
when hard disks crashed.
The involvement of SABONET in the SECOSUD project created a lot of confusion
about responsibilities and, as a result, in certain institutions much time was
spent on activities not related to SABONET.
—by Trevor Arnold & Stefan Siebert
SABONET News 7.2: 92

