This chapter describes a workflow for measuring a barcode’s accuracy when identifying species. First, assemble a database of specimens with their marker sequences and their species binomials. The species binomials provide a “taxonomic gold standard” for species identification and should be as accurate as possible, to avoid penalizing correct species assignment. Second, select a computer algorithm for assigning species to barcode sequences. Only one algorithm (BLAST+P) has improved notably on the simple strategy of assigning specimens to the species of the database sequence(s) nearest under p-distance. Global sequence alignments (e.g., with the Needleman-Wunsch algorithm, or with multiple sequence alignment algorithms) align entire barcode sequences, using all available information, so they sometimes produce more accurate species identifications than local sequence alignments (e.g., with BLAST), particularly when BLAST produces barcode alignments of small subsequences within the sequences. Finally, consensus has settled on “the probability of correct identification” (PCI) as the appropriate measurement of species identification accuracy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. The chapter discusses some variant PCIs, their calculation and the estimation of their statistical sampling errors. It also discusses good practice in incorporating PCR failure and species with singleton representatives into data summaries. For software relevant to this chapter, see http://tinyurl.com/spouge-barcode.
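As a rough illustration of the workflow above, the following minimal sketch assigns each specimen to the species of its nearest database sequence under p-distance and then averages the species PCIs. Several details are assumptions made for the example rather than the chapter's method: the sequences are taken to be already globally aligned to equal length, accuracy is estimated by leave-one-out testing, a tie between species counts as a failed identification, and a species PCI is taken to be the fraction of that species' specimens identified correctly. The function names (`p_distance`, `assign_species`, `overall_pci`) and the toy sequences are hypothetical.

```python
from collections import defaultdict


def p_distance(a, b):
    """Proportion of differing sites between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: only sites where both sequences carry A, C, G or T are compared.
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def assign_species(query, references):
    """Species of the nearest reference(s); None if nearest species disagree."""
    scored = [(p_distance(query, seq), species) for seq, species in references]
    best = min(d for d, _ in scored)
    nearest = {species for d, species in scored if d == best}
    return nearest.pop() if len(nearest) == 1 else None


def overall_pci(specimens):
    """Leave-one-out estimate: (mean of species PCIs, per-species PCIs)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for i, (seq, species) in enumerate(specimens):
        # Hold out specimen i and identify it against the remaining database.
        references = specimens[:i] + specimens[i + 1:]
        total[species] += 1
        if assign_species(seq, references) == species:
            correct[species] += 1
    species_pci = {sp: correct[sp] / total[sp] for sp in total}
    # Overall PCI: unweighted average over species, not over specimens.
    return sum(species_pci.values()) / len(species_pci), species_pci


# Toy database of (aligned sequence, species binomial placeholder) pairs.
specimens = [
    ("ACGTACGT", "A"),
    ("ACGTACGA", "A"),
    ("TTGTACCC", "B"),
    ("TTGTACCA", "B"),
    ("GGGGACGT", "C"),  # singleton species
]
overall, per_species = overall_pci(specimens)
print(per_species, overall)
```

In this toy data set, species A and B are each identified correctly every time, while the singleton species C necessarily scores 0 under leave-one-out testing because no conspecific sequence remains in the database. That pulls the overall PCI down to 2/3, which is one reason the treatment of singleton species in data summaries needs care.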
CITATION STYLE
Spouge, J. L. (2016). Measurement of a barcode’s accuracy in identifying species. In DNA Barcoding in Marine Perspectives: Assessment and Conservation of Biodiversity (pp. 29–41). Springer International Publishing. https://doi.org/10.1007/978-3-319-41840-7_2