Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database

SpringerPlus, Volume 5, No. 1, Article 1896, Year 2016

Background: Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. However, no up-to-date multiple sequence alignments containing full-length and subgenomic fragments for each genotype are publicly available. Such alignments are useful in many analyses, including data mining and phylogenetics.

Results: All HBV sequence data in the GenBank public database were downloaded by issuing a query (67,893 sequences). Full-length and subgenomic sequences that had been genotyped by the submitters (30,852 sequences) were placed into a multiple sequence alignment for each genotype (genotype A: 5868 sequences, B: 4630, C: 7820, D: 8300, E: 2043, F: 985, G: 189, H: 108, I: 23), according to the results of offline BLAST searches against a custom reference library of full-length sequences. Further curation was performed to improve each alignment.

Conclusions: The algorithm described in this paper generates, for each of the nine HBV genotypes, a multiple sequence alignment containing full-length and subgenomic fragments. The alignments can be updated as new sequences become available in the online public sequence databases. The alignments are available at http://hvdr.bioinf.wits.ac.za/alignments.
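As an illustration only, the step of placing sequences according to offline BLAST results might look something like the Python sketch below. It is not the authors' code. It assumes the BLAST searches were saved in the standard 12-column tabular format (`-outfmt 6`), and the names `Placement` and `best_placements` are hypothetical. For each query sequence it keeps the highest-scoring hit and records which reference it matched and the reference coordinates it covers, which is the information needed to position a subgenomic fragment within a full-length alignment.

```python
import csv
import io
from typing import Dict, NamedTuple


class Placement(NamedTuple):
    """Best reference hit for one query sequence (hypothetical helper type)."""
    reference: str
    ref_start: int   # 1-based, always <= ref_end
    ref_end: int
    bitscore: float


def best_placements(blast_tsv: str) -> Dict[str, Placement]:
    """Pick the highest bit-score hit per query from BLAST tabular output.

    Assumes the default outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Placement] = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, reference = row[0], row[1]
        sstart, send = int(row[8]), int(row[9])
        bitscore = float(row[11])
        # Reverse-strand hits report sstart > send; normalise the interval.
        hit = Placement(reference, min(sstart, send), max(sstart, send), bitscore)
        if query not in best or hit.bitscore > best[query].bitscore:
            best[query] = hit
    return best


# Made-up example rows: one full-length query with two hits, one fragment.
sample = (
    "seqA\trefA\t98.5\t3200\t10\t0\t1\t3200\t1\t3200\t0.0\t5800\n"
    "seqA\trefB\t90.0\t3000\t300\t2\t1\t3000\t1\t3000\t0.0\t4100\n"
    "frag1\trefA\t99.0\t400\t4\t0\t1\t400\t1800\t1401\t1e-180\t700\n"
)
placements = best_placements(sample)
for name, hit in placements.items():
    print(name, hit.reference, hit.ref_start, hit.ref_end)
```

Selecting by bit score is one simple choice. A real pipeline would likely also filter on coverage or E-value and handle queries with no hits, none of which is shown here.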

Statistics
Citations: 28
Authors: 3
Affiliations: 1
Research Areas
Cancer
Environmental
Genetics And Genomics
Infectious Diseases