Terminology

Here is a brief glossary of terms used throughout the code and project.

  • novel k-mer: a k-mer linked (or putatively linked) to a novel germline variant
  • ikmer: short for interesting k-mer, a synonym for novel k-mer with an emphasis on its unverified status
  • partition: a set of reads that share novel k-mers and are thus (putatively) associated with the same variant; the code sometimes abbreviates this as cc or CC, because partitions correspond to connected components in the graph of reads linked by shared novel k-mers; other abbreviations in the code include PART and CALLCLASS
  • augfastx: sequences in Fasta or Fastq format, augmented with annotations indicating the position and abundance of interesting k-mers and mate sequences; the kevlar.parse_augmented_fastx and kevlar.print_augmented_fastx functions can be used to read and write data in augmented Fasta or Fastq format
  • contig: in the context of kevlar, a contig almost always refers to a sequence assembled from a set of reads sharing novel k-mers and thus (putatively) spanning the same novel variant
  • reference cutout: a short region of the reference genome, also called a “reference target sequence”, to which a contig is aligned for variant calling; kevlar computes cutouts because the algorithm it uses to align contigs and call variants is not designed to map a short contig to a long chromosome sequence
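
To make the partition entry above more concrete, the following is a minimal sketch of how reads can be grouped into connected components by shared novel k-mers. It is an illustration only, not kevlar's implementation: the function name partition_reads, the input layout (a dict mapping read IDs to sets of novel k-mers), and the union-find approach are all assumptions made for this example.

```python
from collections import defaultdict


def partition_reads(read_kmers):
    """Group reads into partitions (connected components).

    read_kmers: hypothetical input, a dict mapping each read ID to the
    set of novel k-mers found in that read.
    Returns a list of sets of read IDs, ordered by smallest read ID.
    """
    # Union-find: every read starts out as its own component.
    parent = {read: read for read in read_kmers}

    def find(read):
        # Follow parent links to the root, compressing the path as we go.
        while parent[read] != read:
            parent[read] = parent[parent[read]]
            read = parent[read]
        return read

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Two reads are linked if they contain the same novel k-mer. Joining
    # each read to the first read seen with that k-mer is enough to
    # connect all reads that share it.
    first_seen = {}
    for read, kmers in read_kmers.items():
        for kmer in kmers:
            if kmer in first_seen:
                union(first_seen[kmer], read)
            else:
                first_seen[kmer] = read

    # Collect reads by their component root.
    groups = defaultdict(set)
    for read in read_kmers:
        groups[find(read)].add(read)
    return sorted(groups.values(), key=min)


# Toy data: r1/r2 share one k-mer, r3/r4 share another, r5 shares none.
reads = {
    "r1": {"ACGTA", "CGTAC"},
    "r2": {"CGTAC"},
    "r3": {"TTTGA"},
    "r4": {"TTTGA", "GGGCC"},
    "r5": {"AAAAA"},
}
partitions = partition_reads(reads)
```

With this toy input, partitions holds three sets: r1 and r2 together, r3 and r4 together, and r5 alone. Each set corresponds to one putative variant in the sense of the glossary entry.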