Whole genome sequencing for the discovery of genetic markers for detection of Shigella and enteroinvasive Escherichia coli

, PhD
Division of Microbiology
Center for Food Safety and Applied Nutrition (CFSAN), FDA
5100 Paint Branch Parkway, College Park, MD 20740, USA

Emily Pettengill and Rachel Binet

Shigella and enteroinvasive Escherichia coli (EIEC) are closely related human-specific pathogens that share mechanisms of pathogenesis and cause bacillary dysentery, or shigellosis. Because the two groups are so closely related genetically, current biochemical and serological methods have limited ability to differentiate between them. Markers for rapid identification will aid efforts to track and trace back outbreaks.

A diverse collection of 96 Shigella and EIEC isolates, together with 18 Escherichia and 2 Salmonella isolates, was used for phylogenetic analyses. Whole genome sequence (WGS) data were obtained from in-house sequencing of isolates on an Illumina MiSeq and from public databases. A phylogenetic tree was constructed from SNP analyses, which were also used to identify SNP markers. Additionally, gene markers were determined from cluster analysis of gene presence/absence in annotated genomes. We identified 2,863 core SNPs that placed Shigella and EIEC in 11 polyphyletic clusters. This suggests that these bacteria have evolved independently multiple times and are closely related to each other and to other pathogenic E. coli. We present a panel of SNP markers specific to each phylogenetic cluster as well as a list of cluster-specific gene markers for molecular identification.
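
The idea of a cluster-specific SNP marker can be illustrated with a minimal sketch. This is not the pipeline used in this study; it assumes a core SNP alignment is already available as one allele string per isolate and that each isolate has been assigned to a phylogenetic cluster. The function name, isolate names, cluster labels and toy alleles are all hypothetical and chosen only for illustration. A site is reported as a marker for a cluster when every member of the cluster carries the same allele and no isolate outside the cluster carries it.

```python
def cluster_specific_snps(alignment, clusters):
    """Find SNP alleles that are fixed in one cluster and absent elsewhere.

    alignment: dict mapping isolate name -> string of core SNP alleles
               (all strings must have the same length).
    clusters:  dict mapping isolate name -> cluster label.
    Returns a dict mapping cluster label -> list of (site index, allele).
    """
    isolates = list(alignment)
    n_sites = len(alignment[isolates[0]])
    markers = {label: [] for label in sorted(set(clusters.values()))}

    for site in range(n_sites):
        for label in markers:
            # Alleles seen at this site inside and outside the cluster.
            inside = {alignment[i][site] for i in isolates if clusters[i] == label}
            outside = {alignment[i][site] for i in isolates if clusters[i] != label}
            # Marker: one allele shared by all members, never seen outside.
            if len(inside) == 1 and not (inside & outside):
                markers[label].append((site, next(iter(inside))))
    return markers


# Toy data (invented for illustration, not taken from the study).
alignment = {
    "iso1": "ACGT",
    "iso2": "ACGA",
    "iso3": "GCTA",
    "iso4": "GCTT",
    "out1": "TCGC",
}
clusters = {
    "iso1": "A",
    "iso2": "A",
    "iso3": "B",
    "iso4": "B",
    "out1": "outgroup",
}

markers = cluster_specific_snps(alignment, clusters)
for label, sites in markers.items():
    print(label, sites)
```

The same filter can be applied to a gene presence/absence matrix by encoding each genome as a string of "1" and "0" characters, one per gene; a reported pair with allele "1" then corresponds to a gene present in every member of a cluster and absent from all other isolates.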

Phylogenetic analyses clearly show a very close relationship among Shigella, EIEC and other E. coli. In light of this, we believe that Shigella should be placed within the E. coli group to reflect the phylogeny. The markers presented here can be used to protect public health and safety through faster and more precise detection.