PhyloMAP Scientific Network

PhyloMAP#5

The annual meeting of the PhyloMAP network (Phylodynamique des Maladies Animales et des Plantes, i.e. phylodynamics of animal and plant diseases) took place on Thursday 14 November 2024 at the Institut des Systèmes Complexes in Paris (13th arrondissement, Nationale metro station).

For this network day, we had the pleasure of welcoming the following invited speakers (titles and abstracts further down the page):

  • Sarah Hill (Royal Veterinary College, University of London, UK)
  • Jean-Michel Hily (Institut Français de la Vigne et du Vin, Le Grau-Du-Roi - LPA Vitivirobiome, Colmar)
  • Maude Jacquot (IFREMER, La Tremblade, France)
  • Nils Poulicard (PHIM, Université Montpellier, IRD, INRAE, Cirad, Institut Agro, Montpellier, France)

This year the meeting continued the following day, Friday 15 November 2024, with a workshop on the theme “Co-circulation of multiple genotypes: problems and methods”.


Programme:

Morning:

09:20-09:40 – Welcome of participants

09:40-10:20 – Nils Poulicard – Dispersion and evolutionary history of rice yellow mottle virus in Africa: tales of rice and men

(PHIM, Université Montpellier, IRD, INRAE, Cirad, Institut Agro, Montpellier, France)

Rice has become a pillar of food security in Africa. During the 20th century, rice cultivation intensified to cope with the rising demand due to demographic changes in Africa. Rice yellow mottle virus (RYMV, Solemoviridae) is a major biotic constraint to rice production in Africa. RYMV is a (+)ssRNA virus transmitted over short distances by beetles and by contact between plants during cultural practices. There is no evidence of seed transmission. Seven major strains have been identified, with marked spatial diversity. Several sources of resistance to RYMV have been identified in rice, but none is currently widely used in the field. Resistance-breaking risk maps have been proposed based on the spatial distribution of the strains and their pathogenicity estimated under controlled conditions. However, the validity and sustainability of these risk maps depend strictly on the dispersal and adaptive characteristics of RYMV under field conditions. The main objectives of this study are i) to reconstruct the dispersal dynamics of RYMV in Africa, ii) to identify the main drivers of RYMV evolution and dispersal, and iii) to estimate the impact of RYMV evolutionary history on the sustainability of resistance genes in the field. Based on RYMV genetic data collected in Africa since the 1970s, the phylogeography of RYMV was reconstructed using Bayesian evolutionary inference. These spatiotemporal reconstructions revealed links between RYMV expansion dynamics, the evolution of rice production in Africa and the migration of human populations during historical conflicts. Furthermore, we showed that the spatial dispersal of RYMV has shaped its genetic evolution, with the emergence of adaptive mutations to new host species and interstrain recombination events. Overall, combining field epidemiology, experimental assays and modelling, we have partially unravelled the balance and interplay between genetic determinants and stochasticity in the evolution and epidemiology of a plant virus.

10:20-10:40 – Fabiana Gambaro – Navigating sampling bias in discrete phylogeographic analysis: assessing the performance of an adjusted Bayes factor

Bayesian phylogeographic inference is a powerful tool in molecular epidemiological studies, enabling the reconstruction of the dispersal history of rapidly evolving pathogens. BEAST, a Bayesian phylogenetic inference software package, provides a discrete trait analysis (DTA) that integrates geographic information as discrete characters and infers transition events among discrete sampling locations. The DTA model can be coupled to a model averaging procedure to elucidate the subset of epidemiological links that appropriately explain the diffusion process, which provides a Bayes factor (BF) test to identify significant migration links. The BF support for a particular link is the ratio between the a posteriori and the a priori expectation that this migration link helps explain the migration history. In its current setup, however, the a priori expectation depends only on the number of trait states and does not account for the relative abundance of the trait states involved. This can bias inference in the presence of uneven sampling, and appropriately identifying relevant patterns is crucial for reliable hypothesis testing.

To mitigate potential artifacts from sampling bias, Vrancken and colleagues recently introduced and applied an adjusted Bayes factor (BFadj) which incorporates information on the relative abundance of samples by location.

In this study, we formally assess the performance of the BFadj, specifically determining to what extent it can identify false positives arising from the standard approach. To achieve this, we used simulated epidemics of rabies virus in dogs in Morocco, representing different degrees of sampling bias. We compare the standard and adjusted BF for all transition events. Our results indicate that the BFadj leads to the identification of fewer false positive transition events but also fewer true positives; hence it appears to classify transition events as non-significant more efficiently, at the expense of being more conservative.
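To make the Bayes factor discussed in this abstract concrete, here is a minimal Python sketch that computes the support for one migration link as posterior odds divided by prior odds, a form commonly used with model averaging over migration links. It is an illustration only: the function names are hypothetical, the default prior (expected number of links equal to ln 2 + K - 1, spread evenly over all possible links) is an assumption about the standard setup, and the abstract does not specify how the BFadj derives its abundance-aware prior, so that prior is simply passed in as a number.

```python
import math


def default_prior_inclusion(n_states: int, symmetric: bool = True) -> float:
    """Prior probability that a given link is included, depending only on
    the number of trait states K (assumed standard setup, see lead-in)."""
    n_links = n_states * (n_states - 1)
    if symmetric:
        n_links //= 2
    prior = (math.log(2) + n_states - 1) / n_links
    if not 0.0 < prior < 1.0:
        raise ValueError("prior inclusion probability must lie in (0, 1)")
    return prior


def bayes_factor(posterior_prob: float, prior_prob: float) -> float:
    """Posterior odds of including the link divided by its prior odds."""
    for p in (posterior_prob, prior_prob):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# Standard BF: the prior ignores how many samples each location contributed.
standard_bf = bayes_factor(0.9, default_prior_inclusion(5, symmetric=False))

# Adjusted BF (hypothetical numbers): a link between heavily sampled
# locations is given a higher prior, which lowers its support.
adjusted_bf = bayes_factor(0.9, 0.6)
```

With the same posterior inclusion probability, raising the prior for a link lowers its Bayes factor, which is consistent with the more conservative behaviour of the adjusted test reported in the abstract.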

10:40-11:00 – Sebastian Lequime –

Large-scale metagenomic and metatranscriptomic studies have revolutionized our understanding of viral diversity and abundance, allowing us to generate an unprecedented amount of viral genomic sequences useful to all kinds of studies, from virus discovery to phylodynamics/molecular epidemiology, especially in less studied organisms. In contrast, endogenous viral elements (EVEs), remnants of viral sequences integrated into host genomes, have received limited attention in the context of virus discovery. EVEs resemble their original viruses, which makes it difficult to distinguish active infections from integrated remnants, affects virus classification and biases downstream analyses.

In our study, we assessed the effects of EVEs on a prototypical virus discovery pipeline. We examined EVEs and exogenous viral sequences linked to Orthomyxoviridae, a diverse family of negative-sense segmented RNA viruses, in 13 genomic and 538 transcriptomic datasets of Culicinae mosquitoes. Our analysis revealed a substantial number of viral sequences in transcriptomic datasets. However, a significant portion appeared not to be exogenous viruses but transcripts derived from EVEs. Distinguishing between transcribed EVEs and exogenous virus sequences was especially difficult in samples with low viral abundance. For example, three transcribed EVEs showed full-length segments, devoid of frameshift and nonsense mutations, with mean read depths high enough to qualify them as exogenous virus hits.

Our study highlights that our knowledge of the genetic diversity of viruses can be altered by the underestimated presence of EVEs in transcriptomic datasets, leading to false positives and altered or missing sequence information, which can affect downstream analyses, especially phylodynamics/molecular epidemiology.

11:00-11:20 – Coffee break

11:20-12:00 – Maude Jacquot – Impact of genomic data on marine mollusc disease control

(IFREMER, La Tremblade, France)

Infectious diseases have the potential to impose substantial mortality, morbidity and economic burdens on human and animal populations. Tracking disease spread to assist in control has traditionally relied mainly on the analysis of case data gathered as outbreaks proceed. However, key questions in infectious disease epidemiology, such as the detection and characterization of outbreak pathogens, their spatio-temporal transmission dynamics, and the identification of driving factors, can now be addressed much more accurately thanks to recent advances in sequencing, phylodynamics and landscape phylogeography. Molecular sequences are widely used to understand epidemics affecting humans. While this approach is less common for veterinary and especially marine pathogens, it is expected to play a major role in developing strategies for their control in the near future. In this context, we assess the potential and feasibility of molecular surveillance and molecular epidemiology in controlling marine pathogens. Using spatio-temporal simulations of epidemics, we quantify the contribution of molecular data, compared to case data, to the characterisation of factors that drive the emergence, spread and maintenance of farmed bivalve pathogens. We explore scenarios for which such approaches are relevant and efficient to inform policy stakeholders and provide guidelines for systematic use. Our findings are then applied to a newly generated genome dataset of a marine invertebrate pathogen, which is currently of significant relevance. Finally, genomic data inferences will be integrated into epidemiological mechanistic models with the aim of producing a decision support system for marine mollusc disease management.

12:00-12:20 – Francesco Pinotti – Epidemic and phylodynamic analysis of H9N2 avian influenza virus transmission patterns in live-bird markets in Bangladesh

Poultry production in Bangladesh has expanded considerably over the last three decades, facilitating access to meat and improving food security. At the same time, however, the intensification of poultry production and distribution networks raises fundamental questions about their role in shaping the dynamics of avian pathogens. H9N2 avian influenza virus (AIV) is endemic in Bangladesh and highly prevalent in live bird markets, but its epidemiology remains relatively understudied. Here, we use mathematical models to shed light on the transmission patterns of H9N2 AIV in markets. First, we identify short latency periods and frequent (re-)introductions as the main factors that allow viral persistence in individual markets. Second, we show how genomic data can be used to better understand H9N2 AIV diversity within and between markets. Finally, we describe a novel initiative that combines mechanistic and phylodynamic tools in order to model the dynamics of H9N2 AIV at the scale of the entire production and distribution networks. These results shed new light on H9N2 AIV transmission and offer novel opportunities to obtain a systemic, multi-scale understanding of epidemic risk and diversity of AIV outbreaks.

12:20-12:40 – Antinea Sallen – Tracing the origin of Ralstonia solanacearum in mainland France

Ralstonia solanacearum is a phytopathogenic bacterium belonging to the Ralstonia solanacearum species complex (RSSC), which also includes the species Ralstonia pseudosolanacearum and Ralstonia syzygii. It is considered one of the most damaging pests worldwide. R. solanacearum has been reported in several European countries since 1990 and, more recently, outbreaks of R. pseudosolanacearum have been reported as well. Although all species of the RSSC are classified as quarantine pests in the European Union, few studies have focused on the genotypic diversity of European RSSC strains, and more specifically French ones. Between 1994 and 2023, more than 360 RSSC strains were collected in mainland France as part of territory surveillance. Phylotype and sequevar characterization revealed that all of them belonged to the genetic group phylotype II, sequevar 1 (IIB-1). In order to understand the origin of French outbreaks, we investigated the genotypic and genomic diversity of this strain collection. In particular, we explored the potential of Multiple-Locus VNTR Analysis (MLVA), widely used to monitor diversity among bacterial populations, to discriminate closely related R. solanacearum strains. A novel MLVA scheme was specifically designed for R. solanacearum phylotype IIB-1 strains, and is therefore adapted to the mainland French strain collection. Nearly 50 different VNTR profiles were discriminated among the near-clonal French strains and were analyzed in relation to isolation year, host and sampling location. In order to estimate the diversity of mainland French strains and to gain more clues about the origin of introduction, 29 European strains of R. solanacearum IIB-1 were included in the MLVA study as well. The genomes of 235 mainland French R. solanacearum IIB-1 strains were also sequenced to further investigate their genomic diversity. Previously available worldwide public genomes of R. solanacearum were considered as well. These genomic data will complement the genotypic ones obtained by MLVA and will provide evidence for the origins and dates of introduction of R. solanacearum in mainland France.

12:40-14:00 – Lunch break

Afternoon:

14:00-14:40 – Sarah Hill – Epidemiology and evolution of viruses in salmon

(Royal Veterinary College, University of London, UK)

Atlantic salmon (Salmo salar) aquaculture is one of the fastest-growing food production systems. However, over 20% of fish die before they can be harvested, with infectious disease considered the main single cause of death. Piscine myocarditis virus (PMCV) and infectious salmon anaemia virus (ISAV) represent two of the most harmful viruses in salmon aquaculture. High mortalities caused by these viruses result in severe economic damage, reduce fish welfare, and increase the impact of production on wild populations that are already suffering catastrophic declines. Understanding the evolution and spread of viruses in salmon is crucial to supporting the sector to reduce viral transmission. To achieve this, we generated over 500 whole viral genome sequences from farmed and wild fish infected with ISAV and PMCV in three countries. Using real-time data tracking the movements of boats between farms, and high-resolution (cage-level) information on affected fish, we conducted phylodynamic analyses to investigate how these viruses may be spreading between farms and between different generations of fish. Our findings highlight the emerging importance of whole-genome sequencing and phylodynamic methods in tracking fish virus transmission and evolution. Continued genomic surveillance is essential for improving biosecurity, controlling viral spread, and mitigating economic impacts in salmon aquaculture.

14:40-15:00 – Amandine Cunty – Genomic analyses of French Xylella fastidiosa subsp. pauca whole genomes directly sequenced from contaminated plants

Xylella fastidiosa, native to the Americas, is a plant pathogenic bacterium with a wide host range that causes important diseases of grapevine, citrus and olive trees. Since 2013, X. fastidiosa has been detected in Europe, where different subspecies have been identified in various plant species in Italy, France, the Balearic Islands, Spain and Portugal. Isolation of this bacterium from contaminated plant material is not always successful, and whole genome sequencing data are needed to perform comparative genomics in order to study the epidemiology and origin of X. fastidiosa introductions and to provide new insights into the survey and management of this quarantine bacterium. In order to target the bacterial whole genome sequence directly from infected plant samples, a SureSelect targeted enrichment method was developed. It was applied to different naturally infected French plant samples carrying different subspecies and sequence types of X. fastidiosa. Enrichment was very effective in recovering the whole genome sequences, with genome coverage significantly improved for all enriched plant samples, regardless of plant species or level of contamination. The targeted enrichment method was performed on two specific plant samples contaminated by the subspecies pauca, sampled in south-eastern France in Menton, one in 2015 and the other in 2019. The two bacterial genome sequences captured were analysed and compared with several X. fastidiosa subspecies pauca whole genome sequences available in public databases. A tip-dating analysis was performed to estimate the date of introduction into France. The results obtained from comparative genomic and phylogenetic approaches enabled the most probable scenario for the introduction of these strains into France to be inferred, within the context of the European outbreaks.

15:00-15:20 – Denis Fargette – Metagenomics and phylogeography of sobemoviruses

Virus particles of sobemovirus species are characterized by high stability in plant residues, in soil and in animal intestinal tracts. Accordingly, sequences of sobemoviruses are increasingly being extracted from gut, sediment, and soil metagenomes. Recently, the information obtained from these sequences helped to reconsider the evolution of sobemoviruses (doi.org/10.1371/journal.ppat.1011911). In this presentation, the phylogeography of the sobemoviruses will be re-evaluated, with particular emphasis on sequences obtained from non-plant metagenomes. The ecology of the sobemoviruses examined in light of these sequences is significantly different from that based solely on sequences from infected plants. Furthermore, this analysis reveals that the ecology of the sobemoviruses is unique among plant viruses.

15:20-15:40 – Coffee break

15:40-16:20 – Jean-Michel Hily – Data mining, a powerful tool for deciphering the worldwide spread and evolutionary history of grapevine viruses

(Institut Français de la Vigne et du Vin, Le Grau-Du-Roi - LPA Vitivirobiome, Colmar)

At the dawn of high-throughput sequencing (HTS), the deposition and accumulation of genetic information in digital form in dedicated databases is considerable and keeps growing. Terabases of in silico data are produced daily, for targeted clinical applications as well as for research purposes. However, only a tiny fraction of these data is used and analysed, namely the part dedicated to answering the question for which they were produced. Data mining, the process of collecting, searching, extracting and discovering usable information within such a large amount of data, is therefore becoming a very important and powerful tool for identifying potential new pathogens, new viruses or new variants of known viruses. This was the case, for example, for the now-famous family Coronaviridae (Edgar et al. 2022). These methods are not limited to the “animal” kingdom and can also be used for other hosts (such as grapevine, Vitis spp.). Our pilot project aimed to better understand a recently described trichovirus, Grapevine Pinot gris virus (GPGV), which infects grapevine (Giampetruzzi et al. 2012). The virus has now been detected in most, if not all, of the wine-growing countries where it has been looked for. Combining data mining with bioinformatics tools allowed us to trace the evolutionary history of this virus.

16:20-16:40 – Isis Lorenzo – Cleaning to innovate: improving the quality of public foodborne pathogen data for optimal use in AI models

In food safety, artificial intelligence (AI) has recently been used to predict the contamination routes, virulence or resistance of pathogens. However, these models were trained on small datasets (<2000 strains). In the era of Big Data, the study of pathogens benefits from vast publicly accessible databases, which could help to better understand their epidemiological and evolutionary processes. Indeed, the democratisation of sequencing and open-data policies have led to exponential growth in the number of bacterial genomes available in public databases. Despite attempts at harmonisation, the contextual metadata of strains (origin, location, collection date, etc.) often remain heterogeneous and unstructured because of their varied provenance. Applying AI to these massively available metadata therefore requires the development of suitable tools and methods. The aim of this study was to develop automated normalisation and standardisation methods enabling the interoperability and use of these public metadata and genomic data. To do so, we relied mainly on two international reference resources: Geopy for geographic data and the Foodex2 ontology to describe and hierarchically organise food sources. These metadata were then cross-referenced with genomic annotation data (genes of interest, SNPs, cgMLST, etc.). We collected two databases of pathogens involved in foodborne illness outbreaks from NCBI. Only strains with a clearly defined origin were selected for further analysis: 46,977 strains of Listeria monocytogenes (Lm) and 10,018 strains of Vibrio parahaemolyticus (Vp). Harmonising the terms describing the isolation context of the strains reduced the diversity of the vocabulary used by a factor of 3 to 4. The resulting databases thus include strains from 69 (Lm) and 41 (Vp) different countries, collected between 1900 and 2024, and from 368 (Lm) and 77 (Vp) different origins (food, environment or human). These standardisation tools are crucial for the use of AI models based on genomic data. They could be applied to other pathogens of interest in food safety.
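The kind of vocabulary harmonisation described in this abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the synonym table, the function names and the example terms are all hypothetical, and no Geopy or Foodex2 call is made. The sketch normalises free-text isolation sources (case, accents, whitespace), maps them to a canonical label, and reports by what factor the number of distinct terms shrinks.

```python
import unicodedata
from typing import Iterable

# Hypothetical synonym table: normalised free-text term -> canonical label.
SYNONYMS = {
    "raw milk": "milk",
    "cow milk": "milk",
    "milk": "milk",
    "smoked salmon": "fish",
    "salmon": "fish",
}


def normalise(term: str) -> str:
    """Lower-case, strip accents and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", term)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def harmonise(term: str) -> str:
    """Map a free-text term to its canonical label, if one is known."""
    key = normalise(term)
    return SYNONYMS.get(key, key)


def vocabulary_reduction(terms: Iterable[str]) -> float:
    """Distinct raw terms divided by distinct harmonised terms."""
    raw = set(terms)
    if not raw:
        raise ValueError("at least one term is required")
    harmonised = {harmonise(term) for term in raw}
    return len(raw) / len(harmonised)


example_terms = ["Raw Milk", "raw  milk", "cow milk", "Salmon", "Smoked salmon", "salmon"]
reduction = vocabulary_reduction(example_terms)
```

In this toy example six distinct raw spellings collapse to two canonical labels; terms absent from the synonym table are kept in their normalised form rather than discarded, so nothing is silently lost.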

16:40-17:00 – Simon Dellicour – How fast are viruses spreading in the wild?

Genomic data collected from viral outbreaks can be exploited to reconstruct the dispersal history of viral lineages in a two-dimensional space using continuous phylogeographic inference. These spatially explicit reconstructions can subsequently be used to estimate dispersal metrics that can inform on the dispersal dynamics and the capacity to spread among hosts. Heterogeneous sampling intensity of genomic sequences can however impact the accuracy of dispersal metrics gained through phylogeographic inference. In our study, we use simulations to evaluate the robustness of three dispersal metrics — a lineage dispersal velocity, a diffusion coefficient, and an isolation-by-distance signal metric — to the sampling effort. Our results reveal that both the diffusion coefficient and isolation-by-distance signal metrics appear to be robust to the number of samples considered for the phylogeographic reconstruction. We then use these two dispersal metrics to compare the dispersal pattern and capacity of various viruses spreading in animal populations. Our comparative analysis reveals a broad range of isolation-by-distance patterns and diffusion coefficients mostly reflecting the dispersal capacity of the main infected host species but also, in some cases, the likely signature of rapid and/or long-distance dispersal events driven by human-mediated movements through animal trade. Overall, our study provides key recommendations for the lineage dispersal metrics to consider in future studies and illustrates their application to compare the spread of viruses in various settings.
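As a rough illustration of two of the dispersal metrics named in this abstract, the Python sketch below computes a lineage dispersal velocity and a diffusion coefficient from a list of phylogeny branches. It assumes the "weighted" definitions often used in continuous phylogeography (total distance over total time, and total squared distance over four times total time); the abstract does not state which definitions were used, and the type alias, function names and numbers are hypothetical.

```python
from typing import Sequence, Tuple

# One phylogeny branch: (distance travelled in km, branch duration in years).
Branch = Tuple[float, float]


def _total_duration(branches: Sequence[Branch]) -> float:
    """Sum of branch durations, checked to be strictly positive."""
    total = sum(duration for _, duration in branches)
    if total <= 0:
        raise ValueError("total branch duration must be positive")
    return total


def weighted_dispersal_velocity(branches: Sequence[Branch]) -> float:
    """Total distance divided by total time, in km/year (assumed definition)."""
    total_distance = sum(distance for distance, _ in branches)
    return total_distance / _total_duration(branches)


def weighted_diffusion_coefficient(branches: Sequence[Branch]) -> float:
    """Total squared distance divided by 4 x total time, in km^2/year
    (assumed definition)."""
    total_squared = sum(distance ** 2 for distance, _ in branches)
    return total_squared / (4.0 * _total_duration(branches))


# Hypothetical branches extracted from one annotated tree.
example_branches = [(10.0, 1.0), (30.0, 2.0)]
velocity = weighted_dispersal_velocity(example_branches)
diffusion = weighted_diffusion_coefficient(example_branches)
```

In practice such metrics would be computed for each tree sampled from the posterior distribution and then summarised, so that the reported value carries the phylogeographic uncertainty with it.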

17:00-17:20 – Déborah Merda – Study of reassortment in the virus responsible for bluetongue in two geographic areas: French Guiana and North Africa

Reassortment events in viruses can give rise to new disease emergences. Bluetongue is a viral disease that affects small ruminants worldwide. The genome of this virus consists of 10 RNA segments, 2 of which encode surface proteins used for serotyping. In French Guiana, a collection of 43 strains was sequenced as part of the epidemiological monitoring of an outbreak showing high diversity between 2010 and 2020. A similar situation in North Africa was revealed using phylogenetic approaches, and 25 complete genomes are available for it. In our study, we investigated reassortment using a Bayesian inference approach with the BEAST tool (Drummond & Bouckaert, 2014) and the CoalRe package (Müller et al. 2020), whose coalescent model accounts for reassortment events and has already been used to analyse the evolutionary history of influenza virus, and compared the results with classical phylogenetic approaches. The mean inferred reassortment rate is higher than that inferred for influenza but is equivalent to that inferred for H1N1 (Müller et al. 2020). In conclusion, reassortment events are frequent in this virus and can occur between strains of different serotypes. This represents a risk of emergence of new strains whose virulence would be unknown.

17:20 – End of the day