We reconstructed a cell-scale model of R. solanacearum comprising three modules: a genome-scale metabolic network, a macromolecule secretion module, and a virulence regulatory network. The first two modules are biochemical reaction networks, whereas the regulatory network is a biochemical interaction network. The modular structure of the resulting ‘hybrid model’ (Le Novere et al. 2015) therefore allows each type of network to be analysed with an appropriate method, i.e. constraint-based modeling for the biochemical reaction networks and multi-state logical modeling for the regulatory network. Neither method requires kinetic parameters, which makes both suitable for genome-scale analysis. We took advantage of our recently developed application to perform computations on such hybrid models (Marmiesse et al. 2015).

Draft metabolic models were built from four published metabolic models: Ralstonia eutropha (RehMBEL1391 (Park et al, 2011)), Bacillus subtilis (Bs_iYO844 (Oh et al, 2007)), Pseudomonas aeruginosa (iMO1086 (Oberhardt et al, 2011)) and Escherichia coli (iJO1366 (Orth et al, 2011)). The first two were selected because of their phylogenetic proximity to R. solanacearum. P. aeruginosa was selected because of its pathogenic lifestyle, and the E. coli model was used because of the high quality of its reconstruction. The Systems Biology Markup Language (SBML) has for years been the standard file format for exchanging metabolic models (Hucka et al, 2003). We therefore used this format for all reconstruction and analysis steps. Three metabolic models (B. subtilis, P. aeruginosa, and E. coli) were published in this format and were thus easily collected, whereas the R. eutropha model had to be converted from PDF to SBML, since PDF was the only format supplied by its authors. We downloaded the protein sequences for the entire genomes of the source organisms from NCBI, and those of R. solanacearum strain GMI1000, hereafter called the “target organism”, from the official genome web portal (Salanoubat et al, 2002).
We used the Autograph method (Notebaart et al, 2006) to transfer automatically, by orthology, the gene-reaction associations of the four reference models into four draft models (Figure 2). Orthology was predicted with a tuned version of Inparanoid (Remm et al, 2001), which identified orthologs between the proteome of R. solanacearum and each of the four reference proteomes. The BLAST results obtained within Inparanoid were filtered to keep only hits with an identity above 30 % and a coverage above 50 %. The BLOSUM45 matrix (reasonable for prokaryotes) was used for the Inparanoid bootstrapping step. For the propagation step, we developed a function called PropagateGprsFromOrthologs in the parseBioNet package (a Java package dedicated to metabolic networks that we have been developing for several years). PropagateGprsFromOrthologs takes as input the reference SBML file and a tabulated file giving, for each protein of R. solanacearum, its ortholog in the reference species, and returns an SBML file. A reaction of the reference model is included in the target model if at least one of its reference genes has an ortholog in R. solanacearum. In the gene-protein-reaction links of the retained reactions, each gene identifier is replaced by the identifier of the corresponding ortholog in R. solanacearum, or by the mention “no_ortholog” when no ortholog has been identified in R. solanacearum. Reactions without associated genes are also propagated. The results of the automatic propagation step are displayed in Figure 3. The proportion of reactions propagated from each reference model is quite high (> 60%). We will see in the next section that the manual curation performed after merging substantially reduces this number.
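As an illustration, the filtering and propagation rules described above can be sketched in Python. This is a minimal sketch under stated assumptions and not the parseBioNet implementation, which is written in Java and operates on SBML files: the function names, the dictionary-based inputs, and all gene and reaction identifiers are hypothetical, and each gene-protein-reaction link is assumed to be a plain Boolean string using "and" and "or".

```python
import re

# Hypothetical data layout: each hit is a dict with "identity" and
# "coverage" given in percent. This is an assumption for illustration,
# not the Inparanoid or BLAST file format.
def filter_hits(hits, min_identity=30.0, min_coverage=50.0):
    """Keep hits whose identity and coverage both exceed the thresholds."""
    return [h for h in hits
            if h["identity"] > min_identity and h["coverage"] > min_coverage]

TOKEN = re.compile(r"[A-Za-z0-9_.]+")
OPERATORS = {"and", "or"}

def propagate_gpr(gpr, orthologs):
    """Rewrite one GPR string with target gene identifiers.

    Returns None when the reaction should be left out, i.e. when it has
    genes but none of them has an ortholog in the target organism.
    """
    if not gpr.strip():
        return ""  # reactions without associated genes are propagated
    genes = [t for t in TOKEN.findall(gpr) if t.lower() not in OPERATORS]
    if not any(g in orthologs for g in genes):
        return None

    def swap(match):
        token = match.group(0)
        if token.lower() in OPERATORS:
            return token
        return orthologs.get(token, "no_ortholog")

    return TOKEN.sub(swap, gpr)

def propagate_model(reference_gprs, orthologs):
    """Build a draft {reaction_id: gpr} mapping from a reference one."""
    draft = {}
    for reaction_id, gpr in reference_gprs.items():
        new_gpr = propagate_gpr(gpr, orthologs)
        if new_gpr is not None:
            draft[reaction_id] = new_gpr
    return draft

# Toy example with made-up identifiers.
reference = {
    "R1": "(ref_g1 and ref_g2) or ref_g3",
    "R2": "ref_g4",
    "R3": "",
}
orthologs = {"ref_g1": "tgt_g1", "ref_g3": "tgt_g3"}
draft = propagate_model(reference, orthologs)
```

In this toy case, R1 is kept with ref_g2 marked as “no_ortholog”, R2 is left out because its only gene has no ortholog, and R3 is kept because it has no associated gene.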
We ran SAMIR to build four standardized metabolic models from the four draft models obtained by propagation. To be able to compare our model with as many metabolic models as possible, we chose the identifiers of the BiGG database, which have been used to build most existing metabolic models. Figure 5 shows that only 287 propagated reactions are found in every reference model. In addition, many of the reactions propagated into the final model (515) come exclusively from E. coli. This is readily explained by the completeness of the E. coli model, which the other models do not match. Numerous reactions were manually removed from the propagation results (e.g. 671 reactions from E. coli). Indeed, orthology evidence alone is often not sufficient to add a reaction to the model, and the metabolic context, inferred from the other reactions and from the literature, led us to dismiss these reactions. Finally, 853 reactions that do not come from the propagation results were manually added to the final model. This highlights the importance of manual curation after a propagation step.
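Once the draft models share a common identifier namespace, counts such as the number of reactions found in every reference model, or found in only one, reduce to set operations. The following Python lines are a minimal sketch of such a comparison, not the procedure used to produce Figure 5: the function name, the model labels, and the reaction identifiers are hypothetical.

```python
def shared_and_exclusive(drafts):
    """Compare reaction identifiers across draft models.

    drafts maps a model label to an iterable of standardized reaction
    identifiers and must contain at least one model. Returns the set of
    reactions present in every draft, and for each draft the set of
    reactions found in no other draft.
    """
    sets = {name: set(ids) for name, ids in drafts.items()}
    shared = set.intersection(*sets.values())
    exclusive = {}
    for name, ids in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        exclusive[name] = ids - others
    return shared, exclusive

# Toy example with made-up identifiers.
drafts = {
    "model_A": ["r1", "r2", "r3"],
    "model_B": ["r1", "r3"],
    "model_C": ["r1", "r4"],
}
shared, exclusive = shared_and_exclusive(drafts)
```

Such a comparison is only meaningful after standardization, since the same reaction carrying different identifiers in two drafts would otherwise be counted as two distinct reactions.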