Cancer has been shown to stem from multiple independent mutations that accumulate to take control of cellular activity. Many studies focus on isolated mutations crucial to disease progression. In doing so, however, they lose sight of each gene’s causative role within the wider system-level changes that develop. A better understanding of how disease-linked components are interrelated is essential to treatment and prevention. To achieve this, we assemble an array of genes, each known to have a role in human cancers, and measure the cumulative effect of knocking down genes singly and in pairs via siRNA. Dependencies among the genes are inferred with a high level of accuracy using a method that limits false links. Accuracy is measured as the ability to reproduce an independent dataset, compared with a distribution of shuffled inferred networks. In this way, both the direct interactions stemming from such targeted knock-down and the overall effect of the perturbation on the readout genes are uncovered. We reproduce many known links and predict novel interactions, a subset of which is experimentally validated.
Motivation: Inference of Gene Regulatory Networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many GRN inference methods exist, but the topology of their estimates tends to be sensitive to changes in method-specific parameters. Even though the inferred network is optimal given the parameters, it has been shown that many links are wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied.
Results: To achieve this, we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference and, by repeated bootstrap runs, assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates, we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data property. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, least squares, and RNI inference methods. Improved inference accuracy was observed in almost all situations. The method is part of the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as for running and analyzing the inferences.
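The idea can be illustrated with a minimal Python sketch. It is not the GeneSPIDER implementation: the steady-state model `A @ Y = -P`, the fixed link `cutoff`, the per-experiment shuffling scheme, and all function names are assumptions made for illustration, and plain least squares stands in for the inference method.

```python
import numpy as np

def infer_least_squares(Y, P):
    """Estimate A in the assumed steady-state model A @ Y = -P."""
    return -P @ np.linalg.pinv(Y)

def bootstrap_support(Y, P, n_boot, cutoff, rng):
    """Fraction of bootstrap runs in which each link exceeds the cutoff."""
    n_genes, n_exp = Y.shape
    counts = np.zeros((n_genes, n_genes))
    for _ in range(n_boot):
        idx = rng.integers(0, n_exp, size=n_exp)  # resample experiments with replacement
        A_hat = infer_least_squares(Y[:, idx], P[:, idx])
        counts += np.abs(A_hat) > cutoff
    return counts / n_boot

def nested_bootstrap(Y, P, n_outer, n_boot, cutoff, rng):
    """Repeat the whole bootstrap n_outer times; shape (n_outer, n_genes, n_genes)."""
    return np.stack([bootstrap_support(Y, P, n_boot, cutoff, rng)
                     for _ in range(n_outer)])

def shuffle_data(Y, rng):
    """Null data (assumed scheme): permute gene values within each experiment."""
    return np.stack([rng.permutation(col) for col in Y.T], axis=1)

def fdr_at(support, null_support, level):
    """Share of null links reaching `level`, relative to the share of measured links."""
    measured = np.mean(support >= level)
    null = np.mean(null_support >= level)
    return min(1.0, null / measured) if measured > 0 else float("nan")

# Toy data: 5 genes, each perturbed singly in three replicates (hypothetical values).
rng = np.random.default_rng(0)
n_genes = 5
A_true = -np.eye(n_genes)
A_true[0, 1], A_true[2, 0], A_true[3, 4] = 0.8, -0.6, 0.7
P = np.tile(np.eye(n_genes), 3)
Y = -np.linalg.solve(A_true, P) + 0.05 * rng.normal(size=P.shape)
Y_null = shuffle_data(Y, rng)

support = nested_bootstrap(Y, P, n_outer=3, n_boot=50, cutoff=0.3, rng=rng)
null_support = nested_bootstrap(Y_null, P, n_outer=3, n_boot=50, cutoff=0.3, rng=rng)

mean_support = support.mean(axis=0)  # support per link, averaged over outer runs
spread = support.std(axis=0)         # stability of that support across outer runs
print("estimated FDR at support >= 0.9:",
      fdr_at(mean_support, null_support.mean(axis=0), 0.9))
```

The spread across outer runs indicates how stable each support value is, and the comparison with the shuffled-data run turns a support level into a rough false discovery rate estimate.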
I have collaborated with other students, notably Andreas Tjärnberg, on the
GeneSPIDER package for MATLAB, which aims to tackle a few key issues in
modern network inference.
Inference of gene regulatory networks (GRNs) is a central goal in systems biology. It is therefore important to evaluate the accuracy of GRN
inference methods in the light of network and data properties. Although several packages are available for modelling, simulating, and analysing GRN inference,
they offer limited control of network topology together with system dynamics, experimental design, data properties, and noise characteristics. Independent
control of these properties in simulations is key to drawing conclusions about which inference method to use in a given condition and what performance to
expect from it, as well as to obtain properties representative of real biological systems.
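As a rough illustration of what independent control of such properties can look like, here is a minimal Python sketch. It does not use the GeneSPIDER API: the linear steady-state model, the definition of signal-to-noise ratio as a ratio of Frobenius norms, and the function names `random_network` and `simulate` are all assumptions made for this example.

```python
import numpy as np

def random_network(n_genes, sparsity, rng):
    """Random sparse interaction matrix with negative self-loops (assumed topology model)."""
    mask = rng.random((n_genes, n_genes)) < sparsity
    A = rng.normal(size=(n_genes, n_genes)) * mask
    np.fill_diagonal(A, 0.0)
    # Strict diagonal dominance with a negative diagonal keeps the system stable.
    np.fill_diagonal(A, -(np.abs(A).sum(axis=1) + 1.0))
    return A

def simulate(A, P, snr, rng):
    """Steady-state response Y = -A^{-1} P plus Gaussian noise scaled to the target SNR."""
    Y_clean = -np.linalg.solve(A, P)
    noise = rng.normal(size=Y_clean.shape)
    # SNR is taken here as ||Y_clean|| / ||noise|| in the Frobenius norm (an assumption).
    noise *= np.linalg.norm(Y_clean) / (snr * np.linalg.norm(noise))
    return Y_clean + noise

# Each knob is set separately: topology (sparsity), design (P), and noise (snr).
rng = np.random.default_rng(1)
A = random_network(n_genes=10, sparsity=0.3, rng=rng)
P = np.eye(10)  # one single-gene perturbation per experiment
snr = 10.0
Y = simulate(A, P, snr, rng)
```

Because topology, experimental design, and noise level are separate arguments, one of them can be varied while the others are held fixed when comparing inference methods.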
Some of my favorite shots from recent travels
Most of my works from over the years; n.b. no formal training whatsoever