A structural analysis of the hypoxia response network

Jianjie Li; Yuqi Gao; Xuan Yu

doi:10.7717/peerj.10985

A structural analysis of the hypoxia response network

1Department of Health Service, Army Medical University, Chongqing, Shapingba, China

2Institute of Medicine and Hygienic Equipment for High Altitude Region, Army Medical University, Chongqing, Shapingba, China

DOI: 10.7717/peerj.10985

Published: 2021-04-06
Accepted: 2021-01-31
Received: 2020-06-26

Academic Editor: Yong Wang

Subject Areas: Biochemistry, Bioinformatics, Computational Biology, Computational Science
Keywords: Hypoxia response network, Complex networks, Network topology, Bow-tie structure, Structural complexity

Copyright: © 2021 Li et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Li J, Gao Y, Yu X. 2021. A structural analysis of the hypoxia response network. PeerJ 9:e10985 https://doi.org/10.7717/peerj.10985

Abstract

Background

The hypoxia-inducible factor-1 (HIF-1) signaling pathway is an important topic in high-altitude medicine. Network analysis is a novel method for integrating information on different aspects and levels of biological networks. However, this method has not been used in research on the HIF-1 signaling pathway network. To introduce this method into HIF-1-related research fields and verify its feasibility and effectiveness, we used a network analytical method to explore the structural attributes of the HIF-1 signaling pathway network.

Methods

First, we analyzed the overall network of the HIF-1 signaling pathway using information retrieved from the Kyoto Encyclopedia of Genes and Genomes (KEGG). We performed topology analysis, centrality analysis, and subgroup analysis of the network. Then, we analyzed the core network based on the overall network analysis. We analyzed the properties of the topology, the bow-tie structure, and the structural complexity of the core network.

Results

We obtained topological structure diagrams and quantitative indicators of the overall and core networks of the HIF-1 signaling pathway. For the structure diagrams, we generated topology diagrams of the network and the bow-tie structure of the core network. As quantitative indicators, we identified topology, centrality, subgroups, the bow-tie structure, and structural complexity. The topology indicators were the number of nodes, the number of lines, the network diameter, and the network density. The centrality indicators were the degree, closeness, and betweenness. The cohesive subgroup indicator was the components of the network. The bow-tie structure indicators included the core, input, and tendril-like structures. The structural complexity indicators included a power-law fitting model and its scale parameter.

Conclusions

The core network could be extracted based on the subgroup analysis of the overall network of the HIF-1 signaling pathway. The critical elements of the network could be identified in the centrality analysis. The results of the study show the feasibility and effectiveness of the network analytical method used to explore the network properties of the HIF-1 signaling pathway and provide support for further research.

Background

Hypoxia affects the work abilities and health of soldiers at high altitudes (Truesdell & Wilson, 2006; Qiang et al., 2014). Research on the hypoxia response network (HRN) is a necessary field of high-altitude medicine that aims to promote the health and work abilities of soldiers at high altitudes. The hypoxia-inducible factor-1 (HIF-1) signaling pathway is one of the crucial components of the HRN (Harris, 2002; Hockel & Vaupel, 2001). Most studies on the HRN have been carried out based on classical biomedical methods, and these studies have made remarkable achievements. With the development of systems biology, researchers have tried to explore the HRN from a systematic view to compensate for the shortcomings of traditional research methods. Kohn et al. (2004) proposed a theoretical HRN model based on ordinary differential equation behavior. They analyzed the structural composition of HRN in detail with the aim of identifying the core subsystem responsible for HRN conversion. The network has been decomposed into multiple primary paths using the extreme path analysis method (Palsson, 2006). It can be well matched with the consensus of existing research, showing that the path switching or branching effect may be the cause of the intense response to the oxygen concentration. Heiner & Sriram (2010) subsequently constructed the general Petri net structural model of the HRN and analyzed the relative modules and properties of the network algorithmically. Additionally, network analysis methods and tools have been used to analyze biological networks, providing a novel perspective for studying traditional biological problems (Barabasi & Oltvai, 2004). Ding et al. (2009) studied the structural and functional properties of the giant strong component of the B. thuringiensis metabolic network. Zhang & Zhang (2019) performed a protein-protein interaction network analysis of insecticide resistance molecular mechanisms. Ma & Zeng (2003) studied the connectivity of the metabolic networks of 65 biological species, which did not include the core network of HIF-1 signaling pathways. They found that these biological metabolic networks are similar to the Internet in their macrostructure, which also presents a bow-tie structure (Ma & Zeng, 2003). Network analysis methods focus on the properties of the connections of things and investigate these connections as a whole (Boccaletti et al., 2006). However, this method has not been used to analyze the HRN.

This study describes an analytical method for the hypoxia network involving the analysis of the quantitative network indicators of the HIF-1 signaling pathway from the perspective of complex networks. It aims to explore the structural properties of this network and verify the feasibility of the network analytical method. We analyzed the structural attributes of this network after constructing it from the corresponding biomedical database and then checked the complex properties of the network.

Methods

Datasets

This study focused on the structure of the hypoxia-inducible factor-1 (HIF-1) (Semenza, 1999; Pugh & Ratcliffe, 2003) signaling pathway of the HRN and retrieved related information from the entries in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa & Goto, 2000). KEGG provides data resources related to the high-level functions and operations of cells, organs, ecosystems, and other life systems based on molecular-level information, especially information used for parsing molecular datasets generated by gene sequencing and other high-throughput experimental techniques (Kanehisa et al., 2017). There are two essential elements of KEGG discussed in this paper: the KEGG PATHWAY and KEGG ORTHOLOGY databases. The KEGG PATHWAY database is a collection of knowledge-based hand-drawn maps of pathways based on molecular interaction networks covering metabolic, genetic information processing, environmental information processing, cellular processes, biological systems, human diseases, and drug discovery. KEGG PATHWAY provides a concrete explanation of the meaning of the markers of the HIF-1 signaling pathway. The KEGG ORTHOLOGY (KO) database is a set of ortholog assemblages manually defined to denote the nodes (boxes) in the KEGG PATHWAY maps. The distinctive identifier that determines each KO entry is referred to as the K number (‘K’ followed by a five-digit number).

We analyzed the data in the KEGG Markup Language (KGML) file. These data contain information that corresponds to the HIF-1 signaling pathway in KEGG PATHWAY. The KGML, which enables the automated mapping of KEGG pathways, is an extensible markup language (XML) representation of the KEGG pathway. It is conducive to the computer-aided analysis and model building of gene/protein and chemical networks. There are two types of graphic elements in the KGML-based metabolic pathways. One is a rectangle, which represents an enzyme, connected by a “relationship.” The other is a circle, which represents a compound, connected by a “reaction.” In non-metabolic pathways, there are only rectangular elements, which indicates that there are only proteins joined by “relationships” in these pathways. The HIF-1 signaling pathway addressed in this study belongs to the metabolic pathways.

Overall analytical methods

The methods of this study are based on network analysis. The network of the HIF-1 signaling pathway is a metabolic network. It is a directed unprivileged network; i.e., the direction of a line between two nodes needs to be considered rather than the weight in the network. The corresponding analytical methods applied in this study are as follows: Process the text in the KGML file of the HIF-1 signaling pathway. Refine the network topology information and transform it into a network file that is identifiable for network analysis. Analyze the network indicators and draw the network topology maps. Analyze and test the structural complexity of the network. The generation of the network file is based on regular expression using the text-editing software Notepad++ and functions using the spreadsheet software Excel. The process of network analysis is based on network analysis models using Pajek. These methods and tools are comprehensively utilized to realize the qualitative and quantitative integration analysis of the network structural properties of the HIF-1 signaling pathway.

Biological network analysis tools are used to identify, analyze, visualize, or simulate nodes (reactants) and edges (reactions) from various input data types, including mathematical models of biological networks. There are four commonly used tools, namely Gephi, Networkx, IGraph, and Pajek. All of them are free for use, and they can handle large graph size. Gephi and Pajek are GUI based network analysis tools, whereas Networkx and IGraph are used in a programming language. We have chosen Pajek as the analysis tool based on the following reasons: First, Pajek supports a multi-relational network graph. There are different reactants in the HIF-1 signaling pathway, such as orthologous enzymes, compounds, and groups. And the kind of their relationships is diverse, including activation reactions, expression reactions, and inhibition reactions. Therefore, the network of the HIF-1 signaling pathway is a multi-relational network which can be analyzed by Pajek. Second, Pajek support partition feature enables the identification and extraction of the HIF-1 signaling pathway core network. Third, Pajek supports more network graph layout algorithms, and its algorithms for analyzing the corresponding network features are better. Forth, Pajek is a standalone software that is easy to learn and easy to use (Akhtar, 2014).

Topology

The network topology indicators include the node number, the line number, the line value, the network density, and the network diameter. The node is an entity that forms the network. The lines between the nodes represent the connections between network entities. The line value is the weight of a line, which indicates the strength of the relationship between network entities. The diameter of the network is the longest path of the shortest distance between pairs of nodes in the network, which is the number of maximum steps required to connect any pair of nodes in the network. The network density is equal to the numeric ratio of the actual connections to the possible connections. It indicates the degree of closeness of the relationships between the nodes in the network.

Centrality

Centrality is an essential concept of network analysis. A highly centralized network supports the convenient transmission of information. The central node has a critical influence on the transmission of information in the network. The metrics for centrality used in this paper include the centrality of the degree, the closeness, and the betweenness.

Degree centrality

The degree centrality of a node is defined as the number of connections of a node. It provides the most intuitive conceptual form of centrality indicator. It can be classified as the overall, input, or output degree centrality. These values correspond to the total number of connections, the number of input connections, and the number of output connections of the node, respectively. The regular formula of the degree centrality is as follows (Wasserman & Faust, 1994): (1) $D C_{i} = D i n_{i} + D o u t_{i}$ where DC_i represents the overall degree, Din_i represents the input degree, and Dout_i represents the output degree.

Closeness centrality

A node’s closeness centrality is the value obtained by dividing the number of all other nodes by the sum of the geodesic distance between the node and all other nodes. The geodesic distance is the number of connections included in the shortest path between the two nodes. The farther away a node is from other nodes, the lower the closeness centrality of the node is, and vice versa. Similar to the degree centrality, the closeness centrality of the nodes involves the overall closeness centrality, the input closeness centrality, and the output closeness centrality. The closeness centrality of a node reflects the nearness of a node to other nodes. The closer a node is to other nodes, the easier it is for information to reach the node, and the higher its closeness centrality is. The regular formula of the closeness centrality is as follows (Wasserman & Faust, 1994): (2) $C C_{i} = \frac{N - 1}{\sum_{j = 1}^{N} d_{i j}}$ where i ≠ j, N is the number of nodes, and d_ij is the shortest pathway between nodes i and j.

Betweenness centrality

A node’s betweenness centrality is the ratio of the shortest pathways passing through this node to all of the shortest pathways between any two nodes in the network. The degree centrality and closeness centrality are based on the reachability of a node in the network. In view of betweenness centrality, if this indicator of a node is high, its importance as an intermediary node in the network is higher, and its bridging ability is active.

The normal formula for the betweenness centrality is as follows (Wasserman & Faust, 1994): (3) $B C_{i} = \frac{2 \times \sum_{j \leq k} g_{j k} (i) ∕ g_{j k}}{(N - 1) (N - 2)}$ where i ≠ j ≠ k, g_jk represents the number of the shortest pathways between nodes j and k; g_jk(i) represents the number of the shortest pathways containing i, and N represents the number of nodes.

Cohesive subgroups

Some entities in the network relate to each other so tightly that they form a small local group known as a cohesive subgroup. The number of cohesive subgroups in the network as well as the local and global associations between subgroups are analyzed via cohesive subgroup analysis. This analysis includes component analysis, K-core analysis, and island analysis based on different analytical indicators and perspectives (De Nooy, Mrvar & Batagelj, 2018). The component analysis is mainly applied in this study. The component is an essential indicator of the cohesive subgroup analysis, which refers to the largest connected subnetwork in the network; i.e., there is a way to reach other nodes between any nodes in the subnetwork. Biometabolism networks often contain components that are not connected, and the most significant component is often the one that needs attention.

Bow-tie structure

The bow-tie property is an essential component of these properties. Broder et al. (2000) revealed that there is a bow-tie structure in the topology structure of the Internet at the macro level.

Figure 1 shows the bow-tie structure. The nodes of the input section can reach the nodes of the core section of a strong connection, while the reverse is not true. The nodes of the core section of the strong connection can reach each other nodes. The nodes of the core section of the strong connection can reach the nodes of the output section, while the reverse is not true. The nodes of the pipe section can reach the nodes of the output section via the input section. The tendril section connects with the input section and the output section. The nodes of the input section can reach the nodes in the same section or the nodes of the output section. There are no connections between the nodes of the non-connected component and other sections.

Figure 1: The bow-tie structure of the network.
This structure is composed of the input section, the core section of strong connection, the output section, the tendril section, the pipeline section, and the component with no connection.

Download full-size image

DOI: 10.7717/peerj.10985/fig-1

Ma & Zeng (2003) divided the macrostructure of the biological metabolic network into four parts: the giant strong component (GSC), substrate subset (S), product subset (P), and isolated subset (IS). These components correspond to the strongly connected core, input, output, and disconnected components of the Internet bow-tie structure, respectively, while the exact opposite situation is not observed. Any two metabolites within GSC can be generated from each other by a series of reactions. Any metabolite in S can be converted into the corresponding metabolite in GSC, but not vice versa. Any metabolite in P can be transformed via a series of the corresponding metabolites in GSC, while the exact opposite situation is not observed. The metabolites in IS cannot be converted from those in GSC, nor can they be converted into the corresponding metabolites in GSC.

Structural complexity

Qian et al. provided a strict definition of complex networks (Xuesen, Jingyuan & Ruwei, 1993). A network with some or all of the following characteristics is known as a complex network. These characteristics are self-organization, self-similarity, attractor, small world, and scale-free. We start with a scale-free perspective to verify the sophisticated attributes of the network. The scale-free property refers to the invariance of the network’s scale, meaning that the degree of the network nodes obeys the law of a power exponential distribution (i.e., the power-law distribution). It is also known as the Pareto distribution rate or the Zipf rate. This distribution law describes the distribution characteristics of the degree of network nodes. A small number of nodes in the network tend to present a large number of connections, but the number of connections of most nodes is very limited. This paper uses the Pareto distribution rate: (4) $P_{r} [X \geq x] \sim x^{- a}$ where x is the probability corresponding to the actual measurement scale of the network, and a is the scale parameter of the power rate distribution.

The law of the degree distribution of the network nodes needs to be fitted to the power exponential distribution law to verify this characteristic of the core network of the HIF-1 signaling pathway. That is, the fitting value, $\hat{a}$ , of the scale parameter, a, is obtained. Then, the characteristic is judged by the fitting effect.

The power-law fitting of the degree distribution

The general method of power-law fitting is to use the least-squares linear fitting based on double-logarithmic coordinates and to use the R² test to measure the fitting effect. However, Goldstein et al. believe that the scale result obtained by this method shows a significant error in relation to the corresponding value of the actual measurement network scale (Goldstein, Morris & Yen, 2004). Therefore, Clauset and Barabási et al. proposed a maximum likelihood estimation method for power rate fitting and tested the fitting results using KS statistics (Barabási, Albert & Jeong, 1999; Clauset, Shalizi & Newman, 2009). At present, no research shows that all the data corresponding to the actual measurement scale of the studied network obey the power-law distribution, but there is a critical value. The data corresponding to these scales obey the power-law distribution only when the value corresponding to the actual measurement scale, x, is higher than x_min. This study used Clause’s universal method, applicable to both discrete and continuous data, to estimate x_min. For actual network-scale data, the following formula is used to estimate the scale parameter, $\hat{a}$ , of the power-law distribution: (5) $\hat{a} ≃ 1 + n {[\sum_{i = 1}^{n} ln \frac{x_{i}}{x_{min} - 0.5}]}^{- 1}$ where x_min can be accurately obtained by calculating the maximum difference, L, between the actual measurement-scale data and the corresponding data of the fitted model: (6) $L = max_{x \geq min} |S (x) - P (x)|$ where S(x) is the corresponding data of the actual measurement, P(x) is the corresponding data of the fitted power-law distribution model, and x_min, which minimizes L, is the optimal value.

Similarity test

This study used the K-S (Kolmogorov-Smirnov) method to test the “distance” L between the actual measurement data and the fitted power-law distribution model. The model constructed from actual measurement data is denoted as M. This model produces n sets of data. There is a set of data whose “distance”, L, from the corresponding fitted model, M, is greater than the “distance”, L, between the actual measured data and the fitted model, M. The number of such datasets is m. $\frac{m}{n}$ is represented as p and is known as the p value. If the p value is large (close to 1), it can be considered that the statistical fluctuations alone cause the difference between the actual measurement data and the fitted model. If the p value is small, there is room for adjustment in the rationality of the fitted model. If p ≤ 0.1, it can be considered that the actual measured data do not obey the power-law distribution (Ping, 2014).

Results

Topology analysis

Figure 2 shows the overall topology of the HIF-1 signaling pathway network based on the KGML file from KEGG. Quantitative analysis of the network indicators is performed, and the results are as follows: The numbers of nodes and lines are 85 and 61, respectively. The network has a ring and ringless density of 0.00844291 and 0.00854342, respectively. The average node degree (i.e., the average number of connections) is 1.43529412. The network diameter is 7, and the average distance between the nodes is 2.26761. There are 59 orthologous enzymes, 16 compounds, eight pathways, and two unknown groups in this metabolic network that have no references in KEGG. The numbers of activation reactions, expression reactions, and inhibition reactions are 29, 26, and 6, respectively.

Figure 2: The subnet distribution map of the overall network of the HIF-1 signaling pathway.
The numbers beside a node identifies the node. The color of the node and the number in square brackets identify the subnet to which the node belongs. The square nodes, round nodes, and elliptical nodes represent the orthologous enzymes, compounds, and pathways involved in metabolism, respectively. An arrow represents a metabolic association relationship. The shaft of the arrow connects the metabolic reactant, and the arrowhead connects the metabolites. The solid black line indicates the “activated” metabolic behavior, while the solid blue line indicates the “expression” metabolic behavior. Moreover, the red dashed line indicates “suppressive” metabolic behavior.

Download full-size image

DOI: 10.7717/peerj.10985/fig-2

Figure 3: The core network bow-tie structure of the HIF-1 signaling pathway.
According to the analysis results, the bow-tie structure of the core network of the HIF-1 signaling pathway includes three types of components. The first is a strongly connected core, corresponding to the yellow node, ko: K08268, in the figure, and the corresponding reactant in the KEGG database is HIF-1 α. The second is the input components, including 20 green nodes, such as ko: K09592, corresponding to 20 reactants, such as PHD (EGLN, HPH), in the KEGG database. The third is a tendril-like structure including 13 pink nodes, such as ko: K04526, ko: K04357, and ko: K05459, which correspond to 13 reactants, such as GF (INS, EGF, IGF1), in the KEGG database.

Download full-size image

DOI: 10.7717/peerj.10985/fig-3

The core network of the HIF-1 signaling pathway is identified based on cohesive subgroup analysis. The network is extracted for the convenience of further study, as shown in Fig. 3. The quantitative indicators of the network are as follows: the number of nodes is 34, and the number of lines is 34. The network has a ring density of 0.02941176 and a ringless density of 0.03030303. The average node degree is 2. The network diameter is 7, and the average distance between the nodes is 2.57018. There are 24 orthologous enzymes and 10 compounds in this metabolic network. The number of unknown groups that have no references in KEGG is 1; the number of activation reactions is 27; the number of expression reactions is 2; and the number of inhibition reactions is 5.