Prediction of clusters of miRNA binding sites in mRNA candidate genes of breast cancer subtypes

The development of breast cancer (BC) subtypes is controlled by distinct sets of candidate genes, and the expression of these genes is regulated by the binding of their mRNAs with miRNAs. Predicting miRNA associations and target genes is thus essential when studying breast cancer. The MirTarget program identifies the initiation of miRNA binding to mRNA, the localization of miRNA binding sites in mRNA regions, and the free energy from the binding of all miRNA nucleotides with mRNA. Candidate gene mRNAs have clusters (miRNA binding sites with overlapping nucleotide sequences). mRNAs of EPOR, MAZ and NISCH candidate genes of the HER2 subtype have clusters, and there are four clusters in mRNAs of MAZ, BRCA2 and CDK6 genes. Candidate genes of the triple-negative subtype are targets for multiple miRNAs. There are 11 sites in CBL mRNA, five sites in MMP2 mRNA, and RAB5A mRNA contains two clusters in each of the three sites. In SFN mRNA, there are two clusters in three sites, and one cluster in 21 sites. Candidate genes of luminal A and B subtypes are targets for miRNAs: there are 21 sites in FOXA1 mRNA and 15 sites in HMGA2 mRNA. There are clusters of five sites in mRNAs of ITGB1 and SOX4 genes. Clusters of eight sites and 10 sites are identified in mRNAs of SMAD3 and TGFB1 genes, respectively. Organizing miRNA binding sites into clusters reduces the proportion of nucleotide binding sites in mRNAs. This overlapping of miRNA binding sites creates a competition among miRNAs for a binding site. From 6,272 miRNAs studied, only 29 miRNAs from miRBase and 88 novel miRNAs had binding sites in clusters of target gene mRNA in breast cancer. We propose using associations of miRNAs and their target genes as markers in breast cancer subtype diagnosis.

In this publication, on the example of studying the characteristics of the binding of miRNA 75 with mRNA of BC candidate genes we show the advantage of the proposed changes in the 76 perception of the interaction of miRNA with mRNA. The present article is devoted to 77 ascertaining the interaction of miRNAs with mRNA candidate genes of BC, especially those that 78 contain two and more miRNAs binding sites organized in clusters.

80
The nucleotide (nt) sequences of candidate genes of BC subtypes were downloaded from 81 GenBank (http://www.ncbi.nlm.nih.gov). These candidate genes are specific for the development 82 of triple-negative subtype, luminal A and B subtypes and HER2 subtype of BC (Table S1). 83 Information about miRNAs that presumably bind to candidate genes of BC is provided in Table  216 There were six candidate genes that formed a cluster in 3'UTR of mRNA (Table 1). There were 217 six miRNAs binding sites that formed two clusters of binding sites in the mRNA of ATM gene. 218 The first cluster with a length of 37 nt began with 9778 nt and the second cluster with a length of 219 42 nt began with 11054 nt. The total miRNA length for the first and second cluster was 67 and 220 68 nt, respectively. The decrease in the total length of the miRNA binding sites at overlapping of 221 their nucleotide sequences in the clusters was 1.6 -1.8 times. The average free energy of 222 interaction of six miRNAs with the mRNA of ATM gene was -115 kJ/mole.

223
The cluster of binding sites in the mRNA of the IL11 gene is located from 1466 nt to 1497 nt 224 in length 31 nt. The sum of the lengths of binding sites equal to 89 nt is 2.9 times the length of 225 the cluster. The clusters of binding sites for three miRNAs were identified in 3'UTR of mRNA 226 RUNX1 and CBL genes. In 3'UTR of mRNA CBL gene, the cluster of three miRNA binding sites 227 was 44 nt in length, and the total length of the binding sites was 70 nt. In 3'UTR of mRNA 228 RUNX1 gene, the cluster of three miRNA binding sites was 34 nt long, and the sum of the 229 lengths of five binding sites was 115 nt. Compacting the length of the binding sites of these 230 miRNAs led to the emergence of competition between them for the binding site in mRNA. The 231 average free energy of miRNAs interaction with mRNA in CBL and RUNX1 genes clusters were 232 -116 kJ/mole and -109 kJ/mole, respectively.

233
There were two clusters of binding sites for three miRNAs in the 36 nt region from 826 nt to 234 861 nt and another 53 nt cluster from 1179 nt to 1231 nt in 3'UTR of mRNA SFN gene. The third 235 cluster included 21 binding sites of five miRNAs. The sum of the lengths of all miRNA binding 236 sites of two clusters was 619 nt. Due to the clustering of binding sites of these miRNAs, the 237 actual binding site was only 89 nt, which is seven times less and amounts to 18% of the length of 238 3'UTR of mRNA SFN gene equal to 498 nt. The average free energy of miRNA binding at 27 239 sites was -108 kJ/mole.

240
The STMN1 gene was the target of four miRNAs, the binding sites of which in 3'UTR 241 occupied 43 nt, while the total miRNA length was 90 nt. The average free energy of miRNA 242 binding at four sites was -110 kJ/mole.

258
The formation of a cluster of binding sites for the FOXA1 gene in 5'UTR indicates a greater 259 ability of this gene for compaction, which causes competition among miRNA for the binding 260 site. Despite the fact that TJU_CMC_MD2.ID01099.5p-miR, TJU_CMC_MD2.ID01190.5p-miR 261 and TJU_CMC_MD2.ID02457.3p-miR are fully complementary to mRNA gene, they had a free 262 energy interaction of -108 kJ/mole, which is significantly less than for other miRNAs. At equal 263 concentrations of all miRNAs, TJU_CMC_MD2.ID00252.5p-miR, 264 TJU_CMC_MD2.ID00296.3p-miR and TJU_CMC_MD2.ID01702.3p-miR had ∆G values equal 265 to -140 kJ/mole having the advantage in binding to the mRNA of FOXA1 gene. The average free 266 energy of miRNA binding, without three miRNAs with a length of 17 nt, was -126 kJ/mole, 267 which is characteristic of miRNA binding in 5'UTR.

268
The 5'UTR of mRNA HMGA2 had 17 binding sites for 15 miRNAs. Binding sites of these 269 miRNAs were in a 95 nt cluster from 512 nt to 606 nt. The total length of binding sites was equal 270 to 407 nt and it was 4.3 times longer than the cluster. miRNA binding sites were located in the 271 first half of 5'UTR and had a ∆G value more than -125 kJ/mole. TJU_CMC_MD2.ID00296.3p-272 miR and TJU_CMC_MD2.ID00296.3p-miR had a ∆G equal to -142 kJ/mole and -146 kJ/mole, 273 respectively.

274
The ITGB1 gene had no 5'UTR, but a cluster for five miRNA binding sites was located from 275 91 nt to 120 nt in the beginning of CDS with the length of 30 nt, which is 3.6 times less than the 276 sum of the lengths of five miRNAs.

277
For HMGA2 gene there was a cluster for four binding sites from 1255 nt to 1295 nt located in 278 the beginning of 3'UTR. The cluster length was equal to 41 nt with total length of binding sites 279 comprising 98 nt in length.

280
Apparently, the compaction of binding sites is due not only to the economy of gene length but 281 also to the competition between miRNAs for interaction. For example, the cluster of eight 282 binding sites with 3'UTR of mRNA SMAD3 gene with the length of 35 nt was located from 2066 283 nt to 2101 nt. Therefore, only one miRNA can be bind in a cluster. At equal concentrations of all 284 six miRNAs, TJU_CMC_MD2.ID02822.5p-miR and miR-6089 had free interaction energy of -285 127 kJ/mole to -136 kJ/mole will have an advantage in binding to cluster.

289
The mRNAs of the TGFB1 gene had a cluster of binding sites for seven miRNAs with a 290 length of 48 nt located from 2060 nt to 2107 nt. The length of 3'UTR was 146 nt with 10 miRNA 291 binding sites equal to 230 nt, so the compacting of the binding sites was 4.8 times. 292 Fig . S2 shows the schemes of interaction of some miRNAs with mRNA of several candidate 293 genes of the luminal A and B subtypes. The presented schemes clearly show the advantage of the 294 MirTarget program in predicting the miRNA binding sites. For example, 295 TJU_CMC_MD2.ID03367.5p-miR formed a non-canonical G-U pair in the mRNA FOXA1 296 gene. But TJU_CMC_MD2.ID03367.5p-miR can bind to 19 nucleotide of mRNA and the free 297 interaction energy was 93% of the maximum value. The TJU_CMC_MD2.ID02542.5p-miR 298 interacted with 23 nucleotides of mRNA FOXA1 gene, but had only one unpaired nucleotide. 299 Such interaction between the miRNAs and their target genes is valid for the following pairs: 300 TJU_CMC_MD2.ID00101.3p-miR and HMGA2 gene, TJU_CMC_MD2.ID00849.3p-miR and 301 HMGA2 gene, miR-4507-3p and SMAD3 gene, miR-937-5p and TGFB1 gene, miR-937-5p and 302 SMAD3 gene, TJU_CMC_MD2.ID01403.5p-miR and HMGA2 gene. 303 Subtype HER2 Breast Cancer 304 Twenty-three miRNAs were bound in 5'UTR mRNAs of three candidate genes of the breast 305 cancer subtype HER2 (Table 3). The mRNA of EPOR gene had three miRNA binding sites, the 306 nucleotide sequences of which overlapped. Three binding sites of TJU_CMC_MD2.ID01633.3p-307 miR, TJU_CMC_MD2.ID01599.3p-miR and TJU_CMC_MD2.ID01626.3p-miR comprised a 26 308 nt cluster located from 77 nt to 102 nt in 5'UTR of mRNA EPOR gene. Without overlapping 309 sites, the length of three miRNAs would be 67 nt, which is half of the 135 nt length of 5'UTR. 310 Consequently, the compacting of miRNA binding sites is useful in reducing the proportion of 311 binding sites by 2.6 times in 5'UTR of mRNA EPOR gene.

319
In the mRNA of NISCH gene, the binding sites of TJU_CMC_MD2.ID03445.3p-miR, 320 TJU_CMC_MD2.ID01560.3p-miR and TJU_CMC_MD2.ID03119.5p-miR formed a cluster 321 with the length of 35 nt from 31 nt to 64 nt. With cluster formation, the length of these binding 322 sites was 71 nt, i.e. 52 % of the length of 5'UTR equal to 134 nt.

323
There were 24 miRNAs for which the mRNA was targeted in CDS. The mitogen-activated 324 protein kinase three (MAPK3) gene was a target of three miRNAs, the binding sites of which 325 were located in a cluster with the length of 26 nt. The mRNA of MAZ gene had miRNA binding 326 sites with overlapping of nucleotide sequences into four different clusters (Table 3). The first 327 cluster with the length of 33 nt included binding sites of miR-6729-5p, 328 TJU_CMC_MD2.ID02623.3p-miR, TJU_CMC_MD2.ID02460.5p-miR and miR-2861. 329 Nucleotide sequences of binding sites cluster encoded the oligopeptide APAPPPTPQA which 330 was conservative in orthologous proteins MAZ of Hsa, Pab, Ptr, Csa (Fig. 3A). The second 331 cluster with the length of 74 nt was located from 457 nt to 530 nt. The total length of all binding 332 sites of miRNAs of this cluster was 302 nt. This length requires binding site compaction, since 333 all nucleotides participate in the coding of functionally important amino acids in CDS. Despite 334 the large length of the cluster of nine miRNA binding sites encoding the oligopeptide 335 AAAAAAAAAAAAAVAAAPPAPAAA, it was conservative in the MAZ orthologs (Fig. 3B). 336 The third cluster consisted of miR-4706, TJU_CMC_MD2.ID01641.3p-miR, 337 TJU_CMC_MD2.ID01705.3p-miR, and miR-3960 binding sites with the length of 30 nt. The 338 cluster encoded the conservative oligopeptide APPASAAT (Fig. 3C). The fourth cluster with a 339 length of 30 nt included binding sites for three miRNAs from 893 nt to 922 nt. The encoded by 340 cluster oligopeptide GAGGGGGEAG was also conservative (Fig. 3D).

341
All binding sites for miRNAs that interact with MAZ mRNA had a total length of 472 nt, 342 which is approximately 33% of the total CDS length. Clustered binding sites for miRNA 343 occupied only 12% of CDS length equal to 1434 nt. MAZ gene was the most vulnerable target for 344 miRNA, so its expression should be monitored as a matter of priority.

345
Fifteen miRNAs were bound within the mRNA of candidate breast cancer subtype HER2 346 genes with a free energy of -125 kJ/mole or greater (Table 3). For example, 347 TJU_CMC_MD2.ID01626.3p-miR had a competitive advantage over 348 TJU_CMC_MD2.ID01633.3p-miR and TJU_CMC_MD2.ID01599.3p-miR for binding in the 349 mRNA cluster EPOR. In two clusters of MAZ mRNA, TJU_CMC_MD2.ID01476.3p-miR and 350 TJU_CMC_MD2.ID00915.3p-miR will predominantly bind. The translation of mRNA MAZ 351 gene will be significantly suppressed if TJU_CMC_MD2.ID02294.5p-miR (which had three 352 binding sites) is present, and TJU_CMC_MD2.ID01804.3p-miR and 353 TJU_CMC_MD2.ID01641.3p-miR had two sites with a free energy of binding of -132 kJ/mole 354 or greater. 355 In 3'UTR of mRNA BRCA2 gene, three miRNA binding sites were identified with 356 overlapping of nucleotide sequences (Table 3). The CDK6 gene was a target for nine miRNAs. 357 miR-548h-3p, miR-548z, miR-548aq-3p, miR-548az-3p, TJU_CMC_MD2.ID03264.3p-miR 358 formed a cluster from 1677 nt to 1699 nt. The mRNA of BRCA2 and CDK6 genes had binding 359 sites for miRNA in 3'UTR with a low free energy of binding: from -98 kJ/mole to -117 kJ/mole. 360 In the 3'UTR of mRNA CDK6 gene with an RPKM value of 2.2, there are ten binding sites of 361 miR-466, nine binding sites of TJU_CMC_MD2.ID00436.3p-miR, seven binding sites of 362 TJU_CMC_MD2.ID01030.3p-miR formed a cluster from 1896 nt to 1948 nt (Table 3). Multiple 363 binding sites for these miRNAs allow them to bind with mRNA and significantly increase the 364 probability of translation inhibition of the mRNA CDK6 gene.

365
Compacting of miRNA binding sites is difficult to explain if its sole purpose is saving the 366 length of 3'UTR. Apparently, there are other reasons for compacting binding sites. For example, 367 the binding of one miRNA precludes other miRNAs binding with their site. If this miRNA is a 368 signal of the host gene (gene encoding miRNA), the target gene will not perceive this signal. 369 That is, there is competition between different miRNAs for the binding site and for the ability to 370 regulate the expression of the target gene.

371
It should be noted that most miRNA binding sites were located at the beginning of 5'UTR and 372 CDS mRNA regions of MAZ gene (Table 3). This localization of miRNA binding sites allows 373 protein synthesis to be stopped earlier in the case of the formation of abortive proteins. For 374 example, the first three clusters of miRNA binding sites were located in CDS of mRNA MAZ 375 gene comprise an area from the 158 nt to the 477 nt. All binding sites of nine miRNAs in 5'UTR 376 of mRNA MAZ were located from 16 nt to 114 nt of the 168 nt length of 5'UTR.

377
The miRNA binding sites in 3'UTR of mRNA BRCA2 and CDK6 genes were also located at 378 the beginning of 3'UTR (Table 3).  After discovery of clusters of miRNA binding sites with mRNA of candidate genes of breast 387 cancer subtypes, the question arises as to how stable these structural forms are. It is known that 388 some miRNAs arose in early stages of evolution and are stable for tens of millions of years of 389 species divergence (Kondybayeva et al., 2018). Other miRNA associations with mRNA have 390 appeared recently, and they are not observed even in closely related species. In this regard, we 391 checked the variability of nucleotide sequences of binding sites in clusters identified by us. In 392 Tables S3-5 of are given results of analysis of nucleotide sequences of clusters in mRNA 393 candidate breast cancer subtypes. The data obtained show that in most cases the nucleotide 394 sequences of clusters are identical. Observed differences in single nucleotides slightly change the 395 degree of interaction of miRNA with binding sites. Consequently, established bindings between 396 miRNAs and binding sites organized in clusters are stable in genomes of objects studied which 397 have diverged for millions of years. The evolutionary conservatism of the associations of Tables 1-3 provide information (RPKM) on the normal expression of candidate genes in the 500 mammary gland. The most strongly expressed genes are MMP2 (Table 1), ITGB1 (Table 2), 501 NISCH, MAPK3 (Table 3). The mRNA of MMP2 and ITGB1 genes contain clusters of binding 502 sites for five miRNAs, and the mRNA of NISCH and MAPK3 genes for three miRNAs. 503 Consequently, the expression of these candidate genes and miRNAs binding in respective 504 clusters can be used to develop methods for diagnosing BC subtypes. The HMGA2 gene is not 505 normally expressed (Table 2); however, its mRNA has two binding clusters for 18 miRNAs, and 506 some miRNAs can bind with the large free energy to mRNA, which suggests suppression of its  The proposed associations of miRNA and target genes should be analyzed taking into account 511 the following factors: a) miRNA and their target genes perform the limiting stages of key 512 biological processes involved in the development of diseases; b) these binding events have a 513 large free energy of miRNA interaction with mRNA; c) there is a greater number of miRNAs 514 that bind to mRNA; and d) included miRNAs have more target genes. Depending on the 515 circumstances, the adequacy and significance of the listed miRNAs association with mRNA may 516 vary. 517 Conclusions 518 The associations of miRNAs and their targets genes have been identified for a set of candidate 519 genes for breast cancer subtypes. The clustering of miRNA binding sites decreases the fraction 520 of nucleotide sequence comprising binding sites in mRNA. The average free energy of miRNA 521 binding in mRNA sites decreases in the order: 5'UTR ˃ CDS ˃ 3'UTR. The cluster organization 522 of miRNA binding sites is mainly manifested in 5'UTR and 3'UTR. In the CDS, the share of 523 miRNA binding sites organized into clusters is less than that of single miRNA binding sites. The 524 cluster organization of miRNA binding sites together with the free energy of miRNA interaction 525 with mRNA causes competition between miRNA for binding to mRNA. This phenomenon 526 demonstrates the competitive relationship of miRNA in the regulation of the expression of target 527 genes. The number of miRNA binding sites in clusters indicates the degree of dependence of the 528 expression of target genes on the expression of other genes generating miRNAs. Some 529 associations of miRNAs and their target genes can be used to develop methods for diagnosing 530 BC subtypes.   Tables 1, 2

Figure 1(on next page)
Location of nucleotide sequences of miRNA binding sites cluster in mRNA CBL gene.