PeerJ Computer Science Preprints: Distributed and Parallel Computing
https://peerj.com/preprints/index.atom?journal=cs&subject=9900
Distributed and Parallel Computing articles published in PeerJ Computer Science Preprints

Impact study of data locality on task-based applications through the Heteroprio scheduler
https://peerj.com/preprints/27616
2019-03-27
Bérenger Bramas
The task-based approach has gained much attention as a way to exploit modern heterogeneous computing nodes. It allows parallelizing with an abstraction of the hardware by delegating task distribution and load balancing to a dynamic scheduler. In this organization, the scheduler is the most critical component: it solves the DAG-scheduling problem in order to select the right processing unit for the computation of each task. In this work, we extend our Heteroprio scheduler, which was originally created to execute the fast multipole method on multi-GPU nodes. We improve Heteroprio by taking data locality into account during task assignment. The main principle is to use different task lists for the different memory nodes and to investigate how the locality affinity between tasks and memory nodes can be evaluated without looking at the tasks' dependencies. We evaluated the benefit of the method on two linear algebra applications and a stencil code. We found that simple heuristics can provide significant performance improvements and cut the total memory transfer of an execution by more than half.
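The per-memory-node task lists and locality affinity described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual API: the function names, the byte-counting affinity measure, and the residency map are all assumptions made for demonstration.

```python
# Hypothetical sketch of a locality-affinity heuristic in the spirit of the
# Heteroprio extension: each memory node keeps its own task list, and a task
# is pushed to the node that already holds the largest share of its data.

def affinity(task_data_sizes, residency, node):
    """Bytes of the task's data blocks already resident on `node`."""
    return sum(size for block, size in task_data_sizes.items()
               if residency.get(block) == node)

def pick_node(task_data_sizes, residency, nodes):
    """Choose the memory node with the highest data-locality affinity."""
    return max(nodes, key=lambda n: affinity(task_data_sizes, residency, n))

# Example: block A (8 MB) lives on GPU0, block B (2 MB) on GPU1.
task = {"A": 8_000_000, "B": 2_000_000}
residency = {"A": "GPU0", "B": "GPU1"}
print(pick_node(task, residency, ["GPU0", "GPU1"]))  # GPU0 holds more bytes
```

Note that, as the abstract stresses, this kind of heuristic needs only the data residency of each task, not the task graph's dependencies.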
A use case centric survey of Blockchain: status quo and future directions
https://peerj.com/preprints/27529
2019-02-11
Srinath Perera, Frank Leymann, Paul Fremantle
This paper presents an assessment of blockchain technology based on the Emerging Technology Analysis Canvas (ETAC) to evaluate the drivers and potential outcomes. The ETAC is a framework to critically analyze emerging technologies.
The assessment finds that blockchain can fundamentally transform the world. It is ready for specific applications in use cases such as digital currency, lightweight financial systems, ledgers, provenance, and disintermediation.
However, blockchain faces significant technical gaps in other use cases and needs at least 5-10 years to come to full fruition in those spaces. Sustaining the current level of effort (e.g. startups, research) for this period of time may be challenging. We also find that the need for, and merits of, decentralized infrastructures compared to centralized and semi-centralized alternatives are not always clear. Given the risk involved and the significant potential returns, we recommend a cautiously optimistic approach to blockchain, with a focus on concrete use cases.
The primary contributions of this paper are a use-case-centric categorization of blockchain, a detailed discussion of the challenges faced by those categories, and an assessment of their future.
GIZAChain: e-Government Interoperability Zone Alignment, based on blockchain technology
https://peerj.com/preprints/27477
2019-01-11
Mohamed A El-dosuky, Gamal H El-adl
E-government provides access to services anytime, anywhere. Many e-government frameworks already exist to integrate e-government services, but efficient full interoperability is still a challenge.
Interoperability per se can be modeled via four maturity stages, in which the interoperability zone is the holy grail of full interoperability, to be reached ultimately through strategy alignment. As e-government services shift in the same way as e-commerce did with the value chain, this implies that e-government can likewise benefit from blockchain. Blockchain is a nascent and promising architecture whose transactions are permanent, verifiable, and recorded in a distributed ledger.
This research article suggests applying blockchain to achieve e-government interoperability. Forms are juxtaposed on the outer borders of the system. These forms adopt those used by the UK government, because they are standard and available to Python developers. Once a form has been completed, PySOA calls the requested service before storing the data in the Ontology blockchain. After the service is performed, the policies are analyzed in batch using quantgov. A report is submitted to the central government periodically. The Ontology blockchain has a dual role: on the one hand, it works as secure data storage; on the other hand, it cooperates with PySOA in supporting both technology and semantic interoperability. The most important feature of the proposed method is the Government Interoperability Zone Alignment (GIZA), which acts as a backbone that coherently connects the internal subcomponents. This linkage is possible because each form has a title that corresponds to the appropriate service name. Each service in turn has a counterpart in the wallets stored in the Ontology blockchain.
To measure interoperability empirically, metrics are needed. This study adopts and quantizes a standard interoperability matrix along three dimensions of interoperability: Conceptual (syntax and semantics), Organizational (responsibilities and organization per se), and Technology (platform and communication). The concerns are data, business, service, and process. Any deviation from the standard contributes to the interoperability score (counting mismatches) or the interoperability grade (counting absolute differences). An estimation was performed over 1000 random cases. The probability of getting a conceptual/technical interoperability score as large as the standard strategy score is estimated at 713/1000 = 0.713 (roughly 2 in 3), and the probability of getting an organizational interoperability score as large as the standard strategy score at 712/1000 = 0.712 (roughly 2 in 3). Finally, a Markov model is proposed to provide an accurate representation of the evolution of the strategies over time.
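The two metrics distinguished here — a score counting mismatches versus a grade summing absolute differences — can be illustrated on a small flattened matrix. The matrix layout, the 0-4 scale, and the example values below are assumptions for demonstration only, not the study's actual data.

```python
# Illustrative sketch of the two interoperability metrics on a hypothetical
# flattened matrix (three dimensions x two sub-aspects, each scored 0-4).

def interoperability_score(strategy, standard):
    """Count of entries that deviate from the standard matrix."""
    return sum(1 for s, t in zip(strategy, standard) if s != t)

def interoperability_grade(strategy, standard):
    """Sum of absolute differences from the standard matrix."""
    return sum(abs(s - t) for s, t in zip(strategy, standard))

standard = [4, 4, 3, 3, 4, 2]   # flattened standard strategy
strategy = [4, 2, 3, 1, 4, 2]   # a candidate strategy under evaluation
print(interoperability_score(strategy, standard))  # 2 entries mismatch
print(interoperability_grade(strategy, standard))  # |4-2| + |3-1| = 4
```

Drawing many random candidate matrices and comparing their scores against the standard's is one way to arrive at empirical probabilities like those reported above.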
Data security issues in the realm of mobile cloud computing: A survey
https://peerj.com/preprints/27050
2018-07-24
Mohammed-Ali Anwar
Mobile Cloud Computing (MCC) is a recent technological development that has emerged from two popular technology trends: mobile computing and cloud computing. In essence, it revolutionises the capabilities of mobile devices by integrating the storage and processing of the cloud environment with mobile computing, thereby providing greater optimisation and operating power and allowing for transparent and seamless use of the resources provided by the cloud. However, expanding the capabilities of resource-constrained mobile devices in this manner comes at a price. There are many risks associated with the security of data within the cloud environment, and as MCC essentially uses the cloud, it also inherits any security issues that are associated with cloud computing. The aim of this survey is to identify potential data security issues, analyse and present some pioneering security mechanisms, and finally suggest some future directions for better data security with MCC.
A mixed-method empirical study of Function-as-a-Service software development in industrial practice
https://peerj.com/preprints/27005
2018-06-26
Philipp Leitner, Erik Wittern, Josef Spillner, Waldemar Hummer
Function-as-a-Service (FaaS) describes cloud computing services that make infrastructure components transparent to application developers, thus falling into the larger group of “serverless” computing models. When using FaaS offerings, such as AWS Lambda, developers provide atomic and short-running code for their functions, and FaaS providers execute and horizontally scale them on-demand. Currently, there is no systematic research on how developers use serverless, what types of applications lend themselves to this model, or what architectural styles and practices FaaS-based applications are based on. We present results from a mixed-method study, combining interviews with advanced practitioners, a systematic analysis of grey literature, and a Web-based survey. We find that successfully adopting FaaS requires a different mental model, where systems are primarily constructed by composing pre-existing services, with FaaS often acting as the “glue” that brings these services together. Tooling availability and maturity, especially related to testing and deployment, remains a major difficulty. Further, we find that current FaaS systems lack systematic support for function reuse, and abstractions and programming models for building non-trivial FaaS applications are limited. We conclude with a discussion of implications for FaaS providers, software developers, and researchers.
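The "atomic and short-running code" and "glue" role described above can be made concrete with the shape of an AWS Lambda handler in Python. The greeting logic is a placeholder of our own; a real function would call out to managed services (storage, queues, downstream APIs) at the marked point.

```python
# Minimal shape of an AWS Lambda Python handler: a short-running function
# that receives an event, composes pre-existing services, and returns.

def lambda_handler(event, context):
    name = event.get("name", "world")
    # ... in a real deployment, invoke pre-existing services here ...
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Local invocation for testing (the context object is unused in this sketch):
print(lambda_handler({"name": "FaaS"}, None))
```

The provider, not the developer, decides when and on how many instances this function runs, which is precisely the horizontal on-demand scaling the abstract refers to.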
Resilience enhancement of container-based cloud load balancing service
https://peerj.com/preprints/26875
2018-04-20
Dongsheng Zhang
Web traffic is highly jittery and unpredictable. Load balancers play a significant role in mitigating this uncertainty in web environments. With the growing adoption of cloud computing infrastructure, software load balancers have become more common in recent years. Current load balancer services distribute network requests based on the number of network connections to the backend servers. However, this load balancing algorithm fails when other resources, such as CPU or memory, in a backend server saturate. We evaluate and discuss the resilience of container-based software load balancer services in cloud computing environments, and ways to enhance it. We propose a pluggable framework that can dynamically adjust the weight assigned to each backend server based on real-time monitoring metrics.
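The weight-adjustment idea can be sketched as follows. The metric names and the specific weighting formula are illustrative assumptions, not the paper's actual framework: the point is only that a server saturated on any monitored resource, not just connections, receives a lower weight.

```python
# Hypothetical sketch of metric-driven backend weighting: the weight
# shrinks as the server's most-saturated resource approaches 100%.

def backend_weight(metrics, floor=1, scale=100):
    """Derive a load-balancing weight from real-time utilization metrics."""
    bottleneck = max(metrics["cpu"], metrics["memory"], metrics["connections"])
    return max(floor, round(scale * (1.0 - bottleneck)))

servers = {
    "web1": {"cpu": 0.30, "memory": 0.40, "connections": 0.20},
    "web2": {"cpu": 0.95, "memory": 0.50, "connections": 0.10},  # CPU-saturated
}
weights = {name: backend_weight(m) for name, m in servers.items()}
print(weights)  # web1 receives far more traffic than the saturated web2
```

A connection-count-only policy would prefer web2 here (it has the fewest connections), which is exactly the failure mode the abstract describes.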
Distributed stream processing for genomics pipelines
https://peerj.com/preprints/3338
2017-10-12
Francesco Versaci, Luca Pireddu, Gianluigi Zanetti
Personalized medicine is in great part enabled by the progress in data acquisition technologies for modern biology, such as next-generation sequencing (NGS). Conventional NGS processing workflows are composed of independent tools implementing shared-memory parallelism that communicate by means of intermediate files. With increasing data sizes this approach is showing its limited scalability and robustness – problems that make it unsuitable for large-scale, population-wide personalized medicine applications. In this work we propose the adoption of a stream computing architecture to make the genomics pipeline more scalable and fault-tolerant. We implemented the first processing phases for Illumina sequencing data – from raw data to alignment – using the Apache Flink distributed stream processing framework and Apache Kafka. The new pipeline has been tested processing the raw output of an Illumina HiSeq3000 sequencer and producing aligned reads in CRAM format. The results show near-optimal scalability on experiments from 1 to 12 computing nodes, with a speed-up of 9.5x over the conventional solution (which cannot automatically run on multiple nodes). This result is particularly positive considering that the very short runtime of the experiment – less than 15 minutes – makes the constant time costs imposed by the overheads of the frameworks significant.
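The architectural shift described above — stages connected by streams rather than by intermediate files — can be illustrated with a toy generator pipeline. The real system chains Apache Flink operators over Apache Kafka topics; the stage names below only mirror the phases mentioned in the abstract, and the parsing/alignment bodies are stand-ins.

```python
# Toy illustration of streaming stages: each phase consumes records as they
# arrive from the previous one, with no intermediate files on disk.

def parse_raw(records):
    """Stand-in for converting raw sequencer output into reads."""
    for rec in records:
        yield rec.strip().upper()

def align(reads):
    """Stand-in for the alignment phase; emits (read, position) pairs."""
    for read in reads:
        yield (read, f"chr1:{hash(read) % 1000}")  # fake mapping position

raw = ["acgt", "ttga", "ccat"]
aligned = list(align(parse_raw(raw)))
print([read for read, _ in aligned])  # ['ACGT', 'TTGA', 'CCAT']
```

In the distributed setting, each stage additionally runs with many parallel instances and survives the failure of individual workers, which is where the scalability and fault-tolerance claims come from.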
GenHap: A novel computational method based on genetic algorithms for haplotype assembly
https://peerj.com/preprints/3246
2017-09-12
Andrea Tangherloni, Simone Spolaor, Leonardo Rundo, Marco S Nobile, Ivan Merelli, Paolo Cazzaniga, Daniela Besozzi, Giancarlo Mauri, Pietro Liò
The process of inferring a full haplotype of a cell is known as haplotyping, and it consists in assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. In this work, we propose a novel computational method for haplotype assembly based on Genetic Algorithms (GAs), named GenHap. Our approach can efficiently solve large instances of the weighted Minimum Error Correction (wMEC) problem, yielding optimal solutions by means of a global search process. wMEC consists in computing the two haplotypes that partition the sequencing reads into two unambiguous sets with the least number of corrections to the SNP values. Since wMEC was proven to be NP-hard, we tackle it by exploiting GAs, a population-based optimization strategy that mimics Darwinian processes. In GAs, a population composed of randomly generated individuals undergoes a selection mechanism and is modified by genetic operators. Based on a quality measure (i.e., the fitness value), inspired by Darwin’s “survival of the fittest” law, each individual is involved in a selection process.
Our preliminary experimental results show that GenHap is able to achieve correct solutions in short running times. Moreover, this approach can be used to compute haplotypes in organisms with different ploidy. The proposed evolutionary technique has the advantage that it can be formulated and extended with a multi-objective fitness function taking into account additional insights, such as the methylation patterns of the different chromosomes or the gene proximity in maps obtained through Chromosome Conformation Capture (3C) experiments.
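The cost that a GA individual — a candidate pair of haplotypes — is scored against can be sketched as the (unweighted) MEC count. This is a simplified illustration: GenHap optimizes the weighted variant, and the encoding below (0/1 alleles, None for SNPs a read does not cover) is our own.

```python
# Sketch of the Minimum Error Correction cost for a candidate haplotype
# pair: each read is assigned to the haplotype it disagrees with least,
# and the total number of disagreements (corrections) is the cost.

def mec_cost(h1, h2, reads):
    total = 0
    for read in reads:
        d1 = sum(1 for r, h in zip(read, h1) if r is not None and r != h)
        d2 = sum(1 for r, h in zip(read, h2) if r is not None and r != h)
        total += min(d1, d2)  # assign the read to the closer haplotype
    return total

h1, h2 = [0, 1, 0, 1], [1, 0, 1, 0]           # candidate haplotype pair
reads = [[0, 1, None, None],                   # matches h1 exactly
         [None, 0, 1, 0],                      # matches h2 exactly
         [0, 0, None, None]]                   # one correction needed
print(mec_cost(h1, h2, reads))  # 1
```

A GA over this cost evolves a population of candidate (h1, h2) pairs with crossover and mutation, keeping the pairs with the lowest MEC value — the "survival of the fittest" selection the abstract describes.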
The Modern Research Data Portal: A design pattern for networked, data-intensive science
https://peerj.com/preprints/3194
2017-09-12
Kyle Chard, Eli Dart, Ian Foster, David Shifflett, Steven Tuecke, Jason Williams
We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
Parallel and in-process compilation of individuals for genetic programming on GPU
https://peerj.com/preprints/2936
2017-04-19
Hakan Ayral, Songül Albayrak
Three approaches to implementing genetic programming on GPU hardware are compilation, interpretation, and direct generation of machine code. The compiled approach is known to have a prohibitive overhead compared to the other two.
This paper investigates methods to accelerate the compilation of individuals for genetic programming on GPU hardware. We apply in-process compilation to minimize the compilation overhead at each generation, and we investigate ways to parallelize in-process compilation. In-process compilation does not lend itself to trivial parallelization with threads; we propose a multiprocess parallelization using shared memory and operating-system interprocess communication primitives. With parallelized compilation we achieve further reductions in compilation overhead. Another contribution of this work is the code framework we built in C# for the experiments. The framework makes it possible to build arbitrary grammatical genetic programming experiments that run on GPU with minimal extra coding effort, and it is available as open source.
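The in-process compilation idea — compiling generated individuals with the runtime's own compiler instead of launching an external compiler per generation — has a simple analogue in Python. This is only a toy analogue under that assumption: the paper's framework does this in C# and additionally spreads the compile step across worker processes with shared memory.

```python
# Toy analogue of in-process compilation: generated expression strings are
# compiled with the interpreter's built-in compile(), avoiding the cost of
# spawning an external compiler process for every individual.

def compile_individuals(sources):
    """Compile each generated individual to a code object, in-process."""
    return [compile(src, "<individual>", "eval") for src in sources]

sources = [f"x * {k} + 1" for k in range(5)]       # generated expressions
programs = compile_individuals(sources)
fitnesses = [eval(p, {"x": 3}) for p in programs]  # evaluate each at x = 3
print(fitnesses)  # [1, 4, 7, 10, 13]
```

The paper's further step is to run many such compile calls concurrently; since the compiler state cannot simply be shared between threads, separate processes communicating through shared memory are used instead.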