PeerJ Computer Science Preprints: World Wide Web and Web Science
https://peerj.com/preprints/index.atom?journal=cs&subject=11900
World Wide Web and Web Science articles published in PeerJ Computer Science Preprints

An architecture for context-aware reactive systems based on run-time semantic models
https://peerj.com/preprints/27702
2019-05-04
Ester Giallonardo, Francesco Poggi, Davide Rossi, Eugenio Zimeo
In recent years, new classes of highly dynamic, complex systems have been gaining momentum. These systems are characterized by the need to express behaviors driven by external and/or internal changes, i.e., they are reactive and context-aware. These classes include, but are not limited to, the IoT, smart cities, cyber-physical systems, and sensor networks.
An important design feature of these systems should be the ability to adapt their behavior to environmental changes. This requires maintaining a runtime representation of the context, enriched with variation points that relate different behaviors to possible changes in that representation.
In this paper, we present a reference architecture for reactive, context-aware systems that is able to handle contextual knowledge (which defines what the system perceives) by means of virtual sensors and to react to environmental changes by means of virtual actuators, both represented declaratively through Semantic Web technologies. To improve the system's ability to react with a proper behavior to context changes (e.g., faults) that may impair its ability to observe the environment, we allow the definition of logical sensors and actuators through an extension of the SSN ontology (a W3C standard). In our reference architecture, a knowledge base of sensors and actuators (hosted by an RDF triple store) is bound to the real world by grounding semantic elements to physical devices via REST APIs.
The proposed architecture, along with the defined ontology, addresses the main problems of dynamically reconfigurable systems by exploiting a declarative, queryable approach to enable runtime reconfiguration with the help of (a) semantics to support discovery in heterogeneous environments, (b) composition logic to define alternative behaviors for variation points, and (c) a bi-causal connection life-cycle to avoid dangling links with the external environment. The proposal is validated in a case study aimed at designing an edge node for smart buildings dedicated to cultural heritage preservation.
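As a concrete illustration of this declarative, queryable approach, the sketch below (Python with rdflib) describes a virtual humidity sensor using the W3C SSN/SOSA vocabulary, grounds it to a device's REST API, and then discovers its endpoint with a SPARQL query. The ex: namespace, the ex:endpoint grounding property, and all instance names are illustrative assumptions, not the authors' actual SSN extension.

```python
# Minimal sketch: describe a virtual sensor with SSN/SOSA and ground it to a
# REST endpoint. All ex:* terms are hypothetical, not the paper's ontology.
from rdflib import Graph, Literal, Namespace, RDF

SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/building#")  # hypothetical namespace

g = Graph()
g.bind("sosa", SOSA)
g.bind("ex", EX)

sensor = EX["humiditySensor1"]
g.add((sensor, RDF.type, SOSA.Sensor))
g.add((sensor, SOSA.observes, EX["relativeHumidity"]))
# Grounding: bind the semantic element to a physical device's REST API.
g.add((sensor, EX.endpoint, Literal("http://10.0.0.5/api/humidity")))

# Runtime discovery: find the endpoint of any sensor observing humidity,
# e.g. to rebind a variation point after a fault in the default sensor.
q = """
    PREFIX sosa: <http://www.w3.org/ns/sosa/>
    PREFIX ex: <http://example.org/building#>
    SELECT ?endpoint WHERE {
      ?s a sosa:Sensor ; sosa:observes ex:relativeHumidity ; ex:endpoint ?endpoint .
    }
"""
for row in g.query(q):
    print(row.endpoint)
```

In the architecture described above, such a query would be evaluated against the RDF triple store at run time whenever a variation point has to select an alternative sensor or actuator.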

SEO: A unique approach to enhance the site rank by implementing Efficient Keywords Scheme
https://peerj.com/preprints/27609
2019-03-22
Khalil ur Rehman, Anaa Yasin, Tariq Mahmood, Muhammad Azeem, Saqib Ali
In search engine optimization, individual website pages are optimized through precise keywords, while websites as a whole are optimized through backlink monitoring. The existing literature offers no proper guidelines for keyword selection and backlink generation. In this research, we propose a model for backlink generation and keyword selection through precise analysis. The information on webpages consists of specific keywords, while website traffic is monitored through referrals. We concluded that, during the development of page content and architecture, using the selected keywords in the title, headings, and meta tags yields a higher position in search results. Moreover, for backlink generation, a shortened URL that monitors the complete traffic of a site can be placed in trusted locations, which increases the site's rank. The proposed model was validated by comparing quantitative website-rank data taken before and after the framework was implemented. Results revealed that the overall gain in site rank from applying the proposed model was 40%.
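To make the on-page part of such a scheme concrete, the following standard-library Python sketch (our illustration, not the authors' tooling) checks whether chosen keywords appear in a page's title, headings, and meta tags, i.e. the placements the abstract associates with higher search rankings.

```python
# Minimal keyword-placement audit using only the standard library.
from html.parser import HTMLParser

class KeywordAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self._slot = None
        self.texts = {"title": "", "headings": "", "meta": ""}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._slot = "title"
        elif tag in {"h1", "h2", "h3"}:
            self._slot = "headings"
        elif tag == "meta":
            a = dict(attrs)
            if a.get("name") in {"description", "keywords"}:
                self.texts["meta"] += " " + a.get("content", "")

    def handle_endtag(self, tag):
        self._slot = None

    def handle_data(self, data):
        if self._slot:
            self.texts[self._slot] += " " + data

def audit(html, keywords):
    """Map each keyword to the on-page slots where it occurs."""
    parser = KeywordAudit()
    parser.feed(html)
    return {kw: [slot for slot, txt in parser.texts.items() if kw.lower() in txt.lower()]
            for kw in keywords}

page = ("<html><head><title>Cheap Flights</title>"
        "<meta name='description' content='cheap flights deals'></head>"
        "<body><h1>Cheap flights today</h1></body></html>")
print(audit(page, ["cheap flights", "hotels"]))
# {'cheap flights': ['title', 'headings', 'meta'], 'hotels': []}
```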

Testing the FAIR metrics on data catalogs
https://peerj.com/preprints/27151
2018-09-04
Jarno A A van Erp, Carolyn D Langen, Anca Boon, Kees van Bochove
The introduction of the FAIR (Findable, Accessible, Interoperable, Reusable) principles has caused quite an uproar within the scientific community. These are principles which, if everyone adheres to them, could result in new, revolutionary ways of performing research and fulfill the promise of open science. Furthermore, they allow concepts such as personalized medicine and personal health monitoring to finally become implemented in daily practice.
However, to bring about these changes, data users need to rethink the way they treat scientific data. Just passing a dataset along without extensive metadata will not suffice anymore. Such new ways of executing research require a significantly different approach from the entire scientific community or, for that matter, from anyone who wants to reap the benefits of going FAIR.
Yet, how do you initiate behavioral change? One important solution is to change the software scientists use and to require data owners, or data stewards, to FAIRify their datasets. Data catalogs are a great starting point for FAIRifying data: the software already intends to make data Findable and Accessible, keeps the metadata Interoperable, and relies on users to provide sufficient metadata to ensure Reusability. In this paper we analyse how well the FAIR principles are implemented in several data catalogs.
To determine how FAIR a catalog is, the FAIR metrics created by the GO FAIR initiative can be used. These metrics help determine to what extent data can be considered FAIR. However, the metrics were only recently developed, having first been released at the end of 2017. At the moment, software does not come with a FAIR metrics review as standard. Still, this insight is highly desired by the scientific community. How else can they be sure that (public) money is spent in a FAIR way?
The Hyve has evaluated three popular open source data catalogs against the FAIR metrics: CKAN, Dataverse, and Invenio. Most data stewards will be familiar with at least one of these.
Within this white paper we provide answers to the following questions:
Which of the three data catalogs performs best in making data FAIR?
Which data catalog utilizes FAIR datasets the most?
Which one creates the most FAIR metadata?
Which catalog has the highest potential to increase its FAIRness, and how?
Which data catalog facilitates the FAIRifying process the best?
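To illustrate what such a review could look like in practice, the sketch below scores a catalog record against a few FAIR-style checks. The record fields and the criteria are simplified assumptions for illustration; they are not the official GO FAIR metric tests.

```python
# Minimal FAIR-style scoring of a catalog record (illustrative checks only).
import re

def fair_checks(record):
    """record: a catalog entry, e.g. a dataset exported from CKAN's API."""
    pid = record.get("identifier", "")
    return {
        "F1_persistent_identifier": bool(re.match(r"^(doi:|https?://doi\.org/)", pid)),
        "F2_rich_metadata": bool(record.get("description")) and bool(record.get("keywords")),
        "A1_standard_protocol": record.get("url", "").startswith(("http://", "https://")),
        "R1_clear_license": bool(record.get("license")),
    }

record = {  # hypothetical record
    "identifier": "https://doi.org/10.1234/example",
    "description": "Survey data on ...",
    "keywords": ["survey"],
    "url": "https://catalog.example.org/dataset/1",
    "license": "CC-BY-4.0",
}
checks = fair_checks(record)
print(checks, "score: {}/{}".format(sum(checks.values()), len(checks)))
```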

Anomaly analysis on an open DNS dataset
https://peerj.com/preprints/27116
2018-08-14
Benjamin Aziz, Nikolaos Menychtas, Ammar Al-Bazi
The increasing availability of open data, and the demand to better understand the nature of anomalies in modern systems and the causes underlying them, is encouraging researchers to analyse open datasets in various ways, using both quantitative and qualitative methods. We show here how quantitative methods, such as timeline, local-averages, and exponentially weighted moving average analyses, led in this work to the discovery of three anomalies in a large open DNS dataset published by the Los Alamos National Laboratory.
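For illustration, the sketch below implements the last of these methods: it flags points in a time series (e.g. DNS queries per hour) whose deviation from an exponentially weighted moving average exceeds a multiple of the running standard deviation. The smoothing factor, threshold, warm-up length, and data are illustrative choices, not values taken from the paper.

```python
# Minimal EWMA-based anomaly detection (illustrative parameters).
def ewma_anomalies(series, alpha=0.3, threshold=3.0, warmup=3):
    ewma, ewmvar = series[0], 0.0
    anomalies = []
    for i, x in enumerate(series[1:], start=1):
        diff = x - ewma
        std = ewmvar ** 0.5
        if i >= warmup and std > 0 and abs(diff) > threshold * std:
            anomalies.append(i)
        # Update the exponentially weighted mean and variance.
        ewma += alpha * diff
        ewmvar = (1 - alpha) * (ewmvar + alpha * diff * diff)
    return anomalies

queries_per_hour = [120, 118, 125, 122, 119, 121, 950, 123, 117]
print(ewma_anomalies(queries_per_hour))  # [6] -- the spike at index 6
```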

A mixed-method empirical study of Function-as-a-Service software development in industrial practice
https://peerj.com/preprints/27005
2018-06-26
Philipp Leitner, Erik Wittern, Josef Spillner, Waldemar Hummer
Function-as-a-Service (FaaS) describes cloud computing services that make infrastructure components transparent to application developers, thus falling in the larger group of “serverless” computing models. When using FaaS offerings, such as AWS Lambda, developers provide atomic and short-running code for their functions, and FaaS providers execute and horizontally scale them on-demand. Currently, there is no systematic research on how developers use serverless, what types of applications lend themselves to this model, or what architectural styles and practices FaaS-based applications are based on. We present results from a mixed-method study, combining interviews with advanced practitioners, a systematic analysis of grey literature, and a Web-based survey. We find that successfully adopting FaaS requires a different mental model, where systems are primarily constructed by composing pre-existing services, with FaaS often acting as the “glue” that brings these services together. Tooling availability and maturity, especially related to testing and deployment, remains a major difficulty. Further, we find that current FaaS systems lack systematic support for function reuse, and abstractions and programming models for building non-trivial FaaS applications are limited. We conclude with a discussion of implications for FaaS providers, software developers, and researchers.
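As a sketch of the programming model the study examines (our illustration, not an artifact of the study), the following AWS Lambda handler in Python is atomic and short-running and acts purely as “glue” between two managed services. The event shape follows S3 object-created notifications; the DynamoDB table name is hypothetical.

```python
# Minimal FaaS "glue" function: index newly uploaded S3 objects in DynamoDB.
import json
import boto3

table = boto3.resource("dynamodb").Table("uploads-index")  # hypothetical table

def handler(event, context):
    # Invoked on-demand by an S3 "object created" notification.
    for record in event["Records"]:
        table.put_item(Item={
            "key": record["s3"]["object"]["key"],
            "bucket": record["s3"]["bucket"]["name"],
            "size": record["s3"]["object"]["size"],
        })
    return {"statusCode": 200,
            "body": json.dumps({"indexed": len(event["Records"])})}
```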

Resilience enhancement of container-based cloud load balancing service
https://peerj.com/preprints/26875
2018-04-20
Dongsheng Zhang
Web traffic is highly jittery and unpredictable. Load balancers play a significant role in mitigating this uncertainty in web environments. With the growing adoption of cloud computing infrastructure, software load balancers have become more common in recent years. Current load balancer services distribute network requests based on the number of network connections to the backend servers. However, this load balancing algorithm fails to work when other resources, such as CPU or memory, saturate in a backend server. We experimented with and discuss the resilience evaluation and enhancement of container-based software load balancer services in cloud computing environments. We propose a pluggable framework that can dynamically adjust the weight assigned to each backend server based on real-time monitoring metrics.
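A minimal sketch of the weighting idea, under our own assumptions rather than the paper's actual framework: derive each backend's load-balancing weight from its real-time CPU and memory utilization, so that a server whose non-connection resources saturate receives less traffic.

```python
# Weight shrinks as the most-saturated resource approaches 100% utilization.
def backend_weight(cpu_util, mem_util, max_weight=100):
    """cpu_util and mem_util are fractions in [0, 1]."""
    headroom = 1.0 - max(cpu_util, mem_util)
    return max(1, round(max_weight * headroom))

# Recompute weights from monitoring samples; pushing them to the balancer
# (e.g. via a runtime API) is omitted here.
samples = {"srv1": (0.20, 0.35), "srv2": (0.95, 0.40), "srv3": (0.50, 0.90)}
weights = {name: backend_weight(cpu, mem) for name, (cpu, mem) in samples.items()}
print(weights)  # {'srv1': 65, 'srv2': 5, 'srv3': 10}
```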

What an entangled Web we weave: An information-centric approach to time-evolving socio-technical systems
https://peerj.com/preprints/2789
2018-04-15
Markus Luczak-Roesch, Kieron O'Hara, Jesse David Dinneen, Ramine Tinati
A new layer of complexity, constituted of networks of information token recurrence, has been identified in socio-technical systems such as the Wikipedia online community and the Zooniverse citizen science platform. The identification of this complexity reveals that our current understanding of the actual structure of those systems, and consequently the structure of the entire World Wide Web, is incomplete. Here we establish the principled foundations and practical advantages of analyzing information diffusion within and across Web systems with Transcendental Information Cascades, and outline resulting directions for future study in the area of socio-technical systems. We also suggest that Transcendental Information Cascades may be applicable to any kind of time-evolving system that can be observed using digital technologies, and that the structures found in such systems consist of properties common to all naturally occurring complex systems.
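As we read the abstract, the core construction links each time-stamped item to the earlier items in which its information tokens last occurred, yielding the recurrence network described above. The sketch below implements that reading; the tokenization and data are illustrative assumptions.

```python
# Minimal token-recurrence network: edge (a, b) means item b reuses a token
# whose latest previous occurrence was in item a.
from collections import defaultdict

def recurrence_edges(items):
    """items: time-ordered list of (item_id, set_of_tokens) pairs."""
    last_seen = {}            # token -> item_id of its latest occurrence
    edges = defaultdict(set)  # (earlier, later) -> shared tokens
    for item_id, tokens in items:
        for tok in tokens:
            if tok in last_seen:
                edges[(last_seen[tok], item_id)].add(tok)
            last_seen[tok] = item_id
    return dict(edges)

stream = [("r1", {"galaxy", "spiral"}),
          ("r2", {"spiral"}),
          ("r3", {"galaxy", "merger"})]
print(recurrence_edges(stream))
# {('r1', 'r2'): {'spiral'}, ('r1', 'r3'): {'galaxy'}}
```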

Virtual and remote laboratories augment self learning and interactions: Development, deployment and assessments with direct and online feedback
https://peerj.com/preprints/26715
2018-03-16
Dhanush Kumar, Rakhi Radhamani, Nijin Nizar, Krishnashree Achuthan, Bipin Nair, Shyam Diwakar
Background. Over the last few decades, in developing nations including India, there have been rapid developments in information and communication technologies, with progress towards sustainable development goals facilitating universal access to education. With the aim of augmenting laboratory skill training, India's Ministry of Human Resource Development (MHRD), through its National Mission on Education through Information and Communication Technology (NME-ICT), launched the Virtual Laboratories project, enabling professors and institutions to deliver interactive animations, mathematical simulators, and remotely-controlled equipment for online experiments in biosciences and engineering courses. Towards that mission of improving teaching and learning quality, and with a focus on improving access for users in geographically remote and economically constrained institutes in India, we developed and deployed over 30 web-based laboratories consisting of over 360 computer-based online experiments. This paper focuses on the design, development, and deployment of virtual laboratories and assesses the role of online experiments in providing self-learning and novel pedagogical practices for user communities.
Methods. As part of deployment, we evaluated the role of virtual laboratories in facilitating self-organized learning, and how their usage is perceived as a teaching tool in a blended education system. Direct feedback data was collected through organized workshops from 386 university-level students, 192 final-year higher secondary school (pre-university) students, and 234 college professors from various places across India. We also included online feedback from 2012 to 2018 to interpret usage and the adoption of virtual and remote labs by online users.
Results. More than 80% of students who used virtual laboratories scored higher in examinations compared to a control group. Of the 386 students, 80% reported adapting to self-learning using virtual laboratories. 82% of university teachers who employ virtual laboratories indicated using them to complement teaching material and reduce teaching time. The increase in online usage and feedback suggests novel trends in incorporating online platforms as pedagogical tools.
Discussion. Feedback indicated virtual laboratories altered and enhanced students' autonomous learning abilities and improved interaction in blended classrooms. Pedagogical analysis suggests the use of ICT-enabled virtual laboratories as a self-organized distance education learning platform for university and pre-university students from economically challenged or time-restrained environments. Online usage statistics indicated a steady increase of new users on this online repository, suggesting global acceptance of virtual laboratories as a complementary online repository for laboratory skill training.

Unbalanced sentiment classification: an assessment of ANN in the context of sampling the majority class
https://peerj.com/preprints/26618
2018-03-05
Rodrigo Moraes, João Francisco Valiati, Wilson Pires Gavião Neto
Many people make their opinions available on the Internet nowadays, and researchers have been proposing methods to automate the task of classifying textual reviews as positive or negative. The usual supervised learning techniques have been adopted to accomplish this task. In practice, positive reviews are abundant in comparison to negative ones. This context poses challenges to learning-based methods, and data undersampling/oversampling are popular preprocessing techniques to overcome the problem. A combination of sampling techniques and learning methods, like Artificial Neural Networks (ANN) or Support Vector Machines (SVM), has been successfully adopted as a classification approach in many areas, yet the sentiment classification literature has not explored ANN in studies that involve sampling methods to balance data. Even the performance of SVM, which is widely used as a sentiment learner, has rarely been addressed in the context of a preceding sampling method. This paper addresses document-level sentiment analysis with unbalanced data and focuses on empirically assessing the performance of ANN in the context of undersampling the (majority) set of positive reviews. We adopted the performance of SVM as a baseline, since some studies have indicated that SVM is less subject to the class imbalance problem. Results are produced in terms of a traditional bag-of-words model with popular feature selection and weighting methods. Our experiments indicated that SVM is more stable than ANN in highly unbalanced (80%) data scenarios. However, even with the discarding of information caused by random undersampling, ANN outperform SVM or produce comparable results.
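The experimental setup can be sketched as follows, on synthetic data rather than the paper's corpus and models: randomly undersample the majority (positive) class to balance the training set, then compare an ANN (here a multilayer perceptron) against an SVM on the untouched, unbalanced test set.

```python
# Illustrative undersampling experiment with scikit-learn (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

# 80/20 class split, mimicking the highly unbalanced scenario in the paper.
X, y = make_classification(n_samples=5000, n_features=50,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random undersampling: keep as many majority samples as minority samples.
rng = np.random.default_rng(0)
majority, minority = np.where(y_tr == 0)[0], np.where(y_tr == 1)[0]
keep = np.concatenate([rng.choice(majority, size=len(minority), replace=False),
                       minority])
X_bal, y_bal = X_tr[keep], y_tr[keep]

for name, clf in [("ANN", MLPClassifier(max_iter=500, random_state=0)),
                  ("SVM", LinearSVC(random_state=0))]:
    clf.fit(X_bal, y_bal)
    print(name, "minority-class F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```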

Assessing value of biomedical digital repositories
https://peerj.com/preprints/2688
2017-11-29
Chun-Nan Hsu, Anita Bandrowski, Jeffrey S. Grethe, Maryann E. Martone
Digital repositories have a direct impact and influence on the research community and society, but measuring their value using formal metrics remains challenging. It is difficult to define a single perfect metric that covers all quality aspects. Here, we distinguish between impact and influence, and discuss measures and mentions as the basis of quality metrics for a digital repository. We argue that these challenges may potentially be overcome through the introduction of standard resource identification and data citation practices. We briefly summarize our research and experience in the Neuroscience Information Framework, the BD2K bioCADDIE project on data citation, and the Resource Identification Initiative. Full implementation of these standards will depend on cooperation from all stakeholders (digital repositories, authors, publishers, and funding agencies), but both resource and data citation have been gaining support with researchers and publishers.
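As one example of how "mentions" can be counted once standard resource identification is in place, the sketch below scans manuscript text for RRID-style identifiers of the kind promoted by the Resource Identification Initiative. The regex approximates the common RRID shape and is not an official parser.

```python
# Count RRID-style resource mentions in text (approximate pattern).
import re

RRID_RE = re.compile(r"RRID:\s?([A-Za-z]+[_:]?[A-Za-z0-9_-]+)")

def find_rrids(text):
    return RRID_RE.findall(text)

paragraph = ("Sections were stained with anti-GFAP (RRID:AB_2298772) and "
             "analyzed in ImageJ (RRID:SCR_003070).")
print(find_rrids(paragraph))  # ['AB_2298772', 'SCR_003070']
```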