PeerJ Computer Science Preprints: Visual Analytics
https://peerj.com/preprints/index.atom?journal=cs&subject=11940
Visual Analytics articles published in PeerJ Computer Science Preprints

An expert study on hierarchy comparison methods applied to biological taxonomies curation
https://peerj.com/preprints/27903
2019-08-16
Lilliana Sancho-Chavarria, Fabian Beck, Erick Mata-Montero
Comparison of hierarchies aims at identifying differences and similarities between two or more hierarchical structures. In the biological taxonomy domain, comparison is indispensable for reconciling alternative versions of a taxonomic classification. Biological taxonomies are knowledge structures that may include large numbers of nodes (taxa) and are typically maintained manually. We present the results of a user study with taxonomy experts that evaluates four well-known methods for comparing two hierarchies, namely edge drawing, matrix representation, animation, and agglomeration. Each method is evaluated with respect to seven typical biological taxonomy curation tasks. To this end, we designed an interactive software environment through which expert taxonomists performed exercises representative of the considered tasks. We evaluated participants’ effectiveness and level of satisfaction from both quantitative and qualitative perspectives. Overall, the quantitative results show that participants were less effective with agglomeration, whereas they were more satisfied with edge drawing. Qualitative findings reveal a greater preference among participants for the edge drawing method. The qualitative analysis also yielded insights that help explain the differences between the methods and provide directions for future research.
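At the data level, comparing two versions of a taxonomy boils down to detecting which taxa were added, removed, or reclassified. The following sketch illustrates this with two hypothetical child-to-parent maps; the taxon names and the representation are illustrative assumptions, not taken from the study.

```python
# Sketch: diffing two taxonomy versions represented as child -> parent maps.
# Names are illustrative examples, not data from the study.

def diff_hierarchies(old, new):
    """Return taxa that were added, removed, or moved to a new parent."""
    added = {t for t in new if t not in old}
    removed = {t for t in old if t not in new}
    moved = {t for t in old.keys() & new.keys() if old[t] != new[t]}
    return added, removed, moved

old = {"Felis": "Felidae", "Panthera": "Felidae", "Lynx": "Felidae"}
new = {"Felis": "Felinae", "Panthera": "Pantherinae", "Lynx": "Felinae",
       "Puma": "Felinae"}

added, removed, moved = diff_hierarchies(old, new)
print(added)  # {'Puma'}
print(moved)  # taxa whose parent changed between the two versions
```

Visual comparison methods such as edge drawing or matrix representations are essentially different ways of presenting the output of this kind of diff to a curator.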
Improving the resolution of microscope by deconvolution after dense scan
https://peerj.com/preprints/27849
2019-07-29
Yaohua Xie
Super-resolution microscopes (such as STED) illuminate samples with a tiny spot and achieve very high resolution, but structures smaller than the spot cannot be resolved in this way. We therefore propose a technique to address this problem, termed “Deconvolution after Dense Scan (DDS)”. First, a preprocessing stage eliminates the optical uncertainty of the peripheral areas around the sample’s region of interest (ROI). Then, the ROI is scanned densely together with its peripheral areas. Finally, the high-resolution image is recovered by deconvolution. The proposed technique requires little modification of the apparatus and is performed mainly algorithmically. Simulation experiments show that the technique can further improve the resolution of super-resolution microscopes.
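The final step of the pipeline, deconvolution, can be sketched generically: if the blurring point-spread function (PSF) is known, the image can be recovered in the frequency domain with a regularized (Wiener-style) inverse filter. This is a minimal 1-D illustration of that step, not the authors' DDS implementation; the PSF, signal, and noise-power constant are assumptions.

```python
# Sketch of the deconvolution step: recover a signal blurred by a known
# point-spread function (PSF) via Wiener deconvolution in the frequency
# domain. Generic illustration only, not the authors' DDS code.
import numpy as np

def wiener_deconvolve(blurred, psf, noise_power=1e-3):
    """Frequency-domain Wiener deconvolution with a regularization term."""
    H = np.fft.fft(psf, n=len(blurred))              # optical transfer function
    G = np.fft.fft(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + noise_power)  # regularized inverse filter
    return np.real(np.fft.ifft(W * G))

# Toy example: a two-spike signal blurred by a Gaussian-like PSF.
signal = np.zeros(64)
signal[20] = 1.0
signal[25] = 1.0
psf = np.exp(-0.5 * ((np.arange(64) - 3) / 2.0) ** 2)
psf /= psf.sum()
blurred = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(psf)))

recovered = wiener_deconvolve(blurred, psf)
# The recovered peaks are sharper and taller than in the blurred signal.
```

The regularization term keeps the division stable at frequencies where the PSF transfers almost no energy, which is exactly where naive inverse filtering would amplify noise.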
Improving the quality of low SNR images using high SNR images
https://peerj.com/preprints/27800
2019-06-14
Yaohua Xie
It is important to obtain data with a signal-to-noise ratio (SNR) as high as possible. Compared to other techniques, filtering methods are fast, but they do not make full use of the characteristics of the sample structure that are reflected in relevant high-SNR images. In this study, we propose a technique termed “TransFiltering”. It transplants the characteristics of a high-SNR image to the frequency spectrum of a low-SNR image by filtering. Usually, the high-SNR and the low-SNR image should have a similar structural pattern; for example, they may both come from the same image sequence. In the proposed method, a Fourier transform is first performed on both images. Then, the frequency spectrum of the low-SNR image is filtered according to that of the high-SNR image. Finally, an inverse Fourier transform is performed to obtain the image with improved SNR. Experimental results show that the proposed method is both effective and efficient.
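The three steps described (transform both images, filter the low-SNR spectrum according to the high-SNR one, transform back) can be sketched as follows. The mask design and threshold here are illustrative assumptions, not the authors' exact filtering rule.

```python
# Sketch of the described pipeline: use the spectrum of a high-SNR reference
# to build a filter for a low-SNR image of similar structure. The mask rule
# (keep frequencies where the reference has significant energy) is an assumption.
import numpy as np

def trans_filter(low_snr, high_snr, rel_threshold=0.05):
    """Suppress frequencies where the high-SNR reference has little energy."""
    ref = np.fft.fft2(high_snr)
    mask = np.abs(ref) >= rel_threshold * np.abs(ref).max()
    filtered = np.fft.fft2(low_snr) * mask
    return np.real(np.fft.ifft2(filtered))

rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0, 4 * np.pi, 64)),
                 np.sin(np.linspace(0, 4 * np.pi, 64)))
high_snr = clean + 0.01 * rng.standard_normal((64, 64))   # good reference frame
low_snr = clean + 1.0 * rng.standard_normal((64, 64))     # noisy frame

denoised = trans_filter(low_snr, high_snr)
err_before = np.mean((low_snr - clean) ** 2)
err_after = np.mean((denoised - clean) ** 2)
# err_after is clearly smaller than err_before for this toy example.
```

Because the clean structure occupies only a few frequency bins while the noise spreads over all of them, masking by the reference spectrum discards most of the noise energy while keeping the signal.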
Extracting super-resolution details directly from a diffraction-blurred image or part of its frequency spectrum
https://peerj.com/preprints/27591
2019-04-28
Edward Y Sheffield
It is usually believed that the low-frequency part of a signal’s Fourier spectrum represents its profile, while the high-frequency part represents its details. Conventional light microscopes filter out the high-frequency parts of image signals, so that people cannot see the details of the samples (the objects being imaged) in the blurred images. However, we find that under a certain “resolvable condition”, a signal’s low-frequency and high-frequency parts do not merely represent profile and details, respectively. In fact, either of them also contains the full information (both profile and details) of the sample’s structure. Therefore, for samples with spatial frequencies beyond the diffraction limit, even if the image’s high-frequency part is filtered out by the microscope, it is still possible to extract the full information from the low-frequency part. On the basis of these findings, we propose the technique of Deconvolution Super-resolution (DeSu-re), comprising two methods. One method extracts the full information of the sample’s structure directly from the diffraction-blurred image, while the other extracts it directly from part of the observed image’s spectrum (e.g., the low-frequency part). Both theoretical analysis and simulation experiments support the above findings and verify the effectiveness of the proposed methods.
Visualizing systems and software performance - Report on the GI-Dagstuhl seminar for young researchers, July 9-13, 2018
https://peerj.com/preprints/27253
2018-10-04
Fabian Beck, Alexandre Bergel, Cor-Paul Bezemer, Katherine E. Isaacs
This GI-Dagstuhl seminar addressed the problem of visualizing performance-related data of systems and the software that they run. Due to the scale of performance-related data and the open-ended nature of analyzing it, visualization is often the only feasible way to comprehend, improve, and debug the performance behaviour of systems. The rise of cloud and big data systems, and the rapidly growing scale of the performance-related data that they generate, have led to an increased need for visualization of such data. However, the research communities behind data visualization, performance engineering, and high-performance computing are largely disjoint. The goal of this seminar was to bring together young researchers from these areas to identify opportunities for cross-community collaboration and to set the path for long-lasting collaborations towards rich and effective visualizations of performance-related data.
The utilization of landscape pictures extracted from open picture collections for the determination of interest in spatial features
https://peerj.com/preprints/27234
2018-09-21
Jens Ingensand, Jean Christophe Foltête, Stéphane Cretegny, Nicolas Blanc, Sarah Composto
This paper describes a method that uses georeferenced landscape pictures extracted from open picture collections to determine the population's interest in spatial features. The automated method takes into account the coordinates of the camera position as well as the azimuth angle, the focal length, and the crop factor in order to calculate a field of view using a digital terrain model (DTM). This field of view can thereafter be used to determine interest in spatial features. In a case study involving more than 3,000 georeferenced pictures, we investigate the potential of the method.
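One ingredient of this calculation, deriving the horizontal angle of view from the focal length and crop factor, follows from standard camera geometry. The sketch below shows that step only; the DTM-based visibility part is omitted, and the function name and full-frame sensor width are assumptions for illustration.

```python
# Sketch: horizontal angle of view from focal length and crop factor,
# one ingredient of the field-of-view computation described above.
import math

FULL_FRAME_WIDTH_MM = 36.0  # width of a full-frame (35 mm) sensor

def horizontal_angle_of_view(focal_length_mm, crop_factor):
    """Horizontal angle of view in degrees for the given camera parameters."""
    sensor_width = FULL_FRAME_WIDTH_MM / crop_factor
    return math.degrees(2 * math.atan(sensor_width / (2 * focal_length_mm)))

# A 50 mm lens on a full-frame camera covers roughly 40 degrees horizontally.
print(round(horizontal_angle_of_view(50, 1.0), 1))  # -> 39.6
```

Combined with the camera position and azimuth, this angle defines the viewing cone that is then intersected with the terrain model.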
DataScope: Interactive visual exploratory dashboards for large multidimensional data
https://peerj.com/preprints/26441
2018-01-16
Ganesh R Iyer, Sapoonjyoti Duttaduwarah, Ashish Sharma
DataScope is a web-based tool for generating interactive visual dashboards on large-scale multidimensional datasets. Users can use these dashboards to explore data and create cohorts for downstream analysis. We describe DataScope's architecture and design considerations and provide an overview of its system design. We highlight some of DataScope's features that were useful in case studies using datasets from cancer registries and co-clinical trials. In benchmarks, DataScope performs sub-second queries on datasets ranging from thousands to millions of records.
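The core query pattern behind such a dashboard is "filter to a cohort, then aggregate per dimension". The sketch below illustrates that pattern in plain Python; the record fields, function names, and data are illustrative assumptions, not DataScope's actual API or schema.

```python
# Sketch of the filter-then-aggregate query behind an exploratory dashboard:
# select a cohort via predicates, then count records per dashboard dimension.
# Generic illustration only, not DataScope's actual API.
from collections import Counter

records = [
    {"site": "lung", "stage": "II", "age": 64},
    {"site": "lung", "stage": "III", "age": 71},
    {"site": "skin", "stage": "II", "age": 55},
    {"site": "skin", "stage": "I", "age": 48},
]

def cohort(rows, **predicates):
    """Rows matching every key=value predicate (the cohort selection)."""
    return [r for r in rows if all(r[k] == v for k, v in predicates.items())]

def histogram(rows, dimension):
    """Per-category counts for one dashboard dimension."""
    return Counter(r[dimension] for r in rows)

stage_ii = cohort(records, stage="II")
print(histogram(stage_ii, "site"))  # Counter({'lung': 1, 'skin': 1})
```

Achieving sub-second latency on millions of records then becomes a question of backing these two operations with indexed or pre-aggregated storage rather than a linear scan.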
MCLEAN: Multilevel Clustering Exploration As Network
https://peerj.com/preprints/3448
2017-12-05
Daniel Alcaide, Jan Aerts
Finding useful patterns in datasets has attracted considerable interest in the field of visual analytics. One of the most common tasks is the identification and representation of clusters. However, this is non-trivial in heterogeneous datasets since the data needs to be analyzed from different perspectives. Indeed, highly variable patterns may mask underlying trends in the dataset. Dendrograms are graphical representations resulting from agglomerative hierarchical clustering and provide a framework for viewing the clustering at different levels of detail. However, dendrograms become cluttered when the dataset gets large, and the single cut of the dendrogram to demarcate different clusters can be insufficient in heterogeneous datasets.
In this work, we propose a visual analytics methodology called MCLEAN that offers a general approach for guiding the user through the exploration and detection of clusters. Powered by a graph-based transformation of the relational data, it supports a scalable environment for representing heterogeneous datasets by changing the spatialization. We thereby combine multilevel representations of the clustered dataset with community-finding algorithms. Our approach entails displaying the results of these heuristics to users, providing a starting point for exploration and data analysis.
To evaluate our proposed approach, we conduct a qualitative user study, where participants are asked to explore a heterogeneous dataset, comparing the results obtained by MCLEAN with the dendrogram. These qualitative results reveal that MCLEAN is an effective way of aiding users in the detection of clusters in heterogeneous datasets. The proposed methodology is implemented in an R package available at https://bitbucket.org/vda-lab/mclean
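The graph-based transformation at the heart of this kind of approach can be sketched minimally: link data points whose pairwise distance falls below a threshold, then group the resulting graph. Here connected components serve as a simple stand-in for a community-finding algorithm; the data, threshold, and grouping rule are illustrative assumptions, not MCLEAN's actual procedure.

```python
# Sketch: transform relational data into a graph (edges between nearby points),
# then group nodes. Connected components stand in for community finding.
import math

points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),   # cluster A
          (5.0, 5.0), (5.2, 5.1), (4.9, 5.3)]   # cluster B

def build_graph(pts, eps=1.0):
    """Adjacency sets linking points whose distance is below eps."""
    adj = {i: set() for i in range(len(pts))}
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if math.dist(pts[i], pts[j]) < eps:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def components(adj):
    """Connected components via depth-first search."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups

clusters = components(build_graph(points))
print(clusters)  # two groups: {0, 1, 2} and {3, 4, 5}
```

Unlike a single dendrogram cut, grouping on the graph adapts to locally varying density, which is what makes the graph view attractive for heterogeneous datasets.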
Visualizing mutation occurrence using Big Data
https://peerj.com/preprints/3325
2017-10-05
Silvana Iuliana Albert
The volume of collected genetic data has been growing exponentially in the past few years, and we need to improve the way we store, analyze, and visualize it in order to draw relevant conclusions that could improve people's quality of life. Extracting patterns and predicting future mutations and their impact will rely heavily on the efficient use of Big Data. Often a mutation on its own cannot provide enough information about a disorder or disease. Only if we combine the genetic information with the organism’s environment can we draw conclusions about the penetrance and expressivity of the mutation. Because many genes can cause a single disease and, at the same time, a single gene can cause multiple diseases, we need to analyze the whole context of a person.
In this work, a distributed solution that provides demographics and metrics about diagnostics and mutations is proposed. Seeing the occurrence of a mutation in a particular geographic region can help medical specialists narrow down the search for a patient’s mutations without sequencing the whole genome.
Melanoma expression analysis with Big Data technologies
https://peerj.com/preprints/3260
2017-09-26
Alicia Fernandez-Rovira, Rocio Lavado-Valenzuela, Miguel Ángel Berciano Guerrero, Ismael Navas-Delgado, José F Aldana-Montes
Melanoma is a highly immunogenic tumor. Therefore, in recent years physicians have incorporated drugs that alter the immune system into their therapeutic arsenal against this disease, revolutionizing the treatment of patients in an advanced stage of the disease. This has led us to explore and deepen our knowledge of the immunology surrounding melanoma in order to optimize its management. At present, immunotherapy for metastatic melanoma is based on stimulating an individual’s own immune system through the use of specific monoclonal antibodies. The use of immunotherapy has meant that many patients with melanoma have survived, and it therefore constitutes a present and future treatment in this field. At the same time, drugs have been developed that target specific mutations, notably BRAF, resulting in large tumor-regression responses (set in this clinical study at 18 months), as well as a higher percentage of long-term survivors. The analysis of gene expression changes and their correlation with clinical changes can be carried out using the tools provided by the companies that currently offer gene expression platforms. The gene expression platform used in this clinical study is NanoString, which provides nCounter. However, nCounter has some limitations: the type of analysis is restricted to a predefined set, and the introduction of clinical features is a complex task. This paper presents an approach that collects the clinical information in a structured database, with a Web user interface for entering this information, including the results of the gene expression measurements, in order to go a step further than the nCounter tool. As part of this work, we present an initial analysis of changes in the gene expression of a set of patients before and after targeted therapy.
This analysis has been carried out using Big Data technologies (Apache Spark), with the final goal of scaling up to large numbers of patients, even though this initial study has a limited number of enrolled patients (12 in the first analysis). This is not yet a Big Data problem, but the underlying study aims at enrolling 20 patients per year in Málaga alone, and the approach could be extended to analyze the 3,600 patients diagnosed with melanoma per year.