To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
At the production stage, please add a sentence to clarify how you split the data into training and testing sets (time-based).
Although the paper has improved significantly, reviewer 3 still has some concerns that must be addressed before the paper can be accepted. Please prepare a new version of the paper taking into account their suggestions, especially the ones related to the experimental design.
The authors have addressed all my comments and have updated the manuscript so that the methods and results, as well as the limitations of their approach, are clearly described. Therefore, I believe that the paper meets the PeerJ criteria and should be accepted as is.
However, I would like to point out two minor mistakes:
page 4, line 151: Side effect --> Side effects
page 8, line 318: Using Using --> Using
The paper has improved since its original version, addressing the concerns mentioned in the previous review. Thanks.
While the authors have included more details on their experimental setting in the revised version, I still have a few more questions on the settings and methodology:
1- Line 149 mentions topic and volume filters. Could you please clarify in the text what these filters are? If that is already there, it is not very clear.
2- CHV is very limited. Do you have any idea how much this limitation may have affected your data collection?
3- How was the annotation of the tweets done? Who annotated them, and if they were annotated by multiple annotators, did you calculate inter-annotator agreement?
4- Did you randomly divide the data into training, testing and development sets, or did you sort them based on a criterion such as time? Past work has shown that it is possible to overfit an SVM classifier if tweet data is divided randomly, because of dependencies among the tweets (see: "Reading the markets: Forecasting public opinion of political candidates by news analysis", COLING 2008, and "Evaluation Methods for Statistically Dependent Text", Computational Linguistics, 2015).
If your data is in fact divided randomly, I'd be interested to know if you get similar results by sorting them and using old tweets as training/dev and new ones as testing.
In any case, please clarify this in your manuscript for your readers; a minimal sketch of the kind of time-based split I have in mind is given below.
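This is only an illustrative sketch, not the authors' actual pipeline: it assumes each tweet record carries a creation timestamp, and the field and function names are invented for the example.

# Illustrative time-based split: sort tweets chronologically so that all
# training/development tweets precede the test tweets, which avoids leakage
# from temporally dependent or near-duplicate tweets.
def time_based_split(tweets, train_frac=0.7, dev_frac=0.1):
    ordered = sorted(tweets, key=lambda t: t["created_at"])  # oldest first
    n_train = int(len(ordered) * train_frac)
    n_dev = int(len(ordered) * dev_frac)
    train = ordered[:n_train]
    dev = ordered[n_train:n_train + n_dev]
    test = ordered[n_train + n_dev:]  # the most recent tweets are held out
    return train, dev, test

Comparing the classifier's performance under such a split with its performance under a purely random split would show whether tweet dependency is inflating the reported numbers.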
Please see above.
- I still think that Figure 2 could be removed and the corresponding information shown in the text. The caption does include interesting information, but the figure itself does not.
- The normalization method has to be explained in the text the way you explained it in your response to my earlier comment. Also, mention this in the limitations of your work.
According to the reviewers, there are some important things to rewrite before the paper is ready for publication. Be especially careful with questions related to the experimental design and the validity of the findings.
The article is well written. There is sufficient background and introduction. The literature is well cited - a few additional references might be appropriate on drug repositioning (21849665, 21849664) and drug-drug interactions (25122524, 22422992).
I'm not familiar with the scope of the journal but the experimental design is well described and easy to understand.
The authors make the following assumption, which is inherently problematic: “drug X causes a distinct profile of side-effects, and this side-effect profile is typical of drugs used to treat a certain disease Y, then drug X should be evaluated for repositioning for the treatment of disease Y”. Repositioning shouldn’t be done on the basis of similar side effects, but on the basis of side effects “rescuing” a disease phenotype. Side effects are a negative consequence, and if one drug with a certain side-effect profile is used to treat a disease, there is no evidence that another drug with a similarly bad side-effect profile will be a better therapeutic. This should either be discussed in depth as a limitation of the approach, or the study shouldn’t be framed as a drug repositioning study but as a way to identify drugs with a similar mechanism. In addition to mining Twitter for drug information, I would suggest mining it for disease symptoms and trying to link drugs and diseases based on a directional inverse association of the side effects and symptoms.
The authors describe a novel computational method that uses side-effect data mined from social media to show that known indications are well recovered and current trial indications can also be identified. The work relies on the assumption that drugs with similar side-effects might share a common mechanism of action; therefore it may be possible to infer new indications based on the similarity of side-effect profiles. While the approach of mining social media in the context of drug repositioning is innovative and the computational methods applied are robust, there is an underlying flaw in the study.
In addition to the issue raised above, there are several other concerns:
• Figure 2 is a hairball and doesn’t offer much value; it would be more useful to provide statistics on the network: how many nodes and significant edges are there, and which node has the highest number of connections? What are the strongest relationships? I would also incorporate side-effect information here: how many different side effects were included? Which drug has the highest number of side effects?
• Table 1 – how were these chosen? Are these the top 10? The table should also include the original indication.
• It would be interesting to see what the underlying “tweets/side effects” are for each of the drug pairs.
• How were the three candidates chosen for further discussion? Maybe list them in Table 1 and expand Figure 4?
• The dataset used for validation, namely “Cortellis Clinical Trials Intelligence”, is not a resource available to those who would like to replicate the findings. At least for the compounds investigated here, that data should be provided as a supplement.
While the method shows encouraging results, I agree with the authors that it is more likely to play a role in drug repositioning as a component in an integrated approach, potentially with databases like SIDER and molecular predictions.
The INTRODUCTION section provides a good overview of drug repurposing and the different pharmacological aspects used by different researchers, including side-effect similarities. However, in my opinion there is information that should be described in detail in this section:
• Page 2: The authors mention that existing repositioning methods based on side-effects have used data from the SIDER database and other similar resources, but they do not reference the corresponding papers.
• As an alternative to these traditional resources and their lack of completeness, they propose using side-effects mined from social media. However, the only limitation described by the authors is that only a small fraction of daily tweets contain reports of drug side-effects, restricting the number of drugs analysed in the study. Text mining of side effects is itself still an open research area confronting several challenges, such as the use of idiomatic expressions, spelling errors, ambiguous terms, or exaggerated information that might produce false positives, among others (Leaman et al., 2010; Sampathkumar, Chen, & Luo, 2014; Segura-Bedmar, Martínez, Revert, & Moreno-Schneider, 2015).
• In addition to this, side-effects were collected from social media during a 6-month period. In my opinion, it is probable that the number of side effects reported for a drug in social media during that period of time would be smaller than the number of side-effects collected in traditional resources − such as, for example, the Summary of Product Characteristics. If this is the case, using data from social media would not address the limitation of traditional resources – i.e., their lack of completeness and the difficulty of keeping them updated with new side-effect information.
• (Page 6) In the RESULTS AND DISCUSSION section it is asserted that “our method should provide a viable alternative to existing approaches”. However, these other approaches are not discussed in the paper. Moreover, their current limitations, and how they could be overcome by the method proposed by the authors, are not described.
• (Page 2) The authors describe the hypothesis that “drugs sharing a significant number of side-effects might share a common mechanism of action linking side-effects with disease treatment”. However, the examples they use to illustrate this (exenatide, minoxidil and sildenafil) are not based on this hypothesis (i.e., drugs sharing a similar side-effect profile could share a common indication, which is not necessarily related to any of the side-effects), but on the identification of a potential indication directly related to one specific side effect. I would recommend pointing this out, or illustrating these examples in a different part of the text, to avoid confusion.
Leaman, R., Wojtulewicz, L., Sullivan, R., Skariah, A., Yang, J., & Gonzalez, G. (2010). Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing (BioNLP ’10) (pp. 117–125). Association for Computational Linguistics, Stroudsburg, PA, USA.
Sampathkumar, H., Chen, X., & Luo, B. (2014). Mining Adverse Drug Reactions from online healthcare forums using Hidden Markov Model. BMC Medical Informatics and Decision Making, 14(91), 1–18.
Segura-Bedmar, I., Martínez, P., Revert, R., & Moreno-Schneider, J. (2015). Exploring Spanish health social media for detecting drug effects. BMC Medical Informatics and Decision Making, 15(Suppl 2), S6.
My main concern is that the methods followed to obtain drug side-effects from Twitter are described in a paper that is not published, but under preparation (reference in the manuscript). I suggest that this document should be provided to the reviewers as supplementary material for this review. Otherwise, a better description of the methods − especially the evaluation and limitations − should be provided in this paper.
In addition to this, there is one possible bias in the experiment “RECOVERING KNOWN INDICATIONS” that is not discussed by the authors. Drugs belonging to the same group of drugs (e.g., statins) could have similar side-effect profiles. Therefore, it would be expected that two drugs, such as simvastatin and lovastatin, would be ranked as similar, and therefore the main indication would be correctly “predicted” through this method. The authors should explain whether this could have influenced these results, or how they handled this problem.
The authors say that “While data sets and underlying statistical models clearly differ, these results taken together suggest that the use of side-effect data mined from social media can certainly offer comparable performance to methods using side-effect data extracted from more conventional resources” (page 6). However, this conclusion cannot be drawn from the present study. It is possible that the method described by the authors could obtain better results using side-effect data from the same resources as other researchers (e.g., Ye et al.). The influence of the provenance of the data on the results obtained with this new method has not been evaluated.
Common side-effects shared by drugs within the subgraphs in the three examples (oxytocin, ramelteon and meloxicam) are side-effects commonly associated with a large number of drugs. However, other authors have observed an inverse correlation between side-effect frequency and the likelihood that two drugs share a protein target (Campillos, Kuhn, Gavin, Jensen, & Bork, 2008). In my opinion, this is an interesting observation that should be discussed in the paper.
Limitations of the study refer mainly to a paper that is not published. Therefore, a better description of the limitations should be provided.
Campillos, M., Kuhn, M., Gavin, A., Jensen, L. J., & Bork, P. (2008). Drug Target Identification Using Side-Effect Similarity. Science, 321(5886), 263–266.
All my comments are covered by the 3 areas above.
The article is generally well-written but there are some typographical mistakes as well as writing style problems that I have listed in the comments for the authors.
The research explained in the paper is of value and useful. However, experiments and evaluations should be clarified by the authors as explained in the comments to the authors.
The data used here is Twitter data, which cannot be shared (only tweet IDs can be shared). Even so, the authors make no comment on whether they are sharing those to make their experiments repeatable by other researchers.
The evaluations explained in the paper have some ambiguities that have to be cleared up before any decision is made on the publication of this work.
The paper investigates the usability of social media (Twitter) for repositioning of drugs based on their side effects. The motivation is clear and the problem itself is interesting and worth pursuing. The paper basically applies an existing method, with limited modification, to a new problem and does not propose a novel algorithm/method; the novelty is mostly in the application. One main issue here is that the evaluations are weak and mostly based on anecdotes. How this system would be useful in practice is something the presented evaluations do not show convincingly. Below are more detailed comments:
- The paper hardly touches on the data it uses and its main issue given the nature of tweets: data quality. It is assumed that the tweets it gets (with the given criteria) are really talking about side effects. However, they could easily be misleading. Also, how did you deal with drug misspellings and the informal language of the tweets? The Conclusions discuss the limitations of Twitter data but not this issue. I'd like to see this issue clearly analyzed or discussed in this work. Also, it would help if a Materials or Data section came before or right after Methods.
- Page 2, lines 75-85: A number of references to "recent" work in extracting adverse reactions from social media/medical forums are given. These studies are no longer recent, and in fact many more studies have been published in 2015 alone. I'd suggest the authors have a look at more advanced work than a workshop paper from 2010. You can look at a 2015 survey published in ACM Computing Surveys for some of the later work (Text and Data Mining Techniques in Adverse Drug Reaction Detection; the social media section).
- Page 3, line 90: you cannot cite a work that is in preparation. If you want to use its material, you either have to include it in this paper so it gets reviewed here, or wait until your other paper gets published.
- page 3, line 113: how is "clinical trial" used as a feature for the SVM?
- page 3: what was the training data for the SVM classifier?
- Page 3: how was the normalization to World Drug bank done? This step is important if done automatically, as a large error rate is often associated with it.
- page 6, Figure 1(a): how are the frequencies measured such that they are all less than one? I am not sure the figure is correct given the explanation (top-n, etc.). This needs to be clarified.
- In general, it would be nicer to have figures and tables aligned to the top or bottom of a page.
- Page 6, line 223: what does "extracted from [32-35]" mean? Why not state exactly what was extracted here, rather than ambiguously referring to others' work and expecting your readers to go read those works to understand yours.
- A discussion is needed of what the performance/effectiveness values mean for the experiment on pages 6-7. If we have a system that finds some correct indications but many more false ones, how is this helpful in a realistic setting? That is, what human workload would you expect to be required to make use of this?
- Page 8: what does the value of the PCC tell us? What is its range, and what values are good or bad?
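For context, the Pearson correlation coefficient (PCC) between paired values x_i and y_i is defined as

r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}},

so it always lies in [-1, 1]: values near +1 indicate a strong positive linear relationship, values near 0 little linear relationship, and values near -1 a strong negative one. The manuscript should state what magnitude is considered meaningful for this comparison.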
some proof-reading is required, for example:
- page 6: remove --> removing
- Page 7: know indications --> known indications
- Page 7: Figure 2 is not showing much and can be removed. The colors are also not helpful.