All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Considering all comments from three reviewers and the major revision undertaken, I would like to recommend publication in PeerJ.
[# PeerJ Staff Note: While the reviewer mentioned novelty/interest in their final review, this is not a criterion for rejection. The Academic Editor made the decision based on PeerJ's editorial criteria (https://peerj.com/about/editorial-criteria/). #]
[# PeerJ Staff Note - this decision was reviewed and approved by Jyotismita Chaki, a PeerJ Computer Science Section Editor covering this Section #]
The author uses the LDA method to explore five research questions about data science on the DSSE website. Some of the questions are addressed, but I do not think the paper is novel, although it is somewhat educational for some data workers. I am also very concerned that its significance is limited; perhaps the author could publish it on a learning website instead. Overall, I find no interesting turns in the methods and no surprises in the results, which limits my enthusiasm to a certain degree.
No comment.
No comment.
Please read the reviewers' comments carefully and revise your manuscript accordingly.
[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter. Directions on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]
The author employs LDA to perform topic modeling on Stack Exchange data related to Data Science and draws statistical conclusions about the popular topics, tools, and trends in the field, which may be of interest to people working in Data Science.
The paper is written in professional English, well structured, and easy to follow. The methodology is clearly described, and the results are valid and well presented.
The only concern is that the dataset used in this paper and the methodology (including the data preprocessing step and the LDA approach) are very similar to the work done by Karbasian & Johri in 2020, which the author also cites, making the paper less innovative. However, since the author states that this paper reports more findings than (Karbasian & Johri, 2020) and expands the scope of that work, this concern should be a minor one.
The methodologies used in the paper, including the data preprocessing, the LDA algorithm, and the other statistical methods, are rigorous and well designed, leading to solid results.
The findings are valid and statistically sound; the figures and tables are well presented and easy for readers to follow.
The author uses latent Dirichlet allocation (LDA) to learn the topics embedded in data science discussions, where the main source of the data is DSSE, a data science-focused Q&A website. The author then tries to answer five questions:
1. What topics are discussed by data scientists?
2. How do the data science topics evolve over time?
3. How do the popularity and difficulty of the topics vary?
4. What are the most commonly used tasks, techniques, and tools in data science?
5. How do data science topics relate to data-driven technologies?
No comment.
This paper discusses some important, popular, and difficult issues that data scientists are currently facing. However, the implications of these findings are not clear, and all the topics shown in the paper are already hot topics, such as NLP and computer vision. Thousands of articles are published in those areas each year. Would the findings in this paper truly have significant implications for the challenges of data science? I do not think the results carry significant implications.
It is great to do such an analysis, but the results don't really reveal the pain points that data scientists care about. I think the topics need to be more specific than general categories such as machine learning and deep learning. For NLP, for example, the author could take text summarization as a topic and analyze the current issues in that area, which would be more insightful.
With the growth of massive amounts of information and of data science, data scientists often use specialized question-and-answer websites to find solutions to difficult problems. In the paper, the author uses the LDA method to explore five research questions about data science on the DSSE website. However, although LDA is the central method, the article contains no formula or flow chart that introduces it in detail. I think formulas or a flow chart would make it clearer to readers how LDA works in the article.
The author designs RQ2 to explore how data science topics evolve over time. However, the results of RQ2 presented in Table 3, Figure 4, and Figure 5 do not show the changing trend of data science topics over time. There is also an error in the finding of RQ2: its text is the same as the finding of RQ3.
The research results of the article are helpful for data workers to better understand and use data science. However, I am very concerned that the significance is limited, and perhaps the author can use some examples to strengthen the significance.
There are some repeated paragraphs in the article. For example, the findings of RQ2 and RQ3, mentioned in the previous comments, are identical. There are also some grammatical errors. I hope the author can check the content of the article more carefully.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.