All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The revised version addressed the reviewers' concerns properly; the new version is much improved and ready for publication.
The paper looks interesting and has the potential for publication. The authors need to address the reviewers' comments in order to strengthen the quality of the manuscript.
The paper is clearly and unambiguously written, with the exception of lines 71-73 and 314-318. The description of the TISCI measures in a single sentence might confuse the reader and require multiple readings.
The introduction and background are sufficient to place this work into the broader field of knowledge, and the relevant prior literature is referenced.
The structure of the paper conforms to the PeerJ standard.
There are multiple issues with regard to the figures:
Fig 2 (d) has incorrect timestamps.
Fig 6 (d) has the purple circle dynamic community taking two different values for the population of recurring users on 03/Jan.
Fig 7 (b,c) are illegible (low quality).
The combination of the heatmap (dark blue for higher values) and the characterization of the community (in black) used in Figs 7 (a), 8, 9, and 10 makes the characterization of the communities with higher population hard to read.
The raw data is supplied through a listing of the unique ids of the tweets used for the four Twitter datasets.
The approach is technically sound and thoroughly justified.
However, I have minor ethical concerns regarding the disclosure of the Twitter identities of the most influential users.
No Comments
The authors present a framework for automated information extraction in online social networks (Twitter) through dynamic community detection and the ranking of the detected communities based on a varied set of measures (TISCI).
The framework is thoroughly described and applied to four Twitter datasets acquired by the authors.
In addition to the lack of clarity in the two sentences (lines 71-73 and 314-318), there is a typo in line 252: D (x \leq 1 \leq D \leq N) should be D (1 \leq x \leq D \leq N).
Table 4 has inconsistencies in the formatting of the numbers for the number of tweets (Sherlock) and the number of unique users (US Election).
I would suggest introducing a small table in the description of each of the datasets so that the key findings of this framework can be easily summarized.
This would help to illustrate in what way the proposed framework could "assist the work of journalist in their own story telling, social and political analyst scientists[...]".
See below.
See below.
See below.
In this paper, the authors propose a dynamic/time-evolving community detection framework and apply it to Twitter networks.
Under this framework, two sub-problems are addressed:
i) how to identify dynamic communities from temporal graphs?
ii) how to rank the resulting communities on the basis of structural and temporal properties?
For the first sub-problem, the paper proposes first applying an existing community detection tool to each timestep and then using a Jaccard index to associate communities from different timesteps with one another. In this case, it justifies the use of InfoMap, which is a widely used community detection tool for standard graphs.
Subsequently, every pair of communities from different timesteps is compared using Jaccard similarity to match communities.
This provides a way to associate community merges/splits/death/birth events from timestep to timestep.
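To make the matching step concrete, here is a minimal sketch of the incremental matching as I understand it. Communities are assumed to be sets of user ids; the function name `match_communities` and the 0.3 threshold are my illustrative choices, not the paper's exact values.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two node sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def match_communities(prev: list, curr: list, threshold: float = 0.3) -> list:
    """Compare every community of one timestep against every community of the
    next one; keep the pairs whose Jaccard similarity clears the threshold."""
    matches = []
    for i, c_prev in enumerate(prev):
        for j, c_curr in enumerate(curr):
            score = jaccard(c_prev, c_curr)
            if score >= threshold:
                matches.append((i, j, score))
    return matches

# Example: community 0 persists, community 1 splits in two.
t1 = [{1, 2, 3, 4}, {5, 6, 7, 8}]
t2 = [{1, 2, 3}, {5, 6}, {7, 8, 9}]
print(match_communities(t1, t2))  # [(0, 0, 0.75), (1, 1, 0.5), (1, 2, 0.4)]
```

A match list like this is what lets continuation, split (one-to-many), and merge (many-to-one) events be read off from timestep to timestep.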
For the second sub-problem of ranking communities, a normalized PageRank approach is used that combines the persistence, stability, popularity, integrity, etc. of communities.
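As a reader aid, it would help to show the general shape of such a combination. The sketch below (min-max normalize each measure, then aggregate) uses measure values and a uniform weighting that are purely my illustrative assumptions, not the paper's TISCI definition:

```python
import numpy as np

# One entry per dynamic community; the values are made up for illustration.
measures = {
    "persistence": np.array([10.0, 4.0, 7.0]),
    "stability":   np.array([0.9, 0.5, 0.7]),
    "popularity":  np.array([120.0, 300.0, 80.0]),
}

def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max normalize a measure to [0, 1]."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

# Aggregate the normalized measures into one score and rank communities.
score = np.mean([normalize(v) for v in measures.values()], axis=0)
ranking = np.argsort(-score)  # community indices, best first
```

Making this kind of summary explicit in the paper (which normalization, which weights) would remove much of the ambiguity around TISCI.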
The approach is then evaluated on multiple Twitter datasets - US/Greece elections, Sherlock, etc.
Results are presented by and large as findings of the detection framework. The top 10-20 communities from each dataset are presented, and there is a discussion of what those communities represent.
Major comments:
The problem of dynamic community detection is an important problem with a wide range of applications.
Given that there are already several methods in the literature (many of them cited in this paper, some missing), it becomes important for any paper on this topic to clearly state the achieved improvement, both in concept/algorithm and in results. This paper by and large uses previously devised methods and known techniques, and it is therefore difficult to see what is new about the contributions made here.
The use of a standard community detection tool such as InfoMap to first detect communities at every timestep and then match them across timesteps is the classical incremental approach that was also used in prior works such as Greene et al. 2010. In fact, the approach of matching communities from different timesteps using Jaccard similarity was also proposed in the Greene et al. 2010 paper. A similar approach was presented (earlier) by Tantipathananandh et al. 2007, and this line of work was later extended to include cost functions that straddle time boundaries. The sequence of work by this group is only briefly acknowledged but never expanded upon. It needs to be.
There is also work in the literature that takes a more global approach toward defining a dynamic community. One such example, which is not cited in the paper, is the following work by Mucha et al. 2010:
Mucha, Peter J., Thomas Richardson, Kevin Macon, Mason A. Porter, and Jukka-Pekka Onnela. "Community structure in time-dependent, multiscale, and multiplex networks." Science 328, no. 5980 (2010): 876-878.
There is also this review paper:
Cazabet, Rémy, and Frédéric Amblard. "Dynamic Community Detection." In Encyclopedia of Social Network Analysis and Mining, pp. 404-414. Springer New York, 2014.
The paper needs to state what is different about the proposed method compared to these highly related works, why that matters, and then substantiate those claims of advantages and/or limitations through a comparative study in the experimental results section.
Paper's methodology and writing:
I found this paper very difficult to read. It's not just the grammar, typos, and such; it is also the verbosity and the lack of rigor in terms of formalisms. More specifically, there is a lot of sloppy notation, and there are undefined terms that make the proposed method impossible to understand. Here are some key instances:
- First, nowhere is the notion of a "community", and how it differs from a "dynamic community", explicitly defined.
- Key notation such as C_{i,j}, T_{i,j}, etc. is never really defined anywhere. This obviously gets in the way of understanding.
For instance, if we look at Fig. 5, what does each T on the left-hand side stand for, and what do its subscripts mean? What is the basis for the edge labelling? The caption says there should be 5 dynamic communities, but the figure actually shows C_1 ... C_6.
If you look at the Greene paper, for example, they define every term they use very clearly.
I suggest adding a separate subsection for Definitions and notation prior to the description of the approach, where everything is defined unambiguously and explicitly.
As for verbosity, every effort should be made to make the reading easy - i.e., use short paragraphs (not long, monotonous strings of text).
The paper uses InfoMap, and the authors try to justify its use by comparing it with other methods, such as Louvain, which are also widely used. However, the comparison they show (Figs. 3 & 4) does not really help in understanding the key differences. The authors should use some quantitative way of describing the differences/strengths and weaknesses.
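One concrete option is to report partition agreement per timestep with normalized mutual information. A minimal sketch, where the per-node labels are hypothetical stand-ins for the outputs of the two tools on the same snapshot:

```python
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical per-node community labels for one snapshot; in practice these
# would come from running InfoMap and Louvain on the same timestep graph.
infomap_labels = [0, 0, 0, 1, 1, 2, 2, 2]
louvain_labels = [0, 0, 1, 1, 1, 2, 2, 2]

nmi = normalized_mutual_info_score(infomap_labels, louvain_labels)
print(f"partition agreement (NMI): {nmi:.2f}")
```

Reporting NMI (or a similar index) per timestep would turn Figs. 3 & 4 into an actual quantitative comparison.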
As for the Jaccard similarity test:
a) are communities from only successive timesteps compared or could communities from vastly different timesteps be also compared?
b) how does the use of a threshold impact the result?
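Question b) could be answered empirically with a simple sweep, reusing the illustrative match_communities() sketch from above:

```python
# How many matched pairs survive as the Jaccard threshold is raised?
for threshold in (0.1, 0.2, 0.3, 0.4, 0.5):
    n = len(match_communities(t1, t2, threshold))
    print(f"threshold={threshold:.1f}: {n} matched pairs")
```

A plot of this kind over the real datasets would show how sensitive the dynamic communities are to the chosen cut-off.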
As for the results, the authors have done a decent job of trying to explain their ranked community outputs. However, it's not clear a) whether the proposed approach is missing out on some key communities, or b) whether the output communities could be further combined into larger communities while still maintaining their significance.
The authors should consider first generating synthetic inputs through simulations with known community structures and using those for validation.
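For example, a planted-partition generator with a small amount of membership churn per timestep gives known ground truth to validate against; all parameters below are illustrative:

```python
import itertools
import random
import networkx as nx

random.seed(42)
n_nodes, n_groups, n_steps = 90, 3, 5
p_in, p_out, churn = 0.3, 0.01, 0.05
membership = [v % n_groups for v in range(n_nodes)]

snapshots, ground_truth = [], []
for _ in range(n_steps):
    # A small fraction of nodes migrates to a random group each step.
    for v in random.sample(range(n_nodes), int(churn * n_nodes)):
        membership[v] = random.randrange(n_groups)
    # Dense within groups, sparse between: a standard planted partition.
    g = nx.Graph()
    g.add_nodes_from(range(n_nodes))
    for u, v in itertools.combinations(range(n_nodes), 2):
        p = p_in if membership[u] == membership[v] else p_out
        if random.random() < p:
            g.add_edge(u, v)
    snapshots.append(g)
    ground_truth.append(list(membership))
# `snapshots` can be fed to the detection pipeline and its output compared
# against `ground_truth` per timestep (e.g., via NMI as suggested earlier).
```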
The authors should also compare their outputs with other dynamic community detection tools (see the related work comment above).
The quadratic complexity of the tool (in terms of timesteps) is also a problem; the authors should address it and report performance (they say this can be done through LSH).
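To make the LSH remedy concrete: the standard trick is MinHash signatures with banding, so that only communities landing in a shared bucket are compared exactly, instead of all pairs across all timesteps. The sketch below is my illustration of the idea, not the authors' implementation, and every parameter is an assumption:

```python
import random
from collections import defaultdict

random.seed(0)
NUM_PERM, BANDS = 128, 32          # 4 rows per band
ROWS = NUM_PERM // BANDS
SALTS = [random.getrandbits(64) for _ in range(NUM_PERM)]

def minhash(nodes: set) -> list:
    """One min per salted hash function approximates a random permutation."""
    return [min(hash((salt, v)) for v in nodes) for salt in SALTS]

def candidate_pairs(communities: dict) -> set:
    """Bucket signatures band by band; ids sharing a bucket are candidates
    for an exact Jaccard comparison."""
    buckets = defaultdict(list)
    for cid, nodes in communities.items():
        sig = minhash(nodes)
        for b in range(BANDS):
            key = (b, tuple(sig[b * ROWS:(b + 1) * ROWS]))
            buckets[key].append(cid)
    return {(a, c) for ids in buckets.values()
            for a in ids for c in ids if a < c}

# Only the overlapping communities should surface as candidates.
comms = {"t1_c0": set(range(0, 50)),
         "t2_c0": set(range(5, 55)),
         "t2_c1": set(range(200, 250))}
print(candidate_pairs(comms))  # expected: {('t1_c0', 't2_c0')}
```

Even a rough benchmark of this (candidates examined vs. all pairs) would substantiate the LSH claim.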
Minor comments:
- pg. 10, line 300 has an incomplete sentence.