All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
This revised version substantially improves on the first, and the authors have done a great job addressing all comments.
There are still a couple of typos to be corrected; these can be fixed during the publication process with the help of the production staff:
- Page 2, "investigation.dditionally"
- Page 2, "meta-communities.n Figure 1"
- Page 3, "reported.As" (missing space)
- Page 5, "subreddits.s such"
- Page 5, "zero).Furthermore" (missing space)
- Page 6, "network.uture"
- Page 6, "forums .iven"
In this interesting manuscript, the authors propose the use of "interest maps" to represent the backbone of user interests in social networks. Specifically, they validate their method on Reddit, presenting several interesting analyses and comparing this social platform with other similar websites.
I appreciated the manuscript. It is generally well written (except for a couple of typos), and it is remarkable that the authors provide an online interactive visualisation tool, as well as the raw data set.
The major drawback is that, more than a research paper, this reads like an "implementation paper": the authors have implemented a nice methodology and are presenting it to the world. I would like to see more depth in the scientific analysis, and a more rigorous treatment of the resulting network topology.
Beyond those provided by the referees, the following are some comments that the authors should address before publication:
- There is some confusion with the terminology used. For instance, what exactly are "meta-communities"? They are first used on Page 4, where it is said that they correspond to clusters. Yet clusters are not explicitly defined. I understand that "meta-communities" are the communities identified by the community detection algorithm, and I would therefore suggest calling them simply communities. (Admittedly, this creates another problem, as Reddit is a community in itself! Perhaps the solution is to call Reddit a social platform, and "meta-communities" communities. In any case, the authors should be more careful with the terminology.)
- Page 4: a reference to a table is missing ("Table ??").
- Pages 4-5, explanation of the power-law fit. This part is quite interesting and deserves more discussion. First of all, what does the fact that the power-law exponent scales so well with alpha tell us? Can some information about the underlying topology be extracted from this?
Second, as alpha approaches zero, the exponent of the power-law fit approaches 2 according to Fig. 3 (top right), and not zero as reported in the text.
Third, for very small alphas, I understand that most links are filtered from the network, not retained (as indicated in the text). The network should thus be mostly disconnected, not fully connected.
- Some references could be added, for instance:
+ One on scale-free network theory:
Albert, R., & Barabási, A. L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47.
+ One general reference on network theory, for instance to point towards definitions of the clustering coefficient or the shortest path length.
The paper employs methods from network science to understand the structure of the Reddit interest network. Using a backbone extraction method to subset the raw network data into important links at different significance levels, the analyses find evidence of the scale-free, small-world, and modular network structure often found in other online social networks. In addition to these empirical findings, the paper provides network visualization maps to encourage navigation of online information communities. This large-scale network analysis of Reddit data demonstrates that this online social network exhibits features found in many other OSNs and, importantly, also shares anonymized data for future research. However, the paper would benefit from a clearer motivation for why network visualization is appropriate in this context, revised data analysis to address concerns about distribution fitting, and re-focusing either away from or towards the dynamics of users' interests over time. These concerns lead me to recommend that the authors undertake major revisions before the manuscript is accepted.
"Maps" are a central metaphor motivating the present study, including the introductory research question "Is it possible to build a map to aid users in the discovery of relevant interests on social networks?" Drawing on the information visualization literature [1-3], especially around knowledge mapping [4,5], I would have liked to see the paper make a stronger case for why spatialization is a relevant metaphor for information relevance and relatedness (see [6] for alternatives). I don't think this is a bad metaphor, but the question of whether it's "possible" has a trivial answer; instead, I would encourage the authors to reflect more thoughtfully on the potential and limitations of network visualizations [7-10] for supporting their suggestion that "the integration of such interest maps into popular social media platforms will assist users in organizing themselves". Ideally the paper would have conducted some sort of usability study to assess lay users' ability to interpret these maps, but even as an empirical analysis of visualizing relational data, the paper misses some important limitations and considerations.
The paper finds evidence of the inter-subreddit network having scale-free, small-world, and modular community structure, and attributes the scale-free property to a preferential attachment process (pg. 2). I find the attribution to preferential attachment unconvincing given the multitude of distinct mechanisms that generate such distributions [11-13], the absence of any curve-fitting estimation to differentiate among distinct classes of long-tailed distributions [14,15], and the lack of an empirical comparison between a preferential attachment model and the observed data. Indeed, linear (naive) preferential attachment models are unable to generate the modularity and clustering seen in these data, which would imply that preferential attachment alone is not an appropriate model for the observed results [16,17].
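To make this point concrete, the following minimal simulation (my own illustration, assuming the standard "repeated endpoints" implementation of the Barabási-Albert model; none of these names come from the paper) shows that linear preferential attachment produces hubs but essentially no clustering:

```python
import random
from itertools import combinations

def barabasi_albert(n, m, seed=0):
    """Linear (naive) preferential attachment: each new node links to m
    existing nodes chosen with probability proportional to their degree,
    via the standard repeated-endpoints sampling trick."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    endpoints = list(range(m))            # seed pool: m initial nodes
    for new in range(m, n):
        targets = set()
        while len(targets) < m:           # m distinct targets per new node
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]         # keeps sampling degree-proportional
    return adj

def avg_clustering(adj):
    """Mean local clustering coefficient over all nodes (degree < 2 counts as 0)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

adj = barabasi_albert(2000, 3)
print(avg_clustering(adj))  # vanishingly small, unlike typical OSN clustering
```

A run of this sketch yields hubs with degrees an order of magnitude above the mean, yet an average clustering coefficient near zero, which is exactly why a clustering/modularity comparison against the observed network would strengthen or refute the preferential attachment claim.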
I appreciated the inclusion of a sensitivity analysis (Fig. 3) for different cut-off values in the backbone extraction, but these charts would benefit from indicating where the 0.05 cutoff value falls, as well as from sub-figure labels (3a, 3b, etc.) for reference. The specification of the ER null model also lacks details on how it was parameterized in the context of a weighted network. There is a broken reference to a table on pg. 4. Related to my concerns about the exponent of the power-law fit, there are no details on the estimation procedure, and the reference to an R^2 value indicates that a least-squares procedure was inappropriately used instead of MLE [14,15]. Some statistical test producing a p-value was conducted, but again there are no details about it, which is important given that the skewed nature of the data suggests the need for non-parametric tests [18].
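For reference, the continuous-case MLE from Clauset et al. is a one-line estimator; the sketch below (my own illustration; the full procedure additionally selects x_min by minimizing the KS distance) recovers a known exponent from synthetic data:

```python
import math
import random

def powerlaw_mle_alpha(data, x_min):
    """Continuous MLE of the power-law exponent (Clauset et al. 2009):
    alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the tail x_i >= x_min."""
    tail = [x for x in data if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Sanity check: draw from a pure power law with alpha = 2.5 by inverse
# transform sampling, then recover the exponent from the sample.
rng = random.Random(0)
alpha_true, x_min = 2.5, 1.0
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0))
          for _ in range(100_000)]
print(round(powerlaw_mle_alpha(sample, x_min), 2))  # close to 2.5
```

In practice the `powerlaw` package [15] wraps this estimation together with x_min selection and likelihood-ratio comparisons against lognormal and other heavy-tailed candidates, which is what the paper's fitting section should report.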
The backbone extraction method, while valuable for data visualization purposes, systematically throws away information about weak ties in the graph. Similarly, the choice to project into a one-mode network rather than preserve the two-mode relationships also loses information, despite the availability of bipartite metrics [19]. Was there a motivating rationale for using this method as opposed to other graph sparsification or backbone extraction methods [20-22]? The methods indicate that the backbone extraction method was applied after an initial global thresholding step of excluding users with fewer than 10 posts, which is the kind of approach the Serrano, et al. paper criticizes. Wouldn't other methods for thresholding ties based on more meaningful features, such as posts within a date range (to capture changing rather than aggregated user interests) or posts since account creation (to weed out throwaway accounts), produce more meaningful sub-community linkages?
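For concreteness, the Serrano, et al. disparity filter can be sketched as follows (my own simplified implementation for an undirected weighted edge list; the convention for degree-one nodes varies across implementations, and here they are treated as non-evaluable):

```python
from collections import defaultdict

def disparity_backbone(edges, alpha=0.05):
    """Keep edges that are significant under the disparity filter: for a
    node of degree k with normalized edge weight p = w / strength, the
    p-value under the uniform null model is (1 - p) ** (k - 1). An edge
    survives if it is significant from either endpoint."""
    strength = defaultdict(float)
    degree = defaultdict(int)
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        degree[u] += 1
        degree[v] += 1

    def significant(node, w):
        k = degree[node]
        if k <= 1:          # not evaluable; rely on the other endpoint
            return False
        p = w / strength[node]
        return (1.0 - p) ** (k - 1) < alpha

    return [(u, v, w) for u, v, w in edges
            if significant(u, w) or significant(v, w)]

# Toy example: a hub with one dominant edge and twenty weak ones; only
# the dominant edge survives at alpha = 0.05.
edges = [("hub", "big", 100.0)] + [("hub", "n%d" % i, 1.0) for i in range(20)]
backbone = disparity_backbone(edges)
print(backbone)  # [('hub', 'big', 100.0)]
```

Because the filter conditions on each node's local weight distribution rather than a global threshold, spelling out how it interacts with the prior global 10-post cutoff would clarify whether the criticism in the Serrano, et al. paper applies to the pipeline used here.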
The discussion claims the map allows users to explore related interests by providing a "*dynamic* view of interests on the social network." This statement is surprising, as there was no discussion of time in the preceding analyses; the maps are actually very much cross-sectional samples and do *not* reflect evolving interests or trajectories. I think the question of user dynamics over time on the site could be fascinating from theoretical perspectives such as socialization, recruitment, and persistence, but the analyses in the paper only provide cross-sectional evidence of large-scale structural patterns. The paper references previous Reddit research, but excludes some more recent examples [23,24]. Did this large-scale collection of user data follow the relevant policies from Reddit's terms of service and privacy policies, and MSU's human subjects research protections?
[1] Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages (pp. 336-343). IEEE.
[2] Keim, D. A. (2002). Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1), 1-8.
[3] Healy, K., & Moody, J. (2014). Data visualization in sociology. Annual Review of Sociology, 40, 105-128.
[4] Börner, K., Chen, C., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37(1), 179-255.
[5] Börner, K. (2010). Atlas of Science: Visualizing What We Know. The MIT Press.
[6] Hecht, B., Carton, S. H., Quaderi, M., Schöning, J., Raubal, M., Gergle, D., & Downey, D. (2012). Explanatory semantic relatedness and explicit spatialization for exploratory search. In Proc. SIGIR (pp. 415-424). ACM.
[7] Welles, B. F., & Meirelles, I. (2014). Visualizing computational social science: The multiple lives of a complex image. Science Communication, 1075547014556540.
[8] Bender-deMoll, S., & McFarland, D. A. (2006). The art and science of dynamic network visualization. Journal of Social Structure, 7(2), 1-38.
[9] Freeman, L. C. (2000). Visualizing social networks. Journal of Social Structure, 1(1), 4.
[10] McGrath, C., Blythe, J., & Krackhardt, D. (1997). The effect of spatial arrangement on judgments and errors in interpreting graphs. Social Networks, 19(3), 223-242.
[11] Mitzenmacher, M. (2004). A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2), 226-251.
[12] Andriani, P., & McKelvey, B. (2009). Perspective: From Gaussian to Paretian thinking: Causes and implications of power laws in organizations. Organization Science, 20(6), 1053-1071.
[13] Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5), 323-351.
[14] Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661-703.
[15] Alstott, J., Bullmore, E., & Plenz, D. (2014). powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE, 9(1), e85777.
[16] Vázquez, A. (2003). Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E, 67(5), 056104.
[17] Leskovec, J., Kleinberg, J., & Faloutsos, C. (2007). Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 2.
[18] Corder, G. W., & Foreman, D. I. (2009). Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach. John Wiley & Sons.
[19] Latapy, M., Magnien, C., & Del Vecchio, N. (2008). Basic notions for the analysis of large two-mode networks. Social Networks, 30(1), 31-48.
[20] Foti, N. J., Hughes, J. M., & Rockmore, D. N. (2011). Nonparametric sparsification of complex multiscale networks. PLoS ONE, 6(2), e16431.
[21] Mathioudakis, M., Bonchi, F., Castillo, C., Gionis, A., & Ukkonen, A. (2011). Sparsification of influence networks. In Proc. KDD (pp. 529-537). ACM.
[22] Macdonald, P. J., Almaas, E., & Barabási, A.-L. (2005). Minimum spanning trees of weighted scale-free networks. EPL (Europhysics Letters), 72(2), 308.
[23] Buntain, C., & Golbeck, J. (2014). Identifying social roles in reddit using network structure. In Proc. WWW (pp. 615-620).
[24] Leavitt, A., & Clark, J. A. (2014). Upvoting Hurricane Sandy: Event-based news production processes on a social news site. In Proc. CHI (pp. 1495-1504). ACM.
The article meets the PeerJ criteria and should be accepted.
There are a few typos to be corrected:
Page 4, line 8 from bottom: "Table ??"
Page 8, second paragraph, line 3: "if larger ..." should be "is larger"
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.