Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on May 11th, 2016 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on June 9th, 2016.
  • The first revision was submitted on July 12th, 2016 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on September 19th, 2016 and was reviewed by 1 reviewer and the Academic Editor.
  • A further revision was submitted on September 22nd, 2016 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on September 23rd, 2016.

Version 0.4 (accepted)

· Sep 23, 2016 · Academic Editor

Accept

I am happy to see the improved manuscript and appreciate the efforts from both the authors and reviewer throughout several rounds of revision.

Congratulations!

Reviewer 1 ·

Basic reporting

Fine.

Experimental design

Fine.

Validity of the findings

Fine.

Additional comments

Fine.

Version 0.3

· Sep 20, 2016 · Academic Editor

Minor Revisions

The editor appreciates the authors' continued effort to improve the manuscript -- and the reviewer's detailed, speedy feedback.

I recommend revising the wording as the reviewer suggests, and look forward to seeing a manuscript that is highly readable for both the computer science and law communities.

Reviewer 1 ·

Basic reporting

Fine.

Experimental design

Fine, subject to the below.

Validity of the findings

Fine, subject to the below.

Additional comments

The caveats in the rebuttal and the changes made to the paper are interesting. The relevant limitations that are acknowledged -- those that make the paper reliant on crude proxies -- mean that the paper's arguments would struggle to be satisfactory in a legal paper or in legal argument. I understand that the methods adopted in this area may be different.

Four specific revisions:
(1) The rule on exhaustion of domestic remedies seems to be poorly articulated, or perhaps misunderstood, at line 152. In particular, its application at the domestic level seems back-to-front, at least as it is explained here. It should be revised.
(2) If there is literature to support the assumption made in lines 108-111, that literature should be noted. Or is it the literature in lines 134-173?
(3) The authors have explained eloquently the limitations on their methodology due to unavailable data. It is therefore jarring to see an expression of belief at lines 116-117 ("we believe there is") that, surely, is unsupported due to those very limitations? Is belief the basis for scientific argument here?
(4) Line 416 says "Large repositories like HUDOC should be easily and freely accessible." This could be more precise: is HUDOC not easily and freely accessible? Isn't the authors' real concern with the fact that HUDOC is a "case law database" and not a "database of case law and other things"?

As I have said before, this is really creative and interesting work and I hope that it will be able to be developed to a point where it is of utility to scholars in all disciplines and to lawyers.

Version 0.2

· Aug 2, 2016 · Academic Editor

Minor Revisions

I'd like to commend the authors for carefully taking into account the reviewer comments and revising the manuscript.

The remaining concerns still regard paying attention to the nuances (from both the legal and CS perspectives) of the claims being made, and painting a realistic picture of how the proposed analysis could be used, or developed into something that legal practice can use.

The availability of data is, and will remain, an ongoing discussion within the law and CS communities. This paper is well-positioned to lead the academic and legal-practice communities in this discussion/debate about the best way to share data that allows the technology to move forward.

I strongly encourage the authors to take into account these comments and send in another revision.

Reviewer 1 ·

Basic reporting

No comments.

Experimental design

I have been asked to re-consider a revised draft article entitled ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’ for ‘PeerJ CS’.

As with the earlier review, this review is prepared by a referee without training in quantitative methods, in computer science, or in natural language processing, and as such should be read with an appropriately-sized grain of salt.

The authors have responded to feedback thoughtfully, which must be commended. But I am afraid that, subject to the caveats above, the article’s legal analysis remains problematic. I don't wish to be unkind or unfair with any of these comments: I look forward to a day when this sort of model can succeed. But this paper needs further work.

(all line references are to the track-changes pdf version)

At lines 88-94, the authors suggest the model can ‘be used to rapidly identify cases’, perhaps in ‘submitted cases’. It remains fundamentally unclear to me what text the authors imagine the model will be operating on to make ex ante predictions, given that it works solely on text in completed judgments. Is it not at least worth *identifying* what the text is that the model would operate on in order to make its ‘rapid’ ex ante predictions? Otherwise, the risk is that the argument will appear to be limited to this: ‘Our model allows us to analyse how the facts are summarized in published cases and predict how, later in that very same published case, the case will be resolved’. This may be interesting in itself, but I’m not sure it’s what the authors want the model to do.

The new section at lines 140-166 seems problematic, unless I have misunderstood key aspects of it.
*First, the authors say that the ECtHR has very limited fact-finding powers (true). But the authors then move on, without citing any authority, to say that this means that ‘in the vast majority of cases, [the ECtHR] will defer…to the judgments of domestic courts that have already heard and dismissed the applicants’ complaint’. This is problematic in various ways (not least that it implies that the domestic courts hear and dismiss complaints on the same legal questions that the ECtHR does, which seems to suggest the ECtHR is an appellate court – more on this below). More problematically, the authors’ logic is unclear: why wouldn’t the ECtHR defer to the summary of the facts prepared by (e.g.) the government lawyers? Moreover, even if it were true that the Court defers in this manner ‘in the vast majority of cases’, surely it should not be difficult to find a range of law journal articles and analysis supporting this legal or procedural proposition? And finally, this logic would suggest that any predictive model should look principally or exclusively at the domestic courts' factual summaries, would it not?
*Second, the authors say that ‘the Court cannot openly acknowledge any kind of bias on its part’ and therefore ‘on their face, summaries of facts…have to be at least framed in as neutral…’. The significance of this point is unexplained. Are the authors arguing that the Court does, in reality, prepare neutral summaries? Or that it may well be biased but that it must hide that bias? How does this help the argument about making ex ante predictions of any sort? Moreover, the authors do not seem to appreciate here that outright bias is only one problem: the bigger problem for their argument is the possibility of perfectly rational differential emphasis by the judges/registry/etc of facts that they know will be significant *because they are also involved in reaching the legal conclusions*. Unless I have misunderstood the ECtHR's procedure, in which case I apologize.
*Third, the absence of disputes before the ECtHR about the facts does not mean that there cannot be different facts emphasized or prioritized by the judges/registry in light of the analysis that they most likely know will follow.
*Fourth, the authors say ‘the “Circumstances” subsection is the closest (even if sometimes crude) proxy we have to a reliable textual representative of the factual background of a case’. Perhaps this is so, but one may wonder about what ‘reliable’ is worth here without a clearer sense of the model's utility. Surely the facts as summarized in government and/or applicant arguments might be worth a look if the goal is to assist the Court/lawyers in making ex ante predictions about how cases will be decided, even if they are not 'reliable' in the way a peer-reviewed paper might be reliable?

At several points (line 325, line 340), the ECtHR seems to be referred to as an appellate court. It is not an appellate court. This means that (1) the range of orders and remedies available to the ECtHR are not those of an appellate court, with consequences for its analysis; (2) the ECtHR will frequently be applying different *legal* tests from those applied by the domestic appellate courts (eg, ‘was there a violation of Art5 or Art6?’ vs ‘was the conviction unsafe?’); and (3) therefore different *facts*, or different emphasis on those facts, may be of interest to the ECtHR than those that were of interest to the domestic court.

Validity of the findings

See above.

Additional comments

Please let me add: this is really creative and interesting work and I hope that it will be able to be developed to a point where it is of utility to scholars and lawyers in all disciplines. Perhaps I misunderstand the mathematical value of the study; perhaps I misunderstand the internal workings of the ECtHR; but at present I am afraid I simply do not think the legal analysis is sufficient.

·

Basic reporting

The authors of the paper have answered the reviewers' comments in a satisfactory manner.
First, and most importantly, the relevant concerns from reviewer 1, who has a strong background in law, were addressed, especially in nuancing the conclusion about the correlation between the Facts and the Court's decision and in using more precise legal writing throughout the paper.
Also, the clarification about the Facts section and its ‘Circumstances of the case’ subsection makes the paper more convincing and easier to follow.
On the technical side, the evaluation metric used was clarified and further experiments with combined feature sets were carried out, making the paper more substantial and methodologically stronger, and improving the overall accuracy of the predictor.

The authors commented during the review process that there are some barriers to accessing the data from the ECtHR portal. Besides, the authors mentioned that accessing cases from domestic courts is not straightforward either.
I would like to see a comment regarding data access issues, perhaps a sentence or two in the conclusion section. Are data access issues stopping or slowing down emerging research such as that presented in this paper? Should ECtHR cases and domestic cases be easily and freely available? If that is the case, is the ECtHR making progress in opening its repositories for the public good?

Looking forward to seeing this paper published in the journal.

Experimental design

No comments.

Validity of the findings

No comments.

Additional comments

No comments.

Version 0.1 (original submission)

· Jun 9, 2016 · Academic Editor

Major Revisions

I very much like the topic of this paper. Both reviewers saw the value of this manuscript although each raised concerns about the reporting, experiments, and interpretation of results.

I strongly encourage the authors to address the concerns of reviewer 1, and to carefully quantify the results and their interpretation in a legal context. I also strongly encourage the authors to take into account the suggestions of reviewer 2: clarifying the content, discussing failure cases so as to gain insight, and not just using the two feature sets separately but also considering combining them.

I look forward to the revised manuscript.

Reviewer 1 ·

Basic reporting

No comments.

Experimental design

This review is prepared by a referee without training in quantitative methods, in computer science, or in natural language processing, and as such should be read with an appropriately-sized grain of salt.
The description, under 'Data', of how cases were selected needs clarification in several ways (lines 165-182).
First, when it is said that Article 3 'Prohibits torture', are we to understand that the study does not cover the other prohibitions contained in Article 3 (such as the prohibition on inhuman treatment)? Precision is important in legal writing and it is important here.
Second, the number of cases seems much lower than what one would expect, and much lower than what a rudimentary search of the HUDOC database generates. For example, a basic HUDOC search of Article 6, in English, generates over 10,000 cases. Of those, HUDOC indicates there are at least 8,000 violation cases and at least 900 non-violation cases. If the methodology at lines 172-177 is replicable, one would expect that this study would have included at least 1,800 cases in its study of Article 6 (being all the cases in the smaller class plus a randomly selected equal number of cases from the larger class; a sketch of this balancing step appears after these points). And yet here the number studied is just 80. It is hard to discern the reasons for the discrepancy between the draft article and the results of a basic HUDOC search.
Third, the reasons for choosing articles 3, 6, and 8 could be substantiated a bit more. Surely *all* of the ECHR rights may be regarded as "important human rights that correspond to a variety of interests" (lines 167-168)? Why focus on these three?
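
For concreteness, a minimal sketch of the balancing step as I read it from lines 172-177 (my reconstruction, not the authors' code; names and dummy case IDs are illustrative):

    import random

    def balance_classes(violation_cases, non_violation_cases, seed=0):
        # Keep every case from the smaller class and draw an equal-sized
        # random sample (without replacement) from the larger class.
        smaller, larger = sorted(
            [violation_cases, non_violation_cases], key=len)
        random.seed(seed)
        return smaller + random.sample(larger, len(smaller))

    # With >= 8,000 violation and >= 900 non-violation cases for Article 6,
    # this yields at least 900 + 900 = 1,800 cases.
    v = [f"v{i}" for i in range(8000)]
    nv = [f"nv{i}" for i in range(900)]
    print(len(balance_classes(v, nv)))  # 1800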

Validity of the findings

This review is prepared by a referee without training in quantitative methods, in computer science, or in natural language processing, and as such should be read with an appropriately-sized grain of salt.
The article claims that "there is a strong correlation between the actual facts of a case and the decisions made by judges" (lines 264-265). I have serious concerns about whether or not this conclusion is substantiated by the data which preceded the Discussion.

First, it is unclear what the authors mean by "the *actual* facts" (line 265). Are the "actual" facts somehow different from "the facts"?

Second, it is not clear what the authors mean when they say "information available to the judges before they make any comments or decisions" (line 180). Are the authors implying that the judgments contain all the information available to the judges before they made their decision? If so, this would seem to be a misunderstanding of how courts work (surely, at the very least, one would want to look at the full written arguments of the parties, rather than simply the summaries of those written arguments that are contained in judgments?).

Third, it seems to me that this study proves, at best, that there is a correlation between the facts *as described in the judgment* and the result of a case. There is a difference between "the actual facts of a case" and "the facts as they are described in the judgment of a case". The article does not acknowledge this difference at all. This is a problem. I'm afraid that the authors seem to be under the impression that the facts section of a judgment is an objective, scientifically established recitation of the facts. Unless the authors are aware of ECtHR practice that I am unaware of, this seems dangerously naive. On my understanding, the judgments of the ECtHR are prepared by the judges, their assistants, and the Court Registry. In any court anywhere around the world, including the ECtHR, it would not be unusual in the slightest for the judges, the assistants, or the registry to frame the facts in light of their full understanding of the case (which would include their view on whether or not there is a violation). Facts sections of judgments are not peer-reviewed scientific papers. They are subjective summaries of the facts, including what the authors think is relevant and what they think is irrelevant. If a judge/judicial assistant/registrar is of the preliminary view that a violation is likely, it would not be at all unusual for them to frame the facts differently than how those same facts would be framed if they were of the view that a violation was unlikely.

This raises a problem: the article involves the authors taking the facts section of a delivered judgment, and then predicting whether or not that same judgment will result in a violation. This may be useful, but it does not seem to be the same as "predicting judicial decisions...using only the textual information available to the judges before they make any comments or decisions about a specific case..." (lines 312-313). The model does not seem to provide any capability for ex ante prediction -- i.e. it does not allow the result of a judicial decision to be predicted until the facts section of the judgment can be analysed (and the facts section of the judgment cannot be analysed until the judgment is handed down). Surely this limits its utility?
Perhaps I misunderstand the mathematical value of the study; perhaps I misunderstand the internal workings of the European Court. But even if I am wrong on the maths or on the workings of the Court, the article needs to be considerably clearer about what it is predicting and about the nature of how judgments are written and prepared. Without that, it is hard to attach too much significance to its findings, I'm afraid.
Fourth, the authors claim that their study amounts to support for legal realism over legal formalism (line 322). This may be so, but a much more sophisticated account (than that at lines 28-37) of the debate about realism and formalism would be needed to draw much of a conclusion here.

·

Basic reporting

This paper is about applying state-of-the-art natural language processing and a machine learning algorithm to build a binary classifier that predicts judicial decisions.

The paper is clearly written and easy to understand. The authors have followed the appropriate document structure and the submission is self-contained.
The authors have made the data set annotations publicly available, and the content of the cases can be downloaded from the European Court of Human Rights (ECtHR) website.

Experimental design

The research question is clearly defined: is it possible to use text processing and machine learning to predict whether, given a case, there has been a violation of an article of the European Convention on Human Rights? The research question is certainly relevant, and the results are interesting for the natural language processing and machine learning community, but it's unclear how these findings are useful for the law and human rights community, since nothing is mentioned in the paper with regard to this question. For example, would it be useful to apply this kind of classifier as a tagging tool for highlighting cases in which violations of human rights are likely, perhaps as a prioritizing or filtering means?
This is more a curiosity than a negative comment.

From the natural language processing and machine learning point of view, the methodology is correct. The authors use well-known features such as bag-of-words and topic models, as well as an appropriate classifier (SVM).
Nevertheless, there are a few weak points that are not explained or addressed in the paper.
- Topic models: in spectral clustering, the number of topics is an input parameter, but nothing is mentioned about how the value of this parameter, in this case 30 topics, was chosen.
- I was expecting to see experiments using both sets of features, bag-of-words and topics, but results are only reported for experiments using one feature set at a time. Why are there no experiments that combine both feature sets? Even if the performance were lower than with the individual feature sets, the outcome would still be useful for the community and should be reported. (A sketch of one way such a combination might look appears below.)
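
For illustration, combining the two feature sets could be as simple as column-wise concatenation before training the SVM. A minimal sketch under that assumption (scikit-learn is used here for convenience; the placeholder data, dimensions, and names are mine, not the paper's):

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    # Placeholder matrices standing in for the paper's features:
    # rows are cases; X_bow holds n-gram counts, X_topics holds the
    # 30 topic weights; y marks violation (1) vs non-violation (0).
    rng = np.random.default_rng(0)
    X_bow = rng.random((80, 2000))
    X_topics = rng.random((80, 30))
    y = rng.integers(0, 2, 80)

    # Concatenate the feature sets column-wise and evaluate as before.
    X_combined = np.hstack([X_bow, X_topics])
    scores = cross_val_score(LinearSVC(), X_combined, y, cv=10,
                             scoring="accuracy")
    print(f"mean cross-validated accuracy: {scores.mean():.3f}")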

Validity of the findings

Accuracy is reported as the evaluation metric used to measure performance, but nothing is said about how accuracy is calculated; a formula or an explanation would be helpful. I'm assuming accuracy should be understood to mean the proportion of true outcomes (true positives and true negatives) among the total number of cases. Please clarify this.
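
On that assumed reading (mine, to be confirmed by the authors), the formula would be, writing TP, TN, FP and FN for true positives, true negatives, false positives and false negatives:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)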

In the discussion section, the authors give examples of how topics aligned with the themes of some of the cases, but it's hard to tell whether those examples come from the dataset used or from another dataset. I figured it out by exploring the dataset itself; a line clarifying this would be helpful.

Please comment on the cases that were wrongly classified: for example, is there any commonality among the wrongly classified instances? What does it mean for the law community to have more than 20% of its cases wrongly classified?
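
A minimal sketch of how the misclassified cases could be pulled out for such an inspection (again assuming a scikit-learn-style setup; the data and names are placeholders, not the paper's):

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_predict

    # Placeholder feature matrix and labels standing in for the paper's data.
    rng = np.random.default_rng(0)
    X = rng.random((80, 2030))
    y = rng.integers(0, 2, 80)

    # cross_val_predict returns, for each case, the prediction made by a
    # model trained on the folds that did not contain that case.
    y_pred = cross_val_predict(LinearSVC(), X, y, cv=10)
    misclassified = np.flatnonzero(y_pred != y)
    print(f"{len(misclassified)} of {len(y)} cases misclassified")
    # The indices in `misclassified` can then be mapped back to case
    # names to look for commonalities among the errors.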

Additional comments

The topic of the paper is very interesting and the paper is easy to follow, but I would like to read about how these results can be used by the law community.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.