Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

  • The initial submission of this article was received on May 13th, 2024 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on August 25th, 2024.
  • The first revision was submitted on August 13th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • A further revision was submitted on October 27th, 2025 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on November 10th, 2025.

Version 0.3 (accepted)

· · Academic Editor

Accept

At the outset, both reviewers raised very serious objections to the main idea of morloc and to the general plan of the manuscript. Reviewer 1 maintained his/her criticisms, but Reviewer 2 narrowed his/her requests, which were fully satisfied.

[# PeerJ Staff Note - this decision was reviewed and approved by Claudio Ardagna, a PeerJ Section Editor covering this Section #]

Version 0.2

· · Academic Editor

Minor Revisions

The author made substantial improvements to the manuscript. I am sure that the remaining minor suggestions and necessary changes will be easily made by the author as well.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

·

Basic reporting

The author has significantly improved the manuscript after the initial peer review, and the research is still presented well. The manuscript focuses on the new programming language, with several examples and a case study that help to illustrate the advantages of the proposed approach.

The revised paper reads well, and I think it focuses on features of a polyglot programming language, rather than a workflow system. I agree with this change, and this softens the need to address other workflow management system features as in my first review.

Related work has been expanded and is well phrased. My previous review has been taken into consideration: the bioinformatics section is of a reasonable length, some limitations of morloc are acknowledged, and significant new work on workflow system comparison and performance measures has been added.

Experimental design

The design decisions for the morloc system are explained in detail. Some of these are opinionated choices and could be better backed by evidence or literature, but ultimately, for a new programming language, having a clear set of coherent design principles is the top priority, and the author has justified most of these design decisions well.

The main argument is that composing a scientific pipeline from algorithmic code combined across programming languages in a near-shared-memory environment, rather than by passing files (as is traditional in bioinformatics), would give higher performance and spare pipeline authors and tool developers from dealing with file formats.
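The contrast behind this argument can be sketched in plain Python. This is a toy illustration only (the step function and file handling here are hypothetical, not morloc code): the file-based style serialises every intermediate result to disk and re-parses it downstream, while the in-memory style passes the result object directly.

```python
import json
import os
import tempfile

def count_kmers(seq, k=3):
    """Toy analysis step: count k-mers in a sequence."""
    return {seq[i:i + k]: seq.count(seq[i:i + k]) for i in range(len(seq) - k + 1)}

# File-based composition (traditional bioinformatics): each step
# writes its output to disk and the next step re-reads and re-parses it.
def file_based(seq):
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(count_kmers(seq), f)
        path = f.name
    with open(path) as f:
        counts = json.load(f)  # downstream step pays the parsing cost again
    os.unlink(path)
    return max(counts.values())

# In-memory composition (the morloc-style argument): the result object
# is passed directly, with no serialisation or format negotiation.
def in_memory(seq):
    return max(count_kmers(seq).values())

assert file_based("ACGTACGT") == in_memory("ACGTACGT") == 2
```

Both styles compute the same answer; the difference is only in the serialise/parse round trip, which is the overhead the manuscript argues against.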

Validity of the findings

The author has added a detailed performance comparison of a pipeline implemented equivalently in morloc, Python and Snakemake, as well as benchmarks of different language bridges. This justifies the claims of performance gain, as requested in the previous peer review.

The author has created morloc with the broader aim of scientists using algorithmic combinations rather than compositions of command line applications; in the morloc documentation https://morloc-project.github.io/docs/ this is summarised as a goal to deprecate all bioinformatics file formats and command line applications, with a more moderate phrasing of the same aim in the Discussion. Although this is a laudable goal, it is very optimistic and perhaps does not recognise the complexity of the ecosystem of bioinformatics open source tools, their long development history and varied use. The bioinformatics communities have previously developed large in-language libraries like BioPerl and BioPython, which largely tried to stay within the same computational process/memory.

That approach was largely abandoned in favour of the application-centric one, which gives the flexibility not just of combining different programming languages (to which morloc provides a neater solution), but of clearer isolation for debugging, performance, maintenance and testing, as well as the ease of swapping implementations when the science changes. Tools are written in different languages for different reasons, e.g. compatibility, optimisation, extension capabilities and integrations.

While morloc provides an environment to seamlessly mix and match programming languages, the manuscript does not fully recognise the reasons above for why one would want to combine many languages. The current text seems to imply that a single pipeline author is writing all their own algorithms, but presumably a single author who writes their own algorithms would not need to use multiple programming languages. What accelerated bioinformatics was open source development and the ease of sharing applications, and where I think the author's vision is going is for such code to instead be shared at an algorithmic/library level with shared data types.

I requested better coverage of reproducibility, which the author declined to reflect on fully in this revision. Their suggested move away from containerised command line applications to, effectively, source code of algorithmic functions would go against current guidance for Research Software Engineers (e.g. https://doi.org/10.1371/journal.pcbi.1009823, https://doi.org/10.1371/journal.pcbi.1005412) that focuses on packaging, versioning, build tools and file formats.

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.

It is not clear how moving to an algorithm-only approach would solve many of the portability issues, as a pipeline author would now not only have to disentangle the existing code's build system and find its dependencies to make it work with morloc, but would also struggle to share their workflow with anyone else, as they would have to do the same.

While the author argues well the inefficiency of every tool author creating command line options and file formats, the corresponding question of how tool authors would supply their algorithmic code for others to use with morloc is unclear. For instance, how would their custom object types be versioned, distributed and documented? How would compile options, such as hardware optimisations (GPU, SSE, AVX, AMX), be handled? How would an established and large open source program (e.g. GROMACS) need to be adapted for use by morloc? How would library dependencies of existing code be communicated to morloc users? How would version differences be handled?

To a large degree, the demonstrated use of morloc agents within an existing workflow system like Nextflow can also solve many reproducibility challenges, as each agent can have a different container, which can have its own dependencies etc. In this case morloc operates as glue for efficiency and sharing of data types. This possibility is argued in the response to reviewers, but not highlighted in the manuscript.

When trying the morloc tool myself, it now uses containers for itself, which simplifies installation and use greatly.

Additional comments

It can be argued that it is not the job of morloc to solve these reproducibility issues, just as the gcc compiler does not solve such problems for the C++ language. Therefore, while my initial review requested several workflow-centric considerations, I am now not requesting any additional morloc features, asking primarily for the limitations on algorithm sharing and reproducibility to be acknowledged when moving to an algorithm-centric approach.

For instance in Future Work or Conclusion:

"While morloc is primarily a polyglot programming language, it was developed with the aim of optimising pipeline building at an algorithmic level. However, it is worth noting that a move to such an ecosystem for a field like bioinformatics comes with many additional challenges. Established workflow systems like Nextflow and Snakemake have largely addressed wider workflow engine concerns (e.g. reproducibility and interoperability) by containerised execution of UNIX-like command line applications that read and write files. In a morloc-like approach, the current practices for research software development would need to be augmented to cover mechanisms like versioning, documenting and sharing of polyglot libraries of algorithms, data types, and dependencies across programming languages."

Other changes could reflect some of my suggestions under "Validity of findings" but I have marked this as minor recommendation.

Version 0.1 (original submission)

· · Academic Editor

Major Revisions

The reviewers worked hard and proposed many possible improvements. I hope together with them that the author will be able to work hard and make the BIG changes.

[# PeerJ Staff Note: It is PeerJ policy that additional references suggested during the peer-review process should *only* be included if the authors are in agreement that they are relevant and useful #]

·

Basic reporting

The text proposes a new programming language, "morloc", aimed as a glue language among a diverse set of different languages to compose workflows, where a set of data has to be processed through a sequence of steps, each potentially programmed in a different programming language.

The presentation is clear and well motivated. The problem the language tries to solve is relevant, and the proposed techniques might work. The main issue with the presentation is that it is often too abstract. A key idea of the paper is that morloc can "import" functions from different languages into its own programs, but how that is done is never really explained. For instance, a function in Python lives inside the interpreter; how to import that? Does morloc link against the Python interpreter? Similarly, in the case study, the paper mentions how to integrate the "phylogenetic tree algorithm". But, as the paper makes clear, that algorithm is coded as a single stand-alone program. So, how to get a function to perform that task? How much work is needed to convert a program into a function? Similarly, the paper proposes the use of JSON as a "universal format", but again it does not give the details of how that works down the road. For instance, it mentions in Line 258 that a function could receive "files containing JSON formatted data" as arguments. How does that work? Does it receive a file name, and then the function itself is responsible for reading the JSON data?

The paper should give a much clearer picture of how the final workflow is structured. Does the morloc compiler generate a single final program? In the case study, there seem to be different parts of the program connected through external files (shell code after line 277); shouldn't that connection be done by morloc?

Other comments:

- Captions should only give a brief description of a figure. In the paper, several captions form almost a complete subsection, being an essential part of the narrative.

- The case study gives too much attention to Biology in detriment to the more computational aspects of the tool.

- As a good part of the design of morloc centers around a type system to interface multiple languages, the related work should include a discussion about Interface Description Languages (IDLs).

Experimental design

This part of the paper is very weak in its current state. There is practically no evaluation of the design. There is only one single case study. More importantly, there is no data about performance. The paper assumes that minimizing inter-language calls is the way to better performance, without any further considerations. (By that criterion, if Python is the only language that offers implementations for all modules of a workflow, the best result would be to get all implementations from Python, minimizing inter-language calls, even if some algorithms can be orders of magnitude faster in C++.) The case study also lacks an estimation of the work done to implement the whole case in morloc, including the work needed to adapt the modules in other languages. (For instance, the work needed to convert the "phylogenetic tree algorithm" from a program into a function.) The paper needs much more solid data to argue that morloc is really a better alternative for programming workflows, and probably a data-based discussion about what kinds of workflows it is particularly suited to.

Validity of the findings

As it is, the paper proposes an interesting path, but with no data to support its viability. When we read the conclusion, it sounds like a (good!) project proposal, not as the final conclusion of a study.

Additional comments

I am marking it for Reject because I think that "major revision" implies a kind of agreement that, if the author does what was asked, the paper should be accepted, and I cannot assume that. I think this work has interesting ideas worth pursuing.

·

Basic reporting

In general the reporting has clear and consistent language. The research is presented well with reasonable argumentations and good explanations. Literature is mostly covered well, see suggestions below.

It is good to introduce a real-life example that matches the motivation, however consider that most PeerJ CS readers will not be bioinformaticians.
The article flow suffers from going too much in depth in the bioinformatics use case (>7 pages); consider the reader from a Computer Science perspective.
The abstract does not reflect this detailed walk-through and it is not clear why it is appropriate for the PeerJ CS reader.

Try to explain bioinformatics terms like "clades" and "phylogenetic tree" when first introduced.
Make it clearer that in this example, the actual phylogenetic tree is represented by morloc data structures, the text wrongly seems to imply this is simply a JSON structure. To do this you may need to give a mini-introduction to what are the elements of the tree.

Figure 3 caption says "Generate" but figure has "Code Gen", make consistent.


Figure 4: the type annotations "RootedTree Clade Real a" are confusing as to what they apply to, as they overlap three objects, and it may seem like `RootedTree` on Make Tree is separate from `Real a` on Classify. Make the font of these types smaller and monospaced, and perhaps change the 5 yellow data dots to use different colours that can then match the 5 types.

For citing CWL, instead of the specification (Amstutz 2016), cite journal paper (Crusoe 2021) <https://doi.org/10.1145/3486897>

Replace Blankenberg 2010 with newest Citation for Galaxy <https://doi.org/10.1093/nar/gkae410>


Minor suggestions:

Typo p15 l 396: "This new specialized function is ignores" -- delete "is"

Typos in references: Capitalize Arvados, Common Workflow Language, Python, C, C++, SWIG, Haskell, EDL, BioNix, octoFLU, US Swine, Cuneiform, BioPython, R, BioPerl, Acro, etc.

References: Please add DOIs for all academic publications. Add URLs for web references like Google Open Source Blog.

Broken reference 6(10.7490):f1000research. - should be: Voss K, Gentry J and Van der Auwera G. Full-stack genomics pipelining with GATK4 + WDL + Cromwell [version 1; not peer reviewed]. F1000Research 2017, 6(ISCB Comm J):1379 (poster) (https://doi.org/10.7490/f1000research.1114631.1)

Section 4.2 The more classical CORBA and WSDL should also be mentioned for having strong typing systems with code generation/bindings; WSDL was heavily used for bioinformatics services in the 2000s <https://doi.org/10.1093/bib/6.2.178>. In all of these RPC systems, the responsibility of a workflow system moves from coordinating files and local processes to an RPC client that orchestrates calls across distributed services, which adds robustness challenges <https://doi.org/10.1109/eScience.2012.6404482> that are even more present in the modern world of cloud-based executions <https://doi.org/10.1038/s41598-021-99288-8>.

Section 4.2 should also consider SHIWA's approach to fine-grained and coarse-grained interoperability <https://doi.org/10.1016/j.future.2014.02.016> from meta-workflows.

Experimental design

The author argues well their motivation for creating morloc as a type-based workflow language that can work across programming languages.

The overall structure and inner mechanisms of morloc are explained in detail, with sufficient figures and examples.

The morloc system builds on Haskell, which has a mature and powerful type system. Beyond polymorphism, it is however unclear which other aspects of Haskell apply or don't apply to morloc, e.g. lazy evaluation.

It is stated that the system can only reliably cope with strings as ASCII; this is a disappointing and predictable source of interoperability errors. Given that Unicode 2.0 was released in 1996, and languages like Java and C# have used Unicode since their infancy, I would have expected internationalisation to be crucial for morloc's type system.

Validity of the findings

The implications of using morloc are shown by example with the running use case. While this helps give the text an understandable flow, I would have appreciated further critical discussion about the limitations and scope of morloc after section 4.2. For instance, one challenge with morloc is that it is an experimental prototype, which has not yet built a community of users and mature workflows; meanwhile the workflow systems cited (Galaxy, Nextflow, Snakemake) have large user and developer communities with mature workflows used in production. It is not clear which subset of such workflows would be natural to consider for moving to a morloc-style workflow language (e.g. those that are more algorithmic).

Performance gains from using morloc over cross-language pipelines are hinted at, but no performance numbers are provided. Remove or evidence such theoretical speculations. There is no description of how morloc can or cannot be used in a parallelised or distributed setting (e.g. multiple cores or cloud compute nodes) and how this would affect the implementations combined. This is a major feature of most workflow systems used in bioinformatics, and inefficiencies in such orchestration (e.g. file transfers) can be much more significant than slow-downs from intra-process communication due to different programming languages.

Morloc relies heavily on a type system. This comes as a "tax" on users, who would need to think more deeply about data structures than in their typical regex-style approach. As education in this kind of thinking is limited to computer scientists rather than domain scientists like bioinformaticians, this adds an adoption challenge for potential users of the system, who would traditionally be "smashing" together command lines until the pipeline works. There is also an assumption here that the pipeline author has deep knowledge of the functions of the components of the pipeline, but these are (at least in bioinformatics) typically written by someone else.

A dataset of the Flu Case study in morloc is provided, but the full extent of this is not detailed sufficiently by the paper, except in Figure 7. This is an interesting comparison of the "same" pipeline in morloc, Snakemake, Nextflow and bash. I think this should be given its own section and discussed further, beyond just syntactic differences. For instance, Snakemake and Nextflow have support for using Conda and containers to capture the executed tool, while it is unclear how build dependencies of a morloc step can be made reproducible. The requirement to make command line tools for the other systems, while a "burden" for tool authors, can also help testing, as each can, for instance, be unit tested from GitHub Actions. It should be possible to also unit test morloc modules, but as described it sounds like only integration testing at the workflow level is possible.

I am concerned about reproducibility challenges which are not addressed by the manuscript. It sounds like a small modification to the workflow can change the linking to different programming languages and other implementations. How can workflow authors be sure that alternative functions are 1:1 interchangeable, and how do they know which one was used? Can the user mark non-equivalent functions, to force the choice to be explicit?

Given the complex heuristics for choosing implementations, it seems odd that an error is thrown when two equal-priority implementations are given. Perhaps a predictable fallback, like definition order for equivalent functions? Or how can users take control of this priority? It sounds like it is important for the scientific validity of the pipeline to know which implementations have been used.
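A deterministic fallback along these lines could look like the following sketch. This is a hypothetical registry in Python, not morloc's actual resolution logic; the record fields (`priority`, `index`) and implementation names are invented for illustration.

```python
def pick_implementation(candidates):
    """Choose among implementation records of the form
    {"name": ..., "priority": ..., "index": ...}.

    Highest priority wins; ties are broken deterministically by
    definition order (lowest index) instead of raising an error.
    """
    return min(candidates, key=lambda c: (-c["priority"], c["index"]))

impls = [
    {"name": "align_py",  "priority": 1, "index": 0},
    {"name": "align_cpp", "priority": 2, "index": 1},
    {"name": "align_c",   "priority": 2, "index": 2},
]

# The C and C++ versions share the top priority; the earlier
# definition (align_cpp) is chosen predictably.
assert pick_implementation(impls)["name"] == "align_cpp"
```

Logging which record was selected would also address the scientific-validity concern above, since the chosen implementation becomes inspectable rather than implicit.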

Additional comments

Overall this article gives an in-depth description of a novel mechanism for writing pipelines across programming languages. However, I think the manuscript requires further major improvements before it can be published:

1. Reduce the use case's bioinformatics text or ensure it helps the overall manuscript
2. Be more self-critical of the proposed solution and acknowledge its limitations or focused scope.
3. Expand the comparison with other workflow systems, e.g. based on the supplemental data
4. Avoid unjustified arguments of performance gains
5. Include honest indication of maturity of code base and community to better inform potential workflow users

I've included other suggestions in this review which I do not consider critical for acceptance.


### Technical suggestions

Morloc's packable sounds somewhat equivalent to Java's Serializable, but given that a composition of any type can be returned, how does morloc know whether this converges to a serialisable data structure? Why not go directly to JSON, if this is the underlying transport mechanism anyway?
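The convergence question can be made concrete: a composite value with cycles has no JSON representation, so a serialiser must detect this rather than recurse forever. A small Python illustration (not morloc code; the tree shape is invented):

```python
import json

# An acyclic composite of dicts and lists serialises and round-trips fine.
tree = {"clade": "H1", "children": [{"clade": "H1a", "children": []}]}
assert json.loads(json.dumps(tree)) == tree

# A cyclic structure never "converges" to a serialisable form:
# json.dumps detects the cycle and raises instead of looping forever.
node = {"clade": "root", "children": []}
node["children"].append(node)  # child points back to its parent
try:
    json.dumps(node)
    raised = False
except ValueError:
    raised = True
assert raised
```

Whether morloc's packable machinery performs an analogous cycle/convergence check, or rules such types out statically, is exactly the detail the question above asks for.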

In terms of supported programming languages, why are Java or Rust not considered? This could be argued, for instance, by scoping based on which languages are most used in a given field like phylogenetics.

The text uses the form "we" (particularly in Future Work), but this is a single-author paper. There is no indication of a morloc open source community in the text, although additional contributors are listed in GitHub. Could this be expanded on, and perhaps include these as co-authors?

Section 3.2 explains the algorithm in detail, but it is unclear why this is included, as it does not seem to explain any morloc feature or challenge. These walk-through details can be removed when they do not add anything specific to the paper. This section could be removed or condensed to add considerations on external services, noting that their added complexity, fragility and execution time are not currently reflected in the morloc weighting for choosing implementations. (For instance, a C++ function that calls Entrez APIs with libcurl will be slower and more fragile than a Python function that looks up a local Entrez file.)

Section 3.3 p13 - it can be worth pointing out that different algorithms can have large differences in computational complexity, and that the morloc weighting does not currently compare these using analysis of call depth etc. Similar to above, a Python function in O(log n) can be more efficient than a C++ function in O(n^2).
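The point can be illustrated with a toy operation count in Python (illustration only; not morloc's actual cost model). Asymptotics dominate constant factors once inputs are large, so a language-based weighting alone can pick the slower implementation:

```python
def linear_count(haystack, needle):
    """O(n) scan: returns the number of comparisons performed."""
    ops = 0
    for x in haystack:
        ops += 1
        if x == needle:
            break
    return ops

def binary_count(haystack, needle):
    """O(log n) binary search on sorted input, counting comparisons."""
    ops, lo, hi = 0, 0, len(haystack)
    while lo < hi:
        ops += 1
        mid = (lo + hi) // 2
        if haystack[mid] < needle:
            lo = mid + 1
        else:
            hi = mid
    return ops

data = list(range(1_000_000))
# Even with a 1000x constant-factor handicap (e.g. interpreter overhead),
# the logarithmic algorithm performs vastly fewer operations here.
assert binary_count(data, 999_999) * 1000 < linear_count(data, 999_999)
```

A weighting scheme that only ranks languages (C++ over Python) would miss this, which is why some account of algorithmic cost would strengthen the heuristic.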

Section 3.3 p14 on pass by reference - it is worth also pointing out that it is considered bad practice in pipelines to have mutable data structures, as the internals of a later algorithm may have a side effect on a concurrent pipeline step that is expecting the original input; pipeline steps should aim to be functional so that they can be easily recomposed based on scientific needs.
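The hazard can be shown with two hypothetical pipeline steps in Python (toy functions, not from the manuscript): a step that mutates its pass-by-reference input silently corrupts data that a sibling step still expects to be the original.

```python
def normalise_in_place(scores):
    """Bad practice: normalises by mutating its input as a side effect."""
    m = max(scores)
    for i in range(len(scores)):
        scores[i] /= m
    return scores

def normalise_pure(scores):
    """Functional style: returns a new list, leaving the input untouched."""
    m = max(scores)
    return [s / m for s in scores]

raw = [2.0, 4.0, 8.0]
_ = normalise_in_place(raw)
# A concurrent step expecting the original input now sees altered data.
assert raw != [2.0, 4.0, 8.0]

raw = [2.0, 4.0, 8.0]
out = normalise_pure(raw)
# The pure version leaves the shared input intact and is safely recomposable.
assert raw == [2.0, 4.0, 8.0] and out == [0.25, 0.5, 1.0]
```

This is why pass-by-reference for performance, as discussed in section 3.3, trades away exactly the recomposability that pipelines rely on.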

How are dependencies of third-party languages handled? Snakemake and Nextflow support Conda and containers per step; here morloc assumes a Python module for each. In my reproduction of the example, it failed on the "tree" module (see below), possibly because my runtime environment did not exactly match the pipeline author's.


### Software citations

The author should be commended for using Zenodo's archiving of GitHub, but has not cited the corresponding DOI in the paper. Please add a reference with the software citation <https://doi.org/10.5281/zenodo.11174477> to the paper, and list this DOI in addition to the current GitHub link.

You may wish to provide more accurate author information for contributors "melody!" and "Zach1031" to include these in the citation. Use CITATION.cff <https://citation-file-format.github.io/> to customize Zenodo's author info.


### Improve Reproducibility

I understand morloc is at an early `0.48` version (flagged as _experimental_ on GitHub), but as a previously capable research software engineer with some Haskell knowledge, I was hoping to be able to try it.

I struggled to install morloc on Linux Mint 21.3, as it does not work with the distribution's `haskell-stack` <https://github.com/morloc-project/morloc/issues/23>. The Docker container would be helpful to new users not familiar with the Haskell ecosystem. Please update the README to also detail which operating systems and distributions are supported by morloc.

Morloc finally built, but the second example of C++ integration fails <https://github.com/morloc-project/morloc/issues/24>. Make sure the README examples work from top to bottom, with any required installs made explicit.

The examples in <https://doi.org/10.5281/zenodo.11174573> have no human-readable documentation or guide in the download/GitHub <https://github.com/morloc-project/examples/issues/4>, only a minimal description in the Zenodo record with a confusing "Initial release for PeerJ".

As a Zenodo deposit, the dataset must be self-described: at a minimum it should link to the morloc main code base and describe the purpose of the dataset. There should also be a citation from the dataset record/readme to the pipeline origin (Chang et al. 2019). The Zenodo record metadata has a typo, "worlflow". The Dockerfile of the example fails <https://github.com/morloc-project/examples/issues/5>, as it seems it has not been tested from source code. Further debugging ran into additional build issues <https://github.com/morloc-project/examples/issues/6> and I gave up.


I think these challenges can be improved with a relatively small amount of effort, but I see these as another warning sign that the morloc system has not been tested outside the author's setting.
I recommend the author expand on the existing GitHub Action mechanism to set up automated builds of the Dockerfile also of the examples, and to verify the example dataset works on more than one machine.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.