A Survey on Graph Representation Learning in Computer Vision: Perspectives on the Granularity of Visual Entities and Tasks


Abstract

Background: Graph representation learning has become a powerful paradigm for modeling the relational structure of visual data. Unlike traditional grid-based representations, graphs offer a flexible way to model non-Euclidean structures, capturing long-range dependencies while supporting structured reasoning. Existing surveys are organized mainly by task or by data modality, leaving the field without a unified conceptual framework for understanding how different graph schemas interact and shape visual reasoning.

Methodology: Following standard literature search guidelines, we collected studies published between 2017 and 2025 in major computer vision and machine learning conferences and journals, using targeted keywords related to graph representation learning. From this corpus, we identified representative works covering seven graph schemas and extracted their graph designs, learning mechanisms, and task-level findings to build our taxonomy.

Results: Our analysis identifies seven fundamental graph schemas (pixel, point, skeleton, object, region, image, and label graphs) that form a hierarchy ranging from low-level spatial detail to high-level semantic abstraction. For each schema, we review representative models, discuss graph construction strategies, and compare learning mechanisms, including graph convolutional networks, graph attention networks, spatiotemporal graph neural networks (GNNs), and graph transformers. Challenges common to all schemas include reliance on predefined topologies, uniform neighbor aggregation, difficulty in modeling long-range dependencies, and limited scalability to large graphs. Emerging solutions include graph structure learning, attention-based and adaptive message passing, hierarchical and multiscale graph designs, and the integration of graph reasoning into transformer architectures.
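To make the pixel-graph schema and the "uniform neighbor aggregation" limitation concrete, the following is a minimal sketch, assuming NumPy, a 4-connected pixel grid, and the standard graph convolution rule of Kipf and Welling with self-loops and symmetric normalization. The function names are hypothetical and the code is not taken from any of the surveyed models.

```python
import numpy as np

def grid_pixel_graph(h, w):
    """Hypothetical helper: adjacency matrix of a 4-connected pixel graph for an h x w image."""
    n = h * w
    adj = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c  # row-major node index of pixel (r, c)
            if c + 1 < w:  # edge to the right neighbor
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < h:  # edge to the neighbor below
                adj[i, i + w] = adj[i + w, i] = 1.0
    return adj

def gcn_layer(adj, x, weight):
    """Hypothetical helper: one GCN step, ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees including self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # fixed weights set only by topology
    return np.maximum(norm @ x @ weight, 0.0)

# Usage: a 3 x 3 image with 3 features per pixel, projected to 4 features.
adj = grid_pixel_graph(3, 3)
x = np.random.rand(9, 3)
w = np.random.rand(3, 4)
print(gcn_layer(adj, x, w).shape)
```

Every neighbor's contribution is fixed by the normalized adjacency rather than by the content of the features, which is the uniform aggregation that attention-based and adaptive message passing aim to relax.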

Conclusions: This survey presents a unified taxonomy of graph representation learning in computer vision, synthesizing methodological trends across diverse graph schemas. By clarifying the structural roles of graphs from pixels to labels, we provide a coherent framework for understanding how graph-based reasoning advances visual recognition, geometric understanding, and scene-level interpretation. Persistent challenges, such as adaptive graph construction, efficient reasoning on large graphs, and standardized evaluation, point to important directions for future research. The insights presented here aim to guide the development of more scalable, interpretable, and context-aware graph-based solutions.
