Front. Comput. Sci., 2025, 19(11): 1911375
https://doi.org/10.1007/s11704-025-41426-w
REVIEW ARTICLE
A survey of geometric graph neural networks: data structures,
models and applications
Jiaqi HAN1,2*, Jiacheng CEN1*, Liming WU1*, Zongzhao LI1, Xiangzhe KONG3, Rui JIAO3,
Ziyang YU3, Tingyang XU4,5, Fandi WU6, Zihe WANG1, Hongteng XU1, Zhewei WEI1,
Deli ZHAO4,5, Yang LIU (✉)3, Yu RONG (✉)4,5, Wenbing HUANG (✉)1
1 Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
2 Department of Computer Science, Stanford University, CA 94305, USA
3 Department of Computer Science and Technology, Institute for AI, Tsinghua University, Beijing 100084, China
4 DAMO Academy, Alibaba Group, Hangzhou 311121, China
5 Hupan Lab, Hangzhou 311121, China
6 Tencent AI Lab, Shenzhen 518100, China
© The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn
Abstract Geometric graphs are a special kind of graph with
geometric features, which are vital for modeling many scientific
problems. Unlike generic graphs, geometric graphs often
exhibit physical symmetries of translations, rotations, and
reflections, so they cannot be processed effectively by current
Graph Neural Networks (GNNs). To address this issue,
researchers proposed a variety of geometric GNNs equipped
with invariant/equivariant properties to better characterize the
geometry and topology of geometric graphs. Given the current
progress in this field, it is imperative to conduct a
comprehensive survey of data structures, models, and
applications related to geometric GNNs. In this paper, building on
necessary but concise mathematical preliminaries, we
formalize the geometric graph as the data structure, on top of
which we provide a unified view of existing models from the
geometric message passing perspective. Additionally, we
summarize the applications as well as the related datasets to
facilitate later research for methodology development and
experimental evaluation. We also discuss the challenges and
future potential directions of geometric GNNs at the end of this
survey.
Keywords scientific systems, geometric graphs, graph
neural networks, equivariance, invariance
Received December 28, 2024; accepted February 24, 2025
E-mail: liuyang2011@tsinghua.edu.cn; yu.rong@hotmail.com; hwenbing@126.com
* These authors contributed equally to this work. Work done by Jiaqi Han during his visit to Renmin University of China.
1 Introduction
Many scientific problems, particularly in physics and
biochemistry, require processing data in the form of geometric
graphs [1]. Distinct from typical graph data, geometric graphs
additionally assign each node a special type of node feature in
the form of geometric vectors. For example, a
molecule/protein can be regarded as a geometric graph, where
the 3D position coordinates of atoms are the geometric
vectors; in a general multi-body physical system, the 3D states
(positions, velocities or spins) are the geometric vectors of the
particles. Notably, geometric graphs exhibit symmetries of
translations, rotations and/or reflections. This is because the
physical law controlling the dynamics of the atoms (or
particles) is the same no matter how we translate or rotate the
physical system from one place to another. When tackling this
type of data, it is essential to incorporate the inductive bias of
symmetry into the design of the model, which motivates the
study of geometric Graph Neural Networks (GNNs).
Constructing GNNs that respect such symmetry constraints
has long been a challenge in methodological design. Pioneering
approaches like DTNN [2], DimeNet [3], and GemNet [4]
transform the input geometric graph into distance/angle/
dihedral-based scalars that are invariant to rotations or
translations, constituting the family of invariant GNNs.
Noticing the limit on the expressivity of invariant GNNs,
EGNN [5] and PaiNN [6] additionally involve geometric
vectors in message passing and node update to preserve the
directional information in each layer, leading to equivariant
GNNs. With group representation theory as a helpful tool,
TFN [7], SE(3)-Transformer [8], and SEGNN [9] generalize
invariant scalars and equivariant vectors by viewing them as
steerable vectors parameterized by high-degree spherical
tensors, giving rise to high-degree steerable GNNs. Built upon
these fundamental approaches, geometric GNNs have achieved
remarkable success in various applications of diverse systems,
including physical dynamics simulation [10,11], molecular
property prediction [5,8], protein structure prediction [12],
protein generation [13,14], and RNA structure ranking [15].
Figure 1 illustrates the superior performance of geometric GNNs against traditional methods on the representative tasks.
Fig. 1 Performance comparisons between geometric GNNs and traditional methods on molecular property prediction, protein-ligand docking, and antibody design. Notably, the methods based on geometric GNNs, including EGNN [5], DiffDock [16], and dyMEAN [17], remarkably outperform traditional MPNN [18], Gnina [19], and RosettaAb [20], on the datasets of QM9 [21], PDBBind [22], and SAbDab [23], respectively, verifying the effectiveness and efficiency of geometric GNNs over various tasks. (a) Property prediction; (b) ligand docking; (c) antibody design
To facilitate the research of geometric GNNs, this work presents a systematic survey focusing on both methods and applications1), structured as follows. In Section 2, we introduce necessary preliminaries on group theory and the formal definition of equivariance/invariance. In Section 3, we propose the geometric graph as a universal data structure that will be leveraged throughout the entire survey as a bridge between real-world data and the models, i.e., geometric GNNs. In Section 4, we summarize existing models into invariant GNNs (Section 4.2) and equivariant GNNs (Section 4.3), where the latter is further categorized into scalarization-based models (Section 4.3.1) and high-degree steerable models (Section 4.3.2); we also introduce geometric graph transformers in Section 4.4. In Section 5, we provide a comprehensive collection of the applications that have witnessed the success of geometric GNNs on particle-based physical systems, molecules, proteins, complexes, and other domains like crystals and RNAs.
1) This work is an extended survey of our previous short version [24].
The goal of this survey is to provide a general overview spanning data structure, model design, and applications (see Fig. 2), which together constitute an entire input-output pipeline that is instructive for machine learning practitioners who employ geometric GNNs on various scientific tasks. Recently, several related surveys have been proposed, which mainly focus on the methodology of geometric GNNs [36], pretrained GNNs for chemical data [37], representation learning for molecules [38,39], and the general application of artificial intelligence in diverse types of scientific systems [40]. In contrast to all of them, this survey places an emphasis on geometric graph neural networks, not only encapsulating theoretical foundations of geometric GNNs but also delivering an exhaustive summary of the related applications in domains across physics, biochemistry, and material science. Meanwhile, we discuss future prospects and interesting research directions in Section 6. We also release a GitHub repository that collects the references, datasets, codes, benchmarks, and other resources related to geometric GNNs.
2 Basic notion of symmetry
In this section, we compactly introduce the basic notions related to symmetry. Readers can skip this section and go straight to the methodology part in Section 3 if they are familiar with the theoretical background.
2.1 Transformation and group
By symmetry, we mean that an object of interest
remains invariant under a set of transformations. For instance,
the distance between any two points in space remains
constant, regardless of how we simultaneously rotate or
translate these two points. Mathematically, a set of
transformations forms a group (see [41] for more details).
Definition 1 (Group). A group G is a set of transformations
with a binary operation “ · ” satisfying these properties: (i) it is
closed, namely, ∀a, b ∈ G, a · b ∈ G ; (ii) it is associative,
namely, ∀a, b, c ∈ G, (a · b) · c = a · (b · c) ; (iii) there exists an
identity element e ∈ G such that ∀a ∈ G, a · e = e · a = a ;
(iv) each element must have an inverse, namely,
∀a ∈ G, ∃b ∈ G, a · b = b · a = e , where the inverse b is denoted
as a−1 .
Below we provide some examples of groups commonly used in the applications of this paper:
● E(d) is the Euclidean group [42] consisting of rotations, reflections and translations, acting on d-dimensional vectors.
● T(d) is the subgroup of the Euclidean group that consists of translations.
● O(d) is the orthogonal group that consists of rotations and reflections, acting on d-dimensional vectors.
● SO(d) is the special orthogonal group that consists only of rotations.
● SE(d) is the special Euclidean group that consists only of rotations and translations.
● A Lie group is a group whose elements form a differentiable manifold. All the groups above are specific examples of Lie groups.
● $S_N$ is the permutation group whose elements are permutations of a given set consisting of $N$ elements.
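The group axioms of Definition 1 can be checked numerically for the matrix form of O(3) and SO(3). The following is a minimal sketch, assuming NumPy is available; the helper names `random_orthogonal` and `random_rotation` are hypothetical and only serve this illustration.

```python
import numpy as np

def random_orthogonal(rng, d=3):
    """Sample an element of O(d) from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))  # rescale columns by +-1; q stays orthogonal

def random_rotation(rng, d=3):
    """Sample an element of SO(d): flip one axis if the determinant is -1."""
    q = random_orthogonal(rng, d)
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
a, b = random_orthogonal(rng), random_orthogonal(rng)
identity = np.eye(3)

closed = np.allclose((a @ b).T @ (a @ b), identity)  # (i) a . b is again orthogonal
has_identity = np.allclose(a @ identity, a)          # (iii) identity element
has_inverse = np.allclose(a @ a.T, identity)         # (iv) the inverse is the transpose
rot = random_rotation(rng)
det_is_one = np.isclose(np.linalg.det(rot), 1.0)     # SO(3): determinant +1
```

Associativity, property (ii), holds automatically because matrix multiplication is associative.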
2.2 Group representation
While the group operation “ · ” is defined abstractly above, it can be realized as matrix multiplication with the help of group representation. A representation of $G$ is a group homomorphism $\rho: G \to GL(V)$ that maps each group element $g \in G$ to an element of the general linear group of some vector space $V$, satisfying $\rho(g)\rho(h) = \rho(g \cdot h), \forall g, h \in G$. When $V = \mathbb{R}^d$, $GL(V)$ contains all invertible $d \times d$ matrices and $\rho(g)$ assigns a matrix to the element $g$.
For the orthogonal group $O(d)$, one of its common group representations is defined by orthogonal matrices $O \in \mathbb{R}^{d \times d}$ subject to $O^\top O = I$; for $SO(d)$, the group representation is restricted to orthogonal matrices of determinant 1, denoted as $R$. The case of the translation group $T(d)$ is a bit tedious and can be derived in the projective space using homogeneous coordinates; here, for simplicity, we directly define translation as vector addition rather than matrix multiplication. Note that the representation of a group is not unique, which will be further illustrated in Section 4.3.2.
Fig. 2 Illustration of the complete input-output pipeline from data structures and models to applications. Note that most figures here for illustrating different applications are edited based on previous papers [16,25–35]. The term “instance” indicates a self-interacted system composed of multiple particles/atoms, such as a molecule or a protein. Pocket-Based Molecule Sampling, Ligand-Binding Affinity Prediction, and Protein-Ligand Docking are denoted with yellow shading to imply that all these tasks take the multi-instance format “Molecule+Protein” as input
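The remark in Section 2.2 that translations can be represented with homogeneous coordinates can be illustrated as follows. This is a minimal sketch, assuming NumPy and the column-vector action $x \mapsto Ox + t$; the helpers `homogeneous` and `act` are hypothetical names introduced only for this example.

```python
import numpy as np

def homogeneous(O, t):
    """4x4 matrix representing the E(3) element x -> O x + t."""
    M = np.eye(4)
    M[:3, :3] = O
    M[:3, 3] = t
    return M

def act(O, t, x):
    """Direct action of (O, t) on a 3D point x."""
    return O @ x + t

theta = np.pi / 3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])          # a rotation about the z-axis
reflect = np.diag([1.0, 1.0, -1.0])       # a reflection through the xy-plane
t1, t2 = np.array([1.0, 0.0, 2.0]), np.array([-0.5, 3.0, 0.0])
x = np.array([0.3, -1.2, 0.7])

# Apply (reflect, t2) first, then (Rz, t1), once directly and once via matrices.
composed = act(Rz, t1, act(reflect, t2, x))
via_matrices = (homogeneous(Rz, t1) @ homogeneous(reflect, t2) @ np.append(x, 1.0))[:3]
homomorphism_holds = np.allclose(composed, via_matrices)
```

In this form the composition of two Euclidean transformations is a single matrix product, which is exactly the homomorphism property $\rho(g)\rho(h) = \rho(g \cdot h)$.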
2.3 Equivariance and invariance
Let $\mathcal{X}$ and $\mathcal{Y}$ be the input and output vector spaces, respectively. The function $\phi: \mathcal{X} \to \mathcal{Y}$ is called equivariant with respect to $G$ if, when we apply any transformation to the input, the output also changes via the same transformation or in a certain predictable way. Formally, we have
Definition 2 (Equivariance). The function $\phi: \mathcal{X} \to \mathcal{Y}$ is $G$-equivariant if it commutes with any transformation in $G$,
$$\phi(g \cdot x) = g \cdot \phi(x), \quad \forall g \in G, \tag{1}$$
which, by implementing the group operation $\cdot$ with group representation, can be rewritten as:
$$\phi(\rho_{\mathcal{X}}(g)x) = \rho_{\mathcal{Y}}(g)\phi(x), \quad \forall g \in G, \tag{2}$$
where $\rho_{\mathcal{X}}$ and $\rho_{\mathcal{Y}}$ are the group representations in the input and output space, respectively.
The choice of group representation facilitates the specialization of different scenarios. When both $\rho_{\mathcal{X}}$ and $\rho_{\mathcal{Y}}$ are trivial representations, namely, $\rho_{\mathcal{X}}(g) = \rho_{\mathcal{Y}}(g) = I$ 2), $\phi$ becomes a trivial function; notably, when $\rho_{\mathcal{Y}}(g) = I$, $\phi$ is called an invariant function, demonstrating that invariance is just a special case of equivariance.
2) Note that the identity transformation $I$ could have different dimensions in the input space $\mathcal{X}$ and output space $\mathcal{Y}$.
It can be verified that equivariance induces the following desirable properties. (i) Linearity: any linear combination of equivariant functions is still equivariant. (ii) Composability: the composition of two equivariant functions (if they can be composed) yields an equivariant function. Therefore, equivariance of each layer of a network implies that the whole network is equivariant. (iii) Inheritability: if a function is equivariant with respect to group $G_1$ and group $G_2$, then this function must be equivariant with respect to the direct product of these two groups, i.e., $G_1 \times G_2$, under a corresponding definition of product group operation or group representation. This implies that proving equivariance of each transformation individually is sufficient to prove equivariance of joint transformations.
In the following context, the variable x is instantiated as a
geometric graph, the group transformation ρ(g) becomes the
transformation of geometric graphs, and the function ϕ is
designed as an invariant/equivariant GNN.
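To make Definition 2 and the listed properties concrete, the following is a minimal sketch, assuming NumPy, that checks Eq. (2) for a rotation acting on a single 3D vector. The functions `phi_equivariant`, `phi_invariant` and `phi_combined` are toy examples chosen for illustration and are not models from the literature.

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis, an element of SO(3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def phi_equivariant(x):
    """Scale a vector by its own norm; this commutes with any orthogonal matrix."""
    return np.linalg.norm(x) * x

def phi_invariant(x):
    """The norm is unchanged by rotations (trivial output representation)."""
    return np.linalg.norm(x)

def phi_combined(x):
    """A linear combination of two equivariant functions."""
    return 2.0 * phi_equivariant(x) - 0.5 * x

x = np.array([0.3, -1.2, 0.7])
R = rotation_z(0.7)

equivariant_ok = np.allclose(phi_equivariant(R @ x), R @ phi_equivariant(x))
invariant_ok = np.isclose(phi_invariant(R @ x), phi_invariant(x))
linearity_ok = np.allclose(phi_combined(R @ x), R @ phi_combined(x))
composition_ok = np.allclose(phi_equivariant(phi_equivariant(R @ x)),
                             R @ phi_equivariant(phi_equivariant(x)))
```

The last two checks correspond to the linearity and composability properties, respectively.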
3 Data structure: from graph to geometric graph
This section formally defines graph and geometric graph, and
depicts how they differ from each other. Table 1 summarizes
the notations we used throughout this paper.
3.1 Graph
Conventional studies on graphs [43,44] usually focus on their
relational topology. Examples include social networks,
citation networks, etc. In the domain of AI-Driven Drug
Design (AIDD), they are usually referred to as 2D graphs [45].
Definition 3 (Graph). A graph is defined as $\mathcal{G} := (A, H)$, where $A \in [0, 1]^{N \times N}$ is the adjacency matrix with $N$ being the number of nodes, and $H \in \mathbb{R}^{N \times C_h}$ is the node feature matrix with $C_h$ being the dimension of the feature.
Concretely, the adjacency matrix $A$ takes the value 1 at its $(i, j)$-entry $a_{ij}$ when nodes $i$ and $j$ are connected by an edge and 0 otherwise. The $i$-th row of $H$, i.e., $h_i \in \mathbb{R}^{C_h}$, represents the feature vector for node $i$, e.g., the one-hot embedding of the atomic number in a molecular graph. Along with the definition of graph, we also describe some vital derived concepts. We denote the set of nodes as $\mathcal{V}$ and the set of edges as $\mathcal{E}$. Correspondingly, the neighborhood of node $i$, marked as $\mathcal{N}_i$, is specified to be $\mathcal{N}_i := \{j : (v_i, v_j) \in \mathcal{E}\}$. The graph can additionally contain some edge features $e_{ij} \in \mathbb{R}^{C_e}$ for edge $(v_i, v_j)$.
Transformations on graphs: $g \cdot \mathcal{G}$. One can arbitrarily change the order of nodes without changing the topology of the graph. With the language of group representation, the permutation transformation of a graph is denoted as $g \cdot \mathcal{G} := (P_g A P_g^\top, P_g H)$, where $P_g$ is the representation of the transformation $g \in S_N$ (i.e., the permutation matrix3)). We denote the equivalence in terms of permutation as $\mathcal{G} \simeq g \cdot \mathcal{G}$.
As a concrete example, molecules can be viewed as graphs, where the nodes $v_i$ are instantiated as the atoms, and the node features $H$ are the one-hot encodings of the atomic numbers, one row for each atom. The edges in $A$ either indicate the existence of chemical bonds or are constructed from the relative distances between atoms under a cut-off threshold, and the respective edge features $e_{ij}$ can be assigned as the type of the chemical bond and/or the relative distance.
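The molecule example above can be sketched in a few lines. This is a minimal illustration, assuming NumPy; the coordinates are illustrative values for a water-like geometry rather than reference data, and the function names `cutoff_graph` and `one_hot` as well as the cut-off of 1.2 are assumptions made for this example.

```python
import numpy as np

def cutoff_graph(positions, cutoff):
    """Adjacency with a_ij = 1 if i != j and ||x_i - x_j|| <= cutoff, else 0."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    not_self = ~np.eye(len(positions), dtype=bool)   # exclude self-loops
    return ((dist <= cutoff) & not_self).astype(int), dist

def one_hot(atomic_numbers, vocabulary):
    """One row per atom, one column per atomic number in the vocabulary."""
    index = {z: k for k, z in enumerate(vocabulary)}
    H = np.zeros((len(atomic_numbers), len(vocabulary)))
    for row, z in enumerate(atomic_numbers):
        H[row, index[z]] = 1.0
    return H

Z = [8, 1, 1]                                   # O, H, H
X = np.array([[0.0, 0.0, 0.0],
              [0.96, 0.0, 0.0],
              [-0.24, 0.93, 0.0]])              # illustrative positions
A, dist = cutoff_graph(X, cutoff=1.2)
H = one_hot(Z, vocabulary=[1, 6, 7, 8])
neighbors = {i: np.flatnonzero(A[i]).tolist() for i in range(len(Z))}
```

With this cut-off the two hydrogen atoms are connected to the oxygen atom but not to each other, and `dist` can serve directly as the distance-valued edge feature $e_{ij}$.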
Table 1 Basic notations and definitions throughout this survey
| Notation | Description |
|---|---|
| Data structure | |
| $\mathcal{G} := (A, H)$ | A graph $\mathcal{G}$ containing $N$ nodes, with adjacency matrix $A \in \mathbb{R}^{N \times N}$ and node feature matrix $H \in \mathbb{R}^{N \times C_h}$. |
| $\vec{\mathcal{G}} := (A, H, \vec{X})$ | A geometric graph $\vec{\mathcal{G}}$ containing $N$ nodes, with adjacency matrix $A$ and node feature matrix $H$ as above, and additionally a 3D coordinate matrix $\vec{X} \in \mathbb{R}^{N \times 3}$. |
| $\mathcal{N}_i$ | The neighborhood of node $i$. |
| $h_i \in \mathbb{R}^{C_h}$ | The scalar feature of node $i$. |
| $\vec{x}_i \in \mathbb{R}^{3}$ | The 3D coordinate of node $i$. |
| $\vec{V}_i \in \mathbb{R}^{3 \times C}$ | The multi-channel 3D vector of node $i$. |
| $\vec{V}_i^{(l)} \in \mathbb{R}^{(2l+1) \times C_l}$ | The type-$l$ irreducible vector of node $i$. |
| $\vec{V}_i^{(\mathcal{L})} := \{\vec{V}_i^{(l)}\}_{l \in \mathcal{L}}$ | The set consisting of irreducible vectors of all types $l \in \mathcal{L}$. |
| $e_{ij} \in \mathbb{R}^{C_e}$ | The edge feature from node $j$ to $i$. |
| Operator | |
| $G$, $g$ | The group $G$ and its group element $g$. |
| $\rho_{\mathcal{X}}(g)$ | The group representation $\rho_{\mathcal{X}}(g)$ of the transformation $g$ in the vector space $\mathcal{X}$. |
| $\times$, $\otimes$ | The operators between two vectors including cross product $\times$ and Kronecker product $\otimes$. |
| $\otimes_{cg}$, $\otimes_{cg}^{W}$, $\otimes_{cg}^{\mathcal{W}}$ | Clebsch-Gordan (CG) tensor product, optionally with a learnable parameter $W$ and a learnable parameter set $\mathcal{W}$. |
| $Y^{(l)}(\vec{x}) \in \mathbb{R}^{2l+1}$ | The type-$l$ vector constructed by spherical harmonics of $\vec{x} \in S^2$: $Y^{(l)}(\vec{x}) = [Y^{(l)}_{-l}, Y^{(l)}_{-l+1}, \cdots, Y^{(l)}_{l-1}, Y^{(l)}_{l}]$. |
| $Y^{(\mathcal{L})}(\vec{x}) := \{Y^{(l)}(\vec{x})\}_{l \in \mathcal{L}}$ | A set consisting of spherical harmonics of all types $l \in \mathcal{L}$. |
| $D^{(l)}(g)$ | The $l$-th degree Wigner-D matrix of the rotation transformation $g \in SO(3)$. |
| Neural Network | |
| $\phi, \psi, \varphi, \sigma$ | Functions implemented with MLP. |
3) The permutation of $A$ can also be written in the form of group representation by first vectorizing $A$ as $\mathrm{Vec}(A)$ and then conducting $(P_g \otimes P_g)\mathrm{Vec}(A)$. Here $\otimes$ defines the Kronecker product, and $P_g \otimes P_g$ is the 2-order representation of the permutation matrix.
3.2 Geometric graph
In many applications, the graphs we tackle contain not only
the topological connections and node features, but also certain
geometric information. Again, in the example of a molecule,
we may additionally be informed of some geometric quantities
in the Euclidean space, e.g., the positions of the atoms in 3D
coordinates4). Such quantities are of particular interest in that
they encapsulate rich directional information that depicts the
geometry of the system. With the geometric information, one
can go beyond a limited perception of the graph
topology and obtain a broader picture of the entire
configuration of the system in 3D space, where important
information, such as the relative orientation of the neighboring
nodes and directional quantities like velocities, could be better
exploited. Hence, in this section, we begin with the definition
of geometric graphs, which are usually referred to as 3D
graphs [1].
Definition 4 (Geometric Graph). A geometric graph is defined as $\vec{\mathcal{G}} := (A, H, \vec{X})$, where $A \in [0, 1]^{N \times N}$ is the adjacency matrix, $H \in \mathbb{R}^{N \times C_h}$ is the node feature matrix with dimension $C_h$, and $\vec{X} \in \mathbb{R}^{N \times 3}$ are the 3D coordinates of all nodes. The $i$-th rows of $H$ and $\vec{X}$, namely, $h_i \in \mathbb{R}^{C_h}$ and $\vec{x}_i \in \mathbb{R}^{3}$, denote the feature and 3D coordinate of node $v_i$, respectively.
In the above definition, we distinguish the coordinate matrix $\vec{X}$ from other quantities $A$ and $H$, and the geometric graph $\vec{\mathcal{G}}$ from the graph $\mathcal{G}$, with an over-right arrow “ → ”, indicating that they contain geometric and directional information. Note that there could be other geometric variables besides $\vec{X}$ in a geometric graph, such as velocity, force, and so on. Then the shape of $\vec{X}$ is extended from $N \times 3$ to $N \times 3 \times C$, where $C$ denotes the number of channels. In this section, we assume $C = 1$ for conciseness, while more complete examples are
shown in Section 5.
Transformations on geometric graphs: $g \cdot \vec{\mathcal{G}}$. In contrast to graphs, transformations on geometric graphs are not limited to node permutation. We summarize the transformations of interest below:
● Permutation, which is defined as $g \cdot \vec{\mathcal{G}} := (P_g A P_g^\top, P_g H, P_g \vec{X})$, where $P_g$ is the permutation matrix representation of $g \in S_N$;
● Orthogonal transformation, which is defined as $g \cdot \vec{\mathcal{G}} := (A, H, \vec{X} O_g)$, where $O_g$ is the orthogonal matrix representation of $g \in O(3)$, consisting of rotations and reflections;
● Translation, which is defined as $g \cdot \vec{\mathcal{G}} := (A, H, \vec{X} + \vec{t}_g)$, where $\vec{t}_g$ is the translation vector of $g \in T(3)$.
We always have the equivalence $\vec{\mathcal{G}} \simeq g \cdot \vec{\mathcal{G}}$. We can combine orthogonal transformation and translation into Euclidean transformation on geometric graphs, namely, $g \cdot \vec{\mathcal{G}} := (A, H, \vec{X} O_g + \vec{t}_g)$ for $g \in E(3)$. Here, the Euclidean group $E(3)$ is a semidirect product [46] of orthogonal transformation and translation, denoted as $E(3) = T(3) \rtimes O(3)$. We can similarly define $SE(3)$ transformation by considering only rotation and translation. We sometimes call $H$ invariant features (or scalars), since they are independent of $E(3)$ transformations, and call $\vec{X}$ equivariant features (or vectors), since they change along with $E(3)$ transformations. Figure 3 demonstrates examples of transformations on geometric graphs.
Fig. 3 Examples of transformations on geometric graphs. (a) Permutation;
(b) translation; (c) rotation; (d) reflection
Geometric graphs are powerful and general tools to model a
variety of objects in scientific tasks, including small molecules
[5,47], proteins [14,48], crystals [49,50], physical point clouds
[25,51], and many others.
We will provide more details in Section 5.
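The geometric graph of Definition 4 and its three transformations can be written down as a small data structure. The following is a minimal sketch, assuming NumPy and the row-vector convention $\vec{X} \in \mathbb{R}^{N \times 3}$ used in this section; the class `GeometricGraph` and the helper `pairwise_distances` are hypothetical names for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GeometricGraph:
    A: np.ndarray  # adjacency matrix, shape (N, N)
    H: np.ndarray  # invariant node features, shape (N, C_h)
    X: np.ndarray  # 3D coordinates, shape (N, 3)

    def permute(self, perm):
        P = np.eye(len(perm))[perm]          # permutation matrix P_g
        return GeometricGraph(P @ self.A @ P.T, P @ self.H, P @ self.X)

    def orthogonal(self, O):
        return GeometricGraph(self.A, self.H, self.X @ O)

    def translate(self, t):
        return GeometricGraph(self.A, self.H, self.X + t)

def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

rng = np.random.default_rng(0)
N = 4
X = rng.normal(size=(N, 3))
A = (pairwise_distances(X) < 2.0).astype(float) - np.eye(N)  # cut-off graph, no self-loops
graph = GeometricGraph(A, rng.normal(size=(N, 2)), X)

O, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random element of O(3)
t = np.array([1.0, -2.0, 0.5])
moved = graph.orthogonal(O).translate(t)       # an E(3) transformation
distances_invariant = np.allclose(pairwise_distances(moved.X), pairwise_distances(graph.X))

perm = np.array([2, 0, 3, 1])
P = np.eye(N)[perm]
shuffled = graph.permute(perm)
distances_permuted = np.allclose(pairwise_distances(shuffled.X),
                                 P @ pairwise_distances(graph.X) @ P.T)
```

Pairwise distances are unchanged by the Euclidean transformation and are only reindexed by the permutation, which is why they are natural inputs for the invariant models discussed later.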
4 Model: geometric GNNs
In this section, we first recap the general form of Message
Passing Neural Network (MPNN) on topological graphs. Then
we introduce different types of geometric GNNs that extend
the message passing paradigm of MPNNs to geometric
graphs: invariant GNNs, equivariant GNNs, as well as
geometric graph transformers. Finally, we briefly present the
works that discuss the expressivity of geometric GNNs.
Figure 4 presents the taxonomy of geometric GNNs in this
section.
4.1 Message passing neural networks
Graph Neural Networks (GNNs) are favorable to operate on
graphs with the help of the message-passing mechanism,
which facilitates the information propagation along the graph
structure by updating node embeddings through neighborhood
aggregation. To be specific, message-passing GNNs
implement ϕ(G) on topological graphs G by iterating the
following message-passing process in each layer [18],
$$m_{ij} = \phi_{\mathrm{msg}}\left(h_i, h_j, e_{ij}\right), \tag{3}$$
$$h'_i = \phi_{\mathrm{upd}}\left(h_i, \{m_{ij}\}_{j \in \mathcal{N}_i}\right), \tag{4}$$
where $\phi_{\mathrm{msg}}(\cdot)$ and $\phi_{\mathrm{upd}}(\cdot)$ are the message computation and feature update functions, respectively. The node features $h_i$, $h_j$ and edge feature $e_{ij}$ are first synthesized by the message function to obtain the message $m_{ij}$. The messages within the neighborhood are then aggregated with a set function and leveraged to update the node features $h'_i$ combined with the input $h_i$.
4) Although we mainly focus on 3D space, most of our analyses can be extended to $d$-dimensional space where $d$ is an arbitrary integer.
Fig. 4 Taxonomy of geometric GNNs introduced in Section 4
GNNs defined by Eqs. (3) and (4) are always permutation
equivariant but not inherently E(3) -equivariant. When
mentioning equivariance or invariance in what follows, this
paper mainly discusses the latter unless otherwise specified.
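Eqs. (3) and (4), together with the permutation equivariance noted above, can be illustrated with a small dense implementation. This is a minimal sketch, assuming NumPy, sum aggregation as the set function, and two-layer tanh networks with random weights standing in for trained MLPs; the names `init_mlp`, `mlp` and `mpnn_layer` are hypothetical.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    return (rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)), np.zeros(d_out))

def mlp(params, x):
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def mpnn_layer(A, H, E, msg_params, upd_params):
    """One message-passing layer; E[i, j] holds the edge feature e_ij."""
    H_new = np.zeros_like(H)
    for i in range(H.shape[0]):
        agg = np.zeros(msg_params[2].shape[1])
        for j in np.flatnonzero(A[i]):                       # j in N_i
            agg += mlp(msg_params, np.concatenate([H[i], H[j], E[i, j]]))  # Eq. (3)
        H_new[i] = mlp(upd_params, np.concatenate([H[i], agg]))            # Eq. (4)
    return H_new

rng = np.random.default_rng(0)
N, C_h, C_e, C_m = 5, 3, 2, 4
A = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
A = A + A.T                                                  # symmetric, no self-loops
H = rng.normal(size=(N, C_h))
E = rng.normal(size=(N, N, C_e))
msg_params = init_mlp(rng, 2 * C_h + C_e, 8, C_m)
upd_params = init_mlp(rng, C_h + C_m, 8, C_h)

perm = rng.permutation(N)
P = np.eye(N)[perm]
out = mpnn_layer(A, H, E, msg_params, upd_params)
out_perm = mpnn_layer(P @ A @ P.T, P @ H, E[perm][:, perm], msg_params, upd_params)
permutation_equivariant = np.allclose(out_perm, P @ out)
```

Relabeling the nodes before the layer gives the same result as relabeling its output, because the sum over neighbors does not depend on their order.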
4.2 Invariant graph neural networks
Moving forward to the geometric domain, there are various
tasks that require the model we propose to be invariant with
regard to Euclidean transformations. For instance, for the task
of molecular property prediction, the predicted energy should
remain unchanged regardless of any rotation/translation of all
atom coordinates. Embedding such inductive bias is crucial as
it essentially conforms to the physical rule of our 3D world.
Formally, invariant GNNs update invariant features as $H' = \phi(\vec{\mathcal{G}})$ with the function $\phi$ satisfying:
$$\phi(g \cdot \vec{\mathcal{G}}) = \phi(\vec{\mathcal{G}}), \quad \forall g \in E(3). \tag{5}$$
To design such a function, invariant GNNs usually transform
equivariant coordinates $\vec{X}$ to invariant scalars that are
unaffected by Euclidean transformations. Early invariant
GNNs can date back to DTNN [2], MPNN [18], and MVGNN [91], where relative distances are applied for edge
construction. Recent works further elaborate the use of various
invariant scalars ranging from relative distances to angles or
dihedral angles between edges, upon the message passing
mechanism in Eqs. (3) and (4). We introduce several
representative works below.
SchNet [47]. This work designs a continuous-filter convolution conditioned on relative distances $r_{ij} = \|\vec{x}_i - \vec{x}_j\|$. In particular, it re-implements Eq. (3) as
$$m_{ij} = \sigma_2(r_{ij})\,\sigma_1(h_j), \tag{6}$$
where the message is calculated as the multiplication between the continuous convolution filter and the neighbor embedding, and the functions $\sigma$ are all Multi-Layer Perceptrons (MLPs).
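The message in Eq. (6) depends on the coordinates only through distances, so it is unchanged by any Euclidean transformation. The sketch below checks this numerically, assuming NumPy. The Gaussian radial-basis expansion of $r_{ij}$, the element-wise product, and the single tanh layers standing in for $\sigma_1$ and $\sigma_2$ are simplifying assumptions of this example, and `rbf_expand` and `schnet_messages` are hypothetical names.

```python
import numpy as np

def rbf_expand(r, centers, gamma=10.0):
    """Expand each distance on a grid of Gaussian radial basis functions."""
    return np.exp(-gamma * (r[..., None] - centers) ** 2)

def schnet_messages(X, H, centers, W_filter, W_feature):
    """Return an (N, N, C) array whose [i, j] entry is m_ij of Eq. (6)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # r_ij
    filters = np.tanh(rbf_expand(r, centers) @ W_filter)         # sigma_2(r_ij)
    features = np.tanh(H @ W_feature)                            # sigma_1(h_j)
    return filters * features[None, :, :]

rng = np.random.default_rng(0)
N, C_h, C, K = 4, 3, 5, 8
X, H = rng.normal(size=(N, 3)), rng.normal(size=(N, C_h))
centers = np.linspace(0.0, 4.0, K)
W_filter, W_feature = rng.normal(size=(K, C)), rng.normal(size=(C_h, C))

O, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random rotation or reflection
t = np.array([0.5, -1.0, 2.0])
messages = schnet_messages(X, H, centers, W_filter, W_feature)
messages_moved = schnet_messages(X @ O + t, H, centers, W_filter, W_feature)
invariant = np.allclose(messages, messages_moved)
```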
5) Here CBF is short for Circular Bessel Function.
6) Here SBF is short for Spherical Bessel Function.
DimeNet [3]. By observing that using relative distances
alone is unable to encode directional information, DimeNet
proposes directional message passing which takes as input not
only relative distances but also angles between adjacent edges.
The main component to compute the message embedding of each directional edge (from $j$ to $i$) is given by:
$$m'_{ji} = \sigma_{\mathrm{msg}}\Big(m_{ji}, \sum_{k \in \mathcal{N}_j \setminus \{i\}} \sigma_{\mathrm{int}}\big(m_{kj}, e^{(ji)}_{\mathrm{RBF}}, e^{(kji)}_{\mathrm{CBF}}\big)\Big), \tag{7}$$
where $e^{(ji)}_{\mathrm{RBF}}$ denotes the radial basis function representation of relative distance $d_{ji}$; $e^{(kji)}_{\mathrm{CBF}}$ 5) computes the joint representation of relative distance $d_{kj}$ and angle $\alpha_{(kj,ji)}$ between edges $(v_k, v_j)$ and $(v_j, v_i)$, with the help of spherical Bessel functions and spherical harmonics. In [3], Eq. (7) is applied as an interaction block, preceded by an embedding block that derives the message $m_{ji}$ based on $e^{(ji)}_{\mathrm{RBF}}$ and hidden features $h_i$ and $h_j$. The updated messages $m'_{ji}$ of all neighbor nodes are then utilized to update hidden feature $h_i$. A faster version of DimeNet, dubbed DimeNet++ [52,53], was proposed later.
GemNet [4]. To achieve universal expressivity, GemNet further takes dihedral angles into account, formulating two-hop directional message passing based on quadruplets of nodes. Basically, it replaces the message embeddings from Eq. (7) in DimeNet [3] with the following form:
$$m'_{ji} = \sigma_{\mathrm{msg}}\Big(m_{ji}, \sum_{\substack{k \in \mathcal{N}_i \setminus \{j\} \\ l \in \mathcal{N}_k \setminus \{i, j\}}} m_{jikl}\Big), \tag{8}$$
$$m_{jikl} = \sigma_{\mathrm{int}}\big(m_{lk}, e^{(lk)}_{\mathrm{RBF}}, e^{(ikl)}_{\mathrm{CBF}}, e^{(jikl)}_{\mathrm{SBF}}\big), \tag{9}$$
where $e^{(lk)}_{\mathrm{RBF}}$ and $e^{(ikl)}_{\mathrm{CBF}}$ are defined as above; $e^{(jikl)}_{\mathrm{SBF}}$ 6) is calculated by the spherical Bessel function of relative distance $d_{ji}$ and the spherical harmonics of angle $\alpha_{ji,ik}$ and dihedral angle $\alpha_{ji,kl}$. The input of Eq. (8) additionally integrates hidden features $h_i$ and $h_j$ for more expressivity in its original formulation. Note that GemNet can be modified to enable equivariant output by multiplying the output with the associated direction, which belongs to the scalarization-based equivariant GNNs introduced in the next subsection.
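The angles used by DimeNet and the dihedral angles used by GemNet are invariant scalars computed from node coordinates. The following is a minimal sketch, assuming NumPy, that computes both and checks their invariance under a random orthogonal transformation plus translation. It uses the unsigned dihedral angle, and it does not reproduce the Bessel and spherical-harmonic bases of the original models; the function names are hypothetical.

```python
import numpy as np

def angle(a, b, c):
    """Angle at vertex b formed by the points a, b, c."""
    u, v = a - b, c - b
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def dihedral(p0, p1, p2, p3):
    """Unsigned dihedral angle between the planes (p0, p1, p2) and (p1, p2, p3)."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)      # plane normals
    cos = n1 @ n2 / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])
p3 = np.array([0.0, 1.0, 1.0])

rng = np.random.default_rng(0)
O, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # rotation or reflection
t = np.array([0.5, -1.0, 2.0])
q0, q1, q2, q3 = (p @ O + t for p in (p0, p1, p2, p3))

angle_invariant = np.isclose(angle(p0, p1, p2), angle(q0, q1, q2))
dihedral_invariant = np.isclose(dihedral(p0, p1, p2, p3), dihedral(q0, q1, q2, q3))
```

A signed dihedral angle would flip its sign under reflections, so the unsigned version is the one that is invariant to the full group E(3).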
LieConv [54]. LieConv is formulated as follows:
$$m_{ij} = \sigma\big(\log(u_i^{-1} u_j)\big)\, h_j, \tag{10}$$
$$h'_i = \frac{1}{|\mathcal{N}(i)| + 1}\Big(h_i + \sum_{j \in \mathcal{N}(i)} m_{ij}\Big), \tag{11}$$
where $u_i \in G$ is a lift of $\vec{x}_i$, the logarithm $\log$ maps each group member onto the Lie algebra $\mathfrak{g}$, which is a vector space, and $\sigma$ is a parametric MLP. Besides, Eq. (11) normalizes the aggregation by dividing by the number of nodes involved, i.e., $|\mathcal{N}(i)| + 1$. It is clear that LieConv only specifies the update of node features $h_i$ while keeping the geometric vectors $\vec{x}_i$ unchanged. That means LieConv is invariant.
In addition to the above models, SphereNet [55] is another prevailing invariant GNN. Similar to GemNet, SphereNet also exploits relative distances, angles, and torsion angles for geometric modeling, which is able to distinguish almost all 3D graph structures. Moreover, its proposed spherical message passing (SMP) enables both fast and accurate 3D molecular learning on large-scale molecules. ComENet [56] is another type of invariant model which incorporates 3D information completely and efficiently. It ensures global completeness of the model with message passing only in the 1-hop neighborhood, avoiding time-consuming calculations like torsion in SphereNet or dihedral angles in GemNet. $k$-DisGNN [57] relies solely on invariant relative distance information, yet adopts high-order message-passing frameworks from traditional graph learning (e.g., $k$-WL or $k$-FWL), achieving completeness for $k = 2$. GeoNGNN [58], the geometric extension of the simplest subgraph GNN (NGNN [92]), effectively utilizes local subgraph information and also attains completeness with only distance features. There are also some other studies [59,93–95] exploiting the quaternion algebra to represent the 3D rotation group, which mathematically ensures SO(3) invariance during inference. Specifically, QMP [59] constructs a quaternion message-passing module to distinguish the molecular conformations caused by bond torsions.
4.3 Equivariant graph neural networks
In contrast to invariant GNNs that only conduct the update of invariant features, equivariant GNNs simultaneously update both invariant features and equivariant features, given that many practical tasks (such as molecular dynamics simulation) require equivariant output. More importantly, as proved in [96], equivariant GNNs are strictly more expressive than invariant GNNs, particularly for sparse geometric graphs.
Formally, equivariant GNNs design the function over geometric graphs as $(H', \vec{X}') = \phi(\vec{\mathcal{G}})$ satisfying:
$$\phi(g \cdot \vec{\mathcal{G}}) = g \cdot \phi(\vec{\mathcal{G}}), \quad \forall g \in E(3). \tag{12}$$
Specifically, through the lens of message-passing in Eqs. (3) and (4), the geometric message is derived as
$$m_{ij}, \vec{m}_{ij} = \phi_{\mathrm{msg}}\left(h_i, h_j, \vec{x}_i, \vec{x}_j, e_{ij}\right). \tag{13}$$
Subsequently, the computed geometric messages $\vec{m}_{ij}$ are aggregated within the neighborhood $\mathcal{N}_i$ specified by the connectivity or adjacency matrix of the graph, and updated by taking the input features into account. This update process is formally summarized as
$$h'_i, \vec{x}'_i = \phi_{\mathrm{upd}}\left(h_i, \{(m_{ij}, \vec{m}_{ij})\}_{j \in \mathcal{N}_i}\right). \tag{14}$$
The functions $\phi_{\mathrm{msg}}$ and $\phi_{\mathrm{upd}}$ should ensure that every invariant/equivariant output is invariant/equivariant with respect to any $E(3)$ transformation of the input.
There are different ways to realize the specific form of $\phi_{\mathrm{msg}}$ and $\phi_{\mathrm{upd}}$. Below, we categorize current well-known equivariant GNNs into two classes: scalarization-based models and high-degree steerable models.
4.3.1 Scalarization-based models
This line of works first translates 3D coordinates into invariant
scalars, which is similar to the design of invariant GNNs, but
it refines beyond invariant GNNs by further recovering the
direction of the processed scalars for the update of equivariant
features.
EGNN [5]. EGNN is one of the most famous scalarization-based models, and it can be considered as an equivariant enhancement of two prior works, SchNet [47] and Radial Field [63]. For its message function $\phi_{\mathrm{msg}}(\cdot)$, it first applies the relative distance for the update of the invariant message, which is then multiplied back with the relative coordinate to derive the directional message. The form of $\phi_{\mathrm{msg}}(\cdot)$ is as follows:
$$m_{ij} = \sigma_1\left(h_i, h_j, \|\vec{x}_i - \vec{x}_j\|^2, e_{ij}\right), \tag{15}$$
$$\vec{m}_{ij} = (\vec{x}_i - \vec{x}_j)\,\sigma_2\left(m_{ij}\right), \tag{16}$$
while the update function $\phi_{\mathrm{upd}}(\cdot)$ takes the following form,
$$h'_i = \sigma_3\Big(h_i, \sum_{j \in \mathcal{N}_i} m_{ij}\Big), \tag{17}$$
$$\vec{x}'_i = \vec{x}_i + \gamma \sum_{j \in \mathcal{N}_i} \vec{m}_{ij}, \tag{18}$$
where $\sigma_1, \sigma_2, \sigma_3$ are all instantiated as Multi-Layer Perceptrons (MLPs), and $\gamma$ is a predefined constant.
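Eqs. (15) to (18) translate almost line by line into code. The following is a minimal sketch, assuming NumPy, random two-layer tanh networks in place of trained MLPs, no edge features $e_{ij}$, and a fully connected graph; the names `init_mlp`, `mlp` and `egnn_layer` are hypothetical, and this is an illustration of the equations rather than the reference implementation of [5].

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    return (rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)), np.zeros(d_out))

def mlp(params, x):
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def egnn_layer(A, H, X, sigma1, sigma2, sigma3, gamma=0.1):
    H_new, X_new = np.zeros_like(H), X.copy()
    for i in range(len(H)):
        m_sum = np.zeros(sigma1[2].shape[1])
        for j in np.flatnonzero(A[i]):
            rel = X[i] - X[j]
            m_ij = mlp(sigma1, np.concatenate([H[i], H[j], [rel @ rel]]))  # Eq. (15)
            X_new[i] += gamma * rel * mlp(sigma2, m_ij)[0]                 # Eqs. (16), (18)
            m_sum += m_ij
        H_new[i] = mlp(sigma3, np.concatenate([H[i], m_sum]))              # Eq. (17)
    return H_new, X_new

rng = np.random.default_rng(0)
N, C_h, C_m = 4, 3, 5
A = np.ones((N, N)) - np.eye(N)
H, X = rng.normal(size=(N, C_h)), rng.normal(size=(N, 3))
sigma1 = init_mlp(rng, 2 * C_h + 1, 8, C_m)
sigma2 = init_mlp(rng, C_m, 8, 1)
sigma3 = init_mlp(rng, C_h + C_m, 8, C_h)

O, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = np.array([0.5, -1.0, 2.0])
H1, X1 = egnn_layer(A, H, X, sigma1, sigma2, sigma3)
H2, X2 = egnn_layer(A, H, X @ O + t, sigma1, sigma2, sigma3)
features_invariant = np.allclose(H1, H2)
coordinates_equivariant = np.allclose(X2, X1 @ O + t)
```

The scalar features are unchanged while the updated coordinates follow the input transformation, which is Eq. (12) specialized to this layer.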
GMN [51]. In practice, each node is usually associated with
multiple geometric features besides 3D position, such as
velocity and force. Therefore, GMN proposes a multi-channel
version of EGNN by defining a multi-channel vector
⃗ i ∈ R3×C for node i , where different channel (column)
V
indicates different kind of geometric vector. In the message
computation, the multi-channel vectors interact through inner
product and are properly normalized for more training stability
just before they are fed into the MLP, i.e.,
⃗ ⊤V
⃗
V
ij ij
mi j = σ1 hi , h j ,
, ei j ,
(19)
⃗ ⊤V
⃗
∥V
i j i j ∥F
( )
⃗ ij = V
⃗ i j σ2 mi j ,
M
(20)
where $\vec{V}_{ij}$ is a translation-invariant directional matrix related
to $\vec{V}_i$ and $\vec{V}_j$; for instance, if we have $\vec{V}_i = [\vec{x}_i, \dot{\vec{x}}_i]$ where
$\dot{\vec{x}}_i \in \mathbb{R}^3$ defines the velocity, then we can either choose the
direct subtraction $\vec{V}_{ij} = \vec{V}_i - \vec{V}_j$, or the concatenated form
$\vec{V}_{ij} = [\vec{V}_i, \vec{V}_j]$ where the first channel of $\vec{V}_i$ and $\vec{V}_j$ is made
translation invariant by subtracting the mean coordinate [65].
The update process is analogous to Eqs. (17) and (18), but
extended to the multi-channel fashion as well.
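The multi-channel scalarization of Eqs. (19) and (20) can be sketched as follows, using the direct-subtraction form of $\vec{V}_{ij}$ with two channels (position and velocity). This is only an illustrative sketch: `gmn_scalarize` and `random_rotation` are hypothetical helper names, and a plain `tanh` of the invariants stands in for the learned $\sigma_2$.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, with the determinant forced to +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def gmn_scalarize(V_i, V_j):
    """Return V_ij = V_i - V_j and its Frobenius-normalized Gram matrix, as used in Eq. (19)."""
    V_ij = V_i - V_j                           # (3, C), translation-invariant
    gram = V_ij.T @ V_ij                       # (C, C) inner products between channels
    return V_ij, gram / np.linalg.norm(gram)   # np.linalg.norm of a 2D array is the Frobenius norm

rng = np.random.default_rng(0)
C = 2                                          # channels: position and velocity
V_i, V_j = rng.normal(size=(3, C)), rng.normal(size=(3, C))
R, t = random_rotation(rng), rng.normal(size=(3, 1))
shift = np.concatenate([t, np.zeros((3, C - 1))], axis=1)   # a translation only moves the position channel

V_ij, s = gmn_scalarize(V_i, V_j)
V_ij_g, s_g = gmn_scalarize(R @ V_i + shift, R @ V_j + shift)

# Stand-in for sigma_2: any function of the invariants that returns a channel-mixing matrix.
M_ij, M_ij_g = V_ij @ np.tanh(s), V_ij_g @ np.tanh(s_g)      # Eq. (20)
print("scalars invariant:", np.allclose(s, s_g))
print("message equivariant:", np.allclose(R @ M_ij, M_ij_g))
```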
PaiNN [6]. By initializing the multi-channel equivariant
features to be zeros, namely, letting $\vec{V}_i = \vec{0} \in \mathbb{R}^{3\times C}$, PaiNN
iteratively updates $\vec{V}_i$ as well as the invariant feature $h_i$ via the
fixed relative position of the input coordinates $\vec{x}_{ij} = \vec{x}_i - \vec{x}_j$ in
each layer, with the help of residual connections and gated non-linearity.
We rewrite and slightly generalize the original
form proposed by [6] using our consistent notation. The
messages are given by:
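$$m_{ij} = \sigma_1\left(h_j, \|\vec{x}_{ij}\|^2, e_{ij}\right), \tag{21}$$
$$\vec{M}_{ij} = \vec{V}_j\,\sigma_2\left(m_{ij}\right) + \vec{x}_{ij}\,\sigma_3\left(m_{ij}\right), \tag{22}$$
and the update functions are calculated as:
$$m_i = h_i + \sum_{j\in\mathcal{N}(i)} m_{ij}, \tag{23}$$
$$\vec{M}_i = \vec{V}_i + \sum_{j\in\mathcal{N}(i)} \vec{M}_{ij}, \tag{24}$$
$$h'_i = m_i + \sigma_4\left(m_i, \|\vec{M}_i\|\right), \tag{25}$$
$$\vec{V}'_i = \vec{M}_i + \vec{M}_i\,\sigma_5\left(m_i, \|\vec{M}_i\|\right), \tag{26}$$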
where the functions σ1 – σ5 are non-linear invariant scaling
functions. In Eqs. (25) and (26), “ ∥ · ∥ ” outputs a multi-channel
scalar, each channel of which is the vector norm of the
corresponding channel of the input matrix.
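The generalized PaiNN-style update of Eqs. (21)–(26) can be sketched in NumPy as below. This is a hedged illustration and not the implementation of [6]: `painn_layer`, `mlp`, `random_rotation` and `run` are hypothetical names, the scaling functions are random MLPs, edge features are omitted, and the graph is fully connected.

```python
import numpy as np

def mlp(rng, d_in, d_out, d_hidden=16):
    """Tiny bias-free two-layer MLP with fixed random weights (stand-in for a trained network)."""
    w1 = rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in)
    w2 = rng.normal(size=(d_hidden, d_out)) / np.sqrt(d_hidden)
    return lambda z: np.tanh(z @ w1) @ w2

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, with the determinant forced to +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def painn_layer(h, V, x, s1, s2, s3, s4, s5):
    """h: (n, C) scalars, V: (n, 3, C) vectors, x: (n, 3) fixed input coordinates."""
    n = h.shape[0]
    rel = x[:, None, :] - x[None, :, :]                          # x_ij, shape (n, n, 3)
    dist2 = (rel ** 2).sum(-1, keepdims=True)
    h_j = np.repeat(h[None, :, :], n, axis=0)                    # h_j for every pair (i, j)
    mask = 1.0 - np.eye(n)                                       # drop self-pairs
    m = s1(np.concatenate([h_j, dist2], axis=-1)) * mask[..., None]        # Eq. (21)
    M = (V[None, :, :, :] * s2(m)[:, :, None, :]
         + rel[..., None] * s3(m)[:, :, None, :]) * mask[..., None, None]  # Eq. (22)
    m_i = h + m.sum(axis=1)                                      # Eq. (23)
    M_i = V + M.sum(axis=1)                                      # Eq. (24)
    norms = np.linalg.norm(M_i, axis=1)                          # channel-wise vector norm, (n, C)
    z = np.concatenate([m_i, norms], axis=-1)
    return m_i + s4(z), M_i + M_i * s5(z)[:, None, :]            # Eqs. (25) and (26)

rng = np.random.default_rng(0)
n, C = 4, 6
h, x = rng.normal(size=(n, C)), rng.normal(size=(n, 3))
nets = [mlp(rng, C + 1, C), mlp(rng, C, C), mlp(rng, C, C), mlp(rng, 2 * C, C), mlp(rng, 2 * C, C)]
R, t = random_rotation(rng), rng.normal(size=3)

def run(coords, layers=2):
    hh, V = h, np.zeros((n, 3, C))             # equivariant features start at zero
    for _ in range(layers):
        hh, V = painn_layer(hh, V, coords, *nets)
    return hh, V

(h1, V1), (h2, V2) = run(x), run(x @ R.T + t)
print("h invariant:", np.allclose(h1, h2))
print("V equivariant:", np.allclose(np.einsum("ab,nbc->nac", R, V1), V2))
```

Two layers are used in the check because the vector channels are zero at initialization and only become informative after the first update.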
Local frames [60–62]. These methods construct local
frames (i.e., reference frames) $\vec{F} \in \mathbb{R}^{3\times 3}$ that are equivariant to
rotations and can be utilized to project the geometric
information into invariant representations. In particular, LoCS
[61] and Aether [62] leverage the angular position $\vec{w}_i \in \mathbb{R}^3$ of
each node $i$ to construct node-wise local frames $\vec{F}_i = R(\vec{w}_i)$,
where $R(\vec{w}_i) \in \mathbb{R}^{3\times 3}$ is the corresponding rotation matrix of the
angular position $\vec{w}_i$. ClofNet [60] instead builds up edge-wise
local frames $\vec{F}_{ij} = [\vec{a}_{ij}, \vec{b}_{ij}, \vec{c}_{ij}]$, with
$$\vec{a}_{ij} = \frac{\vec{x}_{ic} - \vec{x}_{jc}}{\|\vec{x}_{ic} - \vec{x}_{jc}\|}, \quad \vec{b}_{ij} = \frac{\vec{x}_{ic} \times \vec{x}_{jc}}{\|\vec{x}_{ic} \times \vec{x}_{jc}\|}, \quad \vec{c}_{ij} = \vec{a}_{ij} \times \vec{b}_{ij}. \tag{27}$$
Here $\vec{x}_{ic} = \vec{x}_i - \vec{x}_c$ is made translation-invariant by subtracting the
center of mass $\vec{x}_c = \frac{1}{N}\sum_{i=1}^{N} \vec{x}_i$, so that the frame $\vec{F}_{ij}$ is also
translation-invariant.
With local frames, the invariant message $m_{ij}$ is generated as
$$m_{ij} = \sigma_1\left(h_i, h_j, \vec{V}_{ij}^{\top}\vec{F}_{ij}\right), \tag{28}$$
where $\vec{V}_{ij}$ is the translation-invariant geometric information
between node $i$ and $j$, similar to the considerations in GMN
(Eq. (20)). ClofNet additionally projects the
invariant message into an equivariant counterpart:
$$\vec{M}_{ij} = \vec{F}_{ij}\,\sigma_2\left(m_{ij}\right). \tag{29}$$
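As a small numerical illustration of the edge-wise frames in Eq. (27) and the projection in Eq. (28), the sketch below builds $\vec{F}_{ij}$ for every ordered pair of a random point cloud and checks that the frames rotate with the input while the projected scalars stay fixed. The function names `edge_frames` and `random_rotation` are hypothetical, and degenerate configurations (where $\vec{x}_{ic}$ and $\vec{x}_{jc}$ are parallel, so the cross product vanishes) are not handled.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, with the determinant forced to +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def edge_frames(x):
    """Edge-wise frames F_ij = [a_ij, b_ij, c_ij] of Eq. (27) for all ordered pairs i != j."""
    xc = x - x.mean(axis=0)                       # subtract the center of mass
    frames = {}
    for i in range(len(x)):
        for j in range(len(x)):
            if i == j:
                continue
            a = xc[i] - xc[j]
            b = np.cross(xc[i], xc[j])
            a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
            frames[i, j] = np.stack([a, b, np.cross(a, b)], axis=1)   # columns a, b, c
    return frames

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
R, t = random_rotation(rng), rng.normal(size=3)
F, F_g = edge_frames(x), edge_frames(x @ R.T + t)

v = rng.normal(size=3)                            # any equivariant vector, e.g. a velocity
print("frames equivariant:", all(np.allclose(R @ F[k], F_g[k]) for k in F))
print("frames orthonormal:", all(np.allclose(F[k].T @ F[k], np.eye(3)) for k in F))
print("projections invariant:", all(np.allclose(F[k].T @ v, F_g[k].T @ (R @ v)) for k in F))
```

Because the frame uses a cross product, the check above concerns rotations and translations; a reflection would flip the sign of $\vec{b}_{ij}$.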
There are other works that exploit the scalarization
technique to permit equivariance. GVP-GNN [64] first
performs channel-wise linear projection of the input vector to
align the channel dimension, and then computes the
norm of the projected vector as the scalar that is
multiplied with the vector to form the output vector. During this
process, GVP-GNN does not pass the information from the
input scalars, which is different from EGNN where the input
scalars also influence the update of the vector. EGHN [65],
built upon GMN, leverages a hierarchical encoder-decoder
mechanism to represent the multi-body interaction with
specially-designed equivariant pooling and unpooling
modules. FastEGNN [66] addresses large-scale geometric
graph scenarios by employing a small ordered set of virtual
nodes, which minimizes the number of required edges and
enhances computational efficiency. In LEFTNet [69], a local
hierarchy of 3D isomorphism is proposed to evaluate the
expressive power of equivariant GNNs and investigate the
process of representing global geometric information from
local patches. This work leads to two crucial modules for
designing expressive and efficient geometric GNNs: local
substructure encoding and frame transition encoding. SaVeNet
[70] enhances the numerical stability of the model by
introducing gradually decaying directional noise during the
training phase. ViSNet [71] employs vector-scalar interactive
message passing to implicitly extract various geometric
features. QuinNet [72] integrates many-body interactions,
extending this modeling to include interactions of up to five
bodies. Furthermore, HEGNN [73] leverages the inner product
of high-degree steerable features to enhance scalar messaging,
thereby achieving a balance between efficiency and
effectiveness. Additionally, as scalars can be combined with
various other invariant information, ETNN [74] further
amplifies the expressiveness of the model by introducing deep
topological learning constructs. EquiLLM [75] enhances the
representation of invariant scalars through knowledge
injection from large language models, and can be flexibly
generalized to various geometry tasks.
For all the above methods, the scalarization process is
implemented via the inner-product operator. In contrast,
Frame-Averaging [67] proposes to ensure equivariance via
the averaging process $\frac{1}{|G|}\sum_{g\in G} g\cdot\sigma(g^{-1}\cdot\vec{x})$, where $\sigma$ is an
arbitrary MLP and the term $g^{-1}\cdot\vec{x}$ makes the input invariant.
To deal with the case when the cardinality of $G$ is large, [67]
instead conducts the average over a carefully selected subset
that is obtained by the so-called frame function. The idea of
Frame-Averaging was later exploited in the field of material
design [68].
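The averaging formula above can be illustrated on a small finite group. The sketch below is not the frame construction of [67]: as an assumption made purely for illustration, $G$ is taken to be the cyclic group of 90-degree rotations about the z-axis, and `sigma` is an arbitrary random MLP. It shows that the plain MLP is not equivariant, whereas its group average is.

```python
import numpy as np

def rot_z(k):
    """Rotation by k * 90 degrees about the z-axis."""
    c, s = np.cos(k * np.pi / 2), np.sin(k * np.pi / 2)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

GROUP = [rot_z(k) for k in range(4)]        # the cyclic group C4, used only as a small example

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 3))

def sigma(x):
    """An arbitrary, non-equivariant MLP from R^3 to R^3."""
    return np.tanh(x @ w1) @ w2

def group_average(x):
    """(1/|G|) * sum_g g . sigma(g^{-1} . x), using g^{-1} = g^T for rotations."""
    return sum(g @ sigma(g.T @ x) for g in GROUP) / len(GROUP)

x = rng.normal(size=3)
g0 = GROUP[1]
print("plain MLP equivariant:", np.allclose(sigma(g0 @ x), g0 @ sigma(x)))
print("averaged MLP equivariant:", np.allclose(group_average(g0 @ x), g0 @ group_average(x)))
```

The second check succeeds because substituting $g' = g_0^{-1} g$ in the sum merely reorders the group elements, which is the same argument that underlies the general formula.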
4.3.2 High-degree steerable models
For the aforementioned scalarization-based models, the node
variables to be updated include invariant scalars $h_i$ and
equivariant vectors $\vec{x}_i$ (or $\vec{V}_i$ for the multi-channel case), and
the 3D rotation representation throughout the network is the
rotation matrix $R_g$. Scalars and vectors
are respectively type-0 and type-1 steerable features, and the
rotation matrix is the 1st-degree matrix of a more general
rotation representation. We will show that it is possible to
derive high-degree representations of steerable features
beyond scalars and vectors in equivariant GNNs.
Prior to the introduction of high-degree models, we first
introduce three concepts: 1. Wigner-D matrices [97], which convert
3D rotations to group representations of different degrees;
2. spherical harmonics [98], which convert 3D vectors to steerable
features of different types; 3. the Clebsch-Gordan (CG) tensor
product [99], which performs equivariant mappings between
steerable features.
Wigner-D matrices. In the general high-degree case, a
widely studied family of representations of the rotation
group SO(3) is the irreducible representation [97]:
$$\rho(g) := D^{(l)}(g) \in \mathbb{R}^{(2l+1)\times(2l+1)}, \quad g \in \mathrm{SO}(3), \tag{30}$$
where $D^{(l)}(g)$ is the Wigner-D matrix7) of degree $l$, and
$l \in \mathbb{N} = \{0, 1, 2, \dots\}$. In particular, $D^{(0)}(g) = 1$ reduces to the trivial
representation and $D^{(1)}(g) = R_g$ takes the form of the rotation
matrix. The steerability of a type-$l$ feature $\vec{x}^{(l)} \in \mathbb{R}^{2l+1}$ is
defined as $D^{(l)}(g)\vec{x}^{(l)}$, which naturally unifies the
aforementioned invariant features and equivariant features by
restricting $l = 0$ and $l = 1$, respectively. Provided that there
could be steerable features of multiple types and multiple
channels, we provide a general form of steerable features:
$$\vec{V}^{(\mathbb{L})} := \left\{\vec{V}^{(l)} \in \mathbb{R}^{(2l+1)\times C_l}\right\}_{l\in\mathbb{L}}, \tag{31}$$
where $\mathbb{L}$ is the set consisting of all possible types and $C_l$ is the
number of channels for type $l$. Since we are addressing
geometric graphs in this paper, we will specify the steerable
features of node $i$ as $\vec{V}_i^{(\mathbb{L})}$ and its type-$l$ component as $\vec{V}_i^{(l)}$.
Spherical harmonics. We have defined how to steer type-$l$
features via Wigner-D matrices, but we do not know yet how
to obtain type-$l$ features given 3D coordinates. Spherical
harmonics serve this purpose. Spherical
harmonics form a Fourier basis on the unit sphere $S^2$.
They map 3D vectors on the unit sphere $S^2$ into a $(2l+1)$-dimensional vector space8). That is,
$$Y^{(l)}(\vec{x}) : S^2 \to \mathbb{R}^{2l+1}, \tag{32}$$
where $\vec{x}$ is a unit vector on the sphere, and the elements in
$Y^{(l)}$ are usually used together and denoted as
$[Y^{(l)}_{-l}, Y^{(l)}_{-l+1}, \cdots, Y^{(l)}_{l-1}, Y^{(l)}_{l}]$, where each element corresponds to a
different order. It can also be generalized to take an arbitrary 3D
vector as input by normalizing the vector as $\frac{\vec{x}}{\|\vec{x}\|}$ prior
to feeding it into the spherical harmonics. This offers a unified
view of transition to vector spaces of arbitrary type, where
scalars correspond to $Y^{(0)}(\vec{x}) = 1$ when $l = 0$, and vectors
correspond to $Y^{(1)}(\vec{x}) = \vec{x} \in \mathbb{R}^3$ when $l = 1$. More importantly,
spherical harmonics are equivariant in terms of Wigner-D
matrices:
$$Y^{(l)}(R_g\vec{x}) = D^{(l)}(g)\,Y^{(l)}(\vec{x}), \quad g \in \mathrm{SO}(3), \tag{33}$$
where $R_g \in \mathbb{R}^{3\times 3}$ is the rotation matrix and $D^{(l)}(g) \in
\mathbb{R}^{(2l+1)\times(2l+1)}$ refers to the Wigner-D matrix of degree $l$. To
create multi-type multi-channel steerable features, we apply
$Y^{(l)}$ over multiple copies for each type in $\mathbb{L}$, yielding $Y^{(\mathbb{L})}$.
Clebsch-Gordan (CG) tensor product. Although spherical
harmonics offer a way to design equivariant mappings from 3D
coordinates (type-1 features) to type-$l$ features, they are
unable to depict the interactions between steerable features of
arbitrary types, which, however, is central to the design of
equivariant functions when their input contains steerable
features of various types. Fortunately, the CG tensor product
provides a tractable solution to this issue [99]. It derives
$\vec{V}^{(l)} \in \mathbb{R}^{(2l+1)\times C}$ from two multi-channel steerable features
$\vec{V}^{(l_1)} \in \mathbb{R}^{(2l_1+1)\times C_1}$, $\vec{V}^{(l_2)} \in \mathbb{R}^{(2l_2+1)\times C_2}$ by:
$$\vec{V}^{(l)} = \left[\vec{V}^{(l_1)} \otimes_{\mathrm{cg}}^{W} \vec{V}^{(l_2)}\right]^{(l)}, \tag{34}$$
which can be expanded in detail by:
$$v^{(l)}_{m,c} = \sum_{c_1=1,\,c_2=1}^{C_1,\,C_2} w_{c_1 c_2 c} \sum_{m_1=-l_1,\,m_2=-l_2}^{l_1,\,l_2} Q^{(l,m)}_{(l_1,m_1)(l_2,m_2)}\, v^{(l_1)}_{m_1,c_1}\, v^{(l_2)}_{m_2,c_2}, \tag{35}$$
where $v^{(l)}_{m,c}$ indicates the $m$-th order and $c$-th channel of $\vec{V}^{(l)}$;
$Q^{(l,m)}_{(l_1,m_1)(l_2,m_2)}$ are the Clebsch-Gordan (CG) coefficients [99]
and are zeros unless $|l_1 - l_2| \leqslant l \leqslant l_1 + l_2$; $w_{c_1 c_2 c}$ is the learnable
parameter in the parameter tensor $W \in \mathbb{R}^{C_1\times C_2\times C}$, and when
the entries of $W$ are all ones, Eq. (35) reduces to the traditional non-parametric CG tensor product.
One promising property of the CG tensor product is that it is
SO(3)-equivariant regarding Wigner-D matrices, implying
that $\forall g \in \mathrm{SO}(3)$,
$$D^{(l)}(g)\vec{V}^{(l)} = \left[\left(D^{(l_1)}(g)\vec{V}^{(l_1)}\right) \otimes_{\mathrm{cg}}^{W} \left(D^{(l_2)}(g)\vec{V}^{(l_2)}\right)\right]^{(l)}. \tag{36}$$
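For the simplest non-trivial case $l_1 = l_2 = 1$, the CG tensor product decomposes into types $l \in \{0, 1, 2\}$, which in Cartesian coordinates are the inner product, the cross product, and the symmetric traceless part of the outer product. The sketch below uses this Cartesian form (related to the CG coefficients of Eq. (35) by a fixed change of basis and scaling) to check Eq. (36) numerically for single-channel inputs; the learnable weights $W$ are omitted and the function names are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, with the determinant forced to +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def cg_1x1(u, v):
    """Decompose the tensor product of two type-1 features into types 0, 1 and 2 (Cartesian form)."""
    t0 = u @ v                                                  # type 0: 1 component
    t1 = np.cross(u, v)                                         # type 1: 3 components
    t2 = 0.5 * (np.outer(u, v) + np.outer(v, u)) - (u @ v) / 3.0 * np.eye(3)  # type 2: 5 free components
    return t0, t1, t2

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
R = random_rotation(rng)

t0, t1, t2 = cg_1x1(u, v)
g0, g1, g2 = cg_1x1(R @ u, R @ v)                               # rotate both inputs first

print("type 0 invariant:", np.isclose(t0, g0))
print("type 1 rotates as a vector:", np.allclose(R @ t1, g1))
print("type 2 rotates as R T R^T:", np.allclose(R @ t2 @ R.T, g2))
print("type 2 is traceless:", np.isclose(np.trace(t2), 0.0))
```

The component counts $1 + 3 + 5 = 9$ match the nine entries of the outer product of two 3D vectors, in line with the selection rule $|l_1 - l_2| \leqslant l \leqslant l_1 + l_2$.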
For simplicity, the steerable variables in Eq. (34) are all of a
single type. It is tractable to generalize Eq. (34) to the multi-type
case by employing it over each combination of input-output types,
and assigning different learnable parameters
accordingly, which leads to a general form as follows:
$$\vec{V}^{(\mathbb{L})} = \vec{V}^{(\mathbb{L}_1)} \otimes_{\mathrm{cg}}^{W} \vec{V}^{(\mathbb{L}_2)}. \tag{37}$$
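The equivariance of spherical harmonics with respect to Wigner-D matrices in Eq. (33) can also be checked numerically. The following sketch hard-codes the real spherical harmonics of degree $l = 2$ and, instead of relying on a closed-form Wigner-D matrix, recovers $D^{(2)}(g)$ by least squares from sampled points; this fitting procedure and the names `sh2` and `fit_wigner_d2` are assumptions made for illustration only.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, with the determinant forced to +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def sh2(x):
    """Real spherical harmonics of degree l = 2 (orders m = -2..2), evaluated on normalized inputs."""
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    X, Y, Z = x[..., 0], x[..., 1], x[..., 2]
    c = np.sqrt(15.0 / (4.0 * np.pi))
    return np.stack([c * X * Y, c * Y * Z,
                     np.sqrt(5.0 / (16.0 * np.pi)) * (3.0 * Z ** 2 - 1.0),
                     c * X * Z, 0.5 * c * (X ** 2 - Y ** 2)], axis=-1)

def fit_wigner_d2(R, rng, n_points=50):
    """Recover D^(2)(g) numerically by solving Y(R x) = D Y(x) in the least-squares sense."""
    pts = rng.normal(size=(n_points, 3))
    Dt, *_ = np.linalg.lstsq(sh2(pts), sh2(pts @ R.T), rcond=None)   # solves sh2(pts) @ D^T = sh2(R pts)
    return Dt.T

rng = np.random.default_rng(0)
R1, R2 = random_rotation(rng), random_rotation(rng)
D1, D2, D12 = fit_wigner_d2(R1, rng), fit_wigner_d2(R2, rng), fit_wigner_d2(R1 @ R2, rng)

test_pts = rng.normal(size=(10, 3))                                  # held-out points
print("Eq. (33) holds:", np.allclose(sh2(test_pts @ R1.T), sh2(test_pts) @ D1.T))
print("D is orthogonal:", np.allclose(D1 @ D1.T, np.eye(5)))
print("D(g1 g2) = D(g1) D(g2):", np.allclose(D12, D1 @ D2))
```

The last check reflects that $g \mapsto D^{(l)}(g)$ is a group representation, which is what allows type-$l$ features to be steered consistently across layers.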
With the above building blocks, we introduce below several
prevailing high-degree steerable models where the updated
steerable variables for each node are $\vec{V}_i^{(\mathbb{L})}$.
TFN [7]. With our formulation for the high-degree steerable
operations, Tensor Field Network (TFN) computes the
following equivariant point convolution:
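$$\vec{M}_{ij}^{(\mathbb{L})} = Y^{(\mathbb{L})}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes_{\mathrm{cg}}^{W} \vec{V}_j^{(\mathbb{L})}, \tag{38}$$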
where ⃗xi j = ⃗xi − ⃗x j is the radial vector, and the element in W
is generated by a radial MLP f (∥⃗xi j ∥) upon the distance ∥⃗xi j ∥ .
Here ⃗xi are fixed as the initial coordinates of the input data.
The update of each node is implemented as a series of
operations including aggregation:
$$\vec{U}_i^{(\mathbb{L})} = \vec{V}_i^{(\mathbb{L})} + \sum_{j\in\mathcal{N}(i)} \vec{M}_{ij}^{(\mathbb{L})}, \tag{39}$$
and self-interaction:
$$\vec{V}_i^{(\mathbb{L})} = \left\{\vec{U}_i^{(l)} W^{(l)}\right\}_{l\in\mathbb{L}}, \tag{40}$$
where $W^{(l)} \in \mathbb{R}^{C_l\times C_l}$ is the learnable channel-mixing matrix for
each type $l$, and node-wise non-linearity:
$$\vec{V}'^{(\mathbb{L})} = \left\{\vec{V}^{(l)}\,\sigma\left(\|\vec{V}^{(l)}\|_2 + b^{(l)}\right)\right\}_{l\in\mathbb{L}}, \tag{41}$$
where $\sigma(\cdot)$ is an activation function, “$\|\cdot\|_2$” is the L2 vector
norm over the order dimension (with size $(2l+1)$) of $\vec{V}^{(l)}$, and
$b^{(l)} \in \mathbb{R}^{C_l}$ is the bias for type $l$.
7) Wigner-D matrices lie in the complex space, but they can be transformed to the real space under appropriate bases.
8) Similar to Wigner-D matrices, the outputs of spherical harmonics are complex but can be transformed into real space under certain bases.
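The norm-based non-linearity of Eq. (41) can be sketched for a single type $l$ as follows. The sketch assumes a sigmoid for $\sigma$ and, since real Wigner-D matrices are orthogonal, uses a random orthogonal matrix as a stand-in for $D^{(l)}(g)$; both choices, and the name `norm_nonlinearity`, are illustrative assumptions. An element-wise `tanh` is included as a contrast.

```python
import numpy as np

def norm_nonlinearity(V, b):
    """Eq. (41) for one type l: V has shape (2l+1, C) and b has shape (C,)."""
    norms = np.linalg.norm(V, axis=0)               # L2 norm over the order dimension, one per channel
    gate = 1.0 / (1.0 + np.exp(-(norms + b)))       # sigma chosen as a sigmoid here
    return V * gate                                 # rescale each channel, keep its direction

rng = np.random.default_rng(0)
l, C = 2, 4
V, b = rng.normal(size=(2 * l + 1, C)), rng.normal(size=C)

# Stand-in for a real Wigner-D matrix D^(l)(g): any orthogonal matrix preserves the channel norms.
D, _ = np.linalg.qr(rng.normal(size=(2 * l + 1, 2 * l + 1)))

print("norm gate equivariant:", np.allclose(norm_nonlinearity(D @ V, b), D @ norm_nonlinearity(V, b)))
print("element-wise tanh equivariant:", np.allclose(np.tanh(D @ V), D @ np.tanh(V)))
```

Acting only on the invariant norm is what lets the non-linearity commute with the group action, whereas an element-wise activation mixes the orders in a basis-dependent way.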
SEGNN [9]. SEGNN enhances TFN from equivariant point
convolution to general equivariant message passing. Firstly,
SEGNN involves high-degree geometric features from both
node $i$ and $j$ in message computation by deriving
$\vec{V}_{ij}^{(\mathbb{L})} = \vec{V}_i^{(\mathbb{L})} \oplus \vec{V}_j^{(\mathbb{L})} \oplus \{\|\vec{x}_{ij}\|^2\}$, where, again, $\vec{x}_{ij} = \vec{x}_i - \vec{x}_j$ is the
radial vector, and “$\oplus$” denotes concatenation along the
channel dimension for the steerable features with the same
type $l \in \mathbb{L}$. For example,
$$\vec{V}_1^{(\mathbb{L})} \oplus \vec{V}_2^{(\mathbb{L})} := \left\{\vec{V}_1^{(l)} \,\|\, \vec{V}_2^{(l)}\right\}_{l\in\mathbb{L}}. \tag{42}$$
Here, “$\|$” stands for concatenation along the channel.
Subsequently, the high-degree linear message passing
specified in Eq. (38) is extended to a non-linear fashion via
gated non-linearities [100]:
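$$\vec{V}_{ij}^{(\mathbb{L})}, g_{ij} = Y^{(\mathbb{L})}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes_{\mathrm{cg}}^{W_1} \vec{V}_{ij}^{(\mathbb{L})}, \tag{43}$$
$$\vec{M}_{ij}^{(\mathbb{L})} = \mathrm{Gate}\left(\vec{V}_{ij}^{(\mathbb{L})}, \mathrm{Swish}(g_{ij})\right), \tag{44}$$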
where Gate (·) is the gated non-linearity introduced in [100],
Swish (·) is the Swish activation [101], and gi j is a scalar read
out from the CG tensor product that will further be leveraged
to control the scale in the non-linearity of Eq. (44). Notably,
the CG product and non-linearity in Eqs. (43) and (44) are
performed twice in the implementation of [9]. Analogous to
the design of multi-layer perceptrons (MLPs), they are dubbed
the steerable MLP.
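The update function also employs the proposed steerable
MLP. In detail,
$$\vec{V}_i^{(\mathbb{L})}, g_i = \sum_{j\in\mathcal{N}_i} Y^{(\mathbb{L})}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes_{\mathrm{cg}}^{W_2} \vec{V}_i^{(\mathbb{L})} + \sum_{j\in\mathcal{N}_i} \vec{M}_{ij}^{(\mathbb{L})}, \tag{45}$$
$$\vec{V}_i'^{(\mathbb{L})} = \vec{V}_i^{(\mathbb{L})} + \mathrm{Gate}\left(\vec{V}_i^{(\mathbb{L})}, \mathrm{Swish}(g_i)\right). \tag{46}$$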
Besides those introduced above, there are still
many methods to build equivariant models with high-degree
steerable features. Cormorant [76] utilizes channel-wise CG
product (a reduced and more efficient form of Eq. (34) that
acts on each input channel independently) and channel
concatenations to formulate one-body and two-body
interactions among the input graph systems. NequIP [10]
improves the convolutional layer in TFN [7] by further
introducing the radial Bessel functions and a polynomial
envelope function used in DimeNet [3] to get a better
embedding of interaction distance, thereby improving the
performance of the model. SCN [78] regards each node
embedding as a set of spherical functions (i.e., the spherical
harmonics), then conducts message passing by rotating the
embeddings based on the 3D edge orientation, and finally
updates the node embeddings via discrete spherical Fourier
Transform. Its follow-up work, eSCN [79], proposes to reduce
the computation complexity of the equivariant convolution on
SO(3) with a mathematically equivalent one on SO(2). To
enable higher body interaction beyond the two-body modeling
in most previous papers, MACE [80] and Allegro [77]
propose a simplified algorithm to construct the tensor product
term, motivated by a technique in physics called Atomic
Cluster Expansion (ACE) [102–104].
An illustrative comparison of invariant GNNs, scalarization-based
equivariant GNNs, and high-degree steerable
equivariant GNNs is summarized in Table 2.
4.4 Geometric graph transformers
Inspired by the significant success of Transformers [105,106]
in many areas, such as natural language processing and
computer vision, there have been efforts to apply these
self-attention-based architectures to data structures like graphs or
even geometric graphs in the scope of this survey.
Summarized in Fig. 4, these methods stem from different
types of geometric representations, including invariant
representation, scalarization-based equivariant representation,
and high-degree steerable representation, which have been
elaborated in Section 4. Below we discuss these Transformers
in detail.
Table 2 Illustrations of representative models for invariant GNNs, scalarization-based GNNs and high-degree steerable GNNs. Notably, these three types of
models are able to process geometric features of different degrees

| Model class | Message computation | Feature update |
|---|---|---|
| Invariant GNNs (e.g., SchNet [47]) | $m_{ij} = \sigma_2(r_{ij})\,\sigma_1(h_j)$ | $h'_i = \sigma_3\big(h_i, \sum_{j\in\mathcal{N}_i} m_{ij}\big)$ |
| Scalarization-Based Models (e.g., EGNN [5]) | $m_{ij} = \sigma_1\big(h_i, h_j, \lVert\vec{x}_i - \vec{x}_j\rVert^2, e_{ij}\big)$; $\vec{m}_{ij} = (\vec{x}_i - \vec{x}_j)\,\sigma_2(m_{ij})$ | $h'_i = \sigma_3\big(h_i, \sum_{j\in\mathcal{N}_i} m_{ij}\big)$; $\vec{x}'_i = \vec{x}_i + \gamma\sum_{j\in\mathcal{N}_i}\vec{m}_{ij}$ |
| High-Degree Steerable Models (e.g., TFN [7]) | $\vec{M}_{ij}^{(\mathbb{L})} = Y^{(\mathbb{L})}\big(\frac{\vec{x}_{ij}}{\lVert\vec{x}_{ij}\rVert}\big) \otimes_{\mathrm{cg}}^{W} \vec{V}_j^{(\mathbb{L})}$ | $\vec{V}_i'^{(\mathbb{L})} = \vec{V}_i^{(\mathbb{L})} + \sigma\big(\vec{V}_i^{(\mathbb{L})}, \sum_{j\in\mathcal{N}(i)}\vec{M}_{ij}^{(\mathbb{L})}\big)$ |

Graphormer [81,82]. Graphormer was first proposed
as a powerful Transformer architecture operating on graphs,
equipped with centrality encoding, spatial encoding, and edge
encoding [81]. With its success on challenging 2D graph
datasets, e.g., the OGB-LSC Challenge [107], it has been
subsequently extended to work on geometric graphs with
special designs in computing the encodings. To be specific,
the spatial encoding, which aims to measure the spatial
relation between node $i$ and $j$ in $\vec{\mathcal{G}}$, is chosen to be the
Euclidean distance $\|\vec{x}_i - \vec{x}_j\|_2$ transformed by Gaussian basis
functions [108]. The centrality encoding is derived as a
summation of the spatial encodings over the connected edges
for each node. The encodings are then utilized in computing
the self-attention, and layer normalization is also adopted for
the intermediate features. Notably, all representations are
E(3)-invariant under the construction of Graphormer. In order
to make it suitable for E(3)-equivariant prediction tasks, [81]
proposes to use a projection head as the final block, which
aggregates the edge vectors, scaled by their corresponding
attention weights, to obtain a node-wise vector as output:
$$\vec{f}_i = \sum_{j\neq i} a_{ij}\,(\vec{x}_i - \vec{x}_j), \tag{47}$$
where $a_{ij}$ is the E(3)-invariant attention weight between node
$i$ and $j$.
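The projection head of Eq. (47) can be sketched as below. This is a deliberately reduced stand-in for Graphormer rather than its actual architecture: the attention weights are computed only from Gaussian basis expansions of pairwise distances with a random weight vector `w`, and the names `gaussian_basis`, `projection_head`, `centers` and `width` are hypothetical. The check covers rotations, translations and a reflection, i.e., E(3).

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, with the determinant forced to +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def gaussian_basis(d, centers, width=0.5):
    """Expand distances d of shape (n, n) into Gaussian basis features of shape (n, n, K)."""
    return np.exp(-((d[..., None] - centers) ** 2) / (2.0 * width ** 2))

def projection_head(x, w, centers):
    """f_i = sum_{j != i} a_ij (x_i - x_j), with invariant softmax attention a_ij (Eq. (47))."""
    rel = x[:, None, :] - x[None, :, :]                 # (n, n, 3)
    d = np.linalg.norm(rel, axis=-1)                    # invariant pairwise distances
    logits = gaussian_basis(d, centers) @ w             # (n, n) invariant attention logits
    np.fill_diagonal(logits, -np.inf)                   # exclude j = i
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                   # softmax over neighbors j
    return (a[..., None] * rel).sum(axis=1)             # (n, 3)

rng = np.random.default_rng(0)
n, K = 5, 8
x, w, centers = rng.normal(size=(n, 3)), rng.normal(size=K), np.linspace(0.0, 4.0, K)
R, t = random_rotation(rng), rng.normal(size=3)

f = projection_head(x, w, centers)
print("rotation + translation:", np.allclose(f @ R.T, projection_head(x @ R.T + t, w, centers)))
print("reflection:", np.allclose(f @ (-R).T, projection_head(x @ (-R).T, w, centers)))
```

The output is a force-like vector per node: it is unchanged by translations and rotates or reflects together with the input coordinates.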
TorchMD-Net [83]. TorchMD-Net is an equivariant
Transformer that tackles general multi-channel geometric
vectors in a scalarization-based manner, akin to PaiNN [6].
Yet, in the process of attention computation, only invariant
representations hi and distances ∥⃗xi j ∥ are involved.
Specifically, the distance is first embedded by two MLPs
$\sigma_{d_K}$ and $\sigma_{d_V}$ for the key and value, respectively:
$$d^K_{ij} = \sigma_{d_K}\left(e^{(ij)}_{\mathrm{RBF}}\right), \quad d^V_{ij} = \sigma_{d_V}\left(e^{(ij)}_{\mathrm{RBF}}\right), \tag{48}$$
where $e^{(ij)}_{\mathrm{RBF}}$ is the radial basis function representation of the
distance $\|\vec{x}_{ij}\|$, similar to Eq. (7). The query, key, and value
are given by linear transformations of the input scalar features:
$$q_i = h_i W_Q, \quad k_{ij} = h_i W_K \odot d^K_{ij}, \quad v_{ij} = h_j W_V \odot d^V_{ij}, \tag{49}$$
where “ ⊙ ” is the element-wise product. Instead of the
traditionally adopted Softmax operator [105], TorchMD-Net
uses the simpler SiLU non-linearity:
$$a_{ij} = \sum_{\mathrm{Ch}} \mathrm{SiLU}\left(q_i \odot k_{ij}\right) \cdot \mathrm{Cutoff}\left(\|\vec{x}_{ij}\|\right), \tag{50}$$
with Cutoff(·) being a cosine cutoff on the distance and the
summation being over the channels of these invariant features.
Finally, the output of the attention is yielded as
$$h'_i = \Big(\sum_{j\in\mathcal{N}_i} a_{ij}\,v_{ij}\Big) W_O, \tag{51}$$
with $W_O$ being a linear transformation for the output.
SE(3) -Transformer [8]. Different from Graphormer and
TorchMD-Net that limit the representation to scalars and
vectors with degree l ∈ {0, 1} , SE(3) -Transformer employs
attention mechanism on general steerable features with high
degree. Following our notations introduced in Section 4.3.2,
we describe the attention computation as follows.
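The point-wise query $\vec{Q}_i^{(\mathbb{L})}$ and pairwise key $\vec{K}_{ij}^{(\mathbb{L})}$ and value
$\vec{V}_{ij}^{(\mathbb{L})}$ are derived as:
$$\begin{aligned}
\vec{Q}_i^{(\mathbb{L})} &= 1 \otimes_{\mathrm{cg}}^{W_Q} \vec{V}_i^{(\mathbb{L})}, \\
\vec{K}_{ij}^{(\mathbb{L})} &= Y^{(\mathbb{L})}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes_{\mathrm{cg}}^{W_K} \vec{V}_j^{(\mathbb{L})}, \\
\vec{V}_{ij}^{(\mathbb{L})} &= Y^{(\mathbb{L})}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes_{\mathrm{cg}}^{W_V} \vec{V}_j^{(\mathbb{L})}.
\end{aligned} \tag{52}$$
The attention coefficient $\alpha_{ij}$ is computed as a Softmax
aggregation over the neighbors, with the logits being the inner
products of the queries and keys, ensuring rotation invariance:
$$\alpha_{ij} = \frac{\exp\left(\vec{Q}_i^{(\mathbb{L})} \cdot \vec{K}_{ij}^{(\mathbb{L})}\right)}{\sum_{k\in\mathcal{N}_i} \exp\left(\vec{Q}_i^{(\mathbb{L})} \cdot \vec{K}_{ik}^{(\mathbb{L})}\right)}. \tag{53}$$
The attention is then utilized to aggregate the values and
update the node feature:
$$\vec{V}_i'^{(\mathbb{L})} = 1 \otimes_{\mathrm{cg}}^{W_1} \vec{V}_i^{(\mathbb{L})} + \sum_{j\in\mathcal{N}_i} \alpha_{ij}\,\vec{V}_{ij}^{(\mathbb{L})}. \tag{54}$$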
With the invariant attention, the updated feature is easily
guaranteed to satisfy SE(3) -equivariance.
Besides, LieTransformer [84] extends the idea of LieConv
[54] by building attentions on top of lifting and sampling on
Lie groups. GVP-Transformer introduced in [85] leverages
GVP-GNN [64] as the structural encoder and applies a generic
Transformer over the extracted representation, exhibiting
strong performance in learning inverse folding of proteins.
Equiformer [11] proposes to replace dot product attention in
Transformers by MLP attention and non-linear message
passing, building upon the space of high-degree steerable
tensors. EquiformerV2 [86] further incorporates eSCN [79] in
the architecture for efficient modeling and introduces more
technical enhancements like specially designed attention re-normalization
and layer normalization for better empirical
performance. Geoformer [87] develops an invariant module
called Interatomic Positional Encoding (IPE) based on the
invariant basis from ACE, in order to enhance the
expressiveness of many-body contributions in the attention
blocks. Recently, SO3KRATES [88] proposed a technique
aimed at leveraging the advantages of high-degree
representations while simplifying the complexity inherent in
tensor products. This approach focuses on the design of a
model that utilizes only the paths that yield scalars in tensor
products. Later, GotenNet [89] broadened the scope of the
inner product form, creating a multi-channeled version and
referring to models that employ this methodology as
spherical-scalarization models. GotenNet integrated the inner
product with the original attention mechanism, resulting in an
efficient equivariant transformer architecture.
Previous transformers typically focus on a specific
domain, either proteins or small molecules. EPT [90] proposes
a novel pretraining framework designed to harmonize the
geometric learning of small molecules and proteins. It unifies
the geometric modeling of multi-domain molecules via a
block-enhanced representation upon a PaiNN-based transformer
framework.
4.5 Theoretical analysis on expressivity
In machine learning, an important criterion for measuring the
expressiveness of a network is whether it has the universal
approximation property. In the task of learning on geometric
graphs, this asks whether any function of geometric graphs can
be approximated by geometric GNNs with arbitrary accuracy.
An initial attempt to explore this problem is conducted by
[109], which proves the universality of the high-degree
steerable model, i.e., TFN [7], over point clouds (namely
fully-connected geometric graphs) by showing that TFN can
fit any equivariant polynomial. GemNet [4] further
demonstrates that the universality holds with just spherical
representations rather than the full SO(3) representations that
are required in the proof of [109]. Later, the GWL framework
[96] defines a geometric version of the Weisfeiler-Lehman
(WL) test [110] to study the expressive power of geometric
GNNs operating on sparse graphs from the perspective of
discriminating geometric graphs, and discusses the difference
in expressivity between various invariant and equivariant
GNNs, both theoretically and experimentally. One crucial
conclusion drawn by the GWL paper is that GWL is strictly
more powerful than invariant GWL, showing the advantage of
equivariant GNNs over invariant GNNs. For fully-connected
geometric graphs, invariant GWL has the same
expressive power as GWL. More recently, HEGNN [73] has
provided both theoretical and experimental insights into the
necessity of employing high-degree steerable features on
symmetric graphs. Specifically, under the strict equivariance
constraint, the degradation of representations of certain
degrees on symmetric graphs cannot be avoided unless it is
circumvented by relaxing some conditions (e.g., probabilistic
symmetry breaking in SymPE [111]). Furthermore, HEGNN
establishes a connection between high-degree steerable
features and Legendre polynomials, indicating that the inner
product of sufficiently high-degree representations can recover
all angular information present in geometric graphs.
There are other works that only investigate the universality
of the message computation function [46,51]. They explore
the expressivity of the scalarization-based models (e.g.,
EGNN), and [46] confirms that the scalarization-based
methods can universally approximate any invariant/equivariant
functions of vectors. Besides, SGNN [25]
generalizes from equivariance to subequivariance, which depicts
the case when part of the symmetry is broken by an external force
field, e.g., gravity, and finally designs a universal form of
subequivariant functions.
5
Applications
In this section, we systematically review the applications
related to geometric graph learning. We classify existing
methods according to the system types they work on, which
leads to the categorization of tasks on particle, (small)
molecule, protein, molecule + molecule (Mol + Mol),
molecule + protein (Mol + Protein), protein + protein, and
other domains, as summarized in Table 3. We also provide a
summary of all related datasets of single- and multiple-instance
tasks in Tables 4 and 5, respectively. It is worth
mentioning that our discussion primarily focuses on the
methods utilizing geometric GNNs, although other
methods, such as sequence-based approaches, may be
applicable in certain applications.
5.1 Tasks on particles
The particle representation serves as an abstract and unified
concept in the context of dynamic modeling in physics. Rigid
bodies, elastic bodies and even fluid can be modeled as a set
of particles [25]. Under such particle-based modeling, a
physical object of interest corresponds to a geometric graph $\vec{\mathcal{G}}$
as specified in Definition 4, where different particles are
modeled as different nodes, and physical interactions between
particles such as attraction/repulsion force, collision, rolling,
and sliding are denoted as edge connections.
5.1.1 Physical dynamics simulation
Geometric GNNs have been widely applied to characterize the
process of general physical dynamics. One typical example is
N-body simulation, which was originally proposed by [27] and
aims at modeling the dynamics of a prototype system
composed of N interacting particles. While it is built under an
ideal condition, an N-body system is capable of representing
various physical phenomena across a spectrum ranging from
quantum physics to astronomy, by accommodating
diverse interactions. Other examples include the simulation of
physical scenes that involve more complex objects including
fluids, rigid bodies, deformable bodies, and human motions.
Task definition: Given the initial state of the system
represented by a geometric graph $\vec{\mathcal{G}}^{(0)}$, the future states of all
N particles after a period of $k$ steps are predicted by a
parametric function:
$$\vec{X}^{(t+k)} = \phi_\theta\left(\vec{\mathcal{G}}^{(t)}\right). \tag{55}$$
In contrast to the above single-state prediction setting, one
may also conduct a “roll-out” simulation by recurrently taking
the predicted output of the current state as the input for the
prediction of the next state. Furthermore, it can also be
extended to the spatio-temporal setting by taking the historical
geometric graphs within a window of size $w$ (namely
$\vec{\mathcal{G}}^{(t-w+1:t)}$) as input, rather than a single input frame (namely
$\vec{\mathcal{G}}^{(t)}$) in Eq. (55).
Symmetry preserved: This is an E(3)-equivariant task, as
the transformation of the initial state results in the same
transformation of the predicted state. It means
$g \cdot \phi_\theta\big(\vec{\mathcal{G}}\big) = \phi_\theta\big(g \cdot \vec{\mathcal{G}}\big), \ \forall g \in \mathrm{E}(3)$.
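The roll-out setting and the symmetry requirement above can be illustrated with a toy predictor. In the sketch below, `step` is a hypothetical hand-written stand-in for a learned one-step model $\phi_\theta$ (pairwise directions weighted by an invariant function of distance), not any method from the literature; `rollout` feeds each prediction back as the next input, and the final check verifies that transforming the initial state transforms the whole trajectory in the same way.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition, with the determinant forced to +1."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def step(x, v, dt=0.01):
    """Toy equivariant one-step predictor standing in for a learned model phi_theta."""
    rel = x[None, :, :] - x[:, None, :]                  # rel[i, j] = x_j - x_i
    d = np.linalg.norm(rel, axis=-1, keepdims=True)      # invariant distances
    force = (rel * np.exp(-d)).sum(axis=1)               # invariant weights times directions
    v_new = v + dt * force
    return x + dt * v_new, v_new

def rollout(x, v, k):
    """Recurrently apply the one-step predictor k times and return all k + 1 position frames."""
    traj = [x]
    for _ in range(k):
        x, v = step(x, v)
        traj.append(x)
    return np.stack(traj)                                # (k + 1, n, 3)

rng = np.random.default_rng(0)
x0, v0 = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
R, t = random_rotation(rng), rng.normal(size=3)

traj = rollout(x0, v0, k=20)
traj_g = rollout(x0 @ R.T + t, v0 @ R.T, k=20)           # positions rotate and shift, velocities only rotate
print("roll-out equivariant:", np.allclose(traj @ R.T + t, traj_g))
```

Since each step is equivariant, the property carries over to the whole roll-out by induction over the steps.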
Datasets: The datasets used in current methods belong to the
following classes: 1) N-body dataset series. The original N-body
dataset [27] presents an environment capable of
simulating three types of systems, including 1D phase-coupled
oscillators, 2D springs, and 2D charged balls. The authors in
[8] further generalize N-body to encompass 3D cases.
Recently, the work [51] designs Constrained N-body by
adding geometric constraints between particles, leading to a
combination of diverse systems with isolated particles, sticks
and hinges. Later, the systems derived by [65] further
introduce the interactions between complex objects that are
composed of multiple particles interconnected by rigid sticks.
2) Scene simulation datasets. The paper [118] proposes four
simulation environments: FluidFall, FluidShake, BoxBath, and
RiceGrip, where the first two focus on fluid modeling, the
third one tests fluid-rigid interactions, and the final one
involves modeling deformable objects with elastic/plastic
Jiaqi HAN et al.
A survey of geometric graph neural networks: data structures, models and applications
13
Table 3 Summary of various geometric GNNs for different tasks. The generative tasks indicates the ones addressable by generative models, otherwise
referred to as the non-generative tasks. The ones can be solved with either generative or non-generative models are dubbed as the mixed tasks
Data type
Particle
Small molecule
Task name
Task type
N -body simulation
Non-generative
Scene simulation
Non-generative
Molecular property
prediction
Non-generative
Molecular dynamics
Mixed
Molecular generation
Generative
Pretraining
Mixed
Protein property
prediction
Non-generative
Protein inverse folding Generative
Protein
Protein folding
Generative
Protein co-design
Generative
Pretraining
Mixed
Linker design
Generative
Chemical reaction
Generative
Ligand binding affinity Non-generative
prediction
Mol + Protein Protein-ligand docking Mixed
Pocket-based mol
Mixed
sampling
Protein interface
Non-generative
prediction
Binding affinity
Non-generative
prediction
Protein-protein
Protein +
Mixed
docking
Protein
Mol + Mol
Others
Antibody design
Mixed
Peptide design
Mixed
Crystal property
prediction
Non-generative
Crystal generation
Generative
RNA structure ranking Non-generative
Methods
Physics
NRI [27], IN [112], E-NFs [5], EGNN [5], SEGNNs [9], GMN [51], EGHN [65], HOGN [113],
NCGNN [114], FastEGNN [66], HEGNN [73]
SGNN [25], GNS [26], GNS* [115], C-GNS [116], HGNS[117], DPI-Net [118], HRN [119],
FIGNet [120], EGHN [65], LoCS [61], EqMotion [121], ESTAG [31], SEGNO [122],
FastEGNN [66], HEGNN [73], EquiLLM [75]
Biochemistry
Cormorant [76], TFN [7], SE(3)-Transformer [8], NequIP [10], SEGNNs [9], LieConv [54],
Lietransformer [84], SchNet [47], DimeNet [3], GemNet [4], PaiNN [6], TorchMD-Net [83],
Equiformer [11], SphereNet [123], EGNN [5], Graphormer [81,82], SCN [78], eSCN [79],
GNN-LF [124], LEFTNet [69], SaVeNet [70], ViSNet [71], QuinNet [72], SO3KRATES [88],
Gaunt [125], GotenNet [89]
E-CNF [126], EGNN [5], NequIP [127], GMN [51], EGHN [65], NCGNN [114], ESTAG [31],
EGNO [127], SEGNO [122], ITO [128], E-ACF [129], GeoTDM [130], HEGNN [73],
StABlE [131], [132]
GeoDiff [133], GeoLDM [134], ConfVAE [135], ConfGF [136], G-SchNet [137], cG-SchNet [138],
MDM [139], MolDiff [140], DGSM [141], E-NFs [142], EDM [143], GeoMol [144], Torsional
Diffusion [30], MPerformer [145], EEGSDE [146], DMCG [147], HierDiff [148], EquiFM [149],
CoarsenConf [150], GeoBFN [151], MolCRAFT [152]
3D-EMGP [153], GeoSSL-DDM [154], GraphMVP [155], GNS-TAT [156], MGMAE [157], 3DInfomax [158], Uni-Mol [159], Transformer-M[160], MoleculeSDE [161], SliDe [162], Frad [163],
DenoiseVAE [164], MolSpectra [165]
LM-GVP [166], DeepFRI [167], GearNet [168], 3DCNN [169], TM-align [170], GVP-GNN [64],
PAUL [171], EDN [172], EnQA [173], ScanNet [174], EquiPocket [175], PocketMiner [176]
GVP-GNN [64], [177], ESM-IF1 [85], GCA [178], ProteinMPNN [179], PiFold [180],
LM-Design [181], KW-Design [182]
AlphaFold [33], AlphaFold2 [183], RosettaFold [12], RosettaFold2 [48], RFAA [184],
EigenFold [185], RFdiffusion [13], Chroma [14], ESMFold [186], HelixFold-Single [187]
Chroma [14], RFdiffusion [13], PROTSEED [188], ReQFlow [189]
ProtTrans [190], xTrimoPGLM [191], ProtGPT2 [192], HJRSS [193], GearNet [168], ProFSA
[194], PromptProtein [195], DrugCLIP [196], ESM-1b [197], ESM2 [186], Guo et al. [198],
PAAG [199]
DiffLinker [200], DeLinker [201], 3DLinker [28]
OA-ReactDiff [202], TSNet [203]
TargetDiff [29], MaSIF [204], GET [205], ProtNet [206], HGIN [207], BindNet [208],
BADGER [209], DeepTernary [210]
EquiBind [211], DiffDock [16], TankBind [212], DESERT [213], FABind [214], Re-Dock [215]
Pocket2Mol [216], TargetDiff [29], DiffBP [217], SBDD [218], GraphBP [219], FLAG [220],
DESERT [213], D3FG [221], MolCRAFT [152], MolJO [222], DiffBP [217], VoxBind [223]
DeepInteract [224], dMaSIF [225], SASNet [226]
mmCSM-PPI [227], GeoPPI [228], GET [205]
EquiDock [229], HMR [230], HSRN [231], DiffDock-PP [232], SyNDock [233], AlphaFold-Multimer [234],
dMaSIF [235], ElliDock [236], EBMDock [237]
DiffAb [238], MEAN [32], dyMEAN [17], RefineGNN [239], PROTSEED [240], AbBERT [240],
ADesigner [241], AbODE [242], AbDiffuser [243], tFold [244], GeoAB [245], RAAD [246],
EquiLLM [75]
HelixGAN [247], RFdiffusion [13], PepGLAD [35], PPFlow [248]
Other domains
CGCNN [249], MEGNet [250], ALIGNN [251], ECN [252], Matformer [253], Crystal Twins [254],
MMPT [255], CrysDiff [256]
CDVAE [257], SyMat [49], DiffCSP [50], DiffCSP++ [258], MatterGen [259], PXRDGen [260],
EquiCSP [261], FlowMM [262], CrysBFN [263]
ARES [35], PaxNet [264], EquiRNA [265]
properties. Similar to BoxBath, Water-3D created by [26]
randomly initializes the water states and constructs a high-resolution
water scenario. Beyond the simulation of particle-level
interaction in previous datasets, Kubric [266] and MIT
Pushing [268] can be utilized to evaluate face interactions.
Physion [267] is a large-scale dataset that involves more
realistic and diverse objects driven by more complex physical
interactions, including gravity, friction, elasticity, and other
factors.
Methods: Plenty of studies have been devoted to learning to
simulate complex physical systems using GNNs, including
Interaction Network [112], NRI [27], HRN [119], DPI-Net
[118], HOGN [113], GNS [26], C-GNS [116], HGNS [117],
GNS* [115], and FIGNet [120]. However, all these methods
adopt typical GNNs that are unaware of the full symmetry of the 3D
world, and only a subset of them considers translation-equivariance.
Since the work of SE(3)-Transformer [8], roto-translation
equivariance is introduced upon the attention-based
geometric GNNs to address the N-body problem. Later,
EGNN [5] proposes a more effective E(n)-equivariant GNN
by using the scalarization-based strategy as already detailed in
Section 4.3.1. In contrast to EGNN, SEGNN [9] proposes a
general SE(3)-equivariant message passing by making use of
high-order degree representations. Recently, GMN [51] has
developed multi-channel equivariant modeling specifically for
constrained N-body systems consisting of sticks or hinges.

Table 4 Summary of typical datasets and benchmarks for the single instance applications

| Dataset | Number of samples | Task | Benchmark |
|---|---|---|---|
| Particle | | | |
| N-Body [27] | 70K | N-body Simulation | NRI [27] |
| 3D N-Body [5] | 7K | N-body Simulation | EGNN [5] |
| Constrained N-Body [51] | 5.5K | N-body Simulation | GMN [51] |
| Hierarchical N-Body [5] | 9K | N-body Simulation | EGHN [5] |
| Water3D [26] | 0.8K | Scene Simulation | GNS [26] |
| Kubric MOVi-A [266] | 0.02K | Scene Simulation | GNS* [115] |
| Physion [267] | 16K | Scene Simulation | SGNN [25] |
| MIT Pushing [268] | 6K | Scene Simulation | FIGNet [120] |
| FluidFall [118] | 3K | Scene Simulation | DPI-Net [118] |
| FluidShake [118] | 2K | Scene Simulation | DPI-Net [118] |
| BoxBath [118] | 3K | Scene Simulation | DPI-Net [118] |
| RiceGrip [118] | 5K | Scene Simulation | DPI-Net [118] |
| Small molecule | | | |
| QM9 [21] | 134K | Molecule Property Prediction | ATOM3D [269] |
| | | Molecule Generation | GEOM-QM9 [270] |
| | | Molecule Pretraining | 3D-Infomax [158] |
| MD17 [271] | 3.6M | Molecule Property Prediction | SchNet [47] |
| | | Molecular Dynamics | GMN [51] |
| OCP [272] | 9.8M | Molecule Property Prediction | eSCN [79] |
| | | Molecular Dynamics | GemNet [4] |
| AdK [273] | 4.1K | Molecular Dynamics | EGHN [5] |
| DW-4 [126] | 10K | Molecular Dynamics | EQ-Flow [126] |
| LJ-13 [126] | 10K | Molecular Dynamics | EQ-Flow [126] |
| Fast-folding proteins [274] | 5M | Molecular Dynamics | ITO [128] |
| GEOM [275] | 450K | Molecule Property Prediction | SchNet [47] |
| | | Molecule Generation | GEOM-Drugs [270] |
| | | Molecule Pretraining | GMN [51] |
| PCQM4Mv2 [107] | 3.3M | Molecule Pretraining | 3D PGT [276] |
| QMugs [277] | 665K | Molecule Pretraining | 3D-Infomax [158] |
| Uni-Mol [159] | 209M | Molecule Pretraining | Uni-Mol [159] |
| Protein | | | |
| GENE Ontology [278] | 33.5K | Protein Property Prediction | GearNet [168] |
| ENZYME [279] | 18.5K | Protein Property Prediction | GearNet [168] |
| CATH [280] | 189K | Protein Inverse Folding | GVP-GNN [166] |
| | | Protein Pretraining | S2F [281] |
| | | Protein Co-Design | PROTSEED [188] |
| SCOPe [282] | 108K | Protein Inverse Folding | ProstT5 [283] |
| | | Protein Pretraining | ProSE [284] |
| | | Protein Property Prediction | TAPE [285] |
| AlphaFoldDB [286] | 200M | Protein Folding | ESMFold [186] |
| | | Protein Inverse Folding | AlphaDesign [287] |
| | | Protein Pretraining | GearNet [168] |
| UniProt [288] | 216M | Protein Pretraining | ProtTrans [190] |
| | | Protein Property Prediction | DeepLoc [289] |
| BFD [290] | 2100M | Protein Pretraining | ProtTrans [190] |
| NetSurfP-2.0 [291] | 11.3K | Protein Pretraining | PEER [292] |
| CASP [293] | 45.7K | Protein Structure Ranking | ATOM3D [269] |
| PDB [294] | 1.2M | Protein Residue Identity | ATOM3D [269] |
| | | Protein Folding | ESMFold [186] |

Upon GMN, EGHN [65] designs equivariant pooling and
equivariant unpooling to handle the complex system with a
hierarchical structure. In the meantime, SGNN [25]
generalizes and relaxes the symmetry from equivariance to
sub-equivariance, which plausibly grants it the capability to
excel in scenarios influenced by other factors like gravity. As
conventional approaches utilize a fixed velocity estimation
throughout the time interval, NCGNN [114] instead estimates
velocities at multiple time points using Newton-Cotes
numerical integration. There are also other works that
approach physical simulation based on the spatio-temporal
setting. LoCS [61] utilizes GRU to record the memory of past
frames and additionally incorporates rotation-invariance to
improve the model’s generalization ability; EqMotion [121]
distills the history trajectories of each node into a
multi-dimension vector and then designs an equivariant module and
an interaction reasoning module to predict future frames;
ESTAG [31] employs equivariant discrete Fourier Transform
along with the equivariant spatio-temporal attention
mechanism to model the physical dynamics. SEGNO [315]
incorporates the second-order graph neural ODE with
equivariant property to reduce the roll-out error of long-term
physical simulation.

Table 5 The summary of typical datasets and benchmarks for the multi-instance applications

| Dataset | Number of samples | Task | Benchmark |
|---|---|---|---|
| Mol + Mol | | | |
| ZINC [295] | 727K | Linker Design | 3DLinker [28] |
| CASF [296] | 0.28K | Linker Design | DeLinker [201] |
| GEOM [275] | 450K | Linker Design | DiffLinker [200] |
| SN2-TS [203] | 0.11K | Chemical Reaction | TSNet [203] |
| Transition1x [297] | 9.6M | Chemical Reaction | OA-ReactDiff [202] |
| Mol + Protein | | | |
| CrossDocked 2020 [298] | 22.5M | Ligand Affinity | GNINA [298] |
| | | Pocket-Based Molecule Sampling | TargetDiff [29] |
| PDBBind [22] | 23.5K | Ligand Affinity | ATOM3D [269] |
| | | Protein-Ligand Docking | EquiBind [211] |
| Protein + Protein | | | |
| DIPS [226] | 42.8K | Protein Interface Prediction | ATOM3D [269] |
| | | Protein-Protein Docking | EquiDock [229] |
| DIPS-plus [299] | 42.1K | Protein Interface Prediction | DeepInteract [224] |
| Biogrid [300] | 1.7M | Protein Interface Prediction | SYNTERACT [301] |
| DB5.5 [302] | 0.23K | Protein Interface Prediction | ATOM3D-PIP [269] |
| | | Protein-Protein Docking | EquiDock [229] |
| PDBBind [22] | 23.5K | Binding Affinity Prediction | GET [205] |
| SAbDab [23] | 8.1K | Binding Affinity Prediction | GeoPPI [228] |
| RAbD [20] | 0.06K | Antibody Design | RefineGNN [239] |
| SKEMPI 2.0 [303] | 7.1K | Antibody Design | RefineGNN [239] |
| | | Antibody Design | ATOM3D [269] |
| | | Binding Affinity Prediction | mmCSM-PPI [227] |
| Cov-abdab [304] | 2.4K | Antibody Design | RefineGNN [239] |
| PepBDB [305] | 13K | Peptide Design | CAMP [306] |
| LNR [307] | 0.09K | Peptide Design | PDAR [307] |
| PepGLAD [35] | 6K | Peptide Design | PepGLAD [35] |
| PPFlow [248] | 13K | Peptide Design | PPBench2024 [248] |
| Others | | | |
| Materials Project [308] | 154K | Crys. Property Prediction | CGCNN [249] |
| | | Crys. Generation | CDVAE [257] |
| Perov-5 [309,310] | 18.9K | Crys. Generation | CDVAE [257] |
| Carbon-24 [311] | 10.1K | Crys. Generation | CDVAE [257] |
| JARVIS-DFT [312] | 41K | Crys. Property Prediction | JARVIS-ML [313] |
| FARFAR2-Puzzles [314] | 18K | RNA Struct. Ranking | ARES [269] |
| rRNAsolo [265] | 92K | RNA Struct. Ranking | EquiRNA [265] |

5.2 Tasks on small molecules
By representing atom coordinates as node positions and bonds
as edges, a molecule naturally becomes a geometric graph
where $\vec{X} \in \mathbb{R}^{N\times 3}$ represents the positions of $N$ atoms in the
molecule, $H \in \mathbb{R}^{N\times C_h}$ indicates the atom types or other
properties of the atoms, and $A \in \{0, 1\}^{N\times N}$ represents the
existence of bonds. Usually, the edge feature $e_{ij} \in \{0, 1, 2, 3\}$ is
defined by the bond type of the edge from node $i$ to $j$. In
addition to chemical edges, the relative distance $d_{ij}$ between
two atoms is also utilized for constructing k-NN spatial edges
by selecting for each atom the $k$ nearest atoms as its
neighbors, and the spatial edge feature is defined as
$e_{ij} = \sigma(d_{ij})$ where $\sigma$ is a non-linear function, such as RBF.
Prior to the use of geometric graphs, a molecule was
typically represented by a 1D string (e.g., SMILES [316] and
SMARTS [317]) or a 2D topological graph, both of which
discard the geometric information of the molecule,
resulting in inferior performance on tasks that involve
crucial spatial interactions between atoms. Here, we only
introduce the works that apply geometric graphs to represent
molecules.
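The construction of k-NN spatial edges with distance-based features can be sketched as follows. This is a minimal illustration rather than the featurization of any particular model: the function names, the Gaussian form of the RBF, and the values of `gamma` and `centers` are assumptions made for the example.

```python
import numpy as np

def knn_spatial_edges(X, k):
    """Connect every atom i to its k nearest atoms j (j != i).

    X: (N, 3) array of atom positions. Returns a (2, N*k) edge index
    and the (N*k,) array of distances d_ij for those edges.
    """
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # (N, N)
    np.fill_diagonal(D, np.inf)               # an atom is not its own neighbor
    nbrs = np.argsort(D, axis=1)[:, :k]       # k closest atoms per row
    src = np.repeat(np.arange(N), k)
    dst = nbrs.reshape(-1)
    return np.stack([src, dst]), D[src, dst]

def rbf_edge_features(d, centers, gamma=10.0):
    """e_ij = sigma(d_ij), with sigma a bank of Gaussian radial basis functions."""
    return np.exp(-gamma * (d[:, None] - centers[None, :]) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                   # 5 atoms with random positions
edge_index, d = knn_spatial_edges(X, k=2)
e = rbf_edge_features(d, centers=np.linspace(0.0, 5.0, 16))
print(edge_index.shape, e.shape)              # (2, 10) (10, 16)
```

Because the features depend on positions only through $d_{ij}$, they are unchanged when the whole molecule is rotated or translated.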
5.2.1 Molecular property prediction
Molecular property prediction has been a fundamental task in
computational biochemistry and machine learning. As
pinpointed by MoleculeNet [45], common properties can be
subdivided into four categories: quantum mechanics, physical
chemistry, biophysics, and physiology. With the help of
geometric GNNs, we are now able to additionally consider the
molecular geometries which have been demonstrated to be
crucial in determining the quantum chemistry properties of
molecules.
Task definition: With the input molecule characterized as a
geometric graph $\vec{G}$, the task is to learn a model $\phi_\theta$ to predict a
scalar property $y$ and/or a vectorial property $\vec{y}$:
$$y, \vec{y} = \phi_\theta\big(\vec{G}\big). \tag{56}$$
While most works mainly focus on the single-task setting by
predicting each individual type of property independently, it is
also possible to leverage the multi-task setting by predicting
multiple types of property simultaneously.
Symmetry preserved: It is an SE(3)-invariant task in terms
of $y$ since it remains unaffected by any rotation or translation
exerted on the molecule, i.e., $\phi_\theta(\vec{G}) = \phi_\theta(g \cdot \vec{G}), \forall g \in \mathrm{SE}(3)$.
As for $\vec{y}$, we enforce SE(3)-equivariance into the model:
$g \cdot \phi_\theta(\vec{G}) = \phi_\theta(g \cdot \vec{G}), \forall g \in \mathrm{SE}(3)$.
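Both conditions are easy to test numerically. The sketch below uses a hypothetical hand-written `toy_model` in place of $\phi_\theta$ (it is not any method from this survey), and it assumes that $g$ acts on a vector-valued output through its rotation part only, as is the case for direction-like quantities such as forces.

```python
import numpy as np

def random_rotation(rng):
    """Draw a 3x3 rotation matrix (orthogonal, determinant +1) via QR."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def toy_model(X, h):
    """Hypothetical stand-in for phi_theta: returns a scalar y and a vector y_vec."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    y = float(np.sum(h[:, None] * h[None, :] * np.exp(-D ** 2)))  # uses distances only
    y_vec = h @ (X - X.mean(axis=0))    # weighted sum of centered positions
    return y, y_vec

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))     # atom positions
h = rng.normal(size=6)          # one invariant scalar feature per atom
R, t = random_rotation(rng), rng.normal(size=3)

y, y_vec = toy_model(X, h)
y_g, y_vec_g = toy_model(X @ R.T + t, h)   # g . G: rotate and translate every atom

print(np.isclose(y, y_g))                  # invariance of the scalar output
print(np.allclose(R @ y_vec, y_vec_g))     # equivariance of the vector output
```

The same kind of check can serve as a unit test for a learned model, by replacing `toy_model` with the network's forward pass.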
Datasets: There are currently three popular data sources for
the evaluation of this task, including QM9 [21], MD17 [271],
and Open Catalyst Project (OCP) [272]. The QM9 dataset
contains 131K small organic molecules with up to nine heavy
atoms (C, O, N, and F), and each molecule is annotated with 13
property labels ranging from the highest occupied molecular
orbital to the norm of the dipole moment. MD17 is a
collection of molecular dynamics simulations for eight small
organic molecules, whose goal is to predict both the energy
and atomic forces of each molecule, given the atom
coordinates in the non-equilibrium and slightly moving
system. OCP consists of more than 100M atomic structures for
catalysts to help address climate change, each composed of a
molecule called the adsorbate placed on a slab called the catalyst.
OCP provides two datasets, OC20 [34] and OC22 [272], for
benchmarking. Among the three kinds of tasks in OCP,
Initial Structure to Relaxed Energy (IS2RE), which takes an
initial structure as input and predicts the relaxed energy, is a
highly challenging one.
Methods: Most of the methods introduced in Section 4 are
evaluated on molecular property prediction tasks. Here, to
avoid redundant introduction, we no longer describe each
method in detail and only specify which of the three
mentioned benchmarks they are evaluated on. Specifically,
invariant GNNs (including SchNet [47], DimeNet [3],
SphereNet [123], and GemNet [4]), equivariant GNNs
(including Cormorant [76] and PaiNN [6]) and equivariant
graph transformers (e.g., TorchMD-Net [83] and Equiformer
[11]) employ both QM9 and MD17 for performance
comparisons. Other methods like NequIP [10] are evaluated
on MD17, while EGNN [5], LieConv [54] and SE(3)-Transformer [8]
are evaluated on QM9. SEGNN [9],
Graphormer [81,82], Equiformer [11], SCN [78], and eSCN
[79] leverage more challenging benchmarks, namely, OC20
and even OC22 for performance assessment, revealing
encouraging effectiveness of applying geometric GNNs to
catalyst design.
5.2.2 Molecular dynamics simulation
Molecular Dynamics (MD) simulation aims to simulate the
temporal evolution process of molecules driven by internal
interactions between atoms within the same molecule, external
interactions among different molecules, or environmental
interactions from solvents and force fields.
Task definition: Given an input molecular graph at time $t$,
i.e., $\vec{G}^{(t)}$, this task simulates the dynamical evolution of the
molecule over some time. In general, the future coordinates
$\vec{X}^{(t+k)}$ ($k > 0$) are estimated by
$$\vec{X}^{(t+k)} = \phi_\theta\big(\vec{G}^{(t)}\big). \tag{57}$$
Similar to general physical dynamics simulation in Section
5.1.1, one may also conduct a roll-out prediction setting or the
spatio-temporal input setting. Besides, in contrast to the direct
trajectory prediction here, MD can be alternatively addressed
with the methods designed for molecular property prediction
as described in the last subsection. We can first predict the
node-level force $\vec{F} \in \mathbb{R}^{N\times 3}$ or the graph-level system energy
$E \in \mathbb{R}$ for the given state of the system $\vec{G}$, and then use these
estimated quantities to update the molecular dynamics by
solving the differential equations that describe molecular
dynamics.
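The force-based alternative amounts to plugging a force predictor into a numerical integrator for Newton's equations. The following sketch uses velocity Verlet, a standard integrator chosen here only for illustration; `toy_force` is a hypothetical placeholder for a learned force model, and the masses, time step, and number of steps are arbitrary.

```python
import numpy as np

def toy_force(X, k=1.0):
    """Placeholder for a learned force predictor: springs toward the centroid."""
    return -k * (X - X.mean(axis=0))

def velocity_verlet(X, V, force_fn, masses, dt, n_steps):
    """Integrate m * d2X/dt2 = F(X) with the velocity Verlet scheme."""
    F = force_fn(X)
    for _ in range(n_steps):
        V_half = V + 0.5 * dt * F / masses[:, None]   # half-step velocity update
        X = X + dt * V_half                           # full-step position update
        F = force_fn(X)                               # forces at the new positions
        V = V_half + 0.5 * dt * F / masses[:, None]   # second half-step velocity update
    return X, V

def total_energy(X, V, masses, k=1.0):
    """Kinetic plus potential energy for the toy spring system above."""
    kinetic = 0.5 * np.sum(masses[:, None] * V ** 2)
    potential = 0.5 * k * np.sum((X - X.mean(axis=0)) ** 2)
    return kinetic + potential

rng = np.random.default_rng(0)
X0, V0 = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
masses = np.ones(4)
X1, V1 = velocity_verlet(X0, V0, toy_force, masses, dt=0.01, n_steps=1000)

E0, E1 = total_energy(X0, V0, masses), total_energy(X1, V1, masses)
print(abs(E1 - E0) / E0)   # relative energy drift over the trajectory
```

With an equivariant force predictor, each integration step commutes with rotations and translations of the input, so the simulated trajectory inherits the symmetry.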
Symmetry preserved: Clearly, the output coordinate matrix
$\vec{X}^{(t+k)}$ is E(3)-equivariant.
Datasets: MD17 [271], AdK [273], OCP [272], DW-4
[126], fast-folding proteins [274], and LJ-13 [126] are
available datasets for MD simulation in the machine learning
community. MD17 [271], which is usually used for molecular
property prediction, also contains the trajectories of eight
molecules generated via DFT. The AdK equilibrium trajectory
dataset simulated by CHARMM27 force field in the
MDAnalysis software [318] involves the MD trajectory of apo
adenylate kinase with explicit water and ions in NPT at 300 K
and 1 bar, where the atom positions of the protein are saved
every 240 ps for a total of 1.004 μs. Besides the common
relaxed energy prediction task, OCP releases a dataset split for
MD, which computes short, high-temperature ab initio MD
trajectories on a randomly sampled subset of the relaxed
states. DW-4 is a relatively simple system consisting of only 4
particles embedded in a 2D space which are governed by an
energy function between pairs of particles, while LJ-13 is
given by the Lennard-Jones potential, consisting of 13
particles embedded in a 3D space. Both energy functions in
DW-4 and LJ-13 satisfy E(3)-equivariance. The fast-folding
proteins dataset includes 12 structurally diverse proteins, such
as Chignolin, Trp-Cage, and BBA. The simulations were
conducted in explicit solvent, with simulation lengths ranging
from 100 μs to 1 ms.
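As a concrete instance of such an energy function, the sketch below evaluates a Lennard-Jones energy for 13 particles and checks that its value does not change under an orthogonal transformation plus a translation. It uses the textbook 12-6 form with illustrative parameters `eps` and `sigma`, which may differ from the exact LJ-13 setup of [126], and the particle positions are synthetic.

```python
import numpy as np

def lennard_jones_energy(X, eps=1.0, sigma=1.0):
    """Sum of 4*eps*((sigma/r)^12 - (sigma/r)^6) over all particle pairs."""
    i, j = np.triu_indices(X.shape[0], k=1)       # each unordered pair once
    r = np.linalg.norm(X[i] - X[j], axis=-1)
    sr6 = (sigma / r) ** 6
    return float(np.sum(4.0 * eps * (sr6 ** 2 - sr6)))

rng = np.random.default_rng(0)

# 13 particles near lattice sites, so that no two particles are too close.
grid = np.array([[a, b, c] for a in range(3) for b in range(3) for c in range(3)],
                dtype=float)
X = 1.2 * grid[:13] + 0.05 * rng.normal(size=(13, 3))

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # orthogonal: rotation or reflection
t = rng.normal(size=3)

E = lennard_jones_energy(X)
E_g = lennard_jones_energy(X @ Q.T + t)
print(np.isclose(E, E_g, rtol=1e-9))

# Sanity check: two particles at the potential minimum r = 2^(1/6) * sigma.
pair = np.array([[0.0, 0.0, 0.0], [2.0 ** (1.0 / 6.0), 0.0, 0.0]])
E_pair = lennard_jones_energy(pair)
```

The energy depends on positions only through pairwise distances, which is why its value is preserved by these transformations.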
Methods: As a multi-channel version of EGNN [5], GMN
[51] focuses specifically on the physical dynamics by
considering the geometric constraints (such as chemical
bonds) between atoms and achieves promising results on the
MD simulation task in MD17. EGHN [65] develops an
equivariant version of UNet [319] equipped with equivariant
pooling/unpooling layers to better reveal the hierarchy of large
molecules such as proteins, leading to state-of-the-art
performance on the AdK dataset. NequIP [127] learns interatomic
potentials and forces using high-order geometric tensors and
E(3)-equivariant convolution layers, achieving high data
efficiency and quantum chemical level accuracy for MD17.
By observing that GMN and other related geometric GNN
methods only learn constant integration of the velocity,
Newton–Cotes GNN [114] predicts the integration based on
several velocity estimations with Newton–Cotes formulas and
proves its effectiveness theoretically and empirically. ESTAG
[31] reformulates dynamics simulation as a spatio-temporal
prediction task by employing the trajectory in the past period
to recover the non-Markovian interactions. EGNO [127]
models the MD trajectory as a function over time using neural
operators. SEGNO [315] leverages the second-order
continuity information to enhance simulation performance.
GeoTDM [130] further leverages the diffusion model to
perform trajectory generation on molecular dynamics.
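The difference between integrating a constant velocity and combining several velocity estimates with a Newton–Cotes formula can be illustrated on a known velocity profile. This is only a sketch of the numerical idea: the velocities here come from an analytic function chosen for the example rather than from a network, and the helper name and weight table are assumptions, not the implementation of [114].

```python
import numpy as np

def newton_cotes_displacement(velocities, dt_total):
    """Displacement over an interval of length dt_total from equally spaced velocities.

    velocities: array of shape (m, ...) with m = 2 (trapezoid), 3 (Simpson),
    or 4 (Simpson 3/8) samples, including both endpoints of the interval.
    """
    weights = {2: [1 / 2, 1 / 2],
               3: [1 / 6, 4 / 6, 1 / 6],
               4: [1 / 8, 3 / 8, 3 / 8, 1 / 8]}
    velocities = np.asarray(velocities)
    w = np.asarray(weights[len(velocities)])
    return dt_total * np.tensordot(w, velocities, axes=1)

# Velocity of one particle, v(t) = (1 + t^2, 2t, 1), sampled at t = 0, 0.5, 1.
ts = np.array([0.0, 0.5, 1.0])
velocities = np.stack([1.0 + ts ** 2, 2.0 * ts, np.ones_like(ts)], axis=-1)

constant = 1.0 * velocities[0]                        # fixed-velocity estimate
simpson = newton_cotes_displacement(velocities, 1.0)  # three-point Newton-Cotes
exact = np.array([4.0 / 3.0, 1.0, 1.0])               # analytic integral of v(t)

print(constant, simpson, exact)
```

For this polynomial velocity the three-point rule reproduces the exact displacement, whereas the fixed-velocity estimate misses the contribution of the acceleration.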
Considering the uncertainty of molecular dynamics at the
quantum scale, some methods aim to fit the equilibrium
distribution of molecules rather than predicting a single
molecular conformation. By leveraging the continuous
normalizing flows, E-CNF [126] predicts SE(3)-equivariant
molecular conformers through the invariant CoM prior density
and equivariant vector fields, showing better generation
capabilities compared to invariant flows. Later, E-ACF [129]
employs the augmented normalizing flow [320] to learn the
target distribution of molecules from MD trajectories, which
retains SE(3)-equivariance by projecting the atomic Cartesian
coordinates into the SE(3)-invariant vector space.
Furthermore, ITO [128] utilizes the score matching diffusion
model for stochastic dynamics across multiple time-scales,
with extended SE(3)-equivariant PaiNN architecture [321],
showcasing considerable generalization ability for different
molecular scales.
5.2.3 Molecular generation
Molecule generation plays a central role in drug discovery and
material design. Its goal is to generate novel molecules with
properties of interest by using machine learning.
Task definition: Basically, the methods for molecular
generation learn a parametric probability distribution $p_\theta(\vec{G})$
from an observed dataset $\mathcal{D} := \{\vec{G}_i\}$. A novel molecular
geometric graph is then sampled from the learned distribution:
$$\vec{G} \sim p_\theta(\vec{G}). \tag{58}$$
Instead of generating a whole geometric graph (namely de
novo generation), some methods investigate the
conditional generation paradigm by generating the 3D
coordinates $\vec{X}$ given the 2D topological graph $G(H, A)$,
forming the so-called conformation generation problem
$\vec{X} \sim p_\theta(\vec{X} \mid H, A)$.
Symmetry preserved: The generative model $p_\theta(\vec{G})$ should
be E(3)-invariant, i.e., $p_\theta(g \cdot \vec{G}) = p_\theta(\vec{G}), \forall g \in \mathrm{E}(3)$. This is to
ensure that the probability distribution is unaffected by the
specific choice of the coordinate system to describe a
molecule. In some methods as presented later, $p_\theta(\vec{G})$ is
marginalized from a joint distribution
$p_\theta(\vec{G}, \vec{G}^{(0)}) = p_\theta(\vec{G} \mid \vec{G}^{(0)})\, p(\vec{G}^{(0)})$, where $p(\vec{G}^{(0)})$ denotes a certain initial
distribution. In this scenario, the initial distribution $p(\vec{G}^{(0)})$
should be E(3)-invariant and the likelihood distribution
$p_\theta(\vec{G} \mid \vec{G}^{(0)})$ should be E(3)-equivariant, to guarantee the
E(3)-invariance of $p_\theta(\vec{G})$ [143].
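The last claim can be verified in a few lines. The derivation below is a sketch under simplifying assumptions rather than the argument as given in [143]: $g$ is restricted to an orthogonal transformation, so the change of variables has unit Jacobian, and an equivariant likelihood is read as $p_\theta(g \cdot \vec{G} \mid g \cdot \vec{G}^{(0)}) = p_\theta(\vec{G} \mid \vec{G}^{(0)})$. Translations need separate treatment (for instance by working in the zero center-of-mass subspace), which is not covered here.

```latex
\begin{aligned}
p_\theta(g \cdot \vec{G})
  &= \int p_\theta\big(g \cdot \vec{G} \mid \vec{G}^{(0)}\big)\,
          p\big(\vec{G}^{(0)}\big)\, \mathrm{d}\vec{G}^{(0)} \\
  &= \int p_\theta\big(g \cdot \vec{G} \mid g \cdot \tilde{G}^{(0)}\big)\,
          p\big(g \cdot \tilde{G}^{(0)}\big)\, \mathrm{d}\tilde{G}^{(0)}
     && \text{substituting } \vec{G}^{(0)} = g \cdot \tilde{G}^{(0)},\ |\det g| = 1 \\
  &= \int p_\theta\big(\vec{G} \mid \tilde{G}^{(0)}\big)\,
          p\big(\tilde{G}^{(0)}\big)\, \mathrm{d}\tilde{G}^{(0)}
     && \text{equivariant likelihood, invariant prior} \\
  &= p_\theta(\vec{G}).
\end{aligned}
```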
Datasets: QM9 [21] and GEOM [275] are two prevailing
datasets used for molecular generation. In particular, QM9
consisting of about 134K organic molecules contains the
molecular 3D structures (e.g., the coordinates of each atom in
3D space) and a wide range of chemical properties for each
molecule. GEOM is a comprehensive dataset containing over
37 million molecular conformations, offering diverse
conformation ensembles for each 2D molecular structure.
Methods: Current methods can be divided into two classes,
namely, conformation generation and de novo generation.
Conformation generation is to generate 3D conformation
given the 2D graph representation. Traditional methods [321]
focus on the two-stage strategy: first predicting distances and
then reconstructing coordinates, which can however lead to
unrealistic structures if the predicted distances are invalid. To
avoid this issue, ConfVAE [135] reformulates the generation
task as a bilevel optimization problem under the framework of
VAE [322], where the distance prediction and conformation
generation are optimized jointly in an end-to-end manner. At
the same time, ConfGF [136] estimates the gradient fields of
inter-atomic distances by using denoising score matching, and
then samples the conformations via annealed Langevin
dynamics. Later, DGSM [141] further extends ConfGF by
additionally modeling long-range interactions between
non-bonded atoms. Instead of expensively optimizing a force field,
GeoMol [144] predicts the local 3D geometries including
bond distances and torsion angles simultaneously in an
SE(3)-invariant way. Without predicting intermediate values like
inter-atomic distances, DMCG [147] generates the 3D atomic
coordinates by iteratively refining the initial coordinate
predictions while accounting for invariance through its
designed loss function. Due to the success of diffusion
models, GeoDiff [133] leverages graph field network to learn
SE(3)-invariant distribution, and Torsional Diffusion [30]
operates in torsion angle space rather than in Euclidean space.
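The weakness of the two-stage strategy mentioned above can be seen in a small experiment. The sketch below reconstructs coordinates from a distance matrix with classical multidimensional scaling, which is used here as a simple stand-in for the distance-geometry step of such pipelines and is not the procedure of any specific method; the perturbed matrix `D_bad` plays the role of imperfect predicted distances.

```python
import numpy as np

def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def coords_from_distances(D, dim=3):
    """Classical multidimensional scaling: recover coordinates from distances.

    Returns the (N, dim) coordinates and all eigenvalues of the Gram matrix
    (sorted in descending order) so that the caller can inspect validity.
    """
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gram matrix of centered coordinates
    w, U = np.linalg.eigh(B)                  # eigenvalues in ascending order
    w, U = w[::-1], U[:, ::-1]
    X = U[:, :dim] * np.sqrt(np.clip(w[:dim], 0.0, None))
    return X, w

rng = np.random.default_rng(0)
X_true = rng.normal(size=(6, 3))
D = pairwise_distances(X_true)

# Valid distances: the structure is recovered up to rotation/reflection/translation.
X_rec, w = coords_from_distances(D)
err_valid = np.abs(pairwise_distances(X_rec) - D).max()

# Perturbed ("predicted") distances need not correspond to any 3D structure.
E = rng.normal(scale=0.3, size=D.shape)
E = 0.5 * (E + E.T)
np.fill_diagonal(E, 0.0)
D_bad = np.abs(D + E)
X_bad, w_bad = coords_from_distances(D_bad)
err_invalid = np.abs(pairwise_distances(X_bad) - D_bad).max()

print(err_valid, err_invalid)
```

For exact distances the Gram matrix has at most three non-zero eigenvalues and the reconstruction reproduces every distance. For the perturbed matrix, the reconstructed structure no longer matches the requested distances, which is the failure mode that jointly optimized or coordinate-based methods try to avoid.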
As for de novo generation, a series of methods have been
proposed thanks to the fruitful progress of generative models
[323]. Built upon SchNet [47], G-SchNet [137] introduces an
autoregressive model to directly generate 3D molecular
structures, while maintaining physical constraints. cG-SchNet
[138] further extends G-SchNet to property-guided generation.
Leveraging the generative capabilities of flow models, E-NFs
[142] reformulates generation as the task of solving a
continuous-time ODE, where the dynamics are predicted by
EGNN [5]. By harnessing the power of diffusion, EDM [143]
exploits E(3) equivariance by employing EGNN [5] to
enhance the diffusion process across both continuous and
discrete features. GeoLDM [134] further maps the geometric
features into the latent space where latent diffusion is
performed. Rooted in EDM, EEGSDE [146] formulates the
generation process as an equivariant SDE and employs a
meticulously designed energy function to guide the
generation. Recently, MDM [139] takes into account interatomic
forces at varying distances (e.g., van der Waals forces),
and injects variational noises to enhance performance for large
molecules and improve generation diversity. To address the
atom-bond inconsistency problem, MolDiff [140] introduces a joint
atom-bond diffusion framework and bond guidance to make
sure atoms are better suited for bonding. HierDiff [148] adopts
a hierarchical diffusion which first generates the coarse
positions of molecular fragments and then fills in the
fine-grained atomic geometry. EquiFM [149] further explores de
novo generation with flow matching, utilizing different
probability paths for atom type and structure generation.
5.2.4 Molecular pretraining
Given that molecular labeling is expensive to obtain,
pretraining molecular representation models without labels
becomes fundamental and indispensable in real applications.
These pretrained models can then be directly transferred or
fine-tuned for specific downstream tasks, such as predicting
binding affinity and molecular stability, thereby alleviating
data scarcity and improving training efficiency. Previous
research primarily focused on pretraining models utilizing
non-geometric information, including SMILES notations
[324], chemical graphs [325], functional groups [326], etc.
Recently, there has been a growing interest in self-supervised
pretraining on the 3D geometric structure of molecules.
Task definition: Suppose $\phi_\theta(\vec{G})$ to be the representation
model, and $\mathcal{L}\big(\hat{y}(\vec{G}), \phi_\theta(\vec{G})\big)$ to be the self-supervised training
objective where $\hat{y}(\vec{G})$ denotes the pseudo label created based
on the structure of $\vec{G}$. The representation model is optimized
to minimize the self-supervised objective as
$$\theta = \arg\min_{\theta} \mathcal{L}\big(\hat{y}(\vec{G}), \phi_\theta(\vec{G})\big). \tag{59}$$
Symmetry preserved: The representation model $\phi_\theta(\vec{G})$ is
E(3)-equivariant if $\hat{y}(\vec{G})$ is a steerable vector, and is
E(3)-invariant if $\hat{y}(\vec{G})$ consists of scalars.
Datasets: PCQM4Mv2 [327] is a comprehensive quantum
chemistry dataset consisting of 3.37 million molecules derived
from the OGB benchmark, which was originally curated as
part of the PubChemQC project [328]. QM9 [21] is another
popular dataset that encompasses quantum chemistry
structures and properties, featuring 134K molecules.
QMugs [277] expands QM9 by offering a more extensive
collection of drug-like molecules, totaling 665K molecules.
GEOM [275] is an energy-annotated molecular conformation
dataset containing 37 million molecular conformations
sourced from multiple datasets, such as QM9 and CREST
program [329]. Uni-Mol [159] constructs a conformation
dataset containing 19 million molecules. It utilizes ETKDG
with Merck Molecular Force Field optimization in RDKit to
generate 11 random conformations for each molecule,
resulting in a total of 209 million conformations.
Methods: A variety of studies investigate the denoising
objective, pretraining the model by recovering the original
signal from a perturbed input. Specifically, GeoSSL-DDM
[154] formulates the denoising objective based on atomic
distance. Uni-Mol [159] proposes position denoising and joint
training between 3D molecular conformations and candidate
protein binding pockets. GNS-TAT [156] establishes a
connection between coordinate denoising and the potential
energy of molecular conformations. MGMAE [157] proposes
a reconstruction strategy to train on the heterogeneous
atom-bond graph with a high mask ratio. 3D-EMGP [153] further
proposes to predict the atomic pseudo force field which is
estimated by a Riemann-Gaussian denoising distribution to
ensure E(3)-invariant pretraining loss. Apart from the
denoising objective, GraphMVP [155] leverages the
correlation between 2D molecular graphs and 3D
conformations, constructing a contrastive objective for the
model pretraining. Similar to GraphMVP, Transformer-M
[160] leverages positional encodings and attention biases to
encode the 2D and 3D structures in one Transformer model.
Meanwhile, 3D-Infomax [158] exploits this correspondence
by attempting to maximize the mutual information between
2D molecular graph embeddings and learned representations
of the corresponding 3D graphs. MoleculeSDE [161] extends
3D-Infomax [158] and leverages group symmetric stochastic
differential equation models to establish a connection between
3D geometries and 2D topologies, with a tighter MI bound.
Frad [163] decomposes molecules into fragments to fix the
rigid parts and pretrains the model via denoising on the
flexible parts. SliDe [162] explores pretraining with denoising
from a distribution that encodes physical principles.
DenoiseVAE [164] utilizes a learnable noise generation
strategy to adaptively acquire atom-specific noise distributions
for different molecules, which results in more accurate force
field learning.
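A generic form of the coordinate denoising objective shared by several of these methods can be sketched as follows. Here the pseudo label $\hat{y}(\vec{G})$ is the Gaussian noise injected into the coordinates and the loss is a mean squared error; the noise scale `sigma`, the loss form, and the two hand-written predictors are assumptions for illustration and do not reproduce any specific method above.

```python
import numpy as np

def denoising_loss(predict_noise, X, sigma, rng):
    """Perturb coordinates with Gaussian noise and score the predicted noise."""
    eps = rng.normal(size=X.shape)            # injected noise, used as pseudo label
    X_noisy = X + sigma * eps                 # perturbed conformation
    eps_hat = predict_noise(X_noisy)          # model output, same shape as X
    return float(np.mean((eps_hat - eps) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                   # clean coordinates of 8 atoms
sigma = 0.1

# Two reference "models": one that predicts nothing, one that cheats by knowing X.
zero_predictor = lambda X_noisy: np.zeros_like(X_noisy)
oracle_predictor = lambda X_noisy: (X_noisy - X) / sigma

loss_zero = denoising_loss(zero_predictor, X, sigma, rng)
loss_oracle = denoising_loss(oracle_predictor, X, sigma, rng)
print(loss_zero, loss_oracle)
```

A trained representation model would sit between these two extremes. Since the target is a set of 3D vectors, the predictor is expected to be equivariant, in line with the symmetry requirement stated for Eq. (59).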
5.3 Tasks on proteins
Proteins are large biomolecules that are composed of one or
more long chains of amino acid residues. All proteinogenic
amino acids share common structural features, including an
α-carbon to which an amino group, a carboxyl group, and a
variable side chain are bonded. Most proteins fold into unique
3D structures that determine the function and activity of
proteins in biological processes. Owing to the hierarchical
structures of proteins, there are mainly two different ways to
leverage geometric graph $\vec{G}$ to represent proteins. For one
thing, we can treat each residue as a node, the positions of
α-carbons as the coordinate matrix $\vec{X}$ and the residue-level
features as $H$. For another thing, we can apply the full-atom
setting by considering each atom as a node, the positions of all
atoms as $\vec{X}$ and atom-level features as $H$. In both ways, the
edges can be created via either the chemical bonds or cut-off
distances. There are plenty of works that develop machine
learning methods to process proteins. While some of them
focus on 1D residue sequences, this survey is mainly
interested in the study of 3D structures and will demonstrate
several relevant tasks in the following.
5.3.1 Protein property prediction
Similar to molecular property prediction, protein property
prediction is a crucial E(3)-invariant task in computational
biology. Most previous works solely employ residue
sequences to predict protein properties. Thanks to the
development of geometric structure modeling, increasing
attention is being paid to using geometric GNNs to estimate the
functional properties of proteins by exploring 3D structures. In
terms of the prediction granularity, the task of protein property
prediction is classified as protein-level, residue-level and
atom-level prediction, with the details provided below.
Protein-level prediction: Many tasks aim to predict the
functions or certain scores given the protein structure.
(1) Enzyme Commission (EC) number prediction [167] is a
prevailing protein-level classification task which aims to
predict the catalyzed reaction class of the given enzyme.
(2) Gene Ontology (GO) term prediction [167] seeks to
predict the functional classes concerning gene ontology given
the protein structure, whose data is usually split into three
tracks: molecular function (MF), biological process (BP), and
cellular component (CC). (3) Protein structure ranking
learns a quality score function of the given protein structure to
estimate the structural similarity between the candidate protein
and the native structure. It plays a vital role in computational
biology, as it assists researchers in pinpointing the most
accurate or biologically significant protein conformations
from a collection of potential structures. (4) Protein
localization prediction aims to forecast the subcellular
locations of proteins [289], which is essential to understand
the function of a protein and helps investigate the pathogenesis
of many human diseases [330]. (5) Fitness landscape
prediction primarily focuses on the prediction of the effects of
residue mutations on the fitness of proteins. Typical target
functions include β -lactamase [292], Adeno-Associated Virus
(AAV), Thermostability [331], and Fluorescence and Stability
[285].
Abundant protein-level representation models are available
in existing literature. DeepFRI [167] and LM-GVP [166]
propose a two-stage architecture, which adopts language
models to extract amino acid sequence information and a
graph-based model to learn the interactions between amino acids
simultaneously. Notably, LM-GVP utilizes the equivariant model
GVP [64] as the graph-based model. GearNet [168] proposes a
relational graph convolution layer to better capture the 3D
geometry of proteins, and exploits multi-view contrastive
pretraining to better utilize unlabeled data. As for structure
ranking, TM-align [170] is a typical method that is not based on
deep learning and is time-consuming. Thanks to the expressive
ability of geometric GNNs, [64,172,173] adopt equivariant
GNN models such as GVP [64] and TFN [7] to fulfill model
quality assessment (MQA). In addition, TFN [7] is also used
for ranking protein-protein complexes in PAUL [171].
Residue-level prediction: ATOM3D [269] proposes Residue
Identity (RES) prediction, which aims to predict the amino
acid type at the center of a given local context. The
performance on this task measures whether a model can
capture the structural dependencies between individual amino
acids, which is vital for protein engineering.
Atom-level prediction: The main form of atom-level
prediction lies in pocket detection, which requires predicting
whether an atom on the protein belongs to the binding site
of a potential ligand. Previous methods usually design
algorithms to find and rank the cavities on the protein surface
[332,333], or voxelize the protein structure and use 3D-CNN
for supervised training [334,335]. Notably, a series of works
exploit geometric GNNs to achieve much better
performance (ScanNet [174], EquiPocket [175], and
PocketMiner [176]).
5.3.2 Protein generation
In terms of what to generate, the approaches for protein
generation are categorized into protein folding (or protein
structure prediction), protein inverse folding, and protein
structure and sequence co-design.
Protein folding aims to generate folding structures given
the amino acid sequences of the input protein. This task has
significant implications in the field of drug design. The
folding structure is generated by:
$$\vec{X} \sim p_\theta(\vec{X} \mid s), \tag{60}$$
where $s \in \mathbb{R}^{N}$ denotes the amino acid sequence and
$\vec{X} \in \mathbb{R}^{N \times 3}$ denotes the coordinates of all residues (note that each row of $\vec{X}$
can include more than one 3D coordinate vector if full-atom
coordinates are considered).
Symmetry preserved: This is an equivariant task, implying
that $p_\theta(\vec{X} \mid s) = p_\theta(\vec{X}O + \vec{t} \mid s)$ for an arbitrary orthogonal
transformation $O$ and translation $\vec{t}$. Notably, some methods
generate the distance matrix or other invariant forms of $\vec{X}$,
reducing the task into a trivial generation problem without the
equivariance constraint.
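As a numerical illustration of why invariant forms remove the equivariance constraint, the following NumPy sketch checks that the pairwise distance matrix is unchanged when the coordinates are mapped to XO + t. It is not taken from any cited method; the helper names `random_orthogonal` and `distance_matrix` and the toy coordinates are assumptions made for this example.

```python
import numpy as np

def random_orthogonal(rng):
    """Sample a 3x3 orthogonal matrix (rotation or reflection) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Rescale columns by the signs of diag(r); q stays orthogonal.
    return q * np.sign(np.diag(r))

def distance_matrix(X):
    """Pairwise Euclidean distances between the rows of X (shape N x 3)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))       # toy residue coordinates, one row per residue
O = random_orthogonal(rng)        # arbitrary orthogonal transformation
t = rng.normal(size=3)            # arbitrary translation
X_transformed = X @ O + t         # row-vector convention, as in XO + t

D = distance_matrix(X)
D_transformed = distance_matrix(X_transformed)
print(np.allclose(D, D_transformed))
```

A model that outputs D instead of the coordinates therefore produces the same target for every pose of the same structure, at the cost of needing a separate step to recover coordinates from D.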
Methods: The AlphaFold series [33,183] and RoseTTAFold
series [12,48] represent the forefront of contemporary
techniques in protein folding. They employ a sophisticated
multi-track architecture capable of processing multi-sequence
alignments (MSA), amino acid pair-wise distance maps, and
geometric structures, each with remarkable efficiency.
Building upon these advancements, RoseTTAFold2 [48]
extends the capabilities of both AlphaFold2 [183] and
RoseTTAFold [12] by refining the attention mechanism and
enhancing the three-track architecture, resulting in notable
performance improvements. Moreover, RFAA [184] further
extends RoseTTAFold’s versatility to encompass the design of
various biomolecules beyond proteins, including nucleic acids,
small molecules, and metals. In contrast, ESMFold [336] and
HelixFold-Single [187] depart from traditional
methods by eschewing the requirement for MSA. Instead, they
learn to predict protein structures directly from primary
sequence data, significantly enhancing inference efficiency.
Additionally, EigenFold [185] introduces a novel harmonic
diffusion process that projects protein structures onto
eigenmodes, thereby preventing the disassembly of adjacent
nodes.
Protein inverse folding aims to generate amino acid
sequences conditional on the folding structures of the input
protein. Using the same notation as the task of protein
folding, the model $p_\theta$ generates the amino acid sequence
$s \in \mathbb{R}^{N}$ of interest:
$$s \sim p_\theta(s \mid \vec{X}). \tag{61}$$
Symmetry preserved: This is an invariant task, indicating
that $p_\theta(s \mid \vec{X}) = p_\theta(s \mid \vec{X}O + \vec{t})$ for an arbitrary orthogonal
transformation $O$ and translation $\vec{t}$.
Methods: Typical methods such as [177] and [178] take the
invariant features including distance and dihedral angles as
input, to ensure invariance during generation. More recently,
based on GVP [64], which is E(3)-equivariant, ESM-IF [85]
further incorporates more structure information for the
generation, while keeping the output sequence invariant.
Similarly, LM-Design [181] integrates structural embeddings
into language models to improve the performance of inverse
folding. ProteinMPNN [179] uses an invariant architecture to
embed the backbone and predicts amino acid probabilities
autoregressively while enforcing desired constraints. PiFold
[180] additionally incorporates distance, angle, and direction
features and proposes PiGNN to generate the sequences
non-autoregressively. KW-Design [182] integrates
knowledge from pretrained sequence and structure models to
refine the sequences generated by the baselines with a memory
retrieval mechanism.
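To make the notion of invariant input features concrete, the sketch below computes a signed dihedral angle from four points and checks how it behaves under rigid motions. The function name `dihedral` and the toy coordinates are assumptions for illustration, and the cited methods may construct their features differently. Note that the signed angle is preserved by rotations and translations but changes sign under a reflection.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four 3D points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 orthogonal to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

rng = np.random.default_rng(1)
P = rng.normal(size=(4, 3))        # four consecutive backbone atoms (toy values)

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                  # flip one column so that Q is a proper rotation
t = rng.normal(size=3)

angle = dihedral(*P)
angle_rotated = dihedral(*(P @ Q + t))       # rotation plus translation
mirror = np.diag([1.0, 1.0, -1.0])           # reflection through the xy-plane
angle_mirrored = dihedral(*(P @ mirror))

print(angle, angle_rotated, angle_mirrored)
```

Whether a feature set built from such angles is invariant to the full E(3) group or only to SE(3) therefore depends on whether the sign of the angle is kept.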
Protein structure and sequence co-design aims to
generate both the amino acid sequences and folding structures,
which is formally derived as:
$$\vec{X}, s \sim p_\theta(\vec{X}, s). \tag{62}$$
Symmetry preserved: Clearly, this task is invariant with
respect to $s$, and equivariant with respect to $\vec{X}$.
Methods: Based on RoseTTAFold [12], RFdiffusion [13]
incorporates Gaussian noise into coordinates and Brownian
motion noise into orientations, then denoises the
structure step by step and recovers the sequence using
ProteinMPNN [179]. Meanwhile, Chroma [14] introduces a
programmable diffusion framework that enables diverse
conditional generation and precise
targeting of properties through constraints such as symmetry,
shape, and semantics. Both Chroma and RFdiffusion begin
with structure generation and then sample
the corresponding sequence through another
module. Unlike these two works, PROTSEED [188] designs
the structure and sequence jointly with an encoder-decoder
framework, where the encoder is trigonometry-aware to learn
context features and the decoder is SE(3)-equivariant to
express the sequence and structure.
Datasets: ATOM3D [269] constructs multiple widely-used
datasets tailored for protein design tasks. CASP [293] stands
out as a renowned contest dedicated to protein structure
prediction. In this competition, participants submit predicted
structures for evaluation, particularly when the experimental
structures are not publicly available. The community then
assesses the quality of these submissions. Additionally,
AlphaFoldDB [286], SCOPe [282], and CATH [280] serve as
valuable resources for protein design, providing datasets
comprising protein structures alongside their corresponding
sequences. SCOPe and CATH consist of segmented protein
structure domains, while AlphaFoldDB boasts a repository of
over 200 million complete structures predicted by AlphaFold2
[183]. Moreover, with predictions stemming from ESMFold
[336], the ESM Metagenomic Atlas boasts a collection of
about 772 million metagenomic protein structures.
5.3.3 Protein pretraining
Similar to the molecule pretraining task, protein pretraining also
aims to learn representations of proteins, which can be used in
downstream tasks.
Task definition: Generally, each input protein is modeled as
a geometric graph $\vec{G}$, and the pretraining purpose is to learn a
parametric model $\phi_\theta$ which can output high-quality
representations $H \in \mathbb{R}^{N \times d}$ of the input protein:
$$H = \phi_\theta(\vec{G}). \tag{63}$$
Symmetry preserved: It is equivariant for the output vectors
in $H$, and invariant for the output scalars in $H$.
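The distinction between invariant scalars and equivariant vectors can be illustrated with a hand-written message-passing step. The sketch below is a toy construction for this purpose only: `toy_geometric_layer`, its exponential edge weights, and the random inputs are assumptions and do not correspond to any specific model discussed here. The scalar outputs are unchanged by an orthogonal transformation and translation of the input, while the vector outputs are multiplied by the same orthogonal matrix.

```python
import numpy as np

def toy_geometric_layer(X, h):
    """One hand-written message-passing step on a fully connected graph.

    X: (N, 3) coordinates; h: (N,) invariant node scalars.
    Returns invariant scalars of shape (N,) and vectors of shape (N, 3).
    """
    rel = X[:, None, :] - X[None, :, :]              # (N, N, 3) relative positions
    dist2 = np.sum(rel ** 2, axis=-1)                # squared distances (invariant)
    weight = np.exp(-dist2) * h[None, :]             # invariant edge weights
    np.fill_diagonal(weight, 0.0)                    # no self-messages
    scalars = h + weight.sum(axis=1)                 # invariant update
    vectors = np.einsum("ij,ijk->ik", weight, rel)   # weighted sum of relative positions
    return scalars, vectors

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))
h = rng.normal(size=6)
O, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # orthogonal matrix
t = rng.normal(size=3)

s1, v1 = toy_geometric_layer(X, h)
s2, v2 = toy_geometric_layer(X @ O + t, h)

print(np.allclose(s1, s2))       # scalars are invariant
print(np.allclose(v1 @ O, v2))   # vectors rotate with the input
```

Because the vectors are built from relative positions, they are unaffected by the translation t and only follow the orthogonal part of the transformation.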
Datasets: For protein sequence pretraining methods, UniProt
[288] functions as a central repository for both protein
sequence and functional information. It is organized into
clusters by UniRef [337], with pairwise sequence identity
thresholds typically set at 50% and 100% (referred to as
UniRef50 and UniRef100) to eliminate redundancy. BFD
[290], on the other hand, represents a larger sequence dataset,
formed by amalgamating UniProt with protein sequences
sourced from metagenomic sequencing projects. Furthermore,
NetSurfP-2.0 [291] furnishes labels for protein secondary
structure prediction, delineated into 3-states and 8-states,
offering valuable resources for supervised training. In the
realm of protein structure pretraining and classification,
SCOPe [282], CATH [280], and AlphaFoldDB [286] hold
significant importance. They provide comprehensive
repositories for protein structures, facilitating research and
advancement in the field.
Methods: Previous protein pretraining methods such as
ESM-1b [338], ESM2 [336], ProtTrans [190], xTrimoPGLM
[191], and ProtGPT2 [192], are based on sequence masking
and prediction, inspired by the success of NLP language
models. Readers can refer to the survey by [339] for a more detailed
introduction to protein language models. Recently, attention
has been paid to pretrained models based on 3D structure
information. For instance, GearNet [168], built upon an
invariant GNN with multi-type message passing, leverages
several pretraining objectives including contrastive learning
between sequences and structures, distance/dihedral
prediction, and residue type prediction. Other works like
ProFSA [194] and DrugCLIP [196] also utilize contrastive
learning to learn SE(3)-invariant features, but focus more
on pocket pretraining, where pocket-ligand interaction
knowledge is incorporated as well. Guo et al. [198] employ
pretraining with the protein’s tertiary structure, incorporating
SE(3)-invariant features to ensure the efficient preservation of
SE(3)-equivariance. PAAG [199] enables multi-level
alignment between protein sequence and textual annotation to
capture the fine-grained motif inside the protein and
successfully designs proteins with functional domains.
5.4 Tasks on Mol+Mol
This subsection introduces the tasks with the input of
“molecule+molecule”, including linker design and chemical
reaction prediction.
5.4.1 Linker design
Fragment-based molecule design requires predicting the linker,
a small molecule, so that two or more molecular components
can be combined into novel molecules with desirable
properties. Linkers are of great importance in maintaining the
proper orientation, flexibility, and stability of multi-domain
proteins or fusion proteins.
Task definition: The input consists of two or more unlinked
molecular fragments, which are all represented as geometric
graphs $\{\vec{G}_i\}_{i=1}^{k}$, and the model needs to learn an equivariant
function $f_\theta$ whose output is a small molecule $\vec{G}_L$ used to link
the fragments. Specifically,
$$\vec{X}_L, H_L = f_\theta(\vec{G}_1, \vec{G}_2, \ldots, \vec{G}_k). \tag{64}$$
Symmetry preserved: If we impose the same rotation or translation
on all input fragments simultaneously, the output
coordinates should transform correspondingly while the atom
features remain invariant.
Datasets: The linkers connecting molecules in ZINC [295]
can be computationally synthesized, similar to the methods
employed by [340]. Conversely, CASF [296] offers
experimentally validated molecules for linker design. In
contrast to ZINC and CASF, which typically produce paired
fragments, DiffLinker [200] generates a novel dataset
comprising three or more fragments, drawing from GEOM
[275].
Methods: DeLinker [201] and 3DLinker [28] employ VAE
[322] to create the 3D structure of a linker. However, their
capability is limited to linking only two fragments, rendering
them ineffective when faced with an arbitrary number of
fragments to link. In contrast, DiffLinker [200] has recently
succeeded in addressing this challenge by harnessing an E(3)-equivariant diffusion model configured to handle multiple
fragments.
5.4.2 Chemical reaction prediction
In chemical reactions, identifying and characterizing transition
state (TS) structures is crucial for understanding reaction
mechanisms. This process entails locating the TS structure that
minimizes the system’s potential energy (PE) while adhering
to specific constraints, such as SE(3) invariance.
Task definition: Given a reactant $\vec{G}_R$ and a product $\vec{G}_P$, the
objective is to generate the TS structure $\vec{G}_{TS}$ that optimizes
the following objective:
$$\vec{G}^{*}_{TS} = \operatorname*{argmin}_{\vec{G}_{TS}} \mathrm{PE}(\vec{G}_{TS} \mid \vec{G}_R, \vec{G}_P), \tag{65}$$
where the function $\mathrm{PE}(\cdot)$ returns the potential energy.
Symmetry preserved: In general, the output, namely the TS
structure, is invariant to any independent transformation (e.g.,
rotation) imposed on each of the input structures. If the input
and output are always fixed within the same 3D coordinate
space, then this task is equivariant: when the same
transformation is imposed on the two input structures, the output TS is
transformed in the same way.
Datasets: TSNet [203] has meticulously assembled a dataset
called $\mathrm{S_N}2$-TS, which contains structures of reactants,
transition states (TS), and products pertinent to $\mathrm{S_N}2$ reactions.
Transition1x [297] provides a resource of 9.6 million density
functional theory (DFT) calculations encompassing forces and
energies for molecular configurations across reaction
pathways. This extensive dataset offers valuable information
for training models for reaction prediction.
Methods: OA-ReactDiff [202] introduces a diffusion model
to generate transition state (TS) structures. This model ensures
SE(3)-equivariance of the score function by constructing local
frames. Moreover, the equivariant backbone model is adapted
to accommodate multiple objects. On the other hand, TSNet
[203] employs the equivariant graph neural network (GNN)
model TFN [7] to predict TS structures. Initially, TFN is
pretrained on extensive chemical data, such as QM9 [21], to
learn useful representations. It is then fine-tuned specifically
for the task of predicting transition structures.
5.5 Tasks on mol+protein
The “molecule+protein” tasks are well explored, such as
ligand binding affinity prediction, protein-ligand docking, and
pocket-based molecule sampling.
5.5.1 Ligand binding affinity prediction
The task of predicting ligand binding affinity revolves around
estimating the interaction strength between a protein (receptor)
and a small molecule (ligand) [205]. Accurate predictions in
this area offer significant advantages for designing and
refining drug candidates. Additionally, they aid in prioritizing
compounds for experimental evaluation, thereby streamlining
the drug discovery process.
Task definition: With both the molecule and protein
regarded as geometric graphs $\vec{G}_m, \vec{G}_p$, the task aims to learn an
efficient predictor $\phi_\theta$, which can predict the binding strength $s$
accurately:
$$s = \phi_\theta(\vec{G}_p, \vec{G}_m). \tag{66}$$
Symmetry preserved: The binding affinity is invariant, i.e., it
does not change under any rotation or translation of the inputs.
Datasets: CrossDocked2020 [298] contains over 22 million
posed ligand-receptor complexes and the corresponding
binding affinity values, which are generated by docking
ligands into multiple receptor structures from the same
binding pocket. PDBbind [22] provides accurate and reliable
binding affinity data, allowing researchers to assess how well
computational methods can predict the strength of binding
between proteins and ligands.
Methods: MaSIF [204] utilizes geodesic space to represent
the protein surface, assigns geometric and chemical features to
patches, and employs rotation invariance to process these
features, facilitating predictions of protein-ligand interactions.
ProtNet [206] considers 3D protein representations at various
levels (e.g., amino-acid level, backbone level, and all-atom
level) to accomplish affinity prediction tasks. GET [205]
extends this concept by unifying different levels universally
for both molecule and protein representations. TargetDiff [29]
introduces a diffusion process that gradually adds noise to
coordinates and atom types. This process, guided by an SE(3)-equivariant graph neural network (GNN), incorporates binding
free energy terms to steer generation towards high-affinity
poses. HGIN [207] constructs a hierarchical invariant graph
model to predict changes in binding affinity resulting from
protein mutations. BindNet [208] designs two pretraining
tasks utilizing Uni-Mol [159] as the encoder to jointly learn
protein and ligand interactions.
5.5.2 Protein-ligand docking
This task aims to predict the transformation, e.g.,
rotation and translation, imposed on the protein and the molecule so
that they dock together with the minimum root-mean-square deviation.
Task definition: Without loss of generality, we assume that
the protein remains fixed while the position of the molecule
transforms. By denoting the protein as $\vec{G}_p := (\vec{X}_p, H_p)$ and the
molecule as $\vec{G}_m := (\vec{X}_m, H_m)$, respectively, the model needs to
learn a prediction function $\phi_\theta$ that outputs the rotation matrix
and translation vector (i.e., $R, \vec{t}$) by
$$R, \vec{t} = \phi_\theta(\vec{X}_p, H_p; \vec{X}_m, H_m). \tag{67}$$
With the predicted rotation $R$ and translation $\vec{t}$, we can dock
the molecule towards the fixed protein.
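To illustrate what a rigid docking transform and its RMSD evaluation look like in practice, the sketch below recovers the rotation and translation that best superimpose a ligand conformer onto a reference pose using the standard Kabsch algorithm. This is an evaluation-style utility, not the prediction function of any cited method. The function names `kabsch` and `rmsd`, the row-vector convention X R + t, and the synthetic poses are assumptions made for this example.

```python
import numpy as np

def rmsd(A, B):
    """Root-mean-square deviation between matched rows of A and B."""
    return float(np.sqrt(np.mean(np.sum((A - B) ** 2, axis=1))))

def kabsch(P, Q):
    """Rotation R and translation t minimizing the RMSD between P @ R + t and Q."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))         # guard against improper rotations
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = q_mean - p_mean @ R
    return R, t

rng = np.random.default_rng(3)
X_m = rng.normal(size=(10, 3))                 # toy ligand conformer (input pose)
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                         # ensure a proper rotation
t_true = rng.normal(size=3)
X_ref = X_m @ R_true + t_true                  # synthetic "ground-truth" docked pose

R, t = kabsch(X_m, X_ref)
X_docked = X_m @ R + t
print(rmsd(X_m, X_ref), rmsd(X_docked, X_ref))
```

In this synthetic setting the reference pose is an exact rigid copy of the input, so the recovered transform reproduces it up to numerical precision; with a flexible ligand the residual RMSD would be nonzero.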
Symmetry preserved: To make the final docked complex
SE(3)-equivariant, the predictor $\phi_\theta$ is supposed to meet the
following independent SE(3) constraints [211]:
$$R' = Q_m R Q_p^{\top}, \quad \vec{t}\,' = Q_m \vec{t} - Q_m R Q_p^{\top} \vec{t}_p + \vec{t}_m, \quad \forall Q_p, Q_m \in \mathrm{SO}(3),\ \vec{t}_p, \vec{t}_m \in \mathbb{R}^{3}, \tag{68}$$
where $R', \vec{t}\,'$ are the predicted rotation matrix and translation
vector after transforming the protein and the molecule,
namely, $R', \vec{t}\,' = \phi_\theta(\vec{X}_p Q_p + \vec{t}_p, H_p; \vec{X}_m Q_m + \vec{t}_m, H_m)$.
Datasets: PDBbind [22] stands out as the predominant
dataset for protein-ligand docking, housing over 22 million
poses resulting from the docking of ligands into their
respective receptor structures. Typically, current methods
segment the dataset based on chronological order, leveraging
this organization for training and evaluation purposes.
Methods: EquiBind [211] and TankBind [212] have tackled
the blind binding problem by leveraging equivariant graph
neural networks. TankBind additionally introduces
trigonometry constraints to enhance compound rationality. To
further enhance performance, DiffDock [16] proposes a
diffusion process operating across three groups (T(3), SO(3),
and SO(2)). In contrast, DESERT [213] offers a unique
approach by initially outlining pocket shapes and then
generating molecule structures to bind these pockets. This
method alleviates the scarcity of experimental binding data
and is not reliant on predefined pocket-drug pairs. Recently,
FABind [214] designs geometry-aware GNN layers and
efficient interaction modules (e.g., interfacial message
passing) to unify pocket prediction and the docking stage,
which leads to fast and accurate prediction. Further, Re-Dock
[215] explores flexible docking by considering the gap
between apo and holo conformations of the target protein,
which enhances the practical utility.
5.5.3 Pocket-based mol sampling
The technique of pocket-based molecular sampling aims at
generating small molecules that have the potential to bind to a
particular pocket on a protein or other biomolecular target.
Task definition: This target-aware design aims to learn a
generation model $p_\theta$ whose output is a new molecule $\vec{G}_m$ that
can bind to a specific pocket $\vec{G}_p$:
$$\vec{G}_m \sim p_\theta(\vec{G}_m \mid \vec{G}_p). \tag{69}$$
Symmetry preserved: It is an equivariant problem, implying
that $p_\theta(\vec{G}_m \mid \vec{G}_p) = p_\theta(g \cdot \vec{G}_m \mid g \cdot \vec{G}_p)$ for any transformation $g$
of interest.
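The condition on the conditional density can be checked on a deliberately simple stand-in distribution. The sketch below uses an isotropic Gaussian over ligand atom positions centered at the pocket centroid; this toy density, the function name `toy_log_density`, and the parameter `sigma` are assumptions chosen for illustration and are far simpler than the learned models cited in this subsection. The log-density is unchanged when ligand and pocket are transformed jointly, and it changes when only the ligand is moved.

```python
import numpy as np

def toy_log_density(X_m, X_p, sigma=2.0):
    """Log-density of an isotropic Gaussian over ligand atoms,
    centered at the centroid of the pocket atoms."""
    center = X_p.mean(axis=0)
    n = X_m.shape[0]
    sq = np.sum((X_m - center) ** 2)
    return float(-0.5 * sq / sigma**2 - 1.5 * n * np.log(2 * np.pi * sigma**2))

rng = np.random.default_rng(4)
X_p = rng.normal(size=(20, 3))                 # toy pocket atoms
X_m = rng.normal(size=(5, 3))                  # toy ligand atoms
O, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthogonal transformation
t = rng.normal(size=3)

logp = toy_log_density(X_m, X_p)
logp_joint = toy_log_density(X_m @ O + t, X_p @ O + t)               # g acts on both
logp_ligand_only = toy_log_density(X_m + np.array([10.0, 0.0, 0.0]), X_p)

print(logp, logp_joint, logp_ligand_only)
```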
Datasets: CrossDocked2020 [298] serves as a substantial
resource for sampling molecules based on docking pockets,
containing approximately 22.5 million docked protein-ligand
pairs.
Methods: Pocket2Mol [216], GraphBP [219], SBDD [218],
and FLAG [220] adopt an autoregressive approach to generate
molecules conditioned on binding sites, operating at the
granularity of atoms or motifs. In contrast, TargetDiff [29]
and a series of subsequent diffusion-based methods
[152,217,220,223,341] diverge from this approach by utilizing
3D equivariant diffusion in a non-autoregressive fashion. This
approach enables the generation of all atoms simultaneously,
resulting in higher efficiency. DESERT [213] further proposes
to first sketch the shape of the molecule according to the
pocket, and then generate a molecule fitting the shape.
D3FG [221] leverages a fragment-based diffusion to enhance
the generative performance by decomposing molecules into
functional groups and linkers.
5.6 Tasks on protein+protein
The “protein+protein” tasks include protein interface
prediction, protein-protein binding affinity prediction, protein-protein docking, antibody design that specifically considers
the interaction between antibodies and antigens, and peptide
design that aims at generating target-specific peptides.
5.6.1 Protein interface prediction
Biological processes often depend on interactions between
biomolecules. This creates a need for predicting protein-protein interfaces, which involves identifying the regions on a
protein’s surface that are likely to participate in interactions
with other proteins.
Task definition: With the protein pair taken as two
geometric graphs $\vec{G}_1, \vec{G}_2$, this task requires learning a predictor
$\phi_\theta$ that determines whether the atoms on the proteins belong to the
interface. The output is interpreted as the atomic probabilities
$p \in \mathbb{R}^{N_1 + N_2}$ of being located on the interface:
$$p = \phi_\theta(\vec{G}_1, \vec{G}_2). \tag{70}$$
Symmetry preserved: Once the interacting proteins are
selected, the atoms in the interface are determined regardless of
the rigid transformations on each partner, resulting in an
invariant problem with respect to each protein:
$$\phi_\theta(\vec{G}_1, \vec{G}_2) = \phi_\theta(g_1 \cdot \vec{G}_1, g_2 \cdot \vec{G}_2), \quad \forall g_1, g_2 \in \mathrm{SE}(3). \tag{71}$$
Methods: The methods dMaSIF [225] and SASNet [226]
operate via three-dimensional convolution on the protein 3D
structures to keep rotation-invariance. Moreover, fed with
more structure features such as distance, orientation and amide
angle, DeepInteract [224] adopts geometric transformer and
achieves competitive performance as well.
5.6.2 Binding affinity prediction
Protein-protein interactions are fundamental to bio-molecular
activity and are crucial for many key functions in biological
processes. Estimating the binding affinity between proteins
not only aids in gaining a deeper understanding of protein
mechanisms of action but also serves as the cornerstone for
designing proteins with specific functions, such as highly
specific antibodies and high-affinity ligands.
Task definition: Given a pair of proteins that can be
considered as geometric graphs $\vec{G}_1, \vec{G}_2$, this task requires
learning a predictive function $\phi_\theta$, which can efficiently and
accurately predict the binding strength $s$ between the pair of
proteins:
$$s = \phi_\theta(\vec{G}_1, \vec{G}_2). \tag{72}$$
Symmetry preserved: This is an invariant task because the
binding strength s remains unchanged under any translations
or rotations applied to the pair of proteins.
Datasets: The PDBbind [342] dataset is a collection of
complex structures sourced from the Protein
Data Bank (PDB), accompanied by binding affinities that have
been measured experimentally.
Protein-Protein Affinity Benchmark Version 2 [302,343]
contains 176 diverse protein-protein
complexes, each accompanied by detailed affinity annotations.
SKEMPI (Structural database of Kinetics and Energetics of
Mutant Protein Interactions) [344] is a curated
database that records changes in binding affinities and
kinetic parameters upon mutation. SKEMPI 2.0
[303] is the refined and expanded edition of the
original SKEMPI database.
Methods: mmCSM-PPI [227] presents a binding affinity
prediction method employing graph-based signatures that
encapsulate protein structure’s physico-chemical and
geometric properties, augmented with complementary features
to reflect various mechanisms. The Extra Trees model, trained
with graph-based signatures and complementary features,
yields promising results on the SKEMPI 2.0 dataset. GeoPPI
[228] utilizes the 3D conformations to ascertain a geometric
representation that embodies the topological features of the
protein structure through a self-supervised learning approach.
Subsequently, these representations serve as inputs for
gradient-boosting trees, facilitating the prediction of the
variations in protein-protein binding affinity due to mutations.
GET [205] introduces a bilevel design that ensures
equivariance while unifying representations across different
levels. GET achieves state-of-the-art performance on the PDB
dataset.
5.6.3 Protein-protein docking
We have investigated docking pose prediction between protein
and molecule in Section 5.5.2. Here, we study the similar
problem between protein and protein.
Task definition: Assuming two proteins to be denoted as
$\vec{G}_1 = (\vec{X}_1, H_1)$, $\vec{G}_2 = (\vec{X}_2, H_2)$, respectively, the model needs to
learn a prediction function $\phi_\theta$ to output the rotation matrix and
translation vector (i.e., $R, \vec{t}$) by
$$R, \vec{t} = \phi_\theta(\vec{X}_1, H_1; \vec{X}_2, H_2). \tag{73}$$
Symmetry preserved: This is identical to Eq. (68).
Methods: EquiDock [229] uses SE(3)-equivariant graph
neural networks and optimal transport techniques to predict
the transformation by aligning key points. HMR [230] casts
this task from 3D Euclidean space to a 2D Riemannian
manifold, maintaining rotational invariance. DiffDock-PP [232]
extends DiffDock [16], a diffusion generative model, to the
protein docking task and yields state-of-the-art
performance. Furthermore, in dMaSIF [235], an energy-based,
SE(3)-equivariant model combined with physical priors is
adopted to infer docking regions. Treating docking as an
optimization problem, EBMDock [237] employs geometric
deep learning to extract features from protein residues and
learns distance distributions between the residues involved in
interfaces. Multimeric protein docking can be tackled by
AlphaFold-Multimer [234] and SyNDock [233]. Recently,
ElliDock [236] predicts SE(3)-equivariant elliptic paraboloids
as the binding interface for protein pairs, and transforms the
rigid protein-protein docking task into surface fitting while
ensuring the same degrees of freedom. There are also several
works targeting antibody-antigen docking, a subfield of
protein docking. For instance, HSRN [231] proposes a
hierarchical framework to handle docking in an iterative
manner. By harnessing the capabilities of tFold-Ab [244] and
AlphaFold2 [183], tFold-Ag [244] generates antibody/antigen
features and employs a docking module to predict complex
structures with flexibility.
5.6.4 Antibody design
Antibodies are Y-shaped symmetric proteins produced by the
immune system that recognize and bind to specific antigens.
The design of antibodies mainly focuses on the variable
domains consisting of a heavy chain and a light chain, with 3
Complementarity-Determining Regions (CDRs) and 4
framework regions interleaving on each chain. The 6 CDRs
largely determine the binding specificity and affinity of the
antibodies, especially CDR-H3 (i.e., the 3rd CDR on the
heavy chain), which is the main scope of the design.
Task definition: Without loss of generality, we define the
task as a conditional variant of structure and sequence co-design. More specifically, given the geometric graphs of the
antigen $\vec{G}_A$, the heavy chain $\vec{G}_H$, and the light chain $\vec{G}_L$ with
the CDRs missing, the model $\phi_\theta$ needs to fill in the geometric
graph of the CDRs of interest $\vec{G}_C$:
$$\vec{G}_C = \phi_\theta(\vec{G}_A, \vec{G}_H, \vec{G}_L). \tag{74}$$
Symmetry preserved: Apparently, the output CDRs $\vec{G}_C$
should be SE(3)-equivariant with respect to the antigen:
$$g \cdot \vec{G}_C = \phi_\theta(g \cdot \vec{G}_A, g \cdot \vec{G}_H, g \cdot \vec{G}_L), \quad \forall g \in \mathrm{SE}(3). \tag{75}$$
Methods: Antibodies are of great significance in
therapeutics and biology, and thus many works have been dedicated to
designing antibodies with desired binding specificity and
affinity ([17,32,238–240,242,243]). RefineGNN [239]
makes the first attempt to design CDRs on the heavy chain
only. Then MEAN [32] and DiffAb [238] extend to the
complete setting where the entire complex (i.e., the antigen,
the heavy chain and the light chain) without CDRs is given
as context. Notably, MEAN [32] adopts a GMN-like [51]
multi-channel architecture to encode the backbone atoms of
the residues, and proposes an equivariant attention mechanism
to capture interactions between different geometric
components. MEAN is later upgraded to dyMEAN
[17], which proposes a dynamic multi-channel encoder to
capture the full-atom geometry of residues and tackles a more
challenging setting where the entire structure and docking
pose of the antibody need to be generated instead of given as
context. DiffAb [238] proposes a diffusion generative model
for antibody design. Similarly, AbDiffuser [243] also adopts a
diffusion-based generative model, but goes a step further by projecting
each side chain into 4 pseudo-carbon atoms to capture the full-atom geometry and handles length changes with placeholders in
the sequence. ADesigner [241] proposes a cross-gate MLP to
facilitate the integration of sequences and structures. Unlike
the aforementioned approaches, AbODE [242] explores graph
PDEs for antibody design. GeoAB [245] uses torsional prior
knowledge with an equivariant neural network focusing on bond
lengths, bond angles and dihedrals. RADD [246] introduces
more node features, edge features, and edge relations to
include more contextual and geometric information for
designing the CDRs. Further, [240] utilizes pretrained
antibody language models to improve the quality of sequence-structure co-design, and tFold-Ab [244] also employs a
pretrained language model (i.e., ESM-PPI), along with feature
updating (i.e., Evoformer-Single) and structure modules, to
enable efficient and accurate prediction of antibody structures
directly from sequence.
5.6.5 Peptide design
Peptides, which consist of short sequences of amino acids,
represent an intermediate modality between small molecules
and proteins, and play a critical role in various biological
functions. Their unique position makes functional peptide design
particularly appealing for both biological research and
therapeutic applications [345,346].
Task definition: Similar to antibody design, peptide design
typically involves generating binding peptides for a given
binding area on the target protein. Denoting the target as $\vec{G}_B$
and the peptide as $\vec{G}_P$, we can formalize the task as follows:
$$\vec{G}_P = \phi_\theta(\vec{G}_B). \tag{76}$$
Symmetry preserved: Akin to antibody design, the output of
the model is required to maintain invariance in the sequence
distribution and equivariance in the structure distribution in
terms of the E(3) group.
Datasets: PepBDB [305] collects 13K protein-peptide
complexes with peptides containing fewer than 50 residues
from the Protein Data Bank [294]. [307] curates a diverse and
non-redundant dataset of 96 protein-peptide complexes, with
peptides between 4 and 25 residues, which is referred to as the
Long Non-Redundant (LNR) dataset. PepGLAD [35] further
collects 6K non-redundant protein-peptide complexes, also
featuring peptides between 4 and 25 residues, and partitions
them based on the sequence identity of the receptors for
training and validation, employing LNR as the test set.
Methods: While conventional approaches rely on empirical
energy functions to sample and optimize sequences and
structures at the residue or fragment level [347,348], recent
advances in geometric molecular design shed light on deep
generative models. HelixGAN [247] focuses on a sub-family
of peptides with α-helices. RFdiffusion [13], which was
originally designed for protein generation, also explores
supervised finetuning for target-specific peptide design.
PepGLAD [35] takes a step further by tackling sequence-structure co-design with a geometric latent diffusion model.
5.7 Tasks on other domains
We briefly review the applications on other domains such as
crystals and RNAs.
5.7.1 Crystal property prediction
In the realm of material science, the prediction of crystalline
properties stands as a cornerstone for the innovation of new
materials. Unlike molecules or proteins, which consist of a
finite number of atoms, crystals are characterized by their
periodic repetition throughout infinite 3D space. One of the
main challenges lies in capturing this unique periodicity using
geometric graph neural networks.
Task definition: The infinite crystal structure is commonly
simplified by its repeating unit, which is called a unit cell,
which is represented as $\vec{G} = (\vec{L}, \vec{X}, H)$, where $\vec{X}$ and $H$ are the
coordinate matrix and feature matrix as defined before, and
the additional matrix $\vec{L} = [\vec{l}_1, \vec{l}_2, \vec{l}_3]^\top \in \mathbb{R}^{3\times 3}$ consists of three
lattice vectors determining the periodicity of the crystal. The
task is to predict the property $y \in \mathbb{R}$ of the entire structure via
the predictor $\phi_\theta$.
$$y = \phi_\theta(\vec{L}, \vec{X}, H). \tag{77}$$
Symmetry preserved: The output of the predictor should be
invariant with respect to several types of groups: 1) E(3)
invariance of both the coordinates $\vec{X}$ and the lattice $\vec{L}$;
2) Periodic translation invariance of $\vec{X}$; 3) Cell choice
invariance owing to periodicity, with details referred to [259].
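The first two symmetries can be checked numerically on a toy invariant descriptor, the sorted list of pairwise distances under periodic boundary conditions. The Python sketch below is only an illustration and is not taken from any of the surveyed models. The function name `periodic_distances`, the toy lattice, and the storage layout (lattice vectors as rows of `L`, atoms as rows of `X`, so that Cartesian coordinates equal fractional coordinates times `L`) are assumptions made for the example. Searching only the 27 neighbouring cells is adequate only for cells that are not strongly skewed.

```python
import itertools
import numpy as np

# All 27 neighbouring-cell offsets k in {-1, 0, 1}^3.
OFFSETS = np.array(list(itertools.product((-1, 0, 1), repeat=3)), dtype=float)

def periodic_distances(L, X):
    """Sorted pairwise distances under periodic boundary conditions.

    L: (3, 3) array whose rows are the lattice vectors (assumed layout).
    X: (N, 3) array of Cartesian coordinates, one atom per row.
    """
    F = X @ np.linalg.inv(L)             # fractional coordinates, X = F @ L
    d = F[:, None, :] - F[None, :, :]    # (N, N, 3) fractional differences
    d = d - np.round(d)                  # reduce each component to [-0.5, 0.5]
    cand = d[:, :, None, :] + OFFSETS    # (N, N, 27, 3) neighbouring images
    dist = np.linalg.norm(cand @ L, axis=-1).min(axis=-1)  # closest image
    iu = np.triu_indices(len(X), k=1)
    return np.sort(dist[iu])

rng = np.random.default_rng(0)
L = np.array([[4.0, 0.0, 0.0], [0.5, 4.2, 0.0], [0.3, 0.4, 3.9]])  # toy lattice
X = rng.random((5, 3)) @ L                                          # 5 atoms
base = periodic_distances(L, X)

# 1) E(3): apply the same orthogonal map to lattice and atoms, then translate.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
t = rng.normal(size=3)
e3_moved = periodic_distances(L @ Q.T, X @ Q.T + t)

# 2) Periodic translation: shift all atoms and wrap them back into the cell.
s = rng.random(3)
F_shifted = (X @ np.linalg.inv(L) + s) % 1.0
shifted = periodic_distances(L, F_shifted @ L)

print(np.allclose(base, e3_moved), np.allclose(base, shifted))
```

Cell choice invariance is not exercised by this sketch; it would additionally require comparing different unit cells that describe the same infinite crystal.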
Datasets: Materials Project (MP) [308] and JARVIS-DFT
[312] are two commonly used datasets. In particular, MP is an
open-access database containing more than 150K crystal
structures with several properties obtained from DFT
calculations. JARVIS-DFT, part of the Joint Automated
Repository for Various Integrated Simulations (JARVIS), is
also calculated by DFT and provides further properties
of materials, such as solar efficiency and magnetic moment.
Methods: To take the periodicity into consideration,
CGCNN [249] proposes the multi-edge graph construction to
model the interactions across the periodic boundaries.
MEGNet [250] additionally updates the global state attributes
during the message-passing procedure. ALIGNN [251]
composes two GNNs for both the atomic bond graph and its
line graph to capture the interactions among atomic triplets.
ECN [252] incorporates space group symmetries into the GNNs
for more powerful expressivity. Matformer [253] utilizes self-connecting edges to explicitly introduce the lattice matrix $\vec{L}$
into the transformer-based framework. To utilize the large
amount of unlabeled data, Crystal Twins [254] applies two
contrastive frameworks, Barlow Twins [349] and SimSiam
[350], to pre-train the CGCNN models, and MMPT [255]
proposes a mutex mask strategy that forces the model to learn
representations from two disjoint parts of the crystal.
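The idea of a multi-edge graph across periodic boundaries can be sketched as follows. This is a simplified illustration and not the CGCNN reference implementation: the function name `build_multi_edge_graph`, the cutoff-based neighbour rule, the `max_shift` parameter, and the row-wise storage of lattice vectors and atoms are all assumptions made for the example. Each edge records the cell offset `k`, so the same pair of atoms may be connected by several edges, one per periodic image within the cutoff.

```python
import itertools
import numpy as np

def build_multi_edge_graph(L, X, cutoff, max_shift=1):
    """Return edges (i, j, k, dist) with ||x_j + k @ L - x_i|| < cutoff.

    L: (3, 3) array whose rows are the lattice vectors (assumed layout).
    X: (N, 3) Cartesian coordinates of the atoms in one unit cell.
    k: integer cell offset identifying which periodic image of atom j is used.
    """
    shifts = np.array(
        list(itertools.product(range(-max_shift, max_shift + 1), repeat=3))
    )
    edges = []
    for i, j in itertools.product(range(len(X)), repeat=2):
        for k in shifts:
            if i == j and not k.any():
                continue  # skip the zero-length pair of an atom with itself
            dist = np.linalg.norm(X[j] + k @ L - X[i])
            if dist < cutoff:
                edges.append((i, j, tuple(int(v) for v in k), float(dist)))
    return edges

# Toy body-centred cubic cell with edge length 3: one atom at the corner,
# one at the centre. Each atom sees 8 periodic images of the other.
L = 3.0 * np.eye(3)
X = np.array([[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]])
edges = build_multi_edge_graph(L, X, cutoff=2.7)
for edge in edges[:3]:
    print(edge)
```

In this toy cell, atoms 0 and 1 are linked by eight distinct edges that differ only in their cell offset, which is exactly the information a single-edge molecular graph would lose.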
5.7.2 Crystal generation
Besides predicting the invariant properties of 3D crystals, the
rapid progress of geometric graph neural networks has also
paved the way to de novo material design, whose goal is to
generate novel crystal structures beyond the existing
databases.
Task definition: Crystal generation methods commonly
integrate geometric graph neural networks into deep
generative frameworks, which aim to learn the distribution
of a given dataset so that new crystals can be generated
by sampling from the learned distribution:
$$\vec{L}, \vec{X}, H \sim p_\theta(\vec{L}, \vec{X}, H). \tag{78}$$
Symmetry preserved: Similar to the property prediction task,
the learned distribution is also required to be invariant in terms
of E(3) group and periodicity.
Datasets: CDVAE [257] collects three datasets, named
Perov-5 [309,310], Carbon-24 [311], and MP-20 [308] to
evaluate the generative models on different crystal
distributions.
Methods: CDVAE [257] incorporates a diffusion-based
decoder into a VAE-based framework, by first predicting the
lattice parameters from the latent space, and updating the atom
types and coordinates according to the predicted lattice. SyMat
[49] refines this approach by generating atom types as
permutation invariant sets and employing coordinate score-matching for the edges. DiffCSP [50], originally aiming at
predicting crystal structures from given composition, also
excels in generating structures from scratch. DiffCSP adopts
the fractional coordinates $F = \vec{L}^{-1}\vec{X}$ instead of the Cartesian
coordinates, and jointly generates the lattice matrix, atom
types and coordinates via a diffusion-based framework.
DiffCSP++ [258] extends DiffCSP with the conditions of
lattice families and Wyckoff coordinates to maintain the space
group constraints. Recently, MatterGen [259] further propels
the joint diffusion method, and specializes the lattice diffusion
process to be cubic-prior and rotation-fixed.
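The role of fractional coordinates can be illustrated with a short Python sketch. It is not the DiffCSP implementation: the helper names `to_fractional`, `to_cartesian`, and `wrap`, the toy lattice, and the noise scale are hypothetical. The sketch assumes that lattice vectors are stored as rows of `L` and atoms as rows of `X`, in which case the relation $F = \vec{L}^{-1}\vec{X}$ takes the transposed form `F = X @ inv(L)`. It shows that fractional coordinates do not change when the lattice and the atoms are rotated together, and that a perturbation applied in fractional space still describes a valid structure once it is wrapped back into the unit cell.

```python
import numpy as np

def to_fractional(L, X):
    """Cartesian (N, 3) -> fractional (N, 3); rows of L are lattice vectors."""
    return X @ np.linalg.inv(L)

def to_cartesian(L, F):
    """Fractional (N, 3) -> Cartesian (N, 3)."""
    return F @ L

def wrap(F):
    """Map fractional coordinates back into the unit cell [0, 1)."""
    return F % 1.0

rng = np.random.default_rng(0)
L = np.array([[5.0, 0.0, 0.0], [1.0, 4.5, 0.0], [0.5, 0.7, 6.0]])  # toy lattice
F = rng.random((4, 3))            # 4 atoms inside the unit cell
X = to_cartesian(L, F)

# The round trip recovers the fractional coordinates.
F_back = to_fractional(L, X)

# Rotating (or reflecting) lattice and atoms together leaves F unchanged.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
F_rot = to_fractional(L @ Q.T, X @ Q.T)

# A perturbation in fractional space, wrapped back into the cell.
F_noisy = wrap(F + 0.3 * rng.normal(size=F.shape))
X_noisy = to_cartesian(L, F_noisy)

print(np.allclose(F_back, F), np.allclose(F_rot, F))
```

Because the orientation of the crystal is carried entirely by the lattice matrix in this representation, a model operating on fractional coordinates only has to handle the periodic translation symmetry of the coordinates themselves.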
5.7.3 RNA 3D structure ranking
RNA, or ribonucleic acid, is a pivotal type of molecule that
goes beyond its traditional role as a mere intermediary
between DNA and protein synthesis. Its functionality heavily
relies on its intricate three-dimensional structure, making the
prediction and ranking of RNA’s 3D conformation crucial.
This structural complexity enables RNA to participate in gene
regulation, cellular communication, and catalysis,
underscoring its significance in fundamental life processes. As
a result, RNA stands at the forefront of molecular biology and
biotechnology research.
Task definition: Here, the ranking of 3D RNA
structures refers to the task of identifying which structure most
accurately reflects the RNA’s actual shape from a pool of
imprecise ones. In other words, the score model $\phi_\theta$ is required
to predict the root-mean-square deviation (RMSD) between
each candidate 3D RNA structure, represented by a geometric
graph $\vec{G}$, and the ground truth:
$$s = \phi_\theta(\vec{G}). \tag{79}$$
Symmetry preserved: This is obviously an invariant task
because the RMSD value between the candidate structure and
the ground truth remains impervious to any translations or
rotations imposed on the candidate structure.
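This invariance can be verified numerically. The Python sketch below assumes that the RMSD is computed after optimal rigid superposition of the candidate onto the ground truth (Kabsch alignment), which the text does not spell out. The function name `kabsch_rmsd` and the random point sets standing in for RNA atoms are hypothetical.

```python
import numpy as np

def kabsch_rmsd(cand, ref):
    """RMSD between two (N, 3) point sets after optimal rigid superposition."""
    P = cand - cand.mean(axis=0)          # remove translation
    Q = ref - ref.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)     # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(U @ Vt))    # guard against improper rotations
    R = U @ np.diag([1.0, 1.0, d]) @ Vt   # optimal rotation: P @ R ~ Q
    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(cand)))

rng = np.random.default_rng(0)
truth = rng.normal(size=(20, 3))                      # stand-in ground truth
candidate = truth + 0.5 * rng.normal(size=(20, 3))    # imprecise candidate

# Random proper rotation and translation applied to the candidate.
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(A) < 0:
    A[:, 0] *= -1.0                                   # force det(A) = +1
t = rng.normal(size=3)
moved = candidate @ A.T + t

rmsd_original = kabsch_rmsd(candidate, truth)
rmsd_moved = kabsch_rmsd(moved, truth)
print(abs(rmsd_original - rmsd_moved) < 1e-8)
```

Since the regression target is unchanged by rigid motions of the candidate, a scoring model with invariant outputs does not need to learn this property from data.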
Methods: ARES [15] leverages e3nn [351] to model the 3D
structure of RNA, ensuring equivariance and invariance
during the update of atomic features. ARES then aggregates
the features of all atoms to predict the RMSD value. In
contrast, PaxNet [264] employs a two-layer multiplex graph to
model the 3D structure of RNA. One layer captures local
interactions, while the other focuses on non-local interactions.
EquiRNA [265] introduces a hierarchical equivariant graph
neural network with a size-insensitive K-nearest neighbor
sampling strategy, aimed at solving the size generalization
challenge through the reuse of nucleotide representations.
Datasets: ARES [15] uses a collection of 18K records from
the FARFAR2-Classics dataset [352] as its training and
validation sets. In addition, its authors constructed two test
sets: the first was selected from the FARFAR2-Puzzles
dataset [352]; the second was curated based on certain
criteria and built using the FARFAR2 rna_denovo application.
EquiRNA [265] introduces rRNAsolo, a new dataset for
assessing size generalization in RNA structure evaluation. It
covers a wider range of RNA sizes, more RNA types, and
more recent RNA structures than existing datasets.
6 Discussion and future prospect
Whilst much progress has been made in this field, there are
still a broad range of open research directions. We discuss
several examples as follows.
Geometric graph foundation model. Recent advancements
in AI research, exemplified by the remarkable progress of
models like the GPT series [353–355] and Gato [356], have
brought about substantial advantages by employing a unified
foundational model across various tasks and domains.
Foundation models diminish the necessity of manually
crafting inductive biases for individual domains, amplify the
volume and variety of training data, and hold promise for
further enhancement with increased data, computational
resources, and model complexity. It is natural to try to replicate such
success in the geometric domain. However, how to do so remains an
interesting open question, especially considering the following
design spaces. 1. Task space: How to pretrain a large-scale
model that is generally beneficial to various downstream
tasks? 2. Data space: How to build a foundation model that
can simultaneously extract rich information that spans across
different types or scales of the geometric data? 3. Model
space: How to truly scale the model in terms of capacity and
expressivity, such that more knowledge can be captured and
stored in the model? Although some initial works (such as
EPT [90]) manage to pretrain a unified model on small
molecules and proteins, the field still lacks a universal model that can
tackle more kinds of input data and tasks.
Effective loop between model training and real-world
experimental verification. Unlike typical applications in
vision and NLP, tasks in science usually require expensive
labor, computational resources, and instruments to produce
data, conduct verification, and record results. Existing
research often adopts an open-loop style, where datasets are
collected beforehand and proposed models are evaluated
offline on these datasets. However, this approach presents two
significant issues. Firstly, the constructed datasets are often
small and insufficient for training geometric GNNs, especially
for data-hungry foundational models equipped with large-scale parameters. Secondly, evaluating models solely on
standalone datasets may fail to reflect feedback from the real
world, resulting in less reliable evaluation of the model’s true
ability. These issues can be effectively addressed by training
and testing geometric GNNs within a closed loop between
model prediction and experimental verification. A notable
example is provided by GNoME [357], which integrates an
end-to-end pipeline consisting of graph network training, DFT
computations, and autonomous laboratories for materials
discovery and synthesis. It is expected that such a research
paradigm will become increasingly important in future studies
related to scientific applications.
Integration with large language models. Large Language
Models (LLMs) have been extensively shown to possess a
wealth of knowledge, spanning various domains. Moreover,
there has been a development of domain-specific Language
Model Agents (LMAs) that exhibit high levels of expertise in
specific areas [358,359]. Given that many of the tasks under
discussion are intricately linked with the natural sciences, such
as physics, biochemistry, and material science, which often
require a deep understanding of domain-specific knowledge, it
becomes compelling to enhance the existing knowledge base
by integrating LLM agents into the training and evaluation
pipeline of geometric Graph Neural Networks (GNNs). This
integration holds promise for augmenting the capabilities of
GNNs by leveraging the comprehensive knowledge
representations offered by LLMs, thereby potentially
improving the performance and robustness of these models in
scientific applications. While there have been works
leveraging LLMs for certain tasks such as molecule property
prediction and drug design, they only operate on motifs
[360,361] or molecule graphs [362]. It still remains
challenging to bridge them with geometric graph neural
networks, enabling the pipeline to process 3D structural
information and perform prediction and/or generation over 3D
structures.
Relaxation of equivariance. While equivariance is
undeniably pivotal for bolstering data efficiency and
promoting generalization across diverse datasets, it is
noteworthy that rigidly adhering to equivariance principles can
sometimes overly constrain the model, potentially
compromising its performance. Thus, delving into
methodologies that offer a degree of flexibility in relaxing
equivariance constraints holds considerable significance. By
exploring approaches that strike a balance between
maintaining equivariance and accommodating adaptability,
researchers can unlock avenues for enhancing the practical
utility of models. Several pioneering studies [363,364]
relax the equivariance to a certain discrete point group and
achieve remarkable improvements on various dynamic
physical systems, ranging from particle to vehicle dynamics.
This exploration may not only enrich our understanding of
model behavior but also pave the way for the development of
more robust and versatile solutions with broader applicability.
7 Conclusion
In this survey, we conduct a systematic investigation of the
progress in geometric Graph Neural Networks (GNNs),
through the lens of data structures, models, and their
applications. We specify geometric graph as the data structure,
which generalizes the concept of graph in the presence of
geometric information and permits the vital symmetry under
certain transformations. We present geometric GNNs as the
models, which consist of invariant GNNs, scalarization-based/high-degree steerable equivariant GNNs, and geometric
graph transformers. We exhaustively discuss their applications
through the taxonomy on the data and tasks, including both
single instance and multi-instance tasks over domains in
physics, biochemistry, and others like materials and RNAs.
We also discuss the challenges and the future potential
directions of geometric GNNs.
Acknowledgement This work was jointly supported by the following
projects: The National Natural Science Foundation of China (Grant Nos.
62376276 and 62172422); Beijing Nova Program (Grant No. 20230484278);
the Fundamental Research Funds for the Central Universities, and the
Research Funds of Renmin University of China (Grant No. 23XNKJ19); and
Tencent AI Lab Rhino-Bird Focused Research Program.
Competing interests The authors declare that they have no competing
interests or financial conflicts to disclose.
Open Access This article is licensed under a Creative Commons Attribution
4.0 International License, which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative
Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons
licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
To view a copy of this licence, visit creativecommons.org/licenses/by/4.0/
References
1. Bronstein M M, Bruna J, Cohen T, Veličković P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. 2021, arXiv preprint arXiv: 2104.13478
2. Schütt K T, Arbabzadah F, Chmiela S, Müller K R, Tkatchenko A. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 2017, 8: 13890
3. Klicpera J, Groß J, Günnemann S. Directional message passing for molecular graphs. In: Proceedings of the 8th International Conference on Learning Representations. 2020
4. Klicpera J, Becker F, Günnemann S. GemNet: universal directional graph neural networks for molecules. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021, 520
5. Satorras V G, Hoogeboom E, Welling M. E(n) equivariant graph neural networks. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 9323−9332
6. Schütt K, Unke O, Gastegger M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 9377−9388
7. Thomas N, Smidt T, Kearnes S, Yang L, Li L, Kohlhoff K, Riley P. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. 2018, arXiv preprint arXiv: 1802.08219
8. Fuchs F B, Worrall D E, Fischer V, Welling M. SE(3)-Transformers: 3D roto-translation equivariant attention networks. In: Proceedings of the 34th Conference on Neural Information Processing Systems. 2020
9. Brandstetter J, Hesselink R, van der Pol E, Bekkers E J, Welling M. Geometric and physical quantities improve E(3) equivariant message passing. In: Proceedings of the 10th International Conference on Learning Representations. 2022
10. Batzner S, Musaelian A, Sun L, Geiger M, Mailoa J P, Kornbluth M, Molinari N, Smidt T E, Kozinsky B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 2022, 13(1): 2453
11. Liao Y L, Smidt T E. Equiformer: equivariant graph attention transformer for 3D atomistic graphs. In: Proceedings of the 11th International Conference on Learning Representations. 2023
12. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science, 2021, 373(6557): 871−876
13. Watson J L, Juergens D, Bennett N R, Trippe B L, Yim J, et al. De novo design of protein structure and function with RFdiffusion. Nature, 2023, 620(7976): 1089−1100
14. Ingraham J B, Baranov M, Costello Z, Barber K W, Wang W, et al. Illuminating protein space with a programmable generative model. Nature, 2023, 623(7989): 1070−1078
15. Townshend R J L, Eismann S, Watkins A M, Rangan R, Karelina M, Das R, Dror R O. Geometric deep learning of RNA structure. Science, 2021, 373(6558): 1047−1051
16. Corso G, Stärk H, Jing B, Barzilay R, Jaakkola T S. DiffDock: diffusion steps, twists, and turns for molecular docking. In: Proceedings of the 11th International Conference on Learning Representations. 2023
17. Kong X, Huang W, Liu Y. End-to-end full-atom antibody design. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 718
18. Gilmer J, Schoenholz S S, Riley P F, Vinyals O, Dahl G E. Neural message passing for quantum chemistry. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 1263−1272
19. McNutt A T, Francoeur P, Aggarwal R, Masuda T, Meli R, Ragoza M, Sunseri J, Koes D R. GNINA 1.0: molecular docking with deep learning. Journal of Cheminformatics, 2021, 13(1): 43
20. Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, Weitzner B D, Hu X, Adachi Y, Schief W R, Dunbrack Jr R L. RosettaAntibodyDesign (RAbD): a general framework for computational antibody design. PLoS Computational Biology, 2018, 14(4): e1006112
21. Ramakrishnan R, Dral P O, Rupp M, von Lilienfeld O A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 2014, 1: 140022
22. Liu Z, Su M, Han L, Liu J, Yang Q, Li Y, Wang R. Forging the basis for developing protein–ligand interaction scoring functions. Accounts of Chemical Research, 2017, 50(2): 302−309
23. Dunbar J, Krawczyk K, Leem J, Baker T, Fuchs A, Georges G, Shi J, Deane C M. SAbDab: the structural antibody database. Nucleic Acids Research, 2014, 42(D1): D1140−D1146
24. Han J, Rong Y, Xu T, Huang W. Geometrically equivariant graph neural networks: a survey. 2022, arXiv preprint arXiv: 2202.07230
25. Han J, Huang W, Ma H, Li J, Tenenbaum J B, Gan C. Learning physical dynamics with subequivariant graph neural networks. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022
26. Sanchez-Gonzalez A, Godwin J, Pfaff T, Ying R, Leskovec J, Battaglia P. Learning to simulate complex physics with graph networks. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 8459−8468
27. Kipf T, Fetaya E, Wang K C, Welling M, Zemel R. Neural relational inference for interacting systems. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 2688−2697
28. Huang Y, Peng X, Ma J, Zhang M. 3DLinker: an E(3) equivariant variational autoencoder for molecular linker design. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 9280−9294
29. Guan J, Qian W W, Peng X, Su Y, Peng J, Ma J. 3D equivariant diffusion for target-aware molecule generation and affinity prediction. In: Proceedings of the 11th International Conference on Learning Representations. 2023
30. Jing B, Corso G, Chang J, Barzilay R, Jaakkola T. Torsional diffusion for molecular conformer generation. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1760
31. Wu L, Hou Z, Yuan J, Rong Y, Huang W. Equivariant spatio-temporal attentive graph networks to simulate physical dynamics. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1965
32. Kong X, Huang W, Liu Y. Conditional antibody design as 3D equivariant graph translation. In: Proceedings of the 11th International Conference on Learning Representations. 2023
33. Senior A W, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson A W R, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones D T, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature, 2020, 577(7792): 706−710
34. Chanussot L, Das A, Goyal S, Lavril T, Shuaibi M, Riviere M, Tran K, Heras-Domingo J, Ho C, Hu W, Palizhati A, Sriram A, Wood B, Yoon J, Parikh D, Zitnick C L, Ulissi Z. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catalysis, 2021, 11(10): 6059−6072
35. Kong X, Jia Y, Huang W, Liu Y. Full-atom peptide design with geometric latent diffusion. In: Proceedings of the 38th Conference on Neural Information Processing Systems. 2024
36. Duval A, Mathis S V, Joshi C K, Schmidt V, Miret S, Malliaros F D, Cohen T, Liò P, Bengio Y, Bronstein M. A hitchhiker’s guide to geometric GNNs for 3D atomic systems. 2024, arXiv preprint arXiv: 2312.07511
37. Xia J, Zhu Y, Du Y, Li S Z. A systematic survey of chemical pre-trained models. In: Proceedings of the 32nd International Joint Conference on Artificial Intelligence. 2023, 6787−6795
38. Guo Z, Guo K, Nan B, Tian Y, Iyer R G, Ma Y, Wiest O, Zhang X, Wang W, Zhang C, Chawla N V. Graph-based molecular representation learning. In: Proceedings of the 32nd International Joint Conference on Artificial Intelligence. 2023, 6638−6646
39. Atz K, Grisoni F, Schneider G. Geometric deep learning on molecular representations. Nature Machine Intelligence, 2021, 3(12): 1023−1032
40. Zhang X, Wang L, Helwig J, Luo Y, Fu C, et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. 2025, arXiv preprint arXiv: 2307.08423
41. Esteves C. Theoretical aspects of group equivariant neural networks. 2020, arXiv preprint arXiv: 2004.05154
42. Cederberg J. A course in modern geometries. Springer Science & Business Media, 2004
43. Wu Z, Pan S, Chen F, Long G, Zhang C, Philip S Y. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4−24
44. Yuan Z, Wei Z, Lv F, Wen J R. Index-free triangle-based graph local clustering. Frontiers of Computer Science, 2024, 18(3): 183404
45. Wu Z, Ramsundar B, Feinberg E N, Gomes J, Geniesse C, Pappu A S, Leswing K, Pande V. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 2018, 9(2): 513−530
46. Villar S, Hogg D W, Storey-Fisher K, Yao W, Blum-Smith B. Scalars are universal: equivariant machine learning, structured like classical physics. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
47. Schütt K T, Sauceda H E, Kindermans P J, Tkatchenko A, Müller K R. SchNet–a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 2018, 148(24): 241722
48. Baek M, Anishchenko I, Humphreys I R, Cong Q, Baker D, DiMaio F. Efficient and accurate prediction of protein structure using RoseTTAFold2. bioRxiv, 2023
49. Luo Y, Liu C, Ji S. Towards symmetry-aware generation of periodic materials. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
50. Jiao R, Huang W, Lin P, Han J, Chen P, Lu Y, Liu Y. Crystal structure prediction by joint equivariant diffusion. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 767
51. Huang W, Han J, Rong Y, Xu T, Sun F, Huang J. Equivariant graph mechanics networks with constraints. In: Proceedings of the 10th International Conference on Learning Representations. 2022
52. Gasteiger J, Giri S, Margraf J T, Günnemann S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. 2022, arXiv preprint arXiv: 2011.14115
53. Zhu F, Futrega M, Bao H, Eryilmaz S B, Kong F, Duan K, Zheng X, Angel N, Jouanneaux M, Stadler M, Marcinkiewicz M, Xie F, Yang J, Andersch M. FastDimeNet++: training DimeNet++ in 22 minutes. In: Proceedings of the 52nd International Conference on Parallel Processing. 2023, 274−284
54. Finzi M, Stanton S, Izmailov P, Wilson A G. Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 3165−3176
55. Liu Y, Wang L, Liu M, Lin Y, Zhang X, Oztekin B, Ji S. Spherical message passing for 3D molecular graphs. In: Proceedings of the 10th International Conference on Learning Representations. 2022
56. Wang L, Liu Y, Lin Y, Liu H, Ji S. ComENet: towards complete and efficient message passing for 3D molecular graphs. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 47
57. Li Z, Wang X, Huang Y, Zhang M. Is distance matrix enough for geometric deep learning? In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1627
58. Li Z, Wang X, Kang S, Zhang M. On the completeness of invariant geometric deep learning models. 2024, arXiv preprint arXiv: 2402.04836
59. Yue A, Luo D, Xu H. A plug-and-play quaternion message-passing module for molecular conformation representation. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 16633−16641
60. Du W, Zhang H, Du Y, Meng Q, Chen W, Zheng N, Shao B, Liu T Y. SE(3) equivariant graph neural networks with complete local frames. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 5583−5608
61. Kofinas M, Nagaraja N S, Gavves E. Roto-translated local coordinate frames for interacting dynamical systems. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
62. Kofinas M, Bekkers E J, Nagaraja N S, Gavves E. Latent field discovery in interacting dynamical systems with neural fields. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1379
63. Kohler J, Klein L, Noé F. Equivariant flows: sampling configurations for multi-body systems with symmetric energies. 2019, arXiv preprint arXiv: 1910.00753
64. Jing B, Eismann S, Suriana P, Townshend R J L, Dror R O. Learning from protein structure with geometric vector perceptrons. In: Proceedings of the 9th International Conference on Learning Representations. 2021
65. Han J, Huang W, Xu T, Rong Y. Equivariant graph hierarchy-based neural networks. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022
66. Zhang Y, Cen J, Han J, Zhang Z, Zhou J, Huang W. Improving equivariant graph neural networks on large geometric graphs via virtual nodes learning. In: Proceedings of the 41st International Conference on Machine Learning. 2024
67. Puny O, Atzmon M, Smith E J, Misra I, Grover A, Ben-Hamu H, Lipman Y. Frame averaging for invariant and equivariant network design. In: Proceedings of the 10th International Conference on Learning Representations. 2022
68. Duval A A, Schmidt V, Hernández-García A, Miret S, Malliaros F D, Bengio Y, Rolnick D. FAENet: frame averaging equivariant GNN for materials modeling. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 9013−9033
69. Du W, Du Y, Wang L, Feng D, Wang G, Ji S, Gomes C P, Ma Z M. A new perspective on building efficient and expressive 3D equivariant graph neural networks. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 2910
70. Aykent S, Xia T. SaVeNet: a scalable vector network for enhanced molecular representation learning. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1860
71. Wang Y, Wang T, Li S, He X, Li M, Wang Z, Zheng N, Shao B, Liu T Y. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nature Communications, 2024, 15(1): 313
72. Wang Z, Liu G, Zhou Y, Wang T, Shao B. QuinNet: efficiently incorporating quintuple interactions into geometric deep learning force fields. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 3368
73. Cen J, Li A, Lin N, Ren Y, Wang Z, Huang W. Are high-degree representations really unnecessary in equivariant graph neural networks? In: Proceedings of the 38th Conference on Neural Information Processing Systems. 2024
74. Battiloro C, Karaismailoglu E, Tec M, Dasoulas G, Audirac M, Dominici F. E(n) equivariant topological neural networks. In: Proceedings of the Thirteenth International Conference on Learning Representations. 2025
75. Li Z, Cen J, Su B, Huang W, Xu T, Rong Y, Zhao D. Large language-geometry model: when LLM meets equivariance. 2025, arXiv preprint arXiv: 2502.11149
76. Anderson B, Hy T S, Kondor R. Cormorant: covariant molecular neural networks. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019
77. Musaelian A, Batzner S, Johansson A, Sun L, Owen C J, Kornbluth M, Kozinsky B. Learning local equivariant representations for large-scale atomistic dynamics. Nature Communications, 2023, 14(1): 579
78. Zitnick C L, Das A, Kolluru A, Lan J, Shuaibi M, Sriram A, Ulissi Z, Wood B. Spherical channels for modeling atomic interactions. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 585
79. Passaro S, Zitnick C L. Reducing SO(3) convolutions to SO(2) for efficient equivariant GNNs. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 1140
80. Batatia I, Kovács D P, Simm G N C, Ortner C, Csányi G. MACE: higher order equivariant message passing neural networks for fast and accurate force fields. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 11423−11436
81. Ying C, Cai T, Luo S, Zheng S, Ke G, He D, Shen Y, Liu T Y. Do transformers really perform bad for graph representation? In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
82. Shi Y, Zheng S, Ke G, Shen Y, You J, He J, Luo S, Liu C, He D, Liu T Y. Benchmarking graphormer on large-scale molecular modeling datasets. 2023, arXiv preprint arXiv: 2203.04810
83. Thölke P, de Fabritiis G. Equivariant transformers for neural network based molecular potentials. In: Proceedings of the 10th International Conference on Learning Representations. 2022
84. Hutchinson M J, Le Lan C, Zaidi S, Dupont E, Teh Y W, Kim H. Lietransformer: equivariant self-attention for Lie groups. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 4533−4543
85. Hsu C, Verkuil R, Liu J, Lin Z, Hie B, Sercu T, Lerer A, Rives A. Learning inverse folding from millions of predicted structures. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 8946−8970
86. Liao Y L, Wood B M, Das A, Smidt T E. EquiformerV2: improved equivariant transformer for scaling to higher-degree representations. In: Proceedings of the 12th International Conference on Learning Representations. 2024
87. Wang Y, Li S, Wang T, Shao B, Zheng N, Liu T Y. Geometric transformer with interatomic positional encoding. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
88. Frank J T, Unke O T, Müller K R, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nature Communications, 2024, 15(1): 6539
89. Aykent S, Xia T. GotenNet: rethinking efficient 3D equivariant graph neural networks. In: Proceedings of the 13th International Conference on Learning Representations. 2025
90. Jiao R, Kong X, Yu Z, Huang W, Liu Y. Equivariant pretrained transformer for unified geometric learning on multi-domain 3D molecules. 2025, arXiv preprint arXiv: 2402.12714v1
91. Ma H, Bian Y, Rong Y, Huang W, Xu T, Xie W, Ye G, Huang J. Cross-dependent graph neural networks for molecular property prediction. Bioinformatics, 2022, 38(7): 2003−2009
92. Zhang M, Li P. Nested graph neural networks. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021, 15734−15747
93.
94.
95.
96.
97.
98.
99.
100.
101.
102.
103.
104.
105.
106.
107.
108.
109.
110.
111.
112.
29
Qin S, Zhang X, Xu H, Xu Y. Fast quaternion product units for
learning disentangled representations in SO(3). IEEE Transactions on
Pattern Analysis and Machine Intelligence, 2023, 45(4): 4504−4520
Zhu X, Xu Y, Xu H, Chen C. Quaternion convolutional neural
networks. In: Proceedings of the 15th European Conference on
Computer Vision. 2018, 645−661
Zhang X, Qin S, Xu Y, Xu H. Quaternion product units for deep
learning on 3D rotation groups. In: Proceedings of 2020 IEEE/CVF
Conference on Computer Vision and Pattern Recognition. 2020,
7302−7311
Joshi C K, Bodnar C, Mathis S V, Cohen T, Liò P. On the expressive
power of geometric graph neural networks. In: Proceedings of the 40th
International Conference on Machine Learning. 2023, 625
Gilmore R. Lie Groups, Physics, and Geometry: An Introduction for
Physicists, Engineers and Chemists. Cambridge: Cambridge University
Press, 2008
Müller C. Spherical Harmonics. Berlin: Springer, 2006
Griffiths D J, Schroeter D F. Introduction to Quantum Mechanics.
Cambridge: Cambridge University Press, 2018
Weiler M, Geiger M, Welling M, Boomsma W, Cohen T. 3D steerable
CNNs: learning rotationally equivariant features in volumetric data. In:
Proceedings of the 32nd Conference on Neural Information Processing
Systems. 2018, 31
Ramachandran P, Zoph B, Le Q V. Searching for activation functions.
In: Proceedings of the 6th International Conference on Learning
Representations. 2018
Drautz R. Atomic cluster expansion for accurate and transferable
interatomic potentials. Physical Review B, 2019, 99(1): 014104
Dusson G, Bachmayr M, Csányi G, Drautz R, Etter S, van der Oord C,
Ortner C. Atomic cluster expansion: completeness, efficiency and
stability. Journal of Computational Physics, 2022, 454: 110946
Bochkarev A, Lysogorskiy Y, Menon S, Qamar M, Mrovec M, Drautz
R. Efficient parametrization of the atomic cluster expansion. Physical
Review Materials, 2022, 6(1): 013804
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N,
Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of
the 31st International Conference on Neural Information Processing
Systems. 2017, 6000−6010
Yuan C, Zhao K, Kuruoglu E E, Wang L, Xu T, Huang W, Zhao D,
Cheng H, Rong Y. A survey of graph transformers: Architectures,
theories and applications. arXiv preprint arXiv: 2502.16533, 2025
Hu W, Fey M, Ren H, Nakata M, Dong Y, Leskovec J. OGB-LSC: a
large-scale challenge for machine learning on graphs. In: Proceedings
of the 35th Conference on Neural Information Processing Systems
(NeurIPS 2021) Track on Datasets and Benchmarks. 2021
Shuaibi M, Kolluru A, Das A, Grover A, Sriram A, Ulissi Z, Zitnick C
L. Rotation invariant graph neural networks using spin convolutions.
2021, arXiv preprint arXiv: 2106.09575
Dym N, Maron H. On the universality of rotation equivariant point
cloud networks. In: Proceedings of the 9th International Conference on
Learning Representations. 2021
Weisfeiler B, Leman A. The reduction of a graph to canonical form
and the algebra which appears therein. Nauchno-Technicheskaya
Informatsia, 1968, 2(9): 12−16
Lawrence H, Portilheiro V, Zhang Y, Kaba S O. Improving equivariant
networks with probabilistic symmetry breaking. In: Proceedings of the
Geometry-Grounded Representation Learning and Generative
Modeling at 41st International Conference on Machine Learning. 2024
Battaglia P, Pascanu R, Lai M, Jimenez Rezende D, Kavukcuoglu K.
Interaction networks for learning about objects, relations and physics.
30
113.
114.
115.
116.
117.
118.
119.
120.
121.
122.
123.
124.
125.
126.
127.
128.
Front. Comput. Sci., 2025, 19(11): 1911375
In: Proceedings of the 30th International Conference on Neural
Information Processing Systems. 2016, 4509−4517
Sanchez-Gonzalez A, Bapst V, Cranmer K, Battaglia P. Hamiltonian
graph networks with ode integrators. 2019, arXiv preprint arXiv:
1909.12790
Guo L, Wang W, Chen Z, Zhang N, Sun Z, Lai Y, Zhang Q, Chen H.
Newton–cotes graph neural networks: on the time evolution of
dynamic systems. In: Proceedings of the 37th Conference on Neural
Information Processing Systems. 2023, 36
Allen K R, Guevara T L, Rubanova Y, Stachenfeld K, SanchezGonzalez A, Battaglia P, Pfaff T. Graph network simulators can learn
discontinuous, rigid contact dynamics. In: Proceedings of the 6th
Conference on Robot Learning. 2023, 1157−1167
Rubanova Y, Sanchez-Gonzalez A, Pfaff T, Battaglia P. Constraintbased graph network simulator. In: Proceedings of the 39th
International Conference on Machine Learning. 2022, 18844−18870
Wu T, Wang Q, Zhang Y, Ying R, Cao K, Sosic R, Jalali R, Hamam
H, Maucec M, Leskovec J. Learning large-scale subsurface simulations
with a hybrid graph network simulator. In: Proceedings of the 28th
ACM SIGKDD Conference on Knowledge Discovery and Data
Mining. 2022, 4184−4194
Li Y, Wu J, Tedrake R, Tenenbaum J B, Torralba A. Learning particle
dynamics for manipulating rigid bodies, deformable objects, and
fluids. In: Proceedings of the 7th International Conference on Learning
Representations. 2019
Mrowca D, Zhuang C, Wang E, Haber N, Fei-Fei L, Tenenbaum J B,
Yamins D L K. Flexible neural representation for physics prediction.
In: Proceedings of the 32nd International Conference on Neural
Information Processing Systems. 2018, 8813−8824
Allen K R, Rubanova Y, Lopez-Guevara T, Whitney W, SanchezGonzalez A, Battaglia P W, Pfaff T. Learning rigid dynamics with face
interaction graph networks. In: Proceedings of the 11th International
Conference on Learning Representations. 2023
Xu C, Tan R T, Tan Y, Chen S, Wang Y G, Wang X, Wang Y.
EqMotion: equivariant multi-agent motion prediction with invariant
interaction reasoning. In: Proceedings of 2023 IEEE/CVF Conference
on Computer Vision and Pattern Recognition. 2023, 1410−1420
Liu Y, Cheng J, Zhao H, Xu T, Zhao P, Tsung F G, Li J, Rong Y.
Improving generalization in equivariant graph neural networks with
physical inductive biases. In: Proceedings of the 12th International
Conference on Learning Representations. 2024
Coors B, Condurache A P, Geiger A. SphereNet: learning spherical
representations for detection and classification in omnidirectional
images. In: Proceedings of the 15th European Conference on
Computer Vision. 2018, 525−541
Wang X, Zhang M. Graph neural network with local frame for
molecular potential energy surface. In: Proceedings of the 1st Learning
on Graphs Conference. 2022, 19
Luo S, Chen T, Krishnapriyan A S. Enabling efficient equivariant
operations in the Fourier basis via gaunt tensor products. In:
Proceedings of the 12th International Conference on Learning
Representations. 2024
Köhler J, Klein L, Noe F. Equivariant flows: exact likelihood
generative learning for symmetric densities. In: Proceedings of the
37th International Conference on Machine Learning. 2020, 5361−5370
Xu M, Han J, Lou A, Kossaifi J, Ramanathan A, Azizzadenesheli K,
Leskovec J, Ermon S, Anandkumar A. Equivariant graph neural
operator for modeling 3D dynamics. In: Proceedings of the 41st
International Conference on Machine Learning. 2024
Schreiner M, Winther O, Olsson S. Implicit transfer operator learning:
multiple time-resolution surrogates for molecular dynamics. In:
129.
130.
131.
132.
133.
134.
135.
136.
137.
138.
139.
140.
141.
142.
143.
144.
145.
Proceedings of the 37th International Conference on Neural
Information Processing Systems. 2023, 1582
Midgley L I, Stimper V, Antorán J, Mathieu E, Schölkopf B,
Hernández-Lobato J M. SE(3) equivariant augmented coupling flows.
In: Proceedings of the 37th International Conference on Neural
Information Processing Systems. 2023, 3466
Han J, Xu M, Lou A, Ye H, Ermon S. Geometric trajectory diffusion
models. In: Proceedings of the 38th Conference on Neural Information
Processing Systems. 2024
Raja S, Amin I, Pedregosa F, Krishnapriyan A S. Stability-aware
training of neural network interatomic potentials with differentiable
Boltzmann estimators. 2025, arXiv preprint arXiv: 2402.13984v1
Amin I, Raja, Krishnapriyan A S. Towards fast, specialized machine
learning force fields: distilling foundation models via energy hessians.
In: Proceedings of the 13th International Conference on Learning
Representations. 2025
Xu M, Yu L, Song Y, Shi C, Ermon S, Tang J. GeoDiff: a geometric
diffusion model for molecular conformation generation. In:
Proceedings of the 10th International Conference on Learning
Representations. 2022
Xu M, Powers A S, Dror R O, Ermon S, Leskovec J. Geometric latent
diffusion models for 3D molecule generation. In: Proceedings of the
40th International Conference on Machine Learning. 2023,
38592−38610
Xu M, Wang W, Luo S, Shi C, Bengio Y, Gomez-Bombarelli R, Tang
J. An end-to-end framework for molecular conformation generation via
bilevel programming. In: Proceedings of the 38th International
Conference on Machine Learning. 2021, 11537−11547
Shi C, Luo S, Xu M, Tang J. Learning gradient fields for molecular
conformation generation. In: Proceedings of the 38th International
Conference on Machine Learning. 2021, 9558−9568
Gebauer N W A, Gastegger M, Schutt K T. Symmetry-adapted
generation of 3D point sets for the targeted discovery of molecules. In:
Proceedings of the 33rd Conference on Neural Information Processing
Systems. 2019, 32
Gebauer N W A, Gastegger M, Hessmann S S P, Müller K R, Schütt K
T. Inverse design of 3D molecular structures with conditional
generative neural networks. Nature Communications, 2022, 13(1): 973
Huang L, Zhang H, Xu T, Wong K C. MDM: molecular diffusion
model for 3D molecule generation. In: Proceedings of the 37th AAAI
Conference on Artificial Intelligence. 2023, 5105−5112
Peng X, Guan J, Liu Q, Ma J. MolDiff: addressing the atom-bond
inconsistency problem in 3D molecule diffusion generation. In:
Proceedings of the 40th International Conference on Machine
Learning. 2023, 27611−27629
Luo S, Shi C, Xu M, Tang J. Predicting molecular conformation via
dynamic graph score matching. In: Proceedings of the 35th Conference
on Neural Information Processing Systems. 2021
Satorras V G, Hoogeboom E, Fuchs F B, Posner I, Welling M. E(n)
equivariant normalizing flows. In: Proceedings of the 35th
International Conference on Neural Information Processing Systems.
2021, 320
Hoogeboom E, Satorras V G, Vignac C, Welling M. Equivariant
diffusion for molecule generation in 3D. In: Proceedings of the 39th
International Conference on Machine Learning. 2022, 8867−8887
Ganea O E, Pattanaik L, Coley C W, Barzilay R, Jensen K F, Green W
H, Jaakkola T S. GEOMOL: torsional geometric generation of
molecular 3D conformer ensembles. In: Proceedings of the 35th
Conference on Neural Information Processing Systems. 2021
Wang F, Xu H, Chen X, Lu S, Deng Y, Huang W. MPerformer: an
SE(3) transformer-based molecular perceptron. In: Proceedings of the
Jiaqi HAN et al.
146.
147.
148.
149.
150.
151.
152.
153.
154.
155.
156.
157.
158.
159.
160.
161.
A survey of geometric graph neural networks: data structures, models and applications
32nd ACM International Conference on Information and Knowledge
Management. 2023, 2512−2522
Bao F, Zhao M, Hao Z, Li P, Li C, Zhu J. Equivariant energy-guided
SDE for inverse molecular design. In: Proceedings of the 11th
International Conference on Learning Representations. 2023
Zhu J, Xia Y, Liu C, Wu L, Xie S, Wang Y, Wang T, Qin T, Zhou W,
Li H, Liu H, Liu T Y. Direct molecular conformation generation.
Transactions on Machine Learning Research, 2022, See openreview.
net/forum?id=lCPOHiztuw website, 2022
Qiang B, Song Y, Xu M, Gong J, Gao B, Zhou H, Ma W Y, Lan Y.
Coarse-to-fine: a hierarchical diffusion model for molecule generation
in 3D. In: Proceedings of the 40th International Conference on
Machine Learning. 2023, 28277–28299
Song Y, Gong J, Xu M, Cao Z, Lan Y, Ermon S, Zhou H, Ma W Y.
Equivariant flow matching with hybrid probability transport for 3D
molecule generation. In: Proceedings of the 37th International
Conference on Neural Information Processing Systems. 2023, 26
Reidenbach D, Krishnapriyan A S. Coarsenconf: equivariant
coarsening with aggregated attention for molecular conformer
generation. Journal of Chemical Information and Modeling, 2025,
65(1): 22−30
Song Y, Gong J, Zhou H, Zheng M, Liu J, Ma W Y. Unified
generative modeling of 3D molecules with Bayesian flow networks.
In: Proceedings of the 12th International Conference on Learning
Representations. 2024
Qu Y, Qiu K, Song Y, Gong J, Han J, Zheng M, Zhou H, Ma W Y.
MolCRAFT: structure-based drug design in continuous parameter
space. In: Proceedings of the 41st International Conference on
Machine Learning. 2024
Jiao R, Han J, Huang W, Rong Y, Liu Y. Energy-motivated
equivariant pretraining for 3D molecular graphs. In: Proceedings of the
37th AAAI Conference on Artificial Intelligence. 2023, 8096−8104
Liu S, Guo H, Tang J. Molecular geometry pretraining with SE(3)invariant denoising distance matching. In: Proceedings of the 11th
International Conference on Learning Representations. 2023
Liu S, Wang H, Liu W, Lasenby J, Guo H, Tang J. Pre-training
molecular graph representation with 3D geometry. In: Proceedings of
the 10th International Conference on Learning Representations. 2022
Zaidi S, Schaarschmidt M, Martens J, Kim H, Teh Y W, SanchezGonzalez A, Battaglia P W, Pascanu R, Godwin J. Pre-training via
denoising for molecular property prediction. In: Proceedings of the
11th International Conference on Learning Representations. 2023
Feng J, Wang Z, Li Y, Ding B, Wei Z, Xu H. MGMAE: molecular
representation learning by reconstructing heterogeneous graphs with a
high mask ratio. In: Proceedings of the 31st ACM International
Conference on Information & Knowledge Management. 2022,
509−519
Stärk H, Beaini D, Corso G, Tossou P, Dallago C, Gunnemann S, Lió
P. 3D infomax improves GNNs for molecular property prediction. In:
Proceedings of the 39th International Conference on Machine
Learning. 2022, 20479−20502
Zhou G, Gao Z, Ding Q, Zheng H, Xu H, Wei Z, Zhang L, Ke G. UniMol: a universal 3D molecular representation learning framework. In:
Proceedings of the 11th International Conference on Learning
Representations. 2023
Luo S, Chen T, Xu Y, Zheng S, Liu T Y, Wang L, He D. One
transformer can understand both 2D & 3D molecular data. In:
Proceedings of the 11th International Conference on Learning
Representations. 2023
Liu S, Du W, Ma Z M, Guo H, Tang J. A group symmetric stochastic
differential equation model for molecule multi-modal pretraining. In:
162.
163.
164.
165.
166.
167.
168.
169.
170.
171.
172.
173.
174.
175.
176.
177.
178.
31
Proceedings of the 40th International Conference on Machine
Learning. 2023, 21497–21526
Ni Y, Feng S, Ma W Y, Ma Z M, Lan Y. Sliced denoising: a physicsinformed molecular pre-training method. In: Proceedings of the 12th
International Conference on Learning Representations. 2024
Feng S, Ni Y, Lan Y, Ma Z M, Ma W Y. Fractional denoising for 3D
molecular pre-training. In: Proceedings of the 40th International
Conference on Machine Learning. 2023, 9938−9961
Liu Y, Chen J, Jiao R, Li J, Huang W, Su B. DenoiseVAE: learning
molecule-adaptive noise distributions for denoising-based 3D
molecular pre-training. In: Proceedings of the 13th International
Conference on Learning Representations. 2025
Liu S, Rong Y, Zhao D, Liu Q, Wu S, Wang L. MolSpectra: pretraining 3D molecular representation with multi-modal energy spectra.
In: Proceedings of the 13th International Conference on Learning
Representations. 2025
Wang Z, Combs S A, Brand R, Calvo M R, Xu P, Price G, Golovach
N, Salawu E O, Wise C J, Ponnapalli S P, Clark P M. LM-GVP: an
extensible sequence and structure informed deep learning framework
for protein property prediction. Scientific Reports, 2022, 12(1): 6832
Gligorijević V, Renfrew P D, Kosciolek T, Leman J K, Berenberg D,
Vatanen T, Chandler C, Taylor B C, Fisk I M, Vlamakis H, Xavier R J,
Knight R, Cho K, Bonneau R. Structure-based protein function
prediction
using
graph
convolutional
networks.
Nature
Communications, 2021, 12(1): 3168
Zhang Z, Xu M, Jamasb A R, Chenthamarakshan V, Lozano A C, Das
P, Tang J. Protein representation learning by geometric structure
pretraining. In: Proceedings of the 11th International Conference on
Learning Representations. 2023
Torng W, Altman R B. 3D deep convolutional neural networks for
amino acid environment similarity analysis. BMC Bioinformatics,
2017, 18(1): 302
Zhang Y, Skolnick J. TM-align: a protein structure alignment
algorithm based on the tm-score. Nucleic Acids Research, 2005, 33(7):
2302−2309
Eismann S, Townshend R J L, Thomas N, Jagota M, Jing B, Dror R O.
Hierarchical, rotation-equivariant neural networks to select structural
models of protein complexes. Proteins: Structure, Function, and
Bioinformatics, 2021, 89(5): 493−501
Eismann S, Suriana P, Jing B, Townshend R J L, Dror R O. Protein
model quality assessment using rotation-equivariant transformations
on point clouds. Proteins: Structure, Function, and Bioinformatics,
2023, 91(8): 1089−1096
Chen C, Chen X, Morehead A, Wu T, Cheng J. 3D-equivariant graph
neural networks for protein model quality assessment. Bioinformatics,
2023, 39(1): btad030
Tubiana J, Schneidman-Duhovny D, Wolfson H J. ScanNet: an
interpretable geometric deep learning model for structure-based protein
binding site prediction. Nature Methods, 2022, 19(6): 730−739
Zhang Y, Wei Z, Yuan Y, Ding Z, Huang W. EquiPocket: an E(3)equivariant geometric graph neural network for ligand binding site
prediction. In: Proceedings of the 41st International Conference on
Machine Learning. 2024
Meller A, Ward M D, Borowsky J H, Lotthammer J M, Kshirsagar M,
Oviedo F, Ferres J L, Bowman G. Predicting the locations of cryptic
pockets from single protein structures using the pocketminer graph
neural network. Biophysical Journal, 2023, 122(3Suppl): 445A
Ingraham J, Garg V K, Barzilay R, Jaakkola T. Generative models for
graph-based protein design. In: Proceedings of the 33rd Conference on
Neural Information Processing Systems. 2019, 32
Tan C, Gao Z, Xia J, Hu B, Li S Z. Generative de novo protein design
32
179.
180.
181.
182.
183.
184.
185.
186.
187.
188.
189.
190.
191.
192.
193.
194.
Front. Comput. Sci., 2025, 19(11): 1911375
with global context. 2023, arXiv preprint arXiv: 2204.10673
Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte R J, Milles L F,
Wicky B I M, Courbet A, de Haas R J, Bethel N, Leung P J Y, Huddy
T F, Pellock S, Tischer D, Chan F, Koepnick B, Nguyen H, Kang A,
Sankaran B, Bera A K, King N P, Baker D. Robust deep
learning–based protein sequence design using ProteinMPNN. Science,
2022, 378(6615): 49−56
Gao Z, Tan C, Li S Z. PiFold: toward effective and efficient protein
inverse folding. In: Proceedings of the 11th International Conference
on Learning Representations. 2023
Zheng Z, Deng Y, Xue D, Zhou Y, Ye F, Gu Q. Structure-informed
language models are protein designers. In: Proceedings of the 40th
International Conference on Machine Learning. 2023, 1781
Gao Z, Tan C, Chen X, Zhang Y, Xia J, Li S, Li S Z. KW-Design:
pushing the limit of protein design via knowledge refinement. In:
Proceedings of the 12th International Conference on Learning
Representations. 2024
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, et al. Highly
accurate protein structure prediction with AlphaFold. Nature, 2021,
596(7873): 583−589
Krishna R, Wang J, Ahern W, Sturmfels P, Venkatesh P, et al.
Generalized biomolecular modeling and design with RoseTTAFold
All-Atom. Science, 2024, 384(6693): eadl2528
Jing B, Erives E, Pao-Huang P, Corso G, Berger B, Jaakkola T.
EigenFold: generative protein structure prediction with diffusion
models. In: Proceedings of the ICLR 2023-Machine Learning for Drug
Discovery Workshop. 2023
Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R,
Kabeli O, Shmueli Y, Dos Santos Costa A, Fazel-Zarandi M, Sercu T,
Candido S, Rives A. Evolutionary-scale prediction of atomic-level
protein structure with a language model. Science, 2023, 379(6637):
1123−1130
Fang X, Wang F, Liu L, He J, Lin D, Xiang Y, Zhu K, Zhang X, Wu
H, Li H, Song L. A method for multiple-sequence-alignment-free
protein structure prediction using a protein language model. Nature
Machine Intelligence, 2023, 5(10): 1087−1096
Shi C, Wang C, Lu J, Zhong B, Tang J. Protein sequence and structure
co-design with equivariant translation. In: Proceedings of the 11th
International Conference on Learning Representations. 2023
Yue A, Wang Z, Xu H. ReQFlow: rectified quaternion flow for
efficient and high-quality protein backbone generation. 2025, arXiv
preprint arXiv: 2502.14637
Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L,
Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B.
ProtTrans: toward understanding the language of life through selfsupervised learning. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 2022, 44(10): 7112−7127
Chen B, Cheng X, Li P, Geng Y A, Gong J, Li S, Bei Z, Tan X, Wang
B, Zeng X, Liu C, Zeng A, Dong Y, Tang J, Song L. xTrimoPGLM:
unified 100B-scale pre-trained transformer for deciphering the
language of protein. 2024, arXiv preprint arXiv: 2401.06199
Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised
language model for protein design. Nature Communications, 2022,
13(1): 4348
Mansoor S, Baek M, Madan U, Horvitz E. Toward more general
embeddings for protein design: harnessing joint representations of
sequence and structure. bioRxiv, 2021
Gao B, Jia Y, Mo Y, Ni Y, Ma W Y, Ma Z M, Lan Y. Self-supervised
pocket pretraining via protein fragment-surroundings alignment. In:
Proceedings of the 12th International Conference on Learning
Representations. 2024
195.
196.
197.
198.
199.
200.
201.
202.
203.
204.
205.
206.
207.
208.
209.
210.
211.
212.
Wang Z, Zhang Q, Hu S, Yu H, Jin X, Gong Z, Chen H. Multi-level
protein structure pre-training via prompt learning. In: Proceedings of
the 11th International Conference on Learning Representations. 2023
Gao B, Qiang B, Tan H, Ren M, Jia Y, Lu M, Liu J, Ma W Y, Lan Y.
DrugCLIP: contrastive protein-molecule representation learning for
virtual screening. In: Proceedings of the 37th Conference on Neural
Information Processing Systems. 2023, 36
Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M,
Zitnick C L, Ma J, Fergus R. Biological structure and function emerge
from scaling unsupervised learning to 250 million protein sequences.
Proceedings of the National Academy of Sciences of the United States
of America, 2021, 118(15): e2016239118
Guo Y, Wu J, Ma H, Huang J. Self-supervised pre-training for protein
embeddings using tertiary structures. In: Proceedings of the 36th
AAAI Conference on Artificial Intelligence. 2022, 6801−6809
Yuan C, Li S, Ye G, Zhang Y, Huang L K, Huang W, Liu W, Yao J,
Rong Y. Annotation-guided protein design with multi-level domain
alignment. 2024, arXiv preprint arXiv: 2404.16866
Igashov I, St ¨ark H, Vignac C, Schneuing A, Satorras V G, Frossard
P, Welling M, Bronstein M, Correia B. Equivariant 3d-conditional
diffusion model for molecular linker design. Nature Machine
Intelligence, 2024, 6(4): 417–427
Imrie F, Bradley A R, van der Schaar M, Deane C M. Deep generative
models for 3D linker design. Journal of Chemical Information and
Modeling, 2020, 60(4): 1983−1995
Duan C, Du Y, Jia H, Kulik H J. Accurate transition state generation
with an object-aware equivariant elementary reaction diffusion model.
Nature Computational Science, 2023, 3(12): 1045−1055
Jackson R, Zhang W, Pearson J. TSNet: predicting transition state
structures with tensor field networks and transfer learning. Chemical
Science, 2021, 12(29): 10022−10040
Gainza P, Sverrisson F, Monti F, Rodolà E, Boscaini D, Bronstein M
M, Correia B E. Deciphering interaction fingerprints from protein
molecular surfaces using geometric deep learning. Nature Methods,
2020, 17(2): 184−192
Kong X, Huang W, Liu Y. Generalist equivariant transformer towards
3D molecular interaction learning. In: Proceedings of the 41st
International Conference on Machine Learning. 2024, 25149−25175
Wang L, Liu H, Liu Y, Kurtin J, Ji S. Learning hierarchical protein
representations via complete 3D graph networks. In: Proceedings of
the 11th International Conference on Learning Representations. 2023
Zhao K, Rong Y, Jiang B, Tang J, Zhang H, Yu J X, Zhao P.
Geometric graph learning for protein mutation effect prediction. In:
Proceedings of the 32nd ACM International Conference on
Information and Knowledge Management. 2023, 3412−3422
Feng S, Li M, Jia Y, Ma W Y, Lan Y. Protein-ligand binding
representation learning from fine-grained interactions. In: Proceedings
of the 12th International Conference on Learning Representations.
2024
Jian Y, Wu C, Reidenbach D, Krishnapriyan A S. General binding
affinity guidance for diffusion models in structure-based drug design.
2024, arXiv preprint arXiv: 2406.16821
Xue F, Zhang M, Li S, Gao X, Wohlschlegel J A, Huang W, Yang Y,
Deng W. Se (3)-equivariant ternary complex prediction towards target
protein degradation. arXiv preprint arXiv: 2502.18875, 2025
Stärk H, Ganea O, Pattanaik L, Barzilay D, Jaakkola T. EquiBind:
geometric deep learning for drug binding structure prediction. In:
Proceedings of the 39th International Conference on Machine
Learning. 2022, 20503−20521
Lu W, Wu Q, Zhang J, Rao J, Li C, Zheng S. TANKBind:
trigonometry-aware neural networks for drug-protein binding structure
Jiaqi HAN et al.
213.
214.
215.
216.
217.
218.
219.
220.
221.
222.
223.
224.
225.
226.
227.
228.
229.
230.
A survey of geometric graph neural networks: data structures, models and applications
prediction. In: Proceedings of the 36th Conference on Neural
Information Processing Systems. 2022
Long S, Zhou Y, Dai X, Zhou H. Zero-shot 3D drug design by
sketching and generating. In: Proceedings of the 36th Conference on
Neural Information Processing Systems. 2022, 23894−23907
Pei Q, Gao K, Wu L, Zhu J, Xia Y, Xie S, Qin T, He K, Liu T Y, Yan
R. FABind: fast and accurate protein-ligand binding. In: Proceedings
of the 37th Conference on Neural Information Processing Systems.
2023
Huang Y, Zhang O, Wu L, Tan C, Lin H, Gao Z, Li S, Li S Z. ReDock: towards flexible and realistic molecular docking with diffusion
bridge. In: Proceedings of the 41st International Conference on
Machine Learning. 2024
Peng X, Luo S, Guan J, Xie Q, Peng J, Ma J. Pocket2Mol: efficient
molecular sampling based on 3D protein pockets. In: Proceedings of
the 39th International Conference on Machine Learning. 2022,
17644–17655
Lin H, Huang Y, Zhang O, Ma S, Liu M, Li X, Wu L, Wang J, Hou T,
Li S Z. DiffBP: generative diffusion of 3D molecules for target protein
binding. 2024, arXiv preprint arXiv: 2211.11214
Luo S, Guan J, Ma J, Peng J. A 3D generative model for structurebased drug design. In: Proceedings of the 35th Conference on Neural
Information Processing Systems. 2021
Liu M, Luo Y, Uchino K, Maruhashi K, Ji S. Generating 3D molecules
for target protein binding. In: Proceedings of the 39th International
Conference on Machine Learning. 2022, 13912−13924
Zhang Z, Min Y, Zheng S, Liu Q. Molecule generation for target
protein binding with structural motifs. In: Proceedings of the 11th
International Conference on Learning Representations. 2023
Lin H, Huang Y, Zhang O, Wu L, Li S, Chen Z, Li S Z. Functionalgroup-based diffusion for pocket-specific molecule generation and
elaboration. In: Proceedings of the 37th International Conference on
Neural Information Processing Systems. 2023, 36
Qiu K, Song Y, Yu J, Ma H, Cao Z, Zhang Z, Wu Y, Zheng M, Zhou
H, Ma W Y. Structure-based molecule optimization via gradientguided Bayesian update. 2024, arXiv preprint arXiv: 2411.13280
Pinheiro P O, Jamasb A, Mahmood O, Sresht V, Saremi S. Structurebased drug design by denoising voxel grids. In: Proceedings of the 41st
International Conference on Machine Learning. 2024
Morehead A, Chen C, Cheng J. Geometric transformers for protein
interface contact prediction. In: Proceedings of the 10th International
Conference on Learning Representations. 2022
Sverrisson F, Feydy J, Correia B E, Bronstein M M. Fast end-to-end
learning on protein surfaces. In: Proceedings of 2021 IEEE/CVF
Conference on Computer Vision and Pattern Recognition. 2021,
15267−15276
Townshend R J L, Bedi R, Suriana P A, Dror R O. End-to-end learning
on 3D protein structure for interface prediction. In: Proceedings of the
33rd Conference on Neural Information Processing Systems. 2019, 32
Rodrigues C H M, Pires D E V, Ascher D B. mmCSM-PPI: predicting
the effects of multiple point mutations on protein–protein interactions.
Nucleic Acids Research, 2021, 49(W1): W417−W424
Liu X, Luo Y, Li P, Song S, Peng J. Deep geometric representations
for modeling effects of mutations on protein-protein binding affinity.
PLoS Computational Biology, 2021, 17(8): e1009284
Ganea O E, Huang X, Bunne C, Bian Y, Barzilay R, Jaakkola T S,
Krause A. Independent SE(3)-equivariant models for end-to-end rigid
protein docking. In: Proceedings of the 10th International Conference
on Learning Representations. 2022
Wang Y, Shen Y, Chen S, Wang L, Fei Y, Zhou H. Learning harmonic
molecular representations on riemannian manifold. In: Proceedings of
231.
232.
233.
234.
235.
236.
237.
238.
239.
240.
241.
242.
243.
244.
245.
246.
247.
33
the 11th International Conference on Learning Representations. 2023
Jin W, Barzilay R, Jaakkola T. Antibody-antigen docking and design
via hierarchical structure refinement. In: Proceedings of the 39th
International Conference on Machine Learning. 2022, 10217–10227
Ketata M A, Laue C, Mammadov R, Stärk H, Wu M, Corso G,
Marquet C, Barzilay R, Jaakkola T S. DiffDock-PP: rigid proteinprotein docking with diffusion models. In: Proceedings of the ICLR
2023-Machine Learning for Drug Discovery Workshop. 2023
Ji Y, Bian Y, Fu G, Zhao P, Luo P. SyNDock: N rigid protein docking
via learnable group synchronization. 2023, arXiv preprint arXiv:
2305.15156
Evans R, O’Neill M, Pritzel A, Antropova N, Senior A, et al. Protein
complex prediction with alphafold-multimer. bioRxiv, 2021
Sverrisson F, Feydy J, Southern J, Bronstein M M, Correia B E.
Physics-informed deep neural network for rigid-body protein docking.
In: Proceedings of the MLDD 2022 - Machine Learning for Drug
Discovery Workshop of ICLR 2022. 2022
Yu Z, Huang W, Liu Y. Rigid protein-protein docking via equivariant
elliptic-paraboloid interface prediction. In: Proceedings of the 12th
International Conference on Learning Representations. 2024
Wu H, Liu W, Bian Y, Wu J, Yang N, Yan J. EBMDock: neural
probabilistic protein-protein docking via a differentiable energy model.
In: Proceedings of the 12th International Conference on Learning
Representations. 2024
Luo S, Su Y, Peng X, Wang S, Peng J, Ma J. Antigen-specific
antibody design and optimization with diffusion-based generative
models for protein structures. In: Proceedings of the 36th International
Conference on Neural Information Processing Systems. 2022, 709
Jin W, Wohlwend J, Barzilay R, Jaakkola T S. Iterative refinement
graph neural network for antibody sequence-structure co-design. In:
Proceedings of the 10th International Conference on Learning
Representations. 2022
Gao K, Wu L, Zhu J, Peng T, Xia Y, He L, Xie S, Qin T, Liu H, He K,
Liu T Y. Incorporating pre-training paradigm for antibody sequencestructure co-design. 2022, arXiv preprint arXiv: 2211.08406
Tan C, Gao Z, Wu L, XIA J, Zheng J, Yang X, Liu Y, Hu B, Li S Z.
Cross-gate MLP with protein complex invariant embedding is a oneshot antibody designer. In: Proceedings of the 38th AAAI Conference
on Artificial Intelligence. 2024, 15222−15230
Verma Y, Heinonen M, Garg V. AbODE: ab initio antibody design
using conjoined ODEs. In: Proceedings of the 40th International
Conference on Machine Learning. 2023, 35037−35050
Martinkus K, Ludwiczak J, Cho K, Liang W C, Lafrance-Vanasse J,
Hotzel I, Rajpal A, Wu Y, Bonneau R, Gligorijevic V, Loukas A.
AbDiffuser: full-atom generation of in vitro functioning antibodies. In:
Proceedings of the 37th Conference on Neural Information Processing
Systems. 2023
Wu F, Zhao Y, Wu J, Jiang B, He B, Huang L, Qin C, Yang F, Huang
N, Xiao Y, Wang R, Jia H, Rong Y, Liu Y, Lai H, Xu T, Liu W, Zhao
P, Yao J. Fast and accurate modeling and design of antibody-antigen
complex using tFold. bioRxiv, 2024
Lin H, Wu L, Huang Y, Liu Y, Zhang O, Zhou Y, Sun R, Li S Z.
GeoAB: towards realistic antibody design and reliable affinity
maturation. In: Proceedings of the 41st International Conference on
Machine Learning. 2024
Wu L, Lin H, Huang Y, Gao Z, Tan C, Liu Y, Wu T, Li S Z. Relation-aware equivariant graph networks for epitope-unknown antibody
design and specificity optimization. 2024, arXiv preprint arXiv:
2501.00013
Xie X, Valiente P A, Kim P M. HelixGAN a deep-learning
methodology for conditional de novo design of α-helix structures.
Bioinformatics, 2023, 39(1): btad036
Lin H, Zhang O, Zhao H, Jiang D, Wu L, Liu Z, Huang Y, Li S Z.
PPFLOW: target-aware peptide design with torsional flow matching.
In: Proceedings of the 41st International Conference on Machine
Learning. 2024
Xie T, Grossman J C. Crystal graph convolutional neural networks for
an accurate and interpretable prediction of material properties. Physical
Review Letters, 2018, 120(14): 145301
Chen C, Ye W, Zuo Y, Zheng C, Ong S P. Graph networks as a
universal machine learning framework for molecules and crystals.
Chemistry of Materials, 2019, 31(9): 3564−3572
Choudhary K, DeCost B. Atomistic line graph neural network for
improved materials property predictions. npj Computational Materials,
2021, 7(1): 185
Kaba S O, Ravanbakhsh S. Equivariant networks for crystal structures.
In: Proceedings of the 36th International Conference on Neural
Information Processing Systems. 2022, 300
Yan K, Liu Y, Lin Y, Ji S. Periodic graph transformers for crystal
material property prediction. In: Proceedings of the 36th International
Conference on Neural Information Processing Systems. 2022, 1096
Magar R, Wang Y, Barati Farimani A. Crystal twins: self-supervised
learning for crystalline material property prediction. npj Computational
Materials, 2022, 8(1): 231
Yu H, Song Y, Hu J, Guo C, Yang B. A crystal-specific pre-training
framework for crystal material property prediction. 2023, arXiv
preprint arXiv: 2306.05344
Song Z, Meng Z, King I. A diffusion-based pre-training framework for
crystal property prediction. In: Proceedings of the 38th AAAI
Conference on Artificial Intelligence. 2024, 8993−9001
Xie T, Fu X, Ganea O E, Barzilay R, Jaakkola T S. Crystal diffusion
variational autoencoder for periodic material generation. In:
Proceedings of the 10th International Conference on Learning
Representations. 2022
Jiao R, Huang W, Liu Y, Zhao D, Liu Y. Space group constrained
crystal generation. In: Proceedings of the 12th International
Conference on Learning Representations. 2024
Zeni C, Pinsler R, Zügner D, Fowler A, Horton M, et al. MatterGen: a
generative model for inorganic materials design. 2024, arXiv preprint
arXiv: 2312.03687
Li Q, Jiao R, Wu L, Zhu T, Huang W, Jin S, Liu Y, Weng H, Chen X.
Powder diffraction crystal structure determination using generative
models. 2024, arXiv preprint arXiv: 2409.04727
Lin P, Chen P, Jiao R, Mo Q, Cen J, Huang W, Liu Y, Huang D, Lu Y.
Equivariant diffusion for crystal structure prediction. In: Proceedings
of the 41st International Conference on Machine Learning. 2024, 1204
Miller B K, Chen R T Q, Sriram A, Wood B M. FlowMM: generating
materials with riemannian flow matching. In: Proceedings of the 41st
International Conference on Machine Learning. 2024
Wu H, Song Y, Gong J, Cao Z, Ouyang Y, Zhang J, Zhou H, Ma W Y,
Liu J. A periodic Bayesian flow for material generation. In:
Proceedings of the 13th International Conference on Learning
Representations. 2025
Zhang S, Liu Y, Xie L. Physics-aware graph neural network for
accurate RNA 3D structure prediction. 2023, arXiv preprint arXiv:
2210.16392
Li Z, Cen J, Huang W, Wang T, Song L. Size-generalizable RNA
structure evaluation by exploring hierarchical geometries. In:
Proceedings of the 13th International Conference on Learning
Representations. 2025
Greff K, Belletti F, Beyer L, Doersch C, Du Y, et al. Kubric: a scalable
dataset generator. In: Proceedings of 2022 IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 2022, 3739–3751
Bear D, Wang E, Mrowca D, Binder F J, Tung H Y, Pramod R T,
Holdaway C, Tao S, Smith K A, Sun F Y, Li F F, Kanwisher N,
Tenenbaum J, Yamins D, Fan J E. Physion: evaluating physical
prediction from vision in humans and machines. In: Proceedings of the
1st Neural Information Processing Systems Track on Datasets and
Benchmarks. 2021
Yu K T, Bauza M, Fazeli N, Rodriguez A. More than a million ways to
be pushed. A high-fidelity experimental dataset of planar pushing. In:
Proceedings of 2016 IEEE/RSJ International Conference on Intelligent
Robots and Systems. 2016, 30−37
Townshend R J L, Vogele M, Suriana P, Derry A, Powers A S,
Laloudakis Y, Balachandar S, Jing B, Anderson B M, Eismann S,
Kondor R, Altman R B, Dror R O. ATOM3D: tasks on molecules in
three dimensions. In: Proceedings of the 35th Conference on Neural
Information Processing Systems. 2021
Xu M, Luo S, Bengio Y, Peng J, Tang J. Learning neural generative
dynamics for molecular conformation generation. In: Proceedings of
the 9th International Conference on Learning Representations. 2021
Chmiela S, Tkatchenko A, Sauceda H E, Poltavsky I, Schütt K T,
Müller K R. Machine learning of accurate energy-conserving
molecular force fields. Science Advances, 2017, 3(5): e1603015
Tran R, Lan J, Shuaibi M, Wood B M, Goyal S, Das A, Heras-Domingo J, Kolluru A, Rizvi A, Shoghi N, Sriram A, Therrien F, Abed
J, Voznyy O, Sargent E H, Ulissi Z, Zitnick C L. The open catalyst
2022 (OC22) dataset and challenges for oxide electrocatalysts. ACS
Catalysis, 2023, 13(5): 3066−3084
Seyler S, Beckstein O. Molecular dynamics trajectory for
benchmarking MDAnalysis. 2017
Lindorff-Larsen K, Piana S, Dror R O, Shaw D E. How fast-folding
proteins fold. Science, 2011, 334(6055): 517−520
Axelrod S, Gómez-Bombarelli R. GEOM, energy-annotated molecular
conformations for property prediction and molecular generation.
Scientific Data, 2022, 9(1): 185
Wang X, Zhao H, Tu W W, Yao Q. Automated 3D pre-training for
molecular property prediction. In: Proceedings of the 29th ACM
SIGKDD Conference on Knowledge Discovery and Data Mining.
2023, 2419−2430
Isert C, Atz K, Jimenez-Luna J, Schneider G. QMugs, quantum
mechanical properties of drug-like molecules. Scientific Data, 2022,
9(1): 273
Ashburner M, Ball C A, Blake J A, Botstein D, Butler H, Cherry J M,
Davis A P, Dolinski K, Dwight S S, Eppig J T, Harris M A, Hill D P,
Issel-Tarver L, Kasarskis A, Lewis S, Matese J C, Richardson J E,
Ringwald M, Rubin G M, Sherlock G. Gene ontology: tool for the
unification of biology. Nature Genetics, 2000, 25(1): 25−29
Bairoch A. The ENZYME database in 2000. Nucleic Acids Research,
2000, 28(1): 304−305
Orengo C A, Michie A D, Jones S, Jones D T, Swindells M B,
Thornton J M. CATH–a hierarchic classification of protein domain
structures. Structure, 1997, 5(8): 1093−1109
Xue Y, Liu Z, Fang X, Wang F. Multimodal pre-training model for
sequence-based prediction of protein-protein interaction. In:
Proceedings of the 16th Machine Learning in Computational Biology
Meeting. 2022, 34−46
Chandonia J M, Fox N K, Brenner S E. SCOPe: classification of large
macromolecular structures in the structural classification of
proteins—extended database. Nucleic Acids Research, 2019, 47(D1):
D475−D481
Heinzinger M, Weissenow K, Sanchez J G, Henkel A, Steinegger M,
Rost B. ProstT5: bilingual language model for protein sequence and
structure. bioRxiv, 2023
Bepler T, Berger B. Learning the protein language: evolution,
structure, and function. Cell Systems, 2021, 12(6): 654−669
Rao R, Bhattacharya N, Thomas N, Duan Y, Chen P, Canny J, Abbeel
P, Song Y S. Evaluating protein transfer learning with TAPE. In:
Proceedings of the 33rd Conference on Neural Information Processing
Systems. 2019, 32
Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, et al.
AlphaFold Protein Structure Database: massively expanding the
structural coverage of protein-sequence space with high-accuracy
models. Nucleic Acids Research, 2022, 50(D1): D439−D444
Gao Z, Tan C, Li S Z. AlphaDesign: a graph protein design method
and benchmark on AlphaFoldDB. 2022, arXiv preprint arXiv:
2202.01079
Consortium T U. UniProt: the universal protein knowledgebase in
2023. Nucleic Acids Research, 2023, 51(D1): D523−D531
Almagro Armenteros J J, Sønderby C K, Sønderby S K, Nielsen H,
Winther O. DeepLoc: prediction of protein subcellular localization
using deep learning. Bioinformatics, 2017, 33(21): 3387−3395
Steinegger M, Söding J. Clustering huge protein sequence sets in linear
time. Nature Communications, 2018, 9(1): 2542
Klausen M S, Jespersen M C, Nielsen H, Jensen K K, Jurtz V I,
Sønderby C K, Sommer M O A, Winther O, Nielsen M, Petersen B,
Marcatili P. NetSurfP-2.0: improved prediction of protein structural
features by integrated deep learning. Proteins: Structure, Function, and
Bioinformatics, 2019, 87(6): 520−527
Xu M, Zhang Z, Lu J, Zhu Z, Zhang Y, Chang M, Liu R, Tang J. PEER:
a comprehensive and multi-task benchmark for protein sequence
understanding. In: Proceedings of the 36th International Conference on
Neural Information Processing Systems. 2022, 2548
Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical
assessment of methods of protein structure prediction (CASP)—round
XIII. Proteins: Structure, Function, and Bioinformatics, 2019, 87(12):
1011−1020
Berman H M, Westbrook J, Feng Z, Gilliland G, Bhat T N, Weissig H,
Shindyalov I N, Bourne P E. The protein data bank. Nucleic Acids
Research, 2000, 28(1): 235−242
Sterling T, Irwin J J. ZINC 15 – ligand discovery for everyone. Journal
of Chemical Information and Modeling, 2015, 55(11): 2324−2337
Su M, Yang Q, Du Y, Feng G, Liu Z, Li Y, Wang R. Comparative
assessment of scoring functions: the CASF-2016 update. Journal of
Chemical Information and Modeling, 2019, 59(2): 895−913
Schreiner M, Bhowmik A, Vegge T, Busk J, Winther O. Transition1x - a dataset for building generalizable reactive machine learning
potentials. Scientific Data, 2022, 9(1): 779
Francoeur P G, Masuda T, Sunseri J, Jia A, Iovanisci R B, Snyder I,
Koes D R. Three-dimensional convolutional neural networks and a
cross-docked data set for structure-based drug design. Journal of
Chemical Information and Modeling, 2020, 60(9): 4200−4215
Morehead A, Chen C, Sedova A, Cheng J. DIPS-Plus: the enhanced
database of interacting protein structures for interface prediction.
Scientific Data, 2023, 10(1): 509
Stark C, Breitkreutz B J, Reguly T, Boucher L, Breitkreutz A, Tyers
M. BioGRID: a general repository for interaction datasets. Nucleic
Acids Research, 2006, 34(S1): D535−D539
Hallee L, Gleghorn J P. Protein-protein interaction prediction is
achievable with large language models. bioRxiv, 2023
Vreven T, Moal I H, Vangone A, Pierce B G, Kastritis P L, Torchala
M, Chaleil R, Jiménez-García B, Bates P A, Fernandez-Recio J,
Bonvin A M J J, Weng Z. Updates to the integrated protein–protein
interaction benchmarks: docking benchmark version 5 and affinity
benchmark version 2. Journal of Molecular Biology, 2015, 427(19):
3031−3041
Jankauskaitė J, Jiménez-García B, Dapkūnas J, Fernández-Recio J,
Moal I H. SKEMPI 2.0: an updated benchmark of changes in
protein–protein binding energy, kinetics and thermodynamics upon
mutation. Bioinformatics, 2019, 35(3): 462−469
Raybould M I J, Kovaltsuk A, Marks C, Deane C M. CoV-AbDab: the
coronavirus antibody database. Bioinformatics, 2021, 37(5): 734−735
Wen Z, He J, Tao H, Huang S Y. PepBDB: a comprehensive structural
database of biological peptide–protein interactions. Bioinformatics,
2019, 35(1): 175−177
Lei Y, Li S, Liu Z, Wan F, Tian T, Li S, Zhao D, Zeng J. A deep-learning framework for multi-level peptide–protein interaction
prediction. Nature Communications, 2021, 12(1): 5465
Tsaban T, Varga J K, Avraham O, Ben-Aharon Z, Khramushin A,
Schueler-Furman O. Harnessing protein folding neural networks for
peptide–protein docking. Nature Communications, 2022, 13(1): 176
Jain A, Ong S P, Hautier G, Chen W, Richards W D, Dacek S, Cholia
S, Gunter D, Skinner D, Ceder G, Persson K A. Commentary: the
materials project: a materials genome approach to accelerating
materials innovation. APL Materials, 2013, 1(1): 011002
Castelli I E, Landis D D, Thygesen K S, Dahl S, Chorkendorff I,
Jaramillo T F, Jacobsen K W. New cubic perovskites for one- and two-photon water splitting using the computational materials repository.
Energy and Environmental Science, 2012, 5(10): 9034−9043
Castelli I E, Olsen T, Datta S, Landis D D, Dahl S, Thygesen K S,
Jacobsen K W. Computational screening of perovskite metal oxides for
optimal solar light capture. Energy and Environmental Science, 2012,
5(2): 5814−5819
Pickard C J. AIRSS data for carbon at 10GPa and the C+N+H+O
system at 1GPa. 2020
Choudhary K, Garrity K F, Reid A C E, DeCost B, Biacchi A J, et al.
The joint automated repository for various integrated simulations
(JARVIS) for data-driven materials design. npj Computational
Materials, 2020, 6(1): 173
Choudhary K, DeCost B, Tavazza F. Machine learning with force-field-inspired descriptors for materials: fast screening and mapping
energy landscape. Physical Review Materials, 2018, 2(8): 083801
Watkins A M, Rangan R, Das R. FARFAR2: improved de novo
Rosetta prediction of complex global RNA folds. Structure, 2020,
28(8): 963−976.e6
Liu Y, Cheng J, Zhao H, Xu T, Zhao P, Tsung F, Li J, Rong Y.
SEGNO: generalizing equivariant graph neural networks with physical
inductive biases. In: Proceedings of the 12th International Conference
on Learning Representations. 2024
Downs G M, Gillet V J, Holliday J D, Lynch M F. Review of ring
perception algorithms for chemical graphs. Journal of Chemical
Information and Computer Sciences, 1989, 29(3): 172−187
Lipinski C A, Lombardo F, Dominy B W, Feeney P J. Experimental
and computational approaches to estimate solubility and permeability
in drug discovery and development settings. Advanced Drug Delivery
Reviews, 2012, 64 Suppl 1: 4−17
Gowers R J, Linke M, Barnoud J, Reddy T J E, Melo M N, Seyler S L,
Domanski J J, Dotson D L, Buchoux S, Kenney I M, Beckstein O.
MDAnalysis: a python package for the rapid analysis of molecular
dynamics simulations. In: Proceedings of the 15th Python in Science
Conference. 2016, 105
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for
biomedical image segmentation. In: Proceedings of the 18th
International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015, 234−241
Huang C W, Dinh L, Courville A. Augmented normalizing flows:
bridging the gap between generative flows and latent variable models.
2020, arXiv preprint arXiv: 2002.07101
Liberti L, Lavor C, Maculan N, Mucherino A. Euclidean distance
geometry and applications. SIAM Review, 2014, 56(1): 3−69
Kingma D P, Welling M. Auto-encoding variational Bayes. In:
Proceedings of the 2nd International Conference on Learning
Representations. 2014, 1050
Wang L, Song C, Liu Z, Rong Y, Liu Q, Wu S, Wang L. Diffusion
models for molecules: a survey of methods and tasks. 2025, arXiv
preprint arXiv: 2502.09511
Wang S, Guo Y, Wang Y, Sun H, Huang J. SMILES-BERT: large
scale unsupervised pre-training for molecular property prediction. In:
Proceedings of the 10th ACM International Conference on
Bioinformatics, Computational Biology and Health Informatics. 2019,
429−436
Hu W, Liu B, Gomes J, Zitnik M, Liang P, Pande V S, Leskovec J.
Strategies for pre-training graph neural networks. In: Proceedings of
the 8th International Conference on Learning Representations. 2020
Rong Y, Bian Y, Xu T, Xie W, Wei Y, Huang W, Huang J. Self-supervised graph transformer on large-scale molecular data. In:
Proceedings of the 34th International Conference on Neural
Information Processing Systems. 2020, 1053
Hu W, Fey M, Zitnik M, Dong Y, Ren H, Liu B, Catasta M, Leskovec
J. Open graph benchmark: datasets for machine learning on graphs. In:
Proceedings of the 34th International Conference on Neural
Information Processing Systems. 2020, 1855
Nakata M, Shimazaki T. PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry.
Journal of Chemical Information and Modeling, 2017, 57(6):
1300−1308
Pracht P, Bohle F, Grimme S. Automated exploration of the low-energy chemical space with fast quantum chemical methods. Physical
Chemistry Chemical Physics, 2020, 22(14): 7169−7192
Hung M C, Link W. Protein localization in disease and therapy.
Journal of Cell Science, 2011, 124(20): 3381−3392
Dallago C, Mou J, Johnston K E, Wittmann B J, Bhattacharya N,
Goldman S, Madani A, Yang K K. FLIP: benchmark tasks in fitness
landscape inference for proteins. bioRxiv, 2021
Krivák R, Hoksza D. Improving protein-ligand binding site prediction
accuracy by classification of inner pocket points using local features.
Journal of Cheminformatics, 2015, 7: 12
Le Guilloux V, Schmidtke P, Tuffery P. Fpocket: an open source
platform for ligand pocket detection. BMC Bioinformatics, 2009, 10:
168
Jiménez J, Doerr S, Martínez-Rosell G, Rose A S, De Fabritiis G.
DeepSite: protein-binding site predictor using 3D-convolutional neural
networks. Bioinformatics, 2017, 33(19): 3036−3042
Mylonas S K, Axenopoulos A, Daras P. DeepSurf: a surface-based
deep learning approach for the prediction of ligand binding sites on
proteins. Bioinformatics, 2021, 37(12): 1681−1690
Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, dos Santos Costa A,
Fazel-Zarandi M, Sercu T, Candido S, Rives A. Language models of
protein sequences at the scale of evolution enable accurate structure
prediction. bioRxiv, 2022
Suzek B E, Wang Y, Huang H, McGarvey P B, Wu C H, Consortium
U. UniRef clusters: a comprehensive and scalable alternative for
improving sequence similarity searches. Bioinformatics, 2015, 31(6):
926−932
Rao R, Meier J, Sercu T, Ovchinnikov S, Rives A. Transformer protein
language models are unsupervised structure learners. In: Proceedings
of the 9th International Conference on Learning Representations. 2021
Wu L, Huang Y, Lin H, Li S Z. A survey on protein representation
learning: retrospect and prospect. 2022, arXiv preprint arXiv:
2301.00813
Hussain J, Rea C. Computationally efficient algorithm to identify
matched molecular pairs (MMPs) in large data sets. Journal of
Chemical Information and Modeling, 2010, 50(3): 339−348
Lin H, Huang Y, Zhang O, Wu L, Li S, Chen Z, Li S Z. Functional-group-based diffusion for pocket-specific molecule generation and
elaboration. In: Proceedings of the 37th International Conference on
Neural Information Processing Systems. 2023, 1504
Wang R, Fang X, Lu Y, Wang S. The PDBbind database: collection of
binding affinities for protein-ligand complexes with known three-dimensional structures. Journal of Medicinal Chemistry, 2004, 47(12):
2977−2980
Kastritis P L, Moal I H, Hwang H, Weng Z, Bates P A, Bonvin A M J
J, Janin J. A structure-based benchmark for protein–protein binding
affinity. Protein Science, 2011, 20(3): 482−491
Moal I H, Fernández-Recio J. SKEMPI: a structural kinetic and
energetic database of mutant protein interactions and its use in
empirical models. Bioinformatics, 2012, 28(20): 2600−2607
Fosgerau K, Hoffmann T. Peptide therapeutics: current status and
future directions. Drug Discovery Today, 2015, 20(1): 122−128
Lee A C L, Harris J L, Khanna K K, Hong J H. A comprehensive
review on current advances in peptide drug development and design.
International Journal of Molecular Sciences, 2019, 20(10): 2383
Bhardwaj G, Mulligan V K, Bahl C D, Gilmore J M, Harvey P J, et al.
Accurate de novo design of hyperstable constrained peptides. Nature,
2016, 538(7625): 329−335
Cao L, Coventry B, Goreshnik I, Huang B, Sheffler W, et al. Design of
protein-binding proteins from the target structure alone. Nature, 2022,
605(7910): 551−560
Zbontar J, Jing L, Misra I, LeCun Y, Deny S. Barlow twins: self-supervised learning via redundancy reduction. In: Proceedings of the
38th International Conference on Machine Learning. 2021,
12310–12320
Chen X, He K. Exploring simple Siamese representation learning. In:
Proceedings of 2021 IEEE/CVF Conference on Computer Vision and
Pattern Recognition. 2021, 15745−15753
Geiger M, Smidt T. e3nn: Euclidean neural networks. 2022, arXiv
preprint arXiv: 2207.09453
Das R, Baker D. Automated de novo prediction of native-like RNA
tertiary structures. Proceedings of the National Academy of Sciences
of the United States of America, 2007, 104(37): 14664−14669
Radford A, Narasimhan K, Salimans T, Sutskever I. Improving
language understanding by generative pre-training. 2018
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language
models are unsupervised multitask learners. OpenAI Blog, 2019, 1(8):
9
Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, et al. Language
models are few-shot learners. In: Proceedings of the 34th International
Conference on Neural Information Processing Systems. 2020, 159
Reed S, Zolna K, Parisotto E, Colmenarejo S G, Novikov A, Barth-Maron G, Giménez M, Sulsky Y, Kay J, Springenberg J T, Eccles T,
Bruce J, Razavi A, Edwards A, Heess N, Chen Y, Hadsell R, Vinyals
O, Bordbar M, de Freitas N. A generalist agent. Transactions on
Machine Learning Research, 2022, See openreview.net/forum?id=1ikK0kHjvj website, 2022
Merchant A, Batzner S, Schoenholz S S, Aykol M, Cheon G, Cubuk E
D. Scaling deep learning for materials discovery. Nature, 2023,
624(7990): 80−85
Bran A M, Cox S, Schilter O, Baldassari C, White A D, Schwaller P.
Augmenting large language models with chemistry tools. Nature
Machine Intelligence, 2024, 6(5): 525−535
Liu X, Yu H, Zhang H, Xu Y, Lei X, et al. AgentBench: evaluating
LLMs as agents. In: Proceedings of the 12th International Conference
on Learning Representations. 2024
Janakarajan N, Erdmann T, Swaminathan S, Laino T, Born J.
Language models in molecular discovery. In: Satoh H, Funatsu K,
Yamamoto H, eds. Drug Development Supported by Informatics.
Singapore: Springer, 2024, 121−141
Liu S, Wang J, Yang Y, Wang C, Liu L, Guo H, Xiao C.
Conversational drug editing using retrieval and domain feedback. In:
Proceedings of the 12th International Conference on Learning
Representations. 2024
Zhang W, Wang X, Nie W, Eaton J, Rees B, Gu Q. MoleculeGPT:
instruction following large language models for molecular property
prediction. In: Proceedings of NeurIPS 2023 Workshop on New
Frontiers of AI for Drug Discovery and Development. 2023
Zheng Z, Liu Y, Li J, Yao J, Rong Y. Relaxing continuous constraints
of equivariant graph neural networks for physical dynamics learning.
In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining. 2024
Liu Y, Zheng Z, Rong Y, Li J. Equivariant graph learning for high-density crowd trajectories modeling. Transactions on Machine
Learning Research, See openreview.net/forum?id=TeQRze2ZjO
website, 2024
Jiaqi HAN is a PhD student in Computer Science
at Stanford University, USA. Previously, he
received his BE in Computer Science at Tsinghua
University, China. His research interest involves
developing principled machine learning methods
for modeling geometric systems.
Jiacheng CEN is currently a PhD student in the
Gaoling School of Artificial Intelligence, Renmin
University of China. His research interests mainly
concern geometric learning theory and its
applications in scientific settings.
Liming WU is currently pursuing his PhD in the
Gaoling School of Artificial Intelligence at
Renmin University of China. His research
interests lie in geometric deep learning and AI for
Science.
Zongzhao LI is currently a PhD student in the
Gaoling School of Artificial Intelligence, Renmin
University of China, China. His research interests
lie in geometric deep learning and its applications
in AI for Science problems.
Xiangzhe KONG received his BE in Computer
Science from Tsinghua University, China in 2022.
He is currently a PhD student in the Department of
Computer Science and Technology at Tsinghua
University, China.
Rui JIAO is a PhD student in the Department of
Computer Science and Technology at Tsinghua
University, China. Prior to that, he received the
BE degree in Computer Science at Tsinghua
University, China. His research interest focuses on
geometric machine learning and material design.
Ziyang YU received his BE in Computer Science
from Tsinghua University, China in 2024. He is
currently a DEng student in the Department of
Computer Science and Technology at Tsinghua
University, China.
Tingyang XU is a Senior Research Scientist at
Alibaba DAMO Academy’s Language & Science
Lab. He earned his bachelor’s degree from
Shanghai Jiao Tong University, China and his
master’s and PhD in Computer Science from the
University of Connecticut, USA under Professor
Jinbo Bi. His research focuses on deep learning
and its scientific applications, particularly geometric graph neural
networks for molecular dynamics, drug design, and AI for science.
He has developed SOTA methods for deep geometric graph learning,
advancing tasks like molecular generation and molecular property
prediction.
Fandi WU is a Senior Research Scientist at
Tencent Life Science Lab. He earned his bachelor’s
degree from University of Science and
Technology of China, China and his PhD in
Computer Science from the Institute of
Computing Technology, Chinese Academy of
Sciences. His research focuses on deep learning
and its scientific applications, particularly protein structure prediction
and protein design.
Zihe WANG received his BE in Computer
Science from Tsinghua University, China in 2011
and his PhD in Computer Science from Tsinghua
University, China in 2016. He is currently an
assistant professor at Renmin University of China.
His research focuses on algorithms and
mechanism design.
Hongteng XU is an associate professor in the
Gaoling School of Artificial Intelligence, Renmin
University of China, China. From 2018 to 2020,
he was a senior research scientist at Infinia ML
Inc. During the same period, he was a visiting
faculty member in the Department of Electrical
and Computer Engineering, Duke University,
USA. He received his Ph.D. from the School of Electrical and
Computer Engineering at Georgia Institute of Technology, USA in
2017. His research interests include machine learning and its
applications, especially optimal transport theory, sequential data
modeling and analysis, deep learning techniques, and their
applications in computer vision and data mining.
Zhewei WEI received his PhD in Computer
Science and Engineering from Hong Kong
University of Science and Technology, China. He
did postdoctoral research at Aarhus University,
Denmark from 2012 to 2014, and joined Renmin
University of China, China in 2014.
Deli ZHAO is leading the AI team in Alibaba
DAMO Academy. He has worked on computer
vision and machine learning for nearly two
decades, now mainly focusing on generative
models, multi-modal learning, and foundation
models.
Yang LIU is the GDS Professor in the Department
of Computer Science and Technology at Tsinghua
University, China. He is Executive Dean of
Institute for AI Industry Research (AIR) and
Associate Dean of the Department of Computer
Science and Technology. His research interests
include artificial intelligence, natural language
processing, and Medical AI.
Yu RONG is an IEEE Senior Member and is
recognized as a high-level overseas talent by
Shenzhen. In June 2017, he joined Tencent AI Lab
as a principal researcher and transitioned to
Alibaba DAMO Academy in June 2024, focusing
on large language models and AI for Science. His
research interests lie in graph deep learning and
large language models, particularly applied within the AI for Science
domain.
Wenbing HUANG is now a tenure-track associate
professor at Gaoling School of Artificial
Intelligence (GSAI), Renmin University of China,
China. Before joining GSAI, he worked as an
assistant researcher at AIR, Tsinghua University
and senior researcher at Tencent AI Lab. His
research focuses on geometric deep learning,
GNN and AI for Science.