Front. Comput. Sci., 2025, 19(11): 1911375
https://doi.org/10.1007/s11704-025-41426-w

REVIEW ARTICLE

A survey of geometric graph neural networks: data structures, models and applications

Jiaqi HAN1,2*, Jiacheng CEN1*, Liming WU1*, Zongzhao LI1, Xiangzhe KONG3, Rui JIAO3, Ziyang YU3, Tingyang XU4,5, Fandi WU6, Zihe WANG1, Hongteng XU1, Zhewei WEI1, Deli ZHAO4,5, Yang LIU (✉)3, Yu RONG (✉)4,5, Wenbing HUANG (✉)1

1 Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
2 Department of Computer Science, Stanford University, CA 94305, USA
3 Department of Computer Science and Technology, Institute for AI, Tsinghua University, Beijing 100084, China
4 DAMO Academy, Alibaba Group, Hangzhou 311121, China
5 Hupan Lab, Hangzhou 311121, China
6 Tencent AI Lab, Shenzhen 518100, China

© The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn

Abstract Geometric graphs are a special kind of graph with geometric features, which are vital for modeling many scientific problems. Unlike generic graphs, geometric graphs often exhibit physical symmetries of translations, rotations, and reflections, which makes current Graph Neural Networks (GNNs) ineffective at processing them. To address this issue, researchers have proposed a variety of geometric GNNs equipped with invariant/equivariant properties to better characterize the geometry and topology of geometric graphs. Given the current progress in this field, it is imperative to conduct a comprehensive survey of data structures, models, and applications related to geometric GNNs. In this paper, based on the necessary but concise mathematical preliminaries, we formalize the geometric graph as the data structure, on top of which we provide a unified view of existing models from the geometric message passing perspective.
Additionally, we summarize the applications as well as the related datasets to facilitate later research on methodology development and experimental evaluation. We also discuss the challenges and potential future directions of geometric GNNs at the end of this survey.

Keywords scientific systems, geometric graphs, graph neural networks, equivariance, invariance

Received December 28, 2024; accepted February 24, 2025
E-mail: liuyang2011@tsinghua.edu.cn; yu.rong@hotmail.com; hwenbing@126.com
* These authors contributed equally to this work. Work done by Jiaqi Han during his visit to Renmin University of China.

1 Introduction

Many scientific problems, particularly in physics and biochemistry, require processing data in the form of geometric graphs [1]. Distinct from typical graph data, geometric graphs additionally assign each node a special type of node feature in the form of geometric vectors. For example, a molecule/protein can be regarded as a geometric graph, where the 3D position coordinates of atoms are the geometric vectors; in a general multi-body physical system, the 3D states (positions, velocities or spins) are the geometric vectors of the particles. Notably, geometric graphs exhibit symmetries of translations, rotations and/or reflections. This is because the physical law controlling the dynamics of the atoms (or particles) is the same no matter how we translate or rotate the physical system from one place to another. When tackling this type of data, it is essential to incorporate the inductive bias of symmetry into the design of the model, which motivates the study of geometric Graph Neural Networks (GNNs).

Constructing GNNs that respect such symmetry constraints has long been a challenge in methodological design. Pioneering approaches like DTNN [2], DimeNet [3], and GemNet [4] transform the input geometric graph into distance/angle/dihedral-based scalars that are invariant to rotations or translations, constituting the family of invariant GNNs.
Noticing the limit on the expressivity of invariant GNNs, EGNN [5] and PaiNN [6] additionally involve geometric vectors in message passing and node update to preserve the directional information in each layer, leading to equivariant GNNs. With group representation theory as a helpful tool, TFN [7], SE(3)-Transformer [8], and SEGNN [9] generalize invariant scalars and equivariant vectors by viewing them as steerable vectors parameterized by high-degree spherical tensors, giving rise to high-degree steerable GNNs. Built upon these fundamental approaches, geometric GNNs have achieved remarkable success in various applications of diverse systems, including physical dynamics simulation [10,11], molecular property prediction [5,8], protein structure prediction [12], protein generation [13,14], and RNA structure ranking [15]. Figure 1 illustrates the superior performance of geometric GNNs against traditional methods on the representative tasks.

Fig. 1 Performance comparisons between geometric GNNs and traditional methods on molecular property prediction, protein-ligand docking, and antibody design. Notably, the methods based on geometric GNNs, including EGNN [5], DiffDock [16], and dyMEAN [17], remarkably outperform traditional MPNN [18], Gnina [19], and RosettaAb [20], on the datasets of QM9 [21], PDBBind [22], and SAbDab [23], respectively, verifying the effectiveness and efficiency of geometric GNNs over various tasks. (a) Property prediction; (b) ligand docking; (c) antibody design

To facilitate the research of geometric GNNs, this work presents a systematic survey focusing on both the methods and the applications1), which is structured as follows:
In Section 2, we introduce necessary preliminaries on group theory and the formal definition of equivariance/invariance;
In Section 3, we propose the geometric graph as a universal data structure that will be leveraged throughout the entire survey as a bridge between real-world data and the models, i.e., geometric GNNs;
In Section 4, we summarize existing models into invariant GNNs (Section 4.2) and equivariant GNNs (Section 4.3), while the latter is further categorized into scalarization-based models (Section 4.3.1) and high-degree steerable models (Section 4.3.2); besides, we also introduce geometric graph transformers in Section 4.4;
In Section 5, we provide a comprehensive collection of the applications that have witnessed the success of geometric GNNs on particle-based physical systems, molecules, proteins, complexes, and other domains like crystals and RNAs.

1) This work is an extended survey of our previous short version [24].

The goal of this survey is to provide a general overview of data structures, model design, and applications (see Fig. 2), which together constitute an entire input-output pipeline that is instructive for machine learning practitioners who employ geometric GNNs on various scientific tasks. Recently, several related surveys have been proposed, which mainly focus on the methodology of geometric GNNs [36], pretrained GNNs for chemical data [37], representation learning for molecules [38,39], and the general application of artificial intelligence in diverse types of scientific systems [40]. In contrast to all of them, this survey places an emphasis on geometric graph neural networks, not only encapsulating the theoretical foundations of geometric GNNs but also delivering an exhaustive summary of the related applications in domains across physics, biochemistry, and material science. Meanwhile, we discuss future prospects and interesting research directions in Section 6. We also release the Github repository that collects the references, datasets, codes, benchmarks, and other resources related to geometric GNNs.

2 Basic notion of symmetry

In this section, we compactly introduce the basic notions related to symmetry. Readers can skip this section and go straight to the methodology part in Section 3 if they are familiar with the theoretical background.

2.1 Transformation and group

By symmetry, we mean that an object of interest remains invariant under a set of transformations. For instance, the distance between any two points in space remains constant, regardless of how we simultaneously rotate or translate these two points. Mathematically, a set of transformations forms a group (see [41] for more details).

Definition 1 (Group). A group $G$ is a set of transformations with a binary operation "$\cdot$" satisfying these properties:
(i) it is closed, namely, $\forall a, b \in G,\ a \cdot b \in G$;
(ii) it is associative, namely, $\forall a, b, c \in G,\ (a \cdot b) \cdot c = a \cdot (b \cdot c)$;
(iii) there exists an identity element $e \in G$ such that $\forall a \in G,\ a \cdot e = e \cdot a = a$;
(iv) each element must have an inverse, namely, $\forall a \in G,\ \exists b \in G,\ a \cdot b = b \cdot a = e$, where the inverse $b$ is denoted as $a^{-1}$.

Below we provide some examples commonly used in the applications of this paper:
● E(d) is a Euclidean group [42] consisting of rotations, reflections and translations, acting on d-dimensional vectors.
● T(d) is a subgroup of the Euclidean group that consists of translations.
● O(d) is an orthogonal group that consists of rotations and reflections, acting on d-dimensional vectors.
● SO(d) is a special orthogonal group that only consists of rotations.
● SE(d) is a special Euclidean group that consists of only rotations and translations.
● A Lie group is a group whose elements form a differentiable manifold. In fact, all the groups above are specific examples of Lie groups.
● $S_N$ is a permutation group whose elements are permutations of a given set consisting of $N$ elements.

2.2 Group representation

While the group operation "$\cdot$" is defined abstractly above, it can be realized as matrix multiplication with the help of a group representation. A representation of $G$ is a group homomorphism $\rho : G \to GL(V)$ that takes as input the group element $g \in G$ and maps it into the general linear group of some vector space $V$, satisfying $\rho(g)\rho(h) = \rho(g \cdot h),\ \forall g, h \in G$. When $V = \mathbb{R}^d$, $GL(V)$ contains all invertible $d \times d$ matrices and $\rho(g)$ assigns a matrix to the element $g$.

For the orthogonal group O(d), one of its common group representations is defined by orthogonal matrices $O \in \mathbb{R}^{d \times d}$ subject to $O^\top O = I$; for SO(d), the group representation is restricted to orthogonal matrices of determinant 1, denoted as $R$. The case of the translation group T(d) is a bit tedious and can be derived in the projective space using homogeneous coordinates; here, for simplicity, we directly define translation as vector addition rather than matrix multiplication. Note that the representation of a group is not unique, which will be further illustrated in Section 4.3.2.

Fig. 2 Illustration of the complete input-output pipeline from data structures, models to applications. Note that most figures here for illustrating different applications are edited based on previous papers [16,25–35]. The term "instance" indicates a self-interacted system composed of multiple particles/atoms, such as a molecule or a protein. Pocket-Based Molecule Sampling, Ligand-Binding Affinity Prediction, and Protein-Ligand Docking are denoted with yellow shading to imply that all these tasks take as input the multi-instance format "Molecule+Protein"

2.3 Equivariance and invariance

Let $X$ and $Y$ be the input and output vector spaces, respectively.
The function $\phi : X \to Y$ is called equivariant with respect to $G$ if, when we apply any transformation to the input, the output also changes via the same transformation or in a certain predictable way. Formally, we have

Definition 2 (Equivariance). The function $\phi : X \to Y$ is $G$-equivariant if it commutes with any transformation in $G$,

\[ \phi(g \cdot x) = g \cdot \phi(x), \quad \forall g \in G, \tag{1} \]

which, by implementing the group operation "$\cdot$" with a group representation, can be rewritten as:

\[ \phi(\rho_X(g)\,x) = \rho_Y(g)\,\phi(x), \quad \forall g \in G, \tag{2} \]

where $\rho_X$ and $\rho_Y$ are the group representations in the input and output space, respectively.

The choice of group representation facilitates the specialization of different scenarios. When both $\rho_X$ and $\rho_Y$ are trivial representations, namely, $\rho_X(g) = \rho_Y(g) = I$ 2), $\phi$ becomes a trivial function; notably, when $\rho_Y(g) = I$, $\phi$ is called an invariant function, demonstrating that invariance is just a special case of equivariance.

2) Note that the identity transformation $I$ could have different dimensions in the input space $X$ and output space $Y$.

One can verify that equivariance induces the following desirable properties.
(i) Linearity: any linear combination of equivariant functions is still equivariant.
(ii) Composability: the composition of two equivariant functions (if they can be composed) yields an equivariant function. Therefore, equivariance of each layer of a network implies that the whole network is equivariant.
(iii) Inheritability: if a function is equivariant with respect to group $G_1$ and group $G_2$, then this function must be equivariant with respect to the direct product of these two groups, i.e., $G_1 \times G_2$, under a corresponding definition of product group operation or group representation. This implies that proving equivariance of each transformation individually is sufficient to prove equivariance of joint transformations.
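Definition 2 and the composability property can be checked numerically. The following is a minimal sketch, assuming the orthogonal group O(3) with its matrix representation acting on column vectors; the functions `phi_equivariant` and `phi_invariant` are hypothetical toy examples chosen for illustration and do not come from any surveyed model.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(d, rng):
    """Sample an orthogonal d x d matrix via QR (it may contain a reflection)."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal

def phi_equivariant(x):
    """Rescale a vector by a function of its norm, so phi(O x) = O phi(x)."""
    return x * np.tanh(np.linalg.norm(x))

def phi_invariant(x):
    """The Euclidean norm does not change under orthogonal maps: phi(O x) = phi(x)."""
    return np.linalg.norm(x)

O = random_orthogonal(3, rng)
x = rng.normal(size=3)

# Eq. (2) with rho_X(g) = rho_Y(g) = O
equivariant_ok = np.allclose(phi_equivariant(O @ x), O @ phi_equivariant(x))
# Eq. (2) with rho_Y(g) = I, i.e., invariance
invariant_ok = np.allclose(phi_invariant(O @ x), phi_invariant(x))
# Composability: stacking two equivariant maps is still equivariant
composed = lambda v: phi_equivariant(phi_equivariant(v))
composed_ok = np.allclose(composed(O @ x), O @ composed(x))

print(equivariant_ok, invariant_ok, composed_ok)
```

A numerical check of this kind only tests the property on sampled inputs and group elements; it does not replace a proof, but it is a cheap sanity test when implementing a new layer.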
In the following context, the variable $x$ is instantiated as a geometric graph, the group transformation $\rho(g)$ becomes the transformation of geometric graphs, and the function $\phi$ is designed as an invariant/equivariant GNN.

3 Data structure: from graph to geometric graph

This section formally defines graph and geometric graph, and describes how they differ from each other. Table 1 summarizes the notations used throughout this paper.

3.1 Graph

Conventional studies on graphs [43,44] usually focus on their relational topology. Examples include social networks, citation networks, etc. In the domain of AI-Driven Drug Design (AIDD), they are usually referred to as 2D graphs [45].

Definition 3 (Graph). A graph is defined as $G := (A, H)$, where $A \in [0, 1]^{N \times N}$ is the adjacency matrix with $N$ being the number of nodes, and $H \in \mathbb{R}^{N \times C_h}$ is the node feature matrix with $C_h$ being the dimension of the feature.

Concretely, the adjacency matrix $A$ takes the value 1 at its $(i, j)$-entry $a_{ij}$ when nodes $i$ and $j$ are connected by an edge, and 0 otherwise. The $i$-th row of $H$, i.e., $h_i \in \mathbb{R}^{C_h}$, represents the feature vector of node $i$, e.g., the one-hot embedding of the atomic number in a molecule graph. Along with the definition of graph, we also describe some vital derived concepts. We denote the set of nodes as $V$ and the set of edges as $E$. Correspondingly, the neighborhood of node $i$, denoted $N_i$, is specified to be $N_i := \{ j : (v_i, v_j) \in E \}$. The graph can additionally contain edge features $e_{ij} \in \mathbb{R}^{C_e}$ for edge $(v_i, v_j)$.

Transformations on graphs: $g \cdot G$. One can arbitrarily change the order of nodes without changing the topology of the graph. In the language of group representation, the permutation transformation of a graph is denoted as $g \cdot G := (P_g A P_g^\top, P_g H)$, where $P_g$ is the representation of the transformation $g \in S_N$ (i.e., the permutation matrix3)). We denote the equivalence in terms of permutation as $G \simeq g \cdot G$.
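The permutation transformation $g \cdot G := (P_g A P_g^\top, P_g H)$ can be made concrete on a small example. The sketch below assumes a hypothetical 4-node path graph and an arbitrary permutation; it only illustrates that relabeling nodes reorders rows and columns consistently while leaving the topology untouched.

```python
import numpy as np

# Hypothetical 4-node path graph 0-1-2-3 with 2-dimensional node features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.arange(8, dtype=float).reshape(4, 2)

perm = np.array([2, 0, 3, 1])   # new node i is old node perm[i]
P = np.eye(4)[perm]             # permutation matrix P_g

A_perm = P @ A @ P.T            # permuted adjacency matrix
H_perm = P @ H                  # permuted node features

# The matrix form agrees with plain index relabeling
same_as_indexing = (np.array_equal(A_perm, A[np.ix_(perm, perm)])
                    and np.array_equal(H_perm, H[perm]))
# Topology is unchanged: the multiset of node degrees is preserved
same_degrees = np.array_equal(np.sort(A_perm.sum(axis=1)), np.sort(A.sum(axis=1)))
# Neighborhood sums commute with the permutation: (P A P^T)(P H) = P (A H)
aggregation_commutes = np.allclose(A_perm @ H_perm, P @ (A @ H))

print(same_as_indexing, same_degrees, aggregation_commutes)
```

The last check is the basic reason why sum-based neighborhood aggregation is permutation equivariant.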
As a concrete example, molecules can be viewed as graphs, where the nodes $v_i$ are instantiated as the atoms, and the node features $H$ are the one-hot encodings of the atomic numbers, one row for each atom. The edges $A$ either indicate the existence of chemical bonds or are constructed based on the relative distance between atoms under a cut-off threshold, and the respective edge features $e_{ij}$ can be assigned as the type of the chemical bond and/or the relative distance.

Table 1 Basic notations and definitions throughout this survey

Notation | Description
Data structure
$G := (A, H)$ | A graph $G$ containing $N$ nodes, with adjacency matrix $A \in \mathbb{R}^{N \times N}$ and node feature matrix $H \in \mathbb{R}^{N \times C_h}$.
$\vec{G} := (A, H, \vec{X})$ | A geometric graph $\vec{G}$ containing $N$ nodes, with adjacency matrix $A$ and node feature matrix $H$ as above, and additionally a 3D coordinate matrix $\vec{X} \in \mathbb{R}^{N \times 3}$.
$N_i$ | The neighborhood of node $i$.
$h_i \in \mathbb{R}^{C_h}$ | The scalar feature of node $i$.
$\vec{x}_i \in \mathbb{R}^{3}$ | The 3D coordinate of node $i$.
$\vec{V}_i \in \mathbb{R}^{3 \times C}$ | The multi-channel 3D vector of node $i$.
$\vec{V}_i^{(l)} \in \mathbb{R}^{(2l+1) \times C_l}$ | The type-$l$ irreducible vector of node $i$.
$\vec{V}_i^{(L)} := \{\vec{V}_i^{(l)}\}_{l \in L}$ | The set consisting of irreducible vectors of all types $l \in L$.
$e_{ij} \in \mathbb{R}^{C_e}$ | The edge feature from node $j$ to $i$.
Operator
$G$, $g$ | The group $G$ and its group element $g$.
$\rho_X(g)$ | The group representation $\rho_X(g)$ of the transformation $g$ in the vector space $X$.
$\times$, $\otimes$ | The operators between two vectors, including cross product $\times$ and Kronecker product $\otimes$.
$\otimes_{cg}$, $\otimes_{cg}^{W}$, $\otimes_{cg}^{\mathcal{W}}$ | Clebsch-Gordan (CG) tensor product, optionally with a learnable parameter $W$ and a learnable parameter set $\mathcal{W}$.
$Y^{(l)}(\vec{x}) \in \mathbb{R}^{2l+1}$ | The type-$l$ vector constructed by spherical harmonics of $\vec{x} \in S^2$: $Y^{(l)}(\vec{x}) = [Y^{(l)}_{-l}, Y^{(l)}_{-l+1}, \cdots, Y^{(l)}_{l-1}, Y^{(l)}_{l}]$.
$Y^{(L)}(\vec{x}) := \{Y^{(l)}(\vec{x})\}_{l \in L}$ | A set consisting of spherical harmonics of all types $l \in L$.
$D^{(l)}(g)$ | The $l$-th degree Wigner-D matrix of the rotation transformation $g \in SO(3)$.
Neural Network
$\phi, \psi, \varphi, \sigma$ | Functions implemented with MLP.

3) The permutation of $A$ can also be written in the form of group representation by first vectorizing $A$ as $\mathrm{Vec}(A)$ and then conducting $(P_g \otimes P_g)\mathrm{Vec}(A)$. Here $\otimes$ defines the Kronecker product, and $P_g \otimes P_g$ is the 2-order representation of the permutation matrix.

3.2 Geometric graph

In many applications, the graphs we tackle contain not only the topological connections and node features, but also certain geometric information. Again, in the example of a molecule, we may additionally be informed of some geometric quantities in the Euclidean space, e.g., the positions of the atoms in 3D coordinates4). Such quantities are of particular interest in that they encapsulate rich directional information that depicts the geometry of the system. With the geometric information, one can go beyond the limited perception of the graph topology to a broader picture of the entire configuration of the system in 3D space, where important information, such as the relative orientation of the neighboring nodes and directional quantities like velocities, can be better exploited. Hence, in this section, we begin with the definition of geometric graphs, which are usually referred to as 3D graphs [1].

Definition 4 (Geometric Graph). A geometric graph is defined as $\vec{G} := (A, H, \vec{X})$, where $A \in [0, 1]^{N \times N}$ is the adjacency matrix, $H \in \mathbb{R}^{N \times C_h}$ is the node feature matrix with dimension $C_h$, and $\vec{X} \in \mathbb{R}^{N \times 3}$ are the 3D coordinates of all nodes.

The $i$-th rows of $H$ and $\vec{X}$, namely, $h_i \in \mathbb{R}^{C_h}$ and $\vec{x}_i \in \mathbb{R}^{3}$, denote the feature and 3D coordinate of node $v_i$, respectively. In the above definition, we distinguish the coordinate matrix $\vec{X}$ from the other quantities $A$ and $H$, and the geometric graph $\vec{G}$ from the graph $G$, with an over-right arrow "→", indicating that they contain geometric and directional information. Note that
Note that ⃗ in a there could be other geometric variables besides X geometric graph, such as velocity, force, and so on. Then the ⃗ is extended from N × 3 to N × 3 × C where C shape of X denotes the number of channels. In this section, we assume C = 1 for conciseness, while more complete examples are shown in Section 5. ⃗ . In contrast Transformations on geometric graphs: g · G to graphs, transformations on geometric graphs are not limited to node permutation. We summarize the transformations of interest below: ⊤ ⃗ ● Permutation, which is defined as g · G:=(P g APg , ⃗ , where Pg is the permutation matrix Pg H, Pg X) representation of g ∈ Sn ; ● Orthogonal transformation, which is defined as ⃗ A, H, XO ⃗ g ) , where Og is the orthogonal matrix g · G:=( representation of g ∈ O(3) , consisting of rotations and reflections; ⃗ A, H, X ⃗ + ⃗t g ) , ● Translation, which is defined as g · G:=( where ⃗t g is the translation vector of g ∈ T(3) . ⃗ ≃ g·G ⃗ . We can combine We always have the equivalence G orthogonal transformation and translation into Euclidean ⃗ transformation on geometric graphs, namely, g · G:= ⃗ ( A, H, XOg + ⃗t g ) for g ∈ E(3) . Here, the Euclidean group E(3) is a semidirect product [46] of orthogonal transformation and translation, denoted as E(3) = T(3) ⋉ O(3) . We can similarly define SE(3) transformation by considering only rotation and translation. We sometimes call H invariant features (or scalars), since they are independent to E(3) transformation, ⃗ equivariant features (or vectors) that correlate to and call X E(3) transformations. Figure 3 demonstrates the example of transformation on geometric graph. Fig. 3 Examples of transformations on geometric graphs. (a) Permutation; (b) translation; (c) rotation; (d) reflection Geometric graphs are powerful and general tools to model a variety of objects in scientific tasks, including small molecules [5,47], proteins [14,48], crystals [49,50], physical point clouds [25,51], and many others. 
We will provide more details in Section 5.

4 Model: geometric GNNs

In this section, we first recap the general form of the Message Passing Neural Network (MPNN) on topological graphs. Then we introduce different types of geometric GNNs that extend the message passing paradigm of MPNNs to geometric graphs: invariant GNNs, equivariant GNNs, as well as geometric graph transformers. Finally, we briefly present the works that discuss the expressivity of geometric GNNs. Figure 4 presents the taxonomy of geometric GNNs in this section.

Fig. 4 Taxonomy of geometric GNNs introduced in Section 4

4) Although we mainly focus on 3D space, most of our analyses can be extended to d-dimensional space where d is an arbitrary integer.

4.1 Message passing neural networks

Graph Neural Networks (GNNs) operate on graphs with the help of the message-passing mechanism, which propagates information along the graph structure by updating node embeddings through neighborhood aggregation. To be specific, message-passing GNNs implement $\phi(G)$ on topological graphs $G$ by iterating the following message-passing process in each layer [18],

\[ m_{ij} = \phi_{\mathrm{msg}}\left(h_i, h_j, e_{ij}\right), \tag{3} \]
\[ h'_i = \phi_{\mathrm{upd}}\left(h_i, \{m_{ij}\}_{j \in N_i}\right), \tag{4} \]

where $\phi_{\mathrm{msg}}(\cdot)$ and $\phi_{\mathrm{upd}}(\cdot)$ are the message computation and feature update functions, respectively. The node features $h_i$, $h_j$ and the edge feature $e_{ij}$ are first synthesized by the message function to obtain the message $m_{ij}$. The messages within the neighborhood are then aggregated with a set function and leveraged to update the node features $h'_i$ in combination with the input $h_i$. GNNs defined by Eqs. (3) and (4) are always permutation equivariant but not inherently E(3)-equivariant. When mentioning equivariance or invariance in what follows, this paper mainly refers to the latter unless otherwise specified.
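The message-passing process of Eqs. (3) and (4) can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: $\phi_{\mathrm{msg}}$ and $\phi_{\mathrm{upd}}$ are hypothetical single-layer perceptrons with random weights (`W_msg`, `W_upd`), edge features $e_{ij}$ are omitted, and the set function is a plain sum. It then checks the permutation equivariance claimed above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 5, 4
W_msg = rng.normal(size=(2 * C, C))   # hypothetical weights of phi_msg
W_upd = rng.normal(size=(2 * C, C))   # hypothetical weights of phi_upd

def mpnn_layer(A, H):
    """One message-passing layer with sum aggregation (edge features omitted)."""
    H_new = np.zeros_like(H)
    for i in range(H.shape[0]):
        agg = np.zeros(H.shape[1])
        for j in np.nonzero(A[i])[0]:                                  # j in N_i
            agg += np.tanh(np.concatenate([H[i], H[j]]) @ W_msg)       # Eq. (3)
        H_new[i] = np.tanh(np.concatenate([H[i], agg]) @ W_upd)        # Eq. (4)
    return H_new

# Random undirected graph without self-loops, plus random node features
upper = np.triu(rng.random((N, N)) < 0.5, k=1).astype(float)
A = upper + upper.T
H = rng.normal(size=(N, C))

# Permutation equivariance: phi(P A P^T, P H) = P phi(A, H)
P = np.eye(N)[rng.permutation(N)]
permutation_equivariant = np.allclose(mpnn_layer(P @ A @ P.T, P @ H),
                                      P @ mpnn_layer(A, H))
print(permutation_equivariant)
```

The explicit double loop is kept for readability; a practical implementation would typically vectorize the aggregation over an edge list.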
4.2 Invariant graph neural networks

Moving forward to the geometric domain, there are various tasks that require the model to be invariant with regard to Euclidean transformations. For instance, for the task of molecular property prediction, the predicted energy should remain unchanged regardless of any rotation/translation of all atom coordinates. Embedding such an inductive bias is crucial as it conforms to the physical rules of our 3D world. Formally, invariant GNNs update invariant features as $H' = \phi(\vec{G})$ with the function $\phi$ satisfying:

\[ \phi(g \cdot \vec{G}) = \phi(\vec{G}), \quad \forall g \in E(3). \tag{5} \]

To design such a function, invariant GNNs usually transform the equivariant coordinates $\vec{X}$ into invariant scalars that are unaffected by Euclidean transformations. Early invariant GNNs can be dated back to DTNN [2], MPNN [18], and MVGNN [91], where relative distances are applied for edge construction. Recent works further elaborate the use of various invariant scalars, ranging from relative distances to angles or dihedral angles between edges, upon the message passing mechanism in Eqs. (3) and (4). We introduce several representative works below.

SchNet [47]. This work designs a continuous-filter convolution conditioned on relative distances $r_{ij} = \|\vec{x}_i - \vec{x}_j\|$. In particular, it re-implements Eq. (3) as

\[ m_{ij} = \sigma_2(r_{ij})\,\sigma_1(h_j), \tag{6} \]

where the message is calculated as the multiplication between the continuous convolution filter and the neighbor embedding, and the functions $\sigma$ are all Multi-Layer Perceptrons (MLPs).

5) Here CBF is short for Circular Bessel Function.
6) Here SBF is short for Spherical Bessel Function.

DimeNet [3]. By observing that using relative distances alone is unable to encode directional information, DimeNet proposes directional message passing, which takes as input not only relative distances but also angles between adjacent edges.
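The invariant scalars mentioned above (relative distances, angles between adjacent edges, and dihedral angles) can be verified numerically to be unaffected by rotations and translations. The following is a rough sketch, assuming the row-vector convention $\vec{X} O_g + \vec{t}_g$ of Section 3.2; the helper names and the sign convention of the dihedral angle are choices made for this illustration and are not taken from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance(xi, xj):
    return np.linalg.norm(xi - xj)

def angle(xk, xj, xi):
    """Angle at node j between the directions j->k and j->i (one common convention)."""
    a, b = xk - xj, xi - xj
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle of the chain p0-p1-p2-p3 (the sign is a convention)."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)   # normals of the two planes
    m1 = np.cross(n1, b1 / np.linalg.norm(b1))
    return np.arctan2(m1 @ n2, n1 @ n2)

def scalars(X):
    """Distance, angle and dihedral computed from four points stored as rows of X."""
    return np.array([distance(X[0], X[1]), angle(X[0], X[1], X[2]), dihedral(*X)])

X = rng.normal(size=(4, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))        # proper rotation, det(R) = +1
t = rng.normal(size=3)
mirror = np.diag([-1.0, 1.0, 1.0])       # a reflection, det = -1

s = scalars(X)
s_rot = scalars(X @ R + t)               # SE(3) transformation
s_ref = scalars(X @ mirror)              # reflection

se3_invariant = np.allclose(s, s_rot)
reflection_keeps_dist_and_angle = np.allclose(s[:2], s_ref[:2])
reflection_flips_dihedral = np.isclose(s_ref[2], -s[2])
print(se3_invariant, reflection_keeps_dist_and_angle, reflection_flips_dihedral)
```

Distances and angles are invariant under all of E(3), whereas a signed dihedral angle changes sign under reflection, so scalars built from it are only SE(3)-invariant. Whether a given model uses the signed or unsigned variant is model-specific and not asserted here.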
The main component to compute the message embedding of each directional edge (from $j$ to $i$) is given by:

\[ m'_{ji} = \sigma_{\mathrm{msg}}\Big(m_{ji}, \sum_{k \in N_j \setminus \{i\}} \sigma_{\mathrm{int}}\big(m_{kj}, e_{\mathrm{RBF}}^{(ji)}, e_{\mathrm{CBF}}^{(kji)}\big)\Big), \tag{7} \]

where $e_{\mathrm{RBF}}^{(ji)}$ denotes the radial basis function representation of the relative distance $d_{ji}$; $e_{\mathrm{CBF}}^{(kji)}$ 5) computes the joint representation of the relative distance $d_{kj}$ and the angle $\alpha_{(kj, ji)}$ between edges $(v_k, v_j)$ and $(v_j, v_i)$, with the help of spherical Bessel functions and spherical harmonics. In [3], Eq. (7) is applied as an interaction block before an embedding block that derives the message $m_{ji}$ based on $e_{\mathrm{RBF}}^{(ji)}$ and hidden features $h_i$ and $h_j$. The updated messages $m'_{ji}$ of all neighbor nodes are then utilized to update the hidden feature $h_i$. A faster version of DimeNet was proposed later, dubbed DimeNet++ [52,53].

GemNet [4]. To achieve universal expressivity, GemNet further takes dihedral angles into account, formulating two-hop directional message passing based on quadruplets of nodes. Basically, it replaces the message embeddings from Eq. (7) in DimeNet [3] with the following form:

\[ m'_{ji} = \sigma_{\mathrm{msg}}\Big(m_{ji}, \sum_{\substack{k \in N_i \setminus \{j\} \\ l \in N_k \setminus \{i, j\}}} m_{jikl}\Big), \tag{8} \]
\[ m_{jikl} = \sigma_{\mathrm{int}}\big(m_{lk}, e_{\mathrm{RBF}}^{(lk)}, e_{\mathrm{CBF}}^{(ikl)}, e_{\mathrm{SBF}}^{(jikl)}\big), \tag{9} \]

where $e_{\mathrm{RBF}}^{(lk)}$ and $e_{\mathrm{CBF}}^{(ikl)}$ are defined as above; $e_{\mathrm{SBF}}^{(jikl)}$ 6) is calculated by the spherical Bessel function of the relative distance $d_{ji}$ and spherical harmonics of the angle $\alpha_{ji,ik}$ and the dihedral angle $\alpha_{ji,kl}$. The input of Eq. (8) additionally integrates hidden features $h_i$ and $h_j$ for more expressivity in its original formulation. Note that GemNet can be modified to enable equivariant output by multiplying the output with the associated direction, which belongs to the scalarization-based equivariant GNNs introduced in the next subsection.

LieConv [54]. LieConv is formulated as follows.
\[ m_{ij} = \sigma\big(\log(u_i^{-1} u_j)\big)\, h_j, \tag{10} \]
\[ h'_i = \frac{1}{|N(i)| + 1}\Big(h_i + \sum_{j \in N(i)} m_{ij}\Big), \tag{11} \]

where $u_i \in G$ is a lift of $\vec{x}_i$, the logarithm $\log$ maps each group member onto the Lie algebra $\mathfrak{g}$, which is a vector space, and $\sigma$ is a parametric MLP. Besides, Eq. (11) conducts normalization by dividing by the number of all nodes involved, i.e., $|N(i)| + 1$. It is clear that LieConv only specifies the update of node features $h_i$ while keeping the geometric vectors $\vec{x}_i$ unchanged. That means LieConv is invariant.

In addition to the above models, SphereNet [55] is another prevailing invariant GNN. Similar to GemNet, SphereNet also exploits relative distances, angles, and torsion angles for geometric modeling, which is able to distinguish almost all 3D graph structures. Moreover, its proposed spherical message passing (SMP) enables both fast and accurate 3D molecular learning on large-scale molecules. ComENet [56] is another type of invariant model which incorporates 3D information completely and efficiently. It ensures global completeness of the model with message passing only in the 1-hop neighborhood, avoiding time-consuming calculations like torsion in SphereNet or dihedral angles in GemNet. k-DisGNN [57] relies solely on invariant relative distance information, yet adopts high-order message-passing frameworks from traditional graph learning (e.g., k-WL or k-FWL), achieving completeness for $k = 2$. GeoNGNN [58], the geometric extension of the simplest subgraph GNN (NGNN [92]), effectively utilizes local subgraph information and also attains completeness with only distance features. There are also some other studies [59,93–95] exploiting the quaternion algebra to represent the 3D rotation group, which mathematically ensures SO(3) invariance during inference. Specifically, QMP [59] constructs a quaternion message-passing module to distinguish the molecular conformations caused by bond torsions.

4.3 Equivariant graph neural networks

In contrast to invariant GNNs that only update invariant features, equivariant GNNs simultaneously update both invariant features and equivariant features, given that many practical tasks (such as molecular dynamics simulation) require equivariant output. More importantly, as proved in [96], equivariant GNNs are strictly more expressive than invariant GNNs, particularly for sparse geometric graphs. Formally, equivariant GNNs design the function over geometric graphs as $(H', \vec{X}') = \phi(\vec{G})$ satisfying:

\[ \phi(g \cdot \vec{G}) = g \cdot \phi(\vec{G}), \quad \forall g \in E(3). \tag{12} \]

Specifically, through the lens of message passing in Eqs. (3) and (4), the geometric message is derived as

\[ m_{ij}, \vec{m}_{ij} = \phi_{\mathrm{msg}}\left(h_i, h_j, \vec{x}_i, \vec{x}_j, e_{ij}\right). \tag{13} \]

Subsequently, the computed geometric messages $\vec{m}_{ij}$ are aggregated within the neighborhood $N_i$ specified by the connectivity or adjacency matrix of the graph, and updated by taking the input features into account. This update process is formally summarized as

\[ h'_i, \vec{x}'_i = \phi_{\mathrm{upd}}\left(h_i, \{(m_{ij}, \vec{m}_{ij})\}_{j \in N_i}\right). \tag{14} \]

The functions $\phi_{\mathrm{msg}}$ and $\phi_{\mathrm{upd}}$ should ensure that every invariant/equivariant output is invariant/equivariant with respect to any E(3) transformation of the input. There are different ways to realize the specific form of $\phi_{\mathrm{msg}}$ and $\phi_{\mathrm{upd}}$. Below, we categorize well-known equivariant GNNs into two classes: scalarization-based models and high-degree steerable models.

4.3.1 Scalarization-based models

This line of work first translates 3D coordinates into invariant scalars, which is similar to the design of invariant GNNs, but goes beyond invariant GNNs by further recovering the direction of the processed scalars for the update of equivariant features.

EGNN [5]. EGNN is one of the most famous scalarization-based models, and it can be considered as an equivariant enhancement of two prior works, SchNet [47] and Radial Field [63].
For its message function $\phi_{\mathrm{msg}}(\cdot)$, it first applies the relative distance for the update of the invariant message, which is then multiplied back with the relative coordinate to derive the directional message. The form of $\phi_{\mathrm{msg}}(\cdot)$ is as follows:

\[ m_{ij} = \sigma_1\left(h_i, h_j, \|\vec{x}_i - \vec{x}_j\|^2, e_{ij}\right), \tag{15} \]
\[ \vec{m}_{ij} = (\vec{x}_i - \vec{x}_j)\,\sigma_2\left(m_{ij}\right), \tag{16} \]

while the update function $\phi_{\mathrm{upd}}(\cdot)$ takes the following form,

\[ h'_i = \sigma_3\Big(h_i, \sum_{j \in N_i} m_{ij}\Big), \tag{17} \]
\[ \vec{x}'_i = \vec{x}_i + \gamma \sum_{j \in N_i} \vec{m}_{ij}, \tag{18} \]

where $\sigma_1, \sigma_2, \sigma_3$ are all instantiated as Multi-Layer Perceptrons (MLPs), and $\gamma$ is a predefined constant.

GMN [51]. In practice, each node is usually associated with multiple geometric features besides 3D position, such as velocity and force. Therefore, GMN proposes a multi-channel version of EGNN by defining a multi-channel vector $\vec{V}_i \in \mathbb{R}^{3 \times C}$ for node $i$, where each channel (column) indicates a different kind of geometric vector. In the message computation, the multi-channel vectors interact through the inner product and are properly normalized for better training stability just before they are fed into the MLP, i.e.,

\[ m_{ij} = \sigma_1\Big(h_i, h_j, \frac{\vec{V}_{ij}^\top \vec{V}_{ij}}{\|\vec{V}_{ij}^\top \vec{V}_{ij}\|_F}, e_{ij}\Big), \tag{19} \]
\[ \vec{M}_{ij} = \vec{V}_{ij}\,\sigma_2\left(m_{ij}\right), \tag{20} \]

where $\vec{V}_{ij}$ is a translation-invariant directional matrix related to $\vec{V}_i$ and $\vec{V}_j$; for instance, if we have $\vec{V}_i = [\vec{x}_i, \dot{\vec{x}}_i]$ where $\dot{\vec{x}}_i \in \mathbb{R}^3$ defines the velocity, then we can either choose the direct subtraction $\vec{V}_{ij} = \vec{V}_i - \vec{V}_j$, or the concatenated form $\vec{V}_{ij} = [\vec{V}_i, \vec{V}_j]$ where the first channel of $\vec{V}_i$ and $\vec{V}_j$ is made translation invariant by subtracting the mean coordinate [65]. The update process is analogous to Eqs. (17) and (18), but extended to the multi-channel fashion as well.

PaiNN [6].
By initializing the multi-channel equivariant features to zeros, namely, letting $\vec{V}_i = \vec{0} \in \mathbb{R}^{3 \times C}$, PaiNN iteratively updates $\vec{V}_i$ as well as the invariant feature $h_i$ via the fixed relative position of the input coordinates $\vec{x}_{ij} = \vec{x}_i - \vec{x}_j$ in each layer, with the help of residual connections and gated nonlinearities. We rewrite and somewhat generalize the original form proposed by [6] using our consistent notation. The messages are given by:

$$m_{ij} = \sigma_1\left(h_j, \|\vec{x}_{ij}\|^2, e_{ij}\right), \tag{21}$$

$$\vec{M}_{ij} = \vec{V}_j\,\sigma_2\left(m_{ij}\right) + \vec{x}_{ij}\,\sigma_3\left(m_{ij}\right), \tag{22}$$

and the update functions are calculated as:

$$m_i = h_i + \sum_{j \in \mathcal{N}(i)} m_{ij}, \tag{23}$$

$$\vec{M}_i = \vec{V}_i + \sum_{j \in \mathcal{N}(i)} \vec{M}_{ij}, \tag{24}$$

$$h'_i = m_i + \sigma_4\left(m_i, \|\vec{M}_i\|\right), \tag{25}$$

$$\vec{V}'_i = \vec{M}_i + \vec{M}_i\,\sigma_5\left(m_i, \|\vec{M}_i\|\right), \tag{26}$$

where the functions $\sigma_1$ to $\sigma_5$ are non-linear invariant scaling functions. In Eqs. (25) and (26), "$\|\cdot\|$" outputs a multi-channel scalar, each channel of which is the vector norm of the corresponding channel of the input matrix.

Local frames [60–62]. These methods construct local frames (i.e., reference frames) $\vec{F} \in \mathbb{R}^{3 \times 3}$ that are equivariant to rotations and can be utilized to project the geometric information into invariant representations. In particular, LoCS [61] and Aether [61] leverage the angular position $\vec{w}_i \in \mathbb{R}^3$ of each node $i$ to construct node-wise local frames $\vec{F}_i = R(\vec{w}_i)$, where $R(\vec{w}_i) \in \mathbb{R}^{3 \times 3}$ is the rotation matrix corresponding to the angular position $\vec{w}_i$. ClofNet [60] instead builds edge-wise local frames $\vec{F}_{ij} = [\vec{a}_{ij}, \vec{b}_{ij}, \vec{c}_{ij}]$, with

$$\vec{a}_{ij} = \frac{\vec{x}_{ic} - \vec{x}_{jc}}{\|\vec{x}_{ic} - \vec{x}_{jc}\|}, \quad \vec{b}_{ij} = \frac{\vec{x}_{ic} \times \vec{x}_{jc}}{\|\vec{x}_{ic} \times \vec{x}_{jc}\|}, \quad \vec{c}_{ij} = \vec{a}_{ij} \times \vec{b}_{ij}. \tag{27}$$

Here $\vec{x}_{ic} = \vec{x}_i - \vec{x}_c$ is made translation-invariant by subtracting the center of mass $\vec{x}_c = \frac{1}{N}\sum_{i=1}^{N}\vec{x}_i$, so that the frame $\vec{F}_{ij}$ is also translation-invariant.
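The edge-wise frame construction of Eq. (27) can be sketched in a few lines. This is an illustrative sketch rather than the ClofNet implementation: the function names `edge_frame` and `random_rotation` are hypothetical, and the check covers proper rotations and translations only, because the cross product changes sign under reflections.

```python
import numpy as np

def edge_frame(x, i, j):
    """Edge-wise local frame following Eq. (27); returns a 3x3 matrix whose columns are a_ij, b_ij, c_ij."""
    xc = x - x.mean(axis=0)                 # subtract the center of mass
    a = xc[i] - xc[j]
    a = a / np.linalg.norm(a)
    b = np.cross(xc[i], xc[j])
    b = b / np.linalg.norm(b)
    c = np.cross(a, b)
    return np.stack([a, b, c], axis=1)

def random_rotation(rng):
    """Random proper rotation (determinant +1) obtained from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q * np.sign(np.linalg.det(q))    # flip the overall sign if det(q) = -1

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
R, t = random_rotation(rng), rng.normal(size=3)
x_rt = x @ R.T + t                          # rotated and translated copy of the system

F = edge_frame(x, 0, 1)
F_rt = edge_frame(x_rt, 0, 1)

# Projecting a translation-invariant vector onto the frame gives invariant scalars.
v, v_rt = x[0] - x[1], x_rt[0] - x_rt[1]
scalars, scalars_rt = F.T @ v, F_rt.T @ v_rt

frame_is_orthonormal = np.allclose(F.T @ F, np.eye(3))
frame_is_equivariant = np.allclose(F_rt, R @ F)
scalars_are_invariant = np.allclose(scalars, scalars_rt)
print(frame_is_orthonormal, frame_is_equivariant, scalars_are_invariant)
```

Multiplying a vector of invariant scalars by the frame, as in `F @ scalars`, maps them back to an equivariant vector, which is the scalarize-then-recover pattern shared by this family of models.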
With local frames, the invariant message $m_{ij}$ is generated as

$$m_{ij} = \sigma_1\left(h_i, h_j, \vec{V}_{ij}^{\top}\vec{F}_{ij}\right), \tag{28}$$

where $\vec{V}_{ij}$ is the translation-invariant geometric information between nodes $i$ and $j$, similar to the considerations in GMN (Eq. (20)). ClofNet additionally projects the invariant message into an equivariant counterpart:

$$\vec{M}_{ij} = \vec{F}_{ij}\,\sigma_2\left(m_{ij}\right). \tag{29}$$

Other works also exploit the scalarization technique to achieve equivariance. GVP-GNN [64] first performs a channel-wise linear projection of the input vector to align the channel dimension, and then computes the norm of the projected vector as a scalar that is multiplied with the vector to produce the output vector. During this process, GVP-GNN does not pass information from the input scalars, which differs from EGNN, where the input scalars also influence the update of the vector. EGHN [65], built upon GMN, leverages a hierarchical encoder-decoder mechanism to represent multi-body interactions with specially designed equivariant pooling and unpooling modules. FastEGNN [66] addresses large-scale geometric graphs by employing a small ordered set of virtual nodes, which minimizes the number of required edges and enhances computational efficiency. LEFTNet [69] proposes a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and to investigate how global geometric information is represented from local patches. This work leads to two crucial modules for designing expressive and efficient geometric GNNs: local substructure encoding and frame transition encoding. SaVeNet [70] enhances the numerical stability of the model by introducing gradually decaying directional noise during the training phase. ViSNet [71] employs vector-scalar interactive message passing to implicitly extract various geometric features. QuinNet [72] integrates many-body interactions, extending this modeling to include interactions of up to five bodies.
Furthermore, HEGNN [73] leverages the inner product of high-degree steerable features to enhance scalar messaging, thereby achieving a balance between efficiency and effectiveness. Additionally, as scalars can be combined with various other invariant information, ETNN [74] further amplifies the expressiveness of the model by introducing deep topological learning constructs. EquiLLM [75] enhances the representation of invariant scalars through knowledge injection from large language models, and can be flexibly generalized to various geometry tasks.

For all the above methods, the scalarization process is implemented via the inner-product operator. In contrast, Frame-Averaging [67] proposes to ensure equivariance via the averaging process $\frac{1}{|G|}\sum_{g \in G} g \cdot \sigma(g^{-1} \cdot \vec{x})$, where $\sigma$ is an arbitrary MLP and the term $g^{-1} \cdot \vec{x}$ makes the input invariant. To deal with the case when the cardinality of $G$ is large, [67] instead averages over a carefully selected subset that is obtained by the so-called frame function. The idea of Frame-Averaging is later exploited in the field of material design [68].

4.3.2 High-degree steerable models

For the aforementioned scalarization-based models, the node variables to be updated include invariant scalars $h_i$ and equivariant vectors $\vec{x}_i$ (or $\vec{V}_i$ for the multi-channel case), and the 3D rotation representation throughout the network is the rotation matrix $R_g$. Scalars and vectors are respectively type-0 and type-1 steerable features, and the rotation matrix is the 1st-degree matrix of a more general rotation representation. We will show that it is possible to derive high-degree representations of steerable features beyond scalars and vectors in equivariant GNNs. Prior to the introduction of high-degree models, we first introduce the following concepts:

1. Wigner-D matrices [97] to convert 3D rotations to group representations of different degrees;
2. spherical harmonics [98] to convert 3D vectors to steerable features of different types;
3. Clebsch-Gordan (CG) tensor product [99] to perform equivariant mapping between steerable features.

Wigner-D matrices. In the general high-degree case, a widely studied genre of representation for the rotation group SO(3) is the irreducible representation [97]:

$$\rho(g) := D^{(l)}(g) \in \mathbb{R}^{(2l+1) \times (2l+1)}, \quad g \in \mathrm{SO}(3), \tag{30}$$

where $D^{(l)}(g)$ is the Wigner-D matrix (see footnote 7) of degree $l$, and $l \in \mathbb{N} = \{0, 1, 2, \dots\}$. In particular, $D^{(0)}(g) = 1$ reduces to the trivial representation and $D^{(1)}(g) = R_g$ takes the form of the rotation matrix. The steerability of a type-$l$ feature $\vec{x}^{(l)} \in \mathbb{R}^{2l+1}$ is defined by the transformation $D^{(l)}(g)\vec{x}^{(l)}$, which naturally unifies the aforementioned invariant features and equivariant features by restricting $l = 0$ and $l = 1$, respectively. Since there can be steerable features of multiple types and multiple channels, we provide a general form of steerable features:

$$\vec{V}^{(L)} := \left\{\vec{V}^{(l)} \in \mathbb{R}^{(2l+1) \times C_l}\right\}_{l \in L}, \tag{31}$$

where $L$ is the set consisting of all possible types and $C_l$ is the number of channels for type $l$. Since we are addressing geometric graphs in this paper, we denote the steerable features of node $i$ as $\vec{V}_i^{(L)}$ and its type-$l$ component as $\vec{V}_i^{(l)}$.

Spherical harmonics. We have defined how to steer type-$l$ features via Wigner-D matrices, but we do not know yet how to obtain type-$l$ features given 3D coordinates. Spherical harmonics serve this purpose. Spherical harmonics are a set of Fourier bases on the unit sphere $S^2$. They map 3D vectors on the unit sphere $S^2$ into a $(2l+1)$-dimensional vector space (see footnote 8). That is,

$$Y^{(l)}(\vec{x}) : S^2 \mapsto \mathbb{R}^{2l+1}, \tag{32}$$

where $\vec{x}$ is a unit vector on the sphere, and the elements of $Y^{(l)}$ are usually used together and denoted as $[Y^{(l)}_{-l}, Y^{(l)}_{-l+1}, \cdots, Y^{(l)}_{l-1}, Y^{(l)}_{l}]$, where each element corresponds to a different order. It can also be generalized to take an arbitrary 3D vector as input by normalizing the vector as $\frac{\vec{x}}{\|\vec{x}\|}$ prior to feeding it into the spherical harmonics. This offers a unified view of the transition to vector spaces of arbitrary type, where scalars correspond to $Y^{(0)}(\vec{x}) = 1$ when $l = 0$, and vectors correspond to $Y^{(1)}(\vec{x}) = \vec{x} \in \mathbb{R}^3$ when $l = 1$. More importantly, spherical harmonics are equivariant in terms of Wigner-D matrices:

$$Y^{(l)}(R_g \vec{x}) = D^{(l)}(g)\,Y^{(l)}(\vec{x}), \quad g \in \mathrm{SO}(3), \tag{33}$$

where $R_g \in \mathbb{R}^{3 \times 3}$ is the rotation matrix and $D^{(l)}(g) \in \mathbb{R}^{(2l+1) \times (2l+1)}$ refers to the Wigner-D matrix of degree $l$. To create multi-type multi-channel steerable features, we apply $Y^{(l)}$ over multiple copies for each type in $L$, yielding $Y^{(L)}$.

Clebsch-Gordan (CG) tensor product. Although spherical harmonics offer a way to design equivariant mappings from 3D coordinates (type-1 features) to type-$l$ features, they are unable to depict the interactions between steerable features of arbitrary types, which are central to the design of equivariant functions whose input contains steerable features of various types. Fortunately, the CG tensor product provides a tractable solution to this issue [99]. It derives $\vec{V}^{(l)} \in \mathbb{R}^{(2l+1) \times C}$ from two multi-channel steerable features $\vec{V}^{(l_1)} \in \mathbb{R}^{(2l_1+1) \times C_1}$ and $\vec{V}^{(l_2)} \in \mathbb{R}^{(2l_2+1) \times C_2}$ by:

$$\vec{V}^{(l)} = \left[\vec{V}^{(l_1)} \otimes^{W}_{\mathrm{cg}} \vec{V}^{(l_2)}\right]^{(l)}, \tag{34}$$

which can be expanded in detail as:

$$v^{(l)}_{m,c} = \sum_{c_1=1}^{C_1}\sum_{c_2=1}^{C_2} w_{c_1 c_2 c} \sum_{m_1=-l_1}^{l_1}\sum_{m_2=-l_2}^{l_2} Q^{(l,m)}_{(l_1,m_1)(l_2,m_2)}\, v^{(l_1)}_{m_1,c_1}\, v^{(l_2)}_{m_2,c_2}, \tag{35}$$

where $v^{(l)}_{m,c}$ indicates the $m$-th order and $c$-th channel of $\vec{V}^{(l)}$; $Q^{(l,m)}_{(l_1,m_1)(l_2,m_2)}$ are the Clebsch-Gordan (CG) coefficients [99], which are zero unless $|l_1 - l_2| \leqslant l \leqslant l_1 + l_2$; $w_{c_1 c_2 c}$ is the learnable parameter in the parameter tensor $W \in \mathbb{R}^{C_1 \times C_2 \times C}$, and when the entries of $W$ are all ones, Eq. (35) reduces to the traditional nonparametric CG tensor product.
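For the special case $l_1 = l_2 = 1$ with a single channel, the selection rule $|l_1 - l_2| \leqslant l \leqslant l_1 + l_2$ allows $l \in \{0, 1, 2\}$, and the three outputs have familiar Cartesian counterparts: the dot product, the cross product, and the symmetric traceless part of the outer product. The sketch below works in this Cartesian form, so it matches Eq. (35) only up to normalization constants and a change of basis, neither of which it attempts to reproduce; the function name `couple_type1` is hypothetical.

```python
import numpy as np

def couple_type1(u, v):
    """Couple two type-1 features (3D vectors) into type-0, type-1 and type-2 parts (Cartesian form)."""
    s = u @ v                                              # type-0: 1 component
    w = np.cross(u, v)                                     # type-1: 3 components
    outer = np.outer(u, v)
    t = 0.5 * (outer + outer.T) - (s / 3.0) * np.eye(3)    # type-2: symmetric traceless, 5 independent components
    return s, w, t

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)

# Random proper rotation (determinant +1).
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.linalg.det(q))

s, w, t = couple_type1(u, v)
s_r, w_r, t_r = couple_type1(R @ u, R @ v)

type0_invariant = np.isclose(s_r, s)                 # scalars do not change
type1_equivariant = np.allclose(w_r, R @ w)          # vectors rotate with R
type2_equivariant = np.allclose(t_r, R @ t @ R.T)    # rank-2 part rotates on both indices
print(type0_invariant, type1_equivariant, type2_equivariant)
```

The component counts 1 + 3 + 5 = 9 account for all entries of the 3 × 3 outer product, which is the decomposition that the CG coefficients encode for this pair of types.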
One promising property of the CG tensor product is that it is SO(3)-equivariant with respect to Wigner-D matrices, implying that $\forall g \in \mathrm{SO}(3)$,

$$D^{(l)}(g)\vec{V}^{(l)} = \left[\left(D^{(l_1)}(g)\vec{V}^{(l_1)}\right) \otimes^{W}_{\mathrm{cg}} \left(D^{(l_2)}(g)\vec{V}^{(l_2)}\right)\right]^{(l)}. \tag{36}$$

For simplicity, the steerable variables in Eq. (34) are all of a single type. It is tractable to generalize Eq. (34) to the multi-type case by applying it to each combination of input and output types, and assigning different learnable parameters accordingly, which leads to the following general form:

$$\vec{V}^{(L)} = \vec{V}^{(L_1)} \otimes^{W}_{\mathrm{cg}} \vec{V}^{(L_2)}. \tag{37}$$

With the above building blocks, we introduce below several prevailing high-degree steerable models, where the updated steerable variables for each node are $\vec{V}_i^{(L)}$.

TFN [7]. With our formulation of the high-degree steerable operations, Tensor Field Network (TFN) computes the following equivariant point convolution:

$$\vec{M}_{ij}^{(L)} = Y^{(L)}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes^{W}_{\mathrm{cg}} \vec{V}_j^{(L)}, \tag{38}$$

where $\vec{x}_{ij} = \vec{x}_i - \vec{x}_j$ is the radial vector, and the elements of $W$ are generated by a radial MLP $f(\|\vec{x}_{ij}\|)$ applied to the distance $\|\vec{x}_{ij}\|$. Here $\vec{x}_i$ are fixed as the initial coordinates of the input data. The update of each node is implemented as a series of operations including aggregation:

$$\vec{U}_i^{(L)} = \vec{V}_i^{(L)} + \sum_{j \in \mathcal{N}(i)} \vec{M}_{ij}^{(L)}, \tag{39}$$

self-interaction:

$$\vec{V}_i^{(L)} = \left\{\vec{U}_i^{(l)} W^{(l)}\right\}_{l \in L}, \tag{40}$$

where $W^{(l)} \in \mathbb{R}^{C_l \times C_l}$ is the learnable channel-mixing matrix for each type $l$, and node-wise non-linearity:

$$\vec{V}'^{(L)} = \left\{\vec{V}^{(l)}\,\sigma\left(\|\vec{V}^{(l)}\|_2 + b^{(l)}\right)\right\}_{l \in L}, \tag{41}$$

where $\sigma(\cdot)$ is an activation function, "$\|\cdot\|_2$" is the L2 vector norm over the order dimension (of size $2l+1$) of $\vec{V}^{(l)}$, and $b^{(l)} \in \mathbb{R}^{C_l}$ is the bias for type $l$.

7) Wigner-D matrices lie in the complex space, but they can be transformed to the real space under appropriate bases.
8) Similar to Wigner-D matrices, the outputs of spherical harmonics are complex but can be transformed into real space under certain bases.

SEGNN [9].
SEGNN enhances TFN from equivariant point convolution to general equivariant message passing. Firstly, SEGNN involves high-degree geometric features from both nodes $i$ and $j$ in the message computation by deriving $\vec{V}_{ij}^{(L)} = \vec{V}_i^{(L)} \oplus \vec{V}_j^{(L)} \oplus \{\|\vec{x}_{ij}\|^2\}$, where, again, $\vec{x}_{ij} = \vec{x}_i - \vec{x}_j$ is the radial vector, and "$\oplus$" denotes concatenation along the channel dimension for steerable features of the same type $l \in L$. For example,

$$\vec{V}_1^{(L)} \oplus \vec{V}_2^{(L)} := \left\{\vec{V}_1^{(l)} \,\|\, \vec{V}_2^{(l)}\right\}_{l \in L}. \tag{42}$$

Here, "$\|$" stands for concatenation along the channel dimension. Subsequently, the high-degree linear message passing specified in Eq. (38) is extended to a non-linear form via gated non-linearities [100]:

$$\vec{V}_{ij}^{(L)}, g_{ij} = Y^{(L)}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes^{W_1}_{\mathrm{cg}} \vec{V}_{ij}^{(L)}, \tag{43}$$

$$\vec{M}_{ij}^{(L)} = \mathrm{Gate}\left(\vec{V}_{ij}^{(L)}, \mathrm{Swish}(g_{ij})\right), \tag{44}$$

where $\mathrm{Gate}(\cdot)$ is the gated non-linearity introduced in [100], $\mathrm{Swish}(\cdot)$ is the Swish activation [101], and $g_{ij}$ is a scalar read out from the CG tensor product that is further used to control the scale in the non-linearity of Eq. (44). Notably, the CG product and non-linearity in Eqs. (43) and (44) are performed twice in the implementation of [9]. In analogy to multi-layer perceptrons (MLPs), this block is dubbed the steerable MLP. The update function also employs the proposed steerable MLP. In detail,

$$\vec{V}_i^{(L)}, g_i = \left(\sum_{j \in \mathcal{N}_i} Y^{(L)}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right)\right) \otimes^{W_2}_{\mathrm{cg}} \left(\vec{V}_i^{(L)} + \sum_{j \in \mathcal{N}_i} \vec{M}_{ij}^{(L)}\right), \tag{45}$$

$$\vec{V}_i'^{(L)} = \vec{V}_i^{(L)} + \mathrm{Gate}\left(\vec{V}_i^{(L)}, \mathrm{Swish}(g_i)\right). \tag{46}$$

Besides the models introduced above, there are many other methods that build equivariant models with high-degree steerable features. Cormorant [76] utilizes a channel-wise CG product (a reduced and more efficient form of Eq. (34) that acts on each input channel independently) and channel concatenations to formulate one-body and two-body interactions among the input graph systems.
NequIP [10] improves the convolutional layer in TFN [7] by further introducing the radial Bessel functions and the polynomial envelope function used in DimeNet [3] to obtain a better embedding of the interaction distance, thereby improving the performance of the model. SCN [78] regards each node embedding as a set of spherical functions (i.e., the spherical harmonics), then conducts message passing by rotating the embeddings based on the 3D edge orientation, and finally updates the node embeddings via a discrete spherical Fourier transform. Its follow-up work, eSCN [79], proposes to reduce the computational complexity of the equivariant convolution on SO(3) by replacing it with a mathematically equivalent one on SO(2). To enable higher-body interactions beyond the two-body modeling in most previous papers, MACE [80] and Allegro [77] propose a simplified algorithm to construct the tensor product term, motivated by a technique in physics called Atomic Cluster Expansion (ACE) [102–104].

An illustrative comparison of invariant GNNs, scalarization-based equivariant GNNs, and high-degree steerable equivariant GNNs is summarized in Table 2.

Table 2 Illustrations of representative models for invariant GNNs, scalarization-based GNNs and high-degree steerable GNNs. Notably, these three types of models are able to process geometric features of different degrees

| | Invariant GNNs (e.g., SchNet [47]) | Scalarization-Based Models (e.g., EGNN [5]) | High-Degree Steerable Models (e.g., TFN [7]) |
|---|---|---|---|
| Message computation | $m_{ij} = \sigma_2(r_{ij})\,\sigma_1(h_j)$ | $m_{ij} = \sigma_1\left(h_i, h_j, \lVert\vec{x}_i - \vec{x}_j\rVert^2, e_{ij}\right)$; $\vec{m}_{ij} = (\vec{x}_i - \vec{x}_j)\,\sigma_2\left(m_{ij}\right)$ | $\vec{M}_{ij}^{(L)} = Y^{(L)}\left(\frac{\vec{x}_{ij}}{\lVert\vec{x}_{ij}\rVert}\right) \otimes^{W}_{\mathrm{cg}} \vec{V}_j^{(L)}$ |
| Feature update | $h'_i = \sigma_3\left(h_i, \sum_{j \in \mathcal{N}_i} m_{ij}\right)$ | $h'_i = \sigma_3\left(h_i, \sum_{j \in \mathcal{N}_i} m_{ij}\right)$; $\vec{x}'_i = \vec{x}_i + \gamma \sum_{j \in \mathcal{N}_i} \vec{m}_{ij}$ | $\vec{V}_i'^{(L)} = \vec{V}_i^{(L)} + \sigma\left(\vec{V}_i^{(L)}, \sum_{j \in \mathcal{N}(i)} \vec{M}_{ij}^{(L)}\right)$ |

4.4 Geometric graph transformers

Inspired by the significant success of Transformers [105,106] in many areas, such as natural language processing and computer vision, there have been efforts to apply these self-attention-based architectures to data structures like graphs or even geometric graphs, which are the scope of this survey. As summarized in Fig. 4, these methods stem from different types of geometric representations, including invariant representation, scalarization-based equivariant representation, and high-degree steerable representation, which have been elaborated in Section 4. Below we discuss these Transformers in detail.

Graphormer [81,82]. Graphormer was first proposed as a powerful Transformer architecture operating on graphs, equipped with centrality encoding, spatial encoding, and edge encoding [81]. Following its success on challenging 2D graph datasets, e.g., the OGB-LSC Challenge [107], it was subsequently extended to work on geometric graphs with special designs for computing the encodings. To be specific, the spatial encoding, which aims to measure the spatial relation between nodes $i$ and $j$ in $\vec{\mathcal{G}}$, is chosen to be the Euclidean distance $\|\vec{x}_i - \vec{x}_j\|_2$ transformed by Gaussian basis functions [108]. The centrality encoding is derived as a summation of the spatial encodings over the connected edges of each node. The encodings are then utilized in computing the self-attention, and layer normalization is also adopted for the intermediate features. Notably, all representations are E(3)-invariant under the construction of Graphormer. To make it suitable for E(3)-equivariant prediction tasks, [81] proposes to use a projection head as the final block, which aggregates the edge vectors, scaled by their corresponding attention weights, to obtain a node-wise vector as output:

$$\vec{f}_i = \sum_{j \neq i} a_{ij}\,(\vec{x}_i - \vec{x}_j), \tag{47}$$

where $a_{ij}$ is the E(3)-invariant attention weight between nodes $i$ and $j$.

TorchMD-Net [83].
TorchMD-Net is an equivariant Transformer that tackles general multi-channel geometric vectors in a scalarization-based manner, akin to PaiNN [6]. Yet, in the process of attention computation, only the invariant representations $h_i$ and distances $\|\vec{x}_{ij}\|$ are involved. Specifically, the distance is first embedded by two MLPs $\sigma_d^K$ and $\sigma_d^V$ for the key and value, respectively:

$$d_{ij}^{K} = \sigma_d^{K}\left(e_{\mathrm{RBF}}^{(ij)}\right), \qquad d_{ij}^{V} = \sigma_d^{V}\left(e_{\mathrm{RBF}}^{(ij)}\right), \tag{48}$$

where $e_{\mathrm{RBF}}^{(ij)}$ is the radial basis function representation of the distance $\|\vec{x}_{ij}\|$, similar to Eq. (7). The query, key, and value are given by linear transformations of the input scalar features:

$$q_i = h_i W_Q, \qquad k_{ij} = h_i W_K \odot d_{ij}^{K}, \qquad v_{ij} = h_j W_V \odot d_{ij}^{V}, \tag{49}$$

where "$\odot$" is the element-wise product. Instead of the traditionally adopted Softmax operator [105], TorchMD-Net uses the simpler SiLU non-linearity:

$$a_{ij} = \mathrm{SiLU}\Big(\sum_{\mathrm{Ch}} q_i \odot k_{ij}\Big) \cdot \mathrm{Cutoff}\left(\|\vec{x}_{ij}\|\right), \tag{50}$$

with $\mathrm{Cutoff}(\cdot)$ being a cosine cutoff on the distance and the summation being over the channels of these invariant features. Finally, the output of the attention is given by

$$h'_i = \Big(\sum_{j \in \mathcal{N}_i} a_{ij} v_{ij}\Big) W_O, \tag{51}$$

with $W_O$ being a linear transformation for the output.

SE(3)-Transformer [8]. Different from Graphormer and TorchMD-Net, which limit the representation to scalars and vectors with degree $l \in \{0, 1\}$, SE(3)-Transformer employs an attention mechanism on general steerable features of high degree. Following our notation introduced in Section 4.3.2, we describe the attention computation as follows. The point-wise query $\vec{Q}_i^{(L)}$ and the pairwise key $\vec{K}_{ij}^{(L)}$ and value $\vec{V}_{ij}^{(L)}$ are derived as:

$$\begin{aligned}
\vec{Q}_i^{(L)} &= 1 \otimes^{W_Q}_{\mathrm{cg}} \vec{V}_i^{(L)}, \\
\vec{K}_{ij}^{(L)} &= Y^{(L)}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes^{W_K}_{\mathrm{cg}} \vec{V}_j^{(L)}, \\
\vec{V}_{ij}^{(L)} &= Y^{(L)}\left(\frac{\vec{x}_{ij}}{\|\vec{x}_{ij}\|}\right) \otimes^{W_V}_{\mathrm{cg}} \vec{V}_j^{(L)}.
\end{aligned} \tag{52}$$

The attention coefficient $\alpha_{ij}$ is computed as a Softmax aggregation over the neighbors, with the message being the inner product of the queries and keys, which ensures rotation invariance:

$$\alpha_{ij} = \frac{\exp\left(\vec{Q}_i^{(L)} \cdot \vec{K}_{ij}^{(L)}\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\vec{Q}_i^{(L)} \cdot \vec{K}_{ik}^{(L)}\right)}. \tag{53}$$

The attention is then utilized to aggregate the values and update the node feature:

$$\vec{V}_i'^{(L)} = 1 \otimes^{W}_{\mathrm{cg}} \vec{V}_i^{(L)} + \sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\vec{V}_{ij}^{(L)}. \tag{54}$$

With the invariant attention, the updated feature is guaranteed to satisfy SE(3)-equivariance.

Besides, LieTransformer [84] extends the idea of LieConv [54] by building attention on top of lifting and sampling on Lie groups. GVP-Transformer, introduced in [85], leverages GVP-GNN [64] as the structural encoder and applies a generic Transformer over the extracted representation, exhibiting strong performance in learning inverse folding of proteins. Equiformer [11] proposes to replace dot-product attention in Transformers with MLP attention and non-linear message passing, building upon the space of high-degree steerable tensors. EquiformerV2 [86] further incorporates eSCN [79] in the architecture for efficient modeling and introduces more technical enhancements, such as specially designed attention renormalization and layer normalization, for better empirical performance. Geoformer [87] develops an invariant module called Interatomic Positional Encoding (IPE) based on the invariant basis from ACE, in order to enhance the expressiveness of many-body contributions in the attention blocks. Recently, SO3KRATES [88] proposed a technique that leverages the advantages of high-degree representations while reducing the complexity inherent in tensor products. This approach designs a model that utilizes only the paths that yield scalars in tensor products. Later, GotenNet [89] broadened the scope of the inner product form, creating a multi-channel version and referring to models that employ this methodology as spherical-scalarization models. GotenNet integrates the inner product with the original attention mechanism, resulting in an efficient equivariant transformer architecture. Previous transformers typically focus on a specific domain, either proteins or small molecules.
EPT [90] proposes a novel pretraining framework designed to harmonize the geometric learning of small molecules and proteins. It unifies the geometric modeling of multi-domain molecules via a block-enhanced representation built upon a PaiNN-based transformer framework.

4.5 Theoretical analysis on expressivity

In machine learning, an important criterion for measuring the expressiveness of a network is whether it has the universal approximation property. In the task of learning on geometric graphs, the question is whether any function of geometric graphs can be approximated by geometric GNNs with arbitrary accuracy. An initial attempt to explore this problem is made by [109], which proves the universality of the high-degree steerable model, i.e., TFN [7], over point clouds (namely fully-connected geometric graphs) by showing that TFN can fit any equivariant polynomial. GemNet [4] further demonstrates that the universality holds with just spherical representations rather than the full SO(3) representations that are required in the proof of [109]. Later, the GWL framework [96] defines a geometric version of the Weisfeiler-Lehman (WL) test [110] to study the expressive power of geometric GNNs operating on sparse graphs from the perspective of discriminating geometric graphs, and discusses the difference in expressivity between various invariant and equivariant GNNs, both theoretically and experimentally. One crucial conclusion drawn by the GWL paper is that GWL is strictly more powerful than invariant GWL, showing the advantage of equivariant GNNs over invariant GNNs. For fully-connected geometric graphs, invariant GWL has the same expressive power as GWL. More recently, HEGNN [73] has provided both theoretical and experimental insights into the necessity of employing high-degree steerable features on symmetric graphs.
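A small numerical illustration may help convey why symmetric inputs are delicate. The sketch below is not the construction used in [73]; it assumes a toy setting in which steerable features are obtained by sum pooling over the unit directions from the centroid to each vertex of a regular tetrahedron, and the variable names are hypothetical. The pooled type-1 and type-2 features vanish identically, while one component of the type-3 feature (the harmonic polynomial $xyz$, up to normalization) does not.

```python
import numpy as np

# Vertices of a regular tetrahedron centered at the origin.
verts = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
u = verts / np.linalg.norm(verts, axis=1, keepdims=True)   # unit directions from the centroid

# Type-1 pooled feature: sum of unit vectors.
type1 = u.sum(axis=0)

# Type-2 pooled feature in Cartesian form: sum of symmetric traceless outer products.
type2 = sum(np.outer(p, p) - np.eye(3) / 3.0 for p in u)

# One type-3 component: the harmonic cubic polynomial x*y*z, summed over vertices.
type3_xyz = (u[:, 0] * u[:, 1] * u[:, 2]).sum()

print(np.round(type1, 12), np.round(type2, 12), type3_xyz)
```

Under these assumptions, a model restricted to degrees 1 and 2 would receive no directional signal from this input, whereas degree 3 carries information about the orientation of the tetrahedron.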
Specifically, under the strict equivariance constraint, the degradation of representations of certain degrees on symmetric graphs cannot be avoided unless it is circumvented by relaxing some conditions (e.g., probabilistic symmetry breaking in SymPE [111]). Furthermore, HEGNN establishes a connection between high-degree steerable features and Legendre polynomials, indicating that the inner product of sufficiently high-degree representations can recover all angular information present in geometric graphs.

There are other works that only investigate the universality of the message computation function [46,51]. They explore the expressivity of the scalarization-based models (e.g., EGNN), and [46] confirms that the scalarization-based methods can universally approximate any invariant or equivariant function of vectors. Besides, SGNN [25] generalizes from equivariance to subequivariance, which describes the case when part of the symmetry is broken by an external force field, e.g., gravity, and designs a universal form of subequivariant functions.

5 Applications

In this section, we systematically review the applications related to geometric graph learning. We classify existing methods according to the system types they work on, which leads to the categorization of tasks on particle, (small) molecule, protein, molecule + molecule (Mol + Mol), molecule + protein (Mol + Protein), protein + protein, and other domains, as summarized in Table 3. We also provide a summary of all related datasets of single- and multiple-instance tasks in Tables 4 and 5, respectively. It is worth mentioning that our discussion primarily focuses on the methods utilizing geometric GNNs, although other methods, such as sequence-based approaches, may be applicable in certain applications.

5.1 Tasks on particles

The particle representation serves as an abstract and unified concept in the context of dynamic modeling in physics.
Rigid bodies, elastic bodies and even fluids can be modeled as a set of particles [25]. Under such particle-based modeling, a physical object of interest corresponds to a geometric graph $\vec{\mathcal{G}}$ as specified in Definition 4, where different particles are modeled as different nodes, and physical interactions between particles, such as attraction/repulsion forces, collision, rolling, and sliding, are represented as edge connections.

5.1.1 Physical dynamics simulation

Geometric GNNs have been widely applied to characterize the process of general physical dynamics. One typical example is $N$-body simulation, which was originally proposed by [27] and aims to model the dynamics of a prototype system composed of $N$ interacting particles. Although it is built under an ideal condition, an $N$-body system is capable of representing various physical phenomena across a spectrum ranging from quantum physics to astronomy, by accommodating diverse interactions. Other examples include the simulation of physical scenes that involve more complex objects, including fluids, rigid bodies, deformable bodies, and human motions.

Task definition: Given the initial state of the system represented by a geometric graph $\vec{\mathcal{G}}^{(0)}$, the future states of all $N$ particles after a period of $k$ steps are predicted by a parametric function:

$$\vec{X}^{(t+k)} = \phi_\theta\left(\vec{\mathcal{G}}^{(t)}\right). \tag{55}$$

In contrast to the above single-state prediction setting, one may also conduct a "roll-out" simulation by recurrently taking the predicted output of the current state as the input for the prediction of the next state. Furthermore, it can also be extended to the spatio-temporal setting by taking the historical geometric graphs within a window of size $w$ (namely $\vec{\mathcal{G}}^{(t-w+1:t)}$) as input, rather than a single input frame (namely $\vec{\mathcal{G}}^{(t)}$) as in Eq. (55).

Symmetry preserved: This is an E(3)-equivariant task, as the transformation of the initial state results in the same transformation of the predicted state. It means
It means ⃗ = ϕθ g · G ⃗ , ∀g ∈ E(3) . g · ϕθ G Datasets: The datasets used in current methods belong to the following classes: 1) N -body dataset series. The original N body dataset [27] presents an environment capable of simulating three types of system, including 1D phase-coupled oscillators, 2D springs, and 2D charged balls. The authors in [8] further generalize N -body to encompass 3D cases. Recently, the work [51] designs Constrained N -body by adding geometric constraints between particles, leading to a combination of diverse systems with isolated particles, sticks and hinges. Later, the systems derived by [65] further introduce the interactions between complex objects that are composed of multiple particles interconnected by rigid sticks. 2) Scene simulation datasets. The paper [118] proposes four simulation environments: FluidFall, FluidShake, BoxBath, and RiceGrip, where the former two focus on fluid modeling, the third one tests fluid-rigid interactions, and the final one involves modeling deformable objects with elastic/plastic Jiaqi HAN et al. A survey of geometric graph neural networks: data structures, models and applications 13 Table 3 Summary of various geometric GNNs for different tasks. The generative tasks indicates the ones addressable by generative models, otherwise referred to as the non-generative tasks. 
The ones can be solved with either generative or non-generative models are dubbed as the mixed tasks Data type Particle Small molecule Task name Task type N -body simulation Non-generative Scene simulation Non-generative Molecular property prediction Non-generative Molecular dynamics Mixed Molecular generation Generative Pretraining Mixed Protein property prediction Non-generative Protein inverse folding Generative Protein Protein folding Generative Protein co-design Generative Pretraining Mixed Linker design Generative Chemical reaction Generative Ligand binding affinity Non-generative prediction Mol + Protein Protein-ligand docking Mixed Pocket-based mol Mixed sampling Protein interface Non-generative prediction Binding affinity Non-generative prediction Protein-protein Protein + Mixed docking Protein Mol + Mol Others Antibody design Mixed Peptide design Mixed Crystal property prediction Non-generative Crystal generation Generative RNA structure ranking Non-generative Methods Physics NRI [27], IN [112], E-NFs [5], EGNN [5], SEGNNs [9], GMN [51], EGHN [65], HOGN [113], NCGNN [114], FastEGNN [66], HEGNN [73] SGNN [25], GNS [26], GNS* [115], C-GNS [116], HGNS[117], DPI-Net [118], HRN [119], FIGNet [120], EGHN [65], LoCS [61], EqMotion [121], ESTAG [31], SEGNO [122], FastEGNN [66], HEGNN [73], EquiLLM [75] Biochemistry Cormorant [76], TFN [7], SE(3)-Transformer [8], NequIP [10], SEGNNs [9], LieConv [54], Lietransformer [84], SchNet [47], DimeNet [3], GemNet [4], PaiNN [6], TorchMD-Net [83], Equiformer [11], SphereNet [123], EGNN [5], Graphormer [81,82], SCN [78], eSCN [79], GNN-LF [124], LEFTNet [69], SaVeNet [70], ViSNet [71], QuinNet [72], SO3KRATES [88], Gaunt [125], GotenNet [89] E-CNF [126], EGNN [5], NequIP [127], GMN [51], EGHN [65], NCGNN [114], ESTAG [31], EGNO [127], SEGNO [122], ITO [128], E-ACF [129], GeoTDM [130], HEGNN [73], StABlE [131], [132] GeoDiff [133], GeoLDM [134], ConfVAE [135], ConfGF [136], G-SchNet [137], cG-SchNet [138], MDM [139], MolDiff 
[140], DGSM [141], E-NFs [142], EDM [143], GeoMol [144], Torsional Diffusion [30], MPerformer [145], EEGSDE [146], DMCG [147], HierDiff [148], EquiFM [149], CoarsenConf [150], GeoBFN [151], MolCRAFT [152] 3D-EMGP [153], GeoSSL-DDM [154], GraphMVP [155], GNS-TAT [156], MGMAE [157], 3DInfomax [158], Uni-Mol [159], Transformer-M[160], MoleculeSDE [161], SliDe [162], Frad [163], DenoiseVAE [164], MolSpectra [165] LM-GVP [166], DeepFRI [167], GearNet [168], 3DCNN [169], TM-align [170], GVP-GNN [64], PAUL [171], EDN [172], EnQA [173], ScanNet [174], EquiPocket [175], PocketMiner [176] GVP-GNN [64], [177], ESM-IF1 [85], GCA [178], ProteinMPNN [179], PiFold [180], LM-Design [181], KW-Design [182] AlphaFold [33], AlphaFold2 [183], RosettaFold [12], RosettaFold2 [48], RFAA [184], EigenFold [185], RFdiffusion [13], Chroma [14], ESMFold [186], HelixFold-Single [187] Chroma [14], RFdiffusion [13], PROTSEED [188], ReQFlow [189] ProtTrans [190], xTrimoPGLM [191], ProtGPT2 [192], HJRSS [193], GearNet [168], ProFSA [194], PromptProtein [195], DrugCLIP [196], ESM-1b [197], ESM2 [186], Guo et al. 
[198], PAAG [199] DiffLinker [200], DeLinker [201], 3DLinker [28] OA-ReactDiff [202], TSNet [203] TargetDiff [29], MaSIF [204], GET [205], ProtNet [206], HGIN [207], BindNet [208], BADGER [209], DeepTernary [210] EquiBind [211], DiffDock [16], TankBind [212], DESERT [213], FABind [214], Re-Dock [215] Pocket2Mol [216], TargetDiff [29], DiffBP [217], SBDD [218], GraphBP [219], FLAG [220], DESERT [213], D3FG [221], MolCRAFT [152], MolJO [222], DiffBP [217], VoxBind [223] DeepInteract [224], dMaSIF [225], SASNet [226] mmCSM-PPI [227], GeoPPI [228], GET [205] EquiDock [229], HMR [230], HSRN [231], DiffDock-PP [232], SyNDock [233], AlphaFoldMultimer [234], dMaSIF [235], ElliDock [236], EBMDock [237] DiffAb [238], MEAN [32], dyMEAN [17], RefineGNN [239], PROTSEED [240], AbBERT[240], ADesigner [241], AbODE[242], AbDiffuser [243], tFold[244], GeoAB [245], RAAD [246], EquiLLM [75] HelixGAN [247], RFDiffusion [13], PepGLAD [35], PPFlow [248] Other domains CGCNN [249], MEGNet [250], ALIGNN [251], ECN [252], Matformer [253], Crystal Twins [254], MMPT [255], CrysDiff [256] CDVAE [257], SyMat [49], DiffCSP [50], DiffCSP++ [258], MatterGen [259], PXRDGen [260], EquiCSP [261], FlowMM [262], CrysBFN [263] ARES [35], PaxNet [264], EquiRNA [265] properties. Similar to BoxBath, Water-3D created by [26] randomly initializes the water states and constructs a highresolution water scenario. Beyond the simulation of particlelevel interaction in previous datasets, Kubric [266] and MIT Pushing [268] can be utilized to evaluate face interactions. Physion [267] is a large-scale dataset that involves more realistic and diverse objects driven by more complex physical interactions, including gravity, friction, elasticity, and other factors. Methods: Plenty of studies have been devoted to learning to simulate complex physical systems using GNNs, including Interaction Network [112], NRI [27], HRN [119], DPI-Net [118], HOGN [113], GNS [26], C-GNS [116], HGNS [117], GNS* [115], and FIGNet [120]. 
However, all these methods adopt typical GNNs that are unaware of the full symmetry of the 3D world, and only a subset of them considers translation equivariance. Since the work of SE(3)-Transformer [8], roto-translation equivariance has been introduced into attention-based geometric GNNs to address the N-body problem. Later, EGNN [5] proposes a more effective E(n)-equivariant GNN by using the scalarization-based strategy already detailed in Section 4.3.1. In contrast to EGNN, SEGNN [9] proposes a general SE(3)-equivariant message passing by making use of high-order degree representations. Recently, GMN [51] has developed multi-channel equivariant modeling specifically for constrained N-body systems consisting of sticks or hinges. Upon GMN, EGHN [65] designs equivariant pooling and equivariant unpooling to handle complex systems with a hierarchical structure. In the meantime, SGNN [25] generalizes and relaxes the symmetry from equivariance to sub-equivariance, which plausibly grants it the capability to excel in scenarios influenced by other factors like gravity. As conventional approaches utilize a fixed velocity estimation throughout the time interval, NCGNN [114] instead estimates velocities at multiple time points using Newton-Cotes numerical integration.

There are also other works that approach physical simulation based on the spatio-temporal setting. LoCS [61] utilizes a GRU to record the memory of past frames and additionally incorporates rotation invariance to improve the model's generalization ability; EqMotion [121] distills the history trajectories of each node into a multi-dimensional vector and then designs an equivariant module and an interaction reasoning module to predict future frames; ESTAG [31] employs the equivariant discrete Fourier transform along with an equivariant spatio-temporal attention mechanism to model the physical dynamics. SEGNO [315] incorporates the second-order graph neural ODE with the equivariant property to reduce the roll-out error of long-term physical simulation.

Table 4 Summary of typical datasets and benchmarks for the single instance applications
Dataset (number of samples): N-Body [27] (70K), 3D N-Body [5] (7K), Constrained N-Body [51] (5.5K), Hierarchical N-Body [5] (9K), Water3D [26] (0.8K), Kubric MOVi-A [266] (0.02K), Physion [267] (16K), MIT Pushing [268] (6K), FluidFall [118] (3K), FluidShake [118] (2K), BoxBath [118] (3K), RiceGrip [118] (5K), QM9 [21] (134K), MD17 [271] (3.6M), OCP [272] (9.8M), AdK [273] (4.1K), DW-4 [126] (10K), LJ-13 [126] (10K), Fast-folding proteins [274] (5M), GEOM [275] (450K), PCQM4Mv2 [107] (3.3M), QMugs [277] (665K), Uni-Mol [159] (209M), GENE Ontology [278] (33.5K), ENZYME [279] (18.5K), CATH [280] (189K), SCOPe [282] (108K), AlphaFoldDB [286] (200M), UniProt [288] (216M), BFD [290] (2100M), NetSurfP-2.0 [291] (11.3K), CASP [293] (45.7K), PDB [294] (1.2M)
Task, Particle: N-body Simulation, N-body Simulation, N-body Simulation, N-body Simulation, Scene Simulation, Scene Simulation, Scene Simulation, Scene Simulation, Scene Simulation, Scene Simulation, Scene Simulation, Scene Simulation
Task, Small molecule: Molecule Property Prediction, Molecule Generation, Molecule Pretraining, Molecule Property Prediction, Molecule Dynamics, Molecule Property Prediction, Molecular Dynamics, Molecular Dynamics, Molecular Dynamics, Molecular Dynamics, Molecular Dynamics, Molecule Property Prediction, Molecule Generation, Molecule Pretraining, Molecule Pretraining, Molecule Pretraining, Molecule Pretraining
Task, Protein: Protein Property Prediction, Protein Property Prediction, Protein Inverse Folding, Protein Pretraining, Protein Co-Design, Protein Inverse Folding, Protein Pretraining, Protein Property Prediction, Protein Folding, Protein Inverse Folding, Protein Pretraining, Protein Pretraining, Protein Property Prediction, Protein Pretraining, Protein Pretraining, Protein Structure Ranking, Protein Residue Identity, Protein Folding
Benchmark: NRI [27], EGNN [5], GMN [51], EGHN [5], GNS [26], GNS* [115], SGNN [25], FIGNet [120], DPI-Net [118], DPI-Net [118], DPI-Net [118], DPI-Net [118], ATOM3D [269], GEOM-QM9 [270], 3D-Infomax [158], SchNet [47], GMN [51], eSCN [79], GemNet [4], EGHN [5], EQ-Flow [126], EQ-Flow [126], ITO [128], SchNet [47], GEOM-Drugs [270], GMN [51], 3D PGT [276], 3D-Infomax [158], Uni-Mol [159], GearNet [168], GearNet [168], GVP-GNN [166], S2F [281], PROTSEED [188], ProstT5 [283], ProSE [284], TAPE [285], ESMFold [186], AlphaDesign [287], GearNet [168], ProtTrans [190], DeepLoc [289], ProtTrans [190], PEER [292], ATOM3D [269], ATOM3D [269], ESMFold [186]

Table 5 The summary of typical datasets and benchmarks for the multi-instance applications
Dataset (number of samples): ZINC [295] (727K), CASF [296] (0.28K), GEOM [275] (450K), SN2-TS [203] (0.11K), Transition1x [297] (9.6M), CrossDocked 2020 [298] (22.5M), PDBBind [22] (23.5K), DIPS [226] (42.8K), DIPS-plus [299] (42.1K), Biogrid [300] (1.7M), DB5.5 [302] (0.23K), PDBBind [22] (23.5K), SAbDab [23] (8.1K), RAbD [20] (0.06K), SKEMPI 2.0 [303] (7.1K), Cov-abdab [304] (2.4K), PepBDB [305] (13K), LNR [307] (0.09K), PepGLAD [35] (6K), PPFlow [248] (13K), Materials Project [308] (154K), Perov-5 [309,310] (18.9K), Carbon-24 [311] (10.1K), JARVIS-DFT [312] (41K), FARFAR2-Puzzles [314] (18K), rRNAsolo [265] (92K)
Task, Mol + Mol: Linker Design, Linker Design, Linker Design, Chemical Reaction, Chemical Reaction
Task, Mol + Protein: Ligand Affinity, Pocket-Based Molecule Sampling, Ligand Affinity, Protein-Ligand Docking
Task, Protein + Protein: Protein Interface Prediction, Protein-Protein Docking, Protein Interface Prediction, Protein Interface Prediction, Protein Interface Prediction, Protein-Protein Docking, Binding Affinity Prediction, Binding Affinity Prediction, Antibody Design, Antibody Design, Antibody Design, Binding Affinity Prediction, Antibody Design, Peptide Design, Peptide Design, Peptide Design, Peptide Design
Task, Others: Protein Crys. Property Prediction, Crystal Generation, Crys. Generation, Crys. Generation, Crys. Property Prediction, RNA Struct. Ranking, RNA Struct. Ranking
Benchmark: 3DLinker [28], DeLinker [201], DiffLinker [200], TSNet [203], OA-ReactDiff [202], GNINA [298], TargetDiff [29], ATOM3D [269], EquiBind [211], ATOM3D [269], EquiDock [229], DeepInteract [224], SYNTERACT [301], ATOM3D-PIP [269], EquiDock [229], GET [205], GeoPPI [228], RefineGNN [239], RefineGNN [239], ATOM3D [269], mmCSM-PPI [227], RefineGNN [239], CAMP [306], PDAR [307], PepGLAD [35], PPBench2024 [248], CGCNN [249], CDVAE [257], CDVAE [257], CDVAE [257], JARVIS-ML [313], ARES [269], EquiRNA [265]

5.2 Tasks on small molecules
By representing atom coordinates as node positions and bonds as edges, a molecule naturally becomes a geometric graph, where $\vec{X} \in \mathbb{R}^{N \times 3}$ represents the positions of the $N$ atoms in the molecule, $H \in \mathbb{R}^{N \times C_h}$ indicates the atom types or other properties of the atoms, and $A \in \{0, 1\}^{N \times N}$ represents the existence of bonds. Usually, the edge feature $e_{ij} \in \{0, 1, 2, 3\}$ is defined by the bond type of the edge from node $i$ to $j$. In addition to chemical edges, the relative distance $d_{ij}$ between two atoms is also utilized for constructing k-NN spatial edges by selecting for each atom the $k$ nearest atoms as its neighbors, and the spatial edge feature is defined as $e_{ij} = \sigma(d_{ij})$ where $\sigma$ is a non-linear function, such as RBF.
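The k-NN spatial edge construction described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation of any surveyed method: the function name `knn_rbf_edges`, the Gaussian form of the RBF, and the parameters `num_centers`, `cutoff`, and `gamma` are assumptions introduced here.

```python
import numpy as np

def knn_rbf_edges(X, k=2, num_centers=4, cutoff=5.0, gamma=1.0):
    """Directed k-NN edges (i -> j) and Gaussian RBF edge features.

    X is an (N, 3) array of atom coordinates. All parameter names and
    defaults are hypothetical choices made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise distances d_ij; the diagonal is masked to exclude self-loops.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    k = min(k, n - 1)
    # For every atom i, keep the indices of its k nearest atoms.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    src = np.repeat(np.arange(n), k)
    dst = nbrs.reshape(-1)
    d = dist[src, dst]
    # e_ij = sigma(d_ij): expand each distance on evenly spaced Gaussian centers.
    centers = np.linspace(0.0, cutoff, num_centers)
    feats = np.exp(-gamma * (d[:, None] - centers[None, :]) ** 2)
    return src, dst, d, feats
```

Because the features depend on the coordinates only through the distances $d_{ij}$, they are unchanged when the whole molecule is rotated or translated, which is why such scalars are a common input to invariant GNNs.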
Prior to the use of geometric graphs, a molecule was typically represented by a 1D string (e.g., SMILES [316] and SMARTS [317]) or a 2D topological graph, both of which discard the geometric information of the molecule, resulting in degraded performance on tasks that involve crucial spatial interactions between atoms. Here, we only introduce the works that apply geometric graphs to represent molecules.

5.2.1 Molecular property prediction
Molecular property prediction has been a fundamental task in computational biochemistry and machine learning. As pinpointed by MoleculeNet [45], common properties can be subdivided into four categories: quantum mechanics, physical chemistry, biophysics, and physiology. With the help of geometric GNNs, we are now able to additionally consider the molecular geometries, which have been demonstrated to be crucial in determining the quantum chemistry properties of molecules.

Task definition: With the input molecule characterized as a geometric graph $\vec{G}$, the task is to learn a model $\phi_\theta$ to predict a scalar property $y$ and/or a vectorial property $\vec{y}$:

$$y, \vec{y} = \phi_\theta(\vec{G}). \tag{56}$$

While most works mainly focus on the single-task setting by predicting each individual type of property independently, it is also possible to leverage the multi-task setting by predicting multiple types of property simultaneously.

Symmetry preserved: It is an SE(3)-invariant task in terms of $y$ since it remains unaffected by any rotation or translation exerted on the molecule, i.e., $\phi_\theta(\vec{G}) = \phi_\theta(g \cdot \vec{G}), \forall g \in \mathrm{SE}(3)$. As for $\vec{y}$, we enforce SE(3)-equivariance into the model: $g \cdot \phi_\theta(\vec{G}) = \phi_\theta(g \cdot \vec{G}), \forall g \in \mathrm{SE}(3)$.

Datasets: There are currently three popular data sources for the evaluation of this task, including QM9 [21], MD17 [271], and Open Catalyst Project (OCP) [272].
The QM9 dataset contains 134K small organic molecules with up to nine heavy atoms (C, O, N, and F), and each molecule is annotated with 13 property labels ranging from the highest occupied molecular orbital to the norm of the dipole moment. MD17 is a collection of molecular dynamics simulations for eight small organic molecules, whose goal is to predict both the energy and atomic forces of each molecule, given the atom coordinates in the non-equilibrium and slightly moving system. OCP consists of more than 100M atomic structures for catalysts to help address climate change, each composed of a molecule called the adsorbate placed on a slab named the catalyst. OCP provides two datasets, OC20 [34] and OC22 [272], for benchmarking. Among the three kinds of tasks in OCP, Initial Structure to Relaxed Energy (IS2RE), which takes an initial structure as input and predicts the relaxed energy, is a highly challenging one.

Methods: Most of the methods introduced in Section 4 are evaluated on molecular property prediction tasks. Here, to avoid redundant introduction, we no longer describe each method in detail and only specify which of the three mentioned benchmarks they are evaluated on. Specifically, invariant GNNs (including SchNet [47], DimeNet [3], SphereNet [123], and GemNet [4]), equivariant GNNs (including Cormorant [76] and PaiNN [6]) and equivariant graph transformers (e.g., TorchMD-Net [83] and Equiformer [11]) employ both QM9 and MD17 for performance comparisons. Other methods like NequIP [10] are evaluated on MD17, while EGNN [5], LieConv [54] and SE(3)-Transformer [8] are evaluated on QM9. SEGNN [9], Graphormer [81,82], Equiformer [11], SCN [78], and eSCN [79] leverage more challenging benchmarks, namely OC20 and even OC22, for performance assessment, revealing the encouraging effectiveness of applying geometric GNNs to catalyst design.
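The symmetry requirements stated in the task definition of this subsection can be checked numerically. The sketch below is a toy illustration under stated assumptions, not one of the surveyed models: `toy_model` and `random_rotation` are hypothetical names, the scalar output is a sum of distance-based weights, and the vector output is a per-atom sum of weighted relative positions, so a translation acts trivially on it and only the rotation appears in the equivariance check.

```python
import numpy as np

def toy_model(X):
    """Toy predictor: an invariant scalar y and per-atom equivariant vectors."""
    diff = X[:, None, :] - X[None, :, :]      # relative positions x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)      # invariant distances d_ij
    w = np.exp(-dist)                         # invariant pair weights
    np.fill_diagonal(w, 0.0)                  # drop self-interactions
    y = 0.5 * w.sum()                         # scalar property
    vec = (w[:, :, None] * diff).sum(axis=1)  # vector property, shape (N, 3)
    return y, vec

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (determinant +1) via QR."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]                    # flip one axis to leave O(3) \ SO(3)
    return Q

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
R, t = random_rotation(rng), rng.normal(size=3)

y, vec = toy_model(X)
y_g, vec_g = toy_model(X @ R.T + t)           # apply g = (R, t) to the input

print(np.isclose(y, y_g))                     # invariance of the scalar output
print(np.allclose(vec @ R.T, vec_g))          # equivariance of the vector output
```

The same check can be applied to any candidate model by replacing `toy_model`; a model that fails it would have to learn the symmetry from data instead of having it built in.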
5.2.2 Molecular dynamics simulation
Molecular Dynamics (MD) simulation aims to simulate the temporal evolution process of molecules driven by internal interactions between atoms within the same molecule, external interactions among different molecules, or environmental interactions from solvents and force fields.

Task definition: Given an input molecular graph at time $t$, i.e., $\vec{G}^{(t)}$, this task simulates the dynamical evolution of the molecule over some time. In general, the future coordinates $\vec{X}^{(t+k)}$ $(k > 0)$ are estimated by

$$\vec{X}^{(t+k)} = \phi_\theta(\vec{G}^{(t)}). \tag{57}$$

Similar to general physical dynamics simulation in Section 5.1.1, one may also adopt a roll-out prediction setting or the spatio-temporal input setting. Besides, in contrast to the direct trajectory prediction here, MD can be alternatively addressed with the methods designed for molecular property prediction as described in the last subsection. We can first predict the node-level force $\vec{F} \in \mathbb{R}^{N \times 3}$ or the graph-level system energy $E \in \mathbb{R}$ for the given state of the system $\vec{G}$, and then use these estimated quantities to update the molecular dynamics by solving the differential equations that describe molecular dynamics.

Symmetry preserved: Clearly, the output coordinate matrix $\vec{X}^{(t+k)}$ is E(3)-equivariant.

Datasets: MD17 [271], AdK [273], OCP [272], DW-4 [126], fast-folding proteins [274], and LJ-13 [126] are available datasets for MD simulation in the machine learning community. MD17 [271], which is usually used for molecular property prediction, also contains the trajectories of eight molecules generated via DFT. The AdK equilibrium trajectory dataset, simulated by the CHARMM27 force field in the MDAnalysis software [318], involves the MD trajectory of apo adenylate kinase with explicit water and ions in NPT at 300 K and 1 bar, where the atom positions of the protein are saved every 240 ps for a total of 1.004 μs.
Besides the common relaxed energy prediction task, OCP releases a dataset split for MD, which computes short, high-temperature ab initio MD trajectories on a randomly sampled subset of the relaxed states. DW-4 is a relatively simple system consisting of only 4 particles embedded in a 2D space, which are governed by an energy function between pairs of particles, while LJ-13 is given by the Lennard-Jones potential and consists of 13 particles embedded in a 3D space. Both energy functions in DW-4 and LJ-13 satisfy E(3)-equivariance. The fast-folding proteins dataset includes 12 structurally diverse proteins, such as Chignolin, Trp-Cage, and BBA. The simulations were conducted in explicit solvent, with frame spacing ranging from 100 μs to 1 ms.

Methods: As a multi-channel version of EGNN [5], GMN [51] focuses specifically on the physical dynamics by considering the geometric constraints (such as chemical bonds) between atoms and achieves promising results on the MD simulation task in MD17. EGHN [65] develops an equivariant version of UNet [319] equipped with equivariant pooling/unpooling layers to better reveal the hierarchy of large molecules such as proteins, leading to state-of-the-art performance on the AdK dataset. NequIP [127] learns interatomic potentials and forces using high-order geometric tensors and E(3)-equivariant convolution layers, achieving high data efficiency and quantum-chemical-level accuracy on MD17. Observing that GMN and other related geometric GNN methods only learn a constant integration of the velocity, Newton–Cotes GNN [114] predicts the integration based on several velocity estimations with Newton–Cotes formulas and proves its effectiveness theoretically and empirically. ESTAG [31] reformulates dynamics simulation as a spatio-temporal prediction task by employing the trajectory in the past period to recover the non-Markovian interactions.
EGNO [127] models the MD trajectory as a function over time using neural operators. SEGNO [315] leverages the second-order continuity information to enhance simulation performance. GeoTDM [130] further leverages the diffusion model to perform trajectory generation on molecular dynamics.

Considering the uncertainty of molecular dynamics at the quantum scale, some methods aim to fit the equilibrium distribution of molecules rather than predicting a single molecular conformation. By leveraging continuous normalizing flows, E-CNF [126] predicts SE(3)-equivariant molecular conformers through the invariant CoM prior density and equivariant vector fields, showing better generation capabilities compared to invariant flows. Later, E-ACF [129] employs the augmented normalizing flow [320] to learn the target distribution of molecules from MD trajectories, which retains SE(3)-equivariance by projecting the atomic Cartesian coordinates into the SE(3)-invariant vector space. Furthermore, ITO [128] utilizes the score matching diffusion model for stochastic dynamics across multiple time-scales, with an extended SE(3)-equivariant PaiNN architecture [321], showcasing considerable generalization ability for different molecular scales.

5.2.3 Molecular generation
Molecule generation plays a central role in drug discovery and material design. Its goal is to generate novel molecules with properties of interest by using machine learning.

Task definition: Basically, the methods for molecular generation learn a parametric probability distribution $p_\theta(\vec{G})$ from an observed dataset $\mathcal{D} := \{\vec{G}_i\}$. A novel molecular geometric graph is then sampled from the learned distribution:

$$\vec{G} \sim p_\theta(\vec{G}). \tag{58}$$

Instead of generating a whole geometric graph (namely de novo generation), some methods investigate the conditional generation paradigm by generating the 3D coordinates $\vec{X}$ given the 2D topological graph $G(H, A)$, forming the so-called conformation generation problem $\vec{X} \sim p_\theta(\vec{X} \mid H, A)$.

Symmetry preserved: The generative model $p_\theta(\vec{G})$ should be E(3)-invariant, i.e., $p_\theta(g \cdot \vec{G}) = p_\theta(\vec{G}), \forall g \in \mathrm{E}(3)$. This is to ensure that the probability distribution is unaffected by the specific choice of the coordinate system to describe a molecule. In some methods as presented later, $p_\theta(\vec{G})$ is marginalized from a joint distribution $p_\theta(\vec{G}, \vec{G}^{(0)}) = p_\theta(\vec{G} \mid \vec{G}^{(0)})\, p(\vec{G}^{(0)})$ where $p(\vec{G}^{(0)})$ denotes a certain initial distribution. In this scenario, the initial distribution $p(\vec{G}^{(0)})$ should be E(3)-invariant and the likelihood distribution $p_\theta(\vec{G} \mid \vec{G}^{(0)})$ should be E(3)-equivariant, to guarantee the E(3)-invariance of $p_\theta(\vec{G})$ [143].

Datasets: QM9 [21] and GEOM [275] are two prevailing datasets used for molecular generation. In particular, QM9, consisting of about 134K organic molecules, contains the molecular 3D structures (e.g., the coordinates of each atom in 3D space) and a wide range of chemical properties for each molecule. GEOM is a comprehensive dataset containing over 37 million molecular conformations, offering diverse conformation ensembles for each 2D molecular structure.

Methods: Current methods can be divided into two classes, namely, conformation generation and de novo generation. Conformation generation is to generate a 3D conformation given the 2D graph representation. Traditional methods [321] focus on a two-stage strategy: first predicting distances and then reconstructing coordinates, which could yet lead to unrealistic structures if the predicted distances are invalid.
To avoid this issue, ConfVAE [135] reformulates the generation task as a bilevel optimization problem under the framework of VAE [322], where the distance prediction and conformation generation are optimized jointly in an end-to-end manner. At the same time, ConfGF [136] estimates the gradient fields of inter-atomic distances by using denoising score matching, and then samples the conformations via annealed Langevin dynamics. Later, DGSM [141] further extends ConfGF by additionally modeling long-range interactions between non-bonded atoms. Instead of expensively optimizing a force field, GeoMol [144] predicts the local 3D geometries, including bond distances and torsion angles, simultaneously in an SE(3)-invariant way. Without predicting intermediate values like inter-atomic distances, DMCG [147] generates the 3D atomic coordinates by iteratively refining the initial coordinate predictions while accounting for invariance through its designed loss function. Following the success of diffusion models, GeoDiff [133] leverages a graph field network to learn an SE(3)-invariant distribution, and Torsional Diffusion [30] operates in torsion angle space rather than in Euclidean space.

As for de novo generation, a series of methods have been proposed thanks to the fruitful progress of generative models [323]. Built upon SchNet [47], G-SchNet [137] introduces an autoregressive model to directly generate 3D molecular structures, while maintaining physical constraints. cG-SchNet [138] further extends G-SchNet to property-guided generation. Leveraging the generative capabilities of flow models, E-NFs [142] reformulates generation as the task of solving a continuous-time ODE, where the dynamics are predicted by EGNN [5]. By harnessing the power of diffusion, EDM [143] exploits E(3)-equivariance by employing EGNN [5] to enhance the diffusion process across both continuous and discrete features.
GeoLDM [134] further maps the geometric features into the latent space where latent diffusion is performed. Rooted in EDM, EEGSDE [146] formulates the generation process as an equivariant SDE and employs a meticulously designed energy function to guide the generation. Recently, MDM [139] takes into account inter-atomic forces at varying distances (e.g., van der Waals forces), and injects variational noises to enhance performance for large molecules and improve generation diversity. To address the atom-bond inconsistency problem, MolDiff [140] introduces a joint atom-bond diffusion framework and bond guidance to make sure atoms are better suited for bonding. HierDiff [148] adopts a hierarchical diffusion which first generates the coarse positions of molecular fragments and then fills in the fine-grained atomic geometry. EquiFM [149] further explores de novo generation with flow matching, utilizing different probability paths for atom type and structure generation.

5.2.4 Molecular pretraining
Given that molecular labels are expensive to obtain, pretraining molecular representation models without labels becomes fundamental and indispensable in real applications. These pretrained models can then be directly transferred or fine-tuned for specific downstream tasks, such as predicting binding affinity and molecular stability, thereby alleviating data scarcity and improving training efficiency. Previous research primarily focused on pretraining models utilizing non-geometric information, including SMILES notations [324], chemical graphs [325], functional groups [326], etc. Recently, there has been a growing interest in self-supervised pretraining on the 3D geometric structure of molecules.

Task definition: Suppose $\phi_\theta(\vec{G})$ to be the representation model, and $\mathcal{L}\big(\hat{y}(\vec{G}), \phi_\theta(\vec{G})\big)$ to be the self-supervised training objective, where $\hat{y}(\vec{G})$ denotes the pseudo label created based on the structure of $\vec{G}$. The representation model is optimized to minimize the self-supervised objective as

$$\theta = \arg\min_{\theta} \mathcal{L}\big(\hat{y}(\vec{G}), \phi_\theta(\vec{G})\big). \tag{59}$$

Symmetry preserved: The representation model $\phi_\theta(\vec{G})$ is E(3)-equivariant if $\hat{y}(\vec{G})$ is a steerable vector, and is E(3)-invariant if $\hat{y}(\vec{G})$ consists of scalars.

Datasets: PCQM4Mv2 [327] is a comprehensive quantum chemistry dataset consisting of 3.37 million molecules derived from the OGB benchmark, which was originally curated as part of the PubChemQC project [328]. QM9 [21] is another popular dataset that encompasses quantum chemistry structures and properties, featuring 134K molecules. QMugs [277] expands QM9 by offering a more extensive collection of drug-like molecules, totaling 665K molecules. GEOM [275] is an energy-annotated molecular conformation dataset containing 37 million molecular conformations sourced from multiple datasets, such as QM9, and generated with the CREST program [329]. Uni-Mol [159] constructs a conformation dataset containing 19 million molecules. It utilizes ETKDG with Merck Molecular Force Field optimization in RDKit to generate 11 random conformations for each molecule, resulting in a total of 209 million conformations.

Methods: A variety of studies investigate the denoising objective, pretraining the model by recovering the original signal from a perturbed input. Specifically, GeoSSL-DDM [154] formulates the denoising objective based on atomic distance. Uni-Mol [159] proposes position denoising and joint training between 3D molecular conformations and candidate protein binding pockets. GNS-TAT [156] establishes a connection between coordinate denoising and the potential energy of molecular conformations. MGMAE [157] proposes a reconstruction strategy to train on the heterogeneous atom-bond graph with a high mask ratio. 3D-EMGP [153] further proposes to predict the atomic pseudo force field, which is estimated by a Riemann-Gaussian denoising distribution to ensure an E(3)-invariant pretraining loss.
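As a concrete reading of the self-supervised objective in Eq. (59), the sketch below implements a generic coordinate-denoising loss: the pseudo label is the Gaussian noise added to the coordinates, and the loss is the mean squared error between that noise and the model output. It is a simplified illustration, not the objective of any specific method cited above; `denoising_loss`, the callable `model`, and the noise scale `sigma` are hypothetical names introduced here.

```python
import numpy as np

def denoising_loss(model, X, sigma, rng):
    """Mean squared error between the added noise and the model's prediction.

    model: callable mapping perturbed (N, 3) coordinates to (N, 3) vectors
           (a stand-in for the representation model plus a prediction head).
    X:     clean coordinates, shape (N, 3).
    sigma: standard deviation of the isotropic Gaussian perturbation.
    """
    noise = sigma * rng.normal(size=X.shape)  # pseudo label: per-atom noise vectors
    X_noisy = X + noise                       # perturbed input structure
    pred = model(X_noisy)                     # predicted noise
    return float(np.mean((pred - noise) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

# Two reference predictors: one that always outputs zero, and an oracle that
# has access to the clean coordinates and therefore recovers the noise.
zero_model = lambda X_noisy: np.zeros_like(X_noisy)
oracle_model = lambda X_noisy: X_noisy - X

print(denoising_loss(zero_model, X, 0.1, rng))
print(denoising_loss(oracle_model, X, 0.1, rng))
```

Since the pseudo label here is a set of vectors that rotate with the molecule, this is an instance of the steerable-vector case of the symmetry statement, for which an E(3)-equivariant model is the natural choice.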
Apart from the denoising objective, GraphMVP [155] leverages the correlation between 2D molecular graphs and 3D conformations, constructing a contrastive objective for the model pretraining. Similar to GraphMVP, Transformer-M [160] leverages positional encodings and attention biases to encode the 2D and 3D structures in one Transformer model. Meanwhile, 3D-Infomax [158] exploits this correspondence by attempting to maximize the mutual information between 2D molecular graph embeddings and learned representations of the corresponding 3D graphs. MoleculeSDE [161] extends 3D-Infomax [158] and leverages group symmetric stochastic differential equation models to establish a connection between 3D geometries and 2D topologies, with a tighter MI bound. Frad [163] decomposes molecules into fragments to fix the rigid parts and pretrains the model via denoising on the flexible parts. SliDe [162] explores pretraining with denoising from a distribution that encodes physical principles. DenoiseVAE [164] utilizes a learnable noise generation strategy to adaptively acquire atom-specific noise distributions for different molecules, which results in more accurate force field learning.

5.3 Tasks on proteins
Proteins are large biomolecules that are composed of one or more long chains of amino acid residues. All proteinogenic amino acids share common structural features, including an α carbon to which an amino group, a carboxyl group, and a variable side chain are bonded. Most proteins fold into unique 3D structures that determine the function and activity of proteins in biological processes. Owing to the hierarchical structures of proteins, there are mainly two different ways to leverage the geometric graph $\vec{G}$ to represent proteins. For one thing, we can treat each residue as a node, the positions of α carbons as the coordinate matrix $\vec{X}$ and the residue-level features as $H$.
For another thing, we can apply the full-atom setting by considering each atom as a node, the positions of all atoms as $\vec{X}$ and atom-level features as $H$. In both ways, the edges can be created via either the chemical bonds or cut-off distances. There are plenty of works that develop machine learning methods to process proteins. While some of them focus on 1D residue sequences, this survey is mainly interested in the study of 3D structures and will present several relevant tasks in the following.

5.3.1 Protein property prediction
Similar to molecular property prediction, protein property prediction is a crucial E(3)-invariant task in computational biology. Most previous works solely employ residue sequences to predict protein properties. Thanks to the development of geometric structure modeling, more and more attention is being paid to using geometric GNNs to estimate the functional properties of proteins by exploring 3D structures. In terms of the prediction granularity, the task of protein property prediction is classified into protein-level, residue-level and atom-level prediction, with the details provided below.

Protein-level prediction: Many tasks aim to predict the functions or certain scores given the protein structure. (1) Enzyme Commission (EC) number prediction [167] is a prevailing protein-level classification task which aims to predict the catalyzed reaction class of the given enzyme. (2) Gene Ontology (GO) term prediction [167] seeks to predict the functional classes concerning gene ontology given the protein structure, whose data is usually split into three tracks: molecular function (MF), biological process (BP), and cellular component (CC). (3) Protein structure ranking learns a quality score function of the given protein structure to estimate the structural similarity between the candidate protein and the native structure.
It plays a vital role in computational biology, as it assists researchers in pinpointing the most accurate or biologically significant protein conformations from a collection of potential structures. (4) Protein localization prediction aims to forecast the subcellular locations of proteins [289], which is essential to understand the function of a protein and helps investigate the pathogenesis of many human diseases [330]. (5) Fitness landscape prediction primarily focuses on the prediction of the effects of residue mutations on the fitness of proteins. Typical target functions include β-lactamase [292], Adeno-Associated Virus (AAV), Thermostability [331], and Fluorescence and Stability [285].

Abundant protein-level representation models are available in the existing literature. DeepFRI [167] and LM-GVP [166] propose a two-stage architecture, which adopts language models to extract amino acid sequence information and a graph-based model to learn the interactions between amino acids simultaneously. Notably, LM-GVP utilizes the equivariant model GVP [64] as the graph-based model. GearNet [168] proposes a relational graph convolution layer to better capture the 3D geometry of proteins, and exploits multi-view contrastive pretraining to better utilize unlabeled data. As for structure ranking, TM-align [170] is a typical method that is not based on deep learning and is time-consuming. Thanks to the expressive ability of geometric GNNs, [64,172,173] adopt equivariant GNN models such as GVP [64] and TFN [7] to fulfill model quality assessment (MQA). In addition, TFN [7] is also used for ranking protein-protein complexes in PAUL [171].

Residue-level prediction: ATOM3D [269] proposes Residue Identity (RES) prediction, which aims to predict the amino acid type at the center of a given local context. The performance on this task measures whether a model can capture the structural dependencies between individual amino acids, which is vital for protein engineering.
Atom-level prediction: The main form of atom-level prediction lies in pocket detection, which requires predicting whether an atom on the protein belongs to the binding site in terms of a potential ligand. Previous methods usually design algorithms to find and rank the cavities on the protein surface [332,333], or voxelize the protein structure and use 3D-CNN for supervised training [334,335]. Notably, a series of works exploit geometric GNNs to achieve much better performance (ScanNet [174], EquiPocket [175], and PocketMiner [176]).

5.3.2 Protein generation
In terms of what to generate, the approaches for protein generation are categorized into protein folding (or protein structure prediction), protein inverse folding, and protein structure and sequence co-design.

Protein folding aims to generate folding structures given the amino acid sequences of the input protein. This task has significant implications in the field of drug design. The folding structure is generated by:

$$\vec{X} \sim p_\theta(\vec{X} \mid s), \tag{60}$$

where $s \in \mathbb{R}^{N}$ denotes the amino acid sequence and $\vec{X} \in \mathbb{R}^{N \times 3}$ the coordinates of all residues (note that each row of $\vec{X}$ can include more than one 3D coordinate vector if full-atom coordinates are considered).

Symmetry preserved: This is an equivariant task, implying that $p_\theta(\vec{X} \mid s) = p_\theta(\vec{X}O + \vec{t} \mid s)$ for an arbitrary orthogonal transformation $O$ and translation $\vec{t}$. Notably, some methods generate the distance matrix or other invariant forms of $\vec{X}$, reducing the task into a trivial generation problem without the equivariance constraint.

Methods: The AlphaFold series [33,183] and RoseTTAFold series [12,48] represent the forefront of contemporary techniques in protein folding. They employ a sophisticated multi-track architecture capable of processing multi-sequence alignments (MSA), amino acid pair-wise distance maps, and geometric structures, each with remarkable efficiency.
Building upon these advancements, RoseTTAFold2 [48] extends the capabilities of both AlphaFold2 [183] and RoseTTAFold [12] by refining the attention mechanism and enhancing the three-track architecture, resulting in notable performance improvements. Moreover, RFAA [184] further extends RoseTTAFold's versatility to encompass the design of various biomolecules beyond proteins, including nucleic acids, small molecules, and metals. In contrast, ESMFold [336] and HelixFold-Single [187] depart from traditional methods by eschewing the requirement for MSA. Instead, they learn to predict protein structures directly from primary sequence data, significantly enhancing inference efficiency. Additionally, EigenFold [185] introduces a novel harmonic diffusion process that projects protein structures onto eigenmodes, thereby preventing the disassembly of adjacent nodes.

Protein inverse folding aims to generate amino acid sequences conditional on the folding structures of the input protein. Using the same notation as the task of protein folding, the model $p_\theta$ generates the amino acid sequence $s \in \mathbb{R}^{N}$ of interest:

$$s \sim p_\theta(s \mid \vec{X}). \tag{61}$$

Symmetry preserved: This is an invariant task, indicating that $p_\theta(s \mid \vec{X}) = p_\theta(s \mid \vec{X}O + \vec{t})$ for an arbitrary orthogonal transformation $O$ and translation $\vec{t}$.

Methods: Typical methods such as [177] and [178] take invariant features, including distances and dihedral angles, as input to ensure invariance during generation. More recently, based on GVP [64] that is E(3)-equivariant, ESM-IF [85] further incorporates more structure information for the generation, while keeping the output sequence invariant. Similarly, LM-Design [181] integrates structural embedding into language models to improve the performance of inverse folding.
ProteinMPNN [179] uses an invariant architecture to embed the backbone and predicts amino acid probabilities autoregressively while enforcing desired constraints. PiFold [180] additionally incorporates distance, angle, and direction features and proposes PiGNN to generate the sequences non-autoregressively. KW-Design [182] integrates knowledge from pretrained sequence and structure models to refine the sequences generated by the baselines with a memory retrieval mechanism.

Protein structure and sequence co-design aims to generate both the amino acid sequences and folding structures, which is formally written as:

$$\vec{X}, s \sim p_\theta(\vec{X}, s). \tag{62}$$

Symmetry preserved: Clearly, this task is invariant with respect to $s$, and equivariant with respect to $\vec{X}$.

Methods: Based on RoseTTAFold [12], RFdiffusion [13] incorporates Gaussian noise into coordinates and Brownian motion noise into orientations, then denoises the structure step by step and recovers the sequence using ProteinMPNN [179]. Meanwhile, Chroma [14] introduces a programmable diffusion framework that supports diverse conditional generation and precise targeting of properties through constraints such as symmetry, shape, and semantics. Both Chroma and RFdiffusion begin with structure generation and then sample the corresponding sequence through another module. Unlike these two works, PROTSEED [188] designs the structure and sequence jointly with an encoder-decoder framework, where the encoder is trigonometry-aware to learn context features and the decoder is SE(3)-equivariant to express the sequence and structure.

Datasets: ATOM3D [269] constructs multiple widely-used datasets tailored for protein design tasks. CASP [293] stands out as a renowned contest dedicated to protein structure prediction. In this competition, participants submit predicted structures for evaluation, particularly when the experimental structures are not publicly available.
The community then assesses the quality of these submissions. Additionally, AlphaFoldDB [286], SCOPe [282], and CATH [280] serve as valuable resources for protein design, providing datasets comprising protein structures alongside their corresponding sequences. SCOPe and CATH consist of segmented protein structure domains, while AlphaFoldDB hosts a repository of over 200 million complete structures predicted by AlphaFold2 [183]. Moreover, with predictions stemming from ESMFold [336], the ESM Metagenomic Atlas offers a collection of about 772 million metagenomic protein structures.

5.3.3 Protein pretraining

Similar to the molecule pretraining task, protein pretraining also aims to learn representations of proteins, which can be used in downstream tasks.

Task definition: Generally, each input protein is modeled as a geometric graph $\vec{\mathcal{G}}$, and the purpose of pretraining is to learn a parametric model $\phi_\theta$ which can output high-quality representations $H \in \mathbb{R}^{N \times d}$ of the input protein:

$$H = \phi_\theta(\vec{\mathcal{G}}). \tag{63}$$

Symmetry preserved: It is equivariant for the output vectors in $H$, and invariant for the output scalars in $H$.

Datasets: For protein sequence pretraining methods, UniProt [288] functions as a central repository for both protein sequence and functional information. It is organized into clusters by UniRef [337], with pairwise sequence identity thresholds typically set at 50% and 100% (referred to as UniRef50 and UniRef100) to eliminate redundancy. BFD [290], on the other hand, represents a larger sequence dataset, formed by combining UniProt with protein sequences sourced from metagenomic sequencing projects. Furthermore, NetSurfP-2.0 [291] provides labels for protein secondary structure prediction, in both 3-state and 8-state form, offering valuable resources for supervised training. In the realm of protein structure pretraining and classification, SCOPe [282], CATH [280], and AlphaFoldDB [286] hold significant importance.
They provide comprehensive repositories for protein structures, facilitating research and advancement in the field.

Methods: Previous protein pretraining methods such as ESM-1b [338], ESM2 [336], ProtTrans [190], xTrimoPGLM [191], and ProtGPT2 [192] are based on sequence masking and prediction, inspired by the success of NLP language models. Readers can refer to the survey [339] for a broader introduction to protein language models. Recent attention has been paid to pretrained models based on 3D structure information. For instance, GearNet [168], built upon an invariant GNN with multi-type message passing, leverages several pretraining objectives including contrastive learning between sequences and structures, distance/dihedral prediction, and residue type prediction. Other works like ProFSA [194] and DrugCLIP [196] also utilize contrastive learning to learn SE(3)-invariant features, but focus more on pocket pretraining, where the pocket-ligand interaction knowledge is incorporated as well. Guo et al. [198] employ pretraining with the protein's tertiary structure, incorporating SE(3)-invariant features to ensure the efficient preservation of SE(3)-equivariance. PAAG [199] enables multi-level alignment between protein sequence and textual annotation to capture the fine-grained motif inside the protein and successfully designs proteins with functional domains.

5.4 Tasks on Mol+Mol

This subsection introduces the tasks with the input of "molecule+molecule", including linker design and chemical reaction prediction.

5.4.1 Linker design

Fragment-based molecule design requires predicting the linker, a small molecule, so that two or more molecular components can be combined into novel molecules with desirable properties. Linkers are of great importance in maintaining the proper orientation, flexibility, and stability of multi-domain proteins or fusion proteins.
Task definition: The input consists of two or more unlinked molecular fragments, which are all represented as geometric graphs $\{\vec{\mathcal{G}}_i\}_{i=1}^{k}$, and the model needs to learn an equivariant function $f_\theta$ whose output is a small molecule $\vec{\mathcal{G}}_L := (\vec{X}_L, H_L)$ used to link the fragments. Specifically,

$$\vec{X}_L, H_L = f_\theta(\vec{\mathcal{G}}_1, \vec{\mathcal{G}}_2, \ldots, \vec{\mathcal{G}}_k). \tag{64}$$

Symmetry preserved: If we impose rotation or translation operations on the input fragments simultaneously, the output coordinates should transform correspondingly while the atom features remain invariant.

Datasets: The linkers connecting molecules in ZINC [295] can be computationally synthesized, similar to the methods employed by [340]. Conversely, CASF [296] offers experimentally validated molecules for linker design. In contrast to ZINC and CASF, which typically produce paired fragments, DiffLinker [200] generates a novel dataset comprising three or more fragments, drawing from GEOM [275].

Methods: DeLinker [201] and 3DLinker [28] employ VAE [322] to create the 3D structure of a linker. However, their capability is limited to linking only two fragments, rendering them ineffective when faced with an arbitrary number of fragments to link. In contrast, DiffLinker [200] has recently succeeded in addressing this challenge by harnessing an E(3)-equivariant diffusion model configured to handle multiple fragments.

5.4.2 Chemical reaction prediction

In chemical reactions, identifying and characterizing transition state (TS) structures is crucial for understanding reaction mechanisms. This process entails locating the TS structure that minimizes the system's potential energy (PE) while adhering to specific constraints, such as SE(3) invariance.

Task definition: Given a reactant $\vec{\mathcal{G}}_R$ and a product $\vec{\mathcal{G}}_P$, the objective is to generate the TS structure $\vec{\mathcal{G}}_{TS}$ that optimizes the following objective:

$$\vec{\mathcal{G}}_{TS}^{*} = \operatorname*{argmin}_{\vec{\mathcal{G}}_{TS}} \mathrm{PE}(\vec{\mathcal{G}}_{TS} \mid \vec{\mathcal{G}}_R, \vec{\mathcal{G}}_P), \tag{65}$$

where the function $\mathrm{PE}(\cdot)$ returns the potential energy.
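The SE(3) invariance constraint on the potential energy can be checked numerically. The sketch below is only an illustration under stated assumptions: `toy_potential_energy` is a hypothetical pairwise energy that depends on interatomic distances alone (a stand-in for $\mathrm{PE}(\cdot)$, which in practice comes from quantum-chemical calculations), and the helper `random_rotation`, the reference distance `r0`, and the random coordinates are likewise made up for the example. Coordinates follow the row convention used in this survey, so a rigid motion acts as $\vec{X}Q + \vec{t}$.

```python
import numpy as np

def random_rotation(rng):
    """Sample a 3x3 rotation matrix (det = +1) via QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0  # flip one axis to turn a reflection into a rotation
    return Q

def toy_potential_energy(X, r0=1.5):
    """Hypothetical energy: harmonic penalty on every pairwise distance.

    X has shape (N, 3), one row per atom. This is NOT a physical force field.
    """
    diff = X[:, None, :] - X[None, :, :]          # (N, N, 3) displacement vectors
    dist = np.linalg.norm(diff, axis=-1)          # (N, N) distance matrix
    i, j = np.triu_indices(len(X), k=1)           # count each pair once
    return float(np.sum((dist[i, j] - r0) ** 2))

rng = np.random.default_rng(0)
X_ts = rng.normal(size=(6, 3))                    # made-up candidate TS coordinates
Q, t = random_rotation(rng), rng.normal(size=3)

e_original = toy_potential_energy(X_ts)
e_moved = toy_potential_energy(X_ts @ Q + t)      # same structure after a rigid motion
print(np.isclose(e_original, e_moved))
```

For a distance-based energy of this kind, a rigidly moved minimizer is again a minimizer, so the optimum is determined only up to a rigid motion unless the inputs fix the coordinate frame.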
Symmetry preserved: In general, the output, namely the TS structure, is invariant to any independent transformation (e.g., rotation) imposed on each input structure. If the input and output are always fixed within the same 3D coordinate space, then this task is equivariant: when the same transformation is imposed on the two input structures, the output TS is transformed in the same way.

Datasets: TSNet [203] has assembled a dataset called $S_N2$-TS, which contains structures of reactants, transition states (TS), and products pertinent to $S_N2$ reactions. Transition1x [297] provides a resource of 9.6 million density functional theory (DFT) calculations encompassing forces and energies for molecular configurations across reaction pathways. This extensive dataset offers valuable information for training models for reaction prediction.

Methods: OA-ReactDiff [202] introduces a diffusion model to generate transition state (TS) structures. This model ensures SE(3)-equivariance of the score function by constructing local frames. Moreover, the equivariant backbone model is adapted to accommodate multiple objects. On the other hand, TSNet [203] employs the equivariant graph neural network (GNN) model TFN [7] to predict TS structures. Initially, TFN is pretrained on extensive chemical data, such as QM9 [21], to learn useful representations. It is then fine-tuned specifically for the task of predicting transition structures.

5.5 Tasks on mol+protein

The "molecule+protein" tasks are well explored, such as ligand binding affinity prediction, protein-ligand docking, and pocket-based molecule sampling.

5.5.1 Ligand binding affinity prediction

The task of predicting ligand binding affinity revolves around estimating the interaction strength between a protein (receptor) and a small molecule (ligand) [205]. Accurate predictions in this area offer significant advantages for designing and refining drug candidates.
Additionally, they aid in prioritizing compounds for experimental evaluation, thereby streamlining the drug discovery process.

Task definition: With both the molecule and the protein regarded as geometric graphs $\vec{\mathcal{G}}_m, \vec{\mathcal{G}}_p$, the task aims to learn an efficient predictor $\phi_\theta$, which can predict the binding strength $s$ accurately:

$$s = \phi_\theta(\vec{\mathcal{G}}_p, \vec{\mathcal{G}}_m). \tag{66}$$

Symmetry preserved: This is an invariant task, since the binding affinity does not change under any rigid transformation (rotation or translation) of the complex.

Datasets: CrossDocked2020 [298] contains over 22 million posed ligand-receptor complexes and the corresponding binding affinity values, which are generated by docking ligands into multiple receptor structures from the same binding pocket. PDBbind [22] provides accurate and reliable binding affinity data, allowing researchers to assess how well computational methods can predict the strength of binding between proteins and ligands.

Methods: MaSIF [204] utilizes geodesic space to represent the protein surface, assigns geometric and chemical features to patches, and employs rotation invariance to process these features, facilitating predictions of protein-ligand interactions. ProtNet [206] considers 3D protein representations at various levels (e.g., amino-acid level, backbone level, and all-atom level) to accomplish affinity prediction tasks. GET [205] extends this concept by unifying different levels universally for both molecule and protein representations. TargetDiff [29] introduces a diffusion process that gradually adds noise to coordinates and atom types. This process, guided by an SE(3)-equivariant graph neural network (GNN), incorporates binding free energy terms to steer generation towards high-affinity poses. HGIN [207] constructs a hierarchical invariant graph model to predict changes in binding affinity resulting from protein mutations.
BindNet [208] designs two pretraining tasks utilizing Uni-Mol [159] as the encoder to jointly learn protein and ligand interactions.

5.5.2 Protein-ligand docking

This task aims to predict the transformation, e.g., rotation and translation, imposed on the protein and the molecule so that they can dock together with the minimum root-mean-square deviation.

Task definition: Without loss of generality, we assume that the protein remains fixed while the position of the molecule transforms. By denoting the protein as $\vec{\mathcal{G}}_p := (\vec{X}_p, H_p)$ and the molecule as $\vec{\mathcal{G}}_m := (\vec{X}_m, H_m)$, respectively, the model needs to learn a prediction function $\phi_\theta$ that outputs the rotation matrix and translation vector (i.e., $R, \vec{t}$) by

$$R, \vec{t} = \phi_\theta(\vec{X}_p, H_p; \vec{X}_m, H_m). \tag{67}$$

With the predicted rotation $R$ and translation $\vec{t}$, we can dock the molecule towards the fixed protein.

Symmetry preserved: To make the final docked complex SE(3)-equivariant, the predictor $\phi_\theta$ is supposed to meet the following independent SE(3) constraints [211]:

$$R' = Q_m R Q_p^{\top}, \quad \vec{t}\,' = Q_m \vec{t} - Q_m R Q_p^{\top} \vec{t}_p + \vec{t}_m, \quad \forall Q_p, Q_m \in \mathrm{SO}(3),\ \vec{t}_p, \vec{t}_m \in \mathbb{R}^{3}, \tag{68}$$

where $R', \vec{t}\,'$ are the predicted rotation matrix and translation vector after transforming the protein and the molecule, namely, $R', \vec{t}\,' = \phi_\theta(\vec{X}_p Q_p + \vec{t}_p, H_p; \vec{X}_m Q_m + \vec{t}_m, H_m)$.

Datasets: PDBbind [22] stands out as the predominant dataset for protein-ligand docking, housing over 22 million poses resulting from the docking of ligands into their respective receptor structures. Typically, current methods split the dataset in chronological order for training and evaluation.

Methods: EquiBind [211] and TankBind [212] have tackled the blind binding problem by leveraging equivariant graph neural networks. TankBind additionally introduces trigonometry constraints to enhance compound rationality. To further enhance performance, DiffDock [16] proposes a diffusion process operating across three groups (T(3), SO(3), and SO(2)). In contrast, DESERT [213] offers a unique approach by initially outlining pocket shapes and then generating molecule structures to bind these pockets. This method alleviates the scarcity of experimental binding data and is not reliant on predefined pocket-drug pairs. Recently, FABind [214] designs geometry-aware GNN layers and efficient interaction modules (e.g., interfacial message passing) to unify pocket prediction and the docking stage, which leads to fast and accurate prediction. Further, Re-Dock [215] explores flexible docking by considering the gap between apo and holo conformations of the target protein, which enhances the practical utility.

5.5.3 Pocket-based mol sampling

The technique of pocket-based molecular sampling aims at generating small molecules that have the potential to bind to a particular pocket on a protein or other biomolecular target.

Task definition: This target-aware design learns a generation model $p_\theta$ whose output is a new molecule $\vec{\mathcal{G}}_m$ that can bind to a specific pocket $\vec{\mathcal{G}}_p$:

$$\vec{\mathcal{G}}_m \sim p_\theta(\vec{\mathcal{G}}_m \mid \vec{\mathcal{G}}_p). \tag{69}$$

Symmetry preserved: It is an equivariant problem, implying that $p_\theta(\vec{\mathcal{G}}_m \mid \vec{\mathcal{G}}_p) = p_\theta(g \cdot \vec{\mathcal{G}}_m \mid g \cdot \vec{\mathcal{G}}_p)$ for any transformation $g$ of interest.

Datasets: CrossDocked2020 [298] serves as a substantial resource for sampling molecules based on docking pockets, containing approximately 22.5 million docked protein-ligand pairs.

Methods: Pocket2Mol [216], GraphBP [219], SBDD [218], and FLAG [220] adopt an autoregressive approach to generate molecules conditioned on binding sites, operating at the granularity of atoms or motifs. In contrast, TargetDiff [29] and a series of subsequent diffusion-based methods [152,217,220,223,341] diverge from this approach by utilizing 3D equivariant diffusion in a non-autoregressive fashion.
This approach enables the generation of all atoms simultaneously, resulting in higher efficiency. DESERT [213] first sketches the shape of the molecule according to the pocket, and then generates a molecule fitting that shape. D3FG [221] leverages a fragment-based diffusion to enhance the generative performance by decomposing molecules into functional groups and linkers.

5.6 Tasks on protein+protein

The "protein+protein" tasks include protein interface prediction, protein-protein binding affinity prediction, protein-protein docking, antibody design that considers specifically the interaction between antibodies and antigens, and peptide design that aims at generating target-specific peptides.

5.6.1 Protein interface prediction

Biological processes often depend on interactions between biomolecules. This creates a need for predicting protein-protein interfaces, which involves identifying the regions on a protein's surface that are likely to participate in interactions with other proteins.

Task definition: With the protein pair taken as two geometric graphs $\vec{\mathcal{G}}_1, \vec{\mathcal{G}}_2$, this task requires learning a predictor $\phi_\theta$ that determines whether the atoms of each protein belong to the interface. The outputs are interpreted as the atomic probabilities $p \in \mathbb{R}^{N_1 + N_2}$ of being located on the interface:

$$p = \phi_\theta(\vec{\mathcal{G}}_1, \vec{\mathcal{G}}_2). \tag{70}$$

Symmetry preserved: Once the interacting proteins are selected, the atoms in the interface are determined regardless of the rigid transformations applied to each partner, resulting in an invariant problem with respect to each protein:

$$\phi_\theta(\vec{\mathcal{G}}_1, \vec{\mathcal{G}}_2) = \phi_\theta(g_1 \cdot \vec{\mathcal{G}}_1, g_2 \cdot \vec{\mathcal{G}}_2), \quad \forall g_1, g_2 \in \mathrm{SE}(3). \tag{71}$$

Methods: The methods dMaSIF [225] and SASNet [226] operate via three-dimensional convolution on the protein 3D structures to keep rotation invariance. Moreover, fed with more structure features such as distance, orientation, and amide angle, DeepInteract [224] adopts a geometric transformer and achieves competitive performance as well.
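The invariance of interface prediction under an independent rigid motion of each partner can be illustrated with a small numerical check. This is a sketch under stated assumptions rather than any of the cited methods: `toy_interface_probabilities` is a hypothetical predictor that scores each atom using only distances within its own protein (a real model would also use the node features and learned parameters), and the helper names and random coordinates are invented for the example.

```python
import numpy as np

def random_rigid_motion(rng):
    """Sample a rotation Q (det = +1) and a translation t, i.e. one element of SE(3)."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0  # flip one axis to turn a reflection into a rotation
    return Q, rng.normal(size=3)

def toy_interface_probabilities(X1, X2):
    """Hypothetical predictor returning one value in (0, 1) per atom.

    Each score is built from intra-protein distances only, which do not
    change when a protein is rotated or translated as a whole.
    """
    def per_atom(X):
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        spread = dist.mean(axis=1)                 # mean distance to the other atoms
        return 1.0 / (1.0 + np.exp(spread - spread.mean()))
    return np.concatenate([per_atom(X1), per_atom(X2)])   # shape (N1 + N2,)

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(7, 3)), rng.normal(size=(5, 3))   # made-up coordinates
(Q1, t1), (Q2, t2) = random_rigid_motion(rng), random_rigid_motion(rng)

p = toy_interface_probabilities(X1, X2)
p_moved = toy_interface_probabilities(X1 @ Q1 + t1, X2 @ Q2 + t2)  # g1 and g2 differ
print(np.allclose(p, p_moved))
```

The check uses two different motions, one per protein, which is a stronger requirement than invariance under a single motion applied to the whole complex.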
5.6.2 Binding affinity prediction

Protein-protein interactions are fundamental to bio-molecular activity and are crucial for many key functions in biological processes. Estimating the binding affinity between proteins not only aids in gaining a deeper understanding of protein mechanisms of action but also serves as the cornerstone for designing proteins with specific functions, such as highly specific antibodies and high-affinity ligands.

Task definition: Given a pair of proteins that can be considered as geometric graphs $\vec{\mathcal{G}}_1, \vec{\mathcal{G}}_2$, this task requires learning a predictive function $\phi_\theta$, which can efficiently and accurately predict the binding strength $s$ between the pair of proteins:

$$s = \phi_\theta(\vec{\mathcal{G}}_1, \vec{\mathcal{G}}_2). \tag{72}$$

Symmetry preserved: This is an invariant task because the binding strength $s$ remains unchanged under any translations or rotations applied to the pair of proteins.

Datasets: The PDBbind [342] dataset is an assembly of complex structures sourced from the Protein Data Bank (PDB), accompanied by experimentally measured binding affinities. Protein-Protein Affinity Benchmark Version 2 [302,343] comprises 176 diverse protein-protein complexes, each accompanied by detailed affinity annotations. SKEMPI (Structural database of Kinetics and Energetics of Mutant Protein Interactions) [344] is a curated database that records changes in binding affinities and kinetic parameters upon mutation. SKEMPI 2.0 [303] is the refined and extended edition of the original SKEMPI database.

Methods: mmCSM-PPI [227] presents a binding affinity prediction method employing graph-based signatures that encapsulate the physico-chemical and geometric properties of the protein structure, augmented with complementary features to reflect various mechanisms.
The Extra Trees model, trained with graph-based signatures and complementary features, yields promising results on the SKEMPI 2.0 dataset. GeoPPI [228] utilizes the 3D conformations to learn a geometric representation that embodies the topological features of the protein structure through a self-supervised learning approach. Subsequently, these representations serve as inputs for gradient-boosting trees, facilitating the prediction of the variations in protein-protein binding affinity due to mutations. GET [205] introduces a bilevel design that ensures equivariance while unifying representations across different levels. GET achieves state-of-the-art performance on the PDB dataset.

5.6.3 Protein-protein docking

We have investigated docking pose prediction between protein and molecule in Section 5.5.2. Here, we study the similar problem between protein and protein.

Task definition: Denoting the two proteins as $\vec{\mathcal{G}}_1 = (\vec{X}_1, H_1)$ and $\vec{\mathcal{G}}_2 = (\vec{X}_2, H_2)$, respectively, the model needs to learn a prediction function $\phi_\theta$ to output the rotation matrix and translation vector (i.e., $R, \vec{t}$) by

$$R, \vec{t} = \phi_\theta(\vec{X}_1, H_1; \vec{X}_2, H_2). \tag{73}$$

Symmetry preserved: This is identical to Eq. (68).

Methods: EquiDock [229] uses SE(3)-equivariant graph neural networks and optimal transport techniques to predict the transformation by aligning key points. HMR [230] casts this task from 3D Euclidean space to a 2D Riemannian manifold while preserving rotational invariance. DiffDock-PP [232] extends DiffDock [16], a diffusion generative model, to the protein docking task and yields state-of-the-art performance. Furthermore, in dMaSIF [235], an energy-based, SE(3)-equivariant model combined with physical priors is adopted to infer docking regions. Treating docking as an optimization problem, EBMDock [237] employs geometric deep learning to extract features from protein residues and learns distance distributions between the residues involved in interfaces.
Multimeric protein docking can be tackled by AlphaFold-Multimer [234] and SyNDock [233]. Recently, ElliDock [236] predicts SE(3)-equivariant elliptic paraboloids as the binding interface for protein pairs, and turns the rigid protein-protein docking task into surface fitting while ensuring the same degree of freedom. There are also several works targeting antibody-antigen docking, a subfield of protein docking. For instance, HSRN [231] proposes a hierarchical framework to handle docking in an iterative manner. By harnessing the capabilities of tFold-Ab [244] and AlphaFold2 [183], tFold-Ag [244] generates antibody/antigen features and employs a docking module to predict complex structures with flexibility.

5.6.4 Antibody design

Antibodies are Y-shaped symmetric proteins produced by the immune system that recognize and bind to specific antigens. The design of antibodies mainly focuses on the variable domains consisting of a heavy chain and a light chain, with 3 Complementarity-Determining Regions (CDRs) and 4 framework regions interleaving on each chain. The 6 CDRs largely determine the binding specificity and affinity of the antibodies, especially CDR-H3 (i.e., the 3rd CDR on the heavy chain), which is the main scope of the design.

Task definition: Without loss of generality, we define the task as a conditional variant of structure and sequence co-design. More specifically, given the geometric graphs of the antigen $\vec{\mathcal{G}}_A$, the heavy chain $\vec{\mathcal{G}}_H$, and the light chain $\vec{\mathcal{G}}_L$ with the CDRs missing, the model $\phi_\theta$ needs to fill in the geometric graph of the CDRs of interest $\vec{\mathcal{G}}_C$:

$$\vec{\mathcal{G}}_C = \phi_\theta(\vec{\mathcal{G}}_A, \vec{\mathcal{G}}_H, \vec{\mathcal{G}}_L). \tag{74}$$

Symmetry preserved: The output CDRs $\vec{\mathcal{G}}_C$ should be SE(3)-equivariant with respect to the antigen:

$$g \cdot \vec{\mathcal{G}}_C = \phi_\theta(g \cdot \vec{\mathcal{G}}_A, g \cdot \vec{\mathcal{G}}_H, g \cdot \vec{\mathcal{G}}_L), \quad \forall g \in \mathrm{SE}(3). \tag{75}$$

Methods: Antibodies are of great significance in therapeutics and biology, and many works have been dedicated to designing antibodies with desired binding specificity and affinity [17,32,238–240,242,243]. RefineGNN [239] makes the first attempt, designing CDRs on the heavy chain only. Then MEAN [32] and DiffAb [238] extend to the complete setting where the entire complex (i.e., the antigen, the heavy chain, and the light chain) without CDRs is given as context. Notably, MEAN [32] adopts a GMN-like [51] multi-channel architecture to encode the backbone atoms of the residues, and proposes an equivariant attention mechanism to capture interactions between different geometric components. MEAN is later upgraded to dyMEAN [17], which proposes a dynamic multi-channel encoder to capture the full-atom geometry of residues and tackles a more challenging setting where the entire structure and docking pose of the antibody need to be generated instead of being given as context. DiffAb [238] proposes a diffusion generative model for antibody design. Similarly, AbDiffuser [243] also adopts a diffusion-based generative model, but goes a step further by projecting each side chain into 4 pseudo-carbon atoms to capture the full-atom geometry and handling length change by placeholders in the sequence. ADesigner [241] proposes a cross-gate MLP to facilitate the integration of sequences and structures. Unlike the aforementioned approaches, AbODE [242] explores graph PDEs for antibody design. GeoAB [245] uses torsional prior knowledge with an equivariant neural network focusing on bond lengths, bond angles, and dihedrals. RADD [246] introduces more node features, edge features, and edge relations to include more contextual and geometric information for designing the CDRs.
Further, [240] utilizes pretrained antibody language models to improve the quality of sequence-structure co-design, and tFold-Ab [244] also employs a pretrained language model (i.e., ESM-PPI), along with feature updating (i.e., Evoformer-Single) and structure modules, to enable efficient and accurate prediction of antibody structures directly from sequence.

5.6.5 Peptide design

Peptides, which consist of short sequences of amino acids, represent the intermediate modality between small molecules and proteins, and play a critical role in various biological functions. This unique position makes functional peptide design particularly appealing for both biological research and therapeutic applications [345,346].

Task definition: Similar to antibody design, peptide design typically involves generating binding peptides for a given binding area on the target protein. Denoting the target as $\vec{\mathcal{G}}_B$ and the peptide as $\vec{\mathcal{G}}_P$, we can formalize the task as follows:

$$\vec{\mathcal{G}}_P = \phi_\theta(\vec{\mathcal{G}}_B). \tag{76}$$

Symmetry preserved: Akin to antibody design, the output of the model is required to maintain invariance in the sequence distribution and equivariance in the structure distribution in terms of the E(3) group.

Datasets: PepBDB [305] collects 13K protein-peptide complexes with peptides containing fewer than 50 residues from the Protein Data Bank [294]. [307] curates a diverse and non-redundant dataset of 96 protein-peptide complexes, with peptides between 4 and 25 residues, which is referred to as the Long Non-Redundant (LNR) dataset. PepGLAD [35] further collects 6K non-redundant protein-peptide complexes, also featuring peptides between 4 and 25 residues, and partitions them based on the sequence identity of the receptors for training and validation, employing LNR as the test set.
Methods: While conventional approaches rely on empirical energy functions to sample and optimize sequences and structures at the residue or fragment level [347,348], recent advances in geometric molecular design shed light on deep generative models. HelixGAN [247] focuses on a sub-family of peptides with $\alpha$-helices. RFdiffusion [13], which is originally designed for protein generation, also explores supervised fine-tuning for target-specific peptide design. PepGLAD [35] takes a step further by tackling sequence-structure co-design with a geometric latent diffusion model.

5.7 Tasks on other domains

We briefly review the applications on other domains such as crystals and RNAs.

5.7.1 Crystal property prediction

In the realm of material science, the prediction of crystalline properties stands as a cornerstone for the innovation of new materials. Unlike molecules or proteins, which consist of a finite number of atoms, crystals are characterized by their periodic repetition throughout infinite 3D space. One of the main challenges lies in capturing this unique periodicity using geometric graph neural networks.

Task definition: The infinite crystal structure is commonly simplified by its repeating unit, which is called a unit cell and is represented as $\vec{\mathcal{G}} = (\vec{L}, \vec{X}, H)$, where $\vec{X}, H$ are the coordinate matrix and feature matrix as defined before, and the additional matrix $\vec{L} = [\vec{l}_1, \vec{l}_2, \vec{l}_3]^{\top} \in \mathbb{R}^{3 \times 3}$ consists of three lattice vectors determining the periodicity of the crystal. The task is to predict the property $y \in \mathbb{R}$ of the entire structure via the predictor $\phi_\theta$:

$$y = \phi_\theta(\vec{L}, \vec{X}, H). \tag{77}$$

Symmetry preserved: The output of the predictor should be invariant with respect to several types of groups: 1) E(3) invariance of both the coordinates $\vec{X}$ and the lattice $\vec{L}$; 2) periodic translation invariance of $\vec{X}$; 3) cell choice invariance owing to periodicity, with details referred to [259].
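The first two invariances can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not a method from the cited works: the rows of `L` are taken to be the lattice vectors and the rows of `X` the Cartesian atom positions, so fractional coordinates are computed as `X @ inv(L)`; `periodic_distances` searches only the 27 neighbouring cell images, which is adequate for cells that are not strongly skewed; and `toy_crystal_property` is a hypothetical stand-in for the predictor $\phi_\theta$. Cell choice invariance is not covered here.

```python
import numpy as np

def periodic_distances(L, X):
    """Pairwise distances between atoms under periodic boundary conditions.

    L: (3, 3) lattice matrix, one lattice vector per row (assumed convention).
    X: (N, 3) Cartesian coordinates, one atom per row.
    """
    F = X @ np.linalg.inv(L)                       # fractional coordinates
    d = F[:, None, :] - F[None, :, :]              # (N, N, 3) fractional differences
    d -= np.round(d)                               # wrap each component into [-0.5, 0.5]
    offsets = np.array([[i, j, k] for i in (-1, 0, 1)
                        for j in (-1, 0, 1) for k in (-1, 0, 1)], dtype=float)
    cand = (d[:, :, None, :] + offsets[None, None, :, :]) @ L   # (N, N, 27, 3)
    return np.linalg.norm(cand, axis=-1).min(axis=-1)           # closest image

def toy_crystal_property(L, X):
    """Hypothetical scalar property that depends on periodic distances only."""
    return float(np.exp(-periodic_distances(L, X)).sum())

rng = np.random.default_rng(0)
L = np.array([[4.0, 0.0, 0.0], [0.5, 5.0, 0.0], [0.3, 0.2, 6.0]])  # made-up cell
X = rng.uniform(size=(5, 3)) @ L                                   # atoms inside the cell
y = toy_crystal_property(L, X)

# 1) E(3): apply one orthogonal matrix Q to lattice and atoms, plus a translation t.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)
y_e3 = toy_crystal_property(L @ Q, X @ Q + t)

# 2) Periodic translation: move one atom by an integer combination of lattice vectors.
X_shift = X.copy()
X_shift[0] += 2 * L[0] - L[2]
y_shift = toy_crystal_property(L, X_shift)

print(np.isclose(y, y_e3), np.isclose(y, y_shift))
```

Working with differences of fractional coordinates is what makes both checks pass: a joint rotation of lattice and atoms leaves the fractional coordinates unchanged, and a lattice shift changes them only by integers, which the wrapping step removes.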
Datasets: Materials Project (MP) [308] and JARVIS-DFT [312] are two commonly-used datasets. In particular, MP is an open-access database containing more than 150K crystal structures with several properties collected by DFT calculation. JARVIS-DFT, part of the Joint Automated Repository for Various Integrated Simulations (JARVIS), is also calculated by DFT and provides more unique properties of materials like solar efficiency and magnetic moment.

Methods: To take the periodicity into consideration, CGCNN [249] proposes the multi-edge graph construction to model the interactions across the periodic boundaries. MEGNet [250] additionally updates the global state attributes during the message-passing procedure. ALIGNN [251] composes two GNNs for both the atomic bond graph and its line graph to capture the interactions among atomic triplets. ECN [252] incorporates space group symmetries into the GNNs for more powerful expressivity. Matformer [253] utilizes self-connecting edges to explicitly introduce the lattice matrix $\vec{L}$ into the transformer-based framework. To utilize the large amount of unlabeled data, Crystal Twins [254] applies two contrastive frameworks, Barlow Twins [349] and SimSiam [350], to pre-train the CGCNN models, and MMPT [255] proposes a mutex mask strategy to force the model to learn representations from two disjoint parts of the crystal.

5.7.2 Crystal generation

Besides predicting the invariant properties of 3D crystals, the rapid progress of geometric graph neural networks has also paved the way to de novo material design, whose goal is to generate novel crystal structures beyond the existing databases.
Task definition: Crystal generation methods commonly integrate geometric graph neural networks into deep generative frameworks, which aim to learn the distribution from a given dataset, so that new crystals can be generated by sampling from the learned distribution:

$$\vec{L}, \vec{X}, H \sim p_\theta(\vec{L}, \vec{X}, H). \tag{78}$$

Symmetry preserved: Similar to the property prediction task, the learned distribution is also required to be invariant in terms of the E(3) group and periodicity.

Datasets: CDVAE [257] collects three datasets, named Perov-5 [309,310], Carbon-24 [311], and MP-20 [308], to evaluate the generative models on different crystal distributions.

Methods: CDVAE [257] incorporates a diffusion-based decoder into a VAE-based framework, by first predicting the lattice parameters from the latent space, and then updating the atom types and coordinates according to the predicted lattice. SyMat [49] refines this approach by generating atom types as permutation invariant sets and employing coordinate score matching for the edges. DiffCSP [50], originally aiming at predicting crystal structures from a given composition, also excels in generating structures from scratch. DiffCSP adopts the fractional coordinates $F = \vec{L}^{-1}\vec{X}$ instead of the Cartesian coordinates, and jointly generates the lattice matrix, atom types, and coordinates via a diffusion-based framework. DiffCSP++ [258] extends DiffCSP with the conditions of lattice families and Wyckoff coordinates to maintain the space group constraints. Recently, MatterGen [259] further advances the joint diffusion method, and specializes the lattice diffusion process to be cubic-prior and rotation-fixed.

5.7.3 RNA 3D structure ranking

RNA, or ribonucleic acid, is a pivotal type of molecule that goes beyond its traditional role as a mere intermediary between DNA and protein synthesis. Its functionality heavily relies on its intricate three-dimensional structure, making the prediction and ranking of RNA's 3D conformation crucial.
This structural complexity enables RNA to participate in gene regulation, cellular communication, and catalysis, underscoring its significance in fundamental life processes. As a result, RNA stands at the forefront of molecular biology and biotechnology research.

Task definition: Here, ranking 3D RNA structures refers to the task of identifying which structure most accurately reflects the RNA's actual shape from a pool of imprecise candidates. In other words, the score model $\phi_\theta$ is required to estimate the root-mean-square deviation (RMSD) between each candidate 3D RNA structure, represented by a geometric graph $\vec{G}$, and the ground truth:

$$s = \phi_\theta(\vec{G}). \quad (79)$$

Symmetry preserved: This is an invariant task, because the RMSD between the candidate structure and the ground truth is unaffected by any translation or rotation applied to the candidate structure.

Methods: ARES [35] leverages e3nn [351] to model the 3D structure of RNA, ensuring equivariance and invariance during the update of atomic features. ARES then aggregates the features of all atoms to predict the RMSD value. In contrast, PaxNet [264] employs a two-layer multiplex graph to model the 3D structure of RNA: one layer captures local interactions, while the other focuses on non-local interactions. EquiRNA [265] introduces a hierarchical equivariant graph neural network with a size-insensitive K-nearest neighbor sampling strategy, aimed at solving the size generalization challenge through the reuse of nucleotide representations.

Datasets: ARES [35] uses a collection of 18K records from the FARFAR2-Classics dataset [352] as its training and validation sets. In addition, two test sets were constructed: the first was selected from the FARFAR2-Puzzles dataset [352]; the second was curated based on certain criteria and built using the FARFAR2 rna_denovo application.
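As a small numerical check of the invariance noted under "Symmetry preserved" above, the sketch below applies a rigid motion to a toy structure and confirms that a score computed only from interatomic distances does not change. It is an illustration under stated assumptions, not the scoring model of ARES, PaxNet, or EquiRNA: `toy_score` is a hypothetical stand-in for $\phi_\theta$, the four coordinates are made up, and the rotation is restricted to the z-axis for brevity.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Translate a 3D point by the vector `shift`."""
    return tuple(p + t for p, t in zip(point, shift))


def pairwise_distances(coords):
    """Sorted list of all interatomic distances: an invariant descriptor."""
    n = len(coords)
    return sorted(
        math.dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)
    )


def toy_score(coords):
    """Hypothetical stand-in for the score model: mean interatomic distance."""
    d = pairwise_distances(coords)
    return sum(d) / len(d)


# Made-up candidate structure and a rigidly moved copy of it.
candidate = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 1.0, 2.0)]
moved = [translate(rotate_z(p, 0.7), (3.0, -1.0, 2.5)) for p in candidate]

print(math.isclose(toy_score(candidate), toy_score(moved), rel_tol=1e-9))  # prints True
```

Any model whose output depends on the coordinates only through such invariant scalars inherits the same property, which is why an invariant architecture is a natural fit for predicting an RMSD-based score.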
EquiRNA [265] introduces rRNAsolo, a new dataset for assessing size generalization in RNA structure evaluation. It covers a wider range of RNA sizes, more RNA types, and more recent RNA structures than existing datasets.

6 Discussion and future prospect

While much progress has been made in this field, a broad range of research directions remains open. We discuss several examples as follows.

Geometric graph foundation model. Recent advancements in AI research, exemplified by the remarkable progress of models like the GPT series [353–355] and Gato [356], have brought about substantial advantages by employing a unified foundation model across various tasks and domains. Foundation models diminish the necessity of manually crafting inductive biases for individual domains, amplify the volume and variety of training data, and hold promise for further enhancement with increased data, computational resources, and model complexity. It is natural to seek to replicate such success in the geometric domain. However, how to do so remains an interesting open question, especially considering the following design spaces.
1. Task space: How to pretrain a large-scale model that is generally beneficial to various downstream tasks?
2. Data space: How to build a foundation model that can simultaneously extract rich information spanning different types or scales of geometric data?
3. Model space: How to truly scale the model in terms of capacity and expressivity, such that more knowledge can be captured and stored in the model?
Although some initial works (such as EPT [90]) manage to pretrain a unified model on small molecules and proteins, the field still lacks a universal model that can tackle more kinds of input data and tasks.

Effective loop between model training and real-world experimental verification. Unlike typical applications in vision and NLP, tasks in science usually require expensive labor, computational resources, and instruments to produce data, conduct verification, and record results. Existing research often adopts an open-loop style, where datasets are collected beforehand and proposed models are evaluated offline on these datasets. However, this approach presents two significant issues. Firstly, the constructed datasets are often small and insufficient for training geometric GNNs, especially for data-hungry foundation models equipped with large-scale parameters. Secondly, evaluating models solely on standalone datasets may fail to reflect feedback from the real world, resulting in less reliable evaluation of the model's true ability. These issues can be effectively addressed by training and testing geometric GNNs within a closed loop between model prediction and experimental verification. A notable example is provided by GNoME [357], which integrates an end-to-end pipeline consisting of graph network training, DFT computations, and autonomous laboratories for materials discovery and synthesis. It is expected that such a research paradigm will become increasingly important in future studies related to scientific applications.

Integration with large language models. Large Language Models (LLMs) have been extensively shown to possess a wealth of knowledge spanning various domains. Moreover, domain-specific Language Model Agents (LMAs) that exhibit high levels of expertise in specific areas have been developed [358,359]. Many of the tasks under discussion are intricately linked with the natural sciences, such as physics, biochemistry, and material science, and often require a deep understanding of domain-specific knowledge. It is therefore compelling to enhance the existing knowledge base by integrating LLM agents into the training and evaluation pipeline of geometric Graph Neural Networks (GNNs).
This integration holds promise for augmenting the capabilities of GNNs by leveraging the comprehensive knowledge representations offered by LLMs, thereby potentially improving the performance and robustness of these models in scientific applications. While there have been works leveraging LLMs for certain tasks such as molecule property prediction and drug design, they only operate on motifs [360,361] or molecule graphs [362]. It remains challenging to bridge them with geometric graph neural networks so that the pipeline can process 3D structural information and perform prediction and/or generation over 3D structures.

Relaxation of equivariance. While equivariance is pivotal for improving data efficiency and promoting generalization across diverse datasets, rigidly adhering to equivariance can sometimes overly constrain the model, potentially compromising its performance. Thus, methodologies that offer a degree of flexibility in relaxing equivariance constraints are of considerable significance. By exploring approaches that strike a balance between maintaining equivariance and accommodating adaptability, researchers can enhance the practical utility of models. Several pioneering studies [363,364] relax the equivariance to a certain discrete point group and achieve a remarkable improvement on various dynamic physical systems, ranging from particle to vehicle dynamics. This exploration may not only enrich our understanding of model behavior but also pave the way for the development of more robust and versatile solutions with broader applicability.

7 Conclusion

In this survey, we conduct a systematic investigation of the progress in geometric Graph Neural Networks (GNNs), through the lens of data structures, models, and their applications.
We specify the geometric graph as the data structure, which generalizes the concept of a graph in the presence of geometric information and permits the vital symmetry under certain transformations. We present geometric GNNs as the models, which consist of invariant GNNs, scalarization-based/high-degree steerable equivariant GNNs, and geometric graph transformers. We exhaustively discuss their applications through the taxonomy on the data and tasks, including both single-instance and multi-instance tasks over domains in physics, biochemistry, and others like materials and RNAs. We also discuss the challenges and the future potential directions of geometric GNNs.

Acknowledgement This work was jointly supported by the following projects: The National Natural Science Foundation of China (Grant Nos. 62376276 and 62172422); Beijing Nova Program (Grant No. 20230484278); the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (Grant No. 23XNKJ19); and Tencent AI Lab Rhino-Bird Focused Research Program.

Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit creativecommons.org/licenses/by/4.0/

References

1. Bronstein M M, Bruna J, Cohen T, Veličković P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. 2021, arXiv preprint arXiv: 2104.13478
2. Schütt K T, Arbabzadah F, Chmiela S, Müller K R, Tkatchenko A. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 2017, 8: 13890
3. Klicpera J, Groß J, Günnemann S. Directional message passing for molecular graphs. In: Proceedings of the 8th International Conference on Learning Representations. 2020
4. Klicpera J, Becker F, Günnemann S. GemNet: universal directional graph neural networks for molecules. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021, 520
5. Satorras V G, Hoogeboom E, Welling M. E(n) equivariant graph neural networks. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 9323–9332
6. Schütt K, Unke O, Gastegger M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 9377–9388
7. Thomas N, Smidt T, Kearnes S, Yang L, Li L, Kohlhoff K, Riley P. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. 2018, arXiv preprint arXiv: 1802.08219
8. Fuchs F B, Worrall D E, Fischer V, Welling M. SE(3)-Transformers: 3D roto-translation equivariant attention networks. In: Proceedings of the 34th Conference on Neural Information Processing Systems. 2020
9. Brandstetter J, Hesselink R, van der Pol E, Bekkers E J, Welling M. Geometric and physical quantities improve E(3) equivariant message passing. In: Proceedings of the 10th International Conference on Learning Representations. 2022
10. Batzner S, Musaelian A, Sun L, Geiger M, Mailoa J P, Kornbluth M, Molinari N, Smidt T E, Kozinsky B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 2022, 13(1): 2453
11. Liao Y L, Smidt T E. Equiformer: equivariant graph attention transformer for 3D atomistic graphs. In: Proceedings of the 11th International Conference on Learning Representations. 2023
12. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science, 2021, 373(6557): 871–876
13. Watson J L, Juergens D, Bennett N R, Trippe B L, Yim J, et al. De novo design of protein structure and function with RFdiffusion. Nature, 2023, 620(7976): 1089–1100
14. Ingraham J B, Baranov M, Costello Z, Barber K W, Wang W, et al. Illuminating protein space with a programmable generative model. Nature, 2023, 623(7989): 1070–1078
15. Townshend R J L, Eismann S, Watkins A M, Rangan R, Karelina M, Das R, Dror R O. Geometric deep learning of RNA structure. Science, 2021, 373(6558): 1047–1051
16. Corso G, Stärk H, Jing B, Barzilay R, Jaakkola T S. DiffDock: diffusion steps, twists, and turns for molecular docking. In: Proceedings of the 11th International Conference on Learning Representations. 2023
17. Kong X, Huang W, Liu Y. End-to-end full-atom antibody design. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 718
18. Gilmer J, Schoenholz S S, Riley P F, Vinyals O, Dahl G E. Neural message passing for quantum chemistry. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 1263–1272
19. McNutt A T, Francoeur P, Aggarwal R, Masuda T, Meli R, Ragoza M, Sunseri J, Koes D R. GNINA 1.0: molecular docking with deep learning. Journal of Cheminformatics, 2021, 13(1): 43
20. Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, Weitzner B D, Hu X, Adachi Y, Schief W R, Dunbrack Jr R L. RosettaAntibodyDesign (RAbD): a general framework for computational antibody design. PLoS Computational Biology, 2018, 14(4): e1006112
21. Ramakrishnan R, Dral P O, Rupp M, von Lilienfeld O A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 2014, 1: 140022
22. Liu Z, Su M, Han L, Liu J, Yang Q, Li Y, Wang R. Forging the basis for developing protein–ligand interaction scoring functions. Accounts of Chemical Research, 2017, 50(2): 302–309
23. Dunbar J, Krawczyk K, Leem J, Baker T, Fuchs A, Georges G, Shi J, Deane C M. SAbDab: the structural antibody database. Nucleic Acids Research, 2014, 42(D1): D1140–D1146
24. Han J, Rong Y, Xu T, Huang W. Geometrically equivariant graph neural networks: a survey. 2022, arXiv preprint arXiv: 2202.07230
25. Han J, Huang W, Ma H, Li J, Tenenbaum J B, Gan C. Learning physical dynamics with subequivariant graph neural networks. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022
26. Sanchez-Gonzalez A, Godwin J, Pfaff T, Ying R, Leskovec J, Battaglia P. Learning to simulate complex physics with graph networks. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 8459–8468
27. Kipf T, Fetaya E, Wang K C, Welling M, Zemel R. Neural relational inference for interacting systems. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 2688–2697
28. Huang Y, Peng X, Ma J, Zhang M. 3DLinker: an E(3) equivariant variational autoencoder for molecular linker design. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 9280–9294
29. Guan J, Qian W W, Peng X, Su Y, Peng J, Ma J. 3D equivariant diffusion for target-aware molecule generation and affinity prediction. In: Proceedings of the 11th International Conference on Learning Representations. 2023
30. Jing B, Corso G, Chang J, Barzilay R, Jaakkola T. Torsional diffusion for molecular conformer generation. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1760
31. Wu L, Hou Z, Yuan J, Rong Y, Huang W. Equivariant spatio-temporal attentive graph networks to simulate physical dynamics. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1965
32. Kong X, Huang W, Liu Y. Conditional antibody design as 3D equivariant graph translation. In: Proceedings of the 11th International Conference on Learning Representations. 2023
33. Senior A W, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson A W R, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones D T, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature, 2020, 577(7792): 706–710
34. Chanussot L, Das A, Goyal S, Lavril T, Shuaibi M, Riviere M, Tran K, Heras-Domingo J, Ho C, Hu W, Palizhati A, Sriram A, Wood B, Yoon J, Parikh D, Zitnick C L, Ulissi Z. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catalysis, 2021, 11(10): 6059–6072
35. Kong X, Jia Y, Huang W, Liu Y. Full-atom peptide design with geometric latent diffusion. In: Proceedings of the 38th Conference on Neural Information Processing Systems. 2024
36. Duval A, Mathis S V, Joshi C K, Schmidt V, Miret S, Malliaros F D, Cohen T, Liò P, Bengio Y, Bronstein M. A hitchhiker's guide to geometric GNNs for 3D atomic systems. 2024, arXiv preprint arXiv: 2312.07511
37. Xia J, Zhu Y, Du Y, Li S Z. A systematic survey of chemical pretrained models. In: Proceedings of the 32nd International Joint Conference on Artificial Intelligence. 2023, 6787–6795
38. Guo Z, Guo K, Nan B, Tian Y, Iyer R G, Ma Y, Wiest O, Zhang X, Wang W, Zhang C, Chawla N V. Graph-based molecular representation learning. In: Proceedings of the 32nd International Joint Conference on Artificial Intelligence. 2023, 6638–6646
39. Atz K, Grisoni F, Schneider G. Geometric deep learning on molecular representations. Nature Machine Intelligence, 2021, 3(12): 1023–1032
40. Zhang X, Wang L, Helwig J, Luo Y, Fu C, et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. 2025, arXiv preprint arXiv: 2307.08423
41. Esteves C. Theoretical aspects of group equivariant neural networks. 2020, arXiv preprint arXiv: 2004.05154
42. Cederberg J. A course in modern geometries. Springer Science & Business Media, 2004
43. Wu Z, Pan S, Chen F, Long G, Zhang C, Philip S Y. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4–24
44. Yuan Z, Wei Z, Lv F, Wen J R. Index-free triangle-based graph local clustering. Frontiers of Computer Science, 2024, 18(3): 183404
45. Wu Z, Ramsundar B, Feinberg E N, Gomes J, Geniesse C, Pappu A S, Leswing K, Pande V. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 2018, 9(2): 513–530
46. Villar S, Hogg D W, Storey-Fisher K, Yao W, Blum-Smith B. Scalars are universal: equivariant machine learning, structured like classical physics. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
47. Schütt K T, Sauceda H E, Kindermans P J, Tkatchenko A, Müller K R. SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 2018, 148(24): 241722
48. Baek M, Anishchenko I, Humphreys I R, Cong Q, Baker D, DiMaio F. Efficient and accurate prediction of protein structure using RoseTTAFold2. bioRxiv, 2023
49. Luo Y, Liu C, Ji S. Towards symmetry-aware generation of periodic materials. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
50. Jiao R, Huang W, Lin P, Han J, Chen P, Lu Y, Liu Y. Crystal structure prediction by joint equivariant diffusion. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 767
51. Huang W, Han J, Rong Y, Xu T, Sun F, Huang J. Equivariant graph mechanics networks with constraints. In: Proceedings of the 10th International Conference on Learning Representations. 2022
52. Gasteiger J, Giri S, Margraf J T, Günnemann S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. 2022, arXiv preprint arXiv: 2011.14115
53. Zhu F, Futrega M, Bao H, Eryilmaz S B, Kong F, Duan K, Zheng X, Angel N, Jouanneaux M, Stadler M, Marcinkiewicz M, Xie F, Yang J, Andersch M. FastDimeNet++: training DimeNet++ in 22 minutes. In: Proceedings of the 52nd International Conference on Parallel Processing. 2023, 274–284
54. Finzi M, Stanton S, Izmailov P, Wilson A G. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 3165–3176
55. Liu Y, Wang L, Liu M, Lin Y, Zhang X, Oztekin B, Ji S. Spherical message passing for 3D molecular graphs. In: Proceedings of the 10th International Conference on Learning Representations. 2022
56. Wang L, Liu Y, Lin Y, Liu H, Ji S. ComENet: towards complete and efficient message passing for 3D molecular graphs. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 47
57. Li Z, Wang X, Huang Y, Zhang M. Is distance matrix enough for geometric deep learning? In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1627
58. Li Z, Wang X, Kang S, Zhang M. On the completeness of invariant geometric deep learning models. 2024, arXiv preprint arXiv: 2402.04836
59. Yue A, Luo D, Xu H. A plug-and-play quaternion message-passing module for molecular conformation representation. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 16633–16641
60. Du W, Zhang H, Du Y, Meng Q, Chen W, Zheng N, Shao B, Liu T Y. SE(3) equivariant graph neural networks with complete local frames. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 5583–5608
61. Kofinas M, Nagaraja N S, Gavves E. Roto-translated local coordinate frames for interacting dynamical systems. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
62. Kofinas M, Bekkers E J, Nagaraja N S, Gavves E. Latent field discovery in interacting dynamical systems with neural fields. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1379
63. Köhler J, Klein L, Noé F. Equivariant flows: sampling configurations for multi-body systems with symmetric energies. 2019, arXiv preprint arXiv: 1910.00753
64. Jing B, Eismann S, Suriana P, Townshend R J L, Dror R O. Learning from protein structure with geometric vector perceptrons. In: Proceedings of the 9th International Conference on Learning Representations. 2021
65. Han J, Huang W, Xu T, Rong Y. Equivariant graph hierarchy-based neural networks. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022
66. Zhang Y, Cen J, Han J, Zhang Z, Zhou J, Huang W. Improving equivariant graph neural networks on large geometric graphs via virtual nodes learning. In: Proceedings of the 41st International Conference on Machine Learning. 2024
67. Puny O, Atzmon M, Smith E J, Misra I, Grover A, Ben-Hamu H, Lipman Y. Frame averaging for invariant and equivariant network design. In: Proceedings of the 10th International Conference on Learning Representations. 2022
68. Duval A A, Schmidt V, Hernández-García A, Miret S, Malliaros F D, Bengio Y, Rolnick D. FAENet: frame averaging equivariant GNN for materials modeling. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 9013–9033
69. Du W, Du Y, Wang L, Feng D, Wang G, Ji S, Gomes C P, Ma Z M. A new perspective on building efficient and expressive 3D equivariant graph neural networks. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 2910
70. Aykent S, Xia T. SaVeNet: a scalable vector network for enhanced molecular representation learning. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1860
71. Wang Y, Wang T, Li S, He X, Li M, Wang Z, Zheng N, Shao B, Liu T Y. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nature Communications, 2024, 15(1): 313
72. Wang Z, Liu G, Zhou Y, Wang T, Shao B. QuinNet: efficiently incorporating quintuple interactions into geometric deep learning force fields. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 3368
73. Cen J, Li A, Lin N, Ren Y, Wang Z, Huang W. Are high-degree representations really unnecessary in equivariant graph neural networks? In: Proceedings of the 38th Conference on Neural Information Processing Systems. 2024
74. Battiloro C, Karaismailoglu E, Tec M, Dasoulas G, Audirac M, Dominici F. E(n) equivariant topological neural networks. In: Proceedings of the 13th International Conference on Learning Representations. 2025
75. Li Z, Cen J, Su B, Huang W, Xu T, Rong Y, Zhao D. Large language-geometry model: when LLM meets equivariance. 2025, arXiv preprint arXiv: 2502.11149
76. Anderson B, Hy T S, Kondor R. Cormorant: covariant molecular neural networks. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019
77. Musaelian A, Batzner S, Johansson A, Sun L, Owen C J, Kornbluth M, Kozinsky B. Learning local equivariant representations for large-scale atomistic dynamics. Nature Communications, 2023, 14(1): 579
78. Zitnick C L, Das A, Kolluru A, Lan J, Shuaibi M, Sriram A, Ulissi Z, Wood B. Spherical channels for modeling atomic interactions. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 585
79. Passaro S, Zitnick C L. Reducing SO(3) convolutions to SO(2) for efficient equivariant GNNs. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 1140
80. Batatia I, Kovács D P, Simm G N C, Ortner C, Csányi G. MACE: higher order equivariant message passing neural networks for fast and accurate force fields. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 11423–11436
81. Ying C, Cai T, Luo S, Zheng S, Ke G, He D, Shen Y, Liu T Y. Do transformers really perform bad for graph representation? In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
82. Shi Y, Zheng S, Ke G, Shen Y, You J, He J, Luo S, Liu C, He D, Liu T Y. Benchmarking graphormer on large-scale molecular modeling datasets. 2023, arXiv preprint arXiv: 2203.04810
83. Thölke P, de Fabritiis G. Equivariant transformers for neural network based molecular potentials. In: Proceedings of the 10th International Conference on Learning Representations. 2022
84. Hutchinson M J, Le Lan C, Zaidi S, Dupont E, Teh Y W, Kim H. LieTransformer: equivariant self-attention for Lie groups. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 4533–4543
85. Hsu C, Verkuil R, Liu J, Lin Z, Hie B, Sercu T, Lerer A, Rives A. Learning inverse folding from millions of predicted structures. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 8946–8970
86. Liao Y L, Wood B M, Das A, Smidt T E. EquiformerV2: improved equivariant transformer for scaling to higher-degree representations. In: Proceedings of the 12th International Conference on Learning Representations. 2024
87. Wang Y, Li S, Wang T, Shao B, Zheng N, Liu T Y. Geometric transformer with interatomic positional encoding. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
88. Frank J T, Unke O T, Müller K R, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nature Communications, 2024, 15(1): 6539
89. Aykent S, Xia T. GotenNet: rethinking efficient 3D equivariant graph neural networks. In: Proceedings of the 13th International Conference on Learning Representations. 2025
90. Jiao R, Kong X, Yu Z, Huang W, Liu Y. Equivariant pretrained transformer for unified geometric learning on multi-domain 3D molecules. 2025, arXiv preprint arXiv: 2402.12714v1
91. Ma H, Bian Y, Rong Y, Huang W, Xu T, Xie W, Ye G, Huang J. Cross-dependent graph neural networks for molecular property prediction. Bioinformatics, 2022, 38(7): 2003–2009
92. Zhang M, Li P. Nested graph neural networks. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021, 15734–15747
93. Qin S, Zhang X, Xu H, Xu Y. Fast quaternion product units for learning disentangled representations in SO(3). IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 4504–4520
94. Zhu X, Xu Y, Xu H, Chen C. Quaternion convolutional neural networks. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 645–661
95. Zhang X, Qin S, Xu Y, Xu H. Quaternion product units for deep learning on 3D rotation groups. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 7302–7311
96. Joshi C K, Bodnar C, Mathis S V, Cohen T, Liò P. On the expressive power of geometric graph neural networks. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 625
97. Gilmore R. Lie Groups, Physics, and Geometry: An Introduction for Physicists, Engineers and Chemists. Cambridge: Cambridge University Press, 2008
98. Müller C. Spherical Harmonics. Berlin: Springer, 2006
99. Griffiths D J, Schroeter D F. Introduction to Quantum Mechanics. Cambridge: Cambridge University Press, 2018
100. Weiler M, Geiger M, Welling M, Boomsma W, Cohen T. 3D steerable CNNs: learning rotationally equivariant features in volumetric data. In: Proceedings of the 32nd Conference on Neural Information Processing Systems. 2018, 31
101. Ramachandran P, Zoph B, Le Q V. Searching for activation functions. In: Proceedings of the 6th International Conference on Learning Representations. 2018
102. Drautz R. Atomic cluster expansion for accurate and transferable interatomic potentials. Physical Review B, 2019, 99(1): 014104
103. Dusson G, Bachmayr M, Csányi G, Drautz R, Etter S, van der Oord C, Ortner C. Atomic cluster expansion: completeness, efficiency and stability. Journal of Computational Physics, 2022, 454: 110946
104. Bochkarev A, Lysogorskiy Y, Menon S, Qamar M, Mrovec M, Drautz R. Efficient parametrization of the atomic cluster expansion. Physical Review Materials, 2022, 6(1): 013804
105. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 6000–6010
106. Yuan C, Zhao K, Kuruoglu E E, Wang L, Xu T, Huang W, Zhao D, Cheng H, Rong Y. A survey of graph transformers: architectures, theories and applications. 2025, arXiv preprint arXiv: 2502.16533
107. Hu W, Fey M, Ren H, Nakata M, Dong Y, Leskovec J. OGB-LSC: a large-scale challenge for machine learning on graphs. In: Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. 2021
108. Shuaibi M, Kolluru A, Das A, Grover A, Sriram A, Ulissi Z, Zitnick C L. Rotation invariant graph neural networks using spin convolutions. 2021, arXiv preprint arXiv: 2106.09575
109. Dym N, Maron H. On the universality of rotation equivariant point cloud networks. In: Proceedings of the 9th International Conference on Learning Representations. 2021
110. Weisfeiler B, Leman A. The reduction of a graph to canonical form and the algebra which appears therein. Nauchno-Technicheskaya Informatsia, 1968, 2(9): 12–16
111. Lawrence H, Portilheiro V, Zhang Y, Kaba S O. Improving equivariant networks with probabilistic symmetry breaking. In: Proceedings of the Geometry-Grounded Representation Learning and Generative Modeling at 41st International Conference on Machine Learning. 2024
112. Battaglia P, Pascanu R, Lai M, Jimenez Rezende D, Kavukcuoglu K. Interaction networks for learning about objects, relations and physics. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 4509–4517
113. Sanchez-Gonzalez A, Bapst V, Cranmer K, Battaglia P. Hamiltonian graph networks with ODE integrators. 2019, arXiv preprint arXiv: 1909.12790
114. Guo L, Wang W, Chen Z, Zhang N, Sun Z, Lai Y, Zhang Q, Chen H. Newton–Cotes graph neural networks: on the time evolution of dynamic systems. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
115. Allen K R, Guevara T L, Rubanova Y, Stachenfeld K, Sanchez-Gonzalez A, Battaglia P, Pfaff T. Graph network simulators can learn discontinuous, rigid contact dynamics. In: Proceedings of the 6th Conference on Robot Learning. 2023, 1157–1167
116. Rubanova Y, Sanchez-Gonzalez A, Pfaff T, Battaglia P. Constraint-based graph network simulator. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 18844–18870
117. Wu T, Wang Q, Zhang Y, Ying R, Cao K, Sosic R, Jalali R, Hamam H, Maucec M, Leskovec J. Learning large-scale subsurface simulations with a hybrid graph network simulator. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 4184–4194
118. Li Y, Wu J, Tedrake R, Tenenbaum J B, Torralba A. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In: Proceedings of the 7th International Conference on Learning Representations. 2019
119. Mrowca D, Zhuang C, Wang E, Haber N, Fei-Fei L, Tenenbaum J B, Yamins D L K. Flexible neural representation for physics prediction. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018, 8813–8824
120. Allen K R, Rubanova Y, Lopez-Guevara T, Whitney W, Sanchez-Gonzalez A, Battaglia P W, Pfaff T. Learning rigid dynamics with face interaction graph networks. In: Proceedings of the 11th International Conference on Learning Representations. 2023
121. Xu C, Tan R T, Tan Y, Chen S, Wang Y G, Wang X, Wang Y. EqMotion: equivariant multi-agent motion prediction with invariant interaction reasoning. In: Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, 1410–1420
122. Liu Y, Cheng J, Zhao H, Xu T, Zhao P, Tsung F G, Li J, Rong Y. Improving generalization in equivariant graph neural networks with physical inductive biases. In: Proceedings of the 12th International Conference on Learning Representations. 2024
123. Coors B, Condurache A P, Geiger A. SphereNet: learning spherical representations for detection and classification in omnidirectional images. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 525–541
124. Wang X, Zhang M. Graph neural network with local frame for molecular potential energy surface. In: Proceedings of the 1st Learning on Graphs Conference. 2022, 19
125. Luo S, Chen T, Krishnapriyan A S. Enabling efficient equivariant operations in the Fourier basis via Gaunt tensor products. In: Proceedings of the 12th International Conference on Learning Representations. 2024
126. Köhler J, Klein L, Noé F. Equivariant flows: exact likelihood generative learning for symmetric densities. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 5361–5370
127. Xu M, Han J, Lou A, Kossaifi J, Ramanathan A, Azizzadenesheli K, Leskovec J, Ermon S, Anandkumar A. Equivariant graph neural operator for modeling 3D dynamics. In: Proceedings of the 41st International Conference on Machine Learning. 2024
128. Schreiner M, Winther O, Olsson S. Implicit transfer operator learning: multiple time-resolution surrogates for molecular dynamics. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1582
129. Midgley L I, Stimper V, Antorán J, Mathieu E, Schölkopf B, Hernández-Lobato J M. SE(3) equivariant augmented coupling flows. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 3466
130. Han J, Xu M, Lou A, Ye H, Ermon S. Geometric trajectory diffusion models. In: Proceedings of the 38th Conference on Neural Information Processing Systems. 2024
131. Raja S, Amin I, Pedregosa F, Krishnapriyan A S. Stability-aware training of neural network interatomic potentials with differentiable Boltzmann estimators. 2025, arXiv preprint arXiv: 2402.13984v1
132. Amin I, Raja S, Krishnapriyan A S. Towards fast, specialized machine learning force fields: distilling foundation models via energy Hessians. In: Proceedings of the 13th International Conference on Learning Representations. 2025
133. Xu M, Yu L, Song Y, Shi C, Ermon S, Tang J. GeoDiff: a geometric diffusion model for molecular conformation generation. In: Proceedings of the 10th International Conference on Learning Representations. 2022
134. Xu M, Powers A S, Dror R O, Ermon S, Leskovec J. Geometric latent diffusion models for 3D molecule generation. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 38592–38610
135. Xu M, Wang W, Luo S, Shi C, Bengio Y, Gomez-Bombarelli R, Tang J. An end-to-end framework for molecular conformation generation via bilevel programming. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 11537–11547
136. Shi C, Luo S, Xu M, Tang J.
Learning gradient fields for molecular conformation generation. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 9558−9568 Gebauer N W A, Gastegger M, Schutt K T. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019, 32 Gebauer N W A, Gastegger M, Hessmann S S P, Müller K R, Schütt K T. Inverse design of 3D molecular structures with conditional generative neural networks. Nature Communications, 2022, 13(1): 973 Huang L, Zhang H, Xu T, Wong K C. MDM: molecular diffusion model for 3D molecule generation. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2023, 5105−5112 Peng X, Guan J, Liu Q, Ma J. MolDiff: addressing the atom-bond inconsistency problem in 3D molecule diffusion generation. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 27611−27629 Luo S, Shi C, Xu M, Tang J. Predicting molecular conformation via dynamic graph score matching. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021 Satorras V G, Hoogeboom E, Fuchs F B, Posner I, Welling M. E(n) equivariant normalizing flows. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021, 320 Hoogeboom E, Satorras V G, Vignac C, Welling M. Equivariant diffusion for molecule generation in 3D. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 8867−8887 Ganea O E, Pattanaik L, Coley C W, Barzilay R, Jensen K F, Green W H, Jaakkola T S. GEOMOL: torsional geometric generation of molecular 3D conformer ensembles. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021 Wang F, Xu H, Chen X, Lu S, Deng Y, Huang W. MPerformer: an SE(3) transformer-based molecular perceptron. In: Proceedings of the Jiaqi HAN et al. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155. 156. 157. 158. 159. 160. 
161. A survey of geometric graph neural networks: data structures, models and applications 32nd ACM International Conference on Information and Knowledge Management. 2023, 2512−2522 Bao F, Zhao M, Hao Z, Li P, Li C, Zhu J. Equivariant energy-guided SDE for inverse molecular design. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Zhu J, Xia Y, Liu C, Wu L, Xie S, Wang Y, Wang T, Qin T, Zhou W, Li H, Liu H, Liu T Y. Direct molecular conformation generation. Transactions on Machine Learning Research, 2022, See openreview. net/forum?id=lCPOHiztuw website, 2022 Qiang B, Song Y, Xu M, Gong J, Gao B, Zhou H, Ma W Y, Lan Y. Coarse-to-fine: a hierarchical diffusion model for molecule generation in 3D. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 28277–28299 Song Y, Gong J, Xu M, Cao Z, Lan Y, Ermon S, Zhou H, Ma W Y. Equivariant flow matching with hybrid probability transport for 3D molecule generation. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 26 Reidenbach D, Krishnapriyan A S. Coarsenconf: equivariant coarsening with aggregated attention for molecular conformer generation. Journal of Chemical Information and Modeling, 2025, 65(1): 22−30 Song Y, Gong J, Zhou H, Zheng M, Liu J, Ma W Y. Unified generative modeling of 3D molecules with Bayesian flow networks. In: Proceedings of the 12th International Conference on Learning Representations. 2024 Qu Y, Qiu K, Song Y, Gong J, Han J, Zheng M, Zhou H, Ma W Y. MolCRAFT: structure-based drug design in continuous parameter space. In: Proceedings of the 41st International Conference on Machine Learning. 2024 Jiao R, Han J, Huang W, Rong Y, Liu Y. Energy-motivated equivariant pretraining for 3D molecular graphs. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2023, 8096−8104 Liu S, Guo H, Tang J. Molecular geometry pretraining with SE(3)invariant denoising distance matching. 
In: Proceedings of the 11th International Conference on Learning Representations. 2023 Liu S, Wang H, Liu W, Lasenby J, Guo H, Tang J. Pre-training molecular graph representation with 3D geometry. In: Proceedings of the 10th International Conference on Learning Representations. 2022 Zaidi S, Schaarschmidt M, Martens J, Kim H, Teh Y W, SanchezGonzalez A, Battaglia P W, Pascanu R, Godwin J. Pre-training via denoising for molecular property prediction. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Feng J, Wang Z, Li Y, Ding B, Wei Z, Xu H. MGMAE: molecular representation learning by reconstructing heterogeneous graphs with a high mask ratio. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2022, 509−519 Stärk H, Beaini D, Corso G, Tossou P, Dallago C, Gunnemann S, Lió P. 3D infomax improves GNNs for molecular property prediction. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 20479−20502 Zhou G, Gao Z, Ding Q, Zheng H, Xu H, Wei Z, Zhang L, Ke G. UniMol: a universal 3D molecular representation learning framework. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Luo S, Chen T, Xu Y, Zheng S, Liu T Y, Wang L, He D. One transformer can understand both 2D & 3D molecular data. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Liu S, Du W, Ma Z M, Guo H, Tang J. A group symmetric stochastic differential equation model for molecule multi-modal pretraining. In: 162. 163. 164. 165. 166. 167. 168. 169. 170. 171. 172. 173. 174. 175. 176. 177. 178. 31 Proceedings of the 40th International Conference on Machine Learning. 2023, 21497–21526 Ni Y, Feng S, Ma W Y, Ma Z M, Lan Y. Sliced denoising: a physicsinformed molecular pre-training method. In: Proceedings of the 12th International Conference on Learning Representations. 2024 Feng S, Ni Y, Lan Y, Ma Z M, Ma W Y. 
Fractional denoising for 3D molecular pre-training. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 9938−9961 Liu Y, Chen J, Jiao R, Li J, Huang W, Su B. DenoiseVAE: learning molecule-adaptive noise distributions for denoising-based 3D molecular pre-training. In: Proceedings of the 13th International Conference on Learning Representations. 2025 Liu S, Rong Y, Zhao D, Liu Q, Wu S, Wang L. MolSpectra: pretraining 3D molecular representation with multi-modal energy spectra. In: Proceedings of the 13th International Conference on Learning Representations. 2025 Wang Z, Combs S A, Brand R, Calvo M R, Xu P, Price G, Golovach N, Salawu E O, Wise C J, Ponnapalli S P, Clark P M. LM-GVP: an extensible sequence and structure informed deep learning framework for protein property prediction. Scientific Reports, 2022, 12(1): 6832 Gligorijević V, Renfrew P D, Kosciolek T, Leman J K, Berenberg D, Vatanen T, Chandler C, Taylor B C, Fisk I M, Vlamakis H, Xavier R J, Knight R, Cho K, Bonneau R. Structure-based protein function prediction using graph convolutional networks. Nature Communications, 2021, 12(1): 3168 Zhang Z, Xu M, Jamasb A R, Chenthamarakshan V, Lozano A C, Das P, Tang J. Protein representation learning by geometric structure pretraining. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Torng W, Altman R B. 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinformatics, 2017, 18(1): 302 Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the tm-score. Nucleic Acids Research, 2005, 33(7): 2302−2309 Eismann S, Townshend R J L, Thomas N, Jagota M, Jing B, Dror R O. Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes. Proteins: Structure, Function, and Bioinformatics, 2021, 89(5): 493−501 Eismann S, Suriana P, Jing B, Townshend R J L, Dror R O. 
Protein model quality assessment using rotation-equivariant transformations on point clouds. Proteins: Structure, Function, and Bioinformatics, 2023, 91(8): 1089−1096 Chen C, Chen X, Morehead A, Wu T, Cheng J. 3D-equivariant graph neural networks for protein model quality assessment. Bioinformatics, 2023, 39(1): btad030 Tubiana J, Schneidman-Duhovny D, Wolfson H J. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nature Methods, 2022, 19(6): 730−739 Zhang Y, Wei Z, Yuan Y, Ding Z, Huang W. EquiPocket: an E(3)equivariant geometric graph neural network for ligand binding site prediction. In: Proceedings of the 41st International Conference on Machine Learning. 2024 Meller A, Ward M D, Borowsky J H, Lotthammer J M, Kshirsagar M, Oviedo F, Ferres J L, Bowman G. Predicting the locations of cryptic pockets from single protein structures using the pocketminer graph neural network. Biophysical Journal, 2023, 122(3Suppl): 445A Ingraham J, Garg V K, Barzilay R, Jaakkola T. Generative models for graph-based protein design. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019, 32 Tan C, Gao Z, Xia J, Hu B, Li S Z. Generative de novo protein design 32 179. 180. 181. 182. 183. 184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. Front. Comput. Sci., 2025, 19(11): 1911375 with global context. 2023, arXiv preprint arXiv: 2204.10673 Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte R J, Milles L F, Wicky B I M, Courbet A, de Haas R J, Bethel N, Leung P J Y, Huddy T F, Pellock S, Tischer D, Chan F, Koepnick B, Nguyen H, Kang A, Sankaran B, Bera A K, King N P, Baker D. Robust deep learning–based protein sequence design using ProteinMPNN. Science, 2022, 378(6615): 49−56 Gao Z, Tan C, Li S Z. PiFold: toward effective and efficient protein inverse folding. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Zheng Z, Deng Y, Xue D, Zhou Y, Ye F, Gu Q. 
Structure-informed language models are protein designers. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 1781 Gao Z, Tan C, Chen X, Zhang Y, Xia J, Li S, Li S Z. KW-Design: pushing the limit of protein design via knowledge refinement. In: Proceedings of the 12th International Conference on Learning Representations. 2024 Jumper J, Evans R, Pritzel A, Green T, Figurnov M, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 2021, 596(7873): 583−589 Krishna R, Wang J, Ahern W, Sturmfels P, Venkatesh P, et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science, 2024, 384(6693): eadl2528 Jing B, Erives E, Pao-Huang P, Corso G, Berger B, Jaakkola T. EigenFold: generative protein structure prediction with diffusion models. In: Proceedings of the ICLR 2023-Machine Learning for Drug Discovery Workshop. 2023 Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, Dos Santos Costa A, Fazel-Zarandi M, Sercu T, Candido S, Rives A. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 2023, 379(6637): 1123−1130 Fang X, Wang F, Liu L, He J, Lin D, Xiang Y, Zhu K, Zhang X, Wu H, Li H, Song L. A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. Nature Machine Intelligence, 2023, 5(10): 1087−1096 Shi C, Wang C, Lu J, Zhong B, Tang J. Protein sequence and structure co-design with equivariant translation. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Yue A, Wang Z, Xu H. ReQFlow: rectified quaternion flow for efficient and high-quality protein backbone generation. 2025, arXiv preprint arXiv: 2502.14637 Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B. ProtTrans: toward understanding the language of life through selfsupervised learning. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 7112−7127 Chen B, Cheng X, Li P, Geng Y A, Gong J, Li S, Bei Z, Tan X, Wang B, Zeng X, Liu C, Zeng A, Dong Y, Tang J, Song L. xTrimoPGLM: unified 100B-scale pre-trained transformer for deciphering the language of protein. 2024, arXiv preprint arXiv: 2401.06199 Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications, 2022, 13(1): 4348 Mansoor S, Baek M, Madan U, Horvitz E. Toward more general embeddings for protein design: harnessing joint representations of sequence and structure. bioRxiv, 2021 Gao B, Jia Y, Mo Y, Ni Y, Ma W Y, Ma Z M, Lan Y. Self-supervised pocket pretraining via protein fragment-surroundings alignment. In: Proceedings of the 12th International Conference on Learning Representations. 2024 195. 196. 197. 198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209. 210. 211. 212. Wang Z, Zhang Q, Hu S, Yu H, Jin X, Gong Z, Chen H. Multi-level protein structure pre-training via prompt learning. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Gao B, Qiang B, Tan H, Ren M, Jia Y, Lu M, Liu J, Ma W Y, Lan Y. DrugCLIP: contrastive protein-molecule representation learning for virtual screening. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36 Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick C L, Ma J, Fergus R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences of the United States of America, 2021, 118(15): e2016239118 Guo Y, Wu J, Ma H, Huang J. Self-supervised pre-training for protein embeddings using tertiary structures. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence. 2022, 6801−6809 Yuan C, Li S, Ye G, Zhang Y, Huang L K, Huang W, Liu W, Yao J, Rong Y. 
Annotation-guided protein design with multi-level domain alignment. 2024, arXiv preprint arXiv: 2404.16866 Igashov I, St ¨ark H, Vignac C, Schneuing A, Satorras V G, Frossard P, Welling M, Bronstein M, Correia B. Equivariant 3d-conditional diffusion model for molecular linker design. Nature Machine Intelligence, 2024, 6(4): 417–427 Imrie F, Bradley A R, van der Schaar M, Deane C M. Deep generative models for 3D linker design. Journal of Chemical Information and Modeling, 2020, 60(4): 1983−1995 Duan C, Du Y, Jia H, Kulik H J. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. Nature Computational Science, 2023, 3(12): 1045−1055 Jackson R, Zhang W, Pearson J. TSNet: predicting transition state structures with tensor field networks and transfer learning. Chemical Science, 2021, 12(29): 10022−10040 Gainza P, Sverrisson F, Monti F, Rodolà E, Boscaini D, Bronstein M M, Correia B E. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods, 2020, 17(2): 184−192 Kong X, Huang W, Liu Y. Generalist equivariant transformer towards 3D molecular interaction learning. In: Proceedings of the 41st International Conference on Machine Learning. 2024, 25149−25175 Wang L, Liu H, Liu Y, Kurtin J, Ji S. Learning hierarchical protein representations via complete 3D graph networks. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Zhao K, Rong Y, Jiang B, Tang J, Zhang H, Yu J X, Zhao P. Geometric graph learning for protein mutation effect prediction. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023, 3412−3422 Feng S, Li M, Jia Y, Ma W Y, Lan Y. Protein-ligand binding representation learning from fine-grained interactions. In: Proceedings of the 12th International Conference on Learning Representations. 2024 Jian Y, Wu C, Reidenbach D, Krishnapriyan A S. 
General binding affinity guidance for diffusion models in structure-based drug design. 2024, arXiv preprint arXiv: 2406.16821 Xue F, Zhang M, Li S, Gao X, Wohlschlegel J A, Huang W, Yang Y, Deng W. Se (3)-equivariant ternary complex prediction towards target protein degradation. arXiv preprint arXiv: 2502.18875, 2025 Stärk H, Ganea O, Pattanaik L, Barzilay D, Jaakkola T. EquiBind: geometric deep learning for drug binding structure prediction. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 20503−20521 Lu W, Wu Q, Zhang J, Rao J, Li C, Zheng S. TANKBind: trigonometry-aware neural networks for drug-protein binding structure Jiaqi HAN et al. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223. 224. 225. 226. 227. 228. 229. 230. A survey of geometric graph neural networks: data structures, models and applications prediction. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022 Long S, Zhou Y, Dai X, Zhou H. Zero-shot 3D drug design by sketching and generating. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 23894−23907 Pei Q, Gao K, Wu L, Zhu J, Xia Y, Xie S, Qin T, He K, Liu T Y, Yan R. FABind: fast and accurate protein-ligand binding. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023 Huang Y, Zhang O, Wu L, Tan C, Lin H, Gao Z, Li S, Li S Z. ReDock: towards flexible and realistic molecular docking with diffusion bridge. In: Proceedings of the 41st International Conference on Machine Learning. 2024 Peng X, Luo S, Guan J, Xie Q, Peng J, Ma J. Pocket2Mol: efficient molecular sampling based on 3D protein pockets. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 17644–17655 Lin H, Huang Y, Zhang O, Ma S, Liu M, Li X, Wu L, Wang J, Hou T, Li S Z. DiffBP: generative diffusion of 3D molecules for target protein binding. 2024, arXiv preprint arXiv: 2211.11214 Luo S, Guan J, Ma J, Peng J. 
A 3D generative model for structurebased drug design. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021 Liu M, Luo Y, Uchino K, Maruhashi K, Ji S. Generating 3D molecules for target protein binding. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 13912−13924 Zhang Z, Min Y, Zheng S, Liu Q. Molecule generation for target protein binding with structural motifs. In: Proceedings of the 11th International Conference on Learning Representations. 2023 Lin H, Huang Y, Zhang O, Wu L, Li S, Chen Z, Li S Z. Functionalgroup-based diffusion for pocket-specific molecule generation and elaboration. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 36 Qiu K, Song Y, Yu J, Ma H, Cao Z, Zhang Z, Wu Y, Zheng M, Zhou H, Ma W Y. Structure-based molecule optimization via gradientguided Bayesian update. 2024, arXiv preprint arXiv: 2411.13280 Pinheiro P O, Jamasb A, Mahmood O, Sresht V, Saremi S. Structurebased drug design by denoising voxel grids. In: Proceedings of the 41st International Conference on Machine Learning. 2024 Morehead A, Chen C, Cheng J. Geometric transformers for protein interface contact prediction. In: Proceedings of the 10th International Conference on Learning Representations. 2022 Sverrisson F, Feydy J, Correia B E, Bronstein M M. Fast end-to-end learning on protein surfaces. In: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 15267−15276 Townshend R J L, Bedi R, Suriana P A, Dror R O. End-to-end learning on 3D protein structure for interface prediction. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019, 32 Rodrigues C H M, Pires D E V, Ascher D B. mmCSM-PPI: predicting the effects of multiple point mutations on protein–protein interactions. Nucleic Acids Research, 2021, 49(W1): W417−W424 Liu X, Luo Y, Li P, Song S, Peng J. 
Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Computational Biology, 2021, 17(8): e1009284 Ganea O E, Huang X, Bunne C, Bian Y, Barzilay R, Jaakkola T S, Krause A. Independent SE(3)-equivariant models for end-to-end rigid protein docking. In: Proceedings of the 10th International Conference on Learning Representations. 2022 Wang Y, Shen Y, Chen S, Wang L, Fei Y, Zhou H. Learning harmonic molecular representations on riemannian manifold. In: Proceedings of 231. 232. 233. 234. 235. 236. 237. 238. 239. 240. 241. 242. 243. 244. 245. 246. 247. 33 the 11th International Conference on Learning Representations. 2023 Jin W, Barzilay R, Jaakkola T. Antibody-antigen docking and design via hierarchical structure refinement. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 10217–10227 Ketata M A, Laue C, Mammadov R, Stärk H, Wu M, Corso G, Marquet C, Barzilay R, Jaakkola T S. DiffDock-PP: rigid proteinprotein docking with diffusion models. In: Proceedings of the ICLR 2023-Machine Learning for Drug Discovery Workshop. 2023 Ji Y, Bian Y, Fu G, Zhao P, Luo P. SyNDock: N rigid protein docking via learnable group synchronization. 2023, arXiv preprint arXiv: 2305.15156 Evans R, O’Neill M, Pritzel A, Antropova N, Senior A, et al. Protein complex prediction with alphafold-multimer. bioRxiv, 2021 Sverrisson F, Feydy J, Southern J, Bronstein M M, Correia B E. Physics-informed deep neural network for rigid-body protein docking. In: Proceedings of the MLDD 2022 - Machine Learning for Drug Discovery Workshop of ICLR 2022. 2022 Yu Z, Huang W, Liu Y. Rigid protein-protein docking via equivariant elliptic-paraboloid interface prediction. In: Proceedings of the 12th International Conference on Learning Representations. 2024 Wu H, Liu W, Bian Y, Wu J, Yang N, Yan J. EBMDock: neural probabilistic protein-protein docking via a differentiable energy model. 
In: Proceedings of the 12th International Conference on Learning Representations. 2024 Luo S, Su Y, Peng X, Wang S, Peng J, Ma J. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 709 Jin W, Wohlwend J, Barzilay R, Jaakkola T S. Iterative refinement graph neural network for antibody sequence-structure co-design. In: Proceedings of the 10th International Conference on Learning Representations. 2022 Gao K, Wu L, Zhu J, Peng T, Xia Y, He L, Xie S, Qin T, Liu H, He K, Liu T Y. Incorporating pre-training paradigm for antibody sequencestructure co-design. 2022, arXiv preprint arXiv: 2211.08406 Tan C, Gao Z, Wu L, XIA J, Zheng J, Yang X, Liu Y, Hu B, Li S Z. Cross-gate MLP with protein complex invariant embedding is a oneshot antibody designer. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 15222−15230 Verma Y, Heinonen M, Garg V. AbODE: ab initio antibody design using conjoined ODEs. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 35037−35050 Martinkus K, Ludwiczak J, Cho K, Liang W C, Lafrance-Vanasse J, Hotzel I, Rajpal A, Wu Y, Bonneau R, Gligorijevic V, Loukas A. AbDiffuser: full-atom generation of in vitro functioning antibodies. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023 Wu F, Zhao Y, Wu J, Jiang B, He B, Huang L, Qin C, Yang F, Huang N, Xiao Y, Wang R, Jia H, Rong Y, Liu Y, Lai H, Xu T, Liu W, Zhao P, Yao J. Fast and accurate modeling and design of antibody-antigen complex using tFold. bioRxiv, 2024 Lin H, Wu L, Huang Y, Liu Y, Zhang O, Zhou Y, Sun R, Li S Z. GeoAB: towards realistic antibody design and reliable affinity maturation. In: Proceedings of the 41st International Conference on Machine Learning. 2024 Wu L, Lin H, Huang Y, Gao Z, Tan C, Liu Y, Wu T, Li S Z. 
Relationaware equivariant graph networks for epitope-unknown antibody design and specificity optimization. 2024, arXiv preprint arXiv: 2501.00013 Xie X, Valiente P A, Kim P M. HelixGAN a deep-learning methodology for conditional de novo design of α-helix structures. 34 248. 249. 250. 251. 252. 253. 254. 255. 256. 257. 258. 259. 260. 261. 262. 263. 264. 265. 266. Front. Comput. Sci., 2025, 19(11): 1911375 Bioinformatics, 2023, 39(1): btad036 Lin H, Zhang O, Zhao H, Jiang D, Wu L, Liu Z, Huang Y, Li S Z. PPFLOW: target-aware peptide design with torsional flow matching. In: Proceedings of the 41st International Conference on Machine Learning. 2024 Xie T, Grossman J C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 2018, 120(14): 145301 Chen C, Ye W, Zuo Y, Zheng C, Ong S P. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 2019, 31(9): 3564−3572 Choudhary K, DeCost B. Atomistic line graph neural network for improved materials property predictions. npj Computational Materials, 2021, 7(1): 185 Kaba S O, Ravanbakhsh S. Equivariant networks for crystal structures. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 300 Yan K, Liu Y, Lin Y, Ji S. Periodic graph transformers for crystal material property prediction. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1096 Magar R, Wang Y, Barati Farimani A. Crystal twins: self-supervised learning for crystalline material property prediction. npj Computational Materials, 2022, 8(1): 231 Yu H, Song Y, Hu J, Guo C, Yang B. A crystal-specific pre-training framework for crystal material property prediction. 2023, arXiv preprint arXiv: 2306.05344 Song Z, Meng Z, King I. A diffusion-based pre-training framework for crystal property prediction. 
Jiaqi HAN is a PhD student in Computer Science at Stanford University, USA. Previously, he received his BE in Computer Science at Tsinghua University, China.
His research interests involve developing principled machine learning methods for modeling geometric systems.

Jiacheng CEN is currently a PhD student in the Gaoling School of Artificial Intelligence, Renmin University of China. His research interests mainly concern geometric learning theory and its applications in scientific settings.

Liming WU is currently pursuing his PhD in the Gaoling School of Artificial Intelligence at Renmin University of China. His research interests lie in geometric deep learning and AI for Science.

Zongzhao LI is currently a PhD student in the Gaoling School of Artificial Intelligence, Renmin University of China, China. His research interests lie in geometric deep learning and its applications in AI for Science problems.

Xiangzhe KONG received his BE in Computer Science from Tsinghua University, China in 2022. He is currently a PhD student in the Department of Computer Science and Technology at Tsinghua University, China.

Rui JIAO is a PhD student in the Department of Computer Science and Technology at Tsinghua University, China. Prior to that, he received his BE in Computer Science from Tsinghua University, China. His research focuses on geometric machine learning and material design.

Ziyang YU received his BE in Computer Science from Tsinghua University, China in 2024. He is currently a DEng student in the Department of Computer Science and Technology at Tsinghua University, China.

Tingyang XU is a Senior Research Scientist at Alibaba DAMO Academy’s Language & Science Lab. He earned his bachelor’s degree from Shanghai Jiao Tong University, China, and his master’s and PhD in Computer Science from the University of Connecticut, USA, under Professor Jinbo Bi. His research focuses on deep learning and its scientific applications, particularly geometric graph neural networks for molecular dynamics, drug design, and AI for science.
He has developed state-of-the-art methods for deep geometric graph learning, advancing tasks such as molecular generation and molecular property prediction.

Fandi WU is a Senior Research Scientist at Tencent Life Science Lab. He earned his bachelor’s degree from the University of Science and Technology of China, China, and his PhD in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences. His research focuses on deep learning and its scientific applications, particularly protein structure prediction and protein design.

Zihe WANG received his BE in Computer Science from Tsinghua University, China in 2011 and his PhD in Computer Science from Tsinghua University, China in 2016. He is currently an assistant professor at Renmin University of China. His research focuses on algorithms and mechanism design.

Hongteng XU is an associate professor in the Gaoling School of Artificial Intelligence, Renmin University of China, China. From 2018 to 2020, he was a senior research scientist at Infinia ML Inc. During the same period, he was a visiting faculty member in the Department of Electrical and Computer Engineering, Duke University, USA. He received his PhD from the School of Electrical and Computer Engineering at Georgia Institute of Technology, USA in 2017. His research interests include machine learning and its applications, especially optimal transport theory, sequential data modeling and analysis, deep learning techniques, and their applications in computer vision and data mining.

Zhewei WEI received his PhD in Computer Science and Engineering from Hong Kong University of Science and Technology, China. He did postdoctoral research at Aarhus University, Denmark from 2012 to 2014, and joined Renmin University of China, China in 2014.

Deli ZHAO leads the AI team at Alibaba DAMO Academy.
He has worked on computer vision and machine learning for nearly two decades and now focuses mainly on generative models, multi-modal learning, and foundation models.

Yang LIU is the GDS Professor in the Department of Computer Science and Technology at Tsinghua University, China. He is the Executive Dean of the Institute for AI Industry Research (AIR) and Associate Dean of the Department of Computer Science and Technology. His research interests include artificial intelligence, natural language processing, and medical AI.

Yu RONG is an IEEE Senior Member and is recognized as a high-level overseas talent by Shenzhen. In June 2017, he joined Tencent AI Lab as a principal researcher, and in June 2024 he moved to Alibaba DAMO Academy, focusing on large language models and AI for Science. His research interests lie in graph deep learning and large language models, particularly as applied within the AI for Science domain.

Wenbing HUANG is a tenure-track associate professor at the Gaoling School of Artificial Intelligence (GSAI), Renmin University of China, China. Before joining GSAI, he worked as an assistant researcher at AIR, Tsinghua University, and as a senior researcher at Tencent AI Lab. His research focuses on geometric deep learning, GNNs, and AI for Science.