UniHG: A Large-scale Universal Heterogeneous Graph Dataset and Benchmark for Representation Learning and Cross-Domain Transferring

Anonymous Authors

Affiliation, Address
Contact Email

Abstract

We present UniHG, the largest universal heterogeneous graph (HG) dataset, containing 77.31M nodes and 564M edges with 2,082 relation types. Addressing three fundamental challenges in HG research (benchmark deficiency, semantic misalignment, and propagation degradation), we propose:

Extensive experiments show a 28.93% accuracy improvement over SOTA methods and an 11.71% NDCG@20 boost in downstream tasks.

Dataset Statistics

Metric        UniHG       MAG240M     Amazon Review
Nodes         77.31M      244.16M     102.70M
Edges         564M        1.73B       571.54M
Node Types    1333 (values fused in the source; column split unclear)
Edge Types    2,082       3           -
Labels        74,666      153         -
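
As a rough way to compare the three datasets, the node and edge counts in the table above can be turned into an average number of edges per node. The short sketch below does only that arithmetic; the ratio is a derived figure, not one reported in the source, and the names `datasets` and `edges_per_node` are illustrative.

```python
# Node and edge counts copied from the Dataset Statistics table.
datasets = {
    "UniHG": (77.31e6, 564e6),
    "MAG240M": (244.16e6, 1.73e9),
    "Amazon Review": (102.70e6, 571.54e6),
}


def edges_per_node(nodes, edges):
    """Average number of edges per node (a coarse density indicator)."""
    return edges / nodes


# Derived ratios, rounded to two decimals for display.
ratios = {
    name: round(edges_per_node(nodes, edges), 2)
    for name, (nodes, edges) in datasets.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} edges per node")
```

The ratio ignores edge direction and edge types, so it says nothing about how the 2,082 relation types are distributed.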

Experimental Results

Node Classification Performance

Method      UniHG-1M                      UniHG-10M                     UniHG-Full
            Accuracy  Recall    F1        Accuracy  Recall    F1        Accuracy  Recall    F1
HGD (Ours)  75.41     75.95     82.64     89.03     90.11     93.05     93.16     93.83     96.09
GCN         23.68     25.26     21.89     22.86     25.66     23.19     26.72     24.19     23.26
GAT         29.12     34.02     27.76     33.73     37.48     29.88     31.02     35.07     26.92
HGT         51.28     55.30     51.55     52.09     56.52     53.03     55.36     61.91     56.47
MTMP        47.52     47.48     60.79     59.95     61.33     72.81     65.67     66.15     67.74
SGC         42.56     42.11     55.12     56.32     57.78     67.95     62.15     63.88     73.21
SIGN        56.73     55.54     69.41     73.58     84.17     80.30     69.04     70.49     81.48
GAMLP       44.55     43.74     56.92     59.47     61.32     70.02     64.23     66.05     74.34

All values in percentage (%)

Ablation Study with AFP Module

Method      UniHG-1M                      UniHG-10M                     UniHG-Full
            Acc       Rec       F1        Acc       Rec       F1        Acc       Rec       F1
SGC         42.56     42.11     55.12     56.32     57.78     67.95     62.15     63.88     73.21
SGC+AFP     44.18     42.75     57.65     64.25     65.21     76.67     69.84     71.40     81.45
SIGN        56.73     55.54     69.41     73.58     84.17     80.30     69.04     70.49     81.48
SIGN+AFP    66.69     65.27     77.16     77.52     80.18     81.38     75.45     76.42     85.71
GAMLP       44.55     43.74     56.92     59.47     61.32     70.02     64.23     66.05     74.34
GAMLP+AFP   47.24     46.80     60.85     60.99     62.70     73.38     72.89     74.95     83.51
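
The effect of AFP is easier to see as a per-method difference than as paired rows. The sketch below takes the accuracy columns of the ablation table and subtracts each base method's score from its +AFP counterpart, in percentage points. It is arithmetic on the reported numbers only; the names `accuracy`, `afp_gain`, and `gains` are illustrative and do not come from the source.

```python
# Accuracy (%) copied from the ablation table, ordered as
# (UniHG-1M, UniHG-10M, UniHG-Full).
accuracy = {
    "SGC": (42.56, 56.32, 62.15),
    "SGC+AFP": (44.18, 64.25, 69.84),
    "SIGN": (56.73, 73.58, 69.04),
    "SIGN+AFP": (66.69, 77.52, 75.45),
    "GAMLP": (44.55, 59.47, 64.23),
    "GAMLP+AFP": (47.24, 60.99, 72.89),
}
splits = ("UniHG-1M", "UniHG-10M", "UniHG-Full")


def afp_gain(base):
    """Accuracy gain in percentage points from adding AFP to `base`."""
    without = accuracy[base]
    with_afp = accuracy[base + "+AFP"]
    return tuple(round(w - wo, 2) for wo, w in zip(without, with_afp))


gains = {base: afp_gain(base) for base in ("SGC", "SIGN", "GAMLP")}

for base, deltas in gains.items():
    summary = ", ".join(f"{s}: +{d:.2f}" for s, d in zip(splits, deltas))
    print(f"{base}+AFP vs {base} -> {summary}")
```

The same subtraction can be applied to the Rec and F1 columns; not every cell improves there (for example, SIGN+AFP recall on UniHG-10M is 80.18 against 84.17 for SIGN).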

Cross-domain Knowledge Transfer

Method          Amazon-Book                   Yelp2018                      Citeulike-a
                Prec@20   Rec@20    NDCG@20   Prec@20   Rec@20    NDCG@20   Prec@20   Rec@20    NDCG@20
LightGCN        0.01716   0.06191   0.04106   0.00433   0.01123   0.00849   0.02329   0.07188   0.04374
LightGCN+UniHG  +2.797%   +1.712%   +0.803%   +6.467%   +7.925%   +5.535%   +3.521%   +3.116%   +1.935%
NGCF            0.01540   0.06278   0.04142   0.00318   0.00596   0.00533   0.03992   0.11455   0.09264
NGCF+UniHG      +14.675%  +17.776%  +27.764%  +6.918%   +31.543%  +9.005%   +0.601%   +1.422%   +1.759%
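
For readers unfamiliar with the three ranking metrics in this table, the sketch below shows the standard textbook definitions of Prec@k, Rec@k, and NDCG@k for a single user with binary relevance. The evaluation protocol behind the reported numbers is not described in this section, so the binary-relevance setting, the per-user formulation, and the function names are all assumptions made for illustration.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example (hypothetical item IDs): two relevant items, one ranked first.
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"a", "c"}
print(precision_at_k(ranked_items, relevant_items, 2))
print(recall_at_k(ranked_items, relevant_items, 2))
print(ndcg_at_k(ranked_items, relevant_items, 2))
```

Dataset-level scores are usually obtained by averaging these per-user values over all test users, which is why the absolute numbers in such tables tend to be small.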

Conclusion

UniHG establishes new benchmarks for heterogeneous graph learning through:
