We present UniHG, the largest universal heterogeneous graph dataset containing 77.31M nodes and 564M edges with 2,082 relation types. Addressing three fundamental challenges in HG research—benchmark deficiency, semantic misalignment, and propagation degradation—we propose:
Extensive experiments show a 28.93% accuracy improvement over SOTA methods and an 11.71% NDCG@20 boost in downstream tasks.

Metric | UniHG | MAG240M | Amazon Review |
---|---|---|---|
Nodes | 77.31M | 244.16M | 102.70M |
Edges | 564M | 1.73B | 571.54M |
Node Types | 1 | 3 | 33 |
Edge Types | 2,082 | 3 | - |
Labels | 74,666 | 153 | - |
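
As a rough way to compare the three datasets beyond raw size, the node and edge counts in the table can be turned into an edges-per-node ratio. The sketch below does only that arithmetic; the dictionary layout and the `edges_per_node` helper are illustrative names, not part of any released UniHG tooling.

```python
# Node and edge counts copied from the comparison table, in millions.
STATS = {
    "UniHG": {"nodes": 77.31, "edges": 564.0},
    "MAG240M": {"nodes": 244.16, "edges": 1730.0},  # 1.73B edges
    "Amazon Review": {"nodes": 102.70, "edges": 571.54},
}


def edges_per_node(stats):
    """Return the edge-to-node ratio for each dataset (hypothetical helper)."""
    return {name: s["edges"] / s["nodes"] for name, s in stats.items()}


ratios = edges_per_node(STATS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} edges per node")
```

By this ratio UniHG and MAG240M are of similar density (a little over 7 edges per node), while Amazon Review is sparser.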

Method | UniHG-1M Accuracy | UniHG-1M Recall | UniHG-1M F1 | UniHG-10M Accuracy | UniHG-10M Recall | UniHG-10M F1 | UniHG-Full Accuracy | UniHG-Full Recall | UniHG-Full F1 |
---|---|---|---|---|---|---|---|---|---|
HGD (Ours) | 75.41 | 75.95 | 82.64 | 89.03 | 90.11 | 93.05 | 93.16 | 93.83 | 96.09 |
GCN | 23.68 | 25.26 | 21.89 | 22.86 | 25.66 | 23.19 | 26.72 | 24.19 | 23.26 |
GAT | 29.12 | 34.02 | 27.76 | 33.73 | 37.48 | 29.88 | 31.02 | 35.07 | 26.92 |
HGT | 51.28 | 55.30 | 51.55 | 52.09 | 56.52 | 53.03 | 55.36 | 61.91 | 56.47 |
MTMP | 47.52 | 47.48 | 60.79 | 59.95 | 61.33 | 72.81 | 65.67 | 66.15 | 67.74 |
SGC | 42.56 | 42.11 | 55.12 | 56.32 | 57.78 | 67.95 | 62.15 | 63.88 | 73.21 |
SIGN | 56.73 | 55.54 | 69.41 | 73.58 | 84.17 | 80.30 | 69.04 | 70.49 | 81.48 |
GAMLP | 44.55 | 43.74 | 56.92 | 59.47 | 61.32 | 70.02 | 64.23 | 66.05 | 74.34 |

All values are percentages (%).
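
To read the table at a glance, one can compute the gap, in percentage points, between HGD and the strongest baseline at each scale. This is a minimal sketch using only the accuracy columns; the helper name `margin_over_best_baseline` is an assumption for illustration, and the calculation does not attempt to reproduce the 28.93% headline figure, whose exact definition is not given here.

```python
# Accuracy (%) per method, copied from the node classification table,
# ordered as (UniHG-1M, UniHG-10M, UniHG-Full).
SCALES = ("UniHG-1M", "UniHG-10M", "UniHG-Full")
ACCURACY = {
    "HGD (Ours)": (75.41, 89.03, 93.16),
    "GCN": (23.68, 22.86, 26.72),
    "GAT": (29.12, 33.73, 31.02),
    "HGT": (51.28, 52.09, 55.36),
    "MTMP": (47.52, 59.95, 65.67),
    "SGC": (42.56, 56.32, 62.15),
    "SIGN": (56.73, 73.58, 69.04),
    "GAMLP": (44.55, 59.47, 64.23),
}


def margin_over_best_baseline(scores, ours="HGD (Ours)"):
    """For each scale, find the strongest baseline and the gap to `ours`."""
    margins = {}
    for i, scale in enumerate(SCALES):
        baselines = {name: vals[i] for name, vals in scores.items() if name != ours}
        best = max(baselines, key=baselines.get)
        margins[scale] = (best, round(scores[ours][i] - baselines[best], 2))
    return margins


margins = margin_over_best_baseline(ACCURACY)
for scale, (best, gap) in margins.items():
    print(f"{scale}: +{gap:.2f} points over {best}")
```

In all three settings the strongest baseline by accuracy is SIGN, and the gap is largest on UniHG-Full.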

Method | UniHG-1M Acc | UniHG-1M Rec | UniHG-1M F1 | UniHG-10M Acc | UniHG-10M Rec | UniHG-10M F1 | UniHG-Full Acc | UniHG-Full Rec | UniHG-Full F1 |
---|---|---|---|---|---|---|---|---|---|
SGC | 42.56 | 42.11 | 55.12 | 56.32 | 57.78 | 67.95 | 62.15 | 63.88 | 73.21 |
SGC+AFP | 44.18 | 42.75 | 57.65 | 64.25 | 65.21 | 76.67 | 69.84 | 71.40 | 81.45 |
SIGN | 56.73 | 55.54 | 69.41 | 73.58 | 84.17 | 80.30 | 69.04 | 70.49 | 81.48 |
SIGN+AFP | 66.69 | 65.27 | 77.16 | 77.52 | 80.18 | 81.38 | 75.45 | 76.42 | 85.71 |
GAMLP | 44.55 | 43.74 | 56.92 | 59.47 | 61.32 | 70.02 | 64.23 | 66.05 | 74.34 |
GAMLP+AFP | 47.24 | 46.80 | 60.85 | 60.99 | 62.70 | 73.38 | 72.89 | 74.95 | 83.51 |
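
The effect of adding AFP is easier to see as a per-model difference. The sketch below subtracts each base model's accuracy from its +AFP counterpart, using only values from the table; the `afp_gains` helper and the dictionary layout are illustrative assumptions.

```python
# Accuracy (%) with and without AFP, copied from the ablation table,
# ordered as (UniHG-1M, UniHG-10M, UniHG-Full).
AFP_ACCURACY = {
    "SGC": {"base": (42.56, 56.32, 62.15), "afp": (44.18, 64.25, 69.84)},
    "SIGN": {"base": (56.73, 73.58, 69.04), "afp": (66.69, 77.52, 75.45)},
    "GAMLP": {"base": (44.55, 59.47, 64.23), "afp": (47.24, 60.99, 72.89)},
}


def afp_gains(table):
    """Return the accuracy gain (percentage points) from AFP per model and scale."""
    return {
        model: tuple(round(a - b, 2) for b, a in zip(rows["base"], rows["afp"]))
        for model, rows in table.items()
    }


gains = afp_gains(AFP_ACCURACY)
for model, (g1m, g10m, gfull) in gains.items():
    print(f"{model}: 1M +{g1m:.2f}, 10M +{g10m:.2f}, Full +{gfull:.2f}")
```

Accuracy improves in all nine model and scale combinations. The same subtraction can be applied to the recall and F1 columns, where not every cell improves (SIGN recall on UniHG-10M drops from 84.17 to 80.18).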

Method | Amazon-Book Prec@20 | Amazon-Book Rec@20 | Amazon-Book NDCG@20 | Yelp2018 Prec@20 | Yelp2018 Rec@20 | Yelp2018 NDCG@20 | Citeulike-a Prec@20 | Citeulike-a Rec@20 | Citeulike-a NDCG@20 |
---|---|---|---|---|---|---|---|---|---|
LightGCN | 0.01716 | 0.06191 | 0.04106 | 0.00433 | 0.01123 | 0.00849 | 0.02329 | 0.07188 | 0.04374 |
LightGCN+UniHG | +2.797% | +1.712% | +0.803% | +6.467% | +7.925% | +5.535% | +3.521% | +3.116% | +1.935% |
NGCF | 0.01540 | 0.06278 | 0.04142 | 0.00318 | 0.00596 | 0.00533 | 0.03992 | 0.11455 | 0.09264 |
NGCF+UniHG | +14.675% | +17.776% | +27.764% | +6.918% | +31.543% | +9.005% | +0.601% | +1.422% | +1.759% |
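
The +UniHG rows report percentage changes rather than metric values. Assuming each percentage is a relative gain over the baseline row directly above it (an interpretation, not something the table states explicitly), the implied absolute NDCG@20 scores can be recovered as follows. The `apply_relative_gain` helper is a hypothetical name used only for this sketch.

```python
# (baseline NDCG@20, reported gain in %) copied from the recommendation table.
NDCG20 = {
    "LightGCN": {
        "Amazon-Book": (0.04106, 0.803),
        "Yelp2018": (0.00849, 5.535),
        "Citeulike-a": (0.04374, 1.935),
    },
    "NGCF": {
        "Amazon-Book": (0.04142, 27.764),
        "Yelp2018": (0.00533, 9.005),
        "Citeulike-a": (0.09264, 1.759),
    },
}


def apply_relative_gain(base, gain_pct):
    """Assumption: gain_pct is a relative improvement over `base`."""
    return base * (1.0 + gain_pct / 100.0)


implied = {
    model: {ds: apply_relative_gain(base, pct) for ds, (base, pct) in rows.items()}
    for model, rows in NDCG20.items()
}
for model, rows in implied.items():
    for dataset, value in rows.items():
        print(f"{model}+UniHG on {dataset}: NDCG@20 ~ {value:.5f}")
```

Under this reading, a large relative gain on a small baseline can still be a small absolute change, which is worth keeping in mind when comparing the Yelp2018 columns with the others.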

UniHG establishes new benchmarks for heterogeneous graph learning through:

1. Hu et al. (2020) Heterogeneous Graph Transformer
2. Radford et al. (2021) CLIP
3. Full reference list in original paper