Dataset / Models | Diagnosis Accuracy | Analysis Completeness | Analysis Relevance | Lead Assessment Coverage | Lead Assessment Accuracy | ECG Feature Grounding | Evidence-Based Reasoning | Clinical Diagnostic Fidelity |
---|---|---|---|---|---|---|---|---|
MIMIC-IV-ECG (in-domain) | ||||||||
PULSE | 81.14 | 2.37 | 2.39 | 7.11 | 2.95 | 50.18 | 52.40 | 51.63 |
GEM SFT LLaVA | 87.24 | 4.41 | 5.01 | 71.07 | 46.44 | 75.48 | 75.09 | 75.28 |
GEM SFT PULSE | 86.49 | 4.43 | 4.91 | 69.80 | 45.33 | 74.95 | 74.70 | 74.87 |
PTB-XL (out-domain) | ||||||||
PULSE | 59.24 | 2.20 | 2.06 | 11.20 | 6.27 | 52.52 | 55.48 | 53.85 |
GEM SFT LLaVA | 73.53 | 4.19 | 2.96 | 79.54 | 49.01 | 74.48 | 74.61 | 73.84 |
GEM SFT PULSE | 73.59 | 4.19 | 3.00 | 78.86 | 47.96 | 74.97 | 75.41 | 74.24 |
Table 1: Grounded ECG Understanding results on MIMIC-IV-ECG and PTB-XL.
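As a reading aid for Table 1, the Diagnosis Accuracy column can be compared across the in-domain and out-of-domain test sets. The sketch below hard-codes only the values from Table 1 and computes each model's drop from MIMIC-IV-ECG to PTB-XL, plus the gain of one model over another on a given dataset. The helper names `domain_drop` and `gain_over` are hypothetical and exist only for illustration.

```python
# Diagnosis Accuracy (%) copied from Table 1; no other data is assumed.
diagnosis_accuracy = {
    "PULSE": {"MIMIC-IV-ECG": 81.14, "PTB-XL": 59.24},
    "GEM SFT LLaVA": {"MIMIC-IV-ECG": 87.24, "PTB-XL": 73.53},
    "GEM SFT PULSE": {"MIMIC-IV-ECG": 86.49, "PTB-XL": 73.59},
}


def domain_drop(scores: dict[str, float]) -> float:
    """Points lost moving from in-domain (MIMIC-IV-ECG) to out-of-domain (PTB-XL)."""
    # Round to 2 decimals to match the table's precision and hide float noise.
    return round(scores["MIMIC-IV-ECG"] - scores["PTB-XL"], 2)


def gain_over(model: str, baseline: str, dataset: str) -> float:
    """Absolute accuracy difference (model minus baseline) on one dataset."""
    return round(
        diagnosis_accuracy[model][dataset] - diagnosis_accuracy[baseline][dataset], 2
    )


if __name__ == "__main__":
    for name, scores in diagnosis_accuracy.items():
        print(f"{name}: in-domain -> out-of-domain drop = {domain_drop(scores):.2f} points")
    print(f"GEM SFT PULSE vs PULSE on PTB-XL: "
          f"{gain_over('GEM SFT PULSE', 'PULSE', 'PTB-XL'):+.2f} points")
```

With these numbers PULSE loses 21.90 points out of domain, while the two GEM variants lose 13.71 and 12.90 points, and GEM SFT PULSE leads PULSE by 14.35 points on PTB-XL.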
Task 2: ECG-Bench (Abnormality Detection)
Models | PTB-XL Super AUC | PTB-XL Super F1 | PTB-XL Super HL | CODE-15% AUC | CODE-15% F1 | CODE-15% HL | CPSC 2018 AUC | CPSC 2018 F1 | CPSC 2018 HL | CSN Accuracy | G12EC Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
Random | 50.3 | 33.2 | 50.1 | 48.8 | 15.0 | 32.1 | 51.2 | 15.1 | 28.8 | 11.6 | 12.1 |
GPT-4o | 55.6 | 28.3 | 26.2 | 59.9 | 24.9 | 15.7 | 50.9 | 10.6 | 18.2 | 57.5 | 49.2 |
PULSE | 82.4 | 74.8 | 11.0 | 90.7 | 85.4 | 5.0 | 76.9 | 57.6 | 8.6 | 85.2 | 78.2 |
GEM SFT LLaVA | 81.8 | 73.6 | 11.6 | 90.5 | 84.8 | 5.1 | 74.1 | 52.0 | 9.0 | 92.6 | 81.8 |
GEM SFT PULSE | 83.4 | 75.8 | 11.0 | 91.5 | 86.4 | 4.7 | 79.1 | 61.1 | 8.1 | 86.2 | 80.5 |
Ablations | |||||||||||
GEM TS only | 81.2 | 72.5 | 11.9 | 90.8 | 84.9 | 5.0 | 76.3 | 54.0 | 8.5 | 91.6 | 81.4 |
GEM TS+IMG | 82.7 | 74.8 | 11.1 | 91.3 | 86.3 | 4.6 | 74.4 | 51.5 | 8.8 | 90.1 | 81.1 |
Table 2: ECG-Bench abnormality detection results. HL denotes Hamming loss (lower is better); for all other metrics, higher is better.
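In Table 2, HL (Hamming loss) is an error rate, so lower is better, unlike AUC, F1, and accuracy. The sketch below shows how Hamming loss and macro-averaged F1 are commonly computed for multi-label predictions. ECG-Bench's exact averaging and scaling are not stated in this section, so reporting both as percentages and scoring a label with no positives and no predictions as F1 = 0 are assumptions. The toy labels, in the hypothetical column order NORM, MI, STTC, CD, HYP (the five PTB-XL superclasses), are invented for illustration.

```python
from typing import Sequence

Labels = Sequence[Sequence[int]]  # one 0/1 vector per ECG record


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Percentage of individual (record, label) slots predicted incorrectly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    wrong = total = 0
    for t_row, p_row in zip(y_true, y_pred):
        if len(t_row) != len(p_row):
            raise ValueError("every record must have the same number of labels")
        wrong += sum(t != p for t, p in zip(t_row, p_row))
        total += len(t_row)
    return 100.0 * wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1 scores, as a percentage."""
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label never present and never predicted scores 0.
        per_label.append(2 * tp / denom if denom else 0.0)
    return 100.0 * sum(per_label) / n_labels


if __name__ == "__main__":
    # Hypothetical ground truth and predictions for three records.
    y_true = [[1, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 1, 0]]
    y_pred = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
    print(f"HL = {hamming_loss(y_true, y_pred):.1f}, "
          f"macro F1 = {macro_f1(y_true, y_pred):.1f}")
```

Here 2 of 15 label slots are wrong, giving an HL of about 13.3, while macro F1 is 60.0 because two of the five labels score zero. This illustrates why HL can look small even when individual classes are missed entirely.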
Task 3: ECG-Bench (Report Generation & ECG-QA)
Models | PTB-XL Report Score | ECG-QA Accuracy |
---|---|---|
Random | 0 | 16.2 |
GPT-4o | 50.2 | 35.2 |
PULSE | 61.3 | 73.8 |
GEM SFT LLaVA | 65.0 | 71.0 |
GEM SFT PULSE | 67.1 | 73.6 |
Table 3: ECG-Bench report generation and QA results.