We are developing five-year survival prediction models for bladder cancer patients who underwent neoadjuvant chemotherapy and radical cystectomy. This study investigated the feasibility of using large language models (LLMs; Vicuna and Dolly) to extract clinical descriptors from reports for survival prediction with a nomogram model, either alone or combined with radiomics and deep-learning descriptors from CTU images using BPNNs. The models were developed and validated using data from 163 patients collected with IRB approval. Six models were developed: C (clinical descriptors with a nomogram), R (radiomics descriptors), D (deep-learning descriptor), CR (clinical and radiomics descriptors), CD (clinical and deep-learning descriptors), and CRD (clinical, radiomics, and deep-learning descriptors). On the test set, the individual models with manually labeled or image-derived descriptors achieved AUCs of 0.82±0.06 (C: manually labeled reference), 0.73±0.07 (R), and 0.71±0.07 (D). With LLM-extracted clinical descriptors, model C achieved AUCs of 0.80±0.06 (User1 Vicuna-C2 labeled), 0.83±0.05 (User1 Dolly labeled), 0.78±0.06 (User2 Vicuna-C2 labeled), and 0.85±0.05 (User2 Dolly-C2 labeled). For the combined models, the AUCs were (1) manually labeled reference: 0.86±0.05 (CR), 0.86±0.05 (CD), and 0.87±0.05 (CRD); (2) CRD with Vicuna-C2 labels: 0.86±0.05 (User1) and 0.84±0.05 (User2); and (3) CRD with Dolly-C2 labels: 0.88±0.05 (User1) and 0.89±0.04 (User2). The LLMs extracted three clinical descriptors with accuracy ranging from 77% to 100% relative to manual extraction, and the LLMs run by the two users performed similarly. The combined models outperformed the individual models, and models using LLM-extracted clinical descriptors performed similarly to those using manually extracted descriptors.
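As an illustration only, the two evaluation quantities reported above, the agreement of LLM-extracted descriptors with manual extraction and the AUC of a survival classifier, can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's evaluation code: the function names `agreement_accuracy` and `auc`, the descriptor values, and the scores are all hypothetical, and the AUC is computed with the standard Mann-Whitney formulation, which may differ from the estimator (and the ± uncertainty estimate) used in the study.

```python
from typing import Sequence


def agreement_accuracy(llm_values: Sequence[str], manual_values: Sequence[str]) -> float:
    """Fraction of cases in which the LLM-extracted value equals the manual one."""
    if len(llm_values) != len(manual_values) or len(manual_values) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(llm_values, manual_values))
    return matches / len(manual_values)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case outscores a negative case.

    Mann-Whitney formulation; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): one clinical descriptor for five
# patients, as labeled manually and as extracted by an LLM.
manual = ["T2", "T3", "T2", "T4", "T3"]
llm = ["T2", "T3", "T3", "T4", "T3"]
accuracy = agreement_accuracy(llm, manual)

# Hypothetical model scores and binary five-year outcome labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
model_auc = auc(scores, labels)

print(f"agreement accuracy: {accuracy:.2f}")
print(f"AUC: {model_auc:.3f}")
```

In this toy case the LLM matches the manual label in four of five patients, and eight of the nine positive-negative score pairs are ranked correctly.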