Core Concept
An Attention U-Net architecture that uses features from the ProtTrans protein language model achieves state-of-the-art performance in predicting intrinsically disordered regions of proteins.
Summary
The article presents a new protein intrinsic disorder predictor called DisorderUnetLM, which is based on the Attention U-Net convolutional neural network architecture and uses features from the ProtTrans protein language model.
Key highlights:
- DisorderUnetLM shows top results in direct comparisons with other leading predictors like flDPnn and IDP-CRF, which use multiple sequence alignments and other evolutionary features.
- It also outperforms predictors that use features from the same ProtTrans protein language model, like SETH.
- In the latest CAID-2 benchmark, DisorderUnetLM ranks 9th out of 41 predictors on the Disorder-PDB subset and 1st on the Disorder-NOX subset.
- The Attention U-Net architecture allows for fast training and inference, making DisorderUnetLM suitable for large-scale predictions and for low-end devices.
- The authors share the complete code and models to support reproducibility and encourage the use of DisorderUnetLM in protein research.
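To make the "Attention U-Net" part of the highlights concrete, the following is a minimal sketch of the additive attention gate that this family of architectures applies to skip connections. It is not taken from the DisorderUnetLM code: the function names, argument names, and the use of plain Python lists are illustrative assumptions, and the sketch assumes a 1D (per-residue) layout in which the gating signal has already been resampled to the same length as the skip features.

```python
import math


def matvec(weights, vector):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def attention_gate(skip, gate, w_x, w_g, psi, bias=0.0):
    """Additive attention gate over a 1D sequence (hypothetical helper).

    skip     : encoder features, one vector per residue
    gate     : decoder (gating) features, one vector per residue
    w_x, w_g : projections of skip and gate into a shared intermediate space
    psi      : vector that collapses the intermediate features to one score
    Returns the gated skip features and the per-residue attention coefficients.
    """
    gated, alphas = [], []
    for x, g in zip(skip, gate):
        # A kernel-size-1 convolution is a per-position linear map.
        projected = zip(matvec(w_x, x), matvec(w_g, g))
        hidden = [max(0.0, a + b) for a, b in projected]  # ReLU
        score = sum(p * h for p, h in zip(psi, hidden)) + bias
        alpha = 1.0 / (1.0 + math.exp(-score))  # sigmoid, strictly in (0, 1)
        alphas.append(alpha)
        # Scale every channel of the skip feature by the same coefficient.
        gated.append([alpha * channel for channel in x])
    return gated, alphas


# Toy input: 2 residues, 2 channels each (made-up numbers).
skip = [[2.0, 4.0], [1.0, -1.0]]
gate = [[0.5, 0.5], [1.0, 1.0]]
gated, alphas = attention_gate(skip, gate, [[1.0, 0.0]], [[0.0, 1.0]], [1.0])
```

The gate only rescales the skip features, so the decoder still receives the encoder's full resolution, with positions the gating signal considers irrelevant damped toward zero.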
Statistics
The article reports the following key metrics:
- On the flDPnn test set, DisorderUnetLM achieves an F1-score of 0.629, ROC-AUC of 0.835, and MCC of 0.478.
- On the larger CAID Disorder-PDB test set, DisorderUnetLM achieves an F1-score of 0.516, ROC-AUC of 0.826, and MCC of 0.414.
- On the binarized CheZOD test set, DisorderUnetLM achieves a ROC-AUC of 0.910, matching the performance of the SETH predictor.
- On the CAID-2 Disorder-PDB test set, the ensembled DisorderUnetLM achieves a ROC-AUC of 0.924.
- On the CAID-2 Disorder-NOX test set, the ensembled DisorderUnetLM achieves the best ROC-AUC of 0.844.
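For readers unfamiliar with the three metrics reported above, here is a minimal pure-Python sketch of how F1-score, MCC, and ROC-AUC can be computed from per-residue labels (1 = disordered, 0 = ordered) and predicted scores. The function names, the toy data, and the 0.5 decision threshold are assumptions made for illustration; they are not the evaluation code or the thresholds used in the article or in CAID.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, in [-1, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of residues, which is fine for a sketch but slow at scale.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one residue of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six residues with made-up scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.4, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(f1_score(labels, preds), mcc(labels, preds), roc_auc(labels, scores))
```

F1 and MCC depend on the chosen threshold, while ROC-AUC is computed from the raw scores, which is one reason a predictor can rank differently under each metric.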
Quotes
"DisorderUnetLM shows top results in direct comparisons with flDPnn and IDP-CRF predictors using MSAs and with the SETH predictor using features from the same ProtTrans model."
"Among 41 predictors from the latest Critical Assessment of Protein Intrinsic Disorder Prediction (CAID-2) benchmark, it ranks 9th for the Disorder-PDB subset (with ROC-AUC of 0.924) and 1st for the Disorder-NOX subset (with ROC-AUC of 0.844) which confirms its potential to perform well in the upcoming CAID-3 challenge for which DisorderUnetLM was submitted."