Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Medical Genetics and Genomics and the Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors that use large language models or structural predictions. It also outperformed methods that use evolutionary data alone or in combination with various gene and protein properties. PON-P3 classifies cases into three categories: benign, pathogenic, and variants of uncertain significance (VUSs). On binary test data, some metapredictors performed slightly better than PON-P3; however, in real-life situations with patient data, those methods overpredict both pathogenic and benign cases. We used PON-P3 to predict all possible amino acid substitutions in all human proteins encoded by MANE transcripts. The method was also used to predict all unambiguous VUSs in ClinVar (i.e., those without conflicting interpretations). Of these, 12.9% were predicted to be pathogenic and 49.9% benign.
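The abstract contrasts binary pathogenic/benign calls with PON-P3's three-category output. The Python sketch below illustrates that difference in general terms only. It assumes a predictor emits a probability-like score in [0, 1] and uses two hypothetical cut-offs (`BENIGN_MAX`, `PATHOGENIC_MIN`) chosen purely for illustration; they are not PON-P3's thresholds, and the helper functions and toy scores are likewise hypothetical rather than part of the tool.

```python
from collections import Counter
from typing import Iterable

# Hypothetical cut-offs for illustration only; not PON-P3's actual thresholds.
BENIGN_MAX = 0.3
PATHOGENIC_MIN = 0.7


def classify_binary(score: float, cutoff: float = 0.5) -> str:
    """Conventional binary call: every variant is forced to one side."""
    return "pathogenic" if score >= cutoff else "benign"


def classify_three_way(score: float,
                       benign_max: float = BENIGN_MAX,
                       pathogenic_min: float = PATHOGENIC_MIN) -> str:
    """Three-category call: scores between the two cut-offs remain VUS."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= pathogenic_min:
        return "pathogenic"
    if score <= benign_max:
        return "benign"
    return "VUS"


def summarize(labels: Iterable[str]) -> dict:
    """Percentage of variants in each category, rounded to one decimal."""
    labels = list(labels)
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: round(100 * n / len(labels), 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy scores for illustration only; these are not PON-P3 outputs.
    scores = [0.05, 0.22, 0.41, 0.55, 0.68, 0.74, 0.93, 0.12]
    print(summarize(classify_binary(s) for s in scores))
    # {'benign': 50.0, 'pathogenic': 50.0}
    print(summarize(classify_three_way(s) for s in scores))
    # {'benign': 37.5, 'VUS': 37.5, 'pathogenic': 25.0}
```

With these toy scores, the binary rule assigns every variant to one side, while the three-way rule leaves intermediate scores as VUS instead of forcing a call. How PON-P3 actually sets its category boundaries should be taken from the method's own description.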
Lund University, Faculty of Medicine, Department of Experimental Medical Science, Protein Bioinformatics
Lund University, Faculty of Medicine, Department of Clinical Sciences, Lund, Section III, Rheumatology
Lund University, Profile areas and other strong research environments, Strategic research areas (SRA), eSSENCE: The e-Science Collaboration
Lund University, Faculty of Science, Department of Chemistry, Center for Molecular Protein Science, Biochemistry and Structural Biology
Lund University, Profile areas and other strong research environments, Other Strong Research Environments, LUCC: Lund University Cancer Centre