Knowledge-Slanted Random Forest Method for High-Dimensional Data and Small Sample Size with a Feature Selection Application for Gene Expression Data

Journal: BioData Mining
Date issued: 2024-09-10
Author(s): Erika Cantor; Sandra Guauque-Olarte; Roberto León; Steren Chabert (Facultad de Ingeniería); Rodrigo Salas (Facultad de Ingeniería)
DOI: 10.1186/s13040-024-00388-8
WoS ID: WOS:001309377700001

Abstract
The use of prior knowledge in the machine learning framework has been considered a potential tool to handle the curse of dimensionality in genetic and genomic data. Although random forest (RF) is a flexible non-parametric approach with several advantages, it can yield poor accuracy in high-dimensional settings, particularly when sample sizes are small. We propose a knowledge-slanted RF that integrates biological networks into the model as prior knowledge to improve its performance and explainability, and we illustrate its use for selecting and identifying relevant genes. Knowledge-slanted RF consists of two stages. First, prior knowledge represented as a graph is translated into relevance scores by running a random walk with restart algorithm, which scores each gene according to its connections and location in a protein-protein interaction network. Then, each gene's relevance is used to modify its probability of being drawn as a candidate split feature in the conventional RF. Experiments on simulated datasets with very small sample sizes (n ≤ 30), comparing knowledge-slanted RF against conventional RF and logistic lasso regression, suggest improved precision in outcome prediction relative to the other methods. Knowledge-slanted RF was further complemented by a modified version of the Boruta feature selection algorithm. Finally, knowledge-slanted RF identified more biologically relevant genes, offering users a higher level of explainability than conventional RF. These findings were corroborated in a real case study identifying genes relevant to calcific aortic valve stenosis.
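
The two stages summarised in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the seed genes, the restart probability `restart`, the convergence tolerance and the function names `random_walk_with_restart` and `draw_candidate_features` are all hypothetical. Relevance is computed by iterating p <- (1 - r) W p + r p0 on the column-normalised adjacency matrix W of the network, and the resulting scores are then used directly as sampling weights when drawing the `mtry` candidate split features at a tree node; how the published method maps relevance to selection probabilities is not described in this record.

```python
import random


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node of a network by random walk with restart (RWR).

    adj     -- symmetric 0/1 adjacency matrix (list of lists), e.g. a PPI network
    seeds   -- indices of seed genes the walker jumps back to (assumption)
    restart -- restart probability r (hypothetical default)
    Iterates p <- (1 - r) * W p + r * p0 until the L1 change drops below tol,
    where W is the column-normalised adjacency matrix.
    """
    n = len(adj)
    seed_set = set(seeds)
    degree = [sum(adj[i][j] for i in range(n)) for j in range(n)]  # column sums
    # Restart vector: uniform over the (assumed) seed genes.
    p0 = [1.0 / len(seed_set) if i in seed_set else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            # Walk step: neighbour j passes 1/deg(j) of its mass to node i.
            # Isolated nodes (degree 0) pass nothing on.
            walk = sum(adj[i][j] * p[j] / degree[j] for j in range(n) if degree[j] > 0)
            new_p.append((1.0 - restart) * walk + restart * p0[i])
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


def draw_candidate_features(relevance, mtry, rng):
    """Draw mtry distinct feature indices, each step picking a remaining
    feature with probability proportional to its relevance
    (sequential weighted sampling without replacement).

    Relevance scores are used as-is as weights (assumption).
    """
    remaining = list(range(len(relevance)))
    chosen = []
    for _ in range(min(mtry, len(remaining))):
        total = sum(relevance[i] for i in remaining)
        u = rng.random() * total
        acc = 0.0
        for k, i in enumerate(remaining):
            acc += relevance[i]
            if u < acc:
                break
        # If rounding leaves u >= acc, k is the last index, which is still valid.
        chosen.append(remaining.pop(k))
    return chosen


if __name__ == "__main__":
    # Toy 5-gene network: a path 0-1-2-3, plus gene 4 attached to gene 2.
    adj = [
        [0, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    scores = random_walk_with_restart(adj, seeds={0})
    print([round(s, 3) for s in scores])
    print(draw_candidate_features(scores, mtry=2, rng=random.Random(0)))
```

Because relevance is used directly as a weight in this sketch, a gene with zero relevance (for example, one unreachable from the seeds) can never be drawn as a candidate; mixing in a small uniform component would avoid that, which is likewise only an assumption here.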

Subjects: Biochemistry; Computational Mathema...; Computational Theory ...; Computer Science Appl...; Genetics; Mathematical And Comp...; Molecular Biology
OECD subjects: Natural Sciences::Oth...
Quartile (date issued): Q2
License: Open access