Presentation + Paper
13 June 2023
Ethical data splitting utilizing L1-norm principal component analysis
Abstract
Ethical data splitting is essential to the validity of any data-driven solution. If the data are biased, they will not accurately represent how well the solution solves the problem. To split data ethically, the overall variance of the data must be fairly represented in both the training and the testing sets. This requires identifying the outliers in the data so that they can be accounted for during the split. Finding the principal components of the data using the L2-norm has been shown to be an effective way to identify outliers and build a dataset that is robust to them. The L1-norm has been shown to be more resistant to outliers than the L2-norm, so it should make the resulting dataset more robust still. Therefore, utilizing L1-norm principal components when determining ethical data splits will result in more robust datasets.
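The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (1) a fixed-point sign-flipping iteration for the first L1-norm principal component, (2) outliers flagged by their residual distance from that component using a median/MAD threshold, and (3) a split that places flagged and unflagged samples into the training and testing sets in the same proportion. The function names, the threshold factor `k`, and the median centering are all hypothetical choices made for illustration.

```python
import numpy as np


def l1_principal_component(X, n_iter=100):
    """Approximate the first L1-norm principal component of centered data X.

    Assumed approach: fixed-point iteration that maximizes sum_i |w . x_i|
    subject to ||w|| = 1. X has shape (n_samples, n_features) and must
    contain at least one non-zero row.
    """
    # Start from the sample with the largest Euclidean norm (a common heuristic).
    w = X[np.argmax(np.linalg.norm(X, axis=1))].astype(float)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        signs = np.sign(X @ w)
        signs[signs == 0] = 1.0      # break ties so no sample is dropped
        w_new = signs @ X            # sum_i sign(w . x_i) * x_i
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w


def outlier_aware_split(X, test_fraction=0.25, k=3.0, seed=0):
    """Split row indices so flagged outliers appear proportionally in both sets.

    Returns (train_idx, test_idx, is_outlier). The MAD threshold `k` is an
    illustrative assumption, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    Xc = X - np.median(X, axis=0)    # robust centering (assumption)
    w = l1_principal_component(Xc)

    # Residual distance of each sample from the line spanned by w.
    residual = np.linalg.norm(Xc - np.outer(Xc @ w, w), axis=1)
    med = np.median(residual)
    mad = np.median(np.abs(residual - med))
    is_outlier = residual > med + k * 1.4826 * mad

    # Sample the test set separately from the outlier and inlier groups.
    test_parts = []
    for group in (np.flatnonzero(is_outlier), np.flatnonzero(~is_outlier)):
        group = rng.permutation(group)
        n_test = int(round(test_fraction * len(group)))
        test_parts.append(group[:n_test])
    test_mask = np.zeros(len(X), dtype=bool)
    test_mask[np.concatenate(test_parts).astype(int)] = True
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask), is_outlier


if __name__ == "__main__":
    demo_rng = np.random.default_rng(42)
    data = demo_rng.normal(size=(100, 3))
    train_idx, test_idx, flagged = outlier_aware_split(data)
    print(len(train_idx), len(test_idx), int(flagged.sum()))
```

Only the first component is computed here; further components could be obtained by deflating the data and repeating the iteration, but whether the paper does so is not stated in the abstract.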
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Garrett I. Cayce, Arthur C. Depoian II, Colleen P. Bailey, and Parthasarathy Guturu, "Ethical data splitting utilizing L1-norm principal component analysis", Proc. SPIE 12522, Big Data V: Learning, Analytics, and Applications, 1252208 (13 June 2023); https://doi.org/10.1117/12.2664127
KEYWORDS
Principal component analysis
Databases
Binary data
Machine learning
Sampling rates
Data modeling
Electrocardiography