The Test-Train Split Won't Save You
In modeling, cross-validation techniques such as the test/train split are often treated as a panacea for a poor dataset or a poor model selection procedure. I won't cite specific examples of this practice here, though such a list could run to several dozen entries across several fields.
Instead, I will illustrate through simulation why a test/train split cannot salvage a poor modeling pipeline.
First, some housekeeping. Below is the function I am using to generate data in which the predictors (\(X_{n}\)) may or may not have an underlying relationship to the response (\(y\)):
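(The original function is not reproduced in this excerpt, so the following is a minimal sketch of what such a generator could look like, assuming Python with numpy. The name `generate_data` and its parameters `n_obs`, `n_pred`, `n_informative`, `noise_sd`, and `seed` are illustrative choices, not the post's actual code.)

```python
# A hypothetical data generator: n_obs observations of n_pred predictors,
# where only the first n_informative predictors truly influence the response.
import numpy as np


def generate_data(n_obs=100, n_pred=50, n_informative=0, noise_sd=1.0, seed=None):
    """Simulate predictors X and a response y.

    When n_informative == 0, y is pure noise and no predictor has any
    underlying relationship to the response; otherwise the first
    n_informative columns of X contribute linearly to y.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_pred))                     # predictors X_1 ... X_p
    beta = np.zeros(n_pred)
    beta[:n_informative] = rng.normal(size=n_informative)    # true coefficients
    y = X @ beta + rng.normal(scale=noise_sd, size=n_obs)    # response
    return X, y


# Example: 100 rows, 50 predictors, none of which are truly related to y.
X, y = generate_data(n_obs=100, n_pred=50, n_informative=0, seed=42)
```

With `n_informative=0` the response is pure noise, which is exactly the setting where an apparently significant model fit on the training data should fail to generalize.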