The Test-Train Split Won't Save You

In modeling, cross-validation, such as a test/train split, is often treated as a panacea for poor datasets or poor model selection procedures. I do not cite examples of poor practice here, though such a list could run to several dozen entries from several fields.

I will illustrate through simulation how the test/train split cannot salvage a poor pipeline.

First, some housekeeping. Below is the function I use to generate data in which the predictors (\(X_{n}\)) may or may not have an underlying relationship to the response (\(y\)):

Recommended Mechanism Reading 2

Introduction

Follow-up to part 1.

I encourage readers to read the papers fully and not just rely on the small summaries here. This tangentially relates to a reason I think that large language model (LLM) summarisation of papers just isn’t as valuable as people think it is. It is simply not the same as carefully reading and thinking deeply about work. Having an algorithm “decide” what is and isn’t important in a piece of work - a decision which has already been considered by the authors, the editor and the reviewers - skips an important part of the learning process.

Recommended Mechanism Reading

Introduction

There are many resources or outlets that review new literature in chemistry, but few highlight classics. The exception to this is of course the “Classics in Total Synthesis” series of books. Here I will showcase some of my favourites from the literature, with a focus on mechanism, along with a brief explanation of my assessment of the importance of the work. If I can find the time, this will become a running series, until the relevant Zotero folder has been exhausted.

Handling Error in Replicates

Summary

If you do not want to read through the entire article:

Based on the simulations presented, in my opinion, the best way to handle errors calculated from replicates of experiments which involve fitting the data is the following:

  • Fit the data from each replicate independently.
  • Take the mean of the fitted parameter estimates obtained from each fit (here, the k values, giving \(\bar{k}\)).
  • Calculate the standard error of the mean for \(\bar{k}\) (\(SEM(\bar{k})\)).
  • Propagate the standard error of each fit to arrive at a fitting error for \(\bar{k}\) (\(\sigma_{fit}(\bar{k})\)).
  • Combine the errors by squaring them, summing them and taking the square root (\(\sigma_{tot}(\bar{k}) = \sqrt{(\sigma_{fit}(\bar{k}))^2 + (SEM(\bar{k}))^2}\)), thus arriving at a final value for k with quantified uncertainty (\(\bar{k} \pm \sigma_{tot}(\bar{k})\)).
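The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the simulations: the first-order model \(\ln c = \ln c_0 - kt\), the simulated data, and the helper names `fit_rate_constant` and `combine_replicates` are all hypothetical. The sketch also assumes the fits are independent, so that the fitting error of the mean is \(\sigma_{fit}(\bar{k}) = \frac{1}{n}\sqrt{\sum_i \sigma_{fit,i}^2}\).

```python
import math
import random
import statistics


def fit_rate_constant(t, ln_c):
    """Fit a straight line to (t, ln_c) by ordinary least squares.

    Assumes a first-order model ln(c) = ln(c0) - k*t, so k = -slope.
    Returns (k, standard error of k from the fit).
    """
    n = len(t)
    t_bar = statistics.fmean(t)
    y_bar = statistics.fmean(ln_c)
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, ln_c))
    slope = s_ty / s_tt
    intercept = y_bar - slope * t_bar
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, ln_c))
    se_slope = math.sqrt(ss_res / (n - 2) / s_tt)
    return -slope, se_slope


def combine_replicates(k_values, fit_errors):
    """Combine per-replicate estimates into a mean with total uncertainty.

    Returns (k_bar, sem, sigma_fit, sigma_tot).
    """
    n = len(k_values)
    k_bar = statistics.fmean(k_values)
    # Standard error of the mean from the spread between replicates.
    sem = statistics.stdev(k_values) / math.sqrt(n)
    # Fit errors propagated through the mean (assumes independent fits).
    sigma_fit = math.sqrt(sum(s ** 2 for s in fit_errors)) / n
    # Add the two contributions in quadrature.
    sigma_tot = math.sqrt(sigma_fit ** 2 + sem ** 2)
    return k_bar, sem, sigma_fit, sigma_tot


# Simulated replicates (illustrative values only, not real measurements).
rng = random.Random(1)
times = [float(i) for i in range(10)]
true_k = 0.30
fits = []
for _ in range(3):
    ln_c = [-true_k * ti + rng.gauss(0.0, 0.05) for ti in times]
    fits.append(fit_rate_constant(times, ln_c))

k_values = [k for k, _ in fits]
fit_errors = [se for _, se in fits]
k_bar, sem, sigma_fit, sigma_tot = combine_replicates(k_values, fit_errors)
print(f"k = {k_bar:.3f} ± {sigma_tot:.3f}")
```

Keeping the fitting step separate from the combining step means `combine_replicates` can be reused with estimates and standard errors from any fitting routine, provided each fit reports a standard error for the parameter of interest.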

The presented simulations show that a random effects model is also effective. However, for most chemists and wet lab scientists interested in appropriately accounting for error as assessed by repeated measurements, the above workflow should be sufficient in the majority of cases.