A team at the University of Warwick pointed a machine-learning pipeline at four years of NASA TESS full-frame images — 2.2 million stars — and pulled out 118 validated planets, roughly 1,000 new candidates, and the first direct measurement of how scarce Neptune-sized worlds are in tight orbits. The pipeline is called RAVEN (RAnking and Validation of ExoplaNets), and the paper landed in MNRAS this spring.
I spend most of my telescope time on deep-sky imaging from my balcony in Nicosia, but I follow the exoplanet pipeline papers closely because they sit exactly at the intersection I care about: where does the ML end and the astrophysics begin? RAVEN is a clean case study.
The false-positive problem
TESS finds planet candidates the same way Kepler did: it watches a star’s brightness over time and flags periodic dips. A planet transiting the stellar disc blocks a fraction of the light, and the depth and shape of that dip encode the planet’s radius and orbital period.
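The depth-to-radius relationship is simple geometry: to first order, the fractional dip equals the planet-to-star area ratio, (Rp/R★)². A minimal sketch of that arithmetic (the function name and all values are illustrative; real fits also model limb darkening and blended light):

```python
# Transit depth to first order: delta = (Rp / Rstar)^2.
# Illustrative only; real pipelines fit limb-darkened transit models.

R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0
R_JUP_KM = 69_911.0

def transit_depth(r_planet_km: float, r_star_km: float) -> float:
    """Fractional flux drop for a full (non-grazing) transit."""
    return (r_planet_km / r_star_km) ** 2

# An Earth-sized planet crossing a Sun-like star: ~84 ppm.
depth_earth = transit_depth(R_EARTH_KM, R_SUN_KM)
# A Jupiter-sized planet: ~1%, far easier to detect.
depth_jup = transit_depth(R_JUP_KM, R_SUN_KM)
print(f"Earth: {depth_earth * 1e6:.0f} ppm, Jupiter: {depth_jup * 100:.2f}%")
```

The four-orders-of-magnitude spread between those two depths is why small planets are the hard case, and why the false-positive scenarios below matter most at shallow depths.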
The catch is that other things produce dips too. An eclipsing binary star in the background can mimic a transit. A grazing binary — where two stars just barely overlap — creates shallow, planet-like dips. Instrumental systematics (detector temperature drifts, scattered light from the Earth or Moon) add more noise. In Kepler data, the false-positive rate for planet candidates ran between 5% and 40% depending on the signal strength and stellar neighbourhood.
TESS makes this worse in two ways. Its pixels are large (21 arcseconds per pixel), so background binaries blend into the target star’s light more often. And its baseline is shorter — most stars get only 27 days of continuous coverage per sector — which means the transit signals are noisier and the orbital period solutions are less constrained.
The traditional fix is manual vetting: a human astronomer examines each candidate’s light curve, centroid motion, nearby stars, and archival data, then runs a statistical validation tool like VESPA or TRICERATOPS. It works, but it doesn’t scale. TESS has flagged tens of thousands of candidates. Working through them one at a time takes years.
What RAVEN actually does
RAVEN replaces that manual bottleneck with a two-stage automated pipeline. Here’s the architecture, as described in the paper by Hadjigeorghiou, Armstrong, et al.:
Stage 1 — Vetting (reject the obvious junk). A gradient-boosted decision tree (GBDT) classifier takes in a set of features extracted from each candidate’s light curve and contextual data: transit depth, duration, signal-to-noise ratio, centroid offset, nearby star contamination, even-odd transit depth differences (which flag eclipsing binaries with alternating primary and secondary eclipses). The GBDT was trained on a synthetic dataset where the team injected realistic planet transits and eight distinct categories of false positives into actual TESS light curves. Those eight categories include background eclipsing binaries, grazing binaries, hierarchical triple systems, and several flavours of instrumental artefact.
The GBDT outputs a vetting score. Candidates below a threshold get discarded. The ones above it move to stage 2.
Stage 2 — Validation (confirm it’s a planet). A Gaussian process (GP) classifier computes a Bayesian posterior probability that the candidate is a genuine planet versus each of the eight false-positive scenarios. The GP is trained on the same synthetic injection sets, but operates in a different feature space tuned for the subtler distinctions that survive the first stage. The output is a false-positive probability (FPP). Candidates with FPP below 1% are considered statistically validated planets — the same threshold used by earlier tools like VESPA, but applied at scale across the full TESS dataset for the first time.
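The decision rule at the end of stage 2 reduces to a one-liner once you have a posterior over the nine hypotheses. A sketch, with hypothetical scenario names and made-up posterior values standing in for the GP's actual output:

```python
# Sketch of the stage-2 decision rule: given a normalised posterior over
# nine hypotheses (planet + eight false-positive scenarios), the FPP is
# everything that is not the planet hypothesis. Names and numbers are
# illustrative, not from the paper.
import numpy as np

SCENARIOS = ["planet", "background_EB", "grazing_EB", "hierarchical_triple",
             "artefact_1", "artefact_2", "artefact_3", "artefact_4", "artefact_5"]

def false_positive_probability(posterior: np.ndarray) -> float:
    """FPP = 1 - P(planet), given a normalised posterior over all scenarios."""
    assert abs(posterior.sum() - 1.0) < 1e-9
    return 1.0 - posterior[SCENARIOS.index("planet")]

posterior = np.array([0.995, 0.002, 0.001, 0.001, 0.0002, 0.0002,
                      0.0002, 0.0002, 0.0002])
fpp = false_positive_probability(posterior)
validated = fpp < 0.01   # the 1% threshold shared with tools like VESPA
print(f"FPP = {fpp:.4f}, validated: {validated}")
```

Everything hard lives upstream of this line: the value of the posterior depends entirely on how well the GP's training injections cover the real population of false positives.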
The choice of GBDT for stage 1 is pragmatic: gradient-boosted trees handle heterogeneous tabular features well, train fast, and their feature importances are interpretable. The GP in stage 2 trades speed for calibrated uncertainty — it doesn’t just say “planet” or “not planet” but gives a probability distribution over all nine hypotheses (one planet scenario plus eight false positives). That calibration matters when you’re making demographic claims from the validated sample.
The headline numbers
From 2.2 million stars observed in TESS full-frame images over Cycles 1 through 4, RAVEN validated 118 planets. Of those, 31 are genuinely new — not previously identified as candidates by any pipeline. Another 2,000+ signals passed the vetting stage as high-quality candidates, with about 1,000 of those also new.
Some of the validated planets are ultra-short-period worlds (USPs) that complete an orbit in less than 24 hours. These are tidally locked, likely molten on their day side, and physically small — typically Earth-sized or sub-Earth. They’re rare, which is exactly why you need to process millions of stars to find them.
The more significant demographic result is the first direct measurement of how common “Neptunian desert” planets are. The Neptunian desert is a gap in the population of known exoplanets: in the orbital-period range of roughly 2 to 4 days, Neptune-mass planets are almost entirely absent. Smaller rocky planets survive there (they’re too compact to lose mass quickly), and hot Jupiters survive there (they’re too massive to strip), but Neptune-mass worlds apparently get photoevaporated by their star’s radiation or spiral inward through tidal decay. RAVEN’s validated sample puts a number on the gap for the first time: Neptunian-desert planets appear around just 0.08% of Sun-like stars.
The team also measured the overall occurrence rate of close-in planets (orbital periods under 16 days) around Sun-like stars at about 9–10%, consistent with Kepler’s earlier estimates but with uncertainties reduced by up to a factor of 10. That’s the payoff of going from ~200,000 Kepler target stars to 2.2 million TESS targets: the statistics tighten.
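The sample-size part of that improvement is just counting statistics: for a binomial occurrence rate f measured from N stars, the statistical error scales as √(f(1−f)/N). A back-of-envelope sketch (star counts from the text, everything else illustrative):

```python
# Why more stars tighten occurrence rates: binomial error ~ sqrt(f(1-f)/N).
# Counting statistics alone give ~3x here; the rest of the reported
# improvement has to come from better-characterised completeness and
# detection efficiency, which this toy calculation ignores.
import math

def rate_uncertainty(f: float, n_stars: int) -> float:
    """Binomial standard error on an occurrence rate f from n_stars."""
    return math.sqrt(f * (1.0 - f) / n_stars)

f = 0.09                                       # ~9% close-in rate from the text
err_kepler = rate_uncertainty(f, 200_000)      # ~200k Kepler targets
err_tess = rate_uncertainty(f, 2_200_000)      # 2.2M TESS FFI stars
print(f"Kepler: ±{err_kepler:.5f}, TESS: ±{err_tess:.5f}, "
      f"ratio {err_kepler / err_tess:.1f}x")
```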
Why the ML architecture matters
It’s tempting to treat RAVEN as a black box — data in, planets out. But the architecture choices reveal something about where the field is headed.
First, the two-stage design mirrors how human vetters work. A quick scan rejects the obvious non-planets (stage 1, fast GBDT), then a careful probabilistic analysis confirms the survivors (stage 2, slower GP). Splitting the workload lets the pipeline process millions of candidates without burning compute on full Bayesian inference for every signal.
Second, training on synthetic injections rather than labelled real data sidesteps the chicken-and-egg problem in exoplanet ML: you need confirmed planets to train a classifier, but you need a classifier to confirm planets. By simulating planets and false positives with known ground truth and injecting them into real TESS noise, the team gets a training set that’s both large and honest about the instrument’s quirks.
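The mechanics of an injection are simple to sketch. This toy version drops a box-shaped transit into a noisy light curve; real injection sets use full limb-darkened transit models (e.g. batman), and all parameters here are illustrative:

```python
# Sketch of transit injection: a box-shaped dip multiplied into a light
# curve. Real injections use limb-darkened models and actual TESS data;
# the simulated noise and all parameters here are illustrative.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a real TESS light curve: one 27-day sector, 30-min cadence.
t = np.arange(0, 27, 30 / 60 / 24)             # days
flux = 1.0 + rng.normal(0, 300e-6, t.size)     # 300 ppm white noise

def inject_box_transit(time, flux, period, t0, duration, depth):
    """Multiply a box transit of the given depth into every transit window."""
    phase = ((time - t0) % period) / period
    half = duration / period / 2
    in_transit = (phase < half) | (phase > 1 - half)
    out = flux.copy()
    out[in_transit] *= 1.0 - depth
    return out

injected = inject_box_transit(t, flux, period=3.5, t0=1.2,
                              duration=2.5 / 24, depth=1000e-6)
# The ground-truth label (planet vs. false-positive class, plus depth and
# period) travels with the light curve — an honest training target.
```

Injecting into real light curves, as the RAVEN team does, is the key detail: the classifier learns the instrument's actual noise, not an idealised model of it.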
Third, the GP’s calibrated probabilities feed directly into demographic analysis. If the false-positive probability is poorly calibrated — say, the model is overconfident and systematically underestimates FPP — then population statistics built on the validated sample will be biased. The team validated RAVEN’s calibration against known planets and known false positives from previous surveys, which is the kind of step that separates a useful pipeline from a demo.
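The standard diagnostic for this is a reliability check: bin candidates by stated probability and compare each bin's stated value with the empirical planet fraction among known labels. A sketch with synthetic labels (the helper and its data are mine, not the paper's):

```python
# Sketch of a calibration (reliability) check on held-out candidates with
# known labels. A calibrated model sits on the diagonal pred == obs;
# obs < pred at high probabilities would signal overconfidence and bias
# any occurrence rate built on the validated sample.
import numpy as np

def reliability_bins(p_planet, labels, n_bins=10):
    """Return (mean predicted prob, observed planet fraction) per bin."""
    edges = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.digitize(p_planet, edges) - 1, 0, n_bins - 1)
    pred, obs = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            pred.append(p_planet[mask].mean())
            obs.append(labels[mask].mean())
    return np.array(pred), np.array(obs)

# Synthetic sanity check: labels drawn so the probabilities are calibrated
# by construction, so the two columns should agree within noise.
rng = np.random.default_rng(1)
p = rng.uniform(0, 1, 5000)
labels = (rng.uniform(0, 1, 5000) < p).astype(float)
pred, obs = reliability_bins(p, labels)
```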
What it means for the next wave
TESS is still collecting data (its extended mission runs through at least 2028), and ESA’s PLATO mission is set to launch in late 2026 with the explicit goal of finding Earth-like planets in the habitable zone around Sun-like stars. PLATO will generate another firehose of transit candidates.
Pipelines like RAVEN won’t replace spectroscopic follow-up — you still need a ground-based spectrograph to measure a planet’s mass via radial velocity, and you need transmission spectroscopy (JWST, Ariel) to characterise its atmosphere. But they compress the years-long candidate-to-confirmed-planet pipeline into something that runs in hours, which means the spectroscopy teams can focus their telescope time on the most interesting targets rather than spending it confirming whether a candidate is even real.
For me, the RAVEN result is a reminder that the bottleneck in modern astronomy is increasingly computational, not observational. TESS, Gaia, Rubin, PLATO — the data is arriving faster than humans can vet it. The teams that build rigorous, well-calibrated ML pipelines are the ones that will set the pace.
The full RAVEN paper is on arXiv and published in MNRAS. The companion demographics paper covering the occurrence-rate analysis is also available on arXiv.
