hambone:
It would be an interesting logic/programming question to figure out how to use rankings.
Here’s how I’d do it if the goal is a defensible, reproducible “best coasters” ranking.
1) Collect the right kind of opinions (pairwise, not 1–10)
Rating scales (1–10) are easy, but they’re messy: everyone uses the scale differently, gets anchored by hype, and compresses scores.
Pairwise comparisons (“A vs B, which is better?”) are way cleaner because:
People are better at relative judgments than absolute scoring.
It reduces “everyone’s a 10” scale bias.
You can infer a global ranking from incomplete comparisons.
This is the core idea behind “Hawker-style” approaches (head-to-head preference aggregation), and it’s why people still talk about it.
2) Design the survey like an actual experiment (incomplete blocks + adaptivity)
No one has ridden everything, so you’re always dealing with missing data. The trick is making the missingness less destructive.
Practical setup
Each voter imports a “credits” list (or just checks off coasters ridden).
The system only asks them to compare coasters they’ve ridden.
Each session: ~15–30 comparisons (quick, low fatigue).
Make the comparisons smart
Use an incomplete block design mindset so the dataset doesn’t become a bunch of isolated little “my home park” islands. Balanced incomplete designs are a classic way to get efficient comparisons without requiring everyone to see everything.
Then add active selection:
Prefer pairs where the model is uncertain (sketched below).
Prefer “bridge” comparisons that connect clusters (Europe-heavy voters vs US-heavy voters, wood people vs hyper people, etc.).
This is how you turn a nerd poll into something that behaves like measurement.
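To make "prefer uncertain pairs" concrete, here's a minimal sketch in Python. It assumes Bradley–Terry-style strength estimates already exist (see section 3); pick_next_pair and its inputs are illustrative names, not a real library.

```python
import itertools

def pick_next_pair(strengths, ridden, asked):
    """Pick the most informative next question for one voter: the pair of
    coasters they've ridden whose predicted outcome is closest to a coin
    flip, i.e. where the model is most uncertain.

    strengths: coaster -> current strength estimate (Bradley-Terry style)
    ridden:    set of coasters this voter has ridden
    asked:     set of frozensets of pairs already shown to this voter
    """
    best, best_gap = None, float("inf")
    for a, b in itertools.combinations(sorted(ridden), 2):
        if frozenset((a, b)) in asked:
            continue
        p_a = strengths[a] / (strengths[a] + strengths[b])  # P(a beats b)
        gap = abs(p_a - 0.5)          # 0 means maximal uncertainty
        if gap < best_gap:
            best, best_gap = (a, b), gap
    return best
```

The "bridge" rule would layer on top of this: boost pairs whose two coasters live in different voter clusters.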
3) Use a real model to turn those comparisons into scores
This is the part where you stop doing “cumulative ranking or some ****” and do the normal thing statisticians do with pairwise data.
The baseline model
Bradley–Terry: each coaster has a latent “strength,” and your comparisons estimate the probability A beats B.
That gets you:
A score per coaster
A ranking
A built-in way to handle incomplete matchups
Make it actually robust (the part most polls skip)
Use a hierarchical version:
Add a rater effect (some voters are harsh, some are hype machines).
Add optional covariates like “ridden this year” (recency), because opinions drift and memory lies. (Hawker ballots even tracked “ridden this year” style info.)
Allow ties if you want (or force a pick, which is fine).
This is standard paired-comparison practice, and it’s well studied.
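As a concrete baseline, here's a minimal Bradley–Terry fit using the classic Zermelo/MM iteration. It's a sketch: no voter effects or covariates, and the dummy-opponent regularizer is an added assumption, not part of the canonical algorithm.

```python
from collections import defaultdict

def bradley_terry(wins, iters=100):
    """Fit plain Bradley-Terry strengths from (winner, loser) tuples
    using the classic Zermelo/MM iteration."""
    items = {c for pair in wins for c in pair}
    w = defaultdict(float)              # total wins per coaster
    n = defaultdict(int)                # comparisons per unordered pair
    for a, b in wins:
        w[a] += 1.0
        n[frozenset((a, b))] += 1
    p = dict.fromkeys(items, 1.0)       # start everyone equal
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(cnt / (p[i] + p[j])
                        for pair, cnt in n.items() if i in pair
                        for j in pair if j != i)
            # Half a win and half a loss against a dummy opponent of
            # strength 1: keeps never-winning coasters finite and acts
            # as mild shrinkage toward the middle.
            new[i] = (w[i] + 0.5) / (denom + 1.0 / (p[i] + 1.0))
        total = sum(new.values())
        p = {c: len(items) * v / total for c, v in new.items()}  # normalize
    return p

# Usage (made-up input): strengths = bradley_terry([("A", "B"), ("B", "C")])
# ranking = sorted(strengths, key=strengths.get, reverse=True)
```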
4) Kill the two biggest biases: exposure bias and sample bias
Exposure bias (few riders)
A coaster with 25 voters can “win” the internet if you don’t control for uncertainty.
Fix: shrinkage + minimum data rules
Use Bayesian priors / regularization so low-data coasters don’t rocket to #3 on vibes alone.
Publish a “Main Ranking” that requires:
at least X unique voters, and
at least Y total comparisons involving that coaster
Everything else goes into “provisional” or “insufficient data.”
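A sketch of that rule, with min_voters and min_comps standing in for X and Y; the pseudo-count shrinkage is a crude stand-in for the prior a real hierarchical fit would supply.

```python
def publish_split(scores, voters, comps, min_voters=25, min_comps=100,
                  pseudo=50):
    """Apply minimum-data rules plus crude shrinkage toward the mean.

    scores: coaster -> fitted score      voters: coaster -> unique voters
    comps:  coaster -> total comparisons involving that coaster
    min_voters/min_comps are the X and Y thresholds; pseudo controls how
    hard low-data coasters get pulled toward the global mean.
    """
    mean = sum(scores.values()) / len(scores)
    main, provisional = {}, {}
    for c, s in scores.items():
        # The fewer comparisons, the closer the score sits to the mean,
        # so a 25-voter darling can't rocket to #3 on vibes alone.
        shrunk = (comps[c] * s + pseudo * mean) / (comps[c] + pseudo)
        if voters[c] >= min_voters and comps[c] >= min_comps:
            main[c] = shrunk
        else:
            provisional[c] = shrunk
    return main, provisional
```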
Sample bias (who is voting)
Your voters are not “all riders.” They’re enthusiasts who:
travel more than average
skew toward newer rides
skew toward whatever regions are overrepresented
Fix options (pick how serious you want to be):
Stratified weighting by region/home country (so Ohio doesn’t become the global electorate).
Weight voters modestly by breadth of experience (someone with 15 credits probably shouldn’t equal someone with 500), but don’t get elitist about it.
Publish the demographic/credit distribution so everyone can see what the sample really is.
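Here's one hedged sketch of combining the two weighting ideas; every number in it is a knob, and where the region targets come from (attendance data, RCDB counts, whatever) is a separate fight.

```python
import math

def voter_weight(credits, region_share, target_share):
    """Combine a stratification weight (region) with a gentle breadth
    weight (credit count). All constants here are made-up knobs.

    credits:      number of coasters this voter has ridden
    region_share: fraction of the sample from this voter's region
    target_share: fraction you *want* that region to carry
    """
    strat = target_share / max(region_share, 1e-9)  # up/down-weight region
    breadth = math.log10(10 + credits)              # 15 credits ~= 1.4,
                                                    # 500 credits ~= 2.7:
                                                    # more say, not 30x more
    return strat * breadth
```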
5) Publish uncertainty, not just a single sacred list
A “scientific” ranking that outputs one definitive list with no error bars is cosplay.
So you publish:
Rank + 95% credible interval (or bootstrap CI) per coaster
Probability that #5 actually beats #4 (often it won’t be decisive)
Tiers (“these 8 are statistically indistinguishable”)
That makes the list more useful, not less, because it tells people where the real consensus is versus knife-edge fan wars.
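A sketch of the bootstrap route, reusing a fitting function like the bradley_terry sketch above: resample the comparisons with replacement, refit, and read the rank spread per coaster.

```python
import random
from collections import defaultdict

def bootstrap_ranks(wins, fit, n_boot=200):
    """Resample the comparison list with replacement, refit each time,
    and collect the distribution of ranks per coaster."""
    ranks = defaultdict(list)
    for _ in range(n_boot):
        sample = [random.choice(wins) for _ in wins]
        strengths = fit(sample)          # e.g. the bradley_terry sketch
        ordered = sorted(strengths, key=strengths.get, reverse=True)
        for r, c in enumerate(ordered, start=1):
            ranks[c].append(r)           # coasters missing from a
                                         # resample just skip that replicate
    intervals = {}
    for c, rs in ranks.items():
        rs.sort()
        intervals[c] = (rs[int(0.025 * len(rs))],   # 95% bootstrap rank band
                        rs[int(0.975 * len(rs))])
    return intervals

# "Probability #5 actually beats #4" falls out of the same replicates:
# count the fraction in which one coaster's strength exceeds the other's.
```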
6) Make it reproducible (so it earns respect instead of fights)
If you want “scientifically acceptable,” you do the boring grown-up stuff:
Freeze the dataset for “2026 edition”
Publish the rules and model up front (don’t tweak after seeing results)
Release anonymized comparison data and code so anyone can replicate
When people can rerun your pipeline and get the same ranking, the community arguments shift from “rigged” to “ok fine, I hate math.”
7) What the final output looks like (the “useful list” part)
You don’t ship one list. You ship a small set:
Overall Top 100 (with uncertainty bands)
Regional Top 50s (US, Europe, Asia, etc.)
By category (wood, steel, family thrill, etc.)
Most Polarizing (highest voter disagreement)
Biggest risers/fallers year-over-year (with enough data to justify it)
That covers how coaster people actually consume rankings: bragging rights, trip planning, and arguing online.
8) If you want the simplest “best” approach that still counts as legit
Collect pairwise comparisons.
Fit Bradley–Terry with voter effects.
Apply shrinkage + minimum data thresholds.
Publish ranks with uncertainty and tiers.
That’s the shortest path to “this is real analysis” without turning it into a PhD dissertation nobody finishes.
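Glued together from the sketches above (same assumed function names, made-up inputs), the whole pipeline is only a few calls:

```python
# wins:   list of (winner, loser) tuples collected from voters
# voters: coaster -> unique voter count; comps: coaster -> comparison count
strengths = bradley_terry(wins)
main, provisional = publish_split(strengths, voters, comps)
intervals = bootstrap_ranks(wins, fit=bradley_terry)
for c in sorted(main, key=main.get, reverse=True):
    print(f"{c}: score {main[c]:.2f}, plausible rank {intervals[c]}")
```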
And yes, it will still be fought about, because coaster nerds don’t want truth. They want ammunition.
---
Thank you, AI overlords.
Oh, we never agree or resolve anything around here.
That's what makes it fun.
The AI solution is really labor intensive for relatively little gain in the quality of the results. Like I said, the five-point scale I use should be straightforward. Anything more would be overthinking it.
And here's a fun fact: Despite hundreds of track records, it looks like about a dozen people actually ranked their list. Glad I spent all that time coding it! (That was like 16 years ago, it's fine.)
Jeff - Editor - CoasterBuzz.com - My Blog
Jeff:
The AI solution is really labor intensive for relatively little gain in the quality of the results.
But...but...accuracy! Scientific relevance! Meaningful statistics! Quantifying opinions!
I seem to remember long-winded debates over the Hawker poll on all of the above.
I'd just feed AI the database and see what it spits out.
Honestly, I only copied and pasted it to be... well, me. But I really do think this part is an interesting way to take ranking lists to an ubernerdy, but still usable, place:
My finely-tuned LLM said:
So you publish:
Rank + 95% credible interval (or bootstrap CI) per coaster
Probability that #5 actually beats #4 (often it won’t be decisive)
Tiers (“these 8 are statistically indistinguishable”)
It'd create... oh God, I'm gonna say it... sliders for each entry. And there would be varying levels of overlap on the sliders between entries, from complete to none.
Maybe displayed as a list of percentiles, graphically, as bars? Like the old reports you'd get back from achievement tests in school? (Am I that old? Do they still do that? Did they ever?)
But as I talk (type?) it out, all I'm doing at that point is creating the same list (for the most part) and adding a way to visually display the uncertainty between rankings.
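(For what it's worth, that slider/bar view is maybe ten lines of matplotlib; the intervals below are made up.)

```python
import matplotlib.pyplot as plt

# Made-up (coaster, low_rank, high_rank) intervals from a bootstrap.
intervals = [("Coaster A", 1, 3), ("Coaster B", 2, 5),
             ("Coaster C", 3, 7), ("Coaster D", 6, 9)]

fig, ax = plt.subplots()
for y, (name, lo, hi) in enumerate(reversed(intervals)):
    ax.barh(y, hi - lo, left=lo, height=0.4)   # one "slider" per coaster
    ax.text(hi + 0.2, y, name, va="center")
ax.set_yticks([])
ax.set_xlabel("Plausible rank (95% interval)")  # overlap = statistical tie
plt.show()
```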
I'm not a nerd, you're a nerd!
Like weather forecasts. Not sure that Doppler 5 Million forecasts are any more accurate. But all the graphics, charts, street-by-street storm trackers, etc. look impressive and make for some striking visuals on the TV screen. The forecasts, though, are often for between a trace and 5 feet of snow.
Jeff:
it looks like about a dozen people actually ranked their list.
(Raises hand.) Thank you for coding that. I'm the kind of person that has trouble picking out toothpaste because there are too many choices. I find no joy in picking out a tube of toothpaste, or a box of cereal, or ... However, ranking my track list is a really fun experience. I start at the top and ask myself,
"If I were on a deserted island, and I could only choose one ride between this one and the one under it, which would I choose?"
...and then I move to the next one and ask the same thing, and so on and so forth.
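(Fun fact: that deserted-island walk is one pass of a bubble sort; repeat passes until nothing moves and the list is fully sorted. A toy sketch, with prefer standing in for the human answer:)

```python
def ranking_pass(ranking, prefer):
    """One top-to-bottom pass of the deserted-island test: compare each
    coaster to the one under it and swap when the lower one wins.
    prefer(a, b) returns whichever of the two you'd keep."""
    for i in range(len(ranking) - 1):
        a, b = ranking[i], ranking[i + 1]
        if prefer(a, b) == b:
            ranking[i], ranking[i + 1] = b, a
    return ranking
```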
And I like to revisit it every year or so. So much fun. For example, last year, I went to KD, and I had to give Grizzly and Racer a huge jump to the top of my list. They are really fun now that they have been retracked.
(I'm a lot better at making choices now that I'm older. I have found what brands and flavors I like, and look for only those. Other brands and flavors are invisible to me.)
-Travis
www.youtube.com/TSVisits
