
Experimentation Elite London 2025: Building for Humans and the Machine

May 30, 2025
|
Hannah Calcutt

December's Experimentation Elite in London brought together perspectives from across the world of experimentation – traditional optimisers and innovators, technical experts and designers, all facing distinct challenges. Topics ranged from the psychology of customer behaviour to some genuinely provocative questions about AI's impact on our industry.

Talia Wolf: AI Hasn't Killed the CRO Star

Talia Wolf opened by diagnosing what she calls the "deadly hamster wheel of optimisation" – the cycle where companies test random strategies, copy what competitors are doing, and end up creating a homogenised market where everyone looks the same. Her challenge to the audience was simple but uncomfortable: could your customers distinguish your company's logo from your competitors'?

To break out of this cycle, Talia introduced the ACT matrix – a framework borrowed from Acceptance and Commitment Therapy that helps people gain clarity on what really matters. The matrix has four quadrants: on the right sit "towards moves" (where you want to go) and on the left "away moves" (what you're trying to avoid), whilst the bottom captures your values and purpose and the top identifies what gets in the way. The exercise forces you to identify your actual purpose, acknowledge the internal obstacles and avoidance behaviours that pull you away from your goals, and recognise the actions that move you towards what matters. For experimentation teams, this means stepping back from the tactical treadmill and asking what you're really trying to achieve.

Perhaps the most important shift Talia advocated for was reframing how we think about experiment outcomes. Tactical teams measure success purely by uplift. Strategic teams focus on understanding why customers behave the way they do. When you start with real customer insights and launch experiments to test meaningful hypotheses, a "losing" test isn't a failure – it's learning. You're still discovering something valuable about your customers, and that information can inform decisions across the entire company.

Lucia van den Brink: Want to Find More Winning Experiments?

Lucia van den Brink came with a compelling message backed by data: if you want more winning experiments, get more people involved. Her research across multiple organisations consistently showed that bringing diverse perspectives into the experimentation process leads to significantly higher win rates. The collaboration bonus is real, but with a caveat: Lucia showed one counterexample where increased collaboration actually hurt win rates, because the programme wasn't mature enough. You can't just throw more people at an immature programme and expect magic.

For collaboration to work, you need the right conditions. The problem has to be genuinely complex – simple optimisations don't benefit from too many voices. Your team needs psychological safety, where people feel comfortable sharing unusual ideas without getting shot down. You need established processes and standards before you scale up participation. And you need actual diversity of perspective, not just more people who think the same way.

Lucia's practical advice was refreshingly specific. Don't just measure experiment outcomes – track how many people and teams are actually contributing to your programme. Use structured brainstorming formats like 1-2-4-All to ensure quieter voices get heard. Most importantly, make sure brands own their experimentation problems rather than outsourcing all the thinking. When experimentation leads actively protect unusual ideas from premature criticism, that's when you get the breakthrough insights.

Alun Lucas and Craig Sullivan: Optimise or Die – Why Your Forms Leak Customers and What to Do About It

Alun Lucas and Craig Sullivan’s session was titled "Optimise or Die," and they weren't being dramatic. Their research paints a grim picture: most web forms leak customers at an alarming rate. Password fields are particularly problematic. The issue isn't security itself, but that overly complex password requirements create massive friction without delivering better security. A password like Password135! might tick all the compliance boxes, but it's hardly secure.
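To see why compliance and security diverge, here's a minimal Python sketch (illustrative only – the policy rules and entropy estimate are my assumptions, not anything the speakers showed). A typical complexity policy happily accepts Password135!, even though its structure – capitalised dictionary word, trailing digits, trailing symbol – is exactly the pattern cracking tools try first.

```python
import math
import re

def passes_complexity_policy(pw: str) -> bool:
    """A typical 'tick-box' policy: 8+ chars, upper, lower, digit, symbol."""
    return (
        len(pw) >= 8
        and re.search(r"[A-Z]", pw) is not None
        and re.search(r"[a-z]", pw) is not None
        and re.search(r"\d", pw) is not None
        and re.search(r"[^A-Za-z0-9]", pw) is not None
    )

def naive_entropy_bits(pw: str) -> float:
    """Rough upper bound on strength from character-pool size alone.
    Real estimators also penalise dictionary words and predictable
    patterns (capital first letter, digits and symbol at the end)."""
    pool = 0
    if re.search(r"[a-z]", pw): pool += 26
    if re.search(r"[A-Z]", pw): pool += 26
    if re.search(r"\d", pw): pool += 10
    if re.search(r"[^A-Za-z0-9]", pw): pool += 33
    return len(pw) * math.log2(pool) if pool else 0.0

print(passes_complexity_policy("Password135!"))  # ticks every box
```

Strength estimators such as zxcvbn take the opposite approach, scoring guessability rather than counting boxes ticked – which is why they flag passwords the policy waves through.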

One myth they thoroughly dismantled: the idea that shorter forms always perform better. Using the Fogg Behaviour Model, they demonstrated that motivation and ability matter far more than field count. A long form with high user motivation and excellent usability can outperform a poorly designed short form. It's not about how many fields you have – it's about whether people want what you're offering and whether you've made it easy for them to complete the process.

Their practical recommendations covered basics that often get overlooked: top-aligned labels, testing on real mobile devices rather than desktop simulations, turning off auto-correct that second-guesses users. They pointed to gov.uk as inspiration – designed to be accessible and usable for the entire population. On security, they addressed the captcha problem: yes, you need bot protection, but captchas frustrate real users. Alternatives like honeypot fields and other techniques can provide the same protection without annoying the humans you're trying to serve.
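As a concrete sketch of the honeypot idea (the field name and check are hypothetical, not from the talk): the form includes an extra input that humans never see, so any submission that fills it in is almost certainly a bot – no captcha friction required.

```python
def is_probably_bot(form: dict) -> bool:
    """Server-side honeypot check on submitted form data.

    The form contains a decoy 'website' input hidden from humans via
    CSS (e.g. position: absolute; left: -9999px) rather than
    type="hidden", which many bots know to skip. Real users leave it
    empty; bots that auto-fill every field reveal themselves.
    """
    return bool(form.get("website", "").strip())

print(is_probably_bot({"email": "a@b.com", "website": ""}))          # human
print(is_probably_bot({"email": "a@b.com", "website": "http://x"}))  # bot
```

In practice this is usually combined with other low-friction signals (submission timing, rate limiting) rather than used alone, since sophisticated bots can learn to skip decoy fields.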

Rose Keen: Brains, Biases, and Buying – Leveraging Behavioural Science in Your Relationship with the Customer

Rose Keen started with a reminder that 95% of what goes on in our brains happens subconsciously, then demonstrated it with the Bouba/Kiki effect – showing how we instantly and unconsciously associate rounded shapes with one name and spiky shapes with another. It drives home how much of customer behaviour operates below conscious thought.

Monzo's bright coral banking cards illustrate the bandwagon effect perfectly – their visibility makes them popular, which makes them more popular still. Disney parks engineer the end of every experience because they understand the peak-end rule: those final moments define how visitors remember their entire day. And our perception of value is almost entirely relative – the same product feels expensive or cheap depending purely on what it's positioned next to.

Her warning about scarcity bias felt particularly timely. Limited availability increases perceived value (Wordle's once-per-day model being a prime example), but dishonest scarcity messaging backfires. Artificial urgency that customers see through does more harm than good. Rose closed by encouraging the audience to practise gratitude by reminding themselves "these are the good days" – a useful reminder in an industry obsessed with optimisation that we're ultimately building for humans.

Jono Alderson: The Existential Crisis Facing Marketing and CRO

If Rose's talk was about understanding the human brain, Jono Alderson's was about how machines are starting to think for us – and it's possibly the most unsettling shift our industry has faced. He traced the history of marketing back to a simple concept: crossing the threshold. Someone walks into your shop, lands on your website, engages with your content. The entire web has been built around this moment. Except now, with zero-click searches and AI-powered answers, that threshold is vanishing. People are getting what they need without ever visiting your site.

The shift is profound. We're no longer persuading people – we're persuading algorithms. These AI systems act as what Jono called "immunological systems," filtering out manipulation and noise. The traditional optimisation tactics we've relied on aren't just ineffective anymore – they're actively working against you, because the systems are designed to spot and suppress exactly that kind of manipulation.

Jono's core argument is that we need to stop obsessing over our own websites and start thinking about the entire corpus of information that shapes how AI systems understand our brands. It's not about what you say about yourself anymore – it's about what the totality of the web says about you. Reviews, press coverage, industry publications, social media. All of it matters. And interestingly, despite this dramatic technological shift, the boring technical fundamentals matter more than ever. Poor technical SEO, accessibility problems, performance issues – AI systems are even less forgiving about these things than Google was.

The Q&A revealed what might be the most uncomfortable truth: most brands won't make this shift. They won't fundamentally rethink quality and reputation. But for smaller, more agile players, there's an opportunity. Established brands are carrying decades of baggage that AI systems will happily surface. New brands can build their reputations from scratch in this environment. Your website still matters as a "version of the truth," but if you're spending more energy on your site than on shaping the broader conversation about your brand, you might be optimising the wrong thing.

Roundtable Discussions: Collaborative Learning

The roundtable sessions provided an opportunity to dive deeper into specific challenges facing practitioners. Topics ranged from data science and analytics to design systems, psychology, user research, and emerging technologies.

Design Systems as Gardens

One of the most memorable discussions centred on global design systems (GDS). Erin Weigel (https://www.linkedin.com/in/erindoesthings/) introduced the analogy of treating a GDS like a garden. The idea is to tend to it like a gardener – adding new elements that fit, having a clear process for evaluating additions, and trimming back what isn't working. This perspective addressed a common frustration: overly rigid design systems that make it difficult to fix UX and UI challenges. By thinking of the GDS as a living system requiring care and cultivation, rather than an unchangeable rulebook, teams can maintain consistency whilst remaining adaptable to new needs.

Breaking Down Silos

A recurring theme across multiple tables was the struggle of communication between different teams. Participants shared experiences of departments operating in silos, leading to duplicated work, conflicting priorities, and missed opportunities. The value of cross-functional teams that genuinely collaborate rather than simply coordinate was emphasised repeatedly.

One particularly rich discussion explored how to achieve success in low-traffic CRO programmes. The group discussed proxy metrics that provide signals before traditional metrics like bookings or revenue. The concept of an economic tree relating different metrics to one another resonated strongly – understanding how leading indicators connect to lagging outcomes helps teams make faster, more confident decisions.
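A toy example of the economic-tree idea (all figures invented for illustration): decompose revenue into the steps that feed it, so that a movement in a proxy metric such as add-to-basket rate can be translated into its expected effect on the lagging outcome.

```python
# Hypothetical funnel figures for an e-commerce site.
sessions = 50_000
add_to_basket_rate = 0.12    # leading / proxy metric
basket_to_order_rate = 0.35  # downstream conversion step
avg_order_value = 80.0

# The "tree": revenue decomposes multiplicatively into its drivers.
orders = sessions * add_to_basket_rate * basket_to_order_rate
revenue = orders * avg_order_value
print(f"orders={orders:.0f}, revenue=£{revenue:,.0f}")

# A relative lift in the proxy metric propagates straight through,
# assuming the downstream rates hold steady (the key assumption to
# validate before trusting the proxy).
lifted_revenue = revenue * 1.05
print(f"projected revenue at +5% add-to-basket: £{lifted_revenue:,.0f}")
```

The value of the decomposition is that a low-traffic test can read a significant result on add-to-basket rate long before the revenue signal is detectable, provided the link between the proxy and the outcome has been checked.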

Combining Data Sources

The power of combining multiple data sources was another key insight. Customer service calls and support tickets emerged as a particularly rich but underutilised source of information. These interactions reveal customer frustrations and problems in their own words, providing invaluable context for prioritising experimentation programmes.

Participants also discussed strategies for dealing with senior stakeholders who demand positive results immediately, especially in the early days of a programme. The importance of reframing "losing" experiments was emphasised – these tests save money by preventing the implementation of bad ideas. A well-designed experiment that produces a negative result is still a success from a learning and risk-management perspective.

The AI Debate

Perhaps inevitably, AI dominated several roundtable discussions. There was debate between those embracing AI tools and more sceptical perspectives. The notion of AI autonomously generating optimal creative ideas, test strategies, and content was challenged – as one participant noted, if AI could already do these things perfectly, everyone would be doing it, and all content would look the same. The general view was that AI is a tool for augmenting human work, not replacing the strategic thinking, domain expertise, and creativity that drive effective experimentation.

Final Thoughts

The real value of the day came from hearing how other practitioners approach challenges that feel universal: dealing with rigid design systems, running programmes on low traffic, managing stakeholders who want wins immediately. Whether it's Erin's garden analogy for GDS or the discussions about proxy metrics and psychological safety for collaboration, it's useful to see how others frame and solve problems we face constantly. Alun and Craig's reminder about forms was particularly refreshing in its simplicity: gov.uk has already solved these problems for an entire population, so stop reinventing password fields and just follow what works. Sometimes the answer isn't a novel testing strategy – it's recognising that good practice already exists and applying it.

Ready to evolve your experimentation strategy? As the landscape shifts between human psychology and machine-led discovery, the right strategy is more important than ever. At Eclipse, we help brands move beyond tactical tweaks to build high-integrity, long-term growth.

Talk to us about a Strategic Experimentation Audit

Hannah Calcutt
Data and CRO Consultant

About the Author

Hannah brings over 10 years' experience in science and consulting, now specialising in conversion rate optimisation and data-driven experimentation. With a strong scientific background, she applies rigorous methodology and cutting-edge technology to deliver innovative CRO solutions that drive measurable business results. She translates complex analytical insights into actionable business strategies that help organisations make data-informed decisions and optimise their digital performance.