
How to Scalably Test LLMs | Anand Kannappan

About the talk

Watch this session as Anand Kannappan, Co-Founder and CEO of Patronus AI, addresses the unpredictability of Large Language Models (LLMs) and the most reliable automated methods for testing them at scale. He discusses why intrinsic evaluation metrics like perplexity often don't align with human judgments and why open-source LLM benchmarks may no longer be the best way to measure progress in AI.

Key Highlights

35+ Sessions

60+ Speakers

20,000+ Attendees

2000+ Minutes

Live Q&As

Key Topics Covered

Intrinsic evaluation metrics like perplexity tend to be weakly correlated with human judgments, so they shouldn't be used on their own to evaluate LLMs post-training (a short perplexity sketch follows this list).

Creating test cases to measure LLM performance is as much an art as it is a science. Test sets should be diverse in distribution and cover as much of the use case scope as possible.

Open-source LLM benchmarks are no longer a trustworthy way to measure progress in AI, since most LLM developers have already trained on them.
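
For context on the first point above, here is a minimal, illustrative sketch of how perplexity is typically computed from a model's per-token log-probabilities. The function name and example values are ours, not from the talk; in a real evaluation the log-probabilities would come from whichever LLM is under test.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities of a sequence.

    Illustrative only: token_logprobs would come from the model being
    evaluated (the log-probs it assigns to a held-out text).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_neg_logprob)

# Example: a model that assigns probability 0.25 to every token has
# perplexity 4. Lower is "better" intrinsically, but the talk's point
# is that this number often fails to track human judgments.
print(perplexity([math.log(0.25)] * 10))  # ~4.0
```

The number is cheap to compute at scale, which is exactly why the talk cautions against leaning on it: a lower perplexity does not reliably mean outputs that humans would judge as better.
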

Testμ

Testμ (TestMu) is more than just a virtual conference. It is an immersive experience designed by the community, for the community: a three-day gathering that unites testers, developers, community leaders, industry experts, and ecosystem partners in the testing and QA field under a single virtual roof.
