Testμ 2024

How to Scalably Test LLMs | Anand Kannappan

About the talk

Watch this session as Anand Kannappan, Co-Founder and CEO of Patronus AI, addresses the unpredictability of Large Language Models (LLMs) and the most reliable automated methods for testing them at scale. He discusses why intrinsic evaluation metrics like perplexity often don't align with human judgments and why open-source LLM benchmarks may no longer be the best way to measure AI progress.

Key Highlights

35+ Sessions

60+ Speakers

20,000+ Attendees

2000+ Minutes

Live Q&As

Key Topics Covered

Intrinsic evaluation metrics like perplexity tend to be weakly correlated with human judgments, so they shouldn't be used to evaluate LLMs post-training (see the sketch after this list).

Creating test cases to measure LLM performance is as much an art as it is a science. Test sets should be diverse in distribution and cover as much of the use case scope as possible.

Open-source LLM benchmarks are no longer a trustworthy way to measure progress in AI, since most LLM developers have already trained on them.
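For context on the first point: perplexity is the exponential of a model's average negative log-likelihood over the tokens of a text. It measures how predictable the model found the sequence, not whether the output is correct or helpful, which is one reason it can diverge from human judgments. A minimal sketch of the computation, with made-up per-token log-probabilities for illustration:

import math

def perplexity(token_logprobs):
    # Perplexity = exp of the average negative log-likelihood (natural log)
    # the model assigns to each token. Lower means the model found the text
    # more predictable; it says nothing about factuality or helpfulness.
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical per-token log-probs for a 4-token completion (illustrative only).
print(round(perplexity([-0.2, -1.5, -0.7, -3.0]), 2))  # 3.86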

Testμ

Testμ (TestMu) is more than just a virtual conference. It is an immersive experience designed by the community, for the community: a 3-day gathering that unites testers, developers, community leaders, industry experts, and ecosystem partners in the testing and QA field, all converging under a single virtual roof. 😀
