AI-Powered QA: How Large Language Models Are Revolutionizing Software Testing - Part 1

Ilam Padmanabhan

Posted On: January 28, 2025

1792 Views

13 Min Read

According to GitHub’s 2023 Octoverse report, 46% of code is now written with AI assistance, spiking to 75% for Jupyter notebooks. AI has rapidly evolved from a curiosity to an indispensable coding companion.

But who’s testing all this AI-generated code?

Consider this: in 1996, a $1.7 billion Ariane 5 rocket failure was caused by a software error—a simple bug traditional methods missed. With over a billion code commits in 2024, the stakes are even higher today.

AI isn’t just reshaping how we write code—it’s transforming how we test it. Imagine tests that write themselves, bug reports that explain the root cause, and documentation that syncs seamlessly with your code. Platforms like Kane AI are making these possibilities a reality, revolutionizing end-to-end test automation.

Tokens: The Lego Bricks of How LLMs Think About Testing

I’ll do my best to keep this short. There is a lot of hype around AI and LLMs, but many people do not understand how they actually work. I hope you’ll forgive my oversimplification in this section, but I think it is necessary to start with some basics.

Think of tokens as individual pieces of a puzzle that a computer (LLM) uses to understand what it’s looking at. When we use LLMs for testing, these “tokens” could be a word, part of a word, a piece of code or even a special character like a comma. Each token is one tiny bit of information the LLM needs to assemble to get the whole picture.

Consider this small piece of code for a test:
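
    # A simple, illustrative login test (any similar snippet makes the same point);
    # login() stands in here for your application's own login helper.
    def test_login():
        response = login("test_user", "secret123")
        assert response.success is True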

The LLM would look at this and break it down into smaller parts or “tokens” like:

  • def (tells the computer a function is starting),
  • test (the function’s name),
  • _ (a small character that separates words),
  • login (what the function is testing for),

And so on.

Each of these tokens helps the LLM understand what it’s supposed to do. By assembling these pieces, the LLM starts to “understand” the code, almost like a person reading a sentence. It uses that understanding to create tests, check for errors, and make sure the code works as expected. So when we talk about LLMs creating tests, we’re really talking about how they look at each token and use them like puzzle pieces to get the whole picture.
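
If you want to see this in action, here is a minimal sketch using tiktoken, OpenAI’s open-source tokenizer. It is used purely as an illustration; the exact splits vary from model to model:

    # Break a line of test code into tokens and show what each token represents.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    token_ids = enc.encode("def test_login():")

    for token_id in token_ids:
        # Prints each token id and its text, e.g. 'def', ' test', '_login', '():'
        print(token_id, repr(enc.decode([token_id])))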

This is where prompt engineering helps: it gives the model a different context for solving the puzzle, so it assembles the pieces differently – but more on that later in a different blog.

We’ve moved beyond simple text processing; today’s technology can interpret text, videos, voices, and more. At its core, anything that can be saved in a digital format is readable, can be broken down into tokens, and therefore falls within what LLMs can understand. This token-based structure makes LLMs powerful tools for interpreting and acting on various types of data, whether it’s code, complex documents, or multimedia, opening up exciting possibilities for future applications across different fields.

Speed vs. Quality

Going by what Google’s CEO said recently, over 25% of all new code at Google is now written with AI assistance. Scale that across every company in the world and you can see where this is going over the next few years. AI is already writing a lot of code, and that will only accelerate.

Companies need to deliver software fast – or be eaten by competition. But at the same time, companies also need to show stable solutions. This is where the traditional testing approach forces an impossible trade-off: slow down releases or accept higher risk.

Neither option is acceptable in today’s world.

Breaking Points of Traditional Testing

Let’s look at a few scenarios where the traditional test approaches have no answer (yet)!

Maintenance Overhead: The Hidden Cost of Quality Assurance

As systems grow in complexity and development moves at a faster pace, test maintenance is one of the biggest challenges facing QA teams. This hidden cost is a silent killer that drains resources and suffocates innovation. Let’s take a closer look:

Fragile Tests: Breaking with the Tiniest Change

UI tests are notorious for being brittle. Renaming a button, changing a layout, or modifying an attribute can break hundreds of tests even though the underlying functionality remains intact.

For example, a Selenium test might break if a developer changes the id of a login button from “login-button” to “btn-login”. Without self-healing capabilities or smart locators, your QA team is stuck updating test steps for every minor UI change.
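
A minimal Selenium sketch (hypothetical page and locator values) shows how little it takes:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/login")

    # Passes today; the moment the id is renamed to "btn-login", this line raises
    # NoSuchElementException even though login still works perfectly for users.
    driver.find_element(By.ID, "login-button").click()

    driver.quit()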

But it gets worse. With AI-powered test code generation, the amount of new code can overwhelm even the most robust automation frameworks. No tool on the market can keep up with the volume and pace of changes in today’s software without significant manual configuration.

Even if some of the automation frameworks manage to keep up, it is only a (very short) matter of time before the volume of new code overwhelms them.

Test Data Decay: Resetting Your Testing Foundation

Your automated tests are only as good as the data they rely on. Test data can decay, become invalid, or drift away from your app’s new requirements:

  • Database Schema Changes: Modifications to database structures make existing test data invalid or unreadable.
  • Expired Test Accounts: Test user accounts lock or expire due to password aging policies or changes to production data.
  • Dynamic Data: Apps that rely on constantly changing data, such as product catalogs or user profiles, require refreshed test data to stay relevant.

Resetting your testing foundation is no easy task. It requires hours, if not days, of manual effort to update, clean, or regenerate test data.
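
One common mitigation, sketched below with hypothetical create_user/delete_user helpers, is to generate fresh data per run rather than relying on a static seed:

    import uuid
    import pytest

    @pytest.fixture
    def fresh_test_user():
        # Unique credentials per run, so password-aging policies and locked accounts never bite.
        user = create_user(email=f"qa-{uuid.uuid4().hex[:8]}@example.com", password="Str0ng!Pass")
        yield user
        delete_user(user)  # Clean up so the data set does not drift over time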

Every experienced tester knows exactly how the quality of test data directly impacts the ‘quality’ in ‘Quality Assurance’. Testing would be so much easier, and testers would be far more productive, if only test data were always top-notch!

Framework Evolution: Keeping Up with Moving Targets

Automation frameworks, libraries, and tools are rapidly evolving to fit modern DevOps practices. But with each new release comes the problem of:

  • Backward Compatibility: Framework updates often deprecate methods or introduce breaking changes, leaving older test scripts unable to run (a concrete example follows this list).
  • Learning New Features: Your QA engineers need time to learn new updates, which can delay your testing timeline.
  • Refactoring Test Suites: You may need to re-architect entire test suites to take advantage of new framework features.
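
A concrete example many teams ran into: Selenium 4 removed the old find_element_by_* helpers, so scripts written for Selenium 3 stop working until they are refactored.

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/login")

    # Selenium 3 style, removed in Selenium 4 (raises AttributeError today):
    # driver.find_element_by_id("login-button")

    # Selenium 4 style: the same lookup, rewritten against the new locator API.
    driver.find_element(By.ID, "login-button").click()

    driver.quit()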

Documentation Drift: Falling Behind

Test documentation provides valuable context for test cases, but it often struggles to keep up with the pace of development. Documentation drift happens when:

  • Test steps are updated without touching the test case description or documentation.
  • Developers and QA teams forget to sync docs with the latest app changes.
  • Teams rely on ad-hoc notes or tribal knowledge instead of centralized, up-to-date documentation.

The result? Confusion, duplicated effort, and a long onboarding process for new team members.

I know that with the advent of agile methodologies, documentation has gone out of fashion. However, try telling that to your Ops teams, who struggle to understand how a feature was built and tested when answering a complex customer question.

The Consequences of Maintenance Overhead

Many studies show that QA teams spend a significant part of their time maintaining existing tests instead of focusing on important tasks like exploratory testing, performance testing, or creating new test cases. This resource drain leads to:

  • Stagnant Test Coverage: Time spent on maintenance leaves little room to expand test coverage to new features or risk areas.
  • Accumulated Technical Debt: Ignored maintenance builds up over time, requiring more aggressive fixes down the road.
  • Team Burnout: The never-ending battle to keep tests passing can lead to QA team burnout and frustration.

On a side note, Amy has a few practical tips on AI-powered test maintenance.

Coverage Gaps: The Known Unknowns

Despite all the progress we’ve made in automation and testing frameworks, coverage gaps remain one of the biggest challenges for QA teams. These “known unknowns” are the silent killers that allow bugs to slip through and degrade the user experience. In extreme cases, they can even cause catastrophic failures in production. The ‘unknown unknowns’ will become an even more intense problem with the rise of AI-generated code, but that is a blog of its own.

Edge Cases: Testing the Unpredictable

Modern applications are complex. We can’t possibly anticipate and test for every possible scenario. Let’s talk about some of the challenges QA communities face:

  • Feature Complexity: Every new feature adds more edge cases. As the system grows, so does the number of possible combinations. For example, a simple discount feature in an e-commerce app can have hundreds of edge cases once you combine it with user types, product categories, and payment methods (a quick sketch after this list shows how fast this multiplies).
  • AI-Generated Agents: Bots are becoming popular as B2C and B2B customers. Their behavior is unpredictable and exposes system behaviors we never would have imagined. They can also exploit system weaknesses much faster than any human.
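
To make the discount example concrete, here is a quick sketch with made-up option lists; the point is only how fast the combinations multiply:

    from itertools import product

    user_types = ["guest", "registered", "premium", "staff"]
    categories = ["electronics", "clothing", "groceries", "gift-cards", "clearance"]
    payment_methods = ["card", "wallet", "gift-card", "cash-on-delivery", "bnpl"]
    discount_types = ["percentage", "fixed", "bogo", "none"]

    combinations = list(product(user_types, categories, payment_methods, discount_types))
    print(len(combinations))  # 4 * 5 * 5 * 4 = 400 scenarios, before any edge values are even considered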

Integration Points: The Math Problem

Our modern application ecosystems are increasingly made up of multiple APIs, microservices, and 3rd party integrations. This creates a staggering number of possible touchpoints.

  • Service Dependencies: A failing dependent API or an unexpected response format can break critical user journeys (see the sketch after this list).
  • Chained Interactions: When multiple services are involved, a failing component can cause unpredictable errors throughout the app.
  • Dynamic Environments: Changes to external services or dependent systems can introduce hard-to-reproduce bugs.
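
As a small illustration (hypothetical service and field names), a consumer that assumes one response shape breaks as soon as a dependency changes its format:

    from unittest.mock import Mock

    # The upstream pricing service has started returning "total_amount" instead of "total".
    pricing_service = Mock()
    pricing_service.get_quote.return_value = {"total_amount": 42.50}

    def display_price(service):
        quote = service.get_quote()
        return f"Total: {quote['total']}"  # still assumes the old response format

    try:
        display_price(pricing_service)
    except KeyError as missing_field:
        print(f"Critical user journey broken by a dependency change: missing {missing_field}")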

We can’t mathematically test for every possible combination, so we prioritize and hope we don’t miss the critical gaps. Risk-based testing was never ideal, but it has been our only option as QA professionals (so far).

User Flows: The Creativity of Real Users

In reality, users don’t navigate apps as we test them. They take shortcuts and use workarounds that our structured testing often misses.

  • Unexpected Behavior: Users skip optional steps, enter edge-case data, and combine features in ways we never thought possible.
  • Exploratory Navigation: A user may complete a step that triggers a rare bug in a complex workflow. For example, a simple mobile banking app with 10 main flows might have thousands of permutations in how it can be used, far beyond what manual testing or manually coded automated tests can cover (a rough count follows this list).
  • Cultural Differences: The global user base has different expectations and habits. Demographics, wealth, internet speed, etc., vary from country to country and reflect user behavior.
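
To put rough, purely illustrative numbers on the banking-app example above: even short sequences drawn from 10 main flows multiply quickly.

    from itertools import permutations

    flows = [f"flow_{i}" for i in range(1, 11)]  # 10 main flows

    print(len(list(permutations(flows, 3))))  # 720 ordered sequences of 3 distinct flows
    print(len(list(permutations(flows, 4))))  # 5040 sequences of 4 flows: already thousands of permutations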

Mobile Variations: The Fragmentation Nightmare

We all know about the mobile fragmentation nightmare:

  • Device Proliferation: Hundreds of device models with varying hardware and OS combinations create an exponential increase in testing requirements. This becomes particularly important for solutions that primarily target mobile platforms (a small parametrization sketch follows this list).
  • OS Fragmentation: Various OS versions and custom skins lead to inconsistent behavior.
  • Screen Sizes and Resolutions: UI and UX elements don’t render or function correctly on smaller or odd-sized screens.
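
One way to keep at least a slice of this matrix under control is to parametrize the same test across device/OS combinations. A minimal pytest sketch follows; the device list and the check_layout helper are hypothetical stand-ins for whatever drives your device cloud:

    import pytest

    DEVICE_MATRIX = [
        ("Pixel 8", "Android 14"),
        ("Galaxy S24", "Android 14"),
        ("iPhone 15", "iOS 17"),
        ("iPhone SE", "iOS 16"),
    ]

    # One logical test becomes four runs; a realistic matrix is far larger.
    @pytest.mark.parametrize("device,os_version", DEVICE_MATRIX)
    def test_login_screen_renders(device, os_version):
        result = check_layout(device, os_version)  # hypothetical helper
        assert result.ok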

Tip: Almost no organization can scale itself up to test everything that is possible. Using a test specialist increases your chances of staying on par with the speed of change.

Resource Constraints: The QA Bottleneck

Our QA teams are overwhelmed and expected to do more with less. These are some of the biggest contributors to coverage gaps:

Tester Shortage: The Talent Deficit

We don’t have enough qualified test automation engineers. In fact, it’s one of the biggest challenges we face as an industry.

  • Specialized Skills Required: Today’s QA teams need scripting skills, knowledge of automation frameworks, and DevOps practices. Now we’re adding AI-based tools to the menu.
  • Burnout Risk: Our existing testers are stretched too thin and are at risk of burning out. They also lose focus and fail to take advantage of new approaches and techniques.

Infrastructure Costs: Testing on a Budget

Even with cloud solutions, maintaining proper test environment infrastructure can be expensive:

  • Scalable Test Environments: Creating a realistic production-like environment at scale is costly (cloud resources, load testing tools, and network configs). While the infrastructure maintenance complexity has been outsourced to the cloud providers, the cost story hasn’t been exactly rosy!
  • Tool Licensing: Many of the cool new AI-based testing tools are subscription-based and very expensive, and justifying the cost is a challenge for most teams. Most companies that offer an AI solution base their pricing on tokens: the longer your context window and the lengthier your prompts, the more it costs (a rough illustration follows).
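
A back-of-the-envelope illustration (every number below is a made-up assumption, not any vendor’s actual rate):

    PRICE_PER_1K_TOKENS = 0.01   # hypothetical rate in USD
    PROMPT_TOKENS = 6_000        # long context: requirements, page source, prior steps
    COMPLETION_TOKENS = 1_500    # generated test steps and assertions
    RUNS_PER_DAY = 400           # a busy CI pipeline

    cost_per_run = (PROMPT_TOKENS + COMPLETION_TOKENS) / 1_000 * PRICE_PER_1K_TOKENS
    print(f"~${cost_per_run:.3f} per run, ~${cost_per_run * RUNS_PER_DAY:.2f} per day")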

Cloud solutions were cool when they started. Newer companies embraced them immediately, while larger, more established companies (which are typically regulated) came on board later. Cloud is now the norm, not for cost but for convenience! And while many organizations have some level of cloud adoption, cost (and skill) remains a formidable challenge.

Time Pressure: Racing Against the Clock

Our Agile and DevOps cycles are getting shorter, and our QA teams are struggling with:

  • Insufficient Testing Windows: Not enough time to run our full test suites before each release. We have covered this at length earlier. Saying no more!
  • Expanded Scope: As our systems grow more complex, we need to test more than ever, but there’s not enough time to do it. As above, saying no more, again!

Our testing tools and methodologies also evolve at a faster pace than ever before:

  • Emerging Technologies: From AI tools to new programming languages and frameworks, testers need to learn new skills quickly.
  • On-the-Job Learning: Training often falls to on-the-job trial and error, which slows down the project and increases the strain on our already-stretched QA resources.

Keeping up with the pace of technology growth is one of the most important challenges for test professionals. With solutions like LambdaTest KaneAI, testers now have the opportunity to keep pace while reducing manual burdens, letting them focus on delivering quality at scale. KaneAI by LambdaTest, the world’s first end-to-end software testing agent, is an AI-native QA Agent-as-a-Service platform built on modern Large Language Models (LLMs).

KaneAI is just one example of how AI is transforming software testing. In the next part of this blog series, we’ll explore how LLMs and AI are transforming the testing landscape.


Author’s Profile

Ilam Padmanabhan

Ilam Padmanabhan is a seasoned Tech/Financial services expert with a history of delivering multi-million dollar global projects in various Delivery and QA roles. He frequently authors articles on his personal website where he explores the latest industry trends, cutting-edge technologies, and best practices for thriving in today's rapidly evolving landscape.
