XP Series Webinar

Supercharge Your Data Quality Testing with AI/ML

In this XP Webinar, you'll explore how AI/ML can revolutionize data quality testing, enhancing accuracy and efficiency. Discover practical applications, real-world examples, and strategies to tackle challenges in modern data testing.


Yogendra Porwal

Test Automation Lead, Binmile


As an experienced Test Automation Lead, Yogendra brings over 10 years of expertise in Quality Assurance across diverse software domains. With technical proficiency in multiple verticals of testing, including automation, functional, performance, and security testing, among others, his focus is on applying this knowledge to enhance the overall quality assurance process. Other than being a strong QA expert at Binmile, Yogendra loves biking, sketching, and playing video games.

Kavya

Director of Product Marketing, LambdaTest

In her role, she leads various aspects, including DevRel marketing, partnerships, GTM activities, field marketing, PR and branding. Prior to LambdaTest, Kavya played a key role at Internshala, a startup in Edtech and HRtech, where she managed media, PR, social media, content, and marketing across different verticals. Passionate about startups, Kavya excels in creating and executing marketing strategies that foster growth, engagement, and awareness.

The full transcript

Kavya (Director of Product Marketing, LambdaTest) - Hi, everyone. Welcome to another exciting session of the LambdaTest Experience (XP) Series. Through Experience Series, we dive into the world of insights and innovation, featuring renowned industry experts and business leaders in the testing and QE ecosystem.

I'm your host, Kavya, Director of Product Marketing at LambdaTest, and it's a pleasure to have you all with us today. In today's session, we'll explore how AI/ML technologies revolutionize data quality testing, enhancing both efficiency and accuracy.

Let me first introduce you to our guest on the show, Yogendra Porwal, Test Automation Lead at Binmile. With over a decade of experience in quality assurance across various software domains, Yogendra brings a wealth of knowledge in automation, functional, performance, and security testing.

Beyond his professional accomplishments, Yogendra is an avid biker, enjoys sketching, and loves playing video games. So to shed light on today's discussion, Yogendra will share his insights with some real-world examples and address the challenges and future trends in data quality testing.

Let's jump straight into the heart of the discussion. I'll hand it over to Yogendra to share a bit about himself and tell our audience about his journey in quality assurance. Over to you, Yogendra.

Yogendra Porwal (Test Automation Lead, Binmile) - Thank you, Kavya. Thank you for having me here. My name is Yogendra, and I'm currently working as a test architect at Binmile. I have more than 10 years of experience in quality assurance in several industry domains like BFSI, E-commerce, Telecom, Fitness, and others.

I have expertise in various testing verticals such as test automation, performance, and security, and for the last few years, I have been working in ETL testing. My role at my organization, Binmile, involves using my expertise to enhance the overall quality process to ensure that we deliver robust and reliable software solutions for our clients.

I'm also involved in upskilling my peers and looking into new tech and trends in the industry. Outside my professional life, I mostly travel and enjoy solo bike riding. After my marriage, I go with my wife, actually. I also do sketching, and I play video games a lot, which helps me stay energized and creative.

Kavya (Director of Product Marketing, LambdaTest) - Thank you so much. So can you start off by also telling us a bit about Binmile?

Yogendra Porwal (Test Automation Lead, Binmile) - Definitely. So, Binmile was founded in 2014 by our CEO Avanish Kamboj. We started with a small team of only 15-20 people, and over the past seven years, we have grown to a team of over 350 professionals spread across five different countries or offices. We have offices in the USA, the UK, Dubai, Indonesia, and also in India; we have offices in Mumbai and also in Noida.

We have become a preferred technology partner in the industry over these years. We have provided end-to-end digital engineering and quality assurance services to our clients, who come from over 50 different industry domains.

Kavya (Director of Product Marketing, LambdaTest) - Thank you so much, Yogendra, for the brief introduction about yourself and Binmile. Now, on to the first question of the day. Why are data quality and ETL testing important? And what is its impact on data-driven decision-making?

Yogendra Porwal (Test Automation Lead, Binmile) - So, you see, nowadays it's all about data, right? If you look at all the big enterprises, be it financial platforms or maybe e-commerce, right? Or if you think about OTT as well, right?

They are all collecting data from the customer, or maybe historical data, right? And they are curating it, processing it, and creating insights or recommendations to be used for the customer, for business insight, or maybe for their future business planning, right?

So, you see, when so much data is being consumed and so much processing is being done to prepare insights, recommendations, and predictions, we need someone who can actually make sure that all this processing is being done accurately, right?

The data that is being ingested in those processes is of good quality, right? Also, the prepared data is very relevant to the business or maybe the insights that we are creating, right? And that's where the ETL testing comes in. That's why it is very important to have data quality in those pipelines in those processes, having testing in place that can help you to have better quality of those

Kavya (Director of Product Marketing, LambdaTest) - Thank you. That's an interesting point. And, of course, thanks for sharing this insight. Moving on to the next question that we have for you. Can you briefly describe your current approach to data quality testing? I'm sure you must be doing it on a day-to-day basis at your organization.

Yogendra Porwal (Test Automation Lead, Binmile) - So I think the fundamental approach is very similar to any kind of testing, right? We understand the business first, and we understand the process. So for data quality or ETL testing as well, we first need to understand what business we are doing it for, right?

And what domain the data belongs to, so that we can make sure the quality of the data is up to the mark. If it belongs to the financial domain, then we need to make sure that the precision of the data is good, right? If it belongs to customer profile data, then we need to check all the dimensions related to customers and their habits, right?

The same goes if it belongs to the telecom domain. So we need to first understand what the business is and what the domain of our process is, right? Second, you need to understand the process itself: how the data is being curated, how it is being processed, and what the use case of the processed data will be. Suppose it is used for a recommendation system, right?

Then you need to look at all the dimensions, like what the customer is giving a thumbs up, what the customer is giving a thumbs down, and what interests they selected while creating the profile, right? Based on those dimensions, you will have a better understanding of how those recommendations are actually being prepared, and then you can prepare your approach around that, right?

If it is being used for financial data, then you need to make sure that the data has very accurate precision, to the last decimal, right? Because it will impact the overall insight that is being prepared for the customer or business. So when you have this holistic knowledge or understanding, you can curate a better approach for data quality and make sure the data is compliant.
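As a rough illustration of the "precision to the last decimal" check mentioned above (this is not code shown in the webinar), here is a minimal Python sketch. The function names, the two-digit scale, and the sample amounts are all assumptions made for the example; it only shows why decimal arithmetic, rather than floating point, is a common choice for such checks.

```python
from decimal import Decimal


def has_exact_scale(value: str, scale: int = 2) -> bool:
    """Return True if the decimal string carries exactly `scale` fractional digits."""
    # as_tuple().exponent is -2 for "1234.50" and -1 for "1234.5"
    return Decimal(value).as_tuple().exponent == -scale


def totals_match(source_amounts, target_total: str) -> bool:
    """Compare the sum of source amounts with a loaded total without float rounding."""
    return sum(Decimal(a) for a in source_amounts) == Decimal(target_total)


# Hypothetical sample values, used only to exercise the checks.
print(has_exact_scale("1234.50"))
print(has_exact_scale("1234.5"))
print(totals_match(["0.10", "0.20"], "0.30"))
```

Amounts are passed as strings so that no precision is lost before the comparison; the same totals compared as floats (0.1 + 0.2 against 0.3) would not be equal.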

Kavya (Director of Product Marketing, LambdaTest) - Thanks for sharing that. And interesting to know that with every domain, there are also so many compliances, right? As you said, for finance, there are a set of compliances. For healthcare, our clients have to comply with a set of requirements. Right.

So with every domain, what is happening is that the approach sort of has to be tweaked or customized in its own way. Great. So what are the biggest challenges that you have faced in ensuring data quality at Binmile?

Yogendra Porwal (Test Automation Lead, Binmile) - So the challenges are common for, I think, all kinds of testing. There is one common challenge that I have faced, and I have seen it is a norm in the industry. I mean, I don't have much experience. It's just 10 years. There are also experts who are more knowledgeable. But what I have seen is that we have a very low QA-to-dev ratio in all the teams.

In all the projects that I have been working on, I have seen at most a ratio of one QA to five developers. And then QA will always have tight deadlines, because throughout the sprint, QA will have a very small amount of time to certify everything and do the regression testing.

And there is a lot of pressure from stakeholders to sign off early so that they can go to production early. So this is a very common challenge. And if we talk about ETL testing specifically, the challenges come with the complexity of the data, right? You have to handle diverse data sources and their formats. Suppose the data is coming from a CSV file; it might also be coming from an RSS feed or streams from an API, right?

It can come from various types of databases. It can be NoSQL, such as MongoDB, or maybe MSSQL, right? So handling those different data sources and creating a standard process that can fit all of them is a bit complex and can be challenging, right? There is another challenge, which is the size of the data, right?

When the data is small, for example when we are just checking the customer profile data for the recommendation system, we just need to check a few dimensions around it, right? Or a few small fields. To clarify, when I talk about an entity, an entity is like a table, and a dimension is like a column in the table.

So when those entities grow rapidly, right, suppose we have hundreds of entities, and each entity can have tens or hundreds of dimensions, right? That grows to a number in the thousands, right? Preparing validation rules and doing profiling for each of them is a very tedious task. And when you have such monotonous tasks that you have to do again and again, it is error-prone, right? And that's where AI/ML can help us, right?

So they can automate those monotonous tasks for us. They can actually enhance the coverage for us, right? And that's why we are actually keen to involve AI/ML in data quality testing.
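To make the "diverse data sources" challenge above a little more concrete, here is a small Python sketch (an illustration only, not something presented in the session). It assumes two hypothetical feeds, one CSV and one JSON, maps both into plain dictionaries, and then runs a single required-field check over the combined rows. The field names `customer_id` and `plan` are invented for the example.

```python
import csv
import io
import json


def rows_from_csv(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_json(text: str) -> list:
    """Parse a JSON array of objects, converting values to strings like the CSV path."""
    return [{k: str(v) for k, v in item.items()} for item in json.loads(text)]


def missing_fields(rows, required):
    """Return (row_index, field) pairs where a required field is absent or blank."""
    problems = []
    for i, row in enumerate(rows):
        for field in required:
            value = row.get(field)
            if value is None or not str(value).strip():
                problems.append((i, field))
    return problems


# Hypothetical sample feeds: one blank value in the CSV, one absent key in the JSON.
csv_source = "customer_id,plan\n1,basic\n2,\n"
json_source = '[{"customer_id": 3, "plan": "premium"}, {"customer_id": 4}]'

rows = rows_from_csv(csv_source) + rows_from_json(json_source)
print(missing_fields(rows, ["customer_id", "plan"]))
```

Once every source is reduced to the same row shape, one set of validation rules can be reused across all of them, which is the kind of standard process the speaker describes as hard to build by hand.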

Kavya (Director of Product Marketing, LambdaTest) - Thanks again. So the takeaway that I'm getting out of it is that there are so many variables involved that the number of challenges also sort of increases and gets amplified. And, in fact, as we move on, we will be talking about AI and its potential in this area, of course.

But good to know that we even started looking at solutions on how to sort of automate this entire process using AI/ML. Moving on to the next question, what potential does AI/ML have for data quality testing? How can it be a game changer for data quality testing?

Yogendra Porwal (Test Automation Lead, Binmile) - Yeah, definitely, it has potential, and I think AI/ML will change how we are doing ETL testing, or actually how we will do ETL testing in the future. The foundation will be the same. We will go through the business requirements. We will understand how the process is being done and what the outcome is.

It's not like it will change what we are doing, but it will change how we do things. AI/ML won't be doing everything for us. But there are areas in which it can have an impact by doing some of the monotonous tasks that we are doing as of now, right?

Like preparing data validation rules, preparing profiling for composite dimensions or data, right? Doing anomaly detection for us. So, after automating those monotonous tasks with AI/ML, it will actually free up a lot of time for us human testers, right?

So that we can focus on more complex work and increase the overall quality of the product, right? We already have some tools, like Collibra or QuerySurge AI, that are actually utilizing LLMs to prepare data validations, data mapping checks, and data lineage for you, right? That helps in increasing overall data quality and governance.

Kavya (Director of Product Marketing, LambdaTest) - Thanks. You just mentioned a few tools. Would you like to share or recommend a few of those that you have personally used at your workplace and seen good results with? A lot of experimentation is happening when it comes to AI/ML tools right now, right? So if there is anything that you might want to recommend.

Yogendra Porwal (Test Automation Lead, Binmile) - So for ETL testing, I think Collibra and QuerySurge are both good tools, and I think QuerySurge has just introduced QuerySurge AI, right? That can prepare complex transformation checks for you as well and do data mapping very quickly. So that is one of the tools. Collibra is also a good tool, right?

So maybe you can go and explore them, but decide on your own, right? Check where they can help you, in what areas they can be implemented, and where they can save some effort.

Kavya (Director of Product Marketing, LambdaTest) - Thank you for those recommendations and also for the perspective that you shared on how AI/ML is helping testers free up their time so that they can focus on the more essential tasks at hand. Let's also discuss some specific areas where it could make a big impact. In your opinion, what are some of the most promising areas where AI/ML can revolutionize data quality testing?

Yogendra Porwal (Test Automation Lead, Binmile) - So, I see promising areas in enhancing data profiling, right? Data profiling is actually preparing validations around one particular dimension: what kind of format or data type it has, and what complex relations it has with other dimensions, right?

Dependencies, and what kind of rules or validations we can have. For example, for the age of a customer, we can have the mean age, or, if it is only for those above 18, we can have a minimum age and a maximum age. But when we use AI/ML to create data profiling, we can get several other suggestions for that kind of profiling, and it can automate the rule generation.

So it will save a lot of time. Another promising area for the future, where we can use AI/ML or LLMs, is creating self-healing data pipelines, right? Suppose we have trained some LLM that can automatically identify data quality issues, right? Because it has insight from the historical issues that it has been trained on.

It can find where values are missing in your data, or where the format is incorrect. Once those are identified, you can create a workflow with maybe intelligent routing, so errors can be directed to the appropriate remediation process and fixed automatically. That way, you have automated the whole process of finding those minor issues and errors and getting them fixed.

So, it is self-healing for your data, right? And it will save a lot of time on those small issues that we might have missed, or that we might have spent a lot of time just looking for, right? They get fixed, and there is a feedback cycle again and again, right? So it will save that kind of time. So these are the two areas that I think have scope in the future.
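The self-healing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: simple hand-written rules stand in for the trained model the speaker has in mind, and the field names (`country`, `signup_date`), the default values, and the accepted date formats are all hypothetical. Records that cannot be repaired are routed to manual review instead of being fixed silently.

```python
from datetime import datetime


def normalize_date(value: str) -> str:
    """Try a few known input formats and return an ISO date, or raise ValueError."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def heal(record: dict, defaults: dict):
    """Return (fixed_record, actions, needs_review) for one record."""
    fixed, actions = dict(record), []
    # Remediation 1: fill missing or blank values from agreed defaults.
    for field, default in defaults.items():
        if fixed.get(field) in (None, ""):
            fixed[field] = default
            actions.append(f"filled {field}")
    # Remediation 2: normalize the date format, or route to manual review.
    try:
        iso = normalize_date(fixed["signup_date"])
    except (KeyError, TypeError, ValueError):
        return fixed, actions, True
    if iso != fixed["signup_date"]:
        actions.append("reformatted signup_date")
    fixed["signup_date"] = iso
    return fixed, actions, False


# Hypothetical records: the first can be repaired, the second cannot.
ok = heal({"customer_id": "7", "country": "", "signup_date": "05/03/2024"},
          {"country": "UNKNOWN"})
bad = heal({"customer_id": "8", "country": "IN", "signup_date": "soon"},
           {"country": "UNKNOWN"})
print(ok)
print(bad)
```

Returning the list of actions alongside the fixed record keeps an audit trail, so automatic fixes can be reviewed later and fed back into the rules.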

Kavya (Director of Product Marketing, LambdaTest) - Thank you, that is a lot. Of course, those ideas also sound very promising. Moving on to the next question that we have, have you implemented any AI/ML solutions for data quality testing? And if so, can you also share your experience and the results with the audience?

Yogendra Porwal (Test Automation Lead, Binmile) - So, we have done implementations around data profiling and also anomaly detection. We have used some LLMs. The first challenge is finding an LLM that actually works with, or is relevant to, your data, right?

Maybe financial data, right? So suppose, for example, we have a dimension, size, for a property, right? And we try to prepare validation rules around it ourselves, right?

We might think of the minimum size and maximum size of the property, right? And we might think that the size should be in square feet or square yards. We can also think of the data type: it should be an integer or maybe a double. But when we used the model to prepare rules around that dimension with AI, right?

It actually created some rules that we had not thought of, like mean, median, and standard deviation, which provide an overview of the central tendency and spread of the size values, right? It also suggested a check: suppose you are in a society and there are flats in that property, right?

Then the size cannot be in square feet; it should be the number of units, right, how many units this particular apartment has. So this is another great insight that we had, right? And it even suggested patterns and trends in the size values that can be associated with other dimensions, right? Suppose the size is directly related to the price and the location or region where the property is located.
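The statistical profile mentioned here (mean, median, standard deviation) is easy to compute directly. The sketch below is an illustration rather than the team's implementation: the sample sizes are invented, and deriving an expected range as the mean plus or minus three standard deviations is an assumed rule of thumb, not something stated in the talk.

```python
import statistics


def profile(values):
    """Summarize the central tendency and spread of a numeric dimension."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def suggest_range(p, k=3.0):
    """Derive a candidate validation range from the profile (assumed k-sigma rule)."""
    return (p["mean"] - k * p["stdev"], p["mean"] + k * p["stdev"])


# Hypothetical property sizes in square feet.
sizes_sqft = [850, 900, 950, 1000, 1050, 1100, 1150]
p = profile(sizes_sqft)
low, high = suggest_range(p)
print(p)
print(low, high)

# A unit count stored by mistake in the size column falls outside the range.
print(low <= 12 <= high)
```

A range derived this way would flag a value such as a unit count landing in a square-foot column, which is the kind of mismatch the model's suggestion pointed at.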

So this is one example in which we have done some work. Another is anomaly detection, right? I can share one example for customer data about their watching habits on an OTT platform, right? So when we were checking on our own, right?

We were checking which dramas they have given a thumbs up or thumbs down, which genres they are watching against their recommendations, what interests they have selected in their profile, and what recommendations they are getting, right?

But when we used the LLM, it identified a few anomalies that we had not thought of. One is a significant deviation in one user's average viewing duration, right? Previously it was maybe one hour daily, and it significantly increased over time.

So it's not necessarily an error, right? Maybe the customer has some free time or is on vacation, right? But it's an anomaly that can be reviewed, right? It found inconsistencies in the day of the week and the time at which they watch series, right? It saw frequent switches between different content types and genres in a short period.

It also identified one very interesting issue: a frequent change in playback speed for one customer. So he's watching eight hours of series in just three to four hours. So this is actually very insightful and helpful for finding those anomalies. It shed great light on how we are viewing data and what inconsistencies and issues the source data can have.
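The viewing-duration deviation described above was found with an LLM in the speaker's project. As a simpler stand-in, the Python sketch below flags days whose viewing time sits far from a user's own baseline using a z-score. The threshold of 3 and the sample minutes are assumptions for illustration, and a flagged day is only a candidate for review, as the speaker notes.

```python
import statistics


def flag_deviations(baseline_minutes, recent_minutes, threshold=3.0):
    """Return (day_index, minutes, z_score) for recent days far from the baseline."""
    mean = statistics.mean(baseline_minutes)
    stdev = statistics.stdev(baseline_minutes)
    flagged = []
    for day, minutes in enumerate(recent_minutes):
        # A perfectly flat baseline has no spread, so no score is computed here.
        z = (minutes - mean) / stdev if stdev else 0.0
        if abs(z) > threshold:
            flagged.append((day, minutes, round(z, 1)))
    return flagged


# Hypothetical user: about one hour a day historically, then a four-hour day.
baseline = [55, 60, 65, 58, 62, 60, 61]
recent = [59, 64, 240]
flagged = flag_deviations(baseline, recent)
print(flagged)
```

Scoring each user against their own history, rather than against all users, avoids flagging people who simply watch more than average every day.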

Kavya (Director of Product Marketing, LambdaTest) - Thank you, Yogendra, for walking us through the entire process and, in fact, breaking it down into the multiple steps that are involved. There are so many different factors and so many steps involved in even creating that framework and choosing the right LLM. Moving on to the next question, what are the key considerations for organizations that are looking to adopt AI for data quality?

Yogendra Porwal (Test Automation Lead, Binmile) - Yeah, so there are actually several. So let's discuss a few. So first is the model and the data, right? So as I mentioned earlier, so we need to choose the LLM model very carefully, right? Which model we will be using, right? What is the size of the model, right? What is the data that we will be ingesting to validate right?

Is it accurate? Is it complete? Is it representative of the real world? Right? What kind of training data is being used in the LLM model? Because finding a pre-trained model that is actually relevant to your domain, right? It's very hard, right? Finding a data quality model or maybe a model that can help you in data quality for specifically financial domain or specifically the OTT domain. It's very hard.

So you might have to train them on your own data, right? And that will take time. And that will actually take a lot of accurate data that is relevant to that domain, right? So that is one consideration. Then there is the infrastructure and computational power, right? Training and running these AI models or LLMs requires significant computational resources, right?

Like CPUs and GPUs, and there are specific processors that are claimed to process these models faster. You may even have seen in the news that OpenAI can't even make revenue out of ChatGPT because the cost to run each query is very high. So this is actually an important factor to consider before choosing an LLM.

Then we also have one challenge, which is expertise and talent, right? You need to find the right people who have knowledge of your domain, who are also experts in data science and engineering, and who can also work with AI/ML models, right? There is another challenge, which is monitoring and governance, right?

So as you mentioned before, financial domain data comes with data governance and compliance requirements, right? So what kind of data are we using, and are we actually compromising any compliance or governance requirement while training our model, right? That can also be an issue.

So we should consider that as well. And whether those LLMs can be seamlessly integrated with our process or not is another consideration. So I think it's easy to do POCs and everything, but actually using those models takes a lot of consideration and research beforehand.

Kavya (Director of Product Marketing, LambdaTest) - Thank you so much, Yogendra, for those insights. Now, looking forward, how do you see AI/ML impacting the future of data quality testing?

Yogendra Porwal (Test Automation Lead, Binmile) - Yeah, so I think AI will significantly impact the future of data quality and ETL testing, right? It will transform it into a more efficient, accurate, and productive practice, right?

Let's discuss a few ways. First, how it will boost efficiency, right? When we use LLMs to do some monotonous tasks for us, like anomaly detection and data profiling, it will free up a lot of time for us to focus on more complex tasks, right? It will significantly improve testing efficiency and reduce our overall time to production and for our deliveries, right?

And it will enhance accuracy as well, right? Because it is being trained on your own data and the historical issues that you have found, right? And also the profiling and validation preparation, right? When we do those monotonous tasks again and again for each dimension, it is error-prone, but an LLM can do it better.

Not better, exactly, but it has been trained on the issues that you have found previously. So it has that holistic knowledge with it, and it will increase your accuracy. There is also a proactive approach around that, because we can implement machine learning that can predict data quality maintenance needs or data quality issues.

Maybe incorrect formats or missing data, and it can analyze historical issues as well. So there will be a proactive approach around ETL testing with the help of AI/ML, and it can identify potential issues even before they occur, enabling preventive measures throughout our data quality pipeline. It will actually increase integrity and consistency as well.

Kavya (Director of Product Marketing, LambdaTest) - Thank you. Yeah, better efficiency and reliability, as well as accuracy, when it comes to data quality testing. And, as you said, AI/ML would bring in a more holistic approach to testing. Now, moving on to the last question that we have today, what advice would you like to give the listeners who are interested in exploring AI/ML for data quality testing?

Yogendra Porwal (Test Automation Lead, Binmile) - So I think my advice is to start looking into LLMs, right? Because AI and LLMs are at their peak, right? It's not like AI just came in; it has been here for decades now, but it's in the hands of common people now, right? If you are using WhatsApp, we have an AI agent in WhatsApp as well, right? Those kinds of AI agents are making life easier for everyone, right? The same goes for data quality analysts as well.

So we need to explore which tools are leveraging AI/ML, right? And which tools claim that they can help us with data quality by leveraging those LLMs? Look out for those, right? Explore those areas, and don't stop there, right? Explore how these LLMs actually work as well, right?

Because when you have an idea of how these models are being trained and what the biases and issues with them are, right? Then you will have a holistic approach to how we can use them to enhance or empower our data quality testing. Right. And it's not like they are error-proof. Right.

They have their own challenges, because LLMs have their biases as well, right? We don't know what kind of data is being used to train them, right? We cannot explain the results either, because there are so many variables and parameters being used in the neural network of the LLM, right? So I think we should consider all of those, look out for LLMs, and explore those areas.

Kavya (Director of Product Marketing, LambdaTest) - Thank you, Yogendra. That was really insightful. As we wrap up today's conversation, I just want to take a moment to thank you for sharing valuable insights into AI/ML and its applications in data quality testing. I'm sure your expertise has definitely helped our audience get a deeper understanding of the potential and challenges in this area.

And to our audience, thank you for joining us today, and stay tuned for more episodes of the LambdaTest XP Webinar Series, where we bring you cutting-edge insights from industry experts in the field of quality engineering and testing. Once again, thank you so much, Yogendra, for joining us today. It has been a pleasure hosting you. Thank you everyone.

Yogendra Porwal (Test Automation Lead, Binmile) - Thank you for having me.
