Selenium WebDriver Architecture Explained – Ultimate Guide On What Is Selenium Architecture And How Does It Work

Tanay Kumar Deo

Posted On: July 3, 2023

When it comes to automation testing for web apps, a few frameworks, like Selenium, Puppeteer, Cypress, and Playwright, make it to the ‘favored list’ of top automation frameworks. The choice of test automation framework depends on a range of parameters like language support, complexity, and scale, along with the framework expertise available within the testing team. Even so, Selenium remains one of the most preferred frameworks among automation testers and developers.

Testing a web app across browsers and platforms is challenging, and we need a tool to help us in this process. One tool widely used by automation testers and developers is Selenium WebDriver. If you wish to understand how Selenium Architecture works internally, you have landed at the right place.

In this blog on Selenium Architecture, I will go into detail on Selenium Architecture and Selenium WebDriver: how Selenium WebDriver works, its advantages, and its limitations. If you are preparing for an interview, you can learn more through Selenium interview questions.

What is Selenium?

Selenium is an automation testing framework. It automates the testing of web applications on different browsers. Selenium lets testers and developers write automation test scripts in multiple programming languages: the project maintains official bindings for Java, Python, C#, Ruby, and JavaScript (Node.js), and community-maintained bindings exist for languages such as PHP and Perl.

Selenium supports cross browser testing on almost all popular web browsers, such as Google Chrome, Apple’s Safari, Mozilla Firefox, Microsoft Edge, and Opera, so test scripts written in any supported language can run against each of them. It also supports cross platform testing, i.e., the same test cases can run across multiple supported operating systems, including Windows, Linux, macOS, and Solaris. Selenium is one of the top automation testing tools as it allows developers and automation testers to create flexible and robust automation test cases.

What are the Components of Selenium?

Selenium is not a single test automation tool. It is a suite of testing tools, and every tool in the suite has unique capabilities that help in designing and developing automation frameworks. These components can be used individually or paired with one another to cover more ground. The Selenium framework mainly comprises four components as listed below:

  • Selenium IDE
  • Selenium WebDriver
  • Selenium RC (obsolete now and merged with WebDriver)
  • Selenium Grid

[Figure: Components of Selenium. Image by Sathwik Prabhu]

Now let us briefly look at each of these components:

Selenium IDE

Selenium IDE stands for Selenium Integrated Development Environment. It is a browser extension, originally a Firefox plugin and now also available for Chrome and Edge, that allows testers and developers to record and play back test scripts. It does not require any prior programming knowledge. Selenium IDE is usually used as a prototyping tool.

Selenium WebDriver

Selenium WebDriver is one of the most important components of the entire Selenium framework that supports overall browser-based automation tests. WebDriver is the remote control interface component that allows test programs to instruct and interact with browsers, manipulate DOM elements in a web page, and control the user agent’s behavior. In a nutshell, it is the bridge between the Selenium framework and the browser over which the test cases run.

Selenium RC, the core of Selenium 1, was merged with Selenium WebDriver to form Selenium 2. This was later upgraded to Selenium 3 and further to Selenium 4. We will learn about these in detail in the upcoming sections.

Selenium Grid

This component of the Selenium suite is used to run tests in parallel on multiple machines against multiple supported browsers. Since almost all modern browsers and operating systems are supported by Selenium, Selenium Grid can run numerous tests simultaneously on different operating systems with different browsers.

Selenium Architecture


[Figure: Selenium WebDriver architecture. Image by Sathwik Prabhu]

In this blog on Selenium Architecture so far, we have covered the basics of Selenium and its various components. Now, let’s try to understand the Architecture of Selenium WebDriver in a detailed manner.

As mentioned earlier, Selenium WebDriver and Selenium RC were merged into one single unit called Selenium 2.0 or Selenium WebDriver 2.0. Over time it has been constantly enhanced for more functionalities and features and was upgraded to Selenium 3.0. In Selenium 3.0, the primary mode of communication between the automation test script and web browser was the JSON Wire protocol.

With the introduction of Selenium 4.0, the JSON Wire protocol is replaced with the W3C WebDriver protocol as the mode of communication. Commands still travel as JSON over HTTP, but because the client libraries and the browser drivers now speak the same standardized protocol, the client no longer has to translate (encode and decode) requests between two protocol dialects. We will learn about these protocols in detail in the upcoming sections. So, let’s start with the Selenium Architecture of WebDriver in Selenium 3.0 and then look at the Selenium Architecture of WebDriver in Selenium 4.0.

Selenium Architecture of WebDriver in Selenium 3.0

Selenium 3.0 primarily uses the JSON Wire protocol to communicate between the user test script and the browser. This wire protocol represents a RESTful web service using JSON over HTTP. In Selenium 3.0, the Selenium WebDriver architecture consists of four major components:

  • Selenium Client Libraries/ Language Bindings
  • JSON Wire Protocol
  • Browser Drivers
  • Real Browsers

The image below represents the Selenium Architecture of WebDriver in Selenium 3.0.

[Figure: Selenium Architecture of WebDriver in Selenium 3.0]

Now let us discuss each of these components in Selenium 3.0 one by one:

Selenium Client Libraries:

Automation scripts that interact with the Selenium framework through Selenium WebDriver can be written in multiple programming languages such as Ruby, Java, C#, Python, JavaScript, etc. To make this possible, the Selenium developers provide client libraries, also called language bindings, one for each supported language.

A Selenium Client Library is a package in the format native to its language: for Java it is a set of JAR files, while other languages use their own package formats. It contains the methods and classes of Selenium WebDriver that are required to create test automation scripts.

Selenium client libraries can be installed easily using the package managers available for the respective languages. All the supported Selenium client libraries can also be downloaded from the official download page of Selenium.

A Selenium client library is not a testing framework but it provides an application programming interface (API), i.e., a set of functions that performs the Selenium commands from the test script. For example, Java bindings provide APIs to perform the Selenium commands written in the Java language.
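
To make the idea of a binding concrete, here is a minimal sketch in Python of how an API call can be turned into a protocol command. The class `SketchClient` and its helper `_command` are hypothetical names made up for this illustration; they are not part of Selenium. The URL paths and JSON bodies follow the WebDriver command layout (navigate, find element, get title), and the session id is an assumed placeholder.

```python
import json


class SketchClient:
    """Hypothetical stand-in for a language binding (not Selenium's real class)."""

    def __init__(self, session_id):
        self.session_id = session_id

    def _command(self, method, path, body=None):
        # Each API call becomes an HTTP method, a URL path, and (for POST) a JSON body.
        return {
            "method": method,
            "path": f"/session/{self.session_id}{path}",
            "body": None if method == "GET" else json.dumps(body or {}),
        }

    def get(self, url):
        # "Navigate to" command.
        return self._command("POST", "/url", {"url": url})

    def find_element(self, css_selector):
        # "Find element" command using the CSS selector location strategy.
        return self._command(
            "POST", "/element", {"using": "css selector", "value": css_selector}
        )

    def title(self):
        # "Get title" command; GET requests carry no body.
        return self._command("GET", "/title")


if __name__ == "__main__":
    client = SketchClient("demo-session")
    print(client.get("https://www.lambdatest.com"))
    print(client.find_element("#login"))
    print(client.title())
```

A real binding would go one step further and send each of these commands to the browser driver over HTTP; the sketch stops at building them so that it runs without a browser.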

JSON Wire Protocol

JSON, or JavaScript Object Notation, is a widely used data interchange format based on a subset of the JavaScript programming language. Selenium WebDriver 3.0 uses JSON to communicate between Selenium client libraries and browser drivers. It supports data structures like arrays and objects, which makes reading and writing data easier.

The client library serializes each command into JSON and sends it as the body of an HTTP request to the browser driver’s HTTP server. The driver returns its result as a JSON body in the HTTP response, which the client deserializes. Converting data to and from this wire format is called serialization and deserialization. Because both sides only exchange JSON over HTTP, the internal logic of the browser is not disclosed, and the server can communicate with any Selenium client library regardless of the programming language it is written in.
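
The round trip can be sketched with Python’s standard `json` module. This is only an illustration of serialization, not Selenium code: the session id `demo-session` is an assumed placeholder, and the reply uses the JSON Wire style layout in which a `status` of 0 signals success.

```python
import json

# Client side: a navigation command becomes a JSON string for the HTTP request body.
command = {"url": "https://www.lambdatest.com"}
request_body = json.dumps(command)

# Driver side: the HTTP server parses the body back into a data structure.
received = json.loads(request_body)

# Driver side: a JSON Wire style success reply (status 0 means success).
response_body = json.dumps({"sessionId": "demo-session", "status": 0, "value": None})

# Client side: the reply is deserialized before the binding returns to the test.
reply = json.loads(response_body)

print(request_body)
print(reply["status"])
```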

Browser Drivers

Browser drivers act as a bridge between the Selenium client libraries and the real browsers. They help us in running Selenium commands on the browser. They are the component of Selenium WebDriver responsible for executing user actions, like mouse clicks, page navigation, button clicks, etc., on the browser. Every browser supported by Selenium has its own browser driver. These browser drivers take commands from the Selenium test scripts and pass them to the respective browsers.

Whenever a Selenium automation test is triggered, the following series of actions are performed:

  • Every test command generates a corresponding HTTP request using the JSON Wire Protocol, which is then sent to the browser driver.
  • This HTTP request is routed through the HTTP Server.
  • The HTTP Server directly drives the command execution on the real browser.
  • The browser then sends back the test status to the HTTP Server, which is responsible for forwarding it to the test automation script.

In this way, browser drivers enable communication between the Selenium automation script and different browsers. The browser driver also ensures that this communication happens without disclosing the internal logic of those browsers.

Some popular browser drivers in Selenium are ChromeDriver, GeckoDriver (for Firefox), SafariDriver, OperaDriver, EdgeDriver, and HtmlUnitDriver.

Real Browsers

A real browser is an application or a software program used for finding and viewing content on the World Wide Web (WWW). This component of the Selenium WebDriver architecture in Selenium 3.0 is pretty straightforward. The browser receives the command and calls the respective functions or methods to perform the desired automation task.

Selenium framework supports almost all popular and modern-age browsers like Google Chrome, Mozilla Firefox, Microsoft Edge, Apple’s Safari, etc.

Selenium Architecture of WebDriver in Selenium 4.0

In Selenium 3.0, JSON Wire protocol over HTTP was used as the medium of communication. The main weakness of the JSON Wire protocol was that it was never a formal standard, so the Selenium client libraries (C#, Java, Ruby, Python, etc.) and the browser drivers did not always speak exactly the same dialect. As browser vendors moved their drivers to the W3C WebDriver standard, Selenium 3 had to act as a mediator and translate commands between the two protocols. This translation could result in slower test execution, unexpected exceptions, and flakier tests.

This problem was solved with the adoption of the W3C (World Wide Web Consortium) WebDriver protocol in Selenium 4.0, where it supersedes the older JSON Wire protocol. This means that we no longer need to encode and decode Selenium commands or API requests between two protocol dialects. Using the W3C protocol, the automation scripts communicate with the browser driver directly in the protocol it natively implements. Information is still exchanged as HTTP requests and HTTP responses carrying JSON, but no translation layer sits in between.

The image below represents the Selenium Architecture of WebDriver in Selenium 4.0 onwards.

[Figure: Selenium Architecture of WebDriver in Selenium 4.0]

In Selenium 4.0, the Selenium WebDriver architecture consists of the following four major components:

  • Selenium Client Libraries/ Language Bindings
  • WebDriver W3C Protocol
  • Browser Drivers
  • Real Browsers

Basically, all the components in Selenium WebDriver 4.0 are similar to the components in Selenium 3.0 except that the JSON Wire protocol is replaced with the new W3C WebDriver protocol. So, let’s discuss this protocol in detail.

WebDriver W3C Protocol

‘WebDriver W3C’ is the protocol that Selenium 4.0 adopts as its default. WebDriver is a W3C Recommendation, published by the W3C, the community that works on web standards development. The W3C Editor’s Draft and W3C Working Draft are good resources for keeping tabs on the ongoing advancement of the WebDriver W3C Protocol.

In WebDriver W3C Protocol, information is transferred between the server and client without any translation to or from the JSON Wire Protocol. As both Selenium WebDriver and the browser drivers use the same protocol, automated Selenium tests run more consistently across browsers.

With WebDriver W3C Protocol in action, developers or automation testers should rarely need to change their automation scripts to work across various web browsers. Consistency and stability in tests are the two significant advantages of WebDriver W3C protocol in Selenium 4.0.
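
One place where the protocol change becomes visible is the payload that starts a session. The sketch below, written under stated assumptions, converts a legacy `desiredCapabilities` style dictionary into the `capabilities` / `alwaysMatch` layout used by the W3C protocol. The function name `to_w3c_new_session` and the vendor key `example:options` are hypothetical and exist only for this illustration; the renames (`version` to `browserVersion`, `platform` to `platformName`) and the rule that non-standard capabilities need a prefixed key containing a colon reflect the W3C specification.

```python
import json

# Legacy JSON Wire capability names and their W3C equivalents.
RENAMED = {"version": "browserVersion", "platform": "platformName"}

# Standard capability names defined by the W3C WebDriver specification.
W3C_STANDARD = {
    "browserName", "browserVersion", "platformName", "acceptInsecureCerts",
    "pageLoadStrategy", "proxy", "setWindowRect", "timeouts",
    "strictFileInteractability", "unhandledPromptBehavior",
}


def to_w3c_new_session(desired, vendor_key="example:options"):
    """Build a W3C style new-session payload from legacy desired capabilities.

    `vendor_key` is a hypothetical extension key used only in this sketch.
    """
    always_match = {}
    extras = {}
    for key, value in desired.items():
        key = RENAMED.get(key, key)
        if key in W3C_STANDARD or ":" in key:
            always_match[key] = value
        else:
            # Non-standard keys must live under a prefixed extension capability.
            extras[key] = value
    if extras:
        always_match[vendor_key] = extras
    return {"capabilities": {"alwaysMatch": always_match}}


if __name__ == "__main__":
    legacy = {"browserName": "chrome", "version": "114", "platform": "Windows 10",
              "build": "demo-build"}
    print(json.dumps({"desiredCapabilities": legacy}, indent=2))  # JSON Wire style
    print(json.dumps(to_w3c_new_session(legacy), indent=2))       # W3C style
```

Real vendors define their own prefixed keys, so the grouping under `example:options` should be read as a placeholder rather than something a driver would recognize.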

Difference Between JSON Wire Protocol and WebDriver W3C Protocol

So far, we have covered the Selenium Architecture in Selenium 3.0 and Selenium 4.0. This section discusses the primary differences between the JSON Wire protocol used in Selenium 3.0 and the WebDriver W3C protocol used in Selenium 4.0.

Aspect | JSON Wire Protocol | WebDriver W3C Protocol
Retirement | Permanently retired in Selenium 4.0. | Active and current protocol.
Role | Mainly responsible for communication in Selenium WebDriver Architecture 3.0. | Responsible for communication in Selenium WebDriver Architecture 4.0.
Test Script Compatibility | Test commands may have to be translated between the JSON Wire dialect and the dialect the browser driver implements before test actions are executed. | Client libraries and browser drivers share one standardized protocol, so test scripts command the browser driver without translation.
Test Execution Time | The extra encoding and decoding between protocol dialects can sometimes slow test execution. | Test execution can be faster because no protocol translation is needed.
Element Interactions | Element interactions have different implementations. | Updated implementation using the Actions API.

Other than these differences, the WebDriver W3C protocol also introduced some changes in error codes, data structures, and response status codes. More detail on these changes can be found on the official Selenium Changelog page.
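
As a rough illustration of those changes, the sketch below reads an error reply and an element reference in both styles. The helper names `describe_error` and `element_id` are hypothetical. The values in the lookup tables are a small, non-exhaustive sample: JSON Wire replies carried numeric `status` codes, while W3C replies carry a string `error` code paired with an HTTP status, and element references moved from the `ELEMENT` key to a fixed identifier key.

```python
# Sample of JSON Wire numeric status codes (0 meant success).
JSONWP_STATUS = {7: "NoSuchElement", 10: "StaleElementReference", 13: "UnknownError"}

# Sample of W3C string error codes and the HTTP status they are sent with.
W3C_HTTP_STATUS = {
    "no such element": 404,
    "stale element reference": 404,
    "unknown error": 500,
}

# Key under which a W3C reply carries an element reference.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def describe_error(reply):
    """Return a small summary of an error reply, or None if the reply is a success."""
    value = reply.get("value")
    if isinstance(value, dict) and "error" in value:
        code = value["error"]
        return {"dialect": "W3C", "error": code, "http_status": W3C_HTTP_STATUS.get(code)}
    status = reply.get("status", 0)
    if status != 0:
        return {"dialect": "JSON Wire", "error": JSONWP_STATUS.get(status, "Unknown"),
                "http_status": None}
    return None


def element_id(value):
    """Read an element reference from either a W3C or a JSON Wire style value."""
    return value.get(W3C_ELEMENT_KEY) or value.get("ELEMENT")


if __name__ == "__main__":
    print(describe_error({"sessionId": "s1", "status": 7, "value": {"message": "not found"}}))
    print(describe_error({"value": {"error": "no such element", "message": "not found",
                                    "stacktrace": ""}}))
    print(element_id({W3C_ELEMENT_KEY: "abc"}), element_id({"ELEMENT": "xyz"}))
```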

How Does Selenium WebDriver Work Internally?

In a real-world scenario, when we run a Selenium script written in any language using one of the supported Selenium client libraries (say Java), the browser launches and starts behaving as directed by the script. Now let’s understand what happens internally from the moment the Run button is clicked until the real browser launches.

  1. As we click on the Run button, the Selenium client library converts each Selenium command in the automation script into a serialized JSON payload (for example, navigating to https://www.lambdatest.com is serialized as {"url": "https://www.lambdatest.com"}) and sends it over HTTP, using the JSON Wire protocol, to the browser driver (say ChromeDriver). Every browser driver runs an HTTP server to receive these HTTP requests.
  2. The JSON Wire Protocol defines how data is shared between the client and the server. The browser driver receives the HTTP request via its HTTP Server, works out which instruction it represents, and then asks the real browser to carry it out, for example, to load the URL.
  3. After the browser performs the instruction, the execution status is sent back to the HTTP Server. The browser driver then returns it to the client library as an HTTP response, again formatted according to the JSON Wire Protocol.
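
The three steps above can be simulated end to end with Python’s standard library. This is a sketch under stated assumptions, not how any real driver is implemented: `FakeBrowser` and `FakeDriverHandler` are hypothetical stand-ins for the real browser and the browser driver’s HTTP server, `send_navigate` plays the role of the client library, and the session id `demo-session` is a placeholder. The server binds to a free local port, so no browser or driver binary is needed.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class FakeBrowser:
    """Stands in for the real browser; it only remembers the current URL."""

    def __init__(self):
        self.current_url = "about:blank"

    def navigate(self, url):
        self.current_url = url


BROWSER = FakeBrowser()


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the HTTP server inside a browser driver."""

    def do_POST(self):
        # Step 2: receive the HTTP request and decode its JSON body.
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length) or b"{}")
        if self.path.endswith("/url"):
            BROWSER.navigate(command["url"])  # ask the "browser" to load the URL
            status, payload = 200, {"value": None}
        else:
            status, payload = 404, {"value": {"error": "unknown command"}}
        # Step 3: send the execution status back as a JSON response.
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def send_navigate(base_url, session_id, url):
    # Step 1: the "client library" serializes the command and sends it over HTTP.
    request = Request(
        f"{base_url}/session/{session_id}/url",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    opener = build_opener(ProxyHandler({}))  # talk to localhost directly, no proxy
    with opener.open(request) as response:
        return json.loads(response.read())


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}"
        reply = send_navigate(base_url, "demo-session", "https://www.lambdatest.com")
    finally:
        server.shutdown()
        server.server_close()
    return reply, BROWSER.current_url


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns the decoded reply together with the URL the fake browser ended up on, which makes the request and response halves of the flow easy to inspect.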

In Selenium 4.0, the JSON Wire protocol no longer plays a role. The client libraries send the same kind of HTTP requests using the W3C WebDriver protocol, which the browser driver implements natively, so Selenium commands are executed on the real browser without any protocol translation.

Advantages of Selenium WebDriver W3C Protocol in Selenium 4.0

The introduction of WebDriver W3C protocol in Selenium 4.0 offers several advantages that include the following points:

  • Automated Selenium tests run more stably and consistently across browsers, as both the browser drivers and Selenium WebDriver use the same protocol.
  • With WebDriver W3C Protocol, automated Selenium tests are less flaky. Stability in automation testing is a major reason to shift to Selenium 4.0.
  • The new WebDriver W3C protocol uses the Actions API, which is richer than its JSON Wire Protocol counterpart, to interact with the browser. The Actions API lets us perform multi-touch actions, zoom in and out, press two or more keys simultaneously, and more.
  • For example, the pinch-in sequence in W3C Protocol is defined by an action sequence consisting of three ticks with two pointer devices of type touch, each performing pointerDown, followed by pointerMove, and then pointerUp.
  • Standardization by the W3C opens up opportunities for compatibility across different WebDriver API implementations.
  • W3C compliance helps reduce maintenance effort, as cleaner code is easier to read.
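
The pinch-in example from the list can be written out as the JSON payload that the Actions API sends to the driver. The sketch below builds that payload in Python; the helper names `finger` and `pinch_in`, the pointer ids, the coordinates, and the 500 ms duration are all assumptions chosen for illustration. It also adds an initial zero-duration pointerMove to place each finger before the pointerDown, pointerMove, pointerUp sequence described above, which is a choice of this sketch.

```python
import json


def finger(pointer_id, start, end, duration_ms=500):
    """One touch pointer: position, press, drag to the end point, release."""
    return {
        "type": "pointer",
        "id": pointer_id,
        "parameters": {"pointerType": "touch"},
        "actions": [
            # Assumption of this sketch: place the finger first with a zero-duration move.
            {"type": "pointerMove", "duration": 0, "x": start[0], "y": start[1]},
            {"type": "pointerDown", "button": 0},
            {"type": "pointerMove", "duration": duration_ms, "x": end[0], "y": end[1]},
            {"type": "pointerUp", "button": 0},
        ],
    }


def pinch_in(center, spread=100, gap=10):
    """Two fingers start `spread` pixels from the center and move to `gap` pixels from it."""
    cx, cy = center
    return {
        "actions": [
            finger("finger1", (cx - spread, cy), (cx - gap, cy)),
            finger("finger2", (cx + spread, cy), (cx + gap, cy)),
        ]
    }


if __name__ == "__main__":
    # This body would be sent with a POST request to /session/{session id}/actions.
    print(json.dumps(pinch_in((200, 300)), indent=2))
```

Because both input sources are listed in the same payload, the driver advances them tick by tick in parallel, which is what lets two fingers move at the same time.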

Conclusion

After reading this detailed blog on Selenium Architecture, you will have a better understanding of what Selenium Architecture is, the various components of the Selenium suite, how important Selenium WebDriver is in the entire Selenium Architecture, and how the architecture differs between Selenium 3.0 and 4.0. If you have carefully gone through this Selenium architecture tutorial, you will also be well-equipped with the following knowledge:

  • Selenium WebDriver is the main component of the entire Selenium suite. It is like the brain of Selenium. Selenium IDE, Selenium Grid, and Selenium RC (deprecated) are other components of the Selenium suite.
  • Selenium WebDriver mainly consists of the following components: Selenium client libraries, the communication protocol (JSON Wire protocol in Selenium 3.0, WebDriver W3C protocol in Selenium 4.0), browser drivers, and the real browsers. The browser drivers help ease the suite’s interaction with multiple web browsers.
  • In Selenium 3.0 the JSON Wire protocol over HTTP was primarily used as a mode of communication between the Selenium client libraries and the browser driver. With the introduction of Selenium 4.0, this protocol was replaced with the new WebDriver W3C protocol.

Selenium is a reliable and robust framework for automated web app testing. However, its usage and throughput will be limited if tests run only on local infrastructure, which is neither economical nor scalable. To achieve better efficiency, scalability, and faster performance, go for cross browser testing in Selenium with a cloud-based digital experience testing platform such as LambdaTest.

Frequently Asked Questions (FAQs)

What is Selenium Architecture?

Selenium Architecture refers to the complete structure and components of the Selenium framework that enable automated web app testing. It primarily includes Selenium IDE, Selenium WebDriver, Selenium Grid, and Selenium RC (deprecated).

What are the primary components of Selenium WebDriver?

Selenium WebDriver is the main component of the entire Selenium suite. It is like the brain of Selenium. It mainly consists of the following components: Selenium client libraries, JSON Wire protocol, browser driver, and the real browser. However, with the introduction of Selenium 4.0, the JSON Wire protocol is replaced by the WebDriver W3C protocol, a standard that the client libraries and the browser drivers both implement, so they communicate without any protocol translation.

What are the architectural differences between Selenium 3.0 and Selenium 4.0?

In Selenium 3.0, the JSON Wire protocol over HTTP is used as the mode of communication between the Selenium client libraries and the browser driver. Because browser drivers were already moving to the W3C standard, commands often had to be encoded and decoded between the two protocol dialects. In Selenium 4.0, the JSON Wire protocol is replaced with the WebDriver W3C protocol. Data still travels as JSON over HTTP, but the client and the server share one standardized protocol, so no translation between dialects is needed.

What is the role of the browser driver in Selenium WebDriver architecture?

Browser drivers act as a bridge between the Selenium client libraries and the real browsers. They help us in running Selenium commands on the browser. They are the component of Selenium WebDriver responsible for executing user actions, like mouse clicks, page navigation, button clicks, etc., on the browser.

Author’s Profile

Tanay Kumar Deo

Tanay Kumar Deo is a skilled software developer and an upcoming SDET at GlobalLogic (a Hitachi Group company). With expertise in Android and web development, he is always eager to expand his skill set and take on new challenges. Whether developing software or sharing his knowledge with others, he is driven by a desire to make a positive impact on the world around him. In addition to his technical abilities, Tanay also possesses excellent blogging and writing skills, which allow him to effectively communicate his ideas and insights to a wider audience.
