Selenium WebDriver with PowerShell

Halo!

I'm back after nearly a month of fighting with automation tools for a task, and I ended up with Selenium, one of the most popular tools for browser automation. I tried other tools such as PhantomJS and Nightmare first, but Selenium WebDriver is the one I settled on because it supports many languages, which can help with other requests later, and it has a large community to turn to for research.


Use Case:

I have a task to capture screenshots of the tooltips displayed in a localized application. We have to go through several steps and screens to reach each tooltip, and repeat this for 8 languages to make sure the tooltips are properly displayed and localized. Automation is the obvious way to deal with this. However, life is not a smooth road. At first I tried PhantomJS, a scriptable headless browser built on WebKit. It is quite powerful but also somewhat old, and since my application uses React with ES6, we ran into display issues on the PhantomJS engine.

I moved to Nightmare, which grew out of PhantomJS and uses Electron as its browser. Things seemed to go well until I figured out that Nightmare captures the screen by rendering the DOM HTML from the browser rather than capturing what is actually drawn on screen. That is fine for a typical automation run, which reads the tooltip value and compares it with the expected value, but it means the tooltip never shows up in a screenshot. I had to find another solution.

So why Selenium? I found an article explaining why most automation tools do not take a screenshot of the rendered UI: to capture what is really on screen, a separate process has to capture the browser window. Selenium helps here because the WebDriver can be called from Java or .NET, and that host process can capture the screen at the appropriate point.


Solution:

With this idea, I settled on .NET (my strongest skill). I wanted a quick win, so I wrote a PowerShell script instead of opening a new .NET project. I found this thread about using Selenium WebDriver with PowerShell, which gave me a good starting point. This one is also a great help for fine-tuning the code and adding explicit waits for the HTML response.

So let's start with how to use Selenium WebDriver; the Selenium documentation center is the place to read. Download the selenium-dotnet DLL files from here, extract them, and reference these DLLs from your script. Here is the list of WebDriver API documentation:
- .NET
- Java
- Python
- Ruby
- JavaScript

There are also many WebDriver implementations that you can use to run your automation. I use Chrome because it is compatible with most web applications today, so we will use the latest ChromeDriver from Chromium. There is a known problem with the SendKeys function on ChromeDriver. Extract the file (chromedriver.exe) and add its folder to your PATH environment variable.

To initialize the Selenium WebDriver libraries:

# Load the Selenium assemblies; replace [path] with the folder where the DLLs were extracted
Add-Type -Path "[path]\Selenium.WebDriverBackedSelenium.dll"
Add-Type -Path "[path]\WebDriver.dll"
Add-Type -Path "[path]\WebDriver.Support.dll"

# $language (for example 'en') and $url must be set before this point
$seleniumOptions = New-Object OpenQA.Selenium.Chrome.ChromeOptions
$seleniumOptions.AddArguments(@('--start-maximized', '--allow-running-insecure-content', '--disable-infobars', '--enable-automation', '--kiosk', "--lang=$language"))
# Turn off the password manager so it does not prompt to save passwords
$seleniumOptions.AddUserProfilePreference("credentials_enable_service", $false)
$seleniumOptions.AddUserProfilePreference("profile.password_manager_enabled", $false)
$seleniumDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver($seleniumOptions)
$seleniumDriver.Url = $url
# Wait up to 10 seconds for the main logo to become visible
$seleniumWait = New-Object -TypeName OpenQA.Selenium.Support.UI.WebDriverWait($seleniumDriver, (New-TimeSpan -Seconds 10))
$seleniumWait.Until([OpenQA.Selenium.Support.UI.ExpectedConditions]::ElementIsVisible([OpenQA.Selenium.By]::ClassName("main-logo")))

For more options when starting ChromeDriver, see the list of Chromium command-line switches here. You can set the Chrome language with --lang=en, switch to full-screen mode with --kiosk, and hide the information bar with --disable-infobars.
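Since the flow has to be repeated once per language, the argument list can be built by a small helper. This is only a sketch: Get-ChromeArguments is a hypothetical name, and the language codes are placeholders for the locales your application actually supports.

```powershell
# Hypothetical helper: builds the Chrome argument list for one UI language.
function Get-ChromeArguments {
    param(
        [Parameter(Mandatory)] [string] $Language
    )
    # Same switches as the snippet above; only --lang changes per run.
    return @(
        '--start-maximized',
        '--allow-running-insecure-content',
        '--disable-infobars',
        '--enable-automation',
        '--kiosk',
        "--lang=$Language"
    )
}

# Placeholder language codes; replace them with the locales of your application.
$languages = @('en', 'fr', 'de')

foreach ($language in $languages) {
    $chromeArgs = Get-ChromeArguments -Language $language
    Write-Output ($chromeArgs -join ' ')
    # With the Selenium assemblies loaded, the list can be handed to the options object:
    # $seleniumOptions.AddArguments($chromeArgs)
}
```

Starting a fresh driver for each language keeps the --lang switch effective, because Chrome reads it only at startup.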

To close and exit the Selenium WebDriver:

$seleniumDriver.Close()   # close the current browser window
$seleniumDriver.Quit()    # end the session and stop chromedriver.exe; Quit() also disposes the driver

There are many ways to query a DOM element with the Selenium API.

- The simplest way is through the ChromeDriver class OpenQA.Selenium.Chrome.ChromeDriver. You can query by Id, ClassName, CssSelector, TagName, XPath ...
$seleniumDriver.FindElementById("id1")
$seleniumDriver.FindElementByXPath("XPath")
- Or via the filter class OpenQA.Selenium.By, passed to FindElement. This form also works on Selenium 4, where the FindElementBy... shortcuts were removed.
$seleniumDriver.FindElement([OpenQA.Selenium.By]::XPath("//span[@class='classname']"))

Personally, I prefer XPath because you can locate an element through its hierarchy, and the query is more flexible: you can match on text that contains a string or navigate to a sibling node.
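To try such queries without starting a browser, the same XPath 1.0 expressions can be run against a small XML fragment with PowerShell's built-in XML support. The markup below is made up for the illustration; on a real page the expressions would be passed to FindElementByXPath instead.

```powershell
# Sample markup (invented for this example), parsed with the [xml] type accelerator.
[xml]$doc = @'
<div>
  <label class="field">User name</label>
  <span class="hint">Enter your user name</span>
  <label class="field">Password</label>
  <span class="hint">Enter your password</span>
</div>
'@

# Match by partial text: every span whose text contains 'password'.
$byText = $doc.SelectNodes("//span[contains(text(), 'password')]")

# Match by sibling: the first span that follows the 'User name' label.
$bySibling = $doc.SelectSingleNode("//label[text()='User name']/following-sibling::span[1]")

$byText.Count          # 1
$bySibling.InnerText   # Enter your user name
```

Note that contains() is case-sensitive, so 'Password' in the label would not match a query for 'password'.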

Once you have retrieved the IWebElement, you can trigger the Click, SendKeys or Submit action:
$seleniumDriver.FindElementById("id1").Click()
$seleniumDriver.FindElementById("id1").SendKeys("abc")

For more advanced actions such as mouse and keyboard interaction, you can use OpenQA.Selenium.Interactions.Actions. Remember to call Perform() to actually run the action.

(New-Object OpenQA.Selenium.Interactions.Actions($seleniumDriver)).MoveToElement($element).Perform()

To select an option in a select element, use OpenQA.Selenium.Support.UI.SelectElement; you can then select by index, value or text.
$selectElement = New-Object OpenQA.Selenium.Support.UI.SelectElement($seleniumDriver.FindElementByXPath("XPath"))
$selectElement.SelectByIndex($index)

Then wait for the response. There are a few ways to do that too:
- Explicit wait: wait for a condition with OpenQA.Selenium.Support.UI.WebDriverWait
$seleniumWait.Until([OpenQA.Selenium.Support.UI.ExpectedConditions]::ElementIsVisible([OpenQA.Selenium.By]::XPath("//span[@class='logo']")))
- Implicit wait: let the WebDriver keep retrying element lookups for up to a given amount of time
$seleniumDriver.Manage().Timeouts().ImplicitlyWait((New-TimeSpan -Seconds 5))
- In some cases I couldn't use an explicit wait and the implicit wait didn't seem to work, so I fell back to a dirty fixed wait in PowerShell.
Start-Sleep -Seconds 2
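A middle ground between the explicit wait and a fixed sleep is a small polling loop in plain PowerShell. The sketch below is not part of Selenium; Wait-Until is a hypothetical helper name, and the condition is any script block that returns $true once the page is ready.

```powershell
# Hypothetical helper: polls a condition until it returns $true or the timeout expires.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 10,
        [int] $PollMilliseconds = 250
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        try {
            if (& $Condition) { return $true }
        } catch {
            # Treat an exception (for example, element not found yet) as "not ready".
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    } while ((Get-Date) -lt $deadline)
    return $false
}

# Stand-in condition that becomes true after about one second.
$start = Get-Date
$ready = Wait-Until -TimeoutSeconds 5 -Condition { ((Get-Date) - $start).TotalSeconds -ge 1 }
$ready   # True
```

With a driver at hand, the condition could be something like { $seleniumDriver.FindElementByXPath("//span[@class='logo']").Displayed }, so the script continues as soon as the element shows up instead of always sleeping the full time.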

These are the core APIs and objects that you can use for basic automation (at least they were enough for me).

The next step is my crazy stuff. As mentioned above, I capture the screen while the tooltip is displayed. Another headache: my application uses the browser's native tooltip (the title attribute), which pops up at the mouse position, so I have to move the mouse cursor to the proper coordinates to get a reasonable tooltip displayed. The IWebElement has a Location property, but its value is relative to the top-left corner of the browser client area (it comes from the DOM element). So I have to do some more funny tricks: get the window handle of the Chrome process started by ChromeDriver, then convert the client-relative coordinates into screen coordinates in order to move the mouse to the right screen position. I also have to get the Chrome window rectangle to size the bitmap that the screen data is copied into.

So this is where I call Win32 API functions from my PowerShell code. First I get the Chrome window handle (assuming only one Chrome process with a main window is running on my system, to keep the check simple).

Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing
Add-Type @"
  using System;
  using System.Runtime.InteropServices;
  public class Win32Lib {
    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool GetWindowRect(IntPtr hWnd, out RECT lpRect);
    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool GetClientRect(IntPtr hWnd, out RECT lpRect);
    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool SetForegroundWindow(IntPtr hWnd);
    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool ClientToScreen(IntPtr hWnd, ref POINT lpPoint);
  }
  
  public struct RECT
  {
    public int Left;
    public int Top;
    public int Right;
    public int Bottom;
  }

  public struct POINT
  {
    public int x;
    public int y;
  }

"@

# Pick the Chrome process that owns a visible main window
$chrome = @(Get-Process -Name "chrome" | Where-Object { $_.MainWindowHandle -ne 0 })
$chromeHwd = $chrome[0].MainWindowHandle
# Read the window rectangle in screen coordinates
$rcBrowser = New-Object RECT
[void][Win32Lib]::GetWindowRect($chromeHwd, [ref]$rcBrowser)
$rcChrome = New-Object RECT   # (*)
$rcChrome = $rcBrowser        # (*)

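(*) The last two lines matter when you move this code into a class: rcChrome is then a property of that class, and a [ref] parameter treats a property differently from a local variable (I haven't checked why), so I fill a local variable first and copy it into the property.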
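A likely explanation, as far as I understand PowerShell's behavior: [ref] creates a true reference only for a plain variable. Applied to a property expression, it wraps a copy of the current value, so whatever the called method writes back never reaches the property. The sketch below shows the effect with [int]::TryParse instead of the Win32 call; the Holder class is invented for the illustration.

```powershell
# Invented class standing in for the one that owns rcChrome.
class Holder {
    [int] $Value = 0
}

$holder = [Holder]::new()

# [ref] on a property expression wraps a copy of the value, so the property is not updated.
[void][int]::TryParse('42', [ref]$holder.Value)
$holder.Value   # 0

# [ref] on a local variable works; copy the result into the property afterwards.
$parsed = 0
[void][int]::TryParse('42', [ref]$parsed)
$holder.Value = $parsed
$holder.Value   # 42
```

This is the same pattern as the two marked lines: let the API fill a local RECT, then assign it to the class property.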

As you might know, the window rectangle includes the toolbar, title bar and so on, so it differs from the client rectangle (I use full-screen mode to eliminate those bars, but I'm not sure about other applications), so we use the element's bottom-left point to calculate the mouse position. This is a good trick here. So, I get the element's relative coordinates from IWebElement and then convert them into screen coordinates with a Win32 API. To place the mouse somewhere inside the element, add some delta values (on the X and Y axes) to move the pointer inward. I'm not sure why, but when I moved the code into a function, I had to assign the element object's properties to variables before manipulating them.

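    [POINT] GetElementPoint([Object] $element) {
        # Offsets that move the point from the bottom-left corner to inside the element
        $deltaX = 8
        $deltaY = -18
        $centerPoint = New-Object POINT
        $centerPoint.x = $element.Location.X + $deltaX
        # Bottom edge of the element (top + height), shifted up by deltaY
        $y = $element.Location.Y
        $h = $element.Size.Height
        $centerPoint.y = $y + $h + $deltaY
        return $centerPoint
    }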
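To see what the deltas do, here is the same arithmetic on plain numbers. Get-HoverPoint is a hypothetical stand-in that takes the element position and height directly instead of an IWebElement.

```powershell
# Hypothetical helper: same arithmetic as GetElementPoint, on plain numbers.
function Get-HoverPoint {
    param(
        [int] $X,
        [int] $Y,
        [int] $Height,
        [int] $DeltaX = 8,
        [int] $DeltaY = -18
    )
    [pscustomobject]@{
        X = $X + $DeltaX
        Y = $Y + $Height + $DeltaY
    }
}

# Element at (100, 200), 30 px high: the bottom-left corner is (100, 230).
$p = Get-HoverPoint -X 100 -Y 200 -Height 30
"$($p.X), $($p.Y)"   # 108, 212
```

With these defaults, an element shorter than 18 px would put the point above the element, so the deltas have to be tuned to the controls being captured.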

This is how to copy the screen data into a bitmap of the same size as the window:

    [void] ScreenCapture([String]$outFile) {
        # rcChrome is the class property filled from GetWindowRect
        $width = $this.rcChrome.Right - $this.rcChrome.Left
        $height = $this.rcChrome.Bottom - $this.rcChrome.Top
        $bmp = New-Object -TypeName System.Drawing.Bitmap -ArgumentList $width, $height
        $graph = [System.Drawing.Graphics]::FromImage($bmp)
        # Copy the window area of the screen into the bitmap
        $graph.CopyFromScreen($this.rcChrome.Left, $this.rcChrome.Top, 0, 0, $bmp.Size)
        $bmp.Save($outFile)
        $graph.Dispose()
        $bmp.Dispose()
    }

Then we convert the element point from client-relative coordinates into screen coordinates and capture the screen.

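    [void] CaptureElement([int]$step, [String]$eXPath) {
        # seleniumDriver, chromeHwd, imgFolder and imgName are assumed to be properties of the enclosing class
        $element = $this.seleniumDriver.FindElementByXPath($eXPath)
        $ePoint = $this.GetElementPoint($element)
        # Convert client-area coordinates to screen coordinates
        [void][Win32Lib]::ClientToScreen($this.chromeHwd, [ref]$ePoint)
        # Park the real mouse cursor on the element so the native tooltip appears
        [Windows.Forms.Cursor]::Position = "$($ePoint.x), $($ePoint.y)"
        (New-Object OpenQA.Selenium.Interactions.Actions($this.seleniumDriver)).MoveToElement($element).Perform()
        # Give the tooltip time to show up
        Start-Sleep 2
        $imgFile = "$($this.imgFolder)\" + $this.imgName + "_$step.png"
        $this.ScreenCapture($imgFile)
    }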


Alright, that's all the tricks! I can now run my flow and capture screenshots with the browser tooltip displayed. It's quite a simple flow, so the basics above were enough to fulfill my task.

See you next time, probably with some more information about PhantomJS or Nightmare.

Cheers!

Comments

  2. Working with iframes.
    Based on this guide:
    https://stackoverflow.com/questions/9942928/how-to-handle-iframe-in-webdriver

    You have to know the page structure to reach the correct controls.

    For example, a web page has this structure:
    - Top page
      + controls: id_a, XPath_b
      + iframeId1
        . iframeId11
        . iframeId12
      + iframeId2
        . iframeId21
          # controls: id_21a, XPath_21b
        . iframeId22

    We use the $seleniumDriver variable from above.

    # To access an id or XPath on the top page,
    # go to the top page first
    $seleniumDriver.SwitchTo().DefaultContent()
    $seleniumDriver.FindElementById("id_a")
    $seleniumDriver.FindElementByXPath("XPath_b")

    # To access an id or XPath inside iframeId21,
    # go to iframeId2 first
    $seleniumDriver.SwitchTo().Frame('iframeId2')

    # then go to iframeId21
    $seleniumDriver.SwitchTo().Frame('iframeId21')
    $seleniumDriver.FindElementById("id_21a")
    $seleniumDriver.FindElementByXPath("XPath_21b")

    # To go back to iframeId2
    $seleniumDriver.SwitchTo().ParentFrame()

  3. How do I write tools in locked-down environments now? This was possible with the Internet Explorer COM object, but now this is the new thing, and can you still do it? When you try to load the assembly you get this error:
    Operation is not supported. (Exception from HRESULT: 0x80131515)
    You cannot change the .config file either since it is locked down.

    Replies
    1. Not sure how your comment relates to this post, as we're writing a client script to access the browser from the client side. Sorry that I can't help, because I don't understand your context.

    2. This was actually due to needing to unblock the DLLs (right click -> Properties -> Unblock). That's it.

  4. Do you know of a way to do this in PowerShell?
    https://stackoverflow.com/questions/45510973/headless-chrome-ignore-certificate-errors
    What good is automation without going headless? This is a pretty required change and causes massive JavaScript errors. It does seem to be TOTALLY headless, not just a hidden window. Maybe there is a way to simulate just a hidden window, like -ComObject InternetExplorer.Application did with .show/.hide?

  7. I see you use New-Object to create a new browser window. Is there a way to use an existing browser window like 'Set objIE = CreateObject("InternetExplorer.Application")' and then match it to a specific URL?
