Other references
Representation of an RGB(A) color.
- class Color(r: int, g: int, b: int, a: int = 255)[source]¶
Representation of an 8-bit RGBA color.
- static from_str(color: str) webtraversallibrary.color.Color [source]¶
Creates a color from a “#RRGGBBAA” representation. Alpha is optional.
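The "#RRGGBBAA" format can be illustrated with a small standalone parser. This is only a sketch, not the library's implementation. The helper name parse_rgba is hypothetical, and the fallback alpha of 255 is an assumption taken from the constructor default shown above.

```python
from typing import Tuple


def parse_rgba(color: str) -> Tuple[int, int, int, int]:
    """Hypothetical helper: parse "#RRGGBB" or "#RRGGBBAA" into (r, g, b, a)."""
    digits = color.lstrip("#")
    if len(digits) not in (6, 8):
        raise ValueError(f"Expected #RRGGBB or #RRGGBBAA, got {color!r}")
    # Each channel is two hexadecimal digits, i.e. one 8-bit value.
    channels = [int(digits[i:i + 2], 16) for i in range(0, len(digits), 2)]
    if len(channels) == 3:
        channels.append(255)  # Assumption: fully opaque when alpha is omitted.
    r, g, b, a = channels
    return r, g, b, a
```

For instance, parse_rgba("#FF000080") yields a half-transparent red, while parse_rgba("#00ff00") falls back to the assumed opaque alpha.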
Configuration object for discovery workflow
- class Config(cfg: Optional[Iterable[Union[str, pathlib.Path, dict]]] = None)[source]¶
Represents a config object.
- static default(cfg: Optional[List[Union[str, pathlib.Path, dict]]] = None) webtraversallibrary.config.Config [source]¶
Creates a Config object based on all default values
Contains common library-level errors
- exception WebDriverSendError[source]¶
Any error related to custom sending commands to a WebDriver instance
- exception WindowClosedError[source]¶
Trying to access a browser instance that was closed by the user
Basic 2-dimensional geometric constructs: points, rectangles, etc.
- class Rectangle(minima: webtraversallibrary.geometry.Point, maxima: webtraversallibrary.geometry.Point)[source]¶
Represents a rectangle in a 2-dimensional plane.
- static bounding_box(rectangles: Sequence[webtraversallibrary.geometry.Rectangle]) webtraversallibrary.geometry.Rectangle [source]¶
Computes the bounding box of rectangles.
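A bounding box computation can be sketched on plain (min x, min y, max x, max y) tuples, the same order that the bounds property below returns. The tuple representation and the error on empty input are assumptions made for this illustration, not documented library behavior.

```python
from typing import Sequence, Tuple

Bounds = Tuple[float, float, float, float]  # (min x, min y, max x, max y)


def bounding_box(rectangles: Sequence[Bounds]) -> Bounds:
    """Smallest axis-aligned rectangle that covers every input rectangle."""
    if not rectangles:
        # Assumption: the sketch rejects empty input; the library may differ.
        raise ValueError("need at least one rectangle")
    return (
        min(r[0] for r in rectangles),
        min(r[1] for r in rectangles),
        max(r[2] for r in rectangles),
        max(r[3] for r in rectangles),
    )
```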
- property bounds: Tuple[float, float, float, float]¶
Returns min x, min y, max x, max y
- property center: webtraversallibrary.geometry.Point¶
Return the midpoint of the rectangle
- static centered_at(center: webtraversallibrary.geometry.Point, radius: float) webtraversallibrary.geometry.Rectangle [source]¶
A new square centered at center with side length radius. (The technically correct term here is Apothem.)
- clip(other: webtraversallibrary.geometry.Rectangle) webtraversallibrary.geometry.Rectangle [source]¶
Return a new rectangle generated by clipping this one by the bounds of other. Similar to intersection, but clipping non-intersecting rectangles will result in a degenerate rectangle located on one of the edges of other.
- Returns
New, clipped Rectangle (possibly degenerate)
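One way to obtain the described behavior is to clamp every coordinate into the bounds of the other rectangle. The sketch below works on plain (min x, min y, max x, max y) tuples and is an assumption about how clipping might be computed, not the library's code.

```python
from typing import Tuple

Bounds = Tuple[float, float, float, float]  # (min x, min y, max x, max y)


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def clip(rect: Bounds, other: Bounds) -> Bounds:
    """Clamp all four coordinates of rect into the bounds of other."""
    return (
        clamp(rect[0], other[0], other[2]),
        clamp(rect[1], other[1], other[3]),
        clamp(rect[2], other[0], other[2]),
        clamp(rect[3], other[1], other[3]),
    )
```

With this approach, a rectangle lying entirely to the right of other collapses to zero width on the right edge of other, which matches the degenerate case mentioned above.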
- contains(other: Union[webtraversallibrary.geometry.Point, webtraversallibrary.geometry.Rectangle]) bool [source]¶
Tests whether the rectangle contains other.
- Returns
True if other is contained in the rectangle, False otherwise.
- static empty() webtraversallibrary.geometry.Rectangle [source]¶
Returns a rectangle of zero area at the origin.
- static from_list(*args) webtraversallibrary.geometry.Rectangle [source]¶
Converts tuple/list (x1, y1, x2, y2) to a Rectangle
- static intersection(rectangles: Sequence[webtraversallibrary.geometry.Rectangle]) webtraversallibrary.geometry.Rectangle [source]¶
Computes the rectangle which is the intersection of a sequence of rectangles. In case the intersection is empty, it returns an empty rectangle.
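The intersection of several axis-aligned rectangles can be sketched by taking the largest minimum and the smallest maximum on each axis. The tuple representation, the zero-area rectangle at the origin used as the empty result, and treating rectangles that merely touch as non-intersecting are all assumptions of this sketch.

```python
from typing import Sequence, Tuple

Bounds = Tuple[float, float, float, float]  # (min x, min y, max x, max y)
EMPTY: Bounds = (0, 0, 0, 0)  # Assumed stand-in for an empty rectangle.


def intersection(rectangles: Sequence[Bounds]) -> Bounds:
    """Common area of all rectangles, or EMPTY if there is none."""
    if not rectangles:
        return EMPTY
    min_x = max(r[0] for r in rectangles)
    min_y = max(r[1] for r in rectangles)
    max_x = min(r[2] for r in rectangles)
    max_y = min(r[3] for r in rectangles)
    if min_x >= max_x or min_y >= max_y:
        return EMPTY  # No overlap with positive area.
    return (min_x, min_y, max_x, max_y)
```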
- resized(delta: float) webtraversallibrary.geometry.Rectangle [source]¶
Returns a resized rectangle, shrunk (inflated for delta<0) by 2*delta in width and in height.
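The arithmetic can be sketched on a plain (min x, min y, max x, max y) tuple. The sketch assumes each edge moves inward by delta so that the center stays fixed; the source only states the total change of 2*delta per dimension.

```python
from typing import Tuple

Bounds = Tuple[float, float, float, float]  # (min x, min y, max x, max y)


def resized(rect: Bounds, delta: float) -> Bounds:
    """Move every edge inward by delta (outward if delta is negative)."""
    return (rect[0] + delta, rect[1] + delta, rect[2] - delta, rect[3] - delta)
```

A 10 by 10 square resized with delta=2 becomes 6 by 6, and with delta=-1 it becomes 12 by 12.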
- property x: float¶
The x-coordinate of the lower-left vertex
- property y: float¶
The y-coordinate of the lower-left vertex
Module containing helper functions for graphics-related operations on webdrivers and snapshots.
- crop_image(image: PIL.Image.Image, rect: webtraversallibrary.geometry.Rectangle) PIL.Image.Image [source]¶
Crops the part of the image specified by rect. The rectangle specified by rect must lie inside the image bounds.
- draw_rect(image: PIL.Image.Image, rect: webtraversallibrary.geometry.Rectangle, color: webtraversallibrary.color.Color, width: int)[source]¶
Draws a bounding box around the specified rectangle on the image.
- draw_text(image: PIL.Image.Image, top_left: webtraversallibrary.geometry.Point, color: webtraversallibrary.color.Color, size: int, text: str)[source]¶
Draws text on a PIL image.
Collection of helper classes used in Workflow.
- class ClassifierCollection(classifiers: Iterable[webtraversallibrary.classifiers.Classifier])[source]¶
Helper class for predefined classifiers
- class FrameSwitcher(identifier: str, js: webtraversallibrary.javascript.JavascriptWrapper, driver: selenium.webdriver)[source]¶
Helper class for entering and exiting iframes. Raises ElementNotFoundError if an iframe could not be found.
- class MonkeyPatches(patches: Optional[Dict[webtraversallibrary.selector.Selector, str]] = None)[source]¶
Helper class for monkeypatches
Wrapper functions around JavaScript code to be used from Selenium WebDriver. The main reason to store JS code in files instead of embedding it in Python code is convenience: it is more readable and has better IDE support.
- class JavascriptWrapper(driver: selenium.webdriver.remote.webdriver.WebDriver, config: Optional[webtraversallibrary.config.Config] = None)[source]¶
Helper class for executing built-in javascript scripts or custom files and snippets.
- annotate(location: webtraversallibrary.geometry.Point, color: webtraversallibrary.color.Color, size: int, text: str, background: webtraversallibrary.color.Color = Color(r=0, g=0, b=0, a=0), viewport: bool = False)[source]¶
Writes text with a given color on the page. Shares an HTML canvas with highlight.
- classmethod assemble_script(filenames: Iterable[pathlib.Path]) str [source]¶
Concatenates the contents of several Javascript files into one, with caching.
- Parameters
filenames – Path to the JS files either in webtraversallibrary/js or an absolute path.
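Concatenation with caching could look roughly like the sketch below. It is not the library's implementation: the use of functools.lru_cache, the newline separator, and the requirement to pass a tuple (so the argument is hashable) are assumptions made for illustration.

```python
from functools import lru_cache
from pathlib import Path
from typing import Tuple


@lru_cache(maxsize=None)
def assemble_script(filenames: Tuple[Path, ...]) -> str:
    """Concatenate JS files; the result is cached per tuple of paths."""
    # Assumption: files are joined with a newline so statements do not fuse.
    return "\n".join(path.read_text(encoding="utf-8") for path in filenames)
```

A repeated call with the same tuple of paths is then served from the cache without touching the file system again.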
- clear_highlights(viewport: bool = False)[source]¶
Removes all highlights created by highlight().
- click_element(selector: webtraversallibrary.selector.Selector)[source]¶
Clicks an element found by the given selector. Note: If more elements can be found, only one will be clicked.
- delete_element(selector: webtraversallibrary.selector.Selector)[source]¶
Deletes an element found by the given selector. Note: If more elements can be found, only one will be deleted.
- disable_animations()[source]¶
Turns off animation on the page. Works for jQuery by setting a certain flag and for CSS animations by injecting an additional style into the page code.
Mutates the web page.
- element_exists(selector: webtraversallibrary.selector.Selector) bool [source]¶
Returns True if an element exists, otherwise False.
- execute_file(filename: Union[pathlib.Path, Iterable[pathlib.Path]], *args, execute_async: bool = False) Any [source]¶
Execute the JavaScript code in the given file and return the result.
- Parameters
filename – Path to the JS file(s) either in webtraversallibrary/js or an absolute path.
execute_async – if True, will wait until the javascript code has called arguments[arguments.length-1] and will return its input arguments.
- execute_script(script: str, *args) Any [source]¶
Execute the JavaScript code in script and return the result.
- Parameters
script – path to the JS file relative to this package
- execute_script_async(script: str, *args) Any [source]¶
Execute the JavaScript code in script asynchronously and return the result.
- Parameters
script – path to the JS file relative to this package
- fill_text(selector: webtraversallibrary.selector.Selector, value: str)[source]¶
Fills an element as found by a given selector with given text. Note: If more elements can be found, only one will be used.
- find_active_elements() list [source]¶
Uses a couple of heuristics to try and find all clickable elements in the page.
- find_iframe_name(identifier: str) str [source]¶
Looks for an iframe where name, ID, or class equals the identifier, and returns its name. Returns empty string if no matching object was found.
- Returns
iframe name or empty string
- find_viewport() webtraversallibrary.geometry.Rectangle [source]¶
Get the viewport, i.e. the part of the web browser window with content.
- Returns
the viewport as a Rectangle, in pixels
- get_element_metadata() List[Dict[str, Any]] [source]¶
Collects metadata about web page DOM elements: their tags, some of the HTML attributes, position and size on the page, CSS styles and classes, inner text.
Each element on the page is assigned a wtl_uid, unique within the scope of the page, and has a pointer to the parent DOM element in the wtl_parent_uid field. These are not to be confused with the id attribute in HTML, which is neither unique nor mandatory.
- Returns
a list of JSON objects (in their Python dict form) with HTML attributes, additionally calculated properties and unique IDs. Refer to the script code for the keys’ names.
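Because every entry carries wtl_uid and wtl_parent_uid, the flat list can be regrouped into a parent-to-children mapping. The sketch below uses made-up sample data: the integer uids, the "tag" key, and None as the parent of the root are assumptions, so check the script code for the real key names and value types.

```python
from collections import defaultdict
from typing import Any, Dict, List


def children_by_parent(elements: List[Dict[str, Any]]) -> Dict[Any, List[Any]]:
    """Group element uids under the uid of their parent element."""
    children: Dict[Any, List[Any]] = defaultdict(list)
    for element in elements:
        children[element.get("wtl_parent_uid")].append(element["wtl_uid"])
    return dict(children)


# Made-up sample data, not real library output.
sample = [
    {"wtl_uid": 1, "wtl_parent_uid": None, "tag": "body"},
    {"wtl_uid": 2, "wtl_parent_uid": 1, "tag": "div"},
    {"wtl_uid": 3, "wtl_parent_uid": 1, "tag": "a"},
]
```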
- get_full_height() int [source]¶
Get the full page height, i.e. the height of the document.
- Returns
document height in pixels
- hide_position_fixed_elements(elements: Optional[List[str]] = None) dict [source]¶
Hides page elements that are fixed or sticky (the ones that stay on the page when you scroll) by setting their visibility to “hidden”.
Returns a map from element ids (wtl-uid) to the old visibility values.
Mutates the web page.
- highlight(selector: webtraversallibrary.selector.Selector, color: webtraversallibrary.color.Color, fill: bool = False, viewport: bool = False)[source]¶
Highlight an element as found by a given selector with arbitrary color and intensity. Note: If more elements can be found, only one will be highlighted. Shares an HTML canvas with annotate.
- is_page_loaded(*_) bool [source]¶
Applies some heuristics to check if the page is loaded. Since this is in general a hard question to answer, the check is known to be faulty in some cases.
- save_mhtml(filename: str)[source]¶
Executes the MHTML saving extension. Saves to the path specified in config.scraping.temp_path. Note: If the file already exists, it will not be overwritten.
- select(selector: webtraversallibrary.selector.Selector, value: str)[source]¶
Select an option of a dropdown (select) element. Note: If more elements can be found, only one will be used.
- show_position_fixed_elements(id_to_visibility: dict)[source]¶
Set the specified visibility to the elements with ids listed in id_to_visibility. The id_to_visibility parameter is expected to be the (possibly accumulated) output of hide_position_fixed_elements.
Mutates the web page (hopefully undoing the changes made by hide_position_fixed_elements).
- safe_selenium_method(func)[source]¶
Handles errors thrown in the browser while executing javascript and outputs information to the log. Note: This is a clumsy decorator for instance methods and assumes there is a self.driver member.
Helper function for logging.
- setup_logging(log_dir: Optional[pathlib.Path] = None, logging_level: int = 20)[source]¶
Sets up logging: create a directory to write log files to, configure handlers. Sets sane default values for in-house and third-party modules.
Will remove any existing logging handlers with the name “webtraversallibrary” before proceeding.
- Parameters
log_dir – directory to write log files to.
logging_level – level of logging you wish to have, accepts number or logging.LEVEL
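The "remove existing handlers by name, then reconfigure" behavior can be sketched with the standard logging module alone. The function name setup_logging_sketch, the file name wtl.log, and attaching the handler to the root logger are assumptions. The default level logging.INFO equals 20, matching the default shown in the signature above.

```python
import logging
from pathlib import Path
from typing import Optional

HANDLER_NAME = "webtraversallibrary"


def setup_logging_sketch(
    log_dir: Optional[Path] = None, logging_level: int = logging.INFO
) -> logging.Logger:
    """Hypothetical stand-in: install exactly one named handler on the root logger."""
    root = logging.getLogger()
    # Drop handlers left over from an earlier call so output is not duplicated.
    for handler in list(root.handlers):
        if handler.get_name() == HANDLER_NAME:
            root.removeHandler(handler)
            handler.close()
    if log_dir is not None:
        log_dir.mkdir(parents=True, exist_ok=True)
        new_handler: logging.Handler = logging.FileHandler(log_dir / "wtl.log")
    else:
        new_handler = logging.StreamHandler()
    new_handler.set_name(HANDLER_NAME)
    new_handler.setLevel(logging_level)
    root.addHandler(new_handler)
    root.setLevel(logging_level)
    return root
```

Calling the function twice leaves a single handler with that name, which is the point of removing the old ones first.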
Helper functions to check versions and existence of installed dependencies.
- get_current_os() webtraversallibrary.driver_check.os_functions.OS [source]¶
Gets the current OS of the machine running.
- get_driver_location(driver: webtraversallibrary.driver_check.os_functions.Drivers, os: Optional[webtraversallibrary.driver_check.os_functions.OS] = None) str [source]¶
Gets the location of the driver.
- get_driver_version(driver: webtraversallibrary.driver_check.os_functions.Drivers, os: Optional[webtraversallibrary.driver_check.os_functions.OS] = None) str [source]¶
Gets the driver version.
- is_driver_installed(driver: webtraversallibrary.driver_check.os_functions.Drivers, os: Optional[webtraversallibrary.driver_check.os_functions.OS] = None) bool [source]¶
Checks if a given driver is installed on the OS.
Module collecting helper functions for common processing tasks, such as muting stdout within a context.
- class Alarm(timeout)[source]¶
Helper class to run a timeout thread on Windows
This constructor should always be called with keyword arguments. Arguments are:
group should be None; reserved for future extension when a ThreadGroup class is implemented.
target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.
name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.
args is the argument tuple for the target invocation. Defaults to ().
kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.
If a subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.
- run()[source]¶
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
- class TimeoutContext(n_seconds, error_class=<class 'TimeoutError'>)[source]¶
Uses signal to raise TimeoutError within the block, if execution went over a specified timeout.
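A signal-based timeout context manager can be sketched as follows. This is an illustration of the general technique, not the library's code: the class name TimeoutSketch is hypothetical, and the sketch only works on Unix-like systems and in the main thread, since it relies on SIGALRM.

```python
import signal


class TimeoutSketch:
    """Hypothetical stand-in: raise error_class if the block exceeds n_seconds."""

    def __init__(self, n_seconds: float, error_class=TimeoutError):
        self.n_seconds = n_seconds
        self.error_class = error_class
        self._previous = None

    def _on_alarm(self, signum, frame):
        # Runs in the main thread when the timer fires and aborts the block.
        raise self.error_class(f"Block exceeded {self.n_seconds} seconds")

    def __enter__(self):
        self._previous = signal.signal(signal.SIGALRM, self._on_alarm)
        signal.setitimer(signal.ITIMER_REAL, self.n_seconds)
        return self

    def __exit__(self, exc_type, exc, tb):
        signal.setitimer(signal.ITIMER_REAL, 0)  # Cancel a pending timer.
        signal.signal(signal.SIGALRM, self._previous)  # Restore the old handler.
        return False  # Never swallow exceptions raised inside the block.
```

A block such as `with TimeoutSketch(5): slow_call()` (slow_call being a placeholder) would then be interrupted after five seconds.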
- class cached_property(method)[source]¶
Decorator that caches a property return value and will return it on later calls. Adapted from The Python Cookbook, 2nd edition.
Note
If you want to map different arguments to values, use functools.lru_cache!
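The caching idea can be sketched with a non-data descriptor that stores the computed value on the instance. The class name cached_property_sketch is hypothetical and the code is an illustration of the pattern, not the library's implementation.

```python
class cached_property_sketch:
    """Compute the wrapped method once per instance, then reuse the result."""

    def __init__(self, method):
        self.method = method
        self.name = method.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # Accessed on the class, not on an instance.
        value = self.method(instance)
        # Storing under the same name shadows this descriptor on later lookups,
        # because a non-data descriptor loses to the instance dictionary.
        instance.__dict__[self.name] = value
        return value
```

Since the value is keyed only by the instance, this pattern suits zero-argument properties, which is why the note above points to functools.lru_cache for argument-dependent results.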
Abstraction layer for a screenshot of a site
- class Screenshot(name: str, image: PIL.Image.Image)[source]¶
Abstraction layer for a screenshot of a site, allowing for various annotations.
- annotate(top_left: webtraversallibrary.geometry.Point, color: webtraversallibrary.color.Color, size: int, text: str)[source]¶
Writes text with a given color on the screenshot.
- classmethod capture(name: str, driver: selenium.webdriver.remote.webdriver.WebDriver, scale: float = 1.0, max_page_height: int = 0) webtraversallibrary.screenshot.Screenshot [source]¶
Creates a snapshot on the given webdriver under certain conditions.
- classmethod capture_viewport(name: str, driver: selenium.webdriver.remote.webdriver.WebDriver, scale: float = 1.0) webtraversallibrary.screenshot.Screenshot [source]¶
Creates a screenshot of the current viewport of a given webdriver. Scales the image by some pixel ratio, if given. Uses PIL as a backend.
- highlight(rect: webtraversallibrary.geometry.Rectangle, color: webtraversallibrary.color.Color, text: str = '', width: int = 1)[source]¶
Draws a colored rectangle on the screenshot. Can also annotate with a text below the rectangle, if given.
- save(path: pathlib.Path, suffix: str = '')[source]¶
Saves screenshot to given path. Filename consists of the screenshot name and an optional suffix.
- property size: webtraversallibrary.geometry.Point¶
Returns a (width, height) Point of the screenshot size in pixels
This module contains heuristics for generating selectors.
- class Selector(css: str = '*', xpath: str = '/', iframe: Optional[str] = None)[source]¶
Web element selector based on CSS and XPATH. You may also specify an identifier (name or ID) of an iframe in which the given element is located. The class itself provides no guarantees on whether the selector is unique or even matches anything. The iframe value can be used if this selector refers to an element inside an iframe. Specify as ID or name.
- classmethod build(bs4_soup: bs4.BeautifulSoup, target: Union[bs4.element.Tag, int]) webtraversallibrary.selector.Selector [source]¶
Compute xpath and css of a target in a bs4.BeautifulSoup. Will be verbose. Use a separate generalizer if you want reusable selectors.
Base representation of the current state of a tab.
- class View(name: str, snapshot: webtraversallibrary.snapshot.PageSnapshot, actions: webtraversallibrary.actions.Actions = <factory>, tags: typing.Set[str] = <factory>, metadata: typing.Dict[typing.Any, typing.Any] = <factory>)[source]¶
Base representation of the current state of a tab. Holds a snapshot, a list of available actions, and output from the prior classifiers. Note: The metadata field can be added to arbitrarily, and its contents will be deeply copied to the next view (do not store overly large objects!). If you need large metadata storage, use workflow.metadata instead.
Module for different webdrivers considered.
- send(driver: selenium.webdriver.remote.webdriver.WebDriver, cmd: str, params: Optional[dict] = None) int [source]¶
Send command to the webdriver, return resulting status code.
- setup_driver(config: webtraversallibrary.config.Config, profile_path: Optional[pathlib.Path] = None, preload_callbacks: Optional[Iterable[pathlib.Path]] = None) selenium.webdriver.remote.webdriver.WebDriver [source]¶
Creates a WebDriver object with the given configuration.
- Parameters
config – Configuration dictionary
- Returns
A WebDriver instance