diff --git a/README.en.md b/README.en.md index f49d32d..7f1401f 100644 --- a/README.en.md +++ b/README.en.md @@ -1,167 +1,301 @@ -- Chinese README: [click here](https://github.com/g1879/DrissionPage/blob/master/README.zh-cn.md) -- Demos: [click here](https://gitee.com/g1879/DrissionPage-demos) - # Introduction + *** -DrissionPage, the combination of driver and session, is a python-based Web automation operation integration tool. -It realizes the seamless switching between selenium and requests. +DrissionPage, a combination of driver and session, is a python-based Web automation operation integration tool. +It achieves seamless switching between selenium and requests. Therefore, the convenience of selenium and the high efficiency of requests can be balanced. -It uses POM mode to encapsulate common methods of page elements, which is very suitable for automatic operation function expansion. -What's even better is that its usage is very concise and user-friendly, with a small amount of code and friendly to novices. +It integrates the common functions of the page, the API of the two modes is consistent, and it is easy to use. +It uses the POM mode to encapsulate the commonly used methods of page elements, which is very suitable for automatic operation function expansion. +What's even better is that its usage is very concise and user-friendly, with a small amount of code and friendly to novices. 
-**Project address:** +**project address:** - https://github.com/g1879/DrissionPage - https://gitee.com/g1879/DrissionPage -**Demos:** [https://gitee.com/g1879/DrissionPage-demos](https://gitee.com/g1879/DrissionPage-demos) +**Sample address:** [Use DrissionPage to crawl common websites and automation](https://gitee.com/g1879/DrissionPage-demos) -**email:** g1879@qq.com +**Contact email:** g1879@qq.com + +# Concept and background + +*** + +## Idea + +**Concise, easy to use, extensible** + + + +## Background + +When a requests-based crawler faces a website that requires login, it has to analyze data packets and JS source code, construct complex requests, and often deal with anti-crawling measures such as verification codes, JS obfuscation, and signature parameters, which sets a high barrier to entry. If the data is generated by JS calculation, the calculation process must be reproduced. The experience is poor and development efficiency is low. +Using selenium, these pitfalls can largely be bypassed, but selenium is not efficient. Therefore, this library combines selenium and requests into one, switches to the appropriate mode as needs change, and provides a user-friendly way to improve development and operation efficiency. +In addition to merging the two, the library also encapsulates common web page functions and simplifies selenium's operations and statements. When used for web page automation, it reduces the attention paid to details, keeps the focus on function implementation, and is more convenient to use. +Keep everything simple, provide simple and direct usage wherever possible, and be friendly to novices. # Features *** -- Allows seamless switching between selenium and requests, sharing session. -- Use POM mode to encapsulate common methods for easy expansion. -- The two modes provide a unified operation method with consistent user experience. -- Humanized operation method of page elements to reduce the workload of page analysis and coding. 
-- Some common functions (such as click) have been optimized to better meet the actual needs. -- Easy configuration method to get rid of the cumbersome browser configuration. +- The first pursuit is simple code. +- Allow seamless switching between selenium and requests, sharing session. +- The two modes provide consistent APIs, and the user experience is consistent. +- Humanized page element operation mode, reducing the workload of page analysis and coding. +- The common functions are integrated and optimized, which is more in line with actual needs. +- Compatible with selenium code to facilitate project migration. +- Use POM mode packaging for easy expansion. +- A unified file download method makes up for the lack of browser downloads. +- Simple configuration method, get rid of tedious browser configuration. -# Idea +# Project structure *** -## Simple, Easy and Extensible +![](https://gitee.com/g1879/DrissionPage-demos/raw/master/pics/20201110161811.jpg) -- DrissionPage takes concise code as the first pursuit, streamlines long statements and completely retains its functions. -- DrissionPage encapsulates many commonly used functions and is more convenient to use. -- The core of DrissionPage is a page class, which can directly derive subclass pages to adapt to various scenarios. -- Simple browser configuration method, get rid of tedious settings. +# Simple demo -The following code implements exactly the same function, comparing the code amounts of the two: +*** -1. 
Use explicit wait to find all elements whose text contains 'some text' +## Comparison with selenium code + +The following code implements exactly the same function, compare the amount of code between the two: + +- Use explicit waiting to find all elements that contain some text ```python -# selenium: -element = WebDriverWait(driver).until(ec.presence_of_all_elements_located((By.XPATH, '//*[contains(text(), "some text")]'))) -# DrissionPage: -element = page.ele('some text') +# Use selenium: +element = WebDriverWait(driver).until(ec.presence_of_all_elements_located((By.XPATH,'//*[contains(text(), "some text")]'))) + +# Use DrissionPage: +element = page('some text') ``` -2. Jump to the first tab + + +- Jump to the first tab ```python -# selenium +# Use selenium: driver.switch_to.window(driver.window_handles[0]) -# DrissionPage + +# Use DrissionPage: page.to_tab(0) ``` -3. Drag an element + + +- Select drop- down list by text ```python -# selenium +# Use selenium: +from selenium.webdriver.support.select import Select +select_element = Select(element) +select_element.select_by_visible_text('text') + +# Use DrissionPage: +element.select('text') +``` + + + +- Drag and drop an element + +```python +# Use selenium: ActionChains(driver).drag_and_drop(ele1, ele2).perform() -# DrissionPage + +# Use DrissionPage: ele1.drag_to(ele2) ``` -4. Scroll the window to the bottom (keep the horizontal scroll bar unchanged) + + +- Scroll the window to the bottom (keep the horizontal scroll bar unchanged) ```python -# selenium -driver.execute_script("window.scrollTo(document.documentElement.scrollLeft,document.body.scrollHeight);") -# DrissionPage +# Use selenium: +driver.execute_script("window.scrollTo(document.documentElement.scrollLeft, document.body.scrollHeight);") + +# Use DrissionPage: page.scroll_to('bottom') ``` -5. 
Set headless mode + + +- Set headless mode ```python -# selenium +# Use selenium: options = webdriver.ChromeOptions() -options.add_argument("--headless") -# DrissionPage +options.add_argument("--headless") + +# Use DrissionPage: set_headless() ``` -# Background - -*** - -When a novice learns a web crawler, in the face of a website that needs to log in, it is necessary to analyze data packets, JS source code, construct complex requests, and often have to deal with verification codes, JS confusion, signature parameters and other measures, which is difficult to learn. When acquiring data, some data is generated by JavaScript calculation. If you only get the source data, you must also reproduce the calculation process. The experience is not good and the development efficiency is not high. - -Using selenium can avoid these problems to a great extent, but selenium is not efficient. Therefore, what this library has to do is to combine selenium and requests into one, and provide a humanized use method to improve development and operation efficiency. - -In addition to merging the two, this library also encapsulates commonly used functions in units of web pages, which simplifies selenium operations and statements. When used in web page automation operations, it reduces the consideration of details, focuses on function implementation, and is more convenient to use. - -The design concept of this library is to keep everything simple, try to provide a simple and direct method of use, and is more friendly to novices. - -# Simple Demo - -*** - -Example: Log in to the website with selenium, then switch to requests to read the web page. 
+- Get pseudo element content ```python -page = MixPage() # Create page object, default driver mode -page.get('https://gitee.com/profile') # Visit personal center page (redirect to the login page) +# Use selenium: +text = driver.execute_script('return window.getComputedStyle(arguments[0], "::after").getPropertyValue("content");', element) -page.ele('@id:user_login').input('your_user_name') # Use selenium to log in +# Use DrissionPage: +text = element.after +``` + + + +- Get shadow-root + +```python +# Use selenium: +shadow_element = driver.execute_script('return arguments[0].shadowRoot', element) + +# Use DrissionPage: +shadow_element = element.shadow_root +``` + + + +- Use xpath to get attributes or nodes + +```python +# Use selenium: +# This usage is not supported + +# Use DrissionPage: +class_name = element('xpath://div[@id="div_id"]/@class') +text = element('xpath://div[@id="div_id"]/text()[2]') +``` + + + +## Comparison with requests code + +The following code implements exactly the same function, compare the amount of code between the two: + +- Get element content + +```python +url ='https://baike.baidu.com/item/python' + +# Use requests: +import requests +from lxml import etree +headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36'} +response = requests.get(url, headers = headers) +html = etree.HTML(response.text) +element = html.xpath('//h1')[0] +title = element.text + +# Use DrissionPage: +page = MixPage('s') +page.get(url) +title = page('tag:h1').text +``` + +Tips: DrissionPage comes with default headers + + + +- Download file + +```python +url ='https://www.baidu.com/img/flexible/logo/pc/result.png' +save_path = r'C:\download' + +# Use requests: +r = requests.get(url) +with open(f'{save_path}\\img.png','wb') as fd: + for chunk in r.iter_content(): + fd.write(chunk) + +# Use DrissionPage: +page.download(url, save_path,'img') # Support renaming and handle file name conflicts +``` + + + 
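Editor's note on the download comparison above: a minimal sketch of what "handle file name conflicts" could look like with only the standard library. This is not DrissionPage's implementation; the helper names `unique_path` and `save_chunks` and the `_1`, `_2` suffix scheme are assumptions made for illustration.

```python
from pathlib import Path


def unique_path(save_dir: str, file_name: str) -> Path:
    """Return a path inside save_dir that does not collide with an existing file.

    Hypothetical helper: appends _1, _2, ... before the extension
    until an unused name is found.
    """
    folder = Path(save_dir)
    candidate = folder / file_name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def save_chunks(chunks, save_dir: str, file_name: str) -> Path:
    """Write an iterable of byte chunks to a non-conflicting file and return its path."""
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    target = unique_path(save_dir, file_name)
    with open(target, "wb") as fd:
        for chunk in chunks:
            fd.write(chunk)
    return target
```

With requests, a call such as `save_chunks(r.iter_content(chunk_size=8192), save_path, 'img.png')` would stream the response body into the first free file name.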
+## Mode switch + +Log in to the website with selenium, and then switch to requests to read the web page. Both will share login information. + +```python +page = MixPage() # Create page object, default driver mode +page.get('https://gitee.com/profile') # Visit the personal center page (not logged in, redirect to the login page) + +page.ele('@id:user_login').input('your_user_name') # Use selenium to enter the account password to log in page.ele('@id:user_password').input('your_password\n') -page.change_mode() # Switch to session mode -print('Title after login:', page.title, '\n') # Output of session mode after login +page.change_mode() # Switch to session mode +print('Title after login:', page.title,'\n') # session mode output after login ``` Output: ``` -Title after login: Dashboard - Gitee +Title after login: Personal Information- Code Cloud Gitee.com ``` -Example: Find element and print attributes. + + +## Get and print element attributes ```python -foot = page.ele('@id:footer-left') # Find elements by id -first_col = foot.ele('css:>div') # Find first div element in the lower level by css selector. 
-lnk = first_col.ele('text:Git Branching') # Find elements by text content -text = lnk.text # Get element text -href = lnk.attr('href') # Get element attribute value +# Continuing from the previous code +foot = page.ele('@id:footer-left') # Find element by id +first_col = foot.ele('css:>div') # Use the css selector to find the first matching element one level below +lnk = first_col.ele('text:command learning') # Use text content to find elements +text = lnk.text # Get element text +href = lnk.attr('href') # Get element attribute value -print(first_col) -print(text, href) +print(text, href,'\n') + +# Chained lookup in concise form +text = page('@id:footer-left')('css:>div')('text:command learning').text +print(text) ``` Output: ``` - -Learn Git Branching https://oschina.gitee.io/learn-git-branching/ +Git command learning https://oschina.gitee.io/learn-git-branching/ + +Git command learning ``` -# Install + + +## Download file + +```python +url ='https://www.baidu.com/img/flexible/logo/pc/result.png' +save_path = r'C:\download' +page.download(url, save_path) +``` + + + +# Installation *** ``` pip install DrissionPage ``` -Only python3.6 and above are supported. Driver mode currently only supports chrome. -To use the driver mode, you must download chrome and ** corresponding version ** of chromedriver. [chromedriver download](https://chromedriver.chromium.org/downloads) -Currently only tested in the Windows environment. +Only python 3.6 and above are supported, and the driver mode currently only supports chrome. +To use the driver mode, you must download chrome and the **corresponding version** of chromedriver. [chromedriver download](https://chromedriver.chromium.org/downloads) +It has only been tested in the Windows environment. 
# Instructions *** -## import +## Import module ```python from DrissionPage import * @@ -171,209 +305,378 @@ from DrissionPage import * ## Initialization -Before using selenium, you must configure the path of chrome.exe and chromedriver.exe and ensure that their versions match. - If you only use session mode, you can skip this section. -There are three ways to configure the path: +Before using selenium, you must configure the path of chrome.exe and chromedriver.exe and ensure that their versions match. +There are three ways to configure the path: - Write two paths to system variables. -- Pass in the path manually when using it. +- Manually pass in the path when in use. - Write the path to the ini file of this library (recommended). -If you choose the third method, please run these lines of code before using the library for the first time, and record these two paths in the ini file. +If you choose the third method, please run these lines of code before using this library for the first time and record these two paths in the ini file. ```python from DrissionPage.easy_set import set_paths -driver_path = 'D:\\chrome\\chromedriver.exe' # Your chromedriver.exe path, optional -chrome_path = 'D:\\chrome\\chrome.exe' # Your chrome.exe path, optional +driver_path ='D:\\chrome\\chromedriver.exe' # Your chromedriver.exe path, optional +chrome_path ='D:\\chrome\\chrome.exe' # Your chrome.exe path, optional set_paths(driver_path, chrome_path) ``` -This method also checks if the chrome and chromedriver versions match, and displays: +This method also checks whether the chrome and chromedriver versions match, and displays: ``` -版本匹配,可正常使用。 +The version matches and can be used normally. 
-or +# Or -出现异常: +Abnormal: Message: session not created: Chrome version must be between 70 and 73 (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19631 x86_64) -chromedriver下载网址:https://chromedriver.chromium.org/downloads +chromedriver download URL: https://chromedriver.chromium.org/downloads ``` -After the inspection is passed, the driver mode can be used normally. +After passing the check, you can use the driver mode normally. In addition to the above two paths, this method can also set the following paths: ```python -debugger_address # Opened browser address, eg. 127.0.0.1:9222 -download_path # Download path -global_tmp_path # Temporary folder path +debugger_address # Debug browser address, such as: 127.0.0.1:9222 +download_path # Download file path +global_tmp_path # Temporary folder path user_data_path # User data path -cache_path # Cache path +cache_path # cache path ``` -Tips: +Tips: -- Different projects may require different versions of chrome and chromedriver. You can also save multiple ini files to use as needed. -- It is recommended to use the green version of chrome, and manually set the path to avoid browser upgrades that do not match the chromedriver version. -- It is recommended to set debugger_address when debugging a project, and use a manually opened browser to debug, saving time and effort. +- Different projects may require different versions of chrome and chromedriver. You can also save multiple ini files and use them as needed. +- It is recommended to use the green version of chrome, and manually set the path, to avoid browser upgrades causing mismatch with the chromedriver version. +- It is recommended to set the debugger_address when debugging the project and use the manually opened browser to debug, saving time and effort. 
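Editor's note on the ini-file approach above: as a rough illustration of keeping paths in an ini file (and of keeping several such files for different projects), the sketch below stores and reloads two paths with the standard `configparser` module. The section name `paths`, the key names, and both helper functions are assumptions; the ini layout actually used by DrissionPage may differ.

```python
import configparser


def save_paths(ini_path: str, driver_path: str, chrome_path: str) -> None:
    """Write both paths into a [paths] section (hypothetical layout)."""
    # interpolation=None keeps characters such as '%' in paths literal
    config = configparser.ConfigParser(interpolation=None)
    config["paths"] = {
        "chromedriver_path": driver_path,
        "chrome_path": chrome_path,
    }
    with open(ini_path, "w", encoding="utf-8") as f:
        config.write(f)


def load_paths(ini_path: str) -> dict:
    """Read the [paths] section back as a plain dict."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(ini_path, encoding="utf-8")
    return dict(config["paths"])
```

One ini file per project could then be written with `save_paths` and selected at start-up by passing its location to `load_paths`.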
-## Create Drission Object +## Create the driver object Drission -Drission objects are used to manage driver and session objects.Drission objects are used to transmit drives when multiple pages work together, enabling multiple page classes to control the same browser or Session object. -It can be created by directly reading the configuration information of the ini file, or it can be passed in during initialization. +The creation step is not necessary. If you want to get started quickly, you can skip this section. The MixPage object will automatically create the object. + +Drission objects are used to manage driver and session objects. When multiple pages work together, the Drission object is used to pass the driver, so that multiple page classes can control the same browser or Session object. +A Drission object can be created by reading the configuration from the ini file, or the configuration can be passed in during initialization. ```python -# Created by default ini file -drission = Drission() +# Created from the default ini file +drission = Drission() # Created by other ini files -drission = Drission(ini_path = 'D:\\settings.ini') +drission = Drission(ini_path ='D:\\settings.ini') ``` To manually pass in the configuration: ```python -# Create with incoming configuration information (ignore ini file) +# Create with the incoming configuration information (ignore the ini file) from DrissionPage.config import DriverOptions -driver_options = DriverOptions() # Create driver configuration object -driver_options.binary_location = 'D:\\chrome\\chrome.exe' # chrome.exe path -session_options = {'headers': {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)'}} -driver_path = 'D:\\chrome\\chromedriver.exe' # driver_path path +driver_options = DriverOptions() # Create driver configuration object +driver_options.binary_location ='D:\\chrome\\chrome.exe' # chrome.exe path +session_options = {'headers': {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 
10_12_6)'}} +driver_path ='D:\\chrome\\chromedriver.exe' # driver_path path -drission = Drission(driver_options, session_options, driver_path) # Create object through incoming configuration +drission = Drission(driver_options, session_options, driver_path) # incoming configuration ``` -## Use MixPage objects +## Use page object MixPage -The MixPage page object encapsulates commonly used web page operations and implements the switch between driver and session mode. -MixPage must receive a Drission object and use its driver or session. If no one is sent, MixPage will create a Drission itself (Use configurations from the default INI file). +The MixPage page object encapsulates common web page operations and realizes the switch between driver and session modes. +MixPage must receive a Drission object and use the driver or session in it. If it is not passed in, MixPage will create a Drission by itself (using the configuration of the default ini file). -Tips: When multi-page objects work together, remember to manually create Drission objects and transfer them to page objects for use. Otherwise, page objects can create their own Drission objects, rendering the information impossible to transmit. +Tips: When multiple page objects work together, remember to manually create a Drission object and pass it to the page object for use. Otherwise, the page objects will each create their own Drission objects, making the information unable to pass. + +### Create Object + +There are three ways to create objects: simple, passing in Drission objects, and passing in configuration. Can be selected according to actual needs. 
```python -# Ways to create MixPage objects -page = MixPage() # Automatic creation of Drission objects is recommended only for single page objects -page = MixPage('s') # Quickly create in session mode, automatically create a Drission object +# Simple creation method, automatically create Drission objects with ini file default configuration +page = MixPage() +page = MixPage('s') -page = MixPage(drission) # Created by passing in a Drission object -page = MixPage(drission, mode='s', timeout=5) # session mode, waiting time 5 seconds (default 10 seconds) +# Create by passing in the Drission object +page = MixPage(drission) +page = MixPage(drission, mode='s', timeout=5) # session mode, waiting time is 5 seconds (default 10 seconds) -# Visit URL -page.get(url, **kwargs) -page.post(url, data, **kwargs) # Only session mode has post method.Call the post method will automatically switch to session mode. - -# Switch mode -page.change_mode() - -# Page operation -print(page.html) # Page source code -page.run_script(js) # Run js statement -page.close_other_tabs(num) # Close other tabs -page.to_iframe(iframe) # switch to iframe -page.screenshot(path) # Screenshot of the page -page.scrool_to_see(element) # Scroll until an element is visible -# See APIs for details... +# Create with incoming configuration information +page = MixPage(driver_options=DriverOption, session_options=SessionOption) # default d mode ``` -Tips:Calling a method that belongs only to the driver mode will automatically switch to the driver mode. + + +### visit website + +If there is an error in the connection, the program will automatically retry twice. The number of retries and the waiting interval can be specified. 
+ +```python +# Default mode +page.get(url) +page.post(url, data, **kwargs) # Only session mode has post method + +# Specify the number of retries and interval +page.get(url, retry=5, interval=0.5) +``` -## Find elements +### Switch mode -ele() returns the first eligible element, eles() returns a list of all eligible elements. -You can use these two functions under the page object or element object to find the subordinate elements. +Switch between s and d modes. Cookies and the URL being visited are synchronized automatically when switching. -Note: The element search timeout is 10 seconds by default, you can also set it as required. +```python +page.change_mode(go=False) # If go is False, the page does not jump to the url after switching +``` + + + +### Page properties + +```python +page.url # currently visited url +page.mode # current mode +page.drission # Drission object currently in use +page.driver # WebDriver object currently in use +page.session # Session object currently in use +page.cookies # Get cookies information +page.html # Page source code +page.title # Current page title + +# d mode only: +page.tabs_count # Return the number of tab pages +page.tab_handles # Return the handle list of all tabs +page.current_tab_num # Return the serial number of the current tab page +page.current_tab_handle # Return the handle of the current tab page +``` + + + +### Page operation + +When calling a method that only belongs to d mode, it will automatically switch to d mode. See APIs for detailed usage. 
+ +```python +page.change_mode() # switch mode +page.cookies_to_session() # Copy cookies from WebDriver object to Session object +page.cookies_to_driver() # Copy cookies from Session object to WebDriver object +page.get(url, retry, interval, **kwargs) # Use get to access the web page, you can specify the number of retries and the interval +page.ele(loc_or_ele, timeout) # Get the first element, node or attribute that meets the conditions +page.eles(loc_or_ele, timeout) # Get all eligible elements, nodes or attributes +page.download(url, save_path, rename, file_exists, **kwargs) # download file +page.close_driver() # Close the WebDriver object +page.close_session() # Close the Session object + +# s mode only: +page.post(url, data, retry, interval, **kwargs) # Access the web page with post, you can specify the number of retries and the interval + +# d mode only: +page.wait_ele(loc_or_ele, mode, timeout) # Wait for the element to be deleted from the dom, displayed, or hidden +page.run_script(js, *args) # Run js statement +page.create_tab(url) # Create a tab page at the end and switch to it +page.to_tab(num_or_handle) # Jump to tab page +page.close_current_tab() # Close the current tab page +page.close_other_tabs(num) # Close other tabs +page.to_iframe(iframe) # Switch into an iframe +page.screenshot(path) # Page screenshot +page.scrool_to_see(element) # Scroll until an element is visible +page.scroll_to(mode, pixel) # Scroll the page as indicated by the parameter; available directions: 'top', 'bottom', 'rightmost', 'leftmost', 'up', 'down', 'left', 'right' +page.refresh() # refresh the current page +page.back() # Browser back +page.set_window_size(x, y) # Set the browser window size, maximize by default +page.check_page() # Check whether the page meets expectations +page.chrome_downloading() # Get the list of files that chrome is downloading +page.process_alert(mode, text) # Process the prompt box +``` + + + +## Find element + +ele() returns the first 
eligible element, and eles() returns a list of all eligible elements. +You can use these two functions under the page object or element object to find subordinate elements. + +page.eles() and element.eles() search and return a list of all elements that meet the conditions. + +Note: The default element search timeout is 10 seconds, you can also set it as needed. ```python # Find by attribute -page.ele('@id:ele_id', timeout = 2) # Find the element with id ele_id and set the waiting time to 2 seconds -page.eles('@class') # Find all elements with ele_class -page.eles('@class:class_name') # Find all elements with class equal to ele_class +page.ele('@id:ele_id', timeout = 2) # Find the element whose id is ele_id and set the waiting time for 2 seconds +page.eles('@class') # Find all elements with class attribute +page.eles('@class:class_name') # Find all elements that have ele_class in class +page.eles('@class=class_name') # Find all elements whose class is equal to ele_class -# Search by tag name -page.ele('tag:li') # Find the first li element -page.eles('tag:li') # Find all li elements +# Find by tag name +page.ele('tag:li') # Find the first li element +page.eles('tag:li') # Find all li elements -# Search by tag name and attributes -page.ele('tag:div@class=div_class') # Find the first div element whose class is div_class -page.ele('tag:div@class:ele_class') # Find the div element with ele_class in class -page.ele('tag:div@class=ele_class') # Find div elements with class equal to ele_class -page.ele('tag:div@text():search_text') # Find the div element whose text contains search_text -page.ele('tag:div@text()=search_text') # Find div elements with text equal to search_text +# Find according to tag name and attributes +page.ele('tag:div@class=div_class') # Find the div element whose class is div_class +page.ele('tag:div@class:ele_class') # Find div elements whose class contains ele_class +page.ele('tag:div@class=ele_class') # Find div elements whose class is equal to 
ele_class +page.ele('tag:div@text():search_text') # Find div elements whose text contains search_text +page.ele('tag:div@text()=search_text') # Find the div element whose text is equal to search_text -# Find by text -page.ele('search text') # Find elements containing incoming text -page.eles('text:search text') # If the text starts with @, tag :, css :, xpath :, text :, add text: in front to avoid conflicts -page.eles('text=search text') # Elements with text equal to search_text +# Find according to text content +page.ele('search text') # find the element containing the incoming text +page.eles('text:search text') # If the text starts with @, tag:, css:, xpath:, text:, add text: in front to avoid conflicts +page.eles('text=search text') # The text is equal to the element of search_text -# Find by xpath or css selector -page.eles('xpath://div[@class="ele_class"]') -page.eles('css:div.ele_class') +# Find according to xpath or css selector +page.eles('xpath://div[@class="ele_class"]') +page.eles('css:div.ele_class') -# Find by loc -loc1 = By.ID, 'ele_id' -loc2 = By.XPATH, '//div[@class="ele_class"]' +# Find according to loc +loc1 = By.ID,'ele_id' +loc2 = By.XPATH,'//div[@class="ele_class"]' page.ele(loc1) page.ele(loc2) -# Find subordinate elements +# Find lower- level elements element = page.ele('@id:ele_id') -element.ele('@class:class_name') # Find the first element whose class is ele_class at the lower level of element -element.eles('tag:li') # Find all li elements below ele_id +element.ele('@class:class_name') # Find the first element whose class is ele_class at the lower level of element +element.eles('tag:li') # find all li elements under ele_id # Find by location -element.parent # Parent element -elementnext # Next sibling element -element.prev # Previous brother element +element.parent # parent element +element.next # next sibling element +element.prev # previous sibling element -# Tandem search +# Get shadow- dom, only support open shadow- root +ele1 = 
element.shadow_root.ele('tag:div') + +# Chain search page.ele('@id:ele_id').ele('tag:div').next.ele('some text').eles('tag:a') + +# Simplified writing +ele1 = page('@id:ele_id')('@class:class_name') +ele2 = ele1('tag:li') ``` -## Element operations +## Get element attributes ```python -# Get element information -element = page.ele('@id:ele_id') -element.html # Return html inside element -element.text # Returns the text value after removing the html tag in the element -element.tag # Return element tag name -element.attrs # Returns a dictionary of all attributes of the element -element.attr('class') # Returns the element's class attribute -element.is_valid # Driver mode only, used to determine whether the element is still available +element.html # return element outerHTML +element.inner_html # Return element innerHTML +element.tag # return element tag name +element.text # return element innerText value +element.texts() # Returns the text of all direct child nodes in the element, including elements and text nodes, you can specify to return only text nodes +element.attrs # Return a dictionary of all attributes of the element +element.attr(attr) # Return the value of the specified attribute of the element +element.css_path # Return the absolute css path of the element +element.xpath # Return the absolute xpath path of the element +element.parent # return element parent element +element.next # Return the next sibling element of the element +element.prev # Return the previous sibling element of the element +element.parents(num) # Return the numth parent element +element.nexts(num, mode) # Return the following elements or nodes +element.prevs(num, mode) # Return the first few elements or nodes +element.ele(loc_or_str, timeout) # Return the first sub- element, attribute or node text of the current element that meets the conditions +element.eles(loc_or_str, timeout) # Return all eligible sub- elements, attributes or node texts of the current element -# Operating element 
-element.click() # Click element -element.input(text) # Enter text -element.run_script(js) # Run js -element.submit() # submit Form -element.clear() # Clear element -element.is_selected() # Is selected -element.is_enabled() # it's usable or not -element.is_displayed() # Is it visible -element.is_valid() # Whether it is valid, used to judge the situation where the page jump causes the element to fail -element.select(text) # Select the drop-down list option -element.set_attr(attr,value) # Set element attributes -element.size # Returns the element size -element.location # Returns the element position +# Driver mode unique: +element.before # Get pseudo element before content +element.after # Get pseudo element after content +element.is_valid # Used to determine whether the element is still in dom +element.size # Get element size +element.location # Get element location +element.shadow_root # Get the ShadowRoot element under the element +element.get_style_property(style, pseudo_ele) # Get element style attribute value, can get pseudo element +element.is_selected() # Returns whether the element is selected +element.is_enabled() # Returns whether the element is available +element.is_displayed() # Returns whether the element is visible ``` -## Chrome shortcut settings +## Element operation + +Element operation is unique to d mode. Calling the following method will automatically switch to d mode. 
+ +```python +element.click(by_js) # Click the element; by_js chooses whether to click with js +element.input(value) # Input text +element.run_script(js) # Run JavaScript script on the element +element.submit() # Submit +element.clear() # Clear the element +element.screenshot(path, filename) # Take a screenshot of the element +element.select(text) # Select the drop-down list option based on the text +element.set_attr(attr, value) # Set element attribute value +element.drag(x, y, speed, shake) # Drag the element by a relative distance; you can set the speed and whether to shake randomly +element.drag_to(ele_or_loc, speed, shake) # Drag the element to another element or a certain coordinate; you can set the speed and whether to shake randomly +element.hover() # Hover the mouse over the element +``` + + + +## Docking with selenium code + +DrissionPage code can be seamlessly combined with selenium code: it can take an existing selenium WebDriver object directly, or hand its own WebDriver over to selenium code. This makes migrating existing projects very convenient. + +### selenium to DrissionPage + +```python +driver = webdriver.Chrome() +driver.get('https://www.baidu.com') + +page = MixPage(Drission(driver)) # Pass the driver to Drission, create a MixPage object +print(page.title) # Prints the page title: 百度一下,你就知道 +``` + + + +### DrissionPage to selenium + +```python +page = MixPage() +page.get('https://www.baidu.com') + +driver = page.driver # Get the WebDriver object from the MixPage object +print(driver.title) # Prints the page title: 百度一下,你就知道 +``` + + + +## Download file + +Selenium lacks effective management of browser downloads: it is difficult to detect download status, rename files, or handle failures. +Using requests to download files handles these better, but the code is more cumbersome. +Therefore, DrissionPage encapsulates the download method and integrates the advantages of the two.
You can obtain login information from selenium and download with requests. +This makes up for the shortcomings of selenium and keeps downloads simple and efficient. + +### Features + +- Specify the download path +- Rename the file without filling in the extension; the program adds it automatically +- When a file with the same name exists, you can choose to rename, overwrite, skip, etc. +- Show download progress +- Support post method +- Support custom connection parameters + +### Demo + +```python +url = 'https://www.baidu.com/img/flexible/logo/pc/result.png' # file url +save_path = r'C:\download' # save path + +# Rename to img.png, automatically add a serial number to the end of the file name when there is a duplicate name, and display the download progress +page.download(url, save_path, 'img', 'rename', show_msg=True) +``` + + + + +## Chrome Quick Settings The configuration of chrome is very cumbersome. In order to simplify the use, this library provides setting methods for common configurations. @@ -382,153 +685,159 @@ The configuration of chrome is very cumbersome.
In order to simplify the use, th The DriverOptions object inherits from the Options object of selenium.webdriver.chrome.options, and the following methods are added to it: ```python -remove_argument(value) # Delete an argument value -remove_experimental_option(key) # Delete an experimental_option setting -remove_all_extensions() # Delete all plugins -save() # Save the configuration to the default ini file -save('D:\\settings.ini') # Save to other path -set_argument(arg, value) # Set argument attribute -set_headless(on_off) # Set whether to use interfaceless mode -set_no_imgs(on_off) # Set whether to load pictures -set_no_js(on_off) # Set whether to disable js -set_mute(on_off) # Set whether to mute -set_user_agent(user_agent) # Set user agent -set_proxy(proxy) # Set proxy address -set_paths(driver_path, chrome_path, debugger_address, download_path, user_data_path, cache_path) # Set browser-related paths +remove_argument(value) # delete an argument value +remove_experimental_option(key) # delete an experimental_option setting +remove_all_extensions() # Remove all plugins +save() # Save the configuration to the default ini file +save('D:\\settings.ini') # save to other path +set_argument(arg, value) # set argument attribute +set_headless(on_off) # Set whether to use no interface mode +set_no_imgs(on_off) # Set whether to load images +set_no_js(on_off) # Set whether to disable js +set_mute(on_off) # Set whether to mute +set_user_agent(user_agent) # set user agent +set_proxy(proxy) # set proxy address +set_paths(driver_path, chrome_path, debugger_address, download_path, user_data_path, cache_path) # Set browser- related paths ``` + + ### Instructions ```python -do = DriverOptions(read_file=False) # Create chrome configuration object, not read from ini file -do.set_headless(False) # Show browser interface -do.set_no_imgs(True) # Don't load pictures -do.set_paths(driver_path='D:\\chromedriver.exe', chrome_path='D:\\chrome.exe') # Set paths 
-do.set_headless(False).set_no_imgs(True) # Support chain operation +do = DriverOptions(read_file=False) # Create chrome configuration object, not read from ini file +do.set_headless(False) # show the browser interface +do.set_no_imgs(True) # Do not load pictures +do.set_paths(driver_path='D:\\chromedriver.exe', chrome_path='D:\\chrome.exe') # set path +do.set_headless(False).set_no_imgs(True) # Support chain operation -drission = Drission(driver_options=do) # Create Drission object with configuration object -page = MixPage(drission) # Create a MixPage object with a Drission object +drission = Drission(driver_options=do) # Create Drission object with configuration object +page = MixPage(drission) # Create a MixPage object with Drission object -do.save() # Save the configuration to the default ini file +do.save() # Save the configuration to the default ini file ``` ## Save configuration -Because chrome and headers have many configurations, an ini file is set up to save commonly used configurations. You can use the OptionsManager object to get and save the configuration, and use the DriverOptions object to modify the chrome configuration. You can also save multiple ini files and call them according to different projects. +Because there are many configurations of chrome and headers, an ini file is set up specifically to save common configurations. You can use the OptionsManager object to get and save the configuration, and use the DriverOptions object to modify the chrome configuration. You can also save multiple ini files and call them according to different projects. -Tips:It is recommended to save common configuration files to another path to prevent the configuration from being reset when the library is upgraded. +Tips: It is recommended to save the commonly used configuration files to another path to prevent the configuration from being reset when the library is upgraded. 
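The tip about keeping configuration outside the library folder can be sketched with the standard library alone. This is a minimal illustration, not part of DrissionPage: `ensure_project_ini` is a hypothetical helper, and both paths are assumptions supplied by the caller. It copies the default ini to a project path once and then leaves the copy alone, so later library upgrades cannot reset it.

```python
import shutil
from pathlib import Path


def ensure_project_ini(default_ini: Path, project_ini: Path) -> Path:
    """Hypothetical helper: copy the default ini to a project path once.

    If the project copy already exists it is left untouched, so edits
    made to it survive an upgrade that resets the default file.
    """
    if not project_ini.exists():
        # Create the project folder if needed, then take a one-time copy.
        project_ini.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(default_ini, project_ini)
    return project_ini
```

The returned path is the kind of value this README passes to `OptionsManager('D:\\settings.ini')` or `Drission(ini_path='D:\\settings.ini')`.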
-### ini file +### ini file content -The ini file has three parts by default: paths, chrome_options, and session_options. The initial contents are as follows. +The ini file has three parts by default: paths, chrome_options, and session_options. The initial content is as follows. ```ini [paths] ; chromedriver.exe path chromedriver_path = -; Temporary folder path, used to save screenshots, download files, etc. +; Temporary folder path, used to save screenshots, file downloads, etc. global_tmp_path = [chrome_options] -; The opened browser address and port, such as 127.0.0.1:9222 +; The address and port of the opened browser, such as 127.0.0.1:9222 debugger_address = ; chrome.exe path binary_location = ; Configuration information arguments = [ ; Hide browser window - '--headless', + '--headless', ; Mute - '--mute-audio', + '--mute-audio', ; No sandbox - '--no-sandbox', - ; Google documentation mentions the need to add this attribute to avoid bugs - '--disable-gpu', - ; ignore errors - 'ignore-certificate-errors', - ; Hidden message bar - '--disable-infobars' + '--no-sandbox', + ; Google documentation mentions that this attribute needs to be added to avoid bugs + '--disable-gpu', + ; Ignore certificate errors + 'ignore-certificate-errors', + ; Do not display the information bar + '--disable-infobars' ] ; Plugin extensions = [] ; Experimental configuration experimental_options = { 'prefs': { - ; Download without pop-up + ; Download without a pop-up 'profile.default_content_settings.popups': 0, - ; No pop-up window + ; No notification pop-up 'profile.default_content_setting_values': {'notifications': 2}, ; Disable PDF plugin 'plugins.plugins_list': [{"enabled": False, "name": "Chrome PDF Viewer"}] }, - ; Set to developer mode, anti-anti-reptile (useless) - 'excludeSwitches': ["enable-automation"], + ; Set to developer mode, to counter anti-crawler detection + 'excludeSwitches': ["enable-automation"], 'useAutomationExtension': False } [session_options] headers = { - "User-Agent": "Mozilla/5.0
(Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8", + "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", - "Connection": "keep-alive", - "Accept-Charset": "utf-8;q=0.7,*;q=0.7" + "Connection": "keep-alive", + "Accept-Charset": "utf-8;q=0.7,*;q=0.7" } ``` -### OptionsManager object -The OptionsManager object is used to read, set, and save configurations. + +### OptionsManager Object + +The OptionsManager object is used to read, set and save the configuration. ```python -get_value(section, item) -> str # Get the value of a configuration -get_option(section) -> dict # Return all configuration properties in dictionary format -set_item(section, item, value) # Set configuration properties -save() # Save configuration to default ini file -save('D:\\settings.ini') # Save to other path +get_value(section, item) -> str # Get the value of a configuration +get_option(section) -> dict # Return all attributes of configuration in dictionary format +set_item(section, item, value) # Set configuration attributes +save() # Save the configuration to the default ini file +save('D:\\settings.ini') # Save to other path ``` + + ### Usage example ```python from DrissionPage.configs import * -options_manager = OptionsManager() # Create OptionsManager object from default ini file -options_manager = OptionsManager('D:\\settings.ini') # Create OptionsManager object from other ini files -driver_path = options_manager.get_value('paths', 'chromedriver_path') # Read path information -options_manager.save() # Save to the default ini file -options_manager.save('D:\\settings.ini') # Save to other path +options_manager = OptionsManager() # Create OptionsManager object from the default ini file +options_manager = OptionsManager('D:\\settings.ini') # Create OptionsManager object from
other ini files +driver_path = options_manager.get_value('paths', 'chromedriver_path') # Read path information +options_manager.save() # Save to the default ini file +options_manager.save('D:\\settings.ini') # Save to other path -drission = Drission(ini_path = 'D:\\settings.ini') # Use other ini files to create objects +drission = Drission(ini_path='D:\\settings.ini') # Use other ini files to create objects ``` -**Note** : If you do not pass in the path when saving, it will be saved to the ini file in the module directory, even if you are not reading the default ini file. +**Note**: If you do not pass in the path when saving, it will be saved to the ini file in the module directory, even if the file that was read is not the default ini file. -## easy_set methods +## easy_set method - Calling the easy_set method will modify the content of the default ini file. +Calling the easy_set method will modify the content of the default ini file. ```python -set_headless(True) # Set headless mode -set_no_imgs(True) # Set no-PIC mode -set_no_js(True) # Disable JavaScript -set_mute(True) # Silent mode -set_user_agent('Mozilla/5.0 (Macintosh; Int......') # set user agent -set_proxy('127.0.0.1:8888') # set proxy -set_paths(paths) # See the Initialization section -set_argument(arg, on_off) # Set the property. If the property has no value (e.g. 'zh_CN.utf-8'), the value is bool representing the switch. If value is "" or False, delete the attribute entry +set_headless(True) # Turn on headless mode +set_no_imgs(True) # Turn on no image mode +set_no_js(True) # Disable JS +set_mute(True) # Turn on mute mode +set_user_agent('Mozilla/5.0 (Macintosh; Int......') # Set user agent +set_proxy('127.0.0.1:8888') # Set proxy +set_paths(paths) # See [Initialization] section +set_argument(arg, value) # Set the attribute. If the attribute has no value (such as 'zh_CN.UTF-8'), the value is a bool that switches it on or off; otherwise, the value is a str.
When the value is '' or False, the attribute item is deleted ``` # POM mode *** -MixPage encapsulates common page operations and can be easily used for expansion. +MixPage encapsulates common page operations and can be easily used for extension. -Example: Expand a list page reading class. +Example: extend a list page reading class ```python import re @@ -536,1261 +845,1703 @@ from time import sleep from DrissionPage import * class ListPage(MixPage): - """This class encapsulates the method of reading the list page, according to the necessary 4 elements, can read the homogeneous list page""" + """This class encapsulates reading a list page. Given the 4 required elements, any list page with the same structure can be read.""" def __init__(self, drission: Drission, url: str = None, **xpaths): super().__init__(drission) self._url = url - self.xpath_cloumn_name = xpaths['cloumn_name'] # [xpath str, re str] - self.xpath_next_btn = xpaths['next_btn'] - self.xpath_rows = xpaths['rows'] - self.xpath_total_pages = xpaths['total_pages'] # [xpath str, re str] - self.total_pages = self.get_total_pages() + self.xpath_column_name = xpaths['column_name'] # [xpath string, regular expression] + self.xpath_next_page = xpaths['next_page'] + self.xpath_rows = xpaths['rows'] + self.xpath_page_count = xpaths['page_count'] # [xpath string, regular expression] + self.total_pages = self.get_total_pages() if url: self.get(url) - def get_cloumn_name(self) -> str: - if self.xpath_cloumn_name[1]: - s = self.ele(f'xpath:{self.xpath_cloumn_name[0]}').text - r = re.search(self.xpath_cloumn_name[1], s) + def get_column_name(self) -> str: + if self.xpath_column_name[1]: + s = self.ele(f'xpath:{self.xpath_column_name[0]}').text + r = re.search(self.xpath_column_name[1], s) return r.group(1) else: - return self.ele(f'xpath:{self.xpath_cloumn_name[0]}').text + return self.ele(f'xpath:{self.xpath_column_name[0]}').text - def get_total_pages(self) -> int:
- if self.xpath_total_pages[1]: - s = self.ele(f'xpath:{self.xpath_total_pages[0]}').text - r = re.search(self.xpath_total_pages[1], s) + def get_total_pages(self) -> int: + if self.xpath_page_count[1]: + s = self.ele(f'xpath:{self.xpath_page_count[0]}').text + r = re.search(self.xpath_page_count[1], s) return int(r.group(1)) else: - return int(self.ele(f'xpath:{self.xpath_total_pages[0]}').text) + return int(self.ele(f'xpath:{self.xpath_page_count[0]}').text) - def click_next_btn(self, wait: float = None): - self.ele(f'xpath:{self.xpath_next_btn}').click() + def click_next_page(self, wait: float = None): + self.ele(f'xpath:{self.xpath_next_page}').click() if wait: sleep(wait) - def get_current_page_list(self, content_to_fetch: list) -> list: + def get_current_page_list(self, content_to_fetch: list) -> list: """ - content_to_fetch:[[xpath1,para1],[xpath2,para2]...] - output list:[[para1,para2...],[para1,para2...]...] + Format of content_to_fetch: [[xpath1,parameter1],[xpath2,parameter2]...] + Return list format: [[parameter1,parameter2...],[parameter1,parameter2...]...]
""" - result_list = [] - rows = self.eles(f'xpath:{self.xpath_rows}') - for row in rows: - row_result = [] - for j in content_to_fetch: - row_result.append(row.ele(f'xpath:{j[0]}').attr(j[1])) - result_list.append(row_result) - print(row_result) - return result_list + result_list = [] + rows = self.eles(f'xpath:{self.xpath_rows}') + for row in rows: + row_result = [] + for j in content_to_fetch: + row_result.append(row.ele(f'xpath:{j[0]}').attr(j[1])) + result_list.append(row_result) + print(row_result) + return result_list - def get_list(self, content_to_fetch: list, wait: float = None) -> list: - current_list = self.get_current_page_list(content_to_fetch) - for _ in range(self.total_pages - 1): - self.click_next_btn(wait) - current_list.extend(self.get_current_page_list(content_to_fetch)) - return current_list + def get_list(self, content_to_fetch: list, wait: float = None) -> list: + current_list = self.get_current_page_list(content_to_fetch) + for _ in range(self.total_pages - 1): + self.click_next_page(wait) + current_list.extend(self.get_current_page_list(content_to_fetch)) + return current_list ``` -# Others +# Other *** ## DriverPage and SessionPage -If there is no need to switch modes, only DriverPage or SessionPage can be used as required, the usage is consistent with MixPage. +If you don't need to switch modes, you can use just DriverPage or SessionPage as needed; the usage is the same as MixPage.
```python from DrissionPage.session_page import SessionPage from DrissionPage.drission import Drission session = Drission().session -page = SessionPage(session) # Pass in Session object +page = SessionPage(session) # Pass in Session object page.get('http://www.baidu.com') -print(page.ele('@id:su').text) # Output:百度一下 +print(page.ele('@id:su').text) # Output: Baidu driver = Drission().driver -page = DriverPage(driver) # Pass in Driver object +page = DriverPage(driver) # Pass in Driver object page.get('http://www.baidu.com') -print(page.ele('@id:su').text) # Output:百度一下 +print(page.ele('@id:su').text) # Output: Baidu ``` # APIs *** -## Drission class +## Drission Class -​ class **Drission**(driver_options: Union[dict, Options] = None, session_options: dict = None, ini_path = None, proxy: dict = None) +### class Drission() -​ Used to manage driver and session objects. - -​ Parameter Description: - -- driver_options - Chrome configuration parameters, can receive Options object or dictionary -- session_options - session configuration parameters, receive dictionary -- ini_path - ini file path, the default is the ini file in the DrissionPage folder - -### session - -​ Returns the HTMLSession object, which is created automatically when called. - -### driver - -​ Obtain the WebDriver object, which is automatically created when it is called and initialized according to the incoming configuration or ini file configuration. - -### driver_options - -​ Return driver configuration in dictionary format. - -### session_options - -​ Return session configuration in dictionary format. - -### proxy - -​ Return proxy configuration in dictionary format. - -### cookies_to_session() - -​ cookies_to_session(copy_user_agent: bool = False, driver: WebDriver = None, session: Session = None) -> None - -​ Copy cookies from driver to session. By default, self.driver is copied to self.session, and driver and session can also be received for operation. 
- -​ Parameter Description: - -- copy_user_agent - Whether to copy user_agent to session -- driver - WebDriver object, copy cookies -- session - Session object, receiving cookies - -### cookies_to_driver() - -​ cookies_to_driver(url: str, driver: WebDriver = None, session: Session = None) -> None - -​ Copy cookies from session to driver. By default, self.session is copied to self.driver, and driver and session can also be received for operation. Need to specify url or domain name. - -​ Parameter Description: - -- url - cookies domain -- driver - WebDriver object, receiving cookies -- session - Session object, copy cookies - -### user_agent_to_session() - -​ user_agent_to_session(driver: WebDriver = None, session: Session = None) -> None - -​ Copy the user agent from the driver to the session. By default, self.driver is copied to self.session, and driver and session can also be received for operation. - -​ Parameter Description: - -- driver - WebDriver object, copy user agent -- session - Session object, receiving user agent - -### close_driver() - -​ close_driver() -> None - -​ Close the browser and set the driver to None. - -### close_session() - -​ close_session() -> None - -​ Close the session and set it to None. - -### close() - -​ close() -> None - -​ Close the driver and session. - - - -## MixPage class - -class **MixPage**(drission: Union[Drission, str] = None, mode:str = 'd', timeout: float = 10) - -MixPage encapsulates common functions for page operations and can seamlessly switch between driver and session modes. Cookies are automatically synchronized when switching. -The function of obtaining information is common to the two modes, and the function of operating page elements is only available in the d mode. Calling a function unique to a certain mode will automatically switch to that mode. -It inherits from DriverPage and SessionPage classes. These functions are implemented by these two classes. MixPage exists as a scheduling role. 
+The Drission class is used to manage WebDriver objects and Session objects, and acts as the driver. Parameter Description: -- drission - Drission objects, if not transmitted will create one. Quickly configure the corresponding mode when passing in's' or'd' -- mode - Mode, optional 'd' or 's', default is 'd' -- timeout - Timeout time, driver mode search element time and session mode connection time +- driver_or_options: [WebDriver, dict, Options] - WebDriver object or chrome configuration parameters +- session_or_options: [Session, dict] - Session object or session configuration parameters +- ini_path: str - ini file path, the default is the ini file under the DrissionPage folder +- proxy: dict - proxy settings -### url - Returns the currently visited URL. - -### mode - - Returns the current mode ('s' or 'd'). - -### drission - - Returns the currently used Dirssion object. - -### driver - - Returns the driver object, if not created, it will switch to driver mode when called. ### session - Returns the session object, if not created. +Return the Session object, which is automatically initialized according to the configuration information. + +Returns: Session - the managed Session object + + + +### driver + +Return the WebDriver object, which is automatically initialized according to the configuration information. + +Returns: WebDriver - the managed WebDriver object + + + +### driver_options + +Return or set the driver configuration. + +Returns: dict + + + +### session_options + +Return the session configuration. + +Returns: dict + + + +### session_options() + +Set the session configuration. + +Returns: None + + + +### proxy + +Return the proxy configuration. + +Returns: dict + + + +### cookies_to_session() + +Copy the cookies of the driver object to the session object.
+ +Parameter Description: + +- copy_user_agent: bool - whether to copy user_agent to session +- driver: WebDriver- Copy the WebDriver object of cookies +- session: Session- Session object that receives cookies + +Returns: None + + + +### cookies_to_driver() + +Copy cookies from session to driver. + +Parameter Description: + +- url: str - the domain of cookies +- driver: WebDriver- WebDriver object that receives cookies +- session: Session- Copy the Session object of cookies + +Returns: None + + + +### user_agent_to_session() + +Copy the user agent from the driver to the session. + +Parameter Description: + +- driver: WebDriver- WebDriver object, copy user agent +- session: Session- Session object, receiving user agent + +Returns: None + + + +### close_driver() + +Close the browser and set the driver to None. + +Returns: None + + + +### close_session() + +Close the session and set it to None. + +Returns: None + + + +### close() + +Close the driver and session. + +Returns: None + + + +## MixPage Class + +### class MixPage() + +MixPage encapsulates the common functions of page operation and can seamlessly switch between driver and session modes. Cookies are automatically synchronized when switching. +The function of obtaining information is shared by the two modes, and the function of operating page elements is only available in mode d. Calling a function unique to a certain mode will automatically switch to that mode. +It inherits from DriverPage and SessionPage classes, these functions are implemented by these two classes, and MixPage exists as a scheduling role. + +Parameter Description: + +- drission: Drission - Drission object, if not passed in, create one. Quickly configure the corresponding mode when's' or'd' is passed in +- mode: str - mode, optional'd' or's', default is'd' +- timeout: float - timeout, driver mode is the time to find elements, session mode is the connection waiting time + + + +### url + +Returns the URL currently visited by the MixPage object. 
+ +Returns: str + + + +### mode + +Returns the current mode ('s' or 'd'). + +Returns: str + + + +### drission + +Returns the Drission object currently in use. + +Returns: Drission + + + +### driver + +Return the driver object, creating it if it does not exist, and switch to driver mode when called. + +Returns: WebDriver + + + +### session + +Return the session object, creating it if it does not exist. + +Returns: Session + + ### response - Return the Response object and switch to session mode when calling. +Return the Response object obtained in s mode, and switch to s mode when called. + +Returns: Response + + ### cookies - Returns cookies, obtained from the current mode. +Return cookies, obtained from the current mode. + +Returns: [dict, list] + + ### html - Return the page html text. +Return the html text of the page. + +Returns: str + + ### title - Return to the page title text. +Return the page title. + +Returns: str + + + +### url_available + +Returns whether the current url is valid. + +Returns: bool + + ### change_mode() - change_mode(mode: str = None, go: bool = True) -> None +Switch mode, 'd' or 's'. When switching, the cookies of the current mode will be copied to the target mode. - Switch mode, you can specify the target mode, if the target mode is consistent with the current mode, then directly return. +Parameter Description: - Parameter Description: +- mode: str - Specify the target mode, 'd' or 's'. +- go: bool - whether to jump to the current url after switching mode -- mode - Specify the target mode, 'd' or 's'. -- go - Whether to jump to the current url after switching modes +Returns: None -### get() - get(url: str, go_anyway=False, **kwargs) -> Union[bool, None] - - Jump to a url, sync cookies before jumping, and return whether the target url is available after jumping. - - Parameter Description: - -- url - Target url -- go_anyway - Whether to force a jump. If the target url is the same as the current url, the default is not to jump.
-- kwargs - Used to access parameters when in session mode. ### ele() -​ ele(loc_or_ele: Union[tuple, str, DriverElement, SessionElement], mode: str = None, timeout: float = None, show_errmsg: bool = False) -> Union[DriverElement, SessionElement] +Return the eligible elements on the page, the first one is returned by default. +If the query parameter is a string, the options of'@attribute name:','tag:','text:','css:', and'xpath:' are available. When there is no control mode, the text mode is used to search by default. +If it is loc, query directly according to the content. -​ Get elements according to query parameters and return elements or element lists. -​ If the query parameter is a string, you can select the '@property name:', 'tag:', 'text:', 'css:', 'xpath:' method. When there is no control mode, it is searched by text mode by default. -​ If it is loc, query directly according to the content. +Parameter Description: -​ Parameter Description: +- loc_or_str: [Tuple[str, str], str, DriverElement, SessionElement, WebElement] - The positioning information of the element, which can be an element object, a loc tuple, or a query string +- mode: str - 'single' or'all', corresponding to find one or all +- timeout: float - Find the timeout of the element, valid in driver mode -- loc_or_str - Query condition parameters, if an element object is passed in, return directly -- mode - Find one or more, pass in 'single' or 'all' -- timeout - Search element timeout time, valid in driver mode -- show_errmsg - Whether to throw and display when an exception occurs +Example: -​ Examples: - -- When the element object is received: Return the element object object +- When the element object is received: return the element object object - Find with loc tuple: - - ele.ele((By.CLASS_NAME, 'ele_class')) - Return the first element whose class is ele_class in children + - ele.ele((By.CLASS_NAME,'ele_class')) - returns the first child element whose class is ele_class -- Find with query 
string: +- Find with query string: Attributes, tag name and attributes, text, xpath, css selector. Among them, @ means attribute, = means exact match,: means fuzzy match, the string is searched by default when there is no control string. - - page.ele('@class:ele_class') - Return the first class element containing ele_class - - page.ele('@name=ele_name') - Return the first element whose name is equal to ele_name - - page.ele('@placeholder') - Return the first element with placeholder attribute - - page.ele('tag:p') - Return the first p element - - page.ele('tag:div@class:ele_class') - Return the first class div element with ele_class - - page.ele('tag:div@class=ele_class') - Return the first div element whose class is equal to ele_class - - page.ele('tag:div@text():some_text') - Returns the first div element whose text contains some_text - - page.ele('tag:div@text()=some_text') - Returns the first div element whose text is equal to some_text - - page.ele('text:some_text') - Returns the first element whose text contains some_text - - page.ele('some_text') - Return the first text element containing some_text (equivalent to the previous line) - - page.ele('text=some_text') - Returns the first element whose text is equal to some_text - - page.ele('xpath://div[@class="ele_class"]') - Return the first element that matches the xpath - - page.ele('css:div.ele_class') - Return the first element that matches the css selector + - page.ele('@class:ele_class') - returns the element with ele_class in the first class + - page.ele('@name=ele_name') - returns the first element whose name is equal to ele_name + - page.ele('@placeholder') - returns the first element with placeholder attribute + - page.ele('tag:p') - return the first p element + - page.ele('tag:div@class:ele_class') - returns the first class div element with ele_class + - page.ele('tag:div@class=ele_class') - returns the first div element whose class is equal to ele_class + - page.ele('tag:div@text():some_text') - 
returns the first div element whose text contains some_text + - page.ele('tag:div@text()=some_text') - returns the first div element whose text is equal to some_text + - page.ele('text:some_text') - returns the first element whose text contains some_text + - page.ele('some_text') - returns the first text element containing some_text (equivalent to the previous line) + - page.ele('text=some_text') - returns the first element whose text is equal to some_text + - page.ele('xpath://div[@class="ele_class"]') - return the first element that matches xpath + - page.ele('css:div.ele_class') - returns the first element that matches the css selector + +Returns: [DriverElement, SessionElement, str] - element object or attribute, text node text + + ### eles() -​ eles(loc_or_str: Union[tuple, str], timeout: float = None, show_errmsg: bool = False) -> List[DriverElement] +Get the list of elements that meet the conditions according to the query parameters. The query parameter usage method is the same as the ele method. -​ Obtain a list of elements that meet the criteria based on query parameters. The query parameter usage method is the same as the ele method. +Parameter Description: + +- loc_or_str: [Tuple[str, str], str] - query condition parameter +- timeout: float - Find the timeout of the element, valid in driver mode + +Returns: [List[DriverElement or str], List[SessionElement or str]] - a list of element objects or attributes and text node text -​ Parameter Description: -- loc_or_str - Query condition parameters -- timeout - Search element timeout time, valid in driver mode -- show_errmsg - Whether to throw and display when an exception occurs ### cookies_to_session() -​ cookies_to_session(copy_user_agent: bool = False) -> None +Copy cookies from the WebDriver object to the Session object. -​ Manually copy cookies from driver to session. 
+Parameter Description: + +- copy_user_agent: bool - whether to copy user agent at the same time + +Returns: None -​ Parameter Description: -- copy_user_agent - Whether to also copy user agent ### cookies_to_driver() -​ cookies_to_driver(url=None) -> None +Copy cookies from the Session object to the WebDriver object. -​ Manually copy cookies from session to driver. +Parameter Description: + +- url: str - the domain or url of cookies + +Returns: None + + + +### get() + +To jump to a url, synchronize cookies before the jump, and return whether the target url is available after the jump. + +Parameter Description: + +- url: str - target url +- go_anyway: bool - Whether to force a jump. If the target url is the same as the current url, it will not redirect by default. +- show_errmsg: bool - whether to display and throw an exception +- retry: int - the number of retries when a connection error occurs +- interval: float - Retry interval (seconds) +- **kwargs - connection parameters for requests + +Returns: [bool, None] - whether the url is available -​ Parameter Description: -- url - cookie domain or url ### post() -​ post(url: str, params: dict = None, data: dict = None, go_anyway: bool = False, **kwargs) -> Union[bool, None] +Jump in post mode, automatically switch to session mode when calling. -​ Jump by post, and switch to session mode automatically when calling. +Parameter Description: + +- url: str - target url +- data: dict - submitted data +- go_anyway: bool - Whether to force a jump. If the target url is the same as the current url, it will not redirect by default. 
+- show_errmsg: bool - whether to display and throw an exception +- retry: int - the number of retries when a connection error occurs +- interval: float - Retry interval (seconds) +- **kwargs - connection parameters for requests + +Returns: [bool, None] - whether the url is available -​ Parameter Description: -- url - Target url -- parame - url parameter -- data - Submitted data -- go_anyway - Whether to force a jump. If the target url is the same as the current url, the default is not to jump. -- kwargs - Access parameters such as headers ### download() -​ download(file_url: str, goal_path: str = None, rename: str = None, file_exists: str = 'rename', show_msg: bool = False, **kwargs) -> tuple +Download a file, return whether it is successful and the download information string. This method will automatically avoid the same name with the existing file in the target path. -​ Download a file, return success and download information string. Changing the method will automatically avoid renaming the existing file in the target path. +Parameter Description: -​ Parameter Description: +- file_url: str - file url +- goal_path: str - storage path, the default is the temporary folder specified in the ini file +- rename: str - rename the file without changing the extension +- file_exists: str - If there is a file with the same name, you can choose'rename','overwrite','skip' to process +- post_data: dict - data submitted in post mode +- show_msg: bool - whether to show download information +- show_errmsg: bool - whether to display and throw an exception +- **kwargs - connection parameters for requests -- file_url - File URL -- goal_path - Storage path, the default is the temporary folder specified in the ini file -- rename - Rename the file without changing the extension -- file_exists - If there is a file with the same name, you can choose 'rename', 'overwrite', 'skip' to process -- show_msg - Show download massage or not. 
-- kwargs - Connection parameters for requests +Returns: Tuple[bool, str] - a tuple of whether the download was successful (bool) and status information (the information is the file path when successful) -The following methods and attributes only take effect in driver mode, and will automatically switch to driver mode when called +The following methods and properties only take effect in driver mode, and will automatically switch to driver mode when called *** ### tabs_count -​ Returns the number of tab pages. +Returns the number of tab pages. + +Returns: int + + ### tab_handles -​ Returns the handle list of all tabs. +Returns the handle list of all tabs. + +Returns: list + + ### current_tab_num -​ Returns the serial number of the current tab page. +Returns the serial number of the current tab page. + +Returns: int + + ### current_tab_handle -​ Returns the handle of the current tab page. +Returns the handle of the current tab page. + +Returns: str + + + +### wait_ele() + +Wait for the element to be deleted, displayed, and hidden from the dom. + +Parameter Description: + +- loc_or_ele: [str, tuple, DriverElement, WebElement] - Element search method, same as ele() +- mode: str - waiting mode, optional:'del','display','hidden' +- timeout: float - waiting timeout + +Returns: bool - whether the wait is successful + + ### check_page() -​ check_page(by_requests: bool = False) -> Union[bool, None] +In d mode, check whether the web page meets expectations. The response status is checked by default, and can be overloaded to achieve targeted checks. -​ In d mode, check whether the web page meets expectations. The response status is checked by default, and can be overloaded to achieve targeted checks. 
+Parameter Description: + +- by_requests: bool - Force the use of the built-in response for checking + +Returns: [bool, None] - bool indicates whether the page is available, None means unknown -​ Parameter Description: -- by_requests - Force the use of the built-in response for checking ### run_script() -​ run_script(script: str, *args) -> Any +Execute JavaScript code. -​ Execute JavaScript code. +Parameter Description: + +- script: str - JavaScript code text +- *args - arguments passed to the script + +Returns: Any -​ Parameter Description: -- script - JavaScript code text -- args - arguments ### create_tab() -​ create_tab(url: str = '') -> None +Create a new tab at the end of the tab list and switch to it. -​ Create and locate a tab page, which is at the end. +Parameter Description: + +- url: str - the URL to open in the new tab + +Returns: None -​ Parameter Description: -- url - URL to jump in the new tab page ### close_current_tab() -​ close_current_tab() -> None +Close the current tab. + +Returns: None + -​ Close the current tab. ### close_other_tabs() -​ close_other_tabs(num_or_handle: Union[int, str, None] = None) -> None +Close all tabs except the specified one; the current tab is kept by default. -​ Close tab pages other than the incoming tab page, and keep the current page by default. +Parameter Description: + +- num_or_handle: [int, str] - the serial number or handle of the tab to keep; the first serial number is 0 and the last is -1 + +Returns: None -​ Parameter Description: -- num_or_handle - The serial number or handle of the tab to keep, the first serial number is 0, and the last is -1 ### to_tab() -​ to_tab(num_or_handle: Union[int, str] = 0) -> None +Switch to the specified tab. -​ Jump to a tab page. 
+Parameter Description: + +- num_or_handle:[int, str] - tab page serial number or handle string, the first serial number is 0, the last is - 1 + +Returns: None -​ Parameter Description: -- num_or_handle - The serial number or handle of the tab to keep, the first serial number is 0, and the last is -1 ### to_iframe() -​ to_iframe(self, loc_or_ele: Union[int, str, tuple, WebElement, DriverElement] = 'main') -> None +Jump to iframe, jump to the highest level by default, compatible with selenium native parameters. -​ Jump to iframe, compatible with selenium native parameters. +Parameter Description: -​ Parameter Description: +- loc_or_ele:[int, str, tuple, WebElement, DriverElement] - Find the condition of iframe element, can receive iframe serial number (starting at 0), id or name, query string, loc parameter, WebElement object, DriverElement object, and pass in ' main' jump to the highest level, and pass in'parent' to jump to the upper level + +Example: +- to_iframe('tag:iframe')- locate by the query string passed in iframe +- to_iframe('iframe_id')- Positioning by the id attribute of the iframe +- to_iframe('iframe_name')- locate by the name attribute of iframe +- to_iframe(iframe_element)- locate by passing in the element object +- to_iframe(0)- locate by the serial number of the iframe +- to_iframe('main')- jump to the top level +- to_iframe('parent')- jump to the previous level + +Returns: None -- loc_or_ele - To search for iframe element conditions, you can receive iframe serial number (starting at 0), id or name, control string, loc parameter, WebElement object, DriverElement object, pass 'main' to jump to the top level, pass 'parent' to jump to parent level. 
-​ Examples: -- to_iframe('tag:iframe') - Positioning by the query string passed in the iframe -- to_iframe('iframe_id') - Positioning by the id attribute of the iframe -- to_iframe('iframe_name') - Positioning by the name attribute of the iframe -- to_iframe(iframe_element) - Positioning by passing in the element object -- to_iframe(0) - Positioning by the serial number of the iframe -- to_iframe('main') - Switch to the top level -- to_iframe('parent') - Switch to the previous level ### scroll_to_see() -​ scroll_to_see(loc_or_ele: Union[str, tuple, WebElement, DriverElement]) -> None +Scroll until the element is visible. -​ Scroll until the element is visible. +Parameter Description: + +- loc_or_ele: [str, tuple, WebElement, DriverElement] - the search condition for the element, same as for the ele() method + +Returns: None -​ Parameter Description: -- loc_or_ele - The search condition of the iframe element is the same as the search condition of the ele () method. ### scroll_to() -​ scroll_to(mode: str = 'bottom', pixel: int = 300) -> None +Scroll the page; the parameters determine how it scrolls. -​ Scroll the page and decide how to scroll according to the parameters. +Parameter Description: + +- mode: str - scroll direction: top, bottom, rightmost, leftmost, up, down, left, right +- pixel: int - number of pixels to scroll + +Returns: None -​ Parameter Description: -- mode - Scrolling direction, top, bottom, rightmost, leftmost, up, down, left, right -- pixel - Scrolling pixels ### refresh() -​ refresh() -> None +Refresh the page. + +Returns: None + -​ Refresh page. ### back() -​ back() -> None +Go back one page. + +Returns: None + -​ The page back. ### set_window_size() -​ set_window_size(x: int = None, y: int = None) -> None +Set the window size; the window is maximized by default. -​ Set the window size and maximize it by default. 
+Parameter Description: + +- x: int - target width +- y: int - target height + +Returns: None -​ Parameter Description: -- x - Target width -- y - Target height ### screenshot() -​ screenshot(path: str, filename: str = None) -> str +Take a screenshot of the web page and return the path of the screenshot file. -​ Take a screenshot of the web page and return the path of the screenshot file. +Parameter Description: -​ Parameter Description: +- path: str - the screenshot save path; the default is the temporary folder specified in the ini file +- filename: str - the name of the screenshot file; the default is the page title -- path - Screenshot save path, default is the temporary folder specified in the ini file -- filename - Screenshot file name, default is page title as file name +Returns: str -### process_alert() -​ process_alert(mode: str = 'ok', text: str = None) -> Union[str, None] - -​ Processing alert, confirm and prompt box. - -​ Parameter Description: - -- mode - 'ok' or 'cancel', if enter another value, the button will not be pressed but the text value will still be returned -- text - Text can be entered when processing prompt box ### chrome_downloading() -​ chrome_downloading(download_path: str = None) -> list +Returns the list of files the browser is downloading. -​ Check whether the browser is downloaded. +Parameter Description: + +- download_path: str - download folder path + +Returns: list + + + +### process_alert() + +Process the prompt box. + +Parameter Description: + +- mode: str - 'ok' or 'cancel'; if another value is entered, no button is pressed but the text value is still returned +- text: str - text to enter when processing a prompt box + +Returns: [str, None] - the text of the prompt box content -​ Parameter Description: -- download_path - Download path, the default is the download path in chrome options configuration ### close_driver() -​ close_driver() -> None +Close the driver and browser. 
+ +Returns: None + -​ Close the driver and browser, and switch to s mode. ### close_session() -​ close_session() -> None +Close the session. + +Returns: None + -​ Close the session and switch to d mode. ## DriverElement class -class DriverElement(ele: WebElement, timeout: float = 10) +### class DriverElement() -The element object of the driver mode wraps a WebElement object and encapsulates common functions. +The element object in driver mode encapsulates a WebElement object and encapsulates common functions. Parameter Description: -- ele - WebElement object -- timeout - Search element time-out time (can also be set separately each time element search) +- ele: WebElement- WebElement object +- page: DriverPage- the page object where the element is located +- timeout: float - Find the timeout of the element (it can be set separately each time the element is searched) + + ### inner_ele -​ The wrapped WebElement object. +The wrapped WebElement object. -### driver +Returns: WebElement -​ WebDriver object of the element. -### attrs - -​ Return all attributes and values of the elements in a dictionary. - -### text - -​ Returns the text inside the element. ### html -​ Returns the html text in the element. +Returns the outerHTML text of the element. + +Returns: str + + + +### inner_html + +Returns the innerHTML text of the element. + +Returns: str + + ### tag -​ Returns the element label name text. +Returns the element tag name. + +Returns: str + + + +### attrs + +Return all attributes and values ​​of the element in a dictionary. + +Returns: dict + + + +### text + +Returns the text inside the element. + +Returns: str + + + +### css_path + +Returns the absolute path of the element css selector. + +Returns: str + + ### xpath -​ Return the xpath path of the element. +Returns the absolute path of the element xpath. + +Returns: str + + ### parent -​ Returns the parent element object. +Returns the parent element object. 
+ +Returns: DriverElement + + ### next -​ Returns the next sibling element object. +Returns the next sibling element object. + +Returns: DriverElement + + ### prev -​ Returns the last sibling element object. +Returns the previous sibling element object. -### parents() +Returns: DriverElement -​ parents(num: int = 1) -> Union[DriverElement, None] -​ Returns the Nth-level parent element object. - -​ Parameter Description: - -- num - The parent element of the upper level - -### nexts() - -​ nexts(num: int = 1) -> Union[DriverElement, None] - -​ Returns the next N sibling element objects. - -​ Parameter Description: - -- num - The next few sibling elements - -### prevs() - -​ prevs(num: int = 1) -> Union[DriverElement, None] - -​ Return the first N sibling element objects. - -​ Parameter Description: - -- num - The first few sibling elements ### size -​ Returns the element size as a dictionary. +Returns the element size as a dictionary. + +Returns: dict + + ### location -​ Put the element coordinates back in a dictionary. +Returns the element coordinates as a dictionary. -### ele() +Returns: dict -​ ele(loc_or_str: Union[tuple, str], mode: str = None, show_errmsg: bool = False, timeout: float = None) -> Union[DriverElement, List[DriverElement], None] -​ Get elements based on query parameters. -​ If the query parameter is a string, you can select the '@property name:', 'tag:', 'text:', 'css:', and 'xpath:' methods. When there is no control mode, it is searched by text mode by default. -​ If it is loc, query directly according to the content. 
-​ Parameter Description: +### shadow_root -- loc_or_str - Query condition parameters -- mode - Find one or more, pass in 'single' or 'all' -- show_errmsg - Whether to throw and display when an exception occurs -- timeout - Find Element Timeout +Returns the shadow_root element object of the current element -​ Examples:: +Returns: ShadowRoot -- Find with loc tuple: - - ele.ele((By.CLASS_NAME, 'ele_class')) - Return the first element whose class is ele_class in children -- Find with query string: +### before - Attributes, tag name and attributes, text, xpath, css selector. +Returns the content of the ::before pseudo- element of the current element - Among them, @ means attribute, = means exact match,: means fuzzy match, the string is searched by default when there is no control string. +Returns: str - - page.ele('@class:ele_class') - Return the first class element containing ele_class - - page.ele('@name=ele_name') - Return the first element whose name is equal to ele_name - - page.ele('@placeholder') - Return the first element with placeholder attribute - - page.ele('tag:p') - Return the first
p
element - - page.ele('tag:div@class:ele_class') - Return the first class div element with ele_class - - page.ele('tag:div@class=ele_class') - Return the first div element whose class is equal to ele_class - - page.ele('tag:div@text():some_text') - Returns the first div element whose text contains some_text - - page.ele('tag:div@text()=some_text') - Returns the first div element whose text is equal to some_text - - page.ele('text:some_text') - Returns the first element whose text contains some_text - - page.ele('some_text') - Return the first text element containing some_text (equivalent to the previous line) - - page.ele('text=some_text') - Returns the first element whose text is equal to some_text - - page.ele('xpath://div[@class="ele_class"]') - Return the first element that matches the xpath - - page.ele('css:div.ele_class') - Return the first element that matches the css selector -### eles() -​ eles(loc_or_str: Union[tuple, str], show_errmsg: bool = False, timeout: float = None) -> List[DriverElement] +### after -​ Obtain a list of elements that meet the criteria based on query parameters. The query parameter usage method is the same as the ele method. +Returns the content of the ::after pseudo element of the current element -​ Parameter Description: +Returns: str -- loc_or_str - Query condition parameters -- show_errmsg - Whether to throw and display when an exception occurs -- timeout - Find Element Timeout -### attr() -​ attr(attr: str) -> str +### texts() -​ Get the value of an attribute of an element. - -​ Parameter Description: - -- attr - Attribute name - -### click() - -​ click(by_js=None) -> bool - -​ Click on the element. If it is unsuccessful, click on js. You can specify click on js or not. - -​ Parameter Description: - -- by_js - Whether to click with js - -### input() - -​ input(value, clear: bool = True) -> bool - -​ Enter text. 
- -​ Parameter Description: - -- value - Text value -- clear - Whether to clear the text box before entering - -### run_script() - -​ run_script(script: str, *args) -> Any - -​ Execute js code, pass self as the first parameter. - -​ Parameter Description: - -- script - JavaScript text -- args - arguments - -### submit() - -​ submit() -> None - -​ Submit form. - -### clear() - -​ clear() -> None - -​ Clear the text box. - -### is_selected() - -​ is_selected() -> bool - -​ Whether the element is selected. - -### is_enabled() - -​ is_enabled() -> bool - -​ Whether the element is available on the page. - -### is_displayed() - -​ is_displayed() -> bool - -​ Whether the element is visible. - -### is_valid() - -​ is_valid() -> bool - -​ Whether the element is valid. This method is used to determine the situation where the page jump element cannot be used - -### screenshot() - -​ screenshot(path: str, filename: str = None) -> str - -​ Take a screenshot of the web page and return the path of the screenshot file. - -​ Parameter Description: - -- path - Screenshot save path, default is the temporary folder specified in the ini file -- filename - Screenshot file name, default is page title as file name - -### select() - -​ select(text: str) -> bool - -​ Choose from the drop-down list. - -​ Parameter Description: - -- text - Option text - -### set_attr() - -​ set_attr(attr: str, value: str) -> bool - -​ Set element attributes. - -​ Parameter Description: - -- attr - parameter name -- value - Parameter value - -### drag() - -​ drag(x: int, y: int, speed: int = 40, shake: bool = True) -> bool - -​ Drag the current element a certain distance, and return whether the drag is successful. 
- -​ Parameter Description: - -- x - Drag distance in x direction -- y - Drag distance in y direction -- speed - Drag speed -- shake - Random jitter - -### drag_to() - -​ drag_to(ele_or_loc: Union[tuple, WebElement, DrissionElement], speed: int = 40, shake: bool = True) -> bool: - -​ Drag the current element, the target is another element or coordinate tuple, and return whether the drag is successful. - -​ Parameter Description: - -- ele_or_loc - Another element or relative current position. The coordinates are the coordinates of the midpoint of the element. -- speed - Drag speed -- shake - Random jitter - -### hover() - -​ hover() - -​ Hover over the element. - - - -## SessionElement class - -class SessionElement(ele: Element) - -The element object of session mode wraps an Element object and encapsulates common functions. +Returns the text of all direct child nodes within the element, including elements and text nodes Parameter Description: -- ele - Element object of requests_html library +- text_node_only: bool - whether to return only text nodes -### inner_ele +Returns: List[str] -​ The wrapped Element object. -### attrs - -​ Returns the names and values of all attributes of the element in dictionary format. - -### text - -​ Returns the text inside the element. - -### html - -​ Returns the html text in the element. - -### tag - -​ Returns the element label name text. - -### xpath - -​ Return the xpath path of the element. - -### parent - -​ Returns the parent element object. - -### next - -​ Returns the next sibling element object. - -### prev - -​ Returns the last sibling element object. ### parents() -​ parents(num: int = 1) -> Union[SessionElement, None] +Returns the Nth level parent element object. -​ Returns the Nth-level parent element object. 
+Parameter Description: + +- num: int - which level of parent element + +Returns: DriverElement -​ Parameter Description: -- num - The parent element of the upper level ### nexts() -​ nexts(num: int = 1) -> Union[SessionElement, None] +Returns the text of the numth sibling element or node. -​ Returns the next N sibling element objects. +Parameter Description: + +- num: int - the next sibling element or node +- mode: str - 'ele','node' or'text', matching element, node, or text node + +Returns: [DriverElement, str] -​ Parameter Description: -- num - The next few sibling elements ### prevs() -​ prevs(num: int = 1) -> Union[SessionElement, None] +Returns the text of the previous num sibling element or node. -​ Return the first N sibling element objects. +Parameter Description: + +- num: int - the previous sibling element or node +- mode: str - 'ele','node' or'text', matching element, node, or text node + +Returns: [DriverElement, str] + + + +### attr() + +Get the value of an attribute of an element. + +Parameter Description: + +- attr: str - attribute name + +Returns: str -​ Parameter Description: -- num - The first few sibling elements ### ele() -​ ele(loc_or_str: Union[tuple, str], mode: str = None, show_errmsg: bool = False) -> Union[SessionElement, List[SessionElement], None] +Returns the sub- elements, attributes or node texts of the current element that meet the conditions. +If the query parameter is a string, the options of'@attribute name:','tag:','text:','css:', and'xpath:' are available. When there is no control mode, the text mode is used to search by default. +If it is loc, query directly according to the content. -​ Get elements based on query parameters. -​ If the query parameter is a string, you can select the '@property name:', 'tag:', 'text:', 'css:', and 'xpath:' methods. When there is no control mode, it is searched by text mode by default. -​ If it is loc, query directly according to the content. 
+Parameter Description: -​ Parameter Description: +- loc_or_str: [Tuple[str, str], str] - the positioning information of the element, which can be a loc tuple or a query string +- mode: str - 'single' or'all', corresponding to find one or all +- timeout: float - Find the timeout of the element -- loc_or_str - Query condition parameters - -- mode - Find one or more, pass in 'single' or 'all' - -- show_errmsg - Whether to throw and display when an exception occurs - -​ Examples: +Example: - Find with loc tuple: - - ele.ele((By.CLASS_NAME, 'ele_class')) - Return the first element whose class is ele_class in children + - ele.ele((By.CLASS_NAME,'ele_class')) - returns the first child element whose class is ele_class -- Find with query string: +- Find with query string: Attributes, tag name and attributes, text, xpath, css selector. Among them, @ means attribute, = means exact match,: means fuzzy match, the string is searched by default when there is no control string. - - page.ele('@class:ele_class') - Return the first class element containing ele_class - - page.ele('@name=ele_name') - Return the first element whose name is equal to ele_name - - page.ele('@placeholder') - Return the first element with placeholder attribute - - page.ele('tag:p') - Return the first
p
element - - page.ele('tag:div@class:ele_class') - Return the first class div element with ele_class - - page.ele('tag:div@class=ele_class') - Return the first div element whose class is equal to ele_class - - page.ele('tag:div@text():some_text') - Returns the first div element whose text contains some_text - - page.ele('tag:div@text()=some_text') - Returns the first div element whose text is equal to some_text - - page.ele('text:some_text') - Returns the first element whose text contains some_text - - page.ele('some_text') - Return the first text element containing some_text (equivalent to the previous line) - - page.ele('text=some_text') - Returns the first element whose text is equal to some_text - - page.ele('xpath://div[@class="ele_class"]') - Return the first element that matches the xpath - - page.ele('css:div.ele_class') - Return the first element that matches the css selector + - ele.ele('@class:ele_class') - returns the first class element that contains ele_class + - ele.ele('@name=ele_name') - returns the first element whose name is equal to ele_name + - ele.ele('@placeholder') - returns the first element with placeholder attribute + - ele.ele('tag:p') - returns the first p element + - ele.ele('tag:div@class:ele_class') - Returns the div element with ele_class in the first class + - ele.ele('tag:div@class=ele_class') - returns the first div element whose class is equal to ele_class + - ele.ele('tag:div@text():some_text') - returns the first div element whose text contains some_text + - ele.ele('tag:div@text()=some_text') - Returns the first div element whose text is equal to some_text + - ele.ele('text:some_text') - returns the first element whose text contains some_text + - ele.ele('some_text') - returns the first text element containing some_text (equivalent to the previous line) + - ele.ele('text=some_text') - returns the first element whose text is equal to some_text + - ele.ele('xpath://div[@class="ele_class"]') - Return the first element that 
matches xpath + - ele.ele('css:div.ele_class') - returns the first element that matches the css selector + +Returns: [DriverElement, str] + + ### eles() -​ eles(loc_or_str: Union[tuple, str], show_errmsg: bool = False) -> List[SessionElement] +Get the list of elements that meet the conditions according to the query parameters. The query parameter usage method is the same as the ele method. -​ Obtain a list of elements that meet the criteria based on query parameters. The query parameter usage method is the same as the ele method. +Parameter Description: + +- loc_or_str: [Tuple[str, str], str] - query condition parameter +- timeout: float - Find the timeout of the element + +Returns: List[DriverElement or str] + + + +### get_style_property() + +Returns the element style attribute value. + +Parameter Description: + +- style: str - style attribute name +- pseudo_ele: str - pseudo element name + +Returns: str + + + +### click() + +Click on the element. If it is unsuccessful, click in js mode. You can specify whether to click in js mode. + +Parameter Description: + +- by_js: bool - whether to click with js + +Returns: bool + + + +### input() + +Enter text and return whether it is successful. + +Parameter Description: + +- value: str - text value +- clear: bool - whether to clear the text box before typing + +Returns: bool + + + +### run_script() + +Execute the js code and pass in yourself as the first parameter. + +Parameter Description: + +- script: str - JavaScript text +- *args - incoming parameters + +Returns: Any + + + +### submit() + +submit Form. + +Returns: None + + + +### clear() + +Clear the text box. + +Returns: None + + + +### is_selected() + +Whether the element is selected. + +Returns: bool + + + +### is_enabled() + +Whether the element is available on the page. + +Returns: bool + + + +### is_displayed() + +Whether the element is visible. + +Returns: bool + + + +### is_valid() + +Whether the element is still in the DOM. 
This method is used to determine when the page jump element cannot be used + +Returns: bool + + + +### screenshot() + +Take a screenshot of the web page and return the path of the screenshot file + +Parameter Description: + +- path: str - The screenshot save path, the default is the temporary folder specified in the ini file +- filename: str - the name of the screenshot file, the default is the page title as the file name + +Returns: str + + + +### select() + +Select from the drop- down list. + +Parameter Description: + +- text: str - option text + +Returns: bool - success + + + +### set_attr() + +Set element attributes. + +Parameter Description: + +- attr: str - parameter name +- value: str - parameter value + +Returns: bool - whether it was successful + + + +### drag() + +Drag the current element a certain distance, and return whether the drag is successful. + +Parameter Description: + +- x: int - drag distance in x direction +- y: int - drag distance in y direction +- speed: int - drag speed +- shake: bool - whether to shake randomly + +Returns: bool + + + +### drag_to() + +Drag the current element, the target is another element or coordinate tuple, and return whether the drag is successful. + +Parameter Description: + +- ele_or_loc[tuple, WebElement, DrissionElement] - Another element or relative current position, the coordinates are the coordinates of the element's midpoint. +- speed: int - drag speed +- shake: bool - whether to shake randomly + +Returns: bool + + + +### hover() + +Hover the mouse over the element. + +Returns: None + + + +## SessionElement Class + +### class SessionElement() + +The element object in session mode encapsulates an Element object and encapsulates common functions. + +Parameter Description: + +- ele: HtmlElement - HtmlElement object of lxml library +- page: SessionPage - the page object where the element is located + + + +### inner_ele + +The wrapped HTMLElement object. 
+ +Returns: HtmlElement + + + +### attrs + +Returns the names and values of all attributes of the element in dictionary format. + +Returns: dict + + + +### text + +Returns the text within the element, namely innerText. + +Returns: str + + + +### html + +Returns the outerHTML text of the element. + +Returns: str + + + +### inner_html + +Returns the innerHTML text of the element. + +Returns: str + + + +### tag + +Returns the element tag name. + +Returns: str + + + +### css_path + +Returns the absolute css selector path of the element. + +Returns: str + + + +### xpath + +Returns the absolute xpath of the element. + +Returns: str + + + +### parent + +Returns the parent element object. + +Returns: SessionElement + + + +### next + +Returns the next sibling element object. + +Returns: SessionElement + + + +### prev + +Returns the previous sibling element object. + +Returns: SessionElement + + + +### parents() + +Returns the Nth level parent element object. + +Parameter Description: + +- num: int - which level of parent element + +Returns: SessionElement + + + +### nexts() + +Returns the num-th following sibling element, or the text of the num-th following node. + +Parameter Description: + +- num: int - which following sibling element or node +- mode: str - 'ele', 'node' or 'text', matching element, node, or text node + +Returns: [SessionElement, str] + + + +### prevs() + +Returns the num-th preceding sibling element, or the text of the num-th preceding node. + +Parameter Description: + +- num: int - which preceding sibling element or node +- mode: str - 'ele', 'node' or 'text', matching element, node, or text node + +Returns: [SessionElement, str] -​ Parameter Description: -- loc_or_str - Query condition parameters -- show_errmsg - Whether to throw and display when an exception occurs ### attr() -​ attr(attr: str) -> str +Get the value of an attribute of an element. -​ Get the value of an attribute of an element. 
+Parameter Description: + +- attr: str - attribute name + +Returns: str + + + +### ele() + +Get elements based on query parameters. 
+If the query parameter is a string, it can start with '@attribute name:', 'tag:', 'text:', 'css:', or 'xpath:'. When no such prefix is given, the string is searched for in text mode by default. +If it is a loc tuple, the query is made directly with it. + +Parameter Description: + +- loc_or_str: [Tuple[str, str], str] - query condition parameter + +- mode: str - find one or more; pass in 'single' or 'all' + + +Example: + +- Find with loc tuple: + +- ele.ele((By.CLASS_NAME, 'ele_class')) - returns the first child element whose class is ele_class + +- Find with query string: + +Attributes, tag name and attributes, text, xpath, css selector. + +Here @ means attribute, = means exact match, and : means fuzzy match. A string without any control prefix is searched for as text by default. + +- ele.ele('@class:ele_class') - returns the first element whose class contains ele_class +- ele.ele('@name=ele_name') - returns the first element whose name is equal to ele_name +- ele.ele('@placeholder') - returns the first element with a placeholder attribute +- ele.ele('tag:p') - returns the first p element +- ele.ele('tag:div@class:ele_class') - returns the first div element whose class contains ele_class +- ele.ele('tag:div@class=ele_class') - returns the first div element whose class is equal to ele_class +- ele.ele('tag:div@text():some_text') - returns the first div element whose text contains some_text +- ele.ele('tag:div@text()=some_text') - returns the first div element whose text is equal to some_text +- ele.ele('text:some_text') - returns the first element whose text contains some_text +- ele.ele('some_text') - returns the first element whose text contains some_text (equivalent to the previous line) +- ele.ele('text=some_text') - returns the first element whose text is equal to some_text +- ele.ele('xpath://div[@class="ele_class"]') - returns the first element that matches the xpath +- ele.ele('css:div.ele_class') - returns the first element that matches the css selector + +Returns: 
[SessionElement, str] + + + +### eles() + +Get the list of elements that match the query parameters. The query parameter is used in the same way as in the ele method. + +Parameter Description: + +- loc_or_str: [Tuple[str, str], str] - query condition parameter + +Returns: List[SessionElement or str] - Parameter Description: -- attr - Attribute name ## OptionsManager class - class OptionsManager(path: str = None) +### class OptionsManager() - The class that manages the content of the configuration file. +The class that manages the content of the configuration file. + +Parameter Description: + +- path: str - the path of the ini file; if not passed in, the configs.ini file in the current folder is read by default - Parameter Description: -- path - Ini file path, if not imported, the configs.ini file in the current folder is read by default ### get_value() - get_value(section: str, item: str) -> Any +Get the configured value. - Get the configured value. +Parameter Description: + +- section: str - section name +- item: str - configuration item name + +Returns: Any - Parameter Description: -- section - Paragraph name -- item - Configuration item name ### get_option() - get_option(section: str) -> dict +Return the configuration information of the entire section in dictionary format. - Return configuration information for the entire paragraph in dictionary format. +Parameter Description: + +- section: str - section name + +Returns: dict - Parameter Description: -- section - Paragraph name ### set_item() - set_item(section: str, item: str, value: str) -> OptionsManager +Set a configuration value and return the object itself so that calls can be chained. - Set configuration values. 
+Parameter Description: + +- section: str - section name +- item: str - configuration item name +- value: Any - value content + +Returns: OptionsManager - returns itself - Parameter Description: -- section - Paragraph name -- item - Configuration item name -- value - Content of value ### save() - save(path: str = None) -> OptionsManager +Save the settings to a file and return the object itself so that calls can be chained. - Save the settings to a file. +Parameter Description: - Parameter Description: +- path: str - the path of the ini file, saved to the module folder by default -- path - The path of the ini file, which is saved to the module folder by default +Returns: OptionsManager - returns itself ## DriverOptions class - class DriverOptions(read_file=True) +### class DriverOptions() - The chrome browser configuration class, inherited from the Options class of selenium.webdriver.chrome.options, adds methods to delete configuration and save to file. +The Chrome browser configuration class. It inherits from the Options class of selenium.webdriver.chrome.options and adds methods for deleting configuration and saving it to a file. + +Parameter Description: + +- read_file: bool - whether to read configuration information from the ini file when creating the object - Parameter Description: -- read_file - Boolean, specifies whether to read configuration information from the ini file when creating ### driver_path - Path of chromedriver.exe. +The path of chromedriver.exe. + +Returns: str + + ### chrome_path - Path of chrome.exe. +The path of chrome.exe. -### remove_argument() +Returns: str - remove_argument(value: str) -> DriverOptions - Remove a setting. - - Parameter Description: - -- value - The attribute value to be removed - -### remove_experimental_option() - - remove_experimental_option(key: str) -> DriverOptions - - Remove an experiment setting and delete the incoming key value. 
- - Parameter Description: - -- key - The key value of the experiment to be removed - -### remove_argument() - - remove_argument() -> DriverOptions - - Remove all plug-ins, because the plug-in is stored in the entire file, it is difficult to remove one of them, so if you need to set, remove all and reset. ### save() - save(path: str = None) -> DriverOptions +Save the settings to a file and return the object itself so that calls can be chained. - Save the settings to a file. +Parameter Description: + +- path: str - the path of the ini file, saved to the module folder by default + +Returns: DriverOptions - returns itself + + + +### remove_argument() + +Remove a setting. + +Parameter Description: + +- value: str - the attribute value to be removed + +Returns: DriverOptions - returns itself + + + +### remove_experimental_option() + +Remove an experimental option by deleting the given key. + +Parameter Description: + +- key: str - the key of the experimental option to be removed + +Returns: DriverOptions - returns itself + + + +### remove_all_extensions() + +Remove all extensions. Because extensions are stored as whole files, it is difficult to remove a single one, so to change them, remove all of them and set them again. + +Returns: DriverOptions - returns itself - Parameter Description: -- path - The path of the ini file, which is saved to the module folder by default ### set_argument() - set_argument(arg: str, value: Union[bool, str]) -> DriverOptions +Set a Chrome attribute. An attribute without a value acts as a switch; an attribute with a value is set to that value. - Set the chrome attribute, the attribute with no value can be set to switch, the attribute with the value can set the value of the attribute. 
+Parameter Description: + +- arg: str - attribute name +- value: [bool, str] - attribute value; pass the value for an attribute that takes one, and a bool for an attribute that does not + +Returns: DriverOptions - returns itself - Parameter description: -- arg - attribute name -- value - the attribute value, the attribute with value is passed in the value, the attribute without value is passed in bool ### set_headless() - set_headless(on_off: bool = True) -> DriverOptions +Turn headless mode on or off. - Turn on or off the interfaceless mode. +Parameter Description: + +- on_off: bool - turn on or off + +Returns: DriverOptions - returns itself - Parameter Description: - on_off - open or close, bool ### set_no_imgs() - set_no_imgs(on_off: bool = True) -> DriverOptions +Turn no-image mode on or off (images are not loaded when it is on). - Whether to load pictures. +Parameter Description: + +- on_off: bool - turn on or off + +Returns: DriverOptions - returns itself - Parameter Description: - on_off - open or close, bool ### set_no_js() - set_no_js(on_off: bool = True) -> DriverOptions +Whether to disable JS. - Whether to disable js. +Parameter Description: + +- on_off: bool - turn on or off + +Returns: DriverOptions - returns itself - Parameter Description: - on_off - open or close, bool ### set_mute() - set_mute(on_off: bool = True) -> DriverOptions +Whether to mute. - Whether to mute. +Parameter Description: + +- on_off: bool - turn on or off + +Returns: DriverOptions - returns itself - Parameter Description: - on_off - open or close, bool ### set_user_agent() - set_user_agent(user_agent: str) -> DriverOptions +Set the browser user agent. - Set the browser user agent. +Parameter Description: + +- user_agent: str - user agent string + +Returns: DriverOptions - returns itself - Parameter Description: -- user_agent - user agent string ### set_proxy() - set_proxy(proxy: str) -> DriverOptions +Set up a proxy. - Set up a proxy. 
+Parameter Description: + +- proxy: str - proxy address + +Returns: DriverOptions - returns itself - Parameter Description: -- proxy - proxy address ### set_paths() - set_paths(driver_path: str = None, chrome_path: str = None, debugger_address: str = None, download_path: str = None, user_data_path: str = None, cache_path: str = None) -> DriverOptions +Set browser-related paths. - Set browser-related paths. +Parameter Description: - Parameter Description: +- driver_path: str - the path of chromedriver.exe +- chrome_path: str - the path of chrome.exe +- debugger_address: str - debug browser address, for example: 127.0.0.1:9222 +- download_path: str - download file path +- user_data_path: str - user data path +- cache_path: str - cache path -- driver_path - path of chromedriver.exe -- chrome_path - path of chrome.exe -- debugger_address - debug browser address, for example: 127.0.0.1:9222 -- download_path - download file path -- user_data_path - user data path -- cache_path - cache path +Returns: DriverOptions - returns itself -## easy_set methods +## easy_set method - The configuration of chrome is too difficult to remember, so the commonly used configuration is written as a simple method, and the call will modify the relevant content of the ini file. +Chrome's configuration options are hard to remember, so the commonly used ones are wrapped in simple methods; calling them modifies the relevant content of the ini file. - ### set_paths() +### set_paths() - set_paths(driver_path: str = None, chrome_path: str = None, debugger_address: str = None, global_tmp_path: str = None, download_path: str = None, user_data_path: str = None, cache_path: str = None, check_version: bool = True) -> None +A convenient way to set paths: it saves the given paths to the default ini file and checks whether the chrome and chromedriver versions match. 
- Convenient way to set the path, save the incoming path to the default ini file, and check whether the chrome and chromedriver versions match. +Parameter Description: - Parameter Description: +- driver_path: str - chromedriver.exe path +- chrome_path: str - chrome.exe path +- debugger_address: str - debug browser address, for example: 127.0.0.1:9222 +- download_path: str - download file path +- global_tmp_path: str - temporary folder path +- user_data_path: str - user data path +- cache_path: str - cache path +- check_version: bool - whether to check if chromedriver and chrome match - - driver_path - the path of chromedriver.exe - - chrome_path - the path of chrome.exe - - debugger_address - Debug browser address, eg. 127.0.0.1:9222 - - download_path - File download path - - global_tmp_path - Temporary folder path - - user_data_path - User data path - - cache_path - Cache path - - check_version - Whether to check whether chromedriver and chrome match +Returns: None - ### set_argument() - set_argument(arg: str, value: Union[bool, str]) -> None - Set the properties. If the attribute has no value (such as' zh_CN.utf-8 '), the value is passed into the bool to indicate the switch; Otherwise, value passes in STR, and when value is "" or False, the attribute entry is deleted. +### set_argument() - Parameter Description: +Set an attribute. If the attribute takes no value (such as 'zh_CN.UTF-8'), pass a bool as value to switch it on or off; otherwise pass a str as value. When value is '' or False, the attribute item is deleted. - - arg - Attribute name - - value - Attribute value, pass in a value if it has a value, pass in a bool if it doesn't +Parameter Description: - ### set_headless() +- arg: str - attribute name +- value: [bool, str] - attribute value; pass the value for an attribute that takes one, and a bool for an attribute that does not - set_headless(on_off: bool) -> None +Returns: None - Turn headless mode on or off. 
- Parameter Description: - - on_off - Whether to enable headless mode +### set_headless() - ### set_no_imgs() +Turn headless mode on or off. - set_no_imgs(on_off: bool) -> None +Parameter Description: - Turn the picture display on or off. +- on_off: bool - whether to turn on headless mode - Parameter Description: +Returns: None - - on_off - Whether to enable no-picture mode - ### set_no_js() - set_no_js(on_off: bool) -> None +### set_no_imgs() - Turn JS mode on or off. +Turn no-image mode on or off. - Parameter Description: +Parameter Description: - - on_off - Whether to enable or disable JS mode +- on_off: bool - whether to turn on no-image mode - ### set_mute() +Returns: None - set_mute(on_off: bool) -> None - Turn silent mode on or off. - Parameter Description: +### set_no_js() - - on_off - Whether to turn on silent mode +Turn disable-JS mode on or off. - ### set_user_agent() +Parameter Description: - set_user_agent(user_agent: str) -> None: +- on_off: bool - whether to turn on disable-JS mode - Set user_agent. +Returns: None - Parameter Description: - - user_agent - user_agent value - ### set_proxy() +### set_mute() - set_proxy(proxy: str) -> None +Turn silent mode on or off. - Set up the proxy. +Parameter Description: - Parameter Description: +- on_off: bool - whether to turn on silent mode - - proxy - Proxy value +Returns: None - ### check_driver_version() - check_driver_version(driver_path: str = None, chrome_path: str = None) -> bool - Check if the chrome and chromedriver versions match. +### set_user_agent() - Parameter Description: +Set the user agent. - - driver_path - the path of chromedriver.exe - - chrome_path - the path of chrome.exe \ No newline at end of file +Parameter Description: + +- user_agent: str - user_agent value + +Returns: None + + + +### set_proxy() + +Set up a proxy. 
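+ +Parameter Description: + +- proxy: str - proxy value + +Returns: None + + + +### check_driver_version() + +Check if the chrome and chromedriver versions match. + +Parameter Description: + +- driver_path: str - chromedriver.exe path +- chrome_path: str - chrome.exe path + +Returns: bool \ No newline at end of file diff --git a/README.zh-cn.md b/README.zh-cn.md index 2034a84..ec2196d 100644 --- a/README.zh-cn.md +++ b/README.zh-cn.md @@ -430,7 +430,7 @@ page.get(url, retry=5, interval=0.5) ### Switching modes -Switch between s and d modes; cookies and the URL being visited are synchronized automatically when switching +Switch between s and d modes; cookies and the URL being visited are synchronized automatically when switching. ```python page.change_mode(go=False) # go=False means do not jump to the url @@ -461,7 +461,7 @@ page.current_tab_handle # returns the handle of the current tab ### Page operations -Calling a method that belongs only to d mode switches to d mode automatically. See APIs for detailed usage +Calling a method that belongs only to d mode switches to d mode automatically. See APIs for detailed usage. ```python page.change_mode() # switch mode @@ -566,7 +566,8 @@ ele2 = ele1('tag:li') ## Getting element attributes ```python -element.html # returns the html inside the element +element.html # returns the element's outerHTML +element.inner_html # returns the element's innerHTML element.tag # returns the element's tag name element.text # returns the element's innerText value element.texts() # returns the text of all direct child nodes of the element, including element and text nodes; can be limited to text nodes only @@ -817,7 +818,7 @@ drission = Drission(ini_path = 'D:\\settings.ini') # create with another ini file ## easy_set methods - Calling an easy_set method modifies the relevant content of the default ini file. +Calling an easy_set method modifies the relevant content of the default ini file. ```python set_headless(True) # turn on headless mode @@ -1194,7 +1195,7 @@ MixPage encapsulates the common functions of page operations and, between driver and session modes, can - page.ele('@class:ele_class') - returns the first element whose class contains ele_class - page.ele('@name=ele_name') - returns the first element whose name is equal to ele_name - page.ele('@placeholder') - returns the first element with a placeholder attribute - - page.ele('tag:p') - returns the first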
<p>
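element + - page.ele('tag:p') - returns the first p element - page.ele('tag:div@class:ele_class') - returns the first div element whose class contains ele_class - page.ele('tag:div@class=ele_class') - returns the first div element whose class is equal to ele_class - page.ele('tag:div@text():some_text') - returns the first div element whose text contains some_text @@ -1427,7 +1428,7 @@ In d mode, checks whether the page is as expected. By default this is checked from the response status; can - loc_or_ele: [int, str, tuple, WebElement, DriverElement] - the condition for finding the iframe element; accepts the iframe index (starting from 0), id or name, a query string, a loc parameter, a WebElement object, or a DriverElement object; pass in 'main' to jump to the top level and 'parent' to jump up one level - Example: +Example: - to_iframe('tag:iframe') - locate by passing in the iframe's query string - to_iframe('iframe_id') - locate by the iframe's id attribute - to_iframe('iframe_name') - locate by the iframe's name attribute @@ -1572,7 +1573,15 @@ The element object in driver mode; it wraps a WebElement object and encapsulates common ### html -Returns the html text inside the element. +Returns the outerHTML text of the element. + +Returns: str + + + +### inner_html + +Returns the innerHTML text of the element. Returns: str @@ -1771,7 +1780,7 @@ The element object in driver mode; it wraps a WebElement object and encapsulates common - ele.ele('@class:ele_class') - returns the first element whose class contains ele_class - ele.ele('@name=ele_name') - returns the first element whose name is equal to ele_name - ele.ele('@placeholder') - returns the first element with a placeholder attribute - - ele.ele('tag:p') - returns the first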
<p>
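element + - ele.ele('tag:p') - returns the first p element - ele.ele('tag:div@class:ele_class') - returns the first div element whose class contains ele_class - ele.ele('tag:div@class=ele_class') - returns the first div element whose class is equal to ele_class - ele.ele('tag:div@text():some_text') - returns the first div element whose text contains some_text @@ -1981,16 +1990,16 @@ The element object in session mode; it wraps an Element object and encapsulates commonly used Parameter Description: -- ele: _Element - Element object of the lxml library +- ele: HtmlElement - HtmlElement object of the lxml library - page: SessionPage - the page object where the element is located ### inner_ele -The wrapped _Element object. +The wrapped HtmlElement object. -Returns: _Element +Returns: HtmlElement @@ -2012,7 +2021,15 @@ The element object in session mode; it wraps an Element object and encapsulates commonly used ### html -Returns the html text inside the element, namely innerHTML. +Returns the outerHTML text of the element. + +Returns: str + + + +### inner_html + +Returns the innerHTML text of the element. Returns: str @@ -2133,27 +2150,27 @@ The element object in session mode; it wraps an Element object and encapsulates commonly used - Find with loc tuple: - - ele.ele((By.CLASS_NAME, 'ele_class')) - returns the first child element whose class is ele_class +- ele.ele((By.CLASS_NAME, 'ele_class')) - returns the first child element whose class is ele_class - Find with query string: - Attributes, tag name and attributes, text, xpath, css selector. +Attributes, tag name and attributes, text, xpath, css selector. - Here @ means attribute, = means exact match, and : means fuzzy match; a string without a control prefix is searched for by default. +Here @ means attribute, = means exact match, and : means fuzzy match; a string without a control prefix is searched for by default. - - ele.ele('@class:ele_class') - returns the first element whose class contains ele_class - - ele.ele('@name=ele_name') - returns the first element whose name is equal to ele_name - - ele.ele('@placeholder') - returns the first element with a placeholder attribute - - ele.ele('tag:p') - returns the first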
<p>
元素 - - ele.ele('tag:div@class:ele_class') - 返回第一个class含有ele_class的div元素 - - ele.ele('tag:div@class=ele_class') - 返回第一个class等于ele_class的div元素 - - ele.ele('tag:div@text():some_text') - 返回第一个文本含有some_text的div元素 - - ele.ele('tag:div@text()=some_text') - 返回第一个文本等于some_text的div元素 - - ele.ele('text:some_text') - 返回第一个文本含有some_text的元素 - - ele.ele('some_text') - 返回第一个文本含有some_text的元素(等价于上一行) - - ele.ele('text=some_text') - 返回第一个文本等于some_text的元素 - - ele.ele('xpath://div[@class="ele_class"]') - 返回第一个符合xpath的元素 - - ele.ele('css:div.ele_class') - 返回第一个符合css selector的元素 +- ele.ele('@class:ele_class') - 返回第一个class含有ele_class的元素 +- ele.ele('@name=ele_name') - 返回第一个name等于ele_name的元素 +- ele.ele('@placeholder') - 返回第一个带placeholder属性的元素 +- ele.ele('tag:p') - 返回第一个p元素 +- ele.ele('tag:div@class:ele_class') - 返回第一个class含有ele_class的div元素 +- ele.ele('tag:div@class=ele_class') - 返回第一个class等于ele_class的div元素 +- ele.ele('tag:div@text():some_text') - 返回第一个文本含有some_text的div元素 +- ele.ele('tag:div@text()=some_text') - 返回第一个文本等于some_text的div元素 +- ele.ele('text:some_text') - 返回第一个文本含有some_text的元素 +- ele.ele('some_text') - 返回第一个文本含有some_text的元素(等价于上一行) +- ele.ele('text=some_text') - 返回第一个文本等于some_text的元素 +- ele.ele('xpath://div[@class="ele_class"]') - 返回第一个符合xpath的元素 +- ele.ele('css:div.ele_class') - 返回第一个符合css selector的元素 返回: [SessionElement, str] @@ -2294,7 +2311,7 @@ chrome.exe的路径 参数说明: -- key - 要移除的实验设置key值 +- key:str - 要移除的实验设置key值 返回: DriverOptions - 返回自己 @@ -2439,8 +2456,8 @@ chrome的配置太难记,所以把常用的配置写成简单的方法,调 参数说明: -- arg:str   - 属性名 -- value[bool, str] - 属性值,有值的属性传入值,没有的传入bool +- arg:str - 属性名 +- value[bool, str] - 属性值,有值的属性传入值,没有的传入bool 返回: None @@ -2452,7 +2469,7 @@ chrome的配置太难记,所以把常用的配置写成简单的方法,调 参数说明: -- on_off: bool - 是否开启headless模式 +- on_off: bool - 是否开启headless模式 返回: None @@ -2464,7 +2481,7 @@ chrome的配置太难记,所以把常用的配置写成简单的方法,调 参数说明: -- on_off: bool - 是否开启无图模式 +- on_off: bool - 是否开启无图模式 返回: None @@ -2476,7 +2493,7 @@ chrome的配置太难记,所以把常用的配置写成简单的方法,调 参数说明: -- on_off: bool - 是否开启禁用JS模式 +- on_off: 
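bool - whether to turn on disable-JS mode Returns: None @@ -2488,7 +2505,7 @@ Chrome's configuration is too hard to remember, so the commonly used settings are written as simple methods, Parameter Description: -- on_off: bool - whether to turn on silent mode +- on_off: bool - whether to turn on silent mode Returns: None @@ -2500,7 +2517,7 @@ Chrome's configuration is too hard to remember, so the commonly used settings are written as simple methods, Parameter Description: -- user_agent: str - user_agent value +- user_agent: str - user_agent value Returns: None @@ -2512,7 +2529,7 @@ Chrome's configuration is too hard to remember, so the commonly used settings are written as simple methods, Parameter Description: -- proxy: str - proxy value +- proxy: str - proxy value Returns: None @@ -2524,7 +2541,7 @@ Chrome's configuration is too hard to remember, so the commonly used settings are written as simple methods, Parameter Description: -- driver_path: bool  - chromedriver.exe path -- chrome_path: bool - chrome.exe path +- driver_path: str - chromedriver.exe path +- chrome_path: str - chrome.exe path Returns: bool \ No newline at end of file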
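
The query-string rules documented for `ele()` in this change (`@` for attributes, `=` for exact match, `:` for fuzzy match, plain text as the default) can be made concrete by mapping each form to an XPath expression. The sketch below is only an illustration of those rules, not DrissionPage's actual parser: `query_to_xpath` and `_condition` are hypothetical helper names, and quoting of values that contain double quotes is deliberately ignored.

```python
def _condition(cond: str) -> str:
    """Turn 'name=value', 'name:value' or a bare 'name' into an XPath predicate."""
    # Whichever separator appears first decides the match type.
    hits = [(cond.find(sep), sep) for sep in ('=', ':') if sep in cond]
    if not hits:
        return f'[@{cond}]'  # the attribute only has to exist
    pos, sep = min(hits)
    name, value = cond[:pos], cond[pos + 1:]
    target = 'text()' if name == 'text()' else f'@{name}'
    if sep == '=':
        return f'[{target}="{value}"]'        # exact match
    return f'[contains({target},"{value}")]'  # fuzzy match


def query_to_xpath(query: str) -> str:
    """Translate a query string in the documented syntax into XPath (hypothetical helper)."""
    if query.startswith('xpath:'):
        return query[len('xpath:'):]
    if query.startswith('css:'):
        raise ValueError('css selectors are not XPath and need separate handling')
    if query.startswith('tag:'):
        tag, _, cond = query[len('tag:'):].partition('@')
        return f'//{tag}' + (_condition(cond) if cond else '')
    if query.startswith('@'):
        return '//*' + _condition(query[1:])
    if query.startswith('text='):
        return '//*' + _condition('text()=' + query[len('text='):])
    if query.startswith('text:'):
        return '//*' + _condition('text():' + query[len('text:'):])
    # No control prefix: fuzzy text search by default.
    return '//*' + _condition('text():' + query)


if __name__ == '__main__':
    for q in ('@class:ele_class', 'tag:div@class=ele_class', 'tag:p', 'some_text'):
        print(q, '->', query_to_xpath(q))
```

Keeping the separator detection in one helper means the `tag:`, `@` and `text` forms all share the same exact/fuzzy logic, which mirrors how the documentation describes `=` and `:` independently of the prefix.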