Web Scraper Provider
The Web Scraper Provider finds and extracts data from an HTML website.
Use the Web Scraper Provider whenever you need to extract values from websites, such as fuel prices, headlines, playlist titles, statistics, game results, or measurement or surveillance data, or to monitor status pages.
Configuration
- URL
  The website URL to query.
- HTTP Method

  | Method | Description |
  |--------|-------------|
  | Get | HTTP GET method, this is the default. |
  | Post | HTTP POST method |
  | Put | HTTP PUT method |

- Request Body
  POST or PUT body data. This can contain a JSON string to request certain API data.
- Request MIME Type
  POST or PUT body content MIME type like "text/plain" or "application/json".
- Authentication
  The following fields set up HTTP authentication.
  - Authentication Method
    Selects the HTTP authentication method:

    | Method | Description |
    |--------|-------------|
    | None | No authentication required |
    | Basic | Basic access authentication |
    | Token | Bearer Token authentication |

  - Username
    Username if basic authentication is required; leave empty if HTTP authentication is not necessary.
  - Password
    Password if basic authentication is required.
  - Token
    A token string for bearer authentication.
- Confidential URL Parameter
  The following parameters can be used in the URL:
  - Username
    Username for the URL, used as `$USER` in the URL, like `…/xyz.html?user=$USER`.
  - Password
    Password for the URL, used as `$PASS` in the URL, like `…/xyz.html?pw=$PASS`.
  - Optional Parameter
    Secure optional parameter, such as a code, for the URL, used as `$PARA` in the URL, like `…/xyz.html?code=$PARA`.
- Interval
  The query interval in seconds.
- Ignore Sleep
  True to ignore sleep, false to pause requests during sleep. False is the default to save power.
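The way the URL, method, body, MIME type, authentication and confidential URL parameter fields could fit together in one HTTP request can be sketched in Python. This is only an illustration under stated assumptions, not tyckr's implementation: the function names `expand_url`, `auth_header` and `build_request` are hypothetical, `example.com` stands in for a real site, and percent-encoding the substituted values is an assumption of this sketch.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def expand_url(url, user="", password="", param=""):
    """Replace the $USER, $PASS and $PARA placeholders in the URL."""
    # Assumption: values are percent-encoded before insertion. The source
    # does not say whether the provider encodes them.
    for placeholder, value in (("$USER", user), ("$PASS", password), ("$PARA", param)):
        url = url.replace(placeholder, quote(value, safe=""))
    return url


def auth_header(method="None", username="", password="", token=""):
    """Return the Authorization header value for the chosen method, or None."""
    if method == "Basic":
        credentials = f"{username}:{password}".encode("utf-8")
        return "Basic " + base64.b64encode(credentials).decode("ascii")
    if method == "Token":
        return "Bearer " + token
    return None


def build_request(url, http_method="GET", body=None, mime_type=None, authorization=None):
    """Assemble (but do not send) a request from the configuration fields."""
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=http_method)
    if data is not None and mime_type:
        request.add_header("Content-Type", mime_type)
    if authorization:
        request.add_header("Authorization", authorization)
    return request


# Hypothetical example values.
url = expand_url("https://example.com/xyz.html?user=$USER&pw=$PASS", "alice", "s3cret&x")
request = build_request(
    url,
    http_method="POST",
    body=json.dumps({"station": 42}),
    mime_type="application/json",
    authorization=auth_header("Basic", "alice", "s3cret"),
)
print(request.full_url)
print(request.get_method(), request.header_items())
```

The request is only constructed here, never sent, so the sketch can be run without network access.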
Query
- Value
  Returns the response value.
  - Param 1
    A valid selector expression, see below.
    Tip: To get a CSS selector expression for an element within an HTML page:
    - Open the relevant page and copy its URL to the configuration URL.
    - Using Firefox: Open the Web Developer Tools and select the element using the inspector. Right-click the HTML element text and select Copy > CSS Selector. Copy the selector to Param 1.
    - Using Chrome: Open More tools > Developer tools and select the element using the inspector. Right-click the HTML element text and select Copy > Copy selector. Copy the selector to Param 1.
- Status
  Get the result status:

  | Text | Numeric | Description |
  |------|---------|-------------|
  | N/A | 0 | No result available. |
  | Excellent | 1 | Result answer available. |
  | Fail | 5 | Result parsing or format error |
Selector expression overview
A CSS (or jQuery) selector syntax is used to find matching elements, which allows powerful and robust queries.
tyckr uses the jsoup engine for data extraction; the following overview is taken from the jsoup documentation:
- `tagname`: find elements by tag, e.g. `a`
- `ns|tag`: find elements by tag in a namespace, e.g. `fb|name` finds `<fb:name>` elements
- `#id`: find elements by ID, e.g. `#logo`
- `.class`: find elements by class name, e.g. `.masthead`
- `[attribute]`: elements with attribute, e.g. `[href]`
- `[^attr]`: elements with an attribute name prefix, e.g. `[^data-]` finds elements with HTML5 dataset attributes
- `[attr=value]`: elements with attribute value, e.g. `[width=500]` (also quotable, like `[data-name='launch sequence']`)
- `[attr^=value]`, `[attr$=value]`, `[attr*=value]`: elements with attributes that start with, end with, or contain the value, e.g. `[href*=/path/]`
- `[attr~=regex]`: elements with attribute values that match regular expression; e.g. `img[src~=(?i)\.(png|jpe?g)]`
- `*`: all elements, e.g. `*`
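To get a feeling for which attribute values a selector like `img[src~=(?i)\.(png|jpe?g)]` would accept, the regular expression can be tried on its own. The following is a minimal sketch using Python's `re` module rather than jsoup, assuming the pattern is applied as an unanchored search on the attribute value; the `src` values are hypothetical.

```python
import re

# Pattern taken from the example selector img[src~=(?i)\.(png|jpe?g)]
pattern = re.compile(r"(?i)\.(png|jpe?g)")

# Hypothetical src attribute values
sources = ["logo.PNG", "photo.jpeg", "banner.jpg", "icon.gif", "archive.png.zip"]

# search() matches anywhere in the value, so "archive.png.zip" is accepted too;
# append $ to the pattern to require the extension at the end of the value.
matching = [src for src in sources if pattern.search(src)]
print(matching)
```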
Selector combinations
- `el#id`: elements with ID, e.g. `div#logo`
- `el.class`: elements with class, e.g. `div.masthead`
- `el[attr]`: elements with attribute, e.g. `a[href]`
- Any combination, e.g. `a[href].highlight`
- `ancestor child`: child elements that descend from ancestor, e.g. `.body p` finds `p` elements anywhere under a block with class "body"
- `parent > child`: child elements that descend directly from parent, e.g. `div.content > p` finds `p` elements; and `body > *` finds the direct children of the body tag
- `siblingA + siblingB`: finds sibling B element immediately preceded by sibling A, e.g. `div.head + div`
- `siblingA ~ siblingX`: finds sibling X element preceded by sibling A, e.g. `h1 ~ p`
- `el, el, el`: group multiple selectors, find unique elements that match any of the selectors; e.g. `div.masthead, div.logo`
Pseudo selectors
- `:lt(n)`: find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. `td:lt(3)`
- `:gt(n)`: find elements whose sibling index is greater than n; e.g. `div p:gt(2)`
- `:eq(n)`: find elements whose sibling index is equal to n; e.g. `form input:eq(1)`
- `:has(selector)`: find elements that contain elements matching the selector; e.g. `div:has(p)`
- `:not(selector)`: find elements that do not match the selector; e.g. `div:not(.logo)`
- `:contains(text)`: find elements that contain the given text. The search is case-insensitive; e.g. `p:contains(jsoup)`
- `:containsOwn(text)`: find elements that directly contain the given text
- `:matches(regex)`: find elements whose text matches the specified regular expression; e.g. `div:matches((?i)login)`
- `:matchesOwn(regex)`: find elements whose own text matches the specified regular expression
Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc.
See the Selector API reference for the full supported list and details.
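The 0-based sibling index used by `:lt(n)`, `:gt(n)` and `:eq(n)` is easy to get wrong, because it counts the position among all children of the parent, not only among elements with the same tag. The following is a minimal sketch of that rule using Python's `xml.etree.ElementTree` instead of jsoup; the helper `by_sibling_index` and the table row are hypothetical, and the behaviour shown follows the description above rather than a test against jsoup itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical table row; the <th> cell occupies sibling index 0.
row = ET.fromstring("<tr><th>Fuel</th><td>Diesel</td><td>1.62</td><td>EUR</td></tr>")


def by_sibling_index(parent, tag, predicate):
    """Return the text of children with the given tag whose 0-based position
    among all children of parent satisfies predicate."""
    return [
        child.text
        for index, child in enumerate(parent)
        if child.tag == tag and predicate(index)
    ]


print(by_sibling_index(row, "td", lambda i: i < 3))   # analogous to td:lt(3)
print(by_sibling_index(row, "td", lambda i: i == 1))  # analogous to td:eq(1)
print(by_sibling_index(row, "td", lambda i: i > 2))   # analogous to td:gt(2)
```

In this row, the analogue of `td:eq(1)` yields the first `td` cell ("Diesel"), because the `th` cell already takes index 0.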