Compare commits

...

9 Commits

10 changed files with 104 additions and 44 deletions

View File

@@ -13,15 +13,33 @@ _Live your data-life pro-actively, Detect website changes and perform meaningful
- Chrome browser included.
- Super fast, no registration needed setup.
- Start watching and receiving change notifications instantly.
- Get started watching and receiving website change notifications straight away.
Easily see what changed, examine by word, line, or individual character.
### Target specific parts of the webpage using the Visual Selector tool.
<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/docs/screenshot-diff.png" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />
Available when connected to a <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Playwright-content-fetcher">playwright content fetcher</a> (included as part of our subscription service)
[<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/docs/visualselector-anim.gif" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />](https://lemonade.changedetection.io/start?src=github)
### Easily see what changed, examine by word, line, or individual character.
[<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/docs/screenshot-diff.png" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />](https://lemonade.changedetection.io/start?src=github)
#### Example use cases
### Perform interactive browser steps
Fill in text boxes, click buttons and more, setup your changedetection scenario.
Using the **Browser Steps** configuration, add basic steps before performing change detection, such as logging into websites, adding a product to a cart, accept cookie logins, entering dates and refining searches.
[<img src="docs/browsersteps-anim.gif" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Website change detection with interactive browser steps, login, cookies etc" />](https://lemonade.changedetection.io/start?src=github)
After **Browser Steps** have been run, then visit the **Visual Selector** tab to refine the content you're interested in.
Requires Playwright to be enabled.
### Example use cases
- Products and services have a change in pricing
- _Out of stock notification_ and _Back In stock notification_
@@ -59,27 +77,8 @@ _Need an actual Chrome runner with Javascript support? We support fetching via W
We [recommend and use Bright Data](https://brightdata.grsm.io/n0r16zf7eivq) global proxy services, Bright Data will match any first deposit up to $100 using our signup link.
## Screenshots
Please :star: star :star: this project and help it grow! https://github.com/dgtlmoon/changedetection.io/
### Target specific parts of the webpage using the Visual Selector tool.
Available when connected to a <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Playwright-content-fetcher">playwright content fetcher</a> (included as part of our subscription service)
<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/docs/visualselector-anim.gif" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />
### Perform interactive browser steps
Fill in text boxes, click buttons and more, setup your changedetection scenario.
Using the **Browser Steps** configuration, add basic steps before performing change detection, such as logging into websites, adding a product to a cart, accept cookie logins, entering dates and refining searches.
<img src="docs/browsersteps-anim.gif" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Website change detection with interactive browser steps, login, cookies etc" />
After **Browser Steps** have been run, then visit the **Visual Selector** tab to refine the content you're interested in.
Requires Playwright to be enabled.
## Installation
### Docker

View File

@@ -27,6 +27,7 @@ from flask import (
session,
url_for,
)
from flask_compress import Compress as FlaskCompress
from flask_login import login_required
from flask_restful import abort, Api
from flask_wtf import CSRFProtect
@@ -51,6 +52,10 @@ app = Flask(__name__,
static_url_path="",
static_folder="static",
template_folder="templates")
from flask_compress import Compress
# Super handy for compressing large BrowserSteps responses and others
FlaskCompress(app)
# Stop browser caching of assets
app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0

View File

@@ -210,16 +210,29 @@ def construct_blueprint(datastore: ChangeDetectionStore):
except playwright._impl._api_types.Error as e:
return make_response("Browser session ran out of time :( Please reload this page."+str(e), 401)
p = {'screenshot': "data:image/png;base64,{}".format(
# Use send_file() which is way faster than read/write loop on bytes
import json
from tempfile import mkstemp
from flask import send_file
tmp_fd, tmp_file = mkstemp(text=True, suffix=".json", prefix="changedetectionio-")
output = json.dumps({'screenshot': "data:image/png;base64,{}".format(
base64.b64encode(state[0]).decode('ascii')),
'xpath_data': state[1],
'session_age_start': this_session.age_start,
'browser_time_remaining': round(remaining)
}
})
with os.fdopen(tmp_fd, 'w') as f:
f.write(output)
# @todo BSON/binary JSON, faster xfer, OR pick it off the disk
return p
response = make_response(send_file(path_or_file=tmp_file,
mimetype='application/json; charset=UTF-8',
etag=True))
# No longer needed
os.unlink(tmp_file)
return response
return browser_steps_blueprint

View File

@@ -22,6 +22,7 @@ browser_step_ui_config = {'Choose one': '0 0',
'Click element': '1 0',
'Click element containing text': '0 1',
'Enter text in field': '1 1',
'Execute JS': '0 1',
# 'Extract text and use as filter': '1 0',
'Goto site': '0 0',
'Press Enter': '0 0',
@@ -97,6 +98,9 @@ class steppable_browser_interface():
self.page.fill(selector, value, timeout=10 * 1000)
def action_execute_js(self, selector, value):
self.page.evaluate(value)
def action_click_element(self, selector, value):
print("Clicking element")
if not len(selector.strip()):
@@ -232,7 +236,7 @@ class browsersteps_live_ui(steppable_browser_interface):
self.page.evaluate("var include_filters=''")
# Go find the interactive elements
# @todo in the future, something smarter that can scan for elements with .click/focus etc event handlers?
elements = 'a,button,input,select,textarea,i,th,td,p,li,h1,h2,h3,h4'
elements = 'a,button,input,select,textarea,i,th,td,p,li,h1,h2,h3,h4,div,span'
xpath_element_js = xpath_element_js.replace('%ELEMENTS%', elements)
xpath_data = self.page.evaluate("async () => {" + xpath_element_js + "}")
# So the JS will find the smallest one first

View File

@@ -400,6 +400,15 @@ class watchForm(commonSettingsForm):
self.body.errors.append('Body must be empty when Request Method is set to GET')
result = False
# Attempt to validate jinja2 templates in the URL
from jinja2 import Environment
# Jinja2 available in URLs along with https://pypi.org/project/jinja2-time/
jinja2_env = Environment(extensions=['jinja2_time.TimeExtension'])
try:
ready_url = str(jinja2_env.from_string(self.url.data).render())
except Exception as e:
self.url.errors.append('Invalid template syntax')
result = False
return result

View File

@@ -93,12 +93,23 @@ class model(dict):
@property
def link(self):
url = self.get('url', '')
ready_url = url
if '{%' in url or '{{' in url:
from jinja2 import Environment
# Jinja2 available in URLs along with https://pypi.org/project/jinja2-time/
jinja2_env = Environment(extensions=['jinja2_time.TimeExtension'])
return str(jinja2_env.from_string(url).render())
return url
try:
ready_url = str(jinja2_env.from_string(url).render())
except Exception as e:
from flask import (
flash, Markup, url_for
)
message = Markup('<a href="{}#general">The URL {} is invalid and cannot be used, click to edit</a>'.format(
url_for('edit_page', uuid=self.get('uuid')), self.get('url', '')))
flash(message, 'error')
return ''
return ready_url
@property
def label(self):

View File

@@ -1,3 +1,13 @@
// @file Scrape the page looking for elements of concern (%ELEMENTS%)
// http://matatk.agrip.org.uk/tests/position-and-width/
// https://stackoverflow.com/questions/26813480/when-is-element-getboundingclientrect-guaranteed-to-be-updated-accurate
//
// Some pages like https://www.londonstockexchange.com/stock/NCCL/ncondezi-energy-limited/analysis
// will automatically force a scroll somewhere, so include the position offset
// Lets hope the position doesnt change while we iterate the bbox's, but this is better than nothing
var scroll_y=+document.documentElement.scrollTop || document.body.scrollTop
// Include the getXpath script directly, easier than fetching
function getxpath(e) {
var n = e;
@@ -71,8 +81,13 @@ var bbox;
for (var i = 0; i < elements.length; i++) {
bbox = elements[i].getBoundingClientRect();
// forget really small ones
if (bbox['width'] < 15 && bbox['height'] < 15) {
// Forget really small ones
if (bbox['width'] < 10 && bbox['height'] < 10) {
continue;
}
// Don't include elements that are offset from canvas
if (bbox['top']+scroll_y < 0 || bbox['left'] < 0) {
continue;
}
@@ -109,14 +124,16 @@ for (var i = 0; i < elements.length; i++) {
continue;
}
// @todo Possible to ONLY list where it's clickable to save JSON xfer size
size_pos.push({
xpath: xpath_result,
width: Math.round(bbox['width']),
height: Math.round(bbox['height']),
left: Math.floor(bbox['left']),
top: Math.floor(bbox['top']),
top: Math.floor(bbox['top'])+scroll_y,
tagName: (elements[i].tagName) ? elements[i].tagName.toLowerCase() : '',
tagtype: (elements[i].tagName == 'INPUT' && elements[i].type) ? elements[i].type.toLowerCase() : ''
tagtype: (elements[i].tagName == 'INPUT' && elements[i].type) ? elements[i].type.toLowerCase() : '',
isClickable: (elements[i].onclick) || window.getComputedStyle(elements[i]).cursor == "pointer"
});
}
@@ -150,6 +167,7 @@ if (include_filters.length) {
if (q) {
bbox = q.getBoundingClientRect();
console.log("xpath_element_scraper: Got filter element, scroll from top was "+scroll_y)
} else {
console.log("xpath_element_scraper: filter element "+f+" was not found");
}
@@ -157,10 +175,10 @@ if (include_filters.length) {
if (bbox && bbox['width'] > 0 && bbox['height'] > 0) {
size_pos.push({
xpath: f,
width: Math.round(bbox['width']),
height: Math.round(bbox['height']),
left: Math.floor(bbox['left']),
top: Math.floor(bbox['top'])
width: parseInt(bbox['width']),
height: parseInt(bbox['height']),
left: parseInt(bbox['left']),
top: parseInt(bbox['top'])+scroll_y
});
}
}

View File

@@ -196,9 +196,7 @@ $(document).ready(function () {
$('input[placeholder="Value"]', first_available).addClass('ok').click().focus();
found_something = true;
} else {
// Assume it's just for clicking on
// what are we clicking on?
if (x['tagName'].startsWith('h')|| x['tagName'] === 'a' || x['tagName'] === 'button' || x['tagtype'] === 'submit'|| x['tagtype'] === 'checkbox'|| x['tagtype'] === 'radio'|| x['tagtype'] === 'li') {
if (x['isClickable'] || x['tagName'].startsWith('h')|| x['tagName'] === 'a' || x['tagName'] === 'button' || x['tagtype'] === 'submit'|| x['tagtype'] === 'checkbox'|| x['tagtype'] === 'radio'|| x['tagtype'] === 'li') {
$('select', first_available).val('Click element').change();
$('input[type=text]', first_available).first().val(x['xpath']);
found_something = true;
@@ -220,8 +218,6 @@ $(document).ready(function () {
}
}
}
} else {
}
}
@@ -252,6 +248,7 @@ $(document).ready(function () {
}).done(function (data) {
xpath_data = data.xpath_data;
$('#browsersteps-img').attr('src', data.screenshot);
$("#loading-status-text").fadeIn();
// This should trigger 'Goto site'
$('#browser_steps >li:first-child .apply').click();
browserless_seconds_remaining = data.browser_time_remaining;
@@ -393,6 +390,7 @@ $(document).ready(function () {
400: function () {
// More than likely the CSRF token was lost when the server restarted
alert("There was a problem processing the request, please reload the page.");
$("#loading-status-text").hide();
}
}
}).done(function (data) {
@@ -404,12 +402,14 @@ $(document).ready(function () {
$("#browsersteps-img").css('opacity',1);
$('ul#browser_steps li .control .apply').css('opacity',1);
browserless_seconds_remaining = data.browser_time_remaining;
$("#loading-status-text").hide();
}).fail(function (data) {
console.log(data);
if (data.responseText.includes("Browser session expired")) {
disable_browsersteps_ui();
}
apply_buttons_disabled=false;
$("#loading-status-text").hide();
$('ul#browser_steps li .control .apply').css('opacity',1);
$("#browsersteps-img").css('opacity',1);
//$('#browsersteps-selector-wrapper .loader').fadeOut(2500);

View File

@@ -160,7 +160,7 @@ User-Agent: wonderbra 1.0") }}
<input type=checkbox id="include_text_elements" > <label for="include_text_elements">Turn on text finder</label>
</div>
<div id="loading-status-text" style="display: none;">Please wait, first browser step can take a little time to load..<div class="spinner"></div></div>
<div class="flex-wrapper" >
<div id="browser-steps-ui" class="noselect" style="width: 100%; background-color: #eee; border-radius: 5px;">

View File

@@ -1,5 +1,6 @@
flask~=2.0
flask_wtf
flask-compress
eventlet>=0.31.0
validators
timeago~=1.0