Compare commits
30 Commits
bugfix-las
...
puppeteer-
| Author | SHA1 | Date | |
|---|---|---|---|
| | 78bc6ae0d3 | | |
| | c07ab75837 | | |
| | 0c7689fbd5 | | |
| | 96dc49e229 | | |
| | 5f43d988a3 | | |
| | 4269079c54 | | |
| | cdfb3f206c | | |
| | 9f326783e5 | | |
| | 4e6e680d79 | | |
| | 1378b5b2ff | | |
| | 456c6e3f58 | | |
| | 61be7f68db | | |
| | 0e38a3c881 | | |
| | 2c630e9853 | | |
| | 786e0d1fab | | |
| | 78b7aee512 | | |
| | 9d9d01863a | | |
| | 108cdf84a5 | | |
| | 8c6f6f1578 | | |
| | df4ffaaff8 | | |
| | d522c65e50 | | |
| | c3b2a8b019 | | |
| | 28d3151090 | | |
| | 2a1c832f8d | | |
| | 0170adb171 | | |
| | cb62404b8c | | |
| | 8f9c46bd3f | | |
| | 97291ce6d0 | | |
| | f689e5418e | | |
| | f751f0b0ef | | |
1
.github/workflows/test-only.yml
vendored
@@ -28,7 +28,6 @@ jobs:
     uses: ./.github/workflows/test-stack-reusable-workflow.yml
     with:
       python-version: '3.11'
-      skip-pypuppeteer: true
 
   test-application-3-12:
     needs: lint-code
@@ -7,7 +7,7 @@ on:
         description: 'Python version to use'
         required: true
         type: string
-        default: '3.10'
+        default: '3.11'
       skip-pypuppeteer:
         description: 'Skip PyPuppeteer (not supported in 3.11/3.12)'
         required: false
@@ -1,8 +1,5 @@
 # pip dependencies install stage
 
-# @NOTE! I would love to move to 3.11 but it breaks the async handler in changedetectionio/content_fetchers/puppeteer.py
-# If you know how to fix it, please do! and test it for both 3.10 and 3.11
-
 ARG PYTHON_VERSION=3.11
 
 FROM python:${PYTHON_VERSION}-slim-bookworm AS builder
@@ -1,9 +1,9 @@
 recursive-include changedetectionio/api *
-recursive-include changedetectionio/apprise_plugin *
 recursive-include changedetectionio/blueprint *
 recursive-include changedetectionio/content_fetchers *
 recursive-include changedetectionio/conditions *
 recursive-include changedetectionio/model *
+recursive-include changedetectionio/notification *
 recursive-include changedetectionio/processors *
 recursive-include changedetectionio/static *
 recursive-include changedetectionio/templates *
@@ -89,7 +89,7 @@ _Need an actual Chrome runner with Javascript support? We support fetching via W
 #### Key Features
 
 - Lots of trigger filters, such as "Trigger on text", "Remove text by selector", "Ignore text", "Extract text", also using regular-expressions!
-- Target elements with xPath(1.0) and CSS Selectors, Easily monitor complex JSON with JSONPath or jq
+- Target elements with xPath 1 and xPath 2, CSS Selectors, Easily monitor complex JSON with JSONPath or jq
 - Switch between fast non-JS and Chrome JS based "fetchers"
 - Track changes in PDF files (Monitor text changed in the PDF, Also monitor PDF filesize and checksums)
 - Easily specify how often a site should be checked
@@ -105,6 +105,12 @@ We [recommend and use Bright Data](https://brightdata.grsm.io/n0r16zf7eivq) glob
 
 Please :star: star :star: this project and help it grow! https://github.com/dgtlmoon/changedetection.io/
 
+### Conditional web page changes
+
+Easily [configure conditional actions](https://changedetection.io/tutorial/conditional-actions-web-page-changes), for example, only trigger when a price is above or below a preset amount, or [when a web page includes (or does not include) a keyword](https://changedetection.io/tutorial/how-monitor-keywords-any-website)
+
+<img src="./docs/web-page-change-conditions.png" style="max-width:80%;" alt="Conditional web page changes" title="Conditional web page changes" />
+
 ### Schedule web page watches in any timezone, limit by day of week and time.
 
 Easily set a re-check schedule, for example you could limit the web page change detection to only operate during business hours.
@@ -2,7 +2,7 @@
 
 # Read more https://github.com/dgtlmoon/changedetection.io/wiki
 
-__version__ = '0.49.9'
+__version__ = '0.49.12'
 
 from changedetectionio.strtobool import strtobool
 from json.decoder import JSONDecodeError
@@ -11,6 +11,7 @@ os.environ['EVENTLET_NO_GREENDNS'] = 'yes'
 import eventlet
 import eventlet.wsgi
 import getopt
+import platform
 import signal
 import socket
 import sys
@@ -19,7 +20,6 @@ from changedetectionio import store
 from changedetectionio.flask_app import changedetection_app
 from loguru import logger
 
-
 # Only global so we can access it in the signal handler
 app = None
 datastore = None
@@ -29,8 +29,6 @@ def get_version():
 
 # Parent wrapper or OS sends us a SIGTERM/SIGINT, do everything required for a clean shutdown
 def sigshutdown_handler(_signo, _stack_frame):
-    global app
-    global datastore
     name = signal.Signals(_signo).name
     logger.critical(f'Shutdown: Got Signal - {name} ({_signo}), Saving DB to disk and calling shutdown')
     datastore.sync_to_json()
@@ -147,6 +145,19 @@ def main():
 
     signal.signal(signal.SIGTERM, sigshutdown_handler)
     signal.signal(signal.SIGINT, sigshutdown_handler)
 
+    # Custom signal handler for memory cleanup
+    def sigusr_clean_handler(_signo, _stack_frame):
+        from changedetectionio.gc_cleanup import memory_cleanup
+        logger.info('SIGUSR1 received: Running memory cleanup')
+        return memory_cleanup(app)
+
+    # Register the SIGUSR1 signal handler
+    # Only register the signal handler if running on Linux
+    if platform.system() == "Linux":
+        signal.signal(signal.SIGUSR1, sigusr_clean_handler)
+    else:
+        logger.info("SIGUSR1 handler only registered on Linux, skipped.")
+
     # Go into cleanup mode
     if do_cleanup:
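The handler registration in the hunk above can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `memory_cleanup_sketch` is a hypothetical stand-in for `changedetectionio.gc_cleanup.memory_cleanup` (whose body is not shown in this diff), and the sketch checks for the presence of `signal.SIGUSR1` instead of comparing `platform.system()` to `"Linux"` as the diff does.

```python
import gc
import signal


def memory_cleanup_sketch():
    """Hypothetical stand-in for memory_cleanup(); just runs the garbage collector."""
    # gc.collect() returns the number of unreachable objects it found
    return gc.collect()


def sigusr_clean_handler(_signo, _stack_frame):
    # Signal handlers receive the signal number and the current stack frame
    memory_cleanup_sketch()


def register_cleanup_signal():
    """Register the handler where SIGUSR1 exists; return True if it was registered."""
    sigusr1 = getattr(signal, "SIGUSR1", None)  # absent on Windows
    if sigusr1 is None:
        return False
    # signal.signal() must be called from the main thread
    signal.signal(sigusr1, sigusr_clean_handler)
    return True
```

Once registered, sending the signal to the process (for example `kill -USR1 <pid>` from a shell) would invoke the handler.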
@@ -1,5 +1,7 @@
 # Responsible for building the storage dict into a set of rules ("JSON Schema") acceptable via the API
 # Probably other ways to solve this when the backend switches to some ORM
+from changedetectionio.notification import valid_notification_formats
+
 
 def build_time_between_check_json_schema():
     # Setup time between check schema
@@ -98,8 +100,6 @@ def build_watch_json_schema(d):
         }
     }
 
-    from changedetectionio.notification import valid_notification_formats
-
     schema['properties']['notification_format'] = {'type': 'string',
                                                    'enum': list(valid_notification_formats.keys())
                                                    }
@@ -20,10 +20,7 @@ def login_optionally_required(func):
         has_password_enabled = datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False)
 
         # Permitted
-        if request.endpoint and 'static_content' in request.endpoint and request.view_args and request.view_args.get('group') == 'styles':
-            return func(*args, **kwargs)
-        # Permitted
-        elif request.endpoint and 'diff_history_page' in request.endpoint and datastore.data['settings']['application'].get('shared_diff_access'):
+        if request.endpoint and 'diff_history_page' in request.endpoint and datastore.data['settings']['application'].get('shared_diff_access'):
             return func(*args, **kwargs)
         elif request.method in flask_login.config.EXEMPT_METHODS:
             return func(*args, **kwargs)
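The two early exits that remain after this hunk can be modelled as a small pure function. This is only a sketch of the branches visible in the diff: `early_permit` is a hypothetical name, the rest of the decorator is not shown here and is not modelled, and `EXEMPT_METHODS` is assumed to match flask_login's default of `{"OPTIONS"}`.

```python
# Assumption: mirrors flask_login.config.EXEMPT_METHODS, which defaults to {"OPTIONS"}
EXEMPT_METHODS = {"OPTIONS"}


def early_permit(endpoint, method, shared_diff_access):
    """Return True if one of the two visible branches would call the view directly.

    False means the request falls through to checks not shown in this diff.
    """
    if endpoint and "diff_history_page" in endpoint and shared_diff_access:
        return True
    if method in EXEMPT_METHODS:
        return True
    return False
```

Under this model, a request for a `static_content` endpoint no longer takes an early exit, which matches the branch removed in the hunk.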
@@ -23,7 +23,6 @@ from loguru import logger
 browsersteps_sessions = {}
 io_interface_context = None
 import json
-import base64
 import hashlib
 from flask import Response
 
@@ -34,10 +33,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
     from . import nonContext
     from . import browser_steps
     import time
-    global browsersteps_sessions
     global io_interface_context
 
-
     # We keep the playwright session open for many minutes
     keepalive_seconds = int(os.getenv('BROWSERSTEPS_MINUTES_KEEPALIVE', 10)) * 60
 
@@ -104,8 +101,6 @@ def construct_blueprint(datastore: ChangeDetectionStore):
         # A new session was requested, return sessionID
 
         import uuid
-        global browsersteps_sessions
-
         browsersteps_session_id = str(uuid.uuid4())
         watch_uuid = request.args.get('uuid')
 
@@ -149,7 +144,6 @@ def construct_blueprint(datastore: ChangeDetectionStore):
     def browsersteps_ui_update():
         import base64
         import playwright._impl._errors
-        global browsersteps_sessions
         from changedetectionio.blueprint.browser_steps import browser_steps
 
         remaining =0
@@ -4,7 +4,7 @@ import re
 from random import randint
 from loguru import logger
 
-from changedetectionio.content_fetchers.helpers import capture_stitched_together_full_page, SCREENSHOT_SIZE_STITCH_THRESHOLD
+from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT
 from changedetectionio.content_fetchers.base import manage_user_agent
 from changedetectionio.safe_jinja import render as jinja_render
 
@@ -293,19 +293,16 @@ class browsersteps_live_ui(steppable_browser_interface):
     def get_current_state(self):
         """Return the screenshot and interactive elements mapping, generally always called after action_()"""
         import importlib.resources
+        import json
+        # because we for now only run browser steps in playwright mode (not puppeteer mode)
+        from changedetectionio.content_fetchers.playwright import capture_full_page
 
         xpath_element_js = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text()
 
         now = time.time()
         self.page.wait_for_timeout(1 * 1000)
 
-        full_height = self.page.evaluate("document.documentElement.scrollHeight")
-
-        if full_height >= SCREENSHOT_SIZE_STITCH_THRESHOLD:
-            logger.warning(f"Page full Height: {full_height}px longer than {SCREENSHOT_SIZE_STITCH_THRESHOLD}px, using 'stitched screenshot method'.")
-            screenshot = capture_stitched_together_full_page(self.page)
-        else:
-            screenshot = self.page.screenshot(type='jpeg', full_page=True, quality=40)
+        screenshot = capture_full_page(page=self.page)
 
         logger.debug(f"Time to get screenshot from browser {time.time() - now:.2f}s")
 
@@ -313,13 +310,21 @@ class browsersteps_live_ui(steppable_browser_interface):
         self.page.evaluate("var include_filters=''")
         # Go find the interactive elements
         # @todo in the future, something smarter that can scan for elements with .click/focus etc event handlers?
-        elements = 'a,button,input,select,textarea,i,th,td,p,li,h1,h2,h3,h4,div,span'
-        xpath_element_js = xpath_element_js.replace('%ELEMENTS%', elements)
 
-        xpath_data = self.page.evaluate("async () => {" + xpath_element_js + "}")
+        self.page.request_gc()
+
+        scan_elements = 'a,button,input,select,textarea,i,th,td,p,li,h1,h2,h3,h4,div,span'
+
+        MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
+        xpath_data = json.loads(self.page.evaluate(xpath_element_js, {
+            "visualselector_xpath_selectors": scan_elements,
+            "max_height": MAX_TOTAL_HEIGHT
+        }))
+        self.page.request_gc()
+
         # So the JS will find the smallest one first
         xpath_data['size_pos'] = sorted(xpath_data['size_pos'], key=lambda k: k['width'] * k['height'], reverse=True)
-        logger.debug(f"Time to scrape xpath element data in browser {time.time()-now:.2f}s")
+        logger.debug(f"Time to scrape xPath element data in browser {time.time()-now:.2f}s")
 
         # playwright._impl._api_types.Error: Browser closed.
         # @todo show some countdown timer?
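The `sorted(...)` call kept as context in this hunk orders the scraped element boxes by pixel area, largest first. A minimal sketch with made-up sample data (the `width` and `height` keys come from the diff; the `xpath` values and the helper name are hypothetical):

```python
def sort_by_area_desc(size_pos):
    """Order element boxes by width * height, largest area first."""
    return sorted(size_pos, key=lambda k: k["width"] * k["height"], reverse=True)


# Hypothetical sample of what xpath_data['size_pos'] might contain
boxes = [
    {"xpath": "//a[1]", "width": 50, "height": 10},     # area 500
    {"xpath": "//div[1]", "width": 800, "height": 600},  # area 480000
    {"xpath": "//p[2]", "width": 300, "height": 40},     # area 12000
]

ordered = sort_by_area_desc(boxes)
```

`sorted` returns a new list and is stable, so boxes with equal area keep their original relative order.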
@@ -22,6 +22,7 @@
 <li class="tab"><a href="#notifications">Notifications</a></li>
 <li class="tab"><a href="#fetching">Fetching</a></li>
 <li class="tab"><a href="#filters">Global Filters</a></li>
+<li class="tab"><a href="#ui-options">UI Options</a></li>
 <li class="tab"><a href="#api">API</a></li>
 <li class="tab"><a href="#timedate">Time & Date</a></li>
 <li class="tab"><a href="#proxies">CAPTCHA & Proxies</a></li>
@@ -217,7 +218,7 @@ nav
 <a id="chrome-extension-link"
 title="Try our new Chrome Extension!"
 href="https://chromewebstore.google.com/detail/changedetectionio-website/kefcfmgmlhmankjmnbijimhofdjekbop">
-<img alt="Chrome store icon" src="{{ url_for('static_content', group='images', filename='Google-Chrome-icon.png') }}" alt="Chrome">
+<img alt="Chrome store icon" src="{{ url_for('static_content', group='images', filename='google-chrome-icon.png') }}" alt="Chrome">
 Chrome Webstore
 </a>
 </p>
@@ -240,6 +241,12 @@ nav
 </p>
 </div>
 </div>
+<div class="tab-pane-inner" id="ui-options">
+<div class="pure-control-group">
+{{ render_checkbox_field(form.application.form.ui.form.open_diff_in_new_tab, class="open_diff_in_new_tab") }}
+<span class="pure-form-message-inline">Enable this setting to open the diff page in a new tab. If disabled, the diff page will open in the current tab.</span>
+</div>
+</div>
 <div class="tab-pane-inner" id="proxies">
 <div id="recommended-proxy">
 <div>
@@ -13,6 +13,7 @@
 /*const email_notification_prefix=JSON.parse('{{ emailprefix|tojson }}');*/
 /*{% endif %}*/
 
+{% set has_tag_filters_extra='' %}
 
 </script>
 
@@ -46,59 +47,12 @@
 </div>
 
 <div class="tab-pane-inner" id="filters-and-triggers">
-<div class="pure-control-group">
-{% set field = render_field(form.include_filters,
-rows=5,
-placeholder="#example
-xpath://body/div/span[contains(@class, 'example-class')]",
-class="m-d")
-%}
-{{ field }}
-{% if '/text()' in field %}
-<span class="pure-form-message-inline"><strong>Note!: //text() function does not work where the <element> contains <![CDATA[]]></strong></span><br>
-{% endif %}
-<span class="pure-form-message-inline">One CSS, xPath, JSON Path/JQ selector per line, <i>any</i> rules that matches will be used.<br>
-<div data-target="#advanced-help-selectors" class="toggle-show pure-button button-tag button-xsmall">Show advanced help and tips</div>
-<ul id="advanced-help-selectors">
-<li>CSS - Limit text to this CSS rule, only text matching this CSS rule is included.</li>
-<li>JSON - Limit text to this JSON rule, using either <a href="https://pypi.org/project/jsonpath-ng/" target="new">JSONPath</a> or <a href="https://stedolan.github.io/jq/" target="new">jq</a> (if installed).
-<ul>
-<li>JSONPath: Prefix with <code>json:</code>, use <code>json:$</code> to force re-formatting if required, <a href="https://jsonpath.com/" target="new">test your JSONPath here</a>.</li>
-{% if jq_support %}
-<li>jq: Prefix with <code>jq:</code> and <a href="https://jqplay.org/" target="new">test your jq here</a>. Using <a href="https://stedolan.github.io/jq/" target="new">jq</a> allows for complex filtering and processing of JSON data with built-in functions, regex, filtering, and more. See examples and documentation <a href="https://stedolan.github.io/jq/manual/" target="new">here</a>. Prefix <code>jqraw:</code> outputs the results as text instead of a JSON list.</li>
-{% else %}
-<li>jq support not installed</li>
-{% endif %}
-</ul>
-</li>
-<li>XPath - Limit text to this XPath rule, simply start with a forward-slash. To specify XPath to be used explicitly or the XPath rule starts with an XPath function: Prefix with <code>xpath:</code>
-<ul>
-<li>Example: <code>//*[contains(@class, 'sametext')]</code> or <code>xpath:count(//*[contains(@class, 'sametext')])</code>, <a
-href="http://xpather.com/" target="new">test your XPath here</a></li>
-<li>Example: Get all titles from an RSS feed <code>//title/text()</code></li>
-<li>To use XPath1.0: Prefix with <code>xpath1:</code></li>
-</ul>
-</li>
-</ul>
-Please be sure that you thoroughly understand how to write CSS, JSONPath, XPath{% if jq_support %}, or jq selector{%endif%} rules before filing an issue on GitHub! <a
-href="https://github.com/dgtlmoon/changedetection.io/wiki/CSS-Selector-help">here for more CSS selector help</a>.<br>
-</span>
-</div>
-<fieldset class="pure-control-group">
-{{ render_field(form.subtractive_selectors, rows=5, placeholder="header
-footer
-nav
-.stockticker
-//*[contains(text(), 'Advertisement')]") }}
-<span class="pure-form-message-inline">
-<ul>
-<li> Remove HTML element(s) by CSS and XPath selectors before text conversion. </li>
-<li> Don't paste HTML here, use only CSS and XPath selectors </li>
-<li> Add multiple elements, CSS or XPath selectors per line to ignore multiple parts of the HTML. </li>
-</ul>
-</span>
-</fieldset>
-
+<p>These settings are <strong><i>added</i></strong> to any existing watch configurations.</p>
+{% include "edit/include_subtract.html" %}
+<div class="text-filtering border-fieldset">
+<h3>Text filtering</h3>
+{% include "edit/text-options.html" %}
+</div>
 </div>
 
 {# rendered sub Template #}
@@ -125,7 +125,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_updat
 
         else:
             # Recheck all, including muted
-            for watch_uuid, watch in datastore.data['watching'].items():
+            # Get most overdue first
+            for k in sorted(datastore.data['watching'].items(), key=lambda item: item[1].get('last_checked', 0)):
+                watch_uuid = k[0]
+                watch = k[1]
                 if not watch['paused']:
                     if watch_uuid not in running_uuids:
                         if with_errors and not watch.get('last_error'):
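The "most overdue first" ordering introduced in this hunk sorts watches by ascending `last_checked`, treating a missing value as 0 so that never-checked watches come first. A minimal sketch with hypothetical sample data (plain dicts stand in for the project's watch objects, and the helper name is made up):

```python
def most_overdue_first(watching):
    """Return UUIDs of unpaused watches, oldest last_checked first."""
    ordered = sorted(watching.items(), key=lambda item: item[1].get("last_checked", 0))
    return [uuid for uuid, watch in ordered if not watch.get("paused")]


# Hypothetical stand-in for datastore.data['watching']
watching = {
    "aaa": {"last_checked": 300, "paused": False},
    "bbb": {"paused": False},                       # never checked, sorts as 0
    "ccc": {"last_checked": 100, "paused": False},
    "ddd": {"last_checked": 50, "paused": True},    # paused, skipped
}

queue_order = most_overdue_first(watching)
```

Sorting the `(uuid, watch)` pairs directly, as the diff does, avoids a second lookup per UUID inside the loop.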
@@ -140,7 +143,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_updat
         if i == 1:
             flash("Queued 1 watch for rechecking.")
         if i > 1:
-            flash("Queued {} watches for rechecking.".format(i))
+            flash(f"Queued {i} watches for rechecking.")
         if i == 0:
             flash("No watches available to recheck.")
 
@@ -4,7 +4,6 @@ from loguru import logger
 
 from changedetectionio.store import ChangeDetectionStore
 from changedetectionio.auth_decorator import login_optionally_required
-from changedetectionio.notification import process_notification
 
 def construct_blueprint(datastore: ChangeDetectionStore):
     notification_blueprint = Blueprint('ui_notification', __name__, template_folder="../ui/templates")
@@ -18,8 +17,11 @@ def construct_blueprint(datastore: ChangeDetectionStore):
 
         # Watch_uuid could be unset in the case it`s used in tag editor, global settings
         import apprise
-        from ...apprise_plugin.assets import apprise_asset
-        from ...apprise_plugin.custom_handlers import apprise_http_custom_handler # noqa: F401
+        from changedetectionio.notification.handler import process_notification
+        from changedetectionio.notification.apprise_plugin.assets import apprise_asset
+
+        from changedetectionio.notification.apprise_plugin.custom_handlers import apprise_http_custom_handler
+
         apobj = apprise.Apprise(asset=apprise_asset)
 
         is_global_settings_form = request.args.get('mode', '') == 'global-settings'
@@ -130,7 +130,7 @@
 or ( watch.get_fetch_backend == "system" and system_default_fetcher == 'html_webdriver' )
 or "extra_browser_" in watch.get_fetch_backend
 %}
-<img class="status-icon" src="{{url_for('static_content', group='images', filename='Google-Chrome-icon.png')}}" alt="Using a Chrome browser" title="Using a Chrome browser" >
+<img class="status-icon" src="{{url_for('static_content', group='images', filename='google-chrome-icon.png')}}" alt="Using a Chrome browser" title="Using a Chrome browser" >
 {% endif %}
 
 {%if watch.is_pdf %}<img class="status-icon" src="{{url_for('static_content', group='images', filename='pdf-icon.svg')}}" title="Converting PDF to text" >{% endif %}
@@ -209,15 +209,18 @@
 <a href="{{ url_for('ui.ui_edit.edit_page', uuid=watch.uuid, tag=active_tag_uuid)}}#general" class="pure-button pure-button-primary">Edit</a>
 {% if watch.history_n >= 2 %}
 
+{% set open_diff_in_new_tab = datastore.data['settings']['application']['ui'].get('open_diff_in_new_tab') %}
+{% set target_attr = ' target="' ~ watch.uuid ~ '"' if open_diff_in_new_tab else '' %}
+
 {% if is_unviewed %}
-<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid, from_version=watch.get_from_version_based_on_last_viewed) }}" target="{{watch.uuid}}" class="pure-button pure-button-primary diff-link">History</a>
+<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid, from_version=watch.get_from_version_based_on_last_viewed) }}" {{target_attr}} class="pure-button pure-button-primary diff-link">History</a>
 {% else %}
-<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid)}}" target="{{watch.uuid}}" class="pure-button pure-button-primary diff-link">History</a>
+<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary diff-link">History</a>
 {% endif %}
 
 {% else %}
 {% if watch.history_n == 1 or (watch.history_n ==0 and watch.error_text_ctime )%}
-<a href="{{ url_for('ui.ui_views.preview_page', uuid=watch.uuid)}}" target="{{watch.uuid}}" class="pure-button pure-button-primary">Preview</a>
+<a href="{{ url_for('ui.ui_views.preview_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary">Preview</a>
 {% endif %}
 {% endif %}
 </td>
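The `target_attr` expression added in this hunk builds the whole `target="..."` attribute only when the setting is enabled, and an empty string otherwise. A rough Python equivalent of that Jinja expression, as a sketch (the function name is hypothetical and the UUID is sample data):

```python
def build_target_attr(watch_uuid, open_diff_in_new_tab):
    """Mirror of: ' target="' ~ watch.uuid ~ '"' if open_diff_in_new_tab else ''"""
    return f' target="{watch_uuid}"' if open_diff_in_new_tab else ""


with_tab = build_target_attr("1234-abcd", True)
without_tab = build_target_attr("1234-abcd", False)
```

Using the watch UUID as the target name means repeated clicks for the same watch reuse one named tab instead of opening a new tab each time.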
@@ -241,7 +244,7 @@
 all {% if active_tag_uuid %} in "{{active_tag.title}}"{%endif%}</a>
 </li>
 <li>
-<a href="{{ url_for('rss.feed', tag=active_tag_uuid, token=app_rss_token)}}"><img alt="RSS Feed" id="feed-icon" src="{{url_for('static_content', group='images', filename='Generic_Feed-icon.svg')}}" height="15"></a>
+<a href="{{ url_for('rss.feed', tag=active_tag_uuid, token=app_rss_token)}}"><img alt="RSS Feed" id="feed-icon" src="{{url_for('static_content', group='images', filename='generic_feed-icon.svg')}}" height="15"></a>
 </li>
 </ul>
 {{ pagination.links }}
@@ -8,7 +8,7 @@ from . import default_plugin
 
 # List of all supported JSON Logic operators
 operator_choices = [
-    (None, "Choose one"),
+    (None, "Choose one - Operator"),
     (">", "Greater Than"),
     ("<", "Less Than"),
     (">=", "Greater Than or Equal To"),
@@ -21,7 +21,7 @@ operator_choices = [
|
|||||||
|
|
||||||
# Fields available in the rules
|
# Fields available in the rules
|
||||||
field_choices = [
|
field_choices = [
|
||||||
(None, "Choose one"),
|
(None, "Choose one - Field"),
|
||||||
]
|
]
|
||||||
|
|
||||||
# The data we will feed the JSON Rules to see if it passes the test/conditions or not
|
# The data we will feed the JSON Rules to see if it passes the test/conditions or not
|
||||||
|
|||||||
@@ -19,7 +19,7 @@ class ConditionFormRow(Form):
|
|||||||
validators=[validators.Optional()]
|
validators=[validators.Optional()]
|
||||||
)
|
)
|
||||||
|
|
||||||
value = StringField("Value", validators=[validators.Optional()])
|
value = StringField("Value", validators=[validators.Optional()], render_kw={"placeholder": "A value"})
|
||||||
|
|
||||||
def validate(self, extra_validators=None):
|
def validate(self, extra_validators=None):
|
||||||
# First, run the default validators
|
# First, run the default validators
|
||||||
|
|||||||
@@ -7,11 +7,29 @@ import os
|
|||||||
# Visual Selector scraper - 'Button' is there because some sites have <button>OUT OF STOCK</button>.
|
# Visual Selector scraper - 'Button' is there because some sites have <button>OUT OF STOCK</button>.
|
||||||
visualselector_xpath_selectors = 'div,span,form,table,tbody,tr,td,a,p,ul,li,h1,h2,h3,h4,header,footer,section,article,aside,details,main,nav,section,summary,button'
|
visualselector_xpath_selectors = 'div,span,form,table,tbody,tr,td,a,p,ul,li,h1,h2,h3,h4,header,footer,section,article,aside,details,main,nav,section,summary,button'
|
||||||
|
|
||||||
|
SCREENSHOT_MAX_HEIGHT_DEFAULT = 20000
|
||||||
|
SCREENSHOT_DEFAULT_QUALITY = 40
|
||||||
|
|
||||||
|
# Maximum total height for the final image (When in stitch mode).
|
||||||
|
# Limited (default SCREENSHOT_MAX_HEIGHT_DEFAULT, overridable via the SCREENSHOT_MAX_HEIGHT env var) due to the huge amount of RAM that was being used
|
||||||
|
# Example: 16000 × 1400 × 3 = 67,200,000 bytes ≈ 64.1 MB (not including buffers in PIL etc)
|
||||||
|
SCREENSHOT_MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
|
||||||
|
|
||||||
|
# The size at which we will switch to stitching method, when below this (and
|
||||||
|
# MAX_TOTAL_HEIGHT which can be set by a user) we will use the default
|
||||||
|
# screenshot method.
|
||||||
|
SCREENSHOT_SIZE_STITCH_THRESHOLD = 8000
|
||||||
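The memory estimate quoted in the comment above can be reproduced with a short calculation. This is a minimal sketch, assuming an uncompressed RGB buffer at 3 bytes per pixel and a 1400px-wide page; `rgb_buffer_bytes` and `to_mib` are hypothetical helpers for illustration and are not part of this changeset.

```python
def rgb_buffer_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed image buffer (assumption: 3 bytes per RGB pixel)."""
    return width_px * height_px * bytes_per_pixel


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# The example from the comment: a 16000px tall, 1400px wide capture.
example = rgb_buffer_bytes(1400, 16000)

# The same width at the 20000px default height defined above.
default_case = rgb_buffer_bytes(1400, 20000)

print(f"{example} bytes = {to_mib(example):.1f} MiB")
print(f"{default_case} bytes = {to_mib(default_case):.1f} MiB")
```

The figure covers a single decoded buffer only; as the comment notes, additional buffers held by PIL while stitching are not included.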
|
|
||||||
# available_fetchers() will scan this implementation looking for anything starting with html_
|
# available_fetchers() will scan this implementation looking for anything starting with html_
|
||||||
# this information is used in the form selections
|
# this information is used in the form selections
|
||||||
from changedetectionio.content_fetchers.requests import fetcher as html_requests
|
from changedetectionio.content_fetchers.requests import fetcher as html_requests
|
||||||
|
|
||||||
|
|
||||||
|
import importlib.resources
|
||||||
|
XPATH_ELEMENT_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text(encoding='utf-8')
|
||||||
|
INSTOCK_DATA_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('stock-not-in-stock.js').read_text(encoding='utf-8')
|
||||||
|
|
||||||
|
|
||||||
def available_fetchers():
|
def available_fetchers():
|
||||||
# See the if statement at the bottom of this file for how we switch between playwright and webdriver
|
# See the if statement at the bottom of this file for how we switch between playwright and webdriver
|
||||||
import inspect
|
import inspect
|
||||||
|
|||||||
@@ -63,11 +63,6 @@ class Fetcher():
|
|||||||
# Time ONTOP of the system defined env minimum time
|
# Time ONTOP of the system defined env minimum time
|
||||||
render_extract_delay = 0
|
render_extract_delay = 0
|
||||||
|
|
||||||
def __init__(self):
|
|
||||||
import importlib.resources
|
|
||||||
self.xpath_element_js = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text(encoding='utf-8')
|
|
||||||
self.instock_data_js = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('stock-not-in-stock.js').read_text(encoding='utf-8')
|
|
||||||
|
|
||||||
@abstractmethod
|
@abstractmethod
|
||||||
def get_error(self):
|
def get_error(self):
|
||||||
return self.error
|
return self.error
|
||||||
@@ -87,7 +82,7 @@ class Fetcher():
|
|||||||
pass
|
pass
|
||||||
|
|
||||||
@abstractmethod
|
@abstractmethod
|
||||||
def quit(self):
|
def quit(self, watch=None):
|
||||||
return
|
return
|
||||||
|
|
||||||
@abstractmethod
|
@abstractmethod
|
||||||
@@ -143,6 +138,7 @@ class Fetcher():
|
|||||||
logger.debug(f">> Iterating check - browser Step n {step_n} - {step['operation']}...")
|
logger.debug(f">> Iterating check - browser Step n {step_n} - {step['operation']}...")
|
||||||
self.screenshot_step("before-" + str(step_n))
|
self.screenshot_step("before-" + str(step_n))
|
||||||
self.save_step_html("before-" + str(step_n))
|
self.save_step_html("before-" + str(step_n))
|
||||||
|
|
||||||
try:
|
try:
|
||||||
optional_value = step['optional_value']
|
optional_value = step['optional_value']
|
||||||
selector = step['selector']
|
selector = step['selector']
|
||||||
|
|||||||
@@ -1,104 +0,0 @@
|
|||||||
|
|
||||||
# Pages with a vertical height longer than this will use the 'stitch together' method.
|
|
||||||
|
|
||||||
# - Many GPUs have a max texture size of 16384x16384px (or lower on older devices).
|
|
||||||
# - If a page is taller than ~8000–10000px, it risks exceeding GPU memory limits.
|
|
||||||
# - This is especially important on headless Chromium, where Playwright may fail to allocate a massive full-page buffer.
|
|
||||||
|
|
||||||
|
|
||||||
# The size at which we will switch to stitching method
|
|
||||||
SCREENSHOT_SIZE_STITCH_THRESHOLD=8000
|
|
||||||
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
def capture_stitched_together_full_page(page):
|
|
||||||
import io
|
|
||||||
import os
|
|
||||||
import time
|
|
||||||
from PIL import Image, ImageDraw, ImageFont
|
|
||||||
|
|
||||||
MAX_TOTAL_HEIGHT = SCREENSHOT_SIZE_STITCH_THRESHOLD*4 # Maximum total height for the final image (When in stitch mode)
|
|
||||||
MAX_CHUNK_HEIGHT = 4000 # Height per screenshot chunk
|
|
||||||
WARNING_TEXT_HEIGHT = 20 # Height of the warning text overlay
|
|
||||||
|
|
||||||
# Save the original viewport size
|
|
||||||
original_viewport = page.viewport_size
|
|
||||||
now = time.time()
|
|
||||||
|
|
||||||
try:
|
|
||||||
viewport = page.viewport_size
|
|
||||||
page_height = page.evaluate("document.documentElement.scrollHeight")
|
|
||||||
|
|
||||||
# Limit the total capture height
|
|
||||||
capture_height = min(page_height, MAX_TOTAL_HEIGHT)
|
|
||||||
|
|
||||||
images = []
|
|
||||||
total_captured_height = 0
|
|
||||||
|
|
||||||
for offset in range(0, capture_height, MAX_CHUNK_HEIGHT):
|
|
||||||
# Ensure we do not exceed the total height limit
|
|
||||||
chunk_height = min(MAX_CHUNK_HEIGHT, MAX_TOTAL_HEIGHT - total_captured_height)
|
|
||||||
|
|
||||||
# Adjust viewport size for this chunk
|
|
||||||
page.set_viewport_size({"width": viewport["width"], "height": chunk_height})
|
|
||||||
|
|
||||||
# Scroll to the correct position
|
|
||||||
page.evaluate(f"window.scrollTo(0, {offset})")
|
|
||||||
|
|
||||||
# Capture screenshot chunk
|
|
||||||
screenshot_bytes = page.screenshot(type='jpeg', quality=int(os.getenv("SCREENSHOT_QUALITY", 30)))
|
|
||||||
images.append(Image.open(io.BytesIO(screenshot_bytes)))
|
|
||||||
|
|
||||||
total_captured_height += chunk_height
|
|
||||||
|
|
||||||
# Stop if we reached the maximum total height
|
|
||||||
if total_captured_height >= MAX_TOTAL_HEIGHT:
|
|
||||||
break
|
|
||||||
|
|
||||||
# Create the final stitched image
|
|
||||||
stitched_image = Image.new('RGB', (viewport["width"], total_captured_height))
|
|
||||||
y_offset = 0
|
|
||||||
|
|
||||||
# Stitch the screenshot chunks together
|
|
||||||
for img in images:
|
|
||||||
stitched_image.paste(img, (0, y_offset))
|
|
||||||
y_offset += img.height
|
|
||||||
|
|
||||||
logger.debug(f"Screenshot stitched together in {time.time()-now:.2f}s")
|
|
||||||
|
|
||||||
# Overlay warning text if the screenshot was trimmed
|
|
||||||
if page_height > MAX_TOTAL_HEIGHT:
|
|
||||||
draw = ImageDraw.Draw(stitched_image)
|
|
||||||
warning_text = f"WARNING: Screenshot was {page_height}px but trimmed to {MAX_TOTAL_HEIGHT}px because it was too long"
|
|
||||||
|
|
||||||
# Load font (default system font if Arial is unavailable)
|
|
||||||
try:
|
|
||||||
font = ImageFont.truetype("arial.ttf", WARNING_TEXT_HEIGHT) # Arial (Windows/Mac)
|
|
||||||
except IOError:
|
|
||||||
font = ImageFont.load_default() # Default font if Arial not found
|
|
||||||
|
|
||||||
# Get text bounding box (correct method for newer Pillow versions)
|
|
||||||
text_bbox = draw.textbbox((0, 0), warning_text, font=font)
|
|
||||||
text_width = text_bbox[2] - text_bbox[0] # Calculate text width
|
|
||||||
text_height = text_bbox[3] - text_bbox[1] # Calculate text height
|
|
||||||
|
|
||||||
# Define background rectangle (top of the image)
|
|
||||||
draw.rectangle([(0, 0), (viewport["width"], WARNING_TEXT_HEIGHT)], fill="white")
|
|
||||||
|
|
||||||
# Center text horizontally within the warning area
|
|
||||||
text_x = (viewport["width"] - text_width) // 2
|
|
||||||
text_y = (WARNING_TEXT_HEIGHT - text_height) // 2
|
|
||||||
|
|
||||||
# Draw the warning text in red
|
|
||||||
draw.text((text_x, text_y), warning_text, fill="red", font=font)
|
|
||||||
|
|
||||||
# Save or return the final image
|
|
||||||
output = io.BytesIO()
|
|
||||||
stitched_image.save(output, format="JPEG", quality=int(os.getenv("SCREENSHOT_QUALITY", 30)))
|
|
||||||
screenshot = output.getvalue()
|
|
||||||
|
|
||||||
finally:
|
|
||||||
# Restore the original viewport size
|
|
||||||
page.set_viewport_size(original_viewport)
|
|
||||||
|
|
||||||
return screenshot
|
|
||||||
@@ -4,10 +4,71 @@ from urllib.parse import urlparse
|
|||||||
|
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
|
||||||
from changedetectionio.content_fetchers.helpers import capture_stitched_together_full_page, SCREENSHOT_SIZE_STITCH_THRESHOLD
|
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
|
||||||
|
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_MAX_TOTAL_HEIGHT, XPATH_ELEMENT_JS, INSTOCK_DATA_JS
|
||||||
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
|
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
|
||||||
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable
|
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable
|
||||||
|
|
||||||
|
def capture_full_page(page):
|
||||||
|
import os
|
||||||
|
import time
|
||||||
|
from multiprocessing import Process, Pipe
|
||||||
|
|
||||||
|
start = time.time()
|
||||||
|
|
||||||
|
page_height = page.evaluate("document.documentElement.scrollHeight")
|
||||||
|
page_width = page.evaluate("document.documentElement.scrollWidth")
|
||||||
|
original_viewport = page.viewport_size
|
||||||
|
|
||||||
|
logger.debug(f"Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width}")
|
||||||
|
|
||||||
|
# Use an approach similar to puppeteer: set a larger viewport and take screenshots in chunks
|
||||||
|
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD # Size that won't cause GPU to overflow
|
||||||
|
screenshot_chunks = []
|
||||||
|
y = 0
|
||||||
|
|
||||||
|
# If page height is larger than current viewport, use a larger viewport for better capturing
|
||||||
|
if page_height > page.viewport_size['height']:
|
||||||
|
# Set viewport to a larger size to capture more content at once
|
||||||
|
page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
|
||||||
|
|
||||||
|
# Capture screenshots in chunks up to the max total height
|
||||||
|
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
|
||||||
|
page.request_gc()
|
||||||
|
page.evaluate(f"window.scrollTo(0, {y})")
|
||||||
|
page.request_gc()
|
||||||
|
screenshot_chunks.append(page.screenshot(
|
||||||
|
type="jpeg",
|
||||||
|
full_page=False,
|
||||||
|
quality=int(os.getenv("SCREENSHOT_QUALITY", 72))
|
||||||
|
))
|
||||||
|
y += step_size
|
||||||
|
page.request_gc()
|
||||||
|
|
||||||
|
# Restore original viewport size
|
||||||
|
page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
|
||||||
|
|
||||||
|
# If we have multiple chunks, stitch them together
|
||||||
|
if len(screenshot_chunks) > 1:
|
||||||
|
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker
|
||||||
|
logger.debug(f"Screenshot stitching {len(screenshot_chunks)} chunks together")
|
||||||
|
parent_conn, child_conn = Pipe()
|
||||||
|
p = Process(target=stitch_images_worker, args=(child_conn, screenshot_chunks, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
|
||||||
|
p.start()
|
||||||
|
screenshot = parent_conn.recv_bytes()
|
||||||
|
p.join()
|
||||||
|
logger.debug(
|
||||||
|
f"Screenshot (chunked/stitched) - Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
|
||||||
|
|
||||||
|
screenshot_chunks = None
|
||||||
|
return screenshot
|
||||||
|
|
||||||
|
logger.debug(
|
||||||
|
f"Screenshot Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Captured as a single chunk (no stitching) in {time.time() - start:.2f}s")
|
||||||
|
|
||||||
|
return screenshot_chunks[0]
|
||||||
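The scroll loop in `capture_full_page` can be reasoned about without a browser. The sketch below mirrors only the `while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT)` arithmetic and returns the scroll offsets that would be visited; `chunk_offsets` is a hypothetical helper, and its defaults (8000 and 20000) are taken from the constants introduced in this diff.

```python
def chunk_offsets(page_height: int, step_size: int = 8000, max_total_height: int = 20000) -> list[int]:
    """Return the y offsets the capture loop would scroll to, in order."""
    offsets = []
    y = 0
    # Same stop condition as the loop in capture_full_page
    while y < min(page_height, max_total_height):
        offsets.append(y)
        y += step_size
    return offsets


short_page = chunk_offsets(5000)    # fits in one chunk
long_page = chunk_offsets(50000)    # capped by max_total_height
empty_page = chunk_offsets(0)       # no chunk is captured at all
```

Under these assumptions a page taller than one step produces several chunks and goes through the stitching path, while a page that reports a height of 0 produces no chunks, in which case the final `screenshot_chunks[0]` in the function above would raise an `IndexError`.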
|
|
||||||
|
|
||||||
class fetcher(Fetcher):
|
class fetcher(Fetcher):
|
||||||
fetcher_description = "Playwright {}/Javascript".format(
|
fetcher_description = "Playwright {}/Javascript".format(
|
||||||
os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').capitalize()
|
os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').capitalize()
|
||||||
@@ -60,7 +121,8 @@ class fetcher(Fetcher):
|
|||||||
|
|
||||||
def screenshot_step(self, step_n=''):
|
def screenshot_step(self, step_n=''):
|
||||||
super().screenshot_step(step_n=step_n)
|
super().screenshot_step(step_n=step_n)
|
||||||
screenshot = self.page.screenshot(type='jpeg', full_page=True, quality=int(os.getenv("SCREENSHOT_QUALITY", 72)))
|
screenshot = capture_full_page(page=self.page)
|
||||||
|
|
||||||
|
|
||||||
if self.browser_steps_screenshot_path is not None:
|
if self.browser_steps_screenshot_path is not None:
|
||||||
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
|
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
|
||||||
@@ -89,7 +151,6 @@ class fetcher(Fetcher):
|
|||||||
|
|
||||||
from playwright.sync_api import sync_playwright
|
from playwright.sync_api import sync_playwright
|
||||||
import playwright._impl._errors
|
import playwright._impl._errors
|
||||||
from changedetectionio.content_fetchers import visualselector_xpath_selectors
|
|
||||||
import time
|
import time
|
||||||
self.delete_browser_steps_screenshots()
|
self.delete_browser_steps_screenshots()
|
||||||
response = None
|
response = None
|
||||||
@@ -164,9 +225,7 @@ class fetcher(Fetcher):
|
|||||||
raise PageUnloadable(url=url, status_code=None, message=str(e))
|
raise PageUnloadable(url=url, status_code=None, message=str(e))
|
||||||
|
|
||||||
if self.status_code != 200 and not ignore_status_codes:
|
if self.status_code != 200 and not ignore_status_codes:
|
||||||
screenshot = self.page.screenshot(type='jpeg', full_page=True,
|
screenshot = capture_full_page(self.page)
|
||||||
quality=int(os.getenv("SCREENSHOT_QUALITY", 72)))
|
|
||||||
|
|
||||||
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
|
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
|
||||||
|
|
||||||
if not empty_pages_are_a_change and len(self.page.content().strip()) == 0:
|
if not empty_pages_are_a_change and len(self.page.content().strip()) == 0:
|
||||||
@@ -187,13 +246,23 @@ class fetcher(Fetcher):
|
|||||||
self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
|
self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
|
||||||
else:
|
else:
|
||||||
self.page.evaluate("var include_filters=''")
|
self.page.evaluate("var include_filters=''")
|
||||||
|
self.page.request_gc()
|
||||||
|
|
||||||
self.xpath_data = self.page.evaluate(
|
# request_gc before and after evaluate to free up memory
|
||||||
"async () => {" + self.xpath_element_js.replace('%ELEMENTS%', visualselector_xpath_selectors) + "}")
|
# @todo browsersteps etc
|
||||||
self.instock_data = self.page.evaluate("async () => {" + self.instock_data_js + "}")
|
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
|
||||||
|
self.xpath_data = self.page.evaluate(XPATH_ELEMENT_JS, {
|
||||||
|
"visualselector_xpath_selectors": visualselector_xpath_selectors,
|
||||||
|
"max_height": MAX_TOTAL_HEIGHT
|
||||||
|
})
|
||||||
|
self.page.request_gc()
|
||||||
|
|
||||||
|
self.instock_data = self.page.evaluate(INSTOCK_DATA_JS)
|
||||||
|
self.page.request_gc()
|
||||||
|
|
||||||
self.content = self.page.content()
|
self.content = self.page.content()
|
||||||
logger.debug(f"Time to scrape xpath element data in browser {time.time() - now:.2f}s")
|
self.page.request_gc()
|
||||||
|
logger.debug(f"Scrape xPath element data in browser done in {time.time() - now:.2f}s")
|
||||||
|
|
||||||
# Bug 3 in Playwright screenshot handling
|
# Bug 3 in Playwright screenshot handling
|
||||||
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
|
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
|
||||||
@@ -204,18 +273,25 @@ class fetcher(Fetcher):
|
|||||||
# acceptable screenshot quality here
|
# acceptable screenshot quality here
|
||||||
try:
|
try:
|
||||||
# The actual screenshot - this always base64 and needs decoding! horrible! huge CPU usage
|
# The actual screenshot - this always base64 and needs decoding! horrible! huge CPU usage
|
||||||
full_height = self.page.evaluate("document.documentElement.scrollHeight")
|
self.screenshot = capture_full_page(page=self.page)
|
||||||
|
|
||||||
if full_height >= SCREENSHOT_SIZE_STITCH_THRESHOLD:
|
|
||||||
logger.warning(
|
|
||||||
f"Page full Height: {full_height}px longer than {SCREENSHOT_SIZE_STITCH_THRESHOLD}px, using 'stitched screenshot method'.")
|
|
||||||
self.screenshot = capture_stitched_together_full_page(self.page)
|
|
||||||
else:
|
|
||||||
self.screenshot = self.page.screenshot(type='jpeg', full_page=True, quality=int(os.getenv("SCREENSHOT_QUALITY", 30)))
|
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
# It's likely the screenshot was too long/big and something crashed
|
# It's likely the screenshot was too long/big and something crashed
|
||||||
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
|
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
|
||||||
finally:
|
finally:
|
||||||
|
# Request garbage collection one more time before closing
|
||||||
|
try:
|
||||||
|
self.page.request_gc()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Clean up resources properly
|
||||||
context.close()
|
context.close()
|
||||||
|
context = None
|
||||||
|
|
||||||
|
self.page.close()
|
||||||
|
self.page = None
|
||||||
|
|
||||||
browser.close()
|
browser.close()
|
||||||
|
browser = None
|
||||||
|
|
||||||
|
|||||||
@@ -6,8 +6,76 @@ from urllib.parse import urlparse
|
|||||||
|
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
|
||||||
|
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
|
||||||
|
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_DEFAULT_QUALITY, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, \
|
||||||
|
SCREENSHOT_MAX_TOTAL_HEIGHT
|
||||||
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
|
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
|
||||||
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, BrowserFetchTimedOut, BrowserConnectError
|
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, BrowserFetchTimedOut, \
|
||||||
|
BrowserConnectError
|
||||||
|
|
||||||
|
|
||||||
|
# Bug 3 in Playwright screenshot handling
|
||||||
|
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
|
||||||
|
|
||||||
|
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
|
||||||
|
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
|
||||||
|
# acceptable screenshot quality here
|
||||||
|
async def capture_full_page(page):
|
||||||
|
import os
|
||||||
|
import time
|
||||||
|
from multiprocessing import Process, Pipe
|
||||||
|
|
||||||
|
start = time.time()
|
||||||
|
|
||||||
|
page_height = await page.evaluate("document.documentElement.scrollHeight")
|
||||||
|
page_width = await page.evaluate("document.documentElement.scrollWidth")
|
||||||
|
original_viewport = page.viewport
|
||||||
|
|
||||||
|
logger.debug(f"Puppeteer viewport size {page.viewport} page height {page_height} page width {page_width}")
|
||||||
|
|
||||||
|
# Bug 3 in Playwright screenshot handling
|
||||||
|
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
|
||||||
|
# JPEG is better here because the screenshots can be very very large
|
||||||
|
|
||||||
|
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
|
||||||
|
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
|
||||||
|
# acceptable screenshot quality here
|
||||||
|
|
||||||
|
|
||||||
|
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD # Something that will not cause the GPU to overflow when taking the screenshot
|
||||||
|
screenshot_chunks = []
|
||||||
|
y = 0
|
||||||
|
if page_height > page.viewport['height']:
|
||||||
|
await page.setViewport({'width': page.viewport['width'], 'height': step_size})
|
||||||
|
|
||||||
|
|
||||||
|
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
|
||||||
|
await page.evaluate(f"window.scrollTo(0, {y})")
|
||||||
|
screenshot_chunks.append(await page.screenshot(type_='jpeg',
|
||||||
|
fullPage=False,
|
||||||
|
quality=int(os.getenv("SCREENSHOT_QUALITY", 72))))
|
||||||
|
y += step_size
|
||||||
|
|
||||||
|
await page.setViewport({'width': original_viewport['width'], 'height': original_viewport['height']})
|
||||||
|
|
||||||
|
if len(screenshot_chunks) > 1:
|
||||||
|
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker
|
||||||
|
logger.debug(f"Screenshot stitching {len(screenshot_chunks)} chunks together")
|
||||||
|
parent_conn, child_conn = Pipe()
|
||||||
|
p = Process(target=stitch_images_worker, args=(child_conn, screenshot_chunks, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
|
||||||
|
p.start()
|
||||||
|
screenshot = parent_conn.recv_bytes()
|
||||||
|
p.join()
|
||||||
|
logger.debug(
|
||||||
|
f"Screenshot (chunked/stitched) - Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
|
||||||
|
|
||||||
|
screenshot_chunks = None
|
||||||
|
return screenshot
|
||||||
|
|
||||||
|
logger.debug(
|
||||||
|
f"Screenshot Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Captured as a single chunk (no stitching) in {time.time() - start:.2f}s")
|
||||||
|
return screenshot_chunks[0]
|
||||||
|
|
||||||
|
|
||||||
class fetcher(Fetcher):
|
class fetcher(Fetcher):
|
||||||
fetcher_description = "Puppeteer/direct {}/Javascript".format(
|
fetcher_description = "Puppeteer/direct {}/Javascript".format(
|
||||||
@@ -79,7 +147,6 @@ class fetcher(Fetcher):
|
|||||||
empty_pages_are_a_change
|
empty_pages_are_a_change
|
||||||
):
|
):
|
||||||
|
|
||||||
from changedetectionio.content_fetchers import visualselector_xpath_selectors
|
|
||||||
self.delete_browser_steps_screenshots()
|
self.delete_browser_steps_screenshots()
|
||||||
extra_wait = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
|
extra_wait = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
|
||||||
|
|
||||||
@@ -181,11 +248,10 @@ class fetcher(Fetcher):
|
|||||||
raise PageUnloadable(url=url, status_code=None, message=str(e))
|
raise PageUnloadable(url=url, status_code=None, message=str(e))
|
||||||
|
|
||||||
if self.status_code != 200 and not ignore_status_codes:
|
if self.status_code != 200 and not ignore_status_codes:
|
||||||
screenshot = await self.page.screenshot(type_='jpeg',
|
screenshot = await capture_full_page(page=self.page)
|
||||||
fullPage=True,
|
|
||||||
quality=int(os.getenv("SCREENSHOT_QUALITY", 72)))
|
|
||||||
|
|
||||||
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
|
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
|
||||||
|
|
||||||
content = await self.page.content
|
content = await self.page.content
|
||||||
|
|
||||||
if not empty_pages_are_a_change and len(content.strip()) == 0:
|
if not empty_pages_are_a_change and len(content.strip()) == 0:
|
||||||
@@ -203,46 +269,31 @@ class fetcher(Fetcher):
|
|||||||
|
|
||||||
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
|
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
|
||||||
# Setup the xPath/VisualSelector scraper
|
# Setup the xPath/VisualSelector scraper
|
||||||
if current_include_filters is not None:
|
if current_include_filters:
|
||||||
js = json.dumps(current_include_filters)
|
js = json.dumps(current_include_filters)
|
||||||
await self.page.evaluate(f"var include_filters={js}")
|
await self.page.evaluate(f"var include_filters={js}")
|
||||||
else:
|
else:
|
||||||
await self.page.evaluate(f"var include_filters=''")
|
await self.page.evaluate(f"var include_filters=''")
|
||||||
|
|
||||||
self.xpath_data = await self.page.evaluate(
|
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
|
||||||
"async () => {" + self.xpath_element_js.replace('%ELEMENTS%', visualselector_xpath_selectors) + "}")
|
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
|
||||||
self.instock_data = await self.page.evaluate("async () => {" + self.instock_data_js + "}")
|
"visualselector_xpath_selectors": visualselector_xpath_selectors,
|
||||||
|
"max_height": MAX_TOTAL_HEIGHT
|
||||||
|
})
|
||||||
|
if not self.xpath_data:
|
||||||
|
raise Exception(f"Content Fetcher > xPath scraper failed. Please report this URL so we can fix it :)")
|
||||||
|
|
||||||
|
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
|
||||||
|
|
||||||
self.content = await self.page.content
|
self.content = await self.page.content
|
||||||
# Bug 3 in Playwright screenshot handling
|
|
||||||
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
|
|
||||||
# JPEG is better here because the screenshots can be very very large
|
|
||||||
|
|
||||||
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
|
self.screenshot = await capture_full_page(page=self.page)
|
||||||
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
|
|
||||||
# acceptable screenshot quality here
|
# It's good to log here in the case that the browser crashes on shutting down but we still get the data we need
|
||||||
try:
|
logger.success(f"Fetching '{url}' complete, closing page")
|
||||||
self.screenshot = await self.page.screenshot(type_='jpeg',
|
await self.page.close()
|
||||||
fullPage=True,
|
logger.success(f"Fetching '{url}' complete, closing browser")
|
||||||
quality=int(os.getenv("SCREENSHOT_QUALITY", 72)))
|
await browser.close()
|
||||||
except Exception as e:
|
|
||||||
logger.error("Error fetching screenshot")
|
|
||||||
# // May fail on very large pages with 'WARNING: tile memory limits exceeded, some content may not draw'
|
|
||||||
# // @ todo after text extract, we can place some overlay text with red background to say 'croppped'
|
|
||||||
logger.error('ERROR: content-fetcher page was maybe too large for a screenshot, reverting to viewport only screenshot')
|
|
||||||
try:
|
|
||||||
self.screenshot = await self.page.screenshot(type_='jpeg',
|
|
||||||
fullPage=False,
|
|
||||||
quality=int(os.getenv("SCREENSHOT_QUALITY", 72)))
|
|
||||||
except Exception as e:
|
|
||||||
logger.error('ERROR: Failed to get viewport-only reduced screenshot :(')
|
|
||||||
pass
|
|
||||||
finally:
|
|
||||||
# It's good to log here in the case that the browser crashes on shutting down but we still get the data we need
|
|
||||||
logger.success(f"Fetching '{url}' complete, closing page")
|
|
||||||
await self.page.close()
|
|
||||||
logger.success(f"Fetching '{url}' complete, closing browser")
|
|
||||||
await browser.close()
|
|
||||||
logger.success(f"Fetching '{url}' complete, exiting puppeteer fetch.")
|
logger.success(f"Fetching '{url}' complete, exiting puppeteer fetch.")
|
||||||
|
|
||||||
async def main(self, **kwargs):
|
async def main(self, **kwargs):
|
||||||
|
|||||||
@@ -96,3 +96,17 @@ class fetcher(Fetcher):
|
|||||||
|
|
||||||
|
|
||||||
self.raw_content = r.content
|
self.raw_content = r.content
|
||||||
|
|
||||||
|
def quit(self, watch=None):
|
||||||
|
|
||||||
|
# In case they switched to `requests` fetcher from something else
|
||||||
|
# Then the screenshot could be old, in any case, it's not used here.
|
||||||
|
# REMOVE_REQUESTS_OLD_SCREENSHOTS - Mainly used for testing
|
||||||
|
if watch is not None and strtobool(os.getenv("REMOVE_REQUESTS_OLD_SCREENSHOTS", 'true')):
|
||||||
|
screenshot = watch.get_screenshot()
|
||||||
|
if screenshot:
|
||||||
|
try:
|
||||||
|
os.unlink(screenshot)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Failed to unlink screenshot: {screenshot} - {e}")
|
||||||
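The `REMOVE_REQUESTS_OLD_SCREENSHOTS` check above depends on turning an environment string into a boolean. The `strtobool` it calls is not shown in this diff, so the following is only a sketch that assumes distutils-style semantics (`true`/`yes`/`on`/`1` and their opposites); `env_flag` is a hypothetical stand-in, and the `environ` parameter exists purely so the example can run without touching the real environment.

```python
import os

# Accepted spellings (assumption: same sets as the old distutils strtobool)
_TRUE_VALUES = {"y", "yes", "t", "true", "on", "1"}
_FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def env_flag(name, default="true", environ=None):
    """Read a boolean flag from the environment, falling back to `default`."""
    env = os.environ if environ is None else environ
    raw = str(env.get(name, default)).strip().lower()
    if raw in _TRUE_VALUES:
        return True
    if raw in _FALSE_VALUES:
        return False
    raise ValueError(f"Invalid boolean value for {name}: {raw!r}")


# Unset variable falls back to the default, as in the quit() method above
unset_result = env_flag("REMOVE_REQUESTS_OLD_SCREENSHOTS", environ={})
disabled_result = env_flag("REMOVE_REQUESTS_OLD_SCREENSHOTS", environ={"REMOVE_REQUESTS_OLD_SCREENSHOTS": "false"})
```

With a default of `'true'`, old screenshots are removed unless the variable is explicitly set to a false value, which matches the "mainly used for testing" note in the comment.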
|
|
||||||
|
|||||||
@@ -1,190 +0,0 @@
|
|||||||
module.exports = async ({page, context}) => {
|
|
||||||
|
|
||||||
var {
|
|
||||||
url,
|
|
||||||
execute_js,
|
|
||||||
user_agent,
|
|
||||||
extra_wait_ms,
|
|
||||||
req_headers,
|
|
||||||
include_filters,
|
|
||||||
xpath_element_js,
|
|
||||||
screenshot_quality,
|
|
||||||
proxy_username,
|
|
||||||
proxy_password,
|
|
||||||
disk_cache_dir,
|
|
||||||
no_cache_list,
|
|
||||||
block_url_list,
|
|
||||||
} = context;
|
|
||||||
|
|
||||||
await page.setBypassCSP(true)
|
|
||||||
await page.setExtraHTTPHeaders(req_headers);
|
|
||||||
|
|
||||||
if (user_agent) {
|
|
||||||
await page.setUserAgent(user_agent);
|
|
||||||
}
|
|
||||||
// https://ourcodeworld.com/articles/read/1106/how-to-solve-puppeteer-timeouterror-navigation-timeout-of-30000-ms-exceeded
|
|
||||||
|
|
||||||
await page.setDefaultNavigationTimeout(0);
|
|
||||||
|
|
||||||
if (proxy_username) {
|
|
||||||
// Setting Proxy-Authentication header is deprecated, and doing so can trigger header change errors from Puppeteer
|
|
||||||
// https://github.com/puppeteer/puppeteer/issues/676 ?
|
|
||||||
// https://help.brightdata.com/hc/en-us/articles/12632549957649-Proxy-Manager-How-to-Guides#h_01HAKWR4Q0AFS8RZTNYWRDFJC2
|
|
||||||
// https://cri.dev/posts/2020-03-30-How-to-solve-Puppeteer-Chrome-Error-ERR_INVALID_ARGUMENT/
|
|
||||||
await page.authenticate({
|
|
||||||
username: proxy_username,
|
|
||||||
password: proxy_password
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
await page.setViewport({
|
|
||||||
width: 1024,
|
|
||||||
height: 768,
|
|
||||||
deviceScaleFactor: 1,
|
|
||||||
});
|
|
||||||
|
|
||||||
await page.setRequestInterception(true);
|
|
||||||
if (disk_cache_dir) {
|
|
||||||
console.log(">>>>>>>>>>>>>>> LOCAL DISK CACHE ENABLED <<<<<<<<<<<<<<<<<<<<<");
|
|
||||||
}
|
|
||||||
const fs = require('fs');
|
|
||||||
const crypto = require('crypto');
|
|
||||||
|
|
||||||
function file_is_expired(file_path) {
|
|
||||||
if (!fs.existsSync(file_path)) {
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
var stats = fs.statSync(file_path);
|
|
||||||
const now_date = new Date();
|
|
||||||
const expire_seconds = 300;
|
|
||||||
if ((now_date / 1000) - (stats.mtime.getTime() / 1000) > expire_seconds) {
|
|
||||||
console.log("CACHE EXPIRED: " + file_path);
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
return false;
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
page.on('request', async (request) => {
|
|
||||||
// General blocking of requests that waste traffic
|
|
||||||
if (block_url_list.some(substring => request.url().toLowerCase().includes(substring))) return request.abort();
|
|
||||||
|
|
||||||
if (disk_cache_dir) {
|
|
||||||
const url = request.url();
|
|
||||||
const key = crypto.createHash('md5').update(url).digest("hex");
|
|
||||||
const dir_path = disk_cache_dir + key.slice(0, 1) + '/' + key.slice(1, 2) + '/' + key.slice(2, 3) + '/';
|
|
||||||
|
|
||||||
// https://stackoverflow.com/questions/4482686/check-synchronously-if-file-directory-exists-in-node-js
|
|
||||||
|
|
||||||
if (fs.existsSync(dir_path + key)) {
|
|
||||||
console.log("* CACHE HIT , using - " + dir_path + key + " - " + url);
|
|
||||||
const cached_data = fs.readFileSync(dir_path + key);
|
|
||||||
// @todo headers can come from dir_path+key+".meta" json file
|
|
||||||
request.respond({
|
|
||||||
status: 200,
|
|
||||||
//contentType: 'text/html', //@todo
|
|
||||||
body: cached_data
|
|
||||||
});
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
request.continue();
|
|
||||||
});
|
|
||||||
|
|
||||||
|
|
||||||
if (disk_cache_dir) {
|
|
||||||
page.on('response', async (response) => {
|
|
||||||
const url = response.url();
|
|
||||||
// Basic filtering for sane responses
|
|
||||||
if (response.request().method() != 'GET' || response.request().resourceType() == 'xhr' || response.request().resourceType() == 'document' || response.status() != 200) {
|
|
||||||
console.log("Skipping (not useful) - Status:" + response.status() + " Method:" + response.request().method() + " ResourceType:" + response.request().resourceType() + " " + url);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
if (no_cache_list.some(substring => url.toLowerCase().includes(substring))) {
|
|
||||||
console.log("Skipping (no_cache_list) - " + url);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
if (url.toLowerCase().includes('data:')) {
|
|
||||||
console.log("Skipping (embedded-data) - " + url);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
response.buffer().then(buffer => {
|
|
||||||
if (buffer.length > 100) {
|
|
||||||
console.log("Cache - Saving " + response.request().method() + " - " + url + " - " + response.request().resourceType());
|
|
||||||
|
|
||||||
const key = crypto.createHash('md5').update(url).digest("hex");
|
|
||||||
const dir_path = disk_cache_dir + key.slice(0, 1) + '/' + key.slice(1, 2) + '/' + key.slice(2, 3) + '/';
|
|
||||||
|
|
||||||
if (!fs.existsSync(dir_path)) {
|
|
||||||
fs.mkdirSync(dir_path, {recursive: true})
|
|
||||||
}
|
|
||||||
|
|
||||||
if (fs.existsSync(dir_path + key)) {
|
|
||||||
if (file_is_expired(dir_path + key)) {
|
|
||||||
fs.writeFileSync(dir_path + key, buffer);
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
fs.writeFileSync(dir_path + key, buffer);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
});
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
const r = await page.goto(url, {
|
|
||||||
waitUntil: 'load'
|
|
||||||
});
|
|
||||||
|
|
||||||
await page.waitForTimeout(1000);
|
|
||||||
await page.waitForTimeout(extra_wait_ms);
|
|
||||||
|
|
||||||
if (execute_js) {
|
|
||||||
await page.evaluate(execute_js);
|
|
||||||
await page.waitForTimeout(200);
|
|
||||||
}
|
|
||||||
|
|
||||||
var xpath_data;
|
|
||||||
var instock_data;
|
|
||||||
try {
|
|
||||||
// Not sure the best way here, in the future this should be a new package added to npm then run in evaluatedCode
|
|
||||||
// (Once the old playwright is removed)
|
|
||||||
xpath_data = await page.evaluate((include_filters) => {%xpath_scrape_code%}, include_filters);
|
|
||||||
instock_data = await page.evaluate(() => {%instock_scrape_code%});
|
|
||||||
} catch (e) {
|
|
||||||
console.log(e);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Protocol error (Page.captureScreenshot): Cannot take screenshot with 0 width can come from a proxy auth failure
|
|
||||||
// Wrap it here (for now)
|
|
||||||
|
|
||||||
var b64s = false;
|
|
||||||
try {
|
|
||||||
b64s = await page.screenshot({encoding: "base64", fullPage: true, quality: screenshot_quality, type: 'jpeg'});
|
|
||||||
} catch (e) {
|
|
||||||
console.log(e);
|
|
||||||
}
|
|
||||||
|
|
||||||
// May fail on very large pages with 'WARNING: tile memory limits exceeded, some content may not draw'
|
|
||||||
if (!b64s) {
|
|
||||||
// @todo after text extract, we can place some overlay text with red background to say 'cropped'
|
|
||||||
console.error('ERROR: content-fetcher page was maybe too large for a screenshot, reverting to viewport only screenshot');
|
|
||||||
try {
|
|
||||||
b64s = await page.screenshot({encoding: "base64", quality: screenshot_quality, type: 'jpeg'});
|
|
||||||
} catch (e) {
|
|
||||||
console.log(e);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
var html = await page.content();
|
|
||||||
return {
|
|
||||||
data: {
|
|
||||||
'content': html,
|
|
||||||
'headers': r.headers(),
|
|
||||||
'instock_data': instock_data,
|
|
||||||
'screenshot': b64s,
|
|
||||||
'status_code': r.status(),
|
|
||||||
'xpath_data': xpath_data
|
|
||||||
},
|
|
||||||
type: 'application/json',
|
|
||||||
};
|
|
||||||
};
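For reference while reviewing the removal, the disk-cache layout used by the deleted script can be restated in Python, the project's main language. This is a sketch, not code from the repository: `cache_path_for` is a hypothetical name, while `file_is_expired` and the 300-second window mirror the removed JavaScript. As in the original, the cache directory is concatenated as-is, so it is assumed to end with a `/`.

```python
import hashlib
import os
import time

EXPIRE_SECONDS = 300  # same value as expire_seconds in the removed script


def cache_path_for(disk_cache_dir, url):
    """Return (directory, key) for a URL, fanning out over three one-character levels."""
    key = hashlib.md5(url.encode("utf-8")).hexdigest()
    # e.g. key "5d41..." is stored under <disk_cache_dir>5/d/4/
    dir_path = disk_cache_dir + key[0] + "/" + key[1] + "/" + key[2] + "/"
    return dir_path, key


def file_is_expired(file_path, now=None):
    """A missing file counts as expired; otherwise compare its mtime to the window."""
    if not os.path.exists(file_path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(file_path) > EXPIRE_SECONDS
```

In the removed script the request handler serves any cached file that exists, and the expiry check is only applied on the write path before an existing entry is overwritten.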
|
|
||||||
@@ -1,223 +1,220 @@
|
|||||||
// Restock Detector
|
async () => {
|
||||||
// (c) Leigh Morresi dgtlmoon@gmail.com
|
|
||||||
//
|
|
||||||
// Assumes the product is in stock to begin with, unless the following appears above the fold ;
|
|
||||||
// - outOfStockTexts appears above the fold (out of stock)
|
|
||||||
// - negateOutOfStockRegex (really is in stock)
|
|
||||||
|
|
||||||
function isItemInStock() {
|
function isItemInStock() {
|
||||||
// @todo Pass these in so the same list can be used in non-JS fetchers
|
// @todo Pass these in so the same list can be used in non-JS fetchers
|
||||||
const outOfStockTexts = [
|
const outOfStockTexts = [
|
||||||
' أخبرني عندما يتوفر',
|
' أخبرني عندما يتوفر',
|
||||||
'0 in stock',
|
'0 in stock',
|
||||||
'actuellement indisponible',
|
'actuellement indisponible',
|
||||||
'agotado',
|
'agotado',
|
||||||
'article épuisé',
|
'article épuisé',
|
||||||
'artikel zurzeit vergriffen',
|
'artikel zurzeit vergriffen',
|
||||||
'as soon as stock is available',
|
'as soon as stock is available',
|
||||||
'ausverkauft', // sold out
|
'ausverkauft', // sold out
|
||||||
'available for back order',
|
'available for back order',
|
||||||
'awaiting stock',
|
'awaiting stock',
|
||||||
'back in stock soon',
|
'back in stock soon',
|
||||||
'back-order or out of stock',
|
'back-order or out of stock',
|
||||||
'backordered',
|
'backordered',
|
||||||
'benachrichtigt mich', // notify me
|
'benachrichtigt mich', // notify me
|
||||||
'brak na stanie',
|
'brak na stanie',
|
||||||
'brak w magazynie',
|
'brak w magazynie',
|
||||||
'coming soon',
|
'coming soon',
|
||||||
'currently have any tickets for this',
|
'currently have any tickets for this',
|
||||||
'currently unavailable',
|
'currently unavailable',
|
||||||
'dieser artikel ist bald wieder verfügbar',
|
'dieser artikel ist bald wieder verfügbar',
|
||||||
'dostępne wkrótce',
|
'dostępne wkrótce',
|
||||||
'en rupture',
|
'en rupture',
|
||||||
'en rupture de stock',
|
'en rupture de stock',
|
||||||
'épuisé',
|
'épuisé',
|
||||||
'esgotado',
|
'esgotado',
|
||||||
'indisponible',
|
'indisponible',
|
||||||
'indisponível',
|
'indisponível',
|
||||||
'isn\'t in stock right now',
|
'isn\'t in stock right now',
|
||||||
'isnt in stock right now',
|
'isnt in stock right now',
|
||||||
'isn’t in stock right now',
|
'isn’t in stock right now',
|
||||||
'item is no longer available',
|
'item is no longer available',
|
||||||
'let me know when it\'s available',
|
'let me know when it\'s available',
|
||||||
'mail me when available',
|
'mail me when available',
|
||||||
'message if back in stock',
|
'message if back in stock',
|
||||||
'mevcut değil',
|
'mevcut değil',
|
||||||
'nachricht bei',
|
'nachricht bei',
|
||||||
'nicht auf lager',
|
'nicht auf lager',
|
||||||
'nicht lagernd',
|
'nicht lagernd',
|
||||||
'nicht lieferbar',
|
'nicht lieferbar',
|
||||||
'nicht verfügbar',
|
'nicht verfügbar',
|
||||||
'nicht vorrätig',
|
'nicht vorrätig',
|
||||||
'nicht zur verfügung',
|
'nicht zur verfügung',
|
||||||
'nie znaleziono produktów',
|
'nie znaleziono produktów',
|
||||||
'niet beschikbaar',
|
'niet beschikbaar',
|
||||||
'niet leverbaar',
|
'niet leverbaar',
|
||||||
'niet op voorraad',
|
'niet op voorraad',
|
||||||
'no disponible',
|
'no disponible',
|
||||||
'non disponibile',
|
'non disponibile',
|
||||||
'non disponible',
|
'non disponible',
|
||||||
'no longer in stock',
|
'no longer in stock',
|
||||||
'no tickets available',
|
'no tickets available',
|
||||||
'not available',
|
'not available',
|
||||||
'not currently available',
|
'not currently available',
|
||||||
'not in stock',
|
'not in stock',
|
||||||
'notify me when available',
|
'notify me when available',
|
||||||
'notify me',
|
'notify me',
|
||||||
'notify when available',
|
'notify when available',
|
||||||
'não disponível',
|
'não disponível',
|
||||||
'não estamos a aceitar encomendas',
|
'não estamos a aceitar encomendas',
|
||||||
'out of stock',
|
'out of stock',
|
||||||
'out-of-stock',
|
'out-of-stock',
|
||||||
'plus disponible',
|
'plus disponible',
|
||||||
'prodotto esaurito',
|
'prodotto esaurito',
|
||||||
'produkt niedostępny',
|
'produkt niedostępny',
|
||||||
'rupture',
|
'rupture',
|
||||||
'sold out',
|
'sold out',
|
||||||
'sold-out',
|
'sold-out',
|
||||||
'stokta yok',
|
'stok habis',
|
||||||
'temporarily out of stock',
|
'stok kosong',
|
||||||
'temporarily unavailable',
|
'stok varian ini habis',
|
||||||
'there were no search results for',
|
'stokta yok',
|
||||||
'this item is currently unavailable',
|
'temporarily out of stock',
|
||||||
'tickets unavailable',
|
'temporarily unavailable',
|
||||||
'tijdelijk uitverkocht',
|
'there were no search results for',
|
||||||
'tükendi',
|
'this item is currently unavailable',
|
||||||
'unavailable nearby',
|
'tickets unavailable',
|
||||||
'unavailable tickets',
|
'tidak dijual',
|
||||||
'vergriffen',
|
'tidak tersedia',
|
||||||
'vorbestellen',
|
'tijdelijk uitverkocht',
|
||||||
'vorbestellung ist bald möglich',
|
'tiket tidak tersedia',
|
||||||
'we don\'t currently have any',
|
'tükendi',
|
||||||
'we couldn\'t find any products that match',
|
'unavailable nearby',
|
||||||
'we do not currently have an estimate of when this product will be back in stock.',
|
'unavailable tickets',
|
||||||
'we don\'t know when or if this item will be back in stock.',
|
'vergriffen',
|
||||||
'we were not able to find a match',
|
'vorbestellen',
|
||||||
'when this arrives in stock',
|
'vorbestellung ist bald möglich',
|
||||||
'zur zeit nicht an lager',
|
'we don\'t currently have any',
|
||||||
'品切れ',
|
'we couldn\'t find any products that match',
|
||||||
'已售',
|
'we do not currently have an estimate of when this product will be back in stock.',
|
||||||
'已售完',
|
'we don\'t know when or if this item will be back in stock.',
|
||||||
'품절'
|
'we were not able to find a match',
|
||||||
];
|
'when this arrives in stock',
|
||||||
|
'zur zeit nicht an lager',
|
||||||
|
'品切れ',
|
||||||
|
'已售',
|
||||||
|
'已售完',
|
||||||
|
'품절'
|
||||||
|
];
|
||||||
|
|
||||||
|
|
||||||
const vh = Math.max(document.documentElement.clientHeight || 0, window.innerHeight || 0);
|
const vh = Math.max(document.documentElement.clientHeight || 0, window.innerHeight || 0);
|
||||||
|
|
||||||
function getElementBaseText(element) {
|
function getElementBaseText(element) {
|
||||||
// .textContent can include text from children which may give the wrong results
|
// .textContent can include text from children which may give the wrong results
|
||||||
// scan only immediate TEXT_NODEs, which will be a child of the element
|
// scan only immediate TEXT_NODEs, which will be a child of the element
|
||||||
var text = "";
|
var text = "";
|
||||||
for (var i = 0; i < element.childNodes.length; ++i)
|
for (var i = 0; i < element.childNodes.length; ++i)
|
||||||
if (element.childNodes[i].nodeType === Node.TEXT_NODE)
|
if (element.childNodes[i].nodeType === Node.TEXT_NODE)
|
||||||
text += element.childNodes[i].textContent;
|
text += element.childNodes[i].textContent;
|
||||||
return text.toLowerCase().trim();
|
return text.toLowerCase().trim();
|
||||||
}
|
}
|
||||||
|
|
||||||
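const negateOutOfStockRegex = new RegExp('^([0-9] in stock|add to cart|in stock)', 'i'); // no 'g' flag: test() on a global regex keeps state in lastIndex between calls
|
const negateOutOfStockRegex = new RegExp('^([0-9] in stock|add to cart|in stock)', 'i'); // no 'g' flag: test() on a global regex keeps state in lastIndex between calls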
|
||||||
|
|
||||||
// The out-of-stock or in-stock-text is generally always above-the-fold
|
// The out-of-stock or in-stock-text is generally always above-the-fold
|
||||||
// and often below-the-fold is a list of related products that may or may not contain trigger text
|
// and often below-the-fold is a list of related products that may or may not contain trigger text
|
||||||
// so it's good to filter to just the 'above the fold' elements
|
// so it's good to filter to just the 'above the fold' elements
|
||||||
// and it should be at least 100px from the top to ignore items in the toolbar, sometimes menu items like "Coming soon" exist
|
// and it should be at least 100px from the top to ignore items in the toolbar, sometimes menu items like "Coming soon" exist
|
||||||
|
|
||||||
|
|
||||||
// @todo - if it's SVG or IMG, go into image diff mode
|
// @todo - if it's SVG or IMG, go into image diff mode
|
||||||
// %ELEMENTS% replaced at injection time because different interfaces use it with different settings
|
|
||||||
|
|
||||||
console.log("Scanning %ELEMENTS%");
|
function collectVisibleElements(parent, visibleElements) {
|
||||||
|
if (!parent) return; // Base case: if parent is null or undefined, return
|
||||||
|
|
||||||
function collectVisibleElements(parent, visibleElements) {
|
// Add the parent itself to the visible elements array if it's of the specified types
|
||||||
if (!parent) return; // Base case: if parent is null or undefined, return
|
visibleElements.push(parent);
|
||||||
|
|
||||||
// Add the parent itself to the visible elements array if it's of the specified types
|
// Iterate over the parent's children
|
||||||
visibleElements.push(parent);
|
const children = parent.children;
|
||||||
|
for (let i = 0; i < children.length; i++) {
|
||||||
// Iterate over the parent's children
|
const child = children[i];
|
||||||
const children = parent.children;
|
if (
|
||||||
for (let i = 0; i < children.length; i++) {
|
child.nodeType === Node.ELEMENT_NODE &&
|
||||||
const child = children[i];
|
window.getComputedStyle(child).display !== 'none' &&
|
||||||
if (
|
window.getComputedStyle(child).visibility !== 'hidden' &&
|
||||||
child.nodeType === Node.ELEMENT_NODE &&
|
child.offsetWidth >= 0 &&
|
||||||
window.getComputedStyle(child).display !== 'none' &&
|
child.offsetHeight >= 0 &&
|
||||||
window.getComputedStyle(child).visibility !== 'hidden' &&
|
window.getComputedStyle(child).contentVisibility !== 'hidden'
|
||||||
child.offsetWidth >= 0 &&
|
) {
|
||||||
child.offsetHeight >= 0 &&
|
// If the child is an element and is visible, recursively collect visible elements
|
||||||
window.getComputedStyle(child).contentVisibility !== 'hidden'
|
collectVisibleElements(child, visibleElements);
|
||||||
) {
|
}
|
||||||
// If the child is an element and is visible, recursively collect visible elements
|
|
||||||
collectVisibleElements(child, visibleElements);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
const elementsToScan = [];
|
const elementsToScan = [];
|
||||||
collectVisibleElements(document.body, elementsToScan);
|
collectVisibleElements(document.body, elementsToScan);
|
||||||
|
|
||||||
var elementText = "";
|
var elementText = "";
|
||||||
|
|
||||||
// REGEXS THAT REALLY MEAN IT'S IN STOCK
|
// REGEXS THAT REALLY MEAN IT'S IN STOCK
|
||||||
for (let i = elementsToScan.length - 1; i >= 0; i--) {
|
for (let i = elementsToScan.length - 1; i >= 0; i--) {
|
||||||
const element = elementsToScan[i];
|
const element = elementsToScan[i];
|
||||||
|
|
||||||
// outside the 'fold' or some weird text in the heading area
|
// outside the 'fold' or some weird text in the heading area
|
||||||
// .getBoundingClientRect() was causing a crash in chrome 119, can only be run on contentVisibility != hidden
|
// .getBoundingClientRect() was causing a crash in chrome 119, can only be run on contentVisibility != hidden
|
||||||
if (element.getBoundingClientRect().top + window.scrollY >= vh || element.getBoundingClientRect().top + window.scrollY <= 100) {
|
if (element.getBoundingClientRect().top + window.scrollY >= vh || element.getBoundingClientRect().top + window.scrollY <= 100) {
|
||||||
continue
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
elementText = "";
|
||||||
|
try {
|
||||||
|
if (element.tagName.toLowerCase() === "input") {
|
||||||
|
elementText = element.value.toLowerCase().trim();
|
||||||
|
} else {
|
||||||
|
elementText = getElementBaseText(element);
|
||||||
|
}
|
||||||
|
} catch (e) {
|
||||||
|
console.warn('stock-not-in-stock.js scraper - handling element for gettext failed', e);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (elementText.length) {
|
||||||
|
// try which ones could mean its in stock
|
||||||
|
if (negateOutOfStockRegex.test(elementText) && !elementText.includes('(0 products)')) {
|
||||||
|
console.log(`Negating/overriding 'Out of Stock' back to "Possibly in stock" found "${elementText}"`)
|
||||||
|
return 'Possibly in stock';
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
elementText = "";
|
// OTHER STUFF THAT COULD BE THAT IT'S OUT OF STOCK
|
||||||
try {
|
for (let i = elementsToScan.length - 1; i >= 0; i--) {
|
||||||
|
const element = elementsToScan[i];
|
||||||
|
// outside the 'fold' or some weird text in the heading area
|
||||||
|
// .getBoundingClientRect() was causing a crash in chrome 119, can only be run on contentVisibility != hidden
|
||||||
|
// Note: theres also an automated test that places the 'out of stock' text fairly low down
|
||||||
|
if (element.getBoundingClientRect().top + window.scrollY >= vh + 250 || element.getBoundingClientRect().top + window.scrollY <= 100) {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
elementText = "";
|
||||||
if (element.tagName.toLowerCase() === "input") {
|
if (element.tagName.toLowerCase() === "input") {
|
||||||
elementText = element.value.toLowerCase().trim();
|
elementText = element.value.toLowerCase().trim();
|
||||||
} else {
|
} else {
|
||||||
elementText = getElementBaseText(element);
|
elementText = getElementBaseText(element);
|
||||||
}
|
}
|
||||||
} catch (e) {
|
|
||||||
console.warn('stock-not-in-stock.js scraper - handling element for gettext failed', e);
|
|
||||||
}
|
|
||||||
|
|
||||||
if (elementText.length) {
|
if (elementText.length) {
|
||||||
// try which ones could mean its in stock
|
// and these mean its out of stock
|
||||||
if (negateOutOfStockRegex.test(elementText) && !elementText.includes('(0 products)')) {
|
for (const outOfStockText of outOfStockTexts) {
|
||||||
console.log(`Negating/overriding 'Out of Stock' back to "Possibly in stock" found "${elementText}"`)
|
if (elementText.includes(outOfStockText)) {
|
||||||
return 'Possibly in stock';
|
console.log(`Selected 'Out of Stock' - found text "${outOfStockText}" - "${elementText}" - offset top ${element.getBoundingClientRect().top}, page height is ${vh}`)
|
||||||
}
|
return outOfStockText; // item is out of stock
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
// OTHER STUFF THAT COULD BE THAT IT'S OUT OF STOCK
|
|
||||||
for (let i = elementsToScan.length - 1; i >= 0; i--) {
|
|
||||||
const element = elementsToScan[i];
|
|
||||||
// outside the 'fold' or some weird text in the heading area
|
|
||||||
// .getBoundingClientRect() was causing a crash in chrome 119, can only be run on contentVisibility != hidden
|
|
||||||
// Note: theres also an automated test that places the 'out of stock' text fairly low down
|
|
||||||
if (element.getBoundingClientRect().top + window.scrollY >= vh + 250 || element.getBoundingClientRect().top + window.scrollY <= 100) {
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
elementText = "";
|
|
||||||
if (element.tagName.toLowerCase() === "input") {
|
|
||||||
elementText = element.value.toLowerCase().trim();
|
|
||||||
} else {
|
|
||||||
elementText = getElementBaseText(element);
|
|
||||||
}
|
|
||||||
|
|
||||||
if (elementText.length) {
|
|
||||||
// and these mean its out of stock
|
|
||||||
for (const outOfStockText of outOfStockTexts) {
|
|
||||||
if (elementText.includes(outOfStockText)) {
|
|
||||||
console.log(`Selected 'Out of Stock' - found text "${outOfStockText}" - "${elementText}" - offset top ${element.getBoundingClientRect().top}, page height is ${vh}`)
|
|
||||||
return outOfStockText; // item is out of stock
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
console.log(`Returning 'Possibly in stock' - cant' find any useful matching text`)
|
||||||
|
return 'Possibly in stock'; // possibly in stock, cant decide otherwise.
|
||||||
}
|
}
|
||||||
|
|
||||||
console.log(`Returning 'Possibly in stock' - cant' find any useful matching text`)
|
|
||||||
return 'Possibly in stock'; // possibly in stock, cant decide otherwise.
|
|
||||||
}
|
|
||||||
|
|
||||||
// returns the element text that makes it think it's out of stock
|
// returns the element text that makes it think it's out of stock
|
||||||
return isItemInStock().trim()
|
return isItemInStock().trim()
|
||||||
|
}
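The decision logic of the restock detector is easier to review without the DOM traversal. The Python sketch below restates its two passes under simplifying assumptions: each element is reduced to a `(text, top)` pair, where `top` stands for `getBoundingClientRect().top + window.scrollY`, and only a handful of the out-of-stock phrases are listed. `classify` and both constant names are hypothetical and appear nowhere in the project.

```python
import re

# Text that overrides an out-of-stock verdict (same alternatives as the script's regex)
NEGATE_OUT_OF_STOCK = re.compile(r"^([0-9] in stock|add to cart|in stock)", re.IGNORECASE)

# Shortened illustrative list; the real script carries many more phrases
OUT_OF_STOCK_TEXTS = ["out of stock", "sold out", "currently unavailable", "tidak tersedia"]


def classify(elements, vh):
    """elements: list of (text, top) pairs; vh: viewport height in pixels."""
    # Pass 1: anything above the fold (and below the 100px toolbar band)
    # that really means "in stock" wins immediately
    for text, top in reversed(elements):
        if top >= vh or top <= 100:
            continue
        text = text.lower().strip()
        if text and NEGATE_OUT_OF_STOCK.match(text) and "(0 products)" not in text:
            return "Possibly in stock"

    # Pass 2: look for out-of-stock phrases, allowing 250px below the fold
    for text, top in reversed(elements):
        if top >= vh + 250 or top <= 100:
            continue
        text = text.lower().strip()
        if text:
            for needle in OUT_OF_STOCK_TEXTS:
                if needle in text:
                    return needle  # the phrase that triggered the verdict

    # No useful text found either way
    return "Possibly in stock"
```

The ordering matters: an "add to cart" button anywhere in the first-pass window overrides every out-of-stock phrase, and the return value is either the matched phrase or the literal `'Possibly in stock'`.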
|
||||||
|
|
||||||
|
|||||||
@@ -1,285 +1,285 @@
|
|||||||
// Copyright (C) 2021 Leigh Morresi (dgtlmoon@gmail.com)
|
async (options) => {
|
||||||
// All rights reserved.
|
|
||||||
|
|
||||||
// @file Scrape the page looking for elements of concern (%ELEMENTS%)
|
let visualselector_xpath_selectors = options.visualselector_xpath_selectors
|
||||||
// http://matatk.agrip.org.uk/tests/position-and-width/
|
let max_height = options.max_height
|
||||||
// https://stackoverflow.com/questions/26813480/when-is-element-getboundingclientrect-guaranteed-to-be-updated-accurate
|
|
||||||
//
|
|
||||||
// Some pages like https://www.londonstockexchange.com/stock/NCCL/ncondezi-energy-limited/analysis
|
|
||||||
// will automatically force a scroll somewhere, so include the position offset
|
|
||||||
// Lets hope the position doesnt change while we iterate the bbox's, but this is better than nothing
|
|
||||||
var scroll_y = 0;
|
|
||||||
try {
|
|
||||||
scroll_y = +document.documentElement.scrollTop || document.body.scrollTop
|
|
||||||
} catch (e) {
|
|
||||||
console.log(e);
|
|
||||||
}
|
|
||||||
|
|
||||||
|
var scroll_y = 0;
|
||||||
// Include the getXpath script directly, easier than fetching
|
|
||||||
function getxpath(e) {
|
|
||||||
var n = e;
|
|
||||||
if (n && n.id) return '//*[@id="' + n.id + '"]';
|
|
||||||
for (var o = []; n && Node.ELEMENT_NODE === n.nodeType;) {
|
|
||||||
for (var i = 0, r = !1, d = n.previousSibling; d;) d.nodeType !== Node.DOCUMENT_TYPE_NODE && d.nodeName === n.nodeName && i++, d = d.previousSibling;
|
|
||||||
for (d = n.nextSibling; d;) {
|
|
||||||
if (d.nodeName === n.nodeName) {
|
|
||||||
r = !0;
|
|
||||||
break
|
|
||||||
}
|
|
||||||
d = d.nextSibling
|
|
||||||
}
|
|
||||||
o.push((n.prefix ? n.prefix + ":" : "") + n.localName + (i || r ? "[" + (i + 1) + "]" : "")), n = n.parentNode
|
|
||||||
}
|
|
||||||
return o.length ? "/" + o.reverse().join("/") : ""
|
|
||||||
}
|
|
||||||
|
|
||||||
const findUpTag = (el) => {
|
|
||||||
let r = el
|
|
||||||
chained_css = [];
|
|
||||||
depth = 0;
|
|
||||||
|
|
||||||
// Strategy 1: If it's an input, with name, and there's only one, prefer that
|
|
||||||
if (el.name !== undefined && el.name.length) {
|
|
||||||
var proposed = el.tagName + "[name=\"" + CSS.escape(el.name) + "\"]";
|
|
||||||
var proposed_element = window.document.querySelectorAll(proposed);
|
|
||||||
if (proposed_element.length) {
|
|
||||||
if (proposed_element.length === 1) {
|
|
||||||
return proposed;
|
|
||||||
} else {
|
|
||||||
// Some sites change ID but name= stays the same, we can hit it if we know the index
|
|
||||||
// Find all the elements that match and work out the input[n]
|
|
||||||
var n = Array.from(proposed_element).indexOf(el);
|
|
||||||
// Return a Playwright selector for nthinput[name=zipcode]
|
|
||||||
return proposed + " >> nth=" + n;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Strategy 2: Keep going up until we hit an ID tag, imagine it's like #list-widget div h4
|
|
||||||
while (r.parentNode) {
|
|
||||||
if (depth === 5) {
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
if ('' !== r.id) {
|
|
||||||
chained_css.unshift("#" + CSS.escape(r.id));
|
|
||||||
final_selector = chained_css.join(' > ');
|
|
||||||
// Be sure theres only one, some sites have multiples of the same ID tag :-(
|
|
||||||
if (window.document.querySelectorAll(final_selector).length === 1) {
|
|
||||||
return final_selector;
|
|
||||||
}
|
|
||||||
return null;
|
|
||||||
} else {
|
|
||||||
chained_css.unshift(r.tagName.toLowerCase());
|
|
||||||
}
|
|
||||||
r = r.parentNode;
|
|
||||||
depth += 1;
|
|
||||||
}
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
// @todo - if it's SVG or IMG, go into image diff mode
|
|
||||||
// %ELEMENTS% replaced at injection time because different interfaces use it with different settings
|
|
||||||
|
|
||||||
var size_pos = [];
|
|
||||||
// after page fetch, inject this JS
|
|
||||||
// build a map of all elements and their positions (maybe that only include text?)
|
|
||||||
var bbox;
|
|
||||||
console.log("Scanning %ELEMENTS%");
|
|
||||||
|
|
||||||
function collectVisibleElements(parent, visibleElements) {
|
|
||||||
if (!parent) return; // Base case: if parent is null or undefined, return
|
|
||||||
|
|
||||||
|
|
||||||
// Add the parent itself to the visible elements array if it's of the specified types
|
|
||||||
const tagName = parent.tagName.toLowerCase();
|
|
||||||
if ("%ELEMENTS%".split(',').includes(tagName)) {
|
|
||||||
visibleElements.push(parent);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Iterate over the parent's children
|
|
||||||
const children = parent.children;
|
|
||||||
for (let i = 0; i < children.length; i++) {
|
|
||||||
const child = children[i];
|
|
||||||
const computedStyle = window.getComputedStyle(child);
|
|
||||||
|
|
||||||
if (
|
|
||||||
child.nodeType === Node.ELEMENT_NODE &&
|
|
||||||
computedStyle.display !== 'none' &&
|
|
||||||
computedStyle.visibility !== 'hidden' &&
|
|
||||||
child.offsetWidth >= 0 &&
|
|
||||||
child.offsetHeight >= 0 &&
|
|
||||||
computedStyle.contentVisibility !== 'hidden'
|
|
||||||
) {
|
|
||||||
// If the child is an element and is visible, recursively collect visible elements
|
|
||||||
collectVisibleElements(child, visibleElements);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Create an array to hold the visible elements
|
|
||||||
const visibleElementsArray = [];
|
|
||||||
|
|
||||||
// Call collectVisibleElements with the starting parent element
|
|
||||||
collectVisibleElements(document.body, visibleElementsArray);
|
|
||||||
|
|
||||||
|
|
||||||
visibleElementsArray.forEach(function (element) {
|
|
||||||
|
|
||||||
bbox = element.getBoundingClientRect();
|
|
||||||
|
|
||||||
// Skip really small ones, and where width or height ==0
|
|
||||||
if (bbox['width'] * bbox['height'] < 10) {
|
|
||||||
return
|
|
||||||
}
|
|
||||||
|
|
||||||
// Don't include elements that are offset from canvas
|
|
||||||
if (bbox['top'] + scroll_y < 0 || bbox['left'] < 0) {
|
|
||||||
return
|
|
||||||
}
|
|
||||||
|
|
||||||
// @todo the getXpath kind of sucks, it doesnt know when there is for example just one ID sometimes
|
|
||||||
// it should not traverse when we know we can anchor off just an ID one level up etc..
|
|
||||||
// maybe, get current class or id, keep traversing up looking for only class or id until there is just one match
|
|
||||||
|
|
||||||
// 1st primitive - if it has class, try joining it all and select, if theres only one.. well thats us.
|
|
||||||
xpath_result = false;
|
|
||||||
try {
|
try {
|
||||||
var d = findUpTag(element);
|
scroll_y = +document.documentElement.scrollTop || document.body.scrollTop
|
||||||
if (d) {
|
|
||||||
xpath_result = d;
|
|
||||||
}
|
|
||||||
} catch (e) {
|
} catch (e) {
|
||||||
console.log(e);
|
console.log(e);
|
||||||
}
|
}
|
||||||
// You could swap it and default to getXpath and then try the smarter one
|
|
||||||
// default back to the less intelligent one
|
// Include the getXpath script directly, easier than fetching
|
||||||
if (!xpath_result) {
|
function getxpath(e) {
|
||||||
try {
|
var n = e;
|
||||||
// I've seen on FB and eBay that this doesnt work
|
if (n && n.id) return '//*[@id="' + n.id + '"]';
|
||||||
// ReferenceError: getXPath is not defined at eval (eval at evaluate (:152:29), <anonymous>:67:20) at UtilityScript.evaluate (<anonymous>:159:18) at UtilityScript.<anonymous> (<anonymous>:1:44)
|
for (var o = []; n && Node.ELEMENT_NODE === n.nodeType;) {
|
||||||
xpath_result = getxpath(element);
|
for (var i = 0, r = !1, d = n.previousSibling; d;) d.nodeType !== Node.DOCUMENT_TYPE_NODE && d.nodeName === n.nodeName && i++, d = d.previousSibling;
|
||||||
} catch (e) {
|
for (d = n.nextSibling; d;) {
|
||||||
console.log(e);
|
if (d.nodeName === n.nodeName) {
|
||||||
return
|
r = !0;
|
||||||
|
break
|
||||||
|
}
|
||||||
|
d = d.nextSibling
|
||||||
|
}
|
||||||
|
o.push((n.prefix ? n.prefix + ":" : "") + n.localName + (i || r ? "[" + (i + 1) + "]" : "")), n = n.parentNode
|
||||||
|
}
|
||||||
|
return o.length ? "/" + o.reverse().join("/") : ""
|
||||||
|
}
|
||||||
|
|
||||||
|
const findUpTag = (el) => {
|
||||||
|
let r = el
|
||||||
|
chained_css = [];
|
||||||
|
depth = 0;
|
||||||
|
|
||||||
|
// Strategy 1: If it's an input, with name, and there's only one, prefer that
|
||||||
|
if (el.name !== undefined && el.name.length) {
|
||||||
|
var proposed = el.tagName + "[name=\"" + CSS.escape(el.name) + "\"]";
|
||||||
|
var proposed_element = window.document.querySelectorAll(proposed);
|
||||||
|
if (proposed_element.length) {
|
||||||
|
if (proposed_element.length === 1) {
|
||||||
|
return proposed;
|
||||||
|
} else {
|
||||||
|
// Some sites change ID but name= stays the same, we can hit it if we know the index
|
||||||
|
// Find all the elements that match and work out the input[n]
|
||||||
|
var n = Array.from(proposed_element).indexOf(el);
|
||||||
|
// Return a Playwright selector for nthinput[name=zipcode]
|
||||||
|
return proposed + " >> nth=" + n;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Strategy 2: Keep going up until we hit an ID tag, imagine it's like #list-widget div h4
|
||||||
|
while (r.parentNode) {
|
||||||
|
if (depth === 5) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
if ('' !== r.id) {
|
||||||
|
chained_css.unshift("#" + CSS.escape(r.id));
|
||||||
|
final_selector = chained_css.join(' > ');
|
||||||
|
// Be sure theres only one, some sites have multiples of the same ID tag :-(
|
||||||
|
if (window.document.querySelectorAll(final_selector).length === 1) {
|
||||||
|
return final_selector;
|
||||||
|
}
|
||||||
|
return null;
|
||||||
|
} else {
|
||||||
|
chained_css.unshift(r.tagName.toLowerCase());
|
||||||
|
}
|
||||||
|
r = r.parentNode;
|
||||||
|
depth += 1;
|
||||||
|
}
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// @todo - if it's SVG or IMG, go into image diff mode
|
||||||
|
|
||||||
|
var size_pos = [];
|
||||||
|
// after page fetch, inject this JS
|
||||||
|
// build a map of all elements and their positions (maybe that only include text?)
|
||||||
|
var bbox;
|
||||||
|
console.log(`Scanning for "${visualselector_xpath_selectors}"`);
|
||||||
|
|
||||||
|
function collectVisibleElements(parent, visibleElements) {
|
||||||
|
if (!parent) return; // Base case: if parent is null or undefined, return
|
||||||
|
|
||||||
|
|
||||||
|
// Add the parent itself to the visible elements array if it's of the specified types
|
||||||
|
const tagName = parent.tagName.toLowerCase();
|
||||||
|
if (visualselector_xpath_selectors.split(',').includes(tagName)) {
|
||||||
|
visibleElements.push(parent);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Iterate over the parent's children
|
||||||
|
const children = parent.children;
|
||||||
|
for (let i = 0; i < children.length; i++) {
|
||||||
|
const child = children[i];
|
||||||
|
const computedStyle = window.getComputedStyle(child);
|
||||||
|
|
||||||
|
if (
|
||||||
|
child.nodeType === Node.ELEMENT_NODE &&
|
||||||
|
computedStyle.display !== 'none' &&
|
||||||
|
computedStyle.visibility !== 'hidden' &&
|
||||||
|
child.offsetWidth >= 0 &&
|
||||||
|
child.offsetHeight >= 0 &&
|
||||||
|
computedStyle.contentVisibility !== 'hidden'
|
||||||
|
) {
|
||||||
|
// If the child is an element and is visible, recursively collect visible elements
|
||||||
|
collectVisibleElements(child, visibleElements);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
let label = "not-interesting" // A placeholder, the actual labels for training are done by hand for now
|
// Create an array to hold the visible elements
|
||||||
|
const visibleElementsArray = [];
|
||||||
|
|
||||||
let text = element.textContent.trim().slice(0, 30).trim();
|
// Call collectVisibleElements with the starting parent element
|
||||||
while (/\n{2,}|\t{2,}/.test(text)) {
|
collectVisibleElements(document.body, visibleElementsArray);
|
||||||
text = text.replace(/\n{2,}/g, '\n').replace(/\t{2,}/g, '\t')
|
|
||||||
}
|
|
||||||
|
|
||||||
// Try to identify any possible currency amounts "Sale: 4000" or "Sale now 3000 Kc", can help with the training.
|
|
||||||
const hasDigitCurrency = (/\d/.test(text.slice(0, 6)) || /\d/.test(text.slice(-6)) ) && /([€£$¥₩₹]|USD|AUD|EUR|Kč|kr|SEK|,–)/.test(text) ;
|
|
||||||
const computedStyle = window.getComputedStyle(element);
|
|
||||||
|
|
||||||
size_pos.push({
|
visibleElementsArray.forEach(function (element) {
|
||||||
xpath: xpath_result,
|
|
||||||
width: Math.round(bbox['width']),
|
bbox = element.getBoundingClientRect();
|
||||||
height: Math.round(bbox['height']),
|
|
||||||
left: Math.floor(bbox['left']),
|
// Skip really small ones, and where width or height ==0
|
||||||
top: Math.floor(bbox['top']) + scroll_y,
|
if (bbox['width'] * bbox['height'] < 10) {
|
||||||
// tagName used by Browser Steps
|
return
|
||||||
tagName: (element.tagName) ? element.tagName.toLowerCase() : '',
|
}
|
||||||
// tagtype used by Browser Steps
|
|
||||||
tagtype: (element.tagName.toLowerCase() === 'input' && element.type) ? element.type.toLowerCase() : '',
|
// Don't include elements that are offset from canvas
|
||||||
isClickable: computedStyle.cursor === "pointer",
|
if (bbox['top'] + scroll_y < 0 || bbox['left'] < 0) {
|
||||||
// Used by the keras trainer
|
return
|
||||||
fontSize: computedStyle.getPropertyValue('font-size'),
|
}
|
||||||
fontWeight: computedStyle.getPropertyValue('font-weight'),
|
|
||||||
hasDigitCurrency: hasDigitCurrency,
|
// @todo the getXpath kind of sucks, it doesnt know when there is for example just one ID sometimes
|
||||||
label: label,
|
// it should not traverse when we know we can anchor off just an ID one level up etc..
|
||||||
|
// maybe, get current class or id, keep traversing up looking for only class or id until there is just one match
|
||||||
|
|
||||||
|
// 1st primitive - if it has class, try joining it all and select, if theres only one.. well thats us.
|
||||||
|
xpath_result = false;
|
||||||
|
try {
|
||||||
|
var d = findUpTag(element);
|
||||||
|
if (d) {
|
||||||
|
xpath_result = d;
|
||||||
|
}
|
||||||
|
} catch (e) {
|
||||||
|
console.log(e);
|
||||||
|
}
|
||||||
|
// You could swap it and default to getXpath and then try the smarter one
|
||||||
|
// default back to the less intelligent one
|
||||||
|
if (!xpath_result) {
|
||||||
|
try {
|
||||||
|
// I've seen on FB and eBay that this doesnt work
|
||||||
|
// ReferenceError: getXPath is not defined at eval (eval at evaluate (:152:29), <anonymous>:67:20) at UtilityScript.evaluate (<anonymous>:159:18) at UtilityScript.<anonymous> (<anonymous>:1:44)
|
||||||
|
xpath_result = getxpath(element);
|
||||||
|
} catch (e) {
|
||||||
|
console.log(e);
|
||||||
|
return
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let label = "not-interesting" // A placeholder, the actual labels for training are done by hand for now
|
||||||
|
|
||||||
|
let text = element.textContent.trim().slice(0, 30).trim();
|
||||||
|
while (/\n{2,}|\t{2,}/.test(text)) {
|
||||||
|
text = text.replace(/\n{2,}/g, '\n').replace(/\t{2,}/g, '\t')
|
||||||
|
}
|
||||||
|
|
||||||
|
// Try to identify any possible currency amounts "Sale: 4000" or "Sale now 3000 Kc", can help with the training.
|
||||||
|
const hasDigitCurrency = (/\d/.test(text.slice(0, 6)) || /\d/.test(text.slice(-6))) && /([€£$¥₩₹]|USD|AUD|EUR|Kč|kr|SEK|,–)/.test(text);
|
||||||
|
const computedStyle = window.getComputedStyle(element);
|
||||||
|
|
||||||
|
if (Math.floor(bbox['top']) + scroll_y > max_height) {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
size_pos.push({
|
||||||
|
xpath: xpath_result,
|
||||||
|
width: Math.round(bbox['width']),
|
||||||
|
height: Math.round(bbox['height']),
|
||||||
|
left: Math.floor(bbox['left']),
|
||||||
|
top: Math.floor(bbox['top']) + scroll_y,
|
||||||
|
// tagName used by Browser Steps
|
||||||
|
tagName: (element.tagName) ? element.tagName.toLowerCase() : '',
|
||||||
|
// tagtype used by Browser Steps
|
||||||
|
tagtype: (element.tagName.toLowerCase() === 'input' && element.type) ? element.type.toLowerCase() : '',
|
||||||
|
isClickable: computedStyle.cursor === "pointer",
|
||||||
|
// Used by the keras trainer
|
||||||
|
fontSize: computedStyle.getPropertyValue('font-size'),
|
||||||
|
fontWeight: computedStyle.getPropertyValue('font-weight'),
|
||||||
|
hasDigitCurrency: hasDigitCurrency,
|
||||||
|
label: label,
|
||||||
|
});
|
||||||
|
|
||||||
});
|
});
|
||||||
|
|
||||||
});
|
|
||||||
|
|
||||||
|
|
||||||
// Inject the current one set in the include_filters, which may be a CSS rule
|
// Inject the current one set in the include_filters, which may be a CSS rule
|
||||||
// used for displaying the current one in VisualSelector, where its not one we generated.
|
// used for displaying the current one in VisualSelector, where its not one we generated.
|
||||||
if (include_filters.length) {
|
if (include_filters.length) {
|
||||||
let results;
|
let results;
|
||||||
// Foreach filter, go and find it on the page and add it to the results so we can visualise it again
|
// Foreach filter, go and find it on the page and add it to the results so we can visualise it again
|
||||||
for (const f of include_filters) {
|
for (const f of include_filters) {
|
||||||
bbox = false;
|
bbox = false;
|
||||||
q = false;
|
q = false;
|
||||||
|
|
||||||
if (!f.length) {
|
if (!f.length) {
|
||||||
console.log("xpath_element_scraper: Empty filter, skipping");
|
console.log("xpath_element_scraper: Empty filter, skipping");
|
||||||
continue;
|
continue;
|
||||||
}
|
|
||||||
|
|
||||||
try {
|
|
||||||
// is it xpath?
|
|
||||||
if (f.startsWith('/') || f.startsWith('xpath')) {
|
|
||||||
var qry_f = f.replace(/xpath(:|\d:)/, '')
|
|
||||||
console.log("[xpath] Scanning for included filter " + qry_f)
|
|
||||||
let xpathResult = document.evaluate(qry_f, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
|
|
||||||
results = [];
|
|
||||||
for (let i = 0; i < xpathResult.snapshotLength; i++) {
|
|
||||||
results.push(xpathResult.snapshotItem(i));
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
console.log("[css] Scanning for included filter " + f)
|
|
||||||
console.log("[css] Scanning for included filter " + f);
|
|
||||||
results = document.querySelectorAll(f);
|
|
||||||
}
|
}
|
||||||
} catch (e) {
|
|
||||||
// Maybe catch DOMException and alert?
|
|
||||||
console.log("xpath_element_scraper: Exception selecting element from filter " + f);
|
|
||||||
console.log(e);
|
|
||||||
}
|
|
||||||
|
|
||||||
if (results != null && results.length) {
|
try {
|
||||||
|
// is it xpath?
|
||||||
// Iterate over the results
|
if (f.startsWith('/') || f.startsWith('xpath')) {
|
||||||
results.forEach(node => {
|
var qry_f = f.replace(/xpath(:|\d:)/, '')
|
||||||
// Try to resolve //something/text() back to its /something so we can atleast get the bounding box
|
console.log("[xpath] Scanning for included filter " + qry_f)
|
||||||
try {
|
let xpathResult = document.evaluate(qry_f, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
|
||||||
if (typeof node.nodeName == 'string' && node.nodeName === '#text') {
|
results = [];
|
||||||
node = node.parentElement
|
for (let i = 0; i < xpathResult.snapshotLength; i++) {
|
||||||
|
results.push(xpathResult.snapshotItem(i));
|
||||||
}
|
}
|
||||||
} catch (e) {
|
|
||||||
console.log(e)
|
|
||||||
console.log("xpath_element_scraper: #text resolver")
|
|
||||||
}
|
|
||||||
|
|
||||||
// #1231 - IN the case XPath attribute filter is applied, we will have to traverse up and find the element.
|
|
||||||
if (typeof node.getBoundingClientRect == 'function') {
|
|
||||||
bbox = node.getBoundingClientRect();
|
|
||||||
console.log("xpath_element_scraper: Got filter element, scroll from top was " + scroll_y)
|
|
||||||
} else {
|
} else {
|
||||||
|
console.log("[css] Scanning for included filter " + f)
|
||||||
|
console.log("[css] Scanning for included filter " + f);
|
||||||
|
results = document.querySelectorAll(f);
|
||||||
|
}
|
||||||
|
} catch (e) {
|
||||||
|
// Maybe catch DOMException and alert?
|
||||||
|
console.log("xpath_element_scraper: Exception selecting element from filter " + f);
|
||||||
|
console.log(e);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (results != null && results.length) {
|
||||||
|
|
||||||
|
// Iterate over the results
|
||||||
|
results.forEach(node => {
|
||||||
|
// Try to resolve //something/text() back to its /something so we can atleast get the bounding box
|
||||||
try {
|
try {
|
||||||
// Try and see we can find its ownerElement
|
if (typeof node.nodeName == 'string' && node.nodeName === '#text') {
|
||||||
bbox = node.ownerElement.getBoundingClientRect();
|
node = node.parentElement
|
||||||
console.log("xpath_element_scraper: Got filter by ownerElement element, scroll from top was " + scroll_y)
|
}
|
||||||
} catch (e) {
|
} catch (e) {
|
||||||
console.log(e)
|
console.log(e)
|
||||||
console.log("xpath_element_scraper: error looking up q.ownerElement")
|
console.log("xpath_element_scraper: #text resolver")
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
if (bbox && bbox['width'] > 0 && bbox['height'] > 0) {
|
// #1231 - IN the case XPath attribute filter is applied, we will have to traverse up and find the element.
|
||||||
size_pos.push({
|
if (typeof node.getBoundingClientRect == 'function') {
|
||||||
xpath: f,
|
bbox = node.getBoundingClientRect();
|
||||||
width: parseInt(bbox['width']),
|
console.log("xpath_element_scraper: Got filter element, scroll from top was " + scroll_y)
|
||||||
height: parseInt(bbox['height']),
|
} else {
|
||||||
left: parseInt(bbox['left']),
|
try {
|
||||||
top: parseInt(bbox['top']) + scroll_y,
|
// Try and see we can find its ownerElement
|
||||||
highlight_as_custom_filter: true
|
bbox = node.ownerElement.getBoundingClientRect();
|
||||||
});
|
console.log("xpath_element_scraper: Got filter by ownerElement element, scroll from top was " + scroll_y)
|
||||||
}
|
} catch (e) {
|
||||||
});
|
console.log(e)
|
||||||
|
console.log("xpath_element_scraper: error looking up q.ownerElement")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (bbox && bbox['width'] > 0 && bbox['height'] > 0) {
|
||||||
|
size_pos.push({
|
||||||
|
xpath: f,
|
||||||
|
width: parseInt(bbox['width']),
|
||||||
|
height: parseInt(bbox['height']),
|
||||||
|
left: parseInt(bbox['left']),
|
||||||
|
top: parseInt(bbox['top']) + scroll_y,
|
||||||
|
highlight_as_custom_filter: true
|
||||||
|
});
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
// Sort the elements so we find the smallest one first, in other words, we find the smallest one matching in that area
|
// Sort the elements so we find the smallest one first, in other words, we find the smallest one matching in that area
|
||||||
// so that we dont select the wrapping element by mistake and be unable to select what we want
|
// so that we dont select the wrapping element by mistake and be unable to select what we want
|
||||||
size_pos.sort((a, b) => (a.width * a.height > b.width * b.height) ? 1 : -1)
|
size_pos.sort((a, b) => (a.width * a.height > b.width * b.height) ? 1 : -1)
|
||||||
|
|
||||||
|
// browser_width required for proper scaling in the frontend
|
||||||
|
// Return as a string to save playwright for juggling thousands of objects
|
||||||
|
return JSON.stringify({'size_pos': size_pos, 'browser_width': window.innerWidth});
|
||||||
|
}
|
||||||
|
|
||||||
// Window.width required for proper scaling in the frontend
|
|
||||||
return {'size_pos': size_pos, 'browser_width': window.innerWidth};
|
|
||||||
|
|||||||
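The closing sort in the scraper above orders candidate boxes by area, smallest first, so that a lookup resolves to the innermost element rather than its wrapper. A minimal Python sketch of that ordering, assuming records shaped like the `size_pos` entries; the sample values and the `smallest_hit` helper are hypothetical and not part of the project:

```python
# Hypothetical records mirroring the `size_pos` entries built by the scraper.
size_pos = [
    {"xpath": "//div[@id='wrapper']", "width": 800, "height": 600, "left": 0, "top": 0},
    {"xpath": "//span[@class='price']", "width": 60, "height": 20, "left": 100, "top": 50},
    {"xpath": "//div[@class='card']", "width": 300, "height": 200, "left": 80, "top": 40},
]

# Smallest area first, the same ordering the JavaScript comparator aims for.
size_pos.sort(key=lambda box: box["width"] * box["height"])


def smallest_hit(boxes, x, y):
    """Return the first (smallest) box containing the point (x, y), or None."""
    for box in boxes:
        inside_x = box["left"] <= x <= box["left"] + box["width"]
        inside_y = box["top"] <= y <= box["top"] + box["height"]
        if inside_x and inside_y:
            return box
    return None
```

Because the list is already sorted by area, the first containing box found is also the smallest one, which is why the frontend can stop at the first match.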
73
changedetectionio/content_fetchers/screenshot_handler.py
Normal file
@@ -0,0 +1,73 @@
# Pages with a vertical height longer than this will use the 'stitch together' method.

# - Many GPUs have a max texture size of 16384x16384px (or lower on older devices).
# - If a page is taller than ~8000–10000px, it risks exceeding GPU memory limits.
# - This is especially important on headless Chromium, where Playwright may fail to allocate a massive full-page buffer.

from loguru import logger

from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, SCREENSHOT_DEFAULT_QUALITY


def stitch_images_worker(pipe_conn, chunks_bytes, original_page_height, capture_height):
    import os
    import io
    from PIL import Image, ImageDraw, ImageFont

    try:

        # Load images from byte chunks
        images = [Image.open(io.BytesIO(b)) for b in chunks_bytes]
        total_height = sum(im.height for im in images)
        max_width = max(im.width for im in images)

        # Create stitched image
        stitched = Image.new('RGB', (max_width, total_height))
        y_offset = 0
        for im in images:
            stitched.paste(im, (0, y_offset))
            y_offset += im.height

        # Draw caption on top (overlaid, not extending canvas)
        draw = ImageDraw.Draw(stitched)


        caption_text = f"WARNING: Screenshot was {original_page_height}px but trimmed to {capture_height}px because it was too long"
        padding = 10
        font_size = 35
        font_color = (255, 0, 0)
        background_color = (255, 255, 255)


        # Try to load a proper font
        try:
            font = ImageFont.truetype("arial.ttf", font_size)
        except IOError:
            font = ImageFont.load_default()

        bbox = draw.textbbox((0, 0), caption_text, font=font)
        text_width = bbox[2] - bbox[0]
        text_height = bbox[3] - bbox[1]

        # Draw white rectangle background behind text
        rect_top = 0
        rect_bottom = text_height + 2 * padding
        draw.rectangle([(0, rect_top), (max_width, rect_bottom)], fill=background_color)

        # Draw text centered horizontally, 10px padding from top of the rectangle
        text_x = (max_width - text_width) // 2
        text_y = padding
        draw.text((text_x, text_y), caption_text, font=font, fill=font_color)

        # Encode and send image
        output = io.BytesIO()
        stitched.save(output, format="JPEG", quality=int(os.getenv("SCREENSHOT_QUALITY", SCREENSHOT_DEFAULT_QUALITY)))
        pipe_conn.send_bytes(output.getvalue())

        stitched.close()
    except Exception as e:
        pipe_conn.send(f"error:{e}")
    finally:
        pipe_conn.close()
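The paste loop in `stitch_images_worker` places each chunk at a running vertical offset and sizes the canvas to the sum of the chunk heights. A small sketch of just that arithmetic, without Pillow; the helper name `chunk_offsets` and the sample heights are hypothetical:

```python
from itertools import accumulate


def chunk_offsets(chunk_heights):
    """Return (y offsets, total height) for chunks stacked top to bottom."""
    # Running totals starting at 0: [0, h1, h1 + h2, ..., total].
    running = list(accumulate(chunk_heights, initial=0))
    # Every entry except the last is the y position where a chunk starts;
    # the last entry is the height of the stitched canvas.
    return running[:-1], running[-1]


offsets, total_height = chunk_offsets([4000, 4000, 1500])
```

With the sample heights, the three chunks would start at y = 0, 4000, and 8000 on a canvas 9500px tall.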
@@ -65,6 +65,7 @@ class fetcher(Fetcher):
|
|||||||
# request_body, request_method unused for now, until some magic in the future happens.
|
# request_body, request_method unused for now, until some magic in the future happens.
|
||||||
|
|
||||||
options = ChromeOptions()
|
options = ChromeOptions()
|
||||||
|
options.add_argument("--headless")
|
||||||
if self.proxy:
|
if self.proxy:
|
||||||
options.proxy = self.proxy
|
options.proxy = self.proxy
|
||||||
|
|
||||||
@@ -112,9 +113,9 @@ class fetcher(Fetcher):
|
|||||||
self.quit()
|
self.quit()
|
||||||
return True
|
return True
|
||||||
|
|
||||||
def quit(self):
|
def quit(self, watch=None):
|
||||||
if self.driver:
|
if self.driver:
|
||||||
try:
|
try:
|
||||||
self.driver.quit()
|
self.driver.quit()
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.debug(f"Content Fetcher > Exception in chrome shutdown/quit {str(e)}")
|
logger.debug(f"Content Fetcher > Exception in chrome shutdown/quit {str(e)}")
|
||||||
|
|||||||
@@ -233,7 +233,8 @@ def changedetection_app(config=None, datastore_o=None):
|
|||||||
|
|
||||||
if has_password_enabled and not flask_login.current_user.is_authenticated:
|
if has_password_enabled and not flask_login.current_user.is_authenticated:
|
||||||
# Permitted
|
# Permitted
|
||||||
if request.endpoint and request.endpoint == 'static_content' and request.view_args and request.view_args.get('group') in ['styles', 'js', 'images', 'favicons']:
|
if request.endpoint and request.endpoint == 'static_content' and request.view_args:
|
||||||
|
# Handled by static_content handler
|
||||||
return None
|
return None
|
||||||
# Permitted
|
# Permitted
|
||||||
elif request.endpoint and 'login' in request.endpoint:
|
elif request.endpoint and 'login' in request.endpoint:
|
||||||
@@ -351,11 +352,15 @@ def changedetection_app(config=None, datastore_o=None):
|
|||||||
@app.route("/static/<string:group>/<string:filename>", methods=['GET'])
|
@app.route("/static/<string:group>/<string:filename>", methods=['GET'])
|
||||||
def static_content(group, filename):
|
def static_content(group, filename):
|
||||||
from flask import make_response
|
from flask import make_response
|
||||||
|
import re
|
||||||
|
group = re.sub(r'[^\w.-]+', '', group.lower())
|
||||||
|
filename = re.sub(r'[^\w.-]+', '', filename.lower())
|
||||||
|
|
||||||
if group == 'screenshot':
|
if group == 'screenshot':
|
||||||
# Could be sensitive, follow password requirements
|
# Could be sensitive, follow password requirements
|
||||||
if datastore.data['settings']['application']['password'] and not flask_login.current_user.is_authenticated:
|
if datastore.data['settings']['application']['password'] and not flask_login.current_user.is_authenticated:
|
||||||
abort(403)
|
if not datastore.data['settings']['application'].get('shared_diff_access'):
|
||||||
|
abort(403)
|
||||||
|
|
||||||
screenshot_filename = "last-screenshot.png" if not request.args.get('error_screenshot') else "last-error-screenshot.png"
|
screenshot_filename = "last-screenshot.png" if not request.args.get('error_screenshot') else "last-error-screenshot.png"
|
||||||
|
|
||||||
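The two `re.sub` calls added to `static_content` lowercase `group` and `filename` and strip every character that is not a word character, a dot, or a hyphen. A standalone sketch of what that pattern does to a few inputs; the wrapper name `sanitize_path_part` is hypothetical, only the regular expression comes from the diff:

```python
import re


def sanitize_path_part(value):
    """Lowercase the value and drop everything except word characters, '.' and '-'."""
    return re.sub(r'[^\w.-]+', '', value.lower())


examples = {
    "Styles": sanitize_path_part("Styles"),
    "favicon.ICO": sanitize_path_part("favicon.ICO"),
    "my file(1).png": sanitize_path_part("my file(1).png"),
    "../../etc/passwd": sanitize_path_part("../../etc/passwd"),
}
```

Path separators are removed, but dots are kept, so a value such as `..` passes through this step unchanged.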
@@ -389,7 +394,7 @@ def changedetection_app(config=None, datastore_o=None):
|
|||||||
response.headers['Content-Type'] = 'application/json'
|
response.headers['Content-Type'] = 'application/json'
|
||||||
response.headers['Content-Encoding'] = 'deflate'
|
response.headers['Content-Encoding'] = 'deflate'
|
||||||
else:
|
else:
|
||||||
logger.error(f'Request elements.deflate at "{watch_directory}" but was notfound.')
|
logger.error(f'Request elements.deflate at "{watch_directory}" but was not found.')
|
||||||
abort(404)
|
abort(404)
|
||||||
|
|
||||||
if response:
|
if response:
|
||||||
@@ -404,7 +409,7 @@ def changedetection_app(config=None, datastore_o=None):
|
|||||||
|
|
||||||
# These files should be in our subdirectory
|
# These files should be in our subdirectory
|
||||||
try:
|
try:
|
||||||
return send_from_directory("static/{}".format(group), path=filename)
|
return send_from_directory(f"static/{group}", path=filename)
|
||||||
except FileNotFoundError:
|
except FileNotFoundError:
|
||||||
abort(404)
|
abort(404)
|
||||||
|
|
||||||
@@ -442,6 +447,16 @@ def changedetection_app(config=None, datastore_o=None):
|
|||||||
|
|
||||||
import changedetectionio.blueprint.watchlist as watchlist
|
import changedetectionio.blueprint.watchlist as watchlist
|
||||||
app.register_blueprint(watchlist.construct_blueprint(datastore=datastore, update_q=update_q, queuedWatchMetaData=queuedWatchMetaData), url_prefix='')
|
app.register_blueprint(watchlist.construct_blueprint(datastore=datastore, update_q=update_q, queuedWatchMetaData=queuedWatchMetaData), url_prefix='')
|
||||||
|
|
||||||
|
# Memory cleanup endpoint
|
||||||
|
@app.route('/gc-cleanup', methods=['GET'])
|
||||||
|
@login_optionally_required
|
||||||
|
def gc_cleanup():
|
||||||
|
from changedetectionio.gc_cleanup import memory_cleanup
|
||||||
|
from flask import jsonify
|
||||||
|
|
||||||
|
result = memory_cleanup(app)
|
||||||
|
return jsonify({"status": "success", "message": "Memory cleanup completed", "result": result})
|
||||||
|
|
||||||
# @todo handle ctrl break
|
# @todo handle ctrl break
|
||||||
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks).start()
|
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks).start()
|
||||||
@@ -499,7 +514,8 @@ def notification_runner():
|
|||||||
sent_obj = None
|
sent_obj = None
|
||||||
|
|
||||||
try:
|
try:
|
||||||
from changedetectionio import notification
|
from changedetectionio.notification.handler import process_notification
|
||||||
|
|
||||||
# Fallback to system config if not set
|
# Fallback to system config if not set
|
||||||
if not n_object.get('notification_body') and datastore.data['settings']['application'].get('notification_body'):
|
if not n_object.get('notification_body') and datastore.data['settings']['application'].get('notification_body'):
|
||||||
n_object['notification_body'] = datastore.data['settings']['application'].get('notification_body')
|
n_object['notification_body'] = datastore.data['settings']['application'].get('notification_body')
|
||||||
@@ -509,8 +525,8 @@ def notification_runner():
|
|||||||
|
|
||||||
if not n_object.get('notification_format') and datastore.data['settings']['application'].get('notification_format'):
|
if not n_object.get('notification_format') and datastore.data['settings']['application'].get('notification_format'):
|
||||||
n_object['notification_format'] = datastore.data['settings']['application'].get('notification_format')
|
n_object['notification_format'] = datastore.data['settings']['application'].get('notification_format')
|
||||||
|
if n_object.get('notification_urls', {}):
|
||||||
sent_obj = notification.process_notification(n_object, datastore)
|
sent_obj = process_notification(n_object, datastore)
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Watch URL: {n_object['watch_url']} Error {str(e)}")
|
logger.error(f"Watch URL: {n_object['watch_url']} Error {str(e)}")
|
||||||
|
|||||||
@@ -306,8 +306,8 @@ class ValidateAppRiseServers(object):
|
|||||||
|
|
||||||
def __call__(self, form, field):
|
def __call__(self, form, field):
|
||||||
import apprise
|
import apprise
|
||||||
from .apprise_plugin.assets import apprise_asset
|
from .notification.apprise_plugin.assets import apprise_asset
|
||||||
from .apprise_plugin.custom_handlers import apprise_http_custom_handler # noqa: F401
|
from .notification.apprise_plugin.custom_handlers import apprise_http_custom_handler # noqa: F401
|
||||||
|
|
||||||
apobj = apprise.Apprise(asset=apprise_asset)
|
apobj = apprise.Apprise(asset=apprise_asset)
|
||||||
|
|
||||||
@@ -586,7 +586,7 @@ class processor_text_json_diff_form(commonSettingsForm):
|
|||||||
filter_text_replaced = BooleanField('Replaced/changed lines', default=True)
|
filter_text_replaced = BooleanField('Replaced/changed lines', default=True)
|
||||||
filter_text_removed = BooleanField('Removed lines', default=True)
|
filter_text_removed = BooleanField('Removed lines', default=True)
|
||||||
|
|
||||||
trigger_text = StringListField('Trigger/wait for text', [validators.Optional(), ValidateListRegex()])
|
trigger_text = StringListField('Keyword triggers - Trigger/wait for text', [validators.Optional(), ValidateListRegex()])
|
||||||
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
|
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
|
||||||
browser_steps = FieldList(FormField(SingleBrowserStep), min_entries=10)
|
browser_steps = FieldList(FormField(SingleBrowserStep), min_entries=10)
|
||||||
text_should_not_be_present = StringListField('Block change-detection while text matches', [validators.Optional(), ValidateListRegex()])
|
text_should_not_be_present = StringListField('Block change-detection while text matches', [validators.Optional(), ValidateListRegex()])
|
||||||
@@ -721,6 +721,8 @@ class globalSettingsRequestForm(Form):
|
|||||||
self.extra_proxies.errors.append('Both a name, and a Proxy URL is required.')
|
self.extra_proxies.errors.append('Both a name, and a Proxy URL is required.')
|
||||||
return False
|
return False
|
||||||
|
|
||||||
|
class globalSettingsApplicationUIForm(Form):
|
||||||
|
open_diff_in_new_tab = BooleanField('Open diff page in a new tab', default=True, validators=[validators.Optional()])
|
||||||
|
|
||||||
# datastore.data['settings']['application']..
|
# datastore.data['settings']['application']..
|
||||||
class globalSettingsApplicationForm(commonSettingsForm):
|
class globalSettingsApplicationForm(commonSettingsForm):
|
||||||
@@ -752,6 +754,7 @@ class globalSettingsApplicationForm(commonSettingsForm):
|
|||||||
render_kw={"style": "width: 5em;"},
|
render_kw={"style": "width: 5em;"},
|
||||||
validators=[validators.NumberRange(min=0,
|
validators=[validators.NumberRange(min=0,
|
||||||
message="Should contain zero or more attempts")])
|
message="Should contain zero or more attempts")])
|
||||||
|
ui = FormField(globalSettingsApplicationUIForm)
|
||||||
|
|
||||||
|
|
||||||
class globalSettingsForm(Form):
|
class globalSettingsForm(Form):
|
||||||
|
|||||||
162
changedetectionio/gc_cleanup.py
Normal file
@@ -0,0 +1,162 @@
#!/usr/bin/env python3

import ctypes
import gc
import re
import psutil
import sys
import threading
import importlib
from loguru import logger

def memory_cleanup(app=None):
    """
    Perform comprehensive memory cleanup operations and log memory usage
    at each step with nicely formatted numbers.

    Args:
        app: Optional Flask app instance for clearing Flask-specific caches

    Returns:
        str: Status message
    """
    # Get current process
    process = psutil.Process()

    # Log initial memory usage with nicely formatted numbers
    current_memory = process.memory_info().rss / 1024 / 1024
    logger.debug(f"Memory cleanup started - Current memory usage: {current_memory:,.2f} MB")

    # 1. Standard garbage collection - force full collection on all generations
    gc.collect(0)  # Collect youngest generation
    gc.collect(1)  # Collect middle generation
    gc.collect(2)  # Collect oldest generation

    # Run full collection again to ensure maximum cleanup
    gc.collect()
    current_memory = process.memory_info().rss / 1024 / 1024
    logger.debug(f"After full gc.collect() - Memory usage: {current_memory:,.2f} MB")


    # 3. Call libc's malloc_trim to release memory back to the OS
    libc = ctypes.CDLL("libc.so.6")
    libc.malloc_trim(0)
    current_memory = process.memory_info().rss / 1024 / 1024
    logger.debug(f"After malloc_trim(0) - Memory usage: {current_memory:,.2f} MB")

    # 4. Clear Python's regex cache
    re.purge()
    current_memory = process.memory_info().rss / 1024 / 1024
    logger.debug(f"After re.purge() - Memory usage: {current_memory:,.2f} MB")

    # 5. Reset thread-local storage
    # Create a new thread local object to encourage cleanup of old ones
    threading.local()
    current_memory = process.memory_info().rss / 1024 / 1024
    logger.debug(f"After threading.local() - Memory usage: {current_memory:,.2f} MB")

    # 6. Clear sys.intern cache if Python version supports it
    try:
        sys.intern.clear()
        current_memory = process.memory_info().rss / 1024 / 1024
        logger.debug(f"After sys.intern.clear() - Memory usage: {current_memory:,.2f} MB")
    except (AttributeError, TypeError):
        logger.debug("sys.intern.clear() not supported in this Python version")

|
# 7. Clear XML/lxml caches if available
|
||||||
|
try:
|
||||||
|
# Check if lxml.etree is in use
|
||||||
|
lxml_etree = sys.modules.get('lxml.etree')
|
||||||
|
if lxml_etree:
|
||||||
|
# Clear module-level caches
|
||||||
|
if hasattr(lxml_etree, 'clear_error_log'):
|
||||||
|
lxml_etree.clear_error_log()
|
||||||
|
|
||||||
|
# Check for _ErrorLog and _RotatingErrorLog objects and clear them
|
||||||
|
for obj in gc.get_objects():
|
||||||
|
if hasattr(obj, '__class__') and hasattr(obj.__class__, '__name__'):
|
||||||
|
class_name = obj.__class__.__name__
|
||||||
|
if class_name in ('_ErrorLog', '_RotatingErrorLog', '_DomainErrorLog') and hasattr(obj, 'clear'):
|
||||||
|
try:
|
||||||
|
obj.clear()
|
||||||
|
except (AttributeError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
# Clear Element objects which can hold references to documents
|
||||||
|
elif class_name in ('_Element', 'ElementBase') and hasattr(obj, 'clear'):
|
||||||
|
try:
|
||||||
|
obj.clear()
|
||||||
|
except (AttributeError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
current_memory = process.memory_info().rss / 1024 / 1024
|
||||||
|
logger.debug(f"After lxml.etree cleanup - Memory usage: {current_memory:,.2f} MB")
|
||||||
|
|
||||||
|
# Check if lxml.html is in use
|
||||||
|
lxml_html = sys.modules.get('lxml.html')
|
||||||
|
if lxml_html:
|
||||||
|
# Clear HTML-specific element types
|
||||||
|
for obj in gc.get_objects():
|
||||||
|
if hasattr(obj, '__class__') and hasattr(obj.__class__, '__name__'):
|
||||||
|
class_name = obj.__class__.__name__
|
||||||
|
if class_name in ('HtmlElement', 'FormElement', 'InputElement',
|
||||||
|
'SelectElement', 'TextareaElement', 'CheckboxGroup',
|
||||||
|
'RadioGroup', 'MultipleSelectOptions', 'FieldsDict') and hasattr(obj, 'clear'):
|
||||||
|
try:
|
||||||
|
obj.clear()
|
||||||
|
except (AttributeError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
current_memory = process.memory_info().rss / 1024 / 1024
|
||||||
|
logger.debug(f"After lxml.html cleanup - Memory usage: {current_memory:,.2f} MB")
|
||||||
|
except (ImportError, AttributeError):
|
||||||
|
logger.debug("lxml cleanup not applicable")
|
||||||
|
|
||||||
|
# 8. Clear JSON parser caches if applicable
|
||||||
|
try:
|
||||||
|
# Check if json module is being used and try to clear its cache
|
||||||
|
json_module = sys.modules.get('json')
|
||||||
|
if json_module and hasattr(json_module, '_default_encoder'):
|
||||||
|
json_module._default_encoder.markers.clear()
|
||||||
|
current_memory = process.memory_info().rss / 1024 / 1024
|
||||||
|
logger.debug(f"After JSON parser cleanup - Memory usage: {current_memory:,.2f} MB")
|
||||||
|
except (AttributeError, KeyError):
|
||||||
|
logger.debug("JSON cleanup not applicable")
|
||||||
|
|
||||||
|
# 9. Force Python's memory allocator to release unused memory
|
||||||
|
try:
|
||||||
|
if hasattr(sys, 'pypy_version_info'):
|
||||||
|
# PyPy has different memory management
|
||||||
|
gc.collect()
|
||||||
|
else:
|
||||||
|
# CPython - try to release unused memory
|
||||||
|
ctypes.pythonapi.PyGC_Collect()
|
||||||
|
current_memory = process.memory_info().rss / 1024 / 1024
|
||||||
|
logger.debug(f"After PyGC_Collect - Memory usage: {current_memory:,.2f} MB")
|
||||||
|
except (AttributeError, TypeError):
|
||||||
|
logger.debug("PyGC_Collect not supported")
|
||||||
|
|
||||||
|
# 10. Clear Flask-specific caches if applicable
|
||||||
|
if app:
|
||||||
|
try:
|
||||||
|
# Clear Flask caches if they exist
|
||||||
|
for key in list(app.config.get('_cache', {}).keys()):
|
||||||
|
app.config['_cache'].pop(key, None)
|
||||||
|
|
||||||
|
# Clear Jinja2 template cache if available
|
||||||
|
if hasattr(app, 'jinja_env') and hasattr(app.jinja_env, 'cache'):
|
||||||
|
app.jinja_env.cache.clear()
|
||||||
|
|
||||||
|
current_memory = process.memory_info().rss / 1024 / 1024
|
||||||
|
logger.debug(f"After Flask cache clear - Memory usage: {current_memory:,.2f} MB")
|
||||||
|
except (AttributeError, KeyError):
|
||||||
|
logger.debug("No Flask cache to clear")
|
||||||
|
|
||||||
|
# Final garbage collection pass
|
||||||
|
gc.collect()
|
||||||
|
libc.malloc_trim(0)
|
||||||
|
|
||||||
|
# Log final memory usage
|
||||||
|
final_memory = process.memory_info().rss / 1024 / 1024
|
||||||
|
logger.info(f"Memory cleanup completed - Final memory usage: {final_memory:,.2f} MB")
|
||||||
|
return "cleaned"
|
||||||
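Review note on the cleanup routine above: its last step runs a garbage-collection pass and then calls `libc.malloc_trim(0)`, where `libc` is set up outside the lines shown here. A minimal, defensive sketch of that step follows. The helper name `release_unused_memory` is hypothetical, and the sketch assumes `malloc_trim` is only present on glibc-based systems, so it falls back quietly elsewhere.

```python
import ctypes
import ctypes.util
import gc


def release_unused_memory() -> bool:
    """Run a full GC pass, then ask the C library to return free heap pages.

    Returns True if malloc_trim() was found and called, False otherwise.
    (Hypothetical helper for illustration; not part of the diff.)
    """
    gc.collect()

    libc_path = ctypes.util.find_library("c")
    if not libc_path:
        # No C library could be located (for example on Windows).
        return False

    try:
        libc = ctypes.CDLL(libc_path)
        trim = libc.malloc_trim  # raises AttributeError when the symbol is missing
    except (OSError, AttributeError):
        return False

    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    trim(0)  # 0 = keep no extra padding at the top of the heap
    return True
```

Returning a boolean instead of raising lets the caller log whether trimming happened without wrapping the call in its own `try` block.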
@@ -366,22 +366,41 @@ def extract_json_as_string(content, json_filter, ensure_is_ldjson_info_type=None
 # wordlist - list of regex's (str) or words (str)
 # Preserves all linefeeds and other whitespacing, its not the job of this to remove that
 def strip_ignore_text(content, wordlist, mode="content"):
-    i = 0
-    output = []
     ignore_text = []
     ignore_regex = []
-    ignored_line_numbers = []
+    ignore_regex_multiline = []
+    ignored_lines = []
 
     for k in wordlist:
         # Is it a regex?
         res = re.search(PERL_STYLE_REGEX, k, re.IGNORECASE)
         if res:
-            ignore_regex.append(re.compile(perl_style_slash_enclosed_regex_to_options(k)))
+            res = re.compile(perl_style_slash_enclosed_regex_to_options(k))
+            if res.flags & re.DOTALL or res.flags & re.MULTILINE:
+                ignore_regex_multiline.append(res)
+            else:
+                ignore_regex.append(res)
         else:
             ignore_text.append(k.strip())
 
-    for line in content.splitlines(keepends=True):
-        i += 1
+    for r in ignore_regex_multiline:
+        for match in r.finditer(content):
+            content_lines = content[:match.end()].splitlines(keepends=True)
+            match_lines = content[match.start():match.end()].splitlines(keepends=True)
+
+            end_line = len(content_lines)
+            start_line = end_line - len(match_lines)
+
+            if end_line - start_line <= 1:
+                # Match is empty or in the middle of the line
+                ignored_lines.append(start_line)
+            else:
+                for i in range(start_line, end_line):
+                    ignored_lines.append(i)
+
+    line_index = 0
+    lines = content.splitlines(keepends=True)
+    for line in lines:
         # Always ignore blank lines in this mode. (when this function gets called)
         got_match = False
         for l in ignore_text:
@@ -393,17 +412,19 @@ def strip_ignore_text(content, wordlist, mode="content"):
                 if r.search(line):
                     got_match = True
 
-        if not got_match:
-            # Not ignored, and should preserve "keepends"
-            output.append(line)
-        else:
-            ignored_line_numbers.append(i)
+        if got_match:
+            ignored_lines.append(line_index)
+
+        line_index += 1
+
+    ignored_lines = set([i for i in ignored_lines if i >= 0 and i < len(lines)])
 
     # Used for finding out what to highlight
     if mode == "line numbers":
-        return ignored_line_numbers
+        return [i + 1 for i in ignored_lines]
 
-    return ''.join(output)
+    output_lines = set(range(len(lines))) - ignored_lines
+    return ''.join([lines[i] for i in output_lines])
 
 def cdata_in_document_to_text(html_content: str, render_anchor_tag_content=False) -> str:
     from xml.sax.saxutils import escape as xml_escape
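Review note on `strip_ignore_text`: the new code maps each multi-line regex match back to the lines it covers, then drops those lines. The sketch below illustrates the same idea with a different technique, counting newlines before the match boundaries instead of slicing and re-splitting the content. The names `lines_touched_by` and `strip_lines` are hypothetical, and the sketch assumes `\n` is the only line terminator. It also iterates in line order explicitly rather than depending on the iteration order of a `set`.

```python
import re


def lines_touched_by(pattern: re.Pattern, content: str) -> list[int]:
    """Return the sorted 0-based indexes of every line overlapped by a match."""
    touched = set()
    for match in pattern.finditer(content):
        # The index of the first touched line equals the number of
        # newlines that occur before the match starts.
        first = content.count("\n", 0, match.start())
        # Position of the last matched character (the start, for an empty match).
        last_pos = max(match.start(), match.end() - 1)
        last = content.count("\n", 0, last_pos)
        touched.update(range(first, last + 1))
    return sorted(touched)


def strip_lines(content: str, pattern: re.Pattern) -> str:
    """Remove every line touched by the pattern, keeping the original line order."""
    lines = content.splitlines(keepends=True)
    ignored = set(lines_touched_by(pattern, content))
    return "".join(line for i, line in enumerate(lines) if i not in ignored)


sample = "keep 1\nSTART a\nb END\nkeep 2\n"
block = re.compile(r"START.*?END", re.DOTALL)
print(lines_touched_by(block, sample))
print(repr(strip_lines(sample, block)))
```

Filtering with `enumerate` keeps the output in document order by construction, which may be worth considering for the `output_lines` step in the diff as well.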
@@ -456,8 +477,10 @@ def html_to_text(html_content: str, render_anchor_tag_content=False, is_rss=Fals
 # Does LD+JSON exist with a @type=='product' and a .price set anywhere?
 def has_ldjson_product_info(content):
     try:
-        lc = content.lower()
-        if 'application/ld+json' in lc and lc.count('"price"') == 1 and '"pricecurrency"' in lc:
+        # Better than .lower() which can use a lot of ram
+        if (re.search(r'application/ld\+json', content, re.IGNORECASE) and
+            re.search(r'"price"', content, re.IGNORECASE) and
+            re.search(r'"pricecurrency"', content, re.IGNORECASE)):
             return True
 
         # On some pages this is really terribly expensive when they dont really need it
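Review note on `has_ldjson_product_info`: the change swaps a full lowercase copy of the page for case-insensitive regex searches. Note that the removed line also required exactly one `"price"` occurrence, which the new condition no longer checks. A minimal sketch of the same pre-check with the patterns compiled once is shown below; the names `_LDJSON_HINTS` and `looks_like_ldjson_product` are hypothetical.

```python
import re

# Compiled once at import time; re.IGNORECASE avoids building a lowercased
# copy of a potentially very large document.
_LDJSON_HINTS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r'application/ld\+json', r'"price"', r'"pricecurrency"')
]


def looks_like_ldjson_product(content: str) -> bool:
    """Cheap pre-check: True only if every hint appears somewhere in content."""
    return all(pattern.search(content) for pattern in _LDJSON_HINTS)


page = '<script type="Application/LD+JSON">{"Price": 1, "priceCurrency": "USD"}</script>'
print(looks_like_ldjson_product(page))
```

`all()` stops at the first hint that is missing, so pages without any LD+JSON are rejected after a single scan.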
@@ -60,6 +60,9 @@ class model(dict):
                     'webdriver_delay': None , # Extra delay in seconds before extracting text
                     'tags': {}, #@todo use Tag.model initialisers
                     'timezone': None, # Default IANA timezone name
+                    'ui': {
+                        'open_diff_in_new_tab': True,
+                    },
                 }
             }
         }
@@ -553,7 +553,10 @@ class model(watch_base):
         self.ensure_data_dir_exists()
 
         with open(target_path, 'wb') as f:
-            f.write(zlib.compress(json.dumps(data).encode()))
+            if not isinstance(data, str):
+                f.write(zlib.compress(json.dumps(data).encode()))
+            else:
+                f.write(zlib.compress(data.encode()))
             f.close()
 
     # Save as PNG, PNG is larger but better for doing visual diff in the future
@@ -575,7 +578,7 @@ class model(watch_base):
         import brotli
         filepath = os.path.join(self.watch_data_dir, 'last-fetched.br')
 
-        if not os.path.isfile(filepath):
+        if not os.path.isfile(filepath) or os.path.getsize(filepath) == 0:
             # If a previous attempt doesnt yet exist, just snarf the previous snapshot instead
             dates = list(self.history.keys())
             if len(dates):
@@ -2,7 +2,7 @@ import os
 import uuid
 
 from changedetectionio import strtobool
-from changedetectionio.notification import default_notification_format_for_watch
+default_notification_format_for_watch = 'System default'
 
 class watch_base(dict):
 
35
changedetectionio/notification/__init__.py
Normal file
@@ -0,0 +1,35 @@
+from changedetectionio.model import default_notification_format_for_watch
+
+ult_notification_format_for_watch = 'System default'
+default_notification_format = 'HTML Color'
+default_notification_body = '{{watch_url}} had a change.\n---\n{{diff}}\n---\n'
+default_notification_title = 'ChangeDetection.io Notification - {{watch_url}}'
+
+# The values (markdown etc) are from apprise NotifyFormat,
+# But to avoid importing the whole heavy module just use the same strings here.
+valid_notification_formats = {
+    'Text': 'text',
+    'Markdown': 'markdown',
+    'HTML': 'html',
+    'HTML Color': 'htmlcolor',
+    # Used only for editing a watch (not for global)
+    default_notification_format_for_watch: default_notification_format_for_watch
+}
+
+
+valid_tokens = {
+    'base_url': '',
+    'current_snapshot': '',
+    'diff': '',
+    'diff_added': '',
+    'diff_full': '',
+    'diff_patch': '',
+    'diff_removed': '',
+    'diff_url': '',
+    'preview_url': '',
+    'triggered_text': '',
+    'watch_tag': '',
+    'watch_title': '',
+    'watch_url': '',
+    'watch_uuid': '',
+}
@@ -1,47 +1,17 @@
 
 import time
-from apprise import NotifyFormat
 import apprise
 from loguru import logger
 
-from .apprise_plugin.assets import APPRISE_AVATAR_URL
-from .apprise_plugin.custom_handlers import apprise_http_custom_handler # noqa: F401
-from .safe_jinja import render as jinja_render
-
-valid_tokens = {
-    'base_url': '',
-    'current_snapshot': '',
-    'diff': '',
-    'diff_added': '',
-    'diff_full': '',
-    'diff_patch': '',
-    'diff_removed': '',
-    'diff_url': '',
-    'preview_url': '',
-    'triggered_text': '',
-    'watch_tag': '',
-    'watch_title': '',
-    'watch_url': '',
-    'watch_uuid': '',
-}
-
-default_notification_format_for_watch = 'System default'
-default_notification_format = 'HTML Color'
-default_notification_body = '{{watch_url}} had a change.\n---\n{{diff}}\n---\n'
-default_notification_title = 'ChangeDetection.io Notification - {{watch_url}}'
-
-valid_notification_formats = {
-    'Text': NotifyFormat.TEXT,
-    'Markdown': NotifyFormat.MARKDOWN,
-    'HTML': NotifyFormat.HTML,
-    'HTML Color': 'htmlcolor',
-    # Used only for editing a watch (not for global)
-    default_notification_format_for_watch: default_notification_format_for_watch
-}
-
+from .apprise_plugin.assets import apprise_asset, APPRISE_AVATAR_URL
 
 
 def process_notification(n_object, datastore):
+    from changedetectionio.safe_jinja import render as jinja_render
+    from . import default_notification_format_for_watch, default_notification_format, valid_notification_formats
+    # be sure its registered
+    from .apprise_plugin.custom_handlers import apprise_http_custom_handler
+
     now = time.time()
     if n_object.get('notification_timestamp'):
         logger.trace(f"Time since queued {now-n_object['notification_timestamp']:.3f}s")
@@ -58,14 +28,13 @@ def process_notification(n_object, datastore):
     # Initially text or whatever
     n_format = datastore.data['settings']['application'].get('notification_format', valid_notification_formats[default_notification_format])
 
-    logger.trace(f"Complete notification body including Jinja and placeholders calculated in {time.time() - now:.3f}s")
+    logger.trace(f"Complete notification body including Jinja and placeholders calculated in {time.time() - now:.2f}s")
 
     # https://github.com/caronc/apprise/wiki/Development_LogCapture
     # Anything higher than or equal to WARNING (which covers things like Connection errors)
     # raise it as an exception
 
     sent_objs = []
-    from .apprise_plugin.assets import apprise_asset
 
     if 'as_async' in n_object:
         apprise_asset.async_mode = n_object.get('as_async')
@@ -176,6 +145,7 @@ def process_notification(n_object, datastore):
 # ( Where we prepare the tokens in the notification to be replaced with actual values )
 def create_notification_parameters(n_object, datastore):
     from copy import deepcopy
+    from . import valid_tokens
 
     # in the case we send a test notification from the main settings, there is no UUID.
     uuid = n_object['uuid'] if 'uuid' in n_object else ''
@@ -159,7 +159,7 @@ class difference_detection_processor():
         )
 
         #@todo .quit here could go on close object, so we can run JS if change-detected
-        self.fetcher.quit()
+        self.fetcher.quit(watch=self.watch)
 
         # After init, call run_changedetection() which will do the actual change-detection
 
@@ -252,6 +252,7 @@ class perform_site_check(difference_detection_processor):
 
         # 615 Extract text by regex
         extract_text = watch.get('extract_text', [])
+        extract_text += self.datastore.get_tag_overrides_for_watch(uuid=watch.get('uuid'), attr='extract_text')
         if len(extract_text) > 0:
             regex_matched_output = []
             for s_re in extract_text:
@@ -296,6 +297,8 @@ class perform_site_check(difference_detection_processor):
         ### CALCULATE MD5
         # If there's text to ignore
         text_to_ignore = watch.get('ignore_text', []) + self.datastore.data['settings']['application'].get('global_ignore_text', [])
+        text_to_ignore += self.datastore.get_tag_overrides_for_watch(uuid=watch.get('uuid'), attr='ignore_text')
+
         text_for_checksuming = stripped_text_from_html
         if text_to_ignore:
             text_for_checksuming = html_tools.strip_ignore_text(stripped_text_from_html, text_to_ignore)
@@ -308,8 +311,8 @@ class perform_site_check(difference_detection_processor):
 
         ############ Blocking rules, after checksum #################
         blocked = False
-
         trigger_text = watch.get('trigger_text', [])
+        trigger_text += self.datastore.get_tag_overrides_for_watch(uuid=watch.get('uuid'), attr='trigger_text')
         if len(trigger_text):
             # Assume blocked
             blocked = True
@@ -324,6 +327,7 @@ class perform_site_check(difference_detection_processor):
                 blocked = False
 
         text_should_not_be_present = watch.get('text_should_not_be_present', [])
+        text_should_not_be_present += self.datastore.get_tag_overrides_for_watch(uuid=watch.get('uuid'), attr='text_should_not_be_present')
         if len(text_should_not_be_present):
             # If anything matched, then we should block a change from happening
             result = html_tools.strip_ignore_text(content=str(stripped_text_from_html),
@@ -14,7 +14,8 @@ SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
 find tests/test_*py -type f|while read test_name
 do
   echo "TEST RUNNING $test_name"
-  pytest $test_name
+  # REMOVE_REQUESTS_OLD_SCREENSHOTS disabled so that we can write a screenshot and send it in test_notifications.py without a real browser
+  REMOVE_REQUESTS_OLD_SCREENSHOTS=false pytest $test_name
 done
 
 echo "RUNNING WITH BASE_URL SET"
@@ -22,7 +23,7 @@ echo "RUNNING WITH BASE_URL SET"
 # Now re-run some tests with BASE_URL enabled
 # Re #65 - Ability to include a link back to the installation, in the notification.
 export BASE_URL="https://really-unique-domain.io"
-pytest tests/test_notification.py
+REMOVE_REQUESTS_OLD_SCREENSHOTS=false pytest tests/test_notification.py
 
 
 # Re-run with HIDE_REFERER set - could affect login
@@ -32,7 +33,7 @@ pytest tests/test_access_control.py
 # Re-run a few tests that will trigger brotli based storage
 export SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD=5
 pytest tests/test_access_control.py
-pytest tests/test_notification.py
+REMOVE_REQUESTS_OLD_SCREENSHOTS=false pytest tests/test_notification.py
 pytest tests/test_backend.py
 pytest tests/test_rss.py
 pytest tests/test_unique_lines.py
@@ -8,7 +8,7 @@ $(document).ready(function () {
         $(".addRuleRow").on("click", function(e) {
             e.preventDefault();
 
-            let currentRow = $(this).closest("tr");
+            let currentRow = $(this).closest(".fieldlist-row");
 
             // Clone without events
             let newRow = currentRow.clone(false);
@@ -29,8 +29,8 @@ $(document).ready(function () {
             e.preventDefault();
 
             // Only remove if there's more than one row
-            if ($("#rulesTable tbody tr").length > 1) {
-                $(this).closest("tr").remove();
+            if ($("#rulesTable .fieldlist-row").length > 1) {
+                $(this).closest(".fieldlist-row").remove();
                 reindexRules();
             }
         });
@@ -39,7 +39,7 @@ $(document).ready(function () {
         $(".verifyRuleRow").on("click", function(e) {
             e.preventDefault();
 
-            let row = $(this).closest("tr");
+            let row = $(this).closest(".fieldlist-row");
             let field = row.find("select[name$='field']").val();
             let operator = row.find("select[name$='operator']").val();
             let value = row.find("input[name$='value']").val();
@@ -128,7 +128,7 @@ $(document).ready(function () {
         $(".addRuleRow, .removeRuleRow, .verifyRuleRow").off("click");
 
         // Reindex all form elements
-        $("#rulesTable tbody tr").each(function(index) {
+        $("#rulesTable .fieldlist-row").each(function(index) {
             $(this).find("select, input").each(function() {
                 let oldName = $(this).attr("name");
                 let oldId = $(this).attr("id");
@@ -0,0 +1,135 @@
+/* Styles for the flexbox-based table replacement for conditions */
+.fieldlist_formfields {
+  width: 100%;
+  background-color: var(--color-background, #fff);
+  border-radius: 4px;
+  border: 1px solid var(--color-border-table-cell, #cbcbcb);
+
+  /* Header row */
+  .fieldlist-header {
+    display: flex;
+    background-color: var(--color-background-table-thead, #e0e0e0);
+    font-weight: bold;
+    border-bottom: 1px solid var(--color-border-table-cell, #cbcbcb);
+  }
+
+  .fieldlist-header-cell {
+    flex: 1;
+    padding: 0.5em 1em;
+    text-align: left;
+
+    &:last-child {
+      flex: 0 0 120px; /* Fixed width for actions column */
+    }
+  }
+
+  /* Body rows */
+  .fieldlist-body {
+    display: flex;
+    flex-direction: column;
+  }
+
+  .fieldlist-row {
+    display: flex;
+    border-bottom: 1px solid var(--color-border-table-cell, #cbcbcb);
+
+    &:last-child {
+      border-bottom: none;
+    }
+
+    &:nth-child(2n-1) {
+      background-color: var(--color-table-stripe, #f2f2f2);
+    }
+
+    &.error-row {
+      background-color: var(--color-error-input, #ffdddd);
+    }
+  }
+
+  .fieldlist-cell {
+    flex: 1;
+    padding: 0.5em 1em;
+    display: flex;
+    flex-direction: column;
+    justify-content: center;
+
+    /* Make inputs take up full width of their cell */
+    input, select {
+      width: 100%;
+    }
+
+    &.fieldlist-actions {
+      flex: 0 0 120px; /* Fixed width for actions column */
+      display: flex;
+      flex-direction: row;
+      align-items: center;
+      gap: 4px;
+    }
+  }
+
+  /* Error styling */
+  ul.errors {
+    margin-top: 0.5em;
+    margin-bottom: 0;
+    padding: 0.5em;
+    background-color: var(--color-error-background-snapshot-age, #ffdddd);
+    border-radius: 4px;
+    list-style-position: inside;
+  }
+
+  /* Responsive styles */
+  @media only screen and (max-width: 760px) {
+    .fieldlist-header, .fieldlist-row {
+      flex-direction: column;
+    }
+
+    .fieldlist-header-cell {
+      display: none;
+    }
+
+    .fieldlist-row {
+      padding: 0.5em 0;
+      border-bottom: 2px solid var(--color-border-table-cell, #cbcbcb);
+    }
+
+    .fieldlist-cell {
+      padding: 0.25em 0.5em;
+
+      &.fieldlist-actions {
+        flex: 1;
+        justify-content: flex-start;
+        padding-top: 0.5em;
+      }
+    }
+
+    /* Add some spacing between fields on mobile */
+    .fieldlist-cell:not(:last-child) {
+      margin-bottom: 0.5em;
+    }
+
+    /* Label each cell on mobile view */
+    .fieldlist-cell::before {
+      content: attr(data-label);
+      font-weight: bold;
+      margin-bottom: 0.25em;
+    }
+  }
+}
+
+/* Button styling */
+.fieldlist_formfields {
+  .addRuleRow, .removeRuleRow, .verifyRuleRow {
+    cursor: pointer;
+    border: none;
+    padding: 4px 8px;
+    border-radius: 3px;
+    font-weight: bold;
+    background-color: #aaa;
+    color: var(--color-foreground-text, #fff);
+
+    &:hover {
+      background-color: #999;
+    }
+  }
+
+}
@@ -14,6 +14,7 @@
 @import "parts/_love";
 @import "parts/preview_text_filter";
 @import "parts/_edit";
+@import "parts/_conditions_table";
 
 body {
   color: var(--color-text);
@@ -530,6 +530,99 @@ ul#conditions_match_logic {
   ul#conditions_match_logic li {
     padding-right: 1em; }
 
+/* Styles for the flexbox-based table replacement for conditions */
+.fieldlist_formfields {
+  width: 100%;
+  background-color: var(--color-background, #fff);
+  border-radius: 4px;
+  border: 1px solid var(--color-border-table-cell, #cbcbcb);
+  /* Header row */
+  /* Body rows */
+  /* Error styling */
+  /* Responsive styles */ }
+  .fieldlist_formfields .fieldlist-header {
+    display: flex;
+    background-color: var(--color-background-table-thead, #e0e0e0);
+    font-weight: bold;
+    border-bottom: 1px solid var(--color-border-table-cell, #cbcbcb); }
+  .fieldlist_formfields .fieldlist-header-cell {
+    flex: 1;
+    padding: 0.5em 1em;
+    text-align: left; }
+    .fieldlist_formfields .fieldlist-header-cell:last-child {
+      flex: 0 0 120px;
+      /* Fixed width for actions column */ }
+  .fieldlist_formfields .fieldlist-body {
+    display: flex;
+    flex-direction: column; }
+  .fieldlist_formfields .fieldlist-row {
+    display: flex;
+    border-bottom: 1px solid var(--color-border-table-cell, #cbcbcb); }
+    .fieldlist_formfields .fieldlist-row:last-child {
+      border-bottom: none; }
+    .fieldlist_formfields .fieldlist-row:nth-child(2n-1) {
+      background-color: var(--color-table-stripe, #f2f2f2); }
+    .fieldlist_formfields .fieldlist-row.error-row {
+      background-color: var(--color-error-input, #ffdddd); }
+  .fieldlist_formfields .fieldlist-cell {
+    flex: 1;
+    padding: 0.5em 1em;
+    display: flex;
+    flex-direction: column;
+    justify-content: center;
+    /* Make inputs take up full width of their cell */ }
+    .fieldlist_formfields .fieldlist-cell input, .fieldlist_formfields .fieldlist-cell select {
+      width: 100%; }
+    .fieldlist_formfields .fieldlist-cell.fieldlist-actions {
+      flex: 0 0 120px;
+      /* Fixed width for actions column */
+      display: flex;
+      flex-direction: row;
+      align-items: center;
+      gap: 4px; }
+  .fieldlist_formfields ul.errors {
+    margin-top: 0.5em;
+    margin-bottom: 0;
+    padding: 0.5em;
+    background-color: var(--color-error-background-snapshot-age, #ffdddd);
+    border-radius: 4px;
+    list-style-position: inside; }
+  @media only screen and (max-width: 760px) {
+    .fieldlist_formfields {
+      /* Add some spacing between fields on mobile */
+      /* Label each cell on mobile view */ }
+      .fieldlist_formfields .fieldlist-header, .fieldlist_formfields .fieldlist-row {
+        flex-direction: column; }
+      .fieldlist_formfields .fieldlist-header-cell {
+        display: none; }
+      .fieldlist_formfields .fieldlist-row {
+        padding: 0.5em 0;
+        border-bottom: 2px solid var(--color-border-table-cell, #cbcbcb); }
+      .fieldlist_formfields .fieldlist-cell {
+        padding: 0.25em 0.5em; }
+        .fieldlist_formfields .fieldlist-cell.fieldlist-actions {
+          flex: 1;
+          justify-content: flex-start;
+          padding-top: 0.5em; }
+      .fieldlist_formfields .fieldlist-cell:not(:last-child) {
+        margin-bottom: 0.5em; }
+      .fieldlist_formfields .fieldlist-cell::before {
+        content: attr(data-label);
+        font-weight: bold;
+        margin-bottom: 0.25em; } }
+
+/* Button styling */
+.fieldlist_formfields .addRuleRow, .fieldlist_formfields .removeRuleRow, .fieldlist_formfields .verifyRuleRow {
+  cursor: pointer;
+  border: none;
+  padding: 4px 8px;
+  border-radius: 3px;
+  font-weight: bold;
+  background-color: #aaa;
+  color: var(--color-foreground-text, #fff); }
+  .fieldlist_formfields .addRuleRow:hover, .fieldlist_formfields .removeRuleRow:hover, .fieldlist_formfields .verifyRuleRow:hover {
+    background-color: #999; }
+
 body {
   color: var(--color-text);
   background: var(--color-background-page);
@@ -61,21 +61,20 @@
|
|||||||
{{ field(**kwargs)|safe }}
|
{{ field(**kwargs)|safe }}
|
||||||
{% endmacro %}
|
{% endmacro %}
|
||||||
|
|
||||||
{% macro render_fieldlist_of_formfields_as_table(fieldlist, table_id="rulesTable") %}
|
{% macro render_conditions_fieldlist_of_formfields_as_table(fieldlist, table_id="rulesTable") %}
|
||||||
<table class="fieldlist_formfields pure-table" id="{{ table_id }}">
|
<div class="fieldlist_formfields" id="{{ table_id }}">
|
||||||
<thead>
|
<div class="fieldlist-header">
|
||||||
<tr>
|
{% for subfield in fieldlist[0] %}
|
||||||
{% for subfield in fieldlist[0] %}
|
<div class="fieldlist-header-cell">{{ subfield.label }}</div>
|
||||||
<th>{{ subfield.label }}</th>
|
{% endfor %}
|
||||||
{% endfor %}
|
<div class="fieldlist-header-cell">Actions</div>
|
||||||
<th>Actions</th>
|
</div>
|
||||||
</tr>
|
<div class="fieldlist-body">
|
||||||
</thead>
|
|
||||||
<tbody>
|
|
||||||
{% for form_row in fieldlist %}
|
{% for form_row in fieldlist %}
|
||||||
<tr {% if form_row.errors %} class="error-row" {% endif %}>
|
<div class="fieldlist-row {% if form_row.errors %}error-row{% endif %}">
|
||||||
{% for subfield in form_row %}
|
{% for subfield in form_row %}
|
||||||
<td>
|
<div class="fieldlist-cell">
|
||||||
|
|
||||||
{{ subfield()|safe }}
|
{{ subfield()|safe }}
|
||||||
{% if subfield.errors %}
|
{% if subfield.errors %}
|
||||||
<ul class="errors">
|
<ul class="errors">
|
||||||
@@ -84,17 +83,17 @@
|
|||||||
{% endfor %}
|
{% endfor %}
|
||||||
</ul>
|
</ul>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
</td>
|
</div>
|
||||||
{% endfor %}
|
{% endfor %}
|
||||||
<td>
|
<div class="fieldlist-cell fieldlist-actions">
|
||||||
<button type="button" class="addRuleRow">+</button>
|
<button type="button" class="addRuleRow" title="Add a row/rule after">+</button>
|
||||||
<button type="button" class="removeRuleRow">-</button>
|
<button type="button" class="removeRuleRow" title="Remove this row/rule">-</button>
|
||||||
<button type="button" class="verifyRuleRow" title="Verify this rule against current snapshot">✓</button>
|
<button type="button" class="verifyRuleRow" title="Verify this rule against current snapshot">✓</button>
|
||||||
</td>
|
</div>
|
||||||
</tr>
|
</div>
|
||||||
{% endfor %}
|
{% endfor %}
|
||||||
</tbody>
|
</div>
|
||||||
</table>
|
</div>
|
||||||
{% endmacro %}
|
{% endmacro %}
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -159,7 +159,7 @@
|
|||||||
<a id="chrome-extension-link"
|
<a id="chrome-extension-link"
|
||||||
title="Chrome Extension - Web Page Change Detection with changedetection.io!"
|
title="Chrome Extension - Web Page Change Detection with changedetection.io!"
|
||||||
href="https://chromewebstore.google.com/detail/changedetectionio-website/kefcfmgmlhmankjmnbijimhofdjekbop">
|
href="https://chromewebstore.google.com/detail/changedetectionio-website/kefcfmgmlhmankjmnbijimhofdjekbop">
|
||||||
<img alt="Chrome store icon" src="{{url_for('static_content', group='images', filename='Google-Chrome-icon.png')}}">
|
<img alt="Chrome store icon" src="{{url_for('static_content', group='images', filename='google-chrome-icon.png')}}">
|
||||||
Chrome Webstore
|
Chrome Webstore
|
||||||
</a>
|
</a>
|
||||||
</p>
|
</p>
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
{% extends 'base.html' %}
|
{% extends 'base.html' %}
|
||||||
{% block content %}
|
{% block content %}
|
||||||
{% from '_helpers.html' import render_field, render_checkbox_field, render_button, render_time_schedule_form, playwright_warning, only_webdriver_type_watches_warning, render_fieldlist_of_formfields_as_table %}
|
{% from '_helpers.html' import render_field, render_checkbox_field, render_button, render_time_schedule_form, playwright_warning, only_webdriver_type_watches_warning, render_conditions_fieldlist_of_formfields_as_table %}
|
||||||
{% from '_common_fields.html' import render_common_settings_form %}
|
{% from '_common_fields.html' import render_common_settings_form %}
|
||||||
<script src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
|
<script src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
|
||||||
<script src="{{url_for('static_content', group='js', filename='vis.js')}}" defer></script>
|
<script src="{{url_for('static_content', group='js', filename='vis.js')}}" defer></script>
|
||||||
@@ -289,25 +289,13 @@ Math: {{ 1 + 1 }}") }}
|
|||||||
<script>
|
<script>
|
||||||
const verify_condition_rule_url="{{url_for('conditions.verify_condition_single_rule', watch_uuid=uuid)}}";
|
const verify_condition_rule_url="{{url_for('conditions.verify_condition_single_rule', watch_uuid=uuid)}}";
|
||||||
</script>
|
</script>
|
||||||
<style>
|
|
||||||
.verifyRuleRow {
|
|
||||||
background-color: #4caf50;
|
|
||||||
color: white;
|
|
||||||
border: none;
|
|
||||||
cursor: pointer;
|
|
||||||
font-weight: bold;
|
|
||||||
}
|
|
||||||
.verifyRuleRow:hover {
|
|
||||||
background-color: #45a049;
|
|
||||||
}
|
|
||||||
</style>
|
|
||||||
<div class="pure-control-group">
|
<div class="pure-control-group">
|
||||||
{{ render_field(form.conditions_match_logic) }}
|
{{ render_field(form.conditions_match_logic) }}
|
||||||
{{ render_fieldlist_of_formfields_as_table(form.conditions) }}
|
{{ render_conditions_fieldlist_of_formfields_as_table(form.conditions) }}
|
||||||
<div class="pure-form-message-inline">
|
<div class="pure-form-message-inline">
|
||||||
|
|
||||||
<p id="verify-state-text">Use the verify (✓) button to test if a condition passes against the current snapshot.</p>
|
<p id="verify-state-text">Use the verify (✓) button to test if a condition passes against the current snapshot.</p>
|
||||||
Did you know that <strong>conditions</strong> can be extended with your own custom plugin? tutorials coming soon!<br>
|
Read a quick tutorial about <a href="https://changedetection.io/tutorial/conditional-actions-web-page-changes">using conditional web page changes here</a>.<br>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@@ -326,61 +314,8 @@ Math: {{ 1 + 1 }}") }}
|
|||||||
</li>
|
</li>
|
||||||
</ul>
|
</ul>
|
||||||
</div>
|
</div>
|
||||||
<div class="pure-control-group">
|
|
||||||
{% set field = render_field(form.include_filters,
|
|
||||||
rows=5,
|
|
||||||
placeholder=has_tag_filters_extra+"#example
|
|
||||||
xpath://body/div/span[contains(@class, 'example-class')]",
|
|
||||||
class="m-d")
|
|
||||||
%}
|
|
||||||
{{ field }}
|
|
||||||
{% if '/text()' in field %}
|
|
||||||
<span class="pure-form-message-inline"><strong>Note!: //text() function does not work where the <element> contains <![CDATA[]]></strong></span><br>
|
|
||||||
{% endif %}
|
|
||||||
<span class="pure-form-message-inline">One CSS, xPath, JSON Path/JQ selector per line, <i>any</i> rules that matches will be used.<br>
|
|
||||||
<span data-target="#advanced-help-selectors" class="toggle-show pure-button button-tag button-xsmall">Show advanced help and tips</span><br>
|
|
||||||
<ul id="advanced-help-selectors" style="display: none;">
|
|
||||||
<li>CSS - Limit text to this CSS rule, only text matching this CSS rule is included.</li>
|
|
||||||
<li>JSON - Limit text to this JSON rule, using either <a href="https://pypi.org/project/jsonpath-ng/" target="new">JSONPath</a> or <a href="https://stedolan.github.io/jq/" target="new">jq</a> (if installed).
|
|
||||||
<ul>
|
|
||||||
<li>JSONPath: Prefix with <code>json:</code>, use <code>json:$</code> to force re-formatting if required, <a href="https://jsonpath.com/" target="new">test your JSONPath here</a>.</li>
|
|
||||||
{% if jq_support %}
|
|
||||||
<li>jq: Prefix with <code>jq:</code> and <a href="https://jqplay.org/" target="new">test your jq here</a>. Using <a href="https://stedolan.github.io/jq/" target="new">jq</a> allows for complex filtering and processing of JSON data with built-in functions, regex, filtering, and more. See examples and documentation <a href="https://stedolan.github.io/jq/manual/" target="new">here</a>. Prefix <code>jqraw:</code> outputs the results as text instead of a JSON list.</li>
|
|
||||||
{% else %}
|
|
||||||
<li>jq support not installed</li>
|
|
||||||
{% endif %}
|
|
||||||
</ul>
|
|
||||||
</li>
|
|
||||||
<li>XPath - Limit text to this XPath rule, simply start with a forward-slash. To specify XPath to be used explicitly or the XPath rule starts with an XPath function: Prefix with <code>xpath:</code>
|
|
||||||
<ul>
|
|
||||||
<li>Example: <code>//*[contains(@class, 'sametext')]</code> or <code>xpath:count(//*[contains(@class, 'sametext')])</code>, <a
|
|
||||||
href="http://xpather.com/" target="new">test your XPath here</a></li>
|
|
||||||
<li>Example: Get all titles from an RSS feed <code>//title/text()</code></li>
|
|
||||||
<li>To use XPath1.0: Prefix with <code>xpath1:</code></li>
|
|
||||||
</ul>
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
Please be sure that you thoroughly understand how to write CSS, JSONPath, XPath{% if jq_support %}, or jq selector{%endif%} rules before filing an issue on GitHub! <a
|
|
||||||
href="https://github.com/dgtlmoon/changedetection.io/wiki/CSS-Selector-help">here for more CSS selector help</a>.<br>
|
|
||||||
</li>
|
|
||||||
</ul>
|
|
||||||
|
|
||||||
</span>
|
{% include "edit/include_subtract.html" %}
|
||||||
</div>
|
|
||||||
<fieldset class="pure-control-group">
|
|
||||||
{{ render_field(form.subtractive_selectors, rows=5, placeholder=has_tag_filters_extra+"header
|
|
||||||
footer
|
|
||||||
nav
|
|
||||||
.stockticker
|
|
||||||
//*[contains(text(), 'Advertisement')]") }}
|
|
||||||
<span class="pure-form-message-inline">
|
|
||||||
<ul>
|
|
||||||
<li> Remove HTML element(s) by CSS and XPath selectors before text conversion. </li>
|
|
||||||
<li> Don't paste HTML here, use only CSS and XPath selectors </li>
|
|
||||||
<li> Add multiple elements, CSS or XPath selectors per line to ignore multiple parts of the HTML. </li>
|
|
||||||
</ul>
|
|
||||||
</span>
|
|
||||||
</fieldset>
|
|
||||||
<div class="text-filtering border-fieldset">
|
<div class="text-filtering border-fieldset">
|
||||||
<fieldset class="pure-group" id="text-filtering-type-options">
|
<fieldset class="pure-group" id="text-filtering-type-options">
|
||||||
<h3>Text filtering</h3>
|
<h3>Text filtering</h3>
|
||||||
@@ -408,76 +343,9 @@ nav
|
|||||||
{{ render_checkbox_field(form.trim_text_whitespace) }}
|
{{ render_checkbox_field(form.trim_text_whitespace) }}
|
||||||
<span class="pure-form-message-inline">Remove any whitespace before and after each line of text</span>
|
<span class="pure-form-message-inline">Remove any whitespace before and after each line of text</span>
|
||||||
</fieldset>
|
</fieldset>
|
||||||
<fieldset>
|
{% include "edit/text-options.html" %}
|
||||||
<div class="pure-control-group">
|
|
||||||
{{ render_field(form.trigger_text, rows=5, placeholder="Some text to wait for in a line
|
|
||||||
/some.regex\d{2}/ for case-INsensitive regex
|
|
||||||
") }}
|
|
||||||
<span class="pure-form-message-inline">
|
|
||||||
<ul>
|
|
||||||
<li>Text to wait for before triggering a change/notification, all text and regex are tested <i>case-insensitive</i>.</li>
|
|
||||||
<li>Trigger text is processed from the result-text that comes out of any CSS/JSON Filters for this watch</li>
|
|
||||||
<li>Each line is processed separately (think of each line as "OR")</li>
|
|
||||||
<li>Note: Wrap in forward slash / to use regex example: <code>/foo\d/</code></li>
|
|
||||||
</ul>
|
|
||||||
</span>
|
|
||||||
</div>
|
|
||||||
</fieldset>
|
|
||||||
<fieldset class="pure-group">
|
|
||||||
{{ render_field(form.ignore_text, rows=5, placeholder="Some text to ignore in a line
|
|
||||||
/some.regex\d{2}/ for case-INsensitive regex
|
|
||||||
") }}
|
|
||||||
<span class="pure-form-message-inline">
|
|
||||||
<ul>
|
|
||||||
<li>Matching text will be <strong>ignored</strong> in the text snapshot (you can still see it but it wont trigger a change)</li>
|
|
||||||
<li>Each line processed separately, any line matching will be ignored (removed before creating the checksum)</li>
|
|
||||||
<li>Regular Expression support, wrap the entire line in forward slash <code>/regex/</code></li>
|
|
||||||
<li>Changing this will affect the comparison checksum which may trigger an alert</li>
|
|
||||||
</ul>
|
|
||||||
</span>
|
|
||||||
|
|
||||||
</fieldset>
|
|
||||||
|
|
||||||
<fieldset>
|
|
||||||
<div class="pure-control-group">
|
|
||||||
{{ render_field(form.text_should_not_be_present, rows=5, placeholder="For example: Out of stock
|
|
||||||
Sold out
|
|
||||||
Not in stock
|
|
||||||
Unavailable") }}
|
|
||||||
<span class="pure-form-message-inline">
|
|
||||||
<ul>
|
|
||||||
<li>Block change-detection while this text is on the page, all text and regex are tested <i>case-insensitive</i>, good for waiting for when a product is available again</li>
|
|
||||||
<li>Block text is processed from the result-text that comes out of any CSS/JSON Filters for this watch</li>
|
|
||||||
<li>All lines here must not exist (think of each line as "OR")</li>
|
|
||||||
<li>Note: Wrap in forward slash / to use regex example: <code>/foo\d/</code></li>
|
|
||||||
</ul>
|
|
||||||
</span>
|
|
||||||
</div>
|
|
||||||
</fieldset>
|
|
||||||
<fieldset>
|
|
||||||
<div class="pure-control-group">
|
|
||||||
{{ render_field(form.extract_text, rows=5, placeholder="/.+?\d+ comments.+?/
|
|
||||||
or
|
|
||||||
keyword") }}
|
|
||||||
<span class="pure-form-message-inline">
|
|
||||||
<ul>
|
|
||||||
<li>Extracts text in the final output (line by line) after other filters using regular expressions or string match;
|
|
||||||
<ul>
|
|
||||||
<li>Regular expression ‐ example <code>/reports.+?2022/i</code></li>
|
|
||||||
<li>Don't forget to consider the white-space at the start of a line <code>/.+?reports.+?2022/i</code></li>
|
|
||||||
<li>Use <code>//(?aiLmsux))</code> type flags (more <a href="https://docs.python.org/3/library/re.html#index-15">information here</a>)<br></li>
|
|
||||||
<li>Keyword example ‐ example <code>Out of stock</code></li>
|
|
||||||
<li>Use groups to extract just that text ‐ example <code>/reports.+?(\d+)/i</code> returns a list of years only</li>
|
|
||||||
<li>Example - match lines containing a keyword <code>/.*icecream.*/</code></li>
|
|
||||||
</ul>
|
|
||||||
</li>
|
|
||||||
<li>One line per regular-expression/string match</li>
|
|
||||||
</ul>
|
|
||||||
</span>
|
|
||||||
</div>
|
|
||||||
</fieldset>
|
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div id="text-preview" style="display: none;" >
|
<div id="text-preview" style="display: none;" >
|
||||||
<script>
|
<script>
|
||||||
const preview_text_edit_filters_url="{{url_for('ui.ui_edit.watch_get_preview_rendered', uuid=uuid)}}";
|
const preview_text_edit_filters_url="{{url_for('ui.ui_edit.watch_get_preview_rendered', uuid=uuid)}}";
|
||||||
|
|||||||
55
changedetectionio/templates/edit/include_subtract.html
Normal file
@@ -0,0 +1,55 @@
|
|||||||
|
<div class="pure-control-group">
|
||||||
|
{% set field = render_field(form.include_filters,
|
||||||
|
rows=5,
|
||||||
|
placeholder=has_tag_filters_extra+"#example
|
||||||
|
xpath://body/div/span[contains(@class, 'example-class')]",
|
||||||
|
class="m-d")
|
||||||
|
%}
|
||||||
|
{{ field }}
|
||||||
|
{% if '/text()' in field %}
|
||||||
|
<span class="pure-form-message-inline"><strong>Note!: //text() function does not work where the <element> contains <![CDATA[]]></strong></span><br>
|
||||||
|
{% endif %}
|
||||||
|
<span class="pure-form-message-inline">One CSS, xPath 1 & 2, JSON Path/JQ selector per line, <i>any</i> rules that matches will be used.<br>
|
||||||
|
<span data-target="#advanced-help-selectors" class="toggle-show pure-button button-tag button-xsmall">Show advanced help and tips</span><br>
|
||||||
|
<ul id="advanced-help-selectors" style="display: none;">
|
||||||
|
<li>CSS - Limit text to this CSS rule, only text matching this CSS rule is included.</li>
|
||||||
|
<li>JSON - Limit text to this JSON rule, using either <a href="https://pypi.org/project/jsonpath-ng/" target="new">JSONPath</a> or <a href="https://stedolan.github.io/jq/" target="new">jq</a> (if installed).
|
||||||
|
<ul>
|
||||||
|
<li>JSONPath: Prefix with <code>json:</code>, use <code>json:$</code> to force re-formatting if required, <a href="https://jsonpath.com/" target="new">test your JSONPath here</a>.</li>
|
||||||
|
{% if jq_support %}
|
||||||
|
<li>jq: Prefix with <code>jq:</code> and <a href="https://jqplay.org/" target="new">test your jq here</a>. Using <a href="https://stedolan.github.io/jq/" target="new">jq</a> allows for complex filtering and processing of JSON data with built-in functions, regex, filtering, and more. See examples and documentation <a href="https://stedolan.github.io/jq/manual/" target="new">here</a>. Prefix <code>jqraw:</code> outputs the results as text instead of a JSON list.</li>
|
||||||
|
{% else %}
|
||||||
|
<li>jq support not installed</li>
|
||||||
|
{% endif %}
|
||||||
|
</ul>
|
||||||
|
</li>
|
||||||
|
<li>XPath - Limit text to this XPath rule, simply start with a forward-slash. To specify XPath to be used explicitly or the XPath rule starts with an XPath function: Prefix with <code>xpath:</code>
|
||||||
|
<ul>
|
||||||
|
<li>Example: <code>//*[contains(@class, 'sametext')]</code> or <code>xpath:count(//*[contains(@class, 'sametext')])</code>, <a
|
||||||
|
href="http://xpather.com/" target="new">test your XPath here</a></li>
|
||||||
|
<li>Example: Get all titles from an RSS feed <code>//title/text()</code></li>
|
||||||
|
<li>To use XPath1.0: Prefix with <code>xpath1:</code></li>
|
||||||
|
</ul>
|
||||||
|
</li>
|
||||||
|
<li>
|
||||||
|
Please be sure that you thoroughly understand how to write CSS, JSONPath, XPath{% if jq_support %}, or jq selector{%endif%} rules before filing an issue on GitHub! <a
|
||||||
|
href="https://github.com/dgtlmoon/changedetection.io/wiki/CSS-Selector-help">here for more CSS selector help</a>.<br>
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
<fieldset class="pure-control-group">
|
||||||
|
{{ render_field(form.subtractive_selectors, rows=5, placeholder=has_tag_filters_extra+"header
|
||||||
|
footer
|
||||||
|
nav
|
||||||
|
.stockticker
|
||||||
|
//*[contains(text(), 'Advertisement')]") }}
|
||||||
|
<span class="pure-form-message-inline">
|
||||||
|
<ul>
|
||||||
|
<li> Remove HTML element(s) by CSS and XPath selectors before text conversion. </li>
|
||||||
|
<li> Don't paste HTML here, use only CSS and XPath selectors </li>
|
||||||
|
<li> Add multiple elements, CSS or XPath selectors per line to ignore multiple parts of the HTML. </li>
|
||||||
|
</ul>
|
||||||
|
</span>
|
||||||
|
</fieldset>
|
||||||
69
changedetectionio/templates/edit/text-options.html
Normal file
@@ -0,0 +1,69 @@
|
|||||||
|
|
||||||
|
<fieldset>
|
||||||
|
<div class="pure-control-group">
|
||||||
|
{{ render_field(form.trigger_text, rows=5, placeholder="Some text to wait for in a line
|
||||||
|
/some.regex\d{2}/ for case-INsensitive regex
|
||||||
|
") }}
|
||||||
|
<span class="pure-form-message-inline">
|
||||||
|
<ul>
|
||||||
|
<li>Text to wait for before triggering a change/notification, all text and regex are tested <i>case-insensitive</i>.</li>
|
||||||
|
<li>Trigger text is processed from the result-text that comes out of any CSS/JSON Filters for this watch</li>
|
||||||
|
<li>Each line is processed separately (think of each line as "OR")</li>
|
||||||
|
<li>Note: Wrap in forward slash / to use regex example: <code>/foo\d/</code></li>
|
||||||
|
</ul>
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
</fieldset>
|
||||||
|
<fieldset class="pure-group">
|
||||||
|
{{ render_field(form.ignore_text, rows=5, placeholder="Some text to ignore in a line
|
||||||
|
/some.regex\d{2}/ for case-INsensitive regex
|
||||||
|
") }}
|
||||||
|
<span class="pure-form-message-inline">
|
||||||
|
<ul>
|
||||||
|
<li>Matching text will be <strong>ignored</strong> in the text snapshot (you can still see it but it wont trigger a change)</li>
|
||||||
|
<li>Each line processed separately, any line matching will be ignored (removed before creating the checksum)</li>
|
||||||
|
<li>Regular Expression support, wrap the entire line in forward slash <code>/regex/</code></li>
|
||||||
|
<li>Changing this will affect the comparison checksum which may trigger an alert</li>
|
||||||
|
</ul>
|
||||||
|
</span>
|
||||||
|
|
||||||
|
</fieldset>
|
||||||
|
|
||||||
|
<fieldset>
|
||||||
|
<div class="pure-control-group">
|
||||||
|
{{ render_field(form.text_should_not_be_present, rows=5, placeholder="For example: Out of stock
|
||||||
|
Sold out
|
||||||
|
Not in stock
|
||||||
|
Unavailable") }}
|
||||||
|
<span class="pure-form-message-inline">
|
||||||
|
<ul>
|
||||||
|
<li>Block change-detection while this text is on the page, all text and regex are tested <i>case-insensitive</i>, good for waiting for when a product is available again</li>
|
||||||
|
<li>Block text is processed from the result-text that comes out of any CSS/JSON Filters for this watch</li>
|
||||||
|
<li>All lines here must not exist (think of each line as "OR")</li>
|
||||||
|
<li>Note: Wrap in forward slash / to use regex example: <code>/foo\d/</code></li>
|
||||||
|
</ul>
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
</fieldset>
|
||||||
|
<fieldset>
|
||||||
|
<div class="pure-control-group">
|
||||||
|
{{ render_field(form.extract_text, rows=5, placeholder="/.+?\d+ comments.+?/
|
||||||
|
or
|
||||||
|
keyword") }}
|
||||||
|
<span class="pure-form-message-inline">
|
||||||
|
<ul>
|
||||||
|
<li>Extracts text in the final output (line by line) after other filters using regular expressions or string match;
|
||||||
|
<ul>
|
||||||
|
<li>Regular expression ‐ example <code>/reports.+?2022/i</code></li>
|
||||||
|
<li>Don't forget to consider the white-space at the start of a line <code>/.+?reports.+?2022/i</code></li>
|
||||||
|
<li>Use <code>//(?aiLmsux))</code> type flags (more <a href="https://docs.python.org/3/library/re.html#index-15">information here</a>)<br></li>
|
||||||
|
<li>Keyword example ‐ example <code>Out of stock</code></li>
|
||||||
|
<li>Use groups to extract just that text ‐ example <code>/reports.+?(\d+)/i</code> returns a list of years only</li>
|
||||||
|
<li>Example - match lines containing a keyword <code>/.*icecream.*/</code></li>
|
||||||
|
</ul>
|
||||||
|
</li>
|
||||||
|
<li>One line per regular-expression/string match</li>
|
||||||
|
</ul>
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
</fieldset>
|
||||||
@@ -25,7 +25,6 @@ def test_setup(live_server):
|
|||||||
|
|
||||||
def get_last_message_from_smtp_server():
|
def get_last_message_from_smtp_server():
|
||||||
import socket
|
import socket
|
||||||
global smtp_test_server
|
|
||||||
port = 11080 # socket server port number
|
port = 11080 # socket server port number
|
||||||
|
|
||||||
client_socket = socket.socket() # instantiate
|
client_socket = socket.socket() # instantiate
|
||||||
@@ -44,7 +43,6 @@ def test_check_notification_email_formats_default_HTML(client, live_server, meas
|
|||||||
# live_server_setup(live_server)
|
# live_server_setup(live_server)
|
||||||
set_original_response()
|
set_original_response()
|
||||||
|
|
||||||
global smtp_test_server
|
|
||||||
notification_url = f'mailto://changedetection@{smtp_test_server}:11025/?to=fff@home.com'
|
notification_url = f'mailto://changedetection@{smtp_test_server}:11025/?to=fff@home.com'
|
||||||
|
|
||||||
#####################
|
#####################
|
||||||
@@ -99,7 +97,6 @@ def test_check_notification_email_formats_default_Text_override_HTML(client, liv
|
|||||||
# https://github.com/caronc/apprise/issues/633
|
# https://github.com/caronc/apprise/issues/633
|
||||||
|
|
||||||
set_original_response()
|
set_original_response()
|
||||||
global smtp_test_server
|
|
||||||
notification_url = f'mailto://changedetection@{smtp_test_server}:11025/?to=fff@home.com'
|
notification_url = f'mailto://changedetection@{smtp_test_server}:11025/?to=fff@home.com'
|
||||||
notification_body = f"""<!DOCTYPE html>
|
notification_body = f"""<!DOCTYPE html>
|
||||||
<html lang="en">
|
<html lang="en">
|
||||||
|
|||||||
@@ -60,6 +60,11 @@ def test_check_access_control(app, client, live_server):
|
|||||||
res = c.get(url_for('static_content', group='styles', filename='404-testetest.css'))
|
res = c.get(url_for('static_content', group='styles', filename='404-testetest.css'))
|
||||||
assert res.status_code == 404
|
assert res.status_code == 404
|
||||||
|
|
||||||
|
# Access to screenshots should be limited by 'shared_diff_access'
|
||||||
|
path = url_for('static_content', group='screenshot', filename='random-uuid-that-will-404.png', _external=True)
|
||||||
|
res = c.get(path)
|
||||||
|
assert res.status_code == 404
|
||||||
|
|
||||||
# Check wrong password does not let us in
|
# Check wrong password does not let us in
|
||||||
res = c.post(
|
res = c.post(
|
||||||
url_for("login"),
|
url_for("login"),
|
||||||
@@ -163,7 +168,7 @@ def test_check_access_control(app, client, live_server):
|
|||||||
url_for("settings.settings_page"),
|
url_for("settings.settings_page"),
|
||||||
data={"application-password": "foobar",
|
data={"application-password": "foobar",
|
||||||
# Should be disabled
|
# Should be disabled
|
||||||
# "application-shared_diff_access": "True",
|
"application-shared_diff_access": "",
|
||||||
"requests-time_between_check-minutes": 180,
|
"requests-time_between_check-minutes": 180,
|
||||||
'application-fetch_backend': "html_requests"},
|
'application-fetch_backend': "html_requests"},
|
||||||
follow_redirects=True
|
follow_redirects=True
|
||||||
@@ -176,6 +181,10 @@ def test_check_access_control(app, client, live_server):
|
|||||||
# Should be logged out
|
# Should be logged out
|
||||||
assert b"Login" in res.data
|
assert b"Login" in res.data
|
||||||
|
|
||||||
|
# Access to screenshots should be limited by 'shared_diff_access'
|
||||||
|
res = c.get(url_for('static_content', group='screenshot', filename='random-uuid-that-will-403.png'))
|
||||||
|
assert res.status_code == 403
|
||||||
|
|
||||||
# The diff page should return something valid when logged out
|
# The diff page should return something valid when logged out
|
||||||
res = c.get(url_for("ui.ui_views.diff_history_page", uuid="first"))
|
res = c.get(url_for("ui.ui_views.diff_history_page", uuid="first"))
|
||||||
assert b'Random content' not in res.data
|
assert b'Random content' not in res.data
|
||||||
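The two screenshot assertions added in this hunk (404 before a password is set, 403 once a password is set and `shared_diff_access` is off) can be read as a small decision table. The sketch below is only an illustration of that reading: `screenshot_status` and all of its parameters are hypothetical names, and the order of the checks (access first, then file lookup) is an assumption inferred from the test expectations, not taken from the application code.

```python
def screenshot_status(password_enabled, shared_diff_access, logged_in, file_exists):
    """Return the HTTP status a screenshot request might receive.

    Hypothetical helper for illustration only; not the application's route.
    """
    # Assumption: the access check runs before the file lookup, which is why a
    # non-existent file still yields 403 for a logged-out visitor.
    if password_enabled and not logged_in and not shared_diff_access:
        return 403
    if not file_exists:
        return 404
    return 200


# No password configured: a missing screenshot is simply not found.
print(screenshot_status(False, False, False, False))
# Password set, shared diff access disabled, visitor logged out: forbidden.
print(screenshot_status(True, False, False, False))
```

Under these assumptions the first call corresponds to the assertion near the top of the test and the second to the assertion made after logging out.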
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
#!/usr/bin/env python3
|
#!/usr/bin/env python3
|
||||||
import json
|
import json
|
||||||
import urllib
|
import time
|
||||||
|
|
||||||
from flask import url_for
|
from flask import url_for
|
||||||
from .util import live_server_setup, wait_for_all_checks
|
from .util import live_server_setup, wait_for_all_checks
|
||||||
@@ -113,6 +113,7 @@ def test_conditions_with_text_and_number(client, live_server):
|
|||||||
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
|
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
|
||||||
wait_for_all_checks(client)
|
wait_for_all_checks(client)
|
||||||
|
|
||||||
|
time.sleep(2)
|
||||||
# 75 is > 20 and < 100 and contains "5"
|
# 75 is > 20 and < 100 and contains "5"
|
||||||
res = client.get(url_for("watchlist.index"))
|
res = client.get(url_for("watchlist.index"))
|
||||||
assert b'unviewed' in res.data
|
assert b'unviewed' in res.data
|
||||||
|
|||||||
@@ -32,7 +32,6 @@ def test_strip_regex_text_func():
|
|||||||
]
|
]
|
||||||
|
|
||||||
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines)
|
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines)
|
||||||
|
|
||||||
assert "but 1 lines" in stripped_content
|
assert "but 1 lines" in stripped_content
|
||||||
assert "igNORe-cAse text" not in stripped_content
|
assert "igNORe-cAse text" not in stripped_content
|
||||||
assert "but 1234 lines" not in stripped_content
|
assert "but 1234 lines" not in stripped_content
|
||||||
@@ -42,6 +41,46 @@ def test_strip_regex_text_func():
|
|||||||
# Check line number reporting
|
# Check line number reporting
|
||||||
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines, mode="line numbers")
|
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines, mode="line numbers")
|
||||||
assert stripped_content == [2, 5, 6, 7, 8, 10]
|
assert stripped_content == [2, 5, 6, 7, 8, 10]
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/but 1.+5 lines/s'])
|
||||||
|
assert "but 1 lines" not in stripped_content
|
||||||
|
assert "skip 5 lines" not in stripped_content
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/but 1.+5 lines/s'], mode="line numbers")
|
||||||
|
assert stripped_content == [4, 5]
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/.+/s'])
|
||||||
|
assert stripped_content == ""
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/.+/s'], mode="line numbers")
|
||||||
|
assert stripped_content == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/^.+but.+\\n.+lines$/m'])
|
||||||
|
assert "but 1 lines" not in stripped_content
|
||||||
|
assert "skip 5 lines" not in stripped_content
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/^.+but.+\\n.+lines$/m'], mode="line numbers")
|
||||||
|
assert stripped_content == [4, 5]
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/^.+?\.$/m'])
|
||||||
|
assert "but sometimes we want to remove the lines." not in stripped_content
|
||||||
|
assert "but not always." not in stripped_content
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/^.+?\.$/m'], mode="line numbers")
|
||||||
|
assert stripped_content == [2, 11]
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/but.+?but/ms'])
|
||||||
|
assert "but sometimes we want to remove the lines." not in stripped_content
|
||||||
|
assert "but 1 lines" not in stripped_content
|
||||||
|
assert "but 1234 lines" not in stripped_content
|
||||||
|
assert "igNORe-cAse text we dont want to keep" not in stripped_content
|
||||||
|
assert "but not always." not in stripped_content
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text(test_content, ['/but.+?but/ms'], mode="line numbers")
|
||||||
|
assert stripped_content == [2, 3, 4, 9, 10, 11]
|
||||||
|
|
||||||
|
stripped_content = html_tools.strip_ignore_text("\n\ntext\n\ntext\n\n", ['/^$/ms'], mode="line numbers")
|
||||||
|
assert stripped_content == [1, 2, 4, 6]
|
||||||
|
|
||||||
# Check that linefeeds are preserved when there are no matching ignores
|
# Check that linefeeds are preserved when there are no matching ignores
|
||||||
content = "some text\n\nand other text\n"
|
content = "some text\n\nand other text\n"
|
||||||
|
|||||||
@@ -167,7 +167,10 @@ def test_check_notification(client, live_server, measure_memory_usage):
|
|||||||
assert ':-)' in notification_submission
|
assert ':-)' in notification_submission
|
||||||
# Check the attachment was added, and that it is a JPEG from the original PNG
|
# Check the attachment was added, and that it is a JPEG from the original PNG
|
||||||
notification_submission_object = json.loads(notification_submission)
|
notification_submission_object = json.loads(notification_submission)
|
||||||
|
assert notification_submission_object
|
||||||
|
|
||||||
# We keep PNG screenshots for now
|
# We keep PNG screenshots for now
|
||||||
|
# IF THIS FAILS YOU SHOULD BE TESTING WITH ENV VAR REMOVE_REQUESTS_OLD_SCREENSHOTS=False
|
||||||
assert notification_submission_object['attachments'][0]['filename'] == 'last-screenshot.png'
|
assert notification_submission_object['attachments'][0]['filename'] == 'last-screenshot.png'
|
||||||
assert len(notification_submission_object['attachments'][0]['base64'])
|
assert len(notification_submission_object['attachments'][0]['base64'])
|
||||||
assert notification_submission_object['attachments'][0]['mimetype'] == 'image/png'
|
assert notification_submission_object['attachments'][0]['mimetype'] == 'image/png'
|
||||||
|
|||||||
80
changedetectionio/tests/test_ui.py
Normal file
@@ -0,0 +1,80 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
|
||||||
|
from flask import url_for
|
||||||
|
from .util import set_original_response, set_modified_response, live_server_setup, wait_for_all_checks
|
||||||
|
|
||||||
|
def test_checkbox_open_diff_in_new_tab(client, live_server):
|
||||||
|
|
||||||
|
set_original_response()
|
||||||
|
live_server_setup(live_server)
|
||||||
|
|
||||||
|
# Add our URL to the import page
|
||||||
|
res = client.post(
|
||||||
|
url_for("imports.import_page"),
|
||||||
|
data={"urls": url_for('test_endpoint', _external=True)},
|
||||||
|
follow_redirects=True
|
||||||
|
)
|
||||||
|
|
||||||
|
assert b"1 Imported" in res.data
|
||||||
|
wait_for_all_checks(client)
|
||||||
|
|
||||||
|
# Make a change
|
||||||
|
set_modified_response()
|
||||||
|
|
||||||
|
# Test case 1 - checkbox is enabled in settings
|
||||||
|
res = client.post(
|
||||||
|
url_for("settings.settings_page"),
|
||||||
|
data={"application-ui-open_diff_in_new_tab": "1"},
|
||||||
|
follow_redirects=True
|
||||||
|
)
|
||||||
|
assert b'Settings updated' in res.data
|
||||||
|
|
||||||
|
# Force recheck
|
||||||
|
res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
|
||||||
|
assert b'Queued 1 watch for rechecking.' in res.data
|
||||||
|
|
||||||
|
wait_for_all_checks(client)
|
||||||
|
|
||||||
|
res = client.get(url_for("watchlist.index"))
|
||||||
|
lines = res.data.decode().split("\n")
|
||||||
|
|
||||||
|
# Find link to diff page
|
||||||
|
target_line = None
|
||||||
|
for line in lines:
|
||||||
|
if '/diff' in line:
|
||||||
|
target_line = line.strip()
|
||||||
|
break
|
||||||
|
|
||||||
|
assert target_line is not None
|
||||||
|
assert 'target=' in target_line
|
||||||
|
|
||||||
|
# Test case 2 - checkbox is disabled in settings
|
||||||
|
res = client.post(
|
||||||
|
url_for("settings.settings_page"),
|
||||||
|
data={"application-ui-open_diff_in_new_tab": ""},
|
||||||
|
follow_redirects=True
|
||||||
|
)
|
||||||
|
assert b'Settings updated' in res.data
|
||||||
|
|
||||||
|
# Force recheck
|
||||||
|
res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
|
||||||
|
assert b'Queued 1 watch for rechecking.' in res.data
|
||||||
|
|
||||||
|
wait_for_all_checks(client)
|
||||||
|
|
||||||
|
res = client.get(url_for("watchlist.index"))
|
||||||
|
lines = res.data.decode().split("\n")
|
||||||
|
|
||||||
|
# Find link to diff page
|
||||||
|
target_line = None
|
||||||
|
for line in lines:
|
||||||
|
if '/diff' in line:
|
||||||
|
target_line = line.strip()
|
||||||
|
break
|
||||||
|
|
||||||
|
assert target_line is not None
|
||||||
|
assert 'target=' not in target_line
|
||||||
|
|
||||||
|
# Cleanup everything
|
||||||
|
res = client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
|
||||||
|
assert b'Deleted' in res.data
|
||||||
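Review note: both test cases in `test_ui.py` repeat the same loop to locate the diff link in the watchlist HTML. A minimal sketch of how that lookup could be factored out is below. The helper name `find_diff_link_line` and the sample markup are hypothetical and are not part of this changeset; the real watchlist template output will differ.

```python
from typing import Optional


def find_diff_link_line(html: str) -> Optional[str]:
    """Return the first line containing '/diff' (stripped), or None if absent.

    Hypothetical helper mirroring the loop used twice in the test above.
    """
    for line in html.split("\n"):
        if "/diff" in line:
            return line.strip()
    return None


# Illustrative markup only; not the real watchlist template output.
with_target = '<ul>\n  <a href="/diff/abc" target="_blank">History</a>\n</ul>'
without_target = '<ul>\n  <a href="/diff/abc">History</a>\n</ul>'

line_enabled = find_diff_link_line(with_target)
line_disabled = find_diff_link_line(without_target)
```

With a helper like this, each test case would reduce to one call followed by the `is not None` check and the `'target=' in ...` or `'target=' not in ...` assertion.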
@@ -109,7 +109,6 @@ class update_worker(threading.Thread):
|
|||||||
default_notification_title
|
default_notification_title
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
# Would be better if this was some kind of Object where Watch can reference the parent datastore etc
|
# Would be better if this was some kind of Object where Watch can reference the parent datastore etc
|
||||||
v = watch.get(var_name)
|
v = watch.get(var_name)
|
||||||
if v and not watch.get('notification_muted'):
|
if v and not watch.get('notification_muted'):
|
||||||
@@ -592,6 +591,7 @@ class update_worker(threading.Thread):
|
|||||||
|
|
||||||
self.current_uuid = None # Done
|
self.current_uuid = None # Done
|
||||||
self.q.task_done()
|
self.q.task_done()
|
||||||
|
update_handler = None
|
||||||
logger.debug(f"Watch {uuid} done in {time.time()-fetch_start_time:.2f}s")
|
logger.debug(f"Watch {uuid} done in {time.time()-fetch_start_time:.2f}s")
|
||||||
|
|
||||||
# Give the CPU time to interrupt
|
# Give the CPU time to interrupt
|
||||||
|
|||||||
@@ -63,6 +63,10 @@ services:
|
|||||||
#
|
#
|
||||||
# A valid timezone name to run as (for scheduling watch checking) see https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
# A valid timezone name to run as (for scheduling watch checking) see https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
||||||
# - TZ=America/Los_Angeles
|
# - TZ=America/Los_Angeles
|
||||||
|
#
|
||||||
|
# Maximum height of screenshots, default is 16000 px, screenshots will be clipped to this if exceeded.
|
||||||
|
# RAM usage will be higher if you increase this.
|
||||||
|
# - SCREENSHOT_MAX_HEIGHT=16000
|
||||||
|
|
||||||
# Comment out ports: when using behind a reverse proxy, enable networks: etc.
|
# Comment out ports: when using behind a reverse proxy, enable networks: etc.
|
||||||
ports:
|
ports:
|
||||||
|
|||||||
BIN
docs/web-page-change-conditions.png
Normal file
Size: 104 KiB