Compare commits


59 Commits

Author SHA1 Message Date
dgtlmoon
5d4ebff235 Small fix for apprise type URLs that are converted to post:// 2023-11-30 13:15:39 +01:00
dgtlmoon
74d4c580cf Merge branch 'fixing-post-headers' into api-import 2023-11-30 12:16:32 +01:00
dgtlmoon
b899579ca8 properly handle user/pass auth and clean URL without token 2023-11-30 12:15:22 +01:00
dgtlmoon
1f7f1e2bfa Fixing support for headers in custom post, posts etc notifications 2023-11-30 11:34:19 +01:00
dgtlmoon
0df773a12c Fixing support for headers in custom post, posts etc notifications 2023-11-30 11:33:36 +01:00
dgtlmoon
6a5566e771 Add import by list 2023-11-30 10:14:33 +01:00
dgtlmoon
7fe0ef7099 0.45.8 2023-11-29 10:25:11 +01:00
dgtlmoon
fe70beeaed Restock detector - adding more detection strings 2023-11-29 10:21:30 +01:00
dgtlmoon
abf7ed9085 UI - remove incorrect label 2023-11-29 10:19:49 +01:00
dgtlmoon
19e752e9ba UI - "Add new watch" URL at main input box should always grow to match the viewport 2023-11-28 18:11:11 +01:00
dgtlmoon
684e96f5f1 UI - Tidy-up for advanced settings under watch edit, HTML validation fixes (#2011) 2023-11-28 17:31:08 +01:00
dgtlmoon
8f321139fd UI - 'Request body' section disappears after switching from 'Playwright' to 'System settings default' and back on 'Request' tab - Fixed #1449 2023-11-28 14:01:15 +01:00
dgtlmoon
7fdae82e46 Browser Steps - Adding validation for "Click X,Y" step 2023-11-28 12:36:15 +01:00
dgtlmoon
bbc18d8e80 API - Make sure the watch "is viewed" attribute is correctly represented in the API output (#2009) 2023-11-28 11:42:08 +01:00
dgtlmoon
d8ee5472f1 Update playwright fetcher library and API calls 2023-11-28 11:20:06 +01:00
dgtlmoon
8fd57280b7 Testing - Improve PDF text change detection tests (#1992) 2023-11-20 15:18:46 +01:00
dgtlmoon
0285d00f13 UI - Clicking the "[Diff]" link should take you to the difference starting at the relative time to when you last viewed the difference page (#1989) 2023-11-17 17:21:26 +01:00
dgtlmoon
f7f98945a2 Visual Selector - xPath handling misc fixes (#1976) 2023-11-13 21:23:43 +01:00
dgtlmoon
5e2049c538 Fix build issue 2023-11-13 17:02:27 +01:00
Constantin Hong
26931e0167 feature: Support XPath2.0 to 3.1 (#1774) 2023-11-13 16:42:21 +01:00
dgtlmoon
5229094e44 New functionality - Selectable browser / ability to add extra browser connections (good for using "scraping browsers" etc) (#1943) 2023-11-13 16:39:11 +01:00
dgtlmoon
5a306aa78c API/UI - Button to regenerate API key (#1975 / #1967) 2023-11-13 16:26:50 +01:00
dgtlmoon
c8dcc072c8 Code refactor for fetchers (#1941) 2023-11-13 10:42:56 +01:00
dgtlmoon
7c97a5a403 0.45.7.3 2023-11-12 12:05:54 +01:00
dgtlmoon
7dd967be8e Build - update docker container cache setup 2023-11-12 10:06:46 +01:00
dgtlmoon
3607d15185 0.45.7.2 2023-11-12 00:29:35 +01:00
dgtlmoon
3382b4cb3f UI - Cleanup fonts better display in firefox, request CSS according to version (#1968) 2023-11-12 00:29:15 +01:00
Marcelo Alencar
5f030d3668 Revert "Build - Add piwheels support for ARMv6 and ARMv7 machines (rPi etc) (#1814)" (#1964) 2023-11-11 20:48:34 +01:00
dgtlmoon
06975d6d8f 0.45.7.1 2023-11-11 20:42:16 +01:00
dgtlmoon
f58e5b7f19 Build: python libraries - pinning more libraries (#1962) 2023-11-11 20:41:21 +01:00
dgtlmoon
e50eff8e35 Build: python libraries - eventlet + dnspython dep problems were fixed (#1963) 2023-11-11 20:16:09 +01:00
dgtlmoon
07a853ce59 Pip builder - ignore proxy test data if it exists 2023-11-10 17:50:09 +01:00
dgtlmoon
80f8d23309 0.45.7 2023-11-10 17:39:49 +01:00
dgtlmoon
9f41d15908 UI - Fixing issue where search box JS interfered with page render when logged out 2023-11-10 17:38:04 +01:00
dgtlmoon
89797dfe02 0.45.6 2023-11-10 17:32:21 +01:00
dgtlmoon
c905652780 UI - Adding support-us widget <3 (#1956) 2023-11-10 17:31:00 +01:00
dgtlmoon
99246d3e6d Visual Selector - Small fix, Improving elements fetcher reliability (#1947) 2023-11-09 19:13:18 +01:00
dgtlmoon
f9f69bf0dd Update README.md - Adding import information 2023-11-08 11:55:02 +01:00
dgtlmoon
68efb25e9b Upgrade playwright browser library (#1942) 2023-11-07 16:33:29 +01:00
dgtlmoon
70606ab05d Update docker-compose.yml - playwright version should be the same as in the automated tests 2023-11-06 22:33:22 +01:00
Jai Gupta
d3c8386874 Import - Improved Wachete Excel XLS import support for the "dynamic wachet" column (sets the correct state of using a Chrome browser or not) (#1934) 2023-11-06 12:50:27 +01:00
dgtlmoon
47103d7f3d Refactor Excel / wachete import, extend tests (#1931) 2023-11-03 15:43:57 +01:00
dgtlmoon
03c671bfff Build - Upgrading pip packages (#1915) 2023-11-01 18:47:12 +01:00
dgtlmoon
e209d9fba0 Ability to Import from Wachete XLSX (or any XLSX) - Wachete alternative made easy (#1921) 2023-11-01 15:36:49 +01:00
dgtlmoon
3b43da35ec Docker build - upgrade image to "bookworm" debian version - fix glibc mismatch (#1918) 2023-10-31 10:31:34 +01:00
dgtlmoon
a0665e1f18 Fetcher - experimental puppeteer fetch - dont rewrite the proxy protocol (fixes socks5 bug) 2023-10-30 16:08:31 +01:00
dgtlmoon
9ffe7e0eaf Nice format stats (comma sep) 2023-10-29 19:13:15 +01:00
dgtlmoon
3e5671a3a2 Selenium fetcher - Test was on 4.14.1 but documentation was not, change both to 4 (#1912) 2023-10-29 11:43:27 +01:00
dgtlmoon
cd1aca9ee3 0.45.5 2023-10-28 20:20:24 +02:00
dgtlmoon
6a589e14f3 BrowserSteps - Wrong text taken from browser steps (#1911) 2023-10-28 20:19:51 +02:00
dgtlmoon
dbb76f3618 0.45.4 2023-10-28 16:48:10 +02:00
dgtlmoon
4ae27af511 Code cleanup - Browser Steps 2023-10-28 14:58:12 +02:00
dgtlmoon
e1860549dc Fetching - Browser Step enabled watches should also identify 404/non-200 status situations (#1907) 2023-10-28 14:37:42 +02:00
dgtlmoon
9765d56a23 Text Filters - "Extract Text" filter was not being error checked properly when using a RegEx (#1902) 2023-10-26 20:19:59 +02:00
dgtlmoon
349111eb35 Fetching/BrowserSteps - Going to a page was using slightly different logic to the main way - make them use the same methods (#1890) 2023-10-26 20:19:22 +02:00
dgtlmoon
71e50569a0 UI - "With errors" tag/button should always show the current tag error count 2023-10-26 19:42:48 +02:00
Marcelo Alencar
c372942295 Build - Add piwheels support for ARMv6 and ARMv7 machines (rPi etc) (#1814) 2023-10-26 13:46:14 +02:00
Marcelo Alencar
0aef5483d9 Upgrade selenium to 4.14.0 (latest) (#1783) 2023-10-26 10:09:03 +02:00
dgtlmoon
c266c64b94 UI - Don't show search icon when logged out (#1896) 2023-10-25 13:31:33 +02:00
77 changed files with 2279 additions and 560 deletions

View File

@@ -96,8 +96,9 @@ jobs:
tags: |
${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:dev,ghcr.io/${{ github.repository }}:dev
platforms: linux/amd64,linux/arm64,linux/arm/v6,linux/arm/v7,linux/arm/v8
cache-from: type=local,src=/tmp/.buildx-cache
cache-to: type=local,dest=/tmp/.buildx-cache
cache-from: type=gha
cache-to: type=gha,mode=max
# Looks like this was disabled
# provenance: false
@@ -116,18 +117,11 @@ jobs:
${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:latest
ghcr.io/dgtlmoon/changedetection.io:latest
platforms: linux/amd64,linux/arm64,linux/arm/v6,linux/arm/v7,linux/arm/v8
cache-from: type=local,src=/tmp/.buildx-cache
cache-to: type=local,dest=/tmp/.buildx-cache
cache-from: type=gha
cache-to: type=gha,mode=max
# Looks like this was disabled
# provenance: false
- name: Image digest
run: echo step SHA ${{ steps.vars.outputs.sha_short }} tag ${{steps.vars.outputs.tag}} branch ${{steps.vars.outputs.branch}} digest ${{ steps.docker_build.outputs.digest }}
- name: Cache Docker layers
uses: actions/cache@v3
with:
path: /tmp/.buildx-cache
key: ${{ runner.os }}-buildx-${{ github.sha }}
restore-keys: |
${{ runner.os }}-buildx-

View File

@@ -29,8 +29,11 @@ jobs:
docker network create changedet-network
# Selenium+browserless
docker run --network changedet-network -d --hostname selenium -p 4444:4444 --rm --shm-size="2g" selenium/standalone-chrome-debug:3.141.59
docker run --network changedet-network -d --hostname browserless -e "FUNCTION_BUILT_INS=[\"fs\",\"crypto\"]" -e "DEFAULT_LAUNCH_ARGS=[\"--window-size=1920,1080\"]" --rm -p 3000:3000 --shm-size="2g" browserless/chrome:1.53-chrome-stable
docker run --network changedet-network -d --hostname selenium -p 4444:4444 --rm --shm-size="2g" selenium/standalone-chrome:4
docker run --network changedet-network -d --name browserless --hostname browserless -e "FUNCTION_BUILT_INS=[\"fs\",\"crypto\"]" -e "DEFAULT_LAUNCH_ARGS=[\"--window-size=1920,1080\"]" --rm -p 3000:3000 --shm-size="2g" browserless/chrome:1.60-chrome-stable
# For accessing custom browser tests
docker run --network changedet-network -d --name browserless-custom-url --hostname browserless-custom-url -e "FUNCTION_BUILT_INS=[\"fs\",\"crypto\"]" -e "DEFAULT_LAUNCH_ARGS=[\"--window-size=1920,1080\"]" --rm --shm-size="2g" browserless/chrome:1.60-chrome-stable
- name: Build changedetection.io container for testing
run: |
@@ -48,6 +51,7 @@ jobs:
run: |
# Unit tests
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_notification_diff'
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_watch_model'
# All tests
docker run --network changedet-network test-changedetectionio bash -c 'cd changedetectionio && ./run_basic_tests.sh'
@@ -86,6 +90,12 @@ jobs:
# And again with PLAYWRIGHT_DRIVER_URL=..
cd ..
- name: Test custom browser URL
run: |
cd changedetectionio
./run_custom_browser_url_tests.sh
cd ..
- name: Test changedetection.io container starts+runs basically without error
run: |
docker run -p 5556:5000 -d test-changedetectionio

View File

@@ -1,5 +1,5 @@
# pip dependencies install stage
FROM python:3.11-slim-bullseye as builder
FROM python:3.11-slim-bookworm as builder
# See `cryptography` pin comment in requirements.txt
ARG CRYPTOGRAPHY_DONT_BUILD_RUST=1
@@ -25,14 +25,13 @@ RUN pip install --target=/dependencies -r /requirements.txt
# Playwright is an alternative to Selenium
# Excluded this package from requirements.txt to prevent arm/v6 and arm/v7 builds from failing
# https://github.com/dgtlmoon/changedetection.io/pull/1067 also musl/alpine (not supported)
RUN pip install --target=/dependencies playwright~=1.27.1 \
RUN pip install --target=/dependencies playwright~=1.40 \
|| echo "WARN: Failed to install Playwright. The application can still run, but the Playwright option will be disabled."
# Final image stage
FROM python:3.11-slim-bullseye
FROM python:3.11-slim-bookworm
RUN apt-get update && apt-get install -y --no-install-recommends \
libssl1.1 \
libxslt1.1 \
# For pdftohtml
poppler-utils \

View File

@@ -16,3 +16,4 @@ global-exclude venv
global-exclude test-datastore
global-exclude changedetection.io*dist-info
global-exclude changedetectionio/tests/proxy_socks5/test-datastore

View File

@@ -232,6 +232,13 @@ See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configura
Raspberry Pi and linux/arm/v6 linux/arm/v7 arm64 devices are supported! See the wiki for [details](https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver)
## Import support
Easily [import your list of websites to watch for changes in Excel .xlsx file format](https://changedetection.io/tutorial/how-import-your-website-change-detection-lists-excel), or paste in lists of website URLs as plaintext.
Excel import is recommended - that way you can better organise tags/groups of websites and other features.
## API Support
Supports managing the website watch list [via our API](https://changedetection.io/docs/api_v1/index.html)
@@ -261,3 +268,7 @@ I offer commercial support, this software is depended on by network security, ae
[license-shield]: https://img.shields.io/github/license/dgtlmoon/changedetection.io.svg?style=for-the-badge
[release-link]: https://github.com/dgtlmoon/changedetection.io/releases
[docker-link]: https://hub.docker.com/r/dgtlmoon/changedetection.io
## Third-party licenses
changedetectionio.html_tools.elementpath_tostring: Copyright (c), 2018-2021, SISSA (Scuola Internazionale Superiore di Studi Avanzati), Licensed under [MIT license](https://github.com/sissaschool/elementpath/blob/master/LICENSE)

View File

@@ -38,7 +38,7 @@ from flask_paginate import Pagination, get_page_parameter
from changedetectionio import html_tools
from changedetectionio.api import api_v1
__version__ = '0.45.3'
__version__ = '0.45.8'
from changedetectionio.store import BASE_URL_NOT_SET_TEXT
@@ -105,6 +105,10 @@ def get_darkmode_state():
css_dark_mode = request.cookies.get('css_dark_mode', 'false')
return 'true' if css_dark_mode and strtobool(css_dark_mode) else 'false'
@app.template_global()
def get_css_version():
return __version__
# We use the whole watch object from the store/JSON so we can see if there's some related status in terms of a thread
# running or something similar.
@app.template_filter('format_last_checked_time')
@@ -236,6 +240,10 @@ def changedetection_app(config=None, datastore_o=None):
watch_api.add_resource(api_v1.Watch, '/api/v1/watch/<string:uuid>',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
watch_api.add_resource(api_v1.Import,
'/api/v1/import',
resource_class_kwargs={'datastore': datastore})
watch_api.add_resource(api_v1.SystemInfo, '/api/v1/systeminfo',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
@@ -422,11 +430,12 @@ def changedetection_app(config=None, datastore_o=None):
for uuid, watch in datastore.data['watching'].items():
if with_errors and not watch.get('last_error'):
continue
if watch.get('last_error'):
errored_count += 1
if limit_tag and not limit_tag in watch['tags']:
continue
if watch.get('last_error'):
errored_count += 1
if search_q:
if (watch.get('title') and search_q in watch.get('title').lower()) or search_q in watch.get('url', '').lower():
sorted_watches.append(watch)
@@ -609,6 +618,8 @@ def changedetection_app(config=None, datastore_o=None):
# For the form widget tag uuid lookup
form.tags.datastore = datastore # in _value
for p in datastore.extra_browsers:
form.fetch_backend.choices.append(p)
form.fetch_backend.choices.append(("system", 'System settings default'))
@@ -709,7 +720,7 @@ def changedetection_app(config=None, datastore_o=None):
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver':
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
# Only works reliably with Playwright
@@ -814,6 +825,16 @@ def changedetection_app(config=None, datastore_o=None):
return output
@app.route("/settings/reset-api-key", methods=['GET'])
@login_optionally_required
def settings_reset_api_key():
import secrets
secret = secrets.token_hex(16)
datastore.data['settings']['application']['api_access_token'] = secret
datastore.needs_write_urgent = True
flash("API Key was regenerated.")
return redirect(url_for('settings_page')+'#api')
@app.route("/import", methods=['GET', "POST"])
@login_optionally_required
def import_page():
@@ -821,6 +842,7 @@ def changedetection_app(config=None, datastore_o=None):
from . import forms
if request.method == 'POST':
from .importer import import_url_list, import_distill_io_json
# URL List import
@@ -844,11 +866,32 @@ def changedetection_app(config=None, datastore_o=None):
for uuid in d_importer.new_uuids:
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid, 'skip_when_checksum_same': True}))
# XLSX importer
if request.files and request.files.get('xlsx_file'):
file = request.files['xlsx_file']
from .importer import import_xlsx_wachete, import_xlsx_custom
if request.values.get('file_mapping') == 'wachete':
w_importer = import_xlsx_wachete()
w_importer.run(data=file, flash=flash, datastore=datastore)
else:
w_importer = import_xlsx_custom()
# Build a mapping of column number to column type
map = {}
for i in range(10):
c = request.values.get(f"custom_xlsx[col_{i}]")
v = request.values.get(f"custom_xlsx[col_type_{i}]")
if c and v:
map[int(c)] = v
w_importer.import_profile = map
w_importer.run(data=file, flash=flash, datastore=datastore)
for uuid in w_importer.new_uuids:
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid, 'skip_when_checksum_same': True}))
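
# A hedged standalone sketch of the same mapping mechanism used above:
# import_profile maps 1-based spreadsheet column numbers to watch fields.
# The workbook name, column choices and the datastore handle are placeholders.
from changedetectionio.importer import import_xlsx_custom

importer = import_xlsx_custom()
importer.import_profile = {
    1: 'url',               # column A holds the URL (required)
    2: 'tag',               # column B holds a tag/group name
    3: 'interval_minutes',  # column C holds the recheck interval in minutes
}
with open('watches.xlsx', 'rb') as f:
    importer.run(data=f, flash=print, datastore=datastore)
print(importer.new_uuids)  # UUIDs of the watches just created
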
form = forms.importForm(formdata=request.form if request.method == 'POST' else None,
# data=default,
)
# Could be some remaining, or we could be on GET
form = forms.importForm(formdata=request.form if request.method == 'POST' else None)
output = render_template("import.html",
form=form,
import_url_list_remaining="\n".join(remaining_urls),
@@ -922,7 +965,7 @@ def changedetection_app(config=None, datastore_o=None):
# Read as binary and force decode as UTF-8
# Windows may fail decode in python if we just use 'r' mode (chardet decode exception)
from_version = request.args.get('from_version')
from_version_index = -2 # second newest
from_version_index = -2 # second newest
if from_version and from_version in dates:
from_version_index = dates.index(from_version)
else:
@@ -931,7 +974,7 @@ def changedetection_app(config=None, datastore_o=None):
try:
from_version_file_contents = watch.get_history_snapshot(dates[from_version_index])
except Exception as e:
from_version_file_contents = "Unable to read to-version at index{}.\n".format(dates[from_version_index])
from_version_file_contents = f"Unable to read from-version at index {dates[from_version_index]}.\n"
to_version = request.args.get('to_version')
to_version_index = -1
@@ -950,7 +993,7 @@ def changedetection_app(config=None, datastore_o=None):
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver':
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
password_enabled_and_share_is_off = False
@@ -1004,7 +1047,7 @@ def changedetection_app(config=None, datastore_o=None):
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver':
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
# Never requested successfully, but we detected a fetch error
@@ -1185,8 +1228,7 @@ def changedetection_app(config=None, datastore_o=None):
# These files should be in our subdirectory
try:
# set nocache, set content-type
watch_dir = datastore_o.datastore_path + "/" + filename
response = make_response(send_from_directory(filename="elements.json", directory=watch_dir, path=watch_dir + "/elements.json"))
response = make_response(send_from_directory(os.path.join(datastore_o.datastore_path, filename), "elements.json"))
response.headers['Content-type'] = 'application/json'
response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
response.headers['Pragma'] = 'no-cache'

View File

@@ -76,7 +76,7 @@ class Watch(Resource):
# Properties are not returned as a JSON, so add the required props manually
watch['history_n'] = watch.history_n
watch['last_changed'] = watch.last_changed
watch['viewed'] = watch.viewed
return watch
@auth.check_token
@@ -280,11 +280,14 @@ class CreateWatch(Resource):
if tag_limit and not any(v.get('title').lower() == tag_limit for k, v in tags.items()):
continue
list[uuid] = {'url': watch['url'],
'title': watch['title'],
'last_checked': watch['last_checked'],
'last_changed': watch.last_changed,
'last_error': watch['last_error']}
list[uuid] = {
'last_changed': watch.last_changed,
'last_checked': watch['last_checked'],
'last_error': watch['last_error'],
'title': watch['title'],
'url': watch['url'],
'viewed': watch.viewed
}
if request.args.get('recheck_all'):
for uuid in self.datastore.data['watching'].keys():
@@ -293,6 +296,62 @@ class CreateWatch(Resource):
return list, 200
class Import(Resource):
def __init__(self, **kwargs):
# datastore is a black box dependency
self.datastore = kwargs['datastore']
@auth.check_token
def post(self):
"""
@api {post} /api/v1/import - Import a list of watched URLs
@apiDescription Accepts a line-feed separated list of URLs to import (one URL per line), additionally with ?tag_uuids=(tag id), ?tag=(name), ?proxy={key}, ?dedupe=true (default true).
@apiExample {curl} Example usage:
curl http://localhost:5000/api/v1/import --data-binary @list-of-sites.txt -H"x-api-key:8a111a21bc2f8f1dd9b9353bbd46049a"
@apiName Import
@apiGroup Watch
@apiSuccess (200) {List} OK List of watch UUIDs added
@apiSuccess (500) {String} ERR Some other error
"""
extras = {}
if request.args.get('proxy'):
plist = self.datastore.proxy_list
if not request.args.get('proxy') in plist:
return "Invalid proxy choice, currently supported proxies are '{}'".format(', '.join(plist)), 400
else:
extras['proxy'] = request.args.get('proxy')
dedupe = strtobool(request.args.get('dedupe', 'true'))
tags = request.args.get('tag')
tag_uuids = request.args.get('tag_uuids')
if tag_uuids:
tag_uuids = tag_uuids.split(',')
urls = request.get_data().decode('utf8').splitlines()
added = []
allow_simplehost = not strtobool(os.getenv('BLOCK_SIMPLEHOSTS', 'False'))
for url in urls:
url = url.strip()
if not len(url):
continue
# If hosts that only contain alphanumerics are allowed ("localhost" for example)
if not validators.url(url, simple_host=allow_simplehost):
return f"Invalid or unsupported URL - {url}", 400
if dedupe and self.datastore.url_exists(url):
continue
new_uuid = self.datastore.add_watch(url=url, extras=extras, tag=tags, tag_uuids=tag_uuids)
added.append(new_uuid)
return added
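
# A minimal client sketch for this endpoint - host, port and the API key are
# placeholders, and the tag/dedupe query args are optional.
import requests

urls = "https://example.com\nhttps://example.org\n"
resp = requests.post(
    "http://localhost:5000/api/v1/import",
    params={"tag": "pricing", "dedupe": "true"},
    headers={"x-api-key": "YOUR-API-KEY"},
    data=urls.encode("utf8"),
)
resp.raise_for_status()
print(resp.json())  # list of newly created watch UUIDs
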
class SystemInfo(Resource):
def __init__(self, **kwargs):
# datastore is a black box dependency

View File

@@ -152,7 +152,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
@browser_steps_blueprint.route("/browsersteps_update", methods=['POST'])
def browsersteps_ui_update():
import base64
import playwright._impl._api_types
import playwright._impl._errors
global browsersteps_sessions
from changedetectionio.blueprint.browser_steps import browser_steps

View File

@@ -110,7 +110,7 @@ class steppable_browser_interface():
self.page.click(selector=selector, timeout=30 * 1000, delay=randint(200, 500))
def action_click_element_if_exists(self, selector, value):
import playwright._impl._api_types as _api_types
import playwright._impl._errors as _api_types
print("Clicking element if exists")
if not len(selector.strip()):
return
@@ -123,6 +123,9 @@ class steppable_browser_interface():
return
def action_click_x_y(self, selector, value):
if not re.match(r'^\s?\d+\s?,\s?\d+\s?$', value):
raise Exception("'Click X,Y' step should be in the format of '100 , 90'")
x, y = value.strip().split(',')
x = int(float(x.strip()))
y = int(float(y.strip()))
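
A quick standalone sketch of how that validation pattern behaves (the sample inputs are made up):

import re

# The same "Click X,Y" pattern as above, exercised standalone.
pattern = r'^\s?\d+\s?,\s?\d+\s?$'
assert re.match(pattern, '100 , 90')   # accepted
assert re.match(pattern, '100,90')     # accepted
assert not re.match(pattern, '100')    # rejected - missing the Y coordinate
assert not re.match(pattern, 'x,y')    # rejected - not numeric
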

View File

@@ -40,8 +40,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
contents = ''
now = time.time()
try:
update_handler = text_json_diff.perform_site_check(datastore=datastore)
changed_detected, update_obj, contents = update_handler.run(uuid, preferred_proxy=preferred_proxy, skip_when_checksum_same=False)
update_handler = text_json_diff.perform_site_check(datastore=datastore, watch_uuid=uuid)
update_handler.call_browser()
# title, size is len contents not len xfer
except content_fetcher.Non200ErrorCodeReceived as e:
if e.status_code == 404:

View File

@@ -69,11 +69,12 @@ xpath://body/div/span[contains(@class, 'example-class')]",
{% endif %}
</ul>
</li>
<li>XPath - Limit text to this XPath rule, simply start with a forward-slash,
<li>XPath - Limit text to this XPath rule, simply start with a forward-slash. To specify XPath explicitly, or when the rule starts with an XPath function, prefix with <code>xpath:</code>
<ul>
<li>Example: <code>//*[contains(@class, 'sametext')]</code> or <code>xpath://*[contains(@class, 'sametext')]</code>, <a
<li>Example: <code>//*[contains(@class, 'sametext')]</code> or <code>xpath:count(//*[contains(@class, 'sametext')])</code>, <a
href="http://xpather.com/" target="new">test your XPath here</a></li>
<li>Example: Get all titles from an RSS feed <code>//title/text()</code></li>
<li>To use XPath1.0: Prefix with <code>xpath1:</code></li>
</ul>
</li>
</ul>

View File

@@ -96,6 +96,7 @@ class Fetcher():
content = None
error = None
fetcher_description = "No description"
browser_connection_url = None
headers = {}
status_code = None
webdriver_js_execute_code = None
@@ -159,9 +160,19 @@ class Fetcher():
"""
return {k.lower(): v for k, v in self.headers.items()}
def browser_steps_get_valid_steps(self):
if self.browser_steps is not None and len(self.browser_steps):
valid_steps = filter(
lambda s: (s['operation'] and len(s['operation']) and s['operation'] != 'Choose one' and s['operation'] != 'Goto site'),
self.browser_steps)
return valid_steps
return None
def iterate_browser_steps(self):
from changedetectionio.blueprint.browser_steps.browser_steps import steppable_browser_interface
from playwright._impl._api_types import TimeoutError
from playwright._impl._errors import TimeoutError
from jinja2 import Environment
jinja2_env = Environment(extensions=['jinja2_time.TimeExtension'])
@@ -170,10 +181,7 @@ class Fetcher():
if self.browser_steps is not None and len(self.browser_steps):
interface = steppable_browser_interface()
interface.page = self.page
valid_steps = filter(
lambda s: (s['operation'] and len(s['operation']) and s['operation'] != 'Choose one' and s['operation'] != 'Goto site'),
self.browser_steps)
valid_steps = self.browser_steps_get_valid_steps()
for step in valid_steps:
step_n += 1
@@ -244,14 +252,16 @@ class base_html_playwright(Fetcher):
proxy = None
def __init__(self, proxy_override=None):
def __init__(self, proxy_override=None, browser_connection_url=None):
super().__init__()
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
self.browser_type = os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').strip('"')
self.command_executor = os.getenv(
"PLAYWRIGHT_DRIVER_URL",
'ws://playwright-chrome:3000'
).strip('"')
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
if not browser_connection_url:
self.browser_connection_url = os.getenv("PLAYWRIGHT_DRIVER_URL", 'ws://playwright-chrome:3000').strip('"')
else:
self.browser_connection_url = browser_connection_url
# If any proxy settings are enabled, then we should setup the proxy object
proxy_args = {}
@@ -326,9 +336,8 @@ class base_html_playwright(Fetcher):
# Remove username/password if it exists in the URL or you will receive "ERR_NO_SUPPORTED_PROXIES" error
# Actual authentication handled by Puppeteer/node
o = urlparse(self.proxy.get('server'))
# Remove scheme, socks5:// doesnt always work and it will autodetect anyway
proxy_url = urllib.parse.quote(o._replace(netloc="{}:{}".format(o.hostname, o.port)).geturl().replace(f"{o.scheme}://", '', 1))
browserless_function_url = f"{browserless_function_url}&--proxy-server={proxy_url}&dumpio=true"
proxy_url = urllib.parse.quote(o._replace(netloc="{}:{}".format(o.hostname, o.port)).geturl())
browserless_function_url = f"{browserless_function_url}&--proxy-server={proxy_url}"
try:
amp = '&' if '?' in browserless_function_url else '?'
@@ -413,11 +422,7 @@ class base_html_playwright(Fetcher):
is_binary=False):
# For now, USE_EXPERIMENTAL_PUPPETEER_FETCH is not supported by watches with BrowserSteps (for now!)
has_browser_steps = self.browser_steps and list(filter(
lambda s: (s['operation'] and len(s['operation']) and s['operation'] != 'Choose one' and s['operation'] != 'Goto site'),
self.browser_steps))
if not has_browser_steps and os.getenv('USE_EXPERIMENTAL_PUPPETEER_FETCH'):
if not self.browser_steps and os.getenv('USE_EXPERIMENTAL_PUPPETEER_FETCH'):
if strtobool(os.getenv('USE_EXPERIMENTAL_PUPPETEER_FETCH')):
# Temporary backup solution until we rewrite the playwright code
return self.run_fetch_browserless_puppeteer(
@@ -431,7 +436,7 @@ class base_html_playwright(Fetcher):
is_binary)
from playwright.sync_api import sync_playwright
import playwright._impl._api_types
import playwright._impl._errors
self.delete_browser_steps_screenshots()
response = None
@@ -442,7 +447,7 @@ class base_html_playwright(Fetcher):
# Seemed to cause a connection Exception even tho I can see it connect
# self.browser = browser_type.connect(self.command_executor, timeout=timeout*1000)
# 60,000 connection timeout only
browser = browser_type.connect_over_cdp(self.command_executor, timeout=60000)
browser = browser_type.connect_over_cdp(self.browser_connection_url, timeout=60000)
# SOCKS5 with authentication is not supported (yet)
# https://github.com/microsoft/playwright/issues/10567
@@ -472,13 +477,19 @@ class base_html_playwright(Fetcher):
browsersteps_interface = steppable_browser_interface()
browsersteps_interface.page = self.page
try:
response = browsersteps_interface.action_goto_url(value=url)
response = browsersteps_interface.action_goto_url(value=url)
self.headers = response.all_headers()
if response is None:
context.close()
browser.close()
print("Content Fetcher > Response object was none")
raise EmptyReply(url=url, status_code=None)
try:
if self.webdriver_js_execute_code is not None and len(self.webdriver_js_execute_code):
browsersteps_interface.action_execute_js(value=self.webdriver_js_execute_code, selector=None)
except playwright._impl._api_types.TimeoutError as e:
except playwright._impl._errors.TimeoutError as e:
context.close()
browser.close()
# This can be ok, we will try to grab what we could retrieve
@@ -489,31 +500,30 @@ class base_html_playwright(Fetcher):
browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
if response is None:
context.close()
browser.close()
print("Content Fetcher > Response object was none")
raise EmptyReply(url=url, status_code=None)
extra_wait = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
self.page.wait_for_timeout(extra_wait * 1000)
# Run Browser Steps here
self.iterate_browser_steps()
extra_wait = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
self.page.wait_for_timeout(extra_wait * 1000)
self.content = self.page.content()
self.status_code = response.status
if self.status_code != 200 and not ignore_status_codes:
screenshot=self.page.screenshot(type='jpeg', full_page=True,
quality=int(os.getenv("PLAYWRIGHT_SCREENSHOT_QUALITY", 72)))
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
if len(self.page.content().strip()) == 0:
context.close()
browser.close()
print("Content Fetcher > Content was empty")
raise EmptyReply(url=url, status_code=response.status)
self.status_code = response.status
self.headers = response.all_headers()
# Run Browser Steps here
if self.browser_steps_get_valid_steps():
self.iterate_browser_steps()
self.page.wait_for_timeout(extra_wait * 1000)
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
if current_include_filters is not None:
@@ -525,6 +535,7 @@ class base_html_playwright(Fetcher):
"async () => {" + self.xpath_element_js.replace('%ELEMENTS%', visualselector_xpath_selectors) + "}")
self.instock_data = self.page.evaluate("async () => {" + self.instock_data_js + "}")
self.content = self.page.content()
# Bug 3 in Playwright screenshot handling
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
# JPEG is better here because the screenshots can be very very large
@@ -539,7 +550,7 @@ class base_html_playwright(Fetcher):
except Exception as e:
context.close()
browser.close()
raise ScreenshotUnavailable(url=url, status_code=None)
raise ScreenshotUnavailable(url=url, status_code=response.status_code)
context.close()
browser.close()
@@ -551,8 +562,6 @@ class base_html_webdriver(Fetcher):
else:
fetcher_description = "WebDriver Chrome/Javascript"
command_executor = ''
# Configs for Proxy setup
# In the ENV vars, is prefixed with "webdriver_", so it is for example "webdriver_sslProxy"
selenium_proxy_settings_mappings = ['proxyType', 'ftpProxy', 'httpProxy', 'noProxy',
@@ -560,12 +569,15 @@ class base_html_webdriver(Fetcher):
'socksProxy', 'socksVersion', 'socksUsername', 'socksPassword']
proxy = None
def __init__(self, proxy_override=None):
def __init__(self, proxy_override=None, browser_connection_url=None):
super().__init__()
from selenium.webdriver.common.proxy import Proxy as SeleniumProxy
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
self.command_executor = os.getenv("WEBDRIVER_URL", 'http://browser-chrome:4444/wd/hub').strip('"')
if not browser_connection_url:
self.browser_connection_url = os.getenv("WEBDRIVER_URL", 'http://browser-chrome:4444/wd/hub').strip('"')
else:
self.browser_connection_url = browser_connection_url
# If any proxy settings are enabled, then we should setup the proxy object
proxy_args = {}
@@ -598,14 +610,17 @@ class base_html_webdriver(Fetcher):
is_binary=False):
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.common.exceptions import WebDriverException
# request_body, request_method unused for now, until some magic in the future happens.
options = ChromeOptions()
if self.proxy:
options.proxy = self.proxy
self.driver = webdriver.Remote(
command_executor=self.command_executor,
desired_capabilities=DesiredCapabilities.CHROME,
proxy=self.proxy)
command_executor=self.browser_connection_url,
options=options)
try:
self.driver.get(url)
@@ -637,11 +652,11 @@ class base_html_webdriver(Fetcher):
# Does the connection to the webdriver work? run a test connection.
def is_ready(self):
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options as ChromeOptions
self.driver = webdriver.Remote(
command_executor=self.command_executor,
desired_capabilities=DesiredCapabilities.CHROME)
options=ChromeOptions())
# driver.quit() seems to cause better exceptions
self.quit()
@@ -659,8 +674,10 @@ class base_html_webdriver(Fetcher):
class html_requests(Fetcher):
fetcher_description = "Basic fast Plaintext/HTTP Client"
def __init__(self, proxy_override=None):
def __init__(self, proxy_override=None, browser_connection_url=None):
super().__init__()
self.proxy_override = proxy_override
# browser_connection_url is none because its always 'launched locally'
def run(self,
url,

View File

@@ -15,14 +15,20 @@ from wtforms import (
validators,
widgets
)
from flask_wtf.file import FileField, FileAllowed
from wtforms.fields import FieldList
from wtforms.validators import ValidationError
from validators.url import url as url_validator
# default
# each select <option data-enabled="enabled-0-0"
from changedetectionio.blueprint.browser_steps.browser_steps import browser_step_ui_config
from changedetectionio import content_fetcher
from changedetectionio import content_fetcher, html_tools
from changedetectionio.notification import (
valid_notification_formats,
)
@@ -40,7 +46,7 @@ valid_method = {
}
default_method = 'GET'
allow_simplehost = not strtobool(os.getenv('BLOCK_SIMPLEHOSTS', 'False'))
class StringListField(StringField):
widget = widgets.TextArea()
@@ -162,7 +168,9 @@ class ValidateContentFetcherIsReady(object):
def __call__(self, form, field):
import urllib3.exceptions
from changedetectionio import content_fetcher
return
# AttributeError: module 'changedetectionio.content_fetcher' has no attribute 'extra_browser_unlocked<>ASDF213r123r'
# Better would be a radiohandler that keeps a reference to each class
if field.data is not None and field.data != 'system':
klass = getattr(content_fetcher, field.data)
@@ -260,19 +268,23 @@ class validateURL(object):
self.message = message
def __call__(self, form, field):
import validators
# If hosts that only contain alphanumerics are allowed ("localhost" for example)
allow_simplehost = not strtobool(os.getenv('BLOCK_SIMPLEHOSTS', 'False'))
try:
validators.url(field.data.strip(), simple_host=allow_simplehost)
except validators.ValidationFailure:
message = field.gettext('\'%s\' is not a valid URL.' % (field.data.strip()))
raise ValidationError(message)
# This should raise a ValidationError() or not
validate_url(field.data)
from .model.Watch import is_safe_url
if not is_safe_url(field.data):
raise ValidationError('Watch protocol is not permitted by SAFE_PROTOCOL_REGEX')
def validate_url(test_url):
# If hosts that only contain alphanumerics are allowed ("localhost" for example)
try:
url_validator(test_url, simple_host=allow_simplehost)
except validators.ValidationError:
#@todo check for xss
message = f"'{test_url}' is not a valid URL."
# This should be wtforms.validators.
raise ValidationError(message)
from .model.Watch import is_safe_url
if not is_safe_url(test_url):
# This should be wtforms.validators.
raise ValidationError('Watch protocol is not permitted by SAFE_PROTOCOL_REGEX or incorrect URL format')
class ValidateListRegex(object):
"""
@@ -284,11 +296,10 @@ class ValidateListRegex(object):
def __call__(self, form, field):
for line in field.data:
if line[0] == '/' and line[-1] == '/':
# Because internally we dont wrap in /
line = line.strip('/')
if re.search(html_tools.PERL_STYLE_REGEX, line, re.IGNORECASE):
try:
re.compile(line)
regex = html_tools.perl_style_slash_enclosed_regex_to_options(line)
re.compile(regex)
except re.error:
message = field.gettext('RegEx \'%s\' is not a valid regular expression.')
raise ValidationError(message % (line))
@@ -317,11 +328,30 @@ class ValidateCSSJSONXPATHInput(object):
return
# Does it look like XPath?
if line.strip()[0] == '/':
if line.strip()[0] == '/' or line.strip().startswith('xpath:'):
if not self.allow_xpath:
raise ValidationError("XPath not permitted in this field!")
from lxml import etree, html
import elementpath
# xpath 2.0-3.1
from elementpath.xpath3 import XPath3Parser
tree = html.fromstring("<html></html>")
line = line.replace('xpath:', '')
try:
elementpath.select(tree, line.strip(), parser=XPath3Parser)
except elementpath.ElementPathError as e:
message = field.gettext('\'%s\' is not a valid XPath expression. (%s)')
raise ValidationError(message % (line, str(e)))
except:
raise ValidationError("A system-error occurred when validating your XPath expression")
if line.strip().startswith('xpath1:'):
if not self.allow_xpath:
raise ValidationError("XPath not permitted in this field!")
from lxml import etree, html
tree = html.fromstring("<html></html>")
line = re.sub(r'^xpath1:', '', line)
try:
tree.xpath(line.strip())
@@ -398,6 +428,9 @@ class importForm(Form):
from . import processors
processor = RadioField(u'Processor', choices=processors.available_processors(), default="text_json_diff")
urls = TextAreaField('URLs')
xlsx_file = FileField('Upload .xlsx file', validators=[FileAllowed(['xlsx'], 'Must be .xlsx file!')])
file_mapping = SelectField('File mapping', [validators.DataRequired()], choices={('wachete', 'Wachete mapping'), ('custom','Custom mapping')})
class SingleBrowserStep(Form):
@@ -484,6 +517,12 @@ class SingleExtraProxy(Form):
proxy_url = StringField('Proxy URL', [validators.Optional()], render_kw={"placeholder": "socks5:// or regular proxy http://user:pass@...:3128", "size":50})
# @todo do the validation here instead
class SingleExtraBrowser(Form):
browser_name = StringField('Name', [validators.Optional()], render_kw={"placeholder": "Name"})
browser_connection_url = StringField('Browser connection URL', [validators.Optional()], render_kw={"placeholder": "wss://brightdata... wss://oxylabs etc", "size":50})
# @todo do the validation here instead
# datastore.data['settings']['requests']..
class globalSettingsRequestForm(Form):
time_between_check = FormField(TimeBetweenCheckForm)
@@ -492,6 +531,7 @@ class globalSettingsRequestForm(Form):
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0, message="Should contain zero or more seconds")])
extra_proxies = FieldList(FormField(SingleExtraProxy), min_entries=5)
extra_browsers = FieldList(FormField(SingleExtraBrowser), min_entries=5)
def validate_extra_proxies(self, extra_validators=None):
for e in self.data['extra_proxies']:

View File

@@ -69,10 +69,89 @@ def element_removal(selectors: List[str], html_content):
selector = ",".join(selectors)
return subtractive_css_selector(selector, html_content)
def elementpath_tostring(obj):
"""
change elementpath.select results to string type
# The MIT License (MIT), Copyright (c), 2018-2021, SISSA (Scuola Internazionale Superiore di Studi Avanzati)
# https://github.com/sissaschool/elementpath/blob/dfcc2fd3d6011b16e02bf30459a7924f547b47d0/elementpath/xpath_tokens.py#L1038
"""
import elementpath
from decimal import Decimal
import math
if obj is None:
return ''
# https://elementpath.readthedocs.io/en/latest/xpath_api.html#elementpath.select
elif isinstance(obj, elementpath.XPathNode):
return obj.string_value
elif isinstance(obj, bool):
return 'true' if obj else 'false'
elif isinstance(obj, Decimal):
value = format(obj, 'f')
if '.' in value:
return value.rstrip('0').rstrip('.')
return value
elif isinstance(obj, float):
if math.isnan(obj):
return 'NaN'
elif math.isinf(obj):
return str(obj).upper()
value = str(obj)
if '.' in value:
value = value.rstrip('0').rstrip('.')
if '+' in value:
value = value.replace('+', '')
if 'e' in value:
return value.upper()
return value
return str(obj)
# Return str Utf-8 of matched rules
def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False, is_rss=False):
from lxml import etree, html
import elementpath
# xpath 2.0-3.1
from elementpath.xpath3 import XPath3Parser
parser = etree.HTMLParser()
if is_rss:
# So that we can keep CDATA for cdata_in_document_to_text() to process
parser = etree.XMLParser(strip_cdata=False)
tree = html.fromstring(bytes(html_content, encoding='utf-8'), parser=parser)
html_block = ""
r = elementpath.select(tree, xpath_filter.strip(), namespaces={'re': 'http://exslt.org/regular-expressions'}, parser=XPath3Parser)
#@note: //title/text() wont work where <title>CDATA..
if type(r) != list:
r = [r]
for element in r:
# When there's more than 1 match, then add the suffix to separate each line
# And where the matched result doesn't include something that will cause Inscriptis to add a newline
# (This way each 'match' reliably has a new-line in the diff)
# Divs are converted to 4 whitespaces by inscriptis
if append_pretty_line_formatting and len(html_block) and (not hasattr( element, 'tag' ) or not element.tag in (['br', 'hr', 'div', 'p'])):
html_block += TEXT_FILTER_LIST_LINE_SUFFIX
if type(element) == str:
html_block += element
elif issubclass(type(element), etree._Element) or issubclass(type(element), etree._ElementTree):
html_block += etree.tostring(element, pretty_print=True).decode('utf-8')
else:
html_block += elementpath_tostring(element)
return html_block
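
# A hedged usage sketch of the XPath 3.1 support above, calling elementpath
# the same way xpath_filter() does - the sample HTML document is made up.
import elementpath
from elementpath.xpath3 import XPath3Parser
from lxml import html as lxml_html

sample = lxml_html.fromstring("<html><body><p class='x'>a</p><p class='x'>b</p></body></html>")

# A node-set query, exactly as in XPath 1.0:
nodes = elementpath.select(sample, "//p[@class='x']", parser=XPath3Parser)

# An XPath function returning a number - xpath_filter() routes non-node
# results through elementpath_tostring(), so the float 2.0 renders as '2':
count = elementpath.select(sample, "count(//p[@class='x'])", parser=XPath3Parser)
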
# Return str Utf-8 of matched rules
# 'xpath1:'
def xpath1_filter(xpath_filter, html_content, append_pretty_line_formatting=False, is_rss=False):
from lxml import etree, html
parser = None
if is_rss:

View File

@@ -1,6 +1,9 @@
from abc import ABC, abstractmethod
import time
import validators
from wtforms import ValidationError
from changedetectionio.forms import validate_url
class Importer():
@@ -12,6 +15,7 @@ class Importer():
self.new_uuids = []
self.good = 0
self.remaining_data = []
self.import_profile = None
@abstractmethod
def run(self,
@@ -132,3 +136,167 @@ class import_distill_io_json(Importer):
good += 1
flash("{} Imported from Distill.io in {:.2f}s, {} Skipped.".format(len(self.new_uuids), time.time() - now, len(self.remaining_data)))
class import_xlsx_wachete(Importer):
def run(self,
data,
flash,
datastore,
):
good = 0
now = time.time()
self.new_uuids = []
from openpyxl import load_workbook
try:
wb = load_workbook(data)
except Exception as e:
# @todo correct except
flash("Unable to read export XLSX file, something wrong with the file?", 'error')
return
row_id = 2
for row in wb.active.iter_rows(min_row=row_id):
try:
extras = {}
data = {}
for cell in row:
if not cell.value:
continue
column_title = wb.active.cell(row=1, column=cell.column).value.strip().lower()
data[column_title] = cell.value
# Forced switch to webdriver/playwright/etc
dynamic_wachet = str(data.get('dynamic wachet', '')).strip().lower() # Convert bool to str to cover all cases
# libreoffice and others can have it as =FALSE() =TRUE(), or bool(true)
if 'true' in dynamic_wachet or dynamic_wachet == '1':
extras['fetch_backend'] = 'html_webdriver'
elif 'false' in dynamic_wachet or dynamic_wachet == '0':
extras['fetch_backend'] = 'html_requests'
if data.get('xpath'):
# @todo split by || ?
extras['include_filters'] = [data.get('xpath')]
if data.get('name'):
extras['title'] = data.get('name').strip()
if data.get('interval (min)'):
minutes = int(data.get('interval (min)'))
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)
weeks, days = divmod(days, 7)
extras['time_between_check'] = {'weeks': weeks, 'days': days, 'hours': hours, 'minutes': minutes, 'seconds': 0}
# At minimum a URL is required.
if data.get('url'):
try:
validate_url(data.get('url'))
except ValidationError as e:
print(">> import URL error", data.get('url'), str(e))
flash(f"Error processing row number {row_id}, URL value was incorrect, row was skipped.", 'error')
# Don't bother processing anything else on this row
continue
new_uuid = datastore.add_watch(url=data['url'].strip(),
extras=extras,
tag=data.get('folder'),
write_to_disk_now=False)
if new_uuid:
# Straight into the queue.
self.new_uuids.append(new_uuid)
good += 1
except Exception as e:
print(e)
flash(f"Error processing row number {row_id}, check all cell data types are correct, row was skipped.", 'error')
else:
row_id += 1
flash(
"{} imported from Wachete .xlsx in {:.2f}s".format(len(self.new_uuids), time.time() - now))
class import_xlsx_custom(Importer):
def run(self,
data,
flash,
datastore,
):
good = 0
now = time.time()
self.new_uuids = []
from openpyxl import load_workbook
try:
wb = load_workbook(data)
except Exception as e:
# @todo correct except
flash("Unable to read export XLSX file, something wrong with the file?", 'error')
return
# @todo check at least 2 rows, same in other method
from .forms import validate_url
row_i = 1
try:
for row in wb.active.iter_rows():
url = None
tags = None
extras = {}
for cell in row:
if not self.import_profile.get(cell.col_idx):
continue
if not cell.value:
continue
cell_map = self.import_profile.get(cell.col_idx)
cell_val = str(cell.value).strip() # could be bool
if cell_map == 'url':
url = cell.value.strip()
try:
validate_url(url)
except ValidationError as e:
print(">> Import URL error", url, str(e))
flash(f"Error processing row number {row_i}, URL value was incorrect, row was skipped.", 'error')
# Don't bother processing anything else on this row
url = None
break
elif cell_map == 'tag':
tags = cell.value.strip()
elif cell_map == 'include_filters':
# @todo validate?
extras['include_filters'] = [cell.value.strip()]
elif cell_map == 'interval_minutes':
hours, minutes = divmod(int(cell_val), 60)
days, hours = divmod(hours, 24)
weeks, days = divmod(days, 7)
extras['time_between_check'] = {'weeks': weeks, 'days': days, 'hours': hours, 'minutes': minutes, 'seconds': 0}
else:
extras[cell_map] = cell_val
# At minimum a URL is required.
if url:
new_uuid = datastore.add_watch(url=url,
extras=extras,
tag=tags,
write_to_disk_now=False)
if new_uuid:
# Straight into the queue.
self.new_uuids.append(new_uuid)
good += 1
except Exception as e:
print(e)
flash(f"Error processing row number {row_i}, check all cell data types are correct, row was skipped.", 'error')
else:
row_i += 1
flash(
"{} imported from custom .xlsx in {:.2f}s".format(len(self.new_uuids), time.time() - now))

View File

@@ -16,6 +16,7 @@ class model(dict):
},
'requests': {
'extra_proxies': [], # Configurable extra proxies via the UI
'extra_browsers': [], # Configurable extra browsers via the UI
'jitter_seconds': 0,
'proxy': None, # Preferred proxy connection
'time_between_check': {'weeks': None, 'days': None, 'hours': 3, 'minutes': None, 'seconds': None},

View File

@@ -19,6 +19,7 @@ from changedetectionio.notification import (
base_config = {
'body': None,
'browser_steps': [],
'browser_steps_last_error_step': None,
'check_unique_lines': False, # On change-detected, compare against all history if its something new
'check_count': 0,
@@ -112,7 +113,8 @@ class model(dict):
@property
def viewed(self):
if int(self['last_viewed']) >= int(self.newest_history_key) :
# Don't return viewed when last_viewed is 0 and newest_key is 0
if int(self['last_viewed']) and int(self['last_viewed']) >= int(self.newest_history_key) :
return True
return False
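
# Effect of the corrected check (illustrative values):
#   last_viewed=0, newest_history_key=0 -> False (previously True: a brand-new
#                                                 watch looked already viewed)
#   last_viewed=5, newest_history_key=9 -> False (an unseen newer snapshot)
#   last_viewed=9, newest_history_key=9 -> True
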
@@ -145,8 +147,14 @@ class model(dict):
flash(message, 'error')
return ''
if ready_url.startswith('source:'):
ready_url=ready_url.replace('source:', '')
return ready_url
@property
def is_source_type_url(self):
return self.get('url', '').startswith('source:')
@property
def get_fetch_backend(self):
"""
@@ -234,6 +242,14 @@ class model(dict):
fname = os.path.join(self.watch_data_dir, "history.txt")
return os.path.isfile(fname)
@property
def has_browser_steps(self):
has_browser_steps = self.get('browser_steps') and list(filter(
lambda s: (s['operation'] and len(s['operation']) and s['operation'] != 'Choose one' and s['operation'] != 'Goto site'),
self.get('browser_steps')))
return has_browser_steps
# Returns the newest key, but if theres only 1 record, then it's counted as not being new, so return 0.
@property
def newest_history_key(self):
@@ -247,6 +263,38 @@ class model(dict):
bump = self.history
return self.__newest_history_key
# Given an arbitrary timestamp, find the closest next key
# For example, last_viewed = 1000 so it should return the next 1001 timestamp
#
# used for the [diff] button so it can preset a smarter from_version
@property
def get_next_snapshot_key_to_last_viewed(self):
"""Unfortunately for now timestamp is stored as string key"""
keys = list(self.history.keys())
if not keys:
return None
last_viewed = int(self.get('last_viewed'))
prev_k = keys[0]
sorted_keys = sorted(keys, key=lambda x: int(x))
sorted_keys.reverse()
# When the 'last viewed' timestamp is greater than the newest snapshot, return second last
if last_viewed > int(sorted_keys[0]):
return sorted_keys[1]
for k in sorted_keys:
if int(k) < last_viewed:
if prev_k == sorted_keys[0]:
# Return the second-last one so we don't recommend comparing the same version against itself
return sorted_keys[1]
return prev_k
prev_k = k
return keys[0]
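
# Worked example (history keys are illustrative epoch-second strings):
#   keys = ['1000', '2000', '3000']
#   last_viewed = 1500 -> returns '2000' (first snapshot after the last viewing)
#   last_viewed = 3500 -> returns '2000' (newer than the newest snapshot, so
#                         return the second-newest rather than diffing '3000'
#                         against itself)
#   last_viewed =  500 -> returns '1000' (nothing is older, falls through to keys[0])
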
def get_history_snapshot(self, timestamp):
import brotli
filepath = self.history[timestamp]

View File

@@ -46,6 +46,9 @@ from apprise.decorators import notify
@notify(on="puts")
def apprise_custom_api_call_wrapper(body, title, notify_type, *args, **kwargs):
import requests
from apprise.utils import parse_url as apprise_parse_url
from apprise.URLBase import URLBase
url = kwargs['meta'].get('url')
if url.startswith('post'):
@@ -68,16 +71,46 @@ def apprise_custom_api_call_wrapper(body, title, notify_type, *args, **kwargs):
url = url.replace('delete://', 'http://')
url = url.replace('deletes://', 'https://')
# Try to auto-guess if it's JSON
headers = {}
params = {}
auth = None
# Convert /foobar?+some-header=hello to proper header dictionary
results = apprise_parse_url(url)
if results:
# Add our headers (which the user can potentially over-ride if they wish)
# to our returned result set and tidy entries by unquoting them
headers = {URLBase.unquote(x): URLBase.unquote(y)
for x, y in results['qsd+'].items()}
# https://github.com/caronc/apprise/wiki/Notify_Custom_JSON#get-parameter-manipulation
# In Apprise, it relies on prefixing each request arg with "-", because it uses say &method=update as a flag for apprise
# but here we are making straight requests, so we need to convert this against apprise's logic
for k, v in results['qsd'].items():
if not k.strip('+-') in results['qsd+'].keys():
params[URLBase.unquote(k)] = URLBase.unquote(v)
# Determine Authentication
auth = ''
if results.get('user') and results.get('password'):
auth = (URLBase.unquote(results.get('user')), URLBase.unquote(results.get('password')))
elif results.get('user'):
auth = (URLBase.unquote(results.get('user')))
# Try to auto-guess if it's JSON
try:
json.loads(body)
headers = {'Content-Type': 'application/json; charset=utf-8'}
headers['Content-Type'] = 'application/json; charset=utf-8'
except ValueError as e:
pass
r(url, headers=headers, data=body)
r(results.get('url'),
headers=headers,
data=body,
params=params,
auth=auth
)
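
# A hedged example of the URL convention handled above - host, path, header
# and credential values are placeholders:
#
#   posts://user:secret@api.example.com/notify?+x-api-key=abc123&mode=compact
#
#   'posts://'          -> request made over https:// ('post://' -> http://)
#   user:secret         -> HTTP Basic auth tuple
#   '+x-api-key=abc123' -> sent as request header 'x-api-key: abc123'
#   'mode=compact'      -> kept as an ordinary query parameter
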
def process_notification(n_object, datastore):

View File

@@ -1,15 +1,122 @@
from abc import abstractmethod
import os
import hashlib
import re
from changedetectionio import content_fetcher
from copy import deepcopy
from distutils.util import strtobool
class difference_detection_processor():
browser_steps = None
datastore = None
fetcher = None
screenshot = None
watch = None
xpath_data = None
def __init__(self, *args, **kwargs):
def __init__(self, *args, datastore, watch_uuid, **kwargs):
super().__init__(*args, **kwargs)
self.datastore = datastore
self.watch = deepcopy(self.datastore.data['watching'].get(watch_uuid))
def call_browser(self):
# Protect against file:// access
if re.search(r'^file://', self.watch.get('url', '').strip(), re.IGNORECASE):
if not strtobool(os.getenv('ALLOW_FILE_URI', 'false')):
raise Exception(
"file:// type access is denied for security reasons."
)
url = self.watch.link
# Requests, playwright, other browser via wss:// etc, fetch_extra_something
prefer_fetch_backend = self.watch.get('fetch_backend', 'system')
# Proxy ID "key"
preferred_proxy_id = self.datastore.get_preferred_proxy_for_watch(uuid=self.watch.get('uuid'))
# Pluggable content self.fetcher
if not prefer_fetch_backend or prefer_fetch_backend == 'system':
prefer_fetch_backend = self.datastore.data['settings']['application'].get('fetch_backend')
# In the case that the preferred fetcher was a browser config with custom connection URL..
# @todo - on save watch, if its extra_browser_ then it should be obvious it will use playwright (like if its requests now..)
browser_connection_url = None
if prefer_fetch_backend.startswith('extra_browser_'):
(t, key) = prefer_fetch_backend.split('extra_browser_')
connection = list(
filter(lambda s: (s['browser_name'] == key), self.datastore.data['settings']['requests'].get('extra_browsers', [])))
if connection:
prefer_fetch_backend = 'base_html_playwright'
browser_connection_url = connection[0].get('browser_connection_url')
# Grab the right kind of 'fetcher', (playwright, requests, etc)
if hasattr(content_fetcher, prefer_fetch_backend):
fetcher_obj = getattr(content_fetcher, prefer_fetch_backend)
else:
# If the klass doesnt exist, just use a default
fetcher_obj = getattr(content_fetcher, "html_requests")
proxy_url = None
if preferred_proxy_id:
proxy_url = self.datastore.proxy_list.get(preferred_proxy_id).get('url')
print(f"Using proxy Key: {preferred_proxy_id} as Proxy URL {proxy_url}")
# Now call the fetcher (playwright/requests/etc) with arguments that only a fetcher would need.
# When browser_connection_url is None, the method should default to working out the best connection (os env vars etc)
self.fetcher = fetcher_obj(proxy_override=proxy_url,
browser_connection_url=browser_connection_url
)
if self.watch.has_browser_steps:
self.fetcher.browser_steps = self.watch.get('browser_steps', [])
self.fetcher.browser_steps_screenshot_path = os.path.join(self.datastore.datastore_path, self.watch.get('uuid'))
# Tweak the base config with the per-watch ones
request_headers = self.watch.get('headers', [])
request_headers.update(self.datastore.get_all_base_headers())
request_headers.update(self.datastore.get_all_headers_in_textfile_for_watch(uuid=self.watch.get('uuid')))
# https://github.com/psf/requests/issues/4525
# Requests doesnt yet support brotli encoding, so don't put 'br' here, be totally sure that the user cannot
# do this by accident.
if 'Accept-Encoding' in request_headers and "br" in request_headers['Accept-Encoding']:
request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '')
timeout = self.datastore.data['settings']['requests'].get('timeout')
request_body = self.watch.get('body')
request_method = self.watch.get('method')
ignore_status_codes = self.watch.get('ignore_status_codes', False)
# Configurable per-watch or global extra delay before extracting text (for webDriver types)
system_webdriver_delay = self.datastore.data['settings']['application'].get('webdriver_delay', None)
if self.watch.get('webdriver_delay'):
self.fetcher.render_extract_delay = self.watch.get('webdriver_delay')
elif system_webdriver_delay is not None:
self.fetcher.render_extract_delay = system_webdriver_delay
if self.watch.get('webdriver_js_execute_code') is not None and self.watch.get('webdriver_js_execute_code').strip():
self.fetcher.webdriver_js_execute_code = self.watch.get('webdriver_js_execute_code')
# Requests for PDFs, images etc should be passed the is_binary flag
is_binary = self.watch.is_pdf
# And here we go! call the right browser with browser-specific settings
self.fetcher.run(url, timeout, request_headers, request_body, request_method, ignore_status_codes, self.watch.get('include_filters'),
is_binary=is_binary)
#@todo .quit here could go on close object, so we can run JS if change-detected
self.fetcher.quit()
# After init, call run_changedetection() which will do the actual change-detection
@abstractmethod
def run(self, uuid, skip_when_checksum_same=True, preferred_proxy=None):
def run_changedetection(self, uuid, skip_when_checksum_same=True):
update_obj = {'last_notification_error': False, 'last_error': False}
some_data = 'xxxxx'
update_obj["previous_md5"] = hashlib.md5(some_data.encode('utf-8')).hexdigest()

View File

@@ -1,10 +1,7 @@
import hashlib
import os
import re
import urllib3
from . import difference_detection_processor
from changedetectionio import content_fetcher
from copy import deepcopy
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
@@ -22,11 +19,7 @@ class perform_site_check(difference_detection_processor):
screenshot = None
xpath_data = None
def __init__(self, *args, datastore, **kwargs):
super().__init__(*args, **kwargs)
self.datastore = datastore
def run(self, uuid, skip_when_checksum_same=True):
def run_changedetection(self, uuid, skip_when_checksum_same=True):
# DeepCopy so we can be sure we don't accidentally change anything by reference
watch = deepcopy(self.datastore.data['watching'].get(uuid))
@@ -34,84 +27,24 @@ class perform_site_check(difference_detection_processor):
if not watch:
raise Exception("Watch no longer exists.")
# Protect against file:// access
if re.search(r'^file', watch.get('url', ''), re.IGNORECASE) and not os.getenv('ALLOW_FILE_URI', False):
raise Exception(
"file:// type access is denied for security reasons."
)
# Unset any existing notification error
update_obj = {'last_notification_error': False, 'last_error': False}
request_headers = watch.get('headers', [])
request_headers.update(self.datastore.get_all_base_headers())
request_headers.update(self.datastore.get_all_headers_in_textfile_for_watch(uuid=uuid))
# https://github.com/psf/requests/issues/4525
# Requests doesn't yet support brotli encoding, so don't put 'br' here, be totally sure that the user cannot
# do this by accident.
if 'Accept-Encoding' in request_headers and "br" in request_headers['Accept-Encoding']:
request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '')
timeout = self.datastore.data['settings']['requests'].get('timeout')
url = watch.link
request_body = self.datastore.data['watching'][uuid].get('body')
request_method = self.datastore.data['watching'][uuid].get('method')
ignore_status_codes = self.datastore.data['watching'][uuid].get('ignore_status_codes', False)
# Pluggable content fetcher
prefer_backend = watch.get_fetch_backend
if not prefer_backend or prefer_backend == 'system':
prefer_backend = self.datastore.data['settings']['application']['fetch_backend']
if hasattr(content_fetcher, prefer_backend):
klass = getattr(content_fetcher, prefer_backend)
else:
# If the klass doesn't exist, just use a default
klass = getattr(content_fetcher, "html_requests")
proxy_id = self.datastore.get_preferred_proxy_for_watch(uuid=uuid)
proxy_url = None
if proxy_id:
proxy_url = self.datastore.proxy_list.get(proxy_id).get('url')
print("UUID {} Using proxy {}".format(uuid, proxy_url))
fetcher = klass(proxy_override=proxy_url)
# Configurable per-watch or global extra delay before extracting text (for webDriver types)
system_webdriver_delay = self.datastore.data['settings']['application'].get('webdriver_delay', None)
if watch['webdriver_delay'] is not None:
fetcher.render_extract_delay = watch.get('webdriver_delay')
elif system_webdriver_delay is not None:
fetcher.render_extract_delay = system_webdriver_delay
# Could be removed if requests/plaintext could also return some info?
if prefer_backend != 'html_webdriver':
raise Exception("Re-stock detection requires Chrome or compatible webdriver/playwright fetcher to work")
if watch.get('webdriver_js_execute_code') is not None and watch.get('webdriver_js_execute_code').strip():
fetcher.webdriver_js_execute_code = watch.get('webdriver_js_execute_code')
fetcher.run(url, timeout, request_headers, request_body, request_method, ignore_status_codes, watch.get('include_filters'))
fetcher.quit()
self.screenshot = fetcher.screenshot
self.xpath_data = fetcher.xpath_data
self.screenshot = self.fetcher.screenshot
self.xpath_data = self.fetcher.xpath_data
# Track the content type
update_obj['content_type'] = fetcher.headers.get('Content-Type', '')
update_obj["last_check_status"] = fetcher.get_last_status_code()
update_obj['content_type'] = self.fetcher.headers.get('Content-Type', '')
update_obj["last_check_status"] = self.fetcher.get_last_status_code()
# Main detection method
fetched_md5 = None
if fetcher.instock_data:
fetched_md5 = hashlib.md5(fetcher.instock_data.encode('utf-8')).hexdigest()
if self.fetcher.instock_data:
fetched_md5 = hashlib.md5(self.fetcher.instock_data.encode('utf-8')).hexdigest()
# 'Possibly in stock' comes from stock-not-in-stock.js when no string found above the fold.
update_obj["in_stock"] = True if fetcher.instock_data == 'Possibly in stock' else False
update_obj["in_stock"] = True if self.fetcher.instock_data == 'Possibly in stock' else False
else:
raise UnableToExtractRestockData(status_code=fetcher.status_code)
raise UnableToExtractRestockData(status_code=self.fetcher.status_code)
# The main thing that all this at the moment comes down to :)
changed_detected = False
@@ -128,4 +61,4 @@ class perform_site_check(difference_detection_processor):
# Always record the new checksum
update_obj["previous_md5"] = fetched_md5
return changed_detected, update_obj, fetcher.instock_data.encode('utf-8')
return changed_detected, update_obj, self.fetcher.instock_data.encode('utf-8')
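With the fetch handled by the base class, the restock processor reduces to hashing the scraped in-stock text and comparing it against the previous run. A simplified restatement of that comparison (illustrative only, not the actual class):

import hashlib

def restock_changed(instock_data, previous_md5):
    # Illustrative only: md5 the scraped in-stock text and compare to the last run.
    fetched_md5 = hashlib.md5(instock_data.encode('utf-8')).hexdigest()
    update_obj = {
        # 'Possibly in stock' is what stock-not-in-stock.js reports when no
        # out-of-stock phrase was found above the fold.
        'in_stock': instock_data == 'Possibly in stock',
        'previous_md5': fetched_md5,
    }
    changed_detected = bool(previous_md5) and fetched_md5 != previous_md5
    return changed_detected, update_obj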

View File

@@ -1,4 +1,4 @@
# HTML to TEXT/JSON DIFFERENCE FETCHER
# HTML to TEXT/JSON DIFFERENCE self.fetcher
import hashlib
import json
@@ -32,15 +32,10 @@ class PDFToHTMLToolNotFound(ValueError):
# Some common stuff here that can be moved to a base class
# (set_proxy_from_list)
class perform_site_check(difference_detection_processor):
screenshot = None
xpath_data = None
def __init__(self, *args, datastore, **kwargs):
super().__init__(*args, **kwargs)
self.datastore = datastore
def run(self, uuid, skip_when_checksum_same=True, preferred_proxy=None):
def run_changedetection(self, uuid, skip_when_checksum_same=True):
changed_detected = False
html_content = ""
screenshot = False # as bytes
stripped_text_from_html = ""
@@ -49,100 +44,25 @@ class perform_site_check(difference_detection_processor):
if not watch:
raise Exception("Watch no longer exists.")
# Protect against file:// access
if re.search(r'^file', watch.get('url', ''), re.IGNORECASE) and not os.getenv('ALLOW_FILE_URI', False):
raise Exception(
"file:// type access is denied for security reasons."
)
# Unset any existing notification error
update_obj = {'last_notification_error': False, 'last_error': False}
# Tweak the base config with the per-watch ones
request_headers = watch.get('headers', [])
request_headers.update(self.datastore.get_all_base_headers())
request_headers.update(self.datastore.get_all_headers_in_textfile_for_watch(uuid=uuid))
# https://github.com/psf/requests/issues/4525
# Requests doesn't yet support brotli encoding, so don't put 'br' here, be totally sure that the user cannot
# do this by accident.
if 'Accept-Encoding' in request_headers and "br" in request_headers['Accept-Encoding']:
request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '')
timeout = self.datastore.data['settings']['requests'].get('timeout')
url = watch.link
request_body = self.datastore.data['watching'][uuid].get('body')
request_method = self.datastore.data['watching'][uuid].get('method')
ignore_status_codes = self.datastore.data['watching'][uuid].get('ignore_status_codes', False)
# source: support
is_source = False
if url.startswith('source:'):
url = url.replace('source:', '')
is_source = True
# Pluggable content fetcher
prefer_backend = watch.get_fetch_backend
if not prefer_backend or prefer_backend == 'system':
prefer_backend = self.datastore.data['settings']['application']['fetch_backend']
if hasattr(content_fetcher, prefer_backend):
klass = getattr(content_fetcher, prefer_backend)
else:
# If the klass doesn't exist, just use a default
klass = getattr(content_fetcher, "html_requests")
if preferred_proxy:
proxy_id = preferred_proxy
else:
proxy_id = self.datastore.get_preferred_proxy_for_watch(uuid=uuid)
proxy_url = None
if proxy_id:
proxy_url = self.datastore.proxy_list.get(proxy_id).get('url')
print("UUID {} Using proxy {}".format(uuid, proxy_url))
fetcher = klass(proxy_override=proxy_url)
# Configurable per-watch or global extra delay before extracting text (for webDriver types)
system_webdriver_delay = self.datastore.data['settings']['application'].get('webdriver_delay', None)
if watch['webdriver_delay'] is not None:
fetcher.render_extract_delay = watch.get('webdriver_delay')
elif system_webdriver_delay is not None:
fetcher.render_extract_delay = system_webdriver_delay
# Possible conflict
if prefer_backend == 'html_webdriver':
fetcher.browser_steps = watch.get('browser_steps', None)
fetcher.browser_steps_screenshot_path = os.path.join(self.datastore.datastore_path, uuid)
if watch.get('webdriver_js_execute_code') is not None and watch.get('webdriver_js_execute_code').strip():
fetcher.webdriver_js_execute_code = watch.get('webdriver_js_execute_code')
# requests for PDFs, images etc should be passed the is_binary flag
is_binary = watch.is_pdf
fetcher.run(url, timeout, request_headers, request_body, request_method, ignore_status_codes, watch.get('include_filters'),
is_binary=is_binary)
fetcher.quit()
self.screenshot = fetcher.screenshot
self.xpath_data = fetcher.xpath_data
self.screenshot = self.fetcher.screenshot
self.xpath_data = self.fetcher.xpath_data
# Track the content type
update_obj['content_type'] = fetcher.get_all_headers().get('content-type', '').lower()
update_obj['content_type'] = self.fetcher.get_all_headers().get('content-type', '').lower()
# Watches added automatically in the queue manager will skip if its the same checksum as the previous run
# Saves a lot of CPU
update_obj['previous_md5_before_filters'] = hashlib.md5(fetcher.content.encode('utf-8')).hexdigest()
update_obj['previous_md5_before_filters'] = hashlib.md5(self.fetcher.content.encode('utf-8')).hexdigest()
if skip_when_checksum_same:
if update_obj['previous_md5_before_filters'] == watch.get('previous_md5_before_filters'):
raise content_fetcher.checksumFromPreviousCheckWasTheSame()
# Fetching complete, now filters
# @todo move to class / maybe inside of fetcher abstract base?
# @note: I feel like the following should be in a more obvious chain system
# - Check filter text
@@ -151,24 +71,24 @@ class perform_site_check(difference_detection_processor):
# https://stackoverflow.com/questions/41817578/basic-method-chaining ?
# return content().textfilter().jsonextract().checksumcompare() ?
is_json = 'application/json' in fetcher.get_all_headers().get('content-type', '').lower()
is_json = 'application/json' in self.fetcher.get_all_headers().get('content-type', '').lower()
is_html = not is_json
is_rss = False
ctype_header = fetcher.get_all_headers().get('content-type', '').lower()
ctype_header = self.fetcher.get_all_headers().get('content-type', '').lower()
# Go into RSS preprocess for converting CDATA/comment to usable text
if any(substring in ctype_header for substring in ['application/xml', 'application/rss', 'text/xml']):
if '<rss' in fetcher.content[:100].lower():
fetcher.content = cdata_in_document_to_text(html_content=fetcher.content)
if '<rss' in self.fetcher.content[:100].lower():
self.fetcher.content = cdata_in_document_to_text(html_content=self.fetcher.content)
is_rss = True
# source: support, basically treat it as plaintext
if is_source:
if watch.is_source_type_url:
is_html = False
is_json = False
inline_pdf = fetcher.get_all_headers().get('content-disposition', '') and '%PDF-1' in fetcher.content[:10]
if watch.is_pdf or 'application/pdf' in fetcher.get_all_headers().get('content-type', '').lower() or inline_pdf:
inline_pdf = self.fetcher.get_all_headers().get('content-disposition', '') and '%PDF-1' in self.fetcher.content[:10]
if watch.is_pdf or 'application/pdf' in self.fetcher.get_all_headers().get('content-type', '').lower() or inline_pdf:
from shutil import which
tool = os.getenv("PDF_TO_HTML_TOOL", "pdftohtml")
if not which(tool):
@@ -179,18 +99,18 @@ class perform_site_check(difference_detection_processor):
[tool, '-stdout', '-', '-s', 'out.pdf', '-i'],
stdout=subprocess.PIPE,
stdin=subprocess.PIPE)
proc.stdin.write(fetcher.raw_content)
proc.stdin.write(self.fetcher.raw_content)
proc.stdin.close()
fetcher.content = proc.stdout.read().decode('utf-8')
self.fetcher.content = proc.stdout.read().decode('utf-8')
proc.wait(timeout=60)
# Add a little metadata so we know if the file changes (like if an image changes, but the text is the same
# @todo may cause problems with non-UTF8?
metadata = "<p>Added by changedetection.io: Document checksum - {} Filesize - {} bytes</p>".format(
hashlib.md5(fetcher.raw_content).hexdigest().upper(),
len(fetcher.content))
hashlib.md5(self.fetcher.raw_content).hexdigest().upper(),
len(self.fetcher.content))
fetcher.content = fetcher.content.replace('</body>', metadata + '</body>')
self.fetcher.content = self.fetcher.content.replace('</body>', metadata + '</body>')
# Better would be if Watch.model could access the global data also
# and then use getattr https://docs.python.org/3/reference/datamodel.html#object.__getitem__
@@ -217,7 +137,7 @@ class perform_site_check(difference_detection_processor):
if is_json:
# Sort the JSON so we don't get false alerts when the content is just re-ordered
try:
fetcher.content = json.dumps(json.loads(fetcher.content), sort_keys=True)
self.fetcher.content = json.dumps(json.loads(self.fetcher.content), sort_keys=True)
except Exception as e:
# Might have just been a snippet, or otherwise bad JSON, continue
pass
@@ -225,22 +145,22 @@ class perform_site_check(difference_detection_processor):
if has_filter_rule:
for filter in include_filters_rule:
if any(prefix in filter for prefix in json_filter_prefixes):
stripped_text_from_html += html_tools.extract_json_as_string(content=fetcher.content, json_filter=filter)
stripped_text_from_html += html_tools.extract_json_as_string(content=self.fetcher.content, json_filter=filter)
is_html = False
if is_html or is_source:
if is_html or watch.is_source_type_url:
# CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text
fetcher.content = html_tools.workarounds_for_obfuscations(fetcher.content)
html_content = fetcher.content
self.fetcher.content = html_tools.workarounds_for_obfuscations(self.fetcher.content)
html_content = self.fetcher.content
# If not JSON, and if it's not text/plain..
if 'text/plain' in fetcher.get_all_headers().get('content-type', '').lower():
if 'text/plain' in self.fetcher.get_all_headers().get('content-type', '').lower():
# Don't run get_text or xpath/css filters on plaintext
stripped_text_from_html = html_content
else:
# Does it have some ld+json price data? used for easier monitoring
update_obj['has_ldjson_price_data'] = html_tools.has_ldjson_product_info(fetcher.content)
update_obj['has_ldjson_price_data'] = html_tools.has_ldjson_product_info(self.fetcher.content)
# Then we assume HTML
if has_filter_rule:
@@ -250,14 +170,19 @@ class perform_site_check(difference_detection_processor):
# For HTML/XML we offer xpath as an option, just start a regular xPath "/.."
if filter_rule[0] == '/' or filter_rule.startswith('xpath:'):
html_content += html_tools.xpath_filter(xpath_filter=filter_rule.replace('xpath:', ''),
html_content=fetcher.content,
append_pretty_line_formatting=not is_source,
html_content=self.fetcher.content,
append_pretty_line_formatting=not watch.is_source_type_url,
is_rss=is_rss)
elif filter_rule.startswith('xpath1:'):
html_content += html_tools.xpath1_filter(xpath_filter=filter_rule.replace('xpath1:', ''),
html_content=self.fetcher.content,
append_pretty_line_formatting=not watch.is_source_type_url,
is_rss=is_rss)
else:
# CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text
html_content += html_tools.include_filters(include_filters=filter_rule,
html_content=fetcher.content,
append_pretty_line_formatting=not is_source)
html_content=self.fetcher.content,
append_pretty_line_formatting=not watch.is_source_type_url)
if not html_content.strip():
raise FilterNotFoundInResponse(include_filters_rule)
@@ -265,7 +190,7 @@ class perform_site_check(difference_detection_processor):
if has_subtractive_selectors:
html_content = html_tools.element_removal(subtractive_selectors, html_content)
if is_source:
if watch.is_source_type_url:
stripped_text_from_html = html_content
else:
# extract text
@@ -311,7 +236,7 @@ class perform_site_check(difference_detection_processor):
empty_pages_are_a_change = self.datastore.data['settings']['application'].get('empty_pages_are_a_change', False)
if not is_json and not empty_pages_are_a_change and len(stripped_text_from_html.strip()) == 0:
raise content_fetcher.ReplyWithContentButNoText(url=url,
status_code=fetcher.get_last_status_code(),
status_code=self.fetcher.get_last_status_code(),
screenshot=screenshot,
has_filters=has_filter_rule,
html_content=html_content
@@ -320,7 +245,7 @@ class perform_site_check(difference_detection_processor):
# We rely on the actual text in the html output.. many sites have random script vars etc,
# in the future we'll implement other mechanisms.
update_obj["last_check_status"] = fetcher.get_last_status_code()
update_obj["last_check_status"] = self.fetcher.get_last_status_code()
# If there's text to skip
# @todo we could abstract out the get_text() to handle this cleaner
@@ -408,7 +333,7 @@ class perform_site_check(difference_detection_processor):
if is_html:
if self.datastore.data['settings']['application'].get('extract_title_as_title') or watch['extract_title_as_title']:
if not watch['title'] or not len(watch['title']):
update_obj['title'] = html_tools.extract_element(find='title', html_content=fetcher.content)
update_obj['title'] = html_tools.extract_element(find='title', html_content=self.fetcher.content)
if changed_detected:
if watch.get('check_unique_lines', False):
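The filter block above also gains the xpath1: prefix: plain '/...' and 'xpath:' rules go to the newer XPath filter, while 'xpath1:' pins a rule to the original XPath 1.0 behaviour. A rough sketch of that dispatch (html_tools function names and keyword arguments come from the hunk above; the wrapper function and import path are assumptions):

from changedetectionio import html_tools  # import path assumed

def apply_include_filter(filter_rule, content, is_source=False, is_rss=False):
    # '/...' and 'xpath:' rules use the newer XPath filter, 'xpath1:' pins
    # the rule to the original XPath 1.0 filter, anything else is treated as CSS.
    if filter_rule[0] == '/' or filter_rule.startswith('xpath:'):
        return html_tools.xpath_filter(xpath_filter=filter_rule.replace('xpath:', ''),
                                       html_content=content,
                                       append_pretty_line_formatting=not is_source,
                                       is_rss=is_rss)
    if filter_rule.startswith('xpath1:'):
        return html_tools.xpath1_filter(xpath_filter=filter_rule.replace('xpath1:', ''),
                                        html_content=content,
                                        append_pretty_line_formatting=not is_source,
                                        is_rss=is_rss)
    return html_tools.include_filters(include_filters=filter_rule,
                                      html_content=content,
                                      append_pretty_line_formatting=not is_source)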

View File

@@ -1,6 +1,7 @@
function isItemInStock() {
// @todo Pass these in so the same list can be used in non-JS fetchers
const outOfStockTexts = [
' أخبرني عندما يتوفر',
'0 in stock',
'agotado',
'artikel zurzeit vergriffen',
@@ -16,9 +17,12 @@ function isItemInStock() {
'currently have any tickets for this',
'currently unavailable',
'dostępne wkrótce',
'dostępne wkrótce',
'en rupture de stock',
'ist derzeit nicht auf lager',
'ist derzeit nicht auf lager',
'item is no longer available',
'let me know when it\'s available',
'message if back in stock',
'nachricht bei',
'nicht auf lager',
@@ -42,7 +46,9 @@ function isItemInStock() {
'unavailable tickets',
'we do not currently have an estimate of when this product will be back in stock.',
'zur zeit nicht an lager',
'品切れ',
'已售完',
'품절'
];
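The JavaScript change above only extends the phrase list; the decision logic is unchanged. A hedged Python restatement of the idea, with a heavily truncated phrase list and an invented function name:

# Illustrative restatement of isItemInStock() from stock-not-in-stock.js;
# the real (much longer) phrase list lives in that JS file.
OUT_OF_STOCK_TEXTS = [
    '0 in stock', 'agotado', 'currently unavailable',
    'item is no longer available', '品切れ', '已售完', '품절',
]

def guess_stock_state(visible_text: str) -> str:
    t = visible_text.lower()
    for phrase in OUT_OF_STOCK_TEXTS:
        if phrase in t:
            return phrase                 # an out-of-stock phrase was found
    return 'Possibly in stock'            # nothing matched above the fold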

View File

@@ -170,9 +170,12 @@ if (include_filters.length) {
try {
// is it xpath?
if (f.startsWith('/') || f.startsWith('xpath:')) {
q = document.evaluate(f.replace('xpath:', ''), document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
if (f.startsWith('/') || f.startsWith('xpath')) {
var qry_f = f.replace(/xpath(:|\d:)/, '')
console.log("[xpath] Scanning for included filter " + qry_f)
q = document.evaluate(qry_f, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
} else {
console.log("[css] Scanning for included filter " + f)
q = document.querySelector(f);
}
} catch (e) {
@@ -182,8 +185,18 @@ if (include_filters.length) {
}
if (q) {
// Try to resolve //something/text() back to its /something so we can at least get the bounding box
try {
if (typeof q.nodeName == 'string' && q.nodeName === '#text') {
q = q.parentElement
}
} catch (e) {
console.log(e)
console.log("xpath_element_scraper: #text resolver")
}
// #1231 - In the case an XPath attribute filter is applied, we will have to traverse up and find the element.
if (q.hasOwnProperty('getBoundingClientRect')) {
if (typeof q.getBoundingClientRect == 'function') {
bbox = q.getBoundingClientRect();
console.log("xpath_element_scraper: Got filter element, scroll from top was " + scroll_y)
} else {
@@ -192,7 +205,8 @@ if (include_filters.length) {
bbox = q.ownerElement.getBoundingClientRect();
console.log("xpath_element_scraper: Got filter by ownerElement element, scroll from top was " + scroll_y)
} catch (e) {
console.log("xpath_element_scraper: error looking up ownerElement")
console.log(e)
console.log("xpath_element_scraper: error looking up q.ownerElement")
}
}
}
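The scraper fix above covers XPath rules such as //a/text() that select a #text node, which has no getBoundingClientRect(); the code now steps back to the parent element before measuring it. The same idea outside the browser, shown with lxml purely for illustration:

# Illustrative only: an XPath that selects text() nodes has no element box,
# so step back to the owning element before using element-level APIs.
from lxml import html

doc = html.fromstring('<div><span class="price">19.99 EUR</span></div>')
for text_node in doc.xpath('//span[@class="price"]/text()'):
    element = text_node.getparent()      # the <span> element that owns the text node
    print(element.tag, element.text_content())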

View File

@@ -0,0 +1,44 @@
#!/bin/bash
# run some tests and check whether the 'custom-browser-search-string=1' connect string appeared in the correct containers
# enable debug
set -x
# An extra browser is configured, but we never chose to use it, so it should NOT show in the logs
docker run --rm -e "PLAYWRIGHT_DRIVER_URL=ws://browserless:3000" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio;pytest tests/custom_browser_url/test_custom_browser_url.py::test_request_not_via_custom_browser_url'
docker logs browserless-custom-url &>log.txt
grep 'custom-browser-search-string=1' log.txt
if [ $? -ne 1 ]
then
echo "Saw a request in 'browserless-custom-url' container with 'custom-browser-search-string=1' when I should not"
exit 1
fi
docker logs browserless &>log.txt
grep 'custom-browser-search-string=1' log.txt
if [ $? -ne 1 ]
then
echo "Saw a request in 'browser' container with 'custom-browser-search-string=1' when I should not"
exit 1
fi
# Special connect string should appear in the custom-url container, but not in the 'default' one
docker run --rm -e "PLAYWRIGHT_DRIVER_URL=ws://browserless:3000" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio;pytest tests/custom_browser_url/test_custom_browser_url.py::test_request_via_custom_browser_url'
docker logs browserless-custom-url &>log.txt
grep 'custom-browser-search-string=1' log.txt
if [ $? -ne 0 ]
then
echo "Did not see request in 'browserless-custom-url' container with 'custom-browser-search-string=1' when I should"
exit 1
fi
docker logs browserless &>log.txt
grep 'custom-browser-search-string=1' log.txt
if [ $? -ne 1 ]
then
echo "Saw a request in 'browser' container with 'custom-browser-search-string=1' when I should not"
exit 1
fi

View File

@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg width="15" height="16.363636" viewBox="0 0 15 16.363636" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg">
<svg width="15" height="16.363636" viewBox="0 0 15 16.363636" xmlns="http://www.w3.org/2000/svg" >
<path d="m 14.318182,11.762045 v 1.1925 H 5.4102273 L 11.849318,7.1140909 C 12.234545,9.1561364 12.54,11.181818 14.318182,11.762045 Z m -6.7984093,4.601591 c 1.0759091,0 2.0256823,-0.955909 2.0256823,-2.045454 H 5.4545455 c 0,1.089545 0.9879545,2.045454 2.0652272,2.045454 z M 15,2.8622727 0.9177273,15.636136 0,14.627045 l 1.8443182,-1.6725 h -1.1625 v -1.1925 C 4.0070455,10.677273 2.1784091,4.5388636 5.3611364,2.6897727 5.8009091,2.4347727 6.0709091,1.9609091 6.0702273,1.4488636 v -0.00205 C 6.0702273,0.64772727 6.7104545,0 7.5,0 8.2895455,0 8.9297727,0.64772727 8.9297727,1.4468182 v 0.00205 C 8.9290909,1.9602319 9.199773,2.4354591 9.638864,2.6897773 10.364318,3.111141 10.827273,3.7568228 11.1525,4.5129591 L 14.085682,1.8531818 Z M 6.8181818,1.3636364 C 6.8181818,1.74 7.1236364,2.0454545 7.5,2.0454545 7.8763636,2.0454545 8.1818182,1.74 8.1818182,1.3636364 8.1818182,0.98795455 7.8763636,0.68181818 7.5,0.68181818 c -0.3763636,0 -0.6818182,0.30613637 -0.6818182,0.68181822 z" id="path2" style="fill:#f8321b;stroke-width:0.681818;fill-opacity:1"/>
</svg>

Before: 1.2 KiB → After: 1.2 KiB

View File

@@ -10,7 +10,7 @@
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
>
<defs
id="defs16" />
<sodipodi:namedview

Before: 11 KiB → After: 11 KiB

View File

@@ -12,7 +12,7 @@
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg"><defs
><defs
id="defs11" /><sodipodi:namedview
id="namedview9"
pagecolor="#ffffff"

Before: 2.5 KiB → After: 2.5 KiB

View File

@@ -10,7 +10,7 @@
viewBox="0 0 7.1975545 4.7993639"
xml:space="preserve"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg"><defs
><defs
id="defs19" />
<g
id="g14"

Before: 1.9 KiB → After: 1.9 KiB

View File

@@ -9,7 +9,7 @@
id="svg5"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
>
<defs
id="defs2" />
<g

Before: 2.4 KiB → After: 2.4 KiB

View File

@@ -10,7 +10,7 @@
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
>
<defs
id="defs12" />
<sodipodi:namedview

Before: 9.7 KiB → After: 9.7 KiB

View File

@@ -3,7 +3,6 @@
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
version="1.1"
id="Capa_1"

Before: 2.9 KiB → After: 2.8 KiB

View File

@@ -13,7 +13,6 @@
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"><sodipodi:namedview

Before: 3.5 KiB → After: 3.4 KiB

View File

@@ -6,7 +6,7 @@
version="1.1"
id="svg6"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
>
<defs
id="defs10" />
<path

Before: 892 B → After: 854 B

View File

@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg width="18" height="19.92" viewBox="0 0 18 19.92" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg">
<svg width="18" height="19.92" viewBox="0 0 18 19.92" xmlns="http://www.w3.org/2000/svg" >
<path d="M -3,-2 H 21 V 22 H -3 Z" fill="none" id="path2"/>
<path d="m 15,14.08 c -0.76,0 -1.44,0.3 -1.96,0.77 L 5.91,10.7 C 5.96,10.47 6,10.24 6,10 6,9.76 5.96,9.53 5.91,9.3 L 12.96,5.19 C 13.5,5.69 14.21,6 15,6 16.66,6 18,4.66 18,3 18,1.34 16.66,0 15,0 c -1.66,0 -3,1.34 -3,3 0,0.24 0.04,0.47 0.09,0.7 L 5.04,7.81 C 4.5,7.31 3.79,7 3,7 1.34,7 0,8.34 0,10 c 0,1.66 1.34,3 3,3 0.79,0 1.5,-0.31 2.04,-0.81 l 7.12,4.16 c -0.05,0.21 -0.08,0.43 -0.08,0.65 0,1.61 1.31,2.92 2.92,2.92 1.61,0 2.92,-1.31 2.92,-2.92 0,-1.61 -1.31,-2.92 -2.92,-2.92 z" id="path4" style="fill:#0078e7;fill-opacity:1"/>
</svg>

Before: 787 B → After: 749 B

View File

@@ -1,19 +1,4 @@
$(document).ready(function () {
function toggle() {
if ($('input[name="application-fetch_backend"]:checked').val() != 'html_requests') {
$('#requests-override-options').hide();
$('#webdriver-override-options').show();
} else {
$('#requests-override-options').show();
$('#webdriver-override-options').hide();
}
}
$('input[name="application-fetch_backend"]').click(function (e) {
toggle();
});
toggle();
$("#api-key").hover(
function () {
$("#api-key-copy").html('copy').fadeIn();

View File

@@ -3,45 +3,50 @@
* Toggles theme between light and dark mode.
*/
$(document).ready(function () {
const button = document.getElementById("toggle-light-mode");
const button = document.getElementById("toggle-light-mode");
button.onclick = () => {
const htmlElement = document.getElementsByTagName("html");
const isDarkMode = htmlElement[0].dataset.darkmode === "true";
htmlElement[0].dataset.darkmode = !isDarkMode;
setCookieValue(!isDarkMode);
};
button.onclick = () => {
const htmlElement = document.getElementsByTagName("html");
const isDarkMode = htmlElement[0].dataset.darkmode === "true";
htmlElement[0].dataset.darkmode = !isDarkMode;
setCookieValue(!isDarkMode);
};
const setCookieValue = (value) => {
document.cookie = `css_dark_mode=${value};max-age=31536000;path=/`
}
const setCookieValue = (value) => {
document.cookie = `css_dark_mode=${value};max-age=31536000;path=/`
}
// Search input box behaviour
// Search input box behaviour
const toggle_search = document.getElementById("toggle-search");
const search_q = document.getElementById("search-q");
window.addEventListener('keydown', function (e) {
const search_q = document.getElementById("search-q");
if(search_q) {
window.addEventListener('keydown', function (e) {
if (e.altKey == true && e.keyCode == 83) {
search_q.classList.toggle('expanded');
search_q.focus();
}
});
if (e.altKey == true && e.keyCode == 83)
search_q.classList.toggle('expanded');
search_q.focus();
});
search_q.onkeydown = (e) => {
var key = e.keyCode || e.which;
if (key === 13) {
document.searchForm.submit();
search_q.onkeydown = (e) => {
var key = e.keyCode || e.which;
if (key === 13) {
document.searchForm.submit();
}
};
toggle_search.onclick = () => {
// Could be that they want to search something once text is in there
if (search_q.value.length) {
document.searchForm.submit();
} else {
// If not..
search_q.classList.toggle('expanded');
search_q.focus();
}
};
}
};
toggle_search.onclick = () => {
// Could be that they want to search something once text is in there
if (search_q.value.length) {
document.searchForm.submit();
} else {
// If not..
search_q.classList.toggle('expanded');
search_q.focus();
}
};
$('#heart-us').click(function () {
$("#overlay").toggleClass('visible');
heartpath.style.fill = document.getElementById("overlay").classList.contains("visible") ? '#ff0000' : 'var(--color-background)';
});
});

View File

@@ -0,0 +1,22 @@
$(document).ready(function () {
// Lazy Hide/Show elements mechanism
$('[data-visible-for]').hide();
$(':radio').on('keyup keypress blur change click', function (e) {
$('[data-visible-for]').hide();
$('.advanced-options').hide();
var n = $(this).attr('name') + "=" + $(this).val();
if (n === 'fetch_backend=system') {
n = "fetch_backend=" + default_system_fetch_backend;
}
$(`[data-visible-for~="${n}"]`).show();
});
$(':radio:checked').change();
// Show advanced
$('.show-advanced').click(function (e) {
$(this).closest('.tab-pane-inner').find('.advanced-options').toggle();
});
});

View File

@@ -149,7 +149,7 @@ $(document).ready(function () {
// @todo In the future paint all that match
for (const c of current_default_xpath) {
for (var i = selector_data['size_pos'].length; i !== 0; i--) {
if (selector_data['size_pos'][i - 1].xpath === c) {
if (selector_data['size_pos'][i - 1].xpath.trim() === c.trim()) {
console.log("highlighting " + c);
current_selected_i = i - 1;
highlight_current_selected_i();

View File

@@ -1,39 +1,17 @@
$(document).ready(function () {
function toggle() {
if ($('input[name="fetch_backend"]:checked').val() == 'html_webdriver') {
if (playwright_enabled) {
// playwright supports headers, so hide everything else
// See #664
$('#requests-override-options #request-method').hide();
$('#requests-override-options #request-body').hide();
// @todo connect this one up
$('#ignore-status-codes-option').hide();
} else {
// selenium/webdriver doesnt support anything afaik, hide it all
$('#requests-override-options').hide();
}
$('#webdriver-override-options').show();
} else if ($('input[name="fetch_backend"]:checked').val() == 'system') {
$('#requests-override-options #request-method').hide();
$('#requests-override-options #request-body').hide();
$('#ignore-status-codes-option').hide();
$('#requests-override-options').hide();
$('#webdriver-override-options').hide();
} else {
$('#requests-override-options').show();
$('#requests-override-options *:hidden').show();
$('#webdriver-override-options').hide();
// Lazy Hide/Show elements mechanism
$('[data-visible-for]').hide();
$(':radio').on('keyup keypress blur change click', function (e){
$('[data-visible-for]').hide();
var n = $(this).attr('name') + "=" + $(this).val();
if (n === 'fetch_backend=system') {
n = "fetch_backend=" + default_system_fetch_backend;
}
}
$(`[data-visible-for~="${n}"]`).show();
$('input[name="fetch_backend"]').click(function (e) {
toggle();
});
toggle();
$(':radio:checked').change();
$('#notification-setting-reset-to-default').click(function (e) {
$('#notification_title').val('');

View File

@@ -1,6 +1,6 @@
#toggle-light-mode {
width: 3rem;
/* width: 3rem;*/
/* default */
.icon-dark {
display: none;

View File

@@ -0,0 +1,24 @@
ul#requests-extra_browsers {
list-style: none;
/* tidy up the table to look more "inline" */
li {
> label {
display: none;
}
}
/* each proxy entry is a `table` */
table {
tr {
display: inline;
}
}
}
#extra-browsers-setting {
border: 1px solid var(--color-grey-800);
border-radius: 4px;
margin: 1em;
padding: 1em;
}

View File

@@ -60,3 +60,10 @@ body.proxy-check-active {
padding-bottom: 1em;
}
#extra-proxies-setting {
border: 1px solid var(--color-grey-800);
border-radius: 4px;
margin: 1em;
padding: 1em;
}

View File

@@ -0,0 +1,38 @@
#overlay {
opacity: 0.95;
position: fixed;
width: 350px;
max-width: 100%;
height: 100%;
top: 0;
right: -350px;
background-color: var(--color-table-stripe);
z-index: 2;
transform: translateX(0);
transition: transform .5s ease;
&.visible {
transform: translateX(-100%);
}
.content {
font-size: 0.875rem;
padding: 1rem;
margin-top: 5rem;
max-width: 400px;
color: var(--color-watch-table-row-text);
}
}
#heartpath {
&:hover {
fill: #ff0000 !important;
transition: all ease 0.3s !important;
}
transition: all ease 0.3s !important;
}

View File

@@ -0,0 +1,25 @@
.pure-menu-link {
padding: 0.5rem 1em;
line-height: 1.2rem;
}
.pure-menu-item {
svg {
height: 1.2rem;
}
* {
vertical-align: middle;
}
.github-link {
height: 1.8rem;
display: block;
svg {
height: 100%;
}
}
.bi-heart {
&:hover {
cursor: pointer;
}
}
}

View File

@@ -5,14 +5,18 @@
@import "parts/_arrows";
@import "parts/_browser-steps";
@import "parts/_extra_proxies";
@import "parts/_extra_browsers";
@import "parts/_pagination";
@import "parts/_spinners";
@import "parts/_variables";
@import "parts/_darkmode";
@import "parts/_menu";
@import "parts/_love";
body {
color: var(--color-text);
background: var(--color-background-page);
font-family: Helvetica Neue, Helvetica, Lucida Grande, Arial, Ubuntu, Cantarell, Fira Sans, sans-serif;
}
.visually-hidden {
@@ -55,11 +59,6 @@ a.github-link {
}
}
#toggle-search {
width: 2rem;
}
#search-q {
opacity: 0;
-webkit-transition: all .9s ease;
@@ -403,8 +402,24 @@ label {
}
#watch-add-wrapper-zone {
>div {
display: inline-block;
@media only screen and (min-width: 760px) {
display: flex;
gap: 0.3rem;
flex-direction: row;
}
/* URL field grows always, other stay static in width */
> span {
flex-grow: 0;
input {
width: 100%;
padding-right: 1em;
}
&:first-child {
flex-grow: 1;
}
}
@media only screen and (max-width: 760px) {
@@ -945,10 +960,8 @@ ul {
@import "parts/_visualselector";
#webdriver-override-options {
input[type="number"] {
#webdriver_delay {
width: 5em;
}
}
#api-key {
@@ -1082,3 +1095,4 @@ ul {
border-radius: 3px;
white-space: nowrap;
}

View File

@@ -128,6 +128,27 @@ body.proxy-check-active #request .proxy-timing {
border-radius: 4px;
padding: 1em; }
#extra-proxies-setting {
border: 1px solid var(--color-grey-800);
border-radius: 4px;
margin: 1em;
padding: 1em; }
ul#requests-extra_browsers {
list-style: none;
/* tidy up the table to look more "inline" */
/* each proxy entry is a `table` */ }
ul#requests-extra_browsers li > label {
display: none; }
ul#requests-extra_browsers table tr {
display: inline; }
#extra-browsers-setting {
border: 1px solid var(--color-grey-800);
border-radius: 4px;
margin: 1em;
padding: 1em; }
.pagination-page-info {
color: #fff;
font-size: 0.85rem;
@@ -331,7 +352,7 @@ html[data-darkmode="true"] {
color: var(--color-watch-table-error); }
#toggle-light-mode {
width: 3rem;
/* width: 3rem;*/
/* default */ }
#toggle-light-mode .icon-dark {
display: none; }
@@ -342,9 +363,56 @@ html[data-darkmode="true"] #toggle-light-mode .icon-light {
html[data-darkmode="true"] #toggle-light-mode .icon-dark {
display: block; }
.pure-menu-link {
padding: 0.5rem 1em;
line-height: 1.2rem; }
.pure-menu-item svg {
height: 1.2rem; }
.pure-menu-item * {
vertical-align: middle; }
.pure-menu-item .github-link {
height: 1.8rem;
display: block; }
.pure-menu-item .github-link svg {
height: 100%; }
.pure-menu-item .bi-heart:hover {
cursor: pointer; }
#overlay {
opacity: 0.95;
position: fixed;
width: 350px;
max-width: 100%;
height: 100%;
top: 0;
right: -350px;
background-color: var(--color-table-stripe);
z-index: 2;
transform: translateX(0);
transition: transform .5s ease; }
#overlay.visible {
transform: translateX(-100%); }
#overlay .content {
font-size: 0.875rem;
padding: 1rem;
margin-top: 5rem;
max-width: 400px;
color: var(--color-watch-table-row-text); }
#heartpath {
transition: all ease 0.3s !important; }
#heartpath:hover {
fill: #ff0000 !important;
transition: all ease 0.3s !important; }
body {
color: var(--color-text);
background: var(--color-background-page); }
background: var(--color-background-page);
font-family: Helvetica Neue, Helvetica, Lucida Grande, Arial, Ubuntu, Cantarell, Fira Sans, sans-serif; }
.visually-hidden {
clip: rect(0 0 0 0);
@@ -376,9 +444,6 @@ a.github-link {
a.github-link:hover {
color: var(--color-icon-github-hover); }
#toggle-search {
width: 2rem; }
#search-q {
opacity: 0;
-webkit-transition: all .9s ease;
@@ -618,11 +683,23 @@ label:hover {
#new-watch-form legend {
color: var(--color-text-legend);
font-weight: bold; }
#new-watch-form #watch-add-wrapper-zone > div {
display: inline-block; }
@media only screen and (max-width: 760px) {
#new-watch-form #watch-add-wrapper-zone #url {
width: 100%; } }
#new-watch-form #watch-add-wrapper-zone {
/* URL field grows always, other stay static in width */ }
@media only screen and (min-width: 760px) {
#new-watch-form #watch-add-wrapper-zone {
display: flex;
gap: 0.3rem;
flex-direction: row; } }
#new-watch-form #watch-add-wrapper-zone > span {
flex-grow: 0; }
#new-watch-form #watch-add-wrapper-zone > span input {
width: 100%;
padding-right: 1em; }
#new-watch-form #watch-add-wrapper-zone > span:first-child {
flex-grow: 1; }
@media only screen and (max-width: 760px) {
#new-watch-form #watch-add-wrapper-zone #url {
width: 100%; } }
#diff-col {
padding-left: 40px; }
@@ -1000,7 +1077,7 @@ ul {
#selector-current-xpath {
font-size: 80%; }
#webdriver-override-options input[type="number"] {
#webdriver_delay {
width: 5em; }
#api-key:hover {

View File

@@ -234,7 +234,7 @@ class ChangeDetectionStore:
# Probably there should be a dict...
for watch in self.data['watching'].values():
if watch['url'] == url:
if watch['url'].lower() == url.lower():
return True
return False
@@ -333,7 +333,8 @@ class ChangeDetectionStore:
# Or if UUIDs given directly
if tag_uuids:
apply_extras['tags'] = list(set(apply_extras['tags'] + tag_uuids))
for t in tag_uuids:
apply_extras['tags'] = list(set(apply_extras['tags'] + [t.strip()]))
# Make any uuids unique
if apply_extras.get('tags'):
@@ -360,6 +361,8 @@ class ChangeDetectionStore:
if write_to_disk_now:
self.sync_to_json()
print("added ", url)
return new_uuid
def visualselector_data_is_ready(self, watch_uuid):
@@ -631,6 +634,18 @@ class ChangeDetectionStore:
return {}
@property
def extra_browsers(self):
res = []
p = list(filter(
lambda s: (s.get('browser_name') and s.get('browser_connection_url')),
self.__data['settings']['requests'].get('extra_browsers', [])))
if p:
for i in p:
res.append(("extra_browser_"+i['browser_name'], i['browser_name']))
return res
def tag_exists_by_name(self, tag_name):
return any(v.get('title', '').lower() == tag_name.lower() for k, v in self.__data['settings']['application']['tags'].items())
@@ -833,4 +848,14 @@ class ChangeDetectionStore:
if not watch.get('date_created'):
self.data['watching'][uuid]['date_created'] = i
i+=1
return
return
# #1774 - protect xpath1 against migration
def update_14(self):
for awatch in self.__data["watching"]:
if self.__data["watching"][awatch]['include_filters']:
for num, selector in enumerate(self.__data["watching"][awatch]['include_filters']):
if selector.startswith('/'):
self.__data["watching"][awatch]['include_filters'][num] = 'xpath1:' + selector
if selector.startswith('xpath:'):
self.__data["watching"][awatch]['include_filters'][num] = selector.replace('xpath:', 'xpath1:', 1)

View File

@@ -13,7 +13,7 @@
<div class="pure-form-message-inline">
<ul>
<li>Use <a target=_new href="https://github.com/caronc/apprise">AppRise URLs</a> for notification to just about any service! <i><a target=_new href="https://github.com/dgtlmoon/changedetection.io/wiki/Notification-configuration-notes">Please read the notification services wiki here for important configuration notes</a></i>.</li>
<li><code><a target=_new href="https://github.com/caronc/apprise/wiki/Notify_discord">discord://</a></code> (or <code>https://discord.com/api/webhooks...</code>)) </code> only supports a maximum <strong>2,000 characters</strong> of notification text, including the title.</li>
<li><code><a target=_new href="https://github.com/caronc/apprise/wiki/Notify_discord">discord://</a></code> (or <code>https://discord.com/api/webhooks...</code>)) only supports a maximum <strong>2,000 characters</strong> of notification text, including the title.</li>
<li><code><a target=_new href="https://github.com/caronc/apprise/wiki/Notify_telegram">tgram://</a></code> bots can't send messages to other bots, so you should specify chat ID of non-bot user.</li>
<li><code><a target=_new href="https://github.com/caronc/apprise/wiki/Notify_telegram">tgram://</a></code> only supports very limited HTML and can fail when extra tags are sent, <a href="https://core.telegram.org/bots/api#html-style">read more here</a> (or use plaintext/markdown format)</li>
<li><code>gets://</code>, <code>posts://</code>, <code>puts://</code>, <code>deletes://</code> for direct API calls (or omit the "<code>s</code>" for non-SSL ie <code>get://</code>)</li>

View File

@@ -39,6 +39,24 @@
{% endmacro %}
{% macro render_nolabel_field(field) %}
<span>
{{ field(**kwargs)|safe }}
{% if field.errors %}
<span class="error">
{% if field.errors %}
<ul class=errors>
{% for error in field.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</span>
{% endif %}
</span>
{% endmacro %}
{% macro render_button(field) %}
{{ field(**kwargs)|safe }}
{% endmacro %}

View File

@@ -8,10 +8,10 @@
<title>Change Detection{{extra_title}}</title>
<link rel="alternate" type="application/rss+xml" title="Changedetection.io » Feed{% if active_tag %}- {{active_tag}}{% endif %}" href="{{ url_for('rss', tag=active_tag , token=app_rss_token)}}" >
<link rel="stylesheet" href="{{url_for('static_content', group='styles', filename='pure-min.css')}}" >
<link rel="stylesheet" href="{{url_for('static_content', group='styles', filename='styles.css')}}" >
<link rel="stylesheet" href="{{url_for('static_content', group='styles', filename='styles.css')}}?v={{ get_css_version() }}" >
{% if extra_stylesheets %}
{% for m in extra_stylesheets %}
<link rel="stylesheet" href="{{ m }}?ver=1000" >
<link rel="stylesheet" href="{{ m }}?ver={{ get_css_version() }}" >
{% endfor %}
{% endif %}
@@ -85,6 +85,7 @@
<a href="{{url_for('logout')}}" class="pure-menu-link">LOG OUT</a>
</li>
{% endif %}
{% if current_user.is_authenticated or not has_password %}
<li class="pure-menu-item pure-form" id="search-menu-item">
<!-- We use GET here so it offers people a chance to set bookmarks etc -->
<form name="searchForm" action="" method="GET">
@@ -95,6 +96,7 @@
</button>
</form>
</li>
{% endif %}
<li class="pure-menu-item">
<button class="toggle-button" id ="toggle-light-mode" type="button" title="Toggle Light/Dark Mode">
<span class="visually-hidden">Toggle light/dark mode</span>
@@ -106,6 +108,20 @@
</span>
</button>
</li>
<li class="pure-menu-item" id="heart-us">
<svg
fill="#ff0000"
class="bi bi-heart"
preserveAspectRatio="xMidYMid meet"
viewBox="0 0 16.9 16.1"
id="svg-heart"
xmlns="http://www.w3.org/2000/svg"
>
<path id="heartpath" d="M 5.338316,0.50302766 C 0.71136983,0.50647126 -3.9576371,7.2707777 8.5004254,15.503028 23.833425,5.3700277 13.220206,-2.5384409 8.6762066,1.6475589 c -0.060791,0.054322 -0.11943,0.1110064 -0.1757812,0.1699219 -0.057,-0.059 -0.1157813,-0.116875 -0.1757812,-0.171875 C 7.4724566,0.86129334 6.4060729,0.50223298 5.338316,0.50302766 Z"
style="fill:var(--color-background);fill-opacity:1;stroke:#ff0000;stroke-opacity:1" />
</svg>
</li>
<li class="pure-menu-item">
<a class="github-link" href="https://github.com/dgtlmoon/changedetection.io">
{% include "svgs/github.svg" %}
@@ -129,7 +145,43 @@
<div class="sticky-tab" id="right-sticky">{{ right_sticky }}</div>
{% endif %}
<section class="content">
<header>
<div id="overlay">
<div class="content">
<strong>changedetection.io needs your support!</strong><br>
<p>
You can help us by supporting changedetection.io on these platforms;
</p>
<p>
<ul>
<li>
<a href="https://alternativeto.net/software/changedetection-io/about/">Rate us at
AlternativeTo.net</a>
</li>
<li>
<a href="https://github.com/dgtlmoon/changedetection.io">Star us on GitHub</a>
</li>
<li>
<a href="https://twitter.com/change_det_io">Follow us at Twitter/X</a>
</li>
<li>
<a href="https://www.linkedin.com/company/changedetection-io">Check us out on LinkedIn</a>
</li>
<li>
And tell your friends and colleagues :)
</li>
</ul>
<p>
The more popular changedetection.io is, the more time we can dedicate to adding amazing features!
</p>
<p>
Many thanks :)<br>
</p>
<p>
<i>changedetection.io team</i>
</p>
</div>
</div>
<header>
{% block header %}{% endblock %}
</header>

View File

@@ -3,6 +3,7 @@
{% from '_helpers.jinja' import render_field, render_checkbox_field, render_button %}
{% from '_common_fields.jinja' import render_common_settings_form %}
<script src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='vis.js')}}" defer></script>
<script>
const browser_steps_available_screenshots=JSON.parse('{{ watch.get_browsersteps_available_screenshots|tojson }}');
const browser_steps_config=JSON.parse('{{ browser_steps_config|tojson }}');
@@ -19,7 +20,7 @@
const proxy_recheck_status_url="{{url_for('check_proxies.get_recheck_status', uuid=uuid)}}";
const screenshot_url="{{url_for('static_content', group='screenshot', filename=uuid)}}";
const watch_visual_selector_data_url="{{url_for('static_content', group='visual_selector_data', filename=uuid)}}";
const default_system_fetch_backend="{{ settings_application['fetch_backend'] }}";
</script>
<script src="{{url_for('static_content', group='js', filename='watch-settings.js')}}" defer></script>
@@ -124,10 +125,9 @@
</span>
</div>
{% endif %}
<div class="pure-control-group inline-radio">
{{ render_checkbox_field(form.ignore_status_codes) }}
</div>
<fieldset id="webdriver-override-options">
<!-- webdriver always -->
<fieldset data-visible-for="fetch_backend=html_webdriver">
<div class="pure-control-group">
{{ render_field(form.webdriver_delay) }}
<div class="pure-form-message-inline">
@@ -140,23 +140,40 @@
</div>
</div>
<div class="pure-control-group">
<a class="pure-button button-secondary button-xsmall show-advanced">Show advanced options</a>
</div>
<div class="advanced-options" style="display: none;">
{{ render_field(form.webdriver_js_execute_code) }}
<div class="pure-form-message-inline">
Run this code before performing change detection, handy for filling in fields and other actions <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Run-JavaScript-before-change-detection">More help and examples here</a>
Run this code before performing change detection, handy for filling in fields and other
actions <a
href="https://github.com/dgtlmoon/changedetection.io/wiki/Run-JavaScript-before-change-detection">More
help and examples here</a>
</div>
</div>
</fieldset>
<fieldset class="pure-group" id="requests-override-options">
{% if not playwright_enabled %}
<div class="pure-form-message-inline">
<strong>Request override is currently only used by the <i>Basic fast Plaintext/HTTP Client</i> method.</strong>
</div>
{% endif %}
<div class="pure-control-group" id="request-method">
{{ render_field(form.method) }}
<!-- html requests always -->
<fieldset data-visible-for="fetch_backend=html_requests" style="display: none;">
<div class="pure-control-group">
<a class="pure-button button-secondary button-xsmall show-advanced">Show advanced options</a>
</div>
<div class="pure-control-group" id="request-headers">
{{ render_field(form.headers, rows=5, placeholder="Example
<div class="advanced-options" style="display: none;">
<div class="pure-control-group advanced-options" id="request-method" style="display: none;">
{{ render_field(form.method) }}
</div>
<div class="advanced-options" id="request-body">
{{ render_field(form.body, rows=5, placeholder="Example
{
\"name\":\"John\",
\"age\":30,
\"car\":null
}") }}
</div>
</div>
</fieldset>
<!-- hmm -->
<div class="pure-control-group advanced-options" style="display: none;">
{{ render_field(form.headers, rows=5, placeholder="Example
Cookie: foobar
User-Agent: wonderbra 1.0") }}
@@ -169,17 +186,12 @@ User-Agent: wonderbra 1.0") }}
<br>
(Not supported by Selenium browser)
</div>
</div>
<div class="pure-control-group" id="request-body">
{{ render_field(form.body, rows=5, placeholder="Example
{
\"name\":\"John\",
\"age\":30,
\"car\":null
}") }}
<fieldset data-visible-for="fetch_backend=html_requests fetch_backend=html_webdriver" style="display: none;">
<div class="pure-control-group inline-radio advanced-options" style="display: none;">
{{ render_checkbox_field(form.ignore_status_codes) }}
</div>
</fieldset>
</fieldset>
</div>
{% if playwright_enabled %}
<div class="tab-pane-inner" id="browser-steps">
@@ -290,11 +302,12 @@ xpath://body/div/span[contains(@class, 'example-class')]",
{% endif %}
</ul>
</li>
<li>XPath - Limit text to this XPath rule, simply start with a forward-slash,
<li>XPath - Limit text to this XPath rule, simply start with a forward-slash. To specify that XPath should be used explicitly, or when the XPath rule starts with an XPath function, prefix with <code>xpath:</code>
<ul>
<li>Example: <code>//*[contains(@class, 'sametext')]</code> or <code>xpath://*[contains(@class, 'sametext')]</code>, <a
<li>Example: <code>//*[contains(@class, 'sametext')]</code> or <code>xpath:count(//*[contains(@class, 'sametext')])</code>, <a
href="http://xpather.com/" target="new">test your XPath here</a></li>
<li>Example: Get all titles from an RSS feed <code>//title/text()</code></li>
<li>To use XPath1.0: Prefix with <code>xpath1:</code></li>
</ul>
</li>
</ul>
@@ -455,15 +468,15 @@ Unavailable") }}
<tbody>
<tr>
<td>Check count</td>
<td>{{ watch.check_count }}</td>
<td>{{ "{:,}".format( watch.check_count) }}</td>
</tr>
<tr>
<td>Consecutive filter failures</td>
<td>{{ watch.consecutive_filter_failures }}</td>
<td>{{ "{:,}".format( watch.consecutive_filter_failures) }}</td>
</tr>
<tr>
<td>History length</td>
<td>{{ watch.history|length }}</td>
<td>{{ "{:,}".format(watch.history|length) }}</td>
</tr>
<tr>
<td>Last fetch time</td>

View File

@@ -8,11 +8,12 @@
<ul>
<li class="tab" id=""><a href="#url-list">URL List</a></li>
<li class="tab"><a href="#distill-io">Distill.io</a></li>
<li class="tab"><a href="#xlsx">.XLSX &amp; Wachete</a></li>
</ul>
</div>
<div class="box-wrap inner">
<form class="pure-form pure-form-aligned" action="{{url_for('import_page')}}" method="POST">
<form class="pure-form" action="{{url_for('import_page')}}" method="POST" enctype="multipart/form-data">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<div class="tab-pane-inner" id="url-list">
<legend>
@@ -79,6 +80,42 @@
" rows="25">{{ original_distill_json }}</textarea>
</div>
<div class="tab-pane-inner" id="xlsx">
<fieldset>
<div class="pure-control-group">
{{ render_field(form.xlsx_file, class="processor") }}
</div>
<div class="pure-control-group">
{{ render_field(form.file_mapping, class="processor") }}
</div>
</fieldset>
<div class="pure-control-group">
<span class="pure-form-message-inline">
Table of custom column and data types mapping for the <strong>Custom mapping</strong> File mapping type.
</span>
<table style="border: 1px solid #aaa; padding: 0.5rem; border-radius: 4px;">
<tr>
<td><strong>Column #</strong></td>
{% for n in range(4) %}
<td><input type="number" name="custom_xlsx[col_{{n}}]" style="width: 4rem;" min="1"></td>
{% endfor %}
</tr>
<tr>
<td><strong>Type</strong></td>
{% for n in range(4) %}
<td><select name="custom_xlsx[col_type_{{n}}]">
<option value="" style="color: #aaa"> -- none --</option>
<option value="url">URL</option>
<option value="title">Title</option>
<option value="include_filter">CSS/xPath filter</option>
<option value="tag">Group / Tag name(s)</option>
<option value="interval_minutes">Recheck time (minutes)</option>
</select></td>
{% endfor %}
</tr>
</table>
</div>
</div>
<button type="submit" class="pure-button pure-input-1-2 pure-button-primary">Import</button>
</form>

View File

@@ -11,7 +11,7 @@
</script>
<script src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='notifications.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='vis.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='global-settings.js')}}" defer></script>
<div class="edit-form">
<div class="tabs collapsable">
@@ -111,7 +111,7 @@
<br>
Tip: <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">Connect using Bright Data and Oxylabs Proxies, find out more here.</a>
</div>
<fieldset class="pure-group" id="webdriver-override-options">
<fieldset class="pure-group" id="webdriver-override-options" data-visible-for="application-fetch_backend=html_webdriver">
<div class="pure-form-message-inline">
<strong>If you're having trouble waiting for the page to be fully rendered (text missing etc), try increasing the 'wait' time here.</strong>
<br>
@@ -178,6 +178,9 @@ nav
<span style="display:none;" id="api-key-copy" >copy</span>
</div>
</div>
<div class="pure-control-group">
<a href="{{url_for('settings_reset_api_key')}}" class="pure-button button-small button-cancel">Regenerate API key</a>
</div>
</div>
<div class="tab-pane-inner" id="proxies">
<div id="recommended-proxy">
@@ -227,11 +230,15 @@ nav
</p>
<p><strong>Tip</strong>: "Residential" and "Mobile" proxy types can be more successful than "Data Center" for blocked websites.
<div class="pure-control-group">
<div class="pure-control-group" id="extra-proxies-setting">
{{ render_field(form.requests.form.extra_proxies) }}
<span class="pure-form-message-inline">"Name" will be used for selecting the proxy in the Watch Edit settings</span><br>
<span class="pure-form-message-inline">SOCKS5 proxies with authentication are only supported with 'plain requests' fetcher, for other fetchers you should whitelist the IP access instead</span>
</div>
<div class="pure-control-group" id="extra-browsers-setting">
<span class="pure-form-message-inline"><i>Extra Browsers</i> allow changedetection.io to communicate with a different web-browser.</span><br>
{{ render_field(form.requests.form.extra_browsers) }}
</div>
</div>
<div id="actions">
<div class="pure-control-group">

View File

@@ -1,3 +1,6 @@
<svg class="octicon octicon-mark-github v-align-middle" height="32" viewbox="0 0 16 16" version="1.1" width="32" aria-hidden="true">
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z"></path>
<svg class="octicon octicon-mark-github v-align-middle" viewbox="0 0 16 16" version="1.1" aria-hidden="true">
<path
fill-rule="evenodd"
d="M 8,0 C 3.58,0 0,3.58 0,8 c 0,3.54 2.29,6.53 5.47,7.59 0.4,0.07 0.55,-0.17 0.55,-0.38 0,-0.19 -0.01,-0.82 -0.01,-1.49 C 4,14.09 3.48,13.23 3.32,12.78 3.23,12.55 2.84,11.84 2.5,11.65 2.22,11.5 1.82,11.13 2.49,11.12 3.12,11.11 3.57,11.7 3.72,11.94 4.44,13.15 5.59,12.81 6.05,12.6 6.12,12.08 6.33,11.73 6.56,11.53 4.78,11.33 2.92,10.64 2.92,7.58 2.92,6.71 3.23,5.99 3.74,5.43 3.66,5.23 3.38,4.41 3.82,3.31 c 0,0 0.67,-0.21 2.2,0.82 0.64,-0.18 1.32,-0.27 2,-0.27 0.68,0 1.36,0.09 2,0.27 1.53,-1.04 2.2,-0.82 2.2,-0.82 0.44,1.1 0.16,1.92 0.08,2.12 0.51,0.56 0.82,1.27 0.82,2.15 0,3.07 -1.87,3.75 -3.65,3.95 0.29,0.25 0.54,0.73 0.54,1.48 0,1.07 -0.01,1.93 -0.01,2.2 0,0.21 0.15,0.46 0.55,0.38 A 8.013,8.013 0 0 0 16,8 C 16,3.58 12.42,0 8,0 Z"
id="path2" />
</svg>

Before: 749 B  |  After: 917 B

View File

@@ -1 +1 @@
<?xml version="1.0" encoding="utf-8"?><svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 122.879 119.799" enable-background="new 0 0 122.879 119.799" xml:space="preserve"><g><path d="M49.988,0h0.016v0.007C63.803,0.011,76.298,5.608,85.34,14.652c9.027,9.031,14.619,21.515,14.628,35.303h0.007v0.033v0.04 h-0.007c-0.005,5.557-0.917,10.905-2.594,15.892c-0.281,0.837-0.575,1.641-0.877,2.409v0.007c-1.446,3.66-3.315,7.12-5.547,10.307 l29.082,26.139l0.018,0.016l0.157,0.146l0.011,0.011c1.642,1.563,2.536,3.656,2.649,5.78c0.11,2.1-0.543,4.248-1.979,5.971 l-0.011,0.016l-0.175,0.203l-0.035,0.035l-0.146,0.16l-0.016,0.021c-1.565,1.642-3.654,2.534-5.78,2.646 c-2.097,0.111-4.247-0.54-5.971-1.978l-0.015-0.011l-0.204-0.175l-0.029-0.024L78.761,90.865c-0.88,0.62-1.778,1.209-2.687,1.765 c-1.233,0.755-2.51,1.466-3.813,2.115c-6.699,3.342-14.269,5.222-22.272,5.222v0.007h-0.016v-0.007 c-13.799-0.004-26.296-5.601-35.338-14.645C5.605,76.291,0.016,63.805,0.007,50.021H0v-0.033v-0.016h0.007 c0.004-13.799,5.601-26.296,14.645-35.338C23.683,5.608,36.167,0.016,49.955,0.007V0H49.988L49.988,0z M50.004,11.21v0.007h-0.016 h-0.033V11.21c-10.686,0.007-20.372,4.35-27.384,11.359C15.56,29.578,11.213,39.274,11.21,49.973h0.007v0.016v0.033H11.21 c0.007,10.686,4.347,20.367,11.359,27.381c7.009,7.012,16.705,11.359,27.403,11.361v-0.007h0.016h0.033v0.007 c10.686-0.007,20.368-4.348,27.382-11.359c7.011-7.009,11.358-16.702,11.36-27.4h-0.006v-0.016v-0.033h0.006 c-0.006-10.686-4.35-20.372-11.358-27.384C70.396,15.56,60.703,11.213,50.004,11.21L50.004,11.21z"/></g></svg>
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 122.879 119.799" enable-background="new 0 0 122.879 119.799" xml:space="preserve"><g><path d="M49.988,0h0.016v0.007C63.803,0.011,76.298,5.608,85.34,14.652c9.027,9.031,14.619,21.515,14.628,35.303h0.007v0.033v0.04 h-0.007c-0.005,5.557-0.917,10.905-2.594,15.892c-0.281,0.837-0.575,1.641-0.877,2.409v0.007c-1.446,3.66-3.315,7.12-5.547,10.307 l29.082,26.139l0.018,0.016l0.157,0.146l0.011,0.011c1.642,1.563,2.536,3.656,2.649,5.78c0.11,2.1-0.543,4.248-1.979,5.971 l-0.011,0.016l-0.175,0.203l-0.035,0.035l-0.146,0.16l-0.016,0.021c-1.565,1.642-3.654,2.534-5.78,2.646 c-2.097,0.111-4.247-0.54-5.971-1.978l-0.015-0.011l-0.204-0.175l-0.029-0.024L78.761,90.865c-0.88,0.62-1.778,1.209-2.687,1.765 c-1.233,0.755-2.51,1.466-3.813,2.115c-6.699,3.342-14.269,5.222-22.272,5.222v0.007h-0.016v-0.007 c-13.799-0.004-26.296-5.601-35.338-14.645C5.605,76.291,0.016,63.805,0.007,50.021H0v-0.033v-0.016h0.007 c0.004-13.799,5.601-26.296,14.645-35.338C23.683,5.608,36.167,0.016,49.955,0.007V0H49.988L49.988,0z M50.004,11.21v0.007h-0.016 h-0.033V11.21c-10.686,0.007-20.372,4.35-27.384,11.359C15.56,29.578,11.213,39.274,11.21,49.973h0.007v0.016v0.033H11.21 c0.007,10.686,4.347,20.367,11.359,27.381c7.009,7.012,16.705,11.359,27.403,11.361v-0.007h0.016h0.033v0.007 c10.686-0.007,20.368-4.348,27.382-11.359c7.011-7.009,11.358-16.702,11.36-27.4h-0.006v-0.016v-0.033h0.006 c-0.006-10.686-4.35-20.372-11.358-27.384C70.396,15.56,60.703,11.213,50.004,11.21L50.004,11.21z"/></g></svg>

Before: 1.6 KiB  |  After: 1.5 KiB

View File

@@ -1,6 +1,6 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.jinja' import render_simple_field, render_field %}
{% from '_helpers.jinja' import render_simple_field, render_field, render_nolabel_field %}
<script src="{{url_for('static_content', group='js', filename='jquery-3.6.0.min.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='watch-overview.js')}}" defer></script>
@@ -11,17 +11,14 @@
<fieldset>
<legend>Add a new change detection watch</legend>
<div id="watch-add-wrapper-zone">
<div>
{{ render_simple_field(form.url, placeholder="https://...", required=true) }}
{{ render_simple_field(form.tags, value=tags[active_tag].title if active_tag else '', placeholder="watch label / tag") }}
</div>
<div>
{{ render_simple_field(form.watch_submit_button, title="Watch this URL!" ) }}
{{ render_simple_field(form.edit_and_watch_submit_button, title="Edit first then Watch") }}
</div>
{{ render_nolabel_field(form.url, placeholder="https://...", required=true) }}
{{ render_nolabel_field(form.tags, value=tags[active_tag].title if active_tag else '', placeholder="watch label / tag") }}
{{ render_nolabel_field(form.watch_submit_button, title="Watch this URL!" ) }}
{{ render_nolabel_field(form.edit_and_watch_submit_button, title="Edit first then Watch") }}
</div>
<div id="quick-watch-processor-type">
{{ render_simple_field(form.processor, title="Edit first then Watch") }}
{{ render_simple_field(form.processor) }}
</div>
</fieldset>
@@ -82,12 +79,15 @@
</tr>
{% endif %}
{% for watch in (watches|sort(attribute=sort_attribute, reverse=sort_order == 'asc'))|pagination_slice(skip=pagination.skip) %}
{% set is_unviewed = watch.newest_history_key| int > watch.last_viewed and watch.history_n>=2 %}
<tr id="{{ watch.uuid }}"
class="{{ loop.cycle('pure-table-odd', 'pure-table-even') }} processor-{{ watch['processor'] }}
{% if watch.last_error is defined and watch.last_error != False %}error{% endif %}
{% if watch.last_notification_error is defined and watch.last_notification_error != False %}error{% endif %}
{% if watch.paused is defined and watch.paused != False %}paused{% endif %}
{% if watch.newest_history_key| int > watch.last_viewed and watch.history_n>=2 %}unviewed{% endif %}
{% if is_unviewed %}unviewed{% endif %}
{% if watch.uuid in queued_uuids %}queued{% endif %}">
<td class="inline checkbox-uuid" ><input name="uuids" type="checkbox" value="{{ watch.uuid}} " > <span>{{ loop.index+pagination.skip }}</span></td>
<td class="inline watch-controls">
@@ -104,8 +104,9 @@
{% if watch.get_fetch_backend == "html_webdriver"
or ( watch.get_fetch_backend == "system" and system_default_fetcher == 'html_webdriver' )
or "extra_browser_" in watch.get_fetch_backend
%}
<img class="status-icon" src="{{url_for('static_content', group='images', filename='Google-Chrome-icon.png')}}" title="Using a chrome browser" >
<img class="status-icon" src="{{url_for('static_content', group='images', filename='Google-Chrome-icon.png')}}" title="Using a Chrome browser" >
{% endif %}
{%if watch.is_pdf %}<img class="status-icon" src="{{url_for('static_content', group='images', filename='pdf-icon.svg')}}" title="Converting PDF to text" >{% endif %}
@@ -166,7 +167,13 @@
class="recheck pure-button pure-button-primary">{% if watch.uuid in queued_uuids %}Queued{% else %}Recheck{% endif %}</a>
<a href="{{ url_for('edit_page', uuid=watch.uuid)}}" class="pure-button pure-button-primary">Edit</a>
{% if watch.history_n >= 2 %}
<a href="{{ url_for('diff_history_page', uuid=watch.uuid) }}" target="{{watch.uuid}}" class="pure-button pure-button-primary diff-link">Diff</a>
{% if is_unviewed %}
<a href="{{ url_for('diff_history_page', uuid=watch.uuid, from_version=watch.get_next_snapshot_key_to_last_viewed) }}" target="{{watch.uuid}}" class="pure-button pure-button-primary diff-link">Diff</a>
{% else %}
<a href="{{ url_for('diff_history_page', uuid=watch.uuid)}}" target="{{watch.uuid}}" class="pure-button pure-button-primary diff-link">Diff</a>
{% endif %}
{% else %}
{% if watch.history_n == 1 or (watch.history_n ==0 and watch.error_text_ctime )%}
<a href="{{ url_for('preview_page', uuid=watch.uuid)}}" target="{{watch.uuid}}" class="pure-button pure-button-primary">Preview</a>

View File

@@ -0,0 +1 @@
# placeholder

View File

@@ -0,0 +1,89 @@
#!/usr/bin/python3
import os
from flask import url_for
from ..util import live_server_setup, wait_for_all_checks
def do_test(client, live_server, make_test_use_extra_browser=False):
# Grep for this string in the logs?
test_url = f"https://changedetection.io/ci-test.html"
custom_browser_name = 'custom browser URL'
# PLAYWRIGHT_DRIVER_URL needs to be set to something like 'ws://127.0.0.1:3000?stealth=1&--disable-web-security=true'
assert os.getenv('PLAYWRIGHT_DRIVER_URL'), "Needs PLAYWRIGHT_DRIVER_URL set for this test"
#####################
res = client.post(
url_for("settings_page"),
data={"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_webdriver",
# browserless-custom-url is setup in .github/workflows/test-only.yml
# the test script run_custom_browser_url_test.sh will look for 'custom-browser-search-string' in the container logs
'requests-extra_browsers-0-browser_connection_url': 'ws://browserless-custom-url:3000?stealth=1&--disable-web-security=true&custom-browser-search-string=1',
'requests-extra_browsers-0-browser_name': custom_browser_name
},
follow_redirects=True
)
assert b"Settings updated." in res.data
# Add our URL to the import page
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
if make_test_use_extra_browser:
# So the name should appear in the edit page under "Request" > "Fetch Method"
res = client.get(
url_for("edit_page", uuid="first"),
follow_redirects=True
)
assert b'custom browser URL' in res.data
res = client.post(
url_for("edit_page", uuid="first"),
data={
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': f"extra_browser_{custom_browser_name}",
'webdriver_js_execute_code': ''
},
follow_redirects=True
)
assert b"Updated watch." in res.data
wait_for_all_checks(client)
# Force recheck
res = client.get(url_for("form_watch_checknow"), follow_redirects=True)
assert b'1 watches queued for rechecking.' in res.data
wait_for_all_checks(client)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert b'cool it works' in res.data
# Requires playwright to be installed
def test_request_via_custom_browser_url(client, live_server):
live_server_setup(live_server)
# We do this so we can grep the logs of the custom container and see if the request actually went through that container
do_test(client, live_server, make_test_use_extra_browser=True)
def test_request_not_via_custom_browser_url(client, live_server):
live_server_setup(live_server)
# Grep the logs of the custom container to confirm the request did NOT go through it
do_test(client, live_server, make_test_use_extra_browser=False)

Binary file not shown.

Binary file not shown.

View File

@@ -1,4 +1,4 @@
from . util import live_server_setup, extract_UUID_from_client
from .util import live_server_setup, extract_UUID_from_client, wait_for_all_checks
from flask import url_for
import time
@@ -19,10 +19,16 @@ def test_check_access_control(app, client, live_server):
)
assert b"1 Imported" in res.data
time.sleep(2)
res = client.get(url_for("form_watch_checknow"), follow_redirects=True)
time.sleep(3)
# causes a 'Popped wrong request context.' error when client. is accessed?
#wait_for_all_checks(client)
res = c.get(url_for("form_watch_checknow"), follow_redirects=True)
assert b'1 watches queued for rechecking.' in res.data
time.sleep(2)
time.sleep(3)
# causes a 'Popped wrong request context.' error when client. is accessed?
#wait_for_all_checks(client)
# Enable password check and diff page access bypass
res = c.post(
@@ -42,7 +48,7 @@ def test_check_access_control(app, client, live_server):
assert b"Login" in res.data
# The diff page should return something valid when logged out
res = client.get(url_for("diff_history_page", uuid="first"))
res = c.get(url_for("diff_history_page", uuid="first"))
assert b'Random content' in res.data
# Check wrong password does not let us in
@@ -83,6 +89,8 @@ def test_check_access_control(app, client, live_server):
res = c.get(url_for("logout"),
follow_redirects=True)
assert b"Login" in res.data
res = c.get(url_for("settings_page"),
follow_redirects=True)
@@ -160,5 +168,5 @@ def test_check_access_control(app, client, live_server):
assert b"Login" in res.data
# The diff page should return something valid when logged out
res = client.get(url_for("diff_history_page", uuid="first"))
res = c.get(url_for("diff_history_page", uuid="first"))
assert b'Random content' not in res.data

View File

@@ -96,7 +96,9 @@ def test_api_simple(client, live_server):
)
assert watch_uuid in res.json.keys()
before_recheck_info = res.json[watch_uuid]
assert before_recheck_info['last_checked'] != 0
#705 `last_changed` should be zero on the first check
assert before_recheck_info['last_changed'] == 0
assert before_recheck_info['title'] == 'My test URL'
@@ -157,6 +159,18 @@ def test_api_simple(client, live_server):
# @todo how to handle None/default global values?
assert watch['history_n'] == 2, "Found replacement history section, which is in its own API"
assert watch.get('viewed') == False
# Loading the most recent snapshot should force viewed to become true
client.get(url_for("diff_history_page", uuid="first"), follow_redirects=True)
# Fetch the whole watch again, viewed should be true
res = client.get(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key}
)
watch = res.json
assert watch.get('viewed') == True
# basic systeminfo check
res = client.get(
url_for("systeminfo"),
@@ -343,3 +357,24 @@ def test_api_watch_PUT_update(client, live_server):
# Cleanup everything
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_api_import(client, live_server):
api_key = extract_api_key_from_UI(client)
res = client.post(
url_for("import") + "?tag=import-test",
data='https://website1.com\r\nhttps://website2.com',
headers={'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 200
assert len(res.json) == 2
res = client.get(url_for("index"))
assert b"https://website1.com" in res.data
assert b"https://website2.com" in res.data
# Should see the new tag in the tag/groups list
res = client.get(url_for('tags.tags_overview_page'))
assert b'import-test' in res.data
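
Outside the Flask test client, the same list-import endpoint can be driven with a plain HTTP request. A minimal sketch, assuming the route resolved by url_for("import") is exposed as /api/v1/import on a local instance (the path, host and key below are assumptions; adjust to your deployment):

import requests

api_key = 'YOUR-API-KEY'                          # assumption: copy from Settings > API
endpoint = 'http://localhost:5000/api/v1/import'  # assumption: the route behind url_for("import")

res = requests.post(
    endpoint,
    params={'tag': 'import-test'},
    headers={'x-api-key': api_key},
    data='https://website1.com\r\nhttps://website2.com',
)
res.raise_for_status()
# The test above expects a JSON list with one entry per imported URL
print(res.json())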

View File

@@ -24,7 +24,7 @@ def test_check_extract_text_from_diff(client, live_server):
)
assert b"1 Imported" in res.data
time.sleep(1)
wait_for_all_checks(client)
# Load in 5 different numbers/changes
last_date=""

View File

@@ -202,3 +202,32 @@ def test_check_filter_and_regex_extract(client, live_server):
# Should not be here
assert b'Some text that did change' not in res.data
def test_regex_error_handling(client, live_server):
#live_server_setup(live_server)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
### test regex error handling
res = client.post(
url_for("edit_page", uuid="first"),
data={"extract_text": '/something bad\d{3/XYZ',
"url": test_url,
"fetch_backend": "html_requests"},
follow_redirects=True
)
assert b'is not a valid regular expression.' in res.data
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data

View File

@@ -33,8 +33,6 @@ def test_strip_regex_text_func():
"/not"
]
fetcher = fetch_site_status.perform_site_check(datastore=False)
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines)
assert b"but 1 lines" in stripped_content

View File

@@ -24,7 +24,6 @@ def test_strip_text_func():
ignore_lines = ["sometimes"]
fetcher = fetch_site_status.perform_site_check(datastore=False)
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines)
assert b"sometimes" not in stripped_content

View File

@@ -1,16 +1,19 @@
#!/usr/bin/python3
import io
import os
import time
from flask import url_for
from .util import live_server_setup
from .util import live_server_setup, wait_for_all_checks
def test_setup(client, live_server):
live_server_setup(live_server)
def test_import(client, live_server):
# Give the endpoint time to spin up
time.sleep(1)
wait_for_all_checks(client)
res = client.post(
url_for("import_page"),
@@ -119,3 +122,97 @@ def test_import_distillio(client, live_server):
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
# Clear flask alerts
res = client.get(url_for("index"))
def test_import_custom_xlsx(client, live_server):
"""Test can upload a excel spreadsheet and the watches are created correctly"""
#live_server_setup(live_server)
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'import/spreadsheet.xlsx')
with open(filename, 'rb') as f:
data= {
'file_mapping': 'custom',
'custom_xlsx[col_0]': '1',
'custom_xlsx[col_1]': '3',
'custom_xlsx[col_2]': '5',
'custom_xlsx[col_3]': '4',
'custom_xlsx[col_type_0]': 'title',
'custom_xlsx[col_type_1]': 'url',
'custom_xlsx[col_type_2]': 'include_filters',
'custom_xlsx[col_type_3]': 'interval_minutes',
'xlsx_file': (io.BytesIO(f.read()), 'spreadsheet.xlsx')
}
res = client.post(
url_for("import_page"),
data=data,
follow_redirects=True,
)
assert b'4 imported from custom .xlsx' in res.data
# Because this row was actually just a header with no usable URL, we should get an error
assert b'Error processing row number 1' in res.data
res = client.get(
url_for("index")
)
assert b'Somesite results ABC' in res.data
assert b'City news results' in res.data
# Just find one to check over
for uuid, watch in live_server.app.config['DATASTORE'].data['watching'].items():
if watch.get('title') == 'Somesite results ABC':
filters = watch.get('include_filters')
assert filters[0] == '/html[1]/body[1]/div[4]/div[1]/div[1]/div[1]||//*[@id=\'content\']/div[3]/div[1]/div[1]||//*[@id=\'content\']/div[1]'
assert watch.get('time_between_check') == {'weeks': 0, 'days': 1, 'hours': 6, 'minutes': 24, 'seconds': 0}
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_import_watchete_xlsx(client, live_server):
"""Test can upload a excel spreadsheet and the watches are created correctly"""
#live_server_setup(live_server)
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'import/spreadsheet.xlsx')
with open(filename, 'rb') as f:
data= {
'file_mapping': 'wachete',
'xlsx_file': (io.BytesIO(f.read()), 'spreadsheet.xlsx')
}
res = client.post(
url_for("import_page"),
data=data,
follow_redirects=True,
)
assert b'4 imported from Wachete .xlsx' in res.data
res = client.get(
url_for("index")
)
assert b'Somesite results ABC' in res.data
assert b'City news results' in res.data
# Just find one to check over
for uuid, watch in live_server.app.config['DATASTORE'].data['watching'].items():
if watch.get('title') == 'Somesite results ABC':
filters = watch.get('include_filters')
assert filters[0] == '/html[1]/body[1]/div[4]/div[1]/div[1]/div[1]||//*[@id=\'content\']/div[3]/div[1]/div[1]||//*[@id=\'content\']/div[1]'
assert watch.get('time_between_check') == {'weeks': 0, 'days': 1, 'hours': 6, 'minutes': 24, 'seconds': 0}
assert watch.get('fetch_backend') == 'html_requests' # Has inactive 'dynamic wachet'
if watch.get('title') == 'JS website':
assert watch.get('fetch_backend') == 'html_webdriver' # Has active 'dynamic wachet'
if watch.get('title') == 'system default website':
assert watch.get('fetch_backend') == 'system' # uses default if blank
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data

View File

@@ -2,9 +2,8 @@
import time
from flask import url_for
from .util import set_original_response, set_modified_response, live_server_setup
from .util import set_original_response, set_modified_response, live_server_setup, wait_for_all_checks
sleep_time_for_fetch_thread = 3
# `subtractive_selectors` should still work in `source:` type requests
def test_fetch_pdf(client, live_server):
@@ -22,7 +21,9 @@ def test_fetch_pdf(client, live_server):
assert b"1 Imported" in res.data
time.sleep(sleep_time_for_fetch_thread)
wait_for_all_checks(client)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
@@ -33,8 +34,42 @@ def test_fetch_pdf(client, live_server):
# So we know if the file changes in other ways
import hashlib
md5 = hashlib.md5(open("test-datastore/endpoint-test.pdf", 'rb').read()).hexdigest().upper()
original_md5 = hashlib.md5(open("test-datastore/endpoint-test.pdf", 'rb').read()).hexdigest().upper()
# We should have one
assert len(md5) >0
assert len(original_md5) >0
# And it's going to be in the document
assert b'Document checksum - '+bytes(str(md5).encode('utf-8')) in res.data
assert b'Document checksum - '+bytes(str(original_md5).encode('utf-8')) in res.data
shutil.copy("tests/test2.pdf", "test-datastore/endpoint-test.pdf")
changed_md5 = hashlib.md5(open("test-datastore/endpoint-test.pdf", 'rb').read()).hexdigest().upper()
res = client.get(url_for("form_watch_checknow"), follow_redirects=True)
assert b'1 watches queued for rechecking.' in res.data
wait_for_all_checks(client)
# Now something should be ready, indicated by having a 'unviewed' class
res = client.get(url_for("index"))
assert b'unviewed' in res.data
# The original checksum should not be here anymore (cdio adds it to the bottom of the text)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert original_md5.encode('utf-8') not in res.data
assert changed_md5.encode('utf-8') in res.data
res = client.get(
url_for("diff_history_page", uuid="first"),
follow_redirects=True
)
assert original_md5.encode('utf-8') in res.data
assert changed_md5.encode('utf-8') in res.data
assert b'here is a change' in res.data

View File

@@ -80,8 +80,11 @@ def test_headers_in_request(client, live_server):
# Should be only one with headers set
assert watches_with_headers==1
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_body_in_request(client, live_server):
# Add our URL to the import page
test_url = url_for('test_body', _external=True)
if os.getenv('PLAYWRIGHT_DRIVER_URL'):
@@ -170,7 +173,8 @@ def test_body_in_request(client, live_server):
follow_redirects=True
)
assert b"Body must be empty when Request Method is set to GET" in res.data
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_method_in_request(client, live_server):
# Add our URL to the import page

View File

@@ -1,5 +1,5 @@
from flask import url_for
from . util import set_original_response, set_modified_response, live_server_setup
from .util import set_original_response, set_modified_response, live_server_setup, wait_for_all_checks
import time
@@ -12,6 +12,7 @@ def test_bad_access(client, live_server):
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
# Attempt to add a body with a GET method
res = client.post(
@@ -59,7 +60,7 @@ def test_bad_access(client, live_server):
data={"url": 'file:///tasty/disk/drive', "tags": ''},
follow_redirects=True
)
time.sleep(1)
wait_for_all_checks(client)
res = client.get(url_for("index"))
assert b'file:// type access is denied for security reasons.' in res.data

View File

@@ -6,9 +6,11 @@ from .util import live_server_setup, wait_for_all_checks
from ..html_tools import *
def test_setup(live_server):
live_server_setup(live_server)
def set_original_response():
test_return_data = """<html>
<body>
@@ -26,6 +28,7 @@ def set_original_response():
f.write(test_return_data)
return None
def set_modified_response():
test_return_data = """<html>
<body>
@@ -44,11 +47,12 @@ def set_modified_response():
return None
# Handle utf-8 charset replies https://github.com/dgtlmoon/changedetection.io/pull/613
def test_check_xpath_filter_utf8(client, live_server):
filter='//item/*[self::description]'
filter = '//item/*[self::description]'
d='''<?xml version="1.0" encoding="UTF-8"?>
d = '''<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>rpilocator.com</title>
@@ -102,9 +106,9 @@ def test_check_xpath_filter_utf8(client, live_server):
# Handle utf-8 charset replies https://github.com/dgtlmoon/changedetection.io/pull/613
def test_check_xpath_text_function_utf8(client, live_server):
filter='//item/title/text()'
filter = '//item/title/text()'
d='''<?xml version="1.0" encoding="UTF-8"?>
d = '''<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>rpilocator.com</title>
@@ -163,15 +167,12 @@ def test_check_xpath_text_function_utf8(client, live_server):
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_check_markup_xpath_filter_restriction(client, live_server):
def test_check_markup_xpath_filter_restriction(client, live_server):
xpath_filter = "//*[contains(@class, 'sametext')]"
set_original_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
@@ -214,7 +215,6 @@ def test_check_markup_xpath_filter_restriction(client, live_server):
def test_xpath_validation(client, live_server):
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
@@ -235,6 +235,48 @@ def test_xpath_validation(client, live_server):
assert b'Deleted' in res.data
def test_xpath23_prefix_validation(client, live_server):
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
res = client.post(
url_for("edit_page", uuid="first"),
data={"include_filters": "xpath:/something horrible", "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"is not a valid XPath expression" in res.data
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_xpath1_validation(client, live_server):
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
res = client.post(
url_for("edit_page", uuid="first"),
data={"include_filters": "xpath1:/something horrible", "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"is not a valid XPath expression" in res.data
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
# actually only really used by the distill.io importer, but could be handy too
def test_check_with_prefix_include_filters(client, live_server):
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
@@ -254,7 +296,8 @@ def test_check_with_prefix_include_filters(client, live_server):
res = client.post(
url_for("edit_page", uuid="first"),
data={"include_filters": "xpath://*[contains(@class, 'sametext')]", "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests"},
data={"include_filters": "xpath://*[contains(@class, 'sametext')]", "url": test_url, "tags": "", "headers": "",
'fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -266,13 +309,15 @@ def test_check_with_prefix_include_filters(client, live_server):
follow_redirects=True
)
assert b"Some text thats the same" in res.data #in selector
assert b"Some text that will change" not in res.data #not in selector
assert b"Some text thats the same" in res.data # in selector
assert b"Some text that will change" not in res.data # not in selector
client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
def test_various_rules(client, live_server):
# Just check these don't error
#live_server_setup(live_server)
# live_server_setup(live_server)
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write("""<html>
<body>
@@ -285,10 +330,11 @@ def test_various_rules(client, live_server):
<a href=''>some linky </a>
<a href=''>another some linky </a>
<!-- related to https://github.com/dgtlmoon/changedetection.io/pull/1774 -->
<input type="email" id="email" />
<input type="email" id="email" />
</body>
</html>
""")
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
@@ -298,7 +344,6 @@ def test_various_rules(client, live_server):
assert b"1 Imported" in res.data
wait_for_all_checks(client)
for r in ['//div', '//a', 'xpath://div', 'xpath://a']:
res = client.post(
url_for("edit_page", uuid="first"),
@@ -313,3 +358,153 @@ def test_various_rules(client, live_server):
assert b"Updated watch." in res.data
res = client.get(url_for("index"))
assert b'fetch-error' not in res.data, f"Should not see errors after '{r}' filter"
res = client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_xpath_20(client, live_server):
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
set_original_response()
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("edit_page", uuid="first"),
data={"include_filters": "//*[contains(@class, 'sametext')]|//*[contains(@class, 'changetext')]",
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
wait_for_all_checks(client)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert b"Some text thats the same" in res.data # in selector
assert b"Some text that will change" in res.data # in selector
client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
def test_xpath_20_function_count(client, live_server):
set_original_response()
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
res = client.post(
url_for("edit_page", uuid="first"),
data={"include_filters": "xpath:count(//div) * 123456789987654321",
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
wait_for_all_checks(client)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert b"246913579975308642" in res.data # in selector
client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
def test_xpath_20_function_count2(client, live_server):
set_original_response()
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
res = client.post(
url_for("edit_page", uuid="first"),
data={"include_filters": "/html/body/count(div) * 123456789987654321",
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
wait_for_all_checks(client)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert b"246913579975308642" in res.data # in selector
client.get(url_for("form_delete", uuid="all"), follow_redirects=True)
def test_xpath_20_function_string_join_matches(client, live_server):
set_original_response()
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
res = client.post(
url_for("edit_page", uuid="first"),
data={
"include_filters": "xpath:string-join(//*[contains(@class, 'sametext')]|//*[matches(@class, 'changetext')], 'specialconjunction')",
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
wait_for_all_checks(client)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert b"Some text thats the samespecialconjunctionSome text that will change" in res.data # in selector
client.get(url_for("form_delete", uuid="all"), follow_redirects=True)

View File

@@ -0,0 +1,203 @@
import sys
import os
import pytest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import html_tools
# test generation guide.
# 1. Do not include encoding in the xml declaration if the test object is a str type.
# 2. Always paraphrase test.
hotels = """
<hotel>
<branch location="California">
<staff>
<given_name>Christopher</given_name>
<surname>Anderson</surname>
<age>25</age>
</staff>
<staff>
<given_name>Christopher</given_name>
<surname>Carter</surname>
<age>30</age>
</staff>
</branch>
<branch location="Las Vegas">
<staff>
<given_name>Lisa</given_name>
<surname>Walker</surname>
<age>60</age>
</staff>
<staff>
<given_name>Jessica</given_name>
<surname>Walker</surname>
<age>32</age>
</staff>
<staff>
<given_name>Jennifer</given_name>
<surname>Roberts</surname>
<age>50</age>
</staff>
</branch>
</hotel>"""
@pytest.mark.parametrize("html_content", [hotels])
@pytest.mark.parametrize("xpath, answer", [('(//staff/given_name, //staff/age)', '25'),
("xs:date('2023-10-10')", '2023-10-10'),
("if (/hotel/branch[@location = 'California']/staff[1]/age = 25) then 'is 25' else 'is not 25'", 'is 25'),
("if (//hotel/branch[@location = 'California']/staff[1]/age = 25) then 'is 25' else 'is not 25'", 'is 25'),
("if (count(/hotel/branch/staff) = 5) then true() else false()", 'true'),
("if (count(//hotel/branch/staff) = 5) then true() else false()", 'true'),
("for $i in /hotel/branch/staff return if ($i/age >= 40) then upper-case($i/surname) else lower-case($i/surname)", 'anderson'),
("given_name = 'Christopher' and age = 40", 'false'),
("//given_name = 'Christopher' and //age = 40", 'false'),
#("(staff/given_name, staff/age)", 'Lisa'),
("(//staff/given_name, //staff/age)", 'Lisa'),
#("hotel/branch[@location = 'California']/staff/age union hotel/branch[@location = 'Las Vegas']/staff/age", ''),
("(//hotel/branch[@location = 'California']/staff/age union //hotel/branch[@location = 'Las Vegas']/staff/age)", '60'),
("(200 to 210)", "205"),
("(//hotel/branch[@location = 'California']/staff/age union //hotel/branch[@location = 'Las Vegas']/staff/age)", "50"),
("(1, 9, 9, 5)", "5"),
("(3, (), (14, 15), 92, 653)", "653"),
("for $i in /hotel/branch/staff return $i/given_name", "Christopher"),
("for $i in //hotel/branch/staff return $i/given_name", "Christopher"),
("distinct-values(for $i in /hotel/branch/staff return $i/given_name)", "Jessica"),
("distinct-values(for $i in //hotel/branch/staff return $i/given_name)", "Jessica"),
("for $i in (7 to 15) return $i*10", "130"),
("some $i in /hotel/branch/staff satisfies $i/age < 20", "false"),
("some $i in //hotel/branch/staff satisfies $i/age < 20", "false"),
("every $i in /hotel/branch/staff satisfies $i/age > 20", "true"),
("every $i in //hotel/branch/staff satisfies $i/age > 20 ", "true"),
("let $x := branch[@location = 'California'], $y := branch[@location = 'Las Vegas'] return (avg($x/staff/age), avg($y/staff/age))", "27.5"),
("let $x := //branch[@location = 'California'], $y := //branch[@location = 'Las Vegas'] return (avg($x/staff/age), avg($y/staff/age))", "27.5"),
("let $nu := 1, $de := 1000 return 'probability = ' || $nu div $de * 100 || '%'", "0.1%"),
("let $nu := 2, $probability := function ($argument) { 'probability = ' || $nu div $argument * 100 || '%'}, $de := 5 return $probability($de)", "40%"),
("'XPATH2.0-3.1 dissemination' instance of xs:string ", "true"),
("'new stackoverflow question incoming' instance of xs:integer ", "false"),
("'50000' cast as xs:integer", "50000"),
("//branch[@location = 'California']/staff[1]/surname eq 'Anderson'", "true"),
("fn:false()", "false")])
def test_hotels(html_content, xpath, answer):
html_content = html_tools.xpath_filter(xpath, html_content, append_pretty_line_formatting=True)
assert type(html_content) == str
assert answer in html_content
branches_to_visit = """<?xml version="1.0" ?>
<branches_to_visit>
<manager name="Godot" room_no="501">
<branch>Area 51</branch>
<branch>A place with no name</branch>
<branch>Stalsk12</branch>
</manager>
<manager name="Freya" room_no="305">
<branch>Stalsk12</branch>
<branch>Barcelona</branch>
<branch>Paris</branch>
</manager>
</branches_to_visit>"""
@pytest.mark.parametrize("html_content", [branches_to_visit])
@pytest.mark.parametrize("xpath, answer", [
("manager[@name = 'Godot']/branch union manager[@name = 'Freya']/branch", "Area 51"),
("//manager[@name = 'Godot']/branch union //manager[@name = 'Freya']/branch", "Stalsk12"),
("manager[@name = 'Godot']/branch | manager[@name = 'Freya']/branch", "Stalsk12"),
("//manager[@name = 'Godot']/branch | //manager[@name = 'Freya']/branch", "Stalsk12"),
("manager/branch intersect manager[@name = 'Godot']/branch", "A place with no name"),
("//manager/branch intersect //manager[@name = 'Godot']/branch", "A place with no name"),
("manager[@name = 'Godot']/branch intersect manager[@name = 'Freya']/branch", ""),
("manager/branch except manager[@name = 'Godot']/branch", "Barcelona"),
("manager[@name = 'Godot']/branch[1] eq 'Area 51'", "true"),
("//manager[@name = 'Godot']/branch[1] eq 'Area 51'", "true"),
("manager[@name = 'Godot']/branch[1] eq 'Seoul'", "false"),
("//manager[@name = 'Godot']/branch[1] eq 'Seoul'", "false"),
("manager[@name = 'Godot']/branch[2] eq manager[@name = 'Freya']/branch[2]", "false"),
("//manager[@name = 'Godot']/branch[2] eq //manager[@name = 'Freya']/branch[2]", "false"),
("manager[1]/@room_no lt manager[2]/@room_no", "false"),
("//manager[1]/@room_no lt //manager[2]/@room_no", "false"),
("manager[1]/@room_no gt manager[2]/@room_no", "true"),
("//manager[1]/@room_no gt //manager[2]/@room_no", "true"),
("manager[@name = 'Godot']/branch[1] = 'Area 51'", "true"),
("//manager[@name = 'Godot']/branch[1] = 'Area 51'", "true"),
("manager[@name = 'Godot']/branch[1] = 'Seoul'", "false"),
("//manager[@name = 'Godot']/branch[1] = 'Seoul'", "false"),
("manager[@name = 'Godot']/branch = 'Area 51'", "true"),
("//manager[@name = 'Godot']/branch = 'Area 51'", "true"),
("manager[@name = 'Godot']/branch = 'Barcelona'", "false"),
("//manager[@name = 'Godot']/branch = 'Barcelona'", "false"),
("manager[1]/@room_no > manager[2]/@room_no", "true"),
("//manager[1]/@room_no > //manager[2]/@room_no", "true"),
("manager[@name = 'Godot']/branch[ . = 'Stalsk12'] is manager[1]/branch[1]", "false"),
("//manager[@name = 'Godot']/branch[ . = 'Stalsk12'] is //manager[1]/branch[1]", "false"),
("manager[@name = 'Godot']/branch[ . = 'Stalsk12'] is manager[1]/branch[3]", "true"),
("//manager[@name = 'Godot']/branch[ . = 'Stalsk12'] is //manager[1]/branch[3]", "true"),
("manager[@name = 'Godot']/branch[ . = 'Stalsk12'] << manager[1]/branch[1]", "false"),
("//manager[@name = 'Godot']/branch[ . = 'Stalsk12'] << //manager[1]/branch[1]", "false"),
("manager[@name = 'Godot']/branch[ . = 'Stalsk12'] >> manager[1]/branch[1]", "true"),
("//manager[@name = 'Godot']/branch[ . = 'Stalsk12'] >> //manager[1]/branch[1]", "true"),
("manager[@name = 'Godot']/branch[ . = 'Stalsk12'] is manager[@name = 'Freya']/branch[ . = 'Stalsk12']", "false"),
("//manager[@name = 'Godot']/branch[ . = 'Stalsk12'] is //manager[@name = 'Freya']/branch[ . = 'Stalsk12']", "false"),
("manager[1]/@name || manager[2]/@name", "GodotFreya"),
("//manager[1]/@name || //manager[2]/@name", "GodotFreya"),
])
def test_branches_to_visit(html_content, xpath, answer):
html_content = html_tools.xpath_filter(xpath, html_content, append_pretty_line_formatting=True)
assert type(html_content) == str
assert answer in html_content
trips = """
<trips>
<trip reservation_number="10">
<depart>2023-10-06</depart>
<arrive>2023-10-10</arrive>
<traveler name="Christopher Anderson">
<duration>4</duration>
<price>2000.00</price>
</traveler>
</trip>
<trip reservation_number="12">
<depart>2023-10-06</depart>
<arrive>2023-10-12</arrive>
<traveler name="Frank Carter">
<duration>6</duration>
<price>3500.34</price>
</traveler>
</trip>
</trips>"""
@pytest.mark.parametrize("html_content", [trips])
@pytest.mark.parametrize("xpath, answer", [
("1 + 9 * 9 + 5 div 5", "83"),
("(1 + 9 * 9 + 5) div 6", "14.5"),
("23 idiv 3", "7"),
("23 div 3", "7.66666666"),
("for $i in ./trip return $i/traveler/duration * $i/traveler/price", "21002.04"),
("for $i in ./trip return $i/traveler/duration ", "4"),
("for $i in .//trip return $i/traveler/duration * $i/traveler/price", "21002.04"),
("sum(for $i in ./trip return $i/traveler/duration * $i/traveler/price)", "29002.04"),
("sum(for $i in .//trip return $i/traveler/duration * $i/traveler/price)", "29002.04"),
#("trip[1]/depart - trip[1]/arrive", "fail_to_get_answer"),
#("//trip[1]/depart - //trip[1]/arrive", "fail_to_get_answer"),
#("trip[1]/depart + trip[1]/arrive", "fail_to_get_answer"),
#("xs:date(trip[1]/depart) + xs:date(trip[1]/arrive)", "fail_to_get_answer"),
("(//trip[1]/arrive cast as xs:date) - (//trip[1]/depart cast as xs:date)", "P4D"),
("(//trip[1]/depart cast as xs:date) - (//trip[1]/arrive cast as xs:date)", "-P4D"),
("(//trip[1]/depart cast as xs:date) + xs:dayTimeDuration('P3D')", "2023-10-09"),
("(//trip[1]/depart cast as xs:date) - xs:dayTimeDuration('P3D')", "2023-10-03"),
("(456, 623) instance of xs:integer", "false"),
("(456, 623) instance of xs:integer*", "true"),
("/trips/trip instance of element()", "false"),
("/trips/trip instance of element()*", "true"),
("/trips/trip[1]/arrive instance of xs:date", "false"),
("date(/trips/trip[1]/arrive) instance of xs:date", "true"),
("'8' cast as xs:integer", "8"),
("'11.1E3' cast as xs:double", "11100"),
("6.5 cast as xs:integer", "6"),
#("/trips/trip[1]/arrive cast as xs:dateTime", "fail_to_get_answer"),
("/trips/trip[1]/arrive cast as xs:date", "2023-10-10"),
("('2023-10-12') cast as xs:date", "2023-10-12"),
("for $i in //trip return concat($i/depart, ' ', $i/arrive)", "2023-10-06 2023-10-10"),
])
def test_trips(html_content, xpath, answer):
html_content = html_tools.xpath_filter(xpath, html_content, append_pretty_line_formatting=True)
assert type(html_content) == str
assert answer in html_content

View File

@@ -0,0 +1,54 @@
#!/usr/bin/python3
# run from dir above changedetectionio/ dir
# python3 -m unittest changedetectionio.tests.unit.test_notification_diff
import unittest
import os
from changedetectionio.model import Watch
# mostly
class TestDiffBuilder(unittest.TestCase):
def test_watch_get_suggested_from_diff_timestamp(self):
import uuid as uuid_builder
watch = Watch.model(datastore_path='/tmp', default={})
watch.ensure_data_dir_exists()
watch['last_viewed'] = 110
watch.save_history_text(contents=b"hello world", timestamp=100, snapshot_id=str(uuid_builder.uuid4()))
watch.save_history_text(contents=b"hello world", timestamp=105, snapshot_id=str(uuid_builder.uuid4()))
watch.save_history_text(contents=b"hello world", timestamp=109, snapshot_id=str(uuid_builder.uuid4()))
watch.save_history_text(contents=b"hello world", timestamp=112, snapshot_id=str(uuid_builder.uuid4()))
watch.save_history_text(contents=b"hello world", timestamp=115, snapshot_id=str(uuid_builder.uuid4()))
watch.save_history_text(contents=b"hello world", timestamp=117, snapshot_id=str(uuid_builder.uuid4()))
p = watch.get_next_snapshot_key_to_last_viewed
assert p == "112", "Correct last-viewed timestamp was detected"
# When there is only one step of difference from the end of the list, it should return second-last change
watch['last_viewed'] = 116
p = watch.get_next_snapshot_key_to_last_viewed
assert p == "115", "Correct 'second last' last-viewed timestamp was detected when using the last timestamp"
watch['last_viewed'] = 99
p = watch.get_next_snapshot_key_to_last_viewed
assert p == "100"
watch['last_viewed'] = 200
p = watch.get_next_snapshot_key_to_last_viewed
assert p == "115", "When the 'last viewed' timestamp is greater than the newest snapshot, return second last "
watch['last_viewed'] = 109
p = watch.get_next_snapshot_key_to_last_viewed
assert p == "109", "Correct when its the same time"
# new empty one
watch = Watch.model(datastore_path='/tmp', default={})
p = watch.get_next_snapshot_key_to_last_viewed
assert p == None, "None when no history available"
if __name__ == '__main__':
unittest.main()

View File

@@ -1,18 +1,19 @@
#!/usr/bin/python3
import time
import os
from flask import url_for
from ..util import live_server_setup, wait_for_all_checks, extract_UUID_from_client
def test_setup(client, live_server):
live_server_setup(live_server)
# Add a site in paused mode, add an invalid filter, we should still have visual selector data ready
def test_visual_selector_content_ready(client, live_server):
import os
import json
assert os.getenv('PLAYWRIGHT_DRIVER_URL'), "Needs PLAYWRIGHT_DRIVER_URL set for this test"
time.sleep(1)
live_server_setup(live_server)
# Add our URL to the import page, because the docker container (playwright/selenium) won't be able to connect to our usual test URL
test_url = "https://changedetection.io/ci-test/test-runjs.html"
@@ -53,6 +54,13 @@ def test_visual_selector_content_ready(client, live_server):
with open(os.path.join('test-datastore', uuid, 'elements.json'), 'r') as f:
json.load(f)
# Attempt to fetch it via the web hook that the browser would use
res = client.get(url_for('static_content', group='visual_selector_data', filename=uuid))
json.loads(res.data)
assert res.mimetype == 'application/json'
assert res.status_code == 200
# Some options should be enabled
# @todo - in the future, the visibility should be toggled by JS from the request type setting
res = client.get(
@@ -60,4 +68,75 @@ def test_visual_selector_content_ready(client, live_server):
follow_redirects=True
)
assert b'notification_screenshot' in res.data
client.get(
url_for("form_delete", uuid="all"),
follow_redirects=True
)
def test_basic_browserstep(client, live_server):
assert os.getenv('PLAYWRIGHT_DRIVER_URL'), "Needs PLAYWRIGHT_DRIVER_URL set for this test"
#live_server_setup(live_server)
# Add our URL to the import page, because the docker container (playwright/selenium) won't be able to connect to our usual test URL
test_url = "https://changedetection.io/ci-test/test-runjs.html"
res = client.post(
url_for("form_quick_watch_add"),
data={"url": test_url, "tags": '', 'edit_and_watch_submit_button': 'Edit > Watch'},
follow_redirects=True
)
assert b"Watch added in Paused state, saving will unpause" in res.data
res = client.post(
url_for("edit_page", uuid="first", unpause_on_save=1),
data={
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_webdriver",
'browser_steps-0-operation': 'Goto site',
'browser_steps-1-operation': 'Click element',
'browser_steps-1-selector': 'button[name=test-button]',
'browser_steps-1-optional_value': ''
},
follow_redirects=True
)
assert b"unpaused" in res.data
wait_for_all_checks(client)
uuid = extract_UUID_from_client(client)
# Check HTML conversion was detected and worked
res = client.get(
url_for("preview_page", uuid=uuid),
follow_redirects=True
)
assert b"This text should be removed" not in res.data
assert b"I smell JavaScript because the button was pressed" in res.data
# now test for 404 errors
res = client.post(
url_for("edit_page", uuid=uuid, unpause_on_save=1),
data={
"url": "https://changedetection.io/404",
"tags": "",
"headers": "",
'fetch_backend': "html_webdriver",
'browser_steps-0-operation': 'Goto site',
'browser_steps-1-operation': 'Click element',
'browser_steps-1-selector': 'button[name=test-button]',
'browser_steps-1-optional_value': ''
},
follow_redirects=True
)
assert b"unpaused" in res.data
wait_for_all_checks(client)
res = client.get(url_for("index"))
assert b'Error - 404' in res.data
client.get(
url_for("form_delete", uuid="all"),
follow_redirects=True
)

View File

@@ -209,6 +209,7 @@ class update_worker(threading.Thread):
from .processors import text_json_diff, restock_diff
while not self.app.config.exit.is_set():
update_handler = None
try:
queued_item_data = self.q.get(block=False)
@@ -229,17 +230,35 @@ class update_worker(threading.Thread):
now = time.time()
try:
processor = self.datastore.data['watching'][uuid].get('processor','text_json_diff')
# Processor is what we are using for detecting the "Change"
processor = self.datastore.data['watching'][uuid].get('processor', 'text_json_diff')
# if system...
# Abort processing when the content was the same as the last fetch
skip_when_same_checksum = queued_item_data.item.get('skip_when_checksum_same')
# @todo some way to switch by name
# Init a new 'difference_detection_processor'
if processor == 'restock_diff':
update_handler = restock_diff.perform_site_check(datastore=self.datastore)
update_handler = restock_diff.perform_site_check(datastore=self.datastore,
watch_uuid=uuid
)
else:
# Used as a default and also by some tests
update_handler = text_json_diff.perform_site_check(datastore=self.datastore)
update_handler = text_json_diff.perform_site_check(datastore=self.datastore,
watch_uuid=uuid
)
# Clear last errors (move to preflight func?)
self.datastore.data['watching'][uuid]['browser_steps_last_error_step'] = None
changed_detected, update_obj, contents = update_handler.run(uuid, skip_when_checksum_same=queued_item_data.item.get('skip_when_checksum_same'))
update_handler.call_browser()
changed_detected, update_obj, contents = update_handler.run_changedetection(uuid,
skip_when_checksum_same=skip_when_same_checksum,
)
# Re #342
# In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.
@@ -391,6 +410,9 @@ class update_worker(threading.Thread):
self.datastore.update_watch(uuid=uuid, update_obj={'last_error': str(e)})
# Other serious error
process_changedetection_results = False
# import traceback
# print(traceback.format_exc())
else:
# Crash protection, the watch entry could have been removed by this point (during a slow chrome fetch etc)
if not self.datastore.data['watching'].get(uuid):

View File

@@ -66,25 +66,12 @@ services:
# browser-chrome:
# condition: service_started
# browser-chrome:
# hostname: browser-chrome
# image: selenium/standalone-chrome-debug:3.141.59
# environment:
# - VNC_NO_PASSWORD=1
# - SCREEN_WIDTH=1920
# - SCREEN_HEIGHT=1080
# - SCREEN_DEPTH=24
# volumes:
# # Workaround to avoid the browser crashing inside a docker container
# # See https://github.com/SeleniumHQ/docker-selenium#quick-start
# - /dev/shm:/dev/shm
# restart: unless-stopped
# Used for fetching pages via Playwright+Chrome where you need Javascript support.
# Note: Playwright/browserless not supported on ARM type devices (rPi etc)
# RECOMMENDED FOR FETCHING PAGES WITH CHROME
# playwright-chrome:
# hostname: playwright-chrome
# image: browserless/chrome
# image: browserless/chrome:1.60-chrome-stable
# restart: unless-stopped
# environment:
# - SCREEN_WIDTH=1920
@@ -101,6 +88,23 @@ services:
# Ignore HTTPS errors, like for self-signed certs
# - DEFAULT_IGNORE_HTTPS_ERRORS=true
#
# Used for fetching pages via Playwright+Chrome where you need Javascript support.
# Note: works well but is deprecated, doesn't fetch full-page screenshots and has other issues
# browser-chrome:
# hostname: browser-chrome
# image: selenium/standalone-chrome:4
# environment:
# - VNC_NO_PASSWORD=1
# - SCREEN_WIDTH=1920
# - SCREEN_HEIGHT=1080
# - SCREEN_DEPTH=24
# volumes:
# # Workaround to avoid the browser crashing inside a docker container
# # See https://github.com/SeleniumHQ/docker-selenium#quick-start
# - /dev/shm:/dev/shm
# restart: unless-stopped
volumes:
changedetection-data:

View File

@@ -1,12 +1,13 @@
eventlet>=0.31.0
eventlet>=0.33.3 # related to dnspython fixes
feedgen~=0.9
flask-compress
flask-login~=0.5
# 0.6.3 included compatibility fix for werkzeug 3.x (2.x had deprecation of url handlers)
flask-login>=0.6.3
flask-paginate
flask_expects_json~=1.7
flask_restful
flask_wtf
flask~=2.0
flask_wtf~=1.2
flask~=2.3
inscriptis~=2.2
pytz
timeago~=1.0
@@ -24,11 +25,7 @@ chardet>2.3.0
wtforms~=3.0
jsonpath-ng~=1.5.3
# dnspython 2.3.0 is not compatible with eventlet
# * https://github.com/eventlet/eventlet/issues/781
# * https://datastax-oss.atlassian.net/browse/PYTHON-1320
dnspython<2.3.0
dnspython~=2.4 # related to eventlet fixes
# jq not available on Windows so must be installed manually
@@ -49,18 +46,17 @@ beautifulsoup4
# XPath filtering, lxml is required by bs4 anyway, but put it here to be safe.
lxml
# 3.141 was missing socksVersion, 3.150 was not in pypi, so we try 4.1.0
selenium~=4.1.0
# XPath 2.0-3.1 support
elementpath
# https://stackoverflow.com/questions/71652965/importerror-cannot-import-name-safe-str-cmp-from-werkzeug-security/71653849#71653849
# ImportError: cannot import name 'safe_str_cmp' from 'werkzeug.security'
# need to revisit flask login versions
werkzeug~=2.0.0
selenium~=4.14.0
werkzeug~=3.0
# Templating, so far just in the URLs but in the future can be for the notifications also
jinja2~=3.1
jinja2-time
openpyxl
# https://peps.python.org/pep-0508/#environment-markers
# https://github.com/dgtlmoon/changedetection.io/pull/1009
jq~=1.3; python_version >= "3.8" and sys_platform == "darwin"