Compare commits

4 Commits

Author    SHA1        Message                   Date
dgtlmoon  fecd181e07  oops                      2026-03-02 11:15:12 +01:00
dgtlmoon  525e390523  test env tweaks           2026-03-02 11:10:28 +01:00
dgtlmoon  7fe332ad95  Small fix for 3.14 setup  2026-03-02 10:37:02 +01:00
dgtlmoon  b65a01ec02  Python 3.14 test #3662    2026-03-02 10:37:02 +01:00

CI (push): all listed checks for fecd181e07 and 7fe332ad95 were cancelled (PyPI publish, container build tests, lint-code, test-application-3-10 through 3-14).
138 changed files with 1443 additions and 7770 deletions
+9 -26
@@ -66,27 +66,27 @@ jobs:
echo ${{ github.ref }} > changedetectionio/tag.txt
- name: Set up QEMU
uses: docker/setup-qemu-action@v4
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:latest
platforms: all
- name: Login to GitHub Container Registry
uses: docker/login-action@v4
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Login to Docker Hub Container Registry
uses: docker/login-action@v4
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_HUB_USERNAME }}
password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v4
uses: docker/setup-buildx-action@v3
with:
install: true
version: latest
@@ -95,7 +95,7 @@ jobs:
# master branch -> :dev container tag
- name: Docker meta :dev
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/metadata-action@v6
uses: docker/metadata-action@v5
id: meta_dev
with:
images: |
@@ -103,19 +103,11 @@ jobs:
ghcr.io/${{ github.repository }}
tags: |
type=raw,value=dev
labels: |
org.opencontainers.image.created=${{ github.event.release.published_at }}
org.opencontainers.image.description=Website, webpage change detection, monitoring and notifications.
org.opencontainers.image.documentation=https://changedetection.io
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.source=https://github.com/dgtlmoon/changedetection.io
org.opencontainers.image.title=changedetection.io
org.opencontainers.image.url=https://changedetection.io
- name: Build and push :dev
id: docker_build
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/build-push-action@v7
uses: docker/build-push-action@v6
with:
context: ./
file: ./Dockerfile
@@ -136,10 +128,10 @@ jobs:
echo "Release tag: ${{ github.event.release.tag_name }}"
echo "Github ref: ${{ github.ref }}"
echo "Github ref name: ${{ github.ref_name }}"
- name: Docker meta :tag
if: github.event_name == 'release' && startsWith(github.event.release.tag_name, '0.')
uses: docker/metadata-action@v6
uses: docker/metadata-action@v5
id: meta
with:
images: |
@@ -150,20 +142,11 @@ jobs:
type=semver,pattern={{major}}.{{minor}},value=${{ github.event.release.tag_name }}
type=semver,pattern={{major}},value=${{ github.event.release.tag_name }}
type=raw,value=latest
labels: |
org.opencontainers.image.created=${{ github.event.release.published_at }}
org.opencontainers.image.description=Website, webpage change detection, monitoring and notifications.
org.opencontainers.image.documentation=https://changedetection.io
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.source=https://github.com/dgtlmoon/changedetection.io
org.opencontainers.image.title=changedetection.io
org.opencontainers.image.url=https://changedetection.io
org.opencontainers.image.version=${{ github.event.release.tag_name }}
- name: Build and push :tag
id: docker_build_tag_release
if: github.event_name == 'release' && startsWith(github.event.release.tag_name, '0.')
uses: docker/build-push-action@v7
uses: docker/build-push-action@v6
with:
context: ./
file: ./Dockerfile
+3 -3
@@ -60,14 +60,14 @@ jobs:
# Just test that the build works, some libraries won't compile on ARM/rPi etc
- name: Set up QEMU
uses: docker/setup-qemu-action@v4
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:latest
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v4
uses: docker/setup-buildx-action@v3
with:
install: true
version: latest
@@ -75,7 +75,7 @@ jobs:
- name: Test that the docker containers can build (${{ matrix.platform }} - ${{ matrix.dockerfile }})
id: docker_build
uses: docker/build-push-action@v7
uses: docker/build-push-action@v6
# https://github.com/docker/build-push-action#customizing
with:
context: ./
@@ -42,10 +42,10 @@ jobs:
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
uses: docker/setup-buildx-action@v3
- name: Build changedetection.io container for testing under Python ${{ env.PYTHON_VERSION }}
uses: docker/build-push-action@v7
uses: docker/build-push-action@v6
with:
context: ./
file: ./Dockerfile
@@ -292,8 +292,8 @@ jobs:
- name: Specific tests in built container for Selenium
run: |
docker run --name "changedet" --hostname changedet --rm -e "FLASK_SERVER_NAME=changedet" -e "WEBDRIVER_URL=http://selenium:4444/wd/hub" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio; pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/fetchers/test_content.py'
docker run --name "changedet" --hostname changedet --rm -e "FLASK_SERVER_NAME=changedet" -e "WEBDRIVER_URL=http://selenium:4444/wd/hub" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio; pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/test_errorhandling.py'
docker run --rm -e "WEBDRIVER_URL=http://selenium:4444/wd/hub" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio;pytest tests/fetchers/test_content.py && pytest tests/test_errorhandling.py'
# SMTP tests
smtp-tests:
+2 -15
@@ -2,7 +2,7 @@
# Read more https://github.com/dgtlmoon/changedetection.io/wiki
# Semver means never use .01, or 00. Should be .1.
__version__ = '0.54.7'
__version__ = '0.54.3'
from changedetectionio.strtobool import strtobool
from json.decoder import JSONDecodeError
@@ -10,6 +10,7 @@ from json.decoder import JSONDecodeError
from loguru import logger
import getopt
import logging
import os
import platform
import signal
import threading
@@ -60,22 +61,8 @@ import time
# ==============================================================================
import multiprocessing
import os
import sys
# Limit glibc malloc arena count to prevent RSS growth from concurrent requests.
# Default: glibc creates up to 8×CPU_cores arenas. Each concurrent thread/connection
# can trigger a new arena, and freed memory stays mapped in those arenas as RSS forever.
# With MALLOC_ARENA_MAX=2, at most 2 arenas are used; freed pages return to the OS faster.
# Must be set before worker threads start; env var is read lazily by glibc on first arena creation.
if 'MALLOC_ARENA_MAX' not in os.environ:
os.environ['MALLOC_ARENA_MAX'] = '2'
try:
import ctypes as _ctypes
_ctypes.CDLL('libc.so.6').mallopt(-8, 2) # M_ARENA_MAX = -8
except Exception:
pass
# Set spawn as global default (safety net - all our code uses explicit contexts anyway)
# Skip in tests to avoid breaking pytest-flask's LiveServer fixture (uses unpicklable local functions)
if 'pytest' not in sys.modules:
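
Review note on the MALLOC_ARENA_MAX block in the hunk above: a minimal standalone sketch of the same idea, assuming a glibc-based Linux host. The function name `limit_malloc_arenas` and its `environ` parameter are hypothetical and exist only to make the behaviour testable; the value -8 for M_ARENA_MAX is taken from the comment in the hunk.

```python
import ctypes
import os

M_ARENA_MAX = -8  # glibc mallopt() parameter number, as stated in the hunk


def limit_malloc_arenas(max_arenas=2, environ=None):
    """Cap the glibc malloc arena count unless the operator already set it.

    Returns True when this call chose the value, False when an existing
    MALLOC_ARENA_MAX setting was left untouched.
    """
    env = os.environ if environ is None else environ
    if 'MALLOC_ARENA_MAX' in env:
        return False  # respect an explicit operator choice
    env['MALLOC_ARENA_MAX'] = str(max_arenas)
    try:
        # Apply the limit to the running process as well (glibc only).
        ctypes.CDLL('libc.so.6').mallopt(M_ARENA_MAX, max_arenas)
    except (OSError, AttributeError):
        pass  # not glibc (for example musl or macOS); the env var is still set
    return True
```

Passing a plain dict as `environ` keeps the check side-effect free in tests; in real startup code the default `os.environ` would be used before any worker threads start.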
+2 -9
@@ -154,10 +154,11 @@ class Import(Resource):
if extras['processor'] not in available:
return f"Invalid processor '{extras['processor']}'. Available processors: {', '.join(available)}", 400
# Validate fetch_backend if provided (legacy API compat — still accepted, stored as-is)
# Validate fetch_backend if provided
if 'fetch_backend' in extras:
from changedetectionio.content_fetchers import available_fetchers
available = [f[0] for f in available_fetchers()]
# Also allow 'system' and extra_browser_* patterns
is_valid = (
extras['fetch_backend'] == 'system' or
extras['fetch_backend'] in available or
@@ -166,14 +167,6 @@ class Import(Resource):
if not is_valid:
return f"Invalid fetch_backend '{extras['fetch_backend']}'. Available: system, {', '.join(available)}", 400
# Validate browser_profile if provided
if 'browser_profile' in extras:
from changedetectionio.model.browser_profile import get_builtin_profiles, RESERVED_MACHINE_NAMES
store_profiles = self.datastore.data['settings']['application'].get('browser_profiles', {})
known = set(get_builtin_profiles().keys()) | set(store_profiles.keys()) | {'system', None}
if extras['browser_profile'] not in known:
return f"Invalid browser_profile '{extras['browser_profile']}'. Available: {', '.join(str(k) for k in known)}", 400
# Validate notification_urls if provided
if 'notification_urls' in extras:
from wtforms import ValidationError
-10
@@ -85,9 +85,6 @@ class Tag(Resource):
# Create clean tag dict without Watch-specific fields
clean_tag = {k: v for k, v in tag.items() if k not in watch_only_fields}
# fetch_backend is a legacy field superseded by browser_profile — omit from API response
clean_tag.pop('fetch_backend', None)
return clean_tag
@auth.check_token
@@ -180,13 +177,6 @@ class Tag(Resource):
new_uuid = self.datastore.add_tag(title=title)
if new_uuid:
# Apply any extra fields (e.g. processor_config_restock_diff) beyond just title
extra = {k: v for k, v in json_data.items() if k != 'title'}
if extra:
tag = self.datastore.data['settings']['application']['tags'].get(new_uuid)
if tag:
tag.update(extra)
tag.commit()
return {'uuid': new_uuid}, 201
else:
return "Invalid or unsupported tag", 400
+3 -6
@@ -105,9 +105,6 @@ class Watch(Resource):
watch['viewed'] = watch_obj.viewed
watch['link'] = watch_obj.link,
# fetch_backend is a legacy field superseded by browser_profile — omit from API response
watch.pop('fetch_backend', None)
return watch
@auth.check_token
@@ -341,7 +338,7 @@ class WatchHistoryDiff(Resource):
word_diff = True
# Get boolean diff preferences with defaults from DIFF_PREFERENCES_CONFIG
changes_only = strtobool(request.args.get('changesOnly', 'false'))
changes_only = strtobool(request.args.get('changesOnly', 'true'))
ignore_whitespace = strtobool(request.args.get('ignoreWhitespace', 'false'))
include_removed = strtobool(request.args.get('removed', 'true'))
include_added = strtobool(request.args.get('added', 'true'))
@@ -352,7 +349,7 @@ class WatchHistoryDiff(Resource):
previous_version_file_contents=from_version_file_contents,
newest_version_file_contents=to_version_file_contents,
ignore_junk=ignore_whitespace,
include_equal=not changes_only,
include_equal=changes_only,
include_removed=include_removed,
include_added=include_added,
include_replaced=include_replaced,
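
Review note: the two variants in these hunks map `changesOnly` to `include_equal` in opposite ways. The sketch below shows the negated mapping, assuming `include_equal` controls whether unchanged lines are emitted, as its name suggests. `parse_bool` is a hypothetical stand-in for the project's `strtobool`, and `diff_flags` is an illustrative helper, not part of the codebase.

```python
def parse_bool(value):
    """Hypothetical stand-in for strtobool: accept common truthy strings."""
    return str(value).strip().lower() in ('1', 'true', 'yes', 'on', 'y', 't')


def diff_flags(args):
    """Translate query arguments into diff options."""
    changes_only = parse_bool(args.get('changesOnly', 'false'))
    return {
        # "changes only" means unchanged lines are left out
        'include_equal': not changes_only,
        'include_removed': parse_bool(args.get('removed', 'true')),
        'include_added': parse_bool(args.get('added', 'true')),
    }
```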
@@ -570,4 +567,4 @@ class CreateWatch(Resource):
return {'status': f'OK, queueing {len(watches_to_queue)} watches in background'}, 202
return list, 200
return list, 200
@@ -40,6 +40,11 @@ def create_backup(datastore_path, watches: dict, tags: dict = None):
zipObj.write(url_watches_json, arcname="url-watches.json")
logger.debug("Added url-watches.json to backup")
# Add the flask app secret (if it exists)
secret_file = os.path.join(datastore_path, "secret.txt")
if os.path.isfile(secret_file):
zipObj.write(secret_file, arcname="secret.txt")
# Add tag data directories (each tag has its own {uuid}/tag.json)
for uuid, tag in (tags or {}).items():
for f in Path(tag.data_dir).glob('*'):
@@ -146,22 +151,19 @@ def construct_blueprint(datastore: ChangeDetectionStore):
def download_backup(filename):
import re
filename = filename.strip()
backup_filename_regex = BACKUP_FILENAME_FORMAT.format(r"\d+")
backup_filename_regex = BACKUP_FILENAME_FORMAT.format("\d+")
full_path = os.path.join(os.path.abspath(datastore.datastore_path), filename)
if not full_path.startswith(os.path.abspath(datastore.datastore_path)):
abort(404)
# Resolve 'latest' before any validation so checks run against the real filename.
if filename == 'latest':
backups = find_backups()
if not backups:
abort(404)
filename = backups[0]['filename']
if not re.match(r"^" + backup_filename_regex + "$", filename):
abort(400) # Bad Request if the filename doesn't match the pattern
full_path = os.path.join(os.path.abspath(datastore.datastore_path), filename)
if not full_path.startswith(os.path.abspath(datastore.datastore_path) + os.sep):
abort(404)
logger.debug(f"Backup download request for '{full_path}'")
return send_from_directory(os.path.abspath(datastore.datastore_path), filename, as_attachment=True)
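
Review note: the ordering in `download_backup` matters, since the filename should be validated before a path is built from it. A minimal sketch of that order follows. The pattern `backup-{}.zip` is an assumption for illustration only (the real `BACKUP_FILENAME_FORMAT` is not shown in this diff), and `resolve_backup_path` is a hypothetical helper.

```python
import os
import re

BACKUP_FILENAME_FORMAT = "backup-{}.zip"  # assumption: illustrative pattern only


def resolve_backup_path(datastore_path, filename):
    """Return the absolute path for a valid backup filename, else None."""
    filename = filename.strip()
    # Escape the literal parts so '.' cannot match an arbitrary character.
    prefix, suffix = BACKUP_FILENAME_FORMAT.split("{}")
    pattern = re.escape(prefix) + r"\d+" + re.escape(suffix)
    if not re.fullmatch(pattern, filename):
        return None
    base = os.path.abspath(datastore_path)
    full_path = os.path.abspath(os.path.join(base, filename))
    # Appending os.sep stops '/data' from matching a sibling such as '/data-other'.
    if not full_path.startswith(base + os.sep):
        return None
    return full_path
```

Using a raw string for `\d+` also avoids the invalid-escape warning that a plain `"\d+"` literal triggers on recent Python versions.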
+5 -45
@@ -1,7 +1,6 @@
import io
import json
import os
import re
import shutil
import tempfile
import threading
@@ -15,16 +14,6 @@ from loguru import logger
from changedetectionio.flask_app import login_optionally_required
# Maximum size of the uploaded zip file. Override via env var MAX_RESTORE_UPLOAD_MB.
_MAX_UPLOAD_BYTES = int(os.getenv("MAX_RESTORE_UPLOAD_MB", 256)) * 1024 * 1024
# Maximum total uncompressed size of all entries (zip-bomb guard). Override via MAX_RESTORE_DECOMPRESSED_MB.
_MAX_DECOMPRESSED_BYTES = int(os.getenv("MAX_RESTORE_DECOMPRESSED_MB", 1024)) * 1024 * 1024
# Only top-level directories whose name is a valid UUID are treated as watch/tag entries.
_UUID_RE = re.compile(
r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$',
re.IGNORECASE,
)
class RestoreForm(Form):
zip_file = FileField(_l('Backup zip file'), validators=[
@@ -61,18 +50,7 @@ def import_from_zip(zip_stream, datastore, include_groups, include_groups_replac
with tempfile.TemporaryDirectory() as tmpdir:
logger.debug(f"Restore: extracting zip to {tmpdir}")
with zipfile.ZipFile(zip_stream, 'r') as zf:
total_uncompressed = sum(m.file_size for m in zf.infolist())
if total_uncompressed > _MAX_DECOMPRESSED_BYTES:
raise ValueError(
f"Backup archive decompressed size ({total_uncompressed // (1024 * 1024)} MB) "
f"exceeds the {_MAX_DECOMPRESSED_BYTES // (1024 * 1024)} MB limit"
)
resolved_dest = os.path.realpath(tmpdir)
for member in zf.infolist():
member_dest = os.path.realpath(os.path.join(resolved_dest, member.filename))
if not member_dest.startswith(resolved_dest + os.sep) and member_dest != resolved_dest:
raise ValueError(f"Zip Slip path traversal detected in backup archive: {member.filename!r}")
zf.extract(member, tmpdir)
zf.extractall(tmpdir)
logger.debug("Restore: zip extracted, scanning UUID directories")
for entry in os.scandir(tmpdir):
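
Review note: the guarded extraction in this hunk combines a declared-size check with a per-member path check. A self-contained sketch of that pattern is below; `safe_extract` is a hypothetical name. Python's `ZipFile.extract` already strips `..` components on its own, so the explicit check is defence in depth, and the size check relies on sizes declared in the archive headers.

```python
import io
import os
import tempfile
import zipfile


def safe_extract(zf, dest, max_total_bytes):
    """Extract every member of zf into dest, refusing bombs and traversal."""
    total = sum(m.file_size for m in zf.infolist())
    if total > max_total_bytes:
        raise ValueError(f"declared size {total} exceeds limit {max_total_bytes}")
    root = os.path.realpath(dest)
    for member in zf.infolist():
        target = os.path.realpath(os.path.join(root, member.filename))
        # The target must be the root itself or live underneath it.
        if target != root and not target.startswith(root + os.sep):
            raise ValueError(f"path traversal in archive: {member.filename!r}")
        zf.extract(member, root)
```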
@@ -80,9 +58,6 @@ def import_from_zip(zip_stream, datastore, include_groups, include_groups_replac
continue
uuid = entry.name
if not _UUID_RE.match(uuid):
logger.warning(f"Restore: skipping non-UUID directory {uuid!r}")
continue
tag_json_path = os.path.join(entry.path, 'tag.json')
watch_json_path = os.path.join(entry.path, 'watch.json')
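
Review note: the `_UUID_RE` filter in this hunk only lets version-4 UUID directory names through the restore scan. A small sketch of the same check, using the pattern shown earlier in this file's diff; `is_uuid4_dirname` is a hypothetical helper name.

```python
import re
import uuid

# Same shape as the pattern in the diff: version nibble 4, variant 8/9/a/b.
UUID4_RE = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$',
    re.IGNORECASE,
)


def is_uuid4_dirname(name):
    """True when a directory name looks like a version-4 UUID."""
    return bool(UUID4_RE.match(name))
```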
@@ -180,9 +155,7 @@ def construct_restore_blueprint(datastore):
form = RestoreForm()
return render_template("backup_restore.html",
form=form,
restore_running=any(t.is_alive() for t in restore_threads),
max_upload_mb=_MAX_UPLOAD_BYTES // (1024 * 1024),
max_decompressed_mb=_MAX_DECOMPRESSED_BYTES // (1024 * 1024))
restore_running=any(t.is_alive() for t in restore_threads))
@login_optionally_required
@restore_blueprint.route("/restore/start", methods=['POST'])
@@ -200,22 +173,10 @@ def construct_restore_blueprint(datastore):
flash(gettext("File must be a .zip backup file"), "error")
return redirect(url_for('backups.restore.restore'))
# Reject oversized uploads before reading the stream into memory.
content_length = request.content_length
if content_length and content_length > _MAX_UPLOAD_BYTES:
flash(gettext("Backup file is too large (max %(mb)s MB)", mb=_MAX_UPLOAD_BYTES // (1024 * 1024)), "error")
return redirect(url_for('backups.restore.restore'))
# Read into memory now — the request stream is gone once we return.
# Read one byte beyond the limit so we can detect truncated-but-still-oversized streams.
# Read into memory now — the request stream is gone once we return
try:
raw = zip_file.read(_MAX_UPLOAD_BYTES + 1)
if len(raw) > _MAX_UPLOAD_BYTES:
flash(gettext("Backup file is too large (max %(mb)s MB)", mb=_MAX_UPLOAD_BYTES // (1024 * 1024)), "error")
return redirect(url_for('backups.restore.restore'))
zip_bytes = io.BytesIO(raw)
with zipfile.ZipFile(zip_bytes): # quick validity check before spawning
pass
zip_bytes = io.BytesIO(zip_file.read())
zipfile.ZipFile(zip_bytes) # quick validity check before spawning
zip_bytes.seek(0)
except zipfile.BadZipFile:
flash(gettext("Invalid or corrupted zip file"), "error")
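
Review note: the bounded read in this hunk asks for one byte more than the limit, so an oversized body is detected without buffering all of it. A minimal sketch under that assumption; `read_zip_upload` is a hypothetical helper and `stream` is any object with a `read(n)` method.

```python
import io
import zipfile


def read_zip_upload(stream, max_bytes):
    """Read at most max_bytes from stream and confirm it parses as a zip."""
    raw = stream.read(max_bytes + 1)  # one extra byte reveals an oversized body
    if len(raw) > max_bytes:
        raise ValueError(f"upload exceeds {max_bytes} bytes")
    buf = io.BytesIO(raw)
    with zipfile.ZipFile(buf):  # raises zipfile.BadZipFile on invalid input
        pass
    buf.seek(0)  # rewind so the caller can reopen the archive
    return buf
```

Opening the archive in a `with` block closes the `ZipFile` handle right away, which the bare `zipfile.ZipFile(zip_bytes)` call in the other variant does not do.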
@@ -240,7 +201,6 @@ def construct_restore_blueprint(datastore):
name="BackupRestore"
)
restore_thread.start()
restore_threads[:] = [t for t in restore_threads if t.is_alive()]
restore_threads.append(restore_thread)
flash(gettext("Restore started in background, check back in a few minutes."))
return redirect(url_for('backups.restore.restore'))
@@ -19,10 +19,6 @@
<p>{{ _('Restore a backup. Must be a .zip backup file created on/after v0.53.1 (new database layout).') }}</p>
<p>{{ _('Note: This does not override the main application settings, only watches and groups.') }}</p>
<p class="pure-form-message">
{{ _('Max upload size: %(upload)s MB &nbsp;·&nbsp; Max decompressed size: %(decomp)s MB',
upload=max_upload_mb, decomp=max_decompressed_mb) }}
</p>
<form class="pure-form pure-form-stacked settings"
action="{{ url_for('backups.restore.backups_restore_start') }}"
@@ -102,35 +102,6 @@ def run_async_in_browser_loop(coro):
else:
raise RuntimeError("Browser steps event loop is not available")
async def _close_session_resources(session_data, label=''):
"""Close all browser resources for a session in the correct order.
browserstepper.cleanup() closes page+context but not the browser itself.
For CloakBrowser, browser.close() is what stops the local Chromium process via pw.stop().
For the default CDP path, playwright_context.stop() shuts down the playwright instance.
"""
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
await browserstepper.cleanup()
except Exception as e:
logger.error(f"Error cleaning up browserstepper{label}: {e}")
browser = session_data.get('browser')
if browser:
try:
await asyncio.wait_for(browser.close(), timeout=5.0)
except Exception as e:
logger.warning(f"Error closing browser{label}: {e}")
playwright_context = session_data.get('playwright_context')
if playwright_context:
try:
await playwright_context.stop()
except Exception as e:
logger.warning(f"Error stopping playwright context{label}: {e}")
def cleanup_expired_sessions():
"""Remove expired browsersteps sessions and cleanup their resources"""
global browsersteps_sessions, browsersteps_watch_to_session
@@ -148,10 +119,13 @@ def cleanup_expired_sessions():
logger.debug(f"Cleaning up expired browsersteps session {session_id}")
session_data = browsersteps_sessions[session_id]
try:
run_async_in_browser_loop(_close_session_resources(session_data, label=f" for session {session_id}"))
except Exception as e:
logger.error(f"Error cleaning up session {session_id}: {e}")
# Cleanup playwright resources asynchronously
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
run_async_in_browser_loop(browserstepper.cleanup())
except Exception as e:
logger.error(f"Error cleaning up session {session_id}: {e}")
# Remove from sessions dict
del browsersteps_sessions[session_id]
@@ -178,10 +152,12 @@ def cleanup_session_for_watch(watch_uuid):
session_data = browsersteps_sessions.get(session_id)
if session_data:
try:
run_async_in_browser_loop(_close_session_resources(session_data, label=f" for watch {watch_uuid}"))
except Exception as e:
logger.error(f"Error cleaning up session {session_id} for watch {watch_uuid}: {e}")
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
run_async_in_browser_loop(browserstepper.cleanup())
except Exception as e:
logger.error(f"Error cleaning up session {session_id} for watch {watch_uuid}: {e}")
# Remove from sessions dict
del browsersteps_sessions[session_id]
@@ -202,64 +178,59 @@ def construct_blueprint(datastore: ChangeDetectionStore):
import time
from playwright.async_api import async_playwright
# We keep the playwright session open for many minutes
keepalive_seconds = int(os.getenv('BROWSERSTEPS_MINUTES_KEEPALIVE', 10)) * 60
keepalive_ms = ((keepalive_seconds + 3) * 1000)
browsersteps_start_session = {'start_time': time.time()}
# Build proxy dict first — needed by both the CDP path and fetcher-specific launchers
proxy_url = datastore.get_proxy_url_for_watch(uuid=watch_uuid)
proxy = None
if proxy_url:
from urllib.parse import urlparse
parsed = urlparse(proxy_url)
proxy = {'server': proxy_url}
if parsed.username:
proxy['username'] = parsed.username
if parsed.password:
proxy['password'] = parsed.password
logger.debug(f"Browser Steps: UUID {watch_uuid} selected proxy {proxy_url}")
# Create a new async playwright instance for browser steps
playwright_instance = async_playwright()
playwright_context = await playwright_instance.start()
# Resolve the fetcher class for this watch so we can ask it to launch its own browser
# if it supports that (e.g. CloakBrowser, which runs locally rather than via CDP)
watch = datastore.data['watching'][watch_uuid]
from changedetectionio import content_fetchers
fetcher_class = content_fetchers.get_fetcher(watch.effective_browser_profile.fetch_backend)
browser = None
playwright_context = None
# If the fetcher has its own browser launch for the live steps UI, use it.
# get_browsersteps_browser(proxy, keepalive_ms) returns (browser, playwright_context_or_None)
# or None to fall back to the default CDP path.
if fetcher_class and hasattr(fetcher_class, 'get_browsersteps_browser'):
result = await fetcher_class.get_browsersteps_browser(proxy=proxy, keepalive_ms=keepalive_ms)
if result is not None:
browser, playwright_context = result
logger.debug(f"Browser Steps: using fetcher-specific browser for '{fetcher_class.__name__}'")
# Default: connect to the remote Playwright/sockpuppetbrowser via CDP
if browser is None:
playwright_instance = async_playwright()
playwright_context = await playwright_instance.start()
base_url = os.getenv('PLAYWRIGHT_DRIVER_URL', '').strip('"')
a = "?" if '?' not in base_url else '&'
base_url += a + f"timeout={keepalive_ms}"
browser = await playwright_context.chromium.connect_over_cdp(base_url, timeout=keepalive_ms)
logger.debug(f"Browser Steps: using CDP connection to {base_url}")
keepalive_ms = ((keepalive_seconds + 3) * 1000)
base_url = os.getenv('PLAYWRIGHT_DRIVER_URL', '').strip('"')
a = "?" if not '?' in base_url else '&'
base_url += a + f"timeout={keepalive_ms}"
browser = await playwright_context.chromium.connect_over_cdp(base_url, timeout=keepalive_ms)
browsersteps_start_session['browser'] = browser
browsersteps_start_session['playwright_context'] = playwright_context
proxy_id = datastore.get_preferred_proxy_for_watch(uuid=watch_uuid)
proxy = None
if proxy_id:
proxy_url = datastore.proxy_list.get(proxy_id).get('url')
if proxy_url:
# Playwright needs separate username and password values
from urllib.parse import urlparse
parsed = urlparse(proxy_url)
proxy = {'server': proxy_url}
if parsed.username:
proxy['username'] = parsed.username
if parsed.password:
proxy['password'] = parsed.password
logger.debug(f"Browser Steps: UUID {watch_uuid} selected proxy {proxy_url}")
# Tell Playwright to connect to Chrome and setup a new session via our stepper interface
browserstepper = browser_steps.browsersteps_live_ui(
playwright_browser=browser,
proxy=proxy,
start_url=watch.link,
headers=watch.get('headers')
start_url=datastore.data['watching'][watch_uuid].link,
headers=datastore.data['watching'][watch_uuid].get('headers')
)
# Initialize the async connection
await browserstepper.connect(proxy=proxy)
browsersteps_start_session['browserstepper'] = browserstepper
# For test
#await browsersteps_start_session['browserstepper'].action_goto_url(value="http://example.com?time="+str(time.time()))
return browsersteps_start_session
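
Review note: both variants of this hunk split proxy credentials out of the URL because Playwright takes `username` and `password` as separate proxy fields. A standalone sketch of that step; `build_playwright_proxy` is a hypothetical helper name and the example URL is made up.

```python
from urllib.parse import urlparse


def build_playwright_proxy(proxy_url):
    """Turn a proxy URL into a Playwright-style proxy dict, or None."""
    if not proxy_url:
        return None
    parsed = urlparse(proxy_url)
    proxy = {'server': proxy_url}
    # Credentials embedded in the URL are passed as separate fields.
    if parsed.username:
        proxy['username'] = parsed.username
    if parsed.password:
        proxy['password'] = parsed.password
    return proxy
```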
@@ -40,14 +40,12 @@ def construct_blueprint(datastore: ChangeDetectionStore):
contents = ''
now = time.time()
try:
import asyncio
processor_module = importlib.import_module("changedetectionio.processors.text_json_diff.processor")
update_handler = processor_module.perform_site_check(datastore=datastore,
watch_uuid=uuid
)
update_handler.preferred_proxy_override = preferred_proxy
asyncio.run(update_handler.call_browser())
update_handler.call_browser(preferred_proxy_id=preferred_proxy)
# title, size is len contents not len xfer
except content_fetcher_exceptions.Non200ErrorCodeReceived as e:
if e.status_code == 404:
@@ -175,9 +175,9 @@ class import_xlsx_wachete(Importer):
dynamic_wachet = str(data.get('dynamic wachet', '')).strip().lower() # Convert bool to str to cover all cases
# libreoffice and others can have it as =FALSE() =TRUE(), or bool(true)
if 'true' in dynamic_wachet or dynamic_wachet == '1':
extras['browser_profile'] = 'browser_chromeplaywright'
extras['fetch_backend'] = 'html_webdriver'
elif 'false' in dynamic_wachet or dynamic_wachet == '0':
extras['browser_profile'] = 'direct_http_requests'
extras['fetch_backend'] = 'html_requests'
if data.get('xpath'):
# @todo split by || ?
+1 -1
@@ -7,7 +7,7 @@ def construct_tag_routes(rss_blueprint, datastore):
datastore: The ChangeDetectionStore instance
"""
@rss_blueprint.route("/tag/<uuid_str:tag_uuid>", methods=['GET'])
@rss_blueprint.route("/tag/<string:tag_uuid>", methods=['GET'])
def rss_tag_feed(tag_uuid):
from flask import make_response, request, url_for
@@ -15,9 +15,6 @@ from changedetectionio.auth_decorator import login_optionally_required
def construct_blueprint(datastore: ChangeDetectionStore):
settings_blueprint = Blueprint('settings', __name__, template_folder="templates")
from changedetectionio.blueprint.settings.browser_profile import construct_blueprint as construct_browser_profile_blueprint
settings_blueprint.register_blueprint(construct_browser_profile_blueprint(datastore), url_prefix='/browsers')
@settings_blueprint.route("", methods=['GET', "POST"])
@login_optionally_required
def settings_page():
@@ -1,200 +0,0 @@
import flask_login
from flask import Blueprint, render_template, request, redirect, url_for, flash
from flask_babel import gettext
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
def construct_blueprint(datastore: ChangeDetectionStore):
settings_browser_profile_blueprint = Blueprint(
'settings_browsers',
__name__,
template_folder="templates"
)
def _render_index(browser_profile_form=None, editing_machine_name=None):
from changedetectionio import forms
from changedetectionio import content_fetchers as cf
from changedetectionio.model.browser_profile import BrowserProfile, RESERVED_MACHINE_NAMES
# Only browser-capable fetchers are valid profile types
fetcher_choices = cf.available_browser_fetchers()
if browser_profile_form is None:
browser_profile_form = forms.BrowserProfileForm()
browser_profile_form.fetch_backend.choices = fetcher_choices
fetcher_supports_screenshots = {name: True for name, _ in fetcher_choices}
fetcher_requires_connection_url = {name: True for name, cls in cf.FETCHERS.items()
if getattr(cls, 'requires_connection_url', False)}
# Table shows default built-in profiles first, then user-created profiles
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
user_profiles = dict(cf.DEFAULT_BROWSER_PROFILES)
for machine_name, raw in store_profiles.items():
try:
user_profiles[machine_name] = BrowserProfile(**raw) if isinstance(raw, dict) else raw
except Exception:
pass
current_default = datastore.data['settings']['application'].get('browser_profile') or 'direct_http_requests'
return render_template(
"browser_profiles.html",
browser_profiles=user_profiles,
browser_profile_form=browser_profile_form,
reserved_browser_profile_names=RESERVED_MACHINE_NAMES,
fetcher_choices=fetcher_choices,
fetcher_supports_screenshots=fetcher_supports_screenshots,
fetcher_requires_connection_url=fetcher_requires_connection_url,
current_default_profile=current_default,
editing_machine_name=editing_machine_name,
)
@settings_browser_profile_blueprint.route("", methods=['GET'])
@login_optionally_required
def index():
return _render_index()
@settings_browser_profile_blueprint.route("/<string:machine_name>/edit", methods=['GET'])
@login_optionally_required
def edit(machine_name):
from changedetectionio import forms
from changedetectionio.model.browser_profile import BrowserProfile, RESERVED_MACHINE_NAMES
if machine_name in RESERVED_MACHINE_NAMES:
flash(gettext("Built-in browser profiles cannot be edited."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
raw = store_profiles.get(machine_name)
if raw is None:
flash(gettext("Browser profile not found."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
profile = BrowserProfile(**raw) if isinstance(raw, dict) else raw
form = forms.BrowserProfileForm(data=profile.model_dump())
return _render_index(browser_profile_form=form, editing_machine_name=machine_name)
@settings_browser_profile_blueprint.route("/save", methods=['POST'])
@login_optionally_required
def save():
from changedetectionio import forms
from changedetectionio import content_fetchers as cf
from changedetectionio.model.browser_profile import BrowserProfile, RESERVED_MACHINE_NAMES
fetcher_choices = [(name, desc) for name, desc in cf.available_fetchers()]
browser_profile_form = forms.BrowserProfileForm(formdata=request.form)
browser_profile_form.fetch_backend.choices = fetcher_choices
if not browser_profile_form.validate():
flash(gettext("Browser profile error: {}").format(
'; '.join(str(e) for errs in browser_profile_form.errors.values() for e in errs)
), 'error')
return redirect(url_for('settings.settings_browsers.index'))
name = browser_profile_form.name.data.strip()
machine_name = BrowserProfile.machine_name_from_str(name)
if machine_name in RESERVED_MACHINE_NAMES:
flash(gettext("Cannot use reserved profile name '{}'. Please choose a different name.").format(name), 'error')
return redirect(url_for('settings.settings_browsers.index'))
original_machine_name = request.form.get('original_machine_name', '').strip()
store_profiles = datastore.data['settings']['application'].setdefault('browser_profiles', {})
if machine_name != original_machine_name and machine_name in store_profiles:
flash(gettext("A browser profile named '{}' already exists.").format(name), 'error')
return redirect(url_for('settings.settings_browsers.index'))
profile_data = {
'name': name,
'fetch_backend': browser_profile_form.fetch_backend.data,
'browser_connection_url': browser_profile_form.browser_connection_url.data or None,
'viewport_width': browser_profile_form.viewport_width.data or 1280,
'viewport_height': browser_profile_form.viewport_height.data or 1000,
'block_images': bool(browser_profile_form.block_images.data),
'block_fonts': bool(browser_profile_form.block_fonts.data),
'ignore_https_errors': bool(browser_profile_form.ignore_https_errors.data),
'user_agent': browser_profile_form.user_agent.data or None,
'locale': browser_profile_form.locale.data or None,
'custom_headers': browser_profile_form.custom_headers.data or '',
'is_builtin': False,
}
try:
BrowserProfile(**profile_data)
except Exception as e:
flash(gettext("Browser profile validation error: {}").format(str(e)), 'error')
return redirect(url_for('settings.settings_browsers.index'))
# Handle rename: remove old key, cascade-update watches and tags
if original_machine_name and original_machine_name != machine_name and original_machine_name in store_profiles:
del store_profiles[original_machine_name]
for watch in datastore.data['watching'].values():
if watch.get('browser_profile') == original_machine_name:
watch['browser_profile'] = machine_name
for tag in datastore.data.get('settings', {}).get('application', {}).get('tags', {}).values():
if tag.get('browser_profile') == original_machine_name:
tag['browser_profile'] = machine_name
store_profiles[machine_name] = profile_data
datastore.commit()
flash(gettext("Browser profile '{}' saved.").format(name), 'notice')
return redirect(url_for('settings.settings_browsers.index'))
@settings_browser_profile_blueprint.route("/<string:machine_name>/delete", methods=['GET'])
@login_optionally_required
def delete(machine_name):
from changedetectionio.model.browser_profile import RESERVED_MACHINE_NAMES
if machine_name in RESERVED_MACHINE_NAMES:
flash(gettext("Built-in browser profiles cannot be deleted."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
if machine_name not in store_profiles:
flash(gettext("Browser profile not found."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
raw = store_profiles[machine_name]
profile_name = raw.get('name', machine_name) if isinstance(raw, dict) else machine_name
for watch in datastore.data['watching'].values():
if watch.get('browser_profile') == machine_name:
watch['browser_profile'] = None
for tag in datastore.data.get('settings', {}).get('application', {}).get('tags', {}).values():
if tag.get('browser_profile') == machine_name:
tag['browser_profile'] = None
if datastore.data['settings']['application'].get('browser_profile') == machine_name:
datastore.data['settings']['application']['browser_profile'] = None
del store_profiles[machine_name]
datastore.commit()
flash(gettext("Browser profile '{}' deleted.").format(profile_name), 'notice')
return redirect(url_for('settings.settings_browsers.index'))
@settings_browser_profile_blueprint.route("/set-default", methods=['POST'])
@login_optionally_required
def set_default():
from changedetectionio import content_fetchers as cf
machine_name = request.form.get('machine_name', '').strip()
if not machine_name:
flash(gettext("No profile specified."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
from changedetectionio.model.browser_profile import get_profile
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
if get_profile(machine_name, store_profiles) is None:
flash(gettext("Unknown browser profile '{}'.").format(machine_name), 'error')
return redirect(url_for('settings.settings_browsers.index'))
datastore.data['settings']['application']['browser_profile'] = machine_name
datastore.commit()
flash(gettext("Default browser profile set to '{}'.").format(machine_name), 'notice')
return redirect(url_for('settings.settings_browsers.index'))
return settings_browser_profile_blueprint
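Note on the rename handling in save() above: the old key is removed and every watch and tag that pointed at it is repointed to the new machine name. A minimal standalone sketch of that cascade follows. The helper name `rename_profile` and the sample data are hypothetical, and unlike save(), which stores freshly built profile data under the new key, this sketch simply moves the existing entry.

```python
# Hypothetical helper illustrating the rename cascade; not part of the codebase.
def rename_profile(store_profiles: dict, watches: dict, tags: dict, old: str, new: str) -> None:
    """Move a profile to a new key and repoint every watch/tag that referenced the old key."""
    if old == new or old not in store_profiles:
        return
    # Simplification: move the stored entry rather than writing new profile data.
    store_profiles[new] = store_profiles.pop(old)
    for item in list(watches.values()) + list(tags.values()):
        if item.get('browser_profile') == old:
            item['browser_profile'] = new


# Sample data, invented for illustration only.
profiles = {'my_chrome': {'name': 'My Chrome'}}
watches = {'w1': {'browser_profile': 'my_chrome'}, 'w2': {'browser_profile': None}}
tags = {'t1': {'browser_profile': 'my_chrome'}}

rename_profile(profiles, watches, tags, 'my_chrome', 'office_chrome')
```

Watches that never referenced the old name (here `w2`) are left untouched, which matches the equality check used in the loops above.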
@@ -1,163 +0,0 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.html' import render_field, render_checkbox_field, render_button %}
<div class="edit-form">
<div class="box-wrap inner">
<h2>{{ _('Browser Profiles') }}</h2>
<p>{{ _('Create named profiles to configure browser settings — viewport size, connection URL, image/font blocking, and more. Each profile is based on an available browser type.') }}</p>
<form id="set-default-form" action="{{ url_for('settings.settings_browsers.set_default') }}" method="POST">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<input type="hidden" name="machine_name" id="default-machine-name" value="">
</form>
{% if browser_profiles %}
<table class="pure-table pure-table-striped" style="width:100%; margin-bottom:1.5em;">
<thead>
<tr>
<th style="width:2.5em; text-align:center;" title="{{ _('System default') }}">{{ _('Default') }}</th>
<th>{{ _('Name') }}</th>
<th>{{ _('Type') }}</th>
<th style="width:3em; text-align:center;"></th>
<th>{{ _('Viewport') }}</th>
<th>{{ _('Options') }}</th>
<th></th>
</tr>
</thead>
<tbody>
{% for machine_name, profile in browser_profiles.items() %}
<tr>
<td style="text-align:center;">
<input type="radio"
name="default_profile"
value="{{ machine_name }}"
title="{{ _('Set as system default') }}"
{% if machine_name == current_default_profile %}checked{% endif %}
onchange="setDefaultProfile('{{ machine_name }}')">
</td>
<td>{{ profile.name }}</td>
<td><code>{{ profile.fetch_backend }}</code></td>
<td style="text-align:center;">{{ profile.get_fetcher_class_name()|fetcher_status_icons }}</td>
<td>{{ profile.viewport_width }}×{{ profile.viewport_height }}</td>
<td style="font-size:0.8em; line-height:1.6;">
{% if profile.block_images %}{{ _('No images') }}<br>{% endif %}
{% if profile.block_fonts %}{{ _('No fonts') }}<br>{% endif %}
{% if profile.ignore_https_errors %}{{ _('Ignore TLS') }}<br>{% endif %}
{% if profile.browser_connection_url %}<span title="{{ profile.browser_connection_url }}">{{ _('Custom URL') }}</span>{% endif %}
</td>
<td style="white-space:nowrap;">
{% if not profile.is_builtin %}
<a href="{{ url_for('settings.settings_browsers.edit', machine_name=machine_name) }}"
class="pure-button button-small">{{ _('Edit') }}</a>
<a href="{{ url_for('settings.settings_browsers.delete', machine_name=machine_name) }}"
class="pure-button button-small button-error"
onclick="return confirm('{{ _('Delete this browser profile?') }}')">{{ _('Delete') }}</a>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p style="color:#888; font-style:italic;">{{ _('No browser profiles configured yet. Add one below.') }}</p>
{% endif %}
<div class="border-fieldset">
<h3 id="profile-form-heading">{{ _('Edit browser profile') if editing_machine_name else _('Add new browser profile') }}</h3>
{% if not editing_machine_name %}
<p style="font-size:0.9em; color:#666;">{{ _('Choose a browser type, give it a name, and configure its settings. You can create multiple profiles from the same type with different connection URLs or options.') }}</p>
{% endif %}
<form class="pure-form pure-form-stacked"
id="browser-profile-form"
action="{{ url_for('settings.settings_browsers.save') }}"
method="POST">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<input type="hidden" name="original_machine_name" id="original_machine_name" value="{{ editing_machine_name or '' }}">
<fieldset>
<div class="pure-control-group">
{{ render_field(browser_profile_form.name) }}
</div>
<div class="pure-control-group inline-radio">
{{ render_field(browser_profile_form.fetch_backend, id="profile-fetch-backend") }}
</div>
<div class="pure-control-group browser-only-field cdp-only-field">
{{ render_field(browser_profile_form.browser_connection_url) }}
<span class="pure-form-message-inline">{{ _('Optional — override the system CDP/WebSocket URL for this profile only (e.g.') }} <code>ws://my-chrome:3000</code>).</span>
</div>
<div class="pure-control-group browser-only-field" style="display:flex; gap:1em; flex-wrap:wrap;">
<div>{{ render_field(browser_profile_form.viewport_width) }}</div>
<div>{{ render_field(browser_profile_form.viewport_height) }}</div>
</div>
<div class="pure-control-group browser-only-field">
{{ render_checkbox_field(browser_profile_form.block_images) }}
<span class="pure-form-message-inline">{{ _('Block image downloads — speeds up loads on image-heavy pages.') }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_checkbox_field(browser_profile_form.block_fonts) }}
<span class="pure-form-message-inline">{{ _('Block web font downloads.') }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_checkbox_field(browser_profile_form.ignore_https_errors) }}
<span class="pure-form-message-inline">{{ _('Ignore TLS/HTTPS certificate errors (useful for self-signed certs on staging sites).') }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_field(browser_profile_form.user_agent) }}
<span class="pure-form-message-inline">{{ _("Leave blank to use the fetcher's default User-Agent.") }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_field(browser_profile_form.locale) }}
<span class="pure-form-message-inline">{{ _('Sets Accept-Language and navigator.language (e.g. en-US, de-DE).') }}</span>
</div>
<div class="pure-control-group">
{{ render_field(browser_profile_form.custom_headers) }}
<span class="pure-form-message-inline">{{ _('Extra HTTP headers for all requests using this profile (one per line, Key: Value). Applied before per-watch headers.') }}</span>
</div>
<div class="pure-control-group">
<button type="submit" class="pure-button pure-button-primary" id="profile-submit-btn">{{ _('Save profile') }}</button>
{% if editing_machine_name %}
<a href="{{ url_for('settings.settings_browsers.index') }}" class="pure-button button-cancel">{{ _('Cancel') }}</a>
{% endif %}
<a href="{{ url_for('settings.settings_page') }}" class="pure-button button-cancel">{{ _('Back to Settings') }}</a>
</div>
</fieldset>
</form>
</div>
</div>
</div>
<script>
function setDefaultProfile(machineName) {
document.getElementById('default-machine-name').value = machineName;
document.getElementById('set-default-form').submit();
}
const fetcherSupportsBrowser = {{ fetcher_supports_screenshots | tojson }};
const fetcherRequiresConnectionUrl = {{ fetcher_requires_connection_url | tojson }};
function updateBrowserFieldVisibility() {
const fetchBackend = document.getElementById('profile-fetch-backend').value;
const isBrowser = !!fetcherSupportsBrowser[fetchBackend];
const isCdp = !!fetcherRequiresConnectionUrl[fetchBackend];
document.querySelectorAll('.browser-only-field').forEach(function(el) {
el.style.display = isBrowser ? '' : 'none';
});
document.querySelectorAll('.cdp-only-field').forEach(function(el) {
el.style.display = isCdp ? '' : 'none';
});
}
document.addEventListener('DOMContentLoaded', function() {
const sel = document.getElementById('profile-fetch-backend');
if (sel) {
sel.addEventListener('change', updateBrowserFieldVisibility);
updateBrowserFieldVisibility();
}
});
{% if editing_machine_name %}
document.addEventListener('DOMContentLoaded', function() {
document.getElementById('browser-profile-form').scrollIntoView({behavior: 'smooth'});
});
{% endif %}
</script>
{% endblock %}
@@ -28,7 +28,6 @@
<li class="tab"><a href="{{ url_for('backups.create') }}">{{ _('Backups') }}</a></li>
<li class="tab"><a href="#timedate">{{ _('Time & Date') }}</a></li>
<li class="tab"><a href="#proxies">{{ _('CAPTCHA & Proxies') }}</a></li>
<li class="tab"><a href="{{ url_for('settings.settings_browsers.index') }}">{{ _('Browsers') }}</a></li>
{% if plugin_tabs %}
{% for tab in plugin_tabs %}
<li class="tab"><a href="#plugin-{{ tab.plugin_id }}">{{ tab.tab_label }}</a></li>
@@ -116,7 +115,14 @@
</div>
<div class="tab-pane-inner" id="fetching">
<fieldset class="pure-group" id="webdriver-override-options">
<div class="pure-control-group inline-radio">
{{ render_field(form.application.form.fetch_backend, class="fetch-backend") }}
<span class="pure-form-message-inline">
<p>{{ _('Use the') }} <strong>{{ _('Basic') }}</strong> {{ _('method (default) where your watched sites don\'t need Javascript to render.') }}</p>
<p>{{ _('The') }} <strong>{{ _('Chrome/Javascript') }}</strong> {{ _('method requires a network connection to a running WebDriver+Chrome server, set by the ENV var') }} 'WEBDRIVER_URL'. </p>
</span>
</div>
<fieldset class="pure-group" id="webdriver-override-options" data-visible-for="application-fetch_backend=html_webdriver">
<div class="pure-form-message-inline">
<strong>{{ _('If you\'re having trouble waiting for the page to be fully rendered (text missing etc), try increasing the \'wait\' time here.') }}</strong>
<br>
@@ -140,9 +146,17 @@
{{ render_field(form.requests.form.timeout) }}
<span class="pure-form-message-inline">{{ _('For regular plain requests (not chrome based), maximum number of seconds until timeout, 1-999.') }}</span><br>
</div>
<div class="pure-control-group inline-radio">
{{ render_field(form.requests.form.default_ua) }}
<span class="pure-form-message-inline">
{{ _('Applied to all requests.') }}<br><br>
{{ _('Note: Simply changing the User-Agent often does not defeat anti-robot technologies, it\'s important to consider') }} <a href="https://changedetection.io/tutorial/what-are-main-types-anti-robot-mechanisms">{{ _('all of the ways that the browser is detected') }}</a>.
</span>
</div>
<div class="pure-control-group">
<br>
{{ _('Tip:') }} <a href="{{ url_for('settings.settings_page')}}#proxies">{{ _('Connect using Bright Data proxies, find out more here.') }}</a>
<br>
{{ _('Tip:') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">{{ _('Connect using Bright Data and Oxylabs Proxies, find out more here.') }}</a>
</div>
</div>
@@ -338,7 +352,7 @@ nav
</div>
</div>
<p><strong>{{ _('Tip') }}</strong>: {{ _('"Residential" and "Mobile" proxy type can be more successful than "Data Center" for blocked websites.') }}</p>
<p><strong>{{ _('Tip') }}</strong>: {{ _('"Residential" and "Mobile" proxy type can be more successfull than "Data Center" for blocked websites.') }}</p>
<div class="pure-control-group" id="extra-proxies-setting">
{{ render_fieldlist_with_inline_errors(form.requests.form.extra_proxies) }}
+2 -2
@@ -156,9 +156,9 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
@login_optionally_required
def clear_all_history():
if request.method == 'POST':
confirmtext = request.form.get('confirmtext', '')
confirmtext = request.form.get('confirmtext')
if confirmtext.strip().lower() == gettext('clear').strip().lower():
if confirmtext == 'clear':
# Run in background thread to avoid blocking
def clear_history_background():
# Capture UUIDs first to avoid race conditions
+4 -36
@@ -67,10 +67,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
default['proxy'] = ''
# proxy_override set to the json/text list of the items
# browser_profile: None means "use system default" — map to 'system' so the radio pre-selects correctly
if not default.get('browser_profile'):
default['browser_profile'] = 'system'
# Does it use some custom form? does one exist?
processor_name = datastore.data['watching'][uuid].get('processor', '')
processor_classes = next((tpl for tpl in processors.find_processors() if tpl[1] == processor_name), None)
@@ -143,37 +139,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
except Exception as e:
logger.warning(f"Failed to load processor config: {e}")
from changedetectionio.model.browser_profile import BrowserProfile
from changedetectionio import content_fetchers as cf
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
for p in datastore.extra_browsers:
form.fetch_backend.choices.append(p)
# Resolve the name of the system-level default profile for the label
from changedetectionio.model.browser_profile import get_profile
_system_default_machine_name = datastore.data['settings']['application'].get('browser_profile') or 'direct_http_requests'
_all_store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
_default_profile = get_profile(_system_default_machine_name, _all_store_profiles)
if _default_profile:
_system_label = gettext('System settings default') + ' \u2013 ' + _default_profile.name
else:
_system_label = gettext('System settings default')
# Choices: system default + always-present defaults (requests) + user-created profiles
form.browser_profile.choices = [('system', _system_label)] + [
(p.get_machine_name(), p.name)
for p in cf.DEFAULT_BROWSER_PROFILES.values()
] + [
(machine_name, raw.get('name', machine_name) if isinstance(raw, dict) else getattr(raw, 'name', machine_name))
for machine_name, raw in store_profiles.items()
]
# Build a map of machine_name → fetcher class name for the JS visibility system
all_profiles = dict(cf.DEFAULT_BROWSER_PROFILES)
for machine_name, raw in store_profiles.items():
try:
all_profiles[machine_name] = BrowserProfile(**raw) if isinstance(raw, dict) else raw
except Exception:
pass
browser_profile_fetchers = {mn: p.get_fetcher_class_name() for mn, p in all_profiles.items()}
form.fetch_backend.choices.append(("system", 'System settings default'))
# form.browser_steps[0] can be assumed that we 'goto url' first
@@ -241,7 +210,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Recast it if need be to right data Watch handler
watch_class = processors.get_custom_watch_obj_for_processor(form.data.get('processor'))
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, __datastore=datastore, default=datastore.data['watching'][uuid])
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, __datastore=datastore.data, default=datastore.data['watching'][uuid])
# Save the watch immediately
datastore.data['watching'][uuid].commit()
@@ -327,7 +296,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
template_args = {
'available_processors': processors.available_processors(),
'available_timezones': sorted(available_timezones()),
'browser_profile_fetchers': browser_profile_fetchers,
'browser_steps_config': browser_step_ui_config,
'emailprefix': os.getenv('NOTIFICATION_MAIL_BUTTON_PREFIX', False),
'extra_classes': ' '.join(c),
+8 -7
@@ -10,8 +10,7 @@ from changedetectionio import html_tools
def construct_blueprint(datastore: ChangeDetectionStore):
preview_blueprint = Blueprint('ui_preview', __name__, template_folder="../ui/templates")
@preview_blueprint.route("/preview/<uuid_str:uuid>", methods=['GET', 'POST'])
@preview_blueprint.route("/preview/<uuid_str:uuid>", methods=['GET'])
@login_optionally_required
def preview_page(uuid):
"""
@@ -60,8 +59,12 @@ def construct_blueprint(datastore: ChangeDetectionStore):
versions = []
timestamp = None
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
extra_stylesheets = [url_for('static_content', group='styles', filename='diff.css')]
fetcher_supports_screenshots = watch.fetcher_supports_screenshots
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
triggered_line_numbers = []
ignored_line_numbers = []
@@ -71,9 +74,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
flash(gettext("Preview unavailable - No fetch/check completed or triggers not reached"), "error")
else:
# So prepare the latest preview or not
preferred_version = request.values.get('version') if request.method == 'POST' else request.args.get('version')
preferred_version = request.args.get('version')
versions = list(watch.history.keys())
timestamp = versions[-1]
if preferred_version and preferred_version in versions:
@@ -112,7 +113,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
highlight_triggered_line_numbers=triggered_line_numbers,
highlight_blocked_line_numbers=blocked_line_numbers,
history_n=watch.history_n,
fetcher_supports_screenshots=fetcher_supports_screenshots,
is_html_webdriver=is_html_webdriver,
last_error=watch['last_error'],
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
@@ -143,7 +143,7 @@
<div class="tip">
{{ _('For now, Differences are performed on text, not graphically, only the latest screenshot is available.') }}
</div>
{% if fetcher_supports_screenshots %}
{% if is_html_webdriver %}
{% if screenshot %}
<div class="snapshot-age">{{watch_a.snapshot_screenshot_ctime|format_timestamp_timeago}}</div>
<img style="max-width: 80%" id="screenshot-img" alt="{{ _('Current screenshot from most recent request') }}" >
@@ -27,8 +27,7 @@
const proxy_recheck_status_url="{{url_for('check_proxies.get_recheck_status', uuid=uuid)}}";
const screenshot_url="{{url_for('static_content', group='screenshot', filename=uuid)}}";
const watch_visual_selector_data_url="{{url_for('static_content', group='visual_selector_data', filename=uuid)}}";
const default_system_fetch_backend = {{ (browser_profile_fetchers.get(settings_application.get('browser_profile') or 'direct_http_requests', 'requests')) | tojson }};
const browserProfileFetcherMap = {{ browser_profile_fetchers | tojson }};
const default_system_fetch_backend="{{ settings_application['fetch_backend'] }}";
</script>
<script src="{{url_for('static_content', group='js', filename='plugins.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='watch-settings.js')}}" defer></script>
@@ -132,19 +131,11 @@
{% if capabilities.supports_request_type %}
<div class="tab-pane-inner" id="request">
<div class="pure-control-group inline-radio">
<div><label for="browser_profile">{{ form.browser_profile.label.text }}</label></div>
<div><ul class="fetch-backend" id="browser_profile">
{%- for subfield in form.browser_profile %}
<li>
{{ subfield() }}
{{ browser_profile_fetchers.get(subfield.data, '')|fetcher_status_icons }}
<label for="{{ subfield.id }}">{{ subfield.label.text }}</label>
</li>
{%- endfor %}
</ul></div>
{{ render_field(form.fetch_backend, class="fetch-backend") }}
<span class="pure-form-message-inline">
<p>{{ _('Choose how this watch fetches its target URL. \'System settings default\' inherits the global setting.') }}</p>
<p>{{ _('Manage browser profiles in') }} <a href="{{ url_for('settings.settings_browsers.index') }}">{{ _('Settings → Browsers') }}</a>.</p>
<p>{{ _('Use the') }} <strong>{{ _('Basic') }}</strong> {{ _('method (default) where your watched site doesn\'t need Javascript to render.') }}</p>
<p>{{ _('The') }} <strong>{{ _('Chrome/Javascript') }}</strong> {{ _('method requires a network connection to a running WebDriver+Chrome server, set by the ENV var \'WEBDRIVER_URL\'.') }} </p>
{{ _('Tip:') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">{{ _('Connect using Bright Data and Oxylabs Proxies, find out more here.') }}</a>
</span>
</div>
{% if form.proxy %}
@@ -158,7 +149,7 @@
{% endif %}
<!-- webdriver always -->
<fieldset data-visible-for="fetch_backend=playwright fetch_backend=selenium fetch_backend=puppeteer fetch_backend=cloakbrowser" style="display: none;">
<fieldset data-visible-for="fetch_backend=html_webdriver" style="display: none;">
<div class="pure-control-group">
{{ render_field(form.webdriver_delay) }}
<div class="pure-form-message-inline">
@@ -181,8 +172,8 @@
</div>
</div>
</fieldset>
<!-- requests always -->
<fieldset data-visible-for="fetch_backend=requests">
<!-- html requests always -->
<fieldset data-visible-for="fetch_backend=html_requests">
<div class="pure-control-group">
<a class="pure-button button-secondary button-xsmall show-advanced">{{ _('Show advanced options') }}</a>
</div>
@@ -219,7 +210,7 @@ Math: {{ 1 + 1 }}") }}
({{ _('Not supported by Selenium browser') }})
</div>
</div>
<fieldset data-visible-for="fetch_backend=requests fetch_backend=playwright fetch_backend=selenium fetch_backend=puppeteer fetch_backend=cloakbrowser" >
<fieldset data-visible-for="fetch_backend=html_requests fetch_backend=html_webdriver" >
<div class="pure-control-group inline-radio advanced-options" style="display: none;">
{{ render_checkbox_field(form.ignore_status_codes) }}
</div>
@@ -17,7 +17,7 @@
<script src="{{ url_for('static_content', group='js', filename='tabs.js') }}" defer></script>
{% if versions|length >= 2 %}
<div id="diff-form" style="text-align: center;">
<form class="pure-form " action="{{url_for('ui.ui_preview.preview_page', uuid=uuid)}}" method="POST">
<form class="pure-form " action="" method="POST">
<fieldset>
<label for="preview-version">{{ _('Select timestamp') }}</label> <select id="preview-version"
name="from_version"
@@ -28,7 +28,6 @@
</option>
{% endfor %}
</select>
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<button type="submit" class="pure-button pure-button-primary">{{ _('Go') }}</button>
</fieldset>
@@ -81,7 +81,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
sorted_tags = sorted(datastore.data['settings']['application'].get('tags').items(), key=lambda x: x[1]['title'])
proxy_list = datastore.proxy_list
output = render_template(
"watch-overview.html",
active_tag=active_tag,
@@ -93,7 +92,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
form=form,
generate_tag_colors=processors.generate_processor_badge_colors,
guid=datastore.data['app_guid'],
has_proxies=proxy_list,
has_proxies=datastore.proxy_list,
hosted_sticky=os.getenv("SALTED_PASS", False) == False,
now_time_server=round(time.time()),
pagination=pagination,
@@ -105,22 +104,12 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
search_q=request.args.get('q', '').strip(),
sort_attribute=request.args.get('sort') if request.args.get('sort') else request.cookies.get('sort'),
sort_order=request.args.get('order') if request.args.get('order') else request.cookies.get('order'),
system_default_fetcher=datastore.data['settings']['application'].get('browser_profile'),
system_default_fetcher=datastore.data['settings']['application'].get('fetch_backend'),
tags=sorted_tags,
unread_changes_count=datastore.unread_changes_count,
watches=sorted_watches
)
# Return freed template-building memory to the OS immediately.
# render_template allocates ~20MB of intermediate strings that are freed on return,
# but glibc keeps those pages mapped in its arenas as RSS. malloc_trim() forces
# glibc to release them, preventing RSS growth from concurrent Chrome connections.
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
if session.get('share-link'):
del (session['share-link'])
@@ -213,13 +213,12 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
{%- set checking_now = is_checking_now(watch) -%}
{%- set history_n = watch.history_n -%}
{%- set favicon = watch.get_favicon_filename() -%}
{%- set error_texts = watch.compile_error_texts(has_proxies=has_proxies) -%}
{%- set system_use_url_watchlist = datastore.data['settings']['application']['ui'].get('use_page_title_in_list') -%}
{# Class settings mirrored in changedetectionio/static/js/realtime.js for the frontend #}
{%- set row_classes = [
loop.cycle('pure-table-odd', 'pure-table-even'),
'processor-' ~ watch['processor'],
'has-error' if error_texts|length > 2 else '',
'has-error' if watch.compile_error_texts()|length > 2 else '',
'paused' if watch.paused is defined and watch.paused != False else '',
'unviewed' if watch.has_unviewed else '',
'has-restock-info' if watch.has_restock_info else 'no-restock-info',
@@ -272,7 +271,7 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
{% endif %}
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}">&nbsp;</a>
</span>
<div class="error-text" style="display:none;">{{ error_texts|safe }}</div>
<div class="error-text" style="display:none;">{{ watch.compile_error_texts(has_proxies=datastore.proxy_list)|safe }}</div>
{%- if watch['processor'] == 'text_json_diff' -%}
{%- if watch['has_ldjson_price_data'] and not watch['track_ldjson_price_data'] -%}
<div class="ldjson-price-track-offer">Switch to Restock & Price watch mode? <a href="{{url_for('price_data_follower.accept', uuid=watch.uuid)}}" class="pure-button button-xsmall">Yes</a> <a href="{{url_for('price_data_follower.reject', uuid=watch.uuid)}}" class="">No</a></div>
@@ -285,7 +284,10 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
</div>
<div class="status-icons">
<a class="link-spread" href="{{url_for('ui.form_share_put_watch', uuid=watch.uuid)}}"><img src="{{url_for('static_content', group='images', filename='spread.svg')}}" class="status-icon icon icon-spread" title="Create a link to share watch config with others" ></a>
{{ watch.effective_browser_profile.get_fetcher_class_name()|fetcher_status_icons }}
{%- set effective_fetcher = watch.get_fetch_backend if watch.get_fetch_backend != "system" else system_default_fetcher -%}
{%- if effective_fetcher and ("html_webdriver" in effective_fetcher or "html_" in effective_fetcher or "extra_browser_" in effective_fetcher) -%}
{{ effective_fetcher|fetcher_status_icons }}
{%- endif -%}
{%- if watch.is_pdf -%}<img class="status-icon" src="{{url_for('static_content', group='images', filename='pdf-icon.svg')}}" alt="Converting PDF to text" >{%- endif -%}
{%- if watch.has_browser_steps -%}<img class="status-icon status-browsersteps" src="{{url_for('static_content', group='images', filename='steps.svg')}}" alt="Browser Steps is enabled" >{%- endif -%}
@@ -303,20 +305,12 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
{%- endif -%}
{%- if watch.get('restock') and watch['restock'].get('price') -%}
{%- set restock = watch['restock'] -%}
{%- set price = restock.get('price') -%}
{%- set cur = restock.get('currency','') -%}
{%- if price is not none and (price|string)|regex_search('\d') -%}
<span class="restock-label price" title="{{ _('Price') }}">
{# @todo: make parse_currency/parse_decimal aware of the locale of the actual web page and use that instead changedetectionio/processors/restock_diff/__init__.py #}
{%- if price is number -%}{# It's a number so we can convert it to their locale' #}
{{ price|format_number_locale }} {{ cur }}<!-- as number -->
{%- else -%}{# It's totally fine if it arrives as something else, the website might be something weird in this field #}
{{ price }} {{ cur }}<!-- as string -->
{%- if watch['restock']['price'] is number -%}
<span class="restock-label price" title="{{ _('Price') }}">
{{ watch['restock']['price']|format_number_locale if watch['restock'].get('price') else '' }} {{ watch['restock'].get('currency','') }}
</span>
{%- else -%} <!-- watch['restock']['price'] is not a number, can't output it -->
{%- endif -%}
</span>
{%- endif -%}
{%- elif not watch.has_restock_info -%}
<span class="restock-label error">{{ _('No information') }}</span>
{%- endif -%}
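The price branch in the template hunk above can be sketched in Python to make the decision order easier to follow. This is only an illustration under stated assumptions: `render_price` is a hypothetical helper, and the fixed `,.2f` formatting is a stand-in for the `format_number_locale` filter, whose real behaviour is not shown in this diff.

```python
import re


def render_price(price, currency=""):
    """Return the label text for a restock price, or None when nothing is shown.

    Hypothetical helper mirroring the template branches; not part of the project.
    """
    # Same guards as the template: a truthy price that contains at least one digit
    if not price or not re.search(r"\d", str(price)):
        return None
    if isinstance(price, (int, float)) and not isinstance(price, bool):
        # Stand-in for the format_number_locale filter (assumed fixed formatting)
        text = f"{price:,.2f}"
    else:
        # Non-numeric values are passed through unchanged
        text = str(price)
    return f"{text} {currency}".strip()
```

A numeric price goes through the formatter, while a string such as `"ca. 12 EUR"` is shown as it arrived.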
+60 -43
@@ -1,4 +1,5 @@
import sys
from changedetectionio.strtobool import strtobool
from loguru import logger
from changedetectionio.content_fetchers.exceptions import BrowserStepsStepException
import os
@@ -24,71 +25,87 @@ SCREENSHOT_MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_
# Most modern GPUs support 16384x16384 textures, so 1280x10000 is safe
SCREENSHOT_SIZE_STITCH_THRESHOLD = int(os.getenv("SCREENSHOT_CHUNK_HEIGHT", 10000))
# available_fetchers() will scan this implementation looking for anything starting with html_
# this information is used in the form selections
from changedetectionio.content_fetchers.requests import fetcher as html_requests
import importlib.resources
XPATH_ELEMENT_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text(encoding='utf-8')
INSTOCK_DATA_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('stock-not-in-stock.js').read_text(encoding='utf-8')
FAVICON_FETCHER_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('favicon-fetcher.js').read_text(encoding='utf-8')
# Registry: clean fetcher name → fetcher class (e.g. 'requests', 'playwright', 'cloakbrowser')
FETCHERS: dict = {}
def register_fetcher(name: str, cls) -> None:
"""Register a fetcher class under its clean name (no html_ prefix)."""
FETCHERS[name] = cls
def get_fetcher(name: str):
"""Return the fetcher class for a clean name, or None."""
return FETCHERS.get(name)
def available_fetchers():
"""Return list of (name, description) for all registered fetchers."""
return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
if hasattr(cls, 'fetcher_description')]
# See the if statement at the bottom of this file for how we switch between playwright and webdriver
import inspect
p = []
# Get built-in fetchers (but skip plugin fetchers that were added via setattr)
for name, obj in inspect.getmembers(sys.modules[__name__], inspect.isclass):
if inspect.isclass(obj):
# @todo html_ is maybe better as fetcher_ or something
# In this case, make sure to edit the default one in store.py and fetch_site_status.py
if name.startswith('html_'):
# Skip plugin fetchers that were already registered
if name not in _plugin_fetchers:
t = tuple([name, obj.fetcher_description])
p.append(t)
# Get plugin fetchers from cache (already loaded at module init)
for name, fetcher_class in _plugin_fetchers.items():
if hasattr(fetcher_class, 'fetcher_description'):
t = tuple([name, fetcher_class.fetcher_description])
p.append(t)
else:
logger.warning(f"Plugin fetcher '{name}' does not have fetcher_description attribute")
return p
def available_browser_fetchers():
"""Return list of (name, description) for fetchers that support screenshots (browser-type fetchers)."""
return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
if cls.supports_screenshots]
def get_plugin_fetchers():
"""Load and return all plugin fetchers from the centralized plugin manager."""
from changedetectionio.pluggy_interface import plugin_manager
def _load_fetchers():
"""Load all fetchers (built-ins + plugins) into the FETCHERS registry."""
from changedetectionio.pluggy_interface import plugin_manager, register_builtin_fetchers
# Built-ins must be registered first
register_builtin_fetchers()
# Then external plugins
fetchers = {}
try:
# Call the register_content_fetcher hook from all registered plugins
results = plugin_manager.hook.register_content_fetcher()
for result in results:
if result:
name, fetcher_class = result
register_fetcher(name, fetcher_class)
logger.info(f"Registered fetcher: {name} - {getattr(fetcher_class, 'fetcher_description', '?')}")
fetchers[name] = fetcher_class
# Register in current module so hasattr() checks work
setattr(sys.modules[__name__], name, fetcher_class)
logger.info(f"Registered plugin fetcher: {name} - {getattr(fetcher_class, 'fetcher_description', 'No description')}")
except Exception as e:
logger.error(f"Error loading plugin fetchers: {e}")
return fetchers
# Default browser profiles always shown in the browser profiles table (keyed by machine name)
DEFAULT_BROWSER_PROFILES: dict = {}
# Initialize plugins at module load time
_plugin_fetchers = get_plugin_fetchers()
def _register_default_browser_profiles():
"""Register browser profiles that are always present in the profiles table."""
from changedetectionio.model.browser_profile import BUILTIN_REQUESTS
DEFAULT_BROWSER_PROFILES[BUILTIN_REQUESTS.get_machine_name()] = BUILTIN_REQUESTS
# Decide which is the 'real' HTML webdriver, this is more a system wide config
# rather than site-specific.
use_playwright_as_chrome_fetcher = os.getenv('PLAYWRIGHT_DRIVER_URL', False)
if use_playwright_as_chrome_fetcher:
# @note - For now, browser steps always uses playwright
if not strtobool(os.getenv('FAST_PUPPETEER_CHROME_FETCHER', 'False')):
logger.debug('Using Playwright library as fetcher')
from .playwright import fetcher as html_webdriver
else:
logger.debug('Using direct Python Puppeteer library as fetcher')
from .puppeteer import fetcher as html_webdriver
else:
logger.debug("Falling back to selenium as fetcher")
from .webdriver_selenium import fetcher as html_webdriver
# Populate the registry at module load time
_load_fetchers()
_register_default_browser_profiles()
# Register built-in fetchers as plugins after all imports are complete
from changedetectionio.pluggy_interface import register_builtin_fetchers
register_builtin_fetchers()
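The registry approach in this file (a dict keyed by clean fetcher name, plus small lookup helpers) can be tried in isolation. The sketch below reuses the function names from the diff, but `DemoRequestsFetcher` and `DemoBrowserFetcher` are hypothetical stand-ins for real fetcher classes, and `getattr` with a default is used where the diff reads `cls.supports_screenshots` directly.

```python
# Minimal stand-alone sketch of a name -> class registry
FETCHERS = {}


def register_fetcher(name, cls):
    """Register a fetcher class under its clean name."""
    FETCHERS[name] = cls


def get_fetcher(name):
    """Return the registered class, or None when the name is unknown."""
    return FETCHERS.get(name)


def available_fetchers():
    """List (name, description) for every class that has a description."""
    return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
            if hasattr(cls, "fetcher_description")]


def available_browser_fetchers():
    """List (name, description) for classes that can take screenshots."""
    return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
            if getattr(cls, "supports_screenshots", False)]


class DemoRequestsFetcher:  # hypothetical plain HTTP fetcher
    fetcher_description = "Plain HTTP"
    supports_screenshots = False


class DemoBrowserFetcher:  # hypothetical browser fetcher
    fetcher_description = "Demo browser"
    supports_screenshots = True


register_fetcher("requests", DemoRequestsFetcher)
register_fetcher("demo_browser", DemoBrowserFetcher)
```

Because dicts preserve insertion order, the listing order follows registration order, which is presumably why the diff registers built-ins before external plugins.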
+18 -32
@@ -70,41 +70,37 @@ class Fetcher():
supports_screenshots = False # Can capture page screenshots
supports_xpath_element_data = False # Can extract xpath element positions/data for visual selector
# Icon shown in the watch list when this fetcher is the effective fetcher.
# Set to a dict with 'filename', 'alt', 'title' keys (image served from static/images/).
# None means no icon is shown (e.g. plain HTTP requests fetcher).
status_icon = None
# Screenshot element locking - prevents layout shifts during screenshot capture
# Only needed for visual comparison (image_ssim_diff processor)
# Locks element dimensions in the first viewport to prevent headers/ads from resizing
lock_viewport_elements = False # Default: disabled for performance
# BrowserProfile-derived settings — applied by browser fetchers, ignored by html_requests
viewport_width: int = 1280
viewport_height: int = 1000
block_images: bool = False
block_fonts: bool = False
profile_user_agent: str = None # Profile-level UA; lower priority than request_headers User-Agent
ignore_https_errors: bool = False
locale: str = None
service_workers: str = 'allow'
extra_delay: int = 0
def __init__(self, **kwargs):
if kwargs and 'screenshot_format' in kwargs:
self.screenshot_format = kwargs.get('screenshot_format')
# Allow lock_viewport_elements to be set via kwargs
if kwargs and 'lock_viewport_elements' in kwargs:
self.lock_viewport_elements = kwargs.get('lock_viewport_elements')
# BrowserProfile fields — store whatever was passed, subclasses use them
for field in ('viewport_width', 'viewport_height', 'block_images', 'block_fonts',
'profile_user_agent', 'ignore_https_errors', 'locale',
'service_workers', 'extra_delay'):
if field in kwargs:
setattr(self, field, kwargs[field])
@classmethod
def get_status_icon_data(cls):
"""Return data for status icon to display in the watch overview.
This method can be overridden by subclasses to provide custom status icons.
Returns:
dict or None: Dictionary with icon data:
{
'filename': 'icon-name.svg', # Icon filename
'alt': 'Alt text', # Alt attribute
'title': 'Tooltip text', # Title attribute
'style': 'height: 1em;' # Optional inline CSS
}
Or None if no icon
"""
return None
def clear_content(self):
"""
@@ -202,16 +198,6 @@ class Fetcher():
# Stop processing here
raise BrowserStepsStepException(step_n=step_n, original_e=e)
def disk_cleanup_after_fetch(self):
"""Remove any temporary files written to disk during a fetch.
The default implementation is a no-op. Browser-based fetchers
override this to delete browser-step screenshots and any other
ephemeral files they create. Called by the processor after
``quit()`` regardless of whether the fetch succeeded or failed.
"""
pass
# It's always good to reset these
def delete_browser_steps_screenshots(self):
import glob
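The BrowserProfile handling in `Fetcher.__init__` above relies on class-level defaults that are overridden per instance only when a matching keyword argument is passed. A reduced sketch of that pattern follows; `ProfileSettings` and its shortened field list are hypothetical and exist only for illustration.

```python
class ProfileSettings:
    """Hypothetical reduced version of the class-default / kwargs-override pattern."""

    # Class-level defaults, used when no override is supplied
    viewport_width = 1280
    viewport_height = 1000
    block_images = False
    locale = None

    _FIELDS = ("viewport_width", "viewport_height", "block_images", "locale")

    def __init__(self, **kwargs):
        # Copy only known fields; unrelated kwargs are silently ignored
        for field in self._FIELDS:
            if field in kwargs:
                setattr(self, field, kwargs[field])


default = ProfileSettings()
custom = ProfileSettings(viewport_width=1920, locale="de-DE", unrelated=True)
```

Checking `field in kwargs` rather than truthiness means an explicit `False` or `0` still overrides the default.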
@@ -0,0 +1,471 @@
import asyncio
import gc
import json
import os
from urllib.parse import urlparse
from loguru import logger
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_MAX_TOTAL_HEIGHT, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, FAVICON_FETCHER_JS
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable, \
BrowserStepsStepException
async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import os
import time
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport_size
dimensions_time = time.time() - setup_start
logger.debug(f"{watch_info}Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
# Use an approach similar to puppeteer: set a larger viewport and take screenshots in chunks
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD # Size that won't cause GPU to overflow
screenshot_chunks = []
y = 0
elements_locked = False
# Only lock viewport elements if explicitly enabled (for image_ssim_diff processor)
# This prevents headers/ads from resizing when viewport changes
if lock_viewport_elements and page_height > page.viewport_size['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'lock-elements-sizing.js')
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
await page.evaluate(lock_elements_js)
elements_locked = True
lock_time = time.time() - lock_start
logger.debug(f"{watch_info}Viewport element locking enabled (took {lock_time:.2f}s)")
if page_height > page.viewport_size['height']:
if page_height < step_size:
step_size = page_height # In case the page is taller than the default viewport but shorter than the proposed step size
viewport_start = time.time()
logger.debug(f"{watch_info}Setting bigger viewport to step through large page width W{page.viewport_size['width']}xH{step_size} because page_height > viewport_size")
# Set viewport to a larger size to capture more content at once
await page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
viewport_time = time.time() - viewport_start
logger.debug(f"{watch_info}Viewport changed to {page.viewport_size['width']}x{step_size} (took {viewport_time:.2f}s)")
# Capture screenshots in chunks up to the max total height
capture_start = time.time()
chunk_times = []
# Use PNG for better quality (no compression artifacts), JPEG for smaller size
screenshot_type = screenshot_format.lower() if screenshot_format else 'jpeg'
# PNG should use quality 100, JPEG uses configurable quality
screenshot_quality = 100 if screenshot_type == 'png' else int(os.getenv("SCREENSHOT_QUALITY", 72))
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
# Only scroll if not at the top (y > 0)
if y > 0:
await page.evaluate(f"window.scrollTo(0, {y})")
# Request GC only before screenshot (not 3x per chunk)
await page.request_gc()
screenshot_kwargs = {
'type': screenshot_type,
'full_page': False
}
# Only pass quality parameter for jpeg (PNG doesn't support it in Playwright)
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
# Restore original viewport size
await page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
# Unlock element dimensions if they were locked
if elements_locked:
unlock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'unlock-elements-sizing.js')
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
logger.debug(f"{watch_info}Element dimensions unlocked after screenshot capture")
capture_time = time.time() - capture_start
total_capture_time = sum(chunk_times)
logger.debug(f"{watch_info}All {len(screenshot_chunks)} chunks captured in {capture_time:.2f}s (total chunk time: {total_capture_time:.2f}s)")
# If we have multiple chunks, stitch them together
if len(screenshot_chunks) > 1:
stitch_start = time.time()
logger.debug(f"{watch_info}Starting stitching of {len(screenshot_chunks)} chunks")
# Always use spawn subprocess for ANY stitching (2+ chunks)
# PIL allocates at C level and Python GC never releases it - subprocess exit forces OS to reclaim
# Trade-off: 35MB resource_tracker vs 500MB+ PIL leak in main process
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
# Send via raw bytes (no pickle)
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
setup_time = total_time - capture_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
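The chunking arithmetic in `capture_full_page_async` can be checked without a browser. The sketch below reproduces only the step-size and scroll-offset logic; `chunk_offsets` is a hypothetical helper, and the default of 20000 for `max_total_height` is an assumption chosen for the example (the real limit comes from `SCREENSHOT_MAX_TOTAL_HEIGHT`).

```python
def chunk_offsets(page_height, viewport_height, stitch_threshold=10000, max_total_height=20000):
    """Return (step_size, scroll offsets) for a chunked capture.

    Hypothetical helper: only the arithmetic, no Playwright calls.
    """
    step = stitch_threshold
    # A page taller than the viewport but shorter than the threshold is
    # captured in a single chunk sized to the page itself
    if viewport_height < page_height < step:
        step = page_height
    offsets = []
    y = 0
    while y < min(page_height, max_total_height):
        offsets.append(y)
        y += step
    return step, offsets
```

Each offset corresponds to one `page.screenshot()` call; more than one offset means the chunks need stitching afterwards.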
class fetcher(Fetcher):
fetcher_description = "Playwright {}/Javascript".format(
os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').capitalize()
)
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
fetcher_description += " via '{}'".format(os.getenv("PLAYWRIGHT_DRIVER_URL"))
browser_type = ''
command_executor = ''
# Configs for Proxy setup
# In the environment variables each key is prefixed with "playwright_proxy_", for example "playwright_proxy_server"
playwright_proxy_settings_mappings = ['bypass', 'server', 'username', 'password']
proxy = None
# Capability flags
supports_browser_steps = True
supports_screenshots = True
supports_xpath_element_data = True
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for Playwright fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
self.browser_type = os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').strip('"')
if custom_browser_connection_url:
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
# Fallback to fetching from system
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value in quotes
self.browser_connection_url = os.getenv("PLAYWRIGHT_DRIVER_URL", 'ws://playwright-chrome:3000').strip('"')
# If any proxy settings are enabled, then we should setup the proxy object
proxy_args = {}
for k in self.playwright_proxy_settings_mappings:
v = os.getenv('playwright_proxy_' + k, False)
if v:
proxy_args[k] = v.strip('"')
if proxy_args:
self.proxy = proxy_args
# allow per-watch proxy selection override
if proxy_override:
self.proxy = {'server': proxy_override}
if self.proxy:
# Playwright needs separate username and password values
parsed = urlparse(self.proxy.get('server'))
if parsed.username:
self.proxy['username'] = parsed.username
self.proxy['password'] = parsed.password
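The credential handling at the end of `__init__` depends on `urllib.parse.urlparse` exposing `username` and `password` from the proxy URL. A small stand-alone sketch, where `proxy_with_credentials` is a hypothetical helper and the proxy host is made up:

```python
from urllib.parse import urlparse


def proxy_with_credentials(server):
    """Build a proxy dict, lifting any user:password out of the URL.

    Hypothetical helper mirroring the constructor logic above.
    """
    proxy = {"server": server}
    parsed = urlparse(server)
    if parsed.username:
        # Separate keys, because Playwright expects them apart from the URL
        proxy["username"] = parsed.username
        proxy["password"] = parsed.password
    return proxy
```

As in the diff, the `server` value keeps the embedded credentials; only the extra keys are added.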
async def screenshot_step(self, step_n=''):
super().screenshot_step(step_n=step_n)
watch_uuid = getattr(self, 'watch_uuid', None)
screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Request GC immediately after screenshot to free memory
# Screenshots can be large and browser steps take many of them
await self.page.request_gc()
if self.browser_steps_screenshot_path is not None:
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
logger.debug(f"Saving step screenshot to {destination}")
with open(destination, 'wb') as f:
f.write(screenshot)
# Clear local reference to allow screenshot bytes to be collected
del screenshot
gc.collect()
async def save_step_html(self, step_n):
super().save_step_html(step_n=step_n)
content = await self.page.content()
# Request GC after getting page content
await self.page.request_gc()
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.html'.format(step_n))
logger.debug(f"Saving step HTML to {destination}")
with open(destination, 'w', encoding='utf-8') as f:
f.write(content)
# Clear local reference
del content
gc.collect()
async def run(self,
fetch_favicon=True,
current_include_filters=None,
empty_pages_are_a_change=False,
ignore_status_codes=False,
is_binary=False,
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
from playwright.async_api import async_playwright
import playwright._impl._errors
import time
self.delete_browser_steps_screenshots()
self.watch_uuid = watch_uuid # Store for use in screenshot_step
response = None
async with async_playwright() as p:
browser_type = getattr(p, self.browser_type)
# Seemed to cause a connection Exception even though I can see it connect
# self.browser = browser_type.connect(self.command_executor, timeout=timeout*1000)
# 60,000 connection timeout only
browser = await browser_type.connect_over_cdp(self.browser_connection_url, timeout=60000)
# SOCKS5 with authentication is not supported (yet)
# https://github.com/microsoft/playwright/issues/10567
# Set user agent to prevent Cloudflare from blocking the browser
# Use the default one configured in the App.py model that's passed from fetch_site_status.py
context = await browser.new_context(
accept_downloads=False, # Should never be needed
bypass_csp=True, # This is needed to enable JavaScript execution on GitHub and others
extra_http_headers=request_headers,
ignore_https_errors=True,
proxy=self.proxy,
service_workers=os.getenv('PLAYWRIGHT_SERVICE_WORKERS', 'allow'), # Should be `allow` or `block` - sites like YouTube can transmit large amounts of data via Service Workers
user_agent=manage_user_agent(headers=request_headers),
)
self.page = await context.new_page()
# Listen for all console events and handle errors
self.page.on("console", lambda msg: logger.debug(f"Playwright console: Watch URL: {url} {msg.type}: {msg.text} {msg.args}"))
# Re-use as much code from browser steps as possible so it's the same
from changedetectionio.browser_steps.browser_steps import steppable_browser_interface
browsersteps_interface = steppable_browser_interface(start_url=url)
browsersteps_interface.page = self.page
response = await browsersteps_interface.action_goto_url(value=url)
if response is None:
await context.close()
await browser.close()
logger.debug("Content Fetcher > Response object from the browser communication was none")
raise EmptyReply(url=url, status_code=None)
# In async_playwright, all_headers() returns a coroutine
try:
self.headers = await response.all_headers()
except TypeError:
# Fallback for sync version
self.headers = response.all_headers()
try:
if self.webdriver_js_execute_code is not None and len(self.webdriver_js_execute_code):
await browsersteps_interface.action_execute_js(value=self.webdriver_js_execute_code, selector=None)
except playwright._impl._errors.TimeoutError as e:
await context.close()
await browser.close()
# This can be ok, we will try to grab what we could retrieve
pass
except Exception as e:
logger.debug(f"Content Fetcher > Other exception when executing custom JS code {str(e)}")
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
extra_wait = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
await self.page.wait_for_timeout(extra_wait * 1000)
try:
self.status_code = response.status
except Exception as e:
# https://github.com/dgtlmoon/changedetection.io/discussions/2122#discussioncomment-8241962
logger.critical(f"Response from the browser/Playwright did not have a status_code! Response follows.")
logger.critical(response)
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
if fetch_favicon:
try:
self.favicon_blob = await self.page.evaluate(FAVICON_FETCHER_JS)
await self.page.request_gc()
except Exception as e:
logger.error(f"Error fetching FavIcon info {str(e)}, continuing.")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Finally block will handle cleanup
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
if not empty_pages_are_a_change and len((await self.page.content()).strip()) == 0:
logger.debug("Content Fetcher > Content was empty, empty_pages_are_a_change = False")
await context.close()
await browser.close()
raise EmptyReply(url=url, status_code=response.status)
# Wrap remaining operations in try/finally to ensure cleanup
try:
# Run Browser Steps here
if self.browser_steps:
try:
await self.iterate_browser_steps(start_url=url)
except BrowserStepsStepException:
# Finally block will handle cleanup
raise
await self.page.wait_for_timeout(extra_wait * 1000)
now = time.time()
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
if current_include_filters is not None:
await self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
else:
await self.page.evaluate("var include_filters=''")
await self.page.request_gc()
# request_gc before and after evaluate to free up memory
# @todo browsersteps etc
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
})
await self.page.request_gc()
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
await self.page.request_gc()
self.content = await self.page.content()
await self.page.request_gc()
logger.debug(f"Scrape xPath element data in browser done in {time.time() - now:.2f}s")
# Bug 3 in Playwright screenshot handling
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
# JPEG is better here because the screenshots can be very very large
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
# The actual screenshot - this is always base64 and needs decoding, which costs a lot of CPU
self.screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Force aggressive memory cleanup - screenshots are large and base64 decode creates temporary buffers
await self.page.request_gc()
gc.collect()
except ScreenshotUnavailable:
# Re-raise screenshot unavailable exceptions
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
finally:
# Clean up resources properly with timeouts to prevent hanging
try:
if hasattr(self, 'page') and self.page:
await self.page.request_gc()
await asyncio.wait_for(self.page.close(), timeout=5.0)
logger.debug(f"Successfully closed page for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing page for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing page for {url}: {e}")
finally:
self.page = None
try:
if context:
await asyncio.wait_for(context.close(), timeout=5.0)
logger.debug(f"Successfully closed context for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing context for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing context for {url}: {e}")
finally:
context = None
try:
if browser:
await asyncio.wait_for(browser.close(), timeout=5.0)
logger.debug(f"Successfully closed browser connection for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing browser connection for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing browser for {url}: {e}")
finally:
browser = None
# Force Python GC to release Playwright resources immediately
# Playwright objects can have circular references that delay cleanup
gc.collect()
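The cleanup section above repeats one pattern three times: await a close call under `asyncio.wait_for`, and treat a timeout or any other error as non-fatal. The sketch below factors that pattern into a hypothetical `close_with_timeout` helper that returns a status string instead of logging; the dummy coroutines stand in for `page.close()`, `context.close()` and `browser.close()`.

```python
import asyncio


async def close_with_timeout(close, label, timeout=5.0):
    """Await close() but never hang or raise; hypothetical helper."""
    try:
        await asyncio.wait_for(close(), timeout=timeout)
        return f"closed {label}"
    except asyncio.TimeoutError:
        return f"timed out closing {label}"
    except Exception as e:
        return f"error closing {label}: {e}"


async def fast_close():
    return None


async def slow_close():
    await asyncio.sleep(1)


async def broken_close():
    raise RuntimeError("connection lost")
```

For example, `asyncio.run(close_with_timeout(slow_close, "context", timeout=0.05))` reports a timeout rather than blocking for the full second.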
# Plugin registration for built-in fetcher
class PlaywrightFetcherPlugin:
"""Plugin class that registers the Playwright fetcher as a built-in plugin."""
def register_content_fetcher(self):
"""Register the Playwright fetcher"""
return ('html_webdriver', fetcher)
# Create module-level instance for plugin registration
playwright_plugin = PlaywrightFetcherPlugin()
@@ -1,41 +0,0 @@
"""
Playwright CDP fetcher: connects to a remote browser via Chrome DevTools Protocol.
browser_connection_url must be supplied via the resolved BrowserProfile
(set by preconfigure_browser_profiles_based_on_env at startup or edited in the UI).
"""
from loguru import logger
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright Chrome (CDP/Remote)"
requires_connection_url = True
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(proxy_override=proxy_override, custom_browser_connection_url=custom_browser_connection_url, **kwargs)
if custom_browser_connection_url:
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
logger.critical("Playwright CDP fetcher has no browser_connection_url — browser profile was not configured. "
"Set PLAYWRIGHT_DRIVER_URL or configure a browser profile in Settings.")
self.browser_connection_url = None
# CDP always connects to Chromium
self.browser_type = 'chromium'
async def _connect_browser(self, p):
browser_type = getattr(p, self.browser_type)
return await browser_type.connect_over_cdp(self.browser_connection_url, timeout=60_000)
class PlaywrightCDPPlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_cdp', fetcher)
cdp_plugin = PlaywrightCDPPlugin()
@@ -1,403 +0,0 @@
"""
Playwright-based content fetchers.
Submodules:
cdp - connect to a remote browser via Chrome DevTools Protocol (CDP/WebSocket)
chrome - launch a local Chromium browser
firefox - launch a local Firefox browser
webkit - launch a local WebKit (Safari-engine) browser
"""
import asyncio
import gc
import json
import os
import re
from urllib.parse import urlparse
from loguru import logger
from changedetectionio.content_fetchers import (
SCREENSHOT_MAX_HEIGHT_DEFAULT,
SCREENSHOT_MAX_TOTAL_HEIGHT,
SCREENSHOT_SIZE_STITCH_THRESHOLD,
FAVICON_FETCHER_JS,
INSTOCK_DATA_JS,
XPATH_ELEMENT_JS,
visualselector_xpath_selectors,
)
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import (
BrowserStepsStepException,
EmptyReply,
Non200ErrorCodeReceived,
PageUnloadable,
ScreenshotUnavailable,
)
async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import time
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport_size
dimensions_time = time.time() - setup_start
logger.debug(f"{watch_info}Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD
screenshot_chunks = []
y = 0
elements_locked = False
if lock_viewport_elements and page_height > page.viewport_size['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), '..', 'res', 'lock-elements-sizing.js')
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
await page.evaluate(lock_elements_js)
elements_locked = True
logger.debug(f"{watch_info}Viewport element locking enabled (took {time.time() - lock_start:.2f}s)")
if page_height > page.viewport_size['height']:
if page_height < step_size:
step_size = page_height
await page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
capture_start = time.time()
chunk_times = []
screenshot_type = screenshot_format.lower() if screenshot_format else 'jpeg'
screenshot_quality = 100 if screenshot_type == 'png' else int(os.getenv("SCREENSHOT_QUALITY", 72))
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
if y > 0:
await page.evaluate(f"window.scrollTo(0, {y})")
await _safe_request_gc(page)
screenshot_kwargs = {'type': screenshot_type, 'full_page': False}
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
await page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
if elements_locked:
unlock_elements_js_path = os.path.join(os.path.dirname(__file__), '..', 'res', 'unlock-elements-sizing.js')
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
capture_time = time.time() - capture_start
if len(screenshot_chunks) > 1:
stitch_start = time.time()
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px | "
f"Setup: {total_time - capture_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
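The capture loop above scrolls in fixed steps until it reaches either the page height or the global height cap. As a rough sketch of that arithmetic only (the function name `chunk_offsets` and its parameters are illustrative and not part of the codebase, and the viewport-height check from the original is left out), the scroll positions could be computed like this:

```python
def chunk_offsets(page_height: int, step_size: int, max_total_height: int) -> list[int]:
    """Return the y positions at which one viewport-sized chunk would be captured.

    Hypothetical helper: it mirrors the `while y < min(...)` loop in the diff,
    but it is not a function that exists in changedetection.io.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    # A page shorter than one step is captured as a single, smaller chunk.
    step = min(step_size, page_height) if page_height > 0 else step_size
    limit = min(page_height, max_total_height)
    offsets = []
    y = 0
    while y < limit:
        offsets.append(y)
        y += step
    return offsets
```

For example, a 10000px page with a 4000px step and a 16000px cap would be captured at offsets 0, 4000 and 8000, while the same page with an 8000px cap stops after the second chunk.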
async def _safe_request_gc(page):
"""Request browser GC — Chromium-specific, silently ignored on Firefox/WebKit."""
try:
await page.request_gc()
except Exception:
pass
class PlaywrightBaseFetcher(Fetcher):
"""
Shared base for all Playwright fetchers.
Subclasses implement ``_connect_browser(playwright_instance)`` to return a
connected-or-launched browser object. Everything else (context creation,
page interaction, screenshot capture, browser-steps execution) lives here.
"""
playwright_proxy_settings_mappings = ['bypass', 'server', 'username', 'password']
proxy = None
# Capability flags
supports_browser_steps = True
supports_screenshots = True
supports_xpath_element_data = True
status_icon = {'filename': 'google-chrome-icon.png', 'alt': 'Using a Chrome browser', 'title': 'Using a Chrome browser'}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
# Subclasses may use this (e.g. CDP); others ignore it
self._custom_browser_connection_url = custom_browser_connection_url
proxy_args = {}
for k in self.playwright_proxy_settings_mappings:
v = os.getenv('playwright_proxy_' + k, False)
if v:
proxy_args[k] = v.strip('"')
if proxy_args:
self.proxy = proxy_args
if proxy_override:
self.proxy = {'server': proxy_override}
if self.proxy:
parsed = urlparse(self.proxy.get('server', ''))
if parsed.username:
self.proxy['username'] = parsed.username
self.proxy['password'] = parsed.password
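The tail of `__init__` lifts credentials embedded in the proxy URL into separate `username` / `password` keys. A minimal standalone sketch of that step, assuming a plain dict as input (the function name `split_proxy_credentials` and the example host are hypothetical):

```python
from urllib.parse import urlparse


def split_proxy_credentials(proxy: dict) -> dict:
    """Copy `proxy` and add 'username'/'password' when the server URL carries them.

    Hypothetical helper illustrating the urlparse step from the diff.
    """
    result = dict(proxy)
    parsed = urlparse(result.get('server', ''))
    if parsed.username:
        result['username'] = parsed.username
        # May be None when the URL contains a username but no password.
        result['password'] = parsed.password
    return result
```

Note that, as in the original, the `server` value itself is left untouched, so the credentials remain in the URL as well.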
def disk_cleanup_after_fetch(self):
"""Delete browser-step screenshots written during this fetch."""
self.delete_browser_steps_screenshots()
async def _connect_browser(self, playwright_instance):
"""Return an open browser object. Must be overridden by each subclass."""
raise NotImplementedError(f"{type(self).__name__} must implement _connect_browser()")
async def screenshot_step(self, step_n=''):
super().screenshot_step(step_n=step_n)
watch_uuid = getattr(self, 'watch_uuid', None)
screenshot = await capture_full_page_async(
page=self.page,
screenshot_format=self.screenshot_format,
watch_uuid=watch_uuid,
lock_viewport_elements=self.lock_viewport_elements,
)
await _safe_request_gc(self.page)
if self.browser_steps_screenshot_path is not None:
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
logger.debug(f"Saving step screenshot to {destination}")
with open(destination, 'wb') as f:
f.write(screenshot)
del screenshot
gc.collect()
async def save_step_html(self, step_n):
super().save_step_html(step_n=step_n)
content = await self.page.content()
await _safe_request_gc(self.page)
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.html'.format(step_n))
logger.debug(f"Saving step HTML to {destination}")
with open(destination, 'w', encoding='utf-8') as f:
f.write(content)
del content
gc.collect()
async def run(self,
fetch_favicon=True,
current_include_filters=None,
empty_pages_are_a_change=False,
ignore_status_codes=False,
is_binary=False,
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
from playwright.async_api import async_playwright
import playwright._impl._errors
import time
self.delete_browser_steps_screenshots()
self.watch_uuid = watch_uuid
response = None
async with async_playwright() as p:
browser = await self._connect_browser(p)
ua = manage_user_agent(headers=request_headers) or self.profile_user_agent or None
context_kwargs = dict(
accept_downloads=False,
bypass_csp=True,
extra_http_headers=request_headers,
ignore_https_errors=self.ignore_https_errors,
proxy=self.proxy,
service_workers=self.service_workers,
user_agent=ua,
viewport={'width': self.viewport_width, 'height': self.viewport_height},
)
if self.locale:
context_kwargs['locale'] = self.locale
context = await browser.new_context(**context_kwargs)
if self.block_images:
await context.route(
re.compile(r'\.(png|jpe?g|gif|svg|ico|webp|avif|bmp)(\?.*)?$', re.IGNORECASE),
lambda route: route.abort()
)
if self.block_fonts:
await context.route(
re.compile(r'\.(woff2?|ttf|otf|eot)(\?.*)?$', re.IGNORECASE),
lambda route: route.abort()
)
self.page = await context.new_page()
self.page.on("console", lambda msg: logger.debug(f"Playwright console: {url} {msg.type}: {msg.text}"))
from changedetectionio.browser_steps.browser_steps import steppable_browser_interface
browsersteps_interface = steppable_browser_interface(start_url=url)
browsersteps_interface.page = self.page
response = await browsersteps_interface.action_goto_url(value=url)
if response is None:
await context.close()
await browser.close()
raise EmptyReply(url=url, status_code=None)
try:
self.headers = await response.all_headers()
except TypeError:
self.headers = response.all_headers()
try:
if self.webdriver_js_execute_code is not None and len(self.webdriver_js_execute_code):
await browsersteps_interface.action_execute_js(value=self.webdriver_js_execute_code, selector=None)
except playwright._impl._errors.TimeoutError:
await context.close()
await browser.close()
pass
except Exception as e:
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
extra_wait = self.extra_delay + self.render_extract_delay
await self.page.wait_for_timeout(extra_wait * 1000)
try:
self.status_code = response.status
except Exception as e:
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
if fetch_favicon:
try:
self.favicon_blob = await self.page.evaluate(FAVICON_FETCHER_JS)
await _safe_request_gc(self.page)
except Exception as e:
logger.error(f"Error fetching favicon: {e}")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
try:
page_html = await self.page.content()
except Exception as e:
logger.warning(f"Got non-200 status {self.status_code} but failed to fetch page content: {e}")
page_html = None
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot, page_html=page_html)
if not empty_pages_are_a_change and len((await self.page.content()).strip()) == 0:
await context.close()
await browser.close()
raise EmptyReply(url=url, status_code=response.status)
try:
if self.browser_steps:
try:
await self.iterate_browser_steps(start_url=url)
except BrowserStepsStepException:
raise
await self.page.wait_for_timeout(extra_wait * 1000)
now = time.time()
if current_include_filters is not None:
await self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
else:
await self.page.evaluate("var include_filters=''")
await _safe_request_gc(self.page)
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
})
await _safe_request_gc(self.page)
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
await _safe_request_gc(self.page)
self.content = await self.page.content()
await _safe_request_gc(self.page)
logger.debug(f"Scrape xPath element data done in {time.time() - now:.2f}s")
self.screenshot = await capture_full_page_async(
page=self.page,
screenshot_format=self.screenshot_format,
watch_uuid=watch_uuid,
lock_viewport_elements=self.lock_viewport_elements,
)
await _safe_request_gc(self.page)
gc.collect()
except ScreenshotUnavailable:
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
finally:
for obj, name, close_coro in [
(self.page if hasattr(self, 'page') and self.page else None, 'page', lambda: self.page.close() if self.page else asyncio.sleep(0)),
(context, 'context', lambda: context.close() if context else asyncio.sleep(0)),
(browser, 'browser', lambda: browser.close() if browser else asyncio.sleep(0)),
]:
try:
await asyncio.wait_for(close_coro(), timeout=5.0)
except asyncio.TimeoutError:
logger.warning(f"Timed out closing {name} for {url}")
except Exception as e:
logger.warning(f"Error closing {name} for {url}: {e}")
self.page = None
context = None
browser = None
gc.collect()
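The `finally` block above closes page, context and browser one at a time, each under a five second timeout, and logs rather than raises on failure. The same idea in isolation might look like the sketch below; `close_quietly`, the fake coroutines and `demo` are illustrative names, and the standard `logging` module stands in for loguru here.

```python
import asyncio
import logging

log = logging.getLogger("cleanup-sketch")


async def close_quietly(name, close_coro_factory, timeout=5.0):
    """Await one close call under a timeout; report the outcome instead of raising."""
    try:
        await asyncio.wait_for(close_coro_factory(), timeout=timeout)
        return "closed"
    except asyncio.TimeoutError:
        log.warning("Timed out closing %s", name)
        return "timeout"
    except Exception as e:
        log.warning("Error closing %s: %s", name, e)
        return f"error: {e}"


# Fake resources used only to exercise the three outcomes.
async def _ok():
    return None


async def _hang():
    await asyncio.sleep(10)


async def _boom():
    raise RuntimeError("gone")


async def demo():
    return [
        await close_quietly("page", _ok),
        await close_quietly("context", _hang, timeout=0.01),
        await close_quietly("browser", _boom),
    ]
```

Closing in page, context, browser order and continuing after each failure means one stuck resource does not prevent the others from being released.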
@@ -1,27 +0,0 @@
"""
Playwright Chrome fetcher: launches a local Chromium browser directly.
No external browser container is required. Playwright must be installed
with Chromium browsers: ``playwright install chromium``.
"""
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright Chrome (local)"
async def _connect_browser(self, p):
launch_kwargs = {'headless': True}
if self.proxy:
launch_kwargs['proxy'] = self.proxy
return await p.chromium.launch(**launch_kwargs)
class PlaywrightChromePlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_chrome', fetcher)
chrome_plugin = PlaywrightChromePlugin()
@@ -1,33 +0,0 @@
"""
Playwright Firefox fetcher: launches a local Firefox browser directly.
No external browser container is required. Playwright must be installed
with Firefox browsers: ``playwright install firefox``.
Note: ``page.request_gc()`` is Chromium-specific and is silently skipped
on Firefox; this is handled transparently by ``_safe_request_gc()`` in
the base package.
"""
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright Firefox (local)"
status_icon = {'filename': 'firefox-icon.svg', 'alt': 'Using Firefox', 'title': 'Using Firefox'}
async def _connect_browser(self, p):
launch_kwargs = {'headless': True}
if self.proxy:
launch_kwargs['proxy'] = self.proxy
return await p.firefox.launch(**launch_kwargs)
class PlaywrightFirefoxPlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_firefox', fetcher)
firefox_plugin = PlaywrightFirefoxPlugin()
@@ -1,30 +0,0 @@
"""
Playwright WebKit fetcher: launches a local WebKit (Safari-engine) browser.
No external browser container is required. Playwright must be installed
with WebKit browsers: ``playwright install webkit``.
Note: ``page.request_gc()`` is Chromium-specific and is silently skipped
on WebKit; this is handled transparently by ``_safe_request_gc()`` in the base package.
"""
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright WebKit/Safari (local)"
async def _connect_browser(self, p):
launch_kwargs = {'headless': True}
if self.proxy:
launch_kwargs['proxy'] = self.proxy
return await p.webkit.launch(**launch_kwargs)
class PlaywrightWebKitPlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_webkit', fetcher)
webkit_plugin = PlaywrightWebKitPlugin()
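The three local fetchers above differ only in which Playwright engine they launch; the base class owns the rest of the workflow and asks the subclass for a browser through `_connect_browser()`. The sketch below reproduces that template-method shape with fake objects so it runs without Playwright installed. Every class name here (`BaseFetcher`, `LocalEngineFetcher`, `FakePlaywright`, and so on) and the `engine` attribute are hypothetical and are not how the deleted files were written.

```python
import asyncio


class BaseFetcher:
    """Hypothetical stand-in for the shared base class."""
    proxy = None

    async def _connect_browser(self, playwright_instance):
        raise NotImplementedError(f"{type(self).__name__} must implement _connect_browser()")

    async def run(self, playwright_instance):
        # The base class drives the workflow and only asks the subclass for a browser.
        return await self._connect_browser(playwright_instance)


class LocalEngineFetcher(BaseFetcher):
    engine = 'chromium'  # assumption: engine chosen by a class attribute

    async def _connect_browser(self, p):
        launch_kwargs = {'headless': True}
        if self.proxy:
            launch_kwargs['proxy'] = self.proxy
        return await getattr(p, self.engine).launch(**launch_kwargs)


class FirefoxFetcher(LocalEngineFetcher):
    engine = 'firefox'


class FakeBrowserType:
    """Test double that records what launch() was called with."""
    def __init__(self, name):
        self.name = name

    async def launch(self, **kwargs):
        return (self.name, kwargs)


class FakePlaywright:
    def __init__(self):
        self.chromium = FakeBrowserType('chromium')
        self.firefox = FakeBrowserType('firefox')
        self.webkit = FakeBrowserType('webkit')
```

With real Playwright, `p.chromium`, `p.firefox` and `p.webkit` each expose `launch()`, which is what makes the attribute lookup in this sketch plausible.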
+20 -20
@@ -7,7 +7,6 @@ from urllib.parse import urlparse
from loguru import logger
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_DEFAULT_QUALITY, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, \
SCREENSHOT_MAX_TOTAL_HEIGHT, FAVICON_FETCHER_JS
@@ -167,8 +166,11 @@ async def capture_full_page(page, screenshot_format='JPEG', watch_uuid=None, loc
class fetcher(Fetcher):
fetcher_description = "Puppeteer Chromium"
requires_connection_url = True
fetcher_description = "Puppeteer/direct {}/Javascript".format(
os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').capitalize()
)
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
fetcher_description += " via '{}'".format(os.getenv("PLAYWRIGHT_DRIVER_URL"))
browser = None
browser_type = ''
@@ -180,10 +182,14 @@ class fetcher(Fetcher):
supports_screenshots = True
supports_xpath_element_data = True
status_icon = {'filename': 'google-chrome-icon.png', 'alt': 'Using a Chrome browser', 'title': 'Using a Chrome browser'}
def disk_cleanup_after_fetch(self):
self.delete_browser_steps_screenshots()
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for Puppeteer fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
@@ -192,10 +198,9 @@ class fetcher(Fetcher):
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
from loguru import logger
logger.critical("Puppeteer fetcher has no browser_connection_url — browser profile was not configured. "
"Set PLAYWRIGHT_DRIVER_URL or configure a browser profile in Settings.")
self.browser_connection_url = None
# Fallback to fetching from system
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
self.browser_connection_url = os.getenv("PLAYWRIGHT_DRIVER_URL", 'ws://playwright-chrome:3000').strip('"')
# allow per-watch proxy selection override
# @todo check global too?
@@ -265,7 +270,7 @@ class fetcher(Fetcher):
import re
self.delete_browser_steps_screenshots()
n = self.extra_delay + self.render_extract_delay
n = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 12)) + self.render_extract_delay
extra_wait = min(n, 15)
logger.debug(f"Extra wait set to {extra_wait}s, requested was {n}s.")
@@ -442,12 +447,8 @@ class fetcher(Fetcher):
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
try:
page_html = await self.page.content
except Exception as e:
logger.warning(f"Got non-200 status {self.status_code} but failed to fetch page content: {e}")
page_html = None
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot, page_html=page_html)
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
content = await self.page.content
@@ -547,10 +548,9 @@ class fetcher(Fetcher):
class PuppeteerFetcherPlugin:
"""Plugin class that registers the Puppeteer fetcher as a built-in plugin."""
@hookimpl
def register_content_fetcher(self):
"""Register the Puppeteer fetcher"""
return ('puppeteer', fetcher)
return ('html_webdriver', fetcher)
# Create module-level instance for plugin registration
+5 -29
@@ -8,7 +8,6 @@ import asyncio
from changedetectionio import strtobool
from changedetectionio.content_fetchers.exceptions import BrowserStepsInUnsupportedFetcher, EmptyReply, Non200ErrorCodeReceived
from changedetectionio.content_fetchers.base import Fetcher
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.validate_url import is_private_hostname
@@ -149,32 +148,10 @@ class fetcher(Fetcher):
# Default to UTF-8 for XML if no encoding found
r.encoding = 'utf-8'
else:
# No charset in HTTP header - sniff encoding in priority order matching browsers
# (WHATWG encoding sniffing algorithm):
# 1. BOM - highest confidence, check before anything else
# 2. <meta charset> in first 2kb
# 3. chardet statistical detection - last resort
# See: https://github.com/dgtlmoon/changedetection.io/issues/3952
boms = [
(b'\xef\xbb\xbf', 'utf-8-sig'),
(b'\xff\xfe', 'utf-16-le'),
(b'\xfe\xff', 'utf-16-be'),
]
bom_encoding = next((enc for bom, enc in boms if r.content.startswith(bom)), None)
if bom_encoding:
logger.info(f"URL: {url} Using encoding '{bom_encoding}' detected from BOM")
r.encoding = bom_encoding
else:
meta_charset_match = re.search(rb'<meta[^>]+charset\s*=\s*["\']?\s*([^"\'\s;>]+)', r.content[:2000], re.IGNORECASE)
if meta_charset_match:
encoding = meta_charset_match.group(1).decode('ascii', errors='ignore')
logger.info(f"URL: {url} No content-type encoding in HTTP headers - Using encoding '{encoding}' from HTML meta charset tag")
r.encoding = encoding
else:
encoding = chardet.detect(r.content)['encoding']
logger.warning(f"URL: {url} No charset in headers or meta tag, guessed encoding as '{encoding}' via chardet")
if encoding:
r.encoding = encoding
# For other content types, use chardet
encoding = chardet.detect(r.content)['encoding']
if encoding:
r.encoding = encoding
self.headers = r.headers
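One side of this hunk sniffs the encoding in priority order (BOM first, then a `<meta charset>` in the first 2000 bytes, then chardet), while the other goes straight to chardet. The first two steps can be sketched with the standard library alone; `sniff_encoding` is a hypothetical name, and returning `None` marks the point where the original would fall back to `chardet.detect()`.

```python
import re

# Byte-order marks and the codec each one implies, as listed in the diff.
_BOMS = [
    (b'\xef\xbb\xbf', 'utf-8-sig'),
    (b'\xff\xfe', 'utf-16-le'),
    (b'\xfe\xff', 'utf-16-be'),
]
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([^"\'\s;>]+)', re.IGNORECASE
)


def sniff_encoding(content: bytes):
    """Return an encoding name from a BOM or meta tag, or None if neither is found."""
    for bom, encoding in _BOMS:
        if content.startswith(bom):
            return encoding
    # Only the start of the document is searched, as in the diff.
    match = _META_CHARSET.search(content[:2000])
    if match:
        return match.group(1).decode('ascii', errors='ignore')
    return None  # caller would fall back to statistical detection here
```

Checking the BOM before the meta tag matters because a UTF-16 document's own `<meta>` tag is not readable as ASCII bytes.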
@@ -259,10 +236,9 @@ class fetcher(Fetcher):
class RequestsFetcherPlugin:
"""Plugin class that registers the requests fetcher as a built-in plugin."""
@hookimpl
def register_content_fetcher(self):
"""Register the requests fetcher"""
return ('requests', fetcher)
return ('html_requests', fetcher)
# Create module-level instance for plugin registration
@@ -3,13 +3,13 @@ import time
from loguru import logger
from changedetectionio.content_fetchers.base import Fetcher
from changedetectionio.content_fetchers.exceptions import Non200ErrorCodeReceived
from changedetectionio.pluggy_interface import hookimpl
class fetcher(Fetcher):
fetcher_description = "Selenium WebDriver Chrome"
requires_connection_url = True
if os.getenv("WEBDRIVER_URL"):
fetcher_description = f"WebDriver Chrome/Javascript via \"{os.getenv('WEBDRIVER_URL', '')}\""
else:
fetcher_description = "WebDriver Chrome/Javascript"
proxy = None
proxy_url = None
@@ -19,21 +19,26 @@ class fetcher(Fetcher):
supports_screenshots = True
supports_xpath_element_data = True
status_icon = {'filename': 'google-chrome-icon.png', 'alt': 'Using a Chrome browser', 'title': 'Using a Chrome browser'}
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for WebDriver fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
from urllib.parse import urlparse
from selenium.webdriver.common.proxy import Proxy
if custom_browser_connection_url:
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
if not custom_browser_connection_url:
self.browser_connection_url = os.getenv("WEBDRIVER_URL", 'http://browser-chrome:4444/wd/hub').strip('"')
else:
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
from loguru import logger
logger.critical("Selenium WebDriver fetcher has no browser_connection_url — browser profile was not configured. "
"Set WEBDRIVER_URL or configure a browser profile in Settings.")
self.browser_connection_url = None
##### PROXY SETUP #####
@@ -125,28 +130,22 @@ class fetcher(Fetcher):
if not "--window-size" in os.getenv("CHROME_OPTIONS", ""):
driver.set_window_size(1280, 1024)
driver.implicitly_wait(self.extra_delay)
driver.implicitly_wait(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))
if self.webdriver_js_execute_code is not None:
driver.execute_script(self.webdriver_js_execute_code)
# Selenium doesn't automatically wait for actions as well as Playwright does, so wait again
driver.implicitly_wait(self.extra_delay)
driver.implicitly_wait(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))
# @todo - how to check this? is it possible?
self.status_code = 200
# @todo somehow we should try to get this working for WebDriver
# raise EmptyReply(url=url, status_code=r.status_code)
# @todo - dom wait loaded?
import time
time.sleep(self.extra_delay + self.render_extract_delay)
time.sleep(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay)
self.content = driver.page_source
# Use Navigation Timing API to get the real HTTP status code (Chrome 102+)
# Read after the sleep so the page is fully settled
try:
nav_status = driver.execute_script(
"return window.performance.getEntriesByType('navigation')[0]?.responseStatus"
)
# Guard against 0 (file://, blocked requests) which should not raise Non200
self.status_code = int(nav_status) if nav_status and int(nav_status) > 0 else 200
except Exception:
self.status_code = 200
self.headers = {}
# Selenium always captures as PNG, convert to JPEG if needed
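The Navigation Timing lookup in this hunk treats a missing or zero `responseStatus` as 200 so that `file://` pages and blocked requests do not raise `Non200ErrorCodeReceived`. Pulled out as a pure function, that guard might read as follows; `normalise_nav_status` is a hypothetical name, and the value passed in is assumed to be whatever `driver.execute_script(...)` returned.

```python
def normalise_nav_status(nav_status) -> int:
    """Map a raw Navigation Timing responseStatus to a usable HTTP status code.

    Missing, zero, negative or unparsable values fall back to 200,
    mirroring the guard in the diff.
    """
    try:
        status = int(nav_status) if nav_status else 0
    except (TypeError, ValueError):
        return 200
    return status if status > 0 else 200
```

Keeping this separate from the WebDriver call makes the fallback rules easy to check without a browser.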
@@ -176,10 +175,6 @@ class fetcher(Fetcher):
img.close()
else:
self.screenshot = screenshot_png
if self.status_code != 200 and not ignore_status_codes:
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=self.screenshot, page_html=self.content)
except Exception as e:
driver.quit()
raise e
@@ -195,10 +190,9 @@ class fetcher(Fetcher):
class WebDriverSeleniumFetcherPlugin:
"""Plugin class that registers the WebDriver Selenium fetcher as a built-in plugin."""
@hookimpl
def register_content_fetcher(self):
"""Register the WebDriver Selenium fetcher"""
return ('selenium', fetcher)
return ('html_webdriver', fetcher)
# Create module-level instance for plugin registration
+56 -40
@@ -4,7 +4,6 @@ import flask_login
import locale
import os
import queue
import re
import sys
import threading
import time
@@ -218,12 +217,8 @@ def _jinja2_filter_format_number_locale(value: float) -> str:
"Formats for example 4000.10 to the local locale default of 4,000.10"
# Format the number with two decimal places (locale format string will return 6 decimal)
formatted_value = locale.format_string("%.2f", value, grouping=True)
return formatted_value
@app.template_filter('regex_search')
def _jinja2_filter_regex_search(value, pattern):
import re
return re.search(pattern, str(value)) is not None
return formatted_value
@app.template_global('is_checking_now')
def _watch_is_checking_now(watch_obj, format="%Y-%m-%d %H:%M:%S"):
@@ -341,38 +336,52 @@ def _jinja2_filter_format_duration(seconds):
@app.template_filter('fetcher_status_icons')
def _jinja2_filter_fetcher_status_icons(fetcher_name):
"""Return status icon HTML for a fetcher, or empty string if none.
"""Get status icon HTML for a given fetcher.
Built-in fetchers declare their icon via the ``status_icon`` class attribute
on their ``Fetcher`` subclass. Plugin fetchers may still use the pluggy
``collect_fetcher_status_icons`` hook as a fallback.
This filter checks both built-in fetchers and plugin fetchers for status icons.
Args:
fetcher_name: The fetcher name (e.g., 'html_webdriver', 'html_js_zyte')
Returns:
str: HTML string containing status icon elements
"""
from changedetectionio import content_fetchers
from changedetectionio.pluggy_interface import collect_fetcher_status_icons
from markupsafe import Markup
from flask import url_for
icon_data = None
fetcher_class = content_fetchers.get_fetcher(fetcher_name)
if fetcher_class is not None:
icon_data = getattr(fetcher_class, 'status_icon', None)
if not icon_data and callable(getattr(fetcher_class, 'get_status_icon_data', None)):
# First check if it's a plugin fetcher (plugins have priority)
plugin_icon_data = collect_fetcher_status_icons(fetcher_name)
if plugin_icon_data:
icon_data = plugin_icon_data
# Check if it's a built-in fetcher
elif hasattr(content_fetchers, fetcher_name):
fetcher_class = getattr(content_fetchers, fetcher_name)
if hasattr(fetcher_class, 'get_status_icon_data'):
icon_data = fetcher_class.get_status_icon_data()
# Fallback: pluggy hook for plugins that implement fetcher_status_icon
if not icon_data:
from changedetectionio.pluggy_interface import collect_fetcher_status_icons
icon_data = collect_fetcher_status_icons(fetcher_name)
# Build HTML from icon data
if icon_data and isinstance(icon_data, dict):
# Use 'group' from icon_data if specified, otherwise default to 'images'
group = icon_data.get('group', 'images')
if not icon_data:
return ''
# Try to use url_for, but fall back to manual URL building if endpoint not registered yet
try:
icon_url = url_for('static_content', group=group, filename=icon_data['filename'])
except:
# Fallback: build URL manually respecting APPLICATION_ROOT
from flask import request
app_root = request.script_root if hasattr(request, 'script_root') else ''
icon_url = f"{app_root}/static/{group}/{icon_data['filename']}"
group = icon_data.get('group', 'images')
icon_url = url_for('static_content', group=group, filename=icon_data['filename'])
style_attr = f' style="{icon_data["style"]}"' if icon_data.get('style') else ''
return Markup(f'<img class="status-icon" src="{icon_url}" alt="{icon_data["alt"]}" title="{icon_data["title"]}"{style_attr}>')
style_attr = f' style="{icon_data["style"]}"' if icon_data.get('style') else ''
html = f'<img class="status-icon" src="{icon_url}" alt="{icon_data["alt"]}" title="{icon_data["title"]}"{style_attr}>'
return Markup(html)
_RE_SANITIZE_TAG = re.compile(r'[^a-zA-Z0-9]')
return ''
@app.template_filter('sanitize_tag_class')
def _jinja2_filter_sanitize_tag_class(tag_title):
@@ -385,8 +394,9 @@ def _jinja2_filter_sanitize_tag_class(tag_title):
Returns:
str: A sanitized string suitable for use as a CSS class name
"""
import re
# Remove all non-alphanumeric characters and convert to lowercase
sanitized = _RE_SANITIZE_TAG.sub('', tag_title).lower()
sanitized = re.sub(r'[^a-zA-Z0-9]', '', tag_title).lower()
# Ensure it starts with a letter (CSS requirement)
if sanitized and not sanitized[0].isalpha():
sanitized = 'tag' + sanitized
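This hunk switches between a module-level compiled pattern and an inline `re.sub` call; both strip the same characters. A self-contained sketch of the precompiled variant is below. The function name is shortened for illustration, and the final `return` is an assumption because the remainder of the real function falls outside this hunk.

```python
import re

# Compiled once at import time rather than on every call.
_RE_SANITIZE_TAG = re.compile(r'[^a-zA-Z0-9]')


def sanitize_tag_class(tag_title: str) -> str:
    """Reduce a tag title to lowercase alphanumerics usable as a CSS class name."""
    sanitized = _RE_SANITIZE_TAG.sub('', tag_title).lower()
    # CSS class names should start with a letter, so prefix anything else.
    if sanitized and not sanitized[0].isalpha():
        sanitized = 'tag' + sanitized
    return sanitized  # assumption: the original may handle the empty case differently
```

Python's `re` module also caches recently compiled patterns internally, so the difference between the two variants is likely small in practice.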
@@ -474,21 +484,28 @@ def changedetection_app(config=None, datastore_o=None):
available_languages = get_available_languages()
language_codes = get_language_codes()
_locale_aliases = {
'zh-TW': 'zh_Hant_TW', # Traditional Chinese: browser sends zh-TW, we use zh_Hant_TW
'zh_TW': 'zh_Hant_TW', # Also handle underscore variant
}
_locale_match_list = language_codes + list(_locale_aliases.keys())
def get_locale():
# Locale aliases: map browser language codes to translation directory names
# This handles cases where browsers send standard codes (e.g., zh-TW)
# but our translations use more specific codes (e.g., zh_Hant_TW)
locale_aliases = {
'zh-TW': 'zh_Hant_TW', # Traditional Chinese: browser sends zh-TW, we use zh_Hant_TW
'zh_TW': 'zh_Hant_TW', # Also handle underscore variant
}
# 1. Try to get locale from session (user explicitly selected)
if 'locale' in session:
return session['locale']
# 2. Fall back to Accept-Language header
browser_locale = request.accept_languages.best_match(_locale_match_list)
# 3. Map browser locale to our internal locale if needed
return _locale_aliases.get(browser_locale, browser_locale)
# Get the best match from browser's Accept-Language header
browser_locale = request.accept_languages.best_match(language_codes + list(locale_aliases.keys()))
# 3. Check if we need to map the browser locale to our internal locale
if browser_locale in locale_aliases:
return locale_aliases[browser_locale]
return browser_locale
# Initialize Babel with locale selector
babel = Babel(app, locale_selector=get_locale)
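Both versions of `get_locale()` apply the same three-step rule: an explicit session choice wins, otherwise the browser's best match is used, and browser codes such as `zh-TW` are mapped to the translation directory name. A framework-free sketch of that rule, assuming the Accept-Language matching has already produced `browser_locale` (the function name `resolve_locale` and its arguments are illustrative):

```python
# Browser language codes mapped to translation directory names, as in the diff.
LOCALE_ALIASES = {
    'zh-TW': 'zh_Hant_TW',
    'zh_TW': 'zh_Hant_TW',
}


def resolve_locale(session_locale, browser_locale):
    """Pick the locale to use, given the session value and the browser's best match.

    Simplification: the original tests `'locale' in session`; this sketch
    treats any falsy session value as "not set".
    """
    if session_locale:
        return session_locale
    # dict.get with a default covers both the aliased and the pass-through case.
    return LOCALE_ALIASES.get(browser_locale, browser_locale)
```

In the Flask code the second argument would come from `request.accept_languages.best_match(...)`, which returns `None` when nothing matches; the sketch passes that through unchanged.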
@@ -1001,16 +1018,15 @@ def check_for_new_version():
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
session = requests.Session()
session.verify = False
while not app.config.exit.is_set():
try:
r = session.post("https://changedetection.io/check-ver.php",
r = requests.post("https://changedetection.io/check-ver.php",
data={'version': __version__,
'app_guid': datastore.data['app_guid'],
'watch_count': len(datastore.data['watching'])
})
},
verify=False)
except:
pass
+12 -66
@@ -608,12 +608,13 @@ class ValidateCSSJSONXPATHInput(object):
raise ValidationError("XPath not permitted in this field!")
from lxml import etree, html
import elementpath
from changedetectionio.html_tools import SafeXPath3Parser
# xpath 2.0-3.1
from elementpath.xpath3 import XPath3Parser
tree = html.fromstring("<html></html>")
line = line.replace('xpath:', '')
try:
elementpath.select(tree, line.strip(), parser=SafeXPath3Parser)
elementpath.select(tree, line.strip(), parser=XPath3Parser)
except elementpath.ElementPathError as e:
message = field.gettext('\'%s\' is not a valid XPath expression. (%s)')
raise ValidationError(message % (line, str(e)))
@@ -667,11 +668,9 @@ class ValidateCSSJSONXPATHInput(object):
# `jq` requires full compilation in windows and so isn't generally available
raise ValidationError("jq support not found")
from changedetectionio.html_tools import validate_jq_expression
input = line.replace('jq:', '')
try:
validate_jq_expression(input)
jq.compile(input)
except (ValueError) as e:
message = field.gettext('\'%s\' is not a valid jq expression. (%s)')
@@ -742,6 +741,7 @@ class commonSettingsForm(Form):
self.notification_title.extra_notification_tokens = kwargs.get('extra_notification_tokens', {})
self.notification_urls.extra_notification_tokens = kwargs.get('extra_notification_tokens', {})
fetch_backend = RadioField(_l('Fetch Method'), choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
notification_body = TextAreaField(_l('Notification Body'), default='{{ watch_url }} had a change.', validators=[validators.Optional(), ValidateJinja2Template()])
notification_format = SelectField(_l('Notification format'), choices=list(valid_notification_formats.items()))
notification_title = StringField(_l('Notification Title'), default='ChangeDetection.io Notification - {{ watch_url }}', validators=[validators.Optional(), ValidateJinja2Template()])
@@ -778,7 +778,6 @@ class SingleBrowserStep(Form):
class processor_text_json_diff_form(commonSettingsForm):
browser_profile = RadioField(_l('Browser / Fetch method'), choices=[]) # populated at runtime in edit.py
url = fields.URLField('Web Page URL', validators=[validateURL()])
tags = StringTagUUID('Group Tag', [validators.Optional()], default='')
@@ -940,66 +939,10 @@ class SingleExtraBrowser(Form):
ValidateSimpleURL()
], render_kw={"placeholder": "wss://brightdata... wss://oxylabs etc", "size":50})
class BrowserProfileForm(Form):
"""Create or edit a named BrowserProfile stored in settings.application.browser_profiles."""
name = StringField(
_l('Profile name'),
[validators.DataRequired(), validators.Length(max=100)],
render_kw={"placeholder": _l("e.g. Mobile Chrome, Bright Data CDP"), "maxlength": "100"}
)
fetch_backend = SelectField(
_l('Fetch method'),
choices=[], # populated at runtime from available_fetchers()
)
browser_connection_url = StringField(
_l('Browser connection URL'),
[
validators.Optional(),
ValidateStartsWithRegex(
regex=r'^(wss?|ws|http|https)://',
flags=re.IGNORECASE,
message=_l('Browser connection URL must start with ws://, wss://, http://, https://')
),
ValidateSimpleURL(),
],
render_kw={"placeholder": "ws://my-chrome:3000", "size": 50}
)
viewport_width = IntegerField(
_l('Viewport width (px)'),
[validators.Optional(), validators.NumberRange(min=100, max=7680)],
default=1280,
render_kw={"style": "width:5em;"}
)
viewport_height = IntegerField(
_l('Viewport height (px)'),
[validators.Optional(), validators.NumberRange(min=100, max=4320)],
default=1000,
render_kw={"style": "width:5em;"}
)
block_images = BooleanField(_l('Block images (faster loads)'), default=False)
block_fonts = BooleanField(_l('Block web fonts'), default=False)
ignore_https_errors = BooleanField(_l('Ignore HTTPS/TLS errors'), default=False)
user_agent = StringField(
_l('User-Agent override'),
[validators.Optional(), validators.Length(max=500)],
render_kw={"placeholder": _l("Leave blank to use fetcher default"), "size": 60}
)
locale = StringField(
_l('Locale'),
[validators.Optional(), validators.Length(max=20)],
render_kw={"placeholder": "en-US, de-DE, fr-FR …", "size": 15}
)
custom_headers = TextAreaField(
_l('Custom headers'),
[validators.Optional()],
render_kw={
"placeholder": "Header-Name: value\nAnother-Header: value",
"rows": 4, "cols": 60,
"style": "font-family:monospace;"
}
)
class DefaultUAInputForm(Form):
html_requests = StringField(_l('Plaintext requests'), validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
if os.getenv("PLAYWRIGHT_DRIVER_URL") or os.getenv("WEBDRIVER_URL"):
html_webdriver = StringField(_l('Chrome requests'), validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
# datastore.data['settings']['requests']..
class globalSettingsRequestForm(Form):
@@ -1023,6 +966,8 @@ class globalSettingsRequestForm(Form):
extra_proxies = FieldList(FormField(SingleExtraProxy), min_entries=5)
extra_browsers = FieldList(FormField(SingleExtraBrowser), min_entries=5)
default_ua = FormField(DefaultUAInputForm, label=_l("Default User-Agent overrides"))
def validate_extra_proxies(self, extra_validators=None):
for e in self.data['extra_proxies']:
if e.get('proxy_name') or e.get('proxy_url'):
@@ -1045,6 +990,7 @@ class globalSettingsApplicationForm(commonSettingsForm):
render_kw={"placeholder": os.getenv('BASE_URL', 'Not set')}
)
empty_pages_are_a_change = BooleanField(_l('Treat empty pages as a change?'), default=False)
fetch_backend = RadioField(_l('Fetch Method'), default="html_requests", choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
global_ignore_text = StringListField(_l('Ignore Text'), [ValidateListRegex()])
global_subtractive_selectors = StringListField(_l('Remove elements'), [ValidateCSSJSONXPATHInput(allow_json=False)])
ignore_whitespace = BooleanField(_l('Ignore whitespace'))
@@ -1060,7 +1006,7 @@ class globalSettingsApplicationForm(commonSettingsForm):
render_kw={"placeholder": "0.1", "style": "width: 8em;"}
)
password = SaltyPasswordField(_l('Password'), render_kw={"autocomplete": "new-password"})
password = SaltyPasswordField(_l('Password'))
pager_size = IntegerField(_l('Pager size'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0,
+12 -122
@@ -4,7 +4,6 @@ from loguru import logger
from typing import List
import html
import json
import os
import re
# HTML added to be sure each result matching a filter (.example) gets converted to a new line by Inscriptis
@@ -14,45 +13,6 @@ PERL_STYLE_REGEX = r'^/(.*?)/([a-z]*)?$'
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
META_CS = re.compile(r'<meta[^>]+charset=["\']?\s*([a-z0-9_\-:+.]+)', re.I)
# jq builtins that can leak sensitive data or cause harm when user-supplied expressions are executed.
# env/$ENV reads all process environment variables (passwords, API keys, etc.)
# include/import can read arbitrary files from disk
# input/inputs reads beyond the supplied JSON data
# debug/stderr leaks data to stderr
# halt/halt_error terminates the process (DoS)
_JQ_BLOCKED_PATTERNS = [
(re.compile(r'\benv\b'), 'env (reads environment variables)'),
(re.compile(r'\$ENV\b'), '$ENV (reads environment variables)'),
(re.compile(r'\binclude\b'), 'include (reads files from disk)'),
(re.compile(r'\bimport\b'), 'import (reads files from disk)'),
(re.compile(r'\binputs?\b'), 'input/inputs (reads beyond provided data)'),
(re.compile(r'\bdebug\b'), 'debug (leaks data to stderr)'),
(re.compile(r'\bstderr\b'), 'stderr (leaks data to stderr)'),
(re.compile(r'\bhalt(?:_error)?\b'), 'halt/halt_error (terminates the process)'),
(re.compile(r'\$__loc__\b'), '$__loc__ (leaks file path information)'),
(re.compile(r'\bbuiltins\b'), 'builtins (enumerates available functions)'),
(re.compile(r'\bmodulemeta\b'), 'modulemeta (leaks module information)'),
(re.compile(r'\$JQ_BUILD_CONFIGURATION\b'), '$JQ_BUILD_CONFIGURATION (leaks build information)'),
]
def validate_jq_expression(expression: str) -> None:
"""Raise ValueError if the jq expression uses any dangerous builtin.
User-supplied jq expressions are executed server-side. Without this check,
builtins like `env` expose every process environment variable (SALTED_PASS,
proxy credentials, API keys, etc.) as watch output.
"""
from changedetectionio.strtobool import strtobool
if strtobool(os.getenv('JQ_ALLOW_RISKY_EXPRESSIONS', 'false')):
return
for pattern, description in _JQ_BLOCKED_PATTERNS:
if pattern.search(expression):
msg = f"jq expression uses disallowed builtin: {description}"
logger.critical(f"Security: blocked jq expression containing '{description}' - expression: {expression!r}")
raise ValueError(msg)
META_CT = re.compile(r'<meta[^>]+http-equiv=["\']?content-type["\']?[^>]*content=["\'][^>]*charset=([a-z0-9_\-:+.]+)', re.I)
# 'price' , 'lowPrice', 'highPrice' are usually under here
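The `_JQ_BLOCKED_PATTERNS` check shown in this hunk relies on word-boundary regexes, so it can help to see which expressions it would and would not reject. The following is a minimal sketch using only a subset of the patterns from the diff; the name `find_blocked_builtin` is hypothetical and not part of the codebase.

```python
import re

# Subset of the (pattern, description) pairs listed in the diff.
BLOCKED_PATTERNS = [
    (re.compile(r'\benv\b'), 'env (reads environment variables)'),
    (re.compile(r'\$ENV\b'), '$ENV (reads environment variables)'),
    (re.compile(r'\binputs?\b'), 'input/inputs (reads beyond provided data)'),
    (re.compile(r'\bhalt(?:_error)?\b'), 'halt/halt_error (terminates the process)'),
]


def find_blocked_builtin(expression):
    """Return the description of the first blocked builtin found, or None.

    Hypothetical helper for illustration; the diff's validate_jq_expression()
    raises ValueError instead of returning a description.
    """
    for pattern, description in BLOCKED_PATTERNS:
        if pattern.search(expression):
            return description
    return None


if __name__ == "__main__":
    for expr in ('.price', '.items[] | env', '$ENV.PATH', '.environment', '.env'):
        print(f"{expr!r:20} -> {find_blocked_builtin(expr)}")
```

Because the match is purely textual, a field that is literally named `env` (as in `.env`) is rejected as well, while `.environment` passes since no word boundary follows `env` there.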
@@ -63,59 +23,6 @@ class JSONNotFound(ValueError):
def __init__(self, msg):
ValueError.__init__(self, msg)
_DEFAULT_UNSAFE_XPATH3_FUNCTIONS = [
'unparsed-text',
'unparsed-text-lines',
'unparsed-text-available',
'doc',
'doc-available',
'json-doc',
'json-doc-available',
'collection', # XPath 2.0+: loads XML node collections from arbitrary URIs
'uri-collection', # XPath 3.0+: enumerates URIs from resource collections
'transform', # XPath 3.1: XSLT transformation (currently raises, block proactively)
'load-xquery-module', # XPath 3.1: loads XQuery modules (currently raises, block proactively)
'environment-variable',
'available-environment-variables',
]
def _build_safe_xpath3_parser():
"""Return an XPath3Parser subclass with filesystem/environment access functions removed.
XPath 3.0 includes functions that can read arbitrary files or environment variables:
- unparsed-text / unparsed-text-lines / unparsed-text-available (file read)
- doc / doc-available (XML fetch from URI)
- environment-variable / available-environment-variables (env var leakage)
Subclassing gives us an independent symbol_table copy (not shared with the parent class),
so removing entries here does not affect XPath3Parser itself.
Override the blocked list via the XPATH_BLOCKED_FUNCTIONS environment variable
(comma-separated, e.g. "unparsed-text,doc,environment-variable").
"""
import os
from elementpath.xpath3 import XPath3Parser
class SafeXPath3Parser(XPath3Parser):
pass
env_override = os.getenv('XPATH_BLOCKED_FUNCTIONS')
if env_override is not None:
blocked = [f.strip() for f in env_override.split(',') if f.strip()]
else:
blocked = _DEFAULT_UNSAFE_XPATH3_FUNCTIONS
for _fn in blocked:
SafeXPath3Parser.symbol_table.pop(_fn, None)
return SafeXPath3Parser
# Module-level singleton — built once, reused everywhere.
SafeXPath3Parser = _build_safe_xpath3_parser()
# Doesn't look like python supports forward slash auto enclosure in re.findall
# So convert it to inline flag "(?i)foobar" type configuration
@lru_cache(maxsize=100)
@@ -276,6 +183,8 @@ def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False
"""
from lxml import etree, html
import elementpath
# xpath 2.0-3.1
from elementpath.xpath3 import XPath3Parser
parser = etree.HTMLParser()
tree = None
@@ -301,7 +210,7 @@ def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False
# This allows //title to match elements in the default namespace
namespaces[''] = tree.nsmap[None]
r = elementpath.select(tree, xpath_filter.strip(), namespaces=namespaces, parser=SafeXPath3Parser)
r = elementpath.select(tree, xpath_filter.strip(), namespaces=namespaces, parser=XPath3Parser)
#@note: //title/text() now works with default namespaces (fixed by registering '' prefix)
#@note: //title/text() wont work where <title>CDATA.. (use cdata_in_document_to_text first)
@@ -326,9 +235,6 @@ def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False
else:
html_block += elementpath_tostring(element)
# Drop element references before the finally block so tree.clear() can release
# the libxml2 document immediately (elements pin the C-level doc via refcount).
del r
return html_block
finally:
# Explicitly clear the tree to free memory
@@ -424,16 +330,12 @@ def _parse_json(json_data, json_filter):
raise Exception("jq not support not found")
if json_filter.startswith("jq:"):
expr = json_filter.removeprefix("jq:")
validate_jq_expression(expr)
jq_expression = jq.compile(expr)
jq_expression = jq.compile(json_filter.removeprefix("jq:"))
match = jq_expression.input(json_data).all()
return _get_stripped_text_from_json_match(match)
if json_filter.startswith("jqraw:"):
expr = json_filter.removeprefix("jqraw:")
validate_jq_expression(expr)
jq_expression = jq.compile(expr)
jq_expression = jq.compile(json_filter.removeprefix("jqraw:"))
match = jq_expression.input(json_data).all()
return '\n'.join(str(item) for item in match)
@@ -537,25 +439,13 @@ def extract_json_as_string(content, json_filter, ensure_is_ldjson_info_type=None
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSON {content[:20]}...{str(e)})")
else:
# Check for JSONP wrapper: someCallback({...}) or some.namespace({...})
# Server may claim application/json but actually return JSONP
jsonp_match = re.match(r'^\w[\w.]*\s*\((.+)\)\s*;?\s*$', content.lstrip("\ufeff").strip(), re.DOTALL)
if jsonp_match:
try:
inner = jsonp_match.group(1).strip()
logger.warning(f"Content looks like JSONP, attempting to extract inner JSON for filter '{json_filter}'")
stripped_text_from_html = _parse_json(json.loads(inner), json_filter)
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSONP inner content {content[:20]}...{str(e)})")
if not stripped_text_from_html:
# Probably something else, go fish inside for it
try:
stripped_text_from_html = extract_json_blob_from_html(content=content,
ensure_is_ldjson_info_type=ensure_is_ldjson_info_type,
json_filter=json_filter)
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSON while extracting JSON from HTML blob {content[:20]}...{str(e)})")
# Probably something else, go fish inside for it
try:
stripped_text_from_html = extract_json_blob_from_html(content=content,
ensure_is_ldjson_info_type=ensure_is_ldjson_info_type,
json_filter=json_filter )
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSON while extracting JSON from HTML blob {content[:20]}...{str(e)})")
if not stripped_text_from_html:
# Re 265 - Just return an empty string when filter not found
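The JSONP handling in this hunk can be tried in isolation. The sketch below reuses the regular expression from the diff; the function name `unwrap_jsonp` and the choice to return `None` on failure are assumptions made for illustration, not the project's API.

```python
import json
import re

# Same pattern as in the diff: a callback or dotted namespace, then (...), then an optional ';'.
JSONP_RE = re.compile(r'^\w[\w.]*\s*\((.+)\)\s*;?\s*$', re.DOTALL)


def unwrap_jsonp(content):
    """Return the decoded inner JSON of a JSONP payload, or None if it is not JSONP.

    Hypothetical helper: the diff passes the decoded value on to _parse_json()
    rather than returning it.
    """
    # Drop a leading byte-order mark and surrounding whitespace before matching.
    match = JSONP_RE.match(content.lstrip("\ufeff").strip())
    if not match:
        return None
    try:
        return json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    print(unwrap_jsonp('someCallback({"price": 10});'))
    print(unwrap_jsonp('{"price": 10}'))
```

Plain JSON does not match because the pattern requires a leading word character followed by a parenthesised body, so such content falls through to the other extraction paths.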
+6 -3
@@ -12,6 +12,7 @@ from changedetectionio.notification import (
# Equal to or greater than this number of FilterNotFoundInResponse exceptions will trigger a filter-not-found notification
_FILTER_FAILURE_THRESHOLD_ATTEMPTS_DEFAULT = 6
DEFAULT_SETTINGS_HEADERS_USERAGENT='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36'
@@ -30,6 +31,10 @@ class model(dict):
'time_between_check': {'weeks': None, 'days': None, 'hours': 3, 'minutes': None, 'seconds': None},
'timeout': int(getenv("DEFAULT_SETTINGS_REQUESTS_TIMEOUT", "45")), # Default 45 seconds
'workers': int(getenv("DEFAULT_SETTINGS_REQUESTS_WORKERS", "5")), # Number of threads, lower is better for slow connections
'default_ua': {
'html_requests': getenv("DEFAULT_SETTINGS_HEADERS_USERAGENT", DEFAULT_SETTINGS_HEADERS_USERAGENT),
'html_webdriver': None,
}
},
'application': {
# Custom notification content
@@ -38,9 +43,7 @@ class model(dict):
'api_access_token_enabled': True,
'base_url' : None,
'empty_pages_are_a_change': False,
'browser_profile': None, # machine-name of the system-default BrowserProfile
'browser_profiles': {}, # user-defined profiles keyed by machine name
'fetch_backend': getenv("DEFAULT_FETCH_BACKEND", "requests"),
'fetch_backend': getenv("DEFAULT_FETCH_BACKEND", "html_requests"),
'filter_failure_notification_threshold_attempts': _FILTER_FAILURE_THRESHOLD_ATTEMPTS_DEFAULT,
'global_ignore_text': [], # List of text to ignore when calculating the comparison checksum
'global_subtractive_selectors': [],
+62 -49
@@ -43,11 +43,6 @@ from ..html_tools import TRANSLATE_WHITESPACE_TABLE
FAVICON_RESAVE_THRESHOLD_SECONDS=86400
BROTLI_COMPRESS_SIZE_THRESHOLD = int(os.getenv('SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD', 1024*20))
# Module-level favicon filename cache: data_dir → basename (or None)
# Keyed by data_dir so it survives Watch object recreation, deepcopy, and concurrent requests.
# Invalidated explicitly in bump_favicon() when a new favicon is saved.
_FAVICON_FILENAME_CACHE: dict = {}
minimum_seconds_recheck_time = int(os.getenv('MINIMUM_SECONDS_RECHECK_TIME', 3))
mtable = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400, 'weeks': 86400 * 7}
@@ -353,40 +348,40 @@ class model(EntityPersistenceMixin, watch_base):
def is_source_type_url(self):
return self.get('url', '').startswith('source:')
@property
def effective_browser_profile(self):
"""Resolve the effective BrowserProfile for this watch.
Walks the chain: watch → tag (overrides_watch=True) → global settings → built-in fallback.
Never raises. Returns a BrowserProfile instance.
"""
from changedetectionio.model.browser_profile import resolve_browser_profile, BUILTIN_REQUESTS
if not self._datastore:
return BUILTIN_REQUESTS
try:
return resolve_browser_profile(self, self._datastore)
except Exception:
return BUILTIN_REQUESTS
@property
def get_fetch_backend(self):
"""Legacy property — prefer effective_browser_profile.fetch_backend for new code.
Returns the raw fetch_backend stored on this watch (or 'requests' for PDFs).
Does NOT walk the tag/global resolution chain.
"""
if self.is_pdf:
return 'requests'
return self.get('fetch_backend')
Get the fetch backend for this watch with special case handling.
@property
def fetcher_supports_screenshots(self):
"""Return True if the resolved fetcher for this watch supports screenshots."""
from changedetectionio import content_fetchers
fetcher_class = content_fetchers.get_fetcher(self.effective_browser_profile.fetch_backend)
if fetcher_class is None:
return False
return bool(getattr(fetcher_class, 'supports_screenshots', False))
CHAIN RESOLUTION OPPORTUNITY:
Currently returns watch.fetch_backend directly, but doesn't implement
Watch → Tag → Global resolution chain. With Pydantic:
@computed_field
def resolved_fetch_backend(self) -> str:
# Special case: PDFs always use html_requests
if self.is_pdf:
return 'html_requests'
# Watch override
if self.fetch_backend and self.fetch_backend != 'system':
return self.fetch_backend
# Tag override (first tag with overrides_watch=True wins)
for tag_uuid in self.tags:
tag = self._datastore.get_tag(tag_uuid)
if tag.overrides_watch and tag.fetch_backend:
return tag.fetch_backend
# Global default
return self._datastore.settings.fetch_backend
"""
# Maybe also if is_image etc?
# This is because chrome/playwright wont render the PDF in the browser and we will just fetch it and use pdf2html to see the text.
if self.is_pdf:
return 'html_requests'
return self.get('fetch_backend')
@property
def is_pdf(self):
@@ -811,8 +806,9 @@ class model(EntityPersistenceMixin, watch_base):
with open(fname, 'wb') as f:
f.write(decoded)
# Invalidate module-level favicon filename cache for this watch
_FAVICON_FILENAME_CACHE.pop(self.data_dir, None)
# Invalidate favicon filename cache
if hasattr(self, '_favicon_filename_cache'):
delattr(self, '_favicon_filename_cache')
# A signal that could trigger the socket server to update the browser also
watch_check_update = signal('watch_favicon_bump')
@@ -827,23 +823,35 @@ class model(EntityPersistenceMixin, watch_base):
def get_favicon_filename(self) -> str | None:
"""
Find any favicon.* file in the watch data directory.
Find any favicon.* file in the current working directory
and return the contents of the newest one.
Uses a module-level cache keyed by data_dir to survive Watch object recreation,
deepcopy (which drops instance attrs), and concurrent request races.
Invalidated by bump_favicon() when a new favicon is saved.
MEMORY LEAK FIX: Cache the result to avoid repeated glob.glob() operations.
glob.glob() causes millions of fnmatch allocations when called for every watch on page load.
Returns:
str: Basename of the favicon file, or None if not found.
str: Basename of the newest favicon file, or None if not found.
"""
if self.data_dir in _FAVICON_FILENAME_CACHE:
return _FAVICON_FILENAME_CACHE[self.data_dir]
# Check cache first (prevents 26M+ allocations from repeated glob operations)
cache_key = '_favicon_filename_cache'
if hasattr(self, cache_key):
return getattr(self, cache_key)
import glob
# Search for all favicon.* files
files = glob.glob(os.path.join(self.data_dir, "favicon.*"))
fname = os.path.basename(files[0]) if files else None
_FAVICON_FILENAME_CACHE[self.data_dir] = fname
return fname
if not files:
result = None
else:
# Find the newest by modification time
newest_file = max(files, key=os.path.getmtime)
result = os.path.basename(newest_file)
# Cache the result
setattr(self, cache_key, result)
return result
def get_screenshot_as_thumbnail(self, max_age=3200):
"""Return path to a square thumbnail of the most recent screenshot.
@@ -1174,13 +1182,18 @@ class model(EntityPersistenceMixin, watch_base):
def compile_error_texts(self, has_proxies=None):
"""Compile error texts for this watch.
Accepts has_proxies parameter to ensure it works even outside app context"""
from flask import url_for, has_request_context
from flask import url_for
from markupsafe import Markup
output = [] # Initialize as list since we're using append
last_error = self.get('last_error','')
has_app_context = has_request_context()
try:
url_for('settings.settings_page')
except Exception as e:
has_app_context = False
else:
has_app_context = True
# has app+request context, we can use url_for()
if has_app_context:
+1 -4
@@ -187,7 +187,6 @@ class watch_base(dict):
'content-type': None,
'date_created': None,
'extract_text': [], # Extract text by regex after filters
'browser_profile': 'system', # machine-name key of a BrowserProfile; 'system' → resolve via chain
'fetch_backend': 'system', # plaintext, playwright etc
'fetch_time': 0.0,
'filter_failure_notification_send': strtobool(os.getenv('FILTER_FAILURE_NOTIFICATION_SEND_DEFAULT', 'True')),
@@ -590,9 +589,7 @@ class watch_base(dict):
return None
try:
# _datastore is a ChangeDetectionStore (has .data) or a plain dict (unit tests)
store_data = self._datastore.data if hasattr(self._datastore, 'data') else self._datastore
value = store_data['settings']
value = self._datastore['settings']
for key in path:
value = value[key]
return value
-380
@@ -1,380 +0,0 @@
"""
BrowserProfile: named, reusable browser/fetcher configuration.
Storage key
-----------
Profiles are stored in ``settings.application.browser_profiles`` as a plain dict
keyed by *machine name*: a lowercase, underscore-separated slug derived from the
human-readable ``name`` field:
'My Blocking Chrome' → 'my_blocking_chrome'
'Custom CDP — Mobile (375px)' → 'custom_cdp_mobile_375px'
Using the machine name as the key means that deleting a profile and recreating
it with the same name restores the original key, so all watches that referenced
it continue to work without any manual re-linking.
Resolution chain
----------------
``resolve_browser_profile(watch, datastore)`` walks:
watch.browser_profile → first tag with overrides_watch=True
→ settings.application.browser_profile → built-in fallback
It never raises. Stale / missing machine-name references are logged and the
resolver falls through to the next level.
Built-in profiles
-----------------
``BUILTIN_REQUESTS`` and ``BUILTIN_BROWSER`` are always available and cannot be
deleted from the UI (``is_builtin=True``). Their machine names are stored in
``RESERVED_MACHINE_NAMES`` to block user profiles from shadowing them.
Migration
---------
``store/updates.py::update_31`` converts the legacy ``fetch_backend`` field on
watches, tags and global settings into ``browser_profile`` machine-name
references. After that migration no legacy paths are needed here.
"""
from __future__ import annotations
import os
import re
from typing import Optional
from loguru import logger
from pydantic import BaseModel, field_validator
# Default User-Agent for the built-in plaintext requests profile.
# Overridable via environment variable for deployments that need a custom UA.
_DEFAULT_REQUESTS_UA = os.getenv(
"DEFAULT_SETTINGS_HEADERS_USERAGENT",
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36'
)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
NAME_MAX_LEN = 100
# ---------------------------------------------------------------------------
# Model
# ---------------------------------------------------------------------------
class BrowserProfile(BaseModel):
"""
A named, reusable configuration for how a watch fetches its target URL.
The *machine name* (see ``get_machine_name()``) is the stable storage key.
Updating ``name`` changes the machine name; any watch that referenced the
old machine name will then fall back through the resolution chain until it
is explicitly re-pointed. To replace a profile without breaking watches,
delete it and recreate it with the *same* name.
"""
name: str
"""Human-readable label shown in the UI. Max 100 characters."""
fetch_backend: str = 'requests'
"""
Which fetch engine to use. This is the *clean* fetcher name without the
``html_`` module prefix (e.g. ``'requests'``, ``'webdriver'``,
``'playwright'``, ``'puppeteer'``, ``'cloakbrowser'``).
The module-level ``html_`` prefix (``html_requests``, ``html_webdriver``,
…) is an implementation detail of ``content_fetchers/``. Use
``get_fetcher_class_name()`` to obtain the full module attribute name when
you need to look up the class.
Must be non-empty and contain only ``[a-z0-9_]`` characters.
"""
is_builtin: bool = False
"""Built-in profiles are always present and cannot be deleted from the UI."""
# ------------------------------------------------------------------
# Browser-specific settings (silently ignored by html_requests)
# ------------------------------------------------------------------
browser_connection_url: Optional[str] = None
"""
Custom CDP / WebSocket endpoint, e.g. ``ws://my-chrome:3000``.
Overrides the system-wide ``PLAYWRIGHT_DRIVER_URL`` for this profile.
Only meaningful for ``html_webdriver`` profiles.
"""
viewport_width: int = 1280
"""
Browser viewport width in pixels.
Common presets: 375 (iPhone), 768 (tablet), 1280 (desktop).
"""
viewport_height: int = 1000
"""
Browser viewport height in pixels.
Common presets: 812 (iPhone), 1024 (tablet), 1000 (desktop).
"""
block_images: bool = False
"""
Block all image requests. Typically cuts page-load time by 40-70 % on
image-heavy sites with no impact on text-based change detection.
"""
block_fonts: bool = False
"""Block web-font requests. Modest speed gain; rarely affects detection."""
user_agent: Optional[str] = None
"""
Override the browser User-Agent string.
``None`` keeps the fetcher's built-in default, which already strips
obvious headless markers such as ``HeadlessChrome``.
"""
ignore_https_errors: bool = False
"""
Proceed even when the server's TLS certificate is invalid or self-signed.
Useful for staging / development environments.
"""
locale: Optional[str] = None
"""
Browser locale (e.g. ``en-US``, ``de-DE``).
Sets the ``Accept-Language`` header and ``navigator.language``.
Some sites serve different prices or copy based on locale.
"""
custom_headers: str = ''
"""
Extra HTTP headers sent with every request using this profile, in ``Key: Value`` format
(one per line, ``#`` lines are ignored). Applied before per-watch headers so
individual watches can override them.
"""
service_workers: str = 'allow'
"""
Whether to allow Service Workers in the browser context.
Playwright accepts ``'allow'`` or ``'block'``.
Block to avoid large Service Worker data transfers (e.g. YouTube).
"""
extra_delay: int = 0
"""
Extra seconds to wait after page load before extracting content
(on top of the per-watch ``render_extract_delay``).
Sourced from ``WEBDRIVER_DELAY_BEFORE_CONTENT_READY`` at startup.
"""
model_config = {"frozen": False}
# ------------------------------------------------------------------
# Validators
# ------------------------------------------------------------------
@field_validator('fetch_backend')
@classmethod
def _validate_fetch_backend(cls, v: str) -> str:
v = v.strip()
if not v:
raise ValueError('fetch_backend cannot be empty')
if not re.fullmatch(r'[a-z0-9_]+', v):
raise ValueError(
f"fetch_backend must contain only lowercase letters, digits and underscores, got {v!r}"
)
if v.startswith('html_'):
raise ValueError(
f"fetch_backend should be the clean fetcher name without the 'html_' prefix "
f"(e.g. 'requests', 'webdriver', 'playwright'). Got {v!r}. "
f"Use get_fetcher_class_name() to obtain the full module attribute name."
)
return v
@field_validator('name')
@classmethod
def _validate_name(cls, v: str) -> str:
v = v.strip()
if not v:
raise ValueError('Name cannot be empty')
if len(v) > NAME_MAX_LEN:
raise ValueError(f'Name must be {NAME_MAX_LEN} characters or less')
return v
# ------------------------------------------------------------------
# Machine-name helpers
# ------------------------------------------------------------------
@staticmethod
def machine_name_from_str(name: str) -> str:
"""
Convert a human name to a machine-safe storage key.
Transformation rules (applied in order):
1. Strip surrounding whitespace; lower-case.
2. Replace runs of whitespace or hyphens with a single ``_``.
3. Drop every character that is not ``[a-z0-9_]``.
4. Collapse consecutive underscores.
5. Strip leading / trailing underscores.
6. Truncate to ``NAME_MAX_LEN`` characters.
Examples::
'My Blocking Browser Chrome' → 'my_blocking_browser_chrome'
'Custom CDP — Mobile (375px)' → 'custom_cdp_mobile_375px'
' Weird --- Name ' → 'weird_name'
"""
s = name.strip().lower()
s = re.sub(r'[\s\-]+', '_', s) # whitespace / hyphens → underscore
s = re.sub(r'[^a-z0-9_]', '', s) # drop everything else
s = re.sub(r'_+', '_', s) # collapse repeated underscores
s = s.strip('_') # drop leading / trailing underscores
return s[:NAME_MAX_LEN]
def get_machine_name(self) -> str:
"""Return the machine-safe storage key derived from this profile's ``name``."""
return self.machine_name_from_str(self.name)
def get_fetcher_class_name(self) -> str:
"""Return the clean fetcher name for this profile (same as ``fetch_backend``).
Use with ``content_fetchers.get_fetcher()``::
from changedetectionio import content_fetchers
fetcher_cls = content_fetchers.get_fetcher(profile.get_fetcher_class_name())
"""
return self.fetch_backend
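The machine-name transformation rules listed in the `machine_name_from_str` docstring can be checked without Pydantic. This is a standalone sketch of the same steps as a module-level function (the diff defines it as a static method on `BrowserProfile`).

```python
import re

NAME_MAX_LEN = 100  # same limit as in the diff


def machine_name_from_str(name):
    """Convert a human-readable profile name to a machine-safe storage key."""
    s = name.strip().lower()
    s = re.sub(r'[\s\-]+', '_', s)    # runs of whitespace / hyphens become one underscore
    s = re.sub(r'[^a-z0-9_]', '', s)  # drop every other character
    s = re.sub(r'_+', '_', s)         # collapse repeated underscores
    s = s.strip('_')                  # no leading / trailing underscores
    return s[:NAME_MAX_LEN]


if __name__ == "__main__":
    for human in ('My Blocking Browser Chrome', ' Weird --- Name ', 'Mobile (375px)'):
        print(f"{human!r} -> {machine_name_from_str(human)!r}")
```

Two different display names can collapse to the same key (for example `My Chrome` and `my-chrome`), which is consistent with the docstring's point that recreating a profile under the same name restores the original key.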
# ---------------------------------------------------------------------------
# Built-in profiles (always present, cannot be deleted)
# ---------------------------------------------------------------------------
BUILTIN_REQUESTS = BrowserProfile(
name='Direct HTTP (requests)',
fetch_backend='requests',
is_builtin=True,
user_agent=_DEFAULT_REQUESTS_UA,
)
BUILTIN_PLAYWRIGHT = BrowserProfile(
name='Browser (Chrome/Playwright)',
fetch_backend='playwright_cdp',
is_builtin=True,
)
BUILTIN_SELENIUM = BrowserProfile(
name='Browser (Chrome/Selenium)',
fetch_backend='selenium',
is_builtin=True,
)
BUILTIN_PUPPETEER = BrowserProfile(
name='Browser (Chrome/Puppeteer)',
fetch_backend='puppeteer',
is_builtin=True,
)
# Backwards-compatible alias — code that imported BUILTIN_BROWSER keeps working.
BUILTIN_BROWSER = BUILTIN_PLAYWRIGHT
# Keyed by machine name for O(1) lookup.
_BUILTINS: dict[str, BrowserProfile] = {
b.get_machine_name(): b
for b in (BUILTIN_REQUESTS, BUILTIN_PLAYWRIGHT, BUILTIN_SELENIUM, BUILTIN_PUPPETEER)
}
# Machine names that cannot be used by user-created profiles.
RESERVED_MACHINE_NAMES: frozenset[str] = frozenset(_BUILTINS.keys())
def get_default_browser_builtin() -> BrowserProfile:
"""Final fallback when no profile can be resolved through the chain.
``preconfigure_browser_profiles_based_on_env()`` sets
``settings.application.browser_profile`` explicitly at startup, so this
fallback is only reached for watches with stale / missing machine-name
references. Safe default is always direct HTTP requests.
"""
return BUILTIN_REQUESTS
# ---------------------------------------------------------------------------
# Lookup helpers
# ---------------------------------------------------------------------------
def get_builtin_profiles() -> dict[str, BrowserProfile]:
"""Return a shallow copy of the built-in profiles dict (keyed by machine name)."""
return dict(_BUILTINS)
def get_profile(machine_name: str, store_profiles: dict) -> Optional[BrowserProfile]:
"""
Look up a ``BrowserProfile`` by machine name.
Stored profiles are checked first so that env-configured built-ins (written
by ``preconfigure_browser_profiles_based_on_env``) take priority over the
bare module-level defaults. Falls back to ``_BUILTINS`` when no stored
version exists.
Returns ``None`` when the machine name is unknown or the stored data is
corrupt (a warning is logged in the latter case).
"""
raw = store_profiles.get(machine_name)
if raw is not None:
if isinstance(raw, BrowserProfile):
return raw
try:
return BrowserProfile(**raw)
except Exception as exc:
logger.warning(f"BrowserProfile '{machine_name}': failed to deserialize — {exc}")
# Fall through to built-in
if machine_name in _BUILTINS:
return _BUILTINS[machine_name]
return None
# ---------------------------------------------------------------------------
# Resolution
# ---------------------------------------------------------------------------
def resolve_browser_profile(watch, datastore) -> BrowserProfile:
"""
Resolve the effective ``BrowserProfile`` for *watch*.
Resolution chain
~~~~~~~~~~~~~~~~
1. ``watch['browser_profile']``: explicit machine name set on the watch.
2. First tag with ``overrides_watch=True`` that has ``browser_profile`` set.
3. ``settings.application['browser_profile']``: system-wide default.
4. Built-in fallback: ``BUILTIN_REQUESTS`` (requests is always the safe default).
Never raises. A stale / missing machine-name reference produces a
``logger.warning`` and the resolver continues down the chain.
"""
from changedetectionio.model.resolver import resolve_setting
store_profiles: dict = datastore.data['settings']['application'].get('browser_profiles', {})
machine_name = resolve_setting(
watch, datastore,
field_name='browser_profile',
sentinel_values={'system', 'default', ''},
default=None,
require_tag_override=True,
)
if machine_name:
profile = get_profile(machine_name, store_profiles)
if profile:
return profile
logger.warning(
f"Watch {watch.get('uuid')!r}: browser_profile {machine_name!r} not found, "
f"falling back through the chain"
)
return get_default_browser_builtin()
-63
@@ -1,63 +0,0 @@
"""
Unified Watch → Tag → Global settings cascade resolver.
All settings resolution follows the same priority order:
1. Watch-level setting (if set and not a sentinel "use parent" value)
2. First tag with overrides_watch=True that has the field set
3. Global application settings
4. Caller-supplied default
This replaces the previously scattered manual resolution loops found in
notification_service.py, processors/base.py, and the restock processor.
"""
def resolve_setting(watch, datastore, field_name, *,
sentinel_values=None,
default=None,
require_tag_override=True):
"""
Resolve a single setting value by walking the Watch → Tag → Global chain.
Args:
watch: Watch dict / model object.
datastore: App datastore (must have get_all_tags_for_watch() and
data['settings']['application']).
field_name: The setting key to look up at each level.
sentinel_values: Set of values that mean "not configured here, keep looking".
For example {'system'} for fetch_backend.
default: Value returned when nothing is found in the chain.
require_tag_override: If True (default), only tags where overrides_watch=True
contribute to the cascade. Set to False when every tag
that carries the field should be considered (e.g. for
fields that make sense to merge/override at any tag level).
Returns:
The first non-sentinel, non-empty value found, or *default*.
"""
_sentinels = set(sentinel_values) if sentinel_values else set()
def _is_unset(v):
return v is None or v == '' or v in _sentinels
# 1. Watch level
v = watch.get(field_name)
if not _is_unset(v):
return v
# 2. Tag level
tags = datastore.get_all_tags_for_watch(uuid=watch.get('uuid'))
if tags:
for tag in tags.values():
if require_tag_override and not tag.get('overrides_watch'):
continue
v = tag.get(field_name)
if not _is_unset(v):
return v
# 3. Global application settings
v = datastore.data['settings']['application'].get(field_name)
if not _is_unset(v):
return v
return default
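Reviewer note: a minimal sketch of the cascade documented in the removed resolver, restated compactly so it runs on its own. `FakeStore` is a hypothetical in-memory stand-in for the datastore, and the tag and setting values are invented for illustration; only the lookup order (watch, then tags, then global, then default) comes from the diff.

```python
class FakeStore:
    """Hypothetical stand-in exposing only what the resolver touches."""

    def __init__(self, tags, app_settings):
        self._tags = tags
        self.data = {'settings': {'application': app_settings}}

    def get_all_tags_for_watch(self, uuid=None):
        return self._tags


def resolve_setting(watch, datastore, field_name, *, sentinel_values=None,
                    default=None, require_tag_override=True):
    """Compact restatement of the cascade: watch, then tags, then global."""
    sentinels = set(sentinel_values or ())

    def is_unset(v):
        return v is None or v == '' or v in sentinels

    # Collect candidates in priority order, then take the first configured one.
    candidates = [watch.get(field_name)]
    tags = datastore.get_all_tags_for_watch(uuid=watch.get('uuid')) or {}
    for tag in tags.values():
        if require_tag_override and not tag.get('overrides_watch'):
            continue
        candidates.append(tag.get(field_name))
    candidates.append(datastore.data['settings']['application'].get(field_name))

    return next((v for v in candidates if not is_unset(v)), default)


store = FakeStore(
    tags={
        'a': {'overrides_watch': False, 'fetch_backend': 'webdriver'},
        'b': {'overrides_watch': True, 'fetch_backend': 'playwright'},
    },
    app_settings={'fetch_backend': 'requests'},
)

# The watch says "system" (a sentinel), so the first overriding tag wins.
from_tag = resolve_setting({'uuid': 'w1', 'fetch_backend': 'system'}, store,
                           'fetch_backend', sentinel_values={'system'})

# With require_tag_override=False every tag is consulted, so tag 'a' comes first.
from_any_tag = resolve_setting({'uuid': 'w1'}, store, 'fetch_backend',
                               sentinel_values={'system'},
                               require_tag_override=False)
print(from_tag, from_any_tag)
```

Unlike the removed function, this sketch gathers all candidates before choosing, which is simpler to read but always queries the tags.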
@@ -1,3 +0,0 @@
from .registry import registry, NotificationProfileType, AppriseProfileType
__all__ = ['registry', 'NotificationProfileType', 'AppriseProfileType']
@@ -1,73 +0,0 @@
"""
Per-profile notification log.
Each profile gets its own log file at:
{datastore_path}/notification-logs/{profile_uuid}.log
Entries are stored as JSON-lines (one JSON object per line).
The file is capped at MAX_ENTRIES lines (oldest pruned first).
"""
import json
import os
from datetime import datetime, timezone
MAX_ENTRIES = 100
_LOG_DIR = 'notification-logs'
def _log_file(datastore_path: str, profile_uuid: str) -> str:
return os.path.join(datastore_path, _LOG_DIR, f'{profile_uuid}.log')
def write_profile_log(datastore_path: str, profile_uuid: str, *,
watch_url: str = '',
watch_uuid: str = '',
status: str, # 'ok' | 'error' | 'test'
message: str = ''):
"""Append one log entry; prune to MAX_ENTRIES."""
log_dir = os.path.join(datastore_path, _LOG_DIR)
os.makedirs(log_dir, exist_ok=True)
entry = json.dumps({
'ts': datetime.now(tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC'),
'watch_url': watch_url[:200],
'watch_uuid': watch_uuid,
'status': status,
'message': message[:500],
}, ensure_ascii=False)
path = _log_file(datastore_path, profile_uuid)
try:
with open(path, 'r', encoding='utf-8') as fh:
lines = [l for l in fh.read().splitlines() if l.strip()]
except FileNotFoundError:
lines = []
lines.append(entry)
lines = lines[-MAX_ENTRIES:]
with open(path, 'w', encoding='utf-8') as fh:
fh.write('\n'.join(lines) + '\n')
def read_profile_log(datastore_path: str, profile_uuid: str) -> list:
"""Return log entries as a list of dicts, newest first."""
path = _log_file(datastore_path, profile_uuid)
try:
with open(path, 'r', encoding='utf-8') as fh:
lines = [l.strip() for l in fh if l.strip()]
except FileNotFoundError:
return []
entries = []
for line in reversed(lines):
try:
entries.append(json.loads(line))
except (json.JSONDecodeError, ValueError):
pass
return entries
def has_log(datastore_path: str, profile_uuid: str) -> bool:
return os.path.exists(_log_file(datastore_path, profile_uuid))
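Reviewer note: the capped JSON-lines log described in the module docstring can be sketched as below. The helper names `append_capped` and `read_newest_first` and the temporary file path are assumptions for illustration, not the project's API. The sketch splits on `'\n'` only, because `str.splitlines()` also breaks on characters such as U+2028, which `json.dumps(..., ensure_ascii=False)` leaves unescaped.

```python
import json
import os
import tempfile


def append_capped(path, entry, max_entries=100):
    """Append one JSON object per line, keeping only the newest max_entries."""
    try:
        with open(path, 'r', encoding='utf-8') as fh:
            lines = [l for l in fh.read().split('\n') if l.strip()]
    except FileNotFoundError:
        lines = []
    lines.append(json.dumps(entry, ensure_ascii=False))
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write('\n'.join(lines[-max_entries:]) + '\n')


def read_newest_first(path):
    """Return parsed entries, newest first, skipping lines that are not valid JSON."""
    try:
        with open(path, 'r', encoding='utf-8') as fh:
            lines = [l for l in fh.read().split('\n') if l.strip()]
    except FileNotFoundError:
        return []
    entries = []
    for line in reversed(lines):
        try:
            entries.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            pass
    return entries


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'example-profile.log')
    for i in range(5):
        append_capped(log_path, {'n': i, 'status': 'ok'}, max_entries=3)
    newest_first = [e['n'] for e in read_newest_first(log_path)]
    missing = read_newest_first(os.path.join(tmp, 'absent.log'))
```

Rewriting the whole file on every append keeps the pruning logic trivial, which seems acceptable for a cap of 100 short entries.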
@@ -1,111 +0,0 @@
"""
Notification Profile Type plugin registry.
NotificationProfileType is the abstract base; the only contract is send().
Plugins are free to use any delivery mechanism (Apprise, direct HTTP, SDK, etc.).
Built-in: AppriseProfileType (raw Apprise URL list).
Third-party plugins register additional types:
from changedetectionio.notification_profiles.registry import registry, NotificationProfileType
@registry.register
class MyProfileType(NotificationProfileType):
type_id = "mytype"
display_name = "My Service"
icon = "bell"
template = "my_plugin/notification_profiles/types/mytype.html"
def send(self, config: dict, n_object: dict, datastore) -> bool:
requests.post(config['webhook_url'], json={"text": n_object['notification_body']})
return True
"""
from abc import ABC, abstractmethod
class NotificationProfileType(ABC):
type_id: str = NotImplemented
display_name: str = NotImplemented
icon: str = "bell" # feather icon name
template: str = NotImplemented # Jinja2 partial rendered in the profile edit form
@abstractmethod
def send(self, config: dict, n_object: dict, datastore) -> bool:
"""
Deliver the notification.
Args:
config: The profile's config dict (type-specific fields).
n_object: Fully-rendered NotificationContextData (title, body, format, etc.).
datastore: App datastore for any extra lookups.
Returns True on success, False on failure (do not raise; log instead).
"""
def validate(self, config: dict) -> None:
"""Raise ValueError with a user-readable message on invalid config."""
pass
def get_url_hint(self, config: dict) -> str:
"""Short display string shown in the selector chip tooltip / dropdown row."""
return ''
class AppriseProfileType(NotificationProfileType):
"""Delivers notifications via Apprise using a raw URL list."""
type_id = "apprise"
display_name = "Apprise"
icon = "bell"
template = "notification_profiles/types/apprise.html"
def get_apprise_urls(self, config: dict) -> list:
return config.get('notification_urls') or []
def send(self, config: dict, n_object, datastore) -> bool:
from changedetectionio.notification.handler import process_notification
from changedetectionio.notification_service import NotificationContextData
urls = self.get_apprise_urls(config)
if not urls:
return False
if not isinstance(n_object, NotificationContextData):
n_object = NotificationContextData(n_object)
n_object['notification_urls'] = urls
n_object['notification_title'] = config.get('notification_title') or n_object.get('notification_title')
n_object['notification_body'] = config.get('notification_body') or n_object.get('notification_body')
n_object['notification_format'] = config.get('notification_format') or n_object.get('notification_format')
process_notification(n_object, datastore)
return True
def get_url_hint(self, config: dict) -> str:
urls = config.get('notification_urls') or []
if urls:
u = urls[0]
return (u[:60] + '…') if len(u) > 60 else u
return ''
class _Registry:
def __init__(self):
self._types: dict = {}
def register(self, cls):
"""Register a NotificationProfileType subclass. Usable as a decorator."""
instance = cls()
self._types[instance.type_id] = instance
return cls
def get(self, type_id: str) -> NotificationProfileType:
return self._types.get(type_id, self._types.get('apprise'))
def all(self) -> list:
return list(self._types.values())
def choices(self) -> list:
return [(t.type_id, t.display_name) for t in self._types.values()]
registry = _Registry()
registry.register(AppriseProfileType)
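Reviewer note: the decorator-based registry with a fallback type can be sketched in isolation as follows. All class names here (`ProfileType`, `Registry`, `BuiltinType`, `RecordingType`) are hypothetical and nothing is sent over the network; the `sent` list stands in for a delivery channel.

```python
from abc import ABC, abstractmethod


class ProfileType(ABC):
    type_id = NotImplemented

    @abstractmethod
    def send(self, config: dict, n_object: dict, datastore) -> bool:
        """Deliver the notification; return True on success."""


class Registry:
    def __init__(self, fallback_id):
        self._types = {}
        self._fallback_id = fallback_id

    def register(self, cls):
        # Instantiate once and key by type_id; returning cls keeps this
        # usable as a class decorator.
        instance = cls()
        self._types[instance.type_id] = instance
        return cls

    def get(self, type_id):
        # Unknown ids resolve to the fallback type instead of raising.
        return self._types.get(type_id, self._types.get(self._fallback_id))


registry = Registry(fallback_id='builtin')
sent = []  # stands in for a real delivery channel


@registry.register
class BuiltinType(ProfileType):
    type_id = 'builtin'

    def send(self, config, n_object, datastore):
        sent.append(('builtin', n_object['notification_title']))
        return True


@registry.register
class RecordingType(ProfileType):  # hypothetical third-party plugin
    type_id = 'recording'

    def send(self, config, n_object, datastore):
        if not config.get('target'):
            return False  # report failure instead of raising
        sent.append((config['target'], n_object['notification_title']))
        return True


ok = registry.get('recording').send({'target': 'ops'},
                                    {'notification_title': 'Changed'}, None)
fallback = registry.get('no-such-type')
```

Storing instances rather than classes means each type is constructed once at import time, so a plugin whose constructor raises would fail at registration.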
@@ -1,49 +0,0 @@
"""
Resolve the full set of NotificationProfile objects that should fire for a given watch.
Merges profile UUIDs from: Watch → Tags → System (union, deduplicated).
Mute cascade is checked separately via resolve_setting() before calling this.
"""
from loguru import logger
def resolve_notification_profiles(watch, datastore) -> list:
"""
Return list of (profile_dict, NotificationProfileType) tuples to fire for *watch*.
Profiles are deduplicated by UUID: if the same UUID appears at multiple levels,
it fires once, not multiple times.
"""
from changedetectionio.notification_profiles.registry import registry
all_profiles = datastore.data['settings']['application'].get('notification_profile_data', {})
seen = set()
result = []
def _add(uuids):
for uid in (uuids or []):
if uid in seen:
continue
profile = all_profiles.get(uid)
if not profile:
logger.warning(f"Notification profile UUID {uid!r} not found, skipping")
continue
seen.add(uid)
type_handler = registry.get(profile.get('type', 'apprise'))
result.append((profile, type_handler))
# 1. Watch-level
_add(watch.get('notification_profiles', []))
# 2. Tag/group level
tags = datastore.get_all_tags_for_watch(uuid=watch.get('uuid'))
if tags:
for tag in tags.values():
_add(tag.get('notification_profiles', []))
# 3. System level
_add(datastore.data['settings']['application'].get('notification_profiles', []))
return result
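Reviewer note: the ordered union with de-duplication performed by the removed function reduces to the sketch below. `merge_profile_uuids` and the sample UUIDs are hypothetical; the real function also looks up a type handler per profile and logs unknown UUIDs.

```python
def merge_profile_uuids(levels, known_profiles):
    """Union the UUID lists in priority order, keeping the first occurrence of each."""
    seen = set()
    merged = []
    for uuids in levels:
        for uid in (uuids or []):
            if uid in seen or uid not in known_profiles:
                continue  # duplicates and unknown UUIDs are skipped
            seen.add(uid)
            merged.append(uid)
    return merged


known = {'p1': {'type': 'apprise'}, 'p2': {'type': 'apprise'}, 'p3': {'type': 'apprise'}}
watch_level = ['p2']
tag_level = ['p1', 'p2']        # p2 repeats and should fire once
system_level = ['p3', 'gone']   # 'gone' is not in known, so it is dropped

order = merge_profile_uuids([watch_level, tag_level, system_level], known)
```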
+46 -26
@@ -237,23 +237,14 @@ def register_builtin_fetchers():
This is called from content_fetchers/__init__.py after all fetchers are imported
to avoid circular import issues.
"""
from changedetectionio.content_fetchers import requests, puppeteer, webdriver_selenium
from changedetectionio.content_fetchers.playwright import CDP, chrome, firefox, webkit
from changedetectionio.content_fetchers import requests, playwright, puppeteer, webdriver_selenium
# Register each built-in fetcher plugin
if hasattr(requests, 'requests_plugin'):
plugin_manager.register(requests.requests_plugin, 'builtin_requests')
if hasattr(CDP, 'cdp_plugin'):
plugin_manager.register(CDP.cdp_plugin, 'builtin_playwright_cdp')
if hasattr(chrome, 'chrome_plugin'):
plugin_manager.register(chrome.chrome_plugin, 'builtin_playwright_chrome')
if hasattr(firefox, 'firefox_plugin'):
plugin_manager.register(firefox.firefox_plugin, 'builtin_playwright_firefox')
if hasattr(webkit, 'webkit_plugin'):
plugin_manager.register(webkit.webkit_plugin, 'builtin_playwright_webkit')
if hasattr(playwright, 'playwright_plugin'):
plugin_manager.register(playwright.playwright_plugin, 'builtin_playwright')
if hasattr(puppeteer, 'puppeteer_plugin'):
plugin_manager.register(puppeteer.puppeteer_plugin, 'builtin_puppeteer')
@@ -369,28 +360,57 @@ def get_active_plugins():
def get_fetcher_capabilities(watch, datastore):
"""Get capability flags for a watch's resolved fetcher.
"""Get capability flags for a watch's fetcher.
Uses the BrowserProfile resolution chain (watch → tag → global → built-in)
to determine the actual fetcher class, then reads its capability flags.
Args:
watch: The watch object/dict
datastore: The datastore to resolve 'system' fetcher
Returns:
dict: {'supports_browser_steps': bool, 'supports_screenshots': bool,
'supports_xpath_element_data': bool}
dict: Dictionary with capability flags:
{
'supports_browser_steps': bool,
'supports_screenshots': bool,
'supports_xpath_element_data': bool
}
"""
from changedetectionio.model.browser_profile import resolve_browser_profile
# Get the fetcher name from watch
fetcher_name = watch.get('fetch_backend', 'system')
# Resolve 'system' to actual fetcher
if fetcher_name == 'system':
fetcher_name = datastore.data['settings']['application'].get('fetch_backend', 'html_requests')
# Get the fetcher class
from changedetectionio import content_fetchers
profile = resolve_browser_profile(watch, datastore)
fetcher_class = content_fetchers.get_fetcher(profile.fetch_backend)
# Try to get from built-in fetchers first
if hasattr(content_fetchers, fetcher_name):
fetcher_class = getattr(content_fetchers, fetcher_name)
return {
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False)
}
if fetcher_class is None:
return {'supports_browser_steps': False, 'supports_screenshots': False, 'supports_xpath_element_data': False}
# Try to get from plugin-provided fetchers
# Query all plugins for registered fetchers
plugin_fetchers = plugin_manager.hook.register_content_fetcher()
for fetcher_registration in plugin_fetchers:
if fetcher_registration:
name, fetcher_class = fetcher_registration
if name == fetcher_name:
return {
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False)
}
# Default: no capabilities
return {
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False),
'supports_browser_steps': False,
'supports_screenshots': False,
'supports_xpath_element_data': False
}
+62 -67
@@ -23,7 +23,6 @@ class difference_detection_processor():
watch = None
xpath_data = None
preferred_proxy = None
preferred_proxy_override = None # Set externally to force a specific proxy (e.g. proxy checker)
screenshot_format = SCREENSHOT_FORMAT_JPEG
last_raw_content_checksum = None
@@ -37,8 +36,6 @@ class difference_detection_processor():
# 2. Preserves Watch object with properties (.link, .is_pdf, etc.) - can't use dict()
# 3. Safe now: Watch.__deepcopy__() shares datastore ref (no memory leak) but copies dict data
self.watch = deepcopy(self.datastore.data['watching'].get(watch_uuid))
if self.watch is None:
raise KeyError(f"Watch UUID {watch_uuid} not found in datastore (deleted before processing?)")
# Generic fetcher that should be extended (requests, playwright etc)
self.fetcher = Fetcher()
@@ -118,65 +115,82 @@ class difference_detection_processor():
f"Set ALLOW_IANA_RESTRICTED_ADDRESSES=true to allow."
)
async def call_browser(self):
async def call_browser(self, preferred_proxy_id=None):
from requests.structures import CaseInsensitiveDict
from changedetectionio.model.browser_profile import resolve_browser_profile, BUILTIN_REQUESTS
url = self.watch.link
# Protect against file:, file:/, file:// access
# Protect against file:, file:/, file:// access, check the real "link" without any meta "source:" etc prepended.
if re.search(r'^file:', url.strip(), re.IGNORECASE):
if not strtobool(os.getenv('ALLOW_FILE_URI', 'false')):
raise Exception("file:// type access is denied for security reasons.")
raise Exception(
"file:// type access is denied for security reasons."
)
await self.validate_iana_url()
# Resolve the full browser profile for this watch (watch → tag → global → built-in)
profile = resolve_browser_profile(self.watch, self.datastore)
# Requests, playwright, other browser via wss:// etc, fetch_extra_something
prefer_fetch_backend = self.watch.get('fetch_backend', 'system')
# PDFs always use the requests fetcher — browsers render them in an embedded viewer
# Proxy ID "key"
preferred_proxy_id = preferred_proxy_id if preferred_proxy_id else self.datastore.get_preferred_proxy_for_watch(
uuid=self.watch.get('uuid'))
# Pluggable content self.fetcher
if not prefer_fetch_backend or prefer_fetch_backend == 'system':
prefer_fetch_backend = self.datastore.data['settings']['application'].get('fetch_backend')
# In the case that the preferred fetcher was a browser config with custom connection URL..
# @todo - on save watch, if its extra_browser_ then it should be obvious it will use playwright (like if its requests now..)
custom_browser_connection_url = None
if prefer_fetch_backend.startswith('extra_browser_'):
(t, key) = prefer_fetch_backend.split('extra_browser_')
connection = list(
filter(lambda s: (s['browser_name'] == key), self.datastore.data['settings']['requests'].get('extra_browsers', [])))
if connection:
prefer_fetch_backend = 'html_webdriver'
custom_browser_connection_url = connection[0].get('browser_connection_url')
# PDF should be html_requests because playwright will serve it up (so far) in an embedded page
# @todo https://github.com/dgtlmoon/changedetection.io/issues/2019
# @todo needs a test or a fix
if self.watch.is_pdf:
profile = BUILTIN_REQUESTS
prefer_fetch_backend = "html_requests"
# Resolve proxy for the target URL fetch.
# Note: browser_connection_url is the WebSocket endpoint to reach the remote browser,
# which is separate from the proxy used by the browser to fetch target pages.
proxy_url = self.datastore.get_proxy_url_for_watch(self.watch.get('uuid'), override_id=self.preferred_proxy_override)
if proxy_url:
logger.debug(f"Proxy '{proxy_url}' for {url}")
logger.debug(f"BrowserProfile '{profile.get_machine_name()}' (fetcher={profile.fetch_backend}) for watch {self.watch['uuid']}")
# Select the fetcher class
# Grab the right kind of 'fetcher', (playwright, requests, etc)
from changedetectionio import content_fetchers
fetcher_class_name = profile.get_fetcher_class_name()
if hasattr(content_fetchers, prefer_fetch_backend):
# @todo TEMPORARY HACK - SWITCH BACK TO PLAYWRIGHT FOR BROWSERSTEPS
if prefer_fetch_backend == 'html_webdriver' and self.watch.has_browser_steps:
# This is never supported in selenium anyway
logger.warning(
"Using playwright fetcher override for possible puppeteer request in browsersteps, because puppetteer:browser steps is incomplete.")
from changedetectionio.content_fetchers.playwright import fetcher as playwright_fetcher
fetcher_obj = playwright_fetcher
else:
fetcher_obj = getattr(content_fetchers, prefer_fetch_backend)
else:
# What it referenced doesn't exist, just use a default
fetcher_obj = getattr(content_fetchers, "html_requests")
fetcher_obj = content_fetchers.get_fetcher(fetcher_class_name)
if fetcher_obj is None:
logger.warning(f"Fetcher '{fetcher_class_name}' not found, falling back to requests")
fetcher_obj = content_fetchers.get_fetcher('requests')
elif self.watch.has_browser_steps and not getattr(fetcher_obj, 'supports_browser_steps', False):
# Browser steps require Playwright — override if the resolved fetcher doesn't support them
logger.warning(f"Fetcher '{fetcher_class_name}' does not support browser steps, overriding to Playwright")
fetcher_obj = content_fetchers.get_fetcher('playwright')
proxy_url = None
if preferred_proxy_id:
# Custom browser endpoints should NOT have a proxy added
if not prefer_fetch_backend.startswith('extra_browser_'):
proxy_url = self.datastore.proxy_list.get(preferred_proxy_id).get('url')
logger.debug(f"Selected proxy key '{preferred_proxy_id}' as proxy URL '{proxy_url}' for {url}")
else:
logger.debug("Skipping adding proxy data when custom Browser endpoint is specified. ")
self.fetcher = fetcher_obj(
proxy_override=proxy_url,
custom_browser_connection_url=profile.browser_connection_url,
screenshot_format=self.screenshot_format,
# BrowserProfile fields — browser fetchers use these; html_requests ignores them
viewport_width=profile.viewport_width,
viewport_height=profile.viewport_height,
block_images=profile.block_images,
block_fonts=profile.block_fonts,
profile_user_agent=profile.user_agent,
ignore_https_errors=profile.ignore_https_errors,
locale=profile.locale,
service_workers=profile.service_workers,
extra_delay=profile.extra_delay,
)
logger.debug(f"Using proxy '{proxy_url}' for {self.watch['uuid']}")
# Now call the fetcher (playwright/requests/etc) with arguments that only a fetcher would need.
# When browser_connection_url is None, the fetcher should work out the best defaults itself (OS env vars etc)
self.fetcher = fetcher_obj(proxy_override=proxy_url,
custom_browser_connection_url=custom_browser_connection_url,
screenshot_format=self.screenshot_format
)
if self.watch.has_browser_steps:
self.fetcher.browser_steps = browser_steps_get_valid_steps(self.watch.get('browser_steps', []))
@@ -186,17 +200,9 @@ class difference_detection_processor():
from changedetectionio.jinja2_custom import render as jinja_render
request_headers = CaseInsensitiveDict()
# Browser profile: UA override (lowest priority — watch headers override this)
if profile.user_agent:
request_headers['User-Agent'] = profile.user_agent
# Browser profile: custom headers (override profile UA, but watch headers override these)
if profile.custom_headers:
for line in profile.custom_headers.splitlines():
line = line.strip()
if not line.startswith('#') and ':' in line:
k, v = line.split(':', 1)
request_headers[k.strip()] = v.strip()
ua = self.datastore.data['settings']['requests'].get('default_ua')
if ua and ua.get(prefer_fetch_backend):
request_headers.update({'User-Agent': ua.get(prefer_fetch_backend)})
request_headers.update(self.watch.get('headers', {}))
request_headers.update(self.datastore.get_all_base_headers())
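Reviewer note: the profile header parsing in the hunk above (skip `#` comments, split once on the first colon) can be sketched as a standalone helper. `parse_header_lines` and the sample values are hypothetical, and a plain `dict` is used here, so unlike the `CaseInsensitiveDict` in the hunk the keys are case-sensitive.

```python
def parse_header_lines(text):
    """Parse 'Name: value' lines; skip comments and lines without a colon."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith('#') and ':' in line:
            name, value = line.split(':', 1)  # split once so values may contain ':'
            headers[name.strip()] = value.strip()
    return headers


profile_headers = parse_header_lines(
    "# sent with every request\n"
    "Accept-Language: en-GB\n"
    "Referer: https://example.com/start\n"
    "not a header line\n"
)

# Later layers win on key collisions, mirroring the order of the update() calls.
watch_headers = {'Accept-Language': 'de-DE'}
merged = {'User-Agent': 'profile-ua', **profile_headers, **watch_headers}
```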
@@ -253,17 +259,6 @@ class difference_detection_processor():
# @todo .quit here could go on close object, so we can run JS if change-detected
await self.fetcher.quit(watch=self.watch)
self.fetcher.disk_cleanup_after_fetch()
# Sanitize lone surrogates - these can appear when servers return malformed/mixed-encoding
# content that gets decoded into surrogate characters (e.g. \udcad). Without this,
# encode('utf-8') raises UnicodeEncodeError downstream in checksums, diffs, file writes, etc.
# Covers all fetchers (requests, playwright, puppeteer, selenium) in one place.
# Also note: By this point we SHOULD know the original encoding so it can safely convert to utf-8 for the rest of the app.
# See: https://github.com/dgtlmoon/changedetection.io/issues/3952
if self.fetcher.content and isinstance(self.fetcher.content, str):
self.fetcher.content = self.fetcher.content.encode('utf-8', errors='replace').decode('utf-8')
# After init, call run_changedetection() which will do the actual change-detection
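Reviewer note: the surrogate sanitising step removed in this hunk can be reproduced in isolation. The input bytes are an invented example; `surrogateescape` is used only to manufacture a lone surrogate like the `\udcad` mentioned in the comment.

```python
# A stray 0xAD byte decoded with surrogateescape becomes the lone surrogate U+DCAD.
text = b'caf\xad'.decode('utf-8', errors='surrogateescape')

try:
    text.encode('utf-8')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The round trip from the hunk: characters that cannot be encoded become '?'.
cleaned = text.encode('utf-8', errors='replace').decode('utf-8')
```

The round trip loses the original byte, so it is a cleanup of last resort rather than an encoding fix.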
+5 -2
@@ -42,7 +42,10 @@ def render_form(watch, datastore, request, url_for, render_template, flash, redi
# Get error information for the template
screenshot_url = watch.get_screenshot()
fetcher_supports_screenshots = watch.fetcher_supports_screenshots
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
password_enabled_and_share_is_off = False
if datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False):
@@ -59,7 +62,7 @@ def render_form(watch, datastore, request, url_for, render_template, flash, redi
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
screenshot=screenshot_url,
fetcher_supports_screenshots=fetcher_supports_screenshots,
is_html_webdriver=is_html_webdriver,
password_enabled_and_share_is_off=password_enabled_and_share_is_off,
extra_title=f" - {watch.label} - Extract Data",
extra_stylesheets=[url_for('static_content', group='styles', filename='diff.css')],
+1 -7
@@ -100,13 +100,7 @@ class guess_stream_type():
if any(s in http_content_header for s in RSS_XML_CONTENT_TYPES):
self.is_rss = True
elif any(s in http_content_header for s in JSON_CONTENT_TYPES):
# JSONP detection: server claims application/json but content is actually JSONP (e.g. cb({...}))
# A JSONP response starts with an identifier followed by '(' - not valid JSON
if re.match(r'^\w[\w.]*\s*\(', test_content):
logger.warning(f"Content-Type header claims JSON but content looks like JSONP (starts with identifier+parenthesis) - treating as plaintext")
self.is_plaintext = True
else:
self.is_json = True
self.is_json = True
elif 'pdf' in magic_content_header:
self.is_pdf = True
# magic will call a rss document 'xml'
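Reviewer note: the JSONP check removed in this hunk relies on a single anchored regular expression. The sketch below applies the same pattern to a few invented bodies; `looks_like_jsonp` is a hypothetical wrapper name.

```python
import re

# Same pattern as in the hunk: an identifier (dots allowed), optional
# whitespace, then an opening parenthesis at the very start of the body.
JSONP_PREFIX = re.compile(r'^\w[\w.]*\s*\(')


def looks_like_jsonp(body: str) -> bool:
    return JSONP_PREFIX.match(body) is not None


results = {
    body: looks_like_jsonp(body)
    for body in ('cb({"price": 1})', 'jQuery.cb123 ({})', '{"price": 1}', '[1, 2]')
}
```

Because the pattern is anchored at the first character, a body with leading whitespace would not match unless it is stripped first.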
@@ -1,7 +1,6 @@
from babel.numbers import parse_decimal
from changedetectionio.model.Watch import model as BaseWatch
from decimal import Decimal, InvalidOperation
from typing import Union
import re
@@ -11,8 +10,6 @@ supports_browser_steps = True
supports_text_filters_and_triggers = True
supports_text_filters_and_triggers_elements = True
supports_request_type = True
_price_re = re.compile(r"Price:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
class Restock(dict):
@@ -34,7 +31,6 @@ class Restock(dict):
if standardized_value:
# Convert to float
# @todo locale needs to be the locale of the webpage
return float(parse_decimal(standardized_value, locale='en'))
return None
@@ -66,17 +62,6 @@ class Restock(dict):
super().__setitem__(key, value)
def get_price_from_history_str(history_str):
m = _price_re.search(history_str)
if not m:
return None
try:
return str(Decimal(m.group(1)))
except InvalidOperation:
return None
class Watch(BaseWatch):
def __init__(self, *arg, **kw):
super().__init__(*arg, **kw)
@@ -90,27 +75,13 @@ class Watch(BaseWatch):
def extra_notification_token_values(self):
values = super().extra_notification_token_values()
values['restock'] = self.get('restock', {})
values['restock']['previous_price'] = None
if self.history_n >= 2:
history = self.history
if history and len(history) >=2:
"""Unfortunately for now timestamp is stored as string key"""
sorted_keys = sorted(list(history), key=lambda x: int(x))
sorted_keys.reverse()
price_str = self.get_history_snapshot(timestamp=sorted_keys[-1])
if price_str:
values['restock']['previous_price'] = get_price_from_history_str(price_str)
return values
def extra_notification_token_placeholder_info(self):
values = super().extra_notification_token_placeholder_info()
values.append(('restock.price', "Price detected"))
values.append(('restock.in_stock', "In stock status"))
values.append(('restock.original_price', "Original price at first check"))
values.append(('restock.previous_price', "Previous price in history"))
return values
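Reviewer note: the removed `get_price_from_history_str` helper can be exercised on its own as below. The snapshot strings are invented; the actual format of a restock history snapshot is an assumption here.

```python
import re
from decimal import Decimal, InvalidOperation

_price_re = re.compile(r"Price:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def get_price_from_history_str(history_str):
    """Return the first 'Price: <number>' value as a string, or None."""
    m = _price_re.search(history_str)
    if not m:
        return None
    try:
        return str(Decimal(m.group(1)))
    except InvalidOperation:
        return None


plain = get_price_from_history_str("In Stock: True - Price: 19.99")
lower = get_price_from_history_str("price:   5")
absent = get_price_from_history_str("Price: n/a")
# The pattern stops at the first non-digit, so a thousands separator truncates the match.
grouped = get_price_from_history_str("Price: 1,299.00")
```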
@@ -437,18 +437,17 @@ class perform_site_check(difference_detection_processor):
# Only try to process restock information (like scraping for keywords) if the page was actually rendered correctly.
# Otherwise it will assume "in stock" because nothing suggesting the opposite was found
#useless
# from ...html_tools import html_to_text
# text = html_to_text(self.fetcher.content)
# logger.debug(f"Length of text after conversion: {len(text)}")
# if not len(text):
# from ...content_fetchers.exceptions import ReplyWithContentButNoText
# raise ReplyWithContentButNoText(url=watch.link,
# status_code=self.fetcher.get_last_status_code(),
# screenshot=self.fetcher.screenshot,
# html_content=self.fetcher.content,
# xpath_data=self.fetcher.xpath_data
# )
from ...html_tools import html_to_text
text = html_to_text(self.fetcher.content)
logger.debug(f"Length of text after conversion: {len(text)}")
if not len(text):
from ...content_fetchers.exceptions import ReplyWithContentButNoText
raise ReplyWithContentButNoText(url=watch.link,
status_code=self.fetcher.get_last_status_code(),
screenshot=self.fetcher.screenshot,
html_content=self.fetcher.content,
xpath_data=self.fetcher.xpath_data
)
# Which restock settings to compare against?
# Settings are stored in restock_diff.json (migrated from watch.json by update_30).
@@ -489,9 +488,19 @@ class perform_site_check(difference_detection_processor):
# @TODO !!! some setting like "Use as fallback" or "always use", "t
if not (has_price and has_availability) or True:
from changedetectionio.pluggy_interface import get_itemprop_availability_from_plugin
# Use the actual resolved fetcher name from the fetcher instance
fetcher_name = self.watch.effective_browser_profile.fetch_backend
logger.debug(f"Resolved effective fetcher: {fetcher_name}")
fetcher_name = watch.get('fetch_backend', 'html_requests')
# Resolve 'system' to the actual fetcher being used
# This allows plugins to work even when watch uses "system settings default"
if fetcher_name == 'system':
# Get the actual fetcher that was used (from self.fetcher)
# Fetcher class name gives us the actual backend (e.g., 'html_requests', 'html_webdriver')
actual_fetcher = type(self.fetcher).__name__
if 'html_requests' in actual_fetcher.lower():
fetcher_name = 'html_requests'
elif 'webdriver' in actual_fetcher.lower() or 'playwright' in actual_fetcher.lower():
fetcher_name = 'html_webdriver'
logger.debug(f"Resolved 'system' fetcher to actual fetcher: {fetcher_name}")
# Try plugin override - plugins can decide if they support this fetcher
if fetcher_name:
@@ -283,7 +283,4 @@ def query_price_availability(extracted_data):
if not result.get('availability') and 'availability' in microdata:
result['availability'] = microdata['availability']
# result['price'] could be float or str here, depending on the website, for example it might contain "1,00" commas, etc.
# using something like babel you need to know the locale of the website and even then it can be problematic
# we dont really do anything with the price data so far.. so just accept it the way it comes.
return result
@@ -154,7 +154,11 @@ def render(watch, datastore, request, url_for, render_template, flash, redirect,
screenshot_url = watch.get_screenshot()
fetcher_supports_screenshots = watch.fetcher_supports_screenshots
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
password_enabled_and_share_is_off = False
if datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False):
@@ -210,7 +214,7 @@ def render(watch, datastore, request, url_for, render_template, flash, redirect,
extra_title=f" - {watch.label} - History",
extract_form=extract_form,
from_version=str(from_version),
fetcher_supports_screenshots=fetcher_supports_screenshots,
is_html_webdriver=is_html_webdriver,
last_error=watch['last_error'],
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
-2
@@ -29,11 +29,9 @@ def register_watch_operation_handlers(socketio, datastore):
# Perform the operation
if op == 'pause':
watch.toggle_pause()
watch.commit()
logger.info(f"Socket.IO: Toggled pause for watch {uuid}")
elif op == 'mute':
watch.toggle_mute()
watch.commit()
logger.info(f"Socket.IO: Toggled mute for watch {uuid}")
elif op == 'recheck':
# Import here to avoid circular imports
@@ -199,31 +199,8 @@ def handle_watch_update(socketio, **kwargs):
logger.error(f"Socket.IO error in handle_watch_update: {str(e)}")
def _suppress_werkzeug_ws_abrupt_disconnect_noise():
"""Patch BaseWSGIServer.log to suppress the AssertionError traceback that fires when
a browser closes a WebSocket connection mid-handshake (e.g. closing a tab).
The exception is caught inside run_wsgi and routed to self.server.log(); it never
propagates out, so wrapping run_wsgi doesn't help. Patching the log method is the
only reliable intercept point. The error is cosmetic: Socket.IO already handles the
disconnect correctly via its own disconnect handler and timeout logic."""
try:
from werkzeug.serving import BaseWSGIServer
_original_log = BaseWSGIServer.log
def _filtered_log(self, type, message, *args):
if type == 'error' and 'write() before start_response' in message:
return
_original_log(self, type, message, *args)
BaseWSGIServer.log = _filtered_log
except Exception:
pass
def init_socketio(app, datastore):
"""Initialize SocketIO with the main Flask app"""
_suppress_werkzeug_ws_abrupt_disconnect_noise()
import platform
import sys
File diff suppressed because one or more lines are too long

-8
@@ -116,14 +116,6 @@ $(document).ready(function () {
$('#realtime-conn-error').show();
});
// Tell the server we're leaving cleanly so it can release the connection
// immediately rather than waiting for a timeout.
// Note: this only fires for voluntary closes (tab/window close, navigation away).
// Hard kills, crashes and network drops will still timeout normally on the server.
window.addEventListener('beforeunload', function () {
socket.disconnect();
});
socket.on('queue_size', function (data) {
console.log(`${data.event_timestamp} - Queue size update: ${data.q_length}`);
if(queueSizePagerInfoText) {
+1 -1
@@ -4,7 +4,7 @@ $(document).ready(function(){
});
var checkUserVal = function(){
if($('#fetch_backend input:checked').val()=='requests') {
if($('#fetch_backend input:checked').val()=='html_requests') {
$('#request-override').show();
$('#webdriver-stepper').hide();
} else {
+6 -25
@@ -3,40 +3,21 @@ $(document).ready(function () {
// Lazy Hide/Show elements mechanism
$('[data-visible-for]').hide();
function show_related_elem(e) {
var name = $(e).attr('name');
var val = $(e).val();
var n = name + "=" + val;
// Resolve browser_profile select → underlying fetch_backend class name
// browserProfileFetcherMap is injected by the page as {machine_name: 'playwright', ...}
if (name && name.endsWith('browser_profile') && typeof browserProfileFetcherMap !== 'undefined') {
var fetcherClass = val === 'system'
? (typeof default_system_fetch_backend !== 'undefined' ? default_system_fetch_backend : null)
: browserProfileFetcherMap[val];
if (fetcherClass) {
n = 'fetch_backend=' + fetcherClass;
}
} else if (n === 'fetch_backend=system') {
var n = $(e).attr('name') + "=" + $(e).val();
if (n === 'fetch_backend=system') {
n = "fetch_backend=" + default_system_fetch_backend;
}
$(`[data-visible-for~="${n}"]`).show();
}
$('select, :radio').on('change', function (e) {
$(`[data-visible-for]`).hide();
$('.advanced-options').hide();
show_related_elem(this);
});
// Retain original click/keyup handling for radio buttons
$(':radio').on('keyup keypress blur click', function (e) {
$(':radio').on('keyup keypress blur change click', function (e) {
$(`[data-visible-for]`).hide();
$('.advanced-options').hide();
show_related_elem(this);
});
$(':radio:checked, select').each(function (e) {
$(':radio:checked').each(function (e) {
show_related_elem(this);
});
})
// Show advanced
@@ -45,4 +26,4 @@ $(document).ready(function () {
$(this).toggle();
})
});
});
});
+10 -150
@@ -143,7 +143,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
self.__data['settings']['application']['tags'][uuid] = Tag.model(
datastore_path=self.datastore_path,
__datastore=self,
__datastore=self.__data,
default=tag
)
logger.info(f"Tag: {uuid} {tag['title']}")
@@ -207,7 +207,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
self.json_store_path = os.path.join(self.datastore_path, "changedetection.json")
# Base definition for all watchers (deepcopy part of #569)
self.generic_definition = deepcopy(Watch.model(datastore_path=datastore_path, __datastore=self, default={}))
self.generic_definition = deepcopy(Watch.model(datastore_path=datastore_path, __datastore=self.__data, default={}))
# Load build SHA if available (Docker deployments)
if path.isfile('changedetectionio/source.txt'):
@@ -245,10 +245,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# Maybe they copied a bunch of watch subdirs across too
self._load_state()
# Apply env-var browser config after state is fully loaded so we can safely
# read existing settings without risk of being overwritten.
self.preconfigure_browser_profiles_based_on_env()
def init_fresh_install(self, include_default_watches, version_tag):
# Generate app_guid FIRST (required for all operations)
if "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ:
@@ -272,11 +268,13 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
if include_default_watches:
self.add_watch(
url='https://news.ycombinator.com/',
tag='Tech news'
tag='Tech news',
extras={'fetch_backend': 'html_requests'}
)
self.add_watch(
url='https://changedetection.io/CHANGELOG.txt',
tag='changedetection.io'
tag='changedetection.io',
extras={'fetch_backend': 'html_requests'}
)
# Create changedetection.json immediately
@@ -333,64 +331,9 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
if entity.get('processor') != 'text_json_diff':
logger.trace(f"Loading Watch object '{watch_class.__module__}.{watch_class.__name__}' for UUID {uuid}")
entity = watch_class(datastore_path=self.datastore_path, __datastore=self, default=entity)
entity = watch_class(datastore_path=self.datastore_path, __datastore=self.__data, default=entity)
return entity
def preconfigure_browser_profiles_based_on_env(self):
"""Instantiate browser profiles from environment variables and store them.
Always runs at the end of reload_state(), so it covers fresh installs,
existing datastores, and server restarts. Env vars always win so that
changing PLAYWRIGHT_DRIVER_URL and restarting is reflected immediately.
Creates BrowserProfile instances from env vars and stores them in
``settings.application.browser_profiles`` under their machine names,
then sets ``settings.application.browser_profile`` to that profile as
the system-wide default.
"""
from changedetectionio.model import browser_profile as bp
from changedetectionio.strtobool import strtobool
store_profiles = self.__data['settings']['application'].setdefault('browser_profiles', {})
service_workers = os.getenv('PLAYWRIGHT_SERVICE_WORKERS', 'allow')
extra_delay = int(os.getenv('WEBDRIVER_DELAY_BEFORE_CONTENT_READY', 0))
configured_profile = None
playwright_url = os.getenv('PLAYWRIGHT_DRIVER_URL')
if playwright_url:
playwright_url = playwright_url.strip('"')
builtin = bp.BUILTIN_PUPPETEER if strtobool(os.getenv('FAST_PUPPETEER_CHROME_FETCHER', 'False')) else bp.BUILTIN_PLAYWRIGHT
profile = bp.BrowserProfile(
name=builtin.name,
fetch_backend=builtin.fetch_backend,
browser_connection_url=playwright_url,
service_workers=service_workers,
extra_delay=extra_delay,
is_builtin=True,
)
logger.debug(f"Configuring browser profile '{profile.get_machine_name()}' from env")
store_profiles[profile.get_machine_name()] = profile.model_dump()
configured_profile = profile
webdriver_url = os.getenv('WEBDRIVER_URL')
if webdriver_url:
profile = bp.BrowserProfile(
name=bp.BUILTIN_SELENIUM.name,
fetch_backend=bp.BUILTIN_SELENIUM.fetch_backend,
browser_connection_url=webdriver_url.strip('"'),
extra_delay=extra_delay,
is_builtin=True,
)
logger.debug(f"Configuring browser profile '{profile.get_machine_name()}' from env")
store_profiles[profile.get_machine_name()] = profile.model_dump()
if not configured_profile:
configured_profile = profile
if configured_profile:
logger.debug(f"Setting system default browser profile to '{configured_profile.get_machine_name()}'")
self.__data['settings']['application']['browser_profile'] = configured_profile.get_machine_name()
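
As a reading aid for the method shown above, here is a minimal standalone sketch of the precedence it applies when choosing the system default. It assumes a plain mapping in place of `os.environ` and uses simple string labels (`'puppeteer'`, `'playwright'`, `'selenium'`) in place of the real machine names, which this diff does not show. `choose_default_profile` and `_truthy` are hypothetical names, not part of the project.

```python
from typing import Mapping, Optional


def _truthy(value: str) -> bool:
    # Hypothetical stand-in for changedetectionio.strtobool.strtobool.
    return value.strip().lower() in ("1", "true", "yes", "y", "on", "t")


def choose_default_profile(env: Mapping[str, str]) -> Optional[str]:
    """Return a label for the profile that would become the system default."""
    if env.get("PLAYWRIGHT_DRIVER_URL"):
        # A Playwright URL wins; the puppeteer flag only selects which
        # built-in profile is attached to that URL.
        if _truthy(env.get("FAST_PUPPETEER_CHROME_FETCHER", "False")):
            return "puppeteer"
        return "playwright"
    if env.get("WEBDRIVER_URL"):
        # Selenium becomes the default only when no Playwright URL is set.
        return "selenium"
    # No env configuration: the stored default is left untouched.
    return None
```

With both `PLAYWRIGHT_DRIVER_URL` and `WEBDRIVER_URL` set, the sketch returns the Playwright label, mirroring the `if not configured_profile` guard in the method.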
# ============================================================================
# FileSavingDataStore Abstract Method Implementations
# ============================================================================
@@ -422,14 +365,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# Is saved as {uuid}/tag.json
settings_copy['application']['tags'] = {}
# Serialize BrowserProfile Pydantic instances to plain dicts for JSON storage
raw_profiles = settings_copy['application'].get('browser_profiles', {})
from changedetectionio.model.browser_profile import BrowserProfile
settings_copy['application']['browser_profiles'] = {
k: v.model_dump() if isinstance(v, BrowserProfile) else v
for k, v in raw_profiles.items()
}
return {
'note': 'Settings file - watches are in {uuid}/watch.json, tags are in {uuid}/tag.json',
'app_guid': self.__data.get('app_guid'),
@@ -486,7 +421,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
return Tag.model(
datastore_path=self.datastore_path,
__datastore=self,
__datastore=self.__data,
default=entity_dict
)
@@ -832,7 +767,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# If the processor also has its own Watch implementation
watch_class = get_custom_watch_obj_for_processor(apply_extras.get('processor'))
new_watch = watch_class(datastore_path=self.datastore_path, __datastore=self, url=url)
new_watch = watch_class(datastore_path=self.datastore_path, __datastore=self.__data, url=url)
new_uuid = new_watch.get('uuid')
@@ -917,16 +852,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
return proxy_list if len(proxy_list) else None
def get_proxy_url_for_watch(self, uuid, override_id=None):
"""
Returns the resolved proxy URL string for a watch, or None.
override_id forces a specific proxy (e.g. proxy checker bypass).
"""
proxy_id = override_id or self.get_preferred_proxy_for_watch(uuid)
if proxy_id:
return self.proxy_list.get(proxy_id, {}).get('url')
return None
def get_preferred_proxy_for_watch(self, uuid):
"""
Returns the preferred proxy by ID key
@@ -960,71 +885,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
return None
# ------------------------------------------------------------------
# BrowserProfile helpers
# ------------------------------------------------------------------
def get_browser_profile(self, machine_name: str):
"""Return a BrowserProfile by machine name, or None if not found.
Built-in profiles (direct_http_requests, browser_chromeplaywright) are
always available and checked first.
"""
from changedetectionio.model.browser_profile import get_profile
store_profiles = self.data['settings']['application'].get('browser_profiles', {})
return get_profile(machine_name, store_profiles)
def delete_browser_profile(self, machine_name: str):
"""Delete a user-defined BrowserProfile by machine name.
Rules enforced:
- Built-in profiles cannot be deleted.
- The profile cannot be the current system default
(settings.application.browser_profile); caller must change the
default first.
- Any watch or tag that referenced this profile is reset to None
(falls back through the chain on next fetch).
Returns the number of watches/tags that were reset.
"""
from changedetectionio.model.browser_profile import RESERVED_MACHINE_NAMES
if machine_name in RESERVED_MACHINE_NAMES:
raise ValueError(f"Built-in profile '{machine_name}' cannot be deleted")
system_default = self.data['settings']['application'].get('browser_profile')
if system_default == machine_name:
raise ValueError(
f"Profile '{machine_name}' is the system default. "
f"Change the system default before deleting it."
)
store_profiles = self.data['settings']['application'].get('browser_profiles', {})
if machine_name not in store_profiles:
return 0
del store_profiles[machine_name]
reset_count = 0
# Reset watches that reference this profile
for uuid, watch in self.data['watching'].items():
if watch.get('browser_profile') == machine_name:
watch['browser_profile'] = None
watch.commit()
reset_count += 1
# Reset tags that reference this profile
for tag_uuid, tag in self.data['settings']['application'].get('tags', {}).items():
if tag.get('browser_profile') == machine_name:
tag['browser_profile'] = None
tag.commit()
reset_count += 1
self._save_settings()
logger.info(f"Deleted BrowserProfile '{machine_name}', reset {reset_count} watches/tags")
return reset_count
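
The deletion rules in the docstring above can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the project's implementation: `delete_profile` and its parameters are hypothetical, watches and tags are modelled as ordinary dicts, and persistence (`commit()`, `_save_settings()`) is left out.

```python
from typing import Dict, Iterable, List, Optional


def delete_profile(
    name: str,
    profiles: Dict[str, dict],
    reserved: Iterable[str],
    system_default: Optional[str],
    holders: List[dict],
) -> int:
    """Delete `name` from `profiles` and return how many holders were reset.

    `holders` stands in for the watches and tags that may reference a profile.
    """
    if name in set(reserved):
        raise ValueError(f"Built-in profile '{name}' cannot be deleted")
    if system_default == name:
        raise ValueError(f"Profile '{name}' is the system default")
    if name not in profiles:
        return 0  # nothing to delete, nothing to reset

    del profiles[name]

    reset_count = 0
    for holder in holders:
        if holder.get("browser_profile") == name:
            # Fall back through the resolution chain on the next fetch.
            holder["browser_profile"] = None
            reset_count += 1
    return reset_count
```

Raising before any mutation keeps the store unchanged when a rule is violated, which matches the order of checks in the method above.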
@property
def has_extra_headers_file(self):
filepath = os.path.join(self.datastore_path, 'headers.txt')
@@ -1102,7 +962,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
from ..model import Tag
new_tag = Tag.model(
datastore_path=self.datastore_path,
__datastore=self,
__datastore=self.__data,
default={
'title': title.strip(),
'date_created': int(time.time())
-139
@@ -15,7 +15,6 @@ import tarfile
import time
from loguru import logger
from copy import deepcopy
from typing import Optional
# Try to import orjson for faster JSON serialization
@@ -731,144 +730,6 @@ class DatastoreUpdatesMixin:
# (left this out by accident in previous update, added tags={} in the changedetection.json save_to_disk)
self._save_settings()
def update_31(self):
"""
Migrate legacy ``fetch_backend`` strings to the new ``browser_profile``
machine-name system.
What this migration does
------------------------
1. ``settings.requests.extra_browsers`` entries are converted into
``BrowserProfile`` objects and stored in
``settings.application.browser_profiles`` keyed by machine name.
2. ``settings.application.fetch_backend`` (the system-wide default) is
translated to a machine name and written to
``settings.application.browser_profile``.
3. Every watch that has an explicit ``fetch_backend`` (not ``'system'``)
gets a corresponding ``browser_profile`` machine name set, then
``fetch_backend`` is reset to ``'system'``.
4. The same translation is applied to tags with ``overrides_watch=True``
that carry an explicit ``fetch_backend``.
Legacy mapping
~~~~~~~~~~~~~~
* ``'html_requests'`` → built-in ``'direct_http_requests'``
* ``'html_webdriver'`` → built-in ``'browser_chromeplaywright'``
* ``'extra_browser_<name>'`` → machine name of the migrated custom profile
* ``'system'`` / missing → ``None`` (continue to use chain resolution)
Safe to re-run: skips watches / tags that already have ``browser_profile``
set, and skips extra_browser entries that have already been migrated.
"""
from ..model.browser_profile import (
BrowserProfile,
BUILTIN_REQUESTS,
BUILTIN_BROWSER,
)
app_settings = self.data['settings']['application']
# ------------------------------------------------------------------
# 1. Migrate extra_browsers → browser_profiles
# ------------------------------------------------------------------
extra_browsers = self.data['settings']['requests'].get('extra_browsers', [])
browser_profiles: dict = app_settings.setdefault('browser_profiles', {})
extra_browser_name_to_machine: dict[str, str] = {}
for entry in extra_browsers:
browser_name = entry.get('browser_name', '').strip()
connection_url = entry.get('browser_connection_url', '').strip()
if not browser_name:
continue
profile = BrowserProfile(
name=browser_name,
fetch_backend='playwright_cdp',
browser_connection_url=connection_url or None,
)
machine_name = profile.get_machine_name()
if machine_name not in browser_profiles:
browser_profiles[machine_name] = profile.model_dump()
logger.info(f"update_31: migrated extra_browser '{browser_name}' → profile '{machine_name}'")
extra_browser_name_to_machine[browser_name] = machine_name
# ------------------------------------------------------------------
# Helper: translate a fetch_backend string to a machine name
# ------------------------------------------------------------------
builtin_requests_name = BUILTIN_REQUESTS.get_machine_name()
builtin_browser_name = BUILTIN_BROWSER.get_machine_name()
def _to_machine_name(fetch_backend: str) -> Optional[str]:
if not fetch_backend or fetch_backend in ('system', 'default', ''):
return None
if fetch_backend.startswith('extra_browser_'):
key = fetch_backend[len('extra_browser_'):]
return extra_browser_name_to_machine.get(key)
# Strip legacy html_ prefix then query the fetcher registry
from changedetectionio import content_fetchers as cf
clean = fetch_backend[5:] if fetch_backend.startswith('html_') else fetch_backend
fetcher_cls = cf.get_fetcher(clean)
if fetcher_cls is None:
logger.warning(f"update_31: unknown fetch_backend value {fetch_backend!r}, skipping")
return None
if fetcher_cls.supports_screenshots:
return builtin_browser_name
return builtin_requests_name
# ------------------------------------------------------------------
# 2. Migrate system-wide default
# ------------------------------------------------------------------
system_fetch_backend = app_settings.get('fetch_backend', 'requests')
if not app_settings.get('browser_profile'):
machine = _to_machine_name(system_fetch_backend)
app_settings['browser_profile'] = machine
logger.info(
f"update_31: system fetch_backend '{system_fetch_backend}' → browser_profile '{machine}'"
)
# ------------------------------------------------------------------
# 3. Migrate watches
# ------------------------------------------------------------------
for uuid, watch in self.data['watching'].items():
if watch.get('browser_profile'):
continue # already migrated
fetch_backend = watch.get('fetch_backend', 'system')
machine = _to_machine_name(fetch_backend)
watch['browser_profile'] = machine
watch['fetch_backend'] = 'system' # clear legacy value
watch.commit()
if machine:
logger.info(
f"update_31: watch {uuid} fetch_backend '{fetch_backend}' → browser_profile '{machine}'"
)
# ------------------------------------------------------------------
# 4. Migrate tags
# ------------------------------------------------------------------
for tag_uuid, tag in app_settings.get('tags', {}).items():
if tag.get('browser_profile'):
continue # already migrated
fetch_backend = tag.get('fetch_backend', 'system')
machine = _to_machine_name(fetch_backend)
if machine:
tag['browser_profile'] = machine
tag['fetch_backend'] = 'system'
tag.commit()
logger.info(
f"update_31: tag {tag_uuid} fetch_backend '{fetch_backend}' → browser_profile '{machine}'"
)
self._save_settings()
logger.success("update_31: fetch_backend → browser_profile migration complete")
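
The legacy mapping described in the `update_31` docstring can be sketched as a small pure function. This is a simplified illustration, assuming only the two legacy values named in the docstring; the method above instead queries the fetcher registry and checks `supports_screenshots`. `legacy_backend_to_profile` and `extra_browser_map` are hypothetical names; the two built-in machine names are taken from the docstring.

```python
from typing import Dict, Optional

# Built-in machine names as listed in the update_31 docstring.
BUILTIN_REQUESTS_NAME = "direct_http_requests"
BUILTIN_BROWSER_NAME = "browser_chromeplaywright"

EXTRA_PREFIX = "extra_browser_"


def legacy_backend_to_profile(
    fetch_backend: Optional[str],
    extra_browser_map: Dict[str, str],
) -> Optional[str]:
    """Translate a legacy fetch_backend string to a profile machine name."""
    if not fetch_backend or fetch_backend in ("system", "default"):
        return None  # keep using chain resolution
    if fetch_backend.startswith(EXTRA_PREFIX):
        # Look up the custom profile created from the extra_browsers entry.
        return extra_browser_map.get(fetch_backend[len(EXTRA_PREFIX):])
    if fetch_backend == "html_requests":
        return BUILTIN_REQUESTS_NAME
    if fetch_backend == "html_webdriver":
        return BUILTIN_BROWSER_NAME
    return None  # unknown value: skip rather than guess
```

Returning `None` for unknown values mirrors the method's behaviour of logging a warning and leaving the entry on chain resolution.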
def update_30(self):
"""Migrate restock_settings out of watch.json into restock_diff.json processor config file.
@@ -1,208 +0,0 @@
{#
Notification Profile Selector widget.
Usage:
{% from '_notification_profiles_selector.html' import render_notification_profile_selector %}
{{ render_notification_profile_selector(
own_profiles=watch.get('notification_profiles', []),
inherited_profiles=inherited_notification_profiles,
all_profile_data=settings_application.get('notification_profile_data', {}),
registry=registry
) }}
own_profiles — list of UUIDs directly linked to this watch/group
inherited_profiles — list of (uuid, origin_label) tuples from parent groups/system
all_profile_data — dict of uuid→profile from settings.application.notification_profile_data
registry — notification_profiles.registry instance
#}
{% macro render_notification_profile_selector(own_profiles, inherited_profiles, all_profile_data, registry) %}
<div class="notification-profile-selector" id="notification-profile-selector">
{# Hidden inputs — one per selected UUID, submitted with the form #}
<div id="np-hidden-inputs">
{% for uid in own_profiles %}
<input type="hidden" name="notification_profiles" value="{{ uid }}">
{% endfor %}
</div>
<div class="np-chips" id="np-chips">
{# Own profiles — solid chips, removable #}
{% for uid in own_profiles %}
{% set profile = all_profile_data.get(uid) %}
{% if profile %}
{% set handler = registry.get(profile.get('type', 'apprise')) %}
<span class="np-chip np-chip-own" data-uuid="{{ uid }}"
title="{{ handler.get_url_hint(profile.get('config', {})) }}">
<i data-feather="{{ handler.icon }}" class="np-chip-icon"></i>
<span class="np-chip-name">{{ profile.get('name', uid) }}</span>
<span class="np-chip-remove" data-uuid="{{ uid }}" title="{{ _('Remove') }}">×</span>
</span>
{% endif %}
{% endfor %}
{# Inherited profiles — dimmed, read-only, show origin #}
{% for uid, origin_label in (inherited_profiles or []) %}
{% if uid not in own_profiles %}
{% set profile = all_profile_data.get(uid) %}
{% if profile %}
{% set handler = registry.get(profile.get('type', 'apprise')) %}
<span class="np-chip np-chip-inherited"
title="{{ _('Inherited from') }}: {{ origin_label }} — {{ handler.get_url_hint(profile.get('config', {})) }}">
<i data-feather="{{ handler.icon }}" class="np-chip-icon"></i>
<span class="np-chip-name">{{ profile.get('name', uid) }}</span>
<i data-feather="lock" class="np-chip-lock"></i>
</span>
{% endif %}
{% endif %}
{% endfor %}
{# Add button + dropdown #}
<div class="np-add-wrapper" id="np-add-wrapper">
<button type="button" class="np-add-btn pure-button button-xsmall" id="np-add-btn">
<i data-feather="plus"></i> {{ _('Add profile') }}
</button>
<div class="np-dropdown" id="np-dropdown" style="display:none;">
<input type="text" class="np-search" id="np-search" placeholder="{{ _('Search profiles…') }}" autocomplete="off">
<div class="np-options" id="np-options">
{% set has_options = [] %}
{% for uid, profile in all_profile_data.items() %}
{% if uid not in own_profiles %}
{% set handler = registry.get(profile.get('type', 'apprise')) %}
{% set hint = handler.get_url_hint(profile.get('config', {})) %}
<div class="np-option" data-uuid="{{ uid }}"
data-name="{{ profile.get('name', '') }}"
data-icon="{{ handler.icon }}"
data-hint="{{ hint }}">
<i data-feather="{{ handler.icon }}" class="np-option-icon"></i>
<span class="np-option-text">
<strong class="np-option-name">{{ profile.get('name', uid) }}</strong>
{% if hint %}<small class="np-option-hint">{{ hint }}</small>{% endif %}
</span>
</div>
{% if has_options.append(1) %}{% endif %}
{% endif %}
{% endfor %}
{% if not has_options %}
<div class="np-option np-no-results" style="pointer-events:none; color: var(--color-grey-600);">
{{ _('No other profiles available') }}
</div>
{% endif %}
<div class="np-no-match" style="display:none; padding: 8px 12px; color: var(--color-grey-600); font-size: 0.85em;">
{{ _('No profiles match') }}
</div>
</div>
<a href="{{ url_for('notification_profiles.edit') }}" class="np-create-new">
<i data-feather="plus-circle"></i> {{ _('Create new profile') }}
</a>
</div>
</div>
</div>{# .np-chips #}
{% if not own_profiles and not inherited_profiles %}
<p class="pure-form-message-inline" style="margin: 4px 0 0 0; color: var(--color-grey-600);">
{{ _('No notification profiles linked. Notifications will not be sent for this watch.') }}
</p>
{% endif %}
</div>{# .notification-profile-selector #}
<script>
(function() {
var selector = document.getElementById('notification-profile-selector');
if (!selector) return;
var addBtn = selector.querySelector('#np-add-btn');
var dropdown = selector.querySelector('#np-dropdown');
var search = selector.querySelector('#np-search');
var chips = selector.querySelector('#np-chips');
var hiddenWrap = selector.querySelector('#np-hidden-inputs');
var noMatch = selector.querySelector('.np-no-match');
// Toggle dropdown
addBtn.addEventListener('click', function(e) {
e.stopPropagation();
var open = dropdown.style.display !== 'none';
dropdown.style.display = open ? 'none' : 'block';
if (!open) { search.value = ''; filterOptions(''); search.focus(); }
});
// Close on outside click
document.addEventListener('click', function(e) {
if (!selector.contains(e.target)) dropdown.style.display = 'none';
});
// Search filter
search.addEventListener('input', function() { filterOptions(this.value.toLowerCase()); });
function filterOptions(q) {
var opts = selector.querySelectorAll('.np-option:not(.np-no-results)');
var visible = 0;
opts.forEach(function(opt) {
var match = !q || opt.dataset.name.toLowerCase().indexOf(q) !== -1
|| (opt.dataset.hint || '').toLowerCase().indexOf(q) !== -1;
opt.style.display = match ? '' : 'none';
if (match) visible++;
});
noMatch.style.display = (visible === 0 && q) ? 'block' : 'none';
}
// Add profile
selector.querySelectorAll('.np-option:not(.np-no-results)').forEach(function(opt) {
opt.addEventListener('click', function() {
var uuid = this.dataset.uuid;
var name = this.dataset.name;
var icon = this.dataset.icon;
var hint = this.dataset.hint;
// Add hidden input
var inp = document.createElement('input');
inp.type = 'hidden'; inp.name = 'notification_profiles'; inp.value = uuid;
hiddenWrap.appendChild(inp);
// Add chip (before the add-wrapper)
var chip = document.createElement('span');
chip.className = 'np-chip np-chip-own';
chip.dataset.uuid = uuid;
chip.title = hint || '';
chip.innerHTML = '<i data-feather="' + icon + '" class="np-chip-icon"></i>'
+ '<span class="np-chip-name">' + escHtml(name) + '</span>'
+ '<span class="np-chip-remove" data-uuid="' + uuid + '" title="{{ _("Remove") }}">×</span>';
chips.insertBefore(chip, selector.querySelector('#np-add-wrapper'));
chip.querySelector('.np-chip-remove').addEventListener('click', removeChip);
// Hide this option in dropdown
this.style.display = 'none';
dropdown.style.display = 'none';
if (window.feather) feather.replace();
});
});
// Remove chip
selector.querySelectorAll('.np-chip-remove').forEach(function(btn) {
btn.addEventListener('click', removeChip);
});
function removeChip() {
var uuid = this.dataset.uuid;
var chip = selector.querySelector('.np-chip-own[data-uuid="' + uuid + '"]');
if (chip) chip.remove();
var inp = hiddenWrap.querySelector('input[value="' + uuid + '"]');
if (inp) inp.remove();
// Re-show in dropdown
var opt = selector.querySelector('.np-option[data-uuid="' + uuid + '"]');
if (opt) opt.style.display = '';
}
function escHtml(s) {
return s.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
}
if (window.feather) feather.replace();
})();
</script>
{% endmacro %}
@@ -4,53 +4,29 @@ import os
from flask import url_for
from ..util import live_server_setup, wait_for_all_checks
CUSTOM_PROFILE_NAME = 'Custom Browser URL'
CUSTOM_PROFILE_MACHINE_NAME = 'custom_browser_url'
CUSTOM_BROWSER_WS = 'ws://sockpuppetbrowser-custom-url:3000'
def create_custom_browser_profile(client):
"""Create a browser profile that uses the custom sockpuppet container."""
res = client.post(
url_for("settings.settings_browsers.save"),
data={
"name": CUSTOM_PROFILE_NAME,
"fetch_backend": "playwright_cdp",
"browser_connection_url": CUSTOM_BROWSER_WS,
"viewport_width": 1280,
"viewport_height": 1000,
"block_images": "",
"block_fonts": "",
"ignore_https_errors": "",
"user_agent": "",
"locale": "",
"original_machine_name": "",
},
follow_redirects=True
)
assert b"saved." in res.data, f"Expected profile save confirmation, got: {res.data[:500]}"
def do_test(client, live_server, make_test_use_extra_browser=False):
# Grep for this string in the logs?
test_url = "https://changedetection.io/ci-test.html?non-custom-default=true"
# "non-custom-default" should not appear in the custom browser connection
custom_browser_name = 'custom browser URL'
# needs to be set and something like 'ws://127.0.0.1:3000'
assert os.getenv('PLAYWRIGHT_DRIVER_URL'), "Needs PLAYWRIGHT_DRIVER_URL set for this test"
test_url = "https://changedetection.io/ci-test.html?non-custom-default=true"
# preconfigure_browser_profiles_based_on_env() already set the correct system default
#####################
res = client.post(
url_for("settings.settings_page"),
data={
"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
},
data={"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_webdriver",
'requests-extra_browsers-0-browser_connection_url': 'ws://sockpuppetbrowser-custom-url:3000',
'requests-extra_browsers-0-browser_name': custom_browser_name
},
follow_redirects=True
)
assert b"Settings updated." in res.data
# Create the custom browser profile
create_custom_browser_profile(client)
assert b"Settings updated." in res.data
# Add our URL to the import page
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
@@ -59,24 +35,23 @@ def do_test(client, live_server, make_test_use_extra_browser=False):
if make_test_use_extra_browser:
# The custom profile name should appear in the edit page under "Request" tab
# So the name should appear in the edit page under "Request" > "Fetch Method"
res = client.get(
url_for("ui.ui_edit.edit_page", uuid="first"),
follow_redirects=True
)
assert CUSTOM_PROFILE_NAME.encode() in res.data, \
f"Expected '{CUSTOM_PROFILE_NAME}' in edit page fetch method choices"
assert b'custom browser URL' in res.data
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={
# 'run_custom_browser_url_tests.sh' will grep for this string in the custom container logs
"url": "https://changedetection.io/ci-test.html?custom-browser-search-string=1",
"tags": "",
"headers": "",
"browser_profile": CUSTOM_PROFILE_MACHINE_NAME,
"webdriver_js_execute_code": "",
"time_between_check_use_default": "y"
# 'run_customer_browser_url_tests.sh' will search for this string to know if we hit the right browser container or not
"url": "https://changedetection.io/ci-test.html?custom-browser-search-string=1",
"tags": "",
"headers": "",
'fetch_backend': f"extra_browser_{custom_browser_name}",
'webdriver_js_execute_code': '',
"time_between_check_use_default": "y"
},
follow_redirects=True
)
@@ -99,10 +74,12 @@ def do_test(client, live_server, make_test_use_extra_browser=False):
# Requires playwright to be installed
def test_request_via_custom_browser_url(client, live_server, measure_memory_usage, datastore_path):
# live_server_setup(live_server) # Setup on conftest per function
# We do this so we can grep the logs of the custom container and see if the request actually went through that container
do_test(client, live_server, make_test_use_extra_browser=True)
def test_request_not_via_custom_browser_url(client, live_server, measure_memory_usage, datastore_path):
# live_server_setup(live_server) # Setup on conftest per function
# We do this so we can grep the logs of the custom container and see if the request actually went through that container
do_test(client, live_server, make_test_use_extra_browser=False)
@@ -12,13 +12,12 @@ def test_fetch_webdriver_content(client, live_server, measure_memory_usage, data
# live_server_setup(live_server) # Setup on conftest per function
#####################
# preconfigure_browser_profiles_based_on_env() already set the correct system default
# (playwright or puppeteer depending on FAST_PUPPETEER_CHROME_FETCHER) — no need to override it.
res = client.post(
url_for("settings.settings_page"),
data={
"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_webdriver",
'application-ui-favicons_enabled': "y",
},
follow_redirects=True
@@ -25,6 +25,7 @@ def test_execute_custom_js(client, live_server, measure_memory_usage, datastore_
data={
"url": test_url,
"tags": "",
'fetch_backend': "html_webdriver",
'webdriver_js_execute_code': 'document.querySelector("button[name=test-button]").click();',
'headers': "testheader: yes\buser-agent: MyCustomAgent",
"time_between_check_use_default": "y",
@@ -22,7 +22,7 @@ def test_preferred_proxy(client, live_server, measure_memory_usage, datastore_pa
url_for("ui.ui_edit.edit_page", uuid="first", unpause_on_save=1),
data={
"include_filters": "",
"browser_profile": "system",
"fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"headers": "",
"proxy": "proxy-two",
"tags": "",
@@ -22,6 +22,7 @@ def test_noproxy_option(client, live_server, measure_memory_usage, datastore_pat
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": "html_requests",
"requests-extra_proxies-0-proxy_name": "custom-one-proxy",
"requests-extra_proxies-0-proxy_url": "http://test:awesome@squid-one:3128",
"requests-extra_proxies-1-proxy_name": "custom-two-proxy",
@@ -56,6 +57,7 @@ def test_noproxy_option(client, live_server, measure_memory_usage, datastore_pat
url_for("ui.ui_edit.edit_page", uuid=uuid, unpause_on_save=1),
data={
"include_filters": "",
"fetch_backend": "html_requests",
"headers": "",
"proxy": "no-proxy",
"tags": "",
@@ -21,6 +21,7 @@ def test_proxy_noconnect_custom(client, live_server, measure_memory_usage, datas
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') or os.getenv("WEBDRIVER_URL") else 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
# test:awesome is set in tests/proxy_list/squid-passwords.txt
"requests-extra_proxies-0-proxy_url": "http://127.0.0.1:3128",
@@ -41,7 +42,7 @@ def test_proxy_noconnect_custom(client, live_server, measure_memory_usage, datas
options = {
"url": test_url,
"browser_profile": "system",
"fetch_backend": "html_webdriver" if os.getenv('PLAYWRIGHT_DRIVER_URL') or os.getenv("WEBDRIVER_URL") else "html_requests",
"proxy": "ui-0custom-test-proxy",
"time_between_check_use_default": "y",
}
@@ -15,6 +15,7 @@ def test_select_custom(client, live_server, measure_memory_usage, datastore_path
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
# test:awesome is set in tests/proxy_list/squid-passwords.txt
"requests-extra_proxies-0-proxy_url": "http://test:awesome@squid-custom:3128",
@@ -58,6 +59,7 @@ def test_custom_proxy_validation(client, live_server, measure_memory_usage, data
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
"requests-extra_proxies-0-proxy_url": "xxxxhtt/333??p://test:awesome@squid-custom:3128",
},
@@ -73,6 +75,7 @@ def test_custom_proxy_validation(client, live_server, measure_memory_usage, data
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
"requests-extra_proxies-0-proxy_url": "https://",
},
@@ -29,6 +29,7 @@ def test_socks5(client, live_server, measure_memory_usage, datastore_path):
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": "html_requests",
# set in .github/workflows/test-only.yml
"requests-extra_proxies-0-proxy_url": "socks5://proxy_user123:proxy_pass123@socks5proxy:1080",
"requests-extra_proxies-0-proxy_name": "socks5proxy",
@@ -60,7 +61,7 @@ def test_socks5(client, live_server, measure_memory_usage, datastore_path):
url_for("ui.ui_edit.edit_page", uuid="first", unpause_on_save=1),
data={
"include_filters": "",
"browser_profile": "system",
"fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"headers": "",
"proxy": "ui-0socks5proxy",
"tags": "",
@@ -48,7 +48,7 @@ def test_socks5_from_proxiesjson_file(client, live_server, measure_memory_usage,
url_for("ui.ui_edit.edit_page", uuid="first", unpause_on_save=1),
data={
"include_filters": "",
"browser_profile": "system",
"fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"headers": "",
"proxy": "socks5proxy",
"tags": "",
@@ -60,14 +60,15 @@ def test_restock_detection(client, live_server, measure_memory_usage, datastore_
#####################
# preconfigure_browser_profiles_based_on_env() already set the correct system default
# Set this up for when we remove the notification from the watch, it should fallback with these details
res = client.post(
url_for("settings.settings_page"),
data={"application-notification_urls": notification_url,
"application-notification_title": "fallback-title "+default_notification_title,
"application-notification_body": "fallback-body "+default_notification_body,
"application-notification_format": default_notification_format,
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_webdriver"},
follow_redirects=True
)
# Add our URL to the import page, because the docker container (playwright/selenium) won't be able to connect to our usual test url
@@ -56,7 +56,8 @@ def test_check_notification_email_formats_default_HTML(client, live_server, meas
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": "some text\nfallback-body<br> " + default_notification_body,
"application-notification_format": 'html',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -125,7 +126,8 @@ def test_check_notification_plaintext_format(client, live_server, measure_memory
"application-notification_title": "fallback-title {{watch_title}} {{ diff_added.splitlines()[0] if diff_added else 'diff added didnt split' }} " + default_notification_title,
"application-notification_body": f"some text\n" + default_notification_body + f"\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -186,7 +188,8 @@ def test_check_notification_html_color_format(client, live_server, measure_memor
"application-notification_title": "fallback-title {{watch_title}} - diff_added_lines_test : '{{ diff_added.splitlines()[0] if diff_added else 'diff added didnt split' }}' " + default_notification_title,
"application-notification_body": f"some text\n{default_notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'htmlcolor',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -270,7 +273,8 @@ def test_check_notification_markdown_format(client, live_server, measure_memory_
"application-notification_title": "fallback-title diff_added_lines_test : '{{ diff_added.splitlines()[0] if diff_added else 'diff added didnt split' }}' " + default_notification_title,
"application-notification_body": "*header*\n\nsome text\n" + default_notification_body,
"application-notification_format": 'markdown',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -365,7 +369,8 @@ def test_check_notification_email_formats_default_Text_override_HTML(client, liv
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": notification_body,
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -415,7 +420,7 @@ def test_check_notification_email_formats_default_Text_override_HTML(client, liv
data={
"url": test_url,
"notification_format": 'html',
'browser_profile': "direct_http_requests",
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -475,7 +480,8 @@ def test_check_plaintext_document_plaintext_notification_smtp(client, live_serve
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -527,7 +533,8 @@ def test_check_plaintext_document_html_notifications(client, live_server, measur
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'html',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -606,7 +613,8 @@ def test_check_plaintext_document_html_color_notifications(client, live_server,
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'htmlcolor',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -678,7 +686,8 @@ def test_check_html_document_plaintext_notification(client, live_server, measure
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -731,7 +740,8 @@ def test_check_html_notification_with_apprise_format_is_html(client, live_server
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": "some text\nfallback-body<br> " + default_notification_body,
"application-notification_format": 'html',
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Settings updated." in res.data
+10 -6
@@ -32,7 +32,8 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
url_for("settings.settings_page"),
data={"application-password": "foobar",
"application-shared_diff_access": "True",
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -90,7 +91,8 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
res = c.post(
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -125,16 +127,16 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
assert b"IMPORT" in res.data
assert b"LOG OUT" in res.data
assert b"time_between_check-minutes" in res.data
assert b"fetch_backend" in res.data
##################################################
# Remove password button, and check that it worked
##################################################
# preconfigure_browser_profiles_based_on_env() already set the correct system default
res = c.post(
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_webdriver",
"application-removepassword_button": "Remove password"
},
follow_redirects=True,
@@ -148,7 +150,8 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
res = c.post(
url_for("settings.settings_page"),
data={"application-password": "",
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -161,7 +164,8 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
data={"application-password": "foobar",
# Should be disabled
"application-shared_diff_access": "",
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -60,6 +60,7 @@ def test_check_removed_line_contains_trigger(client, live_server, measure_memory
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"trigger_text": 'The golden line',
"url": test_url,
'fetch_backend': "html_requests",
'filter_text_removed': 'y',
"time_between_check_use_default": "y"},
follow_redirects=True
@@ -126,7 +127,8 @@ def test_check_add_line_contains_trigger(client, live_server, measure_memory_usa
# https://github.com/caronc/apprise/wiki/Notify_Custom_JSON#get-parameter-manipulation
"application-notification_urls": test_notification_url,
"application-notification_format": 'text',
"application-minutes_between_check": 180
"application-minutes_between_check": 180,
"application-fetch_backend": "html_requests"
},
follow_redirects=True
)
@@ -147,6 +149,7 @@ def test_check_add_line_contains_trigger(client, live_server, measure_memory_usa
data={"trigger_text": 'Oh yes please',
"url": test_url,
'processor': 'text_json_diff',
'fetch_backend': "html_requests",
'filter_text_removed': '',
'filter_text_added': 'y',
"time_between_check_use_default": "y"},
+4 -9
@@ -170,14 +170,6 @@ def test_api_simple(client, live_server, measure_memory_usage, datastore_path):
headers={'x-api-key': api_key},
)
assert b'(changed) Which is across' in res.data
assert b'Some text thats the same' in res.data
# Fetch the difference between two versions (default text format)
res = client.get(
url_for("watchhistorydiff", uuid=watch_uuid, from_timestamp='previous', to_timestamp='latest')+"?changesOnly=true",
headers={'x-api-key': api_key},
)
assert b'Some text thats the same' not in res.data
# Test htmlcolor format
res = client.get(
@@ -416,6 +408,7 @@ def test_access_denied(client, live_server, measure_memory_usage, datastore_path
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_requests",
"application-api_access_token_enabled": ""
},
follow_redirects=True
@@ -435,6 +428,7 @@ def test_access_denied(client, live_server, measure_memory_usage, datastore_path
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_requests",
"application-api_access_token_enabled": "y"
},
follow_redirects=True
@@ -905,7 +899,8 @@ def test_api_conflict_UI_password(client, live_server, measure_memory_usage, dat
url_for("settings.settings_page"),
data={"application-password": "foobar", # password is now set! API should still work!
"application-api_access_token_enabled": "y",
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -177,6 +177,7 @@ def test_openapi_validation_get_requests_bypass_validation(client, live_server,
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_requests",
"application-api_access_token_enabled": ""
},
follow_redirects=True
+5 -26
@@ -178,44 +178,23 @@ def test_api_tags_listing(client, live_server, measure_memory_usage, datastore_p
def test_api_tag_restock_processor_config(client, live_server, measure_memory_usage, datastore_path):
"""
Test that a tag/group can be created and updated with processor_config_restock_diff via the API.
Test that a tag/group can be updated with processor_config_restock_diff via the API.
Since Tag extends WatchBase, processor config fields injected into WatchBase are also valid for tags.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
# Create a tag with processor_config_restock_diff in a single POST (issue #3966)
# Create a tag
res = client.post(
url_for("tag"),
data=json.dumps({
"title": "Restock Group",
"overrides_watch": True,
"processor_config_restock_diff": {
"in_stock_processing": "in_stock_only",
"follow_price_changes": True,
"price_change_min": 7777777
}
}),
data=json.dumps({"title": "Restock Group"}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 201, f"POST tag with restock config failed: {res.data}"
assert res.status_code == 201
tag_uuid = res.json.get('uuid')
# Verify processor config was saved during creation (the bug: these were discarded)
res = client.get(
url_for("tag", uuid=tag_uuid),
headers={'x-api-key': api_key}
)
assert res.status_code == 200
tag_data = res.json
assert tag_data.get('overrides_watch') == True, "overrides_watch should be saved on POST"
assert tag_data.get('processor_config_restock_diff', {}).get('in_stock_processing') == 'in_stock_only', \
"processor_config_restock_diff should be saved on POST"
assert tag_data.get('processor_config_restock_diff', {}).get('price_change_min') == 7777777, \
"price_change_min should be saved on POST"
# Update tag with valid processor_config_restock_diff via PUT
# Update tag with valid processor_config_restock_diff
res = client.put(
url_for("tag", uuid=tag_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
+1 -1
@@ -19,7 +19,7 @@ def test_basic_auth(client, live_server, measure_memory_usage, datastore_path):
# Check form validation
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": "", "url": test_url, "tags": "", "headers": "", "time_between_check_use_default": "y"},
data={"include_filters": "", "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests", "time_between_check_use_default": "y"},
follow_redirects=True
)
assert b"Updated watch." in res.data
+6 -37
@@ -48,15 +48,6 @@ def test_check_basic_change_detection_functionality(client, live_server, measure
# Check this class does not appear (that we didn't see the actual source)
assert b'foobar-detection' not in res.data
# Check POST preview
res = client.post(
url_for("ui.ui_preview.preview_page", uuid="first"),
follow_redirects=True
)
# Check this class does not appear (that we didn't see the actual source)
assert b'foobar-detection' not in res.data
# Make a change
set_modified_response(datastore_path=datastore_path)
@@ -172,7 +163,8 @@ def test_title_scraper(client, live_server, measure_memory_usage, datastore_path
res = client.post(
url_for("settings.settings_page"),
data={"application-ui-use_page_title_in_list": "",
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -214,7 +206,8 @@ def test_requests_timeout(client, live_server, measure_memory_usage, datastore_p
url_for("settings.settings_page"),
data={"application-ui-use_page_title_in_list": "",
"requests-time_between_check-minutes": 180,
"requests-timeout": delay - 1},
"requests-timeout": delay - 1,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -232,7 +225,8 @@ def test_requests_timeout(client, live_server, measure_memory_usage, datastore_p
url_for("settings.settings_page"),
data={"application-ui-use_page_title_in_list": "",
"requests-time_between_check-minutes": 180,
"requests-timeout": delay + 1}, # timeout should be a second more than the reply time
"requests-timeout": delay + 1, # timeout should be a second more than the reply time
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
@@ -419,28 +413,3 @@ def test_plaintext_even_if_xml_content_and_can_apply_filters(client, live_server
assert b'&lt;foobar' not in res.data
res = delete_all_watches(client)
def test_last_error_cleared_on_same_checksum(client, live_server, datastore_path):
"""last_error should be cleared even when content is unchanged (checksumFromPreviousCheckWasTheSame path)"""
set_original_response(datastore_path=datastore_path)
uuid = client.application.config.get('DATASTORE').add_watch(url=url_for('test_endpoint', _external=True))
# First check - establishes baseline checksum
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Inject a stale last_error directly (simulates a prior failed check)
datastore = client.application.config.get('DATASTORE')
datastore.update_watch(uuid=uuid, update_obj={'last_error': 'Some previous error'})
assert datastore.data['watching'][uuid].get('last_error') == 'Some previous error'
# Second check - same content, so checksumFromPreviousCheckWasTheSame will fire
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# last_error must be cleared even though no change was detected
assert datastore.data['watching'][uuid].get('last_error') == False
delete_all_watches(client)
+2 -64
@@ -3,7 +3,7 @@
from .util import set_original_response, live_server_setup, wait_for_all_checks
from flask import url_for
import io
from zipfile import ZipFile, ZIP_DEFLATED
from zipfile import ZipFile
import re
import time
from changedetectionio.model import Watch, Tag
@@ -68,9 +68,6 @@ def test_backup(client, live_server, measure_memory_usage, datastore_path):
# Check for changedetection.json (settings file)
assert 'changedetection.json' in l, "changedetection.json should be in backup"
# secret.txt must never be included — it contains the Flask session key
assert 'secret.txt' not in l, "secret.txt (Flask session key) must not be included in backup"
# Get the latest one
res = client.get(
url_for("backups.remove_backups"),
@@ -199,63 +196,4 @@ def test_backup_restore(client, live_server, measure_memory_usage, datastore_pat
assert restored_tag2 is not None, f"Tag {tag_uuid2} not found after restore"
assert restored_tag2['title'] == "Tasty backup tag number two", "Restored tag 2 title does not match"
assert isinstance(restored_tag2, Tag.model), \
f"Tag 2 not properly rehydrated, got {type(restored_tag2)}"
def test_backup_restore_zip_slip_rejected(client, live_server, measure_memory_usage, datastore_path):
"""Zip Slip path traversal entries in a restore zip must be rejected."""
import pytest
from changedetectionio.blueprint.backups.restore import import_from_zip
# Build a zip with a path traversal entry that would escape the extraction dir
malicious_zip = io.BytesIO()
with ZipFile(malicious_zip, 'w') as zf:
zf.writestr("../escaped.txt", "ATTACKER-CONTROLLED")
malicious_zip.seek(0)
datastore = live_server.app.config['DATASTORE']
with pytest.raises(ValueError, match="Zip Slip"):
import_from_zip(
zip_stream=malicious_zip,
datastore=datastore,
include_groups=True,
include_groups_replace=True,
include_watches=True,
include_watches_replace=True,
)
def test_backup_restore_zip_bomb_rejected(client, live_server, measure_memory_usage, datastore_path):
"""A zip whose total uncompressed size exceeds the limit must be rejected.
The guard reads file_size from the zip central-directory metadata; no
actual decompression happens, so this test is fast and uses minimal RAM.
100 KB of zeros compresses to ~100 bytes; monkeypatching the limit to
50 KB is enough to trigger the check without creating any large files.
"""
import pytest
import changedetectionio.blueprint.backups.restore as restore_mod
from changedetectionio.blueprint.backups.restore import import_from_zip
# ~100 KB of zeros → deflate compresses to ~100 bytes, but file_size metadata = 100 KB
bomb_zip = io.BytesIO()
with ZipFile(bomb_zip, 'w', compression=ZIP_DEFLATED) as zf:
zf.writestr("data.txt", b"\x00" * (100 * 1024))
bomb_zip.seek(0)
datastore = live_server.app.config['DATASTORE']
original_limit = restore_mod._MAX_DECOMPRESSED_BYTES
try:
restore_mod._MAX_DECOMPRESSED_BYTES = 50 * 1024 # 50 KB limit for this test
with pytest.raises(ValueError, match="decompressed size"):
import_from_zip(
zip_stream=bomb_zip,
datastore=datastore,
include_groups=True,
include_groups_replace=True,
include_watches=True,
include_watches_replace=True,
)
finally:
restore_mod._MAX_DECOMPRESSED_BYTES = original_limit
f"Tag 2 not properly rehydrated, got {type(restored_tag2)}"
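For orientation, the two rejection cases exercised by the removed restore tests above (a path traversal entry, and a declared uncompressed size over a limit) could be guarded roughly as follows. This is only a sketch under stated assumptions: `check_zip_is_safe`, `MAX_DECOMPRESSED_BYTES` and `build_zip` are hypothetical names for illustration and are not the project's `import_from_zip` implementation; only the standard-library `zipfile` calls are real.

```python
import io
import os
from zipfile import ZipFile, ZIP_DEFLATED

# Hypothetical limit for illustration; the real project constant may differ.
MAX_DECOMPRESSED_BYTES = 50 * 1024


def check_zip_is_safe(zip_stream, extract_dir, max_bytes=MAX_DECOMPRESSED_BYTES):
    """Raise ValueError for path traversal entries or an oversized archive.

    Only the central-directory metadata is read, nothing is decompressed.
    """
    base = os.path.realpath(extract_dir)
    total = 0
    with ZipFile(zip_stream) as zf:
        for info in zf.infolist():
            # Resolve where the entry would land and require it to stay under base
            target = os.path.realpath(os.path.join(base, info.filename))
            if os.path.commonpath([base, target]) != base:
                raise ValueError(f"Zip Slip: unsafe path {info.filename!r}")
            # file_size is the declared uncompressed size of the entry
            total += info.file_size
            if total > max_bytes:
                raise ValueError(f"decompressed size exceeds {max_bytes} bytes")


def build_zip(name, payload):
    """Hypothetical helper: build an in-memory zip holding a single entry."""
    buf = io.BytesIO()
    with ZipFile(buf, "w", compression=ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    buf.seek(0)
    return buf
```

A caller would run the check before extracting anything, for example `check_zip_is_safe(build_zip("ok.txt", b"hello"), "/tmp/restore")`.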
@@ -118,7 +118,8 @@ def test_everything(live_server, client, measure_memory_usage, datastore_path):
res = client.post(
url_for("settings.settings_page"),
data={"application-password": "foobar",
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
@@ -83,6 +83,7 @@ def test_check_block_changedetection_text_NOT_present(client, live_server, measu
url_for("ui.ui_edit.edit_page", uuid=uuid),
data={"text_should_not_be_present": ignore_text,
"url": test_url,
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"
},
follow_redirects=True
@@ -1,281 +0,0 @@
#!/usr/bin/env python3
"""
Tests that the watchlist shows/hides the browser status icon based on the
effective browser profile, covering the full inheritance chain:
watch browser_profile -> system default browser_profile -> direct_http_requests
"""
import pytest
from flask import url_for
def set_system_default_profile(client, profile_machine_name):
res = client.post(
url_for('settings.settings_browsers.set_default'),
data={'machine_name': profile_machine_name},
follow_redirects=True,
)
assert res.status_code == 200
def create_custom_browser_profile(client, name='My Custom Chrome'):
"""Create a custom browser profile using playwright_cdp and return its machine name."""
res = client.post(
url_for('settings.settings_browsers.save'),
data={
'name': name,
'fetch_backend': 'playwright_cdp',
'browser_connection_url': 'ws://localhost:3000',
'viewport_width': 1280,
'viewport_height': 1000,
'block_images': '',
'block_fonts': '',
'ignore_https_errors': '',
'user_agent': '',
'locale': '',
'custom_headers': '',
'original_machine_name': '',
},
follow_redirects=True,
)
assert b'saved.' in res.data
from changedetectionio.model.browser_profile import BrowserProfile
return BrowserProfile(name=name, fetch_backend='playwright_cdp').get_machine_name()
def create_requests_browser_profile(client, name, user_agent='', custom_headers=''):
"""Create a requests-type browser profile with optional UA and custom headers."""
res = client.post(
url_for('settings.settings_browsers.save'),
data={
'name': name,
'fetch_backend': 'requests',
'browser_connection_url': '',
'viewport_width': 1280,
'viewport_height': 1000,
'block_images': '',
'block_fonts': '',
'ignore_https_errors': '',
'user_agent': user_agent,
'locale': '',
'custom_headers': custom_headers,
'original_machine_name': '',
},
follow_redirects=True,
)
assert b'saved.' in res.data
from changedetectionio.model.browser_profile import BrowserProfile
return BrowserProfile(name=name, fetch_backend='requests').get_machine_name()
# ---------------------------------------------------------------------------
# Unit tests — status_icon attribute on fetcher classes
# ---------------------------------------------------------------------------
def test_status_icon_on_browser_fetchers():
"""Browser fetcher classes must declare a status_icon dict."""
from changedetectionio.content_fetchers.playwright.CDP import fetcher as playwright_fetcher
from changedetectionio.content_fetchers.puppeteer import fetcher as puppeteer_fetcher
from changedetectionio.content_fetchers.webdriver_selenium import fetcher as selenium_fetcher
for cls in (playwright_fetcher, puppeteer_fetcher, selenium_fetcher):
assert cls.status_icon is not None, f"{cls} should have status_icon set"
assert 'filename' in cls.status_icon
assert 'alt' in cls.status_icon
assert 'title' in cls.status_icon
def test_no_status_icon_on_requests_fetcher():
"""The plain requests fetcher must have status_icon = None."""
from changedetectionio.content_fetchers.requests import fetcher as requests_fetcher
assert requests_fetcher.status_icon is None
def test_fetcher_status_icons_filter_uses_status_icon(monkeypatch):
"""fetcher_status_icons filter returns icon HTML for a class with status_icon set."""
from changedetectionio import content_fetchers
class FakeBrowserFetcher:
status_icon = {'filename': 'test-icon.png', 'alt': 'Test browser', 'title': 'Test browser'}
supports_screenshots = True
monkeypatch.setitem(content_fetchers.FETCHERS, 'fake_browser', FakeBrowserFetcher)
from changedetectionio.flask_app import app
with app.test_request_context('/'):
from changedetectionio.flask_app import _jinja2_filter_fetcher_status_icons
result = _jinja2_filter_fetcher_status_icons('fake_browser')
assert 'test-icon.png' in result
assert 'Test browser' in result
# Requests fetcher → empty string
with app.test_request_context('/'):
result = _jinja2_filter_fetcher_status_icons('requests')
assert result == ''
# ---------------------------------------------------------------------------
# Integration tests — inheritance chain
# ---------------------------------------------------------------------------
def test_watch_explicit_browser_profile_shows_icon(client, live_server, measure_memory_usage, datastore_path):
"""Watch explicitly assigned a browser profile shows the chrome icon,
even when the system default is requests."""
datastore = client.application.config.get('DATASTORE')
set_system_default_profile(client, 'direct_http_requests')
machine_name = create_custom_browser_profile(client)
uuid = datastore.add_watch(url='http://example.com', extras={'browser_profile': machine_name, 'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' in res.data, \
"Chrome icon should appear when watch is explicitly set to a browser profile"
datastore.delete(uuid)
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
def test_watch_explicit_requests_profile_no_icon(client, live_server, measure_memory_usage, datastore_path):
"""Watch explicitly set to direct_http_requests never shows the chrome icon,
even when the system default is a browser."""
datastore = client.application.config.get('DATASTORE')
machine_name = create_custom_browser_profile(client)
set_system_default_profile(client, machine_name)
uuid = datastore.add_watch(url='http://example.com', extras={'browser_profile': 'direct_http_requests', 'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' not in res.data, \
"Chrome icon should NOT appear when watch is explicitly set to direct_http_requests"
datastore.delete(uuid)
set_system_default_profile(client, 'direct_http_requests')
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
def test_system_default_requests_inherited_by_watch(client, live_server, measure_memory_usage, datastore_path):
"""Watch using system default inherits requests → no icon."""
datastore = client.application.config.get('DATASTORE')
set_system_default_profile(client, 'direct_http_requests')
uuid = datastore.add_watch(url='http://example.com', extras={'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' not in res.data, \
"Chrome icon should NOT appear when system default is requests and watch uses system default"
datastore.delete(uuid)
def test_system_default_browser_inherited_by_watch(client, live_server, measure_memory_usage, datastore_path):
"""Watch using system default inherits a browser profile → icon shown."""
datastore = client.application.config.get('DATASTORE')
machine_name = create_custom_browser_profile(client)
set_system_default_profile(client, machine_name)
uuid = datastore.add_watch(url='http://example.com', extras={'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' in res.data, \
"Chrome icon should appear when system default is a browser profile and watch uses system default"
datastore.delete(uuid)
set_system_default_profile(client, 'direct_http_requests')
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
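The inheritance chain these integration tests cover (watch profile, then system default, then plain requests) can be pictured with a small resolver. This is a hedged sketch, not the application's code: `effective_browser_profile` is a hypothetical name, and treating the value `"system"` or an empty value as "defer to the next level" is an assumption made for illustration.

```python
# Final fallback named in the test module's docstring
FALLBACK_PROFILE = "direct_http_requests"


def effective_browser_profile(watch, app_settings):
    """Hypothetical resolver: watch value, then system default, then fallback."""
    candidates = (watch.get("browser_profile"), app_settings.get("browser_profile"))
    for candidate in candidates:
        # Assumption: empty or 'system' means "inherit from the next level"
        if candidate and candidate != "system":
            return candidate
    return FALLBACK_PROFILE
```

With this shape, a watch explicitly set to `direct_http_requests` keeps that value even when the system default is a browser profile, which is the case `test_watch_explicit_requests_profile_no_icon` checks.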
# ---------------------------------------------------------------------------
# Integration tests — BrowserProfile UA and custom_headers applied to requests
# ---------------------------------------------------------------------------
def test_browser_profile_user_agent_applied(client, live_server, measure_memory_usage, datastore_path):
"""User-Agent set on a BrowserProfile appears in the fetched request;
a per-watch User-Agent header overrides it."""
from changedetectionio.tests.util import wait_for_all_checks
datastore = client.application.config.get('DATASTORE')
test_url = url_for('test_headers', _external=True)
machine_name = create_requests_browser_profile(
client, name='UA Profile Test', user_agent='profile-ua/2.0'
)
uuid = datastore.add_watch(url=test_url, extras={'browser_profile': machine_name})
client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
assert b'profile-ua/2.0' in res.data, "Profile UA should appear in the echoed request headers"
# Per-watch User-Agent header overrides the profile UA
client.post(
url_for('ui.ui_edit.edit_page', uuid='first'),
data={
'url': test_url,
'tags': '',
'browser_profile': machine_name,
'headers': 'User-Agent: watch-ua/3.0',
'time_between_check_use_default': 'y',
},
follow_redirects=True,
)
client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
assert b'watch-ua/3.0' in res.data, "Watch-level UA should override profile UA"
assert b'profile-ua/2.0' not in res.data, "Profile UA should be superseded by watch-level header"
datastore.delete(uuid)
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
def test_browser_profile_custom_headers_applied(client, live_server, measure_memory_usage, datastore_path):
"""Custom headers set on a BrowserProfile are sent with every request using that profile;
per-watch headers override them when the same header name is used."""
from changedetectionio.tests.util import wait_for_all_checks
datastore = client.application.config.get('DATASTORE')
test_url = url_for('test_headers', _external=True)
machine_name = create_requests_browser_profile(
client,
name='Headers Profile Test',
custom_headers='X-Profile-Header: profile-value\nX-Shared-Header: from-profile',
)
uuid = datastore.add_watch(url=test_url, extras={'browser_profile': machine_name})
client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
assert b'X-Profile-Header:profile-value' in res.data, \
"Profile custom header should appear in the echoed request"
assert b'X-Shared-Header:from-profile' in res.data, \
"Second profile custom header should appear"
# Per-watch header for the same key overrides the profile header
client.post(
url_for('ui.ui_edit.edit_page', uuid='first'),
data={
'url': test_url,
'tags': '',
'browser_profile': machine_name,
'headers': 'X-Shared-Header: from-watch\nX-Watch-Only: watch-value',
'time_between_check_use_default': 'y',
},
follow_redirects=True,
)
client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
assert b'X-Profile-Header:profile-value' in res.data, \
"Unrelated profile header should still be present"
assert b'X-Shared-Header:from-watch' in res.data, \
"Watch-level header should override the same-named profile header"
assert b'X-Shared-Header:from-profile' not in res.data, \
"Profile value for overridden header should be gone"
assert b'X-Watch-Only:watch-value' in res.data, \
"Watch-only header should appear"
datastore.delete(uuid)
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
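The precedence asserted by the two tests above (profile headers are applied first, and a watch-level header with the same name replaces the profile value) can be sketched as follows. `parse_header_block` and `merge_headers` are hypothetical helpers written for illustration only, not functions from changedetection.io, and the case-insensitive name matching is an assumption.

```python
def parse_header_block(text):
    """Parse 'Name: value' lines into a dict (hypothetical helper)."""
    headers = {}
    for line in text.splitlines():
        if ':' not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(':', 1)
        headers[name.strip()] = value.strip()
    return headers


def merge_headers(profile_headers, watch_headers):
    """Start from the profile headers, then let watch-level headers win.

    Assumption: header names are compared case-insensitively.
    """
    merged = dict(profile_headers)
    for name, value in watch_headers.items():
        # Drop any profile header with the same name, whatever its casing
        for existing in [k for k in merged if k.lower() == name.lower()]:
            del merged[existing]
        merged[name] = value
    return merged


profile = parse_header_block('X-Profile-Header: profile-value\nX-Shared-Header: from-profile')
watch = parse_header_block('X-Shared-Header: from-watch\nX-Watch-Only: watch-value')
merged = merge_headers(profile, watch)
```

With the values used in the test, `merged` keeps the unrelated profile header, takes the shared header from the watch, and adds the watch-only header.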
@@ -464,7 +464,7 @@ def test_settings_persist_after_update(client, live_server):
# Update settings directly (bypass form validation issues)
datastore.data['settings']['application']['empty_pages_are_a_change'] = True
datastore.data['settings']['application']['browser_profile'] = 'direct_http_requests'
datastore.data['settings']['application']['fetch_backend'] = 'html_requests'
datastore.data['settings']['requests']['time_between_check']['minutes'] = 120
datastore.commit()
@@ -478,7 +478,7 @@ def test_settings_persist_after_update(client, live_server):
# Verify settings survived
assert datastore2.data['settings']['application']['empty_pages_are_a_change'] == True, "empty_pages_are_a_change should persist"
assert datastore2.data['settings']['application']['browser_profile'] == 'direct_http_requests', "browser_profile should persist"
assert datastore2.data['settings']['application']['fetch_backend'] == 'html_requests', "fetch_backend should persist"
assert datastore2.data['settings']['requests']['time_between_check']['minutes'] == 120, "time_between_check should persist"
@@ -634,7 +634,7 @@ def test_ui_watch_edit_persists_all_fields(client, live_server):
'time_between_check-hours': '2',
'time_between_check-minutes': '30',
'include_filters': '#content',
'browser_profile': 'direct_http_requests',
'fetch_backend': 'html_requests',
'method': 'POST',
'ignore_text': 'Advertisement\nTracking'
},
@@ -657,5 +657,5 @@ def test_ui_watch_edit_persists_all_fields(client, live_server):
assert watch['title'] == 'Updated Watch Title'
assert watch['time_between_check']['hours'] == 2
assert watch['time_between_check']['minutes'] == 30
assert watch['browser_profile'] == 'direct_http_requests'
assert watch['fetch_backend'] == 'html_requests'
assert watch['method'] == 'POST'
@@ -72,6 +72,7 @@ def test_conditions_with_text_and_number(client, live_server, measure_memory_usa
url_for("ui.ui_edit.edit_page", uuid=uuid),
data={
"url": test_url,
"fetch_backend": "html_requests",
"include_filters": ".number-container",
"title": "Number AND Text Condition Test",
"conditions_match_logic": CONDITIONS_MATCH_LOGIC_DEFAULT, # ALL = AND logic
@@ -257,6 +258,7 @@ def test_lev_conditions_plugin(client, live_server, measure_memory_usage, datast
url_for("ui.ui_edit.edit_page", uuid=uuid, unpause_on_save=1),
data={
"url": test_url,
"fetch_backend": "html_requests",
"conditions_match_logic": CONDITIONS_MATCH_LOGIC_DEFAULT, # ALL = AND logic
"conditions-0-field": "levenshtein_ratio",
"conditions-0-operator": "<",
+3 -3
@@ -89,7 +89,7 @@ def test_check_markup_include_filters_restriction(client, live_server, measure_m
# Add our URL to the import page
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": include_filters, "url": test_url, "tags": "", "headers": "", 'browser_profile': "direct_http_requests", "time_between_check_use_default": "y"},
data={"include_filters": include_filters, "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests", "time_between_check_use_default": "y"},
follow_redirects=True
)
assert b"Updated watch." in res.data
@@ -144,7 +144,7 @@ def test_check_multiple_filters(client, live_server, measure_memory_usage, datas
"url": test_url,
"tags": "",
"headers": "",
'browser_profile': "direct_http_requests",
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -195,7 +195,7 @@ def test_filter_is_empty_help_suggestion(client, live_server, measure_memory_usa
"url": test_url,
"tags": "",
"headers": "",
'browser_profile': "direct_http_requests",
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -171,7 +171,7 @@ def test_element_removal_full(client, live_server, measure_memory_usage, datasto
"url": test_url,
"tags": "",
"headers": "",
"browser_profile": "direct_http_requests",
"fetch_backend": "html_requests",
"time_between_check_use_default": "y",
},
follow_redirects=True,
-64
@@ -1,7 +1,6 @@
#!/usr/bin/env python3
# coding=utf-8
import hashlib
import time
from flask import url_for
from .util import live_server_setup, wait_for_all_checks, extract_UUID_from_client
@@ -12,69 +11,6 @@ import os
def test_surrogate_characters_in_content_are_sanitized():
"""Lone surrogates can appear in requests' r.text when a server returns malformed/mixed-encoding
content. Without sanitization, encoding to UTF-8 raises UnicodeEncodeError.
See: https://github.com/dgtlmoon/changedetection.io/issues/3952
"""
content_with_surrogate = '<html><body>Hello \udcad World</body></html>'
# Confirm the raw problem exists
with pytest.raises(UnicodeEncodeError):
content_with_surrogate.encode('utf-8')
# Our fix: sanitize after fetcher.run() in processors/base.py call_browser()
sanitized = content_with_surrogate.encode('utf-8', errors='replace').decode('utf-8')
assert 'Hello' in sanitized
assert 'World' in sanitized
assert '\udcad' not in sanitized
# Checksum computation (processors/base.py get_raw_document_checksum) must not crash
hashlib.md5(sanitized.encode('utf-8')).hexdigest()
def test_utf8_content_without_charset_header(client, live_server, datastore_path):
"""Server returns UTF-8 content but no charset in Content-Type header.
chardet can misdetect such pages as UTF-7 (Python 3.14 then produces surrogates).
Our fix tries UTF-8 first before falling back to chardet.
See: https://github.com/dgtlmoon/changedetection.io/issues/3952
"""
from .util import write_test_file_and_sync
# UTF-8 encoded content with non-ASCII chars - no charset will be in the header
html = '<html><body><p>Español</p><p>Français</p><p>日本語</p></body></html>'
write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), html.encode('utf-8'), mode='wb')
test_url = url_for('test_endpoint', content_type="text/html", _external=True)
client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for("ui.ui_preview.preview_page", uuid="first"), follow_redirects=True)
# Should decode correctly as UTF-8, not produce mojibake (e.g. EspaÃ±ol) or replacement chars
assert 'Español'.encode('utf-8') in res.data
assert 'Français'.encode('utf-8') in res.data
assert '日本語'.encode('utf-8') in res.data
def test_shiftjis_with_meta_charset(client, live_server, datastore_path):
"""Server returns Shift-JIS content with no charset in HTTP header, but the HTML
declares <meta charset="Shift-JIS">. We should use the meta tag, not chardet.
Real-world case: https://github.com/dgtlmoon/changedetection.io/issues/3952
"""
from .util import write_test_file_and_sync
japanese_text = '日本語のページ'
html = f'<html><head><meta http-equiv="Content-Type" content="text/html;charset=Shift-JIS"></head><body><p>{japanese_text}</p></body></html>'
write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), html.encode('shift_jis'), mode='wb')
test_url = url_for('test_endpoint', content_type="text/html", _external=True)
client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for("ui.ui_preview.preview_page", uuid="first"), follow_redirects=True)
assert japanese_text.encode('utf-8') in res.data
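The decoding order described in the removed docstrings above (prefer strict UTF-8 and a charset declared in the markup over statistical guessing) might look roughly like this. `decode_body` is a hypothetical helper, not the project's implementation; the relative order of the UTF-8 attempt and the meta lookup is an assumption, and the final step is only a stand-in for a detector such as chardet.

```python
import re

# Matches charset=... in either <meta charset="..."> or a Content-Type meta tag
META_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def decode_body(raw, header_charset=None):
    """Decode response bytes to text (hypothetical sketch)."""
    # 1. An explicit charset from the HTTP Content-Type header wins
    if header_charset:
        try:
            return raw.decode(header_charset)
        except (LookupError, UnicodeDecodeError):
            pass
    # 2. Try strict UTF-8 before any guessing
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        pass
    # 3. Use a charset declared near the top of the markup, if it is usable
    match = META_CHARSET_RE.search(raw[:2048])
    if match:
        try:
            return raw.decode(match.group(1).decode('ascii'))
        except (LookupError, UnicodeDecodeError):
            pass
    # 4. Last resort (stand-in for a detector): never raise, never emit surrogates
    return raw.decode('utf-8', errors='replace')


utf8_page = '<html><body><p>Español</p></body></html>'.encode('utf-8')
sjis_page = (
    '<html><head><meta http-equiv="Content-Type" content="text/html;charset=Shift-JIS">'
    '</head><body><p>日本語のページ</p></body></html>'
).encode('shift_jis')
```

Both sample pages mirror the fixtures in the removed tests: the first has no declared charset at all, the second declares Shift-JIS only in a meta tag.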
def set_html_response(datastore_path):
test_return_data = """
<html><body><span class="nav_second_img_text">
+4 -14
@@ -10,8 +10,6 @@ from .util import live_server_setup, wait_for_all_checks, delete_all_watches
def _runner_test_http_errors(client, live_server, http_code, expected_text, datastore_path):
from loguru import logger
logger.debug(f"_runner_test_http_errors - testing text '{expected_text}' for code {http_code}")
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write("Now you going to get a {} error code\n".format(http_code))
@@ -22,11 +20,6 @@ def _runner_test_http_errors(client, live_server, http_code, expected_text, data
status_code=http_code,
_external=True)
if os.getenv("PLAYWRIGHT_DRIVER_URL") or os.getenv('WEBDRIVER_URL'):
logger.warning("!!! Looks like we're running test with playwright or selenium, so FORCE a connection back to our container 'cdio'")
test_url = test_url.replace('localhost.localdomain', 'changedet')
test_url = test_url.replace('localhost', 'changedet')
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
@@ -83,8 +76,7 @@ def test_DNS_errors(client, live_server, measure_memory_usage, datastore_path):
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data or
b"net::ERR_NAME_NOT_RESOLVED" in res.data
b"Connection error occurred" in res.data
)
assert found_name_resolution_error
# Should always record that we tried
@@ -116,8 +108,7 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data or
b"net::ERR_NAME_NOT_RESOLVED" in res.data
b"Connection error occurred" in res.data
)
assert found_name_resolution_error
@@ -126,7 +117,7 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
url_for("ui.ui_edit.edit_page", uuid="first"),
data={
"url": test_url,
"browser_profile": "direct_http_requests",
"fetch_backend": "html_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -140,8 +131,7 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data or
b"net::ERR_NAME_NOT_RESOLVED" in res.data
b"Connection error occurred" in res.data
)
assert not found_name_resolution_error
@@ -92,7 +92,7 @@ def test_check_filter_multiline(client, live_server, measure_memory_usage, datas
"url": test_url,
"tags": "",
"headers": "",
'browser_profile': "direct_http_requests",
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"
},
follow_redirects=True
@@ -143,7 +143,7 @@ def test_check_filter_and_regex_extract(client, live_server, measure_memory_usag
"url": test_url,
"tags": "",
"headers": "",
'browser_profile': "direct_http_requests",
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"
},
follow_redirects=True
@@ -212,7 +212,7 @@ def test_regex_error_handling(client, live_server, measure_memory_usage, datasto
url_for("ui.ui_edit.edit_page", uuid=uuid),
data={"extract_text": '/something bad\d{3/XYZ',
"url": test_url,
"browser_profile": "direct_http_requests",
"fetch_backend": "html_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -96,7 +96,7 @@ def test_filter_doesnt_exist_then_exists_should_get_notification(client, live_se
# prepended with extra filter that intentionally doesn't match any entry,
# notification should still be sent even if first filter does not match (PR#3516)
"include_filters": ".non-matching-selector\n.ticket-available",
"browser_profile": "direct_http_requests",
"fetch_backend": "html_requests",
"time_between_check_use_default": "y"})
res = client.post(
@@ -70,7 +70,7 @@ def run_filter_test(client, live_server, content_filter, app_notification_format
"Diff as Patch: {{diff_patch}}\n"
":-)",
"notification_format": 'text',
"browser_profile": "direct_http_requests",
"fetch_backend": "html_requests",
"filter_failure_notification_send": 'y',
"time_between_check_use_default": "y",
"headers": "",
+1 -1
@@ -417,7 +417,7 @@ def test_order_of_filters_tag_filter_and_watch_filter(client, live_server, measu
"url": test_url,
"tags": "test-tag-keep-order",
"headers": "",
'browser_profile': "direct_http_requests",
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -50,7 +50,8 @@ def test_consistent_history(client, live_server, measure_memory_usage, datastore
res = client.post(
url_for("settings.settings_page"),
data={"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180},
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -243,7 +244,7 @@ def test_history_trim_global_override_in_watch(client, live_server, measure_memo
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": "", "url": test_url, "tags": "", "headers": "", 'browser_profile': "direct_http_requests",
data={"include_filters": "", "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests",
"time_between_check_use_default": "y", "history_snapshot_max_length": str(limit)},
follow_redirects=True
)
-73
@@ -624,76 +624,3 @@ def test_session_locale_overrides_accept_language(client, live_server, measure_m
assert "분".encode() in res.data, "Expected Korean '분' for Minutes"
assert "小時".encode() not in res.data, "Should not have Traditional Chinese '小時' when Korean is set"
assert "分鐘".encode() not in res.data, "Should not have Traditional Chinese '分鐘' when Korean is set"
def test_clear_history_translated_confirmation(client, live_server, measure_memory_usage, datastore_path):
"""
Test that clearing snapshot history works with translated confirmation text.
Issue #3865: When the app language is set to German, the clear history
confirmation dialog shows the translated word (e.g. 'loschen') but the
backend only accepted the English word 'clear', making it impossible
to clear snapshots in non-English languages.
"""
from flask import url_for
test_url = url_for('test_endpoint', _external=True)
# Add a watch so there is history to clear
res = client.post(
url_for("imports.import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
# Set language to German
res = client.get(
url_for("set_language", locale="de"),
follow_redirects=True
)
assert res.status_code == 200
# Verify the clear history page shows the German confirmation word
res = client.get(
url_for("ui.clear_all_history"),
follow_redirects=True
)
assert res.status_code == 200
assert "löschen".encode() in res.data, "Expected German word 'loschen' on clear history page"
# Submit the form with the German translated word
res = client.post(
url_for("ui.clear_all_history"),
data={"confirmtext": "löschen"},
follow_redirects=True
)
assert res.status_code == 200
# Should NOT show error message
assert b"Incorrect confirmation text" not in res.data, \
"German confirmation word 'loschen' should be accepted (issue #3865)"
# Switch back to English and verify English word still works
res = client.get(
url_for("set_language", locale="en_US"),
follow_redirects=True
)
res = client.post(
url_for("ui.clear_all_history"),
data={"confirmtext": "clear"},
follow_redirects=True
)
assert res.status_code == 200
assert b"Incorrect confirmation text" not in res.data, \
"English confirmation word 'clear' should still be accepted"
# Verify that missing/empty confirmtext does not crash the server
res = client.post(
url_for("ui.clear_all_history"),
data={},
follow_redirects=True
)
assert res.status_code == 200, \
"Missing confirmtext should not crash the server"
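The behaviour this removed test exercised (accept the translated confirmation word, keep accepting the English word, and tolerate a missing field) can be sketched as below. `confirmation_accepted` is a hypothetical helper for illustration, not the route's actual code; the whitespace trimming and case folding are assumptions.

```python
def confirmation_accepted(submitted, translated_word, english_word='clear'):
    """Return True if the submitted text matches the English or translated word.

    A missing or empty value is rejected without raising.
    """
    if not submitted:
        return False
    candidate = submitted.strip().lower()
    return candidate in {english_word.lower(), translated_word.strip().lower()}
```

For example, with the German locale the translated word would be `löschen`, so both `löschen` and `clear` pass while an absent `confirmtext` is simply rejected.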

Some files were not shown because too many files have changed in this diff