Compare commits

...

19 Commits

Author SHA1 Message Date
dgtlmoon
c22335ed01 0.52.9
Some checks are pending
Build and push containers / metadata (push) Waiting to run
Build and push containers / build-push-containers (push) Waiting to run
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Waiting to run
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Blocked by required conditions
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Blocked by required conditions
ChangeDetection.io Container Build Test / Build linux/amd64 (alpine) (push) Waiting to run
ChangeDetection.io Container Build Test / Build linux/arm64 (alpine) (push) Waiting to run
ChangeDetection.io Container Build Test / Build linux/amd64 (main) (push) Waiting to run
ChangeDetection.io Container Build Test / Build linux/arm/v7 (main) (push) Waiting to run
ChangeDetection.io Container Build Test / Build linux/arm/v8 (main) (push) Waiting to run
ChangeDetection.io Container Build Test / Build linux/arm64 (main) (push) Waiting to run
ChangeDetection.io App Test / lint-code (push) Waiting to run
ChangeDetection.io App Test / test-application-3-10 (push) Blocked by required conditions
ChangeDetection.io App Test / test-application-3-11 (push) Blocked by required conditions
ChangeDetection.io App Test / test-application-3-12 (push) Blocked by required conditions
ChangeDetection.io App Test / test-application-3-13 (push) Blocked by required conditions
2026-01-22 10:30:12 +01:00
dgtlmoon
0042f0c36a Memory management improvements for large screenshots, Brotli snapshot improvements (#3798) 2026-01-22 10:29:22 +01:00
dgtlmoon
55e14cf394 Updating site.webmanifest for PWA usage 2026-01-22 10:28:38 +01:00
Ianis BERNARD
308ccb5841 Use credentials to fetch web manifest (#3790) 2026-01-22 10:27:20 +01:00
dgtlmoon
978e17acf6 Make language selection sticky and provide a way to return back to default auto-detect #3792 (#3795) 2026-01-22 08:01:49 +01:00
dgtlmoon
73c29d1fa0 Element locking 'off' by default (so they dont move when the screenshot scroll happens), only lock top viewport elements. Improve logging. (#3796) 2026-01-22 08:01:19 +01:00
dgtlmoon
b3eb88b6d2 Rebuilding language translation files
Some checks failed
Build and push containers / metadata (push) Has been cancelled
Build and push containers / build-push-containers (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v7 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v8 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (main) (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
CodeQL / Analyze (javascript) (push) Has been cancelled
CodeQL / Analyze (python) (push) Has been cancelled
2026-01-22 06:11:21 +01:00
Alex Notes
aa73ce2ee6 Update French translation (#3788) 2026-01-22 05:44:20 +01:00
Maicon Strey
0cbf345e84 Open github link on new tab (#3791) 2026-01-22 05:43:15 +01:00
Dominik Herold
d65e08e7c8 Update messages.po // German "From" (#3793) 2026-01-22 05:42:49 +01:00
dgtlmoon
10f233a939 Improving container version labeling, using master branch as docker :dev tag. Re #3794 2026-01-22 05:38:56 +01:00
dgtlmoon
52911d699f 0.52.8
Some checks failed
Build and push containers / metadata (push) Has been cancelled
Build and push containers / build-push-containers (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-01-20 13:35:12 +01:00
dgtlmoon
7e886e0c56 Memory - Favicon reader had a memory leak, Restart fetch workers between jobs, misc tweaks (#3787) 2026-01-20 12:49:53 +01:00
dgtlmoon
151e603af7 API - Validation improvements (#3782)
Some checks failed
Build and push containers / metadata (push) Has been cancelled
Build and push containers / build-push-containers (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-01-19 18:16:25 +01:00
dgtlmoon
7311af4b58 i18n - zh traditional chinese autodetect from browser fix 2026-01-19 16:28:25 +01:00
dgtlmoon
af193e8d7a UI - Fixes for search dialog #3778 (#3781) 2026-01-19 16:18:23 +01:00
dgtlmoon
9e2acadb7e 0.52.7
Some checks failed
Build and push containers / metadata (push) Has been cancelled
Build and push containers / build-push-containers (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-01-19 09:37:01 +01:00
吾爱分享
48da93b4ec Fix zh PO duplicates and complete new translations. (#3773) 2026-01-19 09:35:52 +01:00
dgtlmoon
0c1adc8906 Lots of translation updates (#3772)
Some checks failed
Build and push containers / metadata (push) Has been cancelled
Build and push containers / build-push-containers (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-01-17 22:24:23 +01:00
60 changed files with 24413 additions and 17678 deletions

View File

@@ -15,7 +15,6 @@ on:
push:
branches:
- master
- dev
jobs:
metadata:
@@ -93,17 +92,28 @@ jobs:
version: latest
driver-opts: image=moby/buildkit:master
# dev branch -> :dev container tag
# master branch -> :dev container tag
- name: Docker meta :dev
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/metadata-action@v5
id: meta_dev
with:
images: |
${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io
ghcr.io/${{ github.repository }}
tags: |
type=raw,value=dev
- name: Build and push :dev
id: docker_build
if: ${{ github.ref == 'refs/heads/dev' }}
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/build-push-action@v6
with:
context: ./
file: ./Dockerfile
push: true
tags: |
${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:dev,ghcr.io/${{ github.repository }}:dev
tags: ${{ steps.meta_dev.outputs.tags }}
labels: ${{ steps.meta_dev.outputs.labels }}
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/arm/v8
cache-from: type=gha
cache-to: type=gha,mode=max
@@ -142,6 +152,7 @@ jobs:
file: ./Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/arm/v8
cache-from: type=gha
cache-to: type=gha,mode=max

View File

@@ -78,6 +78,12 @@ RUN --mount=type=cache,id=pip,sharing=locked,target=/tmp/pip-cache \
# Final image stage
FROM python:${PYTHON_VERSION}-slim-bookworm
LABEL org.opencontainers.image.source="https://github.com/dgtlmoon/changedetection.io"
LABEL org.opencontainers.image.url="https://changedetection.io"
LABEL org.opencontainers.image.documentation="https://changedetection.io/tutorials"
LABEL org.opencontainers.image.title="changedetection.io"
LABEL org.opencontainers.image.description="Self-hosted web page change monitoring and notification service"
LABEL org.opencontainers.image.licenses="Apache-2.0"
LABEL org.opencontainers.image.vendor="changedetection.io"
RUN apt-get update && apt-get install -y --no-install-recommends \
libxslt1.1 \

View File

@@ -16,6 +16,7 @@ recursive-include changedetectionio/widgets *
prune changedetectionio/static/package-lock.json
prune changedetectionio/static/styles/node_modules
prune changedetectionio/static/styles/package-lock.json
include changedetectionio/favicon_utils.py
include changedetection.py
include requirements.txt
include README-pip.md

5
babel.cfg Normal file
View File

@@ -0,0 +1,5 @@
[python: **.py]
keywords = _:1,_l:1,gettext:1
[jinja2: **/templates/**.html]
encoding = utf-8

View File

@@ -2,7 +2,7 @@
# Read more https://github.com/dgtlmoon/changedetection.io/wiki
# Semver means never use .01, or 00. Should be .1.
__version__ = '0.52.6'
__version__ = '0.52.9'
from changedetectionio.strtobool import strtobool
from json.decoder import JSONDecodeError

View File

@@ -2,6 +2,7 @@ import os
import threading
from changedetectionio.validate_url import is_safe_valid_url
from changedetectionio.favicon_utils import get_favicon_mime_type
from . import auth
from changedetectionio import queuedWatchMetaData, strtobool
@@ -68,13 +69,17 @@ class Watch(Resource):
import time
from copy import deepcopy
watch = None
for _ in range(20):
# Retry up to 20 times if dict is being modified
# With sleep(0), this is fast: ~200µs best case, ~20ms worst case under heavy load
for attempt in range(20):
try:
watch = deepcopy(self.datastore.data['watching'].get(uuid))
break
except RuntimeError:
# Incase dict changed, try again
time.sleep(0.01)
# Dict changed during deepcopy, retry after yielding to scheduler
# sleep(0) releases GIL and yields - no fixed delay, just lets other threads run
if attempt < 19: # Don't yield on last attempt
time.sleep(0) # Yield to scheduler (microseconds, not milliseconds)
if not watch:
abort(404, message='No watch exists with the UUID of {}'.format(uuid))
@@ -126,17 +131,31 @@ class Watch(Resource):
if request.json.get('proxy'):
plist = self.datastore.proxy_list
if not request.json.get('proxy') in plist:
return "Invalid proxy choice, currently supported proxies are '{}'".format(', '.join(plist)), 400
if not plist or request.json.get('proxy') not in plist:
proxy_list_str = ', '.join(plist) if plist else 'none configured'
return f"Invalid proxy choice, currently supported proxies are '{proxy_list_str}'", 400
# Validate time_between_check when not using defaults
validation_error = validate_time_between_check_required(request.json)
if validation_error:
return validation_error, 400
# XSS etc protection
if request.json.get('url') and not is_safe_valid_url(request.json.get('url')):
return "Invalid URL", 400
# XSS etc protection - validate URL if it's being updated
if 'url' in request.json:
new_url = request.json.get('url')
# URL must be a non-empty string
if new_url is None:
return "URL cannot be null", 400
if not isinstance(new_url, str):
return "URL must be a string", 400
if not new_url.strip():
return "URL cannot be empty or whitespace only", 400
if not is_safe_valid_url(new_url.strip()):
return "Invalid or unsupported URL format. URL must use http://, https://, or ftp:// protocol", 400
# Handle processor-config-* fields separately (save to JSON, not datastore)
from changedetectionio import processors
@@ -232,6 +251,10 @@ class WatchSingleHistory(Resource):
if timestamp == 'latest':
timestamp = list(watch.history.keys())[-1]
# Validate that the timestamp exists in history
if timestamp not in watch.history:
abort(404, message=f"No history snapshot found for timestamp '{timestamp}'")
if request.args.get('html'):
content = watch.get_fetched_html(timestamp)
if content:
@@ -380,16 +403,9 @@ class WatchFavicon(Resource):
favicon_filename = watch.get_favicon_filename()
if favicon_filename:
try:
import magic
mime = magic.from_file(
os.path.join(watch.watch_data_dir, favicon_filename),
mime=True
)
except ImportError:
# Fallback, no python-magic
import mimetypes
mime, encoding = mimetypes.guess_type(favicon_filename)
# Use cached MIME type detection
filepath = os.path.join(watch.watch_data_dir, favicon_filename)
mime = get_favicon_mime_type(filepath)
response = make_response(send_from_directory(watch.watch_data_dir, favicon_filename))
response.headers['Content-type'] = mime
@@ -419,8 +435,9 @@ class CreateWatch(Resource):
if json_data.get('proxy'):
plist = self.datastore.proxy_list
if not json_data.get('proxy') in plist:
return "Invalid proxy choice, currently supported proxies are '{}'".format(', '.join(plist)), 400
if not plist or json_data.get('proxy') not in plist:
proxy_list_str = ', '.join(plist) if plist else 'none configured'
return f"Invalid proxy choice, currently supported proxies are '{proxy_list_str}'", 400
# Validate time_between_check when not using defaults
validation_error = validate_time_between_check_required(json_data)

View File

@@ -30,13 +30,23 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
app: Flask application instance
datastore: Application datastore
executor: ThreadPoolExecutor for queue operations (optional)
Returns:
"restart" if worker should restart, "shutdown" for clean exit
"""
# Set a descriptive name for this task
task = asyncio.current_task()
if task:
task.set_name(f"async-worker-{worker_id}")
logger.info(f"Starting async worker {worker_id}")
# Read restart policy from environment
max_jobs = int(os.getenv("WORKER_MAX_JOBS", "10"))
max_runtime_seconds = int(os.getenv("WORKER_MAX_RUNTIME", "3600")) # 1 hour default
jobs_processed = 0
start_time = time.time()
logger.info(f"Starting async worker {worker_id} (max_jobs={max_jobs}, max_runtime={max_runtime_seconds}s)")
while not app.config.exit.is_set():
update_handler = None
@@ -51,7 +61,11 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
)
except asyncio.TimeoutError:
# No jobs available, continue loop
# No jobs available - check if we should restart based on time while idle
runtime = time.time() - start_time
if runtime >= max_runtime_seconds:
logger.info(f"Worker {worker_id} idle and reached max runtime ({runtime:.0f}s), restarting")
return "restart"
continue
except Exception as e:
# Handle expected Empty exception from queue timeout
@@ -149,8 +163,10 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
except ProcessorException as e:
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot)
e.screenshot = None # Free memory immediately
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data)
e.xpath_data = None # Free memory immediately
datastore.update_watch(uuid=uuid, update_obj={'last_error': e.message})
process_changedetection_results = False
@@ -170,9 +186,11 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
e.screenshot = None # Free memory immediately
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data)
e.xpath_data = None # Free memory immediately
process_changedetection_results = False
@@ -191,8 +209,10 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
e.screenshot = None # Free memory immediately
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data, as_error=True)
e.xpath_data = None # Free memory immediately
if e.page_text:
watch.save_error_text(contents=e.page_text)
@@ -209,9 +229,11 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
# Filter wasnt found, but we should still update the visual selector so that they can have a chance to set it up again
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot)
e.screenshot = None # Free memory immediately
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data)
e.xpath_data = None # Free memory immediately
# Only when enabled, send the notification
if watch.get('filter_failure_notification_send', False):
@@ -303,6 +325,7 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
err_text = "Error running JS Actions - Page request - "+e.message
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
e.screenshot = None # Free memory immediately
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text,
'last_check_status': e.status_code})
process_changedetection_results = False
@@ -314,6 +337,7 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
e.screenshot = None # Free memory immediately
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text,
'last_check_status': e.status_code,
@@ -355,9 +379,17 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
if changed_detected or not watch.history_n:
if update_handler.screenshot:
watch.save_screenshot(screenshot=update_handler.screenshot)
# Free screenshot memory immediately after saving
update_handler.screenshot = None
if hasattr(update_handler, 'fetcher') and hasattr(update_handler.fetcher, 'screenshot'):
update_handler.fetcher.screenshot = None
if update_handler.xpath_data:
watch.save_xpath_data(data=update_handler.xpath_data)
# Free xpath data memory
update_handler.xpath_data = None
if hasattr(update_handler, 'fetcher') and hasattr(update_handler.fetcher, 'xpath_data'):
update_handler.fetcher.xpath_data = None
# Ensure unique timestamp for history
if watch.newest_history_key and int(fetch_start_time) == int(watch.newest_history_key):
@@ -424,6 +456,20 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
update_handler.fetcher.clear_content()
logger.debug(f"Cleared fetcher content for UUID {uuid}")
# Explicitly delete update_handler to free all references
if update_handler:
del update_handler
update_handler = None
# Force aggressive memory cleanup after clearing
import gc
gc.collect()
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
except Exception as e:
logger.error(f"Worker {worker_id} unexpected error processing {uuid}: {e}")
logger.error(f"Worker {worker_id} traceback:", exc_info=True)
@@ -488,6 +534,19 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
# Small yield for normal completion
await asyncio.sleep(0.01)
# Job completed - increment counter and check restart conditions
jobs_processed += 1
runtime = time.time() - start_time
# Check if we should restart (only when idle, between jobs)
should_restart_jobs = jobs_processed >= max_jobs
should_restart_time = runtime >= max_runtime_seconds
if should_restart_jobs or should_restart_time:
reason = f"{jobs_processed} jobs" if should_restart_jobs else f"{runtime:.0f}s runtime"
logger.info(f"Worker {worker_id} restarting after {reason} ({jobs_processed} jobs, {runtime:.0f}s runtime)")
return "restart"
# Check if we should exit
if app.config.exit.is_set():
break
@@ -495,10 +554,12 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
# Check if we're in pytest environment - if so, be more gentle with logging
import sys
in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
if not in_pytest:
logger.info(f"Worker {worker_id} shutting down")
return "shutdown"
def cleanup_error_artifacts(uuid, datastore):
"""Helper function to clean up error artifacts"""

View File

@@ -25,9 +25,7 @@
<li class="tab"><a href="#ui-options">{{ _('UI Options') }}</a></li>
<li class="tab"><a href="#api">{{ _('API') }}</a></li>
<li class="tab"><a href="#rss">{{ _('RSS') }}</a></li>
<li class="pure-menu-item menu-collapsible {% if request.endpoint.startswith('backups.') %}active{% endif %}">
<a href="{{ url_for('backups.index') }}" class="pure-menu-link">{{ _('Backups') }}</a>
</li>
<li class="tab"><a href="{{ url_for('backups.index') }}" class="pure-menu-link">{{ _('Backups') }}</a></li>
<li class="tab"><a href="#timedate">{{ _('Time & Date') }}</a></li>
<li class="tab"><a href="#proxies">{{ _('CAPTCHA & Proxies') }}</a></li>
{% if plugin_tabs %}
@@ -56,9 +54,9 @@
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.filter_failure_notification_threshold_attempts, class="filter_failure_notification_threshold_attempts") }}
<span class="pure-form-message-inline">After this many consecutive times that the CSS/xPath filter is missing, send a notification
<span class="pure-form-message-inline">{{ _('After this many consecutive times that the CSS/xPath filter is missing, send a notification') }}
<br>
Set to <strong>0</strong> to disable
{{ _('Set to') }} <strong>0</strong> {{ _('to disable') }}
</span>
</div>
<div class="pure-control-group">
@@ -67,21 +65,20 @@
{{ render_button(form.application.form.removepassword_button) }}
{% else %}
{{ render_field(form.application.form.password) }}
<span class="pure-form-message-inline">Password protection for your changedetection.io application.</span>
<span class="pure-form-message-inline">{{ _('Password protection for your changedetection.io application.') }}</span>
{% endif %}
{% else %}
<span class="pure-form-message-inline">Password is locked.</span>
<span class="pure-form-message-inline">{{ _('Password is locked.') }}</span>
{% endif %}
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.shared_diff_access, class="shared_diff_access") }}
<span class="pure-form-message-inline">Allow access to the watch change history page when password is enabled (Good for sharing the diff page)
</span>
<span class="pure-form-message-inline">{{ _('Allow access to the watch change history page when password is enabled (Good for sharing the diff page)') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.empty_pages_are_a_change) }}
<span class="pure-form-message-inline">When a request returns no content, or the HTML does not contain any text, is this considered a change?</span>
<span class="pure-form-message-inline">{{ _('When a request returns no content, or the HTML does not contain any text, is this considered a change?') }}</span>
</div>
</fieldset>
</div>
@@ -93,8 +90,8 @@
<div class="pure-control-group" id="notification-base-url">
{{ render_field(form.application.form.base_url, class="m-d") }}
<span class="pure-form-message-inline">
Base URL used for the <code>{{ '{{ base_url }}' }}</code> token in notification links.<br>
Default value is the system environment variable '<code>BASE_URL</code>' - <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Configurable-BASE_URL-setting">read more here</a>.
{{ _('Base URL used for the') }} <code>{{ '{{ base_url }}' }}</code> {{ _('token in notification links.') }}<br>
{{ _('Default value is the system environment variable') }} '<code>BASE_URL</code>' - <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Configurable-BASE_URL-setting">{{ _('read more here') }}</a>.
</span>
</div>
</div>
@@ -103,15 +100,15 @@
<div class="pure-control-group inline-radio">
{{ render_field(form.application.form.fetch_backend, class="fetch-backend") }}
<span class="pure-form-message-inline">
<p>Use the <strong>Basic</strong> method (default) where your watched sites don't need Javascript to render.</p>
<p>The <strong>Chrome/Javascript</strong> method requires a network connection to a running WebDriver+Chrome server, set by the ENV var 'WEBDRIVER_URL'. </p>
<p>{{ _('Use the') }} <strong>{{ _('Basic') }}</strong> {{ _('method (default) where your watched sites don\'t need Javascript to render.') }}</p>
<p>{{ _('The') }} <strong>{{ _('Chrome/Javascript') }}</strong> {{ _('method requires a network connection to a running WebDriver+Chrome server, set by the ENV var') }} 'WEBDRIVER_URL'. </p>
</span>
</div>
<fieldset class="pure-group" id="webdriver-override-options" data-visible-for="application-fetch_backend=html_webdriver">
<div class="pure-form-message-inline">
<strong>If you're having trouble waiting for the page to be fully rendered (text missing etc), try increasing the 'wait' time here.</strong>
<strong>{{ _('If you\'re having trouble waiting for the page to be fully rendered (text missing etc), try increasing the \'wait\' time here.') }}</strong>
<br>
This will wait <i>n</i> seconds before extracting the text.
{{ _('This will wait') }} <i>n</i> {{ _('seconds before extracting the text.') }}
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.webdriver_delay) }}
@@ -120,27 +117,27 @@
<div class="pure-control-group">
{{ render_field(form.requests.form.workers) }}
{% set worker_info = get_worker_status_info() %}
<span class="pure-form-message-inline">Number of concurrent workers to process watches. More workers = faster processing but higher memory usage.<br>
Currently running: <strong>{{ worker_info.count }}</strong> operational {{ worker_info.type }} workers{% if worker_info.active_workers > 0 %} ({{ worker_info.active_workers }} actively processing){% endif %}.</span>
<span class="pure-form-message-inline">{{ _('Number of concurrent workers to process watches. More workers = faster processing but higher memory usage.') }}<br>
{{ _('Currently running:') }} <strong>{{ worker_info.count }}</strong> {{ _('operational') }} {{ worker_info.type }} {{ _('workers') }}{% if worker_info.active_workers > 0 %} ({{ worker_info.active_workers }} {{ _('actively processing') }}){% endif %}.</span>
</div>
<div class="pure-control-group">
{{ render_field(form.requests.form.jitter_seconds, class="jitter_seconds") }}
<span class="pure-form-message-inline">Example - 3 seconds random jitter could trigger up to 3 seconds earlier or up to 3 seconds later</span>
<span class="pure-form-message-inline">{{ _('Example - 3 seconds random jitter could trigger up to 3 seconds earlier or up to 3 seconds later') }}</span>
</div>
<div class="pure-control-group">
{{ render_field(form.requests.form.timeout) }}
<span class="pure-form-message-inline">For regular plain requests (not chrome based), maximum number of seconds until timeout, 1-999.</span><br>
<span class="pure-form-message-inline">{{ _('For regular plain requests (not chrome based), maximum number of seconds until timeout, 1-999.') }}</span><br>
</div>
<div class="pure-control-group inline-radio">
{{ render_field(form.requests.form.default_ua) }}
<span class="pure-form-message-inline">
Applied to all requests.<br><br>
Note: Simply changing the User-Agent often does not defeat anti-robot technologies, it's important to consider <a href="https://changedetection.io/tutorial/what-are-main-types-anti-robot-mechanisms">all of the ways that the browser is detected</a>.
{{ _('Applied to all requests.') }}<br><br>
{{ _('Note: Simply changing the User-Agent often does not defeat anti-robot technologies, it\'s important to consider') }} <a href="https://changedetection.io/tutorial/what-are-main-types-anti-robot-mechanisms">{{ _('all of the ways that the browser is detected') }}</a>.
</span>
</div>
<div class="pure-control-group">
<br>
Tip: <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">Connect using Bright Data and Oxylabs Proxies, find out more here.</a>
{{ _('Tip:') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">{{ _('Connect using Bright Data and Oxylabs Proxies, find out more here.') }}</a>
</div>
</div>
@@ -149,15 +146,15 @@
<fieldset class="pure-group">
{{ render_checkbox_field(form.application.form.ignore_whitespace) }}
<span class="pure-form-message-inline">Ignore whitespace, tabs and new-lines/line-feeds when considering if a change was detected.<br>
<i>Note:</i> Changing this will change the status of your existing watches, possibly trigger alerts etc.
<span class="pure-form-message-inline">{{ _('Ignore whitespace, tabs and new-lines/line-feeds when considering if a change was detected.') }}<br>
<i>{{ _('Note:') }}</i> {{ _('Changing this will change the status of your existing watches, possibly trigger alerts etc.') }}
</span>
</fieldset>
<fieldset class="pure-group">
{{ render_checkbox_field(form.application.form.render_anchor_tag_content) }}
<span class="pure-form-message-inline">Render anchor tag content, default disabled, when enabled renders links as <code>(link text)[https://somesite.com]</code>
<span class="pure-form-message-inline">{{ _('Render anchor tag content, default disabled, when enabled renders links as') }} <code>(link text)[https://somesite.com]</code>
<br>
<i>Note:</i> Changing this could affect the content of your existing watches, possibly trigger alerts etc.
<i>{{ _('Note:') }}</i> {{ _('Changing this could affect the content of your existing watches, possibly trigger alerts etc.') }}
</span>
</fieldset>
<fieldset class="pure-group">
@@ -168,9 +165,9 @@ nav
//*[contains(text(), 'Advertisement')]") }}
<span class="pure-form-message-inline">
<ul>
<li> Remove HTML element(s) by CSS and XPath selectors before text conversion. </li>
<li> Don't paste HTML here, use only CSS and XPath selectors </li>
<li> Add multiple elements, CSS or XPath selectors per line to ignore multiple parts of the HTML. </li>
<li> {{ _('Remove HTML element(s) by CSS and XPath selectors before text conversion.') }} </li>
<li> {{ _('Don\'t paste HTML here, use only CSS and XPath selectors') }} </li>
<li> {{ _('Add multiple elements, CSS or XPath selectors per line to ignore multiple parts of the HTML.') }} </li>
</ul>
</span>
</fieldset>
@@ -178,50 +175,50 @@ nav
{{ render_field(form.application.form.global_ignore_text, rows=5, placeholder="Some text to ignore in a line
/some.regex\d{2}/ for case-INsensitive regex
") }}
<span class="pure-form-message-inline">Note: This is applied globally in addition to the per-watch rules.</span><br>
<span class="pure-form-message-inline">{{ _('Note: This is applied globally in addition to the per-watch rules.') }}</span><br>
<span class="pure-form-message-inline">
<ul>
<li>Matching text will be <strong>ignored</strong> in the text snapshot (you can still see it but it wont trigger a change)</li>
<li>Note: This is applied globally in addition to the per-watch rules.</li>
<li>Each line processed separately, any line matching will be ignored (removed before creating the checksum)</li>
<li>Regular Expression support, wrap the entire line in forward slash <code>/regex/</code></li>
<li>Changing this will affect the comparison checksum which may trigger an alert</li>
<li>{{ _('Matching text will be') }} <strong>{{ _('ignored') }}</strong> {{ _('in the text snapshot (you can still see it but it wont trigger a change)') }}</li>
<li>{{ _('Note: This is applied globally in addition to the per-watch rules.') }}</li>
<li>{{ _('Each line processed separately, any line matching will be ignored (removed before creating the checksum)') }}</li>
<li>{{ _('Regular Expression support, wrap the entire line in forward slash') }} <code>/regex/</code></li>
<li>{{ _('Changing this will affect the comparison checksum which may trigger an alert') }}</li>
</ul>
</span>
</fieldset>
<fieldset class="pure-group">
{{ render_checkbox_field(form.application.form.strip_ignored_lines) }}
<span class="pure-form-message-inline">Remove any text that appears in the "Ignore text" from the output (otherwise its just ignored for change-detection)<br>
<i>Note:</i> Changing this will change the status of your existing watches, possibly trigger alerts etc.
<span class="pure-form-message-inline">{{ _('Remove any text that appears in the "Ignore text" from the output (otherwise its just ignored for change-detection)') }}<br>
<i>{{ _('Note:') }}</i> {{ _('Changing this will change the status of your existing watches, possibly trigger alerts etc.') }}
</span>
</fieldset>
</div>
<div class="tab-pane-inner" id="api">
<h4>API Access</h4>
<p>Drive your changedetection.io via API, More about <a href="https://changedetection.io/docs/api_v1/index.html">API access and examples here</a>.</p>
<h4>{{ _('API Access') }}</h4>
<p>{{ _('Drive your changedetection.io via API, More about') }} <a href="https://changedetection.io/docs/api_v1/index.html">{{ _('API access and examples here') }}</a>.</p>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.api_access_token_enabled) }}
<div class="pure-form-message-inline">Restrict API access limit by using <code>x-api-key</code> header - required for the Chrome Extension to work</div><br>
<div class="pure-form-message-inline"><br>API Key <span id="api-key">{{api_key}}</span>
<span style="display:none;" id="api-key-copy" >copy</span>
<div class="pure-form-message-inline">{{ _('Restrict API access limit by using') }} <code>x-api-key</code> {{ _('header - required for the Chrome Extension to work') }}</div><br>
<div class="pure-form-message-inline"><br>{{ _('API Key') }} <span id="api-key">{{api_key}}</span>
<span style="display:none;" id="api-key-copy" >{{ _('copy') }}</span>
</div>
</div>
<div class="pure-control-group">
<a href="{{url_for('settings.settings_reset_api_key')}}" class="pure-button button-small button-cancel">Regenerate API key</a>
<a href="{{url_for('settings.settings_reset_api_key')}}" class="pure-button button-small button-cancel">{{ _('Regenerate API key') }}</a>
</div>
<div class="pure-control-group">
<h4>Chrome Extension</h4>
<p>Easily add any web-page to your changedetection.io installation from within Chrome.</p>
<strong>Step 1</strong> Install the extension, <strong>Step 2</strong> Navigate to this page,
<strong>Step 3</strong> Open the extension from the toolbar and click "<i>Sync API Access</i>"
<h4>{{ _('Chrome Extension') }}</h4>
<p>{{ _('Easily add any web-page to your changedetection.io installation from within Chrome.') }}</p>
<strong>{{ _('Step 1') }}</strong> {{ _('Install the extension,') }} <strong>{{ _('Step 2') }}</strong> {{ _('Navigate to this page,') }}
<strong>{{ _('Step 3') }}</strong> {{ _('Open the extension from the toolbar and click') }} "<i>{{ _('Sync API Access') }}</i>"
<p>
<a id="chrome-extension-link"
title="Try our new Chrome Extension!"
title="{{ _('Try our new Chrome Extension!') }}"
href="https://chromewebstore.google.com/detail/changedetectionio-website/kefcfmgmlhmankjmnbijimhofdjekbop">
<img alt="Chrome store icon" src="{{ url_for('static_content', group='images', filename='google-chrome-icon.png') }}" >
Chrome Webstore
<img alt="{{ _('Chrome store icon') }}" src="{{ url_for('static_content', group='images', filename='google-chrome-icon.png') }}" >
{{ _('Chrome Webstore') }}
</a>
</p>
</div>
@@ -232,20 +229,20 @@ nav
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.rss_diff_length) }}
<span class="pure-form-message-inline">Maximum number of history snapshots to include in the watch specific RSS feed.</span>
<span class="pure-form-message-inline">{{ _('Maximum number of history snapshots to include in the watch specific RSS feed.') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.rss_reader_mode) }}
<span class="pure-form-message-inline">For watching other RSS feeds - When watching RSS/Atom feeds, convert them into clean text for better change detection.</span>
<span class="pure-form-message-inline">{{ _('For watching other RSS feeds - When watching RSS/Atom feeds, convert them into clean text for better change detection.') }}</span>
</div>
<div class="pure-control-group grey-form-border">
<div class="pure-control-group">
{{ render_field(form.application.form.rss_content_format) }}
<span class="pure-form-message-inline">Does your reader support HTML? Set it here</span>
<span class="pure-form-message-inline">{{ _('Does your reader support HTML? Set it here') }}</span>
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.rss_template_type) }}
<span class="pure-form-message-inline">'System default' for the same template for all items, or re-use your "Notification Body" as the template.</span>
<span class="pure-form-message-inline">{{ _('\'System default\' for the same template for all items, or re-use your "Notification Body" as the template.') }}</span>
</div>
<div>
{{ render_field(form.application.form.rss_template_override) }}
@@ -258,11 +255,11 @@ nav
</div>
<div class="tab-pane-inner" id="timedate">
<div class="pure-control-group">
Ensure the settings below are correct, they are used to manage the time schedule for checking your web page watches.
{{ _('Ensure the settings below are correct, they are used to manage the time schedule for checking your web page watches.') }}
</div>
<div class="pure-control-group">
<p><strong>UTC Time &amp; Date from Server:</strong> <span id="utc-time" >{{ utc_time }}</span></p>
<p><strong>Local Time &amp; Date in Browser:</strong> <span class="local-time" data-utc="{{ utc_time }}"></span></p>
<p><strong>{{ _('UTC Time & Date from Server:') }}</strong> <span id="utc-time" >{{ utc_time }}</span></p>
<p><strong>{{ _('Local Time & Date in Browser:') }}</strong> <span class="local-time" data-utc="{{ utc_time }}"></span></p>
<div>
{{ render_field(form.application.form.scheduler_timezone_default) }}
<datalist id="timezones" style="display: none;">
@@ -274,22 +271,22 @@ nav
<div class="tab-pane-inner" id="ui-options">
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.form.open_diff_in_new_tab, class="open_diff_in_new_tab") }}
<span class="pure-form-message-inline">Enable this setting to open the diff page in a new tab. If disabled, the diff page will open in the current tab.</span>
<span class="pure-form-message-inline">{{ _('Enable this setting to open the diff page in a new tab. If disabled, the diff page will open in the current tab.') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.form.socket_io_enabled, class="socket_io_enabled") }}
<span class="pure-form-message-inline">Realtime UI Updates Enabled - (Restart required if this is changed)</span>
<span class="pure-form-message-inline">{{ _('Realtime UI Updates Enabled - (Restart required if this is changed)') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.form.favicons_enabled, class="") }}
<span class="pure-form-message-inline">Enable or Disable Favicons next to the watch list</span>
<span class="pure-form-message-inline">{{ _('Enable or Disable Favicons next to the watch list') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.use_page_title_in_list) }}
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.pager_size) }}
<span class="pure-form-message-inline">Number of items per page in the watch overview list, 0 to disable.</span>
<span class="pure-form-message-inline">{{ _('Number of items per page in the watch overview list, 0 to disable.') }}</span>
</div>
</div>
@@ -337,18 +334,18 @@ nav
</div>
</div>
<p><strong>Tip</strong>: "Residential" and "Mobile" proxy type can be more successfull than "Data Center" for blocked websites.</p>
<p><strong>{{ _('Tip') }}</strong>: {{ _('"Residential" and "Mobile" proxy type can be more successfull than "Data Center" for blocked websites.') }}</p>
<div class="pure-control-group" id="extra-proxies-setting">
{{ render_fieldlist_with_inline_errors(form.requests.form.extra_proxies) }}
<span class="pure-form-message-inline">"Name" will be used for selecting the proxy in the Watch Edit settings</span><br>
<span class="pure-form-message-inline">SOCKS5 proxies with authentication are only supported with 'plain requests' fetcher, for other fetchers you should whitelist the IP access instead</span>
<span class="pure-form-message-inline">{{ _('"Name" will be used for selecting the proxy in the Watch Edit settings') }}</span><br>
<span class="pure-form-message-inline">{{ _('SOCKS5 proxies with authentication are only supported with \'plain requests\' fetcher, for other fetchers you should whitelist the IP access instead') }}</span>
{% if form.requests.proxy %}
<div>
<br>
<div class="inline-radio">
{{ render_field(form.requests.form.proxy, class="fetch-backend-proxy") }}
<span class="pure-form-message-inline">Choose a default proxy for all watches</span>
<span class="pure-form-message-inline">{{ _('Choose a default proxy for all watches') }}</span>
</div>
</div>
{% endif %}

View File

@@ -1,6 +1,6 @@
import time
import threading
from flask import Blueprint, request, redirect, url_for, flash, render_template, session
from flask import Blueprint, request, redirect, url_for, flash, render_template, session, current_app
from flask_babel import gettext
from loguru import logger
@@ -404,4 +404,25 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
return redirect(url_for('watchlist.index'))
@ui_blueprint.route("/language/auto-detect", methods=['GET'])
def delete_locale_language_session_var_if_it_exists():
"""Clear the session locale preference to auto-detect from browser Accept-Language header"""
if 'locale' in session:
session.pop('locale', None)
# Refresh Flask-Babel to clear cached locale
from flask_babel import refresh
refresh()
flash(gettext("Language set to auto-detect from browser"))
# Check if there's a redirect parameter to return to the same page
redirect_url = request.args.get('redirect')
# If redirect is provided and safe, use it
from changedetectionio.is_safe_url import is_safe_url
if redirect_url and is_safe_url(redirect_url, current_app):
return redirect(redirect_url)
# Otherwise redirect to watchlist
return redirect(url_for('watchlist.index'))
return ui_blueprint

View File

@@ -3,6 +3,7 @@ import time
from flask import Blueprint, request, make_response, render_template, redirect, url_for, flash, session
from flask_paginate import Pagination, get_page_parameter
from flask_babel import gettext as _
from changedetectionio import forms
from changedetectionio import processors
@@ -73,7 +74,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
pagination = Pagination(page=page,
total=total_count,
per_page=datastore.data['settings']['application'].get('pager_size', 50), css_framework="semantic")
per_page=datastore.data['settings']['application'].get('pager_size', 50),
css_framework="semantic",
display_msg=_('displaying <b>{start} - {end}</b> {record_name} in total <b>{total}</b>'),
record_name=_('records'))
sorted_tags = sorted(datastore.data['settings']['application'].get('tags').items(), key=lambda x: x[1]['title'])

View File

@@ -62,7 +62,7 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
{{ render_nolabel_field(form.edit_and_watch_submit_button, title=_("Edit first then Watch") ) }}
</div>
<div id="watch-group-tag">
{{ render_field(form.tags, value=active_tag.title if active_tag_uuid else '', placeholder="Watch group / tag", class="transparent-field") }}
{{ render_field(form.tags, value=active_tag.title if active_tag_uuid else '', placeholder=_("Watch group / tag"), class="transparent-field") }}
</div>
<div id="quick-watch-processor-type">
{{ render_simple_field(form.processor) }}

View File

@@ -71,10 +71,19 @@ class Fetcher():
supports_screenshots = False # Can capture page screenshots
supports_xpath_element_data = False # Can extract xpath element positions/data for visual selector
# Screenshot element locking - prevents layout shifts during screenshot capture
# Only needed for visual comparison (image_ssim_diff processor)
# Locks element dimensions in the first viewport to prevent headers/ads from resizing
lock_viewport_elements = False # Default: disabled for performance
def __init__(self, **kwargs):
if kwargs and 'screenshot_format' in kwargs:
self.screenshot_format = kwargs.get('screenshot_format')
# Allow lock_viewport_elements to be set via kwargs
if kwargs and 'lock_viewport_elements' in kwargs:
self.lock_viewport_elements = kwargs.get('lock_viewport_elements')
@classmethod
def get_status_icon_data(cls):

View File

@@ -10,18 +10,20 @@ from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, vi
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable
async def capture_full_page_async(page, screenshot_format='JPEG'):
async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import os
import time
import multiprocessing
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport_size
dimensions_time = time.time() - setup_start
logger.debug(f"Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width}")
logger.debug(f"{watch_info}Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
# Use an approach similar to puppeteer: set a larger viewport and take screenshots in chunks
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD # Size that won't cause GPU to overflow
@@ -29,25 +31,31 @@ async def capture_full_page_async(page, screenshot_format='JPEG'):
y = 0
elements_locked = False
if page_height > page.viewport_size['height']:
# Lock all element dimensions BEFORE screenshot to prevent CSS media queries from resizing
# capture_full_page_async() changes viewport height which triggers @media (min-height) rules
# Only lock viewport elements if explicitly enabled (for image_ssim_diff processor)
# This prevents headers/ads from resizing when viewport changes
if lock_viewport_elements and page_height > page.viewport_size['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'lock-elements-sizing.js')
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
await page.evaluate(lock_elements_js)
elements_locked = True
lock_time = time.time() - lock_start
logger.debug(f"{watch_info}Viewport element locking enabled (took {lock_time:.2f}s)")
logger.debug("Element dimensions locked before screenshot capture")
if page_height > page.viewport_size['height']:
if page_height < step_size:
step_size = page_height # Incase page is bigger than default viewport but smaller than proposed step size
logger.debug(f"Setting bigger viewport to step through large page width W{page.viewport_size['width']}xH{step_size} because page_height > viewport_size")
viewport_start = time.time()
logger.debug(f"{watch_info}Setting bigger viewport to step through large page width W{page.viewport_size['width']}xH{step_size} because page_height > viewport_size")
# Set viewport to a larger size to capture more content at once
await page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
viewport_time = time.time() - viewport_start
logger.debug(f"{watch_info}Viewport changed to {page.viewport_size['width']}x{step_size} (took {viewport_time:.2f}s)")
# Capture screenshots in chunks up to the max total height
capture_start = time.time()
chunk_times = []
# Use PNG for better quality (no compression artifacts), JPEG for smaller size
screenshot_type = screenshot_format.lower() if screenshot_format else 'jpeg'
# PNG should use quality 100, JPEG uses configurable quality
@@ -69,7 +77,11 @@ async def capture_full_page_async(page, screenshot_format='JPEG'):
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
# Restore original viewport size
@@ -81,40 +93,54 @@ async def capture_full_page_async(page, screenshot_format='JPEG'):
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
logger.debug("Element dimensions unlocked after screenshot capture")
logger.debug(f"{watch_info}Element dimensions unlocked after screenshot capture")
capture_time = time.time() - capture_start
total_capture_time = sum(chunk_times)
logger.debug(f"{watch_info}All {len(screenshot_chunks)} chunks captured in {capture_time:.2f}s (total chunk time: {total_capture_time:.2f}s)")
# If we have multiple chunks, stitch them together
if len(screenshot_chunks) > 1:
logger.debug(f"Screenshot stitching {len(screenshot_chunks)} chunks together")
stitch_start = time.time()
logger.debug(f"{watch_info}Starting stitching of {len(screenshot_chunks)} chunks")
# For small number of chunks (2-3), stitch inline to avoid multiprocessing overhead
# Only use separate process for many chunks (4+) to avoid blocking the event loop
if len(screenshot_chunks) <= 3:
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_inline
screenshot = stitch_images_inline(screenshot_chunks, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT)
else:
# Use separate process for many chunks to avoid blocking
# Always use spawn for thread safety - consistent behavior in tests and production
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker, args=(child_conn, screenshot_chunks, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
screenshot = parent_conn.recv_bytes()
p.join()
# Explicit cleanup
del p
del parent_conn, child_conn
# Always use spawn subprocess for ANY stitching (2+ chunks)
# PIL allocates at C level and Python GC never releases it - subprocess exit forces OS to reclaim
# Trade-off: 35MB resource_tracker vs 500MB+ PIL leak in main process
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
# Send via raw bytes (no pickle)
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"Screenshot (chunked/stitched) - Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
# Explicit cleanup
del screenshot_chunks
screenshot_chunks = None
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
setup_time = total_time - capture_time
logger.debug(
f"Screenshot Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
@@ -184,7 +210,8 @@ class fetcher(Fetcher):
async def screenshot_step(self, step_n=''):
super().screenshot_step(step_n=step_n)
screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format)
watch_uuid = getattr(self, 'watch_uuid', None)
screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Request GC immediately after screenshot to free memory
# Screenshots can be large and browser steps take many of them
@@ -233,6 +260,7 @@ class fetcher(Fetcher):
import playwright._impl._errors
import time
self.delete_browser_steps_screenshots()
self.watch_uuid = watch_uuid # Store for use in screenshot_step
response = None
async with async_playwright() as p:
@@ -318,7 +346,7 @@ class fetcher(Fetcher):
logger.error(f"Error fetching FavIcon info {str(e)}, continuing.")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format)
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Cleanup before raising to prevent memory leak
await self.page.close()
await context.close()
@@ -374,7 +402,17 @@ class fetcher(Fetcher):
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
# The actual screenshot - this always base64 and needs decoding! horrible! huge CPU usage
self.screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format)
self.screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Force aggressive memory cleanup - screenshots are large and base64 decode creates temporary buffers
await self.page.request_gc()
gc.collect()
# Release C-level memory from base64 decode back to OS
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
except ScreenshotUnavailable:
# Re-raise screenshot unavailable exceptions

View File

@@ -20,18 +20,20 @@ from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
async def capture_full_page(page, screenshot_format='JPEG'):
async def capture_full_page(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import os
import time
import multiprocessing
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport
dimensions_time = time.time() - setup_start
logger.debug(f"Puppeteer viewport size {page.viewport} page height {page_height} page width {page_width}")
logger.debug(f"{watch_info}Puppeteer viewport size {page.viewport} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
# Bug 3 in Playwright screenshot handling
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
@@ -50,20 +52,35 @@ async def capture_full_page(page, screenshot_format='JPEG'):
screenshot_chunks = []
y = 0
elements_locked = False
if page_height > page.viewport['height']:
# Lock all element dimensions BEFORE screenshot to prevent CSS media queries from resizing
# capture_full_page() changes viewport height which triggers @media (min-height) rules
# Only lock viewport elements if explicitly enabled (for image_ssim_diff processor)
# This prevents headers/ads from resizing when viewport changes
if lock_viewport_elements and page_height > page.viewport['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'lock-elements-sizing.js')
file_read_start = time.time()
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
await page.evaluate(lock_elements_js)
elements_locked = True
logger.debug("Element dimensions locked before screenshot capture")
file_read_time = time.time() - file_read_start
evaluate_start = time.time()
await page.evaluate(lock_elements_js)
evaluate_time = time.time() - evaluate_start
elements_locked = True
lock_time = time.time() - lock_start
logger.debug(f"{watch_info}Viewport element locking enabled - File read: {file_read_time:.3f}s, Browser evaluate: {evaluate_time:.2f}s, Total: {lock_time:.2f}s")
if page_height > page.viewport['height']:
if page_height < step_size:
step_size = page_height # Incase page is bigger than default viewport but smaller than proposed step size
viewport_start = time.time()
await page.setViewport({'width': page.viewport['width'], 'height': step_size})
viewport_time = time.time() - viewport_start
logger.debug(f"{watch_info}Viewport changed to {page.viewport['width']}x{step_size} (took {viewport_time:.2f}s)")
capture_start = time.time()
chunk_times = []
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
# better than scrollTo incase they override it in the page
await page.evaluate(
@@ -82,7 +99,11 @@ async def capture_full_page(page, screenshot_format='JPEG'):
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
await page.setViewport({'width': original_viewport['width'], 'height': original_viewport['height']})
@@ -93,26 +114,53 @@ async def capture_full_page(page, screenshot_format='JPEG'):
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
logger.debug("Element dimensions unlocked after screenshot capture")
logger.debug(f"{watch_info}Element dimensions unlocked after screenshot capture")
capture_time = time.time() - capture_start
total_capture_time = sum(chunk_times)
logger.debug(f"{watch_info}All {len(screenshot_chunks)} chunks captured in {capture_time:.2f}s (total chunk time: {total_capture_time:.2f}s)")
if len(screenshot_chunks) > 1:
# Always use spawn for thread safety - consistent behavior in tests and production
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker
logger.debug(f"Screenshot stitching {len(screenshot_chunks)} chunks together")
stitch_start = time.time()
logger.debug(f"{watch_info}Starting stitching of {len(screenshot_chunks)} chunks")
# Always use spawn subprocess for ANY stitching (2+ chunks)
# PIL allocates at C level and Python GC never releases it - subprocess exit forces OS to reclaim
# Trade-off: 35MB resource_tracker vs 500MB+ PIL leak in main process
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker, args=(child_conn, screenshot_chunks, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
# Send via raw bytes (no pickle)
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
logger.debug(
f"Screenshot (chunked/stitched) - Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
screenshot_chunks = None
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
setup_time = total_time - capture_time
logger.debug(
f"Screenshot Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
@@ -357,7 +405,7 @@ class fetcher(Fetcher):
logger.error(f"Error fetching FavIcon info {str(e)}, continuing.")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format)
screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
@@ -387,7 +435,17 @@ class fetcher(Fetcher):
# Now take screenshot (scrolling may trigger layout changes, but measurements are already captured)
logger.debug(f"Screenshot format {self.screenshot_format}")
self.screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format)
self.screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Force aggressive memory cleanup - pyppeteer base64 decode creates temporary buffers
import gc
gc.collect()
# Release C-level memory from base64 decode back to OS
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT

View File

@@ -1,5 +1,5 @@
/**
* Lock Element Dimensions for Screenshot Capture
* Lock Element Dimensions for Screenshot Capture (First Viewport Only)
*
* THE PROBLEM:
* When taking full-page screenshots of tall pages, Chrome/Puppeteer/Playwright need to:
@@ -10,40 +10,31 @@
* However, changing the viewport height triggers CSS media queries like:
* @media (min-height: 860px) { .ad { height: 250px; } }
*
* This causes elements (especially ads) to resize during screenshot capture, creating a mismatch:
* - Screenshot shows element at NEW size (after media query triggered)
* - xpath element coordinates measured at OLD size (before viewport change)
* - Visual selector overlays don't align with screenshot
*
* EXAMPLE BUG:
* - Initial viewport: 1280x800, ad height: 138px, article position: 279px ✓
* - Viewport changes to 1280x3809 for screenshot
* - Media query triggers: ad expands to 250px
* - All content below shifts down by 112px (250-138)
* - Article now at position: 391px (279+112)
* - But xpath data says 279px → 112px mismatch! ✗
* This causes elements (especially ads/headers) to resize during screenshot capture.
*
* THE SOLUTION:
* Before changing viewport, lock ALL element dimensions with !important inline styles.
* Inline styles with !important override media query CSS, preventing layout changes.
* Lock element dimensions in the FIRST VIEWPORT ONLY with !important inline styles.
* This prevents headers, navigation, and top ads from resizing when viewport changes.
* We only lock the visible portion because:
* - Most layout shifts happen in headers/navbars/top ads
* - Locking only visible elements is 100x+ faster (100-200 elements vs 10,000+)
* - Below-fold content shifts don't affect visual comparison accuracy
*
* WHAT THIS SCRIPT DOES:
* 1. Iterates through every element on the page
* 2. Captures current computed dimensions (width, height)
* 3. Sets inline styles with !important to freeze those dimensions
* 1. Gets current viewport height
* 2. Finds elements within first viewport (top of page to bottom of screen)
* 3. Locks their dimensions with !important inline styles
* 4. Disables ResizeObserver API (for JS-based resizing)
* 5. When viewport changes for screenshot, media queries can't resize anything
* 6. Layout remains consistent → xpath coordinates match screenshot ✓
*
* USAGE:
* Execute this script BEFORE calling capture_full_page() / screenshot functions.
* The page must be fully loaded and settled at its initial viewport size.
* No need to restore state afterward - page is closed after screenshot.
* Only enabled for image_ssim_diff processor (visual comparison).
* Default: OFF for performance.
*
* PERFORMANCE:
* - Iterates all DOM elements (can be 1000s on complex pages)
* - Typically completes in 50-200ms
* - One-time cost before screenshot, well worth it for coordinate accuracy
* - Only processes 100-300 elements (first viewport) vs 10,000+ (entire page)
* - Typically completes in 10-50ms
* - 100x+ faster than locking entire page
*
* @see https://github.com/dgtlmoon/changedetection.io/issues/XXXX
*/
@@ -52,11 +43,34 @@
// Store original styles in a global WeakMap for later restoration
window.__elementSizingRestore = new WeakMap();
// Lock ALL element dimensions to prevent media query layout changes
document.querySelectorAll('*').forEach(el => {
const computed = window.getComputedStyle(el);
const rect = el.getBoundingClientRect();
const start = performance.now();
// Get current viewport height (visible portion of page)
const viewportHeight = window.innerHeight;
// Get all elements and filter to FIRST VIEWPORT ONLY
// This dramatically reduces elements to process (100-300 vs 10,000+)
const allElements = Array.from(document.querySelectorAll('*'));
// BATCH READ PHASE: Get bounding rects and filter to viewport
const measurements = allElements.map(el => {
const rect = el.getBoundingClientRect();
const computed = window.getComputedStyle(el);
// Only lock elements in the first viewport (visible on initial page load)
// rect.top < viewportHeight means element starts within visible area
const inViewport = rect.top < viewportHeight && rect.top >= 0;
const hasSize = rect.height > 0 && rect.width > 0;
return inViewport && hasSize ? { el, computed, rect } : null;
}).filter(Boolean); // Remove null entries
const elapsed = performance.now() - start;
console.log(`Locked first viewport elements: ${measurements.length} of ${allElements.length} total elements (viewport height: ${viewportHeight}px, took ${elapsed.toFixed(0)}ms)`);
// BATCH WRITE PHASE: Apply all inline styles without triggering layout
// No interleaved reads means browser can optimize style application
measurements.forEach(({el, computed, rect}) => {
// Save original inline style values BEFORE locking
const properties = ['height', 'min-height', 'max-height', 'width', 'min-width', 'max-width'];
const originalStyles = {};
@@ -89,5 +103,5 @@
disconnect() {}
};
console.log('✓ Element dimensions locked to prevent media query changes during screenshot');
console.log(`✓ Element dimensions locked (${measurements.length} elements) to prevent media query changes during screenshot`);
})();

View File

@@ -8,92 +8,42 @@ from loguru import logger
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, SCREENSHOT_DEFAULT_QUALITY
# Cache font to avoid loading on every stitch
_cached_font = None
def _get_caption_font():
"""Get or create cached font for caption text."""
global _cached_font
if _cached_font is None:
from PIL import ImageFont
try:
_cached_font = ImageFont.truetype("arial.ttf", 35)
except IOError:
_cached_font = ImageFont.load_default()
return _cached_font
def stitch_images_inline(chunks_bytes, original_page_height, capture_height):
"""
Stitch image chunks together inline (no multiprocessing).
Optimized for small number of chunks (2-3) to avoid process creation overhead.
Args:
chunks_bytes: List of JPEG image bytes
original_page_height: Original page height in pixels
capture_height: Maximum capture height
Returns:
bytes: Stitched JPEG image
"""
import os
import io
from PIL import Image, ImageDraw
# Load images from byte chunks
images = [Image.open(io.BytesIO(b)) for b in chunks_bytes]
total_height = sum(im.height for im in images)
max_width = max(im.width for im in images)
# Create stitched image
stitched = Image.new('RGB', (max_width, total_height))
y_offset = 0
for im in images:
stitched.paste(im, (0, y_offset))
y_offset += im.height
im.close() # Close immediately after pasting
# Draw caption only if page was trimmed
if original_page_height > capture_height:
draw = ImageDraw.Draw(stitched)
caption_text = f"WARNING: Screenshot was {original_page_height}px but trimmed to {capture_height}px because it was too long"
padding = 10
font = _get_caption_font()
bbox = draw.textbbox((0, 0), caption_text, font=font)
text_width = bbox[2] - bbox[0]
text_height = bbox[3] - bbox[1]
# Draw white background rectangle
draw.rectangle([(0, 0), (max_width, text_height + 2 * padding)], fill=(255, 255, 255))
# Draw text centered
text_x = (max_width - text_width) // 2
draw.text((text_x, padding), caption_text, font=font, fill=(255, 0, 0))
# Encode to JPEG
output = io.BytesIO()
stitched.save(output, format="JPEG", quality=int(os.getenv("SCREENSHOT_QUALITY", SCREENSHOT_DEFAULT_QUALITY)), optimize=True)
result = output.getvalue()
# Cleanup
stitched.close()
return result
def stitch_images_worker(pipe_conn, chunks_bytes, original_page_height, capture_height):
def stitch_images_worker_raw_bytes(pipe_conn, original_page_height, capture_height):
"""
Stitch image chunks together in a separate process.
Used for large number of chunks (4+) to avoid blocking the main event loop.
Uses spawn multiprocessing to isolate PIL's C-level memory allocation.
When the subprocess exits, the OS reclaims ALL memory including C-level allocations
that Python's GC cannot release. This prevents the ~50MB per stitch from accumulating
in the main process.
Trade-off: Adds 35MB resource_tracker subprocess, but prevents 500MB+ memory leak
in main process (much better at scale: 35GB vs 500GB for 1000 instances).
Args:
pipe_conn: Pipe connection to receive data and send result
original_page_height: Original page height in pixels
capture_height: Maximum capture height
"""
import os
import io
import struct
from PIL import Image, ImageDraw, ImageFont
try:
# Receive chunk count as 4-byte integer (no pickle!)
count_bytes = pipe_conn.recv_bytes()
chunk_count = struct.unpack('I', count_bytes)[0]
# Receive each chunk as raw bytes (no pickle!)
chunks_bytes = []
for _ in range(chunk_count):
chunks_bytes.append(pipe_conn.recv_bytes())
# Load images from byte chunks
images = [Image.open(io.BytesIO(b)) for b in chunks_bytes]
del chunks_bytes
total_height = sum(im.height for im in images)
max_width = max(im.width for im in images)
@@ -103,15 +53,14 @@ def stitch_images_worker(pipe_conn, chunks_bytes, original_page_height, capture_
for im in images:
stitched.paste(im, (0, y_offset))
y_offset += im.height
im.close() # Close immediately after pasting
im.close()
del images
# Draw caption only if page was trimmed
if original_page_height > capture_height:
draw = ImageDraw.Draw(stitched)
caption_text = f"WARNING: Screenshot was {original_page_height}px but trimmed to {capture_height}px because it was too long"
padding = 10
# Try to load font
try:
font = ImageFont.truetype("arial.ttf", 35)
except IOError:
@@ -120,23 +69,26 @@ def stitch_images_worker(pipe_conn, chunks_bytes, original_page_height, capture_
bbox = draw.textbbox((0, 0), caption_text, font=font)
text_width = bbox[2] - bbox[0]
text_height = bbox[3] - bbox[1]
# Draw white background rectangle
draw.rectangle([(0, 0), (max_width, text_height + 2 * padding)], fill=(255, 255, 255))
# Draw text centered
text_x = (max_width - text_width) // 2
draw.text((text_x, padding), caption_text, font=font, fill=(255, 0, 0))
# Encode and send image with optimization
# Encode and send
output = io.BytesIO()
stitched.save(output, format="JPEG", quality=int(os.getenv("SCREENSHOT_QUALITY", SCREENSHOT_DEFAULT_QUALITY)), optimize=True)
pipe_conn.send_bytes(output.getvalue())
result_bytes = output.getvalue()
stitched.close()
del stitched
output.close()
del output
pipe_conn.send_bytes(result_bytes)
del result_bytes
except Exception as e:
pipe_conn.send(f"error:{e}")
logger.error(f"Error in stitch_images_worker_raw_bytes: {e}")
error_msg = f"error:{e}".encode('utf-8')
pipe_conn.send_bytes(error_msg)
finally:
pipe_conn.close()

View File

@@ -0,0 +1,43 @@
"""
Favicon utilities for changedetection.io
Handles favicon MIME type detection with caching
"""
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_favicon_mime_type(filepath):
"""
Detect MIME type of favicon by reading file content using puremagic.
Results are cached to avoid repeatedly reading the same files.
Args:
filepath: Full path to the favicon file
Returns:
MIME type string (e.g., 'image/png')
"""
mime = None
try:
import puremagic
with open(filepath, 'rb') as f:
content_bytes = f.read(200) # Read first 200 bytes
detections = puremagic.magic_string(content_bytes)
if detections:
mime = detections[0].mime_type
except Exception:
pass
# Fallback to mimetypes if puremagic fails
if not mime:
import mimetypes
mime, _ = mimetypes.guess_type(filepath)
# Final fallback based on extension
if not mime:
mime = 'image/x-icon' if filepath.endswith('.ico') else 'image/png'
return mime

View File

@@ -44,6 +44,8 @@ from changedetectionio.api import Watch, WatchHistory, WatchSingleHistory, Watch
from changedetectionio.api.Search import Search
from .time_handler import is_within_schedule
from changedetectionio.languages import get_available_languages, get_language_codes, get_flag_for_locale, get_timeago_locale
from changedetectionio.favicon_utils import get_favicon_mime_type
IN_PYTEST = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
datastore = None
@@ -69,9 +71,13 @@ socketio_server = None
CORS(app)
# Super handy for compressing large BrowserSteps responses and others
FlaskCompress(app)
app.config['COMPRESS_MIN_SIZE'] = 4096
# Flask-Compress handles HTTP compression, Socket.IO compression disabled to prevent memory leak
compress = FlaskCompress()
app.config['COMPRESS_MIN_SIZE'] = 2096
app.config['COMPRESS_MIMETYPES'] = ['text/html', 'text/css', 'text/javascript', 'application/json', 'application/javascript', 'image/svg+xml']
# Use gzip only - smaller memory footprint than zstd/brotli (4-8KB vs 200-500KB contexts)
app.config['COMPRESS_ALGORITHM'] = ['gzip']
compress.init_app(app)
app.config['TEMPLATES_AUTO_RELOAD'] = False
@@ -88,6 +94,14 @@ if os.getenv('FLASK_SERVER_NAME'):
app.config['BABEL_TRANSLATION_DIRECTORIES'] = str(Path(__file__).parent / 'translations')
app.config['BABEL_DEFAULT_LOCALE'] = 'en_GB'
# Session configuration
# NOTE: Flask session (for locale, etc.) is separate from Flask-Login's remember-me cookie
# - Flask session stores data like session['locale'] in a signed cookie
# - Flask-Login's remember=True creates a separate authentication cookie
# - Setting PERMANENT_SESSION_LIFETIME controls how long the Flask session cookie lasts
from datetime import timedelta
app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(days=3650) # ~10 years (effectively unlimited)
#app.config["EXPLAIN_TEMPLATE_LOADING"] = True
@@ -400,11 +414,27 @@ def changedetection_app(config=None, datastore_o=None):
language_codes = get_language_codes()
def get_locale():
# Locale aliases: map browser language codes to translation directory names
# This handles cases where browsers send standard codes (e.g., zh-TW)
# but our translations use more specific codes (e.g., zh_Hant_TW)
locale_aliases = {
'zh-TW': 'zh_Hant_TW', # Traditional Chinese: browser sends zh-TW, we use zh_Hant_TW
'zh_TW': 'zh_Hant_TW', # Also handle underscore variant
}
# 1. Try to get locale from session (user explicitly selected)
if 'locale' in session:
return session['locale']
# 2. Fall back to Accept-Language header
return request.accept_languages.best_match(language_codes)
# Get the best match from browser's Accept-Language header
browser_locale = request.accept_languages.best_match(language_codes + list(locale_aliases.keys()))
# 3. Check if we need to map the browser locale to our internal locale
if browser_locale in locale_aliases:
return locale_aliases[browser_locale]
return browser_locale
# Initialize Babel with locale selector
babel = Babel(app, locale_selector=get_locale)
@@ -528,6 +558,9 @@ def changedetection_app(config=None, datastore_o=None):
# Validate the locale against available languages
if locale in language_codes:
# Make session permanent so language preference persists across browser sessions
# NOTE: This is the Flask session cookie (separate from Flask-Login's remember-me auth cookie)
session.permanent = True
session['locale'] = locale
# CRITICAL: Flask-Babel caches the locale in the request context (ctx.babel_locale)
@@ -666,16 +699,9 @@ def changedetection_app(config=None, datastore_o=None):
favicon_filename = watch.get_favicon_filename()
if favicon_filename:
try:
import magic
mime = magic.from_file(
os.path.join(watch.watch_data_dir, favicon_filename),
mime=True
)
except ImportError:
# Fallback, no python-magic
import mimetypes
mime, encoding = mimetypes.guess_type(favicon_filename)
# Use cached MIME type detection
filepath = os.path.join(watch.watch_data_dir, favicon_filename)
mime = get_favicon_mime_type(filepath)
response = make_response(send_from_directory(watch.watch_data_dir, favicon_filename))
response.headers['Content-type'] = mime

View File

@@ -727,8 +727,8 @@ class ValidateStartsWithRegex(object):
raise ValidationError(self.message or _l("Invalid value."))
class quickWatchForm(Form):
url = fields.URLField('URL', validators=[validateURL()])
tags = StringTagUUID('Group tag', [validators.Optional()])
url = fields.URLField(_l('URL'), validators=[validateURL()])
tags = StringTagUUID(_l('Group tag'), validators=[validators.Optional()])
watch_submit_button = SubmitField(_l('Watch'), render_kw={"class": "pure-button pure-button-primary"})
processor = RadioField(_l('Processor'), choices=lambda: processors.available_processors(), default="text_json_diff")
edit_and_watch_submit_button = SubmitField(_l('Edit > Watch'), render_kw={"class": "pure-button pure-button-primary"})
@@ -786,6 +786,7 @@ class processor_text_json_diff_form(commonSettingsForm):
time_between_check = EnhancedFormField(
TimeBetweenCheckForm,
label=_l('Time Between Check'),
conditional_field='time_between_check_use_default',
conditional_message=REQUIRE_ATLEAST_ONE_TIME_PART_WHEN_NOT_GLOBAL_DEFAULT,
conditional_test_function=validate_time_between_check_has_values
@@ -947,7 +948,7 @@ class DefaultUAInputForm(Form):
# datastore.data['settings']['requests']..
class globalSettingsRequestForm(Form):
time_between_check = RequiredFormField(TimeBetweenCheckForm)
time_between_check = RequiredFormField(TimeBetweenCheckForm, label=_l('Time Between Check'))
time_schedule_limit = FormField(ScheduleLimitForm)
proxy = RadioField(_l('Default proxy'))
jitter_seconds = IntegerField(_l('Random jitter seconds ± check'),
@@ -1007,7 +1008,7 @@ class globalSettingsApplicationForm(commonSettingsForm):
render_kw={"placeholder": "0.1", "style": "width: 8em;"}
)
password = SaltyPasswordField()
password = SaltyPasswordField(_l('Password'))
pager_size = IntegerField(_l('Pager size'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0,

View File

@@ -20,8 +20,9 @@ mtable = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400, 'weeks': 86
def _brotli_save(contents, filepath, mode=None, fallback_uncompressed=False):
"""
Save compressed data using native brotli.
Testing shows no memory leak when using gc.collect() after compression.
Save compressed data using native brotli with streaming compression.
Uses chunked compression to minimize peak memory usage and malloc_trim()
to force release of C-level memory back to the OS.
Args:
contents: data to compress (str or bytes)
@@ -37,27 +38,52 @@ def _brotli_save(contents, filepath, mode=None, fallback_uncompressed=False):
"""
import brotli
import gc
import ctypes
# Ensure contents are bytes
if isinstance(contents, str):
contents = contents.encode('utf-8')
try:
logger.debug(f"Starting brotli compression of {len(contents)} bytes.")
original_size = len(contents)
logger.debug(f"Starting brotli streaming compression of {original_size} bytes.")
if mode is not None:
compressed_data = brotli.compress(contents, mode=mode)
else:
compressed_data = brotli.compress(contents)
# Create streaming compressor
compressor = brotli.Compressor(quality=6, mode=mode if mode is not None else brotli.MODE_GENERIC)
# Stream compress in chunks to minimize memory usage
chunk_size = 65536 # 64KB chunks
total_compressed_size = 0
with open(filepath, 'wb') as f:
f.write(compressed_data)
# Process data in chunks
offset = 0
while offset < len(contents):
chunk = contents[offset:offset + chunk_size]
compressed_chunk = compressor.process(chunk)
if compressed_chunk:
f.write(compressed_chunk)
total_compressed_size += len(compressed_chunk)
offset += chunk_size
logger.debug(f"Finished brotli compression - From {len(contents)} to {len(compressed_data)} bytes.")
# Finalize compression - critical for proper cleanup
final_chunk = compressor.finish()
if final_chunk:
f.write(final_chunk)
total_compressed_size += len(final_chunk)
# Force garbage collection to prevent memory buildup
logger.debug(f"Finished brotli compression - From {original_size} to {total_compressed_size} bytes.")
# Cleanup: Delete compressor, force Python GC, then force C-level memory release
del compressor
gc.collect()
# Force release of C-level memory back to OS (since brotli is a C library)
try:
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass # malloc_trim not available on all systems (e.g., macOS)
return filepath
except Exception as e:

View File

@@ -69,7 +69,7 @@ class RecheckPriorityQueue:
# Emit signals
self._emit_put_signals(item)
logger.debug(f"Successfully queued item: {self._get_item_uuid(item)}")
logger.trace(f"Successfully queued item: {self._get_item_uuid(item)}")
return True
except Exception as e:

View File

@@ -240,7 +240,10 @@ def init_socketio(app, datastore):
async_mode=async_mode,
cors_allowed_origins=cors_origins, # None means same-origin only
logger=strtobool(os.getenv('SOCKETIO_LOGGING', 'False')),
engineio_logger=strtobool(os.getenv('SOCKETIO_LOGGING', 'False')))
engineio_logger=strtobool(os.getenv('SOCKETIO_LOGGING', 'False')),
# Disable WebSocket compression to prevent memory accumulation
# Flask-Compress already handles HTTP response compression
engineio_options={'http_compression': False, 'compression_threshold': 0})
# Set up event handlers
logger.info("Socket.IO: Registering connect event handler")

View File

@@ -91,8 +91,8 @@ REMOVE_REQUESTS_OLD_SCREENSHOTS=false pytest -vv -s --maxfail=1 tests/test_notif
# And again with brotli+screenshot attachment
SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD=5 REMOVE_REQUESTS_OLD_SCREENSHOTS=false pytest -vv -s --maxfail=1 --dist=load tests/test_backend.py tests/test_rss.py tests/test_unique_lines.py tests/test_notification.py tests/test_access_control.py
# Try high concurrency
FETCH_WORKERS=50 pytest tests/test_history_consistency.py -vv -l -s
# Try high concurrency with aggressive worker restarts
FETCH_WORKERS=50 WORKER_MAX_RUNTIME=2 WORKER_MAX_JOBS=1 pytest tests/test_history_consistency.py -vv -l -s
# Check file:// will pickup a file when enabled
echo "Hello world" > /tmp/test-file.txt

View File

@@ -1,19 +1,25 @@
{
"name": "",
"short_name": "",
"name": "ChangeDetection.io",
"short_name": "ChangeDetect",
"description": "Self-hosted website change detection and monitoring",
"icons": [
{
"src": "android-chrome-192x192.png",
"sizes": "192x192",
"type": "image/png"
"type": "image/png",
"purpose": "any maskable"
},
{
"src": "android-chrome-256x256.png",
"sizes": "256x256",
"type": "image/png"
"type": "image/png",
"purpose": "any maskable"
}
],
"theme_color": "#ffffff",
"start_url": "/",
"theme_color": "#5bbad5",
"background_color": "#ffffff",
"display": "standalone"
"display": "standalone",
"categories": ["utilities", "productivity"],
"orientation": "any"
}

View File

@@ -69,6 +69,19 @@
}
});
// Handle Enter key in search input
if (searchInput) {
searchInput.addEventListener('keydown', function(e) {
if (e.key === 'Enter') {
e.preventDefault();
if (searchForm) {
// Trigger form submission programmatically
searchForm.dispatchEvent(new Event('submit'));
}
}
});
}
// Handle form submission
if (searchForm) {
searchForm.addEventListener('submit', function(e) {
@@ -88,8 +101,8 @@
params.append('tags', tags);
}
// Navigate to search results
window.location.href = '?' + params.toString();
// Navigate to search results (always redirect to watchlist home)
window.location.href = '/?' + params.toString();
});
}
});

View File

@@ -1,3 +1,4 @@
import gc
import shutil
from changedetectionio.strtobool import strtobool
@@ -125,6 +126,10 @@ class ChangeDetectionStore:
if 'application' in from_disk['settings']:
self.__data['settings']['application'].update(from_disk['settings']['application'])
# from_disk no longer needed - free memory immediately
del from_disk
gc.collect()
# Convert each existing watch back to the Watch.model object
for uuid, watch in self.__data['watching'].items():
self.__data['watching'][uuid] = self.rehydrate_entity(uuid, watch)
@@ -348,7 +353,8 @@ class ChangeDetectionStore:
r = requests.request(method="GET",
url=url,
# So we know to return the JSON instead of the human-friendly "help" page
headers={'App-Guid': self.__data['app_guid']})
headers={'App-Guid': self.__data['app_guid']},
timeout=5.0) # 5 second timeout to prevent blocking
res = r.json()
# List of permissible attributes we accept from the wild internet
@@ -449,7 +455,7 @@ class ChangeDetectionStore:
data = deepcopy(self.__data)
except RuntimeError as e:
# Try again in 15 seconds
time.sleep(15)
time.sleep(1)
logger.error(f"! Data changed when writing to JSON, trying again.. {str(e)}")
self.sync_to_json()
return

View File

@@ -6,7 +6,7 @@
<div class="pure-controls">
<span class="pure-form-message-inline">
Body for all notifications &dash; You can use <a target="newwindow" href="https://jinja.palletsprojects.com/en/3.0.x/templates/">Jinja2</a> templating in the notification title, body and URL, and tokens from below.
{{ _('Body for all notifications You can use') }} <a target="newwindow" href="https://jinja.palletsprojects.com/en/3.0.x/templates/">Jinja2</a> {{ _('templating in the notification title, body and URL, and tokens from below.') }}
</span><br>
<div data-target="#notification-tokens-info{{ suffix }}" class="toggle-show pure-button button-tag button-xsmall">{{ _('Show token/placeholders') }}
</div>
@@ -22,77 +22,77 @@
<tbody>
<tr>
<td><code>{{ '{{base_url}}' }}</code></td>
<td>The URL of the changedetection.io instance you are running.</td>
<td>{{ _('The URL of the changedetection.io instance you are running.') }}</td>
</tr>
<tr>
<td><code>{{ '{{watch_url}}' }}</code></td>
<td>The URL being watched.</td>
<td>{{ _('The URL being watched.') }}</td>
</tr>
<tr>
<td><code>{{ '{{watch_uuid}}' }}</code></td>
<td>The UUID of the watch.</td>
<td>{{ _('The UUID of the watch.') }}</td>
</tr>
<tr>
<td><code>{{ '{{watch_title}}' }}</code></td>
<td>The page title of the watch, uses &lt;title&gt; if not set, falls back to URL</td>
<td>{{ _('The page title of the watch, uses <title> if not set, falls back to URL') }}</td>
</tr>
<tr>
<td><code>{{ '{{watch_tag}}' }}</code></td>
<td>The watch group / tag</td>
<td>{{ _('The watch group / tag') }}</td>
</tr>
<tr>
<td><code>{{ '{{preview_url}}' }}</code></td>
<td>The URL of the preview page generated by changedetection.io.</td>
<td>{{ _('The URL of the preview page generated by changedetection.io.') }}</td>
</tr>
<tr>
<td><code>{{ '{{diff_url}}' }}</code></td>
<td>The URL of the diff output for the watch.</td>
<td>{{ _('The URL of the diff output for the watch.') }}</td>
</tr>
<tr>
<td><code>{{ '{{diff}}' }}</code></td>
<td>The diff output - only changes, additions, and removals</td>
<td>{{ _('The diff output - only changes, additions, and removals') }}</td>
</tr>
<tr>
<td><code>{{ '{{diff_clean}}' }}</code></td>
<td>The diff output - only changes, additions, and removals &dash; <i>Without (added) prefix or colors</i>
<td>{{ _('The diff output - only changes, additions, and removals —') }} <i>{{ _('Without (added) prefix or colors') }}</i>
</td>
</tr>
<tr>
<td><code>{{ '{{diff_added}}' }}</code></td>
<td>The diff output - only changes and additions</td>
<td>{{ _('The diff output - only changes and additions') }}</td>
</tr>
<tr>
<td><code>{{ '{{diff_added_clean}}' }}</code></td>
<td>The diff output - only changes and additions &dash; <i>Without (added) prefix or colors</i></td>
<td>{{ _('The diff output - only changes and additions —') }} <i>{{ _('Without (added) prefix or colors') }}</i></td>
</tr>
<tr>
<td><code>{{ '{{diff_removed}}' }}</code></td>
<td>The diff output - only changes and removals</td>
<td>{{ _('The diff output - only changes and removals') }}</td>
</tr>
<tr>
<td><code>{{ '{{diff_removed_clean}}' }}</code></td>
<td>The diff output - only changes and removals &dash; <i>Without (added) prefix or colors</i></td>
<td>{{ _('The diff output - only changes and removals —') }} <i>{{ _('Without (added) prefix or colors') }}</i></td>
</tr>
<tr>
<td><code>{{ '{{diff_full}}' }}</code></td>
<td>The diff output - full difference output</td>
<td>{{ _('The diff output - full difference output') }}</td>
</tr>
<tr>
<td><code>{{ '{{diff_full_clean}}' }}</code></td>
<td>The diff output - full difference output &dash; <i>Without (added) prefix or colors</i></td>
<td>{{ _('The diff output - full difference output —') }} <i>{{ _('Without (added) prefix or colors') }}</i></td>
</tr>
<tr>
<td><code>{{ '{{diff_patch}}' }}</code></td>
<td>The diff output - patch in unified format</td>
<td>{{ _('The diff output - patch in unified format') }}</td>
</tr>
<tr>
<td><code>{{ '{{current_snapshot}}' }}</code></td>
<td>The current snapshot text contents value, useful when combined with JSON or CSS filters
<td>{{ _('The current snapshot text contents value, useful when combined with JSON or CSS filters') }}
</td>
</tr>
<tr>
<td><code>{{ '{{triggered_text}}' }}</code></td>
<td>Text that tripped the trigger from filters</td>
<td>{{ _('Text that tripped the trigger from filters') }}</td>
{% if extra_notification_token_placeholder_info %}
{% for token in extra_notification_token_placeholder_info %}
@@ -106,8 +106,8 @@
</table>
<span class="pure-form-message-inline">
Warning: Contents of <code>{{ '{{diff}}' }}</code>, <code>{{ '{{diff_removed}}' }}</code>, and <code>{{ '{{diff_added}}' }}</code> depend on how the difference algorithm perceives the change. <br>
For example, an addition or removal could be perceived as a change in some cases. <a target="newwindow" href="https://github.com/dgtlmoon/changedetection.io/wiki/Using-the-%7B%7Bdiff%7D%7D,-%7B%7Bdiff_added%7D%7D,-and-%7B%7Bdiff_removed%7D%7D-notification-tokens">More Here</a> <br>
{{ _('Warning: Contents of') }} <code>{{ '{{diff}}' }}</code>, <code>{{ '{{diff_removed}}' }}</code>, {{ _('and') }} <code>{{ '{{diff_added}}' }}</code> {{ _('depend on how the difference algorithm perceives the change.') }} <br>
{{ _('For example, an addition or removal could be perceived as a change in some cases.') }} <a target="newwindow" href="https://github.com/dgtlmoon/changedetection.io/wiki/Using-the-%7B%7Bdiff%7D%7D,-%7B%7Bdiff_added%7D%7D,-and-%7B%7Bdiff_removed%7D%7D-notification-tokens">{{ _('More Here') }}</a> <br>
</span>
</div>
{% endmacro %}
@@ -123,15 +123,15 @@
}}
<div class="pure-form-message-inline">
<p>
<strong>Tip:</strong> Use <a target="newwindow" href="https://github.com/caronc/apprise">AppRise Notification URLs</a> for notification to just about any service! <i><a target="newwindow" href="https://github.com/dgtlmoon/changedetection.io/wiki/Notification-configuration-notes">Please read the notification services wiki here for important configuration notes</a></i>.<br>
<strong>{{ _('Tip:') }}</strong> {{ _('Use') }} <a target="newwindow" href="https://github.com/caronc/apprise">{{ _('AppRise Notification URLs') }}</a> {{ _('for notification to just about any service!') }} <i><a target="newwindow" href="https://github.com/dgtlmoon/changedetection.io/wiki/Notification-configuration-notes">{{ _('Please read the notification services wiki here for important configuration notes') }}</a></i>.<br>
</p>
<div data-target="#advanced-help-notifications" class="toggle-show pure-button button-tag button-xsmall">{{ _('Show advanced help and tips') }}</div>
<ul style="display: none" id="advanced-help-notifications">
<li><code><a target="newwindow" href="https://github.com/caronc/apprise/wiki/Notify_discord">discord://</a></code> (or <code>https://discord.com/api/webhooks...</code>)) only supports a maximum <strong>2,000 characters</strong> of notification text, including the title.</li>
<li><code><a target="newwindow" href="https://github.com/caronc/apprise/wiki/Notify_telegram">tgram://</a></code> bots can't send messages to other bots, so you should specify chat ID of non-bot user.</li>
<li><code><a target="newwindow" href="https://github.com/caronc/apprise/wiki/Notify_telegram">tgram://</a></code> only supports very limited HTML and can fail when extra tags are sent, <a href="https://core.telegram.org/bots/api#html-style">read more here</a> (or use plaintext/markdown format)</li>
<li><code>gets://</code>, <code>posts://</code>, <code>puts://</code>, <code>deletes://</code> for direct API calls (or omit the "<code>s</code>" for non-SSL ie <code>get://</code>) <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Notification-configuration-notes#postposts">more help here</a></li>
<li>Accepts the <code>{{ '{{token}}' }}</code> placeholders listed below</li>
<li><code><a target="newwindow" href="https://github.com/caronc/apprise/wiki/Notify_discord">discord://</a></code> {{ _('(or') }} <code>https://discord.com/api/webhooks...</code>)) {{ _('only supports a maximum') }} <strong>{{ _('2,000 characters') }}</strong> {{ _('of notification text, including the title.') }}</li>
<li><code><a target="newwindow" href="https://github.com/caronc/apprise/wiki/Notify_telegram">tgram://</a></code> {{ _('bots can\'t send messages to other bots, so you should specify chat ID of non-bot user.') }}</li>
<li><code><a target="newwindow" href="https://github.com/caronc/apprise/wiki/Notify_telegram">tgram://</a></code> {{ _('only supports very limited HTML and can fail when extra tags are sent,') }} <a href="https://core.telegram.org/bots/api#html-style">{{ _('read more here') }}</a> {{ _('(or use plaintext/markdown format)') }}</li>
<li><code>gets://</code>, <code>posts://</code>, <code>puts://</code>, <code>deletes://</code> {{ _('for direct API calls (or omit the') }} "<code>s</code>" {{ _('for non-SSL ie') }} <code>get://</code>) <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Notification-configuration-notes#postposts">{{ _('more help here') }}</a></li>
<li>{{ _('Accepts the') }} <code>{{ '{{token}}' }}</code> {{ _('placeholders listed below') }}</li>
</ul>
</div>
<div class="notifications-wrapper">
@@ -156,16 +156,16 @@
<div class="pure-form-message-inline">
<ul>
<li><span class="pure-form-message-inline">
For JSON payloads, use <strong>|tojson</strong> without quotes for automatic escaping, for example - <code>{ "name": {{ '{{ watch_title|tojson }}' }} }</code>
{{ _('For JSON payloads, use') }} <strong>|tojson</strong> {{ _('without quotes for automatic escaping, for example -') }} <code>{ "name": {{ '{{ watch_title|tojson }}' }} }</code>
</span></li>
<li><span class="pure-form-message-inline">
URL encoding, use <strong>|urlencode</strong>, for example - <code>gets://hook-website.com/test.php?title={{ '{{ watch_title|urlencode }}' }}</code>
{{ _('URL encoding, use') }} <strong>|urlencode</strong>, {{ _('for example -') }} <code>gets://hook-website.com/test.php?title={{ '{{ watch_title|urlencode }}' }}</code>
</span></li>
<li><span class="pure-form-message-inline">
Regular-expression replace, use <strong>|regex_replace</strong>, for example - <code>{{ "{{ \"hello world 123\" | regex_replace('[0-9]+', 'no-more-numbers') }}" }}</code>
{{ _('Regular-expression replace, use') }} <strong>|regex_replace</strong>, {{ _('for example -') }} <code>{{ "{{ \"hello world 123\" | regex_replace('[0-9]+', 'no-more-numbers') }}" }}</code>
</span></li>
<li><span class="pure-form-message-inline">
For a complete reference of all Jinja2 built-in filters, users can refer to the <a href="https://jinja.palletsprojects.com/en/3.1.x/templates/#builtin-filters">https://jinja.palletsprojects.com/en/3.1.x/templates/#builtin-filters</a>
{{ _('For a complete reference of all Jinja2 built-in filters, users can refer to the') }} <a href="https://jinja.palletsprojects.com/en/3.1.x/templates/#builtin-filters">https://jinja.palletsprojects.com/en/3.1.x/templates/#builtin-filters</a>
</span></li>
</ul>
<br>

View File

@@ -1,5 +1,5 @@
{% macro render_field(field) %}
<div {% if field.errors or field.top_errors %} class="error" {% endif %}><label for="{{ field.id }}">{{ field.label.text | string | forceescape }}</label></div>
<div {% if field.errors or field.top_errors %} class="error" {% endif %}>{{ field.label }}</div>
<div {% if field.errors or field.top_errors %} class="error" {% endif %}>{{ field(**kwargs)|safe }}
{% if field.top_errors %}
top
@@ -59,7 +59,7 @@
{% macro render_ternary_field(field, BooleanField=false) %}
{% if BooleanField %}
{% set _ = field.__setattr__('boolean_mode', true) %}
{% set dummy = field.__setattr__('boolean_mode', true) %}
{% endif %}
<div class="ternary-field {% if field.errors %} error {% endif %}">
<div class="ternary-field-label"><label for="{{ field.id }}">{{ field.label.text | string | forceescape }}</label></div>
@@ -113,17 +113,17 @@
{% macro render_fieldlist_with_inline_errors(fieldlist) %}
{# Specialized macro for FieldList(FormField(...)) that renders errors inline with each field #}
<div {% if fieldlist.errors %} class="error" {% endif %}>{{ fieldlist.label }}</div>
<div {% if fieldlist.errors %} class="error" {% endif %}>{{ _(fieldlist.label.text | string) }}</div>
<div {% if fieldlist.errors %} class="error" {% endif %}>
<ul id="{{ fieldlist.id }}">
{% for entry in fieldlist %}
<li {% if entry.errors %} class="error" {% endif %}>
<label for="{{ entry.id }}" {% if entry.errors %} class="error" {% endif %}>{{ fieldlist.label.text }}-{{ loop.index0 }}</label>
<label for="{{ entry.id }}" {% if entry.errors %} class="error" {% endif %}>{{ _(fieldlist.label.text | string) }}-{{ loop.index0 }}</label>
<table id="{{ entry.id }}" {% if entry.errors %} class="error" {% endif %}>
<tbody>
{% for subfield in entry %}
<tr {% if subfield.errors %} class="error" {% endif %}>
<th {% if subfield.errors %} class="error" {% endif %}><label for="{{ subfield.id }}" {% if subfield.errors %} class="error" {% endif %}>{{ subfield.label.text }}</label></th>
<th {% if subfield.errors %} class="error" {% endif %}><label for="{{ subfield.id }}" {% if subfield.errors %} class="error" {% endif %}>{{ subfield.label.text | string }}</label></th>
<td {% if subfield.errors %} class="error" {% endif %}>
{{ subfield(**kwargs)|safe }}
{% if subfield.errors %}
@@ -148,7 +148,7 @@
<div class="fieldlist_formfields" id="{{ table_id }}">
<div class="fieldlist-header">
{% for subfield in fieldlist[0] %}
<div class="fieldlist-header-cell">{{ subfield.label }}</div>
<div class="fieldlist-header-cell">{{ subfield.label.text | string }}</div>
{% endfor %}
<div class="fieldlist-header-cell">{{ _('Actions') }}</div>
</div>

View File

@@ -27,7 +27,7 @@
<link rel="apple-touch-icon" sizes="180x180" href="{{url_for('static_content', group='favicons', filename='apple-touch-icon.png')}}">
<link rel="icon" type="image/png" sizes="32x32" href="{{url_for('static_content', group='favicons', filename='favicon-32x32.png')}}">
<link rel="icon" type="image/png" sizes="16x16" href="{{url_for('static_content', group='favicons', filename='favicon-16x16.png')}}">
<link rel="manifest" href="{{url_for('static_content', group='favicons', filename='site.webmanifest')}}">
<link rel="manifest" href="{{url_for('static_content', group='favicons', filename='site.webmanifest')}}" crossorigin="use-credentials">
<link rel="mask-icon" href="{{url_for('static_content', group='favicons', filename='safari-pinned-tab.svg')}}" color="#5bbad5">
<link rel="shortcut icon" href="{{url_for('static_content', group='favicons', filename='favicon.ico')}}">
<meta name="msapplication-TileColor" content="#da532c">
@@ -265,6 +265,9 @@
</a>
{% endfor %}
</div>
<div>
<a href="{{ url_for('ui.delete_locale_language_session_var_if_it_exists', redirect=request.path) }}" >{{ _('Auto-detect from browser') }}</a>
</div>
<div>
{{ _('Language support is in beta, please help us improve by opening a PR on GitHub with any updates.') }}
</div>

View File

@@ -14,10 +14,10 @@
<a href="{{ url_for('imports.import_page') }}" class="pure-menu-link">{{ _('IMPORT') }}</a>
</li>
<li class="pure-menu-item" id="menu-pause">
<a href="{{ url_for('settings.toggle_all_paused') }}" ><img src="{{url_for('static_content', group='images', filename='pause.svg')}}" alt="{% if all_paused %}Resume automatic scheduling{% else %}Pause auto-queue scheduling of watches{% endif %}" title="{% if all_paused %}Scheduling is paused - click to resume{% else %}Pause auto-queue scheduling of watches{% endif %}" class="icon icon-pause"{% if not all_paused %} style="opacity: 0.3"{% endif %}></a>
<a href="{{ url_for('settings.toggle_all_paused') }}" ><img src="{{url_for('static_content', group='images', filename='pause.svg')}}" alt="{% if all_paused %}{{ _('Resume automatic scheduling') }}{% else %}{{ _('Pause auto-queue scheduling of watches') }}{% endif %}" title="{% if all_paused %}{{ _('Scheduling is paused - click to resume') }}{% else %}{{ _('Pause auto-queue scheduling of watches') }}{% endif %}" class="icon icon-pause"{% if not all_paused %} style="opacity: 0.3"{% endif %}></a>
</li>
<li class="pure-menu-item " id="menu-mute">
<a href="{{ url_for('settings.toggle_all_muted') }}" ><img src="{{url_for('static_content', group='images', filename='bell-off.svg')}}" alt="{% if all_muted %}Unmute notifications{% else %}Mute notifications{% endif %}" title="{% if all_muted %}Notifications are muted - click to unmute{% else %}Mute notifications{% endif %}" class="icon icon-mute"{% if not all_muted %} style="opacity: 0.3"{% endif %}></a>
<a href="{{ url_for('settings.toggle_all_muted') }}" ><img src="{{url_for('static_content', group='images', filename='bell-off.svg')}}" alt="{% if all_muted %}{{ _('Unmute notifications') }}{% else %}{{ _('Mute notifications') }}{% endif %}" title="{% if all_muted %}{{ _('Notifications are muted - click to unmute') }}{% else %}{{ _('Mute notifications') }}{% endif %}" class="icon icon-mute"{% if not all_muted %} style="opacity: 0.3"{% endif %}></a>
</li>
{% else %}
<li class="pure-menu-item menu-collapsible">
@@ -33,7 +33,7 @@
{% else %}
<li class="pure-menu-item menu-collapsible">
<a class="pure-menu-link" href="https://changedetection.io">Website Change Detection and Notification.</a>
<a class="pure-menu-link" href="https://changedetection.io">{{ _('Website Change Detection and Notification.') }}</a>
</li>
{% endif %}
<li class="pure-menu-item menu-collapsible" id="inline-menu-extras-group">
@@ -50,7 +50,9 @@
<span class="visually-hidden">{{ _('Change language') }}</span>
<span class="{{ get_flag_for_locale(get_locale()) }}" id="language-selector-flag"></span>
</button>
<a class="github-link" href="https://github.com/dgtlmoon/changedetection.io">
<a class="github-link" href="https://github.com/dgtlmoon/changedetection.io"
target="_blank"
rel="noopener" >
{% include "svgs/github.svg" %}
</a>
</li>

View File

@@ -58,7 +58,7 @@ def is_valid_uuid(val):
def test_api_simple(client, live_server, measure_memory_usage, datastore_path):
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
@@ -506,7 +506,7 @@ def test_api_import(client, live_server, measure_memory_usage, datastore_path):
def test_api_conflict_UI_password(client, live_server, measure_memory_usage, datastore_path):
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
# Enable password check and diff page access bypass
@@ -548,3 +548,172 @@ def test_api_conflict_UI_password(client, live_server, measure_memory_usage, dat
assert len(res.json)
def test_api_url_validation(client, live_server, measure_memory_usage, datastore_path):
"""
Test URL validation for edge cases in both CREATE and UPDATE endpoints.
Addresses security issues where empty/null/invalid URLs could bypass validation.
This test ensures that:
- CREATE endpoint rejects null, empty, and invalid URLs
- UPDATE endpoint rejects attempts to change URL to null, empty, or invalid
- UPDATE endpoint allows updating other fields without touching URL
- URL validation properly checks protocol, format, and safety
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: CREATE with null URL should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": None}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 400, "Creating watch with null URL should fail"
# Test 2: CREATE with empty string URL should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": ""}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 400, "Creating watch with empty string URL should fail"
assert b'Invalid or unsupported URL' in res.data or b'required' in res.data.lower()
# Test 3: CREATE with whitespace-only URL should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": " "}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 400, "Creating watch with whitespace-only URL should fail"
# Test 4: CREATE with invalid protocol should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": "javascript:alert(1)"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 400, "Creating watch with javascript: protocol should fail"
# Test 5: CREATE with missing protocol should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": "example.com"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 400, "Creating watch without protocol should fail"
# Test 6: CREATE with valid URL should succeed (baseline)
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": test_url, "title": "Valid URL test"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 201, "Creating watch with valid URL should succeed"
assert is_valid_uuid(res.json.get('uuid'))
watch_uuid = res.json.get('uuid')
wait_for_all_checks(client)
# Test 7: UPDATE to null URL should fail
res = client.put(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps({"url": None}),
)
assert res.status_code == 400, "Updating watch URL to null should fail"
# Accept either OpenAPI validation error or our custom validation error
assert b'URL cannot be null' in res.data or b'OpenAPI validation failed' in res.data or b'validation error' in res.data.lower()
# Test 8: UPDATE to empty string URL should fail
res = client.put(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps({"url": ""}),
)
assert res.status_code == 400, "Updating watch URL to empty string should fail"
# Accept either our custom validation error or OpenAPI/schema validation error
assert b'URL cannot be empty' in res.data or b'OpenAPI validation' in res.data or b'Invalid or unsupported URL' in res.data
# Test 9: UPDATE to whitespace-only URL should fail
res = client.put(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps({"url": " \t\n "}),
)
assert res.status_code == 400, "Updating watch URL to whitespace should fail"
# Accept either our custom validation error or generic validation error
assert b'URL cannot be empty' in res.data or b'Invalid or unsupported URL' in res.data or b'validation' in res.data.lower()
# Test 10: UPDATE to invalid protocol should fail (javascript:)
res = client.put(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps({"url": "javascript:alert(document.domain)"}),
)
assert res.status_code == 400, "Updating watch URL to XSS attempt should fail"
assert b'Invalid or unsupported URL' in res.data or b'protocol' in res.data.lower()
# Test 11: UPDATE to file:// protocol should fail (unless ALLOW_FILE_URI is set)
res = client.put(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps({"url": "file:///etc/passwd"}),
)
assert res.status_code == 400, "Updating watch URL to file:// should fail by default"
# Test 12: UPDATE other fields without URL should succeed
res = client.put(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps({"title": "Updated title without URL change"}),
)
assert res.status_code == 200, "Updating other fields without URL should succeed"
# Test 13: Verify URL is still valid after non-URL update
res = client.get(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key}
)
assert res.json.get('url') == test_url, "URL should remain unchanged"
assert res.json.get('title') == "Updated title without URL change"
# Test 14: UPDATE to valid different URL should succeed
new_valid_url = test_url + "?new=param"
res = client.put(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps({"url": new_valid_url}),
)
assert res.status_code == 200, "Updating to valid different URL should succeed"
# Test 15: Verify URL was actually updated
res = client.get(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key}
)
assert res.json.get('url') == new_valid_url, "URL should be updated to new valid URL"
# Test 16: CREATE with XSS in URL parameters should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": "http://example.com?xss=<script>alert(1)</script>"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
# This should fail because of suspicious characters check
assert res.status_code == 400, "Creating watch with XSS in URL params should fail"
# Cleanup
client.delete(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key},
)
delete_all_watches(client)

View File

@@ -0,0 +1,805 @@
#!/usr/bin/env python3
"""
Comprehensive security and edge case tests for the API.
Tests critical areas that were identified as gaps in the existing test suite.
"""
import time
import json
import threading
import uuid as uuid_module
from flask import url_for
from .util import live_server_setup, wait_for_all_checks, delete_all_watches
import os
def set_original_response(datastore_path):
test_return_data = """<html>
<body>
Some initial text<br>
<p>Which is across multiple lines</p>
</body>
</html>
"""
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_return_data)
return None
def is_valid_uuid(val):
try:
uuid_module.UUID(str(val))
return True
except ValueError:
return False
# ============================================================================
# TIER 1: CRITICAL SECURITY TESTS
# ============================================================================
def test_api_path_traversal_in_uuids(client, live_server, measure_memory_usage, datastore_path):
"""
Test that path traversal attacks via UUID parameter are blocked.
Addresses CVE-like vulnerabilities where ../../../ in UUID could access arbitrary files.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Create a valid watch first
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": test_url, "title": "Valid watch"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201
valid_uuid = res.json.get('uuid')
# Test 1: Path traversal with ../../../
res = client.get(
f"/api/v1/watch/../../etc/passwd",
headers={'x-api-key': api_key}
)
assert res.status_code in [400, 404], "Path traversal should be rejected"
# Test 2: Encoded path traversal
res = client.get(
"/api/v1/watch/..%2F..%2F..%2Fetc%2Fpasswd",
headers={'x-api-key': api_key}
)
assert res.status_code in [400, 404], "Encoded path traversal should be rejected"
# Test 3: Double-encoded path traversal
res = client.get(
"/api/v1/watch/%2e%2e%2f%2e%2e%2f%2e%2e%2f",
headers={'x-api-key': api_key}
)
assert res.status_code in [400, 404], "Double-encoded traversal should be rejected"
# Test 4: Try to access datastore file
res = client.get(
"/api/v1/watch/../url-watches.json",
headers={'x-api-key': api_key}
)
assert res.status_code in [400, 404], "Access to datastore should be blocked"
# Test 5: Null byte injection
res = client.get(
f"/api/v1/watch/{valid_uuid}%00.json",
headers={'x-api-key': api_key}
)
# Should either work (ignoring null byte) or reject - but not crash
assert res.status_code in [200, 400, 404]
# Test 6: DELETE with path traversal
res = client.delete(
"/api/v1/watch/../../datastore/url-watches.json",
headers={'x-api-key': api_key}
)
assert res.status_code in [400, 404, 405], "DELETE with traversal should be blocked (405=method not allowed is also acceptable)"
# Cleanup
client.delete(url_for("watch", uuid=valid_uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
def test_api_injection_via_headers_and_proxy(client, live_server, measure_memory_usage, datastore_path):
"""
Test that injection attacks via headers and proxy fields are properly sanitized.
Addresses XSS and injection vulnerabilities.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: XSS in headers
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"headers": {
"User-Agent": "<script>alert(1)</script>",
"X-Custom": "'; DROP TABLE watches; --"
}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Headers are metadata used for HTTP requests, not HTML rendering
# Storing them as-is is expected behavior
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
# Verify headers are stored (API returns JSON, not HTML, so no XSS risk)
res = client.get(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
assert res.status_code == 200
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 2: Null bytes in headers
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"headers": {"X-Test": "value\x00null"}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should handle null bytes gracefully (reject or sanitize)
assert res.status_code in [201, 400]
# Test 3: Malformed proxy string
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"proxy": "http://evil.com:8080@victim.com"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should reject invalid proxy format
assert res.status_code == 400
# Test 4: Control characters in notification title
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"notification_title": "Test\r\nInjected-Header: value"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should accept but sanitize control characters
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
def test_api_large_payload_dos(client, live_server, measure_memory_usage, datastore_path):
"""
Test that excessively large payloads are rejected to prevent DoS.
Addresses memory leak issues found in changelog.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: Huge ignore_text array
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"ignore_text": ["a" * 10000] * 100 # 1MB of data
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should either accept (with limits) or reject
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 2: Massive headers object
huge_headers = {f"X-Header-{i}": "x" * 1000 for i in range(100)}
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"headers": huge_headers
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should reject or truncate
assert res.status_code in [201, 400, 413]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 3: Huge browser_steps array
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"browser_steps": [
{"operation": "click", "selector": "#test" * 1000, "optional_value": ""}
] * 100
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should reject or limit
assert res.status_code in [201, 400, 413]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 4: Extremely long title
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"title": "x" * 100000 # 100KB title
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should reject (exceeds maxLength: 5000)
assert res.status_code == 400
delete_all_watches(client)
def test_api_utf8_encoding_edge_cases(client, live_server, measure_memory_usage, datastore_path):
"""
Test UTF-8 encoding edge cases that have caused bugs on Windows.
Addresses 18+ encoding bugs from changelog.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: Unicode in title (should work)
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"title": "Test 中文 Ελληνικά 日本語 🔥"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201
watch_uuid = res.json.get('uuid')
# Verify it round-trips correctly
res = client.get(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
assert res.status_code == 200
assert "中文" in res.json.get('title')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 2: Unicode in URL query parameters
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url + "?search=日本語"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should handle URL encoding properly
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 3: Null byte in title (should be rejected or sanitized)
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"title": "Test\x00Title"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should handle gracefully
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 4: BOM (Byte Order Mark) in title
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"title": "\ufeffTest with BOM"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
def test_api_concurrency_race_conditions(client, live_server, measure_memory_usage, datastore_path):
"""
Test concurrent API requests to detect race conditions.
Addresses 20+ concurrency bugs from changelog.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Create a watch
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": test_url, "title": "Concurrency test"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201
watch_uuid = res.json.get('uuid')
wait_for_all_checks(client)
# Test 1: Concurrent updates to same watch
# Note: Flask test client is not thread-safe, so we test sequential updates instead
# Real concurrency issues would be caught in integration tests with actual HTTP requests
results = []
for i in range(10):
try:
r = client.put(
url_for("watch", uuid=watch_uuid),
data=json.dumps({"title": f"Title {i}"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
results.append(r.status_code)
except Exception as e:
results.append(str(e))
# All updates should succeed (200) without crashes
assert all(r == 200 for r in results), f"Some updates failed: {results}"
# Test 2: Update while watch is being checked
# Queue a recheck
client.get(
url_for("watch", uuid=watch_uuid, recheck=True),
headers={'x-api-key': api_key}
)
# Immediately update it
res = client.put(
url_for("watch", uuid=watch_uuid),
data=json.dumps({"title": "Updated during check"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should succeed without error
assert res.status_code == 200
# Test 3: Delete watch that's being processed
# Create another watch
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": test_url}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
watch_uuid2 = res.json.get('uuid')
# Queue it for checking
client.get(url_for("watch", uuid=watch_uuid2, recheck=True), headers={'x-api-key': api_key})
# Immediately delete it
res = client.delete(url_for("watch", uuid=watch_uuid2), headers={'x-api-key': api_key})
# Should succeed or return appropriate error
assert res.status_code in [204, 404, 400]
# Cleanup
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
# ============================================================================
# TIER 2: IMPORTANT FUNCTIONALITY TESTS
# ============================================================================
def test_api_time_validation_edge_cases(client, live_server, measure_memory_usage, datastore_path):
"""
Test time_between_check validation edge cases.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: Zero interval
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"time_between_check_use_default": False,
"time_between_check": {"seconds": 0}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 400, "Zero interval should be rejected"
# Test 2: Negative interval
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"time_between_check_use_default": False,
"time_between_check": {"seconds": -100}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 400, "Negative interval should be rejected"
# Test 3: All fields null with use_default=false
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"time_between_check_use_default": False,
"time_between_check": {"weeks": None, "days": None, "hours": None, "minutes": None, "seconds": None}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 400, "All null intervals should be rejected when not using default"
# Test 4: Extremely large interval (overflow risk)
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"time_between_check_use_default": False,
"time_between_check": {"weeks": 999999999}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should either accept (with limits) or reject
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 5: Valid minimal interval (should work)
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"time_between_check_use_default": False,
"time_between_check": {"seconds": 60}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
def test_api_browser_steps_validation(client, live_server, measure_memory_usage, datastore_path):
"""
Test browser_steps validation for invalid operations and structures.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: Empty browser step
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"browser_steps": [
{"operation": "", "selector": "", "optional_value": ""}
]
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should accept (empty is valid as null)
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 2: Invalid operation type
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"browser_steps": [
{"operation": "invalid_operation", "selector": "#test", "optional_value": ""}
]
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should accept (validation happens at runtime) or reject
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 3: Missing required fields in browser step
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"browser_steps": [
{"operation": "click"} # Missing selector and optional_value
]
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should be rejected due to schema validation
assert res.status_code == 400
# Test 4: Extra fields in browser step
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"browser_steps": [
{"operation": "click", "selector": "#test", "optional_value": "", "extra_field": "value"}
]
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should be rejected due to additionalProperties: false
assert res.status_code == 400
delete_all_watches(client)
def test_api_queue_manipulation(client, live_server, measure_memory_usage, datastore_path):
"""
Test queue behavior under stress and edge cases.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: Create many watches rapidly
watch_uuids = []
for i in range(20):
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": test_url, "title": f"Watch {i}"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
if res.status_code == 201:
watch_uuids.append(res.json.get('uuid'))
assert len(watch_uuids) == 20, "Should be able to create 20 watches"
# Test 2: Recheck all when watches exist
res = client.get(
url_for("createwatch", recheck_all='1'),
headers={'x-api-key': api_key},
)
# Should return success (200 or 202 for background processing)
assert res.status_code in [200, 202]
# Test 3: Verify queue doesn't overflow with moderate load
# The app has MAX_QUEUE_SIZE = 5000, we're well below that
wait_for_all_checks(client)
# Cleanup
for uuid in watch_uuids:
client.delete(url_for("watch", uuid=uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
# ============================================================================
# TIER 3: EDGE CASES & POLISH
# ============================================================================
def test_api_history_edge_cases(client, live_server, measure_memory_usage, datastore_path):
"""
Test history API with invalid timestamps and edge cases.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Create watch and generate history
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": test_url}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
watch_uuid = res.json.get('uuid')
wait_for_all_checks(client)
# Test 1: Get history with invalid timestamp
res = client.get(
url_for("watchsinglehistory", uuid=watch_uuid, timestamp="invalid"),
headers={'x-api-key': api_key}
)
assert res.status_code == 404, "Invalid timestamp should return 404"
# Test 2: Future timestamp
res = client.get(
url_for("watchsinglehistory", uuid=watch_uuid, timestamp="9999999999"),
headers={'x-api-key': api_key}
)
assert res.status_code == 404, "Future timestamp should return 404"
# Test 3: Negative timestamp
res = client.get(
url_for("watchsinglehistory", uuid=watch_uuid, timestamp="-1"),
headers={'x-api-key': api_key}
)
assert res.status_code == 404, "Negative timestamp should return 404"
# Test 4: Diff with reversed timestamps (from > to)
# First get actual timestamps
res = client.get(
url_for("watchhistory", uuid=watch_uuid),
headers={'x-api-key': api_key}
)
if len(res.json) >= 2:
timestamps = sorted(res.json.keys())
# Try reversed order
res = client.get(
url_for("watchhistorydiff", uuid=watch_uuid, from_timestamp=timestamps[-1], to_timestamp=timestamps[0]),
headers={'x-api-key': api_key}
)
# Should either work (show reverse diff) or return error
assert res.status_code in [200, 400]
# Cleanup
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
def test_api_notification_edge_cases(client, live_server, measure_memory_usage, datastore_path):
"""
Test notification configuration edge cases.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: Invalid notification URL
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"notification_urls": ["invalid://url", "ftp://test.com"]
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should accept (apprise validates at runtime) or reject
assert res.status_code in [201, 400]
if res.status_code == 201:
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
# Test 2: Invalid notification format
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"notification_format": "invalid_format"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should be rejected by schema
assert res.status_code == 400
# Test 3: Empty notification arrays
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": test_url,
"notification_urls": []
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should accept (empty is valid)
assert res.status_code == 201
watch_uuid = res.json.get('uuid')
client.delete(url_for("watch", uuid=watch_uuid), headers={'x-api-key': api_key})
delete_all_watches(client)
def test_api_tag_edge_cases(client, live_server, measure_memory_usage, datastore_path):
"""
Test tag/group API edge cases including XSS and path traversal.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
# Test 1: Empty tag title
res = client.post(
url_for("tag"),
data=json.dumps({"title": ""}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should be rejected (empty title)
assert res.status_code == 400
# Test 2: XSS in tag title
res = client.post(
url_for("tag"),
data=json.dumps({"title": "<script>alert(1)</script>"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should accept but sanitize
if res.status_code == 201:
tag_uuid = res.json.get('uuid')
# Verify title is stored safely
res = client.get(url_for("tag", uuid=tag_uuid), headers={'x-api-key': api_key})
# Should be escaped or sanitized
client.delete(url_for("tag", uuid=tag_uuid), headers={'x-api-key': api_key})
# Test 3: Path traversal in tag title
res = client.post(
url_for("tag"),
data=json.dumps({"title": "../../etc/passwd"}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should accept (it's just a string, not a path)
if res.status_code == 201:
tag_uuid = res.json.get('uuid')
client.delete(url_for("tag", uuid=tag_uuid), headers={'x-api-key': api_key})
# Test 4: Very long tag title
res = client.post(
url_for("tag"),
data=json.dumps({"title": "x" * 10000}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
# Should be rejected (exceeds maxLength)
assert res.status_code == 400
def test_api_authentication_edge_cases(client, live_server, measure_memory_usage, datastore_path):
"""
Test API authentication edge cases.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Test 1: Missing API key
res = client.get(url_for("createwatch"))
assert res.status_code == 403, "Missing API key should be forbidden"
# Test 2: Invalid API key
res = client.get(
url_for("createwatch"),
headers={'x-api-key': "invalid_key_12345"}
)
assert res.status_code == 403, "Invalid API key should be forbidden"
# Test 3: API key with special characters
res = client.get(
url_for("createwatch"),
headers={'x-api-key': "key<script>alert(1)</script>"}
)
assert res.status_code == 403, "Invalid API key should be forbidden"
# Test 4: Very long API key
res = client.get(
url_for("createwatch"),
headers={'x-api-key': "x" * 10000}
)
assert res.status_code == 403, "Invalid API key should be forbidden"
# Test 5: Case sensitivity of API key
wrong_case_key = api_key.upper() if api_key.islower() else api_key.lower()
res = client.get(
url_for("createwatch"),
headers={'x-api-key': wrong_case_key}
)
# Should be forbidden (keys are case-sensitive)
assert res.status_code == 403, "Wrong case API key should be forbidden"
# Test 6: Valid API key should work
res = client.get(
url_for("createwatch"),
headers={'x-api-key': api_key}
)
assert res.status_code == 200, "Valid API key should work"

View File

@@ -160,7 +160,7 @@ def test_invalid_locale(client, live_server, measure_memory_usage, datastore_pat
def test_language_persistence_in_session(client, live_server, measure_memory_usage, datastore_path):
"""
Test that the language preference persists across multiple requests
within the same session.
within the same session, and that auto-detect properly clears the preference.
"""
# Establish session cookie
@@ -184,6 +184,34 @@ def test_language_persistence_in_session(client, live_server, measure_memory_usa
assert res.status_code == 200
assert b"Annulla" in res.data, "Italian text should persist across requests"
# Verify locale is in session
with client.session_transaction() as sess:
assert sess.get('locale') == 'it', "Locale should be set in session"
# Call auto-detect to clear the locale
res = client.get(
url_for("ui.delete_locale_language_session_var_if_it_exists"),
follow_redirects=True
)
assert res.status_code == 200
# Verify the flash message appears (in English since we cleared the locale)
assert b"Language set to auto-detect from browser" in res.data, "Should show flash message"
# Verify locale was removed from session
with client.session_transaction() as sess:
assert 'locale' not in sess, "Locale should be removed from session after auto-detect"
# Now requests should use browser default (English in test environment)
res = client.get(
url_for("watchlist.index"),
follow_redirects=True
)
assert res.status_code == 200
assert b"Cancel" in res.data, "Should show English after auto-detect clears Italian"
assert b"Annulla" not in res.data, "Should not show Italian after auto-detect"
def test_set_language_with_redirect(client, live_server, measure_memory_usage, datastore_path):
"""
@@ -225,3 +253,374 @@ def test_set_language_with_redirect(client, live_server, measure_memory_usage, d
assert res.status_code in [302, 303]
# Should not redirect to evil.com
assert 'evil.com' not in res.location
def test_time_unit_translations(client, live_server, measure_memory_usage, datastore_path):
"""
Test that time unit labels (Hours, Minutes, Seconds) and Chrome Extension
are correctly translated on the settings page for all supported languages.
"""
from flask import url_for
# Establish session cookie
client.get(url_for("watchlist.index"), follow_redirects=True)
# Test Italian translations
res = client.get(url_for("set_language", locale="it"), follow_redirects=True)
assert res.status_code == 200
res = client.get(url_for("settings.settings_page"), follow_redirects=True)
assert res.status_code == 200
# Check that Italian translations are present (not English)
assert b"Minutes" not in res.data or b"Minuti" in res.data, "Expected Italian 'Minuti' not English 'Minutes'"
assert b"Ore" in res.data, "Expected Italian 'Ore' for Hours"
assert b"Minuti" in res.data, "Expected Italian 'Minuti' for Minutes"
assert b"Secondi" in res.data, "Expected Italian 'Secondi' for Seconds"
assert b"Estensione Chrome" in res.data, "Expected Italian 'Estensione Chrome' for Chrome Extension"
assert b"Intervallo tra controlli" in res.data, "Expected Italian 'Intervallo tra controlli' for Time Between Check"
assert b"Time Between Check" not in res.data, "Should not have English 'Time Between Check'"
# Test Korean translations
res = client.get(url_for("set_language", locale="ko"), follow_redirects=True)
assert res.status_code == 200
res = client.get(url_for("settings.settings_page"), follow_redirects=True)
assert res.status_code == 200
# Check that Korean translations are present (not English)
# Korean: Hours=시간, Minutes=분, Seconds=초, Chrome Extension=Chrome 확장 프로그램, Time Between Check=확인 간격
assert "시간".encode() in res.data, "Expected Korean '시간' for Hours"
assert "".encode() in res.data, "Expected Korean '' for Minutes"
assert "".encode() in res.data, "Expected Korean '' for Seconds"
assert "Chrome 확장 프로그램".encode() in res.data, "Expected Korean 'Chrome 확장 프로그램' for Chrome Extension"
assert "확인 간격".encode() in res.data, "Expected Korean '확인 간격' for Time Between Check"
# Make sure we don't have the incorrect translations
assert "목요일".encode() not in res.data, "Should not have '목요일' (Thursday) for Hours"
assert "무음".encode() not in res.data, "Should not have '무음' (Mute) for Minutes"
assert "Chrome 요청".encode() not in res.data, "Should not have 'Chrome 요청' (Chrome requests) for Chrome Extension"
assert b"Time Between Check" not in res.data, "Should not have English 'Time Between Check'"
# Test Chinese Simplified translations
res = client.get(url_for("set_language", locale="zh"), follow_redirects=True)
assert res.status_code == 200
res = client.get(url_for("settings.settings_page"), follow_redirects=True)
assert res.status_code == 200
# Check that Chinese translations are present
# Chinese: Hours=小时, Minutes=分钟, Seconds=秒, Chrome Extension=Chrome 扩展程序, Time Between Check=检查间隔
assert "小时".encode() in res.data, "Expected Chinese '小时' for Hours"
assert "分钟".encode() in res.data, "Expected Chinese '分钟' for Minutes"
assert "".encode() in res.data, "Expected Chinese '' for Seconds"
assert "Chrome 扩展程序".encode() in res.data, "Expected Chinese 'Chrome 扩展程序' for Chrome Extension"
assert "检查间隔".encode() in res.data, "Expected Chinese '检查间隔' for Time Between Check"
assert b"Time Between Check" not in res.data, "Should not have English 'Time Between Check'"
# Test German translations
res = client.get(url_for("set_language", locale="de"), follow_redirects=True)
assert res.status_code == 200
res = client.get(url_for("settings.settings_page"), follow_redirects=True)
assert res.status_code == 200
# Check that German translations are present
# German: Hours=Stunden, Minutes=Minuten, Seconds=Sekunden, Chrome Extension=Chrome-Erweiterung, Time Between Check=Prüfintervall
assert b"Stunden" in res.data, "Expected German 'Stunden' for Hours"
assert b"Minuten" in res.data, "Expected German 'Minuten' for Minutes"
assert b"Sekunden" in res.data, "Expected German 'Sekunden' for Seconds"
assert b"Chrome-Erweiterung" in res.data, "Expected German 'Chrome-Erweiterung' for Chrome Extension"
assert b"Time Between Check" not in res.data, "Should not have English 'Time Between Check'"
# Test Traditional Chinese (zh_Hant_TW) translations
res = client.get(url_for("set_language", locale="zh_Hant_TW"), follow_redirects=True)
assert res.status_code == 200
res = client.get(url_for("settings.settings_page"), follow_redirects=True)
assert res.status_code == 200
# Check that Traditional Chinese translations are present (not English)
# Traditional Chinese: Hours=小時, Minutes=分鐘, Seconds=秒, Chrome Extension=Chrome 擴充功能, Time Between Check=檢查間隔
assert "小時".encode() in res.data, "Expected Traditional Chinese '小時' for Hours"
assert "分鐘".encode() in res.data, "Expected Traditional Chinese '分鐘' for Minutes"
assert "".encode() in res.data, "Expected Traditional Chinese '' for Seconds"
assert "Chrome 擴充功能".encode() in res.data, "Expected Traditional Chinese 'Chrome 擴充功能' for Chrome Extension"
assert "發送測試通知".encode() in res.data, "Expected Traditional Chinese '發送測試通知' for Send test notification"
assert "通知除錯記錄".encode() in res.data, "Expected Traditional Chinese '通知除錯記錄' for Notification debug logs"
assert "檢查間隔".encode() in res.data, "Expected Traditional Chinese '檢查間隔' for Time Between Check"
# Make sure we don't have incorrect English text or wrong translations
assert b"Send test notification" not in res.data, "Should not have English 'Send test notification'"
assert b"Time Between Check" not in res.data, "Should not have English 'Time Between Check'"
assert "Chrome 請求".encode() not in res.data, "Should not have incorrect 'Chrome 請求' (Chrome requests)"
assert "使用預設通知".encode() not in res.data, "Should not have incorrect '使用預設通知' (Use default notification)"
def test_accept_language_header_zh_tw(client, live_server, measure_memory_usage, datastore_path):
"""
Test that browsers sending zh-TW in Accept-Language header get Traditional Chinese.
This tests the locale alias mapping for issue #3779.
"""
from flask import url_for
# Clear any session data to simulate a fresh visitor
with client.session_transaction() as sess:
sess.clear()
# Request the index page with zh-TW in Accept-Language header (what browsers send)
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8'},
follow_redirects=True
)
assert res.status_code == 200
# Should get Traditional Chinese content, not Simplified Chinese
# Traditional: 選擇語言, Simplified: 选择语言
assert '選擇語言'.encode() in res.data, "Expected Traditional Chinese '選擇語言' (Select Language)"
assert '选择语言'.encode() not in res.data, "Should not get Simplified Chinese '选择语言'"
# Check HTML lang attribute uses BCP 47 format
assert b'<html lang="zh-Hant-TW"' in res.data, "Expected BCP 47 language tag zh-Hant-TW in HTML"
# Check that the correct flag icon is shown (Taiwan flag for Traditional Chinese)
assert b'<span class="fi fi-tw fis" id="language-selector-flag">' in res.data, \
"Expected Taiwan flag 'fi fi-tw' for Traditional Chinese"
assert b'<span class="fi fi-cn fis" id="language-selector-flag">' not in res.data, \
"Should not show China flag 'fi fi-cn' for Traditional Chinese"
# Verify we're getting Traditional Chinese text throughout the page
res = client.get(
url_for("settings.settings_page"),
headers={'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8'},
follow_redirects=True
)
assert res.status_code == 200
# Check Traditional Chinese translations (not English)
assert "小時".encode() in res.data, "Expected Traditional Chinese '小時' for Hours"
assert "分鐘".encode() in res.data, "Expected Traditional Chinese '分鐘' for Minutes"
assert b"Hours" not in res.data or "小時".encode() in res.data, "Should have Traditional Chinese, not English"
def test_accept_language_header_en_variants(client, live_server, measure_memory_usage, datastore_path):
"""
Test that browsers sending en-GB and en-US in Accept-Language header get the correct English variant.
This ensures the locale selector works properly for English variants.
"""
from flask import url_for
# Test 1: British English (en-GB)
with client.session_transaction() as sess:
sess.clear()
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'en-GB,en;q=0.9'},
follow_redirects=True
)
assert res.status_code == 200
# Should get English content
assert b"Select Language" in res.data, "Expected English text 'Select Language'"
# Check HTML lang attribute uses BCP 47 format with hyphen
assert b'<html lang="en-GB"' in res.data, "Expected BCP 47 language tag en-GB in HTML"
# Check that the correct flag icon is shown (UK flag for en-GB)
assert b'<span class="fi fi-gb fis" id="language-selector-flag">' in res.data, \
"Expected UK flag 'fi fi-gb' for British English"
# Test 2: American English (en-US)
with client.session_transaction() as sess:
sess.clear()
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'en-US,en;q=0.9'},
follow_redirects=True
)
assert res.status_code == 200
# Should get English content
assert b"Select Language" in res.data, "Expected English text 'Select Language'"
# Check HTML lang attribute uses BCP 47 format with hyphen
assert b'<html lang="en-US"' in res.data, "Expected BCP 47 language tag en-US in HTML"
# Check that the correct flag icon is shown (US flag for en-US)
assert b'<span class="fi fi-us fis" id="language-selector-flag">' in res.data, \
"Expected US flag 'fi fi-us' for American English"
# Test 3: Generic 'en' should fall back to one of the English variants
with client.session_transaction() as sess:
sess.clear()
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'en'},
follow_redirects=True
)
assert res.status_code == 200
# Should get English content (either variant is fine)
assert b"Select Language" in res.data, "Expected English text 'Select Language'"
def test_accept_language_header_zh_simplified(client, live_server, measure_memory_usage, datastore_path):
"""
Test that browsers sending zh or zh-CN in Accept-Language header get Simplified Chinese.
This ensures Simplified Chinese still works correctly and doesn't get confused with Traditional.
"""
from flask import url_for
# Test 1: Generic 'zh' should get Simplified Chinese
with client.session_transaction() as sess:
sess.clear()
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'zh,en;q=0.9'},
follow_redirects=True
)
assert res.status_code == 200
# Should get Simplified Chinese content, not Traditional Chinese
# Simplified: 选择语言, Traditional: 選擇語言
assert '选择语言'.encode() in res.data, "Expected Simplified Chinese '选择语言' (Select Language)"
assert '選擇語言'.encode() not in res.data, "Should not get Traditional Chinese '選擇語言'"
# Check HTML lang attribute
assert b'<html lang="zh"' in res.data, "Expected language tag zh in HTML"
# Check that the correct flag icon is shown (China flag for Simplified Chinese)
assert b'<span class="fi fi-cn fis" id="language-selector-flag">' in res.data, \
"Expected China flag 'fi fi-cn' for Simplified Chinese"
assert b'<span class="fi fi-tw fis" id="language-selector-flag">' not in res.data, \
"Should not show Taiwan flag 'fi fi-tw' for Simplified Chinese"
# Test 2: 'zh-CN' should also get Simplified Chinese
with client.session_transaction() as sess:
sess.clear()
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8'},
follow_redirects=True
)
assert res.status_code == 200
# Should get Simplified Chinese content
assert '选择语言'.encode() in res.data, "Expected Simplified Chinese '选择语言' with zh-CN header"
assert '選擇語言'.encode() not in res.data, "Should not get Traditional Chinese with zh-CN header"
# Check that the correct flag icon is shown (China flag for zh-CN)
assert b'<span class="fi fi-cn fis" id="language-selector-flag">' in res.data, \
"Expected China flag 'fi fi-cn' for zh-CN header"
# Verify Simplified Chinese in settings page
res = client.get(
url_for("settings.settings_page"),
headers={'Accept-Language': 'zh,en;q=0.9'},
follow_redirects=True
)
assert res.status_code == 200
# Check Simplified Chinese translations (not Traditional or English)
# Simplified: 小时, Traditional: 小時
assert "小时".encode() in res.data, "Expected Simplified Chinese '小时' for Hours"
assert "分钟".encode() in res.data, "Expected Simplified Chinese '分钟' for Minutes"
assert "".encode() in res.data, "Expected Simplified Chinese '' for Seconds"
# Make sure it's not Traditional Chinese
assert "小時".encode() not in res.data, "Should not have Traditional Chinese '小時'"
assert "分鐘".encode() not in res.data, "Should not have Traditional Chinese '分鐘'"
def test_session_locale_overrides_accept_language(client, live_server, measure_memory_usage, datastore_path):
"""
Test that session locale preference overrides browser Accept-Language header.
Scenario:
1. Browser auto-detects zh-TW (Traditional Chinese) from Accept-Language header
2. User explicitly selects Korean language
3. On subsequent page loads, Korean should be shown (not Traditional Chinese)
even though the Accept-Language header still says zh-TW
This tests the session override behavior for issue #3779.
"""
from flask import url_for
# Step 1: Clear session and make first request with zh-TW header (auto-detect)
with client.session_transaction() as sess:
sess.clear()
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8'},
follow_redirects=True
)
assert res.status_code == 200
# Should initially get Traditional Chinese from auto-detect
assert '選擇語言'.encode() in res.data, "Expected Traditional Chinese '選擇語言' from auto-detect"
assert b'<html lang="zh-Hant-TW"' in res.data, "Expected zh-Hant-TW language tag"
assert b'<span class="fi fi-tw fis" id="language-selector-flag">' in res.data, \
"Expected Taiwan flag 'fi fi-tw' from auto-detect"
# Step 2: User explicitly selects Korean language
res = client.get(
url_for("set_language", locale="ko"),
headers={'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8'}, # Browser still sends zh-TW
follow_redirects=True
)
assert res.status_code == 200
# Step 3: Make another request with same zh-TW header
# Session should override the Accept-Language header
res = client.get(
url_for("watchlist.index"),
headers={'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8'}, # Still sending zh-TW!
follow_redirects=True
)
assert res.status_code == 200
# Should now get Korean (session overrides auto-detect)
# Korean: 언어 선택, Traditional Chinese: 選擇語言
assert '언어 선택'.encode() in res.data, "Expected Korean '언어 선택' (Select Language) from session"
assert '選擇語言'.encode() not in res.data, "Should not get Traditional Chinese when Korean is set in session"
# Check HTML lang attribute is Korean
assert b'<html lang="ko"' in res.data, "Expected Korean language tag 'ko' in HTML"
# Check that Korean flag is shown (not Taiwan flag)
assert b'<span class="fi fi-kr fis" id="language-selector-flag">' in res.data, \
"Expected Korean flag 'fi fi-kr' from session preference"
assert b'<span class="fi fi-tw fis" id="language-selector-flag">' not in res.data, \
"Should not show Taiwan flag when Korean is set in session"
# Verify Korean text on settings page as well
res = client.get(
url_for("settings.settings_page"),
headers={'Accept-Language': 'zh-TW,zh;q=0.9,en;q=0.8'}, # Still zh-TW!
follow_redirects=True
)
assert res.status_code == 200
# Check Korean translations (not Traditional Chinese or English)
# Korean: 시간 (Hours), 분 (Minutes), 초 (Seconds)
# Traditional Chinese: 小時, 分鐘, 秒
assert "시간".encode() in res.data, "Expected Korean '시간' for Hours"
assert "".encode() in res.data, "Expected Korean '' for Minutes"
assert "小時".encode() not in res.data, "Should not have Traditional Chinese '小時' when Korean is set"
assert "分鐘".encode() not in res.data, "Should not have Traditional Chinese '分鐘' when Korean is set"

View File

@@ -22,14 +22,13 @@ something to trigger<br>
def test_content_filter_live_preview(client, live_server, measure_memory_usage, datastore_path):
# live_server_setup(live_server) # Setup on conftest per function
set_response(datastore_path=datastore_path)
import time
test_url = url_for('test_endpoint', _external=True)
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
assert b'Queued 1 watch for rechecking.' in res.data
time.sleep(0.5)
wait_for_all_checks(client)
res = client.post(

View File

@@ -42,6 +42,9 @@ def test_check_notification_error_handling(client, live_server, measure_memory_u
)
assert b"Updated watch." in res.data
wait_for_all_checks(client)
found=False
for i in range(1, 10):

View File

@@ -0,0 +1,101 @@
# Translation Guide
## Updating Translations
To maintain consistency and minimize unnecessary changes in translation files, run these commands:
```bash
python setup.py extract_messages # Extract translatable strings
python setup.py update_catalog # Update all language files
python setup.py compile_catalog # Compile to binary .mo files
```
## Configuration
All translation settings are configured in **`../../setup.cfg`** (single source of truth).
The configuration below is shown for reference - **edit `setup.cfg` to change settings**:
```ini
[extract_messages]
# Extract translatable strings from source code
mapping_file = babel.cfg
output_file = changedetectionio/translations/messages.pot
input_paths = changedetectionio
keywords = _ _l gettext
# Options to reduce unnecessary changes in .pot files
sort_by_file = true # Keeps entries ordered by file path
width = 120 # Consistent line width (prevents rewrapping)
add_location = file # Show file path only (not line numbers)
[update_catalog]
# Update existing .po files with new strings from .pot
# Note: 'locale' is omitted - Babel auto-discovers all catalogs in output_dir
input_file = changedetectionio/translations/messages.pot
output_dir = changedetectionio/translations
domain = messages
# Options for consistent formatting
width = 120 # Consistent line width
no_fuzzy_matching = true # Avoids incorrect automatic matches
[compile_catalog]
# Compile .po files to .mo binary format
directory = changedetectionio/translations
domain = messages
```
**Key formatting options:**
- `sort_by_file = true` - Orders entries by file path (consistent ordering)
- `width = 120` - Fixed line width prevents text rewrapping
- `add_location = file` - Shows file path only, not line numbers (reduces git churn)
- `no_fuzzy_matching = true` - Prevents incorrect automatic fuzzy matches
## Why Use These Commands?
Running pybabel commands directly without consistent options causes:
- ❌ Entries get reordered differently each time
- ❌ Text gets rewrapped at different widths
- ❌ Line numbers change every edit (if not configured)
- ❌ Large diffs that make code review difficult
Using `python setup.py` commands ensures:
- ✅ Consistent ordering (by file path, not alphabetically)
- ✅ Consistent line width (120 characters, no rewrapping)
- ✅ File-only locations (no line number churn)
- ✅ No fuzzy matching (prevents incorrect auto-translations)
- ✅ Minimal diffs (only actual changes show up)
- ✅ Easier code review and git history
These commands read settings from `../../setup.cfg` automatically.
## Supported Languages
- `cs` - Czech (Čeština)
- `de` - German (Deutsch)
- `en_GB` - English (UK)
- `en_US` - English (US)
- `fr` - French (Français)
- `it` - Italian (Italiano)
- `ko` - Korean (한국어)
- `zh` - Chinese Simplified (中文简体)
- `zh_Hant_TW` - Chinese Traditional (繁體中文)
## Adding a New Language
1. Initialize the new language catalog:
```bash
pybabel init -i changedetectionio/translations/messages.pot -d changedetectionio/translations -l NEW_LANG_CODE
```
2. Compile it:
```bash
python setup.py compile_catalog
```
Babel will auto-discover the new language on subsequent translation updates.
## Translation Notes
From CLAUDE.md:
- Always use "monitor" or "watcher" terminology (not "clock")
- Use the most brief wording suitable
- When finding issues in one language, check ALL languages for the same issue

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -64,6 +64,19 @@ def is_safe_valid_url(test_url):
import re
import validators
# Validate input type first - must be a non-empty string
if test_url is None:
logger.warning('URL validation failed: URL is None')
return False
if not isinstance(test_url, str):
logger.warning(f'URL validation failed: URL must be a string, got {type(test_url).__name__}')
return False
if not test_url.strip():
logger.warning('URL validation failed: URL is empty or whitespace only')
return False
allow_file_access = strtobool(os.getenv('ALLOW_FILE_URI', 'false'))
safe_protocol_regex = '^(http|https|ftp|file):' if allow_file_access else '^(http|https|ftp):'

View File

@@ -132,11 +132,19 @@ async def start_single_async_worker(worker_id, update_q, notification_q, app, da
while not app.config.exit.is_set():
try:
await async_update_worker(worker_id, update_q, notification_q, app, datastore, executor)
# If we reach here, worker exited cleanly
if not in_pytest:
logger.info(f"Async worker {worker_id} exited cleanly")
break
result = await async_update_worker(worker_id, update_q, notification_q, app, datastore, executor)
if result == "restart":
# Worker requested restart - immediately loop back and restart
if not in_pytest:
logger.debug(f"Async worker {worker_id} restarting")
continue
else:
# Worker exited cleanly (shutdown)
if not in_pytest:
logger.info(f"Async worker {worker_id} exited cleanly")
break
except asyncio.CancelledError:
# Task was cancelled (normal shutdown)
if not in_pytest:
@@ -147,7 +155,7 @@ async def start_single_async_worker(worker_id, update_q, notification_q, app, da
if not in_pytest:
logger.info(f"Restarting async worker {worker_id} in 5 seconds...")
await asyncio.sleep(5)
if not in_pytest:
logger.info(f"Async worker {worker_id} shutdown complete")
@@ -161,7 +169,11 @@ def add_worker(update_q, notification_q, app, datastore):
"""Add a new async worker (for dynamic scaling)"""
global worker_threads
worker_id = len(worker_threads)
# Reuse lowest available ID to prevent unbounded growth over time
used_ids = {w.worker_id for w in worker_threads}
worker_id = 0
while worker_id in used_ids:
worker_id += 1
logger.info(f"Adding async worker {worker_id}")
try:
@@ -251,7 +263,7 @@ def queue_item_async_safe(update_q, item, silent=False):
return False
if not silent:
logger.debug(f"Successfully queued item: {item_uuid}")
logger.trace(f"Successfully queued item: {item_uuid}")
return True
except Exception as e:

View File

@@ -183,15 +183,30 @@ components:
properties:
weeks:
type: integer
minimum: 0
maximum: 52000
nullable: true
days:
type: integer
minimum: 0
maximum: 365000
nullable: true
hours:
type: integer
minimum: 0
maximum: 8760000
nullable: true
minutes:
type: integer
minimum: 0
maximum: 525600000
nullable: true
seconds:
type: integer
description: Time intervals between checks
minimum: 0
maximum: 31536000000
nullable: true
description: Time intervals between checks. All fields must be non-negative. At least one non-zero value required when not using default settings.
time_between_check_use_default:
type: boolean
default: true
@@ -200,7 +215,9 @@ components:
type: array
items:
type: string
description: Notification URLs for this web page change monitor (watch)
maxLength: 1000
maxItems: 100
description: Notification URLs for this web page change monitor (watch). Maximum 100 URLs.
notification_title:
type: string
description: Custom notification title
@@ -224,14 +241,19 @@ components:
operation:
type: string
maxLength: 5000
nullable: true
selector:
type: string
maxLength: 5000
nullable: true
optional_value:
type: string
maxLength: 5000
nullable: true
required: [operation, selector, optional_value]
description: Browser automation steps
additionalProperties: false
maxItems: 100
description: Browser automation steps. Maximum 100 steps allowed.
processor:
type: string
enum: [restock_diff, text_json_diff]

View File

@@ -91,7 +91,7 @@ jq~=1.3; python_version >= "3.8" and sys_platform == "linux"
# playwright is installed at Dockerfile build time because it's not available on all platforms
pyppeteer-ng==2.0.0rc11
pyppeteer-ng==2.0.0rc12
pyppeteerstealth>=0.0.4
# Include pytest, so if theres a support issue we can ask them to run these tests on their setup

28
setup.cfg Normal file
View File

@@ -0,0 +1,28 @@
# Translation configuration for changedetection.io
# See changedetectionio/translations/README.md for full documentation on updating translations
[extract_messages]
# Extract translatable strings from source code
mapping_file = babel.cfg
output_file = changedetectionio/translations/messages.pot
input_paths = changedetectionio
keywords = _ _l gettext
# Options to reduce unnecessary changes in .pot files
sort_by_file = true
width = 120
add_location = file
[update_catalog]
# Update existing .po files with new strings from .pot
# Note: Omitting 'locale' makes Babel auto-discover all catalogs in output_dir
input_file = changedetectionio/translations/messages.pot
output_dir = changedetectionio/translations
domain = messages
# Options for consistent formatting
width = 120
no_fuzzy_matching = true
[compile_catalog]
# Compile .po files to .mo binary format
directory = changedetectionio/translations
domain = messages