Compare commits

11 Commits

Author SHA1 Message Date
dgtlmoon 6f420f5bff Merge branch 'master' into datastore-refactor 2026-01-28 09:17:18 +01:00
dgtlmoon bee6713f12 WIP 2026-01-22 19:54:28 +01:00
dgtlmoon ef6208fdbd tweaks 2026-01-22 17:36:29 +01:00
dgtlmoon 1ef29ab1b8 Small tweak 2026-01-22 17:19:12 +01:00
dgtlmoon 445ce88114 Misc performance improvements 2026-01-22 17:16:40 +01:00
dgtlmoon eb83a253f4 Merge branch 'master' into datastore-refactor 2026-01-22 16:22:27 +01:00
dgtlmoon 746990391a test fix 2026-01-19 18:58:39 +01:00
dgtlmoon 1661f7b85b tweak 2026-01-19 18:55:16 +01:00
dgtlmoon 1f37b358b1 Merge branch 'master' into datastore-refactor 2026-01-19 18:52:06 +01:00
dgtlmoon 639c53f0f8 left out pypi file 2026-01-19 13:11:57 +01:00
dgtlmoon 48e8295433 Big refactor to save watches as their own datafile with some agnostic data store backend 2026-01-19 12:59:10 +01:00
67 changed files with 851 additions and 1848 deletions
@@ -37,29 +37,10 @@ jobs:
${{ runner.os }}-pip-py${{ env.PYTHON_VERSION }}- ${{ runner.os }}-pip-py${{ env.PYTHON_VERSION }}-
${{ runner.os }}-pip- ${{ runner.os }}-pip-
- name: Get current date for cache key
id: date
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build changedetection.io container for testing under Python ${{ env.PYTHON_VERSION }} - name: Build changedetection.io container for testing under Python ${{ env.PYTHON_VERSION }}
uses: docker/build-push-action@v6
with:
context: ./
file: ./Dockerfile
build-args: |
PYTHON_VERSION=${{ env.PYTHON_VERSION }}
LOGGER_LEVEL=TRACE
tags: test-changedetectionio
load: true
cache-from: type=gha,scope=build-${{ github.ref_name }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements.txt', 'Dockerfile') }}-${{ steps.date.outputs.date }}
cache-to: type=gha,mode=max,scope=build-${{ github.ref_name }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements.txt', 'Dockerfile') }}-${{ steps.date.outputs.date }}
- name: Verify build
run: | run: |
echo "---- Built for Python ${{ env.PYTHON_VERSION }} -----" echo "---- Building for Python ${{ env.PYTHON_VERSION }} -----"
docker build --build-arg PYTHON_VERSION=${{ env.PYTHON_VERSION }} --build-arg LOGGER_LEVEL=TRACE -t test-changedetectionio .
docker run test-changedetectionio bash -c 'pip list' docker run test-changedetectionio bash -c 'pip list'
- name: We should be Python ${{ env.PYTHON_VERSION }} ... - name: We should be Python ${{ env.PYTHON_VERSION }} ...
@@ -395,29 +376,6 @@ jobs:
cd changedetectionio cd changedetectionio
./run_custom_browser_url_tests.sh ./run_custom_browser_url_tests.sh
processor-plugin-tests:
runs-on: ubuntu-latest
needs: build
timeout-minutes: 20
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v7
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
- name: Load Docker image
run: |
docker load -i /tmp/test-changedetectionio.tar
- name: Basic processor plugin registration and checks
run: |
docker run -e EXTRA_PACKAGES=changedetection.io-osint-processor test-changedetectionio bash -c 'cd changedetectionio;pytest -vvv -s tests/plugins/test_processor.py::test_check_plugin_processor'
# Container startup tests # Container startup tests
container-tests: container-tests:
runs-on: ubuntu-latest runs-on: ubuntu-latest
-1
@@ -29,4 +29,3 @@ test-datastore/
# Memory consumption log # Memory consumption log
test-memory.log test-memory.log
tests/logs/
-9
@@ -138,15 +138,6 @@ ENV LOGGER_LEVEL="$LOGGER_LEVEL"
ENV LC_ALL=en_US.UTF-8 ENV LC_ALL=en_US.UTF-8
WORKDIR /app WORKDIR /app
# Copy and set up entrypoint script for installing extra packages
COPY docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh
# Set entrypoint to handle EXTRA_PACKAGES env var
ENTRYPOINT ["/docker-entrypoint.sh"]
# Default command (can be overridden in docker-compose.yml)
CMD ["python", "./changedetection.py", "-d", "/datastore"] CMD ["python", "./changedetection.py", "-d", "/datastore"]
+72 -68
@@ -102,8 +102,8 @@ def sigshutdown_handler(_signo, _stack_frame):
# Shutdown workers and queues immediately # Shutdown workers and queues immediately
try: try:
from changedetectionio import worker_pool from changedetectionio import worker_handler
worker_pool.shutdown_workers() worker_handler.shutdown_workers()
except Exception as e: except Exception as e:
logger.error(f"Error shutting down workers: {str(e)}") logger.error(f"Error shutting down workers: {str(e)}")
@@ -133,48 +133,43 @@ def sigshutdown_handler(_signo, _stack_frame):
sys.exit() sys.exit()
def print_help():
"""Print help text for command line options"""
print('Usage: changedetection.py [options]')
print('')
print('Standard options:')
print(' -s SSL enable')
print(' -h HOST Listen host (default: 0.0.0.0)')
print(' -p PORT Listen port (default: 5000)')
print(' -d PATH Datastore path')
print(' -l LEVEL Log level (TRACE, DEBUG, INFO, SUCCESS, WARNING, ERROR, CRITICAL)')
print(' -c Cleanup unused snapshots')
print(' -C Create datastore directory if it doesn\'t exist')
print(' -P true/false Set all watches paused (true) or active (false)')
print('')
print('Add URLs on startup:')
print(' -u URL Add URL to watch (can be used multiple times)')
print(' -u0 \'JSON\' Set options for first -u URL (e.g. \'{"processor":"text_json_diff"}\')')
print(' -u1 \'JSON\' Set options for second -u URL (0-indexed)')
print(' -u2 \'JSON\' Set options for third -u URL, etc.')
print(' Available options: processor, fetch_backend, headers, method, etc.')
print(' See model/Watch.py for all available options')
print('')
print('Recheck on startup:')
print(' -r all Queue all watches for recheck on startup')
print(' -r UUID,... Queue specific watches (comma-separated UUIDs)')
print(' -r all N Queue all watches, wait for completion, repeat N times')
print(' -r UUID,... N Queue specific watches, wait for completion, repeat N times')
print('')
print('Batch mode:')
print(' -b Run in batch mode (process queue then exit)')
print(' Useful for CI/CD, cron jobs, or one-time checks')
print(' NOTE: Batch mode checks if Flask is running and aborts if port is in use')
print(' Use -p PORT to specify a different port if needed')
print('')
def main(): def main():
global datastore global datastore
global app global app
# Early help/version check before any initialization # Early help/version check before any initialization
if '--help' in sys.argv or '-help' in sys.argv: if '--help' in sys.argv or '-help' in sys.argv:
print_help() print('Usage: changedetection.py [options]')
print('')
print('Standard options:')
print(' -s SSL enable')
print(' -h HOST Listen host (default: 0.0.0.0)')
print(' -p PORT Listen port (default: 5000)')
print(' -d PATH Datastore path')
print(' -l LEVEL Log level (TRACE, DEBUG, INFO, SUCCESS, WARNING, ERROR, CRITICAL)')
print(' -c Cleanup unused snapshots')
print(' -C Create datastore directory if it doesn\'t exist')
print('')
print('Add URLs on startup:')
print(' -u URL Add URL to watch (can be used multiple times)')
print(' -u0 \'JSON\' Set options for first -u URL (e.g. \'{"processor":"text_json_diff"}\')')
print(' -u1 \'JSON\' Set options for second -u URL (0-indexed)')
print(' -u2 \'JSON\' Set options for third -u URL, etc.')
print(' Available options: processor, fetch_backend, headers, method, etc.')
print(' See model/Watch.py for all available options')
print('')
print('Recheck on startup:')
print(' -r all Queue all watches for recheck on startup')
print(' -r UUID,... Queue specific watches (comma-separated UUIDs)')
print(' -r all N Queue all watches, wait for completion, repeat N times')
print(' -r UUID,... N Queue specific watches, wait for completion, repeat N times')
print('')
print('Batch mode:')
print(' -b Run in batch mode (process queue then exit)')
print(' Useful for CI/CD, cron jobs, or one-time checks')
print(' NOTE: Batch mode checks if Flask is running and aborts if port is in use')
print(' Use -p PORT to specify a different port if needed')
print('')
sys.exit(0) sys.exit(0)
if '--version' in sys.argv or '-v' in sys.argv: if '--version' in sys.argv or '-v' in sys.argv:
@@ -190,7 +185,6 @@ def main():
# Set a default logger level # Set a default logger level
logger_level = 'DEBUG' logger_level = 'DEBUG'
include_default_watches = True include_default_watches = True
all_paused = None # None means don't change, True/False to set
host = os.environ.get("LISTEN_HOST", "0.0.0.0").strip() host = os.environ.get("LISTEN_HOST", "0.0.0.0").strip()
port = int(os.environ.get('PORT', 5000)) port = int(os.environ.get('PORT', 5000))
@@ -269,9 +263,39 @@ def main():
i += 1 i += 1
try: try:
opts, args = getopt.getopt(cleaned_argv[1:], "6Ccsd:h:p:l:P:", "port") opts, args = getopt.getopt(cleaned_argv[1:], "6Ccsd:h:p:l:", "port")
except getopt.GetoptError as e: except getopt.GetoptError as e:
print_help() print('Usage: changedetection.py [options]')
print('')
print('Standard options:')
print(' -s SSL enable')
print(' -h HOST Listen host (default: 0.0.0.0)')
print(' -p PORT Listen port (default: 5000)')
print(' -d PATH Datastore path')
print(' -l LEVEL Log level (TRACE, DEBUG, INFO, SUCCESS, WARNING, ERROR, CRITICAL)')
print(' -c Cleanup unused snapshots')
print(' -C Create datastore directory if it doesn\'t exist')
print('')
print('Add URLs on startup:')
print(' -u URL Add URL to watch (can be used multiple times)')
print(' -u0 \'JSON\' Set options for first -u URL (e.g. \'{"processor":"text_json_diff"}\')')
print(' -u1 \'JSON\' Set options for second -u URL (0-indexed)')
print(' -u2 \'JSON\' Set options for third -u URL, etc.')
print(' Available options: processor, fetch_backend, headers, method, etc.')
print(' See model/Watch.py for all available options')
print('')
print('Recheck on startup:')
print(' -r all Queue all watches for recheck on startup')
print(' -r UUID,... Queue specific watches (comma-separated UUIDs)')
print(' -r all N Queue all watches, wait for completion, repeat N times')
print(' -r UUID,... N Queue specific watches, wait for completion, repeat N times')
print('')
print('Batch mode:')
print(' -b Run in batch mode (process queue then exit)')
print(' Useful for CI/CD, cron jobs, or one-time checks')
print(' NOTE: Batch mode checks if Flask is running and aborts if port is in use')
print(' Use -p PORT to specify a different port if needed')
print('')
print(f'Error: {e}') print(f'Error: {e}')
sys.exit(2) sys.exit(2)
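Review note: this hunk drops `P:` from the getopt option string, so a command line that still passes `-P` ends up in the `GetoptError` branch and exits with status 2. Below is a minimal sketch of that difference using only the standard-library `getopt` module. The `parse` helper and the sample argument list are made up for illustration and are not part of changedetection.py.

```python
import getopt

OLD_OPTS = "6Ccsd:h:p:l:P:"   # option string on the removed side of the hunk
NEW_OPTS = "6Ccsd:h:p:l:"     # option string on the added side


def parse(argv, optstring):
    """Return (opts, error_message); error_message is None on success."""
    try:
        opts, _args = getopt.getopt(argv, optstring)
        return opts, None
    except getopt.GetoptError as e:
        return [], str(e)


sample = ["-p", "5001", "-P", "true"]   # hypothetical command line
old_opts, old_err = parse(sample, OLD_OPTS)
new_opts, new_err = parse(sample, NEW_OPTS)
print(old_opts, old_err)
print(new_opts, new_err)
```

With the old string both options parse; with the new string the same arguments raise `GetoptError`, which is worth keeping in mind for any existing scripts or compose files that pass `-P`.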
@@ -308,14 +332,6 @@ def main():
if opt == '-l': if opt == '-l':
logger_level = int(arg) if arg.isdigit() else arg.upper() logger_level = int(arg) if arg.isdigit() else arg.upper()
if opt == '-P':
try:
all_paused = bool(strtobool(arg))
except ValueError:
print(f'Error: Invalid value for -P option: {arg}')
print('Expected: true, false, yes, no, 1, or 0')
sys.exit(2)
# If URLs are provided, don't include default watches # If URLs are provided, don't include default watches
if urls_to_add: if urls_to_add:
include_default_watches = False include_default_watches = False
@@ -382,11 +398,6 @@ def main():
logger.critical(str(e)) logger.critical(str(e))
return return
# Apply all_paused setting if specified via CLI
if all_paused is not None:
datastore.data['settings']['application']['all_paused'] = all_paused
logger.info(f"Setting all watches paused: {all_paused}")
# Inject datastore into plugins that need access to settings # Inject datastore into plugins that need access to settings
from changedetectionio.pluggy_interface import inject_datastore_into_plugins from changedetectionio.pluggy_interface import inject_datastore_into_plugins
inject_datastore_into_plugins(datastore) inject_datastore_into_plugins(datastore)
@@ -415,12 +426,12 @@ def main():
# This must happen AFTER app initialization so update_q is available # This must happen AFTER app initialization so update_q is available
if batch_mode and added_watch_uuids: if batch_mode and added_watch_uuids:
from changedetectionio.flask_app import update_q from changedetectionio.flask_app import update_q
from changedetectionio import queuedWatchMetaData, worker_pool from changedetectionio import queuedWatchMetaData, worker_handler
logger.info(f"Batch mode: Queuing {len(added_watch_uuids)} newly added watches") logger.info(f"Batch mode: Queuing {len(added_watch_uuids)} newly added watches")
for watch_uuid in added_watch_uuids: for watch_uuid in added_watch_uuids:
try: try:
worker_pool.queue_item_async_safe( worker_handler.queue_item_async_safe(
update_q, update_q,
queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}) queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})
) )
@@ -432,7 +443,7 @@ def main():
# This must happen AFTER app initialization so update_q is available # This must happen AFTER app initialization so update_q is available
if recheck_watches is not None: if recheck_watches is not None:
from changedetectionio.flask_app import update_q from changedetectionio.flask_app import update_q
from changedetectionio import queuedWatchMetaData, worker_pool from changedetectionio import queuedWatchMetaData, worker_handler
watches_to_queue = [] watches_to_queue = []
if recheck_watches == 'all': if recheck_watches == 'all':
@@ -454,7 +465,7 @@ def main():
for watch_uuid in watches_to_queue: for watch_uuid in watches_to_queue:
if watch_uuid in datastore.data['watching']: if watch_uuid in datastore.data['watching']:
try: try:
worker_pool.queue_item_async_safe( worker_handler.queue_item_async_safe(
update_q, update_q,
queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}) queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})
) )
@@ -516,7 +527,7 @@ def main():
for watch_uuid in watches_to_queue: for watch_uuid in watches_to_queue:
if watch_uuid in datastore.data['watching']: if watch_uuid in datastore.data['watching']:
try: try:
worker_pool.queue_item_async_safe( worker_handler.queue_item_async_safe(
update_q, update_q,
queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}) queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})
) )
@@ -549,7 +560,7 @@ def main():
logger.info(f"Batch mode: Waiting for iteration {current_iteration}/{total_iterations} to complete...") logger.info(f"Batch mode: Waiting for iteration {current_iteration}/{total_iterations} to complete...")
# Use the shared wait_for_all_checks function # Use the shared wait_for_all_checks function
completed = worker_pool.wait_for_all_checks(update_q, timeout=300) completed = worker_handler.wait_for_all_checks(update_q, timeout=300)
if not completed: if not completed:
logger.warning(f"Batch mode: Iteration {current_iteration} timed out after 300 seconds") logger.warning(f"Batch mode: Iteration {current_iteration} timed out after 300 seconds")
@@ -642,14 +653,7 @@ def main():
if os.getenv('USE_X_SETTINGS'): if os.getenv('USE_X_SETTINGS'):
logger.info("USE_X_SETTINGS is ENABLED") logger.info("USE_X_SETTINGS is ENABLED")
from werkzeug.middleware.proxy_fix import ProxyFix from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix( app.wsgi_app = ProxyFix(app.wsgi_app, x_prefix=1, x_host=1)
app.wsgi_app,
x_for=1, # X-Forwarded-For (client IP)
x_proto=1, # X-Forwarded-Proto (http/https)
x_host=1, # X-Forwarded-Host (original host)
x_port=1, # X-Forwarded-Port (original port)
x_prefix=1 # X-Forwarded-Prefix (URL prefix)
)
# In batch mode, skip starting the HTTP server - just keep workers running # In batch mode, skip starting the HTTP server - just keep workers running
+3 -13
@@ -1,5 +1,5 @@
from changedetectionio import queuedWatchMetaData from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_pool from changedetectionio import worker_handler
from flask_expects_json import expects_json from flask_expects_json import expects_json
from flask_restful import abort, Resource from flask_restful import abort, Resource
from loguru import logger from loguru import logger
@@ -42,7 +42,7 @@ class Tag(Resource):
# If less than 20 watches, queue synchronously for immediate feedback # If less than 20 watches, queue synchronously for immediate feedback
if len(watches_to_queue) < 20: if len(watches_to_queue) < 20:
for watch_uuid in watches_to_queue: for watch_uuid in watches_to_queue:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})) worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
return {'status': f'OK, queued {len(watches_to_queue)} watches for rechecking'}, 200 return {'status': f'OK, queued {len(watches_to_queue)} watches for rechecking'}, 200
else: else:
# 20+ watches - queue in background thread to avoid blocking API response # 20+ watches - queue in background thread to avoid blocking API response
@@ -50,7 +50,7 @@ class Tag(Resource):
"""Background thread to queue watches - discarded after completion.""" """Background thread to queue watches - discarded after completion."""
try: try:
for watch_uuid in watches_to_queue: for watch_uuid in watches_to_queue:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})) worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
logger.info(f"Background queueing complete for tag {tag['uuid']}: {len(watches_to_queue)} watches queued") logger.info(f"Background queueing complete for tag {tag['uuid']}: {len(watches_to_queue)} watches queued")
except Exception as e: except Exception as e:
logger.error(f"Error in background queueing for tag {tag['uuid']}: {e}") logger.error(f"Error in background queueing for tag {tag['uuid']}: {e}")
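Review note: the rename in this file leaves the queueing strategy untouched: fewer than 20 watches are queued inline, larger batches are queued from a throwaway background thread so the API response is not blocked. The sketch below shows that pattern in isolation. It is an assumption-laden stand-in: `queue.Queue` replaces `update_q`, a plain dict replaces `PrioritizedItem`, and `queue_watches` is a hypothetical helper rather than anything in the codebase.

```python
import queue
import threading

SYNC_THRESHOLD = 20  # same cut-off as the `< 20` check in the diff


def queue_watches(update_q, watch_uuids):
    """Queue inline for small batches, otherwise from a background thread.

    Returns (mode, thread); thread is None in the synchronous case.
    """
    def enqueue_all():
        for watch_uuid in watch_uuids:
            update_q.put({'uuid': watch_uuid})

    if len(watch_uuids) < SYNC_THRESHOLD:
        enqueue_all()
        return 'sync', None

    worker = threading.Thread(target=enqueue_all, daemon=True)
    worker.start()
    return 'background', worker


q = queue.Queue()
mode_small, _ = queue_watches(q, [f'uuid-{i}' for i in range(5)])
mode_large, t = queue_watches(q, [f'uuid-{i}' for i in range(25)])
t.join()  # only so the example can inspect the queue deterministically
print(mode_small, mode_large, q.qsize())
```

The `join()` exists purely for the demonstration; the code in the diff lets the thread finish on its own.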
@@ -96,16 +96,6 @@ class Tag(Resource):
if not tag: if not tag:
abort(404, message='No tag exists with the UUID of {}'.format(uuid)) abort(404, message='No tag exists with the UUID of {}'.format(uuid))
# Validate notification_urls if provided
if 'notification_urls' in request.json:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
notification_urls = request.json.get('notification_urls', [])
validate_notification_urls(notification_urls)
except ValidationError as e:
return str(e), 400
tag.update(request.json) tag.update(request.json)
self.datastore.needs_write_urgent = True self.datastore.needs_write_urgent = True
+55 -45
@@ -6,7 +6,7 @@ from changedetectionio.favicon_utils import get_favicon_mime_type
from . import auth from . import auth
from changedetectionio import queuedWatchMetaData, strtobool from changedetectionio import queuedWatchMetaData, strtobool
from changedetectionio import worker_pool from changedetectionio import worker_handler
from flask import request, make_response, send_from_directory from flask import request, make_response, send_from_directory
from flask_expects_json import expects_json from flask_expects_json import expects_json
from flask_restful import abort, Resource from flask_restful import abort, Resource
@@ -85,7 +85,7 @@ class Watch(Resource):
abort(404, message='No watch exists with the UUID of {}'.format(uuid)) abort(404, message='No watch exists with the UUID of {}'.format(uuid))
if request.args.get('recheck'): if request.args.get('recheck'):
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return "OK", 200 return "OK", 200
if request.args.get('paused', '') == 'paused': if request.args.get('paused', '') == 'paused':
self.datastore.data['watching'].get(uuid).pause() self.datastore.data['watching'].get(uuid).pause()
@@ -140,16 +140,6 @@ class Watch(Resource):
if validation_error: if validation_error:
return validation_error, 400 return validation_error, 400
# Validate notification_urls if provided
if 'notification_urls' in request.json:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
notification_urls = request.json.get('notification_urls', [])
validate_notification_urls(notification_urls)
except ValidationError as e:
return str(e), 400
# XSS etc protection - validate URL if it's being updated # XSS etc protection - validate URL if it's being updated
if 'url' in request.json: if 'url' in request.json:
new_url = request.json.get('url') new_url = request.json.get('url')
@@ -169,18 +159,58 @@ class Watch(Resource):
# Handle processor-config-* fields separately (save to JSON, not datastore) # Handle processor-config-* fields separately (save to JSON, not datastore)
from changedetectionio import processors from changedetectionio import processors
processor_config_data = {}
regular_data = {}
# Make a mutable copy of request.json for modification for key, value in request.json.items():
json_data = dict(request.json) if key.startswith('processor_config_'):
config_key = key.replace('processor_config_', '')
# Extract and remove processor config fields from json_data if value: # Only save non-empty values
processor_config_data = processors.extract_processor_config_from_form_data(json_data) processor_config_data[config_key] = value
else:
regular_data[key] = value
# Update watch with regular (non-processor-config) fields # Update watch with regular (non-processor-config) fields
watch.update(json_data) watch.update(regular_data)
# Save processor config to JSON file # Save processor config to JSON file if any config data exists
processors.save_processor_config(self.datastore, uuid, processor_config_data) if processor_config_data:
try:
processor_name = request.json.get('processor', watch.get('processor'))
if processor_name:
# Create a processor instance to access config methods
from changedetectionio.processors import difference_detection_processor
processor_instance = difference_detection_processor(self.datastore, uuid)
# Use processor name as filename so each processor keeps its own config
config_filename = f'{processor_name}.json'
processor_instance.update_extra_watch_config(config_filename, processor_config_data)
logger.debug(f"API: Saved processor config to {config_filename}: {processor_config_data}")
# Call optional edit_hook if processor has one
try:
import importlib
edit_hook_module_name = f'changedetectionio.processors.{processor_name}.edit_hook'
try:
edit_hook = importlib.import_module(edit_hook_module_name)
logger.debug(f"API: Found edit_hook module for {processor_name}")
if hasattr(edit_hook, 'on_config_save'):
logger.info(f"API: Calling edit_hook.on_config_save for {processor_name}")
# Call hook and get updated config
updated_config = edit_hook.on_config_save(watch, processor_config_data, self.datastore)
# Save updated config back to file
processor_instance.update_extra_watch_config(config_filename, updated_config)
logger.info(f"API: Edit hook updated config: {updated_config}")
else:
logger.debug(f"API: Edit hook module found but no on_config_save function")
except ModuleNotFoundError:
logger.debug(f"API: No edit_hook module for processor {processor_name} (this is normal)")
except Exception as hook_error:
logger.error(f"API: Edit hook error (non-fatal): {hook_error}", exc_info=True)
except Exception as e:
logger.error(f"API: Failed to save processor config: {e}")
return "OK", 200 return "OK", 200
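Review note: the inlined loop on the added side splits the request body by the `processor_config_` prefix and strips it with `key.replace(...)`, which would also remove any later repetition of the prefix inside the key. A minimal sketch of the same split that slices the prefix off once instead; `split_processor_config` and the sample payload are hypothetical names used only for illustration.

```python
PREFIX = 'processor_config_'


def split_processor_config(payload):
    """Split a request payload into (processor_config, regular_fields).

    Mirrors the loop in the hunk: prefixed keys go to the processor config
    (only when the value is truthy), everything else is a regular field.
    """
    processor_config = {}
    regular = {}
    for key, value in payload.items():
        if key.startswith(PREFIX):
            config_key = key[len(PREFIX):]  # strip the prefix only once
            if value:
                processor_config[config_key] = value
        else:
            regular[key] = value
    return processor_config, regular


payload = {
    'url': 'https://example.com',
    'processor_config_threshold': 5,
    'processor_config_note': '',
}
config, regular = split_processor_config(payload)
print(config, regular)
```

As in the diff, the `if value:` test discards every falsy value, so a deliberate `0` or `False` in a processor config field would be dropped along with empty strings.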
@@ -414,16 +444,6 @@ class CreateWatch(Resource):
if validation_error: if validation_error:
return validation_error, 400 return validation_error, 400
# Validate notification_urls if provided
if 'notification_urls' in json_data:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
notification_urls = json_data.get('notification_urls', [])
validate_notification_urls(notification_urls)
except ValidationError as e:
return str(e), 400
extras = copy.deepcopy(json_data) extras = copy.deepcopy(json_data)
# Because we renamed 'tag' to 'tags' but don't want to change the API (can do this in v2 of the API) # Because we renamed 'tag' to 'tags' but don't want to change the API (can do this in v2 of the API)
@@ -437,19 +457,9 @@ class CreateWatch(Resource):
new_uuid = self.datastore.add_watch(url=url, extras=extras, tag=tags) new_uuid = self.datastore.add_watch(url=url, extras=extras, tag=tags)
if new_uuid: if new_uuid:
# Dont queue because the scheduler will check that it hasnt been checked before anyway # Dont queue because the scheduler will check that it hasnt been checked before anyway
# worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid})) # worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
return {'uuid': new_uuid}, 201 return {'uuid': new_uuid}, 201
else: else:
# Check if it was a limit issue
page_watch_limit = os.getenv('PAGE_WATCH_LIMIT')
if page_watch_limit:
try:
page_watch_limit = int(page_watch_limit)
current_watch_count = len(self.datastore.data['watching'])
if current_watch_count >= page_watch_limit:
return f"Watch limit reached ({current_watch_count}/{page_watch_limit} watches). Cannot add more watches.", 429
except ValueError:
pass
return "Invalid or unsupported URL", 400 return "Invalid or unsupported URL", 400
@auth.check_token @auth.check_token
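Review note: the removed branch in this hunk distinguished "watch limit reached" (HTTP 429) from "invalid URL" (HTTP 400) by reading `PAGE_WATCH_LIMIT`; on the added side every failure from `add_watch` is reported as 400. The decision it made can be sketched as a small pure function. `watch_limit_reached` is a hypothetical helper, and the raw limit is passed in as a string or `None` instead of being read from the environment.

```python
def watch_limit_reached(current_count, raw_limit):
    """True when raw_limit is a valid integer and current_count has hit it.

    An unset or non-numeric limit means "no limit", as in the removed branch.
    """
    if not raw_limit:
        return False
    try:
        limit = int(raw_limit)
    except ValueError:
        return False
    return current_count >= limit


print(watch_limit_reached(10, '10'))
print(watch_limit_reached(9, '10'))
print(watch_limit_reached(50, 'abc'))
print(watch_limit_reached(50, None))
```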
@@ -484,7 +494,7 @@ class CreateWatch(Resource):
if len(watches_to_queue) < 20: if len(watches_to_queue) < 20:
# Get already queued/running UUIDs once (efficient) # Get already queued/running UUIDs once (efficient)
queued_uuids = set(self.update_q.get_queued_uuids()) queued_uuids = set(self.update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids()) running_uuids = set(worker_handler.get_running_uuids())
# Filter out watches that are already queued or running # Filter out watches that are already queued or running
watches_to_queue_filtered = [ watches_to_queue_filtered = [
@@ -494,7 +504,7 @@ class CreateWatch(Resource):
# Queue only the filtered watches # Queue only the filtered watches
for uuid in watches_to_queue_filtered: for uuid in watches_to_queue_filtered:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Provide feedback about skipped watches # Provide feedback about skipped watches
skipped_count = len(watches_to_queue) - len(watches_to_queue_filtered) skipped_count = len(watches_to_queue) - len(watches_to_queue_filtered)
@@ -506,7 +516,7 @@ class CreateWatch(Resource):
# 20+ watches - queue in background thread to avoid blocking API response # 20+ watches - queue in background thread to avoid blocking API response
# Capture queued/running state before background thread # Capture queued/running state before background thread
queued_uuids = set(self.update_q.get_queued_uuids()) queued_uuids = set(self.update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids()) running_uuids = set(worker_handler.get_running_uuids())
def queue_all_watches_background(): def queue_all_watches_background():
"""Background thread to queue all watches - discarded after completion.""" """Background thread to queue all watches - discarded after completion."""
@@ -516,7 +526,7 @@ class CreateWatch(Resource):
for uuid in watches_to_queue: for uuid in watches_to_queue:
# Check if already queued or running (state captured at start) # Check if already queued or running (state captured at start)
if uuid not in queued_uuids and uuid not in running_uuids: if uuid not in queued_uuids and uuid not in running_uuids:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
queued_count += 1 queued_count += 1
else: else:
skipped_count += 1 skipped_count += 1
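Review note: the queueing loop in these hunks filters against a snapshot of queued and running UUIDs taken before the loop starts. A minimal sketch of that pattern follows. `queue_unclaimed` and its `enqueue` callback are hypothetical names used only for illustration; in the diff the enqueue step is `queue_item_async_safe(...)`.

```python
def queue_unclaimed(watch_uuids, queued_uuids, running_uuids, enqueue):
    """Enqueue every UUID that is in neither snapshot.

    queued_uuids / running_uuids are sets captured once, before the loop,
    so membership tests are O(1) and the state is not re-read per watch.
    Returns (queued_count, skipped_count).
    """
    queued_count = 0
    skipped_count = 0
    for uuid in watch_uuids:
        if uuid not in queued_uuids and uuid not in running_uuids:
            enqueue(uuid)  # hypothetical stand-in for queue_item_async_safe
            queued_count += 1
        else:
            skipped_count += 1
    return queued_count, skipped_count
```

Because the snapshot is taken once, a watch that becomes queued or running after the capture can still be enqueued again. The claim step in the worker is what guards against processing the same UUID twice.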
@@ -3,9 +3,7 @@ from .processors.exceptions import ProcessorException
import changedetectionio.content_fetchers.exceptions as content_fetchers_exceptions import changedetectionio.content_fetchers.exceptions as content_fetchers_exceptions
from changedetectionio.processors.text_json_diff.processor import FilterNotFoundInResponse from changedetectionio.processors.text_json_diff.processor import FilterNotFoundInResponse
from changedetectionio import html_tools from changedetectionio import html_tools
from changedetectionio import worker_pool
from changedetectionio.flask_app import watch_check_update from changedetectionio.flask_app import watch_check_update
from changedetectionio.queuedWatchMetaData import PrioritizedItem
import asyncio import asyncio
import importlib import importlib
@@ -48,33 +46,19 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
jobs_processed = 0 jobs_processed = 0
start_time = time.time() start_time = time.time()
# Log thread name for debugging logger.info(f"Starting async worker {worker_id} (max_jobs={max_jobs}, max_runtime={max_runtime_seconds}s)")
import threading
thread_name = threading.current_thread().name
logger.info(f"Starting async worker {worker_id} on thread '{thread_name}' (max_jobs={max_jobs}, max_runtime={max_runtime_seconds}s)")
while not app.config.exit.is_set(): while not app.config.exit.is_set():
update_handler = None update_handler = None
watch = None watch = None
try: try:
# Efficient blocking via run_in_executor (no polling overhead!) # Use sync interface via run_in_executor since each worker has its own event loop
# Worker blocks in threading.Queue.get() which uses Condition.wait() loop = asyncio.get_event_loop()
# Executor must be sized to match worker count (see worker_pool.py: 50 threads default) queued_item_data = await asyncio.wait_for(
# Single timeout (no double-timeout wrapper) = no race condition loop.run_in_executor(executor, q.get, True, 1.0), # block=True, timeout=1.0
queued_item_data = await q.async_get(executor=executor, timeout=1.0) timeout=1.5
)
# CRITICAL: Claim UUID immediately after getting from queue to prevent race condition
# in wait_for_all_checks() which checks qsize() and running_uuids separately
uuid = queued_item_data.item.get('uuid')
if not worker_pool.claim_uuid_for_processing(uuid, worker_id):
# Already being processed - re-queue and continue
logger.trace(f"Worker {worker_id} detected UUID {uuid} already processing during claim - deferring")
await asyncio.sleep(DEFER_SLEEP_TIME_ALREADY_QUEUED)
deferred_priority = max(1000, queued_item_data.priority * 10)
deferred_item = PrioritizedItem(priority=deferred_priority, item=queued_item_data.item)
worker_pool.queue_item_async_safe(q, deferred_item, silent=True)
continue
except asyncio.TimeoutError: except asyncio.TimeoutError:
# No jobs available - check if we should restart based on time while idle # No jobs available - check if we should restart based on time while idle
@@ -83,17 +67,6 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
logger.info(f"Worker {worker_id} idle and reached max runtime ({runtime:.0f}s), restarting") logger.info(f"Worker {worker_id} idle and reached max runtime ({runtime:.0f}s), restarting")
return "restart" return "restart"
continue continue
except RuntimeError as e:
# Handle executor shutdown gracefully - this is expected during shutdown
if "cannot schedule new futures after shutdown" in str(e):
# Executor shut down - exit gracefully without logging in pytest
if not IN_PYTEST:
logger.debug(f"Worker {worker_id} detected executor shutdown, exiting")
break
# Other RuntimeError - log and continue
logger.error(f"Worker {worker_id} runtime error: {e}")
await asyncio.sleep(0.1)
continue
except Exception as e: except Exception as e:
# Handle expected Empty exception from queue timeout # Handle expected Empty exception from queue timeout
import queue import queue
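Review note: one column of the queue-fetch hunk wraps a blocking `q.get` in `run_in_executor` and then in `asyncio.wait_for`. The sketch below reproduces only that shape with a plain `queue.PriorityQueue`. `get_with_timeout` and `demo` are hypothetical helpers, and the short timeouts are chosen to keep the demo fast.

```python
import asyncio
import queue
from concurrent.futures import ThreadPoolExecutor


async def get_with_timeout(q, executor, block_timeout=1.0, outer_timeout=1.5):
    """Return the next queue item, or None if nothing arrived in time."""
    loop = asyncio.get_running_loop()
    try:
        # q.get(True, block_timeout) blocks inside a worker thread of the
        # executor, so the event loop itself stays free while waiting.
        return await asyncio.wait_for(
            loop.run_in_executor(executor, q.get, True, block_timeout),
            timeout=outer_timeout,
        )
    except (asyncio.TimeoutError, queue.Empty):
        # queue.Empty comes from the inner timeout, TimeoutError from the outer.
        return None


def demo():
    q = queue.PriorityQueue()
    q.put((1, 'uuid-a'))
    with ThreadPoolExecutor(max_workers=1) as executor:
        first = asyncio.run(get_with_timeout(q, executor, 0.05, 0.5))
        second = asyncio.run(get_with_timeout(q, executor, 0.05, 0.5))
    return first, second
```

With two timeouts, the outer one can fire while the thread is still blocked in `q.get`. The thread may then take an item that nobody awaits, which appears to be the race the "no double-timeout wrapper" comment in the other column refers to. Keeping the outer timeout longer than the inner one, as here, makes that less likely but does not rule it out.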
@@ -115,8 +88,26 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
await asyncio.sleep(0.1) await asyncio.sleep(0.1)
continue continue
# UUID already claimed above immediately after getting from queue uuid = queued_item_data.item.get('uuid')
# to prevent race condition with wait_for_all_checks()
# RACE CONDITION FIX: Atomically claim this UUID for processing
from changedetectionio import worker_handler
from changedetectionio.queuedWatchMetaData import PrioritizedItem
# Try to claim the UUID atomically - prevents duplicate processing
if not worker_handler.claim_uuid_for_processing(uuid, worker_id):
# Already being processed by another worker
logger.trace(f"Worker {worker_id} detected UUID {uuid} already being processed - deferring")
# Sleep to avoid tight loop and give the other worker time to finish
await asyncio.sleep(DEFER_SLEEP_TIME_ALREADY_QUEUED)
# Re-queue with lower priority so it gets checked again after current processing finishes
deferred_priority = max(1000, queued_item_data.priority * 10)
deferred_item = PrioritizedItem(priority=deferred_priority, item=queued_item_data.item)
worker_handler.queue_item_async_safe(q, deferred_item, silent=True)
logger.debug(f"Worker {worker_id} re-queued UUID {uuid} for subsequent check")
continue
fetch_start_time = round(time.time()) fetch_start_time = round(time.time())
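Review note: the claim-and-defer logic above depends on `claim_uuid_for_processing` being atomic. The real implementation is not shown in this diff, so the following is only a sketch of one way to get that guarantee, using a lock around a dict. `ProcessingRegistry` and `deferred_priority` are hypothetical names; the priority formula is copied from the hunk.

```python
import threading


class ProcessingRegistry:
    """Tracks which UUIDs are being processed, and by which worker."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = {}  # uuid -> worker_id

    def claim(self, uuid, worker_id):
        """Check and insert under one lock, so only one caller can win."""
        with self._lock:
            if uuid in self._running:
                return False
            self._running[uuid] = worker_id
            return True

    def release(self, uuid, worker_id=None):
        """Drop the claim; releasing an unclaimed UUID is a no-op."""
        with self._lock:
            self._running.pop(uuid, None)


def deferred_priority(priority):
    # Same formula as the hunk: a larger number sorts later in a PriorityQueue.
    return max(1000, priority * 10)
```

A worker that loses the claim would re-queue the item with `deferred_priority(item.priority)`, so a priority-1 item comes back at 1000 and is picked up after the fresh work.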
@@ -142,14 +133,11 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
processor = watch.get('processor', 'text_json_diff') processor = watch.get('processor', 'text_json_diff')
# Init a new 'difference_detection_processor' # Init a new 'difference_detection_processor'
# Use get_processor_module() to support both built-in and plugin processors try:
from changedetectionio.processors import get_processor_module processor_module = importlib.import_module(f"changedetectionio.processors.{processor}.processor")
processor_module = get_processor_module(processor) except ModuleNotFoundError as e:
print(f"Processor module '{processor}' not found.")
if not processor_module: raise e
error_msg = f"Processor module '{processor}' not found."
logger.error(error_msg)
raise ModuleNotFoundError(error_msg)
update_handler = processor_module.perform_site_check(datastore=datastore, update_handler = processor_module.perform_site_check(datastore=datastore,
watch_uuid=uuid) watch_uuid=uuid)
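Review note: the two columns differ in how the processor module is located. One calls `get_processor_module()`, described in its comment as covering built-in and plugin processors; the other imports `changedetectionio.processors.{processor}.processor` directly. The sketch below shows only the direct-import pattern, with a `None` return in place of the re-raise. `load_processor_module` and its `package` and `submodule` parameters are hypothetical, added so the function can be exercised against any package.

```python
import importlib


def load_processor_module(processor, package="changedetectionio.processors",
                          submodule="processor"):
    """Import <package>.<processor>.<submodule>, or return None if missing."""
    try:
        return importlib.import_module(f"{package}.{processor}.{submodule}")
    except ModuleNotFoundError:
        # Direct import only sees modules under the given package, so a
        # processor supplied some other way would not be found here.
        return None
```

For example, `load_processor_module("etree", package="xml", submodule="ElementTree")` resolves to the standard library's `xml.etree.ElementTree`, which is a convenient way to try the function without the application installed.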
@@ -236,7 +224,6 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
except FilterNotFoundInResponse as e: except FilterNotFoundInResponse as e:
if not datastore.data['watching'].get(uuid): if not datastore.data['watching'].get(uuid):
continue continue
logger.debug(f"Received FilterNotFoundInResponse exception for {uuid}")
err_text = "Warning, no filters were found, no change detection ran - Did the page change layout? update your Visual Filter if necessary." err_text = "Warning, no filters were found, no change detection ran - Did the page change layout? update your Visual Filter if necessary."
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text}) datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text})
@@ -256,19 +243,17 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
c += 1 c += 1
# Send notification if we reached the threshold? # Send notification if we reached the threshold?
threshold = datastore.data['settings']['application'].get('filter_failure_notification_threshold_attempts', 0) threshold = datastore.data['settings']['application'].get('filter_failure_notification_threshold_attempts', 0)
logger.debug(f"FilterNotFoundInResponse - Filter for {uuid} not found, consecutive_filter_failures: {c} of threshold {threshold}") logger.debug(f"Filter for {uuid} not found, consecutive_filter_failures: {c} of threshold {threshold}")
if c >= threshold: if c >= threshold:
if not watch.get('notification_muted'): if not watch.get('notification_muted'):
logger.debug(f"FilterNotFoundInResponse - Sending filter failed notification for {uuid}") logger.debug(f"Sending filter failed notification for {uuid}")
await send_filter_failure_notification(uuid, notification_q, datastore) await send_filter_failure_notification(uuid, notification_q, datastore)
c = 0 c = 0
logger.debug(f"FilterNotFoundInResponse - Reset filter failure count back to zero") logger.debug(f"Reset filter failure count back to zero")
else:
logger.debug(f"FilterNotFoundInResponse - {c} of threshold {threshold}..")
datastore.update_watch(uuid=uuid, update_obj={'consecutive_filter_failures': c}) datastore.update_watch(uuid=uuid, update_obj={'consecutive_filter_failures': c})
else: else:
logger.trace(f"FilterNotFoundInResponse - {uuid} - filter_failure_notification_send not enabled, skipping") logger.trace(f"{uuid} - filter_failure_notification_send not enabled, skipping")
process_changedetection_results = False process_changedetection_results = False
@@ -368,10 +353,8 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
logger.error(f"Exception (BrowserStepsInUnsupportedFetcher) reached processing watch UUID: {uuid}") logger.error(f"Exception (BrowserStepsInUnsupportedFetcher) reached processing watch UUID: {uuid}")
except Exception as e: except Exception as e:
import traceback
logger.error(f"Worker {worker_id} exception processing watch UUID: {uuid}") logger.error(f"Worker {worker_id} exception processing watch UUID: {uuid}")
logger.error(str(e)) logger.error(str(e))
logger.error(traceback.format_exc())
datastore.update_watch(uuid=uuid, update_obj={'last_error': "Exception: " + str(e)}) datastore.update_watch(uuid=uuid, update_obj={'last_error': "Exception: " + str(e)})
process_changedetection_results = False process_changedetection_results = False
@@ -390,8 +373,8 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
if not datastore.data['watching'].get(uuid): if not datastore.data['watching'].get(uuid):
continue continue
logger.debug(f"Processing watch UUID: {uuid} - xpath_data length returned {len(update_handler.xpath_data) if update_handler and update_handler.xpath_data else 'empty.'}") logger.debug(f"Processing watch UUID: {uuid} - xpath_data length returned {len(update_handler.xpath_data) if update_handler.xpath_data else 'empty.'}")
if update_handler and process_changedetection_results: if process_changedetection_results:
try: try:
datastore.update_watch(uuid=uuid, update_obj=update_obj) datastore.update_watch(uuid=uuid, update_obj=update_obj)
@@ -441,44 +424,44 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
# Always record attempt count # Always record attempt count
count = watch.get('check_count', 0) + 1 count = watch.get('check_count', 0) + 1
if update_handler: # Could be none or empty if the processor was not found
# Always record page title (used in notifications, and can change even when the content is the same)
if update_obj.get('content-type') and 'html' in update_obj.get('content-type'):
try:
page_title = html_tools.extract_title(data=update_handler.fetcher.content)
if page_title:
page_title = page_title.strip()[:2000]
logger.debug(f"UUID: {uuid} Page <title> is '{page_title}'")
datastore.update_watch(uuid=uuid, update_obj={'page_title': page_title})
except Exception as e:
logger.warning(f"UUID: {uuid} Exception when extracting <title> - {str(e)}")
# Record server header # Always record page title (used in notifications, and can change even when the content is the same)
if update_obj.get('content-type') and 'html' in update_obj.get('content-type'):
try: try:
server_header = update_handler.fetcher.headers.get('server', '').strip().lower()[:255] page_title = html_tools.extract_title(data=update_handler.fetcher.content)
datastore.update_watch(uuid=uuid, update_obj={'remote_server_reply': server_header}) if page_title:
page_title = page_title.strip()[:2000]
logger.debug(f"UUID: {uuid} Page <title> is '{page_title}'")
datastore.update_watch(uuid=uuid, update_obj={'page_title': page_title})
except Exception as e: except Exception as e:
pass logger.warning(f"UUID: {uuid} Exception when extracting <title> - {str(e)}")
# Store favicon if necessary # Record server header
if update_handler.fetcher.favicon_blob and update_handler.fetcher.favicon_blob.get('base64'): try:
watch.bump_favicon(url=update_handler.fetcher.favicon_blob.get('url'), server_header = update_handler.fetcher.headers.get('server', '').strip().lower()[:255]
favicon_base_64=update_handler.fetcher.favicon_blob.get('base64') datastore.update_watch(uuid=uuid, update_obj={'remote_server_reply': server_header})
) except Exception as e:
pass
datastore.update_watch(uuid=uuid, update_obj={'fetch_time': round(time.time() - fetch_start_time, 3), # Store favicon if necessary
'check_count': count}) if update_handler.fetcher.favicon_blob and update_handler.fetcher.favicon_blob.get('base64'):
watch.bump_favicon(url=update_handler.fetcher.favicon_blob.get('url'),
favicon_base_64=update_handler.fetcher.favicon_blob.get('base64')
)
# NOW clear fetcher content - after all processing is complete datastore.update_watch(uuid=uuid, update_obj={'fetch_time': round(time.time() - fetch_start_time, 3),
# This is the last point where we need the fetcher data 'check_count': count})
if update_handler and hasattr(update_handler, 'fetcher') and update_handler.fetcher:
update_handler.fetcher.clear_content()
logger.debug(f"Cleared fetcher content for UUID {uuid}")
# Explicitly delete update_handler to free all references # NOW clear fetcher content - after all processing is complete
if update_handler: # This is the last point where we need the fetcher data
del update_handler if update_handler and hasattr(update_handler, 'fetcher') and update_handler.fetcher:
update_handler = None update_handler.fetcher.clear_content()
logger.debug(f"Cleared fetcher content for UUID {uuid}")
# Explicitly delete update_handler to free all references
if update_handler:
del update_handler
update_handler = None
# Force aggressive memory cleanup after clearing # Force aggressive memory cleanup after clearing
import gc import gc
@@ -490,9 +473,6 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
pass pass
except Exception as e: except Exception as e:
import traceback
logger.error(traceback.format_exc())
logger.error(f"Worker {worker_id} unexpected error processing {uuid}: {e}") logger.error(f"Worker {worker_id} unexpected error processing {uuid}: {e}")
logger.error(f"Worker {worker_id} traceback:", exc_info=True) logger.error(f"Worker {worker_id} traceback:", exc_info=True)
@@ -510,7 +490,7 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
logger.error(f"Exception while cleaning/quit after calling browser: {e}") logger.error(f"Exception while cleaning/quit after calling browser: {e}")
try: try:
# Release UUID from processing (thread-safe) # Release UUID from processing (thread-safe)
worker_pool.release_uuid_from_processing(uuid, worker_id=worker_id) worker_handler.release_uuid_from_processing(uuid, worker_id=worker_id)
# Send completion signal # Send completion signal
if watch: if watch:
@@ -14,7 +14,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
from changedetectionio import forms from changedetectionio import forms
# #
if request.method == 'POST': if request.method == 'POST':
# from changedetectionio import worker_pool # from changedetectionio import worker_handler
from changedetectionio.blueprint.imports.importer import ( from changedetectionio.blueprint.imports.importer import (
import_url_list, import_url_list,
@@ -26,13 +26,12 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# URL List import # URL List import
if request.values.get('urls') and len(request.values.get('urls').strip()): if request.values.get('urls') and len(request.values.get('urls').strip()):
# Import and push into the queue for immediate update check # Import and push into the queue for immediate update check
from changedetectionio import processors
importer_handler = import_url_list() importer_handler = import_url_list()
importer_handler.run(data=request.values.get('urls'), flash=flash, datastore=datastore, processor=request.values.get('processor', processors.get_default_processor())) importer_handler.run(data=request.values.get('urls'), flash=flash, datastore=datastore, processor=request.values.get('processor', 'text_json_diff'))
logger.debug(f"Imported {len(importer_handler.new_uuids)} new UUIDs") logger.debug(f"Imported {len(importer_handler.new_uuids)} new UUIDs")
# Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue # Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue
# for uuid in importer_handler.new_uuids: # for uuid in importer_handler.new_uuids:
# worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) # worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
if len(importer_handler.remaining_data) == 0: if len(importer_handler.remaining_data) == 0:
return redirect(url_for('watchlist.index')) return redirect(url_for('watchlist.index'))
@@ -46,7 +45,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
d_importer.run(data=request.values.get('distill-io'), flash=flash, datastore=datastore) d_importer.run(data=request.values.get('distill-io'), flash=flash, datastore=datastore)
# Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue # Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue
# for uuid in importer_handler.new_uuids: # for uuid in importer_handler.new_uuids:
# worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) # worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# XLSX importer # XLSX importer
@@ -71,7 +70,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue # Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue
# for uuid in importer_handler.new_uuids: # for uuid in importer_handler.new_uuids:
# worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) # worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Could be some remaining, or we could be on GET # Could be some remaining, or we could be on GET
@@ -4,7 +4,7 @@ from flask import Blueprint, flash, redirect, url_for
from flask_login import login_required from flask_login import login_required
from changedetectionio.store import ChangeDetectionStore from changedetectionio.store import ChangeDetectionStore
from changedetectionio import queuedWatchMetaData from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_pool from changedetectionio import worker_handler
from queue import PriorityQueue from queue import PriorityQueue
PRICE_DATA_TRACK_ACCEPT = 'accepted' PRICE_DATA_TRACK_ACCEPT = 'accepted'
@@ -20,7 +20,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q: PriorityQueue
datastore.data['watching'][uuid]['track_ldjson_price_data'] = PRICE_DATA_TRACK_ACCEPT datastore.data['watching'][uuid]['track_ldjson_price_data'] = PRICE_DATA_TRACK_ACCEPT
datastore.data['watching'][uuid]['processor'] = 'restock_diff' datastore.data['watching'][uuid]['processor'] = 'restock_diff'
datastore.data['watching'][uuid].clear_watch() datastore.data['watching'][uuid].clear_watch()
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return redirect(url_for("watchlist.index")) return redirect(url_for("watchlist.index"))
@login_required @login_required
@@ -37,8 +37,6 @@ def construct_single_watch_routes(rss_blueprint, datastore):
rss_content_format = datastore.data['settings']['application'].get('rss_content_format') rss_content_format = datastore.data['settings']['application'].get('rss_content_format')
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
# Get the watch by UUID # Get the watch by UUID
watch = datastore.data['watching'].get(uuid) watch = datastore.data['watching'].get(uuid)
if not watch: if not watch:
@@ -83,7 +83,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Adjust worker count if it changed # Adjust worker count if it changed
if new_worker_count != old_worker_count: if new_worker_count != old_worker_count:
from changedetectionio import worker_pool from changedetectionio import worker_handler
from changedetectionio.flask_app import update_q, notification_q, app, datastore as ds from changedetectionio.flask_app import update_q, notification_q, app, datastore as ds
# Check CPU core availability and warn if worker count is high # Check CPU core availability and warn if worker count is high
@@ -92,7 +92,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
flash(gettext("Warning: Worker count ({}) is close to or exceeds available CPU cores ({})").format( flash(gettext("Warning: Worker count ({}) is close to or exceeds available CPU cores ({})").format(
new_worker_count, cpu_count), 'warning') new_worker_count, cpu_count), 'warning')
result = worker_pool.adjust_async_worker_count( result = worker_handler.adjust_async_worker_count(
new_count=new_worker_count, new_count=new_worker_count,
update_q=update_q, update_q=update_q,
notification_q=notification_q, notification_q=notification_q,
@@ -80,16 +80,6 @@
{{ render_checkbox_field(form.application.form.empty_pages_are_a_change) }} {{ render_checkbox_field(form.application.form.empty_pages_are_a_change) }}
<span class="pure-form-message-inline">{{ _('When a request returns no content, or the HTML does not contain any text, is this considered a change?') }}</span> <span class="pure-form-message-inline">{{ _('When a request returns no content, or the HTML does not contain any text, is this considered a change?') }}</span>
</div> </div>
{% if form.requests.proxy %}
<div>
<br>
<div class="inline-radio">
{{ render_field(form.requests.form.proxy, class="fetch-backend-proxy") }}
<span class="pure-form-message-inline">{{ _('Choose a default proxy for all watches') }}</span>
</div>
</div>
{% endif %}
</fieldset> </fieldset>
</div> </div>
@@ -350,6 +340,15 @@ nav
{{ render_fieldlist_with_inline_errors(form.requests.form.extra_proxies) }} {{ render_fieldlist_with_inline_errors(form.requests.form.extra_proxies) }}
<span class="pure-form-message-inline">{{ _('"Name" will be used for selecting the proxy in the Watch Edit settings') }}</span><br> <span class="pure-form-message-inline">{{ _('"Name" will be used for selecting the proxy in the Watch Edit settings') }}</span><br>
<span class="pure-form-message-inline">{{ _('SOCKS5 proxies with authentication are only supported with \'plain requests\' fetcher, for other fetchers you should whitelist the IP access instead') }}</span> <span class="pure-form-message-inline">{{ _('SOCKS5 proxies with authentication are only supported with \'plain requests\' fetcher, for other fetchers you should whitelist the IP access instead') }}</span>
{% if form.requests.proxy %}
<div>
<br>
<div class="inline-radio">
{{ render_field(form.requests.form.proxy, class="fetch-backend-proxy") }}
<span class="pure-form-message-inline">{{ _('Choose a default proxy for all watches') }}</span>
</div>
</div>
{% endif %}
</div> </div>
<div class="pure-control-group" id="extra-browsers-setting"> <div class="pure-control-group" id="extra-browsers-setting">
<p> <p>
+18 -15
@@ -10,7 +10,7 @@ from changedetectionio.blueprint.ui.notification import construct_blueprint as c
from changedetectionio.blueprint.ui.views import construct_blueprint as construct_views_blueprint from changedetectionio.blueprint.ui.views import construct_blueprint as construct_views_blueprint
from changedetectionio.blueprint.ui import diff, preview from changedetectionio.blueprint.ui import diff, preview
def _handle_operations(op, uuids, datastore, worker_pool, update_q, queuedWatchMetaData, watch_check_update, extra_data=None, emit_flash=True): def _handle_operations(op, uuids, datastore, worker_handler, update_q, queuedWatchMetaData, watch_check_update, extra_data=None, emit_flash=True):
from flask import request, flash from flask import request, flash
if op == 'delete': if op == 'delete':
@@ -63,7 +63,7 @@ def _handle_operations(op, uuids, datastore, worker_pool, update_q, queuedWatchM
for uuid in uuids: for uuid in uuids:
if datastore.data['watching'].get(uuid): if datastore.data['watching'].get(uuid):
# Recheck and require a full reprocessing # Recheck and require a full reprocessing
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
if emit_flash: if emit_flash:
flash(gettext("{} watches queued for rechecking").format(len(uuids))) flash(gettext("{} watches queued for rechecking").format(len(uuids)))
@@ -114,7 +114,7 @@ def _handle_operations(op, uuids, datastore, worker_pool, update_q, queuedWatchM
for uuid in uuids: for uuid in uuids:
watch_check_update.send(watch_uuid=uuid) watch_check_update.send(watch_uuid=uuid)
def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool, queuedWatchMetaData, watch_check_update): def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handler, queuedWatchMetaData, watch_check_update):
ui_blueprint = Blueprint('ui', __name__, template_folder="templates") ui_blueprint = Blueprint('ui', __name__, template_folder="templates")
# Register the edit blueprint # Register the edit blueprint
@@ -222,14 +222,14 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
@login_optionally_required @login_optionally_required
def form_delete(): def form_delete():
uuid = request.args.get('uuid') uuid = request.args.get('uuid')
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
if uuid != 'all' and not uuid in datastore.data['watching'].keys(): if uuid != 'all' and not uuid in datastore.data['watching'].keys():
flash(gettext('The watch by UUID {} does not exist.').format(uuid), 'error') flash(gettext('The watch by UUID {} does not exist.').format(uuid), 'error')
return redirect(url_for('watchlist.index')) return redirect(url_for('watchlist.index'))
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
datastore.delete(uuid) datastore.delete(uuid)
flash(gettext('Deleted.')) flash(gettext('Deleted.'))
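Review note: in `form_delete`, one column resolves the `'first'` alias before the existence check and the other resolves it afterwards. In the second ordering, `uuid == 'first'` reaches the "does not exist" branch unless a watch is literally keyed `'first'`. The sketch below shows the resolve-then-validate ordering. `resolve_watch_uuid` and `can_delete` are hypothetical helpers, and `watching` stands in for `datastore.data['watching']`.

```python
def resolve_watch_uuid(uuid, watching):
    """Map the testing alias 'first' to a real key; pass others through."""
    if uuid == 'first' and watching:
        # list(...).pop() returns the LAST key in insertion order,
        # so 'first' matches the name only when there is a single watch.
        return list(watching.keys()).pop()
    return uuid


def can_delete(uuid, watching):
    """Resolve the alias first, then apply the same check as the view."""
    uuid = resolve_watch_uuid(uuid, watching)
    return uuid == 'all' or uuid in watching
```

Resolving the alias in one helper would also avoid repeating the `if uuid == 'first'` block in each view that this diff touches.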
@@ -239,14 +239,14 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
@login_optionally_required @login_optionally_required
def form_clone(): def form_clone():
uuid = request.args.get('uuid') uuid = request.args.get('uuid')
# More for testing, possible to return the first/only
if uuid == 'first': if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop() uuid = list(datastore.data['watching'].keys()).pop()
new_uuid = datastore.clone(uuid) new_uuid = datastore.clone(uuid)
if not datastore.data['watching'].get(uuid).get('paused'): if not datastore.data['watching'].get(uuid).get('paused'):
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=5, item={'uuid': new_uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=5, item={'uuid': new_uuid}))
flash(gettext('Cloned, you are editing the new watch.')) flash(gettext('Cloned, you are editing the new watch.'))
@@ -262,10 +262,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
if uuid: if uuid:
# Single watch - check if already queued or running # Single watch - check if already queued or running
if worker_pool.is_watch_running(uuid) or uuid in update_q.get_queued_uuids(): if worker_handler.is_watch_running(uuid) or uuid in update_q.get_queued_uuids():
flash(gettext("Watch is already queued or being checked.")) flash(gettext("Watch is already queued or being checked."))
else: else:
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
flash(gettext("Queued 1 watch for rechecking.")) flash(gettext("Queued 1 watch for rechecking."))
else: else:
# Multiple watches - first count how many need to be queued # Multiple watches - first count how many need to be queued
@@ -284,7 +284,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
if len(watches_to_queue) < 20: if len(watches_to_queue) < 20:
# Get already queued/running UUIDs once (efficient) # Get already queued/running UUIDs once (efficient)
queued_uuids = set(update_q.get_queued_uuids()) queued_uuids = set(update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids()) running_uuids = set(worker_handler.get_running_uuids())
# Filter out watches that are already queued or running # Filter out watches that are already queued or running
watches_to_queue_filtered = [] watches_to_queue_filtered = []
@@ -294,7 +294,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
# Queue only the filtered watches # Queue only the filtered watches
for watch_uuid in watches_to_queue_filtered: for watch_uuid in watches_to_queue_filtered:
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
# Provide feedback about skipped watches # Provide feedback about skipped watches
skipped_count = len(watches_to_queue) - len(watches_to_queue_filtered) skipped_count = len(watches_to_queue) - len(watches_to_queue_filtered)
@@ -310,7 +310,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
# 20+ watches - queue in background thread to avoid blocking HTTP response # 20+ watches - queue in background thread to avoid blocking HTTP response
# Capture queued/running state before background thread # Capture queued/running state before background thread
queued_uuids = set(update_q.get_queued_uuids()) queued_uuids = set(update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids()) running_uuids = set(worker_handler.get_running_uuids())
def queue_watches_background(): def queue_watches_background():
"""Background thread to queue watches - discarded after completion.""" """Background thread to queue watches - discarded after completion."""
@@ -320,7 +320,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
for watch_uuid in watches_to_queue: for watch_uuid in watches_to_queue:
# Check if already queued or running (state captured at start) # Check if already queued or running (state captured at start)
if watch_uuid not in queued_uuids and watch_uuid not in running_uuids: if watch_uuid not in queued_uuids and watch_uuid not in running_uuids:
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
queued_count += 1 queued_count += 1
else: else:
skipped_count += 1 skipped_count += 1
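Review note on the bulk-queue hunk above: the queued and running UUIDs are captured once, before the background thread starts, so the thread filters against a snapshot rather than live state. A watch that becomes queued after the snapshot could therefore still be queued a second time. A minimal sketch of the snapshot pattern, assuming a plain `queue.PriorityQueue` and a hypothetical `queue_unqueued` helper standing in for the project's `update_q` and `queue_item_async_safe`:

```python
import queue
import threading


def queue_unqueued(watch_uuids, queued_uuids, running_uuids, out_q):
    """Queue every UUID not in the snapshot sets; return (queued, skipped).

    Hypothetical helper for illustration; not part of the project.
    """
    busy = set(queued_uuids) | set(running_uuids)  # snapshot taken by the caller
    queued_count = skipped_count = 0
    for watch_uuid in watch_uuids:
        if watch_uuid in busy:
            skipped_count += 1
            continue
        out_q.put((1, watch_uuid))  # priority 1, mirroring the diff
        queued_count += 1
    return queued_count, skipped_count


q = queue.PriorityQueue()
counts = {}


def worker():
    # State was captured before the thread started: 'a' is queued, 'c' is running.
    counts['result'] = queue_unqueued(['a', 'b', 'c'], {'a'}, {'c'}, q)


t = threading.Thread(target=worker, daemon=True)
t.start()
t.join()
```

Returning the counts from the helper keeps the feedback message (queued versus skipped) independent of how the work is scheduled.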
@@ -349,7 +349,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
extra_data=extra_data, extra_data=extra_data,
queuedWatchMetaData=queuedWatchMetaData, queuedWatchMetaData=queuedWatchMetaData,
uuids=uuids, uuids=uuids,
worker_pool=worker_pool, worker_handler=worker_handler,
update_q=update_q, update_q=update_q,
watch_check_update=watch_check_update, watch_check_update=watch_check_update,
op=op, op=op,
@@ -367,6 +367,9 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
import json import json
from copy import deepcopy from copy import deepcopy
# more for testing
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
# copy it to memory as trim off what we dont need (history) # copy it to memory as trim off what we dont need (history)
watch = deepcopy(datastore.data['watching'].get(uuid)) watch = deepcopy(datastore.data['watching'].get(uuid))
+82 -70
@@ -83,6 +83,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
If a processor doesn't have a difference module, falls back to text_json_diff. If a processor doesn't have a difference module, falls back to text_json_diff.
""" """
# More for testing, possible to return the first/only
if uuid == 'first': if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop() uuid = list(datastore.data['watching'].keys()).pop()
@@ -100,21 +101,23 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Get the processor type for this watch # Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff') processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's difference module (works for both built-in and plugin processors) try:
from changedetectionio.processors import get_processor_submodule # Try to import the processor's difference module
processor_module = get_processor_submodule(processor_name, 'difference') processor_module = importlib.import_module(f'changedetectionio.processors.{processor_name}.difference')
# Call the processor's render() function # Call the processor's render() function
if processor_module and hasattr(processor_module, 'render'): if hasattr(processor_module, 'render'):
return processor_module.render( return processor_module.render(
watch=watch, watch=watch,
datastore=datastore, datastore=datastore,
request=request, request=request,
url_for=url_for, url_for=url_for,
render_template=render_template, render_template=render_template,
flash=flash, flash=flash,
redirect=redirect redirect=redirect
) )
except (ImportError, ModuleNotFoundError) as e:
logger.warning(f"Processor {processor_name} does not have a difference module, falling back to text_json_diff: {e}")
# Fallback: if processor doesn't have difference module, use text_json_diff as default # Fallback: if processor doesn't have difference module, use text_json_diff as default
from changedetectionio.processors.text_json_diff.difference import render as default_render from changedetectionio.processors.text_json_diff.difference import render as default_render
@@ -141,10 +144,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
Each processor implements processors/{type}/extract.py::render_form() Each processor implements processors/{type}/extract.py::render_form()
If a processor doesn't have an extract module, falls back to text_json_diff. If a processor doesn't have an extract module, falls back to text_json_diff.
""" """
# More for testing, possible to return the first/only
if uuid == 'first': if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop() uuid = list(datastore.data['watching'].keys()).pop()
try: try:
watch = datastore.data['watching'][uuid] watch = datastore.data['watching'][uuid]
except KeyError: except KeyError:
@@ -154,21 +157,23 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Get the processor type for this watch # Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff') processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's extract module (works for both built-in and plugin processors) try:
from changedetectionio.processors import get_processor_submodule # Try to import the processor's extract module
processor_module = get_processor_submodule(processor_name, 'extract') processor_module = importlib.import_module(f'changedetectionio.processors.{processor_name}.extract')
# Call the processor's render_form() function # Call the processor's render_form() function
if processor_module and hasattr(processor_module, 'render_form'): if hasattr(processor_module, 'render_form'):
return processor_module.render_form( return processor_module.render_form(
watch=watch, watch=watch,
datastore=datastore, datastore=datastore,
request=request, request=request,
url_for=url_for, url_for=url_for,
render_template=render_template, render_template=render_template,
flash=flash, flash=flash,
redirect=redirect redirect=redirect
) )
except (ImportError, ModuleNotFoundError) as e:
logger.warning(f"Processor {processor_name} does not have an extract module, falling back to base extractor: {e}")
# Fallback: if processor doesn't have extract module, use base processors.extract as default # Fallback: if processor doesn't have extract module, use base processors.extract as default
from changedetectionio.processors.extract import render_form as default_render_form from changedetectionio.processors.extract import render_form as default_render_form
@@ -195,7 +200,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
Each processor implements processors/{type}/extract.py::process_extraction() Each processor implements processors/{type}/extract.py::process_extraction()
If a processor doesn't have an extract module, falls back to text_json_diff. If a processor doesn't have an extract module, falls back to text_json_diff.
""" """
# More for testing, possible to return the first/only
if uuid == 'first': if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop() uuid = list(datastore.data['watching'].keys()).pop()
@@ -208,22 +213,24 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Get the processor type for this watch # Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff') processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's extract module (works for both built-in and plugin processors) try:
from changedetectionio.processors import get_processor_submodule # Try to import the processor's extract module
processor_module = get_processor_submodule(processor_name, 'extract') processor_module = importlib.import_module(f'changedetectionio.processors.{processor_name}.extract')
# Call the processor's process_extraction() function # Call the processor's process_extraction() function
if processor_module and hasattr(processor_module, 'process_extraction'): if hasattr(processor_module, 'process_extraction'):
return processor_module.process_extraction( return processor_module.process_extraction(
watch=watch, watch=watch,
datastore=datastore, datastore=datastore,
request=request, request=request,
url_for=url_for, url_for=url_for,
make_response=make_response, make_response=make_response,
send_from_directory=send_from_directory, send_from_directory=send_from_directory,
flash=flash, flash=flash,
redirect=redirect redirect=redirect
) )
except (ImportError, ModuleNotFoundError) as e:
logger.warning(f"Processor {processor_name} does not have an extract module, falling back to base extractor: {e}")
# Fallback: if processor doesn't have extract module, use base processors.extract as default # Fallback: if processor doesn't have extract module, use base processors.extract as default
from changedetectionio.processors.extract import process_extraction as default_process_extraction from changedetectionio.processors.extract import process_extraction as default_process_extraction
@@ -260,7 +267,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
- /diff/{uuid}/processor-asset/after - /diff/{uuid}/processor-asset/after
- /diff/{uuid}/processor-asset/rendered_diff - /diff/{uuid}/processor-asset/rendered_diff
""" """
# More for testing, possible to return the first/only
if uuid == 'first': if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop() uuid = list(datastore.data['watching'].keys()).pop()
@@ -273,33 +280,38 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Get the processor type for this watch # Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff') processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's difference module (works for both built-in and plugin processors) try:
from changedetectionio.processors import get_processor_submodule # Try to import the processor's difference module
processor_module = get_processor_submodule(processor_name, 'difference') processor_module = importlib.import_module(f'changedetectionio.processors.{processor_name}.difference')
# Call the processor's get_asset() function # Call the processor's get_asset() function
if processor_module and hasattr(processor_module, 'get_asset'): if hasattr(processor_module, 'get_asset'):
result = processor_module.get_asset( result = processor_module.get_asset(
asset_name=asset_name, asset_name=asset_name,
watch=watch, watch=watch,
datastore=datastore, datastore=datastore,
request=request request=request
) )
if result is None: if result is None:
from flask import abort
abort(404, description=f"Asset '{asset_name}' not found")
binary_data, content_type, cache_control = result
response = make_response(binary_data)
response.headers['Content-Type'] = content_type
if cache_control:
response.headers['Cache-Control'] = cache_control
return response
else:
logger.warning(f"Processor {processor_name} does not implement get_asset()")
from flask import abort from flask import abort
abort(404, description=f"Asset '{asset_name}' not found") abort(404, description=f"Processor '{processor_name}' does not support assets")
binary_data, content_type, cache_control = result except (ImportError, ModuleNotFoundError) as e:
logger.warning(f"Processor {processor_name} does not have a difference module: {e}")
response = make_response(binary_data)
response.headers['Content-Type'] = content_type
if cache_control:
response.headers['Cache-Control'] = cache_control
return response
else:
logger.warning(f"Processor {processor_name} does not implement get_asset()")
from flask import abort from flask import abort
abort(404, description=f"Processor '{processor_name}' does not support assets") abort(404, description=f"Processor '{processor_name}' not found")
return diff_blueprint return diff_blueprint
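Review note on the import-with-fallback pattern in this file: `ModuleNotFoundError` is a subclass of `ImportError`, so `except (ImportError, ModuleNotFoundError)` behaves the same as `except ImportError`. Catching `ImportError` broadly also hides failures raised by imports inside the processor module itself. A minimal sketch of a narrower lookup, using a hypothetical `load_optional_submodule` helper and standard-library modules as stand-ins for processor packages:

```python
import importlib


def load_optional_submodule(package, name):
    """Return the module package.name if it can be imported, else None.

    Hypothetical helper for illustration only.
    """
    target = f"{package}.{name}"
    try:
        return importlib.import_module(target)
    except ModuleNotFoundError as e:
        # Swallow the error only when the missing module is the one requested
        # (or one of its parent packages). A missing dependency inside the
        # module is a real failure and is re-raised.
        if e.name and (target == e.name or target.startswith(e.name + ".")):
            return None
        raise


decoder = load_optional_submodule("json", "decoder")
missing = load_optional_submodule("json", "no_such_submodule")
```

With a helper like this, the caller can test `if module and hasattr(module, 'render')` and fall through to the default renderer without a `try` block at every call site.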
+60 -28
@@ -9,7 +9,7 @@ from jinja2 import Environment, FileSystemLoader
from changedetectionio.store import ChangeDetectionStore from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio.time_handler import is_within_schedule from changedetectionio.time_handler import is_within_schedule
from changedetectionio import worker_pool from changedetectionio import worker_handler
def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData): def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData):
edit_blueprint = Blueprint('ui_edit', __name__, template_folder="../ui/templates") edit_blueprint = Blueprint('ui_edit', __name__, template_folder="../ui/templates")
@@ -30,13 +30,14 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
from changedetectionio import processors from changedetectionio import processors
import importlib import importlib
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
# More for testing, possible to return the first/only # More for testing, possible to return the first/only
if not datastore.data['watching'].keys(): if not datastore.data['watching'].keys():
flash(gettext("No watches to edit"), "error") flash(gettext("No watches to edit"), "error")
return redirect(url_for('watchlist.index')) return redirect(url_for('watchlist.index'))
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
if not uuid in datastore.data['watching']: if not uuid in datastore.data['watching']:
flash(gettext("No watch with the UUID {} found.").format(uuid), "error") flash(gettext("No watch with the UUID {} found.").format(uuid), "error")
return redirect(url_for('watchlist.index')) return redirect(url_for('watchlist.index'))
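Review note on the `'first'` shortcut: `list(...keys()).pop()` returns the last key of the list, not the first, and raises `IndexError` when there are no watches, so the order of the `'first'` check and the empty check matters. With a single watch (the testing case the comment mentions) both readings agree. A minimal sketch, assuming the watch container behaves like an insertion-ordered `dict` and using a hypothetical `resolve_uuid` helper:

```python
def resolve_uuid(uuid, watching):
    """Map the test alias 'first' to a real key; return None when nothing matches.

    Hypothetical helper for illustration only.
    """
    if uuid == 'first':
        # next(iter(...)) yields the oldest inserted key and tolerates an
        # empty mapping; list(...).pop() would yield the newest key instead.
        return next(iter(watching), None)
    return uuid if uuid in watching else None


watching = {'uuid-1': {}, 'uuid-2': {}}
first_key = resolve_uuid('first', watching)
last_key = list(watching.keys()).pop()
```

Returning `None` for both the empty case and the unknown-UUID case lets the view keep a single "not found" branch.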
@@ -71,13 +72,8 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
processor_name = datastore.data['watching'][uuid].get('processor', '') processor_name = datastore.data['watching'][uuid].get('processor', '')
processor_classes = next((tpl for tpl in processors.find_processors() if tpl[1] == processor_name), None) processor_classes = next((tpl for tpl in processors.find_processors() if tpl[1] == processor_name), None)
if not processor_classes: if not processor_classes:
flash(gettext("Could not load '{}' processor, processor plugin might be missing. Please select a different processor.").format(processor_name), 'error') flash(gettext("Cannot load the edit form for processor/plugin '{}', plugin missing?").format(processor_classes[1]), 'error')
# Fall back to default processor so user can still edit and change processor return redirect(url_for('watchlist.index'))
processor_classes = next((tpl for tpl in processors.find_processors() if tpl[1] == 'text_json_diff'), None)
if not processor_classes:
# If even text_json_diff is missing, something is very wrong
flash(gettext("Could not load '{}' processor, processor plugin might be missing.").format(processor_name), 'error')
return redirect(url_for('watchlist.index'))
parent_module = processors.get_parent_module(processor_classes[0]) parent_module = processors.get_parent_module(processor_classes[0])
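Review note on the processor lookup: in one column of the hunk above, the error message is formatted with `processor_classes[1]` inside the `if not processor_classes:` branch. At that point `processor_classes` is `None`, so building the message would raise `TypeError` before the flash is shown. A minimal sketch of the lookup with a fallback, using a hypothetical `find_processor` helper and made-up `(module, name)` tuples shaped like the `tpl[1]` comparison in the diff:

```python
def find_processor(available, processor_name, default='text_json_diff'):
    """Return (entry, error_message); fall back to `default` when the name is unknown.

    Hypothetical helper for illustration only.
    """
    entry = next((tpl for tpl in available if tpl[1] == processor_name), None)
    if entry:
        return entry, None
    # Format with processor_name: entry is None here, so entry[1] would raise TypeError.
    message = f"Could not load '{processor_name}' processor, plugin missing?"
    fallback = next((tpl for tpl in available if tpl[1] == default), None)
    return fallback, message


# Placeholder entries; real entries would carry module objects.
available = [('mod_a', 'text_json_diff'), ('mod_b', 'example_processor')]
found, found_error = find_processor(available, 'example_processor')
fallback, fallback_error = find_processor(available, 'gone')
```

If even the default is missing, the helper returns `(None, message)` and the caller can redirect to the watch list.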
@@ -154,10 +150,58 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
extra_update_obj['time_between_check'] = form.time_between_check.data extra_update_obj['time_between_check'] = form.time_between_check.data
# Handle processor-config-* fields separately (save to JSON, not datastore) # Handle processor-config-* fields separately (save to JSON, not datastore)
# IMPORTANT: These must NOT be saved to url-watches.json, only to the processor-specific JSON file processor_config_data = {}
processor_config_data = processors.extract_processor_config_from_form_data(form.data) fields_to_remove = []
processors.save_processor_config(datastore, uuid, processor_config_data) for field_name, field_value in form.data.items():
if field_name.startswith('processor_config_'):
config_key = field_name.replace('processor_config_', '')
if field_value: # Only save non-empty values
processor_config_data[config_key] = field_value
fields_to_remove.append(field_name)
# Save processor config to JSON file if any config data exists
if processor_config_data:
try:
processor_name = form.data.get('processor')
# Create a processor instance to access config methods
processor_instance = processors.difference_detection_processor(datastore, uuid)
# Use processor name as filename so each processor keeps its own config
config_filename = f'{processor_name}.json'
processor_instance.update_extra_watch_config(config_filename, processor_config_data)
logger.debug(f"Saved processor config to {config_filename}: {processor_config_data}")
# Call optional edit_hook if processor has one
try:
# Try to import the edit_hook module from the processor package
import importlib
edit_hook_module_name = f'changedetectionio.processors.{processor_name}.edit_hook'
try:
edit_hook = importlib.import_module(edit_hook_module_name)
logger.debug(f"Found edit_hook module for {processor_name}")
if hasattr(edit_hook, 'on_config_save'):
logger.info(f"Calling edit_hook.on_config_save for {processor_name}")
watch_obj = datastore.data['watching'][uuid]
# Call hook and get updated config
updated_config = edit_hook.on_config_save(watch_obj, processor_config_data, datastore)
# Save updated config back to file
processor_instance.update_extra_watch_config(config_filename, updated_config)
logger.info(f"Edit hook updated config: {updated_config}")
else:
logger.debug(f"Edit hook module found but no on_config_save function")
except ModuleNotFoundError:
logger.debug(f"No edit_hook module for processor {processor_name} (this is normal)")
except Exception as hook_error:
logger.error(f"Edit hook error (non-fatal): {hook_error}", exc_info=True)
except Exception as e:
logger.error(f"Failed to save processor config: {e}")
# Remove processor-config-* fields from form.data before updating datastore
for field_name in fields_to_remove:
form.data.pop(field_name, None)
# Ignore text # Ignore text
form_ignore_text = form.ignore_text.data form_ignore_text = form.ignore_text.data
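Review note on the `processor_config_*` handling: the inline loop separates prefixed fields from the rest of the form data and keeps only non-empty values. A minimal sketch of that split as a pure function, using a hypothetical `split_processor_config` helper. It slices off the leading prefix rather than calling `str.replace`, which would also remove the prefix text if it appeared later in a field name:

```python
PREFIX = 'processor_config_'


def split_processor_config(form_data):
    """Return (processor_config, remaining_fields) from a flat form-data dict.

    Hypothetical helper for illustration only.
    """
    processor_config = {}
    remaining = {}
    for field_name, field_value in form_data.items():
        if field_name.startswith(PREFIX):
            if field_value:  # only keep non-empty values, as in the diff
                processor_config[field_name[len(PREFIX):]] = field_value
        else:
            remaining[field_name] = field_value
    return processor_config, remaining


config, rest = split_processor_config({
    'url': 'https://example.com',
    'processor_config_threshold': '0.9',
    'processor_config_note': '',
})
```

Returning the remaining fields as a new dict also avoids relying on in-place mutation of `form.data`, which may not persist if that attribute is rebuilt on each access; that is worth checking against the form library in use.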
@@ -239,7 +283,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
############################# #############################
if not datastore.data['watching'][uuid].get('paused') and is_in_schedule: if not datastore.data['watching'][uuid].get('paused') and is_in_schedule:
# Queue the watch for immediate recheck, with a higher priority # Queue the watch for immediate recheck, with a higher priority
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Diff page [edit] link should go back to diff page # Diff page [edit] link should go back to diff page
if request.args.get("next") and request.args.get("next") == 'diff': if request.args.get("next") and request.args.get("next") == 'diff':
@@ -267,17 +311,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Get fetcher capabilities instead of hardcoded logic # Get fetcher capabilities instead of hardcoded logic
capabilities = get_fetcher_capabilities(watch, datastore) capabilities = get_fetcher_capabilities(watch, datastore)
# Add processor capabilities from module
capabilities['supports_visual_selector'] = getattr(parent_module, 'supports_visual_selector', False)
capabilities['supports_text_filters_and_triggers'] = getattr(parent_module, 'supports_text_filters_and_triggers', False)
capabilities['supports_text_filters_and_triggers_elements'] = getattr(parent_module, 'supports_text_filters_and_triggers_elements', False)
capabilities['supports_request_type'] = getattr(parent_module, 'supports_request_type', False)
app_rss_token = datastore.data['settings']['application'].get('rss_access_token'), app_rss_token = datastore.data['settings']['application'].get('rss_access_token'),
c = [f"processor-{watch.get('processor')}"] c = [f"processor-{watch.get('processor')}"]
if worker_pool.is_watch_running(uuid): if worker_handler.is_watch_running(uuid):
c.append('checking-now') c.append('checking-now')
template_args = { template_args = {
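Review note on the capability flags: one column of the hunk above reads four `supports_*` attributes from the processor's parent module with `getattr(..., False)`, so a processor that does not declare a flag is treated as not supporting it. A minimal sketch of that lookup, using a hypothetical `processor_capabilities` helper and a throwaway module object in place of a real processor package:

```python
import types

# Flag names as they appear in the diff.
CAPABILITY_FLAGS = (
    'supports_visual_selector',
    'supports_text_filters_and_triggers',
    'supports_text_filters_and_triggers_elements',
    'supports_request_type',
)


def processor_capabilities(parent_module):
    """Collect capability flags from a module, defaulting each one to False.

    Hypothetical helper for illustration only.
    """
    return {flag: bool(getattr(parent_module, flag, False)) for flag in CAPABILITY_FLAGS}


# Stand-in for a processor package that declares a single flag.
fake_processor = types.ModuleType('fake_processor')
fake_processor.supports_visual_selector = True
caps = processor_capabilities(fake_processor)
```

Driving the template from these flags, rather than from comparisons against specific processor names, lets a plugin processor opt in to a tab without a template change.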
@@ -334,8 +371,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
from flask import send_file from flask import send_file
import brotli import brotli
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
watch = datastore.data['watching'].get(uuid) watch = datastore.data['watching'].get(uuid)
if watch and watch.history.keys() and os.path.isdir(watch.watch_data_dir): if watch and watch.history.keys() and os.path.isdir(watch.watch_data_dir):
latest_filename = list(watch.history.keys())[-1] latest_filename = list(watch.history.keys())[-1]
@@ -360,9 +395,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
def watch_get_preview_rendered(uuid): def watch_get_preview_rendered(uuid):
'''For when viewing the "preview" of the rendered text from inside of Edit''' '''For when viewing the "preview" of the rendered text from inside of Edit'''
from flask import jsonify from flask import jsonify
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
from changedetectionio.processors.text_json_diff import prepare_filter_prevew from changedetectionio.processors.text_json_diff import prepare_filter_prevew
result = prepare_filter_prevew(watch_uuid=uuid, form_data=request.form, datastore=datastore) result = prepare_filter_prevew(watch_uuid=uuid, form_data=request.form, datastore=datastore)
return jsonify(result) return jsonify(result)
+50 -38
@@ -26,9 +26,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
Each processor implements processors/{type}/preview.py::render() Each processor implements processors/{type}/preview.py::render()
If a processor doesn't have a preview module, falls back to default text preview. If a processor doesn't have a preview module, falls back to default text preview.
""" """
# More for testing, possible to return the first/only
if uuid == 'first': if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop() uuid = list(datastore.data['watching'].keys()).pop()
try: try:
watch = datastore.data['watching'][uuid] watch = datastore.data['watching'][uuid]
except KeyError: except KeyError:
@@ -38,21 +39,24 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Get the processor type for this watch # Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff') processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's preview module (works for both built-in and plugin processors) try:
from changedetectionio.processors import get_processor_submodule # Try to import the processor's preview module
processor_module = get_processor_submodule(processor_name, 'preview') import importlib
processor_module = importlib.import_module(f'changedetectionio.processors.{processor_name}.preview')
# Call the processor's render() function # Call the processor's render() function
if processor_module and hasattr(processor_module, 'render'): if hasattr(processor_module, 'render'):
return processor_module.render( return processor_module.render(
watch=watch, watch=watch,
datastore=datastore, datastore=datastore,
request=request, request=request,
url_for=url_for, url_for=url_for,
render_template=render_template, render_template=render_template,
flash=flash, flash=flash,
redirect=redirect redirect=redirect
) )
except (ImportError, ModuleNotFoundError) as e:
logger.debug(f"Processor {processor_name} does not have a preview module, using default preview: {e}")
# Fallback: if processor doesn't have preview module, use default text preview # Fallback: if processor doesn't have preview module, use default text preview
content = [] content = []
@@ -146,8 +150,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
""" """
from flask import make_response from flask import make_response
# More for testing, possible to return the first/only
if uuid == 'first': if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop() uuid = list(datastore.data['watching'].keys()).pop()
try: try:
watch = datastore.data['watching'][uuid] watch = datastore.data['watching'][uuid]
except KeyError: except KeyError:
@@ -157,33 +163,39 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Get the processor type for this watch # Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff') processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's preview module (works for both built-in and plugin processors) try:
from changedetectionio.processors import get_processor_submodule # Try to import the processor's preview module
processor_module = get_processor_submodule(processor_name, 'preview') import importlib
processor_module = importlib.import_module(f'changedetectionio.processors.{processor_name}.preview')
# Call the processor's get_asset() function # Call the processor's get_asset() function
if processor_module and hasattr(processor_module, 'get_asset'): if hasattr(processor_module, 'get_asset'):
result = processor_module.get_asset( result = processor_module.get_asset(
asset_name=asset_name, asset_name=asset_name,
watch=watch, watch=watch,
datastore=datastore, datastore=datastore,
request=request request=request
) )
if result is None: if result is None:
from flask import abort
abort(404, description=f"Asset '{asset_name}' not found")
binary_data, content_type, cache_control = result
response = make_response(binary_data)
response.headers['Content-Type'] = content_type
if cache_control:
response.headers['Cache-Control'] = cache_control
return response
else:
logger.warning(f"Processor {processor_name} does not implement get_asset()")
from flask import abort from flask import abort
abort(404, description=f"Asset '{asset_name}' not found") abort(404, description=f"Processor '{processor_name}' does not support assets")
binary_data, content_type, cache_control = result except (ImportError, ModuleNotFoundError) as e:
logger.warning(f"Processor {processor_name} does not have a preview module: {e}")
response = make_response(binary_data)
response.headers['Content-Type'] = content_type
if cache_control:
response.headers['Cache-Control'] = cache_control
return response
else:
logger.warning(f"Processor {processor_name} does not implement get_asset()")
from flask import abort from flask import abort
abort(404, description=f"Processor '{processor_name}' does not support assets") abort(404, description=f"Processor '{processor_name}' not found")
return preview_blueprint return preview_blueprint
@@ -45,19 +45,14 @@
<div class="tabs collapsable"> <div class="tabs collapsable">
<ul> <ul>
<li class="tab"><a href="#general">{{ _('General') }}</a></li> <li class="tab"><a href="#general">{{ _('General') }}</a></li>
{% if capabilities.supports_request_type %}
<li class="tab"><a href="#request">{{ _('Request') }}</a></li> <li class="tab"><a href="#request">{{ _('Request') }}</a></li>
{% endif %}
{% if extra_tab_content %} {% if extra_tab_content %}
<li class="tab"><a href="#extras_tab">{{ extra_tab_content }}</a></li> <li class="tab"><a href="#extras_tab">{{ extra_tab_content }}</a></li>
{% endif %} {% endif %}
{% if capabilities.supports_browser_steps %}
<li class="tab"><a id="browsersteps-tab" href="#browser-steps">{{ _('Browser Steps') }}</a></li> <li class="tab"><a id="browsersteps-tab" href="#browser-steps">{{ _('Browser Steps') }}</a></li>
{% endif %} <!-- should goto extra forms? -->
{% if capabilities.supports_visual_selector %} {% if watch['processor'] == 'text_json_diff' or watch['processor'] == 'image_ssim_diff' %}
<li class="tab"><a id="visualselector-tab" href="#visualselector">{{ _('Visual Filter Selector') }}</a></li> <li class="tab"><a id="visualselector-tab" href="#visualselector">{{ _('Visual Filter Selector') }}</a></li>
{% endif %}
{% if capabilities.supports_text_filters_and_triggers %}
<li class="tab" id="filters-and-triggers-tab"><a href="#filters-and-triggers">{{ _('Filters & Triggers') }}</a></li> <li class="tab" id="filters-and-triggers-tab"><a href="#filters-and-triggers">{{ _('Filters & Triggers') }}</a></li>
<li class="tab" id="conditions-tab"><a href="#conditions">{{ _('Conditions') }}</a></li> <li class="tab" id="conditions-tab"><a href="#conditions">{{ _('Conditions') }}</a></li>
{% endif %} {% endif %}
@@ -121,7 +116,6 @@
</fieldset> </fieldset>
</div> </div>
{% if capabilities.supports_request_type %}
<div class="tab-pane-inner" id="request"> <div class="tab-pane-inner" id="request">
<div class="pure-control-group inline-radio"> <div class="pure-control-group inline-radio">
{{ render_field(form.fetch_backend, class="fetch-backend") }} {{ render_field(form.fetch_backend, class="fetch-backend") }}
@@ -209,7 +203,6 @@ Math: {{ 1 + 1 }}") }}
</div> </div>
</fieldset> </fieldset>
</div> </div>
{% endif %}
<div class="tab-pane-inner" id="browser-steps"> <div class="tab-pane-inner" id="browser-steps">
{% if capabilities.supports_browser_steps %} {% if capabilities.supports_browser_steps %}
@@ -290,7 +283,8 @@ Math: {{ 1 + 1 }}") }}
</fieldset> </fieldset>
</div> </div>
{% if capabilities.supports_text_filters_and_triggers %} {% if watch['processor'] == 'text_json_diff' or watch['processor'] == 'image_ssim_diff' %}
<div class="tab-pane-inner" id="conditions"> <div class="tab-pane-inner" id="conditions">
<script> <script>
const verify_condition_rule_url="{{url_for('conditions.verify_condition_single_rule', watch_uuid=uuid)}}"; const verify_condition_rule_url="{{url_for('conditions.verify_condition_single_rule', watch_uuid=uuid)}}";
@@ -309,9 +303,7 @@ Math: {{ 1 + 1 }}") }}
<span id="activate-text-preview" class="pure-button pure-button-primary button-xsmall">{{ _('Activate preview') }}</span> <span id="activate-text-preview" class="pure-button pure-button-primary button-xsmall">{{ _('Activate preview') }}</span>
<div> <div>
<div id="edit-text-filter"> <div id="edit-text-filter">
<div class="pure-control-group" id="pro-tips">
{% if capabilities.supports_text_filters_and_triggers_elements %}
<div class="pure-control-group" id="pro-tips">
<strong>{{ _('Pro-tips:') }}</strong><br> <strong>{{ _('Pro-tips:') }}</strong><br>
<ul> <ul>
<li> <li>
@@ -322,8 +314,8 @@ Math: {{ 1 + 1 }}") }}
</li> </li>
</ul> </ul>
</div> </div>
{% include "edit/include_subtract.html" %} {% include "edit/include_subtract.html" %}
{% endif %}
<div class="text-filtering border-fieldset"> <div class="text-filtering border-fieldset">
<fieldset class="pure-group" id="text-filtering-type-options"> <fieldset class="pure-group" id="text-filtering-type-options">
<h3>{{ _('Text filtering') }}</h3> <h3>{{ _('Text filtering') }}</h3>
@@ -382,7 +374,7 @@ Math: {{ 1 + 1 }}") }}
{{ extra_form_content|safe }} {{ extra_form_content|safe }}
</div> </div>
{% endif %} {% endif %}
{% if capabilities.supports_visual_selector %} {% if watch['processor'] == 'text_json_diff' or watch['processor'] == 'image_ssim_diff' %}
<div class="tab-pane-inner visual-selector-ui" id="visualselector"> <div class="tab-pane-inner visual-selector-ui" id="visualselector">
<img class="beta-logo" src="{{url_for('static_content', group='images', filename='beta-logo.png')}}" alt="New beta functionality"> <img class="beta-logo" src="{{url_for('static_content', group='images', filename='beta-logo.png')}}" alt="New beta functionality">
@@ -394,7 +386,7 @@ Math: {{ 1 + 1 }}") }}
{{ _('The Visual Selector tool lets you select the') }} <i>{{ _('text') }}</i> {{ _('elements that will be used for the change detection. It automatically fills-in the filters in the "CSS/JSONPath/JQ/XPath Filters" box of the') }} <a href="#filters-and-triggers">{{ _('Filters & Triggers') }}</a> {{ _('tab. Use') }} <strong>{{ _('Shift+Click') }}</strong> {{ _('to select multiple items.') }} {{ _('The Visual Selector tool lets you select the') }} <i>{{ _('text') }}</i> {{ _('elements that will be used for the change detection. It automatically fills-in the filters in the "CSS/JSONPath/JQ/XPath Filters" box of the') }} <a href="#filters-and-triggers">{{ _('Filters & Triggers') }}</a> {{ _('tab. Use') }} <strong>{{ _('Shift+Click') }}</strong> {{ _('to select multiple items.') }}
</span> </span>
{% if watch['processor'] == 'image_ssim_diff' %} {# @todo, integrate with image_ssim_diff selector better, use some extra form ? #} {% if watch['processor'] == 'image_ssim_diff' %}
<div id="selection-mode-controls" style="margin: 10px 0; padding: 10px; background: var(--color-background-tab); border-radius: 5px;"> <div id="selection-mode-controls" style="margin: 10px 0; padding: 10px; background: var(--color-background-tab); border-radius: 5px;">
<label style="font-weight: 600; margin-right: 15px;">{{ _('Selection Mode:') }}</label> <label style="font-weight: 600; margin-right: 15px;">{{ _('Selection Mode:') }}</label>
<label style="margin-right: 15px;"> <label style="margin-right: 15px;">
+3 -4
@@ -2,7 +2,7 @@ from flask import Blueprint, request, redirect, url_for, flash
from flask_babel import gettext from flask_babel import gettext
from changedetectionio.store import ChangeDetectionStore from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio import worker_pool from changedetectionio import worker_handler
def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData, watch_check_update): def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData, watch_check_update):
@@ -24,8 +24,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
flash(gettext('Warning, URL {} already exists').format(url), "notice") flash(gettext('Warning, URL {} already exists').format(url), "notice")
add_paused = request.form.get('edit_and_watch_submit_button') != None add_paused = request.form.get('edit_and_watch_submit_button') != None
from changedetectionio import processors processor = request.form.get('processor', 'text_json_diff')
processor = request.form.get('processor', processors.get_default_processor())
new_uuid = datastore.add_watch(url=url, tag=request.form.get('tags').strip(), extras={'paused': add_paused, 'processor': processor}) new_uuid = datastore.add_watch(url=url, tag=request.form.get('tags').strip(), extras={'paused': add_paused, 'processor': processor})
if new_uuid: if new_uuid:
@@ -34,7 +33,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
return redirect(url_for('ui.ui_edit.edit_page', uuid=new_uuid, unpause_on_save=1, tag=request.args.get('tag'))) return redirect(url_for('ui.ui_edit.edit_page', uuid=new_uuid, unpause_on_save=1, tag=request.args.get('tag')))
else: else:
# Straight into the queue. # Straight into the queue.
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
flash(gettext("Watch added.")) flash(gettext("Watch added."))
return redirect(url_for('watchlist.index', tag=request.args.get('tag',''))) return redirect(url_for('watchlist.index', tag=request.args.get('tag','')))
@@ -1,9 +1,5 @@
{%- extends 'base.html' -%} {%- extends 'base.html' -%}
{%- block content -%} {%- block content -%}
{%- set tips = [
_("Changedetection.io can monitor more than just web-pages! See our plugins!") ~ ' <a href="https://changedetection.io/plugins">' ~ _('More info') ~ '</a>',
_("You can also add 'shared' watches.") ~ ' <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Sharing-a-Watch">' ~ _('More info') ~ '</a>'
] -%}
{%- from '_helpers.html' import render_simple_field, render_field, render_nolabel_field, sort_by_title -%} {%- from '_helpers.html' import render_simple_field, render_field, render_nolabel_field, sort_by_title -%}
<script src="{{url_for('static_content', group='js', filename='jquery-3.6.0.min.js')}}"></script> <script src="{{url_for('static_content', group='js', filename='jquery-3.6.0.min.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='watch-overview.js')}}" defer></script> <script src="{{url_for('static_content', group='js', filename='watch-overview.js')}}" defer></script>
@@ -73,9 +69,7 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
</div> </div>
</fieldset> </fieldset>
<span style="color:#eee; font-size: 80%;"> <span style="color:#eee; font-size: 80%;"><img alt="{{ _('Create a shareable link') }}" style="height: 1em;display:inline-block;" src="{{url_for('static_content', group='images', filename='spread-white.svg')}}" > {{ _("Tip: You can also add 'shared' watches.") }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Sharing-a-Watch">{{ _('More info') }}</a></span>
<strong>Tip: </strong> {{ tips | random | safe }}
</span>
</form> </form>
</div> </div>
<div class="box"> <div class="box">
@@ -211,26 +205,25 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
</div> </div>
{% endif %} {% endif %}
<div> <div>
{%- if watch['processor'] and watch['processor'] in processor_badge_texts -%} <span class="watch-title">
<span class="processor-badge processor-badge-{{ watch['processor'] }}" title="{{ processor_descriptions.get(watch['processor'], watch['processor']) }}">{{ processor_badge_texts[watch['processor']] }}</span> {% if system_use_url_watchlist or watch.get('use_page_title_in_list') %}
{%- endif -%} {{ watch.label }}
<span class="watch-title"> {% else %}
{% if system_use_url_watchlist or watch.get('use_page_title_in_list') %} {{ watch.get('title') or watch.link }}
{{ watch.label }} {% endif %}
{% else %} <a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}">&nbsp;</a>
{{ watch.get('title') or watch.link }} </span>
{% endif %}
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}">&nbsp;</a>
</span>
<div class="error-text" style="display:none;">{{ watch.compile_error_texts(has_proxies=datastore.proxy_list)|safe }}</div> <div class="error-text" style="display:none;">{{ watch.compile_error_texts(has_proxies=datastore.proxy_list)|safe }}</div>
{%- if watch['processor'] == 'text_json_diff' -%} {%- if watch['processor'] == 'text_json_diff' -%}
{%- if watch['has_ldjson_price_data'] and not watch['track_ldjson_price_data'] -%} {%- if watch['has_ldjson_price_data'] and not watch['track_ldjson_price_data'] -%}
<div class="ldjson-price-track-offer">Switch to Restock & Price watch mode? <a href="{{url_for('price_data_follower.accept', uuid=watch.uuid)}}" class="pure-button button-xsmall">Yes</a> <a href="{{url_for('price_data_follower.reject', uuid=watch.uuid)}}" class="">No</a></div> <div class="ldjson-price-track-offer">Switch to Restock & Price watch mode? <a href="{{url_for('price_data_follower.accept', uuid=watch.uuid)}}" class="pure-button button-xsmall">Yes</a> <a href="{{url_for('price_data_follower.reject', uuid=watch.uuid)}}" class="">No</a></div>
{%- endif -%} {%- endif -%}
{%- endif -%} {%- endif -%}
{%- if watch['processor'] and watch['processor'] in processor_badge_texts -%}
<span class="processor-badge processor-badge-{{ watch['processor'] }}" title="{{ processor_descriptions.get(watch['processor'], watch['processor']) }}">{{ processor_badge_texts[watch['processor']] }}</span>
{%- endif -%}
{%- for watch_tag_uuid, watch_tag in datastore.get_all_tags_for_watch(watch['uuid']).items() -%} {%- for watch_tag_uuid, watch_tag in datastore.get_all_tags_for_watch(watch['uuid']).items() -%}
<a href="{{url_for('watchlist.index', tag=watch_tag_uuid) }}" class="watch-tag-list tag-{{ watch_tag.title|sanitize_tag_class }}">{{ watch_tag.title }}</a> <span class="watch-tag-list tag-{{ watch_tag.title|sanitize_tag_class }}">{{ watch_tag.title }}</span>
{%- endfor -%} {%- endfor -%}
</div> </div>
<div class="status-icons"> <div class="status-icons">
@@ -8,9 +8,7 @@ from loguru import logger
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \ from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_MAX_TOTAL_HEIGHT, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, FAVICON_FETCHER_JS SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_MAX_TOTAL_HEIGHT, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, FAVICON_FETCHER_JS
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable, \ from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable
BrowserStepsStepException
async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False): async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import os import os
@@ -367,16 +365,7 @@ class fetcher(Fetcher):
try: try:
# Run Browser Steps here # Run Browser Steps here
if self.browser_steps_get_valid_steps(): if self.browser_steps_get_valid_steps():
try: await self.iterate_browser_steps(start_url=url)
await self.iterate_browser_steps(start_url=url)
except BrowserStepsStepException:
try:
await context.close()
await browser.close()
except Exception as e:
# Fine, could be messy situation
pass
raise
await self.page.wait_for_timeout(extra_wait * 1000) await self.page.wait_for_timeout(extra_wait * 1000)
@@ -418,11 +407,19 @@ class fetcher(Fetcher):
# Force aggressive memory cleanup - screenshots are large and base64 decode creates temporary buffers # Force aggressive memory cleanup - screenshots are large and base64 decode creates temporary buffers
await self.page.request_gc() await self.page.request_gc()
gc.collect() gc.collect()
# Release C-level memory from base64 decode back to OS
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
except ScreenshotUnavailable: except ScreenshotUnavailable:
# Re-raise screenshot unavailable exceptions # Re-raise screenshot unavailable exceptions
raise
except Exception as e:
# It's likely the screenshot was too long/big and something crashed
raise ScreenshotUnavailable(url=url, status_code=self.status_code) raise ScreenshotUnavailable(url=url, status_code=self.status_code)
finally: finally:
# Request garbage collection one more time before closing # Request garbage collection one more time before closing
try: try:
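Note on the `malloc_trim(0)` call in the hunk above: a minimal, standalone sketch of the same idea, assuming a Linux host. The helper name `trim_malloc` is hypothetical and not part of the codebase; `malloc_trim` is a glibc extension, so the sketch returns `False` on other platforms and on libc builds that do not export the symbol (assumed here to include the musl-based Alpine images).

```python
import ctypes
import ctypes.util
import sys


def trim_malloc() -> bool:
    """Ask the C allocator to hand freed heap pages back to the OS.

    Returns True only when the call exists and reports that memory was released.
    """
    if not sys.platform.startswith("linux"):
        # malloc_trim is a glibc extension, nothing to do elsewhere
        return False
    try:
        # Fall back to the hard-coded name used in the diff if lookup fails
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        return bool(libc.malloc_trim(0))
    except (OSError, AttributeError):
        # Library could not be loaded, or it has no malloc_trim symbol
        return False


released = trim_malloc()
```

Catching only `OSError` and `AttributeError` keeps unrelated failures visible, which is a narrower net than the bare `except Exception` in the hunk.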
+2 -23
@@ -55,26 +55,6 @@ class fetcher(Fetcher):
session = requests.Session() session = requests.Session()
# Configure retry adapter for low-level network errors only
# Retries connection timeouts, read timeouts, connection resets - not HTTP status codes
# Especially helpful in parallel test execution when servers are slow/overloaded
# Configurable via REQUESTS_RETRY_MAX_COUNT (default: 6 attempts)
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
max_retries = int(os.getenv("REQUESTS_RETRY_MAX_COUNT", "6"))
retry_strategy = Retry(
total=max_retries,
connect=max_retries, # Retry connection timeouts
read=max_retries, # Retry read timeouts
status=0, # Don't retry on HTTP status codes
backoff_factor=0.5, # Exponential backoff between retries
allowed_methods=["HEAD", "GET", "OPTIONS", "POST"],
raise_on_status=False
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)
if strtobool(os.getenv('ALLOW_FILE_URI', 'false')) and url.startswith('file://'): if strtobool(os.getenv('ALLOW_FILE_URI', 'false')) and url.startswith('file://'):
from requests_file import FileAdapter from requests_file import FileAdapter
@@ -162,11 +142,10 @@ class fetcher(Fetcher):
watch_uuid=None, watch_uuid=None,
): ):
"""Async wrapper that runs the synchronous requests code in a thread pool""" """Async wrapper that runs the synchronous requests code in a thread pool"""
loop = asyncio.get_event_loop() loop = asyncio.get_event_loop()
# Run the synchronous _run_sync in a thread pool to avoid blocking the event loop # Run the synchronous _run_sync in a thread pool to avoid blocking the event loop
# Retry logic is handled by requests' HTTPAdapter (see _run_sync for configuration)
await loop.run_in_executor( await loop.run_in_executor(
None, # Use default ThreadPoolExecutor None, # Use default ThreadPoolExecutor
lambda: self._run_sync( lambda: self._run_sync(
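Note on the retry adapter in the first hunk of this file: a rough sketch of how the waits between retries would grow for `backoff_factor=0.5`. It assumes urllib3's documented formula, `backoff_factor * 2 ** (consecutive_errors - 1)`, with no sleep before the first retry and a cap of 120 seconds, and it ignores jitter. The function `backoff_schedule` is hypothetical and written in plain Python so it runs without `requests` installed.

```python
def backoff_schedule(backoff_factor: float, retries: int, backoff_max: float = 120.0) -> list[float]:
    """Approximate the sleep (in seconds) before each retry attempt.

    Mirrors the formula documented for urllib3's Retry (an assumption, not a
    copy of its implementation): the first retry is immediate, later ones
    double each time up to backoff_max.
    """
    delays = []
    for consecutive_errors in range(1, retries + 1):
        if consecutive_errors <= 1:
            delays.append(0.0)
        else:
            delay = backoff_factor * (2 ** (consecutive_errors - 1))
            delays.append(min(backoff_max, delay))
    return delays


schedule = backoff_schedule(0.5, 4)
print(schedule)  # [0.0, 1.0, 2.0, 4.0]
```

Under these assumptions, six retries at factor 0.5 add up to 31 seconds of waiting in the worst case, on top of the per-request timeout.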
+16 -26
@@ -14,7 +14,7 @@ from pathlib import Path
from changedetectionio.strtobool import strtobool from changedetectionio.strtobool import strtobool
from threading import Event from threading import Event
from changedetectionio.queue_handlers import RecheckPriorityQueue, NotificationQueue from changedetectionio.queue_handlers import RecheckPriorityQueue, NotificationQueue
from changedetectionio import worker_pool from changedetectionio import worker_handler
from flask import ( from flask import (
Flask, Flask,
@@ -195,7 +195,7 @@ def _jinja2_filter_format_number_locale(value: float) -> str:
@app.template_global('is_checking_now') @app.template_global('is_checking_now')
def _watch_is_checking_now(watch_obj, format="%Y-%m-%d %H:%M:%S"): def _watch_is_checking_now(watch_obj, format="%Y-%m-%d %H:%M:%S"):
return worker_pool.is_watch_running(watch_obj['uuid']) return worker_handler.is_watch_running(watch_obj['uuid'])
@app.template_global('get_watch_queue_position') @app.template_global('get_watch_queue_position')
def _get_watch_queue_position(watch_obj): def _get_watch_queue_position(watch_obj):
@@ -206,13 +206,13 @@ def _get_watch_queue_position(watch_obj):
@app.template_global('get_current_worker_count') @app.template_global('get_current_worker_count')
def _get_current_worker_count(): def _get_current_worker_count():
"""Get the current number of operational workers""" """Get the current number of operational workers"""
return worker_pool.get_worker_count() return worker_handler.get_worker_count()
@app.template_global('get_worker_status_info') @app.template_global('get_worker_status_info')
def _get_worker_status_info(): def _get_worker_status_info():
"""Get detailed worker status information for display""" """Get detailed worker status information for display"""
status = worker_pool.get_worker_status() status = worker_handler.get_worker_status()
running_uuids = worker_pool.get_running_uuids() running_uuids = worker_handler.get_running_uuids()
return { return {
'count': status['worker_count'], 'count': status['worker_count'],
@@ -801,7 +801,7 @@ def changedetection_app(config=None, datastore_o=None):
# watchlist UI buttons etc # watchlist UI buttons etc
import changedetectionio.blueprint.ui as ui import changedetectionio.blueprint.ui as ui
app.register_blueprint(ui.construct_blueprint(datastore, update_q, worker_pool, queuedWatchMetaData, watch_check_update)) app.register_blueprint(ui.construct_blueprint(datastore, update_q, worker_handler, queuedWatchMetaData, watch_check_update))
import changedetectionio.blueprint.watchlist as watchlist import changedetectionio.blueprint.watchlist as watchlist
app.register_blueprint(watchlist.construct_blueprint(datastore=datastore, update_q=update_q, queuedWatchMetaData=queuedWatchMetaData), url_prefix='') app.register_blueprint(watchlist.construct_blueprint(datastore=datastore, update_q=update_q, queuedWatchMetaData=queuedWatchMetaData), url_prefix='')
@@ -838,10 +838,10 @@ def changedetection_app(config=None, datastore_o=None):
expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers'])) expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
# Get basic status # Get basic status
status = worker_pool.get_worker_status() status = worker_handler.get_worker_status()
# Perform health check # Perform health check
health_result = worker_pool.check_worker_health( health_result = worker_handler.check_worker_health(
expected_count=expected_workers, expected_count=expected_workers,
update_q=update_q, update_q=update_q,
notification_q=notification_q, notification_q=notification_q,
@@ -905,24 +905,14 @@ def changedetection_app(config=None, datastore_o=None):
# Can be overridden by ENV or use the default settings # Can be overridden by ENV or use the default settings
n_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers'])) n_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
logger.info(f"Starting {n_workers} workers during app initialization") logger.info(f"Starting {n_workers} workers during app initialization")
worker_pool.start_workers(n_workers, update_q, notification_q, app, datastore) worker_handler.start_workers(n_workers, update_q, notification_q, app, datastore)
# Skip background threads in batch mode (just process queue and exit) # Skip background threads in batch mode (just process queue and exit)
batch_mode = app.config.get('batch_mode', False) batch_mode = app.config.get('batch_mode', False)
if not batch_mode: if not batch_mode:
# @todo handle ctrl break # @todo handle ctrl break
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks, daemon=True, name="TickerThread-ScheduleChecker").start() ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks, daemon=True, name="TickerThread-ScheduleChecker").start()
threading.Thread(target=notification_runner, daemon=True, name="NotificationRunner").start()
# Start configurable number of notification workers (default 1)
notification_workers = int(os.getenv("NOTIFICATION_WORKERS", "1"))
for i in range(notification_workers):
threading.Thread(
target=notification_runner,
args=(i,),
daemon=True,
name=f"NotificationRunner-{i}"
).start()
logger.info(f"Started {notification_workers} notification worker(s)")
in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
# Check for new release version, but not when running in test/build or pytest # Check for new release version, but not when running in test/build or pytest
@@ -964,14 +954,14 @@ def check_for_new_version():
app.config.exit.wait(86400) app.config.exit.wait(86400)
def notification_runner(worker_id=0): def notification_runner():
global notification_debug_log global notification_debug_log
from datetime import datetime from datetime import datetime
import json import json
with app.app_context(): with app.app_context():
while not app.config.exit.is_set(): while not app.config.exit.is_set():
try: try:
# Multiple workers can run concurrently (configurable via NOTIFICATION_WORKERS) # At the moment only one thread runs (single runner)
n_object = notification_q.get(block=False) n_object = notification_q.get(block=False)
except queue.Empty: except queue.Empty:
app.config.exit.wait(1) app.config.exit.wait(1)
@@ -997,7 +987,7 @@ def notification_runner(worker_id=0):
sent_obj = process_notification(n_object, datastore) sent_obj = process_notification(n_object, datastore)
except Exception as e: except Exception as e:
logger.error(f"Notification worker {worker_id} - Watch URL: {n_object['watch_url']} Error {str(e)}") logger.error(f"Watch URL: {n_object['watch_url']} Error {str(e)}")
# UUID won't be present when we submit a 'test' from the global settings # UUID won't be present when we submit a 'test' from the global settings
if 'uuid' in n_object: if 'uuid' in n_object:
@@ -1038,7 +1028,7 @@ def ticker_thread_check_time_launch_checks():
now = time.time() now = time.time()
if now - last_health_check > 60: if now - last_health_check > 60:
expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers'])) expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
health_result = worker_pool.check_worker_health( health_result = worker_handler.check_worker_health(
expected_count=expected_workers, expected_count=expected_workers,
update_q=update_q, update_q=update_q,
notification_q=notification_q, notification_q=notification_q,
@@ -1057,7 +1047,7 @@ def ticker_thread_check_time_launch_checks():
continue continue
# Get a list of watches by UUID that are currently fetching data # Get a list of watches by UUID that are currently fetching data
running_uuids = worker_pool.get_running_uuids() running_uuids = worker_handler.get_running_uuids()
# Build set of queued UUIDs once for O(1) lookup instead of O(n) per watch # Build set of queued UUIDs once for O(1) lookup instead of O(n) per watch
queued_uuids = {q_item.item['uuid'] for q_item in update_q.queue} queued_uuids = {q_item.item['uuid'] for q_item in update_q.queue}
@@ -1163,7 +1153,7 @@ def ticker_thread_check_time_launch_checks():
priority = int(time.time()) priority = int(time.time())
# Into the queue with you # Into the queue with you
queued_successfully = worker_pool.queue_item_async_safe(update_q, queued_successfully = worker_handler.queue_item_async_safe(update_q,
queuedWatchMetaData.PrioritizedItem(priority=priority, queuedWatchMetaData.PrioritizedItem(priority=priority,
item={'uuid': uuid}) item={'uuid': uuid})
) )
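Note on the multi-worker variant of `notification_runner` in this file's diff: a self-contained sketch of the pattern, using only the standard library. The names `run_notification_workers` and `handler` are hypothetical; the real runner also needs the Flask app context and the shared `notification_q`, which are left out here.

```python
import queue
import threading


def run_notification_workers(items, handler, worker_count=1):
    """Drain `items` with `worker_count` polling threads, then shut them down."""
    q = queue.Queue()
    for item in items:
        q.put(item)
    exit_event = threading.Event()

    def runner(worker_id):
        while not exit_event.is_set():
            try:
                obj = q.get(block=False)
            except queue.Empty:
                # Nothing queued: sleep briefly, waking early if exit is requested
                exit_event.wait(0.05)
                continue
            try:
                handler(worker_id, obj)
            except Exception as e:
                # One failed notification must not kill the worker
                print(f"Notification worker {worker_id} failed: {e}")
            finally:
                q.task_done()

    threads = [
        threading.Thread(target=runner, args=(i,), daemon=True, name=f"NotificationRunner-{i}")
        for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    q.join()          # Block until every queued item has been handled
    exit_event.set()  # Then ask the workers to stop
    for t in threads:
        t.join()
    return [t.name for t in threads]


sent = []
sent_lock = threading.Lock()


def record(worker_id, obj):
    with sent_lock:
        sent.append(obj)


names = run_notification_workers(range(10), record, worker_count=3)
```

With more than one worker, delivery order is no longer guaranteed, so anything that depends on notifications arriving in sequence would need its own ordering.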
+3 -3
@@ -730,7 +730,7 @@ class quickWatchForm(Form):
url = fields.URLField(_l('URL'), validators=[validateURL()]) url = fields.URLField(_l('URL'), validators=[validateURL()])
tags = StringTagUUID(_l('Group tag'), validators=[validators.Optional()]) tags = StringTagUUID(_l('Group tag'), validators=[validators.Optional()])
watch_submit_button = SubmitField(_l('Watch'), render_kw={"class": "pure-button pure-button-primary"}) watch_submit_button = SubmitField(_l('Watch'), render_kw={"class": "pure-button pure-button-primary"})
processor = RadioField(_l('Processor'), choices=lambda: processors.available_processors(), default=processors.get_default_processor) processor = RadioField(_l('Processor'), choices=lambda: processors.available_processors(), default="text_json_diff")
edit_and_watch_submit_button = SubmitField(_l('Edit > Watch'), render_kw={"class": "pure-button pure-button-primary"}) edit_and_watch_submit_button = SubmitField(_l('Edit > Watch'), render_kw={"class": "pure-button pure-button-primary"})
@@ -749,7 +749,7 @@ class commonSettingsForm(Form):
notification_format = SelectField(_l('Notification format'), choices=list(valid_notification_formats.items())) notification_format = SelectField(_l('Notification format'), choices=list(valid_notification_formats.items()))
notification_title = StringField(_l('Notification Title'), default='ChangeDetection.io Notification - {{ watch_url }}', validators=[validators.Optional(), ValidateJinja2Template()]) notification_title = StringField(_l('Notification Title'), default='ChangeDetection.io Notification - {{ watch_url }}', validators=[validators.Optional(), ValidateJinja2Template()])
notification_urls = StringListField(_l('Notification URL List'), validators=[validators.Optional(), ValidateAppRiseServers(), ValidateJinja2Template()]) notification_urls = StringListField(_l('Notification URL List'), validators=[validators.Optional(), ValidateAppRiseServers(), ValidateJinja2Template()])
processor = RadioField( label=_l("Processor - What do you want to achieve?"), choices=lambda: processors.available_processors(), default=processors.get_default_processor) processor = RadioField( label=_l("Processor - What do you want to achieve?"), choices=lambda: processors.available_processors(), default="text_json_diff")
scheduler_timezone_default = StringField(_l("Default timezone for watch check scheduler"), render_kw={"list": "timezones"}, validators=[validateTimeZoneName()]) scheduler_timezone_default = StringField(_l("Default timezone for watch check scheduler"), render_kw={"list": "timezones"}, validators=[validateTimeZoneName()])
webdriver_delay = IntegerField(_l('Wait seconds before extracting text'), validators=[validators.Optional(), validators.NumberRange(min=1, message=_l("Should contain one or more seconds"))]) webdriver_delay = IntegerField(_l('Wait seconds before extracting text'), validators=[validators.Optional(), validators.NumberRange(min=1, message=_l("Should contain one or more seconds"))])
@@ -763,7 +763,7 @@ class commonSettingsForm(Form):
class importForm(Form): class importForm(Form):
processor = RadioField(_l('Processor'), choices=lambda: processors.available_processors(), default=processors.get_default_processor) processor = RadioField(_l('Processor'), choices=lambda: processors.available_processors(), default="text_json_diff")
urls = TextAreaField(_l('URLs')) urls = TextAreaField(_l('URLs'))
xlsx_file = FileField(_l('Upload .xlsx file'), validators=[FileAllowed(['xlsx'], _l('Must be .xlsx file!'))]) xlsx_file = FileField(_l('Upload .xlsx file'), validators=[FileAllowed(['xlsx'], _l('Must be .xlsx file!'))])
file_mapping = SelectField(_l('File mapping'), [validators.DataRequired()], choices={('wachete', 'Wachete mapping'), ('custom','Custom mapping')}) file_mapping = SelectField(_l('File mapping'), [validators.DataRequired()], choices={('wachete', 'Wachete mapping'), ('custom','Custom mapping')})
+1 -1
@@ -29,7 +29,7 @@ class model(dict):
'proxy': None, # Preferred proxy connection 'proxy': None, # Preferred proxy connection
'time_between_check': {'weeks': None, 'days': None, 'hours': 3, 'minutes': None, 'seconds': None}, 'time_between_check': {'weeks': None, 'days': None, 'hours': 3, 'minutes': None, 'seconds': None},
'timeout': int(getenv("DEFAULT_SETTINGS_REQUESTS_TIMEOUT", "45")), # Default 45 seconds 'timeout': int(getenv("DEFAULT_SETTINGS_REQUESTS_TIMEOUT", "45")), # Default 45 seconds
'workers': int(getenv("DEFAULT_SETTINGS_REQUESTS_WORKERS", "5")), # Number of threads, lower is better for slow connections 'workers': int(getenv("DEFAULT_SETTINGS_REQUESTS_WORKERS", "10")), # Number of threads, lower is better for slow connections
'default_ua': { 'default_ua': {
'html_requests': getenv("DEFAULT_SETTINGS_HEADERS_USERAGENT", DEFAULT_SETTINGS_HEADERS_USERAGENT), 'html_requests': getenv("DEFAULT_SETTINGS_HEADERS_USERAGENT", DEFAULT_SETTINGS_HEADERS_USERAGENT),
'html_webdriver': None, 'html_webdriver': None,
-24
@@ -105,30 +105,6 @@ class ChangeDetectionSpec:
""" """
pass pass
@hookspec
def register_processor(self):
"""Register an external processor plugin.
External packages can implement this hook to register custom processors
that will be discovered alongside built-in processors.
Returns:
dict or None: Dictionary with processor information:
{
'processor_name': str, # Machine name (e.g., 'osint_recon')
'processor_module': module, # Module containing processor.py
'processor_class': class, # The perform_site_check class
'metadata': { # Optional metadata
'name': str, # Display name
'description': str, # Description
'processor_weight': int,# Sort weight (lower = higher priority)
'list_badge_text': str, # Badge text for UI
}
}
Return None if this plugin doesn't provide a processor
"""
pass
# Set up Plugin Manager # Set up Plugin Manager
plugin_manager = pluggy.PluginManager(PLUGIN_NAMESPACE) plugin_manager = pluggy.PluginManager(PLUGIN_NAMESPACE)
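Note on the `register_processor` hookspec in the hunk above: a small sketch of the dictionary a plugin would return and how a caller could filter the hook results. It follows the shape given in the docstring but does not use pluggy; `collect_plugin_processors`, `fake_module`, and the sample values are hypothetical stand-ins.

```python
from types import SimpleNamespace


def collect_plugin_processors(hook_results):
    """Keep only well-formed results as (module, processor_name) tuples."""
    processors = []
    for result in hook_results:
        # Plugins that provide no processor return None
        if not result or not isinstance(result, dict):
            continue
        module = result.get("processor_module")
        name = result.get("processor_name")
        if module and name:
            processors.append((module, name))
    return processors


# Stand-in for a real plugin module (illustration only)
fake_module = SimpleNamespace(__file__="/plugins/osint_recon/processor.py")

hook_results = [
    None,  # a plugin without a processor
    {
        "processor_name": "osint_recon",
        "processor_module": fake_module,
        "metadata": {"name": "OSINT Recon", "processor_weight": 0, "list_badge_text": "OSINT"},
    },
    {"processor_name": "broken"},  # missing module, so it is skipped
]

registered = collect_plugin_processors(hook_results)
```

Skipping malformed results rather than raising means a faulty plugin cannot stop the built-in processors from loading.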
+30 -219
@@ -17,11 +17,9 @@ def find_sub_packages(package_name):
return [name for _, name, is_pkg in pkgutil.iter_modules(package.__path__) if is_pkg] return [name for _, name, is_pkg in pkgutil.iter_modules(package.__path__) if is_pkg]
@lru_cache(maxsize=1)
def find_processors(): def find_processors():
""" """
Find all subclasses of DifferenceDetectionProcessor in the specified package. Find all subclasses of DifferenceDetectionProcessor in the specified package.
Results are cached to avoid repeated discovery.
:param package_name: The name of the package to scan for processor modules. :param package_name: The name of the package to scan for processor modules.
:return: A list of (module, class) tuples. :return: A list of (module, class) tuples.
@@ -48,23 +46,6 @@ def find_processors():
except (ModuleNotFoundError, ImportError) as e: except (ModuleNotFoundError, ImportError) as e:
logger.warning(f"Failed to import module {module_name}: {e} (find_processors())") logger.warning(f"Failed to import module {module_name}: {e} (find_processors())")
# Discover plugin processors via pluggy
try:
from changedetectionio.pluggy_interface import plugin_manager
plugin_results = plugin_manager.hook.register_processor()
for result in plugin_results:
if result and isinstance(result, dict):
processor_module = result.get('processor_module')
processor_name = result.get('processor_name')
if processor_module and processor_name:
processors.append((processor_module, processor_name))
plugin_path = getattr(processor_module, '__file__', 'unknown location')
logger.info(f"Registered plugin processor: {processor_name} from {plugin_path}")
except Exception as e:
logger.warning(f"Error loading plugin processors: {e}")
return processors return processors
@@ -116,137 +97,54 @@ def find_processor_module(processor_name):
return None return None
def get_processor_module(processor_name):
"""
Get the actual processor module (with perform_site_check class) by name.
Works for both built-in and plugin processors.
Args:
processor_name: Processor machine name (e.g., 'text_json_diff', 'osint_recon')
Returns:
module: The processor module containing perform_site_check, or None if not found
"""
processor_classes = find_processors()
processor_tuple = next((tpl for tpl in processor_classes if tpl[1] == processor_name), None)
if processor_tuple:
# Return the actual processor module (first element of tuple)
return processor_tuple[0]
return None
def get_processor_submodule(processor_name, submodule_name):
"""
Get an optional submodule from a processor (e.g., 'difference', 'extract', 'preview').
Works for both built-in and plugin processors.
Args:
processor_name: Processor machine name (e.g., 'text_json_diff', 'osint_recon')
submodule_name: Name of the submodule (e.g., 'difference', 'extract', 'preview')
Returns:
module: The submodule if it exists, or None if not found
"""
processor_classes = find_processors()
processor_tuple = next((tpl for tpl in processor_classes if tpl[1] == processor_name), None)
if not processor_tuple:
return None
processor_module = processor_tuple[0]
parent_module = get_parent_module(processor_module)
if not parent_module:
return None
# Try to import the submodule
try:
# For built-in processors: changedetectionio.processors.text_json_diff.difference
# For plugin processors: changedetectionio_osint.difference
parent_module_name = parent_module.__name__
submodule_full_name = f"{parent_module_name}.{submodule_name}"
return importlib.import_module(submodule_full_name)
except (ModuleNotFoundError, ImportError):
return None
@lru_cache(maxsize=1)
def get_plugin_processor_metadata():
"""Get metadata from plugin processors."""
metadata = {}
try:
from changedetectionio.pluggy_interface import plugin_manager
plugin_results = plugin_manager.hook.register_processor()
for result in plugin_results:
if result and isinstance(result, dict):
processor_name = result.get('processor_name')
meta = result.get('metadata', {})
if processor_name:
metadata[processor_name] = meta
except Exception as e:
logger.warning(f"Error getting plugin processor metadata: {e}")
return metadata
def available_processors(): def available_processors():
""" """
Get a list of processors by name and description for the UI elements. Get a list of processors by name and description for the UI elements.
Can be filtered via DISABLED_PROCESSORS environment variable (comma-separated list). Can be filtered via ALLOWED_PROCESSORS environment variable (comma-separated list).
:return: A list :) :return: A list :)
""" """
processor_classes = find_processors() processor_classes = find_processors()
# Check if DISABLED_PROCESSORS env var is set # Check if ALLOWED_PROCESSORS env var is set
disabled_processors_env = os.getenv('DISABLED_PROCESSORS', 'image_ssim_diff').strip() # For now we disable it, need to make a deploy with lots of new code and this will be an overload
disabled_processors = [] allowed_processors_env = os.getenv('ALLOWED_PROCESSORS', 'text_json_diff, restock_diff').strip()
if disabled_processors_env: allowed_processors = None
if allowed_processors_env:
# Parse comma-separated list and strip whitespace # Parse comma-separated list and strip whitespace
disabled_processors = [p.strip() for p in disabled_processors_env.split(',') if p.strip()] allowed_processors = [p.strip() for p in allowed_processors_env.split(',') if p.strip()]
logger.info(f"DISABLED_PROCESSORS set, disabling: {disabled_processors}") logger.info(f"ALLOWED_PROCESSORS set, filtering to: {allowed_processors}")
available = [] available = []
plugin_metadata = get_plugin_processor_metadata()
for module, sub_package_name in processor_classes: for module, sub_package_name in processor_classes:
# Skip disabled processors # Filter by allowed processors if set
if sub_package_name in disabled_processors: if allowed_processors and sub_package_name not in allowed_processors:
logger.debug(f"Skipping processor '{sub_package_name}' (in DISABLED_PROCESSORS)") logger.debug(f"Skipping processor '{sub_package_name}' (not in ALLOWED_PROCESSORS)")
continue continue
# Check if this is a plugin processor # Try to get the 'name' attribute from the processor module first
if sub_package_name in plugin_metadata: if hasattr(module, 'name'):
meta = plugin_metadata[sub_package_name] description = gettext(module.name)
description = gettext(meta.get('name', sub_package_name))
# Plugin processors start from weight 100 to separate them from built-in processors
weight = 100 + meta.get('processor_weight', 0)
else: else:
# Try to get the 'name' attribute from the processor module first # Fall back to processor_description from parent module's __init__.py
if hasattr(module, 'name'): parent_module = get_parent_module(module)
description = gettext(module.name) if parent_module and hasattr(parent_module, 'processor_description'):
description = gettext(parent_module.processor_description)
else: else:
# Fall back to processor_description from parent module's __init__.py # Final fallback to a readable name
parent_module = get_parent_module(module) description = sub_package_name.replace('_', ' ').title()
if parent_module and hasattr(parent_module, 'processor_description'):
description = gettext(parent_module.processor_description)
else:
# Final fallback to a readable name
description = sub_package_name.replace('_', ' ').title()
# Get weight for sorting (lower weight = higher in list) # Get weight for sorting (lower weight = higher in list)
weight = 0 # Default weight for processors without explicit weight weight = 0 # Default weight for processors without explicit weight
# Check processor module itself first # Check processor module itself first
if hasattr(module, 'processor_weight'): if hasattr(module, 'processor_weight'):
weight = module.processor_weight weight = module.processor_weight
else: else:
# Fall back to parent module (package __init__.py) # Fall back to parent module (package __init__.py)
parent_module = get_parent_module(module) parent_module = get_parent_module(module)
if parent_module and hasattr(parent_module, 'processor_weight'): if parent_module and hasattr(parent_module, 'processor_weight'):
weight = parent_module.processor_weight weight = parent_module.processor_weight
available.append((sub_package_name, description, weight)) available.append((sub_package_name, description, weight))
@@ -257,20 +155,6 @@ def available_processors():
return [(name, desc) for name, desc, weight in available] return [(name, desc) for name, desc, weight in available]
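The allow-list filtering described in `available_processors()` can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `parse_processor_list` and `filter_and_sort` are hypothetical helpers, two of the sample descriptions and weights are made up, and the sort-by-weight step is an assumption based on the "lower weight = higher in list" comment, since the sorting lines are not visible in this hunk.

```python
def parse_processor_list(raw):
    """Split a comma-separated env value into clean names, dropping empties."""
    return [p.strip() for p in raw.split(',') if p.strip()]


def filter_and_sort(candidates, allowed=None):
    """candidates: (name, description, weight) tuples; allowed: list of names or None."""
    # An empty or missing allow-list means "keep everything"
    kept = [c for c in candidates if not allowed or c[0] in allowed]
    kept.sort(key=lambda c: c[2])  # lower weight sorts first (assumed ordering)
    return [(name, desc) for name, desc, _weight in kept]


candidates = [
    ("image_ssim_diff", "Visual/Screenshot change detection (Fast)", 2),
    ("text_json_diff", "Text/JSON diff", 0),    # hypothetical description and weight
    ("restock_diff", "Restock detection", 1),   # hypothetical description and weight
]
allowed = parse_processor_list("text_json_diff, restock_diff")
result = filter_and_sort(candidates, allowed)
```

With the default value shown in the diff, `image_ssim_diff` is filtered out, and the first remaining entry is the one a "default processor" lookup would pick.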
def get_default_processor():
"""
Get the default processor to use when none is specified.
Returns the first available processor based on weight (lowest weight = highest priority).
This ensures forms auto-select a valid processor even when DISABLED_PROCESSORS filters the list.
:return: The processor name string (e.g., 'text_json_diff')
"""
available = available_processors()
if available:
return available[0][0] # Return the processor name from first tuple
return 'text_json_diff' # Fallback if somehow no processors are available
def get_processor_badge_texts(): def get_processor_badge_texts():
""" """
Get a dictionary mapping processor names to their list_badge_text values. Get a dictionary mapping processor names to their list_badge_text values.
@@ -395,76 +279,3 @@ def get_processor_badge_css():
return '\n\n'.join(css_rules) return '\n\n'.join(css_rules)
def save_processor_config(datastore, watch_uuid, config_data):
"""
Save processor-specific configuration to JSON file.
This is a shared helper function used by both the UI edit form and API endpoints
to consistently handle processor configuration storage.
Args:
datastore: The application datastore instance
watch_uuid: UUID of the watch
config_data: Dictionary of configuration data to save (with processor_config_* prefix removed)
Returns:
bool: True if saved successfully, False otherwise
"""
if not config_data:
return True
try:
from changedetectionio.processors.base import difference_detection_processor
# Get processor name from watch
watch = datastore.data['watching'].get(watch_uuid)
if not watch:
logger.error(f"Cannot save processor config: watch {watch_uuid} not found")
return False
processor_name = watch.get('processor', 'text_json_diff')
# Create a processor instance to access config methods
processor_instance = difference_detection_processor(datastore, watch_uuid)
# Use processor name as filename so each processor keeps its own config
config_filename = f'{processor_name}.json'
processor_instance.update_extra_watch_config(config_filename, config_data)
logger.debug(f"Saved processor config to {config_filename}: {config_data}")
return True
except Exception as e:
logger.error(f"Failed to save processor config: {e}")
return False
def extract_processor_config_from_form_data(form_data):
"""
Extract processor_config_* fields from form data and return separate dicts.
This is a shared helper function used by both the UI edit form and API endpoints
to consistently handle processor configuration extraction.
IMPORTANT: This function modifies form_data in-place by removing processor_config_* fields.
Args:
form_data: Dictionary of form data (will be modified in-place)
Returns:
dict: Dictionary of processor config data (with processor_config_* prefix removed)
"""
processor_config_data = {}
# Use list() to create a copy of keys since we're modifying the dict
for field_name in list(form_data.keys()):
if field_name.startswith('processor_config_'):
config_key = field_name.replace('processor_config_', '')
# Save all values (including empty strings) to allow explicit clearing of settings
processor_config_data[config_key] = form_data[field_name]
# Remove from form_data to prevent it from reaching datastore
del form_data[field_name]
return processor_config_data
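The in-place extraction that `extract_processor_config_from_form_data()` documents can be sketched like this. It is an illustration under assumptions: `extract_prefixed` is a hypothetical name and the sample form fields are invented. It slices the prefix off the front of the key instead of calling `str.replace`, which would also remove a second occurrence of the prefix later in the field name.

```python
PREFIX = "processor_config_"


def extract_prefixed(form_data, prefix=PREFIX):
    """Move prefix-named fields out of form_data into a new dict (mutates form_data)."""
    extracted = {}
    for key in list(form_data):  # copy the keys, the dict shrinks during the loop
        if key.startswith(prefix):
            # Keep empty strings too, so a setting can be cleared explicitly
            extracted[key[len(prefix):]] = form_data.pop(key)
    return extracted


form = {
    "url": "https://example.com",
    "processor_config_threshold": "0.9",  # hypothetical field
    "processor_config_mode": "",          # hypothetical field, explicitly cleared
}
config = extract_prefixed(form)
```

After the call, `form` holds only the fields meant for the datastore and `config` holds the processor settings with the prefix removed.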
@@ -12,13 +12,6 @@ processor_description = "Visual/Screenshot change detection (Fast)"
processor_name = "image_ssim_diff" processor_name = "image_ssim_diff"
processor_weight = 2 # Lower weight = appears at top, heavier weight = appears lower (bottom) processor_weight = 2 # Lower weight = appears at top, heavier weight = appears lower (bottom)
# Processor capabilities
supports_visual_selector = True
supports_browser_steps = True
supports_text_filters_and_triggers = False
supports_text_filters_and_triggers_elements = False
supports_request_type = True
PROCESSOR_CONFIG_NAME = f"{Path(__file__).parent.name}.json" PROCESSOR_CONFIG_NAME = f"{Path(__file__).parent.name}.json"
# Subprocess timeout settings # Subprocess timeout settings
@@ -4,13 +4,6 @@ from changedetectionio.model.Watch import model as BaseWatch
from typing import Union from typing import Union
import re import re
# Processor capabilities
supports_visual_selector = True
supports_browser_steps = True
supports_text_filters_and_triggers = True
supports_text_filters_and_triggers_elements = True
supports_request_type = True
class Restock(dict): class Restock(dict):
def parse_currency(self, raw_value: str) -> Union[float, None]: def parse_currency(self, raw_value: str) -> Union[float, None]:
@@ -1,11 +1,5 @@
from loguru import logger
# Processor capabilities from loguru import logger
supports_visual_selector = True
supports_browser_steps = True
supports_text_filters_and_triggers = True
supports_text_filters_and_triggers_elements = True
supports_request_type = True
+141 -228
@@ -5,57 +5,51 @@ import heapq
import queue import queue
import threading import threading
# Janus is no longer required - we use pure threading.Queue for multi-loop support try:
# try: import janus
# import janus except ImportError:
# except ImportError: logger.critical(f"CRITICAL: janus library is required. Install with: pip install janus")
# pass # Not needed anymore raise
class RecheckPriorityQueue: class RecheckPriorityQueue:
""" """
Thread-safe priority queue supporting multiple async event loops. Ultra-reliable priority queue using janus for async/sync bridging.
ARCHITECTURE: CRITICAL DESIGN NOTE: Both sync_q and async_q are required because:
- Multiple async workers, each with its own event loop in its own thread - sync_q: Used by Flask routes, ticker threads, and other synchronous code
- Hybrid sync/async design for maximum scalability - async_q: Used by async workers (the actual fetchers/processors) and coroutines
- Sync interface for ticker thread (threading.Queue)
- Async interface for workers (asyncio.Event - NO executor threads!) DO NOT REMOVE EITHER INTERFACE - they bridge different execution contexts:
- Synchronous code (Flask, threads) cannot use async methods without blocking
SCALABILITY: - Async code cannot use sync methods without blocking the event loop
- Scales to 100-200+ workers without executor thread exhaustion - janus provides the only safe bridge between these two worlds
- Async workers wait on asyncio.Event (pure coroutines, no threads)
- Sync callers use threading.Queue (backward compatible) Attempting to unify to async-only would require:
- Converting all Flask routes to async (major breaking change)
WHY NOT JANUS: - Using asyncio.run() in sync contexts (causes deadlocks)
- Janus binds to ONE event loop at creation time - Thread-pool wrapping (adds complexity and overhead)
- Our architecture has 15+ workers, each with separate event loops
- Workers in different threads/loops cannot share janus async interface Minimal implementation focused on reliability:
- Pure janus for sync/async bridge
WHY NOT RUN_IN_EXECUTOR: - Thread-safe priority ordering
- With 200 workers, run_in_executor() would block 200 threads - Bulletproof error handling with critical logging
- Exhausts ThreadPoolExecutor, starves Flask HTTP handlers
- Pure async approach uses 0 threads while waiting
""" """
def __init__(self, maxsize: int = 0): def __init__(self, maxsize: int = 0):
try: try:
import asyncio self._janus_queue = janus.Queue(maxsize=maxsize)
# BOTH interfaces required - see class docstring for why
# Sync interface: threading.Queue for ticker thread and Flask routes self.sync_q = self._janus_queue.sync_q # Flask routes, ticker thread
self._notification_queue = queue.Queue(maxsize=maxsize if maxsize > 0 else 0) self.async_q = self._janus_queue.async_q # Async workers
# Priority storage - thread-safe # Priority storage - thread-safe
self._priority_items = [] self._priority_items = []
self._lock = threading.RLock() self._lock = threading.RLock()
# No event signaling needed - pure polling approach
# Workers check queue every 50ms (latency acceptable: 0-500ms)
# Scales to 1000+ workers: each sleeping worker = ~4KB coroutine, not thread
# Signals for UI updates # Signals for UI updates
self.queue_length_signal = signal('queue_length') self.queue_length_signal = signal('queue_length')
logger.debug("RecheckPriorityQueue initialized successfully") logger.debug("RecheckPriorityQueue initialized successfully")
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to initialize RecheckPriorityQueue: {str(e)}") logger.critical(f"CRITICAL: Failed to initialize RecheckPriorityQueue: {str(e)}")
@@ -64,48 +58,38 @@ class RecheckPriorityQueue:
# SYNC INTERFACE (for ticker thread) # SYNC INTERFACE (for ticker thread)
def put(self, item, block: bool = True, timeout: Optional[float] = None): def put(self, item, block: bool = True, timeout: Optional[float] = None):
"""Thread-safe sync put with priority ordering""" """Thread-safe sync put with priority ordering"""
logger.trace(f"RecheckQueue.put() called for item: {self._get_item_uuid(item)}, block={block}, timeout={timeout}")
try: try:
# CRITICAL: Add to both priority storage AND notification queue atomically # Add to priority storage
# to prevent desynchronization where item exists but no notification
with self._lock: with self._lock:
heapq.heappush(self._priority_items, item) heapq.heappush(self._priority_items, item)
# Add notification - use blocking with timeout for safety # Notify via janus sync queue
# Notification queue is unlimited size, so should never block in practice self.sync_q.put(True, block=block, timeout=timeout)
# but timeout ensures we detect any unexpected issues (deadlock, etc)
try: # Emit signals
self._notification_queue.put(True, block=True, timeout=5.0) self._emit_put_signals(item)
except Exception as notif_e:
# Notification failed - MUST remove from priority_items to keep in sync
# This prevents "Priority queue inconsistency" errors in get()
logger.critical(f"CRITICAL: Notification queue put failed, removing from priority_items: {notif_e}")
self._priority_items.remove(item)
heapq.heapify(self._priority_items)
raise # Re-raise to be caught by outer exception handler
# Signal emission after successful queue - log but don't fail the operation
# Item is already safely queued, so signal failure shouldn't affect queue state
try:
self._emit_put_signals(item)
except Exception as signal_e:
logger.error(f"Failed to emit put signals but item queued successfully: {signal_e}")
logger.trace(f"Successfully queued item: {self._get_item_uuid(item)}") logger.trace(f"Successfully queued item: {self._get_item_uuid(item)}")
return True return True
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to put item {self._get_item_uuid(item)}: {type(e).__name__}: {str(e)}") logger.critical(f"CRITICAL: Failed to put item {self._get_item_uuid(item)}: {str(e)}")
# Item should have been cleaned up in the inner try/except if notification failed # Remove from priority storage if janus put failed
try:
with self._lock:
if item in self._priority_items:
self._priority_items.remove(item)
heapq.heapify(self._priority_items)
except Exception as cleanup_e:
logger.critical(f"CRITICAL: Failed to cleanup after put failure: {str(cleanup_e)}")
return False return False
def get(self, block: bool = True, timeout: Optional[float] = None): def get(self, block: bool = True, timeout: Optional[float] = None):
"""Thread-safe sync get with priority ordering""" """Thread-safe sync get with priority ordering"""
logger.trace(f"RecheckQueue.get() called, block={block}, timeout={timeout}") import queue
import queue as queue_module
try: try:
# Wait for notification (this doesn't return the actual item, just signals availability) # Wait for notification
self._notification_queue.get(block=block, timeout=timeout) self.sync_q.get(block=block, timeout=timeout)
# Get highest priority item # Get highest priority item
with self._lock: with self._lock:
@@ -114,91 +98,69 @@ class RecheckPriorityQueue:
raise Exception("Priority queue inconsistency") raise Exception("Priority queue inconsistency")
item = heapq.heappop(self._priority_items) item = heapq.heappop(self._priority_items)
# Signal emission after successful retrieval - log but don't lose the item # Emit signals
# Item is already retrieved, so signal failure shouldn't affect queue state self._emit_get_signals()
try:
self._emit_get_signals()
except Exception as signal_e:
logger.error(f"Failed to emit get signals but item retrieved successfully: {signal_e}")
logger.trace(f"RecheckQueue.get() successfully retrieved item: {self._get_item_uuid(item)}") logger.debug(f"Successfully retrieved item: {self._get_item_uuid(item)}")
return item
except queue_module.Empty:
# Queue is empty with timeout - expected behavior
logger.trace(f"RecheckQueue.get() timed out - queue is empty (timeout={timeout})")
raise # noqa
except Exception as e:
# Re-raise without logging - caller (worker) will handle and log appropriately
logger.trace(f"RecheckQueue.get() failed with exception: {type(e).__name__}: {str(e)}")
raise
# ASYNC INTERFACE (for workers)
async def async_put(self, item, executor=None):
"""Async put with priority ordering - uses thread pool to avoid blocking
Args:
item: Item to add to queue
executor: Optional ThreadPoolExecutor. If None, uses default pool.
"""
logger.trace(f"RecheckQueue.async_put() called for item: {self._get_item_uuid(item)}, executor={executor}")
import asyncio
try:
# Use run_in_executor to call sync put without blocking event loop
loop = asyncio.get_event_loop()
result = await loop.run_in_executor(
executor, # Use provided executor or default
lambda: self.put(item, block=True, timeout=5.0)
)
logger.trace(f"RecheckQueue.async_put() successfully queued item: {self._get_item_uuid(item)}")
return result
except Exception as e:
logger.critical(f"CRITICAL: Failed to async put item {self._get_item_uuid(item)}: {str(e)}")
return False
async def async_get(self, executor=None, timeout=1.0):
"""
Efficient async get using executor for blocking call.
HYBRID APPROACH: Best of both worlds
- Uses run_in_executor for efficient blocking (no polling overhead)
- Single timeout (no double-timeout race condition)
- Scales well: executor sized to match worker count
With FETCH_WORKERS=10: 10 threads blocked max (acceptable)
With FETCH_WORKERS=200: Need executor with 200+ threads (see worker_pool.py)
Args:
executor: ThreadPoolExecutor (sized to match worker count)
timeout: Maximum time to wait in seconds
Returns:
Item from queue
Raises:
queue.Empty: If timeout expires with no item available
"""
logger.trace(f"RecheckQueue.async_get() called, timeout={timeout}")
import asyncio
try:
# Use run_in_executor to call sync get efficiently
# No outer asyncio.wait_for wrapper = no double timeout issue!
loop = asyncio.get_event_loop()
item = await loop.run_in_executor(
executor,
lambda: self.get(block=True, timeout=timeout)
)
logger.trace(f"RecheckQueue.async_get() successfully retrieved item: {self._get_item_uuid(item)}")
return item return item
except queue.Empty: except queue.Empty:
logger.trace(f"RecheckQueue.async_get() timed out - queue is empty") # Queue is empty with timeout - expected behavior, re-raise without logging
raise raise
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to async get item from queue: {type(e).__name__}: {str(e)}") # Re-raise without logging - caller (worker) will handle and log appropriately
raise
# ASYNC INTERFACE (for workers)
async def async_put(self, item):
"""Pure async put with priority ordering"""
try:
# Add to priority storage
with self._lock:
heapq.heappush(self._priority_items, item)
# Notify via janus async queue
await self.async_q.put(True)
# Emit signals
self._emit_put_signals(item)
logger.debug(f"Successfully async queued item: {self._get_item_uuid(item)}")
return True
except Exception as e:
logger.critical(f"CRITICAL: Failed to async put item {self._get_item_uuid(item)}: {str(e)}")
# Remove from priority storage if janus put failed
try:
with self._lock:
if item in self._priority_items:
self._priority_items.remove(item)
heapq.heapify(self._priority_items)
except Exception as cleanup_e:
logger.critical(f"CRITICAL: Failed to cleanup after async put failure: {str(cleanup_e)}")
return False
async def async_get(self):
"""Pure async get with priority ordering"""
try:
# Wait for notification
await self.async_q.get()
# Get highest priority item
with self._lock:
if not self._priority_items:
logger.critical(f"CRITICAL: Async queue notification received but no priority items available")
raise Exception("Priority queue inconsistency")
item = heapq.heappop(self._priority_items)
# Emit signals
self._emit_get_signals()
logger.debug(f"Successfully async retrieved item: {self._get_item_uuid(item)}")
return item
except Exception as e:
logger.critical(f"CRITICAL: Failed to async get item from queue: {str(e)}")
raise raise
# UTILITY METHODS # UTILITY METHODS
@@ -224,35 +186,10 @@ class RecheckPriorityQueue:
logger.critical(f"CRITICAL: Failed to get queued UUIDs: {str(e)}") logger.critical(f"CRITICAL: Failed to get queued UUIDs: {str(e)}")
return [] return []
def clear(self):
"""Clear all items from both priority storage and notification queue"""
try:
with self._lock:
# Clear priority items
self._priority_items.clear()
# Drain all notifications to prevent stale notifications
# This is critical for test cleanup to prevent queue desynchronization
drained = 0
while not self._notification_queue.empty():
try:
self._notification_queue.get_nowait()
drained += 1
except queue.Empty:
break
if drained > 0:
logger.debug(f"Cleared queue: removed {drained} notifications")
return True
except Exception as e:
logger.critical(f"CRITICAL: Failed to clear queue: {str(e)}")
return False
def close(self): def close(self):
"""Close the queue""" """Close the janus queue"""
try: try:
# Nothing to close for threading.Queue self._janus_queue.close()
logger.debug("RecheckPriorityQueue closed successfully") logger.debug("RecheckPriorityQueue closed successfully")
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to close RecheckPriorityQueue: {str(e)}") logger.critical(f"CRITICAL: Failed to close RecheckPriorityQueue: {str(e)}")
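Both versions of `RecheckPriorityQueue` share one pattern: items live in a heap guarded by a lock, while a separate queue carries one `True` token per item so that consumers can block until something is available. A stripped-down sketch of that pattern, using only the standard library, might look like the following. `TinyPriorityQueue` is a hypothetical name, and the signal emission, logging and cleanup paths from the diff are deliberately left out.

```python
import heapq
import queue
import threading


class TinyPriorityQueue:
    """Heap for ordering plus a token queue for blocking/wake-up."""

    def __init__(self):
        self._items = []                 # heap of queued items
        self._lock = threading.RLock()   # guards the heap
        self._tokens = queue.Queue()     # one token per queued item

    def put(self, item):
        with self._lock:
            heapq.heappush(self._items, item)
        self._tokens.put(True)

    def get(self, block=True, timeout=None):
        # Wait for a token first; raises queue.Empty if none arrives in time
        self._tokens.get(block=block, timeout=timeout)
        with self._lock:
            return heapq.heappop(self._items)


q = TinyPriorityQueue()
for priority, uuid in [(5, "c"), (1, "a"), (3, "b")]:
    q.put((priority, uuid))
order = [q.get(timeout=1)[1] for _ in range(3)]
```

The token only signals that some item exists. The item actually returned is whatever sits at the top of the heap at that moment, which is why the heap and the token count must stay in sync.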
@@ -384,7 +321,7 @@ class RecheckPriorityQueue:
except Exception: except Exception:
pass pass
return 'unknown' return 'unknown'
def _emit_put_signals(self, item): def _emit_put_signals(self, item):
"""Emit signals when item is added""" """Emit signals when item is added"""
try: try:
@@ -393,14 +330,14 @@ class RecheckPriorityQueue:
watch_check_update = signal('watch_check_update') watch_check_update = signal('watch_check_update')
if watch_check_update: if watch_check_update:
watch_check_update.send(watch_uuid=item.item['uuid']) watch_check_update.send(watch_uuid=item.item['uuid'])
# Queue length signal # Queue length signal
if self.queue_length_signal: if self.queue_length_signal:
self.queue_length_signal.send(length=self.qsize()) self.queue_length_signal.send(length=self.qsize())
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to emit put signals: {str(e)}") logger.critical(f"CRITICAL: Failed to emit put signals: {str(e)}")
def _emit_get_signals(self): def _emit_get_signals(self):
"""Emit signals when item is removed""" """Emit signals when item is removed"""
try: try:
@@ -426,11 +363,12 @@ class NotificationQueue:
def __init__(self, maxsize: int = 0, datastore=None): def __init__(self, maxsize: int = 0, datastore=None):
try: try:
# Use pure threading.Queue to avoid event loop binding issues self._janus_queue = janus.Queue(maxsize=maxsize)
self._notification_queue = queue.Queue(maxsize=maxsize if maxsize > 0 else 0) # BOTH interfaces required - see class docstring for why
self.sync_q = self._janus_queue.sync_q # Flask routes, threads
self.async_q = self._janus_queue.async_q # Async workers
self.notification_event_signal = signal('notification_event') self.notification_event_signal = signal('notification_event')
self.datastore = datastore # For checking all_muted setting self.datastore = datastore # For checking all_muted setting
self._lock = threading.RLock()
logger.debug("NotificationQueue initialized successfully") logger.debug("NotificationQueue initialized successfully")
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to initialize NotificationQueue: {str(e)}") logger.critical(f"CRITICAL: Failed to initialize NotificationQueue: {str(e)}")
@@ -442,97 +380,72 @@ class NotificationQueue:
def put(self, item: Dict[str, Any], block: bool = True, timeout: Optional[float] = None): def put(self, item: Dict[str, Any], block: bool = True, timeout: Optional[float] = None):
"""Thread-safe sync put with signal emission""" """Thread-safe sync put with signal emission"""
logger.trace(f"NotificationQueue.put() called for item: {item.get('uuid', 'unknown')}, block={block}, timeout={timeout}")
try: try:
# Check if all notifications are muted # Check if all notifications are muted
if self.datastore and self.datastore.data['settings']['application'].get('all_muted', False): if self.datastore and self.datastore.data['settings']['application'].get('all_muted', False):
logger.debug(f"Notification blocked - all notifications are muted: {item.get('uuid', 'unknown')}") logger.debug(f"Notification blocked - all notifications are muted: {item.get('uuid', 'unknown')}")
return False return False
with self._lock: self.sync_q.put(item, block=block, timeout=timeout)
self._notification_queue.put(item, block=block, timeout=timeout)
self._emit_notification_signal(item) self._emit_notification_signal(item)
logger.trace(f"NotificationQueue.put() successfully queued notification: {item.get('uuid', 'unknown')}") logger.debug(f"Successfully queued notification: {item.get('uuid', 'unknown')}")
return True return True
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to put notification {item.get('uuid', 'unknown')}: {str(e)}") logger.critical(f"CRITICAL: Failed to put notification {item.get('uuid', 'unknown')}: {str(e)}")
return False return False
async def async_put(self, item: Dict[str, Any], executor=None): async def async_put(self, item: Dict[str, Any]):
"""Async put with signal emission - uses thread pool """Pure async put with signal emission"""
Args:
item: Notification item to queue
executor: Optional ThreadPoolExecutor
"""
logger.trace(f"NotificationQueue.async_put() called for item: {item.get('uuid', 'unknown')}, executor={executor}")
import asyncio
try: try:
# Check if all notifications are muted # Check if all notifications are muted
if self.datastore and self.datastore.data['settings']['application'].get('all_muted', False): if self.datastore and self.datastore.data['settings']['application'].get('all_muted', False):
logger.debug(f"Notification blocked - all notifications are muted: {item.get('uuid', 'unknown')}") logger.debug(f"Notification blocked - all notifications are muted: {item.get('uuid', 'unknown')}")
return False return False
loop = asyncio.get_event_loop() await self.async_q.put(item)
await loop.run_in_executor(executor, lambda: self.put(item, block=True, timeout=5.0)) self._emit_notification_signal(item)
logger.trace(f"NotificationQueue.async_put() successfully queued notification: {item.get('uuid', 'unknown')}") logger.debug(f"Successfully async queued notification: {item.get('uuid', 'unknown')}")
return True return True
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to async put notification {item.get('uuid', 'unknown')}: {str(e)}") logger.critical(f"CRITICAL: Failed to async put notification {item.get('uuid', 'unknown')}: {str(e)}")
return False return False
def get(self, block: bool = True, timeout: Optional[float] = None): def get(self, block: bool = True, timeout: Optional[float] = None):
"""Thread-safe sync get""" """Thread-safe sync get"""
logger.trace(f"NotificationQueue.get() called, block={block}, timeout={timeout}")
try: try:
with self._lock: return self.sync_q.get(block=block, timeout=timeout)
item = self._notification_queue.get(block=block, timeout=timeout)
logger.trace(f"NotificationQueue.get() retrieved item: {item.get('uuid', 'unknown') if isinstance(item, dict) else 'unknown'}")
return item
except queue.Empty as e: except queue.Empty as e:
logger.trace(f"NotificationQueue.get() timed out - queue is empty (timeout={timeout})")
raise e raise e
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to get notification: {type(e).__name__}: {str(e)}") logger.critical(f"CRITICAL: Failed to get notification: {str(e)}")
raise e raise e
async def async_get(self, executor=None): async def async_get(self):
"""Async get - uses thread pool """Pure async get"""
Args:
executor: Optional ThreadPoolExecutor
"""
logger.trace(f"NotificationQueue.async_get() called, executor={executor}")
import asyncio
try: try:
loop = asyncio.get_event_loop() return await self.async_q.get()
item = await loop.run_in_executor(executor, lambda: self.get(block=True, timeout=1.0))
logger.trace(f"NotificationQueue.async_get() retrieved item: {item.get('uuid', 'unknown') if isinstance(item, dict) else 'unknown'}")
return item
except queue.Empty as e: except queue.Empty as e:
logger.trace(f"NotificationQueue.async_get() timed out - queue is empty")
raise e raise e
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to async get notification: {type(e).__name__}: {str(e)}") logger.critical(f"CRITICAL: Failed to async get notification: {str(e)}")
raise e raise e
def qsize(self) -> int: def qsize(self) -> int:
"""Get current queue size""" """Get current queue size"""
try: try:
with self._lock: return self.sync_q.qsize()
return self._notification_queue.qsize()
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to get notification queue size: {str(e)}") logger.critical(f"CRITICAL: Failed to get notification queue size: {str(e)}")
return 0 return 0
def empty(self) -> bool: def empty(self) -> bool:
"""Check if queue is empty""" """Check if queue is empty"""
return self.qsize() == 0 return self.qsize() == 0
def close(self): def close(self):
"""Close the queue""" """Close the janus queue"""
try: try:
# Nothing to close for threading.Queue self._janus_queue.close()
logger.debug("NotificationQueue closed successfully") logger.debug("NotificationQueue closed successfully")
except Exception as e: except Exception as e:
logger.critical(f"CRITICAL: Failed to close NotificationQueue: {str(e)}") logger.critical(f"CRITICAL: Failed to close NotificationQueue: {str(e)}")
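The `run_in_executor` approach on the `threading.Queue` side of this diff can be sketched as below. This is an assumption-labelled illustration rather than the project's implementation: the free function `async_get` and `demo` are hypothetical, and the sketch uses `asyncio.get_running_loop()` where the diff uses `asyncio.get_event_loop()`.

```python
import asyncio
import queue


async def async_get(q, timeout=1.0, executor=None):
    """Await a blocking queue.get() in a worker thread; queue.Empty propagates on timeout."""
    loop = asyncio.get_running_loop()
    # executor=None selects the loop's default ThreadPoolExecutor
    return await loop.run_in_executor(executor, lambda: q.get(block=True, timeout=timeout))


async def demo():
    q = queue.Queue()
    q.put({"uuid": "abc"})
    first = await async_get(q, timeout=0.5)
    try:
        await async_get(q, timeout=0.1)  # queue is now empty
        timed_out = False
    except queue.Empty:
        timed_out = True
    return first, timed_out


first, timed_out = asyncio.run(demo())
```

Each pending `async_get` occupies one executor thread while it waits, which is the trade-off the removed docstring raises for large worker counts.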
+2 -2
@@ -37,9 +37,9 @@ def register_watch_operation_handlers(socketio, datastore):
# Import here to avoid circular imports # Import here to avoid circular imports
from changedetectionio.flask_app import update_q from changedetectionio.flask_app import update_q
from changedetectionio import queuedWatchMetaData from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_pool from changedetectionio import worker_handler
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid})) worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
logger.info(f"Socket.IO: Queued recheck for watch {uuid}") logger.info(f"Socket.IO: Queued recheck for watch {uuid}")
else: else:
emit('operation_result', {'success': False, 'error': f'Unknown operation: {op}'}) emit('operation_result', {'success': False, 'error': f'Unknown operation: {op}'})
+4 -4
@@ -145,10 +145,10 @@ def handle_watch_update(socketio, **kwargs):
# Emit the watch update to all connected clients # Emit the watch update to all connected clients
from changedetectionio.flask_app import update_q from changedetectionio.flask_app import update_q
from changedetectionio.flask_app import _jinja2_filter_datetime from changedetectionio.flask_app import _jinja2_filter_datetime
from changedetectionio import worker_pool from changedetectionio import worker_handler
# Get list of watches that are currently running # Get list of watches that are currently running
running_uuids = worker_pool.get_running_uuids() running_uuids = worker_handler.get_running_uuids()
# Get list of watches in the queue (efficient single-lock method) # Get list of watches in the queue (efficient single-lock method)
queue_list = update_q.get_queued_uuids() queue_list = update_q.get_queued_uuids()
@@ -252,7 +252,7 @@ def init_socketio(app, datastore):
def event_checkbox_operations(data): def event_checkbox_operations(data):
from changedetectionio.blueprint.ui import _handle_operations from changedetectionio.blueprint.ui import _handle_operations
from changedetectionio import queuedWatchMetaData from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_pool from changedetectionio import worker_handler
from changedetectionio.flask_app import update_q, watch_check_update from changedetectionio.flask_app import update_q, watch_check_update
import threading import threading
@@ -268,7 +268,7 @@ def init_socketio(app, datastore):
uuids=data.get('uuids'), uuids=data.get('uuids'),
datastore=datastore, datastore=datastore,
extra_data=data.get('extra_data'), extra_data=data.get('extra_data'),
worker_pool=worker_pool, worker_handler=worker_handler,
update_q=update_q, update_q=update_q,
queuedWatchMetaData=queuedWatchMetaData, queuedWatchMetaData=queuedWatchMetaData,
watch_check_update=watch_check_update, watch_check_update=watch_check_update,
+2 -6
@@ -10,7 +10,6 @@
set -e set -e
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
rm tests/logs/* -f
# Since theres no curl installed lets roll with python3 # Since theres no curl installed lets roll with python3
check_sanity() { check_sanity() {
@@ -65,21 +64,18 @@ data_sanity_test
echo "-------------------- Running rest of tests in parallel -------------------------------" echo "-------------------- Running rest of tests in parallel -------------------------------"
# REMOVE_REQUESTS_OLD_SCREENSHOTS disabled so that we can write a screenshot and send it in test_notifications.py without a real browser # REMOVE_REQUESTS_OLD_SCREENSHOTS disabled so that we can write a screenshot and send it in test_notifications.py without a real browser
FETCH_WORKERS=2 REMOVE_REQUESTS_OLD_SCREENSHOTS=false \ REMOVE_REQUESTS_OLD_SCREENSHOTS=false \
pytest tests/test_*.py \ pytest tests/test_*.py \
-n 18 \ -n 30 \
--dist=load \ --dist=load \
-vvv \ -vvv \
-s \ -s \
--capture=no \ --capture=no \
-k "not test_queue_system" \
--log-cli-level=DEBUG \ --log-cli-level=DEBUG \
--log-cli-format="%(asctime)s [%(process)d] [%(levelname)s] %(name)s: %(message)s" --log-cli-format="%(asctime)s [%(process)d] [%(levelname)s] %(name)s: %(message)s"
echo "---------------------------- DONE parallel test ---------------------------------------" echo "---------------------------- DONE parallel test ---------------------------------------"
FETCH_WORKERS=20 pytest -vvv -s tests/test_queue_handler.py
echo "RUNNING WITH BASE_URL SET" echo "RUNNING WITH BASE_URL SET"
# Now re-run some tests with BASE_URL enabled # Now re-run some tests with BASE_URL enabled
@@ -222,19 +222,6 @@ code {
color: var(--color-white); color: var(--color-white);
background: var(--color-text-watch-tag-list); background: var(--color-text-watch-tag-list);
@extend .inline-tag; @extend .inline-tag;
/* Remove default anchor styling when used as links */
text-decoration: none;
&:hover {
text-decoration: none;
opacity: 0.8;
cursor: pointer;
}
&:visited {
color: var(--color-white);
}
} }
@media (min-width: 768px) { @media (min-width: 768px) {
+11 -29
@@ -166,9 +166,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
""" """
logger.info(f"Datastore path is '{datastore_path}'") logger.info(f"Datastore path is '{datastore_path}'")
# CRITICAL: Update datastore_path (was using old path from __init__)
self.datastore_path = datastore_path
# Initialize data structure # Initialize data structure
self.__data = App.model() self.__data = App.model()
self.json_store_path = os.path.join(self.datastore_path, "changedetection.json") self.json_store_path = os.path.join(self.datastore_path, "changedetection.json")
@@ -221,16 +218,21 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# Load the legacy datastore to get its schema_version # Load the legacy datastore to get its schema_version
from .legacy_loader import load_legacy_format from .legacy_loader import load_legacy_format
legacy_path = os.path.join(self.datastore_path, "url-watches.json") legacy_path = os.path.join(self.datastore_path, "url-watches.json")
with open(legacy_path) as f: legacy_data = load_legacy_format(legacy_path)
self.__data = json.load(f)
if not self.__data: if not legacy_data:
raise Exception("Failed to load legacy datastore from url-watches.json") raise Exception("Failed to load legacy datastore from url-watches.json")
# Get the schema version from legacy datastore (defaults to 0 if not present)
legacy_schema_version = legacy_data.get('settings', {}).get('application', {}).get('schema_version', 0)
logger.info(f"Legacy datastore schema version: {legacy_schema_version}")
# Set our schema version to match the legacy one
self.__data['settings']['application']['schema_version'] = legacy_schema_version
# update_26 will load the legacy data again and migrate to new format # update_26 will load the legacy data again and migrate to new format
# Only run updates AFTER the legacy schema version (e.g., if legacy is at 25, only run 26+) # Only run updates AFTER the legacy schema version (e.g., if legacy is at 25, only run 26+)
self.run_updates() self.run_updates(current_schema_version=legacy_schema_version)
else: else:
# Fresh install - create new datastore # Fresh install - create new datastore
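Review note: the hunk above passes the legacy schema version into `run_updates()` so that only later updates run (for example, legacy at 25 means only 26 and up). A minimal sketch of that gating, assuming updates are discovered as numbered `update_N` methods. The `UpdateRunner` class, its three dummy updates, and the `applied` list are hypothetical stand-ins for illustration, not the project's actual implementation.

```python
import re


class UpdateRunner:
    """Hypothetical stand-in for a datastore with numbered update_N methods."""

    def __init__(self, schema_version=0):
        self.schema_version = schema_version
        self.applied = []  # records which updates ran, for illustration only

    def update_25(self):
        self.applied.append(25)

    def update_26(self):
        self.applied.append(26)

    def update_27(self):
        self.applied.append(27)

    def get_updates_available(self):
        # Collect the N from every callable attribute named update_N, in order.
        numbers = []
        for name in dir(self):
            match = re.fullmatch(r"update_(\d+)", name)
            if match and callable(getattr(self, name)):
                numbers.append(int(match.group(1)))
        return sorted(numbers)

    def run_updates(self, current_schema_version=None):
        # Fall back to the stored version when the caller does not override it.
        if current_schema_version is None:
            current_schema_version = self.schema_version
        for number in self.get_updates_available():
            if number > current_schema_version:
                getattr(self, f"update_{number}")()
                # Advance the version only after the update succeeded.
                self.schema_version = number


runner = UpdateRunner()
runner.run_updates(current_schema_version=25)
```

Bumping the version after each successful step means a failure partway through leaves the store at the last update that completed, so a retry resumes from there.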
@@ -305,7 +307,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
else: else:
watch_class = get_custom_watch_obj_for_processor(entity.get('processor')) watch_class = get_custom_watch_obj_for_processor(entity.get('processor'))
if entity.get('processor') != 'text_json_diff': if entity.get('uuid') != 'text_json_diff':
logger.trace(f"Loading Watch object '{watch_class.__module__}.{watch_class.__name__}' for UUID {uuid}") logger.trace(f"Loading Watch object '{watch_class.__module__}.{watch_class.__name__}' for UUID {uuid}")
entity = watch_class(datastore_path=self.datastore_path, default=entity) entity = watch_class(datastore_path=self.datastore_path, default=entity)
@@ -371,13 +373,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
self.__data['watching'] = watching self.__data['watching'] = watching
self._watch_hashes = watch_hashes self._watch_hashes = watch_hashes
# Verify all watches have hashes
missing_hashes = [uuid for uuid in watching.keys() if uuid not in watch_hashes]
if missing_hashes:
logger.error(f"WARNING: {len(missing_hashes)} watches missing hashes after load: {missing_hashes[:5]}")
else:
logger.debug(f"All {len(watching)} watches have valid hashes")
def _delete_watch(self, uuid): def _delete_watch(self, uuid):
""" """
Delete a watch from storage. Delete a watch from storage.
@@ -607,19 +602,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
return None return None
# Check PAGE_WATCH_LIMIT if set
page_watch_limit = os.getenv('PAGE_WATCH_LIMIT')
if page_watch_limit:
try:
page_watch_limit = int(page_watch_limit)
current_watch_count = len(self.__data['watching'])
if current_watch_count >= page_watch_limit:
logger.error(f"Watch limit reached: {current_watch_count}/{page_watch_limit} watches. Cannot add {url}")
flash(gettext("Watch limit reached ({}/{} watches). Cannot add more watches.").format(current_watch_count, page_watch_limit), 'error')
return None
except ValueError:
logger.warning(f"Invalid PAGE_WATCH_LIMIT value: {page_watch_limit}, ignoring limit check")
if tag and type(tag) == str: if tag and type(tag) == str:
# Then it's probably a string of the actual tag by name, split and add it # Then it's probably a string of the actual tag by name, split and add it
for t in tag.split(','): for t in tag.split(','):
@@ -16,11 +16,11 @@ import os
import tempfile import tempfile
import time import time
from concurrent.futures import ThreadPoolExecutor, as_completed from concurrent.futures import ThreadPoolExecutor, as_completed
from distutils.util import strtobool
from threading import Thread from threading import Thread
from loguru import logger from loguru import logger
from .base import DataStore from .base import DataStore
from .. import strtobool
# Try to import orjson for faster JSON serialization # Try to import orjson for faster JSON serialization
try: try:
@@ -322,9 +322,8 @@ def load_all_watches(datastore_path, rehydrate_entity_func, compute_hash_func):
watch, raw_data = load_watch_from_file(watch_json, uuid_dir, rehydrate_entity_func) watch, raw_data = load_watch_from_file(watch_json, uuid_dir, rehydrate_entity_func)
if watch and raw_data: if watch and raw_data:
watching[uuid_dir] = watch watching[uuid_dir] = watch
# Compute hash from rehydrated Watch object (as dict) to match how we compute on save # Compute hash from raw data BEFORE rehydration to match saved hash
# This ensures hash matches what audit will compute from dict(watch) watch_hashes[uuid_dir] = compute_hash_func(raw_data)
watch_hashes[uuid_dir] = compute_hash_func(dict(watch))
loaded += 1 loaded += 1
if loaded % 100 == 0: if loaded % 100 == 0:
@@ -744,7 +743,7 @@ class FileSavingDataStore(DataStore):
self._dirty_watches.add(uuid) self._dirty_watches.add(uuid)
changes_found += 1 changes_found += 1
logger.warning( logger.warning(
f"Audit detected unmarked change in watch {uuid[:8]}... current {current_hash:8} stored hash {stored_hash[:8]}" f"Audit detected unmarked change in watch {uuid[:8]}... "
f"(hash changed but not marked dirty)" f"(hash changed but not marked dirty)"
) )
self.needs_write = True self.needs_write = True
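Review note: the two hunks above depend on the load-time hash and the audit-time hash being computed over the same representation of a watch, otherwise every watch is reported as an "unmarked change". A minimal sketch of hash-based dirty detection, assuming a SHA-256 over key-sorted JSON. The functions `compute_hash` and `audit` here are hypothetical, and the project's real `_compute_hash` may differ.

```python
import hashlib
import json


def compute_hash(watch_dict):
    # Sort keys so the digest does not depend on dict insertion order.
    payload = json.dumps(watch_dict, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def audit(watches, stored_hashes):
    # Return the UUIDs whose current content no longer matches the stored hash.
    dirty = set()
    for uuid, watch in watches.items():
        if compute_hash(dict(watch)) != stored_hashes.get(uuid):
            dirty.add(uuid)
    return dirty


watches = {"uuid-1": {"url": "https://example.com", "paused": False}}
stored = {uuid: compute_hash(w) for uuid, w in watches.items()}

clean = audit(watches, stored)       # nothing changed yet
watches["uuid-1"]["paused"] = True   # mutate without marking dirty
changed = audit(watches, stored)     # the audit picks it up
```

If rehydration adds default fields that the raw file lacks, hashing the raw data at load and `dict(watch)` at audit would disagree, which is why the choice of input in `load_all_watches` matters.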
+25 -21
@@ -534,7 +534,7 @@ class DatastoreUpdatesMixin:
logger.debug(f"Renaming history index {old_history_txt} to {new_history_txt}...") logger.debug(f"Renaming history index {old_history_txt} to {new_history_txt}...")
shutil.move(old_history_txt, new_history_txt) shutil.move(old_history_txt, new_history_txt)
def migrate_legacy_db_format(self): def update_26(self):
""" """
Migration: Individual watch persistence (COPY-based, safe rollback). Migration: Individual watch persistence (COPY-based, safe rollback).
@@ -578,6 +578,25 @@ class DatastoreUpdatesMixin:
# Populate settings from legacy data # Populate settings from legacy data
logger.info("Populating settings from legacy data...") logger.info("Populating settings from legacy data...")
if 'settings' in legacy_data:
self.data['settings'] = legacy_data['settings']
if 'app_guid' in legacy_data:
self.data['app_guid'] = legacy_data['app_guid']
if 'build_sha' in legacy_data:
self.data['build_sha'] = legacy_data['build_sha']
if 'version_tag' in legacy_data:
self.data['version_tag'] = legacy_data['version_tag']
# Rehydrate watches from legacy data
logger.info("Rehydrating watches from legacy data...")
self.data['watching'] = {}
for uuid, watch_data in legacy_data.get('watching', {}).items():
try:
self.data['watching'][uuid] = self.rehydrate_entity(uuid, watch_data)
except Exception as e:
logger.error(f"Failed to rehydrate watch {uuid}: {e}")
raise Exception(f"Migration failed: Could not rehydrate watch {uuid}. Error: {e}")
watch_count = len(self.data['watching']) watch_count = len(self.data['watching'])
logger.success(f"Loaded {watch_count} watches from legacy format") logger.success(f"Loaded {watch_count} watches from legacy format")
@@ -590,10 +609,12 @@ class DatastoreUpdatesMixin:
watch_dict = dict(watch) watch_dict = dict(watch)
watch_dir = os.path.join(self.datastore_path, uuid) watch_dir = os.path.join(self.datastore_path, uuid)
save_watch_atomic(watch_dir, uuid, watch_dict) save_watch_atomic(watch_dir, uuid, watch_dict)
# Initialize hash
self._watch_hashes[uuid] = self._compute_hash(watch_dict)
saved_count += 1 saved_count += 1
if saved_count % 100 == 0: if saved_count % 100 == 0:
logger.info(f" Progress: {saved_count}/{watch_count} watches migrated...") logger.info(f" Progress: {saved_count}/{watch_count} watches saved...")
except Exception as e: except Exception as e:
logger.error(f"Failed to save watch {uuid}: {e}") logger.error(f"Failed to save watch {uuid}: {e}")
@@ -646,25 +667,9 @@ class DatastoreUpdatesMixin:
# Success! Now reload from new format # Success! Now reload from new format
logger.critical("Reloading datastore from new format...") logger.critical("Reloading datastore from new format...")
self._load_state() # Includes load_watches self._load_state()
logger.success("Datastore reloaded from new format successfully") logger.success("Datastore reloaded from new format successfully")
# Verify all watches have hashes after migration
missing_hashes = [uuid for uuid in self.data['watching'].keys() if uuid not in self._watch_hashes]
if missing_hashes:
logger.error(f"WARNING: {len(missing_hashes)} watches missing hashes after migration: {missing_hashes[:5]}")
else:
logger.success(f"All {len(self.data['watching'])} watches have valid hashes after migration")
# Set schema version to latest available update
# This prevents re-running updates and re-marking all watches as dirty
updates_available = self.get_updates_available()
latest_schema = updates_available[-1] if updates_available else 26
self.data['settings']['application']['schema_version'] = latest_schema
self.mark_settings_dirty()
logger.info(f"Set schema_version to {latest_schema} (migration complete, all watches already saved)")
logger.critical("=" * 80) logger.critical("=" * 80)
logger.critical("MIGRATION COMPLETED SUCCESSFULLY!") logger.critical("MIGRATION COMPLETED SUCCESSFULLY!")
logger.critical("=" * 80) logger.critical("=" * 80)
@@ -682,5 +687,4 @@ class DatastoreUpdatesMixin:
logger.info(f" - rm {os.path.join(self.datastore_path, 'url-watches.json')}") logger.info(f" - rm {os.path.join(self.datastore_path, 'url-watches.json')}")
logger.info("") logger.info("")
def update_26(self): # Schema version will be updated by run_updates()
self.migrate_legacy_db_format()
+21 -43
@@ -70,8 +70,8 @@ test_single_url() {
local test_id=$1 local test_id=$1
local dir="/tmp/cli-test-single-${test_id}-$$" local dir="/tmp/cli-test-single-${test_id}-$$"
timeout 10 python3 changedetection.py -d "$dir" -C -u https://example.com -b &>/dev/null timeout 10 python3 changedetection.py -d "$dir" -C -u https://example.com -b &>/dev/null
# Count watch directories (UUID directories containing watch.json) [ -f "$dir/url-watches.json" ] && \
[ "$(find "$dir" -mindepth 2 -maxdepth 2 -name 'watch.json' | wc -l)" -eq 1 ] [ "$(python3 -c "import json; print(len(json.load(open('$dir/url-watches.json')).get('watching', {})))")" -eq 1 ]
} }
test_multiple_urls() { test_multiple_urls() {
@@ -82,8 +82,8 @@ test_multiple_urls() {
-u https://github.com \ -u https://github.com \
-u https://httpbin.org \ -u https://httpbin.org \
-b &>/dev/null -b &>/dev/null
# Count watch directories (UUID directories containing watch.json) [ -f "$dir/url-watches.json" ] && \
[ "$(find "$dir" -mindepth 2 -maxdepth 2 -name 'watch.json' | wc -l)" -eq 3 ] [ "$(python3 -c "import json; print(len(json.load(open('$dir/url-watches.json')).get('watching', {})))")" -eq 3 ]
} }
test_url_with_options() { test_url_with_options() {
@@ -93,17 +93,8 @@ test_url_with_options() {
-u https://example.com \ -u https://example.com \
-u0 '{"title":"Test Site","processor":"text_json_diff"}' \ -u0 '{"title":"Test Site","processor":"text_json_diff"}' \
-b &>/dev/null -b &>/dev/null
# Check that at least one watch.json contains the title "Test Site" [ -f "$dir/url-watches.json" ] && \
python3 -c " python3 -c "import json; data=json.load(open('$dir/url-watches.json')); watches=data.get('watching', {}); exit(0 if any(w.get('title')=='Test Site' for w in watches.values()) else 1)"
import json, glob, sys
watch_files = glob.glob('$dir/*/watch.json')
for wf in watch_files:
with open(wf) as f:
data = json.load(f)
if data.get('title') == 'Test Site':
sys.exit(0)
sys.exit(1)
"
} }
test_multiple_urls_with_options() { test_multiple_urls_with_options() {
@@ -115,19 +106,9 @@ test_multiple_urls_with_options() {
-u https://github.com \ -u https://github.com \
-u1 '{"title":"Site Two"}' \ -u1 '{"title":"Site Two"}' \
-b &>/dev/null -b &>/dev/null
# Check that we have 2 watches and both titles are present [ -f "$dir/url-watches.json" ] && \
python3 -c " [ "$(python3 -c "import json; print(len(json.load(open('$dir/url-watches.json')).get('watching', {})))")" -eq 2 ] && \
import json, glob, sys python3 -c "import json; data=json.load(open('$dir/url-watches.json')); watches=data.get('watching', {}); titles=[w.get('title') for w in watches.values()]; exit(0 if 'Site One' in titles and 'Site Two' in titles else 1)"
watch_files = glob.glob('$dir/*/watch.json')
if len(watch_files) != 2:
sys.exit(1)
titles = []
for wf in watch_files:
with open(wf) as f:
data = json.load(f)
titles.append(data.get('title'))
sys.exit(0 if 'Site One' in titles and 'Site Two' in titles else 1)
"
} }
test_batch_mode_exit() { test_batch_mode_exit() {
@@ -145,24 +126,21 @@ test_batch_mode_exit() {
test_recheck_all() { test_recheck_all() {
local test_id=$1 local test_id=$1
local dir="/tmp/cli-test-recheck-all-${test_id}-$$" local dir="/tmp/cli-test-recheck-all-${test_id}-$$"
# Create a watch using CLI, then recheck it mkdir -p "$dir"
timeout 10 python3 changedetection.py -d "$dir" -C -u https://example.com -b &>/dev/null cat > "$dir/url-watches.json" << 'EOF'
# Now recheck all watches {"watching":{"test-uuid":{"url":"https://example.com","last_checked":0,"processor":"text_json_diff","uuid":"test-uuid"}},"settings":{"application":{"password":false}}}
timeout 10 python3 changedetection.py -d "$dir" -r all -b 2>&1 | grep -q "Queuing" EOF
timeout 10 python3 changedetection.py -d "$dir" -r all -b 2>&1 | grep -q "Queuing all"
} }
test_recheck_specific() { test_recheck_specific() {
local test_id=$1 local test_id=$1
local dir="/tmp/cli-test-recheck-uuid-${test_id}-$$" local dir="/tmp/cli-test-recheck-uuid-${test_id}-$$"
# Create 2 watches using CLI mkdir -p "$dir"
timeout 12 python3 changedetection.py -d "$dir" -C \ cat > "$dir/url-watches.json" << 'EOF'
-u https://example.com \ {"watching":{"uuid-1":{"url":"https://example.com","last_checked":0,"processor":"text_json_diff","uuid":"uuid-1"},"uuid-2":{"url":"https://github.com","last_checked":0,"processor":"text_json_diff","uuid":"uuid-2"}},"settings":{"application":{"password":false}}}
-u https://github.com \ EOF
-b &>/dev/null timeout 10 python3 changedetection.py -d "$dir" -r uuid-1,uuid-2 -b 2>&1 | grep -q "Queuing 2 specific watches"
# Get the UUIDs that were created
local uuids=$(find "$dir" -mindepth 2 -maxdepth 2 -name 'watch.json' -exec dirname {} \; | xargs -n1 basename | tr '\n' ',' | sed 's/,$//')
# Now recheck specific UUIDs
timeout 10 python3 changedetection.py -d "$dir" -r "$uuids" -b 2>&1 | grep -q "Queuing"
} }
test_combined_operations() { test_combined_operations() {
@@ -173,8 +151,8 @@ test_combined_operations() {
-u https://github.com \ -u https://github.com \
-r all \ -r all \
-b &>/dev/null -b &>/dev/null
# Count watch directories (UUID directories containing watch.json) [ -f "$dir/url-watches.json" ] && \
[ "$(find "$dir" -mindepth 2 -maxdepth 2 -name 'watch.json' | wc -l)" -eq 2 ] [ "$(python3 -c "import json; print(len(json.load(open('$dir/url-watches.json')).get('watching', {})))")" -eq 2 ]
} }
test_invalid_json() { test_invalid_json() {
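Review note: the CLI tests in this file count watches in one of two layouts, either one `watch.json` per UUID directory or a single `url-watches.json` with a `watching` map. A minimal sketch of a helper that handles both, assuming exactly those two layouts as they appear in the diff. The function `count_watches` is hypothetical and is not part of the test script.

```python
import json
import os
import tempfile


def count_watches(datastore_dir):
    # Per-watch layout: <datastore>/<uuid>/watch.json
    per_watch = [
        entry for entry in os.listdir(datastore_dir)
        if os.path.isfile(os.path.join(datastore_dir, entry, "watch.json"))
    ]
    if per_watch:
        return len(per_watch)
    # Single-file layout: <datastore>/url-watches.json with a "watching" map
    legacy = os.path.join(datastore_dir, "url-watches.json")
    if os.path.isfile(legacy):
        with open(legacy) as f:
            return len(json.load(f).get("watching", {}))
    return 0


with tempfile.TemporaryDirectory() as new_dir, tempfile.TemporaryDirectory() as old_dir:
    # Build a per-watch datastore with two watches.
    for uuid in ("uuid-1", "uuid-2"):
        os.makedirs(os.path.join(new_dir, uuid))
        with open(os.path.join(new_dir, uuid, "watch.json"), "w") as f:
            json.dump({"url": "https://example.com"}, f)
    # Build a single-file datastore with three watches.
    with open(os.path.join(old_dir, "url-watches.json"), "w") as f:
        json.dump({"watching": {"uuid-1": {}, "uuid-2": {}, "uuid-3": {}}}, f)
    new_count = count_watches(new_dir)
    old_count = count_watches(old_dir)
```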
+3 -145
@@ -9,11 +9,6 @@ from changedetectionio import store
import os import os
import sys import sys
# CRITICAL: Set short timeout for tests to prevent 45-second hangs
# When test server is slow/unresponsive, workers fail fast instead of holding UUIDs for 45s
# This prevents exponential priority growth from repeated deferrals (priority × 10 each defer)
os.environ['DEFAULT_SETTINGS_REQUESTS_TIMEOUT'] = '5'
from changedetectionio.flask_app import init_app_secret, changedetection_app from changedetectionio.flask_app import init_app_secret, changedetection_app
from changedetectionio.tests.util import live_server_setup, new_live_server_setup from changedetectionio.tests.util import live_server_setup, new_live_server_setup
@@ -34,93 +29,6 @@ def reportlog(pytestconfig):
logger.remove(handler_id) logger.remove(handler_id)
@pytest.fixture(autouse=True)
def per_test_log_file(request):
"""Create a separate log file for each test function with pytest output."""
import re
# Create logs directory if it doesn't exist
log_dir = os.path.join(os.path.dirname(__file__), "logs")
os.makedirs(log_dir, exist_ok=True)
# Generate log filename from test name and worker ID (for parallel runs)
test_name = request.node.name
# Sanitize test name - replace unsafe characters with underscores
# Keep only alphanumeric, dash, underscore, and period
safe_test_name = re.sub(r'[^\w\-.]', '_', test_name)
# Limit length to avoid filesystem issues (max 200 chars)
if len(safe_test_name) > 200:
# Keep first 150 chars + hash of full name + last 30 chars
import hashlib
name_hash = hashlib.md5(test_name.encode()).hexdigest()[:8]
safe_test_name = f"{safe_test_name[:150]}_{name_hash}_{safe_test_name[-30:]}"
worker_id = os.environ.get('PYTEST_XDIST_WORKER', 'master')
log_file = os.path.join(log_dir, f"{safe_test_name}_{worker_id}.log")
# Add file handler for this test with TRACE level
handler_id = logger.add(
log_file,
format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | {process} | {name}:{function}:{line} - {message}",
level="TRACE",
mode="w", # Overwrite if exists
enqueue=True # Thread-safe
)
logger.info(f"=== Starting test: {test_name} (worker: {worker_id}) ===")
logger.info(f"Test location: {request.node.nodeid}")
yield
# Capture test outcome (PASSED/FAILED/SKIPPED/ERROR)
outcome = "UNKNOWN"
exc_info = None
stdout = None
stderr = None
if hasattr(request.node, 'rep_call'):
outcome = request.node.rep_call.outcome.upper()
if request.node.rep_call.failed:
exc_info = request.node.rep_call.longreprtext
# Capture stdout/stderr from call phase
if hasattr(request.node.rep_call, 'sections'):
for section_name, section_content in request.node.rep_call.sections:
if 'stdout' in section_name.lower():
stdout = section_content
elif 'stderr' in section_name.lower():
stderr = section_content
elif hasattr(request.node, 'rep_setup'):
if request.node.rep_setup.failed:
outcome = "SETUP_FAILED"
exc_info = request.node.rep_setup.longreprtext
logger.info(f"=== Test Result: {outcome} ===")
if exc_info:
logger.error(f"=== Test Failure Details ===\n{exc_info}")
if stdout:
logger.info(f"=== Captured stdout ===\n{stdout}")
if stderr:
logger.warning(f"=== Captured stderr ===\n{stderr}")
logger.info(f"=== Finished test: {test_name} ===")
logger.remove(handler_id)
@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
"""Hook to capture test results and attach to the test node."""
outcome = yield
rep = outcome.get_result()
# Store report on the test node for access in fixtures
setattr(item, f"rep_{rep.when}", rep)
@pytest.fixture @pytest.fixture
def environment(mocker): def environment(mocker):
"""Mock arrow.now() to return a fixed datetime for testing jinja2 time extension.""" """Mock arrow.now() to return a fixed datetime for testing jinja2 time extension."""
@@ -257,57 +165,6 @@ def prepare_test_function(live_server, datastore_path):
except: except:
break break
# Add test helper methods to the app for worker management
def set_workers(count):
"""Set the number of workers for testing - brutal shutdown, no delays"""
from changedetectionio import worker_pool
from changedetectionio.flask_app import update_q, notification_q
current_count = worker_pool.get_worker_count()
# Special case: Setting to 0 means shutdown all workers brutally
if count == 0:
logger.debug(f"Brutally shutting down all {current_count} workers")
worker_pool.shutdown_workers()
return {
'status': 'success',
'message': f'Shutdown all {current_count} workers',
'previous_count': current_count,
'current_count': 0
}
# Adjust worker count (no delays, no verification)
result = worker_pool.adjust_async_worker_count(
count,
update_q=update_q,
notification_q=notification_q,
app=live_server.app,
datastore=datastore
)
return result
def check_all_workers_alive(expected_count):
"""Check that all expected workers are alive"""
from changedetectionio import worker_pool
from changedetectionio.flask_app import update_q, notification_q
result = worker_pool.check_worker_health(
expected_count,
update_q=update_q,
notification_q=notification_q,
app=live_server.app,
datastore=datastore
)
assert result['status'] == 'healthy', f"Workers not healthy: {result['message']}"
return result
# Attach helper methods to app for easy test access
live_server.app.set_workers = set_workers
live_server.app.check_all_workers_alive = check_all_workers_alive
# Prevent background thread from writing during cleanup/reload # Prevent background thread from writing during cleanup/reload
datastore.needs_write = False datastore.needs_write = False
datastore.needs_write_urgent = False datastore.needs_write_urgent = False
@@ -405,8 +262,8 @@ def app(request, datastore_path):
# Shutdown workers gracefully before loguru cleanup # Shutdown workers gracefully before loguru cleanup
try: try:
from changedetectionio import worker_pool from changedetectionio import worker_handler
worker_pool.shutdown_workers() worker_handler.shutdown_workers()
except Exception: except Exception:
pass pass
@@ -454,3 +311,4 @@ def app(request, datastore_path):
yield app yield app
@@ -1,41 +0,0 @@
import time
from flask import url_for
from changedetectionio.tests.util import wait_for_all_checks
def test_check_plugin_processor(client, live_server, measure_memory_usage, datastore_path):
# requires os-int intelligence plugin installed (first basic one we test with)
res = client.get(url_for("watchlist.index"))
assert b'OSINT Reconnaissance' in res.data, "Must have the OSINT plugin installed at test time"
assert b'<input checked id="processor-0" name="processor" type="radio" value="text_json_diff">' in res.data, "But the first text_json_diff processor should always be selected by default in quick watch form"
res = client.post(
url_for("ui.ui_views.form_quick_watch_add"),
data={"url": 'http://127.0.0.1', "tags": '', 'processor': 'osint_recon'},
follow_redirects=True
)
assert b"Watch added" in res.data
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(
url_for("ui.ui_preview.preview_page", uuid="first"),
follow_redirects=True
)
assert b'Target: http://127.0.0.1' in res.data
assert b'DNSKEY Records' in res.data
wait_for_all_checks(client)
# Now change it to something that doesnt exist
uuid = next(iter(live_server.app.config['DATASTORE'].data['watching']))
live_server.app.config['DATASTORE'].data['watching'][uuid]['processor'] = "now_missing"
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for("watchlist.index"))
assert b"Exception: Processor module" in res.data and b'now_missing' in res.data, f'Should register that the plugin is missing for {uuid}'
@@ -1,166 +0,0 @@
#!/usr/bin/env python3
"""
Test notification_urls validation in Watch and Tag API endpoints.
Ensures that invalid AppRise URLs are rejected when setting notification_urls.
Valid AppRise notification URLs use specific protocols like:
- posts://example.com - POST to HTTP endpoint
- gets://example.com - GET to HTTP endpoint
- mailto://user@example.com - Email
- slack://token/channel - Slack
- discord://webhook_id/webhook_token - Discord
- etc.
Invalid notification URLs:
- https://example.com - Plain HTTPS is NOT a valid AppRise notification protocol
- ftp://example.com - FTP is NOT a valid AppRise notification protocol
- Plain URLs without proper AppRise protocol prefix
"""
from flask import url_for
import json
def test_watch_notification_urls_validation(client, live_server, measure_memory_usage, datastore_path):
"""Test that Watch PUT/POST endpoints validate notification_urls."""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
# Test 1: Create a watch with valid notification URLs
valid_urls = ["posts://example.com/notify1", "posts://example.com/notify2"]
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example.com",
"notification_urls": valid_urls
}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 201, "Should accept valid notification URLs on watch creation"
watch_uuid = res.json['uuid']
# Verify the notification URLs were saved
res = client.get(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key}
)
assert res.status_code == 200
assert set(res.json['notification_urls']) == set(valid_urls), "Valid notification URLs should be saved"
# Test 2: Try to create a watch with invalid notification URLs (https:// is not valid)
invalid_urls = ["https://example.com/webhook"]
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example.com",
"notification_urls": invalid_urls
}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject https:// notification URLs (not a valid AppRise protocol)"
assert b"is not a valid AppRise URL" in res.data, "Should provide AppRise validation error message"
# Test 2b: Also test other invalid protocols
invalid_urls_ftp = ["ftp://not-apprise-url"]
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example.com",
"notification_urls": invalid_urls_ftp
}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject ftp:// notification URLs"
assert b"is not a valid AppRise URL" in res.data, "Should provide AppRise validation error message"
# Test 3: Update watch with valid notification URLs
new_valid_urls = ["posts://newserver.com"]
res = client.put(
url_for("watch", uuid=watch_uuid),
data=json.dumps({"notification_urls": new_valid_urls}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 200, "Should accept valid notification URLs on watch update"
# Verify the notification URLs were updated
res = client.get(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key}
)
assert res.status_code == 200
assert res.json['notification_urls'] == new_valid_urls, "Valid notification URLs should be updated"
# Test 4: Try to update watch with invalid notification URLs (plain https:// not valid)
invalid_https_url = ["https://example.com/webhook"]
res = client.put(
url_for("watch", uuid=watch_uuid),
data=json.dumps({"notification_urls": invalid_https_url}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject https:// notification URLs on watch update"
assert b"is not a valid AppRise URL" in res.data, "Should provide AppRise validation error message"
# Test 5: Update watch with non-list notification_urls (caught by OpenAPI schema validation)
res = client.put(
url_for("watch", uuid=watch_uuid),
data=json.dumps({"notification_urls": "not-a-list"}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject non-list notification_urls"
assert b"OpenAPI validation failed" in res.data or b"Request body validation error" in res.data
# Test 6: Verify original URLs are preserved after failed update
res = client.get(
url_for("watch", uuid=watch_uuid),
headers={'x-api-key': api_key}
)
assert res.status_code == 200
assert res.json['notification_urls'] == new_valid_urls, "URLs should remain unchanged after validation failure"
def test_tag_notification_urls_validation(client, live_server, measure_memory_usage, datastore_path):
"""Test that Tag PUT endpoint validates notification_urls."""
from changedetectionio.model import Tag
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
datastore = live_server.app.config['DATASTORE']
# Create a tag
tag_uuid = datastore.add_tag(title="Test Tag")
assert tag_uuid is not None
# Test 1: Update tag with valid notification URLs
valid_urls = ["posts://example.com/tag-notify"]
res = client.put(
url_for("tag", uuid=tag_uuid),
data=json.dumps({"notification_urls": valid_urls}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 200, "Should accept valid notification URLs on tag update"
# Verify the notification URLs were saved
tag = datastore.data['settings']['application']['tags'][tag_uuid]
assert tag['notification_urls'] == valid_urls, "Valid notification URLs should be saved to tag"
# Test 2: Try to update tag with invalid notification URLs (https:// not valid)
invalid_urls = ["https://example.com/webhook"]
res = client.put(
url_for("tag", uuid=tag_uuid),
data=json.dumps({"notification_urls": invalid_urls}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject https:// notification URLs on tag update"
assert b"is not a valid AppRise URL" in res.data, "Should provide AppRise validation error message"
# Test 3: Update tag with non-list notification_urls (caught by OpenAPI schema validation)
res = client.put(
url_for("tag", uuid=tag_uuid),
data=json.dumps({"notification_urls": "not-a-list"}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject non-list notification_urls"
assert b"OpenAPI validation failed" in res.data or b"Request body validation error" in res.data
# Test 4: Verify original URLs are preserved after failed update
tag = datastore.data['settings']['application']['tags'][tag_uuid]
assert tag['notification_urls'] == valid_urls, "URLs should remain unchanged after validation failure"
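The watch and tag tests above exercise two rejection paths: a `notification_urls` value that is not a list, and a plain `https://` URL. A minimal sketch of that two-stage check follows. The `validate_notification_urls` helper and the fixed `REJECTED_SCHEMES` set are hypothetical names used only for illustration; the project delegates the second stage to AppRise, which this sketch does not call.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in: schemes the tests above expect to be rejected.
# The real decision is made by AppRise, not by a fixed set like this one.
REJECTED_SCHEMES = {"http", "https"}


def validate_notification_urls(value):
    """Return a list of error strings; an empty list means the value passed."""
    # Stage 1: shape check (the tests attribute this to OpenAPI schema validation).
    if not isinstance(value, list):
        return ["notification_urls must be a list"]

    errors = []
    for url in value:
        # Stage 2: per-URL check, standing in for the AppRise lookup.
        if not isinstance(url, str):
            errors.append(f"{url!r} is not a string")
            continue
        scheme = urlsplit(url).scheme.lower()
        if not scheme or scheme in REJECTED_SCHEMES:
            errors.append(f"{url} is not a valid AppRise URL")
    return errors
```

Returning a list of errors rather than raising on the first failure lets a caller report every bad URL in a single 400 response, and an empty result means the stored URLs can be replaced safely.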
@@ -2,7 +2,7 @@
import time import time
from flask import url_for from flask import url_for
from .util import live_server_setup, extract_UUID_from_client, wait_for_all_checks, delete_all_watches from .util import live_server_setup, extract_UUID_from_client, wait_for_all_checks
import os import os
@@ -116,7 +116,7 @@ def test_check_ldjson_price_autodetect(client, live_server, measure_memory_usage
# And not this cause its not the ld-json # And not this cause its not the ld-json
assert b"So let's see what happens" not in res.data assert b"So let's see what happens" not in res.data
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
########################################################################################## ##########################################################################################
# And we shouldnt see the offer # And we shouldnt see the offer
@@ -131,7 +131,7 @@ def test_check_ldjson_price_autodetect(client, live_server, measure_memory_usage
assert b'ldjson-price-track-offer' not in res.data assert b'ldjson-price-track-offer' not in res.data
########################################################################################## ##########################################################################################
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def _test_runner_check_bad_format_ignored(live_server, client, has_ldjson_price_data): def _test_runner_check_bad_format_ignored(live_server, client, has_ldjson_price_data):
@@ -147,7 +147,7 @@ def _test_runner_check_bad_format_ignored(live_server, client, has_ldjson_price_
########################################################################################## ##########################################################################################
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_bad_ldjson_is_correctly_ignored(client, live_server, measure_memory_usage, datastore_path): def test_bad_ldjson_is_correctly_ignored(client, live_server, measure_memory_usage, datastore_path):
+1 -1
@@ -414,4 +414,4 @@ def test_plaintext_even_if_xml_content_and_can_apply_filters(client, live_server
assert b'Abonnementen bijwerken' in res.data assert b'Abonnementen bijwerken' in res.data
assert b'&lt;foobar' not in res.data assert b'&lt;foobar' not in res.data
res = delete_all_watches(client) res = client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
@@ -6,7 +6,7 @@ from .util import (
set_original_response, set_original_response,
set_modified_response, set_modified_response,
live_server_setup, live_server_setup,
wait_for_all_checks, delete_all_watches wait_for_all_checks
) )
from loguru import logger from loguru import logger
@@ -104,7 +104,7 @@ def run_socketio_watch_update_test(client, live_server, password_mode="", datast
assert watch.has_unviewed, "The watch was not marked as unviewed after content change" assert watch.has_unviewed, "The watch was not marked as unviewed after content change"
# Clean up # Clean up
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_everything(live_server, client, measure_memory_usage, datastore_path): def test_everything(live_server, client, measure_memory_usage, datastore_path):
+20 -5
@@ -69,7 +69,7 @@ def test_conditions_with_text_and_number(client, live_server, measure_memory_usa
# 1. The page filtered text must contain "5" (first digit of value) # 1. The page filtered text must contain "5" (first digit of value)
# 2. The extracted number should be >= 20 and <= 100 # 2. The extracted number should be >= 20 and <= 100
res = client.post( res = client.post(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
data={ data={
"url": test_url, "url": test_url,
"fetch_backend": "html_requests", "fetch_backend": "html_requests",
@@ -110,20 +110,25 @@ def test_conditions_with_text_and_number(client, live_server, measure_memory_usa
wait_for_all_checks(client) wait_for_all_checks(client)
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True) client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(1) time.sleep(0.2)
wait_for_all_checks(client)
# Case 1 # Case 1
set_number_in_range_response(datastore_path=datastore_path, number="70.5") set_number_in_range_response(datastore_path=datastore_path, number="70.5")
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
time.sleep(2)
# 75 is > 20 and < 100 and contains "5" # 75 is > 20 and < 100 and contains "5"
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
assert b'has-unread-changes' in res.data assert b'has-unread-changes' in res.data
# Case 2: Change with one condition violated # Case 2: Change with one condition violated
# Number out of range (150) but contains '5' # Number out of range (150) but contains '5'
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True) client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2)
set_number_out_of_range_response(datastore_path=datastore_path, number="150.5") set_number_out_of_range_response(datastore_path=datastore_path, number="150.5")
@@ -149,6 +154,7 @@ def test_condition_validate_rule_row(client, live_server, measure_memory_usage,
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
uuid = next(iter(live_server.app.config['DATASTORE'].data['watching']))
# the front end submits the current form state which should override the watch in a temporary copy # the front end submits the current form state which should override the watch in a temporary copy
res = client.post( res = client.post(
@@ -189,8 +195,12 @@ def test_condition_validate_rule_row(client, live_server, measure_memory_usage,
) )
assert res.status_code == 200 assert res.status_code == 200
assert b'false' in res.data assert b'false' in res.data
# cleanup for the next
client.get(
url_for("ui.form_delete", uuid="all"),
follow_redirects=True
)
delete_all_watches(client)
# If there was only a change in the whitespacing, then we shouldnt have a change detected # If there was only a change in the whitespacing, then we shouldnt have a change detected
@@ -220,12 +230,17 @@ def test_wordcount_conditions_plugin(client, live_server, measure_memory_usage,
# Check it saved # Check it saved
res = client.get( res = client.get(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
) )
# Assert the word count is counted correctly # Assert the word count is counted correctly
assert b'<td>13</td>' in res.data assert b'<td>13</td>' in res.data
delete_all_watches(client)
# cleanup for the next
client.get(
url_for("ui.form_delete", uuid="all"),
follow_redirects=True
)
# If there was only a change in the whitespacing, then we shouldnt have a change detected # If there was only a change in the whitespacing, then we shouldnt have a change detected
def test_lev_conditions_plugin(client, live_server, measure_memory_usage, datastore_path): def test_lev_conditions_plugin(client, live_server, measure_memory_usage, datastore_path):
@@ -64,7 +64,6 @@ def test_DNS_errors(client, live_server, measure_memory_usage, datastore_path):
follow_redirects=True follow_redirects=True
) )
assert b"1 Imported" in res.data assert b"1 Imported" in res.data
res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up # Give the thread time to pick it up
wait_for_all_checks(client) wait_for_all_checks(client)
@@ -80,7 +79,7 @@ def test_DNS_errors(client, live_server, measure_memory_usage, datastore_path):
) )
assert found_name_resolution_error assert found_name_resolution_error
# Should always record that we tried # Should always record that we tried
assert "just now".encode('utf-8') in res.data or 'seconds ago'.encode('utf-8') in res.data assert bytes("just now".encode('utf-8')) in res.data
delete_all_watches(client) delete_all_watches(client)
# Re 1513 # Re 1513
@@ -1,7 +1,7 @@
import os import os
import time import time
from flask import url_for from flask import url_for
from .util import set_original_response, wait_for_all_checks, wait_for_notification_endpoint_output, delete_all_watches from .util import set_original_response, wait_for_all_checks, wait_for_notification_endpoint_output
from ..notification import valid_notification_formats from ..notification import valid_notification_formats
@@ -118,10 +118,8 @@ def run_filter_test(client, live_server, content_filter, app_notification_format
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
assert b'Warning, no filters were found' in res.data assert b'Warning, no filters were found' in res.data
assert not os.path.isfile(notification_file) assert not os.path.isfile(notification_file)
time.sleep(2) time.sleep(1)
wait_for_all_checks(client)
wait_for_all_checks(client)
assert live_server.app.config['DATASTORE'].data['watching'][uuid]['consecutive_filter_failures'] == 5 assert live_server.app.config['DATASTORE'].data['watching'][uuid]['consecutive_filter_failures'] == 5
time.sleep(2) time.sleep(2)
@@ -180,7 +178,6 @@ def run_filter_test(client, live_server, content_filter, app_notification_format
follow_redirects=True follow_redirects=True
) )
os.unlink(notification_file) os.unlink(notification_file)
delete_all_watches(client)
def test_check_include_filters_failure_notification(client, live_server, measure_memory_usage, datastore_path): def test_check_include_filters_failure_notification(client, live_server, measure_memory_usage, datastore_path):
@@ -188,12 +185,10 @@ def test_check_include_filters_failure_notification(client, live_server, measure
run_filter_test(client=client, live_server=live_server, content_filter='#nope-doesnt-exist', app_notification_format=valid_notification_formats.get('htmlcolor'), datastore_path=datastore_path) run_filter_test(client=client, live_server=live_server, content_filter='#nope-doesnt-exist', app_notification_format=valid_notification_formats.get('htmlcolor'), datastore_path=datastore_path)
# Check markup send conversion didnt affect plaintext preference # Check markup send conversion didnt affect plaintext preference
run_filter_test(client=client, live_server=live_server, content_filter='#nope-doesnt-exist', app_notification_format=valid_notification_formats.get('text'), datastore_path=datastore_path) run_filter_test(client=client, live_server=live_server, content_filter='#nope-doesnt-exist', app_notification_format=valid_notification_formats.get('text'), datastore_path=datastore_path)
delete_all_watches(client)
def test_check_xpath_filter_failure_notification(client, live_server, measure_memory_usage, datastore_path): def test_check_xpath_filter_failure_notification(client, live_server, measure_memory_usage, datastore_path):
# # live_server_setup(live_server) # Setup on conftest per function # # live_server_setup(live_server) # Setup on conftest per function
run_filter_test(client=client, live_server=live_server, content_filter='//*[@id="nope-doesnt-exist"]', app_notification_format=valid_notification_formats.get('htmlcolor'), datastore_path=datastore_path) run_filter_test(client=client, live_server=live_server, content_filter='//*[@id="nope-doesnt-exist"]', app_notification_format=valid_notification_formats.get('htmlcolor'), datastore_path=datastore_path)
delete_all_watches(client)
# Test that notification is never sent # Test that notification is never sent
@@ -202,4 +197,3 @@ def test_basic_markup_from_text(client, live_server, measure_memory_usage, datas
from ..notification.handler import markup_text_links_to_html from ..notification.handler import markup_text_links_to_html
x = markup_text_links_to_html("hello https://google.com") x = markup_text_links_to_html("hello https://google.com")
assert 'a href' in x assert 'a href' in x
delete_all_watches(client)
+1 -2
@@ -166,8 +166,7 @@ def test_tag_add_in_ui(client, live_server, measure_memory_usage, datastore_path
delete_all_watches(client) delete_all_watches(client)
def test_group_tag_notification(client, live_server, measure_memory_usage, datastore_path): def test_group_tag_notification(client, live_server, measure_memory_usage, datastore_path):
delete_all_watches(client)
set_original_response(datastore_path=datastore_path) set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True) test_url = url_for('test_endpoint', _external=True)
@@ -142,8 +142,6 @@ def test_consistent_history(client, live_server, measure_memory_usage, datastore
assert '"default"' not in f.read(), "'default' probably shouldnt be here, it came from when the 'default' Watch vars were accidently being saved" assert '"default"' not in f.read(), "'default' probably shouldnt be here, it came from when the 'default' Watch vars were accidently being saved"
delete_all_watches(client)
def test_check_text_history_view(client, live_server, measure_memory_usage, datastore_path): def test_check_text_history_view(client, live_server, measure_memory_usage, datastore_path):
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f: with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
@@ -164,7 +162,7 @@ def test_check_text_history_view(client, live_server, measure_memory_usage, data
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
res = client.get(url_for("ui.ui_diff.diff_history_page", uuid=uuid)) res = client.get(url_for("ui.ui_diff.diff_history_page", uuid="first"))
assert b'test-one' in res.data assert b'test-one' in res.data
assert b'test-two' in res.data assert b'test-two' in res.data
@@ -40,7 +40,10 @@ def set_some_changed_response(datastore_path):
def test_normal_page_check_works_with_ignore_status_code(client, live_server, measure_memory_usage, datastore_path): def test_normal_page_check_works_with_ignore_status_code(client, live_server, measure_memory_usage, datastore_path):
from loguru import logger
# Give the endpoint time to spin up
time.sleep(1)
set_original_response(datastore_path=datastore_path) set_original_response(datastore_path=datastore_path)
@@ -59,41 +62,20 @@ def test_normal_page_check_works_with_ignore_status_code(client, live_server, me
# Add our URL to the import page # Add our URL to the import page
test_url = url_for('test_endpoint', _external=True) test_url = url_for('test_endpoint', _external=True)
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url) uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
logger.info(f"TEST: First check - queuing UUID {uuid}")
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
logger.info(f"TEST: Waiting for first check to complete") wait_for_all_checks(client)
wait_result = wait_for_all_checks(client)
logger.info(f"TEST: First check wait completed: {wait_result}")
# Check history after first check
watch = client.application.config.get('DATASTORE').data['watching'][uuid]
logger.info(f"TEST: After first check - history count: {len(watch.history.keys())}")
set_some_changed_response(datastore_path=datastore_path) set_some_changed_response(datastore_path=datastore_path)
wait_for_all_checks(client)
# Trigger a check # Trigger a check
logger.info(f"TEST: Second check - queuing UUID {uuid}")
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
logger.info(f"TEST: Waiting for second check to complete") # Give the thread time to pick it up
wait_result = wait_for_all_checks(client) wait_for_all_checks(client)
logger.info(f"TEST: Second check wait completed: {wait_result}")
# Check history after second check
watch = client.application.config.get('DATASTORE').data['watching'][uuid]
logger.info(f"TEST: After second check - history count: {len(watch.history.keys())}")
logger.info(f"TEST: Watch history keys: {list(watch.history.keys())}")
# It should report nothing found (no new 'has-unread-changes' class) # It should report nothing found (no new 'has-unread-changes' class)
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
if b'has-unread-changes' not in res.data:
logger.error(f"TEST FAILED: has-unread-changes not found in response")
logger.error(f"TEST: Watch last_error: {watch.get('last_error')}")
logger.error(f"TEST: Watch last_checked: {watch.get('last_checked')}")
assert b'has-unread-changes' in res.data assert b'has-unread-changes' in res.data
assert b'/test-endpoint' in res.data assert b'/test-endpoint' in res.data
+1 -1
@@ -82,7 +82,7 @@ def test_import_distillio(client, live_server, measure_memory_usage, datastore_p
# Give the endpoint time to spin up # Give the endpoint time to spin up
time.sleep(1) time.sleep(1)
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
res = client.post( res = client.post(
url_for("imports.import_page"), url_for("imports.import_page"),
data={ data={
@@ -224,7 +224,6 @@ def check_json_filter(json_filter, client, live_server, datastore_path):
set_original_response(datastore_path=datastore_path) set_original_response(datastore_path=datastore_path)
delete_all_watches(client)
# Add our URL to the import page # Add our URL to the import page
test_url = url_for('test_endpoint', content_type="application/json", _external=True) test_url = url_for('test_endpoint', content_type="application/json", _external=True)
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url, extras={"include_filters": json_filter.splitlines()}) uuid = client.application.config.get('DATASTORE').add_watch(url=test_url, extras={"include_filters": json_filter.splitlines()})
@@ -298,17 +297,14 @@ def check_json_filter_bool_val(json_filter, client, live_server, datastore_path)
def test_check_jsonpath_filter_bool_val(client, live_server, measure_memory_usage, datastore_path): def test_check_jsonpath_filter_bool_val(client, live_server, measure_memory_usage, datastore_path):
check_json_filter_bool_val("json:$['available']", client, live_server, datastore_path=datastore_path) check_json_filter_bool_val("json:$['available']", client, live_server, datastore_path=datastore_path)
delete_all_watches(client)
def test_check_jq_filter_bool_val(client, live_server, measure_memory_usage, datastore_path): def test_check_jq_filter_bool_val(client, live_server, measure_memory_usage, datastore_path):
if jq_support: if jq_support:
check_json_filter_bool_val("jq:.available", client, live_server, datastore_path=datastore_path) check_json_filter_bool_val("jq:.available", client, live_server, datastore_path=datastore_path)
delete_all_watches(client)
def test_check_jqraw_filter_bool_val(client, live_server, measure_memory_usage, datastore_path): def test_check_jqraw_filter_bool_val(client, live_server, measure_memory_usage, datastore_path):
if jq_support: if jq_support:
check_json_filter_bool_val("jq:.available", client, live_server, datastore_path=datastore_path) check_json_filter_bool_val("jq:.available", client, live_server, datastore_path=datastore_path)
delete_all_watches(client)
# Re #265 - Extended JSON selector test # Re #265 - Extended JSON selector test
# Stuff to consider here # Stuff to consider here
@@ -456,17 +452,14 @@ def test_correct_header_detect(client, live_server, measure_memory_usage, datast
def test_check_jsonpath_ext_filter(client, live_server, measure_memory_usage, datastore_path): def test_check_jsonpath_ext_filter(client, live_server, measure_memory_usage, datastore_path):
check_json_ext_filter('json:$[?(@.status==Sold)]', client, live_server, datastore_path=datastore_path) check_json_ext_filter('json:$[?(@.status==Sold)]', client, live_server, datastore_path=datastore_path)
delete_all_watches(client)
def test_check_jq_ext_filter(client, live_server, measure_memory_usage, datastore_path): def test_check_jq_ext_filter(client, live_server, measure_memory_usage, datastore_path):
if jq_support: if jq_support:
check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server, datastore_path=datastore_path) check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server, datastore_path=datastore_path)
delete_all_watches(client)
def test_check_jqraw_ext_filter(client, live_server, measure_memory_usage, datastore_path): def test_check_jqraw_ext_filter(client, live_server, measure_memory_usage, datastore_path):
if jq_support: if jq_support:
check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server, datastore_path=datastore_path) check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server, datastore_path=datastore_path)
delete_all_watches(client)
def test_jsonpath_BOM_utf8(client, live_server, measure_memory_usage, datastore_path): def test_jsonpath_BOM_utf8(client, live_server, measure_memory_usage, datastore_path):
from .. import html_tools from .. import html_tools
@@ -477,6 +470,5 @@ def test_jsonpath_BOM_utf8(client, live_server, measure_memory_usage, datastore_
# See that we can find the second <script> one, which is not broken, and matches our filter # See that we can find the second <script> one, which is not broken, and matches our filter
text = html_tools.extract_json_as_string(json_str, "json:$.name") text = html_tools.extract_json_as_string(json_str, "json:$.name")
assert text == '"José"' assert text == '"José"'
delete_all_watches(client)
+8 -2
@@ -313,8 +313,14 @@ def test_notification_custom_endpoint_and_jinja2(client, live_server, measure_me
# Add a watch and trigger a HTTP POST # Add a watch and trigger a HTTP POST
test_url = url_for('test_endpoint', _external=True) test_url = url_for('test_endpoint', _external=True)
watch_uuid = client.application.config.get('DATASTORE').add_watch(url=test_url, tag="nice one") res = client.post(
res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) url_for("ui.ui_views.form_quick_watch_add"),
data={"url": test_url, "tags": 'nice one'},
follow_redirects=True
)
assert b"Watch added" in res.data
watch_uuid = next(iter(live_server.app.config['DATASTORE'].data['watching']))
wait_for_all_checks(client) wait_for_all_checks(client)
set_modified_response(datastore_path=datastore_path) set_modified_response(datastore_path=datastore_path)
@@ -1,7 +1,7 @@
import os import os
import time import time
from flask import url_for from flask import url_for
from .util import set_original_response, set_modified_response, live_server_setup, wait_for_all_checks, delete_all_watches from .util import set_original_response, set_modified_response, live_server_setup, wait_for_all_checks
import logging import logging
def test_check_notification_error_handling(client, live_server, measure_memory_usage, datastore_path): def test_check_notification_error_handling(client, live_server, measure_memory_usage, datastore_path):
@@ -81,4 +81,4 @@ def test_check_notification_error_handling(client, live_server, measure_memory_u
os.unlink(os.path.join(datastore_path, "notification.txt")) os.unlink(os.path.join(datastore_path, "notification.txt"))
assert 'xxxxx' in notification_submission assert 'xxxxx' in notification_submission
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
@@ -1,52 +0,0 @@
import os
import time
from flask import url_for
from .util import set_original_response, wait_for_all_checks, wait_for_notification_endpoint_output
from ..notification import valid_notification_formats
from loguru import logger
def test_queue_system(client, live_server, measure_memory_usage, datastore_path):
"""Test that multiple workers can process queue concurrently without blocking each other"""
# (pytest) Werkzeug's threaded server uses ThreadPoolExecutor with a default limit of around 40 threads (or min(32, os.cpu_count() + 4)).
items = os.cpu_count() +3
delay = 10
# Auto-queue is off here.
live_server.app.config['DATASTORE'].data['settings']['application']['all_paused'] = True
test_urls = [
f"{url_for('test_endpoint', _external=True)}?delay={delay}&id={i}&content=hello+test+content+{i}"
for i in range(0, items)
]
# Import 30 URLs to queue
res = client.post(
url_for("imports.import_page"),
data={"urls": "\r\n".join(test_urls)},
follow_redirects=True
)
assert f"{items} Imported".encode('utf-8') in res.data
client.application.set_workers(items)
start = time.time()
res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
time.sleep(delay/2)
# Verify all workers are idle (no UUIDs being processed)
from changedetectionio import worker_pool
running_uuids = worker_pool.get_running_uuids()
logger.debug( f"Should be atleast some workers running - {len(running_uuids)} UUIDs still being processed: {running_uuids}")
assert len(running_uuids) != 0, f"Should be atleast some workers running - {len(running_uuids)} UUIDs still being processed: {running_uuids}"
wait_for_all_checks(client)
# all workers should be done in less than say 10 seconds (they take time to 'see' something is in the queue too)
total_time = (time.time() - start)
logger.debug(f"All workers finished {items} items in less than {delay} seconds per job. {total_time}s total")
# if there was a bug in queue handler not running parallel, this would blow out to items*delay seconds
assert total_time < delay + 10, f"All workers finished {items} items in less than {delay} seconds per job, total time {total_time}s"
# Verify all workers are idle (no UUIDs being processed)
from changedetectionio import worker_pool
running_uuids = worker_pool.get_running_uuids()
assert len(running_uuids) == 0, f"Expected all workers to be idle, but {len(running_uuids)} UUIDs still being processed: {running_uuids}"
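The removed `test_queue_system` relies on a timing argument: if the jobs run in parallel, total wall time stays near one `delay`; if they run one after another, it grows to roughly `items * delay`. A minimal, self-contained sketch of that argument is below. It uses a plain `ThreadPoolExecutor` and sleep-only jobs, which is an assumption made for illustration and not the project's `worker_pool`; `run_jobs` is a hypothetical name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_jobs(items, delay, workers):
    """Run `items` sleep-only jobs on `workers` threads; return elapsed seconds."""
    def job(_):
        time.sleep(delay)  # stands in for a slow fetch of one watch

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, range(items)))  # block until every job has finished
    return time.monotonic() - start
```

Calling `run_jobs(4, 0.2, workers=4)` and `run_jobs(4, 0.2, workers=1)` side by side shows the gap the test asserts on: the first finishes in about one delay, the second in about four.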
+8 -9
@@ -17,12 +17,12 @@ def test_headers_in_request(client, live_server, measure_memory_usage, datastore
test_url = test_url.replace('localhost', 'changedet') test_url = test_url.replace('localhost', 'changedet')
# Add the test URL twice, we will check # Add the test URL twice, we will check
uuidA = client.application.config.get('DATASTORE').add_watch(url=test_url) uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
uuidB = client.application.config.get('DATASTORE').add_watch(url=test_url) uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True) client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
@@ -31,7 +31,7 @@ def test_headers_in_request(client, live_server, measure_memory_usage, datastore
# Add some headers to a request # Add some headers to a request
res = client.post( res = client.post(
url_for("ui.ui_edit.edit_page", uuid=uuidA), url_for("ui.ui_edit.edit_page", uuid="first"),
data={ data={
"url": test_url, "url": test_url,
"tags": "", "tags": "",
@@ -42,14 +42,13 @@ def test_headers_in_request(client, live_server, measure_memory_usage, datastore
) )
assert b"Updated watch." in res.data assert b"Updated watch." in res.data
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
# Give the thread time to pick up the first version # Give the thread time to pick up the first version
wait_for_all_checks(client) wait_for_all_checks(client)
# The service should echo back the request headers # The service should echo back the request headers
res = client.get( res = client.get(
url_for("ui.ui_preview.preview_page", uuid=uuidA), url_for("ui.ui_preview.preview_page", uuid="first"),
follow_redirects=True follow_redirects=True
) )
@@ -93,7 +92,7 @@ def test_body_in_request(client, live_server, measure_memory_usage, datastore_pa
# add the first 'version' # add the first 'version'
res = client.post( res = client.post(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
data={ data={
"url": test_url, "url": test_url,
"tags": "", "tags": "",
@@ -111,7 +110,7 @@ def test_body_in_request(client, live_server, measure_memory_usage, datastore_pa
body_value = 'Test Body Value {{ 1+1 }}' body_value = 'Test Body Value {{ 1+1 }}'
body_value_formatted = 'Test Body Value 2' body_value_formatted = 'Test Body Value 2'
res = client.post( res = client.post(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
data={ data={
"url": test_url, "url": test_url,
"tags": "", "tags": "",
@@ -127,7 +126,7 @@ def test_body_in_request(client, live_server, measure_memory_usage, datastore_pa
# The service should echo back the body # The service should echo back the body
res = client.get( res = client.get(
url_for("ui.ui_preview.preview_page", uuid=uuid), url_for("ui.ui_preview.preview_page", uuid="first"),
follow_redirects=True follow_redirects=True
) )
@@ -158,7 +157,7 @@ def test_body_in_request(client, live_server, measure_memory_usage, datastore_pa
# Attempt to add a body with a GET method # Attempt to add a body with a GET method
res = client.post( res = client.post(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
data={ data={
"url": test_url, "url": test_url,
"tags": "", "tags": "",
@@ -236,7 +236,6 @@ def test_restock_itemprop_with_tag(client, live_server, measure_memory_usage, da
} }
_run_test_minmax_limit(client, extra_watch_edit_form=extras,datastore_path=datastore_path) _run_test_minmax_limit(client, extra_watch_edit_form=extras,datastore_path=datastore_path)
delete_all_watches(client)
@@ -389,10 +388,9 @@ def test_change_with_notification_values(client, live_server, measure_memory_usa
os.unlink(os.path.join(datastore_path, "notification.txt")) os.unlink(os.path.join(datastore_path, "notification.txt"))
uuid = next(iter(live_server.app.config['DATASTORE'].data['watching'])) uuid = next(iter(live_server.app.config['DATASTORE'].data['watching']))
res = client.post(url_for("ui.ui_notification.ajax_callback_send_notification_test", watch_uuid=uuid), data={}, follow_redirects=True) res = client.post(url_for("ui.ui_notification.ajax_callback_send_notification_test", watch_uuid=uuid), data={}, follow_redirects=True)
wait_for_notification_endpoint_output(datastore_path=datastore_path) time.sleep(5)
assert os.path.isfile(os.path.join(datastore_path, "notification.txt")), "Notification received" assert os.path.isfile(os.path.join(datastore_path, "notification.txt")), "Notification received"
delete_all_watches(client)
def test_data_sanity(client, live_server, measure_memory_usage, datastore_path): def test_data_sanity(client, live_server, measure_memory_usage, datastore_path):
@@ -408,7 +406,6 @@ def test_data_sanity(client, live_server, measure_memory_usage, datastore_path):
follow_redirects=True follow_redirects=True
) )
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
@@ -420,7 +417,6 @@ def test_data_sanity(client, live_server, measure_memory_usage, datastore_path):
data={"url": test_url2, "tags": 'restock tests', 'processor': 'restock_diff'}, data={"url": test_url2, "tags": 'restock tests', 'processor': 'restock_diff'},
follow_redirects=True follow_redirects=True
) )
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
assert str(res.data.decode()).count("950.95") == 1, "Price should only show once (for the watch added, no other watches yet)" assert str(res.data.decode()).count("950.95") == 1, "Price should only show once (for the watch added, no other watches yet)"
@@ -466,4 +462,3 @@ def test_special_prop_examples(client, live_server, measure_memory_usage, datast
assert b'ception' not in res.data assert b'ception' not in res.data
assert b'155.55' in res.data assert b'155.55' in res.data
delete_all_watches(client)
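Note on the `test_change_with_notification_values` hunk above: one side waits with `wait_for_notification_endpoint_output(...)`, the other with a fixed `time.sleep(5)`. A polling wait usually returns as soon as the file appears and only spends the full timeout on failure. The following is a minimal sketch of that idea, not the project's implementation; `wait_for_file` and its parameters are hypothetical names chosen for illustration.

```python
import os
import time


def wait_for_file(path, timeout=5.0, interval=0.1):
    """Poll until `path` exists as a file or `timeout` seconds have passed.

    Hypothetical helper, not part of the changedetection.io test utilities.
    Returns True if the file appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isfile(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a test this would replace the fixed sleep, for example `assert wait_for_file(os.path.join(datastore_path, "notification.txt"))`.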
+1 -1
@@ -107,7 +107,7 @@ def test_rss_and_token(client, live_server, measure_memory_usage, datastore_path
assert b"Access denied, bad token" not in res.data assert b"Access denied, bad token" not in res.data
assert b"Random content" in res.data assert b"Random content" in res.data
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_basic_cdata_rss_markup(client, live_server, measure_memory_usage, datastore_path): def test_basic_cdata_rss_markup(client, live_server, measure_memory_usage, datastore_path):
@@ -23,7 +23,6 @@ def test_rss_feed_empty(client, live_server, measure_memory_usage, datastore_pat
) )
assert res.status_code == 400 assert res.status_code == 400
assert b'does not have enough history snapshots to show' in res.data assert b'does not have enough history snapshots to show' in res.data
delete_all_watches(client)
def test_rss_single_watch_order(client, live_server, measure_memory_usage, datastore_path): def test_rss_single_watch_order(client, live_server, measure_memory_usage, datastore_path):
""" """
+5 -8
@@ -24,20 +24,20 @@ def test_share_watch(client, live_server, measure_memory_usage, datastore_path):
# Goto the edit page, add our ignore text # Goto the edit page, add our ignore text
# Add our URL to the import page # Add our URL to the import page
res = client.post( res = client.post(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": include_filters, "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests", "time_between_check_use_default": "y"}, data={"include_filters": include_filters, "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests", "time_between_check_use_default": "y"},
follow_redirects=True follow_redirects=True
) )
assert b"Updated watch." in res.data assert b"Updated watch." in res.data
# Check it saved # Check it saved
res = client.get( res = client.get(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
) )
assert bytes(include_filters.encode('utf-8')) in res.data assert bytes(include_filters.encode('utf-8')) in res.data
# click share the link # click share the link
res = client.get( res = client.get(
url_for("ui.form_share_put_watch", uuid=uuid), url_for("ui.form_share_put_watch", uuid="first"),
follow_redirects=True follow_redirects=True
) )
@@ -63,16 +63,13 @@ def test_share_watch(client, live_server, measure_memory_usage, datastore_path):
# Now hit edit, we should see what we expect # Now hit edit, we should see what we expect
# that the import fetched the meta-data # that the import fetched the meta-data
uuids = list(client.application.config.get('DATASTORE').data['watching'])
assert uuids, "It saved/imported and created a new URL from the share"
# Check it saved # Check it saved
res = client.get( res = client.get(
url_for("ui.ui_edit.edit_page", uuid=uuids[0]), url_for("ui.ui_edit.edit_page", uuid="first"),
) )
assert bytes(include_filters.encode('utf-8')) in res.data assert bytes(include_filters.encode('utf-8')) in res.data
# Check it saved the URL # Check it saved the URL
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
assert bytes(test_url.encode('utf-8')) in res.data assert bytes(test_url.encode('utf-8')) in res.data
delete_all_watches(client)
+1 -5
@@ -25,7 +25,6 @@ def test_recheck_time_field_validation_global_settings(client, live_server, meas
assert REQUIRE_ATLEAST_ONE_TIME_PART_MESSAGE_DEFAULT.encode('utf-8') in res.data assert REQUIRE_ATLEAST_ONE_TIME_PART_MESSAGE_DEFAULT.encode('utf-8') in res.data
delete_all_watches(client)
def test_recheck_time_field_validation_single_watch(client, live_server, measure_memory_usage, datastore_path): def test_recheck_time_field_validation_single_watch(client, live_server, measure_memory_usage, datastore_path):
@@ -95,7 +94,6 @@ def test_recheck_time_field_validation_single_watch(client, live_server, measure
assert b"Updated watch." in res.data assert b"Updated watch." in res.data
assert REQUIRE_ATLEAST_ONE_TIME_PART_WHEN_NOT_GLOBAL_DEFAULT.encode('utf-8') not in res.data assert REQUIRE_ATLEAST_ONE_TIME_PART_WHEN_NOT_GLOBAL_DEFAULT.encode('utf-8') not in res.data
delete_all_watches(client)
def test_checkbox_open_diff_in_new_tab(client, live_server, measure_memory_usage, datastore_path): def test_checkbox_open_diff_in_new_tab(client, live_server, measure_memory_usage, datastore_path):
@@ -244,7 +242,6 @@ def test_page_title_listing_behaviour(client, live_server, measure_memory_usage,
# No page title description, and 'use_page_title_in_list' is on, it should show the <title> # No page title description, and 'use_page_title_in_list' is on, it should show the <title>
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
assert b"head titlecustom html" in res.data assert b"head titlecustom html" in res.data
delete_all_watches(client)
def test_ui_viewed_unread_flag(client, live_server, measure_memory_usage, datastore_path): def test_ui_viewed_unread_flag(client, live_server, measure_memory_usage, datastore_path):
@@ -286,5 +283,4 @@ def test_ui_viewed_unread_flag(client, live_server, measure_memory_usage, datast
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True) client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2) time.sleep(0.2)
res = client.get(url_for("watchlist.index")) res = client.get(url_for("watchlist.index"))
assert b'<span id="unread-tab-counter">0</span>' in res.data assert b'<span id="unread-tab-counter">0</span>' in res.data
delete_all_watches(client)
+8 -10
@@ -366,7 +366,7 @@ def test_check_with_prefix_include_filters(client, live_server, measure_memory_u
assert b"Some text thats the same" in res.data # in selector assert b"Some text thats the same" in res.data # in selector
assert b"Some text that will change" not in res.data # not in selector assert b"Some text that will change" not in res.data # not in selector
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_various_rules(client, live_server, measure_memory_usage, datastore_path): def test_various_rules(client, live_server, measure_memory_usage, datastore_path):
@@ -423,7 +423,7 @@ def test_xpath_20(client, live_server, measure_memory_usage, datastore_path):
test_url = url_for('test_endpoint', _external=True) test_url = url_for('test_endpoint', _external=True)
res = client.post( res = client.post(
url_for("ui.ui_edit.edit_page", uuid=uuid), url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": "//*[contains(@class, 'sametext')]|//*[contains(@class, 'changetext')]", data={"include_filters": "//*[contains(@class, 'sametext')]|//*[contains(@class, 'changetext')]",
"url": test_url, "url": test_url,
"tags": "", "tags": "",
@@ -437,14 +437,14 @@ def test_xpath_20(client, live_server, measure_memory_usage, datastore_path):
wait_for_all_checks(client) wait_for_all_checks(client)
res = client.get( res = client.get(
url_for("ui.ui_preview.preview_page", uuid=uuid), url_for("ui.ui_preview.preview_page", uuid="first"),
follow_redirects=True follow_redirects=True
) )
assert b"Some text thats the same" in res.data # in selector assert b"Some text thats the same" in res.data # in selector
assert b"Some text that will change" in res.data # in selector assert b"Some text that will change" in res.data # in selector
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_xpath_20_function_count(client, live_server, measure_memory_usage, datastore_path): def test_xpath_20_function_count(client, live_server, measure_memory_usage, datastore_path):
@@ -477,7 +477,7 @@ def test_xpath_20_function_count(client, live_server, measure_memory_usage, data
assert b"246913579975308642" in res.data # in selector assert b"246913579975308642" in res.data # in selector
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_xpath_20_function_count2(client, live_server, measure_memory_usage, datastore_path): def test_xpath_20_function_count2(client, live_server, measure_memory_usage, datastore_path):
@@ -501,8 +501,6 @@ def test_xpath_20_function_count2(client, live_server, measure_memory_usage, dat
) )
assert b"Updated watch." in res.data assert b"Updated watch." in res.data
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client) wait_for_all_checks(client)
res = client.get( res = client.get(
@@ -512,7 +510,7 @@ def test_xpath_20_function_count2(client, live_server, measure_memory_usage, dat
assert b"246913579975308642" in res.data # in selector assert b"246913579975308642" in res.data # in selector
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_xpath_20_function_string_join_matches(client, live_server, measure_memory_usage, datastore_path): def test_xpath_20_function_string_join_matches(client, live_server, measure_memory_usage, datastore_path):
@@ -546,7 +544,7 @@ def test_xpath_20_function_string_join_matches(client, live_server, measure_memo
assert b"Some text thats the samespecialconjunctionSome text that will change" in res.data # in selector assert b"Some text thats the samespecialconjunctionSome text that will change" in res.data # in selector
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def _subtest_xpath_rss(client, datastore_path, content_type='text/html'): def _subtest_xpath_rss(client, datastore_path, content_type='text/html'):
@@ -584,7 +582,7 @@ def _subtest_xpath_rss(client, datastore_path, content_type='text/html'):
assert b"Lets go discount" in res.data, f"When testing for Lets go discount called with content type '{content_type}'" assert b"Lets go discount" in res.data, f"When testing for Lets go discount called with content type '{content_type}'"
assert b"Events and Announcements" not in res.data, f"When testing for Lets go discount called with content type '{content_type}'" # It should not be here because thats not our selector target assert b"Events and Announcements" not in res.data, f"When testing for Lets go discount called with content type '{content_type}'" # It should not be here because thats not our selector target
delete_all_watches(client) client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
# Be sure all-in-the-wild types of RSS feeds work with xpath # Be sure all-in-the-wild types of RSS feeds work with xpath
def test_rss_xpath(client, live_server, measure_memory_usage, datastore_path): def test_rss_xpath(client, live_server, measure_memory_usage, datastore_path):
+28 -104
@@ -6,42 +6,6 @@ from flask import url_for
import logging import logging
import time import time
import os import os
import threading
# Thread-safe global storage for test endpoint content
# Avoids filesystem cache issues in parallel tests
_test_endpoint_content_lock = threading.Lock()
_test_endpoint_content = {}
def write_test_file_and_sync(filepath, content, mode='w'):
"""
Write test data to file and ensure it's synced to disk.
Also stores in thread-safe global dict to bypass filesystem cache.
Critical for parallel tests where workers may read files immediately after write.
Without fsync(), data may still be in OS buffers when workers try to read,
causing race conditions where old data is seen.
Args:
filepath: Full path to file
content: Content to write (str or bytes)
mode: File mode ('w' for text, 'wb' for binary)
"""
# Convert content to bytes if needed
if isinstance(content, str):
content_bytes = content.encode('utf-8')
else:
content_bytes = content
# Store in thread-safe global dict for instant access
with _test_endpoint_content_lock:
_test_endpoint_content[os.path.basename(filepath)] = content_bytes
# Also write to file for compatibility
with open(filepath, mode) as f:
f.write(content)
f.flush() # Flush Python buffer to OS
os.fsync(f.fileno()) # Force OS to write to disk
def set_original_response(datastore_path, extra_title=''): def set_original_response(datastore_path, extra_title=''):
test_return_data = f"""<html> test_return_data = f"""<html>
@@ -56,7 +20,8 @@ def set_original_response(datastore_path, extra_title=''):
</html> </html>
""" """
write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), test_return_data) with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_return_data)
return None return None
def set_modified_response(datastore_path): def set_modified_response(datastore_path):
@@ -71,7 +36,9 @@ def set_modified_response(datastore_path):
</html> </html>
""" """
write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), test_return_data) with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_return_data)
return None return None
def set_longer_modified_response(datastore_path): def set_longer_modified_response(datastore_path):
test_return_data = """<html> test_return_data = """<html>
@@ -88,7 +55,9 @@ def set_longer_modified_response(datastore_path):
</html> </html>
""" """
write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), test_return_data) with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_return_data)
return None return None
def set_more_modified_response(datastore_path): def set_more_modified_response(datastore_path):
@@ -104,14 +73,17 @@ def set_more_modified_response(datastore_path):
</html> </html>
""" """
write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), test_return_data) with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_return_data)
return None return None
def set_empty_text_response(datastore_path): def set_empty_text_response(datastore_path):
test_return_data = """<html><body></body></html>""" test_return_data = """<html><body></body></html>"""
write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), test_return_data) with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_return_data)
return None return None
@@ -160,40 +132,21 @@ def extract_UUID_from_client(client):
return uuid.strip() return uuid.strip()
def delete_all_watches(client=None): def delete_all_watches(client=None):
# Change tracking
client.application.config.get('DATASTORE')._dirty_watches = set() # Watch UUIDs that need saving
client.application.config.get('DATASTORE')._dirty_settings = False # Settings changed
client.application.config.get('DATASTORE')._watch_hashes = {} # UUID -> SHA256 hash for change detection
uuids = list(client.application.config.get('DATASTORE').data['watching']) uuids = list(client.application.config.get('DATASTORE').data['watching'])
for uuid in uuids: for uuid in uuids:
client.application.config.get('DATASTORE').delete(uuid) client.application.config.get('DATASTORE').delete(uuid)
from changedetectionio.flask_app import update_q
# Clear the queue to prevent leakage to next test
# Use clear() method to ensure both priority_items and notification_queue are drained
if hasattr(update_q, 'clear'):
update_q.clear()
else:
# Fallback for old implementation
while not update_q.empty():
try:
update_q.get_nowait()
except:
break
time.sleep(0.2)
def wait_for_all_checks(client=None): def wait_for_all_checks(client=None):
""" """
Waits until the queue is empty and workers are idle. Waits until the queue is empty and workers are idle.
Delegates to worker_pool.wait_for_all_checks for shared logic. Delegates to worker_handler.wait_for_all_checks for shared logic.
""" """
from changedetectionio.flask_app import update_q as global_update_q from changedetectionio.flask_app import update_q as global_update_q
from changedetectionio import worker_pool from changedetectionio import worker_handler
return worker_pool.wait_for_all_checks(global_update_q, timeout=150)
# Use the shared wait logic from worker_handler
return worker_handler.wait_for_all_checks(global_update_q, timeout=150)
def wait_for_watch_history(client, min_history_count=2, timeout=10): def wait_for_watch_history(client, min_history_count=2, timeout=10):
""" """
@@ -242,11 +195,8 @@ def new_live_server_setup(live_server):
@live_server.app.route('/test-endpoint') @live_server.app.route('/test-endpoint')
def test_endpoint(): def test_endpoint():
# REMOVED: logger.debug() causes file locking between test process and Flask server process from loguru import logger
# Flask server runs in separate multiprocessing.Process and inherited loguru tries to logger.debug(f"/test-endpoint hit {request}")
# write to same log files, causing request handlers to block on file locks
# from loguru import logger
# logger.debug(f"/test-endpoint hit {request}")
ctype = request.args.get('content_type') ctype = request.args.get('content_type')
status_code = request.args.get('status_code') status_code = request.args.get('status_code')
content = request.args.get('content') or None content = request.args.get('content') or None
@@ -268,35 +218,15 @@ def new_live_server_setup(live_server):
resp.headers['Content-Type'] = ctype if ctype else 'text/html' resp.headers['Content-Type'] = ctype if ctype else 'text/html'
return resp return resp
# Check thread-safe global dict first (instant, no cache issues) # Tried using a global var here but didn't seem to work, so reading from a file instead.
# Fall back to file if not in dict (for tests that write directly) datastore_path = current_app.config.get('TEST_DATASTORE_PATH', 'test-datastore')
with _test_endpoint_content_lock: with open(os.path.join(datastore_path, "endpoint-content.txt"), "rb") as f:
content_data = _test_endpoint_content.get("endpoint-content.txt") resp = make_response(f.read(), status_code)
if uppercase_headers:
if content_data is None: resp.headers['CONTENT-TYPE'] = ctype if ctype else 'text/html'
# Not in global dict, read from file else:
datastore_path = current_app.config.get('TEST_DATASTORE_PATH', 'test-datastore') resp.headers['Content-Type'] = ctype if ctype else 'text/html'
filepath = os.path.join(datastore_path, "endpoint-content.txt") return resp
# REMOVED: os.sync() was blocking for many seconds during parallel tests
# With -n 6+ parallel tests, heavy I/O causes os.sync() to wait for ALL
# system writes to complete, causing "Read timed out" errors
# File writes from test code are already flushed by the time workers fetch
try:
with open(filepath, "rb") as f:
content_data = f.read()
except Exception as e:
# REMOVED: logger.error() causes file locking in multiprocess context
# Just raise the exception directly for debugging
raise
resp = make_response(content_data, status_code)
if uppercase_headers:
resp.headers['CONTENT-TYPE'] = ctype if ctype else 'text/html'
else:
resp.headers['Content-Type'] = ctype if ctype else 'text/html'
return resp
except FileNotFoundError: except FileNotFoundError:
return make_response('', status_code) return make_response('', status_code)
@@ -371,12 +301,6 @@ def new_live_server_setup(live_server):
def test_pdf_endpoint(): def test_pdf_endpoint():
datastore_path = current_app.config.get('TEST_DATASTORE_PATH', 'test-datastore') datastore_path = current_app.config.get('TEST_DATASTORE_PATH', 'test-datastore')
# Force filesystem sync before reading to ensure fresh data
try:
os.sync()
except (AttributeError, PermissionError):
pass
# Tried using a global var here but didn't seem to work, so reading from a file instead. # Tried using a global var here but didn't seem to work, so reading from a file instead.
with open(os.path.join(datastore_path, "endpoint-test.pdf"), "rb") as f: with open(os.path.join(datastore_path, "endpoint-test.pdf"), "rb") as f:
resp = make_response(f.read(), 200) resp = make_response(f.read(), 200)
@@ -23,14 +23,11 @@ _uuid_processing_lock = threading.Lock() # Protects currently_processing_uuids
USE_ASYNC_WORKERS = True USE_ASYNC_WORKERS = True
# Custom ThreadPoolExecutor for queue operations with named threads # Custom ThreadPoolExecutor for queue operations with named threads
# Scale executor threads to match FETCH_WORKERS (no minimum, no maximum) # Scale executor threads with FETCH_WORKERS to avoid bottleneck at high concurrency
# Thread naming: "QueueGetter-N" for easy debugging in thread dumps/traces _max_executor_workers = max(50, int(os.getenv("FETCH_WORKERS", "10")))
# With FETCH_WORKERS=10: 10 workers + 10 executor threads = 20 threads total
# With FETCH_WORKERS=500: 500 workers + 500 executor threads = 1000 threads total (acceptable on modern systems)
_max_executor_workers = int(os.getenv("FETCH_WORKERS", "10"))
queue_executor = ThreadPoolExecutor( queue_executor = ThreadPoolExecutor(
max_workers=_max_executor_workers, max_workers=_max_executor_workers,
thread_name_prefix="QueueGetter-" # Shows in thread dumps as "QueueGetter-0", "QueueGetter-1", etc. thread_name_prefix="QueueGetter-"
) )
@@ -85,17 +82,16 @@ class WorkerThread:
self.loop = None self.loop = None
def start(self): def start(self):
"""Start the worker thread with descriptive name for debugging""" """Start the worker thread"""
self.thread = threading.Thread( self.thread = threading.Thread(
target=self.run, target=self.run,
daemon=True, daemon=True,
name=f"PageFetchAsyncUpdateWorker-{self.worker_id}" # Shows in thread dumps with worker ID name=f"PageFetchAsyncUpdateWorker-{self.worker_id}"
) )
self.thread.start() self.thread.start()
def stop(self): def stop(self):
"""Stop the worker thread brutally - no waiting""" """Stop the worker thread"""
# Try to stop the event loop if it exists
if self.loop and self.running: if self.loop and self.running:
try: try:
# Signal the loop to stop # Signal the loop to stop
@@ -103,7 +99,8 @@ class WorkerThread:
except RuntimeError: except RuntimeError:
pass pass
# Don't wait - thread is daemon and will die when needed if self.thread and self.thread.is_alive():
self.thread.join(timeout=2.0)
def start_async_workers(n_workers, update_q, notification_q, app, datastore): def start_async_workers(n_workers, update_q, notification_q, app, datastore):
@@ -128,7 +125,7 @@ def start_async_workers(n_workers, update_q, notification_q, app, datastore):
async def start_single_async_worker(worker_id, update_q, notification_q, app, datastore, executor=None): async def start_single_async_worker(worker_id, update_q, notification_q, app, datastore, executor=None):
"""Start a single async worker with auto-restart capability""" """Start a single async worker with auto-restart capability"""
from changedetectionio.worker import async_update_worker from changedetectionio.async_update_worker import async_update_worker
# Check if we're in pytest environment - if so, be more gentle with logging # Check if we're in pytest environment - if so, be more gentle with logging
import os import os
@@ -340,36 +337,24 @@ def queue_item_async_safe(update_q, item, silent=False):
def shutdown_workers(): def shutdown_workers():
"""Shutdown all async workers brutally - no delays, no waiting""" """Shutdown all async workers fast and aggressively"""
global worker_threads, queue_executor global worker_threads
# Check if we're in pytest environment - if so, be more gentle with logging # Check if we're in pytest environment - if so, be more gentle with logging
import os import os
in_pytest = "pytest" in os.sys.modules or "PYTEST_CURRENT_TEST" in os.environ in_pytest = "pytest" in os.sys.modules or "PYTEST_CURRENT_TEST" in os.environ
if not in_pytest: if not in_pytest:
logger.info("Brutal shutdown of async workers initiated...") logger.info("Fast shutdown of async workers initiated...")
# Stop all worker event loops # Stop all worker threads
for worker in worker_threads: for worker in worker_threads:
worker.stop() worker.stop()
# Clear immediately - threads are daemon and will die
worker_threads.clear() worker_threads.clear()
# Shutdown the queue executor to prevent "cannot schedule new futures after shutdown" errors
# This must happen AFTER workers are stopped to avoid race conditions
if queue_executor:
try:
queue_executor.shutdown(wait=False)
if not in_pytest:
logger.debug("Queue executor shut down")
except Exception as e:
if not in_pytest:
logger.warning(f"Error shutting down queue executor: {e}")
if not in_pytest: if not in_pytest:
logger.info("Async workers brutal shutdown complete") logger.info("Async workers fast shutdown complete")
@@ -484,14 +469,12 @@ def wait_for_all_checks(update_q, timeout=150):
elif time.time() - empty_since >= 0.3: elif time.time() - empty_since >= 0.3:
# Add small buffer for filesystem operations to complete # Add small buffer for filesystem operations to complete
time.sleep(0.2) time.sleep(0.2)
logger.trace("wait_for_all_checks: All checks complete (queue empty, workers idle)")
return True return True
else: else:
empty_since = None empty_since = None
attempt += 1 attempt += 1
logger.warning(f"wait_for_all_checks: Timeout after {timeout} attempts")
return False # Timeout return False # Timeout
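Note on the `queue_executor` hunk above: the two sides size the thread pool differently. One uses `FETCH_WORKERS` directly, the other applies a floor of 50 threads. The sketch below reproduces both rules side by side so the resulting pool sizes can be compared; `executor_size` and its `floor` parameter are hypothetical names, and the environment is passed in as a plain mapping to keep the example self-contained.

```python
def executor_size(env, floor=None):
    """Return the executor thread count for a given environment mapping.

    Hypothetical helper. With floor=None the count follows FETCH_WORKERS
    exactly; with a floor it never drops below that value.
    """
    workers = int(env.get("FETCH_WORKERS", "10"))
    return workers if floor is None else max(floor, workers)


# Default of 10 workers: 10 threads without a floor, 50 with a floor of 50
print(executor_size({}), executor_size({}, floor=50))
```

With a floor, small deployments pay for idle executor threads; without one, the pool tracks the configured worker count exactly.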
-7
@@ -16,13 +16,6 @@ services:
# Log output levels: TRACE, DEBUG(default), INFO, SUCCESS, WARNING, ERROR, CRITICAL # Log output levels: TRACE, DEBUG(default), INFO, SUCCESS, WARNING, ERROR, CRITICAL
# - LOGGER_LEVEL=TRACE # - LOGGER_LEVEL=TRACE
# #
# Plugins! See https://changedetection.io/plugins for more plugins.
# Install additional Python packages (processor plugins, etc.)
# Example: Install the OSINT reconnaissance processor plugin
# - EXTRA_PACKAGES=changedetection.io-osint-processor
# Multiple packages can be installed by separating with spaces:
# - EXTRA_PACKAGES=changedetection.io-osint-processor another-plugin
#
# #
# Uncomment below and the "sockpuppetbrowser" to use a real Chrome browser (It uses the "playwright" protocol) # Uncomment below and the "sockpuppetbrowser" to use a real Chrome browser (It uses the "playwright" protocol)
# - PLAYWRIGHT_DRIVER_URL=ws://browser-sockpuppet-chrome:3000 # - PLAYWRIGHT_DRIVER_URL=ws://browser-sockpuppet-chrome:3000
-28
@@ -1,28 +0,0 @@
#!/bin/bash
set -e
# Install additional packages from EXTRA_PACKAGES env var
# Uses a marker file to avoid reinstalling on every container restart
INSTALLED_MARKER="/datastore/.extra_packages_installed"
CURRENT_PACKAGES="$EXTRA_PACKAGES"
if [ -n "$EXTRA_PACKAGES" ]; then
# Check if we need to install/update packages
if [ ! -f "$INSTALLED_MARKER" ] || [ "$(cat $INSTALLED_MARKER 2>/dev/null)" != "$CURRENT_PACKAGES" ]; then
echo "Installing extra packages: $EXTRA_PACKAGES"
pip3 install --no-cache-dir $EXTRA_PACKAGES
if [ $? -eq 0 ]; then
echo "$CURRENT_PACKAGES" > "$INSTALLED_MARKER"
echo "Extra packages installed successfully"
else
echo "ERROR: Failed to install extra packages"
exit 1
fi
else
echo "Extra packages already installed: $EXTRA_PACKAGES"
fi
fi
# Execute the main command
exec "$@"
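Note on the entrypoint script above: it reinstalls `EXTRA_PACKAGES` only when the marker file is missing or its contents differ from the current value. The decision logic can be sketched in Python as follows. `needs_install` is a hypothetical name, and this is an illustration of the marker-file check only, not a replacement for the shell script.

```python
import os


def needs_install(marker_path, packages):
    """Decide whether extra packages must be (re)installed.

    Hypothetical helper mirroring the shell logic: nothing to do when
    `packages` is empty; install when the marker is missing or records
    a different package list.
    """
    if not packages:
        return False
    if not os.path.isfile(marker_path):
        return True
    with open(marker_path) as f:
        # Shell command substitution drops trailing newlines; do the same here
        recorded = f.read().rstrip("\n")
    return recorded != packages
```

After a successful install the caller would write `packages` to the marker file, so an unchanged value is skipped on the next container start.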
+3 -3
@@ -8,7 +8,7 @@ flask-paginate
flask_expects_json~=1.7 flask_expects_json~=1.7
flask_restful flask_restful
flask_cors # For the Chrome extension to operate flask_cors # For the Chrome extension to operate
# janus # No longer needed - using pure threading.Queue for multi-loop support janus # Thread-safe async/sync queue bridge
flask_wtf~=1.2 flask_wtf~=1.2
flask~=3.1 flask~=3.1
flask-socketio~=5.6.0 flask-socketio~=5.6.0
@@ -51,9 +51,9 @@ linkify-it-py
# - Needed for apprise/spush, and maybe others? hopefully doesnt trigger a rust compile. # - Needed for apprise/spush, and maybe others? hopefully doesnt trigger a rust compile.
# - Requires extra wheel for rPi, adds build time for arm/v8 which is not in piwheels # - Requires extra wheel for rPi, adds build time for arm/v8 which is not in piwheels
# Pinned to 44.x for ARM compatibility and sslyze compatibility (sslyze requires <45) and (45.x may not have pre-built ARM wheels) # Pinned to 43.0.1 for ARM compatibility (45.x may not have pre-built ARM wheels)
# Also pinned because dependabot wants specific versions # Also pinned because dependabot wants specific versions
cryptography==44.0.0 cryptography==46.0.3
# apprise mqtt https://github.com/dgtlmoon/changedetection.io/issues/315 # apprise mqtt https://github.com/dgtlmoon/changedetection.io/issues/315
# use any version other than 2.0.x due to https://github.com/eclipse/paho.mqtt.python/issues/814 # use any version other than 2.0.x due to https://github.com/eclipse/paho.mqtt.python/issues/814