Compare commits

...

44 Commits

Author SHA1 Message Date
dgtlmoon 15e0a330fe increase delay 2025-06-03 09:27:34 +02:00
dgtlmoon 40226dbad7 add delays 2025-06-03 09:22:04 +02:00
dgtlmoon 5c7c548929 woops 2025-06-03 09:10:15 +02:00
dgtlmoon 6949a09aab test tweaks 2025-06-03 09:06:19 +02:00
dgtlmoon cf01015601 Add delay for test stability 2025-06-02 21:00:29 +02:00
dgtlmoon 1f710529a4 woops 2025-06-02 20:47:35 +02:00
dgtlmoon 501aaf4b77 test delay 2025-06-02 20:47:27 +02:00
dgtlmoon 8f421a43ef test improvements 2025-06-02 20:27:30 +02:00
dgtlmoon a9cf6a4373 test speedup 2025-06-02 20:23:35 +02:00
dgtlmoon 4352e8006c Improve dockerfile 2025-06-02 19:29:26 +02:00
dgtlmoon b90d03a78e update readme 2025-06-02 19:25:39 +02:00
dgtlmoon 03e751b57f Tidy up async worker names and cleanups when in test mode 2025-06-02 19:18:03 +02:00
dgtlmoon 6c3e88e261 test fixes due to dnspython and other changes 2025-06-02 19:09:39 +02:00
dgtlmoon 75e6fbd624 include more debug output on test 2025-06-02 18:40:03 +02:00
dgtlmoon 821c0edff4 unpin urlrequests 2025-06-02 18:39:12 +02:00
dgtlmoon cd7dde4477 gevent/thread type cleanups 2025-06-02 18:32:10 +02:00
dgtlmoon 6866956e67 Remove eventlet go with gevent! 2025-06-02 18:25:40 +02:00
dgtlmoon b4bfd23f98 Strip whitespace, add history/preview handling 2025-06-02 17:03:25 +02:00
dgtlmoon 817afed17d Remove old hack which is probably not compatible 2025-06-02 16:24:35 +02:00
dgtlmoon 5c0d151490 Make sure test data is not built with docker container 2025-06-02 16:23:51 +02:00
dgtlmoon 337411c16a Cross platform fixes 2025-05-30 19:00:25 +02:00
dgtlmoon 6c1ed57032 WIP 2025-05-30 18:49:43 +02:00
dgtlmoon 3d61ce8df7 WIP 2025-05-30 18:13:31 +02:00
dgtlmoon f7695f59d3 work on missing exceptions etc 2025-05-30 17:35:41 +02:00
dgtlmoon 40498a59b6 Adding feather-icons 2025-05-30 17:16:38 +02:00
dgtlmoon 0d332dd519 WIP 2025-05-30 16:32:18 +02:00
dgtlmoon e9d28b810a update test handler 2025-05-30 16:22:24 +02:00
dgtlmoon e5aba3b2f0 ensure workers are running 2025-05-30 16:14:31 +02:00
dgtlmoon 01742dd670 WIP 2025-05-30 16:09:25 +02:00
dgtlmoon 34fbfa7113 fix for browsersteps 2025-05-30 15:52:17 +02:00
dgtlmoon a52ae11062 WIP 2025-05-30 15:49:41 +02:00
dgtlmoon fb5e93691f Revert "WIP - The PlaywrightManager successfully isolates async operations in a dedicated thread while providing a clean synchronous interface to the worker threads, solving the original threading vs async conflict!"
This reverts commit 6b68587bbf.
2025-05-30 10:31:38 +02:00
dgtlmoon 6b68587bbf WIP - The PlaywrightManager successfully isolates async operations in a dedicated thread while providing a clean synchronous interface to the worker threads, solving the original threading vs async conflict! 2025-05-30 10:16:35 +02:00
dgtlmoon e891c2da42 WIP - switch to python async mode, tweak eventlet 2025-05-29 15:03:11 +02:00
dgtlmoon b535339e94 Undo monkey patch for eventlet 2025-05-29 12:44:44 +02:00
dgtlmoon d40e017e29 Merge branch 'master' into socketio-tweaks 2025-05-29 11:32:33 +02:00
dgtlmoon bf6bab6c05 more work on enable/disable 2025-05-28 11:46:37 +02:00
dgtlmoon 142f93cf88 Adding py 2025-05-28 11:35:22 +02:00
dgtlmoon 4c7395f203 use socket emit from client to handle events 2025-05-28 11:28:45 +02:00
dgtlmoon 9bc347158a WIP 2025-05-28 11:21:31 +02:00
dgtlmoon b74eaca83f Switch to eventlet as handler, UI option to enable/disable 2025-05-28 11:13:47 +02:00
dgtlmoon 46f78f0164 Tweak script setup 2025-05-28 10:11:40 +02:00
dgtlmoon 05ab0831ef Lock to major versions 2025-05-28 10:02:40 +02:00
dgtlmoon 62653a4646 Use 3.1.3 2025-05-28 10:01:08 +02:00
112 changed files with 2948 additions and 1524 deletions
+32
@@ -29,3 +29,35 @@ venv/
# Visual Studio
.vscode/
# Test and development files
test-datastore/
tests/
docs/
*.md
!README.md
# Temporary and log files
*.log
*.tmp
tmp/
temp/
# Training data and large files
train-data/
works-data/
# Container files
Dockerfile*
docker-compose*.yml
.dockerignore
# Development certificates and keys
*.pem
*.key
*.crt
profile_output.prof
# Large binary files that shouldn't be in container
*.pdf
chrome.json
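The ignore rules above depend on their order: `*.md` excludes Markdown files and the later `!README.md` re-includes the README. Docker reads `.dockerignore` rules top to bottom and the last matching rule wins. The sketch below approximates that evaluation, assuming this hunk is the project's `.dockerignore` (the file name is not shown on this page). `is_ignored` and `_match` are hypothetical helpers; Docker's real matcher is built on Go's `filepath.Match` plus `**` support, which this does not implement.

```python
from fnmatch import fnmatchcase
from typing import List


def _match(pattern: str, path: str) -> bool:
    """Segment-wise glob where '*' never crosses '/'; a pattern that matches a
    directory also matches everything below it."""
    pat_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    for depth in range(1, len(path_parts) + 1):
        prefix = path_parts[:depth]
        if len(prefix) == len(pat_parts) and all(
            fnmatchcase(seg, pat) for seg, pat in zip(prefix, pat_parts)
        ):
            return True
    return False


def is_ignored(path: str, rules: List[str]) -> bool:
    """Walk the rules in order; the last matching rule decides."""
    ignored = False
    for rule in rules:
        rule = rule.strip()
        if not rule or rule.startswith("#"):
            continue  # blank lines and comments
        negate = rule.startswith("!")
        pattern = rule[1:] if negate else rule
        if _match(pattern, path):
            ignored = not negate
    return ignored


# A subset of the rules added in the hunk above
RULES = ["tests/", "docs/", "*.md", "!README.md", "*.log", "Dockerfile*", "*.pem"]
```

Under this approximation a nested `changedetectionio/README.md` is kept, because `*` does not cross `/` and so `*.md` only matches files at the build-context root.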
@@ -86,10 +86,10 @@ jobs:
run: |
# Playwright via Sockpuppetbrowser fetch
# tests/visualselector/test_fetch_data.py will do browser steps
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/fetchers/test_content.py'
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/test_errorhandling.py'
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/visualselector/test_fetch_data.py'
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/fetchers/test_custom_js_before_content.py'
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest -vv --capture=tee-sys --showlocals --tb=long --live-server-host=0.0.0.0 --live-server-port=5004 tests/fetchers/test_content.py'
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest -vv --capture=tee-sys --showlocals --tb=long --live-server-host=0.0.0.0 --live-server-port=5004 tests/test_errorhandling.py'
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest -vv --capture=tee-sys --showlocals --tb=long --live-server-host=0.0.0.0 --live-server-port=5004 tests/visualselector/test_fetch_data.py'
docker run --rm -e "FLASK_SERVER_NAME=cdio" -e "PLAYWRIGHT_DRIVER_URL=ws://sockpuppetbrowser:3000" --network changedet-network --hostname=cdio test-changedetectionio bash -c 'cd changedetectionio;pytest -vv --capture=tee-sys --showlocals --tb=long --live-server-host=0.0.0.0 --live-server-port=5004 tests/fetchers/test_custom_js_before_content.py'
- name: Playwright and SocketPuppetBrowser - Headers and requests
+48 -95
@@ -10,10 +10,11 @@ import os
import getopt
import platform
import signal
import socket
import sys
from werkzeug.serving import run_simple
import sys
# Eventlet completely removed - using threading mode for SocketIO
# This provides better Python 3.12+ compatibility and eliminates eventlet/asyncio conflicts
from changedetectionio import store
from changedetectionio.flask_app import changedetection_app
from loguru import logger
@@ -28,22 +29,34 @@ def get_version():
# Parent wrapper or OS sends us a SIGTERM/SIGINT, do everything required for a clean shutdown
def sigshutdown_handler(_signo, _stack_frame):
name = signal.Signals(_signo).name
logger.critical(f'Shutdown: Got Signal - {name} ({_signo}), Saving DB to disk and calling shutdown')
datastore.sync_to_json()
logger.success('Sync JSON to disk complete.')
logger.critical(f'Shutdown: Got Signal - {name} ({_signo}), Fast shutdown initiated')
# Shutdown socketio server if available
# Set exit flag immediately to stop all loops
app.config.exit.set()
datastore.stop_thread = True
# Shutdown workers immediately
try:
from changedetectionio import worker_handler
worker_handler.shutdown_workers()
except Exception as e:
logger.error(f"Error shutting down workers: {str(e)}")
# Shutdown socketio server fast
from changedetectionio.flask_app import socketio_server
if socketio_server and hasattr(socketio_server, 'shutdown'):
try:
logger.info("Shutting down Socket.IO server...")
socketio_server.shutdown()
except Exception as e:
logger.error(f"Error shutting down Socket.IO server: {str(e)}")
# Set flags for clean shutdown
datastore.stop_thread = True
app.config.exit.set()
# Save data quickly
try:
datastore.sync_to_json()
logger.success('Fast sync to disk complete.')
except Exception as e:
logger.error(f"Error syncing to disk: {str(e)}")
sys.exit()
def main():
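The rewritten `sigshutdown_handler` sets the exit flag before anything else, so loops that poll `app.config.exit` stop early, and wraps each cleanup step in its own `try`/`except` so a failure while stopping workers or Socket.IO does not skip the final disk sync. A minimal sketch of that ordering with a plain `threading.Event` and a list of callables; `make_shutdown_handler` and the step list are illustrative, not code from this branch.

```python
import signal
import sys
import threading
from typing import Callable, List


def make_shutdown_handler(exit_event: threading.Event,
                          steps: List[Callable[[], None]],
                          log: Callable[[str], None] = print):
    """Build a SIGTERM/SIGINT handler: set the exit flag first, then run each
    cleanup step, logging (not raising) failures so later steps still run."""
    def handler(signo, _frame):
        log(f"Shutdown: got {signal.Signals(signo).name}, fast shutdown initiated")
        exit_event.set()  # loops that poll the event stop on their next check
        for step in steps:  # e.g. stop workers, stop Socket.IO, sync to disk
            try:
                step()
            except Exception as exc:
                log(f"Error during shutdown step: {exc}")
        sys.exit(0)
    return handler


# Demo: call the handler directly instead of delivering a real signal.
stop = threading.Event()
ran: List[str] = []
demo = make_shutdown_handler(stop, [lambda: ran.append("workers"),
                                    lambda: 1 / 0,  # a failing step is logged, not fatal
                                    lambda: ran.append("sync")],
                             log=lambda msg: None)
try:
    demo(signal.SIGTERM, None)
except SystemExit as exc:
    exit_code = exc.code
```

In real use the returned function would be registered with `signal.signal(signal.SIGTERM, handler)` and `signal.signal(signal.SIGINT, handler)`, as `main()` does.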
@@ -52,9 +65,9 @@ def main():
datastore_path = None
do_cleanup = False
host = ''
host = "0.0.0.0"
ipv6_enabled = False
port = os.environ.get('PORT') or 5000
port = int(os.environ.get('PORT', 5000))
ssl_mode = False
# On Windows, create and use a default path.
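The `port` change matters because `os.environ.get('PORT') or 5000` returns a string whenever `PORT` is set, while `int(os.environ.get('PORT', 5000))` always yields an int. One side effect: an empty `PORT=` used to fall back to 5000 (an empty string is falsy) but now raises `ValueError` from `int('')`. A small sketch that keeps both behaviours, written as a hypothetical `resolve_port` helper; the range check is an addition of this sketch, not of the diff.

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str] = os.environ, default: int = 5000) -> int:
    """Read PORT as an int; an unset or blank value falls back to the default."""
    raw = env.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # still raises ValueError for non-numeric values such as "http"
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port
```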
@@ -150,6 +163,11 @@ def main():
app = changedetection_app(app_config, datastore)
# Get the SocketIO instance from the Flask app (created in flask_app.py)
from changedetectionio.flask_app import socketio_server
global socketio
socketio = socketio_server
signal.signal(signal.SIGTERM, sigshutdown_handler)
signal.signal(signal.SIGINT, sigshutdown_handler)
@@ -174,10 +192,11 @@ def main():
@app.context_processor
def inject_version():
def inject_template_globals():
return dict(right_sticky="v{}".format(datastore.data['version_tag']),
new_version_available=app.config['NEW_VERSION_AVAILABLE'],
has_password=datastore.data['settings']['application']['password'] != False
has_password=datastore.data['settings']['application']['password'] != False,
socket_io_enabled=datastore.data['settings']['application']['ui'].get('socket_io_enabled', True)
)
# Monitored websites will not receive a Referer header when a user clicks on an outgoing link.
@@ -201,87 +220,21 @@ def main():
from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, x_prefix=1, x_host=1)
s_type = socket.AF_INET6 if ipv6_enabled else socket.AF_INET
# Get socketio_server from flask_app
from changedetectionio.flask_app import socketio_server
# SocketIO instance is already initialized in flask_app.py
if socketio_server and datastore.data['settings']['application']['ui'].get('open_diff_in_new_tab'):
logger.info("Starting server with Socket.IO support (using threading)...")
# Use Flask-SocketIO's run method with error handling for Werkzeug warning
# This is the cleanest approach that works with all Flask-SocketIO versions
# Use '0.0.0.0' as the default host if none is specified
# This will listen on all available interfaces
listen_host = '0.0.0.0' if host == '' else host
logger.info(f"Using host: {listen_host} and port: {port}")
try:
# First try with the allow_unsafe_werkzeug parameter (newer versions)
if ssl_mode:
socketio_server.run(
app,
host=listen_host,
port=int(port),
certfile='cert.pem',
keyfile='privkey.pem',
debug=False,
use_reloader=False,
allow_unsafe_werkzeug=True # Only in newer versions
)
else:
socketio_server.run(
app,
host=listen_host,
port=int(port),
debug=False,
use_reloader=False,
allow_unsafe_werkzeug=True # Only in newer versions
)
except TypeError:
# If allow_unsafe_werkzeug is not a valid parameter, try without it
logger.info("Falling back to basic run method without allow_unsafe_werkzeug")
# Override the werkzeug safety check by setting an environment variable
os.environ['WERKZEUG_RUN_MAIN'] = 'true'
if ssl_mode:
socketio_server.run(
app,
host=listen_host,
port=int(port),
certfile='cert.pem',
keyfile='privkey.pem',
debug=False,
use_reloader=False
)
else:
socketio_server.run(
app,
host=listen_host,
port=int(port),
debug=False,
use_reloader=False
)
else:
logger.warning("Socket.IO server not initialized, falling back to standard WSGI server")
# Fallback to standard WSGI server if socketio_server is not available
listen_host = '0.0.0.0' if host == '' else host
# Launch using SocketIO run method for proper integration (if enabled)
if socketio_server:
if ssl_mode:
# Use Werkzeug's run_simple with SSL support
run_simple(
hostname=listen_host,
port=int(port),
application=app,
use_reloader=False,
use_debugger=False,
ssl_context=('cert.pem', 'privkey.pem')
)
socketio.run(app, host=host, port=int(port), debug=False,
certfile='cert.pem', keyfile='privkey.pem', allow_unsafe_werkzeug=True)
else:
# Use Werkzeug's run_simple for standard HTTP
run_simple(
hostname=listen_host,
port=int(port),
application=app,
use_reloader=False,
use_debugger=False
)
socketio.run(app, host=host, port=int(port), debug=False, allow_unsafe_werkzeug=True)
else:
# Run Flask app without Socket.IO if disabled
logger.info("Starting Flask app without Socket.IO server")
if ssl_mode:
app.run(host=host, port=int(port), debug=False,
ssl_context=('cert.pem', 'privkey.pem'))
else:
app.run(host=host, port=int(port), debug=False)
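After this change `main()` picks one of two launch paths: `socketio.run(...)` with `allow_unsafe_werkzeug=True` (plus `certfile`/`keyfile` in SSL mode) when a Socket.IO server exists, otherwise plain `app.run(...)` with an `ssl_context` tuple. The sketch below isolates that branching as a pure function so it can be checked without starting a server. `launch_plan` is hypothetical, and it assumes `socketio_server` is falsy whenever Socket.IO is disabled in the settings.

```python
from typing import Any, Dict, Tuple

CERT_FILES = ("cert.pem", "privkey.pem")  # file names used by main() above


def launch_plan(socketio_available: bool, ssl_mode: bool,
                host: str = "0.0.0.0", port: int = 5000) -> Tuple[str, Dict[str, Any]]:
    """Return (runner, kwargs) mirroring main()'s branches; hypothetical helper."""
    kwargs: Dict[str, Any] = {"host": host, "port": port, "debug": False}
    if socketio_available:
        # Flask-SocketIO may refuse to start the Werkzeug dev server outside debug use without this
        kwargs["allow_unsafe_werkzeug"] = True
        if ssl_mode:
            kwargs["certfile"], kwargs["keyfile"] = CERT_FILES
        return "socketio.run", kwargs
    if ssl_mode:
        kwargs["ssl_context"] = CERT_FILES  # Flask/Werkzeug accept a (cert, key) tuple
    return "app.run", kwargs
```

A caller would then dispatch with `socketio.run(app, **kwargs)` or `app.run(**kwargs)`.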
+4 -3
@@ -3,6 +3,7 @@ from changedetectionio.strtobool import strtobool
from flask_expects_json import expects_json
from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_handler
from flask_restful import abort, Resource
from flask import request, make_response
import validators
@@ -47,7 +48,7 @@ class Watch(Resource):
abort(404, message='No watch exists with the UUID of {}'.format(uuid))
if request.args.get('recheck'):
self.update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return "OK", 200
if request.args.get('paused', '') == 'paused':
self.datastore.data['watching'].get(uuid).pause()
@@ -236,7 +237,7 @@ class CreateWatch(Resource):
new_uuid = self.datastore.add_watch(url=url, extras=extras, tag=tags)
if new_uuid:
self.update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
return {'uuid': new_uuid}, 201
else:
return "Invalid or unsupported URL", 400
@@ -291,7 +292,7 @@ class CreateWatch(Resource):
if request.args.get('recheck_all'):
for uuid in self.datastore.data['watching'].keys():
self.update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return {'status': "OK"}, 200
return list, 200
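The API handlers now route every recheck through `worker_handler.queue_item_async_safe(self.update_q, item)` instead of calling `self.update_q.put(...)` directly. That helper's body is not part of this diff. Assuming the queue is an `asyncio` priority queue consumed by an event loop running in a background thread, a common way to enqueue from a Flask request thread is `loop.call_soon_threadsafe`, because `asyncio` queues are not thread-safe. `queue_item_threadsafe` and the local `PrioritizedItem` stand-in are hypothetical.

```python
import asyncio
import threading
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(order=True)
class PrioritizedItem:  # stand-in: only `priority` takes part in ordering
    priority: int
    item: Any = field(compare=False)


def queue_item_threadsafe(loop: asyncio.AbstractEventLoop,
                          q: "asyncio.PriorityQueue[PrioritizedItem]",
                          entry: PrioritizedItem) -> None:
    """Enqueue from a thread that is not running `loop` (e.g. a Flask request).
    The put is scheduled on the loop itself instead of touching the queue here."""
    loop.call_soon_threadsafe(q.put_nowait, entry)


# Demo: an event loop in a background thread plays the role of the worker loop.
loop = asyncio.new_event_loop()
loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
loop_thread.start()


async def _make_queue() -> "asyncio.PriorityQueue[PrioritizedItem]":
    return asyncio.PriorityQueue()  # created on the loop that will consume it


q = asyncio.run_coroutine_threadsafe(_make_queue(), loop).result(timeout=5)

# These calls come from the main thread, as an API handler's would.
queue_item_threadsafe(loop, q, PrioritizedItem(priority=5, item={"uuid": "low"}))
queue_item_threadsafe(loop, q, PrioritizedItem(priority=1, item={"uuid": "high"}))


async def _drain(n: int) -> List[str]:
    return [(await q.get()).item["uuid"] for _ in range(n)]


order = asyncio.run_coroutine_threadsafe(_drain(2), loop).result(timeout=5)
loop.call_soon_threadsafe(loop.stop)
loop_thread.join(timeout=5)
loop.close()
```

The lower `priority` value is served first, matching how the handlers above use `priority=1` for on-demand rechecks.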
+449
@@ -0,0 +1,449 @@
from .processors.exceptions import ProcessorException
import changedetectionio.content_fetchers.exceptions as content_fetchers_exceptions
from changedetectionio.processors.text_json_diff.processor import FilterNotFoundInResponse
from changedetectionio import html_tools
from changedetectionio.flask_app import watch_check_update
import asyncio
import importlib
import os
import time
from loguru import logger
# Async version of update_worker
# Processes jobs from AsyncSignalPriorityQueue instead of threaded queue
async def async_update_worker(worker_id, q, notification_q, app, datastore):
"""
Async worker function that processes watch check jobs from the queue.
Args:
worker_id: Unique identifier for this worker
q: AsyncSignalPriorityQueue containing jobs to process
notification_q: Standard queue for notifications
app: Flask application instance
datastore: Application datastore
"""
# Set a descriptive name for this task
task = asyncio.current_task()
if task:
task.set_name(f"async-worker-{worker_id}")
logger.info(f"Starting async worker {worker_id}")
while not app.config.exit.is_set():
update_handler = None
watch = None
try:
# Use asyncio wait_for to make queue.get() cancellable
queued_item_data = await asyncio.wait_for(q.get(), timeout=1.0)
except asyncio.TimeoutError:
# No jobs available, continue loop
continue
except Exception as e:
logger.error(f"Worker {worker_id} error getting queue item: {e}")
await asyncio.sleep(0.1)
continue
uuid = queued_item_data.item.get('uuid')
fetch_start_time = round(time.time())
# Mark this UUID as being processed
from changedetectionio import worker_handler
worker_handler.set_uuid_processing(uuid, processing=True)
try:
if uuid in list(datastore.data['watching'].keys()) and datastore.data['watching'][uuid].get('url'):
changed_detected = False
contents = b''
process_changedetection_results = True
update_obj = {}
# Clear last errors
datastore.data['watching'][uuid]['browser_steps_last_error_step'] = None
datastore.data['watching'][uuid]['last_checked'] = fetch_start_time
watch = datastore.data['watching'].get(uuid)
logger.info(f"Worker {worker_id} processing watch UUID {uuid} Priority {queued_item_data.priority} URL {watch['url']}")
try:
watch_check_update.send(watch_uuid=uuid)
# Processor is what we are using for detecting the "Change"
processor = watch.get('processor', 'text_json_diff')
# Init a new 'difference_detection_processor'
processor_module_name = f"changedetectionio.processors.{processor}.processor"
try:
processor_module = importlib.import_module(processor_module_name)
except ModuleNotFoundError as e:
print(f"Processor module '{processor}' not found.")
raise e
update_handler = processor_module.perform_site_check(datastore=datastore,
watch_uuid=uuid)
# All fetchers are now async, so call directly
await update_handler.call_browser()
# Run change detection (this is synchronous)
changed_detected, update_obj, contents = update_handler.run_changedetection(watch=watch)
except PermissionError as e:
logger.critical(f"File permission error updating file, watch: {uuid}")
logger.critical(str(e))
process_changedetection_results = False
except ProcessorException as e:
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot)
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data)
datastore.update_watch(uuid=uuid, update_obj={'last_error': e.message})
process_changedetection_results = False
except content_fetchers_exceptions.ReplyWithContentButNoText as e:
extra_help = ""
if e.has_filters:
has_img = html_tools.include_filters(include_filters='img',
html_content=e.html_content)
if has_img:
extra_help = ", it's possible that the filters you have give an empty result or contain only an image."
else:
extra_help = ", it's possible that the filters were found, but contained no usable text."
datastore.update_watch(uuid=uuid, update_obj={
'last_error': f"Got HTML content but no text found (With {e.status_code} reply code){extra_help}"
})
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data)
process_changedetection_results = False
except content_fetchers_exceptions.Non200ErrorCodeReceived as e:
if e.status_code == 403:
err_text = "Error - 403 (Access denied) received"
elif e.status_code == 404:
err_text = "Error - 404 (Page not found) received"
elif e.status_code == 407:
err_text = "Error - 407 (Proxy authentication required) received, did you need a username and password for the proxy?"
elif e.status_code == 500:
err_text = "Error - 500 (Internal server error) received from the web site"
else:
extra = ' (Access denied or blocked)' if str(e.status_code).startswith('4') else ''
err_text = f"Error - Request returned a HTTP error code {e.status_code}{extra}"
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data, as_error=True)
if e.page_text:
watch.save_error_text(contents=e.page_text)
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text})
process_changedetection_results = False
except FilterNotFoundInResponse as e:
if not datastore.data['watching'].get(uuid):
continue
err_text = "Warning, no filters were found, no change detection ran - Did the page change layout? update your Visual Filter if necessary."
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text})
# Filter wasnt found, but we should still update the visual selector so that they can have a chance to set it up again
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot)
if e.xpath_data:
watch.save_xpath_data(data=e.xpath_data)
# Only when enabled, send the notification
if watch.get('filter_failure_notification_send', False):
c = watch.get('consecutive_filter_failures', 0)
c += 1
# Send notification if we reached the threshold?
threshold = datastore.data['settings']['application'].get('filter_failure_notification_threshold_attempts', 0)
logger.debug(f"Filter for {uuid} not found, consecutive_filter_failures: {c} of threshold {threshold}")
if c >= threshold:
if not watch.get('notification_muted'):
logger.debug(f"Sending filter failed notification for {uuid}")
await send_filter_failure_notification(uuid, notification_q, datastore)
c = 0
logger.debug(f"Reset filter failure count back to zero")
datastore.update_watch(uuid=uuid, update_obj={'consecutive_filter_failures': c})
else:
logger.trace(f"{uuid} - filter_failure_notification_send not enabled, skipping")
process_changedetection_results = False
except content_fetchers_exceptions.checksumFromPreviousCheckWasTheSame as e:
# Yes fine, so nothing todo, don't continue to process.
process_changedetection_results = False
changed_detected = False
except content_fetchers_exceptions.BrowserConnectError as e:
datastore.update_watch(uuid=uuid,
update_obj={'last_error': e.msg})
process_changedetection_results = False
except content_fetchers_exceptions.BrowserFetchTimedOut as e:
datastore.update_watch(uuid=uuid,
update_obj={'last_error': e.msg})
process_changedetection_results = False
except content_fetchers_exceptions.BrowserStepsStepException as e:
if not datastore.data['watching'].get(uuid):
continue
error_step = e.step_n + 1
from playwright._impl._errors import TimeoutError, Error
# Generally enough info for TimeoutError (couldnt locate the element after default seconds)
err_text = f"Browser step at position {error_step} could not run, check the watch, add a delay if necessary, view Browser Steps to see screenshot at that step."
if e.original_e.name == "TimeoutError":
# Just the first line is enough, the rest is the stack trace
err_text += " Could not find the target."
else:
# Other Error, more info is good.
err_text += " " + str(e.original_e).splitlines()[0]
logger.debug(f"BrowserSteps exception at step {error_step} {str(e.original_e)}")
datastore.update_watch(uuid=uuid,
update_obj={'last_error': err_text,
'browser_steps_last_error_step': error_step})
if watch.get('filter_failure_notification_send', False):
c = watch.get('consecutive_filter_failures', 0)
c += 1
# Send notification if we reached the threshold?
threshold = datastore.data['settings']['application'].get('filter_failure_notification_threshold_attempts', 0)
logger.error(f"Step for {uuid} not found, consecutive_filter_failures: {c}")
if threshold > 0 and c >= threshold:
if not watch.get('notification_muted'):
await send_step_failure_notification(watch_uuid=uuid, step_n=e.step_n, notification_q=notification_q, datastore=datastore)
c = 0
datastore.update_watch(uuid=uuid, update_obj={'consecutive_filter_failures': c})
process_changedetection_results = False
except content_fetchers_exceptions.EmptyReply as e:
# Some kind of custom to-str handler in the exception handler that does this?
err_text = "EmptyReply - try increasing 'Wait seconds before extracting text', Status Code {}".format(e.status_code)
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text,
'last_check_status': e.status_code})
process_changedetection_results = False
except content_fetchers_exceptions.ScreenshotUnavailable as e:
err_text = "Screenshot unavailable, page did not render fully in the expected time or page was too long - try increasing 'Wait seconds before extracting text'"
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text,
'last_check_status': e.status_code})
process_changedetection_results = False
except content_fetchers_exceptions.JSActionExceptions as e:
err_text = "Error running JS Actions - Page request - "+e.message
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text,
'last_check_status': e.status_code})
process_changedetection_results = False
except content_fetchers_exceptions.PageUnloadable as e:
err_text = "Page request from server didnt respond correctly"
if e.message:
err_text = "{} - {}".format(err_text, e.message)
if e.screenshot:
watch.save_screenshot(screenshot=e.screenshot, as_error=True)
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text,
'last_check_status': e.status_code,
'has_ldjson_price_data': None})
process_changedetection_results = False
except content_fetchers_exceptions.BrowserStepsInUnsupportedFetcher as e:
err_text = "This watch has Browser Steps configured and so it cannot run with the 'Basic fast Plaintext/HTTP Client', either remove the Browser Steps or select a Chrome fetcher."
datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text})
process_changedetection_results = False
logger.error(f"Exception (BrowserStepsInUnsupportedFetcher) reached processing watch UUID: {uuid}")
except Exception as e:
logger.error(f"Worker {worker_id} exception processing watch UUID: {uuid}")
logger.error(str(e))
datastore.update_watch(uuid=uuid, update_obj={'last_error': "Exception: " + str(e)})
process_changedetection_results = False
else:
if not datastore.data['watching'].get(uuid):
continue
update_obj['content-type'] = update_handler.fetcher.get_all_headers().get('content-type', '').lower()
if not watch.get('ignore_status_codes'):
update_obj['consecutive_filter_failures'] = 0
update_obj['last_error'] = False
cleanup_error_artifacts(uuid, datastore)
if not datastore.data['watching'].get(uuid):
continue
if process_changedetection_results:
# Extract title if needed
if datastore.data['settings']['application'].get('extract_title_as_title') or watch['extract_title_as_title']:
if not watch['title'] or not len(watch['title']):
try:
update_obj['title'] = html_tools.extract_element(find='title', html_content=update_handler.fetcher.content)
logger.info(f"UUID: {uuid} Extract <title> updated title to '{update_obj['title']}")
except Exception as e:
logger.warning(f"UUID: {uuid} Extract <title> as watch title was enabled, but couldn't find a <title>.")
try:
datastore.update_watch(uuid=uuid, update_obj=update_obj)
if changed_detected or not watch.history_n:
if update_handler.screenshot:
watch.save_screenshot(screenshot=update_handler.screenshot)
if update_handler.xpath_data:
watch.save_xpath_data(data=update_handler.xpath_data)
# Ensure unique timestamp for history
if watch.newest_history_key and int(fetch_start_time) == int(watch.newest_history_key):
logger.warning(f"Timestamp {fetch_start_time} already exists, waiting 1 seconds")
fetch_start_time += 1
await asyncio.sleep(1)
watch.save_history_text(contents=contents,
timestamp=int(fetch_start_time),
snapshot_id=update_obj.get('previous_md5', 'none'))
empty_pages_are_a_change = datastore.data['settings']['application'].get('empty_pages_are_a_change', False)
if update_handler.fetcher.content or (not update_handler.fetcher.content and empty_pages_are_a_change):
watch.save_last_fetched_html(contents=update_handler.fetcher.content, timestamp=int(fetch_start_time))
# Send notifications on second+ check
if watch.history_n >= 2:
logger.info(f"Change detected in UUID {uuid} - {watch['url']}")
if not watch.get('notification_muted'):
await send_content_changed_notification(uuid, notification_q, datastore)
except Exception as e:
logger.critical(f"Worker {worker_id} exception in process_changedetection_results")
logger.critical(str(e))
datastore.update_watch(uuid=uuid, update_obj={'last_error': str(e)})
# Always record attempt count
count = watch.get('check_count', 0) + 1
# Record server header
try:
server_header = update_handler.fetcher.headers.get('server', '').strip().lower()[:255]
datastore.update_watch(uuid=uuid, update_obj={'remote_server_reply': server_header})
except Exception as e:
pass
datastore.update_watch(uuid=uuid, update_obj={'fetch_time': round(time.time() - fetch_start_time, 3),
'check_count': count})
except Exception as e:
logger.error(f"Worker {worker_id} unexpected error processing {uuid}: {e}")
logger.error(f"Worker {worker_id} traceback:", exc_info=True)
# Also update the watch with error information
if datastore and uuid in datastore.data['watching']:
datastore.update_watch(uuid=uuid, update_obj={'last_error': f"Worker error: {str(e)}"})
finally:
# Always cleanup - this runs whether there was an exception or not
if uuid:
try:
# Mark UUID as no longer being processed
worker_handler.set_uuid_processing(uuid, processing=False)
# Send completion signal
if watch:
#logger.info(f"Worker {worker_id} sending completion signal for UUID {watch['uuid']}")
watch_check_update.send(watch_uuid=watch['uuid'])
update_handler = None
logger.debug(f"Worker {worker_id} completed watch {uuid} in {time.time()-fetch_start_time:.2f}s")
except Exception as cleanup_error:
logger.error(f"Worker {worker_id} error during cleanup: {cleanup_error}")
# Brief pause before continuing to avoid tight error loops (only on error)
if 'e' in locals():
await asyncio.sleep(1.0)
else:
# Small yield for normal completion
await asyncio.sleep(0.01)
# Check if we should exit
if app.config.exit.is_set():
break
# Check if we're in pytest environment - if so, be more gentle with logging
import sys
in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
if not in_pytest:
logger.info(f"Worker {worker_id} shutting down")
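The worker loop above decides between a 1.0 s and a 0.01 s pause with `if 'e' in locals()`. In Python 3, the name bound by `except ... as e` is deleted when that handler finishes, so by the time the `finally` block runs this check generally will not detect an earlier error. Below is a minimal sketch of the same loop shape that uses an explicit error flag instead. It assumes an `asyncio.Queue` and an `asyncio.Event` stand in for the real update queue and `app.config.exit`. The names `worker_loop` and `handle` are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def worker_loop(queue: asyncio.Queue,
                      exit_event: asyncio.Event,
                      handle: Callable[[Any], Awaitable[None]]) -> int:
    """Sketch of a worker main loop: fetch an item, process it, always clean up,
    back off longer after an error, then check whether to shut down."""
    processed = 0
    while not exit_event.is_set():
        try:
            # Short timeout so the exit flag is re-checked even when the queue is idle
            item = await asyncio.wait_for(queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            continue
        had_error = False  # explicit flag instead of probing locals() for 'e'
        try:
            await handle(item)
            processed += 1
        except Exception:
            had_error = True
        finally:
            queue.task_done()
        # Pause longer after an error to avoid a tight failure loop; otherwise just yield
        await asyncio.sleep(1.0 if had_error else 0.01)
    return processed
```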
def cleanup_error_artifacts(uuid, datastore):
"""Helper function to clean up error artifacts"""
cleanup_files = ["last-error-screenshot.png", "last-error.txt"]
for f in cleanup_files:
full_path = os.path.join(datastore.datastore_path, uuid, f)
if os.path.isfile(full_path):
os.unlink(full_path)
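`cleanup_error_artifacts()` checks `os.path.isfile()` and then calls `os.unlink()`, which leaves a small window where the file can disappear between the two calls. A minimal `pathlib` variant is sketched below. It takes the datastore path as a plain string rather than the datastore object, and the name `cleanup_error_artifacts_path` is hypothetical.

```python
from pathlib import Path

# File names taken from cleanup_error_artifacts() above
ERROR_ARTIFACTS = ("last-error-screenshot.png", "last-error.txt")


def cleanup_error_artifacts_path(datastore_path: str, uuid: str) -> None:
    """Delete a watch's error artifacts if they exist (hypothetical pathlib variant)."""
    watch_dir = Path(datastore_path) / uuid
    for name in ERROR_ARTIFACTS:
        # missing_ok=True (Python 3.8+) replaces the separate isfile() check,
        # so a file removed between check and unlink cannot raise FileNotFoundError
        (watch_dir / name).unlink(missing_ok=True)
```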
async def send_content_changed_notification(watch_uuid, notification_q, datastore):
"""Helper function to queue notifications using the new notification service"""
try:
from changedetectionio.notification_service import create_notification_service
# Create notification service instance
notification_service = create_notification_service(datastore, notification_q)
notification_service.send_content_changed_notification(watch_uuid)
except Exception as e:
logger.error(f"Error sending notification for {watch_uuid}: {e}")
async def send_filter_failure_notification(watch_uuid, notification_q, datastore):
"""Helper function to send filter failure notifications using the new notification service"""
try:
from changedetectionio.notification_service import create_notification_service
# Create notification service instance
notification_service = create_notification_service(datastore, notification_q)
notification_service.send_filter_failure_notification(watch_uuid)
except Exception as e:
logger.error(f"Error sending filter failure notification for {watch_uuid}: {e}")
async def send_step_failure_notification(watch_uuid, step_n, notification_q, datastore):
"""Helper function to send step failure notifications using the new notification service"""
try:
from changedetectionio.notification_service import create_notification_service
# Create notification service instance
notification_service = create_notification_service(datastore, notification_q)
notification_service.send_step_failure_notification(watch_uuid, step_n)
except Exception as e:
logger.error(f"Error sending step failure notification for {watch_uuid}: {e}")
@@ -25,35 +25,53 @@ io_interface_context = None
import json
import hashlib
from flask import Response
import asyncio
import threading
def run_async_in_browser_loop(coro):
"""Run async coroutine using the existing async worker event loop"""
from changedetectionio import worker_handler
# Use the existing async worker event loop instead of creating a new one
if worker_handler.USE_ASYNC_WORKERS and worker_handler.async_loop and not worker_handler.async_loop.is_closed():
logger.debug("Browser steps using existing async worker event loop")
future = asyncio.run_coroutine_threadsafe(coro, worker_handler.async_loop)
return future.result()
else:
# Fallback: create a new event loop (for sync workers or if async loop not available)
logger.debug("Browser steps creating temporary event loop")
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
return loop.run_until_complete(coro)
finally:
loop.close()
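`run_async_in_browser_loop()` hands the coroutine to the shared worker loop with `asyncio.run_coroutine_threadsafe()` and blocks on `future.result()`. Two caveats are worth keeping in mind. First, if the loop exists and is not closed but is also not running, the future never completes and `result()` blocks indefinitely. Second, calling this from the loop's own thread would deadlock. Below is a minimal sketch that also checks `is_running()`. The helper name `run_coro_blocking` is hypothetical.

```python
import asyncio
import threading


def run_coro_blocking(coro, loop=None):
    """Run `coro` to completion from synchronous code (hypothetical helper).

    If `loop` is running in another thread, submit the coroutine to it and block
    on the result; otherwise run it on a short-lived private loop.
    """
    if loop is not None and loop.is_running() and not loop.is_closed():
        # Thread-safe hand-off; must NOT be called from the loop's own thread
        return asyncio.run_coroutine_threadsafe(coro, loop).result()
    tmp_loop = asyncio.new_event_loop()
    try:
        return tmp_loop.run_until_complete(coro)
    finally:
        tmp_loop.close()
```

The `threading` import is only needed when a background loop is started, as in a typical usage where `bg_loop.run_forever` runs in a `threading.Thread`.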
def construct_blueprint(datastore: ChangeDetectionStore):
browser_steps_blueprint = Blueprint('browser_steps', __name__, template_folder="templates")
def start_browsersteps_session(watch_uuid):
from . import nonContext
async def start_browsersteps_session(watch_uuid):
from . import browser_steps
import time
global io_interface_context
from playwright.async_api import async_playwright
# We keep the playwright session open for many minutes
keepalive_seconds = int(os.getenv('BROWSERSTEPS_MINUTES_KEEPALIVE', 10)) * 60
browsersteps_start_session = {'start_time': time.time()}
# You can only have one of these running
# This should be very fine to leave running for the life of the application
# @idea - Make it global so the pool of watch fetchers can use it also
if not io_interface_context:
io_interface_context = nonContext.c_sync_playwright()
# Start the Playwright context, which is actually a nodejs sub-process and communicates over STDIN/STDOUT pipes
io_interface_context = io_interface_context.start()
# Create a new async playwright instance for browser steps
playwright_instance = async_playwright()
playwright_context = await playwright_instance.start()
keepalive_ms = ((keepalive_seconds + 3) * 1000)
base_url = os.getenv('PLAYWRIGHT_DRIVER_URL', '').strip('"')
a = "?" if not '?' in base_url else '&'
base_url += a + f"timeout={keepalive_ms}"
browsersteps_start_session['browser'] = io_interface_context.chromium.connect_over_cdp(base_url)
browser = await playwright_context.chromium.connect_over_cdp(base_url, timeout=keepalive_ms)
browsersteps_start_session['browser'] = browser
browsersteps_start_session['playwright_context'] = playwright_context
proxy_id = datastore.get_preferred_proxy_for_watch(uuid=watch_uuid)
proxy = None
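The driver URL gets a `timeout` query parameter equal to the keepalive plus a 3-second margin. With the default `BROWSERSTEPS_MINUTES_KEEPALIVE=10`, that is `(10 * 60 + 3) * 1000 = 603000` ms. The `?`/`&` string check above works. Below is a sketch of an alternative that parses the existing query string instead. The function name `with_timeout_param` is hypothetical, and note that `urlencode()` may re-encode existing values slightly differently from the input.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_timeout_param(base_url: str, keepalive_ms: int) -> str:
    """Append timeout=<ms> to a driver URL, preserving any existing query string."""
    parts = urlsplit(base_url.strip('"'))  # env values are sometimes wrapped in quotes
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("timeout", str(keepalive_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```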
@@ -75,15 +93,20 @@ def construct_blueprint(datastore: ChangeDetectionStore):
logger.debug(f"Browser Steps: UUID {watch_uuid} selected proxy {proxy_url}")
# Tell Playwright to connect to Chrome and setup a new session via our stepper interface
browsersteps_start_session['browserstepper'] = browser_steps.browsersteps_live_ui(
playwright_browser=browsersteps_start_session['browser'],
browserstepper = browser_steps.browsersteps_live_ui(
playwright_browser=browser,
proxy=proxy,
start_url=datastore.data['watching'][watch_uuid].link,
headers=datastore.data['watching'][watch_uuid].get('headers')
)
# Initialize the async connection
await browserstepper.connect(proxy=proxy)
browsersteps_start_session['browserstepper'] = browserstepper
# For test
#browsersteps_start_session['browserstepper'].action_goto_url(value="http://example.com?time="+str(time.time()))
#await browsersteps_start_session['browserstepper'].action_goto_url(value="http://example.com?time="+str(time.time()))
return browsersteps_start_session
@@ -92,7 +115,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
@browser_steps_blueprint.route("/browsersteps_start_session", methods=['GET'])
def browsersteps_start_session():
# A new session was requested, return sessionID
import asyncio
import uuid
browsersteps_session_id = str(uuid.uuid4())
watch_uuid = request.args.get('uuid')
@@ -104,7 +127,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
logger.debug("browser_steps.py connecting")
try:
browsersteps_sessions[browsersteps_session_id] = start_browsersteps_session(watch_uuid)
# Run the async function in the dedicated browser steps event loop
browsersteps_sessions[browsersteps_session_id] = run_async_in_browser_loop(
start_browsersteps_session(watch_uuid)
)
except Exception as e:
if 'ECONNREFUSED' in str(e):
return make_response('Unable to start the Playwright Browser session, is sockpuppetbrowser running? Network configuration is OK?', 401)
@@ -169,9 +195,14 @@ def construct_blueprint(datastore: ChangeDetectionStore):
is_last_step = strtobool(request.form.get('is_last_step'))
try:
browsersteps_sessions[browsersteps_session_id]['browserstepper'].call_action(action_name=step_operation,
selector=step_selector,
optional_value=step_optional_value)
# Run the async call_action method in the dedicated browser steps event loop
run_async_in_browser_loop(
browsersteps_sessions[browsersteps_session_id]['browserstepper'].call_action(
action_name=step_operation,
selector=step_selector,
optional_value=step_optional_value
)
)
except Exception as e:
logger.error(f"Exception when calling step operation {step_operation} {str(e)}")
@@ -185,7 +216,11 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# Screenshots and other info only needed on requesting a step (POST)
try:
(screenshot, xpath_data) = browsersteps_sessions[browsersteps_session_id]['browserstepper'].get_current_state()
# Run the async get_current_state method in the dedicated browser steps event loop
(screenshot, xpath_data) = run_async_in_browser_loop(
browsersteps_sessions[browsersteps_session_id]['browserstepper'].get_current_state()
)
if is_last_step:
watch = datastore.data['watching'].get(uuid)
u = browsersteps_sessions[browsersteps_session_id]['browserstepper'].page.url
@@ -199,7 +234,6 @@ def construct_blueprint(datastore: ChangeDetectionStore):
return make_response("Error fetching screenshot and element data - " + str(e), 401)
# SEND THIS BACK TO THE BROWSER
output = {
"screenshot": f"data:image/jpeg;base64,{base64.b64encode(screenshot).decode('ascii')}",
"xpath_data": xpath_data,
@@ -63,7 +63,7 @@ class steppable_browser_interface():
self.start_url = start_url
# Convert and perform "Click Button" for example
def call_action(self, action_name, selector=None, optional_value=None):
async def call_action(self, action_name, selector=None, optional_value=None):
if self.page is None:
logger.warning("Cannot call action on None page object")
return
@@ -93,73 +93,74 @@ class steppable_browser_interface():
optional_value = jinja_render(template_str=optional_value)
action_handler(selector, optional_value)
await action_handler(selector, optional_value)
# Safely wait for timeout
self.page.wait_for_timeout(1.5 * 1000)
await self.page.wait_for_timeout(1.5 * 1000)
logger.debug(f"Call action done in {time.time()-now:.2f}s")
def action_goto_url(self, selector=None, value=None):
async def action_goto_url(self, selector=None, value=None):
if not value:
logger.warning("No URL provided for goto_url action")
return None
now = time.time()
response = self.page.goto(value, timeout=0, wait_until='load')
response = await self.page.goto(value, timeout=0, wait_until='load')
logger.debug(f"Time to goto URL {time.time()-now:.2f}s")
return response
# In case they request to go back to the start
def action_goto_site(self, selector=None, value=None):
return self.action_goto_url(value=self.start_url)
async def action_goto_site(self, selector=None, value=None):
return await self.action_goto_url(value=self.start_url)
def action_click_element_containing_text(self, selector=None, value=''):
async def action_click_element_containing_text(self, selector=None, value=''):
logger.debug("Clicking element containing text")
if not value or not len(value.strip()):
return
elem = self.page.get_by_text(value)
if elem.count():
elem.first.click(delay=randint(200, 500), timeout=self.action_timeout)
if await elem.count():
await elem.first.click(delay=randint(200, 500), timeout=self.action_timeout)
def action_click_element_containing_text_if_exists(self, selector=None, value=''):
async def action_click_element_containing_text_if_exists(self, selector=None, value=''):
logger.debug("Clicking element containing text if exists")
if not value or not len(value.strip()):
return
elem = self.page.get_by_text(value)
logger.debug(f"Clicking element containing text - {elem.count()} elements found")
if elem.count():
elem.first.click(delay=randint(200, 500), timeout=self.action_timeout)
count = await elem.count()
logger.debug(f"Clicking element containing text - {count} elements found")
if count:
await elem.first.click(delay=randint(200, 500), timeout=self.action_timeout)
def action_enter_text_in_field(self, selector, value):
async def action_enter_text_in_field(self, selector, value):
if not selector or not len(selector.strip()):
return
self.page.fill(selector, value, timeout=self.action_timeout)
await self.page.fill(selector, value, timeout=self.action_timeout)
def action_execute_js(self, selector, value):
async def action_execute_js(self, selector, value):
if not value:
return None
return self.page.evaluate(value)
return await self.page.evaluate(value)
def action_click_element(self, selector, value):
async def action_click_element(self, selector, value):
logger.debug("Clicking element")
if not selector or not len(selector.strip()):
return
self.page.click(selector=selector, timeout=self.action_timeout + 20 * 1000, delay=randint(200, 500))
await self.page.click(selector=selector, timeout=self.action_timeout + 20 * 1000, delay=randint(200, 500))
def action_click_element_if_exists(self, selector, value):
async def action_click_element_if_exists(self, selector, value):
import playwright._impl._errors as _api_types
logger.debug("Clicking element if exists")
if not selector or not len(selector.strip()):
return
try:
self.page.click(selector, timeout=self.action_timeout, delay=randint(200, 500))
await self.page.click(selector, timeout=self.action_timeout, delay=randint(200, 500))
except _api_types.TimeoutError:
return
except _api_types.Error:
@@ -167,7 +168,7 @@ class steppable_browser_interface():
return
def action_click_x_y(self, selector, value):
async def action_click_x_y(self, selector, value):
if not value or not re.match(r'^\s?\d+\s?,\s?\d+\s?$', value):
logger.warning("'Click X,Y' step should be in the format of '100 , 90'")
return
@@ -177,42 +178,42 @@ class steppable_browser_interface():
x = int(float(x.strip()))
y = int(float(y.strip()))
self.page.mouse.click(x=x, y=y, delay=randint(200, 500))
await self.page.mouse.click(x=x, y=y, delay=randint(200, 500))
except Exception as e:
logger.error(f"Error parsing x,y coordinates: {str(e)}")
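`action_click_x_y()` accepts values like `100 , 90`. The regex allows only whole numbers with at most one whitespace character on either side of each part, so `int(float(...))` never actually sees a decimal. Below is a standalone sketch of the parsing step. The hunk elides the split itself, so `value.split(',', 1)` is an assumption, and the name `parse_click_xy` is hypothetical.

```python
import re

# Same pattern as action_click_x_y(): digits, a comma, digits, with at most one
# optional whitespace character around each part
CLICK_XY_RE = re.compile(r'^\s?\d+\s?,\s?\d+\s?$')


def parse_click_xy(value):
    """Return (x, y) ints for a 'Click X,Y' value such as '100 , 90', else None."""
    if not value or not CLICK_XY_RE.match(value):
        return None
    x, y = value.split(',', 1)
    return int(float(x.strip())), int(float(y.strip()))
```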
def action__select_by_option_text(self, selector, value):
async def action__select_by_option_text(self, selector, value):
if not selector or not len(selector.strip()):
return
self.page.select_option(selector, label=value, timeout=self.action_timeout)
await self.page.select_option(selector, label=value, timeout=self.action_timeout)
def action_scroll_down(self, selector, value):
async def action_scroll_down(self, selector, value):
# For some reason this doesn't work on some sites
self.page.mouse.wheel(0, 600)
self.page.wait_for_timeout(1000)
await self.page.mouse.wheel(0, 600)
await self.page.wait_for_timeout(1000)
def action_wait_for_seconds(self, selector, value):
async def action_wait_for_seconds(self, selector, value):
try:
seconds = float(value.strip()) if value else 1.0
self.page.wait_for_timeout(seconds * 1000)
await self.page.wait_for_timeout(seconds * 1000)
except (ValueError, TypeError) as e:
logger.error(f"Invalid value for wait_for_seconds: {str(e)}")
def action_wait_for_text(self, selector, value):
async def action_wait_for_text(self, selector, value):
if not value:
return
import json
v = json.dumps(value)
self.page.wait_for_function(
await self.page.wait_for_function(
f'document.querySelector("body").innerText.includes({v});',
timeout=30000
)
def action_wait_for_text_in_element(self, selector, value):
async def action_wait_for_text_in_element(self, selector, value):
if not selector or not value:
return
@@ -220,49 +221,49 @@ class steppable_browser_interface():
s = json.dumps(selector)
v = json.dumps(value)
self.page.wait_for_function(
await self.page.wait_for_function(
f'document.querySelector({s}).innerText.includes({v});',
timeout=30000
)
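Both `action_wait_for_text()` and `action_wait_for_text_in_element()` pass the selector and text through `json.dumps()` before embedding them in the JavaScript predicate. This works because a JSON string is also a valid JavaScript string literal, so quotes or backslashes in user input do not terminate the string early. Below is a sketch of the predicate construction on its own. The helper name `wait_for_text_js` is hypothetical.

```python
import json


def wait_for_text_js(value, selector=None):
    """Build the predicate string passed to page.wait_for_function() (hypothetical helper)."""
    v = json.dumps(value)
    # Fall back to <body>, as action_wait_for_text() does
    target = json.dumps(selector) if selector else '"body"'
    return f'document.querySelector({target}).innerText.includes({v});'
```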
# @todo - in the future make some popout interface to capture what needs to be set
# https://playwright.dev/python/docs/api/class-keyboard
def action_press_enter(self, selector, value):
self.page.keyboard.press("Enter", delay=randint(200, 500))
async def action_press_enter(self, selector, value):
await self.page.keyboard.press("Enter", delay=randint(200, 500))
def action_press_page_up(self, selector, value):
self.page.keyboard.press("PageUp", delay=randint(200, 500))
async def action_press_page_up(self, selector, value):
await self.page.keyboard.press("PageUp", delay=randint(200, 500))
def action_press_page_down(self, selector, value):
self.page.keyboard.press("PageDown", delay=randint(200, 500))
async def action_press_page_down(self, selector, value):
await self.page.keyboard.press("PageDown", delay=randint(200, 500))
def action_check_checkbox(self, selector, value):
async def action_check_checkbox(self, selector, value):
if not selector:
return
self.page.locator(selector).check(timeout=self.action_timeout)
await self.page.locator(selector).check(timeout=self.action_timeout)
def action_uncheck_checkbox(self, selector, value):
async def action_uncheck_checkbox(self, selector, value):
if not selector:
return
self.page.locator(selector).uncheck(timeout=self.action_timeout)
await self.page.locator(selector).uncheck(timeout=self.action_timeout)
def action_remove_elements(self, selector, value):
async def action_remove_elements(self, selector, value):
"""Removes all elements matching the given selector from the DOM."""
if not selector:
return
self.page.locator(selector).evaluate_all("els => els.forEach(el => el.remove())")
await self.page.locator(selector).evaluate_all("els => els.forEach(el => el.remove())")
def action_make_all_child_elements_visible(self, selector, value):
async def action_make_all_child_elements_visible(self, selector, value):
"""Recursively makes all child elements inside the given selector fully visible."""
if not selector:
return
self.page.locator(selector).locator("*").evaluate_all("""
await self.page.locator(selector).locator("*").evaluate_all("""
els => els.forEach(el => {
el.style.display = 'block'; // Forces it to be displayed
el.style.visibility = 'visible'; // Ensures it's not hidden
@@ -307,21 +308,22 @@ class browsersteps_live_ui(steppable_browser_interface):
self.playwright_browser = playwright_browser
self.start_url = start_url
self._is_cleaned_up = False
if self.context is None:
self.connect(proxy=proxy)
self.proxy = proxy
# Note: connect() is now async and must be called separately
def __del__(self):
# Ensure cleanup happens if object is garbage collected
self.cleanup()
# Note: cleanup is now async, so we can only mark as cleaned up here
self._is_cleaned_up = True
# Connect and setup a new context
def connect(self, proxy=None):
async def connect(self, proxy=None):
# Should only get called once - test that
keep_open = 1000 * 60 * 5
now = time.time()
# @todo handle multiple contexts, bind a unique id from the browser on each req?
self.context = self.playwright_browser.new_context(
self.context = await self.playwright_browser.new_context(
accept_downloads=False, # Should never be needed
bypass_csp=True, # This is needed to enable JavaScript execution on GitHub and others
extra_http_headers=self.headers,
@@ -332,7 +334,7 @@ class browsersteps_live_ui(steppable_browser_interface):
user_agent=manage_user_agent(headers=self.headers),
)
self.page = self.context.new_page()
self.page = await self.context.new_page()
# self.page.set_default_navigation_timeout(keep_open)
self.page.set_default_timeout(keep_open)
@@ -342,13 +344,15 @@ class browsersteps_live_ui(steppable_browser_interface):
self.page.on("console", lambda msg: print(f"Browser steps console - {msg.type}: {msg.text} {msg.args}"))
logger.debug(f"Time to browser setup {time.time()-now:.2f}s")
self.page.wait_for_timeout(1 * 1000)
await self.page.wait_for_timeout(1 * 1000)
def mark_as_closed(self):
logger.debug("Page closed, cleaning up..")
self.cleanup()
# Note: This is called from a sync context (event handler)
# so we'll just mark as cleaned up and let __del__ handle the rest
self._is_cleaned_up = True
def cleanup(self):
async def cleanup(self):
"""Properly clean up all resources to prevent memory leaks"""
if self._is_cleaned_up:
return
@@ -359,7 +363,7 @@ class browsersteps_live_ui(steppable_browser_interface):
if hasattr(self, 'page') and self.page is not None:
try:
# Force garbage collection before closing
self.page.request_gc()
await self.page.request_gc()
except Exception as e:
logger.debug(f"Error during page garbage collection: {str(e)}")
@@ -370,7 +374,7 @@ class browsersteps_live_ui(steppable_browser_interface):
logger.debug(f"Error removing event listeners: {str(e)}")
try:
self.page.close()
await self.page.close()
except Exception as e:
logger.debug(f"Error closing page: {str(e)}")
@@ -379,7 +383,7 @@ class browsersteps_live_ui(steppable_browser_interface):
# Clean up context
if hasattr(self, 'context') and self.context is not None:
try:
self.context.close()
await self.context.close()
except Exception as e:
logger.debug(f"Error closing context: {str(e)}")
@@ -401,12 +405,12 @@ class browsersteps_live_ui(steppable_browser_interface):
return False
def get_current_state(self):
async def get_current_state(self):
"""Return the screenshot and interactive elements mapping, generally always called after action_()"""
import importlib.resources
import json
# Because for now we only run browser steps in Playwright mode (not Puppeteer mode)
from changedetectionio.content_fetchers.playwright import capture_full_page
from changedetectionio.content_fetchers.playwright import capture_full_page_async
# Safety check - don't proceed if resources are cleaned up
if self._is_cleaned_up or self.page is None:
@@ -416,29 +420,29 @@ class browsersteps_live_ui(steppable_browser_interface):
xpath_element_js = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text()
now = time.time()
self.page.wait_for_timeout(1 * 1000)
await self.page.wait_for_timeout(1 * 1000)
screenshot = None
xpath_data = None
try:
# Get screenshot first
screenshot = capture_full_page(page=self.page)
screenshot = await capture_full_page_async(page=self.page)
logger.debug(f"Time to get screenshot from browser {time.time() - now:.2f}s")
# Then get interactive elements
now = time.time()
self.page.evaluate("var include_filters=''")
self.page.request_gc()
await self.page.evaluate("var include_filters=''")
await self.page.request_gc()
scan_elements = 'a,button,input,select,textarea,i,th,td,p,li,h1,h2,h3,h4,div,span'
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
xpath_data = json.loads(self.page.evaluate(xpath_element_js, {
xpath_data = json.loads(await self.page.evaluate(xpath_element_js, {
"visualselector_xpath_selectors": scan_elements,
"max_height": MAX_TOTAL_HEIGHT
}))
self.page.request_gc()
await self.page.request_gc()
# Sort elements by size
xpath_data['size_pos'] = sorted(xpath_data['size_pos'], key=lambda k: k['width'] * k['height'], reverse=True)
@@ -448,13 +452,13 @@ class browsersteps_live_ui(steppable_browser_interface):
logger.error(f"Error getting current state: {str(e)}")
# Attempt recovery - force garbage collection
try:
self.page.request_gc()
await self.page.request_gc()
except:
pass
# Request garbage collection one final time
try:
self.page.request_gc()
await self.page.request_gc()
except:
pass
@@ -1,17 +0,0 @@
from playwright.sync_api import PlaywrightContextManager
# So playwright wants to run as a context manager, but we do something horrible and hacky
# we are holding the session open for as long as possible, then shutting it down, and opening a new one
# This means we don't get to use PlaywrightContextManager's __enter__/__exit__
# To work around this, make goodbye() act the same as the __exit__()
#
# But actually I think this is because the context is opened correctly with __enter__() but we time out the connection,
# then there's some lock condition where we can't destroy it without it hanging
class c_PlaywrightContextManager(PlaywrightContextManager):
def goodbye(self) -> None:
self.__exit__()
def c_sync_playwright() -> PlaywrightContextManager:
return c_PlaywrightContextManager()
@@ -1,6 +1,7 @@
from flask import Blueprint, request, redirect, url_for, flash, render_template
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio import worker_handler
from changedetectionio.blueprint.imports.importer import (
import_url_list,
import_distill_io_json,
@@ -24,7 +25,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
importer_handler = import_url_list()
importer_handler.run(data=request.values.get('urls'), flash=flash, datastore=datastore, processor=request.values.get('processor', 'text_json_diff'))
for uuid in importer_handler.new_uuids:
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
if len(importer_handler.remaining_data) == 0:
return redirect(url_for('watchlist.index'))
@@ -37,7 +38,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
d_importer = import_distill_io_json()
d_importer.run(data=request.values.get('distill-io'), flash=flash, datastore=datastore)
for uuid in d_importer.new_uuids:
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# XLSX importer
if request.files and request.files.get('xlsx_file'):
@@ -60,7 +61,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
w_importer.run(data=file, flash=flash, datastore=datastore)
for uuid in w_importer.new_uuids:
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Could be some remaining, or we could be on GET
form = forms.importForm(formdata=request.form if request.method == 'POST' else None)
@@ -4,6 +4,7 @@ from flask import Blueprint, flash, redirect, url_for
from flask_login import login_required
from changedetectionio.store import ChangeDetectionStore
from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_handler
from queue import PriorityQueue
PRICE_DATA_TRACK_ACCEPT = 'accepted'
@@ -19,7 +20,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q: PriorityQueue
datastore.data['watching'][uuid]['track_ldjson_price_data'] = PRICE_DATA_TRACK_ACCEPT
datastore.data['watching'][uuid]['processor'] = 'restock_diff'
datastore.data['watching'][uuid].clear_watch()
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return redirect(url_for("watchlist.index"))
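Each `update_q.put(...)` in this diff now goes through `worker_handler.queue_item_async_safe(...)`, whose body is not shown here. The ordering semantics come from the queue items themselves. `queue.PriorityQueue` is a min-heap, so lower `priority` values are dequeued first, and a manual recheck (`priority=1`) jumps ahead of a freshly cloned watch (`priority=5`). Below is a minimal sketch. It assumes `PrioritizedItem` is a dataclass ordered on `priority` only. The payload is excluded from comparison, because two items with equal priority would otherwise compare their dicts and raise `TypeError`.

```python
from dataclasses import dataclass, field
from queue import PriorityQueue
from typing import Any


@dataclass(order=True)
class PrioritizedItem:
    """Hypothetical stand-in for queuedWatchMetaData.PrioritizedItem."""
    priority: int
    item: Any = field(compare=False)  # payload (e.g. {'uuid': ...}) is never compared
```

Usage: put a `priority=5` item and then a `priority=1` item into a `PriorityQueue()`; the `priority=1` item comes out first.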
@login_required
@@ -67,7 +67,32 @@ def construct_blueprint(datastore: ChangeDetectionStore):
del (app_update['password'])
datastore.data['settings']['application'].update(app_update)
# Handle dynamic worker count adjustment
old_worker_count = datastore.data['settings']['requests'].get('workers', 1)
new_worker_count = form.data['requests'].get('workers', 1)
datastore.data['settings']['requests'].update(form.data['requests'])
# Adjust worker count if it changed
if new_worker_count != old_worker_count:
from changedetectionio import worker_handler
from changedetectionio.flask_app import update_q, notification_q, app, datastore as ds
result = worker_handler.adjust_async_worker_count(
new_count=new_worker_count,
update_q=update_q,
notification_q=notification_q,
app=app,
datastore=ds
)
if result['status'] == 'success':
flash(f"Worker count adjusted: {result['message']}", 'notice')
elif result['status'] == 'not_supported':
flash("Dynamic worker adjustment not supported for sync workers", 'warning')
elif result['status'] == 'error':
flash(f"Error adjusting workers: {result['message']}", 'error')
if not os.getenv("SALTED_PASS", False) and len(form.application.form.password.encrypted_password):
datastore.data['settings']['application']['password'] = form.application.form.password.encrypted_password
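The settings handler relies only on `worker_handler.adjust_async_worker_count()` returning a dict with `status` and `message` keys. The function's body is not part of this diff. Below is a sketch of one way to grow or shrink a pool of asyncio worker tasks in place while returning that shape. The names `adjust_worker_count` and `worker_factory` are hypothetical, and the sketch never produces the `not_supported` status.

```python
import asyncio
from typing import Awaitable, Callable, List


async def adjust_worker_count(workers: List[asyncio.Task], new_count: int,
                              worker_factory: Callable[[int], Awaitable[None]]) -> dict:
    """Grow or shrink a list of asyncio worker tasks in place (hypothetical helper)."""
    if new_count < 1:
        return {'status': 'error', 'message': 'Worker count must be at least 1'}
    old_count = len(workers)
    # Start extra workers, numbering them after the existing ones
    while len(workers) < new_count:
        workers.append(asyncio.create_task(worker_factory(len(workers))))
    # Cancel surplus workers and wait for them to finish unwinding
    surplus = workers[new_count:]
    del workers[new_count:]
    for task in surplus:
        task.cancel()
    await asyncio.gather(*surplus, return_exceptions=True)
    return {'status': 'success', 'message': f'{old_count} -> {new_count} workers'}
```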
@@ -135,6 +135,12 @@
{{ render_field(form.application.form.webdriver_delay) }}
</div>
</fieldset>
<div class="pure-control-group">
{{ render_field(form.requests.form.workers) }}
{% set worker_info = get_worker_status_info() %}
<span class="pure-form-message-inline">Number of concurrent workers used to check watches. More workers can process more watches in parallel, at the cost of higher memory usage.<br>
Currently running: <strong>{{ worker_info.count }}</strong> operational {{ worker_info.type }} workers{% if worker_info.active_workers > 0 %} ({{ worker_info.active_workers }} actively processing){% endif %}.</span>
</div>
<div class="pure-control-group inline-radio">
{{ render_field(form.requests.form.default_ua) }}
<span class="pure-form-message-inline">
@@ -247,9 +253,9 @@ nav
<span class="pure-form-message-inline">Enable this setting to open the diff page in a new tab. If disabled, the diff page will open in the current tab.</span>
</div>
<div class="pure-control-group">
<span class="pure-form-message-inline">Enable realtime updates in the UI</span>
{{ render_checkbox_field(form.application.form.ui.form.socket_io_enabled, class="socket_io_enabled") }}
<span class="pure-form-message-inline">Realtime UI Updates Enabled - (Restart required if this is changed)</span>
</div>
</div>
<div class="tab-pane-inner" id="proxies">
<div id="recommended-proxy">
@@ -1,15 +1,13 @@
import time
from flask import Blueprint, request, redirect, url_for, flash, render_template, session
from loguru import logger
from functools import wraps
from changedetectionio.blueprint.ui.ajax import constuct_ui_ajax_blueprint
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.blueprint.ui.edit import construct_blueprint as construct_edit_blueprint
from changedetectionio.blueprint.ui.notification import construct_blueprint as construct_notification_blueprint
from changedetectionio.blueprint.ui.views import construct_blueprint as construct_views_blueprint
def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_update_threads, queuedWatchMetaData, watch_check_update):
def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handler, queuedWatchMetaData, watch_check_update):
ui_blueprint = Blueprint('ui', __name__, template_folder="templates")
# Register the edit blueprint
@@ -24,9 +22,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_updat
views_blueprint = construct_views_blueprint(datastore, update_q, queuedWatchMetaData, watch_check_update)
ui_blueprint.register_blueprint(views_blueprint)
ui_ajax_blueprint = constuct_ui_ajax_blueprint(datastore, update_q, running_update_threads, queuedWatchMetaData, watch_check_update)
ui_blueprint.register_blueprint(ui_ajax_blueprint)
# Import the login decorator
from changedetectionio.auth_decorator import login_optionally_required
@@ -100,7 +95,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_updat
new_uuid = datastore.clone(uuid)
if not datastore.data['watching'].get(uuid).get('paused'):
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=5, item={'uuid': new_uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=5, item={'uuid': new_uuid}))
flash('Cloned, you are editing the new watch.')
@@ -116,13 +111,11 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_updat
i = 0
running_uuids = []
for t in running_update_threads:
running_uuids.append(t.current_uuid)
running_uuids = worker_handler.get_running_uuids()
if uuid:
if uuid not in running_uuids:
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
i += 1
else:
@@ -139,7 +132,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_updat
if tag != None and tag not in watch['tags']:
continue
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
i += 1
if i == 1:
@@ -197,7 +190,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, running_updat
for uuid in uuids:
if datastore.data['watching'].get(uuid):
# Recheck and require a full reprocessing
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
flash("{} watches queued for rechecking".format(len(uuids)))
elif (op == 'clear-errors'):
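In this file, `update_q.put(...)` is replaced by `worker_handler.queue_item_async_safe(update_q, item)`. The body of that helper is not part of this diff. The sketch below is a minimal illustration that assumes the workers consume an `asyncio.Queue` owned by an event loop running in a background thread. Under that assumption, a synchronous Flask handler can hand items over with `loop.call_soon_threadsafe`. The names `start_background_loop` and `queue_item_threadsafe` are hypothetical and are not the project's API.

```python
import asyncio
import threading


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Start a new event loop running forever in a daemon thread."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def _make_queue() -> asyncio.Queue:
    # Created inside the loop so the queue is tied to that loop on every Python version
    return asyncio.Queue()


def queue_item_threadsafe(loop: asyncio.AbstractEventLoop, q: asyncio.Queue, item) -> None:
    """Hypothetical helper: enqueue from a thread that does not own the loop.

    asyncio.Queue is not thread-safe, so the put is scheduled onto the
    owning loop instead of being called directly from the request thread.
    """
    loop.call_soon_threadsafe(q.put_nowait, item)


loop = start_background_loop()
q = asyncio.run_coroutine_threadsafe(_make_queue(), loop).result(timeout=5)

# Simulates a sync request handler queueing a recheck
queue_item_threadsafe(loop, q, {"uuid": "example-uuid"})

# Simulates a worker coroutine on the loop consuming the item
received = asyncio.run_coroutine_threadsafe(q.get(), loop).result(timeout=5)
print(received)  # {'uuid': 'example-uuid'}
loop.call_soon_threadsafe(loop.stop)
```

The same idea covers `get_running_uuids()`. Callers ask the handler for state instead of iterating over thread objects themselves.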
-35
@@ -1,35 +0,0 @@
import time
from blinker import signal
from flask import Blueprint, request, redirect, url_for, flash, render_template, session
from changedetectionio.store import ChangeDetectionStore
def constuct_ui_ajax_blueprint(datastore: ChangeDetectionStore, update_q, running_update_threads, queuedWatchMetaData, watch_check_update):
ui_ajax_blueprint = Blueprint('ajax', __name__, template_folder="templates", url_prefix='/ajax')
# Import the login decorator
from changedetectionio.auth_decorator import login_optionally_required
@ui_ajax_blueprint.route("/toggle", methods=['POST'])
@login_optionally_required
def ajax_toggler():
op = request.values.get('op')
uuid = request.values.get('uuid')
if op and datastore.data['watching'].get(uuid):
if op == 'pause':
datastore.data['watching'][uuid].toggle_pause()
elif op == 'mute':
datastore.data['watching'][uuid].toggle_mute()
elif op == 'recheck':
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
watch_check_update = signal('watch_check_update')
if watch_check_update:
watch_check_update.send(watch_uuid=uuid)
return 'OK'
return ui_ajax_blueprint
+2 -1
@@ -9,6 +9,7 @@ from jinja2 import Environment, FileSystemLoader
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio.time_handler import is_within_schedule
from changedetectionio import worker_handler
def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData):
edit_blueprint = Blueprint('ui_edit', __name__, template_folder="../ui/templates")
@@ -201,7 +202,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
#############################
if not datastore.data['watching'][uuid].get('paused') and is_in_schedule:
# Queue the watch for immediate recheck, with a higher priority
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Diff page [edit] link should go back to diff page
if request.args.get("next") and request.args.get("next") == 'diff':
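The comment above treats `priority=1` as a higher priority than the `priority=5` used when cloning. That matches min-heap ordering, where the smallest number is served first. The definition of `queuedWatchMetaData.PrioritizedItem` is not shown in this diff. The sketch below assumes a dataclass in the style of the `queue` module documentation, which compares on priority only and behaves this way:

```python
import queue
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class PrioritizedItem:
    """Assumed shape: ordered by priority only; the payload dict is never compared."""
    priority: int
    item: Any = field(compare=False)


q = queue.PriorityQueue()
q.put(PrioritizedItem(priority=5, item={"uuid": "cloned-watch"}))
q.put(PrioritizedItem(priority=1, item={"uuid": "edited-watch"}))

first = q.get()
print(first.item["uuid"])  # edited-watch: the lower number is dequeued first
```

`field(compare=False)` matters here. If two items share a priority, the heap would otherwise compare the dicts and raise `TypeError`.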
+2 -1
@@ -7,6 +7,7 @@ from copy import deepcopy
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio import html_tools
from changedetectionio import worker_handler
def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData, watch_check_update):
views_blueprint = Blueprint('ui_views', __name__, template_folder="../ui/templates")
@@ -212,7 +213,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
return redirect(url_for('ui.ui_edit.edit_page', uuid=new_uuid, unpause_on_save=1, tag=request.args.get('tag')))
else:
# Straight into the queue.
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
flash("Watch added.")
return redirect(url_for('watchlist.index', tag=request.args.get('tag','')))
@@ -78,7 +78,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
active_tag=active_tag,
active_tag_uuid=active_tag_uuid,
app_rss_token=datastore.data['settings']['application'].get('rss_access_token'),
ajax_toggle_url=url_for('ui.ajax.ajax_toggler'),
datastore=datastore,
errored_count=errored_count,
form=form,
@@ -1,11 +1,15 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.html' import render_simple_field, render_field, render_nolabel_field, sort_by_title %}
{%- extends 'base.html' -%}
{%- block content -%}
{%- from '_helpers.html' import render_simple_field, render_field, render_nolabel_field, sort_by_title -%}
<script src="{{url_for('static_content', group='js', filename='jquery-3.6.0.min.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='watch-overview.js')}}" defer></script>
<script>let nowtimeserver={{ now_time_server }};</script>
<script>let ajax_toggle_url="{{ ajax_toggle_url }}";</script>
<script>
// Initialize Feather icons after the page loads
document.addEventListener('DOMContentLoaded', function() {
feather.replace();
});
</script>
<style>
.checking-now .last-checked {
background-image: linear-gradient(to bottom, transparent 0%, rgba(0,0,0,0.05) 40%, rgba(0,0,0,0.1) 100%);
@@ -39,139 +43,143 @@
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}" >
<input type="hidden" id="op_extradata" name="op_extradata" value="" >
<div id="checkbox-operations">
<button class="pure-button button-secondary button-xsmall" name="op" value="pause">Pause</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unpause">UnPause</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mute">Mute</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unmute">UnMute</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="recheck">Recheck</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="assign-tag" id="checkbox-assign-tag">Tag</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mark-viewed">Mark viewed</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="notification-default">Use default notification</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="clear-errors">Clear errors</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="clear-history">Clear/reset history</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="delete">Delete</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="pause"><i data-feather="pause" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Pause</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unpause"><i data-feather="play" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>UnPause</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mute"><i data-feather="volume-x" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Mute</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unmute"><i data-feather="volume-2" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>UnMute</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="recheck"><i data-feather="refresh-cw" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Recheck</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="assign-tag" id="checkbox-assign-tag"><i data-feather="tag" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Tag</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mark-viewed"><i data-feather="eye" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Mark viewed</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="notification-default"><i data-feather="bell" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Use default notification</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="clear-errors"><i data-feather="x-circle" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Clear errors</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="clear-history"><i data-feather="trash-2" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Clear/reset history</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="delete"><i data-feather="trash" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Delete</button>
</div>
{% if watches|length >= pagination.per_page %}
{%- if watches|length >= pagination.per_page -%}
{{ pagination.info }}
{% endif %}
{% if search_q %}<div id="search-result-info">Searching "<strong><i>{{search_q}}</i></strong>"</div>{% endif %}
{%- endif -%}
{%- if search_q -%}<div id="search-result-info">Searching "<strong><i>{{search_q}}</i></strong>"</div>{%- endif -%}
<div>
<a href="{{url_for('watchlist.index')}}" class="pure-button button-tag {{'active' if not active_tag_uuid }}">All</a>
<!-- tag list -->
{% for uuid, tag in tags %}
{% if tag != "" %}
{%- for uuid, tag in tags -%}
{%- if tag != "" -%}
<a href="{{url_for('watchlist.index', tag=uuid) }}" class="pure-button button-tag {{'active' if active_tag_uuid == uuid }}">{{ tag.title }}</a>
{% endif %}
{% endfor %}
{%- endif -%}
{%- endfor -%}
</div>
{% set sort_order = sort_order or 'asc' %}
{% set sort_attribute = sort_attribute or 'last_changed' %}
{% set pagination_page = request.args.get('page', 0) %}
{% set cols_required = 6 %}
{% set any_has_restock_price_processor = datastore.any_watches_have_processor_by_name("restock_diff") %}
{% if any_has_restock_price_processor %}
{% set cols_required = cols_required + 1 %}
{% endif %}
{%- set sort_order = sort_order or 'asc' -%}
{%- set sort_attribute = sort_attribute or 'last_changed' -%}
{%- set pagination_page = request.args.get('page', 0) -%}
{%- set cols_required = 6 -%}
{%- set any_has_restock_price_processor = datastore.any_watches_have_processor_by_name("restock_diff") -%}
{%- if any_has_restock_price_processor -%}
{%- set cols_required = cols_required + 1 -%}
{%- endif -%}
<div id="watch-table-wrapper">
<table class="pure-table pure-table-striped watch-table">
<thead>
<tr>
{% set link_order = "desc" if sort_order == 'asc' else "asc" %}
{% set arrow_span = "" %}
{%- set link_order = "desc" if sort_order == 'asc' else "asc" -%}
{%- set arrow_span = "" -%}
<th><input style="vertical-align: middle" type="checkbox" id="check-all" > <a class="{{ 'active '+link_order if sort_attribute == 'date_created' else 'inactive' }}" href="{{url_for('watchlist.index', sort='date_created', order=link_order, tag=active_tag_uuid)}}"># <span class='arrow {{link_order}}'></span></a></th>
<th class="empty-cell"></th>
<th><a class="{{ 'active '+link_order if sort_attribute == 'label' else 'inactive' }}" href="{{url_for('watchlist.index', sort='label', order=link_order, tag=active_tag_uuid)}}">Website <span class='arrow {{link_order}}'></span></a></th>
{% if any_has_restock_price_processor %}
{%- if any_has_restock_price_processor -%}
<th>Restock &amp; Price</th>
{% endif %}
{%- endif -%}
<th><a class="{{ 'active '+link_order if sort_attribute == 'last_checked' else 'inactive' }}" href="{{url_for('watchlist.index', sort='last_checked', order=link_order, tag=active_tag_uuid)}}"><span class="hide-on-mobile">Last</span> Checked <span class='arrow {{link_order}}'></span></a></th>
<th><a class="{{ 'active '+link_order if sort_attribute == 'last_changed' else 'inactive' }}" href="{{url_for('watchlist.index', sort='last_changed', order=link_order, tag=active_tag_uuid)}}"><span class="hide-on-mobile">Last</span> Changed <span class='arrow {{link_order}}'></span></a></th>
<th class="empty-cell"></th>
</tr>
</thead>
<tbody>
{% if not watches|length %}
{%- if not watches|length -%}
<tr>
<td colspan="{{ cols_required }}" style="text-wrap: wrap;">No website watches configured, please add a URL in the box above, or <a href="{{ url_for('imports.import_page')}}" >import a list</a>.</td>
</tr>
{% endif %}
{% for watch in (watches|sort(attribute=sort_attribute, reverse=sort_order == 'asc'))|pagination_slice(skip=pagination.skip) %}
{% set checking_now = is_checking_now(watch) %}
<tr id="{{ watch.uuid }}" data-watch-uuid="{{ watch.uuid }}"
class="{{ loop.cycle('pure-table-odd', 'pure-table-even') }} processor-{{ watch['processor'] }}
{# realtime.js also sets these vars on the row for update #}
{% if watch.compile_error_texts()|length >2 %}has-error{% endif %}
{% if watch.paused is defined and watch.paused != False %}paused{% endif %}
{% if watch.has_unviewed %}unviewed{% endif %}
{% if watch.has_restock_info %} has-restock-info {% if watch['restock']['in_stock'] %}in-stock{% else %}not-in-stock{% endif %} {% else %}no-restock-info{% endif %}
{% if watch.uuid in queued_uuids %}queued{% endif %}
{% if checking_now %}checking-now{% endif %}
{% if watch.notification_muted %}notification_muted{% endif %}
">
{%- endif -%}
{%- for watch in (watches|sort(attribute=sort_attribute, reverse=sort_order == 'asc'))|pagination_slice(skip=pagination.skip) -%}
{%- set checking_now = is_checking_now(watch) -%}
{%- set history_n = watch.history_n -%}
{# Mirror in changedetectionio/static/js/realtime.js for the frontend #}
{%- set row_classes = [
loop.cycle('pure-table-odd', 'pure-table-even'),
'processor-' ~ watch['processor'],
'has-error' if watch.compile_error_texts()|length > 2 else '',
'paused' if watch.paused is defined and watch.paused != False else '',
'unviewed' if watch.has_unviewed else '',
'has-restock-info' if watch.has_restock_info else 'no-restock-info',
'in-stock' if watch.has_restock_info and watch['restock']['in_stock'] else '',
'not-in-stock' if watch.has_restock_info and not watch['restock']['in_stock'] else '',
'queued' if watch.uuid in queued_uuids else '',
'checking-now' if checking_now else '',
'notification_muted' if watch.notification_muted else '',
'single-history' if history_n == 1 else '',
'multiple-history' if history_n >= 2 else ''
] -%}
<tr id="{{ watch.uuid }}" data-watch-uuid="{{ watch.uuid }}" class="{{ row_classes | reject('equalto', '') | join(' ') }}">
<td class="inline checkbox-uuid" ><input name="uuids" type="checkbox" value="{{ watch.uuid}} " > <span>{{ loop.index+pagination.skip }}</span></td>
<td class="inline watch-controls">
<a class="ajax-op state-off pause-toggle" data-op="pause" href="{{url_for('watchlist.index', op='pause', uuid=watch.uuid, tag=active_tag_uuid)}}"><img src="{{url_for('static_content', group='images', filename='pause.svg')}}" alt="Pause checks" title="Pause checks" class="icon icon-pause" ></a>
<a class="ajax-op state-on pause-toggle" data-op="pause" style="display: none" href="{{url_for('watchlist.index', op='pause', uuid=watch.uuid, tag=active_tag_uuid)}}"><img src="{{url_for('static_content', group='images', filename='play.svg')}}" alt="UnPause checks" title="UnPause checks" class="icon icon-unpause" ></a>
<a class="ajax-op state-off mute-toggle" data-op="mute" href="{{url_for('watchlist.index', op='mute', uuid=watch.uuid, tag=active_tag_uuid)}}"><img src="{{url_for('static_content', group='images', filename='bell-off.svg')}}" alt="Mute notification" title="Mute notification" class="icon icon-mute" ></a>
<a class="ajax-op state-on mute-toggle" data-op="mute" style="display: none" href="{{url_for('watchlist.index', op='mute', uuid=watch.uuid, tag=active_tag_uuid)}}"><img src="{{url_for('static_content', group='images', filename='bell-off.svg')}}" alt="UnMute notification" title="UnMute notification" class="icon icon-mute" ></a>
</td>
<td class="title-col inline">{{watch.title if watch.title is not none and watch.title|length > 0 else watch.url}}
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}"></a>
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}"><i data-feather="external-link"></i></a>
<a class="link-spread" href="{{url_for('ui.form_share_put_watch', uuid=watch.uuid)}}"><img src="{{url_for('static_content', group='images', filename='spread.svg')}}" class="status-icon icon icon-spread" title="Create a link to share watch config with others" ></a>
{% if watch.get_fetch_backend == "html_webdriver"
{%- if watch.get_fetch_backend == "html_webdriver"
or ( watch.get_fetch_backend == "system" and system_default_fetcher == 'html_webdriver' )
or "extra_browser_" in watch.get_fetch_backend
%}
-%}
<img class="status-icon" src="{{url_for('static_content', group='images', filename='google-chrome-icon.png')}}" alt="Using a Chrome browser" title="Using a Chrome browser" >
{% endif %}
{%- endif -%}
{% if watch.is_pdf %}<img class="status-icon" src="{{url_for('static_content', group='images', filename='pdf-icon.svg')}}" alt="Converting PDF to text" >{% endif %}
{% if watch.has_browser_steps %}<img class="status-icon status-browsersteps" src="{{url_for('static_content', group='images', filename='steps.svg')}}" alt="Browser Steps is enabled" >{% endif %}
{%- if watch.is_pdf -%}<img class="status-icon" src="{{url_for('static_content', group='images', filename='pdf-icon.svg')}}" alt="Converting PDF to text" >{%- endif -%}
{%- if watch.has_browser_steps -%}<img class="status-icon status-browsersteps" src="{{url_for('static_content', group='images', filename='steps.svg')}}" alt="Browser Steps is enabled" >{%- endif -%}
<div class="error-text" style="display:none;">{{ watch.compile_error_texts(has_proxies=datastore.proxy_list)|safe }}</div>
{% if watch['processor'] == 'text_json_diff' %}
{% if watch['has_ldjson_price_data'] and not watch['track_ldjson_price_data'] %}
{%- if watch['processor'] == 'text_json_diff' -%}
{%- if watch['has_ldjson_price_data'] and not watch['track_ldjson_price_data'] -%}
<div class="ldjson-price-track-offer">Switch to Restock & Price watch mode? <a href="{{url_for('price_data_follower.accept', uuid=watch.uuid)}}" class="pure-button button-xsmall">Yes</a> <a href="{{url_for('price_data_follower.reject', uuid=watch.uuid)}}" class="">No</a></div>
{% endif %}
{% endif %}
{% if watch['processor'] == 'restock_diff' %}
{%- endif -%}
{%- endif -%}
{%- if watch['processor'] == 'restock_diff' -%}
<span class="tracking-ldjson-price-data" title="Automatically following embedded price information"><img src="{{url_for('static_content', group='images', filename='price-tag-icon.svg')}}" class="status-icon price-follow-tag-icon" > Price</span>
{% endif %}
{% for watch_tag_uuid, watch_tag in datastore.get_all_tags_for_watch(watch['uuid']).items() %}
{%- endif -%}
{%- for watch_tag_uuid, watch_tag in datastore.get_all_tags_for_watch(watch['uuid']).items() -%}
<span class="watch-tag-list">{{ watch_tag.title }}</span>
{% endfor %}
{%- endfor -%}
</td>
<!-- @todo make it so any watch handler obj can expose this --->
{% if any_has_restock_price_processor %}
{%- if any_has_restock_price_processor -%}
<td class="restock-and-price">
{% if watch['processor'] == 'restock_diff' %}
{% if watch.has_restock_info %}
{%- if watch['processor'] == 'restock_diff' -%}
{%- if watch.has_restock_info -%}
<span class="restock-label {{'in-stock' if watch['restock']['in_stock'] else 'not-in-stock' }}" title="Detecting restock and price">
<!-- maybe some object watch['processor'][restock_diff] or.. -->
{% if watch['restock']['in_stock'] %} In stock {% else %} Not in stock {% endif %}
{%- if watch['restock']['in_stock']-%} In stock {%- else-%} Not in stock {%- endif -%}
</span>
{% endif %}
{%- endif -%}
{% if watch.get('restock') and watch['restock']['price'] != None %}
{% if watch['restock']['price'] != None %}
{%- if watch.get('restock') and watch['restock']['price'] != None -%}
{%- if watch['restock']['price'] != None -%}
<span class="restock-label price" title="Price">
{{ watch['restock']['price']|format_number_locale }} {{ watch['restock']['currency'] }}
</span>
{% endif %}
{% elif not watch.has_restock_info %}
{%- endif -%}
{%- elif not watch.has_restock_info -%}
<span class="restock-label error">No information</span>
{% endif %}
{% endif %}
{%- endif -%}
{%- endif -%}
</td>
{% endif %}
{%- endif -%}
{#last_checked becomes fetch-start-time#}
<td class="last-checked" data-timestamp="{{ watch.last_checked }}" data-fetchduration={{ watch.fetch_time }} data-eta_complete="{{ watch.last_checked+watch.fetch_time }}" >
<div class="spinner-wrapper" style="display:none;" >
@@ -179,51 +187,34 @@
</div>
<span class="innertext">{{watch|format_last_checked_time|safe}}</span>
</td>
<td class="last-changed" data-timestamp="{{ watch.last_changed }}">{% if watch.history_n >=2 and watch.last_changed >0 %}
<td class="last-changed" data-timestamp="{{ watch.last_changed }}">{%- if watch.history_n >=2 and watch.last_changed >0 -%}
{{watch.last_changed|format_timestamp_timeago}}
{% else %}
{%- else -%}
Not yet
{% endif %}
{%- endif -%}
</td>
<td>
{%- set target_attr = ' target="' ~ watch.uuid ~ '"' if datastore.data['settings']['application']['ui'].get('open_diff_in_new_tab') else '' -%}
<a href="" class="already-in-queue-button recheck pure-button pure-button-primary" style="display: none;" disabled="disabled">Queued</a>
<a href="{{ url_for('ui.form_watch_checknow', uuid=watch.uuid, tag=request.args.get('tag')) }}" data-op='recheck' class="ajax-op recheck pure-button pure-button-primary">Recheck</a>
<a href="{{ url_for('ui.ui_edit.edit_page', uuid=watch.uuid, tag=active_tag_uuid)}}#general" class="pure-button pure-button-primary">Edit</a>
{% if watch.history_n >= 2 %}
{% set open_diff_in_new_tab = datastore.data['settings']['application']['ui'].get('open_diff_in_new_tab') %}
{% set target_attr = ' target="' ~ watch.uuid ~ '"' if open_diff_in_new_tab else '' %}
{% if watch.has_unviewed %}
<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid, from_version=watch.get_from_version_based_on_last_viewed) }}" {{target_attr}} class="pure-button pure-button-primary diff-link">History</a>
{% else %}
<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary diff-link">History</a>
{% endif %}
{% else %}
{% if watch.history_n == 1 or (watch.history_n ==0 and watch.error_text_ctime )%}
<a href="{{ url_for('ui.ui_views.preview_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary">Preview</a>
{% endif %}
{% endif %}
<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary history-link" style="display: none;">History</a>
<a href="{{ url_for('ui.ui_views.preview_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary preview-link" style="display: none;">Preview</a>
</td>
</tr>
{% endfor %}
{%- endfor -%}
</tbody>
</table>
<ul id="post-list-buttons">
<li id="post-list-with-errors" class="{% if errored_count %}has-error{% endif %}" style="display: none;" >
<li id="post-list-with-errors" class="{%- if errored_count -%}has-error{%- endif -%}" style="display: none;" >
<a href="{{url_for('watchlist.index', with_errors=1, tag=request.args.get('tag')) }}" class="pure-button button-tag button-error">With errors ({{ errored_count }})</a>
</li>
<li id="post-list-mark-views" class="{% if has_unviewed %}has-unviewed{% endif %}" style="display: none;" >
<li id="post-list-mark-views" class="{%- if has_unviewed -%}has-unviewed{%- endif -%}" style="display: none;" >
<a href="{{url_for('ui.mark_all_viewed',with_errors=request.args.get('with_errors',0)) }}" class="pure-button button-tag " id="mark-all-viewed">Mark all viewed</a>
</li>
<li>
<a href="{{ url_for('ui.form_watch_checknow', tag=active_tag_uuid, with_errors=request.args.get('with_errors',0)) }}" class="pure-button button-tag" id="recheck-all">Recheck
all {% if active_tag_uuid %} in "{{active_tag.title}}"{%endif%}</a>
all {%- if active_tag_uuid-%} in "{{active_tag.title}}"{%endif%}</a>
</li>
<li>
<a href="{{ url_for('rss.feed', tag=active_tag_uuid, token=app_rss_token)}}"><img alt="RSS Feed" id="feed-icon" src="{{url_for('static_content', group='images', filename='generic_feed-icon.svg')}}" height="15"></a>
@@ -233,4 +224,4 @@
</div>
</form>
</div>
{% endblock %}
{%- endblock -%}
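The rewritten `<tr>` now builds its classes as a list and drops the empty entries with `reject('equalto', '') | join(' ')`. The template comment notes that `realtime.js` mirrors this list. The sketch below is a plain-Python equivalent of that filtering. It covers a subset of the classes and uses a dict in place of the watch object. The `row_classes` function is hypothetical and is meant only to make the resulting attribute easy to check:

```python
def row_classes(watch: dict, index0: int, queued_uuids: set, checking_now: bool) -> str:
    """Hypothetical Python mirror of a subset of the template's row_classes list.

    `watch` is a plain dict standing in for the real watch object.
    """
    history_n = watch.get("history_n", 0)
    has_restock = watch.get("has_restock_info", False)
    in_stock = has_restock and watch["restock"]["in_stock"]
    classes = [
        # loop.cycle('pure-table-odd', 'pure-table-even') alternates from the first row
        "pure-table-odd" if index0 % 2 == 0 else "pure-table-even",
        "processor-" + watch["processor"],
        "paused" if watch.get("paused") else "",
        "unviewed" if watch.get("has_unviewed") else "",
        "has-restock-info" if has_restock else "no-restock-info",
        "in-stock" if in_stock else "",
        "not-in-stock" if has_restock and not in_stock else "",
        "queued" if watch["uuid"] in queued_uuids else "",
        "checking-now" if checking_now else "",
        "single-history" if history_n == 1 else "",
        "multiple-history" if history_n >= 2 else "",
    ]
    # Equivalent of Jinja's: row_classes | reject('equalto', '') | join(' ')
    return " ".join(c for c in classes if c != "")


watch = {"uuid": "w1", "processor": "text_json_diff", "has_unviewed": True, "history_n": 3}
print(row_classes(watch, 0, queued_uuids=set(), checking_now=False))
# pure-table-odd processor-text_json_diff unviewed no-restock-info multiple-history
```

The new `single-history` and `multiple-history` classes let the front end decide between the Preview and History links. This replaces the server-side `{% if watch.history_n >= 2 %}` branches.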
+7 -7
@@ -68,7 +68,7 @@ class Fetcher():
return self.error
@abstractmethod
def run(self,
async def run(self,
url,
timeout,
request_headers,
@@ -122,7 +122,7 @@ class Fetcher():
return None
def iterate_browser_steps(self, start_url=None):
async def iterate_browser_steps(self, start_url=None):
from changedetectionio.blueprint.browser_steps.browser_steps import steppable_browser_interface
from playwright._impl._errors import TimeoutError, Error
from changedetectionio.safe_jinja import render as jinja_render
@@ -136,8 +136,8 @@ class Fetcher():
for step in valid_steps:
step_n += 1
logger.debug(f">> Iterating check - browser Step n {step_n} - {step['operation']}...")
self.screenshot_step("before-" + str(step_n))
self.save_step_html("before-" + str(step_n))
await self.screenshot_step("before-" + str(step_n))
await self.save_step_html("before-" + str(step_n))
try:
optional_value = step['optional_value']
@@ -148,11 +148,11 @@ class Fetcher():
if '{%' in step['selector'] or '{{' in step['selector']:
selector = jinja_render(template_str=step['selector'])
getattr(interface, "call_action")(action_name=step['operation'],
await getattr(interface, "call_action")(action_name=step['operation'],
selector=selector,
optional_value=optional_value)
self.screenshot_step(step_n)
self.save_step_html(step_n)
await self.screenshot_step(step_n)
await self.save_step_html(step_n)
except (Error, TimeoutError) as e:
logger.debug(str(e))
# Stop processing here
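In `iterate_browser_steps`, the action is still looked up by name with `getattr(interface, "call_action")`, but the call is now awaited. The `screenshot_step` and `save_step_html` hooks are awaited as well. The sketch below is a self-contained version of that loop. `FakeInterface`, its `call_action` body and the operation names `click` and `type` are illustrative stand-ins, not the real `steppable_browser_interface`:

```python
import asyncio


class FakeInterface:
    """Stand-in for steppable_browser_interface; records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    async def call_action(self, action_name, selector=None, optional_value=None):
        await asyncio.sleep(0)  # yield to the loop, as a real Playwright call would
        self.calls.append((action_name, selector, optional_value))


async def iterate_steps(interface, steps):
    step_n = 0
    for step in steps:
        step_n += 1
        # Each awaited action completes before the next step starts, preserving order
        await getattr(interface, "call_action")(
            action_name=step["operation"],
            selector=step.get("selector"),
            optional_value=step.get("optional_value"),
        )
    return step_n


iface = FakeInterface()
steps = [
    {"operation": "click", "selector": "#accept"},
    {"operation": "type", "selector": "#q", "optional_value": "shoes"},
]
count = asyncio.run(iterate_steps(iface, steps))
print(count, iface.calls[0])  # 2 ('click', '#accept', None)
```

If the `await` were missing, `call_action(...)` would only create a coroutine object. The step would never run, and Python would warn that the coroutine was never awaited.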
@@ -9,15 +9,15 @@ from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, vi
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable
def capture_full_page(page):
async def capture_full_page_async(page):
import os
import time
from multiprocessing import Process, Pipe
start = time.time()
page_height = page.evaluate("document.documentElement.scrollHeight")
page_width = page.evaluate("document.documentElement.scrollWidth")
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport_size
logger.debug(f"Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width}")
@@ -32,23 +32,23 @@ def capture_full_page(page):
step_size = page_height # Incase page is bigger than default viewport but smaller than proposed step size
logger.debug(f"Setting bigger viewport to step through large page width W{page.viewport_size['width']}xH{step_size} because page_height > viewport_size")
# Set viewport to a larger size to capture more content at once
page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
await page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
# Capture screenshots in chunks up to the max total height
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
page.request_gc()
page.evaluate(f"window.scrollTo(0, {y})")
page.request_gc()
screenshot_chunks.append(page.screenshot(
await page.request_gc()
await page.evaluate(f"window.scrollTo(0, {y})")
await page.request_gc()
screenshot_chunks.append(await page.screenshot(
type="jpeg",
full_page=False,
quality=int(os.getenv("SCREENSHOT_QUALITY", 72))
))
y += step_size
page.request_gc()
await page.request_gc()
# Restore original viewport size
page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
await page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
# If we have multiple chunks, stitch them together
if len(screenshot_chunks) > 1:
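The capture loop in `capture_full_page_async` scrolls in increments of `step_size`. It stops at the smaller of the page height and `SCREENSHOT_MAX_TOTAL_HEIGHT`, taking one screenshot per position. The sketch below isolates that offset arithmetic. The `chunk_offsets` name and the example heights are assumptions; the real value of `SCREENSHOT_MAX_TOTAL_HEIGHT` is not shown here.

```python
def chunk_offsets(page_height: int, step_size: int, max_total_height: int) -> list:
    """Scroll positions visited by the capture loop, one screenshot per position."""
    offsets = []
    y = 0
    while y < min(page_height, max_total_height):
        offsets.append(y)
        y += step_size
    return offsets


print(chunk_offsets(5000, 2000, 16000))   # [0, 2000, 4000]
print(chunk_offsets(50000, 4000, 16000))  # [0, 4000, 8000, 12000]
```

The number of chunks to stitch is therefore bounded by the height cap, not by how tall the page is.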
@@ -73,7 +73,6 @@ def capture_full_page(page):
return screenshot_chunks[0]
class fetcher(Fetcher):
fetcher_description = "Playwright {}/Javascript".format(
os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').capitalize()
@@ -124,9 +123,9 @@ class fetcher(Fetcher):
self.proxy['username'] = parsed.username
self.proxy['password'] = parsed.password
def screenshot_step(self, step_n=''):
async def screenshot_step(self, step_n=''):
super().screenshot_step(step_n=step_n)
screenshot = capture_full_page(page=self.page)
screenshot = await capture_full_page_async(page=self.page)
if self.browser_steps_screenshot_path is not None:
@@ -135,15 +134,15 @@ class fetcher(Fetcher):
with open(destination, 'wb') as f:
f.write(screenshot)
def save_step_html(self, step_n):
async def save_step_html(self, step_n):
super().save_step_html(step_n=step_n)
content = self.page.content()
content = await self.page.content()
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.html'.format(step_n))
logger.debug(f"Saving step HTML to {destination}")
with open(destination, 'w') as f:
f.write(content)
def run(self,
async def run(self,
url,
timeout,
request_headers,
@@ -154,26 +153,26 @@ class fetcher(Fetcher):
is_binary=False,
empty_pages_are_a_change=False):
from playwright.sync_api import sync_playwright
from playwright.async_api import async_playwright
import playwright._impl._errors
import time
self.delete_browser_steps_screenshots()
response = None
with sync_playwright() as p:
async with async_playwright() as p:
browser_type = getattr(p, self.browser_type)
# Seemed to cause a connection Exception even tho I can see it connect
# self.browser = browser_type.connect(self.command_executor, timeout=timeout*1000)
# 60,000 connection timeout only
browser = browser_type.connect_over_cdp(self.browser_connection_url, timeout=60000)
browser = await browser_type.connect_over_cdp(self.browser_connection_url, timeout=60000)
# SOCKS5 with authentication is not supported (yet)
# https://github.com/microsoft/playwright/issues/10567
# Set user agent to prevent Cloudflare from blocking the browser
# Use the default one configured in the App.py model that's passed from fetch_site_status.py
context = browser.new_context(
context = await browser.new_context(
accept_downloads=False, # Should never be needed
bypass_csp=True, # This is needed to enable JavaScript execution on GitHub and others
extra_http_headers=request_headers,
@@ -183,7 +182,7 @@ class fetcher(Fetcher):
user_agent=manage_user_agent(headers=request_headers),
)
self.page = context.new_page()
self.page = await context.new_page()
# Listen for all console events and handle errors
self.page.on("console", lambda msg: logger.debug(f"Playwright console: Watch URL: {url} {msg.type}: {msg.text} {msg.args}"))
@@ -193,32 +192,37 @@ class fetcher(Fetcher):
browsersteps_interface = steppable_browser_interface(start_url=url)
browsersteps_interface.page = self.page
response = browsersteps_interface.action_goto_url(value=url)
response = await browsersteps_interface.action_goto_url(value=url)
if response is None:
context.close()
browser.close()
await context.close()
await browser.close()
logger.debug("Content Fetcher > Response object from the browser communication was none")
raise EmptyReply(url=url, status_code=None)
self.headers = response.all_headers()
# In async_playwright, all_headers() returns a coroutine
try:
self.headers = await response.all_headers()
except TypeError:
# Fallback for sync version
self.headers = response.all_headers()
try:
if self.webdriver_js_execute_code is not None and len(self.webdriver_js_execute_code):
browsersteps_interface.action_execute_js(value=self.webdriver_js_execute_code, selector=None)
await browsersteps_interface.action_execute_js(value=self.webdriver_js_execute_code, selector=None)
except playwright._impl._errors.TimeoutError as e:
context.close()
browser.close()
await context.close()
await browser.close()
# This can be ok, we will try to grab what we could retrieve
pass
except Exception as e:
logger.debug(f"Content Fetcher > Other exception when executing custom JS code {str(e)}")
context.close()
browser.close()
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
extra_wait = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
self.page.wait_for_timeout(extra_wait * 1000)
await self.page.wait_for_timeout(extra_wait * 1000)
try:
self.status_code = response.status
@@ -226,48 +230,48 @@ class fetcher(Fetcher):
# https://github.com/dgtlmoon/changedetection.io/discussions/2122#discussioncomment-8241962
logger.critical(f"Response from the browser/Playwright did not have a status_code! Response follows.")
logger.critical(response)
context.close()
browser.close()
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
if self.status_code != 200 and not ignore_status_codes:
screenshot = capture_full_page(self.page)
screenshot = await capture_full_page_async(self.page)
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
if not empty_pages_are_a_change and len(self.page.content().strip()) == 0:
if not empty_pages_are_a_change and len((await self.page.content()).strip()) == 0:
logger.debug("Content Fetcher > Content was empty, empty_pages_are_a_change = False")
context.close()
browser.close()
await context.close()
await browser.close()
raise EmptyReply(url=url, status_code=response.status)
# Run Browser Steps here
if self.browser_steps_get_valid_steps():
self.iterate_browser_steps(start_url=url)
await self.iterate_browser_steps(start_url=url)
self.page.wait_for_timeout(extra_wait * 1000)
await self.page.wait_for_timeout(extra_wait * 1000)
now = time.time()
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
if current_include_filters is not None:
self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
await self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
else:
self.page.evaluate("var include_filters=''")
self.page.request_gc()
await self.page.evaluate("var include_filters=''")
await self.page.request_gc()
# request_gc before and after evaluate to free up memory
# @todo browsersteps etc
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.xpath_data = self.page.evaluate(XPATH_ELEMENT_JS, {
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
})
self.page.request_gc()
await self.page.request_gc()
self.instock_data = self.page.evaluate(INSTOCK_DATA_JS)
self.page.request_gc()
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
await self.page.request_gc()
self.content = self.page.content()
self.page.request_gc()
self.content = await self.page.content()
await self.page.request_gc()
logger.debug(f"Scrape xPath element data in browser done in {time.time() - now:.2f}s")
# Bug 3 in Playwright screenshot handling
@@ -279,7 +283,7 @@ class fetcher(Fetcher):
# acceptable screenshot quality here
try:
# The actual screenshot - this always base64 and needs decoding! horrible! huge CPU usage
self.screenshot = capture_full_page(page=self.page)
self.screenshot = await capture_full_page_async(page=self.page)
except Exception as e:
# It's likely the screenshot was too long/big and something crashed
@@ -287,30 +291,30 @@ class fetcher(Fetcher):
finally:
# Request garbage collection one more time before closing
try:
self.page.request_gc()
await self.page.request_gc()
except:
pass
# Clean up resources properly
try:
self.page.request_gc()
await self.page.request_gc()
except:
pass
try:
self.page.close()
await self.page.close()
except:
pass
self.page = None
try:
context.close()
await context.close()
except:
pass
context = None
try:
browser.close()
await browser.close()
except:
pass
browser = None
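The header lookup above awaits `response.all_headers()` and falls back to a sync call when that raises `TypeError`. A `TypeError` raised *inside* an async `all_headers()` would also take that fallback and leave an un-awaited coroutine in `self.headers`. A minimal sketch of an alternative checks `inspect.isawaitable()` instead; `maybe_await()` is a hypothetical helper and the two response classes are stand-ins, not Playwright objects.

```python
import asyncio
import inspect
from typing import Any


async def maybe_await(value: Any) -> Any:
    """Hypothetical helper: await `value` only if it is awaitable.

    Lets one code path accept results from a sync API (plain values)
    and an async API (coroutines/futures) without catching TypeError.
    """
    if inspect.isawaitable(value):
        return await value
    return value


class SyncResponse:
    # Stand-in for a sync response: all_headers() returns a dict directly
    def all_headers(self):
        return {"content-type": "text/html"}


class AsyncResponse:
    # Stand-in for an async response: all_headers() returns a coroutine
    async def all_headers(self):
        return {"content-type": "text/html"}


async def demo():
    sync_headers = await maybe_await(SyncResponse().all_headers())
    async_headers = await maybe_await(AsyncResponse().all_headers())
    return sync_headers, async_headers


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Errors raised inside the coroutine now propagate unchanged instead of being mistaken for "this is the sync API".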
@@ -310,15 +310,15 @@ class fetcher(Fetcher):
async def main(self, **kwargs):
await self.fetch_page(**kwargs)
def run(self, url, timeout, request_headers, request_body, request_method, ignore_status_codes=False,
async def run(self, url, timeout, request_headers, request_body, request_method, ignore_status_codes=False,
current_include_filters=None, is_binary=False, empty_pages_are_a_change=False):
#@todo make update_worker async which could run any of these content_fetchers within memory and time constraints
max_time = os.getenv('PUPPETEER_MAX_PROCESSING_TIMEOUT_SECONDS', 180)
max_time = int(os.getenv('PUPPETEER_MAX_PROCESSING_TIMEOUT_SECONDS', 180))
# This will work in 3.10 but not >= 3.11 because 3.11 wants tasks only
# Now we run this properly in async context since we're called from async worker
try:
asyncio.run(asyncio.wait_for(self.main(
await asyncio.wait_for(self.main(
url=url,
timeout=timeout,
request_headers=request_headers,
@@ -328,7 +328,7 @@ class fetcher(Fetcher):
current_include_filters=current_include_filters,
is_binary=is_binary,
empty_pages_are_a_change=empty_pages_are_a_change
), timeout=max_time))
), timeout=max_time)
except asyncio.TimeoutError:
raise(BrowserFetchTimedOut(msg=f"Browser connected but was unable to process the page in {max_time} seconds."))
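Because `run()` is now a coroutine awaited from the async worker, the old `asyncio.run(asyncio.wait_for(...))` form would fail: `asyncio.run()` raises `RuntimeError` when an event loop is already running in the thread. A minimal sketch of the awaited form, with a hypothetical `slow_fetch()` and a stand-in `BrowserFetchTimedOut` (its `msg=` keyword is assumed from the call above):

```python
import asyncio


class BrowserFetchTimedOut(Exception):
    """Stand-in for the project's exception; a `msg` keyword is assumed."""

    def __init__(self, msg: str):
        super().__init__(msg)
        self.msg = msg


async def slow_fetch(delay: float) -> str:
    # Hypothetical page fetch that takes `delay` seconds to finish
    await asyncio.sleep(delay)
    return "<html></html>"


async def run(delay: float, max_time: float) -> str:
    # Already inside a running loop, so await wait_for() directly
    try:
        return await asyncio.wait_for(slow_fetch(delay), timeout=max_time)
    except asyncio.TimeoutError:
        raise BrowserFetchTimedOut(
            msg=f"Browser connected but was unable to process the page in {max_time} seconds."
        )


async def nested_asyncio_run_fails() -> bool:
    # Demonstrates why the old asyncio.run(...) call can't be used from async code
    coro = slow_fetch(0)
    try:
        asyncio.run(coro)
    except RuntimeError:
        return True
    finally:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
    return False
```

`wait_for()` needs a numeric timeout, which is also why the diff wraps `PUPPETEER_MAX_PROCESSING_TIMEOUT_SECONDS` in `int()`.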
+33 -3
@@ -1,6 +1,7 @@
from loguru import logger
import hashlib
import os
import asyncio
from changedetectionio import strtobool
from changedetectionio.content_fetchers.exceptions import BrowserStepsInUnsupportedFetcher, EmptyReply, Non200ErrorCodeReceived
from changedetectionio.content_fetchers.base import Fetcher
@@ -15,7 +16,7 @@ class fetcher(Fetcher):
self.proxy_override = proxy_override
# browser_connection_url is none because its always 'launched locally'
def run(self,
def _run_sync(self,
url,
timeout,
request_headers,
@@ -25,6 +26,7 @@ class fetcher(Fetcher):
current_include_filters=None,
is_binary=False,
empty_pages_are_a_change=False):
"""Synchronous version of run - the original requests implementation"""
import chardet
import requests
@@ -36,7 +38,6 @@ class fetcher(Fetcher):
proxies = {}
# Allows override the proxy on a per-request basis
# https://requests.readthedocs.io/en/latest/user/advanced/#socks
# Should also work with `socks5://user:pass@host:port` type syntax.
@@ -100,9 +101,38 @@ class fetcher(Fetcher):
else:
self.content = r.text
self.raw_content = r.content
async def run(self,
url,
timeout,
request_headers,
request_body,
request_method,
ignore_status_codes=False,
current_include_filters=None,
is_binary=False,
empty_pages_are_a_change=False):
"""Async wrapper that runs the synchronous requests code in a thread pool"""
loop = asyncio.get_running_loop()
# Run the synchronous _run_sync in a thread pool to avoid blocking the event loop
await loop.run_in_executor(
None, # Use default ThreadPoolExecutor
lambda: self._run_sync(
url=url,
timeout=timeout,
request_headers=request_headers,
request_body=request_body,
request_method=request_method,
ignore_status_codes=ignore_status_codes,
current_include_filters=current_include_filters,
is_binary=is_binary,
empty_pages_are_a_change=empty_pages_are_a_change
)
)
def quit(self, watch=None):
# In case they switched to `requests` fetcher from something else
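`loop.run_in_executor()` forwards only positional arguments, which is why the async wrapper above passes a `lambda`. A sketch of two equivalent ways to call a blocking function with keyword arguments from a coroutine, assuming a hypothetical `blocking_fetch()` in place of `_run_sync()`: `functools.partial()` with `run_in_executor()`, or `asyncio.to_thread()` (Python 3.9+). Inside a coroutine, `asyncio.get_running_loop()` is the documented way to get the loop.

```python
import asyncio
import functools
import threading
import time
from typing import Optional


def blocking_fetch(url: str, timeout: float, headers: Optional[dict] = None) -> dict:
    # Stand-in for a blocking call such as requests.request(); sleeps to simulate I/O
    time.sleep(0.05)
    return {
        "url": url,
        "timeout": timeout,
        "headers": headers or {},
        "thread": threading.current_thread().name,
    }


async def run(url: str, timeout: float, headers: Optional[dict] = None) -> dict:
    loop = asyncio.get_running_loop()
    # run_in_executor() forwards positional args only, so bind keywords with partial()
    call = functools.partial(blocking_fetch, url=url, timeout=timeout, headers=headers)
    return await loop.run_in_executor(None, call)  # None = default ThreadPoolExecutor


async def run_to_thread(url: str, timeout: float, headers: Optional[dict] = None) -> dict:
    # Python 3.9+: asyncio.to_thread() forwards keyword arguments itself
    return await asyncio.to_thread(blocking_fetch, url, timeout, headers=headers)


async def fetch_many(urls: list) -> list:
    # The blocking calls overlap in the thread pool instead of stalling the event loop
    return await asyncio.gather(*(run(u, timeout=10) for u in urls))
```

`partial()` and `to_thread()` behave the same here; the lambda in the diff is also correct, `partial` just makes the bound arguments inspectable.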
@@ -47,7 +47,7 @@ class fetcher(Fetcher):
self.proxy_url = k.strip()
def run(self,
async def run(self,
url,
timeout,
request_headers,
@@ -58,77 +58,86 @@ class fetcher(Fetcher):
is_binary=False,
empty_pages_are_a_change=False):
from selenium.webdriver.chrome.options import Options as ChromeOptions
# request_body, request_method unused for now, until some magic in the future happens.
import asyncio
# Wrap the entire selenium operation in a thread executor
def _run_sync():
from selenium.webdriver.chrome.options import Options as ChromeOptions
# request_body, request_method unused for now, until some magic in the future happens.
options = ChromeOptions()
options = ChromeOptions()
# Load Chrome options from env
CHROME_OPTIONS = [
line.strip()
for line in os.getenv("CHROME_OPTIONS", "").strip().splitlines()
if line.strip()
]
# Load Chrome options from env
CHROME_OPTIONS = [
line.strip()
for line in os.getenv("CHROME_OPTIONS", "").strip().splitlines()
if line.strip()
]
for opt in CHROME_OPTIONS:
options.add_argument(opt)
for opt in CHROME_OPTIONS:
options.add_argument(opt)
# 1. proxy_config /Proxy(proxy_config) selenium object is REALLY unreliable
# 2. selenium-wire can't be used because the websocket version conflicts with pypeteer-ng
# 3. selenium only allows ONE runner at a time by default!
# 4. driver must use quit() or it will continue to block/hold the selenium process!!
# 1. proxy_config /Proxy(proxy_config) selenium object is REALLY unreliable
# 2. selenium-wire can't be used because the websocket version conflicts with pypeteer-ng
# 3. selenium only allows ONE runner at a time by default!
# 4. driver must use quit() or it will continue to block/hold the selenium process!!
if self.proxy_url:
options.add_argument(f'--proxy-server={self.proxy_url}')
if self.proxy_url:
options.add_argument(f'--proxy-server={self.proxy_url}')
from selenium.webdriver.remote.remote_connection import RemoteConnection
from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
driver = None
try:
# Create the RemoteConnection and set timeout (e.g., 30 seconds)
remote_connection = RemoteConnection(
self.browser_connection_url,
)
remote_connection.set_timeout(30) # seconds
from selenium.webdriver.remote.remote_connection import RemoteConnection
from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
driver = None
try:
# Create the RemoteConnection and set timeout (e.g., 30 seconds)
remote_connection = RemoteConnection(
self.browser_connection_url,
)
remote_connection.set_timeout(30) # seconds
# Now create the driver with the RemoteConnection
driver = RemoteWebDriver(
command_executor=remote_connection,
options=options
)
# Now create the driver with the RemoteConnection
driver = RemoteWebDriver(
command_executor=remote_connection,
options=options
)
driver.set_page_load_timeout(int(os.getenv("WEBDRIVER_PAGELOAD_TIMEOUT", 45)))
except Exception as e:
if driver:
driver.quit()
raise e
driver.set_page_load_timeout(int(os.getenv("WEBDRIVER_PAGELOAD_TIMEOUT", 45)))
except Exception as e:
if driver:
driver.quit()
raise e
try:
driver.get(url)
try:
driver.get(url)
if not "--window-size" in os.getenv("CHROME_OPTIONS", ""):
driver.set_window_size(1280, 1024)
if not "--window-size" in os.getenv("CHROME_OPTIONS", ""):
driver.set_window_size(1280, 1024)
driver.implicitly_wait(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))
if self.webdriver_js_execute_code is not None:
driver.execute_script(self.webdriver_js_execute_code)
# Selenium doesn't automatically wait for actions as good as Playwright, so wait again
driver.implicitly_wait(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))
# @todo - how to check this? is it possible?
self.status_code = 200
# @todo somehow we should try to get this working for WebDriver
# raise EmptyReply(url=url, status_code=r.status_code)
if self.webdriver_js_execute_code is not None:
driver.execute_script(self.webdriver_js_execute_code)
# Selenium doesn't automatically wait for actions as good as Playwright, so wait again
driver.implicitly_wait(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))
# @todo - how to check this? is it possible?
self.status_code = 200
# @todo somehow we should try to get this working for WebDriver
# raise EmptyReply(url=url, status_code=r.status_code)
# @todo - dom wait loaded?
import time
time.sleep(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay)
self.content = driver.page_source
self.headers = {}
self.screenshot = driver.get_screenshot_as_png()
except Exception as e:
driver.quit()
raise e
# @todo - dom wait loaded?
time.sleep(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay)
self.content = driver.page_source
self.headers = {}
self.screenshot = driver.get_screenshot_as_png()
except Exception as e:
driver.quit()
raise e
driver.quit()
# Run the selenium operations in a thread pool to avoid blocking the event loop
loop = asyncio.get_running_loop()
await loop.run_in_executor(None, _run_sync)
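Comment 4 above warns that the driver must always be `quit()`. The threaded `_run_sync()` calls `quit()` in the `except` branch and again on the success path; a `try/finally` covers both in one place. A sketch with a hypothetical `FakeDriver` standing in for `RemoteWebDriver` (only `get`, `page_source` and `quit` are modeled):

```python
import asyncio
from typing import Callable


class FakeDriver:
    """Stand-in for selenium's RemoteWebDriver; records whether quit() ran."""

    def __init__(self, fail_on_get: bool = False):
        self.fail_on_get = fail_on_get
        self.quit_called = False
        self.page_source = ""

    def get(self, url: str) -> None:
        if self.fail_on_get:
            raise RuntimeError(f"could not load {url}")
        self.page_source = f"<html>{url}</html>"

    def quit(self) -> None:
        self.quit_called = True


def fetch_with_driver(make_driver: Callable[[], FakeDriver], url: str) -> str:
    # Create the driver outside the try: if construction fails there is nothing to quit
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs on success and on any exception, so the remote session is always released
        driver.quit()


async def run(make_driver: Callable[[], FakeDriver], url: str) -> str:
    loop = asyncio.get_running_loop()
    # Exceptions raised in the worker thread are re-raised here by the await
    return await loop.run_in_executor(None, fetch_with_driver, make_driver, url)
```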
+448
@@ -1,4 +1,5 @@
import queue
import asyncio
from blinker import signal
from loguru import logger
@@ -50,3 +51,450 @@ class SignalPriorityQueue(queue.PriorityQueue):
except Exception as e:
logger.critical(f"Exception: {e}")
return item
def get_uuid_position(self, target_uuid):
"""
Find the position of a watch UUID in the priority queue.
Optimized for large queues - O(n) complexity instead of O(n log n).
Args:
target_uuid: The UUID to search for
Returns:
dict: Position info; 'found' is False and 'position'/'priority' are None if the UUID is not queued
- position: 0-based position in queue (0 = next to be processed)
- total_items: total number of items in queue
- priority: the priority value of the found item
- found: whether the UUID was found in the queue
"""
with self.mutex:
queue_list = list(self.queue)
total_items = len(queue_list)
if total_items == 0:
return {
'position': None,
'total_items': 0,
'priority': None,
'found': False
}
# Find the target item and its priority first - O(n)
target_item = None
target_priority = None
for item in queue_list:
if (hasattr(item, 'item') and
isinstance(item.item, dict) and
item.item.get('uuid') == target_uuid):
target_item = item
target_priority = item.priority
break
if target_item is None:
return {
'position': None,
'total_items': total_items,
'priority': None,
'found': False
}
# Count how many items have higher priority (lower numbers) - O(n)
position = 0
for item in queue_list:
# Items with lower priority numbers are processed first
if item.priority < target_priority:
position += 1
elif item.priority == target_priority and item != target_item:
# For equal priority, assume every other tied item is ahead (worst case)
# (Note: this is approximate since heap order isn't guaranteed for equal priorities)
position += 1
return {
'position': position,
'total_items': total_items,
'priority': target_priority,
'found': True
}
def get_all_queued_uuids(self, limit=None, offset=0):
"""
Get UUIDs currently in the queue with their positions.
For large queues, use limit/offset for pagination.
Args:
limit: Maximum number of items to return (None = all)
offset: Number of items to skip (for pagination)
Returns:
dict: Contains items and metadata
- items: List of dicts with uuid, position, and priority
- total_items: Total number of items in queue
- returned_items: Number of items returned
- has_more: Whether there are more items after this page
"""
with self.mutex:
queue_list = list(self.queue)
total_items = len(queue_list)
if total_items == 0:
return {
'items': [],
'total_items': 0,
'returned_items': 0,
'has_more': False
}
# For very large queues, warn about performance
if total_items > 1000 and limit is None:
logger.warning(f"Getting all {total_items} queued items without limit - this may be slow")
# Sort only if we need exact positions (expensive for large queues)
if limit is not None and limit <= 100:
# For small requests, we can afford to sort
queue_items = sorted(queue_list)
end_idx = min(offset + limit, len(queue_items)) if limit else len(queue_items)
items_to_process = queue_items[offset:end_idx]
result = []
for position, item in enumerate(items_to_process, start=offset):
if (hasattr(item, 'item') and
isinstance(item.item, dict) and
'uuid' in item.item):
result.append({
'uuid': item.item['uuid'],
'position': position,
'priority': item.priority
})
return {
'items': result,
'total_items': total_items,
'returned_items': len(result),
'has_more': (offset + len(result)) < total_items
}
else:
# For large requests, skip the sort and return approximate positions
# Each position is a linear scan, so this is O(n * k) for k returned items (O(n^2) with no limit)
result = []
processed = 0
skipped = 0
for item in queue_list:
if (hasattr(item, 'item') and
isinstance(item.item, dict) and
'uuid' in item.item):
if skipped < offset:
skipped += 1
continue
if limit and processed >= limit:
break
# Approximate position based on priority comparison
approx_position = sum(1 for other in queue_list if other.priority < item.priority)
result.append({
'uuid': item.item['uuid'],
'position': approx_position, # Approximate
'priority': item.priority
})
processed += 1
return {
'items': result,
'total_items': total_items,
'returned_items': len(result),
'has_more': (offset + len(result)) < total_items,
'note': 'Positions are approximate for performance with large queues'
}
def get_queue_summary(self):
"""
Get a quick summary of queue state without expensive operations.
O(n) complexity - fast even for large queues.
Returns:
dict: Queue summary statistics
"""
with self.mutex:
queue_list = list(self.queue)
total_items = len(queue_list)
if total_items == 0:
return {
'total_items': 0,
'priority_breakdown': {},
'immediate_items': 0,
'clone_items': 0,
'scheduled_items': 0
}
# Count items by priority type - O(n)
immediate_items = 0 # priority 1
clone_items = 0 # priority 5
scheduled_items = 0 # priority > 100 (timestamps)
priority_counts = {}
for item in queue_list:
priority = item.priority
priority_counts[priority] = priority_counts.get(priority, 0) + 1
if priority == 1:
immediate_items += 1
elif priority == 5:
clone_items += 1
elif priority > 100:
scheduled_items += 1
return {
'total_items': total_items,
'priority_breakdown': priority_counts,
'immediate_items': immediate_items,
'clone_items': clone_items,
'scheduled_items': scheduled_items,
'min_priority': min(priority_counts.keys()) if priority_counts else None,
'max_priority': max(priority_counts.keys()) if priority_counts else None
}
class AsyncSignalPriorityQueue(asyncio.PriorityQueue):
"""
Async version of SignalPriorityQueue that sends signals when items are added/removed.
This class extends asyncio.PriorityQueue and maintains the same signal behavior
as the synchronous version for real-time UI updates.
"""
def __init__(self, maxsize=0):
super().__init__(maxsize)
try:
self.queue_length_signal = signal('queue_length')
except Exception as e:
logger.critical(f"Exception: {e}")
async def put(self, item):
# Call the parent's put method first
await super().put(item)
# After putting the item in the queue, check if it has a UUID and emit signal
if hasattr(item, 'item') and isinstance(item.item, dict) and 'uuid' in item.item:
uuid = item.item['uuid']
# Get the signal and send it if it exists
watch_check_update = signal('watch_check_update')
if watch_check_update:
# Send the watch_uuid parameter
watch_check_update.send(watch_uuid=uuid)
# Send queue_length signal with current queue size
try:
if self.queue_length_signal:
self.queue_length_signal.send(length=self.qsize())
except Exception as e:
logger.critical(f"Exception: {e}")
async def get(self):
# Call the parent's get method first
item = await super().get()
# Send queue_length signal with current queue size
try:
if self.queue_length_signal:
self.queue_length_signal.send(length=self.qsize())
except Exception as e:
logger.critical(f"Exception: {e}")
return item
@property
def queue(self):
"""
Provide compatibility with sync PriorityQueue.queue access
Returns the internal queue for template access
"""
return self._queue if hasattr(self, '_queue') else []
def get_uuid_position(self, target_uuid):
"""
Find the position of a watch UUID in the async priority queue.
Optimized for large queues - O(n) complexity instead of O(n log n).
Args:
target_uuid: The UUID to search for
Returns:
dict: Position info; 'found' is False and 'position'/'priority' are None if the UUID is not queued
- position: 0-based position in queue (0 = next to be processed)
- total_items: total number of items in queue
- priority: the priority value of the found item
- found: whether the UUID was found in the queue
"""
queue_list = list(self._queue)
total_items = len(queue_list)
if total_items == 0:
return {
'position': None,
'total_items': 0,
'priority': None,
'found': False
}
# Find the target item and its priority first - O(n)
target_item = None
target_priority = None
for item in queue_list:
if (hasattr(item, 'item') and
isinstance(item.item, dict) and
item.item.get('uuid') == target_uuid):
target_item = item
target_priority = item.priority
break
if target_item is None:
return {
'position': None,
'total_items': total_items,
'priority': None,
'found': False
}
# Count how many items have higher priority (lower numbers) - O(n)
position = 0
for item in queue_list:
if item.priority < target_priority:
position += 1
elif item.priority == target_priority and item != target_item:
position += 1
return {
'position': position,
'total_items': total_items,
'priority': target_priority,
'found': True
}
def get_all_queued_uuids(self, limit=None, offset=0):
"""
Get UUIDs currently in the async queue with their positions.
For large queues, use limit/offset for pagination.
Args:
limit: Maximum number of items to return (None = all)
offset: Number of items to skip (for pagination)
Returns:
dict: Contains items and metadata (same structure as sync version)
"""
queue_list = list(self._queue)
total_items = len(queue_list)
if total_items == 0:
return {
'items': [],
'total_items': 0,
'returned_items': 0,
'has_more': False
}
# Same logic as sync version but without mutex
if limit is not None and limit <= 100:
queue_items = sorted(queue_list)
end_idx = min(offset + limit, len(queue_items)) if limit else len(queue_items)
items_to_process = queue_items[offset:end_idx]
result = []
for position, item in enumerate(items_to_process, start=offset):
if (hasattr(item, 'item') and
isinstance(item.item, dict) and
'uuid' in item.item):
result.append({
'uuid': item.item['uuid'],
'position': position,
'priority': item.priority
})
return {
'items': result,
'total_items': total_items,
'returned_items': len(result),
'has_more': (offset + len(result)) < total_items
}
else:
# Approximate positions without sorting; each is a linear scan, so O(n * k) for k returned items
result = []
processed = 0
skipped = 0
for item in queue_list:
if (hasattr(item, 'item') and
isinstance(item.item, dict) and
'uuid' in item.item):
if skipped < offset:
skipped += 1
continue
if limit and processed >= limit:
break
approx_position = sum(1 for other in queue_list if other.priority < item.priority)
result.append({
'uuid': item.item['uuid'],
'position': approx_position,
'priority': item.priority
})
processed += 1
return {
'items': result,
'total_items': total_items,
'returned_items': len(result),
'has_more': (offset + len(result)) < total_items,
'note': 'Positions are approximate for performance with large queues'
}
def get_queue_summary(self):
"""
Get a quick summary of async queue state.
O(n) complexity - fast even for large queues.
"""
queue_list = list(self._queue)
total_items = len(queue_list)
if total_items == 0:
return {
'total_items': 0,
'priority_breakdown': {},
'immediate_items': 0,
'clone_items': 0,
'scheduled_items': 0
}
immediate_items = 0
clone_items = 0
scheduled_items = 0
priority_counts = {}
for item in queue_list:
priority = item.priority
priority_counts[priority] = priority_counts.get(priority, 0) + 1
if priority == 1:
immediate_items += 1
elif priority == 5:
clone_items += 1
elif priority > 100:
scheduled_items += 1
return {
'total_items': total_items,
'priority_breakdown': priority_counts,
'immediate_items': immediate_items,
'clone_items': clone_items,
'scheduled_items': scheduled_items,
'min_priority': min(priority_counts.keys()) if priority_counts else None,
'max_priority': max(priority_counts.keys()) if priority_counts else None
}
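The async queue above emits a `queue_length` signal from its overridden `put()`/`get()` and computes positions from the private `_queue` heap. A minimal sketch of the same idea, assuming a `PrioritizedItem` dataclass that compares on `priority` only (as `queuedWatchMetaData.PrioritizedItem` is assumed to), with a plain callback standing in for the blinker signal and an exact position computed by sorting a snapshot:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass(order=True)
class PrioritizedItem:
    # Assumed to mirror queuedWatchMetaData.PrioritizedItem: only priority is compared
    priority: int
    item: Any = field(compare=False)


class NotifyingPriorityQueue(asyncio.PriorityQueue):
    """Sketch: calls `on_length(qsize)` after put()/get(); the callback stands in for signals."""

    def __init__(self, maxsize: int = 0, on_length: Optional[Callable[[int], None]] = None):
        super().__init__(maxsize)
        self._on_length = on_length

    def _notify(self) -> None:
        if self._on_length is not None:
            self._on_length(self.qsize())

    async def put(self, item: PrioritizedItem) -> None:
        await super().put(item)
        self._notify()

    async def get(self) -> PrioritizedItem:
        item = await super().get()
        self._notify()
        return item

    def get_uuid_position(self, uuid: str) -> Optional[int]:
        # _queue is CPython's private heap list. Sorting a copy gives the dequeue
        # order; ties between equal priorities follow heap layout, not insertion order.
        for position, entry in enumerate(sorted(self._queue)):
            payload = getattr(entry, "item", None)
            if isinstance(payload, dict) and payload.get("uuid") == uuid:
                return position
        return None
```

Like the class in the diff, only `put()`/`get()` notify; `put_nowait()`/`get_nowait()` bypass the overrides and emit nothing.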
+143 -28
@@ -4,6 +4,7 @@ import flask_login
import locale
import os
import queue
import sys
import threading
import time
import timeago
@@ -11,7 +12,8 @@ from blinker import signal
from changedetectionio.strtobool import strtobool
from threading import Event
from changedetectionio.custom_queue import SignalPriorityQueue
from changedetectionio.custom_queue import SignalPriorityQueue, AsyncSignalPriorityQueue
from changedetectionio import worker_handler
from flask import (
Flask,
@@ -45,12 +47,11 @@ from .time_handler import is_within_schedule
datastore = None
# Local
running_update_threads = []
ticker_thread = None
extra_stylesheets = []
update_q = SignalPriorityQueue()
# Use the async queue when async workers are enabled, otherwise keep the sync queue for backward compatibility
update_q = AsyncSignalPriorityQueue() if worker_handler.USE_ASYNC_WORKERS else SignalPriorityQueue()
notification_q = queue.Queue()
MAX_QUEUE_SIZE = 2000
@@ -145,10 +146,32 @@ def _jinja2_filter_format_number_locale(value: float) -> str:
@app.template_global('is_checking_now')
def _watch_is_checking_now(watch_obj, format="%Y-%m-%d %H:%M:%S"):
# Worker thread tells us which UUID it is currently processing.
for t in running_update_threads:
if t.current_uuid == watch_obj['uuid']:
return True
return worker_handler.is_watch_running(watch_obj['uuid'])
@app.template_global('get_watch_queue_position')
def _get_watch_queue_position(watch_obj):
"""Get the position of a watch in the queue"""
uuid = watch_obj['uuid']
return update_q.get_uuid_position(uuid)
@app.template_global('get_current_worker_count')
def _get_current_worker_count():
"""Get the current number of operational workers"""
return worker_handler.get_worker_count()
@app.template_global('get_worker_status_info')
def _get_worker_status_info():
"""Get detailed worker status information for display"""
status = worker_handler.get_worker_status()
running_uuids = worker_handler.get_running_uuids()
return {
'count': status['worker_count'],
'type': status['worker_type'],
'active_workers': len(running_uuids),
'processing_watches': running_uuids,
'loop_running': status.get('async_loop_running', None)
}
# We use the whole watch object from the store/JSON so we can see if there's some related status in terms of a thread
@@ -470,16 +493,21 @@ def changedetection_app(config=None, datastore_o=None):
# watchlist UI buttons etc
import changedetectionio.blueprint.ui as ui
app.register_blueprint(ui.construct_blueprint(datastore, update_q, running_update_threads, queuedWatchMetaData, watch_check_update))
app.register_blueprint(ui.construct_blueprint(datastore, update_q, worker_handler, queuedWatchMetaData, watch_check_update))
import changedetectionio.blueprint.watchlist as watchlist
app.register_blueprint(watchlist.construct_blueprint(datastore=datastore, update_q=update_q, queuedWatchMetaData=queuedWatchMetaData), url_prefix='')
# Initialize Socket.IO server
from changedetectionio.realtime.socket_server import init_socketio
global socketio_server
socketio_server = init_socketio(app, datastore)
logger.info("Socket.IO server initialized")
# Initialize Socket.IO server conditionally based on settings
socket_io_enabled = datastore.data['settings']['application']['ui'].get('socket_io_enabled', True)
if socket_io_enabled:
from changedetectionio.realtime.socket_server import init_socketio
global socketio_server
socketio_server = init_socketio(app, datastore)
logger.info("Socket.IO server initialized")
else:
logger.info("Socket.IO server disabled via settings")
socketio_server = None
# Memory cleanup endpoint
@app.route('/gc-cleanup', methods=['GET'])
@@ -491,12 +519,91 @@ def changedetection_app(config=None, datastore_o=None):
result = memory_cleanup(app)
return jsonify({"status": "success", "message": "Memory cleanup completed", "result": result})
# Worker health check endpoint
@app.route('/worker-health', methods=['GET'])
@login_optionally_required
def worker_health():
from flask import jsonify
expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
# Get basic status
status = worker_handler.get_worker_status()
# Perform health check
health_result = worker_handler.check_worker_health(
expected_count=expected_workers,
update_q=update_q,
notification_q=notification_q,
app=app,
datastore=datastore
)
return jsonify({
"status": "success",
"worker_status": status,
"health_check": health_result,
"expected_workers": expected_workers
})
# Queue status endpoint
@app.route('/queue-status', methods=['GET'])
@login_optionally_required
def queue_status():
from flask import jsonify, request
# Get specific UUID position if requested
target_uuid = request.args.get('uuid')
if target_uuid:
position_info = update_q.get_uuid_position(target_uuid)
return jsonify({
"status": "success",
"uuid": target_uuid,
"queue_position": position_info
})
else:
# Get pagination parameters
limit = request.args.get('limit', type=int)
offset = request.args.get('offset', type=int, default=0)
summary_only = request.args.get('summary', '').strip().lower() in ('1', 'true', 'yes', 'on')
if summary_only:
# Fast summary for large queues
summary = update_q.get_queue_summary()
return jsonify({
"status": "success",
"queue_summary": summary
})
else:
# Get queued items with pagination support
if limit is None:
# Default limit for large queues to prevent performance issues
queue_size = update_q.qsize()
if queue_size > 100:
limit = 50
logger.warning(f"Large queue ({queue_size} items) detected, limiting to {limit} items. Use ?limit=N for more.")
all_queued = update_q.get_all_queued_uuids(limit=limit, offset=offset)
return jsonify({
"status": "success",
"queue_size": update_q.qsize(),
"queued_data": all_queued
})
# Start the async workers during app initialization
# Can be overridden by ENV or use the default settings
n_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
logger.info(f"Starting {n_workers} workers during app initialization")
worker_handler.start_workers(n_workers, update_q, notification_q, app, datastore)
# @todo handle ctrl break
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks).start()
threading.Thread(target=notification_runner).start()
in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
# Check for new release version, but not when running in test/build or pytest
if not os.getenv("GITHUB_REF", False) and not strtobool(os.getenv('DISABLE_VERSION_CHECK', 'no')):
if not os.getenv("GITHUB_REF", False) and not strtobool(os.getenv('DISABLE_VERSION_CHECK', 'no')) and not in_pytest:
threading.Thread(target=check_for_new_version).start()
# Return the Flask app - the Socket.IO will be attached to it but initialized separately
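Werkzeug's `MultiDict.get(key, type=bool)` just calls `bool()` on the raw string, so `?summary=false` or `?summary=0` is truthy; only an empty value is false. A sketch of parsing the `/queue-status` parameters explicitly, using hypothetical helpers (`parse_bool`, `parse_queue_status_args`) and the same 50-item default limit for queues over 100 items that the endpoint applies:

```python
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on", "y", "t"}


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    # bool("false") is True, so compare against explicit truthy spellings instead
    if value is None:
        return default
    return value.strip().lower() in TRUE_VALUES


def parse_int(value: Optional[str], default: Optional[int]) -> Optional[int]:
    # Fall back to the default on missing or malformed input rather than raising
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def parse_queue_status_args(args: Mapping[str, str], queue_size: int,
                            default_limit: int = 50, large_queue: int = 100) -> dict:
    limit = parse_int(args.get("limit"), None)
    offset = max(parse_int(args.get("offset"), 0), 0)  # negative offsets make no sense
    if limit is None and queue_size > large_queue:
        limit = default_limit  # cap unbounded requests on large queues
    return {"summary": parse_bool(args.get("summary")), "limit": limit, "offset": offset}
```

In a Flask view this would be called as `parse_queue_status_args(request.args, update_q.qsize())`, since `request.args` supports `.get()`.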
@@ -588,27 +695,35 @@ def notification_runner():
# Threaded runner, look for new watches to feed into the Queue.
def ticker_thread_check_time_launch_checks():
import random
from changedetectionio import update_worker
proxy_last_called_time = {}
last_health_check = 0
recheck_time_minimum_seconds = int(os.getenv('MINIMUM_SECONDS_RECHECK_TIME', 3))
logger.debug(f"System env MINIMUM_SECONDS_RECHECK_TIME {recheck_time_minimum_seconds}")
# Spin up Workers that do the fetching
# Can be overridden by ENV or use the default settings
n_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
for _ in range(n_workers):
new_worker = update_worker.update_worker(update_q, notification_q, app, datastore)
running_update_threads.append(new_worker)
new_worker.start()
# Workers are now started during app initialization, not here
while not app.config.exit.is_set():
# Periodic worker health check (every 60 seconds)
now = time.time()
if now - last_health_check > 60:
expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
health_result = worker_handler.check_worker_health(
expected_count=expected_workers,
update_q=update_q,
notification_q=notification_q,
app=app,
datastore=datastore
)
if health_result['status'] != 'healthy':
logger.warning(f"Worker health check: {health_result['message']}")
last_health_check = now
# Get a list of watches by UUID that are currently fetching data
running_uuids = []
for t in running_update_threads:
if t.current_uuid:
running_uuids.append(t.current_uuid)
running_uuids = worker_handler.get_running_uuids()
# Re #232 - Deepcopy the data incase it changes while we're iterating through it all
watch_uuid_list = []
@@ -711,7 +826,7 @@ def ticker_thread_check_time_launch_checks():
f"{now - watch['last_checked']:0.2f}s since last checked")
# Into the queue with you
update_q.put(queuedWatchMetaData.PrioritizedItem(priority=priority, item={'uuid': uuid}))
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=priority, item={'uuid': uuid}))
# Reset for next time
watch.jitter_seconds = 0
+7
@@ -719,6 +719,12 @@ class globalSettingsRequestForm(Form):
jitter_seconds = IntegerField('Random jitter seconds ± check',
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0, message="Should contain zero or more seconds")])
workers = IntegerField('Number of fetch workers',
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=1, max=50,
message="Should be between 1 and 50")])
extra_proxies = FieldList(FormField(SingleExtraProxy), min_entries=5)
extra_browsers = FieldList(FormField(SingleExtraBrowser), min_entries=5)
@@ -733,6 +739,7 @@ class globalSettingsRequestForm(Form):
class globalSettingsApplicationUIForm(Form):
open_diff_in_new_tab = BooleanField('Open diff page in a new tab', default=True, validators=[validators.Optional()])
socket_io_enabled = BooleanField('Realtime UI Updates Enabled', default=True, validators=[validators.Optional()])
# datastore.data['settings']['application']..
class globalSettingsApplicationForm(commonSettingsForm):
+1
@@ -62,6 +62,7 @@ class model(dict):
'timezone': None, # Default IANA timezone name
'ui': {
'open_diff_in_new_tab': True,
'socket_io_enabled': True
},
}
}
+246
@@ -0,0 +1,246 @@
#!/usr/bin/env python3
"""
Notification Service Module
Extracted from update_worker.py to provide standalone notification functionality
for both sync and async workers
"""
import time
from loguru import logger
class NotificationService:
"""
Standalone notification service that handles all notification functionality
previously embedded in the update_worker class
"""
def __init__(self, datastore, notification_q):
self.datastore = datastore
self.notification_q = notification_q
def queue_notification_for_watch(self, n_object, watch):
"""
Queue a notification for a watch with full diff rendering and template variables
"""
from changedetectionio import diff
from changedetectionio.notification import default_notification_format_for_watch
dates = []
trigger_text = ''
now = time.time()
if watch:
watch_history = watch.history
dates = list(watch_history.keys())
trigger_text = watch.get('trigger_text', [])
# Add text that was triggered
if len(dates):
snapshot_contents = watch.get_history_snapshot(dates[-1])
else:
snapshot_contents = "No snapshot/history available, the watch should fetch atleast once."
# If we ended up here with "System default"
if n_object.get('notification_format') == default_notification_format_for_watch:
n_object['notification_format'] = self.datastore.data['settings']['application'].get('notification_format')
html_colour_enable = False
# HTML needs linebreak, but MarkDown and Text can use a linefeed
if n_object.get('notification_format') == 'HTML':
line_feed_sep = "<br>"
# Snapshot will be plaintext on the disk, convert to some kind of HTML
snapshot_contents = snapshot_contents.replace('\n', line_feed_sep)
elif n_object.get('notification_format') == 'HTML Color':
line_feed_sep = "<br>"
# Snapshot will be plaintext on the disk, convert to some kind of HTML
snapshot_contents = snapshot_contents.replace('\n', line_feed_sep)
html_colour_enable = True
else:
line_feed_sep = "\n"
triggered_text = ''
if len(trigger_text):
from . import html_tools
triggered_text = html_tools.get_triggered_text(content=snapshot_contents, trigger_text=trigger_text)
if triggered_text:
triggered_text = line_feed_sep.join(triggered_text)
# Could be called as a 'test notification' with only 1 snapshot available
prev_snapshot = "Example text: example test\nExample text: change detection is cool\nExample text: some more examples\n"
current_snapshot = "Example text: example test\nExample text: change detection is fantastic\nExample text: even more examples\nExample text: a lot more examples"
if len(dates) > 1:
prev_snapshot = watch.get_history_snapshot(dates[-2])
current_snapshot = watch.get_history_snapshot(dates[-1])
n_object.update({
'current_snapshot': snapshot_contents,
'diff': diff.render_diff(prev_snapshot, current_snapshot, line_feed_sep=line_feed_sep, html_colour=html_colour_enable),
'diff_added': diff.render_diff(prev_snapshot, current_snapshot, include_removed=False, line_feed_sep=line_feed_sep),
'diff_full': diff.render_diff(prev_snapshot, current_snapshot, include_equal=True, line_feed_sep=line_feed_sep, html_colour=html_colour_enable),
'diff_patch': diff.render_diff(prev_snapshot, current_snapshot, line_feed_sep=line_feed_sep, patch_format=True),
'diff_removed': diff.render_diff(prev_snapshot, current_snapshot, include_added=False, line_feed_sep=line_feed_sep),
'notification_timestamp': now,
'screenshot': watch.get_screenshot() if watch and watch.get('notification_screenshot') else None,
'triggered_text': triggered_text,
'uuid': watch.get('uuid') if watch else None,
'watch_url': watch.get('url') if watch else None,
})
if watch:
n_object.update(watch.extra_notification_token_values())
logger.trace(f"Main rendered notification placeholders (diff_added etc) calculated in {time.time()-now:.3f}s")
logger.debug("Queued notification for sending")
self.notification_q.put(n_object)
def _check_cascading_vars(self, var_name, watch):
"""
Check notification variables in cascading priority:
Individual watch settings > Tag settings > Global settings
"""
from changedetectionio.notification import (
default_notification_format_for_watch,
default_notification_body,
default_notification_title
)
# Would be better if this was some kind of Object where Watch can reference the parent datastore etc
v = watch.get(var_name)
if v and not watch.get('notification_muted'):
if var_name == 'notification_format' and v == default_notification_format_for_watch:
return self.datastore.data['settings']['application'].get('notification_format')
return v
tags = self.datastore.get_all_tags_for_watch(uuid=watch.get('uuid'))
if tags:
for tag_uuid, tag in tags.items():
v = tag.get(var_name)
if v and not tag.get('notification_muted'):
return v
if self.datastore.data['settings']['application'].get(var_name):
return self.datastore.data['settings']['application'].get(var_name)
# Otherwise could be defaults
if var_name == 'notification_format':
return default_notification_format_for_watch
if var_name == 'notification_body':
return default_notification_body
if var_name == 'notification_title':
return default_notification_title
return None
def send_content_changed_notification(self, watch_uuid):
"""
Send notification when content changes are detected
"""
n_object = {}
watch = self.datastore.data['watching'].get(watch_uuid)
if not watch:
return
watch_history = watch.history
dates = list(watch_history.keys())
# Theoretically it's possible that this could be just 1 long,
# - In the case that the timestamp key was not unique
if len(dates) == 1:
raise ValueError(
"History index had 2 or more, but only 1 date loaded, timestamps were not unique? maybe two of the same timestamps got written, needs more delay?"
)
# Should be a better parent getter in the model object
# Prefer - Individual watch settings > Tag settings > Global settings (in that order)
n_object['notification_urls'] = self._check_cascading_vars('notification_urls', watch)
n_object['notification_title'] = self._check_cascading_vars('notification_title', watch)
n_object['notification_body'] = self._check_cascading_vars('notification_body', watch)
n_object['notification_format'] = self._check_cascading_vars('notification_format', watch)
# (Individual watch) Only prepare to notify if the rules above matched
queued = False
if n_object and n_object.get('notification_urls'):
queued = True
count = watch.get('notification_alert_count', 0) + 1
self.datastore.update_watch(uuid=watch_uuid, update_obj={'notification_alert_count': count})
self.queue_notification_for_watch(n_object=n_object, watch=watch)
return queued
def send_filter_failure_notification(self, watch_uuid):
"""
Send notification when CSS/XPath filters fail consecutively
"""
threshold = self.datastore.data['settings']['application'].get('filter_failure_notification_threshold_attempts')
watch = self.datastore.data['watching'].get(watch_uuid)
if not watch:
return
n_object = {'notification_title': 'Changedetection.io - Alert - CSS/xPath filter was not present in the page',
'notification_body': "Your configured CSS/xPath filters of '{}' for {{{{watch_url}}}} did not appear on the page after {} attempts, did the page change layout?\n\nLink: {{{{base_url}}}}/edit/{{{{watch_uuid}}}}\n\nThanks - Your omniscient changedetection.io installation :)\n".format(
", ".join(watch['include_filters']),
threshold),
'notification_format': 'text'}
if len(watch['notification_urls']):
n_object['notification_urls'] = watch['notification_urls']
elif len(self.datastore.data['settings']['application']['notification_urls']):
n_object['notification_urls'] = self.datastore.data['settings']['application']['notification_urls']
# Only prepare to notify if the rules above matched
if 'notification_urls' in n_object:
n_object.update({
'watch_url': watch['url'],
'uuid': watch_uuid,
'screenshot': None
})
self.notification_q.put(n_object)
logger.debug(f"Sent filter not found notification for {watch_uuid}")
else:
logger.debug(f"NOT sending filter not found notification for {watch_uuid} - no notification URLs")
def send_step_failure_notification(self, watch_uuid, step_n):
"""
Send notification when browser steps fail consecutively
"""
watch = self.datastore.data['watching'].get(watch_uuid, False)
if not watch:
return
threshold = self.datastore.data['settings']['application'].get('filter_failure_notification_threshold_attempts')
n_object = {'notification_title': "Changedetection.io - Alert - Browser step at position {} could not be run".format(step_n+1),
'notification_body': "Your configured browser step at position {} for {{{{watch_url}}}} "
"did not appear on the page after {} attempts, did the page change layout? "
"Does it need a delay added?\n\nLink: {{{{base_url}}}}/edit/{{{{watch_uuid}}}}\n\n"
"Thanks - Your omniscient changedetection.io installation :)\n".format(step_n+1, threshold),
'notification_format': 'text'}
if len(watch['notification_urls']):
n_object['notification_urls'] = watch['notification_urls']
elif len(self.datastore.data['settings']['application']['notification_urls']):
n_object['notification_urls'] = self.datastore.data['settings']['application']['notification_urls']
# Only prepare to notify if the rules above matched
if 'notification_urls' in n_object:
n_object.update({
'watch_url': watch['url'],
'uuid': watch_uuid
})
self.notification_q.put(n_object)
logger.error(f"Sent step not found notification for {watch_uuid}")
# Convenience functions for creating notification service instances
def create_notification_service(datastore, notification_q):
"""
Factory function to create a NotificationService instance
"""
return NotificationService(datastore, notification_q)
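The priority order in `_check_cascading_vars()` (watch, then tags, then global settings, then built-in defaults) can be written as a pure function, which is easier to unit-test. This is a simplified sketch, not the module's code. `resolve_cascading` is a hypothetical name, and plain dicts stand in for the watch, tag and settings objects. The special case that maps a watch-level "System default" `notification_format` to the global format is left out.

```python
from typing import Any, Iterable, Mapping, Optional


def resolve_cascading(var_name: str,
                      watch: Mapping[str, Any],
                      tags: Iterable[Mapping[str, Any]],
                      global_settings: Mapping[str, Any],
                      defaults: Mapping[str, Any]) -> Optional[Any]:
    """Return the first non-empty value: watch > tags > global settings > defaults."""
    # A muted watch is skipped, mirroring the notification_muted check
    value = watch.get(var_name)
    if value and not watch.get('notification_muted'):
        return value

    # Tags are checked in iteration order; the first unmuted, non-empty value wins
    for tag in tags:
        value = tag.get(var_name)
        if value and not tag.get('notification_muted'):
            return value

    if global_settings.get(var_name):
        return global_settings[var_name]

    # Built-in defaults (e.g. default title/body/format); None if there is none
    return defaults.get(var_name)
```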
+12 -11
@@ -27,7 +27,7 @@ class difference_detection_processor():
# Generic fetcher that should be extended (requests, playwright etc)
self.fetcher = Fetcher()
def call_browser(self, preferred_proxy_id=None):
async def call_browser(self, preferred_proxy_id=None):
from requests.structures import CaseInsensitiveDict
@@ -147,16 +147,17 @@ class difference_detection_processor():
# And here we go! call the right browser with browser-specific settings
empty_pages_are_a_change = self.datastore.data['settings']['application'].get('empty_pages_are_a_change', False)
self.fetcher.run(url=url,
timeout=timeout,
request_headers=request_headers,
request_body=request_body,
request_method=request_method,
ignore_status_codes=ignore_status_codes,
current_include_filters=self.watch.get('include_filters'),
is_binary=is_binary,
empty_pages_are_a_change=empty_pages_are_a_change
)
# All fetchers are now async
await self.fetcher.run(url=url,
timeout=timeout,
request_headers=request_headers,
request_body=request_body,
request_method=request_method,
ignore_status_codes=ignore_status_codes,
current_include_filters=self.watch.get('include_filters'),
is_binary=is_binary,
empty_pages_are_a_change=empty_pages_are_a_change
)
#@todo .quit here could go on close object, so we can run JS if change-detected
self.fetcher.quit(watch=self.watch)
+124
@@ -0,0 +1,124 @@
# Real-time Socket.IO Implementation
This directory contains the Socket.IO implementation for changedetection.io's real-time updates.
## Architecture Overview
The real-time system provides live updates to the web interface for:
- Watch status changes (checking, completed, errors)
- Queue length updates
- General statistics updates
## Current Implementation
### Socket.IO Configuration
- **Async Mode**: `threading` (default) or `gevent` (optional via SOCKETIO_MODE env var)
- **Server**: Flask-SocketIO with threading support
- **Background Tasks**: Python threading with daemon threads
### Async Worker Integration
- **Workers**: Async workers using asyncio for watch processing
- **Queue**: AsyncSignalPriorityQueue for job distribution
- **Signals**: Blinker signals for real-time updates between workers and Socket.IO
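The queue/signal pairing above can be sketched with the standard library alone. `SignallingPriorityQueue` and its `listeners` list are hypothetical stand-ins for `AsyncSignalPriorityQueue` and a Blinker `queue_length` signal. `PrioritizedItem` only mirrors the `priority`/`item` shape used when watches are queued. `queue.PriorityQueue` always hands back the lowest `priority` value first.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(order=True)
class PrioritizedItem:
    # Only `priority` takes part in ordering; the payload is never compared
    priority: int
    item: Any = field(compare=False)


class SignallingPriorityQueue(queue.PriorityQueue):
    """Hypothetical stand-in: a PriorityQueue that reports its length on every put/get."""

    def __init__(self, maxsize: int = 0):
        super().__init__(maxsize)
        # Stand-ins for receivers connected to a 'queue_length' signal
        self.listeners: List[Callable[[int], None]] = []

    def _notify(self) -> None:
        length = self.qsize()
        for listener in self.listeners:
            listener(length)

    def put(self, item, block=True, timeout=None):
        super().put(item, block, timeout)
        self._notify()

    def get(self, block=True, timeout=None):
        item = super().get(block, timeout)
        self._notify()
        return item
```

Because `put_nowait()` and `get_nowait()` delegate to `put()` and `get()`, listeners also fire for the non-blocking variants.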
### Environment Variables
- `SOCKETIO_MODE=threading` (default, recommended)
- `SOCKETIO_MODE=gevent` (optional, has cross-platform limitations)
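A minimal sketch of how `SOCKETIO_MODE` can be mapped to a Flask-SocketIO `async_mode`, assuming the same fallback rules as `init_socketio()`: `gevent` is honoured only when the package is importable, and any other value ends up in `threading` mode. The function name `choose_async_mode` is hypothetical.

```python
import importlib.util
import os
from typing import Mapping, Optional


def choose_async_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Map SOCKETIO_MODE to a Flask-SocketIO async_mode, defaulting to 'threading'."""
    env = os.environ if env is None else env
    requested = env.get('SOCKETIO_MODE', 'threading').lower()

    if requested == 'gevent':
        # Only use gevent when the package is installed; otherwise fall back
        if importlib.util.find_spec('gevent') is not None:
            return 'gevent'
        return 'threading'

    # 'threading' and any unrecognised value both resolve to threading mode
    return 'threading'
```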
## Architecture Decision: Why Threading Mode?
### Previous Issues with Eventlet
**Eventlet was completely removed** due to fundamental compatibility issues:
1. **Monkey Patching Conflicts**: `eventlet.monkey_patch()` globally replaced Python's threading/socket modules, causing conflicts with:
- Playwright's synchronous browser automation
- Async worker event loops
- Various Python libraries expecting real threading
2. **Python 3.12+ Compatibility**: Eventlet had issues with newer Python versions and asyncio integration
3. **CVE-2023-29483**: DNS resolution vulnerability ("TuDoor") affecting eventlet before 0.35.2, as used by dnspython before 2.6.0
### Current Solution Benefits
**Threading Mode Advantages**:
- Full compatibility with async workers and Playwright
- No monkey patching - uses standard Python threading
- Better Python 3.12+ support
- Cross-platform compatibility (Windows, macOS, Linux)
- No external async library dependencies
- Fast shutdown capabilities
**Optional Gevent Support**:
- Available via `SOCKETIO_MODE=gevent` for high-concurrency scenarios
- Cross-platform limitations documented in requirements.txt
- Not recommended as default due to Windows socket limits and macOS ARM build issues
## Socket.IO Mode Configuration
### Threading Mode (Default)
```python
# Default when SOCKETIO_MODE is unset
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, async_mode='threading')
```
### Gevent Mode (Optional)
```bash
# Set environment variable
export SOCKETIO_MODE=gevent
```
## Background Tasks
### Queue Polling
- **Threading Mode**: `threading.Thread` with `threading.Event` for shutdown
- **Signal Handling**: Blinker signals for watch state changes
- **Real-time Updates**: Direct Socket.IO `emit()` calls to connected clients
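The polling approach can be illustrated with a loop that diffs the set of running UUIDs between cycles and emits only for UUIDs whose state changed. This is a sketch, not the project's code: `poll_running_uuids`, `get_running` and `emit` are hypothetical, and `threading.Event.wait()` replaces a chain of short sleeps so a shutdown request ends the wait immediately.

```python
import threading
from typing import Callable, Iterable


def poll_running_uuids(get_running: Callable[[], Iterable[str]],
                       emit: Callable[[str], None],
                       exit_event: threading.Event,
                       interval: float = 10.0) -> None:
    """Emit one update per UUID whose running state changed since the previous poll."""
    previous: set = set()
    while not exit_event.is_set():
        current = set(get_running())
        # Symmetric difference: UUIDs that started plus UUIDs that finished
        for uuid in sorted(current ^ previous):
            if exit_event.is_set():
                return
            emit(uuid)
        previous = current
        # Event.wait() returns as soon as the event is set, so shutdown is not delayed
        exit_event.wait(interval)
```

Sorting the changed UUIDs only makes the emit order deterministic; in a real deployment this function would run in a daemon `threading.Thread`.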
### Worker Integration
- **Async Workers**: Run in separate asyncio event loop thread
- **Communication**: AsyncSignalPriorityQueue bridges async workers and Socket.IO
- **Updates**: Real-time updates sent when workers complete tasks
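Running async workers in their own event-loop thread means synchronous code (Flask views, the ticker thread) needs a thread-safe way to hand work over. A minimal sketch, assuming one loop owned by a daemon thread: `LoopThread` and `check_watch` are hypothetical names, while `asyncio.run_coroutine_threadsafe()` and `loop.call_soon_threadsafe()` are standard library calls.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Coroutine


class LoopThread:
    """Hypothetical helper: owns an asyncio event loop running in a daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro: Coroutine[Any, Any, Any]) -> Future:
        # Thread-safe hand-off from synchronous code into the loop
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self) -> None:
        # Ask the loop to stop from outside its thread, then clean up
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def check_watch(uuid: str) -> str:
    # Placeholder for an async fetch; a real worker would await its fetcher here
    await asyncio.sleep(0)
    return f"checked {uuid}"
```

`submit()` returns a `concurrent.futures.Future`, so the calling thread can block on `.result(timeout=...)` or simply fire and forget.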
## Files in This Directory
- `socket_server.py`: Main Socket.IO initialization and event handling
- `events.py`: Watch operation event handlers
- `__init__.py`: Module initialization
## Production Deployment
### Recommended WSGI Servers
Options for running Socket.IO in production:
- **Gunicorn** (gevent mode only): `gunicorn -k gevent -w 1 changedetection:app`; Flask-SocketIO needs a single worker process unless sticky sessions are configured
- **uWSGI**: With threading support
- **Docker**: Built-in Flask server works well for containerized deployments
### Performance Considerations
- Threading mode: Better memory usage, standard Python threading
- Gevent mode: Higher concurrency but platform limitations
- Async workers: run independently of Socket.IO, so the worker count can be scaled without changing the Socket.IO server
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `SOCKETIO_MODE` | `threading` | Socket.IO async mode (`threading` or `gevent`) |
| `FETCH_WORKERS` | `10` | Number of async workers for watch processing |
| `CHANGEDETECTION_HOST` | `0.0.0.0` | Server bind address |
| `CHANGEDETECTION_PORT` | `5000` | Server port |
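As a sketch of how these variables resolve, the hypothetical helper below reads each one with the default listed in the table. `configured_workers` stands in for the worker count saved in the application settings, which `FETCH_WORKERS` overrides when set.

```python
import os
from typing import Any, Dict, Mapping, Optional


def read_runtime_settings(env: Optional[Mapping[str, str]] = None,
                          configured_workers: int = 10) -> Dict[str, Any]:
    """Resolve the documented environment variables, falling back to their defaults."""
    env = os.environ if env is None else env
    return {
        'socketio_mode': env.get('SOCKETIO_MODE', 'threading').lower(),
        # FETCH_WORKERS, when set, overrides the worker count saved in the settings
        'fetch_workers': int(env.get('FETCH_WORKERS', configured_workers)),
        'host': env.get('CHANGEDETECTION_HOST', '0.0.0.0'),
        'port': int(env.get('CHANGEDETECTION_PORT', 5000)),
    }
```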
## Debugging Tips
1. **Socket.IO Issues**: Check browser dev tools for WebSocket connection errors
2. **Threading Issues**: Monitor the thread count with `ps -T -p <pid>`
3. **Worker Issues**: Use `/worker-health` endpoint to check async worker status
4. **Queue Issues**: Use `/queue-status` endpoint to monitor job queue
5. **Performance**: Use `/gc-cleanup` endpoint to trigger memory cleanup
## Migration Notes
If upgrading from eventlet-based versions:
- Remove any `EVENTLET_*` environment variables
- No code changes needed - Socket.IO mode is automatically configured
- Optional: Set `SOCKETIO_MODE=gevent` if high concurrency is required and platform supports it
+58
@@ -0,0 +1,58 @@
from flask_socketio import emit
from loguru import logger
from blinker import signal
def register_watch_operation_handlers(socketio, datastore):
"""Register Socket.IO event handlers for watch operations"""
@socketio.on('watch_operation')
def handle_watch_operation(data):
"""Handle watch operations like pause, mute, recheck via Socket.IO"""
try:
op = data.get('op')
uuid = data.get('uuid')
logger.debug(f"Socket.IO: Received watch operation '{op}' for UUID {uuid}")
if not op or not uuid:
emit('operation_result', {'success': False, 'error': 'Missing operation or UUID'})
return
# Check if watch exists
if not datastore.data['watching'].get(uuid):
emit('operation_result', {'success': False, 'error': 'Watch not found'})
return
watch = datastore.data['watching'][uuid]
# Perform the operation
if op == 'pause':
watch.toggle_pause()
logger.info(f"Socket.IO: Toggled pause for watch {uuid}")
elif op == 'mute':
watch.toggle_mute()
logger.info(f"Socket.IO: Toggled mute for watch {uuid}")
elif op == 'recheck':
# Import here to avoid circular imports
from changedetectionio.flask_app import update_q
from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_handler
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
logger.info(f"Socket.IO: Queued recheck for watch {uuid}")
else:
emit('operation_result', {'success': False, 'error': f'Unknown operation: {op}'})
return
# Send signal to update UI
watch_check_update = signal('watch_check_update')
if watch_check_update:
watch_check_update.send(watch_uuid=uuid)
# Send success response to client
emit('operation_result', {'success': True, 'operation': op, 'uuid': uuid})
except Exception as e:
logger.error(f"Socket.IO error in handle_watch_operation: {str(e)}")
emit('operation_result', {'success': False, 'error': str(e)})
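One possible restructuring of `handle_watch_operation()` moves validation and dispatch into a pure function backed by a handler table, so the Socket.IO handler only needs to `emit()` the returned dict. `dispatch_watch_operation` and the `handlers` mapping are hypothetical; the error messages match the ones used above.

```python
from typing import Any, Callable, Dict, Mapping


def dispatch_watch_operation(data: Mapping[str, Any],
                             watches: Mapping[str, Any],
                             handlers: Mapping[str, Callable[[str], None]]) -> Dict[str, Any]:
    """Validate a watch_operation payload and run the matching handler."""
    op = data.get('op')
    uuid = data.get('uuid')
    if not op or not uuid:
        return {'success': False, 'error': 'Missing operation or UUID'}
    if uuid not in watches:
        return {'success': False, 'error': 'Watch not found'}

    handler = handlers.get(op)
    if handler is None:
        return {'success': False, 'error': f'Unknown operation: {op}'}

    # e.g. {'pause': toggle_pause, 'mute': toggle_mute, 'recheck': queue_recheck}
    handler(uuid)
    return {'success': True, 'operation': op, 'uuid': uuid}
```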
+163 -104
@@ -8,8 +8,10 @@ from blinker import signal
from changedetectionio import strtobool
class SignalHandler:
"""A standalone class to receive signals"""
def __init__(self, socketio_instance, datastore):
self.socketio_instance = socketio_instance
self.datastore = datastore
@@ -17,19 +19,22 @@ class SignalHandler:
# Connect to the watch_check_update signal
from changedetectionio.flask_app import watch_check_update as wcc
wcc.connect(self.handle_signal, weak=False)
logger.info("SignalHandler: Connected to signal from direct import")
# logger.info("SignalHandler: Connected to signal from direct import")
# Connect to the queue_length signal
queue_length_signal = signal('queue_length')
queue_length_signal.connect(self.handle_queue_length, weak=False)
logger.info("SignalHandler: Connected to queue_length signal")
# logger.info("SignalHandler: Connected to queue_length signal")
# Create and start the queue update thread using standard threading
import threading
self.polling_emitter_thread = threading.Thread(
target=self.polling_emit_running_or_queued_watches_threaded,
daemon=True
)
self.polling_emitter_thread.start()
logger.info("Started polling thread using threading (eventlet-free)")
# Create and start the queue update thread using gevent
import gevent
logger.info("Using gevent for polling thread")
self.polling_emitter_thread = gevent.spawn(self.polling_emit_running_or_queued_watches)
# Store the thread reference in socketio for clean shutdown
self.socketio_instance.polling_emitter_thread = self.polling_emitter_thread
@@ -44,7 +49,7 @@ class SignalHandler:
watch = self.datastore.data['watching'].get(watch_uuid)
if watch:
if app_context:
#note
# note
with app_context.app_context():
with app_context.test_request_context():
# Forward to handle_watch_update with the watch parameter
@@ -61,59 +66,85 @@ class SignalHandler:
try:
queue_length = kwargs.get('length', 0)
logger.debug(f"SignalHandler: Queue length update received: {queue_length}")
# Emit the queue size to all connected clients
self.socketio_instance.emit("queue_size", {
"q_length": queue_length,
"event_timestamp": time.time()
})
except Exception as e:
logger.error(f"Socket.IO error in handle_queue_length: {str(e)}")
def polling_emit_running_or_queued_watches(self):
"""Greenlet that periodically updates the browser/frontend with current state of who is being checked or queued
This is because sometimes the browser page could reload (like on clicking on a link) but the data is old
"""
logger.info("Queue update greenlet started")
# Import the watch_check_update signal, update_q, and running_update_threads here to avoid circular imports
from changedetectionio.flask_app import app, running_update_threads
def polling_emit_running_or_queued_watches_threaded(self):
"""Threading version of polling for Windows compatibility"""
import time
import threading
logger.info("Queue update thread started (threading mode)")
# Import here to avoid circular imports
from changedetectionio.flask_app import app
from changedetectionio import worker_handler
watch_check_update = signal('watch_check_update')
# Use gevent sleep for non-blocking operation
from gevent import sleep as gevent_sleep
# Get the stop event from the socketio instance
stop_event = self.socketio_instance.stop_event if hasattr(self.socketio_instance, 'stop_event') else None
# Run until explicitly stopped
while stop_event is None or not stop_event.is_set():
# Track previous state to avoid unnecessary emissions
previous_running_uuids = set()
# Run until app shutdown - check exit flag more frequently for fast shutdown
exit_event = getattr(app.config, 'exit', threading.Event())
while not exit_event.is_set():
try:
# For each item in the queue, send a signal, so we update the UI
for t in running_update_threads:
if hasattr(t, 'current_uuid') and t.current_uuid:
logger.trace(f"Sending update for {t.current_uuid}")
# Send with app_context to ensure proper URL generation
with app.app_context():
watch_check_update.send(app_context=app, watch_uuid=t.current_uuid)
# Yield control back to gevent after each send to prevent blocking
gevent_sleep(0.1) # Small sleep to yield control
# Check if we need to stop in the middle of processing
if stop_event is not None and stop_event.is_set():
# Get current running UUIDs from async workers
running_uuids = set(worker_handler.get_running_uuids())
# Only send updates for UUIDs that changed state
newly_running = running_uuids - previous_running_uuids
no_longer_running = previous_running_uuids - running_uuids
# Send updates for newly running UUIDs (but exit fast if shutdown requested)
for uuid in newly_running:
if exit_event.is_set():
break
# Sleep between polling/update cycles
gevent_sleep(2)
logger.trace(f"Threading polling: UUID {uuid} started processing")
with app.app_context():
watch_check_update.send(app_context=app, watch_uuid=uuid)
time.sleep(0.01) # Small yield
# Send updates for UUIDs that finished processing (but exit fast if shutdown requested)
if not exit_event.is_set():
for uuid in no_longer_running:
if exit_event.is_set():
break
logger.trace(f"Threading polling: UUID {uuid} finished processing")
with app.app_context():
watch_check_update.send(app_context=app, watch_uuid=uuid)
time.sleep(0.01) # Small yield
# Update tracking for next iteration
previous_running_uuids = running_uuids
# Sleep between polling cycles, but check exit flag every 0.5 seconds for fast shutdown
for _ in range(20): # 20 * 0.5 = 10 seconds total
if exit_event.is_set():
break
time.sleep(0.5)
except Exception as e:
logger.error(f"Error in queue update greenlet: {str(e)}")
# Sleep a bit to avoid flooding logs in case of persistent error
gevent_sleep(0.5)
logger.info("Queue update greenlet stopped")
logger.error(f"Error in threading polling: {str(e)}")
# Even during error recovery, check for exit quickly
for _ in range(1): # 1 * 0.5 = 0.5 seconds
if exit_event.is_set():
break
time.sleep(0.5)
# Check if we're in pytest environment - if so, be more gentle with logging
import sys
in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
if not in_pytest:
logger.info("Queue update thread stopped (threading mode)")
def handle_watch_update(socketio, **kwargs):
@@ -123,14 +154,12 @@ def handle_watch_update(socketio, **kwargs):
datastore = kwargs.get('datastore')
# Emit the watch update to all connected clients
from changedetectionio.flask_app import running_update_threads, update_q
from changedetectionio.flask_app import update_q
from changedetectionio.flask_app import _jinja2_filter_datetime
from changedetectionio import worker_handler
# Get list of watches that are currently running
running_uuids = []
for t in running_update_threads:
if hasattr(t, 'current_uuid') and t.current_uuid:
running_uuids.append(t.current_uuid)
running_uuids = worker_handler.get_running_uuids()
# Get list of watches in the queue
queue_list = []
@@ -143,26 +172,30 @@ def handle_watch_update(socketio, **kwargs):
error_texts = watch.compile_error_texts()
# Create a simplified watch data object to send to clients
watch_uuid = watch.get('uuid')
watch_data = {
'checking_now': True if watch.get('uuid') in running_uuids else False,
'checking_now': True if watch_uuid in running_uuids else False,
'fetch_time': watch.get('fetch_time'),
'has_error': True if error_texts else False,
'last_changed': watch.get('last_changed'),
'last_checked': watch.get('last_checked'),
'error_text': error_texts,
'history_n': watch.history_n,
'last_checked_text': _jinja2_filter_datetime(watch),
'last_changed_text': timeago.format(int(watch['last_changed']), time.time()) if watch.history_n >= 2 and int(watch.get('last_changed', 0)) > 0 else 'Not yet',
'queued': True if watch.get('uuid') in queue_list else False,
'last_changed_text': timeago.format(int(watch['last_changed']), time.time()) if watch.history_n >= 2 and int(
watch.get('last_changed', 0)) > 0 else 'Not yet',
'queued': True if watch_uuid in queue_list else False,
'paused': True if watch.get('paused') else False,
'notification_muted': True if watch.get('notification_muted') else False,
'unviewed': watch.has_unviewed,
'uuid': watch.get('uuid'),
'uuid': watch_uuid,
'event_timestamp': time.time()
}
errored_count =0
for uuid, watch in datastore.data['watching'].items():
if watch.get('last_error'):
errored_count = 0
for watch_uuid_iter, watch_iter in datastore.data['watching'].items():
if watch_iter.get('last_error'):
errored_count += 1
general_stats = {
@@ -171,13 +204,13 @@ def handle_watch_update(socketio, **kwargs):
}
# Debug what's being emitted
#logger.debug(f"Emitting 'watch_update' event for {watch.get('uuid')}, data: {watch_data}")
# logger.debug(f"Emitting 'watch_update' event for {watch.get('uuid')}, data: {watch_data}")
# Emit to all clients (no 'broadcast' parameter needed - it's the default behavior)
socketio.emit("watch_update", {'watch': watch_data, 'general_stats': general_stats})
# Log after successful emit
#logger.info(f"Socket.IO: Emitted update for watch {watch.get('uuid')}, Checking now: {watch_data['checking_now']}")
# Log after successful emit - use watch_data['uuid'] to avoid variable shadowing issues
logger.trace(f"Socket.IO: Emitted update for watch {watch_data['uuid']}, Checking now: {watch_data['checking_now']}")
except Exception as e:
logger.error(f"Socket.IO error in handle_watch_update: {str(e)}")
@@ -185,35 +218,65 @@ def handle_watch_update(socketio, **kwargs):
def init_socketio(app, datastore):
"""Initialize SocketIO with the main Flask app"""
# Use the threading async_mode instead of eventlet
# This avoids the need for monkey patching eventlet,
# Which leads to problems with async playwright etc
async_mode = 'gevent'
logger.info(f"Using {async_mode} mode for Socket.IO")
import platform
import sys
# Platform-specific async_mode selection for better stability
system = platform.system().lower()
python_version = sys.version_info
# Check for SocketIO mode configuration via environment variable
# Default is 'threading' for best cross-platform compatibility
socketio_mode = os.getenv('SOCKETIO_MODE', 'threading').lower()
if socketio_mode == 'gevent':
# Use gevent mode (higher concurrency but platform limitations)
try:
import gevent
async_mode = 'gevent'
logger.info(f"SOCKETIO_MODE=gevent: Using {async_mode} mode for Socket.IO")
except ImportError:
async_mode = 'threading'
logger.warning(f"SOCKETIO_MODE=gevent but gevent not available, falling back to {async_mode} mode")
elif socketio_mode == 'threading':
# Use threading mode (default - best compatibility)
async_mode = 'threading'
logger.info(f"SOCKETIO_MODE=threading: Using {async_mode} mode for Socket.IO")
else:
# Invalid mode specified, use default
async_mode = 'threading'
logger.warning(f"Invalid SOCKETIO_MODE='{socketio_mode}', using default {async_mode} mode for Socket.IO")
# Log platform info for debugging
logger.info(f"Platform: {system}, Python: {python_version.major}.{python_version.minor}, Socket.IO mode: {async_mode}")
# Restrict SocketIO CORS to same origin by default, can be overridden with env var
cors_origins = os.environ.get('SOCKETIO_CORS_ORIGINS', None)
socketio = SocketIO(app,
async_mode=async_mode,
cors_allowed_origins=cors_origins, # None means same-origin only
logger=strtobool(os.getenv('SOCKETIO_LOGGING', 'False')),
engineio_logger=strtobool(os.getenv('SOCKETIO_LOGGING', 'False')))
async_mode=async_mode,
cors_allowed_origins=cors_origins, # None means same-origin only
logger=strtobool(os.getenv('SOCKETIO_LOGGING', 'False')),
engineio_logger=strtobool(os.getenv('SOCKETIO_LOGGING', 'False')))
# Set up event handlers
logger.info("Socket.IO: Registering connect event handler")
@socketio.on('connect')
def handle_connect():
"""Handle client connection"""
from changedetectionio.auth_decorator import login_optionally_required
# logger.info("Socket.IO: CONNECT HANDLER CALLED - Starting connection process")
from flask import request
from flask_login import current_user
from changedetectionio.flask_app import update_q
# Access datastore from socketio
datastore = socketio.datastore
# logger.info(f"Socket.IO: Current user authenticated: {current_user.is_authenticated if hasattr(current_user, 'is_authenticated') else 'No current_user'}")
# Check if authentication is required and user is not authenticated
has_password_enabled = datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False)
# logger.info(f"Socket.IO: Password enabled: {has_password_enabled}")
if has_password_enabled and not current_user.is_authenticated:
logger.warning("Socket.IO: Rejecting unauthenticated connection")
return False # Reject the connection
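The SOCKETIO_MODE branches above reduce to a small pure function. This is only an illustrative sketch: `resolve_async_mode` and its `gevent_available` parameter are hypothetical names, not part of changedetection.io. Like the diff, it probes for gevent with a plain `import gevent`.

```python
import os

def resolve_async_mode(env=None, gevent_available=None):
    """Return the async_mode string to pass to SocketIO(...) (hypothetical helper)."""
    env = os.environ if env is None else env
    requested = env.get('SOCKETIO_MODE', 'threading').lower()
    if requested == 'gevent':
        if gevent_available is None:
            # Same probe as the diff: gevent mode only if the package imports
            try:
                import gevent  # noqa: F401
                gevent_available = True
            except ImportError:
                gevent_available = False
        return 'gevent' if gevent_available else 'threading'
    # 'threading' and any unrecognised value both fall back to the default
    return 'threading'
```

Passing `gevent_available` explicitly keeps the selection logic testable on machines with or without gevent installed.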
@@ -231,6 +294,7 @@ def init_socketio(app, datastore):
logger.info("Socket.IO: Client connected")
# logger.info("Socket.IO: Registering disconnect event handler")
@socketio.on('disconnect')
def handle_disconnect():
"""Handle client disconnection"""
@@ -239,45 +303,40 @@ def init_socketio(app, datastore):
# Create a dedicated signal handler that will receive signals and emit them to clients
signal_handler = SignalHandler(socketio, datastore)
# Register watch operation event handlers
from .events import register_watch_operation_handlers
register_watch_operation_handlers(socketio, datastore)
# Store the datastore reference on the socketio object for later use
socketio.datastore = datastore
# Create a stop event for our queue update thread using gevent Event
import gevent.event
stop_event = gevent.event.Event()
socketio.stop_event = stop_event
# No stop event needed for threading mode - threads check app.config.exit directly
# Add a shutdown method to the socketio object
def shutdown():
"""Shutdown the SocketIO server gracefully"""
"""Shutdown the SocketIO server fast and aggressively"""
try:
logger.info("Socket.IO: Shutting down server...")
# Signal the queue update thread to stop
if hasattr(socketio, 'stop_event'):
socketio.stop_event.set()
logger.info("Socket.IO: Signaled queue update thread to stop")
# Wait for the greenlet to exit (with timeout)
logger.info("Socket.IO: Fast shutdown initiated...")
# For threading mode, give the thread a very short time to exit gracefully
if hasattr(socketio, 'polling_emitter_thread'):
try:
# For gevent greenlets
socketio.polling_emitter_thread.join(timeout=5)
logger.info("Socket.IO: Queue update greenlet joined successfully")
except Exception as e:
logger.error(f"Error joining greenlet: {str(e)}")
logger.info("Socket.IO: Queue update greenlet did not exit in time")
# Close any remaining client connections
#if hasattr(socketio, 'server'):
# socketio.server.disconnect()
logger.info("Socket.IO: Server shutdown complete")
if socketio.polling_emitter_thread.is_alive():
logger.info("Socket.IO: Waiting 1 second for polling thread to stop...")
socketio.polling_emitter_thread.join(timeout=1.0) # Only 1 second timeout
if socketio.polling_emitter_thread.is_alive():
logger.info("Socket.IO: Polling thread still running after timeout - continuing with shutdown")
else:
logger.info("Socket.IO: Polling thread stopped quickly")
else:
logger.info("Socket.IO: Polling thread already stopped")
logger.info("Socket.IO: Fast shutdown complete")
except Exception as e:
logger.error(f"Socket.IO error during shutdown: {str(e)}")
# Attach the shutdown method to the socketio object
socketio.shutdown = shutdown
logger.info("Socket.IO initialized and attached to main Flask app")
logger.info(f"Socket.IO: Registered event handlers: {socketio.handlers if hasattr(socketio, 'handlers') else 'No handlers found'}")
return socketio
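The fast-shutdown path signals the polling emitter, calls `join()` with a short timeout, and continues even if the thread is still alive. Below is a standard-library sketch of that pattern. It assumes a `threading.Event` as the stop signal (the diff's threading mode checks `app.config.exit` instead), and `polling_loop` and `fast_shutdown` are hypothetical stand-ins, not the project's code.

```python
import threading

def polling_loop(stop_event, interval=0.05):
    # Stand-in for the emitter loop; a real loop would emit queue updates here.
    # Event.wait() returns early as soon as the event is set.
    while not stop_event.is_set():
        stop_event.wait(interval)

def fast_shutdown(thread, stop_event, timeout=1.0):
    """Signal the thread to stop, wait at most `timeout` seconds, report the outcome."""
    stop_event.set()
    if thread.is_alive():
        thread.join(timeout=timeout)
    # A daemon thread that is still running will not block interpreter exit
    return not thread.is_alive()

stop_event = threading.Event()
worker = threading.Thread(target=polling_loop, args=(stop_event,), daemon=True)
worker.start()
stopped_cleanly = fast_shutdown(worker, stop_event)
```

Because the loop sleeps via `stop_event.wait()` rather than `time.sleep()`, it notices the stop signal right away, so the 1-second join timeout is rarely reached.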
+26 -14
@@ -2,20 +2,20 @@
$(document).ready(function () {
function bindAjaxHandlerButtonsEvents() {
$('.ajax-op').on('click.ajaxHandlerNamespace', function (e) {
function bindSocketHandlerButtonsEvents(socket) {
$('.ajax-op').on('click.socketHandlerNamespace', function (e) {
e.preventDefault();
$.ajax({
type: "POST",
url: ajax_toggle_url,
data: {'op': $(this).data('op'), 'uuid': $(this).closest('tr').data('watch-uuid')},
statusCode: {
400: function () {
// More than likely the CSRF token was lost when the server restarted
alert("There was a problem processing the request, please reload the page.");
}
}
const op = $(this).data('op');
const uuid = $(this).closest('tr').data('watch-uuid');
console.log(`Socket.IO: Sending watch operation '${op}' for UUID ${uuid}`);
// Emit the operation via Socket.IO
socket.emit('watch_operation', {
'op': op,
'uuid': uuid
});
return false;
});
}
@@ -38,7 +38,7 @@ $(document).ready(function () {
socket.on('connect', function () {
console.log('Socket.IO connected with path:', socketio_url);
console.log('Socket transport:', socket.io.engine.transport.name);
bindAjaxHandlerButtonsEvents();
bindSocketHandlerButtonsEvents(socket);
});
socket.on('connect_error', function(error) {
@@ -55,7 +55,7 @@ $(document).ready(function () {
socket.on('disconnect', function (reason) {
console.log('Socket.IO disconnected, reason:', reason);
$('.ajax-op').off('.ajaxHandlerNamespace')
$('.ajax-op').off('.socketHandlerNamespace')
});
socket.on('queue_size', function (data) {
@@ -63,6 +63,16 @@ $(document).ready(function () {
// Update queue size display if implemented in the UI
})
// Listen for operation results
socket.on('operation_result', function (data) {
if (data.success) {
console.log(`Socket.IO: Operation '${data.operation}' completed successfully for UUID ${data.uuid}`);
} else {
console.error(`Socket.IO: Operation failed: ${data.error}`);
alert("There was a problem processing the request: " + data.error);
}
});
// Listen for periodically emitted watch data
console.log('Adding watch_update event listener');
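On the client, `operation_result` is read as `{success, operation, uuid, error}`. The following is a minimal server-side sketch that produces that payload shape. It assumes a plain dict of watches and a hypothetical `TOGGLE_OPS` table, because the real set of `op` values handled by `register_watch_operation_handlers` is not shown in this diff.

```python
# Hypothetical mapping: op name -> boolean watch field that the op toggles
TOGGLE_OPS = {'pause': 'paused', 'mute': 'notification_muted'}

def handle_watch_operation(data, watches):
    """Apply one watch operation and build an 'operation_result'-style payload."""
    op = data.get('op')
    watch_uuid = data.get('uuid')
    result = {'operation': op, 'uuid': watch_uuid}
    watch = watches.get(watch_uuid)
    if watch is None:
        return {**result, 'success': False, 'error': f"Watch {watch_uuid} not found"}
    field = TOGGLE_OPS.get(op)
    if field is None:
        return {**result, 'success': False, 'error': f"Unknown operation '{op}'"}
    # Flip the flag; a missing field counts as False
    watch[field] = not watch.get(field, False)
    return {**result, 'success': True}
```

On a failure, the client shows `data.error` in an `alert()`, so the error strings should make sense to an end user.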
@@ -87,6 +97,8 @@ $(document).ready(function () {
$($watchRow).toggleClass('has-error', watch.has_error);
$($watchRow).toggleClass('notification_muted', watch.notification_muted);
$($watchRow).toggleClass('paused', watch.paused);
$($watchRow).toggleClass('single-history', watch.history_n === 1);
$($watchRow).toggleClass('multiple-history', watch.history_n >= 2);
$('td.title-col .error-text', $watchRow).html(watch.error_text)
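The `toggleClass` calls map watch fields onto row classes. In the SCSS hunks further down, `single-history` reveals `a.preview-link` and `multiple-history` reveals `a.history-link`. The hypothetical Python helper below expresses the same mapping, which makes it easier to reason about which link a given row shows.

```python
def row_classes(watch):
    """Return the set of row classes the realtime.js update would switch on."""
    history_n = watch.get('history_n', 0)
    flags = {
        'has-error': bool(watch.get('has_error')),
        'notification_muted': bool(watch.get('notification_muted')),
        'paused': bool(watch.get('paused')),
        'single-history': history_n == 1,     # exactly one snapshot: preview link
        'multiple-history': history_n >= 2,   # two or more snapshots: history link
    }
    return {name for name, enabled in flags.items() if enabled}
```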
File diff suppressed because one or more lines are too long
@@ -39,10 +39,13 @@
}
}
.title-col a[target="_blank"]::after,
.current-diff-url::after {
content: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAQElEQVR42qXKwQkAIAxDUUdxtO6/RBQkQZvSi8I/pL4BoGw/XPkh4XigPmsUgh0626AjRsgxHTkUThsG2T/sIlzdTsp52kSS1wAAAABJRU5ErkJggg==);
.title-col a[target="_blank"] i[data-feather],
.current-diff-url i[data-feather] {
width: 12px;
height: 12px;
stroke: #666;
margin: 0 3px 0 5px;
vertical-align: middle;
}
@@ -114,5 +117,18 @@
display: block !important;
}
}
tr.single-history {
a.preview-link {
display: inline-block !important;
}
}
tr.multiple-history {
a.history-link {
display: inline-block !important;
}
}
}
@@ -1083,6 +1083,9 @@ ul {
/* some space if they wrap the page */
margin-bottom: 3px;
margin-top: 3px;
/* vertically center icon and text */
display: inline-flex;
align-items: center;
}
}
+15 -5
@@ -545,10 +545,13 @@ body.preview-text-enabled {
font-weight: bolder; }
.watch-table th a.inactive .arrow {
display: none; }
.watch-table .title-col a[target="_blank"]::after,
.watch-table .current-diff-url::after {
content: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAQElEQVR42qXKwQkAIAxDUUdxtO6/RBQkQZvSi8I/pL4BoGw/XPkh4XigPmsUgh0626AjRsgxHTkUThsG2T/sIlzdTsp52kSS1wAAAABJRU5ErkJggg==);
margin: 0 3px 0 5px; }
.watch-table .title-col a[target="_blank"] i[data-feather],
.watch-table .current-diff-url i[data-feather] {
width: 12px;
height: 12px;
stroke: #666;
margin: 0 3px 0 5px;
vertical-align: middle; }
.watch-table tr.checking-now td:first-child {
position: relative; }
.watch-table tr.checking-now td:first-child::before {
@@ -579,6 +582,10 @@ body.preview-text-enabled {
color: var(--color-watch-table-error); }
.watch-table tr.has-error .error-text {
display: block !important; }
.watch-table tr.single-history a.preview-link {
display: inline-block !important; }
.watch-table tr.multiple-history a.history-link {
display: inline-block !important; }
ul#conditions_match_logic {
list-style: none; }
@@ -1457,7 +1464,10 @@ ul {
#checkbox-operations button {
/* some space if they wrap the page */
margin-bottom: 3px;
margin-top: 3px; }
margin-top: 3px;
/* vertically center icon and text */
display: inline-flex;
align-items: center; }
.checkbox-uuid > * {
vertical-align: middle; }
+7 -1
@@ -238,6 +238,7 @@ class ChangeDetectionStore:
with self.lock:
if uuid == 'all':
self.__data['watching'] = {}
time.sleep(1) # Mainly used for testing to allow all items to flush before running next test
# GitHub #30 also delete history records
for uuid in self.data['watching']:
@@ -407,7 +408,12 @@ class ChangeDetectionStore:
# This is a fairly basic strategy to deal with the case that the file is corrupted,
# system was out of memory, out of RAM etc
with open(self.json_store_path+".tmp", 'w') as json_file:
json.dump(data, json_file, indent=4)
# Use compact JSON in production for better performance
debug_mode = os.environ.get('CHANGEDETECTION_DEBUG', 'false').lower() == 'true'
if debug_mode:
json.dump(data, json_file, indent=4)
else:
json.dump(data, json_file, separators=(',', ':'))
os.replace(self.json_store_path+".tmp", self.json_store_path)
except Exception as e:
logger.error(f"Error writing JSON!! (Main JSON file save was skipped) : {str(e)}")
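The save path writes to a `.tmp` file and then calls `os.replace`. On POSIX filesystems that rename is atomic, so a crash during the write leaves the previous file intact. The sketch below applies the same strategy together with the new `CHANGEDETECTION_DEBUG` switch. `save_json_atomically` is a hypothetical name.

```python
import json
import os
import tempfile

def save_json_atomically(data, path, debug=None):
    """Write `data` to path + '.tmp', then swap it into place with os.replace()."""
    if debug is None:
        # Same switch as the diff: indented JSON only when CHANGEDETECTION_DEBUG=true
        debug = os.environ.get('CHANGEDETECTION_DEBUG', 'false').lower() == 'true'
    tmp_path = path + ".tmp"
    with open(tmp_path, 'w') as json_file:
        if debug:
            json.dump(data, json_file, indent=4)
        else:
            # Compact form: no spaces after separators and no newlines
            json.dump(data, json_file, separators=(',', ':'))
    os.replace(tmp_path, path)

with tempfile.TemporaryDirectory() as tmp_dir:
    store_path = os.path.join(tmp_dir, 'store.json')
    save_json_atomically({'watching': {'abc': {'paused': False}}}, store_path, debug=False)
    with open(store_path) as f:
        compact_text = f.read()
    tmp_left_behind = os.path.exists(store_path + '.tmp')
```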
+4 -2
@@ -31,11 +31,13 @@
const socketio_url="{{ get_socketio_path() }}/socket.io";
const is_authenticated = {% if current_user.is_authenticated or not has_password %}true{% else %}false{% endif %};
</script>
<script src="https://unpkg.com/feather-icons"></script>
<script src="{{url_for('static_content', group='js', filename='jquery-3.6.0.min.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='csrf.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='socket.io.min.js')}}" integrity="sha384-c79GN5VsunZvi+Q/WObgk2in0CbZsHnjEqvFxC5DxHn9lTfNce2WW6h2pH6u/kF+" crossorigin="anonymous"></script>
{% if socket_io_enabled %}
<script src="{{url_for('static_content', group='js', filename='socket.io.min.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='realtime.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='timeago-init.js')}}" defer></script>
{% endif %}
</head>
<body class="">
+43
@@ -10,6 +10,8 @@ import os
import sys
from loguru import logger
from changedetectionio.tests.util import live_server_setup, new_live_server_setup
# https://github.com/pallets/flask/blob/1.1.2/examples/tutorial/tests/test_auth.py
# Much better boilerplate than the docs
# https://www.python-boilerplate.com/py3+flask+pytest/
@@ -70,6 +72,22 @@ def cleanup(datastore_path):
if os.path.isfile(f):
os.unlink(f)
@pytest.fixture(scope='function', autouse=True)
def prepare_test_function(live_server):
routes = [rule.rule for rule in live_server.app.url_map.iter_rules()]
if '/test-random-content-endpoint' not in routes:
logger.debug("Setting up test URL routes")
new_live_server_setup(live_server)
yield
# Then cleanup/shutdown
live_server.app.config['DATASTORE'].data['watching']={}
time.sleep(0.3)
live_server.app.config['DATASTORE'].data['watching']={}
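The autouse `prepare_test_function` fixture follows pytest's generator pattern: setup runs before `yield` and cleanup runs after it. It calls `new_live_server_setup` only when `/test-random-content-endpoint` is missing, so the session-scoped app registers its test routes just once. One plausible reason for the guard is that Flask rejects new routes after the app has served a request. The dependency-free sketch below drives the same pattern by hand. `FakeApp`, `register_test_routes` and `run_test` are hypothetical stand-ins for `live_server.app`, `new_live_server_setup` and pytest itself.

```python
class FakeApp:
    """Hypothetical stand-in for live_server.app and its datastore."""
    def __init__(self):
        self.rules = []
        self.setup_calls = 0
        self.datastore = {'watching': {}}

def register_test_routes(app):
    # Stand-in for new_live_server_setup(live_server)
    app.rules.append('/test-random-content-endpoint')
    app.setup_calls += 1

def prepare_test_function(app):
    """Generator-style fixture body: set up once, yield to the test, then reset."""
    if '/test-random-content-endpoint' not in app.rules:
        register_test_routes(app)
    yield
    app.datastore['watching'] = {}

def run_test(app, test_body):
    fixture = prepare_test_function(app)
    next(fixture)              # setup phase
    try:
        test_body(app)
    finally:
        next(fixture, None)    # teardown phase; the generator then finishes
```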
@pytest.fixture(scope='session')
def app(request):
"""Create application for the tests."""
@@ -106,8 +124,33 @@ def app(request):
app.config['STOP_THREADS'] = True
def teardown():
# Stop all threads and services
datastore.stop_thread = True
app.config.exit.set()
# Shutdown workers gracefully before loguru cleanup
try:
from changedetectionio import worker_handler
worker_handler.shutdown_workers()
except Exception:
pass
# Stop socket server threads
try:
from changedetectionio.flask_app import socketio_server
if socketio_server and hasattr(socketio_server, 'shutdown'):
socketio_server.shutdown()
except Exception:
pass
# Give threads a moment to finish their shutdown
import time
time.sleep(0.1)
# Remove all loguru handlers to prevent "closed file" errors
logger.remove()
# Cleanup files
cleanup(app_config['datastore_path'])
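The teardown wraps each shutdown step (first the workers, then the Socket.IO server) in its own `try/except Exception: pass`. As a result, one failing step cannot skip the remaining steps or the final `logger.remove()`. The sketch below expresses the same idea with `contextlib.suppress`. The step names and `best_effort_teardown` are illustrative.

```python
import contextlib

def best_effort_teardown(steps):
    """Run each (name, callable) in order; a failing step must not skip the rest."""
    completed = []
    for name, step in steps:
        with contextlib.suppress(Exception):
            step()
            # Only reached if step() did not raise
            completed.append(name)
    return completed
```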
@@ -78,12 +78,12 @@ def do_test(client, live_server, make_test_use_extra_browser=False):
# Requires playwright to be installed
def test_request_via_custom_browser_url(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# We do this so we can grep the logs of the custom container and see if the request actually went through that container
do_test(client, live_server, make_test_use_extra_browser=True)
def test_request_not_via_custom_browser_url(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# We do this so we can grep the logs of the custom container and see if the request actually went through that container
do_test(client, live_server, make_test_use_extra_browser=False)
@@ -7,7 +7,7 @@ import logging
# Requires playwright to be installed
def test_fetch_webdriver_content(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
#####################
res = client.post(
@@ -5,7 +5,7 @@ from ..util import live_server_setup, wait_for_all_checks, extract_UUID_from_cli
def test_execute_custom_js(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
assert os.getenv('PLAYWRIGHT_DRIVER_URL'), "Needs PLAYWRIGHT_DRIVER_URL set for this test"
test_url = url_for('test_interactive_html_endpoint', _external=True)
@@ -6,7 +6,7 @@ from ..util import live_server_setup, wait_for_all_checks
def test_preferred_proxy(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
url = "http://chosen.changedetection.io"
@@ -6,7 +6,7 @@ from ..util import live_server_setup, wait_for_all_checks, extract_UUID_from_cli
def test_noproxy_option(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Run by run_proxy_tests.sh
# Call this URL then scan the containers that it never went through them
url = "http://noproxy.changedetection.io"
@@ -6,7 +6,7 @@ from ..util import live_server_setup, wait_for_all_checks, extract_UUID_from_cli
# just make a request, we will grep in the docker logs to see it actually got called
def test_check_basic_change_detection_functionality(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
res = client.post(
url_for("imports.import_page"),
# Because a URL wont show in squid/proxy logs due it being SSLed
@@ -13,7 +13,7 @@ from ... import strtobool
# WEBDRIVER_URL=http://127.0.0.1:4444/wd/hub pytest tests/proxy_list/test_proxy_noconnect.py
def test_proxy_noconnect_custom(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Goto settings, add our custom one
res = client.post(
@@ -7,7 +7,7 @@ import os
# just make a request, we will grep in the docker logs to see it actually got called
def test_select_custom(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Goto settings, add our custom one
res = client.post(
@@ -20,7 +20,7 @@ def set_response():
time.sleep(1)
def test_socks5(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
set_response()
# Setup a proxy
@@ -21,7 +21,7 @@ def set_response():
# should be proxies.json mounted from run_proxy_tests.sh already
# -v `pwd`/tests/proxy_socks5/proxies.json-example:/app/changedetectionio/test-datastore/proxies.json
def test_socks5_from_proxiesjson_file(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
set_response()
# Because the socks server should connect back to us
test_url = url_for('test_endpoint', _external=True) + f"?socks-test-tag={os.getenv('SOCKSTEST', '')}"
@@ -54,7 +54,7 @@ def test_restock_detection(client, live_server, measure_memory_usage):
set_original_response()
#assert os.getenv('PLAYWRIGHT_DRIVER_URL'), "Needs PLAYWRIGHT_DRIVER_URL set for this test"
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
#####################
notification_url = url_for('test_notification_endpoint', _external=True).replace('http://localhost', 'http://changedet').replace('http', 'json')
@@ -20,8 +20,7 @@ from changedetectionio.notification import (
valid_notification_formats,
)
def test_setup(live_server):
live_server_setup(live_server)
def get_last_message_from_smtp_server():
import socket
@@ -40,7 +39,7 @@ def get_last_message_from_smtp_server():
# Requires running the test SMTP server
def test_check_notification_email_formats_default_HTML(client, live_server, measure_memory_usage):
# live_server_setup(live_server)
## live_server_setup(live_server) # Setup on conftest per function
set_original_response()
notification_url = f'mailto://changedetection@{smtp_test_server}:11025/?to=fff@home.com'
@@ -91,7 +90,7 @@ def test_check_notification_email_formats_default_HTML(client, live_server, meas
def test_check_notification_email_formats_default_Text_override_HTML(client, live_server, measure_memory_usage):
# live_server_setup(live_server)
## live_server_setup(live_server) # Setup on conftest per function
# HTML problems? see this
# https://github.com/caronc/apprise/issues/633
@@ -4,7 +4,7 @@ import time
def test_check_access_control(app, client, live_server):
# Still doesnt work, but this is closer.
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
with app.test_client(use_cookies=True) as c:
# Check we don't have any password protection enabled yet.
@@ -4,7 +4,7 @@ import os.path
from flask import url_for
from .util import live_server_setup, wait_for_all_checks, wait_for_notification_endpoint_output
import time
def set_original(excluding=None, add_line=None):
test_return_data = """<html>
@@ -35,11 +35,11 @@ def set_original(excluding=None, add_line=None):
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
def test_check_removed_line_contains_trigger(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# Give the endpoint time to spin up
set_original()
# Add our URL to the import page
@@ -72,6 +72,7 @@ def test_check_removed_line_contains_trigger(client, live_server, measure_memory
res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
assert b'Queued 1 watch for rechecking.' in res.data
wait_for_all_checks(client)
time.sleep(0.5)
res = client.get(url_for("watchlist.index"))
assert b'unviewed' not in res.data
@@ -84,12 +85,17 @@ def test_check_removed_line_contains_trigger(client, live_server, measure_memory
res = client.get(url_for("watchlist.index"))
assert b'unviewed' in res.data
time.sleep(1)
# Now add it back, and we should not get a trigger
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2)
time.sleep(1)
set_original(excluding=None)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
time.sleep(1)
res = client.get(url_for("watchlist.index"))
assert b'unviewed' not in res.data
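Several commits in this range add fixed `time.sleep()` calls for test stability. One alternative worth considering is a polling helper, which this diff does not do. It waits only as long as needed and gives up after a bounded timeout. `wait_until` is hypothetical and is not part of `changedetectionio.tests.util`.

```python
import time

def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it is truthy or `timeout` seconds pass; return the last result."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)
```

In a test this might read `assert wait_until(lambda: b'unviewed' not in client.get(url_for("watchlist.index")).data)`.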
@@ -105,7 +111,10 @@ def test_check_removed_line_contains_trigger(client, live_server, measure_memory
def test_check_add_line_contains_trigger(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
res = client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
time.sleep(1)
# Give the endpoint time to spin up
test_notification_url = url_for('test_notification_endpoint', _external=True).replace('http://', 'post://') + "?xxx={{ watch_url }}"
+8 -7
@@ -52,12 +52,12 @@ def is_valid_uuid(val):
return False
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
def test_api_simple(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
@@ -108,7 +108,7 @@ def test_api_simple(client, live_server, measure_memory_usage):
headers={'x-api-key': api_key}
)
assert len(res.json) == 0
time.sleep(1)
wait_for_all_checks(client)
set_modified_response()
@@ -119,6 +119,7 @@ def test_api_simple(client, live_server, measure_memory_usage):
)
wait_for_all_checks(client)
time.sleep(1)
# Did the recheck fire?
res = client.get(
url_for("createwatch"),
@@ -291,7 +292,7 @@ def test_access_denied(client, live_server, measure_memory_usage):
def test_api_watch_PUT_update(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
# Create a watch
@@ -371,7 +372,7 @@ def test_api_watch_PUT_update(client, live_server, measure_memory_usage):
def test_api_import(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
res = client.post(
@@ -393,7 +394,7 @@ def test_api_import(client, live_server, measure_memory_usage):
def test_api_conflict_UI_password(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
# Enable password check and diff page access bypass
@@ -5,7 +5,7 @@ from .util import live_server_setup
import json
def test_api_notifications_crud(client, live_server):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
# Confirm notifications are initially empty
+1 -1
@@ -7,7 +7,7 @@ from .util import live_server_setup, wait_for_all_checks
def test_api_search(client, live_server):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
watch_data = {}
+1 -1
@@ -5,7 +5,7 @@ from .util import live_server_setup, wait_for_all_checks
import json
def test_api_tags_listing(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
tag_title = 'Test Tag'
+1 -1
View File
@@ -6,7 +6,7 @@ from .util import live_server_setup, wait_for_all_checks
# test pages with http://username@password:foobar.com/ work
def test_basic_auth(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# This page will echo back any auth info
@@ -76,12 +76,12 @@ def set_response_without_ldjson():
f.write(test_return_data)
return None
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
# actually only really used by the distll.io importer, but could be handy too
def test_check_ldjson_price_autodetect(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_response_with_ldjson()
# Add our URL to the import page
@@ -164,7 +164,7 @@ def _test_runner_check_bad_format_ignored(live_server, client, has_ldjson_price_
def test_bad_ldjson_is_correctly_ignored(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
test_return_data = """
<html>
<head>
+2 -1
@@ -18,7 +18,7 @@ def test_inscriptus():
def test_check_basic_change_detection_functionality(client, live_server, measure_memory_usage):
set_original_response()
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Add our URL to the import page
res = client.post(
@@ -143,6 +143,7 @@ def test_check_basic_change_detection_functionality(client, live_server, measure
# hit the mark all viewed link
res = client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2)
assert b'class="has-unviewed' not in res.data
assert b'unviewed' not in res.data
+1 -1
@@ -9,7 +9,7 @@ import time
def test_backup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
set_original_response()
@@ -110,7 +110,7 @@ def run_socketio_watch_update_test(client, live_server, password_mode=""):
def test_everything(live_server, client):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
run_socketio_watch_update_test(password_mode="", live_server=live_server, client=client)
@@ -62,7 +62,7 @@ def set_modified_response_minus_block_text():
def test_check_block_changedetection_text_NOT_present(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Use a mix of case in ZzZ to prove it works case-insensitive.
ignore_text = "out of stoCk\r\nfoobar"
set_original_ignore_response()
+1 -1
@@ -7,7 +7,7 @@ from .util import live_server_setup, wait_for_all_checks
def test_clone_functionality(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write("<html><body>Some content</body></html>")
+9 -5
@@ -45,15 +45,15 @@ def set_number_out_of_range_response(number="150"):
f.write(test_return_data)
def test_setup(client, live_server):
# def test_setup(client, live_server):
"""Test that both text and number conditions work together with AND logic."""
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
def test_conditions_with_text_and_number(client, live_server):
"""Test that both text and number conditions work together with AND logic."""
set_original_response("50")
#live_server_setup(live_server)
test_url = url_for('test_endpoint', _external=True)
@@ -110,6 +110,8 @@ def test_conditions_with_text_and_number(client, live_server):
wait_for_all_checks(client)
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2)
wait_for_all_checks(client)
# Case 1
@@ -126,6 +128,8 @@ def test_conditions_with_text_and_number(client, live_server):
# Case 2: Change with one condition violated
# Number out of range (150) but contains '5'
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2)
set_number_out_of_range_response("150.5")
@@ -206,7 +210,7 @@ def test_condition_validate_rule_row(client, live_server):
# If there was only a change in the whitespacing, then we shouldnt have a change detected
def test_wordcount_conditions_plugin(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
test_return_data = """<html>
<body>
@@ -249,7 +253,7 @@ def test_wordcount_conditions_plugin(client, live_server, measure_memory_usage):
# If there was only a change in the whitespacing, then we shouldnt have a change detected
def test_lev_conditions_plugin(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write("""<html>
+3 -4
@@ -6,8 +6,7 @@ from .util import live_server_setup, wait_for_all_checks
from ..html_tools import *
def test_setup(live_server):
live_server_setup(live_server)
def set_original_response():
test_return_data = """<html>
@@ -125,7 +124,7 @@ def test_check_markup_include_filters_restriction(client, live_server, measure_m
# Tests the whole stack works with the CSS Filter
def test_check_multiple_filters(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
include_filters = "#blob-a\r\nxpath://*[contains(@id,'blob-b')]"
with open("test-datastore/endpoint-content.txt", "w") as f:
@@ -177,7 +176,7 @@ def test_check_multiple_filters(client, live_server, measure_memory_usage):
# Mainly used when the filter contains just an IMG, this can happen when someone selects an image in the visual-selector
# Tests fetcher can throw a "ReplyWithContentButNoText" exception after applying filter and extracting text
def test_filter_is_empty_help_suggestion(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
include_filters = "#blob-a"
@@ -8,8 +8,7 @@ from ..html_tools import *
from .util import live_server_setup, wait_for_all_checks
def test_setup(live_server):
live_server_setup(live_server)
def set_response_with_multiple_index():
data= """<!DOCTYPE html>
@@ -148,7 +147,7 @@ across multiple lines
def test_element_removal_full(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_original_response()
@@ -209,7 +208,7 @@ def test_element_removal_full(client, live_server, measure_memory_usage):
# Re #2752
def test_element_removal_nth_offset_no_shift(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_response_with_multiple_index()
subtractive_selectors_data = ["""
+1 -2
@@ -7,8 +7,7 @@ from .util import live_server_setup, wait_for_all_checks, extract_UUID_from_clie
import pytest
def test_setup(live_server):
live_server_setup(live_server)
def set_html_response():
+25 -7
@@ -5,10 +5,7 @@ import time
from flask import url_for
from .util import live_server_setup, wait_for_all_checks
from ..html_tools import *
def test_setup(live_server):
live_server_setup(live_server)
def _runner_test_http_errors(client, live_server, http_code, expected_text):
@@ -79,7 +76,14 @@ def test_DNS_errors(client, live_server, measure_memory_usage):
wait_for_all_checks(client)
res = client.get(url_for("watchlist.index"))
found_name_resolution_error = b"Temporary failure in name resolution" in res.data or b"Name or service not known" in res.data
found_name_resolution_error = (
b"No address found" in res.data or
b"Name or service not known" in res.data or
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data
)
assert found_name_resolution_error
# Should always record that we tried
assert bytes("just now".encode('utf-8')) in res.data
@@ -88,7 +92,7 @@ def test_DNS_errors(client, live_server, measure_memory_usage):
# Re 1513
def test_low_level_errors_clear_correctly(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# Give the endpoint time to spin up
time.sleep(1)
@@ -108,7 +112,14 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
# We should see the DNS error
res = client.get(url_for("watchlist.index"))
found_name_resolution_error = b"Temporary failure in name resolution" in res.data or b"Name or service not known" in res.data
found_name_resolution_error = (
b"No address found" in res.data or
b"Name or service not known" in res.data or
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data
)
assert found_name_resolution_error
# Update with what should work
@@ -123,7 +134,14 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
# Now the error should be gone
wait_for_all_checks(client)
res = client.get(url_for("watchlist.index"))
found_name_resolution_error = b"Temporary failure in name resolution" in res.data or b"Name or service not known" in res.data
found_name_resolution_error = (
b"No address found" in res.data or
b"Name or service not known" in res.data or
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data
)
assert not found_name_resolution_error
res = client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
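The same six-way `or` chain is pasted into each DNS assertion in this file, and again in the notification-log test, whose comment explains that the patterns "may appear in different environments": glibc reports "Name or service not known" or "Temporary failure in name resolution", while the BSD/macOS resolver reports "nodename nor servname provided". A minimal sketch of factoring the check into one helper follows; `DNS_OR_CONNECTION_ERROR_MARKERS` and `has_dns_or_connection_error` are hypothetical names, not part of changedetection.io.

```python
# Hypothetical helper (not in changedetection.io): one place for the list of
# resolver/connection error strings the tests accept.
DNS_OR_CONNECTION_ERROR_MARKERS = (
    b"No address found",
    b"Name or service not known",
    b"nodename nor servname provided",
    b"Temporary failure in name resolution",
    b"Failed to establish a new connection",
    b"Connection error occurred",
)


def has_dns_or_connection_error(page: bytes) -> bool:
    """Return True if any known DNS/connection error text appears in page."""
    return any(marker in page for marker in DNS_OR_CONNECTION_ERROR_MARKERS)
```

In a test this would read `assert has_dns_or_connection_error(res.data)`, or `assert not has_dns_or_connection_error(res.data)` once the error should have cleared.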
+1 -1
@@ -14,7 +14,7 @@ def test_check_extract_text_from_diff(client, live_server, measure_memory_usage)
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write("Now it's {} seconds since epoch, time flies!".format(str(time.time())))
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Add our URL to the import page
res = client.post(
@@ -67,11 +67,11 @@ def set_multiline_response():
return None
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
def test_check_filter_multiline(client, live_server, measure_memory_usage):
# live_server_setup(live_server)
## live_server_setup(live_server) # Setup on conftest per function
set_multiline_response()
# Add our URL to the import page
@@ -206,7 +206,7 @@ def test_check_filter_and_regex_extract(client, live_server, measure_memory_usag
def test_regex_error_handling(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
@@ -46,7 +46,7 @@ def test_filter_doesnt_exist_then_exists_should_get_notification(client, live_se
# And the page has that filter available
# Then I should get a notification
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Give the endpoint time to spin up
time.sleep(1)
@@ -163,15 +163,14 @@ def run_filter_test(client, live_server, content_filter):
os.unlink("test-datastore/notification.txt")
def test_setup(live_server):
live_server_setup(live_server)
def test_check_include_filters_failure_notification(client, live_server, measure_memory_usage):
# live_server_setup(live_server)
# # live_server_setup(live_server) # Setup on conftest per function
run_filter_test(client, live_server,'#nope-doesnt-exist')
def test_check_xpath_filter_failure_notification(client, live_server, measure_memory_usage):
# live_server_setup(live_server)
# # live_server_setup(live_server) # Setup on conftest per function
run_filter_test(client, live_server, '//*[@id="nope-doesnt-exist"]')
# Test that notification is never sent
+9 -9
@@ -6,8 +6,8 @@ from .util import live_server_setup, wait_for_all_checks, extract_rss_token_from
import os
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
def set_original_response():
test_return_data = """<html>
@@ -40,7 +40,7 @@ def set_modified_response():
return None
def test_setup_group_tag(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_original_response()
# Add a tag with some config, import a tag and it should roughly work
@@ -131,7 +131,7 @@ def test_setup_group_tag(client, live_server, measure_memory_usage):
assert b'Deleted' in res.data
def test_tag_import_singular(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
test_url = url_for('test_endpoint', _external=True)
res = client.post(
@@ -151,7 +151,7 @@ def test_tag_import_singular(client, live_server, measure_memory_usage):
assert b'Deleted' in res.data
def test_tag_add_in_ui(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
#
res = client.post(
url_for("tags.form_tag_add"),
@@ -168,7 +168,7 @@ def test_tag_add_in_ui(client, live_server, measure_memory_usage):
assert b'Deleted' in res.data
def test_group_tag_notification(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_original_response()
test_url = url_for('test_endpoint', _external=True)
@@ -236,7 +236,7 @@ def test_group_tag_notification(client, live_server, measure_memory_usage):
assert b'Deleted' in res.data
def test_limit_tag_ui(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
test_url = url_for('test_endpoint', _external=True)
urls=[]
@@ -275,7 +275,7 @@ def test_limit_tag_ui(client, live_server, measure_memory_usage):
assert b'All tags deleted' in res.data
def test_clone_tag_on_import(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("imports.import_page"),
@@ -301,7 +301,7 @@ def test_clone_tag_on_import(client, live_server, measure_memory_usage):
assert b'Deleted' in res.data
def test_clone_tag_on_quickwatchform_add(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
test_url = url_for('test_endpoint', _external=True)
@@ -9,7 +9,7 @@ from .util import live_server_setup, wait_for_all_checks
from urllib.parse import urlparse, parse_qs
def test_consistent_history(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
workers = int(os.getenv("FETCH_WORKERS", 10))
r = range(1, 10+workers)
+1 -1
@@ -24,7 +24,7 @@ def set_original_ignore_response():
def test_ignore(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
set_original_ignore_response()
test_url = url_for('test_endpoint', _external=True)
res = client.post(
@@ -3,8 +3,7 @@
from . util import live_server_setup
from changedetectionio import html_tools
def test_setup(live_server):
live_server_setup(live_server)
# Unit test of the stripper
# Always we are dealing in utf-8
+3 -4
@@ -5,8 +5,7 @@ from flask import url_for
from .util import live_server_setup, wait_for_all_checks
from changedetectionio import html_tools
def test_setup(live_server):
live_server_setup(live_server)
# Unit test of the stripper
# Always we are dealing in utf-8
@@ -256,9 +255,9 @@ def _run_test_global_ignore(client, as_source=False, extra_ignore=""):
assert b'Deleted' in res.data
def test_check_global_ignore_text_functionality(client, live_server):
#live_server_setup(live_server)
_run_test_global_ignore(client, as_source=False)
def test_check_global_ignore_text_functionality_as_source(client, live_server):
#live_server_setup(live_server)
_run_test_global_ignore(client, as_source=True, extra_ignore='/\?v=\d/')
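`test_check_global_ignore_text_functionality_as_source` passes `extra_ignore='/\?v=\d/'`, a rule wrapped in forward slashes. The sketch below assumes, as that test suggests, that slash-enclosed ignore rules are treated as regular expressions and every other rule as a case-insensitive plain-text match; the helper names are hypothetical and this is not changedetection.io's `html_tools` implementation. Separately, in a non-raw string literal `\?` and `\d` are invalid escape sequences, which Python 3.12+ reports as a `SyntaxWarning`; spelling the rule `r'/\?v=\d/'` keeps the same value without the warning.

```python
import re
from typing import Optional


def as_regex(rule: str) -> Optional[re.Pattern]:
    """Compile a '/pattern/' rule as a regex; return None for plain-text rules.

    Assumption: slash-enclosed rules are regular expressions.
    """
    if len(rule) > 2 and rule.startswith("/") and rule.endswith("/"):
        return re.compile(rule[1:-1])
    return None


def line_is_ignored(line: str, rules: list[str]) -> bool:
    """True if the line matches any rule (regex search or plain substring)."""
    for rule in rules:
        if not rule.strip():
            continue  # an empty rule would otherwise match every line
        pattern = as_regex(rule)
        if pattern is not None:
            if pattern.search(line):
                return True
        elif rule.strip().lower() in line.lower():
            # Assumption: plain rules match case-insensitively as substrings.
            return True
    return False


def strip_ignored_lines(text: str, rules: list[str]) -> str:
    """Drop every line of text that matches any ignore rule."""
    return "\n".join(line for line in text.splitlines() if not line_is_ignored(line, rules))
```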
@@ -6,8 +6,7 @@ from flask import url_for
from .util import live_server_setup, wait_for_all_checks
def test_setup(live_server):
live_server_setup(live_server)
def set_original_ignore_response():
test_return_data = """<html>
@@ -5,8 +5,7 @@ from flask import url_for
from .util import live_server_setup, wait_for_all_checks
def test_setup(live_server):
live_server_setup(live_server)
def set_original_response():
@@ -4,8 +4,7 @@ import time
from flask import url_for
from . util import live_server_setup
def test_setup(live_server):
live_server_setup(live_server)
# Should be the same as set_original_ignore_response() but with a little more whitespacing
+4 -4
@@ -8,8 +8,8 @@ from flask import url_for
from .util import live_server_setup, wait_for_all_checks
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
def test_import(client, live_server, measure_memory_usage):
# Give the endpoint time to spin up
@@ -126,7 +126,7 @@ def test_import_distillio(client, live_server, measure_memory_usage):
def test_import_custom_xlsx(client, live_server, measure_memory_usage):
"""Test can upload a excel spreadsheet and the watches are created correctly"""
#live_server_setup(live_server)
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'import/spreadsheet.xlsx')
@@ -175,7 +175,7 @@ def test_import_custom_xlsx(client, live_server, measure_memory_usage):
def test_import_watchete_xlsx(client, live_server, measure_memory_usage):
"""Test can upload a excel spreadsheet and the watches are created correctly"""
#live_server_setup(live_server)
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'import/spreadsheet.xlsx')
with open(filename, 'rb') as f:
+4 -4
@@ -5,12 +5,12 @@ from flask import url_for
from .util import live_server_setup, wait_for_all_checks
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# # live_server_setup(live_server) # Setup on conftest per function
# If there was only a change in the whitespacing, then we shouldnt have a change detected
def test_jinja2_in_url_query(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# Add our URL to the import page
test_url = url_for('test_return_query', _external=True)
@@ -35,7 +35,7 @@ def test_jinja2_in_url_query(client, live_server, measure_memory_usage):
# https://techtonics.medium.com/secure-templating-with-jinja2-understanding-ssti-and-jinja2-sandbox-environment-b956edd60456
def test_jinja2_security_url_query(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# Add our URL to the import page
test_url = url_for('test_return_query', _external=True)
@@ -12,8 +12,7 @@ try:
except ModuleNotFoundError:
jq_support = False
def test_setup(live_server):
live_server_setup(live_server)
def test_unittest_inline_html_extract():
# So lets pretend that the JSON we want is inside some HTML
+1 -1
@@ -19,7 +19,7 @@ something to trigger<br>
f.write(data)
def test_content_filter_live_preview(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
set_response()
test_url = url_for('test_endpoint', _external=True)
@@ -27,7 +27,7 @@ def set_zero_byte_response():
def test_check_basic_change_detection_functionality(client, live_server, measure_memory_usage):
set_original_response()
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
# Add our URL to the import page
res = client.post(
@@ -96,6 +96,8 @@ def test_check_basic_change_detection_functionality(client, live_server, measure
res = client.get(url_for("watchlist.index"))
assert b'unviewed' in res.data
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2)
# A totally zero byte (#2528) response should also not trigger an error
set_zero_byte_response()
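This commit range adds a fixed `time.sleep(0.2)` after `ui.mark_all_viewed`, here and in the restock `test_itemprop_price_change` hunk, for test stability. A fixed delay is a guess: too short on a slow CI runner, wasted time on a fast one. A common alternative is to poll for the expected state until a deadline. The sketch below assumes nothing about changedetection.io internals; `wait_until` is a hypothetical helper, and the predicate passed to it would be whatever condition the test actually depends on.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True as soon as the predicate succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout  # monotonic: immune to wall-clock changes
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, a test might call `wait_until(lambda: b'unviewed' not in client.get(url_for("watchlist.index")).data)`, assuming that is the state the sleep is meant to wait for.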
+7 -9
@@ -5,8 +5,7 @@ import re
from flask import url_for
from loguru import logger
from .util import set_original_response, set_modified_response, set_more_modified_response, live_server_setup, wait_for_all_checks, \
set_longer_modified_response, get_index
from .util import set_original_response, set_modified_response, set_more_modified_response, live_server_setup, wait_for_all_checks
from . util import extract_UUID_from_client
import logging
import base64
@@ -18,13 +17,12 @@ from changedetectionio.notification import (
valid_notification_formats,
)
def test_setup(live_server):
live_server_setup(live_server)
# Hard to just add more live server URLs when one test is already running (I think)
# So we add our test here (was in a different file)
def test_check_notification(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_original_response()
# Re 360 - new install should have defaults set
@@ -286,7 +284,7 @@ def test_notification_validation(client, live_server, measure_memory_usage):
def test_notification_custom_endpoint_and_jinja2(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# test_endpoint - that sends the contents of a file
# test_notification_endpoint - that takes a POST and writes it to file (test-datastore/notification.txt)
@@ -331,7 +329,7 @@ def test_notification_custom_endpoint_and_jinja2(client, live_server, measure_me
# Check no errors were recorded, because we asked for 204 which is slightly uncommon but is still OK
res = get_index(client)
res = client.get(url_for("watchlist.index"))
assert b'notification-error' not in res.data
with open("test-datastore/notification.txt", 'r') as f:
@@ -372,7 +370,7 @@ def test_notification_custom_endpoint_and_jinja2(client, live_server, measure_me
#2510
def test_global_send_test_notification(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_original_response()
if os.path.isfile("test-datastore/notification.txt"):
os.unlink("test-datastore/notification.txt") \
@@ -517,7 +515,7 @@ def _test_color_notifications(client, notification_body_token):
def test_html_color_notifications(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
_test_color_notifications(client, '{{diff}}')
_test_color_notifications(client, '{{diff_full}}')
@@ -6,7 +6,7 @@ import logging
def test_check_notification_error_handling(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
set_original_response()
# Set a URL and fetch it, then set a notification URL which is going to give errors
@@ -60,7 +60,15 @@ def test_check_notification_error_handling(client, live_server, measure_memory_u
# The error should show in the notification logs
res = client.get(
url_for("settings.notification_logs"))
found_name_resolution_error = b"Temporary failure in name resolution" in res.data or b"Name or service not known" in res.data
# Check for various DNS/connection error patterns that may appear in different environments
found_name_resolution_error = (
b"No address found" in res.data or
b"Name or service not known" in res.data or
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data
)
assert found_name_resolution_error
# And the working one, which is after the 'broken' one should still have fired
+1 -1
@@ -20,7 +20,7 @@ def set_original_ignore_response():
def test_obfuscations(client, live_server, measure_memory_usage):
set_original_ignore_response()
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
+1 -1
@@ -10,7 +10,7 @@ def test_fetch_pdf(client, live_server, measure_memory_usage):
import shutil
shutil.copy("tests/test.pdf", "test-datastore/endpoint-test.pdf")
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
test_url = url_for('test_pdf_endpoint', _external=True)
# Add our URL to the import page
res = client.post(
@@ -10,7 +10,7 @@ def test_fetch_pdf(client, live_server, measure_memory_usage):
import shutil
shutil.copy("tests/test.pdf", "test-datastore/endpoint-test.pdf")
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
test_url = url_for('test_pdf_endpoint', _external=True)
# Add our URL to the import page
res = client.post(
+5 -6
@@ -4,8 +4,7 @@ import time
from flask import url_for
from . util import set_original_response, set_modified_response, live_server_setup, wait_for_all_checks, extract_UUID_from_client
def test_setup(live_server):
live_server_setup(live_server)
# Hard to just add more live server URLs when one test is already running (I think)
# So we add our test here (was in a different file)
@@ -154,7 +153,7 @@ def test_body_in_request(client, live_server, measure_memory_usage):
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
watches_with_body = 0
with open('test-datastore/url-watches.json') as f:
app_struct = json.load(f)
@@ -258,7 +257,7 @@ def test_method_in_request(client, live_server, measure_memory_usage):
# Re #2408 - user-agent override test, also should handle case-insensitive header deduplication
def test_ua_global_override(client, live_server, measure_memory_usage):
# live_server_setup(live_server)
## live_server_setup(live_server) # Setup on conftest per function
test_url = url_for('test_headers', _external=True)
res = client.post(
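The comment on `test_ua_global_override` (Re #2408) says the user-agent override should also handle case-insensitive header deduplication. HTTP field names are case-insensitive, so `User-Agent` and `user-agent` name the same header and only one value should be sent. A minimal sketch of one way to merge header sets, assuming later sets (for example per-watch headers) override earlier ones (for example global headers); `merge_headers` is hypothetical and not the project's code.

```python
def merge_headers(*header_sets: dict[str, str]) -> dict[str, str]:
    """Merge header dicts left to right; a later set overrides an earlier one
    regardless of name casing ('User-Agent' vs 'user-agent')."""
    merged: dict[str, tuple[str, str]] = {}
    for headers in header_sets:
        for name, value in headers.items():
            # Key on the lower-cased name so differently-cased duplicates collapse.
            merged[name.lower()] = (name, value)
    # Emit each header once, using the casing from the set that supplied the winning value.
    return {name: value for name, value in merged.values()}
```

If `requests` is available, `requests.structures.CaseInsensitiveDict` provides the same case-insensitive lookup without a custom helper.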
@@ -313,7 +312,7 @@ def test_ua_global_override(client, live_server, measure_memory_usage):
assert b'Deleted' in res.data
def test_headers_textfile_in_request(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# Add our URL to the import page
webdriver_ua = "Hello fancy webdriver UA 1.0"
@@ -426,7 +425,7 @@ def test_headers_textfile_in_request(client, live_server, measure_memory_usage):
assert b'Deleted' in res.data
def test_headers_validation(client, live_server):
#live_server_setup(live_server)
test_url = url_for('test_headers', _external=True)
res = client.post(
@@ -44,13 +44,13 @@ def set_original_response(props_markup='', price="121.95"):
def test_setup(client, live_server):
# def test_setup(client, live_server):
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
def test_restock_itemprop_basic(client, live_server):
#live_server_setup(live_server)
test_url = url_for('test_endpoint', _external=True)
@@ -89,7 +89,7 @@ def test_restock_itemprop_basic(client, live_server):
assert b'Deleted' in res.data
def test_itemprop_price_change(client, live_server):
#live_server_setup(live_server)
# Out of the box 'Follow price changes' should be ON
test_url = url_for('test_endpoint', _external=True)
@@ -114,6 +114,8 @@ def test_itemprop_price_change(client, live_server):
assert b'180.45' in res.data
assert b'unviewed' in res.data
client.get(url_for("ui.mark_all_viewed"), follow_redirects=True)
time.sleep(0.2)
# turning off price change trigger, but it should show the new price, with no change notification
set_original_response(props_markup=instock_props[0], price='120.45')
@@ -214,7 +216,7 @@ def _run_test_minmax_limit(client, extra_watch_edit_form):
def test_restock_itemprop_minmax(client, live_server):
#live_server_setup(live_server)
extras = {
"restock_settings-follow_price_changes": "y",
"restock_settings-price_change_min": 900.0,
@@ -223,7 +225,7 @@ def test_restock_itemprop_minmax(client, live_server):
_run_test_minmax_limit(client, extra_watch_edit_form=extras)
def test_restock_itemprop_with_tag(client, live_server):
#live_server_setup(live_server)
res = client.post(
url_for("tags.form_tag_add"),
@@ -252,7 +254,7 @@ def test_restock_itemprop_with_tag(client, live_server):
def test_itemprop_percent_threshold(client, live_server):
#live_server_setup(live_server)
res = client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
@@ -319,7 +321,7 @@ def test_itemprop_percent_threshold(client, live_server):
def test_change_with_notification_values(client, live_server):
#live_server_setup(live_server)
if os.path.isfile("test-datastore/notification.txt"):
os.unlink("test-datastore/notification.txt")
@@ -387,7 +389,7 @@ def test_change_with_notification_values(client, live_server):
def test_data_sanity(client, live_server):
#live_server_setup(live_server)
res = client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
@@ -437,7 +439,7 @@ def test_data_sanity(client, live_server):
# All examples should give a prive of 666.66
def test_special_prop_examples(client, live_server):
import glob
#live_server_setup(live_server)
test_url = url_for('test_endpoint', _external=True)
check_path = os.path.join(os.path.dirname(__file__), "itemprop_test_examples", "*.txt")
+6 -6
@@ -65,11 +65,11 @@ def set_html_content(content):
with open("test-datastore/endpoint-content.txt", "wb") as f:
f.write(test_return_data.encode('utf-8'))
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
def test_rss_and_token(client, live_server, measure_memory_usage):
# live_server_setup(live_server)
# # live_server_setup(live_server) # Setup on conftest per function
set_original_response()
rss_token = extract_rss_token_from_UI(client)
@@ -107,7 +107,7 @@ def test_rss_and_token(client, live_server, measure_memory_usage):
client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_basic_cdata_rss_markup(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_original_cdata_xml()
@@ -135,7 +135,7 @@ def test_basic_cdata_rss_markup(client, live_server, measure_memory_usage):
res = client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
def test_rss_xpath_filtering(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
set_original_cdata_xml()
@@ -191,7 +191,7 @@ def test_rss_bad_chars_breaking(client, live_server):
Otherwise feedgen should support regular unicode
"""
#live_server_setup(live_server)
with open("test-datastore/endpoint-content.txt", "w") as f:
ten_kb_string = "A" * 10_000
+4 -4
@@ -6,11 +6,11 @@ from zoneinfo import ZoneInfo
from flask import url_for
from .util import live_server_setup, wait_for_all_checks, extract_UUID_from_client
def test_setup(client, live_server):
live_server_setup(live_server)
# def test_setup(client, live_server):
# live_server_setup(live_server) # Setup on conftest per function
def test_check_basic_scheduler_functionality(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
days = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
test_url = url_for('test_random_content_endpoint', _external=True)
@@ -92,7 +92,7 @@ def test_check_basic_scheduler_functionality(client, live_server, measure_memory
def test_check_basic_global_scheduler_functionality(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
days = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
test_url = url_for('test_random_content_endpoint', _external=True)
+3 -4
@@ -2,11 +2,10 @@ from flask import url_for
from .util import set_original_response, set_modified_response, live_server_setup
import time
def test_setup(live_server):
live_server_setup(live_server)
def test_basic_search(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
urls = ['https://localhost:12300?first-result=1',
'https://localhost:5000?second-result=1'
@@ -39,7 +38,7 @@ def test_basic_search(client, live_server, measure_memory_usage):
def test_search_in_tag_limit(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
urls = ['https://localhost:12300?first-result=1 tag-one',
'https://localhost:5000?second-result=1 tag-two'
+5 -5
@@ -5,11 +5,11 @@ from .util import live_server_setup, wait_for_all_checks
from .. import strtobool
def test_setup(client, live_server, measure_memory_usage):
live_server_setup(live_server)
# def test_setup(client, live_server, measure_memory_usage):
# live_server_setup(live_server) # Setup on conftest per function
def test_bad_access(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
res = client.post(
url_for("imports.import_page"),
data={"urls": 'https://localhost'},
@@ -89,7 +89,7 @@ def _runner_test_various_file_slash(client, file_uri):
assert b'Deleted' in res.data
def test_file_slash_access(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
# file: is NOT permitted by default, so it will be caught by ALLOW_FILE_URI check
@@ -99,7 +99,7 @@ def test_file_slash_access(client, live_server, measure_memory_usage):
_runner_test_various_file_slash(client, file_uri=f"file:{test_file_path}") # CVE-2024-56509
def test_xss(client, live_server, measure_memory_usage):
#live_server_setup(live_server)
from changedetectionio.notification import (
default_notification_format
)
+1 -1
@@ -11,7 +11,7 @@ sleep_time_for_fetch_thread = 3
def test_share_watch(client, live_server, measure_memory_usage):
set_original_response()
live_server_setup(live_server)
# live_server_setup(live_server) # Setup on conftest per function
test_url = url_for('test_endpoint', _external=True)
include_filters = ".nice-filter"
+1 -2
@@ -7,8 +7,7 @@ from .util import set_original_response, set_modified_response, live_server_setu
sleep_time_for_fetch_thread = 3
def test_setup(live_server):
live_server_setup(live_server)
def test_check_basic_change_detection_functionality_source(client, live_server, measure_memory_usage):
set_original_response()

Some files were not shown because too many files have changed in this diff