Compare commits

..

21 Commits

Author SHA1 Message Date
dgtlmoon
50fe822d8e Adding random tips
Some checks failed
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v7 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v8 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (main) (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-02-04 14:22:26 +01:00
dgtlmoon
a7ea79e1fc Improve test for missing processor 2026-02-04 14:10:30 +01:00
dgtlmoon
9921d303de tweaks for when the processor is not found 2026-02-04 13:56:55 +01:00
dgtlmoon
7832787b84 Improve tests 2026-02-04 13:36:43 +01:00
dgtlmoon
bd5f7187f7 oops 2026-02-04 13:24:33 +01:00
dgtlmoon
e2f9fdb384 Merge branch 'processor-plugin-improvements' of github.com:dgtlmoon/changedetection.io into processor-plugin-improvements 2026-02-04 13:13:10 +01:00
dgtlmoon
261f88b272 Add test 2026-02-04 13:13:00 +01:00
dgtlmoon
8d2e668b42 Fix name 2026-02-04 13:12:57 +01:00
dgtlmoon
c7050077be Apply suggestion from @dgtlmoon
Some checks failed
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v7 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v8 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (main) (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-02-03 15:55:17 +01:00
dgtlmoon
5f25e3825c Closes #3814 2026-02-03 15:49:54 +01:00
dgtlmoon
041c1ad531 Better support for capabilities 2026-02-03 15:39:50 +01:00
dgtlmoon
e958acebed Processors can specify what capabilities they support (visual selector, text filters etc) 2026-02-03 15:31:44 +01:00
dgtlmoon
b826d9b236 Bump docs
Some checks failed
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v7 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v8 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (main) (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-02-03 13:51:15 +01:00
dgtlmoon
bc3efbff27 Merge branch 'master' into processor-plugin-improvements 2026-02-03 11:12:57 +01:00
dgtlmoon
063ee38099 Merge branch 'master' into processor-plugin-improvements
Some checks failed
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v7 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v8 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (main) (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-01-18 00:56:30 +01:00
dgtlmoon
5007b8201e Merge branch 'master' into processor-plugin-improvements
Some checks failed
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Build distribution 📦 (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (alpine) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/amd64 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v7 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm/v8 (main) (push) Has been cancelled
ChangeDetection.io Container Build Test / Build linux/arm64 (main) (push) Has been cancelled
ChangeDetection.io App Test / lint-code (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Test the built package works basically. (push) Has been cancelled
Publish Python 🐍distribution 📦 to PyPI and TestPyPI / Publish Python 🐍 distribution 📦 to PyPI (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-10 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-11 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-12 (push) Has been cancelled
ChangeDetection.io App Test / test-application-3-13 (push) Has been cancelled
2026-01-15 23:14:30 +01:00
dgtlmoon
7e853a4b46 Configurable paackages 2026-01-14 19:01:30 +01:00
dgtlmoon
4dc5301de4 Downgrade for osint plugin and handle missing processors better 2026-01-14 18:22:23 +01:00
dgtlmoon
edf0989cd4 Put the type badge first in the list 2026-01-14 17:34:54 +01:00
dgtlmoon
423b546948 Refactor of processor extra configs 2026-01-14 17:29:20 +01:00
dgtlmoon
c1c810a79a Misc fixes for processor plugins 2026-01-14 16:59:20 +01:00
20 changed files with 138 additions and 375 deletions

View File

@@ -3,10 +3,10 @@
{% from '_helpers.html' import render_simple_field, render_field %}
<div class="edit-form">
<div class="box-wrap inner">
<h2>{{ _('Backups') }}</h2>
<h4>{{ _('Backups') }}</h4>
{% if backup_running %}
<p>
<span class="spinner"></span>&nbsp;<strong>{{ _('A backup is running!') }}</strong>
<strong>{{ _('A backup is running!') }}</strong>
</p>
{% endif %}
<p>

View File

@@ -25,7 +25,7 @@
<li class="tab"><a href="#ui-options">{{ _('UI Options') }}</a></li>
<li class="tab"><a href="#api">{{ _('API') }}</a></li>
<li class="tab"><a href="#rss">{{ _('RSS') }}</a></li>
<li class="tab"><a href="{{ url_for('backups.index') }}">{{ _('Backups') }}</a></li>
<li class="tab"><a href="{{ url_for('backups.index') }}" class="pure-menu-link">{{ _('Backups') }}</a></li>
<li class="tab"><a href="#timedate">{{ _('Time & Date') }}</a></li>
<li class="tab"><a href="#proxies">{{ _('CAPTCHA & Proxies') }}</a></li>
{% if plugin_tabs %}
@@ -59,14 +59,6 @@
{{ _('Set to') }} <strong>0</strong> {{ _('to disable') }}
</span>
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.history_snapshot_max_length, class="history_snapshot_max_length") }}
<span class="pure-form-message-inline">{{ _('Limit collection of history snapshots for each watch to this number of history items.') }}
<br>
{{ _('Set to empty to disable / no limit') }}
</span>
</div>
<div class="pure-control-group">
{% if not hide_remove_pass %}
{% if current_user.is_authenticated %}

View File

@@ -197,7 +197,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Recast it if need be to right data Watch handler
watch_class = processors.get_custom_watch_obj_for_processor(form.data.get('processor'))
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, __datastore=datastore.data, default=datastore.data['watching'][uuid])
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, default=datastore.data['watching'][uuid])
flash(gettext("Updated watch - unpaused!") if request.args.get('unpause_on_save') else gettext("Updated watch."))
# Cleanup any browsersteps session for this watch

View File

@@ -115,13 +115,6 @@
{{ _('Sends a notification when the filter can no longer be seen on the page, good for knowing when the page changed and your filter will not work anymore.') }}
</span>
</div>
<div class="pure-control-group">
{{ render_field(form.history_snapshot_max_length, class="history_snapshot_max_length") }}
<span class="pure-form-message-inline">{{ _('Limit collection of history snapshots for each watch to this number of history items.') }}
<br>
{{ _('Set to empty to use system settings default') }}
</span>
</div>
<div class="pure-control-group">
{{ render_ternary_field(form.use_page_title_in_list) }}
</div>

View File

@@ -26,7 +26,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
add_paused = request.form.get('edit_and_watch_submit_button') != None
from changedetectionio import processors
processor = request.form.get('processor', processors.get_default_processor())
new_uuid = datastore.add_watch(url=url, tag=request.form.get('tags','').strip(), extras={'paused': add_paused, 'processor': processor})
new_uuid = datastore.add_watch(url=url, tag=request.form.get('tags').strip(), extras={'paused': add_paused, 'processor': processor})
if new_uuid:
if add_paused:
@@ -39,4 +39,4 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
return redirect(url_for('watchlist.index', tag=request.args.get('tag','')))
return views_blueprint
return views_blueprint

View File

@@ -1,4 +1,3 @@
import asyncio
import gc
import json
import os
@@ -350,7 +349,12 @@ class fetcher(Fetcher):
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Finally block will handle cleanup
# Cleanup before raising to prevent memory leak
await self.page.close()
await context.close()
await browser.close()
# Force garbage collection to release Playwright resources immediately
gc.collect()
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
if not empty_pages_are_a_change and len((await self.page.content()).strip()) == 0:
@@ -366,7 +370,12 @@ class fetcher(Fetcher):
try:
await self.iterate_browser_steps(start_url=url)
except BrowserStepsStepException:
# Finally block will handle cleanup
try:
await context.close()
await browser.close()
except Exception as e:
# Fine, could be messy situation
pass
raise
await self.page.wait_for_timeout(extra_wait * 1000)
@@ -415,40 +424,35 @@ class fetcher(Fetcher):
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
finally:
# Clean up resources properly with timeouts to prevent hanging
# Request garbage collection one more time before closing
try:
if hasattr(self, 'page') and self.page:
await self.page.request_gc()
await asyncio.wait_for(self.page.close(), timeout=5.0)
logger.debug(f"Successfully closed page for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing page for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing page for {url}: {e}")
finally:
self.page = None
await self.page.request_gc()
except:
pass
# Clean up resources properly
try:
await self.page.request_gc()
except:
pass
try:
if context:
await asyncio.wait_for(context.close(), timeout=5.0)
logger.debug(f"Successfully closed context for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing context for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing context for {url}: {e}")
finally:
context = None
await self.page.close()
except:
pass
self.page = None
try:
if browser:
await asyncio.wait_for(browser.close(), timeout=5.0)
logger.debug(f"Successfully closed browser connection for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing browser connection for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing browser for {url}: {e}")
finally:
browser = None
await context.close()
except:
pass
context = None
try:
await browser.close()
except:
pass
browser = None
# Force Python GC to release Playwright resources immediately
# Playwright objects can have circular references that delay cleanup

View File

@@ -1,5 +1,4 @@
import asyncio
import gc
import json
import os
import websockets.exceptions
@@ -222,36 +221,19 @@ class fetcher(Fetcher):
self.browser_connection_url += f"{r}--proxy-server={proxy_url}"
async def quit(self, watch=None):
watch_uuid = watch.get('uuid') if watch else 'unknown'
# Close page
try:
if hasattr(self, 'page') and self.page:
await asyncio.wait_for(self.page.close(), timeout=5.0)
logger.debug(f"[{watch_uuid}] Page closed successfully")
except asyncio.TimeoutError:
logger.warning(f"[{watch_uuid}] Timed out closing page (5s)")
await self.page.close()
del self.page
except Exception as e:
logger.warning(f"[{watch_uuid}] Error closing page: {e}")
finally:
self.page = None
pass
# Close browser connection
try:
if hasattr(self, 'browser') and self.browser:
await asyncio.wait_for(self.browser.close(), timeout=5.0)
logger.debug(f"[{watch_uuid}] Browser closed successfully")
except asyncio.TimeoutError:
logger.warning(f"[{watch_uuid}] Timed out closing browser (5s)")
await self.browser.close()
del self.browser
except Exception as e:
logger.warning(f"[{watch_uuid}] Error closing browser: {e}")
finally:
self.browser = None
pass
logger.info(f"[{watch_uuid}] Cleanup puppeteer complete")
# Force garbage collection to release resources
gc.collect()
logger.info("Cleanup puppeteer complete.")
async def fetch_page(self,
current_include_filters,
@@ -281,11 +263,9 @@ class fetcher(Fetcher):
# Connect directly using the specified browser_ws_endpoint
# @todo timeout
try:
logger.debug(f"[{watch_uuid}] Connecting to browser at {self.browser_connection_url}")
self.browser = await pyppeteer_instance.connect(browserWSEndpoint=self.browser_connection_url,
ignoreHTTPSErrors=True
)
logger.debug(f"[{watch_uuid}] Browser connected successfully")
except websockets.exceptions.InvalidStatusCode as e:
raise BrowserConnectError(msg=f"Error while trying to connect the browser, Code {e.status_code} (check your access, whitelist IP, password etc)")
except websockets.exceptions.InvalidURI:
@@ -294,18 +274,7 @@ class fetcher(Fetcher):
raise BrowserConnectError(msg=f"Error connecting to the browser - Exception '{str(e)}'")
# more reliable is to just request a new page
try:
logger.debug(f"[{watch_uuid}] Creating new page")
self.page = await self.browser.newPage()
logger.debug(f"[{watch_uuid}] Page created successfully")
except Exception as e:
logger.error(f"[{watch_uuid}] Failed to create new page: {e}")
# Browser is connected but page creation failed - must cleanup browser
try:
await asyncio.wait_for(self.browser.close(), timeout=3.0)
except Exception as cleanup_error:
logger.error(f"[{watch_uuid}] Failed to cleanup browser after page creation failure: {cleanup_error}")
raise
self.page = await self.browser.newPage()
# Add console handler to capture console.log from favicon fetcher
#self.page.on('console', lambda msg: logger.debug(f"Browser console [{msg.type}]: {msg.text}"))
@@ -374,12 +343,6 @@ class fetcher(Fetcher):
w = extra_wait - 2 if extra_wait > 4 else 2
logger.debug(f"Waiting {w} seconds before calling Page.stopLoading...")
await asyncio.sleep(w)
# Check if page still exists (might have been closed due to error during sleep)
if not self.page or not hasattr(self.page, '_client'):
logger.debug("Page already closed, skipping stopLoading")
return
logger.debug("Issuing stopLoading command...")
await self.page._client.send('Page.stopLoading')
logger.debug("stopLoading command sent!")
@@ -405,9 +368,7 @@ class fetcher(Fetcher):
asyncio.create_task(handle_frame_navigation())
response = await self.page.goto(url, timeout=0)
await asyncio.sleep(1 + extra_wait)
# Check if page still exists before sending command
if self.page and hasattr(self.page, '_client'):
await self.page._client.send('Page.stopLoading')
await self.page._client.send('Page.stopLoading')
if response:
break
@@ -476,9 +437,15 @@ class fetcher(Fetcher):
logger.debug(f"Screenshot format {self.screenshot_format}")
self.screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Force garbage collection - pyppeteer base64 decode creates temporary buffers
# Force aggressive memory cleanup - pyppeteer base64 decode creates temporary buffers
import gc
gc.collect()
# Release C-level memory from base64 decode back to OS
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT

View File

@@ -837,8 +837,6 @@ class processor_text_json_diff_form(commonSettingsForm):
conditions = FieldList(FormField(ConditionFormRow), min_entries=1) # Add rule logic here
use_page_title_in_list = TernaryNoneBooleanField(_l('Use page <title> in list'), default=None)
history_snapshot_max_length = IntegerField(_l('Number of history items per watch to keep'), render_kw={"style": "width: 5em;"}, validators=[validators.Optional(), validators.NumberRange(min=2)])
def extra_tab_content(self):
return None
@@ -1036,8 +1034,6 @@ class globalSettingsApplicationForm(commonSettingsForm):
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0,
message=_l("Should contain zero or more attempts"))])
history_snapshot_max_length = IntegerField(_l('Number of history items per watch to keep'), render_kw={"style": "width: 5em;"}, validators=[validators.Optional(), validators.NumberRange(min=2)])
ui = FormField(globalSettingsApplicationUIForm)

View File

@@ -52,13 +52,7 @@ def render(template_str, **args: t.Any) -> str:
return output[:JINJA2_MAX_RETURN_PAYLOAD_SIZE]
def render_fully_escaped(content):
"""
Escape HTML content safely.
MEMORY LEAK FIX: Use markupsafe.escape() directly instead of creating
Jinja2 environments (was causing 1M+ compilations per page load).
Simpler, faster, and no concerns about environment state.
"""
from markupsafe import escape
return str(escape(content))
env = jinja2.sandbox.ImmutableSandboxedEnvironment(autoescape=True)
template = env.from_string("{{ some_html|e }}")
return template.render(some_html=content)

View File

@@ -46,7 +46,6 @@ class model(dict):
'filter_failure_notification_threshold_attempts': _FILTER_FAILURE_THRESHOLD_ATTEMPTS_DEFAULT,
'global_ignore_text': [], # List of text to ignore when calculating the comparison checksum
'global_subtractive_selectors': [],
'history_snapshot_max_length': None,
'ignore_whitespace': True,
'ignore_status_codes': False, #@todo implement, as ternary.
'ssim_threshold': '0.96', # Default SSIM threshold for screenshot comparison

View File

@@ -1,6 +1,3 @@
import gc
from copy import copy
from blinker import signal
from changedetectionio.validate_url import is_safe_valid_url
@@ -16,7 +13,7 @@ from .. import jinja2_custom as safe_jinja
from ..html_tools import TRANSLATE_WHITESPACE_TABLE
FAVICON_RESAVE_THRESHOLD_SECONDS=86400
BROTLI_COMPRESS_SIZE_THRESHOLD = int(os.getenv('SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD', 1024*20))
BROTLI_COMPRESS_SIZE_THRESHOLD = int(os.getenv('SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD', 1024))
minimum_seconds_recheck_time = int(os.getenv('MINIMUM_SECONDS_RECHECK_TIME', 3))
mtable = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400, 'weeks': 86400 * 7}
@@ -112,15 +109,8 @@ class model(watch_base):
self.__datastore_path = kw.get('datastore_path')
if kw.get('datastore_path'):
del kw['datastore_path']
self.__datastore = kw.get('__datastore')
if not self.__datastore:
raise ValueError("Watch object requires '__datastore' reference - cannot access global settings without it")
if kw.get('__datastore'):
del kw['__datastore']
super(model, self).__init__(*arg, **kw)
if kw.get('default'):
self.update(kw['default'])
del kw['default']
@@ -433,49 +423,16 @@ class model(watch_base):
with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
return f.read()
def _write_atomic(self, dest, data, mode='wb'):
def _write_atomic(self, dest, data):
"""Write data atomically to dest using a temp file"""
import tempfile
with tempfile.NamedTemporaryFile(mode, delete=False, dir=self.watch_data_dir) as tmp:
tmp.write(data)
tmp.flush()
os.fsync(tmp.fileno())
tmp_path = tmp.name
os.replace(tmp_path, dest)
def history_trim(self, newest_n_items):
from pathlib import Path
# Sort by timestamp (key)
sorted_items = sorted(self.history.items(), key=lambda x: int(x[0]))
keep_part = dict(sorted_items[-newest_n_items:])
delete_part = dict(sorted_items[:-newest_n_items])
logger.info( f"[{self.get('uuid')}] Trimming history to most recent {newest_n_items} items, keeping {len(keep_part)} items deleting {len(delete_part)} items.")
if delete_part:
for item in delete_part.items():
try:
Path(item[1]).unlink(missing_ok=True)
except Exception as e:
logger.critical(f"{str(e)}")
finally:
logger.debug(f"[{self.get('uuid')}] Deleted {item[1]} history snapshot")
try:
dest = os.path.join(self.watch_data_dir, self.history_index_filename)
output = "\r\n".join(
f"{k},{Path(v).name}"
for k, v in keep_part.items()
)+"\r\n"
self._write_atomic(dest=dest, data=output, mode='w')
except Exception as e:
logger.critical(f"{str(e)}")
finally:
logger.debug(f"[{self.get('uuid')}] Updated history index {dest}")
# reimport
bump = self.history
gc.collect()
if not os.path.exists(dest):
import tempfile
with tempfile.NamedTemporaryFile('wb', delete=False, dir=self.watch_data_dir) as tmp:
tmp.write(data)
tmp.flush()
os.fsync(tmp.fileno())
tmp_path = tmp.name
os.replace(tmp_path, dest)
# Save some text file to the appropriate path and bump the history
# result_obj from fetch_site_status.run()
@@ -484,6 +441,7 @@ class model(watch_base):
logger.trace(f"{self.get('uuid')} - Updating {self.history_index_filename} with timestamp {timestamp}")
self.ensure_data_dir_exists()
skip_brotli = strtobool(os.getenv('DISABLE_BROTLI_TEXT_SNAPSHOT', 'False'))
# Binary data - detect file type and save without compression
@@ -543,15 +501,6 @@ class model(watch_base):
self.__newest_history_key = timestamp
self.__history_n += 1
maxlen = (
self.get('history_snapshot_max_length')
or (self.__datastore and self.__datastore['settings']['application'].get('history_snapshot_max_length'))
)
if maxlen and self.__history_n and self.__history_n > maxlen:
self.history_trim(newest_n_items=maxlen)
# @todo bump static cache of the last timestamp so we dont need to examine the file to set a proper ''viewed'' status
return snapshot_fname
@@ -664,11 +613,6 @@ class model(watch_base):
try:
with open(fname, 'wb') as f:
f.write(decoded)
# Invalidate favicon filename cache
if hasattr(self, '_favicon_filename_cache'):
delattr(self, '_favicon_filename_cache')
# A signal that could trigger the socket server to update the browser also
watch_check_update = signal('watch_favicon_bump')
if watch_check_update:
@@ -685,32 +629,20 @@ class model(watch_base):
Find any favicon.* file in the current working directory
and return the contents of the newest one.
MEMORY LEAK FIX: Cache the result to avoid repeated glob.glob() operations.
glob.glob() causes millions of fnmatch allocations when called for every watch on page load.
Returns:
str: Basename of the newest favicon file, or None if not found.
bytes: Contents of the newest favicon file, or None if not found.
"""
# Check cache first (prevents 26M+ allocations from repeated glob operations)
cache_key = '_favicon_filename_cache'
if hasattr(self, cache_key):
return getattr(self, cache_key)
import glob
# Search for all favicon.* files
files = glob.glob(os.path.join(self.watch_data_dir, "favicon.*"))
if not files:
result = None
else:
# Find the newest by modification time
newest_file = max(files, key=os.path.getmtime)
result = os.path.basename(newest_file)
return None
# Cache the result
setattr(self, cache_key, result)
return result
# Find the newest by modification time
newest_file = max(files, key=os.path.getmtime)
return os.path.basename(newest_file)
def get_screenshot_as_thumbnail(self, max_age=3200):
"""Return path to a square thumbnail of the most recent screenshot.

View File

@@ -32,7 +32,6 @@ class watch_base(dict):
'filter_text_replaced': True,
'follow_price_changes': True,
'has_ldjson_price_data': None,
'history_snapshot_max_length': None,
'headers': {}, # Extra headers to send
'ignore_text': [], # List of text to ignore when calculating the comparison checksum
'ignore_status_codes': None,

View File

@@ -193,17 +193,18 @@ class perform_site_check(difference_detection_processor):
itemprop_availability = {}
multiple_prices_found = False
# Try built-in extraction first, this will scan metadata in the HTML
try:
itemprop_availability = get_itemprop_availability(self.fetcher.content)
except MoreThanOnePriceFound as e:
# Don't raise immediately - let plugins try to handle this case
# Plugins might be able to determine which price is correct
logger.warning(f"Built-in detection found multiple prices on {watch.get('url')}, will try plugin override")
multiple_prices_found = True
itemprop_availability = {}
# Add the real data
raise ProcessorException(message="Cannot run, more than one price detected, this plugin is only for product pages with ONE product, try the content-change detection mode.",
url=watch.get('url'),
status_code=self.fetcher.get_last_status_code(),
screenshot=self.fetcher.screenshot,
xpath_data=self.fetcher.xpath_data
)
# If built-in extraction didn't get both price AND availability, try plugin override
# Only check plugin if this watch is using a fetcher that might provide better data
@@ -215,21 +216,9 @@ class perform_site_check(difference_detection_processor):
from changedetectionio.pluggy_interface import get_itemprop_availability_from_plugin
fetcher_name = watch.get('fetch_backend', 'html_requests')
# Resolve 'system' to the actual fetcher being used
# This allows plugins to work even when watch uses "system settings default"
if fetcher_name == 'system':
# Get the actual fetcher that was used (from self.fetcher)
# Fetcher class name gives us the actual backend (e.g., 'html_requests', 'html_webdriver')
actual_fetcher = type(self.fetcher).__name__
if 'html_requests' in actual_fetcher.lower():
fetcher_name = 'html_requests'
elif 'webdriver' in actual_fetcher.lower() or 'playwright' in actual_fetcher.lower():
fetcher_name = 'html_webdriver'
logger.debug(f"Resolved 'system' fetcher to actual fetcher: {fetcher_name}")
# Try plugin override - plugins can decide if they support this fetcher
if fetcher_name:
logger.debug(f"Calling extra plugins for getting item price/availability (fetcher: {fetcher_name})")
# Only try plugin override if not using system default (which might be anything)
if fetcher_name and fetcher_name != 'system':
logger.debug("Calling extra plugins for getting item price/availability")
plugin_availability = get_itemprop_availability_from_plugin(self.fetcher.content, fetcher_name, self.fetcher, watch.link)
if plugin_availability:
@@ -244,16 +233,6 @@ class perform_site_check(difference_detection_processor):
if not plugin_availability:
logger.debug("No item price/availability from plugins")
# If we had multiple prices and plugins also failed, NOW raise the exception
if multiple_prices_found and not itemprop_availability.get('price'):
raise ProcessorException(
message="Cannot run, more than one price detected, this plugin is only for product pages with ONE product, try the content-change detection mode.",
url=watch.get('url'),
status_code=self.fetcher.get_last_status_code(),
screenshot=self.fetcher.screenshot,
xpath_data=self.fetcher.xpath_data
)
# Something valid in get_itemprop_availability() by scraping metadata ?
if itemprop_availability.get('price') or itemprop_availability.get('availability'):
# Store for other usage

View File

@@ -64,7 +64,7 @@ def prepare_filter_prevew(datastore, watch_uuid, form_data):
# Only update vars that came in via the AJAX post
p = {k: v for k, v in form.data.items() if k in form_data.keys()}
tmp_watch.update(p)
blank_watch_no_filters = watch_model(datastore_path=datastore.datastore_path, __datastore=datastore.data)
blank_watch_no_filters = watch_model()
blank_watch_no_filters['url'] = tmp_watch.get('url')
latest_filename = next(reversed(tmp_watch.history))

View File

@@ -67,7 +67,7 @@ echo "-------------------- Running rest of tests in parallel -------------------
# REMOVE_REQUESTS_OLD_SCREENSHOTS disabled so that we can write a screenshot and send it in test_notifications.py without a real browser
FETCH_WORKERS=2 REMOVE_REQUESTS_OLD_SCREENSHOTS=false \
pytest tests/test_*.py \
-n 8 \
-n 18 \
--dist=load \
-vvv \
-s \

View File

@@ -9,14 +9,18 @@ from flask import (
)
from flask_babel import gettext
from ..model import App, Watch
from copy import deepcopy
from ..blueprint.rss import RSS_CONTENT_FORMAT_DEFAULT
from ..html_tools import TRANSLATE_WHITESPACE_TABLE
from ..model import App, Watch, USE_SYSTEM_DEFAULT_NOTIFICATION_FORMAT_FOR_WATCH
from copy import deepcopy, copy
from os import path, unlink
from threading import Lock
import json
import os
import re
import secrets
import sys
import threading
import time
import uuid as uuid_builder
from loguru import logger
@@ -31,6 +35,7 @@ except ImportError:
HAS_ORJSON = False
from ..processors import get_custom_watch_obj_for_processor
from ..processors.restock_diff import Restock
# Import the base class and helpers
from .file_saving_datastore import FileSavingDataStore, load_all_watches, save_watch_atomic, save_json_atomic
@@ -132,19 +137,6 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
)
logger.info(f"Tag: {uuid} {tag['title']}")
def _rehydrate_watches(self):
"""Rehydrate watch entities from stored data (converts dicts to Watch objects)."""
watch_count = len(self.__data.get('watching', {}))
if watch_count == 0:
return
logger.info(f"Rehydrating {watch_count} watches...")
watching_rehydrated = {}
for uuid, watch_dict in self.__data.get('watching', {}).items():
watching_rehydrated[uuid] = self.rehydrate_entity(uuid, watch_dict)
self.__data['watching'] = watching_rehydrated
logger.success(f"Rehydrated {watch_count} watches into Watch objects")
def _load_state(self):
"""
@@ -182,7 +174,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
self.json_store_path = os.path.join(self.datastore_path, "changedetection.json")
# Base definition for all watchers (deepcopy part of #569)
self.generic_definition = deepcopy(Watch.model(datastore_path=datastore_path, __datastore=self.__data, default={}))
self.generic_definition = deepcopy(Watch.model(datastore_path=datastore_path, default={}))
# Load build SHA if available (Docker deployments)
if path.isfile('changedetectionio/source.txt'):
@@ -226,29 +218,23 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
logger.critical(f"Legacy datastore detected at {self.datastore_path}/url-watches.json")
logger.critical("Migration will be triggered via update_26")
# Load the legacy datastore
# Load the legacy datastore to get its schema_version
from .legacy_loader import load_legacy_format
legacy_path = os.path.join(self.datastore_path, "url-watches.json")
legacy_data = load_legacy_format(legacy_path)
with open(legacy_path) as f:
self.__data = json.load(f)
if not legacy_data:
if not self.__data:
raise Exception("Failed to load legacy datastore from url-watches.json")
# Store the loaded data
self.__data = legacy_data
# CRITICAL: Rehydrate watches from dicts into Watch objects
# This ensures watches have their methods available during migration
self._rehydrate_watches()
# update_26 will save watches to individual files and create changedetection.json
# Next startup will load from new format normally
# update_26 will load the legacy data again and migrate to new format
# Only run updates AFTER the legacy schema version (e.g., if legacy is at 25, only run 26+)
self.run_updates()
else:
# Fresh install - create new datastore
logger.warning(f"No datastore found, creating new datastore at {self.datastore_path}")
logger.critical(f"No datastore found, creating new datastore at {self.datastore_path}")
# Set schema version to latest (no updates needed)
updates_available = self.get_updates_available()
@@ -322,7 +308,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
if entity.get('processor') != 'text_json_diff':
logger.trace(f"Loading Watch object '{watch_class.__module__}.{watch_class.__name__}' for UUID {uuid}")
entity = watch_class(datastore_path=self.datastore_path, __datastore=self.__data, default=entity)
entity = watch_class(datastore_path=self.datastore_path, default=entity)
return entity
# ============================================================================
@@ -653,7 +639,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# If the processor also has its own Watch implementation
watch_class = get_custom_watch_obj_for_processor(apply_extras.get('processor'))
new_watch = watch_class(datastore_path=self.datastore_path, __datastore=self.__data, url=url)
new_watch = watch_class(datastore_path=self.datastore_path, url=url)
new_uuid = new_watch.get('uuid')

View File

@@ -182,86 +182,3 @@ def test_check_text_history_view(client, live_server, measure_memory_usage, data
assert b'test-one' not in res.data
delete_all_watches(client)
def test_history_trim_global_only(client, live_server, measure_memory_usage, datastore_path):
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
uuid = None
limit = 3
for i in range(0, 10):
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(f"<html>test {i}</html>")
if not uuid:
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
if i ==8:
watch = live_server.app.config['DATASTORE'].data['watching'][uuid]
history_n = len(list(watch.history.keys()))
logger.debug(f"History length should be at limit {limit} and it is {history_n}")
assert history_n == limit
if i == 6:
res = client.post(
url_for("settings.settings_page"),
data={"application-history_snapshot_max_length": limit},
follow_redirects=True
)
# It will need to detect one more change to start trimming it, which is really at 'start of 7'
assert b'Settings updated' in res.data
delete_all_watches(client)
def test_history_trim_global_override_in_watch(client, live_server, measure_memory_usage, datastore_path):
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
uuid = None
limit = 3
res = client.post(
url_for("settings.settings_page"),
data={"application-history_snapshot_max_length": 10000},
follow_redirects=True
)
# It will need to detect one more change to start trimming it, which is really at 'start of 7'
assert b'Settings updated' in res.data
for i in range(0, 10):
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(f"<html>test {i}</html>")
if not uuid:
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": "", "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests",
"time_between_check_use_default": "y", "history_snapshot_max_length": str(limit)},
follow_redirects=True
)
assert b"Updated watch." in res.data
wait_for_all_checks(client)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
if i == 8:
watch = live_server.app.config['DATASTORE'].data['watching'][uuid]
history_n = len(list(watch.history.keys()))
logger.debug(f"History length should be at limit {limit} and it is {history_n}")
assert history_n == limit
if i == 6:
res = client.post(
url_for("settings.settings_page"),
data={"application-history_snapshot_max_length": limit},
follow_redirects=True
)
# It will need to detect one more change to start trimming it, which is really at 'start of 7'
assert b'Settings updated' in res.data
delete_all_watches(client)

View File

@@ -13,14 +13,7 @@ class TestDiffBuilder(unittest.TestCase):
def test_watch_get_suggested_from_diff_timestamp(self):
import uuid as uuid_builder
# Create minimal mock datastore for tests
mock_datastore = {
'settings': {
'application': {}
},
'watching': {}
}
watch = Watch.model(datastore_path='/tmp', __datastore=mock_datastore, default={})
watch = Watch.model(datastore_path='/tmp', default={})
watch.ensure_data_dir_exists()
@@ -56,7 +49,7 @@ class TestDiffBuilder(unittest.TestCase):
assert p == "109", "Correct when its the same time"
# new empty one
watch = Watch.model(datastore_path='/tmp', __datastore=mock_datastore, default={})
watch = Watch.model(datastore_path='/tmp', default={})
p = watch.get_from_version_based_on_last_viewed
assert p == None, "None when no history available"

View File

@@ -370,7 +370,8 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
except Exception as e:
import traceback
logger.error(f"Worker {worker_id} exception processing watch UUID: {uuid}")
logger.exception(f"Worker {worker_id} full exception details:")
logger.error(str(e))
logger.error(traceback.format_exc())
datastore.update_watch(uuid=uuid, update_obj={'last_error': "Exception: " + str(e)})
process_changedetection_results = False
@@ -434,9 +435,8 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
await send_content_changed_notification(uuid, notification_q, datastore)
except Exception as e:
logger.critical(f"Worker {worker_id} exception in process_changedetection_results")
logger.exception(f"Worker {worker_id} full exception details:")
logger.critical(str(e))
datastore.update_watch(uuid=uuid, update_obj={'last_error': str(e)})
# Always record attempt count
@@ -451,7 +451,6 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
logger.debug(f"UUID: {uuid} Page <title> is '{page_title}'")
datastore.update_watch(uuid=uuid, update_obj={'page_title': page_title})
except Exception as e:
logger.exception(f"Worker {worker_id} full exception details:")
logger.warning(f"UUID: {uuid} Exception when extracting <title> - {str(e)}")
# Record server header
@@ -481,15 +480,22 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
del update_handler
update_handler = None
# Force garbage collection
# Force aggressive memory cleanup after clearing
import gc
gc.collect()
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
except Exception as e:
import traceback
logger.error(traceback.format_exc())
logger.error(f"Worker {worker_id} unexpected error processing {uuid}: {e}")
logger.exception(f"Worker {worker_id} full exception details:")
logger.error(f"Worker {worker_id} traceback:", exc_info=True)
# Also update the watch with error information
if datastore and uuid in datastore.data['watching']:
datastore.update_watch(uuid=uuid, update_obj={'last_error': f"Worker error: {str(e)}"})
@@ -497,43 +503,49 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
finally:
# Always cleanup - this runs whether there was an exception or not
if uuid:
# Call quit() as backup (Puppeteer/Playwright have internal cleanup, but this acts as safety net)
try:
if update_handler and hasattr(update_handler, 'fetcher') and update_handler.fetcher:
await update_handler.fetcher.quit(watch=watch)
except Exception as e:
logger.error(f"Exception while cleaning/quit after calling browser: {e}")
logger.exception(f"Worker {worker_id} full exception details:")
try:
# Release UUID from processing (thread-safe)
worker_pool.release_uuid_from_processing(uuid, worker_id=worker_id)
# Send completion signal
if watch:
#logger.info(f"Worker {worker_id} sending completion signal for UUID {watch['uuid']}")
watch_check_update.send(watch_uuid=watch['uuid'])
# Clean up all memory references BEFORE garbage collection
# Explicitly clean up update_handler and all its references
if update_handler:
# Clear fetcher content using the proper method
if hasattr(update_handler, 'fetcher') and update_handler.fetcher:
update_handler.fetcher.clear_content()
# Clear processor references
if hasattr(update_handler, 'content_processor'):
update_handler.content_processor = None
del update_handler
update_handler = None
# Clear large content variables
# Clear local contents variable if it still exists
if 'contents' in locals():
del contents
# Force garbage collection after all references are cleared
# Note: We don't set watch = None here because:
# 1. watch is just a local reference to datastore.data['watching'][uuid]
# 2. Setting it to None doesn't affect the datastore
# 3. GC can't collect the object anyway (still referenced by datastore)
# 4. It would just cause confusion
# Force garbage collection after cleanup
import gc
gc.collect()
logger.debug(f"Worker {worker_id} completed watch {uuid} in {time.time()-fetch_start_time:.2f}s")
except Exception as cleanup_error:
logger.error(f"Worker {worker_id} error during cleanup: {cleanup_error}")
logger.exception(f"Worker {worker_id} full exception details:")
del(uuid)

View File

@@ -70,7 +70,7 @@ lxml >=4.8.0,!=5.2.0,!=5.2.1,<7
# XPath 2.0-3.1 support - 4.2.0 had issues, 4.1.5 stable
# Consider updating to latest stable version periodically
elementpath==5.1.1
elementpath==5.1.0
# For fast image comparison in screenshot change detection
# opencv-python-headless is OPTIONAL (excluded from requirements.txt)