Compare commits

...

8 Commits

Author SHA1 Message Date
dgtlmoon cdabdfeef1 UI - Ability to download a complete data package (.zip) of a watch 2026-02-15 10:39:14 +01:00
dgtlmoon 501aa61e19 Disable content compression of HTML/etc by default due to memory leak between flask_socketio and flask and flask_compress. 2026-02-15 08:19:29 +01:00
dgtlmoon b6d3d63372 Avoid reprocessing if the page was the same (#3867) 2026-02-14 21:24:28 +01:00
dependabot[bot] f4bb32f588 Update python-socketio requirement from ~=5.16.0 to ~=5.16.1 (#3869) 2026-02-13 17:43:43 +01:00
dgtlmoon bcd32852ca API - Remove flask_expects_json validation, this is covered entirely by OpenAPI, update OpenAPI spec. (#3871) 2026-02-13 16:30:59 +01:00
dependabot[bot] ad14807067 Update python-engineio requirement from ~=4.13.0 to ~=4.13.1 (#3868) 2026-02-13 11:24:50 +01:00
dgtlmoon 4bc01aca8d Price tracker - Use a more memory efficient price scraper, use subprocess on linux for cleaner memory management. (#3864) 2026-02-11 17:21:08 +01:00
dgtlmoon ef41dd304c Refactoring upgrade path (#3861) 2026-02-11 16:13:08 +01:00
42 changed files with 3075 additions and 656 deletions
@@ -103,7 +103,7 @@ jobs:
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_watch_model'
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_jinja2_security'
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_semver'
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_html_to_text'
# Basic pytest tests with ancillary services
basic-tests:
@@ -516,3 +516,142 @@ jobs:
exit 1
fi
docker rm sig-test
# Upgrade path test
upgrade-path-test:
runs-on: ubuntu-latest
needs: build
timeout-minutes: 25
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0 # Fetch all history and tags for upgrade testing
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v6
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Check upgrade works without error
run: |
echo "=== Testing upgrade path from 0.49.1 to ${{ github.ref_name }} (${{ github.sha }}) ==="
# Checkout old version and create datastore
git checkout 0.49.1
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install 'pyOpenSSL>=23.2.0'
echo "=== Running version 0.49.1 to create datastore ==="
python3 ./changedetection.py -C -d /tmp/data &
APP_PID=$!
# Wait for app to be ready
echo "Waiting for 0.49.1 to be ready..."
sleep 6
# Extract API key from datastore (0.49.1 uses url-watches.json)
API_KEY=$(jq -r '.settings.application.api_access_token // empty' /tmp/data/url-watches.json)
echo "API Key: ${API_KEY:0:8}..."
# Create a watch with tag "github-group-test" via API
echo "Creating test watch with tag via API..."
curl -X POST "http://127.0.0.1:5000/api/v1/watch" \
-H "x-api-key: ${API_KEY}" \
-H "Content-Type: application/json" \
--show-error --fail \
--retry 6 --retry-delay 1 --retry-connrefused \
-d '{
"url": "https://example.com/upgrade-test",
"tag": "github-group-test"
}'
echo "✓ Created watch with tag 'github-group-test'"
# Create a specific test URL watch
echo "Creating test URL watch via API..."
curl -X POST "http://127.0.0.1:5000/api/v1/watch" \
-H "x-api-key: ${API_KEY}" \
-H "Content-Type: application/json" \
--show-error --fail \
-d '{
"url": "http://localhost/test.txt"
}'
echo "✓ Created watch for 'http://localhost/test.txt' in version 0.49.1"
# Stop the old version gracefully
kill $APP_PID
wait $APP_PID || true
echo "✓ Version 0.49.1 stopped"
# Upgrade to current version (use commit SHA since we're in detached HEAD)
echo "Upgrading to commit ${{ github.sha }}"
git checkout ${{ github.sha }}
pip install -r requirements.txt
echo "=== Running current version (commit ${{ github.sha }}) with old datastore (testing mode) ==="
TESTING_SHUTDOWN_AFTER_DATASTORE_LOAD=1 python3 ./changedetection.py -d /tmp/data > /tmp/upgrade-test.log 2>&1
echo "=== Upgrade test output ==="
cat /tmp/upgrade-test.log
echo "✓ Datastore upgraded successfully"
# Now start the current version normally to verify the tag survived
echo "=== Starting current version to verify tag exists after upgrade ==="
timeout 20 python3 ./changedetection.py -d /tmp/data > /tmp/ui-test.log 2>&1 &
APP_PID=$!
# Wait for app to be ready and fetch UI
echo "Waiting for current version to be ready..."
sleep 5
curl --retry 6 --retry-delay 1 --retry-connrefused --silent http://127.0.0.1:5000 > /tmp/ui-output.html
# Verify tag exists in UI
if grep -q "github-group-test" /tmp/ui-output.html; then
echo "✓ Tag 'github-group-test' found in UI after upgrade"
else
echo "ERROR: Tag 'github-group-test' not found in UI after upgrade"
echo "=== UI Output ==="
cat /tmp/ui-output.html
echo "=== App Log ==="
cat /tmp/ui-test.log
kill $APP_PID || true
exit 1
fi
# Verify test URL exists in UI
if grep -q "http://localhost/test.txt" /tmp/ui-output.html; then
echo "✓ Watch URL 'http://localhost/test.txt' found in UI after upgrade"
else
echo "ERROR: Watch URL 'http://localhost/test.txt' not found in UI after upgrade"
echo "=== UI Output ==="
cat /tmp/ui-output.html
echo "=== App Log ==="
cat /tmp/ui-test.log
kill $APP_PID || true
exit 1
fi
# Cleanup
kill $APP_PID || true
wait $APP_PID || true
echo ""
echo "✓✓✓ Upgrade test passed: 0.49.1 → ${{ github.ref_name }} ✓✓✓"
echo " - Commit: ${{ github.sha }}"
echo " - Datastore migrated successfully"
echo " - Tag 'github-group-test' survived upgrade"
echo " - Watch URL 'http://localhost/test.txt' survived upgrade"
echo "✓ Upgrade test passed: 0.49.1 → ${{ github.ref_name }}"
- name: Upload upgrade test logs
if: always()
uses: actions/upload-artifact@v6
with:
name: upgrade-test-logs-py${{ env.PYTHON_VERSION }}
path: /tmp/upgrade-test.log
+9 -1
@@ -371,7 +371,15 @@ def main():
# Don't start if the JSON DB looks corrupt
logger.critical(f"ERROR: JSON DB or Proxy List JSON at '{app_config['datastore_path']}' appears to be corrupt, aborting.")
logger.critical(str(e))
return
sys.exit(1)
# Testing mode: Exit cleanly after datastore initialization (for CI/CD upgrade tests)
if os.environ.get('TESTING_SHUTDOWN_AFTER_DATASTORE_LOAD'):
logger.success(f"TESTING MODE: Datastore loaded successfully from {app_config['datastore_path']}")
logger.success(f"TESTING MODE: Schema version: {datastore.data['settings']['application'].get('schema_version', 'unknown')}")
logger.success(f"TESTING MODE: Loaded {len(datastore.data['watching'])} watches")
logger.success("TESTING MODE: Exiting cleanly (TESTING_SHUTDOWN_AFTER_DATASTORE_LOAD is set)")
sys.exit(0)
# Apply all_paused setting if specified via CLI
if all_paused is not None:
+22 -7
@@ -2,7 +2,7 @@ from changedetectionio.strtobool import strtobool
from flask_restful import abort, Resource
from flask import request
from functools import wraps
from . import auth, validate_openapi_request, schema_create_watch
from . import auth, validate_openapi_request
from ..validate_url import is_safe_valid_url
import json
@@ -33,9 +33,25 @@ def convert_query_param_to_type(value, schema_property):
Returns:
Converted value in the appropriate type
Supports both OpenAPI 3.1 formats:
- type: [string, 'null'] (array format)
- anyOf: [{type: string}, {type: null}] (anyOf format)
"""
# Handle anyOf schemas (extract the first type)
if 'anyOf' in schema_property:
prop_type = schema_property.get('type')
# Handle OpenAPI 3.1 type arrays: type: [string, 'null']
if isinstance(prop_type, list):
# Use the first non-null type from the array
for t in prop_type:
if t != 'null':
prop_type = t
break
else:
prop_type = None
# Handle anyOf schemas (older format)
elif 'anyOf' in schema_property:
# Use the first non-null type from anyOf
for option in schema_property['anyOf']:
if option.get('type') and option.get('type') != 'null':
@@ -43,8 +59,6 @@ def convert_query_param_to_type(value, schema_property):
break
else:
prop_type = None
else:
prop_type = schema_property.get('type')
# Handle array type (e.g., notification_urls)
if prop_type == 'array':
@@ -89,7 +103,7 @@ class Import(Resource):
@validate_openapi_request('importWatches')
def post(self):
"""Import a list of watched URLs with optional watch configuration."""
from . import get_watch_schema_properties
# Special parameters that are NOT watch configuration
special_params = {'tag', 'tag_uuids', 'dedupe', 'proxy'}
@@ -115,7 +129,8 @@ class Import(Resource):
tag_uuids = tag_uuids.split(',')
# Extract ALL other query parameters as watch configuration
schema_properties = schema_create_watch.get('properties', {})
# Get schema from OpenAPI spec (replaces old schema_create_watch)
schema_properties = get_watch_schema_properties()
for param_name, param_value in request.args.items():
# Skip special parameters
if param_name in special_params:
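For illustration, the two nullable-field shapes this converter distinguishes could look like the following once the spec YAML is parsed into Python (the property shown is hypothetical, not taken from api-spec.yaml):

# Hypothetical schema fragments for a nullable string property
schema_type_array = {"type": ["string", "null"]}                       # OpenAPI 3.1 type-array form
schema_anyof = {"anyOf": [{"type": "string"}, {"type": "null"}]}       # anyOf form
# In both cases the first non-null type wins, so a query parameter such as
# ?title=Example is kept as the string "Example" rather than being coerced.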
-5
@@ -1,8 +1,6 @@
from flask_expects_json import expects_json
from flask_restful import Resource, abort
from flask import request
from . import auth, validate_openapi_request
from . import schema_create_notification_urls, schema_delete_notification_urls
class Notifications(Resource):
def __init__(self, **kwargs):
@@ -22,7 +20,6 @@ class Notifications(Resource):
@auth.check_token
@validate_openapi_request('addNotifications')
@expects_json(schema_create_notification_urls)
def post(self):
"""Create Notification URLs."""
@@ -50,7 +47,6 @@ class Notifications(Resource):
@auth.check_token
@validate_openapi_request('replaceNotifications')
@expects_json(schema_create_notification_urls)
def put(self):
"""Replace Notification URLs."""
json_data = request.get_json()
@@ -73,7 +69,6 @@ class Notifications(Resource):
@auth.check_token
@validate_openapi_request('deleteNotifications')
@expects_json(schema_delete_notification_urls)
def delete(self):
"""Delete Notification URLs."""
+60 -9
@@ -1,6 +1,5 @@
from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_pool
from flask_expects_json import expects_json
from flask_restful import abort, Resource
from loguru import logger
@@ -8,8 +7,7 @@ import threading
from flask import request
from . import auth
# Import schemas from __init__.py
from . import schema_tag, schema_create_tag, schema_update_tag, validate_openapi_request
from . import validate_openapi_request
class Tag(Resource):
@@ -69,7 +67,25 @@ class Tag(Resource):
tag.commit()
return "OK", 200
return tag
# Filter out Watch-specific runtime fields that don't apply to Tags (yet)
# TODO: Future enhancement - aggregate these values from all Watches that have this tag:
# - check_count: sum of all watches' check_count
# - last_checked: most recent last_checked from all watches
# - last_changed: most recent last_changed from all watches
# - consecutive_filter_failures: count of watches with failures
# - etc.
# These come from watch_base inheritance but currently have no meaningful value for Tags
watch_only_fields = {
'browser_steps_last_error_step', 'check_count', 'consecutive_filter_failures',
'content-type', 'fetch_time', 'last_changed', 'last_checked', 'last_error',
'last_notification_error', 'last_viewed', 'notification_alert_count',
'page_title', 'previous_md5', 'remote_server_reply'
}
# Create clean tag dict without Watch-specific fields
clean_tag = {k: v for k, v in tag.items() if k not in watch_only_fields}
return clean_tag
@auth.check_token
@validate_openapi_request('deleteTag')
@@ -102,38 +118,73 @@ class Tag(Resource):
@auth.check_token
@validate_openapi_request('updateTag')
@expects_json(schema_update_tag)
def put(self, uuid):
"""Update tag information."""
tag = self.datastore.data['settings']['application']['tags'].get(uuid)
if not tag:
abort(404, message='No tag exists with the UUID of {}'.format(uuid))
# Make a mutable copy of request.json for modification
json_data = dict(request.json)
# Validate notification_urls if provided
if 'notification_urls' in request.json:
if 'notification_urls' in json_data:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
notification_urls = request.json.get('notification_urls', [])
notification_urls = json_data.get('notification_urls', [])
validate_notification_urls(notification_urls)
except ValidationError as e:
return str(e), 400
tag.update(request.json)
# Filter out readOnly fields (extracted from OpenAPI spec Tag schema)
# These are system-managed fields that should never be user-settable
from . import get_readonly_tag_fields
readonly_fields = get_readonly_tag_fields()
# Tag model inherits from watch_base but has no @property attributes of its own
# So we only need to filter readOnly fields
for field in readonly_fields:
json_data.pop(field, None)
# Validate remaining fields - reject truly unknown fields
# Get valid fields from Tag schema
from . import get_tag_schema_properties
valid_fields = set(get_tag_schema_properties().keys())
# Check for unknown fields
unknown_fields = set(json_data.keys()) - valid_fields
if unknown_fields:
return f"Unknown field(s): {', '.join(sorted(unknown_fields))}", 400
tag.update(json_data)
tag.commit()
# Clear checksums for all watches using this tag to force reprocessing
# Tag changes affect inherited configuration
cleared_count = self.datastore.clear_checksums_for_tag(uuid)
logger.info(f"Tag {uuid} updated via API, cleared {cleared_count} watch checksums")
return "OK", 200
@auth.check_token
@validate_openapi_request('createTag')
# Only cares for {'title': 'xxxx'}
def post(self):
"""Create a single tag/group."""
json_data = request.get_json()
title = json_data.get("title",'').strip()
# Validate that only valid fields are provided
# Get valid fields from Tag schema
from . import get_tag_schema_properties
valid_fields = set(get_tag_schema_properties().keys())
# Check for unknown fields
unknown_fields = set(json_data.keys()) - valid_fields
if unknown_fields:
return f"Unknown field(s): {', '.join(sorted(unknown_fields))}", 400
new_uuid = self.datastore.add_tag(title=title)
if new_uuid:
+30 -5
@@ -8,13 +8,11 @@ from . import auth
from changedetectionio import queuedWatchMetaData, strtobool
from changedetectionio import worker_pool
from flask import request, make_response, send_from_directory
from flask_expects_json import expects_json
from flask_restful import abort, Resource
from loguru import logger
import copy
# Import schemas from __init__.py
from . import schema, schema_create_watch, schema_update_watch, validate_openapi_request
from . import validate_openapi_request, get_readonly_watch_fields
from ..notification import valid_notification_formats
from ..notification.handler import newline_re
@@ -121,7 +119,6 @@ class Watch(Resource):
@auth.check_token
@validate_openapi_request('updateWatch')
@expects_json(schema_update_watch)
def put(self, uuid):
"""Update watch information."""
watch = self.datastore.data['watching'].get(uuid)
@@ -175,6 +172,35 @@ class Watch(Resource):
# Extract and remove processor config fields from json_data
processor_config_data = processors.extract_processor_config_from_form_data(json_data)
# Filter out readOnly fields (extracted from OpenAPI spec Watch schema)
# These are system-managed fields that should never be user-settable
readonly_fields = get_readonly_watch_fields()
# Also filter out @property attributes (computed/derived values from the model)
# These are not stored and should be ignored in PUT requests
from changedetectionio.model.Watch import model as WatchModel
property_fields = WatchModel.get_property_names()
# Combine both sets of fields to ignore
fields_to_ignore = readonly_fields | property_fields
# Remove all ignored fields from update data
for field in fields_to_ignore:
json_data.pop(field, None)
# Validate remaining fields - reject truly unknown fields
# Get valid fields from WatchBase schema
from . import get_watch_schema_properties
valid_fields = set(get_watch_schema_properties().keys())
# Also allow last_viewed (explicitly defined in UpdateWatch schema)
valid_fields.add('last_viewed')
# Check for unknown fields
unknown_fields = set(json_data.keys()) - valid_fields
if unknown_fields:
return f"Unknown field(s): {', '.join(sorted(unknown_fields))}", 400
# Update watch with regular (non-processor-config) fields
watch.update(json_data)
watch.commit()
@@ -393,7 +419,6 @@ class CreateWatch(Resource):
@auth.check_token
@validate_openapi_request('createWatch')
@expects_json(schema_create_watch)
def post(self):
"""Create a single watch."""
+83 -37
@@ -1,41 +1,6 @@
import copy
import functools
from flask import request, abort
from loguru import logger
from . import api_schema
from ..model import watch_base
# Build a JSON Schema atleast partially based on our Watch model
watch_base_config = watch_base()
schema = api_schema.build_watch_json_schema(watch_base_config)
schema_create_watch = copy.deepcopy(schema)
schema_create_watch['required'] = ['url']
del schema_create_watch['properties']['last_viewed']
# Allow processor_config_* fields (handled separately in endpoint)
schema_create_watch['patternProperties'] = {
'^processor_config_': {'type': ['string', 'number', 'boolean', 'object', 'array', 'null']}
}
schema_update_watch = copy.deepcopy(schema)
schema_update_watch['additionalProperties'] = False
# Allow processor_config_* fields (handled separately in endpoint)
schema_update_watch['patternProperties'] = {
'^processor_config_': {'type': ['string', 'number', 'boolean', 'object', 'array', 'null']}
}
# Tag schema is also based on watch_base since Tag inherits from it
schema_tag = copy.deepcopy(schema)
schema_create_tag = copy.deepcopy(schema_tag)
schema_create_tag['required'] = ['title']
schema_update_tag = copy.deepcopy(schema_tag)
schema_update_tag['additionalProperties'] = False
schema_notification_urls = copy.deepcopy(schema)
schema_create_notification_urls = copy.deepcopy(schema_notification_urls)
schema_create_notification_urls['required'] = ['notification_urls']
schema_delete_notification_urls = copy.deepcopy(schema_notification_urls)
schema_delete_notification_urls['required'] = ['notification_urls']
@functools.cache
def get_openapi_spec():
@@ -54,6 +19,79 @@ def get_openapi_spec():
_openapi_spec = OpenAPI.from_dict(spec_dict)
return _openapi_spec
@functools.cache
def get_openapi_schema_dict():
"""
Get the raw OpenAPI spec dictionary for schema access.
Used by Import endpoint to validate and convert query parameters.
Returns the YAML dict directly (not the OpenAPI object).
"""
import os
import yaml
spec_path = os.path.join(os.path.dirname(__file__), '../../docs/api-spec.yaml')
if not os.path.exists(spec_path):
spec_path = os.path.join(os.path.dirname(__file__), '../docs/api-spec.yaml')
with open(spec_path, 'r', encoding='utf-8') as f:
return yaml.safe_load(f)
@functools.cache
def _resolve_schema_properties(schema_name):
"""
Generic helper to resolve schema properties, including allOf inheritance.
Args:
schema_name: Name of the schema (e.g., 'WatchBase', 'Watch', 'Tag')
Returns:
dict: All properties including inherited ones from $ref schemas
"""
spec_dict = get_openapi_schema_dict()
schema = spec_dict['components']['schemas'].get(schema_name, {})
properties = {}
# Handle allOf (schema inheritance)
if 'allOf' in schema:
for item in schema['allOf']:
# Resolve $ref to parent schema
if '$ref' in item:
ref_path = item['$ref'].split('/')[-1]
ref_schema = spec_dict['components']['schemas'].get(ref_path, {})
properties.update(ref_schema.get('properties', {}))
# Add schema-specific properties
if 'properties' in item:
properties.update(item['properties'])
else:
# Direct properties (no inheritance)
properties = schema.get('properties', {})
return properties
@functools.cache
def get_watch_schema_properties():
"""
Extract watch schema properties from OpenAPI spec for Import endpoint.
Returns WatchBase properties (all writable Watch fields).
"""
return _resolve_schema_properties('WatchBase')
# Import readonly field utilities from shared module (avoids circular dependencies with model layer)
from changedetectionio.model.schema_utils import get_readonly_watch_fields, get_readonly_tag_fields
@functools.cache
def get_tag_schema_properties():
"""
Extract Tag schema properties from OpenAPI spec.
Returns WatchBase properties + Tag-specific properties (overrides_watch).
"""
return _resolve_schema_properties('Tag')
def validate_openapi_request(operation_id):
"""Decorator to validate incoming requests against OpenAPI spec."""
def decorator(f):
@@ -72,8 +110,16 @@ def validate_openapi_request(operation_id):
if result.errors:
error_details = []
for error in result.errors:
error_details.append(str(error))
raise BadRequest(f"OpenAPI validation failed: {error_details}")
# Extract detailed schema errors from __cause__
if hasattr(error, '__cause__') and hasattr(error.__cause__, 'schema_errors'):
for schema_error in error.__cause__.schema_errors:
field = '.'.join(str(p) for p in schema_error.path) if schema_error.path else 'body'
msg = schema_error.message if hasattr(schema_error, 'message') else str(schema_error)
error_details.append(f"{field}: {msg}")
else:
error_details.append(str(error))
logger.error(f"API Call - Validation failed: {'; '.join(error_details)}")
raise BadRequest(f"Validation failed: {'; '.join(error_details)}")
except BadRequest:
# Re-raise BadRequest exceptions (validation failures)
raise
-162
@@ -1,162 +0,0 @@
# Responsible for building the storage dict into a set of rules ("JSON Schema") acceptable via the API
# Probably other ways to solve this when the backend switches to some ORM
from changedetectionio.notification import valid_notification_formats
def build_time_between_check_json_schema():
# Setup time between check schema
schema_properties_time_between_check = {
"type": "object",
"additionalProperties": False,
"properties": {}
}
for p in ['weeks', 'days', 'hours', 'minutes', 'seconds']:
schema_properties_time_between_check['properties'][p] = {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
]
}
return schema_properties_time_between_check
def build_watch_json_schema(d):
# Base JSON schema
schema = {
'type': 'object',
'properties': {},
}
for k, v in d.items():
# @todo 'integer' is not covered here because its almost always for internal usage
if isinstance(v, type(None)):
schema['properties'][k] = {
"anyOf": [
{"type": "null"},
]
}
elif isinstance(v, list):
schema['properties'][k] = {
"anyOf": [
{"type": "array",
# Always is an array of strings, like text or regex or something
"items": {
"type": "string",
"maxLength": 5000
}
},
]
}
elif isinstance(v, bool):
schema['properties'][k] = {
"anyOf": [
{"type": "boolean"},
]
}
elif isinstance(v, str):
schema['properties'][k] = {
"anyOf": [
{"type": "string",
"maxLength": 5000},
]
}
# Can also be a string (or None by default above)
for v in ['body',
'notification_body',
'notification_format',
'notification_title',
'proxy',
'tag',
'title',
'webdriver_js_execute_code'
]:
schema['properties'][v]['anyOf'].append({'type': 'string', "maxLength": 5000})
for v in ['last_viewed']:
schema['properties'][v] = {
"type": "integer",
"description": "Unix timestamp in seconds of the last time the watch was viewed.",
"minimum": 0
}
# None or Boolean
schema['properties']['track_ldjson_price_data']['anyOf'].append({'type': 'boolean'})
schema['properties']['method'] = {"type": "string",
"enum": ["GET", "POST", "DELETE", "PUT"]
}
schema['properties']['fetch_backend']['anyOf'].append({"type": "string",
"enum": ["html_requests", "html_webdriver"]
})
schema['properties']['processor'] = {"anyOf": [
{"type": "string", "enum": ["restock_diff", "text_json_diff"]},
{"type": "null"}
]}
# All headers must be key/value type dict
schema['properties']['headers'] = {
"type": "object",
"patternProperties": {
# Should always be a string:string type value
".*": {"type": "string"},
}
}
schema['properties']['notification_format'] = {'type': 'string',
'enum': list(valid_notification_formats.keys())
}
# Stuff that shouldn't be available but is just state-storage
for v in ['previous_md5', 'last_error', 'has_ldjson_price_data', 'previous_md5_before_filters', 'uuid']:
del schema['properties'][v]
schema['properties']['webdriver_delay']['anyOf'].append({'type': 'integer'})
schema['properties']['time_between_check'] = build_time_between_check_json_schema()
schema['properties']['time_between_check_use_default'] = {
"type": "boolean",
"default": True,
"description": "Whether to use global settings for time between checks - defaults to true if not set"
}
schema['properties']['browser_steps'] = {
"anyOf": [
{
"type": "array",
"items": {
"type": "object",
"properties": {
"operation": {
"type": ["string", "null"],
"maxLength": 5000 # Allows null and any string up to 5000 chars (including "")
},
"selector": {
"type": ["string", "null"],
"maxLength": 5000
},
"optional_value": {
"type": ["string", "null"],
"maxLength": 5000
}
},
"required": ["operation", "selector", "optional_value"],
"additionalProperties": False # No extra keys allowed
}
},
{"type": "null"}, # Allows null for `browser_steps`
{"type": "array", "maxItems": 0} # Allows empty array []
]
}
# headers ?
return schema
@@ -83,6 +83,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
datastore.data['settings']['requests'].update(form.data['requests'])
datastore.commit()
# Clear all checksums to force reprocessing with new settings
# Global settings can affect watch behavior (filters, rendering, etc.)
datastore.clear_all_last_checksums()
# Adjust worker count if it changed
if new_worker_count != old_worker_count:
from changedetectionio import worker_pool
@@ -244,6 +244,12 @@ def construct_blueprint(datastore: ChangeDetectionStore):
tag.update(form.data)
tag['processor'] = 'restock_diff'
tag.commit()
# Clear checksums for all watches using this tag to force reprocessing
# Tag changes affect inherited configuration
cleared_count = datastore.clear_checksums_for_tag(uuid)
logger.info(f"Tag {uuid} updated, cleared {cleared_count} watch checksums")
flash(gettext("Updated"))
return redirect(url_for('tags.tags_overview_page'))
+53
@@ -354,6 +354,59 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Return a 500 error
abort(500)
@edit_blueprint.route("/edit/<string:uuid>/get-data-package", methods=['GET'])
@login_optionally_required
def watch_get_data_package(uuid):
"""Download all data for a single watch as a zip file"""
from io import BytesIO
from flask import send_file
import zipfile
from pathlib import Path
import datetime
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
watch = datastore.data['watching'].get(uuid)
if not watch:
abort(404)
# Create zip in memory
memory_file = BytesIO()
with zipfile.ZipFile(memory_file, 'w',
compression=zipfile.ZIP_DEFLATED,
compresslevel=8) as zipObj:
# Add the watch's JSON file if it exists
watch_json_path = os.path.join(watch.data_dir, 'watch.json')
if os.path.isfile(watch_json_path):
zipObj.write(watch_json_path,
arcname=os.path.join(uuid, 'watch.json'),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
# Add all files in the watch data directory
if os.path.isdir(watch.data_dir):
for f in Path(watch.data_dir).glob('*'):
if f.is_file() and f.name != 'watch.json': # Skip watch.json since we already added it
zipObj.write(f,
arcname=os.path.join(uuid, f.name),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
# Seek to beginning of file
memory_file.seek(0)
# Generate filename with timestamp
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
filename = f"watch-data-{uuid[:8]}-{timestamp}.zip"
return send_file(memory_file,
as_attachment=True,
download_name=filename,
mimetype='application/zip')
# Ajax callback
@edit_blueprint.route("/edit/<string:uuid>/preview-rendered", methods=['POST'])
@login_optionally_required
@@ -488,6 +488,7 @@ Math: {{ 1 + 1 }}") }}
{% if watch.history_n %}
<p>
<a href="{{url_for('ui.ui_edit.watch_get_latest_html', uuid=uuid)}}" class="pure-button button-small">{{ _('Download latest HTML snapshot') }}</a>
<a href="{{url_for('ui.ui_edit.watch_get_data_package', uuid=uuid)}}" class="pure-button button-small">{{ _('Export watch data') }}</a>
</p>
{% endif %}
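As a rough usage sketch (not part of the diff), the new export endpoint could be fetched with Python's requests library; the base URL, authentication, and any URL prefix are assumptions here, and the real path is whatever url_for('ui.ui_edit.watch_get_data_package', uuid=...) resolves to:

import requests

BASE = "http://127.0.0.1:5000"   # assumed instance URL; adjust for your deployment and login setup
watch_uuid = "first"             # the endpoint accepts 'first' as a shortcut for the first watch

resp = requests.get(f"{BASE}/edit/{watch_uuid}/get-data-package", timeout=30)
resp.raise_for_status()
with open("watch-data.zip", "wb") as f:
    f.write(resp.content)        # zip contains watch.json plus the files in the watch's data directory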
+10 -6
@@ -70,13 +70,17 @@ socketio_server = None
# Enable CORS, especially useful for the Chrome extension to operate from anywhere
CORS(app)
# Super handy for compressing large BrowserSteps responses and others
# Flask-Compress handles HTTP compression, Socket.IO compression disabled to prevent memory leak
# Flask-Compress handles HTTP compression, Socket.IO compression disabled to prevent memory leak.
# There's also a bug between flask compress and socketio that causes some kind of slow memory leak
# It's better to use compression on your reverse proxy (nginx etc) instead.
if strtobool(os.getenv("FLASK_ENABLE_COMPRESSION")):
app.config['COMPRESS_MIN_SIZE'] = 2096
app.config['COMPRESS_MIMETYPES'] = ['text/html', 'text/css', 'text/javascript', 'application/json', 'application/javascript', 'image/svg+xml']
# Use gzip only - smaller memory footprint than zstd/brotli (4-8KB vs 200-500KB contexts)
app.config['COMPRESS_ALGORITHM'] = ['gzip']
compress = FlaskCompress()
app.config['COMPRESS_MIN_SIZE'] = 2096
app.config['COMPRESS_MIMETYPES'] = ['text/html', 'text/css', 'text/javascript', 'application/json', 'application/javascript', 'image/svg+xml']
# Use gzip only - smaller memory footprint than zstd/brotli (4-8KB vs 200-500KB contexts)
app.config['COMPRESS_ALGORITHM'] = ['gzip']
compress.init_app(app)
app.config['TEMPLATES_AUTO_RELOAD'] = False
-2
@@ -20,11 +20,9 @@ See: Watch.py model docstring for full Pydantic architecture explanation
See: processors/restock_diff/processor.py:184-192 for current manual implementation
"""
import os
from changedetectionio.model import watch_base
from changedetectionio.model.persistence import EntityPersistenceMixin
class model(EntityPersistenceMixin, watch_base):
"""
Tag domain model - groups watches and can override their settings.
+10 -5
@@ -335,7 +335,6 @@ class model(EntityPersistenceMixin, watch_base):
'last_notification_error': False,
'last_viewed': 0,
'previous_md5': False,
'previous_md5_before_filters': False,
'remote_server_reply': None,
'track_ldjson_price_data': None
})
@@ -386,10 +385,16 @@ class model(EntityPersistenceMixin, watch_base):
@property
def is_pdf(self):
# content_type field is set in the future
# https://github.com/dgtlmoon/changedetection.io/issues/1392
# Not sure the best logic here
return self.get('url', '').lower().endswith('.pdf') or 'pdf' in self.get('content_type', '').lower()
url = str(self.get("url") or "").lower()
content_type = str(self.get("content-type") or "").lower()
if content_type in ("none", "null", ""):
content_type = ""
return (
url.endswith(".pdf")
or content_type.split(";")[0].strip() == "application/pdf"
)
@property
def label(self):
+150 -6
@@ -2,7 +2,7 @@ import os
import uuid
from changedetectionio import strtobool
from .persistence import EntityPersistenceMixin
from .persistence import EntityPersistenceMixin, _determine_entity_type
__all__ = ['EntityPersistenceMixin', 'watch_base']
@@ -26,6 +26,7 @@ class watch_base(dict):
- Configuration override chain resolution (Watch → Tag → Global)
- Immutability options
- Better testing
- USE https://docs.pydantic.dev/latest/integrations/datamodel_code_generator TO BUILD THE MODEL FROM THE API-SPEC!!!
CHAIN RESOLUTION ARCHITECTURE:
The dream is a 3-level override hierarchy:
@@ -128,7 +129,6 @@ class watch_base(dict):
fetch_time (float): Duration of last fetch in seconds
consecutive_filter_failures (int): Counter for consecutive filter match failures
previous_md5 (str|bool): MD5 hash of previous content
previous_md5_before_filters (str|bool): MD5 hash before filters applied
history_snapshot_max_length (int|None): Max history snapshots to keep (None = use global)
Conditions:
@@ -165,6 +165,10 @@ class watch_base(dict):
if kw.get('datastore_path'):
del kw['datastore_path']
# IMPORTANT: Don't initialize __watch_was_edited yet!
# We'll initialize it AFTER the initial update() call below
# This prevents marking the watch as edited during initialization
self.update({
# Custom notification content
# Re #110, so then if this is set to None, we know to use the default value instead
@@ -173,7 +177,7 @@ class watch_base(dict):
'body': None,
'browser_steps': [],
'browser_steps_last_error_step': None,
'conditions' : {},
'conditions' : [],
'conditions_match_logic': CONDITIONS_MATCH_LOGIC_DEFAULT,
'check_count': 0,
'check_unique_lines': False, # On change-detected, compare against all history if its something new
@@ -210,7 +214,6 @@ class watch_base(dict):
'page_title': None, # <title> from the page
'paused': False,
'previous_md5': False,
'previous_md5_before_filters': False, # Used for skipping changedetection entirely
'processor': 'text_json_diff', # could be restock_diff or others from .processors
'price_change_threshold_percent': None,
'proxy': None, # Preferred proxy connection
@@ -296,9 +299,152 @@ class watch_base(dict):
super(watch_base, self).__init__(*arg, **kw)
# Check if we're being initialized from an existing watch object
# that has was_edited=True, so we can preserve the flag
preserve_edited_flag = False
if self.get('default'):
# When creating a new watch object from an existing one (e.g., changing processor),
# preserve the was_edited flag if it was True
default_watch = self.get('default')
if hasattr(default_watch, 'was_edited') and default_watch.was_edited:
preserve_edited_flag = True
del self['default']
# NOW initialize the edited flag after all initial setup is complete
# This ensures initialization doesn't trigger the edited flag
# But preserve it if the source watch had it set to True
self.__watch_was_edited = preserve_edited_flag
def _mark_field_as_edited(self, key):
"""
Helper to mark a field as edited if it's writable.
Internal method used by __setitem__, update(), pop(), etc.
"""
# Don't track edits during initial load or if already edited
if not hasattr(self, '_watch_base__watch_was_edited'):
return
if self.__watch_was_edited:
return # Already marked as edited
# Import from shared schema utilities (no circular dependency)
from .schema_utils import get_readonly_watch_fields
readonly_fields = get_readonly_watch_fields()
# Additional system-managed fields not in OpenAPI spec (yet)
# These are set by processors/workers and should not trigger edited flag
additional_system_fields = {
'last_check_status', # Set by processors
'restock', # Set by restock processor
'last_viewed', # Set by mark_all_viewed endpoint
}
# Only mark as edited if this is a user-writable field
if key not in readonly_fields and key not in additional_system_fields:
self.__watch_was_edited = True
def __setitem__(self, key, value):
"""
Override dict.__setitem__ to track when writable watch fields are modified.
This enables skipping reprocessing when:
1. HTML content is unchanged (checksumFromPreviousCheckWasTheSame)
2. AND watch configuration was not edited
Only sets the edited flag when field is NOT in readonly_fields (from OpenAPI spec).
"""
# Set the value first (always)
super().__setitem__(key, value)
# Mark as edited if writable field
self._mark_field_as_edited(key)
def __delitem__(self, key):
"""Override dict.__delitem__ to track deletions of writable fields."""
super().__delitem__(key)
self._mark_field_as_edited(key)
def update(self, *args, **kwargs):
"""Override dict.update() to track modifications to writable fields."""
# Call parent update first
super().update(*args, **kwargs)
# Mark as edited for any writable fields that were updated
# Handle both update(dict) and update(key=value) forms
if args:
for key in args[0].keys():
self._mark_field_as_edited(key)
for key in kwargs.keys():
self._mark_field_as_edited(key)
def pop(self, key, *args):
"""Override dict.pop() to track removal of writable fields."""
result = super().pop(key, *args)
self._mark_field_as_edited(key)
return result
def setdefault(self, key, default=None):
"""Override dict.setdefault() to track modifications to writable fields."""
# Only marks as edited if key didn't exist (i.e., a new value was set)
existed = key in self
result = super().setdefault(key, default)
if not existed:
self._mark_field_as_edited(key)
return result
@property
def was_edited(self):
"""
Check if watch configuration was edited since last processing.
Returns:
bool: True if writable fields were modified, False otherwise
"""
return getattr(self, '_watch_base__watch_was_edited', False)
def reset_watch_edited_flag(self):
"""
Reset the watch edited flag after successful processing.
Call this after processing completes to allow future content-only change detection.
"""
self.__watch_was_edited = False
@classmethod
def get_property_names(cls):
"""
Get all @property attribute names from this model class using introspection.
This discovers computed/derived properties that are not stored in the datastore.
These properties should be filtered out during PUT/POST requests.
Returns:
frozenset: Immutable set of @property attribute names from the model class
"""
import functools
# Create a cached version if it doesn't exist
if not hasattr(cls, '_cached_get_property_names'):
@functools.cache
def _get_props():
properties = set()
# Use introspection to find all @property attributes
for name in dir(cls):
# Skip private/magic attributes
if name.startswith('_'):
continue
try:
attr = getattr(cls, name)
# Check if it's a property descriptor
if isinstance(attr, property):
properties.add(name)
except (AttributeError, TypeError):
continue
return frozenset(properties)
cls._cached_get_property_names = _get_props
return cls._cached_get_property_names()
def __deepcopy__(self, memo):
"""
Custom deepcopy for all watch_base subclasses (Watch, Tag, etc.).
@@ -511,10 +657,8 @@ class watch_base(dict):
# Save to disk via subclass implementation
try:
# Determine entity type from module name (Watch.py -> watch, Tag.py -> tag)
from changedetectionio.model.persistence import _determine_entity_type
entity_type = _determine_entity_type(self.__class__)
filename = f"{entity_type}.json"
self._save_to_disk(data_dict, uuid)
logger.debug(f"Committed {entity_type} {uuid} to {uuid}/{filename}")
except Exception as e:
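The edit-tracking added above boils down to a dict subclass that flags writes to user-writable keys only; here is a simplified, self-contained sketch of that pattern (the readOnly set is invented for the example, the real class derives it from the OpenAPI spec):

class EditTrackedDict(dict):
    """Minimal illustration of the was_edited pattern, not the real watch_base class."""
    READONLY = {"uuid", "check_count", "last_checked"}  # assumed system-managed fields

    def __init__(self, *args, **kwargs):
        self._edited = False                 # initial population does not count as an edit
        super().__init__(*args, **kwargs)    # dict construction bypasses __setitem__ in CPython

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if key not in self.READONLY:
            self._edited = True              # only user-writable fields flip the flag

    @property
    def was_edited(self):
        return self._edited

w = EditTrackedDict(url="https://example.com", check_count=0)
w["check_count"] = 5        # system-managed field, flag stays False
w["title"] = "New title"    # user-writable field, flag becomes True
assert w.was_edited is True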
+92
@@ -0,0 +1,92 @@
"""
Schema utilities for Watch and Tag models.
Provides functions to extract readonly fields and properties from OpenAPI spec.
Shared by both the model layer and API layer to avoid circular dependencies.
"""
import functools
@functools.cache
def get_openapi_schema_dict():
"""
Get the raw OpenAPI spec dictionary for schema access.
Returns the YAML dict directly (not the OpenAPI object).
"""
import os
import yaml
spec_path = os.path.join(os.path.dirname(__file__), '../../docs/api-spec.yaml')
if not os.path.exists(spec_path):
spec_path = os.path.join(os.path.dirname(__file__), '../docs/api-spec.yaml')
with open(spec_path, 'r', encoding='utf-8') as f:
return yaml.safe_load(f)
@functools.cache
def _resolve_readonly_fields(schema_name):
"""
Generic helper to resolve readOnly fields, including allOf inheritance.
Args:
schema_name: Name of the schema (e.g., 'Watch', 'Tag')
Returns:
frozenset: All readOnly field names including inherited ones
"""
spec_dict = get_openapi_schema_dict()
schema = spec_dict['components']['schemas'].get(schema_name, {})
readonly_fields = set()
# Handle allOf (schema inheritance)
if 'allOf' in schema:
for item in schema['allOf']:
# Resolve $ref to parent schema
if '$ref' in item:
ref_path = item['$ref'].split('/')[-1]
ref_schema = spec_dict['components']['schemas'].get(ref_path, {})
if 'properties' in ref_schema:
for field_name, field_def in ref_schema['properties'].items():
if field_def.get('readOnly') is True:
readonly_fields.add(field_name)
# Check schema-specific properties
if 'properties' in item:
for field_name, field_def in item['properties'].items():
if field_def.get('readOnly') is True:
readonly_fields.add(field_name)
else:
# Direct properties (no inheritance)
if 'properties' in schema:
for field_name, field_def in schema['properties'].items():
if field_def.get('readOnly') is True:
readonly_fields.add(field_name)
return frozenset(readonly_fields)
@functools.cache
def get_readonly_watch_fields():
"""
Extract readOnly field names from Watch schema in OpenAPI spec.
Returns readOnly fields from WatchBase (uuid, date_created) + Watch-specific readOnly fields.
Used by:
- model/watch_base.py: Track when writable fields are edited
- api/Watch.py: Filter readonly fields from PUT requests
"""
return _resolve_readonly_fields('Watch')
@functools.cache
def get_readonly_tag_fields():
"""
Extract readOnly field names from Tag schema in OpenAPI spec.
Returns readOnly fields from WatchBase (uuid, date_created) + Tag-specific readOnly fields.
"""
return _resolve_readonly_fields('Tag')
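For illustration, a hypothetical components/schemas fragment with the allOf inheritance that _resolve_readonly_fields and _resolve_schema_properties walk (schema and field names invented here, the real ones live in docs/api-spec.yaml):

# Hypothetical parsed spec dict, not the real api-spec.yaml
spec = {
    "components": {
        "schemas": {
            "WatchBase": {
                "properties": {
                    "uuid": {"type": "string", "readOnly": True},
                    "title": {"type": ["string", "null"]},
                }
            },
            "Watch": {
                "allOf": [
                    {"$ref": "#/components/schemas/WatchBase"},
                    {"properties": {"last_checked": {"type": "integer", "readOnly": True}}},
                ]
            },
        }
    }
}
# Resolving 'Watch' against this fragment would give readOnly fields
# {'uuid', 'last_checked'} and a merged property set of {'uuid', 'title', 'last_checked'}.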
+24 -7
@@ -1,6 +1,6 @@
from functools import lru_cache
from loguru import logger
from flask_babel import gettext
from flask_babel import gettext, get_locale
import importlib
import inspect
import os
@@ -190,14 +190,15 @@ def get_plugin_processor_metadata():
logger.warning(f"Error getting plugin processor metadata: {e}")
return metadata
def available_processors():
"""
Get a list of processors by name and description for the UI elements.
Can be filtered via DISABLED_PROCESSORS environment variable (comma-separated list).
:return: A list :)
@lru_cache(maxsize=32)
def _available_processors_cached(locale_str):
"""
Internal cached function that includes locale in cache key.
This ensures translations are cached per-language instead of globally.
:param locale_str: The locale string (e.g., 'en', 'it', 'zh')
:return: A list of tuples (processor_name, translated_description, weight)
"""
processor_classes = find_processors()
# Check if DISABLED_PROCESSORS env var is set
@@ -256,6 +257,22 @@ def available_processors():
# Return as tuples without weight (for backwards compatibility)
return [(name, desc) for name, desc, weight in available]
def available_processors():
"""
Get a list of processors by name and description for the UI elements.
Can be filtered via DISABLED_PROCESSORS environment variable (comma-separated list).
This function delegates to a locale-aware cached version to ensure translations
are cached per-language instead of globally.
:return: A list of tuples (processor_name, translated_description)
"""
# Get current locale and use it as cache key
# Convert Babel Locale object to string for use as cache key
locale = get_locale()
locale_str = str(locale) if locale else 'en'
return _available_processors_cached(locale_str)
def get_default_processor():
"""
+68 -1
@@ -19,6 +19,7 @@ class difference_detection_processor():
xpath_data = None
preferred_proxy = None
screenshot_format = SCREENSHOT_FORMAT_JPEG
last_raw_content_checksum = None
def __init__(self, datastore, watch_uuid):
self.datastore = datastore
@@ -34,6 +35,64 @@ class difference_detection_processor():
# Generic fetcher that should be extended (requests, playwright etc)
self.fetcher = Fetcher()
# Load the last raw content checksum from file
self.read_last_raw_content_checksum()
def update_last_raw_content_checksum(self, checksum):
"""
Save the raw content MD5 checksum to file.
This is used for skip logic - avoid reprocessing if raw HTML unchanged.
"""
if not checksum:
return
watch = self.datastore.data['watching'].get(self.watch_uuid)
if not watch:
return
data_dir = watch.data_dir
if not data_dir:
return
watch.ensure_data_dir_exists()
checksum_file = os.path.join(data_dir, 'last-checksum.txt')
try:
with open(checksum_file, 'w', encoding='utf-8') as f:
f.write(checksum)
self.last_raw_content_checksum = checksum
except IOError as e:
logger.warning(f"Failed to write checksum file for {self.watch_uuid}: {e}")
def read_last_raw_content_checksum(self):
"""
Read the last raw content MD5 checksum from file.
Returns None if file doesn't exist (first run) or can't be read.
"""
watch = self.datastore.data['watching'].get(self.watch_uuid)
if not watch:
self.last_raw_content_checksum = None
return
data_dir = watch.data_dir
if not data_dir:
self.last_raw_content_checksum = None
return
checksum_file = os.path.join(data_dir, 'last-checksum.txt')
if not os.path.isfile(checksum_file):
self.last_raw_content_checksum = None
return
try:
with open(checksum_file, 'r', encoding='utf-8') as f:
self.last_raw_content_checksum = f.read().strip()
except IOError as e:
logger.warning(f"Failed to read checksum file for {self.watch_uuid}: {e}")
self.last_raw_content_checksum = None
async def call_browser(self, preferred_proxy_id=None):
from requests.structures import CaseInsensitiveDict
@@ -257,8 +316,16 @@ class difference_detection_processor():
except IOError as e:
logger.error(f"Failed to write extra watch config {filename}: {e}")
def get_raw_document_checksum(self):
checksum = None
if self.fetcher.content:
checksum = hashlib.md5(self.fetcher.content.encode('utf-8')).hexdigest()
return checksum
@abstractmethod
def run_changedetection(self, watch):
def run_changedetection(self, watch, force_reprocess=False):
update_obj = {'last_notification_error': False, 'last_error': False}
some_data = 'xxxxx'
update_obj["previous_md5"] = hashlib.md5(some_data.encode('utf-8')).hexdigest()
@@ -30,7 +30,7 @@ class perform_site_check(difference_detection_processor):
# Override to use PNG format for better image comparison (JPEG compression creates noise)
screenshot_format = SCREENSHOT_FORMAT_PNG
def run_changedetection(self, watch):
def run_changedetection(self, watch, force_reprocess=False):
"""
Perform screenshot comparison using OpenCV subprocess handler.
@@ -2,6 +2,7 @@ from ..base import difference_detection_processor
from ..exceptions import ProcessorException
from . import Restock
from loguru import logger
from changedetectionio.content_fetchers.exceptions import checksumFromPreviousCheckWasTheSame
import urllib3
import time
@@ -56,6 +57,259 @@ def _deduplicate_prices(data):
return list(unique_data)
# =============================================================================
# MEMORY MANAGEMENT: Why We Use Multiprocessing (Linux Only)
# =============================================================================
#
# The get_itemprop_availability() function uses 'extruct' to parse HTML metadata
# (JSON-LD, microdata, OpenGraph, etc). Extruct internally uses lxml, which wraps
# libxml2 - a C library that allocates memory at the C level.
#
# Memory Leak Problem:
# --------------------
# 1. lxml's document_fromstring() creates thousands of Python objects backed by
# C-level allocations (nodes, attributes, text content)
# 2. Python's garbage collector can mark these objects as collectible, but
# cannot force the OS to reclaim the actual C-level memory
# 3. malloc/free typically doesn't return memory to OS - it just marks it as
# "free in the process address space"
# 4. With repeated parsing of large HTML (5MB+ pages), memory accumulates even
# after Python GC runs
#
# Why Multiprocessing Fixes This:
# --------------------------------
# When a subprocess exits, the OS forcibly reclaims ALL memory including C-level
# allocations that Python GC couldn't release. This ensures clean memory state
# after each extraction.
#
# Performance Impact:
# -------------------
# - Memray analysis showed 1.2M document_fromstring allocations per page
# - Without subprocess: memory grows by ~50-500MB per parse and lingers
# - With subprocess: ~35MB overhead but forces full cleanup after each run
# - Trade-off: 35MB resource_tracker vs 500MB+ accumulated leak = much better at scale
#
# References:
# -----------
# - lxml memory issues: https://medium.com/devopss-hole/python-lxml-memory-leak-b8d0b1000dc7
# - libxml2 caching behavior: https://www.mail-archive.com/lxml@python.org/msg00026.html
# - GC limitations with C extensions: https://benbernardblog.com/tracking-down-a-freaky-python-memory-leak-part-2/
#
# Additional Context:
# -------------------
# - jsonpath_ng (used to query the parsed data) is pure Python and doesn't leak
# - The leak is specifically from lxml's document parsing, not the JSONPath queries
# - Linux-only because multiprocessing spawn is well-tested there; other platforms
# use direct call as fallback
#
# Alternative Solution (Future Optimization):
# -------------------------------------------
# This entire problem could be avoided by using regex to extract just the machine
# data blocks (JSON-LD, microdata, OpenGraph tags) BEFORE parsing with lxml:
#
# 1. Use regex to extract <script type="application/ld+json">...</script> blocks
# 2. Use regex to extract <meta property="og:*"> tags
# 3. Use regex to find itemprop/itemtype attributes and their containing elements
# 4. Parse ONLY those extracted snippets instead of the entire HTML document
#
# Benefits:
# - Avoids parsing 5MB of HTML when we only need a few KB of metadata
# - Eliminates the lxml memory leak entirely
# - Faster extraction (regex is much faster than DOM parsing)
# - No subprocess overhead needed
#
# Trade-offs:
# - Regex for HTML is brittle (comments, CDATA, edge cases)
# - Microdata extraction would be complex (need to track element boundaries)
# - Would need extensive testing to ensure we don't miss valid data
# - extruct is battle-tested; regex solution would need similar maturity
#
# For now, the subprocess approach is safer and leverages existing extruct code.
# =============================================================================
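As a rough sketch of the regex-based alternative described in the comment above (illustrative only, this is not what the change implements, which keeps extruct behind subprocess isolation); it covers only the JSON-LD step:

import json
import re

# Pull out only the <script type="application/ld+json"> payloads before any lxml parsing.
LDJSON_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def extract_ldjson_blocks(html: str) -> list:
    """Return parsed JSON-LD blocks without building a full DOM (brittle by design)."""
    blocks = []
    for match in LDJSON_RE.finditer(html):
        try:
            blocks.append(json.loads(match.group(1).strip()))
        except json.JSONDecodeError:
            continue  # malformed or commented-out blocks are simply skipped
    return blocks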
def _extract_itemprop_availability_worker(pipe_conn):
"""
Subprocess worker for itemprop extraction (Linux memory management).
Uses spawn multiprocessing to isolate extruct/lxml memory allocations.
When the subprocess exits, the OS reclaims ALL memory including lxml's
C-level allocations that Python's GC cannot release.
Args:
pipe_conn: Pipe connection to receive HTML and send result
"""
import json
import gc
html_content = None
result_data = None
try:
# Receive HTML as raw bytes (no pickle)
html_bytes = pipe_conn.recv_bytes()
html_content = html_bytes.decode('utf-8')
# Explicitly delete html_bytes to free memory
del html_bytes
gc.collect()
# Perform extraction in subprocess (uses extruct/lxml)
result_data = get_itemprop_availability(html_content)
# Convert Restock object to dict for JSON serialization
result = {
'success': True,
'data': dict(result_data) if result_data else {}
}
pipe_conn.send_bytes(json.dumps(result).encode('utf-8'))
# Clean up before exit
del result_data, html_content, result
gc.collect()
except MoreThanOnePriceFound:
# Serialize the specific exception type
result = {
'success': False,
'exception_type': 'MoreThanOnePriceFound'
}
pipe_conn.send_bytes(json.dumps(result).encode('utf-8'))
except Exception as e:
# Serialize other exceptions
result = {
'success': False,
'exception_type': type(e).__name__,
'exception_message': str(e)
}
pipe_conn.send_bytes(json.dumps(result).encode('utf-8'))
finally:
# Final cleanup before subprocess exits
# Variables may already be deleted in try block, so use try/except
try:
del html_content
except (NameError, UnboundLocalError):
pass
try:
del result_data
except (NameError, UnboundLocalError):
pass
gc.collect()
pipe_conn.close()
def extract_itemprop_availability_safe(html_content) -> Restock:
"""
Extract itemprop availability with hybrid approach for memory efficiency.
Strategy (fastest to slowest, least to most memory):
1. Try pure Python extraction (JSON-LD, OpenGraph, microdata) - covers 80%+ of cases
2. Fall back to extruct with subprocess isolation on Linux for complex cases
Args:
html_content: HTML string to parse
Returns:
Restock: Extracted availability data
Raises:
MoreThanOnePriceFound: When multiple prices detected
Other exceptions: From extruct/parsing
"""
import platform
# Step 1: Try pure Python extraction first (fast, no lxml, no memory leak)
try:
from .pure_python_extractor import extract_metadata_pure_python, query_price_availability
logger.trace("Attempting pure Python metadata extraction (no lxml)")
extracted_data = extract_metadata_pure_python(html_content)
price_data = query_price_availability(extracted_data)
# If we got price AND availability, we're done!
if price_data.get('price') and price_data.get('availability'):
result = Restock(price_data)
logger.debug(f"Pure Python extraction successful: {dict(result)}")
return result
# If we got some data but not everything, still try extruct for completeness
if price_data.get('price') or price_data.get('availability'):
logger.debug(f"Pure Python extraction partial: {price_data}, will try extruct for completeness")
except Exception as e:
logger.debug(f"Pure Python extraction failed: {e}, falling back to extruct")
# Step 2: Fall back to extruct (uses lxml, needs subprocess on Linux)
logger.trace("Falling back to extruct (lxml-based) with subprocess isolation")
# Only use subprocess isolation on Linux
# Other platforms may have issues with spawn or don't need the aggressive memory management
if platform.system() == 'Linux':
import multiprocessing
import json
import gc
try:
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=_extract_itemprop_availability_worker, args=(child_conn,))
p.start()
# Send HTML as raw bytes (no pickle)
html_bytes = html_content.encode('utf-8')
parent_conn.send_bytes(html_bytes)
# Explicitly delete html_bytes copy immediately after sending
del html_bytes
gc.collect()
# Receive result as JSON
result_bytes = parent_conn.recv_bytes()
result = json.loads(result_bytes.decode('utf-8'))
# Wait for subprocess to complete
p.join()
# Close pipes
parent_conn.close()
child_conn.close()
# Clean up all subprocess-related objects
del p, parent_conn, child_conn, result_bytes
gc.collect()
# Handle result or re-raise exception
if result['success']:
# Reconstruct Restock object from dict
restock_obj = Restock(result['data'])
# Clean up result dict
del result
gc.collect()
return restock_obj
else:
# Re-raise the exception that occurred in subprocess
exception_type = result['exception_type']
exception_msg = result.get('exception_message', '')
del result
gc.collect()
if exception_type == 'MoreThanOnePriceFound':
raise MoreThanOnePriceFound()
else:
raise Exception(f"{exception_type}: {exception_msg}")
except Exception as e:
# If multiprocessing itself fails, log and fall back to direct call
logger.warning(f"Subprocess extraction failed: {e}, falling back to direct call")
gc.collect()
return get_itemprop_availability(html_content)
else:
# Non-Linux: direct call (no subprocess overhead needed)
return get_itemprop_availability(html_content)
# should return Restock()
# add casting?
def get_itemprop_availability(html_content) -> Restock:
@@ -150,22 +404,37 @@ class perform_site_check(difference_detection_processor):
screenshot = None
xpath_data = None
def run_changedetection(self, watch):
def run_changedetection(self, watch, force_reprocess=False):
import hashlib
if not watch:
raise Exception("Watch no longer exists.")
current_raw_document_checksum = self.get_raw_document_checksum()
# Skip processing only if BOTH conditions are true:
# 1. HTML content unchanged (checksum matches last saved checksum)
# 2. Watch configuration was not edited (including trigger_text, filters, etc.)
# The was_edited flag handles all watch configuration changes, so we don't need
# separate checks for trigger_text or other processing rules.
if (not force_reprocess and
not watch.was_edited and
self.last_raw_content_checksum and
self.last_raw_content_checksum == current_raw_document_checksum):
raise checksumFromPreviousCheckWasTheSame()
# Unset any existing notification error
update_obj = {'last_notification_error': False, 'last_error': False, 'restock': Restock()}
self.screenshot = self.fetcher.screenshot
self.xpath_data = self.fetcher.xpath_data
# Track the content type
update_obj['content_type'] = self.fetcher.headers.get('Content-Type', '')
# Track the content type (readonly field, doesn't trigger was_edited)
update_obj['content-type'] = self.fetcher.headers.get('Content-Type', '') # Use hyphen (matches OpenAPI spec)
update_obj["last_check_status"] = self.fetcher.get_last_status_code()
# Save the raw content checksum to file (processor implementation detail, not watch config)
self.update_last_raw_content_checksum(current_raw_document_checksum)
# Only try to process restock information (like scraping for keywords) if the page was actually rendered correctly.
# Otherwise it will assume "in stock" because nothing suggesting the opposite was found
from ...html_tools import html_to_text
@@ -196,8 +465,9 @@ class perform_site_check(difference_detection_processor):
multiple_prices_found = False
# Try built-in extraction first, this will scan metadata in the HTML
# On Linux, this runs in a subprocess to prevent lxml/extruct memory leaks
try:
itemprop_availability = get_itemprop_availability(self.fetcher.content)
itemprop_availability = extract_itemprop_availability_safe(self.fetcher.content)
except MoreThanOnePriceFound as e:
# Don't raise immediately - let plugins try to handle this case
# Plugins might be able to determine which price is correct
@@ -0,0 +1,286 @@
"""
Pure Python metadata extractor - no lxml, no memory leaks.
This module provides a fast, memory-efficient alternative to extruct for common
e-commerce metadata extraction. It handles:
- JSON-LD (covers 80%+ of modern sites)
- OpenGraph meta tags
- Basic microdata attributes
Uses Python's built-in html.parser instead of lxml/libxml2, avoiding C-level
memory allocation issues. For edge cases, the main processor can fall back to
extruct (with subprocess isolation on Linux).
"""
from html.parser import HTMLParser
import json
import re
from loguru import logger
class JSONLDExtractor(HTMLParser):
"""
Extract JSON-LD structured data from HTML.
Finds all <script type="application/ld+json"> tags and parses their content.
Handles multiple JSON-LD blocks on the same page.
"""
def __init__(self):
super().__init__()
self.in_jsonld = False
self.data = [] # List of all parsed JSON-LD objects
self.current_script = []
def handle_starttag(self, tag, attrs):
if tag == 'script':
# Check if this is a JSON-LD script tag
for attr, value in attrs:
if attr == 'type' and value == 'application/ld+json':
self.in_jsonld = True
self.current_script = []
break
def handle_data(self, data):
if self.in_jsonld:
self.current_script.append(data)
def handle_endtag(self, tag):
if tag == 'script' and self.in_jsonld:
# Parse the accumulated script content
script_content = ''.join(self.current_script)
if script_content.strip():
try:
# Parse JSON (handles both objects and arrays)
parsed = json.loads(script_content)
if isinstance(parsed, list):
self.data.extend(parsed)
else:
self.data.append(parsed)
except json.JSONDecodeError as e:
logger.debug(f"Failed to parse JSON-LD: {e}")
self.in_jsonld = False
self.current_script = []
class OpenGraphExtractor(HTMLParser):
"""
Extract OpenGraph meta tags from HTML.
Finds <meta property="og:*"> tags commonly used for social media sharing.
"""
def __init__(self):
super().__init__()
self.og_data = {}
def handle_starttag(self, tag, attrs):
if tag == 'meta':
attrs_dict = dict(attrs)
prop = attrs_dict.get('property', '')
# Extract OpenGraph properties
if prop.startswith('og:'):
content = attrs_dict.get('content', '')
if content:
self.og_data[prop] = content
class MicrodataExtractor(HTMLParser):
"""
Extract basic microdata attributes from HTML.
Finds elements with itemprop attributes. This is a simplified extractor
that doesn't handle nested itemscope/itemtype hierarchies - for complex
cases, use extruct as fallback.
"""
def __init__(self):
super().__init__()
self.microdata = {}
self.current_itemprop = None
def handle_starttag(self, tag, attrs):
attrs_dict = dict(attrs)
if 'itemprop' in attrs_dict:
itemprop = attrs_dict['itemprop']
# Price/currency/availability can be in content/href attributes
if itemprop == 'price':
if 'content' in attrs_dict:
self.microdata['price'] = attrs_dict['content']
else:
self.current_itemprop = 'price'
elif itemprop == 'priceCurrency':
if 'content' in attrs_dict:
self.microdata['currency'] = attrs_dict['content']
else:
self.current_itemprop = 'priceCurrency'
elif itemprop == 'availability':
# Can be in href (link) or content (meta)
if 'href' in attrs_dict:
self.microdata['availability'] = attrs_dict['href']
elif 'content' in attrs_dict:
self.microdata['availability'] = attrs_dict['content']
else:
self.current_itemprop = 'availability'
def handle_data(self, data):
# Capture text content for itemprop elements
if self.current_itemprop == 'price':
# Try to extract numeric price from text
try:
price_text = re.sub(r'[^\d.]', '', data.strip())
if price_text:
self.microdata['price'] = float(price_text)
except ValueError:
pass
elif self.current_itemprop == 'priceCurrency':
currency = data.strip()
if currency:
self.microdata['currency'] = currency
elif self.current_itemprop == 'availability':
availability = data.strip()
if availability:
self.microdata['availability'] = availability
def handle_endtag(self, tag):
# Reset current itemprop after closing tag
self.current_itemprop = None
def extract_metadata_pure_python(html_content):
"""
Extract structured metadata from HTML using pure Python parsers.
Returns a dict with three keys:
- 'json-ld': List of parsed JSON-LD objects
- 'opengraph': Dict of OpenGraph properties
- 'microdata': Dict of microdata properties
Args:
html_content: HTML string to parse
Returns:
dict: Extracted metadata in three formats
"""
result = {
'json-ld': [],
'opengraph': {},
'microdata': {}
}
# Extract JSON-LD
try:
jsonld_extractor = JSONLDExtractor()
jsonld_extractor.feed(html_content)
result['json-ld'] = jsonld_extractor.data
logger.trace(f"Pure Python: Found {len(jsonld_extractor.data)} JSON-LD blocks")
except Exception as e:
logger.debug(f"JSON-LD extraction failed: {e}")
# Extract OpenGraph
try:
og_extractor = OpenGraphExtractor()
og_extractor.feed(html_content)
result['opengraph'] = og_extractor.og_data
if result['opengraph']:
logger.trace(f"Pure Python: Found {len(og_extractor.og_data)} OpenGraph tags")
except Exception as e:
logger.debug(f"OpenGraph extraction failed: {e}")
# Extract Microdata
try:
microdata_extractor = MicrodataExtractor()
microdata_extractor.feed(html_content)
result['microdata'] = microdata_extractor.microdata
if result['microdata']:
logger.trace(f"Pure Python: Found microdata: {result['microdata']}")
except Exception as e:
logger.debug(f"Microdata extraction failed: {e}")
return result
def query_price_availability(extracted_data):
"""
Query extracted metadata for price and availability information.
Uses jsonpath_ng to query JSON-LD data (same approach as extruct).
Falls back to OpenGraph and microdata if JSON-LD doesn't have the data.
Args:
extracted_data: Dict from extract_metadata_pure_python()
Returns:
dict: {'price': float, 'currency': str, 'availability': str}
"""
from jsonpath_ng import parse
result = {}
# 1. Try JSON-LD first (most reliable and common)
for data in extracted_data.get('json-ld', []):
try:
# Use jsonpath to find price/availability anywhere in the structure
price_parse = parse('$..(price|Price)')
availability_parse = parse('$..(availability|Availability)')
currency_parse = parse('$..(priceCurrency|currency)')
price_results = [m.value for m in price_parse.find(data)]
if price_results and not result.get('price'):
# Handle various price formats
price_val = price_results[0]
if isinstance(price_val, (int, float)):
result['price'] = float(price_val)
elif isinstance(price_val, str):
# Extract numeric value from string
try:
result['price'] = float(re.sub(r'[^\d.]', '', price_val))
except ValueError:
pass
avail_results = [m.value for m in availability_parse.find(data)]
if avail_results and not result.get('availability'):
result['availability'] = str(avail_results[0])
curr_results = [m.value for m in currency_parse.find(data)]
if curr_results and not result.get('currency'):
result['currency'] = str(curr_results[0])
# If we found price, this JSON-LD block is good
if result.get('price'):
logger.debug(f"Pure Python: Found price data in JSON-LD: {result}")
break
except Exception as e:
logger.debug(f"Error querying JSON-LD: {e}")
continue
# 2. Try OpenGraph if JSON-LD didn't provide everything
og_data = extracted_data.get('opengraph', {})
if not result.get('price') and 'og:price:amount' in og_data:
try:
result['price'] = float(og_data['og:price:amount'])
except ValueError:
pass
if not result.get('currency') and 'og:price:currency' in og_data:
result['currency'] = og_data['og:price:currency']
if not result.get('availability') and 'og:availability' in og_data:
result['availability'] = og_data['og:availability']
# 3. Use microdata as last resort
microdata = extracted_data.get('microdata', {})
if not result.get('price') and 'price' in microdata:
result['price'] = microdata['price']
if not result.get('currency') and 'currency' in microdata:
result['currency'] = microdata['currency']
if not result.get('availability') and 'availability' in microdata:
result['availability'] = microdata['availability']
return result
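A rough usage sketch of the two helpers above. The absolute import path is an assumption (the processor imports them relatively as `from .pure_python_extractor import ...`), and the sample HTML is illustrative only:

    # Hypothetical import path - adjust to wherever pure_python_extractor lives.
    from changedetectionio.processors.restock_diff.pure_python_extractor import (
        extract_metadata_pure_python,
        query_price_availability,
    )

    sample_html = """
    <html><head>
    <meta property="og:price:amount" content="19.99">
    <meta property="og:price:currency" content="EUR">
    <link itemprop="availability" href="https://schema.org/InStock">
    </head><body></body></html>
    """

    extracted = extract_metadata_pure_python(sample_html)
    print(extracted['opengraph'])      # {'og:price:amount': '19.99', 'og:price:currency': 'EUR'}
    print(query_price_availability(extracted))
    # Roughly: {'price': 19.99, 'currency': 'EUR',
    #           'availability': 'https://schema.org/InStock'}
    # (price/currency come from OpenGraph, availability from the microdata link)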
@@ -17,7 +17,8 @@ def _task(watch, update_handler):
try:
# The slow process (we run 2 of these in parallel)
changed_detected, update_obj, text_after_filter = update_handler.run_changedetection(watch=watch)
# Always force reprocess for preview - we want to show the filtered content regardless of checksums
changed_detected, update_obj, text_after_filter = update_handler.run_changedetection(watch=watch, force_reprocess=True)
except FilterNotFoundInResponse as e:
text_after_filter = f"Filter not found in HTML: {str(e)}"
except ReplyWithContentButNoText as e:
@@ -7,6 +7,7 @@ import re
import urllib3
from changedetectionio.conditions import execute_ruleset_against_all_plugins
from changedetectionio.content_fetchers.exceptions import checksumFromPreviousCheckWasTheSame
from ..base import difference_detection_processor
from changedetectionio.html_tools import PERL_STYLE_REGEX, cdata_in_document_to_text, TRANSLATE_WHITESPACE_TABLE
from changedetectionio import html_tools, content_fetchers
@@ -368,12 +369,24 @@ class ChecksumCalculator:
# (set_proxy_from_list)
class perform_site_check(difference_detection_processor):
def run_changedetection(self, watch):
def run_changedetection(self, watch, force_reprocess=False):
changed_detected = False
if not watch:
raise Exception("Watch no longer exists.")
current_raw_document_checksum = self.get_raw_document_checksum()
# Skip processing only if BOTH conditions are true:
# 1. HTML content unchanged (checksum matches last saved checksum)
# 2. Watch configuration was not edited (including trigger_text, filters, etc.)
# The was_edited flag handles all watch configuration changes, so we don't need
# separate checks for trigger_text or other processing rules.
if (not force_reprocess and
not watch.was_edited and
self.last_raw_content_checksum and
self.last_raw_content_checksum == current_raw_document_checksum):
raise checksumFromPreviousCheckWasTheSame()
# Initialize components
filter_config = FilterConfig(watch, self.datastore)
content_processor = ContentProcessor(self.fetcher, watch, filter_config, self.datastore)
@@ -391,9 +404,11 @@ class perform_site_check(difference_detection_processor):
self.screenshot = self.fetcher.screenshot
self.xpath_data = self.fetcher.xpath_data
# Track the content type and checksum before filters
update_obj['content_type'] = ctype_header
update_obj['previous_md5_before_filters'] = hashlib.md5(self.fetcher.content.encode('utf-8')).hexdigest()
# Track the content type (readonly field, doesn't trigger was_edited)
update_obj['content-type'] = ctype_header # Use hyphen (matches OpenAPI spec and watch_base default)
# Save the raw content checksum to file (processor implementation detail, not watch config)
self.update_last_raw_content_checksum(current_raw_document_checksum)
# === CONTENT PREPROCESSING ===
# Avoid creating unnecessary intermediate string copies by reassigning only when needed
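The hunks above call get_raw_document_checksum(), last_raw_content_checksum and update_last_raw_content_checksum() without showing their bodies. A rough free-standing sketch of that last-checksum.txt mechanism follows; the file name matches the datastore helpers further down, but the hashing choice and function signatures here are assumptions for illustration only:

    import hashlib
    import os

    # Illustrative sketch only: the method names appear in the diff,
    # their real implementations are not shown here.

    def get_raw_document_checksum(raw_content: str) -> str:
        # Checksum of the raw fetched document, before any filters run (assumed md5)
        return hashlib.md5(raw_content.encode('utf-8')).hexdigest()

    def read_last_raw_content_checksum(watch_data_dir: str):
        # What `self.last_raw_content_checksum` plausibly returns:
        # the checksum persisted by the previous successful check, or None
        path = os.path.join(watch_data_dir, 'last-checksum.txt')
        if os.path.isfile(path):
            with open(path, 'r', encoding='utf-8') as f:
                return f.read().strip()
        return None

    def update_last_raw_content_checksum(watch_data_dir: str, checksum: str) -> None:
        # What `self.update_last_raw_content_checksum()` plausibly does:
        # persist the checksum so it survives restarts (unlike `was_edited`)
        path = os.path.join(watch_data_dir, 'last-checksum.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write(checksum)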
@@ -33,9 +33,8 @@ except ImportError:
from ..processors import get_custom_watch_obj_for_processor
# Import the base class and helpers
from .file_saving_datastore import FileSavingDataStore, load_all_watches, load_all_tags, save_watch_atomic, save_tag_atomic, save_json_atomic
from .file_saving_datastore import FileSavingDataStore, load_all_watches, load_all_tags, save_json_atomic
from .updates import DatastoreUpdatesMixin
from .legacy_loader import has_legacy_datastore
# Because the server will run as a daemon and wont know the URL for notification links when firing off a notification
BASE_URL_NOT_SET_TEXT = '("Base URL" not set - see settings - notifications)'
@@ -78,7 +77,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
logger.info(f"Backing up changedetection.json due to new version to '{db_path_version_backup}'.")
copyfile(db_path, db_path_version_backup)
def _load_settings(self):
def _load_settings(self, filename="changedetection.json"):
"""
Load settings from storage.
@@ -87,7 +86,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
Returns:
dict: Settings data loaded from storage
"""
changedetection_json = os.path.join(self.datastore_path, "changedetection.json")
changedetection_json = os.path.join(self.datastore_path, filename)
logger.info(f"Loading settings from {changedetection_json}")
@@ -122,6 +121,11 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
if 'application' in settings_data['settings']:
self.__data['settings']['application'].update(settings_data['settings']['application'])
# More or less for the old format which had this data in the single url-watches.json;
# it can't hurt to leave it here.
if 'watching' in settings_data:
self.__data['watching'].update(settings_data['watching'])
def _rehydrate_tags(self):
"""Rehydrate tag entities from stored data into Tag objects with restock_diff processor."""
from ..model import Tag
@@ -146,23 +150,28 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
logger.info(f"Rehydrating {watch_count} watches...")
watching_rehydrated = {}
for uuid, watch_dict in self.__data.get('watching', {}).items():
watching_rehydrated[uuid] = self.rehydrate_entity(uuid, watch_dict)
if isinstance(watch_dict, dict):
watching_rehydrated[uuid] = self.rehydrate_entity(uuid, watch_dict)
else:
logger.error(f"Watch UUID {uuid} already rehydrated")
self.__data['watching'] = watching_rehydrated
logger.success(f"Rehydrated {watch_count} watches into Watch objects")
def _load_state(self):
def _load_state(self, main_settings_filename="changedetection.json"):
"""
Load complete datastore state from storage.
Orchestrates loading of settings, watches, and tags using polymorphic methods.
"""
# Load settings
settings_data = self._load_settings()
settings_data = self._load_settings(filename=main_settings_filename)
self._apply_settings(settings_data)
# Load watches (polymorphic - parent class method)
# Load watches, scan them from the disk
self._load_watches()
self._rehydrate_watches()
# Load tags from individual tag.json files
# These will override any tags in settings (migration path)
@@ -200,112 +209,73 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# Check if datastore already exists
changedetection_json = os.path.join(self.datastore_path, "changedetection.json")
changedetection_json_old_schema = os.path.join(self.datastore_path, "url-watches.json")
if os.path.exists(changedetection_json):
# Load existing datastore (changedetection.json + watch.json files)
logger.info("Loading existing datastore")
try:
self._load_state()
except Exception as e:
logger.critical(f"Failed to load datastore: {e}")
raise
# Run schema updates if needed
# Pass current schema version from loaded datastore (defaults to 0 if not set)
# Load existing datastore (changedetection.json + watch.json files)
logger.info("Loading existing datastore")
self._load_state()
current_schema = self.data['settings']['application'].get('schema_version', 0)
self.run_updates(current_schema_version=current_schema)
# Legacy datastore detected - trigger migration; this also works if the schema version is well before the migration step.
elif os.path.exists(changedetection_json_old_schema):
logger.critical(f"Legacy datastore detected at {changedetection_json_old_schema}, loading and running updates")
self._load_state(main_settings_filename="url-watches.json")
# update 26 will load the whole old config from disk to __data
current_schema = self.__data['settings']['application'].get('schema_version', 0)
self.run_updates(current_schema_version=current_schema)
# Tags were likely also moved to disk along with many other changes, so it is best to reload here.
self._load_state()
else:
# No datastore yet - check if this is a fresh install or legacy migration
# Generate app_guid FIRST (required for all operations)
if "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ:
self.__data['app_guid'] = "test-" + str(uuid_builder.uuid4())
else:
self.__data['app_guid'] = str(uuid_builder.uuid4())
self.init_fresh_install(include_default_watches=include_default_watches,
version_tag=version_tag)
# Generate RSS access token
self.__data['settings']['application']['rss_access_token'] = secrets.token_hex(16)
def init_fresh_install(self, include_default_watches, version_tag):
# Generate app_guid FIRST (required for all operations)
if "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ:
self.__data['app_guid'] = "test-" + str(uuid_builder.uuid4())
else:
self.__data['app_guid'] = str(uuid_builder.uuid4())
# Generate API access token
self.__data['settings']['application']['api_access_token'] = secrets.token_hex(16)
# Generate RSS access token
self.__data['settings']['application']['rss_access_token'] = secrets.token_hex(16)
# Check if legacy datastore exists (url-watches.json)
if has_legacy_datastore(self.datastore_path):
# Legacy datastore detected - trigger migration
logger.critical(f"Legacy datastore detected at {self.datastore_path}/url-watches.json")
logger.critical("Migration will be triggered via update_26")
# Generate API access token
self.__data['settings']['application']['api_access_token'] = secrets.token_hex(16)
logger.warning(f"No datastore found, creating new datastore at {self.datastore_path}")
# Load the legacy datastore
from .legacy_loader import load_legacy_format
legacy_path = os.path.join(self.datastore_path, "url-watches.json")
legacy_data = load_legacy_format(legacy_path)
# Set schema version to latest (no updates needed)
latest_update_available = self.get_updates_available().pop()
logger.info(f"Marking fresh install to schema version {latest_update_available}")
self.__data['settings']['application']['schema_version'] = latest_update_available
if not legacy_data:
raise Exception("Failed to load legacy datastore from url-watches.json")
# Add default watches if requested
if include_default_watches:
self.add_watch(
url='https://news.ycombinator.com/',
tag='Tech news',
extras={'fetch_backend': 'html_requests'}
)
self.add_watch(
url='https://changedetection.io/CHANGELOG.txt',
tag='changedetection.io',
extras={'fetch_backend': 'html_requests'}
)
# Merge legacy data with base_config defaults (preserves new fields like 'ui')
# self.__data already has App.model() defaults from line 190
logger.info("Merging legacy data with base_config defaults...")
# Apply top-level fields from legacy data
if 'app_guid' in legacy_data:
self.__data['app_guid'] = legacy_data['app_guid']
if 'build_sha' in legacy_data:
self.__data['build_sha'] = legacy_data['build_sha']
if 'version_tag' in legacy_data:
self.__data['version_tag'] = legacy_data['version_tag']
# Apply watching data (complete replacement as these are user's watches)
if 'watching' in legacy_data:
self.__data['watching'] = legacy_data['watching']
# Merge settings sections (preserves base_config defaults for missing fields)
if 'settings' in legacy_data:
if 'headers' in legacy_data['settings']:
self.__data['settings']['headers'].update(legacy_data['settings']['headers'])
if 'requests' in legacy_data['settings']:
self.__data['settings']['requests'].update(legacy_data['settings']['requests'])
if 'application' in legacy_data['settings']:
# CRITICAL: Use .update() to merge, not replace
# This preserves new fields like 'ui' that exist in base_config
self.__data['settings']['application'].update(legacy_data['settings']['application'])
# CRITICAL: Rehydrate watches from dicts into Watch objects
# This ensures watches have their methods available during migration
self._rehydrate_watches()
# update_26 will save watches to individual files and create changedetection.json
# Next startup will load from new format normally
self.run_updates()
# Create changedetection.json immediately
try:
self._save_settings()
logger.info("Created changedetection.json for new datastore")
except Exception as e:
logger.error(f"Failed to create initial changedetection.json: {e}")
else:
# Fresh install - create new datastore
logger.warning(f"No datastore found, creating new datastore at {self.datastore_path}")
# Set schema version to latest (no updates needed)
updates_available = self.get_updates_available()
self.__data['settings']['application']['schema_version'] = updates_available.pop() if updates_available else 26
# Add default watches if requested
if include_default_watches:
self.add_watch(
url='https://news.ycombinator.com/',
tag='Tech news',
extras={'fetch_backend': 'html_requests'}
)
self.add_watch(
url='https://changedetection.io/CHANGELOG.txt',
tag='changedetection.io',
extras={'fetch_backend': 'html_requests'}
)
# Create changedetection.json immediately
try:
self._save_settings()
logger.info("Created changedetection.json for new datastore")
except Exception as e:
logger.error(f"Failed to create initial changedetection.json: {e}")
# Set version tag
self.__data['version_tag'] = version_tag
@@ -383,17 +353,9 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# Deep copy settings to avoid modifying the original
settings_copy = copy.deepcopy(self.__data['settings'])
# Only exclude tags if we've already migrated them to individual files (schema >= 28)
# This ensures update_28 can migrate tags from settings
schema_version = self.__data['settings']['application'].get('schema_version', 0)
if schema_version >= 28:
# Tags are in individual tag.json files, don't save to settings
settings_copy['application']['tags'] = {}
# else: keep tags in settings for update_28 migration
return {
'note': 'Settings file - watches are in {uuid}/watch.json, tags are in {uuid}/tag.json',
'app_guid': self.__data['app_guid'],
'app_guid': self.__data.get('app_guid'),
'settings': settings_copy,
'build_sha': self.__data.get('build_sha'),
'version_tag': self.__data.get('version_tag')
@@ -422,15 +384,14 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
Implementation of abstract method from FileSavingDataStore.
Delegates to helper function and stores results in internal data structure.
"""
watching = load_all_watches(
self.datastore_path,
self.rehydrate_entity
)
# Store loaded data
self.__data['watching'] = watching
logger.debug(f"Loaded {len(watching)} watches")
# @note this will also work for the old legacy format because self.__data['watching'] should already have them loaded by this point.
self.__data['watching'].update(load_all_watches(
self.datastore_path,
self.rehydrate_entity
))
logger.debug(f"Loaded {len(self.__data['watching'])} watches")
def _load_tags(self):
"""
@@ -495,6 +456,63 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
self.__data['settings']['application']['password'] = False
self.commit()
def clear_all_last_checksums(self):
"""
Delete all last-checksum.txt files to force reprocessing of all watches.
This should be called when global settings change, since watches inherit
configuration and need to reprocess even if their individual watch dict
hasn't been modified.
Note: We delete the checksum file rather than setting was_edited=True because:
- was_edited is not persisted across restarts
- File deletion ensures reprocessing works across app restarts
"""
deleted_count = 0
for uuid in self.__data['watching'].keys():
watch = self.__data['watching'][uuid]
if watch.data_dir:
checksum_file = os.path.join(watch.data_dir, 'last-checksum.txt')
if os.path.isfile(checksum_file):
try:
os.remove(checksum_file)
deleted_count += 1
logger.debug(f"Cleared checksum for watch {uuid}")
except OSError as e:
logger.warning(f"Failed to delete checksum file for {uuid}: {e}")
logger.info(f"Cleared {deleted_count} checksum files to force reprocessing")
return deleted_count
def clear_checksums_for_tag(self, tag_uuid):
"""
Delete last-checksum.txt files for all watches using a specific tag.
This should be called when a tag configuration is edited, since watches
inherit tag settings and need to reprocess.
Args:
tag_uuid: UUID of the tag that was modified
Returns:
int: Number of checksum files deleted
"""
deleted_count = 0
for uuid, watch in self.__data['watching'].items():
if watch.get('tags') and tag_uuid in watch['tags']:
if watch.data_dir:
checksum_file = os.path.join(watch.data_dir, 'last-checksum.txt')
if os.path.isfile(checksum_file):
try:
os.remove(checksum_file)
deleted_count += 1
logger.debug(f"Cleared checksum for watch {uuid} (tag {tag_uuid})")
except OSError as e:
logger.warning(f"Failed to delete checksum file for {uuid}: {e}")
logger.info(f"Cleared {deleted_count} checksum files for tag {tag_uuid}")
return deleted_count
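A short hypothetical sketch of where these helpers would be invoked (the handler names and wiring below are illustrative, not taken from the codebase): after saving global settings, clear every checksum; after saving a tag, clear only the watches that carry it.

    # Illustrative only - handler names and datastore wiring are assumptions.
    def on_global_settings_saved(datastore):
        # Watches inherit global settings, so force them all to reprocess
        cleared = datastore.clear_all_last_checksums()
        print(f"Cleared {cleared} checksum files after settings change")

    def on_tag_saved(datastore, tag_uuid):
        # Only watches carrying this tag inherit its configuration
        cleared = datastore.clear_checksums_for_tag(tag_uuid)
        print(f"Cleared {cleared} checksum files for tag {tag_uuid}")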
def commit(self):
"""
Save settings immediately to disk using atomic write.
@@ -207,15 +207,6 @@ def save_watch_atomic(watch_dir, uuid, watch_dict):
save_entity_atomic(watch_dir, uuid, watch_dict, "watch.json", "watch", max_size_mb=10)
def save_tag_atomic(tag_dir, uuid, tag_dict):
"""
Save a tag to disk using atomic write pattern.
Convenience wrapper around save_entity_atomic for tags.
Kept for backwards compatibility.
"""
save_entity_atomic(tag_dir, uuid, tag_dict, "tag.json", "tag", max_size_mb=1)
def load_watch_from_file(watch_json, uuid, rehydrate_entity_func):
"""
@@ -1,66 +0,0 @@
"""
Legacy format loader for url-watches.json.
Provides functions to detect and load from the legacy monolithic JSON format.
Used during migration (update_26) to transition to individual watch.json files.
"""
import os
import json
from loguru import logger
# Try to import orjson for faster JSON serialization
try:
import orjson
HAS_ORJSON = True
except ImportError:
HAS_ORJSON = False
def has_legacy_datastore(datastore_path):
"""
Check if a legacy url-watches.json file exists.
This is used by update_26 to determine if migration is needed.
Args:
datastore_path: Path to datastore directory
Returns:
bool: True if url-watches.json exists
"""
url_watches_json = os.path.join(datastore_path, "url-watches.json")
return os.path.exists(url_watches_json)
def load_legacy_format(json_store_path):
"""
Load datastore from legacy url-watches.json format.
Args:
json_store_path: Full path to url-watches.json file
Returns:
dict: Loaded datastore data with 'watching', 'settings', etc.
None: If file doesn't exist or loading failed
"""
logger.info(f"Loading from legacy format: {json_store_path}")
if not os.path.isfile(json_store_path):
logger.warning(f"Legacy file not found: {json_store_path}")
return None
try:
if HAS_ORJSON:
with open(json_store_path, 'rb') as f:
data = orjson.loads(f.read())
else:
with open(json_store_path, 'r', encoding='utf-8') as f:
data = json.load(f)
logger.info(f"Loaded {len(data.get('watching', {}))} watches from legacy format")
return data
except Exception as e:
logger.error(f"Failed to load legacy format: {e}")
return None
@@ -16,12 +16,18 @@ import time
from loguru import logger
from copy import deepcopy
# Try to import orjson for faster JSON serialization
try:
import orjson
HAS_ORJSON = True
except ImportError:
HAS_ORJSON = False
from ..html_tools import TRANSLATE_WHITESPACE_TABLE
from ..processors.restock_diff import Restock
from ..blueprint.rss import RSS_CONTENT_FORMAT_DEFAULT
from ..model import USE_SYSTEM_DEFAULT_NOTIFICATION_FORMAT_FOR_WATCH
from .file_saving_datastore import save_watch_atomic
def create_backup_tarball(datastore_path, update_number):
"""
@@ -97,7 +103,7 @@ def create_backup_tarball(datastore_path, update_number):
tar.add(tag_json, arcname=f"{entry}/tag.json")
tag_count += 1
logger.success(f"Backup created: {backup_filename} ({watch_count} watches, {tag_count} tags)")
logger.success(f"Backup created: {backup_filename} ({watch_count} watches from disk, {tag_count} tags from disk)")
return backup_path
except Exception as e:
@@ -137,6 +143,7 @@ class DatastoreUpdatesMixin:
return updates_available
def run_updates(self, current_schema_version=None):
import sys
"""
Run all pending schema updates sequentially.
@@ -160,6 +167,23 @@ class DatastoreUpdatesMixin:
4. All changes saved via individual .commit() calls
"""
updates_available = self.get_updates_available()
if self.data.get('watching'):
test_watch = self.data['watching'].get(next(iter(self.data.get('watching', {}))))
from ..model.Watch import model
if not isinstance(test_watch, model):
import sys
logger.critical("Cannot run updates! Watch structure must be re-hydrated back to a Watch model object!")
sys.exit(1)
if self.data['settings']['application'].get('tags',{}):
test_tag = self.data['settings']['application'].get('tags',{}).get(next(iter(self.data['settings']['application'].get('tags',{}))))
from ..model.Tag import model as tag_model
if not isinstance(test_tag, tag_model):
import sys
logger.critical("Cannot run updates! Watch tag/group structure must be re-hydrated back to a Tag model object!")
sys.exit(1)
# Determine current schema version
if current_schema_version is None:
@@ -201,10 +225,9 @@ class DatastoreUpdatesMixin:
try:
update_method = getattr(self, f"update_{update_n}")()
except Exception as e:
logger.error(f"Error while trying update_{update_n}")
logger.error(e)
# Don't run any more updates
return
logger.critical(f"Error while trying update_{update_n}")
logger.exception(e)
sys.exit(1)
else:
# Bump the version
self.data['settings']['application']['schema_version'] = update_n
@@ -555,27 +578,6 @@ class DatastoreUpdatesMixin:
logger.critical("COPY-based migration: url-watches.json will remain intact for rollback")
logger.critical("=" * 80)
# Check if already migrated
changedetection_json = os.path.join(self.datastore_path, "changedetection.json")
if os.path.exists(changedetection_json):
logger.info("Migration already completed (changedetection.json exists), skipping")
return
# Check if we need to load legacy data
from .legacy_loader import has_legacy_datastore, load_legacy_format
if not has_legacy_datastore(self.datastore_path):
logger.info("No legacy datastore found, nothing to migrate")
return
# Load legacy data from url-watches.json
logger.critical("Loading legacy datastore from url-watches.json...")
legacy_path = os.path.join(self.datastore_path, "url-watches.json")
legacy_data = load_legacy_format(legacy_path)
if not legacy_data:
raise Exception("Failed to load legacy datastore from url-watches.json")
# Populate settings from legacy data
logger.info("Populating settings from legacy data...")
watch_count = len(self.data['watching'])
@@ -587,9 +589,7 @@ class DatastoreUpdatesMixin:
saved_count = 0
for uuid, watch in self.data['watching'].items():
try:
watch_dict = dict(watch)
watch_dir = os.path.join(self.datastore_path, uuid)
save_watch_atomic(watch_dir, uuid, watch_dict)
watch.commit()
saved_count += 1
if saved_count % 100 == 0:
@@ -635,18 +635,19 @@ class DatastoreUpdatesMixin:
# Phase 4: Verify settings file exists
logger.critical("Phase 4/4: Verifying changedetection.json exists...")
changedetection_json_new_schema=os.path.join(self.datastore_path, "changedetection.json")
if not os.path.isfile(changedetection_json_new_schema):
import sys
logger.critical("Migration failed, changedetection.json not found after update ran!")
sys.exit(1)
if not os.path.isfile(changedetection_json):
raise Exception(
"Migration failed: changedetection.json not found after save. "
"url-watches.json remains intact, safe to retry."
)
logger.critical("Phase 4 complete: Verified changedetection.json exists")
# Success! Now reload from new format
logger.critical("Reloading datastore from new format...")
self._load_state() # Includes load_watches
# Write it to disk; it will be saved without ['watching'] in the JSON db because watches are discovered from the disk glob
self._save_settings()
logger.success("Datastore reloaded from new format successfully")
logger.critical("=" * 80)
logger.critical("MIGRATION COMPLETED SUCCESSFULLY!")
@@ -681,9 +682,11 @@ class DatastoreUpdatesMixin:
- Enables independent tag versioning/backup
- Maintains backwards compatibility (tags stay in settings too)
"""
# Force save as tag.json (not watch.json) even if object is corrupted
logger.critical("=" * 80)
logger.critical("Running migration: Individual tag persistence (update_28)")
logger.critical("Creating individual tag.json files (tags remain in settings too)")
logger.critical("Creating individual tag.json files")
logger.critical("=" * 80)
tags = self.data['settings']['application'].get('tags', {})
@@ -700,27 +703,8 @@ class DatastoreUpdatesMixin:
for uuid, tag_data in tags.items():
try:
# Force save as tag.json (not watch.json) even if object is corrupted
from changedetectionio.store.file_saving_datastore import save_entity_atomic
import os
tag_dir = os.path.join(self.datastore_path, uuid)
os.makedirs(tag_dir, exist_ok=True)
# Convert to dict if it's an object
tag_dict = dict(tag_data) if hasattr(tag_data, '__iter__') else tag_data
# Save explicitly as tag.json
save_entity_atomic(
tag_dir,
uuid,
tag_dict,
filename='tag.json',
entity_type='tag',
max_size_mb=1
)
tag_data.commit()
saved_count += 1
if saved_count % 10 == 0:
logger.info(f" Progress: {saved_count}/{tag_count} tags migrated...")
@@ -737,5 +721,5 @@ class DatastoreUpdatesMixin:
# On next load, _load_tags() will read from tag.json files and merge with settings
logger.info("Tags saved to both settings AND individual tag.json files")
logger.info("Future tag edits will update both locations (dual storage)")
logger.critical("=" * 80)
logger.critical("=" * 80)
@@ -328,6 +328,68 @@ def test_api_simple(client, live_server, measure_memory_usage, datastore_path):
)
assert len(res.json) == 0, "Watch list should be empty"
def test_roundtrip_API(client, live_server, measure_memory_usage, datastore_path):
"""
Test the full round trip; this way we test that the default Model fits back into the OpenAPI spec
:param client:
:param live_server:
:param measure_memory_usage:
:param datastore_path:
:return:
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
test_url = url_for('test_endpoint', _external=True)
# Create new
res = client.post(
url_for("createwatch"),
data=json.dumps({"url": test_url}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 201
uuid = res.json.get('uuid')
# Now fetch it and send it back
res = client.get(
url_for("watch", uuid=uuid),
headers={'x-api-key': api_key}
)
watch=res.json
# Be sure that 'readOnly' values are never updated in the real watch
watch['last_changed'] = 454444444444
watch['date_created'] = 454444444444
# HTTP PUT ( UPDATE an existing watch )
res = client.put(
url_for("watch", uuid=uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps(watch),
)
if res.status_code != 200:
print(f"\n=== PUT failed with {res.status_code} ===")
print(f"Error: {res.data}")
assert res.status_code == 200, "HTTP PUT update was sent OK"
res = client.get(
url_for("watch", uuid=uuid),
headers={'x-api-key': api_key}
)
last_changed = res.json.get('last_changed')
assert last_changed != 454444444444
assert last_changed != "454444444444"
date_created = res.json.get('date_created')
assert date_created != 454444444444
assert date_created != "454444444444"
def test_access_denied(client, live_server, measure_memory_usage, datastore_path):
# `config_api_token_enabled` Should be On by default
res = client.get(
@@ -401,6 +463,9 @@ def test_api_watch_PUT_update(client, live_server, measure_memory_usage, datasto
follow_redirects=True
)
if res.status_code != 201:
print(f"\n=== POST createwatch failed with {res.status_code} ===")
print(f"Response: {res.data}")
assert res.status_code == 201
wait_for_all_checks(client)
@@ -464,11 +529,12 @@ def test_api_watch_PUT_update(client, live_server, measure_memory_usage, datasto
)
assert res.status_code == 400, "Should get error 400 when we give a field that doesnt exist"
# Message will come from `flask_expects_json`
# With patternProperties for processor_config_*, the error message format changed slightly
assert (b'Additional properties are not allowed' in res.data or
# Backend validation now rejects unknown fields with a clear error message
assert (b'Unknown field' in res.data or
b'Additional properties are not allowed' in res.data or
b'Unevaluated properties are not allowed' in res.data or
b'does not match any of the regexes' in res.data), \
"Should reject unknown fields with schema validation error"
"Should reject unknown fields with validation error"
# Try a XSS URL
@@ -553,6 +619,8 @@ def test_api_import(client, live_server, measure_memory_usage, datastore_path):
assert res.status_code == 200
uuid = res.json[0]
watch = live_server.app.config['DATASTORE'].data['watching'][uuid]
assert isinstance(watch['notification_urls'], list), "notification_urls must be stored as a list"
assert len(watch['notification_urls']) == 2, "notification_urls should have 2 entries"
assert 'mailto://test@example.com' in watch['notification_urls'], "notification_urls should contain first email"
assert 'mailto://admin@example.com' in watch['notification_urls'], "notification_urls should contain second email"
@@ -599,6 +667,34 @@ def test_api_import(client, live_server, measure_memory_usage, datastore_path):
assert res.status_code == 400, "Should reject unknown field"
assert b"Unknown watch configuration parameter" in res.data, "Error message should mention unknown parameter"
# Test 7: Import with complex nested array (browser_steps) - array of objects
browser_steps = json.dumps([
{"operation": "wait", "selector": "5", "optional_value": ""},
{"operation": "click", "selector": "button.submit", "optional_value": ""}
])
params = urllib.parse.urlencode({
'tag': 'browser-test',
'browser_steps': browser_steps
})
res = client.post(
url_for("import") + "?" + params,
data='https://website8.com',
headers={'x-api-key': api_key},
follow_redirects=True
)
assert res.status_code == 200, "Should accept browser_steps array"
uuid = res.json[0]
watch = live_server.app.config['DATASTORE'].data['watching'][uuid]
assert len(watch['browser_steps']) == 2, "Should have 2 browser steps"
assert watch['browser_steps'][0]['operation'] == 'wait', "First step should be wait"
assert watch['browser_steps'][1]['operation'] == 'click', "Second step should be click"
assert watch['browser_steps'][1]['selector'] == 'button.submit', "Second step selector should be button.submit"
# Cleanup
delete_all_watches(client)
def test_api_import_small_synchronous(client, live_server, measure_memory_usage, datastore_path):
"""Test that small imports (< threshold) are processed synchronously"""
@@ -837,7 +933,9 @@ def test_api_url_validation(client, live_server, measure_memory_usage, datastore
)
assert res.status_code == 400, "Updating watch URL to null should fail"
# Accept either OpenAPI validation error or our custom validation error
assert b'URL cannot be null' in res.data or b'OpenAPI validation failed' in res.data or b'validation error' in res.data.lower()
assert (b'URL cannot be null' in res.data or
b'Validation failed' in res.data or
b'validation error' in res.data.lower())
# Test 8: UPDATE to empty string URL should fail
res = client.put(
@@ -924,3 +1022,140 @@ def test_api_url_validation(client, live_server, measure_memory_usage, datastore
headers={'x-api-key': api_key},
)
delete_all_watches(client)
def test_api_time_between_check_validation(client, live_server, measure_memory_usage, datastore_path):
"""
Test that time_between_check validation works correctly:
- When time_between_check_use_default is false, at least one time value must be > 0
- Values must be valid integers
"""
import json
from flask import url_for
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
# Test 1: time_between_check_use_default=false with NO time_between_check should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example.com",
"time_between_check_use_default": False
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 400, "Should fail when time_between_check_use_default=false with no time_between_check"
assert b"At least one time interval" in res.data, "Error message should mention time interval requirement"
# Test 2: time_between_check_use_default=false with ALL zeros should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example.com",
"time_between_check_use_default": False,
"time_between_check": {
"weeks": 0,
"days": 0,
"hours": 0,
"minutes": 0,
"seconds": 0
}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 400, "Should fail when all time values are 0"
assert b"At least one time interval" in res.data, "Error message should mention time interval requirement"
# Test 3: time_between_check_use_default=false with NULL values should fail
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example.com",
"time_between_check_use_default": False,
"time_between_check": {
"weeks": None,
"days": None,
"hours": None,
"minutes": None,
"seconds": None
}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 400, "Should fail when all time values are null"
assert b"At least one time interval" in res.data, "Error message should mention time interval requirement"
# Test 4: time_between_check_use_default=false with valid hours should succeed
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example.com",
"time_between_check_use_default": False,
"time_between_check": {
"hours": 2
}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201, "Should succeed with valid hours value"
uuid1 = res.json.get('uuid')
# Test 5: time_between_check_use_default=false with valid minutes should succeed
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example2.com",
"time_between_check_use_default": False,
"time_between_check": {
"minutes": 30
}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201, "Should succeed with valid minutes value"
uuid2 = res.json.get('uuid')
# Test 6: time_between_check_use_default=true (or missing) with no time_between_check should succeed (uses defaults)
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example3.com",
"time_between_check_use_default": True
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201, "Should succeed when using default settings"
uuid3 = res.json.get('uuid')
# Test 7: Default behavior (no time_between_check_use_default field) should use defaults and succeed
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example4.com"
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 201, "Should succeed with default behavior (using global settings)"
uuid4 = res.json.get('uuid')
# Test 8: Verify integer type validation - string should fail (OpenAPI validation)
res = client.post(
url_for("createwatch"),
data=json.dumps({
"url": "https://example5.com",
"time_between_check_use_default": False,
"time_between_check": {
"hours": "not_a_number"
}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key},
)
assert res.status_code == 400, "Should fail when time value is not an integer"
assert b"Validation failed" in res.data or b"not of type" in res.data, "Should mention validation/type error"
# Cleanup
for uuid in [uuid1, uuid2, uuid3, uuid4]:
client.delete(
url_for("watch", uuid=uuid),
headers={'x-api-key': api_key},
)
@@ -107,7 +107,7 @@ def test_watch_notification_urls_validation(client, live_server, measure_memory_
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject non-list notification_urls"
assert b"OpenAPI validation failed" in res.data or b"Request body validation error" in res.data
assert b"Validation failed" in res.data or b"is not of type" in res.data
# Test 6: Verify original URLs are preserved after failed update
res = client.get(
@@ -159,7 +159,7 @@ def test_tag_notification_urls_validation(client, live_server, measure_memory_us
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 400, "Should reject non-list notification_urls"
assert b"OpenAPI validation failed" in res.data or b"Request body validation error" in res.data
assert b"Validation failed" in res.data or b"is not of type" in res.data
# Test 4: Verify original URLs are preserved after failed update
tag = datastore.data['settings']['application']['tags'][tag_uuid]
@@ -26,7 +26,7 @@ def test_openapi_validation_invalid_content_type_on_create_watch(client, live_se
# Should get 400 error due to OpenAPI validation failure
assert res.status_code == 400, f"Expected 400 but got {res.status_code}"
assert b"OpenAPI validation failed" in res.data, "Should contain OpenAPI validation error message"
assert b"Validation failed" in res.data, "Should contain validation error message"
def test_openapi_validation_missing_required_field_create_watch(client, live_server, measure_memory_usage, datastore_path):
@@ -43,7 +43,7 @@ def test_openapi_validation_missing_required_field_create_watch(client, live_ser
# Should get 400 error due to missing required field
assert res.status_code == 400, f"Expected 400 but got {res.status_code}"
assert b"OpenAPI validation failed" in res.data, "Should contain OpenAPI validation error message"
assert b"Validation failed" in res.data, "Should contain validation error message"
def test_openapi_validation_invalid_field_in_request_body(client, live_server, measure_memory_usage, datastore_path):
@@ -80,10 +80,9 @@ def test_openapi_validation_invalid_field_in_request_body(client, live_server, m
# Should get 400 error due to invalid field (this will be caught by internal validation)
# Note: This tests the flow where OpenAPI validation passes but internal validation catches it
assert res.status_code == 400, f"Expected 400 but got {res.status_code}"
# With patternProperties for processor_config_*, the error message format changed slightly
assert (b"Additional properties are not allowed" in res.data or
b"does not match any of the regexes" in res.data), \
"Should contain validation error about additional/invalid properties"
# Backend validation now returns "Unknown field(s):" message
assert b"Unknown field" in res.data, \
"Should contain validation error about unknown fields"
def test_openapi_validation_import_wrong_content_type(client, live_server, measure_memory_usage, datastore_path):
@@ -100,7 +99,7 @@ def test_openapi_validation_import_wrong_content_type(client, live_server, measu
# Should get 400 error due to content-type mismatch
assert res.status_code == 400, f"Expected 400 but got {res.status_code}"
assert b"OpenAPI validation failed" in res.data, "Should contain OpenAPI validation error message"
assert b"Validation failed" in res.data, "Should contain validation error message"
def test_openapi_validation_import_correct_content_type_succeeds(client, live_server, measure_memory_usage, datastore_path):
@@ -158,7 +157,7 @@ def test_openapi_validation_create_tag_missing_required_title(client, live_serve
# Should get 400 error due to missing required field
assert res.status_code == 400, f"Expected 400 but got {res.status_code}"
assert b"OpenAPI validation failed" in res.data, "Should contain OpenAPI validation error message"
assert b"Validation failed" in res.data, "Should contain validation error message"
def test_openapi_validation_watch_update_allows_partial_updates(client, live_server, measure_memory_usage, datastore_path):
@@ -176,4 +176,57 @@ def test_api_tags_listing(client, live_server, measure_memory_usage, datastore_p
assert res.status_code == 204
def test_roundtrip_API(client, live_server, measure_memory_usage, datastore_path):
"""
Test the full round trip; this way we test that the default Model fits back into the OpenAPI spec
:param client:
:param live_server:
:param measure_memory_usage:
:param datastore_path:
:return:
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
res = client.post(
url_for("tag"),
data=json.dumps({"title": "My tag title"}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 201
uuid = res.json.get('uuid')
# Now fetch it and send it back
res = client.get(
url_for("tag", uuid=uuid),
headers={'x-api-key': api_key}
)
tag = res.json
# Only test with date_created (readOnly field that should be filtered out)
# last_changed is Watch-specific and doesn't apply to Tags
tag['date_created'] = 454444444444
# HTTP PUT ( UPDATE an existing tag )
res = client.put(
url_for("tag", uuid=uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
data=json.dumps(tag),
)
if res.status_code != 200:
print(f"\n=== PUT failed with {res.status_code} ===")
print(f"Error: {res.data}")
assert res.status_code == 200, "HTTP PUT update was sent OK"
# Verify readOnly fields like date_created cannot be overridden
res = client.get(
url_for("tag", uuid=uuid),
headers={'x-api-key': api_key}
)
date_created = res.json.get('date_created')
assert date_created != 454444444444, "ReadOnly date_created should not be updateable"
assert date_created != "454444444444", "ReadOnly date_created should not be updateable"
@@ -54,11 +54,11 @@ def test_backup(client, live_server, measure_memory_usage, datastore_path):
backup = ZipFile(io.BytesIO(res.data))
l = backup.namelist()
# Check for UUID-based txt files (history and snapshot)
# Check for UUID-based txt files (history, snapshot, and last-checksum)
uuid4hex_txt = re.compile('^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}.*txt', re.I)
txt_files = list(filter(uuid4hex_txt.match, l))
# Should be two txt files in the archive (history and the snapshot)
assert len(txt_files) == 2
# Should be three txt files in the archive (history, snapshot, and last-checksum)
assert len(txt_files) == 3
# Check for watch.json files (new format)
uuid4hex_json = re.compile('^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}/watch\.json$', re.I)
@@ -75,4 +75,62 @@ def test_backup(client, live_server, measure_memory_usage, datastore_path):
follow_redirects=True
)
assert b'No backups found.' in res.data
assert b'No backups found.' in res.data
def test_watch_data_package_download(client, live_server, measure_memory_usage, datastore_path):
"""Test downloading a single watch's data as a zip package"""
import os
import json
set_original_response(datastore_path=datastore_path)
# Add a watch
res = client.post(
url_for("imports.import_page"),
data={"urls": url_for('test_endpoint', _external=True)},
follow_redirects=True
)
assert b"1 Imported" in res.data
wait_for_all_checks(client)
# Get the UUID directly from the datastore
# Find the watch directories
uuid = None
for item in os.listdir(datastore_path):
item_path = os.path.join(datastore_path, item)
if os.path.isdir(item_path) and len(item) == 36: # UUID format
uuid = item
break
assert uuid is not None, "Could not find watch UUID in datastore"
# Download the watch data package
res = client.get(
url_for("ui.ui_edit.watch_get_data_package", uuid=uuid),
follow_redirects=True
)
# Should get the right zip content type
assert res.content_type == "application/zip"
# Should be PK/ZIP stream (PKzip header)
assert res.data[:2] == b'PK', "File should start with PK (PKzip header)"
assert res.data.count(b'PK') >= 2, "Should have multiple PK markers (zip file structure)"
# Verify zip contents
backup = ZipFile(io.BytesIO(res.data))
files = backup.namelist()
# Should have files in a UUID directory
assert any(uuid in f for f in files), f"Files should be in UUID directory: {files}"
# Should contain watch.json
watch_json_path = f"{uuid}/watch.json"
assert watch_json_path in files, f"Should contain watch.json, got: {files}"
# Should contain history/snapshot files
uuid4hex_txt = re.compile(f'^{re.escape(uuid)}/.*\\.txt', re.I)
txt_files = list(filter(uuid4hex_txt.match, files))
assert len(txt_files) > 0, f"Should have at least one .txt file (history/snapshot), got: {files}"
@@ -71,22 +71,19 @@ def test_include_filters_output():
# Tests the whole stack works with the CSS Filter
def test_check_markup_include_filters_restriction(client, live_server, measure_memory_usage, datastore_path):
sleep_time_for_fetch_thread = 3
include_filters = "#sametext"
set_original_response(datastore_path=datastore_path)
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
wait_for_all_checks(client)
# Goto the edit page, add our ignore text
# Add our URL to the import page
@@ -103,15 +100,15 @@ def test_check_markup_include_filters_restriction(client, live_server, measure_m
)
assert bytes(include_filters.encode('utf-8')) in res.data
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
wait_for_all_checks(client)
# Make a change
set_modified_response(datastore_path=datastore_path)
# Trigger a check
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
wait_for_all_checks(client)
# It should have 'has-unread-changes' still
# Because it should be looking at only that 'sametext' id
+2 -4
@@ -5,6 +5,8 @@ from flask import url_for
from .util import live_server_setup, wait_for_all_checks, extract_rss_token_from_UI, get_UUID_for_tag_name, extract_UUID_from_client, delete_all_watches
import os
from ..store import ChangeDetectionStore
# def test_setup(client, live_server, measure_memory_usage, datastore_path):
# live_server_setup(live_server) # Setup on conftest per function
@@ -487,7 +489,6 @@ def test_tag_json_persistence(client, live_server, measure_memory_usage, datasto
- Tag deletion removes tag.json file
"""
import json
from changedetectionio.store import ChangeDetectionStore
datastore = client.application.config.get('DATASTORE')
@@ -569,9 +570,6 @@ def test_tag_json_migration_update_27(client, live_server, measure_memory_usage,
This simulates a pre-update_27 datastore and verifies migration works.
"""
import json
from changedetectionio.store import ChangeDetectionStore
datastore = client.application.config.get('DATASTORE')
# 1. Create multiple tags
tag_names = ['migration-tag-1', 'migration-tag-2', 'migration-tag-3']
@@ -106,7 +106,7 @@ def test_consistent_history(client, live_server, measure_memory_usage, datastore
# Find the snapshot one
for fname in files_in_watch_dir:
if fname != 'history.txt' and fname != 'watch.json' and 'html' not in fname:
if fname != 'history.txt' and fname != 'watch.json' and fname != 'last-checksum.txt' and 'html' not in fname:
if strtobool(os.getenv("TEST_WITH_BROTLI")):
assert fname.endswith('.br'), "Forced TEST_WITH_BROTLI then it should be a .br filename"
@@ -123,11 +123,18 @@ def test_consistent_history(client, live_server, measure_memory_usage, datastore
assert json_obj['watching'][w]['title'], "Watch should have a title set"
assert contents.startswith(watch_title + "x"), f"Snapshot contents in file {fname} should start with '{watch_title}x', got '{contents}'"
# With new format, we also have watch.json, so 4 files total
# With new format, we have watch.json, so 4 files minimum
# Note: last-checksum.txt may or may not exist - it gets cleared by settings changes,
# and this test changes settings before checking files
# This assertion should be AFTER the loop, not inside it
if os.path.exists(changedetection_json):
assert len(files_in_watch_dir) == 4, "Should be four files in the dir with new format: watch.json, html.br snapshot, history.txt and the extracted text snapshot"
# 4 required files: watch.json, html.br, history.txt, extracted text snapshot
# last-checksum.txt is optional (cleared by settings changes in this test)
assert len(files_in_watch_dir) >= 4 and len(files_in_watch_dir) <= 5, f"Should be 4-5 files in the dir with new format (last-checksum.txt is optional). Found {len(files_in_watch_dir)}: {files_in_watch_dir}"
else:
assert len(files_in_watch_dir) == 3, "Should be just three files in the dir with legacy format: html.br snapshot, history.txt and the extracted text snapshot"
# 3 required files: html.br, history.txt, extracted text snapshot
# last-checksum.txt is optional
assert len(files_in_watch_dir) >= 3 and len(files_in_watch_dir) <= 4, f"Should be 3-4 files in the dir with legacy format (last-checksum.txt is optional). Found {len(files_in_watch_dir)}: {files_in_watch_dir}"
# Check that 'default' Watch vars aren't accidentally being saved
if os.path.exists(changedetection_json):
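Because last-checksum.txt is optional here, the hunk above switches from exact file counts to a 4-5 / 3-4 range. A minimal sketch of the same rule expressed as a set check rather than a count; the per-watch directory layout is taken from this test, the helper itself is hypothetical.
# Illustrative only - "required files plus optional checksum" expressed as set membership.
import os

REQUIRED_NEW_FORMAT = {'watch.json', 'history.txt'}
OPTIONAL_FILES = {'last-checksum.txt'}

def assert_watch_dir_contents(watch_dir):
    files = set(os.listdir(watch_dir))
    missing = REQUIRED_NEW_FORMAT - files
    assert not missing, f"Missing required files: {missing}"
    # Anything left over must be a snapshot (.txt or .br) or the optional checksum file
    extras = files - REQUIRED_NEW_FORMAT - OPTIONAL_FILES
    assert all(f.endswith(('.txt', '.br')) for f in extras), f"Unexpected files: {extras}"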
@@ -0,0 +1,208 @@
#!/usr/bin/env python3
"""
Test that changing global settings or tag configurations forces reprocessing.
When settings or tag configurations change, all affected watches need to
reprocess even if their content hasn't changed, because configuration affects
the processing result.
"""
import os
import time
from flask import url_for
from .util import wait_for_all_checks
def test_settings_change_forces_reprocess(client, live_server, measure_memory_usage, datastore_path):
"""
Test that changing global settings clears all checksums to force reprocessing.
"""
# Setup test content
test_html = """<html>
<body>
<p>Test content that stays the same</p>
</body>
</html>
"""
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_html)
test_url = url_for('test_endpoint', _external=True)
# Add two watches
datastore = client.application.config.get('DATASTORE')
uuid1 = datastore.add_watch(url=test_url, extras={'title': 'Watch 1'})
uuid2 = datastore.add_watch(url=test_url, extras={'title': 'Watch 2'})
# Unpause watches
datastore.data['watching'][uuid1]['paused'] = False
datastore.data['watching'][uuid2]['paused'] = False
# First check - establishes baseline
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Verify checksum files were created
checksum1 = os.path.join(datastore_path, uuid1, 'last-checksum.txt')
checksum2 = os.path.join(datastore_path, uuid2, 'last-checksum.txt')
assert os.path.isfile(checksum1), "First check should create checksum file for watch 1"
assert os.path.isfile(checksum2), "First check should create checksum file for watch 2"
# Change global settings (any setting will do)
res = client.post(
url_for("settings.settings_page"),
data={
"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"
},
follow_redirects=True
)
assert b"Settings updated." in res.data
# Give it a moment to process
time.sleep(0.5)
# Verify ALL checksum files were deleted
assert not os.path.isfile(checksum1), "Settings change should delete checksum for watch 1"
assert not os.path.isfile(checksum2), "Settings change should delete checksum for watch 2"
# Next check should reprocess (not skip) and recreate checksums
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Verify checksum files were recreated
assert os.path.isfile(checksum1), "Reprocessing should recreate checksum file for watch 1"
assert os.path.isfile(checksum2), "Reprocessing should recreate checksum file for watch 2"
print("✓ Settings change forces reprocessing of all watches")
def test_tag_change_forces_reprocess(client, live_server, measure_memory_usage, datastore_path):
"""
Test that changing a tag configuration clears checksums only for watches with that tag.
"""
# Setup test content
test_html = """<html>
<body>
<p>Test content that stays the same</p>
</body>
</html>
"""
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_html)
test_url = url_for('test_endpoint', _external=True)
# Create a tag
datastore = client.application.config.get('DATASTORE')
tag_uuid = datastore.add_tag('Test Tag')
# Add watches - one with tag, one without
uuid_with_tag = datastore.add_watch(url=test_url, extras={'title': 'Watch With Tag', 'tags': [tag_uuid]})
uuid_without_tag = datastore.add_watch(url=test_url, extras={'title': 'Watch Without Tag'})
# Unpause watches
datastore.data['watching'][uuid_with_tag]['paused'] = False
datastore.data['watching'][uuid_without_tag]['paused'] = False
# First check - establishes baseline
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Verify checksum files were created
checksum_with = os.path.join(datastore_path, uuid_with_tag, 'last-checksum.txt')
checksum_without = os.path.join(datastore_path, uuid_without_tag, 'last-checksum.txt')
assert os.path.isfile(checksum_with), "First check should create checksum for tagged watch"
assert os.path.isfile(checksum_without), "First check should create checksum for untagged watch"
# Edit the tag (change notification_muted as an example)
tag = datastore.data['settings']['application']['tags'][tag_uuid]
res = client.post(
url_for("tags.form_tag_edit_submit", uuid=tag_uuid),
data={
'title': 'Test Tag',
'notification_muted': 'y',
'overrides_watch': 'n'
},
follow_redirects=True
)
assert b"Updated" in res.data
# Give it a moment to process
time.sleep(0.5)
# Verify ONLY the tagged watch's checksum was deleted
assert not os.path.isfile(checksum_with), "Tag change should delete checksum for watch WITH tag"
assert os.path.isfile(checksum_without), "Tag change should NOT delete checksum for watch WITHOUT tag"
# Next check should reprocess tagged watch and recreate its checksum
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Verify tagged watch's checksum was recreated
assert os.path.isfile(checksum_with), "Reprocessing should recreate checksum for tagged watch"
assert os.path.isfile(checksum_without), "Untagged watch should still have its checksum"
print("✓ Tag change forces reprocessing only for watches with that tag")
def test_tag_change_via_api_forces_reprocess(client, live_server, measure_memory_usage, datastore_path):
"""
Test that updating a tag via API also clears checksums for affected watches.
"""
# Setup test content
test_html = """<html>
<body>
<p>Test content</p>
</body>
</html>
"""
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_html)
test_url = url_for('test_endpoint', _external=True)
# Create a tag
datastore = client.application.config.get('DATASTORE')
tag_uuid = datastore.add_tag('API Test Tag')
# Add watch with tag
uuid_with_tag = datastore.add_watch(url=test_url, extras={'title': 'API Watch'})
datastore.data['watching'][uuid_with_tag]['paused'] = False
datastore.data['watching'][uuid_with_tag]['tags'] = [tag_uuid]
datastore.data['watching'][uuid_with_tag].commit()
# First check
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Verify checksum exists
checksum_file = os.path.join(datastore_path, uuid_with_tag, 'last-checksum.txt')
assert os.path.isfile(checksum_file), "First check should create checksum file"
# Update tag via API
res = client.put(
f'/api/v1/tag/{tag_uuid}',
json={'notification_muted': True},
headers={'x-api-key': datastore.data['settings']['application']['api_access_token']}
)
assert res.status_code == 200, f"API call failed with status {res.status_code}: {res.data}"
# Give it more time for async operations
time.sleep(1.0)
# Debug: Check if checksum still exists
if os.path.isfile(checksum_file):
# Read checksum to see if it changed
with open(checksum_file, 'r') as f:
checksum_content = f.read()
print(f"Checksum still exists: {checksum_content}")
# Verify checksum was deleted
assert not os.path.isfile(checksum_file), "API tag update should delete checksum"
print("✓ Tag update via API forces reprocessing")
@@ -0,0 +1,246 @@
#!/usr/bin/env python3
"""
Test the watch edited flag functionality.
This tests the private __watch_was_edited flag that tracks when writable
watch fields are modified, which prevents skipping reprocessing when the
watch configuration has changed.
"""
import os
import time
from flask import url_for
from .util import live_server_setup, wait_for_all_checks
def set_test_content(datastore_path):
"""Write test HTML content to endpoint-content.txt for test server."""
test_html = """<html>
<body>
<p>Test content for watch edited flag tests</p>
<p>This content stays the same across checks</p>
</body>
</html>
"""
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write(test_html)
def test_watch_edited_flag_lifecycle(client, live_server, measure_memory_usage, datastore_path):
"""
Test the full lifecycle of the was_edited flag:
1. Flag starts False when watch is created
2. Flag becomes True when writable fields are modified
3. Flag is reset False after worker processing
4. Flag stays False when readonly fields are modified
"""
# Setup - Add a watch
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("ui.ui_views.form_quick_watch_add"),
data={"url": test_url, "tags": "", "edit_and_watch_submit_button": "Edit > Watch"},
follow_redirects=True
)
assert b"Watch added" in res.data or b"Updated watch" in res.data
# Get the watch UUID
datastore = client.application.config.get('DATASTORE')
uuid = list(datastore.data['watching'].keys())[0]
watch = datastore.data['watching'][uuid]
# Reset flag after initial form submission (form sets fields which trigger the flag)
watch.reset_watch_edited_flag()
# Test 1: Flag should be False after reset
assert not watch.was_edited, "Flag should be False after reset"
# Test 2: Modify a writable field (title) - flag should become True
watch['title'] = 'New Title'
assert watch.was_edited, "Flag should be True after modifying writable field 'title'"
# Test 3: Reset flag manually (simulating what worker does)
watch.reset_watch_edited_flag()
assert not watch.was_edited, "Flag should be False after reset"
# Test 4: Modify another writable field (url) - flag should become True again
watch['url'] = 'https://example.com'
assert watch.was_edited, "Flag should be True after modifying writable field 'url'"
# Test 5: Reset and modify a readonly field - flag should stay False
watch.reset_watch_edited_flag()
assert not watch.was_edited, "Flag should be False after reset"
# Modify readonly field (uuid) - should not set flag
old_uuid = watch['uuid']
watch['uuid'] = 'readonly-test-uuid'
assert not watch.was_edited, "Flag should stay False when modifying readonly field 'uuid'"
watch['uuid'] = old_uuid # Restore original
# Note: Worker reset behavior is tested in test_check_removed_line_contains_trigger
# and test_watch_edited_flag_prevents_skip
print("✓ All watch edited flag lifecycle tests passed")
def test_watch_edited_flag_dict_methods(client, live_server, measure_memory_usage, datastore_path):
"""
Test that the flag is set correctly by various dict methods:
- __setitem__ (watch['key'] = value)
- update() (watch.update({'key': value}))
- setdefault() (watch.setdefault('key', default))
- pop() (watch.pop('key'))
- __delitem__ (del watch['key'])
"""
# Setup - Add a watch
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("ui.ui_views.form_quick_watch_add"),
data={"url": test_url, "tags": "", "edit_and_watch_submit_button": "Edit > Watch"},
follow_redirects=True
)
datastore = client.application.config.get('DATASTORE')
uuid = list(datastore.data['watching'].keys())[0]
watch = datastore.data['watching'][uuid]
# Test __setitem__
watch.reset_watch_edited_flag()
watch['title'] = 'Test via setitem'
assert watch.was_edited, "Flag should be True after __setitem__ on writable field"
# Test update() with dict
watch.reset_watch_edited_flag()
watch.update({'title': 'Test via update dict'})
assert watch.was_edited, "Flag should be True after update() with writable field"
# Test update() with kwargs
watch.reset_watch_edited_flag()
watch.update(title='Test via update kwargs')
assert watch.was_edited, "Flag should be True after update() kwargs with writable field"
# Test setdefault() on new key
watch.reset_watch_edited_flag()
watch.setdefault('title', 'Should not be set') # Key exists, no change
assert not watch.was_edited, "Flag should stay False when setdefault() doesn't change existing key"
watch.setdefault('custom_field', 'New value') # New key
assert watch.was_edited, "Flag should be True after setdefault() creates new writable field"
# Test pop() on writable field
watch.reset_watch_edited_flag()
watch.pop('custom_field', None)
assert watch.was_edited, "Flag should be True after pop() on writable field"
# Test __delitem__ on writable field
watch.reset_watch_edited_flag()
watch['temp_field'] = 'temp'
watch.reset_watch_edited_flag() # Reset after adding
del watch['temp_field']
assert watch.was_edited, "Flag should be True after __delitem__ on writable field"
print("✓ All dict methods correctly set the flag")
def test_watch_edited_flag_prevents_skip(client, live_server, measure_memory_usage, datastore_path):
"""
Test that the was_edited flag prevents skipping reprocessing.
When watch configuration is edited, it should reprocess even if content unchanged.
After worker processing, flag should be reset and subsequent checks can skip.
"""
# Setup test content
set_test_content(datastore_path)
# Setup - Add a watch
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("ui.ui_views.form_quick_watch_add"),
data={"url": test_url, "tags": "", "edit_and_watch_submit_button": "Edit > Watch"},
follow_redirects=True
)
assert b"Watch added" in res.data or b"Updated watch" in res.data
datastore = client.application.config.get('DATASTORE')
uuid = list(datastore.data['watching'].keys())[0]
watch = datastore.data['watching'][uuid]
# Unpause the watch (watches are paused by default in tests)
watch['paused'] = False
# Run first check to establish baseline
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Verify first check completed successfully - checksum file should exist
checksum_file = os.path.join(datastore_path, uuid, 'last-checksum.txt')
assert os.path.isfile(checksum_file), "First check should create last-checksum.txt file"
# Reset the was_edited flag (simulating clean state after processing)
watch.reset_watch_edited_flag()
assert not watch.was_edited, "Flag should be False after reset"
# Run second check without any changes - should skip via checksumFromPreviousCheckWasTheSame
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Verify it was skipped (last_check_status should indicate skip)
# Note: The actual skip is tested in test_check_removed_line_contains_trigger
# Here we're focused on the was_edited flag interaction
# Now modify the watch - flag should become True
watch['title'] = 'Modified Title'
assert watch.was_edited, "Flag should be True after modifying watch"
# Run third check - should NOT skip because was_edited=True even though content unchanged
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# After worker processing, the flag should be reset by the worker
# This reset happens in the processor's run() method after processing completes
assert not watch.was_edited, "Flag should be False after worker processing"
print("✓ was_edited flag correctly prevents skip and is reset by worker")
def test_watch_edited_flag_system_fields(client, live_server, measure_memory_usage, datastore_path):
"""
Test that system fields (readonly + additional system fields) don't trigger the flag.
"""
# Setup - Add a watch
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("ui.ui_views.form_quick_watch_add"),
data={"url": test_url, "tags": "", "edit_and_watch_submit_button": "Edit > Watch"},
follow_redirects=True
)
datastore = client.application.config.get('DATASTORE')
uuid = list(datastore.data['watching'].keys())[0]
watch = datastore.data['watching'][uuid]
# Test readonly fields from OpenAPI spec
readonly_fields = ['uuid', 'date_created', 'last_viewed']
for field in readonly_fields:
watch.reset_watch_edited_flag()
if field in watch:
old_value = watch[field]
watch[field] = 'modified-readonly-value'
assert not watch.was_edited, f"Flag should stay False when modifying readonly field '{field}'"
watch[field] = old_value # Restore
# Test additional system fields not in OpenAPI spec yet
system_fields = ['last_check_status']
for field in system_fields:
watch.reset_watch_edited_flag()
watch[field] = 'system-value'
assert not watch.was_edited, f"Flag should stay False when modifying system field '{field}'"
# Test that content-type (readonly per OpenAPI) doesn't trigger flag
watch.reset_watch_edited_flag()
watch['content-type'] = 'text/html'
assert not watch.was_edited, "Flag should stay False when modifying 'content-type' (readonly)"
print("✓ System fields correctly don't trigger the flag")
+23 -11
@@ -276,6 +276,9 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
# Yes fine, so nothing to do, don't continue to process.
process_changedetection_results = False
changed_detected = False
logger.debug(f'[{uuid}] - checksumFromPreviousCheckWasTheSame - Checksum from previous check was the same, nothing to do here.')
# Reset the edited flag since we successfully completed the check
watch.reset_watch_edited_flag()
except content_fetchers_exceptions.BrowserConnectError as e:
datastore.update_watch(uuid=uuid,
@@ -378,7 +381,7 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
if not datastore.data['watching'].get(uuid):
continue
update_obj['content-type'] = update_handler.fetcher.get_all_headers().get('content-type', '').lower()
update_obj['content-type'] = str(update_handler.fetcher.get_all_headers().get('content-type', '') or "").lower()
if not watch.get('ignore_status_codes'):
update_obj['consecutive_filter_failures'] = 0
@@ -392,6 +395,8 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
logger.debug(f"Processing watch UUID: {uuid} - xpath_data length returned {len(update_handler.xpath_data) if update_handler and update_handler.xpath_data else 'empty.'}")
if update_handler and process_changedetection_results:
try:
# Reset the edited flag BEFORE update_watch (which calls watch.update() and would set it again)
watch.reset_watch_edited_flag()
datastore.update_watch(uuid=uuid, update_obj=update_obj)
if changed_detected or not watch.history_n:
@@ -439,8 +444,22 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
logger.exception(f"Worker {worker_id} full exception details:")
datastore.update_watch(uuid=uuid, update_obj={'last_error': str(e)})
# Always record attempt count
count = watch.get('check_count', 0) + 1
final_updates = {'fetch_time': round(time.time() - fetch_start_time, 3),
'check_count': count,
}
# Record server header
try:
server_header = str(update_handler.fetcher.get_all_headers().get('server', '') or "").strip().lower()[:255]
if server_header:
final_updates['remote_server_reply'] = server_header
except Exception as e:
server_header = None
pass
if update_handler: # Could be none or empty if the processor was not found
# Always record page title (used in notifications, and can change even when the content is the same)
if update_obj.get('content-type') and 'html' in update_obj.get('content-type'):
@@ -449,17 +468,12 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
if page_title:
page_title = page_title.strip()[:2000]
logger.debug(f"UUID: {uuid} Page <title> is '{page_title}'")
datastore.update_watch(uuid=uuid, update_obj={'page_title': page_title})
final_updates['page_title'] = page_title
except Exception as e:
logger.exception(f"Worker {worker_id} full exception details:")
logger.warning(f"UUID: {uuid} Exception when extracting <title> - {str(e)}")
# Record server header
try:
server_header = update_handler.fetcher.headers.get('server', '').strip().lower()[:255]
datastore.update_watch(uuid=uuid, update_obj={'remote_server_reply': server_header})
except Exception as e:
pass
# Store favicon if necessary
if update_handler.fetcher.favicon_blob and update_handler.fetcher.favicon_blob.get('base64'):
@@ -467,14 +481,12 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
favicon_base_64=update_handler.fetcher.favicon_blob.get('base64')
)
datastore.update_watch(uuid=uuid, update_obj={'fetch_time': round(time.time() - fetch_start_time, 3),
'check_count': count})
datastore.update_watch(uuid=uuid, update_obj=final_updates)
# NOW clear fetcher content - after all processing is complete
# This is the last point where we need the fetcher data
if update_handler and hasattr(update_handler, 'fetcher') and update_handler.fetcher:
update_handler.fetcher.clear_content()
logger.debug(f"Cleared fetcher content for UUID {uuid}")
# Explicitly delete update_handler to free all references
if update_handler:
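The worker hunks above replace several small datastore.update_watch() calls (page title, server header, fetch time / check count) with a single final_updates dict written once per check. A runnable condensation of that batching, as a hypothetical helper; the field names and truncation limits are taken from the diff, the function itself is illustrative.
# Illustrative sketch only - batch the post-check fields and write them in one call.
import time

def build_final_updates(watch, fetch_start_time, page_title=None, server_header=None):
    updates = {
        'fetch_time': round(time.time() - fetch_start_time, 3),
        'check_count': watch.get('check_count', 0) + 1,
    }
    if page_title:
        updates['page_title'] = page_title.strip()[:2000]
    if server_header:
        updates['remote_server_reply'] = server_header.strip().lower()[:255]
    return updates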
+313 -54
@@ -28,7 +28,7 @@ info:
For example: `x-api-key: YOUR_API_KEY`
version: 0.1.5
version: 0.1.6
contact:
name: ChangeDetection.io
url: https://github.com/dgtlmoon/changedetection.io
@@ -126,13 +126,22 @@ components:
WatchBase:
type: object
properties:
uuid:
type: string
format: uuid
description: Unique identifier
readOnly: true
date_created:
type: [integer, 'null']
description: Unix timestamp of creation
readOnly: true
url:
type: string
format: uri
description: URL to monitor for changes
maxLength: 5000
title:
type: string
type: [string, 'null']
description: Custom title for the web page change monitor (watch), not to be confused with page_title
maxLength: 5000
tag:
@@ -156,56 +165,61 @@ components:
description: HTTP method to use
fetch_backend:
type: string
enum: [html_requests, html_webdriver]
description: Backend to use for fetching content
description: |
Backend to use for fetching content. Common values:
- `system` (default) - Use the system-wide default fetcher
- `html_requests` - Fast requests-based fetcher
- `html_webdriver` - Browser-based fetcher (Playwright/Puppeteer)
- `extra_browser_*` - Custom browser configurations (if configured)
- Plugin-provided fetchers (if installed)
pattern: '^(system|html_requests|html_webdriver|extra_browser_.+)$'
default: system
headers:
type: object
additionalProperties:
type: string
description: HTTP headers to include in requests
body:
type: string
type: [string, 'null']
description: HTTP request body
maxLength: 5000
proxy:
type: string
type: [string, 'null']
description: Proxy configuration
maxLength: 5000
ignore_status_codes:
type: [boolean, 'null']
description: Ignore HTTP status code errors (boolean or null)
webdriver_delay:
type: integer
type: [integer, 'null']
description: Delay in seconds for webdriver
webdriver_js_execute_code:
type: string
type: [string, 'null']
description: JavaScript code to execute
maxLength: 5000
time_between_check:
type: object
properties:
weeks:
type: integer
type: [integer, 'null']
minimum: 0
maximum: 52000
nullable: true
days:
type: integer
type: [integer, 'null']
minimum: 0
maximum: 365000
nullable: true
hours:
type: integer
type: [integer, 'null']
minimum: 0
maximum: 8760000
nullable: true
minutes:
type: integer
type: [integer, 'null']
minimum: 0
maximum: 525600000
nullable: true
seconds:
type: integer
type: [integer, 'null']
minimum: 0
maximum: 31536000000
nullable: true
description: Time intervals between checks. All fields must be non-negative. At least one non-zero value required when not using default settings.
time_between_check_use_default:
type: boolean
@@ -219,11 +233,11 @@ components:
maxItems: 100
description: Notification URLs for this web page change monitor (watch). Maximum 100 URLs.
notification_title:
type: string
type: [string, 'null']
description: Custom notification title
maxLength: 5000
notification_body:
type: string
type: [string, 'null']
description: Custom notification body
maxLength: 5000
notification_format:
@@ -231,7 +245,7 @@ components:
enum: ['text', 'html', 'htmlcolor', 'markdown', 'System default']
description: Format for notifications
track_ldjson_price_data:
type: boolean
type: [boolean, 'null']
description: Whether to track JSON-LD price data
browser_steps:
type: array
@@ -239,17 +253,14 @@ components:
type: object
properties:
operation:
type: string
type: [string, 'null']
maxLength: 5000
nullable: true
selector:
type: string
type: [string, 'null']
maxLength: 5000
nullable: true
optional_value:
type: string
type: [string, 'null']
maxLength: 5000
nullable: true
required: [operation, selector, optional_value]
additionalProperties: false
maxItems: 100
@@ -260,16 +271,197 @@ components:
default: text_json_diff
description: Optional processor mode to use for change detection. Defaults to `text_json_diff` if not specified.
# Content Filtering
include_filters:
type: array
items:
type: string
maxLength: 5000
maxItems: 100
description: CSS/XPath selectors to extract specific content from the page
subtractive_selectors:
type: array
items:
type: string
maxLength: 5000
maxItems: 100
description: CSS/XPath selectors to remove content from the page
ignore_text:
type: array
items:
type: string
maxLength: 5000
maxItems: 100
description: Text patterns to ignore in change detection
trigger_text:
type: array
items:
type: string
maxLength: 5000
maxItems: 100
description: Text/regex patterns that must be present to trigger a change
text_should_not_be_present:
type: array
items:
type: string
maxLength: 5000
maxItems: 100
description: Text that should NOT be present (triggers alert if found)
extract_text:
type: array
items:
type: string
maxLength: 5000
maxItems: 100
description: Regex patterns to extract specific text after filtering
# Text Processing
trim_text_whitespace:
type: boolean
default: false
description: Strip leading/trailing whitespace from text
sort_text_alphabetically:
type: boolean
default: false
description: Sort lines alphabetically before comparison
remove_duplicate_lines:
type: boolean
default: false
description: Remove duplicate lines from content
check_unique_lines:
type: boolean
default: false
description: Compare against all history for unique lines
strip_ignored_lines:
type: [boolean, 'null']
description: Remove lines matching ignore patterns
# Change Detection Filters
filter_text_added:
type: boolean
default: true
description: Include added text in change detection
filter_text_removed:
type: boolean
default: true
description: Include removed text in change detection
filter_text_replaced:
type: boolean
default: true
description: Include replaced text in change detection
# Restock/Price Detection
in_stock_only:
type: boolean
default: true
description: Only trigger on in-stock transitions (restock_diff processor)
follow_price_changes:
type: boolean
default: true
description: Monitor and track price changes (restock_diff processor)
price_change_threshold_percent:
type: [number, 'null']
description: Minimum price change percentage to trigger notification
has_ldjson_price_data:
type: [boolean, 'null']
description: Whether page has LD-JSON price data (auto-detected)
readOnly: true
# Notifications
notification_screenshot:
type: boolean
default: false
description: Include screenshot in notifications (if supported by notification URL)
filter_failure_notification_send:
type: boolean
default: true
description: Send notification when filters fail to match content
# History & Display
use_page_title_in_list:
type: [boolean, 'null']
description: Display page title in watch list (null = use system default)
history_snapshot_max_length:
type: [integer, 'null']
minimum: 1
maximum: 1000
description: Maximum number of history snapshots to keep (null = use system default)
# Scheduling
time_schedule_limit:
type: object
description: Weekly schedule limiting when checks can run
properties:
enabled:
type: boolean
default: false
monday:
$ref: '#/components/schemas/DaySchedule'
tuesday:
$ref: '#/components/schemas/DaySchedule'
wednesday:
$ref: '#/components/schemas/DaySchedule'
thursday:
$ref: '#/components/schemas/DaySchedule'
friday:
$ref: '#/components/schemas/DaySchedule'
saturday:
$ref: '#/components/schemas/DaySchedule'
sunday:
$ref: '#/components/schemas/DaySchedule'
# Conditions (advanced logic)
conditions:
type: array
items:
type: object
properties:
field:
type: string
description: Field to check (e.g., 'page_filtered_text', 'page_title')
operator:
type: string
description: Comparison operator (e.g., 'contains_regex', 'equals', 'not_equals')
value:
type: string
description: Value to compare against
required: [field, operator, value]
maxItems: 100
description: Array of condition rules for change detection logic (empty array when not set)
conditions_match_logic:
type: string
enum: ['ALL', 'ANY']
default: 'ALL'
description: Logic operator - ALL (match all conditions) or ANY (match any condition)
DaySchedule:
type: object
properties:
enabled:
type: boolean
default: true
start_time:
type: string
pattern: '^([0-1]?[0-9]|2[0-3]):[0-5][0-9]$'
default: '00:00'
description: Start time in HH:MM format
duration:
type: object
properties:
hours:
type: string
pattern: '^[0-9]+$'
default: '24'
minutes:
type: string
pattern: '^[0-9]+$'
default: '00'
Watch:
allOf:
- $ref: '#/components/schemas/WatchBase'
- type: object
properties:
uuid:
type: string
format: uuid
description: Unique identifier for the web page change monitor (watch)
readOnly: true
last_checked:
type: integer
description: Unix timestamp of last check
@@ -278,9 +470,10 @@ components:
type: integer
description: Unix timestamp of last change
readOnly: true
x-computed: true
last_error:
type: string
description: Last error message
type: [string, boolean, 'null']
description: Last error message (false when no error, string when error occurred, null if not checked yet)
readOnly: true
last_viewed:
type: integer
@@ -291,6 +484,61 @@ components:
format: string
description: The watch URL rendered in case of any Jinja2 markup, always use this for listing.
readOnly: true
x-computed: true
page_title:
type: [string, 'null']
description: HTML <title> tag extracted from the page
readOnly: true
check_count:
type: integer
description: Total number of checks performed
readOnly: true
fetch_time:
type: number
description: Duration of last fetch in seconds
readOnly: true
previous_md5:
type: [string, boolean]
description: MD5 hash of previous content (false if not set)
readOnly: true
previous_md5_before_filters:
type: [string, boolean]
description: MD5 hash before filters applied (false if not set)
readOnly: true
consecutive_filter_failures:
type: integer
description: Counter for consecutive filter match failures
readOnly: true
last_notification_error:
type: [string, 'null']
description: Last notification error message
readOnly: true
notification_alert_count:
type: integer
description: Number of notifications sent
readOnly: true
content-type:
type: [string, 'null']
description: Content-Type from last fetch
readOnly: true
remote_server_reply:
type: [string, 'null']
description: Server header from last response
readOnly: true
browser_steps_last_error_step:
type: [integer, 'null']
description: Last browser step that caused an error
readOnly: true
viewed:
type: [integer, boolean]
description: Computed property - true if watch has been viewed, false otherwise (deprecated, use last_viewed instead)
readOnly: true
x-computed: true
history_n:
type: integer
description: Number of history snapshots available
readOnly: true
x-computed: true
CreateWatch:
allOf:
@@ -301,34 +549,45 @@ components:
UpdateWatch:
allOf:
- $ref: '#/components/schemas/WatchBase'
- $ref: '#/components/schemas/WatchBase' # Extends WatchBase for user-settable fields
- type: object
properties:
last_viewed:
type: integer
description: Unix timestamp in seconds of the last time the watch was viewed. Setting it to a value higher than `last_changed` in the "Update watch" endpoint marks the watch as viewed.
minimum: 0
# Note: ReadOnly and @property fields are filtered out in the backend before update
# We don't use unevaluatedProperties:false here to allow roundtrip GET/PUT workflows
# where the response includes computed fields that should be silently ignored
Tag:
type: object
properties:
uuid:
type: string
format: uuid
description: Unique identifier for the tag
readOnly: true
title:
type: string
description: Tag title
maxLength: 5000
notification_urls:
type: array
items:
type: string
description: Default notification URLs for web page change monitors (watches) with this tag
notification_muted:
type: boolean
description: Whether notifications are muted for this tag
allOf:
- $ref: '#/components/schemas/WatchBase'
- type: object
properties:
overrides_watch:
type: [boolean, 'null']
description: |
Whether this tag's settings override watch settings for all watches in this tag/group.
- true: Tag settings override watch settings
- false: Tag settings do not override (watches use their own settings)
- null: Not decided yet / inherit default behavior
# Future: Aggregated statistics from all watches with this tag
# check_count:
# type: integer
# description: Sum of check_count from all watches with this tag
# readOnly: true
# x-computed: true
# last_checked:
# type: integer
# description: Most recent last_checked timestamp from all watches with this tag
# readOnly: true
# x-computed: true
# last_changed:
# type: integer
# description: Most recent last_changed timestamp from all watches with this tag
# readOnly: true
# x-computed: true
CreateTag:
allOf:
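With the widened nullable types, the documented fetch_backend values, and the new conditions schema, a PUT body can round-trip fields straight from a GET response. An illustrative request follows; the /api/v1/watch/{uuid} path and host are assumptions based on the API conventions shown elsewhere in this diff, while the field values follow the schema above.
# Illustrative request only - endpoint path and host are assumptions.
import requests

payload = {
    "title": None,                                   # nullable per the updated schema
    "fetch_backend": "system",                       # documented default / pattern
    "time_between_check": {"hours": 3, "minutes": None},
    "time_between_check_use_default": False,
    "conditions": [
        {"field": "page_filtered_text", "operator": "contains_regex", "value": "in stock"}
    ],
    "conditions_match_logic": "ANY",
}

res = requests.put(
    "http://localhost:5000/api/v1/watch/REPLACE-WITH-UUID",
    json=payload,
    headers={"x-api-key": "YOUR_API_KEY"},
    timeout=30,
)
res.raise_for_status()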
+380 -39
File diff suppressed because one or more lines are too long
+4 -5
@@ -5,15 +5,14 @@ flask-compress
# 0.6.3 included compatibility fix for werkzeug 3.x (2.x had deprecation of url handlers)
flask-login>=0.6.3
flask-paginate
flask_expects_json~=1.7
flask_restful
flask_cors # For the Chrome extension to operate
# janus # No longer needed - using pure threading.Queue for multi-loop support
flask_wtf~=1.2
flask~=3.1
flask-socketio~=5.6.0
python-socketio~=5.16.0
python-engineio~=4.13.0
python-socketio~=5.16.1
python-engineio~=4.13.1
inscriptis~=2.2
pytz
timeago~=1.0
@@ -126,8 +125,8 @@ greenlet >= 3.0.3
# Default SOCKETIO_MODE=threading is recommended for better compatibility
gevent
# Pinned or it causes problems with flask_expects_json which seems unmaintained
referencing==0.35.1
# Previously pinned for flask_expects_json (removed 2026-02). Unpinning for now.
referencing
# For conditions
panzi-json-logic