Text filters - Process subtractive_selectors first

LLM - Fixing summary cache miss-hit (#4136)
LLM - UI - Message that 'AI Intent' (triggers) need a bigger model
2026-05-30 21:41:07 +00:00 · 2026-05-15 10:15:00 +02:00 · 2026-05-12 17:44:36 +02:00 · 2026-05-12 17:40:54 +02:00 · 2026-05-12 17:36:39 +02:00 · 2026-05-12 17:28:43 +02:00
35 changed files with 973 additions and 157 deletions
@@ -30,7 +30,7 @@ Stop drowning in noise. Connect any LLM (OpenAI, Gemini, Anthropic, Ollama and m

 **AI change summaries** — instead of staring at a raw diff, your notification reads _"Price dropped from $89.99 to $67.00"_ or _"3 new products added to the listing"_. Works globally or per-watch, with full control over the prompt.

-Works with any model you already pay for — GPT-4o-mini and Gemini Flash handle this well at fractions of a cent per check. Or run it entirely locally with Ollama. Powered by [LiteLLM](https://github.com/BerriAI/litellm), giving you seamless access to [100+ supported providers and models](https://docs.litellm.ai/docs/providers).
+Works with any model you already pay for — GPT-4o-mini and Gemini Flash handle this well at fractions of a cent per check. Or run it entirely locally with **Ollama**, **vLLM**, **LM Studio**, or any **OpenAI-compatible self-hosted endpoint** — pick the *OpenAI-compatible (vLLM, LM Studio, llama.cpp)* option in the provider dropdown and point it at your server's `/v1` URL. Powered by [LiteLLM](https://github.com/BerriAI/litellm), giving you seamless access to [100+ supported providers and models](https://docs.litellm.ai/docs/providers).

 [<img src="./docs/LLM-change-summary.jpeg" style="max-width:100%;" alt="AI-powered website change detection — plain language change summaries and smart alert rules"  title="AI website change detection with LLM change summaries and intelligent alert filtering" />](https://changedetection.io?src=github)

@@ -36,6 +36,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
        default['llm'] = {
            'llm_model':                         _stored_llm.get('model', ''),
            'llm_api_base':                      _stored_llm.get('api_base', ''),
+            'llm_provider_kind':                 _stored_llm.get('provider_kind', ''),
+            'llm_local_token_multiplier':        _stored_llm.get('local_token_multiplier', 5),
            'llm_change_summary_default':        datastore.data['settings']['application'].get('llm_change_summary_default', ''),
            'llm_override_diff_with_summary':    datastore.data['settings']['application'].get('llm_override_diff_with_summary', True),
            'llm_restock_use_fallback_extract':  datastore.data['settings']['application'].get('llm_restock_use_fallback_extract', True),
@@ -148,6 +150,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
                    'model': (llm_data.get('llm_model') or '').strip(),
                    'api_key': effective_api_key,
                    'api_base': (llm_data.get('llm_api_base') or '').strip(),
+                    # Identifies a self-hosted OpenAI-compatible endpoint so reasoning-friendly
+                    # token caps can be applied conditionally (cloud-LLM defaults stay tight).
+                    'provider_kind': (llm_data.get('llm_provider_kind') or '').strip(),
+                    'local_token_multiplier': int(llm_data.get('llm_local_token_multiplier') or 5),
                    'token_budget_month': existing_llm.get('token_budget_month', 0),
                    'max_input_chars': existing_llm.get('max_input_chars', 0),
                    **preserved_counters,
@@ -1,4 +1,7 @@
+import json
+import logging
 import os
+import re

 from flask import Blueprint, jsonify, redirect, url_for, flash
 from flask_babel import gettext
@@ -8,6 +11,44 @@ from changedetectionio.store import ChangeDetectionStore
 from changedetectionio.auth_decorator import login_optionally_required


+class _LiteLLMWarningCapture(logging.Handler):
+    """Capture warnings emitted on the 'LiteLLM' stdlib logger during a single call.
+
+    litellm.get_valid_models() catches HTTP/auth errors internally, logs a warning,
+    and returns []. Without capturing that warning we can't tell the user *why*
+    no models came back (bad key vs. offline vs. genuinely empty model list).
+    """
+    def __init__(self):
+        super().__init__(level=logging.WARNING)
+        self.messages = []
+
+    def emit(self, record):
+        try:
+            self.messages.append(record.getMessage())
+        except Exception:
+            pass
+
+
+def _humanize_litellm_error(raw: str) -> str:
+    # litellm warnings typically look like:
+    #   "Error getting valid models: Failed to get models: { 'error': { 'message': '...' } }"
+    # Pull the inner provider message when present; otherwise trim the boilerplate.
+    if not raw:
+        return raw
+    m = re.search(r'\{.*\}', raw, re.DOTALL)
+    if m:
+        try:
+            body = json.loads(m.group(0))
+            inner = (body.get('error') or {}).get('message') or body.get('message')
+            if inner:
+                return inner
+        except Exception:
+            pass
+    cleaned = re.sub(r'^Error getting valid models:\s*', '', raw)
+    cleaned = re.sub(r'^Failed to get models:\s*', '', cleaned).strip()
+    return cleaned[:500]
+
+
 def construct_llm_blueprint(datastore: ChangeDetectionStore):
    llm_blueprint = Blueprint('llm', __name__)
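
The capture-and-humanize pattern in this hunk can be tried in isolation. The following is a minimal sketch using only the standard library: `list_models_stub` and the `demo.capture` logger name are hypothetical stand-ins for `litellm.get_valid_models()` and the `LiteLLM` logger, and the warning strings are made up for illustration. It suggests that a JSON body yields the inner provider message, while a Python-repr style body (single quotes) does not parse as JSON and falls back to prefix trimming.

```python
import json
import logging
import re


class WarningCapture(logging.Handler):
    """Collect WARNING-and-above messages emitted while the handler is attached."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def humanize(raw: str) -> str:
    """Prefer the inner JSON error message; otherwise trim the known prefixes."""
    if not raw:
        return raw
    m = re.search(r'\{.*\}', raw, re.DOTALL)
    if m:
        try:
            body = json.loads(m.group(0))
            inner = (body.get('error') or {}).get('message') or body.get('message')
            if inner:
                return inner
        except Exception:
            pass  # not valid JSON (or unexpected shape): fall through to trimming
    cleaned = re.sub(r'^Error getting valid models:\s*', '', raw)
    cleaned = re.sub(r'^Failed to get models:\s*', '', cleaned).strip()
    return cleaned[:500]


def list_models_stub(log: logging.Logger) -> list:
    """Hypothetical stand-in: logs a warning and returns [] instead of raising."""
    log.warning('Error getting valid models: Failed to get models: '
                '{"error": {"message": "Incorrect API key provided"}}')
    return []


log = logging.getLogger('demo.capture')  # assumed name, not the real 'LiteLLM' logger
log.propagate = False                    # keep the demo quiet on stderr
capture = WarningCapture()
log.addHandler(capture)
try:
    models = list_models_stub(log)
finally:
    log.removeHandler(capture)           # always detach, even if the call raises

json_case = humanize(capture.messages[-1])
repr_case = humanize("Error getting valid models: Failed to get models: {'error': 'offline'}")
print(json_case)
print(repr_case)
```

Attaching the handler only for the duration of the call, and detaching it in `finally`, keeps warnings from unrelated requests out of `capture.messages`.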

@@ -30,19 +71,38 @@ def construct_llm_blueprint(datastore: ChangeDetectionStore):
            api_key = (datastore.data['settings']['application'].get('llm') or {}).get('api_key', '')
            logger.debug("LLM model list: no api_key in request, using stored key")

-        _PREFIXES = {'gemini': 'gemini/', 'ollama': 'ollama/', 'openrouter': 'openrouter/'}
+        _PREFIXES = {'gemini': 'gemini/', 'ollama': 'ollama/', 'openrouter': 'openrouter/',
+                     'openai_compatible': 'openai/'}
+        # vLLM / LM Studio / llama.cpp speak OpenAI's wire format — route through litellm's
+        # 'openai' provider but keep the UI-level name distinct from cloud OpenAI.
+        _LITELLM_PROVIDER = {'openai_compatible': 'openai'}
        prefix = _PREFIXES.get(provider, '')
+        litellm_provider = _LITELLM_PROVIDER.get(provider, provider)

        try:
            import litellm
-            logger.debug(f"LLM model list: calling litellm.get_valid_models provider={provider!r} api_base={api_base!r}")
-            raw = litellm.get_valid_models(
-                check_provider_endpoint=True,
-                custom_llm_provider=provider,
-                api_key=api_key or None,
-                api_base=api_base or None,
-            ) or []
+            logger.debug(f"LLM model list: calling litellm.get_valid_models provider={provider!r} (litellm={litellm_provider!r}) api_base={api_base!r}")
+
+            capture = _LiteLLMWarningCapture()
+            litellm_logger = logging.getLogger('LiteLLM')
+            litellm_logger.addHandler(capture)
+            try:
+                raw = litellm.get_valid_models(
+                    check_provider_endpoint=True,
+                    custom_llm_provider=litellm_provider,
+                    api_key=api_key or None,
+                    api_base=api_base or None,
+                ) or []
+            finally:
+                litellm_logger.removeHandler(capture)
+
            models = sorted({(m if m.startswith(prefix) else prefix + m) for m in raw})
+
+            if not models and capture.messages:
+                err = _humanize_litellm_error(capture.messages[-1])
+                logger.debug(f"LLM model list: 0 models, surfacing captured litellm warning: {err!r}")
+                return jsonify({'models': [], 'error': err}), 400
+
            logger.debug(f"LLM model list: got {len(models)} models for provider={provider!r}")
            return jsonify({'models': models, 'error': None})
        except Exception as e:
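
As a rough illustration of the prefix handling in this hunk, the sketch below reproduces the `_PREFIXES` and `_LITELLM_PROVIDER` lookups and the set comprehension as a standalone function. The model names are invented for the example, and `normalise_models` is a hypothetical helper name, not a function from the codebase.

```python
PREFIXES = {'gemini': 'gemini/', 'ollama': 'ollama/', 'openrouter': 'openrouter/',
            'openai_compatible': 'openai/'}
LITELLM_PROVIDER = {'openai_compatible': 'openai'}


def normalise_models(provider: str, raw: list) -> list:
    """Add the provider prefix where it is missing, de-duplicate, and sort."""
    prefix = PREFIXES.get(provider, '')
    return sorted({(m if m.startswith(prefix) else prefix + m) for m in raw})


# UI-level 'openai_compatible' is routed through the 'openai' provider name.
routed = LITELLM_PROVIDER.get('openai_compatible', 'openai_compatible')

# Made-up model ids: one already prefixed, one duplicated.
local_models = normalise_models('openai_compatible',
                                ['Qwen/Qwen3-8B', 'openai/llama-3', 'Qwen/Qwen3-8B'])

# Providers without an entry get an empty prefix, so names pass through unchanged.
cloud_models = normalise_models('anthropic', ['claude-example'])

print(routed, local_models, cloud_models)
```

Because the prefix for the self-hosted option is `openai/`, a saved model id alone cannot distinguish it from cloud OpenAI, which is presumably why the diff also persists `provider_kind`.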
@@ -67,14 +127,18 @@ def construct_llm_blueprint(datastore: ChangeDetectionStore):

        try:
            logger.debug(f"LLM connection test: sending test prompt to model={model!r}")
+            # Reuse the same multiplier path the production calls use, so cloud providers
+            # stay on a small base cap (matching upstream's pre-existing behavior) and only
+            # 'openai_compatible' endpoints opt into the reasoning-friendly headroom.
+            from changedetectionio.llm.evaluator import apply_local_token_multiplier
            text, total_tokens, input_tokens, output_tokens = completion(
                model=model,
                messages=[{'role': 'user', 'content':
-                    'Reply with exactly five words confirming you are ready.'}],
+                    'Respond with just the word: ready'}],
                api_key=llm_cfg.get('api_key') or None,
                api_base=api_base or None,
-                timeout=20,
-                max_tokens=200,
+                timeout=30,
+                max_tokens=apply_local_token_multiplier(200, llm_cfg),
            )
            reply = text.strip()
            if not reply:
@@ -30,6 +30,10 @@
      <div class="stab-overview-text">
        <strong>{{ _('Intent filtering') }}</strong>
        <p>{{ _('Each watch or tag can carry a plain-text intent — %(ex1)s or %(ex2)s. On every detected change the AI evaluates the diff against it and suppresses irrelevant noise.', ex1='<strong>"notify me only when the price drops"</strong>', ex2='<strong>"alert when the item goes out of stock"</strong>') | safe }}</p>
+        <p><small>{{ _('Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very small models (≤3B) may misjudge numeric comparisons.',
+              local='<code>qwen2.5:7b</code>',
+              gpt='<code>gpt-4o-mini</code>',
+              gemini='<code>gemini-2.0-flash</code>') | safe }}</small></p>
      </div>
    </div>
    <div class="stab-overview-feature">
@@ -111,6 +115,7 @@
      </optgroup>
      <optgroup label="{{ _('Local / Self-hosted') }}">
        <option value="ollama">Ollama (local)</option>
+        <option value="openai_compatible">{{ _('OpenAI-compatible (vLLM, LM Studio, llama.cpp)') }}</option>
      </optgroup>
      <optgroup label="OpenRouter">
        <option value="openrouter">OpenRouter (200+ models)</option>
@@ -127,6 +132,18 @@
    <span class="pure-form-message-inline">{{ _('Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers.') }}</span>
  </div>

+  {# Hidden field carrying the dropdown selection so the backend knows when to apply
+     reasoning-friendly token caps (only for self-hosted OpenAI-compatible endpoints). #}
+  {{ form.llm.form.llm_provider_kind() }}
+
+  <div class="pure-control-group" id="llm-local-advanced-group" style="display:none">
+    <label for="{{ form.llm.form.llm_local_token_multiplier.id }}">{{ form.llm.form.llm_local_token_multiplier.label.text }}</label>
+    {{ form.llm.form.llm_local_token_multiplier() }}
+    <span class="pure-form-message-inline">
+      {{ _('Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to %(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps.', default='5x') | safe }}
+    </span>
+  </div>
+
  <div class="pure-control-group" id="llm-fetch-group" style="display:none">
    <label></label>
    <button type="button" id="llm-fetch-btn" class="pure-button button-xsmall" onclick="llmFetchModels()"
@@ -377,14 +394,15 @@

 <script>
 (function () {
-  const LIVE_PROVIDERS = ['openai', 'anthropic', 'gemini', 'ollama', 'openrouter'];
+  const LIVE_PROVIDERS = ['openai', 'anthropic', 'gemini', 'ollama', 'openai_compatible', 'openrouter'];
  const BASE_DEFAULTS  = { ollama: 'http://localhost:11434' };
  const KEY_HINTS = {
-    openai:     '{{ _("platform.openai.com → API keys") }}',
-    anthropic:  '{{ _("console.anthropic.com → API keys") }}',
-    gemini:     '{{ _("aistudio.google.com → Get API key") }}',
-    ollama:     '{{ _("No API key needed for local Ollama") }}',
-    openrouter: '{{ _("openrouter.ai → Keys") }}',
+    openai:             '{{ _("platform.openai.com → API keys") }}',
+    anthropic:          '{{ _("console.anthropic.com → API keys") }}',
+    gemini:             '{{ _("aistudio.google.com → Get API key") }}',
+    ollama:             '{{ _("No API key needed for local Ollama") }}',
+    openai_compatible:  '{{ _("Bearer token for your self-hosted server (vLLM, LM Studio, etc.)") }}',
+    openrouter:         '{{ _("openrouter.ai → Keys") }}',
  };

  window.llmDisclaimerToggle = function (cb) {
@@ -393,20 +411,31 @@
  };

  window.llmOnProviderChange = function (provider) {
-    const fetchGroup  = document.getElementById('llm-fetch-group');
-    const baseGroup   = document.getElementById('llm-base-group');
-    const modelSelGrp = document.getElementById('llm-model-select-group');
-    const baseField   = document.querySelector('[name="llm-llm_api_base"]');
-    const hint        = document.getElementById('llm-key-hint');
+    const fetchGroup    = document.getElementById('llm-fetch-group');
+    const baseGroup     = document.getElementById('llm-base-group');
+    const modelSelGrp   = document.getElementById('llm-model-select-group');
+    const localAdvGrp   = document.getElementById('llm-local-advanced-group');
+    const baseField     = document.querySelector('[name="llm-llm_api_base"]');
+    const kindField     = document.querySelector('[name="llm-llm_provider_kind"]');
+    const hint          = document.getElementById('llm-key-hint');

    fetchGroup.style.display = LIVE_PROVIDERS.includes(provider) ? '' : 'none';

-    const needsBase = provider === 'ollama';
+    const needsBase = provider === 'ollama' || provider === 'openai_compatible';
    baseGroup.style.display = needsBase ? '' : 'none';
    if (BASE_DEFAULTS[provider] !== undefined) {
      if (!baseField.value) baseField.value = BASE_DEFAULTS[provider];
    }

+    // Persist the dropdown selection so the backend can branch on provider kind
+    // (currently only 'openai_compatible' triggers the local-multiplier code path).
+    if (kindField) kindField.value = provider || '';
+
+    // Show the local-endpoint advanced settings (token multiplier) only for the
+    // OpenAI-compatible self-hosted option. Cloud providers and Ollama get the
+    // original tight caps and don't see this section at all.
+    if (localAdvGrp) localAdvGrp.style.display = (provider === 'openai_compatible') ? '' : 'none';
+
    hint.textContent = KEY_HINTS[provider] || '';
    modelSelGrp.style.display = 'none';
    document.getElementById('llm-fetch-status').textContent = '';
@@ -444,7 +473,7 @@

      if (!data.models || data.models.length === 0) {
        statusEl.style.color = '#e67e22';
-        statusEl.textContent = '{{ _("No models returned — check your API key.") }}';
+        statusEl.textContent = '{{ _("No models returned by the provider.") }}';
        selGroup.style.display = 'none';
        return;
      }
@@ -516,6 +545,11 @@
    if (m.startsWith('gemini/'))       guessed = 'gemini';
    else if (m.startsWith('ollama/'))  guessed = 'ollama';
    else if (m.startsWith('openrouter/')) guessed = 'openrouter';
+    else if (m.startsWith('openai/')) {
+      // openai/<model> + custom api_base = self-hosted OpenAI-compatible (vLLM etc.)
+      const baseField = document.querySelector('[name="llm-llm_api_base"]');
+      guessed = (baseField && baseField.value.trim()) ? 'openai_compatible' : 'openai';
+    }
    else if (m.startsWith('claude'))   guessed = 'anthropic';
    else if (m.startsWith('gpt') || m.startsWith('o1') || m.startsWith('o3')) guessed = 'openai';

@@ -198,10 +198,12 @@ def construct_blueprint(datastore: ChangeDetectionStore):
        best_from = watch.get_from_version_based_on_last_viewed
        from_version      = request.args.get('from_version', best_from if best_from else dates[-2])
        to_version        = request.args.get('to_version', dates[-1])
-        all_changes       = request.args.get('all_changes', '0') == '1'
-        ignore_whitespace = request.args.get('ignore_whitespace', '0') == '1'
-        show_removed      = request.args.get('removed', '1') == '1'
-        show_added        = request.args.get('added', '1') == '1'
+        from changedetectionio.llm.evaluator import DiffPrefs
+        prefs             = DiffPrefs.from_request_args(request.args)
+        all_changes       = prefs.all_changes
+        ignore_whitespace = prefs.ignore_whitespace
+        show_removed      = prefs.show_removed
+        show_added        = prefs.show_added

        def _prep(text):
            """Optionally normalise whitespace on each line before diffing."""
@@ -263,21 +265,17 @@ def construct_blueprint(datastore: ChangeDetectionStore):
            return jsonify({'summary': None, 'error': 'No differences found'})

        from changedetectionio.llm.evaluator import (
-            summarise_change, get_effective_summary_prompt,
+            summarise_change, get_effective_summary_prompt, build_summary_cache_prompt,
            is_global_token_budget_exceeded, get_global_token_budget_month,
            LLMInputTooLargeError,
        )

-        effective_prompt = get_effective_summary_prompt(watch, datastore)
-        from changedetectionio.llm.prompt_builder import build_change_summary_system_prompt
-        # Diff-pref flags + system prompt are part of the cache key so prompt changes bust the cache
+        # Diff-pref flags + system prompt are part of the cache key so prompt changes bust the cache.
        _max_summary_tokens = datastore.data['settings']['application'].get('llm_max_summary_tokens', 3000)
-        cache_prompt = (
-            effective_prompt
-            + f'\x00prefs:all={int(all_changes)},ws={int(ignore_whitespace)}'
-              f',rm={int(show_removed)},add={int(show_added)}'
-            + f'\x00sys:{build_change_summary_system_prompt()}'
-            + f'\x00max_tokens:{_max_summary_tokens}'
+        cache_prompt = build_summary_cache_prompt(
+            effective_prompt=get_effective_summary_prompt(watch, datastore),
+            max_summary_tokens=_max_summary_tokens,
+            prefs=prefs,
        )

        # Check cache — keyed by version pair + prompt hash (invalidates if prompt changes)
@@ -356,7 +356,7 @@ window.watchOverviewI18n = {
            {#last_checked becomes fetch-start-time#}
                <td class="last-checked" data-timestamp="{{ watch.last_checked }}" data-fetchduration={{ watch.fetch_time }} data-eta_complete="{{ watch.last_checked+watch.fetch_time }}" data-label="{{ _('Last Checked') }}">
                    <div class="spinner-wrapper" style="display:none;" >
-                        <span class="spinner"></span><span class="status-text">&nbsp;{{ _('Checking now') }}</span>
+                        <span class="spinner"></span><span class="status-text">&nbsp;{{ watch['__check_status'] or _('Checking now') }}</span>
                    </div>
                    <span class="innertext">{{watch|format_last_checked_time|safe}}</span>
                </td>
@@ -17,6 +17,7 @@ from wtforms import (
    Form,
    Field,
    FloatField,
+    HiddenField,
    IntegerField,
    PasswordField,
    RadioField,
@@ -279,12 +280,44 @@ class TimeBetweenCheckForm(Form):
        return True


+class LabelAfterInputTableWidget(widgets.TableWidget):
+    """
+    Variant of WTForms' TableWidget that renders the input cell before the label cell,
+    so each row is <td>input</td><th>label</th> instead of the default <th>label</th><td>input</td>.
+    """
+
+    def __call__(self, field, **kwargs):
+        from markupsafe import Markup
+        from wtforms.widgets import html_params
+
+        html = []
+        if self.with_table_tag:
+            kwargs.setdefault("id", field.id)
+            html.append(f"<table {html_params(**kwargs)}>")
+        hidden = ""
+        for subfield in field:
+            if subfield.type in ("HiddenField", "CSRFTokenField"):
+                hidden += str(subfield)
+            else:
+                html.append(
+                    f"<tr><td>{hidden}{subfield}</td><th>{subfield.label}</th></tr>"
+                )
+                hidden = ""
+        if self.with_table_tag:
+            html.append("</table>")
+        if hidden:
+            html.append(hidden)
+        return Markup("".join(html))
+
+
 class EnhancedFormField(FormField):
    """
    An enhanced FormField that supports conditional validation with top-level error messages.
    Adds a 'top_errors' property for validation errors at the FormField level.
    """

+    widget = LabelAfterInputTableWidget()
+
    def __init__(self, form_class, label=None, validators=None, separator="-",
                 conditional_field=None, conditional_message=None, conditional_test_function=None, **kwargs):
        """
@@ -1073,7 +1106,6 @@ class globalSettingsLLMForm(Form):
        _l('API Key'),
        validators=[validators.Optional()],
        render_kw={
-            "placeholder": _l('Leave blank to use LITELLM_API_KEY env var'),
            "autocomplete": "off",
            "style": "width: 24em;",
        },
@@ -1086,6 +1118,24 @@ class globalSettingsLLMForm(Form):
            "style": "width: 24em;",
        },
    )
+    # Persisted by the Provider dropdown JS — lets the backend distinguish a self-hosted
+    # OpenAI-compatible endpoint (vLLM, LM Studio, llama.cpp) from cloud OpenAI, so we can
+    # apply reasoning-friendly token caps only when the user opted in.
+    llm_provider_kind = HiddenField(
+        validators=[validators.Optional()],
+        default='',
+    )
+    # Multiplier applied to LLM max_tokens caps when provider_kind == 'openai_compatible'.
+    # Reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought into
+    # message.reasoning_content before the final answer lands in message.content.
+    # Local self-hosted models cost no per-token money, so giving them headroom is cheap;
+    # cloud providers stay on the original tight caps so existing users see no cost change.
+    llm_local_token_multiplier = IntegerField(
+        _l('Token multiplier for local reasoning models'),
+        validators=[validators.Optional(), validators.NumberRange(min=1, max=20)],
+        default=5,
+        render_kw={"placeholder": "5", "style": "width: 6em;"},
+    )
    llm_change_summary_default = TextAreaField(
        _l('Default AI Change Summary prompt'),
        validators=[validators.Optional(), validators.Length(max=2000)],
@@ -49,6 +49,9 @@ def completion(model: str, messages: list, api_key: str = None,

    _retryable = (litellm.Timeout, litellm.APIConnectionError)

+    logger.trace("Sending payload to LLM.. ")
+    logger.trace(messages)
+
    for attempt in range(1, DEFAULT_RETRIES + 1):
        try:
            response = litellm.completion(**kwargs)
@@ -16,6 +16,7 @@ Environment variable overrides (take priority over datastore settings):

 import hashlib
 import os
+from dataclasses import dataclass
 from datetime import datetime, timezone
 from loguru import logger

@@ -81,8 +82,35 @@ def _cached_system(text: str, model: str = '') -> dict:

 LLM_DEFAULT_MAX_SUMMARY_TOKENS = 3000

-# Default prompt used when the user hasn't configured llm_change_summary
-DEFAULT_CHANGE_SUMMARY_PROMPT = "Describe in plain English what changed — list what was added or removed as bullet points, including key details for each item. Be careful of content that merely just moved around, you should mention that it moved but dont report that it was added/removed etc. Be considerate of the style content you are summarising the change of, adjust your report accordingly. Do not quote non-English text verbatim; translate and summarise all content into English. Your entire response must be in English."
+# Output-token cap for the JSON-returning calls (intent eval, preview, setup/prefilter).
+# Mirrors client.py's _MAX_COMPLETION_TOKENS so the multiplier helper has a base value
+# to scale; cloud-LLM users hit this default unmodified, preserving prior cost defaults.
+JSON_RESPONSE_MAX_TOKENS = 400
+
+# Default prompt used when the user hasn't configured llm_change_summary.
+# This owns the OUTPUT FORMAT (structure, sections, style, language). The system prompt
+# in prompt_builder.build_change_summary_system_prompt() only covers how to READ the diff.
+# Users can replace this entirely (e.g. "Just tell me the new timestamp.") without
+# fighting hard-coded structure rules from the system prompt.
+DEFAULT_CHANGE_SUMMARY_PROMPT = (
+    "Describe what changed in plain English using these sections, in this fixed order — "
+    "omit a section entirely if there is nothing to report for it:\n"
+    "  Added: ...\n"
+    "  Changed: ...\n"
+    "  Removed: ...\n"
+    "The Removed section MUST always be last. Never place removals before additions or changes.\n\n"
+    "List items as bullet points with key details for each one. Be considerate of the style "
+    "of content you are summarising and adjust your report accordingly.\n"
+    "Do not list standalone timestamps like '3 hours ago', 'Yesterday', '2 minutes ago' as added "
+    "or removed items — they are not meaningful content changes.\n"
+    "For content-heavy pages (news, listings, feeds): quote or paraphrase the specific new "
+    "headlines, items, or entries that were added — do not collapse them into vague phrases "
+    "like 'new articles were added' or 'section was expanded'.\n"
+    "For large blocks of new text (full articles, documents, long paragraphs): briefly summarise "
+    "the substance in 1-2 sentences capturing the key point — do not just repeat the title.\n\n"
+    "Do not quote non-English text verbatim; translate and summarise all content into English. "
+    "Your entire response must be in English."
+)


 def _summary_max_tokens(diff: str, max_cap: int = LLM_DEFAULT_MAX_SUMMARY_TOKENS) -> int:
@@ -90,6 +118,37 @@ def _summary_max_tokens(diff: str, max_cap: int = LLM_DEFAULT_MAX_SUMMARY_TOKENS
    return max(400, min(len(diff) // 4, max_cap))


+def apply_local_token_multiplier(base_max_tokens: int, llm_cfg: dict) -> int:
+    """
+    Scale max_tokens for self-hosted OpenAI-compatible endpoints (vLLM, LM Studio, llama.cpp).
+
+    Reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought into
+    `message.reasoning_content` BEFORE the final answer lands in `message.content`.
+    Without enough headroom the request truncates mid-thought (`finish_reason='length'`)
+    and the answer never lands — callers see an empty string and silently fall through
+    to safe defaults, hiding the problem.
+
+    Local self-hosted models cost no per-token money, so headroom is cheap; cloud
+    providers (OpenAI, Anthropic, Gemini, OpenRouter) keep their original tight caps
+    so existing users see no cost change.
+
+    Activated only when `llm_cfg['provider_kind'] == 'openai_compatible'`.
+    Multiplier defaults to 5x and is user-configurable in Settings → AI → Provider.
+    """
+    if (llm_cfg or {}).get('provider_kind') != 'openai_compatible':
+        return base_max_tokens
+    try:
+        multiplier = int(llm_cfg.get('local_token_multiplier') or 5)
+    except (TypeError, ValueError):
+        multiplier = 5
+    # Clamp to the same 1-20 range the form enforces. Defense-in-depth against
+    # corrupted datastore values that bypassed form validation (manual JSON edits,
+    # future migrations, plugins): a runaway multiplier could otherwise produce
+    # absurdly large max_tokens caps and exhaust local-endpoint memory.
+    multiplier = max(1, min(multiplier, 20))
+    return base_max_tokens * multiplier
+
+
 # ---------------------------------------------------------------------------
 # Intent resolution
 # ---------------------------------------------------------------------------
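
To make the scaling concrete, here is a self-contained sketch that copies the logic of `apply_local_token_multiplier` and `_summary_max_tokens` from the hunks above into a standalone script and runs it against a few configurations. The config dictionaries are illustrative assumptions, not values taken from a real datastore.

```python
JSON_RESPONSE_MAX_TOKENS = 400
LLM_DEFAULT_MAX_SUMMARY_TOKENS = 3000


def summary_max_tokens(diff: str, max_cap: int = LLM_DEFAULT_MAX_SUMMARY_TOKENS) -> int:
    # Roughly one token per four characters, floored at 400 and capped at max_cap.
    return max(400, min(len(diff) // 4, max_cap))


def apply_local_token_multiplier(base_max_tokens: int, llm_cfg: dict) -> int:
    # Only self-hosted OpenAI-compatible endpoints are scaled.
    if (llm_cfg or {}).get('provider_kind') != 'openai_compatible':
        return base_max_tokens
    try:
        multiplier = int(llm_cfg.get('local_token_multiplier') or 5)
    except (TypeError, ValueError):
        multiplier = 5
    multiplier = max(1, min(multiplier, 20))  # same 1-20 range as the form
    return base_max_tokens * multiplier


cloud = {'provider_kind': 'openai'}
local = {'provider_kind': 'openai_compatible', 'local_token_multiplier': 5}
corrupt = {'provider_kind': 'openai_compatible', 'local_token_multiplier': 'abc'}
runaway = {'provider_kind': 'openai_compatible', 'local_token_multiplier': 500}

results = {
    'cloud': apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, cloud),
    'local': apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, local),
    'corrupt': apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, corrupt),
    'runaway': apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, runaway),
    'no_config': apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, None),
    'local_summary': apply_local_token_multiplier(summary_max_tokens('x' * 20000), local),
}
print(results)
```

With these inputs the cloud and missing-config cases stay at 400, the local and corrupt cases come out at 2000 (the corrupt value falls back to 5), the runaway value is clamped to 20 for a cap of 8000, and a 20000-character diff reaches 15000 on the local endpoint.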
@@ -338,6 +397,7 @@ def run_setup(watch, datastore, snapshot_text: str) -> None:
            ],
            api_key=cfg.get('api_key'),
            api_base=cfg.get('api_base'),
+            max_tokens=apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, cfg),
            extra_body=_thinking_extra_body(cfg['model'], int(datastore.data['settings']['application'].get('llm_thinking_budget', LLM_DEFAULT_THINKING_BUDGET) or 0)),
        )
        _check_token_budget(watch, cfg, tokens)
@@ -379,6 +439,58 @@ def compute_summary_cache_key(diff_text: str, prompt: str) -> str:
    return h.hexdigest()[:16]


+@dataclass(frozen=True)
+class DiffPrefs:
+    """
+    User-facing diff display preferences. Part of the LLM summary cache key so
+    that toggling a preference produces a fresh summary.
+
+    Field defaults are the single source of truth — the UI query-arg defaults in
+    diff.py's from_request_args() and the worker pre-cache's bare DiffPrefs()
+    both rely on these.
+    """
+    all_changes:       bool = False
+    ignore_whitespace: bool = False
+    show_removed:      bool = True
+    show_added:        bool = True
+
+    @classmethod
+    def from_request_args(cls, args) -> 'DiffPrefs':
+        """Parse from a Flask request.args (or any .get(key, default)-shaped mapping)."""
+        return cls(
+            all_changes       = args.get('all_changes', '0') == '1',
+            ignore_whitespace = args.get('ignore_whitespace', '0') == '1',
+            show_removed      = args.get('removed', '1') == '1',
+            show_added        = args.get('added', '1') == '1',
+        )
+
+    def cache_key_suffix(self) -> str:
+        return (
+            f'\x00prefs:all={int(self.all_changes)},ws={int(self.ignore_whitespace)}'
+            f',rm={int(self.show_removed)},add={int(self.show_added)}'
+        )
+
+
+def build_summary_cache_prompt(effective_prompt: str, max_summary_tokens: int,
+                                prefs: DiffPrefs = None) -> str:
+    """
+    Compose the full cache-key string passed to save/get_llm_diff_summary.
+
+    Default prefs are DiffPrefs() — must match the UI's query-arg defaults so a
+    worker-side pre-cache is hit by an unmodified UI request. Same helper must
+    be used by both the worker pre-cache write and the UI diff route read,
+    otherwise the prompt hashes diverge and the cache file isn't found.
+    """
+    if prefs is None:
+        prefs = DiffPrefs()
+    return (
+        effective_prompt
+        + prefs.cache_key_suffix()
+        + f'\x00sys:{build_change_summary_system_prompt()}'
+        + f'\x00max_tokens:{max_summary_tokens}'
+    )
+
+
 def summarise_change(watch, datastore, diff: str, current_snapshot: str = '') -> str:
    """
    Generate a plain-language summary of the change using the watch's
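
The cache-key fix in this hunk depends on the worker and the UI composing the same string. The sketch below is a standalone approximation: `DiffPrefs` follows the diff, while `build_prompt`, `demo_cache_key`, the fixed `system_prompt` argument and the SHA-256 choice are assumptions made for illustration (the body of `compute_summary_cache_key` is not shown in full here).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffPrefs:
    all_changes: bool = False
    ignore_whitespace: bool = False
    show_removed: bool = True
    show_added: bool = True

    @classmethod
    def from_request_args(cls, args) -> 'DiffPrefs':
        return cls(
            all_changes=args.get('all_changes', '0') == '1',
            ignore_whitespace=args.get('ignore_whitespace', '0') == '1',
            show_removed=args.get('removed', '1') == '1',
            show_added=args.get('added', '1') == '1',
        )

    def cache_key_suffix(self) -> str:
        return (f'\x00prefs:all={int(self.all_changes)},ws={int(self.ignore_whitespace)}'
                f',rm={int(self.show_removed)},add={int(self.show_added)}')


def build_prompt(effective_prompt, max_summary_tokens, prefs=None, system_prompt='SYS'):
    # system_prompt stands in for build_change_summary_system_prompt() (assumption).
    prefs = prefs or DiffPrefs()
    return (effective_prompt + prefs.cache_key_suffix()
            + f'\x00sys:{system_prompt}' + f'\x00max_tokens:{max_summary_tokens}')


def demo_cache_key(diff_text: str, prompt: str) -> str:
    # Hypothetical hash; the real function's algorithm is not shown in the diff.
    h = hashlib.sha256()
    h.update(diff_text.encode('utf-8'))
    h.update(b'\x00')
    h.update(prompt.encode('utf-8'))
    return h.hexdigest()[:16]


diff = '-$89.99\n+$67.00\n'
worker_key = demo_cache_key(diff, build_prompt('Summarise', 3000))
ui_key = demo_cache_key(diff, build_prompt('Summarise', 3000, DiffPrefs.from_request_args({})))
toggled_key = demo_cache_key(
    diff, build_prompt('Summarise', 3000, DiffPrefs.from_request_args({'ignore_whitespace': '1'})))
print(worker_key == ui_key, worker_key == toggled_key)
```

An unmodified UI request parses to the same defaults as a bare `DiffPrefs()`, so the worker-side pre-cache is found; toggling any preference changes the suffix and therefore the key.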
@@ -431,9 +543,12 @@ def summarise_change(watch, datastore, diff: str, current_snapshot: str = '') ->
            ],
            api_key=cfg.get('api_key'),
            api_base=cfg.get('api_base'),
-            max_tokens=_summary_max_tokens(
-                diff,
-                max_cap=int(datastore.data['settings']['application'].get('llm_max_summary_tokens', LLM_DEFAULT_MAX_SUMMARY_TOKENS) or LLM_DEFAULT_MAX_SUMMARY_TOKENS),
+            max_tokens=apply_local_token_multiplier(
+                _summary_max_tokens(
+                    diff,
+                    max_cap=int(datastore.data['settings']['application'].get('llm_max_summary_tokens', LLM_DEFAULT_MAX_SUMMARY_TOKENS) or LLM_DEFAULT_MAX_SUMMARY_TOKENS),
+                ),
+                cfg,
            ),
            extra_body=_extra_body,
        )
@@ -496,6 +611,7 @@ def preview_extract(watch, datastore, content: str) -> dict | None:
            ],
            api_key=cfg.get('api_key'),
            api_base=cfg.get('api_base'),
+            max_tokens=apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, cfg),
            extra_body=_thinking_extra_body(cfg['model'], int(datastore.data['settings']['application'].get('llm_thinking_budget', LLM_DEFAULT_THINKING_BUDGET) or 0)),
        )
        accumulate_global_tokens(datastore, tokens, model=cfg['model'])
@@ -579,6 +695,7 @@ def evaluate_change(watch, datastore, diff: str, current_snapshot: str = '') ->
            ],
            api_key=cfg.get('api_key'),
            api_base=cfg.get('api_base'),
+            max_tokens=apply_local_token_multiplier(JSON_RESPONSE_MAX_TOKENS, cfg),
            extra_body=_thinking_extra_body(cfg['model'], int(datastore.data['settings']['application'].get('llm_thinking_budget', LLM_DEFAULT_THINKING_BUDGET) or 0)),
        )
        raw, tokens = _resp[0], _resp[1]
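The hunks above route every `max_tokens` cap through `apply_local_token_multiplier`, whose body is not part of this excerpt. The following is a minimal sketch of how such a helper could behave, not the project's implementation. The `provider_kind` value `'openai_compatible'`, the `local_token_multiplier` config key, and the default of 4.0 are all assumptions chosen for illustration.

```python
# Hypothetical sketch: the names and values below are illustrative assumptions,
# not the project's actual configuration schema.
DEFAULT_LOCAL_TOKEN_MULTIPLIER = 4.0  # assumed default


def apply_local_token_multiplier_sketch(base_max_tokens: int, cfg: dict) -> int:
    """Scale a max_tokens cap only for self-hosted OpenAI-compatible endpoints."""
    if cfg.get('provider_kind') != 'openai_compatible':
        # Cloud providers keep their original tight cap.
        return base_max_tokens
    try:
        multiplier = float(cfg.get('local_token_multiplier') or DEFAULT_LOCAL_TOKEN_MULTIPLIER)
    except (TypeError, ValueError):
        # Unparseable setting: fall back to the assumed default.
        multiplier = DEFAULT_LOCAL_TOKEN_MULTIPLIER
    return max(1, int(base_max_tokens * multiplier))


cloud_cap = apply_local_token_multiplier_sketch(80, {'provider_kind': 'openai'})
local_cap = apply_local_token_multiplier_sketch(
    80, {'provider_kind': 'openai_compatible', 'local_token_multiplier': 4}
)
fallback_cap = apply_local_token_multiplier_sketch(
    80, {'provider_kind': 'openai_compatible', 'local_token_multiplier': 'oops'}
)
```

Keeping the scaling in one helper means each call site states its cloud-sized budget once and the reasoning headroom is added in a single place.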
@@ -79,7 +79,13 @@ def build_eval_system_prompt() -> str:
        "Rules:\n"
        "- important=true ONLY when the diff clearly and specifically matches the intent — be strict\n"
        "- Pay close attention to direction: an intent about price drops means removed (-) prices and added (+) lower prices\n"
-        "- Empty, trivial, or cosmetic diffs (timestamps, counters, whitespace, navigation) → important=false\n"
+        "- The user's intent always wins. If the intent explicitly asks about timestamps, numbers, counters, "
+        "thresholds, or any specific value (e.g. 'when the timestamp is greater than 1778599592', "
+        "'when stock count > 5'), evaluate the diff against that intent — do NOT dismiss it as cosmetic.\n"
+        "- Otherwise: empty, trivial, or genuinely cosmetic diffs (heartbeat timestamps, view counters, "
+        "whitespace, navigation tweaks) default to important=false\n"
+        "- For numeric comparisons in the intent, parse the values explicitly and compare them — "
+        "do not eyeball or round\n"
        "- If the same text appears in both removed (-) and added (+) lines the content has likely just "
        "shifted or been reordered. Treat pure reordering as important=false unless the intent "
        "explicitly asks about order or position.\n"
@@ -130,7 +136,14 @@ def build_change_summary_prompt(diff: str, custom_prompt: str,
    """
    Build the user message for an AI Change Summary call.
    The user supplies their own instructions (custom_prompt); this wraps them
-    with the diff and optional page context.
+    with the diff (which carries its own surrounding context via unified_diff's
+    n=3 context lines, marked '~' by _annotate_moved_lines).
+
+    NOTE: current_snapshot is accepted for caller compatibility but intentionally
+    unused. A wholesale page excerpt caused the LLM to report unchanged page
+    content (e.g. old release-note bullets) as "what changed" — hallucinations
+    drawn from the excerpt rather than the diff. The in-diff context lines give
+    the model enough surrounding text to describe each change accurately.
    """
    parts = []
    if url:
@@ -138,42 +151,33 @@ def build_change_summary_prompt(diff: str, custom_prompt: str,
    if title:
        parts.append(f"Page title: {title}")
    parts.append(f"Instructions: {custom_prompt}")
-    if current_snapshot:
-        excerpt = trim_to_relevant(current_snapshot, custom_prompt, max_chars=2_000)
-        if excerpt:
-            parts.append(f"\nCurrent page (excerpt):\n{excerpt}")
    parts.append(f"\nWhat changed (diff):\n{_annotate_moved_lines(diff)}")
    return '\n'.join(parts)


 def build_change_summary_system_prompt() -> str:
+    """
+    Universal, format-agnostic instructions: how to READ a diff and accuracy rules.
+    All output-format choices (prose vs JSON, sections, bullets, language, length)
+    are owned by the user prompt — including the default in
+    DEFAULT_CHANGE_SUMMARY_PROMPT — so that a user replacing the user-prompt
+    (e.g. asking for raw JSON) is not overridden by hard-coded format rules here.
+    """
    return (
-        "You are a meticulous, accurate summariser of website changes for monitoring notifications.\n"
-        "Your goal is to describe exactly what changed — never omit significant details, "
-        "never add information that isn't in the diff, and never speculate.\n\n"
+        "You analyse a unified-diff document showing how a monitored web page changed, "
+        "and produce exactly the output the user asks for.\n\n"
        "Rules for reading the diff:\n"
-        "- Lines starting with + are genuinely new content. List them specifically.\n"
-        "- Lines starting with - are genuinely removed content. List them specifically.\n"
+        "- Lines starting with + are genuinely new content.\n"
+        "- Lines starting with - are genuinely removed content.\n"
        "- Lines starting with ~ have been PRE-IDENTIFIED as moved/reordered or trivial — "
        "the same text exists on both sides of the diff, or the line is a standalone timestamp. "
-        "Do NOT report ~ lines as added or removed. "
-        "If many ~ lines exist, note briefly that some content was reordered.\n"
-        "- Never list standalone timestamps like '3 hours ago', 'Yesterday', '2 minutes ago' "
-        "as added or removed items — they are not meaningful content changes.\n"
-        "For content-heavy pages (news, listings, feeds): quote or paraphrase the specific new "
-        "headlines, items, or entries that were added — do not collapse them into vague phrases "
-        "like 'new articles were added' or 'section was expanded'.\n"
-        "For large blocks of new text (full articles, documents, long paragraphs): briefly summarise "
-        "the substance in 1-2 sentences capturing the key point — do not just repeat the title.\n\n"
-        "Structure your response using these sections, in this fixed order — "
-        "omit a section entirely if there is nothing to report for it:\n"
-        "  Added: ...\n"
-        "  Changed: ...\n"
-        "  Removed: ...\n"
-        "The Removed section MUST always be last. Never place removals before additions or changes.\n\n"
-        "Follow the user's formatting instructions exactly for structure, language, and length.\n"
-        "Respond with ONLY the summary text — no JSON, no markdown code fences, no preamble. "
-        "Just the description."
+        "Do NOT treat ~ lines as added or removed.\n\n"
+        "Accuracy: only report what the +/- lines actually contain. Never invent details, "
+        "never speculate, never add information that isn't in the diff.\n\n"
+        "Follow the user's instructions exactly — including the requested output format "
+        "(plain text, JSON, Markdown, single value, etc.), structure, language, and length. "
+        "Do not add preamble, meta-commentary, or self-introduction. Produce only the output "
+        "the user asked for — nothing before it, nothing after it."
    )


@@ -1024,8 +1024,10 @@ class model(EntityPersistenceMixin, watch_base):
        prompt_hash = self._llm_summary_prompt_hash(prompt)
        fname = os.path.join(self.data_dir, f'change-summary-{from_version}-to-{to_version}-{prompt_hash}.txt')
        if not os.path.isfile(fname):
+            logger.debug(f"LLM cached diff summary '{fname}' NOT found")
            return ''
        with open(fname, 'r', encoding='utf-8') as f:
+            logger.debug(f"LLM cached diff summary '{fname}' FOUND")
            return f.read().strip()

    def save_llm_diff_summary(self, summary: str, from_version, to_version, prompt: str = ''):
@@ -1064,6 +1066,7 @@ class model(EntityPersistenceMixin, watch_base):
        Prepare watch data for commit.

        Excludes processor_config_* keys (stored in separate files).
+        Excludes __-prefixed keys (transient in-memory state — must not persist to disk).
        Normalizes browser_steps to empty list if no meaningful steps.
        """
        import copy
@@ -1077,8 +1080,11 @@ class model(EntityPersistenceMixin, watch_base):
        else:
            snapshot = dict(self)

-        # Exclude processor config keys (stored separately)
-        watch_dict = {k: copy.deepcopy(v) for k, v in snapshot.items() if not k.startswith('processor_config_')}
+        # Exclude processor config keys (stored separately) and __-prefixed transient keys
+        watch_dict = {
+            k: copy.deepcopy(v) for k, v in snapshot.items()
+            if not k.startswith('processor_config_') and not k.startswith('__')
+        }

        # Normalize browser_steps: if no meaningful steps, save as empty list
        if not self.has_browser_steps:
@@ -335,6 +335,13 @@ class watch_base(dict):
        if self.__watch_was_edited:
            return  # Already marked as edited

+        # __-prefixed keys are transient in-memory state (e.g. __check_status set by
+        # set_watch_minitext_status). They never persist to disk and must not trigger
+        # the edited flag — otherwise just observing a check in progress would force
+        # the next run to bypass the unchanged-content skip.
+        if isinstance(key, str) and key.startswith('__'):
+            return
+
        # Import from shared schema utilities (no circular dependency)
        from .schema_utils import get_readonly_watch_fields
        readonly_fields = get_readonly_watch_fields()
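The two hunks above treat `__`-prefixed keys as transient: they are skipped when the watch is serialised for commit, and writing them does not mark the watch as edited. A minimal sketch of that behaviour on a plain `dict` subclass follows. The class and attribute names (`WatchSketch`, `was_edited`, `prepare_for_commit`) are hypothetical and are not the project's `watch_base` API.

```python
import copy


class WatchSketch(dict):
    """Hypothetical stand-in for a dict-backed watch object."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.was_edited = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Transient in-memory state must not flip the edited flag.
        if isinstance(key, str) and key.startswith('__'):
            return
        self.was_edited = True

    def prepare_for_commit(self) -> dict:
        # Drop separately stored processor config and transient keys.
        return {
            k: copy.deepcopy(v) for k, v in self.items()
            if not k.startswith('processor_config_') and not k.startswith('__')
        }


watch = WatchSketch(url='https://example.com')
watch['__check_status'] = 'Fetching...'
edited_after_status = watch.was_edited
committed = watch.prepare_for_commit()
watch['title'] = 'Example'
edited_after_title = watch.was_edited
```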
@@ -13,6 +13,7 @@ import json
 import re
 from loguru import logger
 from changedetectionio.pluggy_interface import hookimpl
+from changedetectionio.llm.evaluator import apply_local_token_multiplier

 # Injected at startup by inject_datastore_into_plugins()
 datastore = None
@@ -234,7 +235,10 @@ def get_itemprop_availability_override(content, fetcher_name, fetcher_instance,
            ],
            api_key=llm_cfg.get('api_key'),
            api_base=llm_cfg.get('api_base'),
-            max_tokens=80,
+            # 80 fits a {price, currency, availability} JSON answer comfortably for cloud
+            # models. Local reasoning models burn most of that on chain-of-thought before
+            # the JSON lands — the multiplier scales it up only when provider_kind says so.
+            max_tokens=apply_local_token_multiplier(80, llm_cfg),
        )

        accumulate_global_tokens(
@@ -210,10 +210,19 @@ def render(watch, datastore, request, url_for, render_template, flash, redirect,
    llm_summary_prompt = ''
    if llm_configured:
        try:
-            from changedetectionio.llm.evaluator import get_effective_summary_prompt
+            from changedetectionio.llm.evaluator import (
+                get_effective_summary_prompt, build_summary_cache_prompt,
+            )
            _prompt = get_effective_summary_prompt(watch, datastore)
            llm_summary_prompt = _prompt
-            llm_diff_summary = watch.get_llm_diff_summary(from_version, to_version, prompt=_prompt)
+            # Must match the cache_prompt the worker writes and the UI ajax route reads —
+            # using UI default diff prefs so the initial render finds the worker's pre-cache.
+            _max_summary_tokens = datastore.data['settings']['application'].get('llm_max_summary_tokens', 3000)
+            _cache_prompt = build_summary_cache_prompt(
+                effective_prompt=_prompt,
+                max_summary_tokens=_max_summary_tokens,
+            )
+            llm_diff_summary = watch.get_llm_diff_summary(from_version, to_version, prompt=_cache_prompt)
        except Exception as e:
            logger.warning(f"Could not load llm-diff-summary for {uuid}: {e}")
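The cache-miss fix above depends on the writer and the reader composing the same key string before hashing. The sketch below illustrates why the old read, which passed the bare prompt, could never find a file written with the composed key. It is a simplified model: `DiffPrefsSketch`, its default values, the placeholder system prompt, and the SHA-256 hash are assumptions, since the real `DiffPrefs` defaults and `_llm_summary_prompt_hash` are not shown in this diff.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffPrefsSketch:
    # Assumed defaults, for illustration only.
    all_changes: bool = False
    ignore_whitespace: bool = False
    show_removed: bool = True
    show_added: bool = True

    def cache_key_suffix(self) -> str:
        return (
            f'\x00prefs:all={int(self.all_changes)},ws={int(self.ignore_whitespace)}'
            f',rm={int(self.show_removed)},add={int(self.show_added)}'
        )


SYSTEM_PROMPT = 'system prompt placeholder'  # stands in for the real system prompt


def build_cache_prompt_sketch(effective_prompt, max_summary_tokens, prefs=None):
    if prefs is None:
        prefs = DiffPrefsSketch()
    return (
        effective_prompt
        + prefs.cache_key_suffix()
        + f'\x00sys:{SYSTEM_PROMPT}'
        + f'\x00max_tokens:{max_summary_tokens}'
    )


def prompt_hash_sketch(cache_prompt: str) -> str:
    # Hypothetical hash; any stable digest shows the same effect.
    return hashlib.sha256(cache_prompt.encode('utf-8')).hexdigest()[:16]


prompt = 'Summarise the change'
worker_key = prompt_hash_sketch(build_cache_prompt_sketch(prompt, 3000))
ui_key = prompt_hash_sketch(build_cache_prompt_sketch(prompt, 3000))
bare_prompt_key = prompt_hash_sketch(prompt)  # what the old read path hashed
toggled_key = prompt_hash_sketch(
    build_cache_prompt_sketch(prompt, 3000, DiffPrefsSketch(ignore_whitespace=True))
)
```

Here `worker_key` and `ui_key` agree, while `bare_prompt_key` and `toggled_key` each point at a different cache file.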

@@ -495,16 +495,17 @@ class perform_site_check(difference_detection_processor):
        # Start with content reference, avoid copy until modification
        html_content = content

-        # Apply include filters (CSS, XPath, JSON)
-        # Except for plaintext (incase they tried to confuse the system, it will HTML escape
-        #if not stream_content_type.is_plaintext:
-        if filter_config.has_include_filters:
-            html_content = content_processor.apply_include_filters(content, stream_content_type)
-
-        # Apply subtractive selectors
+        # Apply subtractive selectors first so include filters operate on already-cleaned content.
+        # Otherwise a subtractive selector that relies on ancestor context (e.g. ".main .ads")
+        # cannot match after the include filter has extracted the inner element and stripped
+        # the parent wrapper.
        if filter_config.has_subtractive_selectors:
            html_content = content_processor.apply_subtractive_selectors(html_content)

+        # Apply include filters (CSS, XPath, JSON)
+        if filter_config.has_include_filters:
+            html_content = content_processor.apply_include_filters(html_content, stream_content_type)
+
        # === TEXT EXTRACTION ===
        if watch.is_source_type_url:
            # For source URLs, keep raw content
@@ -108,7 +108,9 @@ def test_check_notification_email_formats_default_HTML(client, live_server, meas
    html_content = html_part.get_content()
    assert 'some text<br>' in html_content  # We converted \n from the notification body
    assert 'fallback-body<br>' in html_content  # kept the original <br>
-    assert '(added) So let\'s see what happens.<br>' in html_content  # the html part
+    # GHSA-q8xq-qg4x-wphg: apostrophes in diff content are escaped (&#39;) for HTML notifications.
+    # Renders as ' in the recipient's email client; only the byte-source differs.
+    assert '(added) So let&#39;s see what happens.<br>' in html_content  # the html part
    delete_all_watches(client)


@@ -452,7 +454,8 @@ def test_check_notification_email_formats_default_Text_override_HTML(client, liv
    html_part = parts[1]
    assert html_part.get_content_type() == 'text/html'
    html_content = html_part.get_content()
-    assert '(removed) So let\'s see what happens.' in html_content  # the html part
+    # GHSA-q8xq-qg4x-wphg: apostrophes in diff content are escaped (&#39;) for HTML notifications.
+    assert '(removed) So let&#39;s see what happens.' in html_content  # the html part
    assert '&lt;!DOCTYPE html' not in html_content
    assert '<!DOCTYPE html' in html_content # Our original template is working correctly

@@ -792,5 +795,6 @@ def test_check_html_notification_with_apprise_format_is_html(client, live_server
    html_content = html_part.get_content()
    assert 'some text<br>' in html_content  # We converted \n from the notification body
    assert 'fallback-body<br>' in html_content  # kept the original <br>
-    assert '(added) So let\'s see what happens.<br>' in html_content  # the html part
+    # GHSA-q8xq-qg4x-wphg: apostrophes in diff content are escaped (&#39;) for HTML notifications.
+    assert '(added) So let&#39;s see what happens.<br>' in html_content  # the html part
    delete_all_watches(client)
@@ -251,3 +251,41 @@ body > table > tr:nth-child(3) > td:nth-child(3)""",
        # First column should exist
        assert b"Emil" in res.data

+
+# Re PR #978: subtractive_selectors must run BEFORE include_filters so that selectors
+# relying on ancestor context (e.g. ".main .advertisement") can still match. If include runs first,
+# the ancestor wrapper is stripped and the subtractive selector matches nothing.
+def test_subtractive_selectors_applied_before_include_filters(client, live_server, measure_memory_usage, datastore_path):
+    page_html = """<html><body>
+<div class="main">
+  <p class="keep">first kept paragraph</p>
+  <p class="advertisement">noisy advertisement text</p>
+  <p class="keep">second kept paragraph</p>
+</div>
+</body></html>
+"""
+    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
+        f.write(page_html)
+
+    test_url = url_for("test_endpoint", _external=True)
+    client.application.config.get('DATASTORE').add_watch(
+        url=test_url,
+        extras={
+            # Include filter strips the .main wrapper from the output
+            "include_filters": [".main p"],
+            # Subtractive selector depends on the .main ancestor — only effective if it runs first
+            "subtractive_selectors": [".main .advertisement"],
+        },
+    )
+    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
+    wait_for_all_checks(client)
+
+    res = client.get(
+        url_for("ui.ui_preview.preview_page", uuid="first"),
+        follow_redirects=True,
+    )
+
+    assert b"first kept paragraph" in res.data
+    assert b"second kept paragraph" in res.data
+    # The bug: ad survives if include filter runs first
+    assert b"noisy advertisement text" not in res.data
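The ordering problem exercised by the test above can be reproduced outside the application. The sketch below uses `xml.etree.ElementTree` with XPath expressions standing in for the CSS selectors `.main p` and `.main .advertisement`. The helper functions are hypothetical and only model the effect of an include filter dropping the ancestor wrapper; they are not the project's `content_processor`.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div class="main">
  <p class="keep">first kept paragraph</p>
  <p class="advertisement">noisy advertisement text</p>
  <p class="keep">second kept paragraph</p>
</div>
</body></html>"""

INCLUDE_XPATH = ".//div[@class='main']//p"
SUBTRACT_XPATH = ".//div[@class='main']//*[@class='advertisement']"


def apply_include(root):
    """Copy matches into a fresh wrapper, which drops their original ancestors."""
    out = ET.Element("root")
    for el in root.findall(INCLUDE_XPATH):
        out.append(el)
    return out


def apply_subtract(root):
    """Remove every element matching the subtractive expression."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in root.findall(SUBTRACT_XPATH):
        parents[el].remove(el)
    return root


def text_of(root):
    return " | ".join(t.strip() for t in root.itertext() if t.strip())


# Old order: the .main wrapper is gone before the subtractive step runs.
include_first = text_of(apply_subtract(apply_include(ET.fromstring(PAGE))))
# New order: the ad is removed while its ancestor is still present.
subtract_first = text_of(apply_include(apply_subtract(ET.fromstring(PAGE))))
```

With the include step first, the advertisement paragraph survives because nothing in the extracted fragment has a `div.main` ancestor any more.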
@@ -790,3 +790,79 @@ def test_html_watch_diff_content_escaped_in_html_notification(client, live_serve
        f"Diff content from text/html page was NOT escaped — tracking pixel reached HTML notification: {body!r}"

    client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
+
+
+def test_source_url_diff_content_escaped_in_html_notification(client, live_server, measure_memory_usage, datastore_path):
+    """
+    GHSA-q8xq-qg4x-wphg — companion to the inscriptis test. `source:`-prefixed
+    URLs short-circuit the HTML→text step (processor.py:509-511) and store the
+    raw HTML body verbatim as the snapshot. That gives an attacker who controls
+    a watched page a *direct* injection path — no entity-encoding tricks needed,
+    any live `<a>` / `<img>` / `<script>` on the page lands straight into
+    current_snapshot / raw_diff. The escape pass must catch this too.
+    """
+    from .util import write_test_file_and_sync
+
+    if os.path.isfile(os.path.join(datastore_path, "notification.txt")):
+        os.unlink(os.path.join(datastore_path, "notification.txt"))
+
+    # Baseline: innocuous raw HTML.
+    baseline_html = "<html><body><p>nothing to see here</p></body></html>"
+    write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), baseline_html)
+
+    test_notification_url = url_for('test_notification_endpoint', _external=True).replace('http://', 'post://')
+    # `source:` prefix → raw HTML body is stored as-is in the snapshot (no inscriptis).
+    test_url = 'source:' + url_for('test_endpoint', _external=True, content_type='text/html')
+
+    res = client.post(
+        url_for("settings.settings_page"),
+        data={
+            "application-fetch_backend": "html_requests",
+            "application-minutes_between_check": 180,
+            "application-notification_body": 'Watch had changes:\n{{current_snapshot}}',
+            "application-notification_format": "html",
+            "application-notification_urls": test_notification_url,
+            "application-notification_title": "Change detected",
+        },
+        follow_redirects=True
+    )
+    assert b'Settings updated' in res.data
+
+    res = client.post(
+        url_for("ui.ui_views.form_quick_watch_add"),
+        data={"url": test_url, "tags": ''},
+        follow_redirects=True
+    )
+    assert b"Watch added" in res.data
+
+    wait_for_all_checks(client)
+
+    # Modified page contains LIVE HTML directly — no entity encoding. With source:
+    # this lands in the snapshot verbatim.
+    attacker_html = (
+        '<html><body>'
+        '<a href="https://attacker.example/payment">ACTION REQUIRED</a>'
+        '<img src="https://attacker.example/track" width="1" height="1">'
+        '</body></html>'
+    )
+    write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), attacker_html)
+
+    res = client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
+    assert b'Queued 1 watch for rechecking.' in res.data
+
+    wait_for_all_checks(client)
+    wait_for_notification_endpoint_output(datastore_path=datastore_path)
+
+    with open(os.path.join(datastore_path, "notification.txt"), 'r') as f:
+        body = f.read()
+
+    # Sanity: snapshot really did carry the markup through. Escaped form must show up.
+    assert '&lt;a href=' in body or '&amp;lt;a href=' in body, \
+        f"Expected escaped attacker markup in notification body, got: {body!r}"
+
+    assert '<a href="https://attacker.example/payment"' not in body, \
+        f"source: URL raw HTML was NOT escaped — phishing link reached HTML notification: {body!r}"
+    assert '<img src="https://attacker.example/track"' not in body, \
+        f"source: URL raw HTML was NOT escaped — tracking pixel reached HTML notification: {body!r}"
+
+    client.get(url_for("ui.form_delete", uuid="all"), follow_redirects=True)
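The assertions in the test above reduce to a property of HTML escaping: once the markup characters are entity-encoded, no live tag can reach the notification body. A small sketch using the standard library's `html.escape` as a stand-in follows. The project's own escape routine is not shown in this diff, and `html.escape` encodes apostrophes as `&#x27;`, not the `&#39;` seen in the email tests; both decode to the same character.

```python
import html

attacker_html = (
    '<a href="https://attacker.example/payment">ACTION REQUIRED</a>'
    '<img src="https://attacker.example/track" width="1" height="1">'
)

# Stand-in escape pass: <, >, & and quotes become entities.
escaped = html.escape(attacker_html)

# Either apostrophe entity renders as a plain apostrophe in the mail client.
decoded_decimal = html.unescape('So let&#39;s see what happens.')
decoded_hex = html.unescape(html.escape("So let's see what happens."))
```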
@@ -799,6 +799,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -892,10 +899,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1091,6 +1111,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1104,7 +1128,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3118,11 +3142,11 @@ msgid "API Key"
 msgstr "API klíč"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -815,6 +815,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -908,10 +915,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1107,6 +1127,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1120,7 +1144,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3170,11 +3194,11 @@ msgid "API Key"
 msgstr "API-Schlüssel"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -797,6 +797,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -890,10 +897,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1089,6 +1109,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1102,7 +1126,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3112,11 +3136,11 @@ msgid "API Key"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -797,6 +797,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -890,10 +897,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1089,6 +1109,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1102,7 +1126,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3112,11 +3136,11 @@ msgid "API Key"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -835,6 +835,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -928,10 +935,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1127,6 +1147,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1140,7 +1164,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3185,11 +3209,11 @@ msgid "API Key"
 msgstr "Clave API"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -803,6 +803,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -896,10 +903,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1095,6 +1115,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1108,7 +1132,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3125,11 +3149,11 @@ msgid "API Key"
 msgstr "Clé API"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -799,6 +799,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -892,10 +899,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1091,6 +1111,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1104,7 +1128,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3114,11 +3138,11 @@ msgid "API Key"
 msgstr "Chiave API"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -804,6 +804,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -897,10 +904,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1096,6 +1116,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1109,7 +1133,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3131,11 +3155,11 @@ msgid "API Key"
 msgstr "APIキー"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -799,6 +799,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr "각 모니터링 또는 태그에 일반 텍스트 판단 기준(%(ex1)s 또는 %(ex2)s)을 지정할 수 있습니다. 변경이 감지될 때마다 AI가 diff를 이 기준과 비교해 불필요한 알림을 줄입니다."

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -898,10 +905,23 @@ msgstr "프로바이더 선택"
 msgid "Local / Self-hosted"
 msgstr "로컬 / 자체 호스팅"

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr "Ollama 또는 사용자 지정/자체 호스팅 엔드포인트에만 필요합니다. 클라우드 프로바이더는 비워 두세요."

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr "사용 가능한 모델 불러오기"
@@ -1097,6 +1117,10 @@ msgstr "aistudio.google.com → API 키 받기"
 msgid "No API key needed for local Ollama"
 msgstr "로컬 Ollama에는 API 키가 필요 없습니다"

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr "openrouter.ai → 키"
@@ -1110,8 +1134,8 @@ msgid "Loading…"
 msgstr "불러오는 중..."

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
-msgstr "반환된 모델이 없습니다. API 키를 확인하세요."
+msgid "No models returned by the provider."
+msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "— choose a model —"
@@ -3121,14 +3145,14 @@ msgstr "모델"
 msgid "API Key"
 msgstr "API 키"

-#: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
-msgstr "LITELLM_API_KEY 환경 변수를 사용하려면 비워 두세요"
-
 #: changedetectionio/forms.py
 msgid "API Base URL"
 msgstr "API 기본 URL"

+#: changedetectionio/forms.py
+msgid "Token multiplier for local reasoning models"
+msgstr ""
+
 #: changedetectionio/forms.py
 msgid "Default AI Change Summary prompt"
 msgstr "기본 AI 변경 요약 프롬프트"
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: changedetection.io 0.55.3\n"
 "Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
-"POT-Creation-Date: 2026-05-02 18:29+0900\n"
+"POT-Creation-Date: 2026-05-12 17:39+0200\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language-Team: LANGUAGE <LL@li.org>\n"
@@ -796,6 +796,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -889,10 +896,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1088,6 +1108,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1101,7 +1125,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3111,11 +3135,11 @@ msgid "API Key"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -822,6 +822,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -915,10 +922,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1114,6 +1134,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1127,7 +1151,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3162,11 +3186,11 @@ msgid "API Key"
 msgstr "Chave da API"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -832,6 +832,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -925,10 +932,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1124,6 +1144,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1137,7 +1161,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3165,11 +3189,11 @@ msgid "API Key"
 msgstr "API Anahtarı"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -812,6 +812,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -905,10 +912,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1104,6 +1124,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1117,7 +1141,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3144,11 +3168,11 @@ msgid "API Key"
 msgstr "Ключ API"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -801,6 +801,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -894,10 +901,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1093,6 +1113,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1106,7 +1130,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3117,11 +3141,11 @@ msgid "API Key"
 msgstr "API密钥"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -800,6 +800,13 @@ msgid ""
 "diff against it and suppresses irrelevant noise."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Tip: intent evaluation benefits from a capable model — recommended %(local)s locally, or %(gpt)s / %(gemini)s. Very "
+"small models (≤3B) may misjudge numeric comparisons."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 #, python-format
 msgid ""
@@ -893,10 +900,23 @@ msgstr ""
 msgid "Local / Self-hosted"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "OpenAI-compatible (vLLM, LM Studio, llama.cpp)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Only needed for Ollama or custom/self-hosted endpoints. Leave blank for cloud providers."
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+#, python-format
+msgid ""
+"Local reasoning models (Qwen3, DeepSeek-R1, Gemma 3, etc.) emit chain-of-thought before the final answer. This "
+"multiplier scales every <code>max_tokens</code> cap for this endpoint to leave reasoning room. Defaults to "
+"%(default)s; raise it if responses come back truncated, lower it if you want tighter limits. Only applied to self-"
+"hosted OpenAI-compatible endpoints — cloud providers (OpenAI, Anthropic, Gemini) keep their original tight caps."
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "Load available models"
 msgstr ""
@@ -1092,6 +1112,10 @@ msgstr ""
 msgid "No API key needed for local Ollama"
 msgstr ""

+#: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
+msgid "Bearer token for your self-hosted server (vLLM, LM Studio, etc.)"
+msgstr ""
+
 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
 msgid "openrouter.ai → Keys"
 msgstr ""
@@ -1105,7 +1129,7 @@ msgid "Loading…"
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
-msgid "No models returned — check your API key."
+msgid "No models returned by the provider."
 msgstr ""

 #: changedetectionio/blueprint/settings/templates/settings_llm_tab.html
@@ -3116,11 +3140,11 @@ msgid "API Key"
 msgstr "API 金鑰"

 #: changedetectionio/forms.py
-msgid "Leave blank to use LITELLM_API_KEY env var"
+msgid "API Base URL"
 msgstr ""

 #: changedetectionio/forms.py
-msgid "API Base URL"
+msgid "Token multiplier for local reasoning models"
 msgstr ""

 #: changedetectionio/forms.py
@@ -9,9 +9,16 @@ from changedetectionio.pluggy_interface import apply_update_handler_alter, apply

 import asyncio
 import os
+import re
 import sys
 import time

+# Allow alphanumerics, space, and a small set of punctuation that appears in legitimate
+# status strings ("AI/LLM (summary)..", "Fetching page.."). Anything else, including
+# HTML-active characters (<, >, &, ", ', =, ;, {, }, `, \), is stripped.
+_MINITEXT_STATUS_SAFE_RE = re.compile(r'[^A-Za-z0-9 ().,/:\-]')
+_MINITEXT_STATUS_MAX_LEN = 80
+
 from loguru import logger

 # Async version of update_worker
@@ -20,6 +27,22 @@ from loguru import logger
 IN_PYTEST = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
 DEFER_SLEEP_TIME_ALREADY_QUEUED = 0.3 if IN_PYTEST else 10.0

+
+def set_watch_minitext_status(watch, status):
+    """
+    Set a transient status line for a watch (e.g. "Fetching page..", "AI/LLM (summary)..").
+
+    Writes to watch['__check_status'] so a client reloading the page can render the
+    last known status, and fires the realtime signal so already-connected clients
+    update live. The __-prefixed key is filtered from disk by Watch._get_commit_data().
+
+    Status is sanitized to alphanumerics, space, and safe punctuation, then truncated to _MINITEXT_STATUS_MAX_LEN characters.
+    """
+    safe_status = _MINITEXT_STATUS_SAFE_RE.sub('', str(status))[:_MINITEXT_STATUS_MAX_LEN]
+    watch['__check_status'] = safe_status
+    signal('watch_small_status_comment').send(watch_uuid=watch['uuid'], status=safe_status)
+
+
 async def async_update_worker(worker_id, q, notification_q, app, datastore, executor=None):
    """
    Async worker function that processes watch check jobs from the queue.
@@ -159,8 +182,7 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
                    # Allow plugins to modify/wrap the update_handler
                    update_handler = apply_update_handler_alter(update_handler, watch, datastore)

-                    update_signal = signal('watch_small_status_comment')
-                    update_signal.send(watch_uuid=uuid, status="Fetching page..")
+                    set_watch_minitext_status(watch, "Fetching page..")

                    # All fetchers are now async, so call directly
                    await update_handler.call_browser()
@@ -446,6 +468,7 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
                                    # Step 1: AI Change Intent — may suppress notification
                                    _llm_intent, _llm_intent_source = resolve_intent(watch, datastore)
                                    if _llm_intent:
+                                        set_watch_minitext_status(watch, "AI/LLM (rules)..")
                                        _llm_result = await loop.run_in_executor(
                                            executor,
                                            lambda diff=_diff_text, snap=contents: evaluate_change(
@@ -465,6 +488,7 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec

                                    # Step 2: AI Change Summary — runs for any LLM-configured watch with a change
                                    if changed_detected:
+                                        set_watch_minitext_status(watch, "AI/LLM (summary)..")
                                        _change_summary = await loop.run_in_executor(
                                            executor,
                                            lambda diff=_diff_text, snap=contents: summarise_change(
@@ -478,22 +502,6 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec

                        datastore.update_watch(uuid=uuid, update_obj=update_obj)

-                        # Save AI summary file now that the new snapshot has been committed
-                        # and its version timestamp is the last key in history
-                        if update_obj.get('_llm_change_summary') and _llm_from_version:
-                            try:
-                                from changedetectionio.llm.evaluator import get_effective_summary_prompt
-                                _llm_to_version = list(watch.history.keys())[-1]
-                                _llm_prompt = get_effective_summary_prompt(watch, datastore)
-                                watch.save_llm_diff_summary(
-                                    update_obj['_llm_change_summary'],
-                                    _llm_from_version,
-                                    _llm_to_version,
-                                    prompt=_llm_prompt,
-                                )
-                            except Exception as _fe:
-                                logger.warning(f"Could not write change-summary file for {uuid}: {_fe}")
-
                        if changed_detected or not watch.history_n:
                            if update_handler.screenshot:
                                watch.save_screenshot(screenshot=update_handler.screenshot)
@@ -519,6 +527,31 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
                                                    timestamp=int(fetch_start_time),
                                                    snapshot_id=update_obj.get('previous_md5', 'none'))

+                            # Save AI summary file now that the new snapshot is committed —
+                            # watch.history.keys()[-1] now reflects the just-saved version,
+                            # so the cache filename matches what the UI will later look up.
+                            # Cache key must use build_summary_cache_prompt() with UI defaults so
+                            # the worker write and the UI read hash to the same prompt_hash.
+                            if update_obj.get('_llm_change_summary') and _llm_from_version:
+                                try:
+                                    from changedetectionio.llm.evaluator import (
+                                        get_effective_summary_prompt, build_summary_cache_prompt,
+                                    )
+                                    _llm_to_version = list(watch.history.keys())[-1]
+                                    _llm_max_summary_tokens = datastore.data['settings']['application'].get('llm_max_summary_tokens', 3000)
+                                    _llm_cache_prompt = build_summary_cache_prompt(
+                                        effective_prompt=get_effective_summary_prompt(watch, datastore),
+                                        max_summary_tokens=_llm_max_summary_tokens,
+                                    )
+                                    watch.save_llm_diff_summary(
+                                        update_obj['_llm_change_summary'],
+                                        _llm_from_version,
+                                        _llm_to_version,
+                                        prompt=_llm_cache_prompt,
+                                    )
+                                except Exception as _fe:
+                                    logger.warning(f"Could not write change-summary file for {uuid}: {_fe}")
+
                            empty_pages_are_a_change = datastore.data['settings']['application'].get('empty_pages_are_a_change', False)
                            if update_handler.fetcher.content or (not update_handler.fetcher.content and empty_pages_are_a_change):
                                watch.save_last_fetched_html(contents=update_handler.fetcher.content, timestamp=int(fetch_start_time))
@@ -669,6 +702,8 @@ async def async_update_worker(worker_id, q, notification_q, app, datastore, exec
                finally:
                    # Send completion signal - retrieve by name to ensure thread-safe access
                    if watch:
+                        # Clear transient in-memory status — check is done
+                        watch.pop('__check_status', None)
                        watch_check_update = signal('watch_check_update')
                        watch_check_update.send(watch_uuid=watch['uuid'])
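
The comment in the hunk above says the worker's write and the UI's read must hash to the same `prompt_hash`, otherwise the cached summary is never found. The sketch below illustrates that failure mode and its fix. It is not the project's implementation: `sketch_cache_prompt`, `prompt_hash` and `cache_filename` are hypothetical stand-ins, and the hash function and filename layout are assumptions. Only the 3000-token fallback comes from the diff.

```python
import hashlib

# Same fallback value the diff passes to .get('llm_max_summary_tokens', 3000)
DEFAULT_MAX_SUMMARY_TOKENS = 3000


def sketch_cache_prompt(effective_prompt: str, max_summary_tokens: int) -> str:
    """Hypothetical stand-in for build_summary_cache_prompt().

    Assumption: the real helper folds every setting that influences the
    summary into one string, so both sides derive the key from one place.
    """
    return f"{effective_prompt}\n[max_summary_tokens={max_summary_tokens}]"


def prompt_hash(prompt: str) -> str:
    # Assumed hashing scheme, for illustration only
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]


def cache_filename(from_version: str, to_version: str, prompt: str) -> str:
    # Assumed filename layout, for illustration only
    return f"summary-{from_version}-{to_version}-{prompt_hash(prompt)}.txt"


raw_prompt = "Summarise the change"
built_prompt = sketch_cache_prompt(raw_prompt, DEFAULT_MAX_SUMMARY_TOKENS)

# Before the fix: the writer hashes the raw prompt, the reader hashes the built one
stale_write = cache_filename("1715500000", "1715503600", raw_prompt)
ui_read = cache_filename("1715500000", "1715503600", built_prompt)
print(stale_write == ui_read)  # False: the UI never finds the file

# After the fix: both sides hash the built prompt
fixed_write = cache_filename("1715500000", "1715503600", built_prompt)
print(fixed_write == ui_read)  # True
```

Routing both sides through one builder means a later change to the key inputs is made in one place and cannot drift between writer and reader.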
Author	SHA1	Message	Date
dgtlmoon	a931b48a1b	Text filters - Process subtractive_selectors first	2026-05-15 10:15:00 +02:00
dgtlmoon	bf5efc4c31	LLM - Fixing summary cache miss-hit (#4136)	2026-05-12 17:44:36 +02:00
dgtlmoon	aa53574332	LLM - UI - Message that 'AI Intent' (triggers) need a bigger model	2026-05-12 17:40:54 +02:00
dgtlmoon	8afa06aa96	LLM - Allow better override of formats and rules for intent/triggers	2026-05-12 17:36:39 +02:00
dgtlmoon	c6e0205622	LLM - Remove the 'format' info from the system prompt so you can create your own 'summary' formats (ie: "Make a new JSON object with the timestamp")	2026-05-12 17:28:43 +02:00
dgtlmoon	fbe59f89b6	UI - Make LLM status sticky (#4135)	2026-05-12 15:28:33 +02:00
dgtlmoon	15be5a62db	LLM - Bumping default prompt	2026-05-12 13:18:58 +02:00
dgtlmoon	a2fa9a9e7b	LLM integration - LiteLLM config - UI tweaks (#4134)	2026-05-12 11:33:11 +02:00
K K	972d1206e8	LLM - Self-hosted OpenAI-compatible endpoint support (vLLM, LM Studio, llama.cpp) — refs #3204 (#4117)	2026-05-11 18:04:11 +02:00
dgtlmoon	bbf56e2253	UI - "Time between check" fields re-order labels. #4128	2026-05-11 17:55:05 +02:00
dgtlmoon	dfc6eaf340	HTML escaping in HTML notifications - Bumping tests (#4131)	2026-05-11 17:48:24 +02:00