Compare commits

188 Commits

Author SHA1 Message Date
dgtlmoon
fecd181e07 oops 2026-03-02 11:15:12 +01:00
dgtlmoon
525e390523 test env tweaks 2026-03-02 11:10:28 +01:00
dgtlmoon
7fe332ad95 Small fix for 3.14 setup 2026-03-02 10:37:02 +01:00
dgtlmoon
b65a01ec02 Python 3.14 test #3662 2026-03-02 10:37:02 +01:00
dgtlmoon
b984426666 0.54.3 2026-03-01 00:18:45 +01:00
dgtlmoon
1889a10ef6 CVE-2026-27696 Small fix - Restricted hostnames can still be added but are only checked at fetch-time (not when rendering lists etc) (#3938) 2026-03-01 00:17:29 +01:00
dgtlmoon
f66ae4fceb Adding Ukranian translations, rebuilding translations. (#3936) 2026-02-28 21:59:44 +01:00
Rithy-Nicolas TAN
fb14229888 Update messages.po in French translation (#3926) 2026-02-28 21:20:20 +01:00
dgtlmoon
6d1081f5bc 0.54.2 2026-02-27 11:00:57 +01:00
dgtlmoon
9e907d8466 Unresolvable hostnames should still be added, they are checked for security at fetch time (#3933) 2026-02-27 11:00:44 +01:00
dependabot[bot]
6d6a0fd7ef CI workflow - Bump the all group with 2 updates (#3931) 2026-02-27 10:06:08 +01:00
dependabot[bot]
1537e58fc2 Update jsonpath-ng requirement from ~=1.7.0 to ~=1.8.0 (#3929) 2026-02-27 10:05:32 +01:00
dgtlmoon
5669509255 API - Processors configuration is now part of the API (#3902) 2026-02-25 11:30:39 +01:00
dgtlmoon
1d72716c69 Notification Token {{diff}} can accept arguments like {{diff_added(lines=5, context=2)}} (#3923) 2026-02-24 16:15:05 +01:00
dgtlmoon
c12da77439 Fixing change_datetime notification token (and adding test) (#3922) 2026-02-24 14:14:53 +01:00
dgtlmoon
f9048af6e8 0.54.1 2026-02-23 23:01:31 +01:00
dgtlmoon
2f7315e29c Tests - Tweaks to upgrade path tests 2026-02-23 22:20:13 +01:00
dgtlmoon
bf3f8eae45 Tests - Run upgrade path test with ALLOW_IANA_RESTRICTED_ADDRESSES=true 2026-02-23 21:59:00 +01:00
dgtlmoon
fe7aa38c65 CVE-2026-27696 - Server-Side Request Forgery (SSRF) via Watch URLs, set env var ALLOW_IANA_RESTRICTED_ADDRESSES to true to access IANA reserved URLs such as http://169.254.169.254, http://10.0.0.1/, http://127.0.0.1/, etc. 2026-02-23 21:56:43 +01:00
dgtlmoon
a385c89abf CVE-2026-27645 - Reflected XSS in RSS Single Watch request 2026-02-23 21:55:59 +01:00
dgtlmoon
98f884bbff 0.53.7 2026-02-23 08:11:55 +01:00
dgtlmoon
35499d1171 Libraries/Build - unpin referencing library (#3919) 2026-02-23 07:27:00 +01:00
dependabot[bot]
599aed75d1 Bump referencing from 0.35.1 to 0.37.0 (#3677) 2026-02-23 05:32:41 +01:00
dgtlmoon
6df75a5af9 Upgrading flask-socketio and related packages with security updates ( #3910 ) (#3918) 2026-02-23 05:30:24 +01:00
dgtlmoon
f71c4b9865 0.53.6 2026-02-21 14:12:37 +01:00
dgtlmoon
82d5d7999c Pip installs - remove flask patch and pin library versions 2026-02-21 13:41:16 +01:00
dgtlmoon
7a51f1e4bf Lazy load flask_compress 2026-02-20 08:56:25 +01:00
dgtlmoon
91dee697f9 UI - Content compression was not obeying FLASK_ENABLE_COMPRESSION, should be off by default due to a memory leak in flask_compress & socket.io 2026-02-20 08:54:10 +01:00
dgtlmoon
4128acf95a 0.53.5 2026-02-20 00:57:52 +01:00
dgtlmoon
7c8d59c795 Fixing bad replacement of metadata causing possible content removal #3906 (#3908) 2026-02-20 00:55:37 +01:00
dgtlmoon
897403f7cc UI - Backup restore (#3899) 2026-02-18 18:05:32 +01:00
dgtlmoon
bca35f680e 0.53.4 2026-02-18 14:07:26 +01:00
dgtlmoon
fafea1b5c6 Updates/migration - Re-run tag update, re-save to cleanup changedetection.json, code refactor (#3898) 2026-02-18 14:05:23 +01:00
dgtlmoon
93630e188d UI - Search modal - fixes for running in sub path 2026-02-18 10:27:12 +01:00
dgtlmoon
7e99d748b9 Puppeteer - Adding extra browser cleanup (#3897) 2026-02-18 10:18:14 +01:00
dgtlmoon
352c91c619 Puppeteer - Use a modern scroll method for screenshot stitching 2026-02-18 10:01:22 +01:00
dgtlmoon
a6e55aaba9 UI - CSS - Ensure 'difference' 'preview' both wraps by word and by very long strings 2026-02-17 17:08:44 +01:00
dgtlmoon
25a17bd49d Fix: Some SPAs with long content - Stripping tags must also find matching close tag (#3895) 2026-02-17 16:57:29 +01:00
dgtlmoon
954582a581 Fix: Some SPA's also set body content to display: none which breaks text output 2026-02-17 15:38:54 +01:00
dgtlmoon
d8ef86a8b5 "Error 200 no content" - Some very large SPA pages make HTML to Text fail by dumping 10Mb+ into page header, strip extras. (#3892) 2026-02-17 14:44:03 +01:00
dgtlmoon
8711d29861 UI - Filters & Triggers - Adding reminder that you can also use 'Conditions' for trigger rules 2026-02-17 02:55:18 +01:00
dgtlmoon
2343ddd88a Minor code tidy 2026-02-17 02:46:22 +01:00
dgtlmoon
c6d6ef0e0c Fix time schedule off-by-one bug at exact end times for all durations and add comprehensive edge case tests Re #846 (#3890) 2026-02-17 02:38:16 +01:00
dgtlmoon
23063ad8a1 UI - More fixes for realtime updates 2026-02-17 02:37:03 +01:00
dgtlmoon
27b8a2d178 UI - Fixing realtime updates for status updates when checking (#3889) 2026-02-17 02:26:38 +01:00
dgtlmoon
a53f2a784d Pluggy plugin hook for before and after a watch is processed (#3888) 2026-02-17 01:58:41 +01:00
dgtlmoon
7558ca5fda 0.53.3 2026-02-16 20:41:07 +01:00
dgtlmoon
383c3b427f API - Adding automated test for API with NGINX sub-path, Skip validation errors about server path (allows use on sub-paths/reverse proxy etc) (#3886) 2026-02-16 20:32:35 +01:00
dgtlmoon
b01ba5d8a1 UI - Use version from code in version tab 2026-02-16 19:41:27 +01:00
dgtlmoon
86e5184cef 0.53.2 2026-02-16 18:52:31 +01:00
dgtlmoon
1dbf1f5db5 UI - Watch overview - Restock price, validate number before output (#3883) 2026-02-16 18:50:37 +01:00
dgtlmoon
c5bd7da647 Security - Adding small test and fixing overzealous filename cleaner (#3884) 2026-02-16 18:31:25 +01:00
dgtlmoon
549e167746 Datastore - On fresh installs, also scan for existing watch.json watches in subdirectories 2026-02-16 15:56:46 +01:00
dgtlmoon
9d38b45173 Security CVE-2026-25527 - Unauthenticated static path traversal in resources 2026-02-16 15:48:03 +01:00
dgtlmoon
3558e9ee10 Browser Steps - Minor code cleanup 2026-02-16 13:22:54 +01:00
dgtlmoon
4b94de7e0c UI - Browser Steps - First step was missing Clear / Remove / Pic buttons 2026-02-16 13:20:34 +01:00
dgtlmoon
3f99f0dd7b 0.53.1 2026-02-16 13:06:49 +01:00
dgtlmoon
fe465de73c Browser Steps - Clean off empty fields on save/update (UI and API), small refactor Re #3874, #3879 (#3880) 2026-02-16 13:05:46 +01:00
dgtlmoon
1ad3207288 Test - Improve test for watch package download 2026-02-16 13:05:18 +01:00
dgtlmoon
dbe238e33d UI - Watch data download, fix test, update text. 2026-02-16 11:13:19 +01:00
dgtlmoon
32cb72b459 UI - Ability to download a complete data package (.zip) of a watch (#3877) 2026-02-15 10:53:21 +01:00
dgtlmoon
501aa61e19 Disable content compression of HTML/etc by default due to memory leak between flask_socketio and flask and flask_compress. 2026-02-15 08:19:29 +01:00
dgtlmoon
b6d3d63372 Avoid reprocessing if the page was the same (#3867) 2026-02-14 21:24:28 +01:00
dependabot[bot]
f4bb32f588 Update python-socketio requirement from ~=5.16.0 to ~=5.16.1 (#3869) 2026-02-13 17:43:43 +01:00
dgtlmoon
bcd32852ca API - Remove flask_expects_json validation, this is covered entirely by OpenAPI, update OpenAPI spec. (#3871) 2026-02-13 16:30:59 +01:00
dependabot[bot]
ad14807067 Update python-engineio requirement from ~=4.13.0 to ~=4.13.1 (#3868) 2026-02-13 11:24:50 +01:00
dgtlmoon
4bc01aca8d Price tracker - Use a more memory efficient price scraper, use subprocess on linux for cleaner memory management. (#3864) 2026-02-11 17:21:08 +01:00
dgtlmoon
ef41dd304c Refactoring upgrade path (#3861) 2026-02-11 16:13:08 +01:00
dgtlmoon
5726c5a0ac API - Import use background task to import large lists (#3858)
Some checks failed
2026-02-11 08:15:58 +01:00
dgtlmoon
80f7decf4f API - Bumping docs 2026-02-11 07:44:45 +01:00
dgtlmoon
c66a29b011 API - Import - Ability to set any watch value as HTTP URL Query value, for example ?processor=restock_diff&time_between_check={'hours':24} Re #3845 (#3857) 2026-02-11 07:26:48 +01:00
dgtlmoon
a1a2e5c5bf API - Include missing tags in fetching watch information. #3854 (#3856) 2026-02-11 06:45:19 +01:00
dgtlmoon
6e90a0bbd1 UI - Bulk checkbox operations modal confirmation fix Re #3853 2026-02-11 06:29:59 +01:00
dgtlmoon
987789425d Tags update fix (#3849)
Some checks failed
2026-02-07 17:13:41 +01:00
dgtlmoon
892b645147 Refactor for Tags storage (#3848)
Some checks failed
2026-02-07 13:13:02 +01:00
dgtlmoon
278da3fa9b Including uptime in UI settings/info
Some checks failed
2026-02-07 03:50:49 +01:00
dgtlmoon
c577bd700c Refactor watch saving backend, closes #3846 (#3847) 2026-02-07 03:41:35 +01:00
dependabot[bot]
d4d6bb2872 Bump psutil from 7.2.1 to 7.2.2 (#3844)
Some checks failed
2026-02-06 19:55:04 +01:00
dependabot[bot]
45fb262386 Bump pyppeteer-ng from 2.0.0rc12 to 2.0.0rc13 (#3843)
Some checks failed
2026-02-06 01:33:10 +01:00
dgtlmoon
1058debc12 Fix for when MoreThanOnePriceFound() is raised, plugins don't fire #3840 #3833
Some checks failed
2026-02-05 20:07:47 +01:00
dgtlmoon
61b41b0b16 Rebuild translations (#3842) 2026-02-05 18:17:46 +01:00
dgtlmoon
efe3afd383 UI - Favicon use lazy load for faster rendering 2026-02-05 17:21:57 +01:00
dgtlmoon
84d26640cc Adding more tests and Watch object improvements (#3841) 2026-02-05 17:01:08 +01:00
dgtlmoon
2349344d9e Improved watch global settings handling (#3839) 2026-02-05 16:40:00 +01:00
dgtlmoon
bdc2916c07 New datastore message should be warning not critical 2026-02-05 16:25:22 +01:00
dgtlmoon
4fd477a60c Improving upgrade path
Some checks failed
2026-02-05 13:00:01 +01:00
dgtlmoon
dc8b387f40 History length limit size option (#3834) 2026-02-05 12:29:20 +01:00
dgtlmoon
2149a6fe3b Memory improvement - Use builtin markupsafe instead of creating a jinja2 template env each time for small strings (#3836) 2026-02-05 10:07:36 +01:00
dgtlmoon
f77d2bac6d Favicon path - cache results 2026-02-05 09:39:50 +01:00
dgtlmoon
75ecd1b793 UI - Backups tab - styling fix
Some checks failed
2026-02-05 00:18:22 +01:00
dgtlmoon
4fe2a67839 Styling fix for "backups" tab Re #3821 2026-02-04 22:42:57 +01:00
dgtlmoon
5bbbe37436 UI- Fix possible bug adding tags in quickwatch form
Some checks failed
2026-02-04 14:27:54 +01:00
dgtlmoon
83d7ce0fcf Processor plugin improvements - Now supports creating your own processor (for example, monitor DNS changes) (#3739)
Some checks failed
2026-02-04 14:23:08 +01:00
dependabot[bot]
6bea9909ec Bump elementpath from 5.1.0 to 5.1.1 (#3799) 2026-02-04 11:49:35 +01:00
dgtlmoon
1aabf967ef Puppeteer and Playwright browser close/shutdown improvements (#3830)
Some checks failed
2026-02-03 11:14:51 +01:00
dgtlmoon
30dc4ac23b Refactor of queue system and improve tests, improves multiple workers (#3826)
Some checks failed
2026-02-02 22:28:27 +01:00
dgtlmoon
2658f81f02 Ability to limit total number of watches with env var PAGE_WATCH_LIMIT (#3828)
Some checks failed
2026-02-02 11:21:44 +01:00
dgtlmoon
674d863a21 UI - Move Default Proxy selection back to "General" tab
Some checks failed
2026-01-28 18:15:32 +01:00
dgtlmoon
0b9cfcdf09 API - Notification URLs weren't always being validated (#3812) 2026-01-28 17:30:58 +01:00
dgtlmoon
fd820c9330 Remove deprecated call to strtobool 2026-01-28 17:27:20 +01:00
Robbert Langezaal
e02a1824c5 UI - Make watch tags link elements (#3813) 2026-01-28 17:18:23 +01:00
dgtlmoon
5911b7fe7a test tweak 2026-01-28 17:17:37 +01:00
dgtlmoon
a239480272 DB data migration upgrade fixes (#3811) 2026-01-28 12:02:19 +01:00
dgtlmoon
fceb3cf39f Big refactor to save watches as their own datafile with some agnostic data store backend, saves writing a huge JSON file every time (#3775)
Some checks failed
2026-01-28 10:18:21 +01:00
dgtlmoon
7f631268dd Improved catching of errors/exceptions in Browser Steps steps (#3808)
Some checks failed
2026-01-27 16:23:54 +01:00
Tim Kye
8cc04ca7c5 Improving default settings for remote reverse proxies (#3806) 2026-01-27 16:23:34 +01:00
dgtlmoon
4dec1e017b CLI extra options, "batch mode" see --help allows re-checking and adding watches from the CLI (#3802)
Some checks failed
2026-01-24 14:18:06 +01:00
Dominik Herold
9d1743adbe Update messages.po // German (#3797)
Some checks failed
2026-01-23 10:18:41 +01:00
dependabot[bot]
f34d806b09 Bump apprise from 1.9.6 to 1.9.7 (#3800) 2026-01-23 10:18:18 +01:00
dgtlmoon
c22335ed01 0.52.9
Some checks failed
2026-01-22 10:30:12 +01:00
dgtlmoon
0042f0c36a Memory management improvements for large screenshots, Brotli snapshot improvements (#3798) 2026-01-22 10:29:22 +01:00
dgtlmoon
55e14cf394 Updating site.webmanifest for PWA usage 2026-01-22 10:28:38 +01:00
Ianis BERNARD
308ccb5841 Use credentials to fetch web manifest (#3790) 2026-01-22 10:27:20 +01:00
dgtlmoon
978e17acf6 Make language selection sticky and provide a way to return back to default auto-detect #3792 (#3795) 2026-01-22 08:01:49 +01:00
dgtlmoon
73c29d1fa0 Element locking 'off' by default (so they don't move when the screenshot scroll happens), only lock top viewport elements. Improve logging. (#3796) 2026-01-22 08:01:19 +01:00
dgtlmoon
b3eb88b6d2 Rebuilding language translation files
Some checks failed
2026-01-22 06:11:21 +01:00
Alex Notes
aa73ce2ee6 Update French translation (#3788) 2026-01-22 05:44:20 +01:00
Maicon Strey
0cbf345e84 Open github link on new tab (#3791) 2026-01-22 05:43:15 +01:00
Dominik Herold
d65e08e7c8 Update messages.po // German "From" (#3793) 2026-01-22 05:42:49 +01:00
dgtlmoon
10f233a939 Improving container version labeling, using master branch as docker :dev tag. Re #3794 2026-01-22 05:38:56 +01:00
dgtlmoon
52911d699f 0.52.8
Some checks failed
2026-01-20 13:35:12 +01:00
dgtlmoon
7e886e0c56 Memory - Favicon reader had a memory leak, Restart fetch workers between jobs, misc tweaks (#3787) 2026-01-20 12:49:53 +01:00
dgtlmoon
151e603af7 API - Validation improvements (#3782)
Some checks failed
2026-01-19 18:16:25 +01:00
dgtlmoon
7311af4b58 i18n - zh traditional chinese autodetect from browser fix 2026-01-19 16:28:25 +01:00
dgtlmoon
af193e8d7a UI - Fixes for search dialog #3778 (#3781) 2026-01-19 16:18:23 +01:00
dgtlmoon
9e2acadb7e 0.52.7
2026-01-19 09:37:01 +01:00
吾爱分享
48da93b4ec Fix zh PO duplicates and complete new translations. (#3773) 2026-01-19 09:35:52 +01:00
dgtlmoon
0c1adc8906 Lots of translation updates (#3772)
2026-01-17 22:24:23 +01:00
dgtlmoon
9e5a0a0209 UI - Global "mute" and "pause" buttons on main menu, move "Backups" to "Settings" (#3769)
2026-01-17 18:20:29 +01:00
dgtlmoon
9b96689072 API & UI - Recheck all - Don't requeue existing queued or processing watches. (#3770) 2026-01-17 18:20:22 +01:00
dgtlmoon
5e5674f48d Non blocking improvements (#3767)
* Non blocking improvements

* Test fix

* Background thread re-queue

* Non-blocking improvements, run tasks in background, add warning about CPU cores

* Misc fixes
2026-01-17 17:25:18 +01:00
dgtlmoon
272e68ad2e Improvements to deterministic fix (false triggers) (#3766) 2026-01-17 16:12:32 +01:00
dgtlmoon
01e06979d8 Run "clear all history" in background thread to prevent blocking (#3765) 2026-01-17 15:34:21 +01:00
dgtlmoon
e45c77d51d Test - Adding missing test 2026-01-17 15:33:34 +01:00
dgtlmoon
bee1130c6e Important fix for possible wrong detection of changes under high-concurrency setups (many many fetch workers) 2026-01-17 14:45:23 +01:00
dgtlmoon
5f8448d0e2 Language updates (#3764) 2026-01-17 14:11:57 +01:00
dgtlmoon
9438d38dc6 Queues and Scheduler - No need to add imported items to the check queue, the scheduler will do this #3762 (#3763), CPU usage improvements.
* No need to add imported items to the check queue, the scheduler will do this #3762

* Tests - Faster recheck/reschedule loop under pytest environment

* More wait time under test

* Bunch up some tests a little

* fix typo

* woops

* If they want to queue one that's already running, that's up to them.

* WIP

* Fixing queue limit size

* Increase max queue size and many CPU performance fixes
2026-01-17 13:43:24 +01:00
dgtlmoon
d0c66758c2 UI - Fixing link to scheduler help/tutorial page.
2026-01-16 19:14:29 +01:00
dgtlmoon
9e8a9d5907 Manual update of DE language (and recompile all languages) 2026-01-16 18:47:01 +01:00
dgtlmoon
7449be39fb Recompile CSS 2026-01-16 18:40:16 +01:00
dgtlmoon
e9f3d0bce4 UI - Mobile - Empty page watches message and layout improvements (#3760) 2026-01-16 17:59:52 +01:00
dgtlmoon
2abc8aa9b4 UI - CSS - Give dark-mode switching a soft transition 2026-01-16 17:45:33 +01:00
dgtlmoon
69b70a2a07 Edit - More reliable fetch of watch on test (usually affects tests) 2026-01-16 16:52:35 +01:00
吾爱分享
0c42bcb8d6 Manual polish for several translations in the zh locale. (#3757)
2026-01-16 10:50:31 +01:00
dgtlmoon
091c708a28 Fix for old selenium 3 (#3748 #3756), however be sure to use selenium 4. 2026-01-16 10:30:26 +01:00
dgtlmoon
084be9c990 Languages - Recompile languages, small fix for 'de'. 2026-01-16 09:50:15 +01:00
dependabot[bot]
6db1085337 Bump elementpath from 5.0.4 to 5.1.0 (#3754) 2026-01-16 09:22:10 +01:00
吾爱分享
66553e106d Update zh translations with improved, consistent Simplified Chinese UI copy. (#3752) 2026-01-16 09:21:29 +01:00
dependabot[bot]
5b01dbd9f8 Bump apprise from 1.9.5 to 1.9.6 (#3753) 2026-01-16 09:09:02 +01:00
dgtlmoon
c86f214fc3 0.52.6
2026-01-15 22:28:58 +01:00
dgtlmoon
32149640d9 Selenium fetcher - Small fix for #3748 RGB error on transparent screenshots or similar (#3749) 2026-01-15 20:56:53 +01:00
dgtlmoon
15f16455fc UI - Show queue size above watch table in realtime
2026-01-15 17:28:09 +01:00
dgtlmoon
15cdfac9d9 0.52.5 2026-01-15 14:07:09 +01:00
dgtlmoon
04de397916 Revert sub-process brotli saving because it could fork-bomb/use up too many system resources (#3747) 2026-01-15 13:56:08 +01:00
dgtlmoon
4643082c5b i18n: Recompile zh_Hant_TW/LC_MESSAGES/messages.mo 2026-01-15 13:21:49 +01:00
滅ü
3b2b74e62d i18n: Update zh_Hant_TW translations (#3745) 2026-01-15 13:12:25 +01:00
dependabot[bot]
68354cf53d Update jsonschema requirement from ~=4.25 to ~=4.26 (#3743) 2026-01-15 13:03:16 +01:00
dgtlmoon
3e364e0eba Translations - ZH_Hant_TW - Fixing timeago string handling #3737
2026-01-15 12:24:53 +01:00
dgtlmoon
06ea29bfc7 Translations - Fixing zh_TW to zh_Hant_TW , adding tests #3737 (#3744) 2026-01-15 12:01:12 +01:00
dependabot[bot]
f4e178955c Bump pyppeteer-ng from 2.0.0rc10 to 2.0.0rc11 (#3742) 2026-01-15 10:31:42 +01:00
dgtlmoon
51d531d732 0.52.4
2026-01-14 13:26:23 +01:00
dgtlmoon
e40c4ca97d Fixing Traditional Chinese locale mapping #3737 (#3738) 2026-01-14 13:26:07 +01:00
dgtlmoon
b8ede70f3a Languages - Pypi/pip package was missing translations 2026-01-14 13:09:23 +01:00
dgtlmoon
50b349b464 0.52.3
2026-01-14 12:00:54 +01:00
dgtlmoon
67d097cca7 UI - Groups - Adding 'Recheck' button from groups overview page 2026-01-14 11:59:42 +01:00
dgtlmoon
494385a379 Minor playwright memory cleanup improvements (#3736) 2026-01-14 11:54:53 +01:00
dgtlmoon
c2ee84b753 Browser Steps UI async_loop bug, refactored startup of BrowserSteps, increased test coverage. Re #3734 (#3735) 2026-01-14 11:27:01 +01:00
dgtlmoon
c1e0296cda 0.52.2
2026-01-13 16:36:16 +01:00
dgtlmoon
f041223c38 Page fetchers - Were not truly running independently and could have been blocking each other; this commit speeds up page fetches where there is more than 1 worker. 2026-01-13 16:32:50 +01:00
dgtlmoon
d36738d7ef RSS - Bugfix - possible edge case of wrong feed info could be rendered (#3733) 2026-01-13 16:31:58 +01:00
dgtlmoon
e51ff34c89 UI - Language modal - flag icons should be round
2026-01-12 18:01:42 +01:00
dgtlmoon
ba4ed9cf27 0.52.1 2026-01-12 17:52:52 +01:00
dgtlmoon
33b7f1684d Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2026-01-12 17:51:17 +01:00
dgtlmoon
3d14df6a11 Development branch merge into release/master
Multi-language / Translations Support (#3696)
  - Complete internationalization system implemented
  - Support for 7 languages: Czech (cs), German (de), French (fr), Italian (it), Korean (ko), Chinese Simplified (zh), Chinese Traditional (zh_TW)
  - Language selector with localized flags and theming
  - Flash message translations
  - Multiple translation fixes and improvements across all languages
  - Language setting preserved across redirects

  Pluggable Content Fetchers (#3653)
  - New architecture for extensible content fetcher system
  - Allows custom fetcher implementations

  Image / Screenshot Comparison Processor (#3680)
  - New processor for visual change detection (disabled for this release)
  - Supporting CSS/JS infrastructure added

  UI Improvements

  Design & Layout
  - Auto-generated tag color schemes
  - Simplified login form styling
  - Removed hard-coded CSS, moved to SCSS variables
  - Tag UI cleanup and improvements
  - Automatic tab wrapper functionality
  - Menu refactoring for better organization
  - Cleanup of offset settings
  - Hide sticky tabs on narrow viewports
  - Improved responsive layout (#3702)

  User Experience
  - Modal alerts/confirmations on delete/clear operations (#3693, #3598, #3382)
  - Auto-add https:// to URLs in quickwatch form if not present
  - Better redirect handling on login (#3699)
  - 'Recheck all' now returns to correct group/tag (#3673)
  - Language set redirect keeps hash fragment
  - More friendly human-readable text throughout UI

  Performance & Reliability

  Scheduler & Processing
  - Soft delays instead of blocking time.sleep() calls (#3710)
  - More resilient handling of same UUID being processed (#3700)
  - Better Puppeteer timeout handling
  - Improved Puppeteer shutdown/cleanup (#3692)
  - Requests cleanup now properly async

  History & Rendering
  - Faster server-side "difference" rendering on History page (#3442)
  - Show ignored/triggered rows in history
  - API: Retry watch data if watch dict changed (more reliable)

  API Improvements

  - Watch get endpoint: retry mechanism for changed watch data
  - WatchHistoryDiff API endpoint includes extra format args (#3703)

  Testing Improvements

  - Replace time.sleep with wait_for_notification_endpoint_output (#3716)
  - Test for mode switching (#3701)
  - Test for #3720 added (#3725)
  - Extract-text difference test fixes
  - Improved dev workflow

  Bug Fixes

  - Notification error text output (#3672, #3669, #3280)
  - HTML validation fixes (#3704)
  - Template discovery path fixes
  - Notification debug log now uses system locale for dates/times
  - Puppeteer spelling mistake in log output
  - Recalculation on anchor change
  - Queue bubble update disabled temporarily

  Dependency Updates

  - beautifulsoup4 updated (#3724)
  - psutil 7.1.0 → 7.2.1 (#3723)
  - python-engineio ~=4.12.3 → ~=4.13.0 (#3707)
  - python-socketio ~=5.14.3 → ~=5.16.0 (#3706)
  - flask-socketio ~=5.5.1 → ~=5.6.0 (#3691)
  - brotli ~=1.1 → ~=1.2 (#3687)
  - lxml updated (#3590)
  - pytest ~=7.2 → ~=9.0 (#3676)
  - jsonschema ~=4.0 → ~=4.25 (#3618)
  - pluggy ~=1.5 → ~=1.6 (#3616)
  - cryptography 44.0.1 → 46.0.3 (security) (#3589)

  Documentation

  - README updated with viewport size setup information

  Development Infrastructure

  - Dev container only built on dev branch
  - Improved dev workflow tooling
2026-01-12 17:50:53 +01:00
dgtlmoon
08ce1e28ce Adding test for #3720 2026-01-12 11:40:31 +01:00
MkDev11
e4118a1620 Testing - fix: Replace time.sleep with wait_for_notification_endpoint_output in test_notification (#3716)
2026-01-07 22:24:58 +01:00
dgtlmoon
64d0c09b08 Update README.md - Info about setting up different viewport sizes 2026-01-07 22:23:07 +01:00
dgtlmoon
008e5eb024 Use soft delays instead of blocking time sleeps in scheduler (#3710)
2026-01-05 10:34:14 +01:00
dgtlmoon
e6553065fd API - Watch get, retry watch data if watch dict changed (more reliable) 2026-01-05 10:31:17 +01:00
dgtlmoon
de996a4566 Notification debug log - Use locale of system for dates/times 2026-01-05 10:13:35 +01:00
dgtlmoon
4784ae4cd0 Misc small HTML Validation fixes (#3704)
2026-01-04 17:17:17 +01:00
dgtlmoon
39274f121c 0.51.4
2025-11-28 13:26:15 +01:00
dgtlmoon
4b1d871078 Improving UTF-8 handling for xPath selectors (Stop the xpath filter from chewing up non-regular-latin-text style content) (#3659)
2025-11-28 13:13:41 +01:00
dependabot[bot]
f78c2dcffd Bump actions/checkout from 5 to 6 in the all group (#3651)
2025-11-24 02:03:13 +01:00
Voczi
1c2c22b8df Specify UTF-8 encoding for xpath_element_js (#3650) 2025-11-23 19:55:26 +01:00
dgtlmoon
3276a9347a Update playwright library to 1.56
2025-11-21 11:12:18 +01:00
dgtlmoon
d763bb4267 0.51.3
2025-11-19 16:43:37 +01:00
dgtlmoon
be3c9892e0 RSS Reader Mode parser improvements - Pick up all fields from RSS where possible, better auto-detect of the XML encoding if it wasnt set by the browser (#3646) 2025-11-19 16:42:25 +01:00
835 changed files with 89127 additions and 5787 deletions

33
.github/nginx-reverse-proxy-test.conf vendored Normal file

@@ -0,0 +1,33 @@
server {
listen 80;
server_name localhost;
# Test basic reverse proxy to changedetection.io
location / {
proxy_pass http://changedet-app:5000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
# Test subpath deployment with X-Forwarded-Prefix
location /changedet-sub/ {
proxy_pass http://changedet-app:5000/;
proxy_set_header X-Forwarded-Prefix /changedet-sub;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}

View File

@@ -7,6 +7,8 @@ ENV PYTHONUNBUFFERED=1
COPY requirements.txt /requirements.txt
ARG TARGETPLATFORM
RUN \
apk add --update --no-cache --virtual=build-dependencies \
build-base \
@@ -27,7 +29,19 @@ RUN \
file \
nodejs \
poppler-utils \
python3 && \
python3 \
glib \
libsm \
libxext \
libxrender && \
case "$TARGETPLATFORM" in \
linux/arm/v7|linux/arm/v8) \
echo "INFO: Skipping py3-opencv on $TARGETPLATFORM (using pixelmatch fallback)" \
;; \
*) \
apk add --update --no-cache py3-opencv || echo "WARN: py3-opencv install failed, using pixelmatch fallback" \
;; \
esac && \
echo "**** pip3 install test of changedetection.io ****" && \
python3 -m venv /lsiopy && \
pip install -U pip wheel setuptools && \

View File

@@ -30,7 +30,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL

View File

@@ -39,14 +39,14 @@ jobs:
# Or if we are in a tagged release scenario.
if: ${{ github.event.workflow_run.conclusion == 'success' }} || ${{ github.event.release.tag_name }} != ''
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Set up Python 3.11
uses: actions/setup-python@v6
with:
python-version: 3.11
- name: Cache pip packages
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
@@ -93,16 +93,27 @@ jobs:
driver-opts: image=moby/buildkit:master
# master branch -> :dev container tag
- name: Docker meta :dev
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/metadata-action@v5
id: meta_dev
with:
images: |
${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io
ghcr.io/${{ github.repository }}
tags: |
type=raw,value=dev
- name: Build and push :dev
id: docker_build
if: ${{ github.ref }} == "refs/heads/master"
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/build-push-action@v6
with:
context: ./
file: ./Dockerfile
push: true
tags: |
${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:dev,ghcr.io/${{ github.repository }}:dev
tags: ${{ steps.meta_dev.outputs.tags }}
labels: ${{ steps.meta_dev.outputs.labels }}
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/arm/v8
cache-from: type=gha
cache-to: type=gha,mode=max
@@ -141,6 +152,7 @@ jobs:
file: ./Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/arm/v8
cache-from: type=gha
cache-to: type=gha,mode=max

View File

@@ -7,7 +7,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
@@ -21,7 +21,7 @@ jobs:
- name: Build a binary wheel and a source tarball
run: python3 -m build
- name: Store the distribution packages
uses: actions/upload-artifact@v5
uses: actions/upload-artifact@v7
with:
name: python-package-distributions
path: dist/
@@ -34,7 +34,7 @@ jobs:
- build
steps:
- name: Download all the dists
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: python-package-distributions
path: dist/
@@ -61,8 +61,8 @@ jobs:
# --- API test ---
# This also means that the docs/api-spec.yml was shipped and could be read
test -f /tmp/url-watches.json
API_KEY=$(jq -r '.. | .api_access_token? // empty' /tmp/url-watches.json)
test -f /tmp/changedetection.json
API_KEY=$(jq -r '.. | .api_access_token? // empty' /tmp/changedetection.json)
echo Test API KEY is $API_KEY
curl -X POST "http://127.0.0.1:10000/api/v1/watch" \
-H "x-api-key: ${API_KEY}" \
@@ -93,7 +93,7 @@ jobs:
steps:
- name: Download all the dists
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: python-package-distributions
path: dist/

View File

@@ -44,14 +44,14 @@ jobs:
- platform: linux/arm64
dockerfile: ./.github/test/Dockerfile-alpine
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Set up Python 3.11
uses: actions/setup-python@v6
with:
python-version: 3.11
- name: Cache pip packages
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

View File

@@ -7,7 +7,7 @@ jobs:
lint-code:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Lint with Ruff
run: |
pip install ruff
@@ -52,4 +52,13 @@ jobs:
uses: ./.github/workflows/test-stack-reusable-workflow.yml
with:
python-version: '3.13'
skip-pypuppeteer: true
skip-pypuppeteer: true
test-application-3-14:
#if: github.event_name == 'push' && github.ref == 'refs/heads/master'
needs: lint-code
uses: ./.github/workflows/test-stack-reusable-workflow.yml
with:
python-version: '3.14'
skip-pypuppeteer: false

View File

@@ -21,7 +21,7 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v6
@@ -29,7 +29,7 @@ jobs:
python-version: ${{ env.PYTHON_VERSION }}
- name: Cache pip packages
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements.txt') }}
@@ -37,10 +37,29 @@ jobs:
${{ runner.os }}-pip-py${{ env.PYTHON_VERSION }}-
${{ runner.os }}-pip-
- name: Get current date for cache key
id: date
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build changedetection.io container for testing under Python ${{ env.PYTHON_VERSION }}
uses: docker/build-push-action@v6
with:
context: ./
file: ./Dockerfile
build-args: |
PYTHON_VERSION=${{ env.PYTHON_VERSION }}
LOGGER_LEVEL=TRACE
tags: test-changedetectionio
load: true
cache-from: type=gha,scope=build-${{ github.ref_name }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements.txt', 'Dockerfile') }}-${{ steps.date.outputs.date }}
cache-to: type=gha,mode=max,scope=build-${{ github.ref_name }}-py${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements.txt', 'Dockerfile') }}-${{ steps.date.outputs.date }}
- name: Verify build
run: |
echo "---- Building for Python ${{ env.PYTHON_VERSION }} -----"
docker build --build-arg PYTHON_VERSION=${{ env.PYTHON_VERSION }} --build-arg LOGGER_LEVEL=TRACE -t test-changedetectionio .
echo "---- Built for Python ${{ env.PYTHON_VERSION }} -----"
docker run test-changedetectionio bash -c 'pip list'
- name: We should be Python ${{ env.PYTHON_VERSION }} ...
@@ -52,7 +71,7 @@ jobs:
docker save test-changedetectionio -o /tmp/test-changedetectionio.tar
- name: Upload Docker image artifact
uses: actions/upload-artifact@v5
uses: actions/upload-artifact@v7
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp/test-changedetectionio.tar
@@ -66,10 +85,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -84,6 +103,7 @@ jobs:
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_watch_model'
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_jinja2_security'
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_semver'
docker run test-changedetectionio bash -c 'python3 -m unittest changedetectionio.tests.unit.test_html_to_text'
# Basic pytest tests with ancillary services
basic-tests:
@@ -93,10 +113,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -110,6 +130,32 @@ jobs:
docker network inspect changedet-network >/dev/null 2>&1 || docker network create changedet-network
docker run --name test-cdio-basic-tests --network changedet-network test-changedetectionio bash -c 'cd changedetectionio && ./run_basic_tests.sh'
- name: Test CLI options
run: |
docker network inspect changedet-network >/dev/null 2>&1 || docker network create changedet-network
docker run --name test-cdio-cli-opts --network changedet-network test-changedetectionio bash -c 'changedetectionio/test_cli_opts.sh' &> cli-opts-output.txt
echo "=== CLI Options Test Output ==="
cat cli-opts-output.txt
- name: CLI Memory Test
run: |
echo "=== Checking CLI batch mode memory usage ==="
# Extract RSS memory value from output
RSS_MB=$(grep -oP "Memory consumption before worker shutdown: RSS=\K[\d.]+" cli-opts-output.txt | head -1 || echo "0")
echo "RSS Memory: ${RSS_MB} MB"
# Check if RSS is less than 100MB
if [ -n "$RSS_MB" ]; then
if (( $(echo "$RSS_MB < 100" | bc -l) )); then
echo "✓ Memory usage is acceptable: ${RSS_MB} MB < 100 MB"
else
echo "✗ Memory usage too high: ${RSS_MB} MB >= 100 MB"
exit 1
fi
else
echo "⚠ Could not extract memory usage, skipping check"
fi
- name: Extract memory report and logs
if: always()
uses: ./.github/actions/extract-memory-report
@@ -119,11 +165,18 @@ jobs:
- name: Store test artifacts
if: always()
uses: actions/upload-artifact@v5
uses: actions/upload-artifact@v7
with:
name: test-cdio-basic-tests-output-py${{ env.PYTHON_VERSION }}
path: output-logs
- name: Store CLI test output
if: always()
uses: actions/upload-artifact@v7
with:
name: test-cdio-cli-opts-output-py${{ env.PYTHON_VERSION }}
path: cli-opts-output.txt
# Playwright tests
playwright-tests:
runs-on: ubuntu-latest
@@ -132,10 +185,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -174,10 +227,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -214,10 +267,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -250,10 +303,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -271,6 +324,175 @@ jobs:
run: |
docker run --rm --network changedet-network test-changedetectionio bash -c 'cd changedetectionio;pytest tests/smtp/test_notification_smtp.py'
nginx-reverse-proxy:
runs-on: ubuntu-latest
needs: build
timeout-minutes: 10
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
- name: Load Docker image
run: |
docker load -i /tmp/test-changedetectionio.tar
- name: Spin up services
run: |
docker network create changedet-network
# Start changedetection.io container with X-Forwarded headers support
docker run --name changedet-app --hostname changedet-app --network changedet-network \
-e USE_X_SETTINGS=true \
-d test-changedetectionio
sleep 3
- name: Start nginx reverse proxy
run: |
# Start nginx with our test configuration
docker run --name nginx-proxy --network changedet-network -d -p 8080:80 --rm \
-v ${{ github.workspace }}/.github/nginx-reverse-proxy-test.conf:/etc/nginx/conf.d/default.conf:ro \
nginx:alpine
sleep 2
- name: Test reverse proxy - root path
run: |
echo "=== Testing nginx reverse proxy at root path ==="
curl --retry-connrefused --retry 6 -s http://localhost:8080/ > /tmp/nginx-test-root.html
# Check for changedetection.io UI elements
if grep -q "checkbox-uuid" /tmp/nginx-test-root.html; then
echo "✓ Found checkbox-uuid in response"
else
echo "ERROR: checkbox-uuid not found in response"
cat /tmp/nginx-test-root.html
exit 1
fi
# Check for watchlist content
if grep -q -i "watch" /tmp/nginx-test-root.html; then
echo "✓ Found watch/watchlist content in response"
else
echo "ERROR: watchlist content not found"
cat /tmp/nginx-test-root.html
exit 1
fi
echo "✓ Root path reverse proxy working correctly"
- name: Test reverse proxy - subpath with X-Forwarded-Prefix
run: |
echo "=== Testing nginx reverse proxy at subpath /changedet-sub/ ==="
curl --retry-connrefused --retry 6 -s http://localhost:8080/changedet-sub/ > /tmp/nginx-test-subpath.html
# Check for changedetection.io UI elements
if grep -q "checkbox-uuid" /tmp/nginx-test-subpath.html; then
echo "✓ Found checkbox-uuid in subpath response"
else
echo "ERROR: checkbox-uuid not found in subpath response"
cat /tmp/nginx-test-subpath.html
exit 1
fi
echo "✓ Subpath reverse proxy working correctly"
- name: Test API through reverse proxy subpath
run: |
echo "=== Testing API endpoints through nginx subpath /changedet-sub/ ==="
# Extract API key from the changedetection.io datastore
API_KEY=$(docker exec changedet-app cat /datastore/changedetection.json | grep -o '"api_access_token": *"[^"]*"' | cut -d'"' -f4)
if [ -z "$API_KEY" ]; then
echo "ERROR: Could not extract API key from datastore"
docker exec changedet-app cat /datastore/changedetection.json
exit 1
fi
echo "✓ Extracted API key: ${API_KEY:0:8}..."
# Create a watch via API through nginx proxy subpath
echo "Creating watch via POST to /changedet-sub/api/v1/watch"
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "http://localhost:8080/changedet-sub/api/v1/watch" \
-H "x-api-key: ${API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/test-nginx-proxy",
"tag": "nginx-test"
}')
HTTP_CODE=$(echo "$RESPONSE" | tail -n1)
BODY=$(echo "$RESPONSE" | head -n-1)
if [ "$HTTP_CODE" != "201" ]; then
echo "ERROR: Expected HTTP 201, got $HTTP_CODE"
echo "Response: $BODY"
exit 1
fi
echo "✓ Watch created successfully (HTTP 201)"
# Extract the watch UUID from response
WATCH_UUID=$(echo "$BODY" | grep -o '"uuid": *"[^"]*"' | cut -d'"' -f4)
echo "✓ Watch UUID: $WATCH_UUID"
# Update the watch via PUT through nginx proxy subpath
echo "Updating watch via PUT to /changedet-sub/api/v1/watch/${WATCH_UUID}"
RESPONSE=$(curl -s -w "\n%{http_code}" -X PUT "http://localhost:8080/changedet-sub/api/v1/watch/${WATCH_UUID}" \
-H "x-api-key: ${API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"paused": true
}')
HTTP_CODE=$(echo "$RESPONSE" | tail -n1)
BODY=$(echo "$RESPONSE" | head -n-1)
if [ "$HTTP_CODE" != "200" ]; then
echo "ERROR: Expected HTTP 200, got $HTTP_CODE"
echo "Response: $BODY"
exit 1
fi
if echo "$BODY" | grep -q 'OK'; then
echo "✓ Watch updated successfully (HTTP 200, response: OK)"
else
echo "ERROR: Expected response 'OK', got: $BODY"
echo "Response: $BODY"
exit 1
fi
# Verify the watch is paused via GET
echo "Verifying watch is paused via GET"
RESPONSE=$(curl -s "http://localhost:8080/changedet-sub/api/v1/watch/${WATCH_UUID}" \
-H "x-api-key: ${API_KEY}")
if echo "$RESPONSE" | grep -q '"paused": *true'; then
echo "✓ Watch is paused as expected"
else
echo "ERROR: Watch paused state not confirmed"
echo "Response: $RESPONSE"
exit 1
fi
echo "✓ API tests through nginx subpath completed successfully"
- name: Cleanup nginx test
if: always()
run: |
docker logs nginx-proxy || true
docker logs changedet-app || true
docker stop nginx-proxy changedet-app || true
docker rm nginx-proxy changedet-app || true
# Proxy tests
proxy-tests:
runs-on: ubuntu-latest
@@ -279,10 +501,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -319,10 +541,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -342,6 +564,29 @@ jobs:
cd changedetectionio
./run_custom_browser_url_tests.sh
processor-plugin-tests:
runs-on: ubuntu-latest
needs: build
timeout-minutes: 20
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
- name: Load Docker image
run: |
docker load -i /tmp/test-changedetectionio.tar
- name: Basic processor plugin registration and checks
run: |
docker run -e EXTRA_PACKAGES=changedetection.io-osint-processor test-changedetectionio bash -c 'cd changedetectionio;pytest -vvv -s tests/plugins/test_processor.py::test_check_plugin_processor'
# Container startup tests
container-tests:
runs-on: ubuntu-latest
@@ -350,10 +595,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -395,10 +640,10 @@ jobs:
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Download Docker image artifact
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: test-changedetectionio-${{ env.PYTHON_VERSION }}
path: /tmp
@@ -440,3 +685,154 @@ jobs:
exit 1
fi
docker rm sig-test
# Upgrade path test
upgrade-path-test:
runs-on: ubuntu-latest
needs: build
timeout-minutes: 25
env:
PYTHON_VERSION: ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0 # Fetch all history and tags for upgrade testing
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v6
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Check upgrade works without error
run: |
echo "=== Testing upgrade path from 0.49.1 to ${{ github.ref_name }} (${{ github.sha }}) ==="
sudo apt-get update && sudo apt-get install -y --no-install-recommends \
g++ \
gcc \
libc-dev \
libffi-dev \
libjpeg-dev \
libssl-dev \
libxslt-dev \
make \
patch \
pkg-config \
zlib1g-dev
# Checkout old version and create datastore
git checkout 0.49.1
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install 'pyOpenSSL>=23.2.0'
echo "=== Running version 0.49.1 to create datastore ==="
ALLOW_IANA_RESTRICTED_ADDRESSES=true python3 ./changedetection.py -C -d /tmp/data &
APP_PID=$!
# Wait for app to be ready
echo "Waiting for 0.49.1 to be ready..."
sleep 6
# Extract API key from datastore (0.49.1 uses url-watches.json)
API_KEY=$(jq -r '.settings.application.api_access_token // empty' /tmp/data/url-watches.json)
echo "API Key: ${API_KEY:0:8}..."
# Create a watch with tag "github-group-test" via API
echo "Creating test watch with tag via API..."
curl -X POST "http://127.0.0.1:5000/api/v1/watch" \
-H "x-api-key: ${API_KEY}" \
-H "Content-Type: application/json" \
--show-error --fail \
--retry 6 --retry-delay 1 --retry-connrefused \
-d '{
"url": "https://example.com/upgrade-test",
"tag": "github-group-test"
}'
echo "✓ Created watch with tag 'github-group-test'"
# Create a specific test URL watch
echo "Creating test URL watch via API..."
curl -X POST "http://127.0.0.1:5000/api/v1/watch" \
-H "x-api-key: ${API_KEY}" \
-H "Content-Type: application/json" \
--show-error --fail \
-d '{
"url": "http://localhost/test.txt"
}'
echo "✓ Created watch for 'http://localhost/test.txt' in version 0.49.1"
# Stop the old version gracefully
kill $APP_PID
wait $APP_PID || true
echo "✓ Version 0.49.1 stopped"
# Upgrade to current version (use commit SHA since we're in detached HEAD)
echo "Upgrading to commit ${{ github.sha }}"
git checkout ${{ github.sha }}
pip install -r requirements.txt
echo "=== Running current version (commit ${{ github.sha }}) with old datastore (testing mode) ==="
ALLOW_IANA_RESTRICTED_ADDRESSES=true TESTING_SHUTDOWN_AFTER_DATASTORE_LOAD=1 python3 ./changedetection.py -d /tmp/data > /tmp/upgrade-test.log 2>&1
echo "=== Upgrade test output ==="
cat /tmp/upgrade-test.log
echo "✓ Datastore upgraded successfully"
# Now start the current version normally to verify the tag survived
echo "=== Starting current version to verify tag exists after upgrade ==="
ALLOW_IANA_RESTRICTED_ADDRESSES=true timeout 20 python3 ./changedetection.py -d /tmp/data > /tmp/ui-test.log 2>&1 &
APP_PID=$!
# Wait for app to be ready and fetch UI
echo "Waiting for current version to be ready..."
sleep 5
curl --retry 6 --retry-delay 1 --retry-connrefused --silent http://127.0.0.1:5000 > /tmp/ui-output.html
# Verify tag exists in UI
if grep -q "github-group-test" /tmp/ui-output.html; then
echo "✓ Tag 'github-group-test' found in UI after upgrade"
else
echo "ERROR: Tag 'github-group-test' not found in UI after upgrade"
echo "=== UI Output ==="
cat /tmp/ui-output.html
echo "=== App Log ==="
cat /tmp/ui-test.log
kill $APP_PID || true
exit 1
fi
# Verify test URL exists in UI
if grep -q "http://localhost/test.txt" /tmp/ui-output.html; then
echo "✓ Watch URL 'http://localhost/test.txt' found in UI after upgrade"
else
echo "ERROR: Watch URL 'http://localhost/test.txt' not found in UI after upgrade"
echo "=== UI Output ==="
cat /tmp/ui-output.html
echo "=== App Log ==="
cat /tmp/ui-test.log
kill $APP_PID || true
exit 1
fi
# Cleanup
kill $APP_PID || true
wait $APP_PID || true
echo ""
echo "✓✓✓ Upgrade test passed: 0.49.1 → ${{ github.ref_name }} ✓✓✓"
echo " - Commit: ${{ github.sha }}"
echo " - Datastore migrated successfully"
echo " - Tag 'github-group-test' survived upgrade"
echo " - Watch URL 'http://localhost/test.txt' survived upgrade"
echo "✓ Upgrade test passed: 0.49.1 → ${{ github.ref_name }}"
- name: Upload upgrade test logs
if: always()
uses: actions/upload-artifact@v7
with:
name: upgrade-test-logs-py${{ env.PYTHON_VERSION }}
path: /tmp/upgrade-test.log

1
.gitignore vendored

@@ -29,3 +29,4 @@ test-datastore/
# Memory consumption log
test-memory.log
tests/logs/

View File

@@ -34,6 +34,7 @@ ENV OPENSSL_LIB_DIR="/usr/lib/arm-linux-gnueabihf"
ENV OPENSSL_INCLUDE_DIR="/usr/include/openssl"
# Additional environment variables for cryptography Rust build
ENV CRYPTOGRAPHY_DONT_BUILD_RUST=1
RUN --mount=type=cache,id=pip,sharing=locked,target=/tmp/pip-cache \
pip install \
--prefer-binary \
@@ -43,7 +44,6 @@ RUN --mount=type=cache,id=pip,sharing=locked,target=/tmp/pip-cache \
--target=/dependencies \
-r /requirements.txt
# Playwright is an alternative to Selenium
# Excluded this package from requirements.txt to prevent arm/v6 and arm/v7 builds from failing
# https://github.com/dgtlmoon/changedetection.io/pull/1067 also musl/alpine (not supported)
@@ -52,13 +52,38 @@ RUN --mount=type=cache,id=pip,sharing=locked,target=/tmp/pip-cache \
--prefer-binary \
--cache-dir=/tmp/pip-cache \
--target=/dependencies \
playwright~=1.48.0 \
playwright~=1.56.0 \
|| echo "WARN: Failed to install Playwright. The application can still run, but the Playwright option will be disabled."
# OpenCV is optional for fast image comparison (pixelmatch is the fallback)
# Skip on arm/v7 and arm/v8 where builds take weeks - excluded from requirements.txt
ARG TARGETPLATFORM
RUN --mount=type=cache,id=pip,sharing=locked,target=/tmp/pip-cache \
case "$TARGETPLATFORM" in \
linux/arm/v7|linux/arm/v8) \
echo "INFO: Skipping OpenCV on $TARGETPLATFORM (build takes too long), using pixelmatch fallback" \
;; \
*) \
pip install \
--prefer-binary \
--extra-index-url https://www.piwheels.org/simple \
--cache-dir=/tmp/pip-cache \
--target=/dependencies \
opencv-python-headless>=4.8.0.76 \
|| echo "WARN: OpenCV install failed, will use pixelmatch fallback" \
;; \
esac
# Final image stage
FROM python:${PYTHON_VERSION}-slim-bookworm
LABEL org.opencontainers.image.source="https://github.com/dgtlmoon/changedetection.io"
LABEL org.opencontainers.image.url="https://changedetection.io"
LABEL org.opencontainers.image.documentation="https://changedetection.io/tutorials"
LABEL org.opencontainers.image.title="changedetection.io"
LABEL org.opencontainers.image.description="Self-hosted web page change monitoring and notification service"
LABEL org.opencontainers.image.licenses="Apache-2.0"
LABEL org.opencontainers.image.vendor="changedetection.io"
RUN apt-get update && apt-get install -y --no-install-recommends \
libxslt1.1 \
@@ -69,6 +94,11 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
# favicon type detection and other uses
file \
zlib1g \
# OpenCV dependencies for image processing
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender-dev \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
@@ -89,6 +119,9 @@ EXPOSE 5000
# The actual flask app module
COPY changedetectionio /app/changedetectionio
# Compile translation files for i18n support
RUN pybabel compile -d /app/changedetectionio/translations
# Also for OpenAPI validation wrapper - needs the YML
RUN [ ! -d "/app/docs" ] && mkdir /app/docs
COPY docs/api-spec.yaml /app/docs/api-spec.yaml
@@ -105,6 +138,15 @@ ENV LOGGER_LEVEL="$LOGGER_LEVEL"
ENV LC_ALL=en_US.UTF-8
WORKDIR /app
# Copy and set up entrypoint script for installing extra packages
COPY docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh
# Set entrypoint to handle EXTRA_PACKAGES env var
ENTRYPOINT ["/docker-entrypoint.sh"]
# Default command (can be overridden in docker-compose.yml)
CMD ["python", "./changedetection.py", "-d", "/datastore"]

View File

@@ -9,12 +9,15 @@ recursive-include changedetectionio/notification *
recursive-include changedetectionio/processors *
recursive-include changedetectionio/realtime *
recursive-include changedetectionio/static *
recursive-include changedetectionio/store *
recursive-include changedetectionio/templates *
recursive-include changedetectionio/tests *
recursive-include changedetectionio/translations *
recursive-include changedetectionio/widgets *
prune changedetectionio/static/package-lock.json
prune changedetectionio/static/styles/node_modules
prune changedetectionio/static/styles/package-lock.json
include changedetectionio/favicon_utils.py
include changedetection.py
include requirements.txt
include README-pip.md

View File

@@ -183,6 +183,9 @@ docker compose pull && docker compose up -d
See the wiki for more information https://github.com/dgtlmoon/changedetection.io/wiki
## Different browser viewport sizes (mobile, desktop etc)
If you are using the recommended `sockpuppetbrowser` (which is in the docker-compose.yml as a setting to be uncommented) you can easily set different viewport sizes for your web page change detection, [see more information here about setting up different viewport sizes](https://github.com/dgtlmoon/sockpuppetbrowser?tab=readme-ov-file#setting-viewport-size).
## Filters

5
babel.cfg Normal file

@@ -0,0 +1,5 @@
[python: **.py]
keywords = _:1,_l:1,gettext:1
[jinja2: **/templates/**.html]
encoding = utf-8

View File

@@ -2,22 +2,76 @@
# Read more https://github.com/dgtlmoon/changedetection.io/wiki
# Semver means never use .01, or 00. Should be .1.
__version__ = '0.51.2'
__version__ = '0.54.3'
from changedetectionio.strtobool import strtobool
from json.decoder import JSONDecodeError
import os
from loguru import logger
import getopt
import logging
import os
import platform
import signal
import sys
import threading
import time
# Eventlet completely removed - using threading mode for SocketIO
# This provides better Python 3.12+ compatibility and eliminates eventlet/asyncio conflicts
from changedetectionio import store
from changedetectionio.flask_app import changedetection_app
from loguru import logger
# Note: store and changedetection_app are imported inside main() to avoid
# initialization before argument parsing (allows --help to work without loading everything)
# ==============================================================================
# Multiprocessing Configuration - CRITICAL for Thread Safety
# ==============================================================================
#
# PROBLEM: Python 3.12+ warns about fork() with multi-threaded processes:
# "This process is multi-threaded, use of fork() may lead to deadlocks"
#
# WHY IT'S DANGEROUS:
# 1. This Flask app has multiple threads (HTTP handlers, workers, SocketIO)
# 2. fork() copies ONLY the calling thread to the child process
# 3. BUT fork() also copies all locks/mutexes in their current state
# 4. If another thread held a lock during fork() → child has locked lock with no owner
# 5. Result: PERMANENT DEADLOCK if child tries to acquire that lock
#
# SOLUTION: Use 'spawn' instead of 'fork'
# - spawn starts a fresh Python interpreter (no inherited threads or locks)
# - Slower (~200ms vs ~1ms) but safe with multi-threaded parent
# - Consistent across all platforms (Windows already uses spawn by default)
#
# IMPLEMENTATION:
# 1. Explicit contexts everywhere (primary protection):
# - playwright.py: ctx = multiprocessing.get_context('spawn')
# - puppeteer.py: ctx = multiprocessing.get_context('spawn')
# - isolated_opencv.py: ctx = multiprocessing.get_context('spawn')
# - isolated_libvips.py: ctx = multiprocessing.get_context('spawn')
#
# 2. Global default (defense-in-depth, below):
# - Safety net if future code forgets explicit context
# - Protects against third-party libraries using Process()
# - Costs nothing (explicit contexts always override it)
#
# WHY BOTH?
# - Explicit contexts: Clear, self-documenting, always works
# - Global default: Safety net for forgotten contexts or library code
# - If someone writes "Process()" instead of "ctx.Process()", still safe!
#
# See: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
# ==============================================================================
import multiprocessing
import sys
# Set spawn as global default (safety net - all our code uses explicit contexts anyway)
# Skip in tests to avoid breaking pytest-flask's LiveServer fixture (uses unpicklable local functions)
if 'pytest' not in sys.modules:
try:
if multiprocessing.get_start_method(allow_none=True) is None:
multiprocessing.set_start_method('spawn', force=False)
logger.debug("Set multiprocessing default to 'spawn' for thread safety (explicit contexts used everywhere)")
except RuntimeError:
logger.debug(f"Multiprocessing start method already set: {multiprocessing.get_start_method()}")
# Only global so we can access it in the signal handler
app = None
@@ -30,15 +84,26 @@ def get_version():
def sigshutdown_handler(_signo, _stack_frame):
name = signal.Signals(_signo).name
logger.critical(f'Shutdown: Got Signal - {name} ({_signo}), Fast shutdown initiated')
# Set exit flag immediately to stop all loops
app.config.exit.set()
datastore.stop_thread = True
# Log memory consumption before shutting down workers (cross-platform)
try:
import psutil
process = psutil.Process()
mem_info = process.memory_info()
rss_mb = mem_info.rss / 1024 / 1024
vms_mb = mem_info.vms / 1024 / 1024
logger.info(f"Memory consumption before worker shutdown: RSS={rss_mb:,.2f} MB, VMS={vms_mb:,.2f} MB")
except Exception as e:
logger.warning(f"Could not retrieve memory stats: {str(e)}")
# Shutdown workers and queues immediately
try:
from changedetectionio import worker_handler
worker_handler.shutdown_workers()
from changedetectionio import worker_pool
worker_pool.shutdown_workers()
except Exception as e:
logger.error(f"Error shutting down workers: {str(e)}")
@@ -47,9 +112,9 @@ def sigshutdown_handler(_signo, _stack_frame):
from changedetectionio.flask_app import update_q, notification_q
update_q.close()
notification_q.close()
logger.debug("Janus queues closed successfully")
logger.debug("Queues closed successfully")
except Exception as e:
logger.critical(f"CRITICAL: Failed to close janus queues: {e}")
logger.critical(f"CRITICAL: Failed to close queues: {e}")
# Shutdown socketio server fast
from changedetectionio.flask_app import socketio_server
@@ -59,31 +124,80 @@ def sigshutdown_handler(_signo, _stack_frame):
except Exception as e:
logger.error(f"Error shutting down Socket.IO server: {str(e)}")
# Save data quickly
try:
datastore.sync_to_json()
logger.success('Fast sync to disk complete.')
except Exception as e:
logger.error(f"Error syncing to disk: {str(e)}")
# With immediate persistence, all data is already saved
logger.success('All data already persisted (immediate commits enabled).')
sys.exit()
def print_help():
"""Print help text for command line options"""
print('Usage: changedetection.py [options]')
print('')
print('Standard options:')
print(' -s SSL enable')
print(' -h HOST Listen host (default: 0.0.0.0)')
print(' -p PORT Listen port (default: 5000)')
print(' -d PATH Datastore path')
print(' -l LEVEL Log level (TRACE, DEBUG, INFO, SUCCESS, WARNING, ERROR, CRITICAL)')
print(' -c Cleanup unused snapshots')
print(' -C Create datastore directory if it doesn\'t exist')
print(' -P true/false Set all watches paused (true) or active (false)')
print('')
print('Add URLs on startup:')
print(' -u URL Add URL to watch (can be used multiple times)')
print(' -u0 \'JSON\' Set options for first -u URL (e.g. \'{"processor":"text_json_diff"}\')')
print(' -u1 \'JSON\' Set options for second -u URL (0-indexed)')
print(' -u2 \'JSON\' Set options for third -u URL, etc.')
print(' Available options: processor, fetch_backend, headers, method, etc.')
print(' See model/Watch.py for all available options')
print('')
print('Recheck on startup:')
print(' -r all Queue all watches for recheck on startup')
print(' -r UUID,... Queue specific watches (comma-separated UUIDs)')
print(' -r all N Queue all watches, wait for completion, repeat N times')
print(' -r UUID,... N Queue specific watches, wait for completion, repeat N times')
print('')
print('Batch mode:')
print(' -b Run in batch mode (process queue then exit)')
print(' Useful for CI/CD, cron jobs, or one-time checks')
print(' NOTE: Batch mode checks if Flask is running and aborts if port is in use')
print(' Use -p PORT to specify a different port if needed')
print('')
def main():
global datastore
global app
# Early help/version check before any initialization
if '--help' in sys.argv or '-help' in sys.argv:
print_help()
sys.exit(0)
if '--version' in sys.argv or '-v' in sys.argv:
print(f'changedetection.io {__version__}')
sys.exit(0)
# Import heavy modules after help/version checks to keep startup fast for those flags
from changedetectionio import store
from changedetectionio.flask_app import changedetection_app
datastore_path = None
do_cleanup = False
# Optional URL to watch since start
default_url = None
# Set a default logger level
logger_level = 'DEBUG'
include_default_watches = True
all_paused = None # None means don't change, True/False to set
host = os.environ.get("LISTEN_HOST", "0.0.0.0").strip()
port = int(os.environ.get('PORT', 5000))
ssl_mode = False
# Lists for multiple URLs and their options
urls_to_add = []
url_options = {} # Key: index (0-based), Value: dict of options
recheck_watches = None # None, 'all', or list of UUIDs
recheck_repeat_count = 1 # Number of times to repeat recheck cycle
batch_mode = False # Run once then exit when queue is empty
# On Windows, create and use a default path.
if os.name == 'nt':
datastore_path = os.path.expandvars(r'%APPDATA%\changedetection.io')
@@ -92,10 +206,68 @@ def main():
# Must be absolute so that send_from_directory doesn't try to make it relative to backend/
datastore_path = os.path.join(os.getcwd(), "../datastore")
# Pre-process arguments to extract -u, -u<N>, and -r options before getopt
# This allows unlimited -u0, -u1, -u2, ... options without predefining them
cleaned_argv = ['changedetection.py'] # Start with program name
i = 1
while i < len(sys.argv):
arg = sys.argv[i]
# Handle -u (add URL)
if arg == '-u' and i + 1 < len(sys.argv):
urls_to_add.append(sys.argv[i + 1])
i += 2
continue
# Handle -u<N> (set options for URL at index N)
if arg.startswith('-u') and len(arg) > 2 and arg[2:].isdigit():
idx = int(arg[2:])
if i + 1 < len(sys.argv):
try:
import json
url_options[idx] = json.loads(sys.argv[i + 1])
except json.JSONDecodeError as e:
print(f'Error: Invalid JSON for {arg}: {sys.argv[i + 1]}')
print(f'JSON decode error: {e}')
sys.exit(2)
i += 2
continue
# Handle -r (recheck watches)
if arg == '-r' and i + 1 < len(sys.argv):
recheck_arg = sys.argv[i + 1]
if recheck_arg.lower() == 'all':
recheck_watches = 'all'
else:
# Parse comma-separated list of UUIDs
recheck_watches = [uuid.strip() for uuid in recheck_arg.split(',') if uuid.strip()]
# Check for optional repeat count as third argument
if i + 2 < len(sys.argv) and sys.argv[i + 2].isdigit():
recheck_repeat_count = int(sys.argv[i + 2])
if recheck_repeat_count < 1:
print(f'Error: Repeat count must be at least 1, got {recheck_repeat_count}')
sys.exit(2)
i += 3
else:
i += 2
continue
# Handle -b (batch mode - run once and exit)
if arg == '-b':
batch_mode = True
i += 1
continue
# Keep other arguments for getopt
cleaned_argv.append(arg)
i += 1
try:
opts, args = getopt.getopt(sys.argv[1:], "6Ccsd:h:p:l:u:", "port")
except getopt.GetoptError:
print('backend.py -s SSL enable -h [host] -p [port] -d [datastore path] -u [default URL to watch] -l [debug level - TRACE, DEBUG(default), INFO, SUCCESS, WARNING, ERROR, CRITICAL]')
opts, args = getopt.getopt(cleaned_argv[1:], "6Csd:h:p:l:P:", "port")
except getopt.GetoptError as e:
print_help()
print(f'Error: {e}')
sys.exit(2)
create_datastore_dir = False
@@ -120,14 +292,6 @@ def main():
if opt == '-d':
datastore_path = arg
if opt == '-u':
default_url = arg
include_default_watches = False
# Cleanup (remove text files that arent in the index)
if opt == '-c':
do_cleanup = True
# Create the datadir if it doesn't exist
if opt == '-C':
create_datastore_dir = True
@@ -135,6 +299,18 @@ def main():
if opt == '-l':
logger_level = int(arg) if arg.isdigit() else arg.upper()
if opt == '-P':
try:
all_paused = bool(strtobool(arg))
except ValueError:
print(f'Error: Invalid value for -P option: {arg}')
print('Expected: true, false, yes, no, 1, or 0')
sys.exit(2)
# If URLs are provided, don't include default watches
if urls_to_add:
include_default_watches = False
logger.success(f"changedetection.io version {get_version()} starting.")
# Launch using SocketIO run method for proper integration (if enabled)
@@ -165,12 +341,22 @@ def main():
" WARNING, ERROR, CRITICAL")
sys.exit(2)
# Disable verbose pyppeteer logging to prevent memory leaks from large CDP messages
# Set both parent and child loggers since pyppeteer hardcodes DEBUG level
logging.getLogger('pyppeteer.connection').setLevel(logging.WARNING)
logging.getLogger('pyppeteer.connection.Connection').setLevel(logging.WARNING)
# isnt there some @thingy to attach to each route to tell it, that this route needs a datastore
app_config = {'datastore_path': datastore_path}
app_config = {
'datastore_path': datastore_path,
'batch_mode': batch_mode,
'recheck_watches': recheck_watches,
'recheck_repeat_count': recheck_repeat_count
}
if not os.path.isdir(app_config['datastore_path']):
if create_datastore_dir:
os.mkdir(app_config['datastore_path'])
os.makedirs(app_config['datastore_path'], exist_ok=True)
else:
logger.critical(
f"ERROR: Directory path for the datastore '{app_config['datastore_path']}'"
@@ -185,13 +371,219 @@ def main():
# Don't start if the JSON DB looks corrupt
logger.critical(f"ERROR: JSON DB or Proxy List JSON at '{app_config['datastore_path']}' appears to be corrupt, aborting.")
logger.critical(str(e))
return
sys.exit(1)
if default_url:
datastore.add_watch(url = default_url)
# Testing mode: Exit cleanly after datastore initialization (for CI/CD upgrade tests)
if os.environ.get('TESTING_SHUTDOWN_AFTER_DATASTORE_LOAD'):
logger.success(f"TESTING MODE: Datastore loaded successfully from {app_config['datastore_path']}")
logger.success(f"TESTING MODE: Schema version: {datastore.data['settings']['application'].get('schema_version', 'unknown')}")
logger.success(f"TESTING MODE: Loaded {len(datastore.data['watching'])} watches")
logger.success("TESTING MODE: Exiting cleanly (TESTING_SHUTDOWN_AFTER_DATASTORE_LOAD is set)")
sys.exit(0)
# Apply all_paused setting if specified via CLI
if all_paused is not None:
datastore.data['settings']['application']['all_paused'] = all_paused
logger.info(f"Setting all watches paused: {all_paused}")
# Inject datastore into plugins that need access to settings
from changedetectionio.pluggy_interface import inject_datastore_into_plugins
inject_datastore_into_plugins(datastore)
# Step 1: Add URLs with their options (if provided via -u flags)
added_watch_uuids = []
if urls_to_add:
logger.info(f"Adding {len(urls_to_add)} URL(s) from command line")
for idx, url in enumerate(urls_to_add):
extras = url_options.get(idx, {})
if extras:
logger.debug(f"Adding watch {idx}: {url} with options: {extras}")
else:
logger.debug(f"Adding watch {idx}: {url}")
new_uuid = datastore.add_watch(url=url, extras=extras)
if new_uuid:
added_watch_uuids.append(new_uuid)
logger.success(f"Added watch: {url} (UUID: {new_uuid})")
else:
logger.error(f"Failed to add watch: {url}")
app = changedetection_app(app_config, datastore)
# Step 2: Queue newly added watches (if -u was provided in batch mode)
# This must happen AFTER app initialization so update_q is available
if batch_mode and added_watch_uuids:
from changedetectionio.flask_app import update_q
from changedetectionio import queuedWatchMetaData, worker_pool
logger.info(f"Batch mode: Queuing {len(added_watch_uuids)} newly added watches")
for watch_uuid in added_watch_uuids:
try:
worker_pool.queue_item_async_safe(
update_q,
queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})
)
logger.debug(f"Queued newly added watch: {watch_uuid}")
except Exception as e:
logger.error(f"Failed to queue watch {watch_uuid}: {e}")
# Step 3: Queue watches for recheck (if -r was provided)
# This must happen AFTER app initialization so update_q is available
if recheck_watches is not None:
from changedetectionio.flask_app import update_q
from changedetectionio import queuedWatchMetaData, worker_pool
watches_to_queue = []
if recheck_watches == 'all':
# Queue all watches, excluding those already queued in batch mode
all_watches = list(datastore.data['watching'].keys())
if batch_mode and added_watch_uuids:
# Exclude newly added watches that were already queued in batch mode
watches_to_queue = [uuid for uuid in all_watches if uuid not in added_watch_uuids]
logger.info(f"Queuing {len(watches_to_queue)} existing watches for recheck ({len(added_watch_uuids)} newly added watches already queued)")
else:
watches_to_queue = all_watches
logger.info(f"Queuing all {len(watches_to_queue)} watches for recheck")
else:
# Queue specific UUIDs
watches_to_queue = recheck_watches
logger.info(f"Queuing {len(watches_to_queue)} specific watches for recheck")
queued_count = 0
for watch_uuid in watches_to_queue:
if watch_uuid in datastore.data['watching']:
try:
worker_pool.queue_item_async_safe(
update_q,
queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})
)
queued_count += 1
logger.debug(f"Queued watch for recheck: {watch_uuid}")
except Exception as e:
logger.error(f"Failed to queue watch {watch_uuid}: {e}")
else:
logger.warning(f"Watch UUID not found in datastore: {watch_uuid}")
logger.success(f"Successfully queued {queued_count} watches for recheck")
# Step 4: Setup batch mode monitor (if -b was provided)
if batch_mode:
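from changedetectionio.flask_app import update_q
# Also import here: the monitor below uses worker_pool even when neither -u nor -r triggered the imports above
from changedetectionio import queuedWatchMetaData, worker_pool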
# Safety check: Ensure Flask app is not already running on this port
# Batch mode should never run alongside the web server
import socket
test_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
# Try to bind to the configured host:port (no SO_REUSEADDR - strict check)
test_socket.bind((host, port))
test_socket.close()
logger.debug(f"Batch mode: Port {port} is available (Flask app not running)")
except OSError as e:
test_socket.close()
# errno 98 = EADDRINUSE (Linux)
# errno 48 = EADDRINUSE (macOS)
# errno 10048 = WSAEADDRINUSE (Windows)
if e.errno in (48, 98, 10048) or "Address already in use" in str(e) or "already in use" in str(e).lower():
logger.critical(f"ERROR: Batch mode cannot run - port {port} is already in use")
logger.critical(f"The Flask web server appears to be running on {host}:{port}")
logger.critical("Batch mode is designed for standalone operation (CI/CD, cron jobs, etc.)")
logger.critical("Please either stop the Flask web server, or use a different port with -p PORT")
sys.exit(1)
else:
# Some other socket error - log but continue (might be network configuration issue)
logger.warning(f"Port availability check failed with unexpected error: {e}")
logger.warning("Continuing with batch mode anyway - be aware of potential conflicts")
def queue_watches_for_recheck(datastore, iteration):
"""Helper function to queue watches for recheck"""
watches_to_queue = []
if recheck_watches == 'all':
all_watches = list(datastore.data['watching'].keys())
if batch_mode and added_watch_uuids and iteration == 1:
# Only exclude newly added watches on first iteration
watches_to_queue = [uuid for uuid in all_watches if uuid not in added_watch_uuids]
else:
watches_to_queue = all_watches
logger.info(f"Batch mode (iteration {iteration}): Queuing all {len(watches_to_queue)} watches")
elif recheck_watches:
watches_to_queue = recheck_watches
logger.info(f"Batch mode (iteration {iteration}): Queuing {len(watches_to_queue)} specific watches")
queued_count = 0
for watch_uuid in watches_to_queue:
if watch_uuid in datastore.data['watching']:
try:
worker_pool.queue_item_async_safe(
update_q,
queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid})
)
queued_count += 1
except Exception as e:
logger.error(f"Failed to queue watch {watch_uuid}: {e}")
else:
logger.warning(f"Watch UUID not found in datastore: {watch_uuid}")
logger.success(f"Batch mode (iteration {iteration}): Successfully queued {queued_count} watches")
return queued_count
def batch_mode_monitor():
"""Monitor queue and workers, shutdown or repeat when work is complete"""
import time
# Track iterations if repeat mode is enabled
current_iteration = 1
total_iterations = recheck_repeat_count if recheck_watches and recheck_repeat_count > 1 else 1
if total_iterations > 1:
logger.info(f"Batch mode: Will repeat recheck {total_iterations} times")
else:
logger.info("Batch mode: Waiting for all queued items to complete...")
# Wait a bit for workers to start processing
time.sleep(3)
try:
while current_iteration <= total_iterations:
logger.info(f"Batch mode: Waiting for iteration {current_iteration}/{total_iterations} to complete...")
# Use the shared wait_for_all_checks function
completed = worker_pool.wait_for_all_checks(update_q, timeout=300)
if not completed:
logger.warning(f"Batch mode: Iteration {current_iteration} timed out after 300 seconds")
logger.success(f"Batch mode: Iteration {current_iteration}/{total_iterations} completed")
# Check if we need to repeat
if current_iteration < total_iterations:
logger.info(f"Batch mode: Starting iteration {current_iteration + 1}...")
current_iteration += 1
# Re-queue watches for next iteration
queue_watches_for_recheck(datastore, current_iteration)
# Brief pause before continuing
time.sleep(2)
else:
# All iterations complete
logger.success(f"Batch mode: All {total_iterations} iterations completed, initiating shutdown")
# Trigger shutdown
import os, signal
os.kill(os.getpid(), signal.SIGTERM)
return
except Exception as e:
logger.error(f"Batch mode monitor error: {e}")
logger.error("Initiating emergency shutdown")
import os, signal
os.kill(os.getpid(), signal.SIGTERM)
# Start monitor in background thread
monitor_thread = threading.Thread(target=batch_mode_monitor, daemon=True, name="BatchModeMonitor")
monitor_thread.start()
logger.info("Batch mode enabled: Will exit after all queued items are processed")
# Get the SocketIO instance from the Flask app (created in flask_app.py)
from changedetectionio.flask_app import socketio_server
global socketio
@@ -213,19 +605,17 @@ def main():
else:
logger.info("SIGUSR1 handler only registered on Linux, skipped.")
# Go into cleanup mode
if do_cleanup:
datastore.remove_unused_snapshots()
app.config['datastore_path'] = datastore_path
@app.context_processor
def inject_template_globals():
return dict(right_sticky="v{}".format(datastore.data['version_tag']),
return dict(right_sticky="v"+__version__,
new_version_available=app.config['NEW_VERSION_AVAILABLE'],
has_password=datastore.data['settings']['application']['password'] != False,
socket_io_enabled=datastore.data['settings']['application']['ui'].get('socket_io_enabled', True)
socket_io_enabled=datastore.data['settings']['application'].get('ui', {}).get('socket_io_enabled', True),
all_paused=datastore.data['settings']['application'].get('all_paused', False),
all_muted=datastore.data['settings']['application'].get('all_muted', False)
)
# Monitored websites will not receive a Referer header when a user clicks on an outgoing link.
@@ -247,23 +637,43 @@ def main():
if os.getenv('USE_X_SETTINGS'):
logger.info("USE_X_SETTINGS is ENABLED")
from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, x_prefix=1, x_host=1)
app.wsgi_app = ProxyFix(
app.wsgi_app,
x_for=1, # X-Forwarded-For (client IP)
x_proto=1, # X-Forwarded-Proto (http/https)
x_host=1, # X-Forwarded-Host (original host)
x_port=1, # X-Forwarded-Port (original port)
x_prefix=1 # X-Forwarded-Prefix (URL prefix)
)
# SocketIO instance is already initialized in flask_app.py
if socketio_server:
if ssl_mode:
logger.success(f"SSL mode enabled, attempting to start with '{ssl_cert_file}' and '{ssl_privkey_file}' in {os.getcwd()}")
socketio.run(app, host=host, port=int(port), debug=False,
ssl_context=(ssl_cert_file, ssl_privkey_file), allow_unsafe_werkzeug=True)
else:
socketio.run(app, host=host, port=int(port), debug=False, allow_unsafe_werkzeug=True)
# In batch mode, skip starting the HTTP server - just keep workers running
if batch_mode:
logger.info("Batch mode: Skipping HTTP server startup, workers will process queue")
logger.info("Batch mode: Main thread will wait for shutdown signal")
# Keep main thread alive until batch monitor triggers shutdown
try:
while True:
time.sleep(1)
except KeyboardInterrupt:
logger.info("Batch mode: Keyboard interrupt received")
pass
else:
# Run Flask app without Socket.IO if disabled
logger.info("Starting Flask app without Socket.IO server")
if ssl_mode:
logger.success(f"SSL mode enabled, attempting to start with '{ssl_cert_file}' and '{ssl_privkey_file}' in {os.getcwd()}")
app.run(host=host, port=int(port), debug=False,
ssl_context=(ssl_cert_file, ssl_privkey_file))
# Normal mode: Start HTTP server
# SocketIO instance is already initialized in flask_app.py
if socketio_server:
if ssl_mode:
logger.success(f"SSL mode enabled, attempting to start with '{ssl_cert_file}' and '{ssl_privkey_file}' in {os.getcwd()}")
socketio.run(app, host=host, port=int(port), debug=False,
ssl_context=(ssl_cert_file, ssl_privkey_file), allow_unsafe_werkzeug=True)
else:
socketio.run(app, host=host, port=int(port), debug=False, allow_unsafe_werkzeug=True)
else:
app.run(host=host, port=int(port), debug=False)
# Run Flask app without Socket.IO if disabled
logger.info("Starting Flask app without Socket.IO server")
if ssl_mode:
logger.success(f"SSL mode enabled, attempting to start with '{ssl_cert_file}' and '{ssl_privkey_file}' in {os.getcwd()}")
app.run(host=host, port=int(port), debug=False,
ssl_context=(ssl_cert_file, ssl_privkey_file))
else:
app.run(host=host, port=int(port), debug=False)
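
The argument pre-processing in main() above (pulling -u, -u<N>, -r and -b out of sys.argv before the rest goes to getopt) can be sketched as a standalone function. This is only an illustration under assumptions: `split_custom_args` and its return shape are hypothetical names chosen here, not part of the project, and the sketch leaves out the error exits and the repeat-count validation that the real code performs.

```python
import json


def split_custom_args(argv):
    """Hypothetical helper: separate -u/-u<N>/-r/-b from the arguments left for getopt."""
    urls, url_options = [], {}
    recheck, repeat, batch = None, 1, False
    remaining = [argv[0]]  # keep the program name in position 0
    i = 1
    while i < len(argv):
        arg = argv[i]
        has_value = i + 1 < len(argv)
        if arg == '-u' and has_value:
            urls.append(argv[i + 1])
            i += 2
        elif arg.startswith('-u') and arg[2:].isdigit() and has_value:
            # -u0, -u1, ... carry JSON options for the URL at that index
            url_options[int(arg[2:])] = json.loads(argv[i + 1])
            i += 2
        elif arg == '-r' and has_value:
            value = argv[i + 1]
            if value.lower() == 'all':
                recheck = 'all'
            else:
                recheck = [u.strip() for u in value.split(',') if u.strip()]
            i += 2
            # An optional numeric token right after the value is the repeat count
            if i < len(argv) and argv[i].isdigit():
                repeat = int(argv[i])
                i += 1
        elif arg == '-b':
            batch = True
            i += 1
        else:
            # Anything else is left untouched for getopt
            remaining.append(arg)
            i += 1
    return remaining, urls, url_options, recheck, repeat, batch


example_argv = ['changedetection.py', '-u', 'https://example.com',
                '-u0', '{"processor": "text_json_diff"}',
                '-r', 'all', '3', '-b', '-p', '5001']
remaining, urls, url_options, recheck, repeat, batch = split_custom_args(example_argv)
```

Scanning by hand rather than declaring every option to getopt is what allows an open-ended family of -u<N> flags; getopt only ever sees the arguments collected in `remaining`.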

@@ -4,6 +4,10 @@ from flask import request
from functools import wraps
from . import auth, validate_openapi_request
from ..validate_url import is_safe_valid_url
import json
# Number of URLs above which import switches to background processing
IMPORT_SWITCH_TO_BACKGROUND_THRESHOLD = 20
def default_content_type(content_type='text/plain'):
@@ -19,6 +23,76 @@ def default_content_type(content_type='text/plain'):
return decorator
def convert_query_param_to_type(value, schema_property):
"""
Convert a query parameter string to the appropriate type based on schema definition.
Args:
value: String value from query parameter
schema_property: Schema property definition with 'type' or 'anyOf' field
Returns:
Converted value in the appropriate type
Supports both OpenAPI 3.1 formats:
- type: [string, 'null'] (array format)
- anyOf: [{type: string}, {type: null}] (anyOf format)
"""
prop_type = schema_property.get('type')
# Handle OpenAPI 3.1 type arrays: type: [string, 'null']
if isinstance(prop_type, list):
# Use the first non-null type from the array
for t in prop_type:
if t != 'null':
prop_type = t
break
else:
prop_type = None
# Handle anyOf schemas (older format)
elif 'anyOf' in schema_property:
# Use the first non-null type from anyOf
for option in schema_property['anyOf']:
if option.get('type') and option.get('type') != 'null':
prop_type = option.get('type')
break
else:
prop_type = None
# Handle array type (e.g., notification_urls)
if prop_type == 'array':
# Support both comma-separated and JSON array format
if value.startswith('['):
try:
return json.loads(value)
except json.JSONDecodeError:
return [v.strip() for v in value.split(',')]
return [v.strip() for v in value.split(',')]
# Handle object type (e.g., time_between_check, headers)
elif prop_type == 'object':
try:
return json.loads(value)
except json.JSONDecodeError:
raise ValueError(f"Invalid JSON object for field: {value}")
# Handle boolean type
elif prop_type == 'boolean':
return strtobool(value)
# Handle integer type
elif prop_type == 'integer':
return int(value)
# Handle number type (float)
elif prop_type == 'number':
return float(value)
# Default: return as string
return value
class Import(Resource):
def __init__(self, **kwargs):
# datastore is a black box dependency
@@ -28,40 +102,128 @@ class Import(Resource):
@default_content_type('text/plain') #3547 #3542
@validate_openapi_request('importWatches')
def post(self):
"""Import a list of watched URLs."""
"""Import a list of watched URLs with optional watch configuration."""
from . import get_watch_schema_properties
# Special parameters that are NOT watch configuration
special_params = {'tag', 'tag_uuids', 'dedupe', 'proxy'}
extras = {}
# Handle special 'proxy' parameter
if request.args.get('proxy'):
plist = self.datastore.proxy_list
if not request.args.get('proxy') in plist:
return "Invalid proxy choice, currently supported proxies are '{}'".format(', '.join(plist)), 400
proxy_list_str = ', '.join(plist) if plist else 'none configured'
return f"Invalid proxy choice, currently supported proxies are '{proxy_list_str}'", 400
else:
extras['proxy'] = request.args.get('proxy')
# Handle special 'dedupe' parameter
dedupe = strtobool(request.args.get('dedupe', 'true'))
# Handle special 'tag' and 'tag_uuids' parameters
tags = request.args.get('tag')
tag_uuids = request.args.get('tag_uuids')
if tag_uuids:
tag_uuids = tag_uuids.split(',')
# Extract ALL other query parameters as watch configuration
# Get schema from OpenAPI spec (replaces old schema_create_watch)
schema_properties = get_watch_schema_properties()
for param_name, param_value in request.args.items():
# Skip special parameters
if param_name in special_params:
continue
# Skip if not in schema (unknown parameter)
if param_name not in schema_properties:
return f"Unknown watch configuration parameter: {param_name}", 400
# Convert to appropriate type based on schema
try:
converted_value = convert_query_param_to_type(param_value, schema_properties[param_name])
extras[param_name] = converted_value
except (ValueError, json.JSONDecodeError) as e:
return f"Invalid value for parameter '{param_name}': {str(e)}", 400
# Validate processor if provided
if 'processor' in extras:
from changedetectionio.processors import available_processors
available = [p[0] for p in available_processors()]
if extras['processor'] not in available:
return f"Invalid processor '{extras['processor']}'. Available processors: {', '.join(available)}", 400
# Validate fetch_backend if provided
if 'fetch_backend' in extras:
from changedetectionio.content_fetchers import available_fetchers
available = [f[0] for f in available_fetchers()]
# Also allow 'system' and extra_browser_* patterns
is_valid = (
extras['fetch_backend'] == 'system' or
extras['fetch_backend'] in available or
extras['fetch_backend'].startswith('extra_browser_')
)
if not is_valid:
return f"Invalid fetch_backend '{extras['fetch_backend']}'. Available: system, {', '.join(available)}", 400
# Validate notification_urls if provided
if 'notification_urls' in extras:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
validate_notification_urls(extras['notification_urls'])
except ValidationError as e:
return f"Invalid notification_urls: {str(e)}", 400
urls = request.get_data().decode('utf8').splitlines()
added = []
# Clean and validate URLs upfront
urls_to_import = []
for url in urls:
url = url.strip()
if not len(url):
continue
# If hosts that only contain alphanumerics are allowed ("localhost" for example)
# Validate URL
if not is_safe_valid_url(url):
return f"Invalid or unsupported URL - {url}", 400
# Check for duplicates if dedupe is enabled
if dedupe and self.datastore.url_exists(url):
continue
new_uuid = self.datastore.add_watch(url=url, extras=extras, tag=tags, tag_uuids=tag_uuids)
added.append(new_uuid)
urls_to_import.append(url)
return added
# For small imports, process synchronously for immediate feedback
if len(urls_to_import) < IMPORT_SWITCH_TO_BACKGROUND_THRESHOLD:
added = []
for url in urls_to_import:
new_uuid = self.datastore.add_watch(url=url, extras=extras, tag=tags, tag_uuids=tag_uuids)
added.append(new_uuid)
return added, 200
# For large imports (>= 20), process in background thread
else:
import threading
from loguru import logger
def import_watches_background():
"""Background thread to import watches - discarded after completion."""
try:
added_count = 0
for url in urls_to_import:
try:
self.datastore.add_watch(url=url, extras=extras, tag=tags, tag_uuids=tag_uuids)
added_count += 1
except Exception as e:
logger.error(f"Error importing URL {url}: {e}")
logger.info(f"Background import complete: {added_count} watches created")
except Exception as e:
logger.error(f"Error in background import: {e}")
# Start background thread and return immediately
thread = threading.Thread(target=import_watches_background, daemon=True, name="ImportWatches-Background")
thread.start()
return {'status': f'Importing {len(urls_to_import)} URLs in background', 'count': len(urls_to_import)}, 202
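
The size-based switch used by the import endpoint above (answer inline for small batches, hand large ones to a daemon thread and return 202) can be sketched in isolation. This is a minimal sketch under assumptions: `process_items`, `handle` and `BACKGROUND_THRESHOLD` are hypothetical names for illustration, and `print` stands in for the project's logger.

```python
import threading

BACKGROUND_THRESHOLD = 20  # same value as IMPORT_SWITCH_TO_BACKGROUND_THRESHOLD in the diff


def process_items(items, handle, threshold=BACKGROUND_THRESHOLD):
    """Hypothetical helper: run small batches inline, large ones in a daemon thread.

    Returns (body, status_code, thread); thread is None on the inline path.
    """
    if len(items) < threshold:
        # Small batch: the caller gets the real results immediately
        return [handle(item) for item in items], 200, None

    def worker():
        for item in items:
            try:
                handle(item)
            except Exception as exc:
                # One bad item should not stop the rest of the batch
                print(f"Error handling {item}: {exc}")

    thread = threading.Thread(target=worker, daemon=True, name="process-items-background")
    thread.start()
    body = {'status': f'Processing {len(items)} items in background', 'count': len(items)}
    return body, 202, thread
```

The trade-off is that the 202 response can only report how many items were accepted, not the resulting identifiers, so a caller that needs per-item results has to look them up afterwards.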

@@ -1,8 +1,6 @@
from flask_expects_json import expects_json
from flask_restful import Resource, abort
from flask import request
from . import auth, validate_openapi_request
from . import schema_create_notification_urls, schema_delete_notification_urls
class Notifications(Resource):
def __init__(self, **kwargs):
@@ -22,7 +20,6 @@ class Notifications(Resource):
@auth.check_token
@validate_openapi_request('addNotifications')
@expects_json(schema_create_notification_urls)
def post(self):
"""Create Notification URLs."""
@@ -50,7 +47,6 @@ class Notifications(Resource):
@auth.check_token
@validate_openapi_request('replaceNotifications')
@expects_json(schema_create_notification_urls)
def put(self):
"""Replace Notification URLs."""
json_data = request.get_json()
@@ -67,13 +63,12 @@ class Notifications(Resource):
clean_urls = [url.strip() for url in notification_urls if isinstance(url, str)]
self.datastore.data['settings']['application']['notification_urls'] = clean_urls
self.datastore.needs_write = True
self.datastore.commit()
return {'notification_urls': clean_urls}, 200
@auth.check_token
@validate_openapi_request('deleteNotifications')
@expects_json(schema_delete_notification_urls)
def delete(self):
"""Delete Notification URLs."""
@@ -95,7 +90,7 @@ class Notifications(Resource):
abort(400, message="No matching notification URLs found.")
self.datastore.data['settings']['application']['notification_urls'] = notification_urls
self.datastore.needs_write = True
self.datastore.commit()
return 'OK', 204

@@ -0,0 +1,21 @@
import functools
from flask import make_response
from flask_restful import Resource
@functools.cache
def _get_spec_yaml():
"""Build and cache the merged spec as a YAML string (only serialized once per process)."""
import yaml
from changedetectionio.api import build_merged_spec_dict
return yaml.dump(build_merged_spec_dict(), default_flow_style=False, allow_unicode=True)
class Spec(Resource):
def get(self):
"""Return the merged OpenAPI spec including all registered processor extensions."""
return make_response(
_get_spec_yaml(),
200,
{'Content-Type': 'application/yaml'}
)

@@ -1,13 +1,13 @@
from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_handler
from flask_expects_json import expects_json
from changedetectionio import worker_pool
from flask_restful import abort, Resource
from loguru import logger
import threading
from flask import request
from . import auth
# Import schemas from __init__.py
from . import schema_tag, schema_create_tag, schema_update_tag, validate_openapi_request
from . import validate_openapi_request
class Tag(Resource):
@@ -17,38 +17,75 @@ class Tag(Resource):
self.update_q = kwargs['update_q']
# Get information about a single tag
# curl http://localhost:5000/api/v1/tag/<string:uuid>
# curl http://localhost:5000/api/v1/tag/<uuid_str:uuid>
@auth.check_token
@validate_openapi_request('getTag')
def get(self, uuid):
"""Get data for a single tag/group, toggle notification muting, or recheck all."""
from copy import deepcopy
tag = deepcopy(self.datastore.data['settings']['application']['tags'].get(uuid))
tag = self.datastore.data['settings']['application']['tags'].get(uuid)
if not tag:
abort(404, message=f'No tag exists with the UUID of {uuid}')
if request.args.get('recheck'):
# Recheck all, including muted
# Get most overdue first
i=0
# Recheck all watches with this tag, including muted
# First collect watches to queue
watches_to_queue = []
for k in sorted(self.datastore.data['watching'].items(), key=lambda item: item[1].get('last_checked', 0)):
watch_uuid = k[0]
watch = k[1]
if not watch['paused'] and tag['uuid'] not in watch['tags']:
continue
worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
i+=1
if not watch['paused'] and tag['uuid'] in watch['tags']:
watches_to_queue.append(watch_uuid)
return f"OK, {i} watches queued", 200
# If less than 20 watches, queue synchronously for immediate feedback
if len(watches_to_queue) < 20:
for watch_uuid in watches_to_queue:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
return {'status': f'OK, queued {len(watches_to_queue)} watches for rechecking'}, 200
else:
# 20+ watches - queue in background thread to avoid blocking API response
def queue_watches_background():
"""Background thread to queue watches - discarded after completion."""
try:
for watch_uuid in watches_to_queue:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
logger.info(f"Background queueing complete for tag {tag['uuid']}: {len(watches_to_queue)} watches queued")
except Exception as e:
logger.error(f"Error in background queueing for tag {tag['uuid']}: {e}")
# Start background thread and return immediately
thread = threading.Thread(target=queue_watches_background, daemon=True, name=f"QueueTag-{tag['uuid'][:8]}")
thread.start()
return {'status': f'OK, queueing {len(watches_to_queue)} watches in background'}, 202
if request.args.get('muted', '') == 'muted':
self.datastore.data['settings']['application']['tags'][uuid]['notification_muted'] = True
tag['notification_muted'] = True
tag.commit()
return "OK", 200
elif request.args.get('muted', '') == 'unmuted':
self.datastore.data['settings']['application']['tags'][uuid]['notification_muted'] = False
tag['notification_muted'] = False
tag.commit()
return "OK", 200
return tag
# Filter out Watch-specific runtime fields that don't apply to Tags (yet)
# TODO: Future enhancement - aggregate these values from all Watches that have this tag:
# - check_count: sum of all watches' check_count
# - last_checked: most recent last_checked from all watches
# - last_changed: most recent last_changed from all watches
# - consecutive_filter_failures: count of watches with failures
# - etc.
# These come from watch_base inheritance but currently have no meaningful value for Tags
watch_only_fields = {
'browser_steps_last_error_step', 'check_count', 'consecutive_filter_failures',
'content-type', 'fetch_time', 'last_changed', 'last_checked', 'last_error',
'last_notification_error', 'last_viewed', 'notification_alert_count',
'page_title', 'previous_md5', 'remote_server_reply'
}
# Create clean tag dict without Watch-specific fields
clean_tag = {k: v for k, v in tag.items() if k not in watch_only_fields}
return clean_tag
@auth.check_token
@validate_openapi_request('deleteTag')
@@ -59,38 +96,84 @@ class Tag(Resource):
# Delete the tag, and any tag reference
del self.datastore.data['settings']['application']['tags'][uuid]
# Remove tag from all watches
for watch_uuid, watch in self.datastore.data['watching'].items():
if watch.get('tags') and uuid in watch['tags']:
watch['tags'].remove(uuid)
watch.commit()
return 'OK', 204
@auth.check_token
@validate_openapi_request('updateTag')
@expects_json(schema_update_tag)
def put(self, uuid):
"""Update tag information."""
tag = self.datastore.data['settings']['application']['tags'].get(uuid)
if not tag:
abort(404, message='No tag exists with the UUID of {}'.format(uuid))
tag.update(request.json)
self.datastore.needs_write_urgent = True
# Make a mutable copy of request.json for modification
json_data = dict(request.json)
# Validate notification_urls if provided
if 'notification_urls' in json_data:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
notification_urls = json_data.get('notification_urls', [])
validate_notification_urls(notification_urls)
except ValidationError as e:
return str(e), 400
# Filter out readOnly fields (extracted from OpenAPI spec Tag schema)
# These are system-managed fields that should never be user-settable
from . import get_readonly_tag_fields
readonly_fields = get_readonly_tag_fields()
# Tag model inherits from watch_base but has no @property attributes of its own
# So we only need to filter readOnly fields
for field in readonly_fields:
json_data.pop(field, None)
# Validate remaining fields - reject truly unknown fields
# Get valid fields from Tag schema
from . import get_tag_schema_properties
valid_fields = set(get_tag_schema_properties().keys())
# Check for unknown fields
unknown_fields = set(json_data.keys()) - valid_fields
if unknown_fields:
return f"Unknown field(s): {', '.join(sorted(unknown_fields))}", 400
tag.update(json_data)
tag.commit()
# Clear checksums for all watches using this tag to force reprocessing
# Tag changes affect inherited configuration
cleared_count = self.datastore.clear_checksums_for_tag(uuid)
logger.info(f"Tag {uuid} updated via API, cleared {cleared_count} watch checksums")
return "OK", 200
@auth.check_token
@validate_openapi_request('createTag')
# Only cares for {'title': 'xxxx'}
def post(self):
"""Create a single tag/group."""
json_data = request.get_json()
title = json_data.get("title",'').strip()
# Validate that only valid fields are provided
# Get valid fields from Tag schema
from . import get_tag_schema_properties
valid_fields = set(get_tag_schema_properties().keys())
# Check for unknown fields
unknown_fields = set(json_data.keys()) - valid_fields
if unknown_fields:
return f"Unknown field(s): {', '.join(sorted(unknown_fields))}", 400
new_uuid = self.datastore.add_tag(title=title)
if new_uuid:
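
The PUT and POST handlers above drop read-only fields and then reject any remaining key that the schema does not know. The following is a rough, self-contained sketch of that order of operations. The helper name `filter_update` and the example field names are assumptions for illustration only, not part of the changedetection.io API:

```python
def filter_update(payload, valid_fields, readonly_fields):
    """Return (clean_payload, error_message).

    Read-only keys are dropped silently; any other key that is not in
    valid_fields produces an error message instead of a payload.
    """
    data = dict(payload)  # work on a copy, leave the request data untouched
    for field in readonly_fields:
        data.pop(field, None)
    unknown = set(data) - set(valid_fields)
    if unknown:
        return None, f"Unknown field(s): {', '.join(sorted(unknown))}"
    return data, None


# Hypothetical field sets, chosen only to exercise both branches
valid = {"title", "notification_muted"}
readonly = {"uuid", "date_created"}

ok_payload, ok_error = filter_update({"title": "News", "uuid": "abc"}, valid, readonly)
bad_payload, bad_error = filter_update({"title": "News", "colour": "red", "bogus": 1}, valid, readonly)
```

Sorting the unknown keys keeps the error message stable between requests, which makes it easier to assert on in API tests.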


@@ -1,17 +1,20 @@
import os
import threading
from changedetectionio.validate_url import is_safe_valid_url
from changedetectionio.favicon_utils import get_favicon_mime_type
from flask_expects_json import expects_json
from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_handler
from flask_restful import abort, Resource
from flask import request, make_response, send_from_directory
from . import auth
from changedetectionio import queuedWatchMetaData, strtobool
from changedetectionio import worker_pool
from flask import request, make_response, send_from_directory
from flask_restful import abort, Resource
from loguru import logger
import copy
# Import schemas from __init__.py
from . import schema, schema_create_watch, schema_update_watch, validate_openapi_request
from . import validate_openapi_request, get_readonly_watch_fields
from ..notification import valid_notification_formats
from ..notification.handler import newline_re
def validate_time_between_check_required(json_data):
@@ -54,41 +57,53 @@ class Watch(Resource):
self.update_q = kwargs['update_q']
# Get information about a single watch, excluding the history list (can be large)
# curl http://localhost:5000/api/v1/watch/<string:uuid>
# curl http://localhost:5000/api/v1/watch/<uuid_str:uuid>
# @todo - version2 - ?muted and ?paused should be able to be called together, return the watch struct not "OK"
# ?recheck=true
@auth.check_token
@validate_openapi_request('getWatch')
def get(self, uuid):
"""Get information about a single watch, recheck, pause, or mute."""
from copy import deepcopy
watch = deepcopy(self.datastore.data['watching'].get(uuid))
if not watch:
# Get watch reference first (for pause/mute operations)
watch_obj = self.datastore.data['watching'].get(uuid)
if not watch_obj:
abort(404, message='No watch exists with the UUID of {}'.format(uuid))
# Create a dict copy for JSON response (with lock for thread safety)
# This is much faster than deepcopy and doesn't copy the datastore reference
# WARNING: dict() is a SHALLOW copy - nested dicts are shared with original!
# Only safe because we only ADD scalar properties (line 97-101), never modify nested dicts
# If you need to modify nested dicts, use: from copy import deepcopy; watch = deepcopy(dict(watch_obj))
with self.datastore.lock:
watch = dict(watch_obj)
if request.args.get('recheck'):
worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return "OK", 200
if request.args.get('paused', '') == 'paused':
self.datastore.data['watching'].get(uuid).pause()
watch_obj.pause()
watch_obj.commit()
return "OK", 200
elif request.args.get('paused', '') == 'unpaused':
self.datastore.data['watching'].get(uuid).unpause()
watch_obj.unpause()
watch_obj.commit()
return "OK", 200
if request.args.get('muted', '') == 'muted':
self.datastore.data['watching'].get(uuid).mute()
watch_obj.mute()
watch_obj.commit()
return "OK", 200
elif request.args.get('muted', '') == 'unmuted':
self.datastore.data['watching'].get(uuid).unmute()
watch_obj.unmute()
watch_obj.commit()
return "OK", 200
# Return without history, get that via another API call
# Properties are not returned as a JSON, so add the required props manually
watch['history_n'] = watch.history_n
watch['history_n'] = watch_obj.history_n
# attr .last_changed will check for the last written text snapshot on change
watch['last_changed'] = watch.last_changed
watch['viewed'] = watch.viewed
watch['link'] = watch.link,
watch['last_changed'] = watch_obj.last_changed
watch['viewed'] = watch_obj.viewed
watch['link'] = watch_obj.link,
return watch
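
The shallow-copy warning in the comments above can be illustrated with plain dicts. This is only a sketch with made-up keys, not the real Watch model: adding top-level keys to a `dict()` copy leaves the original alone, while modifying a nested dict changes both.

```python
from copy import deepcopy

original = {"title": "Example", "time_between_check": {"hours": 1}}

shallow = dict(original)
shallow["history_n"] = 3                      # new top-level key: original is unaffected
shallow["time_between_check"]["hours"] = 99   # nested dict is shared: original changes too

deep = deepcopy(original)
deep["time_between_check"]["hours"] = 5       # fully independent copy
```

This is why the handler only adds scalar properties such as `history_n` to the copy; any code that needs to edit nested values would want `deepcopy` instead.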
@@ -104,7 +119,6 @@ class Watch(Resource):
@auth.check_token
@validate_openapi_request('updateWatch')
@expects_json(schema_update_watch)
def put(self, uuid):
"""Update watch information."""
watch = self.datastore.data['watching'].get(uuid)
@@ -113,19 +127,86 @@ class Watch(Resource):
if request.json.get('proxy'):
plist = self.datastore.proxy_list
if not request.json.get('proxy') in plist:
return "Invalid proxy choice, currently supported proxies are '{}'".format(', '.join(plist)), 400
if not plist or request.json.get('proxy') not in plist:
proxy_list_str = ', '.join(plist) if plist else 'none configured'
return f"Invalid proxy choice, currently supported proxies are '{proxy_list_str}'", 400
# Validate time_between_check when not using defaults
validation_error = validate_time_between_check_required(request.json)
if validation_error:
return validation_error, 400
# XSS etc protection
if request.json.get('url') and not is_safe_valid_url(request.json.get('url')):
return "Invalid URL", 400
# Validate notification_urls if provided
if 'notification_urls' in request.json:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
notification_urls = request.json.get('notification_urls', [])
validate_notification_urls(notification_urls)
except ValidationError as e:
return str(e), 400
watch.update(request.json)
# XSS etc protection - validate URL if it's being updated
if 'url' in request.json:
new_url = request.json.get('url')
# URL must be a non-empty string
if new_url is None:
return "URL cannot be null", 400
if not isinstance(new_url, str):
return "URL must be a string", 400
if not new_url.strip():
return "URL cannot be empty or whitespace only", 400
if not is_safe_valid_url(new_url.strip()):
return "Invalid or unsupported URL format. URL must use http://, https://, or ftp:// protocol", 400
# Handle processor-config-* fields separately (save to JSON, not datastore)
from changedetectionio import processors
# Make a mutable copy of request.json for modification
json_data = dict(request.json)
# Extract and remove processor config fields from json_data
processor_config_data = processors.extract_processor_config_from_form_data(json_data)
# Filter out readOnly fields (extracted from OpenAPI spec Watch schema)
# These are system-managed fields that should never be user-settable
readonly_fields = get_readonly_watch_fields()
# Also filter out @property attributes (computed/derived values from the model)
# These are not stored and should be ignored in PUT requests
from changedetectionio.model.Watch import model as WatchModel
property_fields = WatchModel.get_property_names()
# Combine both sets of fields to ignore
fields_to_ignore = readonly_fields | property_fields
# Remove all ignored fields from update data
for field in fields_to_ignore:
json_data.pop(field, None)
# Validate remaining fields - reject truly unknown fields
# Get valid fields from WatchBase schema
from . import get_watch_schema_properties
valid_fields = set(get_watch_schema_properties().keys())
# Also allow last_viewed (explicitly defined in UpdateWatch schema)
valid_fields.add('last_viewed')
# Check for unknown fields
unknown_fields = set(json_data.keys()) - valid_fields
if unknown_fields:
return f"Unknown field(s): {', '.join(sorted(unknown_fields))}", 400
# Update watch with regular (non-processor-config) fields
watch.update(json_data)
watch.commit()
# Save processor config to JSON file
processors.save_processor_config(self.datastore, uuid, processor_config_data)
return "OK", 200
@@ -136,7 +217,7 @@ class WatchHistory(Resource):
self.datastore = kwargs['datastore']
# Get a list of available history for a watch by UUID
# curl http://localhost:5000/api/v1/watch/<string:uuid>/history
# curl http://localhost:5000/api/v1/watch/<uuid_str:uuid>/history
@auth.check_token
@validate_openapi_request('getWatchHistory')
def get(self, uuid):
@@ -166,6 +247,10 @@ class WatchSingleHistory(Resource):
if timestamp == 'latest':
timestamp = list(watch.history.keys())[-1]
# Validate that the timestamp exists in history
if timestamp not in watch.history:
abort(404, message=f"No history snapshot found for timestamp '{timestamp}'")
if request.args.get('html'):
content = watch.get_fetched_html(timestamp)
if content:
@@ -181,6 +266,124 @@ class WatchSingleHistory(Resource):
return response
class WatchHistoryDiff(Resource):
"""
Generate diff between two historical snapshots.
Note: This API endpoint currently returns text-based diffs and works best
with the text_json_diff processor. Future processor types (like image_diff,
restock_diff) may want to implement their own specialized API endpoints
for returning processor-specific data (e.g., price charts, image comparisons).
The web UI diff page (/diff/<uuid>) is processor-aware and delegates rendering
to processors/{type}/difference.py::render() for processor-specific visualizations.
"""
def __init__(self, **kwargs):
# datastore is a black box dependency
self.datastore = kwargs['datastore']
@auth.check_token
@validate_openapi_request('getWatchHistoryDiff')
def get(self, uuid, from_timestamp, to_timestamp):
"""Generate diff between two historical snapshots."""
from changedetectionio import diff
from changedetectionio.notification.handler import apply_service_tweaks
watch = self.datastore.data['watching'].get(uuid)
if not watch:
abort(404, message=f"No watch exists with the UUID of {uuid}")
if not len(watch.history):
abort(404, message=f"Watch found but no history exists for the UUID {uuid}")
history_keys = list(watch.history.keys())
# Handle 'latest' keyword for to_timestamp
if to_timestamp == 'latest':
to_timestamp = history_keys[-1]
# Handle 'previous' keyword for from_timestamp (second-most-recent)
if from_timestamp == 'previous':
if len(history_keys) < 2:
abort(404, message=f"Not enough history entries. Need at least 2 snapshots for 'previous'")
from_timestamp = history_keys[-2]
# Validate timestamps exist
if from_timestamp not in watch.history:
abort(404, message=f"From timestamp {from_timestamp} not found in watch history")
if to_timestamp not in watch.history:
abort(404, message=f"To timestamp {to_timestamp} not found in watch history")
# Get the format parameter (default to 'text')
output_format = request.args.get('format', 'text').lower()
# Validate format
if output_format not in valid_notification_formats.keys():
abort(400, message=f"Invalid format. Must be one of: {', '.join(valid_notification_formats.keys())}")
# Get the word_diff parameter (default to False - line-level mode)
word_diff = strtobool(request.args.get('word_diff', 'false'))
# Get the no_markup parameter (default to False)
no_markup = strtobool(request.args.get('no_markup', 'false'))
# Retrieve snapshot contents
from_version_file_contents = watch.get_history_snapshot(from_timestamp)
to_version_file_contents = watch.get_history_snapshot(to_timestamp)
# Get diff preferences from query parameters (matching UI preferences in DIFF_PREFERENCES_CONFIG)
# Support both 'type' (UI parameter) and 'word_diff' (API parameter) for backward compatibility
diff_type = request.args.get('type', 'diffLines')
if diff_type == 'diffWords':
word_diff = True
# Get boolean diff preferences with defaults from DIFF_PREFERENCES_CONFIG
changes_only = strtobool(request.args.get('changesOnly', 'true'))
ignore_whitespace = strtobool(request.args.get('ignoreWhitespace', 'false'))
include_removed = strtobool(request.args.get('removed', 'true'))
include_added = strtobool(request.args.get('added', 'true'))
include_replaced = strtobool(request.args.get('replaced', 'true'))
# Generate the diff with all preferences
content = diff.render_diff(
previous_version_file_contents=from_version_file_contents,
newest_version_file_contents=to_version_file_contents,
ignore_junk=ignore_whitespace,
include_equal=changes_only,
include_removed=include_removed,
include_added=include_added,
include_replaced=include_replaced,
word_diff=word_diff,
)
# Skip formatting if no_markup is set
if no_markup:
mimetype = "text/plain"
else:
# Apply formatting based on the requested format
if output_format == 'htmlcolor':
from changedetectionio.notification.handler import apply_html_color_to_body
content = apply_html_color_to_body(n_body=content)
mimetype = "text/html"
else:
# Apply service tweaks for text/html formats
# Pass empty URL and title as they're not used for the placeholder replacement we need
_, content, _ = apply_service_tweaks(
url='',
n_body=content,
n_title='',
requested_output_format=output_format
)
mimetype = "text/html" if output_format == 'html' else "text/plain"
if 'html' in output_format:
content = newline_re.sub('<br>\r\n', content)
response = make_response(content, 200)
response.mimetype = mimetype
return response
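
The handling of the `latest` and `previous` keywords in `WatchHistoryDiff.get()` can be sketched as a small pure function. The name `resolve_timestamps` and the use of `LookupError` are assumptions for illustration; the real handler calls `abort(404, ...)` instead of raising:

```python
def resolve_timestamps(history_keys, from_timestamp, to_timestamp):
    """Map 'previous'/'latest' to real history keys and check both exist."""
    if not history_keys:
        raise LookupError("no history exists")
    if to_timestamp == 'latest':
        to_timestamp = history_keys[-1]
    if from_timestamp == 'previous':
        if len(history_keys) < 2:
            raise LookupError("need at least 2 snapshots for 'previous'")
        from_timestamp = history_keys[-2]
    for label, value in (("from", from_timestamp), ("to", to_timestamp)):
        if value not in history_keys:
            raise LookupError(f"{label} timestamp {value} not found")
    return from_timestamp, to_timestamp


keys = ['1700000000', '1700000600', '1700001200']
pair = resolve_timestamps(keys, 'previous', 'latest')

try:
    resolve_timestamps(keys[:1], 'previous', 'latest')
    single_snapshot_error = None
except LookupError as exc:
    single_snapshot_error = str(exc)
```

Keeping the keyword resolution separate from the diff rendering makes the "not enough history" and "unknown timestamp" cases easy to test without a datastore.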
class WatchFavicon(Resource):
def __init__(self, **kwargs):
# datastore is a black box dependency
@@ -196,18 +399,11 @@ class WatchFavicon(Resource):
favicon_filename = watch.get_favicon_filename()
if favicon_filename:
try:
import magic
mime = magic.from_file(
os.path.join(watch.watch_data_dir, favicon_filename),
mime=True
)
except ImportError:
# Fallback, no python-magic
import mimetypes
mime, encoding = mimetypes.guess_type(favicon_filename)
# Use cached MIME type detection
filepath = os.path.join(watch.data_dir, favicon_filename)
mime = get_favicon_mime_type(filepath)
response = make_response(send_from_directory(watch.watch_data_dir, favicon_filename))
response = make_response(send_from_directory(watch.data_dir, favicon_filename))
response.headers['Content-type'] = mime
response.headers['Cache-Control'] = 'max-age=300, must-revalidate' # Cache for 5 minutes, then revalidate
return response
@@ -223,7 +419,6 @@ class CreateWatch(Resource):
@auth.check_token
@validate_openapi_request('createWatch')
@expects_json(schema_create_watch)
def post(self):
"""Create a single watch."""
@@ -235,16 +430,33 @@ class CreateWatch(Resource):
if json_data.get('proxy'):
plist = self.datastore.proxy_list
if not json_data.get('proxy') in plist:
return "Invalid proxy choice, currently supported proxies are '{}'".format(', '.join(plist)), 400
if not plist or json_data.get('proxy') not in plist:
proxy_list_str = ', '.join(plist) if plist else 'none configured'
return f"Invalid proxy choice, currently supported proxies are '{proxy_list_str}'", 400
# Validate time_between_check when not using defaults
validation_error = validate_time_between_check_required(json_data)
if validation_error:
return validation_error, 400
# Validate notification_urls if provided
if 'notification_urls' in json_data:
from wtforms import ValidationError
from changedetectionio.api.Notifications import validate_notification_urls
try:
notification_urls = json_data.get('notification_urls', [])
validate_notification_urls(notification_urls)
except ValidationError as e:
return str(e), 400
# Handle processor-config-* fields separately (save to JSON, not watch)
from changedetectionio import processors
extras = copy.deepcopy(json_data)
# Extract and remove processor config fields from extras
processor_config_data = processors.extract_processor_config_from_form_data(extras)
# Because we renamed 'tag' to 'tags' but don't want to change the API (can do this in v2 of the API)
tags = None
if extras.get('tag'):
@@ -254,10 +466,25 @@ class CreateWatch(Resource):
del extras['url']
new_uuid = self.datastore.add_watch(url=url, extras=extras, tag=tags)
# Save processor config to separate JSON file
if new_uuid and processor_config_data:
processors.save_processor_config(self.datastore, new_uuid, processor_config_data)
if new_uuid:
worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
# Don't queue because the scheduler will check that it hasn't been checked before anyway
# worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
return {'uuid': new_uuid}, 201
else:
# Check if it was a limit issue
page_watch_limit = os.getenv('PAGE_WATCH_LIMIT')
if page_watch_limit:
try:
page_watch_limit = int(page_watch_limit)
current_watch_count = len(self.datastore.data['watching'])
if current_watch_count >= page_watch_limit:
return f"Watch limit reached ({current_watch_count}/{page_watch_limit} watches). Cannot add more watches.", 429
except ValueError:
pass
return "Invalid or unsupported URL", 400
@auth.check_token
@@ -279,14 +506,65 @@ class CreateWatch(Resource):
'last_error': watch['last_error'],
'link': watch.link,
'page_title': watch['page_title'],
'tags': [*tags], # Unpack dict keys to list (can't use list() since variable named 'list')
'title': watch['title'],
'url': watch['url'],
'viewed': watch.viewed
}
if request.args.get('recheck_all'):
for uuid in self.datastore.data['watching'].keys():
worker_handler.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return {'status': "OK"}, 200
# Collect all watches to queue
watches_to_queue = self.datastore.data['watching'].keys()
# If less than 20 watches, queue synchronously for immediate feedback
if len(watches_to_queue) < 20:
# Get already queued/running UUIDs once (efficient)
queued_uuids = set(self.update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids())
# Filter out watches that are already queued or running
watches_to_queue_filtered = [
uuid for uuid in watches_to_queue
if uuid not in queued_uuids and uuid not in running_uuids
]
# Queue only the filtered watches
for uuid in watches_to_queue_filtered:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Provide feedback about skipped watches
skipped_count = len(watches_to_queue) - len(watches_to_queue_filtered)
if skipped_count > 0:
return {'status': f'OK, queued {len(watches_to_queue_filtered)} watches for rechecking ({skipped_count} already queued or running)'}, 200
else:
return {'status': f'OK, queued {len(watches_to_queue_filtered)} watches for rechecking'}, 200
else:
# 20+ watches - queue in background thread to avoid blocking API response
# Capture queued/running state before background thread
queued_uuids = set(self.update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids())
def queue_all_watches_background():
"""Background thread to queue all watches - discarded after completion."""
try:
queued_count = 0
skipped_count = 0
for uuid in watches_to_queue:
# Check if already queued or running (state captured at start)
if uuid not in queued_uuids and uuid not in running_uuids:
worker_pool.queue_item_async_safe(self.update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
queued_count += 1
else:
skipped_count += 1
logger.info(f"Background queueing complete: {queued_count} watches queued, {skipped_count} skipped (already queued/running)")
except Exception as e:
logger.error(f"Error in background queueing all watches: {e}")
# Start background thread and return immediately
thread = threading.Thread(target=queue_all_watches_background, daemon=True, name="QueueAllWatches-Background")
thread.start()
return {'status': f'OK, queueing {len(watches_to_queue)} watches in background'}, 202
return list, 200
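
Both the tag recheck and `recheck_all` paths above follow the same pattern: queue synchronously when there are fewer than 20 watches and answer 200, otherwise hand the work to a daemon thread and answer 202. The sketch below shows that pattern with a standard `queue.Queue`. The function `queue_watches` and its parameters are hypothetical, and `work_queue.put()` stands in for `worker_pool.queue_item_async_safe()`:

```python
import queue
import threading

SYNC_THRESHOLD = 20  # same cut-off as in the diff above


def queue_watches(uuids, work_queue, already_busy=frozenset()):
    """Queue uuids, skipping busy ones; returns (http_status, thread_or_None)."""
    # Snapshot of what still needs queueing, taken before any thread starts
    pending = [u for u in uuids if u not in already_busy]

    def enqueue_all():
        for u in pending:
            work_queue.put(u)

    if len(uuids) < SYNC_THRESHOLD:
        enqueue_all()          # small batch: do it inline for immediate feedback
        return 200, None
    worker = threading.Thread(target=enqueue_all, daemon=True, name="QueueWatches-Sketch")
    worker.start()             # large batch: return before queueing has finished
    return 202, worker


q = queue.Queue()
status_small, _ = queue_watches(["a", "b", "c"], q, already_busy={"b"})
status_large, worker = queue_watches([f"w{i}" for i in range(25)], q)
worker.join()  # only needed here so the example can inspect the queue afterwards
```

As in the diff, the set of already queued or running items is captured once up front, so an item that becomes busy while the background thread is still running may be queued a second time.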


@@ -1,71 +1,180 @@
import copy
import functools
from flask import request, abort
from loguru import logger
from . import api_schema
from ..model import watch_base
# Build a JSON Schema at least partially based on our Watch model
watch_base_config = watch_base()
schema = api_schema.build_watch_json_schema(watch_base_config)
schema_create_watch = copy.deepcopy(schema)
schema_create_watch['required'] = ['url']
del schema_create_watch['properties']['last_viewed']
schema_update_watch = copy.deepcopy(schema)
schema_update_watch['additionalProperties'] = False
# Tag schema is also based on watch_base since Tag inherits from it
schema_tag = copy.deepcopy(schema)
schema_create_tag = copy.deepcopy(schema_tag)
schema_create_tag['required'] = ['title']
schema_update_tag = copy.deepcopy(schema_tag)
schema_update_tag['additionalProperties'] = False
schema_notification_urls = copy.deepcopy(schema)
schema_create_notification_urls = copy.deepcopy(schema_notification_urls)
schema_create_notification_urls['required'] = ['notification_urls']
schema_delete_notification_urls = copy.deepcopy(schema_notification_urls)
schema_delete_notification_urls['required'] = ['notification_urls']
@functools.cache
def get_openapi_spec():
"""Lazy load OpenAPI spec and dependencies only when validation is needed."""
def build_merged_spec_dict():
"""
Load the base OpenAPI spec and merge in any per-processor api.yaml extensions.
Each processor can provide an api.yaml file alongside its __init__.py that defines
additional schemas (e.g., processor_config_restock_diff). These are merged into
WatchBase.properties so the spec accurately reflects what the API accepts.
Plugin processors (via pluggy) are also supported - they just need an api.yaml
next to their processor module.
Returns the merged dict (cached - do not mutate the returned value).
"""
import os
import yaml # Lazy import - only loaded when API validation is actually used
from openapi_core import OpenAPI # Lazy import - saves ~10.7 MB on startup
import yaml
spec_path = os.path.join(os.path.dirname(__file__), '../../docs/api-spec.yaml')
if not os.path.exists(spec_path):
# Possibly for pip3 packages
spec_path = os.path.join(os.path.dirname(__file__), '../docs/api-spec.yaml')
with open(spec_path, 'r', encoding='utf-8') as f:
spec_dict = yaml.safe_load(f)
_openapi_spec = OpenAPI.from_dict(spec_dict)
return _openapi_spec
try:
from changedetectionio.processors import find_processors, get_parent_module
for module, proc_name in find_processors():
parent = get_parent_module(module)
if not parent or not hasattr(parent, '__file__'):
continue
api_yaml_path = os.path.join(os.path.dirname(parent.__file__), 'api.yaml')
if not os.path.exists(api_yaml_path):
continue
with open(api_yaml_path, 'r', encoding='utf-8') as f:
proc_spec = yaml.safe_load(f)
# Merge schemas
proc_schemas = proc_spec.get('components', {}).get('schemas', {})
spec_dict['components']['schemas'].update(proc_schemas)
# Inject processor_config_{name} into WatchBase if the schema is defined
schema_key = f'processor_config_{proc_name}'
if schema_key in proc_schemas:
spec_dict['components']['schemas']['WatchBase']['properties'][schema_key] = {
'$ref': f'#/components/schemas/{schema_key}'
}
# Append x-code-samples from processor paths into existing path operations
for path, path_item in proc_spec.get('paths', {}).items():
if path not in spec_dict.get('paths', {}):
continue
for method, operation in path_item.items():
if method not in spec_dict['paths'][path]:
continue
if 'x-code-samples' in operation:
existing = spec_dict['paths'][path][method].get('x-code-samples', [])
spec_dict['paths'][path][method]['x-code-samples'] = existing + operation['x-code-samples']
except Exception as e:
logger.warning(f"Failed to merge processor API specs: {e}")
return spec_dict
@functools.cache
def get_openapi_spec():
"""Lazy load OpenAPI spec and dependencies only when validation is needed."""
from openapi_core import OpenAPI # Lazy import - saves ~10.7 MB on startup
return OpenAPI.from_dict(build_merged_spec_dict())
@functools.cache
def get_openapi_schema_dict():
"""
Get the raw OpenAPI spec dictionary for schema access.
Used by Import endpoint to validate and convert query parameters.
Returns the merged YAML dict (not the OpenAPI object).
"""
return build_merged_spec_dict()
@functools.cache
def _resolve_schema_properties(schema_name):
"""
Generic helper to resolve schema properties, including allOf inheritance.
Args:
schema_name: Name of the schema (e.g., 'WatchBase', 'Watch', 'Tag')
Returns:
dict: All properties including inherited ones from $ref schemas
"""
spec_dict = get_openapi_schema_dict()
schema = spec_dict['components']['schemas'].get(schema_name, {})
properties = {}
# Handle allOf (schema inheritance)
if 'allOf' in schema:
for item in schema['allOf']:
# Resolve $ref to parent schema
if '$ref' in item:
ref_path = item['$ref'].split('/')[-1]
ref_schema = spec_dict['components']['schemas'].get(ref_path, {})
properties.update(ref_schema.get('properties', {}))
# Add schema-specific properties
if 'properties' in item:
properties.update(item['properties'])
else:
# Direct properties (no inheritance)
properties = schema.get('properties', {})
return properties
@functools.cache
def get_watch_schema_properties():
"""
Extract watch schema properties from OpenAPI spec for Import endpoint.
Returns WatchBase properties (all writable Watch fields).
"""
return _resolve_schema_properties('WatchBase')
# Import readonly field utilities from shared module (avoids circular dependencies with model layer)
from changedetectionio.model.schema_utils import get_readonly_watch_fields, get_readonly_tag_fields
@functools.cache
def get_tag_schema_properties():
"""
Extract Tag schema properties from OpenAPI spec.
Returns WatchBase properties + Tag-specific properties (overrides_watch).
"""
return _resolve_schema_properties('Tag')
def validate_openapi_request(operation_id):
"""Decorator to validate incoming requests against OpenAPI spec."""
def decorator(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
from werkzeug.exceptions import BadRequest
try:
# Skip OpenAPI validation for GET requests since they don't have request bodies
if request.method.upper() != 'GET':
# Lazy import - only loaded when actually validating a request
from openapi_core.contrib.flask import FlaskOpenAPIRequest
from openapi_core.templating.paths.exceptions import ServerNotFound, PathNotFound, PathError
spec = get_openapi_spec()
openapi_request = FlaskOpenAPIRequest(request)
result = spec.unmarshal_request(openapi_request)
if result.errors:
from werkzeug.exceptions import BadRequest
error_details = []
for error in result.errors:
error_details.append(str(error))
raise BadRequest(f"OpenAPI validation failed: {error_details}")
# Skip path/server validation errors for reverse proxy compatibility
# Flask routing already validates that endpoints exist (returns 404 if not).
# OpenAPI validation here is primarily for request body schema validation.
# When behind nginx/reverse proxy, URLs may have path prefixes that don't
# match the OpenAPI server definitions, causing false positives.
if isinstance(error, PathError):
logger.debug(f"API Call - Skipping path/server validation (delegated to Flask): {error}")
continue
error_str = str(error)
# Extract detailed schema errors from __cause__
if hasattr(error, '__cause__') and hasattr(error.__cause__, 'schema_errors'):
for schema_error in error.__cause__.schema_errors:
field = '.'.join(str(p) for p in schema_error.path) if schema_error.path else 'body'
msg = schema_error.message if hasattr(schema_error, 'message') else str(schema_error)
error_details.append(f"{field}: {msg}")
else:
error_details.append(error_str)
# Only raise if we have actual validation errors (not path/server issues)
if error_details:
logger.error(f"API Call - Validation failed: {'; '.join(error_details)}")
raise BadRequest(f"Validation failed: {'; '.join(error_details)}")
except BadRequest:
# Re-raise BadRequest exceptions (validation failures)
raise
@@ -78,9 +187,10 @@ def validate_openapi_request(operation_id):
return decorator
# Import all API resources
from .Watch import Watch, WatchHistory, WatchSingleHistory, CreateWatch, WatchFavicon
from .Watch import Watch, WatchHistory, WatchSingleHistory, WatchHistoryDiff, CreateWatch, WatchFavicon
from .Tags import Tags, Tag
from .Import import Import
from .SystemInfo import SystemInfo
from .Spec import Spec
from .Notifications import Notifications
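Reviewer note: the `allOf` handling in `_resolve_schema_properties` above can be tried in isolation. The following is a minimal sketch, not the project's code. It assumes the schemas are passed in as a plain dict (the parameter `schemas` and the `EXAMPLE_SCHEMAS` data are hypothetical stand-ins for `spec_dict['components']['schemas']`), and like the hunk above it follows only one level of `$ref`.

```python
def resolve_schema_properties(schemas, schema_name):
    """Collect a schema's properties, following one level of allOf/$ref inheritance."""
    schema = schemas.get(schema_name, {})
    if 'allOf' not in schema:
        # Direct properties (no inheritance)
        return dict(schema.get('properties', {}))

    properties = {}
    for item in schema['allOf']:
        if '$ref' in item:
            # '#/components/schemas/WatchBase' -> 'WatchBase'
            parent_name = item['$ref'].split('/')[-1]
            properties.update(schemas.get(parent_name, {}).get('properties', {}))
        if 'properties' in item:
            # Schema-specific properties override inherited ones
            properties.update(item['properties'])
    return properties


# Hypothetical, heavily trimmed stand-in for the real OpenAPI schemas
EXAMPLE_SCHEMAS = {
    'WatchBase': {'properties': {'title': {'type': 'string'}, 'paused': {'type': 'boolean'}}},
    'Tag': {
        'allOf': [
            {'$ref': '#/components/schemas/WatchBase'},
            {'properties': {'overrides_watch': {'type': 'boolean'}}},
        ]
    },
}

tag_properties = resolve_schema_properties(EXAMPLE_SCHEMAS, 'Tag')
print(sorted(tag_properties))
```

Because later `update()` calls win, a property declared both in the parent and in the child's own `properties` block ends up with the child's definition, provided the `$ref` entry comes first in `allOf`.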


@@ -1,162 +0,0 @@
# Responsible for building the storage dict into a set of rules ("JSON Schema") acceptable via the API
# Probably other ways to solve this when the backend switches to some ORM
from changedetectionio.notification import valid_notification_formats
def build_time_between_check_json_schema():
# Setup time between check schema
schema_properties_time_between_check = {
"type": "object",
"additionalProperties": False,
"properties": {}
}
for p in ['weeks', 'days', 'hours', 'minutes', 'seconds']:
schema_properties_time_between_check['properties'][p] = {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
]
}
return schema_properties_time_between_check
def build_watch_json_schema(d):
# Base JSON schema
schema = {
'type': 'object',
'properties': {},
}
for k, v in d.items():
# @todo 'integer' is not covered here because it's almost always for internal usage
if isinstance(v, type(None)):
schema['properties'][k] = {
"anyOf": [
{"type": "null"},
]
}
elif isinstance(v, list):
schema['properties'][k] = {
"anyOf": [
{"type": "array",
# Always is an array of strings, like text or regex or something
"items": {
"type": "string",
"maxLength": 5000
}
},
]
}
elif isinstance(v, bool):
schema['properties'][k] = {
"anyOf": [
{"type": "boolean"},
]
}
elif isinstance(v, str):
schema['properties'][k] = {
"anyOf": [
{"type": "string",
"maxLength": 5000},
]
}
# Can also be a string (or None by default above)
for v in ['body',
'notification_body',
'notification_format',
'notification_title',
'proxy',
'tag',
'title',
'webdriver_js_execute_code'
]:
schema['properties'][v]['anyOf'].append({'type': 'string', "maxLength": 5000})
for v in ['last_viewed']:
schema['properties'][v] = {
"type": "integer",
"description": "Unix timestamp in seconds of the last time the watch was viewed.",
"minimum": 0
}
# None or Boolean
schema['properties']['track_ldjson_price_data']['anyOf'].append({'type': 'boolean'})
schema['properties']['method'] = {"type": "string",
"enum": ["GET", "POST", "DELETE", "PUT"]
}
schema['properties']['fetch_backend']['anyOf'].append({"type": "string",
"enum": ["html_requests", "html_webdriver"]
})
schema['properties']['processor'] = {"anyOf": [
{"type": "string", "enum": ["restock_diff", "text_json_diff"]},
{"type": "null"}
]}
# All headers must be key/value type dict
schema['properties']['headers'] = {
"type": "object",
"patternProperties": {
# Should always be a string:string type value
".*": {"type": "string"},
}
}
schema['properties']['notification_format'] = {'type': 'string',
'enum': list(valid_notification_formats.keys())
}
# Stuff that shouldn't be available but is just state-storage
for v in ['previous_md5', 'last_error', 'has_ldjson_price_data', 'previous_md5_before_filters', 'uuid']:
del schema['properties'][v]
schema['properties']['webdriver_delay']['anyOf'].append({'type': 'integer'})
schema['properties']['time_between_check'] = build_time_between_check_json_schema()
schema['properties']['time_between_check_use_default'] = {
"type": "boolean",
"default": True,
"description": "Whether to use global settings for time between checks - defaults to true if not set"
}
schema['properties']['browser_steps'] = {
"anyOf": [
{
"type": "array",
"items": {
"type": "object",
"properties": {
"operation": {
"type": ["string", "null"],
"maxLength": 5000 # Allows null and any string up to 5000 chars (including "")
},
"selector": {
"type": ["string", "null"],
"maxLength": 5000
},
"optional_value": {
"type": ["string", "null"],
"maxLength": 5000
}
},
"required": ["operation", "selector", "optional_value"],
"additionalProperties": False # No extra keys allowed
}
},
{"type": "null"}, # Allows null for `browser_steps`
{"type": "array", "maxItems": 0} # Allows empty array []
]
}
# headers ?
return schema


@@ -3,6 +3,7 @@ import glob
import threading
from flask import Blueprint, render_template, send_from_directory, flash, url_for, redirect, abort
from flask_babel import gettext
import os
from changedetectionio.store import ChangeDetectionStore
@@ -12,7 +13,7 @@ from loguru import logger
BACKUP_FILENAME_FORMAT = "changedetection-backup-{}.zip"
def create_backup(datastore_path, watches: dict):
def create_backup(datastore_path, watches: dict, tags: dict = None):
logger.debug("Creating backup...")
import zipfile
from pathlib import Path
@@ -26,15 +27,36 @@ def create_backup(datastore_path, watches: dict):
compression=zipfile.ZIP_DEFLATED,
compresslevel=8) as zipObj:
# Add the index
zipObj.write(os.path.join(datastore_path, "url-watches.json"), arcname="url-watches.json")
# Add the settings file (supports both formats)
# New format: changedetection.json
changedetection_json = os.path.join(datastore_path, "changedetection.json")
if os.path.isfile(changedetection_json):
zipObj.write(changedetection_json, arcname="changedetection.json")
logger.debug("Added changedetection.json to backup")
# Add the flask app secret
zipObj.write(os.path.join(datastore_path, "secret.txt"), arcname="secret.txt")
# Legacy format: url-watches.json (for backward compatibility)
url_watches_json = os.path.join(datastore_path, "url-watches.json")
if os.path.isfile(url_watches_json):
zipObj.write(url_watches_json, arcname="url-watches.json")
logger.debug("Added url-watches.json to backup")
# Add the flask app secret (if it exists)
secret_file = os.path.join(datastore_path, "secret.txt")
if os.path.isfile(secret_file):
zipObj.write(secret_file, arcname="secret.txt")
# Add tag data directories (each tag has its own {uuid}/tag.json)
for uuid, tag in (tags or {}).items():
for f in Path(tag.data_dir).glob('*'):
zipObj.write(f,
arcname=os.path.join(f.parts[-2], f.parts[-1]),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
logger.debug(f"Added tag '{tag.get('title')}' ({uuid}) to backup")
# Add any data in the watch data directory.
for uuid, w in watches.items():
for f in Path(w.watch_data_dir).glob('*'):
for f in Path(w.data_dir).glob('*'):
zipObj.write(f,
# Use the full path to access the file, but make the file 'relative' in the Zip.
arcname=os.path.join(f.parts[-2], f.parts[-1]),
@@ -75,28 +97,36 @@ def create_backup(datastore_path, watches: dict):
def construct_blueprint(datastore: ChangeDetectionStore):
from .restore import construct_restore_blueprint
backups_blueprint = Blueprint('backups', __name__, template_folder="templates")
backups_blueprint.register_blueprint(construct_restore_blueprint(datastore))
backup_threads = []
@login_optionally_required
@backups_blueprint.route("/request-backup", methods=['GET'])
def request_backup():
if any(thread.is_alive() for thread in backup_threads):
flash("A backup is already running, check back in a few minutes", "error")
return redirect(url_for('backups.index'))
flash(gettext("A backup is already running, check back in a few minutes"), "error")
return redirect(url_for('backups.create'))
if len(find_backups()) > int(os.getenv("MAX_NUMBER_BACKUPS", 100)):
flash("Maximum number of backups reached, please remove some", "error")
return redirect(url_for('backups.index'))
flash(gettext("Maximum number of backups reached, please remove some"), "error")
return redirect(url_for('backups.create'))
# Be sure we're written fresh
datastore.sync_to_json()
zip_thread = threading.Thread(target=create_backup, args=(datastore.datastore_path, datastore.data.get("watching")))
# With immediate persistence, all data is already saved
zip_thread = threading.Thread(
target=create_backup,
args=(datastore.datastore_path, datastore.data.get("watching")),
kwargs={'tags': datastore.data['settings']['application'].get('tags', {})},
daemon=True,
name="BackupCreator"
)
zip_thread.start()
backup_threads.append(zip_thread)
flash("Backup building in background, check back in a few minutes.")
flash(gettext("Backup building in background, check back in a few minutes."))
return redirect(url_for('backups.index'))
return redirect(url_for('backups.create'))
def find_backups():
backup_filepath = os.path.join(datastore.datastore_path, BACKUP_FILENAME_FORMAT.format("*"))
@@ -138,14 +168,14 @@ def construct_blueprint(datastore: ChangeDetectionStore):
return send_from_directory(os.path.abspath(datastore.datastore_path), filename, as_attachment=True)
@login_optionally_required
@backups_blueprint.route("", methods=['GET'])
def index():
@backups_blueprint.route("/", methods=['GET'])
@backups_blueprint.route("/create", methods=['GET'])
def create():
backups = find_backups()
output = render_template("overview.html",
output = render_template("backup_create.html",
available_backups=backups,
backup_running=any(thread.is_alive() for thread in backup_threads)
)
return output
@login_optionally_required
@@ -157,8 +187,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
for backup in backups:
os.unlink(backup)
flash("Backups were deleted.")
flash(gettext("Backups were deleted."))
return redirect(url_for('backups.index'))
return redirect(url_for('backups.create'))
return backups_blueprint
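Reviewer note: the `arcname=os.path.join(f.parts[-2], f.parts[-1])` pattern used for both tag and watch directories stores each file under its parent directory name, so the archive holds `{uuid}/{file}` entries regardless of where the datastore lives on disk. A minimal sketch of that behaviour follows. The helper name `add_entity_dir`, the directory name `example-uuid-1234` and the file contents are made up for illustration, and the `is_file()` guard is an addition that is not in the hunk above.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def add_entity_dir(zip_obj, data_dir):
    """Store every file directly inside data_dir as '<dir name>/<file name>'."""
    for f in Path(data_dir).glob('*'):
        if f.is_file():  # assumption: skip nested directories
            zip_obj.write(f, arcname=os.path.join(f.parts[-2], f.parts[-1]))


with tempfile.TemporaryDirectory() as datastore_path:
    # Hypothetical entity directory named after a made-up UUID
    entity_dir = Path(datastore_path) / 'example-uuid-1234'
    entity_dir.mkdir()
    (entity_dir / 'watch.json').write_text('{"url": "https://example.com"}', encoding='utf-8')

    backup_path = Path(datastore_path) / 'backup.zip'
    with zipfile.ZipFile(backup_path, 'w',
                         compression=zipfile.ZIP_DEFLATED,
                         compresslevel=8) as zip_obj:
        add_entity_dir(zip_obj, entity_dir)

    with zipfile.ZipFile(backup_path) as zip_obj:
        archived_names = zip_obj.namelist()

print(archived_names)
```

This layout is what lets the restore side treat every top-level directory in the extracted archive as one UUID.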


@@ -0,0 +1,208 @@
import io
import json
import os
import shutil
import tempfile
import threading
import zipfile
from flask import Blueprint, render_template, flash, url_for, redirect, request
from flask_babel import gettext, lazy_gettext as _l
from wtforms import Form, BooleanField, SubmitField
from flask_wtf.file import FileField, FileAllowed
from loguru import logger
from changedetectionio.flask_app import login_optionally_required
class RestoreForm(Form):
zip_file = FileField(_l('Backup zip file'), validators=[
FileAllowed(['zip'], _l('Must be a .zip backup file!'))
])
include_groups = BooleanField(_l('Include groups'), default=True)
include_groups_replace_existing = BooleanField(_l('Replace existing groups of the same UUID'), default=True)
include_watches = BooleanField(_l('Include watches'), default=True)
include_watches_replace_existing = BooleanField(_l('Replace existing watches of the same UUID'), default=True)
submit = SubmitField(_l('Restore backup'))
def import_from_zip(zip_stream, datastore, include_groups, include_groups_replace, include_watches, include_watches_replace):
"""
Extract and import watches and groups from a backup zip stream.
Mirrors the store's _load_watches / _load_tags loading pattern:
- UUID dirs with tag.json → Tag.model + tag_obj.commit()
- UUID dirs with watch.json → rehydrate_entity + watch_obj.commit()
Returns a dict with counts: restored_groups, skipped_groups, restored_watches, skipped_watches.
Raises zipfile.BadZipFile if the stream is not a valid zip.
"""
from changedetectionio.model import Tag
restored_groups = 0
skipped_groups = 0
restored_watches = 0
skipped_watches = 0
current_tags = datastore.data['settings']['application'].get('tags', {})
current_watches = datastore.data['watching']
with tempfile.TemporaryDirectory() as tmpdir:
logger.debug(f"Restore: extracting zip to {tmpdir}")
with zipfile.ZipFile(zip_stream, 'r') as zf:
zf.extractall(tmpdir)
logger.debug("Restore: zip extracted, scanning UUID directories")
for entry in os.scandir(tmpdir):
if not entry.is_dir():
continue
uuid = entry.name
tag_json_path = os.path.join(entry.path, 'tag.json')
watch_json_path = os.path.join(entry.path, 'watch.json')
# --- Tags (groups) ---
if include_groups and os.path.exists(tag_json_path):
if uuid in current_tags and not include_groups_replace:
logger.debug(f"Restore: skipping existing group {uuid} (replace not requested)")
skipped_groups += 1
continue
try:
with open(tag_json_path, 'r', encoding='utf-8') as f:
tag_data = json.load(f)
except (json.JSONDecodeError, IOError) as e:
logger.error(f"Restore: failed to read tag.json for {uuid}: {e}")
continue
title = tag_data.get('title', uuid)
logger.debug(f"Restore: importing group '{title}' ({uuid})")
# Mirror _load_tags: set uuid and force processor
tag_data['uuid'] = uuid
tag_data['processor'] = 'restock_diff'
# Copy the UUID directory so data_dir exists for commit()
dst_dir = os.path.join(datastore.datastore_path, uuid)
if os.path.exists(dst_dir):
shutil.rmtree(dst_dir)
shutil.copytree(entry.path, dst_dir)
tag_obj = Tag.model(
datastore_path=datastore.datastore_path,
__datastore=datastore.data,
default=tag_data
)
current_tags[uuid] = tag_obj
tag_obj.commit()
restored_groups += 1
logger.success(f"Restore: group '{title}' ({uuid}) restored")
# --- Watches ---
elif include_watches and os.path.exists(watch_json_path):
if uuid in current_watches and not include_watches_replace:
logger.debug(f"Restore: skipping existing watch {uuid} (replace not requested)")
skipped_watches += 1
continue
try:
with open(watch_json_path, 'r', encoding='utf-8') as f:
watch_data = json.load(f)
except (json.JSONDecodeError, IOError) as e:
logger.error(f"Restore: failed to read watch.json for {uuid}: {e}")
continue
url = watch_data.get('url', uuid)
logger.debug(f"Restore: importing watch '{url}' ({uuid})")
# Copy UUID directory first so data_dir and history files exist
dst_dir = os.path.join(datastore.datastore_path, uuid)
if os.path.exists(dst_dir):
shutil.rmtree(dst_dir)
shutil.copytree(entry.path, dst_dir)
# Mirror _load_watches / rehydrate_entity
watch_data['uuid'] = uuid
watch_obj = datastore.rehydrate_entity(uuid, watch_data)
current_watches[uuid] = watch_obj
watch_obj.commit()
restored_watches += 1
logger.success(f"Restore: watch '{url}' ({uuid}) restored")
logger.debug(f"Restore: scan complete - groups {restored_groups} restored / {skipped_groups} skipped, "
f"watches {restored_watches} restored / {skipped_watches} skipped")
# Persist changedetection.json (includes the updated tags dict)
logger.debug("Restore: committing datastore settings")
datastore.commit()
return {
'restored_groups': restored_groups,
'skipped_groups': skipped_groups,
'restored_watches': restored_watches,
'skipped_watches': skipped_watches,
}
def construct_restore_blueprint(datastore):
restore_blueprint = Blueprint('restore', __name__, template_folder="templates")
restore_threads = []
@login_optionally_required
@restore_blueprint.route("/restore", methods=['GET'])
def restore():
form = RestoreForm()
return render_template("backup_restore.html",
form=form,
restore_running=any(t.is_alive() for t in restore_threads))
@login_optionally_required
@restore_blueprint.route("/restore/start", methods=['POST'])
def backups_restore_start():
if any(t.is_alive() for t in restore_threads):
flash(gettext("A restore is already running, check back in a few minutes"), "error")
return redirect(url_for('backups.restore.restore'))
zip_file = request.files.get('zip_file')
if not zip_file or not zip_file.filename:
flash(gettext("No file uploaded"), "error")
return redirect(url_for('backups.restore.restore'))
if not zip_file.filename.lower().endswith('.zip'):
flash(gettext("File must be a .zip backup file"), "error")
return redirect(url_for('backups.restore.restore'))
# Read into memory now — the request stream is gone once we return
try:
zip_bytes = io.BytesIO(zip_file.read())
zipfile.ZipFile(zip_bytes) # quick validity check before spawning
zip_bytes.seek(0)
except zipfile.BadZipFile:
flash(gettext("Invalid or corrupted zip file"), "error")
return redirect(url_for('backups.restore.restore'))
include_groups = request.form.get('include_groups') == 'y'
include_groups_replace = request.form.get('include_groups_replace_existing') == 'y'
include_watches = request.form.get('include_watches') == 'y'
include_watches_replace = request.form.get('include_watches_replace_existing') == 'y'
restore_thread = threading.Thread(
target=import_from_zip,
kwargs={
'zip_stream': zip_bytes,
'datastore': datastore,
'include_groups': include_groups,
'include_groups_replace': include_groups_replace,
'include_watches': include_watches,
'include_watches_replace': include_watches_replace,
},
daemon=True,
name="BackupRestore"
)
restore_thread.start()
restore_threads.append(restore_thread)
flash(gettext("Restore started in background, check back in a few minutes."))
return redirect(url_for('backups.restore.restore'))
return restore_blueprint
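Reviewer note: `backups_restore_start` reads the upload into memory and opens it once with `zipfile.ZipFile` before spawning the thread, so a corrupt file is reported to the user immediately instead of failing silently in the background. A minimal, framework-free sketch of that check is shown here. The function name `load_zip_upload` and the sample archive are hypothetical, and no Flask request object is involved.

```python
import io
import zipfile


def load_zip_upload(raw_bytes):
    """Return a rewound BytesIO if raw_bytes is a readable zip, else None."""
    stream = io.BytesIO(raw_bytes)
    try:
        with zipfile.ZipFile(stream):
            pass  # opening parses the central directory; that is the whole check
    except zipfile.BadZipFile:
        return None
    stream.seek(0)  # rewind so the worker thread can read from the start
    return stream


# Build a tiny valid archive in memory to exercise both paths
sample = io.BytesIO()
with zipfile.ZipFile(sample, 'w') as zf:
    zf.writestr('example-uuid-1234/watch.json', '{}')

good = load_zip_upload(sample.getvalue())
bad = load_zip_upload(b'this is not a zip archive')
print(good is not None, bad is None)
```

Closing a `ZipFile` that was given a file object leaves that object open, which is why the `BytesIO` can still be rewound and handed to the thread afterwards.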


@@ -0,0 +1,49 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.html' import render_simple_field, render_field %}
<div class="edit-form">
<div class="tabs collapsable">
<ul>
<li class="tab active" id=""><a href="{{ url_for('backups.create') }}">{{ _('Create') }}</a></li>
<li class="tab"><a href="{{ url_for('backups.restore.restore') }}">{{ _('Restore') }}</a></li>
</ul>
</div>
<div class="box-wrap inner">
<div id="general">
{% if backup_running %}
<p>
<span class="spinner"></span>&nbsp;<strong>{{ _('A backup is running!') }}</strong>
</p>
{% endif %}
<p>
{{ _('Here you can download and request a new backup, when a backup is completed you will see it listed below.') }}
</p>
<br>
{% if available_backups %}
<ul>
{% for backup in available_backups %}
<li>
<a href="{{ url_for('backups.download_backup', filename=backup["filename"]) }}">{{ backup["filename"] }}</a> {{ backup["filesize"] }} {{ _('Mb') }}
</li>
{% endfor %}
</ul>
{% else %}
<p>
<strong>{{ _('No backups found.') }}</strong>
</p>
{% endif %}
<a class="pure-button pure-button-primary"
href="{{ url_for('backups.request_backup') }}">{{ _('Create backup') }}</a>
{% if available_backups %}
<a class="pure-button button-small button-error "
href="{{ url_for('backups.remove_backups') }}">{{ _('Remove backups') }}</a>
{% endif %}
</div>
</div>
</div>
{% endblock %}


@@ -0,0 +1,58 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.html' import render_field, render_checkbox_field %}
<div class="edit-form">
<div class="tabs collapsable">
<ul>
<li class="tab"><a href="{{ url_for('backups.create') }}">{{ _('Create') }}</a></li>
<li class="tab active"><a href="{{ url_for('backups.restore.restore') }}">{{ _('Restore') }}</a></li>
</ul>
</div>
<div class="box-wrap inner">
<div id="general">
{% if restore_running %}
<p>
<span class="spinner"></span>&nbsp;<strong>{{ _('A restore is running!') }}</strong>
</p>
{% endif %}
<p>{{ _('Restore a backup. Must be a .zip backup file created on/after v0.53.1 (new database layout).') }}</p>
<p>{{ _('Note: This does not override the main application settings, only watches and groups.') }}</p>
<form class="pure-form pure-form-stacked settings"
action="{{ url_for('backups.restore.backups_restore_start') }}"
method="POST"
enctype="multipart/form-data">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<div class="pure-control-group">
{{ render_checkbox_field(form.include_groups) }}
<span class="pure-form-message-inline">{{ _('Include all groups found in backup?') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.include_groups_replace_existing) }}
<span class="pure-form-message-inline">{{ _('Replace any existing groups of the same UUID?') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.include_watches) }}
<span class="pure-form-message-inline">{{ _('Include all watches found in backup?') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.include_watches_replace_existing) }}
<span class="pure-form-message-inline">{{ _('Replace any existing watches of the same UUID?') }}</span>
</div>
<div class="pure-control-group">
{{ render_field(form.zip_file) }}
</div>
<div class="pure-controls">
<button type="submit" class="pure-button pure-button-primary">{{ _('Restore backup') }}</button>
</div>
</form>
</div>
</div>
</div>
{% endblock %}


@@ -1,36 +0,0 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.html' import render_simple_field, render_field %}
<div class="edit-form">
<div class="box-wrap inner">
<h4>Backups</h4>
{% if backup_running %}
<p>
<strong>A backup is running!</strong>
</p>
{% endif %}
<p>
Here you can download and request a new backup, when a backup is completed you will see it listed below.
</p>
<br>
{% if available_backups %}
<ul>
{% for backup in available_backups %}
<li><a href="{{ url_for('backups.download_backup', filename=backup["filename"]) }}">{{ backup["filename"] }}</a> {{ backup["filesize"] }} Mb</li>
{% endfor %}
</ul>
{% else %}
<p>
<strong>No backups found.</strong>
</p>
{% endif %}
<a class="pure-button pure-button-primary" href="{{ url_for('backups.request_backup') }}">Create backup</a>
{% if available_backups %}
<a class="pure-button button-small button-error " href="{{ url_for('backups.remove_backups') }}">Remove backups</a>
{% endif %}
</div>
</div>
{% endblock %}


@@ -21,37 +21,160 @@ from changedetectionio.flask_app import login_optionally_required
from loguru import logger
browsersteps_sessions = {}
browsersteps_watch_to_session = {} # Maps watch_uuid -> browsersteps_session_id
io_interface_context = None
import json
import hashlib
from flask import Response
import asyncio
import threading
import time
def run_async_in_browser_loop(coro):
"""Run async coroutine using the existing async worker event loop"""
from changedetectionio import worker_handler
# Use the existing async worker event loop instead of creating a new one
if worker_handler.USE_ASYNC_WORKERS and worker_handler.async_loop and not worker_handler.async_loop.is_closed():
logger.debug("Browser steps using existing async worker event loop")
future = asyncio.run_coroutine_threadsafe(coro, worker_handler.async_loop)
return future.result()
else:
# Fallback: create a new event loop (for sync workers or if async loop not available)
logger.debug("Browser steps creating temporary event loop")
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# Dedicated event loop for ALL browser steps sessions
_browser_steps_loop = None
_browser_steps_thread = None
_browser_steps_loop_lock = threading.Lock()
def _start_browser_steps_loop():
"""Start a dedicated event loop for browser steps in its own thread"""
global _browser_steps_loop
# Create and set the event loop for this thread
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
_browser_steps_loop = loop
logger.debug("Browser steps event loop started")
try:
# Run the loop forever - handles all browsersteps sessions
loop.run_forever()
except Exception as e:
logger.error(f"Browser steps event loop error: {e}")
finally:
try:
return loop.run_until_complete(coro)
# Cancel all remaining tasks
pending = asyncio.all_tasks(loop)
for task in pending:
task.cancel()
# Wait for tasks to finish cancellation
if pending:
loop.run_until_complete(asyncio.gather(*pending, return_exceptions=True))
except Exception as e:
logger.debug(f"Error during browser steps loop cleanup: {e}")
finally:
loop.close()
logger.debug("Browser steps event loop closed")
def _ensure_browser_steps_loop():
"""Ensure the browser steps event loop is running"""
global _browser_steps_loop, _browser_steps_thread
with _browser_steps_loop_lock:
if _browser_steps_thread is None or not _browser_steps_thread.is_alive():
logger.debug("Starting browser steps event loop thread")
_browser_steps_thread = threading.Thread(
target=_start_browser_steps_loop,
daemon=True,
name="BrowserStepsEventLoop"
)
_browser_steps_thread.start()
# Wait for the loop to be ready
timeout = 5.0
start_time = time.time()
while _browser_steps_loop is None:
if time.time() - start_time > timeout:
raise RuntimeError("Browser steps event loop failed to start")
time.sleep(0.01)
logger.debug("Browser steps event loop thread started and ready")
def run_async_in_browser_loop(coro):
"""Run async coroutine using the dedicated browser steps event loop"""
_ensure_browser_steps_loop()
if _browser_steps_loop and not _browser_steps_loop.is_closed():
logger.debug("Browser steps using dedicated event loop")
future = asyncio.run_coroutine_threadsafe(coro, _browser_steps_loop)
return future.result()
else:
raise RuntimeError("Browser steps event loop is not available")
def cleanup_expired_sessions():
"""Remove expired browsersteps sessions and cleanup their resources"""
global browsersteps_sessions, browsersteps_watch_to_session
expired_session_ids = []
# Find expired sessions
for session_id, session_data in browsersteps_sessions.items():
browserstepper = session_data.get('browserstepper')
if browserstepper and browserstepper.has_expired:
expired_session_ids.append(session_id)
# Cleanup expired sessions
for session_id in expired_session_ids:
logger.debug(f"Cleaning up expired browsersteps session {session_id}")
session_data = browsersteps_sessions[session_id]
# Cleanup playwright resources asynchronously
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
run_async_in_browser_loop(browserstepper.cleanup())
except Exception as e:
logger.error(f"Error cleaning up session {session_id}: {e}")
# Remove from sessions dict
del browsersteps_sessions[session_id]
# Remove from watch mapping
for watch_uuid, mapped_session_id in list(browsersteps_watch_to_session.items()):
if mapped_session_id == session_id:
del browsersteps_watch_to_session[watch_uuid]
break
if expired_session_ids:
logger.info(f"Cleaned up {len(expired_session_ids)} expired browsersteps session(s)")
def cleanup_session_for_watch(watch_uuid):
"""Cleanup a specific browsersteps session for a watch UUID"""
global browsersteps_sessions, browsersteps_watch_to_session
session_id = browsersteps_watch_to_session.get(watch_uuid)
if not session_id:
logger.debug(f"No browsersteps session found for watch {watch_uuid}")
return
logger.debug(f"Cleaning up browsersteps session {session_id} for watch {watch_uuid}")
session_data = browsersteps_sessions.get(session_id)
if session_data:
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
run_async_in_browser_loop(browserstepper.cleanup())
except Exception as e:
logger.error(f"Error cleaning up session {session_id} for watch {watch_uuid}: {e}")
# Remove from sessions dict
del browsersteps_sessions[session_id]
# Remove from watch mapping
del browsersteps_watch_to_session[watch_uuid]
logger.debug(f"Cleaned up session for watch {watch_uuid}")
# Opportunistically cleanup any other expired sessions
cleanup_expired_sessions()
def construct_blueprint(datastore: ChangeDetectionStore):
browser_steps_blueprint = Blueprint('browser_steps', __name__, template_folder="templates")
async def start_browsersteps_session(watch_uuid):
from . import browser_steps
from changedetectionio.browser_steps import browser_steps
import time
from playwright.async_api import async_playwright
@@ -115,7 +238,6 @@ def construct_blueprint(datastore: ChangeDetectionStore):
@browser_steps_blueprint.route("/browsersteps_start_session", methods=['GET'])
def browsersteps_start_session():
# A new session was requested, return sessionID
import asyncio
import uuid
browsersteps_session_id = str(uuid.uuid4())
watch_uuid = request.args.get('uuid')
@@ -123,6 +245,9 @@ def construct_blueprint(datastore: ChangeDetectionStore):
if not watch_uuid:
return make_response('No Watch UUID specified', 500)
# Cleanup any existing session for this watch
cleanup_session_for_watch(watch_uuid)
logger.debug("Starting connection with playwright")
logger.debug("browser_steps.py connecting")
@@ -131,6 +256,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
browsersteps_sessions[browsersteps_session_id] = run_async_in_browser_loop(
start_browsersteps_session(watch_uuid)
)
# Store the mapping of watch_uuid -> browsersteps_session_id
browsersteps_watch_to_session[watch_uuid] = browsersteps_session_id
except Exception as e:
if 'ECONNREFUSED' in str(e):
return make_response('Unable to start the Playwright Browser session, is sockpuppetbrowser running? Network configuration is OK?', 401)
@@ -155,8 +284,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
watch = datastore.data['watching'].get(uuid)
filename = f"step_before-{step_n}.jpeg" if request.args.get('type', '') == 'before' else f"step_{step_n}.jpeg"
if step_n and watch and os.path.isfile(os.path.join(watch.watch_data_dir, filename)):
response = make_response(send_from_directory(directory=watch.watch_data_dir, path=filename))
if step_n and watch and os.path.isfile(os.path.join(watch.data_dir, filename)):
response = make_response(send_from_directory(directory=watch.data_dir, path=filename))
response.headers['Content-type'] = 'image/jpeg'
response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
response.headers['Pragma'] = 'no-cache'
@@ -171,11 +300,10 @@ def construct_blueprint(datastore: ChangeDetectionStore):
@browser_steps_blueprint.route("/browsersteps_update", methods=['POST'])
def browsersteps_ui_update():
import base64
import playwright._impl._errors
from changedetectionio.blueprint.browser_steps import browser_steps
remaining =0
remaining = 0
uuid = request.args.get('uuid')
goto_website_url_first_step = request.args.get('goto_website_url_first_step')
browsersteps_session_id = request.args.get('browsersteps_session_id')
@@ -186,33 +314,33 @@ def construct_blueprint(datastore: ChangeDetectionStore):
return make_response('No session exists under that ID', 500)
is_last_step = False
# Actions - step/apply/etc, do the thing and return state
if request.method == 'POST':
# @todo - should always be an existing session
# @todo - should always be an existing session
if goto_website_url_first_step:
logger.debug("Going to site (requested automatically before stepping)..")
step_operation = "Goto site"
step_selector = None
step_optional_value = None
else:
step_operation = request.form.get('operation')
step_selector = request.form.get('selector')
step_optional_value = request.form.get('optional_value')
is_last_step = strtobool(request.form.get('is_last_step'))
try:
# Run the async call_action method in the dedicated browser steps event loop
run_async_in_browser_loop(
browsersteps_sessions[browsersteps_session_id]['browserstepper'].call_action(
action_name=step_operation,
selector=step_selector,
optional_value=step_optional_value
)
try:
# Run the async call_action method in the dedicated browser steps event loop
run_async_in_browser_loop(
browsersteps_sessions[browsersteps_session_id]['browserstepper'].call_action(
action_name=step_operation,
selector=step_selector,
optional_value=step_optional_value
)
)
except Exception as e:
logger.error(f"Exception when calling step operation {step_operation} {str(e)}")
# Try to find something of value to give back to the user
return make_response(str(e).splitlines()[0], 401)
# if not this_session.page:
# cleanup_playwright_session()
# return make_response('Browser session ran out of time :( Please reload this page.', 401)
except Exception as e:
logger.error(f"Exception when calling step operation {step_operation} {str(e)}")
# Try to find something of value to give back to the user
return make_response(str(e).splitlines()[0], 401)
# Screenshots and other info only needed on requesting a step (POST)
try:
@@ -220,7 +348,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
(screenshot, xpath_data) = run_async_in_browser_loop(
browsersteps_sessions[browsersteps_session_id]['browserstepper'].get_current_state()
)
if is_last_step:
watch = datastore.data['watching'].get(uuid)
u = browsersteps_sessions[browsersteps_session_id]['browserstepper'].page.url

View File

@@ -94,13 +94,13 @@ def construct_blueprint(datastore: ChangeDetectionStore):
return results
@login_required
@check_proxies_blueprint.route("/<string:uuid>/status", methods=['GET'])
@check_proxies_blueprint.route("/<uuid_str:uuid>/status", methods=['GET'])
def get_recheck_status(uuid):
results = _recalc_check_status(uuid=uuid)
return results
@login_required
@check_proxies_blueprint.route("/<string:uuid>/start", methods=['GET'])
@check_proxies_blueprint.route("/<uuid_str:uuid>/start", methods=['GET'])
def start_check(uuid):
if not datastore.proxy_list:
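Note on the route change above: the routes move from Flask's built-in `string` converter to a custom `uuid_str` converter whose definition is not part of this diff. The following is only a sketch of the kind of check such a converter could apply, assuming watch UUIDs are written in canonical hyphenated form; the helper name `looks_like_uuid` is hypothetical and the real converter may accept other values.

```python
import uuid


def looks_like_uuid(value: str) -> bool:
    """Return True if `value` is a canonical hyphenated UUID string.

    Hypothetical helper for illustration; not the project's actual converter.
    """
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID() also accepts braces, "urn:uuid:" prefixes and missing hyphens,
    # so compare against the canonical form to reject those variants.
    return str(parsed) == value.lower()


if __name__ == "__main__":
    print(looks_like_uuid("0f6b2f3e-9c1d-4c5a-8f3e-2b7d9a1c4e5f"))
    print(looks_like_uuid("../../etc/passwd"))
```

Rejecting malformed values at the routing layer means handlers such as `get_recheck_status(uuid)` would only ever see strings that at least have the shape of a UUID.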

View File

@@ -1,13 +1,8 @@
from flask import Blueprint, request, redirect, url_for, flash, render_template
from loguru import logger
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio import worker_handler
from changedetectionio.blueprint.imports.importer import (
import_url_list,
import_distill_io_json,
import_xlsx_wachete,
import_xlsx_custom
)
def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData):
import_blueprint = Blueprint('imports', __name__, template_folder="templates")
@@ -17,15 +12,27 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
def import_page():
remaining_urls = []
from changedetectionio import forms
#
if request.method == 'POST':
# from changedetectionio import worker_pool
from changedetectionio.blueprint.imports.importer import (
import_url_list,
import_distill_io_json,
import_xlsx_wachete,
import_xlsx_custom
)
# URL List import
if request.values.get('urls') and len(request.values.get('urls').strip()):
# Import and push into the queue for immediate update check
from changedetectionio import processors
importer_handler = import_url_list()
importer_handler.run(data=request.values.get('urls'), flash=flash, datastore=datastore, processor=request.values.get('processor', 'text_json_diff'))
for uuid in importer_handler.new_uuids:
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
importer_handler.run(data=request.values.get('urls'), flash=flash, datastore=datastore, processor=request.values.get('processor', processors.get_default_processor()))
logger.debug(f"Imported {len(importer_handler.new_uuids)} new UUIDs")
# Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue
# for uuid in importer_handler.new_uuids:
# worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
if len(importer_handler.remaining_data) == 0:
return redirect(url_for('watchlist.index'))
@@ -37,8 +44,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Import and push into the queue for immediate update check
d_importer = import_distill_io_json()
d_importer.run(data=request.values.get('distill-io'), flash=flash, datastore=datastore)
for uuid in d_importer.new_uuids:
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue
# for uuid in importer_handler.new_uuids:
# worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# XLSX importer
if request.files and request.files.get('xlsx_file'):
@@ -60,8 +69,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
w_importer.import_profile = map
w_importer.run(data=file, flash=flash, datastore=datastore)
for uuid in w_importer.new_uuids:
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Don't add to queue because scheduler can see that they haven't been checked and will add them to the queue
# for uuid in importer_handler.new_uuids:
# worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Could be some remaining, or we could be on GET
form = forms.importForm(formdata=request.form if request.method == 'POST' else None)

View File

@@ -2,6 +2,7 @@ from abc import abstractmethod
import time
from wtforms import ValidationError
from loguru import logger
from flask_babel import gettext
from changedetectionio.forms import validate_url
@@ -41,7 +42,7 @@ class import_url_list(Importer):
now = time.time()
if (len(urls) > 5000):
flash("Importing 5,000 of the first URLs from your list, the rest can be imported again.")
flash(gettext("Importing 5,000 of the first URLs from your list, the rest can be imported again."))
for url in urls:
url = url.strip()
@@ -61,7 +62,7 @@ class import_url_list(Importer):
extras = None
if processor:
extras = {'processor': processor}
new_uuid = datastore.add_watch(url=url.strip(), tag=tags, write_to_disk_now=False, extras=extras)
new_uuid = datastore.add_watch(url=url.strip(), tag=tags, save_immediately=False, extras=extras)
if new_uuid:
# Straight into the queue.
@@ -74,7 +75,7 @@ class import_url_list(Importer):
self.remaining_data = []
self.remaining_data.append(url)
flash("{} Imported from list in {:.2f}s, {} Skipped.".format(good, time.time() - now, len(self.remaining_data)))
flash(gettext("{} Imported from list in {:.2f}s, {} Skipped.").format(good, time.time() - now, len(self.remaining_data)))
class import_distill_io_json(Importer):
@@ -94,11 +95,11 @@ class import_distill_io_json(Importer):
try:
data = json.loads(data.strip())
except json.decoder.JSONDecodeError:
flash("Unable to read JSON file, was it broken?", 'error')
flash(gettext("Unable to read JSON file, was it broken?"), 'error')
return
if not data.get('data'):
flash("JSON structure looks invalid, was it broken?", 'error')
flash(gettext("JSON structure looks invalid, was it broken?"), 'error')
return
for d in data.get('data'):
@@ -128,14 +129,14 @@ class import_distill_io_json(Importer):
new_uuid = datastore.add_watch(url=d['uri'].strip(),
tag=",".join(d.get('tags', [])),
extras=extras,
write_to_disk_now=False)
save_immediately=False)
if new_uuid:
# Straight into the queue.
self.new_uuids.append(new_uuid)
good += 1
flash("{} Imported from Distill.io in {:.2f}s, {} Skipped.".format(len(self.new_uuids), time.time() - now, len(self.remaining_data)))
flash(gettext("{} Imported from Distill.io in {:.2f}s, {} Skipped.").format(len(self.new_uuids), time.time() - now, len(self.remaining_data)))
class import_xlsx_wachete(Importer):
@@ -156,7 +157,7 @@ class import_xlsx_wachete(Importer):
wb = load_workbook(data)
except Exception as e:
# @todo correct except
flash("Unable to read export XLSX file, something wrong with the file?", 'error')
flash(gettext("Unable to read export XLSX file, something wrong with the file?"), 'error')
return
row_id = 2
@@ -196,26 +197,25 @@ class import_xlsx_wachete(Importer):
validate_url(data.get('url'))
except ValidationError as e:
logger.error(f">> Import URL error {data.get('url')} {str(e)}")
flash(f"Error processing row number {row_id}, URL value was incorrect, row was skipped.", 'error')
flash(gettext("Error processing row number {}, URL value was incorrect, row was skipped.").format(row_id), 'error')
# Don't bother processing anything else on this row
continue
new_uuid = datastore.add_watch(url=data['url'].strip(),
extras=extras,
tag=data.get('folder'),
write_to_disk_now=False)
save_immediately=False)
if new_uuid:
# Straight into the queue.
self.new_uuids.append(new_uuid)
good += 1
except Exception as e:
logger.error(e)
flash(f"Error processing row number {row_id}, check all cell data types are correct, row was skipped.", 'error')
flash(gettext("Error processing row number {}, check all cell data types are correct, row was skipped.").format(row_id), 'error')
else:
row_id += 1
flash(
"{} imported from Wachete .xlsx in {:.2f}s".format(len(self.new_uuids), time.time() - now))
flash(gettext("{} imported from Wachete .xlsx in {:.2f}s").format(len(self.new_uuids), time.time() - now))
class import_xlsx_custom(Importer):
@@ -236,7 +236,7 @@ class import_xlsx_custom(Importer):
wb = load_workbook(data)
except Exception as e:
# @todo correct except
flash("Unable to read export XLSX file, something wrong with the file?", 'error')
flash(gettext("Unable to read export XLSX file, something wrong with the file?"), 'error')
return
# @todo check at least 2 rows, same in other method
@@ -265,7 +265,7 @@ class import_xlsx_custom(Importer):
validate_url(url)
except ValidationError as e:
logger.error(f">> Import URL error {url} {str(e)}")
flash(f"Error processing row number {row_i}, URL value was incorrect, row was skipped.", 'error')
flash(gettext("Error processing row number {}, URL value was incorrect, row was skipped.").format(row_i), 'error')
# Don't bother processing anything else on this row
url = None
break
@@ -287,16 +287,15 @@ class import_xlsx_custom(Importer):
new_uuid = datastore.add_watch(url=url,
extras=extras,
tag=tags,
write_to_disk_now=False)
save_immediately=False)
if new_uuid:
# Straight into the queue.
self.new_uuids.append(new_uuid)
good += 1
except Exception as e:
logger.error(e)
flash(f"Error processing row number {row_i}, check all cell data types are correct, row was skipped.", 'error')
flash(gettext("Error processing row number {}, check all cell data types are correct, row was skipped.").format(row_i), 'error')
else:
row_i += 1
flash(
"{} imported from custom .xlsx in {:.2f}s".format(len(self.new_uuids), time.time() - now))
flash(gettext("{} imported from custom .xlsx in {:.2f}s").format(len(self.new_uuids), time.time() - now))
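Note on the `flash(...)` changes above: the f-strings are replaced by `gettext("...{}...").format(value)`, so the message id passed to `gettext` stays a constant string. The sketch below illustrates why that ordering matters, using only the standard-library `gettext` module rather than `flask_babel`; the `DictTranslations` class and the German catalog entry are hypothetical stand-ins for a compiled translation catalog.

```python
import gettext as gettext_module


class DictTranslations(gettext_module.NullTranslations):
    """Hypothetical in-memory catalog, standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        # Fall back to the untranslated message id when no entry exists.
        return self._catalog.get(message, message)


MSGID = "Error processing row number {}, URL value was incorrect, row was skipped."

# Assumed translation, for illustration only.
catalog = {MSGID: "Zeile {} wurde wegen fehlerhafter URL ausgelassen."}
_ = DictTranslations(catalog).gettext

row_id = 7

# Constant message id: the catalog lookup succeeds, then the value is filled in.
translated = _(MSGID).format(row_id)

# f-string: the row number is baked in before the lookup, so no entry matches
# and the English text is returned unchanged.
untranslated = _(f"Error processing row number {row_id}, URL value was incorrect, row was skipped.")

if __name__ == "__main__":
    print(translated)
    print(untranslated)
```

The same reasoning applies to string extraction: an extractor can only collect message ids that appear as literal strings in the source.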

View File

@@ -6,9 +6,9 @@
<div class="tabs collapsable">
<ul>
<li class="tab" id=""><a href="#url-list">URL List</a></li>
<li class="tab"><a href="#distill-io">Distill.io</a></li>
<li class="tab"><a href="#xlsx">.XLSX &amp; Wachete</a></li>
<li class="tab" id=""><a href="#url-list">{{ _('URL List') }}</a></li>
<li class="tab"><a href="#distill-io">{{ _('Distill.io') }}</a></li>
<li class="tab"><a href="#xlsx">{{ _('.XLSX & Wachete') }}</a></li>
</ul>
</div>
@@ -16,12 +16,16 @@
<form class="pure-form" action="{{url_for('imports.import_page')}}" method="POST" enctype="multipart/form-data">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<div class="tab-pane-inner" id="url-list">
<p>
{{ _('Restoring changedetection.io backups is in the') }}<a href="{{ url_for('backups.restore.restore') }}"> {{ _('backups section') }}</a>.
<br>
</p>
<div class="pure-control-group">
Enter one URL per line, and optionally add tags for each URL after a space, delineated by comma
(,):
{{ _('Enter one URL per line, and optionally add tags for each URL after a space, delineated by comma (,):') }}
<br>
<p><strong>Example: </strong><code>https://example.com tag1, tag2, last tag</code></p>
URLs which do not pass validation will stay in the textarea.
<p><strong>{{ _('Example:') }} </strong><code>https://example.com tag1, tag2, last tag</code></p>
{{ _('URLs which do not pass validation will stay in the textarea.') }}
</div>
{{ render_field(form.processor, class="processor") }}
@@ -38,20 +42,15 @@
</div>
<div class="tab-pane-inner" id="distill-io">
<div class="pure-control-group">
Copy and Paste your Distill.io watch 'export' file, this should be a JSON file.<br>
This is <i>experimental</i>, supported fields are <code>name</code>, <code>uri</code>, <code>tags</code>, <code>config:selections</code>, the rest (including <code>schedule</code>) are ignored.
{{ _('Copy and Paste your Distill.io watch \'export\' file, this should be a JSON file.') }}<br>
{{ _('This is') }} <i>{{ _('experimental') }}</i>, {{ _('supported fields are') }} <code>name</code>, <code>uri</code>, <code>tags</code>, <code>config:selections</code>, {{ _('the rest (including') }} <code>schedule</code>) {{ _('are ignored.') }}
<br>
<p>
How to export? <a href="https://distill.io/docs/web-monitor/how-export-and-import-monitors/">https://distill.io/docs/web-monitor/how-export-and-import-monitors/</a><br>
Be sure to set your default fetcher to Chrome if required.<br>
{{ _('How to export?') }} <a href="https://distill.io/docs/web-monitor/how-export-and-import-monitors/">https://distill.io/docs/web-monitor/how-export-and-import-monitors/</a><br>
{{ _('Be sure to set your default fetcher to Chrome if required.') }}<br>
</p>
</div>
<textarea name="distill-io" class="pure-input-1-2" style="width: 100%;
font-family:monospace;
white-space: pre;
@@ -89,32 +88,33 @@
</fieldset>
<div class="pure-control-group">
<span class="pure-form-message-inline">
Table of custom column and data types mapping for the <strong>Custom mapping</strong> File mapping type.
{{ _('Table of custom column and data types mapping for the') }} <strong>{{ _('Custom mapping') }}</strong> {{ _('File mapping type.') }}
</span>
<table style="border: 1px solid #aaa; padding: 0.5rem; border-radius: 4px;">
<tr>
<td><strong>Column #</strong></td>
<td><strong>{{ _('Column #') }}</strong></td>
{% for n in range(4) %}
<td><input type="number" name="custom_xlsx[col_{{n}}]" style="width: 4rem;" min="1"></td>
{% endfor %}
</tr>
<tr>
<td><strong>Type</strong></td>
<td><strong>{{ _('Type') }}</strong></td>
{% for n in range(4) %}
<td><select name="custom_xlsx[col_type_{{n}}]">
<option value="" style="color: #aaa"> -- none --</option>
<option value="url">URL</option>
<option value="title">Title</option>
<option value="include_filters">CSS/xPath filter</option>
<option value="tag">Group / Tag name(s)</option>
<option value="interval_minutes">Recheck time (minutes)</option>
<option value="" style="color: #aaa"> -- {{ _('none') }} --</option>
<option value="url">{{ _('URL') }}</option>
<option value="title">{{ _('Title') }}</option>
<option value="include_filters">{{ _('CSS/xPath filter') }}</option>
<option value="tag">{{ _('Group / Tag name(s)') }}</option>
<option value="interval_minutes">{{ _('Recheck time (minutes)') }}</option>
</select></td>
{% endfor %}
</tr>
</table>
</div>
</div>
<button type="submit" class="pure-button pure-input-1-2 pure-button-primary">Import</button>
<button type="submit" class="pure-button pure-input-1-2 pure-button-primary">{{ _('Import') }}</button>
</form>
</div>

View File

@@ -4,7 +4,7 @@ from flask import Blueprint, flash, redirect, url_for
from flask_login import login_required
from changedetectionio.store import ChangeDetectionStore
from changedetectionio import queuedWatchMetaData
from changedetectionio import worker_handler
from changedetectionio import worker_pool
from queue import PriorityQueue
PRICE_DATA_TRACK_ACCEPT = 'accepted'
@@ -15,18 +15,20 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q: PriorityQueue
price_data_follower_blueprint = Blueprint('price_data_follower', __name__)
@login_required
@price_data_follower_blueprint.route("/<string:uuid>/accept", methods=['GET'])
@price_data_follower_blueprint.route("/<uuid_str:uuid>/accept", methods=['GET'])
def accept(uuid):
datastore.data['watching'][uuid]['track_ldjson_price_data'] = PRICE_DATA_TRACK_ACCEPT
datastore.data['watching'][uuid]['processor'] = 'restock_diff'
datastore.data['watching'][uuid].clear_watch()
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
datastore.data['watching'][uuid].commit()
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
return redirect(url_for("watchlist.index"))
@login_required
@price_data_follower_blueprint.route("/<string:uuid>/reject", methods=['GET'])
@price_data_follower_blueprint.route("/<uuid_str:uuid>/reject", methods=['GET'])
def reject(uuid):
datastore.data['watching'][uuid]['track_ldjson_price_data'] = PRICE_DATA_TRACK_REJECT
datastore.data['watching'][uuid].commit()
return redirect(url_for("watchlist.index"))

View File

@@ -81,7 +81,7 @@ def construct_main_feed_routes(rss_blueprint, datastore):
timestamp_from = dates[-2]
guid = generate_watch_guid(watch, timestamp_to)
# Because we are called via whatever web server, flask should figure out the right path
diff_link = {'href': url_for('ui.ui_views.diff_history_page', uuid=watch['uuid'], _external=True)}
diff_link = {'href': url_for('ui.ui_diff.diff_history_page', uuid=watch['uuid'], _external=True)}
# Get template and build notification context
n_body_template = get_rss_template(datastore, watch, rss_content_format,

View File

@@ -9,11 +9,12 @@ def construct_single_watch_routes(rss_blueprint, datastore):
datastore: The ChangeDetectionStore instance
"""
@rss_blueprint.route("/watch/<string:uuid>", methods=['GET'])
@rss_blueprint.route("/watch/<uuid_str:uuid>", methods=['GET'])
def rss_single_watch(uuid):
import time
from flask import make_response, request
from flask import make_response, request, Response
from flask_babel import lazy_gettext as _l
from feedgen.feed import FeedGenerator
from loguru import logger
@@ -37,18 +38,17 @@ def construct_single_watch_routes(rss_blueprint, datastore):
rss_content_format = datastore.data['settings']['application'].get('rss_content_format')
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
# Get the watch by UUID
watch = datastore.data['watching'].get(uuid)
if not watch:
return f"Watch with UUID {uuid} not found", 404
return Response(_l("Watch with UUID %(uuid)s not found", uuid=uuid), status=404, mimetype='text/plain')
# Check if watch has at least 2 history snapshots
dates = list(watch.history.keys())
if len(dates) < 2:
return f"Watch {uuid} does not have enough history snapshots to show changes (need at least 2)", 400
# Add uuid to watch for proper functioning
watch['uuid'] = uuid
return Response(_l("Watch %(uuid)s does not have enough history snapshots to show changes (need at least 2)", uuid=uuid), status=400, mimetype='text/plain')
# Get the number of diffs to include (default: 5)
rss_diff_length = datastore.data['settings']['application'].get('rss_diff_length', 5)
@@ -101,7 +101,7 @@ def construct_single_watch_routes(rss_blueprint, datastore):
date_index_from, date_index_to)
# Create and populate feed entry
guid = f"{watch['uuid']}/{timestamp_to}"
guid = f"{uuid}/{timestamp_to}"
fe = fg.add_entry()
title_suffix = f"Change @ {res['original_context']['change_datetime']}"
populate_feed_entry(fe, watch, res.get('body', ''), guid, timestamp_to,
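Note on the error responses above: they use named `%(uuid)s` placeholders instead of positional `{}` ones. A small sketch of what a named placeholder allows, using plain `%` formatting from the standard library; the reordered string is a hypothetical translation, not taken from the project's catalogs.

```python
# Message id as it appears in the diff.
original = "Watch %(uuid)s does not have enough history snapshots to show changes (need at least 2)"

# Hypothetical translated string that moves the placeholder to the end.
reordered = "Not enough history snapshots to show changes (need at least 2) for watch %(uuid)s"

values = {"uuid": "example-uuid"}

# Both strings are filled from the same mapping, regardless of placeholder position.
original_text = original % values
reordered_text = reordered % values

if __name__ == "__main__":
    print(original_text)
    print(reordered_text)
```

Because the value is looked up by name, a translator can place it wherever the target language requires without any change to the calling code.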

View File

@@ -63,11 +63,8 @@ def construct_tag_routes(rss_blueprint, datastore):
# Only include unviewed watches
if not watch.viewed:
# Add uuid to watch for proper functioning
watch['uuid'] = uuid
# Include a link to the diff page
diff_link = {'href': url_for('ui.ui_views.diff_history_page', uuid=watch['uuid'], _external=True)}
# Include a link to the diff page (use uuid from loop, don't modify watch dict)
diff_link = {'href': url_for('ui.ui_diff.diff_history_page', uuid=uuid, _external=True)}
# Get watch label
watch_label = get_watch_label(datastore, watch)

View File

@@ -1,10 +1,12 @@
import os
from copy import deepcopy
from datetime import datetime
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, available_timezones
import secrets
import time
import flask_login
from flask import Blueprint, render_template, request, redirect, url_for, flash
from flask_babel import gettext
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
@@ -17,6 +19,12 @@ def construct_blueprint(datastore: ChangeDetectionStore):
@login_optionally_required
def settings_page():
from changedetectionio import forms
from changedetectionio.pluggy_interface import (
get_plugin_settings_tabs,
load_plugin_settings,
save_plugin_settings
)
default = deepcopy(datastore.data['settings'])
if datastore.proxy_list is not None:
@@ -54,7 +62,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
# SALTED_PASS means the password is "locked" to what we set in the Env var
if not os.getenv("SALTED_PASS", False):
datastore.remove_password()
flash("Password protection removed.", 'notice')
flash(gettext("Password protection removed."), 'notice')
flask_login.logout_user()
return redirect(url_for('settings.settings_page'))
@@ -67,51 +75,100 @@ def construct_blueprint(datastore: ChangeDetectionStore):
del (app_update['password'])
datastore.data['settings']['application'].update(app_update)
# Handle dynamic worker count adjustment
old_worker_count = datastore.data['settings']['requests'].get('workers', 1)
new_worker_count = form.data['requests'].get('workers', 1)
datastore.data['settings']['requests'].update(form.data['requests'])
datastore.commit()
# Clear all checksums to force reprocessing with new settings
# Global settings can affect watch behavior (filters, rendering, etc.)
datastore.clear_all_last_checksums()
# Adjust worker count if it changed
if new_worker_count != old_worker_count:
from changedetectionio import worker_handler
from changedetectionio import worker_pool
from changedetectionio.flask_app import update_q, notification_q, app, datastore as ds
result = worker_handler.adjust_async_worker_count(
# Check CPU core availability and warn if worker count is high
cpu_count = os.cpu_count()
if cpu_count and new_worker_count >= (cpu_count * 0.9):
flash(gettext("Warning: Worker count ({}) is close to or exceeds available CPU cores ({})").format(
new_worker_count, cpu_count), 'warning')
result = worker_pool.adjust_async_worker_count(
new_count=new_worker_count,
update_q=update_q,
notification_q=notification_q,
app=app,
datastore=ds
)
if result['status'] == 'success':
flash(f"Worker count adjusted: {result['message']}", 'notice')
flash(gettext("Worker count adjusted: {}").format(result['message']), 'notice')
elif result['status'] == 'not_supported':
flash("Dynamic worker adjustment not supported for sync workers", 'warning')
flash(gettext("Dynamic worker adjustment not supported for sync workers"), 'warning')
elif result['status'] == 'error':
flash(f"Error adjusting workers: {result['message']}", 'error')
flash(gettext("Error adjusting workers: {}").format(result['message']), 'error')
if not os.getenv("SALTED_PASS", False) and len(form.application.form.password.encrypted_password):
datastore.data['settings']['application']['password'] = form.application.form.password.encrypted_password
datastore.needs_write_urgent = True
flash("Password protection enabled.", 'notice')
datastore.commit()
flash(gettext("Password protection enabled."), 'notice')
flask_login.logout_user()
return redirect(url_for('watchlist.index'))
datastore.needs_write_urgent = True
flash("Settings updated.")
# Also save plugin settings from the same form submission
plugin_tabs_list = get_plugin_settings_tabs()
for tab in plugin_tabs_list:
plugin_id = tab['plugin_id']
form_class = tab['form_class']
# Instantiate plugin form with POST data
plugin_form = form_class(formdata=request.form)
# Save plugin settings (validation is optional for plugins)
if plugin_form.data:
save_plugin_settings(datastore.datastore_path, plugin_id, plugin_form.data)
flash(gettext("Settings updated."))
else:
flash("An error occurred, please see below.", "error")
flash(gettext("An error occurred, please see below."), "error")
# Convert to ISO 8601 format, all date/time relative events stored as UTC time
utc_time = datetime.now(ZoneInfo("UTC")).isoformat()
# Get active plugins
from changedetectionio.pluggy_interface import get_active_plugins
import sys
active_plugins = get_active_plugins()
python_version = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"
# Calculate uptime in seconds
uptime_seconds = time.time() - datastore.start_time
# Get plugin settings tabs and instantiate forms
plugin_tabs = get_plugin_settings_tabs()
plugin_forms = {}
for tab in plugin_tabs:
plugin_id = tab['plugin_id']
form_class = tab['form_class']
# Load existing settings
settings = load_plugin_settings(datastore.datastore_path, plugin_id)
# Instantiate the form with existing settings
plugin_forms[plugin_id] = form_class(data=settings)
output = render_template("settings.html",
active_plugins=active_plugins,
api_key=datastore.data['settings']['application'].get('api_access_token'),
python_version=python_version,
uptime_seconds=uptime_seconds,
available_timezones=sorted(available_timezones()),
emailprefix=os.getenv('NOTIFICATION_MAIL_BUTTON_PREFIX', False),
extra_notification_token_placeholder_info=datastore.get_unique_notification_token_placeholders_available(),
@@ -121,6 +178,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
settings_application=datastore.data['settings']['application'],
timezone_default_config=datastore.data['settings']['application'].get('scheduler_timezone_default'),
utc_time=utc_time,
plugin_tabs=plugin_tabs,
plugin_forms=plugin_forms,
)
return output
@@ -130,8 +189,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
def settings_reset_api_key():
secret = secrets.token_hex(16)
datastore.data['settings']['application']['api_access_token'] = secret
datastore.needs_write_urgent = True
flash("API Key was regenerated.")
datastore.commit()
flash(gettext("API Key was regenerated."))
return redirect(url_for('settings.settings_page')+'#api')
@settings_blueprint.route("/notification-logs", methods=['GET'])
@@ -142,4 +201,32 @@ def construct_blueprint(datastore: ChangeDetectionStore):
logs=notification_debug_log if len(notification_debug_log) else ["Notification logs are empty - no notifications sent yet."])
return output
@settings_blueprint.route("/toggle-all-paused", methods=['GET'])
@login_optionally_required
def toggle_all_paused():
current_state = datastore.data['settings']['application'].get('all_paused', False)
datastore.data['settings']['application']['all_paused'] = not current_state
datastore.commit()
if datastore.data['settings']['application']['all_paused']:
flash(gettext("Automatic scheduling paused - checks will not be queued."), 'notice')
else:
flash(gettext("Automatic scheduling resumed - checks will be queued normally."), 'notice')
return redirect(url_for('watchlist.index'))
@settings_blueprint.route("/toggle-all-muted", methods=['GET'])
@login_optionally_required
def toggle_all_muted():
current_state = datastore.data['settings']['application'].get('all_muted', False)
datastore.data['settings']['application']['all_muted'] = not current_state
datastore.commit()
if datastore.data['settings']['application']['all_muted']:
flash(gettext("All notifications muted."), 'notice')
else:
flash(gettext("All notifications unmuted."), 'notice')
return redirect(url_for('watchlist.index'))
return settings_blueprint
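Note on the worker-count handling in `settings_page()` above: the new check warns when the requested worker count reaches 90% of `os.cpu_count()`. The sketch below isolates that condition in a standalone function so the threshold is easier to see; the function name `worker_count_warning` is hypothetical, and the real code calls `flash()` with a translated message instead of returning a string.

```python
import os
from typing import Optional


def worker_count_warning(new_worker_count: int, cpu_count: Optional[int]) -> Optional[str]:
    """Return a warning message when the worker count is at or above 90% of the cores.

    Hypothetical helper mirroring the condition in the diff. `cpu_count` may be
    None because os.cpu_count() returns None when the count cannot be determined.
    """
    if cpu_count and new_worker_count >= (cpu_count * 0.9):
        return "Warning: Worker count ({}) is close to or exceeds available CPU cores ({})".format(
            new_worker_count, cpu_count)
    return None


if __name__ == "__main__":
    # With 8 cores the threshold is 7.2, so 7 workers pass silently and 8 trigger the warning.
    print(worker_count_warning(7, 8))
    print(worker_count_warning(8, 8))
    print(worker_count_warning(4, os.cpu_count()))
```

The `cpu_count and ...` guard skips the warning entirely when the core count is unknown, which matches the behaviour of the condition in the diff.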

View File

@@ -4,7 +4,7 @@
<div class="edit-form">
<div class="inner">
<h4 style="margin-top: 0px;">Notification debug log</h4>
<h4 style="margin-top: 0px;">{{ _('Notification debug log') }}</h4>
<div id="notification-error-log">
<ul style="font-size: 80%; margin:0px; padding: 0 0 0 7px">
{% for log in logs|reverse %}

View File

@@ -18,15 +18,22 @@
<div class="edit-form">
<div class="tabs collapsable">
<ul>
<li class="tab" id=""><a href="#general">General</a></li>
<li class="tab"><a href="#notifications">Notifications</a></li>
<li class="tab"><a href="#fetching">Fetching</a></li>
<li class="tab"><a href="#filters">Global Filters</a></li>
<li class="tab"><a href="#ui-options">UI Options</a></li>
<li class="tab"><a href="#api">API</a></li>
<li class="tab"><a href="#rss">RSS</a></li>
<li class="tab"><a href="#timedate">Time &amp Date</a></li>
<li class="tab"><a href="#proxies">CAPTCHA &amp; Proxies</a></li>
<li class="tab" id=""><a href="#general">{{ _('General') }}</a></li>
<li class="tab"><a href="#notifications">{{ _('Notifications') }}</a></li>
<li class="tab"><a href="#fetching">{{ _('Fetching') }}</a></li>
<li class="tab"><a href="#filters">{{ _('Global Filters') }}</a></li>
<li class="tab"><a href="#ui-options">{{ _('UI Options') }}</a></li>
<li class="tab"><a href="#api">{{ _('API') }}</a></li>
<li class="tab"><a href="#rss">{{ _('RSS') }}</a></li>
<li class="tab"><a href="{{ url_for('backups.create') }}">{{ _('Backups') }}</a></li>
<li class="tab"><a href="#timedate">{{ _('Time & Date') }}</a></li>
<li class="tab"><a href="#proxies">{{ _('CAPTCHA & Proxies') }}</a></li>
{% if plugin_tabs %}
{% for tab in plugin_tabs %}
<li class="tab"><a href="#plugin-{{ tab.plugin_id }}">{{ tab.tab_label }}</a></li>
{% endfor %}
{% endif %}
<li class="tab"><a href="#info">{{ _('Info') }}</a></li>
</ul>
</div>
<div class="box-wrap inner">
@@ -36,57 +43,73 @@
<fieldset>
<div class="pure-control-group">
{{ render_field(form.requests.form.time_between_check, class="time-check-widget") }}
<span class="pure-form-message-inline">Default recheck time for all watches, current system minimum is <i>{{min_system_recheck_seconds}}</i> seconds (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Misc-system-settings#enviroment-variables">more info</a>).</span>
<span class="pure-form-message-inline">{{ _('Default recheck time for all watches, current system minimum is') }} <i>{{min_system_recheck_seconds}}</i> {{ _('seconds') }} (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Misc-system-settings#enviroment-variables">{{ _('more info') }}</a>).</span>
<div id="time-between-check-schedule">
<!-- Start Time and End Time -->
<!-- Start Time and End Time {{ timezone_default_config }} -->
<div id="limit-between-time">
{{ render_time_schedule_form(form.requests, available_timezones, timezone_default_config) }}
{{ render_time_schedule_form(form.requests, available_timezones, timezone_default_config) }}
</div>
</div>
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.filter_failure_notification_threshold_attempts, class="filter_failure_notification_threshold_attempts") }}
<span class="pure-form-message-inline">After this many consecutive times that the CSS/xPath filter is missing, send a notification
<span class="pure-form-message-inline">{{ _('After this many consecutive times that the CSS/xPath filter is missing, send a notification') }}
<br>
Set to <strong>0</strong> to disable
{{ _('Set to') }} <strong>0</strong> {{ _('to disable') }}
</span>
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.history_snapshot_max_length, class="history_snapshot_max_length") }}
<span class="pure-form-message-inline">{{ _('Limit collection of history snapshots for each watch to this number of history items.') }}
<br>
{{ _('Set to empty to disable / no limit') }}
</span>
</div>
<div class="pure-control-group">
{% if not hide_remove_pass %}
{% if current_user.is_authenticated %}
{{ render_button(form.application.form.removepassword_button) }}
{% else %}
{{ render_field(form.application.form.password) }}
<span class="pure-form-message-inline">Password protection for your changedetection.io application.</span>
<span class="pure-form-message-inline">{{ _('Password protection for your changedetection.io application.') }}</span>
{% endif %}
{% else %}
<span class="pure-form-message-inline">Password is locked.</span>
<span class="pure-form-message-inline">{{ _('Password is locked.') }}</span>
{% endif %}
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.shared_diff_access, class="shared_diff_access") }}
<span class="pure-form-message-inline">Allow access to the watch change history page when password is enabled (Good for sharing the diff page)
</span>
<span class="pure-form-message-inline">{{ _('Allow access to the watch change history page when password is enabled (Good for sharing the diff page)') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.empty_pages_are_a_change) }}
<span class="pure-form-message-inline">When a request returns no content, or the HTML does not contain any text, is this considered a change?</span>
<span class="pure-form-message-inline">{{ _('When a request returns no content, or the HTML does not contain any text, is this considered a change?') }}</span>
</div>
{% if form.requests.proxy %}
<div>
<br>
<div class="inline-radio">
{{ render_field(form.requests.form.proxy, class="fetch-backend-proxy") }}
<span class="pure-form-message-inline">{{ _('Choose a default proxy for all watches') }}</span>
</div>
</div>
{% endif %}
</fieldset>
</div>
<div class="tab-pane-inner" id="notifications">
<fieldset>
<div class="field-group">
{{ render_common_settings_form(form.application.form, emailprefix, settings_application, extra_notification_token_placeholder_info) }}
</div>
{{ render_common_settings_form(form.application.form, emailprefix, settings_application, extra_notification_token_placeholder_info) }}
</fieldset>
<div class="pure-control-group" id="notification-base-url">
{{ render_field(form.application.form.base_url, class="m-d") }}
<span class="pure-form-message-inline">
Base URL used for the <code>{{ '{{ base_url }}' }}</code> token in notification links.<br>
Default value is the system environment variable '<code>BASE_URL</code>' - <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Configurable-BASE_URL-setting">read more here</a>.
{{ _('Base URL used for the') }} <code>{{ '{{ base_url }}' }}</code> {{ _('token in notification links.') }}<br>
{{ _('Default value is the system environment variable') }} '<code>BASE_URL</code>' - <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Configurable-BASE_URL-setting">{{ _('read more here') }}</a>.
</span>
</div>
</div>
@@ -95,15 +118,15 @@
<div class="pure-control-group inline-radio">
{{ render_field(form.application.form.fetch_backend, class="fetch-backend") }}
<span class="pure-form-message-inline">
<p>Use the <strong>Basic</strong> method (default) where your watched sites don't need Javascript to render.</p>
<p>The <strong>Chrome/Javascript</strong> method requires a network connection to a running WebDriver+Chrome server, set by the ENV var 'WEBDRIVER_URL'. </p>
<p>{{ _('Use the') }} <strong>{{ _('Basic') }}</strong> {{ _('method (default) where your watched sites don\'t need Javascript to render.') }}</p>
<p>{{ _('The') }} <strong>{{ _('Chrome/Javascript') }}</strong> {{ _('method requires a network connection to a running WebDriver+Chrome server, set by the ENV var') }} 'WEBDRIVER_URL'. </p>
</span>
</div>
<fieldset class="pure-group" id="webdriver-override-options" data-visible-for="application-fetch_backend=html_webdriver">
<div class="pure-form-message-inline">
<strong>If you're having trouble waiting for the page to be fully rendered (text missing etc), try increasing the 'wait' time here.</strong>
<strong>{{ _('If you\'re having trouble waiting for the page to be fully rendered (text missing etc), try increasing the \'wait\' time here.') }}</strong>
<br>
This will wait <i>n</i> seconds before extracting the text.
{{ _('This will wait') }} <i>n</i> {{ _('seconds before extracting the text.') }}
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.webdriver_delay) }}
@@ -112,27 +135,27 @@
<div class="pure-control-group">
{{ render_field(form.requests.form.workers) }}
{% set worker_info = get_worker_status_info() %}
<span class="pure-form-message-inline">Number of concurrent workers to process watches. More workers = faster processing but higher memory usage.<br>
Currently running: <strong>{{ worker_info.count }}</strong> operational {{ worker_info.type }} workers{% if worker_info.active_workers > 0 %} ({{ worker_info.active_workers }} actively processing){% endif %}.</span>
<span class="pure-form-message-inline">{{ _('Number of concurrent workers to process watches. More workers = faster processing but higher memory usage.') }}<br>
{{ _('Currently running:') }} <strong>{{ worker_info.count }}</strong> {{ _('operational') }} {{ worker_info.type }} {{ _('workers') }}{% if worker_info.active_workers > 0 %} ({{ worker_info.active_workers }} {{ _('actively processing') }}){% endif %}.</span>
</div>
<div class="pure-control-group">
{{ render_field(form.requests.form.jitter_seconds, class="jitter_seconds") }}
<span class="pure-form-message-inline">Example - 3 seconds random jitter could trigger up to 3 seconds earlier or up to 3 seconds later</span>
<span class="pure-form-message-inline">{{ _('Example - 3 seconds random jitter could trigger up to 3 seconds earlier or up to 3 seconds later') }}</span>
</div>
<div class="pure-control-group">
{{ render_field(form.requests.form.timeout) }}
<span class="pure-form-message-inline">For regular plain requests (not chrome based), maximum number of seconds until timeout, 1-999.<br>
<span class="pure-form-message-inline">{{ _('For regular plain requests (not chrome based), maximum number of seconds until timeout, 1-999.') }}</span><br>
</div>
<div class="pure-control-group inline-radio">
{{ render_field(form.requests.form.default_ua) }}
<span class="pure-form-message-inline">
Applied to all requests.<br><br>
Note: Simply changing the User-Agent often does not defeat anti-robot technologies, it's important to consider <a href="https://changedetection.io/tutorial/what-are-main-types-anti-robot-mechanisms">all of the ways that the browser is detected</a>.
{{ _('Applied to all requests.') }}<br><br>
{{ _('Note: Simply changing the User-Agent often does not defeat anti-robot technologies, it\'s important to consider') }} <a href="https://changedetection.io/tutorial/what-are-main-types-anti-robot-mechanisms">{{ _('all of the ways that the browser is detected') }}</a>.
</span>
</div>
<div class="pure-control-group">
<br>
Tip: <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">Connect using Bright Data and Oxylabs Proxies, find out more here.</a>
{{ _('Tip:') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">{{ _('Connect using Bright Data and Oxylabs Proxies, find out more here.') }}</a>
</div>
</div>
@@ -141,15 +164,15 @@
<fieldset class="pure-group">
{{ render_checkbox_field(form.application.form.ignore_whitespace) }}
<span class="pure-form-message-inline">Ignore whitespace, tabs and new-lines/line-feeds when considering if a change was detected.<br>
<i>Note:</i> Changing this will change the status of your existing watches, possibly trigger alerts etc.
<span class="pure-form-message-inline">{{ _('Ignore whitespace, tabs and new-lines/line-feeds when considering if a change was detected.') }}<br>
<i>{{ _('Note:') }}</i> {{ _('Changing this will change the status of your existing watches, possibly trigger alerts etc.') }}
</span>
</fieldset>
<fieldset class="pure-group">
{{ render_checkbox_field(form.application.form.render_anchor_tag_content) }}
<span class="pure-form-message-inline">Render anchor tag content, default disabled, when enabled renders links as <code>(link text)[https://somesite.com]</code>
<span class="pure-form-message-inline">{{ _('Render anchor tag content, default disabled, when enabled renders links as') }} <code>(link text)[https://somesite.com]</code>
<br>
<i>Note:</i> Changing this could affect the content of your existing watches, possibly trigger alerts etc.
<i>{{ _('Note:') }}</i> {{ _('Changing this could affect the content of your existing watches, possibly trigger alerts etc.') }}
</span>
</fieldset>
<fieldset class="pure-group">
@@ -160,9 +183,9 @@ nav
//*[contains(text(), 'Advertisement')]") }}
<span class="pure-form-message-inline">
<ul>
<li> Remove HTML element(s) by CSS and XPath selectors before text conversion. </li>
<li> Don't paste HTML here, use only CSS and XPath selectors </li>
<li> Add multiple elements, CSS or XPath selectors per line to ignore multiple parts of the HTML. </li>
<li> {{ _('Remove HTML element(s) by CSS and XPath selectors before text conversion.') }} </li>
<li> {{ _('Don\'t paste HTML here, use only CSS and XPath selectors') }} </li>
<li> {{ _('Add multiple elements, CSS or XPath selectors per line to ignore multiple parts of the HTML.') }} </li>
</ul>
</span>
</fieldset>
@@ -170,50 +193,50 @@ nav
{{ render_field(form.application.form.global_ignore_text, rows=5, placeholder="Some text to ignore in a line
/some.regex\d{2}/ for case-INsensitive regex
") }}
<span class="pure-form-message-inline">Note: This is applied globally in addition to the per-watch rules.</span><br>
<span class="pure-form-message-inline">{{ _('Note: This is applied globally in addition to the per-watch rules.') }}</span><br>
<span class="pure-form-message-inline">
<ul>
<li>Matching text will be <strong>ignored</strong> in the text snapshot (you can still see it but it wont trigger a change)</li>
<li>Note: This is applied globally in addition to the per-watch rules.</li>
<li>Each line processed separately, any line matching will be ignored (removed before creating the checksum)</li>
<li>Regular Expression support, wrap the entire line in forward slash <code>/regex/</code></li>
<li>Changing this will affect the comparison checksum which may trigger an alert</li>
<li>{{ _('Matching text will be') }} <strong>{{ _('ignored') }}</strong> {{ _('in the text snapshot (you can still see it but it wont trigger a change)') }}</li>
<li>{{ _('Note: This is applied globally in addition to the per-watch rules.') }}</li>
<li>{{ _('Each line processed separately, any line matching will be ignored (removed before creating the checksum)') }}</li>
<li>{{ _('Regular Expression support, wrap the entire line in forward slash') }} <code>/regex/</code></li>
<li>{{ _('Changing this will affect the comparison checksum which may trigger an alert') }}</li>
</ul>
</span>
</fieldset>
<fieldset class="pure-group">
{{ render_checkbox_field(form.application.form.strip_ignored_lines) }}
<span class="pure-form-message-inline">Remove any text that appears in the "Ignore text" from the output (otherwise its just ignored for change-detection)<br>
<i>Note:</i> Changing this will change the status of your existing watches, possibly trigger alerts etc.
<span class="pure-form-message-inline">{{ _('Remove any text that appears in the "Ignore text" from the output (otherwise its just ignored for change-detection)') }}<br>
<i>{{ _('Note:') }}</i> {{ _('Changing this will change the status of your existing watches, possibly trigger alerts etc.') }}
</span>
</fieldset>
</div>
<div class="tab-pane-inner" id="api">
<h4>API Access</h4>
<p>Drive your changedetection.io via API, More about <a href="https://changedetection.io/docs/api_v1/index.html">API access and examples here</a>.</p>
<h4>{{ _('API Access') }}</h4>
<p>{{ _('Drive your changedetection.io via API, More about') }} <a href="https://changedetection.io/docs/api_v1/index.html">{{ _('API access and examples here') }}</a>.</p>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.api_access_token_enabled) }}
<div class="pure-form-message-inline">Restrict API access limit by using <code>x-api-key</code> header - required for the Chrome Extension to work</div><br>
<div class="pure-form-message-inline"><br>API Key <span id="api-key">{{api_key}}</span>
<span style="display:none;" id="api-key-copy" >copy</span>
<div class="pure-form-message-inline">{{ _('Restrict API access limit by using') }} <code>x-api-key</code> {{ _('header - required for the Chrome Extension to work') }}</div><br>
<div class="pure-form-message-inline"><br>{{ _('API Key') }} <span id="api-key">{{api_key}}</span>
<span style="display:none;" id="api-key-copy" >{{ _('copy') }}</span>
</div>
</div>
<div class="pure-control-group">
<a href="{{url_for('settings.settings_reset_api_key')}}" class="pure-button button-small button-cancel">Regenerate API key</a>
<a href="{{url_for('settings.settings_reset_api_key')}}" class="pure-button button-small button-cancel">{{ _('Regenerate API key') }}</a>
</div>
<div class="pure-control-group">
<h4>Chrome Extension</h4>
<p>Easily add any web-page to your changedetection.io installation from within Chrome.</p>
<strong>Step 1</strong> Install the extension, <strong>Step 2</strong> Navigate to this page,
<strong>Step 3</strong> Open the extension from the toolbar and click "<i>Sync API Access</i>"
<h4>{{ _('Chrome Extension') }}</h4>
<p>{{ _('Easily add any web-page to your changedetection.io installation from within Chrome.') }}</p>
<strong>{{ _('Step 1') }}</strong> {{ _('Install the extension,') }} <strong>{{ _('Step 2') }}</strong> {{ _('Navigate to this page,') }}
<strong>{{ _('Step 3') }}</strong> {{ _('Open the extension from the toolbar and click') }} "<i>{{ _('Sync API Access') }}</i>"
<p>
<a id="chrome-extension-link"
title="Try our new Chrome Extension!"
title="{{ _('Try our new Chrome Extension!') }}"
href="https://chromewebstore.google.com/detail/changedetectionio-website/kefcfmgmlhmankjmnbijimhofdjekbop">
<img alt="Chrome store icon" src="{{ url_for('static_content', group='images', filename='google-chrome-icon.png') }}" alt="Chrome">
Chrome Webstore
<img alt="{{ _('Chrome store icon') }}" src="{{ url_for('static_content', group='images', filename='google-chrome-icon.png') }}" >
{{ _('Chrome Webstore') }}
</a>
</p>
</div>
@@ -224,20 +247,20 @@ nav
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.rss_diff_length) }}
<span class="pure-form-message-inline">Maximum number of history snapshots to include in the watch specific RSS feed.</span>
<span class="pure-form-message-inline">{{ _('Maximum number of history snapshots to include in the watch specific RSS feed.') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.rss_reader_mode) }}
<span class="pure-form-message-inline">For watching other RSS feeds - When watching RSS/Atom feeds, convert them into clean text for better change detection.</span>
<span class="pure-form-message-inline">{{ _('For watching other RSS feeds - When watching RSS/Atom feeds, convert them into clean text for better change detection.') }}</span>
</div>
<div class="pure-control-group grey-form-border">
<div class="pure-control-group">
{{ render_field(form.application.form.rss_content_format) }}
<span class="pure-form-message-inline">Does your reader support HTML? Set it here</span>
<span class="pure-form-message-inline">{{ _('Does your reader support HTML? Set it here') }}</span>
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.rss_template_type) }}
<span class="pure-form-message-inline">'System default' for the same template for all items, or re-use your "Notification Body" as the template.</span>
<span class="pure-form-message-inline">{{ _('\'System default\' for the same template for all items, or re-use your "Notification Body" as the template.') }}</span>
</div>
<div>
{{ render_field(form.application.form.rss_template_override) }}
@@ -250,38 +273,38 @@ nav
</div>
<div class="tab-pane-inner" id="timedate">
<div class="pure-control-group">
Ensure the settings below are correct, they are used to manage the time schedule for checking your web page watches.
{{ _('Ensure the settings below are correct, they are used to manage the time schedule for checking your web page watches.') }}
</div>
<div class="pure-control-group">
<p><strong>UTC Time &amp Date from Server:</strong> <span id="utc-time" >{{ utc_time }}</span></p>
<p><strong>Local Time &amp Date in Browser:</strong> <span class="local-time" data-utc="{{ utc_time }}"></span></p>
<p>
<p><strong>{{ _('UTC Time & Date from Server:') }}</strong> <span id="utc-time" >{{ utc_time }}</span></p>
<p><strong>{{ _('Local Time & Date in Browser:') }}</strong> <span class="local-time" data-utc="{{ utc_time }}"></span></p>
<div>
{{ render_field(form.application.form.scheduler_timezone_default) }}
<datalist id="timezones" style="display: none;">
{%- for timezone in available_timezones -%}<option value="{{ timezone }}">{{ timezone }}</option>{%- endfor -%}
</datalist>
</p>
</div>
</div>
</div>
<div class="tab-pane-inner" id="ui-options">
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.form.open_diff_in_new_tab, class="open_diff_in_new_tab") }}
<span class="pure-form-message-inline">Enable this setting to open the diff page in a new tab. If disabled, the diff page will open in the current tab.</span>
<span class="pure-form-message-inline">{{ _('Enable this setting to open the diff page in a new tab. If disabled, the diff page will open in the current tab.') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.form.socket_io_enabled, class="socket_io_enabled") }}
<span class="pure-form-message-inline">Realtime UI Updates Enabled - (Restart required if this is changed)</span>
<span class="pure-form-message-inline">{{ _('Realtime UI Updates Enabled - (Restart required if this is changed)') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.form.favicons_enabled, class="") }}
<span class="pure-form-message-inline">Enable or Disable Favicons next to the watch list</span>
<span class="pure-form-message-inline">{{ _('Enable or Disable Favicons next to the watch list') }}</span>
</div>
<div class="pure-control-group">
{{ render_checkbox_field(form.application.form.ui.use_page_title_in_list) }}
</div>
<div class="pure-control-group">
{{ render_field(form.application.form.pager_size) }}
<span class="pure-form-message-inline">Number of items per page in the watch overview list, 0 to disable.</span>
<span class="pure-form-message-inline">{{ _('Number of items per page in the watch overview list, 0 to disable.') }}</span>
</div>
</div>
@@ -329,21 +352,12 @@ nav
</div>
</div>
<p><strong>Tip</strong>: "Residential" and "Mobile" proxy type can be more successfull than "Data Center" for blocked websites.
<p><strong>{{ _('Tip') }}</strong>: {{ _('"Residential" and "Mobile" proxy type can be more successful than "Data Center" for blocked websites.') }}</p>
<div class="pure-control-group" id="extra-proxies-setting">
{{ render_fieldlist_with_inline_errors(form.requests.form.extra_proxies) }}
<span class="pure-form-message-inline">"Name" will be used for selecting the proxy in the Watch Edit settings</span><br>
<span class="pure-form-message-inline">SOCKS5 proxies with authentication are only supported with 'plain requests' fetcher, for other fetchers you should whitelist the IP access instead</span>
{% if form.requests.proxy %}
<div>
<br>
<div class="inline-radio">
{{ render_field(form.requests.form.proxy, class="fetch-backend-proxy") }}
<span class="pure-form-message-inline">Choose a default proxy for all watches</span>
</div>
</div>
{% endif %}
<span class="pure-form-message-inline">{{ _('"Name" will be used for selecting the proxy in the Watch Edit settings') }}</span><br>
<span class="pure-form-message-inline">{{ _('SOCKS5 proxies with authentication are only supported with \'plain requests\' fetcher, for other fetchers you should whitelist the IP access instead') }}</span>
</div>
<div class="pure-control-group" id="extra-browsers-setting">
<p>
@@ -352,13 +366,52 @@ nav
</p>
{{ render_fieldlist_with_inline_errors(form.requests.form.extra_browsers) }}
</div>
</div>
{% if plugin_tabs %}
{% for tab in plugin_tabs %}
<div class="tab-pane-inner" id="plugin-{{ tab.plugin_id }}">
{% set plugin_form = plugin_forms[tab.plugin_id] %}
{% if tab.template_path %}
{# Plugin provides custom template - include it directly (no separate form) #}
{% include tab.template_path with context %}
{% else %}
{# Default form rendering - fields only, no submit button #}
<fieldset>
{% for field in plugin_form %}
{% if field.type != 'CSRFToken' and field.type != 'SubmitField' %}
<div class="pure-control-group">
{% if field.type == 'BooleanField' %}
{{ render_checkbox_field(field) }}
{% else %}
{{ render_field(field) }}
{% endif %}
</div>
{% endif %}
{% endfor %}
</fieldset>
{% endif %}
</div>
{% endfor %}
{% endif %}
<div class="tab-pane-inner" id="info">
<p><strong>{{ _('Uptime:') }}</strong> {{ uptime_seconds|format_duration }}</p>
<p><strong>{{ _('Python version:') }}</strong> {{ python_version }}</p>
<p><strong>{{ _('Plugins active:') }}</strong></p>
{% if active_plugins %}
<ul>
{% for plugin in active_plugins %}
<li><strong>{{ plugin.name }}</strong> - {{ plugin.description }}</li>
{% endfor %}
</ul>
{% else %}
<p>{{ _('No plugins active') }}</p>
{% endif %}
</div>
<div id="actions">
<div class="pure-control-group">
{{ render_button(form.save_button) }}
<a href="{{url_for('watchlist.index')}}" class="pure-button button-cancel">Back</a>
<a href="{{url_for('ui.clear_all_history')}}" class="pure-button button-error">Clear Snapshot History</a>
<a href="{{url_for('watchlist.index')}}" class="pure-button button-cancel">{{ _('Back') }}</a>
<a href="{{url_for('ui.clear_all_history')}}" class="pure-button button-error">{{ _('Clear Snapshot History') }}</a>
</div>
</div>
</form>
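
Several strings in the template above are translated as separate fragments around markup, for example `{{ _('Set to') }} <strong>0</strong> {{ _('to disable') }}`. The sketch below is only an illustration of why a single message with a named placeholder can be easier to translate: it lets the translation reorder the sentence. It uses a toy dictionary catalog rather than Flask-Babel; `FRAGMENTS`, `WHOLE`, `translate` and the German strings are assumptions made up for this example, not part of the changeset.

```python
# Toy catalogs: msgid -> illustrative German translation (hypothetical data).
FRAGMENTS = {"Set to": "Setzen auf", "to disable": "zum Deaktivieren"}
WHOLE = {"Set to %(value)s to disable": "Zum Deaktivieren auf %(value)s setzen"}


def translate(catalog, msgid):
    """Return the translation for msgid, or msgid itself when it is missing."""
    return catalog.get(msgid, msgid)


value = "<strong>0</strong>"

# Fragment style: the word order is fixed by the template, not the translator.
fragmented = (
    f'{translate(FRAGMENTS, "Set to")} {value} {translate(FRAGMENTS, "to disable")}'
)

# Whole-sentence style: the translator decides where the value goes.
whole = translate(WHOLE, "Set to %(value)s to disable") % {"value": value}

print(fragmented)
print(whole)
```

In a real Jinja template with autoescaping, markup passed through a placeholder would also have to be marked safe, which is one practical reason the fragment style may have been chosen here.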


@@ -1,5 +1,7 @@
import threading
from flask import Blueprint, request, render_template, flash, url_for, redirect
from flask_babel import gettext
from loguru import logger
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.flask_app import login_optionally_required
@@ -43,61 +45,103 @@ def construct_blueprint(datastore: ChangeDetectionStore):
        title = request.form.get('name').strip()
        if datastore.tag_exists_by_name(title):
            flash(f'The tag "{title}" already exists', "error")
            flash(gettext('The tag "{}" already exists').format(title), "error")
            return redirect(url_for('tags.tags_overview_page'))
        datastore.add_tag(title)
        flash("Tag added")
        flash(gettext("Tag added"))
        return redirect(url_for('tags.tags_overview_page'))
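
The hunk above replaces an f-string with `gettext(...)` followed by `.format(title)`. A minimal sketch of why the order matters, assuming a toy dictionary in place of a real message catalog: the lookup key must be the constant template, so interpolation has to happen after the lookup. `CATALOG`, `translate` and the German string are hypothetical and exist only for this illustration.

```python
# Toy message catalog: msgid -> translation (illustrative German string).
CATALOG = {
    'The tag "{}" already exists': 'Das Tag "{}" existiert bereits',
}


def translate(msgid):
    """Return the catalog entry for msgid, or msgid itself when it is missing
    (gettext falls back to the original message in the same way)."""
    return CATALOG.get(msgid, msgid)


title = "news"

# Interpolating first produces a key that is not in the catalog.
too_early = translate(f'The tag "{title}" already exists')

# Looking up the constant template first, then formatting, finds the entry.
translated = translate('The tag "{}" already exists').format(title)

print(too_early)
print(translated)
```

A constant template is also what string extraction tools can collect into a catalog, since the interpolated text only exists at runtime.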
    @tags_blueprint.route("/mute/<string:uuid>", methods=['GET'])
    @tags_blueprint.route("/mute/<uuid_str:uuid>", methods=['GET'])
    @login_optionally_required
    def mute(uuid):
        if datastore.data['settings']['application']['tags'].get(uuid):
            datastore.data['settings']['application']['tags'][uuid]['notification_muted'] = not datastore.data['settings']['application']['tags'][uuid]['notification_muted']
        tag = datastore.data['settings']['application']['tags'].get(uuid)
        if tag:
            tag['notification_muted'] = not tag['notification_muted']
            tag.commit()
        return redirect(url_for('tags.tags_overview_page'))
    @tags_blueprint.route("/delete/<string:uuid>", methods=['GET'])
    @tags_blueprint.route("/delete/<uuid_str:uuid>", methods=['GET'])
    @login_optionally_required
    def delete(uuid):
        removed = 0
        # Delete the tag, and any tag reference
        # Delete the tag from settings immediately
        if datastore.data['settings']['application']['tags'].get(uuid):
            del datastore.data['settings']['application']['tags'][uuid]
        for watch_uuid, watch in datastore.data['watching'].items():
            if watch.get('tags') and uuid in watch['tags']:
                removed += 1
                watch['tags'].remove(uuid)
        # Remove tag from all watches in background thread to avoid blocking
        def remove_tag_background(tag_uuid):
            """Background thread to remove tag from watches - discarded after completion."""
            removed_count = 0
            try:
                for watch_uuid, watch in datastore.data['watching'].items():
                    if watch.get('tags') and tag_uuid in watch['tags']:
                        watch['tags'].remove(tag_uuid)
                        watch.commit()
                        removed_count += 1
                logger.info(f"Background: Tag {tag_uuid} removed from {removed_count} watches")
            except Exception as e:
                logger.error(f"Error removing tag from watches: {e}")
        flash(f"Tag deleted and removed from {removed} watches")
        # Start daemon thread
        threading.Thread(target=remove_tag_background, args=(uuid,), daemon=True).start()
        flash(gettext("Tag deleted, removing from watches in background"))
        return redirect(url_for('tags.tags_overview_page'))
    @tags_blueprint.route("/unlink/<string:uuid>", methods=['GET'])
    @tags_blueprint.route("/unlink/<uuid_str:uuid>", methods=['GET'])
    @login_optionally_required
    def unlink(uuid):
        unlinked = 0
        for watch_uuid, watch in datastore.data['watching'].items():
            if watch.get('tags') and uuid in watch['tags']:
                unlinked += 1
                watch['tags'].remove(uuid)
        # Unlink tag from all watches in background thread to avoid blocking
        def unlink_tag_background(tag_uuid):
            """Background thread to unlink tag from watches - discarded after completion."""
            unlinked_count = 0
            try:
                for watch_uuid, watch in datastore.data['watching'].items():
                    if watch.get('tags') and tag_uuid in watch['tags']:
                        watch['tags'].remove(tag_uuid)
                        watch.commit()
                        unlinked_count += 1
                logger.info(f"Background: Tag {tag_uuid} unlinked from {unlinked_count} watches")
            except Exception as e:
                logger.error(f"Error unlinking tag from watches: {e}")
        flash(f"Tag unlinked removed from {unlinked} watches")
        # Start daemon thread
        threading.Thread(target=unlink_tag_background, args=(uuid,), daemon=True).start()
        flash(gettext("Unlinking tag from watches in background"))
        return redirect(url_for('tags.tags_overview_page'))
    @tags_blueprint.route("/delete_all", methods=['GET'])
    @login_optionally_required
    def delete_all():
        for watch_uuid, watch in datastore.data['watching'].items():
            watch['tags'] = []
        datastore.data['settings']['application']['tags'] = {}
        flash(f"All tags deleted")
        for tag_uuid in list(datastore.data['settings']['application']['tags'].keys()):
            # TagsDict 'del' handler will remove the dir
            del datastore.data['settings']['application']['tags'][tag_uuid]
        # Clear tags from all watches in background thread to avoid blocking
        def clear_all_tags_background():
            """Background thread to clear tags from all watches - discarded after completion."""
            cleared_count = 0
            try:
                for watch_uuid, watch in datastore.data['watching'].items():
                    watch['tags'] = []
                    watch.commit()
                    cleared_count += 1
                logger.info(f"Background: Cleared tags from {cleared_count} watches")
            except Exception as e:
                logger.error(f"Error clearing tags from watches: {e}")
        # Start daemon thread
        threading.Thread(target=clear_all_tags_background, daemon=True).start()
        flash(gettext("All tags deleted, clearing from watches in background"))
        return redirect(url_for('tags.tags_overview_page'))
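
The `delete`, `unlink` and `delete_all` views above all hand the per-watch cleanup to a daemon thread so that the redirect can be returned immediately. A minimal standalone sketch of that pattern follows, assuming plain dictionaries in place of the datastore. The names `watching` and `remove_tag_in_background` are hypothetical, and iterating over a `list(...)` snapshot is an extra precaution that is not in the diff; it guards against the dictionary being resized by another thread during iteration.

```python
import threading

# Hypothetical stand-in for datastore.data['watching']: uuid -> watch dict.
watching = {
    "w1": {"tags": ["t1", "t2"]},
    "w2": {"tags": ["t2"]},
    "w3": {"tags": []},
}


def remove_tag_in_background(watches, tag_uuid):
    """Start a daemon thread that strips tag_uuid from every watch.

    The Thread is returned so that callers and tests can join() it; a web
    view would normally start it and return its response straight away.
    """

    def worker():
        removed_count = 0
        # Iterate over a snapshot so that a concurrent add or delete of a
        # watch cannot raise "dictionary changed size during iteration".
        for watch_uuid, watch in list(watches.items()):
            if tag_uuid in watch.get("tags", []):
                watch["tags"].remove(tag_uuid)
                removed_count += 1
        print(f"Tag {tag_uuid} removed from {removed_count} watches")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    t = remove_tag_in_background(watching, "t2")
    t.join()  # only for this demo; the views in the diff do not wait
    print(watching)
```

One trade-off of this design: daemon threads are stopped abruptly when the interpreter exits, so a cleanup that is still running at shutdown may be left incomplete, and the flash message can no longer report how many watches were changed.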
    @tags_blueprint.route("/edit/<string:uuid>", methods=['GET'])
    @tags_blueprint.route("/edit/<uuid_str:uuid>", methods=['GET'])
    @login_optionally_required
    def form_tag_edit(uuid):
        from changedetectionio.blueprint.tags.form import group_restock_settings_form
@@ -106,7 +150,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
        default = datastore.data['settings']['application']['tags'].get(uuid)
        if not default:
            flash("Tag not found", "error")
            flash(gettext("Tag not found"), "error")
            return redirect(url_for('watchlist.index'))
        form = group_restock_settings_form(
@@ -116,6 +160,21 @@ def construct_blueprint(datastore: ChangeDetectionStore):
            default_system_settings = datastore.data['settings'],
        )
        # Bridge API-stored processor_config_* values into the form's FormField sub-forms.
        # The API stores processor_config_restock_diff in the tag dict; find the matching
        # FormField by checking which one's sub-fields cover the config keys.
        from wtforms.fields.form import FormField as WTFormField
        for key, value in default.items():
            if not key.startswith('processor_config_') or not isinstance(value, dict):
                continue
            for form_field in form:
                if isinstance(form_field, WTFormField) and all(k in form_field.form._fields for k in value):
                    for sub_key, sub_value in value.items():
                        sub_field = form_field.form._fields.get(sub_key)
                        if sub_field is not None:
                            sub_field.data = sub_value
                    break
        template_args = {
            'data': default,
            'form': form,
@@ -159,17 +218,17 @@ def construct_blueprint(datastore: ChangeDetectionStore):
        return output
    @tags_blueprint.route("/edit/<string:uuid>", methods=['POST'])
    @tags_blueprint.route("/edit/<uuid_str:uuid>", methods=['POST'])
    @login_optionally_required
    def form_tag_edit_submit(uuid):
        from changedetectionio.blueprint.tags.form import group_restock_settings_form
        if uuid == 'first':
            uuid = list(datastore.data['settings']['application']['tags'].keys()).pop()
        default = datastore.data['settings']['application']['tags'].get(uuid)
        tag = datastore.data['settings']['application']['tags'].get(uuid)
        form = group_restock_settings_form(formdata=request.form if request.method == 'POST' else None,
                                           data=default,
                                           data=tag,
                                           extra_notification_tokens=datastore.get_unique_notification_tokens_available()
                                           )
        # @todo subclass form so validation works
@@ -178,15 +237,18 @@ def construct_blueprint(datastore: ChangeDetectionStore):
        # flash(','.join(l), 'error')
        # return redirect(url_for('tags.form_tag_edit_submit', uuid=uuid))
        datastore.data['settings']['application']['tags'][uuid].update(form.data)
        datastore.data['settings']['application']['tags'][uuid]['processor'] = 'restock_diff'
        datastore.needs_write_urgent = True
        flash("Updated")
        tag.update(form.data)
        tag['processor'] = 'restock_diff'
        tag.commit()
        # Clear checksums for all watches using this tag to force reprocessing
        # Tag changes affect inherited configuration
        cleared_count = datastore.clear_checksums_for_tag(uuid)
        logger.info(f"Tag {uuid} updated, cleared {cleared_count} watch checksums")
        flash(gettext("Updated"))
        return redirect(url_for('tags.tags_overview_page'))
    @tags_blueprint.route("/delete/<string:uuid>", methods=['GET'])
    def form_tag_delete(uuid):
        return redirect(url_for('tags.tags_overview_page'))
    return tags_blueprint
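
The `processor_config_*` bridging added to `form_tag_edit` picks the sub-form whose field names cover every key of the stored config dict, then copies the values into it. The sketch below reproduces that matching rule with plain classes instead of WTForms, so it can run on its own. `Field`, `SubForm`, `bridge_processor_config` and the field names `threshold`, `enabled`, `extra` and `colour` are hypothetical stand-ins, not names from the project.

```python
class Field:
    """Hypothetical stand-in for a form field; only holds a value."""

    def __init__(self):
        self.data = None


class SubForm:
    """Hypothetical stand-in for a nested sub-form: field name -> Field."""

    def __init__(self, *names):
        self._fields = {name: Field() for name in names}


def bridge_processor_config(tag, sub_forms):
    """Copy each processor_config_* dict into the first sub-form that has
    a field for every key in that dict."""
    for key, value in tag.items():
        if not key.startswith("processor_config_") or not isinstance(value, dict):
            continue
        for sub_form in sub_forms:
            if all(k in sub_form._fields for k in value):
                for sub_key, sub_value in value.items():
                    sub_form._fields[sub_key].data = sub_value
                break  # stop at the first sub-form that covers the keys


tag = {
    "title": "shoes",
    "processor_config_restock_diff": {"threshold": 5, "enabled": True},
}
sub_forms = [SubForm("colour"), SubForm("threshold", "enabled", "extra")]
bridge_processor_config(tag, sub_forms)

print({name: f.data for name, f in sub_forms[1]._fields.items()})
```

Note that an empty config dict satisfies `all(...)` vacuously, so it matches the first sub-form and copies nothing; the same holds for the loop in the diff.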


@@ -24,12 +24,12 @@
<div class="tabs collapsable">
<ul>
<li class="tab" id=""><a href="#general">General</a></li>
<li class="tab"><a href="#filters-and-triggers">Filters &amp; Triggers</a></li>
<li class="tab" id=""><a href="#general">{{ _('General') }}</a></li>
<li class="tab"><a href="#filters-and-triggers">{{ _('Filters & Triggers') }}</a></li>
{% if extra_tab_content %}
<li class="tab"><a href="#extras_tab">{{ extra_tab_content }}</a></li>
{% endif %}
<li class="tab"><a href="#notifications">Notifications</a></li>
<li class="tab"><a href="#notifications">{{ _('Notifications') }}</a></li>
</ul>
</div>
@@ -47,10 +47,10 @@
</div>
<div class="tab-pane-inner" id="filters-and-triggers">
<p>These settings are <strong><i>added</i></strong> to any existing watch configurations.</p>
<p>{{ _('These settings are') }} <strong><i>{{ _('added') }}</i></strong> {{ _('to any existing watch configurations.') }}</p>
{% include "edit/include_subtract.html" %}
<div class="text-filtering border-fieldset">
<h3>Text filtering</h3>
<h3>{{ _('Text filtering') }}</h3>
{% include "edit/text-options.html" %}
</div>
</div>
@@ -70,18 +70,18 @@
<div class="pure-control-group inline-radio">
{{ render_checkbox_field(form.notification_screenshot) }}
<span class="pure-form-message-inline">
<strong>Use with caution!</strong> This will easily fill up your email storage quota or flood other storages.
<strong>{{ _('Use with caution!') }}</strong> {{ _('This will easily fill up your email storage quota or flood other storages.') }}
</span>
</div>
{% endif %}
<div class="field-group" id="notification-field-group">
{% if has_default_notification_urls %}
<div class="inline-warning">
<img class="inline-warning-icon" src="{{url_for('static_content', group='images', filename='notice.svg')}}" alt="Look out!" title="Lookout!" >
There are <a href="{{ url_for('settings.settings_page')}}#notifications">system-wide notification URLs enabled</a>, this form will override notification settings for this watch only &dash; an empty Notification URL list here will still send notifications.
<img class="inline-warning-icon" src="{{url_for('static_content', group='images', filename='notice.svg')}}" alt="{{ _('Look out!') }}" title="{{ _('Lookout!') }}" >
{{ _('There are') }} <a href="{{ url_for('settings.settings_page')}}#notifications">{{ _('system-wide notification URLs enabled') }}</a>, {{ _('this form will override notification settings for this watch only') }} &dash; {{ _('an empty Notification URL list here will still send notifications.') }}
</div>
{% endif %}
<a href="#notifications" id="notification-setting-reset-to-default" class="pure-button button-xsmall" style="right: 20px; top: 20px; position: absolute; background-color: #5f42dd; border-radius: 4px; font-size: 70%; color: #fff">Use system defaults</a>
<a href="#notifications" id="notification-setting-reset-to-default" class="pure-button button-xsmall" style="right: 20px; top: 20px; position: absolute; background-color: #5f42dd; border-radius: 4px; font-size: 70%; color: #fff">{{ _('Use system defaults') }}</a>
{{ render_common_settings_form(form, emailprefix, settings_application, extra_notification_token_placeholder_info) }}
</div>


@@ -2,22 +2,23 @@
{% block content %}
{% from '_helpers.html' import render_simple_field, render_field %}
<script src="{{url_for('static_content', group='js', filename='jquery-3.6.0.min.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='modal.js')}}"></script>
<div class="box">
<form class="pure-form" action="{{ url_for('tags.form_tag_add') }}" method="POST" id="new-watch-form">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}" >
<fieldset>
<legend>Add a new organisational tag</legend>
<legend>{{ _('Add a new organisational tag') }}</legend>
<div id="watch-add-wrapper-zone">
<div>
{{ render_simple_field(form.name, placeholder="Watch group / tag") }}
{{ render_simple_field(form.name, placeholder=_("Watch group / tag")) }}
</div>
<div>
{{ render_simple_field(form.save_button, title="Save" ) }}
{{ render_simple_field(form.save_button, title=_("Save") ) }}
</div>
</div>
<br>
<div style="color: #fff;">Groups allows you to manage filters and notifications for multiple watches under a single organisational tag.</div>
<div style="color: #fff;">{{ _('Groups allows you to manage filters and notifications for multiple watches under a single organisational tag.') }}</div>
</fieldset>
</form>
<!-- @todo maybe some overview matrix, 'tick' with which has notification, filter rules etc -->
@@ -27,8 +28,8 @@
<thead>
<tr>
<th></th>
<th># Watches</th>
<th>Tag / Label name</th>
<th>{{ _('# Watches') }}</th>
<th>{{ _('Tag / Label name') }}</th>
<th></th>
</tr>
</thead>
@@ -38,7 +39,7 @@
--->
{% if not available_tags|length %}
<tr>
<td colspan="3">No website organisational tags/groups configured</td>
<td colspan="3">{{ _('No website organisational tags/groups configured') }}</td>
</tr>
{% endif %}
{% for uuid, tag in available_tags %}
@@ -49,10 +50,25 @@
<td>{{ "{:,}".format(tag_count[uuid]) if uuid in tag_count else 0 }}</td>
<td class="title-col inline"> <a href="{{url_for('watchlist.index', tag=uuid) }}">{{ tag.title }}</a></td>
<td>
<a class="pure-button pure-button-primary" href="{{ url_for('tags.form_tag_edit', uuid=uuid) }}">Edit</a>&nbsp;
<a class="pure-button pure-button-primary" href="{{ url_for('tags.delete', uuid=uuid) }}" title="Deletes and removes tag">Delete</a>
<a class="pure-button pure-button-primary" href="{{ url_for('tags.unlink', uuid=uuid) }}" title="Keep the tag but unlink any watches">Unlink</a>
<a href="{{ url_for('rss.rss_tag_feed', tag_uuid=uuid, token=app_rss_token)}}"><img alt="RSS Feed for this watch" style="padding-left: 1em;" src="{{url_for('static_content', group='images', filename='generic_feed-icon.svg')}}" height="15"></a>
<a class="pure-button pure-button-primary" href="{{ url_for('tags.form_tag_edit', uuid=uuid) }}">{{ _('Edit') }}</a>
<a href="{{ url_for('ui.form_watch_checknow', tag=uuid) }}" class="pure-button pure-button-primary" >{{ _('Recheck') }}</a>
<a class="pure-button button-error"
href="{{ url_for('tags.delete', uuid=uuid) }}"
data-requires-confirm
data-confirm-type="danger"
data-confirm-title="{{ _('Delete Group?') }}"
data-confirm-message="{{ _('<p>Are you sure you want to delete group <strong>%(title)s</strong>?</p><p>This action cannot be undone.</p>', title=tag.title) }}"
data-confirm-button="{{ _('Delete') }}"
title="{{ _('Deletes and removes tag') }}">{{ _('Delete') }}</a>
<a class="pure-button button-warning"
href="{{ url_for('tags.unlink', uuid=uuid) }}"
data-requires-confirm
data-confirm-type="warning"
data-confirm-title="{{ _('Unlink Group?') }}"
data-confirm-message="{{ _('<p>Are you sure you want to unlink all watches from group <strong>%(title)s</strong>?</p><p>The tag will be kept but watches will be removed from it.</p>', title=tag.title) }}"
data-confirm-button="{{ _('Unlink') }}"
title="{{ _('Keep the tag but unlink any watches') }}">{{ _('Unlink') }}</a>
<a href="{{ url_for('rss.rss_tag_feed', tag_uuid=uuid, token=app_rss_token)}}"><img alt="{{ _('RSS Feed for this watch') }}" style="padding-left: 1em;" src="{{url_for('static_content', group='images', filename='generic_feed-icon.svg')}}" height="15"></a>
</td>
</tr>
{% endfor %}
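
The confirm messages in this template pass `title=tag.title` into `_()` with a named `%(title)s` placeholder, which lets a translator move the group title anywhere in the sentence. A minimal Python sketch of that translate-then-interpolate order follows. It uses the standard library's `gettext.NullTranslations` as a stand-in for a real catalog; the `render_confirm` helper and the sample title are hypothetical and not part of the project.

```python
import gettext

# NullTranslations returns the msgid unchanged; a real catalog would
# return a translated template carrying the same %(title)s placeholder.
translations = gettext.NullTranslations()


def render_confirm(title: str) -> str:
    """Look up the message template first, then fill in the placeholder."""
    template = translations.gettext(
        "Are you sure you want to delete group %(title)s?"
    )
    # Interpolation happens after the lookup, so the msgid stays constant
    # no matter which group is being deleted.
    return template % {"title": title}


print(render_confirm("News sites"))
```

Formatting the string before the lookup (for example with an f-string) would produce a different msgid for every title, so no catalog entry could match it.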


@@ -1,13 +1,16 @@
import time
from flask import Blueprint, request, redirect, url_for, flash, render_template, session
import threading
from flask import Blueprint, request, redirect, url_for, flash, render_template, session, current_app
from flask_babel import gettext
from loguru import logger
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.blueprint.ui.edit import construct_blueprint as construct_edit_blueprint
from changedetectionio.blueprint.ui.notification import construct_blueprint as construct_notification_blueprint
from changedetectionio.blueprint.ui.views import construct_blueprint as construct_views_blueprint
from changedetectionio.blueprint.ui import diff, preview
def _handle_operations(op, uuids, datastore, worker_handler, update_q, queuedWatchMetaData, watch_check_update, extra_data=None, emit_flash=True):
def _handle_operations(op, uuids, datastore, worker_pool, update_q, queuedWatchMetaData, watch_check_update, extra_data=None, emit_flash=True):
from flask import request, flash
if op == 'delete':
@@ -15,64 +18,69 @@ def _handle_operations(op, uuids, datastore, worker_handler, update_q, queuedWat
if datastore.data['watching'].get(uuid):
datastore.delete(uuid)
if emit_flash:
flash(f"{len(uuids)} watches deleted")
flash(gettext("{} watches deleted").format(len(uuids)))
elif op == 'pause':
for uuid in uuids:
if datastore.data['watching'].get(uuid):
datastore.data['watching'][uuid]['paused'] = True
datastore.data['watching'][uuid].commit()
if emit_flash:
flash(f"{len(uuids)} watches paused")
flash(gettext("{} watches paused").format(len(uuids)))
elif op == 'unpause':
for uuid in uuids:
if datastore.data['watching'].get(uuid):
datastore.data['watching'][uuid.strip()]['paused'] = False
datastore.data['watching'][uuid].commit()
if emit_flash:
flash(f"{len(uuids)} watches unpaused")
flash(gettext("{} watches unpaused").format(len(uuids)))
elif (op == 'mark-viewed'):
for uuid in uuids:
if datastore.data['watching'].get(uuid):
datastore.set_last_viewed(uuid, int(time.time()))
if emit_flash:
flash(f"{len(uuids)} watches updated")
flash(gettext("{} watches updated").format(len(uuids)))
elif (op == 'mute'):
for uuid in uuids:
if datastore.data['watching'].get(uuid):
datastore.data['watching'][uuid]['notification_muted'] = True
datastore.data['watching'][uuid].commit()
if emit_flash:
flash(f"{len(uuids)} watches muted")
flash(gettext("{} watches muted").format(len(uuids)))
elif (op == 'unmute'):
for uuid in uuids:
if datastore.data['watching'].get(uuid):
datastore.data['watching'][uuid]['notification_muted'] = False
datastore.data['watching'][uuid].commit()
if emit_flash:
flash(f"{len(uuids)} watches un-muted")
flash(gettext("{} watches un-muted").format(len(uuids)))
elif (op == 'recheck'):
for uuid in uuids:
if datastore.data['watching'].get(uuid):
# Recheck and require a full reprocessing
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
if emit_flash:
flash(f"{len(uuids)} watches queued for rechecking")
flash(gettext("{} watches queued for rechecking").format(len(uuids)))
elif (op == 'clear-errors'):
for uuid in uuids:
if datastore.data['watching'].get(uuid):
datastore.data['watching'][uuid]["last_error"] = False
datastore.data['watching'][uuid].commit()
if emit_flash:
flash(f"{len(uuids)} watches errors cleared")
flash(gettext("{} watches errors cleared").format(len(uuids)))
elif (op == 'clear-history'):
for uuid in uuids:
if datastore.data['watching'].get(uuid):
datastore.clear_watch_history(uuid)
if emit_flash:
flash(f"{len(uuids)} watches cleared/reset.")
flash(gettext("{} watches cleared/reset.").format(len(uuids)))
elif (op == 'notification-default'):
from changedetectionio.notification import (
@@ -84,8 +92,9 @@ def _handle_operations(op, uuids, datastore, worker_handler, update_q, queuedWat
datastore.data['watching'][uuid]['notification_body'] = None
datastore.data['watching'][uuid]['notification_urls'] = []
datastore.data['watching'][uuid]['notification_format'] = USE_SYSTEM_DEFAULT_NOTIFICATION_FORMAT_FOR_WATCH
datastore.data['watching'][uuid].commit()
if emit_flash:
flash(f"{len(uuids)} watches set to use default notification settings")
flash(gettext("{} watches set to use default notification settings").format(len(uuids)))
elif (op == 'assign-tag'):
op_extradata = extra_data
@@ -99,14 +108,15 @@ def _handle_operations(op, uuids, datastore, worker_handler, update_q, queuedWat
datastore.data['watching'][uuid]['tags'] = []
datastore.data['watching'][uuid]['tags'].append(tag_uuid)
datastore.data['watching'][uuid].commit()
if emit_flash:
flash(f"{len(uuids)} watches were tagged")
flash(gettext("{} watches were tagged").format(len(uuids)))
if uuids:
for uuid in uuids:
watch_check_update.send(watch_uuid=uuid)
def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handler, queuedWatchMetaData, watch_check_update):
def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool, queuedWatchMetaData, watch_check_update):
ui_blueprint = Blueprint('ui', __name__, template_folder="templates")
# Register the edit blueprint
@@ -121,18 +131,25 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
views_blueprint = construct_views_blueprint(datastore, update_q, queuedWatchMetaData, watch_check_update)
ui_blueprint.register_blueprint(views_blueprint)
# Register diff and preview blueprints
diff_blueprint = diff.construct_blueprint(datastore)
ui_blueprint.register_blueprint(diff_blueprint)
preview_blueprint = preview.construct_blueprint(datastore)
ui_blueprint.register_blueprint(preview_blueprint)
# Import the login decorator
from changedetectionio.auth_decorator import login_optionally_required
@ui_blueprint.route("/clear_history/<string:uuid>", methods=['GET'])
@ui_blueprint.route("/clear_history/<uuid_str:uuid>", methods=['GET'])
@login_optionally_required
def clear_watch_history(uuid):
try:
datastore.clear_watch_history(uuid)
except KeyError:
flash('Watch not found', 'error')
flash(gettext('Watch not found'), 'error')
else:
flash("Cleared snapshot history for watch {}".format(uuid))
flash(gettext("Cleared snapshot history for watch {}").format(uuid))
return redirect(url_for('watchlist.index'))
@ui_blueprint.route("/clear_history", methods=['GET', 'POST'])
@@ -142,11 +159,26 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
confirmtext = request.form.get('confirmtext')
if confirmtext == 'clear':
for uuid in datastore.data['watching'].keys():
datastore.clear_watch_history(uuid)
flash("Cleared snapshot history for all watches")
# Run in background thread to avoid blocking
def clear_history_background():
# Capture UUIDs first to avoid race conditions
watch_uuids = list(datastore.data['watching'].keys())
logger.info(f"Background: Clearing history for {len(watch_uuids)} watches")
for uuid in watch_uuids:
try:
datastore.clear_watch_history(uuid)
except Exception as e:
logger.error(f"Error clearing history for watch {uuid}: {e}")
logger.info("Background: Completed clearing history")
# Start daemon thread
threading.Thread(target=clear_history_background, daemon=True).start()
flash(gettext("History clearing started in background"))
else:
flash('Incorrect confirmation text.', 'error')
flash(gettext('Incorrect confirmation text.'), 'error')
return redirect(url_for('watchlist.index'))
@@ -160,17 +192,37 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
# Save the current newest history as the most recently viewed
with_errors = request.args.get('with_errors') == "1"
tag_limit = request.args.get('tag')
logger.debug(f"Limiting to tag {tag_limit}")
now = int(time.time())
for watch_uuid, watch in datastore.data['watching'].items():
if with_errors and not watch.get('last_error'):
continue
if tag_limit and ( not watch.get('tags') or tag_limit not in watch['tags'] ):
logger.debug(f"Skipping watch {watch_uuid}")
continue
# Mark watches as viewed - use background thread only for large watch counts
def mark_viewed_impl():
"""Mark watches as viewed - can run synchronously or in background thread."""
marked_count = 0
try:
for watch_uuid, watch in datastore.data['watching'].items():
if with_errors and not watch.get('last_error'):
continue
datastore.set_last_viewed(watch_uuid, now)
if tag_limit and (not watch.get('tags') or tag_limit not in watch['tags']):
continue
datastore.set_last_viewed(watch_uuid, now)
marked_count += 1
logger.info(f"Marking complete: {marked_count} watches marked as viewed")
except Exception as e:
logger.error(f"Error marking as viewed: {e}")
# For small watch counts (< 10), run synchronously to avoid race conditions in tests
# For larger counts, use background thread to avoid blocking the UI
watch_count = len(datastore.data['watching'])
if watch_count < 10:
# Run synchronously for small watch counts
mark_viewed_impl()
else:
# Start background thread for large watch counts
thread = threading.Thread(target=mark_viewed_impl, daemon=True)
thread.start()
return redirect(url_for('watchlist.index', tag=tag_limit))
@@ -178,16 +230,16 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
@login_optionally_required
def form_delete():
uuid = request.args.get('uuid')
if uuid != 'all' and not uuid in datastore.data['watching'].keys():
flash('The watch by UUID {} does not exist.'.format(uuid), 'error')
return redirect(url_for('watchlist.index'))
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
if uuid != 'all' and not uuid in datastore.data['watching'].keys():
flash(gettext('The watch by UUID {} does not exist.').format(uuid), 'error')
return redirect(url_for('watchlist.index'))
datastore.delete(uuid)
flash('Deleted.')
flash(gettext('Deleted.'))
return redirect(url_for('watchlist.index'))
@@ -195,16 +247,16 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
@login_optionally_required
def form_clone():
uuid = request.args.get('uuid')
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
new_uuid = datastore.clone(uuid)
if not datastore.data['watching'].get(uuid).get('paused'):
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=5, item={'uuid': new_uuid}))
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=5, item={'uuid': new_uuid}))
flash('Cloned, you are editing the new watch.')
flash(gettext('Cloned, you are editing the new watch.'))
return redirect(url_for("ui.ui_edit.edit_page", uuid=new_uuid))
@@ -216,40 +268,83 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
uuid = request.args.get('uuid')
with_errors = request.args.get('with_errors') == "1"
i = 0
running_uuids = worker_handler.get_running_uuids()
if uuid:
if uuid not in running_uuids:
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
i += 1
# Single watch - check if already queued or running
if worker_pool.is_watch_running(uuid) or uuid in update_q.get_queued_uuids():
flash(gettext("Watch is already queued or being checked."))
else:
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
flash(gettext("Queued 1 watch for rechecking."))
else:
# Recheck all, including muted
# Get most overdue first
# Multiple watches - first count how many need to be queued
watches_to_queue = []
for k in sorted(datastore.data['watching'].items(), key=lambda item: item[1].get('last_checked', 0)):
watch_uuid = k[0]
watch = k[1]
if not watch['paused']:
if watch_uuid not in running_uuids:
if with_errors and not watch.get('last_error'):
continue
if not watch['paused'] and watch_uuid:
if with_errors and not watch.get('last_error'):
continue
if tag != None and tag not in watch['tags']:
continue
watches_to_queue.append(watch_uuid)
if tag != None and tag not in watch['tags']:
continue
# If less than 20 watches, queue synchronously for immediate feedback
if len(watches_to_queue) < 20:
# Get already queued/running UUIDs once (efficient)
queued_uuids = set(update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids())
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
i += 1
# Filter out watches that are already queued or running
watches_to_queue_filtered = []
for watch_uuid in watches_to_queue:
if watch_uuid not in queued_uuids and watch_uuid not in running_uuids:
watches_to_queue_filtered.append(watch_uuid)
if i == 1:
flash("Queued 1 watch for rechecking.")
if i > 1:
flash(f"Queued {i} watches for rechecking.")
if i == 0:
flash("No watches available to recheck.")
# Queue only the filtered watches
for watch_uuid in watches_to_queue_filtered:
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
return redirect(url_for('watchlist.index'))
# Provide feedback about skipped watches
skipped_count = len(watches_to_queue) - len(watches_to_queue_filtered)
if skipped_count > 0:
flash(gettext("Queued {} watches for rechecking ({} already queued or running).").format(
len(watches_to_queue_filtered), skipped_count))
else:
if len(watches_to_queue_filtered) == 1:
flash(gettext("Queued 1 watch for rechecking."))
else:
flash(gettext("Queued {} watches for rechecking.").format(len(watches_to_queue_filtered)))
else:
# 20+ watches - queue in background thread to avoid blocking HTTP response
# Capture queued/running state before background thread
queued_uuids = set(update_q.get_queued_uuids())
running_uuids = set(worker_pool.get_running_uuids())
def queue_watches_background():
"""Background thread to queue watches - discarded after completion."""
try:
queued_count = 0
skipped_count = 0
for watch_uuid in watches_to_queue:
# Check if already queued or running (state captured at start)
if watch_uuid not in queued_uuids and watch_uuid not in running_uuids:
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': watch_uuid}))
queued_count += 1
else:
skipped_count += 1
logger.info(f"Background queueing complete: {queued_count} watches queued, {skipped_count} skipped (already queued/running)")
except Exception as e:
logger.error(f"Error in background queueing: {e}")
# Start background thread and return immediately
thread = threading.Thread(target=queue_watches_background, daemon=True, name="QueueWatches-Background")
thread.start()
# Return immediately with approximate message
flash(gettext("Queueing watches for rechecking in background..."))
return redirect(url_for('watchlist.index', **({'tag': tag} if tag else {})))
@ui_blueprint.route("/form/checkbox-operations", methods=['POST'])
@login_optionally_required
@@ -262,7 +357,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
extra_data=extra_data,
queuedWatchMetaData=queuedWatchMetaData,
uuids=uuids,
worker_handler=worker_handler,
worker_pool=worker_pool,
update_q=update_q,
watch_check_update=watch_check_update,
op=op,
@@ -271,7 +366,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
return redirect(url_for('watchlist.index'))
@ui_blueprint.route("/share-url/<string:uuid>", methods=['GET'])
@ui_blueprint.route("/share-url/<uuid_str:uuid>", methods=['GET'])
@login_optionally_required
def form_share_put_watch(uuid):
"""Given a watch UUID, upload the info and return a share-link
@@ -280,9 +375,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
import json
from copy import deepcopy
# more for testing
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
# copy it into memory so we can trim off what we don't need (history)
watch = deepcopy(datastore.data['watching'].get(uuid))
@@ -318,8 +410,29 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_handle
except Exception as e:
logger.error(f"Error sharing -{str(e)}")
flash(f"Could not share, something went wrong while communicating with the share server - {str(e)}", 'error')
flash(gettext("Could not share, something went wrong while communicating with the share server - {}").format(str(e)), 'error')
return redirect(url_for('watchlist.index'))
@ui_blueprint.route("/language/auto-detect", methods=['GET'])
def delete_locale_language_session_var_if_it_exists():
"""Clear the session locale preference to auto-detect from browser Accept-Language header"""
if 'locale' in session:
session.pop('locale', None)
# Refresh Flask-Babel to clear cached locale
from flask_babel import refresh
refresh()
flash(gettext("Language set to auto-detect from browser"))
# Check if there's a redirect parameter to return to the same page
redirect_url = request.args.get('redirect')
# If redirect is provided and safe, use it
from changedetectionio.is_safe_url import is_safe_url
if redirect_url and is_safe_url(redirect_url, current_app):
return redirect(redirect_url)
# Otherwise redirect to watchlist
return redirect(url_for('watchlist.index'))
return ui_blueprint
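
The recheck route above snapshots the already queued and running UUIDs once, skips those, and only hands the work to a daemon thread when the candidate list reaches 20 entries. A minimal standalone sketch of that pattern follows, using only the standard library. The names `queue_rechecks`, `SYNC_THRESHOLD` and the plain `queue.Queue` are assumptions made for illustration; the project itself goes through `worker_pool.queue_item_async_safe` and its own queue type.

```python
import queue
import threading

SYNC_THRESHOLD = 20  # mirrors the "< 20" cut-off used in the route above


def queue_rechecks(candidates, already_pending, work_q):
    """Queue candidates that are not already pending.

    Returns (thread, queued_count): small batches run inline and report a
    count; large batches run in a daemon thread and report None instead.
    """
    pending = set(already_pending)  # snapshot once, before any queueing

    def enqueue():
        count = 0
        for uuid in candidates:
            if uuid not in pending:
                work_q.put(uuid)
                count += 1
        return count

    if len(candidates) < SYNC_THRESHOLD:
        # Small batch: run inline so the caller can report exact numbers.
        return None, enqueue()

    # Large batch: return immediately and let the thread do the queueing.
    worker = threading.Thread(target=enqueue, daemon=True)
    worker.start()
    return worker, None
```

Because the pending set is captured before the thread starts, a watch that becomes queued while the thread is running can still be queued a second time. The diff accepts the same trade-off ("state captured at start").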


@@ -0,0 +1,305 @@
from flask import Blueprint, request, redirect, url_for, flash, render_template, make_response, send_from_directory
from flask_babel import gettext
import re
import importlib
from loguru import logger
from markupsafe import Markup
from changedetectionio.diff import (
REMOVED_STYLE, ADDED_STYLE, REMOVED_INNER_STYLE, ADDED_INNER_STYLE,
REMOVED_PLACEMARKER_OPEN, REMOVED_PLACEMARKER_CLOSED,
ADDED_PLACEMARKER_OPEN, ADDED_PLACEMARKER_CLOSED,
CHANGED_PLACEMARKER_OPEN, CHANGED_PLACEMARKER_CLOSED,
CHANGED_INTO_PLACEMARKER_OPEN, CHANGED_INTO_PLACEMARKER_CLOSED
)
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
def construct_blueprint(datastore: ChangeDetectionStore):
diff_blueprint = Blueprint('ui_diff', __name__, template_folder="../ui/templates")
@diff_blueprint.app_template_filter('diff_unescape_difference_spans')
def diff_unescape_difference_spans(content):
"""Emulate Jinja2's auto-escape, then selectively unescape our diff spans."""
from markupsafe import escape
if not content:
return Markup('')
# Step 1: Escape everything like Jinja2 would (this makes it XSS-safe)
escaped_content = escape(str(content))
# Step 2: Unescape only our exact diff spans generated by apply_html_color_to_body()
# Pattern matches the exact structure:
# <span style="{STYLE}" role="{ROLE}" aria-label="{LABEL}" title="{TITLE}">
# Unescape outer span opening tags with full attributes (role, aria-label, title)
# Matches removed/added/changed/changed_into spans
result = re.sub(
rf'&lt;span style=&#34;({re.escape(REMOVED_STYLE)}|{re.escape(ADDED_STYLE)})&#34; '
rf'role=&#34;(deletion|insertion|note)&#34; '
rf'aria-label=&#34;([^&]+?)&#34; '
rf'title=&#34;([^&]+?)&#34;&gt;',
r'<span style="\1" role="\2" aria-label="\3" title="\4">',
str(escaped_content),
flags=re.IGNORECASE
)
# Unescape inner span opening tags (without additional attributes)
# This matches the darker background styles for changed parts within lines
result = re.sub(
rf'&lt;span style=&#34;({re.escape(REMOVED_INNER_STYLE)}|{re.escape(ADDED_INNER_STYLE)})&#34;&gt;',
r'<span style="\1">',
result,
flags=re.IGNORECASE
)
# Unescape closing tags (but only as many as we opened)
open_count = result.count('<span style=')
close_count = str(escaped_content).count('&lt;/span&gt;')
# Replace up to the number of spans we opened
for _ in range(min(open_count, close_count)):
result = result.replace('&lt;/span&gt;', '</span>', 1)
return Markup(result)
@diff_blueprint.route("/diff/<uuid_str:uuid>", methods=['GET'])
@login_optionally_required
def diff_history_page(uuid):
"""
Render the history/diff page for a watch.
This route is processor-aware: it delegates rendering to the processor's
difference.py module, allowing different processor types to provide
custom visualizations:
- text_json_diff: Text/HTML diff with syntax highlighting
- restock_diff: Could show price charts and stock history
- image_diff: Could show image comparison slider/overlay
Each processor implements processors/{type}/difference.py::render()
If a processor doesn't have a difference module, falls back to text_json_diff.
"""
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash(gettext("No history found for the specified link, bad link?"), "error")
return redirect(url_for('watchlist.index'))
dates = list(watch.history.keys())
if not dates or len(dates) < 2:
flash(gettext("Not enough history (2 snapshots required) to show difference page for this watch."), "error")
return redirect(url_for('watchlist.index'))
# Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's difference module (works for both built-in and plugin processors)
from changedetectionio.processors import get_processor_submodule
processor_module = get_processor_submodule(processor_name, 'difference')
# Call the processor's render() function
if processor_module and hasattr(processor_module, 'render'):
return processor_module.render(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
render_template=render_template,
flash=flash,
redirect=redirect
)
# Fallback: if processor doesn't have difference module, use text_json_diff as default
from changedetectionio.processors.text_json_diff.difference import render as default_render
return default_render(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
render_template=render_template,
flash=flash,
redirect=redirect
)
@diff_blueprint.route("/diff/<uuid_str:uuid>/extract", methods=['GET'])
@login_optionally_required
def diff_history_page_extract_GET(uuid):
"""
Render the data extraction form for a watch.
This route is processor-aware: it delegates to the processor's
extract.py module, allowing different processor types to provide
custom extraction interfaces.
Each processor implements processors/{type}/extract.py::render_form()
If a processor doesn't have an extract module, falls back to the base processors.extract module.
"""
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash(gettext("No history found for the specified link, bad link?"), "error")
return redirect(url_for('watchlist.index'))
# Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's extract module (works for both built-in and plugin processors)
from changedetectionio.processors import get_processor_submodule
processor_module = get_processor_submodule(processor_name, 'extract')
# Call the processor's render_form() function
if processor_module and hasattr(processor_module, 'render_form'):
return processor_module.render_form(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
render_template=render_template,
flash=flash,
redirect=redirect
)
# Fallback: if processor doesn't have extract module, use base processors.extract as default
from changedetectionio.processors.extract import render_form as default_render_form
return default_render_form(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
render_template=render_template,
flash=flash,
redirect=redirect
)
@diff_blueprint.route("/diff/<uuid_str:uuid>/extract", methods=['POST'])
@login_optionally_required
def diff_history_page_extract_POST(uuid):
"""
Process the data extraction request.
This route is processor-aware: it delegates to the processor's
extract.py module, allowing different processor types to provide
custom extraction logic.
Each processor implements processors/{type}/extract.py::process_extraction()
If a processor doesn't have an extract module, falls back to the base processors.extract module.
"""
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash(gettext("No history found for the specified link, bad link?"), "error")
return redirect(url_for('watchlist.index'))
# Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's extract module (works for both built-in and plugin processors)
from changedetectionio.processors import get_processor_submodule
processor_module = get_processor_submodule(processor_name, 'extract')
# Call the processor's process_extraction() function
if processor_module and hasattr(processor_module, 'process_extraction'):
return processor_module.process_extraction(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
make_response=make_response,
send_from_directory=send_from_directory,
flash=flash,
redirect=redirect
)
# Fallback: if processor doesn't have extract module, use base processors.extract as default
from changedetectionio.processors.extract import process_extraction as default_process_extraction
return default_process_extraction(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
make_response=make_response,
send_from_directory=send_from_directory,
flash=flash,
redirect=redirect
)
@diff_blueprint.route("/diff/<uuid_str:uuid>/processor-asset/<string:asset_name>", methods=['GET'])
@login_optionally_required
def processor_asset(uuid, asset_name):
"""
Serve processor-specific binary assets (images, files, etc.).
This route is processor-aware: it delegates to the processor's
difference.py module, allowing different processor types to serve
custom assets without embedding them as base64 in templates.
This solves memory issues with large binary data (e.g., screenshots)
by streaming them as separate HTTP responses instead of embedding
in the HTML template.
Each processor implements processors/{type}/difference.py::get_asset()
which returns (binary_data, content_type, cache_control_header).
Example URLs:
- /diff/{uuid}/processor-asset/before
- /diff/{uuid}/processor-asset/after
- /diff/{uuid}/processor-asset/rendered_diff
"""
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash(gettext("No history found for the specified link, bad link?"), "error")
return redirect(url_for('watchlist.index'))
# Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's difference module (works for both built-in and plugin processors)
from changedetectionio.processors import get_processor_submodule
processor_module = get_processor_submodule(processor_name, 'difference')
# Call the processor's get_asset() function
if processor_module and hasattr(processor_module, 'get_asset'):
result = processor_module.get_asset(
asset_name=asset_name,
watch=watch,
datastore=datastore,
request=request
)
if result is None:
from flask import abort
abort(404, description=f"Asset '{asset_name}' not found")
binary_data, content_type, cache_control = result
response = make_response(binary_data)
response.headers['Content-Type'] = content_type
if cache_control:
response.headers['Cache-Control'] = cache_control
return response
else:
logger.warning(f"Processor {processor_name} does not implement get_asset()")
from flask import abort
abort(404, description=f"Processor '{processor_name}' does not support assets")
return diff_blueprint
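
A processor-side `get_asset()` that satisfies the contract described in the `processor_asset` docstring above might look like the following sketch. The asset names, the in-memory `_FAKE_ASSETS` store and the cache header value are assumptions made up for illustration. Only the keyword arguments and the `(binary_data, content_type, cache_control_header)` or `None` return shape are taken from the route.

```python
# Hypothetical processors/<type>/difference.py::get_asset() sketch.
# _FAKE_ASSETS stands in for whatever storage a real processor would read from.
_FAKE_ASSETS = {
    "before": b"\x89PNG-before",
    "after": b"\x89PNG-after",
}


def get_asset(asset_name, watch, datastore, request):
    """Return (binary_data, content_type, cache_control_header), or None if unknown."""
    data = _FAKE_ASSETS.get(asset_name)
    if data is None:
        # The route above turns a None result into a 404 response
        return None
    # The cache header value is an assumed example; a falsy value would skip the header
    return data, "image/png", "public, max-age=3600"
```

Returning `None` for unknown names keeps the 404 handling in the route rather than in each processor.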

@@ -1,15 +1,15 @@
import time
from copy import deepcopy
import os
import importlib.resources
from flask import Blueprint, request, redirect, url_for, flash, render_template, make_response, send_from_directory, abort
from flask import Blueprint, request, redirect, url_for, flash, render_template, abort
from flask_babel import gettext
from loguru import logger
from jinja2 import Environment, FileSystemLoader
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio.time_handler import is_within_schedule
from changedetectionio import worker_handler
from changedetectionio import worker_pool
def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData):
edit_blueprint = Blueprint('ui_edit', __name__, template_folder="../ui/templates")
@@ -20,26 +20,25 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
if tag_uuid in watch.get('tags', []) and (tag.get('include_filters') or tag.get('subtractive_selectors')):
return True
@edit_blueprint.route("/edit/<string:uuid>", methods=['GET', 'POST'])
@edit_blueprint.route("/edit/<uuid_str:uuid>", methods=['GET', 'POST'])
@login_optionally_required
# https://stackoverflow.com/questions/42984453/wtforms-populate-form-with-data-if-data-exists
# https://wtforms.readthedocs.io/en/3.0.x/forms/#wtforms.form.Form.populate_obj ?
def edit_page(uuid):
from changedetectionio import forms
from changedetectionio.blueprint.browser_steps.browser_steps import browser_step_ui_config
from changedetectionio.browser_steps.browser_steps import browser_step_ui_config
from changedetectionio import processors
import importlib
# More for testing, possible to return the first/only
if not datastore.data['watching'].keys():
flash("No watches to edit", "error")
return redirect(url_for('watchlist.index'))
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
# More for testing, possible to return the first/only
if not datastore.data['watching'].keys():
flash(gettext("No watches to edit"), "error")
return redirect(url_for('watchlist.index'))
if not uuid in datastore.data['watching']:
flash("No watch with the UUID %s found." % (uuid), "error")
flash(gettext("No watch with the UUID {} found.").format(uuid), "error")
return redirect(url_for('watchlist.index'))
switch_processor = request.args.get('switch_processor')
@@ -47,12 +46,18 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
for p in processors.available_processors():
if p[0] == switch_processor:
datastore.data['watching'][uuid]['processor'] = switch_processor
flash(f"Switched to mode - {p[1]}.")
flash(gettext("Switched to mode - {}.").format(p[1]))
datastore.clear_watch_history(uuid)
redirect(url_for('ui_edit.edit_page', uuid=uuid))
# be sure we update with a copy instead of accidentally editing the live object by reference
default = deepcopy(datastore.data['watching'][uuid])
default = None
while not default:
try:
default = deepcopy(datastore.data['watching'][uuid])
except RuntimeError as e:
# Dictionary changed
continue
# Defaults for proxy choice
if datastore.proxy_list is not None: # When enabled
@@ -66,8 +71,13 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
processor_name = datastore.data['watching'][uuid].get('processor', '')
processor_classes = next((tpl for tpl in processors.find_processors() if tpl[1] == processor_name), None)
if not processor_classes:
flash(f"Cannot load the edit form for processor/plugin '{processor_classes[1]}', plugin missing?", 'error')
return redirect(url_for('watchlist.index'))
flash(gettext("Could not load '{}' processor, processor plugin might be missing. Please select a different processor.").format(processor_name), 'error')
# Fall back to default processor so user can still edit and change processor
processor_classes = next((tpl for tpl in processors.find_processors() if tpl[1] == 'text_json_diff'), None)
if not processor_classes:
# If even text_json_diff is missing, something is very wrong
flash(gettext("Could not load '{}' processor, processor plugin might be missing.").format(processor_name), 'error')
return redirect(url_for('watchlist.index'))
parent_module = processors.get_parent_module(processor_classes[0])
@@ -96,6 +106,39 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
form.datastore = datastore
form.watch = default
# Load processor-specific config from JSON file for GET requests
if request.method == 'GET' and processor_name:
try:
from changedetectionio.processors.base import difference_detection_processor
# Create a processor instance to access config methods
processor_instance = difference_detection_processor(datastore, uuid)
# Use processor name as filename so each processor keeps its own config
config_filename = f'{processor_name}.json'
processor_config = processor_instance.get_extra_watch_config(config_filename)
if processor_config:
from wtforms.fields.form import FormField
# Populate processor-config-* fields from JSON
for config_key, config_value in processor_config.items():
if not isinstance(config_value, dict):
continue
# Try exact API-named field first (e.g., processor_config_restock_diff)
target_field = getattr(form, f'processor_config_{config_key}', None)
# Fallback: find any FormField sub-form whose fields cover config_value keys
if target_field is None:
for form_field in form:
if isinstance(form_field, FormField) and all(k in form_field.form._fields for k in config_value):
target_field = form_field
break
if target_field is not None:
for sub_key, sub_value in config_value.items():
sub_field = target_field.form._fields.get(sub_key)
if sub_field is not None:
sub_field.data = sub_value
logger.debug(f"Loaded processor config from {config_filename}: {sub_key} = {sub_value}")
except Exception as e:
logger.warning(f"Failed to load processor config: {e}")
for p in datastore.extra_browsers:
form.fetch_backend.choices.append(p)
@@ -114,11 +157,6 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
if request.method == 'POST' and form.validate():
# If they changed processor, it makes sense to reset it.
if datastore.data['watching'][uuid].get('processor') != form.data.get('processor'):
datastore.data['watching'][uuid].clear_watch()
flash("Reset watch history due to change of processor")
extra_update_obj = {
'consecutive_filter_failures': 0,
'last_error' : False
@@ -129,7 +167,12 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
extra_update_obj['time_between_check'] = form.time_between_check.data
# Ignore text
# Handle processor-config-* fields separately (save to JSON, not datastore)
# IMPORTANT: These must NOT be saved to url-watches.json, only to the processor-specific JSON file
processor_config_data = processors.extract_processor_config_from_form_data(form.data)
processors.save_processor_config(datastore, uuid, processor_config_data)
# Ignore text
form_ignore_text = form.ignore_text.data
datastore.data['watching'][uuid]['ignore_text'] = form_ignore_text
@@ -167,12 +210,19 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Recast it if need be to right data Watch handler
watch_class = processors.get_custom_watch_obj_for_processor(form.data.get('processor'))
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, default=datastore.data['watching'][uuid])
flash("Updated watch - unpaused!" if request.args.get('unpause_on_save') else "Updated watch.")
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, __datastore=datastore.data, default=datastore.data['watching'][uuid])
# Re #286 - We wait for syncing new data to disk in another thread every 60 seconds
# But in the case something is added we should save straight away
datastore.needs_write_urgent = True
# Save the watch immediately
datastore.data['watching'][uuid].commit()
flash(gettext("Updated watch - unpaused!") if request.args.get('unpause_on_save') else gettext("Updated watch."))
# Cleanup any browsersteps session for this watch
try:
from changedetectionio.blueprint.browser_steps import cleanup_session_for_watch
cleanup_session_for_watch(uuid)
except Exception as e:
logger.debug(f"Error cleaning up browsersteps session: {e}")
# Do not queue on edit if it's not within the time range
@@ -202,17 +252,17 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
#############################
if not datastore.data['watching'][uuid].get('paused') and is_in_schedule:
# Queue the watch for immediate recheck, with a higher priority
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': uuid}))
# Diff page [edit] link should go back to diff page
if request.args.get("next") and request.args.get("next") == 'diff':
return redirect(url_for('ui.ui_views.diff_history_page', uuid=uuid))
return redirect(url_for('ui.ui_diff.diff_history_page', uuid=uuid))
return redirect(url_for('watchlist.index', tag=request.args.get("tag",'')))
else:
if request.method == 'POST' and not form.validate():
flash("An error occurred, please see below.", "error")
flash(gettext("An error occurred, please see below."), "error")
# JQ is difficult to install on windows and must be manually added (outside requirements.txt)
jq_support = True
@@ -223,26 +273,32 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
watch = datastore.data['watching'].get(uuid)
# if system or watch is configured to need a chrome type browser
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
watch_needs_selenium_or_playwright = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
watch_needs_selenium_or_playwright = True
from zoneinfo import available_timezones
# Only works reliably with Playwright
# Import the global plugin system
from changedetectionio.pluggy_interface import collect_ui_edit_stats_extras
from changedetectionio.pluggy_interface import collect_ui_edit_stats_extras, get_fetcher_capabilities
# Get fetcher capabilities instead of hardcoded logic
capabilities = get_fetcher_capabilities(watch, datastore)
# Add processor capabilities from module
capabilities['supports_visual_selector'] = getattr(parent_module, 'supports_visual_selector', False)
capabilities['supports_text_filters_and_triggers'] = getattr(parent_module, 'supports_text_filters_and_triggers', False)
capabilities['supports_text_filters_and_triggers_elements'] = getattr(parent_module, 'supports_text_filters_and_triggers_elements', False)
capabilities['supports_request_type'] = getattr(parent_module, 'supports_request_type', False)
app_rss_token = datastore.data['settings']['application'].get('rss_access_token'),
c = [f"processor-{watch.get('processor')}"]
if worker_pool.is_watch_running(uuid):
c.append('checking-now')
template_args = {
'available_processors': processors.available_processors(),
'available_timezones': sorted(available_timezones()),
'browser_steps_config': browser_step_ui_config,
'emailprefix': os.getenv('NOTIFICATION_MAIL_BUTTON_PREFIX', False),
'extra_classes': 'checking-now' if worker_handler.is_watch_running(uuid) else '',
'extra_classes': ' '.join(c),
'extra_notification_token_placeholder_info': datastore.get_unique_notification_token_placeholders_available(),
'extra_processor_config': form.extra_tab_content(),
'extra_title': f" - Edit - {watch.label}",
@@ -258,15 +314,13 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
'url': url_for('rss.rss_single_watch', uuid=watch['uuid'], token=app_rss_token)
},
'settings_application': datastore.data['settings']['application'],
'system_has_playwright_configured': os.getenv('PLAYWRIGHT_DRIVER_URL'),
'system_has_webdriver_configured': os.getenv('WEBDRIVER_URL'),
'ui_edit_stats_extras': collect_ui_edit_stats_extras(watch),
'visual_selector_data_ready': datastore.visualselector_data_is_ready(watch_uuid=uuid),
'timezone_default_config': datastore.data['settings']['application'].get('scheduler_timezone_default'),
'using_global_webdriver_wait': not default['webdriver_delay'],
'uuid': uuid,
'watch': watch,
'watch_needs_selenium_or_playwright': watch_needs_selenium_or_playwright,
'capabilities': capabilities
}
included_content = None
@@ -286,17 +340,19 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
return output
@edit_blueprint.route("/edit/<string:uuid>/get-html", methods=['GET'])
@edit_blueprint.route("/edit/<uuid_str:uuid>/get-html", methods=['GET'])
@login_optionally_required
def watch_get_latest_html(uuid):
from io import BytesIO
from flask import send_file
import brotli
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
watch = datastore.data['watching'].get(uuid)
if watch and watch.history.keys() and os.path.isdir(watch.watch_data_dir):
if watch and watch.history.keys() and os.path.isdir(watch.data_dir):
latest_filename = list(watch.history.keys())[-1]
html_fname = os.path.join(watch.watch_data_dir, f"{latest_filename}.html.br")
html_fname = os.path.join(watch.data_dir, f"{latest_filename}.html.br")
with open(html_fname, 'rb') as f:
if html_fname.endswith('.br'):
# Read and decompress the Brotli file
@@ -311,12 +367,65 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Return a 500 error
abort(500)
@edit_blueprint.route("/edit/<uuid_str:uuid>/get-data-package", methods=['GET'])
@login_optionally_required
def watch_get_data_package(uuid):
"""Download all data for a single watch as a zip file"""
from io import BytesIO
from flask import send_file
import zipfile
from pathlib import Path
import datetime
watch = datastore.data['watching'].get(uuid)
if not watch:
abort(404)
# Create zip in memory
memory_file = BytesIO()
with zipfile.ZipFile(memory_file, 'w',
compression=zipfile.ZIP_DEFLATED,
compresslevel=8) as zipObj:
# Add the watch's JSON file if it exists
watch_json_path = os.path.join(watch.data_dir, 'watch.json')
if os.path.isfile(watch_json_path):
zipObj.write(watch_json_path,
arcname=os.path.join(uuid, 'watch.json'),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
# Add all files in the watch data directory
if os.path.isdir(watch.data_dir):
for f in Path(watch.data_dir).glob('*'):
if f.is_file() and f.name != 'watch.json': # Skip watch.json since we already added it
zipObj.write(f,
arcname=os.path.join(uuid, f.name),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
# Seek to beginning of file
memory_file.seek(0)
# Generate filename with timestamp
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
filename = f"watch-data-{uuid[:8]}-{timestamp}.zip"
return send_file(memory_file,
as_attachment=True,
download_name=filename,
mimetype='application/zip')
# Ajax callback
@edit_blueprint.route("/edit/<string:uuid>/preview-rendered", methods=['POST'])
@edit_blueprint.route("/edit/<uuid_str:uuid>/preview-rendered", methods=['POST'])
@login_optionally_required
def watch_get_preview_rendered(uuid):
'''For when viewing the "preview" of the rendered text from inside of Edit'''
from flask import jsonify
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
from changedetectionio.processors.text_json_diff import prepare_filter_prevew
result = prepare_filter_prevew(watch_uuid=uuid, form_data=request.form, datastore=datastore)
return jsonify(result)
@@ -340,6 +449,9 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
s = re.sub(r'[0-9]+', r'\\d+', s)
datastore.data["watching"][uuid]['ignore_text'].append('/' + s + '/')
return f"<a href={url_for('ui.ui_views.preview_page', uuid=uuid)}>Click to preview</a>"
# Save the updated ignore_text
datastore.data["watching"][uuid].commit()
return f"<a href={url_for('ui.ui_preview.preview_page', uuid=uuid)}>Click to preview</a>"
return edit_blueprint
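
The `deepcopy` retry in `edit_page` guards against the watch dictionary being mutated by another thread while it is copied. A bounded variant of that idea is sketched below. The helper name `copy_with_retry` and the `max_attempts` limit are assumptions and do not appear in the diff, which loops until the copy is truthy.

```python
from copy import deepcopy


def copy_with_retry(source, key, max_attempts=5):
    """Deep-copy source[key], retrying when the copy fails with RuntimeError.

    Hypothetical helper: the attempt limit is an assumption added so that
    a persistent failure surfaces instead of looping forever.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return deepcopy(source[key])
        except RuntimeError as e:
            # e.g. "dictionary changed size during iteration"
            last_error = e
    raise last_error
```

Bounding the loop trades the guarantee of eventually getting a copy for a visible error when the failure does not go away.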

@@ -108,8 +108,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
prev_snapshot = watch.get_history_snapshot(timestamp=dates[-2])
current_snapshot = watch.get_history_snapshot(timestamp=dates[-1])
n_object.update(set_basic_notification_vars(snapshot_contents=snapshot_contents,
current_snapshot=current_snapshot,
n_object.update(set_basic_notification_vars(current_snapshot=current_snapshot,
prev_snapshot=prev_snapshot,
watch=watch,
triggered_text=trigger_text,
@@ -119,6 +118,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
sent_obj = process_notification(n_object, datastore)
except Exception as e:
logger.error(e)
e_str = str(e)
# Remove this text which is not important and floods the container
e_str = e_str.replace(

@@ -0,0 +1,189 @@
from flask import Blueprint, request, url_for, flash, render_template, redirect
from flask_babel import gettext
import time
from loguru import logger
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio import html_tools
def construct_blueprint(datastore: ChangeDetectionStore):
preview_blueprint = Blueprint('ui_preview', __name__, template_folder="../ui/templates")
@preview_blueprint.route("/preview/<uuid_str:uuid>", methods=['GET'])
@login_optionally_required
def preview_page(uuid):
"""
Render the preview page for a watch.
This route is processor-aware: it delegates rendering to the processor's
preview.py module, allowing different processor types to provide
custom visualizations:
- text_json_diff: Text preview with syntax highlighting
- image_ssim_diff: Image preview with proper rendering
- restock_diff: Could show latest price/stock data
Each processor implements processors/{type}/preview.py::render()
If a processor doesn't have a preview module, falls back to default text preview.
"""
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash(gettext("No history found for the specified link, bad link?"), "error")
return redirect(url_for('watchlist.index'))
# Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's preview module (works for both built-in and plugin processors)
from changedetectionio.processors import get_processor_submodule
processor_module = get_processor_submodule(processor_name, 'preview')
# Call the processor's render() function
if processor_module and hasattr(processor_module, 'render'):
return processor_module.render(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
render_template=render_template,
flash=flash,
redirect=redirect
)
# Fallback: if processor doesn't have preview module, use default text preview
content = []
versions = []
timestamp = None
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
extra_stylesheets = [url_for('static_content', group='styles', filename='diff.css')]
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
triggered_line_numbers = []
ignored_line_numbers = []
blocked_line_numbers = []
if datastore.data['watching'][uuid].history_n == 0 and (watch.get_error_text() or watch.get_error_snapshot()):
flash(gettext("Preview unavailable - No fetch/check completed or triggers not reached"), "error")
else:
# So prepare the latest preview or not
preferred_version = request.args.get('version')
versions = list(watch.history.keys())
timestamp = versions[-1]
if preferred_version and preferred_version in versions:
timestamp = preferred_version
try:
versions = list(watch.history.keys())
content = watch.get_history_snapshot(timestamp=timestamp)
triggered_line_numbers = html_tools.strip_ignore_text(content=content,
wordlist=watch.get('trigger_text'),
mode='line numbers'
)
ignored_line_numbers = html_tools.strip_ignore_text(content=content,
wordlist=watch.get('ignore_text'),
mode='line numbers'
)
blocked_line_numbers = html_tools.strip_ignore_text(content=content,
wordlist=watch.get("text_should_not_be_present"),
mode='line numbers'
)
except Exception as e:
content.append({'line': f"File doesnt exist or unable to read timestamp {timestamp}", 'classes': ''})
from changedetectionio.pluggy_interface import get_fetcher_capabilities
capabilities = get_fetcher_capabilities(watch, datastore)
output = render_template("preview.html",
capabilities=capabilities,
content=content,
current_diff_url=watch['url'],
current_version=timestamp,
extra_stylesheets=extra_stylesheets,
extra_title=f" - Diff - {watch.label} @ {timestamp}",
highlight_ignored_line_numbers=ignored_line_numbers,
highlight_triggered_line_numbers=triggered_line_numbers,
highlight_blocked_line_numbers=blocked_line_numbers,
history_n=watch.history_n,
is_html_webdriver=is_html_webdriver,
last_error=watch['last_error'],
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
screenshot=watch.get_screenshot(),
uuid=uuid,
versions=versions,
watch=watch,
)
return output
@preview_blueprint.route("/preview/<uuid_str:uuid>/processor-asset/<string:asset_name>", methods=['GET'])
@login_optionally_required
def processor_asset(uuid, asset_name):
"""
Serve processor-specific binary assets for preview (images, files, etc.).
This route is processor-aware: it delegates to the processor's
preview.py module, allowing different processor types to serve
custom assets without embedding them as base64 in templates.
This solves memory issues with large binary data by streaming them
as separate HTTP responses instead of embedding in the HTML template.
Each processor implements processors/{type}/preview.py::get_asset()
which returns (binary_data, content_type, cache_control_header).
Example URLs:
- /preview/{uuid}/processor-asset/screenshot?version=123456789
"""
from flask import make_response
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash(gettext("No history found for the specified link, bad link?"), "error")
return redirect(url_for('watchlist.index'))
# Get the processor type for this watch
processor_name = watch.get('processor', 'text_json_diff')
# Try to get the processor's preview module (works for both built-in and plugin processors)
from changedetectionio.processors import get_processor_submodule
processor_module = get_processor_submodule(processor_name, 'preview')
# Call the processor's get_asset() function
if processor_module and hasattr(processor_module, 'get_asset'):
result = processor_module.get_asset(
asset_name=asset_name,
watch=watch,
datastore=datastore,
request=request
)
if result is None:
from flask import abort
abort(404, description=f"Asset '{asset_name}' not found")
binary_data, content_type, cache_control = result
response = make_response(binary_data)
response.headers['Content-Type'] = content_type
if cache_control:
response.headers['Cache-Control'] = cache_control
return response
else:
logger.warning(f"Processor {processor_name} does not implement get_asset()")
from flask import abort
abort(404, description=f"Processor '{processor_name}' does not support assets")
return preview_blueprint
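
The delegation used by `preview_page` (call the processor's `render()` when it exists, otherwise fall back to the default text preview) can be reduced to the sketch below. `render_preview` and the stand-in modules are hypothetical names for illustration. In the diff the module comes from `get_processor_submodule(processor_name, 'preview')`.

```python
from types import SimpleNamespace


def render_preview(processor_module, fallback, **kwargs):
    """Use the processor's render() when available, else the fallback callable."""
    if processor_module and hasattr(processor_module, "render"):
        return processor_module.render(**kwargs)
    return fallback(**kwargs)


def default_text_preview(**kwargs):
    # Stand-in for the built-in text preview path
    return "default preview"


# A stand-in processor module that provides its own render()
custom_module = SimpleNamespace(render=lambda **kwargs: "custom preview")

print(render_preview(custom_module, default_text_preview, uuid="abc"))
print(render_preview(None, default_text_preview, uuid="abc"))
```

The `hasattr` check means a processor without a preview module, or with one that lacks `render()`, takes the same fallback path.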

@@ -9,13 +9,12 @@
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}" >
<fieldset>
<div class="pure-control-group">
This will remove version history (snapshots) for ALL watches, but keep
your list of URLs! <br />
You may like to use the <strong>BACKUP</strong> link first.<br />
{{ _('This will remove version history (snapshots) for ALL watches, but keep your list of URLs!') }} <br />
{{ _('You may like to use the') }} <strong>{{ _('BACKUP') }}</strong> {{ _('link first.') }}<br />
</div>
<br />
<div class="pure-control-group">
<label for="confirmtext">Confirmation text</label>
<label for="confirmtext">{{ _('Confirmation text') }}</label>
<input
type="text"
id="confirmtext"
@@ -25,20 +24,19 @@
size="10"
/>
<span class="pure-form-message-inline"
>Type in the word <strong>clear</strong> to confirm that you
understand.</span
>{{ _('Type in the word') }} <strong>{{ _('clear') }}</strong> {{ _('to confirm that you understand.') }}</span
>
</div>
<br />
<div class="pure-control-group">
<button type="submit" class="pure-button pure-button-primary">
Clear History!
{{ _('Clear History!') }}
</button>
</div>
<br />
<div class="pure-control-group">
<a href="{{url_for('watchlist.index')}}" class="pure-button button-cancel"
>Cancel</a
>{{ _('Cancel') }}</a
>
</div>
</fieldset>

@@ -0,0 +1,12 @@
<ul id="highlightSnippetActions">
<li>
<button class="pure-button pure-button-primary" onclick="diffToJpeg()" title="{{ _('Share diff as image') }}">{{ _('Share as Image') }}</button>
</li>
<li>
<a class="pure-button pure-button-primary" data-mode="exact" href="javascript:void(0);">{{ _('Ignore any lines matching') }}</a>
</li>
<li>
<a class="pure-button pure-button-primary" data-mode="digit-regex" href="javascript:void(0);" >{{ _('Ignore any lines matching excluding digits') }}</a>
</li>
</ul>

@@ -0,0 +1,166 @@
{% extends 'base.html' %}
{% from '_helpers.html' import render_field, render_checkbox_field, render_button %}
{% block content %}
<script>
const screenshot_url="{{url_for('static_content', group='screenshot', filename=uuid)}}";
{% if last_error_screenshot %}
const error_screenshot_url="{{url_for('static_content', group='screenshot', filename=uuid, error_screenshot=1) }}";
{% endif %}
const highlight_submit_ignore_url="{{url_for('ui.ui_edit.highlight_submit_ignore_url', uuid=uuid)}}";
const watch_url= {{watch_a.link|tojson}};
// Initial scroll position: if set, scroll to this line number in #difference on page load
const initialScrollToLineNumber = {{ initial_scroll_line_number|default('null') }};
</script>
<script src="https://cdn.jsdelivr.net/npm/html2canvas@1.4.1/dist/html2canvas.min.js"></script>
<script src="{{url_for('static_content', group='js', filename='plugins.js')}}"></script>
<script src="https://cdn.jsdelivr.net/npm/piexifjs@1.0.6/piexif.min.js"></script>
<script src="{{url_for('static_content', group='js', filename='snippet-to-image.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='diff-overview.js')}}" defer></script>
<div id="settings">
<form class="pure-form " action="{{ url_for("ui.ui_diff.diff_history_page", uuid=uuid) }}" method="GET" id="diff-form">
<fieldset class="diff-fieldset">
{% if versions|length >= 1 %}
<span style="white-space: nowrap;">
<label id="change-from" for="diff-from-version" class="from-to-label">{{ _('From') }}</label>
<select id="diff-from-version" name="from_version" class="needs-localtime">
{%- for version in versions|reverse -%}
<option value="{{ version }}" {% if version== from_version %} selected="" {% endif %}>
{{ version }}{#{% if loop.index == 2 %} (Previous){% endif %}#}
</option>
{%- endfor -%}
</select>
</span>
<span style="white-space: nowrap;">
<label id="change-to" for="diff-to-version" class="from-to-label">{{ _('To') }}</label>
<select id="diff-to-version" name="to_version" class="needs-localtime">
{%- for version in versions|reverse -%}
<option value="{{ version }}" {% if version== to_version %} selected="" {% endif %}>
{{ version }}{#{% if loop.first %} (Current){% endif %}#}
</option>
{%- endfor -%}
</select>
</span>
{#<button type="submit" class="pure-button pure-button-primary reset-margin">Go</button>#}
{% endif %}
</fieldset>
<fieldset id="diff-style">
<span>
<label for="diffWords" class="pure-checkbox">
<input type="radio" name="type" id="diffWords" value="diffWords" {% if diff_prefs.type == 'diffWords' %}checked=""{% endif %}> {{ _('Words') }}</label>
</span>
<span>
<label for="diffLines" class="pure-checkbox">
<input type="radio" name="type" id="diffLines" value="diffLines" {% if diff_prefs.type == 'diffLines' %}checked=""{% endif %}> {{ _('Lines') }}</label>
</span>
<span>
<label for="ignoreWhitespace" class="pure-checkbox" id="label-diff-ignorewhitespace">
<input type="checkbox" id="ignoreWhitespace" name="ignoreWhitespace" {% if diff_prefs.ignoreWhitespace %}checked=""{% endif %}> {{ _('Ignore Whitespace') }}</label>
</span>
<span>
<label for="changesOnly" class="pure-checkbox" id="label-diff-changes">
<input type="checkbox" id="changesOnly" name="changesOnly" {% if diff_prefs.changesOnly %}checked=""{% endif %}> {{ _('Same/non-changed') }}</label>
</span>
<span>
<label for="removed" class="pure-checkbox" id="label-diff-removed">
<input type="checkbox" id="removed" name="removed" {% if diff_prefs.removed %}checked=""{% endif %}> {{ _('Removed') }}</label>
</span>
<span>
<label for="added" class="pure-checkbox" id="label-diff-added">
<input type="checkbox" id="added" name="added" {% if diff_prefs.added %}checked=""{% endif %}> {{ _('Added') }}</label>
</span>
<span>
<label for="replaced" class="pure-checkbox" id="label-diff-replaced">
<input type="checkbox" id="replaced" name="replaced" {% if diff_prefs.replaced %}checked=""{% endif %}> {{ _('Replaced') }}</label>
</span>
</fieldset>
{%- if versions|length >= 2 -%}
<div id="keyboard-nav">
<strong>{{ _('Keyboard:') }} </strong>
<a href="" class="pure-button pure-button-primary" id="btn-previous"> &larr; {{ _('Previous') }}</a>
&nbsp; <a class="pure-button pure-button-primary" id="btn-next" href=""> &rarr; {{ _('Next') }}</a>
</div>
{%- endif -%}
</form>
</div>
<div id="diff-jump" style="display:none;"><!-- disabled for now -->
<a id="jump-next-diff" title="{{ _('Jump to next difference') }}">{{ _('Jump') }}</a>
</div>
<script src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
<div class="tabs">
<ul>
{% if last_error_text %}<li class="tab" id="error-text-tab"><a href="#error-text">{{ _('Error Text') }}</a></li> {% endif %}
{% if last_error_screenshot %}<li class="tab" id="error-screenshot-tab"><a href="#error-screenshot">{{ _('Error Screenshot') }}</a></li> {% endif %}
<li class="tab" id="text-tab"><a href="#text">{{ _('Text') }}</a></li>
<li class="tab" id="screenshot-tab"><a href="#screenshot">{{ _('Current screenshot') }}</a></li>
<li class="tab" id="extract-tab"><a href="{{ url_for('ui.ui_diff.diff_history_page_extract_GET', uuid=uuid)}}">{{ _('Extract Data') }}</a></li>
</ul>
</div>
<div id="diff-ui">
<div class="tab-pane-inner" id="error-text">
<div class="snapshot-age error">{{watch_a.error_text_ctime|format_seconds_ago}} {{ _('seconds ago.') }}</div>
<pre>
{{ last_error_text }}
</pre>
</div>
<div class="tab-pane-inner" id="error-screenshot">
<div class="snapshot-age error">{{watch_a.snapshot_error_screenshot_ctime|format_seconds_ago}} {{ _('seconds ago') }}</div>
<img id="error-screenshot-img" style="max-width: 80%" alt="{{ _('Current error-ing screenshot from most recent request') }}" >
</div>
<div class="tab-pane-inner" id="text">
{%- if (content | default('')).split('\n') | length > 100 -%}
<div id="cell-diff-jump-visualiser" style="user-select: none;">
{%- for cell in diff_cell_grid -%}
<div{% if cell.class %} class="{{ cell.class }}"{% endif %}></div>
{%- endfor -%}
</div>
{%- endif -%}
{%- if password_enabled_and_share_is_off -%}
<div class="tip">{{ _('Pro-tip: You can enable') }} <strong>{{ _('"share access when password is enabled"') }}</strong> {{ _('from settings.') }}
</div>
{%- endif -%}
<div id="text-diff-heading-area" style="user-select: none;">
<div class="snapshot-age"><span>{{ from_version|format_timestamp_timeago }}</span>
{%- if note -%}<span class="note"><strong>{{ note }}</strong></span>{%- endif -%}
<a href="{{ url_for('ui.ui_preview.preview_page', uuid=uuid) }}">{{ _('Go to single snapshot') }}</a>
</div>
</div>
<pre id="difference" style="border-left: 2px solid #ddd;">{{ content| diff_unescape_difference_spans }}</pre>
<div id="diff-visualiser-area-after" style="user-select: none;">
<strong>{{ _('Tip:') }}</strong> {{ _('Highlight text to share or add to ignore lists.') }}
</div>
</div>
<div class="tab-pane-inner" id="screenshot">
<div class="tip">
{{ _('For now, differences are computed on text, not graphically; only the latest screenshot is available.') }}
</div>
{% if is_html_webdriver %}
{% if screenshot %}
<div class="snapshot-age">{{watch_a.snapshot_screenshot_ctime|format_timestamp_timeago}}</div>
<img style="max-width: 80%" id="screenshot-img" alt="{{ _('Current screenshot from most recent request') }}" >
{% else %}
{{ _('No screenshot available just yet! Try rechecking the page.') }}
{% endif %}
{% else %}
<strong>{{ _('Screenshot requires Playwright/WebDriver enabled') }}</strong>
{% endif %}
</div>
</div>
<script>
const newest_version_timestamp = {{newest_version_timestamp}};
</script>
<script src="{{url_for('static_content', group='js', filename='diff-render.js')}}"></script>
{% endblock %}

@@ -1,12 +1,13 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.html' import render_field, render_checkbox_field, render_button, render_time_schedule_form, playwright_warning, only_playwright_type_watches_warning, render_conditions_fieldlist_of_formfields_as_table, render_ternary_field %}
{% from '_helpers.html' import render_field, render_checkbox_field, render_button, render_time_schedule_form, playwright_warning, only_playwright_type_watches_warning, highlight_trigger_ignored_explainer, render_conditions_fieldlist_of_formfields_as_table, render_ternary_field %}
{% from '_common_fields.html' import render_common_settings_form %}
<script src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='vis.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='global-settings.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='scheduler.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='conditions.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='modal.js')}}"></script>
<script>
@@ -43,20 +44,25 @@
<div class="tabs collapsable">
<ul>
<li class="tab"><a href="#general">General</a></li>
<li class="tab"><a href="#request">Request</a></li>
<li class="tab"><a href="#general">{{ _('General') }}</a></li>
{% if capabilities.supports_request_type %}
<li class="tab"><a href="#request">{{ _('Request') }}</a></li>
{% endif %}
{% if extra_tab_content %}
<li class="tab"><a href="#extras_tab">{{ extra_tab_content }}</a></li>
{% endif %}
<li class="tab"><a id="browsersteps-tab" href="#browser-steps">Browser Steps</a></li>
<!-- should goto extra forms? -->
{% if watch['processor'] == 'text_json_diff' %}
<li class="tab"><a id="visualselector-tab" href="#visualselector">Visual Filter Selector</a></li>
<li class="tab" id="filters-and-triggers-tab"><a href="#filters-and-triggers">Filters &amp; Triggers</a></li>
<li class="tab" id="conditions-tab"><a href="#conditions">Conditions</a></li>
{% if capabilities.supports_browser_steps %}
<li class="tab"><a id="browsersteps-tab" href="#browser-steps">{{ _('Browser Steps') }}</a></li>
{% endif %}
<li class="tab"><a href="#notifications">Notifications</a></li>
<li class="tab"><a href="#stats">Stats</a></li>
{% if capabilities.supports_visual_selector %}
<li class="tab"><a id="visualselector-tab" href="#visualselector">{{ _('Visual Filter Selector') }}</a></li>
{% endif %}
{% if capabilities.supports_text_filters_and_triggers %}
<li class="tab" id="filters-and-triggers-tab"><a href="#filters-and-triggers">{{ _('Filters & Triggers') }}</a></li>
<li class="tab" id="conditions-tab"><a href="#conditions">{{ _('Conditions') }}</a></li>
{% endif %}
<li class="tab"><a href="#notifications">{{ _('Notifications') }}</a></li>
<li class="tab"><a href="#stats">{{ _('Stats') }}</a></li>
</ul>
</div>
@@ -69,19 +75,19 @@
<fieldset>
<div class="pure-control-group">
{{ render_field(form.url, placeholder="https://...", required=true, class="m-d") }}
<div class="pure-form-message">Some sites use JavaScript to create the content, for this you should <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">use the Chrome/WebDriver Fetcher</a></div>
<div class="pure-form-message">Variables are supported in the URL (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">help and examples here</a>).</div>
<div class="pure-form-message">{{ _('Some sites use JavaScript to create the content, for this you should') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">{{ _('use the Chrome/WebDriver Fetcher') }}</a></div>
<div class="pure-form-message">{{ _('Variables are supported in the URL') }} (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">{{ _('help and examples here') }}</a>).</div>
</div>
<div class="pure-control-group">
{{ render_field(form.tags) }}
<span class="pure-form-message-inline">Organisational tag/group name used in the main listing page</span>
<span class="pure-form-message-inline">{{ _('Organisational tag/group name used in the main listing page') }}</span>
</div>
<div class="pure-control-group inline-radio">
{{ render_field(form.processor) }}
</div>
<div class="pure-control-group">
{{ render_field(form.title, class="m-d", placeholder=watch.label) }}
<span class="pure-form-message-inline">Automatically uses the page title if found, you can also use your own title/description here</span>
<span class="pure-form-message-inline">{{ _('Automatically uses the page title if found, you can also use your own title/description here') }}</span>
</div>
<div class="pure-control-group time-between-check border-fieldset">
@@ -91,7 +97,7 @@
{{ render_field(form.time_between_check, class="time-check-widget") }}
<span class="pure-form-message-inline">
The interval/amount of time between each check.
{{ _('The interval/amount of time between each check.') }}
</span>
</div>
<div id="time-between-check-schedule">
@@ -106,7 +112,14 @@
<div class="pure-control-group">
{{ render_checkbox_field(form.filter_failure_notification_send) }}
<span class="pure-form-message-inline">
Sends a notification when the filter can no longer be seen on the page, good for knowing when the page changed and your filter will not work anymore.
{{ _('Sends a notification when the filter can no longer be seen on the page, good for knowing when the page changed and your filter will not work anymore.') }}
</span>
</div>
<div class="pure-control-group">
{{ render_field(form.history_snapshot_max_length, class="history_snapshot_max_length") }}
<span class="pure-form-message-inline">{{ _('Limit collection of history snapshots for each watch to this number of history items.') }}
<br>
{{ _('Set to empty to use system settings default') }}
</span>
</div>
<div class="pure-control-group">
@@ -115,21 +128,22 @@
</fieldset>
</div>
{% if capabilities.supports_request_type %}
<div class="tab-pane-inner" id="request">
<div class="pure-control-group inline-radio">
{{ render_field(form.fetch_backend, class="fetch-backend") }}
<span class="pure-form-message-inline">
<p>Use the <strong>Basic</strong> method (default) where your watched site doesn't need Javascript to render.</p>
<p>The <strong>Chrome/Javascript</strong> method requires a network connection to a running WebDriver+Chrome server, set by the ENV var 'WEBDRIVER_URL'. </p>
Tip: <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">Connect using Bright Data and Oxylabs Proxies, find out more here.</a>
<p>{{ _('Use the') }} <strong>{{ _('Basic') }}</strong> {{ _('method (default) where your watched site doesn\'t need Javascript to render.') }}</p>
<p>{{ _('The') }} <strong>{{ _('Chrome/Javascript') }}</strong> {{ _('method requires a network connection to a running WebDriver+Chrome server, set by the ENV var \'WEBDRIVER_URL\'.') }} </p>
{{ _('Tip:') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">{{ _('Connect using Bright Data and Oxylabs Proxies, find out more here.') }}</a>
</span>
</div>
{% if form.proxy %}
<div class="pure-control-group inline-radio">
<div>{{ form.proxy.label }} <a href="" id="check-all-proxies" class="pure-button button-secondary button-xsmall" >Check/Scan all</a></div>
<div>{{ form.proxy.label }} <a href="" id="check-all-proxies" class="pure-button button-secondary button-xsmall" >{{ _('Check/Scan all') }}</a></div>
<div>{{ form.proxy(class="fetch-backend-proxy") }}</div>
<span class="pure-form-message-inline">
Choose a proxy for this watch
{{ _('Choose a proxy for this watch') }}
</span>
</div>
{% endif %}
@@ -139,31 +153,29 @@
<div class="pure-control-group">
{{ render_field(form.webdriver_delay) }}
<div class="pure-form-message-inline">
<strong>If you're having trouble waiting for the page to be fully rendered (text missing etc), try increasing the 'wait' time here.</strong>
<strong>{{ _('If you\'re having trouble waiting for the page to be fully rendered (text missing etc), try increasing the \'wait\' time here.') }}</strong>
<br>
This will wait <i>n</i> seconds before extracting the text.
{{ _('This will wait') }} <i>n</i> {{ _('seconds before extracting the text.') }}
{% if using_global_webdriver_wait %}
<br><strong>Using the current global default settings</strong>
<br><strong>{{ _('Using the current global default settings') }}</strong>
{% endif %}
</div>
</div>
<div class="pure-control-group">
<a class="pure-button button-secondary button-xsmall show-advanced">Show advanced options</a>
<a class="pure-button button-secondary button-xsmall show-advanced">{{ _('Show advanced options') }}</a>
</div>
<div class="advanced-options" style="display: none;">
{{ render_field(form.webdriver_js_execute_code) }}
<div class="pure-form-message-inline">
Run this code before performing change detection, handy for filling in fields and other
actions <a
href="https://github.com/dgtlmoon/changedetection.io/wiki/Run-JavaScript-before-change-detection">More
help and examples here</a>
{{ _('Run this code before performing change detection, handy for filling in fields and other actions') }} <a
href="https://github.com/dgtlmoon/changedetection.io/wiki/Run-JavaScript-before-change-detection">{{ _('More help and examples here') }}</a>
</div>
</div>
</fieldset>
<!-- html requests always -->
<fieldset data-visible-for="fetch_backend=html_requests">
<div class="pure-control-group">
<a class="pure-button button-secondary button-xsmall show-advanced">Show advanced options</a>
<a class="pure-button button-secondary button-xsmall show-advanced">{{ _('Show advanced options') }}</a>
</div>
<div class="advanced-options" style="display: none;">
<div class="pure-control-group" id="request-method">
@@ -178,7 +190,7 @@
\"year\":{% now 'Europe/Berlin', '%Y' %}
}") }}
</div>
<div class="pure-form-message">Variables are supported in the request body (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">help and examples here</a>).</div>
<div class="pure-form-message">{{ _('Variables are supported in the request body') }} (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">{{ _('help and examples here') }}</a>).</div>
</div>
</fieldset>
<!-- hmm -->
@@ -187,15 +199,15 @@
Cookie: foobar
User-Agent: wonderbra 1.0
Math: {{ 1 + 1 }}") }}
<div class="pure-form-message">Variables are supported in the request header values (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">help and examples here</a>).</div>
<div class="pure-form-message">{{ _('Variables are supported in the request header values') }} (<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">{{ _('help and examples here') }}</a>).</div>
<div class="pure-form-message-inline">
{% if has_extra_headers_file %}
<strong>Alert! Extra headers file found and will be added to this watch!</strong>
<strong>{{ _('Alert! Extra headers file found and will be added to this watch!') }}</strong>
{% else %}
Headers can be also read from a file in your data-directory <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Adding-headers-from-an-external-file">Read more here</a>
{{ _('Headers can also be read from a file in your data-directory') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Adding-headers-from-an-external-file">{{ _('Read more here') }}</a>
{% endif %}
<br>
(Not supported by Selenium browser)
({{ _('Not supported by Selenium browser') }})
</div>
</div>
<fieldset data-visible-for="fetch_backend=html_requests fetch_backend=html_webdriver" >
@@ -204,11 +216,11 @@ Math: {{ 1 + 1 }}") }}
</div>
</fieldset>
</div>
{% endif %}
<div class="tab-pane-inner" id="browser-steps">
{% if watch_needs_selenium_or_playwright %}
{# Only works with playwright #}
{% if system_has_playwright_configured %}
{% if capabilities.supports_browser_steps %}
{% if true %}
<img class="beta-logo" src="{{url_for('static_content', group='images', filename='beta-logo.png')}}" alt="New beta functionality">
<fieldset>
<div class="pure-control-group">
@@ -220,19 +232,19 @@ Math: {{ 1 + 1 }}") }}
<!--- Do this later -->
<div class="checkbox" style="display: none;">
<input type=checkbox id="include_text_elements" > <label for="include_text_elements">Turn on text finder</label>
<input type=checkbox id="include_text_elements" > <label for="include_text_elements">{{ _('Turn on text finder') }}</label>
</div>
<div id="loading-status-text" style="display: none;">Please wait, first browser step can take a little time to load..<div class="spinner"></div></div>
<div id="loading-status-text" style="display: none;">{{ _('Please wait, first browser step can take a little time to load..') }}<div class="spinner"></div></div>
<div class="flex-wrapper" >
<div id="browser-steps-ui" class="noselect">
<div class="noselect" id="browsersteps-selector-wrapper" style="width: 100%">
<span class="loader" >
<span id="browsersteps-click-start">
<h2 >Click here to Start</h2>
<h2 >{{ _('Click here to Start') }}</h2>
<svg style="height: 3.5rem;" version="1.1" viewBox="0 0 32 32" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="start"/><g id="play_x5F_alt"><path d="M16,0C7.164,0,0,7.164,0,16s7.164,16,16,16s16-7.164,16-16S24.836,0,16,0z M10,24V8l16.008,8L10,24z" style="fill: var(--color-grey-400);"/></g></svg><br>
Please allow 10-15 seconds for the browser to connect.<br>
{{ _('Please allow 10-15 seconds for the browser to connect.') }}<br>
</span>
<div class="spinner" style="display: none;"></div>
</span>
@@ -241,22 +253,20 @@ Math: {{ 1 + 1 }}") }}
</div>
</div>
<div id="browser-steps-fieldlist" >
<span id="browser-seconds-remaining">Press "Play" to start.</span> <span style="font-size: 80%;"> (<a target="newwindow" href="https://github.com/dgtlmoon/changedetection.io/pull/478/files#diff-1a79d924d1840c485238e66772391268a89c95b781d69091384cf1ea1ac146c9R4">?</a>) </span>
<span id="browser-seconds-remaining">{{ _('Press "Play" to start.') }}</span> <span style="font-size: 80%;"> (<a target="newwindow" href="https://github.com/dgtlmoon/changedetection.io/pull/478/files#diff-1a79d924d1840c485238e66772391268a89c95b781d69091384cf1ea1ac146c9R4">?</a>) </span>
{{ render_field(form.browser_steps) }}
</div>
</div>
</div>
</fieldset>
{% else %}
{# it's configured to use selenium or chrome but system says its not configured #}
{{ playwright_warning() }}
{% if system_has_webdriver_configured %}
<strong>Selenium/Webdriver cant be used here because it wont fetch screenshots reliably.</strong>
{% endif %}
<strong>{{ _('Visual Selector data is not ready, watch needs to be checked at least once.') }}</strong>
{% endif %}
{% else %}
{# "This functionality needs chrome.." #}
{{ only_playwright_type_watches_warning() }}
<p>
<strong>{{ _('Sorry, this functionality only works with fetchers that support interactive Javascript (so far only Playwright based fetchers)') }}<br>
{{ _('You need to') }} <a href="#request">{{ _('Set the fetch method') }}</a> {{ _('to one that supports interactive Javascript.') }}</strong>
</p>
{% endif %}
</div>
@@ -266,29 +276,28 @@ Math: {{ 1 + 1 }}") }}
<div class="pure-control-group inline-radio">
{{ render_ternary_field(form.notification_muted, BooleanField=true) }}
</div>
{% if watch_needs_selenium_or_playwright %}
{% if capabilities.supports_screenshots %}
<div class="pure-control-group inline-radio">
{{ render_checkbox_field(form.notification_screenshot) }}
<span class="pure-form-message-inline">
<strong>Use with caution!</strong> This will easily fill up your email storage quota or flood other storages.
<strong>{{ _('Use with caution!') }}</strong> {{ _('This will easily fill up your email storage quota or flood other storages.') }}
</span>
</div>
{% endif %}
<div class="field-group" id="notification-field-group">
{% if has_default_notification_urls %}
<div class="inline-warning">
<img class="inline-warning-icon" src="{{url_for('static_content', group='images', filename='notice.svg')}}" alt="Look out!" title="Lookout!" >
There are <a href="{{ url_for('settings.settings_page')}}#notifications">system-wide notification URLs enabled</a>, this form will override notification settings for this watch only &dash; an empty Notification URL list here will still send notifications.
<img class="inline-warning-icon" src="{{url_for('static_content', group='images', filename='notice.svg')}}" alt="{{ _('Look out!') }}" title="{{ _('Look out!') }}" >
{{ _('There are') }} <a href="{{ url_for('settings.settings_page')}}#notifications">{{ _('system-wide notification URLs enabled') }}</a>, {{ _('this form will override notification settings for this watch only') }} &dash; {{ _('an empty Notification URL list here will still send notifications.') }}
</div>
{% endif %}
<a href="#notifications" id="notification-setting-reset-to-default" class="pure-button button-xsmall" style="right: 20px; top: 20px; position: absolute; background-color: #5f42dd; border-radius: 4px; font-size: 70%; color: #fff">Use system defaults</a>
<a href="#notifications" id="notification-setting-reset-to-default" class="pure-button button-xsmall" style="right: 20px; top: 20px; position: absolute; background-color: #5f42dd; border-radius: 4px; font-size: 70%; color: #fff">{{ _('Use system defaults') }}</a>
{{ render_common_settings_form(form, emailprefix, settings_application, extra_notification_token_placeholder_info) }}
</div>
</fieldset>
</div>
{% if watch['processor'] == 'text_json_diff' %}
{% if capabilities.supports_text_filters_and_triggers %}
<div class="tab-pane-inner" id="conditions">
<script>
const verify_condition_rule_url="{{url_for('conditions.verify_condition_single_rule', watch_uuid=uuid)}}";
@@ -298,74 +307,77 @@ Math: {{ 1 + 1 }}") }}
{{ render_conditions_fieldlist_of_formfields_as_table(form.conditions) }}
<div class="pure-form-message-inline">
<p id="verify-state-text">Use the verify (✓) button to test if a condition passes against the current snapshot.</p>
Read a quick tutorial about <a href="https://changedetection.io/tutorial/conditional-actions-web-page-changes">using conditional web page changes here</a>.<br>
<p id="verify-state-text">{{ _('Use the verify (✓) button to test if a condition passes against the current snapshot.') }}</p>
{{ _('Read a quick tutorial about') }} <a href="https://changedetection.io/tutorial/conditional-actions-web-page-changes">{{ _('using conditional web page changes here') }}</a>.<br>
</div>
</div>
</div>
<div class="tab-pane-inner" id="filters-and-triggers">
<span id="activate-text-preview" class="pure-button pure-button-primary button-xsmall">Activate preview</span>
<span id="activate-text-preview" class="pure-button pure-button-primary button-xsmall">{{ _('Activate preview') }}</span>
<div>
<div id="edit-text-filter">
<div class="pure-control-group" id="pro-tips">
<strong>Pro-tips:</strong><br>
{% if capabilities.supports_text_filters_and_triggers_elements %}
<div class="pure-control-group" id="pro-tips">
<strong>{{ _('Pro-tips:') }}</strong><br>
<ul>
<li>
Use the preview page to see your filters and triggers highlighted.
{{ _('Use the preview page to see your filters and triggers highlighted.') }}
</li>
<li>
Some sites use JavaScript to create the content, for this you should <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">use the Chrome/WebDriver Fetcher</a>
{{ _('Some sites use JavaScript to create the content, for this you should') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">{{ _('use the Chrome/WebDriver Fetcher') }}</a>
</li>
</ul>
</div>
{% include "edit/include_subtract.html" %}
{% endif %}
<div class="text-filtering border-fieldset">
<fieldset class="pure-group" id="text-filtering-type-options">
<h3>Text filtering</h3>
Limit trigger/ignore/block/extract to;<br>
<h3>{{ _('Text filtering') }}</h3>
{{ _('Limit trigger/ignore/block/extract to:') }}<br>
{{ render_checkbox_field(form.filter_text_added) }}
{{ render_checkbox_field(form.filter_text_replaced) }}
{{ render_checkbox_field(form.filter_text_removed) }}
<span class="pure-form-message-inline">Note: Depending on the length and similarity of the text on each line, the algorithm may consider an <strong>addition</strong> instead of <strong>replacement</strong> for example.</span><br>
<span class="pure-form-message-inline">&nbsp;So it's always better to select <strong>Added</strong>+<strong>Replaced</strong> when you're interested in new content.</span><br>
<span class="pure-form-message-inline">&nbsp;When content is merely moved in a list, it will also trigger an <strong>addition</strong>, consider enabling <code><strong>Only trigger when unique lines appear</strong></code></span>
<span class="pure-form-message-inline">{{ _('Note: Depending on the length and similarity of the text on each line, the algorithm may consider an') }} <strong>{{ _('addition') }}</strong> {{ _('instead of') }} <strong>{{ _('replacement') }}</strong> {{ _('for example.') }}</span><br>
<span class="pure-form-message-inline">&nbsp;{{ _('So it\'s always better to select') }} <strong>{{ _('Added') }}</strong>+<strong>{{ _('Replaced') }}</strong> {{ _('when you\'re interested in new content.') }}</span><br>
<span class="pure-form-message-inline">&nbsp;{{ _('When content is merely moved in a list, it will also trigger an') }} <strong>{{ _('addition') }}</strong>, {{ _('consider enabling') }} <code><strong>{{ _('Only trigger when unique lines appear') }}</strong></code></span>
</fieldset>
<fieldset class="pure-control-group">
{{ render_checkbox_field(form.check_unique_lines) }}
<span class="pure-form-message-inline">Good for websites that just move the content around, and you want to know when NEW content is added, compares new lines against all history for this watch.</span>
<span class="pure-form-message-inline">{{ _('Good for websites that just move the content around, and you want to know when NEW content is added, compares new lines against all history for this watch.') }}</span>
</fieldset>
<fieldset class="pure-control-group">
{{ render_checkbox_field(form.remove_duplicate_lines) }}
<span class="pure-form-message-inline">Remove duplicate lines of text</span>
<span class="pure-form-message-inline">{{ _('Remove duplicate lines of text') }}</span>
</fieldset>
<fieldset class="pure-control-group">
{{ render_checkbox_field(form.sort_text_alphabetically) }}
<span class="pure-form-message-inline">Helps reduce changes detected caused by sites shuffling lines around, combine with <i>check unique lines</i> below.</span>
<span class="pure-form-message-inline">{{ _('Helps reduce changes detected caused by sites shuffling lines around, combine with') }} <i>{{ _('check unique lines') }}</i> {{ _('below.') }}</span>
</fieldset>
<fieldset class="pure-control-group">
{{ render_checkbox_field(form.trim_text_whitespace) }}
<span class="pure-form-message-inline">Remove any whitespace before and after each line of text</span>
<span class="pure-form-message-inline">{{ _('Remove any whitespace before and after each line of text') }}</span>
</fieldset>
{% include "edit/text-options.html" %}
</div>
</div>
<div id="text-preview" style="display: none;" >
<script>
const preview_text_edit_filters_url="{{url_for('ui.ui_edit.watch_get_preview_rendered', uuid=uuid)}}";
</script>
<br>
{#<div id="text-preview-controls"><span id="text-preview-refresh" class="pure-button button-xsmall">Refresh</span></div>#}
<div class="minitabs-wrapper">
<div class="minitabs-content">
<div id="text-preview-inner" class="monospace-preview">
<p>Loading...</p>
</div>
<div id="text-preview-before-inner" style="display: none;" class="monospace-preview">
<p>Loading...</p>
</div>
<script>
const preview_text_edit_filters_url="{{url_for('ui.ui_edit.watch_get_preview_rendered', uuid=uuid)}}";
</script>
<br>
{#<div id="text-preview-controls"><span id="text-preview-refresh" class="pure-button button-xsmall">Refresh</span></div>#}
<div class="minitabs-wrapper">
<div class="minitabs-content">
<div id="text-preview-inner" class="monospace-preview">
<p>{{ _('Loading...') }}</p>
</div>
</div>
<div id="text-preview-before-inner" style="display: none;" class="monospace-preview">
<p>{{ _('Loading...') }}</p>
</div>
</div>
</div>
{{ highlight_trigger_ignored_explainer() }}
</div>
</div>
</div>
@@ -377,41 +389,55 @@ Math: {{ 1 + 1 }}") }}
{{ extra_form_content|safe }}
</div>
{% endif %}
{% if watch['processor'] == 'text_json_diff' %}
{% if capabilities.supports_visual_selector %}
<div class="tab-pane-inner visual-selector-ui" id="visualselector">
<img class="beta-logo" src="{{url_for('static_content', group='images', filename='beta-logo.png')}}" alt="New beta functionality">
<fieldset>
<div class="pure-control-group">
{% if watch_needs_selenium_or_playwright %}
{% if system_has_playwright_configured %}
<span class="pure-form-message-inline" id="visual-selector-heading">
The Visual Selector tool lets you select the <i>text</i> elements that will be used for the change detection. It automatically fills-in the filters in the "CSS/JSONPath/JQ/XPath Filters" box of the <a href="#filters-and-triggers">Filters & Triggers</a> tab. Use <strong>Shift+Click</strong> to select multiple items.
</span>
{% if capabilities.supports_screenshots and capabilities.supports_xpath_element_data %}
{% if visual_selector_data_ready %}
<span class="pure-form-message-inline" id="visual-selector-heading">
{{ _('The Visual Selector tool lets you select the') }} <i>{{ _('text') }}</i> {{ _('elements that will be used for the change detection. It automatically fills-in the filters in the "CSS/JSONPath/JQ/XPath Filters" box of the') }} <a href="#filters-and-triggers">{{ _('Filters & Triggers') }}</a> {{ _('tab. Use') }} <strong>{{ _('Shift+Click') }}</strong> {{ _('to select multiple items.') }}
</span>
<div id="selector-header">
<a id="clear-selector" class="pure-button button-secondary button-xsmall" style="font-size: 70%">Clear selection</a>
<!-- visual selector IMG will try to load, it will either replace this or on error replace it with some handy text -->
<i class="fetching-update-notice" style="font-size: 80%;">One moment, fetching screenshot and element information..</i>
</div>
<div id="selector-wrapper" style="display: none">
<!-- request the screenshot and get the element offset info ready -->
<!-- use img src ready load to know everything is ready to map out -->
<!-- @todo: maybe something interesting like a field to select 'elements that contain text... and their parents n' -->
<img id="selector-background" >
<canvas id="selector-canvas"></canvas>
</div>
<div id="selector-current-xpath" style="overflow-x: hidden"><strong>Currently:</strong>&nbsp;<span class="text">Loading...</span></div>
{% else %}
{# The watch needed chrome but system says that playwright is not ready #}
{{ playwright_warning() }}
{% endif %}
{% if system_has_webdriver_configured %}
<strong>Selenium/Webdriver cant be used here because it wont fetch screenshots reliably.</strong>
{% endif %}
{% if watch['processor'] == 'image_ssim_diff' %} {# @todo, integrate with image_ssim_diff selector better, use some extra form ? #}
<div id="selection-mode-controls" style="margin: 10px 0; padding: 10px; background: var(--color-background-tab); border-radius: 5px;">
<label style="font-weight: 600; margin-right: 15px;">{{ _('Selection Mode:') }}</label>
<label style="margin-right: 15px;">
<input type="radio" name="selector-mode" value="element" style="margin-right: 5px;">
{{ _('Select by element') }}
</label>
<label>
<input type="radio" name="selector-mode" value="draw" checked style="margin-right: 5px;">
{{ _('Draw area') }}
</label>
{{ render_field(form.processor_config_bounding_box) }}
{{ render_field(form.processor_config_selection_mode) }}
</div>
{% endif %}
<div id="selector-header">
<a id="clear-selector" class="pure-button button-secondary button-xsmall" style="font-size: 70%">{{ _('Clear selection') }}</a>
<!-- visual selector IMG will try to load, it will either replace this or on error replace it with some handy text -->
<i class="fetching-update-notice" style="font-size: 80%;">{{ _('One moment, fetching screenshot and element information..') }}</i>
</div>
<div id="selector-wrapper" style="display: none">
<!-- request the screenshot and get the element offset info ready -->
<!-- use img src ready load to know everything is ready to map out -->
<!-- @todo: maybe something interesting like a field to select 'elements that contain text... and their parents n' -->
<img id="selector-background" >
<canvas id="selector-canvas"></canvas>
</div>
<div id="selector-current-xpath" style="overflow-x: hidden"><strong>{{ _('Currently:') }}</strong>&nbsp;<span class="text">{{ _('Loading...') }}</span></div>
{% else %}
<strong>{{ _('Visual Selector data is not ready, watch needs to be checked at least once.') }}</strong>
{% endif %}
{% else %}
{# "This functionality needs chrome.." #}
{{ only_playwright_type_watches_warning() }}
<p>
<strong>{{ _('Sorry, this functionality only works with fetchers that support Javascript and screenshots (such as playwright etc).') }}<br>
{{ _('You need to') }} <a href="#request">{{ _('Set the fetch method') }}</a> {{ _('to one that supports Javascript and screenshots.') }}</strong>
</p>
{% endif %}
</div>
</fieldset>
@@ -427,27 +453,27 @@ Math: {{ 1 + 1 }}") }}
<table class="pure-table" id="stats-table">
<tbody>
<tr>
<td>Check count</td>
<td>{{ _('Check count') }}</td>
<td>{{ "{:,}".format( watch.check_count) }}</td>
</tr>
<tr>
<td>Consecutive filter failures</td>
<td>{{ _('Consecutive filter failures') }}</td>
<td>{{ "{:,}".format( watch.consecutive_filter_failures) }}</td>
</tr>
<tr>
<td>History length</td>
<td>{{ _('History length') }}</td>
<td>{{ "{:,}".format(watch.history|length) }}</td>
</tr>
<tr>
<td>Last fetch duration</td>
<td>{{ _('Last fetch duration') }}</td>
<td>{{ watch.fetch_time }}s</td>
</tr>
<tr>
<td>Notification alert count</td>
<td>{{ _('Notification alert count') }}</td>
<td>{{ watch.notification_alert_count }}</td>
</tr>
<tr>
<td>Server type reply</td>
<td>{{ _('Server type reply') }}</td>
<td>{{ watch.get('remote_server_reply') }}</td>
</tr>
</tbody>
@@ -461,7 +487,8 @@ Math: {{ 1 + 1 }}") }}
{% if watch.history_n %}
<p>
<a href="{{url_for('ui.ui_edit.watch_get_latest_html', uuid=uuid)}}" class="pure-button button-small">Download latest HTML snapshot</a>
<a href="{{url_for('ui.ui_edit.watch_get_latest_html', uuid=uuid)}}" class="pure-button button-small">{{ _('Download latest HTML snapshot') }}</a>
<a href="{{url_for('ui.ui_edit.watch_get_data_package', uuid=uuid)}}" class="pure-button button-small">{{ _('Download watch data package') }}</a>
</p>
{% endif %}
@@ -471,12 +498,22 @@ Math: {{ 1 + 1 }}") }}
<div class="pure-control-group">
{{ render_button(form.save_button) }}
<a href="{{url_for('ui.form_delete', uuid=uuid)}}"
class="pure-button button-error ">Delete</a>
class="pure-button button-error"
data-requires-confirm
data-confirm-type="danger"
data-confirm-title="{{ _('Delete Watch?') }}"
data-confirm-message="<p>{{ _('Are you sure you want to delete the watch for:') }}</p><p><strong>{{ watch.get('url', 'this watch') }}</strong></p><p>{{ _('This action cannot be undone.') }}</p>"
data-confirm-button="{{ _('Delete') }}">{{ _('Delete') }}</a>
{% if watch.history_n %}<a href="{{url_for('ui.clear_watch_history', uuid=uuid)}}"
class="pure-button button-error">Clear History</a>{% endif %}
class="pure-button button-error"
data-requires-confirm
data-confirm-type="warning"
data-confirm-title="{{ _('Clear History?') }}"
data-confirm-message="<p>{{ _('Are you sure you want to clear all history for:') }}</p><p><strong>{{ watch.get('url', 'this watch') }}</strong></p><p>{{ _('This will remove all snapshots and previous versions. This action cannot be undone.') }}</p>"
data-confirm-button="{{ _('Clear History') }}">{{ _('Clear History') }}</a>{% endif %}
<a href="{{url_for('ui.form_clone', uuid=uuid)}}"
class="pure-button">Clone &amp; Edit</a>
<a href="{{ url_for('rss.rss_single_watch', uuid=uuid, token=app_rss_token)}}"><img alt="RSS Feed for this watch" style="padding: .5em 1em;" src="{{url_for('static_content', group='images', filename='generic_feed-icon.svg')}}" height="15"></a>
class="pure-button">{{ _('Clone & Edit') }}</a>
<a href="{{ url_for('rss.rss_single_watch', uuid=uuid, token=app_rss_token)}}"><img alt="{{ _('RSS Feed for this watch') }}" style="padding: .5em 1em;" src="{{url_for('static_content', group='images', filename='generic_feed-icon.svg')}}" height="15"></a>
</div>
</div>
</form>


@@ -1,9 +1,11 @@
{% extends 'base.html' %}
{% from '_helpers.html' import highlight_trigger_ignored_explainer %}
{% block content %}
<script>
const screenshot_url = "{{url_for('static_content', group='screenshot', filename=uuid)}}";
const triggered_line_numbers = {{ triggered_line_numbers|tojson }};
const triggered_line_numbers = {{ highlight_triggered_line_numbers|tojson }};
const ignored_line_numbers = {{ highlight_ignored_line_numbers|tojson }};
const blocked_line_numbers = {{ highlight_blocked_line_numbers|tojson }};
{% if last_error_screenshot %}
const error_screenshot_url = "{{url_for('static_content', group='screenshot', filename=uuid, error_screenshot=1) }}";
{% endif %}
@@ -14,10 +16,10 @@
<script src="{{ url_for('static_content', group='js', filename='preview.js') }}" defer></script>
<script src="{{ url_for('static_content', group='js', filename='tabs.js') }}" defer></script>
{% if versions|length >= 2 %}
<div id="settings" style="text-align: center;">
<div id="diff-form" style="text-align: center;">
<form class="pure-form " action="" method="POST">
<fieldset>
<label for="preview-version">Select timestamp</label> <select id="preview-version"
<label for="preview-version">{{ _('Select timestamp') }}</label> <select id="preview-version"
name="from_version"
class="needs-localtime">
{% for version in versions|reverse %}
@@ -26,27 +28,27 @@
</option>
{% endfor %}
</select>
<button type="submit" class="pure-button pure-button-primary">Go</button>
<button type="submit" class="pure-button pure-button-primary">{{ _('Go') }}</button>
</fieldset>
</form>
<br>
<strong>Keyboard: </strong><a href="" class="pure-button pure-button-primary" id="btn-previous">
&larr; Previous</a> &nbsp; <a class="pure-button pure-button-primary" id="btn-next" href="">
&rarr; Next</a>
<strong>{{ _('Keyboard:') }} </strong><a href="" class="pure-button pure-button-primary" id="btn-previous">
&larr; {{ _('Previous') }}</a> &nbsp; <a class="pure-button pure-button-primary" id="btn-next" href="">
&rarr; {{ _('Next') }}</a>
</div>
{% endif %}
<div class="tabs">
<ul>
{% if last_error_text %}
<li class="tab" id="error-text-tab"><a href="#error-text">Error Text</a></li> {% endif %}
<li class="tab" id="error-text-tab"><a href="#error-text">{{ _('Error Text') }}</a></li> {% endif %}
{% if last_error_screenshot %}
<li class="tab" id="error-screenshot-tab"><a href="#error-screenshot">Error Screenshot</a>
<li class="tab" id="error-screenshot-tab"><a href="#error-screenshot">{{ _('Error Screenshot') }}</a>
</li> {% endif %}
{% if history_n > 0 %}
<li class="tab" id="text-tab"><a href="#text">Text</a></li>
<li class="tab" id="screenshot-tab"><a href="#screenshot">Screenshot</a></li>
<li class="tab" id="text-tab"><a href="#text">{{ _('Text') }}</a></li>
<li class="tab" id="screenshot-tab"><a href="#screenshot">{{ _('Current screenshot') }}</a></li>
{% endif %}
</ul>
</div>
@@ -54,50 +56,39 @@
<div id="diff-ui">
<div class="tab-pane-inner" id="error-text">
<div class="snapshot-age error">{{ watch.error_text_ctime|format_seconds_ago }} seconds ago</div>
<div class="snapshot-age error">{{ watch.error_text_ctime|format_seconds_ago }} {{ _('seconds ago') }}</div>
<pre>
{{ last_error_text }}
</pre>
</div>
<div class="tab-pane-inner" id="error-screenshot">
<div class="snapshot-age error">{{ watch.snapshot_error_screenshot_ctime|format_seconds_ago }} seconds ago
<div class="snapshot-age error">{{ watch.snapshot_error_screenshot_ctime|format_seconds_ago }} {{ _('seconds ago') }}
</div>
<img id="error-screenshot-img" style="max-width: 80%"
alt="Current erroring screenshot from most recent request">
alt="{{ _('Current erroring screenshot from most recent request') }}">
</div>
<div class="tab-pane-inner" id="text">
{{ highlight_trigger_ignored_explainer() }}
<div class="snapshot-age">{{ current_version|format_timestamp_timeago }}</div>
<span class="tip"><strong>Pro-tip</strong>: Highlight text to add to ignore filters</span>
<table>
<tbody>
<tr>
<td id="diff-col" class="highlightable-filter">
<pre style="border-left: 2px solid #ddd;">
{{ content }}
</pre>
</td>
</tr>
</tbody>
</table>
<pre id="difference" style="border-left: 2px solid #ddd;">{{ content| diff_unescape_difference_spans }}</pre>
</div>
<div class="tab-pane-inner" id="screenshot">
<div class="tip">
For now, Differences are performed on text, not graphically, only the latest screenshot is available.
{{ _('For now, Differences are performed on text, not graphically, only the latest screenshot is available.') }}
</div>
<br>
{% if is_html_webdriver %}
{% if capabilities.supports_screenshots %}
{% if screenshot %}
<div class="snapshot-age">{{ watch.snapshot_screenshot_ctime|format_timestamp_timeago }}</div>
<img style="max-width: 80%" id="screenshot-img" alt="Current screenshot from most recent request">
<img style="max-width: 80%" id="screenshot-img" alt="{{ _('Current screenshot from most recent request') }}">
{% else %}
No screenshot available just yet! Try rechecking the page.
{{ _('No screenshot available just yet! Try rechecking the page.') }}
{% endif %}
{% else %}
<strong>Screenshot requires Playwright/WebDriver enabled</strong>
<strong>{{ _('Screenshot requires a Content Fetcher ( Sockpuppetbrowser, selenium, etc ) that supports screenshots.') }}</strong>
{% endif %}
</div>
</div>


@@ -1,207 +1,12 @@
from flask import Blueprint, request, redirect, url_for, flash, render_template, make_response, send_from_directory, abort
import os
import time
from loguru import logger
from flask import Blueprint, request, redirect, url_for, flash
from flask_babel import gettext
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
from changedetectionio import html_tools
from changedetectionio import worker_handler
from changedetectionio import worker_pool
def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMetaData, watch_check_update):
views_blueprint = Blueprint('ui_views', __name__, template_folder="../ui/templates")
@views_blueprint.route("/preview/<string:uuid>", methods=['GET'])
@login_optionally_required
def preview_page(uuid):
content = []
versions = []
timestamp = None
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash("No history found for the specified link, bad link?", "error")
return redirect(url_for('watchlist.index'))
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
extra_stylesheets = [url_for('static_content', group='styles', filename='diff.css')]
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
triggered_line_numbers = []
if datastore.data['watching'][uuid].history_n == 0 and (watch.get_error_text() or watch.get_error_snapshot()):
flash("Preview unavailable - No fetch/check completed or triggers not reached", "error")
else:
# So prepare the latest preview or not
preferred_version = request.args.get('version')
versions = list(watch.history.keys())
timestamp = versions[-1]
if preferred_version and preferred_version in versions:
timestamp = preferred_version
try:
versions = list(watch.history.keys())
content = watch.get_history_snapshot(timestamp=timestamp)
triggered_line_numbers = html_tools.strip_ignore_text(content=content,
wordlist=watch['trigger_text'],
mode='line numbers'
)
except Exception as e:
content.append({'line': f"File doesnt exist or unable to read timestamp {timestamp}", 'classes': ''})
output = render_template("preview.html",
content=content,
current_version=timestamp,
history_n=watch.history_n,
extra_stylesheets=extra_stylesheets,
extra_title=f" - Diff - {watch.label} @ {timestamp}",
triggered_line_numbers=triggered_line_numbers,
current_diff_url=watch['url'],
screenshot=watch.get_screenshot(),
watch=watch,
uuid=uuid,
is_html_webdriver=is_html_webdriver,
last_error=watch['last_error'],
last_error_text=watch.get_error_text(),
last_error_screenshot=watch.get_error_snapshot(),
versions=versions
)
return output
@views_blueprint.route("/diff/<string:uuid>", methods=['POST'])
@login_optionally_required
def diff_history_page_build_report(uuid):
from changedetectionio import forms
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash("No history found for the specified link, bad link?", "error")
return redirect(url_for('watchlist.index'))
# For submission of requesting an extract
extract_form = forms.extractDataForm(formdata=request.form,
data={'extract_regex': request.form.get('extract_regex', '')}
)
if not extract_form.validate():
flash("An error occurred, please see below.", "error")
return _render_diff_template(uuid, extract_form)
else:
extract_regex = request.form.get('extract_regex', '').strip()
output = watch.extract_regex_from_all_history(extract_regex)
if output:
watch_dir = os.path.join(datastore.datastore_path, uuid)
response = make_response(send_from_directory(directory=watch_dir, path=output, as_attachment=True))
response.headers['Content-type'] = 'text/csv'
response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
response.headers['Pragma'] = 'no-cache'
response.headers['Expires'] = "0"
return response
flash('No matches found while scanning all of the watch history for that RegEx.', 'error')
return redirect(url_for('ui.ui_views.diff_history_page', uuid=uuid) + '#extract')
def _render_diff_template(uuid, extract_form=None):
"""Helper function to render the diff template with all required data"""
from changedetectionio import forms
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
extra_stylesheets = [url_for('static_content', group='styles', filename='diff.css')]
try:
watch = datastore.data['watching'][uuid]
except KeyError:
flash("No history found for the specified link, bad link?", "error")
return redirect(url_for('watchlist.index'))
# Use provided form or create a new one
if extract_form is None:
extract_form = forms.extractDataForm(formdata=request.form,
data={'extract_regex': request.form.get('extract_regex', '')}
)
history = watch.history
dates = list(history.keys())
# If a "from_version" was requested, then find it (or the closest one)
# Also set "from version" to be the closest version to the one that was last viewed.
best_last_viewed_timestamp = watch.get_from_version_based_on_last_viewed
from_version_timestamp = best_last_viewed_timestamp if best_last_viewed_timestamp else dates[-2]
from_version = request.args.get('from_version', from_version_timestamp )
# Use the current one if nothing was specified
to_version = request.args.get('to_version', str(dates[-1]))
try:
to_version_file_contents = watch.get_history_snapshot(timestamp=to_version)
except Exception as e:
logger.error(f"Unable to read watch history to-version for version {to_version}: {str(e)}")
to_version_file_contents = f"Unable to read to-version at {to_version}.\n"
try:
from_version_file_contents = watch.get_history_snapshot(timestamp=from_version)
except Exception as e:
logger.error(f"Unable to read watch history from-version for version {from_version}: {str(e)}")
from_version_file_contents = f"Unable to read to-version {from_version}.\n"
screenshot_url = watch.get_screenshot()
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
password_enabled_and_share_is_off = False
if datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False):
password_enabled_and_share_is_off = not datastore.data['settings']['application'].get('shared_diff_access')
datastore.set_last_viewed(uuid, time.time())
return render_template("diff.html",
current_diff_url=watch['url'],
from_version=str(from_version),
to_version=str(to_version),
extra_stylesheets=extra_stylesheets,
extra_title=f" - Diff - {watch.label}",
extract_form=extract_form,
is_html_webdriver=is_html_webdriver,
last_error=watch['last_error'],
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
left_sticky=True,
newest=to_version_file_contents,
newest_version_timestamp=dates[-1],
password_enabled_and_share_is_off=password_enabled_and_share_is_off,
from_version_file_contents=from_version_file_contents,
to_version_file_contents=to_version_file_contents,
screenshot=screenshot_url,
uuid=uuid,
versions=dates, # All except current/last
watch_a=watch
)
@views_blueprint.route("/diff/<string:uuid>", methods=['GET'])
@login_optionally_required
def diff_history_page(uuid):
return _render_diff_template(uuid)
@views_blueprint.route("/form/add/quickwatch", methods=['POST'])
@login_optionally_required
@@ -216,21 +21,22 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
url = request.form.get('url').strip()
if datastore.url_exists(url):
flash(f'Warning, URL {url} already exists', "notice")
flash(gettext('Warning, URL {} already exists').format(url), "notice")
add_paused = request.form.get('edit_and_watch_submit_button') != None
processor = request.form.get('processor', 'text_json_diff')
new_uuid = datastore.add_watch(url=url, tag=request.form.get('tags').strip(), extras={'paused': add_paused, 'processor': processor})
from changedetectionio import processors
processor = request.form.get('processor', processors.get_default_processor())
new_uuid = datastore.add_watch(url=url, tag=request.form.get('tags','').strip(), extras={'paused': add_paused, 'processor': processor})
if new_uuid:
if add_paused:
flash('Watch added in Paused state, saving will unpause.')
flash(gettext('Watch added in Paused state, saving will unpause.'))
return redirect(url_for('ui.ui_edit.edit_page', uuid=new_uuid, unpause_on_save=1, tag=request.args.get('tag')))
else:
# Straight into the queue.
worker_handler.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
flash("Watch added.")
worker_pool.queue_item_async_safe(update_q, queuedWatchMetaData.PrioritizedItem(priority=1, item={'uuid': new_uuid}))
flash(gettext("Watch added."))
return redirect(url_for('watchlist.index', tag=request.args.get('tag','')))
return views_blueprint
return views_blueprint
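
Review note: the hunk above replaces `flash(f'Warning, URL {url} already exists', ...)` with `gettext('Warning, URL {} already exists').format(url)`. A likely reason is that an f-string is interpolated before the catalog lookup, so the message id never matches a translation. The sketch below illustrates this with the standard-library `gettext` module only; `DictTranslations`, the German string, and the example URL are hypothetical stand-ins for illustration and are not part of this change set (the project itself uses `flask_babel.gettext`).

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Fall back to the untranslated message id when no entry exists.
        return self._messages.get(message, message)


# Assumed catalog entry; the real translations live in the project's locale files.
messages = {"Warning, URL {} already exists": "Warnung, URL {} existiert bereits"}
_ = DictTranslations(messages).gettext

url = "https://example.com"

# Lookup first, then interpolate: the msgid is a constant and matches the catalog.
translated = _("Warning, URL {} already exists").format(url)

# Interpolate first, then lookup: the msgid now contains the URL and misses the catalog.
untranslated = _(f"Warning, URL {url} already exists")

print(translated)
print(untranslated)
```

Keeping the message id constant also lets string-extraction tools pick it up, which they cannot do for an f-string whose content is only known at runtime.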


@@ -2,10 +2,11 @@ import os
import time
from flask import Blueprint, request, make_response, render_template, redirect, url_for, flash, session
from flask_login import current_user
from flask_paginate import Pagination, get_page_parameter
from flask_babel import gettext as _
from changedetectionio import forms
from changedetectionio import processors
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
@@ -38,7 +39,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
elif op == 'mute':
datastore.data['watching'][uuid].toggle_mute()
datastore.needs_write = True
datastore.data['watching'][uuid].commit()
return redirect(url_for('watchlist.index', tag = active_tag_uuid))
# Sort by last_changed and add the uuid which is usually the key..
@@ -73,7 +74,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
pagination = Pagination(page=page,
total=total_count,
per_page=datastore.data['settings']['application'].get('pager_size', 50), css_framework="semantic")
per_page=datastore.data['settings']['application'].get('pager_size', 50),
css_framework="semantic",
display_msg=_('displaying <b>{start} - {end}</b> {record_name} in total <b>{total}</b>'),
record_name=_('records'))
sorted_tags = sorted(datastore.data['settings']['application'].get('tags').items(), key=lambda x: x[1]['title'])
@@ -84,13 +88,19 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
app_rss_token=datastore.data['settings']['application'].get('rss_access_token'),
datastore=datastore,
errored_count=errored_count,
extra_classes='has-queue' if not update_q.empty() else '',
form=form,
generate_tag_colors=processors.generate_processor_badge_colors,
guid=datastore.data['app_guid'],
has_proxies=datastore.proxy_list,
hosted_sticky=os.getenv("SALTED_PASS", False) == False,
now_time_server=round(time.time()),
pagination=pagination,
queued_uuids=[q_uuid.item['uuid'] for q_uuid in update_q.queue],
processor_badge_css=processors.get_processor_badge_css(),
processor_badge_texts=processors.get_processor_badge_texts(),
processor_descriptions=processors.get_processor_descriptions(),
queue_size=update_q.qsize(),
queued_uuids=update_q.get_queued_uuids(),
search_q=request.args.get('q', '').strip(),
sort_attribute=request.args.get('sort') if request.args.get('sort') else request.cookies.get('sort'),
sort_order=request.args.get('order') if request.args.get('order') else request.cookies.get('order'),
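
Review note: the hunk above swaps the inline comprehension over `update_q.queue` for `update_q.get_queued_uuids()` and adds `queue_size=update_q.qsize()`. The implementation of `get_queued_uuids()` is not shown in this diff, so the following is only a minimal sketch of one way such a helper could work, assuming a `queue.PriorityQueue` subclass. `UuidAwareQueue` and this `PrioritizedItem` dataclass are hypothetical names modelled on the call sites, not the project's actual classes.

```python
import queue
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class PrioritizedItem:
    """Hypothetical stand-in for queuedWatchMetaData.PrioritizedItem."""
    priority: int
    item: Any = field(compare=False)  # payload is excluded from ordering


class UuidAwareQueue(queue.PriorityQueue):
    """Assumed queue type that can report the UUIDs it currently holds."""

    def get_queued_uuids(self):
        # Hold the queue's own lock so the snapshot is not taken
        # while another thread is mutating the underlying heap.
        with self.mutex:
            return [entry.item['uuid'] for entry in self.queue]


q = UuidAwareQueue()
q.put(PrioritizedItem(priority=1, item={'uuid': 'aaa'}))
q.put(PrioritizedItem(priority=5, item={'uuid': 'bbb'}))

queued = q.get_queued_uuids()
print(sorted(queued), q.qsize())
```

Moving the snapshot behind a method keeps the view code from reaching into the queue's internals, which matters if the queue type is later changed.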


@@ -1,14 +1,59 @@
{%- extends 'base.html' -%}
{%- block content -%}
{%- set tips = [
_("Changedetection.io can monitor more than just web-pages! See our plugins!") ~ ' <a href="https://changedetection.io/plugins">' ~ _('More info') ~ '</a>',
_("You can also add 'shared' watches.") ~ ' <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Sharing-a-Watch">' ~ _('More info') ~ '</a>'
] -%}
{%- from '_helpers.html' import render_simple_field, render_field, render_nolabel_field, sort_by_title -%}
<script src="{{url_for('static_content', group='js', filename='jquery-3.6.0.min.js')}}"></script>
<script src="{{url_for('static_content', group='js', filename='watch-overview.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='modal.js')}}"></script>
<script>let nowtimeserver={{ now_time_server }};</script>
<script>let favicon_baseURL="{{ url_for('static_content', group='favicon', filename="PLACEHOLDER")}}";</script>
<script>
// Initialize Feather icons after the page loads
document.addEventListener('DOMContentLoaded', function() {
feather.replace();
// Intersection Observer for lazy loading favicons
// Only load favicon images when they enter the viewport
if ('IntersectionObserver' in window) {
const faviconObserver = new IntersectionObserver((entries, observer) => {
entries.forEach(entry => {
if (entry.isIntersecting) {
const img = entry.target;
const src = img.getAttribute('data-src');
if (src) {
// Load the actual favicon
img.src = src;
img.removeAttribute('data-src');
}
// Stop observing this image
observer.unobserve(img);
}
});
}, {
// Start loading slightly before the image enters viewport
rootMargin: '50px',
threshold: 0.01
});
// Observe all lazy favicon images
document.querySelectorAll('.lazy-favicon').forEach(img => {
faviconObserver.observe(img);
});
} else {
// Fallback for older browsers: load all favicons immediately
document.querySelectorAll('.lazy-favicon').forEach(img => {
const src = img.getAttribute('data-src');
if (src) {
img.src = src;
img.removeAttribute('data-src');
}
});
}
});
</script>
<style>
@@ -18,27 +63,59 @@ document.addEventListener('DOMContentLoaded', function() {
background-repeat: no-repeat;
transition: background-size 0.9s ease
}
/* Auto-generated processor badge colors */
{{ processor_badge_css|safe }}
/* Auto-generated tag colors */
{%- for uuid, tag in tags -%}
{%- if tag and tag.title -%}
{%- set class_name = tag.title|sanitize_tag_class -%}
{%- set colors = generate_tag_colors(tag.title) -%}
.button-tag.tag-{{ class_name }} {
background-color: {{ colors['light']['bg'] }};
color: {{ colors['light']['color'] }};
}
.watch-tag-list.tag-{{ class_name }} {
background-color: {{ colors['light']['bg'] }};
color: {{ colors['light']['color'] }};
}
html[data-darkmode="true"] .button-tag.tag-{{ class_name }} {
background-color: {{ colors['dark']['bg'] }};
color: {{ colors['dark']['color'] }};
}
html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
background-color: {{ colors['dark']['bg'] }};
color: {{ colors['dark']['color'] }};
}
{%- endif -%}
{%- endfor -%}
</style>
<div class="box" id="form-quick-watch-add">
<form class="pure-form" action="{{ url_for('ui.ui_views.form_quick_watch_add', tag=active_tag_uuid) }}" method="POST" id="new-watch-form">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}" >
<fieldset>
<legend>Add a new web page change detection watch</legend>
<legend>{{ _('Add a new web page change detection watch') }}</legend>
<div id="watch-add-wrapper-zone">
{{ render_nolabel_field(form.url, placeholder="https://...", required=true) }}
{{ render_nolabel_field(form.watch_submit_button, title="Watch this URL!" ) }}
{{ render_nolabel_field(form.edit_and_watch_submit_button, title="Edit first then Watch") }}
{{ render_nolabel_field(form.watch_submit_button, title=_("Watch this URL!") ) }}
{{ render_nolabel_field(form.edit_and_watch_submit_button, title=_("Edit first then Watch") ) }}
</div>
<div id="watch-group-tag">
{{ render_field(form.tags, value=active_tag.title if active_tag_uuid else '', placeholder="Watch group / tag", class="transparent-field") }}
{{ render_field(form.tags, value=active_tag.title if active_tag_uuid else '', placeholder=_("Watch group / tag"), class="transparent-field") }}
</div>
<div id="quick-watch-processor-type">
{{ render_simple_field(form.processor) }}
</div>
</fieldset>
<span style="color:#eee; font-size: 80%;"><img alt="Create a shareable link" style="height: 1em;display:inline-block;" src="{{url_for('static_content', group='images', filename='spread-white.svg')}}" > Tip: You can also add 'shared' watches. <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Sharing-a-Watch">More info</a></span>
<span style="color:#eee; font-size: 80%;">
<strong>Tip: </strong> {{ tips | random | safe }}
</span>
</form>
</div>
<div class="box">
@@ -46,29 +123,44 @@ document.addEventListener('DOMContentLoaded', function() {
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}" >
<input type="hidden" id="op_extradata" name="op_extradata" value="" >
<div id="checkbox-operations">
<button class="pure-button button-secondary button-xsmall" name="op" value="pause"><i data-feather="pause" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Pause</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unpause"><i data-feather="play" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>UnPause</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mute"><i data-feather="volume-x" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Mute</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unmute"><i data-feather="volume-2" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>UnMute</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="recheck"><i data-feather="refresh-cw" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Recheck</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="assign-tag" id="checkbox-assign-tag"><i data-feather="tag" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Tag</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mark-viewed"><i data-feather="eye" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Mark viewed</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="notification-default"><i data-feather="bell" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Use default notification</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="clear-errors"><i data-feather="x-circle" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Clear errors</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="clear-history"><i data-feather="trash-2" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Clear/reset history</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="delete"><i data-feather="trash" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>Delete</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="pause"><i data-feather="pause" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Pause') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unpause"><i data-feather="play" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('UnPause') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mute"><i data-feather="volume-x" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Mute') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="unmute"><i data-feather="volume-2" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('UnMute') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="recheck"><i data-feather="refresh-cw" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Recheck') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="assign-tag" id="checkbox-assign-tag"><i data-feather="tag" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Tag') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="mark-viewed"><i data-feather="eye" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Mark viewed') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="notification-default"><i data-feather="bell" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Use default notification') }}</button>
<button class="pure-button button-secondary button-xsmall" name="op" value="clear-errors"><i data-feather="x-circle" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Clear errors') }}</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="clear-history"
data-requires-confirm
data-confirm-type="danger"
data-confirm-title="{{ _('Clear Histories') }}"
data-confirm-message="{{ _('<p>Are you sure you want to clear history for the selected items?</p><p>This action cannot be undone.</p>') }}"
data-confirm-button="{{ _('OK') }}"><i data-feather="trash-2" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Clear/reset history') }}</button>
<button class="pure-button button-secondary button-xsmall" style="background: #dd4242;" name="op" value="delete"
data-requires-confirm
data-confirm-type="danger"
data-confirm-title="{{ _('Delete Watches?') }}"
data-confirm-message="{{ _('<p>Are you sure you want to delete the selected watches?</p><p>This action cannot be undone.</p>') }}"
data-confirm-button="{{ _('Delete') }}"><i data-feather="trash" style="width: 14px; height: 14px; stroke: white; margin-right: 4px;"></i>{{ _('Delete') }}</button>
</div>
{%- if watches|length >= pagination.per_page -%}
{{ pagination.info }}
{%- endif -%}
{%- if search_q -%}<div id="search-result-info">Searching "<strong><i>{{search_q}}</i></strong>"</div>{%- endif -%}
<div id="stats_row">
<div class="left">{%- if watches|length >= pagination.per_page -%}{{ pagination.info }}{%- endif -%}</div>
<div class="right" >{{ _('Queued size') }}: <span id="queue-size-int">{{ queue_size }}</span></div>
</div>
{%- if search_q -%}<div id="search-result-info">{{ _('Searching') }} "<strong><i>{{search_q}}</i></strong>"</div>{%- endif -%}
<div>
<a href="{{url_for('watchlist.index')}}" class="pure-button button-tag {{'active' if not active_tag_uuid }}">All</a>
<a href="{{url_for('watchlist.index')}}" class="pure-button button-tag {{'active' if not active_tag_uuid }}">{{ _('All') }}</a>
<!-- tag list -->
{%- for uuid, tag in tags -%}
{%- if tag != "" -%}
<a href="{{url_for('watchlist.index', tag=uuid) }}" class="pure-button button-tag {{'active' if active_tag_uuid == uuid }}">{{ tag.title }}</a>
<a href="{{url_for('watchlist.index', tag=uuid) }}" class="pure-button button-tag tag-{{ tag.title|sanitize_tag_class }} {{'active' if active_tag_uuid == uuid }}">{{ tag.title }}</a>
{%- endif -%}
{%- endfor -%}
</div>
@@ -101,19 +193,19 @@ document.addEventListener('DOMContentLoaded', function() {
&nbsp;
<a class="{{ 'active '+link_order if sort_attribute == 'notification_muted' else 'inactive' }}" href="{{url_for('watchlist.index', sort='notification_muted', order=link_order, tag=active_tag_uuid)}}"><i data-feather="volume-2" style="vertical-align: bottom; width: 14px; height: 14px; margin-right: 4px;"></i><span class='arrow {{link_order}}'></span></a>
</th>
<th><a class="{{ 'active '+link_order if sort_attribute == 'label' else 'inactive' }}" href="{{url_for('watchlist.index', sort='label', order=link_order, tag=active_tag_uuid)}}">Website <span class='arrow {{link_order}}'></span></a></th>
<th><a class="{{ 'active '+link_order if sort_attribute == 'label' else 'inactive' }}" href="{{url_for('watchlist.index', sort='label', order=link_order, tag=active_tag_uuid)}}">{{ _('Website') }} <span class='arrow {{link_order}}'></span></a></th>
{%- if any_has_restock_price_processor -%}
<th>Restock &amp; Price</th>
<th>{{ _('Restock & Price') }}</th>
{%- endif -%}
<th><a class="{{ 'active '+link_order if sort_attribute == 'last_checked' else 'inactive' }}" href="{{url_for('watchlist.index', sort='last_checked', order=link_order, tag=active_tag_uuid)}}"><span class="hide-on-mobile">Last</span> Checked <span class='arrow {{link_order}}'></span></a></th>
<th><a class="{{ 'active '+link_order if sort_attribute == 'last_changed' else 'inactive' }}" href="{{url_for('watchlist.index', sort='last_changed', order=link_order, tag=active_tag_uuid)}}"><span class="hide-on-mobile">Last</span> Changed <span class='arrow {{link_order}}'></span></a></th>
<th><a class="{{ 'active '+link_order if sort_attribute == 'last_checked' else 'inactive' }}" href="{{url_for('watchlist.index', sort='last_checked', order=link_order, tag=active_tag_uuid)}}"><span class="hide-on-mobile">{{ _('Last') }}</span> {{ _('Checked') }} <span class='arrow {{link_order}}'></span></a></th>
<th><a class="{{ 'active '+link_order if sort_attribute == 'last_changed' else 'inactive' }}" href="{{url_for('watchlist.index', sort='last_changed', order=link_order, tag=active_tag_uuid)}}"><span class="hide-on-mobile">{{ _('Last') }}</span> {{ _('Changed') }} <span class='arrow {{link_order}}'></span></a></th>
<th class="empty-cell"></th>
</tr>
</thead>
<tbody>
{%- if not watches|length -%}
<tr>
<td colspan="{{ cols_required }}" style="text-wrap: wrap;">No website watches configured, please add a URL in the box above, or <a href="{{ url_for('imports.import_page')}}" >import a list</a>.</td>
<td colspan="{{ cols_required }}" style="text-wrap: wrap;">{{ _('No web page change detection watches configured, please add a URL in the box above, or') }} <a href="{{ url_for('imports.import_page')}}" >{{ _('import a list') }}</a>.</td>
</tr>
{%- endif -%}
@@ -154,39 +246,47 @@ document.addEventListener('DOMContentLoaded', function() {
<td class="title-col inline">
<div class="flex-wrapper">
{% if 'favicons_enabled' not in ui_settings or ui_settings['favicons_enabled'] %}
<div>{# A page might have hundreds of these images, so set IMG options for lazy loading; don't set SRC if we don't have it, so it doesn't fetch the placeholder #}
<img alt="Favicon thumbnail" class="favicon" loading="lazy" decoding="async" fetchpriority="low" {% if favicon %} src="{{url_for('static_content', group='favicon', filename=watch.uuid)}}" {% else %} src='data:image/svg+xml;utf8,%3Csvg xmlns="http://www.w3.org/2000/svg" width="7.087" height="7.087" viewBox="0 0 7.087 7.087"%3E%3Ccircle cx="3.543" cy="3.543" r="3.279" stroke="%23e1e1e1" stroke-width="0.45" fill="none" opacity="0.74"/%3E%3C/svg%3E' {% endif %} />
<div>
{# Intersection Observer lazy loading: store real URL in data-src, load only when visible in viewport #}
<img alt="Favicon thumbnail"
class="favicon lazy-favicon"
loading="lazy"
decoding="async"
fetchpriority="low"
{% if favicon %}
data-src="{{url_for('static_content', group='favicon', filename=watch.uuid)}}"
{% endif %}
src='data:image/svg+xml;utf8,%3Csvg xmlns="http://www.w3.org/2000/svg" width="7.087" height="7.087" viewBox="0 0 7.087 7.087"%3E%3Ccircle cx="3.543" cy="3.543" r="3.279" stroke="%23e1e1e1" stroke-width="0.45" fill="none" opacity="0.74"/%3E%3C/svg%3E'>
</div>
{% endif %}
<div>
<span class="watch-title">
{% if system_use_url_watchlist or watch.get('use_page_title_in_list') %}
{{ watch.label }}
{% else %}
{{ watch.get('title') or watch.link }}
{% endif %}
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}">&nbsp;</a>
</span>
<div class="error-text" style="display:none;">{{ watch.compile_error_texts(has_proxies=datastore.proxy_list) }}</div>
{%- if watch['processor'] and watch['processor'] in processor_badge_texts -%}
<span class="processor-badge processor-badge-{{ watch['processor'] }}" title="{{ processor_descriptions.get(watch['processor'], watch['processor']) }}">{{ processor_badge_texts[watch['processor']] }}</span>
{%- endif -%}
<span class="watch-title">
{% if system_use_url_watchlist or watch.get('use_page_title_in_list') %}
{{ watch.label }}
{% else %}
{{ watch.get('title') or watch.link }}
{% endif %}
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}">&nbsp;</a>
</span>
<div class="error-text" style="display:none;">{{ watch.compile_error_texts(has_proxies=datastore.proxy_list)|safe }}</div>
{%- if watch['processor'] == 'text_json_diff' -%}
{%- if watch['has_ldjson_price_data'] and not watch['track_ldjson_price_data'] -%}
<div class="ldjson-price-track-offer">Switch to Restock & Price watch mode? <a href="{{url_for('price_data_follower.accept', uuid=watch.uuid)}}" class="pure-button button-xsmall">Yes</a> <a href="{{url_for('price_data_follower.reject', uuid=watch.uuid)}}" class="">No</a></div>
{%- endif -%}
{%- endif -%}
{%- if watch['processor'] == 'restock_diff' -%}
<span class="tracking-ldjson-price-data" title="Automatically following embedded price information"><img src="{{url_for('static_content', group='images', filename='price-tag-icon.svg')}}" class="status-icon price-follow-tag-icon" > Price</span>
{%- endif -%}
{%- for watch_tag_uuid, watch_tag in datastore.get_all_tags_for_watch(watch['uuid']).items() -%}
<span class="watch-tag-list">{{ watch_tag.title }}</span>
<a href="{{url_for('watchlist.index', tag=watch_tag_uuid) }}" class="watch-tag-list tag-{{ watch_tag.title|sanitize_tag_class }}">{{ watch_tag.title }}</a>
{%- endfor -%}
</div>
<div class="status-icons">
<a class="link-spread" href="{{url_for('ui.form_share_put_watch', uuid=watch.uuid)}}"><img src="{{url_for('static_content', group='images', filename='spread.svg')}}" class="status-icon icon icon-spread" title="Create a link to share watch config with others" ></a>
{%- if watch.get_fetch_backend == "html_webdriver"
or ( watch.get_fetch_backend == "system" and system_default_fetcher == 'html_webdriver' )
or "extra_browser_" in watch.get_fetch_backend
-%}
<img class="status-icon" src="{{url_for('static_content', group='images', filename='google-chrome-icon.png')}}" alt="Using a Chrome browser" title="Using a Chrome browser" >
{%- set effective_fetcher = watch.get_fetch_backend if watch.get_fetch_backend != "system" else system_default_fetcher -%}
{%- if effective_fetcher and ("html_webdriver" in effective_fetcher or "html_" in effective_fetcher or "extra_browser_" in effective_fetcher) -%}
{{ effective_fetcher|fetcher_status_icons }}
{%- endif -%}
{%- if watch.is_pdf -%}<img class="status-icon" src="{{url_for('static_content', group='images', filename='pdf-icon.svg')}}" alt="Converting PDF to text" >{%- endif -%}
{%- if watch.has_browser_steps -%}<img class="status-icon status-browsersteps" src="{{url_for('static_content', group='images', filename='steps.svg')}}" alt="Browser Steps is enabled" >{%- endif -%}
@@ -198,20 +298,21 @@ document.addEventListener('DOMContentLoaded', function() {
<td class="restock-and-price">
{%- if watch['processor'] == 'restock_diff' -%}
{%- if watch.has_restock_info -%}
<span class="restock-label {{'in-stock' if watch['restock']['in_stock'] else 'not-in-stock' }}" title="Detecting restock and price">
<span class="restock-label {{'in-stock' if watch['restock']['in_stock'] else 'not-in-stock' }}" title="{{ _('Detecting restock and price') }}">
<!-- maybe some object watch['processor'][restock_diff] or.. -->
{%- if watch['restock']['in_stock']-%} In stock {%- else-%} Not in stock {%- endif -%}
{%- if watch['restock']['in_stock']-%} {{ _('In stock') }} {%- else-%} {{ _('Not in stock') }} {%- endif -%}
</span>
{%- endif -%}
{%- if watch.get('restock') and watch['restock']['price'] != None -%}
{%- if watch['restock']['price'] != None -%}
<span class="restock-label price" title="Price">
{{ watch['restock']['price']|format_number_locale }} {{ watch['restock']['currency'] }}
{%- if watch.get('restock') and watch['restock'].get('price') -%}
{%- if watch['restock']['price'] is number -%}
<span class="restock-label price" title="{{ _('Price') }}">
{{ watch['restock']['price']|format_number_locale if watch['restock'].get('price') else '' }} {{ watch['restock'].get('currency','') }}
</span>
{%- endif -%}
{%- else -%} <!-- watch['restock']['price'] is not a number, can't output it -->
{%- endif -%}
{%- elif not watch.has_restock_info -%}
<span class="restock-label error">No information</span>
<span class="restock-label error">{{ _('No information') }}</span>
{%- endif -%}
{%- endif -%}
</td>
@@ -219,24 +320,24 @@ document.addEventListener('DOMContentLoaded', function() {
{#last_checked becomes fetch-start-time#}
<td class="last-checked" data-timestamp="{{ watch.last_checked }}" data-fetchduration={{ watch.fetch_time }} data-eta_complete="{{ watch.last_checked+watch.fetch_time }}" >
<div class="spinner-wrapper" style="display:none;" >
<span class="spinner"></span><span>&nbsp;Checking now</span>
<span class="spinner"></span><span class="status-text">&nbsp;{{ _('Checking now') }}</span>
</div>
<span class="innertext">{{watch|format_last_checked_time|safe}}</span>
</td>
<td class="last-changed" data-timestamp="{{ watch.last_changed }}">{%- if watch.history_n >=2 and watch.last_changed >0 -%}
{{watch.last_changed|format_timestamp_timeago}}
{%- else -%}
Not yet
{{ _('Not yet') }}
{%- endif -%}
</td>
<td class="buttons">
<div>
{%- set target_attr = ' target="' ~ watch.uuid ~ '"' if datastore.data['settings']['application']['ui'].get('open_diff_in_new_tab') else '' -%}
<a href="" class="already-in-queue-button recheck pure-button pure-button-primary" style="display: none;" disabled="disabled">Queued</a>
<a href="{{ url_for('ui.form_watch_checknow', uuid=watch.uuid, tag=request.args.get('tag')) }}" data-op='recheck' class="ajax-op recheck pure-button pure-button-primary">Recheck</a>
<a href="{{ url_for('ui.ui_edit.edit_page', uuid=watch.uuid, tag=active_tag_uuid)}}#general" class="pure-button pure-button-primary">Edit</a>
<a href="{{ url_for('ui.ui_views.diff_history_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary history-link" style="display: none;">History</a>
<a href="{{ url_for('ui.ui_views.preview_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary preview-link" style="display: none;">Preview</a>
<a href="" class="already-in-queue-button recheck pure-button pure-button-primary" style="display: none;" disabled="disabled">{{ _('Queued') }}</a>
<a href="{{ url_for('ui.form_watch_checknow', uuid=watch.uuid, tag=request.args.get('tag')) }}" data-op='recheck' class="ajax-op recheck pure-button pure-button-primary">{{ _('Recheck') }}</a>
<a href="{{ url_for('ui.ui_edit.edit_page', uuid=watch.uuid, tag=active_tag_uuid)}}#general" class="pure-button pure-button-primary">{{ _('Edit') }}</a>
<a href="{{ url_for('ui.ui_diff.diff_history_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary history-link" style="display: none;">{{ _('History') }}</a>
<a href="{{ url_for('ui.ui_preview.preview_page', uuid=watch.uuid)}}" {{target_attr}} class="pure-button pure-button-primary preview-link" style="display: none;">{{ _('Preview') }}</a>
</div>
</td>
</tr>
@@ -245,22 +346,21 @@ document.addEventListener('DOMContentLoaded', function() {
</table>
<ul id="post-list-buttons">
<li id="post-list-with-errors" style="display: none;" >
<a href="{{url_for('watchlist.index', with_errors=1, tag=request.args.get('tag')) }}" class="pure-button button-tag button-error">With errors ({{ errored_count }})</a>
<a href="{{url_for('watchlist.index', with_errors=1, tag=request.args.get('tag')) }}" class="pure-button button-tag button-error">{{ _('With errors') }} ({{ errored_count }})</a>
</li>
<li id="post-list-mark-views" style="display: none;" >
<a href="{{url_for('ui.mark_all_viewed',with_errors=request.args.get('with_errors',0)) }}" class="pure-button button-tag " id="mark-all-viewed">Mark all viewed</a>
<a href="{{url_for('ui.mark_all_viewed',with_errors=request.args.get('with_errors',0)) }}" class="pure-button button-tag " id="mark-all-viewed">{{ _('Mark all viewed') }}</a>
</li>
{%- if active_tag_uuid -%}
<li id="post-list-mark-views-tag">
<a href="{{url_for('ui.mark_all_viewed', tag=active_tag_uuid) }}" class="pure-button button-tag " id="mark-all-viewed">Mark all viewed in '{{active_tag.title}}'</a>
<a href="{{url_for('ui.mark_all_viewed', tag=active_tag_uuid) }}" class="pure-button button-tag " id="mark-all-viewed">{{ _("Mark all viewed in '%(title)s'", title=active_tag.title) }}</a>
</li>
{%- endif -%}
<li id="post-list-unread" style="display: none;" >
<a href="{{url_for('watchlist.index', unread=1, tag=request.args.get('tag')) }}" class="pure-button button-tag">Unread (<span id="unread-tab-counter">{{ unread_changes_count }}</span>)</a>
<a href="{{url_for('watchlist.index', unread=1, tag=request.args.get('tag')) }}" class="pure-button button-tag">{{ _('Unread') }} (<span id="unread-tab-counter">{{ unread_changes_count }}</span>)</a>
</li>
<li>
<a href="{{ url_for('ui.form_watch_checknow', tag=active_tag_uuid, with_errors=request.args.get('with_errors',0)) }}" class="pure-button button-tag" id="recheck-all">Recheck
all {% if active_tag_uuid %} in '{{active_tag.title}}'{%endif%}</a>
<a href="{{ url_for('ui.form_watch_checknow', tag=active_tag_uuid, with_errors=request.args.get('with_errors',0)) }}" class="pure-button button-tag" id="recheck-all">{{ _('Recheck all') }} {% if active_tag_uuid %} {{ _("in '%(title)s'", title=active_tag.title) }}{%endif%}</a>
</li>
<li>
<a href="{{ url_for('rss.feed', tag=active_tag_uuid, token=app_rss_token)}}"><img alt="RSS Feed" id="feed-icon" src="{{url_for('static_content', group='images', filename='generic_feed-icon.svg')}}" height="15"></a>


@@ -8,6 +8,17 @@ from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT
from changedetectionio.content_fetchers.base import manage_user_agent
from changedetectionio.jinja2_custom import render as jinja_render
def browser_steps_get_valid_steps(browser_steps: list):
if browser_steps is not None and len(browser_steps):
valid_steps = list(filter(
lambda s: (s['operation'] and len(s['operation']) and s['operation'] != 'Choose one'),browser_steps))
# Just in case they selected Goto site by accident with older JS
if valid_steps and valid_steps[0]['operation'] == 'Goto site':
del(valid_steps[0])
return valid_steps
return []
# Two flags, tell the JS which of the "Selector" or "Value" field should be enabled in the front end
@@ -439,7 +450,7 @@ class browsersteps_live_ui(steppable_browser_interface):
logger.warning("Attempted to get current state after cleanup")
return (None, None)
xpath_element_js = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text()
xpath_element_js = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text(encoding="utf-8")
now = time.time()
await self.page.wait_for_timeout(1 * 1000)
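A minimal standalone sketch of the filtering done by `browser_steps_get_valid_steps()` in the hunk above, for readers who want to see its effect on sample data. The step dictionaries and their `selector` key are illustrative assumptions, and the sketch uses `dict.get()` where the original indexes `s['operation']` directly.

```python
def browser_steps_get_valid_steps(browser_steps):
    """Keep only steps with a real operation; drop a leading 'Goto site' step."""
    if not browser_steps:
        return []
    valid_steps = [
        s for s in browser_steps
        # Assumption: .get() is used here so a step without the key is skipped
        if s.get('operation') and s['operation'] != 'Choose one'
    ]
    # Same guard as the diff: older JS could leave 'Goto site' as the first step
    if valid_steps and valid_steps[0]['operation'] == 'Goto site':
        del valid_steps[0]
    return valid_steps


# Hypothetical sample data, not taken from the repository
steps = [
    {'operation': 'Goto site', 'selector': ''},
    {'operation': 'Choose one', 'selector': ''},
    {'operation': 'Click element', 'selector': '#accept'},
    {'operation': '', 'selector': ''},
]
result = browser_steps_get_valid_steps(steps)
empty_result = browser_steps_get_valid_steps(None)
```

Returning `[]` instead of `None` for empty input, as the moved function now does, lets callers iterate over the result without a separate `None` check.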


@@ -1,3 +1,7 @@
"""
Levenshtein distance and similarity plugin for text change detection.
Provides metrics for measuring text similarity between snapshots.
"""
import pluggy
from loguru import logger


@@ -1,3 +1,7 @@
"""
Word count plugin for content analysis.
Provides word count metrics for snapshot content.
"""
import pluggy
from loguru import logger


@@ -7,6 +7,9 @@ import os
# Visual Selector scraper - 'Button' is there because some sites have <button>OUT OF STOCK</button>.
visualselector_xpath_selectors = 'div,span,form,table,tbody,tr,td,a,p,ul,li,h1,h2,h3,h4,header,footer,section,article,aside,details,main,nav,section,summary,button'
# Import hookimpl from centralized pluggy interface
from changedetectionio.pluggy_interface import hookimpl
SCREENSHOT_MAX_HEIGHT_DEFAULT = 20000
SCREENSHOT_DEFAULT_QUALITY = 40
@@ -18,7 +21,9 @@ SCREENSHOT_MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_
# The size at which we will switch to stitching method, when below this (and
# MAX_TOTAL_HEIGHT which can be set by a user) we will use the default
# screenshot method.
SCREENSHOT_SIZE_STITCH_THRESHOLD = 8000
# Increased from 8000 to 10000 for better performance (fewer chunks = faster)
# Most modern GPUs support 16384x16384 textures, so 1280x10000 is safe
SCREENSHOT_SIZE_STITCH_THRESHOLD = int(os.getenv("SCREENSHOT_CHUNK_HEIGHT", 10000))
# available_fetchers() will scan this implementation looking for anything starting with html_
# this information is used in the form selections
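To illustrate the "fewer chunks" comment in the hunk above, here is a small sketch of how the chunk height relates to the number of screenshots taken. It assumes the capture loop steps from 0 up to `min(page_height, max_total_height)` in increments of the chunk height, as the Playwright fetcher later in this diff does. The helper names are hypothetical.

```python
import os


def configured_chunk_height(default=10000):
    """Read the chunk height from the environment, falling back to the new default."""
    return int(os.getenv("SCREENSHOT_CHUNK_HEIGHT", default))


def chunk_offsets(page_height, step_size, max_total_height):
    """Return the scroll offsets at which a chunk screenshot would be taken."""
    return list(range(0, min(page_height, max_total_height), step_size))


# A 20000px page capped at 20000px total height (hypothetical numbers)
old_offsets = chunk_offsets(20000, 8000, 20000)   # three chunks with the old value
new_offsets = chunk_offsets(20000, 10000, 20000)  # two chunks with the new default
```

With these sample numbers the larger step removes one screenshot call and one image from the stitching stage.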
@@ -35,17 +40,54 @@ def available_fetchers():
# See the if statement at the bottom of this file for how we switch between playwright and webdriver
import inspect
p = []
# Get built-in fetchers (but skip plugin fetchers that were added via setattr)
for name, obj in inspect.getmembers(sys.modules[__name__], inspect.isclass):
if inspect.isclass(obj):
# @todo html_ is maybe better as fetcher_ or something
# In this case, make sure to edit the default one in store.py and fetch_site_status.py
if name.startswith('html_'):
t = tuple([name, obj.fetcher_description])
p.append(t)
# Skip plugin fetchers that were already registered
if name not in _plugin_fetchers:
t = tuple([name, obj.fetcher_description])
p.append(t)
# Get plugin fetchers from cache (already loaded at module init)
for name, fetcher_class in _plugin_fetchers.items():
if hasattr(fetcher_class, 'fetcher_description'):
t = tuple([name, fetcher_class.fetcher_description])
p.append(t)
else:
logger.warning(f"Plugin fetcher '{name}' does not have fetcher_description attribute")
return p
def get_plugin_fetchers():
"""Load and return all plugin fetchers from the centralized plugin manager."""
from changedetectionio.pluggy_interface import plugin_manager
fetchers = {}
try:
# Call the register_content_fetcher hook from all registered plugins
results = plugin_manager.hook.register_content_fetcher()
for result in results:
if result:
name, fetcher_class = result
fetchers[name] = fetcher_class
# Register in current module so hasattr() checks work
setattr(sys.modules[__name__], name, fetcher_class)
logger.info(f"Registered plugin fetcher: {name} - {getattr(fetcher_class, 'fetcher_description', 'No description')}")
except Exception as e:
logger.error(f"Error loading plugin fetchers: {e}")
return fetchers
# Initialize plugins at module load time
_plugin_fetchers = get_plugin_fetchers()
# Decide which is the 'real' HTML webdriver, this is more a system wide config
# rather than site-specific.
use_playwright_as_chrome_fetcher = os.getenv('PLAYWRIGHT_DRIVER_URL', False)
@@ -62,3 +104,8 @@ else:
logger.debug("Falling back to selenium as fetcher")
from .webdriver_selenium import fetcher as html_webdriver
# Register built-in fetchers as plugins after all imports are complete
from changedetectionio.pluggy_interface import register_builtin_fetchers
register_builtin_fetchers()
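The de-duplication in `available_fetchers()` can be hard to follow from the diff alone, so the following is a standalone sketch of the same idea using plain dictionaries instead of `inspect` and the real plugin manager. The class names, descriptions and the `module_members`/`plugins` arguments are hypothetical stand-ins. In the actual code, plugin classes are attached to the module with `setattr()`, which is why they have to be skipped in the built-in pass.

```python
import logging

logger = logging.getLogger(__name__)


class html_requests:
    fetcher_description = "Built-in HTTP fetcher (placeholder text)"


class html_custom:
    fetcher_description = "Plugin fetcher (placeholder text)"


class html_broken:
    """A plugin class that forgot to define fetcher_description."""


def available_fetchers(module_members, plugins):
    """Return (name, description) pairs, listing each fetcher exactly once."""
    p = []
    # Built-in pass: anything named html_* that was not injected by a plugin
    for name, obj in module_members.items():
        if name.startswith('html_') and name not in plugins:
            p.append((name, obj.fetcher_description))
    # Plugin pass: only plugins that carry a description are listed
    for name, fetcher_class in plugins.items():
        if hasattr(fetcher_class, 'fetcher_description'):
            p.append((name, fetcher_class.fetcher_description))
        else:
            logger.warning("Plugin fetcher '%s' has no fetcher_description", name)
    return p


plugins = {'html_custom': html_custom, 'html_broken': html_broken}
# Simulates the module namespace after plugins were attached to it
module_members = {'html_requests': html_requests, 'not_a_fetcher': object, **plugins}
fetchers = available_fetchers(module_members, plugins)
```

Without the `name not in plugins` check, `html_custom` would appear twice in the form selections, once from each pass.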


@@ -38,7 +38,6 @@ def manage_user_agent(headers, current_ua=''):
return None
class Fetcher():
browser_connection_is_custom = None
browser_connection_url = None
@@ -51,6 +50,7 @@ class Fetcher():
favicon_blob = None
instock_data = None
instock_data_js = ""
screenshot_format = None
status_code = None
webdriver_js_execute_code = None
xpath_data = None
@@ -64,6 +64,44 @@ class Fetcher():
# Time on top of the system-defined env minimum time
render_extract_delay = 0
# Fetcher capability flags - subclasses should override these
# These indicate what features the fetcher supports
supports_browser_steps = False # Can execute browser automation steps
supports_screenshots = False # Can capture page screenshots
supports_xpath_element_data = False # Can extract xpath element positions/data for visual selector
# Screenshot element locking - prevents layout shifts during screenshot capture
# Only needed for visual comparison (image_ssim_diff processor)
# Locks element dimensions in the first viewport to prevent headers/ads from resizing
lock_viewport_elements = False # Default: disabled for performance
def __init__(self, **kwargs):
if kwargs and 'screenshot_format' in kwargs:
self.screenshot_format = kwargs.get('screenshot_format')
# Allow lock_viewport_elements to be set via kwargs
if kwargs and 'lock_viewport_elements' in kwargs:
self.lock_viewport_elements = kwargs.get('lock_viewport_elements')
@classmethod
def get_status_icon_data(cls):
"""Return data for status icon to display in the watch overview.
This method can be overridden by subclasses to provide custom status icons.
Returns:
dict or None: Dictionary with icon data:
{
'filename': 'icon-name.svg', # Icon filename
'alt': 'Alt text', # Alt attribute
'title': 'Tooltip text', # Title attribute
'style': 'height: 1em;' # Optional inline CSS
}
Or None if no icon
"""
return None
def clear_content(self):
"""
Explicitly clear all content from memory to free up heap space.
@@ -92,12 +130,13 @@ class Fetcher():
request_method=None,
timeout=None,
url=None,
watch_uuid=None,
):
# Should set self.error, self.status_code and self.content
pass
@abstractmethod
def quit(self, watch=None):
async def quit(self, watch=None):
return
@abstractmethod
@@ -123,30 +162,16 @@ class Fetcher():
"""
return {k.lower(): v for k, v in self.headers.items()}
def browser_steps_get_valid_steps(self):
if self.browser_steps is not None and len(self.browser_steps):
valid_steps = list(filter(
lambda s: (s['operation'] and len(s['operation']) and s['operation'] != 'Choose one'),
self.browser_steps))
# Just in case they selected Goto site by accident with older JS
if valid_steps and valid_steps[0]['operation'] == 'Goto site':
del(valid_steps[0])
return valid_steps
return None
async def iterate_browser_steps(self, start_url=None):
from changedetectionio.blueprint.browser_steps.browser_steps import steppable_browser_interface
from changedetectionio.browser_steps.browser_steps import steppable_browser_interface, browser_steps_get_valid_steps
from playwright._impl._errors import TimeoutError, Error
from changedetectionio.jinja2_custom import render as jinja_render
step_n = 0
if self.browser_steps is not None and len(self.browser_steps):
if self.browser_steps:
interface = steppable_browser_interface(start_url=start_url)
interface.page = self.page
valid_steps = self.browser_steps_get_valid_steps()
valid_steps = browser_steps_get_valid_steps(self.browser_steps)
for step in valid_steps:
step_n += 1
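As a rough illustration of the capability flags and `get_status_icon_data()` hook added to `Fetcher` in this file, the sketch below shows a trimmed-down base class, a subclass that opts in, and a consumer. `BrowserFetcher`, `render_status_icon()` and the `/static/images/` path are hypothetical. The real template filter (`fetcher_status_icons`) is not shown in this diff and may work differently.

```python
from html import escape


class Fetcher:
    # Capability flags: subclasses override what they support
    supports_browser_steps = False
    supports_screenshots = False
    supports_xpath_element_data = False
    lock_viewport_elements = False
    screenshot_format = None

    def __init__(self, **kwargs):
        if 'screenshot_format' in kwargs:
            self.screenshot_format = kwargs['screenshot_format']
        if 'lock_viewport_elements' in kwargs:
            self.lock_viewport_elements = kwargs['lock_viewport_elements']

    @classmethod
    def get_status_icon_data(cls):
        return None  # No icon by default


class BrowserFetcher(Fetcher):
    supports_browser_steps = True
    supports_screenshots = True
    supports_xpath_element_data = True

    @classmethod
    def get_status_icon_data(cls):
        return {
            'filename': 'google-chrome-icon.png',
            'alt': 'Using a Chrome browser',
            'title': 'Using a Chrome browser',
        }


def render_status_icon(fetcher_cls):
    """Hypothetical consumer: turn the icon data into an <img> tag, or ''."""
    data = fetcher_cls.get_status_icon_data()
    if not data:
        return ''
    src = '/static/images/' + escape(data['filename'])  # assumed static path
    alt = escape(data['alt'])
    title = escape(data['title'])
    return '<img class="status-icon" src="%s" alt="%s" title="%s">' % (src, alt, title)


plain_icon = render_status_icon(Fetcher)
browser_icon = render_status_icon(BrowserFetcher)
png_fetcher = BrowserFetcher(screenshot_format='PNG')
```

Keeping the flags as class attributes means callers can check `fetcher_cls.supports_screenshots` without instantiating a fetcher.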


@@ -1,3 +1,5 @@
import asyncio
import gc
import json
import os
from urllib.parse import urlparse
@@ -7,69 +9,141 @@ from loguru import logger
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_MAX_TOTAL_HEIGHT, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, FAVICON_FETCHER_JS
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable, \
BrowserStepsStepException
async def capture_full_page_async(page):
async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import os
import time
from multiprocessing import Process, Pipe
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport_size
dimensions_time = time.time() - setup_start
logger.debug(f"Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width}")
logger.debug(f"{watch_info}Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
# Use an approach similar to puppeteer: set a larger viewport and take screenshots in chunks
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD # Size that won't cause GPU to overflow
screenshot_chunks = []
y = 0
elements_locked = False
# Only lock viewport elements if explicitly enabled (for image_ssim_diff processor)
# This prevents headers/ads from resizing when viewport changes
if lock_viewport_elements and page_height > page.viewport_size['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'lock-elements-sizing.js')
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
await page.evaluate(lock_elements_js)
elements_locked = True
lock_time = time.time() - lock_start
logger.debug(f"{watch_info}Viewport element locking enabled (took {lock_time:.2f}s)")
if page_height > page.viewport_size['height']:
if page_height < step_size:
step_size = page_height # In case the page is bigger than the default viewport but smaller than the proposed step size
logger.debug(f"Setting bigger viewport to step through large page width W{page.viewport_size['width']}xH{step_size} because page_height > viewport_size")
viewport_start = time.time()
logger.debug(f"{watch_info}Setting bigger viewport to step through large page width W{page.viewport_size['width']}xH{step_size} because page_height > viewport_size")
# Set viewport to a larger size to capture more content at once
await page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
viewport_time = time.time() - viewport_start
logger.debug(f"{watch_info}Viewport changed to {page.viewport_size['width']}x{step_size} (took {viewport_time:.2f}s)")
# Capture screenshots in chunks up to the max total height
capture_start = time.time()
chunk_times = []
# Use PNG for better quality (no compression artifacts), JPEG for smaller size
screenshot_type = screenshot_format.lower() if screenshot_format else 'jpeg'
# PNG should use quality 100, JPEG uses configurable quality
screenshot_quality = 100 if screenshot_type == 'png' else int(os.getenv("SCREENSHOT_QUALITY", 72))
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
# Only scroll if not at the top (y > 0)
if y > 0:
await page.evaluate(f"window.scrollTo(0, {y})")
# Request GC only before screenshot (not 3x per chunk)
await page.request_gc()
await page.evaluate(f"window.scrollTo(0, {y})")
await page.request_gc()
screenshot_chunks.append(await page.screenshot(
type="jpeg",
full_page=False,
quality=int(os.getenv("SCREENSHOT_QUALITY", 72))
))
screenshot_kwargs = {
'type': screenshot_type,
'full_page': False
}
# Only pass quality parameter for jpeg (PNG doesn't support it in Playwright)
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
await page.request_gc()
# Restore original viewport size
await page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
# Unlock element dimensions if they were locked
if elements_locked:
unlock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'unlock-elements-sizing.js')
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
logger.debug(f"{watch_info}Element dimensions unlocked after screenshot capture")
capture_time = time.time() - capture_start
total_capture_time = sum(chunk_times)
logger.debug(f"{watch_info}All {len(screenshot_chunks)} chunks captured in {capture_time:.2f}s (total chunk time: {total_capture_time:.2f}s)")
# If we have multiple chunks, stitch them together
if len(screenshot_chunks) > 1:
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker
logger.debug(f"Screenshot stitching {len(screenshot_chunks)} chunks together")
parent_conn, child_conn = Pipe()
p = Process(target=stitch_images_worker, args=(child_conn, screenshot_chunks, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
stitch_start = time.time()
logger.debug(f"{watch_info}Starting stitching of {len(screenshot_chunks)} chunks")
# Always use spawn subprocess for ANY stitching (2+ chunks)
# PIL allocates at C level and Python GC never releases it - subprocess exit forces OS to reclaim
# Trade-off: 35MB resource_tracker vs 500MB+ PIL leak in main process
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
# Send via raw bytes (no pickle)
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"Screenshot (chunked/stitched) - Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
# Explicit cleanup
del screenshot_chunks
del p
del parent_conn, child_conn
screenshot_chunks = None
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
setup_time = total_time - capture_time
logger.debug(
f"Screenshot Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
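The stitching path above sends the chunks to the worker as a count followed by raw byte messages rather than as a pickled list. The sketch below shows that length-prefixed framing in isolation. The receiving side is an assumption, since `stitch_images_worker_raw_bytes` is not part of this diff, and both pipe ends are used in a single process here purely to keep the example small.

```python
import struct
from multiprocessing import Pipe


def send_chunks(conn, chunks):
    """Send the number of chunks as an unsigned int, then each chunk as raw bytes."""
    conn.send_bytes(struct.pack('I', len(chunks)))
    for chunk in chunks:
        conn.send_bytes(chunk)


def recv_chunks(conn):
    """Assumed worker-side counterpart: read the count, then that many messages."""
    (count,) = struct.unpack('I', conn.recv_bytes())
    return [conn.recv_bytes() for _ in range(count)]


parent_conn, child_conn = Pipe()
# Tiny placeholder payloads; real chunks would be encoded screenshot images
chunks = [b'chunk-one', b'chunk-two', b'chunk-three']
send_chunks(parent_conn, chunks)
received = recv_chunks(child_conn)
parent_conn.close()
child_conn.close()
```

In the real code the receiving end runs in a process started from a `spawn` context, so the memory PIL allocates while stitching is returned to the OS when that process exits.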
@@ -89,8 +163,22 @@ class fetcher(Fetcher):
proxy = None
def __init__(self, proxy_override=None, custom_browser_connection_url=None):
super().__init__()
# Capability flags
supports_browser_steps = True
supports_screenshots = True
supports_xpath_element_data = True
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for Playwright fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
self.browser_type = os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').strip('"')
@@ -125,22 +213,36 @@ class fetcher(Fetcher):
async def screenshot_step(self, step_n=''):
super().screenshot_step(step_n=step_n)
screenshot = await capture_full_page_async(page=self.page)
watch_uuid = getattr(self, 'watch_uuid', None)
screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Request GC immediately after screenshot to free memory
# Screenshots can be large and browser steps take many of them
await self.page.request_gc()
if self.browser_steps_screenshot_path is not None:
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
logger.debug(f"Saving step screenshot to {destination}")
with open(destination, 'wb') as f:
f.write(screenshot)
# Clear local reference to allow screenshot bytes to be collected
del screenshot
gc.collect()
async def save_step_html(self, step_n):
super().save_step_html(step_n=step_n)
content = await self.page.content()
# Request GC after getting page content
await self.page.request_gc()
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.html'.format(step_n))
logger.debug(f"Saving step HTML to {destination}")
with open(destination, 'w', encoding='utf-8') as f:
f.write(content)
# Clear local reference
del content
gc.collect()
async def run(self,
fetch_favicon=True,
@@ -151,14 +253,17 @@ class fetcher(Fetcher):
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
from playwright.async_api import async_playwright
import playwright._impl._errors
import time
self.delete_browser_steps_screenshots()
self.watch_uuid = watch_uuid # Store for use in screenshot_step
response = None
async with async_playwright() as p:
@@ -190,7 +295,7 @@ class fetcher(Fetcher):
self.page.on("console", lambda msg: logger.debug(f"Playwright console: Watch URL: {url} {msg.type}: {msg.text} {msg.args}"))
# Re-use as much code from browser steps as possible so it's the same
from changedetectionio.blueprint.browser_steps.browser_steps import steppable_browser_interface
from changedetectionio.browser_steps.browser_steps import steppable_browser_interface
browsersteps_interface = steppable_browser_interface(start_url=url)
browsersteps_interface.page = self.page
@@ -244,7 +349,8 @@ class fetcher(Fetcher):
logger.error(f"Error fetching FavIcon info {str(e)}, continuing.")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page_async(self.page)
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Finally block will handle cleanup
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
if not empty_pages_are_a_change and len((await self.page.content()).strip()) == 0:
@@ -253,81 +359,113 @@ class fetcher(Fetcher):
await browser.close()
raise EmptyReply(url=url, status_code=response.status)
# Run Browser Steps here
if self.browser_steps_get_valid_steps():
await self.iterate_browser_steps(start_url=url)
await self.page.wait_for_timeout(extra_wait * 1000)
now = time.time()
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
if current_include_filters is not None:
await self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
else:
await self.page.evaluate("var include_filters=''")
await self.page.request_gc()
# request_gc before and after evaluate to free up memory
# @todo browsersteps etc
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
})
await self.page.request_gc()
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
await self.page.request_gc()
self.content = await self.page.content()
await self.page.request_gc()
logger.debug(f"Scrape xPath element data in browser done in {time.time() - now:.2f}s")
# Bug 3 in Playwright screenshot handling
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
# JPEG is better here because the screenshots can be very very large
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
# Wrap remaining operations in try/finally to ensure cleanup
try:
# Run Browser Steps here
if self.browser_steps:
try:
await self.iterate_browser_steps(start_url=url)
except BrowserStepsStepException:
# Finally block will handle cleanup
raise
await self.page.wait_for_timeout(extra_wait * 1000)
now = time.time()
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
if current_include_filters is not None:
await self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
else:
await self.page.evaluate("var include_filters=''")
await self.page.request_gc()
# request_gc before and after evaluate to free up memory
# @todo browsersteps etc
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
})
await self.page.request_gc()
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
await self.page.request_gc()
self.content = await self.page.content()
await self.page.request_gc()
logger.debug(f"Scrape xPath element data in browser done in {time.time() - now:.2f}s")
# Bug 3 in Playwright screenshot handling
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
# JPEG is better here because the screenshots can be very very large
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
# The actual screenshot - this is always base64 and needs decoding! horrible! huge CPU usage
self.screenshot = await capture_full_page_async(page=self.page)
self.screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
except Exception as e:
# It's likely the screenshot was too long/big and something crashed
# Force aggressive memory cleanup - screenshots are large and base64 decode creates temporary buffers
await self.page.request_gc()
gc.collect()
except ScreenshotUnavailable:
# Re-raise screenshot unavailable exceptions
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
finally:
# Request garbage collection one more time before closing
# Clean up resources properly with timeouts to prevent hanging
try:
await self.page.request_gc()
except:
pass
# Clean up resources properly
try:
await self.page.request_gc()
except:
pass
if hasattr(self, 'page') and self.page:
await self.page.request_gc()
await asyncio.wait_for(self.page.close(), timeout=5.0)
logger.debug(f"Successfully closed page for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing page for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing page for {url}: {e}")
finally:
self.page = None
try:
await self.page.close()
except:
pass
self.page = None
if context:
await asyncio.wait_for(context.close(), timeout=5.0)
logger.debug(f"Successfully closed context for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing context for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing context for {url}: {e}")
finally:
context = None
try:
await context.close()
except:
pass
context = None
if browser:
await asyncio.wait_for(browser.close(), timeout=5.0)
logger.debug(f"Successfully closed browser connection for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing browser connection for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing browser for {url}: {e}")
finally:
browser = None
try:
await browser.close()
except:
pass
browser = None
# Force Python GC to release Playwright resources immediately
# Playwright objects can have circular references that delay cleanup
gc.collect()
# Plugin registration for built-in fetcher
class PlaywrightFetcherPlugin:
"""Plugin class that registers the Playwright fetcher as a built-in plugin."""
def register_content_fetcher(self):
"""Register the Playwright fetcher"""
return ('html_webdriver', fetcher)
# Create module-level instance for plugin registration
playwright_plugin = PlaywrightFetcherPlugin()
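
The cleanup in the `finally` block above wraps each `close()` call in `asyncio.wait_for(..., timeout=5.0)` so that a hung page, context, or browser cannot stall the worker. A minimal sketch of that pattern follows; `FakeResource` and `close_with_timeout` are hypothetical names used only for illustration, not part of changedetection.io or Playwright.

```python
import asyncio


class FakeResource:
    """Hypothetical stand-in for a page, context, or browser handle."""

    def __init__(self, name, close_delay=0.0):
        self.name = name
        self.close_delay = close_delay
        self.closed = False

    async def close(self):
        # Simulate a close() that may take a while (or hang)
        await asyncio.sleep(self.close_delay)
        self.closed = True


async def close_with_timeout(resource, timeout=5.0):
    """Close a resource, giving up after `timeout` seconds.

    Returns "closed", "timeout", "error" or "skipped" and never raises,
    so the caller can continue cleaning up the next resource.
    """
    if resource is None:
        return "skipped"
    try:
        await asyncio.wait_for(resource.close(), timeout=timeout)
        return "closed"
    except asyncio.TimeoutError:
        return "timeout"
    except Exception:
        return "error"


async def demo():
    fast = FakeResource("page")
    slow = FakeResource("browser", close_delay=1.0)
    return [
        await close_with_timeout(fast, timeout=0.5),
        await close_with_timeout(slow, timeout=0.05),
        await close_with_timeout(None),
    ]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the helper never raises, one resource failing to close does not prevent the following ones from being attempted, which appears to be the intent of the per-resource `try`/`except`/`finally` blocks in the diff.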


@@ -1,4 +1,5 @@
import asyncio
import gc
import json
import os
import websockets.exceptions
@@ -20,18 +21,20 @@ from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
async def capture_full_page(page):
async def capture_full_page(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import os
import time
from multiprocessing import Process, Pipe
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport
dimensions_time = time.time() - setup_start
logger.debug(f"Puppeteer viewport size {page.viewport} page height {page_height} page width {page_width}")
logger.debug(f"{watch_info}Puppeteer viewport size {page.viewport} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
# Bug 3 in Playwright screenshot handling
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
@@ -41,48 +44,124 @@ async def capture_full_page(page):
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
# Use PNG for better quality (no compression artifacts), JPEG for smaller size
screenshot_type = screenshot_format.lower() if screenshot_format else 'jpeg'
# PNG should use quality 100, JPEG uses configurable quality
screenshot_quality = 100 if screenshot_type == 'png' else int(os.getenv("SCREENSHOT_QUALITY", 72))
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD # Something that will not cause the GPU to overflow when taking the screenshot
screenshot_chunks = []
y = 0
elements_locked = False
# Only lock viewport elements if explicitly enabled (for image_ssim_diff processor)
# This prevents headers/ads from resizing when viewport changes
if lock_viewport_elements and page_height > page.viewport['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'lock-elements-sizing.js')
file_read_start = time.time()
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
file_read_time = time.time() - file_read_start
evaluate_start = time.time()
await page.evaluate(lock_elements_js)
evaluate_time = time.time() - evaluate_start
elements_locked = True
lock_time = time.time() - lock_start
logger.debug(f"{watch_info}Viewport element locking enabled - File read: {file_read_time:.3f}s, Browser evaluate: {evaluate_time:.2f}s, Total: {lock_time:.2f}s")
if page_height > page.viewport['height']:
if page_height < step_size:
step_size = page_height # In case the page is bigger than the default viewport but smaller than the proposed step size
viewport_start = time.time()
await page.setViewport({'width': page.viewport['width'], 'height': step_size})
viewport_time = time.time() - viewport_start
logger.debug(f"{watch_info}Viewport changed to {page.viewport['width']}x{step_size} (took {viewport_time:.2f}s)")
capture_start = time.time()
chunk_times = []
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
# Better than scrollTo in case the page overrides it
await page.evaluate(
"""(y) => {
document.documentElement.scrollTop = y;
document.body.scrollTop = y;
const el = document.scrollingElement;
if (el) el.scrollTop = y;
}""",
y
)
screenshot_chunks.append(await page.screenshot(type_='jpeg',
fullPage=False,
quality=int(os.getenv("SCREENSHOT_QUALITY", 72))))
screenshot_kwargs = {
'type_': screenshot_type,
'fullPage': False
}
# PNG doesn't support quality parameter in Puppeteer
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
await page.setViewport({'width': original_viewport['width'], 'height': original_viewport['height']})
# Unlock element dimensions if they were locked
if elements_locked:
unlock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'unlock-elements-sizing.js')
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
logger.debug(f"{watch_info}Element dimensions unlocked after screenshot capture")
capture_time = time.time() - capture_start
total_capture_time = sum(chunk_times)
logger.debug(f"{watch_info}All {len(screenshot_chunks)} chunks captured in {capture_time:.2f}s (total chunk time: {total_capture_time:.2f}s)")
if len(screenshot_chunks) > 1:
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker
logger.debug(f"Screenshot stitching {len(screenshot_chunks)} chunks together")
parent_conn, child_conn = Pipe()
p = Process(target=stitch_images_worker, args=(child_conn, screenshot_chunks, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
stitch_start = time.time()
logger.debug(f"{watch_info}Starting stitching of {len(screenshot_chunks)} chunks")
# Always use spawn subprocess for ANY stitching (2+ chunks)
# PIL allocates at C level and Python GC never releases it - subprocess exit forces OS to reclaim
# Trade-off: 35MB resource_tracker vs 500MB+ PIL leak in main process
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
# Send via raw bytes (no pickle)
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
logger.debug(
f"Screenshot (chunked/stitched) - Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
screenshot_chunks = None
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
setup_time = total_time - capture_time
logger.debug(
f"Screenshot Page height: {page_height} Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT} - Stitched together in {time.time() - start:.2f}s")
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
@@ -93,13 +172,27 @@ class fetcher(Fetcher):
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
fetcher_description += " via '{}'".format(os.getenv("PLAYWRIGHT_DRIVER_URL"))
browser = None
browser_type = ''
command_executor = ''
proxy = None
def __init__(self, proxy_override=None, custom_browser_connection_url=None):
super().__init__()
# Capability flags
supports_browser_steps = True
supports_screenshots = True
supports_xpath_element_data = True
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for Puppeteer fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
if custom_browser_connection_url:
self.browser_connection_is_custom = True
@@ -128,21 +221,37 @@ class fetcher(Fetcher):
proxy_url += f"{parsed.hostname}{port}{parsed.path}{q}"
self.browser_connection_url += f"{r}--proxy-server={proxy_url}"
# def screenshot_step(self, step_n=''):
# screenshot = self.page.screenshot(type='jpeg', full_page=True, quality=85)
#
# if self.browser_steps_screenshot_path is not None:
# destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
# logger.debug(f"Saving step screenshot to {destination}")
# with open(destination, 'wb') as f:
# f.write(screenshot)
#
# def save_step_html(self, step_n):
# content = self.page.content()
# destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.html'.format(step_n))
# logger.debug(f"Saving step HTML to {destination}")
# with open(destination, 'w') as f:
# f.write(content)
async def quit(self, watch=None):
watch_uuid = watch.get('uuid') if watch else 'unknown'
# Close page
try:
if hasattr(self, 'page') and self.page:
await asyncio.wait_for(self.page.close(), timeout=5.0)
logger.debug(f"[{watch_uuid}] Page closed successfully")
except asyncio.TimeoutError:
logger.warning(f"[{watch_uuid}] Timed out closing page (5s)")
except Exception as e:
logger.warning(f"[{watch_uuid}] Error closing page: {e}")
finally:
self.page = None
# Close browser connection
try:
if hasattr(self, 'browser') and self.browser:
await asyncio.wait_for(self.browser.close(), timeout=5.0)
logger.debug(f"[{watch_uuid}] Browser closed successfully")
except asyncio.TimeoutError:
logger.warning(f"[{watch_uuid}] Timed out closing browser (5s)")
except Exception as e:
logger.warning(f"[{watch_uuid}] Error closing browser: {e}")
finally:
self.browser = None
logger.info(f"[{watch_uuid}] Cleanup puppeteer complete")
# Force garbage collection to release resources
gc.collect()
async def fetch_page(self,
current_include_filters,
@@ -153,13 +262,15 @@ class fetcher(Fetcher):
request_body,
request_headers,
request_method,
screenshot_format,
timeout,
url,
watch_uuid
):
import re
self.delete_browser_steps_screenshots()
n = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
n = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 12)) + self.render_extract_delay
extra_wait = min(n, 15)
logger.debug(f"Extra wait set to {extra_wait}s, requested was {n}s.")
@@ -170,9 +281,11 @@ class fetcher(Fetcher):
# Connect directly using the specified browser_ws_endpoint
# @todo timeout
try:
browser = await pyppeteer_instance.connect(browserWSEndpoint=self.browser_connection_url,
ignoreHTTPSErrors=True
)
logger.debug(f"[{watch_uuid}] Connecting to browser at {self.browser_connection_url}")
self.browser = await pyppeteer_instance.connect(browserWSEndpoint=self.browser_connection_url,
ignoreHTTPSErrors=True
)
logger.debug(f"[{watch_uuid}] Browser connected successfully")
except websockets.exceptions.InvalidStatusCode as e:
raise BrowserConnectError(msg=f"Error while trying to connect the browser, Code {e.status_code} (check your access, whitelist IP, password etc)")
except websockets.exceptions.InvalidURI:
@@ -181,7 +294,20 @@ class fetcher(Fetcher):
raise BrowserConnectError(msg=f"Error connecting to the browser - Exception '{str(e)}'")
# more reliable is to just request a new page
self.page = await browser.newPage()
try:
logger.debug(f"[{watch_uuid}] Creating new page")
self.page = await self.browser.newPage()
logger.debug(f"[{watch_uuid}] Page created successfully")
except Exception as e:
logger.error(f"[{watch_uuid}] Failed to create new page: {e}")
# Browser is connected but page creation failed - must cleanup browser
try:
await asyncio.wait_for(self.browser.close(), timeout=3.0)
except Exception as cleanup_error:
logger.error(f"[{watch_uuid}] Failed to cleanup browser after page creation failure: {cleanup_error}")
finally:
self.browser = None
raise
# Add console handler to capture console.log from favicon fetcher
#self.page.on('console', lambda msg: logger.debug(f"Browser console [{msg.type}]: {msg.text}"))
@@ -196,7 +322,6 @@ class fetcher(Fetcher):
"height": int(match.group(2))
})
logger.debug(f"Puppeteer viewport size {self.page.viewport}")
try:
from pyppeteerstealth import inject_evasions_into_page
except ImportError:
@@ -241,33 +366,57 @@ class fetcher(Fetcher):
# browsersteps_interface = steppable_browser_interface()
# browsersteps_interface.page = self.page
async def handle_frame_navigation(event):
# Enable Network domain to detect when first bytes arrive
await self.page._client.send('Network.enable')
# Now set up the frame navigation handlers
async def handle_frame_navigation(event=None):
# Wait n seconds after the frameStartedLoading, not from any frameStartedLoading/frameStartedNavigating
logger.debug(f"Frame navigated: {event}")
w = extra_wait - 2 if extra_wait > 4 else 2
logger.debug(f"Waiting {w} seconds before calling Page.stopLoading...")
await asyncio.sleep(w)
# Check if page still exists (might have been closed due to error during sleep)
if not self.page or not hasattr(self.page, '_client'):
logger.debug("Page already closed, skipping stopLoading")
return
logger.debug("Issuing stopLoading command...")
await self.page._client.send('Page.stopLoading')
logger.debug("stopLoading command sent!")
self.page._client.on('Page.frameStartedNavigating', lambda event: asyncio.create_task(handle_frame_navigation(event)))
self.page._client.on('Page.frameStartedLoading', lambda event: asyncio.create_task(handle_frame_navigation(event)))
self.page._client.on('Page.frameStoppedLoading', lambda event: logger.debug(f"Frame stopped loading: {event}"))
async def setup_frame_handlers_on_first_response(event):
# Only trigger for the main document response
if event.get('type') == 'Document':
logger.debug("First response received, setting up frame handlers for forced page stop load.")
self.page._client.on('Page.frameStartedNavigating', lambda e: asyncio.create_task(handle_frame_navigation(e)))
self.page._client.on('Page.frameStartedLoading', lambda e: asyncio.create_task(handle_frame_navigation(e)))
self.page._client.on('Page.frameStoppedLoading', lambda e: logger.debug(f"Frame stopped loading: {e}"))
logger.debug("First response received, setting up frame handlers for forced page stop load DONE SETUP")
# De-register this listener - we only need it once
self.page._client.remove_listener('Network.responseReceived', setup_frame_handlers_on_first_response)
# Listen for first response to trigger frame handler setup
self.page._client.on('Network.responseReceived', setup_frame_handlers_on_first_response)
response = None
attempt=0
while not response:
logger.debug(f"Attempting page fetch {url} attempt {attempt}")
response = await self.page.goto(url)
asyncio.create_task(handle_frame_navigation())
response = await self.page.goto(url, timeout=0)
await asyncio.sleep(1 + extra_wait)
# Check if page still exists before sending command
if self.page and hasattr(self.page, '_client'):
await self.page._client.send('Page.stopLoading')
if response:
break
if not response:
logger.warning("Page did not fetch! trying again!")
if response is None and attempt>=2:
await self.page.close()
await browser.close()
logger.warning(f"Content Fetcher > Response object was none (as in, the response from the browser was empty, not just the content) exiting attmpt {attempt}")
logger.warning(f"Content Fetcher > Response object was none (as in, the response from the browser was empty, not just the content) exiting attempt {attempt}")
raise EmptyReply(url=url, status_code=None)
attempt+=1
@@ -279,8 +428,6 @@ class fetcher(Fetcher):
except Exception as e:
logger.warning("Got exception when running evaluate on custom JS code")
logger.error(str(e))
await self.page.close()
await browser.close()
# This can be ok, we will try to grab what we could retrieve
raise PageUnloadable(url=url, status_code=None, message=str(e))
@@ -290,8 +437,6 @@ class fetcher(Fetcher):
# https://github.com/dgtlmoon/changedetection.io/discussions/2122#discussioncomment-8241962
logger.critical(f"Response from the browser/Playwright did not have a status_code! Response follows.")
logger.critical(response)
await self.page.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
if fetch_favicon:
@@ -301,7 +446,7 @@ class fetcher(Fetcher):
logger.error(f"Error fetching FavIcon info {str(e)}, continuing.")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page(page=self.page)
screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
@@ -309,13 +454,11 @@ class fetcher(Fetcher):
if not empty_pages_are_a_change and len(content.strip()) == 0:
logger.error("Content Fetcher > Content was empty (empty_pages_are_a_change is False), closing browsers")
await self.page.close()
await browser.close()
raise EmptyReply(url=url, status_code=response.status)
# Run Browser Steps here
# @todo not yet supported, we switch to playwright in this case
# if self.browser_steps_get_valid_steps():
# if self.browser_steps:
# self.iterate_browser_steps()
@@ -328,6 +471,16 @@ class fetcher(Fetcher):
await self.page.evaluate(f"var include_filters=''")
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.content = await self.page.content
# Now take screenshot (scrolling may trigger layout changes, but measurements are already captured)
logger.debug(f"Screenshot format {self.screenshot_format}")
self.screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Force garbage collection - pyppeteer base64 decode creates temporary buffers
import gc
gc.collect()
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
@@ -335,17 +488,10 @@ class fetcher(Fetcher):
if not self.xpath_data:
raise Exception(f"Content Fetcher > xPath scraper failed. Please report this URL so we can fix it :)")
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
self.content = await self.page.content
self.screenshot = await capture_full_page(page=self.page)
# It's good to log here in the case that the browser crashes on shutting down but we still get the data we need
logger.success(f"Fetching '{url}' complete, closing page")
await self.page.close()
logger.success(f"Fetching '{url}' complete, closing browser")
await browser.close()
logger.success(f"Fetching '{url}' complete, exiting puppeteer fetch.")
async def main(self, **kwargs):
@@ -360,8 +506,10 @@ class fetcher(Fetcher):
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
#@todo make update_worker async which could run any of these content_fetchers within memory and time constraints
@@ -378,9 +526,32 @@ class fetcher(Fetcher):
request_body=request_body,
request_headers=request_headers,
request_method=request_method,
screenshot_format=None,
timeout=timeout,
url=url,
watch_uuid=watch_uuid,
), timeout=max_time
)
except asyncio.TimeoutError:
raise (BrowserFetchTimedOut(msg=f"Browser connected but was unable to process the page in {max_time} seconds."))
finally:
# Internal cleanup on any exception/timeout - call quit() immediately
# This prevents connection leaks during exception bursts
# Worker.py's quit() call becomes a redundant safety net (idempotent)
try:
await self.quit(watch={'uuid': watch_uuid} if watch_uuid else None)
except Exception as cleanup_error:
logger.error(f"[{watch_uuid}] Error during internal quit() cleanup: {cleanup_error}")
# Plugin registration for built-in fetcher
class PuppeteerFetcherPlugin:
"""Plugin class that registers the Puppeteer fetcher as a built-in plugin."""
def register_content_fetcher(self):
"""Register the Puppeteer fetcher"""
return ('html_webdriver', fetcher)
# Create module-level instance for plugin registration
puppeteer_plugin = PuppeteerFetcherPlugin()
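
The stitching path in `capture_full_page` above sends the screenshot chunks to the worker as raw bytes: first a `struct`-packed chunk count, then one `send_bytes()` call per chunk, then a single `recv_bytes()` for the result. The sketch below shows only that framing. It is an illustration under assumptions: `join_worker` and `send_chunks` are hypothetical names, the worker merely concatenates the chunks instead of stitching images, and it runs in a thread rather than a spawned process so the example stays self-contained.

```python
import struct
import threading
from multiprocessing import Pipe


def join_worker(conn):
    """Hypothetical worker: read a count, then that many chunks, reply with them joined."""
    (count,) = struct.unpack('I', conn.recv_bytes())
    chunks = [conn.recv_bytes() for _ in range(count)]
    # The real worker would stitch images here; this sketch just concatenates
    conn.send_bytes(b"".join(chunks))
    conn.close()


def send_chunks(chunks):
    """Send chunks with a count prefix and return the worker's reply."""
    parent_conn, child_conn = Pipe()
    worker = threading.Thread(target=join_worker, args=(child_conn,))
    worker.start()

    # Count first, so the receiver knows how many messages follow
    parent_conn.send_bytes(struct.pack('I', len(chunks)))
    for chunk in chunks:
        parent_conn.send_bytes(chunk)

    result = parent_conn.recv_bytes()
    worker.join()
    parent_conn.close()
    return result


if __name__ == "__main__":
    print(send_chunks([b"ab", b"cd", b"ef"]))
```

Sending a count up front means the receiver needs no sentinel value, and `send_bytes()`/`recv_bytes()` avoid pickling the payload, which matches the "no pickle" comment in the diff.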


@@ -1,18 +1,22 @@
from loguru import logger
from urllib.parse import urljoin, urlparse
import hashlib
import os
import re
import asyncio
from changedetectionio import strtobool
from changedetectionio.content_fetchers.exceptions import BrowserStepsInUnsupportedFetcher, EmptyReply, Non200ErrorCodeReceived
from changedetectionio.content_fetchers.base import Fetcher
from changedetectionio.validate_url import is_private_hostname
# "html_requests" is listed as the default fetcher in store.py!
class fetcher(Fetcher):
fetcher_description = "Basic fast Plaintext/HTTP Client"
def __init__(self, proxy_override=None, custom_browser_connection_url=None):
super().__init__()
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
self.proxy_override = proxy_override
# browser_connection_url is None because it's always 'launched locally'
@@ -25,14 +29,16 @@ class fetcher(Fetcher):
ignore_status_codes=False,
current_include_filters=None,
is_binary=False,
empty_pages_are_a_change=False):
empty_pages_are_a_change=False,
watch_uuid=None,
):
"""Synchronous version of run - the original requests implementation"""
import chardet
import requests
from requests.exceptions import ProxyError, ConnectionError, RequestException
if self.browser_steps_get_valid_steps():
if self.browser_steps:
raise BrowserStepsInUnsupportedFetcher(url=url)
proxies = {}
@@ -51,18 +57,72 @@ class fetcher(Fetcher):
session = requests.Session()
# Configure retry adapter for low-level network errors only
# Retries connection timeouts, read timeouts, connection resets - not HTTP status codes
# Especially helpful in parallel test execution when servers are slow/overloaded
# Configurable via REQUESTS_RETRY_MAX_COUNT (default: 6)
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
max_retries = int(os.getenv("REQUESTS_RETRY_MAX_COUNT", "6"))
retry_strategy = Retry(
total=max_retries,
connect=max_retries, # Retry connection timeouts
read=max_retries, # Retry read timeouts
status=0, # Don't retry on HTTP status codes
backoff_factor=0.5, # Exponential backoff between retries
allowed_methods=["HEAD", "GET", "OPTIONS", "POST"],
raise_on_status=False
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)
if strtobool(os.getenv('ALLOW_FILE_URI', 'false')) and url.startswith('file://'):
from requests_file import FileAdapter
session.mount('file://', FileAdapter())
allow_iana_restricted = strtobool(os.getenv('ALLOW_IANA_RESTRICTED_ADDRESSES', 'false'))
try:
# Fresh DNS check at fetch time — catches DNS rebinding regardless of add-time cache.
if not allow_iana_restricted:
parsed_initial = urlparse(url)
if parsed_initial.hostname and is_private_hostname(parsed_initial.hostname):
raise Exception(f"Fetch blocked: '{url}' resolves to a private/reserved IP address. "
f"Set ALLOW_IANA_RESTRICTED_ADDRESSES=true to allow.")
r = session.request(method=request_method,
data=request_body.encode('utf-8') if type(request_body) is str else request_body,
url=url,
headers=request_headers,
timeout=timeout,
proxies=proxies,
verify=False)
verify=False,
allow_redirects=False)
# Manually follow redirects so each hop's resolved IP can be validated,
# preventing SSRF via an open redirect on a public host.
current_url = url
for _ in range(10):
if not r.is_redirect:
break
location = r.headers.get('Location', '')
redirect_url = urljoin(current_url, location)
if not allow_iana_restricted:
parsed_redirect = urlparse(redirect_url)
if parsed_redirect.hostname and is_private_hostname(parsed_redirect.hostname):
raise Exception(f"Redirect blocked: '{redirect_url}' resolves to a private/reserved IP address.")
current_url = redirect_url
r = session.request('GET', redirect_url,
headers=request_headers,
timeout=timeout,
proxies=proxies,
verify=False,
allow_redirects=False)
else:
raise Exception("Too many redirects")
except Exception as e:
msg = str(e)
if proxies and 'SOCKSHTTPSConnectionPool' in msg:
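Note on the hunk above: the manual redirect loop re-validates every hop before following it. A minimal, network-free sketch of that per-hop check follows. The names `is_restricted_ip`, `next_hop` and the `resolve` callable are hypothetical stand-ins for illustration; the project's own `is_private_hostname` is not shown in this diff and may behave differently.

```python
import ipaddress
from urllib.parse import urljoin, urlparse


def is_restricted_ip(address: str) -> bool:
    """Return True for loopback, private, link-local or otherwise reserved IPs."""
    ip = ipaddress.ip_address(address)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def next_hop(current_url: str, location: str, resolve) -> str:
    """Resolve a Location header against the current URL and vet the target.

    `resolve` is a hypothetical callable mapping a hostname to a list of IP
    strings; a real fetcher would perform a fresh DNS lookup at this point.
    """
    # Location may be relative, so join it against the URL that produced it
    redirect_url = urljoin(current_url, location)
    hostname = urlparse(redirect_url).hostname
    if hostname and any(is_restricted_ip(ip) for ip in resolve(hostname)):
        raise ValueError(f"Redirect blocked: '{redirect_url}' resolves to a restricted address")
    return redirect_url
```

Injecting `resolve` keeps the check testable without DNS; checking every returned address (not only the first) matters because a hostname can resolve to several records.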
@@ -76,9 +136,22 @@ class fetcher(Fetcher):
if not is_binary:
# Don't run this for PDFs (or other responses identified as binary); it takes a _long_ time
if not r.headers.get('content-type') or not 'charset=' in r.headers.get('content-type'):
encoding = chardet.detect(r.content)['encoding']
if encoding:
r.encoding = encoding
# For XML/RSS feeds, check the XML declaration for encoding attribute
# This is more reliable than chardet which can misdetect UTF-8 as MacRoman
content_type = r.headers.get('content-type', '').lower()
if 'xml' in content_type or 'rss' in content_type:
# Look for <?xml version="1.0" encoding="UTF-8"?>
xml_encoding_match = re.search(rb'<\?xml[^>]+encoding=["\']([^"\']+)["\']', r.content[:200])
if xml_encoding_match:
r.encoding = xml_encoding_match.group(1).decode('ascii')
else:
# Default to UTF-8 for XML if no encoding found
r.encoding = 'utf-8'
else:
# For other content types, use chardet
encoding = chardet.detect(r.content)['encoding']
if encoding:
r.encoding = encoding
self.headers = r.headers
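Note on the hunk above: for XML/RSS responses the encoding is taken from the XML declaration before falling back to UTF-8. The same regex can be exercised in isolation, as in this sketch; the function name `sniff_xml_encoding` and its `default` parameter are assumptions made for illustration, not part of the fetcher.

```python
import re

# Same pattern as in the hunk above: capture the encoding pseudo-attribute
# of an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL_ENCODING_RE = re.compile(rb'<\?xml[^>]+encoding=["\']([^"\']+)["\']')


def sniff_xml_encoding(content: bytes, default: str = "utf-8") -> str:
    """Return the declared XML encoding, or `default` when none is declared."""
    # Only the first 200 bytes are inspected; the declaration must come first
    match = XML_DECL_ENCODING_RE.search(content[:200])
    if match:
        return match.group(1).decode("ascii")
    return default
```

Because `[^>]+` cannot cross the closing `>` of the declaration, an `encoding=` attribute appearing later in the document body is not picked up by mistake.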
@@ -104,6 +177,12 @@ class fetcher(Fetcher):
self.raw_content = r.content
# If the content is an image, set it as screenshot for SSIM/visual comparison
content_type = r.headers.get('content-type', '').lower()
if 'image/' in content_type:
self.screenshot = r.content
logger.debug(f"Image content detected ({content_type}), set as screenshot for comparison")
async def run(self,
fetch_favicon=True,
current_include_filters=None,
@@ -113,14 +192,17 @@ class fetcher(Fetcher):
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
"""Async wrapper that runs the synchronous requests code in a thread pool"""
loop = asyncio.get_event_loop()
# Run the synchronous _run_sync in a thread pool to avoid blocking the event loop
# Retry logic is handled by requests' HTTPAdapter (see _run_sync for configuration)
await loop.run_in_executor(
None, # Use default ThreadPoolExecutor
lambda: self._run_sync(
@@ -132,12 +214,12 @@ class fetcher(Fetcher):
ignore_status_codes=ignore_status_codes,
current_include_filters=current_include_filters,
is_binary=is_binary,
empty_pages_are_a_change=empty_pages_are_a_change
empty_pages_are_a_change=empty_pages_are_a_change,
watch_uuid=watch_uuid,
)
)
def quit(self, watch=None):
async def quit(self, watch=None):
# If they switched to the `requests` fetcher from something else,
# the screenshot could be stale; in any case, it's not used here.
# REMOVE_REQUESTS_OLD_SCREENSHOTS - Mainly used for testing
@@ -149,3 +231,15 @@ class fetcher(Fetcher):
except Exception as e:
logger.warning(f"Failed to unlink screenshot: {screenshot} - {e}")
# Plugin registration for built-in fetcher
class RequestsFetcherPlugin:
"""Plugin class that registers the requests fetcher as a built-in plugin."""
def register_content_fetcher(self):
"""Register the requests fetcher"""
return ('html_requests', fetcher)
# Create module-level instance for plugin registration
requests_plugin = RequestsFetcherPlugin()


@@ -0,0 +1,107 @@
/**
* Lock Element Dimensions for Screenshot Capture (First Viewport Only)
*
* THE PROBLEM:
* When taking full-page screenshots of tall pages, Chrome/Puppeteer/Playwright need to:
* 1. Temporarily change the viewport height to a large value (e.g., 800px → 3809px)
* 2. Take screenshots in chunks while scrolling
* 3. Stitch the chunks together
*
* However, changing the viewport height triggers CSS media queries like:
* @media (min-height: 860px) { .ad { height: 250px; } }
*
* This causes elements (especially ads/headers) to resize during screenshot capture.
*
* THE SOLUTION:
* Lock element dimensions in the FIRST VIEWPORT ONLY with !important inline styles.
* This prevents headers, navigation, and top ads from resizing when viewport changes.
* We only lock the visible portion because:
* - Most layout shifts happen in headers/navbars/top ads
* - Locking only visible elements is 100x+ faster (100-200 elements vs 10,000+)
* - Below-fold content shifts don't affect visual comparison accuracy
*
* WHAT THIS SCRIPT DOES:
* 1. Gets current viewport height
* 2. Finds elements within first viewport (top of page to bottom of screen)
* 3. Locks their dimensions with !important inline styles
* 4. Disables ResizeObserver API (for JS-based resizing)
*
* USAGE:
* Execute this script BEFORE calling capture_full_page() / screenshot functions.
* Only enabled for image_ssim_diff processor (visual comparison).
* Default: OFF for performance.
*
* PERFORMANCE:
* - Only processes 100-300 elements (first viewport) vs 10,000+ (entire page)
* - Typically completes in 10-50ms
* - 100x+ faster than locking entire page
*
* @see https://github.com/dgtlmoon/changedetection.io/issues/XXXX
*/
(() => {
// Store original styles in a global WeakMap for later restoration
window.__elementSizingRestore = new WeakMap();
const start = performance.now();
// Get current viewport height (visible portion of page)
const viewportHeight = window.innerHeight;
// Get all elements and filter to FIRST VIEWPORT ONLY
// This dramatically reduces elements to process (100-300 vs 10,000+)
const allElements = Array.from(document.querySelectorAll('*'));
// BATCH READ PHASE: Get bounding rects and filter to viewport
const measurements = allElements.map(el => {
const rect = el.getBoundingClientRect();
const computed = window.getComputedStyle(el);
// Only lock elements in the first viewport (visible on initial page load)
// rect.top < viewportHeight means element starts within visible area
const inViewport = rect.top < viewportHeight && rect.top >= 0;
const hasSize = rect.height > 0 && rect.width > 0;
return inViewport && hasSize ? { el, computed, rect } : null;
}).filter(Boolean); // Remove null entries
const elapsed = performance.now() - start;
console.log(`Locked first viewport elements: ${measurements.length} of ${allElements.length} total elements (viewport height: ${viewportHeight}px, took ${elapsed.toFixed(0)}ms)`);
// BATCH WRITE PHASE: Apply all inline styles without triggering layout
// No interleaved reads means browser can optimize style application
measurements.forEach(({el, computed, rect}) => {
// Save original inline style values BEFORE locking
const properties = ['height', 'min-height', 'max-height', 'width', 'min-width', 'max-width'];
const originalStyles = {};
properties.forEach(prop => {
originalStyles[prop] = {
value: el.style.getPropertyValue(prop),
priority: el.style.getPropertyPriority(prop)
};
});
window.__elementSizingRestore.set(el, originalStyles);
// Lock dimensions with !important to override media queries
if (rect.height > 0) {
el.style.setProperty('height', computed.height, 'important');
el.style.setProperty('min-height', computed.height, 'important');
el.style.setProperty('max-height', computed.height, 'important');
}
if (rect.width > 0) {
el.style.setProperty('width', computed.width, 'important');
el.style.setProperty('min-width', computed.width, 'important');
el.style.setProperty('max-width', computed.width, 'important');
}
});
// Also disable ResizeObserver for JS-based resizing
window.ResizeObserver = class {
constructor() {}
observe() {}
unobserve() {}
disconnect() {}
};
console.log(`✓ Element dimensions locked (${measurements.length} elements) to prevent media query changes during screenshot`);
})();


@@ -0,0 +1,52 @@
/**
* Unlock Element Dimensions After Screenshot Capture
*
* This script removes the inline !important styles that were applied by lock-elements-sizing.js
* and restores elements to their original state using the WeakMap created during locking.
*
* USAGE:
* Execute this script AFTER completing screenshot capture and restoring the viewport.
* This allows the page to return to its normal responsive behavior.
*
* WHAT THIS SCRIPT DOES:
* 1. Iterates through every element in the document (a WeakMap cannot be enumerated)
* 2. Reads original style values from the global WeakMap
* 3. Restores original inline styles (or removes them if they weren't set originally)
* 4. Cleans up the WeakMap
*
* @see lock-elements-sizing.js for the locking mechanism
*/
(() => {
// Check if the restore map exists
if (!window.__elementSizingRestore) {
console.log('⚠ Element sizing restore map not found - elements may not have been locked');
return;
}
// Restore all locked dimension styles to their original state
document.querySelectorAll('*').forEach(el => {
const originalStyles = window.__elementSizingRestore.get(el);
if (originalStyles) {
const properties = ['height', 'min-height', 'max-height', 'width', 'min-width', 'max-width'];
properties.forEach(prop => {
const original = originalStyles[prop];
if (original.value) {
// Restore original value with original priority
el.style.setProperty(prop, original.value, original.priority || '');
} else {
// Was not set originally, so remove it
el.style.removeProperty(prop);
}
});
}
});
// Clean up the global WeakMap
delete window.__elementSizingRestore;
console.log('✓ Element dimensions unlocked - page restored to original state');
})();
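Note on the two scripts above: the lock script is meant to run before the capture and the unlock script after it. One way to guarantee the unlock even when the capture fails is an async context manager, sketched below. It assumes a `page` object exposing an awaitable `evaluate(script)` method in the style of Playwright's async API; `locked_element_sizes` and the `lock_js` / `unlock_js` parameters are hypothetical names, not the project's actual wiring.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def locked_element_sizes(page, lock_js: str, unlock_js: str):
    """Run the lock script on entry and always run the unlock script on exit.

    `page` is assumed to expose an awaitable `evaluate(script)` method.
    """
    await page.evaluate(lock_js)
    try:
        yield page
    finally:
        # Executed even if the screenshot code inside the block raises
        await page.evaluate(unlock_js)
```

A caller would wrap only the screenshot step, for example `async with locked_element_sizes(page, lock_js, unlock_js): ...`. Note that the lock script also replaces `window.ResizeObserver`, which the unlock script shown here does not restore.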


@@ -8,16 +8,42 @@ from loguru import logger
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, SCREENSHOT_DEFAULT_QUALITY
def stitch_images_worker_raw_bytes(pipe_conn, original_page_height, capture_height):
"""
Stitch image chunks together in a separate process.
def stitch_images_worker(pipe_conn, chunks_bytes, original_page_height, capture_height):
Uses spawn multiprocessing to isolate PIL's C-level memory allocation.
When the subprocess exits, the OS reclaims ALL memory including C-level allocations
that Python's GC cannot release. This prevents the ~50MB per stitch from accumulating
in the main process.
Trade-off: Adds 35MB resource_tracker subprocess, but prevents 500MB+ memory leak
in main process (much better at scale: 35GB vs 500GB for 1000 instances).
Args:
pipe_conn: Pipe connection to receive data and send result
original_page_height: Original page height in pixels
capture_height: Maximum capture height
"""
import os
import io
import struct
from PIL import Image, ImageDraw, ImageFont
try:
# Receive chunk count as 4-byte integer (no pickle!)
count_bytes = pipe_conn.recv_bytes()
chunk_count = struct.unpack('I', count_bytes)[0]
# Receive each chunk as raw bytes (no pickle!)
chunks_bytes = []
for _ in range(chunk_count):
chunks_bytes.append(pipe_conn.recv_bytes())
# Load images from byte chunks
images = [Image.open(io.BytesIO(b)) for b in chunks_bytes]
del chunks_bytes
total_height = sum(im.height for im in images)
max_width = max(im.width for im in images)
@@ -27,47 +53,42 @@ def stitch_images_worker(pipe_conn, chunks_bytes, original_page_height, capture_
for im in images:
stitched.paste(im, (0, y_offset))
y_offset += im.height
im.close()
del images
# Draw caption on top (overlaid, not extending canvas)
draw = ImageDraw.Draw(stitched)
# Draw caption only if page was trimmed
if original_page_height > capture_height:
draw = ImageDraw.Draw(stitched)
caption_text = f"WARNING: Screenshot was {original_page_height}px but trimmed to {capture_height}px because it was too long"
padding = 10
font_size = 35
font_color = (255, 0, 0)
background_color = (255, 255, 255)
# Try to load a proper font
try:
font = ImageFont.truetype("arial.ttf", font_size)
font = ImageFont.truetype("arial.ttf", 35)
except IOError:
font = ImageFont.load_default()
bbox = draw.textbbox((0, 0), caption_text, font=font)
text_width = bbox[2] - bbox[0]
text_height = bbox[3] - bbox[1]
# Draw white rectangle background behind text
rect_top = 0
rect_bottom = text_height + 2 * padding
draw.rectangle([(0, rect_top), (max_width, rect_bottom)], fill=background_color)
# Draw text centered horizontally, 10px padding from top of the rectangle
draw.rectangle([(0, 0), (max_width, text_height + 2 * padding)], fill=(255, 255, 255))
text_x = (max_width - text_width) // 2
text_y = padding
draw.text((text_x, text_y), caption_text, font=font, fill=font_color)
draw.text((text_x, padding), caption_text, font=font, fill=(255, 0, 0))
# Encode and send image
# Encode and send
output = io.BytesIO()
stitched.save(output, format="JPEG", quality=int(os.getenv("SCREENSHOT_QUALITY", SCREENSHOT_DEFAULT_QUALITY)))
pipe_conn.send_bytes(output.getvalue())
stitched.save(output, format="JPEG", quality=int(os.getenv("SCREENSHOT_QUALITY", SCREENSHOT_DEFAULT_QUALITY)), optimize=True)
result_bytes = output.getvalue()
stitched.close()
del stitched
output.close()
del output
pipe_conn.send_bytes(result_bytes)
del result_bytes
except Exception as e:
pipe_conn.send(f"error:{e}")
logger.error(f"Error in stitch_images_worker_raw_bytes: {e}")
error_msg = f"error:{e}".encode('utf-8')
pipe_conn.send_bytes(error_msg)
finally:
pipe_conn.close()
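Note on the worker above: it expects a 4-byte chunk count followed by one raw-bytes message per chunk, and replies with either the JPEG bytes or a message starting with `error:`. The sender side is not part of this diff, so the following is only a sketch of a matching protocol; `send_chunks`, `recv_chunks` and `decode_result` are hypothetical helper names.

```python
import struct
from multiprocessing import Pipe


def send_chunks(conn, chunks):
    """Send a native-order unsigned-int count, then each chunk as raw bytes."""
    conn.send_bytes(struct.pack('I', len(chunks)))
    for chunk in chunks:
        conn.send_bytes(chunk)


def recv_chunks(conn):
    """Mirror of the worker's receive loop: read the count, then that many chunks."""
    (count,) = struct.unpack('I', conn.recv_bytes())
    return [conn.recv_bytes() for _ in range(count)]


def decode_result(payload: bytes) -> bytes:
    """Return the image bytes, or raise if the worker reported an error."""
    if payload.startswith(b"error:"):
        raise RuntimeError(payload[len(b"error:"):].decode("utf-8", errors="replace"))
    return payload
```

The `'I'` format uses native byte order, which is fine here because both ends run on the same machine. A JPEG payload begins with the bytes `FF D8`, so it cannot be mistaken for the `error:` prefix.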


@@ -14,8 +14,22 @@ class fetcher(Fetcher):
proxy = None
proxy_url = None
def __init__(self, proxy_override=None, custom_browser_connection_url=None):
super().__init__()
# Capability flags
supports_browser_steps = False
supports_screenshots = True
supports_xpath_element_data = True
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for WebDriver fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
from urllib.parse import urlparse
from selenium.webdriver.common.proxy import Proxy
@@ -55,8 +69,10 @@ class fetcher(Fetcher):
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
import asyncio
@@ -131,7 +147,34 @@ class fetcher(Fetcher):
time.sleep(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay)
self.content = driver.page_source
self.headers = {}
self.screenshot = driver.get_screenshot_as_png()
# Selenium always captures as PNG, convert to JPEG if needed
screenshot_png = driver.get_screenshot_as_png()
# Convert to JPEG if requested (for smaller file size)
if self.screenshot_format and self.screenshot_format.upper() == 'JPEG':
from PIL import Image
import io
img = Image.open(io.BytesIO(screenshot_png))
# Convert to RGB if needed (JPEG doesn't support transparency)
# Always convert non-RGB modes to RGB to ensure JPEG compatibility
if img.mode in ('RGBA', 'LA', 'P', 'PA'):
# Handle transparency by compositing onto white background
if img.mode == 'P':
img = img.convert('RGBA')
background = Image.new('RGB', img.size, (255, 255, 255))
if img.mode in ('RGBA', 'LA', 'PA'):
background.paste(img, mask=img.split()[-1]) # Use alpha channel as mask
img = background
elif img.mode != 'RGB':
# For other modes, direct conversion
img = img.convert('RGB')
jpeg_buffer = io.BytesIO()
img.save(jpeg_buffer, format='JPEG', quality=int(os.getenv("SCREENSHOT_QUALITY", 72)))
self.screenshot = jpeg_buffer.getvalue()
img.close()
else:
self.screenshot = screenshot_png
except Exception as e:
driver.quit()
raise e
@@ -141,3 +184,16 @@ class fetcher(Fetcher):
# Run the selenium operations in a thread pool to avoid blocking the event loop
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, _run_sync)
# Plugin registration for built-in fetcher
class WebDriverSeleniumFetcherPlugin:
"""Plugin class that registers the WebDriver Selenium fetcher as a built-in plugin."""
def register_content_fetcher(self):
"""Register the WebDriver Selenium fetcher"""
return ('html_webdriver', fetcher)
# Create module-level instance for plugin registration
webdriver_selenium_plugin = WebDriverSeleniumFetcherPlugin()


@@ -57,14 +57,15 @@ class SignalPriorityQueue(queue.PriorityQueue):
def put(self, item, block=True, timeout=None):
# Call the parent's put method first
super().put(item, block, timeout)
# After putting the item in the queue, check if it has a UUID and emit signal
if hasattr(item, 'item') and isinstance(item.item, dict) and 'uuid' in item.item:
uuid = item.item['uuid']
# Get the signal and send it if it exists
watch_check_update = signal('watch_check_update')
if watch_check_update:
# Send the watch_uuid parameter
# NOTE: This would block other workers from .put/.get while this signal sends
# Signal handlers may iterate the queue/datastore while holding locks
watch_check_update.send(watch_uuid=uuid)
# Send queue_length signal with current queue size
@@ -312,14 +313,15 @@ class AsyncSignalPriorityQueue(asyncio.PriorityQueue):
async def put(self, item):
# Call the parent's put method first
await super().put(item)
# After putting the item in the queue, check if it has a UUID and emit signal
if hasattr(item, 'item') and isinstance(item.item, dict) and 'uuid' in item.item:
uuid = item.item['uuid']
# Get the signal and send it if it exists
watch_check_update = signal('watch_check_update')
if watch_check_update:
# Send the watch_uuid parameter
# NOTE: This would block other workers from .put/.get while this signal sends
# Signal handlers may iterate the queue/datastore while holding locks
watch_check_update.send(watch_uuid=uuid)
# Send queue_length signal with current queue size


@@ -1,130 +0,0 @@
import difflib
from typing import List, Iterator, Union
# https://github.com/dgtlmoon/changedetection.io/issues/821#issuecomment-1241837050
#HTML_ADDED_STYLE = "background-color: #d2f7c2; color: #255d00;"
#HTML_CHANGED_INTO_STYLE = "background-color: #dafbe1; color: #116329;"
#HTML_CHANGED_STYLE = "background-color: #ffd6cc; color: #7a2000;"
#HTML_REMOVED_STYLE = "background-color: #ffebe9; color: #82071e;"
# @todo - In the future we can make this configurable
HTML_ADDED_STYLE = "background-color: #eaf2c2; color: #406619"
HTML_REMOVED_STYLE = "background-color: #fadad7; color: #b30000"
HTML_CHANGED_STYLE = HTML_REMOVED_STYLE
HTML_CHANGED_INTO_STYLE = HTML_ADDED_STYLE
# These get set to html or telegram type or discord compatible or whatever in handler.py
# Something that cant get escaped to HTML by accident
REMOVED_PLACEMARKER_OPEN = '@removed_PLACEMARKER_OPEN'
REMOVED_PLACEMARKER_CLOSED = '@removed_PLACEMARKER_CLOSED'
ADDED_PLACEMARKER_OPEN = '@added_PLACEMARKER_OPEN'
ADDED_PLACEMARKER_CLOSED = '@added_PLACEMARKER_CLOSED'
CHANGED_PLACEMARKER_OPEN = '@changed_PLACEMARKER_OPEN'
CHANGED_PLACEMARKER_CLOSED = '@changed_PLACEMARKER_CLOSED'
CHANGED_INTO_PLACEMARKER_OPEN = '@changed_into_PLACEMARKER_OPEN'
CHANGED_INTO_PLACEMARKER_CLOSED = '@changed_into_PLACEMARKER_CLOSED'
def same_slicer(lst: List[str], start: int, end: int) -> List[str]:
"""Return a slice of the list, or a single element if start == end."""
return lst[start:end] if start != end else [lst[start]]
def customSequenceMatcher(
before: List[str],
after: List[str],
include_equal: bool = False,
include_removed: bool = True,
include_added: bool = True,
include_replaced: bool = True,
include_change_type_prefix: bool = True
) -> Iterator[List[str]]:
"""
Compare two sequences and yield differences based on specified parameters.
Args:
before (List[str]): Original sequence
after (List[str]): Modified sequence
include_equal (bool): Include unchanged parts
include_removed (bool): Include removed parts
include_added (bool): Include added parts
include_replaced (bool): Include replaced parts
include_change_type_prefix (bool): Add prefixes to indicate change types
Yields:
List[str]: Differences between sequences
"""
cruncher = difflib.SequenceMatcher(isjunk=lambda x: x in " \t", a=before, b=after)
for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
if include_equal and tag == 'equal':
yield before[alo:ahi]
elif include_removed and tag == 'delete':
if include_change_type_prefix:
yield [f'{REMOVED_PLACEMARKER_OPEN}{line}{REMOVED_PLACEMARKER_CLOSED}' for line in same_slicer(before, alo, ahi)]
else:
yield same_slicer(before, alo, ahi)
elif include_replaced and tag == 'replace':
if include_change_type_prefix:
yield [f'{CHANGED_PLACEMARKER_OPEN}{line}{CHANGED_PLACEMARKER_CLOSED}' for line in same_slicer(before, alo, ahi)] + \
[f'{CHANGED_INTO_PLACEMARKER_OPEN}{line}{CHANGED_INTO_PLACEMARKER_CLOSED}' for line in same_slicer(after, blo, bhi)]
else:
yield same_slicer(before, alo, ahi) + same_slicer(after, blo, bhi)
elif include_added and tag == 'insert':
if include_change_type_prefix:
yield [f'{ADDED_PLACEMARKER_OPEN}{line}{ADDED_PLACEMARKER_CLOSED}' for line in same_slicer(after, blo, bhi)]
else:
yield same_slicer(after, blo, bhi)
def render_diff(
previous_version_file_contents: str,
newest_version_file_contents: str,
include_equal: bool = False,
include_removed: bool = True,
include_added: bool = True,
include_replaced: bool = True,
line_feed_sep: str = "\n",
include_change_type_prefix: bool = True,
patch_format: bool = False
) -> str:
"""
Render the difference between two file contents.
Args:
previous_version_file_contents (str): Original file contents
newest_version_file_contents (str): Modified file contents
include_equal (bool): Include unchanged parts
include_removed (bool): Include removed parts
include_added (bool): Include added parts
include_replaced (bool): Include replaced parts
line_feed_sep (str): Separator for lines in output
include_change_type_prefix (bool): Add prefixes to indicate change types
patch_format (bool): Use patch format for output
Returns:
str: Rendered difference
"""
newest_lines = [line.rstrip() for line in newest_version_file_contents.splitlines()]
previous_lines = [line.rstrip() for line in previous_version_file_contents.splitlines()] if previous_version_file_contents else []
if patch_format:
patch = difflib.unified_diff(previous_lines, newest_lines)
return line_feed_sep.join(patch)
rendered_diff = customSequenceMatcher(
before=previous_lines,
after=newest_lines,
include_equal=include_equal,
include_removed=include_removed,
include_added=include_added,
include_replaced=include_replaced,
include_change_type_prefix=include_change_type_prefix
)
def flatten(lst: List[Union[str, List[str]]]) -> str:
return line_feed_sep.join(flatten(x) if isinstance(x, list) else x for x in lst)
return flatten(rendered_diff)
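Note on the module above: `customSequenceMatcher` is driven by the opcodes from `difflib.SequenceMatcher`, where each opcode names a change type and the index ranges it covers in both sequences. The sketch below lists those opcodes for two lists of lines; `summarize_opcodes` is a hypothetical helper for illustration, and it omits the `isjunk` callback used in the module.

```python
import difflib


def summarize_opcodes(before, after):
    """Return (tag, before_slice, after_slice) for each SequenceMatcher opcode."""
    matcher = difflib.SequenceMatcher(a=before, b=after)
    return [
        (tag, before[alo:ahi], after[blo:bhi])
        for tag, alo, ahi, blo, bhi in matcher.get_opcodes()
    ]
```

For an `insert` opcode the `before` range is empty (`alo == ahi`), and for a `delete` opcode the `after` range is empty, which is why the module reads removed lines only from `before` and added lines only from `after`.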


@@ -0,0 +1,479 @@
"""
Diff rendering module for change detection.
This module provides functions for rendering differences between text content,
with support for various output formats and tokenization strategies.
"""
import difflib
from typing import List, Iterator, Union
from loguru import logger
import diff_match_patch as dmp_module
import re
import time
from .tokenizers import TOKENIZERS, tokenize_words_and_html
# Remember: Gmail, Outlook, etc. don't support <style>; styles must be inline.
# Gmail: strips <ins> and <del> tags entirely.
# This is for the WHOLE line background style
REMOVED_STYLE = "background-color: #fadad7; color: #b30000;"
ADDED_STYLE = "background-color: #eaf2c2; color: #406619;"
HTML_REMOVED_STYLE = REMOVED_STYLE # Export alias for handler.py
HTML_ADDED_STYLE = ADDED_STYLE # Export alias for handler.py
# Darker backgrounds for nested highlighting (changed parts within lines)
REMOVED_INNER_STYLE = "background-color: #ff867a; color: #111;"
ADDED_INNER_STYLE = "background-color: #b2e841; color: #444;"
HTML_CHANGED_STYLE = REMOVED_STYLE
HTML_CHANGED_INTO_STYLE = ADDED_STYLE
# Placemarker constants - these get replaced by apply_service_tweaks() in handler.py
# Something that can't get escaped to HTML by accident
REMOVED_PLACEMARKER_OPEN = '@removed_PLACEMARKER_OPEN'
REMOVED_PLACEMARKER_CLOSED = '@removed_PLACEMARKER_CLOSED'
ADDED_PLACEMARKER_OPEN = '@added_PLACEMARKER_OPEN'
ADDED_PLACEMARKER_CLOSED = '@added_PLACEMARKER_CLOSED'
CHANGED_PLACEMARKER_OPEN = '@changed_PLACEMARKER_OPEN'
CHANGED_PLACEMARKER_CLOSED = '@changed_PLACEMARKER_CLOSED'
CHANGED_INTO_PLACEMARKER_OPEN = '@changed_into_PLACEMARKER_OPEN'
CHANGED_INTO_PLACEMARKER_CLOSED = '@changed_into_PLACEMARKER_CLOSED'
# Compiled regex patterns for performance
WHITESPACE_NORMALIZE_RE = re.compile(r'\s+')
def render_inline_word_diff(before_line: str, after_line: str, ignore_junk: bool = False, markdown_style: str = None, tokenizer: str = 'words_and_html') -> tuple[str, bool]:
"""
Render word-level differences between two lines inline using diff-match-patch library.
Args:
before_line: Original line text
after_line: Modified line text
ignore_junk: Ignore whitespace-only changes
markdown_style: Unused (kept for backwards compatibility)
tokenizer: Name of tokenizer to use from TOKENIZERS registry (default: 'words_and_html')
Returns:
tuple[str, bool]: (diff output with inline word-level highlighting, has_changes flag)
"""
# Normalize whitespace if ignore_junk is enabled
if ignore_junk:
# Normalize whitespace: collapse each run of whitespace into a single space
before_normalized = WHITESPACE_NORMALIZE_RE.sub(' ', before_line)
after_normalized = WHITESPACE_NORMALIZE_RE.sub(' ', after_line)
else:
before_normalized = before_line
after_normalized = after_line
# Use diff-match-patch with word-level tokenization
# Strategy: Use linesToChars to treat words as atomic units
dmp = dmp_module.diff_match_patch()
# Get the tokenizer function from the registry
tokenizer_func = TOKENIZERS.get(tokenizer, tokenize_words_and_html)
# Tokenize both lines using the selected tokenizer
before_tokens = tokenizer_func(before_normalized)
after_tokens = tokenizer_func(after_normalized or ' ')
# Create mappings for linesToChars (using it for word-mode)
# Join tokens with newline so each "line" is a token
before_text = '\n'.join(before_tokens)
after_text = '\n'.join(after_tokens)
# Use linesToChars for word-mode diffing
lines_result = dmp.diff_linesToChars(before_text, after_text)
line_before, line_after, line_array = lines_result
# Perform diff on the encoded strings
diffs = dmp.diff_main(line_before, line_after, False)
# Convert back to original text
dmp.diff_charsToLines(diffs, line_array)
# Remove the newlines we added for tokenization
diffs = [(op, text.replace('\n', '')) for op, text in diffs]
# DON'T apply semantic cleanup here - it would break token boundaries
# (e.g., "63" -> "66" would become "6" + "3" vs "6" + "6")
# We want to preserve the tokenizer's word boundaries
# Check if there are any changes
has_changes = any(op != 0 for op, _ in diffs)
if ignore_junk and not has_changes:
return after_line, False
# Check if the whole line is replaced (no unchanged content)
whole_line_replaced = not any(op == 0 and text.strip() for op, text in diffs)
# Build the output using placemarkers
# When whole line is replaced, wrap entire removed content once and entire added content once
if whole_line_replaced:
removed_tokens = []
added_tokens = []
for op, text in diffs:
if op == 0: # Equal (e.g., whitespace tokens in common positions)
# Include in both removed and added to preserve spacing
removed_tokens.append(text)
added_tokens.append(text)
elif op == -1: # Deletion
removed_tokens.append(text)
elif op == 1: # Insertion
added_tokens.append(text)
# Join all tokens and wrap the entire string once for removed, once for added
result_parts = []
if removed_tokens:
removed_full = ''.join(removed_tokens).rstrip()
trailing_removed = ''.join(removed_tokens)[len(removed_full):] if len(''.join(removed_tokens)) > len(removed_full) else ''
result_parts.append(f'{CHANGED_PLACEMARKER_OPEN}{removed_full}{CHANGED_PLACEMARKER_CLOSED}{trailing_removed}')
if added_tokens:
if result_parts: # Add newline between removed and added
result_parts.append('\n')
added_full = ''.join(added_tokens).rstrip()
trailing_added = ''.join(added_tokens)[len(added_full):] if len(''.join(added_tokens)) > len(added_full) else ''
result_parts.append(f'{CHANGED_INTO_PLACEMARKER_OPEN}{added_full}{CHANGED_INTO_PLACEMARKER_CLOSED}{trailing_added}')
return ''.join(result_parts), has_changes
else:
# Inline changes within the line
result_parts = []
for op, text in diffs:
if op == 0: # Equal
result_parts.append(text)
elif op == 1: # Insertion
# Don't wrap empty content (e.g., whitespace-only tokens after rstrip)
content = text.rstrip()
trailing = text[len(content):] if len(text) > len(content) else ''
if content:
result_parts.append(f'{ADDED_PLACEMARKER_OPEN}{content}{ADDED_PLACEMARKER_CLOSED}{trailing}')
else:
result_parts.append(trailing)
elif op == -1: # Deletion
# Don't wrap empty content (e.g., whitespace-only tokens after rstrip)
content = text.rstrip()
trailing = text[len(content):] if len(text) > len(content) else ''
if content:
result_parts.append(f'{REMOVED_PLACEMARKER_OPEN}{content}{REMOVED_PLACEMARKER_CLOSED}{trailing}')
else:
result_parts.append(trailing)
return ''.join(result_parts), has_changes
def render_nested_line_diff(before_line: str, after_line: str, ignore_junk: bool = False, tokenizer: str = 'words_and_html') -> tuple[str, str, bool]:
"""
Render line-level differences with nested highlighting for changed parts.
Returns two separate lines:
- Before line: light red background with dark red on removed parts
- After line: light green background with dark green on added parts
Args:
before_line: Original line text
after_line: Modified line text
ignore_junk: Ignore whitespace-only changes
tokenizer: Name of tokenizer to use from TOKENIZERS registry
Returns:
tuple[str, str, bool]: (before_with_highlights, after_with_highlights, has_changes)
"""
# Normalize whitespace if ignore_junk is enabled
if ignore_junk:
before_normalized = WHITESPACE_NORMALIZE_RE.sub(' ', before_line)
after_normalized = WHITESPACE_NORMALIZE_RE.sub(' ', after_line)
else:
before_normalized = before_line
after_normalized = after_line
# Use diff-match-patch with word-level tokenization
dmp = dmp_module.diff_match_patch()
# Get the tokenizer function from the registry
tokenizer_func = TOKENIZERS.get(tokenizer, tokenize_words_and_html)
# Tokenize both lines
before_tokens = tokenizer_func(before_normalized)
after_tokens = tokenizer_func(after_normalized or ' ')
# Create mappings for linesToChars
before_text = '\n'.join(before_tokens)
after_text = '\n'.join(after_tokens)
# Use linesToChars for word-mode diffing
lines_result = dmp.diff_linesToChars(before_text, after_text)
line_before, line_after, line_array = lines_result
# Perform diff on the encoded strings
diffs = dmp.diff_main(line_before, line_after, False)
# Convert back to original text
dmp.diff_charsToLines(diffs, line_array)
# Remove the newlines we added for tokenization
diffs = [(op, text.replace('\n', '')) for op, text in diffs]
# DON'T apply semantic cleanup here - it would break token boundaries
# (e.g., "63" -> "66" would become "6" + "3" vs "6" + "6")
# We want to preserve the tokenizer's word boundaries
# Check if there are any changes
has_changes = any(op != 0 for op, _ in diffs)
if ignore_junk and not has_changes:
return before_line, after_line, False
# Build the before line (with nested highlighting for removed parts)
before_parts = []
for op, text in diffs:
if op == 0: # Equal
before_parts.append(text)
elif op == -1: # Deletion (in before)
before_parts.append(f'<span style="{REMOVED_INNER_STYLE}">{text}</span>')
# Skip insertions (op == 1) for the before line
before_content = ''.join(before_parts)
# Build the after line (with nested highlighting for added parts)
after_parts = []
for op, text in diffs:
if op == 0: # Equal
after_parts.append(text)
elif op == 1: # Insertion (in after)
after_parts.append(f'<span style="{ADDED_INNER_STYLE}">{text}</span>')
# Skip deletions (op == -1) for the after line
after_content = ''.join(after_parts)
# Wrap content with placemarkers (inner HTML highlighting is preserved)
before_html = f'{CHANGED_PLACEMARKER_OPEN}{before_content}{CHANGED_PLACEMARKER_CLOSED}'
after_html = f'{CHANGED_INTO_PLACEMARKER_OPEN}{after_content}{CHANGED_INTO_PLACEMARKER_CLOSED}'
return before_html, after_html, has_changes
def same_slicer(lst: List[str], start: int, end: int) -> List[str]:
"""Return a slice of the list, or a single element if start == end."""
return lst[start:end] if start != end else [lst[start]]
def customSequenceMatcher(
before: List[str],
after: List[str],
include_equal: bool = False,
include_removed: bool = True,
include_added: bool = True,
include_replaced: bool = True,
include_change_type_prefix: bool = True,
word_diff: bool = False,
context_lines: int = 0,
case_insensitive: bool = False,
ignore_junk: bool = False,
tokenizer: str = 'words_and_html'
) -> Iterator[List[str]]:
"""
Compare two sequences and yield differences based on specified parameters.
Args:
before (List[str]): Original sequence
after (List[str]): Modified sequence
include_equal (bool): Include unchanged parts
include_removed (bool): Include removed parts
include_added (bool): Include added parts
include_replaced (bool): Include replaced parts
include_change_type_prefix (bool): Add prefixes to indicate change types
word_diff (bool): Use word-level diffing for replaced lines (controls inline rendering)
context_lines (int): Number of unchanged lines to show around changes (like grep -C)
case_insensitive (bool): Perform case-insensitive comparison
ignore_junk (bool): Ignore whitespace-only changes
tokenizer (str): Name of tokenizer to use from TOKENIZERS registry (default: 'words_and_html')
Yields:
List[str]: Differences between sequences
"""
# Prepare sequences for comparison (lowercase if case-insensitive, normalize whitespace if ignore_junk)
def prepare_line(line):
if case_insensitive:
line = line.lower()
if ignore_junk:
# Normalize whitespace: replace multiple spaces/tabs with single space
line = WHITESPACE_NORMALIZE_RE.sub(' ', line)
return line
compare_before = [prepare_line(line) for line in before]
compare_after = [prepare_line(line) for line in after]
cruncher = difflib.SequenceMatcher(isjunk=lambda x: x in " \t", a=compare_before, b=compare_after)
# When context_lines is set and include_equal is False, we need to track which equal lines to include
if context_lines > 0 and not include_equal:
opcodes = list(cruncher.get_opcodes())
# Mark equal ranges that should be included based on context
included_equal_ranges = set()
for i, (tag, alo, ahi, blo, bhi) in enumerate(opcodes):
if tag != 'equal':
# Include context lines before this change
for j in range(max(0, i - 1), i):
if opcodes[j][0] == 'equal':
prev_alo, prev_ahi = opcodes[j][1], opcodes[j][2]
# Include last N lines of the previous equal block
context_start = max(prev_alo, prev_ahi - context_lines)
for line_num in range(context_start, prev_ahi):
included_equal_ranges.add(line_num)
# Include context lines after this change
for j in range(i + 1, min(len(opcodes), i + 2)):
if opcodes[j][0] == 'equal':
next_alo, next_ahi = opcodes[j][1], opcodes[j][2]
# Include first N lines of the next equal block
context_end = min(next_ahi, next_alo + context_lines)
for line_num in range(next_alo, context_end):
included_equal_ranges.add(line_num)
# Remember: Gmail, Outlook, etc. don't support <style> blocks, so all styles must be inline.
# Gmail also strips <ins> and <del> tags entirely.
for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
if tag == 'equal':
if include_equal:
yield before[alo:ahi]
elif context_lines > 0:
# Only include equal lines that are in the context range
context_lines_to_include = [before[i] for i in range(alo, ahi) if i in included_equal_ranges]
if context_lines_to_include:
yield context_lines_to_include
elif include_removed and tag == 'delete':
if include_change_type_prefix:
yield [f'{REMOVED_PLACEMARKER_OPEN}{line}{REMOVED_PLACEMARKER_CLOSED}' for line in same_slicer(before, alo, ahi)]
else:
yield same_slicer(before, alo, ahi)
elif include_replaced and tag == 'replace':
before_lines = same_slicer(before, alo, ahi)
after_lines = same_slicer(after, blo, bhi)
# Use inline word-level diff for single line replacements when word_diff is enabled
if word_diff and len(before_lines) == 1 and len(after_lines) == 1:
inline_diff, has_changes = render_inline_word_diff(before_lines[0], after_lines[0], ignore_junk=ignore_junk, tokenizer=tokenizer)
# Check if there are any actual changes (not just whitespace when ignore_junk is enabled)
if ignore_junk and not has_changes:
# No real changes, skip this line
continue
yield [inline_diff]
else:
# Fall back to line-level diff for multi-line changes
if include_change_type_prefix:
yield [f'{CHANGED_PLACEMARKER_OPEN}{line}{CHANGED_PLACEMARKER_CLOSED}' for line in before_lines] + \
[f'{CHANGED_INTO_PLACEMARKER_OPEN}{line}{CHANGED_INTO_PLACEMARKER_CLOSED}' for line in after_lines]
else:
yield before_lines + after_lines
elif include_added and tag == 'insert':
if include_change_type_prefix:
yield [f'{ADDED_PLACEMARKER_OPEN}{line}{ADDED_PLACEMARKER_CLOSED}' for line in same_slicer(after, blo, bhi)]
else:
yield same_slicer(after, blo, bhi)
def render_diff(
previous_version_file_contents: str,
newest_version_file_contents: str,
include_equal: bool = False,
include_removed: bool = True,
include_added: bool = True,
include_replaced: bool = True,
include_change_type_prefix: bool = True,
patch_format: bool = False,
word_diff: bool = True,
context_lines: int = 0,
case_insensitive: bool = False,
ignore_junk: bool = False,
tokenizer: str = 'words_and_html'
) -> str:
"""
Render the difference between two file contents.
Args:
previous_version_file_contents (str): Original file contents
newest_version_file_contents (str): Modified file contents
include_equal (bool): Include unchanged parts
include_removed (bool): Include removed parts
include_added (bool): Include added parts
include_replaced (bool): Include replaced parts
include_change_type_prefix (bool): Add prefixes to indicate change types
patch_format (bool): Use patch format for output
word_diff (bool): Use word-level diffing for replaced lines (controls inline rendering)
context_lines (int): Number of unchanged lines to show around changes (like grep -C)
case_insensitive (bool): Perform case-insensitive comparison. By default the test_json_diff/process.py is case-sensitive, so this follows the same logic
ignore_junk (bool): Ignore whitespace-only changes
tokenizer (str): Name of tokenizer to use from TOKENIZERS registry (default: 'words_and_html')
Returns:
str: Rendered difference
"""
newest_lines = [line.rstrip() for line in newest_version_file_contents.splitlines()]
previous_lines = [line.rstrip() for line in previous_version_file_contents.splitlines()] if previous_version_file_contents else []
now = time.time()
logger.debug(
f"diff options: "
f"include_equal={include_equal}, "
f"include_removed={include_removed}, "
f"include_added={include_added}, "
f"include_replaced={include_replaced}, "
f"include_change_type_prefix={include_change_type_prefix}, "
f"patch_format={patch_format}, "
f"word_diff={word_diff}, "
f"context_lines={context_lines}, "
f"case_insensitive={case_insensitive}, "
f"ignore_junk={ignore_junk}, "
f"tokenizer={tokenizer}"
)
if patch_format:
patch = difflib.unified_diff(previous_lines, newest_lines)
return "\n".join(patch)
rendered_diff = customSequenceMatcher(
before=previous_lines,
after=newest_lines,
include_equal=include_equal,
include_removed=include_removed,
include_added=include_added,
include_replaced=include_replaced,
include_change_type_prefix=include_change_type_prefix,
word_diff=word_diff,
context_lines=context_lines,
case_insensitive=case_insensitive,
ignore_junk=ignore_junk,
tokenizer=tokenizer
)
def flatten(lst: List[Union[str, List[str]]]) -> str:
result = []
for x in lst:
if isinstance(x, list):
result.extend(x)
else:
result.append(x)
return "\n".join(result)
logger.debug(f"Diff generated in {time.time() - now:.2f}s")
return flatten(rendered_diff)
# Export main public API
__all__ = [
'render_diff',
'customSequenceMatcher',
'render_inline_word_diff',
'render_nested_line_diff',
'TOKENIZERS',
'REMOVED_STYLE',
'ADDED_STYLE',
'REMOVED_INNER_STYLE',
'ADDED_INNER_STYLE',
]
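
Note on `context_lines`: the selection logic in `customSequenceMatcher` keeps only the unchanged lines that border a change, much like `grep -C`. A minimal sketch of that idea, using only `difflib` from the standard library, is shown here. The helper name `context_line_indexes` is hypothetical and is not part of the module above.

```python
import difflib
from typing import List, Set


def context_line_indexes(before: List[str], after: List[str], context_lines: int) -> Set[int]:
    """Return indexes into `before` of unchanged lines within `context_lines` of a change."""
    opcodes = difflib.SequenceMatcher(a=before, b=after).get_opcodes()
    keep: Set[int] = set()
    for i, opcode in enumerate(opcodes):
        if opcode[0] == 'equal':
            continue
        # Tail of the equal block immediately before this change
        if i > 0 and opcodes[i - 1][0] == 'equal':
            _, alo, ahi, _, _ = opcodes[i - 1]
            keep.update(range(max(alo, ahi - context_lines), ahi))
        # Head of the equal block immediately after this change
        if i + 1 < len(opcodes) and opcodes[i + 1][0] == 'equal':
            _, alo, ahi, _, _ = opcodes[i + 1]
            keep.update(range(alo, min(ahi, alo + context_lines)))
    return keep


before = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
after = ['a', 'b', 'c', 'X', 'e', 'f', 'g']
print(sorted(context_line_indexes(before, after, context_lines=1)))  # [2, 4]
```

With `context_lines=0` the set is empty, which matches the default behaviour of emitting only changed lines.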


@@ -0,0 +1,23 @@
"""
Tokenizers for diff operations.
This module provides various tokenization strategies for use with the diff system.
New tokenizers can be easily added by:
1. Creating a new module in this directory
2. Importing and registering it in the TOKENIZERS dictionary below
"""
from .natural_text import tokenize_words
from .words_and_html import tokenize_words_and_html
# Tokenizer registry - maps tokenizer names to functions
TOKENIZERS = {
'words': tokenize_words,
'words_and_html': tokenize_words_and_html,
}
__all__ = [
'tokenize_words',
'tokenize_words_and_html',
'TOKENIZERS',
]
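
Note on the registry: the docstring above describes adding a tokenizer by registering a function under a name. The following sketch shows that pattern in isolation. `REGISTRY`, `tokenize_chars`, `tokenize_whitespace` and `get_tokenizer` are hypothetical stand-ins, and the fallback to a default name is an assumption rather than something this diff confirms.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def tokenize_chars(text: str) -> List[str]:
    """Hypothetical tokenizer: every character is its own token."""
    return list(text)


def tokenize_whitespace(text: str) -> List[str]:
    """Stand-in word tokenizer; unlike the real ones it drops whitespace."""
    return text.split()


# Local stand-in for the TOKENIZERS dictionary
REGISTRY: Dict[str, Tokenizer] = {'words': tokenize_whitespace}

# Registering a new tokenizer is a single dictionary assignment
REGISTRY['chars'] = tokenize_chars


def get_tokenizer(name: str, default: str = 'words') -> Tokenizer:
    """Look up a tokenizer by name, falling back to `default` for unknown names."""
    return REGISTRY.get(name, REGISTRY[default])


print(get_tokenizer('chars')('ab'))      # ['a', 'b']
print(get_tokenizer('unknown')('a b'))   # ['a', 'b']
```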


@@ -0,0 +1,44 @@
"""
Simple word tokenizer using whitespace boundaries.
This is a simpler tokenizer that treats all whitespace as token boundaries
without special handling for HTML tags or other markup.
"""
from typing import List
def tokenize_words(text: str) -> List[str]:
"""
Split text into words using simple whitespace boundaries.
This is a simpler tokenizer that treats all whitespace as token boundaries
without special handling for HTML tags.
Args:
text: Input text to tokenize
Returns:
List of tokens (words and whitespace)
Examples:
>>> tokenize_words("Hello world")
['Hello', ' ', 'world']
>>> tokenize_words("one  two")
['one', ' ', ' ', 'two']
"""
tokens = []
current = ''
for char in text:
if char.isspace():
if current:
tokens.append(current)
current = ''
tokens.append(char)
else:
current += char
if current:
tokens.append(current)
return tokens
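
Note on token boundaries: the diff module deliberately avoids character-level cleanup so that a change such as "63" to "66" is reported as whole tokens. The sketch below illustrates that effect with `difflib.SequenceMatcher` swapped in for diff-match-patch, and a regex stand-in (`split_words`) that is assumed to behave like `tokenize_words`. Both helper names are hypothetical.

```python
import difflib
import re
from typing import List, Tuple


def split_words(text: str) -> List[str]:
    """Each whitespace character and each run of non-whitespace is one token."""
    return re.findall(r'\s|\S+', text)


def token_diff(before: str, after: str) -> List[Tuple[str, str]]:
    """Return (op, text) pairs where op is 'equal', 'delete' or 'insert'."""
    a, b = split_words(before), split_words(after)
    result: List[Tuple[str, str]] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, alo, ahi, blo, bhi in matcher.get_opcodes():
        if tag == 'equal':
            result.append(('equal', ''.join(a[alo:ahi])))
        if tag in ('delete', 'replace'):
            result.append(('delete', ''.join(a[alo:ahi])))
        if tag in ('insert', 'replace'):
            result.append(('insert', ''.join(b[blo:bhi])))
    return result


print(token_diff('Price: 63 USD', 'Price: 66 USD'))
# [('equal', 'Price: '), ('delete', '63'), ('insert', '66'), ('equal', ' USD')]
```

Because the comparison runs over tokens rather than characters, the shared leading "6" is never split off from the number.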


@@ -0,0 +1,61 @@
"""
Tokenizer that preserves HTML tags as atomic units while splitting on whitespace.
This tokenizer is specifically designed for HTML content where:
- HTML tags should remain intact (e.g., '<p>', '<a href="...">')
- Whitespace tokens are preserved for accurate diff reconstruction
- Words are split on whitespace boundaries
"""
from typing import List
def tokenize_words_and_html(text: str) -> List[str]:
"""
Split text into words and boundaries (spaces, HTML tags).
This tokenizer preserves HTML tags as atomic units while splitting on whitespace.
Useful for content that contains HTML markup.
Args:
text: Input text to tokenize
Returns:
List of tokens (words, spaces, HTML tags)
Examples:
>>> tokenize_words_and_html("<p>Hello world</p>")
['<p>', 'Hello', ' ', 'world', '</p>']
>>> tokenize_words_and_html("<a href='test.com'>link</a>")
["<a href='test.com'>", 'link', '</a>']
"""
tokens = []
current = ''
in_tag = False
for char in text:
if char == '<':
# Start of HTML tag
if current:
tokens.append(current)
current = ''
current = '<'
in_tag = True
elif char == '>' and in_tag:
# End of HTML tag
current += '>'
tokens.append(current)
current = ''
in_tag = False
elif char.isspace() and not in_tag:
# Space outside of tag
if current:
tokens.append(current)
current = ''
tokens.append(char)
else:
current += char
if current:
tokens.append(current)
return tokens
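
Note on reconstruction: because whitespace and tags are kept as tokens, joining the token list should give back the original text. The sketch below checks that property with a compact regex approximation of the loop above. `TOKEN_RE` and `split_words_and_tags` are hypothetical names, and the regex is an illustration rather than the tokenizer the diff module actually calls.

```python
import re
from typing import List

# A tag is '<' up to the next '>' (or the end of input); otherwise single
# whitespace characters and runs of non-whitespace are tokens.
TOKEN_RE = re.compile(r'<[^<>]*>?|\s|[^\s<]+')


def split_words_and_tags(text: str) -> List[str]:
    """Split text into tags, single whitespace characters and words."""
    return TOKEN_RE.findall(text)


sample = "<p>Hello world</p>"
tokens = split_words_and_tags(sample)
print(tokens)                      # ['<p>', 'Hello', ' ', 'world', '</p>']
print(''.join(tokens) == sample)   # True
```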


@@ -0,0 +1,43 @@
"""
Favicon utilities for changedetection.io
Handles favicon MIME type detection with caching
"""
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_favicon_mime_type(filepath):
"""
Detect MIME type of favicon by reading file content using puremagic.
Results are cached to avoid repeatedly reading the same files.
Args:
filepath: Full path to the favicon file
Returns:
MIME type string (e.g., 'image/png')
"""
mime = None
try:
import puremagic
with open(filepath, 'rb') as f:
content_bytes = f.read(200) # Read first 200 bytes
detections = puremagic.magic_string(content_bytes)
if detections:
mime = detections[0].mime_type
except Exception:
pass
# Fallback to mimetypes if puremagic fails
if not mime:
import mimetypes
mime, _ = mimetypes.guess_type(filepath)
# Final fallback based on extension
if not mime:
mime = 'image/x-icon' if filepath.endswith('.ico') else 'image/png'
return mime
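
Note on the fallback chain: `get_favicon_mime_type` tries content sniffing first, then the filename, then a fixed default. The sketch below mirrors that order without the puremagic dependency by using a small, hypothetical signature table (`SIGNATURES`) and a hypothetical helper `sniff_mime`. It illustrates the ordering only and covers far fewer formats than puremagic.

```python
import mimetypes
from typing import Optional

# Hypothetical, deliberately tiny signature table
SIGNATURES = (
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'\x00\x00\x01\x00', 'image/x-icon'),
    (b'GIF87a', 'image/gif'),
    (b'GIF89a', 'image/gif'),
    (b'\xff\xd8\xff', 'image/jpeg'),
)


def sniff_mime(head: bytes, filename: str) -> str:
    """Guess a favicon MIME type from leading bytes, then the name, then a default."""
    mime: Optional[str] = None
    for signature, candidate in SIGNATURES:
        if head.startswith(signature):
            mime = candidate
            break
    # Second choice: extension-based lookup
    if not mime:
        mime, _ = mimetypes.guess_type(filename)
    # Last resort: same default rule as the function above
    if not mime:
        mime = 'image/x-icon' if filename.endswith('.ico') else 'image/png'
    return mime


# Content wins over a misleading extension
print(sniff_mime(b'\x89PNG\r\n\x1a\n' + b'\x00' * 8, 'favicon.ico'))  # image/png
```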


@@ -9,11 +9,12 @@ import threading
import time
import timeago
from blinker import signal
from pathlib import Path
from changedetectionio.strtobool import strtobool
from threading import Event
from changedetectionio.queue_handlers import RecheckPriorityQueue, NotificationQueue
from changedetectionio import worker_handler
from changedetectionio import worker_pool
from flask import (
Flask,
@@ -23,10 +24,9 @@ from flask import (
render_template,
request,
send_from_directory,
session,
url_for,
)
from flask_compress import Compress as FlaskCompress
from flask_login import current_user
from flask_restful import abort, Api
from flask_cors import CORS
@@ -34,13 +34,18 @@ from flask_cors import CORS
# Make this a global singleton to avoid multiple signal objects
watch_check_update = signal('watch_check_update', doc='Signal sent when a watch check is completed')
from flask_wtf import CSRFProtect
from flask_babel import Babel, gettext, get_locale
from loguru import logger
from changedetectionio import __version__
from changedetectionio import queuedWatchMetaData
from changedetectionio.api import Watch, WatchHistory, WatchSingleHistory, CreateWatch, Import, SystemInfo, Tag, Tags, Notifications, WatchFavicon
from changedetectionio.api import Watch, WatchHistory, WatchSingleHistory, WatchHistoryDiff, CreateWatch, Import, SystemInfo, Tag, Tags, Notifications, WatchFavicon, Spec
from changedetectionio.api.Search import Search
from .time_handler import is_within_schedule
from changedetectionio.languages import get_available_languages, get_language_codes, get_flag_for_locale, get_timeago_locale
from changedetectionio.favicon_utils import get_favicon_mime_type
IN_PYTEST = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
datastore = None
@@ -51,7 +56,7 @@ extra_stylesheets = []
# Use bulletproof janus-based queues for sync/async reliability
update_q = RecheckPriorityQueue()
notification_q = NotificationQueue()
MAX_QUEUE_SIZE = 2000
MAX_QUEUE_SIZE = 5000
app = Flask(__name__,
static_url_path="",
@@ -63,9 +68,45 @@ socketio_server = None
# Enable CORS, especially useful for the Chrome extension to operate from anywhere
CORS(app)
from werkzeug.routing import BaseConverter, ValidationError
from uuid import UUID
class StrictUUIDConverter(BaseConverter):
# Special sentinel values allowed in addition to strict UUIDs
_ALLOWED_SENTINELS = frozenset({'first'})
def to_python(self, value: str) -> str:
if value in self._ALLOWED_SENTINELS:
return value
try:
u = UUID(value)
except ValueError as e:
raise ValidationError() from e
# Reject non-standard formats (braces, URNs, no-hyphens)
if str(u) != value.lower():
raise ValidationError()
return str(u)
def to_url(self, value) -> str:
return str(value)
# app setup (once)
app.url_map.converters["uuid_str"] = StrictUUIDConverter
# Flask-Compress handles HTTP compression; Socket.IO compression is disabled to prevent a memory leak.
# There is also a bug between Flask-Compress and Socket.IO that causes a slow memory leak.
# It's better to use compression on your reverse proxy (nginx etc.) instead.
if strtobool(os.getenv("FLASK_ENABLE_COMPRESSION")):
from flask_compress import Compress as FlaskCompress
app.config['COMPRESS_MIN_SIZE'] = 2096
app.config['COMPRESS_MIMETYPES'] = ['text/html', 'text/css', 'text/javascript', 'application/json', 'application/javascript', 'image/svg+xml']
# Use gzip only - smaller memory footprint than zstd/brotli (4-8KB vs 200-500KB contexts)
app.config['COMPRESS_ALGORITHM'] = ['gzip']
compress = FlaskCompress()
compress.init_app(app)
app.config['TEMPLATES_AUTO_RELOAD'] = False
# Super handy for compressing large BrowserSteps responses and others
FlaskCompress(app)
# Stop browser caching of assets
app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0
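
Note on `StrictUUIDConverter` (added earlier in this hunk): `uuid.UUID()` is lenient and also accepts braces, `urn:uuid:` prefixes and hyphen-less hex, which is why the converter compares the parsed value back against the input. The sketch below reproduces that check outside of Werkzeug. The function name `normalize_uuid_segment` is hypothetical, and returning `None` stands in for raising `ValidationError`.

```python
from typing import Optional
from uuid import UUID

ALLOWED_SENTINELS = frozenset({'first'})


def normalize_uuid_segment(value: str) -> Optional[str]:
    """Return the canonical form of `value`, or None if it should not match the route."""
    if value in ALLOWED_SENTINELS:
        return value
    try:
        parsed = UUID(value)
    except ValueError:
        return None
    # Only the canonical 8-4-4-4-12 spelling round-trips through str()
    if str(parsed) != value.lower():
        return None
    return str(parsed)


canonical = '6a1bcb8b-1e3c-4c1e-9f0a-2d3e4f5a6b7c'
print(normalize_uuid_segment(canonical.upper()))         # lower-cased canonical form
print(normalize_uuid_segment('{' + canonical + '}'))     # None
print(normalize_uuid_segment(canonical.replace('-', '')))  # None
```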
@@ -76,11 +117,43 @@ app.config['NEW_VERSION_AVAILABLE'] = False
if os.getenv('FLASK_SERVER_NAME'):
app.config['SERVER_NAME'] = os.getenv('FLASK_SERVER_NAME')
# Babel/i18n configuration
app.config['BABEL_TRANSLATION_DIRECTORIES'] = str(Path(__file__).parent / 'translations')
app.config['BABEL_DEFAULT_LOCALE'] = 'en_GB'
# Session configuration
# NOTE: Flask session (for locale, etc.) is separate from Flask-Login's remember-me cookie
# - Flask session stores data like session['locale'] in a signed cookie
# - Flask-Login's remember=True creates a separate authentication cookie
# - Setting PERMANENT_SESSION_LIFETIME controls how long the Flask session cookie lasts
from datetime import timedelta
app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(days=3650) # ~10 years (effectively unlimited)
#app.config["EXPLAIN_TEMPLATE_LOADING"] = True
# Disables caching of the templates
app.config['TEMPLATES_AUTO_RELOAD'] = True
app.jinja_env.add_extension('jinja2.ext.loopcontrols')
# Configure Jinja2 to search for templates in plugin directories
def _configure_plugin_templates():
"""Configure Jinja2 loader to include plugin template directories."""
from jinja2 import ChoiceLoader, FileSystemLoader
from changedetectionio.pluggy_interface import get_plugin_template_paths
# Get plugin template paths
plugin_template_paths = get_plugin_template_paths()
if plugin_template_paths:
# Create a ChoiceLoader that searches app templates first, then plugin templates
loaders = [app.jinja_loader] # Keep the default app loader first
for path in plugin_template_paths:
loaders.append(FileSystemLoader(path))
app.jinja_loader = ChoiceLoader(loaders)
logger.info(f"Configured Jinja2 to search {len(plugin_template_paths)} plugin template directories")
# Configure plugin templates (called after plugins are loaded)
_configure_plugin_templates()
csrf = CSRFProtect()
csrf.init_app(app)
notification_debug_log=[]
@@ -149,7 +222,7 @@ def _jinja2_filter_format_number_locale(value: float) -> str:
@app.template_global('is_checking_now')
def _watch_is_checking_now(watch_obj, format="%Y-%m-%d %H:%M:%S"):
return worker_handler.is_watch_running(watch_obj['uuid'])
return worker_pool.is_watch_running(watch_obj['uuid'])
@app.template_global('get_watch_queue_position')
def _get_watch_queue_position(watch_obj):
@@ -160,13 +233,13 @@ def _get_watch_queue_position(watch_obj):
@app.template_global('get_current_worker_count')
def _get_current_worker_count():
"""Get the current number of operational workers"""
return worker_handler.get_worker_count()
return worker_pool.get_worker_count()
@app.template_global('get_worker_status_info')
def _get_worker_status_info():
"""Get detailed worker status information for display"""
status = worker_handler.get_worker_status()
running_uuids = worker_handler.get_running_uuids()
status = worker_pool.get_worker_status()
running_uuids = worker_pool.get_running_uuids()
return {
'count': status['worker_count'],
@@ -183,16 +256,26 @@ def _get_worker_status_info():
def _jinja2_filter_datetime(watch_obj, format="%Y-%m-%d %H:%M:%S"):
if watch_obj['last_checked'] == 0:
return 'Not yet'
return gettext('Not yet')
return timeago.format(int(watch_obj['last_checked']), time.time())
locale = get_timeago_locale(str(get_locale()))
try:
return timeago.format(int(watch_obj['last_checked']), time.time(), locale)
except Exception:
# Fallback to English if locale not supported by timeago
return timeago.format(int(watch_obj['last_checked']), time.time(), 'en')
@app.template_filter('format_timestamp_timeago')
def _jinja2_filter_datetimestamp(timestamp, format="%Y-%m-%d %H:%M:%S"):
if not timestamp:
return 'Not yet'
return gettext('Not yet')
return timeago.format(int(timestamp), time.time())
locale = get_timeago_locale(str(get_locale()))
try:
return timeago.format(int(timestamp), time.time(), locale)
except Exception:
# Fallback to English if locale not supported by timeago
return timeago.format(int(timestamp), time.time(), 'en')
@app.template_filter('pagination_slice')
@@ -206,10 +289,119 @@ def _jinja2_filter_pagination_slice(arr, skip):
@app.template_filter('format_seconds_ago')
def _jinja2_filter_seconds_precise(timestamp):
if timestamp == False:
return 'Not yet'
return gettext('Not yet')
return format(int(time.time()-timestamp), ',d')
@app.template_filter('format_duration')
def _jinja2_filter_format_duration(seconds):
"""Format a duration in seconds into human readable string like '5 days, 3 hours, 30 minutes'"""
from datetime import timedelta
if not seconds or seconds < 0:
return gettext('0 seconds')
td = timedelta(seconds=int(seconds))
# Calculate components
years = td.days // 365
remaining_days = td.days % 365
months = remaining_days // 30
remaining_days = remaining_days % 30
weeks = remaining_days // 7
days = remaining_days % 7
hours = td.seconds // 3600
minutes = (td.seconds % 3600) // 60
secs = td.seconds % 60
# Build parts list
parts = []
if years > 0:
parts.append(f"{years} {gettext('year') if years == 1 else gettext('years')}")
if months > 0:
parts.append(f"{months} {gettext('month') if months == 1 else gettext('months')}")
if weeks > 0:
parts.append(f"{weeks} {gettext('week') if weeks == 1 else gettext('weeks')}")
if days > 0:
parts.append(f"{days} {gettext('day') if days == 1 else gettext('days')}")
if hours > 0:
parts.append(f"{hours} {gettext('hour') if hours == 1 else gettext('hours')}")
if minutes > 0:
parts.append(f"{minutes} {gettext('minute') if minutes == 1 else gettext('minutes')}")
if secs > 0 or not parts:
parts.append(f"{secs} {gettext('second') if secs == 1 else gettext('seconds')}")
return ", ".join(parts)
@app.template_filter('fetcher_status_icons')
def _jinja2_filter_fetcher_status_icons(fetcher_name):
"""Get status icon HTML for a given fetcher.
This filter checks both built-in fetchers and plugin fetchers for status icons.
Args:
fetcher_name: The fetcher name (e.g., 'html_webdriver', 'html_js_zyte')
Returns:
str: HTML string containing status icon elements
"""
from changedetectionio import content_fetchers
from changedetectionio.pluggy_interface import collect_fetcher_status_icons
from markupsafe import Markup
from flask import url_for
icon_data = None
# First check if it's a plugin fetcher (plugins have priority)
plugin_icon_data = collect_fetcher_status_icons(fetcher_name)
if plugin_icon_data:
icon_data = plugin_icon_data
# Check if it's a built-in fetcher
elif hasattr(content_fetchers, fetcher_name):
fetcher_class = getattr(content_fetchers, fetcher_name)
if hasattr(fetcher_class, 'get_status_icon_data'):
icon_data = fetcher_class.get_status_icon_data()
# Build HTML from icon data
if icon_data and isinstance(icon_data, dict):
# Use 'group' from icon_data if specified, otherwise default to 'images'
group = icon_data.get('group', 'images')
# Try to use url_for, but fall back to manual URL building if endpoint not registered yet
try:
icon_url = url_for('static_content', group=group, filename=icon_data['filename'])
except Exception:
# Fallback: build URL manually respecting APPLICATION_ROOT
from flask import request
app_root = request.script_root if hasattr(request, 'script_root') else ''
icon_url = f"{app_root}/static/{group}/{icon_data['filename']}"
style_attr = f' style="{icon_data["style"]}"' if icon_data.get('style') else ''
html = f'<img class="status-icon" src="{icon_url}" alt="{icon_data["alt"]}" title="{icon_data["title"]}"{style_attr}>'
return Markup(html)
return ''
@app.template_filter('sanitize_tag_class')
def _jinja2_filter_sanitize_tag_class(tag_title):
"""Sanitize a tag title to create a valid CSS class name.
Removes all non-alphanumeric characters and converts to lowercase.
Args:
tag_title: The tag title string
Returns:
str: A sanitized string suitable for use as a CSS class name
"""
import re
# Remove all non-alphanumeric characters and convert to lowercase
sanitized = re.sub(r'[^a-zA-Z0-9]', '', tag_title).lower()
# Ensure it starts with a letter (CSS requirement)
if sanitized and not sanitized[0].isalpha():
sanitized = 'tag' + sanitized
return sanitized if sanitized else 'tag'
# Import login_optionally_required from auth_decorator
from changedetectionio.auth_decorator import login_optionally_required
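
Note on the `format_duration` filter added in this hunk: it decomposes a `timedelta` using fixed approximations of 365 days per year and 30 days per month. The sketch below reproduces the arithmetic without Flask or `gettext`, so the unit names are plain English. The function name `describe_duration` is hypothetical.

```python
from datetime import timedelta


def describe_duration(seconds) -> str:
    """Format seconds as e.g. '5 days, 3 hours, 30 minutes' (untranslated sketch)."""
    if not seconds or seconds < 0:
        return '0 seconds'
    td = timedelta(seconds=int(seconds))
    # Approximations: 1 year = 365 days, 1 month = 30 days
    years, rest = divmod(td.days, 365)
    months, rest = divmod(rest, 30)
    weeks, days = divmod(rest, 7)
    hours, rest_seconds = divmod(td.seconds, 3600)
    minutes, secs = divmod(rest_seconds, 60)
    units = (
        (years, 'year'), (months, 'month'), (weeks, 'week'), (days, 'day'),
        (hours, 'hour'), (minutes, 'minute'), (secs, 'second'),
    )
    parts = [f"{n} {name if n == 1 else name + 's'}" for n, name in units if n > 0]
    return ', '.join(parts) if parts else '0 seconds'


print(describe_duration(444600))       # 5 days, 3 hours, 30 minutes
print(describe_duration(400 * 86400))  # 1 year, 1 month, 5 days
```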
@@ -264,17 +456,68 @@ def changedetection_app(config=None, datastore_o=None):
global datastore, socketio_server
datastore = datastore_o
# Set datastore reference in notification queue for all_muted checking
notification_q.set_datastore(datastore)
# Import and create a wrapper for is_safe_url that has access to app
from changedetectionio.is_safe_url import is_safe_url as _is_safe_url
def is_safe_url(target):
"""Wrapper for is_safe_url that passes the app instance"""
return _is_safe_url(target, app)
# so far just for read-only via tests, but this will be moved eventually to be the main source
# (instead of the global var)
app.config['DATASTORE'] = datastore_o
# Store batch mode flag to skip background threads when running in batch mode
app.config['batch_mode'] = config.get('batch_mode', False) if config else False
# Store the signal in the app config to ensure it's accessible everywhere
app.config['watch_check_update_SIGNAL'] = watch_check_update
login_manager = flask_login.LoginManager(app)
login_manager.login_view = 'login'
app.secret_key = init_app_secret(config['datastore_path'])
# Initialize Flask-Babel for i18n support
available_languages = get_available_languages()
language_codes = get_language_codes()
def get_locale():
# Locale aliases: map browser language codes to translation directory names
# This handles cases where browsers send standard codes (e.g., zh-TW)
# but our translations use more specific codes (e.g., zh_Hant_TW)
locale_aliases = {
'zh-TW': 'zh_Hant_TW', # Traditional Chinese: browser sends zh-TW, we use zh_Hant_TW
'zh_TW': 'zh_Hant_TW', # Also handle underscore variant
}
# 1. Try to get locale from session (user explicitly selected)
if 'locale' in session:
return session['locale']
# 2. Fall back to Accept-Language header
# Get the best match from browser's Accept-Language header
browser_locale = request.accept_languages.best_match(language_codes + list(locale_aliases.keys()))
# 3. Check if we need to map the browser locale to our internal locale
if browser_locale in locale_aliases:
return locale_aliases[browser_locale]
return browser_locale
# Initialize Babel with locale selector
babel = Babel(app, locale_selector=get_locale)
# Make i18n functions available to templates
app.jinja_env.globals.update(
_=gettext,
get_locale=get_locale,
get_flag_for_locale=get_flag_for_locale,
available_languages=available_languages
)
# Set up a request hook to check authentication for all routes
@app.before_request
def check_authentication():
@@ -285,6 +528,12 @@ def changedetection_app(config=None, datastore_o=None):
if request.endpoint and request.endpoint == 'static_content' and request.view_args:
# Handled by static_content handler
return None
# Permitted - static flag icons need to load on login page
elif request.endpoint and request.endpoint == 'static_flags':
return None
# Permitted - language selection should work on login page
elif request.endpoint and request.endpoint == 'set_language':
return None
# Permitted
elif request.endpoint and 'login' in request.endpoint:
return None
@@ -307,20 +556,23 @@ def changedetection_app(config=None, datastore_o=None):
return login_manager.unauthorized()
watch_api.add_resource(WatchHistoryDiff,
'/api/v1/watch/<uuid_str:uuid>/difference/<string:from_timestamp>/<string:to_timestamp>',
resource_class_kwargs={'datastore': datastore})
watch_api.add_resource(WatchSingleHistory,
'/api/v1/watch/<string:uuid>/history/<string:timestamp>',
'/api/v1/watch/<uuid_str:uuid>/history/<string:timestamp>',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
watch_api.add_resource(WatchFavicon,
'/api/v1/watch/<string:uuid>/favicon',
'/api/v1/watch/<uuid_str:uuid>/favicon',
resource_class_kwargs={'datastore': datastore})
watch_api.add_resource(WatchHistory,
'/api/v1/watch/<string:uuid>/history',
'/api/v1/watch/<uuid_str:uuid>/history',
resource_class_kwargs={'datastore': datastore})
watch_api.add_resource(CreateWatch, '/api/v1/watch',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
watch_api.add_resource(Watch, '/api/v1/watch/<string:uuid>',
watch_api.add_resource(Watch, '/api/v1/watch/<uuid_str:uuid>',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
watch_api.add_resource(SystemInfo, '/api/v1/systeminfo',
@@ -333,7 +585,7 @@ def changedetection_app(config=None, datastore_o=None):
watch_api.add_resource(Tags, '/api/v1/tags',
resource_class_kwargs={'datastore': datastore})
watch_api.add_resource(Tag, '/api/v1/tag', '/api/v1/tag/<string:uuid>',
watch_api.add_resource(Tag, '/api/v1/tag', '/api/v1/tag/<uuid_str:uuid>',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
watch_api.add_resource(Search, '/api/v1/search',
@@ -342,6 +594,8 @@ def changedetection_app(config=None, datastore_o=None):
watch_api.add_resource(Notifications, '/api/v1/notifications',
resource_class_kwargs={'datastore': datastore})
watch_api.add_resource(Spec, '/api/v1/full-spec')
@login_manager.user_loader
def user_loader(email):
user = User()
@@ -350,25 +604,76 @@ def changedetection_app(config=None, datastore_o=None):
@login_manager.unauthorized_handler
def unauthorized_handler():
flash("You must be logged in, please log in.", 'error')
return redirect(url_for('login', next=url_for('watchlist.index')))
# Pass the current request path so users are redirected back after login
return redirect(url_for('login', redirect=request.path))
@app.route('/logout')
def logout():
flask_login.logout_user()
# Check if there's a redirect parameter to return to after re-login
redirect_url = request.args.get('redirect')
# If redirect is provided and safe, pass it to login page
if redirect_url and is_safe_url(redirect_url):
return redirect(url_for('login', redirect=redirect_url))
# Otherwise just go to watchlist
return redirect(url_for('watchlist.index'))
@app.route('/set-language/<locale>')
def set_language(locale):
"""Set the user's preferred language in the session"""
if not request.cookies:
logger.error("Cannot set language without session cookie")
flash("Cannot set language without session cookie", 'error')
return redirect(url_for('watchlist.index'))
# Validate the locale against available languages
if locale in language_codes:
# Make session permanent so language preference persists across browser sessions
# NOTE: This is the Flask session cookie (separate from Flask-Login's remember-me auth cookie)
session.permanent = True
session['locale'] = locale
# CRITICAL: Flask-Babel caches the locale in the request context (ctx.babel_locale)
# We must refresh to clear this cache so the new locale takes effect immediately
# This is especially important for tests where multiple requests happen rapidly
from flask_babel import refresh
refresh()
else:
logger.error(f"Invalid locale {locale}, available: {language_codes}")
# Check if there's a redirect parameter to return to the same page
redirect_url = request.args.get('redirect')
# If redirect is provided and safe, use it
if redirect_url and is_safe_url(redirect_url):
return redirect(redirect_url)
# Otherwise redirect to watchlist
return redirect(url_for('watchlist.index'))
# https://github.com/pallets/flask/blob/93dd1709d05a1cf0e886df6223377bdab3b077fb/examples/tutorial/flaskr/__init__.py#L39
# You can divide up the stuff like this
@app.route('/login', methods=['GET', 'POST'])
def login():
# Extract and validate the redirect parameter
redirect_url = request.args.get('redirect') or request.form.get('redirect')
# Validate the redirect URL - default to watchlist if invalid
if redirect_url and is_safe_url(redirect_url):
validated_redirect = redirect_url
else:
validated_redirect = url_for('watchlist.index')
if request.method == 'GET':
if flask_login.current_user.is_authenticated:
flash("Already logged in")
return redirect(url_for("watchlist.index"))
output = render_template("login.html")
# Already logged in - redirect immediately to the target
flash(gettext("Already logged in"))
return redirect(validated_redirect)
flash(gettext("You must be logged in, please log in."), 'error')
output = render_template("login.html", redirect_url=validated_redirect)
return output
user = User()
@@ -378,23 +683,13 @@ def changedetection_app(config=None, datastore_o=None):
if (user.check_password(password)):
flask_login.login_user(user, remember=True)
# For now there's nothing else interesting here other than the index/list page
# It's more reliable and safe to ignore the 'next' redirect
# When we used...
# next = request.args.get('next')
# return redirect(next or url_for('watchlist.index'))
# We would sometimes get login loop errors on sites hosted in sub-paths
# note for the future:
# if not is_safe_valid_url(next):
# return flask.abort(400)
return redirect(url_for('watchlist.index'))
# Redirect to the validated URL after successful login
return redirect(validated_redirect)
else:
flash('Incorrect password', 'error')
flash(gettext('Incorrect password'), 'error')
return redirect(url_for('login'))
return redirect(url_for('login', redirect=redirect_url if redirect_url else None))
@app.before_request
def before_request_handle_cookie_x_settings():
@@ -404,12 +699,52 @@ def changedetection_app(config=None, datastore_o=None):
app.config['SESSION_COOKIE_PATH'] = request.headers['X-Forwarded-Prefix']
return None
@app.route("/static/flags/<path:flag_path>", methods=['GET'])
def static_flags(flag_path):
"""Handle flag icon files with subdirectories"""
from flask import make_response
import re
# flag_path comes in as "1x1/de.svg" or "4x3/de.svg"
if re.match(r'^(1x1|4x3)/[a-z0-9-]+\.svg$', flag_path.lower()):
# Reconstruct the path safely with additional validation
parts = flag_path.lower().split('/')
if len(parts) != 2:
abort(404)
subdir = parts[0]
svg_file = parts[1]
# Extra validation: ensure subdir is exactly 1x1 or 4x3
if subdir not in ['1x1', '4x3']:
abort(404)
# Extra validation: ensure svg_file only contains safe characters
if not re.match(r'^[a-z0-9-]+\.svg$', svg_file):
abort(404)
try:
response = make_response(send_from_directory(f"static/flags/{subdir}", svg_file))
response.headers['Content-type'] = 'image/svg+xml'
response.headers['Cache-Control'] = 'max-age=86400, public' # Cache for 24 hours
return response
except FileNotFoundError:
abort(404)
else:
abort(404)
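
The path validation in `static_flags()` can be tried in isolation. The sketch below uses a hypothetical helper, `split_flag_path()`, which is not part of the diff. It uses `re.fullmatch` rather than `re.match` with `$`, because `$` also matches just before a trailing newline.

```python
import re

# Hypothetical helper mirroring the checks in static_flags(); not part of the diff.
_FLAG_PATH_RE = re.compile(r'(1x1|4x3)/([a-z0-9-]+\.svg)')


def split_flag_path(flag_path):
    """Return (subdir, svg_file) for a well-formed flag path, otherwise None."""
    # fullmatch requires the whole string to match, so "..", extra slashes
    # and trailing newlines are all rejected in one step
    m = _FLAG_PATH_RE.fullmatch(flag_path.lower())
    if not m:
        return None
    return m.group(1), m.group(2)
```

Because the two capture groups already constrain the sub-directory and the file name, the separate `split('/')` and re-validation steps become unnecessary in this variant.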
@app.route("/static/<string:group>/<string:filename>", methods=['GET'])
def static_content(group, filename):
from flask import make_response
import re
group = re.sub(r'[^\w.-]+', '', group.lower())
filename = re.sub(r'[^\w.-]+', '', filename.lower())
# Strict sanitization: only allow a-z, 0-9, underscore and hyphen (blocks .. and other traversal)
group = re.sub(r'[^a-z0-9_-]+', '', group.lower())
filename = filename
# Additional safety: reject if sanitization resulted in empty strings
if not group or not filename:
abort(404)
if group == 'screenshot':
# Could be sensitive, follow password requirements
@@ -443,18 +778,11 @@ def changedetection_app(config=None, datastore_o=None):
favicon_filename = watch.get_favicon_filename()
if favicon_filename:
try:
import magic
mime = magic.from_file(
os.path.join(watch.watch_data_dir, favicon_filename),
mime=True
)
except ImportError:
# Fallback, no python-magic
import mimetypes
mime, encoding = mimetypes.guess_type(favicon_filename)
# Use cached MIME type detection
filepath = os.path.join(watch.data_dir, favicon_filename)
mime = get_favicon_mime_type(filepath)
response = make_response(send_from_directory(watch.watch_data_dir, favicon_filename))
response = make_response(send_from_directory(watch.data_dir, favicon_filename))
response.headers['Content-type'] = mime
response.headers['Cache-Control'] = 'max-age=300, must-revalidate' # Cache for 5 minutes, then revalidate
return response
@@ -488,6 +816,31 @@ def changedetection_app(config=None, datastore_o=None):
except FileNotFoundError:
abort(404)
# Handle plugin group specially
if group == 'plugin':
# Serve files from plugin static directories
from changedetectionio.pluggy_interface import plugin_manager
import os as os_check
for plugin_name, plugin_obj in plugin_manager.list_name_plugin():
if hasattr(plugin_obj, 'plugin_static_path'):
try:
static_path = plugin_obj.plugin_static_path()
if static_path and os_check.path.isdir(static_path):
# Check if file exists in plugin's static directory
plugin_file_path = os_check.path.join(static_path, filename)
if os_check.path.isfile(plugin_file_path):
# Found the file in a plugin
response = make_response(send_from_directory(static_path, filename))
response.headers['Cache-Control'] = 'max-age=3600, public' # Cache for 1 hour
return response
except Exception as e:
logger.debug(f"Error checking plugin {plugin_name} for static file: {e}")
pass
# File not found in any plugin
abort(404)
# These files should be in our subdirectory
try:
return send_from_directory(f"static/{group}", path=filename)
@@ -524,13 +877,15 @@ def changedetection_app(config=None, datastore_o=None):
# watchlist UI buttons etc
import changedetectionio.blueprint.ui as ui
app.register_blueprint(ui.construct_blueprint(datastore, update_q, worker_handler, queuedWatchMetaData, watch_check_update))
app.register_blueprint(ui.construct_blueprint(datastore, update_q, worker_pool, queuedWatchMetaData, watch_check_update))
import changedetectionio.blueprint.watchlist as watchlist
app.register_blueprint(watchlist.construct_blueprint(datastore=datastore, update_q=update_q, queuedWatchMetaData=queuedWatchMetaData), url_prefix='')
# Initialize Socket.IO server conditionally based on settings
socket_io_enabled = datastore.data['settings']['application']['ui'].get('socket_io_enabled', True)
socket_io_enabled = datastore.data['settings']['application'].get('ui', {}).get('socket_io_enabled', True)
if socket_io_enabled and app.config.get('batch_mode'):
socket_io_enabled = False
if socket_io_enabled:
from changedetectionio.realtime.socket_server import init_socketio
global socketio_server
@@ -559,10 +914,10 @@ def changedetection_app(config=None, datastore_o=None):
expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
# Get basic status
status = worker_handler.get_worker_status()
status = worker_pool.get_worker_status()
# Perform health check
health_result = worker_handler.check_worker_health(
health_result = worker_pool.check_worker_health(
expected_count=expected_workers,
update_q=update_q,
notification_q=notification_q,
@@ -626,16 +981,31 @@ def changedetection_app(config=None, datastore_o=None):
# Can be overridden by ENV or use the default settings
n_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
logger.info(f"Starting {n_workers} workers during app initialization")
worker_handler.start_workers(n_workers, update_q, notification_q, app, datastore)
worker_pool.start_workers(n_workers, update_q, notification_q, app, datastore)
# @todo handle ctrl break
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks).start()
threading.Thread(target=notification_runner).start()
# Skip background threads in batch mode (just process queue and exit)
batch_mode = app.config.get('batch_mode', False)
if not batch_mode:
# @todo handle ctrl break
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks, daemon=True, name="TickerThread-ScheduleChecker").start()
in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
# Check for new release version, but not when running in test/build or pytest
if not os.getenv("GITHUB_REF", False) and not strtobool(os.getenv('DISABLE_VERSION_CHECK', 'no')) and not in_pytest:
threading.Thread(target=check_for_new_version).start()
# Start configurable number of notification workers (default 1)
notification_workers = int(os.getenv("NOTIFICATION_WORKERS", "1"))
for i in range(notification_workers):
threading.Thread(
target=notification_runner,
args=(i,),
daemon=True,
name=f"NotificationRunner-{i}"
).start()
logger.info(f"Started {notification_workers} notification worker(s)")
in_pytest = "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
# Check for new release version, but not when running in test/build or pytest
if not os.getenv("GITHUB_REF", False) and not strtobool(os.getenv('DISABLE_VERSION_CHECK', 'no')) and not in_pytest:
threading.Thread(target=check_for_new_version, daemon=True, name="VersionChecker").start()
else:
logger.info("Batch mode: Skipping ticker thread, notification runner, and version checker")
# Return the Flask app - the Socket.IO will be attached to it but initialized separately
# This avoids circular dependencies
@@ -670,17 +1040,17 @@ def check_for_new_version():
app.config.exit.wait(86400)
def notification_runner():
def notification_runner(worker_id=0):
global notification_debug_log
from datetime import datetime
import json
with app.app_context():
while not app.config.exit.is_set():
try:
# At the moment only one thread runs (single runner)
# Multiple workers can run concurrently (configurable via NOTIFICATION_WORKERS)
n_object = notification_q.get(block=False)
except queue.Empty:
time.sleep(1)
app.config.exit.wait(1)
else:
@@ -703,7 +1073,7 @@ def notification_runner():
sent_obj = process_notification(n_object, datastore)
except Exception as e:
logger.error(f"Watch URL: {n_object['watch_url']} Error {str(e)}")
logger.error(f"Notification worker {worker_id} - Watch URL: {n_object['watch_url']} Error {str(e)}")
# UUID wont be present when we submit a 'test' from the global settings
if 'uuid' in n_object:
@@ -717,7 +1087,7 @@ def notification_runner():
app.config['watch_check_update_SIGNAL'].send(app_context=app, watch_uuid=n_object.get('uuid'))
# Process notifications
notification_debug_log+= ["{} - SENDING - {}".format(now.strftime("%Y/%m/%d %H:%M:%S,000"), json.dumps(sent_obj))]
notification_debug_log+= ["{} - SENDING - {}".format(now.strftime("%c"), json.dumps(sent_obj))]
# Trim the log length
notification_debug_log = notification_debug_log[-100:]
@@ -733,6 +1103,10 @@ def ticker_thread_check_time_launch_checks():
logger.debug(f"System env MINIMUM_SECONDS_RECHECK_TIME {recheck_time_minimum_seconds}")
# Workers are now started during app initialization, not here
WAIT_TIME_BETWEEN_LOOP = 1.0 if not IN_PYTEST else 0.01
if IN_PYTEST:
# The time between loops should be less than the first .sleep/wait in def wait_for_all_checks() of tests/util.py
logger.warning(f"Looks like we're in PYTEST! Setting time between searching for items to add to the queue to {WAIT_TIME_BETWEEN_LOOP}s")
while not app.config.exit.is_set():
@@ -740,7 +1114,7 @@ def ticker_thread_check_time_launch_checks():
now = time.time()
if now - last_health_check > 60:
expected_workers = int(os.getenv("FETCH_WORKERS", datastore.data['settings']['requests']['workers']))
health_result = worker_handler.check_worker_health(
health_result = worker_pool.check_worker_health(
expected_count=expected_workers,
update_q=update_q,
notification_q=notification_q,
@@ -750,11 +1124,19 @@ def ticker_thread_check_time_launch_checks():
if health_result['status'] != 'healthy':
logger.warning(f"Worker health check: {health_result['message']}")
last_health_check = now
# Check if all checks are paused
if datastore.data['settings']['application'].get('all_paused', False):
app.config.exit.wait(1)
continue
# Get a list of watches by UUID that are currently fetching data
running_uuids = worker_handler.get_running_uuids()
running_uuids = worker_pool.get_running_uuids()
# Build set of queued UUIDs once for O(1) lookup instead of O(n) per watch
queued_uuids = {q_item.item['uuid'] for q_item in update_q.queue}
# Re #232 - Deepcopy the data in case it changes while we're iterating through it all
watch_uuid_list = []
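
The `queued_uuids` set built above replaces a list comprehension that was re-evaluated for every watch. A minimal sketch of the idea follows, assuming `update_q` behaves like a standard `queue.PriorityQueue`; the real queue class may differ. `PrioritizedItem` here is a stand-in for `queuedWatchMetaData.PrioritizedItem`, and taking the queue mutex is an addition that the diff itself does not make.

```python
import queue
from dataclasses import dataclass, field


@dataclass(order=True)
class PrioritizedItem:
    # Stand-in for queuedWatchMetaData.PrioritizedItem; field layout is an assumption
    priority: int
    item: dict = field(compare=False)


def pending_uuids(update_q):
    """Return the set of UUIDs currently waiting in the queue."""
    # .queue is the underlying list of a queue.PriorityQueue; holding the
    # mutex gives a consistent snapshot while worker threads put/get items
    with update_q.mutex:
        return {entry.item['uuid'] for entry in update_q.queue}
```

Building the set once per scheduler pass makes each `uuid not in queued_uuids` test a hash lookup instead of a scan over the whole queue.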
@@ -772,16 +1154,17 @@ def ticker_thread_check_time_launch_checks():
else:
break
# Re #438 - Don't place more watches in the queue to be checked if the queue is already large
while update_q.qsize() >= 2000:
logger.warning(f"Recheck watches queue size limit reached ({MAX_QUEUE_SIZE}), skipping adding more items")
time.sleep(3)
recheck_time_system_seconds = int(datastore.threshold_seconds)
# Check for watches outside of the time threshold to put in the thread queue.
for uuid in watch_uuid_list:
for watch_index, uuid in enumerate(watch_uuid_list):
# Re #438 - Check queue size every 100 watches for CPU efficiency (not every watch)
if watch_index % 100 == 0:
current_queue_size = update_q.qsize()
if current_queue_size >= MAX_QUEUE_SIZE:
logger.debug(f"Queue size limit reached ({current_queue_size}/{MAX_QUEUE_SIZE}), stopping scheduler this iteration.")
break
now = time.time()
watch = datastore.data['watching'].get(uuid)
if not watch:
@@ -831,7 +1214,7 @@ def ticker_thread_check_time_launch_checks():
seconds_since_last_recheck = now - watch['last_checked']
if seconds_since_last_recheck >= (threshold + watch.jitter_seconds) and seconds_since_last_recheck >= recheck_time_minimum_seconds:
if not uuid in running_uuids and uuid not in [q_uuid.item['uuid'] for q_uuid in update_q.queue]:
if not uuid in running_uuids and uuid not in queued_uuids:
# Proxies can be set to have a limit on seconds between which they can be called
watch_proxy = datastore.get_preferred_proxy_for_watch(uuid=uuid)
@@ -856,7 +1239,7 @@ def ticker_thread_check_time_launch_checks():
priority = int(time.time())
# Into the queue with you
queued_successfully = worker_handler.queue_item_async_safe(update_q,
queued_successfully = worker_pool.queue_item_async_safe(update_q,
queuedWatchMetaData.PrioritizedItem(priority=priority,
item={'uuid': uuid})
)
@@ -873,8 +1256,5 @@ def ticker_thread_check_time_launch_checks():
# Reset for next time
watch.jitter_seconds = 0
# Wait before checking the list again - saves CPU
time.sleep(1)
# Should be low so we can break this out in testing
app.config.exit.wait(1)
app.config.exit.wait(WAIT_TIME_BETWEEN_LOOP)
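
Several hunks above swap `time.sleep(...)` for `app.config.exit.wait(...)`. Assuming `app.config.exit` is a `threading.Event` (it is used with `is_set()` and `wait()` in the diff), the benefit is that the loop wakes up as soon as shutdown is requested. The function name `run_until_stopped` below is made up for this sketch.

```python
import threading


def run_until_stopped(stop_event, interval, work):
    """Call work() every `interval` seconds until stop_event is set."""
    iterations = 0
    while not stop_event.is_set():
        work()
        iterations += 1
        # Unlike time.sleep(interval), wait() returns immediately once the
        # event is set, so shutdown is not delayed by a long interval
        stop_event.wait(interval)
    return iterations
```

Even with an interval of 30 seconds, a thread running this loop exits promptly after `stop_event.set()`, which is also what keeps test runs short.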


@@ -2,16 +2,19 @@ import os
import re
from loguru import logger
from wtforms.widgets.core import TimeInput
from flask_babel import lazy_gettext as _l, gettext
from changedetectionio.blueprint.rss import RSS_FORMAT_TYPES, RSS_TEMPLATE_TYPE_OPTIONS, RSS_TEMPLATE_HTML_DEFAULT
from changedetectionio.conditions.form import ConditionFormRow
from changedetectionio.notification_service import NotificationContextData
from changedetectionio.strtobool import strtobool
from changedetectionio import processors
from wtforms import (
BooleanField,
Form,
Field,
FloatField,
IntegerField,
RadioField,
SelectField,
@@ -32,7 +35,7 @@ from changedetectionio.widgets import TernaryNoneBooleanField
# default
# each select <option data-enabled="enabled-0-0"
from changedetectionio.blueprint.browser_steps.browser_steps import browser_step_ui_config
from changedetectionio.browser_steps.browser_steps import browser_step_ui_config
from changedetectionio import html_tools, content_fetchers
@@ -55,8 +58,8 @@ valid_method = {
default_method = 'GET'
allow_simplehost = not strtobool(os.getenv('BLOCK_SIMPLEHOSTS', 'False'))
REQUIRE_ATLEAST_ONE_TIME_PART_MESSAGE_DEFAULT='At least one time interval (weeks, days, hours, minutes, or seconds) must be specified.'
REQUIRE_ATLEAST_ONE_TIME_PART_WHEN_NOT_GLOBAL_DEFAULT='At least one time interval (weeks, days, hours, minutes, or seconds) must be specified when not using global settings.'
REQUIRE_ATLEAST_ONE_TIME_PART_MESSAGE_DEFAULT=_l('At least one time interval (weeks, days, hours, minutes, or seconds) must be specified.')
REQUIRE_ATLEAST_ONE_TIME_PART_WHEN_NOT_GLOBAL_DEFAULT=_l('At least one time interval (weeks, days, hours, minutes, or seconds) must be specified when not using global settings.')
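
Module-level constants and form field labels are evaluated once at import time, before any request locale is known, which is presumably why this file wraps them in `lazy_gettext` (`_l`) rather than `gettext`. The toy below illustrates the difference using only the standard library. `LazyText`, `CATALOGS` and `translate` are invented for the example and are not how flask_babel is implemented.

```python
class LazyText:
    """Toy stand-in for a lazy translation proxy; not flask_babel's implementation."""

    def __init__(self, lookup, message):
        self._lookup = lookup
        self._message = message

    def __str__(self):
        # Translation happens here, at render time, with the locale active now
        return self._lookup(self._message)


CATALOGS = {'de': {'Save': 'Speichern'}}  # made-up catalogue for the example
current_locale = {'value': 'en'}


def translate(message):
    """Look up `message` in the catalogue of the currently active locale."""
    return CATALOGS.get(current_locale['value'], {}).get(message, message)


SAVE_LABEL = LazyText(translate, 'Save')  # created once, resolved on each str()
EAGER_LABEL = translate('Save')           # resolved immediately, while locale is 'en'
```

After switching `current_locale['value']` to `'de'`, `str(SAVE_LABEL)` follows the new locale while `EAGER_LABEL` keeps the string it was given at import time.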
class StringListField(StringField):
widget = widgets.TextArea()
@@ -156,7 +159,7 @@ class TimeStringField(Field):
time_str = valuelist[0]
# Simple validation for HH:MM format
if not time_str or len(time_str.split(":")) != 2:
raise ValidationError("Invalid time format. Use HH:MM.")
raise ValidationError(_l("Invalid time format. Use HH:MM."))
self.data = time_str
@@ -172,15 +175,15 @@ class validateTimeZoneName(object):
from zoneinfo import available_timezones
python_timezones = available_timezones()
if field.data and field.data not in python_timezones:
raise ValidationError("Not a valid timezone name")
raise ValidationError(_l("Not a valid timezone name"))
class ScheduleLimitDaySubForm(Form):
enabled = BooleanField("not set", default=True)
start_time = TimeStringField("Start At", default="00:00", validators=[validators.Optional()])
duration = FormField(TimeDurationForm, label="Run duration")
enabled = BooleanField(_l("not set"), default=True)
start_time = TimeStringField(_l("Start At"), default="00:00", validators=[validators.Optional()])
duration = FormField(TimeDurationForm, label=_l("Run duration"))
class ScheduleLimitForm(Form):
enabled = BooleanField("Use time scheduler", default=False)
enabled = BooleanField(_l("Use time scheduler"), default=False)
# Because the label for=""" doesnt line up/work with the actual checkbox
monday = FormField(ScheduleLimitDaySubForm, label="")
tuesday = FormField(ScheduleLimitDaySubForm, label="")
@@ -190,7 +193,7 @@ class ScheduleLimitForm(Form):
saturday = FormField(ScheduleLimitDaySubForm, label="")
sunday = FormField(ScheduleLimitDaySubForm, label="")
timezone = StringField("Optional timezone to run in",
timezone = StringField(_l("Optional timezone to run in"),
render_kw={"list": "timezones"},
validators=[validateTimeZoneName()]
)
@@ -204,13 +207,13 @@ class ScheduleLimitForm(Form):
**kwargs,
):
super().__init__(formdata, obj, prefix, data, meta, **kwargs)
self.monday.form.enabled.label.text="Monday"
self.tuesday.form.enabled.label.text = "Tuesday"
self.wednesday.form.enabled.label.text = "Wednesday"
self.thursday.form.enabled.label.text = "Thursday"
self.friday.form.enabled.label.text = "Friday"
self.saturday.form.enabled.label.text = "Saturday"
self.sunday.form.enabled.label.text = "Sunday"
self.monday.form.enabled.label.text=_l("Monday")
self.tuesday.form.enabled.label.text = _l("Tuesday")
self.wednesday.form.enabled.label.text = _l("Wednesday")
self.thursday.form.enabled.label.text = _l("Thursday")
self.friday.form.enabled.label.text = _l("Friday")
self.saturday.form.enabled.label.text = _l("Saturday")
self.sunday.form.enabled.label.text = _l("Sunday")
def validate_time_between_check_has_values(form):
@@ -235,7 +238,7 @@ class RequiredTimeInterval(object):
Use this with FormField(TimeBetweenCheckForm, validators=[RequiredTimeInterval()]).
"""
def __init__(self, message=None):
self.message = message or 'At least one time interval (weeks, days, hours, minutes, or seconds) must be specified.'
self.message = message or _l('At least one time interval (weeks, days, hours, minutes, or seconds) must be specified.')
def __call__(self, form, field):
if not validate_time_between_check_has_values(field.form):
@@ -243,11 +246,11 @@ class RequiredTimeInterval(object):
class TimeBetweenCheckForm(Form):
weeks = IntegerField('Weeks', validators=[validators.Optional(), validators.NumberRange(min=0, message="Should contain zero or more seconds")])
days = IntegerField('Days', validators=[validators.Optional(), validators.NumberRange(min=0, message="Should contain zero or more seconds")])
hours = IntegerField('Hours', validators=[validators.Optional(), validators.NumberRange(min=0, message="Should contain zero or more seconds")])
minutes = IntegerField('Minutes', validators=[validators.Optional(), validators.NumberRange(min=0, message="Should contain zero or more seconds")])
seconds = IntegerField('Seconds', validators=[validators.Optional(), validators.NumberRange(min=0, message="Should contain zero or more seconds")])
weeks = IntegerField(_l('Weeks'), validators=[validators.Optional(), validators.NumberRange(min=0, message=_l("Should contain zero or more seconds"))])
days = IntegerField(_l('Days'), validators=[validators.Optional(), validators.NumberRange(min=0, message=_l("Should contain zero or more seconds"))])
hours = IntegerField(_l('Hours'), validators=[validators.Optional(), validators.NumberRange(min=0, message=_l("Should contain zero or more seconds"))])
minutes = IntegerField(_l('Minutes'), validators=[validators.Optional(), validators.NumberRange(min=0, message=_l("Should contain zero or more seconds"))])
seconds = IntegerField(_l('Seconds'), validators=[validators.Optional(), validators.NumberRange(min=0, message=_l("Should contain zero or more seconds"))])
# @todo add total seconds minimum validatior = minimum_seconds_recheck_time
def __init__(self, formdata=None, obj=None, prefix="", data=None, meta=None, **kwargs):
@@ -489,7 +492,6 @@ class ValidateJinja2Template(object):
Validates that a {token} is from a valid set
"""
def __call__(self, form, field):
from changedetectionio import notification
from changedetectionio.jinja2_custom import create_jinja_env
from jinja2 import BaseLoader, TemplateSyntaxError, UndefinedError
from jinja2.meta import find_undeclared_variables
@@ -717,18 +719,16 @@ class ValidateStartsWithRegex(object):
if not stripped:
if self.allow_empty:
continue
raise ValidationError(self.message or "Empty value not allowed.")
raise ValidationError(self.message or _l("Empty value not allowed."))
if not self.pattern.match(stripped):
raise ValidationError(self.message or "Invalid value.")
raise ValidationError(self.message or _l("Invalid value."))
class quickWatchForm(Form):
from . import processors
url = fields.URLField('URL', validators=[validateURL()])
tags = StringTagUUID('Group tag', [validators.Optional()])
watch_submit_button = SubmitField('Watch', render_kw={"class": "pure-button pure-button-primary"})
processor = RadioField(u'Processor', choices=processors.available_processors(), default="text_json_diff")
edit_and_watch_submit_button = SubmitField('Edit > Watch', render_kw={"class": "pure-button pure-button-primary"})
url = fields.URLField(_l('URL'), validators=[validateURL()])
tags = StringTagUUID(_l('Group tag'), validators=[validators.Optional()])
watch_submit_button = SubmitField(_l('Watch'), render_kw={"class": "pure-button pure-button-primary"})
processor = RadioField(_l('Processor'), choices=lambda: processors.available_processors(), default=processors.get_default_processor)
edit_and_watch_submit_button = SubmitField(_l('Edit > Watch'), render_kw={"class": "pure-button pure-button-primary"})
# Common to a single watch and the global settings
@@ -741,14 +741,14 @@ class commonSettingsForm(Form):
self.notification_title.extra_notification_tokens = kwargs.get('extra_notification_tokens', {})
self.notification_urls.extra_notification_tokens = kwargs.get('extra_notification_tokens', {})
fetch_backend = RadioField(u'Fetch Method', choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
notification_body = TextAreaField('Notification Body', default='{{ watch_url }} had a change.', validators=[validators.Optional(), ValidateJinja2Template()])
notification_format = SelectField('Notification format', choices=list(valid_notification_formats.items()))
notification_title = StringField('Notification Title', default='ChangeDetection.io Notification - {{ watch_url }}', validators=[validators.Optional(), ValidateJinja2Template()])
notification_urls = StringListField('Notification URL List', validators=[validators.Optional(), ValidateAppRiseServers(), ValidateJinja2Template()])
processor = RadioField( label=u"Processor - What do you want to achieve?", choices=processors.available_processors(), default="text_json_diff")
scheduler_timezone_default = StringField("Default timezone for watch check scheduler", render_kw={"list": "timezones"}, validators=[validateTimeZoneName()])
webdriver_delay = IntegerField('Wait seconds before extracting text', validators=[validators.Optional(), validators.NumberRange(min=1, message="Should contain one or more seconds")])
fetch_backend = RadioField(_l('Fetch Method'), choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
notification_body = TextAreaField(_l('Notification Body'), default='{{ watch_url }} had a change.', validators=[validators.Optional(), ValidateJinja2Template()])
notification_format = SelectField(_l('Notification format'), choices=list(valid_notification_formats.items()))
notification_title = StringField(_l('Notification Title'), default='ChangeDetection.io Notification - {{ watch_url }}', validators=[validators.Optional(), ValidateJinja2Template()])
notification_urls = StringListField(_l('Notification URL List'), validators=[validators.Optional(), ValidateAppRiseServers(), ValidateJinja2Template()])
processor = RadioField( label=_l("Processor - What do you want to achieve?"), choices=lambda: processors.available_processors(), default=processors.get_default_processor)
scheduler_timezone_default = StringField(_l("Default timezone for watch check scheduler"), render_kw={"list": "timezones"}, validators=[validateTimeZoneName()])
webdriver_delay = IntegerField(_l('Wait seconds before extracting text'), validators=[validators.Optional(), validators.NumberRange(min=1, message=_l("Should contain one or more seconds"))])
# Not true anymore but keep the validate_ hook for future use, we convert color tags
# def validate_notification_urls(self, field):
@@ -760,30 +760,30 @@ class commonSettingsForm(Form):
class importForm(Form):
from . import processors
processor = RadioField(u'Processor', choices=processors.available_processors(), default="text_json_diff")
urls = TextAreaField('URLs')
xlsx_file = FileField('Upload .xlsx file', validators=[FileAllowed(['xlsx'], 'Must be .xlsx file!')])
file_mapping = SelectField('File mapping', [validators.DataRequired()], choices={('wachete', 'Wachete mapping'), ('custom','Custom mapping')})
processor = RadioField(_l('Processor'), choices=lambda: processors.available_processors(), default=processors.get_default_processor)
urls = TextAreaField(_l('URLs'))
xlsx_file = FileField(_l('Upload .xlsx file'), validators=[FileAllowed(['xlsx'], _l('Must be .xlsx file!'))])
file_mapping = SelectField(_l('File mapping'), [validators.DataRequired()], choices={('wachete', 'Wachete mapping'), ('custom','Custom mapping')})
class SingleBrowserStep(Form):
operation = SelectField('Operation', [validators.Optional()], choices=browser_step_ui_config.keys())
operation = SelectField(_l('Operation'), [validators.Optional()], choices=browser_step_ui_config.keys())
# maybe better to set some <script>var..
selector = StringField('Selector', [validators.Optional()], render_kw={"placeholder": "CSS or xPath selector"})
optional_value = StringField('value', [validators.Optional()], render_kw={"placeholder": "Value"})
selector = StringField(_l('Selector'), [validators.Optional()], render_kw={"placeholder": "CSS or xPath selector"})
optional_value = StringField(_l('value'), [validators.Optional()], render_kw={"placeholder": "Value"})
# @todo move to JS? ajax fetch new field?
# remove_button = SubmitField('-', render_kw={"type": "button", "class": "pure-button pure-button-primary", 'title': 'Remove'})
# add_button = SubmitField('+', render_kw={"type": "button", "class": "pure-button pure-button-primary", 'title': 'Add new step after'})
# remove_button = SubmitField(_l('-'), render_kw={"type": "button", "class": "pure-button pure-button-primary", 'title': 'Remove'})
# add_button = SubmitField(_l('+'), render_kw={"type": "button", "class": "pure-button pure-button-primary", 'title': 'Add new step after'})
class processor_text_json_diff_form(commonSettingsForm):
url = fields.URLField('URL', validators=[validateURL()])
tags = StringTagUUID('Group tag', [validators.Optional()], default='')
url = fields.URLField('Web Page URL', validators=[validateURL()])
tags = StringTagUUID('Group Tag', [validators.Optional()], default='')
time_between_check = EnhancedFormField(
TimeBetweenCheckForm,
label=_l('Time Between Check'),
conditional_field='time_between_check_use_default',
conditional_message=REQUIRE_ATLEAST_ONE_TIME_PART_WHEN_NOT_GLOBAL_DEFAULT,
conditional_test_function=validate_time_between_check_has_values
@@ -791,49 +791,49 @@ class processor_text_json_diff_form(commonSettingsForm):
time_schedule_limit = FormField(ScheduleLimitForm)
time_between_check_use_default = BooleanField('Use global settings for time between check and scheduler.', default=False)
time_between_check_use_default = BooleanField(_l('Use global settings for time between check and scheduler.'), default=False)
include_filters = StringListField('CSS/JSONPath/JQ/XPath Filters', [ValidateCSSJSONXPATHInput()], default='')
include_filters = StringListField(_l('CSS/JSONPath/JQ/XPath Filters'), [ValidateCSSJSONXPATHInput()], default='')
subtractive_selectors = StringListField('Remove elements', [ValidateCSSJSONXPATHInput(allow_json=False)])
subtractive_selectors = StringListField(_l('Remove elements'), [ValidateCSSJSONXPATHInput(allow_json=False)])
extract_text = StringListField('Extract text', [ValidateListRegex()])
extract_text = StringListField(_l('Extract text'), [ValidateListRegex()])
title = StringField('Title', default='')
title = StringField(_l('Title'), default='')
ignore_text = StringListField('Ignore lines containing', [ValidateListRegex()])
ignore_text = StringListField(_l('Ignore lines containing'), [ValidateListRegex()])
headers = StringDictKeyValue('Request headers')
body = TextAreaField('Request body', [validators.Optional()])
method = SelectField('Request method', choices=valid_method, default=default_method)
ignore_status_codes = BooleanField('Ignore status codes (process non-2xx status codes as normal)', default=False)
check_unique_lines = BooleanField('Only trigger when unique lines appear in all history', default=False)
remove_duplicate_lines = BooleanField('Remove duplicate lines of text', default=False)
sort_text_alphabetically = BooleanField('Sort text alphabetically', default=False)
strip_ignored_lines = TernaryNoneBooleanField('Strip ignored lines', default=None)
trim_text_whitespace = BooleanField('Trim whitespace before and after text', default=False)
body = TextAreaField(_l('Request body'), [validators.Optional()])
method = SelectField(_l('Request method'), choices=valid_method, default=default_method)
ignore_status_codes = BooleanField(_l('Ignore status codes (process non-2xx status codes as normal)'), default=False)
check_unique_lines = BooleanField(_l('Only trigger when unique lines appear in all history'), default=False)
remove_duplicate_lines = BooleanField(_l('Remove duplicate lines of text'), default=False)
sort_text_alphabetically = BooleanField(_l('Sort text alphabetically'), default=False)
strip_ignored_lines = TernaryNoneBooleanField(_l('Strip ignored lines'), default=None)
trim_text_whitespace = BooleanField(_l('Trim whitespace before and after text'), default=False)
filter_text_added = BooleanField('Added lines', default=True)
filter_text_replaced = BooleanField('Replaced/changed lines', default=True)
filter_text_removed = BooleanField('Removed lines', default=True)
filter_text_added = BooleanField(_l('Added lines'), default=True)
filter_text_replaced = BooleanField(_l('Replaced/changed lines'), default=True)
filter_text_removed = BooleanField(_l('Removed lines'), default=True)
trigger_text = StringListField('Keyword triggers - Trigger/wait for text', [validators.Optional(), ValidateListRegex()])
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
browser_steps = FieldList(FormField(SingleBrowserStep), min_entries=10)
text_should_not_be_present = StringListField('Block change-detection while text matches', [validators.Optional(), ValidateListRegex()])
webdriver_js_execute_code = TextAreaField('Execute JavaScript before change detection', render_kw={"rows": "5"}, validators=[validators.Optional()])
trigger_text = StringListField(_l('Keyword triggers - Trigger/wait for text'), [validators.Optional(), ValidateListRegex()])
browser_steps = FieldList(FormField(SingleBrowserStep), min_entries=10)
text_should_not_be_present = StringListField(_l('Block change-detection while text matches'), [validators.Optional(), ValidateListRegex()])
webdriver_js_execute_code = TextAreaField(_l('Execute JavaScript before change detection'), render_kw={"rows": "5"}, validators=[validators.Optional()])
save_button = SubmitField('Save', render_kw={"class": "pure-button pure-button-primary"})
save_button = SubmitField(_l('Save'), render_kw={"class": "pure-button pure-button-primary"})
proxy = RadioField('Proxy')
proxy = RadioField(_l('Proxy'))
# filter_failure_notification_send @todo make ternary
filter_failure_notification_send = BooleanField(
'Send a notification when the filter can no longer be found on the page', default=False)
notification_muted = TernaryNoneBooleanField('Notifications', default=None, yes_text="Muted", no_text="On")
notification_screenshot = BooleanField('Attach screenshot to notification (where possible)', default=False)
filter_failure_notification_send = BooleanField(_l('Send a notification when the filter can no longer be found on the page'), default=False)
notification_muted = TernaryNoneBooleanField(_l('Notifications'), default=None, yes_text=_l("Muted"), no_text=_l("On"))
notification_screenshot = BooleanField(_l('Attach screenshot to notification (where possible)'), default=False)
conditions_match_logic = RadioField(u'Match', choices=[('ALL', 'Match all of the following'),('ANY', 'Match any of the following')], default='ALL')
conditions_match_logic = RadioField(_l('Match'), choices=[('ALL', _l('Match all of the following')),('ANY', _l('Match any of the following'))], default='ALL')
conditions = FieldList(FormField(ConditionFormRow), min_entries=1) # Add rule logic here
use_page_title_in_list = TernaryNoneBooleanField('Use page <title> in list', default=None)
use_page_title_in_list = TernaryNoneBooleanField(_l('Use page <title> in list'), default=None)
history_snapshot_max_length = IntegerField(_l('Number of history items per watch to keep'), render_kw={"style": "width: 5em;"}, validators=[validators.Optional(), validators.NumberRange(min=2)])
def extra_tab_content(self):
return None
@@ -850,7 +850,7 @@ class processor_text_json_diff_form(commonSettingsForm):
# Fail form validation when a body is set for a GET
if self.method.data == 'GET' and self.body.data:
self.body.errors.append('Body must be empty when Request Method is set to GET')
self.body.errors.append(gettext('Body must be empty when Request Method is set to GET'))
result = False
# Attempt to validate jinja2 templates in the URL
@@ -859,11 +859,11 @@ class processor_text_json_diff_form(commonSettingsForm):
except ModuleNotFoundError as e:
# in case jinja2_time or another module is missing
logger.error(e)
self.url.errors.append(f'Invalid template syntax configuration: {e}')
self.url.errors.append(gettext('Invalid template syntax configuration: %(error)s') % {'error': e})
result = False
except Exception as e:
logger.error(e)
self.url.errors.append(f'Invalid template syntax: {e}')
self.url.errors.append(gettext('Invalid template syntax: %(error)s') % {'error': e})
result = False
# Attempt to validate jinja2 templates in the body
@@ -873,11 +873,11 @@ class processor_text_json_diff_form(commonSettingsForm):
except ModuleNotFoundError as e:
# in case jinja2_time or another module is missing
logger.error(e)
self.body.errors.append(f'Invalid template syntax configuration: {e}')
self.body.errors.append(gettext('Invalid template syntax configuration: %(error)s') % {'error': e})
result = False
except Exception as e:
logger.error(e)
self.body.errors.append(f'Invalid template syntax: {e}')
self.body.errors.append(gettext('Invalid template syntax: %(error)s') % {'error': e})
result = False
# Attempt to validate jinja2 templates in the headers
@@ -888,11 +888,11 @@ class processor_text_json_diff_form(commonSettingsForm):
except ModuleNotFoundError as e:
# in case jinja2_time or another module is missing
logger.error(e)
self.headers.errors.append(f'Invalid template syntax configuration: {e}')
self.headers.errors.append(gettext('Invalid template syntax configuration: %(error)s') % {'error': e})
result = False
except Exception as e:
logger.error(e)
self.headers.errors.append(f'Invalid template syntax in "{header}" header: {e}')
self.headers.errors.append(gettext('Invalid template syntax in \"%(header)s\" header: %(error)s') % {'header': header, 'error': e})
result = False
return result
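Note on the hunks above: they replace f-strings with gettext() plus named %(...)s placeholders. A minimal sketch of the likely reason, assuming a hypothetical in-memory catalog (fake_gettext, CATALOG and the German string are illustrative only and not part of the project): the message ID has to be a constant string for the catalog lookup to hit, so the runtime value is interpolated after translation.

```python
# Hypothetical stand-in for a translation catalog; the gettext used in the
# diff is not imported here, this only mimics a msgid lookup.
CATALOG = {
    'Invalid template syntax: %(error)s': 'Ungültige Template-Syntax: %(error)s',
}


def fake_gettext(msgid: str) -> str:
    """Return the translated template, or the msgid itself when untranslated."""
    return CATALOG.get(msgid, msgid)


def translated_error(error: Exception) -> str:
    # Look up the constant msgid first, then interpolate the runtime value.
    return fake_gettext('Invalid template syntax: %(error)s') % {'error': error}


def fstring_error(error: Exception) -> str:
    # Interpolating first produces a msgid that no catalog can contain,
    # so the lookup falls through to the untranslated text.
    return fake_gettext(f'Invalid template syntax: {error}')
```

With the named placeholder, translators can also reorder %(header)s and %(error)s freely, which positional formatting would not allow.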
@@ -916,110 +916,124 @@ class processor_text_json_diff_form(commonSettingsForm):
class SingleExtraProxy(Form):
# maybe better to set some <script>var..
proxy_name = StringField('Name', [validators.Optional()], render_kw={"placeholder": "Name"})
proxy_url = StringField('Proxy URL', [
proxy_name = StringField(_l('Name'), [validators.Optional()], render_kw={"placeholder": "Name"})
proxy_url = StringField(_l('Proxy URL'), [
validators.Optional(),
ValidateStartsWithRegex(
regex=r'^(https?|socks5)://', # ✅ main pattern
flags=re.IGNORECASE, # ✅ makes it case-insensitive
message='Proxy URLs must start with http://, https:// or socks5://',
message=_l('Proxy URLs must start with http://, https:// or socks5://'),
),
ValidateSimpleURL()
], render_kw={"placeholder": "socks5:// or regular proxy http://user:pass@...:3128", "size":50})
class SingleExtraBrowser(Form):
browser_name = StringField('Name', [validators.Optional()], render_kw={"placeholder": "Name"})
browser_connection_url = StringField('Browser connection URL', [
browser_name = StringField(_l('Name'), [validators.Optional()], render_kw={"placeholder": "Name"})
browser_connection_url = StringField(_l('Browser connection URL'), [
validators.Optional(),
ValidateStartsWithRegex(
regex=r'^(wss?|ws)://',
flags=re.IGNORECASE,
message='Browser URLs must start with wss:// or ws://'
message=_l('Browser URLs must start with wss:// or ws://')
),
ValidateSimpleURL()
], render_kw={"placeholder": "wss://brightdata... wss://oxylabs etc", "size":50})
class DefaultUAInputForm(Form):
html_requests = StringField('Plaintext requests', validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
html_requests = StringField(_l('Plaintext requests'), validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
if os.getenv("PLAYWRIGHT_DRIVER_URL") or os.getenv("WEBDRIVER_URL"):
html_webdriver = StringField('Chrome requests', validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
html_webdriver = StringField(_l('Chrome requests'), validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
# datastore.data['settings']['requests']..
class globalSettingsRequestForm(Form):
time_between_check = RequiredFormField(TimeBetweenCheckForm)
time_between_check = RequiredFormField(TimeBetweenCheckForm, label=_l('Time Between Check'))
time_schedule_limit = FormField(ScheduleLimitForm)
proxy = RadioField('Default proxy')
jitter_seconds = IntegerField('Random jitter seconds ± check',
proxy = RadioField(_l('Default proxy'))
jitter_seconds = IntegerField(_l('Random jitter seconds ± check'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0, message="Should contain zero or more seconds")])
validators=[validators.NumberRange(min=0, message=_l("Should contain zero or more seconds"))])
workers = IntegerField('Number of fetch workers',
workers = IntegerField(_l('Number of fetch workers'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=1, max=50,
message="Should be between 1 and 50")])
message=_l("Should be between 1 and 50"))])
timeout = IntegerField('Requests timeout in seconds',
timeout = IntegerField(_l('Requests timeout in seconds'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=1, max=999,
message="Should be between 1 and 999")])
message=_l("Should be between 1 and 999"))])
extra_proxies = FieldList(FormField(SingleExtraProxy), min_entries=5)
extra_browsers = FieldList(FormField(SingleExtraBrowser), min_entries=5)
default_ua = FormField(DefaultUAInputForm, label="Default User-Agent overrides")
default_ua = FormField(DefaultUAInputForm, label=_l("Default User-Agent overrides"))
def validate_extra_proxies(self, extra_validators=None):
for e in self.data['extra_proxies']:
if e.get('proxy_name') or e.get('proxy_url'):
if not e.get('proxy_name','').strip() or not e.get('proxy_url','').strip():
self.extra_proxies.errors.append('Both a name, and a Proxy URL is required.')
self.extra_proxies.errors.append(gettext('Both a name, and a Proxy URL is required.'))
return False
class globalSettingsApplicationUIForm(Form):
open_diff_in_new_tab = BooleanField("Open 'History' page in a new tab", default=True, validators=[validators.Optional()])
socket_io_enabled = BooleanField('Realtime UI Updates Enabled', default=True, validators=[validators.Optional()])
favicons_enabled = BooleanField('Favicons Enabled', default=True, validators=[validators.Optional()])
use_page_title_in_list = BooleanField('Use page <title> in watch overview list') #BooleanField=True
open_diff_in_new_tab = BooleanField(_l("Open 'History' page in a new tab"), default=True, validators=[validators.Optional()])
socket_io_enabled = BooleanField(_l('Realtime UI Updates Enabled'), default=True, validators=[validators.Optional()])
favicons_enabled = BooleanField(_l('Favicons Enabled'), default=True, validators=[validators.Optional()])
use_page_title_in_list = BooleanField(_l('Use page <title> in watch overview list')) #BooleanField=True
# datastore.data['settings']['application']..
class globalSettingsApplicationForm(commonSettingsForm):
api_access_token_enabled = BooleanField('API access token security check enabled', default=True, validators=[validators.Optional()])
base_url = StringField('Notification base URL override',
api_access_token_enabled = BooleanField(_l('API access token security check enabled'), default=True, validators=[validators.Optional()])
base_url = StringField(_l('Notification base URL override'),
validators=[validators.Optional()],
render_kw={"placeholder": os.getenv('BASE_URL', 'Not set')}
)
empty_pages_are_a_change = BooleanField('Treat empty pages as a change?', default=False)
fetch_backend = RadioField('Fetch Method', default="html_requests", choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
global_ignore_text = StringListField('Ignore Text', [ValidateListRegex()])
global_subtractive_selectors = StringListField('Remove elements', [ValidateCSSJSONXPATHInput(allow_json=False)])
ignore_whitespace = BooleanField('Ignore whitespace')
password = SaltyPasswordField()
pager_size = IntegerField('Pager size',
empty_pages_are_a_change = BooleanField(_l('Treat empty pages as a change?'), default=False)
fetch_backend = RadioField(_l('Fetch Method'), default="html_requests", choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
global_ignore_text = StringListField(_l('Ignore Text'), [ValidateListRegex()])
global_subtractive_selectors = StringListField(_l('Remove elements'), [ValidateCSSJSONXPATHInput(allow_json=False)])
ignore_whitespace = BooleanField(_l('Ignore whitespace'))
# Screenshot comparison settings
min_change_percentage = FloatField(
'Screenshot: Minimum Change Percentage',
validators=[
validators.Optional(),
validators.NumberRange(min=0.0, max=100.0, message=_l('Must be between 0 and 100'))
],
default=0.1,
render_kw={"placeholder": "0.1", "style": "width: 8em;"}
)
password = SaltyPasswordField(_l('Password'))
pager_size = IntegerField(_l('Pager size'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0,
message="Should be atleast zero (disabled)")])
message=_l("Should be atleast zero (disabled)"))])
rss_content_format = SelectField('RSS Content format', choices=list(RSS_FORMAT_TYPES.items()))
rss_template_type = SelectField('RSS <description> body built from', choices=list(RSS_TEMPLATE_TYPE_OPTIONS.items()))
rss_template_override = TextAreaField('RSS "System default" template override', render_kw={"rows": "5", "placeholder": RSS_TEMPLATE_HTML_DEFAULT}, validators=[validators.Optional(), ValidateJinja2Template()])
rss_content_format = SelectField(_l('RSS Content format'), choices=list(RSS_FORMAT_TYPES.items()))
rss_template_type = SelectField(_l('RSS <description> body built from'), choices=list(RSS_TEMPLATE_TYPE_OPTIONS.items()))
rss_template_override = TextAreaField(_l('RSS "System default" template override'), render_kw={"rows": "5", "placeholder": RSS_TEMPLATE_HTML_DEFAULT}, validators=[validators.Optional(), ValidateJinja2Template()])
removepassword_button = SubmitField('Remove password', render_kw={"class": "pure-button pure-button-primary"})
render_anchor_tag_content = BooleanField('Render anchor tag content', default=False)
shared_diff_access = BooleanField('Allow anonymous access to watch history page when password is enabled', default=False, validators=[validators.Optional()])
strip_ignored_lines = BooleanField('Strip ignored lines')
rss_hide_muted_watches = BooleanField('Hide muted watches from RSS feed', default=True,
removepassword_button = SubmitField(_l('Remove password'), render_kw={"class": "pure-button pure-button-primary"})
render_anchor_tag_content = BooleanField(_l('Render anchor tag content'), default=False)
shared_diff_access = BooleanField(_l('Allow anonymous access to watch history page when password is enabled'), default=False, validators=[validators.Optional()])
strip_ignored_lines = BooleanField(_l('Strip ignored lines'))
rss_hide_muted_watches = BooleanField(_l('Hide muted watches from RSS feed'), default=True,
validators=[validators.Optional()])
rss_reader_mode = BooleanField('Enable RSS reader mode ', default=False, validators=[validators.Optional()])
rss_diff_length = IntegerField(label='Number of changes to show in watch RSS feed',
rss_reader_mode = BooleanField(_l('Enable RSS reader mode '), default=False, validators=[validators.Optional()])
rss_diff_length = IntegerField(label=_l('Number of changes to show in watch RSS feed'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0, message="Should contain zero or more attempts")])
validators=[validators.NumberRange(min=0, message=_l("Should contain zero or more attempts"))])
filter_failure_notification_threshold_attempts = IntegerField('Number of times the filter can be missing before sending a notification',
filter_failure_notification_threshold_attempts = IntegerField(_l('Number of times the filter can be missing before sending a notification'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0,
message="Should contain zero or more attempts")])
message=_l("Should contain zero or more attempts"))])
history_snapshot_max_length = IntegerField(_l('Number of history items per watch to keep'), render_kw={"style": "width: 5em;"}, validators=[validators.Optional(), validators.NumberRange(min=2)])
ui = FormField(globalSettingsApplicationUIForm)
@@ -1035,9 +1049,9 @@ class globalSettingsForm(Form):
requests = FormField(globalSettingsRequestForm)
application = FormField(globalSettingsApplicationForm)
save_button = SubmitField('Save', render_kw={"class": "pure-button pure-button-primary"})
save_button = SubmitField(_l('Save'), render_kw={"class": "pure-button pure-button-primary"})
class extractDataForm(Form):
extract_regex = StringField('RegEx to extract', validators=[validators.DataRequired(), ValidateSinglePythonRegexString()])
extract_submit_button = SubmitField('Extract as CSV', render_kw={"class": "pure-button pure-button-primary"})
extract_regex = StringField(_l('RegEx to extract'), validators=[validators.DataRequired(), ValidateSinglePythonRegexString()])
extract_submit_button = SubmitField(_l('Extract as CSV'), render_kw={"class": "pure-button pure-button-primary"})


@@ -172,99 +172,131 @@ def elementpath_tostring(obj):
return str(obj)
# Return str Utf-8 of matched rules
def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False, is_rss=False):
def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False, is_xml=False):
"""
:param xpath_filter:
:param html_content:
:param append_pretty_line_formatting:
:param is_xml: set to true if is XML or is RSS (RSS is XML)
:return:
"""
from lxml import etree, html
import elementpath
# xpath 2.0-3.1
from elementpath.xpath3 import XPath3Parser
parser = etree.HTMLParser()
if is_rss:
# So that we can keep CDATA for cdata_in_document_to_text() to process
parser = etree.XMLParser(strip_cdata=False)
tree = html.fromstring(bytes(html_content, encoding='utf-8'), parser=parser)
html_block = ""
# Build namespace map for XPath queries
namespaces = {'re': 'http://exslt.org/regular-expressions'}
# Handle default namespace in documents (common in RSS/Atom feeds, but can occur in any XML)
# XPath spec: unprefixed element names have no namespace, not the default namespace
# Solution: Register the default namespace with empty string prefix in elementpath
# This is primarily for RSS/Atom feeds but works for any XML with default namespace
if hasattr(tree, 'nsmap') and tree.nsmap and None in tree.nsmap:
# Register the default namespace with empty string prefix for elementpath
# This allows //title to match elements in the default namespace
namespaces[''] = tree.nsmap[None]
r = elementpath.select(tree, xpath_filter.strip(), namespaces=namespaces, parser=XPath3Parser)
#@note: //title/text() now works with default namespaces (fixed by registering '' prefix)
#@note: //title/text() wont work where <title>CDATA.. (use cdata_in_document_to_text first)
if type(r) != list:
r = [r]
for element in r:
# When there's more than 1 match, then add the suffix to separate each line
# And where the matched result doesn't include something that will cause Inscriptis to add a newline
# (This way each 'match' reliably has a new-line in the diff)
# Divs are converted to 4 whitespaces by inscriptis
if append_pretty_line_formatting and len(html_block) and (not hasattr( element, 'tag' ) or not element.tag in (['br', 'hr', 'div', 'p'])):
html_block += TEXT_FILTER_LIST_LINE_SUFFIX
if type(element) == str:
html_block += element
elif issubclass(type(element), etree._Element) or issubclass(type(element), etree._ElementTree):
html_block += etree.tostring(element, pretty_print=True).decode('utf-8')
tree = None
try:
if is_xml:
# So that we can keep CDATA for cdata_in_document_to_text() to process
parser = etree.XMLParser(strip_cdata=False)
# For XML/RSS content, use etree.fromstring to properly handle XML declarations
tree = etree.fromstring(html_content.encode('utf-8') if isinstance(html_content, str) else html_content, parser=parser)
else:
html_block += elementpath_tostring(element)
tree = html.fromstring(html_content, parser=parser)
html_block = ""
return html_block
# Build namespace map for XPath queries
namespaces = {'re': 'http://exslt.org/regular-expressions'}
# Handle default namespace in documents (common in RSS/Atom feeds, but can occur in any XML)
# XPath spec: unprefixed element names have no namespace, not the default namespace
# Solution: Register the default namespace with empty string prefix in elementpath
# This is primarily for RSS/Atom feeds but works for any XML with default namespace
if hasattr(tree, 'nsmap') and tree.nsmap and None in tree.nsmap:
# Register the default namespace with empty string prefix for elementpath
# This allows //title to match elements in the default namespace
namespaces[''] = tree.nsmap[None]
r = elementpath.select(tree, xpath_filter.strip(), namespaces=namespaces, parser=XPath3Parser)
#@note: //title/text() now works with default namespaces (fixed by registering '' prefix)
#@note: //title/text() wont work where <title>CDATA.. (use cdata_in_document_to_text first)
if type(r) != list:
r = [r]
for element in r:
# When there's more than 1 match, then add the suffix to separate each line
# And where the matched result doesn't include something that will cause Inscriptis to add a newline
# (This way each 'match' reliably has a new-line in the diff)
# Divs are converted to 4 whitespaces by inscriptis
if append_pretty_line_formatting and len(html_block) and (not hasattr( element, 'tag' ) or not element.tag in (['br', 'hr', 'div', 'p'])):
html_block += TEXT_FILTER_LIST_LINE_SUFFIX
if type(element) == str:
html_block += element
elif issubclass(type(element), etree._Element) or issubclass(type(element), etree._ElementTree):
# Use 'xml' method for RSS/XML content, 'html' for HTML content
# parser will be XMLParser if we detected XML content
method = 'xml' if (is_xml or isinstance(parser, etree.XMLParser)) else 'html'
html_block += etree.tostring(element, pretty_print=True, method=method, encoding='unicode')
else:
html_block += elementpath_tostring(element)
return html_block
finally:
# Explicitly clear the tree to free memory
# lxml trees can hold significant memory, especially with large documents
if tree is not None:
tree.clear()
# Return str Utf-8 of matched rules
# 'xpath1:'
def xpath1_filter(xpath_filter, html_content, append_pretty_line_formatting=False, is_rss=False):
def xpath1_filter(xpath_filter, html_content, append_pretty_line_formatting=False, is_xml=False):
from lxml import etree, html
parser = None
if is_rss:
# So that we can keep CDATA for cdata_in_document_to_text() to process
parser = etree.XMLParser(strip_cdata=False)
tree = html.fromstring(bytes(html_content, encoding='utf-8'), parser=parser)
html_block = ""
# Build namespace map for XPath queries
namespaces = {'re': 'http://exslt.org/regular-expressions'}
# NOTE: lxml's native xpath() does NOT support empty string prefix for default namespace
# For documents with default namespace (RSS/Atom feeds), users must use:
# - local-name(): //*[local-name()='title']/text()
# - Or use xpath_filter (not xpath1_filter) which supports default namespaces
# XPath spec: unprefixed element names have no namespace, not the default namespace
r = tree.xpath(xpath_filter.strip(), namespaces=namespaces)
#@note: xpath1 (lxml) does NOT automatically handle default namespaces
#@note: Use //*[local-name()='element'] or switch to xpath_filter for default namespace support
#@note: //title/text() wont work where <title>CDATA.. (use cdata_in_document_to_text first)
for element in r:
# When there's more than 1 match, then add the suffix to separate each line
# And where the matched result doesn't include something that will cause Inscriptis to add a newline
# (This way each 'match' reliably has a new-line in the diff)
# Divs are converted to 4 whitespaces by inscriptis
if append_pretty_line_formatting and len(html_block) and (not hasattr(element, 'tag') or not element.tag in (['br', 'hr', 'div', 'p'])):
html_block += TEXT_FILTER_LIST_LINE_SUFFIX
# Some kind of text, UTF-8 or other
if isinstance(element, (str, bytes)):
html_block += element
tree = None
try:
if is_xml:
# So that we can keep CDATA for cdata_in_document_to_text() to process
parser = etree.XMLParser(strip_cdata=False)
# For XML/RSS content, use etree.fromstring to properly handle XML declarations
tree = etree.fromstring(html_content.encode('utf-8') if isinstance(html_content, str) else html_content, parser=parser)
else:
# Return the HTML which will get parsed as text
html_block += etree.tostring(element, pretty_print=True).decode('utf-8')
tree = html.fromstring(html_content, parser=parser)
html_block = ""
return html_block
# Build namespace map for XPath queries
namespaces = {'re': 'http://exslt.org/regular-expressions'}
# NOTE: lxml's native xpath() does NOT support empty string prefix for default namespace
# For documents with default namespace (RSS/Atom feeds), users must use:
# - local-name(): //*[local-name()='title']/text()
# - Or use xpath_filter (not xpath1_filter) which supports default namespaces
# XPath spec: unprefixed element names have no namespace, not the default namespace
r = tree.xpath(xpath_filter.strip(), namespaces=namespaces)
#@note: xpath1 (lxml) does NOT automatically handle default namespaces
#@note: Use //*[local-name()='element'] or switch to xpath_filter for default namespace support
#@note: //title/text() wont work where <title>CDATA.. (use cdata_in_document_to_text first)
for element in r:
# When there's more than 1 match, then add the suffix to separate each line
# And where the matched result doesn't include something that will cause Inscriptis to add a newline
# (This way each 'match' reliably has a new-line in the diff)
# Divs are converted to 4 whitespaces by inscriptis
if append_pretty_line_formatting and len(html_block) and (not hasattr(element, 'tag') or not element.tag in (['br', 'hr', 'div', 'p'])):
html_block += TEXT_FILTER_LIST_LINE_SUFFIX
# Some kind of text, UTF-8 or other
if isinstance(element, (str, bytes)):
html_block += element
else:
# Return the HTML/XML which will get parsed as text
# Use 'xml' method for RSS/XML content, 'html' for HTML content
# parser will be XMLParser if we detected XML content
method = 'xml' if (is_xml or isinstance(parser, etree.XMLParser)) else 'html'
html_block += etree.tostring(element, pretty_print=True, method=method, encoding='unicode')
return html_block
finally:
# Explicitly clear the tree to free memory
# lxml trees can hold significant memory, especially with large documents
if tree is not None:
tree.clear()
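Note on the namespace comments above: the rule "unprefixed element names have no namespace" can be reproduced with the standard library alone. This is a sketch using xml.etree.ElementTree as a stand-in, not lxml or elementpath as used in the diff; the sample feed and the prefix 'a' are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny Atom-style document that declares a default namespace.
ATOM = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example feed</title></feed>'
root = ET.fromstring(ATOM)

# Unprefixed names match only elements in no namespace, so this finds nothing.
unprefixed = root.findall('.//title')

# Binding a prefix (here the hypothetical 'a') to the default namespace works.
prefixed = root.findall('.//a:title', {'a': 'http://www.w3.org/2005/Atom'})

# The {*} wildcard (Python 3.8+) ignores the namespace, similar in spirit
# to the local-name() workaround mentioned in the comments above.
wildcard = root.findall('.//{*}title')
```

xpath_filter avoids the need for either workaround by registering the default namespace under the empty prefix, which lxml's native xpath() does not accept.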
# Extract/find element
def extract_element(find='title', html_content=''):
@@ -432,6 +464,9 @@ def strip_ignore_text(content, wordlist, mode="content"):
ignore_regex_multiline = []
ignored_lines = []
if not content:
return ''
for k in wordlist:
# Skip empty strings to avoid matching everything
if not k or not k.strip():
@@ -504,6 +539,18 @@ def cdata_in_document_to_text(html_content: str, render_anchor_tag_content=False
def html_to_text(html_content: str, render_anchor_tag_content=False, is_rss=False, timeout=10) -> str:
"""
Convert HTML content to plain text using inscriptis.
Thread-Safety: This function uses inscriptis.get_text() which internally calls
lxml.html.fromstring() with the default parser. Testing with 50 concurrent threads
confirms this approach is thread-safe and produces deterministic output.
Alternative Approach Rejected: An explicit HTMLParser instance (thread-local or fresh)
would also be thread-safe, but was found to break change detection logic in subtle ways
(test_check_basic_change_detection_functionality). The default parser provides correct
and reliable behavior.
"""
from inscriptis import get_text
from inscriptis.model.config import ParserConfig
@@ -514,10 +561,33 @@ def html_to_text(html_content: str, render_anchor_tag_content=False, is_rss=Fals
)
else:
parser_config = None
if is_rss:
html_content = re.sub(r'<title([\s>])', r'<h1\1', html_content)
html_content = re.sub(r'</title>', r'</h1>', html_content)
else:
# Use BS4 html.parser to strip bloat — SPA's often dump 10MB+ of CSS/JS into <head>,
# causing inscriptis to silently give up. Regex-based stripping is unsafe because tags
# can appear inside JSON data attributes with JS-escaped closing tags (e.g. <\/script>),
# causing the regex to scan past the intended close and eat real page content.
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
# Strip tags that inscriptis cannot render as meaningful text and which can be very large.
# svg/math: produce path-data/MathML garbage; canvas/iframe/template: no inscriptis handlers.
# video/audio/picture are kept — they may contain meaningful fallback text or captions.
for tag in soup.find_all(['head', 'script', 'style', 'noscript', 'svg',
'math', 'canvas', 'iframe', 'template']):
tag.decompose()
# SPAs often use <body style="display:none"> to hide content until JS loads.
# inscriptis respects CSS display rules, so strip hiding styles from the body tag.
body_tag = soup.find('body')
if body_tag and body_tag.get('style'):
style = body_tag['style']
if re.search(r'\b(?:display\s*:\s*none|visibility\s*:\s*hidden)\b', style, re.IGNORECASE):
logger.debug(f"html_to_text: Removing hiding styles from body tag (found: '{style}')")
del body_tag['style']
html_content = str(soup)
text_content = get_text(html_content, config=parser_config)
return text_content
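Note on the body-style check above: the regex can be exercised in isolation. A minimal sketch, assuming a hypothetical helper name (body_style_hides_content is not in the diff); the pattern itself is copied from the hunk.

```python
import re

# Same pattern as in html_to_text(); compiled once for reuse.
HIDING_STYLE = re.compile(
    r'\b(?:display\s*:\s*none|visibility\s*:\s*hidden)\b',
    re.IGNORECASE,
)


def body_style_hides_content(style: str) -> bool:
    """Hypothetical helper: True when an inline style would hide the element."""
    return bool(HIDING_STYLE.search(style))
```

The trailing \b keeps values that merely start with "none" from matching, and re.IGNORECASE covers upper-case declarations.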


@@ -0,0 +1,113 @@
"""
URL redirect validation module for preventing open redirect vulnerabilities.
This module provides functionality to safely validate redirect URLs, ensuring they:
1. Point to internal routes only (no external redirects)
2. Are properly normalized (preventing browser parsing differences)
3. Match registered Flask routes (no fake/non-existent pages)
4. Are fully logged for security monitoring
References:
- https://flask-login.readthedocs.io/ (safe redirect patterns)
- https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-v-user-logins
- https://www.pythonkitchen.com/how-prevent-open-redirect-vulnerab-flask/
"""
from urllib.parse import urlparse, urljoin
from flask import request
from loguru import logger
def is_safe_url(target, app):
"""
Validate that a redirect URL is safe to prevent open redirect vulnerabilities.
This follows Flask/Werkzeug best practices by ensuring the redirect URL:
1. Is a relative path starting with exactly one '/'
2. Does not start with '//' (double-slash attack)
3. Has no external protocol handlers
4. Points to a valid registered route in the application
5. Is properly normalized to prevent browser parsing differences
Args:
target: The URL to validate (e.g., '/settings', '/login#top')
app: The Flask application instance (needed for route validation)
Returns:
bool: True if the URL is safe for redirection, False otherwise
Examples:
>>> is_safe_url('/settings', app)
True
>>> is_safe_url('//evil.com', app)
False
>>> is_safe_url('/settings#general', app)
True
>>> is_safe_url('/fake-page', app)
False
"""
if not target:
return False
# Normalize the URL to prevent browser parsing differences
# Strip whitespace and replace backslashes (which some browsers interpret as forward slashes)
target = target.strip()
target = target.replace('\\', '/')
# First, check if it starts with // or more (double-slash attack)
if target.startswith('//'):
logger.warning(f"Blocked redirect attempt with double-slash: {target}")
return False
# Parse the URL to check for scheme and netloc
parsed = urlparse(target)
# Block any URL with a scheme (http://, https://, javascript:, etc.)
if parsed.scheme:
logger.warning(f"Blocked redirect attempt with scheme: {target}")
return False
# Block any URL with a network location (netloc)
# This catches patterns like //evil.com, user@host, etc.
if parsed.netloc:
logger.warning(f"Blocked redirect attempt with netloc: {target}")
return False
# At this point, we have a relative URL with no scheme or netloc
# Use urljoin to resolve it and verify it points to the same host
ref_url = urlparse(request.host_url)
test_url = urlparse(urljoin(request.host_url, target))
# Check: ensure the resolved URL has the same netloc as current host
if not (test_url.scheme in ('http', 'https') and ref_url.netloc == test_url.netloc):
logger.warning(f"Blocked redirect attempt with mismatched netloc: {target}")
return False
# Additional validation: Check if the URL matches a registered route
# This prevents redirects to non-existent pages or unintended endpoints
try:
# Get the path without query string and fragment
# urlparse() puts the query string and fragment (like #general) in separate fields, so parsed.path excludes them
path = parsed.path
# Create a URL adapter bound to the server name
adapter = app.url_map.bind(ref_url.netloc)
# Try to match the path to a registered route
# This will raise NotFound if the route doesn't exist
endpoint, values = adapter.match(path, return_rule=False)
# Block redirects to static file endpoints - these are catch-all routes
# that would match arbitrary paths, potentially allowing unintended redirects
if endpoint in ('static_content', 'static', 'static_flags'):
logger.warning(f"Blocked redirect to static endpoint: {target}")
return False
# Successfully matched a valid route
logger.debug(f"Validated safe redirect to endpoint '{endpoint}': {target}")
return True
except Exception as e:
# Route doesn't exist or can't be matched
logger.warning(f"Blocked redirect to non-existent route: {target} (error: {e})")
return False
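Note on is_safe_url() above: the normalization and scheme/netloc checks can be tried without a Flask app. This is a sketch of those early checks only, under the assumption that route matching is done separately; looks_like_internal_path is a hypothetical name, and the explicit leading-slash test is an addition standing in for the route lookup.

```python
from urllib.parse import urlparse


def looks_like_internal_path(target: str) -> bool:
    """Hypothetical helper: pre-route checks only, no Flask route matching."""
    if not target:
        return False

    # Normalize as the module does: trim whitespace, treat '\' as '/'.
    target = target.strip().replace('\\', '/')

    # Protocol-relative URLs such as //evil.com leave the current host.
    if target.startswith('//'):
        return False

    parsed = urlparse(target)

    # Any scheme (https:, javascript:, ...) or network location is rejected.
    if parsed.scheme or parsed.netloc:
        return False

    # Assumption for this sketch: require an absolute path on this host.
    return parsed.path.startswith('/')
```

The real function goes further and rejects paths that do not match a registered route, which this sketch cannot do without the application's URL map.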


@@ -52,7 +52,13 @@ def render(template_str, **args: t.Any) -> str:
return output[:JINJA2_MAX_RETURN_PAYLOAD_SIZE]
def render_fully_escaped(content):
env = jinja2.sandbox.ImmutableSandboxedEnvironment(autoescape=True)
template = env.from_string("{{ some_html|e }}")
return template.render(some_html=content)
"""
Escape HTML content safely.
MEMORY LEAK FIX: Use markupsafe.escape() directly instead of creating
Jinja2 environments (was causing 1M+ compilations per page load).
Simpler, faster, and no concerns about environment state.
"""
from markupsafe import escape
return str(escape(content))


@@ -0,0 +1,112 @@
"""
Language configuration for i18n support
Automatically discovers available languages from translations directory
"""
import os
from pathlib import Path
def get_timeago_locale(flask_locale):
"""
Convert Flask-Babel locale codes to timeago library locale codes.
The Python timeago library (https://github.com/hustcc/timeago) supports 48 locales
but uses different naming conventions than Flask-Babel. This function maps between them.
Notable differences:
- Chinese: Flask uses 'zh', timeago uses 'zh_CN'
- Portuguese: Flask uses 'pt', timeago uses 'pt_PT' or 'pt_BR'
- Swedish: Flask uses 'sv', timeago uses 'sv_SE'
- Norwegian: Flask uses 'no', timeago uses 'nb_NO' or 'nn_NO'
- Hindi: Flask uses 'hi', timeago uses 'in_HI'
- Czech: Flask uses 'cs', but timeago doesn't support Czech - fallback to English
Args:
flask_locale (str): Flask-Babel locale code (e.g., 'cs', 'zh', 'pt')
Returns:
str: timeago library locale code (e.g., 'en', 'zh_CN', 'pt_PT')
"""
locale_map = {
'zh': 'zh_CN', # Chinese Simplified
# The timeago library still uses an older locale naming convention and has not been updated to BCP 47 / RFC 5646.
'zh_TW': 'zh_TW', # Chinese Traditional (timeago uses zh_TW)
'zh_Hant_TW': 'zh_TW', # Flask-Babel normalizes zh_TW to zh_Hant_TW, map back to timeago's zh_TW
'pt': 'pt_PT', # Portuguese (Portugal)
'sv': 'sv_SE', # Swedish
'no': 'nb_NO', # Norwegian Bokmål
'hi': 'in_HI', # Hindi
'cs': 'en', # Czech not supported by timeago, fallback to English
'uk': 'uk', # Ukrainian
'en_GB': 'en', # British English - timeago uses 'en'
'en_US': 'en', # American English - timeago uses 'en'
}
return locale_map.get(flask_locale, flask_locale)
# Language metadata: flag icon CSS class and native name
# Using flag-icons library: https://flagicons.lipis.dev/
LANGUAGE_DATA = {
'en_GB': {'flag': 'fi fi-gb fis', 'name': 'English (UK)'},
'en_US': {'flag': 'fi fi-us fis', 'name': 'English (US)'},
'de': {'flag': 'fi fi-de fis', 'name': 'Deutsch'},
'fr': {'flag': 'fi fi-fr fis', 'name': 'Français'},
'ko': {'flag': 'fi fi-kr fis', 'name': '한국어'},
'cs': {'flag': 'fi fi-cz fis', 'name': 'Čeština'},
'es': {'flag': 'fi fi-es fis', 'name': 'Español'},
'pt': {'flag': 'fi fi-pt fis', 'name': 'Português'},
'it': {'flag': 'fi fi-it fis', 'name': 'Italiano'},
'ja': {'flag': 'fi fi-jp fis', 'name': '日本語'},
'zh': {'flag': 'fi fi-cn fis', 'name': '中文 (简体)'},
'zh_Hant_TW': {'flag': 'fi fi-tw fis', 'name': '繁體中文'},
'ru': {'flag': 'fi fi-ru fis', 'name': 'Русский'},
'pl': {'flag': 'fi fi-pl fis', 'name': 'Polski'},
'nl': {'flag': 'fi fi-nl fis', 'name': 'Nederlands'},
'sv': {'flag': 'fi fi-se fis', 'name': 'Svenska'},
'da': {'flag': 'fi fi-dk fis', 'name': 'Dansk'},
'no': {'flag': 'fi fi-no fis', 'name': 'Norsk'},
'fi': {'flag': 'fi fi-fi fis', 'name': 'Suomi'},
'tr': {'flag': 'fi fi-tr fis', 'name': 'Türkçe'},
'ar': {'flag': 'fi fi-sa fis', 'name': 'العربية'},
'hi': {'flag': 'fi fi-in fis', 'name': 'हिन्दी'},
'uk': {'flag': 'fi fi-ua fis', 'name': 'Українська'},
}
def get_available_languages():
"""
Discover available languages by scanning the translations directory
Returns a dict of available languages with their metadata
"""
translations_dir = Path(__file__).parent / 'translations'
available = {}
# Scan for translation directories
if translations_dir.exists():
for lang_dir in translations_dir.iterdir():
if lang_dir.is_dir() and lang_dir.name in LANGUAGE_DATA:
# Check if messages.po exists
po_file = lang_dir / 'LC_MESSAGES' / 'messages.po'
if po_file.exists():
available[lang_dir.name] = LANGUAGE_DATA[lang_dir.name]
# If no English variants found, fall back to adding en_GB as default
if 'en_GB' not in available and 'en_US' not in available:
available['en_GB'] = LANGUAGE_DATA['en_GB']
return available
def get_language_codes():
"""Get list of available language codes"""
return list(get_available_languages().keys())
def get_flag_for_locale(locale):
"""Get the flag-icons CSS class for a locale, or a globe emoji if unknown"""
return LANGUAGE_DATA.get(locale, {}).get('flag', '🌐')
def get_name_for_locale(locale):
"""Get native name for a locale"""
return LANGUAGE_DATA.get(locale, {}).get('name', locale.upper())
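
The locale mapping relies on `dict.get(key, key)` so that unknown codes pass through unchanged. A minimal sketch of that behaviour, using a small subset of the map from the diff; `LOCALE_MAP_SAMPLE` and `to_timeago_locale` are hypothetical names for illustration:

```python
# Subset of the mapping shown in the diff, copied for illustration only
LOCALE_MAP_SAMPLE = {
    "zh": "zh_CN",
    "zh_Hant_TW": "zh_TW",
    "no": "nb_NO",
    "cs": "en",
}


def to_timeago_locale(flask_locale: str) -> str:
    """Map a Flask-Babel code to a timeago code, passing unknown codes through."""
    return LOCALE_MAP_SAMPLE.get(flask_locale, flask_locale)
```

Codes that already agree between the two libraries (for example `de`) need no entry in the map.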


@@ -2,6 +2,7 @@ from os import getenv
from copy import deepcopy
from changedetectionio.blueprint.rss import RSS_FORMAT_TYPES, RSS_CONTENT_FORMAT_DEFAULT
from changedetectionio.model.Tags import TagsDict
from changedetectionio.notification import (
default_notification_body,
@@ -29,7 +30,7 @@ class model(dict):
'proxy': None, # Preferred proxy connection
'time_between_check': {'weeks': None, 'days': None, 'hours': 3, 'minutes': None, 'seconds': None},
'timeout': int(getenv("DEFAULT_SETTINGS_REQUESTS_TIMEOUT", "45")), # Default 45 seconds
'workers': int(getenv("DEFAULT_SETTINGS_REQUESTS_WORKERS", "10")), # Number of threads, lower is better for slow connections
'workers': int(getenv("DEFAULT_SETTINGS_REQUESTS_WORKERS", "5")), # Number of threads, lower is better for slow connections
'default_ua': {
'html_requests': getenv("DEFAULT_SETTINGS_HEADERS_USERAGENT", DEFAULT_SETTINGS_HEADERS_USERAGENT),
'html_webdriver': None,
@@ -37,6 +38,8 @@ class model(dict):
},
'application': {
# Custom notification content
'all_paused': False,
'all_muted': False,
'api_access_token_enabled': True,
'base_url' : None,
'empty_pages_are_a_change': False,
@@ -44,8 +47,10 @@ class model(dict):
'filter_failure_notification_threshold_attempts': _FILTER_FAILURE_THRESHOLD_ATTEMPTS_DEFAULT,
'global_ignore_text': [], # List of text to ignore when calculating the comparison checksum
'global_subtractive_selectors': [],
'history_snapshot_max_length': None,
'ignore_whitespace': True,
'ignore_status_codes': False, #@todo implement, as ternary.
'ssim_threshold': '0.96', # Default SSIM threshold for screenshot comparison
'notification_body': default_notification_body,
'notification_format': default_notification_format,
'notification_title': default_notification_title,
@@ -64,7 +69,7 @@ class model(dict):
'schema_version' : 0,
'shared_diff_access': False,
'strip_ignored_lines': False,
'tags': {}, #@todo use Tag.model initialisers
'tags': None, # Initialized in __init__ with real datastore_path
'webdriver_delay': None , # Extra delay in seconds before extracting text
'ui': {
'use_page_title_in_list': True,
@@ -76,10 +81,16 @@ class model(dict):
}
}
def __init__(self, *arg, **kw):
def __init__(self, *arg, datastore_path=None, **kw):
super(model, self).__init__(*arg, **kw)
# Capture any tags data passed in before base_config overwrites the structure
existing_tags = self.get('settings', {}).get('application', {}).get('tags') or {}
# CRITICAL: deepcopy to avoid sharing mutable objects between instances
self.update(deepcopy(self.base_config))
# TagsDict requires the real datastore_path at runtime (cannot be set at class-definition time)
if datastore_path is None:
raise ValueError("App.model() requires 'datastore_path' keyword argument")
self['settings']['application']['tags'] = TagsDict(existing_tags, datastore_path=datastore_path)
def parse_headers_from_text_file(filepath):
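
The `__init__` change above combines two ideas: capture any tags passed in before the template overwrites them, and deep-copy the class-level template so instances do not share nested dicts. A reduced sketch, assuming a hypothetical `SettingsSketch` class and a plain `dict` in place of `TagsDict`:

```python
from copy import deepcopy


class SettingsSketch(dict):
    # Class-level template shared by every instance
    base_config = {"settings": {"application": {"tags": None, "all_paused": False}}}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Capture tags passed in before the template overwrites the structure
        existing_tags = self.get("settings", {}).get("application", {}).get("tags") or {}
        # deepcopy so nested dicts are not shared between instances
        self.update(deepcopy(self.base_config))
        self["settings"]["application"]["tags"] = dict(existing_tags)
```

Without the `deepcopy`, mutating one instance's nested settings would also change the class-level template and every other instance built from it.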


@@ -1,10 +1,48 @@
"""
Tag/Group domain model for organizing and overriding watch settings.
ARCHITECTURE NOTE: Configuration Override Hierarchy
===================================================
Tags can override Watch settings when overrides_watch=True.
Current implementation requires manual checking in processors:
for tag_uuid in watch.get('tags'):
tag = datastore['settings']['application']['tags'][tag_uuid]
if tag.get('overrides_watch'):
restock_settings = tag.get('restock_settings', {})
break
With Pydantic, this would be automatic via chain resolution:
Watch → Tag (first with overrides_watch) → Global
See: Watch.py model docstring for full Pydantic architecture explanation
See: processors/restock_diff/processor.py:184-192 for current manual implementation
"""
from changedetectionio.model import watch_base
from changedetectionio.model.persistence import EntityPersistenceMixin
class model(EntityPersistenceMixin, watch_base):
"""
Tag domain model - groups watches and can override their settings.
class model(watch_base):
Tags inherit from watch_base to reuse all the same fields as Watch.
When overrides_watch=True, tag settings take precedence over watch settings
for all watches in this tag/group.
Fields:
overrides_watch (bool): If True, this tag's settings override watch settings
title (str): Display name for this tag/group
uuid (str): Unique identifier
... (all fields from watch_base can be set as tag-level overrides)
Resolution order when overrides_watch=True:
Watch.field → Tag.field (if overrides_watch) → Global.field
"""
def __init__(self, *arg, **kw):
# Parent class (watch_base) handles __datastore and __datastore_path
super(model, self).__init__(*arg, **kw)
self['overrides_watch'] = kw.get('default', {}).get('overrides_watch')
@@ -12,3 +50,7 @@ class model(watch_base):
if kw.get('default'):
self.update(kw['default'])
del kw['default']
# _save_to_disk() method provided by EntityPersistenceMixin
# commit() and _get_commit_data() methods inherited from watch_base
# Tag uses default _get_commit_data() (includes all keys)
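
The manual override check described in the module docstring can be written as a small standalone function. This is a sketch over plain dicts, not the project's processor code; `resolve_restock_settings` and the sample setting keys in the usage are hypothetical.

```python
def resolve_restock_settings(watch: dict, tags: dict) -> dict:
    """Return the watch's restock settings unless a tag with overrides_watch replaces them."""
    restock_settings = watch.get("restock_settings", {})
    for tag_uuid in watch.get("tags", []):
        tag = tags.get(tag_uuid)
        if tag and tag.get("overrides_watch"):
            # The first overriding tag wins
            restock_settings = tag.get("restock_settings", {})
            break
    return restock_settings
```

Keeping this logic in one function is the "single source of truth" the docstring argues for, whether or not it is later expressed as a Pydantic computed field.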


@@ -0,0 +1,39 @@
import os
import shutil
from pathlib import Path
from loguru import logger
_SENTINEL = object()
class TagsDict(dict):
"""Dict subclass that removes the corresponding tag directory (the one containing tag.json) when a tag is deleted."""
def __init__(self, *args, datastore_path: str | os.PathLike, **kwargs) -> None:
self._datastore_path = Path(datastore_path)
super().__init__(*args, **kwargs)
def __delitem__(self, key: str) -> None:
super().__delitem__(key)
tag_dir = self._datastore_path / key
tag_json_file = tag_dir / "tag.json"
if not os.path.exists(tag_json_file):
logger.critical(f"Aborting deletion of directory '{tag_dir}' because '{tag_json_file}' does not exist.")
return
try:
shutil.rmtree(tag_dir)
logger.info(f"Deleted tag directory for tag {key!r}")
except FileNotFoundError:
pass
except OSError as e:
logger.error(f"Failed to delete tag directory for tag {key!r}: {e}")
def pop(self, key: str, default=_SENTINEL):
"""Remove and return tag, deleting its tag directory. Raises KeyError if missing and no default given."""
if key in self:
value = self[key]
del self[key]
return value
if default is _SENTINEL:
raise KeyError(key)
return default
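
`pop()` is overridden here because, in CPython, the built-in `dict.pop` does not call an overridden `__delitem__`, so the cleanup would otherwise be skipped. The sketch below shows the same pattern with the filesystem work replaced by a recorded list; `HookedDict` and `_MISSING` are hypothetical names.

```python
_MISSING = object()


class HookedDict(dict):
    """Dict that records deleted keys; a stand-in for removing a directory on disk."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.deleted = []

    def __delitem__(self, key):
        super().__delitem__(key)
        self.deleted.append(key)  # side effect runs only after the key is removed

    def pop(self, key, default=_MISSING):
        # Route through __delitem__ so the side effect also fires for pop()
        if key in self:
            value = self[key]
            del self[key]
            return value
        if default is _MISSING:
            raise KeyError(key)
        return default
```

The sentinel object distinguishes "no default given" from an explicit `default=None`, matching the behaviour of the built-in `pop`.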


@@ -1,35 +1,239 @@
"""
Watch domain model for change detection monitoring.
ARCHITECTURE NOTE: Configuration Override Hierarchy
===================================================
This module implements Watch objects that inherit from dict (technical debt).
The dream architecture would use Pydantic for:
1. CHAIN RESOLUTION (Watch → Tag → Global Settings)
- Current: Manual resolution scattered across codebase
- Future: @computed_field properties with automatic resolution
- Examples: resolved_fetch_backend, resolved_restock_settings, etc.
2. DATABASE BACKEND ABSTRACTION
- Current: Domain model tightly coupled to file-based JSON storage
- Future: Domain model (Pydantic) separate from persistence layer
- Enables: Easy migration to PostgreSQL, MongoDB, etc.
3. TYPE SAFETY & VALIDATION
- Current: Dict access with no compile-time checks
- Future: Type hints, IDE autocomplete, validation at boundaries
See class model docstring for detailed explanation and examples.
See: processors/restock_diff/processor.py:184-192 for manual resolution example
"""
from blinker import signal
from changedetectionio.validate_url import is_safe_valid_url
from changedetectionio.strtobool import strtobool
from changedetectionio.jinja2_custom import render as jinja_render
from . import watch_base
from .persistence import EntityPersistenceMixin
import os
import re
from pathlib import Path
from loguru import logger
from .. import jinja2_custom as safe_jinja
from ..diff import ADDED_PLACEMARKER_OPEN
from ..html_tools import TRANSLATE_WHITESPACE_TABLE
FAVICON_RESAVE_THRESHOLD_SECONDS=86400
BROTLI_COMPRESS_SIZE_THRESHOLD = int(os.getenv('SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD', 1024*20))
minimum_seconds_recheck_time = int(os.getenv('MINIMUM_SECONDS_RECHECK_TIME', 3))
mtable = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400, 'weeks': 86400 * 7}
class model(watch_base):
def _brotli_save(contents, filepath, mode=None, fallback_uncompressed=False):
"""
Save compressed data using native brotli with streaming compression.
Uses chunked compression to minimize peak memory usage and malloc_trim()
to force release of C-level memory back to the OS.
Args:
contents: data to compress (str or bytes)
filepath: destination file path
mode: brotli compression mode (e.g., brotli.MODE_TEXT)
fallback_uncompressed: if True, save uncompressed on failure; if False, raise exception
Returns:
str: actual filepath saved (may differ from input if fallback used)
Raises:
Exception: if compression fails and fallback_uncompressed is False
"""
import brotli
import gc
import ctypes
# Ensure contents are bytes
if isinstance(contents, str):
contents = contents.encode('utf-8')
try:
original_size = len(contents)
logger.debug(f"Starting brotli streaming compression of {original_size} bytes.")
# Create streaming compressor
compressor = brotli.Compressor(quality=6, mode=mode if mode is not None else brotli.MODE_GENERIC)
# Stream compress in chunks to minimize memory usage
chunk_size = 65536 # 64KB chunks
total_compressed_size = 0
with open(filepath, 'wb') as f:
# Process data in chunks
offset = 0
while offset < len(contents):
chunk = contents[offset:offset + chunk_size]
compressed_chunk = compressor.process(chunk)
if compressed_chunk:
f.write(compressed_chunk)
total_compressed_size += len(compressed_chunk)
offset += chunk_size
# Finalize compression - critical for proper cleanup
final_chunk = compressor.finish()
if final_chunk:
f.write(final_chunk)
total_compressed_size += len(final_chunk)
logger.debug(f"Finished brotli compression - From {original_size} to {total_compressed_size} bytes.")
# Cleanup: Delete compressor, force Python GC, then force C-level memory release
del compressor
gc.collect()
# Force release of C-level memory back to OS (since brotli is a C library)
try:
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass # malloc_trim not available on all systems (e.g., macOS)
return filepath
except Exception as e:
logger.error(f"Brotli compression error: {e}")
# Compression failed
if fallback_uncompressed:
logger.warning(f"Brotli compression failed for {filepath}, saving uncompressed")
fallback_path = filepath.replace('.br', '')
with open(fallback_path, 'wb') as f:
f.write(contents)
return fallback_path
else:
raise Exception(f"Brotli compression failed for {filepath}: {e}")
class model(EntityPersistenceMixin, watch_base):
"""
Watch domain model for monitoring URL changes.
Inherits from watch_base (which inherits dict) - see watch_base docstring for field documentation.
## Configuration Override Hierarchy (Chain Resolution)
The dream architecture uses a 3-level resolution chain:
Watch settings → Tag/Group settings → Global settings
Current implementation is MANUAL (see processor.py:184-192 for example):
- Processors manually check watch.get('field')
- Then loop through watch.tags to find first tag with overrides_watch=True
- Finally fall back to datastore['settings']['application']['field']
FUTURE: Pydantic-based chain resolution would enable:
```python
# Instead of manual resolution in every processor:
restock_settings = watch.get('restock_settings', {})
for tag_uuid in watch.get('tags'):
tag = datastore['settings']['application']['tags'][tag_uuid]
if tag.get('overrides_watch'):
restock_settings = tag.get('restock_settings', {})
break
# Clean computed properties with automatic resolution:
@computed_field
def resolved_restock_settings(self) -> dict:
if self.restock_settings:
return self.restock_settings
for tag_uuid in self.tags:
tag = self._datastore.get_tag(tag_uuid)
if tag.overrides_watch and tag.restock_settings:
return tag.restock_settings
return self._datastore.settings.restock_settings or {}
# Usage: watch.resolved_restock_settings (automatic, type-safe, tested once)
```
Benefits of Pydantic migration:
1. Single source of truth for resolution logic (not scattered across processors)
2. Type safety + IDE autocomplete (watch.resolved_fetch_backend vs dict navigation)
3. Database backend abstraction (domain model separate from persistence)
4. Automatic validation at boundaries
5. Self-documenting via type hints
6. Easy to test resolution independently
Resolution chain examples that would benefit:
- fetch_backend: watch → tag → global (see get_fetch_backend property)
- notification_urls: watch → tag → global
- time_between_check: watch → global (see threshold_seconds)
- restock_settings: watch → tag (see processors/restock_diff/processor.py:184-192)
- history_snapshot_max_length: watch → global (see save_history_blob:550-556)
- All processor_config_* settings could use tag overrides
## Database Backend Abstraction with Pydantic
Current: Watch inherits dict, tightly coupled to file-based JSON storage
Future: Domain model (Watch) separate from persistence layer
```python
# Domain model (database-agnostic)
class Watch(BaseModel):
uuid: str
url: str
# ... validation, business logic
# Pluggable backends
class DataStoreBackend(ABC):
def save_watch(self, watch: Watch): ...
def load_watch(self, uuid: str) -> Watch: ...
# Implementations: FileBackend, MongoBackend, PostgresBackend, etc.
```
This would enable:
- Easy migration between storage backends (file → postgres → mongodb)
- Pydantic handles serialization/deserialization automatically
- Domain logic stays clean (no storage concerns in Watch methods)
## Migration Path
Given existing codebase, incremental migration recommended:
1. Create Pydantic models alongside existing dict-based models
2. Add .to_pydantic() / .from_pydantic() bridge methods
3. Gradually migrate code to use Pydantic models
4. Remove dict inheritance once migration complete
See: watch_base docstring for technical debt discussion
See: processors/restock_diff/processor.py:184-192 for manual resolution example
See: Watch.py:550-556 for nested dict navigation that would become watch.resolved_*
"""
__newest_history_key = None
__history_n = 0
jitter_seconds = 0
def __init__(self, *arg, **kw):
self.__datastore_path = kw.get('datastore_path')
if kw.get('datastore_path'):
del kw['datastore_path']
# Validate __datastore before calling parent (Watch requires it)
if not kw.get('__datastore'):
raise ValueError("Watch object requires '__datastore' reference - cannot access global settings without it")
# Parent class (watch_base) handles __datastore and __datastore_path
super(model, self).__init__(*arg, **kw)
if kw.get('default'):
self.update(kw['default'])
del kw['default']
@@ -40,6 +244,9 @@ class model(watch_base):
# Be sure the cached timestamp is ready
bump = self.history
# Note: __deepcopy__, __getstate__, and __setstate__ are inherited from watch_base
# This prevents memory leaks by sharing __datastore reference instead of copying it
@property
def viewed(self):
# Don't return viewed when last_viewed is 0 and newest_key is 0
@@ -52,11 +259,6 @@ class model(watch_base):
def has_unviewed(self):
return int(self.newest_history_key) > int(self['last_viewed']) and self.__history_n >= 2
def ensure_data_dir_exists(self):
if not os.path.isdir(self.watch_data_dir):
logger.debug(f"> Creating data dir {self.watch_data_dir}")
os.mkdir(self.watch_data_dir)
@property
def link(self):
@@ -93,11 +295,30 @@ class model(watch_base):
domain = parsed.hostname
return domain
@property
def history_index_filename(self):
# So that you don't try to view different histories in different 'diff' setups, which can confuse cdio.
processor = self.get('processor')
if not processor or self.get('processor') == 'text_json_diff':
return 'history.txt'
else:
return f'history-{processor}.txt'
def clear_watch(self):
import pathlib
# Get list of processor config files to preserve
from changedetectionio.processors import find_processors
processor_names = [name for cls, name in find_processors()]
processor_config_files = {f"{name}.json" for name in processor_names}
# JSON Data, Screenshots, Textfiles (history index and snapshots), HTML in the future etc
for item in pathlib.Path(str(self.watch_data_dir)).rglob("*.*"):
# But preserve processor config files (they're configuration, not history data)
# Use glob not rglob here for safety.
for item in pathlib.Path(str(self.data_dir)).glob("*.*"):
# Skip processor config files
if item.name in processor_config_files:
continue
os.unlink(item)
# Force the attr to recalculate
@@ -114,7 +335,6 @@ class model(watch_base):
'last_notification_error': False,
'last_viewed': 0,
'previous_md5': False,
'previous_md5_before_filters': False,
'remote_server_reply': None,
'track_ldjson_price_data': None
})
@@ -131,8 +351,30 @@ class model(watch_base):
@property
def get_fetch_backend(self):
"""
Like just using the `fetch_backend` key but there could be some logic
:return:
Get the fetch backend for this watch with special case handling.
CHAIN RESOLUTION OPPORTUNITY:
Currently returns watch.fetch_backend directly, but doesn't implement
Watch → Tag → Global resolution chain. With Pydantic:
@computed_field
def resolved_fetch_backend(self) -> str:
# Special case: PDFs always use html_requests
if self.is_pdf:
return 'html_requests'
# Watch override
if self.fetch_backend and self.fetch_backend != 'system':
return self.fetch_backend
# Tag override (first tag with overrides_watch=True wins)
for tag_uuid in self.tags:
tag = self._datastore.get_tag(tag_uuid)
if tag.overrides_watch and tag.fetch_backend:
return tag.fetch_backend
# Global default
return self._datastore.settings.fetch_backend
"""
# Maybe also if is_image etc?
# This is because chrome/playwright won't render the PDF in the browser, so we just fetch it and use pdf2html to see the text.
@@ -143,10 +385,16 @@ class model(watch_base):
@property
def is_pdf(self):
# content_type field is set in the future
# https://github.com/dgtlmoon/changedetection.io/issues/1392
# Not sure the best logic here
return self.get('url', '').lower().endswith('.pdf') or 'pdf' in self.get('content_type', '').lower()
url = str(self.get("url") or "").lower()
content_type = str(self.get("content-type") or "").lower()
if content_type in ("none", "null", ""):
content_type = ""
return (
url.endswith(".pdf")
or content_type.split(";")[0].strip() == "application/pdf"
)
@property
def label(self):
@@ -181,11 +429,11 @@ class model(watch_base):
tmp_history = {}
# In the case we are only using the watch for processing without history
if not self.watch_data_dir:
if not self.data_dir:
return []
# Read the history file as a dict
fname = os.path.join(self.watch_data_dir, "history.txt")
fname = os.path.join(self.data_dir, self.history_index_filename)
if os.path.isfile(fname):
logger.debug(f"Reading watch history index for {self.get('uuid')}")
with open(fname, "r", encoding='utf-8') as f:
@@ -195,13 +443,16 @@ class model(watch_base):
# The index history could contain a relative path, so we need to make the fullpath
# so that python can read it
if not '/' in v and not '\'' in v:
v = os.path.join(self.watch_data_dir, v)
# Cross-platform: check for any path separator (works on Windows and Unix)
if os.sep not in v and '/' not in v and '\\' not in v:
# Relative filename only, no path separators
v = os.path.join(self.data_dir, v)
else:
# It's possible that they moved the datadir on older versions
# So the snapshot exists but is in a different path
snapshot_fname = v.split('/')[-1]
proposed_new_path = os.path.join(self.watch_data_dir, snapshot_fname)
# Cross-platform: use os.path.basename instead of split('/')
snapshot_fname = os.path.basename(v)
proposed_new_path = os.path.join(self.data_dir, snapshot_fname)
if not os.path.exists(v) and os.path.exists(proposed_new_path):
v = proposed_new_path
@@ -218,7 +469,7 @@ class model(watch_base):
@property
def has_history(self):
fname = os.path.join(self.watch_data_dir, "history.txt")
fname = os.path.join(self.data_dir, self.history_index_filename)
return os.path.isfile(fname)
@property
@@ -288,61 +539,140 @@ class model(watch_base):
if not filepath:
filepath = self.history[timestamp]
# See if a brotli versions exists and switch to that
if not filepath.endswith('.br') and os.path.isfile(f"{filepath}.br"):
filepath = f"{filepath}.br"
# Check if binary file (image, PDF, etc.)
# Binary files are NEVER saved with .br compression, only text files are
binary_extensions = ('.png', '.jpg', '.jpeg', '.gif', '.webp', '.pdf', '.bin', '.jfif')
is_binary = any(filepath.endswith(ext) for ext in binary_extensions)
# OR in the backup case that the .br does not exist, but the plain one does
if filepath.endswith('.br') and not os.path.isfile(filepath):
if os.path.isfile(filepath.replace('.br', '')):
filepath = filepath.replace('.br', '')
# Only look for .br versions for text files
if not is_binary:
# See if a brotli version exists and switch to that (text files only)
if not filepath.endswith('.br') and os.path.isfile(f"{filepath}.br"):
filepath = f"{filepath}.br"
# OR in the backup case that the .br does not exist, but the plain one does
if filepath.endswith('.br') and not os.path.isfile(filepath):
if os.path.isfile(filepath.replace('.br', '')):
filepath = filepath.replace('.br', '')
# Handle .br compressed text files
if filepath.endswith('.br'):
# Brotli doesn't have a file header to detect it, so we rely on the filename
# https://www.rfc-editor.org/rfc/rfc7932
# Note: .br should ONLY exist for text files, never binary
with open(filepath, 'rb') as f:
return(brotli.decompress(f.read()).decode('utf-8'))
return brotli.decompress(f.read()).decode('utf-8')
# Binary file - return raw bytes
if is_binary:
with open(filepath, 'rb') as f:
return f.read()
# Text file - decode to string
with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
return f.read()
# Save some text file to the appropriate path and bump the history
# result_obj from fetch_site_status.run()
def save_history_text(self, contents, timestamp, snapshot_id):
import brotli
def _write_atomic(self, dest, data, mode='wb'):
"""Write data atomically to dest using a temp file"""
import tempfile
logger.trace(f"{self.get('uuid')} - Updating history.txt with timestamp {timestamp}")
with tempfile.NamedTemporaryFile(mode, delete=False, dir=self.data_dir) as tmp:
tmp.write(data)
tmp.flush()
os.fsync(tmp.fileno())
tmp_path = tmp.name
os.replace(tmp_path, dest)
def history_trim(self, newest_n_items):
from pathlib import Path
import gc
# Sort by timestamp (key)
sorted_items = sorted(self.history.items(), key=lambda x: int(x[0]))
keep_part = dict(sorted_items[-newest_n_items:])
delete_part = dict(sorted_items[:-newest_n_items])
logger.info( f"[{self.get('uuid')}] Trimming history to most recent {newest_n_items} items, keeping {len(keep_part)} items deleting {len(delete_part)} items.")
if delete_part:
for item in delete_part.items():
try:
Path(item[1]).unlink(missing_ok=True)
except Exception as e:
logger.critical(f"{str(e)}")
finally:
logger.debug(f"[{self.get('uuid')}] Deleted {item[1]} history snapshot")
try:
dest = os.path.join(self.data_dir, self.history_index_filename)
output = "\r\n".join(
f"{k},{Path(v).name}"
for k, v in keep_part.items()
)+"\r\n"
self._write_atomic(dest=dest, data=output, mode='w')
except Exception as e:
logger.critical(f"{str(e)}")
finally:
logger.debug(f"[{self.get('uuid')}] Updated history index {dest}")
# reimport
bump = self.history
gc.collect()
# Save some text file to the appropriate path and bump the history
# result_obj from fetch_site_status.run()
def save_history_blob(self, contents, timestamp, snapshot_id):
logger.trace(f"{self.get('uuid')} - Updating {self.history_index_filename} with timestamp {timestamp}")
self.ensure_data_dir_exists()
threshold = int(os.getenv('SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD', 1024))
skip_brotli = strtobool(os.getenv('DISABLE_BROTLI_TEXT_SNAPSHOT', 'False'))
# Decide on snapshot filename and destination path
if not skip_brotli and len(contents) > threshold:
snapshot_fname = f"{snapshot_id}.txt.br"
encoded_data = brotli.compress(contents.encode('utf-8'), mode=brotli.MODE_TEXT)
# Binary data - detect file type and save without compression
if isinstance(contents, bytes):
try:
import puremagic
detections = puremagic.magic_string(contents[:2048])
ext = detections[0].extension if detections else 'bin'
# Strip leading dot if present (puremagic returns extensions like '.jfif')
ext = ext.lstrip('.')
if detections:
logger.trace(f"Detected file type: {detections[0].mime_type} -> extension: {ext}")
except Exception as e:
logger.warning(f"puremagic detection failed: {e}, using 'bin' extension")
ext = 'bin'
snapshot_fname = f"{snapshot_id}.{ext}"
dest = os.path.join(self.data_dir, snapshot_fname)
self._write_atomic(dest, contents)
logger.trace(f"Saved binary snapshot as {snapshot_fname} ({len(contents)} bytes)")
# Text data - use brotli compression if enabled and above threshold
else:
snapshot_fname = f"{snapshot_id}.txt"
encoded_data = contents.encode('utf-8')
if not skip_brotli and len(contents) > BROTLI_COMPRESS_SIZE_THRESHOLD:
# Compressed text
import brotli
snapshot_fname = f"{snapshot_id}.txt.br"
dest = os.path.join(self.data_dir, snapshot_fname)
dest = os.path.join(self.watch_data_dir, snapshot_fname)
# Write snapshot file atomically if it doesn't exist
if not os.path.exists(dest):
with tempfile.NamedTemporaryFile('wb', delete=False, dir=self.watch_data_dir) as tmp:
tmp.write(encoded_data)
tmp.flush()
os.fsync(tmp.fileno())
tmp_path = tmp.name
os.rename(tmp_path, dest)
if not os.path.exists(dest):
try:
actual_dest = _brotli_save(contents, dest, mode=brotli.MODE_TEXT, fallback_uncompressed=True)
if actual_dest != dest:
snapshot_fname = os.path.basename(actual_dest)
except Exception as e:
logger.error(f"{self.get('uuid')} - Brotli compression failed: {e}")
# Fallback to uncompressed
snapshot_fname = f"{snapshot_id}.txt"
dest = os.path.join(self.data_dir, snapshot_fname)
self._write_atomic(dest, contents.encode('utf-8'))
else:
# Plain text
snapshot_fname = f"{snapshot_id}.txt"
dest = os.path.join(self.data_dir, snapshot_fname)
self._write_atomic(dest, contents.encode('utf-8'))
# Append to history.txt atomically
index_fname = os.path.join(self.watch_data_dir, "history.txt")
index_fname = os.path.join(self.data_dir, self.history_index_filename)
index_line = f"{timestamp},{snapshot_fname}\n"
# Let's try a forced flush here since it's usually a very small file
# If this still fails in the future then try reading all to memory first, re-writing etc
with open(index_fname, 'a', encoding='utf-8') as f:
f.write(index_line)
f.flush()
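
The `_write_atomic` helper introduced in this hunk follows the usual write-to-temp-then-rename pattern. A standalone sketch using only the standard library; `write_atomic` is a hypothetical free-function version, and deriving the temp directory from `dest` is an assumption (the method in the diff uses `self.data_dir`):

```python
import os
import tempfile


def write_atomic(dest: str, data: bytes) -> None:
    """Write data to a temp file in dest's directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(dest))
    with tempfile.NamedTemporaryFile("wb", delete=False, dir=directory) as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())  # push bytes to disk before the rename
        tmp_path = tmp.name
    # os.replace overwrites an existing destination on all platforms
    os.replace(tmp_path, dest)
```

Creating the temp file in the destination directory keeps both paths on the same filesystem, which is what lets the final rename be a single atomic step.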
@@ -352,6 +682,17 @@ class model(watch_base):
self.__newest_history_key = timestamp
self.__history_n += 1
# MANUAL CHAIN RESOLUTION: Watch → Global
# With Pydantic, this would become: maxlen = watch.resolved_history_snapshot_max_length
# @computed_field def resolved_history_snapshot_max_length(self) -> Optional[int]:
# if self.history_snapshot_max_length: return self.history_snapshot_max_length
# if tag := self._get_override_tag(): return tag.history_snapshot_max_length
# return self._datastore.settings.history_snapshot_max_length
maxlen = self.get('history_snapshot_max_length') or self.get_global_setting('application', 'history_snapshot_max_length')
if maxlen and self.__history_n and self.__history_n > maxlen:
self.history_trim(newest_n_items=maxlen)
# @todo bump static cache of the last timestamp so we don't need to examine the file to set a proper 'viewed' status
return snapshot_fname
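
The trimming step called here depends on how `history_trim` splits the index with negative slices. A reduced sketch of just that split, with a hypothetical `split_history` helper and made-up filenames:

```python
def split_history(history: dict, newest_n_items: int):
    """Split a {timestamp: path} index into (keep, delete) by numeric timestamp."""
    sorted_items = sorted(history.items(), key=lambda kv: int(kv[0]))
    keep = dict(sorted_items[-newest_n_items:])
    delete = dict(sorted_items[:-newest_n_items])
    return keep, delete
```

One property worth noting: with `newest_n_items=0` the slice `[-0:]` keeps everything and deletes nothing, so the `if maxlen and ...` guard in the caller is what prevents a zero limit from reaching this code.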
@@ -404,7 +745,7 @@ class model(watch_base):
return not local_lines.issubset(existing_history)
def get_screenshot(self):
fname = os.path.join(self.watch_data_dir, "last-screenshot.png")
fname = os.path.join(self.data_dir, "last-screenshot.png")
if os.path.isfile(fname):
return fname
@@ -419,7 +760,7 @@ class model(watch_base):
if not favicon_fname:
return True
try:
fname = next(iter(glob.glob(os.path.join(self.watch_data_dir, "favicon.*"))), None)
fname = next(iter(glob.glob(os.path.join(self.data_dir, "favicon.*"))), None)
logger.trace(f"Favicon file maybe found at {fname}")
if os.path.isfile(fname):
file_age = int(time.time() - os.path.getmtime(fname))
@@ -452,7 +793,7 @@ class model(watch_base):
base = "favicon"
extension = "ico"
fname = os.path.join(self.watch_data_dir, f"favicon.{extension}")
fname = os.path.join(self.data_dir, f"favicon.{extension}")
try:
# validate=True makes sure the string only contains valid base64 chars
@@ -464,6 +805,11 @@ class model(watch_base):
try:
with open(fname, 'wb') as f:
f.write(decoded)
# Invalidate favicon filename cache
if hasattr(self, '_favicon_filename_cache'):
delattr(self, '_favicon_filename_cache')
# A signal that could trigger the socket server to update the browser also
watch_check_update = signal('watch_favicon_bump')
if watch_check_update:
@@ -480,20 +826,32 @@ class model(watch_base):
Find any favicon.* file in this watch's data directory
and return the basename of the newest one.
MEMORY LEAK FIX: Cache the result to avoid repeated glob.glob() operations.
glob.glob() causes millions of fnmatch allocations when called for every watch on page load.
Returns:
bytes: Contents of the newest favicon file, or None if not found.
str: Basename of the newest favicon file, or None if not found.
"""
# Check cache first (prevents 26M+ allocations from repeated glob operations)
cache_key = '_favicon_filename_cache'
if hasattr(self, cache_key):
return getattr(self, cache_key)
import glob
# Search for all favicon.* files
files = glob.glob(os.path.join(self.watch_data_dir, "favicon.*"))
files = glob.glob(os.path.join(self.data_dir, "favicon.*"))
if not files:
return None
result = None
else:
# Find the newest by modification time
newest_file = max(files, key=os.path.getmtime)
result = os.path.basename(newest_file)
# Find the newest by modification time
newest_file = max(files, key=os.path.getmtime)
return os.path.basename(newest_file)
# Cache the result
setattr(self, cache_key, result)
return result
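Review note: the cache-then-invalidate pattern in this hunk can be exercised on its own. The following is a minimal sketch, assuming a hypothetical `FaviconLookup` class and method name `get_favicon_filename` (neither is taken from the codebase); only the `_favicon_filename_cache` attribute name comes from the diff. It counts filesystem scans to show that a cached `None` is also reused until the cache is dropped.

```python
import glob
import os
import tempfile


class FaviconLookup:
    """Hypothetical stand-in for the watch model; only the caching pattern matters."""

    _CACHE_ATTR = '_favicon_filename_cache'

    def __init__(self, data_dir):
        self.data_dir = data_dir
        self.glob_calls = 0  # counts how often the filesystem is scanned

    def get_favicon_filename(self):
        # Serve from the per-instance cache when present
        if hasattr(self, self._CACHE_ATTR):
            return getattr(self, self._CACHE_ATTR)
        self.glob_calls += 1
        files = glob.glob(os.path.join(self.data_dir, "favicon.*"))
        result = os.path.basename(max(files, key=os.path.getmtime)) if files else None
        setattr(self, self._CACHE_ATTR, result)  # a None result is cached too
        return result

    def invalidate(self):
        # Mirrors the delattr() call made after a new favicon is written
        if hasattr(self, self._CACHE_ATTR):
            delattr(self, self._CACHE_ATTR)


with tempfile.TemporaryDirectory() as tmp:
    lookup = FaviconLookup(tmp)
    first = lookup.get_favicon_filename()        # no file yet
    with open(os.path.join(tmp, "favicon.ico"), "wb") as f:
        f.write(b"\x00")
    stale = lookup.get_favicon_filename()        # still the cached value
    lookup.invalidate()
    fresh = lookup.get_favicon_filename()        # rescans and finds the file
    calls = lookup.glob_calls
```

The stale read in the middle is the reason any code path that writes a favicon has to drop the cache.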
def get_screenshot_as_thumbnail(self, max_age=3200):
"""Return path to a square thumbnail of the most recent screenshot.
@@ -509,7 +867,7 @@ class model(watch_base):
import os
import time
thumbnail_path = os.path.join(self.watch_data_dir, "thumbnail.jpeg")
thumbnail_path = os.path.join(self.data_dir, "thumbnail.jpeg")
top_trim = 500 # Pixels from top of screenshot to use
screenshot_path = self.get_screenshot()
@@ -560,7 +918,7 @@ class model(watch_base):
return None
def __get_file_ctime(self, filename):
fname = os.path.join(self.watch_data_dir, filename)
fname = os.path.join(self.data_dir, filename)
if os.path.isfile(fname):
return int(os.path.getmtime(fname))
return False
@@ -585,14 +943,9 @@ class model(watch_base):
def snapshot_error_screenshot_ctime(self):
return self.__get_file_ctime('last-error-screenshot.png')
@property
def watch_data_dir(self):
# The base dir of the watch data
return os.path.join(self.__datastore_path, self['uuid']) if self.__datastore_path else None
def get_error_text(self):
"""Return the text saved from a previous request that resulted in a non-200 error"""
fname = os.path.join(self.watch_data_dir, "last-error.txt")
fname = os.path.join(self.data_dir, "last-error.txt")
if os.path.isfile(fname):
with open(fname, 'r', encoding='utf-8') as f:
return f.read()
@@ -600,7 +953,7 @@ class model(watch_base):
def get_error_snapshot(self):
"""Return path to the screenshot that resulted in a non-200 error"""
fname = os.path.join(self.watch_data_dir, "last-error-screenshot.png")
fname = os.path.join(self.data_dir, "last-error-screenshot.png")
if os.path.isfile(fname):
return fname
return False
@@ -624,6 +977,37 @@ class model(watch_base):
def toggle_mute(self):
self['notification_muted'] ^= True
def _get_commit_data(self):
"""
Prepare watch data for commit.
Excludes processor_config_* keys (stored in separate files).
Normalizes browser_steps to empty list if no meaningful steps.
"""
import copy
# Get base snapshot with lock
lock = self._datastore.lock if self._datastore and hasattr(self._datastore, 'lock') else None
if lock:
with lock:
snapshot = dict(self)
else:
snapshot = dict(self)
# Exclude processor config keys (stored separately)
watch_dict = {k: copy.deepcopy(v) for k, v in snapshot.items() if not k.startswith('processor_config_')}
# Normalize browser_steps: if no meaningful steps, save as empty list
if not self.has_browser_steps:
watch_dict['browser_steps'] = []
return watch_dict
# _save_to_disk() method provided by EntityPersistenceMixin
# commit() method inherited from watch_base
def extra_notification_token_values(self):
# Used for providing extra tokens
# return {'widget': 555}
@@ -653,7 +1037,7 @@ class model(watch_base):
if not csv_writer:
# A file on the disk can be transferred much faster via flask than a string reply
csv_output_filename = f"report-{self.get('uuid')}.csv"
f = open(os.path.join(self.watch_data_dir, csv_output_filename), 'w')
f = open(os.path.join(self.data_dir, csv_output_filename), 'w')
# @todo some headers in the future
#fieldnames = ['Epoch seconds', 'Date']
csv_writer = csv.writer(f,
@@ -695,7 +1079,7 @@ class model(watch_base):
def save_error_text(self, contents):
self.ensure_data_dir_exists()
target_path = os.path.join(self.watch_data_dir, "last-error.txt")
target_path = os.path.join(self.data_dir, "last-error.txt")
with open(target_path, 'w', encoding='utf-8') as f:
f.write(contents)
@@ -704,9 +1088,9 @@ class model(watch_base):
import zlib
if as_error:
target_path = os.path.join(str(self.watch_data_dir), "elements-error.deflate")
target_path = os.path.join(str(self.data_dir), "elements-error.deflate")
else:
target_path = os.path.join(str(self.watch_data_dir), "elements.deflate")
target_path = os.path.join(str(self.data_dir), "elements.deflate")
self.ensure_data_dir_exists()
@@ -721,9 +1105,9 @@ class model(watch_base):
def save_screenshot(self, screenshot: bytes, as_error=False):
if as_error:
target_path = os.path.join(self.watch_data_dir, "last-error-screenshot.png")
target_path = os.path.join(self.data_dir, "last-error-screenshot.png")
else:
target_path = os.path.join(self.watch_data_dir, "last-screenshot.png")
target_path = os.path.join(self.data_dir, "last-screenshot.png")
self.ensure_data_dir_exists()
@@ -734,7 +1118,7 @@ class model(watch_base):
def get_last_fetched_text_before_filters(self):
import brotli
filepath = os.path.join(self.watch_data_dir, 'last-fetched.br')
filepath = os.path.join(self.data_dir, 'last-fetched.br')
if not os.path.isfile(filepath) or os.path.getsize(filepath) == 0:
# If a previous attempt doesn't exist yet, just snarf the previous snapshot instead
@@ -749,33 +1133,21 @@ class model(watch_base):
def save_last_text_fetched_before_filters(self, contents):
import brotli
filepath = os.path.join(self.watch_data_dir, 'last-fetched.br')
with open(filepath, 'wb') as f:
f.write(brotli.compress(contents, mode=brotli.MODE_TEXT))
filepath = os.path.join(self.data_dir, 'last-fetched.br')
_brotli_save(contents, filepath, mode=brotli.MODE_TEXT, fallback_uncompressed=False)
def save_last_fetched_html(self, timestamp, contents):
import brotli
self.ensure_data_dir_exists()
snapshot_fname = f"{timestamp}.html.br"
filepath = os.path.join(self.watch_data_dir, snapshot_fname)
with open(filepath, 'wb') as f:
contents = contents.encode('utf-8') if isinstance(contents, str) else contents
try:
f.write(brotli.compress(contents))
except Exception as e:
logger.warning(f"{self.get('uuid')} - Unable to compress snapshot, saving as raw data to {filepath}")
logger.warning(e)
f.write(contents)
filepath = os.path.join(self.data_dir, snapshot_fname)
_brotli_save(contents, filepath, mode=None, fallback_uncompressed=True)
self._prune_last_fetched_html_snapshots()
def get_fetched_html(self, timestamp):
import brotli
snapshot_fname = f"{timestamp}.html.br"
filepath = os.path.join(self.watch_data_dir, snapshot_fname)
filepath = os.path.join(self.data_dir, snapshot_fname)
if os.path.isfile(filepath):
with open(filepath, 'rb') as f:
return (brotli.decompress(f.read()).decode('utf-8'))
@@ -790,7 +1162,7 @@ class model(watch_base):
for index, timestamp in enumerate(dates):
snapshot_fname = f"{timestamp}.html.br"
filepath = os.path.join(self.watch_data_dir, snapshot_fname)
filepath = os.path.join(self.data_dir, snapshot_fname)
# Keep only the first 2
if index > 1 and os.path.isfile(filepath):
@@ -801,7 +1173,7 @@ class model(watch_base):
def get_browsersteps_available_screenshots(self):
"For knowing which screenshots are available to show the user in BrowserSteps UI"
available = []
for f in Path(self.watch_data_dir).glob('step_before-*.jpeg'):
for f in Path(self.data_dir).glob('step_before-*.jpeg'):
step_n=re.search(r'step_before-(\d+)', f.name)
if step_n:
available.append(step_n.group(1))
@@ -826,6 +1198,7 @@ class model(watch_base):
# has app+request context, we can use url_for()
if has_app_context:
if last_error:
last_error = safe_jinja.render_fully_escaped(last_error)
if '403' in last_error:
if has_proxies:
output.append(str(Markup(f"{last_error} - <a href=\"{url_for('settings.settings_page', uuid=self.get('uuid'))}\">Try other proxies/location</a>&nbsp;'")))
@@ -835,7 +1208,9 @@ class model(watch_base):
output.append(str(Markup(last_error)))
if self.get('last_notification_error'):
output.append(str(Markup(f"<div class=\"notification-error\"><a href=\"{url_for('settings.notification_logs')}\">{ self.get('last_notification_error') }</a></div>")))
txt = safe_jinja.render_fully_escaped(self.get('last_notification_error'))
result = f'<div class="notification-error"><a href="{url_for("settings.notification_logs")}">{txt}</a></div>'
output.append(result)
else:
# Lo_Fi version - no app context, cant rely on Jinja2 Markup
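Review note: the last two hunks in this file escape `last_error` and `last_notification_error` before embedding them in HTML. The implementation of `safe_jinja.render_fully_escaped` is not shown in this diff, so the sketch below uses the standard library's `html.escape` as a stand-in to illustrate the idea. The function name and the URL are hypothetical.

```python
import html


def notification_error_html(message, logs_url):
    """Build the notification-error fragment with both values escaped."""
    safe_text = html.escape(message)               # neutralises <, >, & and quotes
    safe_url = html.escape(logs_url, quote=True)   # safe inside a double-quoted attribute
    return f'<div class="notification-error"><a href="{safe_url}">{safe_text}</a></div>'


# An error string containing markup is rendered as text, not as HTML.
fragment = notification_error_html('<script>alert(1)</script>', '/settings/notification-logs')
```

Escaping at the point where the string is interpolated keeps untrusted error text, which may originate from a remote server, from being interpreted by the browser.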


@@ -2,12 +2,175 @@ import os
import uuid
from changedetectionio import strtobool
from .persistence import EntityPersistenceMixin, _determine_entity_type
__all__ = ['EntityPersistenceMixin', 'watch_base']
from ..browser_steps.browser_steps import browser_steps_get_valid_steps
USE_SYSTEM_DEFAULT_NOTIFICATION_FORMAT_FOR_WATCH = 'System default'
CONDITIONS_MATCH_LOGIC_DEFAULT = 'ALL'
class watch_base(dict):
"""
Base watch domain model (inherits from dict for backward compatibility).
WARNING: This class inherits from dict, which violates proper encapsulation.
Dict inheritance is legacy technical debt that should be refactored to a proper
domain model (e.g., Pydantic BaseModel) for better type safety and validation.
TODO: Migrate to Pydantic BaseModel for:
- Type safety and IDE autocomplete
- Automatic validation
- Clear separation between domain model and serialization
- Database backend abstraction (file → postgres → mongodb)
- Configuration override chain resolution (Watch → Tag → Global)
- Immutability options
- Better testing
- USE https://docs.pydantic.dev/latest/integrations/datamodel_code_generator TO BUILD THE MODEL FROM THE API-SPEC!!!
CHAIN RESOLUTION ARCHITECTURE:
The dream is a 3-level override hierarchy:
Watch settings → Tag/Group settings → Global settings
Current implementation: MANUAL resolution scattered across codebase
- Processors manually check watch.get('field')
- Loop through tags to find overrides_watch=True
- Fall back to datastore['settings']['application']['field']
Pydantic implementation: AUTOMATIC resolution via @computed_field
- Single source of truth for each setting's resolution logic
- Type-safe, testable, self-documenting
- Example: watch.resolved_fetch_backend (instead of nested dict navigation)
See: Watch.py model docstring for detailed Pydantic architecture plan
See: Tag.py model docstring for tag override explanation
See: processors/restock_diff/processor.py:184-192 for current manual example
Core Fields:
uuid (str): Unique identifier for this watch (auto-generated)
url (str): Target URL to monitor for changes
title (str|None): Custom display name (overrides page_title if set)
page_title (str|None): Title extracted from <title> tag of monitored page
tags (List[str]): List of tag UUIDs for categorization
tag (str): DEPRECATED - Old single-tag system, use tags instead
Check Configuration:
processor (str): Processor type ('text_json_diff', 'restock_diff', etc.)
fetch_backend (str): Fetcher to use ('system', 'html_requests', 'playwright', etc.)
method (str): HTTP method ('GET', 'POST', etc.)
headers (dict): Custom HTTP headers to send
proxy (str|None): Preferred proxy server
paused (bool): Whether change detection is paused
Scheduling:
time_between_check (dict): Check interval {'weeks': int, 'days': int, 'hours': int, 'minutes': int, 'seconds': int}
time_between_check_use_default (bool): Use global default interval if True
time_schedule_limit (dict): Weekly schedule limiting when checks can run
Structure: {
'enabled': bool,
'monday/tuesday/.../sunday': {
'enabled': bool,
'start_time': str ('HH:MM'),
'duration': {'hours': str, 'minutes': str}
}
}
Content Filtering:
include_filters (List[str]): CSS/XPath selectors to extract content
subtractive_selectors (List[str]): Selectors to remove from content
ignore_text (List[str]): Text patterns to ignore in change detection
trigger_text (List[str]): Text/regex that must be present to trigger change
text_should_not_be_present (List[str]): Text that should NOT be present
extract_text (List[str]): Regex patterns to extract specific text after filtering
Text Processing:
trim_text_whitespace (bool): Strip leading/trailing whitespace
sort_text_alphabetically (bool): Sort lines alphabetically before comparison
remove_duplicate_lines (bool): Remove duplicate lines
check_unique_lines (bool): Compare against all history for unique lines
strip_ignored_lines (bool|None): Remove lines matching ignore patterns
Change Detection Filters:
filter_text_added (bool): Include added text in change detection
filter_text_removed (bool): Include removed text in change detection
filter_text_replaced (bool): Include replaced text in change detection
Browser Automation:
browser_steps (List[dict]): Browser automation steps for JS-heavy sites
browser_steps_last_error_step (int|None): Last step that caused error
webdriver_delay (int|None): Seconds to wait after page load
webdriver_js_execute_code (str|None): JavaScript to execute before extraction
Restock Detection:
in_stock_only (bool): Only trigger on in-stock transitions
follow_price_changes (bool): Monitor price changes
has_ldjson_price_data (bool|None): Whether page has LD-JSON price data
track_ldjson_price_data (str|None): Track LD-JSON price data ('ACCEPT', 'REJECT', None)
price_change_threshold_percent (float|None): Minimum price change % to trigger
Notifications:
notification_urls (List[str]): Apprise URLs for notifications
notification_title (str|None): Custom notification title template
notification_body (str|None): Custom notification body template
notification_format (str): Notification format (e.g., 'System default', 'Text', 'HTML')
notification_muted (bool): Disable notifications for this watch
notification_screenshot (bool): Include screenshot in notifications
notification_alert_count (int): Number of notifications sent
last_notification_error (str|None): Last notification error message
body (str|None): DEPRECATED? Legacy notification body field
filter_failure_notification_send (bool): Send notification on filter failures
History & State:
date_created (int|None): Unix timestamp of watch creation
last_checked (int): Unix timestamp of last check
last_viewed (int): History snapshot key of last user view
last_error (str|bool): Last error message or False if no error
check_count (int): Total number of checks performed
fetch_time (float): Duration of last fetch in seconds
consecutive_filter_failures (int): Counter for consecutive filter match failures
previous_md5 (str|bool): MD5 hash of previous content
history_snapshot_max_length (int|None): Max history snapshots to keep (None = use global)
Conditions:
conditions (dict): Custom conditions for change detection logic
conditions_match_logic (str): Logic operator ('ALL', 'ANY') for conditions
Metadata:
content-type (str|None): Content-Type from last fetch
remote_server_reply (str|None): Server header from last response
ignore_status_codes (List[int]|None): HTTP status codes to ignore
use_page_title_in_list (bool|None): Display page title in watch list (None = use system default)
Instance Attributes (not serialized):
__datastore: Reference to parent DataStore (set externally after creation)
data_dir: Filesystem path for this watch's data directory
Notes:
- Many fields default to None to distinguish "not set" from "set to default"
- When field is None, system-level defaults are used
- Processor-specific configs (e.g., processor_config_*) are NOT stored in watch.json
They are stored in separate {processor_name}.json files
- This class is used for both Watch and Tag objects (tags reuse the structure)
"""
def __init__(self, *arg, **kw):
# Store datastore reference (common to Watch and Tag)
# Use single underscore to avoid name mangling issues in subclasses
self._datastore = kw.get('__datastore')
if kw.get('__datastore'):
del kw['__datastore']
# Store datastore_path (common to Watch and Tag)
self._datastore_path = kw.get('datastore_path')
if kw.get('datastore_path'):
del kw['datastore_path']
# IMPORTANT: Don't initialize __watch_was_edited yet!
# We'll initialize it AFTER the initial update() call below
# This prevents marking the watch as edited during initialization
self.update({
# Custom notification content
# Re #110, so then if this is set to None, we know to use the default value instead
@@ -16,7 +179,7 @@ class watch_base(dict):
'body': None,
'browser_steps': [],
'browser_steps_last_error_step': None,
'conditions' : {},
'conditions' : [],
'conditions_match_logic': CONDITIONS_MATCH_LOGIC_DEFAULT,
'check_count': 0,
'check_unique_lines': False, # On change-detected, compare against all history if its something new
@@ -32,6 +195,7 @@ class watch_base(dict):
'filter_text_replaced': True,
'follow_price_changes': True,
'has_ldjson_price_data': None,
'history_snapshot_max_length': None,
'headers': {}, # Extra headers to send
'ignore_text': [], # List of text to ignore when calculating the comparison checksum
'ignore_status_codes': None,
@@ -52,7 +216,6 @@ class watch_base(dict):
'page_title': None, # <title> from the page
'paused': False,
'previous_md5': False,
'previous_md5_before_filters': False, # Used for skipping changedetection entirely
'processor': 'text_json_diff', # could be restock_diff or others from .processors
'price_change_threshold_percent': None,
'proxy': None, # Preferred proxy connection
@@ -138,5 +301,372 @@ class watch_base(dict):
super(watch_base, self).__init__(*arg, **kw)
# Check if we're being initialized from an existing watch object
# that has was_edited=True, so we can preserve the flag
preserve_edited_flag = False
if self.get('default'):
del self['default']
# When creating a new watch object from an existing one (e.g., changing processor),
# preserve the was_edited flag if it was True
default_watch = self.get('default')
if hasattr(default_watch, 'was_edited') and default_watch.was_edited:
preserve_edited_flag = True
del self['default']
# NOW initialize the edited flag after all initial setup is complete
# This ensures initialization doesn't trigger the edited flag
# But preserve it if the source watch had it set to True
self.__watch_was_edited = preserve_edited_flag
def _mark_field_as_edited(self, key):
"""
Helper to mark a field as edited if it's writable.
Internal method used by __setitem__, update(), pop(), etc.
"""
# Don't track edits during initial load or if already edited
if not hasattr(self, '_watch_base__watch_was_edited'):
return
if self.__watch_was_edited:
return # Already marked as edited
# Import from shared schema utilities (no circular dependency)
from .schema_utils import get_readonly_watch_fields
readonly_fields = get_readonly_watch_fields()
# Additional system-managed fields not in OpenAPI spec (yet)
# These are set by processors/workers and should not trigger edited flag
additional_system_fields = {
'last_check_status', # Set by processors
'restock', # Set by restock processor
'last_viewed', # Set by mark_all_viewed endpoint
}
# Only mark as edited if this is a user-writable field
if key not in readonly_fields and key not in additional_system_fields:
self.__watch_was_edited = True
def __setitem__(self, key, value):
"""
Override dict.__setitem__ to track when writable watch fields are modified.
This enables skipping reprocessing when:
1. HTML content is unchanged (checksumFromPreviousCheckWasTheSame)
2. AND watch configuration was not edited
Only sets the edited flag when field is NOT in readonly_fields (from OpenAPI spec).
"""
# Set the value first (always)
super().__setitem__(key, value)
# Mark as edited if writable field
self._mark_field_as_edited(key)
def __delitem__(self, key):
"""Override dict.__delitem__ to track deletions of writable fields."""
super().__delitem__(key)
self._mark_field_as_edited(key)
def update(self, *args, **kwargs):
if args and args[0].get('browser_steps'):
args[0]['browser_steps'] = browser_steps_get_valid_steps(args[0].get('browser_steps'))
"""Override dict.update() to track modifications to writable fields."""
# Call parent update first
super().update(*args, **kwargs)
# Mark as edited for any writable fields that were updated
# Handle both update(dict) and update(key=value) forms
if args:
for key in args[0].keys():
self._mark_field_as_edited(key)
for key in kwargs.keys():
self._mark_field_as_edited(key)
def pop(self, key, *args):
"""Override dict.pop() to track removal of writable fields."""
result = super().pop(key, *args)
self._mark_field_as_edited(key)
return result
def setdefault(self, key, default=None):
"""Override dict.setdefault() to track modifications to writable fields."""
# Only marks as edited if key didn't exist (i.e., a new value was set)
existed = key in self
result = super().setdefault(key, default)
if not existed:
self._mark_field_as_edited(key)
return result
@property
def was_edited(self):
"""
Check if watch configuration was edited since last processing.
Returns:
bool: True if writable fields were modified, False otherwise
"""
return getattr(self, '_watch_base__watch_was_edited', False)
def reset_watch_edited_flag(self):
"""
Reset the watch edited flag after successful processing.
Call this after processing completes to allow future content-only change detection.
"""
self.__watch_was_edited = False
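Review note: the edit-tracking overrides above rely on two Python details: `dict.update()` does not go through an overridden `__setitem__`, and a double-underscore attribute is stored under a mangled name (`_ClassName__attr`). A minimal sketch of the same pattern, with a hypothetical `TrackedDict` class and an assumed read-only key set:

```python
class TrackedDict(dict):
    """Hypothetical minimal analogue of the edit tracking in watch_base."""

    READONLY = frozenset({'uuid', 'last_checked'})  # assumed read-only keys

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Set last, so nothing done during construction counts as an edit
        self.__edited = False

    def _mark(self, key):
        # Name mangling stores the flag as _TrackedDict__edited
        if not hasattr(self, '_TrackedDict__edited'):
            return
        if key not in self.READONLY:
            self.__edited = True

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._mark(key)

    def update(self, *args, **kwargs):
        # dict.update() bypasses __setitem__, so it needs its own override
        super().update(*args, **kwargs)
        for key in dict(*args, **kwargs):
            self._mark(key)

    @property
    def was_edited(self):
        return getattr(self, '_TrackedDict__edited', False)

    def reset(self):
        self.__edited = False


w = TrackedDict(url='https://example.com')
state_after_init = w.was_edited
w['last_checked'] = 1700000000            # system-managed field
state_after_system_write = w.was_edited
w['url'] = 'https://example.org'          # user-writable field
state_after_user_write = w.was_edited
w.reset()
state_after_reset = w.was_edited
```

The `hasattr` guard is what lets writes made before the flag exists pass through without being recorded.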
@classmethod
def get_property_names(cls):
"""
Get all @property attribute names from this model class using introspection.
This discovers computed/derived properties that are not stored in the datastore.
These properties should be filtered out during PUT/POST requests.
Returns:
frozenset: Immutable set of @property attribute names from the model class
"""
import functools
# Create a cached version if it doesn't exist
if not hasattr(cls, '_cached_get_property_names'):
@functools.cache
def _get_props():
properties = set()
# Use introspection to find all @property attributes
for name in dir(cls):
# Skip private/magic attributes
if name.startswith('_'):
continue
try:
attr = getattr(cls, name)
# Check if it's a property descriptor
if isinstance(attr, property):
properties.add(name)
except (AttributeError, TypeError):
continue
return frozenset(properties)
cls._cached_get_property_names = _get_props
return cls._cached_get_property_names()
def __deepcopy__(self, memo):
"""
Custom deepcopy for all watch_base subclasses (Watch, Tag, etc.).
CRITICAL FIX: Prevents copying large reference objects like __datastore
which would cause exponential memory growth when Watch objects are deepcopied.
This is called by:
- api/Watch.py:76 (API endpoint)
- api/Tags.py:28 (Tags API)
- processors/base.py:26 (EVERY processor run)
- store/__init__.py:544 (clone watch)
- And other locations
"""
from copy import deepcopy
# Create new instance without calling __init__
cls = self.__class__
new_obj = cls.__new__(cls)
memo[id(self)] = new_obj
# Copy the dict data (all the settings)
for key, value in self.items():
new_obj[key] = deepcopy(value, memo)
# Copy instance attributes dynamically
# This handles Watch-specific attrs (like __datastore) and any future subclass attrs
for attr_name in dir(self):
# Skip methods, special attrs, and dict keys
if attr_name.startswith('_') and not attr_name.startswith('__'):
# This catches _model__datastore, _model__history_n, etc.
try:
attr_value = getattr(self, attr_name)
# Special handling: Share references to large objects instead of copying
# Examples: _datastore, __datastore, __app_reference, __global_settings, etc.
if (attr_name == '_datastore' or
attr_name.endswith('__datastore') or
attr_name.endswith('__app')):
# Share the reference (don't copy!) to prevent memory leaks
setattr(new_obj, attr_name, attr_value)
# Skip cache attributes - let them regenerate on demand
elif 'cache' in attr_name.lower():
pass # Don't copy caches
# Copy regular instance attributes
elif not callable(attr_value):
setattr(new_obj, attr_name, attr_value)
except AttributeError:
pass # Attribute doesn't exist in this instance
return new_obj
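Review note: the core of this `__deepcopy__` is "copy the dict contents, share the heavy reference". A reduced sketch, assuming hypothetical `Store` and `Entity` classes and handling only a single `_datastore` attribute rather than scanning `dir(self)`:

```python
import copy


class Store:
    """Stands in for the large shared datastore (hypothetical)."""

    def __init__(self):
        self.payload = list(range(1000))


class Entity(dict):
    def __init__(self, store, **kw):
        super().__init__(**kw)
        self._datastore = store

    def __deepcopy__(self, memo):
        cls = self.__class__
        new_obj = cls.__new__(cls)          # bypass __init__
        memo[id(self)] = new_obj            # guards against reference cycles
        for key, value in self.items():
            new_obj[key] = copy.deepcopy(value, memo)
        new_obj._datastore = self._datastore  # share the reference, do not copy
        return new_obj


store = Store()
original = Entity(store, headers={'X-Test': '1'})
clone = copy.deepcopy(original)
```

Without the override, `copy.deepcopy` would also recurse into `_datastore` and duplicate everything reachable from it for every copied entity.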
def __getstate__(self):
"""
Custom pickle serialization for all watch_base subclasses.
Excludes large reference objects (like __datastore) from serialization.
"""
# Get the dict data
state = dict(self)
# Collect instance attributes (excluding methods and large references)
instance_attrs = {}
for attr_name in dir(self):
if attr_name.startswith('_') and not attr_name.startswith('__'):
try:
attr_value = getattr(self, attr_name)
# Exclude large reference objects and caches from serialization
if not (attr_name == '_datastore' or
attr_name.endswith('__datastore') or
attr_name.endswith('__app') or
'cache' in attr_name.lower() or
callable(attr_value)):
instance_attrs[attr_name] = attr_value
except AttributeError:
pass
if instance_attrs:
state['__instance_metadata__'] = instance_attrs
return state
def __setstate__(self, state):
"""
Custom pickle deserialization for all watch_base subclasses.
WARNING: Large reference objects (like __datastore) are NOT restored!
Caller must restore these references after unpickling if needed.
"""
# Extract metadata
metadata = state.pop('__instance_metadata__', {})
# Restore dict data
self.update(state)
# Restore instance attributes
for attr_name, attr_value in metadata.items():
setattr(self, attr_name, attr_value)
@property
def data_dir(self):
"""
The base directory for this watch/tag data (property, computed from UUID).
Common property for both Watch and Tag objects.
Returns path like: /datastore/{uuid}/
"""
return os.path.join(self._datastore_path, self['uuid']) if self._datastore_path else None
def ensure_data_dir_exists(self):
"""
Create the data directory if it doesn't exist.
Common method for both Watch and Tag objects.
"""
from loguru import logger
if not os.path.isdir(self.data_dir):
logger.debug(f"> Creating data dir {self.data_dir}")
os.mkdir(self.data_dir)
def get_global_setting(self, *path):
"""
Get a setting from the global datastore configuration.
Args:
*path: Path to the setting (e.g., 'application', 'history_snapshot_max_length')
Returns:
The setting value, or None if not found
Example:
maxlen = self.get_global_setting('application', 'history_snapshot_max_length')
"""
if not self._datastore:
return None
try:
value = self._datastore['settings']
for key in path:
value = value[key]
return value
except (KeyError, TypeError):
return None
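Review note: the path walk in `get_global_setting()` can be checked independently of the datastore. A small sketch, using a hypothetical free function `get_nested` and a made-up settings dict:

```python
def get_nested(mapping, *path):
    """Walk `path` through nested mappings; return None if any step is missing."""
    value = mapping
    try:
        for key in path:
            value = value[key]
        return value
    except (KeyError, TypeError):
        # KeyError: key absent; TypeError: tried to index a non-mapping value
        return None


settings = {'application': {'history_snapshot_max_length': 20}}
found = get_nested(settings, 'application', 'history_snapshot_max_length')
missing = get_nested(settings, 'application', 'no_such_key')
too_deep = get_nested(settings, 'application', 'history_snapshot_max_length', 'x')
```

One consequence of this design is that a setting explicitly stored as `None` cannot be told apart from a missing one.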
def _get_commit_data(self):
"""
Prepare data for commit (can be overridden by subclasses).
Returns:
dict: Data to serialize (filtered as needed by subclass)
"""
import copy
# Acquire datastore lock to prevent concurrent modifications during copy
lock = self._datastore.lock if self._datastore and hasattr(self._datastore, 'lock') else None
if lock:
with lock:
snapshot = dict(self)
else:
snapshot = dict(self)
# Deep copy snapshot (slower, but done outside lock to minimize contention)
# Subclasses can override to filter keys (e.g., Watch excludes processor_config_*)
return {k: copy.deepcopy(v) for k, v in snapshot.items()}
def _save_to_disk(self, data_dict, uuid):
"""
Save data to disk (must be implemented by subclasses).
Args:
data_dict: Dictionary to save
uuid: UUID for logging
Raises:
NotImplementedError: If subclass doesn't implement
"""
raise NotImplementedError("Subclass must implement _save_to_disk()")
def commit(self):
"""
Save this watch/tag immediately to disk using atomic write.
Common commit logic for Watch and Tag objects.
Subclasses override _get_commit_data() and _save_to_disk() for specifics.
Fire-and-forget: Logs errors but does not raise exceptions.
Data remains in memory even if save fails, so next commit will retry.
"""
from loguru import logger
if not self.data_dir:
entity_type = self.__class__.__name__
logger.error(f"Cannot commit {entity_type} {self.get('uuid')} without datastore_path")
return
uuid = self.get('uuid')
if not uuid:
entity_type = self.__class__.__name__
logger.error(f"Cannot commit {entity_type} without UUID")
return
# Get data from subclass (may filter keys)
try:
data_dict = self._get_commit_data()
except Exception as e:
logger.error(f"Failed to prepare commit data for {uuid}: {e}")
return
# Save to disk via subclass implementation
try:
# Determine entity type from module name (Watch.py -> watch, Tag.py -> tag)
entity_type = _determine_entity_type(self.__class__)
filename = f"{entity_type}.json"
self._save_to_disk(data_dict, uuid)
logger.debug(f"Committed {entity_type} {uuid} to {uuid}/{filename}")
except Exception as e:
logger.error(f"Failed to commit {uuid}: {e}")


@@ -0,0 +1,84 @@
"""
Entity persistence mixin for Watch and Tag models.
Provides file-based persistence using atomic writes.
"""
import functools
import inspect
@functools.lru_cache(maxsize=None)
def _determine_entity_type(cls):
"""
Determine entity type from class hierarchy (cached at class level).
Args:
cls: The class to inspect
Returns:
str: Entity type ('watch', 'tag', etc.)
Raises:
ValueError: If entity type cannot be determined
"""
for base_class in inspect.getmro(cls):
module_name = base_class.__module__
if module_name.startswith('changedetectionio.model.'):
# Get last part after dot: "changedetectionio.model.Watch" -> "watch"
return module_name.split('.')[-1].lower()
raise ValueError(
f"Cannot determine entity type for {cls.__module__}.{cls.__name__}. "
f"Entity must inherit from a class in changedetectionio.model (Watch or Tag)."
)
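Review note: the MRO walk above returns the module name of the first class in the hierarchy that lives under the model package. The sketch below reproduces that with a hypothetical `myapp.model.` prefix and classes whose `__module__` is set by hand, so it runs without any package layout.

```python
import functools
import inspect

PREFIX = 'myapp.model.'  # hypothetical package prefix


@functools.lru_cache(maxsize=None)
def entity_type_of(cls):
    """Return the lower-cased module basename of the first matching class in the MRO."""
    for base in inspect.getmro(cls):
        if base.__module__.startswith(PREFIX):
            return base.__module__.split('.')[-1].lower()
    raise ValueError(f"Cannot determine entity type for {cls.__name__}")


class WatchModel(dict):
    __module__ = 'myapp.model.Watch'    # pretend this class lives in Watch.py


class PluginWatch(WatchModel):
    __module__ = 'plugins.custom'       # subclass defined outside the package


class Unrelated:
    __module__ = 'somewhere.else'
```

Because the first match wins, the result depends on MRO order: any helper class that also lives under the prefix has to come after the entity class for the lookup to yield `watch` or `tag`.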
class EntityPersistenceMixin:
"""
Mixin providing file persistence for watch_base subclasses (Watch, Tag, etc.).
This mixin provides the _save_to_disk() method required by watch_base.commit().
It automatically determines the correct filename and size limits based on class hierarchy.
Usage:
class model(EntityPersistenceMixin, watch_base): # in Watch.py
pass
class model(EntityPersistenceMixin, watch_base): # in Tag.py
pass
"""
def _save_to_disk(self, data_dict, uuid):
"""
Save entity to disk using atomic write.
Implements the abstract method required by watch_base.commit().
Automatically determines filename and size limits from class hierarchy.
Args:
data_dict: Dictionary to save
uuid: UUID for logging
Raises:
ValueError: If entity type cannot be determined from class hierarchy
"""
# Import here to avoid circular dependency
from changedetectionio.store.file_saving_datastore import save_entity_atomic
# Determine entity type (cached at class level, not instance level)
entity_type = _determine_entity_type(self.__class__)
# Set filename and size limits based on entity type
filename = f'{entity_type}.json'
max_size_mb = 10 if entity_type == 'watch' else 1
# Save using generic function
save_entity_atomic(
self.data_dir,
uuid,
data_dict,
filename=filename,
entity_type=entity_type,
max_size_mb=max_size_mb
)
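Review note: `save_entity_atomic` itself is not part of this diff. For readers unfamiliar with the term, an atomic write is commonly done by writing to a temporary file in the target directory and renaming it over the destination. The sketch below shows that general technique with a hypothetical `save_json_atomic` function; it is not a reconstruction of the project's implementation.

```python
import json
import os
import tempfile


def save_json_atomic(directory, filename, data):
    """Write JSON to a temp file in `directory`, then rename it into place."""
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())        # push bytes to disk before the rename
        # os.replace() is atomic when source and target are on the same filesystem
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        # Clean up the partial temp file, then let the caller see the error
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    save_json_atomic(tmp, 'watch.json', {'url': 'https://example.com'})
    with open(os.path.join(tmp, 'watch.json'), encoding='utf-8') as f:
        loaded = json.load(f)
    leftovers = [n for n in os.listdir(tmp) if n.endswith('.tmp')]
```

Readers of `watch.json` see either the old file or the complete new one, never a half-written file.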


@@ -0,0 +1,92 @@
"""
Schema utilities for Watch and Tag models.

Provides functions to extract readonly fields and properties from OpenAPI spec.
Shared by both the model layer and API layer to avoid circular dependencies.
"""
import functools


@functools.cache
def get_openapi_schema_dict():
    """
    Get the raw OpenAPI spec dictionary for schema access.

    Returns the YAML dict directly (not the OpenAPI object).
    """
    import os
    import yaml

    spec_path = os.path.join(os.path.dirname(__file__), '../../docs/api-spec.yaml')
    if not os.path.exists(spec_path):
        spec_path = os.path.join(os.path.dirname(__file__), '../docs/api-spec.yaml')

    with open(spec_path, 'r', encoding='utf-8') as f:
        return yaml.safe_load(f)


@functools.cache
def _resolve_readonly_fields(schema_name):
    """
    Generic helper to resolve readOnly fields, including allOf inheritance.

    Args:
        schema_name: Name of the schema (e.g., 'Watch', 'Tag')

    Returns:
        frozenset: All readOnly field names including inherited ones
    """
    spec_dict = get_openapi_schema_dict()
    schema = spec_dict['components']['schemas'].get(schema_name, {})
    readonly_fields = set()

    # Handle allOf (schema inheritance)
    if 'allOf' in schema:
        for item in schema['allOf']:
            # Resolve $ref to parent schema
            if '$ref' in item:
                ref_path = item['$ref'].split('/')[-1]
                ref_schema = spec_dict['components']['schemas'].get(ref_path, {})
                if 'properties' in ref_schema:
                    for field_name, field_def in ref_schema['properties'].items():
                        if field_def.get('readOnly') is True:
                            readonly_fields.add(field_name)
            # Check schema-specific properties
            if 'properties' in item:
                for field_name, field_def in item['properties'].items():
                    if field_def.get('readOnly') is True:
                        readonly_fields.add(field_name)
    else:
        # Direct properties (no inheritance)
        if 'properties' in schema:
            for field_name, field_def in schema['properties'].items():
                if field_def.get('readOnly') is True:
                    readonly_fields.add(field_name)

    return frozenset(readonly_fields)


@functools.cache
def get_readonly_watch_fields():
    """
    Extract readOnly field names from Watch schema in OpenAPI spec.

    Returns readOnly fields from WatchBase (uuid, date_created) + Watch-specific readOnly fields.

    Used by:
    - model/watch_base.py: Track when writable fields are edited
    - api/Watch.py: Filter readonly fields from PUT requests
    """
    return _resolve_readonly_fields('Watch')


@functools.cache
def get_readonly_tag_fields():
    """
    Extract readOnly field names from Tag schema in OpenAPI spec.

    Returns readOnly fields from WatchBase (uuid, date_created) + Tag-specific readOnly fields.
    """
    return _resolve_readonly_fields('Tag')

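To see how the `allOf` resolution behaves without loading `api-spec.yaml`, the sketch below runs the same lookup against a small inline spec. The helper `resolve_readonly_fields` is a simplified re-statement that takes the spec as an argument. The `uuid` and `date_created` fields come from the docstrings in this file; `last_checked`, `title` and `url` are placeholder fields assumed for the example.

```python
def resolve_readonly_fields(spec, schema_name):
    """Simplified variant of _resolve_readonly_fields() that takes the spec dict directly."""
    schemas = spec['components']['schemas']
    schema = schemas.get(schema_name, {})

    # Gather every property map that contributes to this schema
    property_maps = []
    for item in schema.get('allOf', []):
        if '$ref' in item:
            # Follow the reference to the parent schema by its last path segment
            parent = schemas.get(item['$ref'].split('/')[-1], {})
            property_maps.append(parent.get('properties', {}))
        property_maps.append(item.get('properties', {}))
    if 'allOf' not in schema:
        # No inheritance: use the schema's own properties
        property_maps.append(schema.get('properties', {}))

    return frozenset(
        name
        for props in property_maps
        for name, definition in props.items()
        if definition.get('readOnly') is True
    )


# Illustrative spec only; not copied from the real api-spec.yaml
EXAMPLE_SPEC = {
    'components': {'schemas': {
        'WatchBase': {'type': 'object', 'properties': {
            'uuid': {'type': 'string', 'readOnly': True},
            'date_created': {'type': 'integer', 'readOnly': True},
            'title': {'type': 'string'},
        }},
        'Watch': {'allOf': [
            {'$ref': '#/components/schemas/WatchBase'},
            {'type': 'object', 'properties': {
                'last_checked': {'type': 'integer', 'readOnly': True},
                'url': {'type': 'string'},
            }},
        ]},
    }},
}

print(sorted(resolve_readonly_fields(EXAMPLE_SPEC, 'Watch')))
```

An API layer could then drop these keys from an incoming PUT body before applying it, which is the use the docstring for `get_readonly_watch_fields()` describes.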

@@ -1,5 +1,6 @@
import time
import re
import apprise
from apprise import NotifyFormat
from loguru import logger
@@ -11,11 +12,10 @@ from ..diff import HTML_REMOVED_STYLE, REMOVED_PLACEMARKER_OPEN, REMOVED_PLACEMA
CHANGED_PLACEMARKER_CLOSED, HTML_CHANGED_STYLE, HTML_CHANGED_INTO_STYLE
import re
from ..notification_service import NotificationContextData
from ..notification_service import NotificationContextData, add_rendered_diff_to_notification_vars
newline_re = re.compile(r'\r\n|\r|\n')
def markup_text_links_to_html(body):
"""
Convert plaintext to HTML with clickable links.
@@ -79,6 +79,24 @@ def notification_format_align_with_apprise(n_format : str):
return n_format
def apply_html_color_to_body(n_body: str):
# https://github.com/dgtlmoon/changedetection.io/issues/821#issuecomment-1241837050
n_body = n_body.replace(REMOVED_PLACEMARKER_OPEN,
f'<span style="{HTML_REMOVED_STYLE}" role="deletion" aria-label="Removed text" title="Removed text">')
n_body = n_body.replace(REMOVED_PLACEMARKER_CLOSED, f'</span>')
n_body = n_body.replace(ADDED_PLACEMARKER_OPEN,
f'<span style="{HTML_ADDED_STYLE}" role="insertion" aria-label="Added text" title="Added text">')
n_body = n_body.replace(ADDED_PLACEMARKER_CLOSED, f'</span>')
# Handle changed/replaced lines (old → new)
n_body = n_body.replace(CHANGED_PLACEMARKER_OPEN,
f'<span style="{HTML_CHANGED_STYLE}" role="note" aria-label="Changed text" title="Changed text">')
n_body = n_body.replace(CHANGED_PLACEMARKER_CLOSED, f'</span>')
n_body = n_body.replace(CHANGED_INTO_PLACEMARKER_OPEN,
f'<span style="{HTML_CHANGED_INTO_STYLE}" role="note" aria-label="Changed into" title="Changed into">')
n_body = n_body.replace(CHANGED_INTO_PLACEMARKER_CLOSED, f'</span>')
return n_body
def apply_discord_markdown_to_body(n_body):
"""
Discord does not support <del> but it supports non-standard ~~strikethrough~~
@@ -333,6 +351,16 @@ def process_notification(n_object: NotificationContextData, datastore):
if not n_object.get('notification_urls'):
return None
n_object.update(add_rendered_diff_to_notification_vars(
notification_scan_text=n_object.get('notification_body', '')+n_object.get('notification_title', ''),
current_snapshot=n_object.get('current_snapshot'),
prev_snapshot=n_object.get('prev_snapshot'),
# Should always be false for 'text' mode or it's too hard to read
# But otherwise, this could be some setting
word_diff=requested_output_format_original != 'text',
)
)
with (apprise.LogCapture(level=apprise.logging.DEBUG) as logs):
for url in n_object['notification_urls']:


@@ -54,34 +54,128 @@ def _check_cascading_vars(datastore, var_name, watch):
return None
class FormattableTimestamp(str):
"""
A str subclass representing a formatted datetime. As a plain string it renders
with the default format, but can also be called with a custom format argument
in Jinja2 templates:
{{ change_datetime }} → '2024-01-15 10:30:00 UTC'
{{ change_datetime(format='%Y') }} → '2024'
{{ change_datetime(format='%A') }} → 'Monday'
{{ change_datetime(format='%Y-%m-%d') }} → '2024-01-15'
Being a str subclass means it is natively JSON serializable.
"""
_DEFAULT_FORMAT = '%Y-%m-%d %H:%M:%S %Z'
def __new__(cls, timestamp):
dt = datetime.datetime.fromtimestamp(int(timestamp), tz=pytz.UTC)
local_tz = datetime.datetime.now().astimezone().tzinfo
dt_local = dt.astimezone(local_tz)
try:
formatted = dt_local.strftime(cls._DEFAULT_FORMAT)
except Exception:
formatted = dt_local.isoformat()
instance = super().__new__(cls, formatted)
instance._dt = dt_local
return instance
def __call__(self, format=_DEFAULT_FORMAT):
try:
return self._dt.strftime(format)
except Exception:
return self._dt.isoformat()
class FormattableDiff(str):
"""
A str subclass representing a rendered diff. As a plain string it renders
with the default options for that variant, but can be called with custom
arguments in Jinja2 templates:
{{ diff }} → default diff output
{{ diff(lines=5) }} → truncate to 5 lines
{{ diff(added_only=true) }} → only show added lines
{{ diff(removed_only=true) }} → only show removed lines
{{ diff(context=3) }} → 3 lines of context around changes
{{ diff(word_diff=false) }} → line-level diff instead of word-level
{{ diff(lines=10, added_only=true) }} → combine args
{{ diff_added(lines=5) }} → works on any diff_* variant too
Being a str subclass means it is natively JSON serializable.
"""
def __new__(cls, prev_snapshot, current_snapshot, **base_kwargs):
if prev_snapshot or current_snapshot:
from changedetectionio import diff as diff_module
rendered = diff_module.render_diff(prev_snapshot, current_snapshot, **base_kwargs)
else:
rendered = ''
instance = super().__new__(cls, rendered)
instance._prev = prev_snapshot
instance._current = current_snapshot
instance._base_kwargs = base_kwargs
return instance
def __call__(self, lines=None, added_only=False, removed_only=False, context=0,
word_diff=None, case_insensitive=False, ignore_junk=False):
from changedetectionio import diff as diff_module
kwargs = dict(self._base_kwargs)
if added_only:
kwargs['include_removed'] = False
if removed_only:
kwargs['include_added'] = False
if context:
kwargs['context_lines'] = int(context)
if word_diff is not None:
kwargs['word_diff'] = bool(word_diff)
if case_insensitive:
kwargs['case_insensitive'] = True
if ignore_junk:
kwargs['ignore_junk'] = True
result = diff_module.render_diff(self._prev or '', self._current or '', **kwargs)
if lines is not None:
result = '\n'.join(result.splitlines()[:int(lines)])
return result
# What is passed around as notification context, also used as the complete list of valid {{ tokens }}
class NotificationContextData(dict):
def __init__(self, initial_data=None, **kwargs):
# ValidateJinja2Template() validates against the keynames of this dict to check for valid tokens in the body (user submission)
super().__init__({
'base_url': None,
'change_datetime': FormattableTimestamp(time.time()),
'current_snapshot': None,
'diff': None,
'diff_clean': None,
'diff_added': None,
'diff_added_clean': None,
'diff_full': None,
'diff_full_clean': None,
'diff_patch': None,
'diff_removed': None,
'diff_removed_clean': None,
'diff': FormattableDiff('', ''),
'diff_clean': FormattableDiff('', '', include_change_type_prefix=False),
'diff_added': FormattableDiff('', '', include_removed=False),
'diff_added_clean': FormattableDiff('', '', include_removed=False, include_change_type_prefix=False),
'diff_full': FormattableDiff('', '', include_equal=True),
'diff_full_clean': FormattableDiff('', '', include_equal=True, include_change_type_prefix=False),
'diff_patch': FormattableDiff('', '', patch_format=True),
'diff_removed': FormattableDiff('', '', include_added=False),
'diff_removed_clean': FormattableDiff('', '', include_added=False, include_change_type_prefix=False),
'diff_url': None,
'markup_text_links_to_html_links': False, # If automatic conversion of plaintext to HTML should happen
'notification_timestamp': time.time(),
'prev_snapshot': None,
'preview_url': None,
'screenshot': None,
'triggered_text': None,
'timestamp_from': None,
'timestamp_to': None,
'triggered_text': None,
'uuid': 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX', # Converted to 'watch_uuid' in create_notification_parameters
'watch_mime_type': None,
'watch_tag': None,
'watch_title': None,
'watch_url': 'https://WATCH-PLACE-HOLDER/',
'watch_uuid': 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX', # Converted to 'watch_uuid' in create_notification_parameters
})
# Apply any initial data passed in
@@ -103,7 +197,7 @@ class NotificationContextData(dict):
So we can test the output in the notification body
"""
for key in self.keys():
if key in ['uuid', 'time', 'watch_uuid']:
if key in ['uuid', 'time', 'watch_uuid', 'change_datetime'] or key.startswith('diff'):
continue
rand_str = 'RANDOM-PLACEHOLDER-'+''.join(random.choices(string.ascii_letters + string.digits, k=12))
self[key] = rand_str
@@ -115,42 +209,64 @@ class NotificationContextData(dict):
super().__setitem__(key, value)
def add_rendered_diff_to_notification_vars(notification_scan_text:str, prev_snapshot:str, current_snapshot:str, word_diff:bool):
"""
Efficiently renders only the diff placeholders that are actually used in the notification text.
def timestamp_to_localtime(timestamp):
# Format the date using locale-aware formatting with timezone
dt = datetime.datetime.fromtimestamp(int(timestamp))
dt = dt.replace(tzinfo=pytz.UTC)
Scans the notification template for diff placeholder usage (diff, diff_added, diff_clean, etc.)
and only renders those specific variants, avoiding expensive render_diff() calls for unused placeholders.
Uses LRU caching to avoid duplicate renders when multiple placeholders share the same arguments.
# Get local timezone-aware datetime
local_tz = datetime.datetime.now().astimezone().tzinfo
local_dt = dt.astimezone(local_tz)
Args:
notification_scan_text: The notification template text to scan for placeholders
prev_snapshot: Previous version of content for diff comparison
current_snapshot: Current version of content for diff comparison
word_diff: Whether to use word-level (True) or line-level (False) diffing
# Format date with timezone - using strftime for locale awareness
try:
formatted_date = local_dt.strftime('%Y-%m-%d %H:%M:%S %Z')
except:
# Fallback if locale issues
formatted_date = local_dt.isoformat()
Returns:
dict: Only the diff placeholders that were found in notification_scan_text, with rendered content
"""
import re
return formatted_date
def set_basic_notification_vars(snapshot_contents, current_snapshot, prev_snapshot, watch, triggered_text, timestamp_changed=None):
now = time.time()
from changedetectionio import diff
# Define base kwargs for each diff variant — these become the stored defaults
# on the FormattableDiff object, so {{ diff(lines=5) }} overrides on top of them
diff_specs = {
'diff': {'word_diff': word_diff},
'diff_clean': {'word_diff': word_diff, 'include_change_type_prefix': False},
'diff_added': {'word_diff': word_diff, 'include_removed': False},
'diff_added_clean': {'word_diff': word_diff, 'include_removed': False, 'include_change_type_prefix': False},
'diff_full': {'word_diff': word_diff, 'include_equal': True},
'diff_full_clean': {'word_diff': word_diff, 'include_equal': True, 'include_change_type_prefix': False},
'diff_patch': {'word_diff': word_diff, 'patch_format': True},
'diff_removed': {'word_diff': word_diff, 'include_added': False},
'diff_removed_clean': {'word_diff': word_diff, 'include_added': False, 'include_change_type_prefix': False},
}
ret = {}
rendered_count = 0
# Only create FormattableDiff objects for diff keys actually used in the notification text
for key in NotificationContextData().keys():
if key.startswith('diff') and key in diff_specs:
# Check if this placeholder is actually used in the notification text
pattern = rf"(?<![A-Za-z0-9_]){re.escape(key)}(?![A-Za-z0-9_])"
if re.search(pattern, notification_scan_text, re.IGNORECASE):
ret[key] = FormattableDiff(prev_snapshot, current_snapshot, **diff_specs[key])
rendered_count += 1
if rendered_count:
logger.trace(f"Rendered {rendered_count} diff placeholder(s) {sorted(ret.keys())} in {time.time() - now:.3f}s")
return ret
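The placeholder scan above relies on a lookbehind/lookahead pair so that `diff` is not reported merely because `diff_added` or the word `different` appears in the template. A standalone sketch of that check follows; the function name `find_used_placeholders` and the shortened key list are illustrative, not part of the project.

```python
import re

# Shortened key list for the example; the real code iterates NotificationContextData keys
DIFF_KEYS = ['diff', 'diff_added', 'diff_clean', 'diff_removed']


def find_used_placeholders(template_text, keys=DIFF_KEYS):
    """Return the keys that appear as whole identifiers in template_text."""
    used = []
    for key in keys:
        # Reject matches that are glued to other identifier characters on either side
        pattern = rf"(?<![A-Za-z0-9_]){re.escape(key)}(?![A-Za-z0-9_])"
        if re.search(pattern, template_text, re.IGNORECASE):
            used.append(key)
    return used


print(find_used_placeholders("Changes: {{ diff_added }}"))
print(find_used_placeholders("{{diff(lines=5)}} then {{ diff_clean }}"))
print(find_used_placeholders("Nothing different here"))
```

The scan is a plain text search, so the bare word "diff" in ordinary prose also counts as a use. That only costs one extra render and does not change the notification output.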
def set_basic_notification_vars(current_snapshot, prev_snapshot, watch, triggered_text, timestamp_changed=None):
n_object = {
'current_snapshot': snapshot_contents,
'diff': diff.render_diff(prev_snapshot, current_snapshot),
'diff_clean': diff.render_diff(prev_snapshot, current_snapshot, include_change_type_prefix=False),
'diff_added': diff.render_diff(prev_snapshot, current_snapshot, include_removed=False),
'diff_added_clean': diff.render_diff(prev_snapshot, current_snapshot, include_removed=False, include_change_type_prefix=False),
'diff_full': diff.render_diff(prev_snapshot, current_snapshot, include_equal=True),
'diff_full_clean': diff.render_diff(prev_snapshot, current_snapshot, include_equal=True, include_change_type_prefix=False),
'diff_patch': diff.render_diff(prev_snapshot, current_snapshot, patch_format=True),
'diff_removed': diff.render_diff(prev_snapshot, current_snapshot, include_added=False),
'diff_removed_clean': diff.render_diff(prev_snapshot, current_snapshot, include_added=False, include_change_type_prefix=False),
'current_snapshot': current_snapshot,
'prev_snapshot': prev_snapshot,
'screenshot': watch.get_screenshot() if watch and watch.get('notification_screenshot') else None,
'change_datetime': timestamp_to_localtime(timestamp_changed) if timestamp_changed else None,
'change_datetime': FormattableTimestamp(timestamp_changed) if timestamp_changed else None,
'triggered_text': triggered_text,
'uuid': watch.get('uuid') if watch else None,
'watch_url': watch.get('url') if watch else None,
@@ -163,7 +279,6 @@ def set_basic_notification_vars(snapshot_contents, current_snapshot, prev_snapsh
if watch:
n_object.update(watch.extra_notification_token_values())
logger.trace(f"Main rendered notification placeholders (diff_added etc) calculated in {time.time() - now:.3f}s")
return n_object
class NotificationService:
@@ -220,8 +335,7 @@ class NotificationService:
current_snapshot = watch.get_history_snapshot(timestamp=dates[date_index_to])
n_object.update(set_basic_notification_vars(snapshot_contents=snapshot_contents,
current_snapshot=current_snapshot,
n_object.update(set_basic_notification_vars(current_snapshot=current_snapshot,
prev_snapshot=prev_snapshot,
watch=watch,
triggered_text=triggered_text,

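`FormattableTimestamp` and `FormattableDiff` in this file both rely on the same trick: a `str` subclass that is also callable. The sketch below isolates that pattern using only the standard library. `CallableTimestamp` is a hypothetical name, and it stays in UTC rather than converting to the local timezone as the real class does, so that the output is predictable.

```python
import datetime
import json


class CallableTimestamp(str):
    """A string with a default rendering that can be re-rendered by calling it."""

    DEFAULT_FORMAT = '%Y-%m-%d %H:%M:%S %Z'

    def __new__(cls, timestamp):
        # Assumption for the example: stay in UTC instead of converting to local time
        dt = datetime.datetime.fromtimestamp(int(timestamp), tz=datetime.timezone.utc)
        instance = super().__new__(cls, dt.strftime(cls.DEFAULT_FORMAT))
        instance._dt = dt  # keep the datetime so later calls can re-format it
        return instance

    def __call__(self, format=DEFAULT_FORMAT):
        return self._dt.strftime(format)


stamp = CallableTimestamp(1705314600)
print(stamp)                       # used as a plain string
print(stamp(format='%Y-%m-%d'))    # called with a custom format
print(json.dumps({'when': stamp})) # serialises like any other str
```

Because the object is still a `str`, code that only prints or serialises the value keeps working, while a template can call it with arguments to get a different rendering.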

@@ -2,6 +2,7 @@ import pluggy
import os
import importlib
import sys
from loguru import logger
# Global plugin namespace for changedetection.io
PLUGIN_NAMESPACE = "changedetectionio"
@@ -16,15 +17,163 @@ class ChangeDetectionSpec:
@hookspec
def ui_edit_stats_extras(watch):
"""Return HTML content to add to the stats tab in the edit view.
Args:
watch: The watch object being edited
Returns:
str: HTML content to be inserted in the stats tab
"""
pass
@hookspec
def register_content_fetcher(self):
"""Return a tuple of (fetcher_name, fetcher_class) for content fetcher plugins.
The fetcher_name should start with 'html_' and the fetcher_class
should inherit from changedetectionio.content_fetchers.base.Fetcher
Returns:
tuple: (str: fetcher_name, class: fetcher_class)
"""
pass
@hookspec
def fetcher_status_icon(fetcher_name):
"""Return status icon HTML attributes for a content fetcher.
Args:
fetcher_name: The name of the fetcher (e.g., 'html_webdriver', 'html_js_zyte')
Returns:
str: HTML string containing <img> tags or other status icon elements
Empty string if no custom status icon is needed
"""
pass
@hookspec
def plugin_static_path(self):
"""Return the path to the plugin's static files directory.
Returns:
str: Absolute path to the plugin's static directory, or None if no static files
"""
pass
@hookspec
def get_itemprop_availability_override(self, content, fetcher_name, fetcher_instance, url):
"""Provide custom implementation of get_itemprop_availability for a specific fetcher.
This hook allows plugins to provide their own product availability detection
when their fetcher is being used. This is called as a fallback when the built-in
method doesn't find good data.
Args:
content: The HTML/text content to parse
fetcher_name: The name of the fetcher being used (e.g., 'html_js_zyte')
fetcher_instance: The fetcher instance that generated the content
url: The URL being watched/checked
Returns:
dict or None: Dictionary with availability data:
{
'price': float or None,
'availability': str or None, # e.g., 'in stock', 'out of stock'
'currency': str or None, # e.g., 'USD', 'EUR'
}
Or None if this plugin doesn't handle this fetcher or couldn't extract data
"""
pass
@hookspec
def plugin_settings_tab(self):
"""Return settings tab information for this plugin.
This hook allows plugins to add their own settings tab to the settings page.
Settings will be saved to a separate JSON file in the datastore directory.
Returns:
dict or None: Dictionary with settings tab information:
{
'plugin_id': str, # Unique identifier (e.g., 'zyte_fetcher')
'tab_label': str, # Display name for tab (e.g., 'Zyte Fetcher')
'form_class': Form, # WTForms Form class for the settings
'template_path': str, # Optional: path to Jinja2 template (relative to plugin)
# If not provided, a default form renderer will be used
}
Or None if this plugin doesn't provide settings
"""
pass
@hookspec
def register_processor(self):
"""Register an external processor plugin.
External packages can implement this hook to register custom processors
that will be discovered alongside built-in processors.
Returns:
dict or None: Dictionary with processor information:
{
'processor_name': str, # Machine name (e.g., 'osint_recon')
'processor_module': module, # Module containing processor.py
'processor_class': class, # The perform_site_check class
'metadata': { # Optional metadata
'name': str, # Display name
'description': str, # Description
'processor_weight': int,# Sort weight (lower = higher priority)
'list_badge_text': str, # Badge text for UI
}
}
Return None if this plugin doesn't provide a processor
"""
pass
@hookspec
def update_handler_alter(update_handler, watch, datastore):
"""Modify or wrap the update_handler before it processes a watch.
This hook is called after the update_handler (perform_site_check instance) is created
but before it calls call_browser() and run_changedetection(). Plugins can use this to:
- Wrap the handler to add logging/metrics
- Modify handler configuration
- Add custom preprocessing logic
Args:
update_handler: The perform_site_check instance that will process the watch
watch: The watch dict being processed
datastore: The application datastore
Returns:
object or None: Return a modified/wrapped handler, or None to keep the original.
If multiple plugins return handlers, the last non-None result is the one used.
"""
pass
@hookspec
def update_finalize(update_handler, watch, datastore, processing_exception):
"""Called after watch processing completes (success or failure).
This hook is called in the finally block after all processing is complete,
allowing plugins to perform cleanup, update metrics, or log final status.
The plugin can access update_handler.last_logging_insert_id if it was stored
during update_handler_alter, and use processing_exception to determine if
the processing succeeded or failed.
Args:
update_handler: The perform_site_check instance (may be None if creation failed)
watch: The watch dict that was processed (may be None if not loaded)
datastore: The application datastore
processing_exception: The exception from the main processing block, or None if successful.
This does NOT include cleanup exceptions - only exceptions from
the actual watch processing (fetch, diff, etc).
Returns:
None: This hook doesn't return a value
"""
pass
# Set up Plugin Manager
plugin_manager = pluggy.PluginManager(PLUGIN_NAMESPACE)
@@ -65,18 +214,396 @@ load_plugins_from_directories()
# Discover installed plugins from external packages (if any)
plugin_manager.load_setuptools_entrypoints(PLUGIN_NAMESPACE)
# Function to inject datastore into plugins that need it
def inject_datastore_into_plugins(datastore):
"""Inject the global datastore into plugins that need access to settings.
This should be called after plugins are loaded and datastore is initialized.
Args:
datastore: The global ChangeDetectionStore instance
"""
for plugin_name, plugin_obj in plugin_manager.list_name_plugin():
# Check if plugin has datastore attribute and it's not set
if hasattr(plugin_obj, 'datastore'):
if plugin_obj.datastore is None:
plugin_obj.datastore = datastore
logger.debug(f"Injected datastore into plugin: {plugin_name}")
# Function to register built-in fetchers - called later from content_fetchers/__init__.py
def register_builtin_fetchers():
"""Register built-in content fetchers as internal plugins
This is called from content_fetchers/__init__.py after all fetchers are imported
to avoid circular import issues.
"""
from changedetectionio.content_fetchers import requests, playwright, puppeteer, webdriver_selenium
# Register each built-in fetcher plugin
if hasattr(requests, 'requests_plugin'):
plugin_manager.register(requests.requests_plugin, 'builtin_requests')
if hasattr(playwright, 'playwright_plugin'):
plugin_manager.register(playwright.playwright_plugin, 'builtin_playwright')
if hasattr(puppeteer, 'puppeteer_plugin'):
plugin_manager.register(puppeteer.puppeteer_plugin, 'builtin_puppeteer')
if hasattr(webdriver_selenium, 'webdriver_selenium_plugin'):
plugin_manager.register(webdriver_selenium.webdriver_selenium_plugin, 'builtin_webdriver_selenium')
# Helper function to collect UI stats extras from all plugins
def collect_ui_edit_stats_extras(watch):
"""Collect and combine HTML content from all plugins that implement ui_edit_stats_extras"""
extras_content = []
# Get all plugins that implement the ui_edit_stats_extras hook
results = plugin_manager.hook.ui_edit_stats_extras(watch=watch)
# If we have results, add them to our content
if results:
for result in results:
if result: # Skip empty results
extras_content.append(result)
return "\n".join(extras_content) if extras_content else ""
return "\n".join(extras_content) if extras_content else ""
def collect_fetcher_status_icons(fetcher_name):
"""Collect status icon data from all plugins
Args:
fetcher_name: The name of the fetcher (e.g., 'html_webdriver', 'html_js_zyte')
Returns:
dict or None: Icon data dictionary from first matching plugin, or None
"""
# Get status icon data from plugins
results = plugin_manager.hook.fetcher_status_icon(fetcher_name=fetcher_name)
# Return first non-None result
if results:
for result in results:
if result and isinstance(result, dict):
return result
return None
def get_itemprop_availability_from_plugin(content, fetcher_name, fetcher_instance, url):
"""Get itemprop availability data from plugins as a fallback.
This is called when the built-in get_itemprop_availability doesn't find good data.
Args:
content: The HTML/text content to parse
fetcher_name: The name of the fetcher being used (e.g., 'html_js_zyte')
fetcher_instance: The fetcher instance that generated the content
url: The URL being watched (watch.link - includes Jinja2 evaluation)
Returns:
dict or None: Availability data dictionary from first matching plugin, or None
"""
# Get availability data from plugins
results = plugin_manager.hook.get_itemprop_availability_override(
content=content,
fetcher_name=fetcher_name,
fetcher_instance=fetcher_instance,
url=url
)
# Return first non-None result with actual data
if results:
for result in results:
if result and isinstance(result, dict):
# Check if the result has any meaningful data
if result.get('price') is not None or result.get('availability'):
return result
return None
def get_active_plugins():
"""Get a list of active plugins with their descriptions.
Returns:
list: List of dictionaries with plugin information:
[
{'name': 'plugin_name', 'description': 'Plugin description'},
...
]
"""
active_plugins = []
# Get all registered plugins
for plugin_name, plugin_obj in plugin_manager.list_name_plugin():
# Skip built-in plugins (they start with 'builtin_')
if plugin_name.startswith('builtin_'):
continue
# Get plugin description if available
description = None
if hasattr(plugin_obj, '__doc__') and plugin_obj.__doc__:
description = plugin_obj.__doc__.strip().split('\n')[0] # First line only
elif hasattr(plugin_obj, 'description'):
description = plugin_obj.description
# Try to get a friendly name from the plugin
friendly_name = plugin_name
if hasattr(plugin_obj, 'name'):
friendly_name = plugin_obj.name
active_plugins.append({
'name': friendly_name,
'description': description or 'No description available'
})
return active_plugins
def get_fetcher_capabilities(watch, datastore):
"""Get capability flags for a watch's fetcher.
Args:
watch: The watch object/dict
datastore: The datastore to resolve 'system' fetcher
Returns:
dict: Dictionary with capability flags:
{
'supports_browser_steps': bool,
'supports_screenshots': bool,
'supports_xpath_element_data': bool
}
"""
# Get the fetcher name from watch
fetcher_name = watch.get('fetch_backend', 'system')
# Resolve 'system' to actual fetcher
if fetcher_name == 'system':
fetcher_name = datastore.data['settings']['application'].get('fetch_backend', 'html_requests')
# Get the fetcher class
from changedetectionio import content_fetchers
# Try to get from built-in fetchers first
if hasattr(content_fetchers, fetcher_name):
fetcher_class = getattr(content_fetchers, fetcher_name)
return {
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False)
}
# Try to get from plugin-provided fetchers
# Query all plugins for registered fetchers
plugin_fetchers = plugin_manager.hook.register_content_fetcher()
for fetcher_registration in plugin_fetchers:
if fetcher_registration:
name, fetcher_class = fetcher_registration
if name == fetcher_name:
return {
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False)
}
# Default: no capabilities
return {
'supports_browser_steps': False,
'supports_screenshots': False,
'supports_xpath_element_data': False
}
def get_plugin_settings_tabs():
"""Get all plugin settings tabs.
Returns:
list: List of dictionaries with plugin settings tab information:
[
{
'plugin_id': str,
'tab_label': str,
'form_class': Form,
'description': str
},
...
]
"""
tabs = []
results = plugin_manager.hook.plugin_settings_tab()
for result in results:
if result and isinstance(result, dict):
# Validate required fields
if 'plugin_id' in result and 'tab_label' in result and 'form_class' in result:
tabs.append(result)
else:
logger.warning(f"Invalid plugin settings tab spec: {result}")
return tabs
def load_plugin_settings(datastore_path, plugin_id):
"""Load settings for a specific plugin from JSON file.
Args:
datastore_path: Path to the datastore directory
plugin_id: Unique identifier for the plugin (e.g., 'zyte_fetcher')
Returns:
dict: Plugin settings, or empty dict if file doesn't exist
"""
import json
settings_file = os.path.join(datastore_path, f"{plugin_id}.json")
if not os.path.exists(settings_file):
return {}
try:
with open(settings_file, 'r', encoding='utf-8') as f:
return json.load(f)
except Exception as e:
logger.error(f"Failed to load settings for plugin '{plugin_id}': {e}")
return {}
def save_plugin_settings(datastore_path, plugin_id, settings):
"""Save settings for a specific plugin to JSON file.
Args:
datastore_path: Path to the datastore directory
plugin_id: Unique identifier for the plugin (e.g., 'zyte_fetcher')
settings: Dictionary of settings to save
Returns:
bool: True if save was successful, False otherwise
"""
import json
settings_file = os.path.join(datastore_path, f"{plugin_id}.json")
try:
with open(settings_file, 'w', encoding='utf-8') as f:
json.dump(settings, f, indent=2, ensure_ascii=False)
logger.info(f"Saved settings for plugin '{plugin_id}' to {settings_file}")
return True
except Exception as e:
logger.error(f"Failed to save settings for plugin '{plugin_id}': {e}")
return False
def get_plugin_template_paths():
"""Get list of plugin template directories for Jinja2 loader.
Scans both external pluggy plugins and built-in processor plugins.
Returns:
list: List of absolute paths to plugin template directories
"""
template_paths = []
# Add the base processors/templates directory (as absolute path)
processors_templates_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'processors', 'templates')
if os.path.isdir(processors_templates_dir):
template_paths.append(processors_templates_dir)
logger.debug(f"Added base processors template path: {processors_templates_dir}")
# Scan built-in processor plugins
from changedetectionio.processors import find_processors
processor_list = find_processors()
for processor_module, processor_name in processor_list:
# Each processor is a module, check if it has a templates directory
if hasattr(processor_module, '__file__'):
processor_file = processor_module.__file__
if processor_file:
# Get the processor directory (e.g., processors/image_ssim_diff/)
processor_dir = os.path.dirname(os.path.abspath(processor_file))
templates_dir = os.path.join(processor_dir, 'templates')
if os.path.isdir(templates_dir):
template_paths.append(templates_dir)
logger.debug(f"Added processor template path: {templates_dir}")
# Get all registered external pluggy plugins
for plugin_name, plugin_obj in plugin_manager.list_name_plugin():
# Check if plugin has a templates directory
if hasattr(plugin_obj, '__file__'):
plugin_file = plugin_obj.__file__
elif hasattr(plugin_obj, '__module__'):
# Get the module file
module = sys.modules.get(plugin_obj.__module__)
if module and hasattr(module, '__file__'):
plugin_file = module.__file__
else:
continue
else:
continue
if plugin_file:
plugin_dir = os.path.dirname(os.path.abspath(plugin_file))
templates_dir = os.path.join(plugin_dir, 'templates')
if os.path.isdir(templates_dir):
template_paths.append(templates_dir)
logger.debug(f"Added plugin template path: {templates_dir}")
return template_paths
def apply_update_handler_alter(update_handler, watch, datastore):
"""Apply update_handler_alter hooks from all plugins.
Allows plugins to wrap or modify the update_handler before it processes a watch.
Every plugin is called with the same original update_handler; when several plugins
return a handler, the last non-None result is the one used.
Args:
update_handler: The perform_site_check instance to potentially modify
watch: The watch dict being processed
datastore: The application datastore
Returns:
object: The (potentially modified/wrapped) update_handler
"""
# Get all plugins that implement the update_handler_alter hook
results = plugin_manager.hook.update_handler_alter(
update_handler=update_handler,
watch=watch,
datastore=datastore
)
# Keep the last non-None result (every plugin was called with the original update_handler)
current_handler = update_handler
if results:
for result in results:
if result is not None:
logger.debug(f"Plugin modified update_handler for watch {watch.get('uuid')}")
current_handler = result
return current_handler
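A minimal sketch of strict chaining, where each plugin receives the previous plugin's result. It does not use pluggy; `chain_handlers` and the example plugin callables are hypothetical names introduced only for illustration and are not part of changedetection.io.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical plugin signature: takes the current handler and returns a
# replacement handler, or None to leave it unchanged.
AlterFn = Callable[[Any], Optional[Any]]


def chain_handlers(update_handler: Any, alter_fns: Iterable[AlterFn]) -> Any:
    """Pass the handler through each plugin in turn, keeping non-None results."""
    current = update_handler
    for alter in alter_fns:
        result = alter(current)
        if result is not None:
            current = result
    return current


def add_wrapper(handler):
    # Stand-in for a plugin that wraps the handler it was given
    return f"wrapped({handler})"


def no_op(handler):
    # Returning None means "no change"
    return None


final_handler = chain_handlers("base", [add_wrapper, no_op, add_wrapper])
```

The second `add_wrapper` call sees the output of the first one, which is the difference from calling every plugin with the same original handler.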
def apply_update_finalize(update_handler, watch, datastore, processing_exception):
"""Apply update_finalize hooks from all plugins.
Called in the finally block after watch processing completes, allowing plugins
to perform cleanup, update metrics, or log final status.
Args:
update_handler: The perform_site_check instance (may be None)
watch: The watch dict that was processed (may be None)
datastore: The application datastore
processing_exception: The exception from processing, or None if successful
Returns:
None
"""
try:
# Call all plugins that implement the update_finalize hook
plugin_manager.hook.update_finalize(
update_handler=update_handler,
watch=watch,
datastore=datastore,
processing_exception=processing_exception
)
except Exception as e:
# Don't let plugin errors crash the worker
logger.error(f"Error in update_finalize hook: {e}")
logger.exception("update_finalize hook exception details:")


@@ -9,6 +9,15 @@ Some suggestions for the future
- `graphical`
## API schema extension (`api.yaml`)
A processor can extend the Watch/Tag API schema by placing an `api.yaml` alongside its `__init__.py`.
Define a `components.schemas.processor_config_<name>` entry and it will be merged into `WatchBase` at startup,
making `processor_config_<name>` a valid field on all watch create/update API calls.
The fully merged spec is served live at `/api/v1/full-spec`.
See `restock_diff/api.yaml` for a working example.
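
The merge step can be pictured with plain dictionaries. This is only a sketch under stated assumptions: `merge_processor_schema`, the `$ref` wiring and the `example_option` field are hypothetical, and the real startup code may merge the schema differently.

```python
from copy import deepcopy


def merge_processor_schema(base_spec: dict, processor_spec: dict) -> dict:
    """Return a copy of base_spec with processor_config_* schemas added to WatchBase."""
    merged = deepcopy(base_spec)
    schemas = merged.setdefault("components", {}).setdefault("schemas", {})
    watch_props = schemas.setdefault("WatchBase", {}).setdefault("properties", {})

    extra = processor_spec.get("components", {}).get("schemas", {})
    for name, schema in extra.items():
        if name.startswith("processor_config_"):
            # Register the schema and expose it as a field on WatchBase
            schemas[name] = deepcopy(schema)
            watch_props[name] = {"$ref": f"#/components/schemas/{name}"}
    return merged


# Stand-in for the already parsed base spec
base = {
    "components": {
        "schemas": {
            "WatchBase": {"type": "object", "properties": {"url": {"type": "string"}}}
        }
    }
}
# Stand-in for a parsed processor api.yaml (field name is illustrative only)
restock = {
    "components": {
        "schemas": {
            "processor_config_restock_diff": {
                "type": "object",
                "properties": {"example_option": {"type": "string"}},
            }
        }
    }
}
full_spec = merge_processor_schema(base, restock)
```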
## Todo
- Make each processor return an extra list of sub-processors (so you could configure a single processor in different ways)


@@ -1,180 +1,10 @@
from abc import abstractmethod
from changedetectionio.content_fetchers.base import Fetcher
from changedetectionio.strtobool import strtobool
from copy import deepcopy
from functools import lru_cache
from loguru import logger
import hashlib
from flask_babel import gettext, get_locale
import importlib
import inspect
import os
import pkgutil
import re
class difference_detection_processor():
browser_steps = None
datastore = None
fetcher = None
screenshot = None
watch = None
xpath_data = None
preferred_proxy = None
def __init__(self, *args, datastore, watch_uuid, **kwargs):
super().__init__(*args, **kwargs)
self.datastore = datastore
self.watch = deepcopy(self.datastore.data['watching'].get(watch_uuid))
# Generic fetcher that should be extended (requests, playwright etc)
self.fetcher = Fetcher()
async def call_browser(self, preferred_proxy_id=None):
from requests.structures import CaseInsensitiveDict
url = self.watch.link
# Protect against file:, file:/, file:// access, check the real "link" without any meta "source:" etc prepended.
if re.search(r'^file:', url.strip(), re.IGNORECASE):
if not strtobool(os.getenv('ALLOW_FILE_URI', 'false')):
raise Exception(
"file:// type access is denied for security reasons."
)
# Requests, playwright, other browser via wss:// etc, fetch_extra_something
prefer_fetch_backend = self.watch.get('fetch_backend', 'system')
# Proxy ID "key"
preferred_proxy_id = preferred_proxy_id if preferred_proxy_id else self.datastore.get_preferred_proxy_for_watch(uuid=self.watch.get('uuid'))
# Pluggable content self.fetcher
if not prefer_fetch_backend or prefer_fetch_backend == 'system':
prefer_fetch_backend = self.datastore.data['settings']['application'].get('fetch_backend')
# In the case that the preferred fetcher was a browser config with custom connection URL..
# @todo - on save watch, if its extra_browser_ then it should be obvious it will use playwright (like if its requests now..)
custom_browser_connection_url = None
if prefer_fetch_backend.startswith('extra_browser_'):
(t, key) = prefer_fetch_backend.split('extra_browser_')
connection = list(
filter(lambda s: (s['browser_name'] == key), self.datastore.data['settings']['requests'].get('extra_browsers', [])))
if connection:
prefer_fetch_backend = 'html_webdriver'
custom_browser_connection_url = connection[0].get('browser_connection_url')
# PDFs should use html_requests because playwright will serve them (so far) in an embedded page
# @todo https://github.com/dgtlmoon/changedetection.io/issues/2019
# @todo needs a test or a fix
if self.watch.is_pdf:
prefer_fetch_backend = "html_requests"
# Grab the right kind of 'fetcher', (playwright, requests, etc)
from changedetectionio import content_fetchers
if hasattr(content_fetchers, prefer_fetch_backend):
# @todo TEMPORARY HACK - SWITCH BACK TO PLAYWRIGHT FOR BROWSERSTEPS
if prefer_fetch_backend == 'html_webdriver' and self.watch.has_browser_steps:
# This is never supported in selenium anyway
logger.warning("Using playwright fetcher override for possible puppeteer request in browsersteps, because puppeteer browser steps support is incomplete.")
from changedetectionio.content_fetchers.playwright import fetcher as playwright_fetcher
fetcher_obj = playwright_fetcher
else:
fetcher_obj = getattr(content_fetchers, prefer_fetch_backend)
else:
# The referenced fetcher doesn't exist, so just use a default
fetcher_obj = getattr(content_fetchers, "html_requests")
proxy_url = None
if preferred_proxy_id:
# Custom browser endpoints should NOT have a proxy added
if not prefer_fetch_backend.startswith('extra_browser_'):
proxy_url = self.datastore.proxy_list.get(preferred_proxy_id).get('url')
logger.debug(f"Selected proxy key '{preferred_proxy_id}' as proxy URL '{proxy_url}' for {url}")
else:
logger.debug("Skipping adding proxy data when custom Browser endpoint is specified. ")
logger.debug(f"Using proxy '{proxy_url}' for {self.watch['uuid']}")
# Now call the fetcher (playwright/requests/etc) with arguments that only a fetcher would need.
# When browser_connection_url is None, the fetcher should work out the best defaults itself (OS env vars etc)
self.fetcher = fetcher_obj(proxy_override=proxy_url,
custom_browser_connection_url=custom_browser_connection_url
)
if self.watch.has_browser_steps:
self.fetcher.browser_steps = self.watch.get('browser_steps', [])
self.fetcher.browser_steps_screenshot_path = os.path.join(self.datastore.datastore_path, self.watch.get('uuid'))
# Tweak the base config with the per-watch ones
from changedetectionio.jinja2_custom import render as jinja_render
request_headers = CaseInsensitiveDict()
ua = self.datastore.data['settings']['requests'].get('default_ua')
if ua and ua.get(prefer_fetch_backend):
request_headers.update({'User-Agent': ua.get(prefer_fetch_backend)})
request_headers.update(self.watch.get('headers', {}))
request_headers.update(self.datastore.get_all_base_headers())
request_headers.update(self.datastore.get_all_headers_in_textfile_for_watch(uuid=self.watch.get('uuid')))
# https://github.com/psf/requests/issues/4525
# Requests doesn't yet support brotli encoding, so don't put 'br' here; make sure the user cannot
# do this by accident.
if 'Accept-Encoding' in request_headers and "br" in request_headers['Accept-Encoding']:
request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '')
for header_name in request_headers:
request_headers.update({header_name: jinja_render(template_str=request_headers.get(header_name))})
timeout = self.datastore.data['settings']['requests'].get('timeout')
request_body = self.watch.get('body')
if request_body:
request_body = jinja_render(template_str=self.watch.get('body'))
request_method = self.watch.get('method')
ignore_status_codes = self.watch.get('ignore_status_codes', False)
# Configurable per-watch or global extra delay before extracting text (for webDriver types)
system_webdriver_delay = self.datastore.data['settings']['application'].get('webdriver_delay', None)
if self.watch.get('webdriver_delay'):
self.fetcher.render_extract_delay = self.watch.get('webdriver_delay')
elif system_webdriver_delay is not None:
self.fetcher.render_extract_delay = system_webdriver_delay
if self.watch.get('webdriver_js_execute_code') is not None and self.watch.get('webdriver_js_execute_code').strip():
self.fetcher.webdriver_js_execute_code = self.watch.get('webdriver_js_execute_code')
# Requests for PDFs, images etc. should be passed the is_binary flag
is_binary = self.watch.is_pdf
# And here we go! call the right browser with browser-specific settings
empty_pages_are_a_change = self.datastore.data['settings']['application'].get('empty_pages_are_a_change', False)
# All fetchers are now async
await self.fetcher.run(
current_include_filters=self.watch.get('include_filters'),
empty_pages_are_a_change=empty_pages_are_a_change,
fetch_favicon=self.watch.favicon_is_expired(),
ignore_status_codes=ignore_status_codes,
is_binary=is_binary,
request_body=request_body,
request_headers=request_headers,
request_method=request_method,
timeout=timeout,
url=url,
)
#@todo .quit here could go on close object, so we can run JS if change-detected
self.fetcher.quit(watch=self.watch)
# After init, call run_changedetection() which will do the actual change-detection
@abstractmethod
def run_changedetection(self, watch):
update_obj = {'last_notification_error': False, 'last_error': False}
some_data = 'xxxxx'
update_obj["previous_md5"] = hashlib.md5(some_data.encode('utf-8')).hexdigest()
changed_detected = False
return changed_detected, update_obj, ''.encode('utf-8')
def find_sub_packages(package_name):
"""
@@ -187,9 +17,11 @@ def find_sub_packages(package_name):
return [name for _, name, is_pkg in pkgutil.iter_modules(package.__path__) if is_pkg]
@lru_cache(maxsize=1)
def find_processors():
"""
Find all subclasses of DifferenceDetectionProcessor in the specified package.
Results are cached to avoid repeated discovery.
:param package_name: The name of the package to scan for processor modules.
:return: A list of (module, class) tuples.
@@ -198,6 +30,7 @@ def find_processors():
processors = []
sub_packages = find_sub_packages(package_name)
from changedetectionio.processors.base import difference_detection_processor
for sub_package in sub_packages:
module_name = f"{package_name}.{sub_package}.processor"
@@ -206,11 +39,32 @@ def find_processors():
# Iterate through all classes in the module
for name, obj in inspect.getmembers(module, inspect.isclass):
if issubclass(obj, difference_detection_processor) and obj is not difference_detection_processor:
# Only register classes that are actually defined in this module (not imported)
if (issubclass(obj, difference_detection_processor) and
obj is not difference_detection_processor and
obj.__module__ == module.__name__):
processors.append((module, sub_package))
break # Only need one processor per module
except (ModuleNotFoundError, ImportError) as e:
logger.warning(f"Failed to import module {module_name}: {e} (find_processors())")
# Discover plugin processors via pluggy
try:
from changedetectionio.pluggy_interface import plugin_manager
plugin_results = plugin_manager.hook.register_processor()
for result in plugin_results:
if result and isinstance(result, dict):
processor_module = result.get('processor_module')
processor_name = result.get('processor_name')
if processor_module and processor_name:
processors.append((processor_module, processor_name))
plugin_path = getattr(processor_module, '__file__', 'unknown location')
logger.info(f"Registered plugin processor: {processor_name} from {plugin_path}")
except Exception as e:
logger.warning(f"Error loading plugin processors: {e}")
return processors
@@ -242,17 +96,392 @@ def get_custom_watch_obj_for_processor(processor_name):
return watch_class
def available_processors():
"""
Get a list of processors by name and description for the UI elements
:return: A list :)
def find_processor_module(processor_name):
"""
Find the processor module by name.
Args:
processor_name: Processor machine name (e.g., 'image_ssim_diff')
Returns:
module: The processor's parent module, or None if not found
"""
processor_classes = find_processors()
processor_tuple = next((tpl for tpl in processor_classes if tpl[1] == processor_name), None)
if processor_tuple:
# Return the parent module (the package containing processor.py)
return get_parent_module(processor_tuple[0])
return None
def get_processor_module(processor_name):
"""
Get the actual processor module (with perform_site_check class) by name.
Works for both built-in and plugin processors.
Args:
processor_name: Processor machine name (e.g., 'text_json_diff', 'osint_recon')
Returns:
module: The processor module containing perform_site_check, or None if not found
"""
processor_classes = find_processors()
processor_tuple = next((tpl for tpl in processor_classes if tpl[1] == processor_name), None)
if processor_tuple:
# Return the actual processor module (first element of tuple)
return processor_tuple[0]
return None
def get_processor_submodule(processor_name, submodule_name):
"""
Get an optional submodule from a processor (e.g., 'difference', 'extract', 'preview').
Works for both built-in and plugin processors.
Args:
processor_name: Processor machine name (e.g., 'text_json_diff', 'osint_recon')
submodule_name: Name of the submodule (e.g., 'difference', 'extract', 'preview')
Returns:
module: The submodule if it exists, or None if not found
"""
processor_classes = find_processors()
processor_tuple = next((tpl for tpl in processor_classes if tpl[1] == processor_name), None)
if not processor_tuple:
return None
processor_module = processor_tuple[0]
parent_module = get_parent_module(processor_module)
if not parent_module:
return None
# Try to import the submodule
try:
# For built-in processors: changedetectionio.processors.text_json_diff.difference
# For plugin processors: changedetectionio_osint.difference
parent_module_name = parent_module.__name__
submodule_full_name = f"{parent_module_name}.{submodule_name}"
return importlib.import_module(submodule_full_name)
except (ModuleNotFoundError, ImportError):
return None
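A small, self-contained sketch of the optional-submodule lookup above, using the standard-library `json` package as a stand-in for a processor package. `optional_submodule` is a hypothetical helper name, not a function from the codebase.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_submodule(parent_name: str, submodule_name: str) -> Optional[ModuleType]:
    """Import parent_name.submodule_name, returning None if it cannot be imported."""
    try:
        return importlib.import_module(f"{parent_name}.{submodule_name}")
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so both are covered
        return None


decoder = optional_submodule("json", "decoder")            # exists in the stdlib
missing = optional_submodule("json", "no_such_submodule")  # does not exist
```

Callers can then treat `None` as "this processor does not provide that optional feature".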
@lru_cache(maxsize=1)
def get_plugin_processor_metadata():
"""Get metadata from plugin processors."""
metadata = {}
try:
from changedetectionio.pluggy_interface import plugin_manager
plugin_results = plugin_manager.hook.register_processor()
for result in plugin_results:
if result and isinstance(result, dict):
processor_name = result.get('processor_name')
meta = result.get('metadata', {})
if processor_name:
metadata[processor_name] = meta
except Exception as e:
logger.warning(f"Error getting plugin processor metadata: {e}")
return metadata
@lru_cache(maxsize=32)
def _available_processors_cached(locale_str):
"""
Internal cached function that includes locale in cache key.
This ensures translations are cached per-language instead of globally.
:param locale_str: The locale string (e.g., 'en', 'it', 'zh')
:return: A list of tuples (processor_name, translated_description, weight)
"""
processor_classes = find_processors()
# Check if DISABLED_PROCESSORS env var is set
disabled_processors_env = os.getenv('DISABLED_PROCESSORS', 'image_ssim_diff').strip()
disabled_processors = []
if disabled_processors_env:
# Parse comma-separated list and strip whitespace
disabled_processors = [p.strip() for p in disabled_processors_env.split(',') if p.strip()]
logger.info(f"DISABLED_PROCESSORS set, disabling: {disabled_processors}")
available = []
for package, processor_class in processor_classes:
available.append((processor_class, package.name))
plugin_metadata = get_plugin_processor_metadata()
return available
for module, sub_package_name in processor_classes:
# Skip disabled processors
if sub_package_name in disabled_processors:
logger.debug(f"Skipping processor '{sub_package_name}' (in DISABLED_PROCESSORS)")
continue
# Check if this is a plugin processor
if sub_package_name in plugin_metadata:
meta = plugin_metadata[sub_package_name]
description = gettext(meta.get('name', sub_package_name))
# Plugin processors start from weight 100 to separate them from built-in processors
weight = 100 + meta.get('processor_weight', 0)
else:
# Try to get the 'name' attribute from the processor module first
if hasattr(module, 'name'):
description = gettext(module.name)
else:
# Fall back to processor_description from parent module's __init__.py
parent_module = get_parent_module(module)
if parent_module and hasattr(parent_module, 'processor_description'):
description = gettext(parent_module.processor_description)
else:
# Final fallback to a readable name
description = sub_package_name.replace('_', ' ').title()
# Get weight for sorting (lower weight = higher in list)
weight = 0 # Default weight for processors without explicit weight
# Check processor module itself first
if hasattr(module, 'processor_weight'):
weight = module.processor_weight
else:
# Fall back to parent module (package __init__.py)
parent_module = get_parent_module(module)
if parent_module and hasattr(parent_module, 'processor_weight'):
weight = parent_module.processor_weight
available.append((sub_package_name, description, weight))
# Sort by weight (lower weight = appears first)
available.sort(key=lambda x: x[2])
# Return as tuples without weight (for backwards compatibility)
return [(name, desc) for name, desc, weight in available]
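The filtering and ordering steps can be exercised in isolation. The following is a sketch, not the project's code: `parse_disabled`, `sort_processors` and the example weights are assumptions chosen for illustration.

```python
def parse_disabled(env_value: str) -> list:
    """Split a comma-separated DISABLED_PROCESSORS-style value, dropping blanks."""
    return [p.strip() for p in env_value.split(",") if p.strip()]


def sort_processors(candidates, disabled):
    """candidates: iterable of (name, description, weight) tuples."""
    kept = [c for c in candidates if c[0] not in disabled]
    # list.sort is stable, so processors with equal weight keep discovery order
    kept.sort(key=lambda c: c[2])
    return [(name, desc) for name, desc, _weight in kept]


candidates = [
    ("plugin_x", "Plugin X", 100),       # hypothetical plugin processor
    ("restock_diff", "Restock", 1),
    ("text_json_diff", "Text", 0),
    ("image_ssim_diff", "Image", 2),
]
disabled = parse_disabled(" image_ssim_diff , ,plugin_x")
choices = sort_processors(candidates, disabled)
```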
def available_processors():
"""
Get a list of processors by name and description for the UI elements.
Can be filtered via DISABLED_PROCESSORS environment variable (comma-separated list).
This function delegates to a locale-aware cached version to ensure translations
are cached per-language instead of globally.
:return: A list of tuples (processor_name, translated_description)
"""
# Get current locale and use it as cache key
# Convert Babel Locale object to string for use as cache key
locale = get_locale()
locale_str = str(locale) if locale else 'en'
return _available_processors_cached(locale_str)
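A minimal sketch of why the locale string is part of the cache key: each language gets its own cached result, and repeated calls for the same language do not recompute. The names `_labels_cached` and `labels`, and the translation table, are hypothetical stand-ins for the gettext-backed lookup.

```python
from functools import lru_cache

call_count = {"n": 0}  # counts how often the expensive part actually runs


@lru_cache(maxsize=32)
def _labels_cached(locale_str: str):
    call_count["n"] += 1
    # Stand-in for gettext(); real translations come from the message catalog
    translations = {"en": "Text changes", "it": "Modifiche al testo"}
    label = translations.get(locale_str, translations["en"])
    # Return a tuple so callers cannot mutate the cached value
    return (("text_json_diff", label),)


def labels(current_locale=None):
    locale_str = str(current_locale) if current_locale else "en"
    return list(_labels_cached(locale_str))


italian_first = labels("it")
italian_again = labels("it")   # served from the cache
english = labels(None)         # falls back to 'en', a separate cache entry
```

Returning an immutable tuple from the cached function is a design choice of this sketch; a cached list would be shared between all callers.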
def get_default_processor():
"""
Get the default processor to use when none is specified.
Returns the first available processor based on weight (lowest weight = highest priority).
This ensures forms auto-select a valid processor even when DISABLED_PROCESSORS filters the list.
:return: The processor name string (e.g., 'text_json_diff')
"""
available = available_processors()
if available:
return available[0][0] # Return the processor name from first tuple
return 'text_json_diff' # Fallback if somehow no processors are available
def get_processor_badge_texts():
"""
Get a dictionary mapping processor names to their list_badge_text values.
Translations are applied based on the current request locale.
:return: A dict mapping processor name to badge text (e.g., {'text_json_diff': 'Text', 'restock_diff': 'Restock'})
"""
processor_classes = find_processors()
badge_texts = {}
for module, sub_package_name in processor_classes:
# Try to get the 'list_badge_text' attribute from the processor module
if hasattr(module, 'list_badge_text'):
badge_texts[sub_package_name] = gettext(module.list_badge_text)
else:
# Fall back to parent module's __init__.py
parent_module = get_parent_module(module)
if parent_module and hasattr(parent_module, 'list_badge_text'):
badge_texts[sub_package_name] = gettext(parent_module.list_badge_text)
return badge_texts
def get_processor_descriptions():
"""
Get a dictionary mapping processor names to their description/name values.
Translations are applied based on the current request locale.
:return: A dict mapping processor name to description (e.g., {'text_json_diff': 'Webpage Text/HTML, JSON and PDF changes'})
"""
processor_classes = find_processors()
descriptions = {}
for module, sub_package_name in processor_classes:
# Try to get the 'name' or 'description' attribute from the processor module first
if hasattr(module, 'name'):
descriptions[sub_package_name] = gettext(module.name)
elif hasattr(module, 'description'):
descriptions[sub_package_name] = gettext(module.description)
else:
# Fall back to parent module's __init__.py
parent_module = get_parent_module(module)
if parent_module and hasattr(parent_module, 'processor_description'):
descriptions[sub_package_name] = gettext(parent_module.processor_description)
elif parent_module and hasattr(parent_module, 'name'):
descriptions[sub_package_name] = gettext(parent_module.name)
else:
# Final fallback to a readable name
descriptions[sub_package_name] = sub_package_name.replace('_', ' ').title()
return descriptions
def generate_processor_badge_colors(processor_name):
"""
Generate consistent colors for a processor badge based on its name.
Uses a hash of the processor name to generate pleasing, accessible colors
for both light and dark modes.
:param processor_name: The processor name (e.g., 'text_json_diff')
:return: A dict with 'light' and 'dark' color schemes, each containing 'bg' and 'color'
"""
import hashlib
# Generate a consistent hash from the processor name
hash_obj = hashlib.md5(processor_name.encode('utf-8'))
hash_int = int(hash_obj.hexdigest()[:8], 16)
# Generate hue from hash (0-360)
hue = hash_int % 360
# Light mode: pastel background with darker text
light_saturation = 60 + (hash_int % 25) # 60-84%
light_lightness = 85 + (hash_int % 10) # 85-94% - very light
text_lightness = 25 + (hash_int % 15) # 25-39% - dark
# Dark mode: solid, vibrant colors with white text
dark_saturation = 55 + (hash_int % 20) # 55-74%
dark_lightness = 45 + (hash_int % 15) # 45-59%
return {
'light': {
'bg': f'hsl({hue}, {light_saturation}%, {light_lightness}%)',
'color': f'hsl({hue}, 50%, {text_lightness}%)'
},
'dark': {
'bg': f'hsl({hue}, {dark_saturation}%, {dark_lightness}%)',
'color': '#fff'
}
}
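The key property here is determinism: the same processor name always maps to the same hue. A reduced sketch of just the hue step follows; `badge_hue` is a hypothetical helper that mirrors the hashing shown above.

```python
import hashlib


def badge_hue(processor_name: str) -> int:
    """Map a processor name to a stable hue in the range 0-359."""
    digest = hashlib.md5(processor_name.encode("utf-8")).hexdigest()
    # Only the first 8 hex digits (32 bits) are used, as in the function above
    return int(digest[:8], 16) % 360


hue_a = badge_hue("text_json_diff")
hue_b = badge_hue("text_json_diff")  # same input, same hue
hue_c = badge_hue("restock_diff")
```

Because every component is derived from the same integer, two processors whose names hash to the same value would receive identical colors.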
@lru_cache(maxsize=1)
def get_processor_badge_css():
"""
Generate CSS for all processor badges with auto-generated colors.
This creates CSS rules for both light and dark modes for each processor.
:return: A string containing CSS rules for all processor badges
"""
processor_classes = find_processors()
css_rules = []
for module, sub_package_name in processor_classes:
colors = generate_processor_badge_colors(sub_package_name)
# Light mode rule
css_rules.append(
f".processor-badge-{sub_package_name} {{\n"
f" background-color: {colors['light']['bg']};\n"
f" color: {colors['light']['color']};\n"
f"}}"
)
# Dark mode rule
css_rules.append(
f"html[data-darkmode=\"true\"] .processor-badge-{sub_package_name} {{\n"
f" background-color: {colors['dark']['bg']};\n"
f" color: {colors['dark']['color']};\n"
f"}}"
)
return '\n\n'.join(css_rules)
def save_processor_config(datastore, watch_uuid, config_data):
"""
Save processor-specific configuration to JSON file.
This is a shared helper function used by both the UI edit form and API endpoints
to consistently handle processor configuration storage.
Args:
datastore: The application datastore instance
watch_uuid: UUID of the watch
config_data: Dictionary of configuration data to save (with processor_config_* prefix removed)
Returns:
bool: True if saved successfully, False otherwise
"""
if not config_data:
return True
try:
from changedetectionio.processors.base import difference_detection_processor
# Get processor name from watch
watch = datastore.data['watching'].get(watch_uuid)
if not watch:
logger.error(f"Cannot save processor config: watch {watch_uuid} not found")
return False
processor_name = watch.get('processor', 'text_json_diff')
# Create a processor instance to access config methods
processor_instance = difference_detection_processor(datastore, watch_uuid)
# Use processor name as filename so each processor keeps its own config
config_filename = f'{processor_name}.json'
processor_instance.update_extra_watch_config(config_filename, config_data)
logger.debug(f"Saved processor config to {config_filename}: {config_data}")
return True
except Exception as e:
logger.error(f"Failed to save processor config: {e}")
return False
def extract_processor_config_from_form_data(form_data):
"""
Extract processor_config_* fields from form data and return separate dicts.
This is a shared helper function used by both the UI edit form and API endpoints
to consistently handle processor configuration extraction.
IMPORTANT: This function modifies form_data in-place by removing processor_config_* fields.
Args:
form_data: Dictionary of form data (will be modified in-place)
Returns:
dict: Dictionary of processor config data (with processor_config_* prefix removed)
"""
processor_config_data = {}
# Use list() to create a copy of keys since we're modifying the dict
for field_name in list(form_data.keys()):
if field_name.startswith('processor_config_'):
config_key = field_name.removeprefix('processor_config_')  # strip only the leading prefix
# Save all values (including empty strings) to allow explicit clearing of settings
processor_config_data[config_key] = form_data[field_name]
# Remove from form_data to prevent it from reaching datastore
del form_data[field_name]
return processor_config_data
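A short usage sketch of the extraction behaviour described above, written as a standalone function so it can be run on its own. `split_processor_config` and the form field names are hypothetical.

```python
PREFIX = "processor_config_"


def split_processor_config(form_data: dict) -> dict:
    """Move processor_config_* entries out of form_data into a new dict."""
    config = {}
    # Iterate over a copy of the keys because form_data is modified in the loop
    for field_name in list(form_data):
        if field_name.startswith(PREFIX):
            # Empty strings are kept so a setting can be cleared explicitly
            config[field_name[len(PREFIX):]] = form_data.pop(field_name)
    return config


form = {
    "url": "https://example.com",
    "processor_config_threshold": "0.5",
    "processor_config_mode": "",
}
config = split_processor_config(form)
```

After the call, `form` holds only the fields destined for the datastore and `config` holds the per-processor settings.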


@@ -0,0 +1,357 @@
import asyncio
import re
import hashlib
from changedetectionio.browser_steps.browser_steps import browser_steps_get_valid_steps
from changedetectionio.content_fetchers.base import Fetcher
from changedetectionio.strtobool import strtobool
from changedetectionio.validate_url import is_private_hostname
from copy import deepcopy
from abc import abstractmethod
import os
from urllib.parse import urlparse
from loguru import logger
SCREENSHOT_FORMAT_JPEG = 'JPEG'
SCREENSHOT_FORMAT_PNG = 'PNG'
class difference_detection_processor():
browser_steps = None
datastore = None
fetcher = None
screenshot = None
watch = None
xpath_data = None
preferred_proxy = None
screenshot_format = SCREENSHOT_FORMAT_JPEG
last_raw_content_checksum = None
def __init__(self, datastore, watch_uuid):
self.datastore = datastore
self.watch_uuid = watch_uuid
# Create a stable snapshot of the watch for processing
# Why deepcopy?
# 1. Prevents "dict changed during iteration" errors if watch is modified during processing
# 2. Preserves Watch object with properties (.link, .is_pdf, etc.) - can't use dict()
# 3. Safe now: Watch.__deepcopy__() shares datastore ref (no memory leak) but copies dict data
self.watch = deepcopy(self.datastore.data['watching'].get(watch_uuid))
# Generic fetcher that should be extended (requests, playwright etc)
self.fetcher = Fetcher()
# Load the last raw content checksum from file
self.read_last_raw_content_checksum()
def update_last_raw_content_checksum(self, checksum):
"""
Save the raw content MD5 checksum to file.
This is used for skip logic - avoid reprocessing if raw HTML unchanged.
"""
if not checksum:
return
watch = self.datastore.data['watching'].get(self.watch_uuid)
if not watch:
return
data_dir = watch.data_dir
if not data_dir:
return
watch.ensure_data_dir_exists()
checksum_file = os.path.join(data_dir, 'last-checksum.txt')
try:
with open(checksum_file, 'w', encoding='utf-8') as f:
f.write(checksum)
self.last_raw_content_checksum = checksum
except IOError as e:
logger.warning(f"Failed to write checksum file for {self.watch_uuid}: {e}")
def read_last_raw_content_checksum(self):
"""
Read the last raw content MD5 checksum from file.
Returns None if file doesn't exist (first run) or can't be read.
"""
watch = self.datastore.data['watching'].get(self.watch_uuid)
if not watch:
self.last_raw_content_checksum = None
return
data_dir = watch.data_dir
if not data_dir:
self.last_raw_content_checksum = None
return
checksum_file = os.path.join(data_dir, 'last-checksum.txt')
if not os.path.isfile(checksum_file):
self.last_raw_content_checksum = None
return
try:
with open(checksum_file, 'r', encoding='utf-8') as f:
self.last_raw_content_checksum = f.read().strip()
except IOError as e:
logger.warning(f"Failed to read checksum file for {self.watch_uuid}: {e}")
self.last_raw_content_checksum = None
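A standalone sketch of the checksum round trip, using a temporary directory in place of the watch data directory. The helper names and the `should_skip` comparison are assumptions about how the stored checksum might be used for the skip logic; the actual comparison lives elsewhere in the codebase.

```python
import hashlib
import os
import tempfile
from typing import Optional

CHECKSUM_FILENAME = "last-checksum.txt"


def write_checksum(data_dir: str, checksum: str) -> None:
    if not checksum:
        return
    os.makedirs(data_dir, exist_ok=True)
    with open(os.path.join(data_dir, CHECKSUM_FILENAME), "w", encoding="utf-8") as f:
        f.write(checksum)


def read_checksum(data_dir: str) -> Optional[str]:
    path = os.path.join(data_dir, CHECKSUM_FILENAME)
    if not os.path.isfile(path):
        return None  # first run: nothing stored yet
    with open(path, "r", encoding="utf-8") as f:
        return f.read().strip()


def should_skip(data_dir: str, raw_content: bytes) -> bool:
    """True when the stored checksum matches the freshly fetched content."""
    return read_checksum(data_dir) == hashlib.md5(raw_content).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    page = b"<html>hello</html>"
    first_run_skips = should_skip(tmp, page)  # no checksum file yet
    write_checksum(tmp, hashlib.md5(page).hexdigest())
    unchanged_skips = should_skip(tmp, page)
    changed_skips = should_skip(tmp, b"<html>changed</html>")
```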
async def validate_iana_url(self):
"""Pre-flight SSRF check — runs DNS lookup in executor to avoid blocking the event loop.
Covers all fetchers (requests, playwright, puppeteer, plugins) since every fetch goes
through call_browser().
"""
if strtobool(os.getenv('ALLOW_IANA_RESTRICTED_ADDRESSES', 'false')):
return
parsed = urlparse(self.watch.link)
if not parsed.hostname:
return
loop = asyncio.get_running_loop()
if await loop.run_in_executor(None, is_private_hostname, parsed.hostname):
raise Exception(
f"Fetch blocked: '{self.watch.link}' resolves to a private/reserved IP address. "
f"Set ALLOW_IANA_RESTRICTED_ADDRESSES=true to allow."
)
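The implementation of `is_private_hostname` is not shown in this diff. The sketch below is one possible shape for such a check, built only on the standard library; `resolves_to_restricted_ip` and `check_host` are hypothetical names, and the set of address classes treated as restricted is an assumption.

```python
import asyncio
import ipaddress
import socket


def resolves_to_restricted_ip(hostname: str) -> bool:
    """Blocking check: does any resolved address fall in a restricted range?"""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Unresolvable names are left for the fetcher to report (assumption)
        return False
    for info in infos:
        # Drop any IPv6 scope id such as '%eth0' before parsing
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            return True
    return False


async def check_host(hostname: str) -> bool:
    # Run the blocking DNS lookup in the default executor so the
    # event loop stays responsive, as in validate_iana_url()
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, resolves_to_restricted_ip, hostname)


loopback_blocked = asyncio.run(check_host("127.0.0.1"))
```

Note that a check made before the fetch cannot by itself rule out a hostname resolving differently at fetch time.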
async def call_browser(self, preferred_proxy_id=None):
from requests.structures import CaseInsensitiveDict
url = self.watch.link
# Protect against file:, file:/, file:// access, check the real "link" without any meta "source:" etc prepended.
if re.search(r'^file:', url.strip(), re.IGNORECASE):
if not strtobool(os.getenv('ALLOW_FILE_URI', 'false')):
raise Exception(
"file:// type access is denied for security reasons."
)
await self.validate_iana_url()
# Requests, playwright, other browser via wss:// etc, fetch_extra_something
prefer_fetch_backend = self.watch.get('fetch_backend', 'system')
# Proxy ID "key"
preferred_proxy_id = preferred_proxy_id if preferred_proxy_id else self.datastore.get_preferred_proxy_for_watch(
uuid=self.watch.get('uuid'))
# Pluggable content self.fetcher
if not prefer_fetch_backend or prefer_fetch_backend == 'system':
prefer_fetch_backend = self.datastore.data['settings']['application'].get('fetch_backend')
# In the case that the preferred fetcher was a browser config with custom connection URL..
# @todo - on save watch, if its extra_browser_ then it should be obvious it will use playwright (like if its requests now..)
custom_browser_connection_url = None
if prefer_fetch_backend.startswith('extra_browser_'):
(t, key) = prefer_fetch_backend.split('extra_browser_')
connection = list(
filter(lambda s: (s['browser_name'] == key), self.datastore.data['settings']['requests'].get('extra_browsers', [])))
if connection:
prefer_fetch_backend = 'html_webdriver'
custom_browser_connection_url = connection[0].get('browser_connection_url')
# PDFs should use html_requests because playwright will serve them (so far) in an embedded page
# @todo https://github.com/dgtlmoon/changedetection.io/issues/2019
# @todo needs a test or a fix
if self.watch.is_pdf:
prefer_fetch_backend = "html_requests"
# Grab the right kind of 'fetcher', (playwright, requests, etc)
from changedetectionio import content_fetchers
if hasattr(content_fetchers, prefer_fetch_backend):
# @todo TEMPORARY HACK - SWITCH BACK TO PLAYWRIGHT FOR BROWSERSTEPS
if prefer_fetch_backend == 'html_webdriver' and self.watch.has_browser_steps:
# This is never supported in selenium anyway
logger.warning(
"Using playwright fetcher override for possible puppeteer request in browsersteps, because puppeteer browser steps support is incomplete.")
from changedetectionio.content_fetchers.playwright import fetcher as playwright_fetcher
fetcher_obj = playwright_fetcher
else:
fetcher_obj = getattr(content_fetchers, prefer_fetch_backend)
else:
# The referenced fetcher doesn't exist, so just use a default
fetcher_obj = getattr(content_fetchers, "html_requests")
proxy_url = None
if preferred_proxy_id:
# Custom browser endpoints should NOT have a proxy added
if not prefer_fetch_backend.startswith('extra_browser_'):
proxy_url = self.datastore.proxy_list.get(preferred_proxy_id).get('url')
logger.debug(f"Selected proxy key '{preferred_proxy_id}' as proxy URL '{proxy_url}' for {url}")
else:
logger.debug("Skipping adding proxy data when custom Browser endpoint is specified. ")
logger.debug(f"Using proxy '{proxy_url}' for {self.watch['uuid']}")
# Now call the fetcher (playwright/requests/etc) with arguments that only a fetcher would need.
# When browser_connection_url is None, the fetcher should work out the best defaults itself (OS env vars etc)
self.fetcher = fetcher_obj(proxy_override=proxy_url,
custom_browser_connection_url=custom_browser_connection_url,
screenshot_format=self.screenshot_format
)
if self.watch.has_browser_steps:
self.fetcher.browser_steps = browser_steps_get_valid_steps(self.watch.get('browser_steps', []))
self.fetcher.browser_steps_screenshot_path = os.path.join(self.datastore.datastore_path, self.watch.get('uuid'))
# Tweak the base config with the per-watch ones
from changedetectionio.jinja2_custom import render as jinja_render
request_headers = CaseInsensitiveDict()
ua = self.datastore.data['settings']['requests'].get('default_ua')
if ua and ua.get(prefer_fetch_backend):
request_headers.update({'User-Agent': ua.get(prefer_fetch_backend)})
request_headers.update(self.watch.get('headers', {}))
request_headers.update(self.datastore.get_all_base_headers())
request_headers.update(self.datastore.get_all_headers_in_textfile_for_watch(uuid=self.watch.get('uuid')))
# https://github.com/psf/requests/issues/4525
# Requests doesn't yet support brotli encoding, so don't put 'br' here; be totally sure that the user cannot
# do this by accident.
if 'Accept-Encoding' in request_headers and "br" in request_headers['Accept-Encoding']:
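# Drop every 'br' token regardless of its position, spacing or q-value (e.g. "br, gzip" or "gzip,br;q=0.9")
request_headers['Accept-Encoding'] = ', '.join(enc.strip() for enc in request_headers['Accept-Encoding'].split(',') if enc.strip().split(';')[0].strip().lower() != 'br')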
for header_name in request_headers:
request_headers.update({header_name: jinja_render(template_str=request_headers.get(header_name))})
timeout = self.datastore.data['settings']['requests'].get('timeout')
request_body = self.watch.get('body')
if request_body:
request_body = jinja_render(template_str=self.watch.get('body'))
request_method = self.watch.get('method')
ignore_status_codes = self.watch.get('ignore_status_codes', False)
# Configurable per-watch or global extra delay before extracting text (for webDriver types)
system_webdriver_delay = self.datastore.data['settings']['application'].get('webdriver_delay', None)
if self.watch.get('webdriver_delay'):
self.fetcher.render_extract_delay = self.watch.get('webdriver_delay')
elif system_webdriver_delay is not None:
self.fetcher.render_extract_delay = system_webdriver_delay
if self.watch.get('webdriver_js_execute_code') is not None and self.watch.get('webdriver_js_execute_code').strip():
self.fetcher.webdriver_js_execute_code = self.watch.get('webdriver_js_execute_code')
# Requests for PDFs, images etc. should be passed the is_binary flag
is_binary = self.watch.is_pdf
# And here we go! call the right browser with browser-specific settings
empty_pages_are_a_change = self.datastore.data['settings']['application'].get('empty_pages_are_a_change', False)
# All fetchers are now async
await self.fetcher.run(
current_include_filters=self.watch.get('include_filters'),
empty_pages_are_a_change=empty_pages_are_a_change,
fetch_favicon=self.watch.favicon_is_expired(),
ignore_status_codes=ignore_status_codes,
is_binary=is_binary,
request_body=request_body,
request_headers=request_headers,
request_method=request_method,
screenshot_format=self.screenshot_format,
timeout=timeout,
url=url,
watch_uuid=self.watch_uuid,
)
# @todo .quit here could go on close object, so we can run JS if change-detected
await self.fetcher.quit(watch=self.watch)
# After init, call run_changedetection() which will do the actual change-detection
def get_extra_watch_config(self, filename):
"""
Read processor-specific JSON config file from watch data directory.
Args:
filename: Name of JSON file (e.g., "visual_ssim_score.json")
Returns:
dict: Parsed JSON data, or empty dict if file doesn't exist
"""
import json
import os
watch = self.datastore.data['watching'].get(self.watch_uuid)
data_dir = watch.data_dir
if not data_dir:
return {}
filepath = os.path.join(data_dir, filename)
if not os.path.isfile(filepath):
return {}
try:
with open(filepath, 'r', encoding='utf-8') as f:
return json.load(f)
except (json.JSONDecodeError, IOError) as e:
logger.warning(f"Failed to read extra watch config {filename}: {e}")
return {}
def update_extra_watch_config(self, filename, data, merge=True):
"""
Write processor-specific JSON config file to watch data directory.
Args:
filename: Name of JSON file (e.g., "visual_ssim_score.json")
data: Dictionary to serialize as JSON
merge: If True, merge with existing data; if False, overwrite completely
"""
import json
import os
watch = self.datastore.data['watching'].get(self.watch_uuid)
data_dir = watch.data_dir
if not data_dir:
logger.warning(f"Cannot save extra watch config {filename}: no data_dir")
return
# Ensure directory exists
watch.ensure_data_dir_exists()
filepath = os.path.join(data_dir, filename)
try:
# If merge is enabled, read existing data first
existing_data = {}
if merge and os.path.isfile(filepath):
try:
with open(filepath, 'r', encoding='utf-8') as f:
existing_data = json.load(f)
except (json.JSONDecodeError, IOError) as e:
logger.warning(f"Failed to read existing config for merge: {e}")
# Merge new data with existing
if merge:
existing_data.update(data)
data_to_save = existing_data
else:
data_to_save = data
# Write the data
with open(filepath, 'w', encoding='utf-8') as f:
json.dump(data_to_save, f, indent=2)
except IOError as e:
logger.error(f"Failed to write extra watch config {filename}: {e}")
def get_raw_document_checksum(self):
checksum = None
if self.fetcher.content:
checksum = hashlib.md5(self.fetcher.content.encode('utf-8')).hexdigest()
return checksum
@abstractmethod
def run_changedetection(self, watch, force_reprocess=False):
update_obj = {'last_notification_error': False, 'last_error': False}
some_data = 'xxxxx'
update_obj["previous_md5"] = hashlib.md5(some_data.encode('utf-8')).hexdigest()
changed_detected = False
return changed_detected, update_obj, ''.encode('utf-8')


@@ -0,0 +1,132 @@
"""
Base data extraction module for all processors.
This module handles extracting data from watch history using regex patterns
and exporting to CSV format. This is the default extractor that all processors
(text_json_diff, restock_diff, etc.) can use by default or override.
"""
import os
from flask_babel import gettext
from loguru import logger
def render_form(watch, datastore, request, url_for, render_template, flash, redirect, extract_form=None):
"""
Render the data extraction form.
Args:
watch: The watch object
datastore: The ChangeDetectionStore instance
request: Flask request object
url_for: Flask url_for function
render_template: Flask render_template function
flash: Flask flash function
redirect: Flask redirect function
extract_form: Optional pre-built extract form (for error cases)
Returns:
Rendered HTML response with the extraction form
"""
from changedetectionio import forms
uuid = watch.get('uuid')
# Use provided form or create a new one
if extract_form is None:
extract_form = forms.extractDataForm(
formdata=request.form,
data={'extract_regex': request.form.get('extract_regex', '')}
)
# Get error information for the template
screenshot_url = watch.get_screenshot()
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
password_enabled_and_share_is_off = False
if datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False):
password_enabled_and_share_is_off = not datastore.data['settings']['application'].get('shared_diff_access')
# Use the shared default template from processors/templates/
# Processors can override this by creating their own extract.py with custom template logic
output = render_template(
"extract.html",
uuid=uuid,
extract_form=extract_form,
watch_a=watch,
last_error=watch['last_error'],
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
screenshot=screenshot_url,
is_html_webdriver=is_html_webdriver,
password_enabled_and_share_is_off=password_enabled_and_share_is_off,
extra_title=f" - {watch.label} - Extract Data",
extra_stylesheets=[url_for('static_content', group='styles', filename='diff.css')],
pure_menu_fixed=False
)
return output
def process_extraction(watch, datastore, request, url_for, make_response, send_from_directory, flash, redirect, extract_form=None):
"""
Process the data extraction request and return CSV file.
Args:
watch: The watch object
datastore: The ChangeDetectionStore instance
request: Flask request object
url_for: Flask url_for function
make_response: Flask make_response function
send_from_directory: Flask send_from_directory function
flash: Flask flash function
redirect: Flask redirect function
extract_form: Optional pre-built extract form
Returns:
CSV file download response or redirect to form on error
"""
from changedetectionio import forms
uuid = watch.get('uuid')
# Use provided form or create a new one
if extract_form is None:
extract_form = forms.extractDataForm(
formdata=request.form,
data={'extract_regex': request.form.get('extract_regex', '')}
)
if not extract_form.validate():
flash(gettext("An error occurred, please see below."), "error")
# render_template needs to be imported from Flask for this to work
from flask import render_template as flask_render_template
return render_form(
watch=watch,
datastore=datastore,
request=request,
url_for=url_for,
render_template=flask_render_template,
flash=flash,
redirect=redirect,
extract_form=extract_form
)
extract_regex = request.form.get('extract_regex', '').strip()
output = watch.extract_regex_from_all_history(extract_regex)
if output:
watch_dir = os.path.join(datastore.datastore_path, uuid)
response = make_response(send_from_directory(directory=watch_dir, path=output, as_attachment=True))
response.headers['Content-type'] = 'text/csv'
response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
response.headers['Pragma'] = 'no-cache'
response.headers['Expires'] = "0"
return response
flash(gettext('No matches found while scanning all of the watch history for that RegEx.'), 'error')
return redirect(url_for('ui.ui_diff.diff_history_page_extract_GET', uuid=uuid))


@@ -0,0 +1,210 @@
# Fast Screenshot Comparison Processor
Visual/screenshot change detection using ultra-fast image comparison algorithms.
## Overview
This processor uses **OpenCV** by default for screenshot comparison, providing **50-100x faster** performance compared to the previous SSIM implementation while still detecting meaningful visual changes.
## Current Features
- **Ultra-fast OpenCV comparison**: cv2.absdiff with Gaussian blur for noise reduction
- **MD5 pre-check**: Fast identical image detection before expensive comparison
- **Configurable sensitivity**: Threshold-based change detection
- **Three-panel diff view**: Previous | Current | Difference (with red highlights)
- **Direct image support**: Works with browser screenshots AND direct image URLs
- **Visual selector support**: Compare specific page regions using CSS/XPath selectors
- **Download images**: Download any of the three comparison images directly from the diff view
## Performance
- **OpenCV (default)**: 50-100x faster than SSIM
- **Large screenshots**: Automatic downscaling for diff visualization (configurable via `MAX_DIFF_HEIGHT`/`MAX_DIFF_WIDTH`)
- **Memory efficient**: Explicit cleanup of large objects for long-running processes
- **JPEG diff images**: Smaller file sizes, faster rendering
## How It Works
1. **Fetch**: Screenshot captured via browser OR direct image URL fetched
2. **MD5 Check**: Quick hash comparison - if identical, skip comparison
3. **Region Selection** (optional): Crop to specific page region if visual selector is configured
4. **OpenCV Comparison**: Fast pixel-level difference detection with Gaussian blur
5. **Change Detection**: Percentage of changed pixels above threshold = change detected
6. **Visualization**: Generate diff image with red-highlighted changed regions
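The steps above can be sketched in plain Python. This is a minimal illustration, not the processor's actual code: the function names (`images_identical`, `change_percentage`, `is_change`) are hypothetical, images are modelled as flat lists of grayscale values so the sketch runs without OpenCV, and treating "above threshold" as a strict `>` comparison is an assumption.

```python
import hashlib


def images_identical(img_bytes_a: bytes, img_bytes_b: bytes) -> bool:
    """Step 2: cheap MD5 pre-check on the raw image bytes."""
    return hashlib.md5(img_bytes_a).digest() == hashlib.md5(img_bytes_b).digest()


def change_percentage(gray_a, gray_b, pixel_threshold=30):
    """Steps 4-5: share of pixels whose absolute difference exceeds the threshold."""
    if len(gray_a) != len(gray_b):
        raise ValueError("images must have the same number of pixels")
    if not gray_a:
        return 0.0
    changed = sum(1 for a, b in zip(gray_a, gray_b) if abs(a - b) > pixel_threshold)
    return changed / len(gray_a) * 100


def is_change(gray_a, gray_b, pixel_threshold=30, min_change_percent=0.1):
    """Step 5: report a change only when enough pixels differ (assumed strict '>')."""
    return change_percentage(gray_a, gray_b, pixel_threshold) > min_change_percent
```

Running the MD5 check first means the pixel comparison is skipped entirely for byte-identical screenshots, which is the common case for pages that have not changed.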
## Architecture
### Default Method: OpenCV
The processor uses OpenCV's `cv2.absdiff()` for ultra-fast pixel-level comparison:
```python
import cv2
# image_from / image_to: RGB numpy arrays with identical dimensions
# Convert to grayscale
gray_from = cv2.cvtColor(image_from, cv2.COLOR_RGB2GRAY)
gray_to = cv2.cvtColor(image_to, cv2.COLOR_RGB2GRAY)
# Apply Gaussian blur (reduces noise, controlled by OPENCV_BLUR_SIGMA env var)
# ksize=(0, 0) lets OpenCV derive the kernel size from sigmaX
gray_from = cv2.GaussianBlur(gray_from, (0, 0), sigmaX=0.8)
gray_to = cv2.GaussianBlur(gray_to, (0, 0), sigmaX=0.8)
# Calculate absolute difference
diff = cv2.absdiff(gray_from, gray_to)
# Apply threshold (default: 30)
threshold = 30
_, thresh = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
# Count changed pixels (non-zero entries in the binary mask)
changed_pixels = cv2.countNonZero(thresh)
total_pixels = thresh.size
change_percentage = (changed_pixels / total_pixels) * 100
```
### Optional: Pixelmatch
For users who need better anti-aliasing detection (especially for text-heavy screenshots), **pixelmatch** can be optionally installed:
```bash
pip install "pybind11-pixelmatch>=0.1.3"
```
**Note**: Pixelmatch uses a C++17 implementation via pybind11 and may have build issues on some platforms (particularly Alpine/musl systems with symbolic link security restrictions). The application will automatically fall back to OpenCV if pixelmatch is not available.
To use pixelmatch instead of OpenCV, set the environment variable:
```bash
COMPARISON_METHOD=pixelmatch
```
#### When to use pixelmatch:
- Screenshots with lots of text and anti-aliasing
- Need to ignore minor font rendering differences between browser versions
- 10-20x faster than SSIM (but slower than OpenCV)
#### When to stick with OpenCV (default):
- General webpage monitoring
- Maximum performance (50-100x faster than SSIM)
- Simple pixel-level change detection
- Avoid build dependencies (Alpine/musl systems)
## Configuration
### Environment Variables
```bash
# Comparison method (opencv or pixelmatch)
COMPARISON_METHOD=opencv # Default
# OpenCV threshold (0-255, lower = more sensitive)
COMPARISON_THRESHOLD_OPENCV=30 # Default
# Pixelmatch threshold (0-100, mapped to 0-1 scale)
COMPARISON_THRESHOLD_PIXELMATCH=10 # Default
# Gaussian blur sigma for OpenCV (0 = no blur, higher = more blur)
OPENCV_BLUR_SIGMA=3.0 # Default
# Minimum change percentage to trigger detection
OPENCV_MIN_CHANGE_PERCENT=0.1 # Default (0.1%)
PIXELMATCH_MIN_CHANGE_PERCENT=0.1 # Default
# Diff visualization image size limits (pixels)
MAX_DIFF_HEIGHT=8000 # Default
MAX_DIFF_WIDTH=900 # Default
```
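As a rough illustration of how these variables could be consumed, the sketch below reads a subset of them with the defaults documented above. The helper name `load_comparison_settings` and the returned dictionary keys are hypothetical; the processor's own loading code may differ.

```python
import os


def load_comparison_settings(env=None):
    """Read a subset of the documented variables, using the documented defaults."""
    env = os.environ if env is None else env
    return {
        "method": env.get("COMPARISON_METHOD", "opencv").strip().lower(),
        # 0-255 pixel difference, lower = more sensitive
        "opencv_threshold": int(env.get("COMPARISON_THRESHOLD_OPENCV", "30")),
        # Documented as 0-100, mapped here onto a 0-1 scale
        "pixelmatch_threshold": int(env.get("COMPARISON_THRESHOLD_PIXELMATCH", "10")) / 100,
        "opencv_min_change_percent": float(env.get("OPENCV_MIN_CHANGE_PERCENT", "0.1")),
        "max_diff_height": int(env.get("MAX_DIFF_HEIGHT", "8000")),
        "max_diff_width": int(env.get("MAX_DIFF_WIDTH", "900")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.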
### Per-Watch Configuration
- **Comparison Threshold**: Can be configured per-watch in the edit form (pixel difference threshold, lower = more sensitive)
  - Low sensitivity (200) - Only major changes
  - Medium sensitivity (80) - Moderate changes (recommended)
  - High sensitivity (20) - Small changes
  - Very high sensitivity (0) - Any change
### Visual Selector (Region Comparison)
Use the "Include filters" field with CSS selectors or XPath to compare only specific page regions:
```
.content-area
//div[@id='main']
```
The processor will automatically crop both screenshots to the bounding box of the first matched element.
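The cropping step can be pictured as a clamped slice of the pixel grid. The sketch below is only an illustration under stated assumptions: `crop_to_bbox` is a hypothetical helper, the image is modelled as a list of pixel rows, and the `left`/`top`/`width`/`height` parameters mirror the element geometry the fetcher records for matched elements.

```python
def crop_to_bbox(rows, left, top, width, height):
    """Return the sub-image covered by the box, clamped to the image bounds."""
    img_h = len(rows)
    img_w = len(rows[0]) if rows else 0
    # Clamp the box so that an element partly outside the screenshot still works
    x0 = max(0, min(left, img_w))
    y0 = max(0, min(top, img_h))
    x1 = max(x0, min(left + width, img_w))
    y1 = max(y0, min(top + height, img_h))
    return [row[x0:x1] for row in rows[y0:y1]]
```

Both screenshots would be cropped with the same box before comparison, so that changes outside the selected region cannot contribute to the change percentage.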
## Dependencies
### Required
- `opencv-python-headless>=4.8.0.76` - Fast image comparison
- `Pillow (PIL)` - Image loading and manipulation
- `numpy` - Array operations
### Optional
- `pybind11-pixelmatch>=0.1.3` - Alternative comparison method with anti-aliasing detection
## Change Detection Interpretation
- **0%** = Identical images (or below minimum change threshold)
- **0.1-1%** = Minor differences (anti-aliasing, slight rendering differences)
- **1-5%** = Noticeable changes (text updates, small content changes)
- **5-20%** = Significant changes (layout shifts, content additions)
- **>20%** = Major differences (page redesign, large content changes)
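For logging or notifications, the bands above could be turned into labels with a small helper. This is a sketch only: `describe_change` is a hypothetical name, and because the documented ranges share their end points, the choice of which band owns each boundary value is an assumption.

```python
def describe_change(percent):
    """Map a change percentage onto the interpretation bands listed above."""
    if percent < 0.1:
        return "identical or below minimum change threshold"
    if percent < 1:
        return "minor differences"
    if percent < 5:
        return "noticeable changes"
    if percent <= 20:
        return "significant changes"
    return "major differences"
```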
## Technical Notes
### Memory Management
```python
# Explicit cleanup for long-running processes
img.close() # Close PIL Images
buffer.close() # Close BytesIO buffers
del large_array # Mark numpy arrays for GC
```
### Diff Image Generation
- Format: JPEG (quality=85, optimized)
- Highlight: Red overlay (50% blend with original)
- Auto-downscaling: Large screenshots downscaled for faster rendering
- Base64 embedded: For direct template rendering
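The red highlight amounts to a per-pixel blend between the original colour and pure red. The following pure-Python sketch shows the arithmetic for a 50% blend; `blend_with_red` and `highlight_changes` are hypothetical names, pixels are modelled as `(r, g, b)` tuples, and the real processor performs the equivalent operation on whole image arrays.

```python
def blend_with_red(rgb, alpha=0.5):
    """Blend one RGB pixel with pure red (255, 0, 0); results are truncated to int."""
    r, g, b = rgb
    return (
        int(r * (1 - alpha) + 255 * alpha),
        int(g * (1 - alpha)),
        int(b * (1 - alpha)),
    )


def highlight_changes(pixels, changed_mask, alpha=0.5):
    """Apply the red blend only where the mask marks a pixel as changed."""
    return [
        blend_with_red(px, alpha) if changed else px
        for px, changed in zip(pixels, changed_mask)
    ]
```

Because unchanged pixels pass through untouched, the original page remains readable underneath the overlay.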
### OpenCV Blur Parameters
The Gaussian blur reduces sensitivity to:
- Font rendering differences
- Anti-aliasing variations
- JPEG compression artifacts
- Minor pixel shifts (1-2 pixels)
Increase `OPENCV_BLUR_SIGMA` to make comparison more tolerant of these differences.
## Comparison: OpenCV vs Pixelmatch vs SSIM
| Feature | OpenCV | Pixelmatch | SSIM (old) |
|---------|--------|------------|------------|
| **Speed** | 50-100x faster | 10-20x faster | Baseline |
| **Anti-aliasing** | Via blur | Built-in detection | Built-in |
| **Text sensitivity** | High | Medium (AA-aware) | Medium |
| **Dependencies** | opencv-python-headless | pybind11-pixelmatch + C++ compiler | scikit-image |
| **Alpine/musl support** | ✅ Yes | ⚠️ Build issues | ✅ Yes |
| **Memory usage** | Low | Low | High |
| **Best for** | General use, max speed | Text-heavy screenshots | Deprecated |
## Migration from SSIM
If you're upgrading from the old SSIM-based processor:
1. **Thresholds are different**: SSIM used a 0-1 similarity score (higher = more similar); OpenCV uses a 0-255 per-pixel difference threshold (lower = more sensitive)
2. **Default threshold**: Start with 30 for OpenCV, adjust based on your needs
3. **Performance**: Expect dramatically faster comparisons, especially for large screenshots
4. **Accuracy**: OpenCV is more sensitive to pixel-level changes; increase `OPENCV_BLUR_SIGMA` if you're getting false positives
## Future Enhancements
Potential features for future consideration:
- **Change region detection**: Highlight specific areas that changed with bounding boxes
- **Perceptual hashing**: Pre-screening filter for even faster checks
- **Ignore regions**: Exclude specific page areas (ads, timestamps) from comparison
- **Text extraction**: OCR-based text comparison for semantic changes
- **Adaptive thresholds**: Different sensitivity for different page regions
## Resources
- [OpenCV Documentation](https://docs.opencv.org/)
- [pybind11-pixelmatch GitHub](https://github.com/whtsky/pybind11-pixelmatch)
- [Pixelmatch (original JS library)](https://github.com/mapbox/pixelmatch)


@@ -0,0 +1,39 @@
"""
Visual/screenshot change detection using fast image comparison algorithms.
This processor compares screenshots using OpenCV (cv2.absdiff),
which is 10-100x faster than SSIM while still detecting meaningful visual changes.
"""
import os
from pathlib import Path
processor_description = "Visual/Screenshot change detection (Fast)"
processor_name = "image_ssim_diff"
processor_weight = 2 # Lower weight = appears at top, heavier weight = appears lower (bottom)
# Processor capabilities
supports_visual_selector = True
supports_browser_steps = True
supports_text_filters_and_triggers = False
supports_text_filters_and_triggers_elements = False
supports_request_type = True
PROCESSOR_CONFIG_NAME = f"{Path(__file__).parent.name}.json"
# Subprocess timeout settings
# Maximum time to wait for subprocess operations (seconds)
POLL_TIMEOUT_ABSOLUTE = int(os.getenv('OPENCV_SUBPROCESS_TIMEOUT', '20'))
# Template tracking filename
CROPPED_IMAGE_TEMPLATE_FILENAME = 'cropped_image_template.png'
SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS = [
('200', 'Low sensitivity (only major changes)'),
('80', 'Medium sensitivity (moderate changes - recommended)'),
('20', 'High sensitivity (small changes)'),
('0', 'Very high sensitivity (any change)')
]
SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS_DEFAULT=0.999
OPENCV_BLUR_SIGMA=float(os.getenv("OPENCV_BLUR_SIGMA", "3.0"))


@@ -0,0 +1,441 @@
"""
Screenshot diff visualization for fast image comparison processor.
All image operations now use ImageDiffHandler abstraction for clean separation
of concerns and easy backend swapping (LibVIPS, OpenCV, PIL, etc.).
"""
import os
import json
import time
from flask_babel import gettext
from loguru import logger
from changedetectionio.processors.image_ssim_diff import SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS_DEFAULT, PROCESSOR_CONFIG_NAME, \
OPENCV_BLUR_SIGMA
# All image operations now use OpenCV via isolated_opencv subprocess handler
# No direct handler imports needed - subprocess isolation handles everything
# Maximum dimensions for diff visualization (can be overridden via environment variable)
# Large screenshots don't need full resolution for visual inspection
# Defaults cap the diff image at 8000px high and 900px wide to limit memory usage
MAX_DIFF_HEIGHT = int(os.getenv('MAX_DIFF_HEIGHT', '8000'))
MAX_DIFF_WIDTH = int(os.getenv('MAX_DIFF_WIDTH', '900'))
def get_asset(asset_name, watch, datastore, request):
"""
Get processor-specific binary assets for streaming.
The diff image is generated by the isolated_opencv subprocess handler, which keeps
the memory used by large images isolated from the main process.
Supported assets:
- 'before': The previous/from screenshot
- 'after': The current/to screenshot
- 'rendered_diff': The generated diff visualization with red highlights
Args:
asset_name: Name of the asset to retrieve ('before', 'after', 'rendered_diff')
watch: Watch object
datastore: Datastore object
request: Flask request (for from_version/to_version query params)
Returns:
tuple: (binary_data, content_type, cache_control_header) or None if not found
"""
# Get version parameters from query string
versions = list(watch.history.keys())
if len(versions) < 2:
return None
from_version = request.args.get('from_version', versions[-2] if len(versions) >= 2 else versions[0])
to_version = request.args.get('to_version', versions[-1])
# Validate versions exist
if from_version not in versions:
from_version = versions[-2] if len(versions) >= 2 else versions[0]
if to_version not in versions:
to_version = versions[-1]
try:
if asset_name == 'before':
# Return the 'from' screenshot with bounding box if configured
img_bytes = watch.get_history_snapshot(timestamp=from_version)
img_bytes = _draw_bounding_box_if_configured(img_bytes, watch, datastore)
mime_type = _detect_mime_type(img_bytes)
return (img_bytes, mime_type, 'public, max-age=3600')
elif asset_name == 'after':
# Return the 'to' screenshot with bounding box if configured
img_bytes = watch.get_history_snapshot(timestamp=to_version)
img_bytes = _draw_bounding_box_if_configured(img_bytes, watch, datastore)
mime_type = _detect_mime_type(img_bytes)
return (img_bytes, mime_type, 'public, max-age=3600')
elif asset_name == 'rendered_diff':
# Generate diff in isolated subprocess to prevent memory leaks
# Subprocess provides complete memory isolation
from .image_handler import isolated_opencv as process_screenshot_handler
img_bytes_from = watch.get_history_snapshot(timestamp=from_version)
img_bytes_to = watch.get_history_snapshot(timestamp=to_version)
# Get pixel difference threshold sensitivity (per-watch > global)
# This controls how different a pixel must be (0-255 scale) to count as "changed"
from changedetectionio import processors
processor_instance = processors.difference_detection_processor(datastore, watch.get('uuid'))
processor_config = processor_instance.get_extra_watch_config(PROCESSOR_CONFIG_NAME)
pixel_difference_threshold_sensitivity = processor_config.get('pixel_difference_threshold_sensitivity')
if not pixel_difference_threshold_sensitivity:
pixel_difference_threshold_sensitivity = datastore.data['settings']['application'].get(
'pixel_difference_threshold_sensitivity', SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS_DEFAULT)
try:
pixel_difference_threshold_sensitivity = int(pixel_difference_threshold_sensitivity)
except (ValueError, TypeError):
logger.warning(
f"Invalid pixel_difference_threshold_sensitivity value '{pixel_difference_threshold_sensitivity}', using default")
pixel_difference_threshold_sensitivity = SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS_DEFAULT
logger.debug(f"Pixel difference threshold sensitivity is {pixel_difference_threshold_sensitivity}")
# Generate diff in isolated subprocess (async-safe)
import asyncio
import threading
# Async-safe wrapper: runs coroutine in new thread with its own event loop
def run_async_in_thread():
return asyncio.run(
process_screenshot_handler.generate_diff_isolated(
img_bytes_from,
img_bytes_to,
pixel_difference_threshold=int(pixel_difference_threshold_sensitivity),
blur_sigma=OPENCV_BLUR_SIGMA,
max_width=MAX_DIFF_WIDTH,
max_height=MAX_DIFF_HEIGHT
)
)
# Run in thread to avoid blocking event loop if called from async context
result_container = [None]
exception_container = [None]
def thread_target():
try:
result_container[0] = run_async_in_thread()
except Exception as e:
exception_container[0] = e
thread = threading.Thread(target=thread_target, daemon=True, name="ImageDiff-Asset")
thread.start()
thread.join(timeout=60)
if exception_container[0]:
raise exception_container[0]
diff_image_bytes = result_container[0]
if diff_image_bytes:
# Note: Bounding box drawing on diff not yet implemented
return (diff_image_bytes, 'image/jpeg', 'public, max-age=300')
else:
logger.error("Failed to generate diff in subprocess")
return None
else:
# Unknown asset
return None
except Exception as e:
logger.error(f"Failed to get asset '{asset_name}': {e}")
import traceback
logger.error(traceback.format_exc())
return None
def _detect_mime_type(img_bytes):
"""
Detect MIME type using puremagic (same as Watch.py).
Args:
img_bytes: Image bytes
Returns:
str: MIME type (e.g., 'image/png', 'image/jpeg')
"""
try:
import puremagic
detections = puremagic.magic_string(img_bytes[:2048])
if detections:
mime_type = detections[0].mime_type
logger.trace(f"Detected MIME type: {mime_type}")
return mime_type
else:
logger.trace("No MIME type detected, using 'image/png' fallback")
return 'image/png'
except Exception as e:
logger.warning(f"puremagic detection failed: {e}, using 'image/png' fallback")
return 'image/png'
def _draw_bounding_box_if_configured(img_bytes, watch, datastore):
"""
Draw blue bounding box on image if configured in processor settings.
Uses isolated subprocess to prevent memory leaks from large images.
Supports two modes:
- "Select by element": Use include_filter to find xpath element bbox
- "Draw area": Use manually drawn bounding_box from config
Args:
img_bytes: Image bytes (PNG)
watch: Watch object
datastore: Datastore object
Returns:
Image bytes (possibly with bounding box drawn)
"""
try:
# Get processor configuration
from changedetectionio import processors
processor_instance = processors.difference_detection_processor(datastore, watch.get('uuid'))
processor_name = watch.get('processor', 'default')
config_filename = f'{processor_name}.json'
processor_config = processor_instance.get_extra_watch_config(config_filename)
if not processor_config:
return img_bytes
selection_mode = processor_config.get('selection_mode', 'draw')
x, y, width, height = None, None, None, None
# Mode 1: Select by element (use include_filter + xpath_data)
if selection_mode == 'element':
include_filters = watch.get('include_filters', [])
if include_filters and len(include_filters) > 0:
first_filter = include_filters[0].strip()
# Get xpath_data from watch history
history_keys = list(watch.history.keys())
if history_keys:
latest_snapshot = watch.get_history_snapshot(timestamp=history_keys[-1])
xpath_data_path = watch.get_xpath_data_filepath(timestamp=history_keys[-1])
try:
import gzip
with gzip.open(xpath_data_path, 'rt') as f:
xpath_data = json.load(f)
# Find matching element
for element in xpath_data.get('size_pos', []):
if element.get('xpath') == first_filter and element.get('highlight_as_custom_filter'):
x = element.get('left', 0)
y = element.get('top', 0)
width = element.get('width', 0)
height = element.get('height', 0)
logger.debug(f"Found element bbox for filter '{first_filter}': x={x}, y={y}, w={width}, h={height}")
break
except Exception as e:
logger.warning(f"Failed to load xpath_data for element selection: {e}")
# Mode 2: Draw area (use manually configured bbox)
else:
bounding_box = processor_config.get('bounding_box')
if bounding_box:
# Parse bounding box: "x,y,width,height"
parts = [int(p.strip()) for p in bounding_box.split(',')]
if len(parts) == 4:
x, y, width, height = parts
else:
logger.warning(f"Invalid bounding box format: {bounding_box}")
# If no bbox found, return original image
if x is None or y is None or width is None or height is None:
return img_bytes
# Use isolated subprocess to prevent memory leaks from large images
from .image_handler import isolated_opencv
import asyncio
import threading
# Async-safe wrapper: runs coroutine in new thread with its own event loop
# This prevents blocking when called from async context (update worker)
def run_async_in_thread():
return asyncio.run(
isolated_opencv.draw_bounding_box_isolated(
img_bytes, x, y, width, height,
color=(255, 0, 0), # Blue in BGR format
thickness=3
)
)
# Always run in thread to avoid blocking event loop if called from async context
result_container = [None]
exception_container = [None]
def thread_target():
try:
result_container[0] = run_async_in_thread()
except Exception as e:
exception_container[0] = e
thread = threading.Thread(target=thread_target, daemon=True, name="ImageDiff-BoundingBox")
thread.start()
thread.join(timeout=15)
if exception_container[0]:
raise exception_container[0]
result = result_container[0]
# Return result or original if subprocess failed
return result if result else img_bytes
except Exception as e:
logger.warning(f"Failed to draw bounding box: {e}")
import traceback
logger.debug(traceback.format_exc())
return img_bytes
def render(watch, datastore, request, url_for, render_template, flash, redirect):
"""
Render the screenshot comparison diff page.
Uses ImageDiffHandler for all image operations.
Args:
watch: Watch object
datastore: Datastore object
request: Flask request
url_for: Flask url_for function
render_template: Flask render_template function
flash: Flask flash function
redirect: Flask redirect function
Returns:
Rendered template or redirect
"""
# Get version parameters (from_version, to_version)
versions = list(watch.history.keys())
if len(versions) < 2:
flash(gettext("Not enough history to compare. Need at least 2 snapshots."), "error")
return redirect(url_for('watchlist.index'))
# Default: compare latest two versions
from_version = request.args.get('from_version', versions[-2] if len(versions) >= 2 else versions[0])
to_version = request.args.get('to_version', versions[-1])
# Validate versions exist
if from_version not in versions:
from_version = versions[-2] if len(versions) >= 2 else versions[0]
if to_version not in versions:
to_version = versions[-1]
# Get pixel difference threshold sensitivity (per-watch > global > env default)
pixel_difference_threshold_sensitivity = watch.get('pixel_difference_threshold_sensitivity')
if not pixel_difference_threshold_sensitivity or pixel_difference_threshold_sensitivity == '':
pixel_difference_threshold_sensitivity = datastore.data['settings']['application'].get('pixel_difference_threshold_sensitivity', SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS_DEFAULT)
# Convert to appropriate type
try:
pixel_difference_threshold_sensitivity = float(pixel_difference_threshold_sensitivity)
except (ValueError, TypeError):
logger.warning(f"Invalid pixel_difference_threshold_sensitivity value '{pixel_difference_threshold_sensitivity}', using default")
pixel_difference_threshold_sensitivity = 30.0
# Get blur sigma
blur_sigma = OPENCV_BLUR_SIGMA
# Load screenshots from history
try:
img_bytes_from = watch.get_history_snapshot(timestamp=from_version)
img_bytes_to = watch.get_history_snapshot(timestamp=to_version)
except Exception as e:
logger.error(f"Failed to load screenshots: {e}")
flash(gettext("Failed to load screenshots: {}").format(e), "error")
return redirect(url_for('watchlist.index'))
# Calculate change percentage using isolated subprocess to prevent memory leaks (async-safe)
now = time.time()
try:
from .image_handler import isolated_opencv as process_screenshot_handler
import asyncio
import threading
# Async-safe wrapper: runs coroutine in new thread with its own event loop
def run_async_in_thread():
return asyncio.run(
process_screenshot_handler.calculate_change_percentage_isolated(
img_bytes_from,
img_bytes_to,
pixel_difference_threshold=int(pixel_difference_threshold_sensitivity),
blur_sigma=blur_sigma,
max_width=MAX_DIFF_WIDTH,
max_height=MAX_DIFF_HEIGHT
)
)
# Run in thread to avoid blocking event loop if called from async context
result_container = [None]
exception_container = [None]
def thread_target():
try:
result_container[0] = run_async_in_thread()
except Exception as e:
exception_container[0] = e
thread = threading.Thread(target=thread_target, daemon=True, name="ImageDiff-ChangePercentage")
thread.start()
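thread.join(timeout=60)
# join() also returns when the timeout expires, so check that the worker actually finished
if thread.is_alive():
raise TimeoutError("Change percentage calculation timed out after 60 seconds")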
if exception_container[0]:
raise exception_container[0]
change_percentage = result_container[0]
method_display = f"{process_screenshot_handler.IMPLEMENTATION_NAME} (pixel_diff_threshold: {pixel_difference_threshold_sensitivity:.0f})"
logger.debug(f"Done change percentage calculation in {time.time() - now:.2f}s")
except Exception as e:
logger.error(f"Failed to calculate change percentage: {e}")
import traceback
logger.error(traceback.format_exc())
flash(gettext("Failed to calculate diff: {}").format(e), "error")
return redirect(url_for('watchlist.index'))
# Load historical data if available (for charts/visualization)
comparison_data = {}
comparison_config_path = os.path.join(watch.data_dir, "visual_comparison_data.json")
if os.path.isfile(comparison_config_path):
try:
with open(comparison_config_path, 'r') as f:
comparison_data = json.load(f)
except Exception as e:
logger.warning(f"Failed to load comparison history data: {e}")
# Render custom template
# Template path is namespaced to avoid conflicts with other processors
# Images are now served via separate /processor-asset/ endpoints instead of base64
return render_template(
'image_ssim_diff/diff.html',
change_percentage=change_percentage,
comparison_data=comparison_data, # Full history for charts/visualization
comparison_method=method_display,
current_diff_url=watch['url'],
from_version=from_version,
percentage_different=change_percentage,
threshold=pixel_difference_threshold_sensitivity,
to_version=to_version,
uuid=watch.get('uuid'),
versions=versions,
watch=watch,
)
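
The thread wrapper above joins with a timeout, so the caller has to distinguish "finished" from "still running". A minimal standalone sketch of that pattern follows. `run_coroutine_in_thread`, `fake_change_percentage` and `slow_job` are hypothetical names used only for illustration and are not part of this changeset.

```python
import asyncio
import threading


def run_coroutine_in_thread(coro_factory, timeout):
    """Run a coroutine in a worker thread that owns its own event loop.

    coro_factory is a zero-argument callable returning a fresh coroutine.
    Raises TimeoutError if the worker is still running after `timeout` seconds.
    """
    result = [None]
    error = [None]

    def target():
        try:
            result[0] = asyncio.run(coro_factory())
        except Exception as exc:  # hand the exception back to the caller
            error[0] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout=timeout)

    if worker.is_alive():
        # join() returned because the timeout expired, not because work finished
        raise TimeoutError(f"worker still running after {timeout}s")
    if error[0] is not None:
        raise error[0]
    return result[0]


async def fake_change_percentage():
    # Stand-in for the real comparison coroutine (assumption for this sketch)
    await asyncio.sleep(0.01)
    return 12.5


async def slow_job():
    # Deliberately slower than the timeout used in the usage example
    await asyncio.sleep(1.0)
    return 0.0
```

Calling `run_coroutine_in_thread(fake_change_percentage, timeout=5)` returns the coroutine's result, while `run_coroutine_in_thread(slow_job, timeout=0.05)` raises `TimeoutError` instead of silently returning `None`.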

@@ -0,0 +1,151 @@
"""
Optional hook called when processor settings are saved in edit page.
This hook analyzes the selected region to determine if template matching
should be enabled for tracking content movement.
Template matching is controlled via ENABLE_TEMPLATE_TRACKING env var (default: False).
"""
import io
import os
from loguru import logger
from changedetectionio import strtobool
from . import CROPPED_IMAGE_TEMPLATE_FILENAME
# Template matching controlled via environment variable (default: disabled)
# Set ENABLE_TEMPLATE_TRACKING=True to enable
TEMPLATE_MATCHING_ENABLED = strtobool(os.getenv('ENABLE_TEMPLATE_TRACKING', 'False'))
IMPORT_ERROR = "Template matching disabled (set ENABLE_TEMPLATE_TRACKING=True to enable)"
def on_config_save(watch, processor_config, datastore):
"""
Called after processor config is saved in edit page.
Analyzes the bounding box region to determine if it has enough
visual features (texture/edges) to enable template matching for
tracking content movement when page layout shifts.
Args:
watch: Watch object
processor_config: Dict of processor-specific config
datastore: Datastore object
Returns:
dict: Updated processor_config with auto_track_region setting
"""
# Check if template matching is globally enabled via ENV var
if not TEMPLATE_MATCHING_ENABLED:
logger.debug("Template tracking disabled via ENABLE_TEMPLATE_TRACKING env var")
processor_config['auto_track_region'] = False
return processor_config
bounding_box = processor_config.get('bounding_box')
if not bounding_box:
# No bounding box, disable tracking
processor_config['auto_track_region'] = False
logger.debug("No bounding box set, disabled auto-tracking")
return processor_config
try:
# Get the latest screenshot from watch history
history_keys = list(watch.history.keys())
if len(history_keys) == 0:
logger.warning("No screenshot history available yet, cannot analyze for tracking")
processor_config['auto_track_region'] = False
return processor_config
# Get latest screenshot
latest_timestamp = history_keys[-1]
screenshot_bytes = watch.get_history_snapshot(timestamp=latest_timestamp)
if not screenshot_bytes:
logger.warning("Could not load screenshot for analysis")
processor_config['auto_track_region'] = False
return processor_config
# Parse bounding box
parts = [int(p.strip()) for p in bounding_box.split(',')]
if len(parts) != 4:
logger.warning("Invalid bounding box format")
processor_config['auto_track_region'] = False
return processor_config
x, y, width, height = parts
# Analyze the region for features/texture
has_enough_features = analyze_region_features(screenshot_bytes, x, y, width, height)
if has_enough_features:
logger.info("Region has sufficient features for tracking - enabling auto_track_region")
processor_config['auto_track_region'] = True
# Save the template as CROPPED_IMAGE_TEMPLATE_FILENAME in the watch data directory
save_template_to_file(watch, screenshot_bytes, x, y, width, height)
else:
logger.info("Region lacks distinctive features - disabling auto_track_region")
processor_config['auto_track_region'] = False
# Remove old template file if exists
template_path = os.path.join(watch.data_dir, CROPPED_IMAGE_TEMPLATE_FILENAME)
if os.path.exists(template_path):
os.remove(template_path)
logger.debug(f"Removed old template file: {template_path}")
return processor_config
except Exception as e:
logger.error(f"Error analyzing region for tracking: {e}")
processor_config['auto_track_region'] = False
return processor_config
def analyze_region_features(screenshot_bytes, x, y, width, height):
"""
Analyze if a region has enough visual features for template matching.
Uses OpenCV to detect corners/edges. If the region has distinctive
features, template matching can reliably track it when it moves.
Args:
screenshot_bytes: Full screenshot as bytes
x, y, width, height: Bounding box coordinates
Returns:
bool: True if region has enough features, False otherwise
"""
# Template matching disabled - would need OpenCV implementation for region analysis
if not TEMPLATE_MATCHING_ENABLED:
logger.warning(f"Cannot analyze region features: {IMPORT_ERROR}")
return False
# Note: Original implementation used LibVIPS handler to crop region, then OpenCV
# for feature detection (goodFeaturesToTrack, Canny edge detection, variance).
# If re-implementing, use OpenCV directly for both cropping and analysis.
# Feature detection would use: cv2.goodFeaturesToTrack, cv2.Canny, np.var
return False
def save_template_to_file(watch, screenshot_bytes, x, y, width, height):
"""
Extract the template region and save it as CROPPED_IMAGE_TEMPLATE_FILENAME in the watch data directory.
This is a convenience wrapper around handler.save_template() that handles
watch directory setup and path construction.
Args:
watch: Watch object
screenshot_bytes: Full screenshot as bytes
x, y, width, height: Bounding box coordinates
"""
# Template matching disabled - would need OpenCV implementation for template saving
if not TEMPLATE_MATCHING_ENABLED:
logger.warning(f"Cannot save template: {IMPORT_ERROR}")
return
# Note: Original implementation used LibVIPS handler to crop and save region.
# If re-implementing, use OpenCV (cv2.imdecode, crop with array slicing, cv2.imwrite).
return
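
The hook above parses the bounding box as `x,y,width,height`, whereas the handler interface crops by `(left, top, right, bottom)`. A minimal sketch of that conversion, clamped to the image size, is given here. `bbox_string_to_crop_box` is a hypothetical helper, and the clamping behaviour is an assumption rather than something this changeset specifies.

```python
from typing import Optional, Tuple


def bbox_string_to_crop_box(
    bounding_box: str, image_width: int, image_height: int
) -> Optional[Tuple[int, int, int, int]]:
    """Convert 'x,y,width,height' into (left, top, right, bottom).

    Returns None when the string is malformed or the box lies outside the image.
    """
    try:
        parts = [int(p.strip()) for p in bounding_box.split(",")]
    except ValueError:
        return None
    if len(parts) != 4:
        return None

    x, y, width, height = parts
    if x < 0 or y < 0 or width <= 0 or height <= 0:
        return None

    # Clamp the box to the image so a later crop never reads out of bounds
    left = min(x, image_width)
    top = min(y, image_height)
    right = min(x + width, image_width)
    bottom = min(y + height, image_height)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom
```

For a 1280x800 screenshot, `"10, 20, 100, 50"` maps to `(10, 20, 110, 70)`, and a box that runs past the edge, such as `"1200,700,200,200"`, is clamped to `(1200, 700, 1280, 800)`.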

@@ -0,0 +1,120 @@
"""
Configuration forms for fast screenshot comparison processor.
"""
from wtforms import SelectField, StringField, validators, ValidationError, IntegerField
from flask_babel import lazy_gettext as _l
from changedetectionio.forms import processor_text_json_diff_form
import re
from changedetectionio.processors.image_ssim_diff import SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS
def validate_bounding_box(form, field):
"""Validate bounding box format: x,y,width,height with integers."""
if not field.data:
return # Optional field
if len(field.data) > 100:
raise ValidationError(_l('Bounding box value is too long'))
# Should be comma-separated integers
if not re.match(r'^\d+,\d+,\d+,\d+$', field.data):
raise ValidationError(_l('Bounding box must be in format: x,y,width,height (integers only)'))
# Validate values are reasonable (not negative, not ridiculously large)
parts = [int(p) for p in field.data.split(',')]
for part in parts:
if part < 0:
raise ValidationError(_l('Bounding box values must be non-negative'))
if part > 10000: # Reasonable max screen dimension
raise ValidationError(_l('Bounding box values are too large'))
def validate_selection_mode(form, field):
"""Validate selection mode value."""
if not field.data:
return # Optional field
if field.data not in ['element', 'draw']:
raise ValidationError(_l('Selection mode must be either "element" or "draw"'))
class processor_settings_form(processor_text_json_diff_form):
"""Form for fast image comparison processor settings."""
processor_config_min_change_percentage = IntegerField(
_l('Minimum Change Percentage'),
validators=[
validators.Optional(),
validators.NumberRange(min=1, max=100, message=_l('Must be between 1 and 100'))
],
render_kw={"placeholder": "Use global default (0.1)"}
)
processor_config_pixel_difference_threshold_sensitivity = SelectField(
_l('Pixel Difference Sensitivity'),
choices=[
('', _l('Use global default'))
] + SCREENSHOT_COMPARISON_THRESHOLD_OPTIONS,
validators=[validators.Optional()],
default=''
)
# Processor-specific config fields (stored in separate JSON file)
processor_config_bounding_box = StringField(
_l('Bounding Box'),
validators=[
validators.Optional(),
validators.Length(max=100, message=_l('Bounding box value is too long')),
validate_bounding_box
],
render_kw={"style": "display: none;", "id": "bounding_box"}
)
processor_config_selection_mode = StringField(
_l('Selection Mode'),
validators=[
validators.Optional(),
validators.Length(max=20, message=_l('Selection mode value is too long')),
validate_selection_mode
],
render_kw={"style": "display: none;", "id": "selection_mode"}
)
def extra_tab_content(self):
"""Tab label for processor-specific settings."""
return _l('Screenshot Comparison')
def extra_form_content(self):
"""Render processor-specific form fields.
@NOTE: Prefix field names with processor_config_ so they are saved into their own datadir/uuid/image_ssim_diff.json and read back at process time
"""
return '''
{% from '_helpers.html' import render_field %}
<fieldset>
<legend>Screenshot Comparison Settings</legend>
<div class="pure-control-group">
{{ render_field(form.processor_config_min_change_percentage) }}
<span class="pure-form-message-inline">
<strong>What percentage of pixels must change to trigger a detection?</strong><br>
For example, <strong>0.1%</strong> means if 0.1% or more of the pixels change, it counts as a change.<br>
Lower values = more sensitive (detect smaller changes).<br>
Higher values = less sensitive (only detect larger changes).<br>
Leave blank to use global default (0.1%).
</span>
</div>
<div class="pure-control-group">
{{ render_field(form.processor_config_pixel_difference_threshold_sensitivity) }}
<span class="pure-form-message-inline">
<strong>How different must an individual pixel be to count as "changed"?</strong><br>
<strong>Low sensitivity (75)</strong> = Only count pixels that changed significantly (0-255 scale).<br>
<strong>High sensitivity (20)</strong> = Count pixels with small changes as different.<br>
<strong>Very high (0)</strong> = Any pixel change counts.<br>
Select "Use global default" to inherit the system-wide setting.
</span>
</div>
</fieldset>
'''
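
The two settings in this form act in sequence: the pixel difference threshold decides which pixels count as changed, and the minimum change percentage decides whether enough of them changed. A pure-Python sketch on small grayscale grids follows. Both function names are hypothetical, and the strict `>` comparison is an assumption chosen so that a threshold of 0 counts any pixel change, as the help text describes.

```python
def change_percentage(before, after, pixel_difference_threshold):
    """Percentage (0-100) of pixels whose absolute difference exceeds the threshold."""
    total = 0
    changed = 0
    for row_before, row_after in zip(before, after):
        for a, b in zip(row_before, row_after):
            total += 1
            if abs(a - b) > pixel_difference_threshold:
                changed += 1
    return 100.0 * changed / total if total else 0.0


def is_change_detected(percentage, min_change_percentage):
    """A change triggers when the changed share reaches the configured minimum."""
    return percentage >= min_change_percentage


# Two 2x4 grayscale "screenshots" on a 0-255 scale
before = [[10, 10, 10, 10], [200, 200, 200, 200]]
after = [[10, 15, 60, 10], [200, 200, 100, 200]]

high_sensitivity = change_percentage(before, after, 20)  # 2 of 8 pixels differ by more than 20
low_sensitivity = change_percentage(before, after, 75)   # 1 of 8 pixels differs by more than 75
any_change = change_percentage(before, after, 0)         # 3 of 8 pixels differ at all
```

Lowering the pixel threshold raises the computed percentage for the same pair of images, which is why the two settings should be tuned together.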

@@ -0,0 +1,242 @@
"""
Abstract base class for image processing operations.
All image operations for the image_ssim_diff processor must be implemented
through this interface to allow different backends (libvips, OpenCV, PIL, etc.).
"""
from abc import ABC, abstractmethod
from typing import Tuple, Optional, Any
class ImageDiffHandler(ABC):
"""
Abstract base class for image processing operations.
Implementations must handle all image operations needed for screenshot
comparison including loading, cropping, resizing, diffing, and overlays.
"""
@abstractmethod
def load_from_bytes(self, img_bytes: bytes) -> Any:
"""
Load image from bytes.
Args:
img_bytes: Image data as bytes (PNG, JPEG, etc.)
Returns:
Handler-specific image object
"""
pass
@abstractmethod
def save_to_bytes(self, img: Any, format: str = 'png', quality: int = 85) -> bytes:
"""
Save image to bytes.
Args:
img: Handler-specific image object
format: Output format ('png' or 'jpeg')
quality: Quality for JPEG (1-100)
Returns:
Image data as bytes
"""
pass
@abstractmethod
def crop(self, img: Any, left: int, top: int, right: int, bottom: int) -> Any:
"""
Crop image to specified region.
Args:
img: Handler-specific image object
left: Left coordinate
top: Top coordinate
right: Right coordinate
bottom: Bottom coordinate
Returns:
Cropped image object
"""
pass
@abstractmethod
def resize(self, img: Any, max_width: int, max_height: int) -> Any:
"""
Resize image maintaining aspect ratio.
Args:
img: Handler-specific image object
max_width: Maximum width in pixels
max_height: Maximum height in pixels
Returns:
Resized image object
"""
pass
@abstractmethod
def get_dimensions(self, img: Any) -> Tuple[int, int]:
"""
Get image dimensions.
Args:
img: Handler-specific image object
Returns:
Tuple of (width, height)
"""
pass
@abstractmethod
def to_grayscale(self, img: Any) -> Any:
"""
Convert image to grayscale.
Args:
img: Handler-specific image object
Returns:
Grayscale image object
"""
pass
@abstractmethod
def gaussian_blur(self, img: Any, sigma: float) -> Any:
"""
Apply Gaussian blur to image.
Args:
img: Handler-specific image object
sigma: Blur sigma value (0 = no blur)
Returns:
Blurred image object
"""
pass
@abstractmethod
def absolute_difference(self, img1: Any, img2: Any) -> Any:
"""
Calculate absolute difference between two images.
Args:
img1: First image (handler-specific object)
img2: Second image (handler-specific object)
Returns:
Difference image object
"""
pass
@abstractmethod
def threshold(self, img: Any, threshold_value: int) -> Tuple[float, Any]:
"""
Apply threshold to image and calculate change percentage.
Args:
img: Handler-specific image object (typically grayscale difference)
threshold_value: Threshold value (0-255)
Returns:
Tuple of (change_percentage, binary_mask)
- change_percentage: Percentage of pixels above threshold (0-100)
- binary_mask: Handler-specific binary mask object
"""
pass
@abstractmethod
def apply_red_overlay(self, img: Any, mask: Any) -> bytes:
"""
Apply red overlay to image where mask is True.
Args:
img: Handler-specific image object (color)
mask: Handler-specific binary mask object
Returns:
JPEG bytes with red overlay applied
"""
pass
@abstractmethod
def close(self, img: Any) -> None:
"""
Clean up image resources if needed.
Args:
img: Handler-specific image object
"""
pass
@abstractmethod
def find_template(
self,
img: Any,
template_img: Any,
original_bbox: Tuple[int, int, int, int],
search_tolerance: float = 0.2
) -> Optional[Tuple[int, int, int, int]]:
"""
Find template in image using template matching.
Args:
img: Handler-specific image object to search in
template_img: Handler-specific template image object to find
original_bbox: Original bounding box (left, top, right, bottom)
search_tolerance: How far to search (0.2 = ±20% of region size)
Returns:
New bounding box (left, top, right, bottom) or None if not found
"""
pass
@abstractmethod
def save_template(
self,
img: Any,
bbox: Tuple[int, int, int, int],
output_path: str
) -> bool:
"""
Save a cropped region as a template file.
Args:
img: Handler-specific image object
bbox: Bounding box to crop (left, top, right, bottom)
output_path: Where to save the template PNG
Returns:
True if successful, False otherwise
"""
pass
@abstractmethod
def draw_bounding_box(
self,
img_bytes: bytes,
x: int,
y: int,
width: int,
height: int,
color: Tuple[int, int, int] = (255, 0, 0),
thickness: int = 3
) -> bytes:
"""
Draw a bounding box rectangle on image.
Args:
img_bytes: Image data as bytes
x: Left coordinate
y: Top coordinate
width: Box width
height: Box height
color: BGR color tuple (default: blue)
thickness: Line thickness in pixels
Returns:
Image bytes with bounding box drawn
"""
pass
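
The `find_template` docstring describes `search_tolerance` as a fraction of the region size (0.2 = ±20%). One way a backend might turn that into a search window is sketched below. `search_window` is a hypothetical helper, and expanding each side by the tolerance and then clamping to the image is an assumed interpretation, not behaviour confirmed by this interface.

```python
from typing import Tuple


def search_window(
    original_bbox: Tuple[int, int, int, int],
    search_tolerance: float,
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Expand (left, top, right, bottom) by a fraction of its size, clamped to the image."""
    left, top, right, bottom = original_bbox
    margin_x = round((right - left) * search_tolerance)
    margin_y = round((bottom - top) * search_tolerance)
    return (
        max(0, left - margin_x),
        max(0, top - margin_y),
        min(image_width, right + margin_x),
        min(image_height, bottom + margin_y),
    )
```

With a 200x100 region at `(100, 50, 300, 150)` and a tolerance of 0.2, the window grows by 40 pixels horizontally and 20 vertically to `(60, 30, 340, 170)`. A region touching the top-left corner is clamped at 0 rather than going negative.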

Some files were not shown because too many files have changed in this diff.