Compare commits

...

85 Commits

Author SHA1 Message Date
dgtlmoon fbbe9cf256 hostname connect fix for playwright etc 2026-03-27 10:44:54 +01:00
dgtlmoon 87839e1ca5 Maybe? 2026-03-27 10:44:08 +01:00
dgtlmoon 9dff53726c test and ui tweaks 2026-03-27 10:34:45 +01:00
dgtlmoon 835230fced test fix 2026-03-27 10:22:53 +01:00
dgtlmoon a747e0d62e fix for saving/encoding settings 2026-03-27 10:20:18 +01:00
dgtlmoon 63929b2d05 test tweaks 2026-03-27 10:20:10 +01:00
dgtlmoon e0d0f4a429 Move UA browser profiles only 2026-03-26 22:34:29 +01:00
dgtlmoon 6f3a68b70b Selenium doesnt support headers 2026-03-26 22:23:13 +01:00
dgtlmoon 25cee9cb0c test tweaks 2026-03-26 22:15:27 +01:00
dgtlmoon 2c06d5cc85 test tweak 2026-03-26 22:00:13 +01:00
dgtlmoon faf1092c7a serialise 2026-03-26 21:50:48 +01:00
dgtlmoon 30a94c0624 test tweak 2026-03-26 21:49:16 +01:00
dgtlmoon 47e343af18 test tweak 2026-03-26 21:40:26 +01:00
dgtlmoon 1d2fffddb3 Not needed it will choose the right default 2026-03-26 19:44:11 +01:00
dgtlmoon 31ca4cc0f2 WIP 2026-03-26 19:42:41 +01:00
dgtlmoon 1fcc08d828 Small tidyup 2026-03-26 19:39:09 +01:00
dgtlmoon b265c1ce07 woops 2026-03-26 19:33:18 +01:00
dgtlmoon 2a0ccba24c Attempt to make selenium handle non200s 2026-03-26 19:28:28 +01:00
dgtlmoon 545ec8e40c Fetcher fix 2026-03-26 19:24:00 +01:00
dgtlmoon 9996b8582b Fixing logging and exceptions for non200s 2026-03-26 19:17:17 +01:00
dgtlmoon b3029e97b4 oops 2026-03-26 18:37:34 +01:00
dgtlmoon 34f4fd9baa test tweaks 2026-03-26 18:25:34 +01:00
dgtlmoon f74822009a fixing indentionat 2026-03-26 16:09:09 +01:00
dgtlmoon 7b630bc75f Proxy fixes 2026-03-26 16:07:22 +01:00
dgtlmoon 7f101f7ab4 tweaks 2026-03-26 15:51:11 +01:00
dgtlmoon bc64f0de9b Tweaks 2026-03-26 15:41:10 +01:00
dgtlmoon d0126b5950 Simplify 2026-03-26 15:35:32 +01:00
dgtlmoon 6cc7939744 simplify 2026-03-26 15:25:03 +01:00
dgtlmoon 93e48eff23 WIP 2026-03-26 15:16:30 +01:00
dgtlmoon 498ff689c4 Merge branch 'master' into browser-settings-refactor 2026-03-26 14:49:58 +01:00
dgtlmoon 4ba5f6a003 0.54.7 2026-03-26 09:50:02 +01:00
dgtlmoon 05fc885108 Translations - recompiling 2026-03-26 09:47:02 +01:00
Jaroslav Lichtblau f37e448411 fix: Czech translation strings updated (#4008) 2026-03-26 09:45:23 +01:00
dgtlmoon dadc804567 Security: XPath json-doc() Arbitrary File Read Bypass ( Similar fix as CVE-2026-29039 ) 2026-03-26 09:44:17 +01:00
dgtlmoon 65517a9c74 CVE-2026-33981 - Environment Variable Disclosure via jq env Builtin in Include Filters 2026-03-26 09:33:52 +01:00
dgtlmoon 33ab108597 tweaks 2026-03-25 10:02:34 +01:00
dgtlmoon 985d9c999a Default browser should be listed with its name 2026-03-24 18:29:53 +01:00
dgtlmoon 6c3655283d Move blueprint for browser settings to inside settings 2026-03-24 18:23:49 +01:00
dgtlmoon ba59c2ea17 Adding wider playwright support 2026-03-24 18:14:43 +01:00
dgtlmoon 86f447942d WIP 2026-03-24 18:14:30 +01:00
dgtlmoon 72079d4973 refactor 2026-03-24 17:55:36 +01:00
dgtlmoon 52f1902099 remove extra import 2026-03-24 17:37:39 +01:00
dgtlmoon 064bb32fa2 Removing the technical debt of html_ prefix for fetchers, was confusing 2026-03-24 17:32:04 +01:00
dgtlmoon fd657dfb36 test fix 2026-03-24 16:39:47 +01:00
dgtlmoon dc4485e620 tweaks 2026-03-24 16:36:59 +01:00
dgtlmoon 0dfd348287 API tweak 2026-03-24 16:17:32 +01:00
dgtlmoon 64aa0917bf form edit tweaks 2026-03-24 14:56:48 +01:00
dgtlmoon e8c8be94e6 tweaks 2026-03-24 14:44:15 +01:00
dgtlmoon a69d14f5ee fixing extensible icons 2026-03-24 14:31:37 +01:00
dgtlmoon 713430f03b tweak 2026-03-24 14:14:40 +01:00
dgtlmoon fbb36d6a39 test tweak 2026-03-24 14:05:47 +01:00
dgtlmoon 8cce643aab lint fix 2026-03-24 13:14:31 +01:00
dgtlmoon 3801e21898 Better configuration of browsers 2026-03-24 13:11:48 +01:00
dgtlmoon 17002b5b23 UI - Settings - Dont let 'password' field autocomplete (chrome) 2026-03-24 09:32:34 +01:00
dgtlmoon c4b890f4fa last_error should be cleared if page content was the same and there was no error (#3997) 2026-03-21 20:18:05 +01:00
P. León 2ab172408d fix: correct critical errors in Spanish (es) translation (#3994) 2026-03-21 19:53:13 +01:00
dgtlmoon b98f55030a Restock - Add previous_price to restock values #3987 (#3993) 2026-03-20 18:43:36 +01:00
dgtlmoon 6181b09b16 UI - Scan/check all proxies - Regression fix from earlier refactor 2026-03-20 11:40:45 +01:00
dgtlmoon 5f9fa15a6a Realtime - Suppress socket.io errors in logs (#3991) 2026-03-19 18:02:07 +01:00
dgtlmoon 34c2c05bc5 UI - Text tidyup (#3989) 2026-03-19 15:57:05 +01:00
dgtlmoon 0da8dfb09a 0.54.6 2026-03-17 11:53:33 +01:00
dgtlmoon b747e06c3e SONP - Attempt to strip out JSONP, treat as plaintext (#3983 #3982) 2026-03-17 11:10:48 +01:00
dgtlmoon 5a4266069b Content Fetchers / Browsers - Improvements for pluggable extra fetchers/browsers. (#3981) 2026-03-15 17:35:46 +01:00
Yunhao Jiang 36269717b2 fix: add commit calls for pause and mute operations (#3978) 2026-03-13 11:32:15 +01:00
dependabot[bot] 84f2629a0c Bump apprise from 1.9.7 to 1.9.8 (#3979) 2026-03-13 10:00:12 +01:00
dgtlmoon e9d740bd49 0.54.5 2026-03-12 17:11:21 +01:00
dgtlmoon c18421fbe9 CI - YML tidyup 2026-03-12 16:46:14 +01:00
dgtlmoon f29d6a857b Docker image - Improving org.opencontainers labels for dev containers 2026-03-12 16:41:45 +01:00
dgtlmoon fcfe089a53 Docker image - Improving org.opencontainers labels #3794 2026-03-12 16:36:07 +01:00
dgtlmoon b32617d700 API - Invert changes_only flag for include_equal parameter, add test, fixes changesOnly option for history diff API call (#3976) 2026-03-12 16:15:37 +01:00
dgtlmoon 380d8a26a1 UI - Fixing Preview "GO" version button (#3969) 2026-03-10 11:52:58 +01:00
dgtlmoon 02c03fc32b API - Create (POST) tag/group through API do not save processor_config_restock_diff values #3966 (#3968) 2026-03-10 11:19:59 +01:00
Adrián González db3d38b3ee Add complete Spanish translation (es) (#3961) 2026-03-09 14:45:56 +01:00
dgtlmoon ecd8af94f6 Various memory and CPU improvements (#3960) 2026-03-08 14:32:50 +01:00
dependabot[bot] e400e463a4 CI - Bump the all group with 5 updates (#3955) 2026-03-06 09:31:10 +01:00
dgtlmoon 9d355b8f05 UI - Restock/pricing - Handle when price amount is sometimes string or integer (#3950) 2026-03-05 16:32:26 +01:00
dgtlmoon da43a17541 Content fetching -Better detection of other encodings, Replace/upgrade broken UTF-8 , Ensure rest of retrieved content is UTF-8 for the app (#3954) 2026-03-05 12:53:02 +01:00
dgtlmoon 904eaaaaf7 Restock - No need to extract the text because it's not used anyway (#3951) 2026-03-05 10:51:17 +01:00
dgtlmoon 1e12ae404f 0.54.4 2026-03-04 11:30:05 +01:00
dgtlmoon ec7d56f85d CVE-2026-29038 - Reflected XSS in RSS Tag Error Response 2026-03-04 10:56:02 +01:00
dgtlmoon 417d57e574 CVE-2026-29039 - XPath - Arbitrary File Read via unparsed-text() 2026-03-04 10:54:52 +01:00
dgtlmoon 1d7d812eb0 CVE-2026-29065 - fix(backups): patch zip slip advisory, zip bomb, upload size limit, UUID validation, secret.txt leakage, and download edge cases 2026-03-04 10:52:58 +01:00
dgtlmoon 524393a1fb Updating API docs with better processor plugin info (#3942) 2026-03-02 12:12:39 +01:00
dgtlmoon b09ebcbef6 Python 3.14 CI test and support (#3941) 2026-03-02 11:28:52 +01:00
Muhammet Eren Karakuş 30ac10ff24 fix(i18n): accept translated confirmation text when clearing snapshot history (#3940) 2026-03-02 10:44:42 +01:00
140 changed files with 7821 additions and 1444 deletions
+26 -9
@@ -66,27 +66,27 @@ jobs:
echo ${{ github.ref }} > changedetectionio/tag.txt
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
uses: docker/setup-qemu-action@v4
with:
image: tonistiigi/binfmt:latest
platforms: all
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Login to Docker Hub Container Registry
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
username: ${{ secrets.DOCKER_HUB_USERNAME }}
password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4
with:
install: true
version: latest
@@ -95,7 +95,7 @@ jobs:
# master branch -> :dev container tag
- name: Docker meta :dev
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
id: meta_dev
with:
images: |
@@ -103,11 +103,19 @@ jobs:
ghcr.io/${{ github.repository }}
tags: |
type=raw,value=dev
labels: |
org.opencontainers.image.created=${{ github.event.release.published_at }}
org.opencontainers.image.description=Website, webpage change detection, monitoring and notifications.
org.opencontainers.image.documentation=https://changedetection.io
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.source=https://github.com/dgtlmoon/changedetection.io
org.opencontainers.image.title=changedetection.io
org.opencontainers.image.url=https://changedetection.io
- name: Build and push :dev
id: docker_build
if: ${{ github.ref == 'refs/heads/master' && github.event_name != 'release' }}
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
with:
context: ./
file: ./Dockerfile
@@ -128,10 +136,10 @@ jobs:
echo "Release tag: ${{ github.event.release.tag_name }}"
echo "Github ref: ${{ github.ref }}"
echo "Github ref name: ${{ github.ref_name }}"
- name: Docker meta :tag
if: github.event_name == 'release' && startsWith(github.event.release.tag_name, '0.')
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
id: meta
with:
images: |
@@ -142,11 +150,20 @@ jobs:
type=semver,pattern={{major}}.{{minor}},value=${{ github.event.release.tag_name }}
type=semver,pattern={{major}},value=${{ github.event.release.tag_name }}
type=raw,value=latest
labels: |
org.opencontainers.image.created=${{ github.event.release.published_at }}
org.opencontainers.image.description=Website, webpage change detection, monitoring and notifications.
org.opencontainers.image.documentation=https://changedetection.io
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.source=https://github.com/dgtlmoon/changedetection.io
org.opencontainers.image.title=changedetection.io
org.opencontainers.image.url=https://changedetection.io
org.opencontainers.image.version=${{ github.event.release.tag_name }}
- name: Build and push :tag
id: docker_build_tag_release
if: github.event_name == 'release' && startsWith(github.event.release.tag_name, '0.')
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
with:
context: ./
file: ./Dockerfile
+3 -3
@@ -60,14 +60,14 @@ jobs:
# Just test that the build works, some libraries won't compile on ARM/rPi etc
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
uses: docker/setup-qemu-action@v4
with:
image: tonistiigi/binfmt:latest
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4
with:
install: true
version: latest
@@ -75,7 +75,7 @@ jobs:
- name: Test that the docker containers can build (${{ matrix.platform }} - ${{ matrix.dockerfile }})
id: docker_build
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
# https://github.com/docker/build-push-action#customizing
with:
context: ./
+10 -1
@@ -52,4 +52,13 @@ jobs:
uses: ./.github/workflows/test-stack-reusable-workflow.yml
with:
python-version: '3.13'
skip-pypuppeteer: true
skip-pypuppeteer: true
test-application-3-14:
#if: github.event_name == 'push' && github.ref == 'refs/heads/master'
needs: lint-code
uses: ./.github/workflows/test-stack-reusable-workflow.yml
with:
python-version: '3.14'
skip-pypuppeteer: false
@@ -42,10 +42,10 @@ jobs:
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4
- name: Build changedetection.io container for testing under Python ${{ env.PYTHON_VERSION }}
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
with:
context: ./
file: ./Dockerfile
@@ -292,8 +292,8 @@ jobs:
- name: Specific tests in built container for Selenium
run: |
docker run --rm -e "WEBDRIVER_URL=http://selenium:4444/wd/hub" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio;pytest tests/fetchers/test_content.py && pytest tests/test_errorhandling.py'
docker run --name "changedet" --hostname changedet --rm -e "FLASK_SERVER_NAME=changedet" -e "WEBDRIVER_URL=http://selenium:4444/wd/hub" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio; pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/fetchers/test_content.py'
docker run --name "changedet" --hostname changedet --rm -e "FLASK_SERVER_NAME=changedet" -e "WEBDRIVER_URL=http://selenium:4444/wd/hub" --network changedet-network test-changedetectionio bash -c 'cd changedetectionio; pytest --live-server-host=0.0.0.0 --live-server-port=5004 tests/test_errorhandling.py'
# SMTP tests
smtp-tests:
@@ -706,7 +706,19 @@ jobs:
- name: Check upgrade works without error
run: |
echo "=== Testing upgrade path from 0.49.1 to ${{ github.ref_name }} (${{ github.sha }}) ==="
sudo apt-get update && sudo apt-get install -y --no-install-recommends \
g++ \
gcc \
libc-dev \
libffi-dev \
libjpeg-dev \
libssl-dev \
libxslt-dev \
make \
patch \
pkg-config \
zlib1g-dev
# Checkout old version and create datastore
git checkout 0.49.1
python3 -m venv .venv
+15 -2
@@ -2,7 +2,7 @@
# Read more https://github.com/dgtlmoon/changedetection.io/wiki
# Semver means never use .01, or 00. Should be .1.
__version__ = '0.54.3'
__version__ = '0.54.7'
from changedetectionio.strtobool import strtobool
from json.decoder import JSONDecodeError
@@ -10,7 +10,6 @@ from json.decoder import JSONDecodeError
from loguru import logger
import getopt
import logging
import os
import platform
import signal
import threading
@@ -61,8 +60,22 @@ import time
# ==============================================================================
import multiprocessing
import os
import sys
# Limit glibc malloc arena count to prevent RSS growth from concurrent requests.
# Default: glibc creates up to 8×CPU_cores arenas. Each concurrent thread/connection
# can trigger a new arena, and freed memory stays mapped in those arenas as RSS forever.
# With MALLOC_ARENA_MAX=2, at most 2 arenas are used; freed pages return to the OS faster.
# Must be set before worker threads start; env var is read lazily by glibc on first arena creation.
if 'MALLOC_ARENA_MAX' not in os.environ:
os.environ['MALLOC_ARENA_MAX'] = '2'
try:
import ctypes as _ctypes
_ctypes.CDLL('libc.so.6').mallopt(-8, 2) # M_ARENA_MAX = -8
except Exception:
pass
# Set spawn as global default (safety net - all our code uses explicit contexts anyway)
# Skip in tests to avoid breaking pytest-flask's LiveServer fixture (uses unpicklable local functions)
if 'pytest' not in sys.modules:
+9 -2
@@ -154,11 +154,10 @@ class Import(Resource):
if extras['processor'] not in available:
return f"Invalid processor '{extras['processor']}'. Available processors: {', '.join(available)}", 400
# Validate fetch_backend if provided
# Validate fetch_backend if provided (legacy API compat — still accepted, stored as-is)
if 'fetch_backend' in extras:
from changedetectionio.content_fetchers import available_fetchers
available = [f[0] for f in available_fetchers()]
# Also allow 'system' and extra_browser_* patterns
is_valid = (
extras['fetch_backend'] == 'system' or
extras['fetch_backend'] in available or
@@ -167,6 +166,14 @@ class Import(Resource):
if not is_valid:
return f"Invalid fetch_backend '{extras['fetch_backend']}'. Available: system, {', '.join(available)}", 400
# Validate browser_profile if provided
if 'browser_profile' in extras:
from changedetectionio.model.browser_profile import get_builtin_profiles, RESERVED_MACHINE_NAMES
store_profiles = self.datastore.data['settings']['application'].get('browser_profiles', {})
known = set(get_builtin_profiles().keys()) | set(store_profiles.keys()) | {'system', None}
if extras['browser_profile'] not in known:
return f"Invalid browser_profile '{extras['browser_profile']}'. Available: {', '.join(str(k) for k in known)}", 400
# Validate notification_urls if provided
if 'notification_urls' in extras:
from wtforms import ValidationError
+10
@@ -85,6 +85,9 @@ class Tag(Resource):
# Create clean tag dict without Watch-specific fields
clean_tag = {k: v for k, v in tag.items() if k not in watch_only_fields}
# fetch_backend is a legacy field superseded by browser_profile — omit from API response
clean_tag.pop('fetch_backend', None)
return clean_tag
@auth.check_token
@@ -177,6 +180,13 @@ class Tag(Resource):
new_uuid = self.datastore.add_tag(title=title)
if new_uuid:
# Apply any extra fields (e.g. processor_config_restock_diff) beyond just title
extra = {k: v for k, v in json_data.items() if k != 'title'}
if extra:
tag = self.datastore.data['settings']['application']['tags'].get(new_uuid)
if tag:
tag.update(extra)
tag.commit()
return {'uuid': new_uuid}, 201
else:
return "Invalid or unsupported tag", 400
+6 -3
@@ -105,6 +105,9 @@ class Watch(Resource):
watch['viewed'] = watch_obj.viewed
watch['link'] = watch_obj.link,
# fetch_backend is a legacy field superseded by browser_profile — omit from API response
watch.pop('fetch_backend', None)
return watch
@auth.check_token
@@ -338,7 +341,7 @@ class WatchHistoryDiff(Resource):
word_diff = True
# Get boolean diff preferences with defaults from DIFF_PREFERENCES_CONFIG
changes_only = strtobool(request.args.get('changesOnly', 'true'))
changes_only = strtobool(request.args.get('changesOnly', 'false'))
ignore_whitespace = strtobool(request.args.get('ignoreWhitespace', 'false'))
include_removed = strtobool(request.args.get('removed', 'true'))
include_added = strtobool(request.args.get('added', 'true'))
@@ -349,7 +352,7 @@ class WatchHistoryDiff(Resource):
previous_version_file_contents=from_version_file_contents,
newest_version_file_contents=to_version_file_contents,
ignore_junk=ignore_whitespace,
include_equal=changes_only,
include_equal=not changes_only,
include_removed=include_removed,
include_added=include_added,
include_replaced=include_replaced,
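Reviewer sketch (not part of the diff): the hunk above flips the `changesOnly` default to `false` and passes `not changes_only` as `include_equal`, so unchanged lines are dropped only when the caller asks for changes only. The snippet below illustrates that inversion with the standard library's `difflib.ndiff` as a stand-in. The project's own diff renderer is not shown here, and `render_diff` is a hypothetical name.

```python
import difflib


def render_diff(before, after, changes_only=False):
    """Return ndiff lines, hiding unchanged lines only when changes_only is set."""
    include_equal = not changes_only  # the inversion introduced by the hunk
    lines = []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("? "):
            continue  # skip ndiff's intraline hint rows
        if line.startswith("  ") and not include_equal:
            continue  # unchanged line, suppressed in changes-only mode
        lines.append(line)
    return lines
```

With the old code the flag was passed straight through, so `changesOnly=true` would have included the unchanged lines instead of hiding them.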
@@ -567,4 +570,4 @@ class CreateWatch(Resource):
return {'status': f'OK, queueing {len(watches_to_queue)} watches in background'}, 202
return list, 200
return list, 200
@@ -40,11 +40,6 @@ def create_backup(datastore_path, watches: dict, tags: dict = None):
zipObj.write(url_watches_json, arcname="url-watches.json")
logger.debug("Added url-watches.json to backup")
# Add the flask app secret (if it exists)
secret_file = os.path.join(datastore_path, "secret.txt")
if os.path.isfile(secret_file):
zipObj.write(secret_file, arcname="secret.txt")
# Add tag data directories (each tag has its own {uuid}/tag.json)
for uuid, tag in (tags or {}).items():
for f in Path(tag.data_dir).glob('*'):
@@ -151,19 +146,22 @@ def construct_blueprint(datastore: ChangeDetectionStore):
def download_backup(filename):
import re
filename = filename.strip()
backup_filename_regex = BACKUP_FILENAME_FORMAT.format("\d+")
full_path = os.path.join(os.path.abspath(datastore.datastore_path), filename)
if not full_path.startswith(os.path.abspath(datastore.datastore_path)):
abort(404)
backup_filename_regex = BACKUP_FILENAME_FORMAT.format(r"\d+")
# Resolve 'latest' before any validation so checks run against the real filename.
if filename == 'latest':
backups = find_backups()
if not backups:
abort(404)
filename = backups[0]['filename']
if not re.match(r"^" + backup_filename_regex + "$", filename):
abort(400) # Bad Request if the filename doesn't match the pattern
full_path = os.path.join(os.path.abspath(datastore.datastore_path), filename)
if not full_path.startswith(os.path.abspath(datastore.datastore_path) + os.sep):
abort(404)
logger.debug(f"Backup download request for '{full_path}'")
return send_from_directory(os.path.abspath(datastore.datastore_path), filename, as_attachment=True)
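Reviewer sketch (not part of the diff): the rewritten check appends `os.sep` to the datastore path before the prefix test. A minimal illustration of why that matters follows. `is_inside` and `is_inside_naive` are hypothetical helpers, and unlike the diff they normalise the joined path, whereas the diff relies on the filename regex to reject `..` segments before the prefix test runs.

```python
import os


def is_inside(base_dir, filename):
    """True only if filename resolves to a path strictly inside base_dir."""
    base = os.path.abspath(base_dir)
    full_path = os.path.abspath(os.path.join(base, filename))
    return full_path.startswith(base + os.sep)


def is_inside_naive(base_dir, filename):
    """The older prefix test without the separator, kept to show what it misses."""
    base = os.path.abspath(base_dir)
    full_path = os.path.abspath(os.path.join(base, filename))
    return full_path.startswith(base)
```

A sibling directory such as `data-evil` shares the string prefix `data`, so the naive test accepts it while the separator-terminated test does not.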
+45 -5
View File
@@ -1,6 +1,7 @@
import io
import json
import os
import re
import shutil
import tempfile
import threading
@@ -14,6 +15,16 @@ from loguru import logger
from changedetectionio.flask_app import login_optionally_required
# Maximum size of the uploaded zip file. Override via env var MAX_RESTORE_UPLOAD_MB.
_MAX_UPLOAD_BYTES = int(os.getenv("MAX_RESTORE_UPLOAD_MB", 256)) * 1024 * 1024
# Maximum total uncompressed size of all entries (zip-bomb guard). Override via MAX_RESTORE_DECOMPRESSED_MB.
_MAX_DECOMPRESSED_BYTES = int(os.getenv("MAX_RESTORE_DECOMPRESSED_MB", 1024)) * 1024 * 1024
# Only top-level directories whose name is a valid UUID are treated as watch/tag entries.
_UUID_RE = re.compile(
r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$',
re.IGNORECASE,
)
class RestoreForm(Form):
zip_file = FileField(_l('Backup zip file'), validators=[
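Reviewer sketch (not part of the diff): the `_UUID_RE` pattern added above only matches version-4 UUIDs, because it requires a literal `4` in the version position and one of `8`, `9`, `a`, `b` in the variant position. The snippet reuses the same pattern under a hypothetical name, `looks_like_watch_dir`, to show which directory names would pass.

```python
import re
import uuid

# Same pattern as _UUID_RE in the hunk above.
UUID4_RE = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$',
    re.IGNORECASE,
)


def looks_like_watch_dir(name):
    """Return True if a top-level directory name is shaped like a v4 UUID."""
    return bool(UUID4_RE.match(name))
```

Names such as `secret.txt` or a version-1 UUID would be skipped by this filter.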
@@ -50,7 +61,18 @@ def import_from_zip(zip_stream, datastore, include_groups, include_groups_replac
with tempfile.TemporaryDirectory() as tmpdir:
logger.debug(f"Restore: extracting zip to {tmpdir}")
with zipfile.ZipFile(zip_stream, 'r') as zf:
zf.extractall(tmpdir)
total_uncompressed = sum(m.file_size for m in zf.infolist())
if total_uncompressed > _MAX_DECOMPRESSED_BYTES:
raise ValueError(
f"Backup archive decompressed size ({total_uncompressed // (1024 * 1024)} MB) "
f"exceeds the {_MAX_DECOMPRESSED_BYTES // (1024 * 1024)} MB limit"
)
resolved_dest = os.path.realpath(tmpdir)
for member in zf.infolist():
member_dest = os.path.realpath(os.path.join(resolved_dest, member.filename))
if not member_dest.startswith(resolved_dest + os.sep) and member_dest != resolved_dest:
raise ValueError(f"Zip Slip path traversal detected in backup archive: {member.filename!r}")
zf.extract(member, tmpdir)
logger.debug("Restore: zip extracted, scanning UUID directories")
for entry in os.scandir(tmpdir):
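Reviewer sketch (not part of the diff): the hunk above replaces `extractall` with a declared-size check and a per-member path check. A self-contained version of the same two guards might look like the following. `safe_extract`, `make_zip` and the 1 MiB limit are illustrative assumptions, not names or defaults from the project.

```python
import io
import os
import tempfile
import zipfile

MAX_DECOMPRESSED_BYTES = 1024 * 1024  # illustrative limit only


def safe_extract(zip_stream, dest_dir, max_bytes=MAX_DECOMPRESSED_BYTES):
    """Extract all members into dest_dir, refusing oversized or escaping entries."""
    with zipfile.ZipFile(zip_stream, "r") as zf:
        members = zf.infolist()
        # Sizes are the uncompressed sizes declared in the archive headers.
        total = sum(m.file_size for m in members)
        if total > max_bytes:
            raise ValueError(f"archive expands to {total} bytes, limit is {max_bytes}")

        root = os.path.realpath(dest_dir)
        for member in members:
            target = os.path.realpath(os.path.join(root, member.filename))
            if target != root and not target.startswith(root + os.sep):
                raise ValueError(f"path traversal in archive member {member.filename!r}")
            zf.extract(member, root)


def make_zip(entries):
    """Build an in-memory zip from a {name: bytes} mapping (helper for the demo)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf
```

Checking the total before extracting anything means a rejected archive leaves no partial files behind, which is the same ordering the hunk uses.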
@@ -58,6 +80,9 @@ def import_from_zip(zip_stream, datastore, include_groups, include_groups_replac
continue
uuid = entry.name
if not _UUID_RE.match(uuid):
logger.warning(f"Restore: skipping non-UUID directory {uuid!r}")
continue
tag_json_path = os.path.join(entry.path, 'tag.json')
watch_json_path = os.path.join(entry.path, 'watch.json')
@@ -155,7 +180,9 @@ def construct_restore_blueprint(datastore):
form = RestoreForm()
return render_template("backup_restore.html",
form=form,
restore_running=any(t.is_alive() for t in restore_threads))
restore_running=any(t.is_alive() for t in restore_threads),
max_upload_mb=_MAX_UPLOAD_BYTES // (1024 * 1024),
max_decompressed_mb=_MAX_DECOMPRESSED_BYTES // (1024 * 1024))
@login_optionally_required
@restore_blueprint.route("/restore/start", methods=['POST'])
@@ -173,10 +200,22 @@ def construct_restore_blueprint(datastore):
flash(gettext("File must be a .zip backup file"), "error")
return redirect(url_for('backups.restore.restore'))
# Read into memory now — the request stream is gone once we return
# Reject oversized uploads before reading the stream into memory.
content_length = request.content_length
if content_length and content_length > _MAX_UPLOAD_BYTES:
flash(gettext("Backup file is too large (max %(mb)s MB)", mb=_MAX_UPLOAD_BYTES // (1024 * 1024)), "error")
return redirect(url_for('backups.restore.restore'))
# Read into memory now — the request stream is gone once we return.
# Read one byte beyond the limit so we can detect truncated-but-still-oversized streams.
try:
zip_bytes = io.BytesIO(zip_file.read())
zipfile.ZipFile(zip_bytes) # quick validity check before spawning
raw = zip_file.read(_MAX_UPLOAD_BYTES + 1)
if len(raw) > _MAX_UPLOAD_BYTES:
flash(gettext("Backup file is too large (max %(mb)s MB)", mb=_MAX_UPLOAD_BYTES // (1024 * 1024)), "error")
return redirect(url_for('backups.restore.restore'))
zip_bytes = io.BytesIO(raw)
with zipfile.ZipFile(zip_bytes): # quick validity check before spawning
pass
zip_bytes.seek(0)
except zipfile.BadZipFile:
flash(gettext("Invalid or corrupted zip file"), "error")
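Reviewer sketch (not part of the diff): the upload handler reads one byte more than the limit so that an oversized body is detected even when `Content-Length` is missing or wrong. The same idea in isolation is shown below. `read_capped` is a hypothetical helper, and it raises instead of flashing a message.

```python
import io


def read_capped(stream, max_bytes):
    """Read at most max_bytes from stream; raise if more data was available."""
    raw = stream.read(max_bytes + 1)  # one extra byte reveals an oversized body
    if len(raw) > max_bytes:
        raise ValueError(f"upload exceeds {max_bytes} bytes")
    return raw
```

Memory use stays bounded at `max_bytes + 1` regardless of how large the request body actually is.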
@@ -201,6 +240,7 @@ def construct_restore_blueprint(datastore):
name="BackupRestore"
)
restore_thread.start()
restore_threads[:] = [t for t in restore_threads if t.is_alive()]
restore_threads.append(restore_thread)
flash(gettext("Restore started in background, check back in a few minutes."))
return redirect(url_for('backups.restore.restore'))
@@ -19,6 +19,10 @@
<p>{{ _('Restore a backup. Must be a .zip backup file created on/after v0.53.1 (new database layout).') }}</p>
<p>{{ _('Note: This does not override the main application settings, only watches and groups.') }}</p>
<p class="pure-form-message">
{{ _('Max upload size: %(upload)s MB &nbsp;·&nbsp; Max decompressed size: %(decomp)s MB',
upload=max_upload_mb, decomp=max_decompressed_mb) }}
</p>
<form class="pure-form pure-form-stacked settings"
action="{{ url_for('backups.restore.backups_restore_start') }}"
@@ -102,6 +102,35 @@ def run_async_in_browser_loop(coro):
else:
raise RuntimeError("Browser steps event loop is not available")
async def _close_session_resources(session_data, label=''):
"""Close all browser resources for a session in the correct order.
browserstepper.cleanup() closes page+context but not the browser itself.
For CloakBrowser, browser.close() is what stops the local Chromium process via pw.stop().
For the default CDP path, playwright_context.stop() shuts down the playwright instance.
"""
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
await browserstepper.cleanup()
except Exception as e:
logger.error(f"Error cleaning up browserstepper{label}: {e}")
browser = session_data.get('browser')
if browser:
try:
await asyncio.wait_for(browser.close(), timeout=5.0)
except Exception as e:
logger.warning(f"Error closing browser{label}: {e}")
playwright_context = session_data.get('playwright_context')
if playwright_context:
try:
await playwright_context.stop()
except Exception as e:
logger.warning(f"Error stopping playwright context{label}: {e}")
def cleanup_expired_sessions():
"""Remove expired browsersteps sessions and cleanup their resources"""
global browsersteps_sessions, browsersteps_watch_to_session
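Reviewer sketch (not part of the diff): `_close_session_resources` releases the stepper first, then the browser (with a 5 second timeout), then the Playwright handle, and keeps going if one step fails. The snippet models that ordering with fake objects. `FakeResource`, `close_in_order` and `demo` are hypothetical names, and no real Playwright API is involved.

```python
import asyncio


class FakeResource:
    """Stand-in for a stepper, browser or Playwright handle; records its release."""

    def __init__(self, name, log, fail=False):
        self.name = name
        self.log = log
        self.fail = fail

    async def _release(self):
        self.log.append(self.name)
        if self.fail:
            raise RuntimeError(f"{self.name} refused to close")

    # The three method names mirror the calls made in the diff.
    cleanup = close = stop = _release


async def close_in_order(session_data):
    """Release resources in a fixed order, logging failures instead of aborting."""
    steps = (
        ("browserstepper", "cleanup", None),
        ("browser", "close", 5.0),
        ("playwright_context", "stop", None),
    )
    for key, method_name, timeout in steps:
        resource = session_data.get(key)
        if not resource:
            continue
        try:
            coro = getattr(resource, method_name)()
            if timeout is None:
                await coro
            else:
                await asyncio.wait_for(coro, timeout=timeout)
        except Exception as exc:
            print(f"could not release {key}: {exc}")


def demo():
    """Run the cleanup against fakes where the browser step fails."""
    log = []
    session = {
        "browserstepper": FakeResource("browserstepper", log),
        "browser": FakeResource("browser", log, fail=True),
        "playwright_context": FakeResource("playwright_context", log),
    }
    asyncio.run(close_in_order(session))
    return log
```

Because each step has its own `try`, a browser that fails to close does not prevent the Playwright handle from being stopped.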
@@ -119,13 +148,10 @@ def cleanup_expired_sessions():
logger.debug(f"Cleaning up expired browsersteps session {session_id}")
session_data = browsersteps_sessions[session_id]
# Cleanup playwright resources asynchronously
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
run_async_in_browser_loop(browserstepper.cleanup())
except Exception as e:
logger.error(f"Error cleaning up session {session_id}: {e}")
try:
run_async_in_browser_loop(_close_session_resources(session_data, label=f" for session {session_id}"))
except Exception as e:
logger.error(f"Error cleaning up session {session_id}: {e}")
# Remove from sessions dict
del browsersteps_sessions[session_id]
@@ -152,12 +178,10 @@ def cleanup_session_for_watch(watch_uuid):
session_data = browsersteps_sessions.get(session_id)
if session_data:
browserstepper = session_data.get('browserstepper')
if browserstepper:
try:
run_async_in_browser_loop(browserstepper.cleanup())
except Exception as e:
logger.error(f"Error cleaning up session {session_id} for watch {watch_uuid}: {e}")
try:
run_async_in_browser_loop(_close_session_resources(session_data, label=f" for watch {watch_uuid}"))
except Exception as e:
logger.error(f"Error cleaning up session {session_id} for watch {watch_uuid}: {e}")
# Remove from sessions dict
del browsersteps_sessions[session_id]
@@ -178,59 +202,64 @@ def construct_blueprint(datastore: ChangeDetectionStore):
import time
from playwright.async_api import async_playwright
# We keep the playwright session open for many minutes
keepalive_seconds = int(os.getenv('BROWSERSTEPS_MINUTES_KEEPALIVE', 10)) * 60
keepalive_ms = ((keepalive_seconds + 3) * 1000)
browsersteps_start_session = {'start_time': time.time()}
# Create a new async playwright instance for browser steps
playwright_instance = async_playwright()
playwright_context = await playwright_instance.start()
# Build proxy dict first — needed by both the CDP path and fetcher-specific launchers
proxy_url = datastore.get_proxy_url_for_watch(uuid=watch_uuid)
proxy = None
if proxy_url:
from urllib.parse import urlparse
parsed = urlparse(proxy_url)
proxy = {'server': proxy_url}
if parsed.username:
proxy['username'] = parsed.username
if parsed.password:
proxy['password'] = parsed.password
logger.debug(f"Browser Steps: UUID {watch_uuid} selected proxy {proxy_url}")
keepalive_ms = ((keepalive_seconds + 3) * 1000)
base_url = os.getenv('PLAYWRIGHT_DRIVER_URL', '').strip('"')
a = "?" if not '?' in base_url else '&'
base_url += a + f"timeout={keepalive_ms}"
# Resolve the fetcher class for this watch so we can ask it to launch its own browser
# if it supports that (e.g. CloakBrowser, which runs locally rather than via CDP)
watch = datastore.data['watching'][watch_uuid]
from changedetectionio import content_fetchers
fetcher_class = content_fetchers.get_fetcher(watch.effective_browser_profile.fetch_backend)
browser = None
playwright_context = None
# If the fetcher has its own browser launch for the live steps UI, use it.
# get_browsersteps_browser(proxy, keepalive_ms) returns (browser, playwright_context_or_None)
# or None to fall back to the default CDP path.
if fetcher_class and hasattr(fetcher_class, 'get_browsersteps_browser'):
result = await fetcher_class.get_browsersteps_browser(proxy=proxy, keepalive_ms=keepalive_ms)
if result is not None:
browser, playwright_context = result
logger.debug(f"Browser Steps: using fetcher-specific browser for '{fetcher_class.__name__}'")
# Default: connect to the remote Playwright/sockpuppetbrowser via CDP
if browser is None:
playwright_instance = async_playwright()
playwright_context = await playwright_instance.start()
base_url = os.getenv('PLAYWRIGHT_DRIVER_URL', '').strip('"')
a = "?" if '?' not in base_url else '&'
base_url += a + f"timeout={keepalive_ms}"
browser = await playwright_context.chromium.connect_over_cdp(base_url, timeout=keepalive_ms)
logger.debug(f"Browser Steps: using CDP connection to {base_url}")
browser = await playwright_context.chromium.connect_over_cdp(base_url, timeout=keepalive_ms)
browsersteps_start_session['browser'] = browser
browsersteps_start_session['playwright_context'] = playwright_context
proxy_id = datastore.get_preferred_proxy_for_watch(uuid=watch_uuid)
proxy = None
if proxy_id:
proxy_url = datastore.proxy_list.get(proxy_id).get('url')
if proxy_url:
# Playwright needs separate username and password values
from urllib.parse import urlparse
parsed = urlparse(proxy_url)
proxy = {'server': proxy_url}
if parsed.username:
proxy['username'] = parsed.username
if parsed.password:
proxy['password'] = parsed.password
logger.debug(f"Browser Steps: UUID {watch_uuid} selected proxy {proxy_url}")
# Tell Playwright to connect to Chrome and setup a new session via our stepper interface
browserstepper = browser_steps.browsersteps_live_ui(
playwright_browser=browser,
proxy=proxy,
start_url=datastore.data['watching'][watch_uuid].link,
headers=datastore.data['watching'][watch_uuid].get('headers')
start_url=watch.link,
headers=watch.get('headers')
)
# Initialize the async connection
await browserstepper.connect(proxy=proxy)
browsersteps_start_session['browserstepper'] = browserstepper
# For test
#await browsersteps_start_session['browserstepper'].action_goto_url(value="http://example.com?time="+str(time.time()))
return browsersteps_start_session
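Reviewer sketch (not part of the diff): the proxy dict is now built before the browser is chosen, splitting credentials out of the URL with `urllib.parse.urlparse`. The same construction as a standalone function is shown below. `build_proxy_settings` is a hypothetical name and the example URL is made up.

```python
from urllib.parse import urlparse


def build_proxy_settings(proxy_url):
    """Return a dict with server, username and password keys, or None if no proxy is set."""
    if not proxy_url:
        return None
    parsed = urlparse(proxy_url)
    proxy = {"server": proxy_url}
    if parsed.username:
        proxy["username"] = parsed.username
    if parsed.password:
        proxy["password"] = parsed.password
    return proxy
```

As in the diff, the `server` value keeps the full URL and the credentials are duplicated into their own keys.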
@@ -40,12 +40,14 @@ def construct_blueprint(datastore: ChangeDetectionStore):
contents = ''
now = time.time()
try:
import asyncio
processor_module = importlib.import_module("changedetectionio.processors.text_json_diff.processor")
update_handler = processor_module.perform_site_check(datastore=datastore,
watch_uuid=uuid
)
update_handler.call_browser(preferred_proxy_id=preferred_proxy)
update_handler.preferred_proxy_override = preferred_proxy
asyncio.run(update_handler.call_browser())
# title, size is len contents not len xfer
except content_fetcher_exceptions.Non200ErrorCodeReceived as e:
if e.status_code == 404:
@@ -175,9 +175,9 @@ class import_xlsx_wachete(Importer):
dynamic_wachet = str(data.get('dynamic wachet', '')).strip().lower() # Convert bool to str to cover all cases
# libreoffice and others can have it as =FALSE() =TRUE(), or bool(true)
if 'true' in dynamic_wachet or dynamic_wachet == '1':
extras['fetch_backend'] = 'html_webdriver'
extras['browser_profile'] = 'browser_chromeplaywright'
elif 'false' in dynamic_wachet or dynamic_wachet == '0':
extras['fetch_backend'] = 'html_requests'
extras['browser_profile'] = 'direct_http_requests'
if data.get('xpath'):
# @todo split by || ?
+1 -1
@@ -7,7 +7,7 @@ def construct_tag_routes(rss_blueprint, datastore):
datastore: The ChangeDetectionStore instance
"""
@rss_blueprint.route("/tag/<string:tag_uuid>", methods=['GET'])
@rss_blueprint.route("/tag/<uuid_str:tag_uuid>", methods=['GET'])
def rss_tag_feed(tag_uuid):
from flask import make_response, request, url_for
@@ -15,6 +15,9 @@ from changedetectionio.auth_decorator import login_optionally_required
def construct_blueprint(datastore: ChangeDetectionStore):
settings_blueprint = Blueprint('settings', __name__, template_folder="templates")
from changedetectionio.blueprint.settings.browser_profile import construct_blueprint as construct_browser_profile_blueprint
settings_blueprint.register_blueprint(construct_browser_profile_blueprint(datastore), url_prefix='/browsers')
@settings_blueprint.route("", methods=['GET', "POST"])
@login_optionally_required
def settings_page():
@@ -0,0 +1,200 @@
import flask_login
from flask import Blueprint, render_template, request, redirect, url_for, flash
from flask_babel import gettext
from changedetectionio.store import ChangeDetectionStore
from changedetectionio.auth_decorator import login_optionally_required
def construct_blueprint(datastore: ChangeDetectionStore):
settings_browser_profile_blueprint = Blueprint(
'settings_browsers',
__name__,
template_folder="templates"
)
def _render_index(browser_profile_form=None, editing_machine_name=None):
from changedetectionio import forms
from changedetectionio import content_fetchers as cf
from changedetectionio.model.browser_profile import BrowserProfile, RESERVED_MACHINE_NAMES
# Only browser-capable fetchers are valid profile types
fetcher_choices = cf.available_browser_fetchers()
if browser_profile_form is None:
browser_profile_form = forms.BrowserProfileForm()
browser_profile_form.fetch_backend.choices = fetcher_choices
fetcher_supports_screenshots = {name: True for name, _ in fetcher_choices}
fetcher_requires_connection_url = {name: True for name, cls in cf.FETCHERS.items()
if getattr(cls, 'requires_connection_url', False)}
# Table shows default built-in profiles first, then user-created profiles
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
user_profiles = dict(cf.DEFAULT_BROWSER_PROFILES)
for machine_name, raw in store_profiles.items():
try:
user_profiles[machine_name] = BrowserProfile(**raw) if isinstance(raw, dict) else raw
except Exception:
pass
current_default = datastore.data['settings']['application'].get('browser_profile') or 'direct_http_requests'
return render_template(
"browser_profiles.html",
browser_profiles=user_profiles,
browser_profile_form=browser_profile_form,
reserved_browser_profile_names=RESERVED_MACHINE_NAMES,
fetcher_choices=fetcher_choices,
fetcher_supports_screenshots=fetcher_supports_screenshots,
fetcher_requires_connection_url=fetcher_requires_connection_url,
current_default_profile=current_default,
editing_machine_name=editing_machine_name,
)
@settings_browser_profile_blueprint.route("", methods=['GET'])
@login_optionally_required
def index():
return _render_index()
@settings_browser_profile_blueprint.route("/<string:machine_name>/edit", methods=['GET'])
@login_optionally_required
def edit(machine_name):
from changedetectionio import forms
from changedetectionio.model.browser_profile import BrowserProfile, RESERVED_MACHINE_NAMES
if machine_name in RESERVED_MACHINE_NAMES:
flash(gettext("Built-in browser profiles cannot be edited."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
raw = store_profiles.get(machine_name)
if raw is None:
flash(gettext("Browser profile not found."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
profile = BrowserProfile(**raw) if isinstance(raw, dict) else raw
form = forms.BrowserProfileForm(data=profile.model_dump())
return _render_index(browser_profile_form=form, editing_machine_name=machine_name)
@settings_browser_profile_blueprint.route("/save", methods=['POST'])
@login_optionally_required
def save():
from changedetectionio import forms
from changedetectionio import content_fetchers as cf
from changedetectionio.model.browser_profile import BrowserProfile, RESERVED_MACHINE_NAMES
fetcher_choices = [(name, desc) for name, desc in cf.available_fetchers()]
browser_profile_form = forms.BrowserProfileForm(formdata=request.form)
browser_profile_form.fetch_backend.choices = fetcher_choices
if not browser_profile_form.validate():
flash(gettext("Browser profile error: {}").format(
'; '.join(str(e) for errs in browser_profile_form.errors.values() for e in errs)
), 'error')
return redirect(url_for('settings.settings_browsers.index'))
name = browser_profile_form.name.data.strip()
machine_name = BrowserProfile.machine_name_from_str(name)
if machine_name in RESERVED_MACHINE_NAMES:
flash(gettext("Cannot use reserved profile name '{}'. Please choose a different name.").format(name), 'error')
return redirect(url_for('settings.settings_browsers.index'))
original_machine_name = request.form.get('original_machine_name', '').strip()
store_profiles = datastore.data['settings']['application'].setdefault('browser_profiles', {})
if machine_name != original_machine_name and machine_name in store_profiles:
flash(gettext("A browser profile named '{}' already exists.").format(name), 'error')
return redirect(url_for('settings.settings_browsers.index'))
profile_data = {
'name': name,
'fetch_backend': browser_profile_form.fetch_backend.data,
'browser_connection_url': browser_profile_form.browser_connection_url.data or None,
'viewport_width': browser_profile_form.viewport_width.data or 1280,
'viewport_height': browser_profile_form.viewport_height.data or 1000,
'block_images': bool(browser_profile_form.block_images.data),
'block_fonts': bool(browser_profile_form.block_fonts.data),
'ignore_https_errors': bool(browser_profile_form.ignore_https_errors.data),
'user_agent': browser_profile_form.user_agent.data or None,
'locale': browser_profile_form.locale.data or None,
'custom_headers': browser_profile_form.custom_headers.data or '',
'is_builtin': False,
}
try:
BrowserProfile(**profile_data)
except Exception as e:
flash(gettext("Browser profile validation error: {}").format(str(e)), 'error')
return redirect(url_for('settings.settings_browsers.index'))
# Handle rename: remove old key, cascade-update watches and tags
if original_machine_name and original_machine_name != machine_name and original_machine_name in store_profiles:
del store_profiles[original_machine_name]
for watch in datastore.data['watching'].values():
if watch.get('browser_profile') == original_machine_name:
watch['browser_profile'] = machine_name
for tag in datastore.data.get('settings', {}).get('application', {}).get('tags', {}).values():
if tag.get('browser_profile') == original_machine_name:
tag['browser_profile'] = machine_name
store_profiles[machine_name] = profile_data
datastore.commit()
flash(gettext("Browser profile '{}' saved.").format(name), 'notice')
return redirect(url_for('settings.settings_browsers.index'))
@settings_browser_profile_blueprint.route("/<string:machine_name>/delete", methods=['GET'])
@login_optionally_required
def delete(machine_name):
from changedetectionio.model.browser_profile import RESERVED_MACHINE_NAMES
if machine_name in RESERVED_MACHINE_NAMES:
flash(gettext("Built-in browser profiles cannot be deleted."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
if machine_name not in store_profiles:
flash(gettext("Browser profile not found."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
raw = store_profiles[machine_name]
profile_name = raw.get('name', machine_name) if isinstance(raw, dict) else machine_name
for watch in datastore.data['watching'].values():
if watch.get('browser_profile') == machine_name:
watch['browser_profile'] = None
for tag in datastore.data.get('settings', {}).get('application', {}).get('tags', {}).values():
if tag.get('browser_profile') == machine_name:
tag['browser_profile'] = None
if datastore.data['settings']['application'].get('browser_profile') == machine_name:
datastore.data['settings']['application']['browser_profile'] = None
del store_profiles[machine_name]
datastore.commit()
flash(gettext("Browser profile '{}' deleted.").format(profile_name), 'notice')
return redirect(url_for('settings.settings_browsers.index'))
@settings_browser_profile_blueprint.route("/set-default", methods=['POST'])
@login_optionally_required
def set_default():
from changedetectionio import content_fetchers as cf
machine_name = request.form.get('machine_name', '').strip()
if not machine_name:
flash(gettext("No profile specified."), 'error')
return redirect(url_for('settings.settings_browsers.index'))
from changedetectionio.model.browser_profile import get_profile
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
if get_profile(machine_name, store_profiles) is None:
flash(gettext("Unknown browser profile '{}'.").format(machine_name), 'error')
return redirect(url_for('settings.settings_browsers.index'))
datastore.data['settings']['application']['browser_profile'] = machine_name
datastore.commit()
flash(gettext("Default browser profile set to '{}'.").format(machine_name), 'notice')
return redirect(url_for('settings.settings_browsers.index'))
return settings_browser_profile_blueprint
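Review note: the rename branch of `save` above drops the old key and repoints every watch and tag that referenced it. A minimal sketch of that cascade on plain dictionaries follows. `rename_profile` and its `app_settings` argument are hypothetical names, not part of the changeset. The sketch also repoints the application-level default, which the handler as shown does not appear to do on rename (the `delete` handler does clear it), so treat that step as a suggestion rather than existing behavior.

```python
def rename_profile(store_profiles, watches, tags, app_settings,
                   old_name, new_name, profile_data):
    """Save profile_data under new_name and repoint references to old_name."""
    renamed = bool(old_name) and old_name != new_name and old_name in store_profiles
    if renamed:
        del store_profiles[old_name]
        # Watches and tags both carry an optional 'browser_profile' key.
        for item in list(watches.values()) + list(tags.values()):
            if item.get('browser_profile') == old_name:
                item['browser_profile'] = new_name
        # Assumption: not in the diff above; keeps the system default valid.
        if app_settings.get('browser_profile') == old_name:
            app_settings['browser_profile'] = new_name
    store_profiles[new_name] = profile_data
    return renamed


# Illustrative data only.
profiles = {'my_chrome': {'name': 'My Chrome'}}
watches = {'uuid-1': {'browser_profile': 'my_chrome'},
           'uuid-2': {'browser_profile': None}}
tags = {'tag-1': {'browser_profile': 'my_chrome'}}
app_settings = {'browser_profile': 'my_chrome'}

was_renamed = rename_profile(profiles, watches, tags, app_settings,
                             'my_chrome', 'office_chrome',
                             {'name': 'Office Chrome'})
```

Keeping the cascade in one helper would also let `delete` reuse it with `new_name=None` semantics, though that refactor is outside this changeset.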
@@ -0,0 +1,163 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.html' import render_field, render_checkbox_field, render_button %}
<div class="edit-form">
<div class="box-wrap inner">
<h2>{{ _('Browser Profiles') }}</h2>
<p>{{ _('Create named profiles to configure browser settings — viewport size, connection URL, image/font blocking, and more. Each profile is based on an available browser type.') }}</p>
<form id="set-default-form" action="{{ url_for('settings.settings_browsers.set_default') }}" method="POST">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<input type="hidden" name="machine_name" id="default-machine-name" value="">
</form>
{% if browser_profiles %}
<table class="pure-table pure-table-striped" style="width:100%; margin-bottom:1.5em;">
<thead>
<tr>
<th style="width:2.5em; text-align:center;" title="{{ _('System default') }}">{{ _('Default') }}</th>
<th>{{ _('Name') }}</th>
<th>{{ _('Type') }}</th>
<th style="width:3em; text-align:center;"></th>
<th>{{ _('Viewport') }}</th>
<th>{{ _('Options') }}</th>
<th></th>
</tr>
</thead>
<tbody>
{% for machine_name, profile in browser_profiles.items() %}
<tr>
<td style="text-align:center;">
<input type="radio"
name="default_profile"
value="{{ machine_name }}"
title="{{ _('Set as system default') }}"
{% if machine_name == current_default_profile %}checked{% endif %}
onchange="setDefaultProfile('{{ machine_name }}')">
</td>
<td>{{ profile.name }}</td>
<td><code>{{ profile.fetch_backend }}</code></td>
<td style="text-align:center;">{{ profile.get_fetcher_class_name()|fetcher_status_icons }}</td>
<td>{{ profile.viewport_width }}×{{ profile.viewport_height }}</td>
<td style="font-size:0.8em; line-height:1.6;">
{% if profile.block_images %}{{ _('No images') }}<br>{% endif %}
{% if profile.block_fonts %}{{ _('No fonts') }}<br>{% endif %}
{% if profile.ignore_https_errors %}{{ _('Ignore TLS') }}<br>{% endif %}
{% if profile.browser_connection_url %}<span title="{{ profile.browser_connection_url }}">{{ _('Custom URL') }}</span>{% endif %}
</td>
<td style="white-space:nowrap;">
{% if not profile.is_builtin %}
<a href="{{ url_for('settings.settings_browsers.edit', machine_name=machine_name) }}"
class="pure-button button-small">{{ _('Edit') }}</a>
<a href="{{ url_for('settings.settings_browsers.delete', machine_name=machine_name) }}"
class="pure-button button-small button-error"
onclick="return confirm('{{ _('Delete this browser profile?') }}')">{{ _('Delete') }}</a>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p style="color:#888; font-style:italic;">{{ _('No browser profiles configured yet. Add one below.') }}</p>
{% endif %}
<div class="border-fieldset">
<h3 id="profile-form-heading">{{ _('Edit browser profile') if editing_machine_name else _('Add new browser profile') }}</h3>
{% if not editing_machine_name %}
<p style="font-size:0.9em; color:#666;">{{ _('Choose a browser type, give it a name, and configure its settings. You can create multiple profiles from the same type with different connection URLs or options.') }}</p>
{% endif %}
<form class="pure-form pure-form-stacked"
id="browser-profile-form"
action="{{ url_for('settings.settings_browsers.save') }}"
method="POST">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<input type="hidden" name="original_machine_name" id="original_machine_name" value="{{ editing_machine_name or '' }}">
<fieldset>
<div class="pure-control-group">
{{ render_field(browser_profile_form.name) }}
</div>
<div class="pure-control-group inline-radio">
{{ render_field(browser_profile_form.fetch_backend, id="profile-fetch-backend") }}
</div>
<div class="pure-control-group browser-only-field cdp-only-field">
{{ render_field(browser_profile_form.browser_connection_url) }}
<span class="pure-form-message-inline">{{ _('Optional — override the system CDP/WebSocket URL for this profile only (e.g.') }} <code>ws://my-chrome:3000</code>).</span>
</div>
<div class="pure-control-group browser-only-field" style="display:flex; gap:1em; flex-wrap:wrap;">
<div>{{ render_field(browser_profile_form.viewport_width) }}</div>
<div>{{ render_field(browser_profile_form.viewport_height) }}</div>
</div>
<div class="pure-control-group browser-only-field">
{{ render_checkbox_field(browser_profile_form.block_images) }}
<span class="pure-form-message-inline">{{ _('Block image downloads — speeds up loads on image-heavy pages.') }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_checkbox_field(browser_profile_form.block_fonts) }}
<span class="pure-form-message-inline">{{ _('Block web font downloads.') }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_checkbox_field(browser_profile_form.ignore_https_errors) }}
<span class="pure-form-message-inline">{{ _('Ignore TLS/HTTPS certificate errors (useful for self-signed certs on staging sites).') }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_field(browser_profile_form.user_agent) }}
<span class="pure-form-message-inline">{{ _("Leave blank to use the fetcher's default User-Agent.") }}</span>
</div>
<div class="pure-control-group browser-only-field">
{{ render_field(browser_profile_form.locale) }}
<span class="pure-form-message-inline">{{ _('Sets Accept-Language and navigator.language (e.g. en-US, de-DE).') }}</span>
</div>
<div class="pure-control-group">
{{ render_field(browser_profile_form.custom_headers) }}
<span class="pure-form-message-inline">{{ _('Extra HTTP headers for all requests using this profile (one per line, Key: Value). Applied before per-watch headers.') }}</span>
</div>
<div class="pure-control-group">
<button type="submit" class="pure-button pure-button-primary" id="profile-submit-btn">{{ _('Save profile') }}</button>
{% if editing_machine_name %}
<a href="{{ url_for('settings.settings_browsers.index') }}" class="pure-button button-cancel">{{ _('Cancel') }}</a>
{% endif %}
<a href="{{ url_for('settings.settings_page') }}" class="pure-button button-cancel">{{ _('Back to Settings') }}</a>
</div>
</fieldset>
</form>
</div>
</div>
</div>
<script>
function setDefaultProfile(machineName) {
document.getElementById('default-machine-name').value = machineName;
document.getElementById('set-default-form').submit();
}
const fetcherSupportsBrowser = {{ fetcher_supports_screenshots | tojson }};
const fetcherRequiresConnectionUrl = {{ fetcher_requires_connection_url | tojson }};
function updateBrowserFieldVisibility() {
const fetchBackend = document.getElementById('profile-fetch-backend').value;
const isBrowser = !!fetcherSupportsBrowser[fetchBackend];
const isCdp = !!fetcherRequiresConnectionUrl[fetchBackend];
document.querySelectorAll('.browser-only-field').forEach(function(el) {
el.style.display = isBrowser ? '' : 'none';
});
document.querySelectorAll('.cdp-only-field').forEach(function(el) {
el.style.display = isCdp ? '' : 'none';
});
}
document.addEventListener('DOMContentLoaded', function() {
const sel = document.getElementById('profile-fetch-backend');
if (sel) {
sel.addEventListener('change', updateBrowserFieldVisibility);
updateBrowserFieldVisibility();
}
});
{% if editing_machine_name %}
document.addEventListener('DOMContentLoaded', function() {
document.getElementById('browser-profile-form').scrollIntoView({behavior: 'smooth'});
});
{% endif %}
</script>
{% endblock %}
@@ -28,6 +28,7 @@
<li class="tab"><a href="{{ url_for('backups.create') }}">{{ _('Backups') }}</a></li>
<li class="tab"><a href="#timedate">{{ _('Time & Date') }}</a></li>
<li class="tab"><a href="#proxies">{{ _('CAPTCHA & Proxies') }}</a></li>
<li class="tab"><a href="{{ url_for('settings.settings_browsers.index') }}">{{ _('Browsers') }}</a></li>
{% if plugin_tabs %}
{% for tab in plugin_tabs %}
<li class="tab"><a href="#plugin-{{ tab.plugin_id }}">{{ tab.tab_label }}</a></li>
@@ -115,14 +116,7 @@
</div>
<div class="tab-pane-inner" id="fetching">
<div class="pure-control-group inline-radio">
{{ render_field(form.application.form.fetch_backend, class="fetch-backend") }}
<span class="pure-form-message-inline">
<p>{{ _('Use the') }} <strong>{{ _('Basic') }}</strong> {{ _('method (default) where your watched sites don\'t need Javascript to render.') }}</p>
<p>{{ _('The') }} <strong>{{ _('Chrome/Javascript') }}</strong> {{ _('method requires a network connection to a running WebDriver+Chrome server, set by the ENV var') }} 'WEBDRIVER_URL'. </p>
</span>
</div>
<fieldset class="pure-group" id="webdriver-override-options" data-visible-for="application-fetch_backend=html_webdriver">
<fieldset class="pure-group" id="webdriver-override-options">
<div class="pure-form-message-inline">
<strong>{{ _('If you\'re having trouble waiting for the page to be fully rendered (text missing etc), try increasing the \'wait\' time here.') }}</strong>
<br>
@@ -146,17 +140,9 @@
{{ render_field(form.requests.form.timeout) }}
<span class="pure-form-message-inline">{{ _('For regular plain requests (not chrome based), maximum number of seconds until timeout, 1-999.') }}</span><br>
</div>
<div class="pure-control-group inline-radio">
{{ render_field(form.requests.form.default_ua) }}
<span class="pure-form-message-inline">
{{ _('Applied to all requests.') }}<br><br>
{{ _('Note: Simply changing the User-Agent often does not defeat anti-robot technologies, it\'s important to consider') }} <a href="https://changedetection.io/tutorial/what-are-main-types-anti-robot-mechanisms">{{ _('all of the ways that the browser is detected') }}</a>.
</span>
</div>
<div class="pure-control-group">
<br>
{{ _('Tip:') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">{{ _('Connect using Bright Data and Oxylabs Proxies, find out more here.') }}</a>
<br>
{{ _('Tip:') }} <a href="{{ url_for('settings.settings_page')}}#proxies">{{ _('Connect using Bright Data proxies, find out more here.') }}</a>
</div>
</div>
@@ -352,7 +338,7 @@ nav
</div>
</div>
<p><strong>{{ _('Tip') }}</strong>: {{ _('"Residential" and "Mobile" proxy type can be more successfull than "Data Center" for blocked websites.') }}</p>
<p><strong>{{ _('Tip') }}</strong>: {{ _('"Residential" and "Mobile" proxy type can be more successful than "Data Center" for blocked websites.') }}</p>
<div class="pure-control-group" id="extra-proxies-setting">
{{ render_fieldlist_with_inline_errors(form.requests.form.extra_proxies) }}
+2 -2
@@ -156,9 +156,9 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, worker_pool,
@login_optionally_required
def clear_all_history():
if request.method == 'POST':
confirmtext = request.form.get('confirmtext')
confirmtext = request.form.get('confirmtext', '')
if confirmtext == 'clear':
if confirmtext.strip().lower() == gettext('clear').strip().lower():
# Run in background thread to avoid blocking
def clear_history_background():
# Capture UUIDs first to avoid race conditions
+36 -4
@@ -67,6 +67,10 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
default['proxy'] = ''
# proxy_override set to the json/text list of the items
# browser_profile: None means "use system default" — map to 'system' so the radio pre-selects correctly
if not default.get('browser_profile'):
default['browser_profile'] = 'system'
# Does it use some custom form? does one exist?
processor_name = datastore.data['watching'][uuid].get('processor', '')
processor_classes = next((tpl for tpl in processors.find_processors() if tpl[1] == processor_name), None)
@@ -139,10 +143,37 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
except Exception as e:
logger.warning(f"Failed to load processor config: {e}")
for p in datastore.extra_browsers:
form.fetch_backend.choices.append(p)
from changedetectionio.model.browser_profile import BrowserProfile
from changedetectionio import content_fetchers as cf
store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
form.fetch_backend.choices.append(("system", 'System settings default'))
# Resolve the name of the system-level default profile for the label
from changedetectionio.model.browser_profile import get_profile
_system_default_machine_name = datastore.data['settings']['application'].get('browser_profile') or 'direct_http_requests'
_all_store_profiles = datastore.data['settings']['application'].get('browser_profiles', {})
_default_profile = get_profile(_system_default_machine_name, _all_store_profiles)
if _default_profile:
_system_label = gettext('System settings default') + ' \u2013 ' + _default_profile.name
else:
_system_label = gettext('System settings default')
# Choices: system default + always-present defaults (requests) + user-created profiles
form.browser_profile.choices = [('system', _system_label)] + [
(p.get_machine_name(), p.name)
for p in cf.DEFAULT_BROWSER_PROFILES.values()
] + [
(machine_name, raw.get('name', machine_name) if isinstance(raw, dict) else getattr(raw, 'name', machine_name))
for machine_name, raw in store_profiles.items()
]
# Build a map of machine_name → fetcher class name for the JS visibility system
all_profiles = dict(cf.DEFAULT_BROWSER_PROFILES)
for machine_name, raw in store_profiles.items():
try:
all_profiles[machine_name] = BrowserProfile(**raw) if isinstance(raw, dict) else raw
except Exception:
pass
browser_profile_fetchers = {mn: p.get_fetcher_class_name() for mn, p in all_profiles.items()}
# form.browser_steps[0] can be assumed that we 'goto url' first
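Review note: the hunk above merges the always-present default profiles with the user-created ones, silently skips stored entries that fail to construct, and then maps each machine name to its fetcher class name for the JS visibility logic. The sketch below shows that merge in isolation. `Profile` is a hypothetical stand-in for `BrowserProfile` (its real fields and validation are not reproduced here), and `build_fetcher_map` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical stand-in for BrowserProfile; only two fields are modeled."""
    name: str
    fetch_backend: str = 'requests'

    def get_fetcher_class_name(self):
        return self.fetch_backend


def build_fetcher_map(defaults, stored):
    """Map machine_name -> fetcher name, skipping stored entries that do not load."""
    all_profiles = dict(defaults)
    for machine_name, raw in stored.items():
        try:
            all_profiles[machine_name] = Profile(**raw) if isinstance(raw, dict) else raw
        except TypeError:
            # Unknown or missing fields: leave this profile out of the map.
            continue
    return {mn: p.get_fetcher_class_name() for mn, p in all_profiles.items()}


defaults = {'direct_http_requests': Profile('Direct HTTP')}
stored = {
    'my_chrome': {'name': 'My Chrome', 'fetch_backend': 'playwright'},
    'broken': {'bogus_field': 1},
}
fetcher_map = build_fetcher_map(defaults, stored)
```

One consequence worth noting: a profile that fails to load still appears in `form.browser_profile.choices` in the hunk above but has no entry in the map, so the template's `browser_profile_fetchers.get(subfield.data, '')` fallback is what keeps rendering safe.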
@@ -210,7 +241,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
# Recast it if need be to right data Watch handler
watch_class = processors.get_custom_watch_obj_for_processor(form.data.get('processor'))
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, __datastore=datastore.data, default=datastore.data['watching'][uuid])
datastore.data['watching'][uuid] = watch_class(datastore_path=datastore.datastore_path, __datastore=datastore, default=datastore.data['watching'][uuid])
# Save the watch immediately
datastore.data['watching'][uuid].commit()
@@ -296,6 +327,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
template_args = {
'available_processors': processors.available_processors(),
'available_timezones': sorted(available_timezones()),
'browser_profile_fetchers': browser_profile_fetchers,
'browser_steps_config': browser_step_ui_config,
'emailprefix': os.getenv('NOTIFICATION_MAIL_BUTTON_PREFIX', False),
'extra_classes': ' '.join(c),
+7 -8
@@ -10,7 +10,8 @@ from changedetectionio import html_tools
def construct_blueprint(datastore: ChangeDetectionStore):
preview_blueprint = Blueprint('ui_preview', __name__, template_folder="../ui/templates")
@preview_blueprint.route("/preview/<uuid_str:uuid>", methods=['GET'])
@preview_blueprint.route("/preview/<uuid_str:uuid>", methods=['GET', 'POST'])
@login_optionally_required
def preview_page(uuid):
"""
@@ -59,12 +60,8 @@ def construct_blueprint(datastore: ChangeDetectionStore):
versions = []
timestamp = None
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
extra_stylesheets = [url_for('static_content', group='styles', filename='diff.css')]
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
fetcher_supports_screenshots = watch.fetcher_supports_screenshots
triggered_line_numbers = []
ignored_line_numbers = []
@@ -74,7 +71,9 @@ def construct_blueprint(datastore: ChangeDetectionStore):
flash(gettext("Preview unavailable - No fetch/check completed or triggers not reached"), "error")
else:
# So prepare the latest preview or not
preferred_version = request.args.get('version')
preferred_version = request.values.get('version') if request.method == 'POST' else request.args.get('version')
versions = list(watch.history.keys())
timestamp = versions[-1]
if preferred_version and preferred_version in versions:
@@ -113,7 +112,7 @@ def construct_blueprint(datastore: ChangeDetectionStore):
highlight_triggered_line_numbers=triggered_line_numbers,
highlight_blocked_line_numbers=blocked_line_numbers,
history_n=watch.history_n,
is_html_webdriver=is_html_webdriver,
fetcher_supports_screenshots=fetcher_supports_screenshots,
last_error=watch['last_error'],
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
@@ -143,7 +143,7 @@
<div class="tip">
{{ _('For now, Differences are performed on text, not graphically, only the latest screenshot is available.') }}
</div>
{% if is_html_webdriver %}
{% if fetcher_supports_screenshots %}
{% if screenshot %}
<div class="snapshot-age">{{watch_a.snapshot_screenshot_ctime|format_timestamp_timeago}}</div>
<img style="max-width: 80%" id="screenshot-img" alt="{{ _('Current screenshot from most recent request') }}" >
@@ -27,7 +27,8 @@
const proxy_recheck_status_url="{{url_for('check_proxies.get_recheck_status', uuid=uuid)}}";
const screenshot_url="{{url_for('static_content', group='screenshot', filename=uuid)}}";
const watch_visual_selector_data_url="{{url_for('static_content', group='visual_selector_data', filename=uuid)}}";
const default_system_fetch_backend="{{ settings_application['fetch_backend'] }}";
const default_system_fetch_backend = {{ (browser_profile_fetchers.get(settings_application.get('browser_profile') or 'direct_http_requests', 'requests')) | tojson }};
const browserProfileFetcherMap = {{ browser_profile_fetchers | tojson }};
</script>
<script src="{{url_for('static_content', group='js', filename='plugins.js')}}" defer></script>
<script src="{{url_for('static_content', group='js', filename='watch-settings.js')}}" defer></script>
@@ -131,11 +132,19 @@
{% if capabilities.supports_request_type %}
<div class="tab-pane-inner" id="request">
<div class="pure-control-group inline-radio">
{{ render_field(form.fetch_backend, class="fetch-backend") }}
<div><label for="browser_profile">{{ form.browser_profile.label.text }}</label></div>
<div><ul class="fetch-backend" id="browser_profile">
{%- for subfield in form.browser_profile %}
<li>
{{ subfield() }}
{{ browser_profile_fetchers.get(subfield.data, '')|fetcher_status_icons }}
<label for="{{ subfield.id }}">{{ subfield.label.text }}</label>
</li>
{%- endfor %}
</ul></div>
<span class="pure-form-message-inline">
<p>{{ _('Use the') }} <strong>{{ _('Basic') }}</strong> {{ _('method (default) where your watched site doesn\'t need Javascript to render.') }}</p>
<p>{{ _('The') }} <strong>{{ _('Chrome/Javascript') }}</strong> {{ _('method requires a network connection to a running WebDriver+Chrome server, set by the ENV var \'WEBDRIVER_URL\'.') }} </p>
{{ _('Tip:') }} <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support">{{ _('Connect using Bright Data and Oxylabs Proxies, find out more here.') }}</a>
<p>{{ _('Choose how this watch fetches its target URL. \'System settings default\' inherits the global setting.') }}</p>
<p>{{ _('Manage browser profiles in') }} <a href="{{ url_for('settings.settings_browsers.index') }}">{{ _('Settings → Browsers') }}</a>.</p>
</span>
</div>
{% if form.proxy %}
@@ -149,7 +158,7 @@
{% endif %}
<!-- webdriver always -->
<fieldset data-visible-for="fetch_backend=html_webdriver" style="display: none;">
<fieldset data-visible-for="fetch_backend=playwright fetch_backend=selenium fetch_backend=puppeteer fetch_backend=cloakbrowser" style="display: none;">
<div class="pure-control-group">
{{ render_field(form.webdriver_delay) }}
<div class="pure-form-message-inline">
@@ -172,8 +181,8 @@
</div>
</div>
</fieldset>
<!-- html requests always -->
<fieldset data-visible-for="fetch_backend=html_requests">
<!-- requests always -->
<fieldset data-visible-for="fetch_backend=requests">
<div class="pure-control-group">
<a class="pure-button button-secondary button-xsmall show-advanced">{{ _('Show advanced options') }}</a>
</div>
@@ -210,7 +219,7 @@ Math: {{ 1 + 1 }}") }}
({{ _('Not supported by Selenium browser') }})
</div>
</div>
<fieldset data-visible-for="fetch_backend=html_requests fetch_backend=html_webdriver" >
<fieldset data-visible-for="fetch_backend=requests fetch_backend=playwright fetch_backend=selenium fetch_backend=puppeteer fetch_backend=cloakbrowser" >
<div class="pure-control-group inline-radio advanced-options" style="display: none;">
{{ render_checkbox_field(form.ignore_status_codes) }}
</div>
@@ -17,7 +17,7 @@
<script src="{{ url_for('static_content', group='js', filename='tabs.js') }}" defer></script>
{% if versions|length >= 2 %}
<div id="diff-form" style="text-align: center;">
<form class="pure-form " action="" method="POST">
<form class="pure-form " action="{{url_for('ui.ui_preview.preview_page', uuid=uuid)}}" method="POST">
<fieldset>
<label for="preview-version">{{ _('Select timestamp') }}</label> <select id="preview-version"
name="from_version"
@@ -28,6 +28,7 @@
</option>
{% endfor %}
</select>
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<button type="submit" class="pure-button pure-button-primary">{{ _('Go') }}</button>
</fieldset>
@@ -81,6 +81,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
sorted_tags = sorted(datastore.data['settings']['application'].get('tags').items(), key=lambda x: x[1]['title'])
proxy_list = datastore.proxy_list
output = render_template(
"watch-overview.html",
active_tag=active_tag,
@@ -92,7 +93,7 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
form=form,
generate_tag_colors=processors.generate_processor_badge_colors,
guid=datastore.data['app_guid'],
has_proxies=datastore.proxy_list,
has_proxies=proxy_list,
hosted_sticky=os.getenv("SALTED_PASS", False) == False,
now_time_server=round(time.time()),
pagination=pagination,
@@ -104,12 +105,22 @@ def construct_blueprint(datastore: ChangeDetectionStore, update_q, queuedWatchMe
search_q=request.args.get('q', '').strip(),
sort_attribute=request.args.get('sort') if request.args.get('sort') else request.cookies.get('sort'),
sort_order=request.args.get('order') if request.args.get('order') else request.cookies.get('order'),
system_default_fetcher=datastore.data['settings']['application'].get('fetch_backend'),
system_default_fetcher=datastore.data['settings']['application'].get('browser_profile'),
tags=sorted_tags,
unread_changes_count=datastore.unread_changes_count,
watches=sorted_watches
)
# Return freed template-building memory to the OS immediately.
# render_template allocates ~20MB of intermediate strings that are freed on return,
# but glibc keeps those pages mapped in its arenas as RSS. malloc_trim() forces
# glibc to release them, preventing RSS growth from concurrent Chrome connections.
try:
import ctypes
ctypes.CDLL('libc.so.6').malloc_trim(0)
except Exception:
pass
if session.get('share-link'):
del (session['share-link'])
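Review note: the `malloc_trim(0)` call added above loads `libc.so.6` by name, which resolves on glibc systems only. On musl-based images such as Alpine the load or the symbol lookup fails and the broad `except` swallows it. A minimal sketch of a slightly more explicit variant follows, assuming the same goal. `trim_malloc_arenas` is a hypothetical helper name, and whether trimming measurably reduces RSS for this workload is not verified here.

```python
import ctypes
import ctypes.util


def trim_malloc_arenas() -> bool:
    """Ask the C library to return free heap pages to the OS.

    Returns True when malloc_trim was called, False when it is unavailable
    (for example on musl, macOS, or Windows).
    """
    libc_path = ctypes.util.find_library('c')
    if not libc_path:
        return False
    try:
        libc = ctypes.CDLL(libc_path)
        malloc_trim = libc.malloc_trim  # AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return False
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    malloc_trim(0)
    return True


called = trim_malloc_arenas()
```

Resolving the library once at import time and caching the function pointer would avoid repeating the lookup on every page render.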
@@ -213,12 +213,13 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
{%- set checking_now = is_checking_now(watch) -%}
{%- set history_n = watch.history_n -%}
{%- set favicon = watch.get_favicon_filename() -%}
{%- set error_texts = watch.compile_error_texts(has_proxies=has_proxies) -%}
{%- set system_use_url_watchlist = datastore.data['settings']['application']['ui'].get('use_page_title_in_list') -%}
{# Class settings mirrored in changedetectionio/static/js/realtime.js for the frontend #}
{%- set row_classes = [
loop.cycle('pure-table-odd', 'pure-table-even'),
'processor-' ~ watch['processor'],
'has-error' if watch.compile_error_texts()|length > 2 else '',
'has-error' if error_texts|length > 2 else '',
'paused' if watch.paused is defined and watch.paused != False else '',
'unviewed' if watch.has_unviewed else '',
'has-restock-info' if watch.has_restock_info else 'no-restock-info',
@@ -271,7 +272,7 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
{% endif %}
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}">&nbsp;</a>
</span>
<div class="error-text" style="display:none;">{{ watch.compile_error_texts(has_proxies=datastore.proxy_list)|safe }}</div>
<div class="error-text" style="display:none;">{{ error_texts|safe }}</div>
{%- if watch['processor'] == 'text_json_diff' -%}
{%- if watch['has_ldjson_price_data'] and not watch['track_ldjson_price_data'] -%}
<div class="ldjson-price-track-offer">Switch to Restock & Price watch mode? <a href="{{url_for('price_data_follower.accept', uuid=watch.uuid)}}" class="pure-button button-xsmall">Yes</a> <a href="{{url_for('price_data_follower.reject', uuid=watch.uuid)}}" class="">No</a></div>
@@ -284,10 +285,7 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
</div>
<div class="status-icons">
<a class="link-spread" href="{{url_for('ui.form_share_put_watch', uuid=watch.uuid)}}"><img src="{{url_for('static_content', group='images', filename='spread.svg')}}" class="status-icon icon icon-spread" title="Create a link to share watch config with others" ></a>
{%- set effective_fetcher = watch.get_fetch_backend if watch.get_fetch_backend != "system" else system_default_fetcher -%}
{%- if effective_fetcher and ("html_webdriver" in effective_fetcher or "html_" in effective_fetcher or "extra_browser_" in effective_fetcher) -%}
{{ effective_fetcher|fetcher_status_icons }}
{%- endif -%}
{{ watch.effective_browser_profile.get_fetcher_class_name()|fetcher_status_icons }}
{%- if watch.is_pdf -%}<img class="status-icon" src="{{url_for('static_content', group='images', filename='pdf-icon.svg')}}" alt="Converting PDF to text" >{%- endif -%}
{%- if watch.has_browser_steps -%}<img class="status-icon status-browsersteps" src="{{url_for('static_content', group='images', filename='steps.svg')}}" alt="Browser Steps is enabled" >{%- endif -%}
@@ -305,12 +303,20 @@ html[data-darkmode="true"] .watch-tag-list.tag-{{ class_name }} {
{%- endif -%}
{%- if watch.get('restock') and watch['restock'].get('price') -%}
{%- if watch['restock']['price'] is number -%}
<span class="restock-label price" title="{{ _('Price') }}">
{{ watch['restock']['price']|format_number_locale if watch['restock'].get('price') else '' }} {{ watch['restock'].get('currency','') }}
</span>
{%- else -%} <!-- watch['restock']['price']' is not a number, cant output it -->
{%- set restock = watch['restock'] -%}
{%- set price = restock.get('price') -%}
{%- set cur = restock.get('currency','') -%}
{%- if price is not none and (price|string)|regex_search('\d') -%}
<span class="restock-label price" title="{{ _('Price') }}">
{# @todo: make parse_currency/parse_decimal aware of the locale of the actual web page and use that instead changedetectionio/processors/restock_diff/__init__.py #}
{%- if price is number -%}{# It's a number so we can convert it to their locale #}
{{ price|format_number_locale }} {{ cur }}<!-- as number -->
{%- else -%}{# It's totally fine if it arrives as something else, the website might be something weird in this field #}
{{ price }} {{ cur }}<!-- as string -->
{%- endif -%}
</span>
{%- endif -%}
{%- elif not watch.has_restock_info -%}
<span class="restock-label error">{{ _('No information') }}</span>
{%- endif -%}
+43 -60
@@ -1,5 +1,4 @@
import sys
from changedetectionio.strtobool import strtobool
from loguru import logger
from changedetectionio.content_fetchers.exceptions import BrowserStepsStepException
import os
@@ -25,87 +24,71 @@ SCREENSHOT_MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_
# Most modern GPUs support 16384x16384 textures, so 1280x10000 is safe
SCREENSHOT_SIZE_STITCH_THRESHOLD = int(os.getenv("SCREENSHOT_CHUNK_HEIGHT", 10000))
# available_fetchers() will scan this implementation looking for anything starting with html_
# this information is used in the form selections
from changedetectionio.content_fetchers.requests import fetcher as html_requests
import importlib.resources
XPATH_ELEMENT_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('xpath_element_scraper.js').read_text(encoding='utf-8')
INSTOCK_DATA_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('stock-not-in-stock.js').read_text(encoding='utf-8')
FAVICON_FETCHER_JS = importlib.resources.files("changedetectionio.content_fetchers.res").joinpath('favicon-fetcher.js').read_text(encoding='utf-8')
# Registry: clean fetcher name → fetcher class (e.g. 'requests', 'playwright', 'cloakbrowser')
FETCHERS: dict = {}
def register_fetcher(name: str, cls) -> None:
"""Register a fetcher class under its clean name (no html_ prefix)."""
FETCHERS[name] = cls
def get_fetcher(name: str):
"""Return the fetcher class for a clean name, or None."""
return FETCHERS.get(name)
def available_fetchers():
# See the if statement at the bottom of this file for how we switch between playwright and webdriver
import inspect
p = []
# Get built-in fetchers (but skip plugin fetchers that were added via setattr)
for name, obj in inspect.getmembers(sys.modules[__name__], inspect.isclass):
if inspect.isclass(obj):
# @todo html_ is maybe better as fetcher_ or something
# In this case, make sure to edit the default one in store.py and fetch_site_status.py
if name.startswith('html_'):
# Skip plugin fetchers that were already registered
if name not in _plugin_fetchers:
t = tuple([name, obj.fetcher_description])
p.append(t)
# Get plugin fetchers from cache (already loaded at module init)
for name, fetcher_class in _plugin_fetchers.items():
if hasattr(fetcher_class, 'fetcher_description'):
t = tuple([name, fetcher_class.fetcher_description])
p.append(t)
else:
logger.warning(f"Plugin fetcher '{name}' does not have fetcher_description attribute")
return p
"""Return list of (name, description) for all registered fetchers."""
return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
if hasattr(cls, 'fetcher_description')]
def get_plugin_fetchers():
"""Load and return all plugin fetchers from the centralized plugin manager."""
from changedetectionio.pluggy_interface import plugin_manager
def available_browser_fetchers():
"""Return list of (name, description) for fetchers that support screenshots (browser-type fetchers)."""
return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
if cls.supports_screenshots]
fetchers = {}
def _load_fetchers():
"""Load all fetchers (built-ins + plugins) into the FETCHERS registry."""
from changedetectionio.pluggy_interface import plugin_manager, register_builtin_fetchers
# Built-ins must be registered first
register_builtin_fetchers()
# Then external plugins
try:
# Call the register_content_fetcher hook from all registered plugins
results = plugin_manager.hook.register_content_fetcher()
for result in results:
if result:
name, fetcher_class = result
fetchers[name] = fetcher_class
# Register in current module so hasattr() checks work
setattr(sys.modules[__name__], name, fetcher_class)
logger.info(f"Registered plugin fetcher: {name} - {getattr(fetcher_class, 'fetcher_description', 'No description')}")
register_fetcher(name, fetcher_class)
logger.info(f"Registered fetcher: {name} - {getattr(fetcher_class, 'fetcher_description', '?')}")
except Exception as e:
logger.error(f"Error loading plugin fetchers: {e}")
return fetchers
# Initialize plugins at module load time
_plugin_fetchers = get_plugin_fetchers()
# Default browser profiles always shown in the browser profiles table (keyed by machine name)
DEFAULT_BROWSER_PROFILES: dict = {}
# Decide which is the 'real' HTML webdriver, this is more a system wide config
# rather than site-specific.
use_playwright_as_chrome_fetcher = os.getenv('PLAYWRIGHT_DRIVER_URL', False)
if use_playwright_as_chrome_fetcher:
# @note - For now, browser steps always uses playwright
if not strtobool(os.getenv('FAST_PUPPETEER_CHROME_FETCHER', 'False')):
logger.debug('Using Playwright library as fetcher')
from .playwright import fetcher as html_webdriver
else:
logger.debug('Using direct Python Puppeteer library as fetcher')
from .puppeteer import fetcher as html_webdriver
else:
logger.debug("Falling back to selenium as fetcher")
from .webdriver_selenium import fetcher as html_webdriver
def _register_default_browser_profiles():
"""Register browser profiles that are always present in the profiles table."""
from changedetectionio.model.browser_profile import BUILTIN_REQUESTS
DEFAULT_BROWSER_PROFILES[BUILTIN_REQUESTS.get_machine_name()] = BUILTIN_REQUESTS
# Register built-in fetchers as plugins after all imports are complete
from changedetectionio.pluggy_interface import register_builtin_fetchers
register_builtin_fetchers()
# Populate the registry at module load time
_load_fetchers()
_register_default_browser_profiles()
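The registry introduced in this hunk can be exercised in isolation. What follows is a minimal sketch rather than the project's module: `DummyRequests` and `DummyBrowser` are hypothetical stand-ins for real fetcher classes, and only the four registry functions mirror the diff.

```python
# Minimal sketch of the name -> class registry from the hunk above.
# DummyRequests and DummyBrowser are hypothetical stand-ins, not real fetchers.
FETCHERS: dict = {}


def register_fetcher(name: str, cls) -> None:
    """Register a fetcher class under its clean name."""
    FETCHERS[name] = cls


def get_fetcher(name: str):
    """Return the fetcher class for a clean name, or None."""
    return FETCHERS.get(name)


def available_fetchers():
    """List (name, description) for every registered fetcher that has a description."""
    return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
            if hasattr(cls, 'fetcher_description')]


def available_browser_fetchers():
    """List (name, description) for fetchers that can take screenshots."""
    return [(name, cls.fetcher_description) for name, cls in FETCHERS.items()
            if cls.supports_screenshots]


class DummyRequests:
    fetcher_description = "Plain HTTP client (stand-in)"
    supports_screenshots = False


class DummyBrowser:
    fetcher_description = "Browser (stand-in)"
    supports_screenshots = True


register_fetcher('requests', DummyRequests)
register_fetcher('browser', DummyBrowser)
```

Because the registry is a plain dict, listing order follows registration order, which is presumably why the diff registers built-ins before external plugins.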
+32 -18
@@ -70,37 +70,41 @@ class Fetcher():
supports_screenshots = False # Can capture page screenshots
supports_xpath_element_data = False # Can extract xpath element positions/data for visual selector
# Icon shown in the watch list when this fetcher is the effective fetcher.
# Set to a dict with 'filename', 'alt', 'title' keys (image served from static/images/).
# None means no icon is shown (e.g. plain HTTP requests fetcher).
status_icon = None
# Screenshot element locking - prevents layout shifts during screenshot capture
# Only needed for visual comparison (image_ssim_diff processor)
# Locks element dimensions in the first viewport to prevent headers/ads from resizing
lock_viewport_elements = False # Default: disabled for performance
# BrowserProfile-derived settings — applied by browser fetchers, ignored by html_requests
viewport_width: int = 1280
viewport_height: int = 1000
block_images: bool = False
block_fonts: bool = False
profile_user_agent: str = None # Profile-level UA; lower priority than request_headers User-Agent
ignore_https_errors: bool = False
locale: str = None
service_workers: str = 'allow'
extra_delay: int = 0
def __init__(self, **kwargs):
if kwargs and 'screenshot_format' in kwargs:
self.screenshot_format = kwargs.get('screenshot_format')
# Allow lock_viewport_elements to be set via kwargs
if kwargs and 'lock_viewport_elements' in kwargs:
self.lock_viewport_elements = kwargs.get('lock_viewport_elements')
# BrowserProfile fields — store whatever was passed, subclasses use them
for field in ('viewport_width', 'viewport_height', 'block_images', 'block_fonts',
'profile_user_agent', 'ignore_https_errors', 'locale',
'service_workers', 'extra_delay'):
if field in kwargs:
setattr(self, field, kwargs[field])
@classmethod
def get_status_icon_data(cls):
"""Return data for status icon to display in the watch overview.
This method can be overridden by subclasses to provide custom status icons.
Returns:
dict or None: Dictionary with icon data:
{
'filename': 'icon-name.svg', # Icon filename
'alt': 'Alt text', # Alt attribute
'title': 'Tooltip text', # Title attribute
'style': 'height: 1em;' # Optional inline CSS
}
Or None if no icon
"""
return None
def clear_content(self):
"""
@@ -198,6 +202,16 @@ class Fetcher():
# Stop processing here
raise BrowserStepsStepException(step_n=step_n, original_e=e)
def disk_cleanup_after_fetch(self):
"""Remove any temporary files written to disk during a fetch.
The default implementation is a no-op. Browser-based fetchers
override this to delete browser-step screenshots and any other
ephemeral files they create. Called by the processor after
``quit()`` regardless of whether the fetch succeeded or failed.
"""
pass
# It's always good to reset these
def delete_browser_steps_screenshots(self):
import glob
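The BrowserProfile-derived fields added to `Fetcher` in this hunk follow a simple pattern: class-level defaults that an allow-list of keyword arguments may override per instance. A minimal sketch of that pattern, assuming a hypothetical `SketchFetcher` class with only a subset of the fields:

```python
# Sketch of "class defaults + allow-listed kwargs override".
# SketchFetcher is hypothetical; the field names are taken from the hunk above.
class SketchFetcher:
    # Class-level defaults shared by every instance
    viewport_width: int = 1280
    viewport_height: int = 1000
    block_images: bool = False
    extra_delay: int = 0

    _PROFILE_FIELDS = ('viewport_width', 'viewport_height', 'block_images', 'extra_delay')

    def __init__(self, **kwargs):
        # Only known fields are copied; anything else in kwargs is ignored
        for field in self._PROFILE_FIELDS:
            if field in kwargs:
                setattr(self, field, kwargs[field])


default = SketchFetcher()
custom = SketchFetcher(viewport_width=1920, block_images=True, unknown_option=1)
```

Here `setattr` writes to the instance, so the class defaults stay untouched and later instances are not affected by earlier overrides.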
@@ -1,471 +0,0 @@
import asyncio
import gc
import json
import os
from urllib.parse import urlparse
from loguru import logger
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_MAX_TOTAL_HEIGHT, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, FAVICON_FETCHER_JS
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import PageUnloadable, Non200ErrorCodeReceived, EmptyReply, ScreenshotUnavailable, \
BrowserStepsStepException
async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import os
import time
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport_size
dimensions_time = time.time() - setup_start
logger.debug(f"{watch_info}Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
# Use an approach similar to puppeteer: set a larger viewport and take screenshots in chunks
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD # Size that won't cause GPU to overflow
screenshot_chunks = []
y = 0
elements_locked = False
# Only lock viewport elements if explicitly enabled (for image_ssim_diff processor)
# This prevents headers/ads from resizing when viewport changes
if lock_viewport_elements and page_height > page.viewport_size['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'lock-elements-sizing.js')
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
await page.evaluate(lock_elements_js)
elements_locked = True
lock_time = time.time() - lock_start
logger.debug(f"{watch_info}Viewport element locking enabled (took {lock_time:.2f}s)")
if page_height > page.viewport_size['height']:
if page_height < step_size:
step_size = page_height # In case the page is bigger than the default viewport but smaller than the proposed step size
viewport_start = time.time()
logger.debug(f"{watch_info}Setting bigger viewport to step through large page width W{page.viewport_size['width']}xH{step_size} because page_height > viewport_size")
# Set viewport to a larger size to capture more content at once
await page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
viewport_time = time.time() - viewport_start
logger.debug(f"{watch_info}Viewport changed to {page.viewport_size['width']}x{step_size} (took {viewport_time:.2f}s)")
# Capture screenshots in chunks up to the max total height
capture_start = time.time()
chunk_times = []
# Use PNG for better quality (no compression artifacts), JPEG for smaller size
screenshot_type = screenshot_format.lower() if screenshot_format else 'jpeg'
# PNG should use quality 100, JPEG uses configurable quality
screenshot_quality = 100 if screenshot_type == 'png' else int(os.getenv("SCREENSHOT_QUALITY", 72))
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
# Only scroll if not at the top (y > 0)
if y > 0:
await page.evaluate(f"window.scrollTo(0, {y})")
# Request GC only before screenshot (not 3x per chunk)
await page.request_gc()
screenshot_kwargs = {
'type': screenshot_type,
'full_page': False
}
# Only pass quality parameter for jpeg (PNG doesn't support it in Playwright)
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
# Restore original viewport size
await page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
# Unlock element dimensions if they were locked
if elements_locked:
unlock_elements_js_path = os.path.join(os.path.dirname(__file__), 'res', 'unlock-elements-sizing.js')
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
logger.debug(f"{watch_info}Element dimensions unlocked after screenshot capture")
capture_time = time.time() - capture_start
total_capture_time = sum(chunk_times)
logger.debug(f"{watch_info}All {len(screenshot_chunks)} chunks captured in {capture_time:.2f}s (total chunk time: {total_capture_time:.2f}s)")
# If we have multiple chunks, stitch them together
if len(screenshot_chunks) > 1:
stitch_start = time.time()
logger.debug(f"{watch_info}Starting stitching of {len(screenshot_chunks)} chunks")
# Always use spawn subprocess for ANY stitching (2+ chunks)
# PIL allocates at C level and Python GC never releases it - subprocess exit forces OS to reclaim
# Trade-off: 35MB resource_tracker vs 500MB+ PIL leak in main process
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
# Send via raw bytes (no pickle)
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
setup_time = total_time - capture_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px, Capture height: {SCREENSHOT_MAX_TOTAL_HEIGHT}px | "
f"Setup: {setup_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
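The chunking arithmetic in `capture_full_page_async` can be checked without a browser. The helper below is a hypothetical sketch (`chunk_offsets` does not exist in the project) that reproduces only the step-size adjustment and the `while y < min(...)` loop, returning the scroll offsets at which a chunk would be captured. The heights used in the usage lines are made-up values.

```python
def chunk_offsets(page_height, viewport_height, step_size, max_total_height):
    """Return the scroll offsets (y values) at which a chunk would be captured."""
    # Shrink the step when the page is taller than the viewport but shorter than one step
    if page_height > viewport_height and page_height < step_size:
        step_size = page_height
    offsets = []
    y = 0
    # Stop at the page bottom or at the configured maximum, whichever comes first
    while y < min(page_height, max_total_height):
        offsets.append(y)
        y += step_size
    return offsets


tall_page = chunk_offsets(page_height=25000, viewport_height=1000, step_size=10000, max_total_height=16000)
medium_page = chunk_offsets(page_height=5000, viewport_height=1000, step_size=10000, max_total_height=16000)
short_page = chunk_offsets(page_height=800, viewport_height=1000, step_size=10000, max_total_height=16000)
```

With these inputs the tall page yields two chunks (offsets 0 and 10000) and therefore goes through the stitching path, while the other two pages yield a single chunk.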
class fetcher(Fetcher):
fetcher_description = "Playwright {}/Javascript".format(
os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').capitalize()
)
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
fetcher_description += " via '{}'".format(os.getenv("PLAYWRIGHT_DRIVER_URL"))
browser_type = ''
command_executor = ''
# Configs for Proxy setup
# In the ENV vars, is prefixed with "playwright_proxy_", so it is for example "playwright_proxy_server"
playwright_proxy_settings_mappings = ['bypass', 'server', 'username', 'password']
proxy = None
# Capability flags
supports_browser_steps = True
supports_screenshots = True
supports_xpath_element_data = True
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for Playwright fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
self.browser_type = os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').strip('"')
if custom_browser_connection_url:
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
# Fallback to fetching from system
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
self.browser_connection_url = os.getenv("PLAYWRIGHT_DRIVER_URL", 'ws://playwright-chrome:3000').strip('"')
# If any proxy settings are enabled, then we should setup the proxy object
proxy_args = {}
for k in self.playwright_proxy_settings_mappings:
v = os.getenv('playwright_proxy_' + k, False)
if v:
proxy_args[k] = v.strip('"')
if proxy_args:
self.proxy = proxy_args
# allow per-watch proxy selection override
if proxy_override:
self.proxy = {'server': proxy_override}
if self.proxy:
# Playwright needs separate username and password values
parsed = urlparse(self.proxy.get('server'))
if parsed.username:
self.proxy['username'] = parsed.username
self.proxy['password'] = parsed.password
async def screenshot_step(self, step_n=''):
super().screenshot_step(step_n=step_n)
watch_uuid = getattr(self, 'watch_uuid', None)
screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Request GC immediately after screenshot to free memory
# Screenshots can be large and browser steps take many of them
await self.page.request_gc()
if self.browser_steps_screenshot_path is not None:
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
logger.debug(f"Saving step screenshot to {destination}")
with open(destination, 'wb') as f:
f.write(screenshot)
# Clear local reference to allow screenshot bytes to be collected
del screenshot
gc.collect()
async def save_step_html(self, step_n):
super().save_step_html(step_n=step_n)
content = await self.page.content()
# Request GC after getting page content
await self.page.request_gc()
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.html'.format(step_n))
logger.debug(f"Saving step HTML to {destination}")
with open(destination, 'w', encoding='utf-8') as f:
f.write(content)
# Clear local reference
del content
gc.collect()
async def run(self,
fetch_favicon=True,
current_include_filters=None,
empty_pages_are_a_change=False,
ignore_status_codes=False,
is_binary=False,
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
from playwright.async_api import async_playwright
import playwright._impl._errors
import time
self.delete_browser_steps_screenshots()
self.watch_uuid = watch_uuid # Store for use in screenshot_step
response = None
async with async_playwright() as p:
browser_type = getattr(p, self.browser_type)
# Seemed to cause a connection Exception even though I can see it connect
# self.browser = browser_type.connect(self.command_executor, timeout=timeout*1000)
# 60,000 connection timeout only
browser = await browser_type.connect_over_cdp(self.browser_connection_url, timeout=60000)
# SOCKS5 with authentication is not supported (yet)
# https://github.com/microsoft/playwright/issues/10567
# Set user agent to prevent Cloudflare from blocking the browser
# Use the default one configured in the App.py model that's passed from fetch_site_status.py
context = await browser.new_context(
accept_downloads=False, # Should never be needed
bypass_csp=True, # This is needed to enable JavaScript execution on GitHub and others
extra_http_headers=request_headers,
ignore_https_errors=True,
proxy=self.proxy,
service_workers=os.getenv('PLAYWRIGHT_SERVICE_WORKERS', 'allow'), # Should be `allow` or `block` - sites like YouTube can transmit large amounts of data via Service Workers
user_agent=manage_user_agent(headers=request_headers),
)
self.page = await context.new_page()
# Listen for all console events and handle errors
self.page.on("console", lambda msg: logger.debug(f"Playwright console: Watch URL: {url} {msg.type}: {msg.text} {msg.args}"))
# Re-use as much code from browser steps as possible so its the same
from changedetectionio.browser_steps.browser_steps import steppable_browser_interface
browsersteps_interface = steppable_browser_interface(start_url=url)
browsersteps_interface.page = self.page
response = await browsersteps_interface.action_goto_url(value=url)
if response is None:
await context.close()
await browser.close()
logger.debug("Content Fetcher > Response object from the browser communication was none")
raise EmptyReply(url=url, status_code=None)
# In async_playwright, all_headers() returns a coroutine
try:
self.headers = await response.all_headers()
except TypeError:
# Fallback for sync version
self.headers = response.all_headers()
try:
if self.webdriver_js_execute_code is not None and len(self.webdriver_js_execute_code):
await browsersteps_interface.action_execute_js(value=self.webdriver_js_execute_code, selector=None)
except playwright._impl._errors.TimeoutError as e:
await context.close()
await browser.close()
# This can be ok, we will try to grab what we could retrieve
pass
except Exception as e:
logger.debug(f"Content Fetcher > Other exception when executing custom JS code {str(e)}")
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
extra_wait = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay
await self.page.wait_for_timeout(extra_wait * 1000)
try:
self.status_code = response.status
except Exception as e:
# https://github.com/dgtlmoon/changedetection.io/discussions/2122#discussioncomment-8241962
logger.critical(f"Response from the browser/Playwright did not have a status_code! Response follows.")
logger.critical(response)
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
if fetch_favicon:
try:
self.favicon_blob = await self.page.evaluate(FAVICON_FETCHER_JS)
await self.page.request_gc()
except Exception as e:
logger.error(f"Error fetching FavIcon info {str(e)}, continuing.")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Finally block will handle cleanup
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
if not empty_pages_are_a_change and len((await self.page.content()).strip()) == 0:
logger.debug("Content Fetcher > Content was empty, empty_pages_are_a_change = False")
await context.close()
await browser.close()
raise EmptyReply(url=url, status_code=response.status)
# Wrap remaining operations in try/finally to ensure cleanup
try:
# Run Browser Steps here
if self.browser_steps:
try:
await self.iterate_browser_steps(start_url=url)
except BrowserStepsStepException:
# Finally block will handle cleanup
raise
await self.page.wait_for_timeout(extra_wait * 1000)
now = time.time()
# So we can find an element on the page where its selector was entered manually (maybe not xPath etc)
if current_include_filters is not None:
await self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
else:
await self.page.evaluate("var include_filters=''")
await self.page.request_gc()
# request_gc before and after evaluate to free up memory
# @todo browsersteps etc
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
})
await self.page.request_gc()
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
await self.page.request_gc()
self.content = await self.page.content()
await self.page.request_gc()
logger.debug(f"Scrape xPath element data in browser done in {time.time() - now:.2f}s")
# Bug 3 in Playwright screenshot handling
# Some bug where it gives the wrong screenshot size, but making a request with the clip set first seems to solve it
# JPEG is better here because the screenshots can be very very large
# Screenshots also travel via the ws:// (websocket) meaning that the binary data is base64 encoded
# which will significantly increase the IO size between the server and client, it's recommended to use the lowest
# acceptable screenshot quality here
# The actual screenshot - this always base64 and needs decoding! horrible! huge CPU usage
self.screenshot = await capture_full_page_async(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
# Force aggressive memory cleanup - screenshots are large and base64 decode creates temporary buffers
await self.page.request_gc()
gc.collect()
except ScreenshotUnavailable:
# Re-raise screenshot unavailable exceptions
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
finally:
# Clean up resources properly with timeouts to prevent hanging
try:
if hasattr(self, 'page') and self.page:
await self.page.request_gc()
await asyncio.wait_for(self.page.close(), timeout=5.0)
logger.debug(f"Successfully closed page for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing page for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing page for {url}: {e}")
finally:
self.page = None
try:
if context:
await asyncio.wait_for(context.close(), timeout=5.0)
logger.debug(f"Successfully closed context for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing context for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing context for {url}: {e}")
finally:
context = None
try:
if browser:
await asyncio.wait_for(browser.close(), timeout=5.0)
logger.debug(f"Successfully closed browser connection for {url}")
except asyncio.TimeoutError:
logger.warning(f"Timed out closing browser connection for {url} (5s)")
except Exception as e:
logger.warning(f"Error closing browser for {url}: {e}")
finally:
browser = None
# Force Python GC to release Playwright resources immediately
# Playwright objects can have circular references that delay cleanup
gc.collect()
# Plugin registration for built-in fetcher
class PlaywrightFetcherPlugin:
"""Plugin class that registers the Playwright fetcher as a built-in plugin."""
def register_content_fetcher(self):
"""Register the Playwright fetcher"""
return ('html_webdriver', fetcher)
# Create module-level instance for plugin registration
playwright_plugin = PlaywrightFetcherPlugin()
@@ -0,0 +1,41 @@
"""
Playwright CDP fetcher: connects to a remote browser via Chrome DevTools Protocol.
browser_connection_url must be supplied via the resolved BrowserProfile
(set by preconfigure_browser_profiles_based_on_env at startup or edited in the UI).
"""
from loguru import logger
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright Chrome (CDP/Remote)"
requires_connection_url = True
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(proxy_override=proxy_override, custom_browser_connection_url=custom_browser_connection_url, **kwargs)
if custom_browser_connection_url:
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
logger.critical("Playwright CDP fetcher has no browser_connection_url — browser profile was not configured. "
"Set PLAYWRIGHT_DRIVER_URL or configure a browser profile in Settings.")
self.browser_connection_url = None
# CDP always connects to Chromium
self.browser_type = 'chromium'
async def _connect_browser(self, p):
browser_type = getattr(p, self.browser_type)
return await browser_type.connect_over_cdp(self.browser_connection_url, timeout=60_000)
class PlaywrightCDPPlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_cdp', fetcher)
cdp_plugin = PlaywrightCDPPlugin()
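The CDP fetcher above overrides a single hook, `_connect_browser`, and inherits the rest of the flow from its base class. A minimal sketch of that template-method arrangement, with no Playwright dependency: `SketchBaseFetcher`, `SketchCDPFetcher`, and the `ws://example.invalid:3000` URL are hypothetical, and the "browser" is just a string label.

```python
import asyncio


class SketchBaseFetcher:
    async def _connect_browser(self, playwright_instance):
        # Subclasses must supply the connection strategy
        raise NotImplementedError(f"{type(self).__name__} must implement _connect_browser()")

    async def run(self, playwright_instance):
        # Shared flow: obtain a browser from the subclass, then use it
        browser = await self._connect_browser(playwright_instance)
        return f"fetched with {browser}"


class SketchCDPFetcher(SketchBaseFetcher):
    def __init__(self, connection_url):
        self.connection_url = connection_url

    async def _connect_browser(self, playwright_instance):
        # A real subclass would call connect_over_cdp() here; this returns a label instead
        return f"remote browser at {self.connection_url}"


result = asyncio.run(SketchCDPFetcher("ws://example.invalid:3000").run(playwright_instance=None))
```

Keeping the connection step in one overridable method means a fetcher that launches a local browser would only need to replace that method.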
@@ -0,0 +1,403 @@
"""
Playwright-based content fetchers.
Submodules:
cdp: connect to a remote browser via Chrome DevTools Protocol (CDP/WebSocket)
chrome: launch a local Chromium browser
firefox: launch a local Firefox browser
webkit: launch a local WebKit (Safari-engine) browser
"""
import asyncio
import gc
import json
import os
import re
from urllib.parse import urlparse
from loguru import logger
from changedetectionio.content_fetchers import (
SCREENSHOT_MAX_HEIGHT_DEFAULT,
SCREENSHOT_MAX_TOTAL_HEIGHT,
SCREENSHOT_SIZE_STITCH_THRESHOLD,
FAVICON_FETCHER_JS,
INSTOCK_DATA_JS,
XPATH_ELEMENT_JS,
visualselector_xpath_selectors,
)
from changedetectionio.content_fetchers.base import Fetcher, manage_user_agent
from changedetectionio.content_fetchers.exceptions import (
BrowserStepsStepException,
EmptyReply,
Non200ErrorCodeReceived,
PageUnloadable,
ScreenshotUnavailable,
)
async def capture_full_page_async(page, screenshot_format='JPEG', watch_uuid=None, lock_viewport_elements=False):
import time
start = time.time()
watch_info = f"[{watch_uuid}] " if watch_uuid else ""
setup_start = time.time()
page_height = await page.evaluate("document.documentElement.scrollHeight")
page_width = await page.evaluate("document.documentElement.scrollWidth")
original_viewport = page.viewport_size
dimensions_time = time.time() - setup_start
logger.debug(f"{watch_info}Playwright viewport size {page.viewport_size} page height {page_height} page width {page_width} (got dimensions in {dimensions_time:.2f}s)")
step_size = SCREENSHOT_SIZE_STITCH_THRESHOLD
screenshot_chunks = []
y = 0
elements_locked = False
if lock_viewport_elements and page_height > page.viewport_size['height']:
lock_start = time.time()
lock_elements_js_path = os.path.join(os.path.dirname(__file__), '..', 'res', 'lock-elements-sizing.js')
with open(lock_elements_js_path, 'r') as f:
lock_elements_js = f.read()
await page.evaluate(lock_elements_js)
elements_locked = True
logger.debug(f"{watch_info}Viewport element locking enabled (took {time.time() - lock_start:.2f}s)")
if page_height > page.viewport_size['height']:
if page_height < step_size:
step_size = page_height
await page.set_viewport_size({'width': page.viewport_size['width'], 'height': step_size})
capture_start = time.time()
chunk_times = []
screenshot_type = screenshot_format.lower() if screenshot_format else 'jpeg'
screenshot_quality = 100 if screenshot_type == 'png' else int(os.getenv("SCREENSHOT_QUALITY", 72))
while y < min(page_height, SCREENSHOT_MAX_TOTAL_HEIGHT):
if y > 0:
await page.evaluate(f"window.scrollTo(0, {y})")
await _safe_request_gc(page)
screenshot_kwargs = {'type': screenshot_type, 'full_page': False}
if screenshot_type == 'jpeg':
screenshot_kwargs['quality'] = screenshot_quality
chunk_start = time.time()
screenshot_chunks.append(await page.screenshot(**screenshot_kwargs))
chunk_time = time.time() - chunk_start
chunk_times.append(chunk_time)
logger.debug(f"{watch_info}Chunk {len(screenshot_chunks)} captured in {chunk_time:.2f}s")
y += step_size
await page.set_viewport_size({'width': original_viewport['width'], 'height': original_viewport['height']})
if elements_locked:
unlock_elements_js_path = os.path.join(os.path.dirname(__file__), '..', 'res', 'unlock-elements-sizing.js')
with open(unlock_elements_js_path, 'r') as f:
unlock_elements_js = f.read()
await page.evaluate(unlock_elements_js)
capture_time = time.time() - capture_start
if len(screenshot_chunks) > 1:
stitch_start = time.time()
from changedetectionio.content_fetchers.screenshot_handler import stitch_images_worker_raw_bytes
import multiprocessing
import struct
ctx = multiprocessing.get_context('spawn')
parent_conn, child_conn = ctx.Pipe()
p = ctx.Process(target=stitch_images_worker_raw_bytes, args=(child_conn, page_height, SCREENSHOT_MAX_TOTAL_HEIGHT))
p.start()
parent_conn.send_bytes(struct.pack('I', len(screenshot_chunks)))
for chunk in screenshot_chunks:
parent_conn.send_bytes(chunk)
screenshot = parent_conn.recv_bytes()
p.join()
parent_conn.close()
child_conn.close()
del p, parent_conn, child_conn
stitch_time = time.time() - stitch_start
total_time = time.time() - start
setup_time = total_time - capture_time - stitch_time
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px | "
f"Setup: {setup_time:.2f}s, Capture: {capture_time:.2f}s, Stitching: {stitch_time:.2f}s, Total: {total_time:.2f}s")
return screenshot
total_time = time.time() - start
logger.debug(
f"{watch_info}Screenshot complete - Page height: {page_height}px | "
f"Setup: {total_time - capture_time:.2f}s, Single chunk: {capture_time:.2f}s, Total: {total_time:.2f}s")
return screenshot_chunks[0]
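The stitching path sends the chunks to the worker as raw bytes: first a 4-byte count packed with `struct.pack('I', ...)`, then one message per chunk. The sketch below shows that framing over a `multiprocessing.Pipe`. The receiving side is an assumption, since `stitch_images_worker_raw_bytes` is not shown in this diff, and both ends run in one process here purely for illustration.

```python
import multiprocessing
import struct


def send_chunks(conn, chunks):
    """Send a 4-byte chunk count, then each chunk as its own message."""
    conn.send_bytes(struct.pack('I', len(chunks)))
    for chunk in chunks:
        conn.send_bytes(chunk)


def recv_chunks(conn):
    """Read the count header, then that many chunk messages (assumed worker behaviour)."""
    (count,) = struct.unpack('I', conn.recv_bytes())
    return [conn.recv_bytes() for _ in range(count)]


# Both ends live in one process here, which is only safe for tiny payloads
# that fit in the pipe buffer; the diff uses a spawned subprocess instead.
parent_conn, child_conn = multiprocessing.Pipe()
send_chunks(parent_conn, [b'chunk-one', b'chunk-two'])
received = recv_chunks(child_conn)
parent_conn.close()
child_conn.close()
```

`send_bytes` and `recv_bytes` preserve message boundaries, so no per-chunk length prefix is needed beyond the initial count.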
async def _safe_request_gc(page):
"""Request browser GC — Chromium-specific, silently ignored on Firefox/WebKit."""
try:
await page.request_gc()
except Exception:
pass
class PlaywrightBaseFetcher(Fetcher):
"""
Shared base for all Playwright fetchers.
Subclasses implement ``_connect_browser(playwright_instance)`` to return a
connected-or-launched browser object. Everything else context creation,
page interaction, screenshot capture, browser-steps execution lives here.
"""
playwright_proxy_settings_mappings = ['bypass', 'server', 'username', 'password']
proxy = None
# Capability flags
supports_browser_steps = True
supports_screenshots = True
supports_xpath_element_data = True
status_icon = {'filename': 'google-chrome-icon.png', 'alt': 'Using a Chrome browser', 'title': 'Using a Chrome browser'}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
# Subclasses may use this (e.g. CDP); others ignore it
self._custom_browser_connection_url = custom_browser_connection_url
proxy_args = {}
for k in self.playwright_proxy_settings_mappings:
v = os.getenv('playwright_proxy_' + k, False)
if v:
proxy_args[k] = v.strip('"')
if proxy_args:
self.proxy = proxy_args
if proxy_override:
self.proxy = {'server': proxy_override}
if self.proxy:
parsed = urlparse(self.proxy.get('server', ''))
if parsed.username:
self.proxy['username'] = parsed.username
self.proxy['password'] = parsed.password
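The last step of the constructor copies credentials embedded in the proxy URL into separate `username` and `password` keys. A minimal sketch of that step using only `urllib.parse`: `split_proxy_credentials` and the example proxy URLs are hypothetical, and unlike the constructor this helper returns a copy instead of mutating `self.proxy`.

```python
from urllib.parse import urlparse


def split_proxy_credentials(proxy):
    """Copy username/password out of proxy['server'] when the URL carries them."""
    proxy = dict(proxy)  # do not mutate the caller's dict
    parsed = urlparse(proxy.get('server', ''))
    if parsed.username:
        proxy['username'] = parsed.username
        proxy['password'] = parsed.password
    return proxy


with_auth = split_proxy_credentials({'server': 'http://alice:s3cret@proxy.example:3128'})
without_auth = split_proxy_credentials({'server': 'http://proxy.example:3128'})
```

As in the diff, the `server` value itself is left unchanged, so the credentials remain in the URL as well as in the separate keys.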
def disk_cleanup_after_fetch(self):
"""Delete browser-step screenshots written during this fetch."""
self.delete_browser_steps_screenshots()
async def _connect_browser(self, playwright_instance):
"""Return an open browser object. Must be overridden by each subclass."""
raise NotImplementedError(f"{type(self).__name__} must implement _connect_browser()")
async def screenshot_step(self, step_n=''):
super().screenshot_step(step_n=step_n)
watch_uuid = getattr(self, 'watch_uuid', None)
screenshot = await capture_full_page_async(
page=self.page,
screenshot_format=self.screenshot_format,
watch_uuid=watch_uuid,
lock_viewport_elements=self.lock_viewport_elements,
)
await _safe_request_gc(self.page)
if self.browser_steps_screenshot_path is not None:
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.jpeg'.format(step_n))
logger.debug(f"Saving step screenshot to {destination}")
with open(destination, 'wb') as f:
f.write(screenshot)
del screenshot
gc.collect()
async def save_step_html(self, step_n):
super().save_step_html(step_n=step_n)
content = await self.page.content()
await _safe_request_gc(self.page)
destination = os.path.join(self.browser_steps_screenshot_path, 'step_{}.html'.format(step_n))
logger.debug(f"Saving step HTML to {destination}")
with open(destination, 'w', encoding='utf-8') as f:
f.write(content)
del content
gc.collect()
async def run(self,
fetch_favicon=True,
current_include_filters=None,
empty_pages_are_a_change=False,
ignore_status_codes=False,
is_binary=False,
request_body=None,
request_headers=None,
request_method=None,
screenshot_format=None,
timeout=None,
url=None,
watch_uuid=None,
):
from playwright.async_api import async_playwright
import playwright._impl._errors
import time
self.delete_browser_steps_screenshots()
self.watch_uuid = watch_uuid
response = None
async with async_playwright() as p:
browser = await self._connect_browser(p)
ua = manage_user_agent(headers=request_headers) or self.profile_user_agent or None
context_kwargs = dict(
accept_downloads=False,
bypass_csp=True,
extra_http_headers=request_headers,
ignore_https_errors=self.ignore_https_errors,
proxy=self.proxy,
service_workers=self.service_workers,
user_agent=ua,
viewport={'width': self.viewport_width, 'height': self.viewport_height},
)
if self.locale:
context_kwargs['locale'] = self.locale
context = await browser.new_context(**context_kwargs)
if self.block_images:
await context.route(
re.compile(r'\.(png|jpe?g|gif|svg|ico|webp|avif|bmp)(\?.*)?$', re.IGNORECASE),
lambda route: route.abort()
)
if self.block_fonts:
await context.route(
re.compile(r'\.(woff2?|ttf|otf|eot)(\?.*)?$', re.IGNORECASE),
lambda route: route.abort()
)
self.page = await context.new_page()
self.page.on("console", lambda msg: logger.debug(f"Playwright console: {url} {msg.type}: {msg.text}"))
from changedetectionio.browser_steps.browser_steps import steppable_browser_interface
browsersteps_interface = steppable_browser_interface(start_url=url)
browsersteps_interface.page = self.page
response = await browsersteps_interface.action_goto_url(value=url)
if response is None:
await context.close()
await browser.close()
raise EmptyReply(url=url, status_code=None)
try:
self.headers = await response.all_headers()
except TypeError:
self.headers = response.all_headers()
try:
if self.webdriver_js_execute_code is not None and len(self.webdriver_js_execute_code):
await browsersteps_interface.action_execute_js(value=self.webdriver_js_execute_code, selector=None)
except playwright._impl._errors.TimeoutError:
await context.close()
await browser.close()
pass
except Exception as e:
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
extra_wait = self.extra_delay + self.render_extract_delay
await self.page.wait_for_timeout(extra_wait * 1000)
try:
self.status_code = response.status
except Exception as e:
await context.close()
await browser.close()
raise PageUnloadable(url=url, status_code=None, message=str(e))
if fetch_favicon:
try:
self.favicon_blob = await self.page.evaluate(FAVICON_FETCHER_JS)
await _safe_request_gc(self.page)
except Exception as e:
logger.error(f"Error fetching favicon: {e}")
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page_async(self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
try:
page_html = await self.page.content()
except Exception as e:
logger.warning(f"Got non-200 status {self.status_code} but failed to fetch page content: {e}")
page_html = None
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot, page_html=page_html)
if not empty_pages_are_a_change and len((await self.page.content()).strip()) == 0:
await context.close()
await browser.close()
raise EmptyReply(url=url, status_code=response.status)
try:
if self.browser_steps:
try:
await self.iterate_browser_steps(start_url=url)
except BrowserStepsStepException:
raise
await self.page.wait_for_timeout(extra_wait * 1000)
now = time.time()
if current_include_filters is not None:
await self.page.evaluate("var include_filters={}".format(json.dumps(current_include_filters)))
else:
await self.page.evaluate("var include_filters=''")
await _safe_request_gc(self.page)
MAX_TOTAL_HEIGHT = int(os.getenv("SCREENSHOT_MAX_HEIGHT", SCREENSHOT_MAX_HEIGHT_DEFAULT))
self.xpath_data = await self.page.evaluate(XPATH_ELEMENT_JS, {
"visualselector_xpath_selectors": visualselector_xpath_selectors,
"max_height": MAX_TOTAL_HEIGHT
})
await _safe_request_gc(self.page)
self.instock_data = await self.page.evaluate(INSTOCK_DATA_JS)
await _safe_request_gc(self.page)
self.content = await self.page.content()
await _safe_request_gc(self.page)
logger.debug(f"Scrape xPath element data done in {time.time() - now:.2f}s")
self.screenshot = await capture_full_page_async(
page=self.page,
screenshot_format=self.screenshot_format,
watch_uuid=watch_uuid,
lock_viewport_elements=self.lock_viewport_elements,
)
await _safe_request_gc(self.page)
gc.collect()
except ScreenshotUnavailable:
raise ScreenshotUnavailable(url=url, status_code=self.status_code)
finally:
for obj, name, close_coro in [
(self.page if hasattr(self, 'page') and self.page else None, 'page', lambda: self.page.close() if self.page else asyncio.sleep(0)),
(context, 'context', lambda: context.close() if context else asyncio.sleep(0)),
(browser, 'browser', lambda: browser.close() if browser else asyncio.sleep(0)),
]:
try:
await asyncio.wait_for(close_coro(), timeout=5.0)
except asyncio.TimeoutError:
logger.warning(f"Timed out closing {name} for {url}")
except Exception as e:
logger.warning(f"Error closing {name} for {url}: {e}")
self.page = None
context = None
browser = None
gc.collect()
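The `finally` block above closes the page, context and browser in order, giving each close call a bounded wait so that one hung resource cannot stall the worker. A minimal standalone sketch of that pattern follows; `close_quietly` and `FakeResource` are hypothetical names used only for illustration and are not part of the patch.

```python
import asyncio


async def close_quietly(resources, timeout=5.0):
    """Close each (name, close_fn) pair in order.

    Returns the names whose close call timed out or raised, instead of
    letting one failure prevent the remaining resources from closing.
    """
    failed = []
    for name, close_fn in resources:
        try:
            await asyncio.wait_for(close_fn(), timeout=timeout)
        except asyncio.TimeoutError:
            failed.append(name)
        except Exception:
            failed.append(name)
    return failed


class FakeResource:
    """Stand-in for a page/context/browser object (illustration only)."""

    def __init__(self, delay=0.0, error=None):
        self.delay = delay
        self.error = error
        self.closed = False

    async def close(self):
        await asyncio.sleep(self.delay)
        if self.error:
            raise self.error
        self.closed = True


def demo():
    page = FakeResource()
    context = FakeResource(delay=1.0)  # slower than the timeout below
    browser = FakeResource(error=RuntimeError("already closed"))
    failed = asyncio.run(close_quietly(
        [("page", page.close), ("context", context.close), ("browser", browser.close)],
        timeout=0.05,
    ))
    return failed, page.closed
```

Closing in page, context, browser order mirrors the patch; the sketch collects failures rather than logging them so the behaviour is easy to check.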
@@ -0,0 +1,27 @@
"""
Playwright Chrome fetcher: launches a local Chromium browser directly.
No external browser container is required. Playwright must be installed
with Chromium browsers: ``playwright install chromium``.
"""
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright Chrome (local)"
async def _connect_browser(self, p):
launch_kwargs = {'headless': True}
if self.proxy:
launch_kwargs['proxy'] = self.proxy
return await p.chromium.launch(**launch_kwargs)
class PlaywrightChromePlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_chrome', fetcher)
chrome_plugin = PlaywrightChromePlugin()
@@ -0,0 +1,33 @@
"""
Playwright Firefox fetcher: launches a local Firefox browser directly.
No external browser container is required. Playwright must be installed
with Firefox browsers: ``playwright install firefox``.
Note: ``page.request_gc()`` is Chromium-specific and is silently skipped
on Firefox; this is handled transparently by ``_safe_request_gc()`` in
the base package.
"""
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright Firefox (local)"
status_icon = {'filename': 'firefox-icon.svg', 'alt': 'Using Firefox', 'title': 'Using Firefox'}
async def _connect_browser(self, p):
launch_kwargs = {'headless': True}
if self.proxy:
launch_kwargs['proxy'] = self.proxy
return await p.firefox.launch(**launch_kwargs)
class PlaywrightFirefoxPlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_firefox', fetcher)
firefox_plugin = PlaywrightFirefoxPlugin()
@@ -0,0 +1,30 @@
"""
Playwright WebKit fetcher: launches a local WebKit (Safari-engine) browser.
No external browser container is required. Playwright must be installed
with WebKit browsers: ``playwright install webkit``.
Note: ``page.request_gc()`` is Chromium-specific and is silently skipped
on WebKit; this is handled transparently by ``_safe_request_gc()`` in the base package.
"""
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers.playwright import PlaywrightBaseFetcher
class fetcher(PlaywrightBaseFetcher):
fetcher_description = "Playwright WebKit/Safari (local)"
async def _connect_browser(self, p):
launch_kwargs = {'headless': True}
if self.proxy:
launch_kwargs['proxy'] = self.proxy
return await p.webkit.launch(**launch_kwargs)
class PlaywrightWebKitPlugin:
@hookimpl
def register_content_fetcher(self):
return ('playwright_webkit', fetcher)
webkit_plugin = PlaywrightWebKitPlugin()
+20 -20
@@ -7,6 +7,7 @@ from urllib.parse import urlparse
from loguru import logger
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.content_fetchers import SCREENSHOT_MAX_HEIGHT_DEFAULT, visualselector_xpath_selectors, \
SCREENSHOT_SIZE_STITCH_THRESHOLD, SCREENSHOT_DEFAULT_QUALITY, XPATH_ELEMENT_JS, INSTOCK_DATA_JS, \
SCREENSHOT_MAX_TOTAL_HEIGHT, FAVICON_FETCHER_JS
@@ -166,11 +167,8 @@ async def capture_full_page(page, screenshot_format='JPEG', watch_uuid=None, loc
class fetcher(Fetcher):
fetcher_description = "Puppeteer/direct {}/Javascript".format(
os.getenv("PLAYWRIGHT_BROWSER_TYPE", 'chromium').capitalize()
)
if os.getenv("PLAYWRIGHT_DRIVER_URL"):
fetcher_description += " via '{}'".format(os.getenv("PLAYWRIGHT_DRIVER_URL"))
fetcher_description = "Puppeteer Chromium"
requires_connection_url = True
browser = None
browser_type = ''
@@ -182,14 +180,10 @@ class fetcher(Fetcher):
supports_screenshots = True
supports_xpath_element_data = True
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for Puppeteer fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
status_icon = {'filename': 'google-chrome-icon.png', 'alt': 'Using a Chrome browser', 'title': 'Using a Chrome browser'}
def disk_cleanup_after_fetch(self):
self.delete_browser_steps_screenshots()
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
@@ -198,9 +192,10 @@ class fetcher(Fetcher):
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
# Fallback to fetching from system
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
self.browser_connection_url = os.getenv("PLAYWRIGHT_DRIVER_URL", 'ws://playwright-chrome:3000').strip('"')
from loguru import logger
logger.critical("Puppeteer fetcher has no browser_connection_url — browser profile was not configured. "
"Set PLAYWRIGHT_DRIVER_URL or configure a browser profile in Settings.")
self.browser_connection_url = None
# allow per-watch proxy selection override
# @todo check global too?
@@ -270,7 +265,7 @@ class fetcher(Fetcher):
import re
self.delete_browser_steps_screenshots()
n = int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 12)) + self.render_extract_delay
n = self.extra_delay + self.render_extract_delay
extra_wait = min(n, 15)
logger.debug(f"Extra wait set to {extra_wait}s, requested was {n}s.")
@@ -447,8 +442,12 @@ class fetcher(Fetcher):
if self.status_code != 200 and not ignore_status_codes:
screenshot = await capture_full_page(page=self.page, screenshot_format=self.screenshot_format, watch_uuid=watch_uuid, lock_viewport_elements=self.lock_viewport_elements)
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot)
try:
page_html = await self.page.content
except Exception as e:
logger.warning(f"Got non-200 status {self.status_code} but failed to fetch page content: {e}")
page_html = None
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=screenshot, page_html=page_html)
content = await self.page.content
@@ -548,9 +547,10 @@ class fetcher(Fetcher):
class PuppeteerFetcherPlugin:
"""Plugin class that registers the Puppeteer fetcher as a built-in plugin."""
@hookimpl
def register_content_fetcher(self):
"""Register the Puppeteer fetcher"""
return ('html_webdriver', fetcher)
return ('puppeteer', fetcher)
# Create module-level instance for plugin registration
+29 -5
@@ -8,6 +8,7 @@ import asyncio
from changedetectionio import strtobool
from changedetectionio.content_fetchers.exceptions import BrowserStepsInUnsupportedFetcher, EmptyReply, Non200ErrorCodeReceived
from changedetectionio.content_fetchers.base import Fetcher
from changedetectionio.pluggy_interface import hookimpl
from changedetectionio.validate_url import is_private_hostname
@@ -148,10 +149,32 @@ class fetcher(Fetcher):
# Default to UTF-8 for XML if no encoding found
r.encoding = 'utf-8'
else:
# For other content types, use chardet
encoding = chardet.detect(r.content)['encoding']
if encoding:
r.encoding = encoding
# No charset in HTTP header - sniff encoding in priority order matching browsers
# (WHATWG encoding sniffing algorithm):
# 1. BOM - highest confidence, check before anything else
# 2. <meta charset> in first 2kb
# 3. chardet statistical detection - last resort
# See: https://github.com/dgtlmoon/changedetection.io/issues/3952
boms = [
(b'\xef\xbb\xbf', 'utf-8-sig'),
(b'\xff\xfe', 'utf-16-le'),
(b'\xfe\xff', 'utf-16-be'),
]
bom_encoding = next((enc for bom, enc in boms if r.content.startswith(bom)), None)
if bom_encoding:
logger.info(f"URL: {url} Using encoding '{bom_encoding}' detected from BOM")
r.encoding = bom_encoding
else:
meta_charset_match = re.search(rb'<meta[^>]+charset\s*=\s*["\']?\s*([^"\'\s;>]+)', r.content[:2000], re.IGNORECASE)
if meta_charset_match:
encoding = meta_charset_match.group(1).decode('ascii', errors='ignore')
logger.info(f"URL: {url} No content-type encoding in HTTP headers - Using encoding '{encoding}' from HTML meta charset tag")
r.encoding = encoding
else:
encoding = chardet.detect(r.content)['encoding']
logger.warning(f"URL: {url} No charset in headers or meta tag, guessed encoding as '{encoding}' via chardet")
if encoding:
r.encoding = encoding
self.headers = r.headers
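The hunk above sniffs the encoding in three steps when the HTTP headers carry no charset: BOM first, then a `<meta charset>` in the first 2000 bytes, then statistical detection. A minimal stdlib-only sketch of that ordering is below. `sniff_encoding` is a hypothetical helper name, and the `fallback` argument stands in for the chardet step so the sketch has no third-party dependency.

```python
import re

# BOM signatures, in the same order as the patch.
BOMS = [
    (b'\xef\xbb\xbf', 'utf-8-sig'),
    (b'\xff\xfe', 'utf-16-le'),
    (b'\xfe\xff', 'utf-16-be'),
]

# Same pattern as the patch: matches both <meta charset=...> and the
# http-equiv form where charset= appears inside the content attribute.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([^"\'\s;>]+)', re.IGNORECASE
)


def sniff_encoding(body: bytes, fallback=None):
    """Return (encoding, source) where source is 'bom', 'meta' or 'fallback'."""
    # 1. BOM: highest confidence, checked before anything else.
    for bom, enc in BOMS:
        if body.startswith(bom):
            return enc, 'bom'
    # 2. <meta charset> within the first 2000 bytes.
    match = META_CHARSET_RE.search(body[:2000])
    if match:
        return match.group(1).decode('ascii', errors='ignore'), 'meta'
    # 3. Last resort: whatever the caller's detector guessed (assumption:
    #    in the patch this is chardet.detect(...)['encoding']).
    return fallback, 'fallback'
```

For instance, `sniff_encoding(b'<meta charset="windows-1251">')` reports the declared charset with source `'meta'`, and a body with neither BOM nor meta tag returns the supplied fallback unchanged.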
@@ -236,9 +259,10 @@ class fetcher(Fetcher):
class RequestsFetcherPlugin:
"""Plugin class that registers the requests fetcher as a built-in plugin."""
@hookimpl
def register_content_fetcher(self):
"""Register the requests fetcher"""
return ('html_requests', fetcher)
return ('requests', fetcher)
# Create module-level instance for plugin registration
@@ -3,13 +3,13 @@ import time
from loguru import logger
from changedetectionio.content_fetchers.base import Fetcher
from changedetectionio.content_fetchers.exceptions import Non200ErrorCodeReceived
from changedetectionio.pluggy_interface import hookimpl
class fetcher(Fetcher):
if os.getenv("WEBDRIVER_URL"):
fetcher_description = f"WebDriver Chrome/Javascript via \"{os.getenv('WEBDRIVER_URL', '')}\""
else:
fetcher_description = "WebDriver Chrome/Javascript"
fetcher_description = "Selenium WebDriver Chrome"
requires_connection_url = True
proxy = None
proxy_url = None
@@ -19,26 +19,21 @@ class fetcher(Fetcher):
supports_screenshots = True
supports_xpath_element_data = True
@classmethod
def get_status_icon_data(cls):
"""Return Chrome browser icon data for WebDriver fetcher."""
return {
'filename': 'google-chrome-icon.png',
'alt': 'Using a Chrome browser',
'title': 'Using a Chrome browser'
}
status_icon = {'filename': 'google-chrome-icon.png', 'alt': 'Using a Chrome browser', 'title': 'Using a Chrome browser'}
def __init__(self, proxy_override=None, custom_browser_connection_url=None, **kwargs):
super().__init__(**kwargs)
from urllib.parse import urlparse
from selenium.webdriver.common.proxy import Proxy
# .strip('"') is going to save someone a lot of time when they accidentally wrap the env value
if not custom_browser_connection_url:
self.browser_connection_url = os.getenv("WEBDRIVER_URL", 'http://browser-chrome:4444/wd/hub').strip('"')
else:
if custom_browser_connection_url:
self.browser_connection_is_custom = True
self.browser_connection_url = custom_browser_connection_url
else:
from loguru import logger
logger.critical("Selenium WebDriver fetcher has no browser_connection_url — browser profile was not configured. "
"Set WEBDRIVER_URL or configure a browser profile in Settings.")
self.browser_connection_url = None
##### PROXY SETUP #####
@@ -130,22 +125,28 @@ class fetcher(Fetcher):
if "--window-size" not in os.getenv("CHROME_OPTIONS", ""):
driver.set_window_size(1280, 1024)
driver.implicitly_wait(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))
driver.implicitly_wait(self.extra_delay)
if self.webdriver_js_execute_code is not None:
driver.execute_script(self.webdriver_js_execute_code)
# Selenium doesn't automatically wait for actions as well as Playwright does, so wait again
driver.implicitly_wait(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))
# @todo - how to check this? is it possible?
self.status_code = 200
# @todo somehow we should try to get this working for WebDriver
# raise EmptyReply(url=url, status_code=r.status_code)
driver.implicitly_wait(self.extra_delay)
# @todo - dom wait loaded?
import time
time.sleep(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)) + self.render_extract_delay)
time.sleep(self.extra_delay + self.render_extract_delay)
self.content = driver.page_source
# Use Navigation Timing API to get the real HTTP status code (Chrome 102+)
# Read after the sleep so the page is fully settled
try:
nav_status = driver.execute_script(
"return window.performance.getEntriesByType('navigation')[0]?.responseStatus"
)
# Guard against 0 (file://, blocked requests) which should not raise Non200
self.status_code = int(nav_status) if nav_status and int(nav_status) > 0 else 200
except Exception:
self.status_code = 200
self.headers = {}
# Selenium always captures as PNG, convert to JPEG if needed
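The status-code handling in the hunk above treats a missing, zero or unparsable `responseStatus` as 200, so that `file://` pages and blocked requests do not raise `Non200ErrorCodeReceived`. A small sketch of that guard as a pure function, with `normalise_nav_status` as a hypothetical name not present in the patch:

```python
def normalise_nav_status(nav_status):
    """Map a Navigation Timing responseStatus value to an HTTP status code.

    None, 0, negative or non-numeric values fall back to 200, matching the
    defensive default used in the patch.
    """
    try:
        status = int(nav_status) if nav_status else 0
    except (TypeError, ValueError):
        return 200
    return status if status > 0 else 200
```

Keeping the guard in one function makes the fallback cases straightforward to unit test without a running browser.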
@@ -175,6 +176,10 @@ class fetcher(Fetcher):
img.close()
else:
self.screenshot = screenshot_png
if self.status_code != 200 and not ignore_status_codes:
raise Non200ErrorCodeReceived(url=url, status_code=self.status_code, screenshot=self.screenshot, page_html=self.content)
except Exception as e:
driver.quit()
raise e
@@ -190,9 +195,10 @@ class fetcher(Fetcher):
class WebDriverSeleniumFetcherPlugin:
"""Plugin class that registers the WebDriver Selenium fetcher as a built-in plugin."""
@hookimpl
def register_content_fetcher(self):
"""Register the WebDriver Selenium fetcher"""
return ('html_webdriver', fetcher)
return ('selenium', fetcher)
# Create module-level instance for plugin registration
+40 -56
@@ -4,6 +4,7 @@ import flask_login
import locale
import os
import queue
import re
import sys
import threading
import time
@@ -217,9 +218,13 @@ def _jinja2_filter_format_number_locale(value: float) -> str:
"Formats for example 4000.10 to the local locale default of 4,000.10"
# Format the number with two decimal places (locale format string will return 6 decimal)
formatted_value = locale.format_string("%.2f", value, grouping=True)
return formatted_value
@app.template_filter('regex_search')
def _jinja2_filter_regex_search(value, pattern):
import re
return re.search(pattern, str(value)) is not None
@app.template_global('is_checking_now')
def _watch_is_checking_now(watch_obj, format="%Y-%m-%d %H:%M:%S"):
return worker_pool.is_watch_running(watch_obj['uuid'])
@@ -336,52 +341,38 @@ def _jinja2_filter_format_duration(seconds):
@app.template_filter('fetcher_status_icons')
def _jinja2_filter_fetcher_status_icons(fetcher_name):
"""Get status icon HTML for a given fetcher.
"""Return status icon HTML for a fetcher, or empty string if none.
This filter checks both built-in fetchers and plugin fetchers for status icons.
Args:
fetcher_name: The fetcher name (e.g., 'html_webdriver', 'html_js_zyte')
Returns:
str: HTML string containing status icon elements
Built-in fetchers declare their icon via the ``status_icon`` class attribute
on their ``Fetcher`` subclass. Plugin fetchers may still use the pluggy
``collect_fetcher_status_icons`` hook as a fallback.
"""
from changedetectionio import content_fetchers
from changedetectionio.pluggy_interface import collect_fetcher_status_icons
from markupsafe import Markup
from flask import url_for
icon_data = None
# First check if it's a plugin fetcher (plugins have priority)
plugin_icon_data = collect_fetcher_status_icons(fetcher_name)
if plugin_icon_data:
icon_data = plugin_icon_data
# Check if it's a built-in fetcher
elif hasattr(content_fetchers, fetcher_name):
fetcher_class = getattr(content_fetchers, fetcher_name)
if hasattr(fetcher_class, 'get_status_icon_data'):
fetcher_class = content_fetchers.get_fetcher(fetcher_name)
if fetcher_class is not None:
icon_data = getattr(fetcher_class, 'status_icon', None)
if not icon_data and callable(getattr(fetcher_class, 'get_status_icon_data', None)):
icon_data = fetcher_class.get_status_icon_data()
# Build HTML from icon data
if icon_data and isinstance(icon_data, dict):
# Use 'group' from icon_data if specified, otherwise default to 'images'
group = icon_data.get('group', 'images')
# Fallback: pluggy hook for plugins that implement fetcher_status_icon
if not icon_data:
from changedetectionio.pluggy_interface import collect_fetcher_status_icons
icon_data = collect_fetcher_status_icons(fetcher_name)
# Try to use url_for, but fall back to manual URL building if endpoint not registered yet
try:
icon_url = url_for('static_content', group=group, filename=icon_data['filename'])
except:
# Fallback: build URL manually respecting APPLICATION_ROOT
from flask import request
app_root = request.script_root if hasattr(request, 'script_root') else ''
icon_url = f"{app_root}/static/{group}/{icon_data['filename']}"
if not icon_data:
return ''
style_attr = f' style="{icon_data["style"]}"' if icon_data.get('style') else ''
html = f'<img class="status-icon" src="{icon_url}" alt="{icon_data["alt"]}" title="{icon_data["title"]}"{style_attr}>'
return Markup(html)
group = icon_data.get('group', 'images')
icon_url = url_for('static_content', group=group, filename=icon_data['filename'])
style_attr = f' style="{icon_data["style"]}"' if icon_data.get('style') else ''
return Markup(f'<img class="status-icon" src="{icon_url}" alt="{icon_data["alt"]}" title="{icon_data["title"]}"{style_attr}>')
return ''
_RE_SANITIZE_TAG = re.compile(r'[^a-zA-Z0-9]')
@app.template_filter('sanitize_tag_class')
def _jinja2_filter_sanitize_tag_class(tag_title):
@@ -394,9 +385,8 @@ def _jinja2_filter_sanitize_tag_class(tag_title):
Returns:
str: A sanitized string suitable for use as a CSS class name
"""
import re
# Remove all non-alphanumeric characters and convert to lowercase
sanitized = re.sub(r'[^a-zA-Z0-9]', '', tag_title).lower()
sanitized = _RE_SANITIZE_TAG.sub('', tag_title).lower()
# Ensure it starts with a letter (CSS requirement)
if sanitized and not sanitized[0].isalpha():
sanitized = 'tag' + sanitized
@@ -484,28 +474,21 @@ def changedetection_app(config=None, datastore_o=None):
available_languages = get_available_languages()
language_codes = get_language_codes()
def get_locale():
# Locale aliases: map browser language codes to translation directory names
# This handles cases where browsers send standard codes (e.g., zh-TW)
# but our translations use more specific codes (e.g., zh_Hant_TW)
locale_aliases = {
'zh-TW': 'zh_Hant_TW', # Traditional Chinese: browser sends zh-TW, we use zh_Hant_TW
'zh_TW': 'zh_Hant_TW', # Also handle underscore variant
}
_locale_aliases = {
'zh-TW': 'zh_Hant_TW', # Traditional Chinese: browser sends zh-TW, we use zh_Hant_TW
'zh_TW': 'zh_Hant_TW', # Also handle underscore variant
}
_locale_match_list = language_codes + list(_locale_aliases.keys())
def get_locale():
# 1. Try to get locale from session (user explicitly selected)
if 'locale' in session:
return session['locale']
# 2. Fall back to Accept-Language header
# Get the best match from browser's Accept-Language header
browser_locale = request.accept_languages.best_match(language_codes + list(locale_aliases.keys()))
# 3. Check if we need to map the browser locale to our internal locale
if browser_locale in locale_aliases:
return locale_aliases[browser_locale]
return browser_locale
browser_locale = request.accept_languages.best_match(_locale_match_list)
# 3. Map browser locale to our internal locale if needed
return _locale_aliases.get(browser_locale, browser_locale)
# Initialize Babel with locale selector
babel = Babel(app, locale_selector=get_locale)
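The refactored `get_locale` above reduces to two rules: an explicit session choice wins, otherwise the browser's best match is mapped through the alias table. A sketch of the same logic without Flask follows; `resolve_locale` is a hypothetical name, and the session is modelled as a plain dict for illustration.

```python
LOCALE_ALIASES = {
    'zh-TW': 'zh_Hant_TW',  # browser sends zh-TW, translations use zh_Hant_TW
    'zh_TW': 'zh_Hant_TW',  # underscore variant
}


def resolve_locale(session, browser_locale, aliases=LOCALE_ALIASES):
    """Return the locale to use for this request.

    `session` is any mapping; `browser_locale` is the best Accept-Language
    match (or None when nothing matched).
    """
    # 1. A locale the user explicitly selected takes priority.
    if 'locale' in session:
        return session['locale']
    # 2. Otherwise map the browser locale through the alias table,
    #    returning it unchanged when no alias exists.
    return aliases.get(browser_locale, browser_locale)
```

Using `dict.get` with the key itself as the default is what lets the patch drop the separate `if browser_locale in locale_aliases` branch.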
@@ -1018,15 +1001,16 @@ def check_for_new_version():
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
session = requests.Session()
session.verify = False
while not app.config.exit.is_set():
try:
r = requests.post("https://changedetection.io/check-ver.php",
r = session.post("https://changedetection.io/check-ver.php",
data={'version': __version__,
'app_guid': datastore.data['app_guid'],
'watch_count': len(datastore.data['watching'])
},
verify=False)
})
except:
pass
+66 -12
@@ -608,13 +608,12 @@ class ValidateCSSJSONXPATHInput(object):
raise ValidationError("XPath not permitted in this field!")
from lxml import etree, html
import elementpath
# xpath 2.0-3.1
from elementpath.xpath3 import XPath3Parser
from changedetectionio.html_tools import SafeXPath3Parser
tree = html.fromstring("<html></html>")
line = line.replace('xpath:', '')
try:
elementpath.select(tree, line.strip(), parser=XPath3Parser)
elementpath.select(tree, line.strip(), parser=SafeXPath3Parser)
except elementpath.ElementPathError as e:
message = field.gettext('\'%s\' is not a valid XPath expression. (%s)')
raise ValidationError(message % (line, str(e)))
@@ -668,9 +667,11 @@ class ValidateCSSJSONXPATHInput(object):
# `jq` requires full compilation in windows and so isn't generally available
raise ValidationError("jq support not found")
from changedetectionio.html_tools import validate_jq_expression
input = line.replace('jq:', '')
try:
validate_jq_expression(input)
jq.compile(input)
except (ValueError) as e:
message = field.gettext('\'%s\' is not a valid jq expression. (%s)')
@@ -741,7 +742,6 @@ class commonSettingsForm(Form):
self.notification_title.extra_notification_tokens = kwargs.get('extra_notification_tokens', {})
self.notification_urls.extra_notification_tokens = kwargs.get('extra_notification_tokens', {})
fetch_backend = RadioField(_l('Fetch Method'), choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
notification_body = TextAreaField(_l('Notification Body'), default='{{ watch_url }} had a change.', validators=[validators.Optional(), ValidateJinja2Template()])
notification_format = SelectField(_l('Notification format'), choices=list(valid_notification_formats.items()))
notification_title = StringField(_l('Notification Title'), default='ChangeDetection.io Notification - {{ watch_url }}', validators=[validators.Optional(), ValidateJinja2Template()])
@@ -778,6 +778,7 @@ class SingleBrowserStep(Form):
class processor_text_json_diff_form(commonSettingsForm):
browser_profile = RadioField(_l('Browser / Fetch method'), choices=[]) # populated at runtime in edit.py
url = fields.URLField('Web Page URL', validators=[validateURL()])
tags = StringTagUUID('Group Tag', [validators.Optional()], default='')
@@ -939,10 +940,66 @@ class SingleExtraBrowser(Form):
ValidateSimpleURL()
], render_kw={"placeholder": "wss://brightdata... wss://oxylabs etc", "size":50})
class DefaultUAInputForm(Form):
html_requests = StringField(_l('Plaintext requests'), validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
if os.getenv("PLAYWRIGHT_DRIVER_URL") or os.getenv("WEBDRIVER_URL"):
html_webdriver = StringField(_l('Chrome requests'), validators=[validators.Optional()], render_kw={"placeholder": "<default>"})
class BrowserProfileForm(Form):
"""Create or edit a named BrowserProfile stored in settings.application.browser_profiles."""
name = StringField(
_l('Profile name'),
[validators.DataRequired(), validators.Length(max=100)],
render_kw={"placeholder": _l("e.g. Mobile Chrome, Bright Data CDP"), "maxlength": "100"}
)
fetch_backend = SelectField(
_l('Fetch method'),
choices=[], # populated at runtime from available_fetchers()
)
browser_connection_url = StringField(
_l('Browser connection URL'),
[
validators.Optional(),
ValidateStartsWithRegex(
regex=r'^(wss?|https?)://',
flags=re.IGNORECASE,
message=_l('Browser connection URL must start with ws://, wss://, http://, https://')
),
ValidateSimpleURL(),
],
render_kw={"placeholder": "ws://my-chrome:3000", "size": 50}
)
viewport_width = IntegerField(
_l('Viewport width (px)'),
[validators.Optional(), validators.NumberRange(min=100, max=7680)],
default=1280,
render_kw={"style": "width:5em;"}
)
viewport_height = IntegerField(
_l('Viewport height (px)'),
[validators.Optional(), validators.NumberRange(min=100, max=4320)],
default=1000,
render_kw={"style": "width:5em;"}
)
block_images = BooleanField(_l('Block images (faster loads)'), default=False)
block_fonts = BooleanField(_l('Block web fonts'), default=False)
ignore_https_errors = BooleanField(_l('Ignore HTTPS/TLS errors'), default=False)
user_agent = StringField(
_l('User-Agent override'),
[validators.Optional(), validators.Length(max=500)],
render_kw={"placeholder": _l("Leave blank to use fetcher default"), "size": 60}
)
locale = StringField(
_l('Locale'),
[validators.Optional(), validators.Length(max=20)],
render_kw={"placeholder": "en-US, de-DE, fr-FR …", "size": 15}
)
custom_headers = TextAreaField(
_l('Custom headers'),
[validators.Optional()],
render_kw={
"placeholder": "Header-Name: value\nAnother-Header: value",
"rows": 4, "cols": 60,
"style": "font-family:monospace;"
}
)
# datastore.data['settings']['requests']..
class globalSettingsRequestForm(Form):
@@ -966,8 +1023,6 @@ class globalSettingsRequestForm(Form):
extra_proxies = FieldList(FormField(SingleExtraProxy), min_entries=5)
extra_browsers = FieldList(FormField(SingleExtraBrowser), min_entries=5)
default_ua = FormField(DefaultUAInputForm, label=_l("Default User-Agent overrides"))
def validate_extra_proxies(self, extra_validators=None):
for e in self.data['extra_proxies']:
if e.get('proxy_name') or e.get('proxy_url'):
@@ -990,7 +1045,6 @@ class globalSettingsApplicationForm(commonSettingsForm):
render_kw={"placeholder": os.getenv('BASE_URL', 'Not set')}
)
empty_pages_are_a_change = BooleanField(_l('Treat empty pages as a change?'), default=False)
fetch_backend = RadioField(_l('Fetch Method'), default="html_requests", choices=content_fetchers.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
global_ignore_text = StringListField(_l('Ignore Text'), [ValidateListRegex()])
global_subtractive_selectors = StringListField(_l('Remove elements'), [ValidateCSSJSONXPATHInput(allow_json=False)])
ignore_whitespace = BooleanField(_l('Ignore whitespace'))
@@ -1006,7 +1060,7 @@ class globalSettingsApplicationForm(commonSettingsForm):
render_kw={"placeholder": "0.1", "style": "width: 8em;"}
)
password = SaltyPasswordField(_l('Password'))
password = SaltyPasswordField(_l('Password'), render_kw={"autocomplete": "new-password"})
pager_size = IntegerField(_l('Pager size'),
render_kw={"style": "width: 5em;"},
validators=[validators.NumberRange(min=0,
+122 -12
@@ -4,6 +4,7 @@ from loguru import logger
from typing import List
import html
import json
import os
import re
# HTML added to be sure each result matching a filter (.example) gets converted to a new line by Inscriptis
@@ -13,6 +14,45 @@ PERL_STYLE_REGEX = r'^/(.*?)/([a-z]*)?$'
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
META_CS = re.compile(r'<meta[^>]+charset=["\']?\s*([a-z0-9_\-:+.]+)', re.I)
# jq builtins that can leak sensitive data or cause harm when user-supplied expressions are executed.
# env/$ENV reads all process environment variables (passwords, API keys, etc.)
# include/import can read arbitrary files from disk
# input/inputs reads beyond the supplied JSON data
# debug/stderr leaks data to stderr
# halt/halt_error terminates the process (DoS)
_JQ_BLOCKED_PATTERNS = [
(re.compile(r'\benv\b'), 'env (reads environment variables)'),
(re.compile(r'\$ENV\b'), '$ENV (reads environment variables)'),
(re.compile(r'\binclude\b'), 'include (reads files from disk)'),
(re.compile(r'\bimport\b'), 'import (reads files from disk)'),
(re.compile(r'\binputs?\b'), 'input/inputs (reads beyond provided data)'),
(re.compile(r'\bdebug\b'), 'debug (leaks data to stderr)'),
(re.compile(r'\bstderr\b'), 'stderr (leaks data to stderr)'),
(re.compile(r'\bhalt(?:_error)?\b'), 'halt/halt_error (terminates the process)'),
(re.compile(r'\$__loc__\b'), '$__loc__ (leaks file path information)'),
(re.compile(r'\bbuiltins\b'), 'builtins (enumerates available functions)'),
(re.compile(r'\bmodulemeta\b'), 'modulemeta (leaks module information)'),
(re.compile(r'\$JQ_BUILD_CONFIGURATION\b'), '$JQ_BUILD_CONFIGURATION (leaks build information)'),
]
def validate_jq_expression(expression: str) -> None:
"""Raise ValueError if the jq expression uses any dangerous builtin.
User-supplied jq expressions are executed server-side. Without this check,
builtins like `env` expose every process environment variable (SALTED_PASS,
proxy credentials, API keys, etc.) as watch output.
"""
from changedetectionio.strtobool import strtobool
if strtobool(os.getenv('JQ_ALLOW_RISKY_EXPRESSIONS', 'false')):
return
for pattern, description in _JQ_BLOCKED_PATTERNS:
if pattern.search(expression):
msg = f"jq expression uses disallowed builtin: {description}"
logger.critical(f"Security: blocked jq expression containing '{description}' - expression: {expression!r}")
raise ValueError(msg)
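A minimal sketch of how the word-boundary patterns above behave, using a reduced copy of the blocklist. The `find_blocked` helper and the sample expressions are illustrative assumptions, not part of the change.

```python
import re

# Reduced, illustrative copy of the blocklist (not the full list from the diff).
BLOCKED = [
    (re.compile(r'\benv\b'), 'env'),
    (re.compile(r'\$ENV\b'), '$ENV'),
    (re.compile(r'\binputs?\b'), 'input/inputs'),
    (re.compile(r'\bdebug\b'), 'debug'),
    (re.compile(r'\bhalt(?:_error)?\b'), 'halt/halt_error'),
]


def find_blocked(expression: str):
    """Hypothetical helper: return the first blocked description, or None."""
    for pattern, description in BLOCKED:
        if pattern.search(expression):
            return description
    return None


print(find_blocked('.items[] | .price'))  # ordinary filter, nothing blocked
print(find_blocked('env.SALTED_PASS'))    # blocked: env builtin
print(find_blocked('$ENV.PATH'))          # blocked: $ENV variable
print(find_blocked('.environment'))       # not blocked: no word boundary after "env"
print(find_blocked('.debug'))             # blocked: \b also matches after the dot
```

The last case suggests that a plain field access such as `.debug` or `.input` would be rejected as well, because `\b` matches between `.` and the identifier. If that turns out to be too strict for some feeds, `JQ_ALLOW_RISKY_EXPRESSIONS` appears to be the escape hatch.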
META_CT = re.compile(r'<meta[^>]+http-equiv=["\']?content-type["\']?[^>]*content=["\'][^>]*charset=([a-z0-9_\-:+.]+)', re.I)
# 'price' , 'lowPrice', 'highPrice' are usually under here
@@ -23,6 +63,59 @@ class JSONNotFound(ValueError):
def __init__(self, msg):
ValueError.__init__(self, msg)
_DEFAULT_UNSAFE_XPATH3_FUNCTIONS = [
'unparsed-text',
'unparsed-text-lines',
'unparsed-text-available',
'doc',
'doc-available',
'json-doc',
'json-doc-available',
'collection', # XPath 2.0+: loads XML node collections from arbitrary URIs
'uri-collection', # XPath 3.0+: enumerates URIs from resource collections
'transform', # XPath 3.1: XSLT transformation (currently raises, block proactively)
'load-xquery-module', # XPath 3.1: loads XQuery modules (currently raises, block proactively)
'environment-variable',
'available-environment-variables',
]
def _build_safe_xpath3_parser():
"""Return an XPath3Parser subclass with filesystem/environment access functions removed.
XPath 3.0 includes functions that can read arbitrary files or environment variables:
- unparsed-text / unparsed-text-lines / unparsed-text-available (file read)
- doc / doc-available (XML fetch from URI)
- environment-variable / available-environment-variables (env var leakage)
Subclassing gives us an independent symbol_table copy (not shared with the parent class),
so removing entries here does not affect XPath3Parser itself.
Override the blocked list via the XPATH_BLOCKED_FUNCTIONS environment variable
(comma-separated, e.g. "unparsed-text,doc,environment-variable").
"""
import os
from elementpath.xpath3 import XPath3Parser
class SafeXPath3Parser(XPath3Parser):
pass
env_override = os.getenv('XPATH_BLOCKED_FUNCTIONS')
if env_override is not None:
blocked = [f.strip() for f in env_override.split(',') if f.strip()]
else:
blocked = _DEFAULT_UNSAFE_XPATH3_FUNCTIONS
for _fn in blocked:
SafeXPath3Parser.symbol_table.pop(_fn, None)
return SafeXPath3Parser
# Module-level singleton — built once, reused everywhere.
SafeXPath3Parser = _build_safe_xpath3_parser()
# Doesn't look like python supports forward slash auto enclosure in re.findall
# So convert it to inline flag "(?i)foobar" type configuration
@lru_cache(maxsize=100)
@@ -183,8 +276,6 @@ def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False
"""
from lxml import etree, html
import elementpath
# xpath 2.0-3.1
from elementpath.xpath3 import XPath3Parser
parser = etree.HTMLParser()
tree = None
@@ -210,7 +301,7 @@ def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False
# This allows //title to match elements in the default namespace
namespaces[''] = tree.nsmap[None]
r = elementpath.select(tree, xpath_filter.strip(), namespaces=namespaces, parser=XPath3Parser)
r = elementpath.select(tree, xpath_filter.strip(), namespaces=namespaces, parser=SafeXPath3Parser)
#@note: //title/text() now works with default namespaces (fixed by registering '' prefix)
#@note: //title/text() won't work where <title>CDATA.. (use cdata_in_document_to_text first)
@@ -235,6 +326,9 @@ def xpath_filter(xpath_filter, html_content, append_pretty_line_formatting=False
else:
html_block += elementpath_tostring(element)
# Drop element references before the finally block so tree.clear() can release
# the libxml2 document immediately (elements pin the C-level doc via refcount).
del r
return html_block
finally:
# Explicitly clear the tree to free memory
@@ -330,12 +424,16 @@ def _parse_json(json_data, json_filter):
raise Exception("jq support not found")
if json_filter.startswith("jq:"):
jq_expression = jq.compile(json_filter.removeprefix("jq:"))
expr = json_filter.removeprefix("jq:")
validate_jq_expression(expr)
jq_expression = jq.compile(expr)
match = jq_expression.input(json_data).all()
return _get_stripped_text_from_json_match(match)
if json_filter.startswith("jqraw:"):
jq_expression = jq.compile(json_filter.removeprefix("jqraw:"))
expr = json_filter.removeprefix("jqraw:")
validate_jq_expression(expr)
jq_expression = jq.compile(expr)
match = jq_expression.input(json_data).all()
return '\n'.join(str(item) for item in match)
@@ -439,13 +537,25 @@ def extract_json_as_string(content, json_filter, ensure_is_ldjson_info_type=None
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSON {content[:20]}...{str(e)})")
else:
# Probably something else, go fish inside for it
try:
stripped_text_from_html = extract_json_blob_from_html(content=content,
ensure_is_ldjson_info_type=ensure_is_ldjson_info_type,
json_filter=json_filter )
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSON while extracting JSON from HTML blob {content[:20]}...{str(e)})")
# Check for JSONP wrapper: someCallback({...}) or some.namespace({...})
# Server may claim application/json but actually return JSONP
jsonp_match = re.match(r'^\w[\w.]*\s*\((.+)\)\s*;?\s*$', content.lstrip("\ufeff").strip(), re.DOTALL)
if jsonp_match:
try:
inner = jsonp_match.group(1).strip()
logger.warning(f"Content looks like JSONP, attempting to extract inner JSON for filter '{json_filter}'")
stripped_text_from_html = _parse_json(json.loads(inner), json_filter)
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSONP inner content {content[:20]}...{str(e)})")
if not stripped_text_from_html:
# Probably something else, go fish inside for it
try:
stripped_text_from_html = extract_json_blob_from_html(content=content,
ensure_is_ldjson_info_type=ensure_is_ldjson_info_type,
json_filter=json_filter)
except json.JSONDecodeError as e:
logger.warning(f"Error processing JSON while extracting JSON from HTML blob {content[:20]}...{str(e)})")
if not stripped_text_from_html:
# Re 265 - Just return an empty string when filter not found
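The JSONP detection in this hunk can be tried in isolation. The sketch below reuses the same regular expression; the `unwrap_jsonp` helper name and the sample payloads are assumptions made for illustration.

```python
import json
import re

# Same pattern as in the diff: an identifier (dots allowed), then a parenthesised payload.
JSONP_RE = re.compile(r'^\w[\w.]*\s*\((.+)\)\s*;?\s*$', re.DOTALL)


def unwrap_jsonp(content: str):
    """Hypothetical helper: return the decoded inner JSON, or None if not JSONP."""
    match = JSONP_RE.match(content.lstrip("\ufeff").strip())
    if not match:
        return None
    try:
        return json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return None


print(unwrap_jsonp('callback({"price": 10});'))
print(unwrap_jsonp('my.ns.cb({"a": [1, 2]})'))
print(unwrap_jsonp('{"a": 1}'))  # plain JSON is not treated as JSONP
```

Because the pattern requires the content to start with a word character, plain JSON objects, arrays and quoted strings fall through to the existing code path untouched.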
+3 -6
@@ -12,7 +12,6 @@ from changedetectionio.notification import (
# Equal to or greater than this number of FilterNotFoundInResponse exceptions will trigger a filter-not-found notification
_FILTER_FAILURE_THRESHOLD_ATTEMPTS_DEFAULT = 6
DEFAULT_SETTINGS_HEADERS_USERAGENT='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36'
@@ -31,10 +30,6 @@ class model(dict):
'time_between_check': {'weeks': None, 'days': None, 'hours': 3, 'minutes': None, 'seconds': None},
'timeout': int(getenv("DEFAULT_SETTINGS_REQUESTS_TIMEOUT", "45")), # Default 45 seconds
'workers': int(getenv("DEFAULT_SETTINGS_REQUESTS_WORKERS", "5")), # Number of threads, lower is better for slow connections
'default_ua': {
'html_requests': getenv("DEFAULT_SETTINGS_HEADERS_USERAGENT", DEFAULT_SETTINGS_HEADERS_USERAGENT),
'html_webdriver': None,
}
},
'application': {
# Custom notification content
@@ -43,7 +38,9 @@ class model(dict):
'api_access_token_enabled': True,
'base_url' : None,
'empty_pages_are_a_change': False,
'fetch_backend': getenv("DEFAULT_FETCH_BACKEND", "html_requests"),
'browser_profile': None, # machine-name of the system-default BrowserProfile
'browser_profiles': {}, # user-defined profiles keyed by machine name
'fetch_backend': getenv("DEFAULT_FETCH_BACKEND", "requests"),
'filter_failure_notification_threshold_attempts': _FILTER_FAILURE_THRESHOLD_ATTEMPTS_DEFAULT,
'global_ignore_text': [], # List of text to ignore when calculating the comparison checksum
'global_subtractive_selectors': [],
+48 -61
@@ -43,6 +43,11 @@ from ..html_tools import TRANSLATE_WHITESPACE_TABLE
FAVICON_RESAVE_THRESHOLD_SECONDS=86400
BROTLI_COMPRESS_SIZE_THRESHOLD = int(os.getenv('SNAPSHOT_BROTLI_COMPRESSION_THRESHOLD', 1024*20))
# Module-level favicon filename cache: data_dir → basename (or None)
# Keyed by data_dir so it survives Watch object recreation, deepcopy, and concurrent requests.
# Invalidated explicitly in bump_favicon() when a new favicon is saved.
_FAVICON_FILENAME_CACHE: dict = {}
minimum_seconds_recheck_time = int(os.getenv('MINIMUM_SECONDS_RECHECK_TIME', 3))
mtable = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400, 'weeks': 86400 * 7}
@@ -348,41 +353,41 @@ class model(EntityPersistenceMixin, watch_base):
def is_source_type_url(self):
return self.get('url', '').startswith('source:')
@property
def effective_browser_profile(self):
"""Resolve the effective BrowserProfile for this watch.
Walks the chain: watch -> tag (overrides_watch=True) -> global settings -> built-in fallback.
Never raises. Returns a BrowserProfile instance.
"""
from changedetectionio.model.browser_profile import resolve_browser_profile, BUILTIN_REQUESTS
if not self._datastore:
return BUILTIN_REQUESTS
try:
return resolve_browser_profile(self, self._datastore)
except Exception:
return BUILTIN_REQUESTS
@property
def get_fetch_backend(self):
"""Legacy property — prefer effective_browser_profile.fetch_backend for new code.
Returns the raw fetch_backend stored on this watch (or 'requests' for PDFs).
Does NOT walk the tag/global resolution chain.
"""
Get the fetch backend for this watch with special case handling.
CHAIN RESOLUTION OPPORTUNITY:
Currently returns watch.fetch_backend directly, but doesn't implement
Watch -> Tag -> Global resolution chain. With Pydantic:
@computed_field
def resolved_fetch_backend(self) -> str:
# Special case: PDFs always use html_requests
if self.is_pdf:
return 'html_requests'
# Watch override
if self.fetch_backend and self.fetch_backend != 'system':
return self.fetch_backend
# Tag override (first tag with overrides_watch=True wins)
for tag_uuid in self.tags:
tag = self._datastore.get_tag(tag_uuid)
if tag.overrides_watch and tag.fetch_backend:
return tag.fetch_backend
# Global default
return self._datastore.settings.fetch_backend
"""
# Maybe also if is_image etc?
# This is because chrome/playwright won't render the PDF in the browser, so we just fetch it and use pdf2html to see the text.
if self.is_pdf:
return 'html_requests'
return 'requests'
return self.get('fetch_backend')
@property
def fetcher_supports_screenshots(self):
"""Return True if the resolved fetcher for this watch supports screenshots."""
from changedetectionio import content_fetchers
fetcher_class = content_fetchers.get_fetcher(self.effective_browser_profile.fetch_backend)
if fetcher_class is None:
return False
return bool(getattr(fetcher_class, 'supports_screenshots', False))
@property
def is_pdf(self):
url = str(self.get("url") or "").lower()
@@ -806,9 +811,8 @@ class model(EntityPersistenceMixin, watch_base):
with open(fname, 'wb') as f:
f.write(decoded)
# Invalidate favicon filename cache
if hasattr(self, '_favicon_filename_cache'):
delattr(self, '_favicon_filename_cache')
# Invalidate module-level favicon filename cache for this watch
_FAVICON_FILENAME_CACHE.pop(self.data_dir, None)
# A signal that could trigger the socket server to update the browser also
watch_check_update = signal('watch_favicon_bump')
@@ -823,35 +827,23 @@ class model(EntityPersistenceMixin, watch_base):
def get_favicon_filename(self) -> str | None:
"""
Find any favicon.* file in the current working directory
and return the contents of the newest one.
Find any favicon.* file in the watch data directory.
MEMORY LEAK FIX: Cache the result to avoid repeated glob.glob() operations.
glob.glob() causes millions of fnmatch allocations when called for every watch on page load.
Uses a module-level cache keyed by data_dir to survive Watch object recreation,
deepcopy (which drops instance attrs), and concurrent request races.
Invalidated by bump_favicon() when a new favicon is saved.
Returns:
str: Basename of the newest favicon file, or None if not found.
str: Basename of the favicon file, or None if not found.
"""
# Check cache first (prevents 26M+ allocations from repeated glob operations)
cache_key = '_favicon_filename_cache'
if hasattr(self, cache_key):
return getattr(self, cache_key)
if self.data_dir in _FAVICON_FILENAME_CACHE:
return _FAVICON_FILENAME_CACHE[self.data_dir]
import glob
# Search for all favicon.* files
files = glob.glob(os.path.join(self.data_dir, "favicon.*"))
if not files:
result = None
else:
# Find the newest by modification time
newest_file = max(files, key=os.path.getmtime)
result = os.path.basename(newest_file)
# Cache the result
setattr(self, cache_key, result)
return result
fname = os.path.basename(files[0]) if files else None
_FAVICON_FILENAME_CACHE[self.data_dir] = fname
return fname
def get_screenshot_as_thumbnail(self, max_age=3200):
"""Return path to a square thumbnail of the most recent screenshot.
@@ -1182,18 +1174,13 @@ class model(EntityPersistenceMixin, watch_base):
def compile_error_texts(self, has_proxies=None):
"""Compile error texts for this watch.
Accepts has_proxies parameter to ensure it works even outside app context"""
from flask import url_for
from flask import url_for, has_request_context
from markupsafe import Markup
output = [] # Initialize as list since we're using append
last_error = self.get('last_error','')
try:
url_for('settings.settings_page')
except Exception as e:
has_app_context = False
else:
has_app_context = True
has_app_context = has_request_context()
# has app+request context, we can use url_for()
if has_app_context:
+4 -1
@@ -187,6 +187,7 @@ class watch_base(dict):
'content-type': None,
'date_created': None,
'extract_text': [], # Extract text by regex after filters
'browser_profile': 'system', # machine-name key of a BrowserProfile; 'system' → resolve via chain
'fetch_backend': 'system', # plaintext, playwright etc
'fetch_time': 0.0,
'filter_failure_notification_send': strtobool(os.getenv('FILTER_FAILURE_NOTIFICATION_SEND_DEFAULT', 'True')),
@@ -589,7 +590,9 @@ class watch_base(dict):
return None
try:
value = self._datastore['settings']
# _datastore is a ChangeDetectionStore (has .data) or a plain dict (unit tests)
store_data = self._datastore.data if hasattr(self._datastore, 'data') else self._datastore
value = store_data['settings']
for key in path:
value = value[key]
return value
+380
@@ -0,0 +1,380 @@
"""
BrowserProfile: named, reusable browser/fetcher configuration.
Storage key
-----------
Profiles are stored in ``settings.application.browser_profiles`` as a plain dict
keyed by *machine name*, a lowercase, underscore-separated slug derived from the
human-readable ``name`` field:
'My Blocking Chrome' -> 'my_blocking_chrome'
'Custom CDP — Mobile (375px)' -> 'custom_cdp_mobile_375px'
Using the machine name as the key means that deleting a profile and recreating
it with the same name restores the original key, so all watches that referenced
it continue to work without any manual re-linking.
Resolution chain
----------------
``resolve_browser_profile(watch, datastore)`` walks:
watch.browser_profile -> first tag with overrides_watch=True
-> settings.application.browser_profile -> built-in fallback
It never raises. Stale / missing machine-name references are logged and the
resolver falls through to the next level.
Built-in profiles
-----------------
``BUILTIN_REQUESTS`` and ``BUILTIN_BROWSER`` are always available and cannot be
deleted from the UI (``is_builtin=True``). Their machine names are stored in
``RESERVED_MACHINE_NAMES`` to block user profiles from shadowing them.
Migration
---------
``store/updates.py::update_31`` converts the legacy ``fetch_backend`` field on
watches, tags and global settings into ``browser_profile`` machine-name
references. After that migration no legacy paths are needed here.
"""
from __future__ import annotations
import os
import re
from typing import Optional
from loguru import logger
from pydantic import BaseModel, field_validator
# Default User-Agent for the built-in plaintext requests profile.
# Overridable via environment variable for deployments that need a custom UA.
_DEFAULT_REQUESTS_UA = os.getenv(
"DEFAULT_SETTINGS_HEADERS_USERAGENT",
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36'
)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
NAME_MAX_LEN = 100
# ---------------------------------------------------------------------------
# Model
# ---------------------------------------------------------------------------
class BrowserProfile(BaseModel):
"""
A named, reusable configuration for how a watch fetches its target URL.
The *machine name* (see ``get_machine_name()``) is the stable storage key.
Updating ``name`` changes the machine name; any watch that referenced the
old machine name will then fall back through the resolution chain until it
is explicitly re-pointed. To replace a profile without breaking watches,
delete it and recreate it with the *same* name.
"""
name: str
"""Human-readable label shown in the UI. Max 100 characters."""
fetch_backend: str = 'requests'
"""
Which fetch engine to use. This is the *clean* fetcher name without the
``html_`` module prefix (e.g. ``'requests'``, ``'webdriver'``,
``'playwright'``, ``'puppeteer'``, ``'cloakbrowser'``).
The module-level ``html_`` prefix (``html_requests``, ``html_webdriver``,
...) is an implementation detail of ``content_fetchers/``. Use
``get_fetcher_class_name()`` to obtain the full module attribute name when
you need to look up the class.
Must be non-empty and contain only ``[a-z0-9_]`` characters.
"""
is_builtin: bool = False
"""Built-in profiles are always present and cannot be deleted from the UI."""
# ------------------------------------------------------------------
# Browser-specific settings (silently ignored by html_requests)
# ------------------------------------------------------------------
browser_connection_url: Optional[str] = None
"""
Custom CDP / WebSocket endpoint, e.g. ``ws://my-chrome:3000``.
Overrides the system-wide ``PLAYWRIGHT_DRIVER_URL`` for this profile.
Only meaningful for ``html_webdriver`` profiles.
"""
viewport_width: int = 1280
"""
Browser viewport width in pixels.
Common presets: 375 (iPhone), 768 (tablet), 1280 (desktop).
"""
viewport_height: int = 1000
"""
Browser viewport height in pixels.
Common presets: 812 (iPhone), 1024 (tablet), 1000 (desktop).
"""
block_images: bool = False
"""
Block all image requests. Typically cuts page-load time by 40-70 % on
image-heavy sites with no impact on text-based change detection.
"""
block_fonts: bool = False
"""Block web-font requests. Modest speed gain; rarely affects detection."""
user_agent: Optional[str] = None
"""
Override the browser User-Agent string.
``None`` keeps the fetcher's built-in default, which already strips
obvious headless markers such as ``HeadlessChrome``.
"""
ignore_https_errors: bool = False
"""
Proceed even when the server's TLS certificate is invalid or self-signed.
Useful for staging / development environments.
"""
locale: Optional[str] = None
"""
Browser locale (e.g. ``en-US``, ``de-DE``).
Sets the ``Accept-Language`` header and ``navigator.language``.
Some sites serve different prices or copy based on locale.
"""
custom_headers: str = ''
"""
Extra HTTP headers sent with every request using this profile, in ``Key: Value`` format
(one per line, ``#`` lines are ignored). Applied before per-watch headers so
individual watches can override them.
"""
service_workers: str = 'allow'
"""
Whether to allow Service Workers in the browser context.
Playwright accepts ``'allow'`` or ``'block'``.
Block to avoid large Service Worker data transfers (e.g. YouTube).
"""
extra_delay: int = 0
"""
Extra seconds to wait after page load before extracting content
(on top of the per-watch ``render_extract_delay``).
Sourced from ``WEBDRIVER_DELAY_BEFORE_CONTENT_READY`` at startup.
"""
model_config = {"frozen": False}
# ------------------------------------------------------------------
# Validators
# ------------------------------------------------------------------
@field_validator('fetch_backend')
@classmethod
def _validate_fetch_backend(cls, v: str) -> str:
v = v.strip()
if not v:
raise ValueError('fetch_backend cannot be empty')
if not re.fullmatch(r'[a-z0-9_]+', v):
raise ValueError(
f"fetch_backend must contain only lowercase letters, digits and underscores, got {v!r}"
)
if v.startswith('html_'):
raise ValueError(
f"fetch_backend should be the clean fetcher name without the 'html_' prefix "
f"(e.g. 'requests', 'webdriver', 'playwright'). Got {v!r}. "
f"Use get_fetcher_class_name() to obtain the full module attribute name."
)
return v
@field_validator('name')
@classmethod
def _validate_name(cls, v: str) -> str:
v = v.strip()
if not v:
raise ValueError('Name cannot be empty')
if len(v) > NAME_MAX_LEN:
raise ValueError(f'Name must be {NAME_MAX_LEN} characters or less')
return v
# ------------------------------------------------------------------
# Machine-name helpers
# ------------------------------------------------------------------
@staticmethod
def machine_name_from_str(name: str) -> str:
"""
Convert a human name to a machine-safe storage key.
Transformation rules (applied in order):
1. Strip surrounding whitespace; lower-case.
2. Replace runs of whitespace or hyphens with a single ``_``.
3. Drop every character that is not ``[a-z0-9_]``.
4. Collapse consecutive underscores.
5. Strip leading / trailing underscores.
6. Truncate to ``NAME_MAX_LEN`` characters.
Examples::
'My Blocking Browser Chrome' -> 'my_blocking_browser_chrome'
'Custom CDP — Mobile (375px)' -> 'custom_cdp_mobile_375px'
' Weird --- Name ' -> 'weird_name'
"""
s = name.strip().lower()
s = re.sub(r'[\s\-]+', '_', s) # whitespace / hyphens → underscore
s = re.sub(r'[^a-z0-9_]', '', s) # drop everything else
s = re.sub(r'_+', '_', s) # collapse repeated underscores
s = s.strip('_') # drop leading / trailing underscores
return s[:NAME_MAX_LEN]
def get_machine_name(self) -> str:
"""Return the machine-safe storage key derived from this profile's ``name``."""
return self.machine_name_from_str(self.name)
def get_fetcher_class_name(self) -> str:
"""Return the clean fetcher name for this profile (same as ``fetch_backend``).
Use with ``content_fetchers.get_fetcher()``::
from changedetectionio import content_fetchers
fetcher_cls = content_fetchers.get_fetcher(profile.get_fetcher_class_name())
"""
return self.fetch_backend
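A standalone copy of the slug rules from `machine_name_from_str()` makes the transformation easy to try outside the model. The sample names below are made up for illustration.

```python
import re

NAME_MAX_LEN = 100


def machine_name_from_str(name: str) -> str:
    """Standalone copy of the slug rules shown in the diff."""
    s = name.strip().lower()
    s = re.sub(r'[\s\-]+', '_', s)      # whitespace / hyphens become one underscore
    s = re.sub(r'[^a-z0-9_]', '', s)    # drop everything else
    s = re.sub(r'_+', '_', s)           # collapse repeated underscores
    s = s.strip('_')                    # trim leading / trailing underscores
    return s[:NAME_MAX_LEN]


print(machine_name_from_str('My Blocking Chrome'))
print(machine_name_from_str('My-Blocking  chrome'))
print(repr(machine_name_from_str('(!!!)')))
```

Two edge cases follow from the rules: distinct display names can collapse to the same key (the first two calls both yield `my_blocking_chrome`), and a name made only of symbols yields an empty key. Whether the UI guards against these is not visible in this part of the diff.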
# ---------------------------------------------------------------------------
# Built-in profiles (always present, cannot be deleted)
# ---------------------------------------------------------------------------
BUILTIN_REQUESTS = BrowserProfile(
name='Direct HTTP (requests)',
fetch_backend='requests',
is_builtin=True,
user_agent=_DEFAULT_REQUESTS_UA,
)
BUILTIN_PLAYWRIGHT = BrowserProfile(
name='Browser (Chrome/Playwright)',
fetch_backend='playwright_cdp',
is_builtin=True,
)
BUILTIN_SELENIUM = BrowserProfile(
name='Browser (Chrome/Selenium)',
fetch_backend='selenium',
is_builtin=True,
)
BUILTIN_PUPPETEER = BrowserProfile(
name='Browser (Chrome/Puppeteer)',
fetch_backend='puppeteer',
is_builtin=True,
)
# Backwards-compatible alias — code that imported BUILTIN_BROWSER keeps working.
BUILTIN_BROWSER = BUILTIN_PLAYWRIGHT
# Keyed by machine name for O(1) lookup.
_BUILTINS: dict[str, BrowserProfile] = {
b.get_machine_name(): b
for b in (BUILTIN_REQUESTS, BUILTIN_PLAYWRIGHT, BUILTIN_SELENIUM, BUILTIN_PUPPETEER)
}
# Machine names that cannot be used by user-created profiles.
RESERVED_MACHINE_NAMES: frozenset[str] = frozenset(_BUILTINS.keys())
def get_default_browser_builtin() -> BrowserProfile:
"""Final fallback when no profile can be resolved through the chain.
``preconfigure_browser_profiles_based_on_env()`` sets
``settings.application.browser_profile`` explicitly at startup, so this
fallback is only reached for watches with stale / missing machine-name
references. Safe default is always direct HTTP requests.
"""
return BUILTIN_REQUESTS
# ---------------------------------------------------------------------------
# Lookup helpers
# ---------------------------------------------------------------------------
def get_builtin_profiles() -> dict[str, BrowserProfile]:
"""Return a shallow copy of the built-in profiles dict (keyed by machine name)."""
return dict(_BUILTINS)
def get_profile(machine_name: str, store_profiles: dict) -> Optional[BrowserProfile]:
"""
Look up a ``BrowserProfile`` by machine name.
Stored profiles are checked first so that env-configured built-ins (written
by ``preconfigure_browser_profiles_based_on_env``) take priority over the
bare module-level defaults. Falls back to ``_BUILTINS`` when no stored
version exists.
Returns ``None`` when the machine name is unknown or the stored data is
corrupt (a warning is logged in the latter case).
"""
raw = store_profiles.get(machine_name)
if raw is not None:
if isinstance(raw, BrowserProfile):
return raw
try:
return BrowserProfile(**raw)
except Exception as exc:
logger.warning(f"BrowserProfile '{machine_name}': failed to deserialize — {exc}")
# Fall through to built-in
if machine_name in _BUILTINS:
return _BUILTINS[machine_name]
return None
# ---------------------------------------------------------------------------
# Resolution
# ---------------------------------------------------------------------------
def resolve_browser_profile(watch, datastore) -> BrowserProfile:
"""
Resolve the effective ``BrowserProfile`` for *watch*.
Resolution chain
~~~~~~~~~~~~~~~~
1. ``watch['browser_profile']``: explicit machine name set on the watch.
2. First tag with ``overrides_watch=True`` that has ``browser_profile`` set.
3. ``settings.application['browser_profile']``: system-wide default.
4. Built-in fallback: ``BUILTIN_REQUESTS`` (requests is always the safe default).
Never raises. A stale / missing machine-name reference produces a
``logger.warning`` and the resolver continues down the chain.
"""
from changedetectionio.model.resolver import resolve_setting
store_profiles: dict = datastore.data['settings']['application'].get('browser_profiles', {})
machine_name = resolve_setting(
watch, datastore,
field_name='browser_profile',
sentinel_values={'system', 'default', ''},
default=None,
require_tag_override=True,
)
if machine_name:
profile = get_profile(machine_name, store_profiles)
if profile:
return profile
logger.warning(
f"Watch {watch.get('uuid')!r}: browser_profile {machine_name!r} not found, "
f"falling back through the chain"
)
return get_default_browser_builtin()
+63
@@ -0,0 +1,63 @@
"""
Unified Watch -> Tag -> Global settings cascade resolver.
All settings resolution follows the same priority order:
1. Watch-level setting (if set and not a sentinel "use parent" value)
2. First tag with overrides_watch=True that has the field set
3. Global application settings
4. Caller-supplied default
This replaces the previously scattered manual resolution loops found in
notification_service.py, processors/base.py, and the restock processor.
"""
def resolve_setting(watch, datastore, field_name, *,
sentinel_values=None,
default=None,
require_tag_override=True):
"""
Resolve a single setting value by walking the Watch -> Tag -> Global chain.
Args:
watch: Watch dict / model object.
datastore: App datastore (must have get_all_tags_for_watch() and
data['settings']['application']).
field_name: The setting key to look up at each level.
sentinel_values: Set of values that mean "not configured here, keep looking".
For example {'system'} for fetch_backend.
default: Value returned when nothing is found in the chain.
require_tag_override: If True (default), only tags where overrides_watch=True
contribute to the cascade. Set to False when every tag
that carries the field should be considered (e.g. for
fields that make sense to merge/override at any tag level).
Returns:
The first non-sentinel, non-empty value found, or *default*.
"""
_sentinels = set(sentinel_values) if sentinel_values else set()
def _is_unset(v):
return v is None or v == '' or v in _sentinels
# 1. Watch level
v = watch.get(field_name)
if not _is_unset(v):
return v
# 2. Tag level
tags = datastore.get_all_tags_for_watch(uuid=watch.get('uuid'))
if tags:
for tag in tags.values():
if require_tag_override and not tag.get('overrides_watch'):
continue
v = tag.get(field_name)
if not _is_unset(v):
return v
# 3. Global application settings
v = datastore.data['settings']['application'].get(field_name)
if not _is_unset(v):
return v
return default
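The cascade can be exercised without the real datastore. In the sketch below, `FakeDatastore` is a hypothetical stand-in that exposes only the two things the resolver touches, and `resolve_setting` is a condensed copy of the function from this file.

```python
class FakeDatastore:
    """Hypothetical stand-in exposing only what resolve_setting() touches."""

    def __init__(self, tags, app_settings):
        self._tags = tags
        self.data = {'settings': {'application': app_settings}}

    def get_all_tags_for_watch(self, uuid):
        return self._tags


def resolve_setting(watch, datastore, field_name, *, sentinel_values=None,
                    default=None, require_tag_override=True):
    """Condensed copy of the resolver in this diff."""
    sentinels = set(sentinel_values) if sentinel_values else set()

    def is_unset(v):
        return v is None or v == '' or v in sentinels

    v = watch.get(field_name)                      # 1. watch level
    if not is_unset(v):
        return v
    tags = datastore.get_all_tags_for_watch(uuid=watch.get('uuid'))
    for tag in (tags or {}).values():              # 2. tag level
        if require_tag_override and not tag.get('overrides_watch'):
            continue
        v = tag.get(field_name)
        if not is_unset(v):
            return v
    v = datastore.data['settings']['application'].get(field_name)  # 3. global
    return default if is_unset(v) else v


tags = {
    't1': {'overrides_watch': False, 'browser_profile': 'ignored_profile'},
    't2': {'overrides_watch': True, 'browser_profile': 'mobile_chrome'},
}
store = FakeDatastore(tags, {'browser_profile': 'direct_http_requests'})
watch = {'uuid': 'w1', 'browser_profile': 'system'}

print(resolve_setting(watch, store, 'browser_profile', sentinel_values={'system'}))
```

Note that an explicit watch-level value is returned before any tag is consulted, so in this resolver `overrides_watch=True` only takes effect when the watch itself is set to a sentinel such as `'system'`.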
@@ -0,0 +1,3 @@
from .registry import registry, NotificationProfileType, AppriseProfileType
__all__ = ['registry', 'NotificationProfileType', 'AppriseProfileType']
@@ -0,0 +1,73 @@
"""
Per-profile notification log.
Each profile gets its own log file at:
{datastore_path}/notification-logs/{profile_uuid}.log
Entries are stored as JSON-lines (one JSON object per line).
The file is capped at MAX_ENTRIES lines (oldest pruned first).
"""
import json
import os
from datetime import datetime, timezone
MAX_ENTRIES = 100
_LOG_DIR = 'notification-logs'
def _log_file(datastore_path: str, profile_uuid: str) -> str:
return os.path.join(datastore_path, _LOG_DIR, f'{profile_uuid}.log')
def write_profile_log(datastore_path: str, profile_uuid: str, *,
watch_url: str = '',
watch_uuid: str = '',
status: str, # 'ok' | 'error' | 'test'
message: str = ''):
"""Append one log entry; prune to MAX_ENTRIES."""
log_dir = os.path.join(datastore_path, _LOG_DIR)
os.makedirs(log_dir, exist_ok=True)
entry = json.dumps({
'ts': datetime.now(tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC'),
'watch_url': watch_url[:200],
'watch_uuid': watch_uuid,
'status': status,
'message': message[:500],
}, ensure_ascii=False)
path = _log_file(datastore_path, profile_uuid)
try:
with open(path, 'r', encoding='utf-8') as fh:
lines = [l for l in fh.read().splitlines() if l.strip()]
except FileNotFoundError:
lines = []
lines.append(entry)
lines = lines[-MAX_ENTRIES:]
with open(path, 'w', encoding='utf-8') as fh:
fh.write('\n'.join(lines) + '\n')
def read_profile_log(datastore_path: str, profile_uuid: str) -> list:
"""Return log entries as a list of dicts, newest first."""
path = _log_file(datastore_path, profile_uuid)
try:
with open(path, 'r', encoding='utf-8') as fh:
lines = [l.strip() for l in fh if l.strip()]
except FileNotFoundError:
return []
entries = []
for line in reversed(lines):
try:
entries.append(json.loads(line))
except (json.JSONDecodeError, ValueError):
pass
return entries
def has_log(datastore_path: str, profile_uuid: str) -> bool:
return os.path.exists(_log_file(datastore_path, profile_uuid))
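The capped JSON-lines behaviour of the log module can be exercised in isolation. The sketch below re-implements the same read, append, truncate, rewrite cycle under hypothetical names (`append_capped`, `read_newest_first`) and writes to a temporary directory; it is an illustration of the technique, not the module itself.

```python
import json
import os
import tempfile


def append_capped(path, entry, max_entries):
    """Append one JSON object per line, keeping only the newest max_entries."""
    try:
        with open(path, 'r', encoding='utf-8') as fh:
            lines = [ln for ln in fh.read().splitlines() if ln.strip()]
    except FileNotFoundError:
        lines = []
    lines.append(json.dumps(entry, ensure_ascii=False))
    lines = lines[-max_entries:]  # oldest entries fall off the front
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write('\n'.join(lines) + '\n')


def read_newest_first(path):
    """Return parsed entries, newest first, skipping corrupt lines."""
    entries = []
    try:
        with open(path, 'r', encoding='utf-8') as fh:
            lines = [ln.strip() for ln in fh if ln.strip()]
    except FileNotFoundError:
        return entries
    for line in reversed(lines):
        try:
            entries.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            pass
    return entries


log_path = os.path.join(tempfile.mkdtemp(), 'demo-profile.log')
for i in range(5):
    append_capped(log_path, {'status': 'ok', 'n': i}, max_entries=3)
recent = read_newest_first(log_path)
```

Because the whole file is rewritten on every append, this approach suits small caps such as the 100 entries used here; it is not designed for concurrent writers.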
@@ -0,0 +1,111 @@
"""
Notification Profile Type plugin registry.
NotificationProfileType is the abstract base; the only contract is send().
Plugins are free to use any delivery mechanism (Apprise, direct HTTP, SDK, etc.).
Built-in: AppriseProfileType (raw Apprise URL list).
Third-party plugins register additional types:
from changedetectionio.notification_profiles.registry import registry, NotificationProfileType
@registry.register
class MyProfileType(NotificationProfileType):
type_id = "mytype"
display_name = "My Service"
icon = "bell"
template = "my_plugin/notification_profiles/types/mytype.html"
def send(self, config: dict, n_object: dict, datastore) -> bool:
requests.post(config['webhook_url'], json={"text": n_object['notification_body']})
return True
"""
from abc import ABC, abstractmethod
class NotificationProfileType(ABC):
type_id: str = NotImplemented
display_name: str = NotImplemented
icon: str = "bell" # feather icon name
template: str = NotImplemented # Jinja2 partial rendered in the profile edit form
@abstractmethod
def send(self, config: dict, n_object: dict, datastore) -> bool:
"""
Deliver the notification.
Args:
config: The profile's config dict (type-specific fields).
n_object: Fully-rendered NotificationContextData (title, body, format, etc.).
datastore: App datastore for any extra lookups.
Returns True on success, False on failure (do not raise; log instead).
"""
def validate(self, config: dict) -> None:
"""Raise ValueError with a user-readable message on invalid config."""
pass
def get_url_hint(self, config: dict) -> str:
"""Short display string shown in the selector chip tooltip / dropdown row."""
return ''
class AppriseProfileType(NotificationProfileType):
"""Delivers notifications via Apprise using a raw URL list."""
type_id = "apprise"
display_name = "Apprise"
icon = "bell"
template = "notification_profiles/types/apprise.html"
def get_apprise_urls(self, config: dict) -> list:
return config.get('notification_urls') or []
def send(self, config: dict, n_object, datastore) -> bool:
from changedetectionio.notification.handler import process_notification
from changedetectionio.notification_service import NotificationContextData
urls = self.get_apprise_urls(config)
if not urls:
return False
if not isinstance(n_object, NotificationContextData):
n_object = NotificationContextData(n_object)
n_object['notification_urls'] = urls
n_object['notification_title'] = config.get('notification_title') or n_object.get('notification_title')
n_object['notification_body'] = config.get('notification_body') or n_object.get('notification_body')
n_object['notification_format'] = config.get('notification_format') or n_object.get('notification_format')
process_notification(n_object, datastore)
return True
def get_url_hint(self, config: dict) -> str:
urls = config.get('notification_urls') or []
if urls:
u = urls[0]
return (u[:60] + '...') if len(u) > 60 else u
return ''
class _Registry:
def __init__(self):
self._types: dict = {}
def register(self, cls):
"""Register a NotificationProfileType subclass. Usable as a decorator."""
instance = cls()
self._types[instance.type_id] = instance
return cls
def get(self, type_id: str) -> NotificationProfileType:
return self._types.get(type_id, self._types.get('apprise'))
def all(self) -> list:
return list(self._types.values())
def choices(self) -> list:
return [(t.type_id, t.display_name) for t in self._types.values()]
registry = _Registry()
registry.register(AppriseProfileType)
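A standalone sketch of the decorator-registration pattern used by the registry, including the fallback to the `apprise` type for unknown IDs. The classes `ProfileType`, `Registry`, `DemoApprise` and `DemoWebhook` are hypothetical and simplified (for instance, `send()` here takes no datastore); they only illustrate the mechanism.

```python
class ProfileType:
    type_id = None
    display_name = None

    def send(self, config, n_object):
        raise NotImplementedError


class Registry:
    def __init__(self, fallback_id):
        self._types = {}
        self._fallback_id = fallback_id

    def register(self, cls):
        # Store one instance per type_id; returning cls allows decorator use.
        self._types[cls.type_id] = cls()
        return cls

    def get(self, type_id):
        # Unknown IDs resolve to the fallback type instead of raising.
        return self._types.get(type_id, self._types.get(self._fallback_id))


demo_registry = Registry(fallback_id='apprise')


@demo_registry.register
class DemoApprise(ProfileType):
    type_id = 'apprise'
    display_name = 'Apprise'

    def send(self, config, n_object):
        return bool(config.get('notification_urls'))


@demo_registry.register
class DemoWebhook(ProfileType):
    type_id = 'webhook'
    display_name = 'Webhook'

    def send(self, config, n_object):
        # A real plugin would POST here; this sketch only checks the config.
        return 'webhook_url' in config
```

Registering instances rather than classes means each type is stateless shared state; per-profile data travels in the `config` argument.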
@@ -0,0 +1,49 @@
"""
Resolve the full set of NotificationProfile objects that should fire for a given watch.
Merges profile UUIDs from: Watch → Tags → System (union, deduplicated).
Mute cascade is checked separately via resolve_setting() before calling this.
"""
from loguru import logger
def resolve_notification_profiles(watch, datastore) -> list:
"""
Return list of (profile_dict, NotificationProfileType) tuples to fire for *watch*.
Profiles are deduplicated by UUID: if the same UUID appears at multiple levels
it fires once, not multiple times.
"""
from changedetectionio.notification_profiles.registry import registry
all_profiles = datastore.data['settings']['application'].get('notification_profile_data', {})
seen = set()
result = []
def _add(uuids):
for uid in (uuids or []):
if uid in seen:
continue
profile = all_profiles.get(uid)
if not profile:
logger.warning(f"Notification profile UUID {uid!r} not found, skipping")
continue
seen.add(uid)
type_handler = registry.get(profile.get('type', 'apprise'))
result.append((profile, type_handler))
# 1. Watch-level
_add(watch.get('notification_profiles', []))
# 2. Tag/group level
tags = datastore.get_all_tags_for_watch(uuid=watch.get('uuid'))
if tags:
for tag in tags.values():
_add(tag.get('notification_profiles', []))
# 3. System level
_add(datastore.data['settings']['application'].get('notification_profiles', []))
return result
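The merge order and deduplication can be illustrated without a datastore. In this sketch `merge_profile_uuids` is a hypothetical helper that takes the three levels as plain lists; it does not look profiles up or log missing ones.

```python
def merge_profile_uuids(*levels):
    """Union the UUID lists in priority order, keeping first occurrence only."""
    seen = set()
    merged = []
    for uuids in levels:
        for uid in (uuids or []):  # tolerate None for an unconfigured level
            if uid not in seen:
                seen.add(uid)
                merged.append(uid)
    return merged


watch_level = ['p-1', 'p-2']
tag_level = ['p-2', 'p-3']
system_level = None  # a level with nothing configured
order = merge_profile_uuids(watch_level, tag_level, system_level)
```

A profile attached at both the watch and tag level (`p-2` here) appears once, in the position of its first occurrence.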
+26 -46
@@ -237,14 +237,23 @@ def register_builtin_fetchers():
This is called from content_fetchers/__init__.py after all fetchers are imported
to avoid circular import issues.
"""
from changedetectionio.content_fetchers import requests, playwright, puppeteer, webdriver_selenium
from changedetectionio.content_fetchers import requests, puppeteer, webdriver_selenium
from changedetectionio.content_fetchers.playwright import CDP, chrome, firefox, webkit
# Register each built-in fetcher plugin
if hasattr(requests, 'requests_plugin'):
plugin_manager.register(requests.requests_plugin, 'builtin_requests')
if hasattr(playwright, 'playwright_plugin'):
plugin_manager.register(playwright.playwright_plugin, 'builtin_playwright')
if hasattr(CDP, 'cdp_plugin'):
plugin_manager.register(CDP.cdp_plugin, 'builtin_playwright_cdp')
if hasattr(chrome, 'chrome_plugin'):
plugin_manager.register(chrome.chrome_plugin, 'builtin_playwright_chrome')
if hasattr(firefox, 'firefox_plugin'):
plugin_manager.register(firefox.firefox_plugin, 'builtin_playwright_firefox')
if hasattr(webkit, 'webkit_plugin'):
plugin_manager.register(webkit.webkit_plugin, 'builtin_playwright_webkit')
if hasattr(puppeteer, 'puppeteer_plugin'):
plugin_manager.register(puppeteer.puppeteer_plugin, 'builtin_puppeteer')
@@ -360,57 +369,28 @@ def get_active_plugins():
def get_fetcher_capabilities(watch, datastore):
"""Get capability flags for a watch's fetcher.
"""Get capability flags for a watch's resolved fetcher.
Args:
watch: The watch object/dict
datastore: The datastore to resolve 'system' fetcher
Uses the BrowserProfile resolution chain (watch → tag → global → built-in)
to determine the actual fetcher class, then reads its capability flags.
Returns:
dict: Dictionary with capability flags:
{
'supports_browser_steps': bool,
'supports_screenshots': bool,
'supports_xpath_element_data': bool
}
dict: {'supports_browser_steps': bool, 'supports_screenshots': bool,
'supports_xpath_element_data': bool}
"""
# Get the fetcher name from watch
fetcher_name = watch.get('fetch_backend', 'system')
# Resolve 'system' to actual fetcher
if fetcher_name == 'system':
fetcher_name = datastore.data['settings']['application'].get('fetch_backend', 'html_requests')
# Get the fetcher class
from changedetectionio.model.browser_profile import resolve_browser_profile
from changedetectionio import content_fetchers
# Try to get from built-in fetchers first
if hasattr(content_fetchers, fetcher_name):
fetcher_class = getattr(content_fetchers, fetcher_name)
return {
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False)
}
profile = resolve_browser_profile(watch, datastore)
fetcher_class = content_fetchers.get_fetcher(profile.fetch_backend)
# Try to get from plugin-provided fetchers
# Query all plugins for registered fetchers
plugin_fetchers = plugin_manager.hook.register_content_fetcher()
for fetcher_registration in plugin_fetchers:
if fetcher_registration:
name, fetcher_class = fetcher_registration
if name == fetcher_name:
return {
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False)
}
if fetcher_class is None:
return {'supports_browser_steps': False, 'supports_screenshots': False, 'supports_xpath_element_data': False}
# Default: no capabilities
return {
'supports_browser_steps': False,
'supports_screenshots': False,
'supports_xpath_element_data': False
'supports_browser_steps': getattr(fetcher_class, 'supports_browser_steps', False),
'supports_screenshots': getattr(fetcher_class, 'supports_screenshots', False),
'supports_xpath_element_data': getattr(fetcher_class, 'supports_xpath_element_data', False),
}
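The capability lookup reduces to reading class attributes with a `False` default. A small sketch, using hypothetical fetcher classes and a hypothetical `capabilities_of` helper:

```python
CAPABILITY_FLAGS = (
    'supports_browser_steps',
    'supports_screenshots',
    'supports_xpath_element_data',
)


def capabilities_of(fetcher_class):
    """Read capability flags off a class; anything missing counts as False."""
    if fetcher_class is None:
        return {flag: False for flag in CAPABILITY_FLAGS}
    return {flag: getattr(fetcher_class, flag, False) for flag in CAPABILITY_FLAGS}


class DemoBrowserFetcher:
    supports_browser_steps = True
    supports_screenshots = True


class DemoPlainFetcher:
    pass
```

A fetcher only needs to declare the flags it supports; undeclared flags and unresolved fetchers both report no capabilities.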
+67 -62
@@ -23,6 +23,7 @@ class difference_detection_processor():
watch = None
xpath_data = None
preferred_proxy = None
preferred_proxy_override = None # Set externally to force a specific proxy (e.g. proxy checker)
screenshot_format = SCREENSHOT_FORMAT_JPEG
last_raw_content_checksum = None
@@ -36,6 +37,8 @@ class difference_detection_processor():
# 2. Preserves Watch object with properties (.link, .is_pdf, etc.) - can't use dict()
# 3. Safe now: Watch.__deepcopy__() shares datastore ref (no memory leak) but copies dict data
self.watch = deepcopy(self.datastore.data['watching'].get(watch_uuid))
if self.watch is None:
raise KeyError(f"Watch UUID {watch_uuid} not found in datastore (deleted before processing?)")
# Generic fetcher that should be extended (requests, playwright etc)
self.fetcher = Fetcher()
@@ -115,82 +118,65 @@ class difference_detection_processor():
f"Set ALLOW_IANA_RESTRICTED_ADDRESSES=true to allow."
)
async def call_browser(self, preferred_proxy_id=None):
async def call_browser(self):
from requests.structures import CaseInsensitiveDict
from changedetectionio.model.browser_profile import resolve_browser_profile, BUILTIN_REQUESTS
url = self.watch.link
# Protect against file:, file:/, file:// access, check the real "link" without any meta "source:" etc prepended.
# Protect against file:, file:/, file:// access
if re.search(r'^file:', url.strip(), re.IGNORECASE):
if not strtobool(os.getenv('ALLOW_FILE_URI', 'false')):
raise Exception(
"file:// type access is denied for security reasons."
)
raise Exception("file:// type access is denied for security reasons.")
await self.validate_iana_url()
# Requests, playwright, other browser via wss:// etc, fetch_extra_something
prefer_fetch_backend = self.watch.get('fetch_backend', 'system')
# Resolve the full browser profile for this watch (watch → tag → global → built-in)
profile = resolve_browser_profile(self.watch, self.datastore)
# Proxy ID "key"
preferred_proxy_id = preferred_proxy_id if preferred_proxy_id else self.datastore.get_preferred_proxy_for_watch(
uuid=self.watch.get('uuid'))
# Pluggable content self.fetcher
if not prefer_fetch_backend or prefer_fetch_backend == 'system':
prefer_fetch_backend = self.datastore.data['settings']['application'].get('fetch_backend')
# In the case that the preferred fetcher was a browser config with custom connection URL..
# @todo - on save watch, if its extra_browser_ then it should be obvious it will use playwright (like if its requests now..)
custom_browser_connection_url = None
if prefer_fetch_backend.startswith('extra_browser_'):
(t, key) = prefer_fetch_backend.split('extra_browser_')
connection = list(
filter(lambda s: (s['browser_name'] == key), self.datastore.data['settings']['requests'].get('extra_browsers', [])))
if connection:
prefer_fetch_backend = 'html_webdriver'
custom_browser_connection_url = connection[0].get('browser_connection_url')
# PDF should be html_requests because playwright will serve it up (so far) in a embedded page
# PDFs always use the requests fetcher — browsers render them in an embedded viewer
# @todo https://github.com/dgtlmoon/changedetection.io/issues/2019
# @todo needs test to or a fix
if self.watch.is_pdf:
prefer_fetch_backend = "html_requests"
profile = BUILTIN_REQUESTS
# Grab the right kind of 'fetcher', (playwright, requests, etc)
# Resolve proxy for the target URL fetch.
# Note: browser_connection_url is the WebSocket endpoint to reach the remote browser,
# which is separate from the proxy used by the browser to fetch target pages.
proxy_url = self.datastore.get_proxy_url_for_watch(self.watch.get('uuid'), override_id=self.preferred_proxy_override)
if proxy_url:
logger.debug(f"Proxy '{proxy_url}' for {url}")
logger.debug(f"BrowserProfile '{profile.get_machine_name()}' (fetcher={profile.fetch_backend}) for watch {self.watch['uuid']}")
# Select the fetcher class
from changedetectionio import content_fetchers
if hasattr(content_fetchers, prefer_fetch_backend):
# @todo TEMPORARY HACK - SWITCH BACK TO PLAYWRIGHT FOR BROWSERSTEPS
if prefer_fetch_backend == 'html_webdriver' and self.watch.has_browser_steps:
# This is never supported in selenium anyway
logger.warning(
"Using playwright fetcher override for possible puppeteer request in browsersteps, because puppetteer:browser steps is incomplete.")
from changedetectionio.content_fetchers.playwright import fetcher as playwright_fetcher
fetcher_obj = playwright_fetcher
else:
fetcher_obj = getattr(content_fetchers, prefer_fetch_backend)
else:
# What it referenced doesnt exist, Just use a default
fetcher_obj = getattr(content_fetchers, "html_requests")
fetcher_class_name = profile.get_fetcher_class_name()
proxy_url = None
if preferred_proxy_id:
# Custom browser endpoints should NOT have a proxy added
if not prefer_fetch_backend.startswith('extra_browser_'):
proxy_url = self.datastore.proxy_list.get(preferred_proxy_id).get('url')
logger.debug(f"Selected proxy key '{preferred_proxy_id}' as proxy URL '{proxy_url}' for {url}")
else:
logger.debug("Skipping adding proxy data when custom Browser endpoint is specified. ")
fetcher_obj = content_fetchers.get_fetcher(fetcher_class_name)
if fetcher_obj is None:
logger.warning(f"Fetcher '{fetcher_class_name}' not found, falling back to requests")
fetcher_obj = content_fetchers.get_fetcher('requests')
elif self.watch.has_browser_steps and not getattr(fetcher_obj, 'supports_browser_steps', False):
# Browser steps require Playwright — override if the resolved fetcher doesn't support them
logger.warning(f"Fetcher '{fetcher_class_name}' does not support browser steps, overriding to Playwright")
fetcher_obj = content_fetchers.get_fetcher('playwright')
logger.debug(f"Using proxy '{proxy_url}' for {self.watch['uuid']}")
# Now call the fetcher (playwright/requests/etc) with arguments that only a fetcher would need.
# When browser_connection_url is None, it method should default to working out whats the best defaults (os env vars etc)
self.fetcher = fetcher_obj(proxy_override=proxy_url,
custom_browser_connection_url=custom_browser_connection_url,
screenshot_format=self.screenshot_format
)
self.fetcher = fetcher_obj(
proxy_override=proxy_url,
custom_browser_connection_url=profile.browser_connection_url,
screenshot_format=self.screenshot_format,
# BrowserProfile fields — browser fetchers use these; html_requests ignores them
viewport_width=profile.viewport_width,
viewport_height=profile.viewport_height,
block_images=profile.block_images,
block_fonts=profile.block_fonts,
profile_user_agent=profile.user_agent,
ignore_https_errors=profile.ignore_https_errors,
locale=profile.locale,
service_workers=profile.service_workers,
extra_delay=profile.extra_delay,
)
if self.watch.has_browser_steps:
self.fetcher.browser_steps = browser_steps_get_valid_steps(self.watch.get('browser_steps', []))
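The fetcher selection in the new `call_browser()` has two fallbacks: an unknown fetcher name falls back to `requests`, and a known fetcher that cannot run browser steps is overridden to `playwright`. A sketch of that decision, with `select_fetcher` and the demo classes as hypothetical stand-ins for `content_fetchers.get_fetcher()` and the real fetchers:

```python
def select_fetcher(fetchers, wanted, needs_browser_steps,
                   default='requests', steps_capable='playwright'):
    """Pick a fetcher class by name, applying the two fallbacks in order."""
    chosen = fetchers.get(wanted)
    if chosen is None:
        # Unknown name: fall back to the plain default fetcher.
        return fetchers[default]
    if needs_browser_steps and not getattr(chosen, 'supports_browser_steps', False):
        # The resolved fetcher cannot run browser steps: override it.
        return fetchers[steps_capable]
    return chosen


class DemoRequests:
    pass


class DemoPlaywright:
    supports_browser_steps = True


class DemoSelenium:
    supports_browser_steps = False


demo_fetchers = {
    'requests': DemoRequests,
    'playwright': DemoPlaywright,
    'selenium': DemoSelenium,
}
```

As in the diff, the browser-steps override is only considered when the requested fetcher was found; an unknown name yields the default even if the watch has browser steps.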
@@ -200,9 +186,17 @@ class difference_detection_processor():
from changedetectionio.jinja2_custom import render as jinja_render
request_headers = CaseInsensitiveDict()
ua = self.datastore.data['settings']['requests'].get('default_ua')
if ua and ua.get(prefer_fetch_backend):
request_headers.update({'User-Agent': ua.get(prefer_fetch_backend)})
# Browser profile: UA override (lowest priority — watch headers override this)
if profile.user_agent:
request_headers['User-Agent'] = profile.user_agent
# Browser profile: custom headers (override profile UA, but watch headers override these)
if profile.custom_headers:
for line in profile.custom_headers.splitlines():
line = line.strip()
if not line.startswith('#') and ':' in line:
k, v = line.split(':', 1)
request_headers[k.strip()] = v.strip()
request_headers.update(self.watch.get('headers', {}))
request_headers.update(self.datastore.get_all_base_headers())
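The header precedence above (profile User-Agent, then profile custom headers, then watch headers) can be sketched with plain dicts. `parse_header_lines` and `build_request_headers` are hypothetical names; unlike the `CaseInsensitiveDict` used in the diff, a plain dict is case-sensitive, and the final base-headers update is omitted here.

```python
def parse_header_lines(text):
    """Parse 'Name: value' lines, skipping '#' comments and malformed lines."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith('#') and ':' in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(':', 1)
            headers[key.strip()] = value.strip()
    return headers


def build_request_headers(profile_ua, profile_headers_text, watch_headers):
    """Later updates win: profile UA, then profile headers, then watch headers."""
    merged = {}
    if profile_ua:
        merged['User-Agent'] = profile_ua
    merged.update(parse_header_lines(profile_headers_text or ''))
    merged.update(watch_headers or {})
    return merged


demo_headers = build_request_headers(
    'UA-profile',
    '# note\nX-Token: a:b\nbroken line\nUser-Agent: UA-custom',
    {'Accept': 'text/html'},
)
```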
@@ -259,6 +253,17 @@ class difference_detection_processor():
# @todo .quit here could go on close object, so we can run JS if change-detected
await self.fetcher.quit(watch=self.watch)
self.fetcher.disk_cleanup_after_fetch()
# Sanitize lone surrogates - these can appear when servers return malformed/mixed-encoding
# content that gets decoded into surrogate characters (e.g. \udcad). Without this,
# encode('utf-8') raises UnicodeEncodeError downstream in checksums, diffs, file writes, etc.
# Covers all fetchers (requests, playwright, puppeteer, selenium) in one place.
# Also note: By this point we SHOULD know the original encoding so it can safely convert to utf-8 for the rest of the app.
# See: https://github.com/dgtlmoon/changedetection.io/issues/3952
if self.fetcher.content and isinstance(self.fetcher.content, str):
self.fetcher.content = self.fetcher.content.encode('utf-8', errors='replace').decode('utf-8')
# After init, call run_changedetection() which will do the actual change-detection
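The surrogate sanitising step is a plain encode/decode round trip. A short sketch showing why it is needed and what it produces; `sanitize_surrogates` is a hypothetical wrapper around the same expression used in the diff.

```python
def sanitize_surrogates(text):
    """Replace characters that cannot be encoded as UTF-8 (lone surrogates)."""
    return text.encode('utf-8', errors='replace').decode('utf-8')


broken = 'price\udcad100'  # contains a lone low surrogate

try:
    broken.encode('utf-8')
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

cleaned = sanitize_surrogates(broken)
```

When encoding, the `replace` error handler substitutes `?` for each unencodable character, so the result is lossy for the damaged characters but safe for checksums, diffs and file writes. Valid non-ASCII text passes through unchanged.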
+2 -5
@@ -42,10 +42,7 @@ def render_form(watch, datastore, request, url_for, render_template, flash, redi
# Get error information for the template
screenshot_url = watch.get_screenshot()
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
fetcher_supports_screenshots = watch.fetcher_supports_screenshots
password_enabled_and_share_is_off = False
if datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False):
@@ -62,7 +59,7 @@ def render_form(watch, datastore, request, url_for, render_template, flash, redi
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
screenshot=screenshot_url,
is_html_webdriver=is_html_webdriver,
fetcher_supports_screenshots=fetcher_supports_screenshots,
password_enabled_and_share_is_off=password_enabled_and_share_is_off,
extra_title=f" - {watch.label} - Extract Data",
extra_stylesheets=[url_for('static_content', group='styles', filename='diff.css')],
+7 -1
@@ -100,7 +100,13 @@ class guess_stream_type():
if any(s in http_content_header for s in RSS_XML_CONTENT_TYPES):
self.is_rss = True
elif any(s in http_content_header for s in JSON_CONTENT_TYPES):
self.is_json = True
# JSONP detection: server claims application/json but content is actually JSONP (e.g. cb({...}))
# A JSONP response starts with an identifier followed by '(' - not valid JSON
if re.match(r'^\w[\w.]*\s*\(', test_content):
logger.warning("Content-Type header claims JSON but content looks like JSONP (starts with identifier+parenthesis) - treating as plaintext")
self.is_plaintext = True
else:
self.is_json = True
elif 'pdf' in magic_content_header:
self.is_pdf = True
# magic will call a rss document 'xml'
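The JSONP check in this hunk is a single anchored regular expression. The sketch below applies the same pattern to a few sample bodies; `looks_like_jsonp` is a hypothetical wrapper, and the samples are illustrative.

```python
import re

# Same pattern as in the diff: an identifier (dots allowed), optional
# whitespace, then an opening parenthesis at the very start of the content.
JSONP_PREFIX = re.compile(r'^\w[\w.]*\s*\(')


def looks_like_jsonp(content):
    """True when content starts with an identifier followed by '('."""
    return bool(JSONP_PREFIX.match(content))


samples = {
    'cb({"a": 1})': looks_like_jsonp('cb({"a": 1})'),
    'jQuery.cb ({})': looks_like_jsonp('jQuery.cb ({})'),
    '{"a": 1}': looks_like_jsonp('{"a": 1}'),
    '[1, 2]': looks_like_jsonp('[1, 2]'),
    '  cb({})': looks_like_jsonp('  cb({})'),
}
```

Because the pattern is anchored at the first character, content with leading whitespace is not flagged; whether `test_content` is stripped before this check is not visible in the hunk.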
@@ -1,6 +1,7 @@
from babel.numbers import parse_decimal
from changedetectionio.model.Watch import model as BaseWatch
from decimal import Decimal, InvalidOperation
from typing import Union
import re
@@ -10,6 +11,8 @@ supports_browser_steps = True
supports_text_filters_and_triggers = True
supports_text_filters_and_triggers_elements = True
supports_request_type = True
_price_re = re.compile(r"Price:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
class Restock(dict):
@@ -31,6 +34,7 @@ class Restock(dict):
if standardized_value:
# Convert to float
# @todo locale needs to be the locale of the webpage
return float(parse_decimal(standardized_value, locale='en'))
return None
@@ -62,6 +66,17 @@ class Restock(dict):
super().__setitem__(key, value)
def get_price_from_history_str(history_str):
m = _price_re.search(history_str)
if not m:
return None
try:
return str(Decimal(m.group(1)))
except InvalidOperation:
return None
class Watch(BaseWatch):
def __init__(self, *arg, **kw):
super().__init__(*arg, **kw)
@@ -75,13 +90,27 @@ class Watch(BaseWatch):
def extra_notification_token_values(self):
values = super().extra_notification_token_values()
values['restock'] = self.get('restock', {})
values['restock']['previous_price'] = None
if self.history_n >= 2:
history = self.history
if history and len(history) >=2:
"""Unfortunately for now timestamp is stored as string key"""
sorted_keys = sorted(list(history), key=lambda x: int(x))
sorted_keys.reverse()
price_str = self.get_history_snapshot(timestamp=sorted_keys[-1])
if price_str:
values['restock']['previous_price'] = get_price_from_history_str(price_str)
return values
def extra_notification_token_placeholder_info(self):
values = super().extra_notification_token_placeholder_info()
values.append(('restock.price', "Price detected"))
values.append(('restock.in_stock', "In stock status"))
values.append(('restock.original_price', "Original price at first check"))
values.append(('restock.previous_price', "Previous price in history"))
return values
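The price extraction used for `restock.previous_price` can be tried on its own. This sketch copies the regular expression from the diff into a hypothetical `price_from_snapshot` function; the sample snapshot text format (`In Stock: ... / Price: ...`) is an assumption.

```python
import re
from decimal import Decimal, InvalidOperation

PRICE_RE = re.compile(r"Price:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def price_from_snapshot(text):
    """Return the first 'Price: N' value as a string, or None if absent."""
    m = PRICE_RE.search(text)
    if not m:
        return None
    try:
        return str(Decimal(m.group(1)))
    except InvalidOperation:
        return None


with_price = price_from_snapshot("In Stock: True\nPrice: 12.50")
lowercase = price_from_snapshot("price: 9")
no_price = price_from_snapshot("In Stock: False")
comma_decimal = price_from_snapshot("Price: 1,00")
```

The pattern only understands a dot as the decimal separator, so a comma-decimal value such as `1,00` is read as `1`. Going through `Decimal` preserves trailing zeros (`12.50` stays `12.50`).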
@@ -437,17 +437,18 @@ class perform_site_check(difference_detection_processor):
# Only try to process restock information (like scraping for keywords) if the page was actually rendered correctly.
# Otherwise it will assume "in stock" because nothing suggesting the opposite was found
from ...html_tools import html_to_text
text = html_to_text(self.fetcher.content)
logger.debug(f"Length of text after conversion: {len(text)}")
if not len(text):
from ...content_fetchers.exceptions import ReplyWithContentButNoText
raise ReplyWithContentButNoText(url=watch.link,
status_code=self.fetcher.get_last_status_code(),
screenshot=self.fetcher.screenshot,
html_content=self.fetcher.content,
xpath_data=self.fetcher.xpath_data
)
#useless
# from ...html_tools import html_to_text
# text = html_to_text(self.fetcher.content)
# logger.debug(f"Length of text after conversion: {len(text)}")
# if not len(text):
# from ...content_fetchers.exceptions import ReplyWithContentButNoText
# raise ReplyWithContentButNoText(url=watch.link,
# status_code=self.fetcher.get_last_status_code(),
# screenshot=self.fetcher.screenshot,
# html_content=self.fetcher.content,
# xpath_data=self.fetcher.xpath_data
# )
# Which restock settings to compare against?
# Settings are stored in restock_diff.json (migrated from watch.json by update_30).
@@ -488,19 +489,9 @@ class perform_site_check(difference_detection_processor):
# @TODO !!! some setting like "Use as fallback" or "always use", "t
if not (has_price and has_availability) or True:
from changedetectionio.pluggy_interface import get_itemprop_availability_from_plugin
fetcher_name = watch.get('fetch_backend', 'html_requests')
# Resolve 'system' to the actual fetcher being used
# This allows plugins to work even when watch uses "system settings default"
if fetcher_name == 'system':
# Get the actual fetcher that was used (from self.fetcher)
# Fetcher class name gives us the actual backend (e.g., 'html_requests', 'html_webdriver')
actual_fetcher = type(self.fetcher).__name__
if 'html_requests' in actual_fetcher.lower():
fetcher_name = 'html_requests'
elif 'webdriver' in actual_fetcher.lower() or 'playwright' in actual_fetcher.lower():
fetcher_name = 'html_webdriver'
logger.debug(f"Resolved 'system' fetcher to actual fetcher: {fetcher_name}")
# Use the actual resolved fetcher name from the fetcher instance
fetcher_name = self.watch.effective_browser_profile.fetch_backend
logger.debug(f"Resolved effective fetcher: {fetcher_name}")
# Try plugin override - plugins can decide if they support this fetcher
if fetcher_name:
@@ -283,4 +283,7 @@ def query_price_availability(extracted_data):
if not result.get('availability') and 'availability' in microdata:
result['availability'] = microdata['availability']
# result['price'] may be a float or a str here, depending on the website; for example it might use a comma decimal such as "1,00".
# Parsing it with something like babel requires knowing the website's locale, and even then it can be problematic.
# We don't really do anything with the price data so far, so just accept it the way it comes.
return result
@@ -154,11 +154,7 @@ def render(watch, datastore, request, url_for, render_template, flash, redirect,
screenshot_url = watch.get_screenshot()
system_uses_webdriver = datastore.data['settings']['application']['fetch_backend'] == 'html_webdriver'
is_html_webdriver = False
if (watch.get('fetch_backend') == 'system' and system_uses_webdriver) or watch.get('fetch_backend') == 'html_webdriver' or watch.get('fetch_backend', '').startswith('extra_browser_'):
is_html_webdriver = True
fetcher_supports_screenshots = watch.fetcher_supports_screenshots
password_enabled_and_share_is_off = False
if datastore.data['settings']['application'].get('password') or os.getenv("SALTED_PASS", False):
@@ -214,7 +210,7 @@ def render(watch, datastore, request, url_for, render_template, flash, redirect,
extra_title=f" - {watch.label} - History",
extract_form=extract_form,
from_version=str(from_version),
is_html_webdriver=is_html_webdriver,
fetcher_supports_screenshots=fetcher_supports_screenshots,
last_error=watch['last_error'],
last_error_screenshot=watch.get_error_snapshot(),
last_error_text=watch.get_error_text(),
+2
@@ -29,9 +29,11 @@ def register_watch_operation_handlers(socketio, datastore):
# Perform the operation
if op == 'pause':
watch.toggle_pause()
watch.commit()
logger.info(f"Socket.IO: Toggled pause for watch {uuid}")
elif op == 'mute':
watch.toggle_mute()
watch.commit()
logger.info(f"Socket.IO: Toggled mute for watch {uuid}")
elif op == 'recheck':
# Import here to avoid circular imports
@@ -199,8 +199,31 @@ def handle_watch_update(socketio, **kwargs):
logger.error(f"Socket.IO error in handle_watch_update: {str(e)}")
def _suppress_werkzeug_ws_abrupt_disconnect_noise():
"""Patch BaseWSGIServer.log to suppress the AssertionError traceback that fires when
a browser closes a WebSocket connection mid-handshake (e.g. closing a tab).
The exception is caught inside run_wsgi and routed to self.server.log(); it never
propagates out, so wrapping run_wsgi doesn't help. Patching the log method is the
only reliable intercept point. The error is cosmetic: Socket.IO already handles the
disconnect correctly via its own disconnect handler and timeout logic."""
try:
from werkzeug.serving import BaseWSGIServer
_original_log = BaseWSGIServer.log
def _filtered_log(self, type, message, *args):
if type == 'error' and 'write() before start_response' in message:
return
_original_log(self, type, message, *args)
BaseWSGIServer.log = _filtered_log
except Exception:
pass
def init_socketio(app, datastore):
"""Initialize SocketIO with the main Flask app"""
_suppress_werkzeug_ws_abrupt_disconnect_noise()
import platform
import sys
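The log-suppression patch follows a general pattern: keep a reference to the original method, install a wrapper that drops one known message, and delegate everything else. The sketch below shows that pattern on a hypothetical `DemoServer` class rather than on werkzeug's `BaseWSGIServer`.

```python
class DemoServer:
    """Hypothetical stand-in for a server with a log(type, message, *args) method."""

    def __init__(self):
        self.emitted = []

    def log(self, type, message, *args):
        # Apply %-style arguments the way logging-style APIs do.
        self.emitted.append((type, message % args if args else message))


_original_log = DemoServer.log


def _filtered_log(self, type, message, *args):
    # Drop only the one known-cosmetic error; pass everything else through.
    if type == 'error' and 'write() before start_response' in message:
        return
    _original_log(self, type, message, *args)


DemoServer.log = _filtered_log

server = DemoServer()
server.log('error', 'AssertionError: write() before start_response')
server.log('error', 'real failure: %s', 'disk full')
server.log('info', 'GET / 200')
```

Matching on both the log type and a specific substring keeps the filter narrow, so unrelated errors still reach the log.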
File diff suppressed because one or more lines are too long


+8
@@ -116,6 +116,14 @@ $(document).ready(function () {
$('#realtime-conn-error').show();
});
// Tell the server we're leaving cleanly so it can release the connection
// immediately rather than waiting for a timeout.
// Note: this only fires for voluntary closes (tab/window close, navigation away).
// Hard kills, crashes and network drops will still timeout normally on the server.
window.addEventListener('beforeunload', function () {
socket.disconnect();
});
socket.on('queue_size', function (data) {
console.log(`${data.event_timestamp} - Queue size update: ${data.q_length}`);
if(queueSizePagerInfoText) {
+1 -1
@@ -4,7 +4,7 @@ $(document).ready(function(){
});
var checkUserVal = function(){
if($('#fetch_backend input:checked').val()=='html_requests') {
if($('#fetch_backend input:checked').val()=='requests') {
$('#request-override').show();
$('#webdriver-stepper').hide();
} else {
+25 -6
@@ -3,21 +3,40 @@ $(document).ready(function () {
// Lazy Hide/Show elements mechanism
$('[data-visible-for]').hide();
function show_related_elem(e) {
var n = $(e).attr('name') + "=" + $(e).val();
if (n === 'fetch_backend=system') {
var name = $(e).attr('name');
var val = $(e).val();
var n = name + "=" + val;
// Resolve browser_profile select → underlying fetch_backend class name
// browserProfileFetcherMap is injected by the page as {machine_name: 'playwright', ...}
if (name && name.endsWith('browser_profile') && typeof browserProfileFetcherMap !== 'undefined') {
var fetcherClass = val === 'system'
? (typeof default_system_fetch_backend !== 'undefined' ? default_system_fetch_backend : null)
: browserProfileFetcherMap[val];
if (fetcherClass) {
n = 'fetch_backend=' + fetcherClass;
}
} else if (n === 'fetch_backend=system') {
n = "fetch_backend=" + default_system_fetch_backend;
}
$(`[data-visible-for~="${n}"]`).show();
}
$(':radio').on('keyup keypress blur change click', function (e) {
$('select, :radio').on('change', function (e) {
$(`[data-visible-for]`).hide();
$('.advanced-options').hide();
show_related_elem(this);
});
// Retain original click/keyup handling for radio buttons
$(':radio').on('keyup keypress blur click', function (e) {
$(`[data-visible-for]`).hide();
$('.advanced-options').hide();
show_related_elem(this);
});
$(':radio:checked').each(function (e) {
$(':radio:checked, select').each(function (e) {
show_related_elem(this);
})
});
// Show advanced
@@ -26,4 +45,4 @@ $(document).ready(function () {
$(this).toggle();
})
});
});
});
+150 -10
@@ -143,7 +143,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
self.__data['settings']['application']['tags'][uuid] = Tag.model(
datastore_path=self.datastore_path,
__datastore=self.__data,
__datastore=self,
default=tag
)
logger.info(f"Tag: {uuid} {tag['title']}")
@@ -207,7 +207,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
self.json_store_path = os.path.join(self.datastore_path, "changedetection.json")
# Base definition for all watchers (deepcopy part of #569)
self.generic_definition = deepcopy(Watch.model(datastore_path=datastore_path, __datastore=self.__data, default={}))
self.generic_definition = deepcopy(Watch.model(datastore_path=datastore_path, __datastore=self, default={}))
# Load build SHA if available (Docker deployments)
if path.isfile('changedetectionio/source.txt'):
@@ -245,6 +245,10 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# Maybe they copied a bunch of watch subdirs across too
self._load_state()
# Apply env-var browser config after state is fully loaded so we can safely
# read existing settings without risk of being overwritten.
self.preconfigure_browser_profiles_based_on_env()
def init_fresh_install(self, include_default_watches, version_tag):
# Generate app_guid FIRST (required for all operations)
if "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ:
@@ -268,13 +272,11 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
if include_default_watches:
self.add_watch(
url='https://news.ycombinator.com/',
tag='Tech news',
extras={'fetch_backend': 'html_requests'}
tag='Tech news'
)
self.add_watch(
url='https://changedetection.io/CHANGELOG.txt',
tag='changedetection.io',
extras={'fetch_backend': 'html_requests'}
tag='changedetection.io'
)
# Create changedetection.json immediately
@@ -331,9 +333,64 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
if entity.get('processor') != 'text_json_diff':
logger.trace(f"Loading Watch object '{watch_class.__module__}.{watch_class.__name__}' for UUID {uuid}")
entity = watch_class(datastore_path=self.datastore_path, __datastore=self.__data, default=entity)
entity = watch_class(datastore_path=self.datastore_path, __datastore=self, default=entity)
return entity
def preconfigure_browser_profiles_based_on_env(self):
"""Instantiate browser profiles from environment variables and store them.
Always runs at the end of reload_state(), which covers fresh installs,
existing datastores, and server restarts. Env vars always win so that
changing PLAYWRIGHT_DRIVER_URL and restarting is reflected immediately.
Creates BrowserProfile instances from env vars and stores them in
``settings.application.browser_profiles`` under their machine names,
then sets ``settings.application.browser_profile`` to that profile as
the system-wide default.
"""
from changedetectionio.model import browser_profile as bp
from changedetectionio.strtobool import strtobool
store_profiles = self.__data['settings']['application'].setdefault('browser_profiles', {})
service_workers = os.getenv('PLAYWRIGHT_SERVICE_WORKERS', 'allow')
extra_delay = int(os.getenv('WEBDRIVER_DELAY_BEFORE_CONTENT_READY', 0))
configured_profile = None
playwright_url = os.getenv('PLAYWRIGHT_DRIVER_URL')
if playwright_url:
playwright_url = playwright_url.strip('"')
builtin = bp.BUILTIN_PUPPETEER if strtobool(os.getenv('FAST_PUPPETEER_CHROME_FETCHER', 'False')) else bp.BUILTIN_PLAYWRIGHT
profile = bp.BrowserProfile(
name=builtin.name,
fetch_backend=builtin.fetch_backend,
browser_connection_url=playwright_url,
service_workers=service_workers,
extra_delay=extra_delay,
is_builtin=True,
)
logger.debug(f"Configuring browser profile '{profile.get_machine_name()}' from env")
store_profiles[profile.get_machine_name()] = profile.model_dump()
configured_profile = profile
webdriver_url = os.getenv('WEBDRIVER_URL')
if webdriver_url:
profile = bp.BrowserProfile(
name=bp.BUILTIN_SELENIUM.name,
fetch_backend=bp.BUILTIN_SELENIUM.fetch_backend,
browser_connection_url=webdriver_url.strip('"'),
extra_delay=extra_delay,
is_builtin=True,
)
logger.debug(f"Configuring browser profile '{profile.get_machine_name()}' from env")
store_profiles[profile.get_machine_name()] = profile.model_dump()
if not configured_profile:
configured_profile = profile
if configured_profile:
logger.debug(f"Setting system default browser profile to '{configured_profile.get_machine_name()}'")
self.__data['settings']['application']['browser_profile'] = configured_profile.get_machine_name()
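The precedence described in the docstring above can be sketched in isolation. This is a minimal illustration, not the project's code: `pick_profiles`, the short labels "playwright", "puppeteer" and "selenium", and the truthy-string set are assumptions standing in for the real `BrowserProfile` machine names and `strtobool`.

```python
from typing import Mapping, Optional, Tuple, List

# Assumed simplification of strtobool: strings treated as "true".
TRUTHY = {"1", "true", "yes", "y", "on", "t"}


def pick_profiles(env: Mapping[str, str]) -> Tuple[List[str], Optional[str]]:
    """Return (profiles to store, system default) for a given environment.

    The labels returned here are hypothetical; the real code stores
    BrowserProfile dumps keyed by machine name.
    """
    stored: List[str] = []
    default: Optional[str] = None

    if env.get("PLAYWRIGHT_DRIVER_URL"):
        fast = env.get("FAST_PUPPETEER_CHROME_FETCHER", "False").lower() in TRUTHY
        label = "puppeteer" if fast else "playwright"
        stored.append(label)
        default = label  # the Playwright/Puppeteer profile wins when present

    if env.get("WEBDRIVER_URL"):
        stored.append("selenium")
        if default is None:
            default = "selenium"  # only the default when nothing else is configured

    return stored, default
```

With both URLs set, both profiles are stored, but the Playwright-style one becomes the system default.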
# ============================================================================
# FileSavingDataStore Abstract Method Implementations
# ============================================================================
@@ -365,6 +422,14 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# Is saved as {uuid}/tag.json
settings_copy['application']['tags'] = {}
# Serialize BrowserProfile Pydantic instances to plain dicts for JSON storage
raw_profiles = settings_copy['application'].get('browser_profiles', {})
from changedetectionio.model.browser_profile import BrowserProfile
settings_copy['application']['browser_profiles'] = {
k: v.model_dump() if isinstance(v, BrowserProfile) else v
for k, v in raw_profiles.items()
}
return {
'note': 'Settings file - watches are in {uuid}/watch.json, tags are in {uuid}/tag.json',
'app_guid': self.__data.get('app_guid'),
@@ -421,7 +486,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
return Tag.model(
datastore_path=self.datastore_path,
__datastore=self.__data,
__datastore=self,
default=entity_dict
)
@@ -767,7 +832,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
# If the processor also has its own Watch implementation
watch_class = get_custom_watch_obj_for_processor(apply_extras.get('processor'))
new_watch = watch_class(datastore_path=self.datastore_path, __datastore=self.__data, url=url)
new_watch = watch_class(datastore_path=self.datastore_path, __datastore=self, url=url)
new_uuid = new_watch.get('uuid')
@@ -852,6 +917,16 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
return proxy_list if len(proxy_list) else None
def get_proxy_url_for_watch(self, uuid, override_id=None):
"""
Returns the resolved proxy URL string for a watch, or None.
override_id forces a specific proxy (e.g. proxy checker bypass).
"""
proxy_id = override_id or self.get_preferred_proxy_for_watch(uuid)
if proxy_id:
return self.proxy_list.get(proxy_id, {}).get('url')
return None
def get_preferred_proxy_for_watch(self, uuid):
"""
Returns the preferred proxy by ID key
@@ -885,6 +960,71 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
return None
# ------------------------------------------------------------------
# BrowserProfile helpers
# ------------------------------------------------------------------
def get_browser_profile(self, machine_name: str):
"""Return a BrowserProfile by machine name, or None if not found.
Built-in profiles (direct_http_requests, browser_chromeplaywright) are
always available and checked first.
"""
from changedetectionio.model.browser_profile import get_profile
store_profiles = self.data['settings']['application'].get('browser_profiles', {})
return get_profile(machine_name, store_profiles)
def delete_browser_profile(self, machine_name: str):
"""Delete a user-defined BrowserProfile by machine name.
Rules enforced:
- Built-in profiles cannot be deleted.
- The profile cannot be the current system default
(settings.application.browser_profile); caller must change the
default first.
- Any watch or tag that referenced this profile is reset to None
(falls back through the chain on next fetch).
Returns the number of watches/tags that were reset.
"""
from changedetectionio.model.browser_profile import RESERVED_MACHINE_NAMES
if machine_name in RESERVED_MACHINE_NAMES:
raise ValueError(f"Built-in profile '{machine_name}' cannot be deleted")
system_default = self.data['settings']['application'].get('browser_profile')
if system_default == machine_name:
raise ValueError(
f"Profile '{machine_name}' is the system default. "
f"Change the system default before deleting it."
)
store_profiles = self.data['settings']['application'].get('browser_profiles', {})
if machine_name not in store_profiles:
return 0
del store_profiles[machine_name]
reset_count = 0
# Reset watches that reference this profile
for uuid, watch in self.data['watching'].items():
if watch.get('browser_profile') == machine_name:
watch['browser_profile'] = None
watch.commit()
reset_count += 1
# Reset tags that reference this profile
for tag_uuid, tag in self.data['settings']['application'].get('tags', {}).items():
if tag.get('browser_profile') == machine_name:
tag['browser_profile'] = None
tag.commit()
reset_count += 1
self._save_settings()
logger.info(f"Deleted BrowserProfile '{machine_name}', reset {reset_count} watches/tags")
return reset_count
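The deletion rules listed in the docstring above can be exercised on plain dicts. This is a sketch under stated assumptions: `delete_profile` and `RESERVED` are hypothetical stand-ins, and the real method additionally calls `commit()` on each watch/tag and saves settings.

```python
# Assumed reserved names, taken from the built-ins mentioned in the docstrings.
RESERVED = {"direct_http_requests", "browser_chromeplaywright"}


def delete_profile(settings: dict, watches: dict, machine_name: str) -> int:
    """Delete a user-defined profile and reset every watch that referenced it."""
    if machine_name in RESERVED:
        raise ValueError(f"Built-in profile '{machine_name}' cannot be deleted")
    if settings.get("browser_profile") == machine_name:
        raise ValueError(f"Profile '{machine_name}' is the system default")

    profiles = settings.get("browser_profiles", {})
    if machine_name not in profiles:
        return 0  # nothing to delete, nothing reset

    del profiles[machine_name]

    reset_count = 0
    for watch in watches.values():
        if watch.get("browser_profile") == machine_name:
            watch["browser_profile"] = None  # falls back through the chain later
            reset_count += 1
    return reset_count
```

Raising before touching any state keeps the store unchanged when a rule is violated.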
@property
def has_extra_headers_file(self):
filepath = os.path.join(self.datastore_path, 'headers.txt')
@@ -962,7 +1102,7 @@ class ChangeDetectionStore(DatastoreUpdatesMixin, FileSavingDataStore):
from ..model import Tag
new_tag = Tag.model(
datastore_path=self.datastore_path,
__datastore=self.__data,
__datastore=self,
default={
'title': title.strip(),
'date_created': int(time.time())
+139
@@ -15,6 +15,7 @@ import tarfile
import time
from loguru import logger
from copy import deepcopy
from typing import Optional
# Try to import orjson for faster JSON serialization
@@ -730,6 +731,144 @@ class DatastoreUpdatesMixin:
# (left this out by accident in previous update, added tags={} in the changedetection.json save_to_disk)
self._save_settings()
def update_31(self):
"""
Migrate legacy ``fetch_backend`` strings to the new ``browser_profile``
machine-name system.
What this migration does
------------------------
1. ``settings.requests.extra_browsers`` entries are converted into
``BrowserProfile`` objects and stored in
``settings.application.browser_profiles`` keyed by machine name.
2. ``settings.application.fetch_backend`` (the system-wide default) is
translated to a machine name and written to
``settings.application.browser_profile``.
3. Every watch that has an explicit ``fetch_backend`` (not ``'system'``)
gets a corresponding ``browser_profile`` machine name set, then
``fetch_backend`` is reset to ``'system'``.
4. The same translation is applied to tags with ``overrides_watch=True``
that carry an explicit ``fetch_backend``.
Legacy mapping
~~~~~~~~~~~~~~
* ``'html_requests'`` → built-in ``'direct_http_requests'``
* ``'html_webdriver'`` → built-in ``'browser_chromeplaywright'``
* ``'extra_browser_<name>'`` → machine name of the migrated custom profile
* ``'system'`` / missing → ``None`` (continue to use chain resolution)
Safe to re-run: skips watches / tags that already have ``browser_profile``
set, and skips extra_browser entries that have already been migrated.
"""
from ..model.browser_profile import (
BrowserProfile,
BUILTIN_REQUESTS,
BUILTIN_BROWSER,
)
app_settings = self.data['settings']['application']
# ------------------------------------------------------------------
# 1. Migrate extra_browsers → browser_profiles
# ------------------------------------------------------------------
extra_browsers = self.data['settings']['requests'].get('extra_browsers', [])
browser_profiles: dict = app_settings.setdefault('browser_profiles', {})
extra_browser_name_to_machine: dict[str, str] = {}
for entry in extra_browsers:
browser_name = entry.get('browser_name', '').strip()
connection_url = entry.get('browser_connection_url', '').strip()
if not browser_name:
continue
profile = BrowserProfile(
name=browser_name,
fetch_backend='playwright_cdp',
browser_connection_url=connection_url or None,
)
machine_name = profile.get_machine_name()
if machine_name not in browser_profiles:
browser_profiles[machine_name] = profile.model_dump()
logger.info(f"update_31: migrated extra_browser '{browser_name}' → profile '{machine_name}'")
extra_browser_name_to_machine[browser_name] = machine_name
# ------------------------------------------------------------------
# Helper: translate a fetch_backend string to a machine name
# ------------------------------------------------------------------
builtin_requests_name = BUILTIN_REQUESTS.get_machine_name()
builtin_browser_name = BUILTIN_BROWSER.get_machine_name()
def _to_machine_name(fetch_backend: str) -> Optional[str]:
if not fetch_backend or fetch_backend in ('system', 'default', ''):
return None
if fetch_backend.startswith('extra_browser_'):
key = fetch_backend[len('extra_browser_'):]
return extra_browser_name_to_machine.get(key)
# Strip legacy html_ prefix then query the fetcher registry
from changedetectionio import content_fetchers as cf
clean = fetch_backend[5:] if fetch_backend.startswith('html_') else fetch_backend
fetcher_cls = cf.get_fetcher(clean)
if fetcher_cls is None:
logger.warning(f"update_31: unknown fetch_backend value {fetch_backend!r}, skipping")
return None
if fetcher_cls.supports_screenshots:
return builtin_browser_name
return builtin_requests_name
# ------------------------------------------------------------------
# 2. Migrate system-wide default
# ------------------------------------------------------------------
system_fetch_backend = app_settings.get('fetch_backend', 'requests')
if not app_settings.get('browser_profile'):
machine = _to_machine_name(system_fetch_backend)
app_settings['browser_profile'] = machine
logger.info(
f"update_31: system fetch_backend '{system_fetch_backend}' → browser_profile '{machine}'"
)
# ------------------------------------------------------------------
# 3. Migrate watches
# ------------------------------------------------------------------
for uuid, watch in self.data['watching'].items():
if watch.get('browser_profile'):
continue # already migrated
fetch_backend = watch.get('fetch_backend', 'system')
machine = _to_machine_name(fetch_backend)
watch['browser_profile'] = machine
watch['fetch_backend'] = 'system' # clear legacy value
watch.commit()
if machine:
logger.info(
f"update_31: watch {uuid} fetch_backend '{fetch_backend}' → browser_profile '{machine}'"
)
# ------------------------------------------------------------------
# 4. Migrate tags
# ------------------------------------------------------------------
for tag_uuid, tag in app_settings.get('tags', {}).items():
if tag.get('browser_profile'):
continue # already migrated
fetch_backend = tag.get('fetch_backend', 'system')
machine = _to_machine_name(fetch_backend)
if machine:
tag['browser_profile'] = machine
tag['fetch_backend'] = 'system'
tag.commit()
logger.info(
f"update_31: tag {tag_uuid} fetch_backend '{fetch_backend}' → browser_profile '{machine}'"
)
self._save_settings()
logger.success("update_31: fetch_backend → browser_profile migration complete")
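The legacy mapping from the update_31 docstring can be sketched as a standalone function. This is illustrative only: `to_machine_name`, `BROWSER_BACKENDS` and `PLAIN_BACKENDS` are hypothetical, and the hard-coded sets replace the migration's lookup in the fetcher registry (`supports_screenshots`).

```python
from typing import Optional

# Built-in machine names as quoted in the docstring.
BUILTIN_REQUESTS_NAME = "direct_http_requests"
BUILTIN_BROWSER_NAME = "browser_chromeplaywright"

# Assumed for illustration; the migration queries the fetcher registry instead.
BROWSER_BACKENDS = {"webdriver"}
PLAIN_BACKENDS = {"requests"}


def to_machine_name(fetch_backend: Optional[str], extra_browsers: dict) -> Optional[str]:
    """Translate a legacy fetch_backend string into a browser_profile machine name."""
    if not fetch_backend or fetch_backend in ("system", "default"):
        return None  # keep using chain resolution
    if fetch_backend.startswith("extra_browser_"):
        # Custom browsers map to whatever machine name they were migrated to.
        return extra_browsers.get(fetch_backend[len("extra_browser_"):])
    # Strip the legacy html_ prefix before classifying the backend.
    clean = fetch_backend[len("html_"):] if fetch_backend.startswith("html_") else fetch_backend
    if clean in BROWSER_BACKENDS:
        return BUILTIN_BROWSER_NAME
    if clean in PLAIN_BACKENDS:
        return BUILTIN_REQUESTS_NAME
    return None  # unknown value: skip rather than guess
```

Returning `None` for unknown values mirrors the migration's choice to log a warning and fall back to chain resolution.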
def update_30(self):
"""Migrate restock_settings out of watch.json into restock_diff.json processor config file.
@@ -0,0 +1,208 @@
{#
Notification Profile Selector widget.
Usage:
{% from '_notification_profiles_selector.html' import render_notification_profile_selector %}
{{ render_notification_profile_selector(
own_profiles=watch.get('notification_profiles', []),
inherited_profiles=inherited_notification_profiles,
all_profile_data=settings_application.get('notification_profile_data', {}),
registry=registry
) }}
own_profiles — list of UUIDs directly linked to this watch/group
inherited_profiles — list of (uuid, origin_label) tuples from parent groups/system
all_profile_data — dict of uuid→profile from settings.application.notification_profile_data
registry — notification_profiles.registry instance
#}
{% macro render_notification_profile_selector(own_profiles, inherited_profiles, all_profile_data, registry) %}
<div class="notification-profile-selector" id="notification-profile-selector">
{# Hidden inputs — one per selected UUID, submitted with the form #}
<div id="np-hidden-inputs">
{% for uid in own_profiles %}
<input type="hidden" name="notification_profiles" value="{{ uid }}">
{% endfor %}
</div>
<div class="np-chips" id="np-chips">
{# Own profiles — solid chips, removable #}
{% for uid in own_profiles %}
{% set profile = all_profile_data.get(uid) %}
{% if profile %}
{% set handler = registry.get(profile.get('type', 'apprise')) %}
<span class="np-chip np-chip-own" data-uuid="{{ uid }}"
title="{{ handler.get_url_hint(profile.get('config', {})) }}">
<i data-feather="{{ handler.icon }}" class="np-chip-icon"></i>
<span class="np-chip-name">{{ profile.get('name', uid) }}</span>
<span class="np-chip-remove" data-uuid="{{ uid }}" title="{{ _('Remove') }}">×</span>
</span>
{% endif %}
{% endfor %}
{# Inherited profiles — dimmed, read-only, show origin #}
{% for uid, origin_label in (inherited_profiles or []) %}
{% if uid not in own_profiles %}
{% set profile = all_profile_data.get(uid) %}
{% if profile %}
{% set handler = registry.get(profile.get('type', 'apprise')) %}
<span class="np-chip np-chip-inherited"
title="{{ _('Inherited from') }}: {{ origin_label }} — {{ handler.get_url_hint(profile.get('config', {})) }}">
<i data-feather="{{ handler.icon }}" class="np-chip-icon"></i>
<span class="np-chip-name">{{ profile.get('name', uid) }}</span>
<i data-feather="lock" class="np-chip-lock"></i>
</span>
{% endif %}
{% endif %}
{% endfor %}
{# Add button + dropdown #}
<div class="np-add-wrapper" id="np-add-wrapper">
<button type="button" class="np-add-btn pure-button button-xsmall" id="np-add-btn">
<i data-feather="plus"></i> {{ _('Add profile') }}
</button>
<div class="np-dropdown" id="np-dropdown" style="display:none;">
<input type="text" class="np-search" id="np-search" placeholder="{{ _('Search profiles…') }}" autocomplete="off">
<div class="np-options" id="np-options">
{% set has_options = [] %}
{% for uid, profile in all_profile_data.items() %}
{% if uid not in own_profiles %}
{% set handler = registry.get(profile.get('type', 'apprise')) %}
{% set hint = handler.get_url_hint(profile.get('config', {})) %}
<div class="np-option" data-uuid="{{ uid }}"
data-name="{{ profile.get('name', '') }}"
data-icon="{{ handler.icon }}"
data-hint="{{ hint }}">
<i data-feather="{{ handler.icon }}" class="np-option-icon"></i>
<span class="np-option-text">
<strong class="np-option-name">{{ profile.get('name', uid) }}</strong>
{% if hint %}<small class="np-option-hint">{{ hint }}</small>{% endif %}
</span>
</div>
{% if has_options.append(1) %}{% endif %}
{% endif %}
{% endfor %}
{% if not has_options %}
<div class="np-option np-no-results" style="pointer-events:none; color: var(--color-grey-600);">
{{ _('No other profiles available') }}
</div>
{% endif %}
<div class="np-no-match" style="display:none; padding: 8px 12px; color: var(--color-grey-600); font-size: 0.85em;">
{{ _('No profiles match') }}
</div>
</div>
<a href="{{ url_for('notification_profiles.edit') }}" class="np-create-new">
<i data-feather="plus-circle"></i> {{ _('Create new profile') }}
</a>
</div>
</div>
</div>{# .np-chips #}
{% if not own_profiles and not inherited_profiles %}
<p class="pure-form-message-inline" style="margin: 4px 0 0 0; color: var(--color-grey-600);">
{{ _('No notification profiles linked. Notifications will not be sent for this watch.') }}
</p>
{% endif %}
</div>{# .notification-profile-selector #}
<script>
(function() {
var selector = document.getElementById('notification-profile-selector');
if (!selector) return;
var addBtn = selector.querySelector('#np-add-btn');
var dropdown = selector.querySelector('#np-dropdown');
var search = selector.querySelector('#np-search');
var chips = selector.querySelector('#np-chips');
var hiddenWrap = selector.querySelector('#np-hidden-inputs');
var noMatch = selector.querySelector('.np-no-match');
// Toggle dropdown
addBtn.addEventListener('click', function(e) {
e.stopPropagation();
var open = dropdown.style.display !== 'none';
dropdown.style.display = open ? 'none' : 'block';
if (!open) { search.value = ''; filterOptions(''); search.focus(); }
});
// Close on outside click
document.addEventListener('click', function(e) {
if (!selector.contains(e.target)) dropdown.style.display = 'none';
});
// Search filter
search.addEventListener('input', function() { filterOptions(this.value.toLowerCase()); });
function filterOptions(q) {
var opts = selector.querySelectorAll('.np-option:not(.np-no-results)');
var visible = 0;
opts.forEach(function(opt) {
var match = !q || opt.dataset.name.toLowerCase().indexOf(q) !== -1
|| (opt.dataset.hint || '').toLowerCase().indexOf(q) !== -1;
opt.style.display = match ? '' : 'none';
if (match) visible++;
});
noMatch.style.display = (visible === 0 && q) ? 'block' : 'none';
}
// Add profile
selector.querySelectorAll('.np-option:not(.np-no-results)').forEach(function(opt) {
opt.addEventListener('click', function() {
var uuid = this.dataset.uuid;
var name = this.dataset.name;
var icon = this.dataset.icon;
var hint = this.dataset.hint;
// Add hidden input
var inp = document.createElement('input');
inp.type = 'hidden'; inp.name = 'notification_profiles'; inp.value = uuid;
hiddenWrap.appendChild(inp);
// Add chip (before the add-wrapper)
var chip = document.createElement('span');
chip.className = 'np-chip np-chip-own';
chip.dataset.uuid = uuid;
chip.title = hint || '';
chip.innerHTML = '<i data-feather="' + icon + '" class="np-chip-icon"></i>'
+ '<span class="np-chip-name">' + escHtml(name) + '</span>'
+ '<span class="np-chip-remove" data-uuid="' + uuid + '" title="{{ _("Remove") }}">×</span>';
chips.insertBefore(chip, selector.querySelector('#np-add-wrapper'));
chip.querySelector('.np-chip-remove').addEventListener('click', removeChip);
// Hide this option in dropdown
this.style.display = 'none';
dropdown.style.display = 'none';
if (window.feather) feather.replace();
});
});
// Remove chip
selector.querySelectorAll('.np-chip-remove').forEach(function(btn) {
btn.addEventListener('click', removeChip);
});
function removeChip() {
var uuid = this.dataset.uuid;
var chip = selector.querySelector('.np-chip-own[data-uuid="' + uuid + '"]');
if (chip) chip.remove();
var inp = hiddenWrap.querySelector('input[value="' + uuid + '"]');
if (inp) inp.remove();
// Re-show in dropdown
var opt = selector.querySelector('.np-option[data-uuid="' + uuid + '"]');
if (opt) opt.style.display = '';
}
function escHtml(s) {
return s.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
}
if (window.feather) feather.replace();
})();
</script>
{% endmacro %}
+29
@@ -2,6 +2,7 @@
import psutil
import time
from threading import Thread
import multiprocessing
import pytest
import arrow
@@ -191,6 +192,34 @@ def cleanup(datastore_path):
if os.path.isfile(f):
os.unlink(f)
def pytest_configure(config):
"""Configure pytest environment before tests run.
CRITICAL: Set multiprocessing start method to 'fork' for Python 3.14+ compatibility.
Python 3.14 changed the default start method from 'fork' to 'forkserver' on Linux.
The forkserver method requires all objects to be picklable, but pytest-flask's
LiveServer uses nested functions that can't be pickled.
Setting 'fork' explicitly:
- Maintains compatibility with Python 3.10-3.13 (where 'fork' was already default)
- Fixes Python 3.14 pickling errors
- Only affects Unix-like systems (Windows uses 'spawn' regardless)
See: https://github.com/python/cpython/issues/126831
See: https://docs.python.org/3/whatsnew/3.14.html
"""
# Only set if not already set (respects existing configuration)
if multiprocessing.get_start_method(allow_none=True) is None:
try:
# 'fork' is available on Unix-like systems (Linux, macOS)
# On Windows, this will have no effect as 'spawn' is the only option
multiprocessing.set_start_method('fork', force=False)
logger.debug("Set multiprocessing start method to 'fork' for Python 3.14+ compatibility")
except (ValueError, RuntimeError):
# Already set, not available on this platform, or context already created
pass
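The guard in pytest_configure above can be tried outside pytest. A minimal sketch, assuming a hypothetical helper name `prefer_fork`; it only uses `multiprocessing.get_start_method` and `multiprocessing.set_start_method` from the standard library.

```python
import multiprocessing
from typing import Optional


def prefer_fork() -> Optional[str]:
    """Select 'fork' if no start method has been chosen yet.

    Returns the start method in effect afterwards, or None if none is set
    (for example on a platform where 'fork' is unavailable).
    """
    if multiprocessing.get_start_method(allow_none=True) is None:
        try:
            multiprocessing.set_start_method("fork")
        except (ValueError, RuntimeError):
            # 'fork' not available here, or the context was already fixed.
            pass
    return multiprocessing.get_start_method(allow_none=True)
```

Checking with `allow_none=True` first avoids overriding a start method that another plugin or the test runner has already chosen.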
def pytest_addoption(parser):
"""Add custom command-line options for pytest.
@@ -4,30 +4,54 @@ import os
from flask import url_for
from ..util import live_server_setup, wait_for_all_checks
def do_test(client, live_server, make_test_use_extra_browser=False):
CUSTOM_PROFILE_NAME = 'Custom Browser URL'
CUSTOM_PROFILE_MACHINE_NAME = 'custom_browser_url'
CUSTOM_BROWSER_WS = 'ws://sockpuppetbrowser-custom-url:3000'
# Grep for this string in the logs?
test_url = "https://changedetection.io/ci-test.html?non-custom-default=true"
# "non-custom-default" should not appear in the custom browser connection
custom_browser_name = 'custom browser URL'
def create_custom_browser_profile(client):
"""Create a browser profile that uses the custom sockpuppet container."""
res = client.post(
url_for("settings.settings_browsers.save"),
data={
"name": CUSTOM_PROFILE_NAME,
"fetch_backend": "playwright_cdp",
"browser_connection_url": CUSTOM_BROWSER_WS,
"viewport_width": 1280,
"viewport_height": 1000,
"block_images": "",
"block_fonts": "",
"ignore_https_errors": "",
"user_agent": "",
"locale": "",
"original_machine_name": "",
},
follow_redirects=True
)
assert b"saved." in res.data, f"Expected profile save confirmation, got: {res.data[:500]}"
def do_test(client, live_server, make_test_use_extra_browser=False):
# needs to be set and something like 'ws://127.0.0.1:3000'
assert os.getenv('PLAYWRIGHT_DRIVER_URL'), "Needs PLAYWRIGHT_DRIVER_URL set for this test"
#####################
test_url = "https://changedetection.io/ci-test.html?non-custom-default=true"
# preconfigure_browser_profiles_based_on_env() already set the correct system default
res = client.post(
url_for("settings.settings_page"),
data={"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_webdriver",
'requests-extra_browsers-0-browser_connection_url': 'ws://sockpuppetbrowser-custom-url:3000',
'requests-extra_browsers-0-browser_name': custom_browser_name
},
data={
"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
},
follow_redirects=True
)
assert b"Settings updated." in res.data
# Create the custom browser profile
create_custom_browser_profile(client)
# Add our URL to the import page
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
@@ -35,23 +59,24 @@ def do_test(client, live_server, make_test_use_extra_browser=False):
if make_test_use_extra_browser:
# So the name should appear in the edit page under "Request" > "Fetch Method"
# The custom profile name should appear in the edit page under "Request" tab
res = client.get(
url_for("ui.ui_edit.edit_page", uuid="first"),
follow_redirects=True
)
assert b'custom browser URL' in res.data
assert CUSTOM_PROFILE_NAME.encode() in res.data, \
f"Expected '{CUSTOM_PROFILE_NAME}' in edit page fetch method choices"
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={
# 'run_customer_browser_url_tests.sh' will search for this string to know if we hit the right browser container or not
"url": "https://changedetection.io/ci-test.html?custom-browser-search-string=1",
"tags": "",
"headers": "",
'fetch_backend': f"extra_browser_{custom_browser_name}",
'webdriver_js_execute_code': '',
"time_between_check_use_default": "y"
# 'run_custom_browser_url_tests.sh' will grep for this string in the custom container logs
"url": "https://changedetection.io/ci-test.html?custom-browser-search-string=1",
"tags": "",
"headers": "",
"browser_profile": CUSTOM_PROFILE_MACHINE_NAME,
"webdriver_js_execute_code": "",
"time_between_check_use_default": "y"
},
follow_redirects=True
)
@@ -74,12 +99,10 @@ def do_test(client, live_server, make_test_use_extra_browser=False):
# Requires playwright to be installed
def test_request_via_custom_browser_url(client, live_server, measure_memory_usage, datastore_path):
# live_server_setup(live_server) # Setup on conftest per function
# We do this so we can grep the logs of the custom container and see if the request actually went through that container
do_test(client, live_server, make_test_use_extra_browser=True)
def test_request_not_via_custom_browser_url(client, live_server, measure_memory_usage, datastore_path):
# live_server_setup(live_server) # Setup on conftest per function
# We do this so we can grep the logs of the custom container and see if the request actually went through that container
do_test(client, live_server, make_test_use_extra_browser=False)
@@ -12,12 +12,13 @@ def test_fetch_webdriver_content(client, live_server, measure_memory_usage, data
# live_server_setup(live_server) # Setup on conftest per function
#####################
# preconfigure_browser_profiles_based_on_env() already set the correct system default
# (playwright or puppeteer depending on FAST_PUPPETEER_CHROME_FETCHER) — no need to override it.
res = client.post(
url_for("settings.settings_page"),
data={
"application-empty_pages_are_a_change": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_webdriver",
'application-ui-favicons_enabled': "y",
},
follow_redirects=True
@@ -25,7 +25,6 @@ def test_execute_custom_js(client, live_server, measure_memory_usage, datastore_
data={
"url": test_url,
"tags": "",
'fetch_backend': "html_webdriver",
'webdriver_js_execute_code': 'document.querySelector("button[name=test-button]").click();',
'headers': "testheader: yes\buser-agent: MyCustomAgent",
"time_between_check_use_default": "y",
@@ -22,7 +22,7 @@ def test_preferred_proxy(client, live_server, measure_memory_usage, datastore_pa
url_for("ui.ui_edit.edit_page", uuid="first", unpause_on_save=1),
data={
"include_filters": "",
"fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"browser_profile": "system",
"headers": "",
"proxy": "proxy-two",
"tags": "",
@@ -22,7 +22,6 @@ def test_noproxy_option(client, live_server, measure_memory_usage, datastore_pat
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": "html_requests",
"requests-extra_proxies-0-proxy_name": "custom-one-proxy",
"requests-extra_proxies-0-proxy_url": "http://test:awesome@squid-one:3128",
"requests-extra_proxies-1-proxy_name": "custom-two-proxy",
@@ -57,7 +56,6 @@ def test_noproxy_option(client, live_server, measure_memory_usage, datastore_pat
url_for("ui.ui_edit.edit_page", uuid=uuid, unpause_on_save=1),
data={
"include_filters": "",
"fetch_backend": "html_requests",
"headers": "",
"proxy": "no-proxy",
"tags": "",
@@ -21,7 +21,6 @@ def test_proxy_noconnect_custom(client, live_server, measure_memory_usage, datas
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') or os.getenv("WEBDRIVER_URL") else 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
# test:awesome is set in tests/proxy_list/squid-passwords.txt
"requests-extra_proxies-0-proxy_url": "http://127.0.0.1:3128",
@@ -42,7 +41,7 @@ def test_proxy_noconnect_custom(client, live_server, measure_memory_usage, datas
options = {
"url": test_url,
"fetch_backend": "html_webdriver" if os.getenv('PLAYWRIGHT_DRIVER_URL') or os.getenv("WEBDRIVER_URL") else "html_requests",
"browser_profile": "system",
"proxy": "ui-0custom-test-proxy",
"time_between_check_use_default": "y",
}
@@ -15,7 +15,6 @@ def test_select_custom(client, live_server, measure_memory_usage, datastore_path
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
# test:awesome is set in tests/proxy_list/squid-passwords.txt
"requests-extra_proxies-0-proxy_url": "http://test:awesome@squid-custom:3128",
@@ -59,7 +58,6 @@ def test_custom_proxy_validation(client, live_server, measure_memory_usage, data
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
"requests-extra_proxies-0-proxy_url": "xxxxhtt/333??p://test:awesome@squid-custom:3128",
},
@@ -75,7 +73,6 @@ def test_custom_proxy_validation(client, live_server, measure_memory_usage, data
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": 'html_requests',
"requests-extra_proxies-0-proxy_name": "custom-test-proxy",
"requests-extra_proxies-0-proxy_url": "https://",
},
@@ -29,7 +29,6 @@ def test_socks5(client, live_server, measure_memory_usage, datastore_path):
data={
"requests-time_between_check-minutes": 180,
"application-ignore_whitespace": "y",
"application-fetch_backend": "html_requests",
# set in .github/workflows/test-only.yml
"requests-extra_proxies-0-proxy_url": "socks5://proxy_user123:proxy_pass123@socks5proxy:1080",
"requests-extra_proxies-0-proxy_name": "socks5proxy",
@@ -61,7 +60,7 @@ def test_socks5(client, live_server, measure_memory_usage, datastore_path):
url_for("ui.ui_edit.edit_page", uuid="first", unpause_on_save=1),
data={
"include_filters": "",
"fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"browser_profile": "system",
"headers": "",
"proxy": "ui-0socks5proxy",
"tags": "",
@@ -48,7 +48,7 @@ def test_socks5_from_proxiesjson_file(client, live_server, measure_memory_usage,
url_for("ui.ui_edit.edit_page", uuid="first", unpause_on_save=1),
data={
"include_filters": "",
"fetch_backend": 'html_webdriver' if os.getenv('PLAYWRIGHT_DRIVER_URL') else 'html_requests',
"browser_profile": "system",
"headers": "",
"proxy": "socks5proxy",
"tags": "",
@@ -60,15 +60,14 @@ def test_restock_detection(client, live_server, measure_memory_usage, datastore_
#####################
# Set this up for when we remove the notification from the watch; it should fall back to these details
# preconfigure_browser_profiles_based_on_env() already set the correct system default
res = client.post(
url_for("settings.settings_page"),
data={"application-notification_urls": notification_url,
"application-notification_title": "fallback-title "+default_notification_title,
"application-notification_body": "fallback-body "+default_notification_body,
"application-notification_format": default_notification_format,
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_webdriver"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
# Add our URL to the import page, because the docker container (playwright/selenium) won't be able to connect to our usual test url
@@ -56,8 +56,7 @@ def test_check_notification_email_formats_default_HTML(client, live_server, meas
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": "some text\nfallback-body<br> " + default_notification_body,
"application-notification_format": 'html',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -126,8 +125,7 @@ def test_check_notification_plaintext_format(client, live_server, measure_memory
"application-notification_title": "fallback-title {{watch_title}} {{ diff_added.splitlines()[0] if diff_added else 'diff added didnt split' }} " + default_notification_title,
"application-notification_body": f"some text\n" + default_notification_body + f"\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -188,8 +186,7 @@ def test_check_notification_html_color_format(client, live_server, measure_memor
"application-notification_title": "fallback-title {{watch_title}} - diff_added_lines_test : '{{ diff_added.splitlines()[0] if diff_added else 'diff added didnt split' }}' " + default_notification_title,
"application-notification_body": f"some text\n{default_notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'htmlcolor',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -273,8 +270,7 @@ def test_check_notification_markdown_format(client, live_server, measure_memory_
"application-notification_title": "fallback-title diff_added_lines_test : '{{ diff_added.splitlines()[0] if diff_added else 'diff added didnt split' }}' " + default_notification_title,
"application-notification_body": "*header*\n\nsome text\n" + default_notification_body,
"application-notification_format": 'markdown',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -369,8 +365,7 @@ def test_check_notification_email_formats_default_Text_override_HTML(client, liv
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": notification_body,
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -420,7 +415,7 @@ def test_check_notification_email_formats_default_Text_override_HTML(client, liv
data={
"url": test_url,
"notification_format": 'html',
'fetch_backend': "html_requests",
'browser_profile': "direct_http_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -480,8 +475,7 @@ def test_check_plaintext_document_plaintext_notification_smtp(client, live_serve
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -533,8 +527,7 @@ def test_check_plaintext_document_html_notifications(client, live_server, measur
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'html',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
assert b"Settings updated." in res.data
@@ -613,8 +606,7 @@ def test_check_plaintext_document_html_color_notifications(client, live_server,
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'htmlcolor',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -686,8 +678,7 @@ def test_check_html_document_plaintext_notification(client, live_server, measure
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": f"{notification_body}\nMore output test\n{ALL_MARKUP_TOKENS}",
"application-notification_format": 'text',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -740,8 +731,7 @@ def test_check_html_notification_with_apprise_format_is_html(client, live_server
"application-notification_title": "fallback-title " + default_notification_title,
"application-notification_body": "some text\nfallback-body<br> " + default_notification_body,
"application-notification_format": 'html',
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
assert b"Settings updated." in res.data
+6 -10
@@ -32,8 +32,7 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
url_for("settings.settings_page"),
data={"application-password": "foobar",
"application-shared_diff_access": "True",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -91,8 +90,7 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
res = c.post(
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -127,16 +125,16 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
assert b"IMPORT" in res.data
assert b"LOG OUT" in res.data
assert b"time_between_check-minutes" in res.data
assert b"fetch_backend" in res.data
##################################################
# Remove password button, and check that it worked
##################################################
# preconfigure_browser_profiles_based_on_env() already set the correct system default
res = c.post(
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_webdriver",
"application-removepassword_button": "Remove password"
},
follow_redirects=True,
@@ -150,8 +148,7 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
res = c.post(
url_for("settings.settings_page"),
data={"application-password": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -164,8 +161,7 @@ def test_check_access_control(app, client, live_server, measure_memory_usage, da
data={"application-password": "foobar",
# Should be disabled
"application-shared_diff_access": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -60,7 +60,6 @@ def test_check_removed_line_contains_trigger(client, live_server, measure_memory
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"trigger_text": 'The golden line',
"url": test_url,
'fetch_backend': "html_requests",
'filter_text_removed': 'y',
"time_between_check_use_default": "y"},
follow_redirects=True
@@ -127,8 +126,7 @@ def test_check_add_line_contains_trigger(client, live_server, measure_memory_usa
# https://github.com/caronc/apprise/wiki/Notify_Custom_JSON#get-parameter-manipulation
"application-notification_urls": test_notification_url,
"application-notification_format": 'text',
"application-minutes_between_check": 180,
"application-fetch_backend": "html_requests"
"application-minutes_between_check": 180
},
follow_redirects=True
)
@@ -149,7 +147,6 @@ def test_check_add_line_contains_trigger(client, live_server, measure_memory_usa
data={"trigger_text": 'Oh yes please',
"url": test_url,
'processor': 'text_json_diff',
'fetch_backend': "html_requests",
'filter_text_removed': '',
'filter_text_added': 'y',
"time_between_check_use_default": "y"},
+9 -4
@@ -170,6 +170,14 @@ def test_api_simple(client, live_server, measure_memory_usage, datastore_path):
headers={'x-api-key': api_key},
)
assert b'(changed) Which is across' in res.data
assert b'Some text thats the same' in res.data
# Fetch the difference between two versions (default text format)
res = client.get(
url_for("watchhistorydiff", uuid=watch_uuid, from_timestamp='previous', to_timestamp='latest')+"?changesOnly=true",
headers={'x-api-key': api_key},
)
assert b'Some text thats the same' not in res.data
# Test htmlcolor format
res = client.get(
@@ -408,7 +416,6 @@ def test_access_denied(client, live_server, measure_memory_usage, datastore_path
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_requests",
"application-api_access_token_enabled": ""
},
follow_redirects=True
@@ -428,7 +435,6 @@ def test_access_denied(client, live_server, measure_memory_usage, datastore_path
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_requests",
"application-api_access_token_enabled": "y"
},
follow_redirects=True
@@ -899,8 +905,7 @@ def test_api_conflict_UI_password(client, live_server, measure_memory_usage, dat
url_for("settings.settings_page"),
data={"application-password": "foobar", # password is now set! API should still work!
"application-api_access_token_enabled": "y",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -177,7 +177,6 @@ def test_openapi_validation_get_requests_bypass_validation(client, live_server,
url_for("settings.settings_page"),
data={
"requests-time_between_check-minutes": 180,
"application-fetch_backend": "html_requests",
"application-api_access_token_enabled": ""
},
follow_redirects=True
+26 -5
@@ -178,23 +178,44 @@ def test_api_tags_listing(client, live_server, measure_memory_usage, datastore_p
def test_api_tag_restock_processor_config(client, live_server, measure_memory_usage, datastore_path):
"""
Test that a tag/group can be updated with processor_config_restock_diff via the API.
Test that a tag/group can be created and updated with processor_config_restock_diff via the API.
Since Tag extends WatchBase, processor config fields injected into WatchBase are also valid for tags.
"""
api_key = live_server.app.config['DATASTORE'].data['settings']['application'].get('api_access_token')
set_original_response(datastore_path=datastore_path)
# Create a tag
# Create a tag with processor_config_restock_diff in a single POST (issue #3966)
res = client.post(
url_for("tag"),
data=json.dumps({"title": "Restock Group"}),
data=json.dumps({
"title": "Restock Group",
"overrides_watch": True,
"processor_config_restock_diff": {
"in_stock_processing": "in_stock_only",
"follow_price_changes": True,
"price_change_min": 7777777
}
}),
headers={'content-type': 'application/json', 'x-api-key': api_key}
)
assert res.status_code == 201
assert res.status_code == 201, f"POST tag with restock config failed: {res.data}"
tag_uuid = res.json.get('uuid')
# Update tag with valid processor_config_restock_diff
# Verify processor config was saved during creation (the bug: these were discarded)
res = client.get(
url_for("tag", uuid=tag_uuid),
headers={'x-api-key': api_key}
)
assert res.status_code == 200
tag_data = res.json
assert tag_data.get('overrides_watch') == True, "overrides_watch should be saved on POST"
assert tag_data.get('processor_config_restock_diff', {}).get('in_stock_processing') == 'in_stock_only', \
"processor_config_restock_diff should be saved on POST"
assert tag_data.get('processor_config_restock_diff', {}).get('price_change_min') == 7777777, \
"price_change_min should be saved on POST"
# Update tag with valid processor_config_restock_diff via PUT
res = client.put(
url_for("tag", uuid=tag_uuid),
headers={'x-api-key': api_key, 'content-type': 'application/json'},
+1 -1
@@ -19,7 +19,7 @@ def test_basic_auth(client, live_server, measure_memory_usage, datastore_path):
# Check form validation
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": "", "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests", "time_between_check_use_default": "y"},
data={"include_filters": "", "url": test_url, "tags": "", "headers": "", "time_between_check_use_default": "y"},
follow_redirects=True
)
assert b"Updated watch." in res.data
+37 -6
@@ -48,6 +48,15 @@ def test_check_basic_change_detection_functionality(client, live_server, measure
# Check this class does not appear (that we didnt see the actual source)
assert b'foobar-detection' not in res.data
# Check POST preview
res = client.post(
url_for("ui.ui_preview.preview_page", uuid="first"),
follow_redirects=True
)
# Check this class does not appear (that we didn't see the actual source)
assert b'foobar-detection' not in res.data
# Make a change
set_modified_response(datastore_path=datastore_path)
@@ -163,8 +172,7 @@ def test_title_scraper(client, live_server, measure_memory_usage, datastore_path
res = client.post(
url_for("settings.settings_page"),
data={"application-ui-use_page_title_in_list": "",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -206,8 +214,7 @@ def test_requests_timeout(client, live_server, measure_memory_usage, datastore_p
url_for("settings.settings_page"),
data={"application-ui-use_page_title_in_list": "",
"requests-time_between_check-minutes": 180,
"requests-timeout": delay - 1,
'application-fetch_backend': "html_requests"},
"requests-timeout": delay - 1},
follow_redirects=True
)
@@ -225,8 +232,7 @@ def test_requests_timeout(client, live_server, measure_memory_usage, datastore_p
url_for("settings.settings_page"),
data={"application-ui-use_page_title_in_list": "",
"requests-time_between_check-minutes": 180,
"requests-timeout": delay + 1, # timeout should be a second more than the reply time
'application-fetch_backend': "html_requests"},
"requests-timeout": delay + 1}, # timeout should be a second more than the reply time
follow_redirects=True
)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
@@ -413,3 +419,28 @@ def test_plaintext_even_if_xml_content_and_can_apply_filters(client, live_server
assert b'&lt;foobar' not in res.data
res = delete_all_watches(client)
def test_last_error_cleared_on_same_checksum(client, live_server, datastore_path):
"""last_error should be cleared even when content is unchanged (checksumFromPreviousCheckWasTheSame path)"""
set_original_response(datastore_path=datastore_path)
uuid = client.application.config.get('DATASTORE').add_watch(url=url_for('test_endpoint', _external=True))
# First check - establishes baseline checksum
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# Inject a stale last_error directly (simulates a prior failed check)
datastore = client.application.config.get('DATASTORE')
datastore.update_watch(uuid=uuid, update_obj={'last_error': 'Some previous error'})
assert datastore.data['watching'][uuid].get('last_error') == 'Some previous error'
# Second check - same content, so checksumFromPreviousCheckWasTheSame will fire
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
wait_for_all_checks(client)
# last_error must be cleared even though no change was detected
assert datastore.data['watching'][uuid].get('last_error') == False
delete_all_watches(client)
+64 -2
@@ -3,7 +3,7 @@
from .util import set_original_response, live_server_setup, wait_for_all_checks
from flask import url_for
import io
from zipfile import ZipFile
from zipfile import ZipFile, ZIP_DEFLATED
import re
import time
from changedetectionio.model import Watch, Tag
@@ -68,6 +68,9 @@ def test_backup(client, live_server, measure_memory_usage, datastore_path):
# Check for changedetection.json (settings file)
assert 'changedetection.json' in l, "changedetection.json should be in backup"
# secret.txt must never be included — it contains the Flask session key
assert 'secret.txt' not in l, "secret.txt (Flask session key) must not be included in backup"
# Get the latest one
res = client.get(
url_for("backups.remove_backups"),
@@ -196,4 +199,63 @@ def test_backup_restore(client, live_server, measure_memory_usage, datastore_pat
assert restored_tag2 is not None, f"Tag {tag_uuid2} not found after restore"
assert restored_tag2['title'] == "Tasty backup tag number two", "Restored tag 2 title does not match"
assert isinstance(restored_tag2, Tag.model), \
f"Tag 2 not properly rehydrated, got {type(restored_tag2)}"
f"Tag 2 not properly rehydrated, got {type(restored_tag2)}"
def test_backup_restore_zip_slip_rejected(client, live_server, measure_memory_usage, datastore_path):
"""Zip Slip path traversal entries in a restore zip must be rejected."""
import pytest
from changedetectionio.blueprint.backups.restore import import_from_zip
# Build a zip with a path traversal entry that would escape the extraction dir
malicious_zip = io.BytesIO()
with ZipFile(malicious_zip, 'w') as zf:
zf.writestr("../escaped.txt", "ATTACKER-CONTROLLED")
malicious_zip.seek(0)
datastore = live_server.app.config['DATASTORE']
with pytest.raises(ValueError, match="Zip Slip"):
import_from_zip(
zip_stream=malicious_zip,
datastore=datastore,
include_groups=True,
include_groups_replace=True,
include_watches=True,
include_watches_replace=True,
)
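The Zip Slip test above only shows the expected failure, not the guard itself. A minimal sketch of one common way to write such a check follows; `assert_no_zip_slip` and its message text are hypothetical names for illustration and are not taken from `changedetectionio.blueprint.backups.restore`.

```python
import io
import os
from zipfile import ZipFile


def assert_no_zip_slip(zf: ZipFile, dest_dir: str) -> None:
    """Raise ValueError if any archive entry would resolve outside dest_dir.

    Hypothetical helper: the real restore code may structure this differently.
    """
    root = os.path.realpath(dest_dir)
    for name in zf.namelist():
        # Resolve the would-be extraction target, collapsing any ".." parts
        target = os.path.realpath(os.path.join(root, name))
        # The target is safe only if root is a prefix of it, path-component-wise
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"Zip Slip: unsafe path in archive: {name!r}")


# Usage: the same kind of traversal entry that the test builds
buf = io.BytesIO()
with ZipFile(buf, "w") as zf:
    zf.writestr("../escaped.txt", "ATTACKER-CONTROLLED")
buf.seek(0)

try:
    assert_no_zip_slip(ZipFile(buf), "/tmp/restore-dest")
except ValueError as exc:
    print(exc)
```

Checking every name before extracting anything means a bad archive is rejected without writing a single file.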
def test_backup_restore_zip_bomb_rejected(client, live_server, measure_memory_usage, datastore_path):
"""A zip whose total uncompressed size exceeds the limit must be rejected.
The guard reads file_size from the zip central-directory metadata; no
actual decompression happens, so this test is fast and uses minimal RAM.
100 KB of zeros compresses to ~100 bytes; monkeypatching the limit to
50 KB is enough to trigger the check without creating any large files.
"""
import pytest
import changedetectionio.blueprint.backups.restore as restore_mod
from changedetectionio.blueprint.backups.restore import import_from_zip
# ~100 KB of zeros → deflate compresses to ~100 bytes, but file_size metadata = 100 KB
bomb_zip = io.BytesIO()
with ZipFile(bomb_zip, 'w', compression=ZIP_DEFLATED) as zf:
zf.writestr("data.txt", b"\x00" * (100 * 1024))
bomb_zip.seek(0)
datastore = live_server.app.config['DATASTORE']
original_limit = restore_mod._MAX_DECOMPRESSED_BYTES
try:
restore_mod._MAX_DECOMPRESSED_BYTES = 50 * 1024 # 50 KB limit for this test
with pytest.raises(ValueError, match="decompressed size"):
import_from_zip(
zip_stream=bomb_zip,
datastore=datastore,
include_groups=True,
include_groups_replace=True,
include_watches=True,
include_watches_replace=True,
)
finally:
restore_mod._MAX_DECOMPRESSED_BYTES = original_limit
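The docstring above says the guard sums `file_size` from the zip metadata rather than decompressing anything. A rough sketch of that idea follows, assuming a module-level limit; `MAX_DECOMPRESSED_BYTES` and `assert_within_size_limit` are illustrative names, not the project's actual ones.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

# Hypothetical limit for illustration; the real module defines its own value
MAX_DECOMPRESSED_BYTES = 50 * 1024


def assert_within_size_limit(zf: ZipFile, limit: int = MAX_DECOMPRESSED_BYTES) -> int:
    """Sum the declared uncompressed sizes and reject archives over the limit."""
    # file_size is the uncompressed size recorded for each entry in the archive
    total = sum(info.file_size for info in zf.infolist())
    if total > limit:
        raise ValueError(f"Total decompressed size {total} exceeds limit {limit}")
    return total


# Usage: 100 KB of zeros compresses very well but declares its full size
buf = io.BytesIO()
with ZipFile(buf, "w", compression=ZIP_DEFLATED) as zf:
    zf.writestr("data.txt", b"\x00" * (100 * 1024))
buf.seek(0)

try:
    assert_within_size_limit(ZipFile(buf))
except ValueError as exc:
    print(exc)
```

Because only metadata is read, the check costs the same whether the archive declares kilobytes or terabytes.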
@@ -118,8 +118,7 @@ def test_everything(live_server, client, measure_memory_usage, datastore_path):
res = client.post(
url_for("settings.settings_page"),
data={"application-password": "foobar",
"requests-time_between_check-minutes": 180,
'application-fetch_backend': "html_requests"},
"requests-time_between_check-minutes": 180},
follow_redirects=True
)
@@ -83,7 +83,6 @@ def test_check_block_changedetection_text_NOT_present(client, live_server, measu
url_for("ui.ui_edit.edit_page", uuid=uuid),
data={"text_should_not_be_present": ignore_text,
"url": test_url,
'fetch_backend': "html_requests",
"time_between_check_use_default": "y"
},
follow_redirects=True
@@ -0,0 +1,281 @@
#!/usr/bin/env python3
"""
Tests that the watchlist shows/hides the browser status icon based on the
effective browser profile, covering the full inheritance chain:
watch browser_profile → system default browser_profile → direct_http_requests
"""
import pytest
from flask import url_for
def set_system_default_profile(client, profile_machine_name):
res = client.post(
url_for('settings.settings_browsers.set_default'),
data={'machine_name': profile_machine_name},
follow_redirects=True,
)
assert res.status_code == 200
def create_custom_browser_profile(client, name='My Custom Chrome'):
"""Create a custom browser profile using playwright_cdp and return its machine name."""
res = client.post(
url_for('settings.settings_browsers.save'),
data={
'name': name,
'fetch_backend': 'playwright_cdp',
'browser_connection_url': 'ws://localhost:3000',
'viewport_width': 1280,
'viewport_height': 1000,
'block_images': '',
'block_fonts': '',
'ignore_https_errors': '',
'user_agent': '',
'locale': '',
'custom_headers': '',
'original_machine_name': '',
},
follow_redirects=True,
)
assert b'saved.' in res.data
from changedetectionio.model.browser_profile import BrowserProfile
return BrowserProfile(name=name, fetch_backend='playwright_cdp').get_machine_name()
def create_requests_browser_profile(client, name, user_agent='', custom_headers=''):
"""Create a requests-type browser profile with optional UA and custom headers."""
res = client.post(
url_for('settings.settings_browsers.save'),
data={
'name': name,
'fetch_backend': 'requests',
'browser_connection_url': '',
'viewport_width': 1280,
'viewport_height': 1000,
'block_images': '',
'block_fonts': '',
'ignore_https_errors': '',
'user_agent': user_agent,
'locale': '',
'custom_headers': custom_headers,
'original_machine_name': '',
},
follow_redirects=True,
)
assert b'saved.' in res.data
from changedetectionio.model.browser_profile import BrowserProfile
return BrowserProfile(name=name, fetch_backend='requests').get_machine_name()
# ---------------------------------------------------------------------------
# Unit tests — status_icon attribute on fetcher classes
# ---------------------------------------------------------------------------
def test_status_icon_on_browser_fetchers():
"""Browser fetcher classes must declare a status_icon dict."""
from changedetectionio.content_fetchers.playwright.CDP import fetcher as playwright_fetcher
from changedetectionio.content_fetchers.puppeteer import fetcher as puppeteer_fetcher
from changedetectionio.content_fetchers.webdriver_selenium import fetcher as selenium_fetcher
for cls in (playwright_fetcher, puppeteer_fetcher, selenium_fetcher):
assert cls.status_icon is not None, f"{cls} should have status_icon set"
assert 'filename' in cls.status_icon
assert 'alt' in cls.status_icon
assert 'title' in cls.status_icon
def test_no_status_icon_on_requests_fetcher():
"""The plain requests fetcher must have status_icon = None."""
from changedetectionio.content_fetchers.requests import fetcher as requests_fetcher
assert requests_fetcher.status_icon is None
def test_fetcher_status_icons_filter_uses_status_icon(monkeypatch):
"""fetcher_status_icons filter returns icon HTML for a class with status_icon set."""
from changedetectionio import content_fetchers
class FakeBrowserFetcher:
status_icon = {'filename': 'test-icon.png', 'alt': 'Test browser', 'title': 'Test browser'}
supports_screenshots = True
monkeypatch.setitem(content_fetchers.FETCHERS, 'fake_browser', FakeBrowserFetcher)
from changedetectionio.flask_app import app
with app.test_request_context('/'):
from changedetectionio.flask_app import _jinja2_filter_fetcher_status_icons
result = _jinja2_filter_fetcher_status_icons('fake_browser')
assert 'test-icon.png' in result
assert 'Test browser' in result
# Requests fetcher → empty string
with app.test_request_context('/'):
result = _jinja2_filter_fetcher_status_icons('requests')
assert result == ''
# ---------------------------------------------------------------------------
# Integration tests — inheritance chain
# ---------------------------------------------------------------------------
def test_watch_explicit_browser_profile_shows_icon(client, live_server, measure_memory_usage, datastore_path):
"""Watch explicitly assigned a browser profile shows the chrome icon,
even when the system default is requests."""
datastore = client.application.config.get('DATASTORE')
set_system_default_profile(client, 'direct_http_requests')
machine_name = create_custom_browser_profile(client)
uuid = datastore.add_watch(url='http://example.com', extras={'browser_profile': machine_name, 'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' in res.data, \
"Chrome icon should appear when watch is explicitly set to a browser profile"
datastore.delete(uuid)
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
def test_watch_explicit_requests_profile_no_icon(client, live_server, measure_memory_usage, datastore_path):
"""Watch explicitly set to direct_http_requests never shows the chrome icon,
even when the system default is a browser."""
datastore = client.application.config.get('DATASTORE')
machine_name = create_custom_browser_profile(client)
set_system_default_profile(client, machine_name)
uuid = datastore.add_watch(url='http://example.com', extras={'browser_profile': 'direct_http_requests', 'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' not in res.data, \
"Chrome icon should NOT appear when watch is explicitly set to direct_http_requests"
datastore.delete(uuid)
set_system_default_profile(client, 'direct_http_requests')
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
def test_system_default_requests_inherited_by_watch(client, live_server, measure_memory_usage, datastore_path):
"""Watch using system default inherits requests → no icon."""
datastore = client.application.config.get('DATASTORE')
set_system_default_profile(client, 'direct_http_requests')
uuid = datastore.add_watch(url='http://example.com', extras={'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' not in res.data, \
"Chrome icon should NOT appear when system default is requests and watch uses system default"
datastore.delete(uuid)
def test_system_default_browser_inherited_by_watch(client, live_server, measure_memory_usage, datastore_path):
"""Watch using system default inherits a browser profile → icon shown."""
datastore = client.application.config.get('DATASTORE')
machine_name = create_custom_browser_profile(client)
set_system_default_profile(client, machine_name)
uuid = datastore.add_watch(url='http://example.com', extras={'paused': True})
res = client.get(url_for('watchlist.index'), follow_redirects=True)
assert b'Using a Chrome browser' in res.data, \
"Chrome icon should appear when system default is a browser profile and watch uses system default"
datastore.delete(uuid)
set_system_default_profile(client, 'direct_http_requests')
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
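The four integration tests above exercise a three-step fallback: the watch's own profile, then the system default, then `direct_http_requests`. A minimal sketch of that resolution order follows, assuming plain dicts and assuming that `"system"` (the value used in the edit forms elsewhere in this diff) means "inherit"; `resolve_effective_profile` is a hypothetical name, not the application's function.

```python
FALLBACK_PROFILE = "direct_http_requests"


def resolve_effective_profile(watch: dict, app_settings: dict) -> str:
    """Return the machine name of the profile a watch would effectively use.

    Assumption: an unset value or the literal "system" means "inherit".
    """
    own = watch.get("browser_profile")
    if own and own != "system":
        # Explicit per-watch choice wins, whatever the system default is
        return own
    # Otherwise inherit the system default, falling back to plain requests
    return app_settings.get("browser_profile") or FALLBACK_PROFILE


# Usage mirroring the four test scenarios
print(resolve_effective_profile({"browser_profile": "my_custom_chrome"},
                                {"browser_profile": "direct_http_requests"}))
print(resolve_effective_profile({"browser_profile": "direct_http_requests"},
                                {"browser_profile": "my_custom_chrome"}))
print(resolve_effective_profile({}, {"browser_profile": "direct_http_requests"}))
print(resolve_effective_profile({}, {"browser_profile": "my_custom_chrome"}))
```

The watchlist icon would then depend only on the resolved profile's fetcher, which is why each test asserts on the rendered page rather than on the stored value.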
# ---------------------------------------------------------------------------
# Integration tests — BrowserProfile UA and custom_headers applied to requests
# ---------------------------------------------------------------------------
def test_browser_profile_user_agent_applied(client, live_server, measure_memory_usage, datastore_path):
"""User-Agent set on a BrowserProfile appears in the fetched request;
a per-watch User-Agent header overrides it."""
from changedetectionio.tests.util import wait_for_all_checks
datastore = client.application.config.get('DATASTORE')
test_url = url_for('test_headers', _external=True)
machine_name = create_requests_browser_profile(
client, name='UA Profile Test', user_agent='profile-ua/2.0'
)
uuid = datastore.add_watch(url=test_url, extras={'browser_profile': machine_name})
client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
assert b'profile-ua/2.0' in res.data, "Profile UA should appear in the echoed request headers"
# Per-watch User-Agent header overrides the profile UA
client.post(
url_for('ui.ui_edit.edit_page', uuid='first'),
data={
'url': test_url,
'tags': '',
'browser_profile': machine_name,
'headers': 'User-Agent: watch-ua/3.0',
'time_between_check_use_default': 'y',
},
follow_redirects=True,
)
client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
wait_for_all_checks(client)
res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
assert b'watch-ua/3.0' in res.data, "Watch-level UA should override profile UA"
assert b'profile-ua/2.0' not in res.data, "Profile UA should be superseded by watch-level header"
datastore.delete(uuid)
client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
def test_browser_profile_custom_headers_applied(client, live_server, measure_memory_usage, datastore_path):
    """Custom headers set on a BrowserProfile are sent with every request using that profile;
    per-watch headers override them when the same header name is used."""
    from changedetectionio.tests.util import wait_for_all_checks

    datastore = client.application.config.get('DATASTORE')
    test_url = url_for('test_headers', _external=True)

    machine_name = create_requests_browser_profile(
        client,
        name='Headers Profile Test',
        custom_headers='X-Profile-Header: profile-value\nX-Shared-Header: from-profile',
    )

    uuid = datastore.add_watch(url=test_url, extras={'browser_profile': machine_name})
    client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
    wait_for_all_checks(client)

    res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
    assert b'X-Profile-Header:profile-value' in res.data, \
        "Profile custom header should appear in the echoed request"
    assert b'X-Shared-Header:from-profile' in res.data, \
        "Second profile custom header should appear"

    # Per-watch header for the same key overrides the profile header
    client.post(
        url_for('ui.ui_edit.edit_page', uuid='first'),
        data={
            'url': test_url,
            'tags': '',
            'browser_profile': machine_name,
            'headers': 'X-Shared-Header: from-watch\nX-Watch-Only: watch-value',
            'time_between_check_use_default': 'y',
        },
        follow_redirects=True,
    )
    client.get(url_for('ui.form_watch_checknow'), follow_redirects=True)
    wait_for_all_checks(client)

    res = client.get(url_for('ui.ui_preview.preview_page', uuid='first'), follow_redirects=True)
    assert b'X-Profile-Header:profile-value' in res.data, \
        "Unrelated profile header should still be present"
    assert b'X-Shared-Header:from-watch' in res.data, \
        "Watch-level header should override the same-named profile header"
    assert b'X-Shared-Header:from-profile' not in res.data, \
        "Profile value for overridden header should be gone"
    assert b'X-Watch-Only:watch-value' in res.data, \
        "Watch-only header should appear"

    datastore.delete(uuid)
    client.get(url_for('settings.settings_browsers.delete', machine_name=machine_name), follow_redirects=True)
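Reviewer note: the precedence this test asserts (profile headers first, watch-level headers winning on a name clash) can be sketched as a plain dict merge. This is only an illustration of the expected behaviour, not the project's implementation; `parse_header_lines` and `merge_headers` are hypothetical names, and the sketch compares header names case-sensitively, which the real code may not do.

```python
def parse_header_lines(raw: str) -> dict[str, str]:
    """Parse 'Name: value' lines (one header per line) into a dict."""
    headers = {}
    for line in raw.splitlines():
        if ':' not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(':', 1)
        headers[name.strip()] = value.strip()
    return headers


def merge_headers(profile_raw: str, watch_raw: str) -> dict[str, str]:
    """Hypothetical merge: start from the profile headers, then let
    watch-level headers overwrite any header with the same name."""
    merged = parse_header_lines(profile_raw)
    merged.update(parse_header_lines(watch_raw))
    return merged


# Same inputs as the test above
merged = merge_headers(
    'X-Profile-Header: profile-value\nX-Shared-Header: from-profile',
    'X-Shared-Header: from-watch\nX-Watch-Only: watch-value',
)
```

With these inputs `merged` keeps `X-Profile-Header`, takes `X-Shared-Header` from the watch, and gains `X-Watch-Only`, which is what the four assertions in the second half of the test check.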
@@ -464,7 +464,7 @@ def test_settings_persist_after_update(client, live_server):
# Update settings directly (bypass form validation issues)
datastore.data['settings']['application']['empty_pages_are_a_change'] = True
datastore.data['settings']['application']['fetch_backend'] = 'html_requests'
datastore.data['settings']['application']['browser_profile'] = 'direct_http_requests'
datastore.data['settings']['requests']['time_between_check']['minutes'] = 120
datastore.commit()
@@ -478,7 +478,7 @@ def test_settings_persist_after_update(client, live_server):
# Verify settings survived
assert datastore2.data['settings']['application']['empty_pages_are_a_change'] == True, "empty_pages_are_a_change should persist"
assert datastore2.data['settings']['application']['fetch_backend'] == 'html_requests', "fetch_backend should persist"
assert datastore2.data['settings']['application']['browser_profile'] == 'direct_http_requests', "browser_profile should persist"
assert datastore2.data['settings']['requests']['time_between_check']['minutes'] == 120, "time_between_check should persist"
@@ -634,7 +634,7 @@ def test_ui_watch_edit_persists_all_fields(client, live_server):
'time_between_check-hours': '2',
'time_between_check-minutes': '30',
'include_filters': '#content',
'fetch_backend': 'html_requests',
'browser_profile': 'direct_http_requests',
'method': 'POST',
'ignore_text': 'Advertisement\nTracking'
},
@@ -657,5 +657,5 @@ def test_ui_watch_edit_persists_all_fields(client, live_server):
assert watch['title'] == 'Updated Watch Title'
assert watch['time_between_check']['hours'] == 2
assert watch['time_between_check']['minutes'] == 30
assert watch['fetch_backend'] == 'html_requests'
assert watch['browser_profile'] == 'direct_http_requests'
assert watch['method'] == 'POST'
@@ -72,7 +72,6 @@ def test_conditions_with_text_and_number(client, live_server, measure_memory_usa
url_for("ui.ui_edit.edit_page", uuid=uuid),
data={
"url": test_url,
"fetch_backend": "html_requests",
"include_filters": ".number-container",
"title": "Number AND Text Condition Test",
"conditions_match_logic": CONDITIONS_MATCH_LOGIC_DEFAULT, # ALL = AND logic
@@ -258,7 +257,6 @@ def test_lev_conditions_plugin(client, live_server, measure_memory_usage, datast
url_for("ui.ui_edit.edit_page", uuid=uuid, unpause_on_save=1),
data={
"url": test_url,
"fetch_backend": "html_requests",
"conditions_match_logic": CONDITIONS_MATCH_LOGIC_DEFAULT, # ALL = AND logic
"conditions-0-field": "levenshtein_ratio",
"conditions-0-operator": "<",
+3 -3
@@ -89,7 +89,7 @@ def test_check_markup_include_filters_restriction(client, live_server, measure_m
# Add our URL to the import page
res = client.post(
url_for("ui.ui_edit.edit_page", uuid="first"),
data={"include_filters": include_filters, "url": test_url, "tags": "", "headers": "", 'fetch_backend': "html_requests", "time_between_check_use_default": "y"},
data={"include_filters": include_filters, "url": test_url, "tags": "", "headers": "", 'browser_profile': "direct_http_requests", "time_between_check_use_default": "y"},
follow_redirects=True
)
assert b"Updated watch." in res.data
@@ -144,7 +144,7 @@ def test_check_multiple_filters(client, live_server, measure_memory_usage, datas
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests",
'browser_profile': "direct_http_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -195,7 +195,7 @@ def test_filter_is_empty_help_suggestion(client, live_server, measure_memory_usa
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests",
'browser_profile': "direct_http_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -171,7 +171,7 @@ def test_element_removal_full(client, live_server, measure_memory_usage, datasto
"url": test_url,
"tags": "",
"headers": "",
"fetch_backend": "html_requests",
"browser_profile": "direct_http_requests",
"time_between_check_use_default": "y",
},
follow_redirects=True,
+64
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# coding=utf-8
import hashlib
import time
from flask import url_for
from .util import live_server_setup, wait_for_all_checks, extract_UUID_from_client
@@ -11,6 +12,69 @@ import os
def test_surrogate_characters_in_content_are_sanitized():
    """Lone surrogates can appear in requests' r.text when a server returns malformed/mixed-encoding
    content. Without sanitization, encoding to UTF-8 raises UnicodeEncodeError.
    See: https://github.com/dgtlmoon/changedetection.io/issues/3952
    """
    content_with_surrogate = '<html><body>Hello \udcad World</body></html>'

    # Confirm the raw problem exists
    with pytest.raises(UnicodeEncodeError):
        content_with_surrogate.encode('utf-8')

    # Our fix: sanitize after fetcher.run() in processors/base.py call_browser()
    sanitized = content_with_surrogate.encode('utf-8', errors='replace').decode('utf-8')
    assert 'Hello' in sanitized
    assert 'World' in sanitized
    assert '\udcad' not in sanitized

    # Checksum computation (processors/base.py get_raw_document_checksum) must not crash
    hashlib.md5(sanitized.encode('utf-8')).hexdigest()
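Reviewer note: for readers unfamiliar with the `errors='replace'` round trip used above, here is a minimal standalone sketch of what it does to a lone surrogate. `sanitize_surrogates` is a hypothetical helper name used only for illustration; the actual fix lives in `processors/base.py` as the test comment says.

```python
def sanitize_surrogates(text: str) -> str:
    """Round-trip through UTF-8 so that unencodable code points
    (such as lone surrogates) are replaced instead of raising."""
    # On encoding, the 'replace' handler substitutes '?' for each
    # character that cannot be encoded.
    return text.encode('utf-8', errors='replace').decode('utf-8')


cleaned = sanitize_surrogates('Hello \udcad World')
# cleaned == 'Hello ? World'
```

The surrounding text survives untouched; only the lone surrogate is swapped for `?`, so later `.encode('utf-8')` calls (for example when computing a checksum) no longer raise.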
def test_utf8_content_without_charset_header(client, live_server, datastore_path):
    """Server returns UTF-8 content but no charset in Content-Type header.
    chardet can misdetect such pages as UTF-7 (Python 3.14 then produces surrogates).
    Our fix tries UTF-8 first before falling back to chardet.
    See: https://github.com/dgtlmoon/changedetection.io/issues/3952
    """
    from .util import write_test_file_and_sync

    # UTF-8 encoded content with non-ASCII chars - no charset will be in the header
    html = '<html><body><p>Español</p><p>Français</p><p>日本語</p></body></html>'
    write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), html.encode('utf-8'), mode='wb')

    test_url = url_for('test_endpoint', content_type="text/html", _external=True)
    client.application.config.get('DATASTORE').add_watch(url=test_url)
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    res = client.get(url_for("ui.ui_preview.preview_page", uuid="first"), follow_redirects=True)

    # Should decode correctly as UTF-8, not produce mojibake (e.g. EspaÃ±ol) or replacement chars
    assert 'Español'.encode('utf-8') in res.data
    assert 'Français'.encode('utf-8') in res.data
    assert '日本語'.encode('utf-8') in res.data
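Reviewer note: the "try UTF-8 first, then fall back" strategy described in the docstring can be sketched as below. This is an assumption about the shape of the fix, not a copy of it: `decode_body` is a hypothetical name, and `fallback_encoding` stands in for whatever a detector such as chardet would report.

```python
def decode_body(raw: bytes, fallback_encoding: str = 'latin-1') -> str:
    """Decode a response body that arrived without a declared charset."""
    try:
        # Strict UTF-8 first: valid UTF-8 is very unlikely to be
        # anything else, so a clean decode is trusted as-is.
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Only when strict UTF-8 fails do we use the detected/fallback
        # encoding, replacing undecodable bytes instead of raising.
        return raw.decode(fallback_encoding, errors='replace')


utf8_text = decode_body('Español 日本語'.encode('utf-8'))
latin1_text = decode_body('Español'.encode('latin-1'))
```

Because the UTF-8 attempt is strict, a body in another encoding normally fails fast and reaches the fallback, while genuine UTF-8 never goes through detection at all.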
def test_shiftjis_with_meta_charset(client, live_server, datastore_path):
    """Server returns Shift-JIS content with no charset in HTTP header, but the HTML
    declares <meta charset="Shift-JIS">. We should use the meta tag, not chardet.
    Real-world case: https://github.com/dgtlmoon/changedetection.io/issues/3952
    """
    from .util import write_test_file_and_sync

    japanese_text = '日本語のページ'
    html = f'<html><head><meta http-equiv="Content-Type" content="text/html;charset=Shift-JIS"></head><body><p>{japanese_text}</p></body></html>'
    write_test_file_and_sync(os.path.join(datastore_path, "endpoint-content.txt"), html.encode('shift_jis'), mode='wb')

    test_url = url_for('test_endpoint', content_type="text/html", _external=True)
    client.application.config.get('DATASTORE').add_watch(url=test_url)
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    res = client.get(url_for("ui.ui_preview.preview_page", uuid="first"), follow_redirects=True)
    assert japanese_text.encode('utf-8') in res.data
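Reviewer note: a rough sketch of how a charset could be sniffed from a `<meta>` tag before any statistical detection runs. The regex, the 2048-byte scan window and the name `sniff_meta_charset` are all assumptions made for illustration; the project's actual detection logic may differ.

```python
import codecs
import re

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html;charset=...">
_META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, scan_bytes: int = 2048):
    """Return a codec name declared in a <meta> tag, or None."""
    match = _META_CHARSET_RE.search(raw[:scan_bytes])
    if not match:
        return None
    name = match.group(1).decode('ascii')
    try:
        # Normalise to a codec Python actually knows about
        return codecs.lookup(name).name
    except LookupError:
        return None


japanese_text = '日本語のページ'
page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html;charset=Shift-JIS"></head>'
    f'<body><p>{japanese_text}</p></body></html>'
).encode('shift_jis')

declared = sniff_meta_charset(page)
decoded = page.decode(declared)
```

Scanning raw bytes works here because the markup up to the `<meta>` tag is plain ASCII in Shift-JIS, UTF-8 and most other legacy encodings.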
def set_html_response(datastore_path):
test_return_data = """
<html><body><span class="nav_second_img_text">
+14 -4
@@ -10,6 +10,8 @@ from .util import live_server_setup, wait_for_all_checks, delete_all_watches
def _runner_test_http_errors(client, live_server, http_code, expected_text, datastore_path):
from loguru import logger
logger.debug(f"_runner_test_http_errors - testing text '{expected_text}' for code {http_code}")
with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
f.write("Now you going to get a {} error code\n".format(http_code))
@@ -20,6 +22,11 @@ def _runner_test_http_errors(client, live_server, http_code, expected_text, data
status_code=http_code,
_external=True)
if os.getenv("PLAYWRIGHT_DRIVER_URL") or os.getenv('WEBDRIVER_URL'):
logger.warning("!!! Looks like we're running test with playwright or selenium, so FORCE a connection back to our container 'cdio'")
test_url = test_url.replace('localhost.localdomain', 'changedet')
test_url = test_url.replace('localhost', 'changedet')
uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
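Reviewer note: the two chained `str.replace` calls above rewrite every occurrence of `localhost` in the URL, including any in the path or query string. If that ever becomes a problem, a host-only rewrite could look like the sketch below. `rewrite_host` is a hypothetical helper, not part of the test utilities, and it drops any `user:password@` part of the URL for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url: str, new_host: str,
                 local_hosts=('localhost', 'localhost.localdomain')) -> str:
    """Swap only the hostname of `url` when it is one of `local_hosts`."""
    parts = urlsplit(url)
    if parts.hostname not in local_hosts:
        return url  # not a local URL, leave it untouched
    # Keep the port if one was given (userinfo is intentionally dropped)
    netloc = new_host if parts.port is None else f'{new_host}:{parts.port}'
    return urlunsplit(parts._replace(netloc=netloc))


rewritten = rewrite_host(
    'http://localhost:5005/test-endpoint?status_code=404', 'changedet'
)
```

The port, path and query string are carried over unchanged; only the host portion is replaced.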
@@ -76,7 +83,8 @@ def test_DNS_errors(client, live_server, measure_memory_usage, datastore_path):
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data
b"Connection error occurred" in res.data or
b"net::ERR_NAME_NOT_RESOLVED" in res.data
)
assert found_name_resolution_error
# Should always record that we tried
@@ -108,7 +116,8 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data
b"Connection error occurred" in res.data or
b"net::ERR_NAME_NOT_RESOLVED" in res.data
)
assert found_name_resolution_error
@@ -117,7 +126,7 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
url_for("ui.ui_edit.edit_page", uuid="first"),
data={
"url": test_url,
"fetch_backend": "html_requests",
"browser_profile": "direct_http_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -131,7 +140,8 @@ def test_low_level_errors_clear_correctly(client, live_server, measure_memory_us
b"nodename nor servname provided" in res.data or
b"Temporary failure in name resolution" in res.data or
b"Failed to establish a new connection" in res.data or
b"Connection error occurred" in res.data
b"Connection error occurred" in res.data or
b"net::ERR_NAME_NOT_RESOLVED" in res.data
)
assert not found_name_resolution_error
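Reviewer note: the same `or` chain is now extended in three places. One possible way to keep them in sync is a shared tuple of markers checked with `any()`. The names below are hypothetical, and the tuple lists only the strings visible in these hunks; the real checks may start with further alternatives above the visible context.

```python
# Error fragments visible in the hunks above (the list may be incomplete)
NAME_RESOLUTION_MARKERS = (
    b"nodename nor servname provided",
    b"Temporary failure in name resolution",
    b"Failed to establish a new connection",
    b"Connection error occurred",
    b"net::ERR_NAME_NOT_RESOLVED",  # reported by browser-based fetchers
)


def has_name_resolution_error(page: bytes) -> bool:
    """True if any known DNS/connection error fragment appears in `page`."""
    return any(marker in page for marker in NAME_RESOLUTION_MARKERS)


browser_error = has_name_resolution_error(
    b"<span>net::ERR_NAME_NOT_RESOLVED at https://example.invalid</span>"
)
clean_page = has_name_resolution_error(b"<span>All good</span>")
```

Each test could then assert on `has_name_resolution_error(res.data)` and a new fetcher error string would only need to be added once.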
@@ -92,7 +92,7 @@ def test_check_filter_multiline(client, live_server, measure_memory_usage, datas
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests",
'browser_profile': "direct_http_requests",
"time_between_check_use_default": "y"
},
follow_redirects=True
@@ -143,7 +143,7 @@ def test_check_filter_and_regex_extract(client, live_server, measure_memory_usag
"url": test_url,
"tags": "",
"headers": "",
'fetch_backend': "html_requests",
'browser_profile': "direct_http_requests",
"time_between_check_use_default": "y"
},
follow_redirects=True
@@ -212,7 +212,7 @@ def test_regex_error_handling(client, live_server, measure_memory_usage, datasto
url_for("ui.ui_edit.edit_page", uuid=uuid),
data={"extract_text": '/something bad\d{3/XYZ',
"url": test_url,
"fetch_backend": "html_requests",
"browser_profile": "direct_http_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)
@@ -96,7 +96,7 @@ def test_filter_doesnt_exist_then_exists_should_get_notification(client, live_se
# prepended with extra filter that intentionally doesn't match any entry,
# notification should still be sent even if first filter does not match (PR#3516)
"include_filters": ".non-matching-selector\n.ticket-available",
"fetch_backend": "html_requests",
"browser_profile": "direct_http_requests",
"time_between_check_use_default": "y"})
res = client.post(
@@ -70,7 +70,7 @@ def run_filter_test(client, live_server, content_filter, app_notification_format
"Diff as Patch: {{diff_patch}}\n"
":-)",
"notification_format": 'text',
"fetch_backend": "html_requests",
"browser_profile": "direct_http_requests",
"filter_failure_notification_send": 'y',
"time_between_check_use_default": "y",
"headers": "",
+1 -1
@@ -417,7 +417,7 @@ def test_order_of_filters_tag_filter_and_watch_filter(client, live_server, measu
"url": test_url,
"tags": "test-tag-keep-order",
"headers": "",
'fetch_backend': "html_requests",
'browser_profile': "direct_http_requests",
"time_between_check_use_default": "y"},
follow_redirects=True
)

Some files were not shown because too many files have changed in this diff.