Compare commits


95 Commits

Author SHA1 Message Date
dgtlmoon
07c60a6acc Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2022-01-05 19:13:42 +01:00
dgtlmoon
7916faf58b 0.39.6 2022-01-05 19:13:36 +01:00
dgtlmoon
febb2bbf0d Heroku tweaks (backup download) (#356)
* use absolute path, just incase the data-dir is set relative
2022-01-05 19:12:13 +01:00
dgtlmoon
59d31bf76f XPath support (#355)
* XPath support and minor improvements to form validation
2022-01-05 17:58:07 +01:00
dgtlmoon
f87f7077a6 Better handling of EmptyReply exception, always bump 'last_checked' in the case of an error (#354)
* Better handling of EmptyReply exception, always bump 'last_checked' in the case of an error, adds test
2022-01-05 14:13:30 +01:00
revilo951
f166ab1e30 Adding note in comments for working arm64 chrome with rPi-4 (#336) 2022-01-05 12:20:56 +01:00
Valtteri Huuskonen
55e679e973 fix typo in README.md (#350)
Fix spelling of Raspberry Pi.
2022-01-04 10:55:20 +01:00
dgtlmoon
e211ba806f Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2022-01-03 20:16:51 +01:00
dgtlmoon
b33105d576 Re #348 - Add test for backup, use proper datastore path 2022-01-03 20:16:21 +01:00
dgtlmoon
b73f5a5c88 Update README.md 2022-01-03 18:46:50 +01:00
Unpublished
023951a10e Be sure that documents returned with a application/json header are not parsed with inscriptis (#337)
* Auto-detect JSON by Content-Type header
* Add test to not parse JSON responses with inscriptis
2022-01-02 22:35:33 +01:00
dgtlmoon
fbd9ecab62 Re #340 - snapshot should not be modified by ignore text (#344) 2022-01-02 22:35:04 +01:00
dgtlmoon
b5c1fce136 Re #133 Option for ignoring whitespacing (#345)
* Global setting option to ignore whitespace when detecting a change
2022-01-02 22:28:34 +01:00
dgtlmoon
489671dcca Re #342 notification encoding (#343)
* Re #342 - check for accidental python byte encoding of non-utf8/string, check return type of fetcher and fix encoding of notification content
2022-01-02 14:11:04 +01:00
dgtlmoon
d4dc3466dc Update README.md 2022-01-01 18:11:54 +01:00
dgtlmoon
0439acacbe Adding global ignore text (#339) 2022-01-01 14:53:08 +01:00
dgtlmoon
735fc2ac8e Adding new proxyType to selenium mappings 2021-12-31 10:48:11 +01:00
dgtlmoon
8a825f0055 Use selenium 4.1.0 2021-12-31 10:44:45 +01:00
dgtlmoon
d0ae8b7923 Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-12-31 10:35:47 +01:00
dgtlmoon
a504773941 Bumping selenium version re https://github.com/dgtlmoon/changedetection.io/pull/331#issuecomment-1003323594 2021-12-31 10:35:29 +01:00
Calvin Bui
feb8e6c76c Add socksVersion mapping (#331) 2021-12-31 10:26:38 +01:00
dgtlmoon
a37a5038d8 Fix broken RSS link fields 2021-12-30 00:04:38 +01:00
dgtlmoon
f1933b786c RSS Link links you back to the difference UI/JS page, RSS Description is the page you're watching, and RSS Title is the page you're watching 2021-12-29 23:57:30 +01:00
dgtlmoon
d6a6ef2c1d Unify Filters and Triggers tabs into a single tab 2021-12-29 23:37:04 +01:00
dgtlmoon
cf9554b169 Move 'request type' field to the new 'Requests' tab 2021-12-29 23:31:53 +01:00
dgtlmoon
d602cf4646 Aligning call signatures #325 2021-12-29 23:28:34 +01:00
Simon Caron
dfcae4ee64 Extend Request Parameters to add Body & Method (#325) 2021-12-29 23:18:29 +01:00
dgtlmoon
e3bcd8c9bf Update README.md 2021-12-29 08:55:37 +01:00
dgtlmoon
c4990fa3f9 Create CONTRIBUTING.md 2021-12-28 18:59:43 +01:00
dgtlmoon
98461d813e Update README.md 2021-12-28 18:57:39 +01:00
dgtlmoon
8ec17a4c83 Re #267 - Pass settings for the proxy setup for webdriver (#326)
* Re #267 - Pass HTTP_PROXY as the proxy setup for webdriver
* Update README.md
2021-12-28 17:07:41 +01:00
dgtlmoon
ee708cc395 Update README.md 2021-12-28 13:19:24 +01:00
dgtlmoon
8a670c029a Update README.md 2021-12-28 13:18:44 +01:00
dgtlmoon
9fa5aec01e Update README.md 2021-12-28 00:47:00 +01:00
dgtlmoon
43c9cb8b0c 0.39.5 2021-12-27 23:46:29 +01:00
dgtlmoon
b6a359d55b Update feature_request.md 2021-12-27 13:50:38 +01:00
dgtlmoon
ae5a88beea Update issue templates 2021-12-27 13:49:07 +01:00
dgtlmoon
a899d338e9 Update bug_report.md 2021-12-27 13:48:02 +01:00
dgtlmoon
7975e8ec2e Update issue templates 2021-12-27 13:46:41 +01:00
dgtlmoon
ce383bcd04 W3C HTML validation issue around RSS icon 2021-12-27 10:55:43 +01:00
dgtlmoon
0b0cdb101b Closes #323 adds link to wiki 2021-12-27 10:14:40 +01:00
dgtlmoon
396509bae8 Update README.md 2021-12-22 10:43:22 +01:00
dgtlmoon
2973f40035 Update README.md 2021-12-22 10:42:48 +01:00
dgtlmoon
067fac862c Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-12-19 23:17:48 +01:00
dgtlmoon
20647ea319 improve theming docs 2021-12-19 23:17:24 +01:00
dgtlmoon
fafc7fda62 Update README.md 2021-12-19 23:10:55 +01:00
dgtlmoon
b1aaf9f277 Update README.md 2021-12-19 23:04:56 +01:00
dgtlmoon
18987aeb23 Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-12-19 18:17:37 +01:00
dgtlmoon
856789a9ba Closes #315 - Include library apprise Notify_mqtt 2021-12-19 18:16:51 +01:00
Iván
2857c7bb77 Re #80, sets SECLEVEL=1 in openssl.conf to allow monitoring sites with weak/old cipher suites (#312)
* set SECLEVEL=1 in openssl.conf to allow monitoring sites with weak/old cipher suites

* Re #80, sets SECLEVEL=1 in openssl.conf to allow monitoring sites with weak/old cipher suites
2021-12-16 12:13:47 +01:00
dgtlmoon
df951637c4 Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-12-16 11:53:39 +01:00
dgtlmoon
ba6fe076bb Go back to docker hub 2021-12-16 11:53:28 +01:00
dgtlmoon
9815fc2526 RSS allow access via token (#310)
Allow access via a token
* New RSS URL
* Redirect the old RSS feed URL
* fix tests
2021-12-16 00:05:01 +01:00
dgtlmoon
e71dbbe771 Adding deploy to Heroku button 2021-12-15 23:32:48 +01:00
dgtlmoon
bd222c99c6 Adding heroku app.json app 2021-12-15 23:28:23 +01:00
dgtlmoon
4b002ad9e0 Tweak runtime Heroku version 2021-12-15 23:20:21 +01:00
dgtlmoon
fe2ffd6356 Tweaking heroku Procfile 2021-12-15 23:20:06 +01:00
dgtlmoon
266bebb5bc Adjust buildpacks on Heroku 2021-12-15 23:15:36 +01:00
dgtlmoon
115ff5bc2e Adding heroku python3 runtime config 2021-12-15 23:13:03 +01:00
dgtlmoon
dd6a24d337 Try simpler heroku recipe 2021-12-15 23:09:43 +01:00
dgtlmoon
f0d418d58c Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-12-15 23:07:32 +01:00
dgtlmoon
10d3b09051 -C option to create a datadir if it doesnt exist 2021-12-15 23:07:13 +01:00
dgtlmoon
35d0c74454 Re #308 - Adding test and including settings in clone operation (#309) 2021-12-15 19:54:30 +01:00
Glassed Silver
dd450b81ad fixing too small font in diff UI (#260)
* fixing too small font in diff UI , lower size from 12 to 11 in Part II
2021-12-15 19:21:25 +01:00
dgtlmoon
512d76c52b Update README.md
Make link more accurate
2021-12-10 20:21:27 +01:00
dgtlmoon
5a10acfd09 Send diff in notifications (#296) 2021-12-10 12:08:51 +01:00
dgtlmoon
a7c09c8990 Fix scrub form theme 2021-12-10 00:09:54 +01:00
dgtlmoon
9235eae608 Scrub dates: Fix date regex limit handler parsing 2021-12-10 00:09:42 +01:00
dgtlmoon
5bbd82be79 Wait 60 seconds or until stop_thread is set 2021-12-09 23:28:17 +01:00
dgtlmoon
7f8c0fb2fa Check that a notification URL is set when sending the test notification (#300) 2021-12-08 12:23:48 +01:00
Tristan Hill
489eedf34e Flask 2 (#299)
Co-authored-by: Tristan Hill <t+git@eaux.uk>
2021-12-07 23:23:23 +01:00
dgtlmoon
3956b3fd68 Re #269 - Show current/correct BASE_URL information (#271)
* Re #269 - Show current/correct BASE_URL information
2021-12-04 15:23:23 +01:00
dgtlmoon
61c1d213d0 Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-12-04 14:48:18 +01:00
dgtlmoon
e07f573f64 Re #269 - Fix env var comment name 2021-12-04 14:47:46 +01:00
ghjklw
ecba130fdb Enable Markdown and HTML notifications. (#288)
This change enable defining the notification body as HTML or Markdown. This can be very
useful to have more user-friendly notifications such as:
* applying a heading style to the `{watch_title}` to make it stand out
* creating clickable links using the `{watch_url}`, `{preview_url}` and `{diff_url}`.

Changes
=======
* Add a `notification_format` to the notification settings, defaults to plain text.
* Use the `body_format` parameter of Apprise's `notify` method.

Co-authored-by: Malo Jaffré <malo.jaffre@dunnhumby.com>
2021-12-04 14:41:48 +01:00
dgtlmoon
ff6dc842c0 0.39.4 release 2021-12-02 22:54:38 +01:00
dgtlmoon
4659993ecf Re #286 - Solving lost data/corrupted data - Tweak timing and try to write to a temp file first (#292)
* Re #286 - Tweak timing and try to write to a temp file first, Increase logging and format info message better.
2021-12-02 22:48:44 +01:00
jeremysherriff
0a29b3a582 Fix element paths when using reverse proxy subfolder (#272) 2021-11-12 11:34:19 +01:00
dgtlmoon
c55bf418c5 0.39.3 release 2021-10-28 11:32:33 +02:00
dgtlmoon
4bbb7d99b6 Re #264 - fixing clone watch operation 2021-10-28 11:29:59 +02:00
dgtlmoon
a8e92e2226 Re #265 - extended jsonpath support (#266)
* Re #265 - Use extended JSONpath support,
Allow a JSONPath selector to not match anything (yet)
Adding test
Correctly capture invalid JSONPath query error
2021-10-27 09:24:08 +02:00
dgtlmoon
c17327633f Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-10-26 22:32:29 +02:00
dgtlmoon
56d1dde7c3 Re #265 - wasnt catching the jsonpath exception due to invalid jsonpath expressions properly 2021-10-26 22:30:58 +02:00
dgtlmoon
6e4ddacaf8 Re #257 - Handle bool val of json path better (#263)
* Re #257 - Handle bool val of json path better, with test
2021-10-21 23:25:38 +02:00
dgtlmoon
3195ffa1c6 Re #249 - Add EXPOSE 5000 to Dockerfile 2021-10-06 22:28:35 +02:00
dgtlmoon
c749d2ee44 Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-10-06 20:51:38 +02:00
dgtlmoon
ec94359f3c Provide better combination of chardet and urllib3 2021-10-06 20:51:05 +02:00
dgtlmoon
4d0bd58eb1 Prefer GHCR.io over DockerHub (#245)
* Prefer GHCR.io over DockerHub (DockerHub limits pulls)
2021-10-06 13:07:56 +02:00
dgtlmoon
3525f43469 Limit branches/tags of container build
Limit branch
2021-10-06 12:27:02 +02:00
dgtlmoon
d70252c1eb Re #213 - Adding screensize examples to selenium container 2021-10-06 11:34:24 +02:00
dgtlmoon
b57b94c63a Be more specific about tagged release builds 2021-10-06 11:28:39 +02:00
dgtlmoon
9e914c140e Fix :latest release worflow syntax check 2021-10-06 10:27:03 +02:00
dgtlmoon
5d5ceb2f52 Form helper - explain where the webdriver setting comes from 2021-10-06 09:27:41 +02:00
dgtlmoon
bc0303c5da Rename workflow name 2021-10-06 08:59:03 +02:00
dgtlmoon
1240da4a6e Just 'published' and 'edited' package release is enough (remove 'created') 2021-10-06 08:52:10 +02:00
52 changed files with 1751 additions and 406 deletions

.github/ISSUE_TEMPLATE/bug_report.md (new file, 41 lines)

@@ -0,0 +1,41 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**Version**
In the top right area: 0....
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.


@@ -0,0 +1,23 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Version and OS**
For example, 0.123 on linux/docker
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe the use-case and give concrete real-world examples**
Attach any HTML/JSON, give links to sites, screenshots etc, we are not mind readers
**Additional context**
Add any other context or screenshots about the feature request here.


@@ -1,14 +1,16 @@
name: Build and push containers
on:
# Automatically triggered by a testing workflow passing
# Automatically triggered by a testing workflow passing, but this is only checked when it lands in the `master`/default branch
workflow_run:
workflows: ["ChangeDetection.io Test"]
branches: [master]
tags: ['0.*']
types: [completed]
# Or a new tagged release
release:
types: [created, published, edited]
types: [published, edited]
jobs:
metadata:
@@ -23,10 +25,11 @@ jobs:
echo Ref ${{ github.ref }}
echo c: ${{ github.event.workflow_run.conclusion }}
echo r: ${{ github.event.workflow_run }}
echo tname: ${{ github.event.release.tag_name }}
echo tname: "${{ github.event.release.tag_name }}"
echo headbranch: -${{ github.event.workflow_run.head_branch }}-
set
on-success:
build-push-containers:
runs-on: ubuntu-latest
# If the testing workflow has a success, then we build to :latest
# Or if we are in a tagged release scenario.
@@ -81,7 +84,7 @@ jobs:
# master always builds :latest
- name: Build and push :latest
id: docker_build
if: ${{ github.ref == 'refs/heads/master'}}
if: ${{ github.ref }} == "refs/heads/master"
uses: docker/build-push-action@v2
with:
context: ./
@@ -97,7 +100,7 @@ jobs:
# A new tagged release is required, which builds :tag
- name: Build and push :tag
id: docker_build_tag_release
if: ${{ github.event.release.tag_name }} != ''
if: github.event_name == 'release' && startsWith(github.event.release.tag_name, '0.')
uses: docker/build-push-action@v2
with:
context: ./
@@ -110,7 +113,6 @@ jobs:
cache-from: type=local,src=/tmp/.buildx-cache
cache-to: type=local,dest=/tmp/.buildx-cache
- name: Image digest
run: echo step SHA ${{ steps.vars.outputs.sha_short }} tag ${{steps.vars.outputs.tag}} branch ${{steps.vars.outputs.branch}} digest ${{ steps.docker_build.outputs.digest }}


@@ -14,6 +14,9 @@ jobs:
with:
python-version: 3.9
- name: Show env vars
run: set
- name: Install dependencies
run: |
python -m pip install --upgrade pip
@@ -27,12 +30,15 @@ jobs:
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Unit tests
run: |
python3 -m unittest changedetectionio.tests.unit.test_notification_diff
- name: Test with pytest
run: |
# Each test is totally isolated and performs its own cleanup/reset
cd changedetectionio; ./run_all_tests.sh
# https://github.com/docker/build-push-action/blob/master/docs/advanced/test-before-push.md ?
# https://github.com/docker/buildx/issues/59 ? Needs to be one platform?

CONTRIBUTING.md (new file, 5 lines)

@@ -0,0 +1,5 @@
Contributing is always welcome!
I am no professional flask developer, if you know a better way that something can be done, please let me know!
Otherwise, it's always best to PR into the `dev` branch.


@@ -42,10 +42,15 @@ ENV PYTHONUNBUFFERED=1
RUN [ ! -d "/datastore" ] && mkdir /datastore
# Re #80, sets SECLEVEL=1 in openssl.conf to allow monitoring sites with weak/old cipher suites
RUN sed -i 's/^CipherString = .*/CipherString = DEFAULT@SECLEVEL=1/' /etc/ssl/openssl.cnf
# Copy modules over to the final image and add their dir to PYTHONPATH
COPY --from=builder /dependencies /usr/local
ENV PYTHONPATH=/usr/local
EXPOSE 5000
# The actual flask app
COPY changedetectionio /app/changedetectionio
# The eventlet server wrapper

Procfile (new file, 1 line)

@@ -0,0 +1 @@
web: python3 ./changedetection.py -C -d ./datastore -p $PORT


@@ -1,13 +1,9 @@
# changedetection.io
![changedetection.io](https://github.com/dgtlmoon/changedetection.io/actions/workflows/test-only.yml/badge.svg?branch=master)
<a href="https://hub.docker.com/r/dgtlmoon/changedetection.io" target="_blank" title="Change detection docker hub">
<img src="https://img.shields.io/docker/pulls/dgtlmoon/changedetection.io" alt="Docker Pulls"/>
</a>
<a href="https://hub.docker.com/r/dgtlmoon/changedetection.io" target="_blank" title="Change detection docker hub">
<img src="https://img.shields.io/github/v/release/dgtlmoon/changedetection.io" alt="Change detection latest tag version"/>
</a>
[![Release Version][release-shield]][release-link] [![Docker Pulls][docker-pulls]][docker-link] [![License][license-shield]](LICENSE.md)
## Self-hosted open source change monitoring of web pages.
![changedetection.io](https://github.com/dgtlmoon/changedetection.io/actions/workflows/test-only.yml/badge.svg?branch=master)
## Self-Hosted, Open Source, Change Monitoring of Web Pages
_Know when web pages change! Stay ontop of new information!_
@@ -18,32 +14,47 @@ Open source web page monitoring, notification and change detection.
<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot.png" style="max-width:100%;" alt="Self-hosted web page change monitoring" title="Self-hosted web page change monitoring" />
[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://dashboard.heroku.com/new?template=https%3A%2F%2Fgithub.com%2Fdgtlmoon%2Fchangedetection.io%2Ftree%2Fmaster)
Read the [Heroku notes and limitations wiki page first](https://github.com/dgtlmoon/changedetection.io/wiki/Heroku-notes)
#### Example use cases
Know when ...
- Government department updates (changes are often only on their websites)
- Local government news (changes are often only on their websites)
- Products and services have a change in pricing
- Governmental department updates (changes are often only on their websites)
- New software releases, security advisories when you're not on their mailing list.
- Festivals with changes
- Realestate listing changes
- COVID related news from government websites
- University/organisation news from their website
- Detect and monitor changes in JSON API responses
- API monitoring and alerting
- Changes in legal and other documents
- Trigger API calls via notifications when text appears on a website
- Glue together APIs using the JSON filter and JSON notifications
- Create RSS feeds based on changes in web content
- You have a very sensitive list of URLs to watch and you do _not_ want to use the paid alternatives. (Remember, _you_ are the product)
_Need an actual Chrome runner with Javascript support? We support fetching via WebDriver!</a>_
**Get monitoring now! super simple, one command!**
**Get monitoring now! super simple.**
<a href="https://dashboard.heroku.com/new?template=https%3A%2F%2Fgithub.com%2Fdgtlmoon%2Fchangedetection.io%2Ftree%2Fmaster">Deploy to Heroku for free</a>, Run this python directly, or with <a href="https://docs.docker.com/get-docker/">docker</a> and/or <a href="https://www.digitalocean.com/community/tutorial_collections/how-to-install-docker-compose">docker-compose</a>
## Screenshots
Examining differences in content.
<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot-diff.png" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />
Please :star: star :star: this project and help it grow! https://github.com/dgtlmoon/changedetection.io/
Run the python code on your own machine by cloning this repository, or with <a href="https://docs.docker.com/get-docker/">docker</a> and/or <a href="https://www.digitalocean.com/community/tutorial_collections/how-to-install-docker-compose">docker-compose</a>
## Installation
### Docker
Check out our Docker hub page https://hub.docker.com/r/dgtlmoon/changedetection.io
With Docker composer, just clone this repository and
With Docker composer, just clone this repository and..
```bash
$ docker-compose up -d
```
@@ -80,14 +91,8 @@ docker run -d --restart always -p "127.0.0.1:5000:5000" -v datastore-volume:/dat
```bash
docker-compose pull && docker-compose up -d
```
## Screenshots
Examining differences in content.
<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot-diff.png" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />
Please :star: star :star: this project and help it grow! https://github.com/dgtlmoon/changedetection.io/
### Filters
XPath, JSONPath and CSS support comes baked in! You can be as specific as you need, use XPath exported from various XPath element query creation tools.
### Notifications
@@ -107,7 +112,7 @@ Just some examples
json://someserver.com/custom-api
syslog://
<a href="https://github.com/caronc/apprise">And everything else in this list!</a>
<a href="https://github.com/caronc/apprise#popular-notification-services">And everything else in this list!</a>
<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot-notifications.png" style="max-width:100%;" alt="Self-hosted web page change monitoring notifications" title="Self-hosted web page change monitoring notifications" />
@@ -137,29 +142,13 @@ When you enable a `json:` filter, you can even automatically extract and parse e
`json:$.price` would give `23.50`, or you can extract the whole structure
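A minimal sketch of the JSONPath lookup behind a `json:` filter, using the extended `jsonpath_ng.ext` parser this changeset switches to (see the `from jsonpath_ng.ext import parse` change further down); the sample document is made up:

```python
from jsonpath_ng.ext import parse  # extended parser, as adopted in this changeset

sample = {"title": "Some product", "price": 23.50, "stock": {"warehouse": 5}}

# A watch filter of "json:$.price" has its "json:" prefix stripped and the rest
# is evaluated as a JSONPath expression against the fetched document
matches = parse("$.price").find(sample)
print(matches[0].value)  # 23.5
```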
### Proxy
### Proxy configuration
A proxy for ChangeDetection.io can be configured by setting environment the
`HTTP_PROXY`, `HTTPS_PROXY` variables, examples are also in the `docker-compose.yml`
See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration
`NO_PROXY` exclude list can be specified by following `"localhost,192.168.0.0/24"`
### Raspberry Pi support?
as `docker run` with `-e`
```
docker run -d --restart always -e HTTPS_PROXY="socks5h://10.10.1.10:1080" -p "127.0.0.1:5000:5000" -v datastore-volume:/datastore --name changedetection.io dgtlmoon/changedetection.io
```
With `docker-compose`, see the `Proxy support example` in <a href="https://github.com/dgtlmoon/changedetection.io/blob/master/docker-compose.yml">docker-compose.yml</a>.
For more information see https://docs.python-requests.org/en/master/user/advanced/#proxies
This proxy support also extends to the notifications https://github.com/caronc/apprise/issues/387#issuecomment-841718867
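Because the plaintext fetcher is built on `requests`, the proxy comes for free once those variables are exported: `requests` reads `HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY` from the environment by default. A small sketch with placeholder values:

```python
import os
import requests

# Placeholder values; in the container they come from docker-compose or `docker run -e ...`
os.environ["HTTPS_PROXY"] = "socks5h://10.10.1.10:1080"   # socks schemes need requests[socks]
os.environ["NO_PROXY"] = "localhost,192.168.0.0/24"

# requests picks the proxy up from the environment automatically (trust_env=True),
# so no proxy-specific code is needed in the fetcher itself
r = requests.get("https://example.com", timeout=10)
print(r.status_code)
```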
### RaspberriPi support?
RaspberriPi and linux/arm/v6 linux/arm/v7 arm64 devices are supported!
Raspberry Pi and linux/arm/v6 linux/arm/v7 arm64 devices are supported!
### Windows native support?
@@ -174,3 +163,12 @@ Please support us, even small amounts help a LOT.
BTC `1PLFN327GyUarpJd7nVe7Reqg9qHx5frNn`
<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/btc-support.png" style="max-width:50%;" alt="Support us!" />
[release-shield]: https://img.shields.io/github/v/release/dgtlmoon/changedetection.io?style=for-the-badge
[docker-pulls]: https://img.shields.io/docker/pulls/dgtlmoon/changedetection.io?style=for-the-badge
[test-shield]: https://github.com/dgtlmoon/changedetection.io/actions/workflows/test-only.yml/badge.svg?branch=master
[license-shield]: https://img.shields.io/github/license/dgtlmoon/changedetection.io.svg?style=for-the-badge
[release-link]: https://github.com/dgtlmoon.com/changedetection.io/releases
[docker-link]: https://hub.docker.com/r/dgtlmoon/changedetection.io

app.json (new file, 21 lines)

@@ -0,0 +1,21 @@
{
"name": "ChangeDetection.io",
"description": "The best and simplest self-hosted open source website change detection monitoring and notification service.",
"keywords": [
"changedetection",
"website monitoring"
],
"repository": "https://github.com/dgtlmoon/changedetection.io",
"success_url": "/",
"scripts": {
},
"env": {
},
"formation": {
"web": {
"quantity": 1,
"size": "free"
}
},
"image": "heroku/python"
}


@@ -21,11 +21,13 @@ def main():
datastore_path = os.path.join(os.getcwd(), "datastore")
try:
opts, args = getopt.getopt(sys.argv[1:], "csd:p:", "port")
opts, args = getopt.getopt(sys.argv[1:], "Ccsd:p:", "port")
except getopt.GetoptError:
print('backend.py -s SSL enable -p [port] -d [datastore path]')
sys.exit(2)
create_datastore_dir = False
for opt, arg in opts:
# if opt == '--purge':
# Remove history, the actual files you need to delete manually.
@@ -45,13 +47,20 @@ def main():
if opt == '-c':
do_cleanup = True
# Create the datadir if it doesnt exist
if opt == '-C':
create_datastore_dir = True
# isnt there some @thingy to attach to each route to tell it, that this route needs a datastore
app_config = {'datastore_path': datastore_path}
if not os.path.isdir(app_config['datastore_path']):
print ("ERROR: Directory path for the datastore '{}' does not exist, cannot start, please make sure the directory exists.\n"
"Alternatively, use the -d parameter.".format(app_config['datastore_path']),file=sys.stderr)
sys.exit(2)
if create_datastore_dir:
os.mkdir(app_config['datastore_path'])
else:
print ("ERROR: Directory path for the datastore '{}' does not exist, cannot start, please make sure the directory exists.\n"
"Alternatively, use the -d parameter.".format(app_config['datastore_path']),file=sys.stderr)
sys.exit(2)
datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'], version_tag=changedetectionio.__version__)
app = changedetectionio.changedetection_app(app_config, datastore)


@@ -30,7 +30,7 @@ import datetime
import pytz
from copy import deepcopy
__version__ = '0.39.2'
__version__ = '0.39.6'
datastore = None
@@ -224,12 +224,86 @@ def changedetection_app(config=None, datastore_o=None):
# Disable password loginif there is not one set
app.config['LOGIN_DISABLED'] = datastore.data['settings']['application']['password'] == False
# For the RSS path, allow access via a token
if request.path == '/rss' and request.args.get('token'):
app_rss_token = datastore.data['settings']['application']['rss_access_token']
rss_url_token = request.args.get('token')
if app_rss_token == rss_url_token:
app.config['LOGIN_DISABLED'] = True
@app.route("/rss", methods=['GET'])
@login_required
def rss():
limit_tag = request.args.get('tag')
# Sort by last_changed and add the uuid which is usually the key..
sorted_watches = []
# @todo needs a .itemsWithTag() or something
for uuid, watch in datastore.data['watching'].items():
if limit_tag != None:
# Support for comma separated list of tags.
for tag_in_watch in watch['tag'].split(','):
tag_in_watch = tag_in_watch.strip()
if tag_in_watch == limit_tag:
watch['uuid'] = uuid
sorted_watches.append(watch)
else:
watch['uuid'] = uuid
sorted_watches.append(watch)
sorted_watches.sort(key=lambda x: x['last_changed'], reverse=True)
fg = FeedGenerator()
fg.title('changedetection.io')
fg.description('Feed description')
fg.link(href='https://changedetection.io')
for watch in sorted_watches:
if not watch['viewed']:
# Re #239 - GUID needs to be individual for each event
# @todo In the future make this a configurable link back (see work on BASE_URL https://github.com/dgtlmoon/changedetection.io/pull/228)
guid = "{}/{}".format(watch['uuid'], watch['last_changed'])
fe = fg.add_entry()
# Include a link to the diff page, they will have to login here to see if password protection is enabled.
# Description is the page you watch, link takes you to the diff JS UI page
base_url = datastore.data['settings']['application']['base_url']
if base_url == '':
base_url = "<base-url-env-var-not-set>"
diff_link = {'href': "{}{}".format(base_url, url_for('diff_history_page', uuid=watch['uuid']))}
# @todo use title if it exists
fe.link(link=diff_link)
fe.title(title=watch['url'])
# @todo in the future <description><![CDATA[<html><body>Any code html is valid.</body></html>]]></description>
fe.description(description=watch['url'])
fe.guid(guid, permalink=False)
dt = datetime.datetime.fromtimestamp(int(watch['newest_history_key']))
dt = dt.replace(tzinfo=pytz.UTC)
fe.pubDate(dt)
response = make_response(fg.rss_str())
response.headers.set('Content-Type', 'application/rss+xml')
return response
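A hedged sketch of consuming the new token-protected feed from outside the app; the base URL, token value and tag are placeholders:

```python
import requests

BASE_URL = "http://localhost:5000"            # placeholder
RSS_TOKEN = "your-rss_access_token-value"     # from settings['application']['rss_access_token']

# /rss skips the login requirement when ?token= matches the stored rss_access_token,
# and ?tag= optionally limits the feed to watches carrying that tag
resp = requests.get(f"{BASE_URL}/rss", params={"token": RSS_TOKEN, "tag": "products"})
resp.raise_for_status()
print(resp.headers["Content-Type"])   # application/rss+xml
print(resp.text[:200])
```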
@app.route("/", methods=['GET'])
@login_required
def index():
limit_tag = request.args.get('tag')
pause_uuid = request.args.get('pause')
# Redirect for the old rss path which used the /?rss=true
if request.args.get('rss'):
return redirect(url_for('rss', tag=limit_tag))
if pause_uuid:
try:
datastore.data['watching'][pause_uuid]['paused'] ^= True
@@ -239,7 +313,6 @@ def changedetection_app(config=None, datastore_o=None):
except KeyError:
pass
# Sort by last_changed and add the uuid which is usually the key..
sorted_watches = []
for uuid, watch in datastore.data['watching'].items():
@@ -259,42 +332,17 @@ def changedetection_app(config=None, datastore_o=None):
sorted_watches.sort(key=lambda x: x['last_changed'], reverse=True)
existing_tags = datastore.get_all_tags()
rss = request.args.get('rss')
if rss:
fg = FeedGenerator()
fg.title('changedetection.io')
fg.description('Feed description')
fg.link(href='https://changedetection.io')
from changedetectionio import forms
form = forms.quickWatchForm(request.form)
for watch in sorted_watches:
if not watch['viewed']:
# Re #239 - GUID needs to be individual for each event
# @todo In the future make this a configurable link back (see work on BASE_URL https://github.com/dgtlmoon/changedetection.io/pull/228)
guid = "{}/{}".format(watch['uuid'], watch['last_changed'])
fe = fg.add_entry()
fe.title(watch['url'])
fe.link(href=watch['url'])
fe.description(watch['url'])
fe.guid(guid, permalink=False)
dt = datetime.datetime.fromtimestamp(int(watch['newest_history_key']))
dt = dt.replace(tzinfo=pytz.UTC)
fe.pubDate(dt)
response = make_response(fg.rss_str())
response.headers.set('Content-Type', 'application/rss+xml')
return response
else:
from changedetectionio import forms
form = forms.quickWatchForm(request.form)
output = render_template("watch-overview.html",
form=form,
watches=sorted_watches,
tags=existing_tags,
active_tag=limit_tag,
has_unviewed=datastore.data['has_unviewed'])
output = render_template("watch-overview.html",
form=form,
watches=sorted_watches,
tags=existing_tags,
active_tag=limit_tag,
app_rss_token=datastore.data['settings']['application']['rss_access_token'],
has_unviewed=datastore.data['has_unviewed'])
return output
@@ -316,7 +364,7 @@ def changedetection_app(config=None, datastore_o=None):
# I noticed chrome will show '/' but actually submit '-'
limit_date = limit_date.replace('-', '/')
# In the case that :ss seconds are supplied
limit_date = re.sub('(\d\d:\d\d)(:\d\d)', '\\1', limit_date)
limit_date = re.sub(r'(\d\d:\d\d)(:\d\d)', '\\1', limit_date)
str_to_dt = datetime.datetime.strptime(limit_date, '%Y/%m/%d %H:%M')
limit_timestamp = int(str_to_dt.timestamp())
@@ -357,7 +405,7 @@ def changedetection_app(config=None, datastore_o=None):
# Get the most recent one
newest_history_key = datastore.get_val(uuid, 'newest_history_key')
# 0 means that theres only one, so that there should be no 'unviewed' history availabe
# 0 means that theres only one, so that there should be no 'unviewed' history available
if newest_history_key == 0:
newest_history_key = list(datastore.data['watching'][uuid]['history'].keys())[0]
@@ -370,7 +418,11 @@ def changedetection_app(config=None, datastore_o=None):
stripped_content = handler.strip_ignore_text(raw_content,
datastore.data['watching'][uuid]['ignore_text'])
checksum = hashlib.md5(stripped_content).hexdigest()
if datastore.data['settings']['application'].get('ignore_whitespace', False):
checksum = hashlib.md5(stripped_content.translate(None, b'\r\n\t ')).hexdigest()
else:
checksum = hashlib.md5(stripped_content).hexdigest()
return checksum
return datastore.data['watching'][uuid]['previous_md5']
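A small illustration of the whitespace-insensitive checksum introduced here: `bytes.translate(None, b'\r\n\t ')` deletes CR, LF, tab and space bytes before hashing, so two snapshots that differ only in whitespace hash identically. The sample strings are made up:

```python
import hashlib

before = b"Price: 23.50\nIn stock\n"
after  = b"Price:  23.50\r\n\tIn stock"

def checksum(content, ignore_whitespace):
    if ignore_whitespace:
        # translate(None, delete=...) strips CR, LF, tab and space bytes entirely
        content = content.translate(None, b'\r\n\t ')
    return hashlib.md5(content).hexdigest()

print(checksum(before, False) == checksum(after, False))  # False - whitespace counts
print(checksum(before, True) == checksum(after, True))    # True  - only the text matters
```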
@@ -411,10 +463,13 @@ def changedetection_app(config=None, datastore_o=None):
'tag': form.tag.data.strip(),
'title': form.title.data.strip(),
'headers': form.headers.data,
'body': form.body.data,
'method': form.method.data,
'fetch_backend': form.fetch_backend.data,
'trigger_text': form.trigger_text.data,
'notification_title': form.notification_title.data,
'notification_body': form.notification_body.data,
'notification_format': form.notification_format.data,
'extract_title_as_title': form.extract_title_as_title.data
}
@@ -443,18 +498,25 @@ def changedetection_app(config=None, datastore_o=None):
flash("Updated watch.")
# Re #286 - We wait for syncing new data to disk in another thread every 60 seconds
# But in the case something is added we should save straight away
datastore.sync_to_json()
# Queue the watch for immediate recheck
update_q.put(uuid)
if form.trigger_check.data:
n_object = {'watch_url': form.url.data.strip(),
'notification_urls': form.notification_urls.data,
'notification_title': form.notification_title.data,
'notification_body' : form.notification_body.data
}
notification_q.put(n_object)
flash('Notifications queued.')
if len(form.notification_urls.data):
n_object = {'watch_url': form.url.data.strip(),
'notification_urls': form.notification_urls.data,
'notification_title': form.notification_title.data,
'notification_body': form.notification_body.data,
'notification_format': form.notification_format.data,
}
notification_q.put(n_object)
flash('Test notification queued.')
else:
flash('No notification URLs set, cannot send test.', 'error')
# Diff page [edit] link should go back to diff page
if request.args.get("next") and request.args.get("next") == 'diff':
@@ -476,7 +538,8 @@ def changedetection_app(config=None, datastore_o=None):
uuid=uuid,
watch=datastore.data['watching'][uuid],
form=form,
using_default_minutes=using_default_minutes
using_default_minutes=using_default_minutes,
current_base_url = datastore.data['settings']['application']['base_url']
)
return output
@@ -493,10 +556,13 @@ def changedetection_app(config=None, datastore_o=None):
if request.method == 'GET':
form.minutes_between_check.data = int(datastore.data['settings']['requests']['minutes_between_check'])
form.notification_urls.data = datastore.data['settings']['application']['notification_urls']
form.global_ignore_text.data = datastore.data['settings']['application']['global_ignore_text']
form.ignore_whitespace.data = datastore.data['settings']['application']['ignore_whitespace']
form.extract_title_as_title.data = datastore.data['settings']['application']['extract_title_as_title']
form.fetch_backend.data = datastore.data['settings']['application']['fetch_backend']
form.notification_title.data = datastore.data['settings']['application']['notification_title']
form.notification_body.data = datastore.data['settings']['application']['notification_body']
form.notification_format.data = datastore.data['settings']['application']['notification_format']
form.base_url.data = datastore.data['settings']['application']['base_url']
# Password unset is a GET
@@ -515,17 +581,24 @@ def changedetection_app(config=None, datastore_o=None):
datastore.data['settings']['application']['fetch_backend'] = form.fetch_backend.data
datastore.data['settings']['application']['notification_title'] = form.notification_title.data
datastore.data['settings']['application']['notification_body'] = form.notification_body.data
datastore.data['settings']['application']['notification_format'] = form.notification_format.data
datastore.data['settings']['application']['notification_urls'] = form.notification_urls.data
datastore.data['settings']['application']['base_url'] = form.base_url.data
datastore.data['settings']['application']['global_ignore_text'] = form.global_ignore_text.data
datastore.data['settings']['application']['ignore_whitespace'] = form.ignore_whitespace.data
if form.trigger_check.data and len(form.notification_urls.data):
n_object = {'watch_url': "Test from changedetection.io!",
'notification_urls': form.notification_urls.data,
'notification_title': form.notification_title.data,
'notification_body': form.notification_body.data
}
notification_q.put(n_object)
flash('Notifications queued.')
if form.trigger_check.data:
if len(form.notification_urls.data):
n_object = {'watch_url': "Test from changedetection.io!",
'notification_urls': form.notification_urls.data,
'notification_title': form.notification_title.data,
'notification_body': form.notification_body.data,
'notification_format': form.notification_format.data,
}
notification_q.put(n_object)
flash('Test notification queued.')
else:
flash('No notification URLs set, cannot send test.', 'error')
if form.password.encrypted_password:
datastore.data['settings']['application']['password'] = form.password.encrypted_password
@@ -539,7 +612,7 @@ def changedetection_app(config=None, datastore_o=None):
if request.method == 'POST' and not form.validate():
flash("An error occurred, please see below.", "error")
output = render_template("settings.html", form=form)
output = render_template("settings.html", form=form, current_base_url = datastore.data['settings']['application']['base_url'])
return output
@@ -605,6 +678,7 @@ def changedetection_app(config=None, datastore_o=None):
dates = list(watch['history'].keys())
# Convert to int, sort and back to str again
# @todo replace datastore getter that does this automatically
dates = [int(i) for i in dates]
dates.sort(reverse=True)
dates = [str(i) for i in dates]
@@ -615,13 +689,11 @@ def changedetection_app(config=None, datastore_o=None):
# Save the current newest history as the most recently viewed
datastore.set_last_viewed(uuid, dates[0])
newest_file = watch['history'][dates[0]]
with open(newest_file, 'r') as f:
newest_version_file_contents = f.read()
previous_version = request.args.get('previous_version')
try:
previous_file = watch['history'][previous_version]
except KeyError:
@@ -675,7 +747,7 @@ def changedetection_app(config=None, datastore_o=None):
@app.route("/favicon.ico", methods=['GET'])
def favicon():
return send_from_directory("/app/static/images", filename="favicon.ico")
return send_from_directory("static/images", path="favicon.ico")
# We're good but backups are even better!
@app.route("/backup", methods=['GET'])
@@ -686,7 +758,8 @@ def changedetection_app(config=None, datastore_o=None):
from pathlib import Path
# Remove any existing backup file, for now we just keep one file
for previous_backup_filename in Path(app.config['datastore_path']).rglob('changedetection-backup-*.zip'):
for previous_backup_filename in Path(datastore_o.datastore_path).rglob('changedetection-backup-*.zip'):
os.unlink(previous_backup_filename)
# create a ZipFile object
@@ -694,7 +767,7 @@ def changedetection_app(config=None, datastore_o=None):
# We only care about UUIDS from the current index file
uuids = list(datastore.data['watching'].keys())
backup_filepath = os.path.join(app.config['datastore_path'], backupname)
backup_filepath = os.path.join(datastore_o.datastore_path, backupname)
with zipfile.ZipFile(backup_filepath, "w",
compression=zipfile.ZIP_DEFLATED,
@@ -704,22 +777,22 @@ def changedetection_app(config=None, datastore_o=None):
datastore.sync_to_json()
# Add the index
zipObj.write(os.path.join(app.config['datastore_path'], "url-watches.json"), arcname="url-watches.json")
zipObj.write(os.path.join(datastore_o.datastore_path, "url-watches.json"), arcname="url-watches.json")
# Add the flask app secret
zipObj.write(os.path.join(app.config['datastore_path'], "secret.txt"), arcname="secret.txt")
zipObj.write(os.path.join(datastore_o.datastore_path, "secret.txt"), arcname="secret.txt")
# Add any snapshot data we find, use the full path to access the file, but make the file 'relative' in the Zip.
for txt_file_path in Path(app.config['datastore_path']).rglob('*.txt'):
for txt_file_path in Path(datastore_o.datastore_path).rglob('*.txt'):
parent_p = txt_file_path.parent
if parent_p.name in uuids:
zipObj.write(txt_file_path,
arcname=str(txt_file_path).replace(app.config['datastore_path'], ''),
arcname=str(txt_file_path).replace(datastore_o.datastore_path, ''),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
# Create a list file with just the URLs, so it's easier to port somewhere else in the future
list_file = os.path.join(app.config['datastore_path'], "url-list.txt")
list_file = os.path.join(datastore_o.datastore_path, "url-list.txt")
with open(list_file, "w") as f:
for uuid in datastore.data['watching']:
url = datastore.data['watching'][uuid]['url']
@@ -731,13 +804,14 @@ def changedetection_app(config=None, datastore_o=None):
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
return send_from_directory(app.config['datastore_path'], backupname, as_attachment=True)
# Send_from_directory needs to be the full absolute path
return send_from_directory(os.path.abspath(datastore_o.datastore_path), backupname, as_attachment=True)
@app.route("/static/<string:group>/<string:filename>", methods=['GET'])
def static_content(group, filename):
# These files should be in our subdirectory
try:
return send_from_directory("static/{}".format(group), filename=filename)
return send_from_directory("static/{}".format(group), path=filename)
except FileNotFoundError:
abort(404)
@@ -778,9 +852,13 @@ def changedetection_app(config=None, datastore_o=None):
@app.route("/api/clone", methods=['GET'])
@login_required
def api_clone():
uuid = request.args.get('uuid')
datastore.clone(uuid)
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
new_uuid = datastore.clone(uuid)
update_q.put(new_uuid)
flash('Cloned.')
return redirect(url_for('index'))
@@ -827,8 +905,10 @@ def changedetection_app(config=None, datastore_o=None):
threading.Thread(target=notification_runner).start()
# Check for new release version
threading.Thread(target=check_for_new_version).start()
# Check for new release version, but not when running in test/build
if not os.getenv("GITHUB_REF", False):
threading.Thread(target=check_for_new_version).start()
return app
@@ -877,8 +957,6 @@ def notification_runner():
except Exception as e:
print("Watch URL: {} Error {}".format(n_object['watch_url'], e))
# Thread runner to check every minute, look for new watches to feed into the Queue.
def ticker_thread_check_time_launch_checks():
from changedetectionio import update_worker
@@ -902,7 +980,6 @@ def ticker_thread_check_time_launch_checks():
# Check for watches outside of the time threshold to put in the thread queue.
for uuid, watch in copied_datastore.data['watching'].items():
# If they supplied an individual entry minutes to threshold.
if 'minutes_between_check' in watch and watch['minutes_between_check'] is not None:
# Cast to int just incase


@@ -3,17 +3,25 @@ import time
from abc import ABC, abstractmethod
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy as SeleniumProxy
from selenium.common.exceptions import WebDriverException
import urllib3.exceptions
class EmptyReply(Exception):
def __init__(self, status_code, url):
# Set this so we can use it in other parts of the app
self.status_code = status_code
self.url = url
return
pass
class Fetcher():
error = None
status_code = None
content = None # Should be bytes?
content = None # Should always be bytes.
headers = None
fetcher_description ="No description"
@@ -22,7 +30,7 @@ class Fetcher():
return self.error
@abstractmethod
def run(self, url, timeout, request_headers):
def run(self, url, timeout, request_headers, request_body, request_method):
# Should set self.error, self.status_code and self.content
pass
@@ -65,15 +73,39 @@ class html_webdriver(Fetcher):
command_executor = ''
def __init__(self):
self.command_executor = os.getenv("WEBDRIVER_URL", 'http://browser-chrome:4444/wd/hub')
# Configs for Proxy setup
# In the ENV vars, is prefixed with "webdriver_", so it is for example "webdriver_sslProxy"
selenium_proxy_settings_mappings = ['proxyType', 'ftpProxy', 'httpProxy', 'noProxy',
'proxyAutoconfigUrl', 'sslProxy', 'autodetect',
'socksProxy', 'socksVersion', 'socksUsername', 'socksPassword']
def run(self, url, timeout, request_headers):
proxy=None
def __init__(self):
# .strip('"') is going to save someone a lot of time when they accidently wrap the env value
self.command_executor = os.getenv("WEBDRIVER_URL", 'http://browser-chrome:4444/wd/hub').strip('"')
# If any proxy settings are enabled, then we should setup the proxy object
proxy_args = {}
for k in self.selenium_proxy_settings_mappings:
v = os.getenv('webdriver_' + k, False)
if v:
proxy_args[k] = v.strip('"')
if proxy_args:
self.proxy = SeleniumProxy(raw=proxy_args)
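A small, hypothetical illustration of the `webdriver_`-prefixed convention above (the proxy values are made up); this is how `__init__()` ends up assembling the dict it hands to `SeleniumProxy(raw=...)`:

```python
import os

# Hypothetical settings, e.g. exported via docker-compose for the webdriver fetcher
os.environ["webdriver_proxyType"] = "MANUAL"
os.environ["webdriver_sslProxy"] = "socks5://10.10.1.10:1080"
os.environ["webdriver_socksVersion"] = "5"

mappings = ['proxyType', 'ftpProxy', 'httpProxy', 'noProxy',
            'proxyAutoconfigUrl', 'sslProxy', 'autodetect',
            'socksProxy', 'socksVersion', 'socksUsername', 'socksPassword']

# Same collection logic as __init__ above: any set "webdriver_<key>" becomes a proxy arg
proxy_args = {k: os.getenv('webdriver_' + k).strip('"')
              for k in mappings if os.getenv('webdriver_' + k, False)}
print(proxy_args)  # {'proxyType': 'MANUAL', 'sslProxy': 'socks5://...', 'socksVersion': '5'}
```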
def run(self, url, timeout, request_headers, request_body, request_method):
# request_body, request_method unused for now, until some magic in the future happens.
# check env for WEBDRIVER_URL
driver = webdriver.Remote(
command_executor=self.command_executor,
desired_capabilities=DesiredCapabilities.CHROME)
desired_capabilities=DesiredCapabilities.CHROME,
proxy=self.proxy)
try:
driver.get(url)
@@ -84,10 +116,13 @@ class html_webdriver(Fetcher):
# @todo - how to check this? is it possible?
self.status_code = 200
# @todo somehow we should try to get this working for WebDriver
# raise EmptyReply(url=url, status_code=r.status_code)
# @todo - dom wait loaded?
time.sleep(5)
self.content = driver.page_source
self.headers = {}
driver.quit()
@@ -104,28 +139,32 @@ class html_webdriver(Fetcher):
# driver.quit() seems to cause better exceptions
driver.quit()
return True
# "html_requests" is listed as the default fetcher in store.py!
class html_requests(Fetcher):
fetcher_description = "Basic fast Plaintext/HTTP Client"
def run(self, url, timeout, request_headers):
def run(self, url, timeout, request_headers, request_body, request_method):
import requests
r = requests.get(url,
r = requests.request(method=request_method,
data=request_body,
url=url,
headers=request_headers,
timeout=timeout,
verify=False)
# https://stackoverflow.com/questions/44203397/python-requests-get-returns-improperly-decoded-text-instead-of-utf-8
# Return bytes here
html = r.text
# @todo test this
# @todo maybe you really want to test zero-byte return pages?
if not r or not html or not len(html):
raise EmptyReply(url)
raise EmptyReply(url=url, status_code=r.status_code)
self.status_code = r.status_code
self.content = html
self.headers = r.headers

changedetectionio/diff.py (new file, 43 lines)

@@ -0,0 +1,43 @@
# used for the notifications, the front-end is using a JS library
import difflib
# like .compare but a little different output
def customSequenceMatcher(before, after, include_equal=False):
cruncher = difflib.SequenceMatcher(isjunk=lambda x: x in " \\t", a=before, b=after)
for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
if include_equal and tag == 'equal':
g = before[alo:ahi]
yield g
elif tag == 'delete':
g = "(removed) {}".format(before[alo])
yield g
elif tag == 'replace':
g = ["(changed) {}".format(before[alo]), "(-> into) {}".format(after[blo])]
yield g
elif tag == 'insert':
g = "(added) {}".format(after[blo])
yield g
# only_differences - only return info about the differences, no context
# line_feed_sep could be "<br/>" or "<li>" or "\n" etc
def render_diff(previous_file, newest_file, include_equal=False, line_feed_sep="\n"):
with open(newest_file, 'r') as f:
newest_version_file_contents = f.read()
newest_version_file_contents = [line.rstrip() for line in newest_version_file_contents.splitlines()]
if previous_file:
with open(previous_file, 'r') as f:
previous_version_file_contents = f.read()
previous_version_file_contents = [line.rstrip() for line in previous_version_file_contents.splitlines()]
else:
previous_version_file_contents = ""
rendered_diff = customSequenceMatcher(previous_version_file_contents,
newest_version_file_contents,
include_equal)
# Recursively join lists
f = lambda L: line_feed_sep.join([f(x) if type(x) is list else x for x in L])
return f(rendered_diff)
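A hedged usage sketch of the new helper, roughly as it would be called when filling the `{diff}` notification token; the snapshot files and their contents are made up, and the import assumes the package is installed:

```python
import os
import tempfile
from changedetectionio import diff   # the module added above

# Two hypothetical text snapshots, like those kept in a watch's history
workdir = tempfile.mkdtemp()
previous_file = os.path.join(workdir, "before.txt")
newest_file = os.path.join(workdir, "after.txt")
with open(previous_file, "w") as f:
    f.write("Price 23.50\nIn stock\n")
with open(newest_file, "w") as f:
    f.write("Price 19.99\nIn stock\nFree shipping\n")

# line_feed_sep can be "\n" for plaintext notifications or "<br/>" for HTML ones
print(diff.render_diff(previous_file, newest_file, line_feed_sep="\n"))
# (changed) Price 23.50
# (-> into) Price 19.99
# (added) Free shipping
```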


@@ -58,10 +58,7 @@ class perform_site_check():
watch = self.datastore.data['watching'][uuid]
update_obj = {'previous_md5': self.datastore.data['watching'][uuid]['previous_md5'],
'history': {},
"last_checked": timestamp
}
update_obj = {}
extra_headers = self.datastore.get_val(uuid, 'headers')
@@ -80,6 +77,8 @@ class perform_site_check():
else:
timeout = self.datastore.data['settings']['requests']['timeout']
url = self.datastore.get_val(uuid, 'url')
request_body = self.datastore.get_val(uuid, 'body')
request_method = self.datastore.get_val(uuid, 'method')
# Pluggable content fetcher
prefer_backend = watch['fetch_backend']
@@ -91,7 +90,7 @@ class perform_site_check():
fetcher = klass()
fetcher.run(url, timeout, request_headers)
fetcher.run(url, timeout, request_headers, request_body, request_method)
# Fetching complete, now filters
# @todo move to class / maybe inside of fetcher abstract base?
@@ -102,41 +101,61 @@ class perform_site_check():
# https://stackoverflow.com/questions/41817578/basic-method-chaining ?
# return content().textfilter().jsonextract().checksumcompare() ?
is_html = True
is_json = fetcher.headers.get('Content-Type', '') == 'application/json'
is_html = not is_json
css_filter_rule = watch['css_filter']
if css_filter_rule and len(css_filter_rule.strip()):
has_filter_rule = css_filter_rule and len(css_filter_rule.strip())
if is_json and not has_filter_rule:
css_filter_rule = "json:$"
has_filter_rule = True
if has_filter_rule:
if 'json:' in css_filter_rule:
stripped_text_from_html = html_tools.extract_json_as_string(content=fetcher.content, jsonpath_filter=css_filter_rule)
is_html = False
else:
# CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text
stripped_text_from_html = html_tools.css_filter(css_filter=css_filter_rule, html_content=fetcher.content)
if is_html:
# CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text
html_content = fetcher.content
if css_filter_rule and len(css_filter_rule.strip()):
html_content = html_tools.css_filter(css_filter=css_filter_rule, html_content=fetcher.content)
if has_filter_rule:
# For HTML/XML we offer xpath as an option, just start a regular xPath "/.."
if css_filter_rule[0] == '/':
html_content = html_tools.xpath_filter(xpath_filter=css_filter_rule, html_content=fetcher.content)
else:
# CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text
html_content = html_tools.css_filter(css_filter=css_filter_rule, html_content=fetcher.content)
# get_text() via inscriptis
stripped_text_from_html = get_text(html_content)
# Re #340 - return the content before the 'ignore text' was applied
text_content_before_ignored_filter = stripped_text_from_html.encode('utf-8')
# We rely on the actual text in the html output.. many sites have random script vars etc,
# in the future we'll implement other mechanisms.
update_obj["last_check_status"] = fetcher.get_last_status_code()
update_obj["last_error"] = False
# If there's text to skip
# @todo we could abstract out the get_text() to handle this cleaner
if len(watch['ignore_text']):
stripped_text_from_html = self.strip_ignore_text(stripped_text_from_html, watch['ignore_text'])
text_to_ignore = watch.get('ignore_text', []) + self.datastore.data['settings']['application'].get('global_ignore_text', [])
if len(text_to_ignore):
stripped_text_from_html = self.strip_ignore_text(stripped_text_from_html, text_to_ignore)
else:
stripped_text_from_html = stripped_text_from_html.encode('utf8')
# Re #133 - if we should strip whitespaces from triggering the change detected comparison
if self.datastore.data['settings']['application'].get('ignore_whitespace', False):
fetched_md5 = hashlib.md5(stripped_text_from_html.translate(None, b'\r\n\t ')).hexdigest()
else:
fetched_md5 = hashlib.md5(stripped_text_from_html).hexdigest()
fetched_md5 = hashlib.md5(stripped_text_from_html).hexdigest()
# On the first run of a site, watch['previous_md5'] will be an empty string, set it the current one.
if not len(watch['previous_md5']):
watch['previous_md5'] = fetched_md5
update_obj["previous_md5"] = fetched_md5
blocked_by_not_found_trigger_text = False
@@ -158,16 +177,12 @@ class perform_site_check():
break
# could be None or False depending on JSON type
# On the first run of a site, watch['previous_md5'] will be an empty string
if not blocked_by_not_found_trigger_text and watch['previous_md5'] != fetched_md5:
changed_detected = True
# Don't confuse people by updating as last-changed, when it actually just changed from None..
if self.datastore.get_val(uuid, 'previous_md5'):
update_obj["last_changed"] = timestamp
update_obj["previous_md5"] = fetched_md5
update_obj["last_changed"] = timestamp
# Extract title as title
if is_html:
@@ -176,4 +191,4 @@ class perform_site_check():
update_obj['title'] = html_tools.extract_element(find='title', html_content=fetcher.content)
return changed_detected, update_obj, stripped_text_from_html
return changed_detected, update_obj, text_content_before_ignored_filter


@@ -6,6 +6,18 @@ from wtforms.fields import html5
from changedetectionio import content_fetcher
import re
from changedetectionio.notification import default_notification_format, valid_notification_formats, default_notification_body, default_notification_title
valid_method = {
'GET',
'POST',
'PUT',
'PATCH',
'DELETE',
}
default_method = 'GET'
class StringListField(StringField):
widget = widgets.TextArea()
@@ -104,10 +116,12 @@ class ValidateContentFetcherIsReady(object):
except urllib3.exceptions.MaxRetryError as e:
driver_url = some_object.command_executor
message = field.gettext('Content fetcher \'%s\' did not respond.' % (field.data))
message += '<br/>'+field.gettext('Be sure that the selenium/webdriver runner is running and accessible via network from this container/host.')
message += '<br/>' + field.gettext(
'Be sure that the selenium/webdriver runner is running and accessible via network from this container/host.')
message += '<br/>' + field.gettext('Did you follow the instructions in the wiki?')
message += '<br/><br/>' + field.gettext('WebDriver Host: %s' % (driver_url))
message += '<br/><a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">Go here for more information</a>'
message += '<br/>'+field.gettext('Content fetcher did not respond properly, unable to use it.\n %s' % (str(e)))
raise ValidationError(message)
@@ -167,7 +181,7 @@ class ValidateListRegex(object):
message = field.gettext('RegEx \'%s\' is not a valid regular expression.')
raise ValidationError(message % (line))
class ValidateCSSJSONInput(object):
class ValidateCSSJSONXPATHInput(object):
"""
Filter validation
@todo CSS validator ;)
@@ -177,18 +191,40 @@ class ValidateCSSJSONInput(object):
self.message = message
def __call__(self, form, field):
# Nothing to see here
if not len(field.data.strip()):
return
# Does it look like XPath?
if field.data.strip()[0] == '/':
from lxml import html, etree
tree = html.fromstring("<html></html>")
try:
tree.xpath(field.data.strip())
except etree.XPathEvalError as e:
message = field.gettext('\'%s\' is not a valid XPath expression. (%s)')
raise ValidationError(message % (field.data, str(e)))
except:
raise ValidationError("A system-error occurred when validating your XPath expression")
if 'json:' in field.data:
from jsonpath_ng.exceptions import JsonPathParserError
from jsonpath_ng import jsonpath, parse
from jsonpath_ng.exceptions import JsonPathParserError, JsonPathLexerError
from jsonpath_ng.ext import parse
input = field.data.replace('json:', '')
try:
parse(input)
except JsonPathParserError as e:
except (JsonPathParserError, JsonPathLexerError) as e:
message = field.gettext('\'%s\' is not a valid JSONPath expression. (%s)')
raise ValidationError(message % (input, str(e)))
except:
raise ValidationError("A system-error occurred when validating your JSONPath expression")
# Re #265 - maybe in the future fetch the page and offer a
# warning/notice that its possible the rule doesnt yet match anything?
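A minimal reproduction of the failure the XPath branch of this validator catches before a watch is saved; the broken expression is made up:

```python
from lxml import etree, html

tree = html.fromstring("<html></html>")
try:
    tree.xpath("//div[@class='price'")   # deliberately malformed: missing closing bracket
except etree.XPathEvalError as e:
    # ValidateCSSJSONXPATHInput turns this into the form error
    # "'...' is not a valid XPath expression. (...)"
    print("invalid XPath:", e)
```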
class quickWatchForm(Form):
# https://wtforms.readthedocs.io/en/2.3.x/fields/#module-wtforms.fields.html5
@@ -199,8 +235,9 @@ class quickWatchForm(Form):
class commonSettingsForm(Form):
notification_urls = StringListField('Notification URL List', validators=[validators.Optional(), ValidateAppRiseServers()])
notification_title = StringField('Notification Title', default='ChangeDetection.io Notification - {watch_url}', validators=[validators.Optional(), ValidateTokensList()])
notification_body = TextAreaField('Notification Body', default='{watch_url} had a change.', validators=[validators.Optional(), ValidateTokensList()])
notification_title = StringField('Notification Title', default=default_notification_title, validators=[validators.Optional(), ValidateTokensList()])
notification_body = TextAreaField('Notification Body', default=default_notification_body, validators=[validators.Optional(), ValidateTokensList()])
notification_format = SelectField('Notification Format', choices=valid_notification_formats.keys(), default=default_notification_format)
trigger_check = BooleanField('Send test notification on save')
fetch_backend = RadioField(u'Fetch Method', choices=content_fetcher.available_fetchers(), validators=[ValidateContentFetcherIsReady()])
extract_title_as_title = BooleanField('Extract <title> from document and use as watch title', default=False)
@@ -212,13 +249,27 @@ class watchForm(commonSettingsForm):
minutes_between_check = html5.IntegerField('Maximum time in minutes until recheck',
[validators.Optional(), validators.NumberRange(min=1)])
css_filter = StringField('CSS/JSON Filter', [ValidateCSSJSONInput()])
css_filter = StringField('CSS/JSON/XPATH Filter', [ValidateCSSJSONXPATHInput()])
title = StringField('Title')
ignore_text = StringListField('Ignore Text', [ValidateListRegex()])
headers = StringDictKeyValue('Request Headers')
body = TextAreaField('Request Body', [validators.Optional()])
method = SelectField('Request Method', choices=valid_method, default=default_method)
trigger_text = StringListField('Trigger/wait for text', [validators.Optional(), ValidateListRegex()])
def validate(self, **kwargs):
if not super().validate():
return False
result = True
# Fail form validation when a body is set for a GET
if self.method.data == 'GET' and self.body.data:
self.body.errors.append('Body must be empty when Request Method is set to GET')
result = False
return result
class globalSettingsForm(commonSettingsForm):
@@ -227,3 +278,5 @@ class globalSettingsForm(commonSettingsForm):
[validators.NumberRange(min=1)])
extract_title_as_title = BooleanField('Extract <title> from document and use as watch title')
base_url = StringField('Base URL', validators=[validators.Optional()])
global_ignore_text = StringListField('Ignore Text', [ValidateListRegex()])
ignore_whitespace = BooleanField('Ignore whitespace')


@@ -1,6 +1,6 @@
import json
from bs4 import BeautifulSoup
from jsonpath_ng import parse
from jsonpath_ng.ext import parse
class JSONNotFound(ValueError):
@@ -17,6 +17,20 @@ def css_filter(css_filter, html_content):
return html_block + "\n"
# Return str Utf-8 of matched rules
def xpath_filter(xpath_filter, html_content):
from lxml import html
from lxml import etree
tree = html.fromstring(html_content)
html_block = ""
for item in tree.xpath(xpath_filter.strip()):
html_block+= etree.tostring(item, pretty_print=True).decode('utf-8')+"<br/>"
return html_block
# Extract/find element
def extract_element(find='title', html_content=''):
@@ -45,8 +59,10 @@ def _parse_json(json_data, jsonpath_filter):
if len(match) == 1:
s = match[0].value
if not s:
raise JSONNotFound("No Matching JSON could be found for the rule {}".format(jsonpath_filter.replace('json:', '')))
# Re #257 - Better handling where it does not exist, in the case the original 's' value was False..
if not match:
# Re 265 - Just return an empty string when filter not found
return ''
stripped_text_from_html = json.dumps(s, indent=4)
@@ -85,6 +101,7 @@ def extract_json_as_string(content, jsonpath_filter):
break
if not stripped_text_from_html:
raise JSONNotFound("No JSON matching the rule '%s' found" % jsonpath_filter.replace('json:',''))
# Re 265 - Just return an empty string when filter not found
return ''
return stripped_text_from_html
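Together these helpers give two filter paths: an lxml-based XPath extractor and a JSONPath lookup that now returns an empty string instead of raising when nothing matches (Re #265). Illustrative use of the XPath helper against a tiny sample document (assuming the module import path the tests use):

from changedetectionio import html_tools

sample = "<html><body><div class='sametext'>Same</div><div class='changetext'>Will change</div></body></html>"
print(html_tools.xpath_filter("//*[contains(@class, 'sametext')]", sample))
# Expected: the serialised <div class="sametext"> element followed by "<br/>"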


@@ -1,5 +1,5 @@
import os
import apprise
from apprise import NotifyFormat
valid_tokens = {
'base_url': '',
@@ -7,11 +7,22 @@ valid_tokens = {
'watch_uuid': '',
'watch_title': '',
'watch_tag': '',
'diff': '',
'diff_full': '',
'diff_url': '',
'preview_url': '',
'current_snapshot': ''
}
valid_notification_formats = {
'Text': NotifyFormat.TEXT,
'Markdown': NotifyFormat.MARKDOWN,
'HTML': NotifyFormat.HTML,
}
default_notification_format = 'Text'
default_notification_body = '{watch_url} had a change.\n---\n{diff}\n---\n'
default_notification_title = 'ChangeDetection.io Notification - {watch_url}'
def process_notification(n_object, datastore):
import logging
@@ -25,8 +36,13 @@ def process_notification(n_object, datastore):
apobj.add(url)
# Get the notification body from datastore
n_body = n_object['notification_body']
n_title = n_object['notification_title']
n_body = n_object.get('notification_body', default_notification_body)
n_title = n_object.get('notification_title', default_notification_title)
n_format = valid_notification_formats.get(
n_object['notification_format'],
valid_notification_formats[default_notification_format],
)
# Insert variables into the notification content
notification_parameters = create_notification_parameters(n_object, datastore)
@@ -39,7 +55,8 @@ def process_notification(n_object, datastore):
apobj.notify(
body=n_body,
title=n_title
title=n_title,
body_format=n_format,
)
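For reference, the Apprise call pattern that body_format plugs into looks roughly like this outside the app (the service URL below is a placeholder):

import apprise
from apprise import NotifyFormat

apobj = apprise.Apprise()
apobj.add("json://example.com/notify")   # placeholder Apprise URL
apobj.notify(
    title="ChangeDetection.io Notification - https://example.com",
    body="https://example.com had a change.\n---\n(diff output here)\n---\n",
    body_format=NotifyFormat.TEXT,       # or MARKDOWN / HTML, per valid_notification_formats
)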
# Notification title + body content parameters get created here.
@@ -74,15 +91,17 @@ def create_notification_parameters(n_object, datastore):
# Valid_tokens also used as a field validator
tokens.update(
{
'base_url': base_url if base_url is not None else '',
'watch_url': watch_url,
'watch_uuid': uuid,
'watch_title': watch_title if watch_title is not None else '',
'watch_tag': watch_tag if watch_tag is not None else '',
'diff_url': diff_url,
'preview_url': preview_url,
'current_snapshot': n_object['current_snapshot'] if 'current_snapshot' in n_object else ''
})
{
'base_url': base_url if base_url is not None else '',
'watch_url': watch_url,
'watch_uuid': uuid,
'watch_title': watch_title if watch_title is not None else '',
'watch_tag': watch_tag if watch_tag is not None else '',
'diff_url': diff_url,
'diff': n_object.get('diff', ''), # Null default in the case we use a test
'diff_full': n_object.get('diff_full', ''), # Null default in the case we use a test
'preview_url': preview_url,
'current_snapshot': n_object['current_snapshot'] if 'current_snapshot' in n_object else ''
})
return tokens
return tokens


@@ -9,15 +9,16 @@
# exit when any command fails
set -e
find tests/test_*py -type f|while read test_name
do
echo "TEST RUNNING $test_name"
pytest $test_name
done
echo "RUNNING WITH BASE_URL SET"
# Now re-run some tests with BASE_URL enabled
# Re #65 - Ability to include a link back to the installation, in the notification.
export BASE_URL="https://really-unique-domain.io"
pytest tests/test_notification.py


@@ -3,7 +3,7 @@
padding: 2em;
margin: 1em;
border-radius: 5px;
font-size: 9px; }
font-size: 11px; }
#diff-ui table {
table-layout: fixed;
width: 100%; }


@@ -4,7 +4,7 @@
padding: 2em;
margin: 1em;
border-radius: 5px;
font-size: 9px;
font-size: 11px;
table {
table-layout: fixed;


@@ -4,8 +4,8 @@
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"scss": "node-sass --watch styles.scss diff.scss -o ."
"build": "node-sass styles.scss diff.scss -o .",
"watch": "node-sass --watch styles.scss diff.scss -o ."
},
"author": "",
"license": "ISC",

File diff suppressed because one or more lines are too long


@@ -1,7 +1,10 @@
/*
* -- BASE STYLES --
* Most of these are inherited from Base, but I want to change a few.
* npm run scss
* nvm use v14.18.1
* npm install
* npm run build
* or npm run watch
*/
body {
color: #333;


@@ -9,6 +9,8 @@ import time
import threading
import os
from changedetectionio.notification import default_notification_format, default_notification_body, default_notification_title
# Is there an existing library to ensure some data store (JSON etc) is in sync with CRUD methods?
# Open a github issue if you know something :)
# https://stackoverflow.com/questions/6190468/how-to-trigger-function-on-value-change
@@ -16,6 +18,8 @@ class ChangeDetectionStore:
lock = Lock()
def __init__(self, datastore_path="/datastore", include_default_watches=True, version_tag="0.0.0"):
# Should only be active for docker
# logging.basicConfig(filename='/dev/stdout', level=logging.INFO)
self.needs_write = False
self.datastore_path = datastore_path
self.json_store_path = "{}/url-watches.json".format(self.datastore_path)
@@ -41,10 +45,13 @@ class ChangeDetectionStore:
'base_url' : None,
'extract_title_as_title': False,
'fetch_backend': 'html_requests',
'global_ignore_text': [], # List of text to ignore when calculating the comparison checksum
'ignore_whitespace': False,
'notification_urls': [], # Apprise URL list
# Custom notification content
'notification_title': None,
'notification_body': None,
'notification_format': None
}
}
}
@@ -65,12 +72,15 @@ class ChangeDetectionStore:
'previous_md5': "",
'uuid': str(uuid_builder.uuid4()),
'headers': {}, # Extra headers to send
'body': None,
'method': 'GET',
'history': {}, # Dict of timestamp and output stripped filename
'ignore_text': [], # List of text to ignore when calculating the comparison checksum
# Custom notification content
'notification_urls': [], # List of URLs to add to the notification Queue (Usually AppRise)
'notification_title': None,
'notification_body': None,
'notification_format': None,
'css_filter': "",
'trigger_text': [], # List of text or regex to wait for until a change is detected
'fetch_backend': None,
@@ -141,6 +151,12 @@ class ChangeDetectionStore:
else:
self.__data['app_guid'] = str(uuid_builder.uuid4())
# Generate the URL access token for RSS feeds
if not 'rss_access_token' in self.__data['settings']['application']:
import secrets
secret = secrets.token_hex(16)
self.__data['settings']['application']['rss_access_token'] = secret
self.needs_write = True
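secrets.token_hex(16) returns a 32-character hex string, which is what makes the RSS URL effectively unguessable; for example:

import secrets

token = secrets.token_hex(16)
print(token)        # e.g. 'a3f1...' (value differs every run)
print(len(token))   # 32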
# Finally start the thread that will manage periodic data saves to JSON
@@ -153,6 +169,7 @@ class ChangeDetectionStore:
dates = list(self.__data['watching'][uuid]['history'].keys())
# Convert to int, sort and back to str again
# @todo replace datastore getter that does this automatically
dates = [int(i) for i in dates]
dates.sort(reverse=True)
if len(dates):
@@ -204,8 +221,8 @@ class ChangeDetectionStore:
# Re #152, Return env base_url if not overridden, @todo also prefer the proxy pass url
env_base_url = os.getenv('BASE_URL','')
if self.__data['settings']['application']['base_url'] is None and len(env_base_url) >0:
self.__data['settings']['application']['base_url'] = env_base_url.strip('" ')
if not self.__data['settings']['application']['base_url']:
self.__data['settings']['application']['base_url'] = env_base_url.strip('" ')
self.__data['has_unviewed'] = has_unviewed
@@ -251,24 +268,11 @@ class ChangeDetectionStore:
# Clone a watch by UUID
def clone(self, uuid):
with self.lock:
new_uuid = str(uuid_builder.uuid4())
_clone = deepcopy(self.data['watching'][uuid])
_clone.update({'uuid': new_uuid})
attributes_to_reset = [
'last_checked',
'last_changed',
'last_viewed',
'newest_history_key',
'previous_md5',
'history'
]
for attribute in attributes_to_reset:
_clone.update({attribute: self.generic_definition[attribute]})
self.data['watching'][new_uuid] = _clone
self.needs_write = True
url = self.data['watching'][uuid]['url']
tag = self.data['watching'][uuid]['tag']
extras = self.data['watching'][uuid]
new_uuid = self.add_watch(url=url, tag=tag, extras=extras)
return new_uuid
def url_exists(self, url):
@@ -325,17 +329,27 @@ class ChangeDetectionStore:
self.needs_write = True
return changes_removed
def add_watch(self, url, tag):
def add_watch(self, url, tag, extras=None):
if extras is None:
extras = {}
with self.lock:
# @todo use a common generic version of this
new_uuid = str(uuid_builder.uuid4())
_blank = deepcopy(self.generic_definition)
_blank.update({
'url': url,
'tag': tag,
'uuid': new_uuid
'tag': tag
})
# In case these are copied across, assume it's a reference and deepcopy()
apply_extras = deepcopy(extras)
for k in ['uuid', 'history', 'last_checked', 'last_changed', 'newest_history_key', 'previous_md5', 'viewed']:
if k in apply_extras:
del apply_extras[k]
_blank.update(apply_extras)
self.data['watching'][new_uuid] = _blank
# Get the directory ready
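With this change clone() no longer copies state by hand: it passes the old watch dict to add_watch() as extras, and add_watch() strips the per-watch state keys before applying them to a fresh blank definition. A compact sketch of that pattern on a plain dict (names here are illustrative):

from copy import deepcopy
import uuid as uuid_builder

RESET_KEYS = ['uuid', 'history', 'last_checked', 'last_changed',
              'newest_history_key', 'previous_md5', 'viewed']

def clone_watch(watches, source_uuid, blank_definition):
    extras = deepcopy(watches[source_uuid])
    for k in RESET_KEYS:
        extras.pop(k, None)              # never carry per-watch state across to the clone
    new_uuid = str(uuid_builder.uuid4())
    new_watch = deepcopy(blank_definition)
    new_watch.update(extras)
    new_watch['uuid'] = new_uuid
    watches[new_uuid] = new_watch
    return new_uuid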
@@ -354,6 +368,10 @@ class ChangeDetectionStore:
import uuid
output_path = "{}/{}".format(self.datastore_path, watch_uuid)
# In case the operator deleted it, check and create.
if not os.path.isdir(output_path):
mkdir(output_path)
fname = "{}/{}.stripped.txt".format(output_path, uuid.uuid4())
with open(fname, 'wb') as f:
f.write(contents)
@@ -362,20 +380,30 @@ class ChangeDetectionStore:
return fname
def sync_to_json(self):
print("Saving..")
data ={}
logging.info("Saving JSON..")
try:
data = deepcopy(self.__data)
except RuntimeError:
time.sleep(0.5)
print ("! Data changed when writing to JSON, trying again..")
except RuntimeError as e:
# Try again in 15 seconds
time.sleep(15)
logging.error ("! Data changed when writing to JSON, trying again.. %s", str(e))
self.sync_to_json()
return
else:
with open(self.json_store_path, 'w') as json_file:
json.dump(data, json_file, indent=4)
logging.info("Re-saved index")
try:
# Re #286 - First write to a temp file, then confirm it looks OK and rename it
# This is a fairly basic strategy to deal with the case that the file is corrupted,
# system was out of memory, out of RAM etc
with open(self.json_store_path+".tmp", 'w') as json_file:
json.dump(data, json_file, indent=4)
except Exception as e:
logging.error("Error writing JSON!! (Main JSON file save was skipped) : %s", str(e))
else:
os.rename(self.json_store_path+".tmp", self.json_store_path)
self.needs_write = False
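The Re #286 change is a classic write-then-rename save: serialise to url-watches.json.tmp first, and only rename it over the live file once the write has completed, so a crash or out-of-memory condition mid-write cannot leave a truncated index behind. In isolation the pattern is just:

import json
import os

def safe_write_json(path, data):
    tmp_path = path + ".tmp"
    with open(tmp_path, 'w') as f:
        json.dump(data, f, indent=4)
    # rename() on the same filesystem replaces the target in one step,
    # so the original file is either fully replaced or left untouched
    os.rename(tmp_path, path)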
@@ -390,7 +418,13 @@ class ChangeDetectionStore:
if self.needs_write:
self.sync_to_json()
time.sleep(3)
# Once per minute is enough, more often and it can cause high CPU usage
# better here would be something like self.app.config.exit.wait(1), but we can't get to 'app' from here
for i in range(30):
time.sleep(2)
if self.stop_thread:
break
# Go through the datastore path and remove any snapshots that are not mentioned in the index
# This usually is not used, but can be handy.


@@ -1,7 +1,7 @@
{% from '_helpers.jinja' import render_field %}
{% macro render_notifications_field(form) %}
{% macro render_common_settings_form(form, current_base_url) %}
<div class="pure-control-group">
{{ render_field(form.notification_urls, rows=5, placeholder="Examples:
@@ -12,7 +12,7 @@
}}
<div class="pure-form-message-inline">Use <a target=_new
href="https://github.com/caronc/apprise">AppRise
URLs</a> for notification to just about any service!
URLs</a> for notification to just about any service! <i><a target=_new href="https://github.com/dgtlmoon/changedetection.io/wiki/Notification-configuration-notes">Please read the notification services wiki here for important configuration notes</a></i>
</div>
</div>
<div id="notification-customisation">
@@ -24,6 +24,10 @@
{{ render_field(form.notification_body , rows=5) }}
<span class="pure-form-message-inline">Body for all notifications</span>
</div>
<div class="pure-control-group">
{{ render_field(form.notification_format , rows=5) }}
<span class="pure-form-message-inline">Format for all notifications</span>
</div>
<div class="pure-controls">
<span class="pure-form-message-inline">
These tokens can be used in the notification body and title to
@@ -61,6 +65,14 @@
<td><code>{preview_url}</code></td>
<td>The URL of the preview page generated by changedetection.io.</td>
</tr>
<tr>
<td><code>{diff}</code></td>
<td>The diff output - differences only</td>
</tr>
<tr>
<td><code>{diff_full}</code></td>
<td>The diff output - full difference output</td>
</tr>
<tr>
<td><code>{diff_url}</code></td>
<td>The URL of the diff page generated by changedetection.io.</td>
@@ -74,7 +86,7 @@
</table>
<span class="pure-form-message-inline">
URLs generated by changedetection.io (such as <code>{diff_url}</code>) require the <code>BASE_URL</code> environment variable set.<br/>
Your <code>BASE_URL</code> var is currently "{{base_url}}"
Your <code>BASE_URL</code> var is currently "{{current_base_url}}"
</span>
</div>
</div>


@@ -1,7 +1,7 @@
{% extends 'base.html' %}
{% block content %}
{% from '_helpers.jinja' import render_field %}
{% from '_common_fields.jinja' import render_notifications_field %}
{% from '_common_fields.jinja' import render_common_settings_form %}
<script type="text/javascript" src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
<div class="edit-form monospaced-textarea">
@@ -9,9 +9,9 @@
<div class="tabs">
<ul>
<li class="tab" id="default-tab"><a href="#general">General</a></li>
<li class="tab"><a href="#request">Request</a></li>
<li class="tab"><a href="#filters-and-triggers">Filters &amp; Triggers</a></li>
<li class="tab"><a href="#notifications">Notifications</a></li>
<li class="tab"><a href="#filters">Filters</a></li>
<li class="tab"><a href="#triggers">Triggers</a></li>
</ul>
</div>
@@ -41,37 +41,51 @@
href="{{ url_for('settings_page', uuid=uuid) }}">default global settings</a>.</span>
{% endif %}
</div>
<fieldset class="pure-group">
{{ render_field(form.headers, rows=5, placeholder="Example
Cookie: foobar
User-Agent: wonderbra 1.0") }}
<span class="pure-form-message-inline">
Note: ONLY used by Basic fast Plaintext/HTTP Client
</span>
</fieldset>
<div class="pure-control-group">
{{ render_field(form.fetch_backend) }}
<span class="pure-form-message-inline">
<p>Use the <strong>Basic</strong> method (default) where your watched sites don't need Javascript to render.</p>
<p>The <strong>Chrome/Javascript</strong> method requires a network connection to a running WebDriver+Chrome server. </p>
</span>
</div>
<div class="pure-control-group">
{{ render_field(form.extract_title_as_title) }}
</div>
</fieldset>
</div>
<div class="tab-pane-inner" id="request">
<div class="pure-control-group">
{{ render_field(form.fetch_backend) }}
<span class="pure-form-message-inline">
<p>Use the <strong>Basic</strong> method (default) where your watched site doesn't need Javascript to render.</p>
<p>The <strong>Chrome/Javascript</strong> method requires a network connection to a running WebDriver+Chrome server, set by the ENV var 'WEBDRIVER_URL'. </p>
</span>
</div>
<fieldset class="pure-group">
<div class="pure-control-group">
{{ render_field(form.method) }}
</div>
<strong>Note: <i>Request Headers and Body settings are ONLY used by Basic fast Plaintext/HTTP Client fetch method.</i></strong>
{{ render_field(form.headers, rows=5, placeholder="Example
Cookie: foobar
User-Agent: wonderbra 1.0") }}
</fieldset>
<div class="pure-control-group">
{{ render_field(form.body, rows=5, placeholder="Example
{
\"name\":\"John\",
\"age\":30,
\"car\":null
}") }}
</div>
</div>
<div class="tab-pane-inner" id="notifications">
<strong>Note: <i>These settings override the global settings.</i></strong>
<fieldset>
<div class="field-group">
{{ render_notifications_field(form) }}
{{ render_common_settings_form(form, current_base_url) }}
</div>
</fieldset>
</div>
<div class="tab-pane-inner" id="filters">
<div class="tab-pane-inner" id="filters-and-triggers">
<fieldset>
<div class="pure-control-group">
{{ render_field(form.css_filter, placeholder=".class-name or #some-id, or other CSS selector rule.",
@@ -81,8 +95,10 @@ User-Agent: wonderbra 1.0") }}
<li>CSS - Limit text to this CSS rule, only text matching this CSS rule is included.</li>
<li>JSON - Limit text to this JSON rule, using <a href="https://pypi.org/project/jsonpath-ng/">JSONPath</a>, prefix with <b>"json:"</b>, <a
href="https://jsonpath.com/" target="new">test your JSONPath here</a></li>
<li>XPATH - Limit text to this XPath rule, simply start with a forward-slash, example <b>//*[contains(@class, 'sametext')]</b>, <a
href="http://xpather.com/" target="new">test your XPath here</a></li>
</ul>
Please be sure that you thoroughly understand how to write CSS or JSONPath selector rules before filing an issue on GitHub! <a
Please be sure that you thoroughly understand how to write CSS or JSONPath, XPath selector rules before filing an issue on GitHub! <a
href="https://github.com/dgtlmoon/changedetection.io/wiki/CSS-Selector-help">here for more CSS selector help</a>.<br/>
</span>
</div>
@@ -98,9 +114,6 @@ User-Agent: wonderbra 1.0") }}
</span>
</fieldset>
</div>
<div class="tab-pane-inner" id="triggers">
<fieldset>
<div class="pure-control-group">
{{ render_field(form.trigger_text, rows=5, placeholder="Some text to wait for in a line
@@ -113,6 +126,7 @@ User-Agent: wonderbra 1.0") }}
</div>
</fieldset>
</div>
<div id="actions">
<div class="pure-control-group">


@@ -2,6 +2,7 @@
{% block content %}
<div class="edit-form">
<div class="box-wrap inner">
<form class="pure-form pure-form-stacked" action="{{url_for('scrub_page')}}" method="POST">
<fieldset>
<div class="pure-control-group">
@@ -30,6 +31,7 @@
</div>
</fieldset>
</form>
</div>
</div>
{% endblock %}


@@ -2,7 +2,7 @@
{% block content %}
{% from '_helpers.jinja' import render_field %}
{% from '_common_fields.jinja' import render_notifications_field %}
{% from '_common_fields.jinja' import render_common_settings_form %}
<script type="text/javascript" src="{{url_for('static_content', group='js', filename='settings.js')}}" defer></script>
<script type="text/javascript" src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>
@@ -13,6 +13,7 @@
<li class="tab" id="default-tab"><a href="#general">General</a></li>
<li class="tab"><a href="#notifications">Notifications</a></li>
<li class="tab"><a href="#fetching">Fetching</a></li>
<li class="tab"><a href="#filters">Global Filters</a></li>
</ul>
</div>
<div class="box-wrap inner">
@@ -36,7 +37,7 @@
{{ render_field(form.base_url, placeholder="http://yoursite.com:5000/",
class="m-d") }}
<span class="pure-form-message-inline">
Base URL used for the {base_url} token in notifications, default value is the ENV var 'base_url',
Base URL used for the {base_url} token in notifications and RSS links.<br/>Default value is the ENV var 'BASE_URL' (Currently "{{current_base_url}}"),
<a href="https://github.com/dgtlmoon/changedetection.io/wiki/Configurable-BASE_URL-setting">read more here</a>.
</span>
</div>
@@ -51,7 +52,7 @@
<div class="tab-pane-inner" id="notifications">
<fieldset>
<div class="field-group">
{{ render_notifications_field(form) }}
{{ render_common_settings_form(form, current_base_url) }}
</div>
</fieldset>
</div>
@@ -61,10 +62,33 @@
{{ render_field(form.fetch_backend) }}
<span class="pure-form-message-inline">
<p>Use the <strong>Basic</strong> method (default) where your watched sites don't need Javascript to render.</p>
<p>The <strong>Chrome/Javascript</strong> method requires a network connection to a running WebDriver+Chrome server. </p>
<p>The <strong>Chrome/Javascript</strong> method requires a network connection to a running WebDriver+Chrome server, set by the ENV var 'WEBDRIVER_URL'. </p>
</span>
</div>
</div>
<div class="tab-pane-inner" id="filters">
<fieldset class="pure-group">
{{ render_field(form.ignore_whitespace) }}
<span class="pure-form-message-inline">Ignore whitespace, tabs and new-lines/line-feeds when considering if a change was detected.<br/>
<i>Note:</i> Changing this will change the status of your existing watches and possibly trigger alerts etc.
</span>
</fieldset>
<fieldset class="pure-group">
{{ render_field(form.global_ignore_text, rows=5, placeholder="Some text to ignore in a line
/some.regex\d{2}/ for case-INsensitive regex
") }}
<span class="pure-form-message-inline">Note: This is applied globally in addition to the per-watch rules.</span><br/>
<span class="pure-form-message-inline">Each line processed separately, any line matching will be ignored.<br/>
Regular Expression support, wrap the line in forward slash <b>/regex/</b>.
</span>
</fieldset>
</div>
<div id="actions">
<div class="pure-control-group">
<button type="submit" class="pure-button pure-button-primary">Save</button>


@@ -49,7 +49,7 @@
<td class="title-col inline">{{watch.title if watch.title is not none and watch.title|length > 0 else watch.url}}
<a class="external" target="_blank" rel="noopener" href="{{ watch.url }}"></a>
{%if watch.fetch_backend == "html_webdriver" %}<img style="height: 1em; display:inline-block;" src="/static/images/Google-Chrome-icon.png" />{% endif %}
{%if watch.fetch_backend == "html_webdriver" %}<img style="height: 1em; display:inline-block;" src="static/images/Google-Chrome-icon.png" />{% endif %}
{% if watch.last_error is defined and watch.last_error != False %}
<div class="fetch-error">{{ watch.last_error }}</div>
@@ -92,9 +92,9 @@
all {% if active_tag%}in "{{active_tag}}"{%endif%}</a>
</li>
<li>
<a href="{{ url_for('index', tag=active_tag , rss=true)}}"><img id="feed-icon" src="{{url_for('static_content', group='images', filename='Generic_Feed-icon.svg')}}" height="15px"></a>
<a href="{{ url_for('rss', tag=active_tag , token=app_rss_token)}}"><img alt="RSS Feed" id="feed-icon" src="{{url_for('static_content', group='images', filename='Generic_Feed-icon.svg')}}" height="15"></a>
</li>
</ul>
</div>
</div>
{% endblock %}
{% endblock %}


@@ -18,11 +18,11 @@ def cleanup(datastore_path):
'url-watches.json',
'notification.txt',
'count.txt',
'endpoint-content.txt']
'endpoint-content.txt'
]
for file in files:
try:
os.unlink("{}/{}".format(datastore_path, file))
x = 1
except FileNotFoundError:
pass


@@ -59,7 +59,7 @@ def test_check_basic_change_detection_functionality(client, live_server):
assert b'unviewed' in res.data
# #75, and it should be in the RSS feed
res = client.get(url_for("index", rss="true"))
res = client.get(url_for("rss"))
expected_url = url_for('test_endpoint', _external=True)
assert b'<rss' in res.data
assert expected_url.encode('utf-8') in res.data


@@ -0,0 +1,25 @@
#!/usr/bin/python3
import time
from flask import url_for
from urllib.request import urlopen
from . util import set_original_response, set_modified_response, live_server_setup
def test_backup(client, live_server):
live_server_setup(live_server)
# Give the endpoint time to spin up
time.sleep(1)
res = client.get(
url_for("get_backup"),
follow_redirects=True
)
# Should get the right zip content type
assert res.content_type == "application/zip"
# Should be PK/ZIP stream
assert res.data.count(b'PK') >= 2
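The 'PK' assertion is a cheap signature check; a stricter (still simple) variant could hand the payload to zipfile:

import io
import zipfile
# res.data is the zip payload returned by the get_backup endpoint above
assert zipfile.is_zipfile(io.BytesIO(res.data))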


@@ -0,0 +1,30 @@
#!/usr/bin/python3
import time
from flask import url_for
from . util import live_server_setup
def test_trigger_functionality(client, live_server):
live_server_setup(live_server)
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
res = client.post(
url_for("import_page"),
data={"urls": "https://changedetection.io"},
follow_redirects=True
)
assert b"1 Imported" in res.data
res = client.get(
url_for("api_clone", uuid="first"),
follow_redirects=True
)
assert b"Cloned." in res.data


@@ -0,0 +1,38 @@
#!/usr/bin/python3
import time
from flask import url_for
from . util import live_server_setup
from ..html_tools import *
def test_setup(live_server):
live_server_setup(live_server)
def test_error_handler(client, live_server):
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint_403_error', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(3)
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
assert b'Status Code 403' in res.data
assert bytes("just now".encode('utf-8')) in res.data


@@ -1,80 +0,0 @@
import json
import time
from flask import url_for
from . util import set_original_response, set_modified_response, live_server_setup
# Hard to just add more live server URLs when one test is already running (I think)
# So we add our test here (was in a different file)
def test_headers_in_request(client, live_server):
live_server_setup(live_server)
# Add our URL to the import page
test_url = url_for('test_headers', _external=True)
# Add the test URL twice, we will check
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
cookie_header = '_ga=GA1.2.1022228332; cookie-preferences=analytics:accepted;'
# Add some headers to a request
res = client.post(
url_for("edit_page", uuid="first"),
data={
"url": test_url,
"tag": "",
"fetch_backend": "html_requests",
"headers": "xxx:ooo\ncool:yeah\r\ncookie:"+cookie_header},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Give the thread time to pick up the first version
time.sleep(5)
# The service should echo back the request headers
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
# Flask will convert the header key to uppercase
assert b"Xxx:ooo" in res.data
assert b"Cool:yeah" in res.data
# The test call service will return the headers as the body
from html import escape
assert escape(cookie_header).encode('utf-8') in res.data
time.sleep(5)
# Re #137 - Examine the JSON index file, it should have only one set of headers entered
watches_with_headers = 0
with open('test-datastore/url-watches.json') as f:
app_struct = json.load(f)
for uuid in app_struct['watching']:
if (len(app_struct['watching'][uuid]['headers'])):
watches_with_headers += 1
# Should be only one with headers set
assert watches_with_headers==1


@@ -151,3 +151,88 @@ def test_check_ignore_text_functionality(client, live_server):
res = client.get(url_for("api_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data
def test_check_global_ignore_text_functionality(client, live_server):
sleep_time_for_fetch_thread = 3
ignore_text = "XXXXX\r\nYYYYY\r\nZZZZZ"
set_original_ignore_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# Goto the settings page, add our ignore text
res = client.post(
url_for("settings_page"),
data={
"minutes_between_check": 180,
"global_ignore_text": ignore_text,
'fetch_backend': "html_requests"
},
follow_redirects=True
)
assert b"Settings updated." in res.data
# Goto the edit page of the item, add our ignore text
# Add our URL to the import page
res = client.post(
url_for("edit_page", uuid="first"),
data={"ignore_text": "something irrelevent but just to check", "url": test_url, 'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Check it saved
res = client.get(
url_for("settings_page"),
)
assert bytes(ignore_text.encode('utf-8')) in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# It should report nothing found (no new 'unviewed' class)
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
assert b'/test-endpoint' in res.data
# Make a change
set_modified_ignore_response()
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# It should report nothing found (no new 'unviewed' class)
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
assert b'/test-endpoint' in res.data
# Just to be sure.. set a regular modified change..
set_modified_original_ignore_response()
client.get(url_for("api_watch_checknow"), follow_redirects=True)
time.sleep(sleep_time_for_fetch_thread)
res = client.get(url_for("index"))
assert b'unviewed' in res.data
res = client.get(url_for("api_delete", uuid="all"), follow_redirects=True)
assert b'Deleted' in res.data


@@ -0,0 +1,96 @@
#!/usr/bin/python3
import time
from flask import url_for
from . util import live_server_setup
def test_setup(live_server):
live_server_setup(live_server)
# Should be the same as set_original_ignore_response() but with a little more whitespace
def set_original_ignore_response_but_with_whitespace():
test_return_data = """<html>
<body>
Some initial text</br>
<p>
Which is across multiple lines</p>
<br>
</br>
So let's see what happens. </br>
</body>
</html>
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def set_original_ignore_response():
test_return_data = """<html>
<body>
Some initial text</br>
<p>Which is across multiple lines</p>
</br>
So let's see what happens. </br>
</body>
</html>
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
# If there was only a change in the whitespace, then we shouldn't have a change detected
def test_check_ignore_whitespace(client, live_server):
sleep_time_for_fetch_thread = 3
# Give the endpoint time to spin up
time.sleep(1)
set_original_ignore_response()
# Goto the settings page, add our ignore text
res = client.post(
url_for("settings_page"),
data={
"minutes_between_check": 180,
"ignore_whitespace": "y",
'fetch_backend': "html_requests"
},
follow_redirects=True
)
assert b"Settings updated." in res.data
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
time.sleep(sleep_time_for_fetch_thread)
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
set_original_ignore_response_but_with_whitespace()
time.sleep(sleep_time_for_fetch_thread)
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# It should report nothing found (no new 'unviewed' class)
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
assert b'/test-endpoint' in res.data


@@ -5,6 +5,10 @@ from flask import url_for
from . util import live_server_setup
import pytest
def test_setup(live_server):
live_server_setup(live_server)
def test_unittest_inline_html_extract():
# So lets pretend that the JSON we want is inside some HTML
content="""
@@ -42,6 +46,45 @@ and it can also be repeated
with pytest.raises(html_tools.JSONNotFound) as e_info:
html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "$.id")
def set_original_ext_response():
data = """
[
{
"isPriceLowered": false,
"status": "ForSale",
"statusOrig": "for sale"
},
{
"_id": "5e7b3e1fb3262d306323ff1e",
"listingsType": "consumer",
"status": "ForSale",
"statusOrig": "for sale"
}
]
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(data)
def set_modified_ext_response():
data = """
[
{
"isPriceLowered": false,
"status": "Sold",
"statusOrig": "sold"
},
{
"_id": "5e7b3e1fb3262d306323ff1e",
"listingsType": "consumer",
"isPriceLowered": false,
"status": "Sold"
}
]
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(data)
def set_original_response():
test_return_data = """
@@ -60,7 +103,23 @@ def set_original_response():
],
"boss": {
"name": "Fat guy"
}
},
"available": true
}
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
return None
def set_response_with_html():
test_return_data = """
{
"test": [
{
"html": "<b>"
}
]
}
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
@@ -84,7 +143,8 @@ def set_modified_response():
],
"boss": {
"name": "Foobar"
}
},
"available": false
}
"""
@@ -93,11 +153,38 @@ def set_modified_response():
return None
def test_check_json_without_filter(client, live_server):
# Request a JSON document from a application/json source containing HTML
# and be sure it doesn't get chewed up by inscriptis
set_response_with_html()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint_json', _external=True)
client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(3)
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert b'&#34;&lt;b&gt;' in res.data
assert res.data.count(b'{\n') >= 2
def test_check_json_filter(client, live_server):
live_server_setup(live_server)
json_filter = 'json:boss.name'
set_original_response()
@@ -161,3 +248,126 @@ def test_check_json_filter(client, live_server):
res = client.get(url_for("diff_history_page", uuid="first"))
# But the change should be there, though it's hard to test the change was detected because it will show old and new versions
assert b'Foobar' in res.data
def test_check_json_filter_bool_val(client, live_server):
json_filter = "json:$['available']"
set_original_response()
# Give the endpoint time to spin up
time.sleep(1)
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
# Goto the edit page, add our ignore text
# Add our URL to the import page
res = client.post(
url_for("edit_page", uuid="first"),
data={"css_filter": json_filter,
"url": test_url,
"tag": "",
"headers": "",
"fetch_backend": "html_requests"
},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(3)
# Make a change
set_modified_response()
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(3)
res = client.get(url_for("diff_history_page", uuid="first"))
# But the change should be there, though it's hard to test the change was detected because it will show old and new versions
assert b'false' in res.data
# Re #265 - Extended JSON selector test
# Stuff to consider here
# - Selector should be allowed to return empty when it doesn't match (people might wait for some condition)
# - The 'diff' tab could show the old and new content
# - Form should let us enter a selector that doesnt (yet) match anything
def test_check_json_ext_filter(client, live_server):
json_filter = 'json:$[?(@.status==Sold)]'
set_original_ext_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(3)
# Goto the edit page, add our ignore text
# Add our URL to the import page
res = client.post(
url_for("edit_page", uuid="first"),
data={"css_filter": json_filter,
"url": test_url,
"tag": "",
"headers": "",
"fetch_backend": "html_requests"
},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Check it saved
res = client.get(
url_for("edit_page", uuid="first"),
)
assert bytes(json_filter.encode('utf-8')) in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(3)
# Make a change
set_modified_ext_response()
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(4)
# It should have 'unviewed'
res = client.get(url_for("index"))
assert b'unviewed' in res.data
res = client.get(url_for("diff_history_page", uuid="first"))
# We should never see 'ForSale' because we are selecting on 'Sold' in the rule,
# But we should know it triggered ('unviewed' assert above)
assert b'ForSale' not in res.data
assert b'Sold' in res.data
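The json:$[?(@.status==Sold)] rule relies on jsonpath_ng's extended parser (jsonpath_ng.ext), which adds filter expressions on top of plain JSONPath. A standalone check of that selector against sample data shaped like the test fixture (illustrative):

from jsonpath_ng.ext import parse

listings = [
    {"status": "ForSale", "isPriceLowered": False},
    {"status": "Sold", "isPriceLowered": False},
]
matches = parse("$[?(@.status==Sold)]").find(listings)
print([m.value for m in matches])   # should contain only the 'Sold' entry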


@@ -55,7 +55,10 @@ def test_check_notification(client, live_server):
"Preview: {preview_url}\n"
"Diff URL: {diff_url}\n"
"Snapshot: {current_snapshot}\n"
"Diff: {diff}\n"
"Diff Full: {diff_full}\n"
":-)",
"notification_format": "Text",
"url": test_url,
"tag": "my tag",
"title": "my title",
@@ -65,7 +68,7 @@ def test_check_notification(client, live_server):
follow_redirects=True
)
assert b"Updated watch." in res.data
assert b"Notifications queued" in res.data
assert b"Test notification queued" in res.data
# Hit the edit page, be sure that we saved it
res = client.get(
@@ -113,6 +116,11 @@ def test_check_notification(client, live_server):
assert test_url in notification_submission
# Diff was correctly executed
assert "Diff Full: (changed) Which is across multiple lines" in notification_submission
assert "(-> into) which has this one new line" in notification_submission
if env_base_url:
# Re #65 - did we see our BASE_URl ?
logging.debug (">>> BASE_URL checking in notification: %s", env_base_url)
@@ -135,7 +143,7 @@ def test_check_notification(client, live_server):
)
assert b"Settings updated." in res.data
# Re #143 - should not see this if we didnt hit the test box
assert b"Notifications queued" not in res.data
assert b"Test notification queued" not in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
@@ -151,6 +159,9 @@ def test_check_notification(client, live_server):
with open("test-datastore/notification.txt", "r") as f:
notification_submission = f.read()
print ("Notification submission was:", notification_submission)
# Re #342 - check for accidental python byte encoding of non-utf8/string
assert "b'" not in notification_submission
assert re.search('Watch UUID: [0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}', notification_submission, re.IGNORECASE)
assert "Watch title: my title" in notification_submission
@@ -181,6 +192,7 @@ def test_check_notification(client, live_server):
url_for("settings_page"),
data={"notification_title": "New ChangeDetection.io Notification - {watch_url}",
"notification_body": "Rubbish: {rubbish}\n",
"notification_format": "Text",
"notification_urls": "json://foobar.com",
"minutes_between_check": 180,
"fetch_backend": "html_requests"
@@ -188,4 +200,4 @@ def test_check_notification(client, live_server):
follow_redirects=True
)
assert bytes("is not a valid token".encode('utf-8')) in res.data
assert bytes("is not a valid token".encode('utf-8')) in res.data


@@ -0,0 +1,211 @@
import json
import time
from flask import url_for
from . util import set_original_response, set_modified_response, live_server_setup
def test_setup(live_server):
live_server_setup(live_server)
# Hard to just add more live server URLs when one test is already running (I think)
# So we add our test here (was in a different file)
def test_headers_in_request(client, live_server):
# Add our URL to the import page
test_url = url_for('test_headers', _external=True)
# Add the test URL twice, we will check
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
cookie_header = '_ga=GA1.2.1022228332; cookie-preferences=analytics:accepted;'
# Add some headers to a request
res = client.post(
url_for("edit_page", uuid="first"),
data={
"url": test_url,
"tag": "",
"fetch_backend": "html_requests",
"headers": "xxx:ooo\ncool:yeah\r\ncookie:"+cookie_header},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Give the thread time to pick up the first version
time.sleep(5)
# The service should echo back the request headers
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
# Flask will convert the header key to uppercase
assert b"Xxx:ooo" in res.data
assert b"Cool:yeah" in res.data
# The test call service will return the headers as the body
from html import escape
assert escape(cookie_header).encode('utf-8') in res.data
time.sleep(5)
# Re #137 - Examine the JSON index file, it should have only one set of headers entered
watches_with_headers = 0
with open('test-datastore/url-watches.json') as f:
app_struct = json.load(f)
for uuid in app_struct['watching']:
if (len(app_struct['watching'][uuid]['headers'])):
watches_with_headers += 1
# Should be only one with headers set
assert watches_with_headers==1
def test_body_in_request(client, live_server):
# Add our URL to the import page
test_url = url_for('test_body', _external=True)
# Add the test URL twice, we will check
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
body_value = 'Test Body Value'
# Attempt to add a body with a GET method
res = client.post(
url_for("edit_page", uuid="first"),
data={
"url": test_url,
"tag": "",
"method": "GET",
"fetch_backend": "html_requests",
"body": "invalid"},
follow_redirects=True
)
assert b"Body must be empty when Request Method is set to GET" in res.data
# Add a properly formatted body with a proper method
res = client.post(
url_for("edit_page", uuid="first"),
data={
"url": test_url,
"tag": "",
"method": "POST",
"fetch_backend": "html_requests",
"body": body_value},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Give the thread time to pick up the first version
time.sleep(5)
# The service should echo back the body
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
# Check if body returned contains the specified data
assert str.encode(body_value) in res.data
watches_with_body = 0
with open('test-datastore/url-watches.json') as f:
app_struct = json.load(f)
for uuid in app_struct['watching']:
if app_struct['watching'][uuid]['body']==body_value:
watches_with_body += 1
# Should be only one with body set
assert watches_with_body==1
def test_method_in_request(client, live_server):
# Add our URL to the import page
test_url = url_for('test_method', _external=True)
# Add the test URL twice, we will check
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
# Attempt to add a method which is not valid
res = client.post(
url_for("edit_page", uuid="first"),
data={
"url": test_url,
"tag": "",
"fetch_backend": "html_requests",
"method": "invalid"},
follow_redirects=True
)
assert b"Not a valid choice" in res.data
# Add a properly formatted body
res = client.post(
url_for("edit_page", uuid="first"),
data={
"url": test_url,
"tag": "",
"fetch_backend": "html_requests",
"method": "PATCH"},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Give the thread time to pick up the first version
time.sleep(5)
# The service should echo back the request verb
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
# The test call service will return the verb as the body
assert b"PATCH" in res.data
time.sleep(5)
watches_with_method = 0
with open('test-datastore/url-watches.json') as f:
app_struct = json.load(f)
for uuid in app_struct['watching']:
if app_struct['watching'][uuid]['method'] == 'PATCH':
watches_with_method += 1
# Should be only one with method set to PATCH
assert watches_with_method == 1
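On the fetch side, a plain-requests backend that honours the new method and body fields boils down to something like this (a sketch under that assumption, not the repo's content_fetcher implementation):

import requests

def fetch(url, method='GET', headers=None, body=None, timeout=15):
    # requests.request() takes the HTTP verb as a string, so GET/POST/PATCH all go through one call
    r = requests.request(method, url, headers=headers or {}, data=body, timeout=timeout)
    r.raise_for_status()
    return r.content   # bytes, matching the "fetcher must return bytes" check in update_worker

content = fetch("https://example.com")   # defaults to a GET with no body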


@@ -0,0 +1,118 @@
#!/usr/bin/python3
import time
from flask import url_for
from . util import live_server_setup
from ..html_tools import *
def test_setup(live_server):
live_server_setup(live_server)
def set_original_response():
test_return_data = """<html>
<body>
Some initial text</br>
<p>Which is across multiple lines</p>
</br>
So let's see what happens. </br>
<div class="sametext">Some text thats the same</div>
<div class="changetext">Some text that will change</div>
</body>
</html>
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
return None
def set_modified_response():
test_return_data = """<html>
<body>
Some initial text</br>
<p>Which is across multiple lines</p>
</br>
So let's see what happens. THIS CHANGES AND SHOULDNT TRIGGER A CHANGE</br>
<div class="sametext">Some text thats the same</div>
<div class="changetext">Some new text</div>
</body>
</html>
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
return None
def test_check_markup_xpath_filter_restriction(client, live_server):
sleep_time_for_fetch_thread = 3
xpath_filter = "//*[contains(@class, 'sametext')]"
set_original_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# Goto the edit page, add our ignore text
# Add our URL to the import page
res = client.post(
url_for("edit_page", uuid="first"),
data={"css_filter": xpath_filter, "url": test_url, "tag": "", "headers": "", 'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# view it/reset state back to viewed
client.get(url_for("diff_history_page", uuid="first"), follow_redirects=True)
# Make a change
set_modified_response()
# Trigger a check
client.get(url_for("api_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
def test_xpath_validation(client, live_server):
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
res = client.post(
url_for("edit_page", uuid="first"),
data={"css_filter": "/something horrible", "url": test_url, "tag": "", "headers": "", 'fetch_backend': "html_requests"},
follow_redirects=True
)
assert b"is not a valid XPath expression" in res.data


@@ -0,0 +1 @@
"""Unit tests for the app."""


@@ -0,0 +1,5 @@
# What is this?
This is test content for the python diff engine. We use the JS interface for the front end because you can explore
differences in words etc, but the comparison itself uses (at the moment) the python difflib engine.
This content `before.txt` and `after.txt` is for unit testing
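Since the comparison is done with Python's difflib, a minimal sketch of diffing these two files line by line looks like this (the "(changed)" / "(added)" markers asserted in the unit test below are the repo's own render_diff formatting, not produced here):

import difflib

with open("before.txt") as f:
    before = f.readlines()
with open("after.txt") as f:
    after = f.readlines()

for line in difflib.unified_diff(before, after, fromfile="before.txt", tofile="after.txt", lineterm=""):
    print(line)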


@@ -0,0 +1,6 @@
After twenty years, as cursed as I may be
for having learned computerese,
I continue to examine bits, bytes and words
xok
and insure that I'm one of those computer nerds.
and something new


@@ -0,0 +1,5 @@
After twenty years, as cursed as I may be
for having learned computerese,
I continue to examine bits, bytes and words
ok
and insure that I'm one of those computer nerds.


@@ -0,0 +1,25 @@
#!/usr/bin/python3
# run from dir above changedetectionio/ dir
# python3 -m unittest changedetectionio.tests.unit.test_notification_diff
import unittest
import os
from changedetectionio import diff
# mostly
class TestDiffBuilder(unittest.TestCase):
def test_expected_diff_output(self):
base_dir=os.path.dirname(__file__)
output = diff.render_diff(base_dir+"/test-content/before.txt", base_dir+"/test-content/after.txt")
output = output.split("\n")
self.assertIn("(changed) ok", output)
self.assertIn("(-> into) xok", output)
self.assertIn("(added) and something new", output)
# @todo test blocks of changed, blocks of added, blocks of removed
if __name__ == '__main__':
unittest.main()


@@ -44,6 +44,23 @@ def live_server_setup(live_server):
with open("test-datastore/endpoint-content.txt", "r") as f:
return f.read()
@live_server.app.route('/test-endpoint-json')
def test_endpoint_json():
from flask import make_response
with open("test-datastore/endpoint-content.txt", "r") as f:
resp = make_response(f.read())
resp.headers['Content-Type'] = 'application/json'
return resp
@live_server.app.route('/test-403')
def test_endpoint_403_error():
from flask import make_response
resp = make_response('', 403)
return resp
# Just return the headers in the request
@live_server.app.route('/test-headers')
def test_headers():
@@ -56,6 +73,21 @@ def live_server_setup(live_server):
return "\n".join(output)
# Just return the body in the request
@live_server.app.route('/test-body', methods=['POST', 'GET'])
def test_body():
from flask import request
return request.data
# Just return the verb in the request
@live_server.app.route('/test-method', methods=['POST', 'GET', 'PATCH'])
def test_method():
from flask import request
return request.method
# Where we POST to as a notification
@live_server.app.route('/test_notification_endpoint', methods=['POST', 'GET'])


@@ -2,7 +2,12 @@ import threading
import queue
import time
# Requests for checking on the site use a pool of thread Workers managed by a Queue.
# A single update worker
#
# Requests for checking on a single site(watch) from a queue of watches
# (another process inserts watches into the queue that are time-ready for checking)
class update_worker(threading.Thread):
current_uuid = None
@@ -34,52 +39,62 @@ class update_worker(threading.Thread):
changed_detected = False
contents = ""
update_obj= {}
now = time.time()
try:
now = time.time()
changed_detected, update_obj, contents = update_handler.run(uuid)
# Always record that we at least tried
self.datastore.update_watch(uuid=uuid, update_obj={'fetch_time': round(time.time() - now, 3)})
# Re #342
# In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.
# We then convert/.decode('utf-8') for the notification etc
if not isinstance(contents, (bytes, bytearray)):
raise Exception("Error - returned data from the fetch handler SHOULD be bytes")
except PermissionError as e:
self.app.logger.error("File permission error updating", uuid, str(e))
except content_fetcher.EmptyReply as e:
self.datastore.update_watch(uuid=uuid, update_obj={'last_error':str(e)})
# Some kind of custom to-str handler in the exception handler that does this?
err_text = "EmptyReply: Status Code {}".format(e.status_code)
self.datastore.update_watch(uuid=uuid, update_obj={'last_error': err_text,
'last_check_status': e.status_code})
except Exception as e:
self.app.logger.error("Exception reached processing watch UUID:%s - %s", uuid, str(e))
self.datastore.update_watch(uuid=uuid, update_obj={'last_error': str(e)})
else:
if update_obj:
try:
self.datastore.update_watch(uuid=uuid, update_obj=update_obj)
if changed_detected:
try:
watch = self.datastore.data['watching'][uuid]
fname = "" # Saved history text filename
# A change was detected
newest_version_file_contents = ""
fname = self.datastore.save_history_text(watch_uuid=uuid, contents=contents)
# For the FIRST time we check a site, or a change detected, save the snapshot.
if changed_detected or not watch['last_checked']:
# A change was detected
fname = self.datastore.save_history_text(watch_uuid=uuid, contents=contents)
# Should always be keyed by string(timestamp)
self.datastore.update_watch(uuid, {"history": {str(round(time.time())): fname}})
# Update history with the stripped text for future reference, this will also mean we save the first
# Should always be keyed by string(timestamp)
self.datastore.update_watch(uuid, {"history": {str(update_obj["last_checked"]): fname}})
# Generally update anything interesting returned
self.datastore.update_watch(uuid=uuid, update_obj=update_obj)
watch = self.datastore.data['watching'][uuid]
# A change was detected
if changed_detected:
n_object = {}
print (">> Change detected in UUID {} - {}".format(uuid, watch['url']))
print (">> Change detected in UUID {} - {}".format(uuid, watch['url']))
# Notifications should only trigger on the second time (first time, we gather the initial snapshot)
if len(watch['history']) > 1:
# Get the newest snapshot data to be possibly used in a notification
newest_key = self.datastore.get_newest_history_key(uuid)
if newest_key:
with open(watch['history'][newest_key], 'r') as f:
newest_version_file_contents = f.read().strip()
dates = list(watch['history'].keys())
# Convert to int, sort and back to str again
# @todo replace datastore getter that does this automatically
dates = [int(i) for i in dates]
dates.sort(reverse=True)
dates = [str(i) for i in dates]
prev_fname = watch['history'][dates[1]]
n_object = {
'watch_url': watch['url'],
'uuid': uuid,
'current_snapshot': newest_version_file_contents
}
                # Did it have any notification alerts to hit?
                if len(watch['notification_urls']):
                    n_object['notification_urls'] = watch['notification_urls']
                    n_object['notification_title'] = watch['notification_title']
                    n_object['notification_body'] = watch['notification_body']
                    n_object['notification_format'] = watch['notification_format']

                # No? Maybe there's a global setting, queue them all
                elif len(self.datastore.data['settings']['application']['notification_urls']):
                    n_object['notification_urls'] = self.datastore.data['settings']['application']['notification_urls']
                    n_object['notification_title'] = self.datastore.data['settings']['application']['notification_title']
                    n_object['notification_body'] = self.datastore.data['settings']['application']['notification_body']
                    n_object['notification_format'] = self.datastore.data['settings']['application']['notification_format']
                else:
                    print(">>> NO notifications queued, watch and global notification URLs were empty.")

                # Only prepare to notify if the rules above matched
                if 'notification_urls' in n_object:
                    # HTML needs a line-break tag, but Markdown and Text can use a linefeed
                    if n_object['notification_format'] == 'HTML':
                        line_feed_sep = "</br>"
                    else:
                        line_feed_sep = "\n"

                    from changedetectionio import diff
                    n_object.update({
                        'watch_url': watch['url'],
                        'uuid': uuid,
                        'current_snapshot': contents.decode('utf-8'),
                        'diff_full': diff.render_diff(prev_fname, fname, line_feed_sep=line_feed_sep),
                        'diff': diff.render_diff(prev_fname, fname, True, line_feed_sep=line_feed_sep)
                    })

                    self.notification_q.put(n_object)

    except Exception as e:
        # Catch everything possible here, so that if a worker crashes, we don't lose it until restart!
        print("!!!! Exception in update_worker !!!\n", e)

finally:
    # Always record that we at least tried
    self.datastore.update_watch(uuid=uuid, update_obj={'fetch_time': round(time.time() - now, 3),
                                                       'last_checked': round(time.time())})

self.current_uuid = None  # Done
self.q.task_done()

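The worker above catches content_fetcher.EmptyReply and reads a status_code attribute from it, but the exception class itself is defined elsewhere and is not part of this diff. A minimal sketch of what it might look like, inferred only from how it is used here (the constructor signature and the url field are assumptions for illustration):

# Hypothetical sketch only, not the project's actual definition.
# Only the status_code attribute is implied by the calling code above.
class EmptyReply(Exception):
    def __init__(self, status_code, url=None):
        self.status_code = status_code
        self.url = url
        super().__init__("Empty reply from server (HTTP status {})".format(status_code))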
View File

@@ -1,7 +1,7 @@
version: '2'
services:
changedetection.io:
image: ghcr.io/dgtlmoon/changedetection.io
container_name: changedetection.io
hostname: changedetection.io
volumes:
@@ -13,13 +13,23 @@ services:
# - PUID=1000
# - PGID=1000
#
# Alternative WebDriver/selenium URL, do not wrap the value in quotes (" or ')!
# - WEBDRIVER_URL=http://browser-chrome:4444/wd/hub
# Proxy support example.
#
# WebDriver proxy settings webdriver_proxyType, webdriver_ftpProxy, webdriver_httpProxy, webdriver_noProxy,
# webdriver_proxyAutoconfigUrl, webdriver_sslProxy, webdriver_autodetect,
# webdriver_socksProxy, webdriver_socksUsername, webdriver_socksVersion, webdriver_socksPassword
#
# https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.proxy
#
# Plain requests - proxy support example.
# - HTTP_PROXY=socks5h://10.10.1.10:1080
# - HTTPS_PROXY=socks5h://10.10.1.10:1080
#
# An exclude list (useful for the notification URLs above) can be specified with
# - NO_PROXY="localhost,192.168.0.0/24"
#
# Base URL of your changedetection.io install (Added to the notification alert)
# - BASE_URL=https://mysite.com
@@ -33,13 +43,17 @@ services:
restart: unless-stopped
# Used for fetching pages via WebDriver+Chrome where you need Javascript support.
# See https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver
# Now working on arm64 (needs testing on rPi - tested on Oracle ARM instance)
# For arm64, replace the image below with seleniarm/standalone-chromium:4.0.0-20211213
# browser-chrome:
# hostname: browser-chrome
# image: selenium/standalone-chrome-debug:3.141.59
# environment:
# - VNC_NO_PASSWORD=1
# - SCREEN_WIDTH=1920
# - SCREEN_HEIGHT=1080
# - SCREEN_DEPTH=24
# volumes:
# # Workaround to avoid the browser crashing inside a docker container
# # See https://github.com/SeleniumHQ/docker-selenium#quick-start

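The webdriver_* entries listed in the compose file above mirror the attribute names of selenium's Proxy class (see the selenium-python link in the comments). As a rough illustration of how such environment variables could be turned into a proxy for a Remote WebDriver session - the helper name and collection logic below are assumptions, not the application's own code:

# Illustrative sketch, assuming env vars named webdriver_proxyType, webdriver_httpProxy, etc.
import os
from selenium.webdriver.common.proxy import Proxy

def proxy_from_env(prefix="webdriver_"):
    # Collect e.g. webdriver_httpProxy -> {'httpProxy': ...} for selenium's Proxy(raw=...)
    raw = {k[len(prefix):]: v for k, v in os.environ.items() if k.startswith(prefix)}
    return Proxy(raw=raw) if raw else None

# Usage sketch (commented out; needs a running selenium server at WEBDRIVER_URL):
# from selenium import webdriver
# driver = webdriver.Remote(command_executor=os.getenv("WEBDRIVER_URL"),
#                           options=webdriver.ChromeOptions(),
#                           proxy=proxy_from_env())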
View File

@@ -1,5 +1,2 @@
run:
  changedetection: python3 ./changedetection.py -C -d ./datastore -p $PORT

View File

@@ -1,26 +1,36 @@
flask ~= 2.0
eventlet >= 0.31.0
validators
timeago ~= 1.0
inscriptis ~= 1.2
feedgen ~= 0.9
flask-login ~= 0.5
pytz

# Set these versions together to avoid a RequestsDependencyWarning
requests[socks] ~= 2.26
urllib3 > 1.26
chardet > 2.3.0

wtforms ~= 2.3.3
jsonpath-ng ~= 1.5.3

# Notification library
apprise ~= 0.9
# apprise mqtt https://github.com/dgtlmoon/changedetection.io/issues/315
paho-mqtt

# Pinned version of cryptography otherwise
# ERROR: Could not build wheels for cryptography which use PEP 517 and cannot be installed directly
cryptography ~= 3.4

# Used for CSS filtering
bs4

# XPath filtering, lxml is required by bs4 anyway, but put it here to be safe.
lxml

# 3.141 was missing socksVersion, 3.150 was not in pypi, so we try 4.1.0
selenium ~= 4.1.0
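The lxml entry above backs the XPath filter support, while bs4 covers the CSS filter path. A rough, self-contained illustration of XPath filtering with lxml - this is not the application's own filter code:

# Illustrative only; not taken from the project.
from lxml import etree, html

def filter_by_xpath(html_content, xpath_expression):
    tree = html.fromstring(html_content)
    results = []
    for node in tree.xpath(xpath_expression):
        # text()/@attr expressions yield plain strings, element matches need serialising
        if isinstance(node, str):
            results.append(str(node))
        else:
            results.append(etree.tostring(node, pretty_print=True, encoding="unicode"))
    return "\n".join(results)

# e.g. filter_by_xpath("<div><p id='price'>4.99</p></div>", "//p[@id='price']/text()")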

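apprise (pinned ~= 0.9 above) is the library that ultimately delivers the notification_urls / notification_title / notification_body values the worker queues earlier in this diff. A minimal usage sketch with a placeholder URL - the project's own queue consumer is not shown here:

# Minimal apprise usage sketch; the mailto:// URL is a placeholder.
import apprise

apobj = apprise.Apprise()
apobj.add("mailto://user:password@example.com")
apobj.notify(title="ChangeDetection.io - change detected", body="The watched page changed.")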
runtime.txt Normal file
View File

@@ -0,0 +1 @@
python-3.8.12