Compare commits


117 Commits
0.22 ... 0.29

Author SHA1 Message Date
dgtlmoon
64e172433a docker-compose for dev not needed (use venv etc) 2021-04-30 16:54:07 +10:00
dgtlmoon
92c0fa90ee Password protection / login support (#34)
Issue #24 Password login  hashlib.pbkdf2_hmac implementation
2021-04-30 16:47:13 +10:00
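A minimal sketch of the salted PBKDF2 scheme this commit introduces, mirroring the hashlib.pbkdf2_hmac calls visible in the backend/__init__.py diff further below; the function names here are illustrative:

    import base64
    import hashlib
    import secrets

    def hash_password(password):
        salt = secrets.token_bytes(32)                       # fresh salt for every new password
        key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
        return base64.b64encode(salt + key).decode('ascii')  # store salt and derived key together

    def check_password(password, stored):
        raw = base64.b64decode(stored)
        salt, expected_key = raw[:32], raw[32:]              # first 32 bytes are the salt
        key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
        return key == expected_key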
dgtlmoon
ee8053e0e8 Update FUNDING.yml 2021-04-21 11:13:50 +10:00
dgtlmoon
7f5b592f6f Skip using tag limit on pause when no tag is being viewed 2021-04-16 10:29:03 +10:00
dgtlmoon
1e45156bc0 Pause/Unpause should respect limit tag on redirect 2021-04-10 19:47:31 +09:30
dgtlmoon
c7169ebba1 Validate duplicate URLs 2021-04-10 14:31:57 +09:30
dgtlmoon
a58679f983 Chdir is not needed because we add the file from the full path, but make it 'relative' in the Zip 2021-04-09 04:50:55 +02:00
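A short sketch of the approach described in the commit above: files are added to the backup zip by their full path on disk, while arcname controls the relative name stored inside the archive, so no chdir is required. The datastore_path value and the *.txt glob follow the backup code in backend/__init__.py below; backup.zip is an illustrative name:

    import os
    import zipfile
    from pathlib import Path

    datastore_path = "/datastore"
    with zipfile.ZipFile(os.path.join(datastore_path, "backup.zip"), "w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for txt_file in Path(datastore_path).rglob("*.txt"):
            # Read from the absolute path, but store a path relative to the datastore root
            zf.write(txt_file, arcname=str(txt_file).replace(datastore_path, ""))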
dgtlmoon
661542b056 Fix backup generation on relative paths (like when run outside docker, under venv, etc) 2021-04-09 04:49:50 +02:00
dgtlmoon
2ea48cb90a Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-04-04 06:32:04 +02:00
dgtlmoon
2a80022cd9 Adding noopener per CodeQL, stop pages from knowing the referer etc 2021-04-04 06:31:42 +02:00
dgtlmoon
8861f70ac4 Create codeql-analysis.yml 2021-04-04 06:27:32 +02:00
dgtlmoon
07113216d5 yarl not needed, lock requests version 2021-04-03 10:28:11 +02:00
dgtlmoon
02062c5893 dev packages needed, drop apt cache 2021-04-03 09:05:02 +02:00
dgtlmoon
a11f09062b See if we get a clean buildx without dev packages 2021-04-03 08:45:24 +02:00
dgtlmoon
0bb48cbd43 Tweaking build size thanks to https://github.com/hadolint/hadolint 2021-04-03 08:04:42 +02:00
dgtlmoon
7109a17a8e Adding dockerignore 2021-04-03 07:59:22 +02:00
dgtlmoon
4ed026aba6 Re #18 - Show "preview" of the page when only one revision exists (#33) 2021-04-03 05:55:43 +02:00
dgtlmoon
3b79f8ed4e Update README.md 2021-04-02 05:00:58 +02:00
dgtlmoon
5d02c4fe6f Update README.md 2021-04-02 04:58:49 +02:00
dgtlmoon
f2b06c63bf Also check that the watch is not paused before putting it into the checking queue 2021-04-02 03:58:23 +02:00
dgtlmoon
ab6f4d11ed revert c60be56271 2021-04-02 03:07:36 +02:00
dgtlmoon
5311a95140 remove extra packages (#32)
* remove extra packages

* add test only workflow
2021-04-02 02:57:48 +02:00
dgtlmoon
fb723c264d Bumping version to 0.28 2021-04-01 14:43:46 +02:00
dgtlmoon
3ad722d63c Docker push amd64 rpi etc (#28)
* trying multiarch docker hub push on build, similar to https://github.com/dgtlmoon/changedetection.io/pull/25/files

* Adding image builder

* Include our dev branch

* Tweak buildx

* dont use alias

* Finally found the right info at https://docs.docker.com/ci-cd/github-actions/

* Updated from https://github.com/razorpay/docker-build-push-action

* Tweaks to build

* Tweaks

* Minor tweaks to version

* tweaks

* Remove version

* Remove old workflow

* syntax cleanup
2021-04-01 14:10:23 +02:00
dgtlmoon
9c16695932 Open [diff] links into their own window 2021-04-01 12:57:47 +02:00
dgtlmoon
35fc76c02c Fix auto jump on viewing the diff 2021-04-01 12:53:19 +02:00
dgtlmoon
934d8c6211 Re #30 - Delete history watch snapshots (#31)
Re #30 - Delete history watch snapshots  Scrub - Optionally delete history snapshots newer than timestamp
2021-04-01 12:01:42 +02:00
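A hedged sketch of the scrub idea referenced in #30/#31: optionally drop only the history snapshots newer than a cutoff timestamp. The watch/history layout and the scrub_watch name follow the calls in backend/__init__.py; the body below is illustrative, not the store's actual implementation:

    import os
    import time

    def scrub_watch(watch, limit_timestamp=None):
        cutoff = limit_timestamp if limit_timestamp else time.time()
        for ts in list(watch['history'].keys()):           # history keys are epoch-second strings
            if int(ts) > int(cutoff):
                snapshot_path = watch['history'].pop(ts)   # forget the snapshot entry...
                if os.path.isfile(snapshot_path):
                    os.unlink(snapshot_path)               # ...and delete its text file from disk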
dgtlmoon
294256d5c3 Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-03-29 18:38:20 +02:00
dgtlmoon
b7efdfd52c Slow down the DB write interval and catch the case that it changed during write 2021-03-29 18:37:03 +02:00
dgtlmoon
6a78b5ad1d Immediately 'jump' to the change 2021-03-29 18:36:50 +02:00
dgtlmoon
98f3e61314 Tweak to hover pause icon 2021-03-29 18:36:31 +02:00
dgtlmoon
e322c44d3e Stop runtime error on dict changing during write/init at start (#27)
* Lock datastore when writing

* Racecase fix

* Tweaks to locking (add delay)
2021-03-29 18:23:13 +02:00
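A minimal sketch of the locking approach in #27: guard writes to the shared watch dict and the JSON serialisation with one threading.Lock, so the writer never iterates a dict that another thread is mutating. The names (lock, data, update_watch, sync_to_json) are illustrative simplifications of the store:

    import json
    import threading

    lock = threading.Lock()
    data = {'watching': {}}

    def update_watch(uuid, update_obj):
        with lock:                           # writers take the lock...
            data['watching'][uuid].update(update_obj)

    def sync_to_json(path):
        with lock:                           # ...and so does the serialiser, avoiding
            blob = json.dumps(data)          # "dictionary changed size during iteration"
        with open(path, 'w') as f:
            f.write(blob)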
dgtlmoon
7b226e1d54 Merge pull request #26 from dgtlmoon/pause
Re #22 - ability to pause
2021-03-29 16:14:16 +02:00
dgtlmoon
35e597a4c8 Re #22 - ability to pause 2021-03-29 16:11:22 +02:00
dgtlmoon
0a1a8340c2 Re #23 - always check value of interval time, not just on start 2021-03-29 15:04:15 +02:00
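A simplified sketch of the fix in Re #23, following the ticker loop in backend/__init__.py below: the minutes_between_check setting is read on every pass of the loop rather than once at startup, so a changed interval takes effect without a restart. The function signature here is illustrative:

    import time

    def ticker_loop(datastore, update_q, exit_event):
        while not exit_event.is_set():
            # Re-read the interval on every iteration, not just on start
            minutes = datastore.data['settings']['requests']['minutes_between_check']
            threshold = time.time() - (minutes * 60)
            for uuid, watch in datastore.data['watching'].items():
                if not watch['paused'] and watch['last_checked'] <= threshold:
                    if uuid not in update_q.queue:
                        update_q.put(uuid)   # queue stale, un-paused watches for checking
            exit_event.wait(1)               # short wait so tests/shutdown can break out quickly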
dgtlmoon
8b5cd40593 Update README.md 2021-03-26 11:07:06 +01:00
dgtlmoon
7d978a6e65 Merge pull request #19 from dgtlmoon/markdown-tweak
Use absolute image links so the screenshots work from docker hub
2021-03-04 09:59:37 +01:00
dgtlmoon
fdab52d400 Use absolute image links so the screenshots work from docker hub 2021-03-04 09:58:58 +01:00
dgtlmoon
782795310f Update README.md
Removing text that is tricky to maintain and confusing
2021-03-03 09:01:14 +01:00
Leigh Morresi
2280e6d497 Updating screenshot 2021-03-01 16:12:30 +01:00
Leigh Morresi
822f3e6d20 Reuse the GUID if we have one 2021-03-01 16:01:53 +01:00
dgtlmoon
35546c331c Merge pull request #15 from dgtlmoon/dev
Prepare 0.27
2021-03-01 15:50:25 +01:00
Leigh Morresi
982a0d7781 Dont show 'empty' tag, it will be in the [ALL] list 2021-03-01 15:44:34 +01:00
Leigh Morresi
c5c3e8c6c2 Adding RSS feed icon 2021-03-01 15:39:36 +01:00
Leigh Morresi
ff1b19cdb8 Generic object sync should use private method 2021-03-01 15:32:59 +01:00
Leigh Morresi
df96b8d76c Add missing urllib3 2021-03-01 15:21:15 +01:00
Leigh Morresi
89134b5b6c Add missing pytz 2021-03-01 15:11:03 +01:00
Leigh Morresi
b31bf34890 Check for new version 2021-03-01 15:09:37 +01:00
Leigh Morresi
5b2fda1a6e Fix import form flow logic 2021-03-01 14:33:25 +01:00
Leigh Morresi
fb38b06eae Code tidy/lint 2021-03-01 14:31:45 +01:00
Leigh Morresi
e0578acca2 Tidy up thread logic and version check 2021-03-01 14:29:21 +01:00
Leigh Morresi
187523d8d6 Add missing dep 2021-03-01 12:45:56 +01:00
Leigh Morresi
b0975694c8 Remove todos 2021-03-01 11:52:29 +01:00
Leigh Morresi
b1fb47e689 Add icon for RSS, RSS should show only unviewed entries 2021-03-01 11:51:28 +01:00
Leigh Morresi
a82e9243a6 Issue #7 - RSS feeds 2021-03-01 11:25:04 +01:00
Leigh Morresi
e3e36b3cef Always override tag version (load from disk in future, so we can add it at build time) 2021-02-27 23:20:40 +01:00
Leigh Morresi
cd6465f844 next dev is 0.27 2021-02-27 22:49:56 +01:00
Leigh Morresi
30d53c353f Tweak to tests 2021-02-27 22:09:25 +01:00
Leigh Morresi
47fcb8b4f8 Move logic 2021-02-27 22:01:42 +01:00
Leigh Morresi
0ec9edb971 Remove erroneous extra liveserver setup 2021-02-27 20:30:36 +01:00
Leigh Morresi
f1da8f96b6 When new ignore text is specified, reprocess the checksum 2021-02-27 20:30:06 +01:00
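A sketch of the idea behind this commit: when the ignore-text rules change, the stored MD5 is recomputed from the latest snapshot with those lines stripped, so the next fetch is compared like-for-like and does not trigger a false change. strip_ignore_text here is a simplified stand-in for the handler in backend/fetch_site_status:

    import hashlib

    def strip_ignore_text(content, ignore_lines):
        kept = [line for line in content.splitlines()
                if not any(ignored in line for ignored in ignore_lines)]
        return "\n".join(kept).encode('utf-8')

    def checksum_with_ignores(snapshot_path, ignore_lines):
        with open(snapshot_path, encoding='utf-8') as f:
            stripped = strip_ignore_text(f.read(), ignore_lines)
        return hashlib.md5(stripped).hexdigest()   # becomes the new previous_md5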
Leigh Morresi
8bc7b5be40 Adding filter and log output to pytest 2021-02-27 20:29:52 +01:00
Leigh Morresi
022826493b Fix edit action link 2021-02-27 20:29:01 +01:00
Leigh Morresi
092f77f066 Minor lint cleanup 2021-02-27 09:38:51 +01:00
Leigh Morresi
013cbcabd4 Clean up after test case 2021-02-27 09:37:40 +01:00
Leigh Morresi
66be95ecc6 Remove liveserver, doesnt belong here 2021-02-27 09:08:25 +01:00
Leigh Morresi
efe0356f37 Fix syntax, Triggers the workflow on push or pull request events 2021-02-27 09:06:54 +01:00
Leigh Morresi
ec1ac300af Activate workflow on all branches 2021-02-27 09:05:25 +01:00
Leigh Morresi
468184bc3a Issue #14 - Tweaks to edit, create ignore text, tests for ignore text, integrate ignore text 2021-02-26 20:07:26 +01:00
Leigh Morresi
0855017dca Validation of added headers, should contain key/val (2 parts) 2021-02-26 16:52:14 +01:00
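A small sketch of the validation this commit describes, mirroring the split(':', 1) check in the edit handler below: a supplied header line is only accepted if it splits into exactly two parts, a key and a value. parse_extra_headers is an illustrative name:

    def parse_extra_headers(raw):
        headers = {}
        for line in raw.strip().split("\n"):
            if line:
                parts = line.split(':', 1)
                if len(parts) == 2:                          # require both a key and a value
                    headers[parts[0].strip()] = parts[1].strip()
        return headers

    # e.g. parse_extra_headers("User-Agent: my-bot\nBroken-Line") -> {'User-Agent': 'my-bot'}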
Leigh Morresi
ae0f640ff4 Issue #12 include version for easy reference. 2021-02-24 14:44:35 +01:00
Leigh Morresi
cd6629ac2d Bring dev environment inline 2021-02-24 14:44:28 +01:00
Leigh Morresi
3c3ca7944b Tidying up requirements.txt 2021-02-24 14:44:13 +01:00
Leigh Morresi
b0fb52017c Handle the case of someone supplying a bad link 2021-02-24 09:56:29 +01:00
Leigh Morresi
fc6fba377a Merge branch 'master' of github.com:dgtlmoon/changedetection.io 2021-02-24 09:53:58 +01:00
Leigh Morresi
7ea39ada7c Adding jump to next change diff widget 2021-02-24 09:53:40 +01:00
dgtlmoon
e98ea37342 Moving nice screenshot to above the fold :) 2021-02-22 16:39:04 +01:00
dgtlmoon
e20577df15 Adding docker hub badge for tag information 2021-02-22 14:48:57 +01:00
Leigh Morresi
19dcbc2f08 Bumping schema tag to 0.25 2021-02-22 08:53:04 +01:00
Leigh Morresi
c59838a6e4 Issue #5 - Remove arbitrary '600' minutes limit 2021-02-22 08:38:41 +01:00
Leigh Morresi
0a8c339535 Add test delay for github action test 2021-02-21 21:08:04 +01:00
Leigh Morresi
cd5b703037 Add wait for threads in test 2021-02-21 20:54:15 +01:00
Leigh Morresi
90642742bd Extending tests to cover resetting the diff/unviewed status correctly 2021-02-21 20:46:56 +01:00
Leigh Morresi
96221598e7 Tidy up return logic 2021-02-21 20:23:50 +01:00
Leigh Morresi
98623de38c Code tidy 2021-02-21 20:14:35 +01:00
Leigh Morresi
33985dbd9d Fix docker app files paths 2021-02-21 16:31:42 +01:00
Leigh Morresi
a3a5ca78bf Tweaking Dockerfile for new eventlet wrapper 2021-02-21 16:13:55 +01:00
dgtlmoon
3fcbbb3fbf Create LICENSE 2021-02-21 15:42:45 +01:00
dgtlmoon
70252b24f9 Adding docker pulls counter badge 2021-02-21 15:39:17 +01:00
dgtlmoon
0a08616c87 Merge pull request #11 from dgtlmoon/pytest
Separate flask from eventlet runtime and get pytest working
2021-02-21 15:22:54 +01:00
Leigh Morresi
beebba487c Use master branch for badge 2021-02-21 15:21:30 +01:00
Leigh Morresi
cbeafcbaa0 Removing unused import 2021-02-21 14:26:58 +01:00
Leigh Morresi
e200cd3289 Fixing a few more easy lint wins 2021-02-21 14:26:19 +01:00
Leigh Morresi
22c7a1a88d Merge branch 'pytest' of github.com:dgtlmoon/changedetection.io into pytest 2021-02-21 14:21:45 +01:00
Leigh Morresi
63eea2d6db Linting fixups 2021-02-21 14:21:14 +01:00
dgtlmoon
3e9a110671 Update README.md 2021-02-21 14:15:21 +01:00
Leigh Morresi
22bc8fabd1 Add badge under pytest branch 2021-02-21 14:14:27 +01:00
Leigh Morresi
9030070b3d Merge branch 'master' into pytest 2021-02-21 14:09:49 +01:00
dgtlmoon
fca7bb8583 Create python-app.yml 2021-02-21 14:09:34 +01:00
Leigh Morresi
3c175bfc4a Create the test datastore 2021-02-21 14:08:34 +01:00
Leigh Morresi
fd5475ba38 Minor cleanup 2021-02-21 14:05:52 +01:00
Leigh Morresi
b0c5dbd88e Just use the current/previous md5 2021-02-21 13:46:16 +01:00
Leigh Morresi
1718e2e86f Finalise pytest methods 2021-02-21 13:41:00 +01:00
Leigh Morresi
b46a7fc4b1 Port should be an integer 2021-02-21 13:40:48 +01:00
Leigh Morresi
4770ebb2ea Tweaking client 2021-02-16 21:48:38 +01:00
Leigh Morresi
d4db082c01 remove unused imports 2021-02-16 21:44:44 +01:00
Leigh Morresi
c8607ae8bb Use session/client fixture 2021-02-16 21:42:26 +01:00
Leigh Morresi
b361a61d18 Adding missing files 2021-02-16 21:36:41 +01:00
Leigh Morresi
87f4347fe5 hack of pytest implementation - doesnt work yet 2021-02-16 21:35:28 +01:00
Leigh Morresi
93ee65fe53 Tidy up a few broken datastore paths 2021-02-12 19:43:05 +01:00
Leigh Morresi
9f964b6d3f WIP, separate out the Flask from everything else, get pytest working 2021-02-12 19:24:30 +01:00
Leigh Morresi
426b09b7e1 Make records in the overview that have a difference that have not been viewed in the [diff] tab bold 2021-02-11 10:36:54 +01:00
Leigh Morresi
ec98415c4d Adding 0.24 tag 2021-02-05 18:46:00 +01:00
Leigh Morresi
47e5a7cf09 Avoid accidentally using Python's objects that are copied - but land as a 'soft reference', need to use a better dict struct in the future #6 2021-02-05 18:43:35 +01:00
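An illustration of the pitfall this commit refers to: assigning a dict only copies a reference (a "soft reference" to the same object), so later mutations also appear in the original, whereas copy.deepcopy yields an independent structure. The generic_definition name is illustrative:

    import copy

    generic_definition = {'url': None, 'tag': '', 'history': {}}

    aliased = generic_definition                   # same object, just another name for it
    aliased['tag'] = 'news'
    assert generic_definition['tag'] == 'news'     # the "original" was mutated too

    independent = copy.deepcopy(generic_definition)
    independent['tag'] = 'sports'
    assert generic_definition['tag'] == 'news'     # original untouched this time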
Leigh Morresi
d07cf53a07 Minor fix to 'last changed' field, simplify template and logic 2021-02-04 13:15:39 +01:00
Leigh Morresi
b9f73a6240 Remove debug print 2021-02-04 12:55:13 +01:00
Leigh Morresi
5e31ae86d0 Use a thread locker and cleaner separation of concerns between main thread and site status fetch 2021-02-04 12:38:48 +01:00
38 changed files with 2328 additions and 736 deletions

2
.dockerignore Normal file

@@ -0,0 +1,2 @@
.git
.github

9
.github/FUNDING.yml vendored

@@ -1,12 +1,3 @@
 # These are supported funding model platforms
 github: dgtlmoon
-patreon: # Replace with a single Patreon username
-open_collective: # Replace with a single Open Collective username
-ko_fi: # Replace with a single Ko-fi username
-tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
-community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
-liberapay: # Replace with a single Liberapay username
-issuehunt: # Replace with a single IssueHunt username
-otechie: # Replace with a single Otechie username
-custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

67
.github/workflows/codeql-analysis.yml vendored Normal file

@@ -0,0 +1,67 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"
on:
  push:
    branches: [ master ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master ]
  schedule:
    - cron: '27 9 * * 4'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        language: [ 'javascript', 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
        # Learn more:
        # https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed

    steps:
    - name: Checkout repository
      uses: actions/checkout@v2

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v1
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.
        # queries: ./path/to/local/query, your-org/your-repo/queries@main

    # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v1

    # Command-line programs to run using the OS shell.
    # 📚 https://git.io/JvXDl

    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
    #    and modify them (or add more) to build your code if your project
    #    uses a compiled language

    #- run: |
    #   make bootstrap
    #   make release

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v1

77
.github/workflows/image.yml vendored Normal file

@@ -0,0 +1,77 @@
name: Test, build and push to Docker Hub
on:
  push:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi

      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

      - name: Test with pytest
        run: |
          cd backend; pytest

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
        with:
          image: tonistiigi/binfmt:latest
          platforms: all

      - name: Login to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_HUB_USERNAME }}
          password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          install: true
          version: latest
          driver-opts: image=moby/buildkit:master

      - name: Build and push
        id: docker_build
        uses: docker/build-push-action@v2
        with:
          context: ./
          file: ./Dockerfile
          push: true
          tags: |
            ${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:latest
            # ${{ secrets.DOCKER_HUB_USERNAME }}:/changedetection.io:${{ env.RELEASE_VERSION }}
          platforms: linux/amd64,linux/arm64,linux/arm/v6,linux/arm/v7
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache

      - name: Image digest
        run: echo ${{ steps.docker_build.outputs.digest }}

      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-

35
.github/workflows/test-only.yml vendored Normal file

@@ -0,0 +1,35 @@
name: Test only
on:
  push:
    branches:
      - /refs/heads/*
      - !master

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi

      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

      - name: Test with pytest
        run: |
          cd backend; pytest

4
.gitignore vendored

@@ -2,4 +2,6 @@ __pycache__
 .idea
 *.pyc
 datastore/url-watches.json
 datastore/*
+__pycache__
+.pytest_cache

Dockerfile

@@ -1,24 +1,30 @@
 FROM python:3.8-slim
 COPY requirements.txt /tmp/requirements.txt
-RUN pip3 install -r /tmp/requirements.txt
-COPY backend /app
+RUN apt-get update && apt-get install -y gcc libc-dev libxslt-dev zlib1g-dev g++ --no-install-recommends && rm -rf /var/lib/apt/lists/* /var/cache/apt/*
+RUN pip3 install --no-cache-dir -r /tmp/requirements.txt
+RUN [ ! -d "/app" ] && mkdir /app
+RUN [ ! -d "/datastore" ] && mkdir /datastore
+# The actual flask app
+COPY backend /app/backend
+# The eventlet server wrapper
+COPY changedetection.py /app/changedetection.py
 WORKDIR /app
 # https://stackoverflow.com/questions/58701233/docker-logs-erroneously-appears-empty-until-container-stops
 ENV PYTHONUNBUFFERED=1
 # Attempt to store the triggered commit
 ARG SOURCE_COMMIT
 ARG SOURCE_BRANCH
 RUN echo "commit: $SOURCE_COMMIT branch: $SOURCE_BRANCH" >/source.txt
-RUN [ ! -d "/datastore" ] && mkdir /datastore
-CMD [ "python", "./backend.py" ]
+CMD [ "python", "./changedetection.py" , "-d", "/datastore"]

201
LICENSE Normal file

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md

@@ -1,9 +1,20 @@
 # changedetection.io
+![changedetection.io](https://github.com/dgtlmoon/changedetection.io/actions/workflows/image.yml/badge.svg?branch=master)
+<a href="https://hub.docker.com/r/dgtlmoon/changedetection.io" target="_blank" title="Change detection docker hub">
+<img src="https://img.shields.io/docker/pulls/dgtlmoon/changedetection.io" alt="Docker Pulls"/>
+</a>
+<a href="https://hub.docker.com/r/dgtlmoon/changedetection.io" target="_blank" title="Change detection docker hub">
+<img src="https://img.shields.io/github/v/release/dgtlmoon/changedetection.io" alt="Change detection latest tag version"/>
+</a>
 ## Self-hosted change monitoring of web pages.
 _Know when web pages change! Stay ontop of new information!_
+Live your data-life *pro-actively* instead of *re-actively*, do not rely on manipulative social media for consuming important information.
+<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot.png" style="max-width:100%;" alt="Self-hosted web page change monitoring" title="Self-hosted web page change monitoring" />
 #### Example use cases
@@ -11,7 +22,7 @@ Know when ...
 - Government department updates (changes are often only on their websites)
 - Local government news (changes are often only on their websites)
-- New software releases
+- New software releases, security advisories when you're not on their mailing list.
 - Festivals with changes
 - Realestate listing changes
@@ -37,22 +48,9 @@ docker run -d --restart always -p "127.0.0.1:5000:5000" -v datastore-volume:/dat
 ### Screenshots
-Application running.
-![Self-hosted web page change monitoring application screenshot](screenshot.png?raw=true "Self-hosted web page change monitoring screenshot")
 Examining differences in content.
-![Self-hosted web page change monitoring context difference screenshot](screenshot-diff.png?raw=true "Self-hosted web page change monitoring context difference screenshot")
+<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot-diff.png" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />
-### Future plans
-- Greater configuration of check interval times, page request headers.
-- ~~General options for timeout, default headers~~
-- On change detection, callout to another API (handy for notices/issue trackers)
-- ~~Explore the differences that were detected~~
-- Add more options to explore versions of differences
-- Use a graphic/rendered page difference instead of text (see the experimental `selenium-screenshot-diff` branch)
 Please :star: star :star: this project and help it grow! https://github.com/dgtlmoon/changedetection.io/

1
backend/README-pytest.md Normal file

@@ -0,0 +1 @@
Note: run `pytest` from this directory.

790
backend/__init__.py Normal file

@@ -0,0 +1,790 @@
#!/usr/bin/python3
# @todo logging
# @todo extra options for url like , verify=False etc.
# @todo enable https://urllib3.readthedocs.io/en/latest/user-guide.html#ssl as option?
# @todo option for interval day/6 hour/etc
# @todo on change detected, config for calling some API
# @todo make tables responsive!
# @todo fetch title into json
# https://distill.io/features
# proxy per check
# - flask_cors, itsdangerous,MarkupSafe
import time
import os
import timeago
import flask_login
from flask_login import login_required
import threading
from threading import Event
import queue
from flask import Flask, render_template, request, send_from_directory, abort, redirect, url_for
from feedgen.feed import FeedGenerator
from flask import make_response
import datetime
import pytz
datastore = None
# Local
running_update_threads = []
ticker_thread = None
messages = []
extra_stylesheets = []
update_q = queue.Queue()
app = Flask(__name__, static_url_path="/var/www/change-detection/backen/static")
# Stop browser caching of assets
app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0
app.config.exit = Event()
app.config['NEW_VERSION_AVAILABLE'] = False
app.config['LOGIN_DISABLED'] = False
# Disables caching of the templates
app.config['TEMPLATES_AUTO_RELOAD'] = True
# We use the whole watch object from the store/JSON so we can see if there's some related status in terms of a thread
# running or something similar.
@app.template_filter('format_last_checked_time')
def _jinja2_filter_datetime(watch_obj, format="%Y-%m-%d %H:%M:%S"):
# Worker thread tells us which UUID it is currently processing.
for t in running_update_threads:
if t.current_uuid == watch_obj['uuid']:
return "Checking now.."
if watch_obj['last_checked'] == 0:
return 'Not yet'
return timeago.format(int(watch_obj['last_checked']), time.time())
# @app.context_processor
# def timeago():
# def _timeago(lower_time, now):
# return timeago.format(lower_time, now)
# return dict(timeago=_timeago)
@app.template_filter('format_timestamp_timeago')
def _jinja2_filter_datetimestamp(timestamp, format="%Y-%m-%d %H:%M:%S"):
return timeago.format(timestamp, time.time())
# return timeago.format(timestamp, time.time())
# return datetime.datetime.utcfromtimestamp(timestamp).strftime(format)
class User(flask_login.UserMixin):
id=None
def set_password(self, password):
return True
def get_user(self, email="defaultuser@changedetection.io"):
return self
def is_authenticated(self):
return True
def is_active(self):
return True
def is_anonymous(self):
return False
def get_id(self):
return str(self.id)
def check_password(self, password):
import hashlib
import base64
# Getting the values back out
raw_salt_pass = base64.b64decode(datastore.data['settings']['application']['password'])
salt_from_storage = raw_salt_pass[:32] # 32 is the length of the salt
# Use the exact same setup you used to generate the key, but this time put in the password to check
new_key = hashlib.pbkdf2_hmac(
'sha256',
password.encode('utf-8'), # Convert the password to bytes
salt_from_storage,
100000
)
new_key = salt_from_storage + new_key
return new_key == raw_salt_pass
pass
def changedetection_app(conig=None, datastore_o=None):
global datastore
datastore = datastore_o
app.config.update(dict(DEBUG=True))
#app.config.update(config or {})
login_manager = flask_login.LoginManager(app)
login_manager.login_view = 'login'
# Setup cors headers to allow all domains
# https://flask-cors.readthedocs.io/en/latest/
# CORS(app)
@login_manager.user_loader
def user_loader(email):
user = User()
user.get_user(email)
return user
@login_manager.unauthorized_handler
def unauthorized_handler():
# @todo validate its a URL of this host and use that
return redirect(url_for('login', next=url_for('index')))
@app.route('/logout')
def logout():
flask_login.logout_user()
return redirect(url_for('index'))
# https://github.com/pallets/flask/blob/93dd1709d05a1cf0e886df6223377bdab3b077fb/examples/tutorial/flaskr/__init__.py#L39
# You can divide up the stuff like this
@app.route('/login', methods=['GET', 'POST'])
def login():
global messages
if request.method == 'GET':
output = render_template("login.html", messages=messages)
# Show messages but once.
messages = []
return output
user = User()
user.id = "defaultuser@changedetection.io"
password = request.form.get('password')
if (user.check_password(password)):
flask_login.login_user(user, remember=True)
next = request.args.get('next')
# if not is_safe_url(next):
# return flask.abort(400)
return redirect(next or url_for('index'))
else:
messages.append({'class': 'error', 'message': 'Incorrect password'})
return redirect(url_for('login'))
@app.before_request
def do_something_whenever_a_request_comes_in():
# Disable password loginif there is not one set
app.config['LOGIN_DISABLED'] = datastore.data['settings']['application']['password'] == False
@app.route("/", methods=['GET'])
@login_required
def index():
global messages
limit_tag = request.args.get('tag')
pause_uuid = request.args.get('pause')
if pause_uuid:
try:
datastore.data['watching'][pause_uuid]['paused'] ^= True
datastore.needs_write = True
return redirect(url_for('index', tag = limit_tag))
except KeyError:
pass
# Sort by last_changed and add the uuid which is usually the key..
sorted_watches = []
for uuid, watch in datastore.data['watching'].items():
if limit_tag != None:
# Support for comma separated list of tags.
for tag_in_watch in watch['tag'].split(','):
tag_in_watch = tag_in_watch.strip()
if tag_in_watch == limit_tag:
watch['uuid'] = uuid
sorted_watches.append(watch)
else:
watch['uuid'] = uuid
sorted_watches.append(watch)
sorted_watches.sort(key=lambda x: x['last_changed'], reverse=True)
existing_tags = datastore.get_all_tags()
rss = request.args.get('rss')
if rss:
fg = FeedGenerator()
fg.title('changedetection.io')
fg.description('Feed description')
fg.link(href='https://changedetection.io')
for watch in sorted_watches:
if not watch['viewed']:
fe = fg.add_entry()
fe.title(watch['url'])
fe.link(href=watch['url'])
fe.description(watch['url'])
fe.guid(watch['uuid'], permalink=False)
dt = datetime.datetime.fromtimestamp(int(watch['newest_history_key']))
dt = dt.replace(tzinfo=pytz.UTC)
fe.pubDate(dt)
response = make_response(fg.rss_str())
response.headers.set('Content-Type', 'application/rss+xml')
return response
else:
output = render_template("watch-overview.html",
watches=sorted_watches,
messages=messages,
tags=existing_tags,
active_tag=limit_tag,
has_unviewed=datastore.data['has_unviewed'])
# Show messages but once.
messages = []
return output
@app.route("/scrub", methods=['GET', 'POST'])
@login_required
def scrub_page():
from pathlib import Path
global messages
if request.method == 'POST':
confirmtext = request.form.get('confirmtext')
limit_timestamp = int(request.form.get('limit_date'))
if confirmtext == 'scrub':
for uuid, watch in datastore.data['watching'].items():
if len(str(limit_timestamp)) == 10:
datastore.scrub_watch(uuid, limit_timestamp = limit_timestamp)
else:
datastore.scrub_watch(uuid)
messages.append({'class': 'ok', 'message': 'Cleaned all version history.'})
else:
messages.append({'class': 'error', 'message': 'Wrong confirm text.'})
return redirect(url_for('index'))
return render_template("scrub.html")
# If they edited an existing watch, we need to know to reset the current/previous md5 to include
# the excluded text.
def get_current_checksum_include_ignore_text(uuid):
import hashlib
from backend import fetch_site_status
# Get the most recent one
newest_history_key = datastore.get_val(uuid, 'newest_history_key')
# 0 means that theres only one, so that there should be no 'unviewed' history availabe
if newest_history_key == 0:
newest_history_key = list(datastore.data['watching'][uuid]['history'].keys())[0]
if newest_history_key:
with open(datastore.data['watching'][uuid]['history'][newest_history_key],
encoding='utf-8') as file:
raw_content = file.read()
handler = fetch_site_status.perform_site_check(datastore=datastore)
stripped_content = handler.strip_ignore_text(raw_content,
datastore.data['watching'][uuid]['ignore_text'])
checksum = hashlib.md5(stripped_content).hexdigest()
return checksum
return datastore.data['watching'][uuid]['previous_md5']
@app.route("/edit/<string:uuid>", methods=['GET', 'POST'])
@login_required
def edit_page(uuid):
global messages
import validators
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
if request.method == 'POST':
url = request.form.get('url').strip()
tag = request.form.get('tag').strip()
# Extra headers
form_headers = request.form.get('headers').strip().split("\n")
extra_headers = {}
if form_headers:
for header in form_headers:
if len(header):
parts = header.split(':', 1)
if len(parts) == 2:
extra_headers.update({parts[0].strip(): parts[1].strip()})
update_obj = {'url': url,
'tag': tag,
'headers': extra_headers
}
# Ignore text
form_ignore_text = request.form.get('ignore-text').strip()
ignore_text = []
if len(form_ignore_text):
for text in form_ignore_text.split("\n"):
text = text.strip()
if len(text):
ignore_text.append(text)
datastore.data['watching'][uuid]['ignore_text'] = ignore_text
# Reset the previous_md5 so we process a new snapshot including stripping ignore text.
if len(datastore.data['watching'][uuid]['history']):
update_obj['previous_md5'] = get_current_checksum_include_ignore_text(uuid=uuid)
validators.url(url) # @todo switch to prop/attr/observer
datastore.data['watching'][uuid].update(update_obj)
datastore.needs_write = True
messages.append({'class': 'ok', 'message': 'Updated watch.'})
return redirect(url_for('index'))
else:
output = render_template("edit.html", uuid=uuid, watch=datastore.data['watching'][uuid], messages=messages)
return output
@app.route("/settings", methods=['GET', "POST"])
@login_required
def settings_page():
global messages
if request.method == 'GET':
if request.values.get('removepassword'):
from pathlib import Path
datastore.data['settings']['application']['password'] = False
messages.append({'class': 'notice', 'message': "Password protection removed."})
flask_login.logout_user()
return redirect(url_for('settings_page'))
if request.method == 'POST':
password = request.values.get('password')
if password:
import hashlib
import base64
import secrets
# Make a new salt on every new password and store it with the password
salt = secrets.token_bytes(32)
key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
store = base64.b64encode(salt + key).decode('ascii')
datastore.data['settings']['application']['password'] = store
messages.append({'class': 'notice', 'message': "Password protection enabled."})
flask_login.logout_user()
return redirect(url_for('index'))
try:
minutes = int(request.values.get('minutes').strip())
except ValueError:
messages.append({'class': 'error', 'message': "Invalid value given, use an integer."})
else:
if minutes >= 5:
datastore.data['settings']['requests']['minutes_between_check'] = minutes
datastore.needs_write = True
messages.append({'class': 'ok', 'message': "Updated"})
else:
messages.append(
{'class': 'error', 'message': "Must be atleast 5 minutes."})
output = render_template("settings.html", messages=messages,
minutes=datastore.data['settings']['requests']['minutes_between_check'])
messages = []
return output
@app.route("/import", methods=['GET', "POST"])
@login_required
def import_page():
import validators
global messages
remaining_urls = []
good = 0
if request.method == 'POST':
urls = request.values.get('urls').split("\n")
for url in urls:
url = url.strip()
if len(url) and validators.url(url):
new_uuid = datastore.add_watch(url=url.strip(), tag="")
# Straight into the queue.
update_q.put(new_uuid)
good += 1
else:
if len(url):
remaining_urls.append(url)
messages.append({'class': 'ok', 'message': "{} Imported, {} Skipped.".format(good, len(remaining_urls))})
if len(remaining_urls) == 0:
# Looking good, redirect to index.
return redirect(url_for('index'))
# Could be some remaining, or we could be on GET
output = render_template("import.html",
messages=messages,
remaining="\n".join(remaining_urls)
)
messages = []
return output
# Clear all statuses, so we do not see the 'unviewed' class
@app.route("/api/mark-all-viewed", methods=['GET'])
@login_required
def mark_all_viewed():
# Save the current newest history as the most recently viewed
for watch_uuid, watch in datastore.data['watching'].items():
datastore.set_last_viewed(watch_uuid, watch['newest_history_key'])
messages.append({'class': 'ok', 'message': "Cleared all statuses."})
return redirect(url_for('index'))
@app.route("/diff/<string:uuid>", methods=['GET'])
@login_required
def diff_history_page(uuid):
global messages
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
extra_stylesheets = ['/static/css/diff.css']
try:
watch = datastore.data['watching'][uuid]
except KeyError:
messages.append({'class': 'error', 'message': "No history found for the specified link, bad link?"})
return redirect(url_for('index'))
dates = list(watch['history'].keys())
# Convert to int, sort and back to str again
dates = [int(i) for i in dates]
dates.sort(reverse=True)
dates = [str(i) for i in dates]
if len(dates) < 2:
messages.append(
{'class': 'error', 'message': "Not enough saved change detection snapshots to produce a report."})
return redirect(url_for('index'))
# Save the current newest history as the most recently viewed
datastore.set_last_viewed(uuid, dates[0])
newest_file = watch['history'][dates[0]]
with open(newest_file, 'r') as f:
newest_version_file_contents = f.read()
previous_version = request.args.get('previous_version')
try:
previous_file = watch['history'][previous_version]
except KeyError:
# Not present, use a default value, the second one in the sorted list.
previous_file = watch['history'][dates[1]]
with open(previous_file, 'r') as f:
previous_version_file_contents = f.read()
output = render_template("diff.html", watch_a=watch,
messages=messages,
newest=newest_version_file_contents,
previous=previous_version_file_contents,
extra_stylesheets=extra_stylesheets,
versions=dates[1:],
newest_version_timestamp=dates[0],
current_previous_version=str(previous_version),
current_diff_url=watch['url'])
return output
@app.route("/preview/<string:uuid>", methods=['GET'])
@login_required
def preview_page(uuid):
global messages
# More for testing, possible to return the first/only
if uuid == 'first':
uuid = list(datastore.data['watching'].keys()).pop()
extra_stylesheets = ['/static/css/diff.css']
try:
watch = datastore.data['watching'][uuid]
except KeyError:
messages.append({'class': 'error', 'message': "No history found for the specified link, bad link?"})
return redirect(url_for('index'))
print(watch)
with open(list(watch['history'].values())[-1], 'r') as f:
content = f.readlines()
output = render_template("preview.html", content=content, extra_stylesheets=extra_stylesheets)
return output
@app.route("/favicon.ico", methods=['GET'])
def favicon():
return send_from_directory("/app/static/images", filename="favicon.ico")
# We're good but backups are even better!
@app.route("/backup", methods=['GET'])
@login_required
def get_backup():
import zipfile
from pathlib import Path
# create a ZipFile object
backupname = "changedetection-backup-{}.zip".format(int(time.time()))
# We only care about UUIDS from the current index file
uuids = list(datastore.data['watching'].keys())
backup_filepath = os.path.join(app.config['datastore_path'], backupname)
with zipfile.ZipFile(backup_filepath, "w",
compression=zipfile.ZIP_DEFLATED,
compresslevel=8) as zipObj:
# Be sure we're written fresh
datastore.sync_to_json()
# Add the index
zipObj.write(os.path.join(app.config['datastore_path'], "url-watches.json"), arcname="url-watches.json")
# Add the flask app secret
zipObj.write(os.path.join(app.config['datastore_path'], "secret.txt"), arcname="secret.txt")
# Add any snapshot data we find, use the full path to access the file, but make the file 'relative' in the Zip.
for txt_file_path in Path(app.config['datastore_path']).rglob('*.txt'):
parent_p = txt_file_path.parent
if parent_p.name in uuids:
zipObj.write(txt_file_path,
arcname=str(txt_file_path).replace(app.config['datastore_path'], ''),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
return send_from_directory(app.config['datastore_path'], backupname)
@app.route("/static/<string:group>/<string:filename>", methods=['GET'])
def static_content(group, filename):
# These files should be in our subdirectory
full_path = os.path.realpath(__file__)
p = os.path.dirname(full_path)
try:
return send_from_directory("{}/static/{}".format(p, group), filename=filename)
except FileNotFoundError:
abort(404)
@app.route("/api/add", methods=['POST'])
@login_required
def api_watch_add():
global messages
url = request.form.get('url').strip()
if datastore.url_exists(url):
messages.append({'class': 'error', 'message': 'The URL {} already exists'.format(url)})
return redirect(url_for('index'))
# @todo add_watch should throw a custom Exception for validation etc
new_uuid = datastore.add_watch(url=url, tag=request.form.get('tag').strip())
# Straight into the queue.
update_q.put(new_uuid)
messages.append({'class': 'ok', 'message': 'Watch added.'})
return redirect(url_for('index'))
@app.route("/api/delete", methods=['GET'])
@login_required
def api_delete():
global messages
uuid = request.args.get('uuid')
datastore.delete(uuid)
messages.append({'class': 'ok', 'message': 'Deleted.'})
return redirect(url_for('index'))
@app.route("/api/checknow", methods=['GET'])
@login_required
def api_watch_checknow():
global messages
tag = request.args.get('tag')
uuid = request.args.get('uuid')
i = 0
running_uuids = []
for t in running_update_threads:
running_uuids.append(t.current_uuid)
# @todo check thread is running and skip
if uuid:
if uuid not in running_uuids:
update_q.put(uuid)
i = 1
elif tag != None:
# Items that have this current tag
for watch_uuid, watch in datastore.data['watching'].items():
if (tag != None and tag in watch['tag']):
if watch_uuid not in running_uuids and not datastore.data['watching'][watch_uuid]['paused']:
update_q.put(watch_uuid)
i += 1
else:
# No tag, no uuid, add everything.
for watch_uuid, watch in datastore.data['watching'].items():
if watch_uuid not in running_uuids and not datastore.data['watching'][watch_uuid]['paused']:
update_q.put(watch_uuid)
i += 1
messages.append({'class': 'ok', 'message': "{} watches are rechecking.".format(i)})
return redirect(url_for('index', tag=tag))
# @todo handle ctrl break
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks).start()
# Check for new release version
threading.Thread(target=check_for_new_version).start()
return app
# Check for new version and anonymous stats
def check_for_new_version():
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
while not app.config.exit.is_set():
try:
r = requests.post("https://changedetection.io/check-ver.php",
data={'version': datastore.data['version_tag'],
'app_guid': datastore.data['app_guid']},
verify=False)
except:
pass
try:
if "new_version" in r.text:
app.config['NEW_VERSION_AVAILABLE'] = True
except:
pass
# Check daily
app.config.exit.wait(86400)
# Requests for checking on the site use a pool of thread Workers managed by a Queue.
class Worker(threading.Thread):
current_uuid = None
def __init__(self, q, *args, **kwargs):
self.q = q
super().__init__(*args, **kwargs)
def run(self):
from backend import fetch_site_status
update_handler = fetch_site_status.perform_site_check(datastore=datastore)
while not app.config.exit.is_set():
try:
uuid = self.q.get(block=False)
except queue.Empty:
pass
else:
self.current_uuid = uuid
if uuid in list(datastore.data['watching'].keys()):
try:
changed_detected, result, contents = update_handler.run(uuid)
except PermissionError as s:
app.logger.error("File permission error updating", uuid, str(s))
else:
if result:
datastore.update_watch(uuid=uuid, update_obj=result)
if changed_detected:
# A change was detected
datastore.save_history_text(uuid=uuid, contents=contents, result_obj=result)
self.current_uuid = None # Done
self.q.task_done()
app.config.exit.wait(1)
# Thread runner to check every minute, look for new watches to feed into the Queue.
def ticker_thread_check_time_launch_checks():
# Spin up Workers.
for _ in range(datastore.data['settings']['requests']['workers']):
new_worker = Worker(update_q)
running_update_threads.append(new_worker)
new_worker.start()
while not app.config.exit.is_set():
running_uuids = []
for t in running_update_threads:
if t.current_uuid:
running_uuids.append(t.current_uuid)
# Look at the dataset, find a stale watch to process
# Every minute check for new UUIDs to follow up on, should be inside the loop incase it changes.
minutes = datastore.data['settings']['requests']['minutes_between_check']
threshold = time.time() - (minutes * 60)
for uuid, watch in datastore.data['watching'].items():
if not watch['paused'] and watch['last_checked'] <= threshold:
if not uuid in running_uuids and uuid not in update_q.queue:
update_q.put(uuid)
# Should be low so we can break this out in testing
app.config.exit.wait(1)

backend/backend.py

@@ -1,486 +0,0 @@
#!/usr/bin/python3
# @todo logging
# @todo sort by last_changed
# @todo extra options for url like , verify=False etc.
# @todo enable https://urllib3.readthedocs.io/en/latest/user-guide.html#ssl as option?
# @todo maybe a button to reset all 'last-changed'.. so you can see it clearly when something happens since your last visit
# @todo option for interval day/6 hour/etc
# @todo on change detected, config for calling some API
# @todo make tables responsive!
# @todo fetch title into json
# https://distill.io/features
# proxy per check
#i
import json
import eventlet
import eventlet.wsgi
import time
import os
import getopt
import sys
import datetime
import timeago
import threading
import queue
from flask import Flask, render_template, request, send_file, send_from_directory, safe_join, abort, redirect, url_for
# Local
import store
import fetch_site_status
running_update_threads = []
ticker_thread = None
datastore = store.ChangeDetectionStore()
messages = []
extra_stylesheets = []
update_q = queue.Queue()
app = Flask(__name__, static_url_path='/static')
app.config['STATIC_RESOURCES'] = "/app/static"
app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0
# app.config['SECRET_KEY'] = 'secret!'
# Disables caching of the templates
app.config['TEMPLATES_AUTO_RELOAD'] = True
# We use the whole watch object from the store/JSON so we can see if there's some related status in terms of a thread
# running or something similar.
@app.template_filter('format_last_checked_time')
def _jinja2_filter_datetime(watch_obj, format="%Y-%m-%d %H:%M:%S"):
# Worker thread tells us which UUID it is currently processing.
for t in running_update_threads:
if t.current_uuid == watch_obj['uuid']:
return "Checking now.."
if watch_obj['last_checked'] == 0:
return 'Not yet'
return timeago.format(int(watch_obj['last_checked']), time.time())
# @app.context_processor
# def timeago():
# def _timeago(lower_time, now):
# return timeago.format(lower_time, now)
# return dict(timeago=_timeago)
@app.template_filter('format_timestamp_timeago')
def _jinja2_filter_datetimestamp(timestamp, format="%Y-%m-%d %H:%M:%S"):
if timestamp == 0:
return 'Not yet'
return timeago.format(timestamp, time.time())
# return timeago.format(timestamp, time.time())
# return datetime.datetime.utcfromtimestamp(timestamp).strftime(format)
@app.route("/", methods=['GET'])
def main_page():
global messages
limit_tag = request.args.get('tag')
# Sort by last_changed and add the uuid which is usually the key..
sorted_watches = []
for uuid, watch in datastore.data['watching'].items():
if limit_tag != None:
# Support for comma separated list of tags.
for tag_in_watch in watch['tag'].split(','):
tag_in_watch = tag_in_watch.strip()
if tag_in_watch == limit_tag:
watch['uuid'] = uuid
sorted_watches.append(watch)
else:
watch['uuid'] = uuid
sorted_watches.append(watch)
sorted_watches.sort(key=lambda x: x['last_changed'], reverse=True)
existing_tags = datastore.get_all_tags()
output = render_template("watch-overview.html",
watches=sorted_watches,
messages=messages,
tags=existing_tags,
active_tag=limit_tag)
# Show messages but once.
messages = []
return output
@app.route("/scrub", methods=['GET', 'POST'])
def scrub_page():
from pathlib import Path
global messages
if request.method == 'POST':
confirmtext = request.form.get('confirmtext')
if confirmtext == 'scrub':
for txt_file_path in Path('/datastore').rglob('*.txt'):
os.unlink(txt_file_path)
for uuid, watch in datastore.data['watching'].items():
watch['last_checked'] = 0
watch['last_changed'] = 0
watch['previous_md5'] = None
watch['history'] = {}
datastore.needs_write = True
messages.append({'class': 'ok', 'message': 'Cleaned all version history.'})
else:
messages.append({'class': 'error', 'message': 'Wrong confirm text.'})
return redirect(url_for('main_page'))
return render_template("scrub.html")
@app.route("/edit", methods=['GET', 'POST'])
def edit_page():
global messages
import validators
if request.method == 'POST':
uuid = request.args.get('uuid')
url = request.form.get('url').strip()
tag = request.form.get('tag').strip()
form_headers = request.form.get('headers').strip().split("\n")
extra_headers = {}
if form_headers:
for header in form_headers:
if len(header):
parts = header.split(':', 1)
extra_headers.update({parts[0].strip(): parts[1].strip()})
validators.url(url) # @todo switch to prop/attr/observer
datastore.data['watching'][uuid].update({'url': url,
'tag': tag,
'headers': extra_headers})
datastore.needs_write = True
messages.append({'class': 'ok', 'message': 'Updated watch.'})
return redirect(url_for('main_page'))
else:
uuid = request.args.get('uuid')
output = render_template("edit.html", uuid=uuid, watch=datastore.data['watching'][uuid], messages=messages)
return output
@app.route("/settings", methods=['GET', "POST"])
def settings_page():
global messages
if request.method == 'POST':
try:
minutes = int(request.values.get('minutes').strip())
except ValueError:
messages.append({'class': 'error', 'message': "Invalid value given, use an integer."})
else:
if minutes >= 5 and minutes <= 600:
datastore.data['settings']['requests']['minutes_between_check'] = minutes
datastore.needs_write = True
messages.append({'class': 'ok', 'message': "Updated"})
else:
messages.append({'class': 'error', 'message': "Must be equal to or greater than 5 and less than 600 minutes"})
output = render_template("settings.html", messages=messages, minutes=datastore.data['settings']['requests']['minutes_between_check'])
messages =[]
return output
@app.route("/import", methods=['GET', "POST"])
def import_page():
import validators
global messages
remaining_urls=[]
good = 0
if request.method == 'POST':
urls = request.values.get('urls').split("\n")
for url in urls:
url = url.strip()
if len(url) and validators.url(url):
datastore.add_watch(url=url.strip(), tag="")
good += 1
else:
if len(url):
remaining_urls.append(url)
messages.append({'class': 'ok', 'message': "{} Imported, {} Skipped.".format(good, len(remaining_urls))})
output = render_template("import.html",
messages=messages,
remaining="\n".join(remaining_urls)
)
messages = []
return output
@app.route("/diff/<string:uuid>", methods=['GET'])
def diff_history_page(uuid):
global messages
extra_stylesheets=['/static/css/diff.css']
watch = datastore.data['watching'][uuid]
dates = list(watch['history'].keys())
# Convert to int, sort and back to str again
dates = [int(i) for i in dates]
dates.sort(reverse=True)
dates = [str(i) for i in dates]
newest_file = watch['history'][dates[0]]
with open(newest_file, 'r') as f:
newest_version_file_contents = f.read()
previous_version = request.args.get('previous_version')
try:
previous_file = watch['history'][previous_version]
except KeyError:
# Not present, use a default value, the second one in the sorted list.
previous_file = watch['history'][dates[1]]
with open(previous_file, 'r') as f:
previous_version_file_contents = f.read()
output = render_template("diff.html", watch_a=watch,
messages=messages,
newest=newest_version_file_contents,
previous=previous_version_file_contents,
extra_stylesheets=extra_stylesheets,
versions=dates[1:],
newest_version_timestamp=dates[0],
current_previous_version=str(previous_version),
current_diff_url=watch['url'])
return output
@app.route("/favicon.ico", methods=['GET'])
def favicon():
return send_from_directory("/app/static/images", filename="favicon.ico")
# We're good but backups are even better!
@app.route("/backup", methods=['GET'])
def get_backup():
import zipfile
from pathlib import Path
import zlib
# create a ZipFile object
backupname = "changedetection-backup-{}.zip".format(int(time.time()))
# We only care about UUIDS from the current index file
uuids = list(datastore.data['watching'].keys())
with zipfile.ZipFile(os.path.join("/datastore", backupname), 'w', compression=zipfile.ZIP_DEFLATED,
compresslevel=6) as zipObj:
# Be sure we're written fresh
datastore.sync_to_json()
# Add the index
zipObj.write(os.path.join("/datastore", "url-watches.json"))
# Add any snapshot data we find
for txt_file_path in Path('/datastore').rglob('*.txt'):
parent_p = txt_file_path.parent
if parent_p.name in uuids:
zipObj.write(txt_file_path)
return send_file(os.path.join("/datastore", backupname),
as_attachment=True,
mimetype="application/zip",
attachment_filename=backupname)
# A few self sanity checks, mostly for developer/bug check
@app.route("/self-check", methods=['GET'])
def selfcheck():
output = "All fine"
# In earlier versions before a single threaded write of the JSON store, sometimes histories could get mixed.
# Could also maybe affect people who manually fiddle with their JSON store?
for uuid, watch in datastore.data['watching'].items():
for timestamp, path in watch['history'].items():
# Each history snapshot should include a full path, which contains the {uuid}
if not uuid in path:
output = "Something weird in {}, suspected incorrect snapshot path.".format(uuid)
return output
@app.route("/static/<string:group>/<string:filename>", methods=['GET'])
def static_content(group, filename):
try:
return send_from_directory("/app/static/{}".format(group), filename=filename)
except FileNotFoundError:
abort(404)
@app.route("/api/add", methods=['POST'])
def api_watch_add():
global messages
# @todo add_watch should throw a custom Exception for validation etc
new_uuid = datastore.add_watch(url=request.form.get('url').strip(), tag=request.form.get('tag').strip())
# Straight into the queue.
update_q.put(new_uuid)
messages.append({'class': 'ok', 'message': 'Watch added.'})
return redirect(url_for('main_page'))
@app.route("/api/delete", methods=['GET'])
def api_delete():
global messages
uuid = request.args.get('uuid')
datastore.delete(uuid)
messages.append({'class': 'ok', 'message': 'Deleted.'})
return redirect(url_for('main_page'))
@app.route("/api/checknow", methods=['GET'])
def api_watch_checknow():
global messages
tag = request.args.get('tag')
uuid = request.args.get('uuid')
i=0
if uuid:
update_q.put(uuid)
i = 1
elif tag != None:
for watch_uuid, watch in datastore.data['watching'].items():
if (tag != None and tag in watch['tag']):
i += 1
update_q.put(watch_uuid)
else:
# No tag, no uuid, add everything.
for watch_uuid, watch in datastore.data['watching'].items():
i += 1
update_q.put(watch_uuid)
messages.append({'class': 'ok', 'message': "{} watches are rechecking.".format(i)})
return redirect(url_for('main_page', tag=tag))
# Requests for checking on the site use a pool of thread Workers managed by a Queue.
class Worker(threading.Thread):
current_uuid = None
def __init__(self, q, *args, **kwargs):
self.q = q
super().__init__(*args, **kwargs)
def run(self):
try:
while True:
uuid = self.q.get() # Blocking
self.current_uuid = uuid
# A little safety protection
if uuid in list( datastore.data['watching'].keys()):
fetch_site_status.perform_site_check(uuid=uuid, datastore=datastore)
self.current_uuid = None # Done
self.q.task_done()
except KeyboardInterrupt:
return
# Thread runner to check every minute, look for new watches to feed into the Queue.
def ticker_thread_check_time_launch_checks():
# Spin up Workers.
for _ in range(datastore.data['settings']['requests']['workers']):
new_worker = Worker(update_q)
running_update_threads.append(new_worker)
new_worker.start()
# Every minute check for new UUIDs to follow up on
while True:
minutes = datastore.data['settings']['requests']['minutes_between_check']
for uuid, watch in datastore.data['watching'].items():
if watch['last_checked'] <= time.time() - (minutes * 60):
update_q.put(uuid)
time.sleep(60)
# Thread runner, this helps with thread/write issues when there are many operations that want to update the JSON
# by just running periodically in one thread, according to python, dict updates are threadsafe.
def save_datastore():
try:
while True:
if datastore.needs_write:
datastore.sync_to_json()
time.sleep(5)
except KeyboardInterrupt:
return
def main(argv):
ssl_mode = False
port = 5000
try:
opts, args = getopt.getopt(argv, "sp:", "purge")
except getopt.GetoptError:
print('backend.py -s SSL enable -p [port]')
sys.exit(2)
for opt, arg in opts:
if opt == '--purge':
# Remove history, the actual files you need to delete manually.
for uuid, watch in datastore.data['watching'].items():
watch.update({'history': {}, 'last_checked': 0, 'last_changed': 0, 'previous_md5': None})
if opt == '-s':
ssl_mode = True
if opt == '-p':
port = arg
# @todo handle ctrl break
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks).start()
save_data_thread = threading.Thread(target=save_datastore).start()
# @todo finalise SSL config, but this should get you in the right direction if you need it.
if ssl_mode:
eventlet.wsgi.server(eventlet.wrap_ssl(eventlet.listen(('', port)),
certfile='cert.pem',
keyfile='privkey.pem',
server_side=True), app)
else:
eventlet.wsgi.server(eventlet.listen(('', port)), app)
if __name__ == '__main__':
main(sys.argv[1:])
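
The removed backend.py above runs its checks through the pattern described by the "pool of thread Workers managed by a Queue" comment: a shared queue.Queue of watch UUIDs, a fixed number of Worker threads blocking on q.get(), and a ticker feeding stale watches in. A generic sketch of that pattern follows; the handler callback and the None shutdown sentinel are illustrative additions, not how this project stops its workers.

import queue
import threading

class Worker(threading.Thread):
    # Pulls UUIDs off the shared queue and processes them one at a time.
    def __init__(self, q, handler, *args, **kwargs):
        self.q = q
        self.handler = handler
        self.current_uuid = None
        super().__init__(*args, **kwargs)

    def run(self):
        while True:
            uuid = self.q.get()          # blocks until work arrives
            if uuid is None:             # sentinel value shuts the worker down
                self.q.task_done()
                return
            self.current_uuid = uuid
            try:
                self.handler(uuid)
            finally:
                self.current_uuid = None
                self.q.task_done()

update_q = queue.Queue()
workers = [Worker(update_q, handler=lambda uuid: print("checking", uuid)) for _ in range(3)]
for w in workers:
    w.start()

for uuid in ('watch-1', 'watch-2', 'watch-3'):
    update_q.put(uuid)
update_q.join()                          # wait until every queued item is processed

for _ in workers:
    update_q.put(None)                   # stop the pool
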


@@ -3,9 +3,7 @@ FROM python:3.8-slim
# https://stackoverflow.com/questions/58701233/docker-logs-erroneously-appears-empty-until-container-stops
ENV PYTHONUNBUFFERED=1
-# Should be mounted from docker-compose-development.yml
+WORKDIR /app
RUN pip3 install -r /requirements.txt
RUN [ ! -d "/datastore" ] && mkdir /datastore


@@ -1,9 +1,7 @@
import time
import sys
print ("Sleep loop, you should run your script from the console")
while True:
# Wait for 5 seconds
time.sleep(2)


@@ -1,59 +1,49 @@
import time import time
import requests import requests
import hashlib import hashlib
import os
import re
from inscriptis import get_text from inscriptis import get_text
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# Doesn't feel right having 'datastore' as a var here, perhaps this class can inherit from datastore/abstract # Some common stuff here that can be moved to a base class
# but on the other hand, I dont want a new instantiation of the that datastore object every time, due to it reading the
# JSON store, setting vars, writing etc.
class perform_site_check(): class perform_site_check():
def __init__(self, *args, uuid=False, datastore, **kwargs):
def __init__(self, *args, datastore, **kwargs):
super().__init__(*args, **kwargs) super().__init__(*args, **kwargs)
self.timestamp = int(time.time()) # used for storage etc too
self.uuid = uuid
self.datastore = datastore self.datastore = datastore
self.url = datastore.get_val(uuid, 'url')
self.current_md5 = datastore.get_val(uuid, 'previous_md5')
self.output_path = "/datastore/{}".format(self.uuid)
self.ensure_output_path() def strip_ignore_text(self, content, list_ignore_text):
self.run() ignore = []
for k in list_ignore_text:
ignore.append(k.encode('utf8'))
def save_firefox_screenshot(self, uuid, output): output = []
# @todo call selenium or whatever for line in content.splitlines():
return line = line.encode('utf8')
def ensure_output_path(self): # Always ignore blank lines in this mode. (when this function gets called)
if len(line.strip()):
if not any(skip_text in line for skip_text in ignore):
output.append(line)
try: return "\n".encode('utf8').join(output)
os.stat(self.output_path)
except:
os.mkdir(self.output_path)
def save_response_html_output(self, output):
# @todo Saving the original HTML can be very large, better to set as an option, these files could be important to some.
with open("{}/{}.html".format(self.output_path, self.timestamp), 'w') as f:
f.write(output)
f.close()
def save_response_stripped_output(self, output): def run(self, uuid):
fname = "{}/{}.stripped.txt".format(self.output_path, self.timestamp) timestamp = int(time.time()) # used for storage etc too
with open(fname, 'w') as f: stripped_text_from_html = False
f.write(output) changed_detected = False
f.close()
return fname update_obj = {'previous_md5': self.datastore.data['watching'][uuid]['previous_md5'],
'history': {},
"last_checked": timestamp
}
def run(self): extra_headers = self.datastore.get_val(uuid, 'headers')
extra_headers = self.datastore.get_val(self.uuid, 'headers')
# Tweak the base config with the per-watch ones # Tweak the base config with the per-watch ones
request_headers = self.datastore.data['settings']['headers'].copy() request_headers = self.datastore.data['settings']['headers']
request_headers.update(extra_headers) request_headers.update(extra_headers)
# https://github.com/psf/requests/issues/4525 # https://github.com/psf/requests/issues/4525
@@ -62,10 +52,6 @@ class perform_site_check():
if 'Accept-Encoding' in request_headers and "br" in request_headers['Accept-Encoding']: if 'Accept-Encoding' in request_headers and "br" in request_headers['Accept-Encoding']:
request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '') request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '')
print("Checking", self.url)
try: try:
timeout = self.datastore.data['settings']['requests']['timeout'] timeout = self.datastore.data['settings']['requests']['timeout']
except KeyError: except KeyError:
@@ -73,7 +59,9 @@ class perform_site_check():
timeout = 15 timeout = 15
try: try:
r = requests.get(self.url, url = self.datastore.get_val(uuid, 'url')
r = requests.get(url,
headers=request_headers, headers=request_headers,
timeout=timeout, timeout=timeout,
verify=False) verify=False)
@@ -82,43 +70,49 @@ class perform_site_check():
# Usually from networkIO/requests level # Usually from networkIO/requests level
except (requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout) as e: except (requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout) as e:
self.datastore.update_watch(self.uuid, 'last_error', str(e)) update_obj["last_error"] = str(e)
print(str(e)) print(str(e))
except requests.exceptions.MissingSchema: except requests.exceptions.MissingSchema:
print("Skipping {} due to missing schema/bad url".format(self.uuid)) print("Skipping {} due to missing schema/bad url".format(uuid))
# Usually from html2text level # Usually from html2text level
except UnicodeDecodeError as e: except UnicodeDecodeError as e:
self.datastore.update_watch(self.uuid, 'last_error', str(e))
update_obj["last_error"] = str(e)
print(str(e)) print(str(e))
# figure out how to deal with this cleaner.. # figure out how to deal with this cleaner..
# 'utf-8' codec can't decode byte 0xe9 in position 480: invalid continuation byte # 'utf-8' codec can't decode byte 0xe9 in position 480: invalid continuation byte
else: else:
# We rely on the actual text in the html output.. many sites have random script vars etc,
# in the future we'll implement other mechanisms.
# We rely on the actual text in the html output.. many sites have random script vars etc update_obj["last_check_status"] = r.status_code
self.datastore.update_watch(self.uuid, 'last_error', False) update_obj["last_error"] = False
self.datastore.update_watch(self.uuid, 'last_check_status', r.status_code)
fetched_md5 = hashlib.md5(stripped_text_from_html.encode('utf-8')).hexdigest() if not len(r.text):
update_obj["last_error"] = "Empty reply"
if self.current_md5 != fetched_md5: # If there's text to skip
# @todo we could abstract out the get_text() to handle this cleaner
if len(self.datastore.data['watching'][uuid]['ignore_text']):
content = self.strip_ignore_text(stripped_text_from_html,
self.datastore.data['watching'][uuid]['ignore_text'])
else:
content = stripped_text_from_html.encode('utf8')
# Dont confuse people by putting last-changed, when it actually just changed from nothing.. fetched_md5 = hashlib.md5(content).hexdigest()
if self.datastore.get_val(self.uuid, 'previous_md5') is not None:
self.datastore.update_watch(self.uuid, 'last_changed', self.timestamp)
self.datastore.update_watch(self.uuid, 'previous_md5', fetched_md5) # could be None or False depending on JSON type
self.save_response_html_output(r.text) if self.datastore.data['watching'][uuid]['previous_md5'] != fetched_md5:
output_filepath = self.save_response_stripped_output(stripped_text_from_html) changed_detected = True
# Update history with the stripped text for future reference, this will also mean we save the first # Don't confuse people by updating as last-changed, when it actually just changed from None..
# attempt because 'self.current_md5 != fetched_md5' (current_md5 will be None when not run) if self.datastore.get_val(uuid, 'previous_md5'):
# need to learn more about attr/setters/getters update_obj["last_changed"] = timestamp
history = self.datastore.get_val(self.uuid, 'history')
history.update(dict([(str(self.timestamp), output_filepath)]))
self.datastore.update_watch(self.uuid, 'history', history)
self.datastore.update_watch(self.uuid, 'last_checked', int(time.time())) update_obj["previous_md5"] = fetched_md5
pass
return changed_detected, update_obj, stripped_text_from_html
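
The reworked perform_site_check above no longer writes into the datastore itself; it returns (changed_detected, update_obj, stripped_text) and leaves persistence to the caller. Stripped of the per-watch headers, the Accept-Encoding tweak and the error handling, the detection idea reduces to: fetch the page, flatten it to text with inscriptis, optionally drop ignored lines, then compare an MD5 of the result against the previous one. The sketch below is a simplification under those assumptions; first-run behaviour and the bytes-level encoding handling of the real class are glossed over.

import hashlib
import requests
from inscriptis import get_text

def strip_ignore_text(content, ignore_terms):
    # Keep only non-blank lines that contain none of the ignore terms.
    kept = []
    for line in content.splitlines():
        if line.strip() and not any(term in line for term in ignore_terms):
            kept.append(line)
    return "\n".join(kept)

def check_for_change(url, previous_md5, ignore_terms=None, timeout=15):
    r = requests.get(url, timeout=timeout, verify=False)
    text = get_text(r.text)                      # HTML reduced to readable text
    if ignore_terms:
        text = strip_ignore_text(text, ignore_terms)
    fetched_md5 = hashlib.md5(text.encode('utf-8')).hexdigest()
    changed = (previous_md5 is not None) and (fetched_md5 != previous_md5)
    return changed, fetched_md5, text
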


@@ -1,14 +0,0 @@
from flask import make_response
from functools import wraps, update_wrapper
from datetime import datetime
def nocache(view):
@wraps(view)
def no_cache(*args, **kwargs):
response = make_response(view(*args, **kwargs))
response.headers['hmm'] = datetime.now()
return response
return update_wrapper(no_cache, view)

backend/pytest.ini

@@ -0,0 +1,12 @@
[pytest]
addopts = --no-start-live-server --live-server-port=5005
#testpaths = tests pytest_invenio
#live_server_scope = session
filterwarnings =
ignore::DeprecationWarning:urllib3.*:
; logging options
log_cli = 1
log_cli_level = DEBUG
log_cli_format = %(asctime)s %(name)s: %(levelname)s %(message)s


@@ -48,7 +48,13 @@ section.content {
/* table related */
.watch-table {
width: 100%;
}
.watch-table tr.unviewed {
font-weight: bold;
}
.watch-tag-list {
color: #e70069;
white-space: nowrap;
@@ -82,11 +88,16 @@ section.content {
margin: 0 3px 0 5px;
}
-#check-all-button {
+#post-list-buttons {
-text-align:right;
+text-align: right;
padding: 0px;
margin: 0px;
}
#post-list-buttons li {
display: inline-block;
}
-#check-all-button a {
+#post-list-buttons a {
border-top-left-radius: initial;
border-top-right-radius: initial;
border-bottom-left-radius: 5px;
@@ -208,3 +219,59 @@ body:after, body:before {
color: #fff;
}
#diff-col {
padding-left:40px;
}
#diff-jump {
position: fixed;
left: 0px;
top: 80px;
background: #fff;
padding: 10px;
border-top-right-radius: 5px;
border-bottom-right-radius: 5px;
box-shadow: 5px 0 5px -2px #888;
}
#diff-jump a {
color: #1b98f8;
cursor: grabbing;
-moz-user-select: none;
-webkit-user-select: none;
-ms-user-select:none;
user-select:none;
-o-user-select:none;
}
footer {
padding: 10px;
background: #fff;
color: #444;
text-align: center;
}
#feed-icon {
vertical-align: middle;
}
#version {
position: absolute;
top: 80px;
right: 0px;
font-size: 8px;
background: #fff;
padding: 10px;
}
#new-version-text a{
color: #e07171;
}
.paused-state.state-False img {
opacity: 0.2;
}
.paused-state.state-False:hover img{
opacity: 0.8;
}


@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg"
id="RSSicon"
viewBox="0 0 8 8" width="256" height="256">
<title>RSS feed icon</title>
<style type="text/css">
.button {stroke: none; fill: orange;}
.symbol {stroke: none; fill: white;}
</style>
<rect class="button" width="8" height="8" rx="1.5" />
<circle class="symbol" cx="2" cy="6" r="1" />
<path class="symbol" d="m 1,4 a 3,3 0 0 1 3,3 h 1 a 4,4 0 0 0 -4,-4 z" />
<path class="symbol" d="m 1,2 a 5,5 0 0 1 5,5 h 1 a 6,6 0 0 0 -6,-6 z" />
</svg>


@@ -0,0 +1,84 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
version="1.1"
id="Capa_1"
x="0px"
y="0px"
viewBox="0 0 15 14.998326"
xml:space="preserve"
width="15"
height="14.998326"><metadata
id="metadata39"><rdf:RDF><cc:Work
rdf:about=""><dc:format>image/svg+xml</dc:format><dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" /><dc:title></dc:title></cc:Work></rdf:RDF></metadata><defs
id="defs37" />
<path
id="path2"
style="fill:#1b98f8;fill-opacity:1;stroke-width:0.0292893"
d="M 7.4975161,6.5052867e-4 C 4.549072,-0.04028702 1.7055675,1.8548221 0.58868606,4.5801341 -0.57739762,7.2574642 0.02596981,10.583326 2.069916,12.671949 4.0364753,14.788409 7.2763651,15.56067 9.989207,14.57284 12.801145,13.617602 14.87442,10.855325 14.985833,7.8845744 15.172496,4.9966544 13.49856,2.1100704 10.911002,0.8209349 9.8598067,0.28073592 8.6791261,-0.00114855 7.4975161,6.5052867e-4 Z M 6.5602569,10.251923 c -0.00509,0.507593 -0.5693885,0.488472 -0.9352002,0.468629 -0.3399386,0.0018 -0.8402048,0.07132 -0.9297965,-0.374189 -0.015842,-1.8973128 -0.015872,-3.7979649 0,-5.6952784 0.1334405,-0.5224315 0.7416869,-0.3424086 1.1377562,-0.374189 0.3969969,-0.084515 0.8245634,0.1963256 0.7272405,0.6382917 0,1.7789118 0,3.5578239 0,5.3367357 z m 3.7490371,0 c -0.0051,0.507593 -0.5693888,0.488472 -0.9352005,0.468629 -0.3399386,0.0018 -0.8402048,0.07132 -0.9297965,-0.374189 -0.015842,-1.8973128 -0.015872,-3.7979649 0,-5.6952784 0.1334405,-0.5224315 0.7416869,-0.3424086 1.1377562,-0.374189 0.3969969,-0.084515 0.8245638,0.1963256 0.7272408,0.6382917 0,1.7789118 0,3.5578239 0,5.3367357 z" />
<g
id="g4"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g6"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g8"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g10"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g12"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g14"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g16"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g18"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g20"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g22"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g24"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g26"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g28"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g30"
transform="translate(-0.01903604,0.02221043)">
</g>
<g
id="g32"
transform="translate(-0.01903604,0.02221043)">
</g>
</svg>


@@ -1,48 +1,64 @@
import json import json
import uuid as uuid_builder import uuid as uuid_builder
import validators
import os.path import os.path
from os import path from os import path
from threading import Lock
from copy import deepcopy
import logging
import time
import threading
# Is there an existing library to ensure some data store (JSON etc) is in sync with CRUD methods? # Is there an existing library to ensure some data store (JSON etc) is in sync with CRUD methods?
# Open a github issue if you know something :) # Open a github issue if you know something :)
# https://stackoverflow.com/questions/6190468/how-to-trigger-function-on-value-change # https://stackoverflow.com/questions/6190468/how-to-trigger-function-on-value-change
class ChangeDetectionStore: class ChangeDetectionStore:
lock = Lock()
def __init__(self): def __init__(self, datastore_path="/datastore", include_default_watches=True):
self.needs_write = False self.needs_write = False
self.datastore_path = datastore_path
self.json_store_path = "{}/url-watches.json".format(self.datastore_path)
self.stop_thread = False
self.__data = { self.__data = {
'note' : "Hello! If you change this file manually, please be sure to restart your changedetection.io instance!", 'note': "Hello! If you change this file manually, please be sure to restart your changedetection.io instance!",
'watching': {}, 'watching': {},
'tag': "0.22",
'settings': { 'settings': {
'headers': { 'headers': {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate', # No support for brolti in python requests yet. 'Accept-Encoding': 'gzip, deflate', # No support for brolti in python requests yet.
'Accept-Language': 'en-GB,en-US;q=0.9,en;' 'Accept-Language': 'en-GB,en-US;q=0.9,en;'
}, },
'requests': { 'requests': {
'timeout': 15, # Default 15 seconds 'timeout': 15, # Default 15 seconds
'minutes_between_check': 3 * 60, # Default 3 hours 'minutes_between_check': 3 * 60, # Default 3 hours
'workers': 10 # Number of threads, lower is better for slow connections 'workers': 10 # Number of threads, lower is better for slow connections
},
'application': {
'password': False
} }
} }
} }
# Base definition for all watchers # Base definition for all watchers
self.generic_definition = { self.generic_definition = {
'url': None, 'url': None,
'tag': None, 'tag': None,
'last_checked': 0, 'last_checked': 0,
'last_changed': 0, 'last_changed': 0,
'paused': False,
'last_viewed': 0, # history key value of the last viewed via the [diff] link
'newest_history_key': "",
'title': None, 'title': None,
'previous_md5': None, 'previous_md5': "",
'uuid': str(uuid_builder.uuid4()), 'uuid': str(uuid_builder.uuid4()),
'headers' : {}, # Extra headers to send 'headers': {}, # Extra headers to send
'history' : {} # Dict of timestamp and output stripped filename 'history': {}, # Dict of timestamp and output stripped filename
'ignore_text': [] # List of text to ignore when calculating the comparison checksum
} }
if path.isfile('/source.txt'): if path.isfile('/source.txt'):
@@ -52,7 +68,8 @@ class ChangeDetectionStore:
self.__data['build_sha'] = f.read() self.__data['build_sha'] = f.read()
try: try:
with open('/datastore/url-watches.json') as json_file: # @todo retest with ", encoding='utf-8'"
with open(self.json_store_path) as json_file:
from_disk = json.load(json_file) from_disk = json.load(json_file)
# @todo isnt there a way todo this dict.update recursively? # @todo isnt there a way todo this dict.update recursively?
@@ -60,6 +77,9 @@ class ChangeDetectionStore:
if 'watching' in from_disk: if 'watching' in from_disk:
self.__data['watching'].update(from_disk['watching']) self.__data['watching'].update(from_disk['watching'])
if 'app_guid' in from_disk:
self.__data['app_guid'] = from_disk['app_guid']
if 'settings' in from_disk: if 'settings' in from_disk:
if 'headers' in from_disk['settings']: if 'headers' in from_disk['settings']:
self.__data['settings']['headers'].update(from_disk['settings']['headers']) self.__data['settings']['headers'].update(from_disk['settings']['headers'])
@@ -67,59 +87,137 @@ class ChangeDetectionStore:
if 'requests' in from_disk['settings']: if 'requests' in from_disk['settings']:
self.__data['settings']['requests'].update(from_disk['settings']['requests']) self.__data['settings']['requests'].update(from_disk['settings']['requests'])
if 'application' in from_disk['settings']:
self.__data['settings']['application'].update(from_disk['settings']['application'])
# Reinitialise each `watching` with our generic_definition in the case that we add a new var in the future. # Reinitialise each `watching` with our generic_definition in the case that we add a new var in the future.
# @todo pretty sure theres a python we todo this with an abstracted(?) object! # @todo pretty sure theres a python we todo this with an abstracted(?) object!
i = 0 for uuid, watch in self.__data['watching'].items():
for uuid, watch in self.data['watching'].items(): _blank = deepcopy(self.generic_definition)
_blank = self.generic_definition.copy()
_blank.update(watch) _blank.update(watch)
self.__data['watching'].update({uuid: _blank}) self.__data['watching'].update({uuid: _blank})
print("Watching:", uuid, _blank['url']) self.__data['watching'][uuid]['newest_history_key'] = self.get_newest_history_key(uuid)
print("Watching:", uuid, self.__data['watching'][uuid]['url'])
# First time ran, doesnt exist. # First time ran, doesnt exist.
except (FileNotFoundError, json.decoder.JSONDecodeError): except (FileNotFoundError, json.decoder.JSONDecodeError):
print("Creating JSON store") if include_default_watches:
self.add_watch(url='http://www.quotationspage.com/random.php', tag='test') print("Creating JSON store at", self.datastore_path)
self.add_watch(url='https://news.ycombinator.com/', tag='Tech news')
self.add_watch(url='https://www.gov.uk/coronavirus', tag='Covid')
self.add_watch(url='https://changedetection.io', tag='Tech news')
self.add_watch(url='http://www.quotationspage.com/random.php', tag='test')
# self.entryVariable.get() self.add_watch(url='https://news.ycombinator.com/', tag='Tech news')
def update_watch(self, uuid, val, var): self.add_watch(url='https://www.gov.uk/coronavirus', tag='Covid')
self.add_watch(url='https://changedetection.io', tag='Tech news')
self.__data['version_tag'] = "0.28"
if not 'app_guid' in self.__data:
self.__data['app_guid'] = str(uuid_builder.uuid4())
self.__data['watching'][uuid].update({val: var})
self.needs_write = True self.needs_write = True
# Finally start the thread that will manage periodic data saves to JSON
save_data_thread = threading.Thread(target=self.save_datastore).start()
# Returns the newest key, but if theres only 1 record, then it's counted as not being new, so return 0.
def get_newest_history_key(self, uuid):
if len(self.__data['watching'][uuid]['history']) == 1:
return 0
dates = list(self.__data['watching'][uuid]['history'].keys())
# Convert to int, sort and back to str again
dates = [int(i) for i in dates]
dates.sort(reverse=True)
if len(dates):
# always keyed as str
return str(dates[0])
return 0
def set_last_viewed(self, uuid, timestamp):
self.data['watching'][uuid].update({'last_viewed': int(timestamp)})
self.needs_write = True
def update_watch(self, uuid, update_obj):
# Skip if 'paused' state
if self.__data['watching'][uuid]['paused']:
return
with self.lock:
# In python 3.9 we have the |= dict operator, but that still will lose data on nested structures...
for dict_key, d in self.generic_definition.items():
if isinstance(d, dict):
if update_obj is not None and dict_key in update_obj:
self.__data['watching'][uuid][dict_key].update(update_obj[dict_key])
del (update_obj[dict_key])
self.__data['watching'][uuid].update(update_obj)
self.__data['watching'][uuid]['newest_history_key'] = self.get_newest_history_key(uuid)
self.needs_write = True
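
update_watch above merges dict-valued keys (such as 'history') one key at a time before applying the rest of update_obj, because a flat dict.update() (or the Python 3.9 |= operator mentioned in the comment) would replace the nested dict wholesale and drop earlier snapshots. A small illustration with made-up data:

from copy import deepcopy

watch = {'url': 'https://example.com', 'history': {'100': '/datastore/a.txt'}}
update_obj = {'last_checked': 200, 'history': {'200': '/datastore/b.txt'}}

# Naive merge: the nested 'history' dict is replaced wholesale and the old snapshot is lost.
naive = deepcopy(watch)
naive.update(update_obj)
print(naive['history'])              # {'200': '/datastore/b.txt'}

# Nested-aware merge: update dict-valued keys in place, then apply the remaining flat keys.
merged = deepcopy(watch)
remaining = dict(update_obj)
for key, value in update_obj.items():
    if isinstance(merged.get(key), dict) and isinstance(value, dict):
        merged[key].update(value)
        del remaining[key]
merged.update(remaining)
print(merged['history'])             # {'100': '/datastore/a.txt', '200': '/datastore/b.txt'}
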
@property @property
def data(self): def data(self):
has_unviewed = False
for uuid, v in self.__data['watching'].items():
self.__data['watching'][uuid]['newest_history_key'] = self.get_newest_history_key(uuid)
if int(v['newest_history_key']) <= int(v['last_viewed']):
self.__data['watching'][uuid]['viewed'] = True
else:
self.__data['watching'][uuid]['viewed'] = False
has_unviewed = True
self.__data['has_unviewed'] = has_unviewed
return self.__data return self.__data
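
The data property recomputes newest_history_key for every watch and flags it as viewed or unviewed by comparing that key against last_viewed; the watch-overview template and the tests later in this diff rely on that 'unviewed' flag. The comparison reduces to something like the sketch below; the real store keeps newest_history_key as a string field, this only shows the logic.

def newest_history_key(watch):
    # Newest snapshot timestamp; a single snapshot does not count as "new".
    if len(watch['history']) <= 1:
        return 0
    return max(int(k) for k in watch['history'])

def is_unviewed(watch):
    return int(newest_history_key(watch)) > int(watch['last_viewed'])

watch = {'history': {'100': 'a.txt', '200': 'b.txt'}, 'last_viewed': 100}
print(is_unviewed(watch))   # True: a snapshot newer than the last viewed one exists
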
def get_all_tags(self): def get_all_tags(self):
tags=[] tags = []
for uuid, watch in self.data['watching'].items(): for uuid, watch in self.data['watching'].items():
# Support for comma separated list of tags. # Support for comma separated list of tags.
for tag in watch['tag'].split(','): for tag in watch['tag'].split(','):
tag = tag.strip() tag = tag.strip()
if not tag in tags: if tag not in tags:
tags.append(tag) tags.append(tag)
tags.sort() tags.sort()
return tags return tags
def delete(self, uuid): def unlink_history_file(self, path):
# Probably their should be dict... try:
del(self.__data['watching'][uuid]) os.unlink(path)
self.needs_write = True except (FileNotFoundError, IOError):
pass
# Delete a single watch by UUID
def delete(self, uuid):
with self.lock:
if uuid == 'all':
self.__data['watching'] = {}
# GitHub #30 also delete history records
for uuid in self.data['watching']:
for path in self.data['watching'][uuid]['history'].values():
self.unlink_history_file(path)
else:
for path in self.data['watching'][uuid]['history'].values():
self.unlink_history_file(path)
del self.data['watching'][uuid]
self.needs_write = True
def url_exists(self, url): def url_exists(self, url):
# Probably their should be dict... # Probably their should be dict...
for watch in self.data['watching']: for watch in self.data['watching'].values():
if watch['url'] == url: if watch['url'] == url:
return True return True
@@ -129,30 +227,114 @@ class ChangeDetectionStore:
# Probably their should be dict... # Probably their should be dict...
return self.data['watching'][uuid].get(val) return self.data['watching'][uuid].get(val)
def add_watch(self, url, tag): # Remove a watchs data but keep the entry (URL etc)
def scrub_watch(self, uuid, limit_timestamp = False):
print("Adding", url, tag) import hashlib
# # @todo deal with exception del_timestamps = []
# validators.url(url)
for timestamp, path in self.data['watching'][uuid]['history'].items():
if not limit_timestamp or (limit_timestamp is not False and int(timestamp) > limit_timestamp):
self.unlink_history_file(path)
del_timestamps.append(timestamp)
if not limit_timestamp:
self.data['watching'][uuid]['last_checked'] = 0
self.data['watching'][uuid]['last_changed'] = 0
self.data['watching'][uuid]['previous_md5'] = 0
for timestamp in del_timestamps:
del self.data['watching'][uuid]['history'][str(timestamp)]
# If there was a limitstamp, we need to reset some meta data about the entry
# This has to happen after we remove the others from the list
if limit_timestamp:
newest_key = self.get_newest_history_key(uuid)
if newest_key:
self.data['watching'][uuid]['last_checked'] = int(newest_key)
# @todo should be the original value if it was less than newest key
self.data['watching'][uuid]['last_changed'] = int(newest_key)
try:
with open(self.data['watching'][uuid]['history'][str(newest_key)], "rb") as fp:
content = fp.read()
self.data['watching'][uuid]['previous_md5'] = hashlib.md5(content).hexdigest()
except (FileNotFoundError, IOError):
self.data['watching'][uuid]['previous_md5'] = False
pass
# @todo use a common generic version of this
new_uuid = str(uuid_builder.uuid4())
_blank = self.generic_definition.copy()
_blank.update({
'url': url,
'tag': tag,
'uuid': new_uuid
})
self.data['watching'][new_uuid] = _blank
self.needs_write = True self.needs_write = True
def add_watch(self, url, tag):
with self.lock:
# @todo use a common generic version of this
new_uuid = str(uuid_builder.uuid4())
_blank = deepcopy(self.generic_definition)
_blank.update({
'url': url,
'tag': tag,
'uuid': new_uuid
})
self.data['watching'][new_uuid] = _blank
# Get the directory ready
output_path = "{}/{}".format(self.datastore_path, new_uuid)
try:
os.mkdir(output_path)
except FileExistsError:
print(output_path, "already exists.")
self.sync_to_json()
return new_uuid return new_uuid
# Save some text file to the appropriate path and bump the history
# result_obj from fetch_site_status.run()
def save_history_text(self, uuid, result_obj, contents):
output_path = "{}/{}".format(self.datastore_path, uuid)
fname = "{}/{}-{}.stripped.txt".format(output_path, result_obj['previous_md5'], str(time.time()))
with open(fname, 'w') as f:
f.write(contents)
f.close()
# Update history with the stripped text for future reference, this will also mean we save the first
# Should always be keyed by string(timestamp)
self.update_watch(uuid, {"history": {str(result_obj["last_checked"]): fname}})
return fname
def sync_to_json(self): def sync_to_json(self):
print ("Saving index") print("Saving..")
with open('/datastore/url-watches.json', 'w') as json_file: data ={}
json.dump(self.data, json_file, indent=4)
self.needs_write = False try:
data = deepcopy(self.__data)
except RuntimeError:
time.sleep(0.5)
print ("! Data changed when writing to JSON, trying again..")
self.sync_to_json()
return
else:
with open(self.json_store_path, 'w') as json_file:
json.dump(data, json_file, indent=4)
logging.info("Re-saved index")
self.needs_write = False
# Thread runner, this helps with thread/write issues when there are many operations that want to update the JSON
# by just running periodically in one thread, according to python, dict updates are threadsafe.
def save_datastore(self):
while True:
if self.stop_thread:
print("Shutting down datastore thread")
return
if self.needs_write:
self.sync_to_json()
time.sleep(3)
# body of the constructor
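
sync_to_json above deep-copies the in-memory data before serialising, so that a mutation from another thread during the copy surfaces as a RuntimeError and the save is simply retried, while a single background thread flushes to disk only when needs_write is set. A condensed sketch of that single-writer arrangement follows; the class and file names are illustrative only.

import json
import threading
import time
from copy import deepcopy

class TinyStore:
    def __init__(self, path):
        self.path = path
        self.data = {'watching': {}}
        self.needs_write = False
        self.stop_thread = False
        threading.Thread(target=self._save_loop, daemon=True).start()

    def sync_to_json(self):
        try:
            snapshot = deepcopy(self.data)   # may raise RuntimeError if mutated mid-copy
        except RuntimeError:
            time.sleep(0.5)
            return self.sync_to_json()       # retry once the dict has (hopefully) settled
        with open(self.path, 'w') as f:
            json.dump(snapshot, f, indent=4)
        self.needs_write = False

    def _save_loop(self):
        # Single writer: only this thread ever serialises to disk.
        while not self.stop_thread:
            if self.needs_write:
                self.sync_to_json()
            time.sleep(3)

store = TinyStore('/tmp/url-watches.json')
store.data['watching']['example'] = {'url': 'https://example.com'}
store.needs_write = True     # flagged for the background thread
store.sync_to_json()         # or persist immediately
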


@@ -17,13 +17,21 @@
<div class="header">
<div class="home-menu pure-menu pure-menu-horizontal pure-menu-fixed">
-<a class="pure-menu-heading" href="/"><strong>Change</strong>Detection.io</a>
+{% if not current_user.is_authenticated %}
<a class="pure-menu-heading" href="https://github.com/dgtlmoon/changedetection.io" rel="noopener"><strong>Change</strong>Detection.io</a>
{% else %}
<a class="pure-menu-heading" href="/"><strong>Change</strong>Detection.io</a>
{% endif %}
{% if current_diff_url %}
-<a class=current-diff-url href="{{ current_diff_url }}"><span style="max-width: 30%; overflow: hidden;">{{ current_diff_url }}</a>
+<a class=current-diff-url href="{{ current_diff_url }}"><span style="max-width: 30%; overflow: hidden;">{{ current_diff_url }}</span></a>
{% else %}
{% if new_version_available %}
<span id="new-version-text" class="pure-menu-heading"><a href="https://github.com/dgtlmoon/changedetection.io">A new version is available</a></span>
{% endif %}
{% endif %}
<ul class="pure-menu-list">
{% if current_user.is_authenticated %}
<li class="pure-menu-item">
<a href="/backup" class="pure-menu-link">BACKUP</a>
</li>
@@ -33,8 +41,18 @@
<li class="pure-menu-item">
<a href="/settings" class="pure-menu-link">SETTINGS</a>
</li>
{% else %}
<li class="pure-menu-item">
<a class="pure-menu-link" href="https://github.com/dgtlmoon/changedetection.io">Website Change Detection and Notification.</a>
</li>
{% endif %}
{% if current_user.is_authenticated %}
<li class="pure-menu-item"><a href="/logout" class="pure-menu-link">LOG OUT</a></li>
{% endif %}
<li class="pure-menu-item"><a class="github-link" href="https://github.com/dgtlmoon/changedetection.io">
-<svg class="octicon octicon-mark-github v-align-middle" height="32" viewBox="0 0 16 16" version="1.1"
+<svg class="octicon octicon-mark-github v-align-middle" height="32" viewBox="0 0 16 16"
version="1.1"
width="32" aria-hidden="true">
<path fill-rule="evenodd"
d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z"></path>
@@ -47,7 +65,7 @@
</ul>
</div>
</div>
<div id="version">v{{ version }}</div>
<section class="content">
<header>
{% block header %}{% endblock %}
@@ -66,5 +84,6 @@
{% endblock %}
</section>
</body>
</html>
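
The header template now branches on current_user.is_authenticated to choose between the public menu and the BACKUP/SETTINGS/LOG OUT items, and the new login template further down posts a password plus a fixed email to /login. The server side of that is not shown in this hunk, so the following is only a rough sketch of a single-user gate of this kind, written here with Flask-Login; the library choice, the check_password helper and the literal password are assumptions for illustration rather than the project's implementation.

from flask import Flask, request, redirect, url_for
from flask_login import LoginManager, UserMixin, login_user, logout_user, current_user

app = Flask(__name__)
app.secret_key = 'change-me'
login_manager = LoginManager(app)

class User(UserMixin):
    # Single shared account; the id is fixed because there is only one user.
    id = 'defaultuser@changedetection.io'

@login_manager.user_loader
def load_user(user_id):
    return User()

def check_password(candidate):
    # Placeholder check; a real deployment would verify against a stored salted hash.
    return candidate == 'example-password'

@app.route('/login', methods=['POST'])
def login():
    if check_password(request.form.get('password', '')):
        login_user(User())
    return redirect(url_for('index'))

@app.route('/logout')
def logout():
    logout_user()
    return redirect(url_for('index'))

@app.route('/')
def index():
    return 'logged in' if current_user.is_authenticated else 'anonymous'
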


@@ -8,7 +8,7 @@
<fieldset> <fieldset>
<label for="diffWords" class="pure-checkbox"> <label for="diffWords" class="pure-checkbox">
<input type="radio" name="diff_type" id="diffWords" value="diffWords" /> Words</label> <input type="radio" name="diff_type" id="diffWords" value="diffWords"/> Words</label>
<label for="diffLines" class="pure-checkbox"> <label for="diffLines" class="pure-checkbox">
<input type="radio" name="diff_type" id="diffLines" value="diffLines" checked=""/> Lines</label> <input type="radio" name="diff_type" id="diffLines" value="diffLines" checked=""/> Lines</label>
@@ -19,9 +19,9 @@
<label for="diff-version">Compare newest (<span id="current-v-date"></span>) with</label> <label for="diff-version">Compare newest (<span id="current-v-date"></span>) with</label>
<select id="diff-version" name="previous_version"> <select id="diff-version" name="previous_version">
{% for version in versions %} {% for version in versions %}
<option value="{{version}}" {% if version== current_previous_version %} selected="" {% endif %}> <option value="{{version}}" {% if version== current_previous_version %} selected="" {% endif %}>
{{version}} {{version}}
</option> </option>
{% endfor %} {% endfor %}
</select> </select>
<button type="submit" class="pure-button pure-button-primary">Go</button> <button type="submit" class="pure-button pure-button-primary">Go</button>
@@ -32,6 +32,9 @@
<ins>Inserted Text</ins> <ins>Inserted Text</ins>
</div> </div>
<div id="diff-jump">
<a onclick="next_diff();">Jump</a>
</div>
<div id="diff-ui"> <div id="diff-ui">
<table> <table>
@@ -40,7 +43,7 @@
<!-- just proof of concept copied straight from github.com/kpdecker/jsdiff --> <!-- just proof of concept copied straight from github.com/kpdecker/jsdiff -->
<td id="a" style="display: none;">{{previous}}</td> <td id="a" style="display: none;">{{previous}}</td>
<td id="b" style="display: none;">{{newest}}</td> <td id="b" style="display: none;">{{newest}}</td>
<td> <td id="diff-col">
<span id="result"></span> <span id="result"></span>
</td> </td>
</tr> </tr>
@@ -72,9 +75,12 @@ function changed() {
var node; var node;
if (diff[i].removed) { if (diff[i].removed) {
node = document.createElement('del'); node = document.createElement('del');
node.classList.add("change");
node.appendChild(document.createTextNode(diff[i].value)); node.appendChild(document.createTextNode(diff[i].value));
} else if (diff[i].added) { } else if (diff[i].added) {
node = document.createElement('ins'); node = document.createElement('ins');
node.classList.add("change");
node.appendChild(document.createTextNode(diff[i].value)); node.appendChild(document.createTextNode(diff[i].value));
} else { } else {
node = document.createTextNode(diff[i].value); node = document.createTextNode(diff[i].value);
@@ -84,6 +90,10 @@ function changed() {
result.textContent = ''; result.textContent = '';
result.appendChild(fragment); result.appendChild(fragment);
// Jump at start
inputs.current=0;
next_diff();
} }
window.onload = function() { window.onload = function() {
@@ -106,6 +116,7 @@ window.onload = function() {
onDiffTypeChange(document.querySelector('#settings [name="diff_type"]:checked')); onDiffTypeChange(document.querySelector('#settings [name="diff_type"]:checked'));
changed(); changed();
}; };
a.onpaste = a.onchange = a.onpaste = a.onchange =
@@ -131,6 +142,27 @@ for (var i = 0; i < radio.length; i++) {
} }
var inputs = document.getElementsByClassName('change');
inputs.current=0;
function next_diff() {
var element = inputs[inputs.current];
var headerOffset = 80;
var elementPosition = element.getBoundingClientRect().top;
var offsetPosition = elementPosition - headerOffset + window.scrollY;
window.scrollTo({
top: offsetPosition,
behavior: "smooth"
});
inputs.current++;
if(inputs.current >= inputs.length) {
inputs.current=0;
}
}


@@ -4,7 +4,7 @@
<div class="edit-form">
-<form class="pure-form pure-form-stacked" action="/edit?uuid={{uuid}}" method="POST">
+<form class="pure-form pure-form-stacked" action="/edit/{{uuid}}" method="POST">
<fieldset>
<div class="pure-control-group">
<label for="url">URL</label>
@@ -18,10 +18,26 @@
<span class="pure-form-message-inline">Grouping tags, can be a comma separated list.</span>
</div>
<!-- @todo: move to tabs --->
<fieldset class="pure-group">
<label for="ignore-text">Ignore text</label>
<textarea id="ignore-text" name="ignore-text" class="pure-input-1-2" placeholder=""
style="width: 100%;
font-family:monospace;
white-space: pre;
overflow-wrap: normal;
overflow-x: scroll;" rows="5">{% for value in watch.ignore_text %}{{ value }}
{% endfor %}</textarea>
<span class="pure-form-message-inline">Each line will be processed separately as an ignore rule.</span>
</fieldset>
<!-- @todo: move to tabs --->
<fieldset class="pure-group">
<label for="headers">Extra request headers</label>
-<textarea id=headers name="headers" class="pure-input-1-2" placeholder="Example
+<textarea id="headers" name="headers" class="pure-input-1-2" placeholder="Example
Cookie: foobar
User-Agent: wonderbra 1.0"
style="width: 100%;
@@ -33,6 +49,8 @@ User-Agent: wonderbra 1.0"
<br/>
</fieldset>
<div class="pure-control-group">
<button type="submit" class="pure-button pure-button-primary">Save</button>
</div>


@@ -0,0 +1,20 @@
{% extends 'base.html' %}
{% block content %}
<div class="edit-form">
<form class="pure-form pure-form-stacked" action="/login" method="POST">
<fieldset>
<div class="pure-control-group">
<label for="password">Password</label>
<input type="password" id="password" required="" name="password" value=""
size="15"/>
<input type="hidden" id="email" name="email" value="defaultuser@changedetection.io" />
</div>
<div class="pure-control-group">
<button type="submit" class="pure-button pure-button-primary">Submit</button>
</div>
</fieldset>
</form>
</div>
{% endblock %}


@@ -0,0 +1,26 @@
{% extends 'base.html' %}
{% block content %}
<div id="settings">
<h1>Current</h1>
</div>
<div id="diff-ui">
<table>
<tbody>
<tr>
<!-- just proof of concept copied straight from github.com/kpdecker/jsdiff -->
<td id="diff-col">
<span id="result">{% for row in content %}<pre>{{row}}</pre>{% endfor %}</span>
</td>
</tr>
</tbody>
</table>
</div>
{% endblock %}


@@ -17,14 +17,19 @@
<div class="pure-control-group">
<br/>
-<label for="confirmtext">Confirm</label><br/>
+<label for="confirmtext">Confirm text</label><br/>
<input type="text" id="confirmtext" required="" name="confirmtext" value="" size="10"/>
</div>
<div class="pure-control-group">
<br/>
<label for="confirmtext">Limit delete history including and after date</label><br/>
<input type="text" id="limit_date" required="" name="limit_date" value="" size="10"/>
<br/>
</div>
<div class="pure-control-group">
<button type="submit" class="pure-button pure-button-primary">Scrub!</button>
</div>


@@ -12,7 +12,14 @@
size="5"/>
<span class="pure-form-message-inline">This is a required field.</span>
</div>
<br/>
<div class="pure-control-group">
<label for="minutes">Password protection</label>
<input type="password" id="password" name="password" size="15"/>
{% if current_user.is_authenticated %}
<a href="/settings?removepassword=true" class="pure-button pure-button-primary">Remove password</a>
{% endif %}
</div>
<br/>
<div class="pure-control-group">
@@ -22,7 +29,7 @@
<div class="pure-control-group">
<a href="/" class="pure-button button-small button-cancel">Back</a>
-<a href="/scrub" class="pure-button button-small button-cancel">Reset all version data</a>
+<a href="/scrub" class="pure-button button-small button-cancel">Delete history version data</a>
</div>


@@ -15,12 +15,10 @@
<!-- user/pass r = requests.get('https://api.github.com/user', auth=('user', 'pass')) --> <!-- user/pass r = requests.get('https://api.github.com/user', auth=('user', 'pass')) -->
</form> </form>
<div> <div>
<a href="/" class="pure-button button-tag {{'active' if not active_tag }}">All</a>
{% for tag in tags %} {% for tag in tags %}
{% if tag == "" %} {% if tag != "" %}
<a href="/" class="pure-button button-tag {{'active' if active_tag == tag }}">All</a> <a href="/?tag={{ tag}}" class="pure-button button-tag {{'active' if active_tag == tag }}">{{ tag }}</a>
{% else %}
<a href="/?tag={{ tag}}" class="pure-button button-tag {{'active' if active_tag == tag }}">{{ tag }}</a>
{% endif %} {% endif %}
{% endfor %} {% endfor %}
</div> </div>
@@ -31,6 +29,7 @@
<tr> <tr>
<th>#</th> <th>#</th>
<th></th> <th></th>
<th></th>
<th>Last Checked</th> <th>Last Checked</th>
<th>Last Changed</th> <th>Last Changed</th>
<th></th> <th></th>
@@ -41,10 +40,14 @@
{% for watch in watches %} {% for watch in watches %}
<tr id="{{ watch.uuid }}" <tr id="{{ watch.uuid }}"
class="{{ loop.cycle('pure-table-odd', 'pure-table-even') }} {% if watch.last_error is defined and watch.last_error != False %}error{% endif %}"> class="{{ loop.cycle('pure-table-odd', 'pure-table-even') }}
{% if watch.last_error is defined and watch.last_error != False %}error{% endif %}
{% if watch.paused is defined and watch.paused != False %}paused{% endif %}
{% if watch.newest_history_key| int > watch.last_viewed| int %}unviewed{% endif %}">
<td>{{ loop.index }}</td> <td>{{ loop.index }}</td>
<td class="paused-state state-{{watch.paused}}"><a href="/?pause={{ watch.uuid}}{% if active_tag %}&tag={{active_tag}}{% endif %}"><img src="/static/images/pause.svg" alt="Pause"/></a></td>
<td class="title-col">{{watch.title if watch.title is not none else watch.url}} <td class="title-col">{{watch.title if watch.title is not none else watch.url}}
<a class="external" target=_blank href="{{ watch.url }}"></a> <a class="external" target="_blank" rel="noopener" href="{{ watch.url }}"></a>
{% if watch.last_error is defined and watch.last_error != False %} {% if watch.last_error is defined and watch.last_error != False %}
<div class="fetch-error">{{ watch.last_error }}</div> <div class="fetch-error">{{ watch.last_error }}</div>
{% endif %} {% endif %}
@@ -53,23 +56,42 @@
{% endif %} {% endif %}
</td> </td>
<td>{{watch|format_last_checked_time}}</td> <td>{{watch|format_last_checked_time}}</td>
<td>{{watch.last_changed|format_timestamp_timeago}}</td> <td>{% if watch.history|length >= 2 and watch.last_changed %}
<td><a href="/api/checknow?uuid={{ watch.uuid}}{% if request.args.get('tag') %}&tag={{request.args.get('tag')}}{% endif %}" class="pure-button button-small pure-button-primary">Recheck</a> {{watch.last_changed|format_timestamp_timeago}}
<a href="/edit?uuid={{ watch.uuid}}" class="pure-button button-small pure-button-primary">Edit</a> {% else %}
Not yet
{% endif %}
</td>
<td>
<a href="/api/checknow?uuid={{ watch.uuid}}{% if request.args.get('tag') %}&tag={{request.args.get('tag')}}{% endif %}"
class="pure-button button-small pure-button-primary">Recheck</a>
<a href="/edit/{{ watch.uuid}}" class="pure-button button-small pure-button-primary">Edit</a>
{% if watch.history|length >= 2 %} {% if watch.history|length >= 2 %}
<a href="/diff/{{ watch.uuid}}" class="pure-button button-small pure-button-primary">Diff</a> <a href="/diff/{{ watch.uuid}}" target="{{watch.uuid}}" class="pure-button button-small pure-button-primary">Diff</a>
{% else %}
{% if watch.history|length == 1 %}
<a href="/preview/{{ watch.uuid}}" target="{{watch.uuid}}" class="pure-button button-small pure-button-primary">Preview</a>
{% endif %}
{% endif %} {% endif %}
</td> </td>
</tr> </tr>
{% endfor %} {% endfor %}
</tbody> </tbody>
</table> </table>
<div id="check-all-button"> <ul id="post-list-buttons">
{% if has_unviewed %}
<a href="/api/checknow{% if active_tag%}?tag={{active_tag}}{%endif%}" class="pure-button button-tag " >Recheck all {% if active_tag%}in "{{active_tag}}"{%endif%}</a> <li>
</div> <a href="/api/mark-all-viewed" class="pure-button button-tag ">Mark all viewed</a>
</li>
{% endif %}
<li>
<a href="/api/checknow{% if active_tag%}?tag={{active_tag}}{%endif%}" class="pure-button button-tag ">Recheck
all {% if active_tag%}in "{{active_tag}}"{%endif%}</a>
</li>
<li>
<a href="{{ url_for('index', tag=active_tag , rss=true)}}"><img id="feed-icon" src="/static/images/Generic_Feed-icon.svg" height="15px"></a>
</li>
</ul>
</div> </div>
</div> </div>
{% endblock %} {% endblock %}


@@ -0,0 +1,2 @@
"""Tests for the app."""

backend/tests/conftest.py

@@ -0,0 +1,48 @@
#!/usr/bin/python3
import pytest
from backend import changedetection_app
from backend import store
import os
# https://github.com/pallets/flask/blob/1.1.2/examples/tutorial/tests/test_auth.py
# Much better boilerplate than the docs
# https://www.python-boilerplate.com/py3+flask+pytest/
global app
@pytest.fixture(scope='session')
def app(request):
"""Create application for the tests."""
datastore_path = "./test-datastore"
try:
os.mkdir(datastore_path)
except FileExistsError:
pass
try:
os.unlink("{}/url-watches.json".format(datastore_path))
except FileNotFoundError:
pass
app_config = {'datastore_path': datastore_path}
datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'], include_default_watches=False)
app = changedetection_app(app_config, datastore)
app.config['STOP_THREADS'] = True
def teardown():
datastore.stop_thread = True
app.config.exit.set()
try:
os.unlink("{}/url-watches.json".format(datastore_path))
except FileNotFoundError:
# This is fine in the case of a failure.
pass
assert 1 == 1
request.addfinalizer(teardown)
yield app


@@ -0,0 +1,160 @@
#!/usr/bin/python3

import time
from flask import url_for
from urllib.request import urlopen
import pytest

sleep_time_for_fetch_thread = 3


def test_setup_liveserver(live_server):
    @live_server.app.route('/test-endpoint')
    def test_endpoint():
        # Tried using a global var here but it didn't seem to work, so reading from a file instead.
        with open("test-datastore/output.txt", "r") as f:
            return f.read()

    live_server.start()
    assert 1 == 1


def set_original_response():
    test_return_data = """<html>
<body>
Some initial text</br>
<p>Which is across multiple lines</p>
</br>
So let's see what happens. </br>
</body>
</html>
"""
    with open("test-datastore/output.txt", "w") as f:
        f.write(test_return_data)


def set_modified_response():
    test_return_data = """<html>
<body>
Some initial text</br>
<p>which has this one new line</p>
</br>
So let's see what happens. </br>
</body>
</html>
"""
    with open("test-datastore/output.txt", "w") as f:
        f.write(test_return_data)


def test_check_basic_change_detection_functionality(client, live_server):
    set_original_response()

    # Add our URL to the import page
    res = client.post(
        url_for("import_page"),
        data={"urls": url_for('test_endpoint', _external=True)},
        follow_redirects=True
    )
    assert b"1 Imported" in res.data

    time.sleep(sleep_time_for_fetch_thread)

    # Do this a few times.. ensures we don't accidentally set the status
    for n in range(3):
        client.get(url_for("api_watch_checknow"), follow_redirects=True)

        # Give the thread time to pick it up
        time.sleep(sleep_time_for_fetch_thread)

        # It should report nothing found (no new 'unviewed' class)
        res = client.get(url_for("index"))
        assert b'unviewed' not in res.data
        assert b'test-endpoint' in res.data

    #####################

    # Make a change
    set_modified_response()

    res = urlopen(url_for('test_endpoint', _external=True))
    assert b'which has this one new line' in res.read()

    # Force recheck
    res = client.get(url_for("api_watch_checknow"), follow_redirects=True)
    assert b'1 watches are rechecking.' in res.data

    time.sleep(sleep_time_for_fetch_thread)

    # Now something should be ready, indicated by having an 'unviewed' class
    res = client.get(url_for("index"))
    assert b'unviewed' in res.data

    # Following the 'diff' link, it should no longer display as 'unviewed' even after we recheck it a few times
    res = client.get(url_for("diff_history_page", uuid="first"))
    assert b'Compare newest' in res.data

    time.sleep(2)

    # Do this a few times.. ensures we don't accidentally set the status
    for n in range(2):
        client.get(url_for("api_watch_checknow"), follow_redirects=True)

        # Give the thread time to pick it up
        time.sleep(sleep_time_for_fetch_thread)

        # It should report nothing found (no new 'unviewed' class)
        res = client.get(url_for("index"))
        assert b'unviewed' not in res.data
        assert b'test-endpoint' in res.data

    set_original_response()
    client.get(url_for("api_watch_checknow"), follow_redirects=True)
    time.sleep(sleep_time_for_fetch_thread)
    res = client.get(url_for("index"))
    assert b'unviewed' in res.data

    # Cleanup everything
    res = client.get(url_for("api_delete", uuid="all"), follow_redirects=True)
    assert b'Deleted' in res.data


def test_check_access_control(client):
    return
    # @note: does not seem to handle the last logout step correctly, we're still logged in.. but yet..
    # the pytest team keep telling us that we have a new context.. I'm lost :(

    # Add our URL to the import page
    res = client.post(
        url_for("settings_page"),
        data={"password": "foobar"},
        follow_redirects=True
    )
    assert b"LOG OUT" not in res.data

    client.get(url_for("import_page"), follow_redirects=True)
    assert b"Password" in res.data

    # defaultuser@changedetection.io is actually hardcoded for now, we only use a single password
    res = client.post(
        url_for("login"),
        data={"password": "foobar", "email": "defaultuser@changedetection.io"},
        follow_redirects=True
    )
    assert b"LOG OUT" in res.data

    client.get(url_for("settings_page"), follow_redirects=True)
    assert b"LOG OUT" in res.data

    # Now remove the password so other tests function, @todo this should happen before each test automatically
    print(res.data)
    client.get(url_for("settings_page", removepassword="true"), follow_redirects=True)
    client.get(url_for("import_page", removepassword="true"), follow_redirects=True)
    assert b"LOG OUT" not in res.data

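The functional tests above all follow the same loop: write the desired page body to test-datastore/output.txt (served back by the live server's /test-endpoint route), import that URL, hit /api/checknow, give the fetch thread a moment, then assert against the index page. A hypothetical further test in that style, shown only to document the pattern (endpoint names taken from the tests above):

import time
from flask import url_for

def test_follows_the_same_pattern(client, live_server):
    # Serve a known body from the live server's /test-endpoint route
    with open("test-datastore/output.txt", "w") as f:
        f.write("<html><body>hello world</body></html>")

    # Import the URL and trigger a recheck
    res = client.post(url_for("import_page"),
                      data={"urls": url_for('test_endpoint', _external=True)},
                      follow_redirects=True)
    assert b"1 Imported" in res.data

    client.get(url_for("api_watch_checknow"), follow_redirects=True)
    time.sleep(3)  # give the fetch thread time to pick the watch up

    res = client.get(url_for("index"))
    assert b'test-endpoint' in res.data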

@@ -0,0 +1,152 @@
#!/usr/bin/python3

import time
from flask import url_for
from urllib.request import urlopen
import pytest


# Unit test of the stripper
# We are always dealing in UTF-8 here
def test_strip_text_func():
    from backend import fetch_site_status

    test_content = """
Some content
is listed here
but sometimes we want to remove the lines.
but not always."""

    ignore_lines = ["sometimes"]

    fetcher = fetch_site_status.perform_site_check(datastore=False)
    stripped_content = fetcher.strip_ignore_text(test_content, ignore_lines)

    assert b"sometimes" not in stripped_content
    assert b"Some content" in stripped_content


def set_original_ignore_response():
    test_return_data = """<html>
<body>
Some initial text</br>
<p>Which is across multiple lines</p>
</br>
So let's see what happens. </br>
</body>
</html>
"""
    with open("test-datastore/output.txt", "w") as f:
        f.write(test_return_data)


def set_modified_original_ignore_response():
    test_return_data = """<html>
<body>
Some NEW nice initial text</br>
<p>Which is across multiple lines</p>
</br>
So let's see what happens. </br>
</body>
</html>
"""
    with open("test-datastore/output.txt", "w") as f:
        f.write(test_return_data)


# Is the same but includes ZZZZZ, 'ZZZZZ' is the last line in ignore_text
def set_modified_ignore_response():
    test_return_data = """<html>
<body>
Some initial text</br>
<p>Which is across multiple lines</p>
<P>ZZZZZ</P>
</br>
So let's see what happens. </br>
</body>
</html>
"""
    with open("test-datastore/output.txt", "w") as f:
        f.write(test_return_data)


def test_check_ignore_text_functionality(client, live_server):
    sleep_time_for_fetch_thread = 3
    ignore_text = "XXXXX\nYYYYY\nZZZZZ"
    set_original_ignore_response()

    # Give the endpoint time to spin up
    time.sleep(1)

    # Add our URL to the import page
    test_url = url_for('test_endpoint', _external=True)
    res = client.post(
        url_for("import_page"),
        data={"urls": test_url},
        follow_redirects=True
    )
    assert b"1 Imported" in res.data

    # Trigger a check
    client.get(url_for("api_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    time.sleep(sleep_time_for_fetch_thread)

    # Go to the edit page and add our ignore text
    res = client.post(
        url_for("edit_page", uuid="first"),
        data={"ignore-text": ignore_text, "url": test_url, "tag": "", "headers": ""},
        follow_redirects=True
    )
    assert b"Updated watch." in res.data

    # Check it saved
    res = client.get(
        url_for("edit_page", uuid="first"),
    )
    assert bytes(ignore_text.encode('utf-8')) in res.data

    # Trigger a check
    client.get(url_for("api_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    time.sleep(sleep_time_for_fetch_thread)

    # It should report nothing found (no new 'unviewed' class)
    res = client.get(url_for("index"))
    assert b'unviewed' not in res.data
    assert b'/test-endpoint' in res.data

    # Make a change
    set_modified_ignore_response()

    # Trigger a check
    client.get(url_for("api_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    time.sleep(sleep_time_for_fetch_thread)

    # It should still report nothing found (no new 'unviewed' class), the change only touched ignored text
    res = client.get(url_for("index"))
    assert b'unviewed' not in res.data
    assert b'/test-endpoint' in res.data

    # Just to be sure.. set a regular modified change..
    set_modified_original_ignore_response()
    client.get(url_for("api_watch_checknow"), follow_redirects=True)
    time.sleep(sleep_time_for_fetch_thread)
    res = client.get(url_for("index"))
    assert b'unviewed' in res.data

    res = client.get(url_for("api_delete", uuid="all"), follow_redirects=True)
    assert b'Deleted' in res.data

changedetection.py

@@ -0,0 +1,91 @@
#!/usr/bin/python3

# Launch as an eventlet.wsgi server instance.

import getopt
import os
import sys

import eventlet
import eventlet.wsgi

import backend
from backend import store


def init_app_secret(datastore_path):
    secret = ""
    path = "{}/secret.txt".format(datastore_path)

    try:
        with open(path, "r") as f:
            secret = f.read()
    except FileNotFoundError:
        import secrets
        with open(path, "w") as f:
            secret = secrets.token_hex(32)
            f.write(secret)

    return secret


def main(argv):
    ssl_mode = False
    port = 5000

    # Must be absolute so that send_from_directory doesn't try to make it relative to backend/
    datastore_path = os.path.join(os.getcwd(), "datastore")

    try:
        opts, args = getopt.getopt(argv, "sd:p:", "purge")
    except getopt.GetoptError:
        print('backend.py -s SSL enable -p [port] -d [datastore path]')
        sys.exit(2)

    for opt, arg in opts:
        # if opt == '--purge':
        #     Remove history, the actual files you need to delete manually.
        #     for uuid, watch in datastore.data['watching'].items():
        #         watch.update({'history': {}, 'last_checked': 0, 'last_changed': 0, 'previous_md5': None})

        if opt == '-s':
            ssl_mode = True

        if opt == '-p':
            port = int(arg)

        if opt == '-d':
            datastore_path = arg

    # isn't there some @thingy to attach to each route to tell it that this route needs a datastore
    app_config = {'datastore_path': datastore_path}
    datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'])
    app = backend.changedetection_app(app_config, datastore)
    app.secret_key = init_app_secret(app_config['datastore_path'])

    @app.context_processor
    def inject_version():
        return dict(version=datastore.data['version_tag'])

    @app.context_processor
    def inject_new_version_available():
        return dict(new_version_available=app.config['NEW_VERSION_AVAILABLE'])

    if ssl_mode:
        # @todo finalise SSL config, but this should get you in the right direction if you need it.
        eventlet.wsgi.server(eventlet.wrap_ssl(eventlet.listen(('', port)),
                                               certfile='cert.pem',
                                               keyfile='privkey.pem',
                                               server_side=True), app)
    else:
        eventlet.wsgi.server(eventlet.listen(('', port)), app)


if __name__ == '__main__':
    main(sys.argv[1:])

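For local development, a minimal sketch (illustrative only, not part of this changeset) of driving the same application factory without the eventlet launcher above; it assumes the layout shown in this changeset (an importable backend package exposing changedetection_app and store.ChangeDetectionStore) and skips the version/new-version context processors that main() registers:

import os

import backend
from backend import store

datastore_path = os.path.join(os.getcwd(), "datastore")
app_config = {'datastore_path': datastore_path}

datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'])
app = backend.changedetection_app(app_config, datastore)
app.secret_key = "dev-only-secret"  # changedetection.py persists a real secret via init_app_secret()

if __name__ == '__main__':
    # Flask's built-in dev server instead of eventlet.wsgi; assumed acceptable for local hacking only
    app.run(host='127.0.0.1', port=5000)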

@@ -1,2 +0,0 @@
Empty dir, please keep, this is used to store your data!


@@ -1,23 +0,0 @@
version: "2"
services:
# I have a feeling we can get rid of this, and just use one docker-compose.yml, and just set a ENV var if
# we want dev mode (just gives a docker shell) or not.
backend:
build: ./backend/dev-docker
image: dgtlmoon/changedetection.io:dev
container_name: changedetection.io-dev
volumes:
- ./backend:/app
- ./requirements.txt:/requirements.txt # Normally COPY'ed in the Dockerfile
- ./datastore:/datastore
ports:
- "127.0.0.1:5001:5000"
networks:
- changenet
networks:
changenet:


@@ -1,21 +1,13 @@
-aiohttp
-async-timeout
 chardet==2.3.0
-multidict
-python-engineio
-six==1.10.0
-yarl
-flask
-eventlet
-requests
+flask~= 1.0
+pytest ~=6.2
+pytest-flask ~=1.1
+eventlet ~= 0.30
+requests ~= 2.15
 validators
-bleach==3.2.1
-html5lib==0.9999999 # via bleach
-timeago
-html2text
-inscriptis
-# @notes
-# - Dont install socketio, it interferes with flask_socketio
+timeago ~=1.0
+inscriptis ~= 1.1
+feedgen ~= 0.9
+flask-login ~= 0.5
+pytz
+urllib3

Binary file not shown (image changed: 217 KiB before, 213 KiB after).