Adding test

Merge branch 'master' into ticket-16-diff-in-RSS
Fix - {diff} and {diff_full} notifications tokens were not always including the full output
2026-06-12 11:51:38 +00:00 · 2022-03-30 10:32:51 +02:00 · 2022-03-30 10:24:40 +02:00 · 2022-03-29 19:18:26 +02:00 · 2022-03-26 15:53:12 +01:00 · 2022-03-26 15:35:56 +01:00
58 changed files with 1840 additions and 788 deletions
@@ -2,16 +2,20 @@ name: Build and push containers

 on:
  # Automatically triggered by a testing workflow passing, but this is only checked when it lands in the `master`/default branch
-  workflow_run:
-    workflows: ["ChangeDetection.io Test"]
-    branches: [master]
-    tags: ['0.*']
-    types: [completed]
+#  workflow_run:
+#    workflows: ["ChangeDetection.io Test"]
+#    branches: [master]
+#    tags: ['0.*']
+#    types: [completed]

  # Or a new tagged release
  release:
    types: [published, edited]

+  push:
+    branches:
+      - master
+
 jobs:
  metadata:
    runs-on: ubuntu-latest
@@ -91,8 +95,7 @@ jobs:
          file: ./Dockerfile
          push: true
          tags: |
-            ${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:latest
-            ghcr.io/${{ github.repository }}:latest
+            ${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:latest,ghcr.io/${{ github.repository }}:latest
          platforms: linux/amd64,linux/arm64,linux/arm/v6,linux/arm/v7
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache
@@ -107,8 +110,7 @@ jobs:
          file: ./Dockerfile
          push: true
          tags: |
-            ${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:${{ github.event.release.tag_name }}
-            ghcr.io/dgtlmoon/changedetection.io:${{ github.event.release.tag_name }}
+            ${{ secrets.DOCKER_HUB_USERNAME }}/changedetection.io:${{ github.event.release.tag_name }},ghcr.io/dgtlmoon/changedetection.io:${{ github.event.release.tag_name }}
          platforms: linux/amd64,linux/arm64,linux/arm/v6,linux/arm/v7
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache
@@ -7,4 +7,6 @@ __pycache__
 .pytest_cache
 build
 dist
+venv
+*.egg-info*
 .vscode/settings.json
@@ -2,5 +2,5 @@ recursive-include changedetectionio/templates *
 recursive-include changedetectionio/static *
 include changedetection.py
 global-exclude *.pyc
-global-exclude *node_modules*
+global-exclude node_modules
 global-exclude venv
@@ -9,19 +9,25 @@ _Know when web pages change! Stay ontop of new information!_

 Live your data-life *pro-actively* instead of *re-actively*.

-Open source web page monitoring, notification and change detection.
+Free, open-source web page monitoring, notification and change detection. Don't have time? [Try our $6.99/month plan - unlimited checks and watches!](https://lemonade.changedetection.io/start)


-<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot.png" style="max-width:100%;" alt="Self-hosted web page change monitoring"  title="Self-hosted web page change monitoring"  />
+[<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/screenshot.png" style="max-width:100%;" alt="Self-hosted web page change monitoring"  title="Self-hosted web page change monitoring"  />](https://lemonade.changedetection.io/start)


-**Get your own instance now on Lemonade!**
+**Get your own private instance now! Let us host it for you!**

 [![Deploy to Lemonade](https://lemonade.changedetection.io/static/images/lemonade.svg)](https://lemonade.changedetection.io/start)

+
+[_Let us host your own private instance - we accept PayPal and Bitcoin. Support the further development of changedetection.io!_](https://lemonade.changedetection.io/start)
+
+
+
 - Automatic Updates, Automatic Backups, No Heroku "paused application", don't miss a change!
 - Javascript browser included
- Pay with Bitcoin
+- Unlimited checks and watches!
+

 #### Example use cases

@@ -99,6 +105,8 @@ See the wiki for more information https://github.com/dgtlmoon/changedetection.io
 ## Filters
 XPath, JSONPath and CSS support comes baked in! You can be as specific as you need, use XPath exported from various XPath element query creation tools.

+(We support LXML re:test, re:match and re:replace.)
+
 ## Notifications

 ChangeDetection.io supports a massive amount of notifications (including email, office365, custom APIs, etc) when a web-page has a change detected thanks to the <a href="https://github.com/caronc/apprise">apprise</a> library.
@@ -155,9 +163,9 @@ See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configura

 Raspberry Pi and linux/arm/v6 linux/arm/v7 arm64 devices are supported! See the wiki for [details](https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver)

-## Windows native support?
+## Windows support?

-Sorry not yet :( https://github.com/dgtlmoon/changedetection.io/labels/windows
+YES! See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Microsoft-Windows

 ## Support us

@@ -1,110 +1,11 @@
 #!/usr/bin/python3

-# Launch as a eventlet.wsgi server instance.
-
-import getopt
-import os
-import sys
-
-import eventlet
-import eventlet.wsgi
-import changedetectionio
-
-from changedetectionio import store
-
-def main():
-    ssl_mode = False
-    host = ''
-    port = os.environ.get('PORT') or 5000
-    do_cleanup = False
-
-    # Must be absolute so that send_from_directory doesnt try to make it relative to backend/
-    datastore_path = os.path.join(os.getcwd(), "datastore")
-
-    try:
-        opts, args = getopt.getopt(sys.argv[1:], "Ccsd:h:p:", "port")
-    except getopt.GetoptError:
-        print('backend.py -s SSL enable -h [host] -p [port] -d [datastore path]')
-        sys.exit(2)
-
-    create_datastore_dir = False
-
-    for opt, arg in opts:
-        #        if opt == '--purge':
-        # Remove history, the actual files you need to delete manually.
-        #            for uuid, watch in datastore.data['watching'].items():
-        #                watch.update({'history': {}, 'last_checked': 0, 'last_changed': 0, 'previous_md5': None})
-
-        if opt == '-s':
-            ssl_mode = True
-
-        if opt == '-h':
-            host = arg
-
-        if opt == '-p':
-            port = int(arg)
-
-        if opt == '-d':
-            datastore_path = arg
-
-        # Cleanup (remove text files that arent in the index)
-        if opt == '-c':
-            do_cleanup = True
-
-        # Create the datadir if it doesnt exist
-        if opt == '-C':
-            create_datastore_dir = True
-
-    # isnt there some @thingy to attach to each route to tell it, that this route needs a datastore
-    app_config = {'datastore_path': datastore_path}
-
-    if not os.path.isdir(app_config['datastore_path']):
-        if create_datastore_dir:
-            os.mkdir(app_config['datastore_path'])
-        else:
-            print ("ERROR: Directory path for the datastore '{}' does not exist, cannot start, please make sure the directory exists.\n"
-                   "Alternatively, use the -C parameter.".format(app_config['datastore_path']),file=sys.stderr)
-            sys.exit(2)
-
-    datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'], version_tag=changedetectionio.__version__)
-    app = changedetectionio.changedetection_app(app_config, datastore)
-
-    # Go into cleanup mode
-    if do_cleanup:
-        datastore.remove_unused_snapshots()
-
-    app.config['datastore_path'] = datastore_path
-
-
-    @app.context_processor
-    def inject_version():
-        return dict(right_sticky="v{}".format(datastore.data['version_tag']),
-                    new_version_available=app.config['NEW_VERSION_AVAILABLE'],
-                    has_password=datastore.data['settings']['application']['password'] != False
-                    )
-
-    # Proxy sub-directory support
-    # Set environment var USE_X_SETTINGS=1 on this script
-    # And then in your proxy_pass settings
-    #
-    #         proxy_set_header Host "localhost";
-    #         proxy_set_header X-Forwarded-Prefix /app;
-
-    if os.getenv('USE_X_SETTINGS'):
-        print ("USE_X_SETTINGS is ENABLED\n")
-        from werkzeug.middleware.proxy_fix import ProxyFix
-        app.wsgi_app = ProxyFix(app.wsgi_app, x_prefix=1, x_host=1)
-
-    if ssl_mode:
-        # @todo finalise SSL config, but this should get you in the right direction if you need it.
-        eventlet.wsgi.server(eventlet.wrap_ssl(eventlet.listen((host, port)),
-                                               certfile='cert.pem',
-                                               keyfile='privkey.pem',
-                                               server_side=True), app)
-
-    else:
-        eventlet.wsgi.server(eventlet.listen((host, int(port))), app)
+# Entry point for running from the CLI when not installed via pip; pip handles the console_scripts entry_points from setup.py.
+# It's recommended to use `pip3 install changedetection.io` and start with `changedetection.py` instead (it will be linked into your global path),
+# or to use Docker.
+# Read more https://github.com/dgtlmoon/changedetection.io/wiki

+from changedetectionio import changedetection

 if __name__ == '__main__':
-    main()
+    changedetection.main()
@@ -0,0 +1 @@
+test-datastore
@@ -35,8 +35,11 @@ from flask import (
    url_for,
 )
 from flask_login import login_required
+from flask_wtf import CSRFProtect

-__version__ = '0.39.8'
+from changedetectionio import html_tools
+
+__version__ = '0.39.11'

 datastore = None

@@ -50,11 +53,10 @@ update_q = queue.Queue()

 notification_q = queue.Queue()

-# Needs to be set this way because we also build and publish via pip
-base_path = os.path.dirname(os.path.realpath(__file__))
 app = Flask(__name__,
-            static_url_path="{}/static".format(base_path),
-            template_folder="{}/templates".format(base_path))
+            static_url_path="",
+            static_folder="static",
+            template_folder="templates")

 # Stop browser caching of assets
 app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0
@@ -70,6 +72,9 @@ app.config['LOGIN_DISABLED'] = False
 # Disables caching of the templates
 app.config['TEMPLATES_AUTO_RELOAD'] = True

+csrf = CSRFProtect()
+csrf.init_app(app)
+
 notification_debug_log=[]

 def init_app_secret(datastore_path):
@@ -126,7 +131,7 @@ def _jinja2_filter_datetimestamp(timestamp, format="%Y-%m-%d %H:%M:%S"):
    # return timeago.format(timestamp, time.time())
    # return datetime.datetime.utcfromtimestamp(timestamp).strftime(format)

-
+# When nobody is logged in, Flask-Login's current_user is set to an AnonymousUser object.
 class User(flask_login.UserMixin):
    id=None

@@ -135,7 +140,6 @@ class User(flask_login.UserMixin):
    def get_user(self, email="defaultuser@changedetection.io"):
        return self
    def is_authenticated(self):
-
        return True
    def is_active(self):
        return True
@@ -214,6 +218,10 @@ def changedetection_app(config=None, datastore_o=None):
            return redirect(url_for('index'))

        if request.method == 'GET':
+            if flask_login.current_user.is_authenticated:
+                flash("Already logged in")
+                return redirect(url_for("index"))
+
            output = render_template("login.html")
            return output

@@ -249,6 +257,11 @@ def changedetection_app(config=None, datastore_o=None):
        # (No password in settings or env var)
        app.config['LOGIN_DISABLED'] = datastore.data['settings']['application']['password'] == False and os.getenv("SALTED_PASS", False) == False

+        # Set the auth cookie path if we're running as X-settings/X-Forwarded-Prefix
+        if os.getenv('USE_X_SETTINGS') and 'X-Forwarded-Prefix' in request.headers:
+            app.config['REMEMBER_COOKIE_PATH'] = request.headers['X-Forwarded-Prefix']
+            app.config['SESSION_COOKIE_PATH'] = request.headers['X-Forwarded-Prefix']
+
        # For the RSS path, allow access via a token
        if request.path == '/rss' and request.args.get('token'):
            app_rss_token = datastore.data['settings']['application']['rss_access_token']
@@ -259,7 +272,7 @@ def changedetection_app(config=None, datastore_o=None):
    @app.route("/rss", methods=['GET'])
    @login_required
    def rss():
-
+        from . import diff
        limit_tag = request.args.get('tag')

        # Sort by last_changed and add the uuid which is usually the key..
@@ -288,6 +301,15 @@ def changedetection_app(config=None, datastore_o=None):
        fg.link(href='https://changedetection.io')

        for watch in sorted_watches:
+
+            dates = list(watch['history'].keys())
+            # Convert to int, sort and back to str again
+            # @todo replace datastore getter that does this automatically
+            dates = [int(i) for i in dates]
+            dates.sort(reverse=True)
+            dates = [str(i) for i in dates]
+            prev_fname = watch['history'][dates[1]]
+
            if not watch['viewed']:
                # Re #239 - GUID needs to be individual for each event
                # @todo In the future make this a configurable link back (see work on BASE_URL https://github.com/dgtlmoon/changedetection.io/pull/228)
@@ -303,12 +325,16 @@ def changedetection_app(config=None, datastore_o=None):

                diff_link = {'href': "{}{}".format(base_url, url_for('diff_history_page', uuid=watch['uuid']))}

-                # @todo use title if it exists
                fe.link(link=diff_link)
-                fe.title(title=watch['url'])

-                # @todo in the future <description><![CDATA[<html><body>Any code html is valid.</body></html>]]></description>
-                fe.description(description=watch['url'])
+                # @todo watch should be a getter - watch.get('title') (internally if URL else..)
+
+                watch_title = watch.get('title') if watch.get('title') else watch.get('url')
+                fe.title(title=watch_title)
+                latest_fname = watch['history'][dates[0]]
+
+                html_diff = diff.render_diff(prev_fname, latest_fname, include_equal=False, line_feed_sep="<br>")
+                fe.description(description="<![CDATA[<html><body><h4>{}</h4>{}</body></html>]]>".format(watch_title, html_diff))

                fe.guid(guid, permalink=False)
                dt = datetime.datetime.fromtimestamp(int(watch['newest_history_key']))
@@ -367,7 +393,10 @@ def changedetection_app(config=None, datastore_o=None):
                                 tags=existing_tags,
                                 active_tag=limit_tag,
                                 app_rss_token=datastore.data['settings']['application']['rss_access_token'],
-                                 has_unviewed=datastore.data['has_unviewed'])
+                                 has_unviewed=datastore.data['has_unviewed'],
+                                 # Don't link to hosting when we're on the hosting environment
+                                 hosted_sticky=os.getenv("SALTED_PASS", False) == False,
+                                 guid=datastore.data['app_guid'])

        return output

@@ -441,7 +470,7 @@ def changedetection_app(config=None, datastore_o=None):
                raw_content = file.read()

                handler = fetch_site_status.perform_site_check(datastore=datastore)
-                stripped_content = handler.strip_ignore_text(raw_content,
+                stripped_content = html_tools.strip_ignore_text(raw_content,
                                                             datastore.data['watching'][uuid]['ignore_text'])

                if datastore.data['settings']['application'].get('ignore_whitespace', False):
@@ -491,13 +520,13 @@ def changedetection_app(config=None, datastore_o=None):
                          'headers': form.headers.data,
                          'body': form.body.data,
                          'method': form.method.data,
+                          'ignore_status_codes': form.ignore_status_codes.data,
                          'fetch_backend': form.fetch_backend.data,
                          'trigger_text': form.trigger_text.data,
                          'notification_title': form.notification_title.data,
                          'notification_body': form.notification_body.data,
                          'notification_format': form.notification_format.data,
-                          'extract_title_as_title': form.extract_title_as_title.data
-
+                          'extract_title_as_title': form.extract_title_as_title.data,
                          }

            # Notification URLs
@@ -514,6 +543,7 @@ def changedetection_app(config=None, datastore_o=None):


            datastore.data['watching'][uuid]['css_filter'] = form.css_filter.data.strip()
+            datastore.data['watching'][uuid]['subtractive_selectors'] = form.subtractive_selectors.data

            # Reset the previous_md5 so we process a new snapshot including stripping ignore text.
            if form.css_filter.data.strip() != datastore.data['watching'][uuid]['css_filter']:
@@ -546,10 +576,14 @@ def changedetection_app(config=None, datastore_o=None):
                    flash('No notification URLs set, cannot send test.', 'error')

            # Diff page [edit] link should go back to diff page
-            if request.args.get("next") and request.args.get("next") == 'diff':
+            if request.args.get("next") and request.args.get("next") == 'diff' and not form.save_and_preview_button.data:
                return redirect(url_for('diff_history_page', uuid=uuid))
            else:
-                return redirect(url_for('index'))
+                if form.save_and_preview_button.data:
+                    flash('You may need to reload this page to see the new content.')
+                    return redirect(url_for('preview_page', uuid=uuid))
+                else:
+                    return redirect(url_for('index'))

        else:
            if request.method == 'POST' and not form.validate():
@@ -582,6 +616,7 @@ def changedetection_app(config=None, datastore_o=None):
        if request.method == 'GET':
            form.minutes_between_check.data = int(datastore.data['settings']['requests']['minutes_between_check'])
            form.notification_urls.data = datastore.data['settings']['application']['notification_urls']
+            form.global_subtractive_selectors.data = datastore.data['settings']['application']['global_subtractive_selectors']
            form.global_ignore_text.data = datastore.data['settings']['application']['global_ignore_text']
            form.ignore_whitespace.data = datastore.data['settings']['application']['ignore_whitespace']
            form.extract_title_as_title.data = datastore.data['settings']['application']['extract_title_as_title']
@@ -591,16 +626,15 @@ def changedetection_app(config=None, datastore_o=None):
            form.notification_format.data = datastore.data['settings']['application']['notification_format']
            form.base_url.data = datastore.data['settings']['application']['base_url']

-            # Password unset is a GET, but we can lock the session to always need the password
-            if not os.getenv("SALTED_PASS", False) and request.values.get('removepassword') == 'yes':
-                from pathlib import Path
+        if request.method == 'POST' and form.data.get('removepassword_button') == True:
+            # Password removal is a POST; when a salted env password (SALTED_PASS) is set, the password cannot be removed
+            if not os.getenv("SALTED_PASS", False):
                datastore.data['settings']['application']['password'] = False
                flash("Password protection removed.", 'notice')
                flask_login.logout_user()
                return redirect(url_for('settings_page'))

        if request.method == 'POST' and form.validate():
-
            datastore.data['settings']['application']['notification_urls'] = form.notification_urls.data
            datastore.data['settings']['requests']['minutes_between_check'] = form.minutes_between_check.data
            datastore.data['settings']['application']['extract_title_as_title'] = form.extract_title_as_title.data
@@ -610,6 +644,7 @@ def changedetection_app(config=None, datastore_o=None):
            datastore.data['settings']['application']['notification_format'] = form.notification_format.data
            datastore.data['settings']['application']['notification_urls'] = form.notification_urls.data
            datastore.data['settings']['application']['base_url'] = form.base_url.data
+            datastore.data['settings']['application']['global_subtractive_selectors'] = form.global_subtractive_selectors.data
            datastore.data['settings']['application']['global_ignore_text'] =  form.global_ignore_text.data
            datastore.data['settings']['application']['ignore_whitespace'] = form.ignore_whitespace.data

@@ -695,10 +730,6 @@ def changedetection_app(config=None, datastore_o=None):
    @app.route("/diff/<string:uuid>", methods=['GET'])
    @login_required
    def diff_history_page(uuid):
-        from changedetectionio import content_fetcher
-
-        newest_version_file_contents = ""
-        previous_version_file_contents = ""

        # More for testing, possible to return the first/only
        if uuid == 'first':
@@ -724,28 +755,28 @@ def changedetection_app(config=None, datastore_o=None):

        # Save the current newest history as the most recently viewed
        datastore.set_last_viewed(uuid, dates[0])
+        newest_file = watch['history'][dates[0]]

-        previous_version = request.args.get('previous_version')
-        if ('content-type' in watch and content_fetcher.supported_binary_type(watch['content-type'])):
-            template = "diff-image.html"
-        else:
-            newest_file = watch['history'][dates[0]]
+        try:
            with open(newest_file, 'r') as f:
                newest_version_file_contents = f.read()
+        except Exception as e:
+            newest_version_file_contents = "Unable to read {}.\n".format(newest_file)

-            try:
-                previous_file = watch['history'][previous_version]
-            except KeyError:
-                # Not present, use a default value, the second one in the sorted list.
-                previous_file = watch['history'][dates[1]]
+        previous_version = request.args.get('previous_version')
+        try:
+            previous_file = watch['history'][previous_version]
+        except KeyError:
+            # Not present, use a default value, the second one in the sorted list.
+            previous_file = watch['history'][dates[1]]

+        try:
            with open(previous_file, 'r') as f:
                previous_version_file_contents = f.read()
+        except Exception as e:
+            previous_version_file_contents = "Unable to read {}.\n".format(previous_file)

-            template = "diff.html"
-
-        output = render_template(template,
-                                 watch_a=watch,
+        output = render_template("diff.html", watch_a=watch,
                                 newest=newest_version_file_contents,
                                 previous=previous_version_file_contents,
                                 extra_stylesheets=extra_stylesheets,
@@ -755,14 +786,16 @@ def changedetection_app(config=None, datastore_o=None):
                                 current_previous_version=str(previous_version),
                                 current_diff_url=watch['url'],
                                 extra_title=" - Diff - {}".format(watch['title'] if watch['title'] else watch['url']),
-                                 left_sticky= True )
+                                 left_sticky=True)

        return output

    @app.route("/preview/<string:uuid>", methods=['GET'])
    @login_required
    def preview_page(uuid):
-        from changedetectionio import content_fetcher
+        content = []
+        ignored_line_numbers = []
+        trigger_line_numbers = []

        # More for testing, possible to return the first/only
        if uuid == 'first':
@@ -776,26 +809,52 @@ def changedetection_app(config=None, datastore_o=None):
            flash("No history found for the specified link, bad link?", "error")
            return redirect(url_for('index'))

-        newest = list(watch['history'].keys())[-1]
-        fname = watch['history'][newest]
-
-        if ('content-type' in watch and content_fetcher.supported_binary_type(watch['content-type'])):
-            template = "preview-image.html"
-            content = fname
-        else:
-            template = "preview.html"
+        if len(watch['history']):
+            timestamps = sorted(watch['history'].keys(), key=int)
+            filename = watch['history'][timestamps[-1]]
            try:
-                with open(fname, 'r') as f:
-                    content = f.read()
-            except:
-                content = "Cant read {}".format(fname)
+                with open(filename, 'r') as f:
+                    tmp = f.readlines()
+
+                    # Get what needs to be highlighted
+                    ignore_rules = watch.get('ignore_text', []) + datastore.data['settings']['application']['global_ignore_text']
+
+                    # .readlines() keeps the \n, but the content is parsed again here; @todo tidy this up
+                    ignored_line_numbers = html_tools.strip_ignore_text(content="".join(tmp),
+                                                                        wordlist=ignore_rules,
+                                                                        mode='line numbers'
+                                                                        )
+
+                    trigger_line_numbers = html_tools.strip_ignore_text(content="".join(tmp),
+                                                                        wordlist=watch['trigger_text'],
+                                                                        mode='line numbers'
+                                                                        )
+                    # Prepare the classes and lines used in the template
+                    i=0
+                    for l in tmp:
+                        classes=[]
+                        i+=1
+                        if i in ignored_line_numbers:
+                            classes.append('ignored')
+                        if i in trigger_line_numbers:
+                            classes.append('triggered')
+                        content.append({'line': l, 'classes': ' '.join(classes)})
+
+
+            except Exception as e:
+                content.append({'line': "File doesn't exist or unable to read file {}".format(filename), 'classes': ''})
+        else:
+            content.append({'line': "No history found", 'classes': ''})
+

        output = render_template("preview.html",
                                 content=content,
                                 extra_stylesheets=extra_stylesheets,
+                                 ignored_line_numbers=ignored_line_numbers,
+                                 triggered_line_numbers=trigger_line_numbers,
                                 current_diff_url=watch['url'],
-                                 uuid=uuid,
-                                 watch=watch)
+                                 watch=watch,
+                                 uuid=uuid)
        return output

    @app.route("/settings/notification-logs", methods=['GET'])
@@ -807,49 +866,6 @@ def changedetection_app(config=None, datastore_o=None):

        return output

-
-    # render an image which contains the diff of two images
-    # We always compare the newest against whatever compare_date we are given
-    @app.route("/diff/show-image/<string:uuid>/<string:datestr>")
-    def show_single_image(uuid, datestr):
-
-        from flask import make_response
-        watch = datastore.data['watching'][uuid]
-
-        if datestr == 'None' or datestr is None:
-            datestr = list(watch['history'].keys())[0]
-
-        fname = watch['history'][datestr]
-        with open(fname, 'rb') as f:
-            resp = make_response(f.read())
-            
-        # @todo assumption here about the type, re-encode? detect?
-        resp.headers['Content-Type'] = 'image/jpeg'
-        return resp
-
-    # render an image which contains the diff of two images
-    # We always compare the newest against whatever compare_date we are given
-    @app.route("/diff/image/<string:uuid>/<string:compare_date>")
-    def render_diff_image(uuid, compare_date):
-        from changedetectionio import image_diff
-
-        from flask import make_response
-        watch = datastore.data['watching'][uuid]
-        newest = list(watch['history'].keys())[-1]
-
-        # @todo this is weird
-        if compare_date == 'None' or compare_date is None:
-            compare_date = list(watch['history'].keys())[0]
-
-        new_img = watch['history'][newest]
-        prev_img = watch['history'][compare_date]
-        img = image_diff.render_diff(new_img, prev_img)
-
-        resp = make_response(img)
-        resp.headers['Content-Type'] = 'image/jpeg'
-        return resp
-
-
    @app.route("/api/<string:uuid>/snapshot/current", methods=['GET'])
    @login_required
    def api_snapshot(uuid):
@@ -1133,22 +1149,42 @@ def ticker_thread_check_time_launch_checks():
                running_uuids.append(t.current_uuid)

        # Re #232 - Deepcopy the data incase it changes while we're iterating through it all
-        copied_datastore = deepcopy(datastore)
+        while True:
+            try:
+                copied_datastore = deepcopy(datastore)
+            except RuntimeError as e:
+                # RuntimeError: dictionary changed size during iteration
+                time.sleep(0.1)
+            else:
+                break
+
+        # Re #438 - Don't place more watches in the queue to be checked if the queue is already large
+        while update_q.qsize() >= 2000:
+            time.sleep(1)

        # Check for watches outside of the time threshold to put in the thread queue.
+        now = time.time()
+        max_system_wide = int(copied_datastore.data['settings']['requests']['minutes_between_check']) * 60
+
        for uuid, watch in copied_datastore.data['watching'].items():
+
+            # No need to do further processing if it's paused
+            if watch['paused']:
+                continue
+
            # If they supplied an individual entry minutes to threshold.
-            if 'minutes_between_check' in watch and watch['minutes_between_check'] is not None:
+            watch_minutes_between_check = watch.get('minutes_between_check', None)
+            if watch_minutes_between_check is not None:
                # Cast to int just incase
-                max_time = int(watch['minutes_between_check']) * 60
+                max_time = int(watch_minutes_between_check) * 60
            else:
                # Default system wide.
-                max_time = int(copied_datastore.data['settings']['requests']['minutes_between_check']) * 60
+                max_time = max_system_wide

-            threshold = time.time() - max_time
+            threshold = now - max_time

-            # Yeah, put it in the queue, it's more than time.
-            if not watch['paused'] and watch['last_checked'] <= threshold:
+            # Yeah, put it in the queue, it's more than time
+            if watch['last_checked'] <= threshold:
                if not uuid in running_uuids and uuid not in update_q.queue:
                    update_q.put(uuid)

@@ -0,0 +1,114 @@
+#!/usr/bin/python3
+
+# Launch as an eventlet.wsgi server instance.
+
+import getopt
+import os
+import sys
+
+import eventlet
+import eventlet.wsgi
+from . import store, changedetection_app
+from . import __version__
+
+def main():
+    ssl_mode = False
+    host = ''
+    port = os.environ.get('PORT') or 5000
+    do_cleanup = False
+    datastore_path = None
+
+    # On Windows, create and use a default path.
+    if os.name == 'nt':
+        datastore_path = os.path.expandvars(r'%APPDATA%\changedetection.io')
+        os.makedirs(datastore_path, exist_ok=True)
+    else:
+        # Must be absolute so that send_from_directory doesn't try to make it relative to backend/
+        datastore_path = os.path.join(os.getcwd(), "../datastore")
+
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "Ccsd:h:p:", "port")
+    except getopt.GetoptError:
+        print('backend.py -s SSL enable -h [host] -p [port] -d [datastore path]')
+        sys.exit(2)
+
+    create_datastore_dir = False
+
+    for opt, arg in opts:
+        #        if opt == '--purge':
+        # Remove history, the actual files you need to delete manually.
+        #            for uuid, watch in datastore.data['watching'].items():
+        #                watch.update({'history': {}, 'last_checked': 0, 'last_changed': 0, 'previous_md5': None})
+
+        if opt == '-s':
+            ssl_mode = True
+
+        if opt == '-h':
+            host = arg
+
+        if opt == '-p':
+            port = int(arg)
+
+        if opt == '-d':
+            datastore_path = arg
+
+        # Cleanup (remove text files that aren't in the index)
+        if opt == '-c':
+            do_cleanup = True
+
+        # Create the datadir if it doesn't exist
+        if opt == '-C':
+            create_datastore_dir = True
+
+    # Isn't there some @thingy to attach to each route to tell it that this route needs a datastore?
+    app_config = {'datastore_path': datastore_path}
+
+    if not os.path.isdir(app_config['datastore_path']):
+        if create_datastore_dir:
+            os.mkdir(app_config['datastore_path'])
+        else:
+            print(
+                "ERROR: Directory path for the datastore '{}' does not exist, cannot start, please make sure the directory exists or specify a directory with the -d option.\n"
+                "Or use the -C parameter to create the directory.".format(app_config['datastore_path']), file=sys.stderr)
+            sys.exit(2)
+
+    datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'], version_tag=__version__)
+    app = changedetection_app(app_config, datastore)
+
+    # Go into cleanup mode
+    if do_cleanup:
+        datastore.remove_unused_snapshots()
+
+    app.config['datastore_path'] = datastore_path
+
+
+    @app.context_processor
+    def inject_version():
+        return dict(right_sticky="v{}".format(datastore.data['version_tag']),
+                    new_version_available=app.config['NEW_VERSION_AVAILABLE'],
+                    has_password=datastore.data['settings']['application']['password'] != False
+                    )
+
+    # Proxy sub-directory support
+    # Set environment var USE_X_SETTINGS=1 on this script
+    # And then in your proxy_pass settings
+    #
+    #         proxy_set_header Host "localhost";
+    #         proxy_set_header X-Forwarded-Prefix /app;
+
+    if os.getenv('USE_X_SETTINGS'):
+        print("USE_X_SETTINGS is ENABLED\n")
+        from werkzeug.middleware.proxy_fix import ProxyFix
+        app.wsgi_app = ProxyFix(app.wsgi_app, x_prefix=1, x_host=1)
+
+    if ssl_mode:
+        # @todo finalise SSL config, but this should get you in the right direction if you need it.
+        eventlet.wsgi.server(eventlet.wrap_ssl(eventlet.listen((host, int(port))),
+                                               certfile='cert.pem',
+                                               keyfile='privkey.pem',
+                                               server_side=True), app)
+
+    else:
+        eventlet.wsgi.server(eventlet.listen((host, int(port))), app)
+
+
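
The option string passed to `getopt.getopt` in the new launcher decides which switches take a value. The sketch below shows that behaviour on its own; `parse_cli` and the dict it returns are hypothetical and only mirror the variables set in `main()` above.

```python
import getopt

def parse_cli(argv):
    """Parse the launcher's short options into a plain dict (hypothetical helper)."""
    settings = {
        'ssl_mode': False,
        'host': '',
        'port': 5000,
        'datastore_path': None,
        'do_cleanup': False,
        'create_datastore_dir': False,
    }
    # A trailing colon means the option takes a value: -d, -h and -p do,
    # while -C, -c and -s are plain flags.
    opts, _args = getopt.getopt(argv, "Ccsd:h:p:")
    for opt, arg in opts:
        if opt == '-s':
            settings['ssl_mode'] = True
        elif opt == '-h':
            settings['host'] = arg
        elif opt == '-p':
            # Values arrive as strings, so cast before handing to a socket
            settings['port'] = int(arg)
        elif opt == '-d':
            settings['datastore_path'] = arg
        elif opt == '-c':
            settings['do_cleanup'] = True
        elif opt == '-C':
            settings['create_datastore_dir'] = True
    return settings

print(parse_cli(['-s', '-p', '8080', '-d', '/tmp/datastore', '-C']))
```

An unknown switch such as `-x` raises `getopt.GetoptError`, which is the case the launcher turns into a usage message and exit code 2.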
@@ -1,13 +1,14 @@
-import os
-import time
 from abc import ABC, abstractmethod
+import chardet
+import os
 from selenium import webdriver
 from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
 from selenium.webdriver.common.proxy import Proxy as SeleniumProxy
 from selenium.common.exceptions import WebDriverException
+import requests
+import time
+import urllib3.exceptions

-# image/jpeg etc
-supported_binary_types = ['image']

 class EmptyReply(Exception):
    def __init__(self, status_code, url):
@@ -21,7 +22,7 @@ class EmptyReply(Exception):
 class Fetcher():
    error = None
    status_code = None
-    content = None # Should always be bytes.
+    content = None
    headers = None

    fetcher_description ="No description"
@@ -31,7 +32,13 @@ class Fetcher():
        return self.error

    @abstractmethod
-    def run(self, url, timeout, request_headers, request_body, request_method):
+    def run(self,
+            url,
+            timeout,
+            request_headers,
+            request_body,
+            request_method,
+            ignore_status_codes=False):
        # Should set self.error, self.status_code and self.content
        pass

@@ -52,15 +59,6 @@ class Fetcher():
 #    def return_diff(self, stream_a, stream_b):
 #        return

-# Assume we dont support it as binary if its not in our list
-def supported_binary_type(content_type):
-    # Not a binary thing we support? then use text (also used for JSON/XML etc)
-    # @todo - future - use regex for matching
-    if content_type and content_type.lower().strip().split('/')[0] not in (string.lower() for string in supported_binary_types):
-        return False
-
-    return True
-
 def available_fetchers():
        import inspect
        from changedetectionio import content_fetcher
@@ -107,7 +105,13 @@ class html_webdriver(Fetcher):
        if proxy_args:
            self.proxy = SeleniumProxy(raw=proxy_args)

-    def run(self, url, timeout, request_headers, request_body, request_method):
+    def run(self,
+            url,
+            timeout,
+            request_headers,
+            request_body,
+            request_method,
+            ignore_status_codes=False):

        # request_body, request_method unused for now, until some magic in the future happens.

@@ -155,8 +159,13 @@ class html_webdriver(Fetcher):
 class html_requests(Fetcher):
    fetcher_description = "Basic fast Plaintext/HTTP Client"

-    def run(self, url, timeout, request_headers, request_body, request_method):
-        import requests
+    def run(self,
+            url,
+            timeout,
+            request_headers,
+            request_body,
+            request_method,
+            ignore_status_codes=False):

        r = requests.request(method=request_method,
                         data=request_body,
@@ -165,19 +174,21 @@ class html_requests(Fetcher):
                         timeout=timeout,
                         verify=False)

-        # https://stackoverflow.com/questions/44203397/python-requests-get-returns-improperly-decoded-text-instead-of-utf-8
-
-        if not supported_binary_type(r.headers.get('Content-Type', '')):
-            content = r.text
-        else:
-            content = r.content
+        # If the response did not tell us what encoding format to expect, then use chardet to override what `requests` thinks.
+        # For example - some sites don't tell us it's utf-8, but return utf-8 content
+        # This does not seem to occur when using webdriver/selenium, which seems to detect the text encoding more reliably.
+        # https://github.com/psf/requests/issues/1604 good info about requests encoding detection
+        if not r.headers.get('content-type') or 'charset=' not in r.headers.get('content-type'):
+            encoding = chardet.detect(r.content)['encoding']
+            if encoding:
+                r.encoding = encoding

        # @todo test this
        # @todo maybe you really want to test zero-byte return pages?
-        if not r or not content or not len(content):
+        if (not ignore_status_codes and not r) or not r.content or not len(r.content):
            raise EmptyReply(url=url, status_code=r.status_code)

        self.status_code = r.status_code
-        self.content = content
+        self.content = r.text
        self.headers = r.headers

@@ -2,22 +2,31 @@

 import difflib

+
+def same_slicer(l, a, b):
+    if a == b:
+        return [l[a]]
+    else:
+        return l[a:b]
+
 # like .compare but a little different output
 def customSequenceMatcher(before, after, include_equal=False):
    cruncher = difflib.SequenceMatcher(isjunk=lambda x: x in " \\t", a=before, b=after)

+    # @todo Line-by-line mode instead of bunched, including `after` that is not in `before` (maybe unset?)
    for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
        if include_equal and tag == 'equal':
            g = before[alo:ahi]
            yield g
        elif tag == 'delete':
-            g = "(removed) {}".format(before[alo])
+            g = ["(removed) " + i for i in same_slicer(before, alo, ahi)]
            yield g
        elif tag == 'replace':
-            g = ["(changed) {}".format(before[alo]), "(-> into) {}".format(after[blo])]
+            g = ["(changed) " + i for i in same_slicer(before, alo, ahi)]
+            g += ["(into   ) " + i for i in same_slicer(after, blo, bhi)]
            yield g
        elif tag == 'insert':
-            g = "(added) {}".format(after[blo])
+            g = ["(added  ) " + i for i in same_slicer(after, blo, bhi)]
            yield g

 # only_differences - only return info about the differences, no context
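
The fix in this file is to label every line in an opcode range rather than only the first. A minimal, self-contained sketch of that idea follows; `label_changes` is a hypothetical name, and it uses plain slices instead of the `same_slicer` helper from the hunk above.

```python
import difflib

def label_changes(before, after):
    """Yield one list of labelled lines per non-equal opcode range."""
    matcher = difflib.SequenceMatcher(a=before, b=after)
    for tag, alo, ahi, blo, bhi in matcher.get_opcodes():
        if tag == 'delete':
            yield ["(removed) " + line for line in before[alo:ahi]]
        elif tag == 'insert':
            yield ["(added  ) " + line for line in after[blo:bhi]]
        elif tag == 'replace':
            # Every replaced line is listed, followed by every replacement line
            yield (["(changed) " + line for line in before[alo:ahi]]
                   + ["(into   ) " + line for line in after[blo:bhi]])

before = ["one", "two", "three", "four"]
after = ["one", "2", "3", "four", "five"]
for group in label_changes(before, after):
    print(group)
```

Here the two-line replacement is reported in full. Reading only `before[alo]` and `after[blo]`, as the removed lines did, would have dropped "three" and "3" from the output.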
@@ -1,10 +1,11 @@
-import time
-from changedetectionio import content_fetcher
 import hashlib
-from inscriptis import get_text
-import urllib3
-from . import html_tools
+import os
 import re
+import time
+import urllib3
+
+from inscriptis import get_text
+from changedetectionio import content_fetcher, html_tools

 urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

@@ -16,51 +17,20 @@ class perform_site_check():
        super().__init__(*args, **kwargs)
        self.datastore = datastore

-    def strip_ignore_text(self, content, list_ignore_text):
-        import re
-        ignore = []
-        ignore_regex = []
-        for k in list_ignore_text:
-
-            # Is it a regex?
-            if k[0] == '/':
-                ignore_regex.append(k.strip(" /"))
-            else:
-                ignore.append(k)
-
-        output = []
-        for line in content.splitlines():
-
-            # Always ignore blank lines in this mode. (when this function gets called)
-            if len(line.strip()):
-                regex_matches = False
-
-                # if any of these match, skip
-                for regex in ignore_regex:
-                    try:
-                        if re.search(regex, line, re.IGNORECASE):
-                            regex_matches = True
-                    except Exception as e:
-                        continue
-
-                if not regex_matches and not any(skip_text in line for skip_text in ignore):
-                    output.append(line.encode('utf8'))
-
-        return "\n".encode('utf8').join(output)
-
-
-
    def run(self, uuid):
        timestamp = int(time.time())  # used for storage etc too

        changed_detected = False
        stripped_text_from_html = ""
-        fetched_md5 = ""
-
-        original_content_before_filters = False

        watch = self.datastore.data['watching'][uuid]

+        # Protect against file:// access
+        if re.search(r'^file', watch['url'], re.IGNORECASE) and not os.getenv('ALLOW_FILE_URI', False):
+            raise Exception(
+                "file:// type access is denied for security reasons."
+            )
+
        # Unset any existing notification error
        update_obj = {'last_notification_error': False, 'last_error': False}

@@ -83,6 +53,7 @@ class perform_site_check():
            url = self.datastore.get_val(uuid, 'url')
            request_body = self.datastore.get_val(uuid, 'body')
            request_method = self.datastore.get_val(uuid, 'method')
+            ignore_status_code = self.datastore.get_val(uuid, 'ignore_status_codes')

            # Pluggable content fetcher
            prefer_backend = watch['fetch_backend']
@@ -94,8 +65,7 @@ class perform_site_check():


            fetcher = klass()
-            fetcher.run(url, timeout, request_headers, request_body, request_method)
-
+            fetcher.run(url, timeout, request_headers, request_body, request_method, ignore_status_code)
            # Fetching complete, now filters
            # @todo move to class / maybe inside of fetcher abstract base?

@@ -105,39 +75,38 @@ class perform_site_check():
            #  - Do we convert to JSON?
            # https://stackoverflow.com/questions/41817578/basic-method-chaining ?
            # return content().textfilter().jsonextract().checksumcompare() ?
-            update_obj['content-type'] = fetcher.headers.get('Content-Type', '').lower().strip()
-            
-            # Could be 'application/json; charset=utf-8' etc
-            is_json = 'application/json' in update_obj['content-type']
-            is_text_or_html = 'text/' in update_obj['content-type'] # text/plain , text/html etc
-            is_binary = not is_text_or_html and content_fetcher.supported_binary_type(update_obj['content-type'])
-            css_filter_rule = watch['css_filter']
-            has_filter_rule = css_filter_rule and len(css_filter_rule.strip())

-            # Auto-detect application/json, make it reformat the JSON to something nice
+            is_json = 'application/json' in fetcher.headers.get('Content-Type', '')
+            is_html = not is_json
+            css_filter_rule = watch['css_filter']
+            subtractive_selectors = watch.get(
+                "subtractive_selectors", []
+            ) + self.datastore.data["settings"]["application"].get(
+                "global_subtractive_selectors", []
+            )
+
+            has_filter_rule = css_filter_rule and len(css_filter_rule.strip())
+            has_subtractive_selectors = subtractive_selectors and len(subtractive_selectors[0].strip())
+            
            if is_json and not has_filter_rule:
                css_filter_rule = "json:$"
                has_filter_rule = True

-            ##### CONVERT THE INPUT TO TEXT, EXTRACT THE PARTS THAT NEED TO BE FILTERED
+            if has_filter_rule:
+                if 'json:' in css_filter_rule:
+                    stripped_text_from_html = html_tools.extract_json_as_string(content=fetcher.content, jsonpath_filter=css_filter_rule)
+                    is_html = False

-            # Dont depend on the content-type header here, maybe it's not present
-            if 'json:' in css_filter_rule:
-                is_json = True
-                rule = css_filter_rule.replace('json:', '')
-                stripped_text_from_html = html_tools.extract_json_as_string(content=fetcher.content,
-                                                                            jsonpath_filter=rule).encode('utf-8')
-                is_text_or_html = False
-                original_content_before_filters = stripped_text_from_html
-
-            if is_text_or_html:
+            if is_html:
                # CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text
                html_content = fetcher.content
-                if 'text/plain' in update_obj['content-type']:
-                    stripped_text_from_html = html_content

-                # Assume it's HTML if it's not text/plain
-                if not 'text/plain' in update_obj['content-type']:
+                # Not JSON, so check whether it's text/plain..
+                if 'text/plain' in fetcher.headers.get('Content-Type', '').lower():
+                    # Don't run get_text or xpath/css filters on plaintext
+                    stripped_text_from_html = html_content
+                else:
+                    # Then we assume HTML
                    if has_filter_rule:
                        # For HTML/XML we offer xpath as an option, just start a regular xPath "/.."
                        if css_filter_rule[0] == '/':
@@ -145,52 +114,33 @@ class perform_site_check():
                        else:
                            # CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text
                            html_content = html_tools.css_filter(css_filter=css_filter_rule, html_content=fetcher.content)
+                    if has_subtractive_selectors:
+                        html_content = html_tools.element_removal(subtractive_selectors, html_content)
                    # get_text() via inscriptis
                    stripped_text_from_html = get_text(html_content)

-                    # Extract title as title
-                    if self.datastore.data['settings']['application']['extract_title_as_title'] or watch['extract_title_as_title']:
-                        if not watch['title'] or not len(watch['title']):
-                            update_obj['title'] = html_tools.extract_element(find='title', html_content=fetcher.content)
-
-                # Re #340 - return the content before the 'ignore text' was applied
-                original_content_before_filters = stripped_text_from_html.encode('utf-8')
-
+            # Re #340 - return the content before the 'ignore text' was applied
+            text_content_before_ignored_filter = stripped_text_from_html.encode('utf-8')

            # We rely on the actual text in the html output.. many sites have random script vars etc,
            # in the future we'll implement other mechanisms.

            update_obj["last_check_status"] = fetcher.get_last_status_code()

-            ######## AFTER FILTERING, STRIP OUT IGNORE TEXT
-            if is_text_or_html:
-                text_to_ignore = watch.get('ignore_text', []) + self.datastore.data['settings']['application'].get('global_ignore_text', [])
-                if len(text_to_ignore):
-                    stripped_text_from_html = self.strip_ignore_text(stripped_text_from_html, text_to_ignore)
-                else:
-                    stripped_text_from_html = stripped_text_from_html.encode('utf8')
+            # If there's text to skip
+            # @todo we could abstract out the get_text() to handle this more cleanly
+            text_to_ignore = watch.get('ignore_text', []) + self.datastore.data['settings']['application'].get('global_ignore_text', [])
+            if len(text_to_ignore):
+                stripped_text_from_html = html_tools.strip_ignore_text(stripped_text_from_html, text_to_ignore)
+            else:
+                stripped_text_from_html = stripped_text_from_html.encode('utf8')

-
-            ######## CALCULATE CHECKSUM FOR DIFF DETECTION
            # Re #133 - if we should strip whitespaces from triggering the change detected comparison
-            if is_text_or_html:
-                if self.datastore.data['settings']['application'].get('ignore_whitespace', False):
-                    fetched_md5 = hashlib.md5(stripped_text_from_html.translate(None, b'\r\n\t ')).hexdigest()
-                else:
-                    fetched_md5 = hashlib.md5(stripped_text_from_html).hexdigest()
-
-            if is_json:
+            if self.datastore.data['settings']['application'].get('ignore_whitespace', False):
+                fetched_md5 = hashlib.md5(stripped_text_from_html.translate(None, b'\r\n\t ')).hexdigest()
+            else:
                fetched_md5 = hashlib.md5(stripped_text_from_html).hexdigest()

-            # Goal here in the future is to be able to abstract out different content type checks into their own class
-
-            if is_binary:
-                # @todo - use some actual image hash here where possible, audio hash, etc etc
-                m = hashlib.sha256()
-                m.update(fetcher.content)
-                fetched_md5 = m.hexdigest()
-                original_content_before_filters = fetcher.content
-
            # On the first run of a site, watch['previous_md5'] will be an empty string, set it the current one.
            if not len(watch['previous_md5']):
                watch['previous_md5'] = fetched_md5
@@ -198,24 +148,16 @@ class perform_site_check():

            blocked_by_not_found_trigger_text = False

-            # Trigger text can apply to JSON parsed documents too
-            if is_text_or_html or is_json:
-                if len(watch['trigger_text']):
-                    blocked_by_not_found_trigger_text = True
-                    for line in watch['trigger_text']:
-                        # Because JSON wont serialize a re.compile object
-                        if line[0] == '/' and line[-1] == '/':
-                            regex = re.compile(line.strip('/'), re.IGNORECASE)
-                            # Found it? so we don't wait for it anymore
-                            r = re.search(regex, str(stripped_text_from_html))
-                            if r:
-                                blocked_by_not_found_trigger_text = False
-                                break
+            if len(watch['trigger_text']):
+                # Yeah, let's block first until something matches
+                blocked_by_not_found_trigger_text = True
+                # Filter and trigger work the same way, so reuse it
+                result = html_tools.strip_ignore_text(content=str(stripped_text_from_html),
+                                                      wordlist=watch['trigger_text'],
+                                                      mode="line numbers")
+                if result:
+                    blocked_by_not_found_trigger_text = False

-                        elif line.lower() in str(stripped_text_from_html).lower():
-                            # We found it don't wait for it.
-                            blocked_by_not_found_trigger_text = False
-                            break

            if not blocked_by_not_found_trigger_text and watch['previous_md5'] != fetched_md5:
                changed_detected = True
@@ -223,5 +165,11 @@ class perform_site_check():
                update_obj["last_changed"] = timestamp


-        # original_content_before_filters is returned for saving the data to disk
-        return changed_detected, update_obj, original_content_before_filters
+            # Extract title as title
+            if is_html:
+                if self.datastore.data['settings']['application']['extract_title_as_title'] or watch['extract_title_as_title']:
+                    if not watch['title'] or not len(watch['title']):
+                        update_obj['title'] = html_tools.extract_element(find='title', html_content=fetcher.content)
+
+
+        return changed_detected, update_obj, text_content_before_ignored_filter
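
The whitespace-insensitive checksum above relies on `bytes.translate(None, delete)`, which removes the listed bytes before hashing. A minimal sketch, assuming the text has already been encoded to bytes; `content_checksum` is a hypothetical helper name.

```python
import hashlib

def content_checksum(text_bytes, ignore_whitespace=False):
    """MD5 of the content, optionally with CR, LF, tab and space removed first."""
    if ignore_whitespace:
        # With table=None, translate() only deletes the bytes given as the second argument
        text_bytes = text_bytes.translate(None, b'\r\n\t ')
    return hashlib.md5(text_bytes).hexdigest()

a = "Price:  10 EUR\n".encode('utf-8')
b = "Price: 10 EUR\r\n".encode('utf-8')
print(content_checksum(a) == content_checksum(b))
print(content_checksum(a, ignore_whitespace=True) == content_checksum(b, ignore_whitespace=True))
```

With the option enabled, two fetches that differ only in spacing or line endings produce the same digest and so do not count as a change.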
@@ -1,12 +1,30 @@
-from wtforms import Form, SelectField, RadioField, BooleanField, StringField, PasswordField, validators, IntegerField, fields, TextAreaField, \
-    Field
-from wtforms import widgets
-from wtforms.validators import ValidationError
-from wtforms.fields import html5
-from changedetectionio import content_fetcher
 import re

-from changedetectionio.notification import default_notification_format, valid_notification_formats, default_notification_body, default_notification_title
+from wtforms import (
+    BooleanField,
+    Field,
+    Form,
+    IntegerField,
+    PasswordField,
+    RadioField,
+    SelectField,
+    StringField,
+    SubmitField,
+    TextAreaField,
+    fields,
+    validators,
+    widgets,
+)
+from wtforms.fields import html5
+from wtforms.validators import ValidationError
+
+from changedetectionio import content_fetcher
+from changedetectionio.notification import (
+    default_notification_body,
+    default_notification_format,
+    default_notification_title,
+    valid_notification_formats,
+)

 valid_method = {
    'GET',
@@ -44,8 +62,8 @@ class SaltyPasswordField(StringField):
    encrypted_password = ""

    def build_password(self, password):
-        import hashlib
        import base64
+        import hashlib
        import secrets

        # Make a new salt on every new password and store it with the password
@@ -103,9 +121,10 @@ class ValidateContentFetcherIsReady(object):
        self.message = message

    def __call__(self, form, field):
-        from changedetectionio import content_fetcher
        import urllib3.exceptions

+        from changedetectionio import content_fetcher
+
        # Better would be a radiohandler that keeps a reference to each class
        if field.data is not None:
            klass = getattr(content_fetcher, field.data)
@@ -212,52 +231,69 @@ class ValidateListRegex(object):
                except re.error:
                    message = field.gettext('RegEx \'%s\' is not a valid regular expression.')
                    raise ValidationError(message % (line))
-
+              
 class ValidateCSSJSONXPATHInput(object):
    """
    Filter validation
    @todo CSS validator ;)
    """

-    def __init__(self, message=None):
+    def __init__(self, message=None, allow_xpath=True, allow_json=True):
        self.message = message
+        self.allow_xpath = allow_xpath
+        self.allow_json = allow_json

    def __call__(self, form, field):

+        if isinstance(field.data, str):
+            data = [field.data]
+        else:
+            data = field.data
+
+        for line in data:
        # Nothing to see here
-        if not len(field.data.strip()):
-            return
+            if not len(line.strip()):
+                continue

-        # Does it look like XPath?
-        if field.data.strip()[0] == '/':
-            from lxml import html, etree
-            tree = html.fromstring("<html></html>")
+            # Does it look like XPath?
+            if line.strip()[0] == '/':
+                if not self.allow_xpath:
+                    raise ValidationError("XPath not permitted in this field!")
+                from lxml import etree, html
+                tree = html.fromstring("<html></html>")

-            try:
-                tree.xpath(field.data.strip())
-            except etree.XPathEvalError as e:
-                message = field.gettext('\'%s\' is not a valid XPath expression. (%s)')
-                raise ValidationError(message % (field.data, str(e)))
-            except:
-                raise ValidationError("A system-error occurred when validating your XPath expression")
+                try:
+                    tree.xpath(line.strip())
+                except etree.XPathEvalError as e:
+                    message = field.gettext('\'%s\' is not a valid XPath expression. (%s)')
+                    raise ValidationError(message % (line, str(e)))
+                except:
+                    raise ValidationError("A system-error occurred when validating your XPath expression")

-        if 'json:' in field.data:
-            from jsonpath_ng.exceptions import JsonPathParserError, JsonPathLexerError
-            from jsonpath_ng.ext import parse
+            if 'json:' in line:
+                if not self.allow_json:
+                    raise ValidationError("JSONPath not permitted in this field!")

-            input = field.data.replace('json:', '')
+                from jsonpath_ng.exceptions import (
+                    JsonPathLexerError,
+                    JsonPathParserError,
+                )
+                from jsonpath_ng.ext import parse

-            try:
-                parse(input)
-            except (JsonPathParserError, JsonPathLexerError) as e:
-                message = field.gettext('\'%s\' is not a valid JSONPath expression. (%s)')
-                raise ValidationError(message % (input, str(e)))
-            except:
-                raise ValidationError("A system-error occurred when validating your JSONPath expression")
+                input = line.replace('json:', '')

-            # Re #265 - maybe in the future fetch the page and offer a
-            # warning/notice that its possible the rule doesnt yet match anything?
+                try:
+                    parse(input)
+                except (JsonPathParserError, JsonPathLexerError) as e:
+                    message = field.gettext('\'%s\' is not a valid JSONPath expression. (%s)')
+                    raise ValidationError(message % (input, str(e)))
+                except:
+                    raise ValidationError("A system-error occurred when validating your JSONPath expression")

+                # Re #265 - maybe in the future fetch the page and offer a
+                # warning/notice that it's possible the rule doesn't yet match anything?
+
+            
 class quickWatchForm(Form):
    # https://wtforms.readthedocs.io/en/2.3.x/fields/#module-wtforms.fields.html5
    # `require_tld` = False is needed even for the test harness "http://localhost:5005.." to run
@@ -282,14 +318,19 @@ class watchForm(commonSettingsForm):
    minutes_between_check = html5.IntegerField('Maximum time in minutes until recheck',
                                               [validators.Optional(), validators.NumberRange(min=1)])
    css_filter = StringField('CSS/JSON/XPATH Filter', [ValidateCSSJSONXPATHInput()])
+    subtractive_selectors = StringListField('Remove elements', [ValidateCSSJSONXPATHInput(allow_xpath=False, allow_json=False)])
    title = StringField('Title')

    ignore_text = StringListField('Ignore Text', [ValidateListRegex()])
    headers = StringDictKeyValue('Request Headers')
    body = TextAreaField('Request Body', [validators.Optional()])
    method = SelectField('Request Method', choices=valid_method, default=default_method)
+    ignore_status_codes = BooleanField('Ignore Status Codes (process non-2xx status codes as normal)', default=False)
    trigger_text = StringListField('Trigger/wait for text', [validators.Optional(), ValidateListRegex()])

+    save_button = SubmitField('Save', render_kw={"class": "pure-button pure-button-primary"})
+    save_and_preview_button = SubmitField('Save & Preview', render_kw={"class": "pure-button pure-button-primary"})
+
    def validate(self, **kwargs):
        if not super().validate():
            return False
@@ -310,5 +351,8 @@ class globalSettingsForm(commonSettingsForm):
                                               [validators.NumberRange(min=1)])
    extract_title_as_title = BooleanField('Extract <title> from document and use as watch title')
    base_url = StringField('Base URL', validators=[validators.Optional()])
+    global_subtractive_selectors = StringListField('Remove elements', [ValidateCSSJSONXPATHInput(allow_xpath=False, allow_json=False)])
    global_ignore_text = StringListField('Ignore Text', [ValidateListRegex()])
-    ignore_whitespace = BooleanField('Ignore whitespace')
+    ignore_whitespace = BooleanField('Ignore whitespace')
+    save_button = SubmitField('Save', render_kw={"class": "pure-button pure-button-primary"})
+    removepassword_button = SubmitField('Remove password', render_kw={"class": "pure-button pure-button-primary"})
@@ -1,4 +1,7 @@
 import json
+import re
+from typing import List
+
 from bs4 import BeautifulSoup
 from jsonpath_ng.ext import parse

@@ -16,16 +19,27 @@ def css_filter(css_filter, html_content):

    return html_block + "\n"

+def subtractive_css_selector(css_selector, html_content):
+    soup = BeautifulSoup(html_content, "html.parser")
+    for item in soup.select(css_selector):
+        item.decompose()
+    return str(soup)
+
+    
+def element_removal(selectors: List[str], html_content):
+    """Joins individual filters into one css filter."""
+    selector = ",".join(selectors)
+    return subtractive_css_selector(selector, html_content)
+    

 # Return str Utf-8 of matched rules
 def xpath_filter(xpath_filter, html_content):
-    from lxml import html
-    from lxml import etree
+    from lxml import etree, html

    tree = html.fromstring(html_content)
    html_block = ""

-    for item in tree.xpath(xpath_filter.strip()):
+    for item in tree.xpath(xpath_filter.strip(), namespaces={'re':'http://exslt.org/regular-expressions'}):
        html_block+= etree.tostring(item, pretty_print=True).decode('utf-8')+"<br/>"

    return html_block
@@ -64,7 +78,8 @@ def _parse_json(json_data, jsonpath_filter):
        # Re 265 - Just return an empty string when filter not found
        return ''

-    stripped_text_from_html = json.dumps(s, indent=4)
+    # Ticket #462 - allow the original encoding through (usually UTF-8 or similar)
+    stripped_text_from_html = json.dumps(s, indent=4, ensure_ascii=False)

    return stripped_text_from_html

@@ -105,3 +120,50 @@ def extract_json_as_string(content, jsonpath_filter):
        return ''

    return stripped_text_from_html
+
+# Mode     - "content" return the content without the matches (default)
+#          - "line numbers" return a list of line numbers that match (int list)
+#
+# wordlist - list of regexes (str) or words (str)
+def strip_ignore_text(content, wordlist, mode="content"):
+    ignore = []
+    ignore_regex = []
+
+    # @todo check this runs case insensitive
+    for k in wordlist:
+
+        # Is it a regex?
+        if k[0] == '/':
+            ignore_regex.append(k.strip(" /"))
+        else:
+            ignore.append(k)
+
+    i = 0
+    output = []
+    ignored_line_numbers = []
+    for line in content.splitlines():
+        i += 1
+        # Always ignore blank lines in this mode. (when this function gets called)
+        if len(line.strip()):
+            regex_matches = False
+
+            # if any of these match, skip
+            for regex in ignore_regex:
+                try:
+                    if re.search(regex, line, re.IGNORECASE):
+                        regex_matches = True
+                except Exception as e:
+                    continue
+
+            if not regex_matches and not any(skip_text.lower() in line.lower() for skip_text in ignore):
+                output.append(line.encode('utf8'))
+            else:
+                ignored_line_numbers.append(i)
+
+
+
+    # Used for finding out what to highlight
+    if mode == "line numbers":
+        return ignored_line_numbers
+
+    return "\n".encode('utf8').join(output)
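Review note: the matching rules in `strip_ignore_text` above can be exercised in isolation. The sketch below is not the project's code; `split_rules` and `ignored_line_numbers` are hypothetical names, and skipping empty rules is an added assumption (the hunk above indexes `k[0]` directly, which would raise on an empty entry). It mirrors the "line numbers" mode only: `/regex/` entries are matched case-insensitively with `re.search`, plain entries are case-insensitive substring matches, and blank lines are never reported.

```python
import re


def split_rules(wordlist):
    """Separate '/regex/' entries from plain-text entries."""
    plain, patterns = [], []
    for rule in wordlist:
        # Assumption for this sketch: silently skip empty rules
        if not rule.strip():
            continue
        if rule.startswith('/'):
            # Same trimming as the hunk above: drop spaces and slashes at both ends
            patterns.append(rule.strip(' /'))
        else:
            plain.append(rule)
    return plain, patterns


def ignored_line_numbers(content, wordlist):
    """Return the 1-based numbers of non-blank lines that match any rule."""
    plain, patterns = split_rules(wordlist)
    hits = []
    for number, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are dropped, not reported as ignored

        matched_regex = False
        for pattern in patterns:
            try:
                if re.search(pattern, line, re.IGNORECASE):
                    matched_regex = True
                    break
            except re.error:
                continue  # an invalid pattern is skipped rather than fatal

        matched_text = any(text.lower() in line.lower() for text in plain)
        if matched_regex or matched_text:
            hits.append(number)
    return hits
```

For example, with the rules `["/updated \d{4}/", "Footer text"]`, a line reading `Updated 2022-03-30` matches the regex and a line reading `footer TEXT` matches the plain entry, regardless of case.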
@@ -1,41 +0,0 @@
-# import the necessary packages
-from skimage.metrics import structural_similarity as compare_ssim
-import argparse
-import imutils
-import cv2
-
-# From https://www.pyimagesearch.com/2017/06/19/image-difference-with-opencv-and-python/
-def render_diff(fpath_imageA, fpath_imageB):
-
-	imageA = cv2.imread(fpath_imageA)
-	imageB = cv2.imread(fpath_imageB)
-
-	# convert the images to grayscale
-	grayA = cv2.cvtColor(imageA, cv2.COLOR_BGR2GRAY)
-	grayB = cv2.cvtColor(imageB, cv2.COLOR_BGR2GRAY)
-
-	# compute the Structural Similarity Index (SSIM) between the two
-	# images, ensuring that the difference image is returned
-	(score, diff) = compare_ssim(grayA, grayB, full=True)
-	diff = (diff * 255).astype("uint8")
-	print("SSIM: {}".format(score))
-
-	# threshold the difference image, followed by finding contours to
-	# obtain the regions of the two input images that differ
-	thresh = cv2.threshold(diff, 0, 255,
-		cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
-	cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
-		cv2.CHAIN_APPROX_SIMPLE)
-	cnts = imutils.grab_contours(cnts)
-
-	# loop over the contours
-	for c in cnts:
-		# compute the bounding box of the contour and then draw the
-		# bounding box on both input images to represent where the two
-		# images differ
-		(x, y, w, h) = cv2.boundingRect(c)
-		cv2.rectangle(imageA, (x, y), (x + w, y + h), (0, 0, 255), 2)
-		cv2.rectangle(imageB, (x, y), (x + w, y + h), (0, 0, 255), 2)
-
-	#return cv2.imencode('.jpg', imageB)[1].tobytes()
-	return cv2.imencode('.jpg', imageA)[1].tobytes()
@@ -54,3 +54,19 @@ ins {
  body {
    height: 99%;
    /* Hide scroll bar in Firefox */ } }
+
+td#diff-col div {
+  text-align: justify;
+  white-space: pre-wrap; }
+
+.ignored {
+  background-color: #ccc;
+  /*  border: #0d91fa 1px solid; */
+  opacity: 0.7; }
+
+.triggered {
+  background-color: #1b98f8; }
+
+/* ignored and triggered? make it obvious error */
+.ignored.triggered {
+  background-color: #ff0000; }
@@ -66,3 +66,23 @@ ins {
 		height: 99%; /* Hide scroll bar in Firefox */
 	}
 }
+
+td#diff-col div {
+    text-align: justify;
+    white-space: pre-wrap;
+}
+
+.ignored {
+    background-color: #ccc;
+   /*  border: #0d91fa 1px solid; */
+    opacity: 0.7;
+}
+
+.triggered {
+    background-color: #1b98f8;
+}
+
+/* ignored and triggered? make it obvious error */
+.ignored.triggered {
+  background-color: #ff0000;
+}
@@ -4,8 +4,7 @@
  "description": "",
  "main": "index.js",
  "scripts": {
-    "build": "node-sass styles.scss diff.scss -o .",
-    "watch": "node-sass --watch styles.scss diff.scss -o ."
+    "build": "node-sass styles.scss -o .;node-sass diff.scss -o ."
  },
  "author": "",
  "license": "ISC",
@@ -42,9 +42,14 @@ section.content {
  justify-content: center;
 }

+code {
+  background: #eee;
+}
+
 /* table related */
 .watch-table {
  width: 100%;
+  font-size: 80%;

  tr.unviewed {
    font-weight: bold;
@@ -55,7 +60,6 @@ section.content {
  }

  td {
-    font-size: 80%;
    white-space: nowrap;
  }

@@ -107,12 +111,12 @@ section.content {

 body:after {
  content: "";
-  background: linear-gradient(130deg, #ff7a18, #af002d 41.07%, #319197 76.05%)
+  background: linear-gradient(130deg, #5ad8f7, #2f50af 41.07%, #9150bf 84.05%);
 }

 body:after, body:before {
  display: block;
-  height: 600px;
+  height: 650px;
  position: absolute;
  top: 0;
  left: 0;
@@ -125,11 +129,8 @@ body::after {
 }

 body::before {
+  // background-image set in base.html so it works with reverse proxies etc
  content: "";
-  background-image: url(/static/images/gradient-border.png);
-}
-
-body:before {
  background-size: cover
 }

@@ -265,6 +266,7 @@ body:after, body:before {
  }
  legend {
    color: #fff;
+    font-weight: bold;
  }
 }

@@ -317,12 +319,10 @@ footer {
    */
 }

-
-
 .sticky-tab {
  position: absolute;
-  top: 80px;
-  font-size: 8px;
+  top: 60px;
+  font-size: 65%;
  background: #fff;
  padding: 10px;
  &#left-sticky {
@@ -331,6 +331,11 @@ footer {
  &#right-sticky {
    right: 0px;
  }
+  &#hosted-sticky {
+    right: 0px;
+    top: 100px;
+    font-weight: bold;
+  }
 }

 #new-version-text a {
@@ -417,18 +422,35 @@ footer {
  }
 }

+
+
+@media only screen and (max-width: 760px), (min-device-width: 768px) and (max-device-width: 800px) {
+
+  div.sticky-tab#hosted-sticky {
+    top: 60px;
+    left: 0px;
+    right: auto;
+  }
+
+  section.content {
+    padding-top: 110px;
+  }
+
+  // Make the tabs easier to hit, they will be all nice and horizontal
+  div.tabs ul li {
+    display: block;
+    border-radius: 0px;
+  }
+
+  input[type='text'] {
+    width: 100%;
+  }
+
 /*
 Max width before this PARTICULAR table gets nasty
 This query will take effect for any screen smaller than 760px
 and also iPads specifically.
 */
-
-@media only screen and (max-width: 760px), (min-device-width: 768px) and (max-device-width: 1024px) {
-
-  input[type='text'] {
-    width: 100%;
-  }
-    
  .watch-table {
    /* Force table to not be like tables anymore */
    thead, tbody, th, td, tr {
@@ -542,6 +564,16 @@ $form-edge-padding: 20px;
    display: block;
  }
 }
+
+.login-form {
+  .inner {
+    background: #fff;
+    padding: $form-edge-padding;
+    border-radius: 5px;
+  }
+}
+
+
 .edit-form {
  min-width: 70%;
  .tab-pane-inner {
@@ -565,5 +597,14 @@ $form-edge-padding: 20px;
    display: block;
    background: #fff;
  }
+
+  .pure-form-message-inline {
+    padding-left: 0;
+  }
 }

+ul {
+    padding-left: 1em;
+    padding-top: 0px;
+    margin-top: 4px;
+}
@@ -1,15 +1,19 @@
-from os import unlink, path, mkdir
 import json
-import uuid as uuid_builder
-from threading import Lock
-from copy import deepcopy
-
 import logging
-import time
-import threading
 import os
+import threading
+import time
+import uuid as uuid_builder
+from copy import deepcopy
+from os import mkdir, path, unlink
+from threading import Lock
+
+from changedetectionio.notification import (
+    default_notification_body,
+    default_notification_format,
+    default_notification_title,
+)

-from changedetectionio.notification import default_notification_format, default_notification_body, default_notification_title

 # Is there an existing library to ensure some data store (JSON etc) is in sync with CRUD methods?
 # Open a github issue if you know something :)
@@ -46,6 +50,7 @@ class ChangeDetectionStore:
                    'extract_title_as_title': False,
                    'fetch_backend': 'html_requests',
                    'global_ignore_text': [], # List of text to ignore when calculating the comparison checksum
+                    'global_subtractive_selectors': [],
                    'ignore_whitespace': False,
                    'notification_urls': [], # Apprise URL list
                    # Custom notification content
@@ -82,6 +87,7 @@ class ChangeDetectionStore:
            'notification_body': default_notification_body,
            'notification_format': default_notification_format,
            'css_filter': "",
+            'subtractive_selectors': [],
            'trigger_text': [],  # List of text or regex to wait for until a change is detected
            'fetch_backend': None,
            'extract_title_as_title': False
@@ -144,8 +150,8 @@ class ChangeDetectionStore:
            unlink(password_reset_lockfile)

        if not 'app_guid' in self.__data:
-            import sys
            import os
+            import sys
            if "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ:
                self.__data['app_guid'] = "test-" + str(uuid_builder.uuid4())
            else:
@@ -184,10 +190,6 @@ class ChangeDetectionStore:

    def update_watch(self, uuid, update_obj):

-        # Skip if 'paused' state
-        if self.__data['watching'][uuid]['paused']:
-            return
-
        with self.lock:

            # In python 3.9 we have the |= dict operator, but that still will lose data on nested structures...
@@ -372,9 +374,7 @@ class ChangeDetectionStore:
        if not os.path.isdir(output_path):
            mkdir(output_path)

-        suffix = "stripped.txt"
-
-        fname = "{}/{}.{}".format(output_path, uuid.uuid4(), suffix)
+        fname = "{}/{}.stripped.txt".format(output_path, uuid.uuid4())
        with open(fname, 'wb') as f:
            f.write(contents)
            f.close()
@@ -400,13 +400,10 @@ class ChangeDetectionStore:
                # system was out of memory, out of RAM etc
                with open(self.json_store_path+".tmp", 'w') as json_file:
                    json.dump(data, json_file, indent=4)
-
+                os.replace(self.json_store_path+".tmp", self.json_store_path)
            except Exception as e:
                logging.error("Error writing JSON!! (Main JSON file save was skipped) : %s", str(e))

-            else:
-                os.rename(self.json_store_path+".tmp", self.json_store_path)
-
            self.needs_write = False

    # Thread runner, this helps with thread/write issues when there are many operations that want to update the JSON
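Review note: the hunk above moves the rename inside the `try` block and switches from `os.rename` to `os.replace`. A minimal sketch of that write-then-replace pattern follows, assuming a hypothetical helper name `save_json_atomically`; it is not the store's actual method. `os.replace` overwrites an existing destination on every platform, whereas `os.rename` raises on Windows when the destination already exists.

```python
import json
import os


def save_json_atomically(data, target_path):
    """Write JSON to a sibling .tmp file, then swap it into place."""
    tmp_path = target_path + ".tmp"

    # If dumping fails part-way, only the .tmp file is damaged;
    # the previous target file is left untouched.
    with open(tmp_path, "w") as json_file:
        json.dump(data, json_file, indent=4)

    # Replace the target in one step, overwriting any existing file.
    os.replace(tmp_path, target_path)
```

Because the replace only runs after a successful dump, an exception raised while serialising never reaches the existing file.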
@@ -439,6 +436,7 @@ class ChangeDetectionStore:
                index.append(self.data['watching'][uuid]['history'][str(id)])

        import pathlib
+
        # Only in the sub-directories
        for item in pathlib.Path(self.datastore_path).rglob("*/*txt"):
            if not str(item) in index:
@@ -34,9 +34,8 @@
                            </div>
                            <div class="pure-controls">
                            <span class="pure-form-message-inline">
-                                These tokens can be used in the notification body and title to
-                                customise the notification text.
-                            </span>
+                                These tokens can be used in the notification body and title to customise the notification text.
+
                                <table class="pure-table" id="token-table">
                                    <thead>
                                    <tr>
@@ -88,7 +87,7 @@
                                    </tr>
                                    </tbody>
                                </table>
-                                <span class="pure-form-message-inline">
+                                <br/>
                                URLs generated by changedetection.io (such as <code>{diff_url}</code>) require the <code>BASE_URL</code> environment variable set.<br/>
                                Your <code>BASE_URL</code> var is currently "{{current_base_url}}"
                            </span>
@@ -25,3 +25,6 @@
 {% endmacro %}


+{% macro render_button(field) %}
+  {{ field(**kwargs)|safe }}
+{% endmacro %}
@@ -12,7 +12,13 @@
        <link rel="stylesheet" href="{{ m }}?ver=1000">
        {% endfor %}
    {% endif %}
+    <style>
+    body::before {
+        background-image: url({{url_for('static_content', group='images', filename='gradient-border.png')}});
+    }
+    </style>
 </head>
+
 <body>

 <div class="header">
@@ -35,13 +41,13 @@
        {% if current_user.is_authenticated or not has_password %}
            {% if not current_diff_url %}
            <li class="pure-menu-item">
-                <a href="{{ url_for('get_backup')}}" class="pure-menu-link">BACKUP</a>
+                <a href="{{ url_for('settings_page')}}" class="pure-menu-link">SETTINGS</a>
            </li>
            <li class="pure-menu-item">
                <a href="{{ url_for('import_page')}}" class="pure-menu-link">IMPORT</a>
            </li>
            <li class="pure-menu-item">
-                <a href="{{ url_for('settings_page')}}" class="pure-menu-link">SETTINGS</a>
+                <a href="{{ url_for('get_backup')}}" class="pure-menu-link">BACKUP</a>
            </li>
            {% else %}
            <li class="pure-menu-item">
@@ -68,7 +74,7 @@
        </ul>
    </div>
 </div>
-
+{% if hosted_sticky %}<div class="sticky-tab" id="hosted-sticky"><a href="https://lemonade.changedetection.io/start?ref={{guid}}">Let us host your instance!</a></div>{% endif %}
 {% if left_sticky %}<div class="sticky-tab" id="left-sticky"><a href="{{url_for('preview_page', uuid=uuid)}}">Show current snapshot</a></div> {% endif %}
 {% if right_sticky %}<div class="sticky-tab" id="right-sticky">{{ right_sticky }}</div> {% endif %}
 <section class="content">
@@ -1,59 +0,0 @@
-{% extends 'base.html' %}
-
-{% block content %}
-
-<div id="settings">
-    <h1>Differences</h1>
-    <form class="pure-form " action="" method="GET">
-        <fieldset>
-            {% if versions|length >= 1 %}
-            <label for="diff-version">Compare newest (<span id="current-v-date"></span>) with</label>
-            <select id="diff-version" name="previous_version">
-                {% for version in versions %}
-                <option value="{{version}}" {% if version== current_previous_version %} selected="" {% endif %}>
-                    {{version}}
-                </option>
-                {% endfor %}
-            </select>
-            <button type="submit" class="pure-button pure-button-primary">Go</button>
-            {% endif %}
-        </fieldset>
-    </form>
-
-</div>
-
-<div id="diff-ui">
-  <img style="max-width: 100%" src="{{ url_for('render_diff_image', uuid=uuid, compare_date=current_previous_version) }}" />
-
-    <div>
-        <span style="width: 50%">
-            <img style="max-width: 100%" src="{{ url_for('show_single_image', uuid=uuid, datestr=newest_version_timestamp) }}" />
-        </span>
-        <span style="width: 50%">
-            <img style="max-width: 100%" src="{{ url_for('show_single_image', uuid=uuid, datestr=current_previous_version) }}" />
-        </span>
-    </div>
-</div>
-
-
-<script type="text/javascript" src="{{url_for('static_content', group='js', filename='diff.js')}}"></script>
-
-<script defer="">
-window.onload = function() {
-    /* Set current version date as local time in the browser also */
-    var current_v = document.getElementById("current-v-date");
-    var dateObject = new Date({{ newest_version_timestamp }}*1000);
-    current_v.innerHTML=dateObject.toLocaleString();
-
-    /* Convert what is options from UTC time.time() to local browser time */
-    var diffList=document.getElementById("diff-version");
-    if (typeof(diffList) != 'undefined' && diffList != null) {
-        for (var option of diffList.options) {
-          var dateObject = new Date(option.value*1000);
-          option.label=dateObject.toLocaleString();
-        }
-    }
-}
-</script>
-
-{% endblock %}
@@ -36,6 +36,7 @@
    <a onclick="next_diff();">Jump</a>
 </div>
 <div id="diff-ui">
+    <div class="tip">Pro-tip: Use <strong>show current snapshot</strong> tab to visualise what will be ignored.</div>
    <table>
        <tbody>
        <tr>
@@ -1,6 +1,7 @@
 {% extends 'base.html' %}
 {% block content %}
 {% from '_helpers.jinja' import render_field %}
+{% from '_helpers.jinja' import render_button %}
 {% from '_common_fields.jinja' import render_common_settings_form %}
 <script type="text/javascript" src="{{url_for('static_content', group='js', filename='tabs.js')}}" defer></script>

@@ -18,6 +19,7 @@
    <div class="box-wrap inner">
        <form class="pure-form pure-form-stacked"
              action="{{ url_for('edit_page', uuid=uuid, next = request.args.get('next') ) }}" method="POST">
+             <input type="hidden" name="csrf_token" value="{{ csrf_token() }}"/>

            <div class="tab-pane-inner" id="general">
                <fieldset>
@@ -57,24 +59,33 @@
                        </span>
                    </div>

+                <hr/>
                <fieldset class="pure-group">
-                                    <div class="pure-control-group">
-                    {{ render_field(form.method) }}
-                </div>
-                    <strong>Note: <i>Request Headers and Body settings are ONLY used by Basic fast Plaintext/HTTP Client fetch method.</i></strong>
-                    {{ render_field(form.headers, rows=5, placeholder="Example
+
+                    <span class="pure-form-message-inline">
+                        <strong>Request override is currently only used by the <i>Basic fast Plaintext/HTTP Client</i> method.</strong>
+                    </span>
+                    <div class="pure-control-group">
+                        {{ render_field(form.method) }}
+                    </div>
+                    <div class="pure-control-group">
+{{ render_field(form.headers, rows=5, placeholder="Example
 Cookie: foobar
 User-Agent: wonderbra 1.0") }}
-                </fieldset>
-                <div class="pure-control-group">
-                    {{ render_field(form.body, rows=5, placeholder="Example
+                    </div>
+                    <div class="pure-control-group">
+                                        {{ render_field(form.body, rows=5, placeholder="Example
 {
   \"name\":\"John\",
   \"age\":30,
   \"car\":null
 }") }}
-                </div>
-
+                    </div>
+                    <div>
+                        {{ render_field(form.ignore_status_codes) }}
+                    </div>
+                </fieldset>
+                <br/>
            </div>

            <div class="tab-pane-inner" id="notifications">
@@ -88,22 +99,45 @@ User-Agent: wonderbra 1.0") }}

            <div class="tab-pane-inner" id="filters-and-triggers">
                <fieldset>
+                        <div class="pure-control-group">
+                            <strong>Pro-tips:</strong><br/>
+                            <ul>
+                                <li>
+                                    Use the preview page to see your filters and triggers highlighted.
+                                </li>
+                                <li>
+                                    Some sites use JavaScript to create the content; for these you should <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">use the Chrome/WebDriver Fetcher</a>.
+                                </li>
+                            </ul>
+                    </div>
+
                    <div class="pure-control-group">
                        {{ render_field(form.css_filter, placeholder=".class-name or #some-id, or other CSS selector rule.",
                        class="m-d") }}
                        <span class="pure-form-message-inline">
                    <ul>
                        <li>CSS - Limit text to this CSS rule, only text matching this CSS rule is included.</li>
-                        <li>JSON - Limit text to this JSON rule, using <a href="https://pypi.org/project/jsonpath-ng/">JSONPath</a>, prefix with <b>"json:"</b>, <a
+                        <li>JSON - Limit text to this JSON rule, using <a href="https://pypi.org/project/jsonpath-ng/">JSONPath</a>, prefix with <code>"json:"</code>, use <code>json:$</code> to force re-formatting if required,  <a
                                href="https://jsonpath.com/" target="new">test your JSONPath here</a></li>
-                        <li>XPATH - Limit text to this XPath rule, simply start with a forward-slash, example  <b>//*[contains(@class, 'sametext')]</b>, <a
+                        <li>XPath - Limit text to this XPath rule, simply start with a forward-slash, example  <code>//*[contains(@class, 'sametext')]</code>, <a
                                href="http://xpather.com/" target="new">test your XPath here</a></li>
                    </ul>
                    Please be sure that you thoroughly understand how to write CSS or JSONPath, XPath selector rules before filing an issue on GitHub! <a
                                href="https://github.com/dgtlmoon/changedetection.io/wiki/CSS-Selector-help">here for more CSS selector help</a>.<br/>
                </span>
                    </div>
-
+                    <fieldset class="pure-group">
+                      {{ render_field(form.subtractive_selectors, rows=5, placeholder="header
+footer
+nav
+.stockticker") }}
+                      <span class="pure-form-message-inline">
+                        <ul>
+                          <li> Remove HTML element(s) by CSS selector before text conversion. </li>
+                          <li> Add one element or CSS selector per line to ignore multiple parts of the HTML. </li>
+                        </ul>
+                      </span>
+                    </fieldset>
                </fieldset>
                <fieldset class="pure-group">
                    {{ render_field(form.ignore_text, rows=5, placeholder="Some text to ignore in a line
@@ -112,8 +146,9 @@ User-Agent: wonderbra 1.0") }}
                    <span class="pure-form-message-inline">
                        <ul>
                            <li>Each line processed separately, any line matching will be ignored (removed before creating the checksum)</li>
-                            <li>Regular Expression support, wrap the line in forward slash <b>/regex/</b></li>
+                            <li>Regular Expression support, wrap the line in forward slash <code>/regex/</code></li>
                            <li>Changing this will affect the comparison checksum which may trigger an alert</li>
+                            <li>Use the preview/show current snapshot tab to see which lines are ignored</li>
                        </ul>
                </span>

@@ -127,8 +162,8 @@ User-Agent: wonderbra 1.0") }}
                    <ul>
                        <li>Text to wait for before triggering a change/notification, all text and regex are tested <i>case-insensitive</i>.</li>
                        <li>Trigger text is processed from the result-text that comes out of any CSS/JSON Filters for this watch</li>
-                        <li>Each line is process separately (think of each line as "OR")</li>
-                        <li>Note: Wrap in forward slash / to use regex  example: <span style="font-family: monospace; background: #eee">/foo\d/</span></li>
+                        <li>Each line is processed separately (think of each line as "OR")</li>
+                        <li>Note: Wrap in forward slash / to use regex  example: <code>/foo\d/</code></li>
                    </ul>
                        </span>
                    </div>
@@ -138,7 +173,8 @@ User-Agent: wonderbra 1.0") }}
            <div id="actions">
                <div class="pure-control-group">

-                    <button type="submit" class="pure-button pure-button-primary">Save</button>
+                      {{ render_button(form.save_button) }} {{ render_button(form.save_and_preview_button) }}
+
                    <a href="{{url_for('api_delete', uuid=uuid)}}"
                       class="pure-button button-small button-error ">Delete</a>
                    <a href="{{url_for('api_clone', uuid=uuid)}}"
@@ -4,6 +4,7 @@
 <div class="edit-form">
     <div class="inner">
        <form class="pure-form pure-form-aligned" action="{{url_for('import_page')}}" method="POST">
+            <input type="hidden" name="csrf_token" value="{{ csrf_token() }}"/>
            <fieldset class="pure-group">
              <legend>
                Enter one URL per line, and optionally add tags for each URL after a space, delineated by comma (,):
@@ -1,15 +1,15 @@
 {% extends 'base.html' %}

 {% block content %}
-<div class="edit-form">
-
+<div class="login-form">
 <div class="inner">
    <form class="pure-form pure-form-stacked" action="{{url_for('login')}}" method="POST">
+        <input type="hidden" name="csrf_token" value="{{ csrf_token() }}"/>
        <fieldset>
            <div class="pure-control-group">
                <label for="password">Password</label>
                <input type="password" id="password" required="" name="password" value=""
-                       size="15"/>
+                       size="15" autofocus />
                <input type="hidden" id="email" name="email" value="defaultuser@changedetection.io" />
            </div>
            <div class="pure-control-group">
@@ -1,13 +0,0 @@
-{% extends 'base.html' %}
-
-{% block content %}
-
-<div id="settings">
-    <h1>Current</h1>
-</div>
-
-<div id="diff-ui">
-    image goes here
-</div>
-
-{% endblock %}
@@ -3,19 +3,21 @@
 {% block content %}

 <div id="settings">
-    <h1>Current</h1>
+    <h1>Current - {{watch.last_checked|format_timestamp_timeago}}</h1>
 </div>

 <div id="diff-ui">
+    <span class="ignored">Grey lines are ignored</span> <span class="triggered">Blue lines are triggers</span>
    <table>
        <tbody>
        <tr>
            <td id="diff-col">
-                <span id="result">{{content}}</span>
+                    {% for row in content %}
+                    <div class="{{row.classes}}">{{row.line}}</div>
+                    {% endfor %}
            </td>
        </tr>
        </tbody>
    </table>
 </div>
-
 {% endblock %}
@@ -4,6 +4,7 @@
 <div class="edit-form">
    <div class="box-wrap inner">
    <form class="pure-form pure-form-stacked" action="{{url_for('scrub_page')}}" method="POST">
+        <input type="hidden" name="csrf_token" value="{{ csrf_token() }}"/>
        <fieldset>
            <div class="pure-control-group">
                This will remove all version snapshots/data, but keep your list of URLs. <br/>
@@ -1,7 +1,7 @@
 {% extends 'base.html' %}

 {% block content %}
-{% from '_helpers.jinja' import render_field %}
+{% from '_helpers.jinja' import render_field, render_button %}
 {% from '_common_fields.jinja' import render_common_settings_form %}

 <script type="text/javascript" src="{{url_for('static_content', group='js', filename='settings.js')}}" defer></script>
@@ -18,6 +18,7 @@
    </div>
    <div class="box-wrap inner">
        <form class="pure-form pure-form-stacked settings" action="{{url_for('settings_page')}}" method="POST">
+            <input type="hidden" name="csrf_token" value="{{ csrf_token() }}"/>
            <div class="tab-pane-inner" id="general">
                <fieldset>
                    <div class="pure-control-group">
@@ -27,8 +28,7 @@
                    <div class="pure-control-group">
                        {% if not hide_remove_pass %}
                            {% if current_user.is_authenticated %}
-                            <a href="{{url_for('settings_page', removepassword='yes')}}"
-                               class="pure-button pure-button-primary">Remove password</a>
+                                {{ render_button(form.removepassword_button) }}
                            {% else %}
                            {{ render_field(form.password) }}
                            <span class="pure-form-message-inline">Password protection for your changedetection.io application.</span>
@@ -83,7 +83,18 @@
                    </span>
                    </fieldset>

-
+                    <fieldset class="pure-group">
+                      {{ render_field(form.global_subtractive_selectors, rows=5, placeholder="header
+footer
+nav
+.stockticker") }}
+                      <span class="pure-form-message-inline">
+                        <ul>
+                          <li> Remove HTML element(s) by CSS selector before text conversion. </li>
+                          <li> Add one element or CSS selector per line to ignore multiple parts of the HTML. </li>
+                        </ul>
+                      </span>
+                    </fieldset>
                    <fieldset class="pure-group">
                    {{ render_field(form.global_ignore_text, rows=5, placeholder="Some text to ignore in a line
 /some.regex\d{2}/ for case-INsensitive regex
@@ -93,8 +104,9 @@
                        <ul>
                            <li>Note: This is applied globally in addition to the per-watch rules.</li>
                            <li>Each line processed separately, any line matching will be ignored (removed before creating the checksum)</li>
-                            <li>Regular Expression support, wrap the line in forward slash <b>/regex/</b></li>
+                            <li>Regular Expression support, wrap the line in forward slash <code>/regex/</code></li>
                            <li>Changing this will affect the comparison checksum which may trigger an alert</li>
+                            <li>Use the preview/show current snapshot tab to see which lines are ignored</li>
                        </ul>
                     </span>
                    </fieldset>
@@ -102,11 +114,9 @@

            <div id="actions">
                <div class="pure-control-group">
-                    <button type="submit" class="pure-button pure-button-primary">Save</button>
-                                           <a href="{{url_for('index')}}" class="pure-button button-small button-cancel">Back</a>
-                        <a href="{{url_for('scrub_page')}}" class="pure-button button-small button-cancel">Delete
-                            History
-                            Snapshot Data</a>
+                    {{ render_button(form.save_button) }}
+                    <a href="{{url_for('index')}}" class="pure-button button-small button-cancel">Back</a>
+                    <a href="{{url_for('scrub_page')}}" class="pure-button button-small button-cancel">Delete History Snapshot Data</a>
                </div>
            </div>
        </form>
@@ -5,6 +5,7 @@
 <div class="box">

    <form class="pure-form" action="{{ url_for('api_watch_add') }}" method="POST" id="new-watch-form">
+        <input type="hidden" name="csrf_token" value="{{ csrf_token() }}"/>
        <fieldset>
            <legend>Add a new change detection watch</legend>
                {{ render_simple_field(form.url, placeholder="https://...", required=true) }}
@@ -42,6 +42,9 @@ def app(request):
    cleanup(app_config['datastore_path'])
    datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'], include_default_watches=False)
    app = changedetection_app(app_config, datastore)
+
+    # Disable CSRF while running tests
+    app.config['WTF_CSRF_ENABLED'] = False
    app.config['STOP_THREADS'] = True

    def teardown():
@@ -4,8 +4,8 @@ from flask import url_for
 def test_check_access_control(app, client):
    # Still doesnt work, but this is closer.

-    with app.test_client() as c:
-        # Check we dont have any password protection enabled yet.
+    with app.test_client(use_cookies=True) as c:
+        # Check we don't have any password protection enabled yet.
        res = c.get(url_for("settings_page"))
        assert b"Remove password" not in res.data

@@ -46,15 +46,20 @@ def test_check_access_control(app, client):
        assert b"BACKUP" in res.data
        assert b"IMPORT" in res.data
        assert b"LOG OUT" in res.data
+        assert b"minutes_between_check" in res.data
+        assert b"fetch_backend" in res.data

-        # Now remove the password so other tests function, @todo this should happen before each test automatically
-        res = c.get(url_for("settings_page", removepassword="yes"),
-              follow_redirects=True)
-        assert b"Password protection removed." in res.data
-
-        res = c.get(url_for("index"))
-        assert b"LOG OUT" not in res.data
-
+        res = c.post(
+            url_for("settings_page"),
+            data={
+                "minutes_between_check": 180,
+                "tag": "",
+                "headers": "",
+                "fetch_backend": "html_webdriver",
+                "removepassword_button": "Remove password"
+            },
+            follow_redirects=True,
+        )

 # There was a bug where saving the settings form would submit a blank password
 def test_check_access_control_no_blank_password(app, client):
@@ -71,8 +76,7 @@ def test_check_access_control_no_blank_password(app, client):
            data={"password": "",
                  "minutes_between_check": 180,
                  'fetch_backend': "html_requests"},
-
-        follow_redirects=True
+            follow_redirects=True
        )

        assert b"Password protection enabled." not in res.data
@@ -91,7 +95,8 @@ def test_check_access_no_remote_access_to_remove_password(app, client):
        # Enable password check.
        res = c.post(
            url_for("settings_page"),
-            data={"password": "password", "minutes_between_check": 180,
+            data={"password": "password",
+                  "minutes_between_check": 180,
                  'fetch_backend': "html_requests"},
            follow_redirects=True
        )
@@ -99,8 +104,17 @@ def test_check_access_no_remote_access_to_remove_password(app, client):
        assert b"Password protection enabled." in res.data
        assert b"Login" in res.data

-        res = c.get(url_for("settings_page", removepassword="yes"),
-              follow_redirects=True)
+        res = c.post(
+            url_for("settings_page"),
+            data={
+                "minutes_between_check": 180,
+                "tag": "",
+                "headers": "",
+                "fetch_backend": "html_webdriver",
+                "removepassword_button": "Remove password"
+            },
+            follow_redirects=True,
+        )
        assert b"Password protection removed." not in res.data

        res = c.get(url_for("index"),
@@ -14,7 +14,6 @@ def set_response_data(test_return_data):


 def test_snapshot_api_detects_change(client, live_server):
-
    test_return_data = "Some initial text"

    test_return_data_modified = "Some NEW nice initial text"
@@ -27,7 +26,7 @@ def test_snapshot_api_detects_change(client, live_server):
    time.sleep(1)

    # Add our URL to the import page
-    test_url = url_for('test_endpoint', _external=True)
+    test_url = url_for('test_endpoint', content_type="text/plain", _external=True)
    res = client.post(
        url_for("import_page"),
        data={"urls": test_url},
@@ -7,6 +7,13 @@ from . util import set_original_response, set_modified_response, live_server_set

 sleep_time_for_fetch_thread = 3

+# Basic test to check inscriptis is not adding return line chars, basically works etc
+def test_inscriptus():
+    from inscriptis import get_text
+    html_content = "<html><body>test!<br/>ok man</body></html>"
+    stripped_text_from_html = get_text(html_content)
+    assert stripped_text_from_html == 'test!\nok man'
+

 def test_check_basic_change_detection_functionality(client, live_server):
    set_original_response()
@@ -18,6 +25,7 @@ def test_check_basic_change_detection_functionality(client, live_server):
        data={"urls": url_for('test_endpoint', _external=True)},
        follow_redirects=True
    )
+
    assert b"1 Imported" in res.data

    time.sleep(sleep_time_for_fetch_thread)
@@ -62,6 +70,11 @@ def test_check_basic_change_detection_functionality(client, live_server):
    res = client.get(url_for("rss"))
    expected_url = url_for('test_endpoint', _external=True)
    assert b'<rss' in res.data
+
+    # re #16 should have the diff in here too
+    assert b'(into   ) which has this one new line' in res.data
+    assert b'CDATA' in res.data
+    
    assert expected_url.encode('utf-8') in res.data

    # Following the 'diff' link, it should no longer display as 'unviewed' even after we recheck it a few times
@@ -100,14 +113,6 @@ def test_check_basic_change_detection_functionality(client, live_server):
    # It should have picked up the <title>
    assert b'head title' in res.data

-
-    # be sure the HTML converter worked
-    res = client.get(url_for("preview_page", uuid="first"))
-    assert b'<html>' not in res.data
-
-    res = client.get(url_for("preview_page", uuid="first"))
-    assert b'Some initial text' in res.data
-
    #
    # Cleanup everything
    res = client.get(url_for("api_delete", uuid="all"), follow_redirects=True)
@@ -1,56 +0,0 @@
-#!/usr/bin/python3
-
-import time
-import secrets
-from flask import url_for
-from . util import live_server_setup
-
-
-def test_binary_file_change(client, live_server):
-    with open("test-datastore/test.bin", "wb") as f:
-        f.write(secrets.token_bytes())
-
-    live_server_setup(live_server)
-
-    sleep_time_for_fetch_thread = 3
-
-    # Give the endpoint time to spin up
-    time.sleep(1)
-
-    # Add our URL to the import page
-    test_url = url_for('test_binaryfile_endpoint', _external=True)
-    res = client.post(
-        url_for("import_page"),
-        data={"urls": test_url},
-        follow_redirects=True
-    )
-    assert b"1 Imported" in res.data
-
-    # Trigger a check
-    client.get(url_for("api_watch_checknow"), follow_redirects=True)
-
-    # Give the thread time to pick it up
-    time.sleep(sleep_time_for_fetch_thread)
-
-    # Trigger a check
-    client.get(url_for("api_watch_checknow"), follow_redirects=True)
-
-    # It should report nothing found (no new 'unviewed' class)
-    res = client.get(url_for("index"))
-    assert b'unviewed' not in res.data
-    assert b'/test-binary-endpoint' in res.data
-
-    #  Make a change
-    with open("test-datastore/test.bin", "wb") as f:
-        f.write(secrets.token_bytes())
-
-
-    # Trigger a check
-    client.get(url_for("api_watch_checknow"), follow_redirects=True)
-
-    # Give the thread time to pick it up
-    time.sleep(sleep_time_for_fetch_thread)
-
-    # It should report nothing found (no new 'unviewed' class)
-    res = client.get(url_for("index"))
-    assert b'unviewed' in res.data
@@ -0,0 +1,168 @@
+#!/usr/bin/python3
+
+import time
+
+from flask import url_for
+
+from ..html_tools import element_removal
+from .util import live_server_setup
+
+
+def test_setup(live_server):
+    live_server_setup(live_server)
+
+
+def set_original_response():
+    test_return_data = """<html>
+    <header>
+    <h2>Header</h2>
+    </header>
+    <nav>
+    <ul>
+      <li><a href="#">A</a></li>
+      <li><a href="#">B</a></li>
+      <li><a href="#">C</a></li>
+    </ul>
+    </nav>
+       <body>
+     Some initial text</br>
+     <p>Which is across multiple lines</p>
+     </br>
+     So let's see what happens.  </br>
+    <div id="changetext">Some text that will change</div>
+     </body>
+    <footer>
+    <p>Footer</p>
+    </footer>
+     </html>
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+
+def set_modified_response():
+    test_return_data = """<html>
+    <header>
+    <h2>Header changed</h2>
+    </header>
+    <nav>
+    <ul>
+      <li><a href="#">A changed</a></li>
+      <li><a href="#">B</a></li>
+      <li><a href="#">C</a></li>
+    </ul>
+    </nav>
+       <body>
+     Some initial text</br>
+     <p>Which is across multiple lines</p>
+     </br>
+     So let's see what happens.  </br>
+    <div id="changetext">Some text that changes</div>
+     </body>
+    <footer>
+    <p>Footer changed</p>
+    </footer>
+     </html>
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+
+def test_element_removal_output():
+    from changedetectionio import fetch_site_status
+    from inscriptis import get_text
+
+    # Check text with sub-parts renders correctly
+    content = """<html>
+    <header>
+    <h2>Header</h2>
+    </header>
+    <nav>
+    <ul>
+      <li><a href="#">A</a></li>
+    </ul>
+    </nav>
+       <body>
+     Some initial text</br>
+     <p>across multiple lines</p>
+     <div id="changetext">Some text that changes</div>
+     </body>
+    <footer>
+    <p>Footer</p>
+    </footer>
+     </html>
+    """
+    html_blob = element_removal(
+        ["header", "footer", "nav", "#changetext"], html_content=content
+    )
+    text = get_text(html_blob)
+    assert (
+        text
+        == """Some initial text
+
+across multiple lines
+"""
+    )
+
+
+def test_element_removal_full(client, live_server):
+    sleep_time_for_fetch_thread = 3
+
+    set_original_response()
+
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for("test_endpoint", _external=True)
+    res = client.post(
+        url_for("import_page"), data={"urls": test_url}, follow_redirects=True
+    )
+    assert b"1 Imported" in res.data
+
+    # Go to the edit page and add the filter data
+    # Not sure why \r needs to be added; without the #changetext selector it is not necessary
+    subtractive_selectors_data = "header\r\nfooter\r\nnav\r\n#changetext"
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={
+            "subtractive_selectors": subtractive_selectors_data,
+            "url": test_url,
+            "tag": "",
+            "headers": "",
+            "fetch_backend": "html_requests",
+        },
+        follow_redirects=True,
+    )
+    assert b"Updated watch." in res.data
+
+    # Check it saved
+    res = client.get(
+        url_for("edit_page", uuid="first"),
+    )
+    assert subtractive_selectors_data.encode("utf-8") in res.data
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # No change yet - first check
+    res = client.get(url_for("index"))
+    assert b"unviewed" not in res.data
+
+    #  Make a change to header/footer/nav
+    set_modified_response()
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # There should not be an unviewed change, as the changed elements are removed by the filter
+    res = client.get(url_for("index"))
+    assert b"unviewed" not in res.data
@@ -0,0 +1,87 @@
+#!/usr/bin/python3
+# coding=utf-8
+
+import time
+from flask import url_for
+from .util import live_server_setup
+import pytest
+
+
+def test_setup(live_server):
+    live_server_setup(live_server)
+
+
+def set_html_response():
+    test_return_data = """
+<html><body><span class="nav_second_img_text">
+                  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;铸大国重器，挺制造脊梁，致力能源未来，赋能美好生活。
+                                  </span>
+</body></html>
+    """
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+    return None
+
+
+# In the case the server sends a Content-Type header (text/html) but does not issue a charset=
+def test_check_encoding_detection(client, live_server):
+    set_html_response()
+
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', content_type="text/html", _external=True)
+    client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(2)
+
+    res = client.get(
+        url_for("preview_page", uuid="first"),
+        follow_redirects=True
+    )
+
+    # Should see the proper string
+    assert "铸大国重".encode('utf-8') in res.data
+    # Should not see the failed encoding
+    assert b'\xc2\xa7' not in res.data
+
+
+# In the case the server does not have a Content-Type header set at all
+def test_check_encoding_detection_missing_content_type_header(client, live_server):
+    set_html_response()
+
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', _external=True)
+    client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(2)
+
+    res = client.get(
+        url_for("preview_page", uuid="first"),
+        follow_redirects=True
+    )
+
+    # Should see the proper string
+    assert "铸大国重".encode('utf-8') in res.data
+    # Should not see the failed encoding
+    assert b'\xc2\xa7' not in res.data
@@ -1,6 +1,7 @@
 #!/usr/bin/python3

 import time
+
 from flask import url_for
 from . util import live_server_setup

@@ -17,7 +18,9 @@ def test_error_handler(client, live_server):
    time.sleep(1)

    # Add our URL to the import page
-    test_url = url_for('test_endpoint_403_error', _external=True)
+    test_url = url_for('test_endpoint',
+                       status_code=403,
+                       _external=True)
    res = client.post(
        url_for("import_page"),
        data={"urls": test_url},
@@ -3,6 +3,7 @@
 import time
 from flask import url_for
 from . util import live_server_setup
+from changedetectionio import html_tools

 def test_setup(live_server):
    live_server_setup(live_server)
@@ -23,7 +24,7 @@ def test_strip_regex_text_func():
    ignore_lines = ["sometimes", "/\s\d{2,3}\s/", "/ignore-case text/"]

    fetcher = fetch_site_status.perform_site_check(datastore=False)
-    stripped_content = fetcher.strip_ignore_text(test_content, ignore_lines)
+    stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines)

    assert b"but 1 lines" in stripped_content
    assert b"igNORe-cAse text" not in stripped_content
@@ -3,6 +3,7 @@
 import time
 from flask import url_for
 from . util import live_server_setup
+from changedetectionio import html_tools

 def test_setup(live_server):
    live_server_setup(live_server)
@@ -23,7 +24,7 @@ def test_strip_text_func():
    ignore_lines = ["sometimes"]

    fetcher = fetch_site_status.perform_site_check(datastore=False)
-    stripped_content = fetcher.strip_ignore_text(test_content, ignore_lines)
+    stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines)

    assert b"sometimes" not in stripped_content
    assert b"Some content" in stripped_content
@@ -52,6 +53,8 @@ def set_modified_original_ignore_response():
     <p>Which is across multiple lines</p>
     </br>
     So let's see what happens.  </br>
+     <p>new ignore stuff</p>
+     <p>blah</p>
     </body>
     </html>

@@ -67,7 +70,7 @@ def set_modified_ignore_response():
       <body>
     Some initial text</br>
     <p>Which is across multiple lines</p>
-     <P>ZZZZZ</P>
+     <P>ZZZZz</P>
     </br>
     So let's see what happens.  </br>
     </body>
@@ -82,7 +85,8 @@ def set_modified_ignore_response():
 def test_check_ignore_text_functionality(client, live_server):
    sleep_time_for_fetch_thread = 3

-    ignore_text = "XXXXX\r\nYYYYY\r\nZZZZZ"
+    # Use a mix of case in zZzZZ to prove the match is case-insensitive.
+    ignore_text = "XXXXX\r\nYYYYY\r\nzZzZZ\r\nnew ignore stuff"
    set_original_ignore_response()

    # Give the endpoint time to spin up
@@ -142,13 +146,25 @@ def test_check_ignore_text_functionality(client, live_server):
    assert b'unviewed' not in res.data
    assert b'/test-endpoint' in res.data

+
+
+
+
    # Just to be sure.. set a regular modified change..
    set_modified_original_ignore_response()
    client.get(url_for("api_watch_checknow"), follow_redirects=True)
    time.sleep(sleep_time_for_fetch_thread)
+
    res = client.get(url_for("index"))
    assert b'unviewed' in res.data

+    # Check the preview/highlighter, we should be able to see what we ignored, but it should be highlighted
+    # We only introduce the "modified" content that includes what we ignore so we can prove the newest version also displays
+    # at /preview
+    res = client.get(url_for("preview_page", uuid="first"))
+    # We should be able to see what we ignored
+    assert b'<div class="ignored">new ignore stuff' in res.data
+
    res = client.get(url_for("api_delete", uuid="all"), follow_redirects=True)
    assert b'Deleted' in res.data

@@ -0,0 +1,190 @@
+#!/usr/bin/python3
+
+import time
+from flask import url_for
+from . util import live_server_setup
+
+
+def test_setup(live_server):
+    live_server_setup(live_server)
+
+
+def set_original_response():
+    test_return_data = """<html>
+       <body>
+     Some initial text</br>
+     <p>Which is across multiple lines</p>
+     </br>
+     So let's see what happens.  </br>
+     </body>
+     </html>
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+
+def set_some_changed_response():
+    test_return_data = """<html>
+       <body>
+     Some initial text</br>
+     <p>Which is across multiple lines, and a new thing too.</p>
+     </br>
+     So let's see what happens.  </br>
+     </body>
+     </html>
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+
+def test_normal_page_check_works_with_ignore_status_code(client, live_server):
+    sleep_time_for_fetch_thread = 3
+
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    set_original_response()
+
+    # Go to the settings page and enable ignoring of status codes
+    res = client.post(
+        url_for("settings_page"),
+        data={
+            "minutes_between_check": 180,
+            "ignore_status_codes": "y",
+            'fetch_backend': "html_requests"
+        },
+        follow_redirects=True
+    )
+    assert b"Settings updated." in res.data
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', _external=True)
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+    assert b"1 Imported" in res.data
+
+    time.sleep(sleep_time_for_fetch_thread)
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    set_some_changed_response()
+    time.sleep(sleep_time_for_fetch_thread)
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # The change should be detected (the 'unviewed' class is present)
+    res = client.get(url_for("index"))
+    assert b'unviewed' in res.data
+    assert b'/test-endpoint' in res.data
+
+
+# Tests the whole stack works with status codes ignored
+def test_403_page_check_works_with_ignore_status_code(client, live_server):
+    sleep_time_for_fetch_thread = 3
+
+    set_original_response()
+
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', status_code=403, _external=True)
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+    assert b"1 Imported" in res.data
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # Go to the edit page and enable the ignore option
+    # by re-saving the watch with ignore_status_codes set
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={"ignore_status_codes": "y", "url": test_url, "tag": "", "headers": "", 'fetch_backend': "html_requests"},
+        follow_redirects=True
+    )
+    assert b"Updated watch." in res.data
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+    #  Make a change
+    set_some_changed_response()
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # It should be marked 'unviewed',
+    # because the 403 status is ignored and the content change is detected
+    res = client.get(url_for("index"))
+    assert b'unviewed' in res.data
+
+
+# Tests that a 403 is reported as an error when status codes are not ignored
+def test_403_page_check_fails_without_ignore_status_code(client, live_server):
+    sleep_time_for_fetch_thread = 3
+
+    set_original_response()
+
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', status_code=403, _external=True)
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+    assert b"1 Imported" in res.data
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # Go to the edit page and re-save the watch without the ignore option
+    # (ignore_status_codes is deliberately left out of the form data)
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={"url": test_url, "tag": "", "headers": "", 'fetch_backend': "html_requests"},
+        follow_redirects=True
+    )
+    assert b"Updated watch." in res.data
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+    #  Make a change
+    set_some_changed_response()
+
+    # Trigger a check
+    client.get(url_for("api_watch_checknow"), follow_redirects=True)
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # It should report the error rather than a change,
+    # because the 403 status code is not being ignored
+    res = client.get(url_for("index"))
+    assert b'Status Code 403' in res.data
@@ -1,4 +1,5 @@
 #!/usr/bin/python3
+# coding=utf-8

 import time
 from flask import url_for
@@ -142,7 +143,7 @@ def set_modified_response():
        }
      ],
      "boss": {
-        "name": "Foobar"
+        "name": "Örnsköldsvik"
      },
      "available": false
    }
@@ -162,7 +163,7 @@ def test_check_json_without_filter(client, live_server):
    time.sleep(1)

    # Add our URL to the import page
-    test_url = url_for('test_endpoint_json', _external=True)
+    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    client.post(
        url_for("import_page"),
        data={"urls": test_url},
@@ -193,7 +194,7 @@ def test_check_json_filter(client, live_server):
    time.sleep(1)

    # Add our URL to the import page
-    test_url = url_for('test_endpoint', _external=True)
+    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    res = client.post(
        url_for("import_page"),
        data={"urls": test_url},
@@ -246,8 +247,10 @@ def test_check_json_filter(client, live_server):

    # Should not see this, because its not in the JSONPath we entered
    res = client.get(url_for("diff_history_page", uuid="first"))
+
    # But the change should be there, tho its hard to test the change was detected because it will show old and new versions
-    assert b'Foobar' in res.data
+    # And #462 - check we see the proper utf-8 string there
+    assert "Örnsköldsvik".encode('utf-8') in res.data


 def test_check_json_filter_bool_val(client, live_server):
@@ -258,7 +261,7 @@ def test_check_json_filter_bool_val(client, live_server):
    # Give the endpoint time to spin up
    time.sleep(1)

-    test_url = url_for('test_endpoint', _external=True)
+    test_url = url_for('test_endpoint', content_type="application/json", _external=True)

    res = client.post(
        url_for("import_page"),
@@ -313,7 +316,7 @@ def test_check_json_ext_filter(client, live_server):
    time.sleep(1)

    # Add our URL to the import page
-    test_url = url_for('test_endpoint', _external=True)
+    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    res = client.post(
        url_for("import_page"),
        data={"urls": test_url},
@@ -125,7 +125,7 @@ def test_check_notification(client, live_server):
    # Diff was correctly executed
    assert "Diff Full: Some initial text" in notification_submission
    assert "Diff: (changed) Which is across multiple lines" in notification_submission
-    assert "(-> into) which has this one new line" in notification_submission
+    assert "(into   ) which has this one new line" in notification_submission


    if env_base_url:
@@ -77,14 +77,6 @@ def test_body_in_request(client, live_server):
    # Add our URL to the import page
    test_url = url_for('test_body', _external=True)

-    # Add the test URL twice, we will check
-    res = client.post(
-        url_for("import_page"),
-        data={"urls": test_url},
-        follow_redirects=True
-    )
-    assert b"1 Imported" in res.data
-
    res = client.post(
        url_for("import_page"),
        data={"urls": test_url},
@@ -94,19 +86,6 @@ def test_body_in_request(client, live_server):

    body_value = 'Test Body Value'

-    # Attempt to add a body with a GET method
-    res = client.post(
-        url_for("edit_page", uuid="first"),
-        data={
-              "url": test_url,
-              "tag": "",
-              "method": "GET",
-              "fetch_backend": "html_requests",
-              "body": "invalid"},
-        follow_redirects=True
-    )
-    assert b"Body must be empty when Request Method is set to GET" in res.data
-
    # Add a properly formatted body with a proper method
    res = client.post(
        url_for("edit_page", uuid="first"),
@@ -120,8 +99,7 @@ def test_body_in_request(client, live_server):
    )
    assert b"Updated watch." in res.data

-    # Give the thread time to pick up the first version
-    time.sleep(5)
+    time.sleep(3)

    # The service should echo back the body
    res = client.get(
@@ -129,9 +107,20 @@ def test_body_in_request(client, live_server):
        follow_redirects=True
    )

-    # Check if body returned contains the specified data
+    # If this gets stuck something is wrong, something should always be there
+    assert b"No history found" not in res.data
+    # We should see what we sent in the reply
    assert str.encode(body_value) in res.data

+    ####### data sanity checks
+    # Add the test URL a second time, then check that only one watch has a body set
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+    assert b"1 Imported" in res.data
+
    watches_with_body = 0
    with open('test-datastore/url-watches.json') as f:
        app_struct = json.load(f)
@@ -142,6 +131,20 @@ def test_body_in_request(client, live_server):
    # Should be only one with body set
    assert watches_with_body==1

+    # Attempt to add a body with a GET method
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={
+              "url": test_url,
+              "tag": "",
+              "method": "GET",
+              "fetch_backend": "html_requests",
+              "body": "invalid"},
+        follow_redirects=True
+    )
+    assert b"Body must be empty when Request Method is set to GET" in res.data
+
+
 def test_method_in_request(client, live_server):
    # Add our URL to the import page
    test_url = url_for('test_method', _external=True)
@@ -0,0 +1,36 @@
+from flask import url_for
+from . util import set_original_response, set_modified_response, live_server_setup
+import time
+
+def test_setup(live_server):
+    live_server_setup(live_server)
+
+def test_file_access(client, live_server):
+
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": 'https://localhost'},
+        follow_redirects=True
+    )
+
+    assert b"1 Imported" in res.data
+
+    # Attempt to change the watch URL to a local file:// path
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={
+              "url": 'file:///etc/passwd',
+              "tag": "",
+              "method": "GET",
+              "fetch_backend": "html_requests",
+              "body": ""},
+        follow_redirects=True
+    )
+    time.sleep(3)
+
+    res = client.get(
+        url_for("index", uuid="first"),
+        follow_redirects=True
+    )
+
+    assert b'denied for security reasons' in res.data
@@ -129,3 +129,8 @@ def test_trigger_functionality(client, live_server):
    time.sleep(sleep_time_for_fetch_thread)
    res = client.get(url_for("index"))
    assert b'unviewed' in res.data
+
+    # Check the preview/highlighter, we should be able to see what we triggered on, but it should be highlighted
+    res = client.get(url_for("preview_page", uuid="first"))
+    # We should be able to see what we triggered on
+    assert b'<div class="triggered">foobar' in res.data
@@ -96,6 +96,7 @@ def test_check_markup_xpath_filter_restriction(client, live_server):
    res = client.get(url_for("index"))
    assert b'unviewed' not in res.data

+
 def test_xpath_validation(client, live_server):

    # Give the endpoint time to spin up
@@ -0,0 +1,3 @@
+After twenty years, as cursed as I may be
+ok
+and insure that I'm one of those computer nerds.
@@ -2,5 +2,6 @@ After twenty years, as cursed as I may be
 for having learned computerese,
 I continue to examine bits, bytes and words
 xok
+next-x-ok
 and insure that I'm one of those computer nerds.
 and something new
@@ -12,12 +12,19 @@ from changedetectionio import diff
 class TestDiffBuilder(unittest.TestCase):

    def test_expected_diff_output(self):
-        base_dir=os.path.dirname(__file__)
-        output = diff.render_diff(base_dir+"/test-content/before.txt", base_dir+"/test-content/after.txt")
+        base_dir = os.path.dirname(__file__)
+        output = diff.render_diff(previous_file=base_dir + "/test-content/before.txt", newest_file=base_dir + "/test-content/after.txt")
        output = output.split("\n")
-        self.assertIn("(changed) ok", output)
-        self.assertIn("(-> into) xok", output)
-        self.assertIn("(added) and something new", output)
+        self.assertIn('(changed) ok', output)
+        self.assertIn('(into   ) xok', output)
+        self.assertIn('(into   ) next-x-ok', output)
+        self.assertIn('(added  ) and something new', output)
+
+
+        output = diff.render_diff(previous_file=base_dir + "/test-content/before.txt", newest_file=base_dir + "/test-content/after-2.txt")
+        output = output.split("\n")
+        self.assertIn('(removed) for having learned computerese,', output)
+        self.assertIn('(removed) I continue to examine bits, bytes and words', output)

        # @todo test blocks of changed, blocks of added, blocks of removed

@@ -1,5 +1,6 @@
 #!/usr/bin/python3

+from flask import make_response, request

 def set_original_response():
    test_return_data = """<html>
@@ -37,45 +38,24 @@ def set_modified_response():

 def live_server_setup(live_server):

-    @live_server.app.route('/test-binary-endpoint')
-    def test_binaryfile_endpoint():
-
-        from flask import make_response
-
-        # Tried using a global var here but didn't seem to work, so reading from a file instead.
-        with open("test-datastore/test.bin", "rb") as f:
-            resp = make_response(f.read())
-            resp.headers['Content-Type'] = 'image/jpeg'
-            return resp
-
    @live_server.app.route('/test-endpoint')
    def test_endpoint():
-        # Tried using a global var here but didn't seem to work, so reading from a file instead.
-        with open("test-datastore/endpoint-content.txt", "r") as f:
-            return f.read()
+        ctype = request.args.get('content_type')
+        status_code = request.args.get('status_code')

-    @live_server.app.route('/test-endpoint-json')
-    def test_endpoint_json():
-
-        from flask import make_response
-
-        with open("test-datastore/endpoint-content.txt", "r") as f:
-            resp = make_response(f.read())
-            resp.headers['Content-Type'] = 'application/json'
-            return resp
-
-    @live_server.app.route('/test-403')
-    def test_endpoint_403_error():
-
-        from flask import make_response
-        resp = make_response('', 403)
-        return resp
+        try:
+            # Tried using a global var here but didn't seem to work, so reading from a file instead.
+            with open("test-datastore/endpoint-content.txt", "r") as f:
+                resp = make_response(f.read(), status_code)
+                resp.headers['Content-Type'] = ctype if ctype else 'text/html'
+                return resp
+        except FileNotFoundError:
+            return make_response('', status_code)

    # Just return the headers in the request
    @live_server.app.route('/test-headers')
    def test_headers():

-        from flask import request
        output= []

        for header in request.headers:
@@ -86,24 +66,16 @@ def live_server_setup(live_server):
    # Just return the body in the request
    @live_server.app.route('/test-body', methods=['POST', 'GET'])
    def test_body():
-
-        from flask import request
-
        return request.data

    # Just return the verb in the request
    @live_server.app.route('/test-method', methods=['POST', 'GET', 'PATCH'])
    def test_method():
-
-        from flask import request
-
        return request.method

    # Where we POST to as a notification
    @live_server.app.route('/test_notification_endpoint', methods=['POST', 'GET'])
    def test_notification_endpoint():
-        from flask import request
-
        with open("test-datastore/notification.txt", "wb") as f:
            # Debug method, dump all POST to file also, used to prove #65
            data = request.stream.read()
@@ -117,8 +89,6 @@ def live_server_setup(live_server):
     # Just return the basic auth credentials in the request
    @live_server.app.route('/test-basicauth', methods=['GET'])
    def test_basicauth_method():
-
-        from flask import request
        auth = request.authorization
        ret = " ".join([auth.username, auth.password, auth.type])
        return ret
@@ -49,8 +49,6 @@ class update_worker(threading.Thread):
                        # We then convert/.decode('utf-8') for the notification etc
                        if not isinstance(contents, (bytes, bytearray)):
                            raise Exception("Error - returned data from the fetch handler SHOULD be bytes")
-
-
                    except PermissionError as e:
                         self.app.logger.error("File permission error updating %s - %s", uuid, str(e))
                    except content_fetcher.EmptyReply as e:
@@ -134,8 +132,10 @@ class update_worker(threading.Thread):

                        except Exception as e:
                            # Catch everything possible here, so that if a worker crashes, we don't lose it until restart!
+                            print("!!!! Exception in update_worker !!!\n", e)
                            self.app.logger.error("Exception reached processing watch UUID: %s - %s", uuid, str(e))
                            self.datastore.update_watch(uuid=uuid, update_obj={'last_error': str(e)})
+
                    finally:
                         # Always record that we at least tried
                        self.datastore.update_watch(uuid=uuid, update_obj={'fetch_time': round(time.time() - now, 3),
@@ -144,4 +144,7 @@ class update_worker(threading.Thread):
                self.current_uuid = None  # Done
                self.q.task_done()

+                # Give the CPU time to interrupt
+                time.sleep(0.1)
+
            self.app.config.exit.wait(1)
@@ -1,9 +1,9 @@
 version: '2'
 services:
-    changedetection.io:
+    changedetection:
      image: ghcr.io/dgtlmoon/changedetection.io
-      container_name: changedetection.io
-      hostname: changedetection.io
+      container_name: changedetection
+      hostname: changedetection
      volumes:
        - changedetection-data:/datastore

@@ -1,9 +1,9 @@
 flask~= 2.0
-
+flask_wtf
 eventlet>=0.31.0
 validators
 timeago ~=1.0
-inscriptis ~= 1.2
+inscriptis ~= 2.2
 feedgen ~= 0.9
 flask-login ~= 0.5
 pytz
@@ -17,7 +17,7 @@ wtforms ~= 2.3.3
 jsonpath-ng ~= 1.5.3

 # Notification library
-apprise ~= 0.9.6
+apprise ~= 0.9.7

 # apprise mqtt https://github.com/dgtlmoon/changedetection.io/issues/315
 paho-mqtt
@@ -34,5 +34,9 @@ lxml

 # 3.141 was missing socksVersion, 3.150 was not in pypi, so we try 4.1.0
 selenium ~= 4.1.0
-pytest ~=6.2
-pytest-flask ~=1.2
+
+# https://stackoverflow.com/questions/71652965/importerror-cannot-import-name-safe-str-cmp-from-werkzeug-security/71653849#71653849
+# ImportError: cannot import name 'safe_str_cmp' from 'werkzeug.security'
+# need to revisit flask login versions
+werkzeug ~= 2.0.0
+
@@ -32,11 +32,11 @@ setup(
    long_description_content_type='text/markdown',
    keywords='website change monitor for changes notification change detection '
             'alerts tracking website tracker change alert website and monitoring',
-    zip_safe=False,
-    entry_points={"console_scripts": ["changedetection.io=changedetection:main"]},
+    entry_points={"console_scripts": ["changedetection.io=changedetectionio.changedetection:main"]},
+    zip_safe=True,
+    scripts=["changedetection.py"],
    author='dgtlmoon',
    url='https://changedetection.io',
-    scripts=['changedetection.py'],
    packages=['changedetectionio'],
    include_package_data=True,
    install_requires=install_requires,
Author	SHA1	Message	Date
dgtlmoon	487967de29	Adding test	2022-03-30 10:32:51 +02:00
dgtlmoon	304ef746e4	Merge branch 'master' into ticket-16-diff-in-RSS	2022-03-30 10:24:40 +02:00
dgtlmoon	082634f851	Fix - {diff} and {diff_full} notifications tokens were not always including the full output	2022-03-29 19:18:26 +02:00
dgtlmoon	b9222e3243	Merge branch 'master' into ticket-16-diff-in-RSS	2022-03-26 15:53:12 +01:00
dgtlmoon	1d92d9461a	WIP	2022-03-26 15:35:56 +01:00
dgtlmoon	334010025f	Update README.md	2022-03-26 14:02:56 +01:00
dgtlmoon	81aa8fa16b	Update README.md	2022-03-26 09:56:56 +01:00
dgtlmoon	c79d6824e3	Minor UI cleanups (mobile tabs, font sizing) (#503)	2022-03-25 23:37:28 +01:00
zznidar	946377d2be	Fix typo in Filters & Triggers settings. (#495)	2022-03-23 23:18:04 +01:00
zznidar	5db9a30ad4	Add autofocus attribute to password login field (#496)	2022-03-23 23:17:47 +01:00
dgtlmoon	1d060225e1	0.39.11	2022-03-23 09:42:51 +01:00
dgtlmoon	7e0f0d0fd8	Microsoft Windows installation fixes (#492)	2022-03-22 23:08:08 +01:00
dgtlmoon	8b2afa2220	GitHub tweak - container tags should be CSV list (Fix ghcr.io not building)	2022-03-22 00:08:05 +01:00
dgtlmoon	f55ffa0f62	GitHub tweak - build containers also on push to master	2022-03-21 23:08:17 +01:00
dgtlmoon	942c3f021f	Allow changedetector to ignore status codes as a per-site setting (#479) (#485) Co-authored-by: Ara Hayrabedian <ara.hayrabedian@gmail.com>	2022-03-21 23:03:54 +01:00
dgtlmoon	5483f5d694	Security update - Use CSRF token protection for forms, make "remove password" use HTTP Post (#484)	2022-03-21 22:54:27 +01:00
dgtlmoon	f2fa638480	Security update - Protect against file:/// type access by webdriver/chrome. (#483)	2022-03-21 20:59:20 +01:00
dgtlmoon	82d1a7f73e	Only build container on GitHub releases, not tests	2022-03-20 16:57:36 +01:00
dgtlmoon	9fc291fb63	Also change container names to help stop some DNS issues	2022-03-17 19:59:37 +01:00
dgtlmoon	3e8a15456a	Detect byte-encoding when the server mishandles the content-type header reply (#472)	2022-03-17 10:28:02 +01:00
dgtlmoon	2a03f3f57e	Improving form/edit example markup	2022-03-13 12:00:45 +01:00
dgtlmoon	ffad5cca97	JSON diff/preview should use utf-8 encoding where possible (#465)	2022-03-13 11:37:51 +01:00
Tim Loderhose	60a9a786e0	Fix typo in settings form	2022-03-13 10:55:37 +01:00
dgtlmoon	165e950e55	Add python venv to .gitignore	2022-03-13 10:53:33 +01:00
dgtlmoon	c25294ca57	0.39.10	2022-03-12 17:28:30 +01:00
Tim Loderhose	d4359c2e67	Add filter to remove elements by CSS rule from HTML before change detection is run (#445)	2022-03-12 13:29:30 +01:00
dgtlmoon	44fc804991	Minor updates to filters form text	2022-03-12 11:20:43 +01:00
dgtlmoon	b72c9eaf62	Re #448 - Dont use changedetection.io as the container name and hostname, fix problems fetching from the real changedetection.io webserver :)	2022-03-12 08:24:51 +01:00
dgtlmoon	7ce9e4dfc2	Testing - Refactor HTTP Request Type test (#453)	2022-03-11 18:50:02 +01:00
dgtlmoon	3cc6586695	Make table header font size the same as content	2022-03-07 13:03:59 +01:00
dgtlmoon	09204cb43f	Adjust background colours	2022-03-06 19:03:59 +01:00
dgtlmoon	a709122874	Handle the case where the visitor is already logged-in and tries to login again (#447)	2022-03-06 18:19:05 +01:00
dgtlmoon	efbeaf9535	Make the Request Override settings easier to understand	2022-03-06 17:23:21 +01:00
dgtlmoon	1a19fba07d	Minor tweak to notification token table	2022-03-06 17:10:30 +01:00
dgtlmoon	eb9020c175	Style tweak to watch form	2022-03-06 17:05:23 +01:00
dgtlmoon	13bb44e4f8	Login form style fixes	2022-03-06 17:03:15 +01:00
dgtlmoon	47f294c23b	Upgrade apprise notification engine to 0.9.7 (important telegram fixes)	2022-03-05 13:14:14 +01:00
dgtlmoon	a4cce16188	Remove pytest from production release pip requirements	2022-03-05 13:12:15 +01:00
dgtlmoon	69aec23d1d	Style fix for background image relative to X-Forwarded-Prefix when running via reverse proxy subdirectory	2022-03-05 13:08:57 +01:00
dgtlmoon	f85ccffe0a	Merge branch 'master' of github.com:dgtlmoon/changedetection.io	2022-03-04 13:13:54 +01:00
dgtlmoon	0005131472	Re-arranging primary links so the important ones are easier to find on mobile	2022-03-04 13:06:39 +01:00
dgtlmoon	3be1f4ea44	Set authentication cookie path relative to X-Forwarded-Prefix when running via reverse proxy subdirectory (#446)	2022-03-04 11:23:32 +01:00
dgtlmoon	46c72a7fb3	Upgrade inscriptis HTML converter to version 2.2~ (#434)	2022-03-01 17:58:54 +01:00
dgtlmoon	96664ffb10	Better text/plain detection and refactor tests (#443)	2022-03-01 17:50:15 +01:00
dgtlmoon	615fa2c5b2	Tweak support tabs and text (#440)	2022-02-28 22:39:32 +01:00
dgtlmoon	fd45fcce2f	Include link to changedetection.io hosted option (#439)	2022-02-28 15:47:59 +01:00
dgtlmoon	75ca7ec504	Improved CPU usage around the loop responsible for what sites needs to be checked	2022-02-28 15:08:51 +01:00
dgtlmoon	8b1e9f6591	Update README.md with hosting options	2022-02-26 18:42:54 +01:00
dgtlmoon	883aa968fd	0.39.9	2022-02-24 17:02:50 +01:00
dgtlmoon	3240ed2339	Minor reliability upgrade for large datasets - retry deepcopy (#436)	2022-02-24 16:58:51 +01:00
dgtlmoon	a89ffffc76	"Recheck" button should work when entry is in paused state	2022-02-24 16:49:48 +01:00
dgtlmoon	fda93c3798	Better file exception handling on saving index JSON	2022-02-24 16:36:24 +01:00
dgtlmoon	a51c555964	Fix small issue in highlight trigger/ignore preview page with setting the background colours, add test	2022-02-23 12:30:36 +01:00
dgtlmoon	b401998030	Ensure string matching on the ignore filter is always case-INsensitive	2022-02-23 12:01:11 +01:00
dgtlmoon	014fda9058	Ability to visualise trigger and filter rules against the current snapshot on the preview page	2022-02-23 10:49:25 +01:00
dgtlmoon	dd384619e0	Update README.md	2022-02-19 13:41:54 +01:00
Michael	85715120e2	XPath RegularExpression support	2022-02-19 13:40:57 +01:00
dgtlmoon	a0e4f9b88a	better checking of JSON type	2022-02-17 18:16:47 +01:00
dgtlmoon	04bef6091e	Make system level errors from the HTTP fetchers easier to find (#421)	2022-02-13 23:43:45 +01:00