changedetection.io/changedetectionio/tests/test_jsonpath_jq_selector.py
dgtlmoon 3d14df6a11 Development branch merge into release/master
Multi-language / Translations Support (#3696)
  - Complete internationalization system implemented
  - Support for 7 languages: Czech (cs), German (de), French (fr), Italian (it), Korean (ko), Chinese Simplified (zh), Chinese Traditional (zh_TW)
  - Language selector with localized flags and theming
  - Flash message translations
  - Multiple translation fixes and improvements across all languages
  - Language setting preserved across redirects

  Pluggable Content Fetchers (#3653)
  - New architecture for extensible content fetcher system
  - Allows custom fetcher implementations

  Image / Screenshot Comparison Processor (#3680)
  - New processor for visual change detection (disabled for this release)
  - Supporting CSS/JS infrastructure added

  UI Improvements

  Design & Layout
  - Auto-generated tag color schemes
  - Simplified login form styling
  - Removed hard-coded CSS, moved to SCSS variables
  - Tag UI cleanup and improvements
  - Automatic tab wrapper functionality
  - Menu refactoring for better organization
  - Cleanup of offset settings
  - Hide sticky tabs on narrow viewports
  - Improved responsive layout (#3702)

  User Experience
  - Modal alerts/confirmations on delete/clear operations (#3693, #3598, #3382)
  - Auto-add https:// to URLs in quickwatch form if not present
  - Better redirect handling on login (#3699)
  - 'Recheck all' now returns to correct group/tag (#3673)
  - Language set redirect keeps hash fragment
  - More friendly human-readable text throughout UI

  Performance & Reliability

  Scheduler & Processing
  - Soft delays instead of blocking time.sleep() calls (#3710)
  - More resilient handling of same UUID being processed (#3700)
  - Better Puppeteer timeout handling
  - Improved Puppeteer shutdown/cleanup (#3692)
  - Requests cleanup now properly async

  History & Rendering
  - Faster server-side "difference" rendering on History page (#3442)
  - Show ignored/triggered rows in history
  - API: Retry watch data if watch dict changed (more reliable)

  API Improvements

  - Watch get endpoint: retry mechanism for changed watch data
  - WatchHistoryDiff API endpoint includes extra format args (#3703)

  Testing Improvements

  - Replace time.sleep with wait_for_notification_endpoint_output (#3716)
  - Test for mode switching (#3701)
  - Test for #3720 added (#3725)
  - Extract-text difference test fixes
  - Improved dev workflow

  Bug Fixes

  - Notification error text output (#3672, #3669, #3280)
  - HTML validation fixes (#3704)
  - Template discovery path fixes
  - Notification debug log now uses system locale for dates/times
  - Puppeteer spelling mistake in log output
  - Recalculation on anchor change
  - Queue bubble update disabled temporarily

  Dependency Updates

  - beautifulsoup4 updated (#3724)
  - psutil 7.1.0 → 7.2.1 (#3723)
  - python-engineio ~=4.12.3 → ~=4.13.0 (#3707)
  - python-socketio ~=5.14.3 → ~=5.16.0 (#3706)
  - flask-socketio ~=5.5.1 → ~=5.6.0 (#3691)
  - brotli ~=1.1 → ~=1.2 (#3687)
  - lxml updated (#3590)
  - pytest ~=7.2 → ~=9.0 (#3676)
  - jsonschema ~=4.0 → ~=4.25 (#3618)
  - pluggy ~=1.5 → ~=1.6 (#3616)
  - cryptography 44.0.1 → 46.0.3 (security) (#3589)

  Documentation

  - README updated with viewport size setup information

  Development Infrastructure

  - Dev container only built on dev branch
  - Improved dev workflow tooling
2026-01-12 17:50:53 +01:00

475 lines
18 KiB
Python
#!/usr/bin/env python3
# coding=utf-8
import time
from flask import url_for
from markupsafe import escape
from . util import live_server_setup, wait_for_all_checks, delete_all_watches
import pytest
import os
jq_support = True
try:
    import jq
except ModuleNotFoundError:
    jq_support = False


def test_unittest_inline_html_extract():
    # So let's pretend that the JSON we want is inside some HTML
    content="""
<html>
food and stuff and more
<script>
alert('nothing really good here');
</script>
<script type="application/ld+json">
xx {"@context":"http://schema.org","@type":"Product","name":"Nan Optipro Stage 1 Baby Formula 800g","description":"During the first year of life, nutrition is critical for your baby. NAN OPTIPRO 1 is tailored to ensure your formula fed infant receives balanced, high quality nutrition.<br />Starter infant formula. The age optimised protein source (whey dominant) is from cows milk.<br />Backed by more than 150 years of Nestlé expertise.<br />For hygiene and convenience, it is available in an innovative packaging format with a separate storage area for the scoop, and a semi-transparent window which allows you to see how much powder is left in the can without having to open it.","image":"https://cdn0.woolworths.media/content/wowproductimages/large/155536.jpg","brand":{"@context":"http://schema.org","@type":"Organization","name":"Nan"},"gtin13":"7613287517388","offers":{"@context":"http://schema.org","@type":"Offer","potentialAction":{"@context":"http://schema.org","@type":"BuyAction"},"availability":"http://schema.org/InStock","itemCondition":"http://schema.org/NewCondition","price":23.5,"priceCurrency":"AUD"},"review":[],"sku":"155536"}
</script>
<body>
and it can also be repeated
<script type="application/ld+json">
{"@context":"http://schema.org","@type":"Product","name":"Nan Optipro Stage 1 Baby Formula 800g","description":"During the first year of life, nutrition is critical for your baby. NAN OPTIPRO 1 is tailored to ensure your formula fed infant receives balanced, high quality nutrition.<br />Starter infant formula. The age optimised protein source (whey dominant) is from cows milk.<br />Backed by more than 150 years of Nestlé expertise.<br />For hygiene and convenience, it is available in an innovative packaging format with a separate storage area for the scoop, and a semi-transparent window which allows you to see how much powder is left in the can without having to open it.","image":"https://cdn0.woolworths.media/content/wowproductimages/large/155536.jpg","brand":{"@context":"http://schema.org","@type":"Organization","name":"Nan"},"gtin13":"7613287517388","offers":{"@context":"http://schema.org","@type":"Offer","potentialAction":{"@context":"http://schema.org","@type":"BuyAction"},"availability":"http://schema.org/InStock","itemCondition":"http://schema.org/NewCondition","price":23.5,"priceCurrency":"AUD"},"review":[],"sku":"155536"}
</script>
<h4>ok</h4>
</body>
</html>
"""
    from .. import html_tools

    # See that we can find the second <script> one, which is not broken, and matches our filter
    text = html_tools.extract_json_as_string(content, "json:$.offers.priceCurrency")
    assert text == '"AUD"'

    text = html_tools.extract_json_as_string('{"id":5}', "json:$.id")
    assert text == "5"

    # also check for jq
    if jq_support:
        text = html_tools.extract_json_as_string(content, "jq:.offers.priceCurrency")
        assert text == '"AUD"'

        text = html_tools.extract_json_as_string('{"id":5}', "jq:.id")
        assert text == "5"

        text = html_tools.extract_json_as_string(content, "jqraw:.offers.priceCurrency")
        assert text == "AUD"

        text = html_tools.extract_json_as_string('{"id":5}', "jqraw:.id")
        assert text == "5"

    # When nothing at all is found, it should throw JSONNotFound
    # Which is caught and shown to the user in the watch-overview table
    with pytest.raises(html_tools.JSONNotFound) as e_info:
        html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "json:$.id")

    if jq_support:
        with pytest.raises(html_tools.JSONNotFound) as e_info:
            html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "jq:.id")

        with pytest.raises(html_tools.JSONNotFound) as e_info:
            html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "jqraw:.id")


def test_unittest_inline_extract_body():
    content = """
<html>
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"testKey": 42}
</pre>
</body>
</html>
"""
    from .. import html_tools

    # The JSON here sits inside a <pre> in the <body>; check that it is still found
    text = html_tools.extract_json_as_string(content, "json:$.testKey")
    assert text == '42'


def set_original_ext_response(datastore_path):
    data = """
[
{
"isPriceLowered": false,
"status": "ForSale",
"statusOrig": "for sale"
},
{
"_id": "5e7b3e1fb3262d306323ff1e",
"listingsType": "consumer",
"status": "ForSale",
"statusOrig": "for sale"
}
]
"""
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write(data)
    return None


def set_modified_ext_response(datastore_path):
    # This should get reformatted
    data = """ [ { "isPriceLowered": false, "status": "Sold", "statusOrig": "sold" }, {
"_id": "5e7b3e1fb3262d306323ff1e",
"listingsType": "consumer",
"isPriceLowered": false,
"status": "Sold"
}
]
"""
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write(data)
    return None


def set_original_response(datastore_path):
    test_return_data = """
{
"employees": [
{
"id": 1,
"name": "Pankaj",
"salary": "10000"
},
{
"name": "David",
"salary": "5000",
"id": 2
}
],
"boss": {
"name": "Fat guy"
},
"available": true
}
"""
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write(test_return_data)
    return None


def set_json_response_with_html(datastore_path):
    test_return_data = """
{
"test": [
{
"html": "<b>"
}
]
}
"""
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write(test_return_data)
    return None


def set_modified_response(datastore_path):
    test_return_data = """
{
"employees": [
{
"id": 1,
"name": "Pankaj",
"salary": "10000"
},
{
"name": "David",
"salary": "5000",
"id": 2
}
],
"boss": {
"name": "Örnsköldsvik"
},
"available": false
}
"""
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write(test_return_data)
    return None


def test_check_json_without_filter(client, live_server, measure_memory_usage, datastore_path):
    # Request a JSON document from an application/json source containing HTML
    # and be sure it doesn't get chewed up by inscriptis
    set_json_response_with_html(datastore_path=datastore_path)

    # Add our URL to the import page
    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    wait_for_all_checks(client)

    res = client.get(
        url_for("ui.ui_preview.preview_page", uuid="first"),
        follow_redirects=True
    )

    # Should still see '"html": "<b>"'
    assert b'&#34;html&#34;: &#34;&lt;b&gt;&#34;' in res.data
    assert res.data.count(b'{') >= 2

    delete_all_watches(client)
def check_json_filter(json_filter, client, live_server, datastore_path):
    set_original_response(datastore_path=datastore_path)

    # Add our URL to the import page
    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    uuid = client.application.config.get('DATASTORE').add_watch(url=test_url, extras={"include_filters": json_filter.splitlines()})
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    wait_for_all_checks(client)

    # Check it saved
    res = client.get(
        url_for("ui.ui_edit.edit_page", uuid=uuid),
    )
    assert bytes(escape(json_filter).encode('utf-8')) in res.data

    # Give the thread time to pick it up
    wait_for_all_checks(client)

    # Make a change
    set_modified_response(datastore_path=datastore_path)

    # Trigger a check
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    wait_for_all_checks(client)

    # It should still have 'has-unread-changes'
    res = client.get(url_for("watchlist.index"))
    assert b'has-unread-changes' in res.data

    # Should not see this, because it's not in the JSONPath we entered
    res = client.get(url_for("ui.ui_diff.diff_history_page", uuid=uuid))

    # But the change should be there, though it's hard to test that the change was detected because it will show old and new versions
    # And #462 - check we see the proper utf-8 string there
    assert "Örnsköldsvik".encode('utf-8') in res.data

    delete_all_watches(client)
def test_check_jsonpath_filter(client, live_server, measure_memory_usage, datastore_path):
    check_json_filter('json:boss.name', client, live_server, datastore_path=datastore_path)


def test_check_jq_filter(client, live_server, measure_memory_usage, datastore_path):
    if jq_support:
        check_json_filter('jq:.boss.name', client, live_server, datastore_path=datastore_path)


def test_check_jqraw_filter(client, live_server, measure_memory_usage, datastore_path):
    if jq_support:
        check_json_filter('jqraw:.boss.name', client, live_server, datastore_path=datastore_path)
def check_json_filter_bool_val(json_filter, client, live_server, datastore_path):
    set_original_response(datastore_path=datastore_path)

    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    uuid = client.application.config.get('DATASTORE').add_watch(url=test_url, extras={"include_filters": [json_filter]})
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    # Make a change
    set_modified_response(datastore_path=datastore_path)

    # Trigger a check
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    wait_for_all_checks(client)

    res = client.get(url_for("ui.ui_diff.diff_history_page", uuid="first"))
    # But the change should be there, though it's hard to test that the change was detected because it will show old and new versions
    assert b'false' in res.data

    delete_all_watches(client)
def test_check_jsonpath_filter_bool_val(client, live_server, measure_memory_usage, datastore_path):
    check_json_filter_bool_val("json:$['available']", client, live_server, datastore_path=datastore_path)


def test_check_jq_filter_bool_val(client, live_server, measure_memory_usage, datastore_path):
    if jq_support:
        check_json_filter_bool_val("jq:.available", client, live_server, datastore_path=datastore_path)


def test_check_jqraw_filter_bool_val(client, live_server, measure_memory_usage, datastore_path):
    if jq_support:
        check_json_filter_bool_val("jq:.available", client, live_server, datastore_path=datastore_path)
# Re #265 - Extended JSON selector test
# Stuff to consider here
# - Selector should be allowed to return empty when it doesn't match (people might wait for some condition)
# - The 'diff' tab could show the old and new content
# - Form should let us enter a selector that doesn't (yet) match anything
def check_json_ext_filter(json_filter, client, live_server, datastore_path):
    set_original_ext_response(datastore_path=datastore_path)

    # Add our URL to the import page
    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    # Go to the edit page and set our include filter
    res = client.post(
        url_for("ui.ui_edit.edit_page", uuid=uuid),
        data={"include_filters": json_filter,
              "url": test_url,
              "tags": "",
              "headers": "",
              "fetch_backend": "html_requests",
              "time_between_check_use_default": "y"
              },
        follow_redirects=True
    )
    assert b"Updated watch." in res.data

    # Check it saved
    res = client.get(
        url_for("ui.ui_edit.edit_page", uuid=uuid),
    )
    assert bytes(escape(json_filter).encode('utf-8')) in res.data

    # Give the thread time to pick it up
    wait_for_all_checks(client)

    # Make a change
    set_modified_ext_response(datastore_path=datastore_path)

    # Trigger a check
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)

    # Give the thread time to pick it up
    wait_for_all_checks(client)

    watch = live_server.app.config['DATASTORE'].data['watching'][uuid]
    dates = list(watch.history.keys())
    snapshot_contents = watch.get_history_snapshot(timestamp=dates[0])
    assert snapshot_contents[0] == '['

    # It should have 'has-unread-changes'
    res = client.get(url_for("watchlist.index"))
    assert b'has-unread-changes' in res.data

    res = client.get(url_for("ui.ui_preview.preview_page", uuid="first"))

    # We should never see 'ForSale' because we are selecting on 'Sold' in the rule,
    # But we should know it triggered ('has-unread-changes' assert above)
    assert b'ForSale' not in res.data
    assert b'Sold' in res.data

    # And the difference should have both?
    res = client.get(url_for("ui.ui_diff.diff_history_page", uuid="first"))
    assert b'ForSale' in res.data
    assert b'Sold' in res.data

    delete_all_watches(client)
def test_ignore_json_order(client, live_server, measure_memory_usage, datastore_path):
    # A change in order shouldn't trigger a notification
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write('{"hello" : 123, "world": 123}')

    # Add our URL to the import page
    test_url = url_for('test_endpoint', content_type="application/json", _external=True)
    uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write('{"world" : 123, "hello": 123}')

    # Trigger a check
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    res = client.get(url_for("watchlist.index"))
    assert b'has-unread-changes' not in res.data

    # Just to be sure it still works
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write('{"world" : 123, "hello": 124}')

    # Trigger a check
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    res = client.get(url_for("watchlist.index"))
    assert b'has-unread-changes' in res.data

    delete_all_watches(client)
def test_correct_header_detect(client, live_server, measure_memory_usage, datastore_path):
    # Like in https://github.com/dgtlmoon/changedetection.io/pull/1593
    # Specify extra html that JSON is sometimes wrapped in - when using SockpuppetBrowser / Puppeteer / Playwright etc.
    with open(os.path.join(datastore_path, "endpoint-content.txt"), "w") as f:
        f.write('<html><body>{ "world": 123, "hello" : 123}')

    # Add our URL to the import page
    # Check weird casing is cleaned up and detected also
    test_url = url_for('test_endpoint', content_type="aPPlication/JSon", uppercase_headers=True, _external=True)
    uuid = client.application.config.get('DATASTORE').add_watch(url=test_url)
    client.get(url_for("ui.form_watch_checknow"), follow_redirects=True)
    wait_for_all_checks(client)

    res = client.get(url_for("watchlist.index"))

    # Fixed in #1593
    assert b'No parsable JSON found in this document' not in res.data

    res = client.get(
        url_for("ui.ui_preview.preview_page", uuid="first"),
        follow_redirects=True
    )

    watch = live_server.app.config['DATASTORE'].data['watching'][uuid]
    dates = list(watch.history.keys())
    snapshot_contents = watch.get_history_snapshot(timestamp=dates[0])

    assert b'&#34;hello&#34;: 123,' in res.data  # properly html escaped in the front end

    import json
    data = json.loads(snapshot_contents)
    keys = list(data.keys())
    # Should be correctly formatted and sorted, ("world" goes to end)
    assert keys == ["hello", "world"]

    delete_all_watches(client)
def test_check_jsonpath_ext_filter(client, live_server, measure_memory_usage, datastore_path):
    check_json_ext_filter('json:$[?(@.status==Sold)]', client, live_server, datastore_path=datastore_path)


def test_check_jq_ext_filter(client, live_server, measure_memory_usage, datastore_path):
    if jq_support:
        check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server, datastore_path=datastore_path)


def test_check_jqraw_ext_filter(client, live_server, measure_memory_usage, datastore_path):
    if jq_support:
        check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server, datastore_path=datastore_path)
def test_jsonpath_BOM_utf8(client, live_server, measure_memory_usage, datastore_path):
    from .. import html_tools

    # JSON string with BOM and correct double-quoted keys
    json_str = '\ufeff{"name": "José", "emoji": "😊", "language": "中文", "greeting": "Привет"}'

    # The leading BOM should not prevent the JSON from being parsed and filtered
    text = html_tools.extract_json_as_string(json_str, "json:$.name")
    assert text == '"José"'
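
The BOM test above exists because Python's standard `json.loads` rejects a `str` that begins with U+FEFF. The sketch below shows one way to tolerate a leading BOM using only the standard library. `loads_ignoring_bom` is a hypothetical helper written for illustration; it is not taken from `html_tools`, whose actual handling may differ.

```python
import json

BOM = "\ufeff"


def loads_ignoring_bom(raw: str):
    """Parse JSON text, dropping a single leading U+FEFF if present.

    Hypothetical helper for illustration only; not part of html_tools.
    """
    if raw.startswith(BOM):
        raw = raw[len(BOM):]
    return json.loads(raw)


# Same shape of input as in test_jsonpath_BOM_utf8
sample = '\ufeff{"name": "José", "language": "中文"}'
parsed = loads_ignoring_bom(sample)

# ensure_ascii=False keeps non-ASCII characters readable in the output
print(json.dumps(parsed["name"], ensure_ascii=False))
```

Removing only a leading BOM, rather than every U+FEFF in the text, leaves any such character inside string values untouched.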