Commit Graph

58 Commits

Author SHA1 Message Date
dgtlmoon
fcfd1b5e10 Ability to configure extra proxies via the UI (#1235) 2022-12-19 21:48:01 +01:00
dgtlmoon
13c4121f52 PDF File change detection - Initial PDF fetcher support with basic text extraction (#1244) 2022-12-19 17:51:41 +01:00
dgtlmoon
0c380c170f Playwright - Better error reporting and re-try fetch on fail once (#1238) 2022-12-16 18:06:14 +01:00
dgtlmoon
b76148a0f4 Fetcher - CPU usage - Skip processing if the previous checksum and the just fetched one was the same (#925) 2022-12-14 15:08:34 +01:00
dgtlmoon
93cc30437f Playwright+BrowserSteps - Fetch changes - Fetch simply after page starts rendering + delay seconds, disable service workers 2022-12-14 12:16:04 +01:00
dgtlmoon
69756f20f2 VisualSelector & BrowserSteps - Scraper improvements, remove duplicate code 2022-11-25 10:45:38 +01:00
dgtlmoon
fde7b3fd97 Remove dupe xpath finder prep code 2022-11-25 09:25:05 +01:00
dgtlmoon
5b530ff61c Configurable "Browser Steps" when Playwright/Chrome is configured (enter text, scroll, wait for text, click button etc) (#478) 2022-11-24 20:53:01 +01:00
dgtlmoon
df6e835035 Make VisualSelector show first available multiple selector, refactor to make more maintainable (#1132) 2022-11-17 11:52:48 +01:00
dgtlmoon
359fc48fb4 Filters can now accept a list/multiple filters (#1064) #623 2022-11-03 12:13:54 +01:00
dgtlmoon
669fd3ae0b Dont use default Requests user-agent and accept headers in playwright+selenium requests, breaks sites such as united.com. (#1004) 2022-10-09 18:25:36 +02:00
dgtlmoon
3ebb2ab9ba Selenium fetcher - screenshot should be taken after 'wait' time, not before #873 2022-09-25 11:05:07 +02:00
dgtlmoon
3705ce6681 Render Extract Configurable Delay Seconds should also apply after executing any JS #958 2022-09-24 23:48:03 +02:00
dgtlmoon
f7ea99412f Re #958 - remove change screensize, should be in 1280x720 default, was causing "Unable to retrieve content because the page is navigating and changing the content." on some sites 2022-09-19 14:02:32 +02:00
dgtlmoon
1193a7f22c Playwright - Support proxy auth mechanisms (#859) 2022-08-18 09:46:28 +02:00
dgtlmoon
e461c0b819 Playwright fetcher didn't report low level HTTP errors correctly (like Connection Refused) (#852) 2022-08-17 13:25:08 +02:00
dgtlmoon
9942107016 Massive improvements to error handling - show separate output for non HTTP 200 status replies 2022-08-15 18:56:53 +02:00
dgtlmoon
1eb5726cbf Execute JS should happen after waiting seconds 2022-08-15 11:27:04 +02:00
dgtlmoon
e6173357a9 Visual Selector direct element finder fix 2022-07-28 09:19:10 +02:00
dgtlmoon
fae1164c0b Ability to specify JS before running change-detection (#744) 2022-07-10 13:56:01 +02:00
dgtlmoon
169c293143 Playwright - log console errors to output 2022-07-10 13:55:29 +02:00
dgtlmoon
6553980cd5 Playwright - Use HTTP Request Headers override (Cookie, etc) 2022-06-25 23:42:48 +02:00
dgtlmoon
4a91505af5 Playwright screenshots - no need for high-res "bug workaround" screenshot, use lower quality/faster configurable image quality env var 2022-06-15 10:52:24 +02:00
dgtlmoon
82b900fbf4 Give more helpful error message when a page doesnt load 2022-06-14 08:16:22 +02:00
dgtlmoon
358a365303 Tweaks to playwright fetch code - better timeout handling 2022-06-13 23:39:43 +02:00
dgtlmoon
8294519f43 Content fetcher - Handle when a page doesnt load properly 2022-06-01 13:12:37 +02:00
dgtlmoon
8ba8a220b6 Playwright - Correctly close browser context/sessions on exceptions 2022-06-01 12:59:44 +02:00
dgtlmoon
5cefb16e52 Minor code cleanup 2022-05-25 15:38:40 +02:00
dgtlmoon
341ae24b73 Re #616 - content trigger - adding extra test (#620) 2022-05-25 15:31:59 +02:00
dgtlmoon
9d742446ab Playwright - ByPass CSP for more reliable JS scraping, disable accept downloads 2022-05-25 11:05:18 +02:00
dgtlmoon
e3e022b0f4 VisualSelector - Better handling of filter targets that are no longer available in the HTML 2022-05-25 10:23:43 +02:00
dgtlmoon
7983675325 Visual Selector - be more resilient when sites interfere with the xPath scraping 2022-05-24 00:10:38 +02:00
dgtlmoon
eef56e52c6 Adding new Visual Selector for choosing the area of the webpage to monitor - playwright/browserless only (#566) 2022-05-23 23:44:51 +02:00
dgtlmoon
6734fb91a2 Option to control if pages with no renderable content are a change (example: JS webapps that dont render any text sometimes) (#608) 2022-05-17 22:22:00 +02:00
dgtlmoon
16809b48f8 Playwright - raise EmptyReply on empty reply, no need to process further 2022-05-17 18:40:15 +02:00
dgtlmoon
67c833d2bc Re #214 - configurable wait extra seconds for webdriver requests before extracting text (#606) 2022-05-17 18:35:33 +02:00
weeix
31fea55ee4 Fix PLAYWRIGHT_DRIVER_URL default value (cf. #587) (#599) 2022-05-14 22:34:44 +02:00
dgtlmoon
18f0b63b7d Ability to specify a list of proxies to choose from, always using the first one by default, See wiki (#591) 2022-05-08 20:35:36 +02:00
dgtlmoon
9807cf0cda Playwright - code fix 2022-05-07 17:29:59 +02:00
dgtlmoon
d4b5237103 Playwright fetcher - more reliable by just waiting arbitrary seconds after the last network IO 2022-05-07 17:14:40 +02:00
dgtlmoon
dc6f76ba64 Make proxy configuration more consistent - see https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration (#585) 2022-05-07 16:37:56 +02:00
dgtlmoon
1f2f93184e Playwright fetcher - use the correct default User-Agent 2022-05-06 23:59:38 +02:00
dgtlmoon
68db20168e Add new fetch method: Playwright Chromium (Selenium/WebDriver alternative) (#489) 2022-05-02 21:40:40 +02:00
dgtlmoon
9fe4f95990 When fetching a snapshot via Chrome, make the most recent screenshot available on the Diff and Preview pages (#516) 2022-04-02 14:49:32 +02:00
dgtlmoon
942c3f021f Allow changedetector to ignore status codes as a per-site setting (#479) (#485)
Co-authored-by: Ara Hayrabedian <ara.hayrabedian@gmail.com>
2022-03-21 23:03:54 +01:00
dgtlmoon
3e8a15456a Detect byte-encoding when the server mishandles the content-type header reply (#472) 2022-03-17 10:28:02 +01:00
dgtlmoon
8d2e240a2a When using Env. FETCH_WORKERS or WEBDRIVER_DELAY_BEFORE_CONTENT_READY , it should be type int 2022-02-08 20:01:24 +01:00
dgtlmoon
5214a7e0f3 Adding Env var "WEBDRIVER_DELAY_BEFORE_CONTENT_READY" to wait n seconds before extracting the text from the browser 2022-02-08 18:24:25 +01:00
dgtlmoon
f87f7077a6 Better handling of EmptyReply exception, always bump 'last_checked' in the case of an error (#354)
* Better handling of EmptyReply exception, always bump 'last_checked' in the case of an error, adds test
2022-01-05 14:13:30 +01:00
Unpublished
023951a10e Be sure that documents returned with a application/json header are not parsed with inscriptis (#337)
* Auto-detect JSON by Content-Type header
* Add test to not parse JSON responses with inscriptis
2022-01-02 22:35:33 +01:00