Scraperr

mirror of https://github.com/jaypyles/Scraperr.git synced 2025-10-30 14:07:04 +00:00

Jayden Pyles 1b8c8c779a Feature: Allow Multiple Download Options (#75)

* feat: allow downloading in MD format

* fix: unit tests

* fix: deployments [skip ci]

* fix: deployment

2025-05-13 18:23:59 -05:00

Name | Last commit | Date
.github | Feature: Allow Multiple Download Options (#75) | 2025-05-13 18:23:59 -05:00
api/backend | Feature: Allow Multiple Download Options (#75) | 2025-05-13 18:23:59 -05:00
cypress | Fix/UI and backend fixes (#67) | 2025-05-11 17:33:29 -05:00
docker | Feature: Allow Multiple Download Options (#75) | 2025-05-13 18:23:59 -05:00
docs | chore: docs [skip ci] | 2025-05-11 13:42:37 -05:00
helm | Feature: Allow Multiple Download Options (#75) | 2025-05-13 18:23:59 -05:00
public | fix: make general fixes to dev containers and log pages | 2024-10-18 17:19:46 -05:00
src | Feature: Allow Multiple Download Options (#75) | 2025-05-13 18:23:59 -05:00
.gitignore | feat: add in media downloading (#62) | 2025-05-10 15:14:54 -05:00
.prettierignore | Feat/add helm chart (#69) | 2025-05-12 21:19:17 -05:00
.python-version | feat: fix authentication | 2025-04-24 18:24:19 -05:00
cypress.config.ts | feat: add cypress E2E testing | 2024-10-21 19:57:12 -05:00
docker-compose.dev.yml | Fix/UI and backend fixes (#67) | 2025-05-11 17:33:29 -05:00
docker-compose.yml | Fix/UI and backend fixes (#67) | 2025-05-11 17:33:29 -05:00
FUNDING.yml | chore: docs [skip ci] | 2025-05-11 11:24:19 -05:00
LICENSE | Create LICENSE | 2024-07-07 14:06:35 -05:00
Makefile | fix: make general fixes to dev containers and log pages | 2024-10-18 17:19:46 -05:00
next-env.d.ts | feat: add next.js with prod/dev configs | 2024-06-25 14:49:17 -05:00
next.config.mjs | wip: separate frontend from backend | 2024-07-23 20:53:15 -05:00
package-lock.json | fix: vulns | 2025-05-10 12:04:39 -05:00
package.json | fix: vulns | 2025-05-10 12:04:39 -05:00
pdm.lock | Fix/UI and backend fixes (#67) | 2025-05-11 17:33:29 -05:00
postcss.config.js | wip: update Dockerfile with next deps | 2024-06-26 16:14:43 -05:00
pyproject.toml | Fix/UI and backend fixes (#67) | 2025-05-11 17:33:29 -05:00
README.md | docs: update docs [skip ci] | 2025-05-13 13:11:52 -05:00
supervisord.conf | Fix/UI and backend fixes (#67) | 2025-05-11 17:33:29 -05:00
tailwind.config.js | wip: update UI | 2024-07-22 15:57:32 -05:00
tsconfig.json | feat: fix authentication | 2025-04-24 18:24:19 -05:00

README.md

A powerful self-hosted web scraping solution

📋 Overview

Scraperr enables you to extract data from websites with precision using XPath selectors. This self-hosted application provides a clean interface to manage scraping jobs, view results, and export data.
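
To give a feel for what an XPath selector targets, here is a minimal sketch using only Python's standard library. It is not Scraperr's own implementation: the page fragment, the `extract` helper, and the selectors are hypothetical, and `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; ElementTree needs well-formed XML.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Widget</h2>
      <span class="price">9.99</span>
    </div>
    <div class="product">
      <h2>Gadget</h2>
      <span class="price">24.50</span>
    </div>
  </body>
</html>
"""


def extract(root, xpath):
    """Return the stripped text of every element matched by `xpath`."""
    return [(el.text or "").strip() for el in root.findall(xpath)]


root = ET.fromstring(PAGE)
names = extract(root, ".//div[@class='product']/h2")   # product headings
prices = extract(root, ".//span[@class='price']")      # price spans
print(list(zip(names, prices)))
```

Each selector names a path through the document tree plus an attribute filter, which is what lets a scraping job pick out one kind of element and ignore the rest of the page.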

📚 Check out the docs for a comprehensive quickstart guide and detailed information.

✨ Key Features

XPath-Based Extraction: Precisely target page elements
Queue Management: Submit and manage multiple scraping jobs
Domain Spidering: Option to scrape all pages within the same domain
Custom Headers: Add JSON headers to your scraping requests
Media Downloads: Automatically download images, videos, and other media
Results Visualization: View scraped data in a structured table format
Data Export: Export your results in various formats
Notification Channels: Send completion notifications through various channels
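
As an illustration of the Custom Headers feature, the sketch below shows what a JSON headers value might look like and how such a value could be validated before being attached to a request. The `parse_headers` helper and the header values are assumptions made for this example; the README does not describe how Scraperr processes headers internally.

```python
import json
from urllib.request import Request


def parse_headers(raw: str) -> dict[str, str]:
    """Parse a JSON object mapping header names to values; reject anything else."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("headers must be a JSON object")
    return {str(name): str(value) for name, value in data.items()}


# Hypothetical headers as a user might enter them.
raw = '{"User-Agent": "example-bot/0.1", "Accept-Language": "en-US"}'
headers = parse_headers(raw)

# Building the Request object does not send anything over the network.
request = Request("https://example.com/", headers=headers)
print(request.header_items())
```

Rejecting non-object JSON early (a list or a bare string, for instance) keeps malformed input from reaching the request layer.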

🚀 Getting Started

Docker

make up

Helm

Refer to the docs for helm deployment: https://scraperr-docs.pages.dev/guides/helm-deployment

⚖️ Legal and Ethical Guidelines

When using Scraperr, please remember to:

Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
Terms of Service: Adhere to each website's Terms of Service regarding data extraction
Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
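
The robots.txt and rate-limiting guidelines above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleBot` user agent, the URLs, and both helper functions are hypothetical, and nothing here reflects how Scraperr itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; normally fetched from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_targets(parser, urls, agent="ExampleBot", default_delay=1.0):
    """Keep only permitted URLs, paired with the delay to wait before each request."""
    delay = parser.crawl_delay(agent) or default_delay
    return [(url, delay) for url in urls if parser.can_fetch(agent, url)]


parser = build_parser(ROBOTS_TXT)
plan = polite_targets(
    parser,
    ["https://example.com/public/page", "https://example.com/private/data"],
)
for url, delay in plan:
    print(f"fetch {url} after waiting {delay}s")
```

A real crawl would call `time.sleep(delay)` between requests; the sketch only prints the plan so that it runs instantly.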

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

💬 Join the Community

Get support, report bugs, and chat with other users and contributors.

👉 Join the Scraperr Discord

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👏 Contributions

Development made easier with the webapp template.

To get started, simply run make build up-dev.

Languages

TypeScript 67%

Python 30.4%

CSS 0.8%

Dockerfile 0.5%

Makefile 0.5%

Other 0.8%