lee/Scraperr: Self-hosted webscraper.

mirror of https://github.com/jaypyles/Scraperr.git synced 2026-07-08 15:10:48 +00:00

Latest commit ee1e27ac1b by github-actions[bot]: chore: bump version to 1.1.7 (2025-10-12 16:55:31 +00:00)

File | Last commit | Date
.dockerignore | feat: add recording viewer and vnc (#78) | 2025-05-16 21:37:09 -05:00
.gitignore | feat: add media viewer + other fixes (#79) | 2025-05-17 16:31:34 -05:00
.prettierignore | Feat/add helm chart (#69) | 2025-05-12 21:19:17 -05:00
.python-version | …
alembic.ini | Feat/swap to sqlalchemy (#99) | 2025-07-12 21:12:33 -05:00
cypress.config.ts | …
docker-compose.dev.yml | feat: add media viewer + other fixes (#79) | 2025-05-17 16:31:34 -05:00
docker-compose.yml | Chore: app refactor (#88) | 2025-06-01 15:56:15 -05:00
FUNDING.yml | …
LICENSE | …
Makefile | Chore: app refactor (#88) | 2025-06-01 15:56:15 -05:00
next-env.d.ts | feat: add import/export for job configurations (#91) | 2025-06-12 18:00:39 -05:00
next.config.mjs | …
package.json | Chore: app refactor (#88) | 2025-06-01 15:56:15 -05:00
pdm.lock | Feat/swap to sqlalchemy (#99) | 2025-07-12 21:12:33 -05:00
postcss.config.js | …
pyproject.toml | Feat/swap to sqlalchemy (#99) | 2025-07-12 21:12:33 -05:00
README.md | feat: add media viewer + other fixes (#79) | 2025-05-17 16:31:34 -05:00
start.sh | Feat/swap to sqlalchemy (#99) | 2025-07-12 21:12:33 -05:00
supervisord.conf | feat: add recording viewer and vnc (#78) | 2025-05-16 21:37:09 -05:00
tailwind.config.js | …
tsconfig.json | Chore: app refactor (#88) | 2025-06-01 15:56:15 -05:00
yarn.lock | Chore: app refactor (#88) | 2025-06-01 15:56:15 -05:00

README.md

A powerful self-hosted web scraping solution

📋 Overview

Scrape websites without writing a single line of code.

📚 Check out the docs for a comprehensive quickstart guide and detailed information.

Scraperr Main Interface

✨ Key Features

XPath-Based Extraction: Precisely target page elements
Queue Management: Submit and manage multiple scraping jobs
Domain Spidering: Option to scrape all pages within the same domain
Custom Headers: Add JSON headers to your scraping requests
Media Downloads: Automatically download images, videos, and other media
Results Visualization: View scraped data in a structured table format
Data Export: Export your results in Markdown and CSV formats
Notification Channels: Send completion notifications through various channels
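
To make the XPath and custom header features concrete, here is a minimal Python sketch of the underlying ideas. It is not Scraperr's implementation: the `PAGE` sample and the `parse_headers` and `extract` helpers are hypothetical names made up for illustration. The standard library's ElementTree supports only a subset of XPath and needs well-formed markup, so a real scraper would likely use a more lenient HTML parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; a real page would be fetched over HTTP.
PAGE = """<html><body>
<h1>Products</h1>
<ul>
  <li class="item">Widget</li>
  <li class="item">Gadget</li>
  <li class="note">Out of stock</li>
</ul>
</body></html>"""


def parse_headers(raw: str) -> dict[str, str]:
    """Parse a JSON object of request headers, rejecting any other JSON value."""
    headers = json.loads(raw)
    if not isinstance(headers, dict):
        raise ValueError("headers must be a JSON object")
    return {str(key): str(value) for key, value in headers.items()}


def extract(page: str, xpath: str) -> list[str]:
    """Return the stripped text of every element matching the XPath expression."""
    root = ET.fromstring(page)
    return [(element.text or "").strip() for element in root.findall(xpath)]


if __name__ == "__main__":
    print(parse_headers('{"User-Agent": "demo-bot", "Accept": "text/html"}'))
    print(extract(PAGE, ".//li[@class='item']"))
```

The expression `.//li[@class='item']` selects only the list items whose `class` attribute is `item`, which is the kind of precise targeting the XPath feature refers to.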

🚀 Getting Started

Docker

make up

Helm

Refer to the docs for helm deployment: https://scraperr-docs.pages.dev/guides/helm-deployment

⚖️ Legal and Ethical Guidelines

When using Scraperr, please remember to:

Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
Terms of Service: Adhere to each website's Terms of Service regarding data extraction
Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
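
The robots.txt and rate limiting guidelines above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`; it does not describe how Scraperr itself behaves, and the `ROBOTS_TXT` sample, the `demo-bot` user agent, and the helper names are assumptions made for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice it is fetched from the target site.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def polite_fetch_all(parser, user_agent, urls, fetch, sleep=time.sleep):
    """Fetch permitted URLs one at a time, pausing between consecutive requests."""
    delay = polite_delay(parser, user_agent)
    results = []
    for index, url in enumerate(allowed_urls(parser, user_agent, urls)):
        if index > 0:
            sleep(delay)
        results.append(fetch(url))
    return results
```

Passing `fetch` and `sleep` as parameters keeps the sketch testable offline; a real run would supply an HTTP client function and leave `sleep` at its default.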

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

💬 Join the Community

Get support, report bugs, and chat with other users and contributors.

👉 Join the Scraperr Discord

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👏 Contributions

Development made easier with the webapp template.

To get started, simply run make build up-dev.

Languages: TypeScript 67%, Python 30.4%, CSS 0.8%, Dockerfile 0.5%, Makefile 0.5%, Other 0.8%