Mirror of https://github.com/jaypyles/Scraperr.git, synced 2025-12-11 18:26:08 +00:00
48d3bf92144ebe0e4bae932672e18fef45ffc5bf
…
📋 Overview
Scraperr enables you to extract data from websites with precision using XPath selectors. This self-hosted application provides a clean interface to manage scraping jobs, view results, and export data.
📚 Check out the docs for a comprehensive quickstart guide and detailed information.
✨ Key Features
- XPath-Based Extraction: Precisely target page elements
- Queue Management: Submit and manage multiple scraping jobs
- Domain Spidering: Option to scrape all pages within the same domain
- Custom Headers: Add JSON headers to your scraping requests
- Media Downloads: Automatically download images, videos, and other media
- Results Visualization: View scraped data in a structured table format
- Data Export: Export your results in various formats
- Notification Channels: Send completion notifications through various channels
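To give a feel for what XPath-based extraction means in practice, here is a minimal sketch using only Python's standard library. It is not Scraperr's own implementation: the sample page, the `extract` helper, and the selectors are hypothetical, and `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so real-world HTML would likely call for a dedicated parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from Scraperr).
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="title">Widget</h2>
      <span class="price">9.99</span>
    </div>
    <div class="product">
      <h2 class="title">Gadget</h2>
      <span class="price">19.50</span>
    </div>
  </body>
</html>
"""


def extract(page: str, xpath: str) -> list[str]:
    """Return the stripped text of every element matching `xpath`."""
    root = ET.fromstring(page.strip())
    return [(el.text or "").strip() for el in root.findall(xpath)]


# Each selector targets one kind of element on the page.
titles = extract(SAMPLE_PAGE, ".//h2[@class='title']")
prices = extract(SAMPLE_PAGE, ".//div[@class='product']/span[@class='price']")
```

Here `titles` holds the text of both `h2` elements and `prices` the text of both price spans, which is the kind of per-selector result a scraping job would collect.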
🚀 Getting Started
make up
⚖️ Legal and Ethical Guidelines
When using Scraperr, please remember to:
- Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
- Terms of Service: Adhere to each website's Terms of Service regarding data extraction
- Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.
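The robots.txt and rate-limiting guidelines above can be sketched with Python's standard library. This is an illustration under stated assumptions, not a description of how Scraperr behaves: the robots.txt content, the `ExampleBot` user agent, and the helper names are all hypothetical, and the `fetch` callable stands in for whatever actually performs the request.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one would be downloaded from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def polite_delay(parser, user_agent, default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default (seconds)."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(urls, fetch, parser, user_agent, sleep=time.sleep):
    """Fetch permitted URLs one at a time, pausing between requests."""
    results = []
    delay = polite_delay(parser, user_agent)
    for i, url in enumerate(allowed_urls(parser, user_agent, urls)):
        if i > 0:
            sleep(delay)  # wait before every request after the first
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the pacing logic easy to test without real waiting. Checking robots.txt does not replace reading a site's Terms of Service.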
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
👏 Contributions
Development made easier with the webapp template.
To get started, simply run make build up-dev.

