Scraperr

Scraperr enables you to extract data from websites with precision using XPath selectors. This self-hosted application provides a clean interface to manage scraping jobs, view results, and export data.

📚 Check out the docs for a comprehensive quickstart guide and detailed information.

✨ Key Features

XPath-Based Extraction: Precisely target page elements
Queue Management: Submit and manage multiple scraping jobs
Domain Spidering: Option to scrape all pages within the same domain
Custom Headers: Add JSON headers to your scraping requests
Media Downloads: Automatically download images, videos, and other media
Results Visualization: View scraped data in a structured table format
Data Export: Export your results in various formats
Notification Channels: Send completion notifications through various channels
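To make the XPath and custom-header features concrete, here is a minimal sketch in Python. It is not Scraperr's own code: it uses only the standard library (whose `xml.etree.ElementTree` supports a limited XPath subset and requires well-formed markup), and the sample page, the `extract` and `parse_headers` helpers, and the header values are all hypothetical names chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for illustration.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
    <div class="product"><h2>Gadget</h2><span class="price">19.50</span></div>
  </body>
</html>
"""


def extract(page: str, xpath: str) -> list[str]:
    """Return the stripped text of every element matching the XPath expression."""
    root = ET.fromstring(page)
    return [(el.text or "").strip() for el in root.findall(xpath)]


def parse_headers(raw_json: str) -> dict[str, str]:
    """Parse a JSON object of request headers, rejecting non-object input."""
    headers = json.loads(raw_json)
    if not isinstance(headers, dict):
        raise ValueError("headers must be a JSON object")
    return {str(k): str(v) for k, v in headers.items()}


# Each XPath selector targets one kind of element on the page.
names = extract(SAMPLE_PAGE, ".//div[@class='product']/h2")
prices = extract(SAMPLE_PAGE, ".//span[@class='price']")

# Custom headers are written as a JSON object (assumed example values).
headers = parse_headers('{"User-Agent": "example-bot/1.0", "Accept": "text/html"}')

print(names, prices, headers)
```

Real-world HTML is often not well-formed XML, so a production scraper would normally rely on a dedicated HTML parser with full XPath support rather than the standard-library subset shown here.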

🚀 Getting Started

To start Scraperr, run:

make up

⚖️ Legal and Ethical Guidelines

When using Scraperr, please remember to:

Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
Terms of Service: Adhere to each website's Terms of Service regarding data extraction
Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
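The robots.txt and rate-limiting guidelines above can be sketched in Python with the standard library alone. This is an illustrative sketch, not part of Scraperr: the robots.txt content, the `example-bot` user agent, the URLs, and the helper names `build_parser`, `allowed_urls`, and `polite_iter` are all assumptions made for the example.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would fetch it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from already-downloaded text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, urls: list[str],
                 user_agent: str = "example-bot") -> list[str]:
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_iter(urls: list[str], delay_seconds: float = 1.0):
    """Yield URLs one at a time, pausing between consecutive items."""
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)
        yield url


parser = build_parser(ROBOTS_TXT)
candidates = [
    "https://example.com/products",
    "https://example.com/private/admin",
]
permitted = allowed_urls(parser, candidates)

# delay_seconds=0 keeps the demo instant; use a real delay against live sites.
visited = list(polite_iter(permitted, delay_seconds=0))
print(visited)
```

A fixed delay is the simplest form of rate limiting. Passing robots.txt does not replace reading a site's Terms of Service, which must still be checked separately.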

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👏 Contributions

Development is made easier with the webapp template.

To start a development environment, run:

make build up-dev
