# Website Change Alert

A Python tool that monitors websites for content changes and sends email notifications.

## Features

- ✅ Monitor any website URL
- ✅ Hash-based change detection
- ✅ Optional XPath or CSS selector support to monitor specific page sections
- ✅ Email notifications via SMTP
- ✅ Environment-based configuration
- ✅ Designed for cron scheduling

## Setup

1. **Install dependencies:**
   ```bash
   poetry install
   ```

2. **Configure environment:**
   ```bash
   cp .env.example .env
   # Edit .env with your settings
   ```

3. **Run manually to test:**
   ```bash
   poetry run python alert.py
   ```

## Configuration

Edit `.env` with your settings:

### Required Settings

- `URL`: The website to monitor
- `SMTP_HOST`: SMTP server hostname (e.g., `smtp.gmail.com`)
- `SMTP_USER`: Your SMTP username
- `SMTP_PASSWORD`: Your SMTP password or app-specific password
- `FROM_EMAIL`: Sender email address
- `TO_EMAIL`: Recipient email address

### Optional Settings

- `SELECTOR`: XPath or CSS selector to monitor specific content (leave empty for full page)
- `SELECTOR_TYPE`: Type of selector - `xpath` (default) or `css`
- `CACHE_FILE`: Location to store hash cache (default: `.cache/hash.txt`)
- `SMTP_PORT`: SMTP port (default: `587`)
- `SMTP_USE_TLS`: Use TLS encryption (default: `true`)

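As a rough illustration of how the settings above could be read from the environment, here is a minimal sketch. The `load_config` function and the `REQUIRED` and `DEFAULTS` names are hypothetical and not taken from `alert.py`; the accepted truthy values for `SMTP_USE_TLS` are also an assumption.

```python
import os

# Variable names and defaults as listed in this README.
REQUIRED = ("URL", "SMTP_HOST", "SMTP_USER", "SMTP_PASSWORD", "FROM_EMAIL", "TO_EMAIL")
DEFAULTS = {
    "SELECTOR": "",
    "SELECTOR_TYPE": "xpath",
    "CACHE_FILE": ".cache/hash.txt",
    "SMTP_PORT": "587",
    "SMTP_USE_TLS": "true",
}


def load_config(env=None):
    """Hypothetical helper: build a settings dict or raise ValueError."""
    env = os.environ if env is None else env

    # Fail early with one message naming every missing required variable.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))

    config = {name: env[name] for name in REQUIRED}
    # An unset or empty optional variable falls back to its documented default.
    for name, default in DEFAULTS.items():
        config[name] = env.get(name) or default

    # Convert string values to usable types.
    # Assumption: "1", "true" and "yes" (any case) all mean enabled.
    config["SMTP_PORT"] = int(config["SMTP_PORT"])
    config["SMTP_USE_TLS"] = config["SMTP_USE_TLS"].strip().lower() in ("1", "true", "yes")
    return config
```

Passing a plain dict instead of `os.environ` makes the helper easy to try without touching the real environment. If the project loads `.env` through a library, that step would run before this one.
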
## Usage Examples

### Monitor entire page:
```env
URL=https://example.com/news
SELECTOR=
```

### Monitor specific section with XPath:
```env
URL=https://example.com/news
SELECTOR=//div[@class='main-content']
SELECTOR_TYPE=xpath
```

### Monitor specific section with CSS selector:
```env
URL=https://example.com/news
SELECTOR=.main-content
SELECTOR_TYPE=css
```

## Scheduling with Cron

Add to your crontab (`crontab -e`):

```cron
# Check every hour
0 * * * * cd /path/to/alert-website-change && /path/to/poetry run python alert.py

# Check every 15 minutes
*/15 * * * * cd /path/to/alert-website-change && /path/to/poetry run python alert.py

# Check daily at 9 AM
0 9 * * * cd /path/to/alert-website-change && /path/to/poetry run python alert.py
```

To get the full path to poetry:
```bash
which poetry
```

## Email Provider Setup

### Gmail
1. Enable 2-factor authentication
2. Generate an app-specific password: https://myaccount.google.com/apppasswords
3. Use the app password in `SMTP_PASSWORD`

### Other Providers
Common SMTP settings:
- **Gmail**: `smtp.gmail.com:587` (TLS)
- **Outlook**: `smtp-mail.outlook.com:587` (TLS)
- **Yahoo**: `smtp.mail.yahoo.com:587` (TLS)

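To make the SMTP settings above more concrete, here is a minimal sketch of building and sending a notification with Python's standard `smtplib` and `email.message` modules. The function names `build_alert` and `send_alert`, the subject line, and the message body are illustrative assumptions, not the actual contents of `alert.py`.

```python
import smtplib
from email.message import EmailMessage


def build_alert(url: str, from_email: str, to_email: str) -> EmailMessage:
    """Hypothetical helper: compose a plain-text change notification."""
    msg = EmailMessage()
    msg["Subject"] = f"Website changed: {url}"  # assumed wording
    msg["From"] = from_email
    msg["To"] = to_email
    msg.set_content(f"A change was detected at {url}.")
    return msg


def send_alert(msg, host, port, user, password, use_tls=True):
    """Hypothetical helper: send the message through an SMTP server."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        if use_tls:
            # Port 587 normally expects STARTTLS before authentication.
            server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

With Gmail, `host` would be `smtp.gmail.com`, `port` would be `587`, and `password` would be the app password rather than the account password.
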
## How It Works

1. **First run**: Fetches the page, computes a hash, and saves it to cache
2. **Subsequent runs**:
   - Fetches the page content
   - Computes hash of current content
   - Compares with cached hash
   - If different: sends email and updates cache
   - If same: exits silently

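The compare-and-cache logic described above can be sketched as follows. This is only an illustration: the README does not say which hash function the tool uses, so SHA-256 is an assumption, and `content_hash` and `check_for_change` are hypothetical names.

```python
import hashlib
from pathlib import Path


def content_hash(content: str) -> str:
    """Hash the monitored content (SHA-256 is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_for_change(content: str, cache_file: Path) -> bool:
    """Return True only when a previously cached hash exists and differs.

    On the first run there is nothing to compare against, so the hash is
    stored and False is returned (no notification).
    """
    current = content_hash(content)
    previous = cache_file.read_text().strip() if cache_file.exists() else None

    if previous == current:
        return False  # unchanged: exit silently

    # First run or changed content: store the new hash.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(current)
    return previous is not None
```

A caller would send the email only when `check_for_change(...)` returns `True`. Storing only the hash keeps the cache file tiny, but it also means the tool can report that something changed without being able to show what changed.
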
## Troubleshooting

### Import errors during development
Run `poetry install` to install all dependencies.

### No email received
- Check spam folder
- Verify SMTP credentials
- Run the script manually (`poetry run python alert.py`) and check its output
- Check cron logs: `grep CRON /var/log/syslog` (Linux) or `log show --predicate 'process == "cron"' --last 1h` (macOS)

### XPath/CSS selector returns nothing
- Test your selector in browser DevTools
- Append `//text()` to the XPath expression to select text content
- Verify the selector matches elements on the page