# Harvester
Python package for harvesting commonly available data, such as free proxy servers.
## Running the Demo
If you only want proxies, run the demo code in [main.py](main.py):
```shell
git clone https://git.juggalol.com/agatha/harvester
cd harvester
pip install -r requirements.txt
mkdir proxies
python main.py
```
## Modules
### Proxy
#### fetch_list
The `proxy` module harvests proxies from a URL with the `fetch_list` function.
It runs a regular expression against the HTTP response, looking for
strings that match a `username:password@ip:port` pattern, where the username and password
are optional.
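
The matching step can be sketched roughly as follows. This is a minimal illustration, not the pattern the package actually uses: `PROXY_RE` and `extract_proxies` are hypothetical names, and the expression assumes IPv4 addresses and does not check that octets or ports are in range.

```python
import re

# Hypothetical pattern (an assumption, not taken from the package):
# an optional "username:password@" prefix, a dotted-quad IPv4 address, and a port.
PROXY_RE = re.compile(
    r"(?:(?P<username>[^\s:@]+):(?P<password>[^\s:@]+)@)?"
    r"(?P<ip>(?:\d{1,3}\.){3}\d{1,3})"
    r":(?P<port>\d{1,5})\b"
)


def extract_proxies(text: str) -> list[str]:
    """Return every substring of `text` that looks like a proxy string."""
    return [match.group(0) for match in PROXY_RE.finditer(text)]


# Stand-in for the body of an HTTP response.
sample = "1.2.3.4:8080\nuser:secret@10.0.0.1:1080 some junk\nnot a proxy"
print(extract_proxies(sample))
```

Because the credentials group is optional, the same expression picks up both bare `ip:port` entries and authenticated ones.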
#### fetch_all
Proxies can be fetched from multiple source URLs by using the `fetch_all` function.
It takes a list of URLs and an optional `max_workers` parameter. Proxies will be fetched from
the source URLs concurrently using a `ThreadPoolExecutor`.
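
A rough sketch of that concurrency pattern is shown below. It is not the package's implementation: `fetch_all_sketch`, its `fetch_one` parameter, and the canned data are hypothetical, and de-duplicating results and skipping failed sources are assumptions made for the example. The fetcher is passed in so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all_sketch(urls, fetch_one, max_workers=8):
    """Call fetch_one(url) for every URL concurrently and merge the results."""
    proxies = set()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its URL so failures can be reported.
        futures = {executor.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            try:
                proxies.update(future.result())
            except Exception as exc:
                # Assumption: one bad source should not abort the whole harvest.
                print(f"{futures[future]} failed: {exc!r}")
    return sorted(proxies)


# Canned responses standing in for real proxy lists.
CANNED = {
    "https://example.com/a.txt": ["1.2.3.4:1080", "5.6.7.8:1080"],
    "https://example.com/b.txt": ["5.6.7.8:1080"],
}


def fake_fetch(url):
    return CANNED[url]  # raises KeyError for unknown URLs


urls = list(CANNED) + ["https://example.com/missing.txt"]
print(fetch_all_sketch(urls, fake_fetch, max_workers=2))
```

Threads suit this job because the work is dominated by waiting on HTTP responses rather than by CPU time.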
#### validate_socks
SOCKS5 proxies can be tested with the `validate_socks` function, which takes a proxy
string as its only argument. On success it returns a `requests.Response` object;
otherwise it raises an exception and the caller decides how to proceed.
For an example implementation, see [main.py](main.py).
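
Since failures surface as exceptions, a caller will typically wrap each check in `try`/`except`. The sketch below shows one way to do that; `partition_proxies` and `fake_validate` are hypothetical names, and `fake_validate` is only a stand-in with the same raise-on-failure contract so the example runs offline.

```python
def partition_proxies(proxies, validate):
    """Split proxies into (working, failed) using a validator that raises on failure."""
    working, failed = [], {}
    for proxy in proxies:
        try:
            validate(proxy)
        except Exception as exc:
            # Keep the reason so the caller can log it or retry later.
            failed[proxy] = str(exc)
        else:
            working.append(proxy)
    return working, failed


def fake_validate(proxy):
    # Stand-in for a real validator: pretend only port 1080 responds.
    if proxy.endswith(":1080"):
        return "ok"
    raise ConnectionError("proxy did not respond")


working, failed = partition_proxies(["1.2.3.4:1080", "5.6.7.8:9999"], fake_validate)
print(working)
print(failed)
```

In real use, the `validate` argument would be the package's `validate_socks`, assuming it is importable from the `proxy` module.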
## Testing
I was trying to get into the habit of writing unit tests, but god damn I hate them. There are
a few, but I don't plan on continuing any time soon.
```
pip install -r requirements.txt
pip install -r requirement-dev.txt
pytest -v
```
## Greetz
Shoutouts to [acidvegas](https://git.supernets.org/acidvegas/). This project was inspired by
the scripts in [proxytools](https://git.supernets.org/acidvegas/proxytools).