2023-11-07 21:27:10 -05:00

Harvester

Python package for harvesting commonly available data, such as free proxy servers.

Running the Demo

If you just want proxies, run the demo code in main.py:

git clone https://git.juggalol.com/agatha/harvester
cd harvester
pip install -r requirements.txt
mkdir proxies
python main.py

Modules

Proxy

fetch_list

The proxy module harvests proxies from a source URL with the fetch_list function.

It works by running a regular expression against the body of the HTTP response and collecting every string that matches a username:password@ip:port pattern, where the username:password@ part is optional.
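A minimal sketch of that matching step, assuming IPv4 addresses and credentials made only of URL-safe characters. The regex, `extract_proxies`, and `fetch_list_sketch` are hypothetical stand-ins rather than the package's actual code, and the sketch uses the standard library's `urllib.request` instead of whatever HTTP client the package itself uses.

```python
import re
import urllib.request
from typing import List

# Assumed pattern: optional "username:password@", then an IPv4 address and a port.
PROXY_RE = re.compile(
    r"(?:[A-Za-z0-9._~-]+:[A-Za-z0-9._~-]+@)?"  # optional username:password@
    r"(?:\d{1,3}\.){3}\d{1,3}"                   # IPv4 address
    r":\d{1,5}"                                  # port
)


def extract_proxies(text: str) -> List[str]:
    """Return every proxy-looking string in `text`, in the order found."""
    # All groups are non-capturing, so findall returns the full matches.
    return PROXY_RE.findall(text)


def fetch_list_sketch(url: str, timeout: float = 10.0) -> List[str]:
    """Hypothetical fetch_list: download `url` and scan the body for proxies."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return extract_proxies(body)
```

Matches come back in the order they appear in the response, and duplicates are not removed; the regex does not check that each octet is at most 255 or that the port is at most 65535.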

fetch_all

Proxies can be fetched from multiple source URLs by using the fetch_all function.

It takes a list of URLs and an optional max_workers parameter, which caps the number of worker threads. Proxies are fetched from the source URLs concurrently using a ThreadPoolExecutor.
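A sketch of that concurrent pattern, assuming each source is fetched by a function shaped like fetch_list (one URL in, a list of proxy strings out). The `fetch_all_sketch` name, the injected `fetch` parameter, the `None` default for `max_workers`, and the choice to skip sources that raise are assumptions for illustration, not the package's confirmed behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Optional


def fetch_all_sketch(
    urls: Iterable[str],
    fetch: Callable[[str], List[str]],
    max_workers: Optional[int] = None,
) -> List[str]:
    """Hypothetical stand-in for fetch_all: run `fetch` on every URL concurrently."""
    proxies: List[str] = []
    # max_workers caps the thread count; None lets ThreadPoolExecutor pick its default.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch, url) for url in urls]
        # Collect results as each source finishes, not in submission order.
        for future in as_completed(futures):
            try:
                proxies.extend(future.result())
            except Exception:
                # Assumption: a failing source is skipped instead of aborting the run.
                continue
    return proxies
```

Because results are gathered with `as_completed`, the combined list follows completion order rather than the order of `urls`; sort or deduplicate afterwards if that matters.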

validate_socks

SOCKS5 proxies can be tested with the validate_socks function. It takes a proxy string as its only argument. If a request through the proxy succeeds, it returns a requests.Response object; otherwise it raises an exception, leaving the caller to decide how to proceed.

For an example implementation, see main.py.
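The contract described for validate_socks (return a response on success, raise otherwise) leaves filtering to the caller. A minimal sketch of that caller side follows; `socks_proxies`, `check_socks_sketch`, `filter_working`, and the default test URL are hypothetical, and the request shown is only an assumption about what such a check might do. Sending requests through a `socks5://` proxy URL requires the `requests[socks]` extra (PySocks).

```python
from typing import Callable, Dict, Iterable, List


def socks_proxies(proxy: str) -> Dict[str, str]:
    """Build a requests-style proxies mapping for a SOCKS5 proxy string."""
    url = f"socks5://{proxy}"  # credentials, if present, stay in the URL
    return {"http": url, "https": url}


def check_socks_sketch(proxy: str, test_url: str = "https://api.ipify.org", timeout: float = 10.0):
    """Hypothetical validator with the described contract: return a Response or raise."""
    import requests  # imported lazily; socks5:// URLs need `pip install requests[socks]`

    response = requests.get(test_url, proxies=socks_proxies(proxy), timeout=timeout)
    response.raise_for_status()  # treat HTTP error statuses as failures too
    return response


def filter_working(proxies: Iterable[str], validate: Callable[[str], object]) -> List[str]:
    """Keep only the proxies for which `validate` returns without raising."""
    working: List[str] = []
    for proxy in proxies:
        try:
            validate(proxy)
        except Exception:
            # The caller decides what to do with failures; here they are simply dropped.
            continue
        working.append(proxy)
    return working
```

Calling `filter_working(candidates, validate_socks)` would apply the same pattern with the package's own function. The checks here run one at a time, so for long lists they could be spread over a ThreadPoolExecutor, as fetch_all does for fetching.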

Testing

I was trying to get into the habit of writing unit tests, but god damn I hate them. There are a few, but I don't plan on writing more any time soon.

pip install -r requirements.txt
pip install -r requirements-dev.txt
pytest -v

Greetz

Shoutouts to acidvegas. This project was inspired by the scripts in proxytools.