# Harvester

Python package for harvesting commonly available data, such as free proxy servers.

## Running the Demo

If you just want proxies, run the demo code in [main.py](main.py):

```shell
git clone https://git.juggalol.com/agatha/harvester
cd harvester
pip install -r requirements.txt
mkdir proxies
python main.py
```

## Modules

### Proxy

#### fetch_list

The `proxy` module harvests proxies from a URL with the `fetch_list` function. It runs a regular expression against the HTTP response, looking for strings that match a `username:password@ip:port` pattern, where the username and password are optional.

#### fetch_all

Proxies can be fetched from multiple source URLs with the `fetch_all` function. It takes a list of URLs and an optional `max_workers` parameter. Proxies are fetched from the source URLs concurrently using a `ThreadPoolExecutor`.

#### validate_socks

SOCKS5 proxies can be tested with the `validate_socks` method. It takes a proxy string as its only argument and returns a `requests.Response` object if the request succeeds. Otherwise it raises an exception, and the caller can decide how to proceed.

For an example implementation, see [main.py](main.py).

## Testing

I was trying to get into the habit of writing unit tests, but god damn I hate them. There are a few, but I don't plan on continuing any time soon.

```shell
pip install -r requirements.txt
pip install -r requirement-dev.txt
pytest -v
```

## Greets

Shoutouts to [acidvegas](https://git.supernets.org/acidvegas/). This project was inspired by [proxytools](https://git.supernets.org/acidvegas/proxytools).
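
## Sketch: Pattern Matching and Concurrent Fetching

The `fetch_list` and `fetch_all` descriptions above can be illustrated with a rough sketch. This is not the package's actual code: the regular expression, the names `PROXY_PATTERN`, `extract_proxies`, and `fetch_many`, and the injected `fetch` callable are all assumptions made for illustration. The pattern only checks the general shape of `username:password@ip:port` and does not validate octet or port ranges.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pattern, not the package's actual regex:
# an optional "username:password@" prefix, then an IPv4 address and a port.
PROXY_PATTERN = re.compile(
    r"(?:[\w.-]+:[^\s:@]+@)?"     # optional username:password@
    r"(?:\d{1,3}\.){3}\d{1,3}"    # IPv4 address (octet range not checked)
    r":\d{1,5}"                   # port (range not checked)
)


def extract_proxies(text):
    """Return every proxy-like string found in text, in order of appearance."""
    # All groups are non-capturing, so findall returns the full matches.
    return PROXY_PATTERN.findall(text)


def fetch_many(urls, fetch, max_workers=4):
    """Call fetch(url) -> str for each URL concurrently and merge the matches.

    `fetch` is a caller-supplied function (an assumption of this sketch),
    which keeps the example free of any real network calls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        found = []
        # pool.map yields results in the same order as the input URLs.
        for page in pool.map(fetch, urls):
            found.extend(extract_proxies(page))
    return found


if __name__ == "__main__":
    # Stand-in for HTTP responses, keyed by made-up source names.
    fake_pages = {
        "source-a": "1.2.3.4:8080\n5.6.7.8:3128\n",
        "source-b": "socks: user:pass@10.0.0.1:1080",
    }
    print(fetch_many(list(fake_pages), fake_pages.get))
```

Passing the fetcher in as an argument is a design choice of this sketch only. It lets the matching logic be exercised with canned text, which is also convenient for unit tests.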