# Harvester

Python package for harvesting commonly available data, such as free proxy servers.

## Modules

### Proxy

#### fetch_list

The `proxy` module harvests proxies from a URL with the `fetch_list` function. It runs a regular expression against the body of the HTTP response, looking for strings that match the pattern `username:password@ip:port`, where the `username:password@` credentials are optional.

```python
from harvester.proxy import fetch_list

URLS = [
    'https://api.openproxylist.xyz/socks4.txt',
    'https://api.openproxylist.xyz/socks5.txt',
    'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
]


def main():
    """Main entry point."""
    # Fetch and print the proxies found at each source URL, one URL at a time.
    for url in URLS:
        proxies = fetch_list(url)
        print(proxies)


if __name__ == '__main__':
    main()
```

#### fetch_all

Proxies can be fetched from multiple source URLs with the `fetch_all` function. It takes a list of URLs and an optional `max_workers` parameter, and fetches from the source URLs concurrently using a `ThreadPoolExecutor`:

```python
from harvester.proxy import fetch_all

URLS = [
    'https://api.openproxylist.xyz/socks4.txt',
    'https://api.openproxylist.xyz/socks5.txt',
    'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
]


def main():
    """Main entry point."""
    # Fetch from all source URLs concurrently.
    proxies = fetch_all(URLS)
    print(proxies)


if __name__ == '__main__':
    main()
```

#### validate_socks

SOCKS5 proxies can be tested with the `validate_socks` function, which takes a proxy string as its only argument. It returns a `requests.Response` object if the request succeeds; otherwise it raises an exception, and the caller decides how to proceed.

For an example implementation, see [main.py](main.py).

## Testing

```
pip install -r requirements.txt
pip install -r requirement-dev.txt
pytest -v
```
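The `fetch_list` section says proxies are extracted by matching `username:password@ip:port` strings with a regular expression, but it does not show the pattern itself. The following is a minimal sketch of that matching step, assuming IPv4 addresses and credentials that contain no whitespace, `:` or `@`. `PROXY_PATTERN` and `extract_proxies` are hypothetical names, and the pattern is not necessarily the one `harvester.proxy` uses.

```python
import re

# Hypothetical pattern, not necessarily the one harvester uses:
# an optional "username:password@" prefix, then an IPv4 address and a port.
PROXY_PATTERN = re.compile(
    r'(?:[^\s:@]+:[^\s:@]+@)?'  # optional credentials (no whitespace, ':' or '@')
    r'(?:\d{1,3}\.){3}\d{1,3}'  # IPv4 address (octet values are not range-checked)
    r':\d{1,5}'                 # port number
)


def extract_proxies(text: str) -> list:
    """Return every proxy-like substring found in ``text``.

    The pattern only uses non-capturing groups, so ``findall`` returns
    the full matched strings rather than group contents.
    """
    return PROXY_PATTERN.findall(text)


sample = "1.2.3.4:1080\nuser:pass@5.6.7.8:9050\nnot a proxy"
print(extract_proxies(sample))
```

Because the octets are not range-checked, a string such as `999.1.1.1:80` would also match; a stricter implementation could validate each candidate with the standard library's `ipaddress` module.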
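The `fetch_all` section describes fetching several source URLs concurrently with a `ThreadPoolExecutor` and an optional `max_workers` parameter. The README does not say what `fetch_all` returns or how it merges results, so this sketch assumes a flat list of proxy strings. `fetch_all_sketch` is a hypothetical name, and its extra `fetch` parameter (a stand-in for a per-URL fetcher such as `fetch_list`) is an assumption added so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def fetch_all_sketch(
    urls: Iterable[str],
    fetch: Callable[[str], List[str]],
    max_workers: Optional[int] = None,
) -> List[str]:
    """Run ``fetch`` on every URL concurrently and merge the results.

    ``fetch`` is a hypothetical stand-in for a per-URL fetcher.
    ``max_workers=None`` lets ThreadPoolExecutor choose its default size.
    """
    proxies: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as ``urls``,
        # even though the fetches themselves run concurrently.
        for result in executor.map(fetch, urls):
            proxies.extend(result)
    return proxies


# Fake sources so the sketch runs offline; iterating a dict yields its keys.
FAKE_SOURCES = {
    'https://example.com/a.txt': ['1.2.3.4:1080'],
    'https://example.com/b.txt': ['5.6.7.8:9050', '9.9.9.9:1080'],
}
print(fetch_all_sketch(FAKE_SOURCES, FAKE_SOURCES.__getitem__, max_workers=2))
```

If `fetch` raises for one URL, the exception propagates when the loop reaches that result. A real implementation may prefer to catch errors per source so that one unreachable URL does not discard the proxies gathered from the others.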