Harvester
Python package for harvesting commonly available data, such as free proxy servers.
Modules
Proxy
The proxy module harvests proxies from URLs with the fetch_list function.
It works by running a regular expression against the HTTP response, looking for strings that match a username:password@ip:port pattern, where username and password are optional.
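To illustrate the kind of matching described above, here is a minimal sketch of a regular expression for the username:password@ip:port shape. The names PROXY_PATTERN and extract_proxies are hypothetical and the pattern is an assumption made for illustration; the expression used inside harvester.proxy may differ.

```python
import re

# Hypothetical pattern for illustration only; not taken from harvester.proxy.
PROXY_PATTERN = re.compile(
    r"(?:(?P<username>[^\s:@]+):(?P<password>[^\s:@]+)@)?"  # optional credentials
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"  # dotted-quad IPv4 address
    r":(?P<port>\d{1,5})"  # port number
)


def extract_proxies(text):
    """Return every proxy-like string found in text, in order of appearance."""
    return [match.group(0) for match in PROXY_PATTERN.finditer(text)]


sample = "1.2.3.4:1080\nuser:secret@5.6.7.8:9050\nnot a proxy\n"
print(extract_proxies(sample))
```

This sketch only checks the shape of each string. It does not validate that each octet is at most 255 or that the port is within range, so a stricter pattern or a post-filter may be needed for real lists.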
from harvester.proxy import fetch_list

# Plain-text proxy lists to harvest from.
URLS = [
    'https://api.openproxylist.xyz/socks4.txt',
    'https://api.openproxylist.xyz/socks5.txt',
    'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
]


def main():
    """Main entry point."""
    for url in URLS:
        # Fetch the list at this URL and print the proxies found in it.
        proxies = fetch_list(url)
        print(proxies)


if __name__ == '__main__':
    main()
Testing
pip install -r requirements.txt
pip install -r requirements-dev.txt
pytest -v