Harvester
Python package for harvesting commonly available data, such as free proxy servers.
Modules
Proxy
fetch_list
The proxy module harvests proxies from URLs with the fetch_list function.
It works by running a regular expression against the HTTP response, looking for strings that match a username:password@ip:port pattern, where username and password are optional.

    from harvester.proxy import fetch_list

    URLS = [
        'https://api.openproxylist.xyz/socks4.txt',
        'https://api.openproxylist.xyz/socks5.txt',
        'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
    ]


    def main():
        """Main entry point."""
        for url in URLS:
            proxies = fetch_list(url)
            print(proxies)


    if __name__ == '__main__':
        main()

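The pattern matching described above can be illustrated with a small standalone sketch. This is not the package's actual regular expression, which this README does not show; the names PROXY_PATTERN and extract_proxies are hypothetical, and the sketch assumes IPv4 addresses only.

```python
import re

# Hypothetical pattern, NOT the one used inside harvester.proxy:
# an optional "username:password@" prefix, then an IPv4 address and a port.
PROXY_PATTERN = re.compile(
    r'(?:(?P<username>[^\s:@]+):(?P<password>[^\s:@]+)@)?'
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3})'
    r':(?P<port>\d{1,5})'
)


def extract_proxies(text):
    """Return every proxy-like string found in text, in order of appearance."""
    return [match.group(0) for match in PROXY_PATTERN.finditer(text)]


if __name__ == '__main__':
    sample = '1.2.3.4:1080\nuser:pass@5.6.7.8:8080\nnot a proxy'
    print(extract_proxies(sample))
```

The pattern does not check that each octet is at most 255 or that the port is at most 65535, so a stricter implementation may want to validate matches afterwards.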
fetch_all
Proxies can be fetched from multiple source URLs with the fetch_all function.
It takes a list of URLs and an optional max_workers parameter. Proxies are fetched from the source URLs concurrently using a ThreadPoolExecutor:

    from harvester.proxy import fetch_all

    URLS = [
        'https://api.openproxylist.xyz/socks4.txt',
        'https://api.openproxylist.xyz/socks5.txt',
        'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
    ]


    def main():
        """Main entry point."""
        proxies = fetch_all(URLS)
        print(proxies)


    if __name__ == '__main__':
        main()

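For readers unfamiliar with ThreadPoolExecutor, the concurrent fetching described above might look roughly like the sketch below. It is an illustration under assumptions, not the package's implementation: fetch_all_sketch is a hypothetical name, the per-URL fetch function is passed in as a parameter so the example runs without network access, and merging the results into one flat list is assumed rather than confirmed by this README.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_sketch(urls, fetch, max_workers=None):
    """Call fetch(url) for every URL on a thread pool and merge the results.

    Hypothetical helper: `fetch` is any callable that takes a URL and
    returns a list of proxies (for example, fetch_list).
    """
    proxies = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map runs the calls concurrently but yields results
        # in the same order as the input URLs.
        for result in executor.map(fetch, urls):
            proxies.extend(result)
    return proxies


if __name__ == '__main__':
    # Stand-in fetch function so the sketch runs offline.
    def fake_fetch(url):
        return [url + '#proxy']

    print(fetch_all_sketch(['a', 'b'], fake_fetch, max_workers=2))
```

With the real package, the equivalent call under these assumptions would be fetch_all_sketch(URLS, fetch_list). Note that an exception raised by any single fetch propagates out of executor.map, so a more robust version may want to catch errors per URL.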
Testing
pip install -r requirements.txt
pip install -r requirements-dev.txt
pytest -v