# Harvester

Python package for harvesting commonly available data, such as free proxy servers.

## Modules

### Proxy

#### fetch_list

The `proxy` module harvests proxies from a given URL with the `fetch_list` function. It works by running a regular expression against the HTTP response, looking for strings that match a `username:password@ip:port` pattern, where the username and password are optional.

```python
from harvester.proxy import fetch_list


URLS = [
    'https://api.openproxylist.xyz/socks4.txt',
    'https://api.openproxylist.xyz/socks5.txt',
    'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
]


def main():
    """Main entry point."""
    for url in URLS:
        proxies = fetch_list(url)
        print(proxies)


if __name__ == '__main__':
    main()
```
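
The pattern matching described above can be sketched roughly as follows. This is an illustration only: `PROXY_PATTERN` and `extract_proxies` are hypothetical names, the regular expression is an assumption rather than the one the package actually uses, and the response text is assumed to have been fetched already.

```python
import re

# Hypothetical pattern (an assumption, not the package's own regex):
# an optional "username:password@" prefix followed by an IPv4 address and a port.
PROXY_PATTERN = re.compile(
    r'(?:[^\s:@]+:[^\s:@]+@)?'   # optional credentials
    r'(?:\d{1,3}\.){3}\d{1,3}'   # IPv4 address (octet ranges are not checked)
    r':\d{1,5}'                  # port
)


def extract_proxies(text: str) -> list[str]:
    """Return every substring of text that looks like a proxy entry."""
    # All groups are non-capturing, so findall returns the full matches.
    return PROXY_PATTERN.findall(text)
```

Passing the body of a proxy list response to `extract_proxies` would yield the matching entries in the order they appear; lines that do not match are simply skipped.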

#### fetch_all

Proxies can be fetched from multiple source URLs with the `fetch_all` function. It takes a list of URLs and an optional `max_workers` parameter, and fetches from the source URLs concurrently using a `ThreadPoolExecutor`:

```python
from harvester.proxy import fetch_all


URLS = [
    'https://api.openproxylist.xyz/socks4.txt',
    'https://api.openproxylist.xyz/socks5.txt',
    'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
]


def main():
    """Main entry point."""
    proxies = fetch_all(URLS)
    print(proxies)


if __name__ == '__main__':
    main()
```
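
For readers unfamiliar with `ThreadPoolExecutor`, the concurrent fetching described above might look something like the sketch below. It is not the package's implementation: `fetch_all_sketch` is a hypothetical name, the per-URL fetcher is passed in as a callable (for example `fetch_list`), and merging everything into one flat list is an assumption about the return shape.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all_sketch(urls, fetch, max_workers=None):
    """Call fetch(url) for each URL on a thread pool and merge the results.

    fetch is any callable that takes a URL and returns an iterable of
    proxy strings (hypothetical stand-in for the package's fetcher).
    """
    proxies = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per source URL.
        futures = [executor.submit(fetch, url) for url in urls]
        # Collect results as each task finishes; order is not guaranteed.
        for future in as_completed(futures):
            # result() re-raises any exception raised inside fetch.
            proxies.extend(future.result())
    return proxies
```

Because results are collected in completion order, callers that need a stable order would have to sort the merged list themselves.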

#### validate_socks

SOCKS5 proxies can be tested with the `validate_socks` function. It takes a proxy string as its only argument and returns a `requests.Response` object if the request succeeds; otherwise it raises an exception, and the caller can decide how to proceed.

For an example implementation, see [main.py](main.py).
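
Since a failed check surfaces as an exception, one way a caller might proceed is to keep only the proxies that validate cleanly. The helper below is a minimal sketch under that assumption: `filter_working` is a hypothetical name and not part of the package, and the validator is passed in as a callable so the snippet assumes nothing about `validate_socks` beyond what is described above.

```python
from typing import Callable, Iterable


def filter_working(proxies: Iterable[str], validate: Callable[[str], object]) -> list[str]:
    """Return the proxies for which validate(proxy) does not raise."""
    working = []
    for proxy in proxies:
        try:
            validate(proxy)
        except Exception:
            # The exact exception type is not documented here, so this
            # sketch catches broadly and skips the failing proxy.
            continue
        working.append(proxy)
    return working
```

With the package installed, this could presumably be called as `filter_working(proxies, validate_socks)`, where `proxies` is any iterable of proxy strings.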

## Testing

```
pip install -r requirements.txt
pip install -r requirement-dev.txt
pytest -v
```