# Harvester
Python package for harvesting commonly available data, such as free proxy servers.
## Modules
### Proxy
The `proxy` module harvests proxies from URLs with its `fetch_list` function.
It works by running a regular expression over the HTTP response and extracting
strings that match the `username:password@ip:port` pattern, where the
`username:password@` credentials are optional.
```python
from harvester.proxy import fetch_list

# Sources that publish lists of free proxies.
URLS = [
    'https://api.openproxylist.xyz/socks4.txt',
    'https://api.openproxylist.xyz/socks5.txt',
    'https://api.proxyscrape.com/?request=displayproxies&proxytype=socks4',
]


def main():
    """Main entry point."""
    for url in URLS:
        # Fetch the URL and extract every proxy string found in the response.
        proxies = fetch_list(url)
        print(proxies)


if __name__ == '__main__':
    main()
```
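To illustrate the kind of matching described above, here is a minimal sketch of a regular expression for the `username:password@ip:port` pattern with optional credentials. This is an assumption for illustration, not the expression `harvester.proxy` actually uses: `PROXY_PATTERN` and `extract_proxies` are hypothetical names, only IPv4 addresses are handled, and octet and port values are not range-checked.

```python
import re

# Hypothetical pattern (not harvester's own): an optional "username:password@"
# prefix, then an IPv4 address and a port. Octets and ports are not range-checked.
PROXY_PATTERN = re.compile(
    r"(?:(?P<username>[^\s:@]+):(?P<password>[^\s:@]+)@)?"  # optional credentials
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"                       # dotted IPv4 address
    r":(?P<port>\d{1,5})"                                    # port number
)


def extract_proxies(text: str) -> list[str]:
    """Return every proxy-like string found in ``text``, in order of appearance."""
    return [match.group(0) for match in PROXY_PATTERN.finditer(text)]


if __name__ == "__main__":
    body = "1.2.3.4:1080\nuser:pass@5.6.7.8:3128\nnot a proxy"
    print(extract_proxies(body))
```

Running it prints `['1.2.3.4:1080', 'user:pass@5.6.7.8:3128']`. Because the credentials group is optional, lines with and without `username:password@` are both captured, and the named groups make it easy to split a match into its parts.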
## Testing
```
pip install -r requirements.txt
pip install -r requirement-dev.txt
pytest -v
```