Harvester
Python package for harvesting commonly available data, such as free proxy servers.
Running the Demo
If you just want proxies, run the demo code in main.py:
git clone https://git.juggalol.com/agatha/harvester
pip install -r requirements.txt
mkdir proxies
python main.py
Modules
Proxy
fetch_list
The proxy module will harvest proxies from URLs with the fetch_list function.
It functions by running a regular expression against the HTTP response, looking for strings that match a username:password@ip:port pattern, where username and password are optional.
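As a rough illustration of that matching step, here is a minimal sketch. The pattern and the extract_proxies helper are assumptions made for this example; the regular expression the package actually uses may differ.

```python
import re

# Hypothetical pattern, not the package's own regex.
# The "username:password@" prefix is wrapped in an optional group.
PROXY_PATTERN = re.compile(
    r"(?:(?P<username>[^\s:@/]+):(?P<password>[^\s:@/]+)@)?"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d{1,5})"
)


def extract_proxies(text):
    """Return every proxy-like string found in text, in order of appearance."""
    return [match.group(0) for match in PROXY_PATTERN.finditer(text)]


# Stand-in for the body of an HTTP response.
sample = "10.0.0.1:8080\nuser:secret@192.168.1.5:1080\nnot a proxy"
proxies = extract_proxies(sample)
print(proxies)
```

This sketch does not check that each octet is at most 255 or that the port is in range, so a stricter pattern may be needed for noisy sources.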
fetch_all
Proxies can be fetched from multiple source URLs by using the fetch_all function.
It takes a list of URLs and an optional max_workers parameter. Proxies will be fetched from the source URLs concurrently using a ThreadPoolExecutor.
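The concurrent fan-out described above might look roughly like the sketch below. It is not the package's implementation: fetch_all_sketch, the fetch_one parameter, and fake_fetch_list are hypothetical names added so the example runs offline, and the deduplication and error handling are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all_sketch(urls, fetch_one, max_workers=4):
    """Call fetch_one(url) for every URL concurrently and merge the results.

    fetch_one is a hypothetical stand-in for fetch_list, passed in so the
    sketch can run without network access.
    """
    proxies = set()  # assumption: duplicates across sources are dropped
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            try:
                proxies.update(future.result())
            except Exception as exc:
                # Assumption: one failing source should not abort the run.
                print(f"{futures[future]} failed: {exc}")
    return sorted(proxies)


# Offline stand-in data; these URLs are placeholders, not real proxy sources.
FAKE_SOURCES = {
    "https://example.com/a.txt": ["10.0.0.1:8080", "10.0.0.2:3128"],
    "https://example.com/b.txt": ["10.0.0.2:3128", "10.0.0.3:1080"],
}


def fake_fetch_list(url):
    return FAKE_SOURCES[url]  # raises KeyError for an unknown URL


urls = list(FAKE_SOURCES) + ["https://example.com/missing.txt"]
result = fetch_all_sketch(urls, fake_fetch_list)
print(result)
```

Threads suit this job because each fetch spends most of its time waiting on the network, so a small pool is usually enough.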
validate_socks
SOCKS5 proxies can be tested with the validate_socks method. The method takes a proxy string as its only argument. It returns a requests.Response object if the request succeeds; otherwise it raises an exception and the caller can decide how to proceed.
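Since validation signals failure by raising, the caller needs some kind of try/except around it. The sketch below shows one way a caller might sort proxies into working and failed. check_proxies and fake_validate are hypothetical names for this example, and fake_validate only imitates the return-or-raise behaviour so the code runs without a network.

```python
def check_proxies(proxies, validate):
    """Split proxies into working and failed, based on whether validate raises."""
    working = []
    failed = {}
    for proxy in proxies:
        try:
            validate(proxy)
        except Exception as exc:
            # Keep the reason so the caller can log or retry later.
            failed[proxy] = str(exc)
        else:
            working.append(proxy)
    return working, failed


def fake_validate(proxy):
    # Hypothetical stand-in for validate_socks: returns on success, raises on failure.
    if proxy.endswith(":1080"):
        return "ok"
    raise ConnectionError(f"cannot connect through {proxy}")


working, failed = check_proxies(["10.0.0.1:1080", "10.0.0.2:9999"], fake_validate)
print(working)
print(failed)
```

In real use, validate_socks would be passed in place of fake_validate.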
For an example implementation, see main.py.
Testing
I was trying to get into the habit of writing unit tests, but god damn I hate them. There are a few, but I don't plan on continuing any time soon.
pip install -r requirements.txt
pip install -r requirements-dev.txt
pytest -v
Greetz
Shoutouts to acidvegas. This project was inspired by the scripts in proxytools.