Public Internet proxies have a somewhat tainted reputation. A lot of people think they’re just for people to hide behind while plotting vile actions. They have legitimate uses, though. They let testers see how a site is localized for other parts of the world. They offer anonymity for investigative work or expression of opinions. They get around government-imposed blocking of sites.
There are huge numbers of proxy servers around. How do you find the ones that meet your needs? ProxyBroker can help. It’s a Python package for searching the Internet for proxies. You can use it as a command line tool or build your own interface using its API. It’s open-source code, available on GitHub under the Apache License 2.0.
It’s not a polished application. You’ll need some study to get it to work for you, but it’s helpful in finding the right proxy.
The package includes several capabilities:
- Searching for proxies, with a rich set of filters to find just the ones that meet your criteria.
- Running as a local proxy server.
- Getting geolocation information on a proxy.
In this tutorial, you will learn how to:
- Install ProxyBroker.
- Run it from the command line.
- Find a proxy that meets your needs.
You should be familiar with the command line and comfortable installing software on your computer. You will need root or administrative access. Knowledge of Python helps but isn’t mandatory.
If you’re running a command shell on Linux, Unix, or macOS, you may have to put “sudo” in front of commands that perform the installation.
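The first thing you need is Python, version 3.5 or higher. It may already be installed. To check, run this command:
python --version
You will see a response that gives the version number, in the form Python 3.x.y. On some systems the command is python3 rather than python.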
If the version is 3.5 or higher, you’re good. If you get a message like “command not found” or a lower version, you’ll need to install the latest Python. The procedure is a little different for each operating system. Follow the instructions for yours.
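If you would rather check from inside Python, the interpreter reports its own version through sys.version_info. The following is a minimal sketch; the helper name meets_minimum is made up for this illustration, and the minimum of 3.5 is the one stated above.

```python
import sys

# Minimum version stated in this tutorial.
MINIMUM = (3, 5)

def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)

if meets_minimum():
    print("Python %d.%d is new enough." % sys.version_info[:2])
else:
    print("Python %d.%d is too old; install a newer release." % sys.version_info[:2])
```

Passing a tuple such as (3, 4) lets you try the comparison without changing interpreters.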
You’ll need the pip package manager. Often pip is included with Python. To check, run this command:
pip --version
If it’s missing, you’ll need to install it. If it’s already present, it can’t hurt to make sure it’s the latest version of pip. On Linux and Mac systems, this is the command:
pip install -U pip
On Windows, use this:
python -m pip install -U pip
ProxyBroker depends on several Python packages. Installing ProxyBroker may pick them up, but it’s cleanest to install them first. The packages are aiohttp, aiodns, and maxminddb. The commands to install them are:
pip install aiohttp
pip install aiodns
pip install maxminddb
On Windows, use these commands:
python -m pip install aiohttp
python -m pip install aiodns
python -m pip install maxminddb
Finally, you’ve got everything you need! Install ProxyBroker with this command:
pip install proxybroker
Or on Windows:
python -m pip install proxybroker
To verify that it works, do this:
proxybroker --help
This should output a message listing the command options.
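You can also confirm from Python that the packages named in the installation steps are visible to the interpreter. This is a small sketch using the standard importlib module; the function name missing_modules is hypothetical, and the check only confirms that each package can be located, not that it works.

```python
import importlib.util

# Module names taken from the packages named in this tutorial.
REQUIRED_MODULES = ["aiohttp", "aiodns", "maxminddb", "proxybroker"]

def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules()
if missing:
    print("Not installed: " + ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerun the matching pip command with the same Python interpreter you used to run this check.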
Finding proxies with find
There are two ways to find proxies with ProxyBroker. One is the “find” command, which allows detailed filtering and checks each one before listing it. The other, the “grab” command, is intended for bulk listing. It has fewer options, doesn’t check if the proxies are live, and saves the list to a file.
The search is asynchronous, checking for multiple proxies at once to create a list faster. ProxyBroker checks for cookie, referrer, and POST request support. Duplicates are removed when creating the list.
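To illustrate why an asynchronous search is faster, here is a self-contained sketch of the general technique: remove duplicates, then run all the checks concurrently with asyncio.gather. It is not ProxyBroker’s own code. The fake_check coroutine is a stand-in that never touches the network, the addresses come from a range reserved for documentation, and asyncio.run requires Python 3.7 or later.

```python
import asyncio

async def fake_check(address):
    """Stand-in for a real network check; treats port 8080 as alive."""
    await asyncio.sleep(0.01)  # simulate network latency
    return address.endswith(":8080")

async def check_all(candidates):
    """Check candidates concurrently and return the live ones without duplicates."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep first-seen order
    results = await asyncio.gather(*(fake_check(a) for a in unique))
    return [a for a, alive in zip(unique, results) if alive]

candidates = ["192.0.2.1:8080", "192.0.2.2:3128", "192.0.2.1:8080", "192.0.2.3:8080"]
live = asyncio.run(check_all(candidates))
print(live)  # ['192.0.2.1:8080', '192.0.2.3:8080']
```

Because the checks overlap, the total wait is close to the slowest single check rather than the sum of all of them.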
The most basic form of the “find” command is:
proxybroker find -l 10
This will find and list ten (or however many you specify) proxies. It’s not very useful by itself, since you haven’t told it what kind of proxies you’re looking for.
You can specify any or all of the following options to narrow the search and specify other behavior.
--countries country …
This option provides one or more 2-letter ISO country codes for the desired locations of proxies, separated by spaces.
You use this option if you want to search a file of proxy addresses rather than the Web. Its value is the path to the file.
--dnsbl
This option checks proxies against Domain Name System blacklists (DNSBLs), in order to avoid malicious proxies.
--format default|json
This option specifies the output format. The permitted values are:
- default is a plain text format.
- json produces a JSON structure.
--limit number (or -l number)
This is the only required option. It specifies how many entries to list.
--lvl level …
This is the level of anonymity. It applies to HTTP proxies only. The values are:
- transparent signifies that the proxy passes your IP address along to the destination, so it provides no anonymity.
- anonymous proxies hide your IP address but still reveal to the destination that a proxy is in use.
- high means that the proxy hides your IP address and also does not identify itself as a proxy.
--outfile path
Normally output goes to the console (standard output). This option specifies a file to write the results to.
--post
Including this option indicates that ProxyBroker should use POST rather than GET requests when checking proxies. The default is GET. This option doesn’t take a value.
--show-stats
If this option is present, ProxyBroker will output verbose statistics. It doesn’t take a value.
--strict
Including this option instructs ProxyBroker to look for exact matches for the types and lvl options. The option doesn’t take a value. By default, ProxyBroker looks for servers that equal or exceed the requirements. You can use -s as a synonym.
--types HTTP|HTTPS|SOCKS4|SOCKS5|CONNECT:25|CONNECT:80 …
This option specifies one or more protocols and connections to match. The permitted values are:
- HTTP proxies handle only HTTP (insecure) connections.
- HTTPS proxies handle secure HTTPS connections as well as HTTP.
- SOCKS4 is version 4 of SOCKS, a general-purpose proxy protocol for TCP connections. SOCKS itself does not encrypt traffic.
- SOCKS5 is version 5 of SOCKS, which also supports UDP connections.
- CONNECT:25 represents a Port 25 (email) connection.
- CONNECT:80 signifies a Port 80 (Web) connection.
Here is an example of a command that uses multiple qualifiers:
proxybroker find -l 20 --types SOCKS4 SOCKS5 --lvl anonymous --strict
To see what the available options are without checking back in this tutorial, you can enter:
proxybroker find --help
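If you call the command line tool from a script, it can help to assemble the arguments from variables rather than by hand. The sketch below is one way to do that. It uses only options that appear in this tutorial’s examples, and the function name build_find_command is made up for illustration.

```python
import shlex

def build_find_command(limit, types=None, lvl=None, countries=None, strict=False):
    """Assemble a `proxybroker find` argument list from the options shown above."""
    command = ["proxybroker", "find", "-l", str(limit)]
    if types:
        command += ["--types"] + list(types)
    if lvl:
        command += ["--lvl", lvl]
    if countries:
        command += ["--countries"] + list(countries)
    if strict:
        command.append("--strict")
    return command

command = build_find_command(20, types=["SOCKS4", "SOCKS5"], lvl="anonymous", strict=True)
# Print the command in a form that is safe to paste into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to subprocess.run(command) to run the search. Keeping the arguments in a list avoids shell quoting problems.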
Listing proxies with grab
The “grab” command is similar to “find,” but it doesn’t check the proxies, and it doesn’t allow all the options. It will collect a large list more quickly. The options it accepts include -l, --countries, --format, and --outfile, which work as they do for “find.”
Here is an example:
proxybroker grab -l 500 --countries IS GR HU --format json --outfile proxies.js
You can list the available options with this:
proxybroker grab --help
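Once you have saved a list in JSON format, you will probably want to read it back in a script. The sketch below assumes that the file holds a JSON array of records and that each record has “host” and “port” fields. Those field names are an assumption, so open the file and confirm its layout before relying on them. The sample string stands in for the file contents.

```python
import json

def addresses_from_json(text):
    """Return "host:port" strings from a JSON array of proxy records.

    Assumes each record has "host" and "port" keys; other records are skipped.
    """
    records = json.loads(text)
    return [
        "%s:%s" % (record["host"], record["port"])
        for record in records
        if isinstance(record, dict) and "host" in record and "port" in record
    ]

# Hypothetical sample standing in for the contents of the saved file.
sample = '[{"host": "192.0.2.10", "port": 8080}, {"host": "192.0.2.11", "port": 3128}]'
print(addresses_from_json(sample))  # ['192.0.2.10:8080', '192.0.2.11:3128']
```

To read a real file, open it, pass the result of its read() method to the function, and adjust the field names if your output differs.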
Listing proxies with the API
Using the API within your own Python application is more flexible than the command line, particularly in setting operating parameters. You can build a GUI application or a Web service to find proxies.
A detailed explanation of how to use the API is beyond the scope of this tutorial. The API documentation provides the necessary information.
A find or grab operation uses the Broker class. It needs to be imported:
from proxybroker import Broker
The first step is to instantiate the class. There are several optional parameters when instantiating a Broker object, but the defaults will give reasonable behavior.
broker = Broker()
Example code for the find and grab functions is available. These functions are coroutines, so they can run asynchronously. The samples use the asyncio library to gather the results of the calls. It would be possible to run a simple loop and wait for each call to complete, but that would be very slow.
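As a rough guide to what such code looks like, here is a sketch adapted from the pattern in the project’s published examples: the broker places each proxy it finds on an asyncio queue, followed by None when it is done, and a second coroutine drains the queue. The helper names collect and main are this tutorial’s own. The sketch has not been verified against any particular ProxyBroker release, so treat the Broker and find arguments as assumptions to check against the API documentation.

```python
import asyncio

async def collect(queue):
    """Drain proxies from the queue until the None sentinel arrives."""
    found = []
    while True:
        proxy = await queue.get()
        if proxy is None:  # the broker signals completion with None
            break
        found.append(proxy)
    return found

async def main():
    # Imported here so the helper above can be used without ProxyBroker installed.
    from proxybroker import Broker

    proxies = asyncio.Queue()
    broker = Broker(proxies)
    # Run the search and the consumer concurrently.
    _, found = await asyncio.gather(
        broker.find(types=["HTTP", "HTTPS"], limit=10),
        collect(proxies),
    )
    for proxy in found:
        print("Found proxy: %s" % proxy)

# To run the search: asyncio.run(main())
```

Nothing happens until you call asyncio.run(main()), which needs Python 3.7 or later, a working ProxyBroker installation, and network access. Passing the queue to Broker is what lets your own code receive the results as they arrive.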
This article has given you a start with ProxyBroker, but there’s plenty more to learn. The best places for detailed information are the ProxyBroker GitHub repository and the online documentation. It’s sometimes possible to get questions answered on Stack Overflow. ProxyBroker doesn’t have an elegant user interface, but it can be very useful in searching through the huge number of proxies on the Internet.