How to configure Trusted Proxies with Scrapy

Overview

This guide provides a step-by-step setup for a Scrapy project that uses TrustedProxies for rotating proxy support. You will set up a virtual environment, install dependencies, configure middleware, write a spider, and test the integration.

 

What is Scrapy?

Scrapy is a Python framework for fast, scalable web scraping. It handles asynchronous requests, data parsing, and export, making it ideal for large-scale web data extraction tasks.

 

Step-by-Step Setup:

 

Step 1: Create Virtual Environment

python3 -m venv scrapy_env
source scrapy_env/bin/activate

For Windows:
scrapy_env\Scripts\activate

Step 2: Install Scrapy & Rotating Proxy Middleware

pip install scrapy scrapy-rotating-proxies
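To confirm that both packages installed correctly before continuing, you can check them from the same virtual environment:

scrapy version
pip show scrapy-rotating-proxies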

Step 3: Create Scrapy Project

scrapy startproject chair_scraper
cd chair_scraper

Project structure:

 

chair_scraper/
├── chair_scraper/
│   ├── middlewares.py
│   ├── settings.py
│   └── spiders/
│       └── chair_products_rotating.py
└── scrapy.cfg

Step 4: Trusted Proxy List in settings.py

 

TrustedProxies provides a list of authenticated proxies. Configure them like so:

 

ROTATING_PROXY_LIST = [
    "http://testuser:password@shp-testuser-us-v00001.tp-ns.com:27281",
    "http://testuser:password@shp-testuser-us-v00002.tp-ns.com:27281",
    "http://testuser:password@shp-testuser-us-v00003.tp-ns.com:27281",
]

Scrapy will rotate through these proxies automatically; if one fails or is detected as banned, the request is retried through another proxy.
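The rotation behaviour can be tuned with a few optional settings from scrapy-rotating-proxies. The values below are a sketch based on the library's documented defaults; check the scrapy-rotating-proxies documentation for your installed version before relying on them:

ROTATING_PROXY_PAGE_RETRY_TIMES = 5    # retries of a page with a different proxy before giving up
ROTATING_PROXY_BACKOFF_BASE = 300      # seconds to wait before re-checking a proxy marked as dead
ROTATING_PROXY_CLOSE_SPIDER = False    # keep crawling even if every proxy is currently marked dead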

 

Step 5: Random User-Agent Middleware

 

Some websites block scrapers that send Scrapy's default or a repeated User-Agent header. To randomize the User-Agent on each request, add the following to middlewares.py:

import random


class RandomUserAgentMiddleware:
    """Set a random User-Agent from the USER_AGENTS setting on each outgoing request."""

    def process_request(self, request, spider):
        ua = random.choice(spider.settings.getlist('USER_AGENTS'))
        request.headers.setdefault("User-Agent", ua)

Add to settings.py:

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
]

 

Step 6: Enable All Middlewares in settings.py

 

Ensure these middleware settings are included:

 

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'chair_scraper.middlewares.RandomUserAgentMiddleware': 400,
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}

RETRY_TIMES = 5
DOWNLOAD_TIMEOUT = 30
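BanDetectionMiddleware decides when a response means the current proxy has been banned. If your target site returns HTTP 200 block pages, you can plug in a custom policy via ROTATING_PROXY_BAN_POLICY. The snippet below is only a sketch: the module name chair_scraper/policy.py and the "captcha" check are illustrative assumptions, and it presumes your installed version of scrapy-rotating-proxies exposes rotating_proxies.policy.BanDetectionPolicy.

# chair_scraper/policy.py (illustrative module)
from rotating_proxies.policy import BanDetectionPolicy

class CaptchaBanPolicy(BanDetectionPolicy):
    def response_is_ban(self, request, response):
        # Keep the library's default rules, but also treat pages mentioning "captcha" as bans
        ban = super().response_is_ban(request, response)
        return ban or b"captcha" in response.body

Then point the middleware at it in settings.py:

ROTATING_PROXY_BAN_POLICY = 'chair_scraper.policy.CaptchaBanPolicy'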

Step 7: Create the Spider

 

File: chair_scraper/spiders/chair_products_rotating.py

import scrapy


class ChairProductsRotatingSpider(scrapy.Spider):
    name = "chair_products_rotating"
    start_urls = ["https://customers.trustedproxies.com/downloads/demo_products.html"]

    def parse(self, response):
        # Extract the title, price, and description from each product block
        for product in response.css(".product"):
            yield {
                "title": product.css(".title::text").get(),
                "price": product.css(".price::text").get(),
                "description": product.css(".description::text").get(),
            }
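To verify that rotation is actually happening, one optional tweak is to log or record the proxy that served each response; the proxy middleware stores it in the request meta, which is visible on the response. The variation of parse below is a sketch, and the extra "proxy" field is purely for debugging:

    def parse(self, response):
        # The proxy used for this request is available via response.meta (set by the proxy middleware)
        proxy_used = response.meta.get("proxy")
        self.logger.info("Fetched %s via %s", response.url, proxy_used)
        for product in response.css(".product"):
            yield {
                "title": product.css(".title::text").get(),
                "price": product.css(".price::text").get(),
                "description": product.css(".description::text").get(),
                "proxy": proxy_used,  # illustrative extra field for checking rotation
            }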

 

Step 8: Run the Spider

 

After creating and configuring your spider, you’re ready to run it and start scraping data.

 

In this example, the spider targets a demo page hosted by TrustedProxies at:

 

https://customers.trustedproxies.com/downloads/demo_products.html

 

This page contains sample product listings, which your spider will crawl to extract product details like title, price, and description.

 

How to run the spider:

 

  1. Activate your virtual environment (if not already active):

source scrapy_env/bin/activate

(For Windows users, run scrapy_env\Scripts\activate)

  2. Navigate to your Scrapy project root directory where the scrapy.cfg file is located.

  3. Run the spider with the following command:

scrapy crawl chair_products_rotating -o chairs.csv

where:

  • scrapy crawl runs the spider.
  • chair_products_rotating is the spider’s name you defined.
  • -o chairs.csv exports the scraped data into a CSV file named chairs.csv.
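The output format follows the file extension, so exporting JSON instead of CSV is just a matter of changing the name; on recent Scrapy versions (2.4+), -O overwrites the output file instead of appending to it:

scrapy crawl chair_products_rotating -O chairs.json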

 

What happens during execution:

  • Scrapy visits the demo products page.
  • It parses the HTML and extracts each product’s title, price, and description.
  • The scraped data is saved into chairs.csv in a structured tabular format.

 

Sample Output:

The resulting chairs.csv file will look like:

title,price,description
"Stylish Wooden Chair","$129.99","Beautiful and sturdy chair crafted from oak wood."
"Ergonomic Office Chair","$199.00","Comfortable office chair with adjustable height."
...

Running the spider on this demo page is a simple way to validate your Scrapy setup, proxy rotation, and data extraction logic before scaling up to production scraping tasks.
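Before running a full crawl, you can also sanity-check the selectors interactively with Scrapy's shell (run it from the project directory so the project settings, including the proxy list, are picked up):

scrapy shell "https://customers.trustedproxies.com/downloads/demo_products.html"
>>> response.css(".product .title::text").get()

This assumes the demo page uses the same .product and .title classes that the spider's parse method expects.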

 
