Overview
This guide provides a step-by-step setup for a Scrapy project that uses TrustedProxies for rotating proxy support. You will set up a virtual environment, install dependencies, configure middleware, write a spider, and test the integration.
What is Scrapy?
Scrapy is a Python framework for fast, scalable web scraping. It handles asynchronous requests, data parsing, and export, making it ideal for large-scale web data extraction tasks.
Step-by-Step Setup:
Step 1: Create Virtual Environment
python3 -m venv scrapy_env
source scrapy_env/bin/activate
For Windows: scrapy_env\Scripts\activate
Step 2: Install Scrapy & Rotating Proxy Middleware
pip install scrapy scrapy-rotating-proxies
Step 3: Create Scrapy Project
scrapy startproject chair_scraper
cd chair_scraper
Project structure:
chair_scraper/
├── chair_scraper/
│   ├── middlewares.py
│   ├── settings.py
│   └── spiders/
│       └── chair_products_rotating.py
└── scrapy.cfg
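If you prefer scaffolding over creating the spider file by hand, scrapy genspider can generate the stub that Step 7 then fills in (run it from inside the project directory):

scrapy genspider chair_products_rotating customers.trustedproxies.com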
Step 4: Trusted Proxy List in settings.py
TrustedProxies provides a list of authenticated proxies. Configure them like so:
ROTATING_PROXY_LIST = [
    "http://testuser:password@shp-testuser-us-v00001.tp-ns.com:27281",
    "http://testuser:password@shp-testuser-us-v00002.tp-ns.com:27281",
    "http://testuser:password@shp-testuser-us-v00003.tp-ns.com:27281",
]
Scrapy will rotate through these proxies automatically; if one fails or is detected as banned, the request is retried through another.
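If you would rather not hardcode credentials in settings.py, scrapy-rotating-proxies also accepts a plain-text file of proxies via ROTATING_PROXY_LIST_PATH (the file name below is an assumption; the file holds one proxy URL per line):

# settings.py -- load the proxy list from a file instead of ROTATING_PROXY_LIST
ROTATING_PROXY_LIST_PATH = "proxies.txt"  # hypothetical file, one proxy URL per line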
Step 5: Random User-Agent Middleware
Some websites block scrapers by detecting default or repeated browser headers. To randomize headers:
Add to middlewares.py:
import random

class RandomUserAgentMiddleware:
    def process_request(self, request, spider):
        ua = random.choice(spider.settings.getlist("USER_AGENTS"))
        # Assign directly (not setdefault) so the random UA wins even if
        # Scrapy's built-in UserAgentMiddleware has already set a default.
        request.headers["User-Agent"] = ua
Add to settings.py:
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
]
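To verify the randomization is actually taking effect, one option is a small throwaway spider pointed at an echo endpoint such as httpbin.org, which reflects request headers back as JSON (a minimal sketch; httpbin.org is an external service used here for illustration, and response.json() requires Scrapy 2.2+):

import scrapy

class UACheckSpider(scrapy.Spider):
    name = "ua_check"

    def start_requests(self):
        # Hit the same echo endpoint several times; dont_filter bypasses
        # the duplicate-request filter so every request actually goes out.
        for _ in range(5):
            yield scrapy.Request("https://httpbin.org/headers", dont_filter=True)

    def parse(self, response):
        # httpbin echoes the headers it received, including the User-Agent
        self.logger.info(response.json()["headers"].get("User-Agent"))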
Step 6: Enable All Middlewares in settings.py
Ensure these middleware settings are included:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'chair_scraper.middlewares.RandomUserAgentMiddleware': 400,
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}
RETRY_TIMES = 5
DOWNLOAD_TIMEOUT = 30
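Ban detection can also be tuned: scrapy-rotating-proxies lets you point ROTATING_PROXY_BAN_POLICY at a custom policy class. The sketch below (class name and module path are assumptions, not part of the project generated above) treats explicit HTTP 403 responses as bans on top of the library's default heuristics:

# chair_scraper/policy.py (hypothetical module)
from rotating_proxies.policy import BanDetectionPolicy

class StrictBanPolicy(BanDetectionPolicy):
    def response_is_ban(self, request, response):
        # Flag 403s as bans in addition to the default checks
        return response.status == 403 or super().response_is_ban(request, response)

Then reference it in settings.py:

ROTATING_PROXY_BAN_POLICY = "chair_scraper.policy.StrictBanPolicy"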
Step 7: Create the Spider
File: chair_scraper/spiders/chair_products_rotating.py
import scrapy

class ChairProductsRotatingSpider(scrapy.Spider):
    name = "chair_products_rotating"
    start_urls = ["https://customers.trustedproxies.com/downloads/demo_products.html"]

    def parse(self, response):
        for product in response.css(".product"):
            yield {
                "title": product.css(".title::text").get(),
                "price": product.css(".price::text").get(),
                "description": product.css(".description::text").get(),
            }
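Before running a full crawl, the CSS selectors can be sanity-checked interactively in Scrapy's shell:

scrapy shell "https://customers.trustedproxies.com/downloads/demo_products.html"
>>> response.css(".product .title::text").getall()
>>> response.css(".product .price::text").getall()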
Step 8: Run the Spider
After creating and configuring your spider, you’re ready to run it and start scraping data.
In this example, the spider targets a demo page hosted by TrustedProxies at:
https://customers.trustedproxies.com/downloads/demo_products.html
This page contains sample product listings, which your spider will crawl to extract product details like title, price, and description.
How to run the spider:
- Activate your virtual environment (if not already active):
  source scrapy_env/bin/activate
  (For Windows: scrapy_env\Scripts\activate)
- Navigate to your Scrapy project root directory, where the scrapy.cfg file is located.
- Run the spider with the following command:
  scrapy crawl chair_products_rotating -o chairs.csv

Here, scrapy crawl runs the spider, chair_products_rotating is the spider’s name you defined, and -o chairs.csv exports the scraped data into a CSV file named chairs.csv.
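Scrapy infers the export format from the output file's extension, so the same command can write JSON instead of CSV:

scrapy crawl chair_products_rotating -o chairs.json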
What happens during execution:
- Scrapy visits the demo products page.
- It parses the HTML and extracts each product’s title, price, and description.
- The scraped data is saved into chairs.csv in a structured tabular format.
Sample Output:
The resulting chairs.csv file will look like:
title,price,description
"Stylish Wooden Chair","$129.99","Beautiful and sturdy chair crafted from oak wood."
"Ergonomic Office Chair","$199.00","Comfortable office chair with adjustable height."
...
Running the spider on this demo page is a simple way to validate your Scrapy setup, proxy rotation, and data extraction logic before scaling up to production scraping tasks.
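As a final sanity check, a few lines of standard-library Python can confirm the CSV loaded cleanly (a minimal sketch; the column names match the fields yielded by the spider):

import csv

with open("chairs.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"Scraped {len(rows)} products")
for row in rows[:3]:
    print(row["title"], "-", row["price"])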