Overview:
This script demonstrates how to set up a Chromium browser through Playwright for web automation tasks while routing traffic through a trusted proxy. Although the code currently uses Chromium, it can be adapted to Firefox by modifying the browser launch settings. The script reads a proxy server from the command-line arguments, fetches a random user agent from a trusted source, and launches a headless or non-headless browser session with those settings. The session navigates to a target URL (in this case, https://httpbin.org/ip) and overrides the navigator.webdriver property so that it returns undefined, which removes one common automation signal.
Installation and Dependencies:
Prerequisites:
- Python 3.7 or higher must be installed.
- A stable internet connection is required.
Virtual Environment (Recommended):
From the command line, navigate to the project folder. Create a virtual environment with:
python -m venv venv
Activate the environment:
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
Required Python Packages:
Install Playwright and requests using pip:
pip install playwright requests
Install the necessary browsers for Playwright by running:
python -m playwright install
Script Configuration:
The script expects a proxy server URL to be passed with the --proxy-server flag. If the proxy is not provided, the script exits with an error message.
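As an alternative way to read the flag, here is a minimal sketch using the standard-library argparse module. The function name parse_proxy is an assumption made for illustration and is not part of the sample script. Note that argparse exits with status 2 when a required flag is missing, whereas the sample script exits with status 1.

```python
import argparse


def parse_proxy(argv):
    """Return the proxy URL passed via --proxy-server (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Launch a proxied browser session")
    parser.add_argument(
        "--proxy-server",
        required=True,
        help="proxy URL, for example http://your-proxy:port",
    )
    # parse_args prints a usage message and exits if the flag is missing
    args = parser.parse_args(argv)
    return args.proxy_server


# Demo with an explicit argument list instead of sys.argv[1:]
print(parse_proxy(["--proxy-server=http://your-proxy:port"]))
```

In the script itself, the call would be parse_proxy(sys.argv[1:]). Both --proxy-server=URL and --proxy-server URL are accepted by argparse.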
User agents are fetched from the Trusted Proxies site:
https://customers.trustedproxies.com/downloads/desktop_useragents.txt
One is chosen randomly for each run.
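The selection step can be sketched separately from the download, which makes it easy to try without network access. The helper name choose_user_agent and the sample strings are assumptions for illustration; the real file is assumed to hold one user agent per line.

```python
import random


def choose_user_agent(raw_text, rng=random):
    """Pick one non-empty line from a newline-separated user-agent list."""
    user_agents = [line.strip() for line in raw_text.splitlines() if line.strip()]
    if not user_agents:
        raise ValueError("No user agents found in the fetched data")
    return rng.choice(user_agents)


# Made-up sample standing in for the downloaded file (not real user agents)
sample = (
    "Mozilla/5.0 (ExampleOS) ExampleBrowser/1.0\n"
    "\n"
    "  Mozilla/5.0 (OtherOS) OtherBrowser/2.0  \n"
)

# A seeded random.Random makes the choice repeatable while experimenting
print(choose_user_agent(sample, random.Random(0)))
```

Passing the random source as a parameter is a design choice for this sketch: the default keeps each run random, while a seeded generator gives repeatable results in tests.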
The sample code launches Chromium with headless=False so that the browser window is visible for debugging; set headless=True for unattended runs. It passes Chromium arguments that disable sandboxing, suppress infobars, and turn off the AutomationControlled Blink feature that flags automated sessions.
A JavaScript snippet is added to the browser context as an init script, so it runs in every page of that context before the page's own scripts. It redefines navigator.webdriver to return undefined, which helps to prevent detection.
Usage:
Run the script from the command line, providing the proxy setting, for example:
python playwright-script.py --proxy-server=http://your-proxy:port
The script prints diagnostic messages indicating which proxy and user agent are in use, along with the target URL.
It then launches the browser, navigates to the target URL, and performs a scroll-down action to mimic user interaction.
Testing Procedure:
- Verify that you can access the user agent URL from your network.
- Run the script with a valid proxy server.
- Confirm via the output (and optionally by checking httpbin.org/ip in the browser) that the proxy is being used and the user agent is randomized.
- Check the printed diagnostic messages for errors during navigation or proxy setup.
- In non-headless mode, observe the browser window to ensure that it loads the target page and scrolls as intended.
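The proxy check in the steps above can be sketched as a small comparison. https://httpbin.org/ip returns a JSON object with an "origin" field holding the caller's address; the helper names and the example addresses below are assumptions for illustration, and the sketch assumes "origin" contains a single address.

```python
import json


def exit_ip(httpbin_body):
    """Extract the 'origin' field from an httpbin.org/ip JSON response body."""
    return json.loads(httpbin_body)["origin"].strip()


def proxy_in_use(httpbin_body, direct_ip):
    """Return True when the reported address differs from the direct public IP."""
    return exit_ip(httpbin_body) != direct_ip


# Example body using documentation-only addresses (not real measurements)
body = '{"origin": "203.0.113.7"}'
print(proxy_in_use(body, direct_ip="198.51.100.4"))
```

In the script, the body could be read from the loaded page, for example with Playwright's page.inner_text("body"), and direct_ip would be the address httpbin reports when the same request is made without the proxy.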
Additional Notes:
The script uses asynchronous programming (async/await) with Playwright's asynchronous API for efficient browser automation. Error handling is in place for issues such as a missing proxy parameter, failures when fetching user agents, or timeouts when loading the target URL.
To adapt the script for Firefox, replace p.chromium with p.firefox and drop the command-line arguments that are specific to Chromium. For production runs, set headless=True in the launch options. The script suits automated tests and data scraping through trusted proxies, where the automation should be less likely to be detected.
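One way to keep the Chromium and Firefox settings apart is to build the launch options in a helper. This is a sketch under assumptions: the function name launch_options and the shortened argument list are illustrative, and only the headless, proxy, and args launch options already used in the sample code appear here.

```python
# Chromium-only switches; a subset of the list in the sample code
CHROMIUM_ARGS = [
    "--no-sandbox",
    "--disable-blink-features=AutomationControlled",
]


def launch_options(browser_name, proxy_server, headless=True):
    """Build keyword arguments for the launch() call of the chosen browser."""
    if browser_name not in ("chromium", "firefox"):
        raise ValueError(f"Unsupported browser: {browser_name}")
    options = {"headless": headless, "proxy": {"server": proxy_server}}
    if browser_name == "chromium":
        # Copy the list so callers cannot modify the shared default
        options["args"] = list(CHROMIUM_ARGS)
    return options


print(launch_options("firefox", "http://your-proxy:port"))
```

Inside the async with block, the result could then be used as browser = await getattr(p, browser_name).launch(**launch_options(browser_name, proxy_server)), since Playwright exposes the browsers as p.chromium and p.firefox.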
Sample Code:
import sys
import asyncio
import random
import requests
from playwright.async_api import async_playwright


def get_proxy_from_args():
    """Return the value of the --proxy-server flag, or exit if it is missing."""
    proxy_server = None
    for arg in sys.argv[1:]:
        if arg.startswith("--proxy-server="):
            # Split on the first "=" only, so proxy URLs that contain "=" stay intact
            proxy_server = arg.split("=", 1)[1]
    if not proxy_server:
        print("Error: Please provide a proxy using the --proxy-server flag. Example: --proxy-server=http://your-proxy:port")
        sys.exit(1)
    return proxy_server


def get_random_user_agent():
    """Download the user-agent list and return one entry chosen at random."""
    url = "https://customers.trustedproxies.com/downloads/desktop_useragents.txt"
    try:
        response = requests.get(url, timeout=10)
        if response.status_code != 200 or not response.text.strip():
            raise Exception("Failed to fetch user agents")
        # Keep only non-empty lines, one user agent per line
        user_agents = [ua.strip() for ua in response.text.splitlines() if ua.strip()]
        if not user_agents:
            raise Exception("No user agents found in the fetched data")
        return random.choice(user_agents)
    except Exception as e:
        print(f"Error fetching user agents: {e}")
        raise


async def main():
    proxy_server = get_proxy_from_args()
    random_user_agent = get_random_user_agent()
    target_url = "https://httpbin.org/ip"
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=False,  # Change to True for headless mode
            proxy={"server": proxy_server},
            args=[
                "--no-sandbox",
                "--disable-setuid-sandbox",
                "--disable-infobars",
                "--disable-blink-features=AutomationControlled",
                "--disable-extensions",
                "--disable-dev-shm-usage",
                "--disable-gpu",
                "--log-level=3",
            ],
        )
        try:
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent=random_user_agent,
            )
            # Make navigator.webdriver return undefined for detection evasion
            await context.add_init_script(
                "Object.defineProperty(navigator, 'webdriver', { get: () => undefined });"
            )
            page = await context.new_page()
            print(f"Using proxy-server: {proxy_server}")
            print(f"Using User-Agent: {random_user_agent}")
            print(f"Navigating to: {target_url}")
            await page.goto(target_url, wait_until="networkidle", timeout=30000)
            # Scroll the page gradually (500ms interval between scroll steps)
            await page.evaluate("""async () => {
                await new Promise((resolve) => {
                    let totalHeight = 0;
                    const distance = 100;
                    const timer = setInterval(() => {
                        window.scrollBy(0, distance);
                        totalHeight += distance;
                        if (totalHeight >= document.body.scrollHeight) {
                            clearInterval(timer);
                            resolve();
                        }
                    }, 500);
                });
            }""")
            # Keep the page open briefly so the result can be observed
            await asyncio.sleep(5)
        except Exception as e:
            print("Error during navigation:", e)
        finally:
            # Close the browser even if setup or navigation failed
            await browser.close()


if __name__ == "__main__":
    asyncio.run(main())