Code Samples - HTTP Tunnel
This document contains code samples for requesting the HTTP tunnel programmatically, for developers' reference.
How to use the code samples
- The code samples cannot be run as-is: the tunnel server domain XXX.XXX.com, port 15818, username username (tunnel proxy tid), and password password are all placeholders. Replace them with your own information.
- A tunnel proxy does not require fetching proxy IPs via an API link or any other method; every request has its IP rotated and is forwarded by the tunnel server.
- We recommend disabling HTTP keep-alive, since connection reuse can prevent the tunnel from switching IPs.
- The runtime environment and caveats required to run each sample are noted at the end of the sample; please read them before use.
- If you run into problems while using the samples, please contact after-sales support and we will provide technical help.
Special notes
The samples below are all basic examples; running them does not guarantee that the target site can be scraped successfully. Target sites usually have anti-scraping measures, such as redirecting to a page that requires a CAPTCHA.
We recommend building on the basic samples with the following improvements during development:
- Throttle the request rate to the target site sensibly; we suggest at most 1 request per second per proxy IP to the same site;
- Send HTTP requests with header information that is as complete as possible.
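The two suggestions above can be sketched in Python. This is an illustrative helper only, not part of the official samples; the header values and the `RateLimiter` class are assumptions for demonstration.

```python
import time

# Hypothetical helper: a reasonably complete set of request headers
# (the values are examples; tailor them to the target site)
def build_headers():
    return {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Connection": "close",  # avoid keep-alive so the tunnel can rotate IPs
    }

# Hypothetical helper: allow at most one request per `interval` seconds
class RateLimiter:
    def __init__(self, interval=1.0):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        # sleep just long enough to keep requests `interval` seconds apart
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Before each request through the tunnel, call `limiter.wait()` and pass `headers=build_headers()` to your HTTP client.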
Python3
requests
requests (recommended)
Usage tips
- The requests-based sample supports both http and https pages and is the recommended option
- requests is not part of the Python standard library and must be installed first:
pip install requests
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with requests
Works for both http and https pages
"""
import requests

# tunnel domain:port
tunnel = "XXX.XXX.com:15818"

# username/password auth
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel}
}

# whitelist auth (requires the IP to be whitelisted in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": tunnel},
#     "https": "http://%(proxy)s/" % {"proxy": tunnel}
# }

# target page
target_url = "https://dev.kdlapi.com/testproxy"

# send the request through the tunnel
response = requests.get(target_url, proxies=proxies)

# print the page content
if response.status_code == 200:
    print(response.text)  # do not reuse connections via keep-alive (it prevents the tunnel from switching IPs)
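If the proxy username or password contains characters such as `@`, `:` or `/`, the proxy URL built above becomes ambiguous. A small sketch (the credentials below are made up) shows how to percent-encode the credentials with the standard library before building the URL:

```python
from urllib.parse import quote

# Hypothetical credentials containing special characters
username = "user@name"
password = "p:ss/word"
tunnel = "XXX.XXX.com:15818"

# Percent-encode the credentials so "@" and ":" inside them cannot be
# confused with the URL's own separators
proxy_url = "http://%s:%s@%s/" % (quote(username, safe=""), quote(password, safe=""), tunnel)
print(proxy_url)  # http://user%40name:p%3Ass%2Fword@XXX.XXX.com:15818/
```

The resulting URL can be used for both the "http" and "https" entries of the proxies dict.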
aiohttp
aiohttp
Usage tips
- The aiohttp-based sample supports both http and https pages
- aiohttp is not part of the Python standard library and must be installed first:
pip install aiohttp
- aiohttp requires Python 3.5 or later
- If aiohttp throws an exception when accessing an https site on Windows, call asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) after import asyncio to fix it.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with aiohttp
Works for both http and https pages
"""
import aiohttp
import asyncio
# asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())  # call this if https requests fail on Windows

page_url = "https://dev.kdlapi.com/testproxy"  # target page

# tunnel domain:port
tunnel = "XXX.XXX.com:15818"

# username/password auth
username = "username"
password = "password"
proxy_auth = aiohttp.BasicAuth(username, password)

async def fetch(session, url):
    async with session.get(url, proxy="http://" + tunnel, proxy_auth=proxy_auth) as response:
        return await response.text()

async def main():
    # aiohttp verifies HTTPS certificates strictly by default; pass ssl=False to relax the check
    # async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False)) as session:
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, page_url)
        print(html)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
httpx
httpx
Usage tips
- The httpx-based sample supports both http and https pages
- httpx is not part of the Python standard library and must be installed first:
pip install httpx
- httpx requires Python 3.7 or later
- httpx does not support SOCKS proxies yet
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with httpx
Works for both http and https pages
"""
import httpx

# tunnel domain:port
tunnel = "XXX.XXX.com:15818"

# username/password auth
username = "username"
password = "password"
proxy_url = "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel}
proxies = httpx.Proxy(
    url=proxy_url
)

with httpx.Client(proxies=proxies) as client:
    r = client.get('https://dev.kdlapi.com/testproxy')
    print(r.text)
urllib
urllib
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with urllib
Works for both http and https pages
"""
import urllib.request
import ssl

# disable certificate verification globally to avoid errors on https pages
ssl._create_default_https_context = ssl._create_unverified_context

# tunnel domain:port
tunnel = "XXX.XXX.com:15818"

# username/password auth
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel}
}

# whitelist auth (requires the IP to be whitelisted in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": tunnel},
#     "https": "http://%(proxy)s/" % {"proxy": tunnel}
# }

# target page
target_url = "https://dev.kdlapi.com/testproxy"

# send the request through the tunnel
proxy_support = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_support)
# urllib.request.install_opener(opener)  # note: this sets the proxy globally; all later urllib requests in the process will use it
# response = urllib.request.urlopen(target_url)
response = opener.open(target_url)

# print the page content
if response.code == 200:
    print(response.read().decode('utf-8'))
httpclient
httpclient (IP whitelist)
Usage tips
- The http.client-based sample supports both http and https pages
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with http.client
Works for both http and https pages
"""
import http.client

# proxy server address and port
proxy_host = "XXX.XXX.com"
proxy_port = 15818

# target server address
target_host = "dev.kdlapi.com"

# create the connection object
conn = http.client.HTTPSConnection(proxy_host, proxy_port)

# tunnel through the proxy to the target host
conn.set_tunnel(target_host)

# send the request
conn.request("GET", "/testproxy")

# read the response
response = conn.getresponse()

# print the response status and body
print(response.status, response.reason)
print(response.read().decode('utf-8'))

# close the connection
conn.close()
socket
socket
Usage tips
- The socket-based sample supports both http and https pages
- socks is not part of the Python standard library and must be installed first:
pip install PySocks
- When sending an HTTP request over a raw socket, you must construct the complete HTTP request according to the HTTP protocol format
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with socket
Works for both http and https pages
"""
import socket
import socks  # pip install PySocks

socks.set_default_proxy(socks.HTTP, addr='XXX.XXX.com', port=15818, username='username', password='password')  # set the proxy type to HTTP
# socks.set_default_proxy(socks.SOCKS5, addr='XXX.XXX.com', port=20818)  # set the proxy type to SOCKS
socket.socket = socks.socksocket  # route sockets through the proxy

def main():
    sock = socket.socket()
    sock.connect(('dev.kdlapi.com', 80))  # connect
    # construct a complete HTTP request per the HTTP protocol format: request line, headers, blank line
    request = ('GET /testproxy HTTP/1.1\r\n'
               'Host: dev.kdlapi.com\r\n'
               'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36\r\n'
               'Connection: close\r\n'
               '\r\n')
    response = b''  # received data
    sock.send(request.encode())  # send the request
    chunk = sock.recv(1024)  # receive 1024 bytes at a time
    while chunk:  # keep receiving until no data remains
        response += chunk
        chunk = sock.recv(1024)
    print(response.decode())

if __name__ == '__main__':
    main()
pyppeteer
pyppeteer
Usage tips
- The pyppeteer-based sample supports both http and https pages
- pyppeteer is not part of the Python standard library and must be installed first:
pip install pyppeteer
- pyppeteer requires Python 3.5 or later
- pyppeteer renders pages asynchronously and must be driven with asyncio or a similar library
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Works for both http and https pages
"""
import asyncio
from pyppeteer import launch

# tunnel server
proxy_raw = "XXX.XXX.com:15818"

def accounts():
    # username/password; not needed if your IP is whitelisted
    username = "username"
    password = "password"
    account = {"username": username, "password": password}
    return account

async def main():
    # target page
    target_url = "https://dev.kdlapi.com/testproxy"
    browser = await launch({'headless': False, 'args': ['--disable-infobars', '--proxy-server=' + proxy_raw]})
    page = await browser.newPage()
    await page.authenticate(accounts())  # for whitelist auth, comment out this line (requires the IP to be whitelisted in advance)
    await page.setViewport({'width': 1920, 'height': 1080})
    # send the request through the proxy
    await page.goto(target_url)
    await asyncio.sleep(20)  # pause before closing the browser
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
playwright
playwright
Usage tips
- The playwright-based sample supports both http and https pages
- playwright is not part of the Python standard library and must be installed first:
pip install playwright
- If no supported browser is installed on your machine, run
playwright install
to download the dependencies
- playwright requires Python 3.7 or later
- playwright supports both sync and async execution; the sample below is synchronous
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Works for both http and https pages
"""
from playwright.sync_api import sync_playwright

# tunnel server:port
tunnel = "XXX.XXX.com:15818"

# username/password auth
username = "username"
password = "password"

# target page
url = "https://dev.kdlapi.com/testproxy"

proxies = {
    "server": tunnel,
    "username": username,
    "password": password
}

# whitelist auth (requires the IP to be whitelisted in advance)
# proxies = {
#     "server": tunnel,
# }

with sync_playwright() as playwright:
    # headless=True runs without showing a browser window
    # browser = playwright.chromium.launch(channel="msedge", headless=True, proxy=proxies)  # Microsoft Edge
    # browser = playwright.firefox.launch(headless=True, proxy=proxies)  # Mozilla Firefox
    # browser = playwright.webkit.launch(headless=True, proxy=proxies)  # WebKit, e.g. Apple Safari
    browser = playwright.chromium.launch(channel="chrome", headless=True, proxy=proxies)  # Google Chrome
    context = browser.new_context()
    page = context.new_page()
    page.goto(url)
    content = page.content()
    print(content)
    # other actions...
    browser.close()
Python2
requests
requests (recommended)
Usage tips
- The requests-based sample supports both http and https pages and is the recommended option
- requests is not part of the Python standard library and must be installed first:
pip install requests
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with requests
Works for both http and https pages
"""
import requests

# tunnel domain:port
tunnel = "XXX.XXX.com:15818"

# username/password auth
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel}
}

# whitelist auth (requires the IP to be whitelisted in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": tunnel},
#     "https": "http://%(proxy)s/" % {"proxy": tunnel}
# }

# target page
target_url = "https://dev.kdlapi.com/testproxy"

# send the request through the tunnel
response = requests.get(target_url, proxies=proxies)

# print the page content
if response.status_code == 200:
    print response.text
urllib2
urllib2
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request the tunnel server with urllib2
Works for both http and https pages
"""
import urllib2
import ssl

# disable certificate verification globally to avoid errors on https pages
ssl._create_default_https_context = ssl._create_unverified_context

# tunnel domain:port
tunnel = "XXX.XXX.com:15818"

# username/password auth
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel},
    "https": "https://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel}
}

# whitelist auth (requires the IP to be whitelisted in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": tunnel},
#     "https": "https://%(proxy)s/" % {"proxy": tunnel}
# }

# target page
target_url = "https://dev.kdlapi.com/testproxy"

# send the request through the tunnel
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support)
# urllib2.install_opener(opener)  # note: this sets the proxy globally; all later urllib2 requests in the process will use it
# response = urllib2.urlopen(target_url)
response = opener.open(target_url)

# print the page content
if response.code == 200:
    print response.read()
Python-Selenium
Chrome
Chrome (IP whitelist, recommended)
Usage tips
- Authenticate the proxy with Selenium + Chrome via the IP whitelist
- Runtime requirements: python2/3 + selenium + Chrome + Chromedriver + Windows/Linux/macOS
- Download chromedriver (the chromedriver version must match your Chrome version)
- selenium is not part of the Python standard library and must be installed first:
pip install selenium
(note: from selenium 4.6 on, there is no need to download the driver manually)
- Replace the placeholders in the code:
${tunnelhost:tunnelport}: tunnel domain:port, e.g. "XXX.XXX.com:15818"
${chromedriver_path}: the path to chromedriver on your machine, e.g. "C:\chromedriver.exe"
#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver
import time

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=http://${tunnelhost:tunnelport}')  # tunnel domain:port

# selenium 4.6 and later
driver = webdriver.Chrome(options=chrome_options)
# ${chromedriver_path}: path to the chromedriver binary
# driver = webdriver.Chrome(executable_path="${chromedriver_path}", options=chrome_options)

driver.get("https://dev.kdlapi.com/testproxy")

# print the page content
print(driver.page_source)

# wait 3 seconds, then close the current window; if it is the last window, the browser exits
time.sleep(3)
driver.close()
Chrome (username/password auth)
Usage tips
- Authenticate the proxy with Selenium + Chrome via username/password
- Runtime requirements: python2/3 + selenium + Chrome + Chromedriver + Windows/Linux/macOS
- Download chromedriver (the chromedriver version must match your Chrome version)
- selenium is not part of the Python standard library and must be installed first:
pip install selenium
(note: from selenium 4.6 on, there is no need to download the driver manually)
- Replace the placeholders in the code:
${tunnelhost}: tunnel domain
${tunnelport}: port
${username}: username
${password}: password
${chromedriver_path}: the path to chromedriver on your machine, e.g. "C:\chromedriver.exe"
#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver
import string
import zipfile
import time

def create_proxyauth_extension(tunnelhost, tunnelport, proxy_username, proxy_password, scheme='http', plugin_path=None):
    """Build a Chrome extension that performs proxy authentication
    args:
        tunnelhost (str): your proxy address or domain
        tunnelport (int): proxy port
        proxy_username (str): username
        proxy_password (str): password
    kwargs:
        scheme (str): proxy scheme, http by default
        plugin_path (str): absolute path of the extension
    return str -> plugin_path
    """
    if plugin_path is None:
        plugin_path = 'vimm_chrome_proxyauth_plugin.zip'
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
    """
    background_js = string.Template(
        """
        var config = {
            mode: "fixed_servers",
            rules: {
                singleProxy: {
                    scheme: "${scheme}",
                    host: "${host}",
                    port: parseInt(${port})
                },
                bypassList: ["foobar.com"]
            }
        };
        chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
        function callbackFn(details) {
            return {
                authCredentials: {
                    username: "${username}",
                    password: "${password}"
                }
            };
        }
        chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
        );
        """
    ).substitute(
        host=tunnelhost,
        port=tunnelport,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )
    with zipfile.ZipFile(plugin_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    return plugin_path

proxyauth_plugin_path = create_proxyauth_extension(
    tunnelhost="${tunnelhost}",  # tunnel domain
    tunnelport="${tunnelport}",  # port
    proxy_username="${username}",  # username
    proxy_password="${password}"  # password
)

chrome_options = webdriver.ChromeOptions()
chrome_options.add_extension(proxyauth_plugin_path)

# selenium 4.6 and later
driver = webdriver.Chrome(options=chrome_options)
# ${chromedriver_path}: path to the chromedriver binary
# driver = webdriver.Chrome(executable_path="${chromedriver_path}", options=chrome_options)

driver.get("https://dev.kdlapi.com/testproxy")

# print the page content
print(driver.page_source)

# wait 3 seconds, then close the current window; if it is the last window, the browser exits
time.sleep(3)
driver.close()
Firefox
Firefox (IP whitelist, recommended)
Usage tips
- Authenticate the proxy with Selenium + Firefox via the IP whitelist
- Runtime requirements: python2/3 + selenium + Firefox + geckodriver + Windows/Linux/macOS
- Download geckodriver (the geckodriver version must match your Firefox version)
- selenium is not part of the Python standard library and must be installed first:
pip install selenium
(note: from selenium 4.6 on, there is no need to download the driver manually)
- Replace the placeholders in the code:
${geckodriver_path}: the path to geckodriver on your machine, e.g. "C:\geckodriver.exe"
#!/usr/bin/env python
# encoding: utf-8
import time
from selenium import webdriver

fp = webdriver.FirefoxProfile()
proxy_ip = "XXX.XXX.com"  # tunnel server domain
proxy_port = 15818  # port

fp.set_preference('network.proxy.type', 1)
fp.set_preference('network.proxy.http', proxy_ip)
fp.set_preference('network.proxy.http_port', proxy_port)
fp.set_preference('network.proxy.ssl', proxy_ip)
fp.set_preference('network.proxy.ssl_port', proxy_port)

driver = webdriver.Firefox(executable_path="${geckodriver_path}", firefox_profile=fp)
driver.get('https://dev.kdlapi.com/testproxy')

# print the page content
print(driver.page_source)

# wait 3 seconds, then close the current window; if it is the last window, the browser exits
time.sleep(3)
driver.close()
Firefox (username/password auth)
Usage tips
- Authenticate the proxy with Selenium-wire + Firefox via username/password
- Runtime requirements: python3.4 or later + selenium-wire + Firefox + geckodriver + Windows/Linux/macOS
- Download geckodriver (the geckodriver version must match your Firefox version)
- selenium-wire is not part of the Python standard library and must be installed first:
pip install selenium-wire
- Replace the placeholders in the code:
${geckodriver_path}: the path to geckodriver on your machine, e.g. "C:\geckodriver.exe"
#!/usr/bin/env python
# encoding: utf-8
import time
from seleniumwire import webdriver  # pip install selenium-wire

options = {
    'proxy': {
        'http': 'http://username:password@XXX.XXX.com:15818',
        'https': 'http://username:password@XXX.XXX.com:15818',
    }
}

driver = webdriver.Firefox(seleniumwire_options=options, executable_path="${geckodriver_path}")
driver.get('https://dev.kdlapi.com/testproxy')

# print the page content
print(driver.page_source)

# wait 3 seconds, then close the current window; if it is the last window, the browser exits
time.sleep(3)
driver.close()
Python-DrissionPage
IP whitelist (recommended)
Usage tips
- Uses whitelist authentication
- Runtime requirements: python3 + Windows/Linux
- Supports Chromium-based browsers (such as Chrome and Edge)
- DrissionPage is not part of the Python standard library and must be installed first:
pip install DrissionPage
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from DrissionPage import WebPage, ChromiumOptions
import time

# tunnel domain:port
tunnel = "XXX.XXX.com:15818"

# target page
url = "https://dev.kdlapi.com/testproxy"

co = ChromiumOptions()
co.set_proxy("http://" + tunnel)
page = WebPage(chromium_options=co)
page.get(url)

# print the page content
print(page.html)

# wait 3 seconds, then close the page
time.sleep(3)
page.quit()
Username/password auth
Usage tips
- Uses username/password authentication
- Runtime requirements: python3 + Windows/Linux
- Supports Chromium-based browsers (such as Chrome and Edge)
- DrissionPage is not part of the Python standard library and must be installed first:
pip install DrissionPage
- Replace the placeholders in the code:
tunnelhost: tunnel domain
tunnelport: port
username: proxy username
password: proxy password
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from DrissionPage import WebPage, ChromiumOptions
import string
import os
import time

tunnelhost = 'tunnelhost'  # tunnel domain
tunnelport = 'tunnelport'  # port
username = 'username'  # proxy username
password = 'password'  # proxy password

# target page
url = 'https://dev.kdlapi.com/testproxy'

def create_proxyauth_extension(proxy_host, proxy_port, proxy_username, proxy_password, scheme='http', plugin_folder=None):
    if plugin_folder is None:
        plugin_folder = 'kdl_Chromium_Proxy'  # extension folder name
    if not os.path.exists(plugin_folder):
        os.makedirs(plugin_folder)
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "kdl_Chromium_Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking",
            "browsingData"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
    """
    background_js = string.Template("""
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "${scheme}",
                host: "${host}",
                port: parseInt(${port})
            },
            bypassList: []
        }
    };
    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
    function callbackFn(details) {
        return {
            authCredentials: {
                username: "${username}",
                password: "${password}"
            }
        };
    }
    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    );
    """).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )
    with open(os.path.join(plugin_folder, "manifest.json"), "w") as manifest_file:
        manifest_file.write(manifest_json)
    with open(os.path.join(plugin_folder, "background.js"), "w") as background_file:
        background_file.write(background_js)
    return plugin_folder

proxyauth_plugin_folder = create_proxyauth_extension(
    proxy_host=tunnelhost,
    proxy_port=tunnelport,
    proxy_username=username,
    proxy_password=password
)

co = ChromiumOptions()
current_directory = os.path.dirname(os.path.abspath(__file__))
co.add_extension(os.path.join(current_directory, 'kdl_Chromium_Proxy'))
page = WebPage(chromium_options=co)
page.get(url)

# print the page content
print(page.html)

# wait 3 seconds, then close the page
time.sleep(3)
page.quit()
Python-Scrapy
Usage tips
- Works for both http and https pages
- scrapy is not part of the Python standard library and must be installed first:
pip install scrapy
- Run the following command in the top-level tutorial directory to see the result:
scrapy crawl kdl
- When using a tunnel proxy, Scrapy may reuse previously established connections, which prevents the IP from rotating; add
Connection: close
to the headers
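Besides setting the header per-request in the middleware, Connection: close can also be applied project-wide. A hedged sketch of a settings.py fragment (assuming the default tutorial layout):

```python
# tutorial/settings.py -- send Connection: close with every request so the
# tunnel is free to rotate IPs between requests
DEFAULT_REQUEST_HEADERS = {
    "Connection": "close",
}
```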
Scrapy project layout
Run the command: scrapy startproject tutorial
This creates a new Scrapy project: a tutorial directory with the following contents
tutorial/
    scrapy.cfg            # the project's configuration file
    tutorial/             # the project's python module; you will add your code here
        __init__.py
        items.py          # the project's item definitions
        pipelines.py      # the project's pipelines
        settings.py       # the project's settings
        spiders/          # directory for spider code
            __init__.py
            ...
kdl_spider.py
Write the spider: create a kdl_spider.py file in the tutorial/spiders/ directory
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import scrapy

class KdlSpider(scrapy.spiders.Spider):
    name = "kdl"

    def start_requests(self):
        url = "https://dev.kdlapi.com/testproxy"
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        print(response.text)

# If scrapy raises an ssl exception "('SSL routines', 'ssl3_get_record', 'wrong version number')", you can try uncommenting the following code
# from OpenSSL import SSL
# from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory
#
# init = ScrapyClientContextFactory.__init__
# def init2(self, *args, **kwargs):
#     init(self, *args, **kwargs)
#     self.method = SSL.SSLv23_METHOD
# ScrapyClientContextFactory.__init__ = init2
middlewares.py
Add a ProxyDownloaderMiddleware, i.e. the proxy middleware, to middlewares.py
- Replace the placeholders in the code: username (your username), password (your password), XXX.XXX.com (tunnel domain)
# -*- coding: utf-8 -*-
from scrapy import signals

class ProxyDownloaderMiddleware:
    _proxy = ('XXX.XXX.com', '15818')

    def process_request(self, request, spider):
        # username/password auth
        username = "username"
        password = "password"
        request.meta['proxy'] = "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": ':'.join(ProxyDownloaderMiddleware._proxy)}

        # whitelist auth
        # request.meta['proxy'] = "http://%(proxy)s/" % {"proxy": ':'.join(ProxyDownloaderMiddleware._proxy)}
        request.headers["Connection"] = "close"
        return None

    def process_exception(self, request, exception, spider):
        """Catch 407 errors"""
        if "'status': 407" in exception.__str__():  # the exception text may differ between versions; debug yours and adjust the condition
            from scrapy.resolver import dnscache
            dnscache.__delitem__(ProxyDownloaderMiddleware._proxy[0])  # drop the proxy host from the DNS cache
        return exception
settings.py
Enable the ProxyDownloaderMiddleware proxy middleware in settings.py
# -*- coding: utf-8 -*-
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'tutorial.middlewares.ProxyDownloaderMiddleware': 100,
}
Python-feapder
Usage tips
- Works for both http and https pages
- Requires Python 3.6 or later
- feapder is not part of the Python standard library and must be installed first:
pip install feapder
- Use the command
feapder create -s py3_feapder
to create a lightweight spider
py3_feapder.py
Add a download_midware method, i.e. the download middleware, to py3_feapder.py
import feapder

class Py3Feapder(feapder.AirSpider):
    def start_requests(self):
        yield feapder.Request("https://dev.kdlapi.com/testproxy")

    def download_midware(self, request):
        # tunnel domain:port
        tunnel = "XXX.XXX.com:15818"
        # username/password auth
        username = "username"
        password = "password"
        proxies = {
            "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel},
            "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel}
        }
        # whitelist auth (requires the IP to be whitelisted in advance)
        # proxies = {
        #     "http": "http://%(proxy)s/" % {"proxy": tunnel},
        #     "https": "http://%(proxy)s/" % {"proxy": tunnel}
        # }
        request.proxies = proxies
        return request

    def parse(self, request, response):
        print(response.text)

if __name__ == "__main__":
    Py3Feapder().start()
Java
okhttp3
okhttp-3.8.1
Usage tips
- This sample supports both http and https pages
- With username/password auth, okhttp does not send each request twice for authentication, so it performs the same as whitelist access
- For username/password auth you must override the authenticate method of Authenticator
- Add the dependency
- We recommend disabling HTTP keep-alive, since connection reuse can prevent the tunnel from switching IPs.
import okhttp3.*;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;

public class TestProxyOKHttpClient {
    public static void main(String args[]) throws IOException {
        // target site
        String targetUrl = "https://dev.kdlapi.com/testproxy";

        // username/password; not needed if your IP is whitelisted
        final String username = "username";
        final String password = "password";

        String ip = "XXX.XXX.com"; // proxy server domain
        int port = 15818;

        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip, port));

        Authenticator authenticator = new Authenticator() {
            @Override
            public Request authenticate(Route route, Response response) throws IOException {
                String credential = Credentials.basic(username, password);
                return response.request().newBuilder()
                        .header("Proxy-Authorization", credential)
                        .build();
            }
        };

        OkHttpClient client = new OkHttpClient.Builder()
                .proxy(proxy)
                .proxyAuthenticator(authenticator)
                .build();

        Request request = new Request.Builder()
                .url(targetUrl)
                .addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3100.0 Safari/537.36")
                .addHeader("Connection", "close")
                .build();

        Response response = client.newCall(request).execute();
        System.out.println(response.body().string());
    }
}
httpclient
HttpClient-4.5.6
Usage tips
- This sample supports both http and https pages
- With username/password auth, httpclient sends each request twice for authentication, which increases request time; whitelist access is recommended
- If multiple usernames/passwords are used for authentication, add AuthCacheValue.setAuthCache(new AuthCacheImpl()); to the code
- Dependency downloads:
httpclient-4.5.6.jar
httpcore-4.4.10.jar
commons-codec-1.10.jar
commons-logging-1.2.jar
import java.net.URL;

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * Request the tunnel server with httpclient; works for both http and https pages
 */
public class TestProxyHttpClient {
    private static String pageUrl = "https://dev.kdlapi.com/testproxy"; // target page
    private static String proxyIp = "XXX.XXX.com"; // tunnel server domain
    private static int proxyPort = 15818; // port
    // username/password; not needed if your IP is whitelisted
    private static String username = "username";
    private static String password = "password";

    public static void main(String[] args) throws Exception {
        // since JDK 8u111, enable proxy username/password auth when the target page uses HTTPS
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        CredentialsProvider credsProvider = new BasicCredentialsProvider();
        credsProvider.setCredentials(new AuthScope(proxyIp, proxyPort),
                new UsernamePasswordCredentials(username, password));
        CloseableHttpClient httpclient = HttpClients.custom().setDefaultCredentialsProvider(credsProvider).build();
        try {
            URL url = new URL(pageUrl);
            HttpHost target = new HttpHost(url.getHost(), url.getDefaultPort(), url.getProtocol());
            HttpHost proxy = new HttpHost(proxyIp, proxyPort);
            /*
            timeout settings differ slightly between httpclient versions; this matches 4.5.6
            setConnectTimeout: connection timeout
            setConnectionRequestTimeout: timeout for obtaining a Connection from the connection manager
            setSocketTimeout: timeout for receiving data
            */
            RequestConfig config = RequestConfig.custom().setProxy(proxy).setConnectTimeout(6000)
                    .setConnectionRequestTimeout(2000).setSocketTimeout(6000).build();
            HttpGet httpget = new HttpGet(url.getPath());
            httpget.setConfig(config);
            httpget.addHeader("Accept-Encoding", "gzip"); // gzip-compress the transfer to speed up access
            httpget.addHeader("Connection", "close");
            httpget.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36");
            CloseableHttpResponse response = httpclient.execute(target, httpget);
            try {
                System.out.println(response.getStatusLine());
                System.out.println(EntityUtils.toString(response.getEntity()));
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }
}
jsoup
Send requests with jsoup
Usage tips
- This sample supports both http and https pages
- With username/password auth, each request is sent twice for authentication, which increases request time; whitelist access is recommended
- If multiple usernames/passwords are used for authentication, add AuthCacheValue.setAuthCache(new AuthCacheImpl()); to the code
- Dependency download:
jsoup-1.13.1.jar
import java.io.IOException;
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TestProxyJsoup {
    // username/password; not needed if your IP is whitelisted
    final static String ProxyUser = "username";
    final static String ProxyPass = "password";
    // tunnel domain and port
    final static String ProxyHost = "XXX.XXX.com";
    final static Integer ProxyPort = 15818;

    public static String getUrlProxyContent(String url) {
        Authenticator.setDefault(new Authenticator() {
            public PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(ProxyUser, ProxyPass.toCharArray());
            }
        });
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ProxyHost, ProxyPort));
        try {
            // handle exceptions and other parameters yourself here
            Document doc = Jsoup.connect(url).followRedirects(false).timeout(3000).proxy(proxy).get();
            if (doc != null) {
                System.out.println(doc.body().html());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // target site
        String targetUrl = "https://dev.kdlapi.com/testproxy";
        // since JDK 8u111, enable proxy username/password auth when the target page uses HTTPS
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        getUrlProxyContent(targetUrl);
    }
}
hutool
Send requests with hutool
Usage tips
- This sample supports both http and https pages
- With username/password auth, each request is sent twice for authentication, which increases request time; whitelist access is recommended
- Dependency download:
hutool-all-5.5.4.jar
import java.net.Authenticator;
import java.net.PasswordAuthentication;

import cn.hutool.http.HttpResponse;
import cn.hutool.http.HttpRequest;

// proxy credentials
class ProxyAuthenticator extends Authenticator {
    private String user, password;

    public ProxyAuthenticator(String user, String password) {
        this.user = user;
        this.password = password;
    }

    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(user, password.toCharArray());
    }
}

public class TestProxyHutool {
    // username/password; not needed if your IP is whitelisted
    final static String ProxyUser = "username";
    final static String ProxyPass = "password";
    // tunnel domain and port
    final static String ProxyHost = "XXX.XXX.com";
    final static Integer ProxyPort = 15818;

    public static void main(String[] args) {
        // target site
        String url = "https://dev.kdlapi.com/testproxy";
        // since JDK 8u111, enable proxy username/password auth when the target page uses HTTPS
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        // set the authentication credentials
        Authenticator.setDefault(new ProxyAuthenticator(ProxyUser, ProxyPass));
        // send the request
        HttpResponse result = HttpRequest.get(url)
                .setHttpProxy(ProxyHost, ProxyPort)
                .timeout(20000) // timeout in milliseconds
                .execute();
        System.out.println(result.body());
    }
}
selenium-java
selenium-java (IP whitelist, recommended)
Usage tips
- Authenticate the proxy with selenium-java via the IP whitelist
- Download chromedriver (the chromedriver version must match your Chrome version)
- Dependency download:
selenium-java-4.1.2.jar
- Replace the placeholders in the code:
${tunnelhost:tunnelport}: tunnel domain:port, e.g. "tpsXXX.XXX.com:15818"
${chromedriver_path}: the path to chromedriver on your machine, e.g. "C:\chromedriver.exe"
import org.openqa.selenium.By;
import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class TestProxySelenium {
    public static void main(String[] args) throws InterruptedException {
        // target site
        String targetUrl = "https://dev.kdlapi.com/testproxy";
        // tunnel domain:port
        String proxyServer = "${tunnelhost:tunnelport}";
        // create the webdriver and configure the proxy
        System.setProperty("webdriver.chrome.driver", "${chromedriver_path}"); // path to the chromedriver binary
        Proxy proxy = new Proxy().setHttpProxy(proxyServer).setSslProxy(proxyServer);
        ChromeOptions options = new ChromeOptions();
        // for headless mode, uncomment the following three lines
        // options.addArguments("--headless");
        // options.addArguments("--no-sandbox");
        // options.addArguments("--disable-dev-shm-usage");
        options.setProxy(proxy);
        WebDriver driver = new ChromeDriver(options);
        // send the request
        driver.get(targetUrl);
        WebElement element = driver.findElement(By.xpath("/html"));
        String resText = element.getText();
        System.out.println(resText);
        Thread.sleep(3000);
        // quit the webdriver
        driver.quit();
    }
}
selenium-java (username/password auth)
Usage tips
- Authenticate the proxy with selenium-java via username/password; GUI mode only
- Download chromedriver (the chromedriver version must match your Chrome version)
- Dependency download:
selenium-java-4.1.2.jar
- Replace the placeholders in the code:
${tunnelhost}: tunnel domain, e.g. "tpsXXX.XXX.com"
${tunnelport}: tunnel port, e.g. "15818"
${username}: username
${password}: password
${chromedriver_path}: the path to chromedriver on your machine, e.g. "C:\chromedriver.exe"
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.util.zip.ZipOutputStream;
import java.util.zip.ZipEntry;
public class TestProxySelenium {
public static void main(String[] args) throws InterruptedException {
// 目标网站
String targetUrl = "http://dev.kdlapi.com/testproxy";
// 创建webdriver驱动,设置代理
System.setProperty("webdriver.chrome.driver", "${chromedriver_path}"); // webdriver驱动路径
File fileBackground = new File("./background.js");
File fileManifest = new File("./manifest.json");
try {
fileBackground.createNewFile();
fileManifest.createNewFile();
} catch (IOException e) {
e.printStackTrace();
}
String stringBackground = "var config = {\r\n"
+ " mode: \"fixed_servers\",\r\n"
+ " rules: {\r\n"
+ " singleProxy: {\r\n"
+ " scheme: \"http\",\r\n"
+ " host: \"${tunnelhost}\",\r\n" // 隧道域名
+ " port: parseInt(${tunnelport})\r\n" // 隧道端口号
+ " },\r\n"
+ " bypassList: [\"localhost\"]\r\n"
+ " }\r\n"
+ "};\r\n"
+ "chrome.proxy.settings.set({value: config, scope: \"regular\"}, function() {});\r\n"
+ "function callbackFn(details) {\r\n"
+ " return {\r\n"
+ " authCredentials: {\r\n"
+ " username: \"${username}\",\r\n" // 用户名
+ " password: \"${password}\"\r\n" // 密码
+ " }\r\n"
+ " };\r\n"
+ "}\r\n"
+ "chrome.webRequest.onAuthRequired.addListener(\r\n"
+ " callbackFn,\r\n"
+ " {urls: [\"<all_urls>\"]},\r\n"
+ " ['blocking']\r\n"
+ ");\r\n";
String stringManifest = "{\r\n"
+ "\"version\": \"1.0.0\",\r\n"
+ "\"manifest_version\": 2,\r\n"
+ "\"name\": \"Chrome Proxy\",\r\n"
+ "\"permissions\": [\r\n"
+ "\"proxy\",\r\n"
+ "\"tabs\",\r\n"
+ "\"unlimitedStorage\",\r\n"
+ "\"storage\",\r\n"
+ "\"<all_urls>\",\r\n"
+ "\"webRequest\",\r\n"
+ "\"webRequestBlocking\"\r\n"
+ "],\r\n"
+ "\"background\": {\r\n"
+ "\"scripts\": [\"background.js\"]\r\n"
+ "},\r\n"
+ "\"minimum_chrome_version\":\"22.0.0\"\r\n"
+ "}\r\n";
try (FileWriter fileWriterBackground = new FileWriter(fileBackground); FileWriter fileWriterManifest = new FileWriter(fileManifest)) {
fileWriterBackground.write(stringBackground);
fileWriterManifest.write(stringManifest);
} catch (IOException e) {
e.printStackTrace();
}
File[] srcFiles = { fileBackground, fileManifest };
File zipFile = new File("./proxy.zip");
try {
zipFile.createNewFile();
FileOutputStream fileOutputStream = new FileOutputStream(zipFile);
ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
byte[] buffer = new byte[1024];
for (File srcFile : srcFiles) {
FileInputStream fileInputStream = new FileInputStream(srcFile);
zipOutputStream.putNextEntry(new ZipEntry(srcFile.getName()));
int len;
while ((len = fileInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, len);
}
zipOutputStream.closeEntry(); // close each entry inside the loop
fileInputStream.close(); // close each input stream inside the loop, otherwise all but the last one leak
}
zipOutputStream.close();
fileOutputStream.close();
} catch (IOException e) {
e.printStackTrace();
}
ChromeOptions options = new ChromeOptions();
options.addExtensions(new File("./proxy.zip"));
WebDriver driver = new ChromeDriver(options);
// Send the request
driver.get(targetUrl);
WebElement element = driver.findElement(By.xpath("/html"));
String resText = element.getText();
System.out.println(resText);
Thread.sleep(3000);
// Close the webdriver
driver.quit();
}
}
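The same extension trick works outside Java: write background.js and manifest.json, zip them, and load the zip through the browser options. A minimal Python sketch mirroring the strings built above (the function name is illustrative; the tunnel values are the usual placeholders and must be replaced):

```python
import json
import zipfile

def build_proxy_extension(zip_path, host, port, username, password):
    """Package a minimal Chrome extension that pins a fixed proxy server
    and answers the proxy's auth challenge (same contents as the Java sample)."""
    manifest = {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                        "<all_urls>", "webRequest", "webRequestBlocking"],
        "background": {"scripts": ["background.js"]},
        "minimum_chrome_version": "22.0.0",
    }
    background = f'''
var config = {{
    mode: "fixed_servers",
    rules: {{
        singleProxy: {{scheme: "http", host: "{host}", port: {int(port)}}},
        bypassList: ["localhost"]
    }}
}};
chrome.proxy.settings.set({{value: config, scope: "regular"}}, function() {{}});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {{
        return {{authCredentials: {{username: "{username}", password: "{password}"}}}};
    }},
    {{urls: ["<all_urls>"]}},
    ["blocking"]
);
'''
    # writestr packs both extension members without temp files on disk
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest))
        zf.writestr("background.js", background)
```

The resulting zip can then be loaded the same way the Java sample does, e.g. with `add_extension("proxy.zip")` on the Chrome options in selenium-python.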
resttemplate
RestTemplate
Usage tips
- This sample supports both http and https pages
- With username/password authentication, httpclient sends each request twice (a 407 challenge followed by the authenticated retry), which adds latency; whitelist access is recommended
- Dependency downloads:
httpclient-4.5.6.jar
httpcore-4.4.10.jar
commons-codec-1.10.jar
commons-logging-1.2.jar
spring-web-5.2.24.jar
spring-beans-5.2.24.jar
spring-core-5.2.24.jar
spring-jcl-5.2.24.jar
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.ProxyAuthenticationStrategy;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;
public class TestProxyRestTemplate {
// Target website
private static String pageUrl = "https://dev.kdlapi.com/testproxy";
// Tunnel domain and port
private static String proxyHost = "XXX.XXX.com";
private static Integer proxyPort = 15818;
// Username and password; not needed if your IP is whitelisted
private static String ProxyUser = "username";
private static String Proxypass = "password";
public static void main(String[] args) {
CredentialsProvider credsProvider = new BasicCredentialsProvider();
credsProvider.setCredentials(
new AuthScope(proxyHost, proxyPort),
new UsernamePasswordCredentials(ProxyUser, Proxypass)
);
HttpHost proxy = new HttpHost(proxyHost, proxyPort);
HttpClientBuilder clientBuilder = HttpClientBuilder.create();
clientBuilder.useSystemProperties();
clientBuilder.setProxy(proxy);
clientBuilder.setDefaultCredentialsProvider(credsProvider);
clientBuilder.setProxyAuthenticationStrategy(new ProxyAuthenticationStrategy());
CloseableHttpClient client = clientBuilder.build();
HttpComponentsClientHttpRequestFactory factory = new HttpComponentsClientHttpRequestFactory();
factory.setHttpClient(client);
RestTemplate restTemplate = new RestTemplate();
restTemplate.setRequestFactory(factory);
String result = restTemplate.getForObject(pageUrl, String.class);
System.out.println(result);
}
}
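The double-request behavior noted in the tips is ordinary Basic proxy authentication: the first attempt is rejected with 407 Proxy Authentication Required, and the client retries with a Proxy-Authorization header. A minimal sketch of what that header contains, using the usual placeholder credentials:

```python
import base64

def proxy_auth_header(username: str, password: str) -> str:
    """Return the Proxy-Authorization value a client sends after a 407
    challenge: 'Basic ' followed by base64("user:pass")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

Clients that send this header preemptively, or a whitelisted IP that needs no header at all, skip the extra challenge round trip.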
playwright
playwright
Usage tips
- This sample authenticates the proxy with a username and password
- Add the playwright dependency to pom.xml
- Replace the placeholders in the code:
tunnelhost / tunnelport: tunnel domain and port, e.g. "XXX.XXX.com" and "15818"
username / password: your username and password
// Add the playwright dependency in pom.xml
<dependencies>
<dependency>
<groupId>com.microsoft.playwright</groupId>
<artifactId>playwright</artifactId>
<version>1.35.0</version>
</dependency>
</dependencies>
package org.example;
import com.microsoft.playwright.*;
public class App {
// Target website
private static String pageUrl = "https://dev.kdlapi.com/testproxy";
// Username/password authentication (tunnel proxy)
private static String tunnelHost = "tunnelhost";
private static String tunnelPort = "tunnelport";
private static String ProxyUser = "username";
private static String Proxypass = "password";
public static void main(String[] args) {
try (Playwright playwright = Playwright.create()) {
Browser browser = playwright.chromium().launch();
BrowserContext context = browser.newContext(new Browser.NewContextOptions()
.setProxy(String.format("http://%s:%s", tunnelHost, tunnelPort))
.setHttpCredentials(ProxyUser, Proxypass));
Page page = context.newPage();
Response response = page.navigate(pageUrl);
System.out.println("Response: " + response.text());
}
}
}
GoLang
Standard library
Standard library
// Request the tunnel server
// Works for both http and https pages
package main
import (
"compress/gzip"
"fmt"
"io"
"io/ioutil"
"net/http"
"net/url"
"os"
)
func main() {
// Username and password; not needed if your IP is whitelisted
username := "username"
password := "password"
// Tunnel server
proxy_raw := "XXX.XXX.com:15818"
proxy_str := fmt.Sprintf("http://%s:%s@%s", username, password, proxy_raw)
proxy, err := url.Parse(proxy_str)
if err != nil {
fmt.Println(err.Error())
return
}
// Target page
page_url := "http://dev.kdlapi.com/testproxy"
// Request the target page
client := &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxy)}}
req, _ := http.NewRequest("GET", page_url, nil)
req.Header.Add("Accept-Encoding", "gzip") // gzip compression speeds up the transfer
res, err := client.Do(req)
if err != nil {
// The request failed
fmt.Println(err.Error())
} else {
defer res.Body.Close() // make sure the body is closed
fmt.Println("status code:", res.StatusCode) // print the status code
// If the response is gzip-compressed, decompress it before reading
if res.Header.Get("Content-Encoding") == "gzip" {
reader, _ := gzip.NewReader(res.Body) // gzip decompression
defer reader.Close()
io.Copy(os.Stdout, reader)
os.Exit(0) // normal exit
}
// Not gzip-compressed; read the body directly
body, _ := ioutil.ReadAll(res.Body)
fmt.Println(string(body))
}
}
CSharp
Standard library
Standard library
Usage tips
- Works for both http and https pages
- HttpWebRequest may reuse a previously established connection through the tunnel, which prevents the IP from rotating; after creating the HttpWebRequest object, set request.KeepAlive = false
using System;
using System.Text;
using System.Net;
using System.IO;
using System.IO.Compression;
namespace csharp_http
{
class Program
{
static void Main(string[] args)
{
// Target page to visit
string page_url = "http://dev.kdlapi.com/testproxy";
// Build the request
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(page_url);
request.Method = "GET";
request.Headers.Add("Accept-Encoding", "Gzip"); // gzip compression speeds up the transfer
// request.KeepAlive = false; // uncomment if a previous IP is being reused
// Tunnel domain and port
string tunnelhost = "XXX.XXX.com";
int tunnelport = 15818;
// Username and password; not needed if your IP is whitelisted
string username = "username";
string password = "password";
// Set the proxy <IP whitelist>
// request.Proxy = new WebProxy(tunnelhost, tunnelport);
// Set the proxy <username/password>
WebProxy proxy = new WebProxy();
proxy.Address = new Uri(String.Format("http://{0}:{1}", tunnelhost, tunnelport));
proxy.Credentials = new NetworkCredential(username, password);
request.Proxy = proxy;
// Request the target page
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Console.WriteLine((int)response.StatusCode); // print the status code
// Decompress and read the response body
using (StreamReader reader = new StreamReader(new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))) {
Console.WriteLine(reader.ReadToEnd());
}
}
}
}
Node.js
Standard library (http+url)
Standard library (works for both http and https)
const http = require("http"); // built-in http module
const url = require("url");
// Target page to visit
const targetUrl = "http://dev.kdlapi.com/testproxy";
const urlParsed = url.parse(targetUrl);
// Tunnel domain
const proxyIp = "XXX.XXX.com"; // tunnel server domain
const proxyPort = "15818"; // port
// Username and password; not needed if your IP is whitelisted
const username = "username";
const password = "password";
const base64 = Buffer.from(username + ":" + password).toString("base64");
const options = {
host : proxyIp,
port : proxyPort,
path : targetUrl,
method : "GET",
headers : {
"Host" : urlParsed.hostname,
"Proxy-Authorization" : "Basic " + base64
}
};
http.request(options, (res) => {
console.log("got response: " + res.statusCode);
// Print the response body (gzip-compressed)
if (res.headers['content-encoding'] && res.headers['content-encoding'].indexOf('gzip') != -1) {
let zlib = require('zlib');
let unzip = zlib.createGunzip();
res.pipe(unzip).pipe(process.stdout);
} else {
// Print the response body (not gzip-compressed)
res.pipe(process.stdout);
}
})
.on("error", (err) => {
console.log(err);
})
.end()
;
Standard library (http+tls+util)
Standard library (works for both http and https requests)
let http = require('http'); // built-in http module
let tls = require('tls'); // built-in tls module
let util = require('util');
// Username and password; not needed if your IP is whitelisted
const username = 'username';
const password = 'password';
const auth = 'Basic ' + Buffer.from(username + ':' + password).toString('base64');
// Tunnel server domain and port
let tunnelhost = 'XXX.XXX.com';
let tunnelport = 15818;
// Target host and path (hostname only, no scheme: CONNECT, Host and SNI all expect a bare hostname)
let remote_host = 'dev.kdlapi.com';
let remote_path = '/testproxy';
// Send the CONNECT request
let req = http.request({
host: tunnelhost,
port: tunnelport,
method: 'CONNECT',
path: util.format('%s:443', remote_host),
headers: {
"Host": remote_host,
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3100.0 Safari/537.36",
"Proxy-Authorization": auth,
"Accept-Encoding": "gzip" // 使用gzip压缩让数据传输更快
}
});
req.on('connect', function (res, socket, head) {
// TLS handshake
let tlsConnection = tls.connect({
host: remote_host,
socket: socket
}, function () {
// Send the GET request
tlsConnection.write(util.format('GET %s HTTP/1.1\r\nHost: %s\r\n\r\n', remote_path, remote_host));
});
tlsConnection.on('data', function (data) {
// Print the response (the raw response message)
console.log(data.toString());
});
});
req.end();
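What the CONNECT call above puts on the wire is plain text: a CONNECT line, a Host header, and the Proxy-Authorization header, followed by a blank line; after the proxy answers "200 Connection established", the client runs its own TLS handshake over the same socket. A hedged Python sketch of the equivalent request bytes (placeholder host and credentials; the function name is illustrative):

```python
import base64

def build_connect_request(host: str, port: int, username: str, password: str) -> bytes:
    """Build the raw HTTP CONNECT request sent to the tunnel server."""
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    lines = [
        f"CONNECT {host}:{port} HTTP/1.1",
        f"Host: {host}:{port}",
        f"Proxy-Authorization: Basic {auth}",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```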
request
request
let request = require('request'); // third-party request library
let util = require('util');
let zlib = require('zlib');
// Username and password; not needed if your IP is whitelisted
const username = 'username';
const password = 'password';
// Target page to visit
let page_url = 'https://dev.kdlapi.com/testproxy';
// Tunnel server domain and port
let tunnelhost = 'XXX.XXX.com';
let tunnelport = 15818;
// Full tunnel proxy url
let proxy = util.format('http://%s:%s@%s:%d', username, password, tunnelhost, tunnelport);
// Send the request
request({
url: page_url,
method: 'GET',
proxy: proxy,
headers: {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3100.0 Safari/537.36",
"Accept-Encoding": "gzip" // gzip compression speeds up the transfer
},
encoding: null, // keep the body as a Buffer so it can be decompressed
}, function(error, res, body) {
if (!error && res.statusCode == 200) {
// Print the response body (gzip-compressed)
if (res.headers['content-encoding'] && res.headers['content-encoding'].indexOf('gzip') != -1) {
zlib.gunzip(body, function(err, dezipped) {
console.log(dezipped.toString());
});
} else {
// Print the response body (not gzip-compressed)
console.log(body);
}
} else {
console.log(error);
}
});
puppeteer
puppeteer (IP whitelist)
Usage tips
- Whitelist-based http/https proxy with Puppeteer
- Runtime requirements: node 7.6.0 or above + puppeteer
- Install puppeteer first:
npm i puppeteer
// puppeteer module
const puppeteer = require('puppeteer');
// Target page to visit
const url = 'http://dev.kuaidaili.com/testproxy';
// Extra headers
const headers = {
'Accept-Encoding': 'gzip' // gzip compression speeds up the transfer
};
// Tunnel server domain and port (semicolons are required here: a bare value
// directly followed by `(async ...)` would be parsed as a function call)
let tunnelhost = 'XXX.XXX.com';
let tunnelport = 15818;
(async () => {
// Launch a browser instance
const browser = await puppeteer.launch({
headless: false, // whether to hide the window; defaults to true, false is easier for debugging
args: [
`--proxy-server=${tunnelhost}:${tunnelport}`,
'--no-sandbox',
'--disable-setuid-sandbox'
]
});
// Open a new page
const page = await browser.newPage();
// Set headers
await page.setExtraHTTPHeaders(headers);
// Visit the target page
await page.goto(url);
})();
puppeteer (username/password authentication)
Usage tips
- Username/password-authenticated http/https proxy with Puppeteer
- Runtime requirements: node 7.6.0 or above + puppeteer
- Install puppeteer first:
npm i puppeteer
// puppeteer module
const puppeteer = require('puppeteer');
// Target page to visit
const url = 'http://dev.kuaidaili.com/testproxy';
// Extra headers
const headers = {
'Accept-Encoding': 'gzip' // gzip compression speeds up the transfer
};
// Tunnel server domain and port
let tunnelhost = 'XXX.XXX.com';
let tunnelport = 15818;
// Username and password (available in the member center)
const username = 'username';
const password = 'password';
(async () => {
// Launch a browser instance
const browser = await puppeteer.launch({
headless: false, // whether to hide the window; defaults to true, false is easier for debugging
args: [
`--proxy-server=${tunnelhost}:${tunnelport}`,
'--no-sandbox',
'--disable-setuid-sandbox'
]
});
// Open a new page
const page = await browser.newPage();
// Set headers
await page.setExtraHTTPHeaders(headers);
// Username/password authentication
await page.authenticate({username: username, password: password});
// Visit the target page
await page.goto(url);
})();
axios
axios
Usage tips
- Install the axios and https-proxy-agent libraries first: npm install axios https-proxy-agent
const axios = require('axios');
// https-proxy-agent 6.0.0 and above
const {HttpsProxyAgent} = require("https-proxy-agent");
// https-proxy-agent below 6.0.0
// const HttpsProxyAgent = require("https-proxy-agent");
// Tunnel domain and port
let tunnelHost = 'XXX.XXX.com';
let tunnelPort = '15818';
// Username and password
let username = 'username';
let password = 'password';
axios({
url: 'https://dev.kdlapi.com/testproxy',
method: "get",
httpAgent: new HttpsProxyAgent(`http://${username}:${password}@${tunnelHost}:${tunnelPort}`),
httpsAgent: new HttpsProxyAgent(`http://${username}:${password}@${tunnelHost}:${tunnelPort}`),
}).then(
res => {
console.log(res.data);
}
).catch(err => {
console.log(err);
})
websocket
websocket
Usage tips
- Install the ws and https-proxy-agent libraries first: npm install ws https-proxy-agent
const WebSocket = require('ws');
// https-proxy-agent 6.0.0 and above
const {HttpsProxyAgent} = require("https-proxy-agent");
// https-proxy-agent below 6.0.0
// const HttpsProxyAgent = require("https-proxy-agent");
// Tunnel domain and port
let tunnelHost = 'XXX.XXX.com';
let tunnelPort = '15818';
// Username and password
let username = 'username';
let password = 'password';
const target = 'ws://echo.websocket.events/';
const agent = new HttpsProxyAgent(`http://${username}:${password}@${tunnelHost}:${tunnelPort}`);
const socket = new WebSocket(target, {agent});
socket.on('open', function () {
console.log('"open" event!');
socket.send('hello world');
});
socket.on('message', function (data, flags) {
console.log('"message" event!', data, flags);
socket.close();
});
Ruby
net/http
net/http (IP whitelist)
# -*- coding: utf-8 -*-
require 'net/http' # built-in net/http module
require 'zlib'
require 'stringio'
# Tunnel server domain and port
tunnelhost = 'XXX.XXX.com'
tunnelport = 15818
# Target page, using the kuaidaili testproxy page as an example
page_url = "https://dev.kuaidaili.com/testproxy"
uri = URI(page_url)
# Create a proxy instance
proxy = Net::HTTP::Proxy(tunnelhost, tunnelport)
# Create a request object
req = Net::HTTP::Get.new(uri)
# Set the User-Agent
req['User-Agent'] = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
req['Accept-Encoding'] = 'gzip' # gzip compression speeds up the transfer
# Send the request through the proxy; for http pages set use_ssl to false
res = proxy.start(uri.hostname, uri.port, :use_ssl => true) do |http|
http.request(req)
end
# Print the status code
puts "status code: #{res.code}"
# Print the response body
if res.code.to_i != 200 then
puts "page content: #{res.body}"
else
gz = Zlib::GzipReader.new(StringIO.new(res.body.to_s))
puts "page content: #{gz.read}"
end
net/http (username/password authentication)
# -*- coding: utf-8 -*-
require 'net/http' # built-in net/http module
require 'zlib'
require 'stringio'
# Tunnel server domain and port
tunnelhost = 'XXX.XXX.com'
tunnelport = 15818
# Username and password
username = 'username'
password = 'password'
# Target page, using the kuaidaili testproxy page as an example
page_url = "https://dev.kuaidaili.com/testproxy"
uri = URI(page_url)
# Create a proxy instance; the proxy credentials here are sent as
# Proxy-Authorization automatically (do not also call req.basic_auth,
# which would leak the credentials to the target site instead)
proxy = Net::HTTP::Proxy(tunnelhost, tunnelport, username, password)
# Create a request object
req = Net::HTTP::Get.new(uri)
# Set the User-Agent
req['User-Agent'] = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
req['Accept-Encoding'] = 'gzip' # gzip compression speeds up the transfer
# Send the request through the proxy; for http pages set use_ssl to false
res = proxy.start(uri.hostname, uri.port, :use_ssl => true) do |http|
http.request(req)
end
# Print the status code
puts "status code: #{res.code}"
# Print the response body
if res.code.to_i != 200 then
puts "page content: #{res.body}"
else
gz = Zlib::GzipReader.new(StringIO.new(res.body.to_s))
puts "page content: #{gz.read}"
end
httparty
httparty (IP whitelist)
require "httparty" # httparty module
require 'zlib'
require 'stringio'
# Tunnel server domain and port
tunnelhost = 'XXX.XXX.com'
tunnelport = 15818
# Target page, using the kuaidaili testproxy page as an example
page_url = 'https://dev.kuaidaili.com/testproxy'
# Set headers
headers = {
"User-Agent" => "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
"Accept-Encoding" => "gzip",
}
# Set the proxy
options = {
:headers => headers,
:http_proxyaddr => tunnelhost,
:http_proxyport => tunnelport,
}
# Send the request
res = HTTParty.get(page_url, options)
# Print the status code
puts "status code: #{res.code}"
# Print the response body
if res.code.to_i != 200 then
puts "page content: #{res.body}"
else
gz = Zlib::GzipReader.new(StringIO.new(res.body.to_s))
puts "page content: #{gz.read}"
end
httparty (username/password authentication)
require "httparty" # httparty module
require 'zlib'
require 'stringio'
# Tunnel server domain and port
tunnelhost = 'XXX.XXX.com'
tunnelport = 15818
# Username and password
username = 'username'
password = 'password'
# Target page, using the kuaidaili testproxy page as an example
page_url = 'https://dev.kuaidaili.com/testproxy'
# Set headers
headers = {
"User-Agent" => "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
"Accept-Encoding" => "gzip",
}
# Set the proxy
options = {
:headers => headers,
:http_proxyaddr => tunnelhost,
:http_proxyport => tunnelport,
:http_proxyuser => username,
:http_proxypass => password,
}
# Send the request
res = HTTParty.get(page_url, options)
# Print the status code
puts "status code: #{res.code}"
# Print the response body
if res.code.to_i != 200 then
puts "page content: #{res.body}"
else
gz = Zlib::GzipReader.new(StringIO.new(res.body.to_s))
puts "page content: #{gz.read}"
end
php
curl
curl
Usage tips
- This sample supports both http and https pages
- curl is not a php built-in; install it first:
Ubuntu/Debian: apt-get install php5-curl
CentOS: yum install php-curl
- Disable HTTP keep-alive to prevent connection reuse from blocking IP rotation through the tunnel.
<?php
// Target page to visit
$page_url = "http://dev.kdlapi.com/testproxy";
$ch = curl_init();
// Tunnel domain and port
$tunnelhost = "XXX.XXX.com";
$tunnelport = "15818";
$proxy = $tunnelhost.":".$tunnelport;
// Tunnel username and password
$username = "username";
$password = "password";
curl_setopt($ch, CURLOPT_URL, $page_url);
// Send a POST request
$requestData["post"] = "send post request";
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($requestData));
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
// Set the proxy
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
// Set the proxy username and password
curl_setopt($ch, CURLOPT_PROXYAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "{$username}:{$password}");
// Custom headers
$headers = array();
$headers["user-agent"] = 'User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0);';
$headers["connection"] = 'Connection: close'; // disable keep-alive so the tunnel can rotate IPs
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// Custom cookie
curl_setopt($ch, CURLOPT_COOKIE, '');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // gzip compression speeds up the transfer
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
echo "$result"; // when run from a web page, quote the variable when printing
echo "\n\nfetch ".$info['url']."\ntimeuse: ".$info['total_time']."s\n\n";
?>