How to Debug an HTTP 400 Bad Request Error in Nginx
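Some common port numbers and their uses:
Port 21: FTP file transfer service
Port 22: SSH
Port 23: TELNET terminal emulation service
Port 25: SMTP (Simple Mail Transfer Protocol)
Port 53: DNS domain name resolution
Port 80: HTTP
Port 110: POP3 (Post Office Protocol version 3)
Port 443: HTTPS (encrypted HTTP)
Port 1433: MS SQL Server default port
Port 1521: Oracle database service
Port 1863: MSN Messenger file transfer
Port 3306: MySQL default port
Port 3389: Microsoft RDP (Remote Desktop)
Port 5631: Symantec pcAnywhere, remote control data transfer
Port 5632: Symantec pcAnywhere, used by the controlling host to scan for controlled hosts
Port 5000: sometimes listed for MS SQL Server (its default port is 1433, see above)
Port 8000: Tencent QQ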
Nginx directory structure
$ cd /etc/nginx
$ ls -l
total 60
drwx------ 2 ubuntu ubuntu 4096 Jun 16 09:27 cert ## SSL certificate directory
drwxr-xr-x 2 root root 4096 Jul 12 2017 conf.d
-rw-r--r-- 1 root root 1077 Feb 11 2017 fastcgi.conf
-rw-r--r-- 1 root root 1007 Feb 11 2017 fastcgi_params
-rw-r--r-- 1 root root 2837 Feb 11 2017 koi-utf
-rw-r--r-- 1 root root 2223 Feb 11 2017 koi-win
-rw-r--r-- 1 root root 3957 Feb 11 2017 mime.types
-rw-r--r-- 1 root root 1501 Aug 31 07:42 nginx.conf ## main configuration file
-rw-r--r-- 1 root root 180 Feb 11 2017 proxy_params
-rw-r--r-- 1 root root 636 Feb 11 2017 scgi_params
drwxr-xr-x 2 root root 4096 Aug 31 09:42 sites-available ## virtual host (proxy) configurations that are available
drwxr-xr-x 2 root root 4096 Jun 15 06:39 sites-enabled ## virtual host (proxy) configurations that are enabled
drwxr-xr-x 2 root root 4096 Jun 4 06:03 snippets
-rw-r--r-- 1 root root 664 Feb 11 2017 uwsgi_params
-rw-r--r-- 1 root root 3071 Feb 11 2017 win-utf
Start, Stop, and Restart Nginx with systemctl
To check the status:
$ sudo systemctl status nginx
To start Nginx:
$ sudo systemctl start nginx
To stop Nginx:
$ sudo systemctl stop nginx
To enable Nginx at boot:
$ sudo systemctl enable nginx
To disable Nginx at boot:
$ sudo systemctl disable nginx
To reload the Nginx service (used to apply configuration changes):
$ sudo systemctl reload nginx
To do a hard restart of Nginx:
$ sudo systemctl restart nginx
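Configuring nginx to block crawlers
Option 1: Place a robots.txt file in the site root directory.
Option 2:
Nginx can filter requests by User-Agent. A simple regular expression at the URL entry point is enough to reject crawler requests that do not meet your requirements: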
location / {
if ($http_user_agent ~* "python|curl|java|wget|httpclient|okhttp") {
return 503;
}
# normal processing
...
}
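The variable $http_user_agent is an nginx variable that can be referenced directly inside a location block. The ~* operator performs a case-insensitive regular expression match. Matching on python alone filters out roughly 80% of Python crawlers, since most of them keep the default User-Agent.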
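To see which User-Agent strings the pattern above would catch before deploying it, the match can be approximated outside nginx. This is a minimal sketch, assuming Python's re module behaves like nginx's PCRE for this simple alternation; the function name is_blocked and the sample User-Agent strings are made up for illustration.

```python
import re

# Same alternation as the nginx rule; re.IGNORECASE plays the role of "~*".
BLOCKED_UA = re.compile(r"python|curl|java|wget|httpclient|okhttp", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the rule would match this User-Agent header (hypothetical helper)."""
    # nginx regex matching is unanchored, so search() is the closest equivalent.
    return BLOCKED_UA.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative User-Agent strings, not taken from real traffic.
    samples = [
        "python-requests/2.31.0",
        "curl/8.4.0",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    ]
    for ua in samples:
        print(f"{ua!r}: {'blocked' if is_blocked(ua) else 'allowed'}")
```

Because the match is an unanchored substring match, a User-Agent that merely contains one of the words (for example "JavaScript" contains "java") is rejected as well, so it is worth running real User-Agent strings from the access log through the check first.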
step 1: Create a crawler rules file containing the blocking policy:
# Block scraping tools such as Scrapy
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
return 403;
}
# Block the listed User-Agents and requests with an empty User-Agent
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$" ) {
return 403;
}
# Block request methods other than GET, HEAD, and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
return 403;
}
# To block a single IP:
#deny 123.45.6.7;
# To block a whole range, from 123.0.0.1 to 123.255.255.254:
#deny 123.0.0.0/8;
# To block the range from 123.45.0.1 to 123.45.255.254:
#deny 123.45.0.0/16;
# To block the range from 123.45.6.1 to 123.45.6.254:
#deny 123.45.6.0/24;
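The address ranges quoted in the deny comments can be checked against their CIDR notation before a rule goes live. The following is a small sketch using Python's standard ipaddress module; host_range is a hypothetical helper name, and the three blocks are the /8, /16, and /24 ranges described in the comments.

```python
import ipaddress
from typing import Tuple


def host_range(cidr: str) -> Tuple[str, str]:
    """Return the first and last host address of an IPv4 CIDR block (hypothetical helper)."""
    net = ipaddress.ip_network(cidr)
    # Skip the network address (.0) and the broadcast address (.255),
    # which is how the ranges in the comments are written.
    return str(net.network_address + 1), str(net.broadcast_address - 1)


if __name__ == "__main__":
    for cidr in ("123.0.0.0/8", "123.45.0.0/16", "123.45.6.0/24"):
        first, last = host_range(cidr)
        print(f"deny {cidr};  # {first} to {last}")
```

Note that the deny directive itself matches every address in the block, including the network and broadcast addresses; the helper only reproduces the host range as the comments state it.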
To test the rules, simulate a crawler with curl -A, for example:
curl -I -A 'YYSpider' www.haoeasy.cn
Output:
[root@izwz93bcx7adgtozg4rvanz conf]# curl -I -A 'YYSpider' www.haoeasy.cn
HTTP/1.1 403 Forbidden
Server: nginx/1.12.0
Date: Wed, 24 Apr 2019 11:35:21 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
step 2: Include the crawler rules in the nginx configuration, in both the port 80 and the port 443 server blocks.
server {
listen 80;
server_name www.wulaoer.org wulaoer.org;
index index.html index.htm index.php;
...................;
include enable-php.conf;
include /usr/local/nginx/conf/anti_spider.conf; # crawler rules file
}
server {
listen 443 ssl;
server_name www.wulaoer.org wulaoer.org;
index index.html index.htm index.php;
..................;
include enable-php.conf;
include /usr/local/nginx/conf/anti_spider.conf; # crawler rules file
}
step 3: Restart nginx, then verify the rules:
[root@wulaoer ~]# curl -I -A "Scrapy" www.wulaoer.org
HTTP/1.1 403 Forbidden
Server: nginx
Date: Tue, 16 Mar 2021 03:09:17 GMT
Content-Type: text/html
Content-Length: 146
Connection: keep-alive
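The curl check above can also be scripted, which helps when several User-Agent strings need to be verified after each configuration change. This is a rough sketch using only Python's standard urllib; status_for_user_agent is a hypothetical helper, and it sends a HEAD request in the same way as curl -I.

```python
import urllib.error
import urllib.request


def status_for_user_agent(url: str, user_agent: str, timeout: float = 5.0) -> int:
    """Send a HEAD request with the given User-Agent and return the HTTP status code."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we want here.
        return err.code
```

For example, status_for_user_agent("http://www.wulaoer.org", "Scrapy") would be expected to return 403 while the rules are active, and a regular browser User-Agent should still get through.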
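User-Agent types
User-Agent                  Description
FeedDemon                   content scraping
BOT/0.1 (BOT for JCE)       SQL injection
CrawlDaddy                  SQL injection
Java                        content scraping
Jullo                       content scraping
Feedly                      content scraping
UniversalFeedParser         content scraping
ApacheBench                 CC attack tool
Swiftbot                    useless crawler
YandexBot                   useless crawler
AhrefsBot                   useless crawler
YisouSpider                 useless crawler (now owned by UC Shenma Search; this spider can be allowed)
jikeSpider                  useless crawler
MJ12bot                     useless crawler
ZmEu phpmyadmin             vulnerability scanning
WinHttp                     scraping, CC attacks
EasouSpider                 useless crawler
HttpClient                  TCP attacks
Microsoft URL Control       scanning
YYSpider                    useless crawler
jaunty                      WordPress brute-force scanner
oBot                        useless crawler
Python-urllib               content scraping
Indy Library                scanning
FlightDeckReports Bot       useless crawler
Linguee Bot                 useless crawler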
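Requests can also be filtered on $request (the full request line), for example to reject URLs that contain blogspot:
location / {
if ($request ~* (Scrapy|Curl|blogspot)) {
return 403;
}
}
A request rejected by this rule shows up in the access log with status 403:
"GET /common/SetLanguage?culture=en&returnUrl=http://b23finaciali.blogspot.com/ HTTP/1.1" 403 656 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"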
nginx HTTP Return Codes
These constants are an easy way to reference the HTTP return codes in your modules.
Constant                        Value
NGX_HTTP_CONTINUE               100
NGX_HTTP_SWITCHING_PROTOCOLS    101
...
More detail:
https://www.nginx.com/resources/wiki/extending/api/http/
How to Debug in Nginx
Once the client's IP address is identified, enable debug mode for that IP only. Nginx can put a specific IP address or range into debug mode with the debug_connection directive in the events context (this requires nginx to be built with --with-debug). The events context is usually found in /etc/nginx/nginx.conf:
events {
# Debugging a certain IP
debug_connection 192.168.55.12; # client getting http 400 errors
}
https://nginx.org/en/docs/ngx_core_module.html#debug_connection
How to fix nginx 400 Bad Request errors
telnet serverip 80
Connecting this way and sending anything that is not a valid HTTP request produces a 400 error.
https://yq.aliyun.com/articles/483987
When nginx returns 400 (Bad Request), it logs the reason to the error log at the "info" level.
To see it, raise the error_log level to debug (edit /etc/nginx/nginx.conf).
The second parameter of error_log sets the logging level and can be one of the following: debug, info, notice, warn, error, crit, alert, or emerg.
At the beginning of /etc/nginx/nginx.conf, add the line:
error_log /var/log/nginx/error.log debug;
Then restart nginx:
sudo service nginx restart
The error log then shows in detail what nginx is doing and why it returns status code 400.
2017/02/08 22:32:24 [debug] 1322#1322: *1 connect to unix:///run/uwsgi/app/socket, fd:20 #2
2017/02/08 22:32:24 [debug] 1322#1322: *1 connected
2017/02/08 22:32:24 [debug] 1322#1322: *1 http upstream connect: 0
2017/02/08 22:32:24 [debug] 1322#1322: *1 posix_memalign: 0000560E1F25A2A0:128 @16
2017/02/08 22:32:24 [debug] 1322#1322: *1 http upstream send request
2017/02/08 22:32:24 [debug] 1322#1322: *1 http upstream send request body
2017/02/08 22:32:24 [debug] 1322#1322: *1 chain writer buf fl:0 s:454
2017/02/08 22:32:24 [debug] 1322#1322: *1 chain writer in: 0000560E1F2A0928
2017/02/08 22:32:24 [debug] 1322#1322: *1 writev: 454 of 454
2017/02/08 22:32:24 [debug] 1322#1322: *1 chain writer out: 0000000000000000
2017/02/08 22:32:24 [debug] 1322#1322: *1 event timer add: 20: 60000:1486593204249
2017/02/08 22:32:24 [debug] 1322#1322: *1 http finalize request: -4, "/?" a:1, c:2
2017/02/08 22:32:24 [debug] 1322#1322: *1 http request count:2 blk:0
2017/02/08 22:32:24 [debug] 1322#1322: *1 post event 0000560E1F2E5DE0
2017/02/08 22:32:24 [debug] 1322#1322: *1 post event 0000560E1F2E5E40
2017/02/08 22:32:24 [debug] 1322#1322: *1 delete posted event 0000560E1F2E5DE0
2017/02/08 22:32:24 [debug] 1322#1322: *1 http run request: "/?"
2017/02/08 22:32:24 [debug] 1322#1322: *1 http upstream check client, write event:1, "/"
2017/02/08 22:32:24 [debug] 1322#1322: *1 http upstream recv(): -1 (11: Resource temporarily unavailable)
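A debug-level error log grows quickly, so it helps to split each line into its fields and keep only the level of interest (the reason for a 400 is logged at "info"). The sketch below assumes the line layout visible in the excerpt above (timestamp, [level], pid#tid, optional *connection number, message); the names ErrorLogEntry, parse_error_line, and entries_at_level are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpt: "YYYY/MM/DD HH:MM:SS [level] pid#tid: *conn message"
ERROR_LINE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] "
    r"(?P<pid>\d+)#(?P<tid>\d+): "
    r"(?:\*(?P<conn>\d+) )?"
    r"(?P<message>.*)$"
)


class ErrorLogEntry(NamedTuple):
    time: str
    level: str
    pid: int
    connection: Optional[int]  # None when the line has no "*N" connection number
    message: str


def parse_error_line(line: str) -> Optional[ErrorLogEntry]:
    """Parse one error log line; return None if it does not fit the layout."""
    m = ERROR_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    conn = m.group("conn")
    return ErrorLogEntry(
        time=m.group("time"),
        level=m.group("level"),
        pid=int(m.group("pid")),
        connection=int(conn) if conn is not None else None,
        message=m.group("message"),
    )


def entries_at_level(lines: Iterable[str], level: str) -> List[ErrorLogEntry]:
    """Keep only the parsed entries logged at exactly the given level."""
    parsed = (parse_error_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == level]
```

Calling entries_at_level(open("/var/log/nginx/error.log"), "info") would then list only the info-level entries, which is where the reason for a 400 response is recorded.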
How to Customize Nginx Web Logs
Setting up the custom log format (CLF) on Nginx. Make sure to place the log_format definition at the beginning of the http {} block:
/etc/nginx/nginx.conf
http {
log_format myclf '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" "$gzip_ratio"';
...
...
}
$remote_addr is the IP address of the visitor
$remote_user is the authenticated user (if any)
$time_local is the local time of the request
$request is the first line of the request
$status is the HTTP status of the response
$body_bytes_sent is the size (in bytes) of the response body sent to the client
$http_referer is the referrer URL
$http_user_agent is the User-Agent string sent by the client
$gzip_ratio is the achieved compression ratio
A real life request logged by this configuration would look like this:
201.217.xx.xx - - [01/Oct/2015:08:46:48 -0400] "HEAD /wp-login.php HTTP/1.1" 200 0 "http://wordpress.com/wp-login.php?redirect_to=http%3A%2F%2Fwordpress.com%2Fwp-admin%2F&reauth=1" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36"
201.217.xx.xx is the IP address of the visitor.
[01/Oct/2015:08:46:48 -0400] is the time of the request.
HEAD /wp-login.php HTTP/1.1 is the first line of the request.
200 is the HTTP status code, which is OK in this case.
http://wordpress.com/wp-login… is the referrer URL.
Mozilla/5.0 (X11; Linux x86_64… is the User-Agent of the client that made the request.
In this case the format is named “myclf”; the name is needed in the next step.
Finally, inside the server {} block, define the access log as usual and add the name of the format you created at the end:
server {
access_log /spool/logs/nginx-access.log myclf;
...
}
Restart Nginx to apply the changes:
service nginx restart
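Once the access log uses the “myclf” format, its lines can be split back into the named fields for analysis. This is a minimal sketch that assumes the field order of the log_format definition above and treats the trailing "$gzip_ratio" field as optional, since the sample line does not include it; parse_access_line is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Field order follows the "myclf" log_format; the last quoted field is optional.
ACCESS_LINE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
    r'(?: "(?P<gzip_ratio>[^"]*)")?$'
)


def parse_access_line(line: str) -> Optional[Dict[str, object]]:
    """Split one access log line into its fields; return None if it does not match."""
    m = ACCESS_LINE.match(line.strip())
    if m is None:
        return None
    fields: Dict[str, object] = dict(m.groupdict())
    # Numeric fields are easier to filter on as integers.
    fields["status"] = int(m.group("status"))
    fields["body_bytes_sent"] = int(m.group("body_bytes_sent"))
    return fields
```

With the fields separated, questions such as "which User-Agents received a 400 or 403" become a simple filter on the status and http_user_agent keys.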
How to Modify the Default “Welcome to nginx!” Page
When you access nginx at http://IP:Port/, the “Welcome to nginx” page appears; by default this is normal. The page is served from /usr/share/nginx/html/index.html.
Changing the default page text
/usr/share/nginx/html/index.html
Welcome to nginx
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.
For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.
Thank you for using nginx.
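If this page appears when you visit your domain, forwarding for that domain has not been configured.
nginx looks for the requested domain in the configured server sections; if it finds no match, the default server responds instead: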
# Default server configuration
#
server {
listen 80 default_server;
listen [::]:80 default_server;
..
}
Default page for port 80:
/var/www/html/index.nginx-debian.html
502 Bad Gateway
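If domain forwarding is configured but you get 502 Bad Gateway, nginx could not reach the backend site. The two usual causes are:
1. The port nginx forwards to differs from the port the backend web server listens on.
2. The upstream address: http://0.0.0.0 and http://localhost behave differently. The former goes through the external network and may be blocked by a firewall; the latter stays on the local machine.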
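Both causes can be narrowed down by testing, from the nginx host, whether anything accepts TCP connections on the address and port that nginx forwards to. Here is a short sketch using Python's standard socket module; is_port_open is a hypothetical helper, and the host and port you pass in should be the ones from your own proxy configuration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, is_port_open("127.0.0.1", 9100) checks a backend assumed to listen on local port 9100. A False result points at the backend (not running, wrong port, or firewalled) rather than at nginx itself.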
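PUT requests return a 400 error
The production nginx configuration uses a domain name, while pre-production uses IP + port; otherwise the two are identical.
I'm running an ASP.NET Core 3.x Web API behind Nginx.
When nginx returns 400 (Bad Request), it logs the reason to the error log at the "info" level.
step 1: Check the log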
/var/log/nginx/api.ggg.com.log
h-"api.ggg.com" -106.156.193.134 for - - - [28/Aug/2020:06:31:21 +0800] "GET /api/EmployeeRole/5f45306a50b21d800c2f0367 HTTP/1.1" "api.usdotnet.com" https:443 200 316 "-" Upstream ["127.0.0.1:9100" (0.020) 200 : -] "-" "-" - -
h-"api.ggg.com" -106.156.193.134 for - - - [28/Aug/2020:06:31:25 +0800] "PUT /api/EmployeeRole HTTP/1.1" "api.usdotnet.com" https:443 400 0 "-" Upstream ["127.0.0.1:9100" (0.004) 400 : -] "-" "-" - -
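The PUT request returns 400, and the Upstream field shows that the backend at 127.0.0.1:9100 itself answered with 400.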
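HTTP 400 error
Google Chrome issue
How to resolve HTTP 400 errors
For a long time I kept running into HTTP 400 errors when visiting my own Yecao blog. My personal portal loaded fine, and other people's blogs loaded fine too. Today I found that visiting the Yecao blog in Google Chrome produced an HTTP 400 error, while other browsers worked normally. The error message was: 400 Bad Request nginx/0.8.15
When I searched earlier, many people said this error was related to DNS, so I reconfigured my DNS many times. I still got HTTP 400 after switching to Google's DNS, after switching to China Telecom's DNS, and after switching back to automatically obtained DNS server addresses. It was extremely frustrating.
Today, out of options, I asked the Hexun blog administrators for help, because I really suspected that something was wrong with the Hexun server hosting the Yecao blog. Their hint is the fix this post is about:
First delete the browser's cookies: in the browser, open “Internet Options” and click delete cookies. Then try logging in again from the blog home page.
Following that idea, I cleared Chrome's browsing data, and it did indeed completely resolve the HTTP 400 errors I had been hitting so often.
It was that simple. After all that effort, the problem was on the client side. I hope this method helps others resolve their HTTP 400 errors.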