
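How to Analyze Search Engine Spider Logs
Search engine spider logs are a powerful resource that webmasters underuse. Analyzing them shows how each search engine crawls a site's content and lets you review spider behavior over a given period.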
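As a rough illustration of this kind of analysis, the sketch below counts requests per crawler in a set of access-log lines. It assumes the server writes the widely used "combined" log format; the `KNOWN_SPIDERS` list, the function name `count_spider_hits`, and the sample lines (which use documentation-only IP addresses) are illustrative assumptions rather than part of any tool mentioned here.

```python
import re
from collections import Counter

# Combined log format:
# ip - user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of user-agent substrings to look for; adjust to your needs.
KNOWN_SPIDERS = ["Googlebot", "bingbot", "Baiduspider", "Dormouse"]


def count_spider_hits(lines):
    """Return a Counter mapping spider name -> number of requests."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for spider in KNOWN_SPIDERS:
            if spider.lower() in agent:
                hits[spider] += 1
                break  # count each request once
    return hits


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '198.51.100.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2023:13:56:02 +0000] "GET /about HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2023:14:00:00 +0000] "GET /blog HTTP/1.1" 200 4096 '
    '"-" "Dormouse/1.0 (hypothetical user-agent string)"',
    '203.0.113.5 - - [10/Oct/2023:14:01:10 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for spider, count in count_spider_hits(SAMPLE_LOG).most_common():
        print(spider, count)
```

In practice you would pass an open log file to `count_spider_hits` instead of the sample list, since the function accepts any iterable of lines.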
IP address (247) | Server name | Country |
---|---|---|
45.61.92.114 | ? | CA |
23.109.195.101 | 23.109.195.101 | NL |
149.71.246.77 | 149.71.246.77 | DE |
23.109.193.136 | 23.109.193.136 | NL |
85.209.78.108 | 85.209.78.108 | GB |
45.56.135.60 | srv60.mailer-static.whitelistmaildomain.net | US |
45.56.133.159 | mail-srv45-56-133-159.host.whoisthismail.net | US |
45.129.235.25 | 45.129.235.25 | NL |
107.181.157.207 | 107.181.157.207 | GB |
45.93.129.217 | 45.93.129.217 | RO |
174.140.201.109 | ? | JP |
185.214.196.145 | ? | FR |
155.254.59.6 | ? | GB |
155.254.50.190 | 155.254.50.190 | GB |
173.211.16.77 | 173.211.16.77.rdns.colocationamerica.com | JP |
107.181.157.83 | 107.181.157.83 | GB |
185.181.122.107 | 185.181.122.107 | DE |
45.80.63.134 | 45.80.63.134 | GB |
45.93.129.212 | 45.93.129.212 | ? |
185.135.212.139 | 185.135.212.139 | GB |
155.254.50.239 | ? | GB |
155.254.50.240 | ? | GB |
154.30.105.177 | ? | US |
23.109.197.62 | ? | NL |
23.109.195.95 | ? | NL |
193.254.54.88 | ? | ? |
174.140.202.82 | ? | JP |
188.208.222.79 | ? | GB |
173.211.16.216 | 173.211.16.216.rdns.colocationamerica.com | JP |
45.129.235.234 | 45.129.235.234 | NL |
155.254.56.174 | ? | GB |
185.181.112.54 | ? | DE |
38.18.20.87 | ? | US |
158.222.119.82 | host.sindad.net | US |
155.254.58.13 | 155.254.58.13 | GB |
154.3.178.88 | 154.3.178.88 | CA |
209.147.81.145 | ? | US |
85.209.79.57 | ? | GB |
158.222.117.172 | host.sindad.net | US |
38.18.27.85 | ? | US |
45.41.130.88 | src088.host.sendmailnice.com | US |
104.232.222.99 | ? | ZA |
23.109.191.107 | ? | NL |
158.222.127.240 | host.sindad.net | US |
45.41.128.147 | src45-41-128-147.mail.berlin-business-school.org | US |
157.97.126.21 | ? | US |
45.93.131.34 | ? | RO |
185.214.199.6 | ? | FR |
173.211.29.83 | ? | JP |
185.214.199.248 | ? | FR |
37.35.46.73 | ? | RO |
45.56.135.44 | srv44.mailer-static.whitelistmaildomain.net | US |
154.3.180.161 | ? | CA |
45.41.129.117 | mail-static117.mailer-static.mailinatorlabs.com | US |
154.60.70.57 | ? | GB |
185.181.120.111 | ? | DE |
158.222.113.3 | host.sindad.net | US |
188.208.222.157 | ? | GB |
185.182.235.236 | 185.182.235.236 | ? |
185.182.235.180 | 185.182.235.180 | DE |
185.182.232.54 | ? | DE |
174.140.202.212 | 174.140.202.212.rdns.colocationamerica.com | JP |
45.65.79.116 | ? | FR |
185.214.198.217 | ? | FR |
154.30.110.214 | ? | US |
157.97.124.193 | ? | US |
185.181.120.223 | ? | DE |
23.109.193.61 | ? | NL |
45.56.133.86 | mail-srv45-56-133-86.host.whoisthismail.net | US |
45.56.135.166 | srv166.mailer-static.whitelistmaildomain.net | US |
154.60.70.255 | ? | GB |
167.160.50.11 | ? | US |
207.199.173.3 | ? | US |
89.35.89.111 | ? | NL |
185.181.122.173 | ? | DE |
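One way to put a list like the table above to use is to check whether a client IP from your own logs appears in it, and to see which address blocks recur. The sketch below does both with Python's standard `ipaddress` module. It is only an illustration: `LISTED_IPS` holds a handful of addresses copied from the table, the helper names are hypothetical, and grouping by /24 is an arbitrary choice, not a statement about how these networks are actually allocated.

```python
import ipaddress
from collections import Counter

# A few addresses copied from the table above; extend with the full list as needed.
LISTED_IPS = {
    ipaddress.ip_address(ip)
    for ip in (
        "45.61.92.114",
        "23.109.195.101",
        "155.254.50.190",
        "155.254.50.239",
        "155.254.50.240",
    )
}


def is_listed(ip_text):
    """Return True if ip_text is a valid IP address that appears in LISTED_IPS."""
    try:
        return ipaddress.ip_address(ip_text.strip()) in LISTED_IPS
    except ValueError:
        return False  # not a valid IP address


def group_by_subnet(ips, prefix=24):
    """Count how many of the given IPv4 addresses fall into each /prefix network."""
    counts = Counter()
    for ip in ips:
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(network)] += 1
    return counts


if __name__ == "__main__":
    print(is_listed("155.254.50.239"))
    print(group_by_subnet(LISTED_IPS).most_common(1))
```

Seeing several addresses in the same block can help decide whether a range-based rule is more practical than blocking addresses one by one.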
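Consider blocking it. Crawlers generally download publicly available internet content, which is free to access by default. However, if you do not want your content used for unauthorized purposes, you should block them.
You can block Dormouse or restrict its access by setting user-agent rules in your site's robots.txt. We recommend installing the Spider Analyser plugin to check whether it actually follows these rules.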
    # robots.txt
    # The following rules will block this agent in most cases
    User-agent: Dormouse
    Disallow: /
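To sanity-check rules like these, and to see whether a crawler honors them, the following sketch parses the rules with Python's standard `urllib.robotparser` and then scans access-log lines for requests that the rules disallow. It assumes the combined log format and that the crawler identifies itself with "Dormouse" in its user-agent header; the sample log lines, the user-agent string in them, and the helper names `build_parser` and `find_violations` are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Dormouse
Disallow: /
"""

# Pulls the request path and user agent out of a combined-format log line.
REQUEST_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def find_violations(lines, parser, agent_token="Dormouse"):
    """Return paths requested by agent_token that the rules disallow."""
    violations = []
    for line in lines:
        match = REQUEST_PATTERN.search(line)
        if not match:
            continue
        if agent_token.lower() not in match.group("agent").lower():
            continue
        path = match.group("path")
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is expected behavior
        if not parser.can_fetch(agent_token, path):
            violations.append(path)
    return violations


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2023:14:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 '
    '"-" "Dormouse/1.0 (hypothetical user-agent string)"',
    '192.0.2.10 - - [10/Oct/2023:14:00:05 +0000] "GET /private/page.html HTTP/1.1" 200 900 '
    '"-" "Dormouse/1.0 (hypothetical user-agent string)"',
    '198.51.100.7 - - [10/Oct/2023:14:01:00 +0000] "GET /private/page.html HTTP/1.1" 200 900 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(rules.can_fetch("Dormouse", "/"))
    print(find_violations(SAMPLE_LOG, rules))
```

Note that robots.txt is advisory: a crawler that ignores it will keep appearing in the log, and only server-side blocking stops it.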
You do not need to do this manually: our WordPress plugin, Spider Analyser, can block unwanted spiders and crawlers for you.