工口画师+

Looking for help from anyone who knows web crawlers

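I found a Spring Boot project on GitHub and followed along, but the crawler never starts properly when I run it. What's going on? Do I need to configure a proxy in application.yml, or is it something else?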

Error log (it keeps repeating until I stop the app):
ERROR 20176 --- [      Thread-11] c.n.a.n.common.utils.RestTemplateUtil    : I/O error on GET request for "http://m.biquta.la/class/1/1.html": Connection reset; nested exception is java.net.SocketException: Connection reset
org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://m.biquta.la/class/1/1.html": Connection reset; nested exception is java.net.SocketException: Connection reset
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:741)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:684)
    at org.springframework.web.client.RestTemplate.getForEntity(RestTemplate.java:359)
    at com.novel.article.novels.common.utils.RestTemplateUtil.getBodyByUtf8(RestTemplateUtil.java:93)
    at com.novel.article.novels.common.crawl.BiqutaCrawlSource.parse(BiqutaCrawlSource.java:43)
    at com.novel.article.novels.common.listener.StartListener.lambda$contextInitialized$0(StartListener.java:54)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:727)
    ... 6 common frames omitted



The project I found on GitHub: https://github.com/Xunzhuo/English-Novel
The crawler that project uses: https://github.com/201206030/novel/wiki/%E3%80%90Java%E3%80%91%E4%BD%BF%E7%94%A8%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F%E9%87%87%E9%9B%86%E6%95%B4%E7%AB%99%E5%B0%8F%E8%AF%B4%E6%95%B0%E6%8D%AE-%E5%B0%8F%E8%AF%B4%E7%B2%BE%E5%93%81%E5%B1%8B%E7%88%AC%E8%99%AB%E6%A8%A1%E5%9D%97%E7%9A%84%E8%AE%BE%E8%AE%A1%E4%B8%8E%E5%AE%9E%E7%8E%B0


d88bc32a

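A connection reset most likely means the site is blocked from your network, so you need a proxy. Try opening the URL in a browser to see whether it loads.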

工口画师+

Reply to post #1 (d88bc32a)

With my proxy turned on, the browser can open the page.
So do I just use the proxy from my VPN client directly, or something else?

MF2JY

Reply to post #2 (工口画师+)

Add your local proxy in the method that constructs the HTTP client.
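
To make that suggestion concrete, here is a minimal JDK-only sketch of sending a request through a local proxy. The host 127.0.0.1 and port 7890 are assumptions for illustration only; use whatever address your own proxy client actually listens on. The class and method names (LocalProxyExample, localProxy, fetchStatus) are hypothetical and do not come from the project.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class LocalProxyExample {

    // Hypothetical values: replace with the host and port your local
    // proxy client really listens on.
    static final String PROXY_HOST = "127.0.0.1";
    static final int PROXY_PORT = 7890;

    /** Builds an HTTP proxy descriptor for a local proxy listener. */
    static Proxy localProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    /** Requests the URL through the given proxy and returns the HTTP status code. */
    static int fetchStatus(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection(proxy);
        conn.setConnectTimeout(5_000);  // fail fast if the proxy is not running
        conn.setReadTimeout(10_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = localProxy(PROXY_HOST, PROXY_PORT);
        System.out.println("Using proxy: " + proxy);
        // Only touch the network when a URL is passed on the command line.
        if (args.length > 0) {
            System.out.println("HTTP status: " + fetchStatus(args[0], proxy));
        }
    }
}
```

The project's RestTemplateUtil is not shown in this thread, so how it builds its client is unknown. The stack trace does show Apache HttpClient underneath RestTemplate, in which case the equivalent hook would be HttpClientBuilder.setProxy(new HttpHost(host, port)). If the RestTemplate were built on Spring's SimpleClientHttpRequestFactory instead, the same java.net.Proxy object could be passed to its setProxy method.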