Friday, 23 August 2013

Why does my Scrapy spider always report "TCP connection timed out"?

My crawl keeps logging:

DEBUG: Retrying (failed 2 times): TCP connection timed out: 110: Connection timed out.
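The numbers in that message come from the operating system. On Linux, errno 110 is ETIMEDOUT, which means the TCP handshake with the server never completed, so the request failed before any HTTP was exchanged. A small sketch for decoding such a code with the standard library follows; `describe_errno` is a hypothetical helper for illustration, and errno values differ between platforms (110 is ETIMEDOUT on Linux but not on, for example, macOS).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric errno value.

    describe_errno is a hypothetical helper, not part of Scrapy.
    """
    # errno.errorcode maps numbers to names such as 'ETIMEDOUT'
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable description
    return name, os.strerror(code)


if __name__ == "__main__":
    # Uses the platform's own value for ETIMEDOUT (110 on Linux)
    print(describe_errno(errno.ETIMEDOUT))
```

On a Linux machine, `describe_errno(110)` should give the same "Connection timed out" text that appears in the Scrapy log.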
P.S. The system is Ubuntu, and this command succeeds:
wget http://www.dmoz.org/Computers/Programming/Languages/Python/Book/
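Since wget can fetch the page, one way to narrow the problem down is to check whether a plain TCP connection opened from Python to the same host and port also succeeds. If it times out too, the issue is at the network level rather than in the spider. Keep in mind that wget honours the `http_proxy` environment variable, so on a machine behind a proxy a direct connection can fail even though wget works. A minimal sketch using only the standard library; `can_connect` is a hypothetical diagnostic helper and the 10-second default timeout is an arbitrary choice:

```python
import socket


def can_connect(host, port=80, timeout=10.0):
    """Try to open a TCP connection and return True if it succeeds.

    can_connect is a hypothetical helper, not part of Scrapy.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, socket.error) as exc:
        # Covers our own timeout, DNS failures, refused connections,
        # and OS-level errors such as ETIMEDOUT
        print("connect to %s:%d failed: %s" % (host, port, exc))
        return False
    sock.close()
    return True
```

For example, running `can_connect("www.dmoz.org")` on the same Ubuntu machine shows whether Python can reach the site directly, without any proxy that wget might be using.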
The spider code:
#!/usr/bin/python
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    ]

    def parse(self, response):
        # Wrap the downloaded page and select every list item
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        for site in sites:
            # Link text, link target, and the item's own text
            title = site.select('a/text()').extract()
            link = site.select('a/@href').extract()
            desc = site.select('text()').extract()
            print title, link, desc
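The "(failed 2 times)" part of the log comes from Scrapy's retry middleware, which by default retries a failed download twice in addition to the first attempt (`RETRY_TIMES = 2`), while `DOWNLOAD_TIMEOUT` (180 seconds by default) caps how long the downloader waits for a request. While diagnosing, it can help to make failures surface faster. A sketch of the relevant lines in the project's `settings.py`; the values 30 and 1 are arbitrary examples, and changing them will not help if the host is genuinely unreachable from this machine:

```python
# settings.py (Scrapy project settings); example values only

# Seconds the downloader waits before giving up on a request (default 180)
DOWNLOAD_TIMEOUT = 30

# Number of retries after the first failed attempt (default 2)
RETRY_ENABLED = True
RETRY_TIMES = 1
```

The timeout can also be set for a single spider through a `download_timeout` class attribute, for example by adding `download_timeout = 30` to `DmozSpider`.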
