Sometimes the silliest little things are the most frustrating (Enough to send one screaming back to urllib...)


So, today was spent on (computer) house-cleaning and then looking at tools for the semi-automation of web-site reviews. It would seem to me that code something like this:
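from twisted.web import client

# Runs inside a method of the app (hence self.url). getPage returns a
# Deferred that fires with the page body as a string, or errbacks on failure.
df = client.getPage(
    str(self.url),
    timeout=20,
)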


(though obviously wrapped up in all the trappings of a Twisted app so that it gets run by the reactor) should retrieve the page at the given URL as a string, or trigger the errback on the Deferred df. Instead, I get back an empty string as the content of the page.

Obviously I'm just missing something simple... which is what makes it frustrating to have spent almost 30 minutes trying to figure out what's going wrong. Oh well, sleep will probably make the answer magically appear in the morning.

Comments

  1. Chris

    Chris on 02/07/2007 9:58 p.m.


    Did you ever figure this one out? I'm having the same problem 2.5 years later. I've been Googling like crazy, to no avail. Even the example code on the Twisted Matrix site doesn't work as expected...

    ARGH!

  2. Chris

    Chris on 02/07/2007 10:26 p.m.


    Why does posting something for all the world to see mean that you'll figure out your gaffe less than 20 minutes later?

    In case anybody else comes across this with the same problem: if you're trying to pull the root page of a site, make sure the URL has a trailing "/". *sigh*

    I was trying to use "http://www.google.com" as the URL, to no avail. When I merely added the trailing slash to make it "http://www.google.com/", it worked fine. :p

    This isn't Python's or Twisted's fault, per se: without a path, the HTTP GET request is invalid and Google drops the connection immediately. It'd be nice if they at least sent back a 400 (Bad Request)... Even Microsoft is kind enough to do that (which it should, based on the RFC).
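For anyone hitting the same thing, the trailing-slash workaround from this comment can be applied in code rather than by hand. This is a minimal sketch, assuming Python 3's urllib.parse (in the Python 2 of the time, the same urlsplit/urlunsplit functions lived in the urlparse module); the helper name ensure_root_path is hypothetical and not part of Twisted. It fills in "/" only when a URL's path is empty, so its result could be handed to client.getPage in place of the raw URL.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_root_path(url: str) -> str:
    """Return url with "/" as its path when the path is empty.

    Hypothetical helper: "http://www.google.com" -> "http://www.google.com/";
    URLs that already have a path are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.path:
        # Already has a path such as "/" or "/a/b": nothing to fix.
        return url
    # Rebuild with the root path; scheme, host, query and fragment are kept.
    return urlunsplit(parts._replace(path="/"))


# The bare host name parses with an empty path; the trailing slash gives "/".
print(repr(urlsplit("http://www.google.com").path))   # ''
print(repr(urlsplit("http://www.google.com/").path))  # '/'
print(ensure_root_path("http://www.google.com"))      # http://www.google.com/
```

Checking the parsed path first, rather than blindly appending "/", keeps other URLs correct: "http://example.com/a/b" is returned as-is, and "http://example.com?q=1" becomes "http://example.com/?q=1".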
