Page refresh using mechanize in Python
I want to track a page using mechanize. I am new to Python and mechanize;
here are the steps I am following. Set up the browser using mechanize
(done only once, at the start of the program):
import mechanize

br = mechanize.Browser()
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent',
                  'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
                  'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
Refresh the page every 3 seconds. I do this by calling br.open(url) in a
loop with a 3-second sleep. Each time I open the URL, I parse it for some
information using BeautifulSoup.
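For reference, the loop described above has roughly the following shape. This is a minimal sketch, not the original program: the names poll, fetch, and parse are hypothetical, and the fetching and parsing steps are passed in as functions so the loop itself can be run without mechanize or BeautifulSoup installed.

```python
import time


def poll(fetch, parse, url, interval=3.0, iterations=5, sleep=time.sleep):
    """Call fetch(url) repeatedly, pausing `interval` seconds between calls.

    `fetch` and `parse` are hypothetical stand-ins for the real steps,
    e.g. fetch = lambda u: br.open(u).read() with a mechanize Browser,
    and parse = a function that runs BeautifulSoup over the HTML.
    """
    results = []
    for i in range(iterations):
        html = fetch(url)              # re-open the page (the "refresh")
        results.append(parse(html))    # extract the information of interest
        if i < iterations - 1:
            sleep(interval)            # wait before the next refresh
    return results
```

With the browser set up as above, this would be called as poll(lambda u: br.open(u).read(), my_parse, url), where my_parse is whatever BeautifulSoup routine extracts the data.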
The program works fine, but on the 12th iteration of the loop,
br.open(url) raises this error:
  File "filename.py", line 20, in subroutine_name
    br.open(url)
  File "/Library/Python/2.7/site-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/Library/Python/2.7/site-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 403: Forbidden
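One way to investigate an error like this is to catch the 403 and retry after a longer pause, to see whether the server is rejecting requests that arrive too quickly. That is only a guess at the cause, not something the traceback confirms. The sketch below assumes a hypothetical helper open_with_backoff and a fetch function such as lambda u: br.open(u).read(); it relies only on the fact that mechanize's HTTP errors expose the status in a code attribute, as urllib2.HTTPError does.

```python
import time


def open_with_backoff(fetch, url, retries=3, base_delay=3.0, sleep=time.sleep):
    """Call fetch(url); on an HTTP 403, wait (doubling each time) and retry.

    Hypothetical helper for illustration. Any error that is not a 403,
    or a 403 on the final attempt, is re-raised unchanged.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            # HTTP errors carry the status code in `.code`
            if getattr(exc, "code", None) != 403 or attempt == retries:
                raise
            sleep(delay)   # back off before trying again
            delay *= 2
```

If the request succeeds after a longer wait, the 3-second interval is probably too aggressive for this site; if it keeps failing, the cause lies elsewhere.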
I am not sure what the issue is. Is there a better way to refresh a page
than opening it again and again in a loop, or should I initialize the
browser each time through the loop?
Thanks for the help!