What happens if you violate a website's terms of service and get flagged by its owner? How can you make it easy for the site owner to contact you, so that they can politely ask you to scale back to what they consider a reasonable level of scraping?
One way to facilitate this is to include information about yourself in the User-Agent header of your requests. We have already seen an example of this in robots.txt files, such as the one from amazon.com, which explicitly names the user agent of Google's crawler: Googlebot.
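For reference, robots.txt entries identify crawlers by exactly this User-Agent string. An excerpt along the following lines (the path shown is illustrative, not taken from Amazon's actual file) singles out Googlebot for its own set of rules:

```
User-agent: Googlebot
Disallow: /example-private-path/
```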
During scraping, you can embed your own information in the User-Agent header of your HTTP requests. To be polite, use something such as 'MyCompany-MyCrawler ([email protected])'. If the remote server flags you as violating its terms, it will almost certainly have logged this header, and providing it in this form gives the site owner a convenient means of contacting you with their concerns.
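As a minimal sketch, here is how you might set such a header using the Python requests library; the crawler name, contact address, and URL are all placeholders:

```python
import requests

# Identify ourselves politely: a crawler name plus a contact address
# (both hypothetical) so the site owner can reach out if needed.
HEADERS = {
    "User-Agent": "MyCompany-MyCrawler ([email protected])",
}

# Every request sent with these headers carries the identifying string,
# which will show up in the remote server's access logs.
response = requests.get("https://www.example.com/some-page", headers=HEADERS)
print(response.status_code)
```

If your crawler makes many requests, you can attach the header once to a requests.Session object (via session.headers.update) so that every call it makes carries the same identifying information.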