I work for a company that does a fair amount of web crawling (no, not that one), and recently the engineering side and the business side have been discussing the finer details of URL validation. On a whim, I created this diagram (among others) to help that discussion along.