Capturing web pages

  • Hi,

    $150,000 sounds like a lot of money!

    From VB, add the Microsoft Internet Transfer Control (MSINET.OCX) to your project.

    Add the Inet control to the form as Inet1.

    Put the following code in the Form_Load event or behind a button.

    Dim content As String

    Inet1.URL = "http://www.microsoft.com"   'replace with your URL
    content = Inet1.OpenURL                  'OpenURL returns the page contents

    'wait until the control has finished before trying to save
    Do While Inet1.StillExecuting
        DoEvents                             'yield so the form stays responsive
    Loop

    'save the page to a file
    Open "C:\mydir\myfile.txt" For Output As #1
    Print #1, content
    Close #1

    Form1.Caption = "Complete"

    You could easily modify the code to loop through a recordset containing all the URLs to collect, stored in a table in your database. For example:
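
    A rough sketch of that loop, assuming ADO (a reference to Microsoft ActiveX Data Objects), a hypothetical table dbo.URLList with a URL column, and the Inet1 control from above. Adjust the connection string and names to your own setup:

    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset
    Dim content As String
    Dim i As Long

    cn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
            "Initial Catalog=MyDb;Integrated Security=SSPI"
    rs.Open "SELECT URL FROM dbo.URLList", cn

    i = 1
    Do Until rs.EOF
        content = Inet1.OpenURL(rs!URL)      'fetch each stored URL
        Open "C:\mydir\page" & i & ".txt" For Output As #1
        Print #1, content
        Close #1
        i = i + 1
        rs.MoveNext
    Loop

    rs.Close
    cn.Close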

    Assuming that the pages are HTML rather than XML, I would be more concerned with the parsing of the data once you have downloaded the pages. How can you guarantee that the sites in question will not make changes to the design/layout of their pages that render your data extraction useless?
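
    For instance, a hypothetical scraper that pulls a value out from between fixed markers, e.g.

    Dim marker As String, price As String
    marker = "<td class=""price"">"          'assumed markup, purely illustrative
    price = Mid$(content, InStr(content, marker) + Len(marker), 10)

    breaks the moment the site renames that class or restructures the table.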

    HTH

    Chris

  • quote:


    Try searching for something called Black Widow. It's sort of a multi-purpose piece of software and it might be free. It can map web sites, download entire web sites or just specific types of files, links or email addresses.

    (I heard about this from a friend... Really! I've never used it! Really!)


    Yes, Black Widow was one of the apps I had in mind when I first responded to this thread.

    It shouldn't be difficult to find; however, be aware of the *possible* legal implications of using such a tool. Using it from the office in the name of a company is one of those things only a fool would do.

    planet115's Excel sheet is really nice and easy to understand, and I guess it's fully sufficient for the given task.

    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/

  • Thanks for all the replies.

    I've had a look at planet115's Excel app and it seems to do what I need - and it's legal.

    There is nothing sinister in what I am trying to do. I am just trying to replace the keystrokes that a human makes with ones controlled by a PC.

    Jeremy

  • >>There is nothing sinister in what I am trying to do.

    I'm sure! Perhaps the point being made is that it is *possible* to abuse such a tool. Given what you are trying to achieve, this would certainly be by accident rather than malicious intent or lack of consideration for others.

    It's good manners to make sure that your spider will back off for a reasonable time if it doesn't get the requested resource, and to limit the maximum number of retries to something reasonable (see the sketch below).

    If possible, test the tool against your own webserver first.
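
    Something along these lines would do it - a minimal sketch, assuming the Inet1 control from the earlier post (the names PoliteFetch and MAX_RETRIES are made up):

    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

    Public Function PoliteFetch(ByVal sUrl As String) As String
        Const MAX_RETRIES As Long = 3
        Dim attempt As Long
        Dim content As String

        For attempt = 1 To MAX_RETRIES
            content = Inet1.OpenURL(sUrl)       'synchronous fetch via the Inet control
            If Len(content) > 0 Then Exit For   'got a response; stop retrying
            Sleep attempt * 5000                'back off: 5s, then 10s, then 15s
        Next attempt

        PoliteFetch = content
    End Function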

  • planet115,

    RE: I have VBA code which performs HTTP GET and POST operations by calling the WinInet DLL. You could run it directly from Excel and integrate it with your scraper macros. Let me know if you want it and I will email it to you.

    Can you send me a copy of the VBA code you use for screen scraping?

    Thanks,

    John
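
    For reference, a plain HTTP GET via WinInet from VBA might look roughly like this. This is only a minimal sketch, not planet115's actual code; the declarations are the usual 32-bit ones (64-bit Office needs PtrSafe and LongPtr), and HttpGet is a made-up name:

    Private Declare Function InternetOpen Lib "wininet.dll" Alias "InternetOpenA" _
        (ByVal sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, _
         ByVal sProxyBypass As String, ByVal lFlags As Long) As Long
    Private Declare Function InternetOpenUrl Lib "wininet.dll" Alias "InternetOpenUrlA" _
        (ByVal hInternet As Long, ByVal sUrl As String, ByVal sHeaders As String, _
         ByVal lHeadersLength As Long, ByVal lFlags As Long, ByVal lContext As Long) As Long
    Private Declare Function InternetReadFile Lib "wininet.dll" _
        (ByVal hFile As Long, ByVal sBuffer As String, ByVal lBytesToRead As Long, _
         lBytesRead As Long) As Integer
    Private Declare Function InternetCloseHandle Lib "wininet.dll" (ByVal hInet As Long) As Integer

    Public Function HttpGet(ByVal sUrl As String) As String
        Const INTERNET_FLAG_RELOAD As Long = &H80000000
        Dim hOpen As Long, hUrl As Long
        Dim sBuffer As String * 4096
        Dim lBytesRead As Long
        Dim sResult As String

        hOpen = InternetOpen("VBA Scraper", 0, vbNullString, vbNullString, 0)
        If hOpen = 0 Then Exit Function

        hUrl = InternetOpenUrl(hOpen, sUrl, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
        If hUrl <> 0 Then
            'read the response in 4 KB chunks until nothing is left
            Do
                If InternetReadFile(hUrl, sBuffer, Len(sBuffer), lBytesRead) = 0 Then Exit Do
                If lBytesRead = 0 Then Exit Do
                sResult = sResult & Left$(sBuffer, lBytesRead)
            Loop
            InternetCloseHandle hUrl
        End If

        InternetCloseHandle hOpen
        HttpGet = sResult
    End Function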
