Wb Spider

Submitted on: 4/8/2002 9:43:38 PM
By: nfs 
Level: Advanced
User Rating: By 2 Users
Compatibility: 5.0 (all versions), Active Perl specific, 4.0 (all versions)

Users have accessed this code 4790 times.
 
 
     Webspider is a Perl script that, given a start page, follows every link it finds and scans the HTML for CGI usage (form action targets). Once it has gone over all reachable links, it reports every CGI used on the web site.
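
The layer-by-layer idea described above can be illustrated without any network access. The following is only a minimal sketch, not the author's code: the `%site` hash, the `crawl` function and the `example.test` URLs are hypothetical, and it uses regular expressions over whole pages rather than the line-by-line splitting of the script itself.

```perl
use strict;
use warnings;

# Hypothetical in-memory "site": each URL maps to the HTML of that page.
my %site = (
    'http://example.test/index.html' =>
        '<a href="http://example.test/a.html">A</a> <form action="http://example.test/cgi-bin/search.cgi">',
    'http://example.test/a.html' =>
        '<a href="http://example.test/index.html">home</a> <form action="http://example.test/cgi-bin/mail.cgi">',
);

# Walk the site layer by layer and return the sorted list of form targets.
sub crawl {
    my ($start, $max_layers, $pages) = @_;
    my %seen    = ($start => 1);    # pages already queued
    my %cgi;                        # form targets found so far
    my @current = ($start);
    for my $layer (1 .. $max_layers) {
        my @next;
        for my $url (@current) {
            my $html = $pages->{$url};
            next unless defined $html;
            # Every form action counts as a CGI.
            $cgi{$1} = 1 while $html =~ /action="([^"]+)"/gi;
            # Queue every link that has not been seen before.
            while ($html =~ /href="([^"]+)"/gi) {
                push @next, $1 unless $seen{$1}++;
            }
        }
        last unless @next;
        @current = @next;
    }
    return sort keys %cgi;
}

print "$_\n" for crawl('http://example.test/index.html', 10, \%site);
```

Keeping the visited pages in a hash makes the duplicate check a single lookup, which is one possible alternative to scanning a history array on every link.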
 
code:
 
Terms of Agreement:   
By using this code, you agree to the following terms...   
1) You may use this code in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.   
2) You MAY NOT redistribute this code (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.   
3) You may link to this code from another website, but ONLY if it is not wrapped in a frame. 
4) You will abide by any additional copyright restrictions which the author may have placed in the code or code's description.

    #**************************************
    # Name: Wb Spider
    # Description: Webspider is a Perl script that, when given a start page,
    #     will "follow" every link it finds, scanning the HTML code for the
    #     use of CGI's. After Webspider has gone over all available links
    #     it will report every CGI used on the web site.
    # By: nfs
    #
    # This code is copyrighted and has limited warranties. Please see
    # http://www.Planet-Source-Code.com/vb/scripts/ShowCode.asp?txtCodeId=309&lngWId=6
    # for details.
    #**************************************
    
    # The spider talks to the proxy over a raw TCP socket.
    use strict;
    use warnings;
    use Socket;

    # Global state shared by the subroutines below.
    my ($proxy, $port, $server, $dir, $file);
    my $layer                 = 10;     # maximum number of layers to follow
    my $maxcurrentlayerteller = 100;    # maximum number of pages fetched per layer
    my $noname                = "WebSpider 1.1";    # program name (not used elsewhere)
    my $logfile               = "WebSpider_Log.txt";
    my ($layerteller, $currentlayerteller, $nextlayerteller, $foundcgiteller) = (0, 0, 0, 0);
    my $responseteller = 0;
    my (@currentlayer, @nextlayer, @history, @foundcgi, @cgihistory, @response);
    # Only href links on lines mentioning one of these extensions are followed.
    my @dontignore = (".html", ".xml", ".asp", ".php", ".htm");

    sub preps {
        if (!defined $ARGV[2] || $ARGV[2] eq '') {
            print "\n\nUsage: perl webspider_1.1.pl <proxy server> <proxy port> <URL>\n";
            print "Example: perl webspider_1.1.pl proxy.pandora.be 8080 http://www.microsoft.com/\n";
            exit;
        }
        $proxy = $ARGV[0];
        $port  = $ARGV[1];
        # Store the start URL without its scheme; the main loop expects that form.
        ($currentlayer[0] = $ARGV[2]) =~ s/http:\/\///g;
        print "Don't Ignore: $_\n" for @dontignore;
    }

    sub LogToFile {
        open(my $outf, '>>', $logfile) or die "Cannot open $logfile: $!\n";
        print $outf "$layerteller $currentlayerteller $foundcgi[$foundcgiteller] http://$currentlayer[$currentlayerteller]\n";
        close($outf);
    }

    sub CheckCGIHistory {
        my $cgi = $foundcgi[$foundcgiteller];
        if (grep { $_ eq $cgi } @cgihistory) {
            # Already reported: undo the increment the caller is about to apply.
            $foundcgiteller--;
        } else {
            push @cgihistory, $cgi;
            print "$layerteller:$currentlayerteller $cgi\n";
            LogToFile();
        }
    }

    sub CheckHistory {
        my $url = $nextlayer[$nextlayerteller];
        if (grep { $_ eq $url } @history) {
            # Already queued once: drop it and undo the caller's increment.
            $nextlayer[$nextlayerteller] = "";
            $nextlayerteller--;
        } else {
            push @history, $url;
        }
    }

    # Turn a link into an absolute URL relative to the current server and directory.
    sub absolute {
        my ($link) = @_;
        return $link if $link =~ /http:\/\//;
        return "http://$server$link" if $link =~ /^\//;
        return $dir ne '' ? "http://$server/$dir/$link" : "http://$server/$link";
    }

    # Return the double-quoted value of an attribute on the current response line.
    sub attributevalue {
        my ($name) = @_;
        return $response[$responseteller] =~ /\Q$name\E=\"([^\"]*)/ ? $1 : undef;
    }

    # Add a link to the next layer unless it is empty or was queued before.
    sub queuelink {
        my ($link) = @_;
        return unless defined $link && $link ne '';
        $nextlayer[$nextlayerteller] = absolute($link);
        CheckHistory();
        $nextlayerteller++;
    }

    sub itcontainslocation {
        my (undef, $link) = split(/ /, $response[$responseteller]);
        queuelink($link);
    }
    sub itcontainshref { queuelink(attributevalue("href")); }
    sub itcontainssrc  { queuelink(attributevalue("src")); }

    sub itcontainsaction {
        my $cgi = attributevalue("action");
        return unless defined $cgi && $cgi ne '';
        $foundcgi[$foundcgiteller] = absolute($cgi);
        CheckCGIHistory();
        $foundcgiteller++;
    }

    sub parse {
        my $serverIP = inet_aton($proxy) or die "Could not resolve proxy server $proxy\n";
        my $serverAddr = sockaddr_in($port, $serverIP);
        socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname('tcp')) or die "socket: $!\n";
        if (!connect($sock, $serverAddr)) { print "Could not connect, try another proxy server.\n"; exit; }
        # Send the URL
        print "Sending: GET http://$currentlayer[$currentlayerteller] HTTP/1.0\n";
        send($sock, "GET http://$currentlayer[$currentlayerteller] HTTP/1.0\r\n\r\n", 0);
        @response = <$sock>;
        close($sock);
        for ($responseteller = 0; $responseteller < @response; $responseteller++) {
            # Strip the line ending and convert everything to lowercase...
            $response[$responseteller] =~ s/\r?\n$//;
            $response[$responseteller] = lc($response[$responseteller]);
            my $line = $response[$responseteller];
            # If we get a 302... (the line is lowercase by now)
            if ($line =~ /^location:/) { itcontainslocation(); }
            # If we get a 200...
            if ($line =~ /href=/) {
                # Follow the link only if the line mentions an extension from @dontignore.
                my $dontignoreit = grep { $line =~ /\Q$_\E/ } @dontignore;
                if ($dontignoreit) { itcontainshref(); }
            }
            # Site has frames...
            if ($line =~ /src=/) { itcontainssrc(); }
            # CGI found...
            if ($line =~ /action=/) { itcontainsaction(); }
        }
    }

    ################
    # MAIN PROGGIE #
    ################
    print "\nPreparing...";
    preps();
    print "Done.\n";
    for ($layerteller = 0; $layerteller < $layer; $layerteller++) {
        for ($currentlayerteller = 0; $currentlayerteller < $maxcurrentlayerteller; $currentlayerteller++) {
            my $url = $currentlayer[$currentlayerteller];
            next if !defined $url || $url eq '';
            $url =~ s/http:\/\///g;
            $currentlayer[$currentlayerteller] = $url;
            # Split "server/dir/.../file" into its three parts.
            my @path = split(/\//, $url, -1);
            $server = shift @path;
            $file   = @path ? pop @path : '';
            $dir    = join('/', @path);
            parse();
        }
        # Promote the collected links to the next layer and start a fresh list.
        @currentlayer    = @nextlayer;
        @nextlayer       = ();
        $nextlayerteller = 0;
    }
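
The script always sends its request through a proxy, which is why the request line carries the full URL. As a rough sketch of how the request text differs for a proxy and for a direct connection to the web server itself, the hypothetical helper below builds both forms. `build_request` and the `example.test` host are assumptions for illustration, and the sketch does not handle a port number inside the URL.

```perl
use strict;
use warnings;

# Build the HTTP/1.0 request text for a URL.
# $via_proxy true  -> absolute URL in the request line (what a proxy expects)
# $via_proxy false -> path only, for a socket connected to the host itself
sub build_request {
    my ($url, $via_proxy) = @_;
    my ($host, $path) = $url =~ m{^http://([^/]+)(/.*)?$}
        or die "Unsupported URL: $url\n";
    $path = '/' unless defined $path;
    my $target = $via_proxy ? "http://$host$path" : $path;
    # Header lines end in CRLF; an empty line terminates the request.
    return "GET $target HTTP/1.0\r\nHost: $host\r\n\r\n";
}

print build_request('http://example.test/index.html', 0);
```

For the direct form, the socket would have to be connected to the target host (typically on port 80) instead of to the proxy.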


Other User Comments
4/8/2002 10:05:44 PM:vsim
It works, thanks.

 
4/10/2002 3:43:32 AM:Harry
What if I do not have a proxy server?

 
4/16/2002 5:22:28 PM:New User
I was looking for this and it's good, so 5*. Thanks.

 
6/22/2002 4:19:45 PM:boujouj
Tried to run it; full of syntax errors.
Does not work for me.

 
3/8/2003 1:21:32 AM:
Could not connect, try another proxy server
Could not connect, try another proxy server
Could not connect, try another proxy server
Could not connect, try another proxy server

 
8/14/2003 1:34:29 PM:
Wow, looks just like another script I found on PacketStorm... Looks like someone is stealing code?

 

Copyright© 1997 by Exhedra Solutions, Inc. All Rights Reserved.  By using this site you agree to its Terms and Conditions.  Planet Source Code (tm) and the phrase "Dream It. Code It" (tm) are trademarks of Exhedra Solutions, Inc.