Introduction to writing your very own Web Clients
Welcome to this brief tutorial. It outlines how to write simple Perl scripts that can request HTML from the internet, store it in a file, mirror it, print it, and search it.
Perl: You've learned how to write scripts, parse text, blah blah... now what? Sure, it's nice to save a text file to your hard drive... rename it... and... uh, save another text file after that. But surely there has to be something else, besides CGI, that Perl can be used for. Here's one answer to that question, just one of the thousands found at www.cpan.org. Within five minutes you'll be writing Perl scripts capable of retrieving webpages. Cool, eh? Let's begin.
The first step to writing a www-aware Perl script is to download the libwww-perl distribution. It offers both a procedural (non-object-oriented) interface and an object-oriented one. For your convenience, I'll cover the procedural one (it takes less code and handles the common cases). To download it, simply go here:
http://www.cpan.org/authors/id/GAAS/libwww-perl-5.63.tar.gz
For those wishing to do this on Windows, check the ActiveState website (www.ActiveState.com, I believe); if it isn't there, search either Google or www.cpan.org for a Windows version of libwww-perl. This tutorial was written on a Linux machine, so I can't point you to the exact Windows download location. Once you've downloaded it, untar/unzip it to a new directory. Open a shell (or DOS prompt), change to the directory you unpacked into, and type the following:
perl Makefile.PL
-wait for this program to complete. If it cannot find this file, type dir on DOS, or ls on Unix, to view the contents of the directory, then run perl on whichever file ends in .PL (capitals).
Next, type
make
-wait for this to finish, then type
make test
-and then
make install
Once you've typed all four of these commands (perl Makefile.PL -> make -> make test -> make install), the libwww-perl modules will be installed where your copy of Perl can find them.
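One way to confirm the install worked is a tiny script that tries to load the module and reports what it finds. This is only a sketch: module_installed is a made-up helper name, not something libwww-perl provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time.
# eval traps the "Can't locate ..." error that a missing module raises.
sub module_installed {
    my ($module) = @_;
    return eval "require $module; 1" ? 1 : 0;
}

my $module = 'LWP::Simple';
if (module_installed($module)) {
    # VERSION is a standard method every loaded package responds to.
    my $version = $module->VERSION // 'unknown';
    print "$module is installed (version $version)\n";
} else {
    print "$module was not found; re-check the install steps\n";
}
```

If it reports that the module was not found, the make install step most likely failed or installed into a different copy of Perl.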
Next, open up your favorite Perl editor and create a new file. In this file, you must say: "Hey, Perl, I want to use the libwww-perl thingie I just installed, so stick it in my code." This can be roughly translated into Perl-speak by typing
use LWP::Simple;
So, not much of a program yet, eh? LWP stands for libwww-perl (duh)... which should help you remember it. What next? Well, you've got a program that knows this module's there, but... how do you use it?
Simply! This package is called "Simple" for just that reason. Go figure. What you'll find is that your script now has access to several brand spanking new functions. That's right: simple, old-fashioned functions.
1. get($url);
2. getstore($url, $filename);
3. getprint($url);
4. mirror($url, $filename);
Oohh, ahhh! I'm sure you guys could memorize those right now. Let me tell you what each of those does, and how you're expected to use them. First, you've got get($url). Simply take a scalar variable, let's say $html, and assign it the result of get("http://get_this_url/"). You can replace the string inside the get function with the URL of whatever website you'd like to get. So, this is an example of doing just that:
use LWP::Simple;
$html = get("http://www.megathink.com");
print $html;
A program that gets a webpage's html and crams it into a scalar
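A slightly more careful version of the same idea checks whether the fetch worked before using the result, since get() returns undef when the request fails. This is a sketch only: fetch_report is a hypothetical helper name, and the URL is read from the command line so nothing is fetched unless you ask for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Hypothetical helper: fetch a URL and describe what happened
# instead of printing the raw HTML.
sub fetch_report {
    my ($url) = @_;
    my $html = get($url);    # undef if the request failed
    return "could not fetch $url" unless defined $html;
    return sprintf("fetched %d characters from %s", length($html), $url);
}

# Only touch the network when a URL is given on the command line.
print fetch_report($ARGV[0]), "\n" if @ARGV;
```

Run it as perl fetch.pl http://www.megathink.com (the file name fetch.pl is just an example).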
Don't forget the use LWP::Simple; at the top! Leaving it out will fudge everything up.
Let's say you wanna do this real quick and eliminate the need for a variable at all. Well, you've got the getprint($url); function to do that! Simply type getprint("http://www.whatever.com/");, and the program will automatically print the HTML of that website.
What if you wanna store the HTML to a file on your hard drive, for backup? Or, let's say you want to start your own cache of websites. That's just as easy! Type getstore("http://url", "stored.htm");, and badda bing, badda boom, you've got a new .htm file in your directory, loaded to the brim with whatever URL you requested. A working example, you say? Sure!
use LWP::Simple;
getstore("http://www.megathink.com", "temp.htm");
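getstore() also hands back the HTTP status code of the request, so a script can tell whether the file was really written. The sketch below assumes a made-up helper, describe_status, and takes the URL and file name from the command line; is_success() is one of the status helpers that LWP::Simple exports.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # also exports is_success() and friends

# Hypothetical helper: turn an HTTP status code into a short message.
sub describe_status {
    my ($code) = @_;
    return is_success($code) ? "saved (HTTP $code)" : "failed (HTTP $code)";
}

# Usage: perl store.pl http://www.megathink.com temp.htm
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $code = getstore($url, $file);    # returns the HTTP status code
    print "$url -> $file: ", describe_status($code), "\n";
}
```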
Well, that's cool, huh? No? You want to store a website only when it has changed, like Google does, for example? That's no big deal: change the function in the last example ("getstore") to mirror, leaving the parameters as they are, and the program will only download and store the page if the copy on the server is newer than the one already stored on disk. Cool? Yup!
At this point, you have a ton of things you can do. Let's say you want to check for dead links. Simply get the HTML into a variable ($html = get("http://www.google.com/");), and use a couple of search strings and splits on it until you find all the <a href> tags. Next, pull the URL out of each href= attribute, and add each URL to an array. Create a loop (foreach $i (@array_of_links) {}) to cycle through each, and attempt to connect to them. If the link is bad, get() will return undef (test for it with defined). Otherwise, get() will return the HTML. I don't want to ruin this for you, since I'm sure you'd love to try it on your own [yeaaaa, riiight].
Another "creative" idea is to write a proxy. If you run Apache on your machine (or IIS for you Windowers), you can now use these functions in CGI programs. YUP! Think of the bucks you can make for writing a proxy to get past that god-forsaken NetNanny, or Bess, software your school/home/office forces onto you. Simply write a script that accepts a URL in the query string, and use getprint() to display it. Cha-ching!
I hope you enjoyed reading this tutorial, and that you continue to contribute to the free Perl community. It was my pleasure writing this lil' file. Please post feedback so I know whether or not I'm actually helping.
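Going back to the dead-link idea above, here is one rough way it could look. Treat it as a sketch under assumptions, not the one true link checker: extract_links is a hypothetical helper, the regular expression only approximates real HTML parsing, and relative links are skipped to keep things short. It relies on get() returning undef when a request fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Hypothetical helper: collect the value of every href="..." found
# inside an <a> tag. A regex is only a rough stand-in for an HTML parser.
sub extract_links {
    my ($html) = @_;
    my @links = $html =~ /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
    return @links;
}

# Usage: perl deadlinks.pl http://www.google.com/
if (@ARGV) {
    my $page = get($ARGV[0]);
    die "could not fetch $ARGV[0]\n" unless defined $page;
    foreach my $link (extract_links($page)) {
        next unless $link =~ m{^https?://}i;    # skip relative links in this sketch
        my $status = defined get($link) ? "ok  " : "DEAD";
        print "$status $link\n";
    }
}
```

LWP::Simple also provides head(), which asks the server for headers only; swapping it in for the second get() would avoid downloading every linked page in full.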
Happy New Year!
Other User Comments
| 1/2/2002 6:42:03 PM:T. E. Geek I just wanted to say thank you to those
who have voted. I'm always encouraged
when someone benefits from something I
do. Thank you :-) (If you'd like me to
write a tutorial on anything you don't
understand, simply post it- I'll write
it asap)
| 1/18/2002 5:42:05 AM:Scratch Monkey Very nice tutorial, well written and
easy to understand even for a complete
Perl lameo such as myself. Keep up the
good work, looking forward to more of
your tutorials in the future.
| 1/18/2002 5:23:34 PM:T. E. Geek Thank you very much for your warm
comment :-)
I'm open to suggestions:
If you guys have anything you'd like a
tutorial written for- just post it as a
comment. If no one says anything, I'll
write a tutorial on writing a POP3
client- and eventually- how to use the
AIM module to create chat bots
;-)
Thanks for the support!
| 1/22/2002 4:54:06 PM:Tr1pX Write a tutorial for making a client
server application where you describe
how to make the client communicate with
the server. Please send a reply to this
to tr1px@hackermail.com
| 1/26/2002 9:30:39 AM:kamal this script is so good. umph!
| 2/19/2002 3:30:13 PM:Terry Paul Thanks for sharing your knowledge of the
subject. It's really kewl that there
are still people like you out there
doing good and sharing what you know!
Keep up the awesome work bro!
| 3/3/2002 1:49:03 PM:Taylor I'd like to know how to connect to AIM
using Cold Fusion, ASP, VB, or Perl.
| 3/10/2002 4:02:53 AM:Flaxus I got a good laugh and learned
something useful at the same time.
Thanks a million
| 3/18/2002 3:59:53 AM:Jay How to use these functions is
explained in the libwww-perl manpage...
This article was not needed.
| 3/19/2002 12:11:33 AM:T. E. Geek That's a good point. It should also be
noted that C++, C, php, perl, sql, vb,
java, pascal, kylix, delphi, xml,
JavaScript, VBScript, Bash, Ata, Lisp,
Basic, Cobol, and for you mac users,
Applescript are all various other
computer-related utilities which come
with documentation. Surely, one could
master any of these languages quickly
and easily with the documentation
provided by each. (continued)
| 3/19/2002 12:11:53 AM:T. E. Geek (continuation) My only question is why
amazon.com offers all of those
superfluous books on C++, OpenGL, ad
nauseam, when anyone could simply learn
what they crave through
documentation. Documentation is not
always user-friendly. I had hoped that
this brief introduction would be a bit
more friendly and easy to understand
than the documentation provided.
Furthermore, had you read the title of
this article, you would realize that
this posting was intended for those
that do not know about LWP to begin
with. Although this article was not
intended for a user of your level, I
still regret that my work was not to
your liking.
| 3/20/2002 12:13:25 AM:T. E. Geek On a happier note, as soon as life
releases me from my utterly boring
school-related responsibilities, I am
going to write a tutorial on the
Net-AIM module; perhaps even some basic
chatter-bot theory. Thank you for being
so supportive of this article! I
promise many more in the future.
| 4/5/2002 1:44:31 AM:Marcel First of all thx for your Manual. I
still keep on smiling.
In order to complete your Manual, I want to add some information for Windows Perl users.
- Since "ActivePerl 5.6.0.613" the LWP module is included (I think it was distributed in the year 2000...)
- It fits ;-)
Last but not least: This article was needed.
Keep on going this way.
| 4/10/2002 7:39:15 PM:Allen
##########################
use LWP::Simple;
use strict;
use LWP::UserAgent;
use CGI qw(:standard);
print "Content-type: text/html\n\n";
my $url='http://www.yahoo.com';
my $con = get $url;
print "$con";
########################
Questions:
1) It works fine and gets the whole page info of http://www.yahoo.com
but PROBLEM: if I switch to this web page, I get nothing.
Steps
Replace:
my $url='http://merchantaccount.quickbooks.com/j/mas/signup';
2) When I am in a webpage, such as the yahoo page, I would like to select a radio button and press "Next" to continue. How could I modify the above to do it?
Need help on it.
Thanks
Allen...
| 5/3/2002 3:23:28 PM:Ddl_Smurf Hey, thanks, that looks simple enough.
I do not code in perl, but it looks
like a great language. I have
experience with Delphi and VB, and C,
and quite a few others, but I'm really
having trouble getting through perl
tutorials. Could you direct me to one
that is as simple and friendly as yours
?
Thanks, Best regards.
| 5/3/2002 8:25:06 PM:T. E. Geek Well! I did write an introductory
tutorial to the perl language itself
awhile back ;-) If you're interested,
parse the perl planetsourcecode
directory, and see if you can find it.
It should help you get your hands dirty
;-)