Show Bid Request
Java Spider
Bid Request Id: 66297
Posted by: softarchitect (2 ratings) (Software buyer rating 10)
Non-action Ratio: Very Good - 0.00%
Buyer Security Verifications: Good
Approved on: Jun 2, 2003 2:34:36 PM EDT
Bidding Closes: Jun 5, 2003 1:45:29 PM EDT
Viewed (by coders): 252 times
Deadline: 6/23/2003 (TIME EXPIRED)
Description:
I'd like a web spider/crawler in Java. I'd like it to run on my Windows 2000 desktop.
It should take an HTML starting document and parse it for links to .htm and .html pages and for root-domain links. It should add these links to a queue to follow. The queue of links to spider and the links that have been visited should be stored in a MySQL database. MySQL will be running on the machine. Each spidered link should have its URL and Title stored in the database.
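A minimal sketch of the link filter described above, assuming a current JDK rather than the Java 1.3/1.4 named under Platform. The class `LinkExtractor` and its method names are hypothetical, and the regular expressions are a simplification that will miss links a full HTML parser would find.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original request.
final class LinkExtractor {
    // Captures the quoted href value of an <a> tag (group 2).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns absolute, de-duplicated links worth queueing, in document order. */
    static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI resolved;
            try {
                resolved = base.resolve(m.group(2).trim());
            } catch (IllegalArgumentException e) {
                continue; // malformed href: skip it
            }
            if (!isWanted(resolved)) {
                continue;
            }
            String url = resolved.toString();
            int hash = url.indexOf('#'); // drop fragments so "#top" variants collapse
            if (hash >= 0) {
                url = url.substring(0, hash);
            }
            if (!links.contains(url)) {
                links.add(url);
            }
        }
        return links;
    }

    /** Accepts http(s) links to .htm/.html pages and to a bare domain root. */
    static boolean isWanted(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            return false;
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty() || path.equals("/")) {
            return true; // root-domain link
        }
        String lower = path.toLowerCase(Locale.ROOT);
        return lower.endsWith(".htm") || lower.endsWith(".html");
    }

    /** Returns the page title with whitespace collapsed, or "" if none is found. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim().replaceAll("\\s+", " ") : "";
    }
}
```

Calling `LinkExtractor.extractLinks(pageUrl, html)` yields the URLs to add to the queue, and `extractTitle(html)` yields the value to store alongside the page's URL.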
There should be a number of configuration options that can be set before the spidering starts. These could be stored in a properties file.
a) Maximum number of documents to spider from a given domain. If this number is reached, then future links to that domain should be ignored and not added to the queue.
b) All info sent to the web server when making the request for each document should be configurable (e.g. browser type, screen res, referring document).
c) Spidering may be done through web proxy server(s); a list of proxies should be specifiable. Where a number of proxies are used, each one should be used in turn.
d) The amount of bandwidth used should be configurable: there should be some way to throttle or restrict the number of requests that are made per hour.
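Options a) to d) could be held in one small configuration object, sketched below under several assumptions: a current JDK, hypothetical property keys (`maxDocsPerDomain`, `maxRequestsPerHour`, `proxies`, and a `header.` prefix for request headers), and the host name used as a stand-in for "domain". The class name `SpiderConfig` is likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical configuration holder; all property keys are assumptions.
final class SpiderConfig {
    final int maxDocsPerDomain;
    final long minDelayMillis; // minimum spacing between requests
    final Map<String, String> requestHeaders = new HashMap<>();
    private final List<String> proxies = new ArrayList<>();
    private final Map<String, Integer> docsPerDomain = new HashMap<>();
    private int proxyIndex = 0;

    SpiderConfig(Properties p) {
        maxDocsPerDomain = Integer.parseInt(p.getProperty("maxDocsPerDomain", "100").trim());
        int perHour = Integer.parseInt(p.getProperty("maxRequestsPerHour", "3600").trim());
        // d) a per-hour request budget becomes a fixed delay between requests
        minDelayMillis = 3_600_000L / Math.max(1, perHour);
        // b) every "header.X=value" entry becomes request header X
        String prefix = "header.";
        for (String name : p.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                requestHeaders.put(name.substring(prefix.length()), p.getProperty(name));
            }
        }
        // c) comma-separated "host:port" proxy list
        for (String proxy : p.getProperty("proxies", "").split(",")) {
            if (!proxy.trim().isEmpty()) {
                proxies.add(proxy.trim());
            }
        }
    }

    /** a) Counts a document against its host; false once the limit is reached. */
    boolean admit(String host) {
        String key = host.toLowerCase(Locale.ROOT);
        int seen = docsPerDomain.getOrDefault(key, 0);
        if (seen >= maxDocsPerDomain) {
            return false;
        }
        docsPerDomain.put(key, seen + 1);
        return true;
    }

    /** c) Returns proxies in turn (round-robin), or null when none are configured. */
    String nextProxy() {
        if (proxies.isEmpty()) {
            return null;
        }
        String proxy = proxies.get(proxyIndex);
        proxyIndex = (proxyIndex + 1) % proxies.size();
        return proxy;
    }
}
```

The `Properties` object can be filled from a file with `Properties.load`. A fetch loop would call `admit` before queueing a link, sleep for `minDelayMillis` between requests, and take the next proxy from `nextProxy()`. Note that screen resolution is not a standard HTTP request header, so that part of option b) would need a custom header or query parameter.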
The spider should have an on-screen presence with a scrolling description of the most recent actions taken. It should also have an error log for problems that it encounters and exceptions that are thrown.
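One way to back both the scrolling display and the error log is a bounded buffer of recent messages plus a separate error list. The sketch below assumes a current JDK. The class `ActivityLog` and its methods are hypothetical, and writing the errors to a file or drawing the window is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical log holder: keeps the last N actions for display and all errors.
final class ActivityLog {
    private final int capacity;
    private final Deque<String> recent = new ArrayDeque<>();
    private final List<String> errors = new ArrayList<>();

    ActivityLog(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Records an action, discarding the oldest one once the buffer is full. */
    synchronized void action(String message) {
        if (recent.size() == capacity) {
            recent.removeFirst();
        }
        recent.addLast(message);
    }

    /** Records an error in the error log and shows it in the recent actions too. */
    synchronized void error(String context, Throwable t) {
        errors.add(context + ": " + t);
        action("ERROR " + context);
    }

    /** Snapshot of the most recent actions, oldest first. */
    synchronized List<String> recentActions() {
        return new ArrayList<>(recent);
    }

    /** Snapshot of every error recorded so far. */
    synchronized List<String> errors() {
        return new ArrayList<>(errors);
    }
}
```

The methods are synchronized so that a display thread can read snapshots while fetch threads keep logging.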
It should be possible to stop the spider and start it from the point it left off.
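Because the queue and the visited links already live in the database, resuming can amount to reading the stored state again. The sketch below illustrates that idea with an in-memory map standing in for the database table. The class `Frontier`, its `Status` values, and the map-based store are all assumptions made for illustration; a real version would issue the equivalent SQL.

```java
import java.util.Map;

// Hypothetical crawl frontier; the Map stands in for a database table
// keyed by URL, and must preserve insertion order (e.g. LinkedHashMap).
final class Frontier {
    enum Status { QUEUED, VISITED }

    private final Map<String, Status> store;

    Frontier(Map<String, Status> store) {
        this.store = store;
    }

    /** Queues a URL unless it is already queued or visited. */
    boolean enqueue(String url) {
        return store.putIfAbsent(url, Status.QUEUED) == null;
    }

    /** Returns the oldest URL still queued, or null when nothing is left. */
    String next() {
        for (Map.Entry<String, Status> e : store.entrySet()) {
            if (e.getValue() == Status.QUEUED) {
                return e.getKey();
            }
        }
        return null;
    }

    /** Call only after the page has been fetched and its links stored. */
    void markVisited(String url) {
        store.put(url, Status.VISITED);
    }
}
```

Since a URL stays `QUEUED` until `markVisited` is called, a spider that is stopped in the middle of a fetch simply picks that URL up again on the next start.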
If bidding for this project, please state what you think a poor coder would be most likely to get wrong when coding this project, and why you'd get it right.
Thanks for reading this specification.
Deliverables: 1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3) Complete ownership and distribution copyrights to all work purchased.
4) Database design and set-up script (for MySQL)
Platform:
Windows 2000. Java 1.3/1.4. MySQL. Should work with all web proxies.
Must be 100% finished and received by buyer on:
Jun 23, 2003 EDT
Deadline legal notes: All times are expressed in the time zone of the site EDT (UT - 5). If the buyer omitted a time, then the deadline is 11:59:59 PM EDT on the indicated date.
Remember that contacting the other party outside of the site (by email, phone, etc.) on all business projects < $500 (before the buyer's money is escrowed) is a violation of both the software buyer and seller agreements.
We monitor all site activity for such violations and can instantly expel transgressors on the spot, so we thank you in advance for your cooperation.
If you notice a violation please help out the site and report it. Thanks for your help.
Bidding/Comments:
All monetary amounts on the site are in United States dollars.
Rent a Coder is a closed auction, so coders can only see their own bids and comments. Buyers can view every posting made on their bid requests.
This bid was accepted by the buyer!
Bid Amount: $120 (USD)
Date: Jun 2, 2003 4:03:42 PM EDT
Coder Rating: 9.09 (Superb)
Hi, I have already done this project at my university: it was my project for a network security course. It has some additional functionality. It can also be set to download certain files (for example, the config file can specify that all .mp3 files must be downloaded). It is multithreaded, so requests are concurrent, and you can control the number of concurrent requests. I also use regular expressions to parse the returned HTML for links. Feel free to ask me anything about this project; I will answer at once. I am sure I can adapt it to fit all your requirements. I have 3 years of experience with Java. Multithreading and detailed knowledge of the HTTP protocol are very important for this project. Choose my bid and you will not be disappointed.
greetings
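The bidder's code is not shown, so the following is only a sketch of how a configurable limit on concurrent requests is commonly done in Java: a fixed-size thread pool. It assumes a current JDK; `ConcurrentFetcher`, `fetchAll`, and the `fetch` function parameter are hypothetical names, and the actual HTTP download is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs at most maxConcurrent fetches at the same time.
final class ConcurrentFetcher {
    /** Applies fetch to every URL and returns the results in input order. */
    static <R> List<R> fetchAll(List<String> urls, int maxConcurrent,
                                Function<String, R> fetch) {
        // The pool size is the cap on simultaneous requests.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<R> task = () -> fetch.apply(url);
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // waits for that fetch to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The `maxConcurrent` argument could come from the same properties file as the other settings. Note that concurrency works against the per-hour request limit in the specification, so the two would need to be coordinated.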