Word list merger
Submitted on: 7/1/2003 3:14:19 PM
By: Moth7 
Level: Intermediate
User Rating: Unrated
Compatibility: 5.0 (all versions), ActivePerl specific

Users have accessed this code 789 times.
 
 
     This code merges two or more text files into a single, more manageable file. It removes all duplicate lines and all comment lines (lines beginning with #). It is useful for custom dictionary files and similar word lists. It processes approximately 1 MB of text in 3 seconds.
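The duplicate removal rests on a common Perl idiom: a "seen" hash is checked before each line is written, so only the first occurrence of a line survives. Below is a minimal standalone sketch of that technique. It assumes each input file has already been read into an array of lines. `merge_wordlists` is a hypothetical helper, not part of the original script. The script itself writes straight to the output file, whereas this version returns the merged list together with the same three counts the script reports.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): merge several word
# lists, each an array ref of lines, keeping the first occurrence of every
# line in input order. Lines beginning with '#' are counted and dropped.
sub merge_wordlists {
    my @lists = @_;
    my %seen;      # lines already emitted
    my @merged;    # result, in first-seen order
    my %stats = (added => 0, duplicates => 0, comments => 0);

    for my $list (@lists) {
        for my $line (@$list) {
            chomp(my $word = $line);   # compare lines without their newline
            if ($word =~ /^#/) {
                $stats{comments}++;
            } elsif ($seen{$word}++) { # old value is true only if seen before
                $stats{duplicates}++;
            } else {
                push @merged, $word;
                $stats{added}++;
            }
        }
    }
    return (\@merged, \%stats);
}

my ($words, $stats) = merge_wordlists(
    ["apple\n", "# fruit list\n", "pear\n"],
    ["pear\n", "fig"],                 # last line has no trailing newline
);
print join(",", @$words), "\n";
print "$stats->{added} added, $stats->{duplicates} duplicates, $stats->{comments} comments\n";
```

Running it prints `apple,pear,fig` and then `3 added, 1 duplicates, 1 comments`. Each line is chomped before the lookup, so a final line without a newline still matches an earlier copy of itself.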
 
code:
 
Terms of Agreement:   
By using this code, you agree to the following terms...   
1) You may use this code in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.   
2) You MAY NOT redistribute this code (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.   
3) You may link to this code from another website, but ONLY if it is not wrapped in a frame. 
4) You will abide by any additional copyright restrictions which the author may have placed in the code or code's description.

    #**************************************
    # Name: Word list merger
    # Description: This code merges two or more text files into a single,
    #     more manageable file. It removes all duplicate lines and all
    #     comment lines (lines beginning with #). It is useful for custom
    #     dictionary files and similar word lists. It processes
    #     approximately 1 MB of text in 3 seconds.
    # By: Moth7
    #
    # Inputs: merge.pl [listname] [outputfilename]
    #     If no parameters are given, the script prompts for the
    #     filenames interactively.
    #
    # Assumes: [listname] is the path/name of a file containing a list
    #     of the files that you wish to merge, one per line.
    #     [outputfilename] is the path/name of the file that the listed
    #     files will be merged into.
    #
    # This code is copyrighted and has limited warranties. Please see
    # http://www.Planet-Source-Code.com/vb/scripts/ShowCode.asp?txtCodeId=481&lngWId=6
    # for details.
    #**************************************
    
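    #!C:\Perl\Bin\Perl.exe
    #
    #Moth Merger 1.0
    #2003 Moth7
    #
    use strict;
    use warnings;

    # %used remembers every line already written, so duplicates are
    # skipped across all merged files, not only within a single file.
    my %used;
    my ($of, $if);                          # output file, current input file
    my ($dup, $wc, $comments) = (0, 0, 0);  # counters for the report

    if ($ARGV[0])
    {
    	die "Usage: merge.pl [listname] [outputfilename]\n" unless defined $ARGV[1];
    	AutoMake($ARGV[0], $ARGV[1]);
    }
    else
    {
    	print "Moth Merger 1.0 Wordlist Merger\n";
    	print "Please specify an output file:";
    	$of = <STDIN>;
    	chomp $of;    # chomp strips only the trailing newline, if present
    	Begin();
    }

    # Interactive mode: prompt for one input file and append its new lines to $of.
    sub Begin
    {
    	$dup = 0;
    	$wc = 0;
    	print "Please specify an input file:";
    	$if = <STDIN>;
    	chomp $if;
    	open(my $in, '<', $if) or die "Cannot open $if: $!\n";
    	open(my $out, '>>', $of) or die "Cannot open $of: $!\n";
    	MergeInto($in, $out);
    	close($in);
    	close($out);
    	Report();
    }

    # Copy each line of $in that is neither a comment nor already seen to $out.
    sub MergeInto
    {
    	my ($in, $out) = @_;
    	while (my $word = <$in>)
    	{
    		chomp $word;    # a last line without "\n" still matches its duplicates
    		if ($word =~ /^#/)
    		{
    			$comments++;
    		}
    		elsif ($used{$word})
    		{
    			$dup++;
    		}
    		else
    		{
    			print $out "$word\n";
    			$used{$word} = 1;
    			$wc++;
    		}
    	}
    }

    # Print per-file counts and offer to merge another file.
    sub Report
    {
    	print "$wc words added to $of\n$dup duplicates in $if\n";
    	print "Add another file?(y/n):";
    	my $response = <STDIN>;
    	return unless defined $response;
    	chomp $response;
    	Begin() if $response eq 'y';
    }

    # Batch mode: $list holds one input file name per line.
    sub AutoMake
    {
    	my ($list, $output) = @_;
    	open(my $lh, '<', $list) or die "Cannot open $list: $!\n";
    	my @files = <$lh>;
    	close($lh);
    	open(my $out, '>>', $output) or die "Cannot open $output: $!\n";
    	foreach my $file (@files)
    	{
    		chomp $file;
    		next if $file eq '';    # ignore blank lines in the list
    		print "Processing $file...\n";
    		open(my $in, '<', $file) or do { warn "Skipping $file: $!\n"; next };
    		MergeInto($in, $out);
    		close($in);
    	}
    	close($out);
    	open(my $report, '>>', "$output-report.txt") or die "Cannot open $output-report.txt: $!\n";
    	print $report "$wc words added to $output\n$dup duplicates ignored\n$comments comments ignored\n";
    	close($report);
    }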



Copyright© 1997 by Exhedra Solutions, Inc. All Rights Reserved.  By using this site you agree to its Terms and Conditions.  Planet Source Code (tm) and the phrase "Dream It. Code It" (tm) are trademarks of Exhedra Solutions, Inc.